[gmx-users] jobs failed

Peter C. Lai pcl at uab.edu
Thu Apr 5 20:28:36 CEST 2012


On 2012-04-05 04:15:00PM +0200, Albert wrote:

Your box is shrinking too violently for the domain decomposition routines
to handle. What ensemble and integrator are you running? If this is ordinary
MD, are you attempting NPT before NVT equilibration?
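
To make the NVT-then-NPT suggestion concrete, here is a rough sketch of the
two equilibration stages as .mdp fragments. The file names, run lengths,
temperature, coupling groups and barostat choice are all assumptions for
illustration, not values taken from your setup; only the 2 fs time step is
inferred from the quoted log.

```
; nvt.mdp (hypothetical file name): thermostat only, box held fixed
integrator      = md
dt              = 0.002      ; 2 fs, implied by the quoted log (50000000 steps = 100000 ps)
nsteps          = 50000      ; 100 ps, an assumed length
tcoupl          = V-rescale
tc-grps         = System     ; assumed; split into groups as your system requires
tau_t           = 0.1
ref_t           = 300        ; assumed target temperature in K
pcoupl          = no         ; no pressure coupling during this stage
gen_vel         = yes
gen_temp        = 300

; npt.mdp (hypothetical file name): started from the NVT output
integrator      = md
dt              = 0.002
nsteps          = 50000      ; assumed length
tcoupl          = V-rescale
tc-grps         = System
tau_t           = 0.1
ref_t           = 300
pcoupl          = Berendsen  ; assumed choice for equilibration
pcoupltype      = isotropic  ; assumed; a membrane system would likely want semiisotropic
tau_p           = 2.0
ref_p           = 1.0
compressibility = 4.5e-5
gen_vel         = no         ; keep the velocities from the NVT stage
```

The idea is to let the temperature settle at fixed volume first, so that the
barostat is not reacting to a badly equilibrated system and rescaling the box
hard in the first few thousand steps.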

> Hello:
>    I am using the following script to run GROMACS on a cluster, but it failed:
> 
> # @ job_name = bm
> # @ class = kdm-large
> # @ error = gromacs.info
> # @ output = gromacs.out
> # @ environment = COPY_ALL
> # @ wall_clock_limit = 10:00:00
> # @ notification = error
> # @ job_type = bluegene
> # @ bg_size = 64
> # @ queue
> mpirun -exe /opt/gromacs/4.5.5/bin/mdrun_mpi_bg -args "-v -s md.tpr -o md.trr -cpo md.cpt -c md.gro -g md-out.log -launch" -mode VN -np 256
> 
> 
> and here is the log file
> 
> 
> Back Off! I just backed up md-out.log to ./#md-out.log.1#
> Getting Loaded...
> Reading file md.tpr, VERSION 4.5.5 (single precision)
> Loaded with Money
> 
> 
> Will use 192 particle-particle and 64 PME only nodes
> This is a guess, check the performance at the end of the log file
> Making 3D domain decomposition 8 x 4 x 6
> 
> Back Off! I just backed up md.trr to ./#md.trr.2#
> 
> Back Off! I just backed up traj.xtc to ./#traj.xtc.3#
> 
> Back Off! I just backed up ener.edr to ./#ener.edr.3#
> 
> WARNING: This run will generate roughly 3302 Mb of data
> 
> starting mdrun 'BmEH-complex-POA in water'
> 50000000 steps, 100000.0 ps.
> step 0
> 
> NOTE: Turning on dynamic load balancing
> 
> vol 0.41  imb F 18% pme/F 0.61 step 100, will finish Tue Apr 17 13:49:51 2012
> vol 0.42  imb F 12% pme/F 0.60 step 200, will finish Sun Apr 15 23:46:30 2012
> vol 0.44  imb F 12% pme/F 0.57 step 300, will finish Sun Apr 15 12:20:49 2012
> vol 0.45  imb F 12% pme/F 0.58 step 400, will finish Sun Apr 15 07:01:25 2012
> vol 0.48  imb F 12% pme/F 0.57 step 500, will finish Sun Apr 15 03:46:13 2012
> vol 0.49! imb F 11% pme/F 0.57 step 600, will finish Sun Apr 15 01:43:05 2012
> vol 0.46! imb F 10% pme/F 0.59 step 700, will finish Sun Apr 15 00:01:14 2012
> vol 0.42! imb F 10% pme/F 0.58 step 800, will finish Sat Apr 14 22:56:06 2012
> vol 0.45! imb F 12% pme/F 0.56 step 900, will finish Sat Apr 14 22:16:49 2012
> vol 0.46! imb F 10% pme/F 0.57 step 1000, will finish Sat Apr 14 21:49:10 2012
> vol 0.46! imb F  9% pme/F 0.58 step 1100, will finish Sat Apr 14 21:26:04 2012
> vol 0.47! imb F 10% pme/F 0.57 step 1200, will finish Sat Apr 14 21:02:35 2012
> vol 0.45  imb F  9% pme/F 0.58 step 1300, will finish Sat Apr 14 20:34:22 2012
> vol 0.45! imb F  9% pme/F 0.58 step 1400, will finish Sat Apr 14 20:15:54 2012
> vol 0.48! imb F 11% pme/F 0.57 step 1500, will finish Sat Apr 14 20:07:48 2012
> vol 0.47! imb F 10% pme/F 0.58 step 1600, will finish Sat Apr 14 19:57:46 2012
> vol 0.47! imb F 13% pme/F 0.58 step 1700, will finish Sat Apr 14 19:51:47 2012
> vol 0.45! imb F 11% pme/F 0.58 step 1800, will finish Sat Apr 14 19:44:37 2012
> vol 0.46! imb F 13% pme/F 0.57 step 1900, will finish Sat Apr 14 19:37:10 2012
> vol 0.50! imb F 12% pme/F 0.58 step 2000, will finish Sat Apr 14 19:29:20 2012
> vol 0.50! imb F 12% pme/F 0.58 step 2100, will finish Sat Apr 14 19:23:00 2012
> vol 0.48  imb F 10% pme/F 0.57 step 2200, will finish Sat Apr 14 19:15:43 2012
> vol 0.50! imb F 11% pme/F 0.57 step 2300, will finish Sat Apr 14 19:13:30 2012
> vol 0.49! imb F 11% pme/F 0.57 step 2400, will finish Sat Apr 14 19:10:14 2012
> vol 0.48  imb F 10% pme/F 0.58 step 2500, will finish Sat Apr 14 19:01:51 2012
> vol 0.47! imb F 12% pme/F 0.58 step 2600, will finish Sat Apr 14 18:55:11 2012
> vol 0.48! imb F 11% pme/F 0.58 step 2700, will finish Sat Apr 14 18:49:47 2012
> vol 0.46! imb F 12% pme/F 0.58 step 2800, will finish Sat Apr 14 18:45:32 2012
> 
> -------------------------------------------------------
> Program mdrun_mpi_bg, VERSION 4.5.5
> Source code file: ../../../src/mdlib/domdec.c, line: 2633
> 
> Fatal error:
> Step 2850: The domain decomposition grid has shifted too much in the Z-direction around cell 5 0 2
> 
> For more information and tips for troubleshooting, please check the GROMACS
> website at http://www.gromacs.org/Documentation/Errors
> -------------------------------------------------------
> 
> "Don't Push Me, Cause I'm Close to the Edge" (Tricky)
> 
> Error on node 162, will try to stop all the nodes
> Halting parallel program mdrun_mpi_bg on CPU 162 out of 256
> 
> gcq#8: "Don't Push Me, Cause I'm Close to the Edge" (Tricky)
> 
> Abort(-1) on node 162 (rank 162 in comm 1140850688): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 162
> <Apr 05 13:15:36.667617> BE_MPI (ERROR): The error message in the job record is as follows:
> <Apr 05 13:15:36.667681> BE_MPI (ERROR):   "killed with signal 6"
> 

-- 
==================================================================
Peter C. Lai			| University of Alabama-Birmingham
Programmer/Analyst		| KAUL 752A
Genetics, Div. of Research	| 705 South 20th Street
pcl at uab.edu			| Birmingham AL 35294-4461
(205) 690-0808			|
==================================================================



