[gmx-users] jobs failed

Albert mailmd2011 at gmail.com
Thu Apr 5 16:15:00 CEST 2012


Hello:
   I am using the following script to run GROMACS on a cluster, but the job failed:

# @ job_name = bm
# @ class = kdm-large
# @ error = gromacs.info
# @ output = gromacs.out
# @ environment = COPY_ALL
# @ wall_clock_limit = 10:00:00
# @ notification = error
# @ job_type = bluegene
# @ bg_size = 64
# @ queue
mpirun -exe /opt/gromacs/4.5.5/bin/mdrun_mpi_bg -args "-v -s md.tpr -o md.trr -cpo md.cpt -c md.gro -g md-out.log -launch" -mode VN -np 256
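
As a quick sanity check on the resource request in the script above, the rank count passed to mpirun can be compared with the partition size. This is only a sketch: it assumes a Blue Gene/P, where VN mode places 4 MPI ranks on each compute node, and the names `RANKS_PER_NODE_VN`, `max_ranks` and `fits` are made up for illustration and are not part of LoadLeveler or GROMACS.

```python
# Values copied from the job script.
BG_SIZE = 64   # "# @ bg_size = 64": compute nodes in the partition
NP = 256       # "-np 256": MPI ranks requested from mpirun

# Assumption (not stated in the script): Blue Gene/P VN mode, 4 ranks per node.
RANKS_PER_NODE_VN = 4


def max_ranks(bg_size: int, ranks_per_node: int) -> int:
    """Largest number of MPI ranks the partition can host."""
    return bg_size * ranks_per_node


def fits(np: int, bg_size: int, ranks_per_node: int) -> bool:
    """True if the requested rank count does not exceed the partition."""
    return np <= max_ranks(bg_size, ranks_per_node)


if __name__ == "__main__":
    print("ranks available:", max_ranks(BG_SIZE, RANKS_PER_NODE_VN))
    print("request fits:", fits(NP, BG_SIZE, RANKS_PER_NODE_VN))
```

Under that assumption the request uses the partition exactly (64 nodes times 4 ranks gives 256), so the resource request itself looks consistent.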


And here is the log file:


Back Off! I just backed up md-out.log to ./#md-out.log.1#
Getting Loaded...
Reading file md.tpr, VERSION 4.5.5 (single precision)
Loaded with Money


Will use 192 particle-particle and 64 PME only nodes
This is a guess, check the performance at the end of the log file
Making 3D domain decomposition 8 x 4 x 6

Back Off! I just backed up md.trr to ./#md.trr.2#

Back Off! I just backed up traj.xtc to ./#traj.xtc.3#

Back Off! I just backed up ener.edr to ./#ener.edr.3#

WARNING: This run will generate roughly 3302 Mb of data

starting mdrun 'BmEH-complex-POA in water'
50000000 steps, 100000.0 ps.
step 0

NOTE: Turning on dynamic load balancing

vol 0.41  imb F 18% pme/F 0.61 step 100, will finish Tue Apr 17 13:49:51 2012
vol 0.42  imb F 12% pme/F 0.60 step 200, will finish Sun Apr 15 23:46:30 2012
vol 0.44  imb F 12% pme/F 0.57 step 300, will finish Sun Apr 15 12:20:49 2012
vol 0.45  imb F 12% pme/F 0.58 step 400, will finish Sun Apr 15 07:01:25 2012
vol 0.48  imb F 12% pme/F 0.57 step 500, will finish Sun Apr 15 03:46:13 2012
vol 0.49! imb F 11% pme/F 0.57 step 600, will finish Sun Apr 15 01:43:05 2012
vol 0.46! imb F 10% pme/F 0.59 step 700, will finish Sun Apr 15 00:01:14 2012
vol 0.42! imb F 10% pme/F 0.58 step 800, will finish Sat Apr 14 22:56:06 2012
vol 0.45! imb F 12% pme/F 0.56 step 900, will finish Sat Apr 14 22:16:49 2012
vol 0.46! imb F 10% pme/F 0.57 step 1000, will finish Sat Apr 14 21:49:10 2012
vol 0.46! imb F  9% pme/F 0.58 step 1100, will finish Sat Apr 14 21:26:04 2012
vol 0.47! imb F 10% pme/F 0.57 step 1200, will finish Sat Apr 14 21:02:35 2012
vol 0.45  imb F  9% pme/F 0.58 step 1300, will finish Sat Apr 14 20:34:22 2012
vol 0.45! imb F  9% pme/F 0.58 step 1400, will finish Sat Apr 14 20:15:54 2012
vol 0.48! imb F 11% pme/F 0.57 step 1500, will finish Sat Apr 14 20:07:48 2012
vol 0.47! imb F 10% pme/F 0.58 step 1600, will finish Sat Apr 14 19:57:46 2012
vol 0.47! imb F 13% pme/F 0.58 step 1700, will finish Sat Apr 14 19:51:47 2012
vol 0.45! imb F 11% pme/F 0.58 step 1800, will finish Sat Apr 14 19:44:37 2012
vol 0.46! imb F 13% pme/F 0.57 step 1900, will finish Sat Apr 14 19:37:10 2012
vol 0.50! imb F 12% pme/F 0.58 step 2000, will finish Sat Apr 14 19:29:20 2012
vol 0.50! imb F 12% pme/F 0.58 step 2100, will finish Sat Apr 14 19:23:00 2012
vol 0.48  imb F 10% pme/F 0.57 step 2200, will finish Sat Apr 14 19:15:43 2012
vol 0.50! imb F 11% pme/F 0.57 step 2300, will finish Sat Apr 14 19:13:30 2012
vol 0.49! imb F 11% pme/F 0.57 step 2400, will finish Sat Apr 14 19:10:14 2012
vol 0.48  imb F 10% pme/F 0.58 step 2500, will finish Sat Apr 14 19:01:51 2012
vol 0.47! imb F 12% pme/F 0.58 step 2600, will finish Sat Apr 14 18:55:11 2012
vol 0.48! imb F 11% pme/F 0.58 step 2700, will finish Sat Apr 14 18:49:47 2012
vol 0.46! imb F 12% pme/F 0.58 step 2800, will finish Sat Apr 14 18:45:32 2012

-------------------------------------------------------
Program mdrun_mpi_bg, VERSION 4.5.5
Source code file: ../../../src/mdlib/domdec.c, line: 2633

Fatal error:
Step 2850: The domain decomposition grid has shifted too much in the Z-direction around cell 5 0 2

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

"Don't Push Me, Cause I'm Close to the Edge" (Tricky)

Error on node 162, will try to stop all the nodes
Halting parallel program mdrun_mpi_bg on CPU 162 out of 256

gcq#8: "Don't Push Me, Cause I'm Close to the Edge" (Tricky)

Abort(-1) on node 162 (rank 162 in comm 1140850688): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 162
<Apr 05 13:15:36.667617> BE_MPI (ERROR): The error message in the job record is as follows:
<Apr 05 13:15:36.667681> BE_MPI (ERROR):   "killed with signal 6"
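
To make the load-balancing lines in the log above easier to scan, they can be pulled into a small table. The following is a minimal sketch in Python; the field names (`vol`, `limited`, `imb_pct`, `pme_ratio`, `step`) and the function `parse_dlb` are my own labels, not GROMACS terminology. In particular, reading the `!` after the `vol` value as "load balancing was limited at this step" is an assumption about what mdrun means by it.

```python
import re

# Matches mdrun progress lines such as:
#   vol 0.49! imb F 11% pme/F 0.57 step 600, will finish ...
DLB_LINE = re.compile(
    r"vol\s+(?P<vol>\d+\.\d+)(?P<limited>!?)\s+"
    r"imb F\s+(?P<imb>\d+)%\s+"
    r"pme/F\s+(?P<pme>\d+\.\d+)\s+"
    r"step\s+(?P<step>\d+)"
)


def parse_dlb(text):
    """Return one dict per load-balancing line found in the log text."""
    rows = []
    for m in DLB_LINE.finditer(text):
        rows.append({
            "step": int(m.group("step")),
            "vol": float(m.group("vol")),
            # Assumption: '!' flags a step where balancing was limited.
            "limited": m.group("limited") == "!",
            "imb_pct": int(m.group("imb")),
            "pme_ratio": float(m.group("pme")),
        })
    return rows


if __name__ == "__main__":
    # Three lines copied from the log in this message.
    sample = (
        "vol 0.41  imb F 18% pme/F 0.61 step 100, will finish Tue Apr 17 13:49:51 2012\n"
        "vol 0.49! imb F 11% pme/F 0.57 step 600, will finish Sun Apr 15 01:43:05 2012\n"
        "vol 0.46! imb F  9% pme/F 0.58 step 1100, will finish Sat Apr 14 21:26:04 2012\n"
    )
    for row in parse_dlb(sample):
        print(row)
```

Run over the whole log, this shows at which steps the `!` flag appears and how the force imbalance develops before the fatal error at step 2850.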



