[gmx-users] jobs failed
Albert
mailmd2011 at gmail.com
Thu Apr 5 16:15:00 CEST 2012
Hello,
I am using the following script to run GROMACS on a cluster, but the job failed:
# @ job_name = bm
# @ class = kdm-large
# @ error = gromacs.info
# @ output = gromacs.out
# @ environment = COPY_ALL
# @ wall_clock_limit = 10:00:00
# @ notification = error
# @ job_type = bluegene
# @ bg_size = 64
# @ queue
mpirun -exe /opt/gromacs/4.5.5/bin/mdrun_mpi_bg \
       -args "-v -s md.tpr -o md.trr -cpo md.cpt -c md.gro -g md-out.log -launch" \
       -mode VN -np 256
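To double-check the rank count requested in the script, here is a minimal sketch. It assumes that Blue Gene/P virtual-node (VN) mode runs 4 MPI tasks per compute node, so a "bg_size = 64" partition can host at most 64 x 4 = 256 ranks, which matches "-np 256". The names max_ranks and TASKS_PER_NODE_VN are illustrative only.

```python
# Sanity check of the MPI rank count requested in the LoadLeveler script.
# Assumption: Blue Gene/P VN mode places 4 MPI tasks on each compute node.
BG_SIZE = 64            # from "# @ bg_size = 64"
TASKS_PER_NODE_VN = 4   # assumed tasks per node in VN mode (hypothetical constant)
NP = 256                # from "mpirun ... -np 256"


def max_ranks(bg_size: int, tasks_per_node: int) -> int:
    """Largest -np value that fits in the partition for the given mode."""
    return bg_size * tasks_per_node


capacity = max_ranks(BG_SIZE, TASKS_PER_NODE_VN)
print(f"partition capacity: {capacity} ranks, requested: {NP}")
```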
And here is the log output:
Back Off! I just backed up md-out.log to ./#md-out.log.1#
Getting Loaded...
Reading file md.tpr, VERSION 4.5.5 (single precision)
Loaded with Money
Will use 192 particle-particle and 64 PME only nodes
This is a guess, check the performance at the end of the log file
Making 3D domain decomposition 8 x 4 x 6
Back Off! I just backed up md.trr to ./#md.trr.2#
Back Off! I just backed up traj.xtc to ./#traj.xtc.3#
Back Off! I just backed up ener.edr to ./#ener.edr.3#
WARNING: This run will generate roughly 3302 Mb of data
starting mdrun 'BmEH-complex-POA in water'
50000000 steps, 100000.0 ps.
step 0
NOTE: Turning on dynamic load balancing
vol 0.41 imb F 18% pme/F 0.61 step 100, will finish Tue Apr 17 13:49:51 2012
vol 0.42 imb F 12% pme/F 0.60 step 200, will finish Sun Apr 15 23:46:30 2012
vol 0.44 imb F 12% pme/F 0.57 step 300, will finish Sun Apr 15 12:20:49 2012
vol 0.45 imb F 12% pme/F 0.58 step 400, will finish Sun Apr 15 07:01:25 2012
vol 0.48 imb F 12% pme/F 0.57 step 500, will finish Sun Apr 15 03:46:13 2012
vol 0.49! imb F 11% pme/F 0.57 step 600, will finish Sun Apr 15 01:43:05 2012
vol 0.46! imb F 10% pme/F 0.59 step 700, will finish Sun Apr 15 00:01:14 2012
vol 0.42! imb F 10% pme/F 0.58 step 800, will finish Sat Apr 14 22:56:06 2012
vol 0.45! imb F 12% pme/F 0.56 step 900, will finish Sat Apr 14 22:16:49 2012
vol 0.46! imb F 10% pme/F 0.57 step 1000, will finish Sat Apr 14 21:49:10 2012
vol 0.46! imb F 9% pme/F 0.58 step 1100, will finish Sat Apr 14 21:26:04 2012
vol 0.47! imb F 10% pme/F 0.57 step 1200, will finish Sat Apr 14 21:02:35 2012
vol 0.45 imb F 9% pme/F 0.58 step 1300, will finish Sat Apr 14 20:34:22 2012
vol 0.45! imb F 9% pme/F 0.58 step 1400, will finish Sat Apr 14 20:15:54 2012
vol 0.48! imb F 11% pme/F 0.57 step 1500, will finish Sat Apr 14 20:07:48 2012
vol 0.47! imb F 10% pme/F 0.58 step 1600, will finish Sat Apr 14 19:57:46 2012
vol 0.47! imb F 13% pme/F 0.58 step 1700, will finish Sat Apr 14 19:51:47 2012
vol 0.45! imb F 11% pme/F 0.58 step 1800, will finish Sat Apr 14 19:44:37 2012
vol 0.46! imb F 13% pme/F 0.57 step 1900, will finish Sat Apr 14 19:37:10 2012
vol 0.50! imb F 12% pme/F 0.58 step 2000, will finish Sat Apr 14 19:29:20 2012
vol 0.50! imb F 12% pme/F 0.58 step 2100, will finish Sat Apr 14 19:23:00 2012
vol 0.48 imb F 10% pme/F 0.57 step 2200, will finish Sat Apr 14 19:15:43 2012
vol 0.50! imb F 11% pme/F 0.57 step 2300, will finish Sat Apr 14 19:13:30 2012
vol 0.49! imb F 11% pme/F 0.57 step 2400, will finish Sat Apr 14 19:10:14 2012
vol 0.48 imb F 10% pme/F 0.58 step 2500, will finish Sat Apr 14 19:01:51 2012
vol 0.47! imb F 12% pme/F 0.58 step 2600, will finish Sat Apr 14 18:55:11 2012
vol 0.48! imb F 11% pme/F 0.58 step 2700, will finish Sat Apr 14 18:49:47 2012
vol 0.46! imb F 12% pme/F 0.58 step 2800, will finish Sat Apr 14 18:45:32 2012
-------------------------------------------------------
Program mdrun_mpi_bg, VERSION 4.5.5
Source code file: ../../../src/mdlib/domdec.c, line: 2633
Fatal error:
Step 2850: The domain decomposition grid has shifted too much in the
Z-direction around cell 5 0 2
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
"Don't Push Me, Cause I'm Close to the Edge" (Tricky)
Error on node 162, will try to stop all the nodes
Halting parallel program mdrun_mpi_bg on CPU 162 out of 256
gcq#8: "Don't Push Me, Cause I'm Close to the Edge" (Tricky)
Abort(-1) on node 162 (rank 162 in comm 1140850688): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 162
<Apr 05 13:15:36.667617> BE_MPI (ERROR): The error message in the job record is as follows:
<Apr 05 13:15:36.667681> BE_MPI (ERROR): "killed with signal 6"
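To follow the load-balancing figures up to the crash, the progress lines can be parsed. This is a minimal sketch, assuming every progress line follows the "vol ... imb F ...% pme/F ... step N" pattern shown in the log; parse_progress is a hypothetical helper, and the trailing "!" after the vol value is recorded but not interpreted. It also checks that the 8 x 4 x 6 grid accounts for the 192 particle-particle ranks, which together with the 64 PME-only ranks gives the 256 requested.

```python
import re

# Matches progress lines such as
# "vol 0.46! imb F 10% pme/F 0.59 step 700, will finish Sun Apr 15 00:01:14 2012"
PROGRESS_RE = re.compile(
    r"vol\s+(?P<vol>[\d.]+)(?P<flag>!?)\s+"
    r"imb F\s+(?P<imb>\d+)%\s+"
    r"pme/F\s+(?P<pme>[\d.]+)\s+"
    r"step\s+(?P<step>\d+)"
)


def parse_progress(lines):
    """Return one dict per progress line; other lines are skipped."""
    records = []
    for line in lines:
        m = PROGRESS_RE.search(line)
        if m:
            records.append({
                "step": int(m["step"]),
                "vol": float(m["vol"]),
                "flagged": m["flag"] == "!",   # '!' kept as-is, meaning not decoded here
                "imb_pct": int(m["imb"]),
                "pme_over_f": float(m["pme"]),
            })
    return records


log_excerpt = """\
vol 0.41 imb F 18% pme/F 0.61 step 100, will finish Tue Apr 17 13:49:51 2012
vol 0.49! imb F 11% pme/F 0.57 step 600, will finish Sun Apr 15 01:43:05 2012
vol 0.46! imb F 12% pme/F 0.58 step 2800, will finish Sat Apr 14 18:45:32 2012
""".splitlines()

records = parse_progress(log_excerpt)

# Rank bookkeeping taken from the log: 8 x 4 x 6 PP grid plus 64 PME-only ranks.
pp_ranks = 8 * 4 * 6
total_ranks = pp_ranks + 64
print(pp_ranks, total_ranks)  # 192 256
for r in records:
    print(r["step"], r["vol"], r["flagged"], r["imb_pct"], r["pme_over_f"])
```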