[gmx-users] parallel run hangs (not crashed)

chris.neale at utoronto.ca
Tue May 16 20:35:26 CEST 2006


I am running a system of 185K atoms. The structure was energy minimized, and the
dynamics run appears to go smoothly until it simply hangs. The job still exists
on the first node, but none of the 4 nodes is doing any work, and I don't get
any error messages.

The trajectory looks good and no step.***.pdb files were created.

My only clue is that an energy file was created but is empty. It should already
contain some data, since nstenergy=10000 and the run got past step 26300.
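
For reference, the expected number of energy frames can be estimated with a
small shell helper. This is only a sketch: expected_frames is a hypothetical
function (not a GROMACS tool), and it assumes that a frame is written at step 0
and then at every multiple of nstenergy.

```shell
#!/bin/sh
# Hypothetical helper, not part of GROMACS.
# Assumption: one frame at step 0, then one every INTERVAL steps.
expected_frames() {
    last_step=$1    # last step reported in the log
    interval=$2     # output interval, e.g. nstenergy
    echo $(( last_step / interval + 1 ))
}

# Values from this post: last logged step 26300, nstenergy=10000
expected_frames 26300 10000
```

Under that assumption the energy file should hold frames for steps 0, 10000 and
20000 by the time the run stalls, so an empty file is unexpected.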

The last portion of output to my mdrun_mpi -g log file was:
           Step           Time         Lambda
          26300       52.60000        0.00000

   Energies (kJ/mol)
           Bond          Angle    Proper Dih. Ryckaert-Bell.  Improper Dih.
    8.77749e+04    1.66791e+05    6.40224e+04    7.13528e+04    6.57086e+03
          LJ-14     Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.
    1.01403e+05    1.73522e+05   -3.08556e+03   -2.51895e+06   -1.65998e+06
 Position Rest.      Potential    Kinetic En.   Total Energy    Temperature
    5.65104e+03   -3.50492e+06    5.44080e+05   -2.96084e+06    2.99289e+02
 Pressure (bar)
    3.40970e+01

My commands were:

GROMPP:
${ED}/grompp -np 4 -f grompp_md.mdp -n ${MOL}.ndx -c ${MOL}_m.gro -p ${MOL}.top \
    -o ${MOL}_mm.tpr > output.mm_grompp

MDRUN_MPI:
${ED}/mdrun_mpi -np 4 -nice 4 -s ${MOL}_mm.tpr -o ${MOL}_mm.trr -c ${MOL}_mm.gro \
    -g output.mm_mdrun -v -deffnm run1g 2> output.mm_mdrun_e

LAM SCRIPT:
#!/bin/sh
PATH=.:/work/lam/bin:$PATH
LAMRSH="ssh -x"
export LAMRSH PATH
cd ${MYDIR}
lamboot -v lamhosts
mpirun N ${MYDIR}/run.sh
lamhalt
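
Because a hang like this produces no error message, one way to notice it is to
watch the modification time of the log file, which with nstlog=100 should be
updated regularly while the run is alive. The following is a minimal sketch
under stated assumptions: log_is_stale is a hypothetical helper (not part of
GROMACS or LAM), and it relies on GNU stat (stat -c %Y) being available.

```shell
#!/bin/sh
# Hypothetical watchdog helper; not part of GROMACS or LAM.
# Assumption: GNU coreutils, where `stat -c %Y` prints the file's
# modification time in seconds since the epoch.
# Returns 0 if FILE was last modified more than MAX_AGE seconds ago,
# 1 if it is fresher than that, and 2 if the file cannot be read.
log_is_stale() {
    file=$1
    max_age=$2      # threshold in seconds
    mtime=$(stat -c %Y "$file") || return 2
    now=$(date +%s)
    [ $(( now - mtime )) -gt "$max_age" ]
}

# Example use (commented out). LOG is a placeholder; substitute the name
# of the log file that mdrun actually writes.
# log_is_stale "$LOG" 1800 && echo "no log output for 30 minutes"
```

The threshold has to be chosen well above the wall-clock time that 100 steps
normally take, otherwise a slow but healthy run would be flagged as hung.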

And my mdp file was:
title               =  seriousMD
cpp                 =  /usr/bin/cpp
define              =  -DPOSRES_LIPID -DPOSRES_PAGP -DPOSRES_LDA -DPOSRES_XSOL
integrator          =  md
nsteps              =  50000
tinit               =  0
dt                  =  0.002
comm_mode           =  angular
nstcomm             =  1
comm_grps           =  System
nstxout             =  10000
nstvout             =  10000
nstfout             =  10000
nstlog              =  100
nstlist             =  10
nstenergy           =  10000
nstxtcout           =  250
ns_type             =  grid
pbc                 =  xyz
coulombtype         =  PME
fourierspacing      =  0.15
pme_order           =  4
vdwtype             =  switch
rvdw_switch         =  0.9
rvdw                =  1.0
rlist               =  1.1
DispCorr            =  no
Pcoupl              =  Berendsen
tau_p               =  0.5
compressibility     =  4.5e-5
ref_p               =  1.
tcoupl              =  nose-hoover
tc_grps             =  Protein_LDA   XSOL_SOL_NA+   POPE
tau_t               =  0.05          0.05           0.05
ref_t               =  300.          300.           300.
annealing           =  no            no             no
gen_vel             =  yes
gen_temp            =  300.          300.           300.
gen_seed            =  9896
constraints         =  hbonds
constraint_algorithm=  shake
shake_tol           =  0.0001

Thanks.
Chris.


