[gmx-users] parallel run hangs (not crashed)
chris.neale at utoronto.ca
Tue May 16 20:35:26 CEST 2006
I am running a system of 185K atoms. The structure is energy minimized and the
dynamics run appears to be going smoothly until it just hangs. The job still
exists on the first node, but none of the 4 nodes are doing any work and I don't
get any error messages.
The trajectory looks good and no step.***.pdb files were created.
My only clue is that an energy file was created but is empty, even though it
should already contain some data given nstenergy=10000, since the run got past step 26300.
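To make the expectation concrete, here is a minimal POSIX sh sketch that lists the steps at which an energy frame should already have been written. It assumes mdrun writes an energy frame at step 0 and at every multiple of nstenergy; the helper name energy_frame_steps is hypothetical.

```sh
#!/bin/sh
# Print the steps at which an energy frame is due, up to the last step
# reached. Assumption: a frame is written at step 0 and at every
# multiple of nstenergy. energy_frame_steps is a hypothetical helper.
energy_frame_steps() {
    last_step=$1
    nstenergy=$2
    step=0
    while [ "$step" -le "$last_step" ]; do
        echo "$step"
        step=$((step + nstenergy))
    done
}

# Values from this run: last logged step 26300, nstenergy = 10000.
energy_frame_steps 26300 10000
```

Under that assumption it lists steps 0, 10000 and 20000, i.e. three frames that should be in the energy file.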
The last output written to my mdrun_mpi -g log file was:
           Step           Time         Lambda
          26300       52.60000        0.00000

   Energies (kJ/mol)
           Bond          Angle    Proper Dih. Ryckaert-Bell.  Improper Dih.
    8.77749e+04    1.66791e+05    6.40224e+04    7.13528e+04    6.57086e+03
          LJ-14     Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.
    1.01403e+05    1.73522e+05   -3.08556e+03   -2.51895e+06   -1.65998e+06
 Position Rest.      Potential    Kinetic En.   Total Energy    Temperature
    5.65104e+03   -3.50492e+06    5.44080e+05   -2.96084e+06    2.99289e+02
 Pressure (bar)
    3.40970e+01
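As a quick check that this last block is internally consistent (i.e. the system had not blown up just before the hang), a small sh/awk sketch can recompute Total Energy as Potential + Kinetic En. and Time as Step * dt. The numbers are copied by hand from the log above and dt = 0.002 ps comes from the mdp below; the helper name check_snapshot is hypothetical.

```sh
#!/bin/sh
# Recompute two derived quantities from the last log block:
#   Total Energy = Potential + Kinetic En.
#   Time (ps)    = Step * dt
# check_snapshot is a hypothetical helper; inputs are copied by hand.
check_snapshot() {
    awk -v pot="$1" -v kin="$2" -v step="$3" -v dt="$4" 'BEGIN {
        printf "total=%.5e\n", pot + kin
        printf "time=%.5f\n", step * dt
    }'
}

# Potential, Kinetic En., Step from the log; dt from the mdp.
check_snapshot -3.50492e+06 5.44080e+05 26300 0.002
```

This prints total=-2.96084e+06 and time=52.60000, matching the logged Total Energy and Time.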
My commands were:
GROMPP:
${ED}/grompp -np 4 -f grompp_md.mdp -n ${MOL}.ndx -c ${MOL}_m.gro -p ${MOL}.top \
  -o ${MOL}_mm.tpr > output.mm_grompp
MDRUN_MPI:
${ED}/mdrun_mpi -np 4 -nice 4 -s ${MOL}_mm.tpr -o ${MOL}_mm.trr -c ${MOL}_mm.gro \
  -g output.mm_mdrun -v -deffnm run1g 2> output.mm_mdrun_e
LAM SCRIPT:
#!/bin/sh
PATH=.:/work/lam/bin:$PATH
LAMRSH="ssh -x"
export LAMRSH PATH
cd ${MYDIR}
lamboot -v lamhosts
mpirun N ${MYDIR}/run.sh
lamhalt
And my mdp file was:
title = seriousMD
cpp = /usr/bin/cpp
define = -DPOSRES_LIPID -DPOSRES_PAGP -DPOSRES_LDA -DPOSRES_XSOL
integrator = md
nsteps = 50000
tinit = 0
dt = 0.002
comm_mode = angular
nstcomm = 1
comm_grps = System
nstxout = 10000
nstvout = 10000
nstfout = 10000
nstlog = 100
nstlist = 10
nstenergy = 10000
nstxtcout = 250
ns_type = grid
pbc = xyz
coulombtype = PME
fourierspacing = 0.15
pme_order = 4
vdwtype = switch
rvdw_switch = 0.9
rvdw = 1.0
rlist = 1.1
DispCorr = no
Pcoupl = Berendsen
tau_p = 0.5
compressibility = 4.5e-5
ref_p = 1.
tcoupl = nose-hoover
tc_grps = Protein_LDA XSOL_SOL_NA+ POPE
tau_t = 0.05 0.05 0.05
ref_t = 300. 300. 300.
annealing = no no no
gen_vel = yes
gen_temp = 300. 300. 300.
gen_seed = 9896
constraints = hbonds
constraint_algorithm = shake
shake_tol = 0.0001
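For reference, the run length and output cadence implied by this mdp can be derived with a short sh/awk sketch. It assumes simple "key = value" lines (values containing "=" would be truncated) and, as before, that frames are written at step 0 and at every multiple of the output interval; mdp_value and summarize_mdp are hypothetical helpers.

```sh
#!/bin/sh
# Print the value of key $1 from mdp file $2 (first match only).
# Simplification: splits on "=", so values containing "=" are truncated.
mdp_value() {
    awk -F= -v key="$1" '{
        k = $1; gsub(/[ \t]/, "", k)
        if (k == key) { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); print v; exit }
    }' "$2"
}

# Summarize run length (ps) and expected frame counts for mdp file $1.
# Assumption: frames at step 0 and every multiple of the interval.
summarize_mdp() {
    nsteps=$(mdp_value nsteps "$1")
    dt=$(mdp_value dt "$1")
    nstenergy=$(mdp_value nstenergy "$1")
    nstxtcout=$(mdp_value nstxtcout "$1")
    awk -v n="$nsteps" -v dt="$dt" 'BEGIN { printf "length_ps=%g\n", n * dt }'
    echo "energy_frames=$((nsteps / nstenergy + 1))"
    echo "xtc_frames=$((nsteps / nstxtcout + 1))"
}

# Usage: summarize_mdp grompp_md.mdp
```

For the mdp above this gives a 100 ps run with 6 energy frames and 201 xtc frames.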
Thanks.
Chris.
More information about the gromacs.org_gmx-users mailing list