[gmx-users] Re: parallel run hangs (not crashed)
Jason O'Young
joyoung at uwo.ca
Tue May 16 22:14:05 CEST 2006
I have been having a similar problem. I posted to the mailing list
under the title "mdrun_mpi stops at random", but no one seems to have
seen my post :(.
It seems to be almost at random which of the processes falls...
Jason
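
One way to notice a hang like this early is to check whether the mdrun
log is still being written. What follows is only a sketch: the function
name log_is_stale and the 30-minute threshold are made up for
illustration, and it relies on "find -mmin", which GNU and BSD find
provide but strict POSIX does not. With nstlog = 100 in the quoted mdp
file the log should be updated often, so a long silence suggests the run
has stalled rather than finished.

```shell
#!/bin/sh
# Hypothetical helper (not part of GROMACS or LAM):
# returns 0 if FILE has not been modified for more than MINUTES minutes,
# 1 if it was modified more recently, and 2 if FILE does not exist.
# Assumes a find that supports -mmin (GNU and BSD find do).
log_is_stale() {
    file=$1
    minutes=$2
    [ -f "$file" ] || return 2
    # find prints the file name only when its mtime is older than the threshold
    [ -n "$(find "$file" -mmin +"$minutes" 2>/dev/null)" ]
}

# Example use, with LOG set to whatever log file your mdrun actually writes:
#   log_is_stale "$LOG" 30 && echo "no log output for 30 minutes"
```

This only reports that output has stopped; it does not say which of the
processes is stuck.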
> I am running a system of 185K atoms. The structure is energy minimized and the
> dynamics run appears to be going smoothly until it just hangs. The job still
> exists on the first node, but none of the 4 nodes are doing any work and I don't
> get any error messages.
>
> The trajectory looks good and no step.***.pdb files were created.
>
> My only clue is that an energy file was created but was empty -- and should have
> some data based on nstenergy=10000
>
> The last portion of output to my mdrun_mpi -g log file was:
>            Step           Time         Lambda
>           26300       52.60000        0.00000
>
>    Energies (kJ/mol)
>            Bond          Angle    Proper Dih. Ryckaert-Bell.  Improper Dih.
>     8.77749e+04    1.66791e+05    6.40224e+04    7.13528e+04    6.57086e+03
>           LJ-14     Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.
>     1.01403e+05    1.73522e+05   -3.08556e+03   -2.51895e+06   -1.65998e+06
>  Position Rest.      Potential    Kinetic En.   Total Energy    Temperature
>     5.65104e+03   -3.50492e+06    5.44080e+05   -2.96084e+06    2.99289e+02
>  Pressure (bar)
>     3.40970e+01
>
> My commands were:
>
> GROMPP:
> ${ED}/grompp -np 4 -f grompp_md.mdp -n ${MOL}.ndx -c ${MOL}_m.gro -p ${MOL}.top \
>     -o ${MOL}_mm.tpr > output.mm_grompp
>
> MDRUN_MPI:
> ${ED}/mdrun_mpi -np 4 -nice 4 -s ${MOL}_mm.tpr -o ${MOL}_mm.trr -c ${MOL}_mm.gro \
>     -g output.mm_mdrun -v -deffnm run1g 2> output.mm_mdrun_e
>
> LAM SCRIPT:
> #!/bin/sh
> PATH=.:/work/lam/bin:$PATH
> LAMRSH="ssh -x"
> export LAMRSH PATH
> cd ${MYDIR}
> lamboot -v lamhosts
> mpirun N ${MYDIR}/run.sh
> lamhalt
>
> And my mdp file was:
> title = seriousMD
> cpp = /usr/bin/cpp
> define = -DPOSRES_LIPID -DPOSRES_PAGP -DPOSRES_LDA -DPOSRES_XSOL
> integrator = md
> nsteps = 50000
> tinit = 0
> dt = 0.002
> comm_mode = angular
> nstcomm = 1
> comm_grps = System
> nstxout = 10000
> nstvout = 10000
> nstfout = 10000
> nstlog = 100
> nstlist = 10
> nstenergy = 10000
> nstxtcout = 250
> ns_type = grid
> pbc = xyz
> coulombtype = PME
> fourierspacing = 0.15
> pme_order = 4
> vdwtype = switch
> rvdw_switch = 0.9
> rvdw = 1.0
> rlist = 1.1
> DispCorr = no
> Pcoupl = Berendsen
> tau_p = 0.5
> compressibility = 4.5e-5
> ref_p = 1.
> tcoupl = nose-hoover
> tc_grps = Protein_LDA XSOL_SOL_NA+ POPE
> tau_t = 0.05 0.05 0.05
> ref_t = 300. 300. 300.
> annealing = no no no
> gen_vel = yes
> gen_temp = 300. 300. 300.
> gen_seed = 9896
> constraints = hbonds
> constraint_algorithm= shake
> shake_tol = 0.0001
>
> Thanks.
> Chris.
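
For reference, the number of frames each output file should hold at the
point of the hang follows from the quoted settings. The sketch below
assumes that mdrun writes a frame at step 0 and at every multiple of the
output interval after that, and that frames reach the disk as they are
written (which may not hold if output is buffered). The function name
expected_frames is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: number of frames expected up to LAST_STEP when one
# frame is written at step 0 and another every INTERVAL steps after that.
expected_frames() {
    last_step=$1
    interval=$2
    echo $((last_step / interval + 1))
}

# Values taken from the quoted log (last step 26300) and mdp file.
expected_frames 26300 10000   # energy frames, nstenergy = 10000
expected_frames 26300 250     # xtc frames, nstxtcout = 250
```

With the last logged step at 26300 and nstenergy = 10000, this gives 3
energy frames (steps 0, 10000 and 20000), so under these assumptions an
empty energy file is indeed unexpected.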