[gmx-users] Re: parallel run hangs (not crashed)

Jason O'Young joyoung at uwo.ca
Tue May 16 22:14:05 CEST 2006


I have been having similar problems. I posted to the mailing list
under the title "mdrun_mpi stops at random", but I have not seen a
reply :(.

Which of the processes fails seems to be almost random...

Jason

> I am running a system of 185K atoms. The structure is energy minimized and the
> dynamics run appears to be going smoothly until it just hangs. The job still
> exists on the first node, but none of the 4 nodes are doing any work and I don't
> get any error messages.
>
> The trajectory looks good and no step.***.pdb files were created.
>
> My only clue is that an energy file was created but was empty -- and should have
> some data based on nstenergy=10000
>
> The last portion of output to my mdrun_mpi -g log file was:
>            Step           Time         Lambda
>           26300       52.60000        0.00000
>
>    Energies (kJ/mol)
>           Bond          Angle    Proper Dih. Ryckaert-Bell.  Improper Dih.
>    8.77749e+04    1.66791e+05    6.40224e+04    7.13528e+04    6.57086e+03
>          LJ-14     Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.
>    1.01403e+05    1.73522e+05   -3.08556e+03   -2.51895e+06   -1.65998e+06
> Position Rest.      Potential    Kinetic En.   Total Energy    Temperature
>    5.65104e+03   -3.50492e+06    5.44080e+05   -2.96084e+06    2.99289e+02
> Pressure (bar)
>    3.40970e+01
>
> My commands were:
>
> GROMPP:
> ${ED}/grompp -np 4 -f grompp_md.mdp -n ${MOL}.ndx -c ${MOL}_m.gro -p ${MOL}.top \
>   -o ${MOL}_mm.tpr > output.mm_grompp
>
> MDRUN_MPI:
> ${ED}/mdrun_mpi -np 4 -nice 4 -s ${MOL}_mm.tpr -o ${MOL}_mm.trr -c ${MOL}_mm.gro \
>   -g output.mm_mdrun -v -deffnm run1g 2> output.mm_mdrun_e
>
> LAM SCRIPT:
> #!/bin/sh
> PATH=.:/work/lam/bin:$PATH
> LAMRSH="ssh -x"
> export LAMRSH PATH
> cd ${MYDIR}
> lamboot -v lamhosts
> mpirun N ${MYDIR}/run.sh
> lamhalt
>
> And my mdp file was:
> title               =  seriousMD
> cpp                 =  /usr/bin/cpp
> define              =  -DPOSRES_LIPID -DPOSRES_PAGP -DPOSRES_LDA -DPOSRES_XSOL
> integrator          =  md
> nsteps              =  50000
> tinit               =  0
> dt                  =  0.002
> comm_mode           =  angular
> nstcomm             =  1
> comm_grps           =  System
> nstxout             =  10000
> nstvout             =  10000
> nstfout             =  10000
> nstlog              =  100
> nstlist             =  10
> nstenergy           =  10000
> nstxtcout           =  250
> ns_type             =  grid
> pbc                 =  xyz
> coulombtype         =  PME
> fourierspacing      =  0.15
> pme_order           =  4
> vdwtype             =  switch
> rvdw_switch         =  0.9
> rvdw                =  1.0
> rlist               =  1.1
> DispCorr            =  no
> Pcoupl              =  Berendsen
> tau_p               =  0.5
> compressibility     =  4.5e-5
> ref_p               =  1.
> tcoupl              =  nose-hoover
> tc_grps             =  Protein_LDA   XSOL_SOL_NA+   POPE
> tau_t               =  0.05          0.05           0.05
> ref_t               =  300.          300.           300.
> annealing           =  no            no             no
> gen_vel             =  yes
> gen_temp            =  300.          300.           300.
> gen_seed            =  9896
> constraints         =  hbonds
> constraint_algorithm=  shake
> shake_tol           =  0.0001
>
> Thanks.
> Chris.
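
For what it is worth, the expectation in the quoted message that the energy
file "should have some data" can be checked with a little arithmetic on the
values from the mdp file and the last log entry. This is only a rough sketch:
the helper name expected_output_steps is made up for illustration, and it
assumes that a frame is due at step 0 and at every multiple of the output
interval. It says nothing about whether frames that were due had actually
been flushed to disk when the job hung.

```python
# Values copied from the quoted mdp file and the last log entry.
dt = 0.002                # time step in ps
nstenergy = 10000         # energy output interval in steps
last_logged_step = 26300  # last step reported in the log


def expected_output_steps(last_step, interval):
    """Steps at which output is due up to last_step.

    Assumption (not taken from the original post): output is due at
    step 0 and at every multiple of the interval after that.
    """
    return list(range(0, last_step + 1, interval))


energy_steps = expected_output_steps(last_logged_step, nstenergy)
elapsed_ps = round(last_logged_step * dt, 3)

print("energy frames due at steps:", energy_steps)
print("simulated time at last log entry (ps):", elapsed_ps)
```

Under that assumption, energy frames were due at steps 0, 10000 and 20000
before the last logged step, and 26300 steps at dt = 0.002 matches the
52.6 ps shown in the log.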



