[gmx-users] Re: parallel run hangs (not crashed)

David van der Spoel spoel at xray.bmc.uu.se
Tue May 16 22:31:59 CEST 2006


Jason O'Young wrote:
> I have been having similar problems to this. I have posted to the 
> mailing list with the title "mdrun_mpi stops at random" but no one has 
> seen my reply :(.
> 
It's been quite hectic on the list lately...

As always, if you have a crash that reproducibly occurs after a short time,
you are welcome to file a Bugzilla report, but be prepared to hand over lots
of information.

Unfortunately, many factors besides GROMACS can make programs crash, e.g.
overheating, broken memory chips, broken FFT libraries and buggy compilers.

If you first built with a different compiler, recompiling with gcc is always
a good test.


> It seems to be almost random which of the processes fails...
> 
> Jason
> 
>> I am running a system of 185K atoms. The structure is energy-minimized and the
>> dynamics run appears to be going smoothly until it just hangs. The job still
>> exists on the first node, but none of the 4 nodes are doing any work and I don't
>> get any error messages.
>>
>> The trajectory looks good and no step.***.pdb files were created.
>>
>> My only clue is that an energy file was created but was empty, even though it
>> should contain some data given nstenergy=10000.
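Assuming mdrun writes a frame at every step that is a multiple of the output interval, starting from step 0, the number of frames expected by the last logged step (26300, from the log excerpt in this message) is plain integer arithmetic. A minimal POSIX sh sketch; `frames_by_step` is a hypothetical helper for illustration, not a GROMACS tool:

```sh
#!/bin/sh
# Hypothetical helper: count the frames an output file should hold once a run
# has reached a given step, assuming a frame is written at every step that is
# a multiple of the output interval, starting with step 0.
frames_by_step() {
  last_step=$1
  interval=$2
  echo $(( last_step / interval + 1 ))   # integer division drops the partial interval
}

# Last step seen in the log excerpt: 26300.
echo "edr frames (nstenergy=10000): $(frames_by_step 26300 10000)"
echo "xtc frames (nstxtcout=250):   $(frames_by_step 26300 250)"
```

Under that assumption the energy file should already have held 3 frames (steps 0, 10000 and 20000) and the xtc trajectory 106, which is why an empty energy file at this point stands out.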
>>
>> The last portion of output to my mdrun_mpi -g log file was:
>>            Step           Time         Lambda
>>           26300       52.60000        0.00000
>>
>>    Energies (kJ/mol)
>>            Bond          Angle    Proper Dih. Ryckaert-Bell.  Improper Dih.
>>     8.77749e+04    1.66791e+05    6.40224e+04    7.13528e+04    6.57086e+03
>>           LJ-14     Coulomb-14        LJ (SR)   Coulomb (SR)   Coul. recip.
>>     1.01403e+05    1.73522e+05   -3.08556e+03   -2.51895e+06   -1.65998e+06
>>  Position Rest.      Potential    Kinetic En.   Total Energy    Temperature
>>     5.65104e+03   -3.50492e+06    5.44080e+05   -2.96084e+06    2.99289e+02
>>  Pressure (bar)
>>     3.40970e+01
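As a sanity check on this excerpt, the listed terms from Bond through Position Rest. should add up to Potential, and Potential plus Kinetic En. should equal Total Energy. A minimal sh/awk sketch with the values copied by hand from the log; `check_energies` is a hypothetical name, and the 1e-5 relative tolerance is an assumption chosen to absorb the 6-significant-digit rounding of the printed numbers:

```sh
#!/bin/sh
# Hypothetical helper: re-add the energy terms printed for step 26300 and
# compare the result with the printed Potential and Total Energy.
# Values are copied by hand from the log; awk does the floating-point math.
check_energies() {
  # Line 1: Bond, Angle, Proper Dih., Ryckaert-Bell., Improper Dih.
  # Line 2: LJ-14, Coulomb-14, LJ (SR), Coulomb (SR), Coul. recip.
  # Line 3: Position Rest.
  printf '%s\n' \
    "8.77749e+04 1.66791e+05 6.40224e+04 7.13528e+04 6.57086e+03" \
    "1.01403e+05 1.73522e+05 -3.08556e+03 -2.51895e+06 -1.65998e+06" \
    "5.65104e+03" |
  awk '
    function abs(x) { return x < 0 ? -x : x }
    BEGIN { pot = -3.50492e+06; kin = 5.44080e+05; tot = -2.96084e+06 }
    { for (i = 1; i <= NF; i++) sum += $i }   # add every field on every line
    END {
      # Assumed tolerance: 1e-5 relative error covers 6-digit rounding.
      pot_ok = (abs(sum - pot) <= 1e-5 * abs(pot))
      tot_ok = (abs(pot + kin - tot) <= 1e-5 * abs(tot))
      printf "sum=%.0f potential_ok=%d total_ok=%d\n", sum, pot_ok, tot_ok
    }'
}

check_energies
```

Both checks pass and the temperature (299.3 K) sits close to ref_t = 300 K, so the last logged frame shows no sign of the system blowing up before the hang.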
>>
>> My commands were:
>>
>> GROMPP:
>> ${ED}/grompp -np 4 -f grompp_md.mdp -n ${MOL}.ndx -c ${MOL}_m.gro -p ${MOL}.top \
>>     -o ${MOL}_mm.tpr > output.mm_grompp
>>
>> MDRUN_MPI:
>> ${ED}/mdrun_mpi -np 4 -nice 4 -s ${MOL}_mm.tpr -o ${MOL}_mm.trr -c ${MOL}_mm.gro \
>>     -g output.mm_mdrun -v -deffnm run1g 2> output.mm_mdrun_e
>>
>> LAM SCRIPT:
>> #!/bin/sh
>> PATH=.:/work/lam/bin:$PATH
>> LAMRSH="ssh -x"
>> export LAMRSH PATH
>> cd ${MYDIR}
>> lamboot -v lamhosts
>> mpirun N ${MYDIR}/run.sh
>> lamhalt
>>
>> And my mdp file was:
>> title               =  seriousMD
>> cpp                 =  /usr/bin/cpp
>> define              =  -DPOSRES_LIPID -DPOSRES_PAGP -DPOSRES_LDA -DPOSRES_XSOL
>> integrator          =  md
>> nsteps              =  50000
>> tinit               =  0
>> dt                  =  0.002
>> comm_mode           =  angular
>> nstcomm             =  1
>> comm_grps           =  System
>> nstxout             =  10000
>> nstvout             =  10000
>> nstfout             =  10000
>> nstlog              =  100
>> nstlist             =  10
>> nstenergy           =  10000
>> nstxtcout           =  250
>> ns_type             =  grid
>> pbc                 =  xyz
>> coulombtype         =  PME
>> fourierspacing      =  0.15
>> pme_order           =  4
>> vdwtype             =  switch
>> rvdw_switch         =  0.9
>> rvdw                =  1.0
>> rlist               =  1.1
>> DispCorr            =  no
>> Pcoupl              =  Berendsen
>> tau_p               =  0.5
>> compressibility     =  4.5e-5
>> ref_p               =  1.
>> tcoupl              =  nose-hoover
>> tc_grps             =  Protein_LDA   XSOL_SOL_NA+   POPE
>> tau_t               =  0.05          0.05           0.05
>> ref_t               =  300.          300.           300.
>> annealing           =  no            no             no
>> gen_vel             =  yes
>> gen_temp            =  300.          300.           300.
>> gen_seed            =  9896
>> constraints         =  hbonds
>> constraint_algorithm=  shake
>> shake_tol           =  0.0001
>>
>> Thanks.
>> Chris.


-- 
David.
________________________________________________________________________
David van der Spoel, PhD, Assoc. Prof., Molecular Biophysics group,
Dept. of Cell and Molecular Biology, Uppsala University.
Husargatan 3, Box 596,  	75124 Uppsala, Sweden
phone:	46 18 471 4205		fax: 46 18 511 755
spoel at xray.bmc.uu.se	spoel at gromacs.org   http://folding.bmc.uu.se
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


