[gmx-users] Issues running Gromacs with MPI/OpenMP in cpu cluster

Carsten Kutzner ckutzne at gwdg.de
Thu Feb 13 18:32:40 CET 2014


Hi Mousumi,

from the fact that you get lots of backup files right at the start,
I suspect that your mdrun is not MPI-enabled. This behavior is exactly what
one would get when launching a number of serial mdruns on the same input file.
You may need to look for an mdrun_mpi executable instead.
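
A quick way to check, assuming a standard 4.6.x installation: ask the binary
for its build configuration. For a real MPI build the "MPI library" line
should say "MPI" rather than "thread_mpi" or "none":

  # print the build configuration and look at the MPI line
  mdrun -version 2>&1 | grep -i "MPI library"

If your cluster provides a separate MPI build (mdrun_mpi is the usual name,
but check with your admins what it is called there), the launch line would
then look something like:

  mpirun -nolocal -np 40 mdrun_mpi -ntomp 1 -s test_$i.tpr -deffnm test_$i -cpt 30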

There should be only one set of output files. If everything works correctly,
line 2 of your md.log output file should state how many nodes GROMACS is
using - this should be 40 if all 40 MPI processes are correctly working
together.
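
For example (test_1.log here is just a guess based on your -deffnm pattern;
the exact wording of the log header may differ slightly between builds):

  # the log header reports the MPI process count, e.g. "nnodes: 40"
  head -n 5 test_1.log | grep -i nnodes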

Carsten


On 13 Feb 2014, at 18:03, Mousumi Bhattacharyya <Mousumi.Bhattacharyya at oicr.on.ca> wrote:

> Dear GROMACS users,
> 
> I am facing a strange situation running GROMACS (v4.6.3) on our local CPU cluster using MPI/OpenMP parallelization. I am trying to simulate a large, heterogeneous aqueous polymer system in an octahedral box.
> 
> I use the following command to run my simulations, submitting the job to the cluster through the SGE queuing system:
> mpirun -nolocal -np 40 mdrun -ntomp 1
> 
> Immediately after launching the job, a huge number of backup files are generated in the same directory. These backup files are typically named #my_mol.log.$n#, #my_mol.edr.$n#, etc., i.e. more than one backup .log, .edr, .gro, etc. file is created, which I assumed was due to the parallelization. But after running some steps, the log file and the SGE error file start complaining about disk space, although there is no shortage of disk space. Lowering the output-writing frequency did not help, nor did increasing -cpt from 15 to 50.
> 
> The stranger part is that the backup files keep being updated regularly and individually in the same directory (although the main .log file has stopped updating), and each runs for the full number of steps (e.g. 10000 steps), as if $n separate simulations were running.
> 
> My .mdp file is :
> title                        =  MD test run of 2 ns
> ; using Verlet scheme
> cutoff-scheme  = Verlet
> ; Run parameters
> integrator            = md
> nsteps                  = 1000000
> dt                            = 0.002
> ; Output control
> nstxout                = 10000
> nstvout                = 5000
> nstxtcout             = 1000
> nstenergy           = 1000
> nstlog                    = 1000
> ; Bond parameters
> continuation      = yes
> constraint_algorithm = lincs
> constraints          = all-bonds
> lincs_iter              = 1
> lincs_order         = 4
> ; Neighborsearching
> ns_type               = grid
> nstlist                    = 10
> rlist                         = 1.0
> rcoulomb             = 1.0
> vdw-type            = cut-off
> rvdw                      = 1.0
> ; Electrostatics
> coulombtype     = PME
> pme_order         = 4
> fourierspacing   = 0.16
> ; Temperature coupling is on
> tcoupl                   = V-rescale
> tc-grps                  = polymer SOL_Ion
> tau_t                     = 0.1      0.1
> ref_t                      = 300     300
> ; Pressure coupling is on
> pcoupl                  = Parrinello-Rahman
> pcoupltype         = isotropic
> tau_p                    = 2.0
> ref_p                     = 1.0
> compressibility = 4.5e-5
> ; Periodic boundary conditions
> pbc                         = xyz
> ; Dispersion correction
> DispCorr               = EnerPres
> ; Velocity generation
> gen_vel                = no
> 
> 
> The commands I use to run the simulations:
> 
> j=$((i-1))
> ### When I change something in the .mdp file
> grompp -f md.mdp -c test_$j.tpr -o test_$i.tpr -t test_$j.cpt -p test_solv.top -n my_index.ndx >& log.grompp
> mpirun -nolocal -np 40 $GMX_HOME/bin/mdrun -ntomp 1 -s test_$i.tpr -deffnm test_$i -cpt 30
> 
> ### When I try to extend the simulations
> tpbconv -s test_${j}.tpr -o test_${i}.tpr -extend 1000 >& log_${i}.tpbconv
> mpirun -nolocal -np 40 $GMX_HOME/bin/mdrun -s test_${i}.tpr -deffnm test_${i} -cpi test_${j}.cpt -cpt 30
> 
> 
> As this is my first attempt with GROMACS and no one around me has ever used MD/GROMACS, I could not verify this strange behavior. I have searched for the error on the web as well as in the community forum but have not found any reference to this issue; I may not have searched with the proper terminology, however. I am also in touch with my sysadmins, who are also a bit lost on this. If anyone could help me get out of this situation I would be highly obliged. Please let me know if I have not been explicit enough in describing the issue.
> 
> 
> Looking forward to hearing from you.
> Thanks in advance,
> Mousumi
> 
> 
> Ontario Institute for Cancer Research
> MaRS Centre
> Toronto, Ontario
> 
> 


