[gmx-users] Issues running Gromacs with MPI/OpenMP in cpu cluster

Szilárd Páll pall.szilard at gmail.com
Thu Feb 13 19:35:03 CET 2014


On Thu, Feb 13, 2014 at 6:39 PM, Thomas Schlesier <schlesi at uni-mainz.de> wrote:
> Hi,
> I'm no expert on this stuff, but could it be that you generate about 40 of
> the #my_mol.log.$n# files (probably only 39)?
> It could be that 'mpirun' starts 40 'mdrun' jobs and each generates its
> own output.
> For GROMACS 4.6.x I always used
> mdrun -nt X ...
> to start a parallel run (where X would be 40 in your case). I think GROMACS
> 4.6.x has the MPI stuff built into it and therefore doesn't need an external
> 'mpirun' (but I could be wrong - I only know how to use the stuff, but don't
> completely understand it...)
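
A quick way to check this guess (assuming the backups follow the
#my_mol.log.$n# naming pattern mentioned above) is simply to count them:

ls \#my_mol.log.*\# | wc -l

If that prints a number close to 40, then, as suspected above, each process
started by mpirun is running its own independent mdrun and writing its own
set of output files.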

The built-in thread-MPI works only *within* a node. Across nodes you
need a "real" MPI.
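
For example (a minimal sketch; "mdrun_mpi" here is just a common name for an
MPI-enabled binary and may differ on your system, and "test" stands in for
your own -deffnm):

# thread-MPI build: single node, 40 threads in total
mdrun -nt 40 -deffnm test

# "real" MPI build: 40 ranks, possibly spread over several nodes
mpirun -np 40 mdrun_mpi -ntomp 1 -deffnm test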

To check which capabilities mdrun (or any other tool) was built with,
run "mdrun -version"; among other things you'll find a field "MPI library:"
which will list either "thread_mpi" for the built-in MPI
parallelization or "MPI" for the "real" MPI needed to run across
multiple nodes.
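
For instance:

mdrun -version | grep "MPI library"

(add 2>&1 before the pipe if your build prints that header to stderr). A
thread-MPI build reports "MPI library: thread_mpi", while a build linked
against a real MPI library reports "MPI library: MPI".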

Cheers,
--
Szilárd

> Hope this helps a little
>
> Greetings
> Thomas
>
>
>
> On 13.02.2014 18:13, gromacs.org_gmx-users-request at maillist.sys.kth.se
> wrote:
>>
>> Dear GROMACS users,
>>
>> I am facing a strange situation when running GROMACS (v4.6.3) on our
>> local CPU cluster using MPI/OpenMP parallelization. I am trying to
>> simulate a large heterogeneous aqueous polymer system in an octahedron box.
>>
>> I use the following command to run my simulations and use the SGE queuing
>> system to submit the job to the cluster:
>> mpirun -nolocal -np 40 mdrun -ntomp 1
>>
>> Immediately after launching the job, a huge number of backup files is
>> generated in the same directory. These backup files are named
>> #my_mol.log.$n#, #my_mol.edr.$n#, etc., i.e. more than one backup .log,
>> .edr, .gro, etc. file is generated, I suppose due to the parallelization.
>> But after running for some steps, the log file and the SGE error file
>> start complaining about disk space, although there is no shortage of disk
>> space. Lowering the frequency of output writing did not help, and neither
>> did increasing -cpt from 15 to 50.
>>
>> The stranger part is that the backup files keep being updated regularly
>> and individually in the same directory (although the main .log file has
>> stopped updating itself), and each runs for the full number of steps
>> (e.g. 10000 steps) individually, as if $n individual simulations were
>> running.
>>
>> My .mdp file is:
>> title                 = MD test run of 2 ns
>> ; using Verlet scheme
>> cutoff-scheme         = Verlet
>> ; Run parameters
>> integrator            = md
>> nsteps                = 1000000
>> dt                    = 0.002
>> ; Output control
>> nstxout               = 10000
>> nstvout               = 5000
>> nstxtcout             = 1000
>> nstenergy             = 1000
>> nstlog                = 1000
>> ; Bond parameters
>> continuation          = yes
>> constraint_algorithm  = lincs
>> constraints           = all-bonds
>> lincs_iter            = 1
>> lincs_order           = 4
>> ; Neighborsearching
>> ns_type               = grid
>> nstlist               = 10
>> rlist                 = 1.0
>> rcoulomb              = 1.0
>> vdw-type              = cut-off
>> rvdw                  = 1.0
>> ; Electrostatics
>> coulombtype           = PME
>> pme_order             = 4
>> fourierspacing        = 0.16
>> ; Temperature coupling is on
>> tcoupl                = V-rescale
>> tc-grps               = polymer SOL_Ion
>> tau_t                 = 0.1      0.1
>> ref_t                 = 300      300
>> ; Pressure coupling is on
>> pcoupl                = Parrinello-Rahman
>> pcoupltype            = isotropic
>> tau_p                 = 2.0
>> ref_p                 = 1.0
>> compressibility       = 4.5e-5
>> ; Periodic boundary conditions
>> pbc                   = xyz
>> ; Dispersion correction
>> DispCorr              = EnerPres
>> ; Velocity generation
>> gen_vel               = no
>>
>>
>> The commands I used to run:
>>
>> j=$((i-1))
>> ### When I change something in the .mdp file
>> grompp -f md.mdp -c test_$j.tpr -o test_$i.tpr -t test_$j.cpt -p
>> test_solv.top -n my_index.ndx >& log.grompp
>> mpirun -nolocal -np 40 $GMX_HOME/bin/mdrun -ntomp 1 -s test_$i.tpr -deffnm
>> test_$i -cpt 30
>>
>> ### When I try to extend the simulations
>> tpbconv -s test_${j}.tpr -o test_${i}.tpr -extend 1000 >& log_${i}.tpbconv
>> mpirun -nolocal -np 40 $GMX_HOME/bin/mdrun -s test_${i}.tpr -deffnm
>> test_${i} -cpi test_${j}.cpt -cpt 30
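>>
>> For reference, a minimal sketch of how the first of these workflows might
>> look inside a single SGE job script (the shebang, the parallel environment
>> name "mpi", and the starting value of i are placeholders only; the tpbconv
>> extension case would go in the same place):
>>
>> #!/bin/bash
>> #$ -cwd
>> #$ -pe mpi 40                 # hypothetical SGE parallel environment name
>>
>> i=2                           # index of the current run (placeholder)
>> j=$((i-1))                    # index of the previous run
>>
>> # changed .mdp settings: regenerate the .tpr, then run on 40 MPI ranks
>> grompp -f md.mdp -c test_${j}.tpr -t test_${j}.cpt -p test_solv.top \
>>        -n my_index.ndx -o test_${i}.tpr >& log.grompp
>> mpirun -nolocal -np 40 $GMX_HOME/bin/mdrun -ntomp 1 -s test_${i}.tpr \
>>        -deffnm test_${i} -cpt 30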
>>
>>
>> As this is my first attempt with GROMACS, and nobody around me has ever
>> used MD/GROMACS, I could not verify this strange behavior. I have tried to
>> search for the error on the web as well as in the community forum, but
>> found no reference to this issue; I may not have searched with the proper
>> terminology, however. I am also in touch with my sysadmins, who are also a
>> little bit lost on this. If anyone could help me get out of this situation
>> I would be highly obliged. Please let me know if I have not been explicit
>> enough in describing the issue.
>>
>>
>> Looking forward to hearing from you.
>> Thanks in advance,
>> Mousumi
>>
>>
>> Ontario Institute for Cancer Research
>> MaRS Centre
>> Toronto, Ontario
>
>
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a
> mail to gmx-users-request at gromacs.org.

