[gmx-users] Issues running Gromacs with MPI/OpenMP in cpu cluster

Mousumi Bhattacharyya Mousumi.Bhattacharyya at oicr.on.ca
Thu Feb 13 18:13:05 CET 2014


Dear GROMACS users,

I am facing a strange situation when running GROMACS (v4.6.3) on our local CPU cluster with MPI/OpenMP parallelization. I am trying to simulate a large heterogeneous aqueous polymer system in an octahedral box.

I submit the job to the cluster through the SGE queuing system and run the simulation with the following command:
mpirun -nolocal -np 40 mdrun -ntomp 1

Immediately after the job is launched, a large number of backup files are generated in the same directory. They are named #my_mol.log.$n#, #my_mol.edr.$n#, etc., i.e. more than one backup of the .log, .edr, .gro, etc. files is generated, I suppose due to the parallelization. After running for some steps, the log file and the SGE error file start complaining about disk space, although there is no shortage of disk space. Neither lowering the output frequency nor increasing -cpt from 15 to 50 helped.
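
To put a number on the backups, a minimal shell sketch like the one below could be run in the job directory. It assumes the backups follow the #name.ext.$n# pattern described above; count_backups is a hypothetical helper name, not a GROMACS tool.

```shell
# count_backups [DIR]: count backup files (#name.ext.N#) per extension.
# Hypothetical helper, not part of GROMACS. Runs in a subshell so that
# its variables do not leak into the calling shell.
count_backups() (
    dir=${1:-.}
    for f in "$dir"/\#*\#; do
        [ -e "$f" ] || continue      # the glob matched nothing
        base=${f##*/}                # strip the directory part
        base=${base#"#"}             # strip the leading '#'
        base=${base%"#"}             # strip the trailing '#'
        base=${base%.*}              # strip the backup counter N
        echo "${base##*.}"           # keep the extension: log, edr, gro, ...
    done | sort | uniq -c | awk '{ print $2, $1 }'
)
```

Calling count_backups . prints one line per file type with the number of backups found, which can then be compared with the number of processes given to mpirun -np.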

Stranger still, the backup files keep being updated individually in the same directory (although the main .log file has stopped updating), and each one runs through all the steps (e.g. 10000 steps), as if $n independent simulations were running.
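
One way to see which backup files are still being written is to list those modified in the last few minutes. This is only a sketch: recent_backups is a hypothetical helper name, and it relies on the -maxdepth and -mmin options, which are GNU/BSD extensions to find rather than POSIX.

```shell
# recent_backups [DIR] [MINUTES]: list backup files (#...#) in DIR that
# were modified within the last MINUTES minutes (default: 5).
# Hypothetical helper; needs a find that supports -maxdepth and -mmin.
recent_backups() {
    find "${1:-.}" -maxdepth 1 -type f -name '#*#' -mmin "-${2:-5}" | sort
}
```

If recent_backups . 5 keeps listing many files while the main .log file is not among the recently modified files, that matches the behavior described above.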

My .mdp file is:
title                = MD test run of 2 ns
; using Verlet scheme
cutoff-scheme        = Verlet
; Run parameters
integrator           = md
nsteps               = 1000000
dt                   = 0.002
; Output control
nstxout              = 10000
nstvout              = 5000
nstxtcout            = 1000
nstenergy            = 1000
nstlog               = 1000
; Bond parameters
continuation         = yes
constraint_algorithm = lincs
constraints          = all-bonds
lincs_iter           = 1
lincs_order          = 4
; Neighborsearching
ns_type              = grid
nstlist              = 10
rlist                = 1.0
rcoulomb             = 1.0
vdw-type             = cut-off
rvdw                 = 1.0
; Electrostatics
coulombtype          = PME
pme_order            = 4
fourierspacing       = 0.16
; Temperature coupling is on
tcoupl               = V-rescale
tc-grps              = polymer SOL_Ion
tau_t                = 0.1      0.1
ref_t                = 300      300
; Pressure coupling is on
pcoupl               = Parrinello-Rahman
pcoupltype           = isotropic
tau_p                = 2.0
ref_p                = 1.0
compressibility      = 4.5e-5
; Periodic boundary conditions
pbc                  = xyz
; Dispersion correction
DispCorr             = EnerPres
; Velocity generation
gen_vel              = no
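
As a sanity check on the run length: dt is given in ps, so nsteps * dt = 1000000 * 0.002 ps = 2000 ps = 2 ns, which agrees with the title. The sketch below does the same arithmetic from an .mdp file. mdp_time_ns is a hypothetical helper name, and the parsing assumes plain "key = value" lines separated by spaces or tabs.

```shell
# mdp_time_ns FILE: print the simulated time in ns implied by the
# nsteps and dt (ps) entries of an .mdp file. Hypothetical helper.
mdp_time_ns() {
    awk -F'=' '
        /^[ \t]*;/ { next }                  # skip comment lines
        NF >= 2 {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)           # trim blanks around the key
            gsub(/[ \t]/, "", val)           # trim blanks around the value
            if (key == "nsteps") nsteps = val
            if (key == "dt")     dt = val
        }
        END { printf "%.3f\n", nsteps * dt / 1000 }   # ps to ns
    ' "$1"
}
```

For example, mdp_time_ns md.mdp reports the length of the run that grompp will set up from md.mdp.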


The commands I used to run the simulations (i is the index of the current segment, j that of the previous one):

j = i-1
### When I change something in the .mdp file
grompp -f md.mdp -c test_$j.tpr -o test_$i.tpr -t test_$j.cpt -p test_solv.top -n my_index.ndx >& log.grompp
mpirun -nolocal -np 40 $GMX_HOME/bin/mdrun -ntomp 1 -s test_$i.tpr -deffnm test_$i -cpt 30

### When I try to extend the simulations
tpbconv -s test_${j}.tpr -o test_${i}.tpr -extend 1000 >& log_${i}.tpbconv
mpirun -nolocal -np 40 $GMX_HOME/bin/mdrun -s test_${i}.tpr -deffnm test_${i} -cpi test_${j}.cpt -cpt 30
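
To make the i/j bookkeeping explicit, here is a small dry-run sketch that only prints the two extension commands for a given segment index instead of executing them. print_extend_cmds is a hypothetical helper name; the flags are exactly those used above, and -extend 1000 is understood as extending the run by 1000 ps.

```shell
# print_extend_cmds I: print (do not run) the commands that extend
# segment I-1 into segment I. Hypothetical helper; runs in a subshell
# so that i and j do not leak into the calling shell.
print_extend_cmds() (
    i=$1
    j=$((i - 1))                     # index of the previous segment
    printf '%s\n' "tpbconv -s test_${j}.tpr -o test_${i}.tpr -extend 1000"
    printf '%s\n' "mpirun -nolocal -np 40 \$GMX_HOME/bin/mdrun -s test_${i}.tpr -deffnm test_${i} -cpi test_${j}.cpt -cpt 30"
)
```

For instance, print_extend_cmds 3 shows that segment 3 starts from test_2.tpr and test_2.cpt and writes test_3.* files.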


As this is my first attempt with GROMACS and no one around me has ever used MD or GROMACS, I could not verify this strange behavior. I have searched the web and the community forum for the error but found no reference to this issue; I may not have searched with the proper terminology, however. I am also in touch with my sysadmin, who is also a bit lost on this. If anyone could help me get out of this situation, I would be highly obliged. Please let me know if I have not described the issue explicitly enough.


Looking forward to hearing from you.
Thanks in advance,
Mousumi


Ontario Institute for Cancer Research
MaRS Centre
Toronto, Ontario



