[gmx-developers] Problem with simulation in 8 nodes
Jose Duarte
duarte at molgen.mpg.de
Mon Aug 4 14:30:16 CEST 2008
I'm running a simulation with the latest version of gromacs from CVS. My
protein is 90 residues long, I add waters and ions as usual and then
perform energy minimization, position restrained equilibration and a
molecular dynamics run. This all works perfectly fine on 1 cpu (standard
mdrun executable) and on 4 and 6 cpus (using mdrun_mpi), but mysteriously fails
on 8 cpus. I've tried this on several setups: using LAM/MPI on Linux on
a single multi-core box, using LAM/MPI on several nodes of a cluster, and
using Open MPI on a multi-core Mac. I'm always getting exactly the same
behaviour: all works fine on 4 or 6 cpus but fails on 8. Gromacs is
compiled with default parameters (single precision).
The problem itself comes in the energy minimization step. I run a pretty
standard EM with PME for electrostatic interactions. This is the error
message when running mdrun:
##########
Making 2D domain decomposition 4 x 2 x 1
Steepest Descents:
Tolerance (Fmax) = 1.00000e+01
Number of steps = 5000
A list of missing interactions:
G96Angle of 1304 missing -1
Proper Dih. of 510 missing -2
Improper Dih. of 409 missing -1
-------------------------------------------------------
Program mdrun_mpi, VERSION 3.3.99_development_20080718
Source code file: domdec_top.c, line: 88
Software inconsistency error:
Some interactions seem to be assigned multiple times
-------------------------------------------------------
Error on node 3, will try to stop all the nodes
Halting parallel program mdrun_mpi on CPU 3 out of 8
##########
The one difference I notice in this case, compared with running on 4 or
6 cpus, is that in those cases the domain decomposition is 1D instead of
2D. I have no idea if that's relevant.
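To make the 1D versus 2D observation concrete, here is a minimal Python sketch that lists the grids a given number of cpus can be split into. It is only an illustration: the function names `candidate_grids` and `dimensionality` are made up for this example, and the sketch does not reproduce how mdrun actually picks a grid, which also depends on things like the box size and the interaction cut-offs.

```python
def candidate_grids(n):
    """Return all (nx, ny, nz) with nx * ny * nz == n and nx >= ny >= nz.

    Hypothetical helper for illustration only; it enumerates factorizations
    and does not model mdrun's own grid selection.
    """
    grids = []
    for nx in range(1, n + 1):
        if n % nx:
            continue
        rest = n // nx
        for ny in range(1, rest + 1):
            if rest % ny:
                continue
            nz = rest // ny
            if nx >= ny >= nz:
                grids.append((nx, ny, nz))
    return sorted(grids, reverse=True)


def dimensionality(grid):
    """Number of directions in which the box is actually split."""
    return sum(1 for cells in grid if cells > 1)


if __name__ == "__main__":
    for ncpu in (4, 6, 8):
        for grid in candidate_grids(ncpu):
            print(ncpu, grid, f"{dimensionality(grid)}D")
```

Note that 4 and 6 cpus also admit 2D grids (2 x 2 x 1 and 3 x 2 x 1), so the cpu count alone does not force a 2D decomposition on 8; the 4 x 2 x 1 grid in the output above is simply the one mdrun chose for this system.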
Actually, looking at the log file produced by mdrun, the simulation seems
to run properly until step 403, after which this error is reported:
##########
Not all bonded interactions have been properly assigned to the domain
decomposition cells
A list of missing interactions:
G96Angle of 1304 missing -1
Proper Dih. of 510 missing -2
Improper Dih. of 409 missing -1
##########
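For anyone comparing such reports across runs, here is a small Python sketch that pulls the "missing interactions" lines out of a log excerpt. The function name `parse_missing` and the regular expression are assumptions based only on the three lines shown above, not on the actual format string in domdec_top.c. The comment about negative counts is one plausible reading, suggested by the "assigned multiple times" message, and is not confirmed here.

```python
import re

# Pattern assumed from the lines quoted above: "<name> of <total> missing <count>"
LINE_RE = re.compile(r"^\s*(.+?)\s+of\s+(\d+)\s+missing\s+(-?\d+)\s*$")

SAMPLE = """\
A list of missing interactions:
G96Angle of 1304 missing -1
Proper Dih. of 510 missing -2
Improper Dih. of 409 missing -1
"""


def parse_missing(text):
    """Return a list of (interaction name, total, missing) tuples."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            name, total, missing = match.groups()
            rows.append((name, int(total), int(missing)))
    return rows


if __name__ == "__main__":
    for name, total, missing in parse_missing(SAMPLE):
        # Assumption: a negative count means more interactions were
        # assigned than expected, i.e. some were assigned more than once.
        note = "possibly assigned more than once" if missing < 0 else ""
        print(f"{name:15s} total={total:5d} missing={missing:3d} {note}")
```

Running the parser on the mdrun logs from the 4, 6 and 8 cpu runs would show whether the same interaction types are affected only in the 8 cpu case.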
I have also tried to run the same procedure on another protein but the
problem doesn't arise at all, so it seems to be related to that
particular protein. I can send the pdb file if that's helpful.
Any ideas? Is this a bug?
Thanks
Jose
Jose M. Duarte
Max Planck Institute for Molecular Genetics
Ihnestr. 63-73
14195 Berlin
Germany