[gmx-users] Making sure I understand an error

Michael Lerner mglerner+gromacs at gmail.com
Mon Jun 29 19:33:39 CEST 2009


I have a 72-lipid DPPC (MARTINI) system that I ran for 400 ns in GROMACS
3.3.3. I picked a snapshot from the middle (~100 ns), so it should be
equilibrated and able to run stably for several hundred nanoseconds. With
GROMACS 4.0.3 (or 4.0.5) on 1, 2, or 4 processors, everything runs fine.
However, with more than 4 processors (either on a single node or across two
nodes), I get errors like this:

------------- begin error -------------
vol 0.87  imb F  2% step 350900, will finish Sat Jun 27 23:16:51 2009
vol 0.84  imb F  4% step 351000, will finish Sat Jun 27 23:16:49 2009

A list of missing interactions:
            G96Angle of    576 missing      1

Molecule type 'DPP'
the first 10 missing interactions, except for exclusions:
            G96Angle atoms    2    3    5      global   254   255   257

Program mdrun_mpi, VERSION 4.0.5
Source code file: domdec_top.c, line: 341

Fatal error:
1 of the 1368 bonded interactions could not be calculated because some atoms
involved moved further apart than the multi-body cut-off distance (1.2 nm)
or the two-body cut-off distance (1.2 nm), see option -rdd, for pairs and
tabulated bonds also see option -ddcheck

"Pump Up the Volume Along With the Tempo" (Jazzy Jeff)

Error on node 0, will try to stop all the nodes
Halting parallel program mdrun_mpi on CPU 0 out of 8

gcq#177: "Pump Up the Volume Along With the Tempo" (Jazzy Jeff)

h199:0.MPID_Abort: h199:0.MPI Abort by user Aborting program !
h199:0.MPID_CH_Abort: h199:0.Aborting program!
Abort on node h199 due to MPI_Abort (type 2)
-------------- end error --------------

The error occurs at a different step depending on how many processors I
use, whether I use gfortran or ifort, etc.

Am I understanding correctly that I have a triplet of particles A-B-C where
the bond-angle term cannot be calculated because the distance between A and
C is greater than 1.2 nm?
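To make the question concrete, here is a minimal sketch of the check as I understand it: take the two end particles A and C of an angle, compute their distance under periodic boundary conditions, and compare it with the 1.2 nm cut-off from the error message. It assumes a rectangular box, and the helper names and coordinates are made up for illustration; it is not how GROMACS's domain decomposition is actually implemented, which is exactly what I'm asking about.

```python
import math

# Hypothetical helper (not part of GROMACS): distance between two particles
# under the minimum-image convention, assuming a rectangular periodic box.
def min_image_distance(p, q, box):
    sq = 0.0
    for pi, qi, length in zip(p, q, box):
        d = qi - pi
        d -= length * round(d / length)  # wrap into roughly [-L/2, L/2]
        sq += d * d
    return math.sqrt(sq)

DEFAULT_CUTOFF = 1.2  # nm, the multi-body cut-off quoted in the error message

def end_distance_exceeds_cutoff(a, c, box, cutoff=DEFAULT_CUTOFF):
    """True if the A-C distance of an A-B-C angle is larger than the cut-off."""
    return min_image_distance(a, c, box) > cutoff

# Usage with made-up coordinates (nm) in a made-up 5 nm cubic box:
box = (5.0, 5.0, 5.0)
print(end_distance_exceeds_cutoff((0.0, 0.0, 0.0), (1.21, 0.0, 0.0), box))  # True
print(end_distance_exceeds_cutoff((0.1, 0.0, 0.0), (4.9, 0.0, 0.0), box))   # False: wrapped distance is 0.2 nm
```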

Is dynamic load balancing causing the error to happen at different steps for
different numbers of processors?

It appears that I can fix the problem by passing -rdd 1.4 to mdrun on the
command line, but I'd like to make sure I'm not just sweeping something else
under the rug.

For what it's worth, the equilibrium bond lengths in MARTINI's DPPC model
are all either 0.47 or 0.37 nm. In the run with -rdd 1.4, the maximum bond
lengths range from 0.68 to 0.71 nm depending on the particular bond, and the
maximum A-C distances for the A-B-C triplets range from 1.09 to 1.21 nm.
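A quick worked check of those numbers: by the law of cosines, the A-C distance of an angle is sqrt(a^2 + b^2 - 2ab cos(theta)), which is largest (a + b) when the angle is straight. The sketch below plugs in the bond lengths quoted above; the straight-angle case is an illustrative upper bound, not a claim about the actual angle distribution in my run.

```python
import math

def ac_distance(bond_ab, bond_bc, angle_deg):
    """Law of cosines: A-C distance (nm) from two bond lengths and the A-B-C angle."""
    theta = math.radians(angle_deg)
    return math.sqrt(bond_ab**2 + bond_bc**2 - 2 * bond_ab * bond_bc * math.cos(theta))

CUTOFF = 1.2  # nm, default multi-body cut-off reported in the error

# Straight angle with both bonds at equilibrium (0.47 nm): 0.94 nm, well inside the cut-off.
equilibrium = ac_distance(0.47, 0.47, 180.0)
# Straight angle with both bonds at the largest observed stretch (0.71 nm): 1.42 nm.
worst = ac_distance(0.71, 0.71, 180.0)
observed_max = 1.21  # nm, largest A-C distance seen in the run with -rdd 1.4

print(f"equilibrium: {equilibrium:.2f} nm, exceeds cutoff: {equilibrium > CUTOFF}")
print(f"upper bound: {worst:.2f} nm, exceeds cutoff: {worst > CUTOFF}")
print(f"observed:    {observed_max:.2f} nm, exceeds cutoff: {observed_max > CUTOFF}")
```

So the observed maximum of 1.21 nm is just past the 1.2 nm default, which would be consistent with an occasional stretched, nearly straight angle triggering the error, while 1.4 nm leaves some headroom.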

Also, is there any chance that the default settings will get this right for
my system in the future?



Michael Lerner, Ph.D.
IRTA Postdoctoral Fellow
Laboratory of Computational Biology NIH/NHLBI
5635 Fishers Lane, Room T909, MSC 9314
Rockville, MD 20852 (UPS/FedEx/Reality)
Bethesda MD 20892-9314 (USPS)
