[gmx-developers] Strange behavior with domain decomposition -- could this be handled more easily?

Shirts, Michael (mrs5pt) mrs5pt at eservices.virginia.edu
Tue Jun 19 02:27:56 CEST 2012

Hi, all-

* The system: a grafted polymer system, 2D pbc in xy, with polymers attached
to one wall, and a second wall at z = 3x polymer length.

* The problem: When run with -nt >=3, after a few hundred steps it quits
with errors of the form:

Step 210:
The charge group starting at atom 3067 moved than the distance allowed by
the domain decomposition in direction X
distance out of cell 1.755857
New coordinates:    1.756    8.825   33.301
Old cell boundaries in direction X:    0.000   28.000
New cell boundaries in direction X:    0.000   28.000

Fatal error:
A charge group moved too far between two domain decomposition steps
This usually means that your system is not well equilibrated
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors

* The diagnosis: It has to do with the automated domain decomposition.

mdrun automatically decided to split the box in the z direction.  It starts
with equal spacing in the z direction, but that leaves a lot of empty space
in the cells at the top, so it starts to dynamically move the divisions
between cells down.  Eventually, as the cell boundaries move down, they
intersect the polymer.  Since it's adjusting so quickly to all the empty
space, the cell boundaries are moving fast.  The code interprets this as the
polymer moving, not the boundary, and assumes the simulation is blowing up.
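
One way to picture this diagnosis is the toy one-dimensional model below.
It is not GROMACS code: the function name, the single z boundary, and the
check itself (an atom is flagged when it lies more than an allowed distance
outside the cell it belonged to before the boundary moved) are assumptions
made purely for illustration.

```python
def boundary_step_flags_atom(z_atom, old_boundary, new_boundary, allowed):
    """Toy check: does moving one cell boundary flag a stationary atom?

    The atom's home cell is decided with the OLD boundary; the distance
    outside that cell is then measured with the NEW boundary. The atom
    itself never moves in this model.
    """
    in_lower_cell = z_atom < old_boundary
    if in_lower_cell:
        # Lower cell spans [0, boundary): atom is "out" if above the new top.
        distance_out = z_atom - new_boundary
    else:
        # Upper cell spans [boundary, top): atom is "out" if below the new bottom.
        distance_out = new_boundary - z_atom
    return distance_out > allowed


# A slow boundary shift past a stationary atom is tolerated...
print(boundary_step_flags_atom(10.0, 10.5, 9.8, allowed=1.0))  # False
# ...but a fast shift looks like the atom "moved too far".
print(boundary_step_flags_atom(10.0, 10.5, 8.5, allowed=1.0))  # True
```

In both calls the atom stays at z = 10.0; only the speed of the boundary
differs, which is the behavior described above.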

* The solution: Manually set the domain decomposition with the
flag -dd to mdrun. We avoid decomposition in the z direction, so
for 8 cores, set something like -dd 4 2 1.
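
As a rough sketch of how such a grid could be picked for other core counts,
the hypothetical helper below factors the number of ranks into an nx x ny
grid that is as close to square as possible and fixes nz = 1.  The function
names are made up for this example, and the near-square choice assumes the
box is roughly square in xy; only the -dd flag itself comes from mdrun.

```python
import math


def dd_grid_without_z(n_ranks):
    """Return (nx, ny, 1) with nx * ny == n_ranks and nx >= ny,
    choosing the factor pair closest to a square."""
    if n_ranks < 1:
        raise ValueError("n_ranks must be a positive integer")
    best = (n_ranks, 1)
    # Later (larger) divisors ny give a more square grid, so keep the last hit.
    for ny in range(1, math.isqrt(n_ranks) + 1):
        if n_ranks % ny == 0:
            best = (n_ranks // ny, ny)
    return best + (1,)


def dd_flag(n_ranks):
    """Format the grid as an argument string for mdrun's -dd flag."""
    nx, ny, nz = dd_grid_without_z(n_ranks)
    return f"-dd {nx} {ny} {nz}"


print(dd_flag(8))  # -dd 4 2 1
```

For a prime number of ranks this falls back to a single row (for example
7 x 1 x 1), which may not be a good decomposition in practice.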

* The question: Is this something that can be automatically detected in the
code?  It was not at all obvious what was going on at first, and most users
would have to give up -- I needed to debug for a while to identify this.
Perhaps the scaling could be adjusted so that a box boundary moves by at most
a maximum absolute distance as well as a maximum percent, so that redrawing
the box boundaries never shifts them by more than the "moved too much"
distance?
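
A minimal sketch of that suggestion, assuming a single boundary in one
dimension: the requested shift is clamped by both a fraction of the cell size
and an absolute cap.  The function and parameter names are hypothetical and
do not correspond to anything in the GROMACS source.

```python
def limited_boundary_shift(old_boundary, target_boundary, cell_size,
                           max_fraction, max_abs_shift):
    """Move a boundary toward its target, limited by BOTH a relative cap
    (max_fraction * cell_size) and an absolute cap (max_abs_shift)."""
    requested = target_boundary - old_boundary
    limit = min(max_fraction * cell_size, max_abs_shift)
    # Clamp the requested shift to the interval [-limit, +limit].
    shift = max(-limit, min(limit, requested))
    return old_boundary + shift


# The balancer wants to drop the boundary from 10.5 to 8.5 in one go.
# The relative cap alone (50% of a 10.5-wide cell) would allow it, but an
# absolute cap of 1.0 limits the move to 9.5.
print(limited_boundary_shift(10.5, 8.5, cell_size=10.5,
                             max_fraction=0.5, max_abs_shift=1.0))  # 9.5
```

If max_abs_shift were set no larger than the "moved too much" distance, a
stationary atom could not end up further than that distance outside its old
cell purely because the boundary was redrawn.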

I'm happy to file a redmine issue with sample files if we can indeed consider
this a bug.  I would consider a crash for a "not wrong" setup, without a
proper description of why it happened and how to avoid it, to be a bug.

Michael Shirts
Assistant Professor
Department of Chemical Engineering
University of Virginia
michael.shirts at virginia.edu
