[gmx-users] domain decomposition

irene farabella i.farabella at mail.cryst.bbk.ac.uk
Fri May 7 14:57:55 CEST 2010


I am new to Gromacs and especially to parallel runs. I have some problems
running my system using domain decomposition. I apologize, but this will
be a long mail...

My system is made up of a membrane protein embedded in a mixed lipid
bilayer (POPE/POPG).
I tried to run it on 8 nodes, but the simulation crashed with "A charge
group moved too far between two domain decomposition steps" and a
high % of load imbalance between nodes. I then tested the same run on a
single node and it worked. Next I tried different numbers of nodes,
changing the DD grid (using the mdrun -dd option), and it seems that the
more nodes I use, the lower the performance in terms of ns/day, although
the load imbalance % is highly variable. During these tests I found that
the optimal setup for my system is 6 nodes with a DD grid of 3:2:1 (vol
0.80  imb F  1% )
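
To make the scan over -dd settings systematic, here is a minimal Python sketch (not part of my actual runs) that lists every candidate grid for a given node count. It assumes that all nodes do particle-particle work, i.e. no separate PME nodes, so that the product of the three grid dimensions must equal the node count. The function name dd_grids is hypothetical and only for illustration.

```python
from itertools import product


def dd_grids(n_nodes):
    """Return all (nx, ny, nz) triples whose product equals n_nodes.

    Assumption: no separate PME nodes, so every node is one DD cell.
    """
    grids = []
    for nx, ny in product(range(1, n_nodes + 1), repeat=2):
        # Keep only pairs that divide the node count exactly.
        if n_nodes % (nx * ny) == 0:
            grids.append((nx, ny, n_nodes // (nx * ny)))
    return grids


if __name__ == "__main__":
    for n in (6, 8):
        for nx, ny, nz in dd_grids(n):
            # Each triple is a candidate for: mdrun -dd nx ny nz
            print(f"{n} nodes: -dd {nx} {ny} {nz}")
```

Not every grid listed this way is usable in practice: each cell still has to be larger than the minimum cell size that mdrun reports in the log.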

---- from the log file:

Initializing Domain Decomposition on 6 nodes
Dynamic load balancing: auto
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
 two-body bonded interactions: 0.703 nm, LJ-14, atoms 21579 21587
 multi-body bonded interactions: 2.038 nm, Proper Dih., atoms 20434 20405
Minimum cell size due to bonded interactions: 2.241 nm
Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.820 nm
Estimated maximum distance required for P-LINCS: 0.820 nm
Domain decomposition grid 3 x 2 x 1, separate PME nodes 0
Domain decomposition nodeid 0, coordinates 0 0 0

Table routines are used for coulomb: TRUE
Table routines are used for vdw:     FALSE
Will do PME sum in reciprocal space.

Making 2D domain decomposition grid 3 x 2 x 1, home cell index 0 0 0

Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
 0:  Protein_POPE_POPG
 1:  SOL_NA+_CL-
There are: 82711 Atoms
Charge group distribution at step 2000000: 5512 5533 5668 5694 5535 5626
Grid: 7 x 10 x 14 cells
Initial temperature: 309.243 K
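
As a rough check on how evenly the charge groups are spread over the six domains, a minimal Python sketch can parse the "Charge group distribution" line above. Note the assumption: this measures only the imbalance in charge-group counts (largest domain relative to the mean), which is not the same quantity as the force load imbalance ("imb F") that mdrun reports. The function name charge_group_imbalance is hypothetical.

```python
def charge_group_imbalance(log_line):
    """Return max/mean - 1 for the per-domain charge-group counts.

    Assumption: the counts are the whitespace-separated integers that
    follow the first colon on the line.
    """
    counts = [int(tok) for tok in log_line.split(":", 1)[1].split()]
    mean = sum(counts) / len(counts)
    return max(counts) / mean - 1.0


if __name__ == "__main__":
    line = ("Charge group distribution at step 2000000: "
            "5512 5533 5668 5694 5535 5626")
    excess = charge_group_imbalance(line)
    print(f"largest domain holds {100 * excess:.1f}% more charge groups than the mean")
```

For the line above the largest domain is under 2% above the mean, so the static distribution at this step looks fairly even.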

Using this setting I finally managed to equilibrate my system by going through a
series of restrained runs.

Surprisingly, after 6.5 ns of unrestrained simulation (step 3289500), the run
crashed with:
"Fatal error:
A charge group moved too far between two domain decomposition steps
This usually means that your system is not well equilibrated ".

It seems strange that it crashes only at step 3289500 of an unrestrained run.
I am now running a short simulation on a single processor, starting
shortly before the crash step, and, as suspected, it is going
smoothly. My guess is that something is going wrong with the domain
decomposition of such a non-homogeneous system, considering that there
are also charged lipids that complicate it, but I have no idea how to
solve or improve it.

I am using gromacs-4.0.5.

Any suggestions are welcome.

Irene Farabella
Wellcome Trust PhD student

