[gmx-users] Exceeding of Maximum allowed number of DD cells

Mon Jan 11 01:38:24 CET 2010

Chao Zhang wrote:
> Dear GMX-Users,
> 
> I'm testing my fully hydrated 256-lipid system on Blue Gene. The purpose is to find the right value for "-npme", as mdrun cannot estimate it successfully by itself.
> 
> My problem is how to match the maximum allowed number of DD cells to a large number of CPU cores.
> 
> My simulation box is about 8x8x9 nm^3, with normal LINCS parameters and -dds 0.8. The log file says that the maximum allowed number of DD cells is 8x8x9.

Why are you setting -dds? -rcon and -rdd are also variables to play 
with... but if you get too "close to the bone" you can run into (e.g.) 
LINCS problems. A lipid-water system has inhomogeneous interaction 
density, so the DD load-balance needs to scale the starting guess for 
cells, and -dds will have a significant effect at high parallelization. 
See mdrun -h, and the manual and GROMACS 4 paper.
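
As a rough illustration of how these options are passed, the following
Python sketch assembles an mdrun argument list. The flags -npme, -dds,
-rdd and -rcon are the mdrun options named in this thread; the helper
name build_mdrun_args, the file prefix "lipid256" and the numeric values
are placeholders for illustration only, not recommendations. Note that
mdrun takes each value as a separate argument ("-npme 128").

```python
def build_mdrun_args(deffnm, npme=None, dds=None, rdd=None, rcon=None):
    """Return an mdrun argument list.

    Options left as None are omitted, so mdrun uses its own defaults.
    """
    args = ["mdrun", "-deffnm", deffnm]
    for flag, value in (("-npme", npme), ("-dds", dds),
                        ("-rdd", rdd), ("-rcon", rcon)):
        if value is not None:
            args += [flag, str(value)]
    return args


# Hypothetical values: fix the PME rank count and try a constraint
# communication distance, leaving -dds and -rdd at their defaults.
print(" ".join(build_mdrun_args("lipid256", npme=128, rcon=0.8)))
```

Leaving -dds unset unless there is a specific reason to change it
matches the question asked above.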

> As far as I understand, DD assigns one core to one cell, so the maximum number of cores I can use for the PP part in this case is 8x8x9=576.
> 
> I then ran with 512 cores and -npme 128. My system ran without problems.
> 
> What if I want to use more cores?
> 
> I then tried increasing "-dds" from 0.8 to 0.9, which raised "the maximum allowed number of DD cells" to 8x10x10.
> 
> This time I used 1024 cores in total and set -npme 224, so the PP part has 800 cores, which is within 10x10x10.
> 
> The system ran initially but crashed very soon with the warning "DD cell 2 1 4 could only obtain 56 of the 57 atoms that are connected via constraints from the neighboring cells ...."
> 
> So the dilemma is that if I increase "-dds", I can satisfy the maximum allowed number of DD cells, but I then exceed the maximum length of constraints in LINCS.
> 
> Does this mean that for a relatively small system, it is not possible to use up to thousands of cores with domain decomposition?

Yes. Each cell takes responsibility for a subset of atoms, and then 
communicates them to the neighbouring cells that need to know about 
them. As the cells get smaller, the communication cost gets larger. 
GROMACS sets a 
number of semi-artificial constraints on the cell size with the above 
options. There is a lower limit on DD cell size in practice for a given 
system with the GROMACS 4 implementation, but you have to experiment to 
find it. Whether you derive any speed advantage from moving towards that 
limit will depend on the relative performance of your processors and 
network.
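
The counting side of this limit can be sketched in a few lines of
Python. This is only a simplified model using the numbers from the
question above: it checks whether the PP rank count (total cores minus
-npme) can be arranged as an nx x ny x nz grid that stays within the
per-dimension cell limit reported in the log file. The function names
are made up for this example, and it is not how mdrun actually chooses
its grid; in particular it says nothing about whether the constraints
(LINCS) still fit inside the resulting cells.

```python
from itertools import product


def fitting_grids(n_pp, max_cells):
    """List (nx, ny, nz) grids with nx*ny*nz == n_pp where every
    dimension stays within the per-dimension cell limit."""
    mx, my, mz = max_cells
    grids = []
    for nx, ny in product(range(1, mx + 1), range(1, my + 1)):
        if n_pp % (nx * ny) == 0:
            nz = n_pp // (nx * ny)
            if nz <= mz:
                grids.append((nx, ny, nz))
    return grids


def check(total_cores, npme, max_cells):
    """Return the PP rank count and the grids that can hold it."""
    n_pp = total_cores - npme
    return n_pp, fitting_grids(n_pp, max_cells)


# 512 cores with 128 PME ranks, cell limit 8x8x9
n_pp, grids = check(512, 128, (8, 8, 9))
print(n_pp, grids)

# 1024 cores with 224 PME ranks, same 8x8x9 limit
n_pp, grids = check(1024, 224, (8, 8, 9))
print(n_pp, grids)
```

In this model the 512-core run leaves 384 PP ranks, which fit (for
example as 8x8x6), while the 1024-core run leaves 800 PP ranks, more
than 8x8x9=576 cells, so no grid fits without relaxing the limit.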

IBM's Blue Matter MD code is supposed to work down at around 1 atom per 
core, but GROMACS isn't built to do that.

> I know that it makes more sense to use thousands of cores for a huge system, but if my purpose is simply to speed up the simulation, what should I do?

If your objective is increasing effective sampling, REMD with 16 
replicas of 64 cores each (or similar) makes a lot of sense.
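
The arithmetic behind that suggestion, as a small Python sketch. It
assumes the same 1024-core allocation mentioned in the question; the
helper name split_into_replicas is made up for this example.

```python
def split_into_replicas(total_cores, n_replicas):
    """Return the cores per replica, requiring an even split."""
    if total_cores % n_replicas != 0:
        raise ValueError("cores do not divide evenly among replicas")
    return total_cores // n_replicas


per_replica = split_into_replicas(1024, 16)
print(per_replica)

# Even if every rank of a replica did PP work, the count stays far
# below the 8x8x9 = 576 cell limit reported for this system.
print(per_replica <= 8 * 8 * 9)
```

Each replica then runs at a core count well inside the range where the
512-core run above already worked.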

Mark