[gmx-developers] Why is domain decomposition the default parallelization algorithm in version 4 and later?

Berk Hess hess at kth.se
Sat May 12 12:34:05 CEST 2012


Hi,

That comment was probably already outdated at the time of publication.
For most systems, except very small ones, the linear system size will be
significantly larger than the interaction range.
I also (re)invented what was later termed the eighth-shell domain decomposition
method, which has significantly less communication volume than the
commonly used half-shell method.

The electrostatic potential, or any other property for that matter, should
not depend on the way it is calculated. But you must have run different
simulations, one with PD and one with DD. As MD is chaotic, results will
always differ, unless you average over a trajectory of a completely
converged ensemble.
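
To illustrate the chaos point with a toy example (again, not GROMACS code;
the cluster, time step and perturbation size are arbitrary choices): two
velocity Verlet runs of a tiny 2D Lennard-Jones cluster, differing only by a
round-off-sized perturbation in one coordinate, typically drift apart by many
orders of magnitude within a few thousand steps.

import numpy as np

def lj_forces(x, eps=1.0, sigma=1.0):
    # Plain O(N^2) Lennard-Jones forces for positions x of shape (n, 2).
    f = np.zeros_like(x)
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            rij = x[i] - x[j]
            r2 = np.dot(rij, rij)
            sr6 = (sigma * sigma / r2) ** 3
            fmag = 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r2
            f[i] += fmag * rij
            f[j] -= fmag * rij
    return f

def run(x0, v0, dt=0.005, nsteps=4000):
    # Velocity Verlet integration; returns positions at every step.
    x, v = x0.copy(), v0.copy()
    f = lj_forces(x)
    traj = [x.copy()]
    for _ in range(nsteps):
        v += 0.5 * dt * f
        x += dt * v
        f = lj_forces(x)
        v += 0.5 * dt * f
        traj.append(x.copy())
    return np.array(traj)

# Four particles near a square minimum, with zero total momentum.
x0 = np.array([[0.0, 0.0], [1.12, 0.0], [0.0, 1.12], [1.12, 1.12]])
v0 = np.array([[0.4, 0.2], [-0.3, 0.4], [0.2, -0.4], [-0.3, -0.2]])
a = run(x0, v0)
x0b = x0.copy()
x0b[0, 0] += 1e-8  # perturbation on the order of force-summation round-off
b = run(x0b, v0)
for step in (0, 1000, 2000, 3000, 4000):
    rms = np.sqrt(np.mean((a[step] - b[step]) ** 2))
    print("step %5d  RMS separation %.3e" % (step, rms))

The same mechanism applies to PD versus DD: the forces are summed in a
different order, so the trajectories become effectively independent after a
short time, and only properly converged ensemble averages should agree.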

Cheers,

Berk

On 05/11/2012 10:03 PM, Andrew DeYoung wrote:
> Hi,
>
> I hope that this is an appropriate topic for this list.  If it is not,
> please let me know and I will be happy to move it.
>
> I think that in versions prior to 4, particle decomposition was the only
> parallelization method available.  According to the 2005 J. Comput. Chem.
> paper (http://onlinelibrary.wiley.com/doi/10.1002/jcc.20291/abstract) by Dr.
> van der Spoel et al.:
>
> "An early design decision was the choice to work with particle decomposition
> rather than domain decomposition to distribute work over the processors.
> ...  Domain decomposition is a better choice only when linear system size
> considerably exceeds the range of interaction, which is seldom the case in
> molecular dynamics.  With particle decomposition ... every processor keeps
> in its local memory the complete coordinate set of the system rather than
> restricting storage to the coordinates it needs. This is simpler and saves
> communication overhead, while the memory claim is usually not a limiting
> factor at all, even for millions of particles.  ...  Communication is
> essentially restricted to sending coordinates and forces once per time step
> around the processor ring.  These choices have proven to be robust over time
> and easily applicable to modern processor clusters."  [page 1702]
>
> But in version 4, domain decomposition was implemented and is now the
> default parallelization algorithm in mdrun.  Why is this the case?  From
> reading the 2008 paper by Hess et al. in JCTC
> (http://pubs.acs.org/doi/abs/10.1021/ct700301q), it seems that domain
> decomposition can be and is better performing than particle decomposition if
> implemented cleverly, despite domain decomposition's higher communication
> overhead:
>
> "GROMACS was in fact set up to run in parallel on 10Mbit ethernet from the
> start in 1992 but used a particle/force decomposition that did not scale
> well.  The single-instruction-multiple-data kernels we introduced in 2000
> made the relative scaling even worse (although absolute performance improved
> significantly), since the fraction of remaining time spent on communication
> increased.  A related problem was load imbalance; with particle
> decomposition one can frequently avoid imbalance by distributing different
> types of molecules uniformly over the processors.  Domain decomposition, on
> the other hand, requires automatic load balancing to avoid deterioration of
> performance."  [page 436]
>
> "Previous GROMACS versions used a ring communication topology, where half of
> the coordinates/forces were sent over half the ring. To be frank, the only
> thing to be said in favor of that is that it was simple."  [page 441]
>
> Unfortunately, I am not very well-versed in parallelization algorithms and
> high-performance computing in general.  Can you please tell me in 1-2
> sentences why domain decomposition is now the default parallelization
> method?
>
> (In recent simulations I have run, I have seen some seemingly significant
> differences in the electric potential (calculated using g_potential) when I
> use particle decomposition versus when I use domain decomposition.  Do you
> know if this has been observed?  I do not see any discussion of the
> differences in results between the two algorithms.  I am convinced that I
> must be making a mistake (it seems unlikely that I, of all people, would find
> a bug), but I have not yet found my mistake.)
>
> Thanks for your time!
>
> Andrew DeYoung
> Carnegie Mellon University
>



