[gmx-developers] How to distribute charges over parallel nodes
mark.abraham at anu.edu.au
Wed May 4 10:46:40 CEST 2011
On 04/05/11, Igor Leontyev <ileontyev at ucdavis.edu> wrote:
> To make partial charges adjust to the acting field, I have introduced modifications to GROMACS 4.0.7. The serial (single-thread) version seems to be ready, and I want to implement parallelization (with particle decomposition). In my current implementation:
> - values of mdatoms->chargeA for local atoms are updated in "do_md" at the beginning of each timestep;
> - 'MPI_Sendrecv' + 'gmx_wait' are used in "do_force" (right after the call "move_cgcm") to distribute the new charges over parallel nodes.
> After this, the array mdatoms->chargeA has updated values on all nodes. But a problem arises later in "gmx_pme_do" (a routine I did not modify), which hangs the run and even the PC.
Standard procedure is to use a debugger to see which memory access from where is problematic. I'm not aware of a free parallel debugger, however. Bisecting with printf() calls can work...
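
A minimal sketch of the printf() bisection idea, assuming the rank is whatever MPI_Comm_rank returns: tag every checkpoint with rank, file and line, and flush immediately so the last line from each rank survives a hang. The names format_trace, trace_point and TRACE are made up for this illustration and are not part of GROMACS.

```c
#include <stdio.h>

/* Hypothetical helper: format "[rank R] file:line msg" into buf.
 * Returns the value returned by snprintf. */
int format_trace(char *buf, size_t n, int rank,
                 const char *file, int line, const char *msg)
{
    return snprintf(buf, n, "[rank %d] %s:%d %s", rank, file, line, msg);
}

/* Hypothetical helper: print one checkpoint to stderr and flush at once,
 * so the message is not lost in a buffer if the process hangs next. */
void trace_point(int rank, const char *file, int line, const char *msg)
{
    char buf[256];
    format_trace(buf, sizeof buf, rank, file, line, msg);
    fprintf(stderr, "%s\n", buf);
    fflush(stderr);
}

/* Convenience macro that records the call site automatically. */
#define TRACE(rank, msg) trace_point((rank), __FILE__, __LINE__, (msg))
```

Putting TRACE(rank, "before pmeredist") and TRACE(rank, "after pmeredist") around a suspect call, then halving the suspect region on each run, should show which call each rank entered last.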
> Is it possible that the source of the problem is the use of ('MPI_Sendrecv' + 'gmx_wait') in the wrong place in the code?
I doubt it.
> Many communications are performed in "gmx_pme_do", e.g. "pmeredist" calls 'MPI_Alltoallv' for charge and coordinate redistribution over the nodes.
> Is there a particular reason in the gromacs code why some communications are done by 'MPI_Sendrecv' but others by 'MPI_Alltoallv'? What is the right way (or right MPI routine) to distribute the locally updated charges over all nodes?
Various parts of the code date from times when different parts of the MPI standard had implementations of varying quality,
and some parts are throwbacks (I gather) to the way very early versions of GROMACS were designed to communicate on a parallel machine with ring topology.
These days, we should use the collective communication calls rather than create maintenance issues by reinventing the wheel.
I can't help with clues on how a PD simulation should distribute such information, except that there must be a mapping somewhere of simulation atom to MPI rank that distributed the data in mdatoms shortly after it was constructed from the .tpr file.
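
As a rough sketch of what a collective-based version could look like: if each rank owned one contiguous block of atoms (an assumption for illustration only, I have not checked how PD actually lays atoms out), a single MPI_Allgatherv would give every rank the full charge array. The helper names block_layout and emulate_allgatherv are hypothetical, charges are taken to be float here (GROMACS uses its own real type), and the MPI call appears only in a comment so that the block compiles without an MPI installation.

```c
#include <assert.h>

/* Fill counts[] and displs[] for an MPI_Allgatherv over natoms values,
 * ASSUMING rank r owns one contiguous block of atoms and the blocks are
 * as equal in size as possible (hypothetical layout, not from GROMACS). */
void block_layout(int natoms, int nranks, int *counts, int *displs)
{
    assert(natoms >= 0 && nranks > 0);
    int base   = natoms / nranks;   /* atoms every rank gets          */
    int extra  = natoms % nranks;   /* first 'extra' ranks get one more */
    int offset = 0;
    for (int r = 0; r < nranks; r++) {
        counts[r] = base + (r < extra ? 1 : 0);
        displs[r] = offset;
        offset   += counts[r];
    }
}

/* Serial stand-in for the contents of every rank's receive buffer after
 *
 *   MPI_Allgatherv(local, counts[rank], MPI_FLOAT,
 *                  all, counts, displs, MPI_FLOAT, comm);
 *
 * where locals[r] points at the counts[r] charges owned by rank r. */
void emulate_allgatherv(const float *const *locals, const int *counts,
                        const int *displs, int nranks, float *all)
{
    for (int r = 0; r < nranks; r++) {
        for (int i = 0; i < counts[r]; i++) {
            all[displs[r] + i] = locals[r][i];
        }
    }
}
```

With the real call, every rank must pass the same counts and displs, and all ranks must reach the call; a rank that skips it leaves the others blocked.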