[gmx-developers] How to distribute charges over parallel nodes
Mark.Abraham at anu.edu.au
Fri May 6 08:32:03 CEST 2011
On 4/05/2011 7:57 PM, Igor Leontyev wrote:
> Thank you for prompt response.
>>> To make partial charges adjustable according to the acting field, I
>>> have introduced modifications to gromacs 4.0.7. The serial (single
>>> thread) version seems to be ready and I want to implement
>>> parallelization (with particle decomposition). In my current
>>> implementation:
>>> - values of mdatoms->chargeA for local atoms are updated in "do_md"
>>> at the beginning of each timestep;
>>> - 'MPI_Sendrecv' + 'gmx_wait' are used in "do_force" (right after
>>> the call to "move_cgcm") to distribute the new charges over the
>>> parallel nodes.
>>> After this the array mdatoms->chargeA has updated values on all
>>> nodes. But a problem then arises in "gmx_pme_do" (a routine I have
>>> not modified), hanging the execution and even the PC.
>> Standard procedure is to use a debugger to see which memory access
>> from where is problematic. I'm not aware of a free parallel debugger,
>> however. Bisecting with printf() calls can work...
> I debug parallel gromacs with GDB + DDD. The problem, however, appears
> irregularly somewhere in gmx_pme_do, so I cannot locate the
> problematic line precisely. More specifically, the program executes
> fine when stepping through in the debugger, but it may hang in a
> normal run.
That suggests a memory problem. Take your serial code and pass it
through tools like valgrind. Then try the parallel version, etc. Or use
a real memory debugger like MemoryScape from TotalView.
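For the serial build, a minimal sketch of the valgrind run suggested above (the binary name, input file, and mdrun options are assumptions, not confirmed for 4.0.7):

```shell
# Run the serial mdrun under valgrind's memcheck to catch invalid
# reads/writes and use of uninitialised values near the new charge code.
valgrind --leak-check=full --track-origins=yes ./mdrun -s topol.tpr
```

--track-origins=yes is slow but reports where an uninitialised value was created, which is often exactly the missing update in newly added code.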
>>> Is it possible that the source of the problem is the use of
>>> 'MPI_Sendrecv' + 'gmx_wait' in the wrong place in the code?
>> I doubt it.
>>> Many communications are performed in "gmx_pme_do", e.g. "pmeredist"
>>> calls 'MPI_Alltoallv' for charge and coordinate redistribution over
>>> the nodes.
>>> Is there a particular reason in the gromacs code why some
>>> communications are done with 'MPI_Sendrecv' but others with
>>> 'MPI_Alltoallv'? What is the right way (or the right MPI routine)
>>> to distribute the locally updated charges over all the nodes?
>> Various parts of the code date from times when different parts of the
>> MPI standard had implementations of varying quality,
>> and some parts are throwbacks (I gather) to the way very early
>> versions of GROMACS were designed to communicate on a parallel
>> machine with ring topology.
>> These days, we should use the collective communication calls rather
>> than introduce maintenance issues by re-implementing wheels.
> I am not very experienced with MPI. Could you be more specific about
> which routines are the modern ones?
Collective communication is not "modern", but modern implementations are
of high enough quality to suggest their use.
> As for examples, which gmx routines use these modern communication calls?
Not enough of them. :-) This should get cleaned up in the C++ switch.
Unfortunately, it's not enough just to survey which MPI functions are
called. The current replica-exchange code uses collective communication,
but does so in a way that is not very scalable. The last time I remember
looking, the data structures built from the .tpr on the master node were
passed around a ring, with a separate MPI call for each of many pieces
of the data structures. That should be done with a packed MPI data type
and a broadcast, but since it's only done once it's not a big deal...
>> I can't help with clues on how a PD simulation should distribute such
>> information, except that there must be a mapping somewhere of
>> simulation atom to MPI rank that distributed the data in mdatoms
>> shortly after it was constructed from the .tpr file.
> I believe that is taken care of in my modifications.