[gmx-developers] domain decomposition issue
Mark Abraham
Mark.Abraham at anu.edu.au
Wed Aug 15 07:44:27 CEST 2012
On 15/08/2012 5:46 AM, francesco oteri wrote:
> Dear gromacs users and developers,
> I have a question related to domain decomposition:
>
> I have to tun multiple simulation and every step
> 1) In the sim X the state of simY (and simY need state from sim X)
> 2) getting the potential energy
> 3) continuing
>
>
>
> right now I am testing the point1. In particular I exchange the state
> between simulation X and Y two times:
> the first time this permit at simulation X to get the state of Y ( and
> vicecersa) while the second exchange
> restore the original situatation.
Seems you don't actually want to exchange states, but rather do a
computation on a copy of the other state. I'd either
1) do exchange_state on X into a different t_state from the one with
which X is simulating (so there is no need to exchange back, since doing
dd_partition_system a second time requires neighbour searching twice and
that will kill your scaling even harder)
2) get Y to do the computation on its coordinates, since swapping the
result is probably much cheaper than collecting the state, communcating
the state, doing NS and DD on the state and then computing on it. That
might mean maintaining multiple t_forcerec or gmx_mtop_t, but at least
those data structures are likely constant, so you only have to
communicate them rarely(once?).
>
> I inserted the following code between lines
>
> if ((repl_ex_nst > 0) && (step > 0) && !bLastStep &&
> do_per_step(step,repl_ex_nst))
> {
>
> and
>
> bExchanged = replica_exchange(fplog, cr, repl_ex, state_global,
> enerd->term, state,step,t);
>
>
> //Performing the first exchange
> if (DOMAINDECOMP(cr))
> {
> dd_collect_state(cr->dd,state,state_global);
>
> if (MASTER(cr))
> {
> exchange_state(cr->ms, Y, state_global);
> }
> if (DOMAINDECOMP(cr))
> {
> dd_partition_system(fplog,step,cr,TRUE,1,
> state_global,top_global,ir,
> state,NULL,mdatoms,top,fr,
> vsite,shellfc,constr,
> nrnb,wcycle,FALSE);
> }
>
> //Now every node should have its part of the Y simulation
> //Getting potential energy
>
>
>
> //Performing the second exchange
> if (MASTER(cr))
> {
> exchange_state(cr->ms, Y, state_global); // I don't need to call
> because nothing changed state_global dd_collect_state
> }
>
> if (DOMAINDECOMP(cr))
> {
> dd_partition_system(fplog,step,cr,TRUE,1,
> state_global,top_global,ir,
> state,NULL,mdatoms,top,fr,
> vsite,shellfc,constr,
> nrnb,wcycle,FALSE);
> }
>
> //Now state Y is back to simulation Y
>
>
> The problem is that this simple code gives me problem, in particular
> it gives LINCS problem
> in do_force the step after my code is executed.
>
> Since forcing bNS=TRUE solves the problem, I guess there is some issue
> with neighbor list updating
> but I dont understand why.
>
> I observed that in, after the last dd_partition_system, syste->natoms
> had an other value compared with the value
> it has at the before my code is executed.
>
> What is my error?
Particularly with dynamic load balancing, there is no reason that the DD
for any replica should resemble the DD for any other replica. Each
processor can have totally different atoms, and a different number of
atoms, so blindly copying stuff into those data structures will lead to
the kinds of problems you see. I'd still expect problems even if you
disable dynamic load balancing. Hence my suggestions above.
The implementation of replica exchange in GROMACS scales poorly because
exchanging coordinates requires subsequent NS and DD. So I'd encourage
you to avoid that route if you can. Exchanging Hamiltonians is much
cheaper. I have an implementation of that for T-REMD, but it won't see
the light of day any time soon.
Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://maillist.sys.kth.se/pipermail/gromacs.org_gmx-developers/attachments/20120815/ebf080a4/attachment.html>
More information about the gromacs.org_gmx-developers
mailing list