[gmx-developers] domain decomposition issue

Wed Aug 15 07:44:27 CEST 2012

On 15/08/2012 5:46 AM, francesco oteri wrote:
> Dear gromacs users and developers,
> I have a question related to domain decomposition:
>
> I have to tun multiple simulation and every step
> 1) In the sim X the state of simY (and simY need state from sim X)
> 2) getting the potential energy
> 3) continuing
>
>
>
> right now I am testing the point1. In particular I exchange the state 
> between simulation X and Y two times:
> the first time this permit at simulation X to get the state of Y ( and 
> vicecersa) while the second exchange
> restore the original situatation.

Seems you don't actually want to exchange states, but rather do a 
computation on a copy of the other state. I'd either
1) do exchange_state on X into a different t_state from the one with 
which X is simulating (so there is no need to exchange back, since doing 
dd_partition_system a second time requires neighbour searching twice and 
that will kill your scaling even harder)
2) get Y to do the computation on its coordinates, since swapping the 
result is probably much cheaper than collecting the state, communcating 
the state, doing NS and DD on the state and then computing on it. That 
might mean maintaining multiple t_forcerec or gmx_mtop_t, but at least 
those data structures are likely constant, so you only have to 
communicate them rarely(once?).

>
> I inserted the following code between lines
>
>    if ((repl_ex_nst > 0) && (step > 0) && !bLastStep &&
>             do_per_step(step,repl_ex_nst))
>         {
>
> and
>
> bExchanged = replica_exchange(fplog, cr, repl_ex, state_global, 
> enerd->term, state,step,t);
>
>
> //Performing the first exchange
>   if (DOMAINDECOMP(cr))
>         {
>  dd_collect_state(cr->dd,state,state_global);
>
>  if (MASTER(cr))
>         {
>           exchange_state(cr->ms, Y, state_global);
>        }
>   if (DOMAINDECOMP(cr))
>     {
>            dd_partition_system(fplog,step,cr,TRUE,1,
> state_global,top_global,ir,
> state,NULL,mdatoms,top,fr,
> vsite,shellfc,constr,
>  nrnb,wcycle,FALSE);
>     }
>
> //Now every node should have its part of the Y simulation
> //Getting potential energy
>
>
>
> //Performing the second exchange
> if (MASTER(cr))
>   {
>     exchange_state(cr->ms, Y, state_global); // I don't need to  call 
> because nothing changed state_global dd_collect_state
>   }
>
>   if (DOMAINDECOMP(cr))
>     {
>            dd_partition_system(fplog,step,cr,TRUE,1,
> state_global,top_global,ir,
> state,NULL,mdatoms,top,fr,
> vsite,shellfc,constr,
>  nrnb,wcycle,FALSE);
>     }
>
> //Now state Y is back to simulation Y
>
>
> The problem is that this simple code gives me problem, in particular 
> it gives LINCS problem
> in do_force the step after my code is executed.
>
> Since forcing bNS=TRUE solves the problem, I guess there is some issue 
> with neighbor list updating
> but I dont understand why.
>
> I observed that in, after the last dd_partition_system, syste->natoms 
> had an other value compared with the value
> it has at the before my code is executed.
>
> What is my error?

Particularly with dynamic load balancing, there is no reason that the DD 
for any replica should resemble the DD for any other replica. Each 
processor can have totally different atoms, and a different number of 
atoms, so blindly copying stuff into those data structures will lead to 
the kinds of problems you see. I'd still expect problems even if you 
disable dynamic load balancing. Hence my suggestions above.

The implementation of replica exchange in GROMACS scales poorly because 
exchanging coordinates requires subsequent NS and DD. So I'd encourage 
you to avoid that route if you can. Exchanging Hamiltonians is much 
cheaper. I have an implementation of that for T-REMD, but it won't see 
the light of day any time soon.

Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://maillist.sys.kth.se/pipermail/gromacs.org_gmx-developers/attachments/20120815/ebf080a4/attachment.html>