[gmx-users] Simulation time losses with REMD

Sun Jan 30 00:26:04 CET 2011

> -----Original Message-----
> From: gmx-users-bounces at gromacs.org [mailto:gmx-users-bounces at gromacs.org]
> On Behalf Of Mark Abraham
> Sent: 29 January 2011 08:24
> To: Discussion list for GROMACS users
> Subject: Re: [gmx-users] Simulation time losses with REMD
> 
> On 28/01/2011 4:46 PM, Mark Abraham wrote:
> > Hi,
> >
> > I compared the .log file time accounting for the same .tpr file run alone
> > in serial or as part of an REMD simulation (with each replica on a
> > single processor). It ran about 5-10% slower in the latter. The effect
> > was a bit larger when comparing the same .tpr on 8 processors with
> > REMD with 8 processors per replica. The effect seems fairly
> > independent of whether I compare the lowest or highest replica.
> 
> OK I found the issue by binary-searching the code looking for the
> offending line. It's in compute_globals() in src/kernel/md.c. The call
> to gmx_sum_sim consumes all the extra time. This code is taking care of
> synchronization for possibly doing checkpointing.
> 
>                  if (MULTISIM(cr) && bInterSimGS)
>                  {
>                      if (MASTER(cr))
>                      {
>                          /* Communicate the signals between the simulations */
>                          gmx_sum_sim(eglsNR,gs_buf,cr->ms);
>                      }
>                      /* Communicate the signals from the master to the others */
>                      gmx_bcast(eglsNR*sizeof(gs_buf[0]),gs_buf,cr);
>                  }
> 
> This eventually calls
> 
> void gmx_sumf_comm(int nr,float r[],MPI_Comm mpi_comm)
> {
> #if defined(MPI_IN_PLACE_EXISTS) || defined(GMX_THREADS)
>      MPI_Allreduce(MPI_IN_PLACE,r,nr,MPI_FLOAT,MPI_SUM,mpi_comm);
> #else
>      /* this function is only used in code that is not performance
>         critical, (during setup, when comm_rec is not the appropriate
>         communication structure), so this isn't as bad as it looks. */
>      float *buf;
>      int i;
> 
>      snew(buf, nr);
>      MPI_Allreduce(r,buf,nr,MPI_FLOAT,MPI_SUM,mpi_comm);
>      for(i=0; i<nr; i++)
>          r[i] = buf[i];
>      sfree(buf);
> #endif
> }
> 
> Clearly the comment is out of date. My nstlist=5, repl_ex_nst=2500 and
> nstcalcenergy=-1, so that triggers gs.nstms=5 and so bInterSimGS is TRUE
> every 5 steps. I'm not sure whether the problem is with nstlist, or the
> multi-simulation checkpointing engineering, or what.
> 
> Mark

So are you saying that this code itself is slow (and called frequently), or that it is exposing the latency of synchronising the replicas? If the latter, then presumably commenting this out (or adjusting nstlist or whatever) will just defer the latency to the REMD exchange call itself?
(I'll check my own example in due course, but our systems happen to be down this weekend.)
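
For a sense of scale, here is a rough sketch of how often the inter-simulation reduction would fire relative to the exchange attempts with the settings quoted above. It assumes the signal exchange happens on every step that is a multiple of gs.nstms, which is my reading of the description rather than something taken from md.c. Both function names are hypothetical helpers, not GROMACS functions.

```c
#include <assert.h>

/* Number of steps in 1..nsteps that are multiples of interval.
 * Hypothetical helper for illustration; not part of GROMACS. */
long count_signal_steps(long nsteps, long interval)
{
    assert(nsteps >= 0 && interval > 0);
    return nsteps / interval;
}

/* Inter-simulation reductions per replica-exchange attempt.
 * Assumption (not read from md.c): one reduction every nstms steps,
 * and one exchange attempt every repl_ex_nst steps. */
long reductions_per_exchange(long repl_ex_nst, long nstms)
{
    return count_signal_steps(repl_ex_nst, nstms);
}
```

With repl_ex_nst=2500 and gs.nstms=5, reductions_per_exchange(2500, 5) gives 500, so under that assumption the replicas would be forced into step 500 times for every exchange attempt.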

Martyn
