[gmx-users] Simulation time losses with REMD

Mark Abraham Mark.Abraham at anu.edu.au
Sun Jan 30 02:31:54 CET 2011

On 30/01/2011 10:26 AM, martyn.winn at stfc.ac.uk wrote:
>> -----Original Message-----
>> From: gmx-users-bounces at gromacs.org [mailto:gmx-users-
>> bounces at gromacs.org] On Behalf Of Mark Abraham
>> Sent: 29 January 2011 08:24
>> To: Discussion list for GROMACS users
>> Subject: Re: [gmx-users] Simulation time losses with REMD
>> On 28/01/2011 4:46 PM, Mark Abraham wrote:
>>> Hi,
>>> I compared the .log file time accounting for same .tpr file run alone
>>> in serial or as part of an REMD simulation (with each replica on a
>>> single processor). It ran about 5-10% slower in the latter. The effect
>>> was a bit larger when comparing the same .tpr on 8 processors with
>>> REMD with 8 processors per replica. The effect seems fairly
>>> independent of whether I compare the lowest or highest replica.
>> OK, I found the issue by binary-searching the code for the offending
>> line. It's in compute_globals() in src/kernel/md.c: the call to
>> gmx_sum_sim consumes all the extra time. This code handles
>> synchronizing the signals between simulations, e.g. for checkpointing.
>>                   if (MULTISIM(cr) && bInterSimGS)
>>                   {
>>                       if (MASTER(cr))
>>                       {
>>                           /* Communicate the signals between the simulations */
>>                           gmx_sum_sim(eglsNR,gs_buf,cr->ms);
>>                       }
>>                       /* Communicate the signals from the master to the others */
>>                       gmx_bcast(eglsNR*sizeof(gs_buf[0]),gs_buf,cr);
>>                   }
>> This eventually calls
>> void gmx_sumf_comm(int nr,float r[],MPI_Comm mpi_comm)
>> {
>> #if defined(MPI_IN_PLACE_EXISTS) || defined(GMX_THREADS)
>>       MPI_Allreduce(MPI_IN_PLACE,r,nr,MPI_FLOAT,MPI_SUM,mpi_comm);
>> #else
>>       /* this function is only used in code that is not performance
>>          critical (during setup, when comm_rec is not the appropriate
>>          communication structure), so this isn't as bad as it looks. */
>>       float *buf;
>>       int i;
>>       snew(buf, nr);
>>       MPI_Allreduce(r,buf,nr,MPI_FLOAT,MPI_SUM,mpi_comm);
>>       for(i=0; i<nr; i++)
>>           r[i] = buf[i];
>>       sfree(buf);
>> #endif
>> }
>> Clearly the comment is out of date. My nstlist=5, repl_ex_nst=2500 and
>> nstcalcenergy=-1, which yields gs.nstms=5, so bInterSimGS is true every
>> 5 steps. I'm not sure whether the problem lies with nstlist, with the
>> multi-simulation checkpointing engineering, or elsewhere.
>> Mark
> So are you saying that this code itself is slow (and called frequently), or that this is showing the latency of synchronising the replicas? If the latter, then presumably if you comment this out (or adjust nstlist, or whatever), it will just defer the latency to the REMD call itself?
> (I'll check my own example in due course, but our systems happen to be down this weekend.)

I've already controlled for the REMD cost and latency. The question is 
what is causing the extra delay.

I've worked out what the issue is, and I'll move this thread to a 
Redmine issue - http://redmine.gromacs.org/issues/691

