[gmx-users] Simulation time losses with REMD

Mark Abraham Mark.Abraham at anu.edu.au
Sat Jan 29 09:23:30 CET 2011

On 28/01/2011 4:46 PM, Mark Abraham wrote:
> Hi,
> I compared the .log file time accounting for the same .tpr file run 
> alone in serial or as part of an REMD simulation (with each replica on 
> a single processor). It ran about 5-10% slower in the latter. The 
> effect was a bit larger when comparing the same .tpr on 8 processors 
> with REMD with 8 processors per replica. The effect seems fairly 
> independent of whether I compare the lowest or highest replica.

OK, I found the issue by binary-searching the code for the offending 
line. It's in compute_globals() in src/kernel/md.c. The call to 
gmx_sum_sim consumes all the extra time. This code takes care of 
synchronization between simulations, possibly for checkpointing.

                 if (MULTISIM(cr) && bInterSimGS)
                     if (MASTER(cr))
                         /* Communicate the signals between the simulations */
                     /* Communicate the signals from the master to the others */

This eventually calls

void gmx_sumf_comm(int nr,float r[],MPI_Comm mpi_comm)
#if defined(MPI_IN_PLACE_EXISTS) || defined(GMX_THREADS)
     /* this function is only used in code that is not performance critical
        (during setup, when comm_rec is not the appropriate communication
        structure), so this isn't as bad as it looks. */
     float *buf;
     int i;

     snew(buf, nr);
     for(i=0; i<nr; i++)
         r[i] = buf[i];

Clearly the comment is out of date, since this function is now reached 
from inside the MD loop. I have nstlist=5, repl_ex_nst=2500 and 
nstcalcenergy=-1, which sets gs.nstms=5, so bInterSimGS is TRUE every 
5 steps. I'm not sure whether the problem is with nstlist, or the 
multi-simulation checkpointing engineering, or what.
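
To put a number on that, here is a minimal C sketch that counts how many 
inter-simulation syncs fall between two replica-exchange attempts. The 
helper names are made up for illustration and are not GROMACS functions, 
and it assumes the sync fires on every step that is a multiple of 
gs.nstms, which is my reading of the behaviour rather than something 
checked against the code.

```c
#include <assert.h>

/* Hypothetical helper, not part of GROMACS: returns 1 if an
   inter-simulation signal sync would happen at this step, ASSUMING it
   fires on every step that is a multiple of nstms. */
int is_intersim_sync_step(long step, int nstms)
{
    assert(nstms > 0);
    return step % nstms == 0;
}

/* Hypothetical helper: count the sync steps in the half-open window
   [first_step, first_step + nsteps). */
long count_intersim_syncs(long first_step, long nsteps, int nstms)
{
    long step;
    long count = 0;

    for (step = first_step; step < first_step + nsteps; step++)
    {
        if (is_intersim_sync_step(step, nstms))
        {
            count++;
        }
    }
    return count;
}
```

With the settings above, count_intersim_syncs(0, 2500, 5) gives 500, so 
under this assumption there would be 500 blocking reductions across all 
replicas for every single exchange attempt.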

