[gmx-users] Simulation time losses with REMD
Mark Abraham
Mark.Abraham at anu.edu.au
Fri Jan 28 06:46:08 CET 2011
Hi,
I compared the .log file time accounting for the same .tpr file run alone in
serial and run as part of an REMD simulation (with each replica on a single
processor). It ran about 5-10% slower in the latter. The effect was a bit
larger when comparing the same .tpr on 8 processors against an REMD run with
8 processors per replica. The effect seems fairly independent of whether I
compare the lowest or highest replica.
The system is Ace-(Ala)_10-NME in CHARMM27, simulated for 1 ns with GROMACS
4.5.3 using NVT, PME, virtual sites, 4 fs timesteps, rlist=rvdw=rcoulomb=1.0 nm
and v-rescale T-coupling, with REMD over 20 replicas distributed exponentially
from 298 K to 431.57 K (the spacing is sketched below). The machine has two
quad-core processors per node connected by Infiniband. The Infiniband switch
is shared with other users' calculations, so some load-based variability can
and does occur, but that would show up in one of the named parts of the time
accounting.
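For reference, a minimal sketch of the exponential (geometric) spacing meant
here, assuming T_i = Tmin * (Tmax/Tmin)^(i/(N-1)); the endpoints and replica
count are from the run, the program and its names are only my illustration:

/* Geometric ("exponential") temperature ladder. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double tmin  = 298.0;   /* lowest replica temperature, K  */
    const double tmax  = 431.57;  /* highest replica temperature, K */
    const int    nrepl = 20;      /* number of replicas             */
    int          i;

    for (i = 0; i < nrepl; i++)
    {
        double t = tmin * pow(tmax / tmin, (double) i / (nrepl - 1));
        printf("replica %2d: %6.2f K\n", i, t);
    }
    return 0;
}

With these values each replica ends up about 2% hotter than the one below it.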
My first thought was that REMD exchange latency was to blame, so I
quickly hacked in a change to report the time spent in the REMD
initialization routine and in each call to the REMD exchange-attempt
routine.
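Roughly along these lines (a minimal sketch only, assuming MPI wall-clock
timing around a stand-in exchange routine; the names, the 2500-step attempt
interval that reproduces the 100 attempts in the log, and the reporting are
illustrative, not the actual patch, whose totals appear as the new "REMD" row
in the accounting below):

#include <stdio.h>
#include <mpi.h>

static double remd_seconds  = 0.0; /* total time spent in exchange attempts */
static int    remd_attempts = 0;   /* number of attempts timed              */

static void attempt_replica_exchange(void)
{
    /* stand-in for the real exchange-attempt routine */
}

static void timed_exchange_attempt(void)
{
    double t0 = MPI_Wtime();

    attempt_replica_exchange();

    remd_seconds += MPI_Wtime() - t0;
    remd_attempts++;
}

int main(int argc, char *argv[])
{
    int step;

    MPI_Init(&argc, &argv);
    for (step = 1; step <= 250000; step++)
    {
        if (step % 2500 == 0)     /* attempt an exchange every 2500 steps */
        {
            timed_exchange_attempt();
        }
    }
    printf("REMD: %d attempts, %.3f s total\n", remd_attempts, remd_seconds);
    MPI_Finalize();
    return 0;
}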
Comparing the REMD and serial performance of the lowest replica on a single
processor, diff of the two .log files shows (< is the REMD run, > is serial):
Computing: Nodes Number G-Cycles Seconds %
7394,7403c6910,6918
< Vsite constr. 1 250001 40.271 13.8 0.7
< Neighbor search 1 25011 434.982 148.7 7.1
< Force 1 250001 3607.375 1232.8 59.1
< PME mesh 1 250001 1270.407 434.1 20.8
< Vsite spread 1 500002 41.671 14.2 0.7
< Write traj. 1 3 7.873 2.7 0.1
< Update 1 250001 82.822 28.3 1.4
< Constraints 1 250001 154.231 52.7 2.5
< REMD 1 100 59.070 20.2 1.0
< Rest 1 409.862 140.1 6.7
---
> Vsite constr. 1 250001 40.526 13.8 0.7
> Neighbor search 1 25001 434.871 148.6 7.5
> Force 1 250001 3601.463 1230.8 62.2
> PME mesh 1 250001 1292.675 441.8 22.3
> Vsite spread 1 500002 41.479 14.2 0.7
> Write traj. 1 3 17.153 5.9 0.3
> Update 1 250001 82.114 28.1 1.4
> Constraints 1 250001 154.426 52.8 2.7
> Rest 1 122.023 41.7 2.1
7405c6920
< Total 1 6108.562 2087.5 100.0
---
> Total 1 5786.731 1977.5 100.0
So "Rest" goes up from 122 s to 409 s under REMD, even after factoring
out the 59 s actually spent in REMD. With the highest replica:
Computing: Nodes Number G-Cycles Seconds %
7394,7403c6910,6918
< Vsite constr. 1 250001 40.261 13.8 0.7
< Neighbor search 1 25016 434.878 148.6 7.1
< Force 1 250001 3606.913 1232.6 59.0
< PME mesh 1 250001 1264.716 432.2 20.7
< Vsite spread 1 500002 41.268 14.1 0.7
< Write traj. 1 3 7.113 2.4 0.1
< Update 1 250001 82.491 28.2 1.4
< Constraints 1 250001 153.207 52.4 2.5
< REMD 1 100 60.272 20.6 1.0
< Rest 1 417.399 142.6 6.8
---
> Vsite constr. 1 250001 40.518 13.8 0.7
> Neighbor search 1 25001 435.069 148.7 7.6
> Force 1 250001 3609.196 1233.4 62.6
> PME mesh 1 250001 1283.082 438.5 22.3
> Vsite spread 1 500002 41.825 14.3 0.7
> Write traj. 1 3 13.063 4.5 0.2
> Update 1 250001 82.011 28.0 1.4
> Constraints 1 250001 154.350 52.7 2.7
> Rest 1 102.249 34.9 1.8
7405c6920
< Total 1 6108.520 2087.5 100.0
---
> Total 1 5761.363 1968.8 100.0
Here "Rest" grows from 102 to 417 G-cycles (35 s to 143 s), even after
factoring out the 60 G-cycles (21 s) attributed to REMD. So the time spent
actually doing the exchanges is noticeable, but considerably smaller than the
observed increase in total time.
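For scale, a back-of-envelope on the lowest-replica single-processor numbers
above (my arithmetic, using nothing beyond the seconds column of the tables):

/* Extra "Rest" time in the REMD run, beyond what the new "REMD" row itself
 * accounts for, expressed per MD step and per exchange attempt. */
#include <stdio.h>

int main(void)
{
    const double rest_remd   = 140.1;    /* "Rest", REMD run, seconds      */
    const double rest_serial =  41.7;    /* "Rest", serial run, seconds    */
    const double remd_row    =  20.2;    /* "REMD" row in the REMD run, s  */
    const double nsteps      = 250001.0; /* MD steps                       */
    const double nattempts   = 100.0;    /* exchange attempts              */

    double extra = rest_remd - rest_serial;

    printf("unexplained extra: %.1f s = %.2f ms/step = %.2f s/attempt "
           "(vs. %.1f s in the REMD row)\n",
           extra, 1e3 * extra / nsteps, extra / nattempts, remd_row);
    return 0;
}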
For the lowest replica in parallel:
8481,8496c7971,7985
< Domain decomp. 8 25010 152.338 52.1 1.8
< DD comm. load 8 24226 1.085 0.4 0.0
< DD comm. bounds 8 24219 4.167 1.4 0.0
< Vsite constr. 8 250001 62.857 21.5 0.8
< Comm. coord. 8 250001 132.068 45.1 1.6
< Neighbor search 8 25010 367.001 125.4 4.4
< Force 8 250001 3446.528 1177.8 41.2
< Wait + Comm. F 8 250001 252.245 86.2 3.0
< PME mesh 8 250001 2113.009 722.1 25.3
< Vsite spread 8 500002 102.749 35.1 1.2
< Write traj. 8 1 1.206 0.4 0.0
< Update 8 250001 85.793 29.3 1.0
< Constraints 8 250001 464.294 158.7 5.5
< Comm. energies 8 250002 73.343 25.1 0.9
< REMD 8 100 162.661 55.6 1.9
< Rest 8 945.642 323.2 11.3
---
> Domain decomp. 8 25001 146.561 50.1 2.0
> DD comm. load 8 22943 0.989 0.3 0.0
> DD comm. bounds 8 22901 3.768 1.3 0.1
> Vsite constr. 8 250001 64.035 21.9 0.9
> Comm. coord. 8 250001 124.487 42.5 1.7
> Neighbor search 8 25001 367.342 125.5 5.0
> Force 8 250001 3443.161 1176.7 46.9
> Wait + Comm. F 8 250001 237.697 81.2 3.2
> PME mesh 8 250001 2119.205 724.2 28.9
> Vsite spread 8 500002 95.092 32.5 1.3
> Write traj. 8 1 0.920 0.3 0.0
> Update 8 250001 85.529 29.2 1.2
> Constraints 8 250001 391.469 133.8 5.3
> Comm. energies 8 250002 120.291 41.1 1.6
> Rest 8 139.127 47.5 1.9
8498c7987
< Total 8 8366.984 2859.3 100.0
---
> Total 8 7339.674 2508.3 100.0
Again the REMD exchanges account for only a small fraction of the increase:
"Rest" grows from 139 to 946 G-cycles (48 s to 323 s), while only 163 G-cycles
(56 s) are attributed to REMD.
Does anyone have a theory on what could be causing this?
Mark