[gmx-users] Simulation time losses with REMD

Mark Abraham Mark.Abraham at anu.edu.au
Fri Jan 28 06:46:08 CET 2011


Hi,

I compared the .log file time accounting for the same .tpr file run 
alone in serial or as part of an REMD simulation (with each replica on 
a single processor). It ran about 5-10% slower in the latter. The effect 
was a bit larger when comparing the same .tpr run alone on 8 processors 
with REMD using 8 processors per replica. The effect seems fairly 
independent of whether I compare the lowest or highest replica.

The system is 1ns of Ace-(Ala)_10-NME in CHARMM27 with GROMACS 4.5.3 
using NVT, PME, virtual sites, 4fs timesteps, rlist=rvdw=rcoulomb=1.0nm 
with REMD ranging over 20 replicas distributed exponentially from 298K 
to 431.57K using v-rescale T-coupling. The machine has two quad-core 
processors per node with an Infiniband connection. The Infiniband switch 
is shared with other users' calculations, so some load-based variability 
can and does occur, but this should have shown up in a named part of the 
time accounting.
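
For anyone wanting to reproduce the temperature ladder, here is a 
minimal sketch. It assumes "distributed exponentially" means a constant 
ratio between neighbouring temperatures, i.e. 
T_i = T_min * (T_max/T_min)^(i/(N-1)). The function name 
geometric_ladder is made up for illustration and is not part of GROMACS.

```python
def geometric_ladder(t_min, t_max, n_replicas):
    """Return n_replicas temperatures with a constant ratio between neighbours.

    Assumption: "distributed exponentially" means geometric spacing.
    """
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** i for i in range(n_replicas)]


# Endpoints and replica count as stated in the post.
ladder = geometric_ladder(298.0, 431.57, 20)

for i, temperature in enumerate(ladder):
    print(f"replica {i:2d}: {temperature:7.2f} K")
```

Each printed value would go into ref_t of the corresponding replica's 
.mdp file.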

My first thought was that REMD exchange latency was to blame, so I 
quickly hacked in a change to report the length of time spent in the 
REMD initialization routine, and then each call to the REMD 
exchange-attempt routine.

Comparing the performance between REMD and serial of the lowest replica 
on a single processor, I saw with diff (lines marked "<" are from the 
REMD run, ">" from the run alone):
    Computing:         Nodes     Number     G-Cycles    Seconds     %
7394,7403c6910,6918
<  Vsite constr.          1     250001       40.271       13.8     0.7
<  Neighbor search        1      25011      434.982      148.7     7.1
<  Force                  1     250001     3607.375     1232.8    59.1
<  PME mesh               1     250001     1270.407      434.1    20.8
<  Vsite spread           1     500002       41.671       14.2     0.7
<  Write traj.            1          3        7.873        2.7     0.1
<  Update                 1     250001       82.822       28.3     1.4
<  Constraints            1     250001      154.231       52.7     2.5
<  REMD                   1        100       59.070       20.2     1.0
<  Rest                   1                 409.862      140.1     6.7
---
 >  Vsite constr.          1     250001       40.526       13.8     0.7
 >  Neighbor search        1      25001      434.871      148.6     7.5
 >  Force                  1     250001     3601.463     1230.8    62.2
 >  PME mesh               1     250001     1292.675      441.8    22.3
 >  Vsite spread           1     500002       41.479       14.2     0.7
 >  Write traj.            1          3       17.153        5.9     0.3
 >  Update                 1     250001       82.114       28.1     1.4
 >  Constraints            1     250001      154.426       52.8     2.7
 >  Rest                   1                 122.023       41.7     2.1
7405c6920
<  Total                  1                6108.562     2087.5   100.0
---
 >  Total                  1                5786.731     1977.5   100.0

So "Rest" goes up from 122 to 410 G-cycles (41.7 s to 140.1 s) under 
REMD, even after factoring out the 59 G-cycles (20.2 s) actually spent 
in REMD. With the highest replica:

    Computing:         Nodes     Number     G-Cycles    Seconds     %
7394,7403c6910,6918
<  Vsite constr.          1     250001       40.261       13.8     0.7
<  Neighbor search        1      25016      434.878      148.6     7.1
<  Force                  1     250001     3606.913     1232.6    59.0
<  PME mesh               1     250001     1264.716      432.2    20.7
<  Vsite spread           1     500002       41.268       14.1     0.7
<  Write traj.            1          3        7.113        2.4     0.1
<  Update                 1     250001       82.491       28.2     1.4
<  Constraints            1     250001      153.207       52.4     2.5
<  REMD                   1        100       60.272       20.6     1.0
<  Rest                   1                 417.399      142.6     6.8
---
 >  Vsite constr.          1     250001       40.518       13.8     0.7
 >  Neighbor search        1      25001      435.069      148.7     7.6
 >  Force                  1     250001     3609.196     1233.4    62.6
 >  PME mesh               1     250001     1283.082      438.5    22.3
 >  Vsite spread           1     500002       41.825       14.3     0.7
 >  Write traj.            1          3       13.063        4.5     0.2
 >  Update                 1     250001       82.011       28.0     1.4
 >  Constraints            1     250001      154.350       52.7     2.7
 >  Rest                   1                 102.249       34.9     1.8
7405c6920
<  Total                  1                6108.520     2087.5   100.0
---
 >  Total                  1                5761.363     1968.8   100.0

Here 102 G-cycles become 417 G-cycles (34.9 s to 142.6 s) despite 
factoring out 60 G-cycles (20.6 s) for REMD. So the time spent doing 
the exchange is just noticeable, but quite a bit less than the observed 
increase in total time.

For the lowest replica in parallel:

8481,8496c7971,7985
<  Domain decomp.         8      25010      152.338       52.1     1.8
<  DD comm. load          8      24226        1.085        0.4     0.0
<  DD comm. bounds        8      24219        4.167        1.4     0.0
<  Vsite constr.          8     250001       62.857       21.5     0.8
<  Comm. coord.           8     250001      132.068       45.1     1.6
<  Neighbor search        8      25010      367.001      125.4     4.4
<  Force                  8     250001     3446.528     1177.8    41.2
<  Wait + Comm. F         8     250001      252.245       86.2     3.0
<  PME mesh               8     250001     2113.009      722.1    25.3
<  Vsite spread           8     500002      102.749       35.1     1.2
<  Write traj.            8          1        1.206        0.4     0.0
<  Update                 8     250001       85.793       29.3     1.0
<  Constraints            8     250001      464.294      158.7     5.5
<  Comm. energies         8     250002       73.343       25.1     0.9
<  REMD                   8        100      162.661       55.6     1.9
<  Rest                   8                 945.642      323.2    11.3
---
 >  Domain decomp.         8      25001      146.561       50.1     2.0
 >  DD comm. load          8      22943        0.989        0.3     0.0
 >  DD comm. bounds        8      22901        3.768        1.3     0.1
 >  Vsite constr.          8     250001       64.035       21.9     0.9
 >  Comm. coord.           8     250001      124.487       42.5     1.7
 >  Neighbor search        8      25001      367.342      125.5     5.0
 >  Force                  8     250001     3443.161     1176.7    46.9
 >  Wait + Comm. F         8     250001      237.697       81.2     3.2
 >  PME mesh               8     250001     2119.205      724.2    28.9
 >  Vsite spread           8     500002       95.092       32.5     1.3
 >  Write traj.            8          1        0.920        0.3     0.0
 >  Update                 8     250001       85.529       29.2     1.2
 >  Constraints            8     250001      391.469      133.8     5.3
 >  Comm. energies         8     250002      120.291       41.1     1.6
 >  Rest                   8                 139.127       47.5     1.9
8498c7987
<  Total                  8                8366.984     2859.3   100.0
---
 >  Total                  8                7339.674     2508.3   100.0

Again REMD exchanges are only a small fraction of the increase ("Rest" 
goes from 139 to 946 G-cycles, i.e. 47.5 s to 323.2 s, despite 163 
G-cycles, or 55.6 s, accounted for under REMD).
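
To make the arithmetic explicit, here is a small sketch that takes the 
G-cycle figures from the three diffs above and compares the growth in 
"Rest" with the cycles attributed to the REMD row. The dictionary keys 
and the summarize helper are names I made up for this illustration. 
Only the numbers come from the .log files.

```python
# G-cycle values copied from the "Rest", "REMD" and "Total" rows above.
runs = {
    "lowest replica, 1 processor": {
        "rest_remd": 409.862, "remd": 59.070, "rest_alone": 122.023,
        "total_remd": 6108.562, "total_alone": 5786.731,
    },
    "highest replica, 1 processor": {
        "rest_remd": 417.399, "remd": 60.272, "rest_alone": 102.249,
        "total_remd": 6108.520, "total_alone": 5761.363,
    },
    "lowest replica, 8 processors": {
        "rest_remd": 945.642, "remd": 162.661, "rest_alone": 139.127,
        "total_remd": 8366.984, "total_alone": 7339.674,
    },
}


def summarize(run):
    """Compare the REMD run with the same .tpr run alone (G-cycles)."""
    total_increase = run["total_remd"] - run["total_alone"]
    return {
        "total_increase": total_increase,
        "slowdown_pct": 100.0 * total_increase / run["total_alone"],
        "rest_increase": run["rest_remd"] - run["rest_alone"],
        "remd": run["remd"],
    }


summary = {label: summarize(run) for label, run in runs.items()}

for label, s in summary.items():
    print(f"{label}: total +{s['total_increase']:.1f} G-cycles "
          f"({s['slowdown_pct']:.1f}% slower), "
          f"Rest +{s['rest_increase']:.1f}, REMD row {s['remd']:.1f}")
```

In all three cases the increase in "Rest" is several times larger than 
the cycles logged in the REMD row.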

Does anyone have a theory on what could be causing this?

Mark



