[gmx-users] >60% slowdown with GPU / verlet and sd integrator

Szilárd Páll szilard.pall at cbr.su.se
Tue Jan 15 19:35:22 CET 2013


Hi Floris,

Great feedback, this needs to be looked into. Could you please file a bug
report, preferably with a tpr (and/or all inputs) as well as log files?

Thanks,

--
Szilárd


On Tue, Jan 15, 2013 at 3:50 AM, Floris Buelens <floris_buelens at yahoo.com> wrote:

> Hi,
>
>
> I'm seeing MD simulations run a lot slower with the sd integrator than
> with md: ca. 10 vs. 30 ns/day for my 47000-atom system. I found no
> documentation indicating that this should be the case.
> Timings and logs are pasted in below. Wall time seems to accumulate in
> Update and Rest, which together add up to >60% of the total. The effect is
> still there without the GPU: ca. 40% slowdown when switching from group to
> Verlet with the SD integrator.
> System: Xeon E5-1620, 1x GTX 680, gromacs
> 4.6-beta3-dev-20130107-e66851a-unknown, GCC 4.4.6 and 4.7.0
>
> I haven't filed a bug report yet, as I don't have much variety of testing
> conditions available right now. I hope someone else has a moment to try to
> reproduce this.
>
> Timings:
>
> cpu (ns/day)
> sd / verlet: 6
> sd / group: 10
> md / verlet: 9.2
> md / group: 11.4
>
> gpu (ns/day)
> sd / verlet: 11
> md / verlet: 29.8
>
>
>
> **************MD integrator, GPU / verlet
>
> M E G A - F L O P S   A C C O U N T I N G
>
>  NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
>  RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
>  W3=SPC/TIP3p  W4=TIP4p (single or pairs)
>  V&F=Potential and force  V=Potential only  F=Force only
>
>  Computing:                               M-Number         M-Flops  % Flops
>
> -----------------------------------------------------------------------------
>  Pair Search distance check            1244.988096       11204.893     0.1
>  NxN QSTab Elec. + VdW [F]           194846.615488     7988711.235    91.9
>  NxN QSTab Elec. + VdW [V&F]           2009.923008      118585.457     1.4
>  1,4 nonbonded interactions              31.616322        2845.469     0.0
>  Calc Weights                           703.010574       25308.381     0.3
>  Spread Q Bspline                     14997.558912       29995.118     0.3
>  Gather F Bspline                     14997.558912       89985.353     1.0
>  3D-FFT                               47658.567884      381268.543     4.4
>  Solve PME                               20.580896        1317.177     0.0
>  Shift-X                                  9.418458          56.511     0.0
>  Angles                                  21.879375        3675.735     0.0
>  Propers                                 48.599718       11129.335     0.1
>  Virial                                  23.498403         422.971     0.0
>  Stop-CM                                  2.436616          24.366     0.0
>  Calc-Ekin                               93.809716        2532.862     0.0
>  Lincs                                   12.147284         728.837     0.0
>  Lincs-Mat                              131.328750         525.315     0.0
>  Constraint-V                           246.633614        1973.069     0.0
>  Constraint-Vir                          23.486379         563.673     0.0
>  Settle                                  74.129451       23943.813     0.3
>
> -----------------------------------------------------------------------------
>  Total                                                 8694798.114   100.0
>
> -----------------------------------------------------------------------------
>
>
>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
>  Computing:         Nodes   Th.     Count  Wall t (s)     G-Cycles       %
>
> -----------------------------------------------------------------------------
>  Neighbor search        1    8        201       0.944       27.206     3.3
>  Launch GPU ops.        1    8       5001       0.371       10.690     1.3
>  Force                  1    8       5001       2.185       62.987     7.7
>  PME mesh               1    8       5001      15.033      433.441    52.9
>  Wait GPU local         1    8       5001       1.551       44.719     5.5
>  NB X/F buffer ops.     1    8       9801       0.538       15.499     1.9
>  Write traj.            1    8          2       0.725       20.912     2.6
>  Update                 1    8       5001       2.318       66.826     8.2
>  Constraints            1    8       5001       2.898       83.551    10.2
>  Rest                   1                       1.832       52.828     6.5
>
> -----------------------------------------------------------------------------
>  Total                  1                      28.394      818.659   100.0
>
> -----------------------------------------------------------------------------
>
> -----------------------------------------------------------------------------
>  PME spread/gather      1    8      10002       8.745      252.144    30.8
>  PME 3D-FFT             1    8      10002       5.392      155.458    19.0
>  PME solve              1    8       5001       0.869       25.069     3.1
>
> -----------------------------------------------------------------------------
>
>  GPU timings
>
> -----------------------------------------------------------------------------
>  Computing:                         Count  Wall t (s)      ms/step       %
>
> -----------------------------------------------------------------------------
>  Pair list H2D                        201       0.080        0.397     0.4
>  X / q H2D                           5001       0.698        0.140     3.7
>  Nonbonded F kernel                  4400      14.856        3.376    79.1
>  Nonbonded F+ene k.                   400       1.667        4.167     8.9
>  Nonbonded F+prune k.                 100       0.441        4.407     2.3
>  Nonbonded F+ene+prune k.             101       0.535        5.300     2.9
>  F D2H                               5001       0.501        0.100     2.7
>
> -----------------------------------------------------------------------------
>  Total                                         18.778        3.755   100.0
>
> -----------------------------------------------------------------------------
>
>  Force evaluation time GPU/CPU: 3.755 ms/3.443 ms = 1.091
> For optimal performance this ratio should be close to 1!
>
>
>                Core t (s)   Wall t (s)        (%)
>        Time:      221.730       28.394      780.9
>                  (ns/day)    (hour/ns)
> Performance:       30.435        0.789
>
>
>
>
> *****************SD integrator, GPU / verlet
>
> M E G A - F L O P S   A C C O U N T I N G
>
>  NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
>  RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
>  W3=SPC/TIP3p  W4=TIP4p (single or pairs)
>  V&F=Potential and force  V=Potential only  F=Force only
>
>  Computing:                               M-Number         M-Flops  % Flops
>
> -----------------------------------------------------------------------------
>  Pair Search distance check            1254.604928       11291.444     0.1
>  NxN QSTab Elec. + VdW [F]           197273.059584     8088195.443    91.6
>  NxN QSTab Elec. + VdW [V&F]           2010.150784      118598.896     1.3
>  1,4 nonbonded interactions              31.616322        2845.469     0.0
>  Calc Weights                           703.010574       25308.381     0.3
>  Spread Q Bspline                     14997.558912       29995.118     0.3
>  Gather F Bspline                     14997.558912       89985.353     1.0
>  3D-FFT                               47473.892284      379791.138     4.3
>  Solve PME                               20.488896        1311.289     0.0
>  Shift-X                                  9.418458          56.511     0.0
>  Angles                                  21.879375        3675.735     0.0
>  Propers                                 48.599718       11129.335     0.1
>  Virial                                  23.498403         422.971     0.0
>  Update                                 234.336858        7264.443     0.1
>  Stop-CM                                  2.436616          24.366     0.0
>  Calc-Ekin                               93.809716        2532.862     0.0
>  Lincs                                   24.289712        1457.383     0.0
>  Lincs-Mat                              262.605000        1050.420     0.0
>  Constraint-V                           246.633614        1973.069     0.0
>  Constraint-Vir                          23.486379         563.673     0.0
>  Settle                                 148.229268       47878.054     0.5
>
> -----------------------------------------------------------------------------
>  Total                                                 8825351.354   100.0
>
> -----------------------------------------------------------------------------
>
>
>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
>  Computing:         Nodes   Th.     Count  Wall t (s)     G-Cycles       %
>
> -----------------------------------------------------------------------------
>  Neighbor search        1    8        201       0.945       27.212     1.2
>  Launch GPU ops.        1    8       5001       0.384       11.069     0.5
>  Force                  1    8       5001       2.180       62.791     2.7
>  PME mesh               1    8       5001      15.029      432.967    18.5
>  Wait GPU local         1    8       5001       3.327       95.844     4.1
>  NB X/F buffer ops.     1    8       9801       0.542       15.628     0.7
>  Write traj.            1    8          2       0.749       21.582     0.9
>  Update                 1    8       5001      28.044      807.908    34.5
>  Constraints            1    8      10002       5.562      160.243     6.8
>  Rest                   1                      24.488      705.458    30.1
>
> -----------------------------------------------------------------------------
>  Total                  1                      81.250     2340.701   100.0
>
> -----------------------------------------------------------------------------
>
> -----------------------------------------------------------------------------
>  PME spread/gather      1    8      10002       8.769      252.615    10.8
>  PME 3D-FFT             1    8      10002       5.367      154.630     6.6
>  PME solve              1    8       5001       0.865       24.910     1.1
>
> -----------------------------------------------------------------------------
>
>  GPU timings
>
> -----------------------------------------------------------------------------
>  Computing:                         Count  Wall t (s)      ms/step       %
>
> -----------------------------------------------------------------------------
>  Pair list H2D                        201       0.080        0.398     0.4
>  X / q H2D                           5001       0.699        0.140     3.4
>  Nonbonded F kernel                  4400      16.271        3.698    79.6
>  Nonbonded F+ene k.                   400       1.827        4.568     8.9
>  Nonbonded F+prune k.                 100       0.482        4.816     2.4
>  Nonbonded F+ene+prune k.             101       0.584        5.787     2.9
>  F D2H                               5001       0.505        0.101     2.5
>
> -----------------------------------------------------------------------------
>  Total                                         20.448        4.089   100.0
>
> -----------------------------------------------------------------------------
>
>  Force evaluation time GPU/CPU: 4.089 ms/3.441 ms = 1.188
> For optimal performance this ratio should be close to 1!
>
>                Core t (s)   Wall t (s)        (%)
>        Time:      643.440       81.250      791.9
>                  (ns/day)    (hour/ns)
> Performance:       10.636        2.256

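For anyone comparing the two logs quoted above, the wall-time tables can be lined up row by row to see where the extra time in the sd run goes. Below is a minimal Python sketch of that comparison. The wall times are copied by hand from the two "REAL CYCLE AND TIME ACCOUNTING" tables (GPU / verlet runs); the dictionary and function names are only illustrative and are not part of GROMACS.

```python
# Wall time (s) per row, copied by hand from the two quoted
# "REAL CYCLE AND TIME ACCOUNTING" tables (GPU / verlet runs).
md_wall = {
    "Neighbor search": 0.944,
    "Launch GPU ops.": 0.371,
    "Force": 2.185,
    "PME mesh": 15.033,
    "Wait GPU local": 1.551,
    "NB X/F buffer ops.": 0.538,
    "Write traj.": 0.725,
    "Update": 2.318,
    "Constraints": 2.898,
    "Rest": 1.832,
}
sd_wall = {
    "Neighbor search": 0.945,
    "Launch GPU ops.": 0.384,
    "Force": 2.180,
    "PME mesh": 15.029,
    "Wait GPU local": 3.327,
    "NB X/F buffer ops.": 0.542,
    "Write traj.": 0.749,
    "Update": 28.044,
    "Constraints": 5.562,
    "Rest": 24.488,
}


def extra_time(md_times, sd_times):
    """Return (row, extra seconds in the sd run) pairs, largest first."""
    diffs = {row: sd_times[row] - md_times[row] for row in md_times}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


def share_of_total(times, rows):
    """Percentage of the summed wall time spent in the given rows."""
    total = sum(times.values())
    return 100.0 * sum(times[row] for row in rows) / total


if __name__ == "__main__":
    for row, seconds in extra_time(md_wall, sd_wall):
        print(f"{row:20s} {seconds:+8.3f} s")
    for label, times in (("md", md_wall), ("sd", sd_wall)):
        pct = share_of_total(times, ["Update", "Rest"])
        print(f"{label}: Update + Rest = {pct:.1f}% of wall time")
```

With these numbers, Update and Rest are the two rows with the largest increase, and together they make up roughly 65% of the wall time in the sd run, consistent with the ">60%" figure in the original message. The PME mesh and Force rows are essentially unchanged between the two runs.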