[gmx-users] >60% slowdown with GPU / verlet and sd integrator

James Starlight jmsstarlight at gmail.com
Wed Jan 16 15:26:18 CET 2013


Hi all!

I've also done some calculations with the SD integrator used as the
thermostat (without t_coupl). With a system of 65k atoms I obtained
10 ns/day on a GTX 670 and a 4-core i5.
I haven't run any simulations with the MD integrator yet, so I should test it.

James

2013/1/15 Szilárd Páll <szilard.pall at cbr.su.se>:
> Hi Floris,
>
> Great feedback, this needs to be looked into. Could you please file a bug
> report, preferably with a tpr (and/or all inputs) as well as log files?
>
> Thanks,
>
> --
> Szilárd
>
>
> On Tue, Jan 15, 2013 at 3:50 AM, Floris Buelens <floris_buelens at yahoo.com> wrote:
>
>> Hi,
>>
>>
>> I'm seeing MD simulations run a lot slower with the sd integrator than
>> with md: ca. 10 vs. 30 ns/day for my 47000-atom system. I found no
>> documented indication that this should be the case.
>> Timings and logs are pasted below. Wall time seems to be accumulating
>> in Update and Rest, which add up to >60% of the total. The effect is still
>> there without the GPU: ca. 40% slowdown when switching from group to Verlet
>> with the SD integrator.
>> System: Xeon E5-1620, 1x GTX 680, gromacs
>> 4.6-beta3-dev-20130107-e66851a-unknown, GCC 4.4.6 and 4.7.0
>>
>> I haven't filed a bug report yet, as I don't have much variety of testing
>> conditions available right now. I hope someone else has a moment to try to
>> reproduce this.
>>
>> Timings:
>>
>> cpu (ns/day)
>> sd / verlet: 6
>> sd / group: 10
>> md / verlet: 9.2
>> md / group: 11.4
>>
>> gpu (ns/day)
>> sd / verlet: 11
>> md / verlet: 29.8
>>
>>
>>
>> **************MD integrator, GPU / verlet
>>
>> M E G A - F L O P S   A C C O U N T I N G
>>
>>  NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
>>  RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
>>  W3=SPC/TIP3p  W4=TIP4p (single or pairs)
>>  V&F=Potential and force  V=Potential only  F=Force only
>>
>>  Computing:                               M-Number         M-Flops  % Flops
>>
>> -----------------------------------------------------------------------------
>>  Pair Search distance check            1244.988096       11204.893     0.1
>>  NxN QSTab Elec. + VdW [F]           194846.615488     7988711.235    91.9
>>  NxN QSTab Elec. + VdW [V&F]           2009.923008      118585.457     1.4
>>  1,4 nonbonded interactions              31.616322        2845.469     0.0
>>  Calc Weights                           703.010574       25308.381     0.3
>>  Spread Q Bspline                     14997.558912       29995.118     0.3
>>  Gather F Bspline                     14997.558912       89985.353     1.0
>>  3D-FFT                               47658.567884      381268.543     4.4
>>  Solve PME                               20.580896        1317.177     0.0
>>  Shift-X                                  9.418458          56.511     0.0
>>  Angles                                  21.879375        3675.735     0.0
>>  Propers                                 48.599718       11129.335     0.1
>>  Virial                                  23.498403         422.971     0.0
>>  Stop-CM                                  2.436616          24.366     0.0
>>  Calc-Ekin                               93.809716        2532.862     0.0
>>  Lincs                                   12.147284         728.837     0.0
>>  Lincs-Mat                              131.328750         525.315     0.0
>>  Constraint-V                           246.633614        1973.069     0.0
>>  Constraint-Vir                          23.486379         563.673     0.0
>>  Settle                                  74.129451       23943.813     0.3
>>
>> -----------------------------------------------------------------------------
>>  Total                                                 8694798.114   100.0
>>
>> -----------------------------------------------------------------------------
>>
>>
>>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>>
>>  Computing:         Nodes   Th.     Count  Wall t (s)     G-Cycles       %
>>
>> -----------------------------------------------------------------------------
>>  Neighbor search        1    8        201       0.944       27.206     3.3
>>  Launch GPU ops.        1    8       5001       0.371       10.690     1.3
>>  Force                  1    8       5001       2.185       62.987     7.7
>>  PME mesh               1    8       5001      15.033      433.441    52.9
>>  Wait GPU local         1    8       5001       1.551       44.719     5.5
>>  NB X/F buffer ops.     1    8       9801       0.538       15.499     1.9
>>  Write traj.            1    8          2       0.725       20.912     2.6
>>  Update                 1    8       5001       2.318       66.826     8.2
>>  Constraints            1    8       5001       2.898       83.551    10.2
>>  Rest                   1                       1.832       52.828     6.5
>>
>> -----------------------------------------------------------------------------
>>  Total                  1                      28.394      818.659   100.0
>>
>> -----------------------------------------------------------------------------
>>
>> -----------------------------------------------------------------------------
>>  PME spread/gather      1    8      10002       8.745      252.144    30.8
>>  PME 3D-FFT             1    8      10002       5.392      155.458    19.0
>>  PME solve              1    8       5001       0.869       25.069     3.1
>>
>> -----------------------------------------------------------------------------
>>
>>  GPU timings
>>
>> -----------------------------------------------------------------------------
>>  Computing:                         Count  Wall t (s)      ms/step       %
>>
>> -----------------------------------------------------------------------------
>>  Pair list H2D                        201       0.080        0.397     0.4
>>  X / q H2D                           5001       0.698        0.140     3.7
>>  Nonbonded F kernel                  4400      14.856        3.376    79.1
>>  Nonbonded F+ene k.                   400       1.667        4.167     8.9
>>  Nonbonded F+prune k.                 100       0.441        4.407     2.3
>>  Nonbonded F+ene+prune k.             101       0.535        5.300     2.9
>>  F D2H                               5001       0.501        0.100     2.7
>>
>> -----------------------------------------------------------------------------
>>  Total                                         18.778        3.755   100.0
>>
>> -----------------------------------------------------------------------------
>>
>>  Force evaluation time GPU/CPU: 3.755 ms/3.443 ms = 1.091
>> For optimal performance this ratio should be close to 1!
>>
>>
>>                Core t (s)   Wall t (s)        (%)
>>        Time:      221.730       28.394      780.9
>>                  (ns/day)    (hour/ns)
>> Performance:       30.435        0.789
>>
>>
>>
>>
>> *****************SD integrator, GPU / verlet
>>
>> M E G A - F L O P S   A C C O U N T I N G
>>
>>  NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
>>  RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
>>  W3=SPC/TIP3p  W4=TIP4p (single or pairs)
>>  V&F=Potential and force  V=Potential only  F=Force only
>>
>>  Computing:                               M-Number         M-Flops  % Flops
>>
>> -----------------------------------------------------------------------------
>>  Pair Search distance check            1254.604928       11291.444     0.1
>>  NxN QSTab Elec. + VdW [F]           197273.059584     8088195.443    91.6
>>  NxN QSTab Elec. + VdW [V&F]           2010.150784      118598.896     1.3
>>  1,4 nonbonded interactions              31.616322        2845.469     0.0
>>  Calc Weights                           703.010574       25308.381     0.3
>>  Spread Q Bspline                     14997.558912       29995.118     0.3
>>  Gather F Bspline                     14997.558912       89985.353     1.0
>>  3D-FFT                               47473.892284      379791.138     4.3
>>  Solve PME                               20.488896        1311.289     0.0
>>  Shift-X                                  9.418458          56.511     0.0
>>  Angles                                  21.879375        3675.735     0.0
>>  Propers                                 48.599718       11129.335     0.1
>>  Virial                                  23.498403         422.971     0.0
>>  Update                                 234.336858        7264.443     0.1
>>  Stop-CM                                  2.436616          24.366     0.0
>>  Calc-Ekin                               93.809716        2532.862     0.0
>>  Lincs                                   24.289712        1457.383     0.0
>>  Lincs-Mat                              262.605000        1050.420     0.0
>>  Constraint-V                           246.633614        1973.069     0.0
>>  Constraint-Vir                          23.486379         563.673     0.0
>>  Settle                                 148.229268       47878.054     0.5
>>
>> -----------------------------------------------------------------------------
>>  Total                                                 8825351.354   100.0
>>
>> -----------------------------------------------------------------------------
>>
>>
>>      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>>
>>  Computing:         Nodes   Th.     Count  Wall t (s)     G-Cycles       %
>>
>> -----------------------------------------------------------------------------
>>  Neighbor search        1    8        201       0.945       27.212     1.2
>>  Launch GPU ops.        1    8       5001       0.384       11.069     0.5
>>  Force                  1    8       5001       2.180       62.791     2.7
>>  PME mesh               1    8       5001      15.029      432.967    18.5
>>  Wait GPU local         1    8       5001       3.327       95.844     4.1
>>  NB X/F buffer ops.     1    8       9801       0.542       15.628     0.7
>>  Write traj.            1    8          2       0.749       21.582     0.9
>>  Update                 1    8       5001      28.044      807.908    34.5
>>  Constraints            1    8      10002       5.562      160.243     6.8
>>  Rest                   1                      24.488      705.458    30.1
>>
>> -----------------------------------------------------------------------------
>>  Total                  1                      81.250     2340.701   100.0
>>
>> -----------------------------------------------------------------------------
>>
>> -----------------------------------------------------------------------------
>>  PME spread/gather      1    8      10002       8.769      252.615    10.8
>>  PME 3D-FFT             1    8      10002       5.367      154.630     6.6
>>  PME solve              1    8       5001       0.865       24.910     1.1
>>
>> -----------------------------------------------------------------------------
>>
>>  GPU timings
>>
>> -----------------------------------------------------------------------------
>>  Computing:                         Count  Wall t (s)      ms/step       %
>>
>> -----------------------------------------------------------------------------
>>  Pair list H2D                        201       0.080        0.398     0.4
>>  X / q H2D                           5001       0.699        0.140     3.4
>>  Nonbonded F kernel                  4400      16.271        3.698    79.6
>>  Nonbonded F+ene k.                   400       1.827        4.568     8.9
>>  Nonbonded F+prune k.                 100       0.482        4.816     2.4
>>  Nonbonded F+ene+prune k.             101       0.584        5.787     2.9
>>  F D2H                               5001       0.505        0.101     2.5
>>
>> -----------------------------------------------------------------------------
>>  Total                                         20.448        4.089   100.0
>>
>> -----------------------------------------------------------------------------
>>
>>  Force evaluation time GPU/CPU: 4.089 ms/3.441 ms = 1.188
>> For optimal performance this ratio should be close to 1!
>>
>>                Core t (s)   Wall t (s)        (%)
>>        Time:      643.440       81.250      791.9
>>                  (ns/day)    (hour/ns)
>> Performance:       10.636        2.256
>> --
>> gmx-users mailing list    gmx-users at gromacs.org
>> http://lists.gromacs.org/mailman/listinfo/gmx-users
>> * Please search the archive at
>> http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
>> * Please don't post (un)subscribe requests to the list. Use the
>> www interface or send it to gmx-users-request at gromacs.org.
>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>
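To put numbers on the slowdown, here is a minimal Python sketch that uses only the ns/day figures and the wall times from the two GPU / verlet cycle-accounting tables quoted above. The dictionary layout and the helper name `slowdown` are just illustrative choices, nothing from GROMACS itself.

```python
# ns/day values copied from the "Timings" list in the quoted message.
NS_PER_DAY = {
    ("cpu", "sd", "verlet"): 6.0,
    ("cpu", "sd", "group"): 10.0,
    ("cpu", "md", "verlet"): 9.2,
    ("cpu", "md", "group"): 11.4,
    ("gpu", "sd", "verlet"): 11.0,
    ("gpu", "md", "verlet"): 29.8,
}


def slowdown(slow: float, fast: float) -> float:
    """Fractional throughput lost when going from `fast` to `slow` ns/day."""
    return 1.0 - slow / fast


# Wall times in seconds, copied from the "Wall t (s)" column of the two
# GPU / verlet cycle-accounting tables (md integrator, then sd integrator).
WALL_MD = {
    "Neighbor search": 0.944, "Launch GPU ops.": 0.371, "Force": 2.185,
    "PME mesh": 15.033, "Wait GPU local": 1.551, "NB X/F buffer ops.": 0.538,
    "Write traj.": 0.725, "Update": 2.318, "Constraints": 2.898, "Rest": 1.832,
}
WALL_SD = {
    "Neighbor search": 0.945, "Launch GPU ops.": 0.384, "Force": 2.180,
    "PME mesh": 15.029, "Wait GPU local": 3.327, "NB X/F buffer ops.": 0.542,
    "Write traj.": 0.749, "Update": 28.044, "Constraints": 5.562, "Rest": 24.488,
}

# Extra wall time per row when switching from md to sd, largest first.
extra = {row: WALL_SD[row] - WALL_MD[row] for row in WALL_MD}
total_extra = sum(extra.values())
ranked = sorted(extra.items(), key=lambda kv: kv[1], reverse=True)

gpu = slowdown(NS_PER_DAY[("gpu", "sd", "verlet")],
               NS_PER_DAY[("gpu", "md", "verlet")])
cpu = slowdown(NS_PER_DAY[("cpu", "sd", "verlet")],
               NS_PER_DAY[("cpu", "sd", "group")])
print(f"GPU, sd vs md (verlet):    {gpu:.0%} slower")
print(f"CPU, verlet vs group (sd): {cpu:.0%} slower")
for row, seconds in ranked:
    print(f"{row:20s} {seconds:+8.3f} s  ({seconds / total_extra:6.1%} of extra)")
```

On these numbers almost all of the extra wall time lands in Update and Rest, which is consistent with what Floris describes. The other rows barely move between the two runs.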


