[gmx-users] hardware problem of GPU?

Albert mailmd2011 at gmail.com
Fri Jun 6 08:11:44 CEST 2014


Hi Mark:

Thanks a lot for the reply. Here is the information from my log file.
I've got another GPU machine with two GTX690 cards, and there the
dual-GPU job is much faster than the single-GPU one. That is not the
case on this dual GTX780Ti machine, so I am curious about what is
happening to the hardware, since Gromacs was compiled in the same way
and the test system is the same.

thanks a lot

-----------------------------------------------log---------------------------------------------------------------------------------------------
  NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
  RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
  W3=SPC/TIP3p  W4=TIP4p (single or pairs)
  V&F=Potential and force  V=Potential only  F=Force only

  Computing:                               M-Number         M-Flops % Flops
-----------------------------------------------------------------------------
  Pair Search distance check          449758.183440 4047823.651     0.1
  NxN Ewald Elec. + VdW [F]        114203606.933184 7537438057.590    95.3
  NxN Ewald Elec. + VdW [V&F]        1153624.365888 123437807.150     1.6
  1,4 nonbonded interactions           30707.512283 2763676.105     0.0
  Calc Weights                        413752.665501 14895095.958     0.2
  Spread Q Bspline                   8826723.530688 17653447.061     0.2
  Gather F Bspline                   8826723.530688 52960341.184     0.7
  3D-FFT                            15297568.453746 122380547.630     1.5
  Solve PME                             7839.867456 501751.517     0.0
  Shift-X                               3447.992667 20687.956     0.0
  Angles                               21342.508537 3585541.434     0.0
  Propers                              32957.513183 7547270.519     0.1
  Impropers                             3147.501259 654680.262     0.0
  RB-Dihedrals                            87.500035 21612.509     0.0
  Virial                               13803.055212 248454.994     0.0
  Stop-CM                               1379.285334 13792.853     0.0
  Calc-Ekin                            27583.610334 744757.479     0.0
  Lincs                                11865.004746 711900.285     0.0
  Lincs-Mat                           256110.102444 1024440.410     0.0
  Constraint-V                        149670.059868 1197360.479     0.0
  Constraint-Vir                       13780.555122 330733.323     0.0
  Settle                               41980.016792 13559545.424     0.2
-----------------------------------------------------------------------------
  Total                                              7905739325.773 100.0
-----------------------------------------------------------------------------


      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

  Computing:         Nodes   Th.     Count  Wall t (s) G-Cycles       %
-----------------------------------------------------------------------------
  Neighbor search        1   20      62501     156.247 9380.162     2.3
  Launch GPU ops.        1   20    2500001     182.404 10950.541     2.7
  Force                  1   20    2500001    1047.581 62890.858    15.3
  PME mesh               1   20    2500001    2546.280 152864.323    37.3
  Wait GPU local         1   20    2500001     808.773 48554.193    11.8
  NB X/F buffer ops.     1   20    4937501     114.557 6877.380     1.7
  Write traj.            1   20         58       1.380 82.874     0.0
  Update                 1   20    2500001     519.331 31177.740     7.6
  Constraints            1   20    2500001     757.477 45474.638    11.1
  Rest                   1                     694.482 41692.777    10.2
-----------------------------------------------------------------------------
  Total                  1                    6828.512   409945.484 100.0
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
  PME spread/gather      1   20    5000002    1910.053 114668.815    28.0
  PME 3D-FFT             1   20    5000002     516.241 30992.236     7.6
  PME solve              1   20    2500001     112.115 6730.761     1.6
-----------------------------------------------------------------------------

  GPU timings
-----------------------------------------------------------------------------
  Computing:                         Count  Wall t (s) ms/step       %
-----------------------------------------------------------------------------
  Pair list H2D                      62501      14.934 0.239     0.3
  X / q H2D                        2500001     206.939 0.083     4.6
  Nonbonded F kernel               2250000    3527.275 1.568    78.8
  Nonbonded F+ene k.                187500     405.370 2.162     9.1
  Nonbonded F+ene+prune k.           62501     167.980 2.688     3.8
  F D2H                            2500001     154.048 0.062     3.4
-----------------------------------------------------------------------------
  Total                                       4476.545        1.791 100.0
-----------------------------------------------------------------------------

Force evaluation time GPU/CPU: 1.791 ms/1.438 ms = 1.246
For optimal performance this ratio should be close to 1!


NOTE: The GPU has >20% more load than the CPU. This imbalance causes
       performance loss, consider using a shorter cut-off and a finer PME grid.
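
As a rough cross-check of the ratio above, here is a minimal sketch that recomputes it from the tables in this log. It assumes the CPU figure is the "Force" plus "PME mesh" wall time divided by the step count; that assumption is not stated in the log, but it does reproduce the 1.438 ms, 1.791 ms and 1.246 printed above. The variable names are only illustrative.

```python
# Values copied from the cycle accounting and GPU timing tables in the log.
steps = 2_500_001            # "Count" for the per-step rows
cpu_force_wall_s = 1047.581  # "Force" row, wall time in seconds
cpu_pme_wall_s = 2546.280    # "PME mesh" row, wall time in seconds
gpu_total_wall_s = 4476.545  # "Total" row of the GPU timings table

# Assumption: the CPU side of the ratio is Force + PME mesh time per step.
cpu_ms_per_step = (cpu_force_wall_s + cpu_pme_wall_s) / steps * 1000.0
gpu_ms_per_step = gpu_total_wall_s / steps * 1000.0
ratio = gpu_ms_per_step / cpu_ms_per_step

print(f"GPU {gpu_ms_per_step:.3f} ms, CPU {cpu_ms_per_step:.3f} ms, "
      f"ratio {ratio:.3f}")
```

Under that reading, PME mesh accounts for most of the CPU time per step, and the GPU still takes about 25% longer than the CPU work it overlaps with.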

                Core t (s)   Wall t (s)        (%)
        Time:   136384.758     6828.512     1997.3
                          1h53:48
                  (ns/day)    (hour/ns)
Performance:       63.264        0.379
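
The summary numbers above can be related to each other with a little arithmetic. The sketch below uses only values from this log, plus one assumption: that the step count of 2500001 includes step 0, so 2500000 steps were actually integrated. The implied time step is an inference; the log excerpt does not state it.

```python
# Values copied from the "Time" and "Performance" lines of the log.
wall_s = 6828.512      # "Wall t (s)"
core_s = 136384.758    # "Core t (s)"
ns_per_day = 63.264    # "(ns/day)"
steps = 2_500_000      # assumption: the Count of 2500001 includes step 0

# Core time over wall time gives the "(%)" column (20 threads, about 2000%).
cpu_usage_pct = core_s / wall_s * 100.0

# ns/day times the wall time in days gives the simulated time in ns.
simulated_ns = ns_per_day * wall_s / 86400.0
hours_per_ns = wall_s / 3600.0 / simulated_ns

# Implied time step in femtoseconds (1 ns = 1e6 fs).
timestep_fs = simulated_ns * 1e6 / steps

print(f"{cpu_usage_pct:.1f}% CPU, {simulated_ns:.2f} ns simulated, "
      f"{hours_per_ns:.3f} hour/ns, {timestep_fs:.2f} fs per step")
```

This is consistent with a run of about 5 ns at roughly 2 fs per step, which matches the 0.379 hour/ns reported in the log.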




On 06/05/2014 10:05 PM, Mark Abraham wrote:
> What did you learn from the performance output at the end of the log file?
>
> Mark


