[gmx-users] How to increase the volatile gpu-util

leila karami karami.leila1 at gmail.com
Sat Jul 29 07:23:49 CEST 2017


Dear Mark,

Thank you for your answer.

> Much more relevant is what gromacs reports in the end of the log file.

The end of my log file is as follows:

============================================================

        M E G A - F L O P S   A C C O U N T I N G

NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 Pair Search distance check       388751078.685856  3498759708.173     0.3
 NxN RF Elec. + LJ [F]            28110392002.016254 1068194896076.618    98.2
 NxN RF Elec. + LJ [V&F]          283943606.686080 15332954761.048     1.4
 Shift-X                            5089620.339308    30537722.036     0.0
 Bonds                              2100000.007000   123900000.413     0.0
 Angles                             1650000.005500   277200000.924     0.0
 Impropers                           150000.000500    31200000.104     0.0
 Virial                             5090295.339353    91625316.108     0.0
 Stop-CM                            1017924.678616    10179246.786     0.0
 P-Coupling                         5089620.000000    30537720.000     0.0
 Calc-Ekin                         10179240.678616   274839498.323     0.0
 Lincs                               525000.005250    31500000.315     0.0
 Lincs-Mat                         10800000.108000    43200000.432     0.0
 Constraint-V                       1050000.007000     8400000.056     0.0
 Constraint-Vir                       26250.001750      630000.042     0.0
-----------------------------------------------------------------------------
 Total                                             1087980360051.378   100.0
-----------------------------------------------------------------------------


     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 1 MPI rank, each using 16 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Neighbor search        1   16   15000001  149304.198    5242873.051  11.7
 Launch GPU ops.        1   16  300000001   17306.465     607723.034   1.4
 Force                  1   16  300000001  398616.578   13997571.024  31.3
 Wait GPU local         1   16  300000001  371904.849   13059578.625  29.2
 NB X/F buffer ops.     1   16  585000001  131860.152    4630318.829  10.4
 Write traj.            1   16       7414    1428.772      50171.859   0.1
 Update                 1   16  300000001  136357.663    4788250.627  10.7
 Constraints            1   16  300000001   12172.356     427436.850   1.0
 Rest                                       52848.935    1855810.221   4.2
-----------------------------------------------------------------------------
 Total                                     1271799.968   44659734.120 100.0
-----------------------------------------------------------------------------

 GPU timings
-----------------------------------------------------------------------------
 Computing:                         Count  Wall t (s)      ms/step       %
-----------------------------------------------------------------------------
 Pair list H2D                   15000001   18161.643        1.211     2.3
 X / q H2D                      300000001  274639.405        0.915    35.2
 Nonbonded F kernel             285000000  276274.002        0.969    35.4
 Nonbonded F+prune k.            12000000   15443.741        1.287     2.0
 Nonbonded F+ene+prune k.         3000001    3975.874        1.325     0.5
 F D2H                          300000001  191562.559        0.639    24.6
-----------------------------------------------------------------------------
 Total                                     780057.223        2.600   100.0
-----------------------------------------------------------------------------

Force evaluation time GPU/CPU: 2.600 ms/1.329 ms = 1.957

NOTE: 12 % of the run time was spent in pair search,
      you might want to increase nstlist (this has no effect on accuracy)


               Core t (s)   Wall t (s)        (%)
       Time: 20365044.088  1271799.968     1601.3
                         14d17h16:39
                 (ns/day)    (hour/ns)
Performance:      611.417        0.039
Finished mdrun on rank 0 Sat Jul 29 05:40:47 2017

=============================================================
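
As a rough reading of the "GPU timings" table above, the following Python sketch sums the host/device transfer rows (H2D and D2H) separately from the nonbonded kernel rows, and recomputes the reported GPU/CPU force evaluation ratio. It is not part of GROMACS: the dictionary, the helper function and the variable names are hypothetical, and the numbers are copied by hand from the log. Treating every row whose name ends in H2D or D2H as a transfer is an assumption of this sketch.

```python
# Wall times in seconds, copied by hand from the "GPU timings" table in the log.
gpu_wall_s = {
    "Pair list H2D": 18161.643,
    "X / q H2D": 274639.405,
    "Nonbonded F kernel": 276274.002,
    "Nonbonded F+prune k.": 15443.741,
    "Nonbonded F+ene+prune k.": 3975.874,
    "F D2H": 191562.559,
}


def is_transfer(row_name):
    """Assumption of this sketch: rows ending in H2D or D2H are copies."""
    return row_name.endswith("H2D") or row_name.endswith("D2H")


total_s = sum(gpu_wall_s.values())
transfer_s = sum(t for name, t in gpu_wall_s.items() if is_transfer(name))
kernel_s = total_s - transfer_s

transfer_pct = 100.0 * transfer_s / total_s
kernel_pct = 100.0 * kernel_s / total_s

# Per-step times (ms) from the "Force evaluation time GPU/CPU" line of the log.
gpu_ms_per_step = 2.600
cpu_ms_per_step = 1.329
gpu_cpu_ratio = gpu_ms_per_step / cpu_ms_per_step

print(f"GPU total:        {total_s:.3f} s")
print(f"Transfers share:  {transfer_pct:.1f} %")
print(f"Kernels share:    {kernel_pct:.1f} %")
print(f"GPU/CPU ratio:    {gpu_cpu_ratio:.3f}")
```

The ratio recomputed from the rounded per-step values can differ from the logged 1.957 in the last digit, since the log presumably works from unrounded timings.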

Best,

