[gmx-users] How to increase the volatile gpu-util
leila karami
karami.leila1 at gmail.com
Sat Jul 29 07:23:49 CEST 2017
Dear Mark,
Thank you for your answer.
> Much more relevant is what gromacs reports in the end of the log file.
The end of my log file is as follows:
============================================================
M E G A - F L O P S A C C O U N T I N G
NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
W3=SPC/TIP3p W4=TIP4p (single or pairs)
V&F=Potential and force V=Potential only F=Force only
 Computing:                              M-Number             M-Flops % Flops
-----------------------------------------------------------------------------
 Pair Search distance check      388751078.685856      3498759708.173     0.3
 NxN RF Elec. + LJ [F]         28110392002.016254   1068194896076.618    98.2
 NxN RF Elec. + LJ [V&F]         283943606.686080     15332954761.048     1.4
 Shift-X                           5089620.339308        30537722.036     0.0
 Bonds                             2100000.007000       123900000.413     0.0
 Angles                            1650000.005500       277200000.924     0.0
 Impropers                          150000.000500        31200000.104     0.0
 Virial                            5090295.339353        91625316.108     0.0
 Stop-CM                           1017924.678616        10179246.786     0.0
 P-Coupling                        5089620.000000        30537720.000     0.0
 Calc-Ekin                        10179240.678616       274839498.323     0.0
 Lincs                              525000.005250        31500000.315     0.0
 Lincs-Mat                        10800000.108000        43200000.432     0.0
 Constraint-V                      1050000.007000         8400000.056     0.0
 Constraint-Vir                      26250.001750          630000.042     0.0
-----------------------------------------------------------------------------
 Total                                              1087980360051.378   100.0
-----------------------------------------------------------------------------
R E A L C Y C L E A N D T I M E A C C O U N T I N G
On 1 MPI rank, each using 16 OpenMP threads
 Computing:          Num     Num       Call     Wall time         Giga-Cycles
                   Ranks Threads      Count           (s)     total sum     %
-----------------------------------------------------------------------------
 Neighbor search       1      16   15000001    149304.198   5242873.051  11.7
 Launch GPU ops.       1      16  300000001     17306.465    607723.034   1.4
 Force                 1      16  300000001    398616.578  13997571.024  31.3
 Wait GPU local        1      16  300000001    371904.849  13059578.625  29.2
 NB X/F buffer ops.    1      16  585000001    131860.152   4630318.829  10.4
 Write traj.           1      16       7414      1428.772     50171.859   0.1
 Update                1      16  300000001    136357.663   4788250.627  10.7
 Constraints           1      16  300000001     12172.356    427436.850   1.0
 Rest                                           52848.935   1855810.221   4.2
-----------------------------------------------------------------------------
 Total                                        1271799.968  44659734.120 100.0
-----------------------------------------------------------------------------
GPU timings
-----------------------------------------------------------------------------
 Computing:                       Count    Wall t (s)   ms/step       %
-----------------------------------------------------------------------------
 Pair list H2D                 15000001     18161.643     1.211     2.3
 X / q H2D                    300000001    274639.405     0.915    35.2
 Nonbonded F kernel           285000000    276274.002     0.969    35.4
 Nonbonded F+prune k.          12000000     15443.741     1.287     2.0
 Nonbonded F+ene+prune k.       3000001      3975.874     1.325     0.5
 F D2H                        300000001    191562.559     0.639    24.6
-----------------------------------------------------------------------------
 Total                                     780057.223     2.600   100.0
-----------------------------------------------------------------------------
Force evaluation time GPU/CPU: 2.600 ms/1.329 ms = 1.957
NOTE: 12 % of the run time was spent in pair search,
you might want to increase nstlist (this has no effect on accuracy)
               Core t (s)   Wall t (s)        (%)
       Time: 20365044.088  1271799.968     1601.3
                         14d17h16:39
                 (ns/day)    (hour/ns)
Performance:      611.417        0.039
Finished mdrun on rank 0 Sat Jul 29 05:40:47 2017
=============================================================
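
As a cross-check, the summary figures in the log can be recomputed from the timing tables. The following is only a minimal Python sketch: every number is copied from the log above, and the variable and function names are illustrative, not anything GROMACS provides.

```python
# All values below are copied from the mdrun log above.
steps = 300000001               # call count of the "Force" row
wall_total_s = 1271799.968      # "Total" wall time, cycle accounting
neighbor_search_s = 149304.198  # "Neighbor search" wall time
cpu_force_s = 398616.578        # "Force" wall time (CPU side)
gpu_total_s = 780057.223        # "Total" of the GPU timings table

# Host<->device transfer rows of the GPU timings table.
pair_list_h2d_s = 18161.643
xq_h2d_s = 274639.405
f_d2h_s = 191562.559


def ms_per_step(seconds, n_steps):
    """Convert an accumulated wall time in seconds to milliseconds per step."""
    return seconds / n_steps * 1000.0


gpu_ms = ms_per_step(gpu_total_s, steps)
cpu_ms = ms_per_step(cpu_force_s, steps)
gpu_cpu_ratio = gpu_total_s / cpu_force_s

# Share of the total run time spent in pair search.
search_pct = 100.0 * neighbor_search_s / wall_total_s

# Share of the reported GPU time spent moving data rather than in kernels.
transfer_pct = 100.0 * (pair_list_h2d_s + xq_h2d_s + f_d2h_s) / gpu_total_s

print(f"GPU: {gpu_ms:.3f} ms/step, CPU force: {cpu_ms:.3f} ms/step")
print(f"GPU/CPU ratio: {gpu_cpu_ratio:.3f}")
print(f"Pair search: {search_pct:.1f} % of wall time")
print(f"H2D + D2H transfers: {transfer_pct:.1f} % of GPU time")
```

The recomputed ratio agrees with the "Force evaluation time GPU/CPU" line, and the pair-search share agrees with the roughly 12 % quoted in the NOTE. The last figure sums the three transfer rows (Pair list H2D, X / q H2D, F D2H) of the GPU timings table.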
Best,