[gmx-users] hardware problem of GPU?
Albert
mailmd2011 at gmail.com
Fri Jun 6 08:11:44 CEST 2014
Hi Mark,
thanks a lot for the reply. Here is the information from my log file. I have
another GPU machine with two GTX690 cards, and there the double CPU job is
much faster than the single GPU one. That is not the case with this dual
GTX780Ti machine, so I am curious about what is happening with the hardware,
since Gromacs was compiled in the same way and the test system is the same.
Thanks a lot
-----------------------------------------------log---------------------------------------------------------------------------------------------
NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field    VdW=Van der Waals    QSTab=quadratic-spline table
W3=SPC/TIP3p    W4=TIP4p (single or pairs)
V&F=Potential and force    V=Potential only    F=Force only
Computing:                              M-Number         M-Flops % Flops
-----------------------------------------------------------------------------
Pair Search distance check         449758.183440     4047823.651     0.1
NxN Ewald Elec. + VdW [F]       114203606.933184  7537438057.590    95.3
NxN Ewald Elec. + VdW [V&F]       1153624.365888   123437807.150     1.6
1,4 nonbonded interactions          30707.512283     2763676.105     0.0
Calc Weights                       413752.665501    14895095.958     0.2
Spread Q Bspline                  8826723.530688    17653447.061     0.2
Gather F Bspline                  8826723.530688    52960341.184     0.7
3D-FFT                           15297568.453746   122380547.630     1.5
Solve PME                            7839.867456      501751.517     0.0
Shift-X                              3447.992667       20687.956     0.0
Angles                              21342.508537     3585541.434     0.0
Propers                             32957.513183     7547270.519     0.1
Impropers                            3147.501259      654680.262     0.0
RB-Dihedrals                           87.500035       21612.509     0.0
Virial                              13803.055212      248454.994     0.0
Stop-CM                              1379.285334       13792.853     0.0
Calc-Ekin                           27583.610334      744757.479     0.0
Lincs                               11865.004746      711900.285     0.0
Lincs-Mat                          256110.102444     1024440.410     0.0
Constraint-V                       149670.059868     1197360.479     0.0
Constraint-Vir                      13780.555122      330733.323     0.0
Settle                              41980.016792    13559545.424     0.2
-----------------------------------------------------------------------------
Total                                             7905739325.773   100.0
-----------------------------------------------------------------------------
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
Computing:          Nodes  Th.     Count  Wall t (s)     G-Cycles       %
-----------------------------------------------------------------------------
Neighbor search         1   20     62501     156.247     9380.162     2.3
Launch GPU ops.         1   20   2500001     182.404    10950.541     2.7
Force                   1   20   2500001    1047.581    62890.858    15.3
PME mesh                1   20   2500001    2546.280   152864.323    37.3
Wait GPU local          1   20   2500001     808.773    48554.193    11.8
NB X/F buffer ops.      1   20   4937501     114.557     6877.380     1.7
Write traj.             1   20        58       1.380       82.874     0.0
Update                  1   20   2500001     519.331    31177.740     7.6
Constraints             1   20   2500001     757.477    45474.638    11.1
Rest                    1                    694.482    41692.777    10.2
-----------------------------------------------------------------------------
Total                   1                   6828.512   409945.484   100.0
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
PME spread/gather       1   20   5000002    1910.053   114668.815    28.0
PME 3D-FFT              1   20   5000002     516.241    30992.236     7.6
PME solve               1   20   2500001     112.115     6730.761     1.6
-----------------------------------------------------------------------------
GPU timings
-----------------------------------------------------------------------------
Computing:                       Count  Wall t (s)     ms/step       %
-----------------------------------------------------------------------------
Pair list H2D                    62501      14.934       0.239     0.3
X / q H2D                      2500001     206.939       0.083     4.6
Nonbonded F kernel             2250000    3527.275       1.568    78.8
Nonbonded F+ene k.              187500     405.370       2.162     9.1
Nonbonded F+ene+prune k.         62501     167.980       2.688     3.8
F D2H                          2500001     154.048       0.062     3.4
-----------------------------------------------------------------------------
Total                                     4476.545       1.791   100.0
-----------------------------------------------------------------------------
Force evaluation time GPU/CPU: 1.791 ms/1.438 ms = 1.246
For optimal performance this ratio should be close to 1!
NOTE: The GPU has >20% more load than the CPU. This imbalance causes
performance loss, consider using a shorter cut-off and a finer
PME grid.
               Core t (s)   Wall t (s)        (%)
       Time:   136384.758     6828.512     1997.3
                         1h53:48
                 (ns/day)    (hour/ns)
Performance:       63.264        0.379
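
For anyone who wants to cross-check the summary figures above, here is a minimal Python sketch that recomputes them from the numbers printed in the log. The function and variable names are my own and are not part of Gromacs; the 1.20 threshold is simply the ">20% more load" condition quoted in the NOTE. The inputs are the rounded values from the log, so the recomputed ratio comes out near 1.245 rather than exactly the logged 1.246.

```python
# Recompute the derived figures in the mdrun summary from the logged values.
# All names here are illustrative; only the numbers come from the log.

def force_load_ratio(gpu_ms_per_step, cpu_ms_per_step):
    """GPU/CPU force evaluation time ratio (ideal value is close to 1)."""
    return gpu_ms_per_step / cpu_ms_per_step

def ms_per_step(wall_seconds, step_count):
    """Average wall time per step in milliseconds."""
    return 1000.0 * wall_seconds / step_count

def hours_per_ns(ns_per_day):
    """Convert a throughput in ns/day into hours needed per ns."""
    return 24.0 / ns_per_day

# Values copied from the log.
gpu_total_wall_s = 4476.545   # "Total" row of the GPU timings table
cpu_force_ms = 1.438          # CPU side of "Force evaluation time GPU/CPU"
steps = 2500001               # step count of the per-step rows
core_s, wall_s = 136384.758, 6828.512
ns_per_day = 63.264

gpu_ms = ms_per_step(gpu_total_wall_s, steps)
ratio = force_load_ratio(round(gpu_ms, 3), cpu_force_ms)
gpu_overloaded = ratio > 1.20  # threshold assumed from the NOTE text
cpu_usage_pct = 100.0 * core_s / wall_s
h_per_ns = hours_per_ns(ns_per_day)

print(f"GPU ms/step      : {gpu_ms:.3f}")
print(f"GPU/CPU ratio    : {ratio:.3f} (GPU overloaded: {gpu_overloaded})")
print(f"Core/Wall usage  : {cpu_usage_pct:.1f} %")
print(f"hour/ns          : {h_per_ns:.3f}")
```

The recomputed values agree with the log to within rounding, so the summary is internally consistent: the GPU takes roughly 25% longer per step than the CPU force work, which is what triggers the imbalance note.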
On 06/05/2014 10:05 PM, Mark Abraham wrote:
> What did you learn from the performance output at the end of the log file?
>
> Mark