[gmx-users] The question of performance of GPU acceleration
DeChang Li
li.dc06 at gmail.com
Wed Jul 6 05:56:36 CEST 2016
Dear all,
I am running simulations with GPU acceleration in Gromacs-5.0.4, and I would
like to know whether the performance I am getting is reasonable.
Here is my hardware:
2 Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 12 physical cores in total.
2 NVIDIA Tesla K10 GPU boards, 6144 GPU processor cores in total.
32GB DDR4 2133MHz memory
My simulation system contains about 480,000 atoms. I use PME with a grid
spacing of 0.16, pme_order=6, a non-bonded cut-off of 1 nm, and nstlist = 40.
The following is the performance:
M E G A - F L O P S   A C C O U N T I N G
NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
W3=SPC/TIP3p  W4=TIP4p (single or pairs)
V&F=Potential and force  V=Potential only  F=Force only
Computing:                            M-Number           M-Flops % Flops
-----------------------------------------------------------------------------
NB VdW [V&F]                      30048.030048         30048.030     0.0
Pair Search distance check      1643106.927152      14787962.344     0.1
NxN Ewald Elec. + LJ [F]      422678896.374912   27896807160.744    95.8
NxN Ewald Elec. + LJ [V&F]      4269950.969088     456884753.692     1.6
1,4 nonbonded interactions        43309.043309       3897813.898     0.0
Calc Weights                    1426624.426623      51358479.358     0.2
Spread Q Bspline               30434654.434624      60869308.869     0.2
Gather F Bspline               30434654.434624     182607926.608     0.6
3D-FFT                         44406686.579502     355253492.636     1.2
Solve PME                        138238.986240       8847295.119     0.0
Reset In Box                      11888.525000         35665.575     0.0
CG-CoM                            11889.000541         35667.002     0.0
Propers                           36440.036440       8344768.345     0.0
Impropers                          2746.002746        571168.571     0.0
Virial                            19043.716081        342786.889     0.0
Stop-CM                            4755.885541         47558.855     0.0
Calc-Ekin                         95109.151082       2567947.079     0.0
Lincs                             27924.896602       1675493.796     0.0
Lincs-Mat                        809415.182240       3237660.729     0.0
Constraint-V                     539622.857861       4316982.863     0.0
Constraint-Vir                    20468.413005        491241.912     0.0
Settle                           161257.688219      52086233.295     0.2
(null)                             1060.001060             0.000     0.0
-----------------------------------------------------------------------------
Total                                            29105097416.211   100.0
-----------------------------------------------------------------------------
D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 341414.5
av. #atoms communicated per step for LINCS: 2 x 36121.5
Average load imbalance: 75.3 %
Part of the total run time spent waiting due to load imbalance: 4.1 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: Y 2 %
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
On 12 MPI ranks
Computing:             Num     Num      Call   Wall time   Giga-Cycles
                     Ranks Threads     Count         (s)     total sum      %
-----------------------------------------------------------------------------
Domain decomp.          12       1     25000    1011.294     29057.964    2.9
DD comm. load           12       1     25000      85.817      2465.806    0.2
DD comm. bounds         12       1     25000     124.974      3590.946    0.4
Neighbor search         12       1     25001     483.458     13891.413    1.4
Launch GPU ops.         12       1   2000002     124.094      3565.638    0.4
Comm. coord.            12       1    975000    1905.608     54754.706    5.5
Force                   12       1   1000001    1489.965     42811.850    4.3
Wait + Comm. F          12       1   1000001     435.575     12515.570    1.3
PME mesh                12       1   1000001   19507.755    560525.285   56.2
Wait GPU nonlocal       12       1   1000001      17.722       509.211    0.1
Wait GPU local          12       1   1000001       5.608       161.146    0.0
NB X/F buffer ops.      12       1   3950002     458.601     13177.195    1.3
COM pull force          12       1   1000001     640.120     18392.870    1.8
Write traj.             12       1       539      17.620       506.289    0.1
Update                  12       1   1000001    1912.108     54941.466    5.5
Constraints             12       1   1000001    5255.916    151020.645   15.1
Comm. energies          12       1    100001     916.317     26328.958    2.6
Rest                                             300.654      8638.816    0.9
-----------------------------------------------------------------------------
Total                                          34693.203    996855.772  100.0
-----------------------------------------------------------------------------
Breakdown of PME mesh computation
-----------------------------------------------------------------------------
PME redist. X/F         12       1   2000002    7236.634    207933.530   20.9
PME spread/gather       12       1   2000002    7925.435    227725.169   22.8
PME 3D-FFT              12       1   2000002    2616.866     75191.620    7.5
PME 3D-FFT Comm.        12       1   2000002    1506.177     43277.664    4.3
PME solve Elec          12       1   1000001     217.415      6247.077    0.6
-----------------------------------------------------------------------------
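As a reading aid for the cycle accounting above, here is a minimal Python sketch that recomputes each row's share of the total wall time and lists the largest contributors. The wall times are copied from the table; the helper name `wall_time_shares` is hypothetical and not part of GROMACS.

```python
# Wall time (s) per row, copied from the cycle and time accounting table.
WALL_TIMES = {
    "Domain decomp.": 1011.294,
    "DD comm. load": 85.817,
    "DD comm. bounds": 124.974,
    "Neighbor search": 483.458,
    "Launch GPU ops.": 124.094,
    "Comm. coord.": 1905.608,
    "Force": 1489.965,
    "Wait + Comm. F": 435.575,
    "PME mesh": 19507.755,
    "Wait GPU nonlocal": 17.722,
    "Wait GPU local": 5.608,
    "NB X/F buffer ops.": 458.601,
    "COM pull force": 640.120,
    "Write traj.": 17.620,
    "Update": 1912.108,
    "Constraints": 5255.916,
    "Comm. energies": 916.317,
    "Rest": 300.654,
}
TOTAL_WALL = 34693.203  # "Total" row of the same table


def wall_time_shares(wall_times, total):
    """Return (row name, percent of total wall time) pairs, largest first."""
    shares = [(name, 100.0 * t / total) for name, t in wall_times.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Print the five rows that take the most wall time.
    for name, pct in wall_time_shares(WALL_TIMES, TOTAL_WALL)[:5]:
        print(f"{name:<20s} {pct:5.1f} %")
```

With these numbers, PME mesh and Constraints come out as the two largest rows, in line with the 56.2 % and 15.1 % shown in the last column of the log.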
               Core t (s)   Wall t (s)        (%)
       Time:   415256.517    34693.203     1196.9
                               9h38:13
                 (ns/day)    (hour/ns)
Performance:        4.981        4.818
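The summary figures can be cross-checked with a little arithmetic. The sketch below is only an illustration: the time step is not stated in this message, so the 2 fs value is an assumption (it is consistent with the reported numbers), and the step count is taken from the call count of the "Force" row. The function names are hypothetical.

```python
def ns_per_day(n_steps, dt_ps, wall_s):
    """Simulated nanoseconds per day of wall-clock time."""
    simulated_ns = n_steps * dt_ps / 1000.0  # ps -> ns
    return simulated_ns * 86400.0 / wall_s


def hours_per_ns(n_steps, dt_ps, wall_s):
    """Wall-clock hours needed per simulated nanosecond."""
    simulated_ns = n_steps * dt_ps / 1000.0  # ps -> ns
    return wall_s / 3600.0 / simulated_ns


def core_to_wall_percent(core_s, wall_s):
    """Core time as a percentage of wall time (about 100 % per busy core)."""
    return 100.0 * core_s / wall_s


N_STEPS = 1000001    # call count of the "Force" row in the log
DT_PS = 0.002        # ASSUMPTION: 2 fs time step, not stated in the post
WALL_S = 34693.203   # "Wall t (s)" from the log
CORE_S = 415256.517  # "Core t (s)" from the log

if __name__ == "__main__":
    print(f"ns/day:    {ns_per_day(N_STEPS, DT_PS, WALL_S):.3f}")
    print(f"hour/ns:   {hours_per_ns(N_STEPS, DT_PS, WALL_S):.3f}")
    print(f"core/wall: {core_to_wall_percent(CORE_S, WALL_S):.1f} %")
```

Under that assumption the run covers about 2 ns, and the core-to-wall ratio of roughly 1197 % corresponds to the 12 MPI ranks each keeping one core busy.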