[gmx-developers] performance 4.6.3 vs 5.0rc1
Mirco Wahab
mirco.wahab at chemie.tu-freiberg.de
Fri Jun 27 00:07:31 CEST 2014
Performance test on a large system:
2.4 x 10^6 particles,
MARTINI vesicle in water
GTX-660Ti, 6-core Phenom II X6
- nstlist = 40
- rlist = 2.4
- coulombtype = Reaction-Field
- cutoff-scheme = verlet
- coulomb-modifier = Potential-shift
- epsilon_rf = 0
- verlet-buffer-drift = 0.005
- rcoulomb = 1.1
- rcoulomb_switch = 0.0
- epsilon_r = 15
- vdw_type = cut-off
- rvdw_switch = 0.9
- rvdw = 1.1
- vdw-modifier = Potential-shift
- tcoupl = v-rescale ; Berendsen
- tc-grps = DPPC BSCHX W
- tau_t = 1.0 1.0 1.0
- ref_t = 315 315 315
- Pcoupl = Berendsen
- Pcoupltype = isotropic
- tau_p = 6
Both tests start *from the same tpr* (generated with 4.6.3)
4.6.3     8.363 ns/day
5.0rc1    6.604 ns/day
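The relative slowdown implied by these two figures can be worked out directly. A minimal sketch, using only the ns/day values quoted above (the variable names are illustrative, not taken from GROMACS):

```python
# ns/day figures quoted above
ns_per_day_463 = 8.363
ns_per_day_50rc1 = 6.604

# Fraction of the 4.6.3 throughput that 5.0rc1 reaches
relative = ns_per_day_50rc1 / ns_per_day_463
slowdown_pct = (1.0 - relative) * 100.0

print(f"5.0rc1 reaches {relative:.3f}x the 4.6.3 throughput "
      f"({slowdown_pct:.1f}% slower)")
```

That is, 5.0rc1 delivers roughly 79% of the 4.6.3 throughput on this system.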
log file summaries here ==>
======================= 4.6.3 ========================================
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
 Computing:          Nodes  Th.     Count  Wall t (s)     G-Cycles       %
-----------------------------------------------------------------------------
 Neighbor search         1    6        38      20.559      395.924     6.7
 Launch GPU ops.         1    6      1481       0.439        8.445     0.1
 Force                   1    6      1481      60.728     1169.515    19.8
 Wait GPU local          1    6      1481      57.421     1105.832    18.8
 NB X/F buffer ops.      1    6      2924      53.788     1035.867    17.6
 Write traj.             1    6         2       2.084       40.137     0.7
 Update                  1    6      1481      22.860      440.249     7.5
 Constraints             1    6      1481      57.303     1103.547    18.7
 Rest                    1                     30.818      593.509    10.1
-----------------------------------------------------------------------------
 Total                   1                    306.000     5893.027   100.0
-----------------------------------------------------------------------------
 GPU timings
-----------------------------------------------------------------------------
 Computing:                      Count  Wall t (s)      ms/step       %
-----------------------------------------------------------------------------
 Pair list H2D                      38       0.640       16.844     0.5
 X / q H2D                        1481      11.715        7.910    10.0
 Nonbonded F kernel               1436      95.190       66.288    81.1
 Nonbonded F+ene k.                  7       0.473       67.625     0.4
 Nonbonded F+ene+prune k.           38       2.723       71.645     2.3
 F D2H                            1481       6.648        4.489     5.7
-----------------------------------------------------------------------------
 Total                                     117.389       79.263   100.0
-----------------------------------------------------------------------------
Force evaluation time GPU/CPU: 79.263 ms/41.005 ms = 1.933
======================= 5.0rc1 ========================================
On 1 MPI rank, each using 6 OpenMP threads
 Computing:            Num     Num    Call   Wall time  Giga-Cycles
                     Nodes Threads   Count         (s)    total sum       %
-----------------------------------------------------------------------------
 Neighbor search         1       6      69      43.134      906.243     6.1
 Launch GPU ops.         1       6    2721       0.931       19.563     0.1
 Force                   1       6    2721     124.095     2607.240    17.4
 Wait GPU local          1       6    2721     167.917     3527.955    23.6
 NB X/F buffer ops.      1       6    5373     140.788     2957.975    19.8
 Write traj.             1       6       2       2.346       49.282     0.3
 Update                  1       6    2721      59.083     1241.340     8.3
 Constraints             1       6    2721     111.926     2351.573    15.7
 Rest                                           61.781     1298.019     8.7
-----------------------------------------------------------------------------
 Total                                         712.000    14959.191   100.0
-----------------------------------------------------------------------------
 GPU timings
-----------------------------------------------------------------------------
 Computing:                      Count  Wall t (s)      ms/step       %
-----------------------------------------------------------------------------
 Pair list H2D                      69       1.555       22.542     0.5
 X / q H2D                        2721      27.089        9.955     9.3
 Nonbonded F kernel               2638     240.172       91.043    82.7
 Nonbonded F+ene k.                 14       1.324       94.599     0.5
 Nonbonded F+ene+prune k.           69       7.308      105.912     2.5
 F D2H                            2721      13.105        4.816     4.5
-----------------------------------------------------------------------------
 Total                                     290.554      106.782   100.0
-----------------------------------------------------------------------------
Force evaluation time GPU/CPU: 106.782 ms/45.606 ms = 2.341
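
Since the two runs cover different numbers of steps, the tables are easier to compare once reduced to per-step figures. The sketch below does that with values copied from the two summaries. It rests on two assumptions that the logs do not state explicitly: that the call count of the Force row equals the number of MD steps, and that a neighbor search happens at step 0 and then every nstlist steps. All names in the code are illustrative.

```python
# Figures copied from the two log summaries above.
# "steps" is the call count of the Force row (assumed to equal the MD step count).
runs = {
    "4.6.3": dict(steps=1481, ns_calls=38, wall_s=306.000,
                  gpu_ms=79.263, cpu_ms=41.005, nb_kernel_ms=66.288),
    "5.0rc1": dict(steps=2721, ns_calls=69, wall_s=712.000,
                   gpu_ms=106.782, cpu_ms=45.606, nb_kernel_ms=91.043),
}
NSTLIST = 40  # from the parameter list


def summarize(run):
    """Derive per-step figures from one log summary."""
    return {
        "wall_ms_per_step": 1000.0 * run["wall_s"] / run["steps"],
        "gpu_cpu_ratio": run["gpu_ms"] / run["cpu_ms"],
        # Assumption: one search at step 0, then one every NSTLIST steps.
        "expected_ns_calls": (run["steps"] - 1) // NSTLIST + 1,
    }


summary = {name: summarize(run) for name, run in runs.items()}

for name, s in summary.items():
    print(f"{name:7s} wall/step = {s['wall_ms_per_step']:.1f} ms, "
          f"GPU/CPU = {s['gpu_cpu_ratio']:.3f}, "
          f"expected neighbor searches = {s['expected_ns_calls']}")

# Per-step comparisons between the two versions
wall_ratio = (summary["4.6.3"]["wall_ms_per_step"]
              / summary["5.0rc1"]["wall_ms_per_step"])
kernel_ratio = runs["5.0rc1"]["nb_kernel_ms"] / runs["4.6.3"]["nb_kernel_ms"]
print(f"wall-time ratio (4.6.3 / 5.0rc1): {wall_ratio:.3f}")
print(f"nonbonded F kernel ratio (5.0rc1 / 4.6.3): {kernel_ratio:.3f}")
```

Under these assumptions the per-step wall-time ratio (about 0.79) agrees with the ratio of the reported ns/day values, the expected neighbor-search counts match the logged 38 and 69, and the plain nonbonded F kernel takes roughly 1.37 times as long per step in 5.0rc1 as in 4.6.3.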