[gmx-developers] performance 4.6.3 vs 5.0rc1

Mirco Wahab mirco.wahab at chemie.tu-freiberg.de
Fri Jun 27 00:07:31 CEST 2014


Performance test on a large system:

  2.4 x 10^6 particles,
  MARTINI vesicle in water
  GTX-660Ti, 6-core Phenom II X6

  - nstlist              = 40
  - rlist                = 2.4
  - coulombtype          = Reaction-Field
  - cutoff-scheme        = verlet
  - coulomb-modifier     = Potential-shift
  - epsilon_rf           = 0
  - verlet-buffer-drift  = 0.005
  - rcoulomb             = 1.1
  - rcoulomb_switch      = 0.0
  - epsilon_r            = 15
  - vdw_type             = cut-off
  - rvdw_switch          = 0.9
  - rvdw                 = 1.1
  - vdw-modifier         = Potential-shift
  - tcoupl               = v-rescale	; Berendsen
  - tc-grps              = DPPC BSCHX W
  - tau_t                = 1.0  1.0  1.0
  - ref_t                = 315  315  315
  - Pcoupl               = Berendsen
  - Pcoupltype           = isotropic
  - tau_p                = 6
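With nstlist = 40, the pair list is rebuilt on step 0 and then every 40 steps. The small sketch below reproduces the Neighbor search counts in the logs (38 for 1481 steps, 69 for 2721 steps), which also equal the "Nonbonded F+ene+prune k." counts. It assumes each run starts at step 0 and that the Count column includes step 0. `search_steps` is a hypothetical helper written for this check, not a GROMACS function.

```python
def search_steps(n_steps: int, nstlist: int) -> int:
    """Count pair-search steps among steps 0 .. n_steps-1, searching every nstlist steps."""
    # Searches happen on steps 0, nstlist, 2*nstlist, ... below n_steps.
    return (n_steps - 1) // nstlist + 1

print(search_steps(1481, 40))  # 4.6.3 run
print(search_steps(2721, 40))  # 5.0rc1 run
```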

Both tests start *from the same tpr* (generated with 4.6.3):
4.6.3    8.363 ns/day
5.0.rc1  6.604 ns/day
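The two runs cover different numbers of steps (1481 vs 2721), so per-step wall time is the more direct comparison. The sketch below takes the step counts and Total wall times from the log summaries and back-calculates the time step implied by each ns/day figure. dt does not appear in the parameter list above, so the roughly 20 fs it yields is inferred from the logs, not a stated setting. `per_step` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86400.0

# (steps, total wall time in s, reported ns/day), copied from the post and the log Totals
runs = {
    "4.6.3": (1481, 306.000, 8.363),
    "5.0rc1": (2721, 712.000, 6.604),
}

def per_step(steps: int, wall_s: float, ns_per_day: float):
    """Return (wall ms per step, time step in fs implied by the ns/day figure)."""
    s_per_step = wall_s / steps
    steps_per_day = SECONDS_PER_DAY / s_per_step
    implied_dt_fs = ns_per_day / steps_per_day * 1e6  # 1 ns = 1e6 fs
    return 1000.0 * s_per_step, implied_dt_fs

for version, run in runs.items():
    ms, dt_fs = per_step(*run)
    print(f"{version}: {ms:.1f} ms/step, implied dt ~ {dt_fs:.1f} fs")

# Ratio of the reported throughputs (how much faster 4.6.3 is)
speedup = runs["4.6.3"][2] / runs["5.0rc1"][2]
print(f"4.6.3 / 5.0rc1 throughput: {speedup:.3f}")
```

This gives about 206.6 vs 261.7 ms/step and an implied dt of about 20 fs for both runs. The reported ns/day values are therefore consistent with the wall times, and the throughput ratio (about 1.27) matches the per-step slowdown.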

Log file summaries:

======================= 4.6.3 ========================================
      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

  Computing:         Nodes   Th.     Count  Wall t (s)     G-Cycles       %
-----------------------------------------------------------------------------
  Neighbor search        1    6         38      20.559      395.924     6.7
  Launch GPU ops.        1    6       1481       0.439        8.445     0.1
  Force                  1    6       1481      60.728     1169.515    19.8
  Wait GPU local         1    6       1481      57.421     1105.832    18.8
  NB X/F buffer ops.     1    6       2924      53.788     1035.867    17.6
  Write traj.            1    6          2       2.084       40.137     0.7
  Update                 1    6       1481      22.860      440.249     7.5
  Constraints            1    6       1481      57.303     1103.547    18.7
  Rest                   1                      30.818      593.509    10.1
-----------------------------------------------------------------------------
  Total                  1                     306.000     5893.027   100.0
-----------------------------------------------------------------------------

  GPU timings
-----------------------------------------------------------------------------
  Computing:                         Count  Wall t (s)      ms/step       %
-----------------------------------------------------------------------------
  Pair list H2D                         38       0.640       16.844     0.5
  X / q H2D                           1481      11.715        7.910    10.0
  Nonbonded F kernel                  1436      95.190       66.288    81.1
  Nonbonded F+ene k.                     7       0.473       67.625     0.4
  Nonbonded F+ene+prune k.              38       2.723       71.645     2.3
  F D2H                               1481       6.648        4.489     5.7
-----------------------------------------------------------------------------
  Total                                        117.389       79.263   100.0
-----------------------------------------------------------------------------
Force evaluation time GPU/CPU: 79.263 ms/41.005 ms = 1.933
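The two per-step values in the ratio line can be recovered from the tables. The GPU value matches the GPU Total wall time divided by the step count (1481), and the CPU value matches the Force wall time divided by the same count. This interpretation is inferred from the numbers, not taken from the GROMACS source. A sketch of the arithmetic:

```python
steps = 1481                 # Count column of the 4.6.3 log
gpu_total_s = 117.389        # GPU timings, Total wall time (s)
cpu_force_s = 60.728         # CPU table, Force wall time (s)

gpu_ms_per_step = 1000.0 * gpu_total_s / steps
cpu_ms_per_step = 1000.0 * cpu_force_s / steps
ratio = gpu_ms_per_step / cpu_ms_per_step  # > 1 means the CPU waits for the GPU

print(f"{gpu_ms_per_step:.3f} ms / {cpu_ms_per_step:.3f} ms = {ratio:.3f}")
```

The same division applied to the 5.0rc1 log (290.554 s and 124.095 s over 2721 steps) gives its 106.782 ms and 45.606 ms.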


======================= 5.0rc1 ========================================

On 1 MPI rank, each using 6 OpenMP threads

  Computing:          Num   Num      Call    Wall time         Giga-Cycles
                      Nodes Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
  Neighbor search        1    6         69      43.134        906.243   6.1
  Launch GPU ops.        1    6       2721       0.931         19.563   0.1
  Force                  1    6       2721     124.095       2607.240  17.4
  Wait GPU local         1    6       2721     167.917       3527.955  23.6
  NB X/F buffer ops.     1    6       5373     140.788       2957.975  19.8
  Write traj.            1    6          2       2.346         49.282   0.3
  Update                 1    6       2721      59.083       1241.340   8.3
  Constraints            1    6       2721     111.926       2351.573  15.7
  Rest                                          61.781       1298.019   8.7
-----------------------------------------------------------------------------
  Total                                        712.000      14959.191 100.0
-----------------------------------------------------------------------------

  GPU timings
-----------------------------------------------------------------------------
  Computing:                         Count  Wall t (s)      ms/step       %
-----------------------------------------------------------------------------
  Pair list H2D                         69       1.555       22.542     0.5
  X / q H2D                           2721      27.089        9.955     9.3
  Nonbonded F kernel                  2638     240.172       91.043    82.7
  Nonbonded F+ene k.                    14       1.324       94.599     0.5
  Nonbonded F+ene+prune k.              69       7.308      105.912     2.5
  F D2H                               2721      13.105        4.816     4.5
-----------------------------------------------------------------------------
  Total                                        290.554      106.782   100.0
-----------------------------------------------------------------------------
Force evaluation time GPU/CPU: 106.782 ms/45.606 ms = 2.341
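Placing the per-step values from the two logs side by side shows how much each part grew between versions. The sketch below is plain arithmetic on numbers copied from the tables. `pct_change` is a hypothetical helper.

```python
# ms/step values copied from the two logs: (4.6.3, 5.0rc1)
metrics = {
    "Nonbonded F kernel (GPU)": (66.288, 91.043),
    "GPU total": (79.263, 106.782),
    "CPU force": (41.005, 45.606),
}

def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old

for name, (old, new) in metrics.items():
    print(f"{name:26s} {old:8.3f} -> {new:8.3f} ms  ({pct_change(old, new):+.1f}%)")
```

The GPU nonbonded kernel comes out about 37% slower per call, the total GPU time per step about 35% slower, and the CPU force time about 11% slower per step.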





