[gmx-developers] performance 4.6.3 vs 5.0rc1

Roland Schulz roland at utk.edu
Fri Jun 27 00:22:30 CEST 2014


Hi,

Can you create a Redmine issue and upload the tpr and the "mdrun -version"
output for both versions? Is the performance worse only with the GPU, or also
without it? What happens if you use the latest release-5-0 branch (it is fine
to use the version with my patch from the previous email)?

Roland


On Thu, Jun 26, 2014 at 6:07 PM, Mirco Wahab <mirco.wahab at chemie.tu-freiberg.de> wrote:

> Performance test on a large system:
>
>   2.4 x 10^6 particles,
>   MARTINI vesicle in water
>   GTX-660Ti, 6-core Phenom II X6
>
>   - nstlist              = 40
>   - rlist                = 2.4
>   - coulombtype          = Reaction-Field
>   - cutoff-scheme        = verlet
>   - coulomb-modifier     = Potential-shift
>   - epsilon_rf           = 0
>   - verlet-buffer-drift  = 0.005
>   - rcoulomb             = 1.1
>   - rcoulomb_switch      = 0.0
>   - epsilon_r            = 15
>   - vdw_type             = cut-off
>   - rvdw_switch          = 0.9
>   - rvdw                 = 1.1
>   - vdw-modifier         = Potential-shift
>   - tcoupl               = v-rescale    ; Berendsen
>   - tc-grps              = DPPC BSCHX W
>   - tau_t                = 1.0  1.0  1.0
>   - ref_t                = 315  315  315
>   - Pcoupl               = Berendsen
>   - Pcoupltype           = isotropic
>   - tau_p                = 6
>
> Both tests start *from the same tpr* (generated w/4.6.3)
> 4.6.3    8.363 ns/day
> 5.0.rc1  6.604 ns/day
>
> log file summaries here ==>
>
> ======================= 4.6.3 ========================================
>       R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
>   Computing:         Nodes   Th.     Count  Wall t (s)     G-Cycles       %
> -----------------------------------------------------------------------------
>   Neighbor search        1    6         38      20.559      395.924     6.7
>   Launch GPU ops.        1    6       1481       0.439        8.445     0.1
>   Force                  1    6       1481      60.728     1169.515    19.8
>   Wait GPU local         1    6       1481      57.421     1105.832    18.8
>   NB X/F buffer ops.     1    6       2924      53.788     1035.867    17.6
>   Write traj.            1    6          2       2.084       40.137     0.7
>   Update                 1    6       1481      22.860      440.249     7.5
>   Constraints            1    6       1481      57.303     1103.547    18.7
>   Rest                   1                      30.818      593.509    10.1
> -----------------------------------------------------------------------------
>   Total                  1                     306.000     5893.027   100.0
> -----------------------------------------------------------------------------
>
>   GPU timings
> -----------------------------------------------------------------------------
>   Computing:                         Count  Wall t (s)      ms/step       %
> -----------------------------------------------------------------------------
>   Pair list H2D                         38       0.640       16.844     0.5
>   X / q H2D                           1481      11.715        7.910    10.0
>   Nonbonded F kernel                  1436      95.190       66.288    81.1
>   Nonbonded F+ene k.                     7       0.473       67.625     0.4
>   Nonbonded F+ene+prune k.              38       2.723       71.645     2.3
>   F D2H                               1481       6.648        4.489     5.7
> -----------------------------------------------------------------------------
>   Total                                        117.389       79.263   100.0
> -----------------------------------------------------------------------------
> Force evaluation time GPU/CPU: 79.263 ms/41.005 ms = 1.933
>
>
> ======================= 5.0rc1 ========================================
>
> On 1 MPI rank, each using 6 OpenMP threads
>
>   Computing:          Num   Num      Call    Wall time         Giga-Cycles
>                       Nodes Threads  Count      (s)         total sum    %
> -----------------------------------------------------------------------------
>   Neighbor search        1    6         69      43.134        906.243   6.1
>   Launch GPU ops.        1    6       2721       0.931         19.563   0.1
>   Force                  1    6       2721     124.095       2607.240  17.4
>   Wait GPU local         1    6       2721     167.917       3527.955  23.6
>   NB X/F buffer ops.     1    6       5373     140.788       2957.975  19.8
>   Write traj.            1    6          2       2.346         49.282   0.3
>   Update                 1    6       2721      59.083       1241.340   8.3
>   Constraints            1    6       2721     111.926       2351.573  15.7
>   Rest                                          61.781       1298.019   8.7
> -----------------------------------------------------------------------------
>   Total                                        712.000      14959.191 100.0
> -----------------------------------------------------------------------------
>
>   GPU timings
> -----------------------------------------------------------------------------
>   Computing:                         Count  Wall t (s)      ms/step       %
> -----------------------------------------------------------------------------
>   Pair list H2D                         69       1.555       22.542     0.5
>   X / q H2D                           2721      27.089        9.955     9.3
>   Nonbonded F kernel                  2638     240.172       91.043    82.7
>   Nonbonded F+ene k.                    14       1.324       94.599     0.5
>   Nonbonded F+ene+prune k.              69       7.308      105.912     2.5
>   F D2H                               2721      13.105        4.816     4.5
> -----------------------------------------------------------------------------
>   Total                                        290.554      106.782   100.0
> -----------------------------------------------------------------------------
> Force evaluation time GPU/CPU: 106.782 ms/45.606 ms = 2.341
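
The two runs above cover different numbers of steps (1481 vs. 2721 "Force" calls), so the wall times in the cycle accounting tables are easier to compare once divided by the step count. The following is a minimal Python sketch of that normalization. It assumes the "Force" call count equals the number of MD steps; the values are copied by hand from the two summaries, and the names (STEPS, WALL_S, ms_per_step, compare) are illustrative only, not part of GROMACS.

```python
# Wall-clock seconds per mdrun timer, copied from the two log summaries
# quoted above as (4.6.3, 5.0rc1). Nothing here is read from a real log file.
WALL_S = {
    "Neighbor search":    (20.559, 43.134),
    "Launch GPU ops.":    (0.439, 0.931),
    "Force":              (60.728, 124.095),
    "Wait GPU local":     (57.421, 167.917),
    "NB X/F buffer ops.": (53.788, 140.788),
    "Write traj.":        (2.084, 2.346),
    "Update":             (22.860, 59.083),
    "Constraints":        (57.303, 111.926),
    "Rest":               (30.818, 61.781),
    "Total":              (306.000, 712.000),
}

# Assumption: the "Force" call count is the number of MD steps in each run.
STEPS = {"4.6.3": 1481, "5.0rc1": 2721}


def ms_per_step(wall_s, steps):
    """Convert a total wall time in seconds to milliseconds per MD step."""
    return 1000.0 * wall_s / steps


def compare(wall_s=WALL_S, steps=STEPS):
    """Return {timer: (ms/step in 4.6.3, ms/step in 5.0rc1, ratio new/old)}."""
    old_steps, new_steps = steps["4.6.3"], steps["5.0rc1"]
    rows = {}
    for name, (old_s, new_s) in wall_s.items():
        old_ms = ms_per_step(old_s, old_steps)
        new_ms = ms_per_step(new_s, new_steps)
        rows[name] = (old_ms, new_ms, new_ms / old_ms)
    return rows


if __name__ == "__main__":
    print(f"{'timer':<20s} {'4.6.3':>9s} {'5.0rc1':>9s} {'ratio':>7s}")
    for name, (old_ms, new_ms, ratio) in compare().items():
        print(f"{name:<20s} {old_ms:9.3f} {new_ms:9.3f} {ratio:6.2f}x")
```

As a sanity check, the "Force" row computed this way reproduces the CPU force times printed in the logs (41.005 ms and 45.606 ms), and the ratio of the "Total" rows (about 1.27) matches the ratio of the reported ns/day figures (8.363 / 6.604).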
>

-- 
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309