[gmx-developers] performance 4.6.3 vs 5.0rc1
Roland Schulz
roland at utk.edu
Fri Jun 27 00:22:30 CEST 2014
Hi,
Can you create a Redmine issue and upload the tpr file and the "mdrun -version"
output for both versions? Is performance worse only with the GPU, or also
without it? What about the latest release-5-0 branch (it is fine to use the
version with my patch from the previous email)?
Roland
On Thu, Jun 26, 2014 at 6:07 PM, Mirco Wahab <mirco.wahab at chemie.tu-freiberg.de> wrote:
> Performance test on a large system:
>
> 2.4 x 10^6 particles,
> MARTINI vesicle in water
> GTX 660 Ti, 6-core Phenom II X6
>
> - nstlist = 40
> - rlist = 2.4
> - coulombtype = Reaction-Field
> - cutoff-scheme = verlet
> - coulomb-modifier = Potential-shift
> - epsilon_rf = 0
> - verlet-buffer-drift = 0.005
> - rcoulomb = 1.1
> - rcoulomb_switch = 0.0
> - epsilon_r = 15
> - vdw_type = cut-off
> - rvdw_switch = 0.9
> - rvdw = 1.1
> - vdw-modifier = Potential-shift
> - tcoupl = v-rescale ; Berendsen
> - tc-grps = DPPC BSCHX W
> - tau_t = 1.0 1.0 1.0
> - ref_t = 315 315 315
> - Pcoupl = Berendsen
> - Pcoupltype = isotropic
> - tau_p = 6
>
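The parameter list above is quoted in a loose "- key = value ; comment" form. As a minimal sketch (Python; parse_mdp_lines is a hypothetical helper, not part of GROMACS), it can be turned into a dictionary so two setups can be compared option by option. It assumes one option per line, and that option names compare case-insensitively with "-" and "_" treated alike; that normalization is an assumption about how grompp reads mdp files, not a documented guarantee.

```python
# Hypothetical helper, not part of GROMACS: parse the "- key = value ; comment"
# lines quoted in this message into a dict of option name -> value string.
# Assumption: option names are compared case-insensitively with '-' and '_'
# treated as equivalent, so keys are normalized to lowercase with '-'.


def parse_mdp_lines(text: str) -> dict:
    params = {}
    for raw in text.splitlines():
        line = raw.strip().lstrip(">").strip()  # drop the e-mail quote marker
        if line.startswith("- "):
            line = line[2:]                      # drop the list bullet
        line = line.split(";", 1)[0].strip()     # drop a trailing mdp comment
        if "=" not in line:
            continue                             # blank or non-parameter line
        key, value = (part.strip() for part in line.split("=", 1))
        params[key.lower().replace("_", "-")] = value
    return params


quoted = """\
> - cutoff-scheme = verlet
> - rcoulomb = 1.1
> - vdw_type = cut-off
> - rvdw = 1.1
> - tcoupl = v-rescale ; Berendsen
> - Pcoupl = Berendsen
"""

params = parse_mdp_lines(quoted)
print(params["vdw-type"], params["tcoupl"], params["pcoupl"])  # cut-off v-rescale Berendsen
```

Comparing two such dictionaries then reduces to a plain key-by-key diff.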
> Both tests start *from the same tpr* (generated with 4.6.3):
> 4.6.3    8.363 ns/day
> 5.0rc1   6.604 ns/day
>
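For reference, the two ns/day figures translate into a relative slowdown as follows. "X% slower" and "Y% faster" use different baselines, so the same pair of numbers yields two different percentages; this small Python sketch computes both.

```python
# Relative throughput of the two quoted runs (ns/day, higher is better).
ns_per_day = {"4.6.3": 8.363, "5.0rc1": 6.604}

ratio = ns_per_day["5.0rc1"] / ns_per_day["4.6.3"]  # fraction of 4.6.3 speed
slowdown = 1.0 - ratio                               # relative to 4.6.3
speedup = 1.0 / ratio - 1.0                          # 4.6.3 relative to 5.0rc1

print(f"5.0rc1 runs at {ratio:.1%} of 4.6.3 ({slowdown:.1%} lower throughput)")
print(f"4.6.3 is {speedup:.1%} faster than 5.0rc1")
```

With these numbers, 5.0rc1 delivers roughly 21% less throughput, or equivalently 4.6.3 is roughly 27% faster.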
> Log file summaries:
>
> ======================= 4.6.3 ========================================
> R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
>  Computing:         Nodes  Th.      Count  Wall t (s)     G-Cycles       %
> -----------------------------------------------------------------------------
>  Neighbor search        1    6         38      20.559      395.924     6.7
>  Launch GPU ops.        1    6       1481       0.439        8.445     0.1
>  Force                  1    6       1481      60.728     1169.515    19.8
>  Wait GPU local         1    6       1481      57.421     1105.832    18.8
>  NB X/F buffer ops.     1    6       2924      53.788     1035.867    17.6
>  Write traj.            1    6          2       2.084       40.137     0.7
>  Update                 1    6       1481      22.860      440.249     7.5
>  Constraints            1    6       1481      57.303     1103.547    18.7
>  Rest                   1                      30.818      593.509    10.1
> -----------------------------------------------------------------------------
>  Total                  1                     306.000     5893.027   100.0
> -----------------------------------------------------------------------------
>
> GPU timings
>
> -----------------------------------------------------------------------------
>  Computing:                   Count  Wall t (s)      ms/step       %
> -----------------------------------------------------------------------------
>  Pair list H2D                   38       0.640       16.844     0.5
>  X / q H2D                     1481      11.715        7.910    10.0
>  Nonbonded F kernel            1436      95.190       66.288    81.1
>  Nonbonded F+ene k.               7       0.473       67.625     0.4
>  Nonbonded F+ene+prune k.        38       2.723       71.645     2.3
>  F D2H                         1481       6.648        4.489     5.7
> -----------------------------------------------------------------------------
>  Total                                117.389       79.263   100.0
> -----------------------------------------------------------------------------
> Force evaluation time GPU/CPU: 79.263 ms/41.005 ms = 1.933
>
>
> ======================= 5.0rc1 ========================================
>
> On 1 MPI rank, each using 6 OpenMP threads
>
>  Computing:           Num     Num    Call   Wall time    Giga-Cycles
>                     Nodes Threads   Count         (s)      total sum     %
> -----------------------------------------------------------------------------
>  Neighbor search        1       6      69      43.134        906.243   6.1
>  Launch GPU ops.        1       6    2721       0.931         19.563   0.1
>  Force                  1       6    2721     124.095       2607.240  17.4
>  Wait GPU local         1       6    2721     167.917       3527.955  23.6
>  NB X/F buffer ops.     1       6    5373     140.788       2957.975  19.8
>  Write traj.            1       6       2       2.346         49.282   0.3
>  Update                 1       6    2721      59.083       1241.340   8.3
>  Constraints            1       6    2721     111.926       2351.573  15.7
>  Rest                                          61.781       1298.019   8.7
> -----------------------------------------------------------------------------
>  Total                                       712.000      14959.191 100.0
> -----------------------------------------------------------------------------
>
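The two runs have different lengths (1481 vs 2721 MD steps, per the Force and Update counts), so the raw wall-clock seconds in the two cycle-accounting tables are not directly comparable. This Python sketch divides each row by its run's step count to get milliseconds per MD step. It also infers the time step implied by combining total wall time with the reported ns/day; the time step is not in the quoted parameter list, so implied_dt_ps (a hypothetical helper) is only a consistency check, not a stated setting.

```python
# Wall times (s) copied from the two cycle-accounting tables. The runs have
# different lengths, so each row is normalized to milliseconds per MD step.
# Rows done only every nstlist steps (neighbor search) are amortized per step.
STEPS = {"4.6.3": 1481, "5.0rc1": 2721}
NS_PER_DAY = {"4.6.3": 8.363, "5.0rc1": 6.604}
WALL_S = {  # row name: (4.6.3, 5.0rc1)
    "Neighbor search":    (20.559, 43.134),
    "Force":              (60.728, 124.095),
    "Wait GPU local":     (57.421, 167.917),
    "NB X/F buffer ops.": (53.788, 140.788),
    "Update":             (22.860, 59.083),
    "Constraints":        (57.303, 111.926),
    "Total":              (306.000, 712.000),
}


def ms_per_step(wall_s: float, steps: int) -> float:
    """Average wall-clock milliseconds spent in a row per MD step."""
    return 1000.0 * wall_s / steps


def implied_dt_ps(ns_per_day: float, total_wall_s: float, steps: int) -> float:
    """Time step (ps) that makes the measured step rate match ns/day."""
    steps_per_day = 86400.0 * steps / total_wall_s
    return 1000.0 * ns_per_day / steps_per_day


for name, (old_s, new_s) in WALL_S.items():
    old_ms = ms_per_step(old_s, STEPS["4.6.3"])
    new_ms = ms_per_step(new_s, STEPS["5.0rc1"])
    print(f"{name:20s} {old_ms:8.2f} {new_ms:8.2f} ms/step ({new_ms / old_ms - 1:+.0%})")

for i, version in enumerate(("4.6.3", "5.0rc1")):
    dt = implied_dt_ps(NS_PER_DAY[version], WALL_S["Total"][i], STEPS[version])
    print(f"{version}: implied time step {dt:.4f} ps")
```

The Force row comes out at 41.005 and 45.606 ms/step, matching the CPU figures in the "Force evaluation time GPU/CPU" lines, and both runs imply a time step of about 0.020 ps. Per step, the largest growth is in "Wait GPU local" (about 39 to 62 ms) and "NB X/F buffer ops." (about 36 to 52 ms).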
> GPU timings
>
> -----------------------------------------------------------------------------
>  Computing:                   Count  Wall t (s)      ms/step       %
> -----------------------------------------------------------------------------
>  Pair list H2D                   69       1.555       22.542     0.5
>  X / q H2D                     2721      27.089        9.955     9.3
>  Nonbonded F kernel            2638     240.172       91.043    82.7
>  Nonbonded F+ene k.              14       1.324       94.599     0.5
>  Nonbonded F+ene+prune k.        69       7.308      105.912     2.5
>  F D2H                         2721      13.105        4.816     4.5
> -----------------------------------------------------------------------------
>  Total                                290.554      106.782   100.0
> -----------------------------------------------------------------------------
> Force evaluation time GPU/CPU: 106.782 ms/45.606 ms = 2.341
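A GPU/CPU ratio above 1 means the GPU side of the force calculation takes longer than the CPU side, so the CPU ends up waiting. A simple overlap model, assuming the CPU force work and the GPU work start together and run fully concurrently, predicts a per-step wait equal to GPU time minus CPU time. This Python sketch (overlap_model is a hypothetical helper) compares that prediction with the measured "Wait GPU local" time per step from both logs.

```python
# Per-step times (ms) from the "Force evaluation time GPU/CPU" lines, plus the
# "Wait GPU local" wall time (s) and step count from the cycle-accounting tables.
RUNS = {
    #          GPU ms/step  CPU force ms/step  wait (s)  steps
    "4.6.3":  (79.263,      41.005,             57.421,   1481),
    "5.0rc1": (106.782,     45.606,            167.917,   2721),
}


def overlap_model(gpu_ms, cpu_ms, wait_s, steps):
    """Return (GPU/CPU ratio, predicted wait, measured wait), waits in ms/step.

    Assumes CPU force work and GPU work start together and overlap fully, so
    the CPU idles for whatever GPU time remains after its own force work.
    """
    ratio = gpu_ms / cpu_ms
    predicted_wait = max(0.0, gpu_ms - cpu_ms)
    measured_wait = 1000.0 * wait_s / steps
    return ratio, predicted_wait, measured_wait


for version, run in RUNS.items():
    ratio, predicted, measured = overlap_model(*run)
    print(f"{version}: GPU/CPU = {ratio:.3f}, predicted wait {predicted:.1f} ms, "
          f"measured wait {measured:.1f} ms")
```

In both runs the prediction lands within about 0.5 ms of the measured wait, which suggests, under this model, that the larger "Wait GPU local" share in 5.0rc1 mostly reflects the longer GPU step (the nonbonded F kernel alone goes from 66.288 to 91.043 ms/step).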
--
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309