[gmx-users] Maximising Hardware Performance on Local node: Optimal settings
Kutzner, Carsten
ckutzne at gwdg.de
Wed Dec 4 18:29:25 CET 2019
Hi,
> Am 04.12.2019 um 17:53 schrieb Matthew Fisher <matthew.fisher at stcatz.ox.ac.uk>:
>
> Dear all,
>
> We're currently running some experiments with a new hardware configuration and attempting to maximise its performance. Our system contains 1x V100 and 2x 12-core (24 logical) Xeon Silver 4214 CPUs. After optimising CUDA drivers, kernels, etc., we've reached 210 ns/day for a 60k-atom system with GROMACS 2019.3 (letting mdrun select the thread count, which surprised us, as it creates only 24 OpenMP threads on our 48-logical-core system). Furthermore, we see a surprising amount of wasted GPU time. We were therefore wondering whether anyone has advice on how we could get more out of this hardware?
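One quick thing to check is the thread setup: mdrun's automatic choice can be overridden on the
command line. As a minimal sketch (the numbers are only illustrative for a 2x Xeon 4214 node, not
tuned values):

  gmx mdrun -ntmpi 1 -ntomp 24 -pin on -nb gpu -pme gpu -bonded gpu -deffnm md

With a single GPU doing most of the work, more OpenMP threads do not necessarily help; benchmarking
a few -ntomp values (e.g. 12, 24, 48) with -pin on will tell you whether the automatic choice was
actually the bottleneck.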
Run multi-simulations; this will reduce single-simulation performance a bit, but
you will get much higher aggregate performance - if that is an option for you.
You might want to look at https://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.26011 and our
earlier paper for tips on how to optimize performance on GPU nodes.
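For example, with an MPI-enabled build, two independent runs can share the node and the V100
roughly like this (the directory names, thread count, and number of replicas are placeholders
to adapt to your setup):

  mpirun -np 2 gmx_mpi mdrun -multidir sim1 sim2 -nb gpu -pme gpu -bonded gpu -ntomp 12 -pin on

Each simulation then uses its own subset of CPU cores while both offload to the same GPU; the
per-run performance drops somewhat, but the GPU idle gaps of one run are filled by the other,
so the aggregate throughput goes up.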
Best,
Carsten
> We've enclosed the real cycle and time accounting display below.
>
> Any help will be massively appreciated
>
> Thanks,
> Matt
>
> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>
> On 1 MPI rank, each using 24 OpenMP threads
>
> Computing:               Num   Num      Call    Wall time     Giga-Cycles
>                          Ranks Threads  Count      (s)       total sum    %
> -----------------------------------------------------------------------------
> Neighbor search             1    24      12501       32.590      1716.686   3.2
> Launch GPU ops.             1    24    2500002      105.169      5539.764  10.2
> Force                       1    24    1250001      140.283      7389.414  13.6
> Wait PME GPU gather         1    24    1250001       79.714      4198.902   7.7
> Reduce GPU PME F            1    24    1250001       25.159      1325.260   2.4
> Wait GPU NB local           1    24    1250001      264.961     13956.769  25.7
> NB X/F buffer ops.          1    24    2487501      177.862      9368.871  17.3
> Write traj.                 1    24        252        5.748       302.799   0.6
> Update                      1    24    1250001       81.151      4274.601   7.9
> Constraints                 1    24    1250001       70.231      3699.389   6.8
> Rest                                                 47.521      2503.167   4.6
> -----------------------------------------------------------------------------
> Total                                              1030.389     54275.623 100.0
> -----------------------------------------------------------------------------
>
>                Core t (s)   Wall t (s)        (%)
>        Time:    24729.331     1030.389     2400.0
>                  (ns/day)    (hour/ns)
> Performance:      209.630        0.114
> --
> Gromacs Users mailing list
>
> * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-request at gromacs.org.
--
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics
Am Fassberg 11, 37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/grubmueller/kutzner
http://www.mpibpc.mpg.de/grubmueller/sppexa