[gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5

Szilárd Páll pall.szilard at gmail.com
Mon Mar 9 17:01:01 CET 2020


Hi Andreas,

Sorry for the delay.

I can confirm the regression. It affects the energy-calculation steps,
where the GPU bonded computation got significantly slower (as a
side-effect of optimizations that mainly targeted the force-only kernels).

Can you please file an issue on redmine.gromacs.org and upload the data you
shared with me?

As a workaround, consider setting nstcalcenergy > 1; bumping it to
just ~10 would eliminate most of the regression and would also improve the
performance of other computation (the nonbonded F-only kernels are
at least 1.5x faster than the force+energy kernels).
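For example, a minimal .mdp change along these lines (the value 10 is illustrative; use whatever your analysis of the energies can tolerate):

```
; compute energies only every 10 steps instead of every step
nstcalcenergy = 10
```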
Alternatively, I recall you have a decent CPU, so you could run the bonded
interactions on the CPU.
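A hypothetical command line for that (every flag besides -bonded cpu is a placeholder for your existing settings):

```
gmx mdrun -deffnm benchmark -ntmpi 4 -ntomp 4 -nb gpu -pme gpu -bonded cpu
```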

Side-note: you are using an overly fine PME grid that you did not scale
along with the (overly accurate) rather long cut-offs (see
http://manual.gromacs.org/documentation/current/user-guide/mdp-options.html#mdp-fourierspacing
).
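As a rough sketch of the relationship (illustrative numbers, not your actual box; real GROMACS additionally rounds grid sizes up to FFT-friendly values, i.e. products of small primes):

```python
import math

def fft_grid_size(box_nm, spacing_nm):
    """Minimum grid points along one box vector for a target spacing.
    Illustrative only: GROMACS rounds this up to FFT-friendly sizes."""
    return math.ceil(box_nm / spacing_nm)

# Hypothetical box dimension of 28 nm:
print(fft_grid_size(28.0, 0.12))  # default fourierspacing -> 234
print(fft_grid_size(28.0, 0.16))  # coarser grid, as when cut-offs are scaled up -> 175
```

The point is that when you lengthen the cut-offs, the PME grid spacing can be coarsened proportionally without losing accuracy.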

Cheers,
--
Szilárd


On Fri, Feb 28, 2020 at 11:10 AM Andreas Baer <andreas.baer at fau.de> wrote:

> Hi,
>
> sorry about that!
>
> https://faubox.rrze.uni-erlangen.de/getlink/fiUpELsXokQr3a7vyeDSKdY3/benchmarks_2019-2020_all
>
> Cheers,
> Andreas
>
> On 27.02.20 17:59, Szilárd Páll wrote:
>
> On Thu, Feb 27, 2020 at 1:08 PM Andreas Baer <andreas.baer at fau.de> wrote:
>
>> Hi,
>>
>> On 27.02.20 12:34, Szilárd Páll wrote:
>> > Hi
>> >
>> > On Thu, Feb 27, 2020 at 11:31 AM Andreas Baer <andreas.baer at fau.de>
>> wrote:
>> >
>> >> Hi,
>> >>
>> >> with the link below, additional log files for runs with 1 GPU should be
>> >> accessible now.
>> >>
>> > I meant to ask you to run single-rank GPU runs, i.e. gmx mdrun -ntmpi 1.
>> >
>> > It would also help if you could share some input files in case if
>> further
>> > testing is needed.
>> Ok, there is now also an additional benchmark with `-ntmpi 1 -ntomp 4
>> -bonded gpu -update gpu` as parameters. However, it was run on the same
>> machine with SMT disabled.
>> With the following link, I provide all the tests I have done on this
>> machine so far, along with a summary of the performance for the various
>> input parameters (both in `logfiles`), as well as the input files
>> (`C60xh.7z`) and the scripts to run them.
>>
>
> The link seems to be missing.
> --
> Szilárd
>
>
>> I hope this helps. If there is anything else I can do to help, please
>> let me know!
>> >
>> >
>> >> Thank you for the comment about rlist; I did not know that it would
>> >> affect performance negatively.
>> >
>> > It does in multiple ways. First, you are using a rather long list
>> > buffer, which makes the nonbonded pair-interaction calculation more
>> > computationally expensive than it could be if you just used a tolerance
>> > and let the buffer be calculated. Secondly, since setting a manual
>> > rlist disables the automated Verlet buffer calculation, it prevents
>> > mdrun from using a dual pair-list setup (see
>> > http://manual.gromacs.org/documentation/2018.1/release-notes/2018/major/features.html#dual-pair-list-buffer-with-dynamic-pruning
>> > ) which has additional performance benefits.
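>> > As a sketch, the buffer-related .mdp settings could look like this
>> > (0.005 kJ/mol/ps per atom is the documented default tolerance):
>> >
>> > ```
>> > ; drop the manual rlist and let mdrun size the buffer from a tolerance
>> > cutoff-scheme           = Verlet
>> > verlet-buffer-tolerance = 0.005
>> > ```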
>> Ok, thank you for the explanation!
>> >
>> > Cheers,
>> > --
>> > Szilárd
>> Cheers,
>> Andreas
>> >
>> >
>> >
>> >> I know about nstcalcenergy, but I need it for several of my
>> >> simulations.
>> > Cheers,
>> >> Andreas
>> >>
>> >> On 26.02.20 16:50, Szilárd Páll wrote:
>> >>> Hi,
>> >>>
>> >>> Can you please check the performance when running on a single GPU
>> 2019 vs
>> >>> 2020 with your inputs?
>> >>>
>> >>> Also note that you are using some peculiar settings that will have an
>> >>> adverse effect on performance (like manually set rlist disallowing the
>> >> dual
>> >>> pair-list setup, and nstcalcenergy=1).
>> >>>
>> >>> Cheers,
>> >>>
>> >>> --
>> >>> Szilárd
>> >>>
>> >>>
>> >>> On Wed, Feb 26, 2020 at 4:11 PM Andreas Baer <andreas.baer at fau.de>
>> >> wrote:
>> >>>> Hello,
>> >>>>
>> >>>> here is a link to the logfiles.
>> >>>>
>> >>>>
>> >>
>> https://faubox.rrze.uni-erlangen.de/getlink/fiX8wP1LwSBkHRoykw6ksjqY/benchmarks_2019-2020
>> >>>> If necessary, I can also provide some more log or tpr/gro/... files.
>> >>>>
>> >>>> Cheers,
>> >>>> Andreas
>> >>>>
>> >>>>
>> >>>> On 26.02.20 16:09, Paul bauer wrote:
>> >>>>> Hello,
>> >>>>>
>> >>>>> you can't add attachments to the list, please upload the files
>> >>>>> somewhere to share them.
>> >>>>> This might be quite important to us, because the performance
>> >>>>> regression is not expected by us.
>> >>>>>
>> >>>>> Cheers
>> >>>>>
>> >>>>> Paul
>> >>>>>
>> >>>>> On 26/02/2020 15:54, Andreas Baer wrote:
>> >>>>>> Hello,
>> >>>>>>
>> >>>>>> from a set of benchmark tests with large systems using Gromacs
>> >>>>>> versions 2019.5 and 2020, I obtained some unexpected results:
>> >>>>>> With the same set of parameters and the 2020 version, I obtain a
>> >>>>>> performance that is about 2/3 of the 2019.5 version. Interestingly,
>> >>>>>> according to nvidia-smi, the GPU usage is about 20% higher for the
>> >>>>>> 2020 version.
>> >>>>>> Also from the log files it seems that the 2020 version does the
>> >>>>>> computations more efficiently, but spends so much more time
>> >>>>>> waiting that the overall performance drops.
>> >>>>>>
>> >>>>>> Some background info on the benchmarks:
>> >>>>>> - System contains about 2.1 million atoms.
>> >>>>>> - Hardware: 2x Intel Xeon Gold 6134 („Skylake“) @3.2 GHz = 16
>> cores +
>> >>>>>> SMT; 4x NVIDIA Tesla V100
>> >>>>>>     (similar results with less significant performance drop (~15%)
>> on a
>> >>>>>> different machine: 2 or 4 nodes with each [2x Intel Xeon 2660v2
>> („Ivy
>> >>>>>> Bridge“) @ 2.2GHz = 20 cores + SMT; 2x NVIDIA Kepler K20])
>> >>>>>> - Several options for -ntmpi, -ntomp, -bonded, -pme were used to
>> >>>>>> find the optimal set. However, the performance drop seems to
>> >>>>>> persist for all such options.
>> >>>>>>
>> >>>>>> Two representative log files are attached.
>> >>>>>> Does anyone have an idea, where this drop comes from, and how to
>> >>>>>> choose the parameters for the 2020 version to circumvent this?
>> >>>>>>
>> >>>>>> Regards,
>> >>>>>> Andreas
>> >>>>>>
>> >>>> --
>> >>>> Gromacs Users mailing list
>> >>>>
>> >>>> * Please search the archive at
>> >>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
>> >>>> posting!
>> >>>>
>> >>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>> >>>>
>> >>>> * For (un)subscribe requests visit
>> >>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users
>> or
>> >>>> send a mail to gmx-users-request at gromacs.org.
>>
>
>
>


More information about the gromacs.org_gmx-users mailing list