[gmx-users] Performance issues with Gromacs 2020 on GPUs - slower than 2019.5
Andreas Baer
andreas.baer at fau.de
Thu Feb 27 13:07:38 CET 2020
Hi,
On 27.02.20 12:34, Szilárd Páll wrote:
> Hi
>
> On Thu, Feb 27, 2020 at 11:31 AM Andreas Baer <andreas.baer at fau.de> wrote:
>
>> Hi,
>>
>> with the link below, additional log files for runs with 1 GPU should be
>> accessible now.
>>
> I meant to ask you to run single-rank GPU runs, i.e. gmx mdrun -ntmpi 1.
>
> It would also help if you could share some input files in case further
> testing is needed.
OK, there is now an additional benchmark run with `-ntmpi 1 -ntomp 4
-bonded gpu -update gpu`. Note, however, that it was run on the same
machine with SMT disabled.
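
As a rough sketch, the single-rank run described above could be launched from a small Python helper like the one below. The four mdrun options are the ones quoted in this message; the file name `topol.tpr` and the output prefix `bench_1gpu` are placeholders and are not taken from the actual benchmark scripts.

```python
def mdrun_command(tpr="topol.tpr", deffnm="bench_1gpu", ntmpi=1, ntomp=4):
    """Build the argument list for a single-rank GPU run of gmx mdrun.

    tpr and deffnm are hypothetical names used only for illustration.
    """
    return [
        "gmx", "mdrun",
        "-s", tpr,               # run input file (placeholder name)
        "-deffnm", deffnm,       # prefix for all output files (placeholder name)
        "-ntmpi", str(ntmpi),    # one thread-MPI rank, i.e. a single-rank run
        "-ntomp", str(ntomp),    # OpenMP threads for that rank
        "-bonded", "gpu",        # compute bonded interactions on the GPU
        "-update", "gpu",        # run update and constraints on the GPU (2020 option)
    ]


if __name__ == "__main__":
    # Print the command so it can be pasted into a job script;
    # pass the list to subprocess.run(...) to actually start the run.
    print(" ".join(mdrun_command()))
```

Since `-update gpu` is an option of the 2020 release, the same command line should not be expected to work unchanged with 2019.5.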
Under the link below, I provide all the tests I have run on this machine
so far, along with a summary of the performance for the various input
parameters (both in `logfiles`), as well as the input files (`C60xh.7z`)
and the scripts to run them.
I hope this helps. If there is anything else I can do to help, please
let me know!
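
For anyone who wants to rebuild such a performance summary from the log files, here is a minimal sketch. It assumes each log ends with the usual `Performance:` line whose first number is ns/day; the function names are made up for this example and are not part of GROMACS or of the provided scripts.

```python
import re

# The first number on the "Performance:" line of md.log is ns/day,
# the second is hour/ns.
PERF_RE = re.compile(r"^Performance:\s+([0-9.]+)\s+([0-9.]+)", re.MULTILINE)


def ns_per_day(log_text):
    """Return the ns/day value from a log's 'Performance:' line, or None."""
    match = PERF_RE.search(log_text)
    return float(match.group(1)) if match else None


def relative_performance(new_log_text, old_log_text):
    """Return (ns/day of the new run) / (ns/day of the old run)."""
    new = ns_per_day(new_log_text)
    old = ns_per_day(old_log_text)
    if new is None or old is None:
        raise ValueError("no 'Performance:' line found (did the run finish?)")
    return new / old


def compare_files(new_path, old_path):
    """Read two md.log files and return their performance ratio."""
    with open(new_path) as f_new, open(old_path) as f_old:
        return relative_performance(f_new.read(), f_old.read())
```

Calling `compare_files` with a 2020 log and the matching 2019.5 log gives the ratio directly; a value near 0.67 would correspond to the "about 2/3" drop reported at the start of this thread.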
>
>
>> Thank you for the comment about rlist; I did not know that it
>> affects the performance negatively.
>
> It does in multiple ways. First, you are using a rather long list buffer
> which will make the nonbonded pair-interaction calculation more
> computationally expensive than it could be if you just used a tolerance and
> let the buffer be calculated. Secondly, as setting a manual rlist disables
> the automated Verlet buffer calculation, it prevents mdrun from using a
> dual pair-list setup (see
> http://manual.gromacs.org/documentation/2018.1/release-notes/2018/major/features.html#dual-pair-list-buffer-with-dynamic-pruning)
> which has additional performance benefits.
Ok, thank you for the explanation!
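
The two settings discussed in this thread can be spotted in an mdp file with a small check like the sketch below. It assumes that the manual rlist was enabled through `verlet-buffer-tolerance = -1`, which is how the mdp documentation describes overriding the automatic buffer; the actual mdp file is not shown in this thread, and the function names are invented for the example.

```python
def parse_mdp(text):
    """Parse 'key = value' lines of an mdp file into a dict.

    Everything after ';' is a comment; '_' and '-' in keys are
    treated as equivalent, as grompp does.
    """
    params = {}
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip().lower().replace("_", "-")] = value.strip()
    return params


def performance_warnings(params):
    """Return notes on the settings mentioned in this thread."""
    notes = []
    tol = params.get("verlet-buffer-tolerance")
    if tol is not None and float(tol) < 0:
        # Assumption: a negative tolerance means rlist is set by hand.
        notes.append("manual rlist: automatic Verlet buffer and "
                     "dual pair list are disabled")
    nstcalc = params.get("nstcalcenergy")
    if nstcalc is not None and int(nstcalc) == 1:
        notes.append("nstcalcenergy = 1: energies are computed every step")
    return notes
```

Running `performance_warnings(parse_mdp(open("grompp.mdp").read()))` on an input file (the file name is a placeholder) lists whichever of the two settings is present; an empty list means neither was found.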
>
> Cheers,
> --
> Szilárd
Cheers,
Andreas
>
>
>
>> I know about the nstcalcenergy setting, but
>> I need it for several of my simulations.
>> Cheers,
>> Andreas
>>
>> On 26.02.20 16:50, Szilárd Páll wrote:
>>> Hi,
>>>
>>> Can you please check the performance when running on a single GPU 2019 vs
>>> 2020 with your inputs?
>>>
>>> Also note that you are using some peculiar settings that will have an
>>> adverse effect on performance (like manually set rlist disallowing the
>>> dual pair-list setup, and nstcalcenergy=1).
>>>
>>> Cheers,
>>>
>>> --
>>> Szilárd
>>>
>>>
>>> On Wed, Feb 26, 2020 at 4:11 PM Andreas Baer <andreas.baer at fau.de> wrote:
>>>> Hello,
>>>>
>>>> here is a link to the logfiles.
>>>>
>>>> https://faubox.rrze.uni-erlangen.de/getlink/fiX8wP1LwSBkHRoykw6ksjqY/benchmarks_2019-2020
>>>> If necessary, I can also provide some more log or tpr/gro/... files.
>>>>
>>>> Cheers,
>>>> Andreas
>>>>
>>>>
>>>> On 26.02.20 16:09, Paul bauer wrote:
>>>>> Hello,
>>>>>
>>>>> you can't add attachments to the list, please upload the files
>>>>> somewhere to share them.
>>>>> This might be quite important to us, because we did not expect
>>>>> this performance regression.
>>>>>
>>>>> Cheers
>>>>>
>>>>> Paul
>>>>>
>>>>> On 26/02/2020 15:54, Andreas Baer wrote:
>>>>>> Hello,
>>>>>>
>>>>>> from a set of benchmark tests with large systems using Gromacs
>>>>>> versions 2019.5 and 2020, I obtained some unexpected results:
>>>>>> With the same set of parameters and the 2020 version, I obtain a
>>>>>> performance that is about 2/3 of the 2019.5 version. Interestingly,
>>>>>> according to nvidia-smi, the GPU usage is about 20% higher for the
>>>>>> 2020 version.
>>>>>> Also, from the log files it seems that the 2020 version does the
>>>>>> computations more efficiently but spends so much more time waiting
>>>>>> that the overall performance drops.
>>>>>>
>>>>>> Some background info on the benchmarks:
>>>>>> - System contains about 2.1 million atoms.
>>>>>> - Hardware: 2x Intel Xeon Gold 6134 („Skylake“) @3.2 GHz = 16 cores +
>>>>>> SMT; 4x NVIDIA Tesla V100
>>>>>> (similar results with less significant performance drop (~15%) on a
>>>>>> different machine: 2 or 4 nodes with each [2x Intel Xeon 2660v2 („Ivy
>>>>>> Bridge“) @ 2.2GHz = 20 cores + SMT; 2x NVIDIA Kepler K20])
>>>>>> - Several options for -ntmpi, -ntomp, -bonded, -pme were used to find
>>>>>> the optimal set. However, the performance drop seems to persist
>>>>>> for all such options.
>>>>>>
>>>>>> Two representative log files are attached.
>>>>>> Does anyone have an idea where this drop comes from and how to
>>>>>> choose the parameters for the 2020 version to circumvent it?
>>>>>>
>>>>>> Regards,
>>>>>> Andreas
>>>>>>
>>>> --
>>>> Gromacs Users mailing list
>>>>
>>>> * Please search the archive at
>>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
>>>> posting!
>>>>
>>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>>
>>>> * For (un)subscribe requests visit
>>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>>>> send a mail to gmx-users-request at gromacs.org.