[gmx-users] simulation on 2 gpus

Sun Aug 4 01:38:49 CEST 2019

Stefano,

A recent run with 140,000 atoms, including 10,000 isopropanol molecules on top of an end-restrained PDMS surface of 74,000 atoms in a 20 x 20 x 30 nm box, ran at 67 ns/day (NVT) with the mdrun conditions I posted. It took 120 ns for 100 molecules of an adsorbate to go from solution to the surface. I don't think this will set the world ablaze with any benchmarks, but it is acceptable for getting some work done.

Linux Mint MATE 18, AMD Threadripper 2990WX (32 cores, 4.2 GHz), 32 GB DDR4, 2x RTX 2080 Ti, gmx 2019 in the simplest gmx configuration for GPUs, CUDA version 10, Nvidia 410.7p loaded from the repository.

Paul

> On Aug 3, 2019, at 12:58 PM, paul buscemi <pbuscemi at q.com> wrote:
> 
> Stefano,
> 
> Here is a typical run
> 
> for minimization: gmx mdrun -deffnm grofile. -nb gpu
> 
> and for other runs on a 32-core machine
> 
> gmx mdrun -deffnm grofile.nvt -nb gpu -pme gpu -ntomp 8 -ntmpi 8 -npme 1 -gputasks 0000000011111111 -pin on
> 
> Depending on the molecular system/model, -ntomp 4 -ntmpi 16 may be faster, of course adjusting -gputasks to match.
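
A minimal sketch of how the -gputasks string could be regenerated when the rank count changes. It assumes mdrun wants one digit per GPU task and that the tasks are simply split half and half across devices 0 and 1; `make_gputasks` is a hypothetical helper, not a GROMACS tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GROMACS): build a -gputasks string that
# assigns the first half of the GPU tasks to device 0 and the rest to device 1.
# Assumption: one digit per GPU task, two GPUs with ids 0 and 1.
make_gputasks() {
    local ntasks=$1
    local half=$(( ntasks / 2 ))
    local s="" i
    for (( i = 0; i < ntasks; i++ )); do
        if (( i < half )); then s+="0"; else s+="1"; fi
    done
    printf '%s\n' "$s"
}

make_gputasks 16   # prints 0000000011111111
make_gputasks 8    # prints 00001111
```

With 16 tasks this reproduces the string used in the command above; whether a given run really has that many GPU tasks depends on the rank count and on what is offloaded, so check the mapping mdrun reports in the log.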
> 
> Rarely do I find that not using -ntomp and -ntmpi is faster, but it is never bad.
> 
> Let me know how it goes.
> 
> Paul
> 
>> On Aug 3, 2019, at 4:41 AM, Stefano Guglielmo <stefano.guglielmo at unito.it> wrote:
>> 
>> Hi Paul,
>> thanks for the reply. Would you mind posting the command you used, or
>> telling me how you balanced the work between CPU and GPU?
>> 
>> What about pinning? Does anyone know how to deal with a CPU topology like
>> the one reported in my previous post, and whether it is relevant for performance?
>> Thanks
>> Stefano
>> 
>> On Saturday, August 3, 2019, Paul Buscemi <pbuscemi at q.com> wrote:
>> 
>>> I run the same system and setup but without NVLink. Maestro runs both GPUs at
>>> 100 percent. Gromacs typically runs them at 50-60 percent and can do 600 ns/d
>>> on 20000 atoms.
>>> 
>>> PB
>>> 
>>>> On Jul 25, 2019, at 9:30 PM, Kevin Boyd <kevin.boyd at uconn.edu> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I've done a lot of research/experimentation on this, so I can maybe get you
>>>> started - if anyone has any questions about the essay to follow, feel free
>>>> to email me personally, and I'll link it to the email thread if it ends up
>>>> being pertinent.
>>>> 
>>>> First, there are some more internet resources to check out. See Mark's talk at
>>>> https://bioexcel.eu/webinar-performance-tuning-and-optimization-of-gromacs/
>>>> Gromacs development moves fast, but a lot of it is still relevant.
>>>> 
>>>> I'll expand a bit here, with the caveat that Gromacs GPU development is
>>>> moving very fast and so the correct commands for optimal performance are
>>>> both system-dependent and a moving target between versions. This is a good
>>>> thing - GPUs have revolutionized the field, and with each iteration we make
>>>> better use of them. The downside is that it's unclear exactly what sort of
>>>> CPU-GPU balance you should look to purchase to take advantage of future
>>>> developments, though the trend is certainly that more and more computation
>>>> is being offloaded to the GPUs.
>>>> 
>>>> The most important consideration is that to get maximum total throughput
>>>> performance, you should be running not one but multiple simulations
>>>> simultaneously. You can do this through the -multidir option, but I don't
>>>> recommend that in this case, as it requires compiling with MPI and limits
>>>> some of your options. My run scripts usually use "gmx mdrun ... &" to
>>>> initiate subprocesses, with combinations of -ntomp, -ntmpi, -pin,
>>>> -pinoffset, and -gputasks. I can give specific examples if you're
>>>> interested.
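
A rough sketch of the kind of run script described above: two independent simulations started with "gmx mdrun ... &", one per GPU, each pinned to its own block of cores. The names sim1/sim2, the 16-thread split, and the two-digit -gputasks value (one digit each for the nonbonded and PME tasks of a single rank) are assumptions, not settings from this thread. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Sketch only: sim1/sim2 and the core counts are hypothetical placeholders.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}

mdrun_cmd() {
    # $1 = -deffnm name, $2 = GPU id, $3 = first logical core, $4 = OpenMP threads
    # Assumption: one thread-MPI rank with nonbonded and PME both on the GPU
    # needs two digits in -gputasks, one per task.
    echo "gmx mdrun -deffnm $1 -ntmpi 1 -ntomp $4 -nb gpu -pme gpu -pin on -pinoffset $3 -gputasks $2$2"
}

launch() {
    local cmd
    cmd=$(mdrun_cmd "$@")
    if [ "$DRY_RUN" = 1 ]; then
        echo "$cmd"
    else
        # Run in the background so both simulations proceed at the same time.
        $cmd > "$1.out" 2>&1 &
    fi
}

launch sim1 0 0 16
launch sim2 1 16 16
wait   # returns once all background simulations have finished
```

-pinoffset counts logical cores, so the right offsets depend on how the machine numbers its hardware threads; the values here would need checking against the actual topology.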
>>>> 
>>>> Another important point is that you can run more simulations than the
>>>> number of GPUs you have. Depending on CPU-GPU balance and quality, you
>>>> won't double your throughput by e.g. putting 4 simulations on 2 GPUs, but
>>>> you might increase it up to 1.5x. This would involve targeting the same GPU
>>>> with -gputasks.
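
To make the "4 simulations on 2 GPUs" case concrete, here is a small sketch that assigns simulations to GPUs round-robin and gives each one an equal, non-overlapping slice of cores. `plan` is a hypothetical helper; the 32-core figure and the equal split are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: spread nsims simulations over ngpus devices round-robin and divide
# ncores evenly between them. All three numbers are illustrative assumptions.
plan() {
    local nsims=$1 ngpus=$2 ncores=$3
    local per_sim=$(( ncores / nsims )) i
    for (( i = 0; i < nsims; i++ )); do
        echo "sim$i gpu=$(( i % ngpus )) pinoffset=$(( i * per_sim )) ntomp=$per_sim"
    done
}

plan 4 2 32
```

Each output line gives the GPU id to target with -gputasks and the -pinoffset/-ntomp values for one simulation, so sim0 and sim2 share GPU 0 while sim1 and sim3 share GPU 1.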
>>>> 
>>>> Within a simulation, you should set up a benchmarking script to figure out
>>>> the best combination of thread-mpi ranks and open-mp threads - this can
>>>> have pretty drastic effects on performance. For example, if you want to use
>>>> your entire machine for one simulation (not recommended for maximal
>>> 
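
One possible starting point for the benchmarking script mentioned above: list every thread-MPI rank / OpenMP thread pair whose product equals a fixed thread budget, then time a short run for each. `combos` is a hypothetical helper, and the budget of 32 threads is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: print every (ntmpi, ntomp) pair whose product equals the thread budget.
combos() {
    local total=$1 ntmpi
    for (( ntmpi = 1; ntmpi <= total; ntmpi++ )); do
        if (( total % ntmpi == 0 )); then
            echo "-ntmpi $ntmpi -ntomp $(( total / ntmpi ))"
        fi
    done
}

combos 32
# Each printed line could be appended to a short timing run, for example
#   gmx mdrun -deffnm bench -nsteps 10000 -resethway <line>
# where "bench" is a placeholder name and -gputasks has to match the rank count.
```

Comparing the ns/day reported at the end of each log should then show which split suits a given system.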
>>> --
>>> Gromacs Users mailing list
>>> 
>>> * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>>> 
>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>> 
>>> * For (un)subscribe requests visit
>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>>> send a mail to gmx-users-request at gromacs.org.
>>> 
>> 
>> 
>> -- 
>> Stefano GUGLIELMO PhD
>> Assistant Professor of Medicinal Chemistry
>> Department of Drug Science and Technology
>> Via P. Giuria 9
>> 10125 Turin, ITALY
>> ph. +39 (0)11 6707178