[gmx-users] simulation on 2 gpus

Szilárd Páll pall.szilard at gmail.com
Fri Aug 16 17:47:15 CEST 2019


On Mon, Aug 5, 2019 at 5:00 PM Stefano Guglielmo
<stefano.guglielmo at unito.it> wrote:
>
> Dear Paul,
> thanks for suggestions. Following them I managed to run 91 ns/day for the
> system I referred to in my previous post with the configuration:
> gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1 -gputasks
> 0000111 -pin on (still 28 threads seems to be the best choice)
>
> and 56 ns/day for two independent runs:
> gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1 -gputasks
> 0000000 -pin on -pinoffset 0 -pinstride 1
> gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1 -gputasks
> 1111111 -pin on -pinoffset 28 -pinstride 1
> which is a fairly good result.

Don't use domain decomposition (DD) in single-GPU runs; for the latter,
simply use
gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
-gputasks 00 -pin on -pinoffset 0 -pinstride 1
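
For the two independent runs, that could look like the sketch below: one
thread-MPI rank per run, no separate PME rank, each run on its own GPU and
its own block of 28 hardware threads. The offsets 0 and 28 are the ones from
your commands; the helper name single_gpu_cmd is made up for illustration,
and the script only prints the command lines so the layout can be checked
before anything is launched.

```shell
#!/bin/sh
# Sketch: print one mdrun command line per GPU, each without domain
# decomposition (1 thread-MPI rank, no separate PME rank).
# single_gpu_cmd is a hypothetical helper name, not part of GROMACS.

THREADS_PER_RUN=28   # OpenMP threads per run, as used in this thread

# single_gpu_cmd NAME GPU_ID RUN_INDEX
single_gpu_cmd() {
    name=$1
    gpu=$2
    # First hardware thread of this run: runs get consecutive blocks.
    offset=$(( $3 * THREADS_PER_RUN ))
    # Two GPU tasks per rank (nonbonded + PME), both mapped to the same GPU.
    echo "gmx mdrun -deffnm $name -nb gpu -pme gpu -ntmpi 1 -ntomp $THREADS_PER_RUN -npme 0 -gputasks $gpu$gpu -pin on -pinoffset $offset -pinstride 1"
}

single_gpu_cmd run  0 0
single_gpu_cmd run2 1 1
```

Each printed line can then be started in the background (append "&") from
the directory holding the corresponding .tpr file.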

You can also let mdrun's -multidir functionality manage an ensemble of
jobs (related or not), so you don't have to start each run and work out
the pinning by hand.
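
A minimal sketch of that route, under several assumptions: an MPI-enabled
build installed as gmx_mpi, a launcher called mpirun, and two placeholder
directories sim1 and sim2 that each contain their own run.tpr. The helper
multidir_cmd is hypothetical and only prints the command line, using one MPI
rank per directory.

```shell
#!/bin/sh
# Sketch: build an mpirun command line that lets mdrun manage several runs.
# Assumptions: MPI-enabled build installed as gmx_mpi, launcher named mpirun;
# the directory names are placeholders.

# multidir_cmd DIR...  -> one MPI rank per directory
multidir_cmd() {
    nranks=$#
    echo "mpirun -np $nranks gmx_mpi mdrun -multidir $* -deffnm run -nb gpu -pme gpu -pin on"
}

multidir_cmd sim1 sim2
```

Thread counts and pin offsets are deliberately left out here, since the
point of this route is to let mdrun work them out.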


> I am still wondering if somehow I should pin the threads in some different
> way in order to reflect the cpu topology and if this can influence
> performance (if I remember well NAMD allows the user to indicate explicitly
> the cpu core/threads to use in a computation).

Your pinning does reflect the CPU topology -- the 4x7=28 threads are
pinned to consecutive hardware threads (because of -pinstride 1, i.e.
the second hardware thread of each core is not skipped). The mapping of
software to hardware threads is based on the topology-based hardware
thread indexing; see the hardware detection report in the log file.
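
As a rough illustration of that mapping, under the simplified assumption
that software thread i lands on hardware thread index
pinoffset + i * pinstride (the actual indices are whatever the hardware
detection report lists), the following prints where some threads of the
second run would go. hw_thread is a made-up helper name.

```shell
#!/bin/sh
# Sketch of a simplified pinning model (an assumption, not mdrun's code):
# software thread i -> hardware thread index pinoffset + i * pinstride.

# hw_thread THREAD_INDEX PINOFFSET PINSTRIDE
hw_thread() {
    echo $(( $2 + $1 * $3 ))
}

# Second run from the thread: 28 threads, -pinoffset 28 -pinstride 1.
for i in 0 1 27; do
    echo "run2 thread $i -> hardware thread $(hw_thread "$i" 28 1)"
done
```

With -pinstride 2 the same helper gives every other index, i.e. one
software thread per physical core on a machine with 2 hardware threads per
core.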

> When I tried to run two simulations with the following configuration:
> gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 8 -npme 1 -gputasks
> 00001111 -pin on -pinoffset 0 -pinstride 1
> gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 4 -ntmpi 8 -npme 1 -gputasks
> 00001111 -pin on -pinoffset 0 -pinstride 32
> the system crashed down. Probably this is normal and I am missing something
> quite obvious.

Not really. What do you mean by "crashed down"? Neither the machine nor
the simulation should crash. Even though your machine has 32 cores / 64
threads, using all of them may not always be beneficial: running more
threads than there is work to scale across adds overhead. Have you tried
using all cores but only 1 thread / core (i.e. 32 threads in total with
-pinstride 2)?
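
To sanity-check a layout before starting, a small sketch can compute the
highest hardware thread index a run would request. It uses a simplified
model in which thread i goes to pinoffset + i * pinstride, assumes the 64
hardware threads of this machine, and uses the hypothetical helper names
last_hw_thread and layout_fits. It says nothing about what caused the crash.

```shell
#!/bin/sh
# Sketch: does a requested pinning layout stay within the machine?
# Simplified model (an assumption): thread i -> pinoffset + i * pinstride.

HW_THREADS=64   # 32 cores x 2 hardware threads, as on the machine discussed

# last_hw_thread NTHREADS PINOFFSET PINSTRIDE
last_hw_thread() {
    echo $(( $2 + ($1 - 1) * $3 ))
}

# layout_fits NTHREADS PINOFFSET PINSTRIDE -> exit status 0 if in range
layout_fits() {
    [ "$(last_hw_thread "$1" "$2" "$3")" -lt "$HW_THREADS" ]
}

# 8 ranks x 4 OpenMP threads = 32 threads, with different strides.
for stride in 1 2 32; do
    if layout_fits 32 0 "$stride"; then verdict="fits"; else verdict="does not fit"; fi
    echo "32 threads, -pinoffset 0 -pinstride $stride: last index $(last_hw_thread 32 0 "$stride"), $verdict"
done
```

In this model, stride 2 spreads 32 threads over indices 0 to 62, one per
core, while stride 32 would ask for indices far beyond 63. Two runs that
both use -pinoffset 0 would also share their lowest-numbered hardware
threads.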

Cheers,
--
Szilárd

>
> Thanks again for the valuable advice
> Stefano
>
>
>
> On Sun, Aug 4, 2019 at 01:40, paul buscemi <pbuscemi at q.com> wrote:
>
> > Stefano,
> >
> > A recent run with 140000 atoms, including 10000 isopropanol molecules on
> > top of an end-restrained PDMS surface of 74000 atoms in a 20 x 20 x 30 nm
> > box, ran at 67 ns/d NVT with the mdrun conditions I posted. It took 120 ns
> > for 100 molecules of an adsorbate to go from solution to the surface. I
> > don't think this will set the world ablaze with any benchmarks but it is
> > acceptable to get some work done.
> >
> > Linux Mint Mate 18, AMD Threadripper 32 core 2990wx 4.2 GHz, 32 GB DDR4, 2x
> > RTX 2080 Ti, gmx2019 in the simplest gmx configuration for gpus, CUDA
> > version 10, Nvidia 410.7p loaded from the repository
> >
> > Paul
> >
> > > On Aug 3, 2019, at 12:58 PM, paul buscemi <pbuscemi at q.com> wrote:
> > >
> > > Stefano,
> > >
> > > Here is a typical run
> > >
> > > for minimization: mdrun -deffnm grofile. -nb gpu
> > >
> > > and for other runs on a 32-core machine
> > >
> > > gmx mdrun -deffnm grofile.nvt -nb gpu -pme gpu -ntomp 8 -ntmpi 8 -npme 1
> > > -gputasks 0000000011111111 -pin on
> > >
> > > Depending on the molecular system/model, -ntomp 4 -ntmpi 16 may be
> > > faster, adjusting -gputasks accordingly of course.
> > >
> > > Rarely do I find that not using ntomp and ntmpi is faster, but it is
> > > never bad.
> > >
> > > Let me know how it goes.
> > >
> > > Paul
> > >
> > >> On Aug 3, 2019, at 4:41 AM, Stefano Guglielmo
> > >> <stefano.guglielmo at unito.it> wrote:
> > >>
> > >> Hi Paul,
> > >> thanks for the reply. Would you mind posting the command you used or
> > >> telling how you balanced the work between cpu and gpu?
> > >>
> > >> What about pinning? Does anyone know how to deal with a cpu topology
> > >> like the one reported in my previous post and if it is relevant for
> > >> performance?
> > >> Thanks
> > >> Stefano
> > >>
> > >> On Saturday, August 3, 2019, Paul Buscemi <pbuscemi at q.com> wrote:
> > >>
> > >>> I run the same system and setup but no nvlink. Maestro runs both gpus
> > >>> at 100 percent. Gromacs typically runs them at 50-60 percent and can
> > >>> do 600 ns/d on 20000 atoms.
> > >>>
> > >>> PB
> > >>>
> > >>>> On Jul 25, 2019, at 9:30 PM, Kevin Boyd <kevin.boyd at uconn.edu> wrote:
> > >>>>
> > >>>> Hi,
> > >>>>
> > >>>> I've done a lot of research/experimentation on this, so I can maybe
> > >>>> get you started - if anyone has any questions about the essay to
> > >>>> follow, feel free to email me personally, and I'll link it to the
> > >>>> email thread if it ends up being pertinent.
> > >>>>
> > >>>> First, there are some more internet resources to check out. See Mark's
> > >>>> talk at
> > >>>> https://bioexcel.eu/webinar-performance-tuning-and-optimization-of-gromacs/
> > >>>> Gromacs development moves fast, but a lot of it is still relevant.
> > >>>>
> > >>>> I'll expand a bit here, with the caveat that Gromacs GPU development
> > >>>> is moving very fast and so the correct commands for optimal
> > >>>> performance are both system-dependent and a moving target between
> > >>>> versions. This is a good thing - GPUs have revolutionized the field,
> > >>>> and with each iteration we make better use of them. The downside is
> > >>>> that it's unclear exactly what sort of CPU-GPU balance you should look
> > >>>> to purchase to take advantage of future developments, though the trend
> > >>>> is certainly that more and more computation is being offloaded to the
> > >>>> GPUs.
> > >>>>
> > >>>> The most important consideration is that to get maximum total
> > >>>> throughput performance, you should be running not one but multiple
> > >>>> simulations simultaneously. You can do this through the -multidir
> > >>>> option, but I don't recommend that in this case, as it requires
> > >>>> compiling with MPI and limits some of your options. My run scripts
> > >>>> usually use "gmx mdrun ... &" to initiate subprocesses, with
> > >>>> combinations of -ntomp, -ntmpi, -pin, -pinoffset, and -gputasks. I can
> > >>>> give specific examples if you're interested.
> > >>>>
> > >>>> Another important point is that you can run more simulations than the
> > >>>> number of GPUs you have. Depending on CPU-GPU balance and quality, you
> > >>>> won't double your throughput by e.g. putting 4 simulations on 2 GPUs,
> > >>>> but you might increase it up to 1.5x. This would involve targeting the
> > >>>> same GPU with -gputasks.
> > >>>>
> > >>>> Within a simulation, you should set up a benchmarking script to figure
> > >>>> out the best combination of thread-mpi ranks and open-mp threads -
> > >>>> this can have pretty drastic effects on performance. For example, if
> > >>>> you want to use your entire machine for one simulation (not
> > >>>> recommended for maximal
> > >>>
> > >>> --
> > >>> Gromacs Users mailing list
> > >>>
> > >>> * Please search the archive at
> > >>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
> > >>>
> > >>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> > >>>
> > >>> * For (un)subscribe requests visit
> > >>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > >>> send a mail to gmx-users-request at gromacs.org.
> > >>>
> > >>
> > >>
> > >> --
> > >> Stefano GUGLIELMO PhD
> > >> Assistant Professor of Medicinal Chemistry
> > >> Department of Drug Science and Technology
> > >> Via P. Giuria 9
> > >> 10125 Turin, ITALY
> > >> ph. +39 (0)11 6707178
> >
>
>
> --
> Stefano GUGLIELMO PhD
> Assistant Professor of Medicinal Chemistry
> Department of Drug Science and Technology
> Via P. Giuria 9
> 10125 Turin, ITALY
> ph. +39 (0)11 6707178

