[gmx-users] simulation on 2 gpus

Stefano Guglielmo stefano.guglielmo at unito.it
Tue Aug 20 15:28:30 CEST 2019


Dear Szilard,

thanks for the very clear answer.
Following your suggestion I tried to run without DD; for the same system I
ran two simulations on two gpus:

gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
-gputasks 00 -pin on -pinoffset 0 -pinstride 1

gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
-gputasks 11 -pin on -pinoffset 28 -pinstride 1

but again the system crashed; by this I mean that after a few minutes the
machine shuts off (powers down) without any error message, even though not
all threads were in use.

I then tried running the two simulations on the same gpu without DD:

gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
-gputasks 00 -pin on -pinoffset 0 -pinstride 1

gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
-gputasks 00 -pin on -pinoffset 28 -pinstride 1

and I obtained better performance (about 70 ns/day) with heavy use of the
gpu (around 90%), compared to the two runs on two gpus I reported in the
previous post
(gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1 -gputasks
0000000 -pin on -pinoffset 0 -pinstride 1
 gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1
-gputasks 1111111 -pin on -pinoffset 28 -pinstride 1).

As for pinning, the cpu topology according to the log file is:
hardware topology: Basic
    Sockets, cores, and logical processors:
      Socket  0: [   0  32] [   1  33] [   2  34] [   3  35] [   4  36] [   5  37]
                 [   6  38] [   7  39] [  16  48] [  17  49] [  18  50] [  19  51]
                 [  20  52] [  21  53] [  22  54] [  23  55] [   8  40] [   9  41]
                 [  10  42] [  11  43] [  12  44] [  13  45] [  14  46] [  15  47]
                 [  24  56] [  25  57] [  26  58] [  27  59] [  28  60] [  29  61]
                 [  30  62] [  31  63]
If I understand correctly (I am absolutely not sure), it should not be
advantageous to pin to consecutive hardware threads, and indeed I found a
slight degradation of performance for a single simulation when switching from:
gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0 -gputasks
00 -pin on
to
gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0 -gputasks
00 -pin on -pinoffset 0 -pinstride 1.
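(If I read mdrun's pinning logic correctly - an assumption on my part - with
-pin on and no explicit stride, mdrun chooses the stride itself, and with 28
threads on this 64-thread node it spreads them one per physical core, whereas
forcing -pinstride 1 packs them two per core onto only 14 cores:

-pin on                             # auto stride: one thread on each of 28 cores
-pin on -pinoffset 0 -pinstride 1   # both hardware threads of cores 0-13

which would explain the small slowdown.)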

Thanks again
Stefano




On Fri, Aug 16, 2019 at 5:48 PM Szilárd Páll <pall.szilard at gmail.com>
wrote:

> On Mon, Aug 5, 2019 at 5:00 PM Stefano Guglielmo
> <stefano.guglielmo at unito.it> wrote:
> >
> > Dear Paul,
> > thanks for the suggestions. Following them I managed to run 91 ns/day
> > for the system I referred to in my previous post with the configuration:
> > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1
> > -gputasks 0000111 -pin on (still 28 threads seems to be the best choice)
> >
> > and 56 ns/day for two independent runs:
> > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1
> > -gputasks 0000000 -pin on -pinoffset 0 -pinstride 1
> > gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1
> > -gputasks 1111111 -pin on -pinoffset 28 -pinstride 1
> > which is a fairly good result.
>
> Use no DD in single-GPU runs, i.e. for the latter, simply:
> gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
> -gputasks 00 -pin on -pinoffset 0 -pinstride 1
>
> You can also have mdrun's multidir functionality manage an ensemble of
> jobs (related or not) so you don't have to start them manually, calculate
> pinning, etc.
>
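> A minimal sketch of that (assuming an MPI-enabled build, gmx_mpi, and two
> directories run1/ and run2/, each with its own topol.tpr; mdrun then
> distributes GPUs and pin offsets over the two simulations itself):
>
> mpirun -np 2 gmx_mpi mdrun -multidir run1 run2 -nb gpu -pme gpu \
>     -ntomp 28 -pin on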
>
> > I am still wondering if somehow I should pin the threads in some
> different
> > way in order to reflect the cpu topology and if this can influence
> > performance (if I remember well NAMD allows the user to indicate
> explicitly
> > the cpu core/threads to use in a computation).
>
> Your pinning does reflect the CPU topology -- the 4x7=28 threads are
> pinned to consecutive hardware threads (because of -pinstride 1, i.e.
> don't skip the second hardware thread of the core). The mapping of
> software to hardware threads happens based on the topology-based
> hardware thread indexing; see the hardware detection report in the log
> file.
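> (For example, with your topology that indexing runs core by core: indices
> 0 and 1 are the two hardware threads of core 0 (OS CPUs 0 and 32), indices
> 2 and 3 are core 1 (CPUs 1 and 33), and so on.)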
>
> > When I tried to run two simulations with the following configuration:
> > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 8 -npme 1
> > -gputasks 00001111 -pin on -pinoffset 0 -pinstride 1
> > gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 4 -ntmpi 8 -npme 1
> > -gputasks 00001111 -pin on -pinoffset 0 -pinstride 32
> > the system crashed down. Probably this is normal and I am missing
> > something quite obvious.
>
> Not really. What do you mean by "crashed down"? The machine should not
> crash, nor should the simulation. Even though your machine has 32
> cores / 64 threads, using all of these may not always be beneficial, as
> using more threads where there is too little work to scale will add
> overhead. Have you tried using all cores but only 1 thread per core
> (i.e. 32 threads in total, with pinstride 2)?
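> Something like the following would express that (a sketch, not a tested
> command line):
>
> gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 32 -ntmpi 1 -npme 0 \
>     -gputasks 00 -pin on -pinoffset 0 -pinstride 2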
>
> Cheers,
> --
> Szilárd
>
> >
> > Thanks again for the valuable advice
> > Stefano
> >
> >
> >
> > On Sun, Aug 4, 2019 at 1:40 AM paul buscemi <pbuscemi at q.com> wrote:
> >
> > > Stefano,
> > >
> > > A recent run with 140,000 atoms, including 10,000 isopropanol
> > > molecules on top of an end-restrained PDMS surface of 74,000 atoms in
> > > a 20 x 20 x 30 nm box, ran at 67 ns/day NVT with the mdrun conditions
> > > I posted. It took 120 ns for 100 molecules of an adsorbate to go from
> > > solution to the surface. I don't think this will set the world ablaze
> > > with any benchmarks, but it is acceptable for getting some work done.
> > >
> > > Linux Mint Mate 18, AMD Threadripper 2990WX (32 cores, 4.2 GHz), 32 GB
> > > DDR4, 2x RTX 2080 Ti, GROMACS 2019 in the simplest configuration for
> > > gpus, CUDA version 10, Nvidia 410.7p loaded from the repository.
> > >
> > > Paul
> > >
> > > > On Aug 3, 2019, at 12:58 PM, paul buscemi <pbuscemi at q.com> wrote:
> > > >
> > > > Stefano,
> > > >
> > > > Here is a typical run
> > > >
> > > > for minimization: gmx mdrun -deffnm grofile -nb gpu
> > > >
> > > > and for other runs on a 32-core machine:
> > > >
> > > > gmx mdrun -deffnm grofile.nvt -nb gpu -pme gpu -ntomp 8 -ntmpi 8
> > > > -npme 1 -gputasks 0000000011111111 -pin on
> > > >
> > > > Depending on the molecular system/model, -ntomp 4 -ntmpi 16 may be
> > > > faster - of course adjusting -gputasks accordingly
> > > >
> > > > Rarely do I find that not using -ntomp and -ntmpi is faster, but it
> > > > is never bad
> > > >
> > > > Let me know how it goes.
> > > >
> > > > Paul
> > > >
> > > >> On Aug 3, 2019, at 4:41 AM, Stefano Guglielmo <
> > > stefano.guglielmo at unito.it> wrote:
> > > >>
> > > >> Hi Paul,
> > > >> thanks for the reply. Would you mind posting the command you used,
> > > >> or telling us how you balanced the work between cpu and gpu?
> > > >>
> > > >> What about pinning? Does anyone know how to deal with a cpu
> > > >> topology like the one reported in my previous post, and whether it
> > > >> is relevant for performance?
> > > >> Thanks
> > > >> Stefano
> > > >>
> > > >> On Saturday, August 3, 2019, Paul Buscemi <pbuscemi at q.com> wrote:
> > > >>
> > > >>> I run the same system and setup but no NVLink. Maestro runs both
> > > >>> gpus at 100 percent. Gromacs typically runs at 50-60 percent and
> > > >>> can do 600 ns/day on 20,000 atoms.
> > > >>>
> > > >>> PB
> > > >>>
> > > >>>> On Jul 25, 2019, at 9:30 PM, Kevin Boyd <kevin.boyd at uconn.edu> wrote:
> > > >>>>
> > > >>>> Hi,
> > > >>>>
> > > >>>> I've done a lot of research/experimentation on this, so I can
> > > >>>> maybe get you started - if anyone has any questions about the
> > > >>>> essay to follow, feel free to email me personally, and I'll link
> > > >>>> it to the email thread if it ends up being pertinent.
> > > >>>>
> > > >>>> First, there are some more internet resources to check out. See
> > > >>>> Mark's talk at
> > > >>>> https://bioexcel.eu/webinar-performance-tuning-and-optimization-of-gromacs/
> > > >>>> Gromacs development moves fast, but a lot of it is still relevant.
> > > >>>>
> > > >>>> I'll expand a bit here, with the caveat that Gromacs GPU
> > > >>>> development is moving very fast and so the correct commands for
> > > >>>> optimal performance are both system-dependent and a moving target
> > > >>>> between versions. This is a good thing - GPUs have revolutionized
> > > >>>> the field, and with each iteration we make better use of them. The
> > > >>>> downside is that it's unclear exactly what sort of CPU-GPU balance
> > > >>>> you should look to purchase to take advantage of future
> > > >>>> developments, though the trend is certainly that more and more
> > > >>>> computation is being offloaded to the GPUs.
> > > >>>>
> > > >>>> The most important consideration is that to get maximum total
> > > >>>> throughput performance, you should be running not one but multiple
> > > >>>> simulations simultaneously. You can do this through the -multidir
> > > >>>> option, but I don't recommend that in this case, as it requires
> > > >>>> compiling with MPI and limits some of your options. My run scripts
> > > >>>> usually use "gmx mdrun ... &" to initiate subprocesses, with
> > > >>>> combinations of -ntomp, -ntmpi, -pin, -pinoffset, and -gputasks. I
> > > >>>> can give specific examples if you're interested.
> > > >>>>
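> > > >>>> For instance, a minimal sketch of that pattern (illustrative flags
> > > >>>> for a 2-GPU, 32-core machine; adjust thread counts and offsets to
> > > >>>> your hardware):
> > > >>>>
> > > >>>> gmx mdrun -deffnm run1 -nb gpu -pme gpu -ntmpi 1 -npme 0 -ntomp 28 \
> > > >>>>     -gputasks 00 -pin on -pinoffset 0 -pinstride 1 &
> > > >>>> gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntmpi 1 -npme 0 -ntomp 28 \
> > > >>>>     -gputasks 11 -pin on -pinoffset 28 -pinstride 1 &
> > > >>>> wait
> > > >>>>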
> > > >>>> Another important point is that you can run more simulations than
> > > >>>> the number of GPUs you have. Depending on CPU-GPU balance and
> > > >>>> quality, you won't double your throughput by e.g. putting 4
> > > >>>> simulations on 2 GPUs, but you might increase it up to 1.5x. This
> > > >>>> would involve targeting the same GPU with -gputasks.
> > > >>>>
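> > > >>>> (A sketch of that: take the two-run example above but launch four
> > > >>>> runs with -ntomp 8 each, -pinoffset 0/8/16/24, and -gputasks 00 for
> > > >>>> the first two runs and 11 for the other two, so each GPU serves two
> > > >>>> jobs.)
> > > >>>>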
> > > >>>> Within a simulation, you should set up a benchmarking script to
> > > >>>> figure out the best combination of thread-mpi ranks and open-mp
> > > >>>> threads - this can have pretty drastic effects on performance. For
> > > >>>> example, if you want to use your entire machine for one simulation
> > > >>>> (not recommended for maximal
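> > > >>>> A sketch of such a benchmarking sweep (illustrative, assuming the
> > > >>>> 28-thread budget used in this thread; -nsteps and -resethway just
> > > >>>> give short, comparable timing runs):
> > > >>>>
> > > >>>> for ntmpi in 1 2 4 7; do
> > > >>>>     ntomp=$((28 / ntmpi))
> > > >>>>     npme=$(( ntmpi > 1 ? 1 : 0 ))  # separate PME rank only if >1 rank
> > > >>>>     gmx mdrun -s topol.tpr -deffnm bench_${ntmpi}x${ntomp} \
> > > >>>>         -ntmpi ${ntmpi} -ntomp ${ntomp} -npme ${npme} \
> > > >>>>         -nb gpu -pme gpu -pin on -nsteps 20000 -resethway
> > > >>>> done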
> > > >>>
> > > >
> > >
> >
> >



-- 
Stefano GUGLIELMO PhD
Assistant Professor of Medicinal Chemistry
Department of Drug Science and Technology
Via P. Giuria 9
10125 Turin, ITALY
ph. +39 (0)11 6707178

