[gmx-users] simulation on 2 gpus

Stefano Guglielmo stefano.guglielmo at unito.it
Fri Sep 6 15:46:26 CEST 2019


Hi Szilard,

thanks for suggestions.


As for the strange crash: the workstation works fine using only the CPU; the
problem seems to be related to GPU usage. When both cards draw roughly 200 W
out of their 250 W limit, the workstation turns off. It is not the PSU (even
in the "offending" case we are well below its maximum power), nor is it
related to temperature (it happens even when GPU temperatures are around
55-60 °C). The vendor ran some tests and, according to them, the hardware
seems to be fine. Do you (or anyone else on the list) have any particular
test to suggest that could help diagnose the problem more specifically?


Any opinion is appreciated,

thanks

On Wednesday, 21 August 2019, Szilárd Páll <pall.szilard at gmail.com> wrote:

> Hi Stefano,
>
>
> On Tue, Aug 20, 2019 at 3:29 PM Stefano Guglielmo
> <stefano.guglielmo at unito.it> wrote:
> >
> > Dear Szilard,
> >
> > thanks for the very clear answer.
> > Following your suggestion I tried to run without DD; for the same system
> > I ran two simulations on two GPUs:
> >
> > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
> > -gputasks 00 -pin on -pinoffset 0 -pinstride 1
> >
> > gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
> > -gputasks 11 -pin on -pinoffset 28 -pinstride 1
> >
> > but again the system crashed; by this I mean that after a few minutes the
> > machine powers off without any error message, even without using all the
> > threads.
>
> That is not normal and I strongly recommend investigating it as it
> could be a sign of an underlying system/hardware instability or fault
> which could ultimately lead to incorrect simulation results.
>
> Are you sure that:
> - your machine is stable and reliable at high loads; is the PSU sufficient?
> - your hardware has been thoroughly stress-tested and it does not show
> instabilities?
>
> Does the crash also happen with GROMACS running on the CPU only (using
> all cores)?
> I'd recommend running some stress-tests that fully load the machine
> for a few hours to see if the error persists.
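>
> A possible starting point (just a sketch, assuming the commonly used
> stress-ng and gpu-burn tools are available; adjust worker counts and
> durations to your machine):
>
>   # load CPU and memory for a couple of hours
>   stress-ng --cpu 64 --vm 4 --vm-bytes 4G --timeout 2h
>   # sustained load on the GPUs (duration in seconds)
>   ./gpu_burn 7200
>
> Watching power draw and temperatures with nvidia-smi while these run may
> help narrow down whether the shutdown is load- or power-related.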
>
> > I then tried running the two simulations on the same gpu without DD:
> >
> > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
> > -gputasks 00 -pin on -pinoffset 0 -pinstride 1
> >
> > gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
> > -gputasks 00 -pin on -pinoffset 28 -pinstride 1
> >
> > and I obtained better performance (about 70 ns/day) with heavy use of the
> > GPU (around 90%), compared to the two runs on two GPUs I reported in the
> > previous post
> > (gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1
> -gputasks
> > 0000000 -pin on -pinoffset 0 -pinstride 1
> >  gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1
> > -gputasks 1111111 -pin on -pinoffset 28 -pinstride 1).
>
> That is expected; domain-decomposition on a single GPU is unnecessary
> and introduces overheads that limit performance.
>
> > As for pinning, the CPU topology according to the log file is:
> > hardware topology: Basic
> >     Sockets, cores, and logical processors:
> >       Socket  0: [  0 32] [  1 33] [  2 34] [  3 35] [  4 36] [  5 37]
> >         [  6 38] [  7 39] [ 16 48] [ 17 49] [ 18 50] [ 19 51] [ 20 52]
> >         [ 21 53] [ 22 54] [ 23 55] [  8 40] [  9 41] [ 10 42] [ 11 43]
> >         [ 12 44] [ 13 45] [ 14 46] [ 15 47] [ 24 56] [ 25 57] [ 26 58]
> >         [ 27 59] [ 28 60] [ 29 61] [ 30 62] [ 31 63]
> > If I understand correctly (I'm absolutely not sure), it should not be that
> > advantageous to pin to consecutive threads,
>
> On the contrary, pinning to consecutive threads is the recommended
> behavior. More generally, application threads are expected to be
> pinned to consecutive cores (as threading parallelization will benefit
> from the resulting cache access patterns). Now, CPU cores can have
> multiple hardware threads, and whether it makes sense (performance-wise)
> to use one or multiple of them determines whether a stride of 1 or 2 is
> best. Typically, when most work is offloaded to a GPU and many CPU cores
> are available, 1 thread/core is best.
>
> Note that the above topology mapping simply means that the indexed
> entities that the operating system calls "CPUs", grouped in "[]",
> correspond to hardware threads of the same core, i.e. core 0 is [0 32],
> core 1 is [1 33], etc. Pinning with a stride happens within this map:
> - with -pinstride 1 the thread mapping will be (app thread -> hardware
> thread): 0->0, 1->32, 2->1, 3->33, ...
> - with -pinstride 2 the thread mapping will be (-||-): 0->0, 1->1, 2->2,
> 3->3, ...
>
> > and indeed I found a slight degradation of
> > performance for a single simulation, switching from:
> > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
> -gputasks
> > 00 -pin on
> > to
> > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
> -gputasks
> > 00 -pin on -pinoffset 0 -pinstride 1.
>
> If you compare the log files of the two, you should notice that the
> former used a pin stride of 2, resulting in the use of 28 cores, while
> the latter used only 14 cores; the likely reason for the small
> difference is that there is not enough CPU work to scale to 28 cores,
> and additionally these specific Threadripper CPUs are tricky to scale
> across using wide multi-threaded parallelization.
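>
> For comparison, the explicit form of the former case (28 threads, one per
> core) would look something like this (an untested sketch, using the same
> file names as above):
>
> gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
> -gputasks 00 -pin on -pinoffset 0 -pinstride 2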
>
> Cheers,
> --
> Szilárd
>
>
> >
> > Thanks again
> > Stefano
> >
> >
> >
> >
> > On Fri, 16 Aug 2019 at 17:48, Szilárd Páll <pall.szilard at gmail.com> wrote:
> >
> > > On Mon, Aug 5, 2019 at 5:00 PM Stefano Guglielmo
> > > <stefano.guglielmo at unito.it> wrote:
> > > >
> > > > Dear Paul,
> > > > thanks for the suggestions. Following them I managed to reach 91 ns/day
> > > > for the system I referred to in my previous post with the configuration:
> > > > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1
> > > -gputasks
> > > > 0000111 -pin on (still 28 threads seems to be the best choice)
> > > >
> > > > and 56 ns/day for two independent runs:
> > > > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1
> > > -gputasks
> > > > 0000000 -pin on -pinoffset 0 -pinstride 1
> > > > gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1
> > > -gputasks
> > > > 1111111 -pin on -pinoffset 28 -pinstride 1
> > > > which is a fairly good result.
> > >
> > > Use no DD in single-GPU runs, i.e. for the latter, just simply
> > > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
> > > -gputasks 00 -pin on -pinoffset 0 -pinstride 1
> > >
> > > You can also have mdrun's multidir functionality manage an ensemble of
> > > jobs (related or not) so you don't have to manually start, calculate
> > > pinning, etc.
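> > >
> > > For example, something along these lines (a rough sketch; it needs an
> > > MPI-enabled build, and the directory names are just placeholders):
> > >
> > > mpirun -np 2 gmx_mpi mdrun -multidir sim1 sim2 -deffnm run -nb gpu
> > > -pme gpu -pin on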
> > >
> > >
> > > > I am still wondering if somehow I should pin the threads in some
> > > different
> > > > way in order to reflect the cpu topology and if this can influence
> > > > performance (if I remember correctly, NAMD allows the user to indicate
> > > explicitly
> > > > the cpu core/threads to use in a computation).
> > >
> > > Your pinning does reflect the CPU topology -- the 4x7=28 threads are
> > > pinned to consecutive hardware threads (because of -pinstride 1, i.e.
> > > don't skip the second hardware thread of the core). The mapping of
> > > software to hardware threads happens based on the topology-based
> > > hardware thread indexing, see the hardware detection report in the log
> > > file.
> > >
> > > > When I tried to run two simulations with the following configuration:
> > > > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 8 -npme 1
> > > -gputasks
> > > > 00001111 -pin on -pinoffset 0 -pinstride 1
> > > > gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 4 -ntmpi 8 -npme 1
> > > -gputasks
> > > > 00001111 -pin on -pinoffset 0 -pinstride 32
> > > > the system crashed down. Probably this is normal and I am missing
> > > something
> > > > quite obvious.
> > >
> > > Not really. What do you mean by "crashed down"? The machine should not
> > > crash, nor should the simulation. Even though your machine has 32
> > > cores / 64 threads, using all of these may not always be beneficial as
> > > using more threads where there is too little work to scale will have
> > > an overhead. Have you tried using all cores but only 1 thread / core
> > > (i.e. 32 threads in total with pinstride 2)?
> > >
> > > Cheers,
> > > --
> > > Szilárd
> > >
> > > >
> > > > Thanks again for the valuable advice
> > > > Stefano
> > > >
> > > >
> > > >
> > > > On Sun, 4 Aug 2019 at 01:40, paul buscemi <pbuscemi at q.com> wrote:
> > > >
> > > > > Stefano,
> > > > >
> > > > > A recent run with 140000 atoms, including 10000 isopropanol molecules
> > > > > on top of an end-restrained PDMS surface of 74000 atoms in a
> > > > > 20 x 20 x 30 nm box, ran at 67 ns/day NVT with the mdrun conditions I
> > > > > posted. It took 120 ns for 100 molecules of an adsorbate to go from
> > > > > solution to the surface. I don't think this will set the world ablaze
> > > > > with any benchmarks, but it is acceptable for getting some work done.
> > > > >
> > > > > Linux Mint Mate 18, AMD Threadripper 2990WX (32 cores, 4.2 GHz), 32 GB
> > > > > DDR4, 2x RTX 2080 Ti, GROMACS 2019 in the simplest configuration for
> > > > > GPUs, CUDA version 10, Nvidia 410.7p loaded from the repository
> > > > >
> > > > > Paul
> > > > >
> > > > > > On Aug 3, 2019, at 12:58 PM, paul buscemi <pbuscemi at q.com> wrote:
> > > > > >
> > > > > > Stefano,
> > > > > >
> > > > > > Here is a typical run
> > > > > >
> > > > > > for minimization: gmx mdrun -deffnm grofile -nb gpu
> > > > > >
> > > > > > and for other runs on a 32-core machine:
> > > > > >
> > > > > > gmx mdrun -deffnm grofile.nvt -nb gpu -pme gpu -ntomp 8 -ntmpi 8
> > > > > > -npme 1 -gputasks 0000000011111111 -pin on
> > > > > >
> > > > > > Depending on the molecular system/model, -ntomp 4 -ntmpi 16 may be
> > > > > > faster - of course adjusting -gputasks accordingly
> > > > > >
> > > > > > Rarely do I find that not setting -ntomp and -ntmpi is faster, but it
> > > > > > is never bad
> > > > > >
> > > > > > Let me know how it goes.
> > > > > >
> > > > > > Paul
> > > > > >
> > > > > >> On Aug 3, 2019, at 4:41 AM, Stefano Guglielmo <stefano.guglielmo at unito.it> wrote:
> > > > > >>
> > > > > >> Hi Paul,
> > > > > >> thanks for the reply. Would you mind posting the command you used,
> > > > > >> or telling me how you balanced the work between CPU and GPU?
> > > > > >>
> > > > > >> What about pinning? Does anyone know how to deal with a cpu
> topology
> > > > > like
> > > > > >> the one reported in my previous post and if it is relevant for
> > > > > performance?
> > > > > >> Thanks
> > > > > >> Stefano
> > > > > >>
> > > > > >> On Saturday, 3 August 2019, Paul Buscemi <pbuscemi at q.com> wrote:
> > > > > >>
> > > > > >>> I run the same system and setup but no NVLink. Maestro runs both
> > > > > >>> GPUs at 100 percent. GROMACS is typically at 50-60 percent and can
> > > > > >>> do 600 ns/day on 20000 atoms
> > > > > >>>
> > > > > >>> PB
> > > > > >>>
> > > > > >>>> On Jul 25, 2019, at 9:30 PM, Kevin Boyd <kevin.boyd at uconn.edu> wrote:
> > > > > >>>>
> > > > > >>>> Hi,
> > > > > >>>>
> > > > > >>>> I've done a lot of research/experimentation on this, so I can
> > > maybe
> > > > > get
> > > > > >>> you
> > > > > >>>> started - if anyone has any questions about the essay to
> follow,
> > > feel
> > > > > >>> free
> > > > > >>>> to email me personally, and I'll link it to the email thread
> if it
> > > > > ends
> > > > > >>> up
> > > > > >>>> being pertinent.
> > > > > >>>>
> > > > > >>>> First, there are some more internet resources to check out. See
> > > > > >>>> Mark's talk at
> > > > > >>>> https://bioexcel.eu/webinar-performance-tuning-and-optimization-of-gromacs/
> > > > > >>>> Gromacs development moves fast, but a lot of it is still relevant.
> > > > > >>>>
> > > > > >>>> I'll expand a bit here, with the caveat that Gromacs GPU
> > > development
> > > > > is
> > > > > >>>> moving very fast and so the correct commands for optimal
> > > performance
> > > > > are
> > > > > >>>> both system-dependent and a moving target between versions.
> This
> > > is a
> > > > > >>> good
> > > > > >>>> thing - GPUs have revolutionized the field, and with each
> > > iteration we
> > > > > >>> make
> > > > > >>>> better use of them. The downside is that it's unclear exactly
> what
> > > > > sort
> > > > > >>> of
> > > > > >>>> CPU-GPU balance you should look to purchase to take advantage
> of
> > > > > future
> > > > > >>>> developments, though the trend is certainly that more and more
> > > > > >>> computation
> > > > > >>>> is being offloaded to the GPUs.
> > > > > >>>>
> > > > > >>>> The most important consideration is that to get maximum total
> > > > > throughput
> > > > > >>>> performance, you should be running not one but multiple
> > > simulations
> > > > > >>>> simultaneously. You can do this through the -multidir option,
> but
> > > I
> > > > > don't
> > > > > >>>> recommend that in this case, as it requires compiling with
> MPI and
> > > > > limits
> > > > > >>>> some of your options. My run scripts usually use "gmx mdrun
> ...
> > > &" to
> > > > > >>>> initiate subprocesses, with combinations of -ntomp, -ntmpi,
> -pin
> > > > > >>>> -pinoffset, and -gputasks. I can give specific examples if
> you're
> > > > > >>>> interested.
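> > > > > >>>>
> > > > > >>>> As a rough illustration of that pattern (a sketch only; the file
> > > > > >>>> names, thread counts, and offsets are placeholders for a two-GPU
> > > > > >>>> machine):
> > > > > >>>>
> > > > > >>>> #!/bin/bash
> > > > > >>>> # two independent runs, one per GPU, pinned to disjoint cores
> > > > > >>>> gmx mdrun -deffnm run1 -nb gpu -pme gpu -ntmpi 1 -ntomp 14 \
> > > > > >>>>     -gputasks 00 -pin on -pinoffset 0 -pinstride 1 &
> > > > > >>>> gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntmpi 1 -ntomp 14 \
> > > > > >>>>     -gputasks 11 -pin on -pinoffset 14 -pinstride 1 &
> > > > > >>>> wait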
> > > > > >>>>
> > > > > >>>> Another important point is that you can run more simulations
> than
> > > the
> > > > > >>>> number of GPUs you have. Depending on CPU-GPU balance and
> > > quality, you
> > > > > >>>> won't double your throughput by e.g. putting 4 simulations on
> 2
> > > GPUs,
> > > > > but
> > > > > >>>> you might increase it up to 1.5x. This would involve
> targeting the
> > > > > same
> > > > > >>> GPU
> > > > > >>>> with -gputasks.
> > > > > >>>>
> > > > > >>>> Within a simulation, you should set up a benchmarking script
> to
> > > figure
> > > > > >>> out
> > > > > >>>> the best combination of thread-mpi ranks and open-mp threads -
> > > this
> > > > > can
> > > > > >>>> have pretty drastic effects on performance. For example, if
> you
> > > want
> > > > > to
> > > > > >>> use
> > > > > >>>> your entire machine for one simulation (not recommended for
> > > maximal



-- 
Stefano GUGLIELMO PhD
Assistant Professor of Medicinal Chemistry
Department of Drug Science and Technology
Via P. Giuria 9
10125 Turin, ITALY
ph. +39 (0)11 6707178

