[gmx-users] simulation on 2 gpus

Szilárd Páll pall.szilard at gmail.com
Fri Sep 6 17:57:01 CEST 2019


On Fri, Sep 6, 2019 at 3:47 PM Stefano Guglielmo
<stefano.guglielmo at unito.it> wrote:
>
> Hi Szilard,
>
> thanks for suggestions.
>
>
> As for the strange crash, the workstation works fine using only the CPU;
> the problem seems to be related to GPU usage: when both cards draw roughly
> 200 W out of their 250 W limit, the workstation turns off. It is not the
> PSU (even in the "offending" case we are well below its maximum power),

How far below? Note that PSU efficiency and quality also affect
stability at high load.

> nor is it related to temperature (it happens even when GPU temperature is
> around 55-60 °C). The vendor ran some tests and, according to them, the
> hardware seems to be fine. Do you (or anyone else on the list) have any
> particular test to suggest that could help diagnose the problem more
> specifically?

I suggest the following tool for GPU load and memory stress testing:
https://github.com/ComputationalRadiationPhysics/cuda_memtest
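
For illustration, a minimal sketch of how one might build and run it (the
build steps and flags here are assumptions from memory; check the README
and ./cuda_memtest --help):

  git clone https://github.com/ComputationalRadiationPhysics/cuda_memtest
  cd cuda_memtest && mkdir build && cd build && cmake .. && make
  # run the stress test on each GPU for a few hours, one process per card
  ./cuda_memtest --stress --device 0 &
  ./cuda_memtest --stress --device 1 &
  wait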

Cheers,
--

Szilárd

>
> Any opinion is appreciated,
>
> thanks
>
> On Wednesday, 21 August 2019, Szilárd Páll <pall.szilard at gmail.com>
> wrote:
>
> > Hi Stefano,
> >
> >
> > On Tue, Aug 20, 2019 at 3:29 PM Stefano Guglielmo
> > <stefano.guglielmo at unito.it> wrote:
> > >
> > > Dear Szilard,
> > >
> > > thanks for the very clear answer.
> > > Following your suggestion I tried to run without DD; for the same
> > > system I ran two simulations on two GPUs:
> > >
> > > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
> > > -gputasks 00 -pin on -pinoffset 0 -pinstride 1
> > >
> > > gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
> > > -gputasks 11 -pin on -pinoffset 28 -pinstride 1
> > >
> > > but again the system crashed; by this I mean that after a few minutes
> > > the machine powers off without any error message, even without using
> > > all the threads.
> >
> > That is not normal and I strongly recommend investigating it as it
> > could be a sign of an underlying system/hardware instability or fault
> > which could ultimately lead to incorrect simulation results.
> >
> > Are you sure that:
> > - your machine is stable and reliable at high loads; is the PSU sufficient?
> > - your hardware has been thoroughly stress-tested and it does not show
> > instabilities?
> >
> > Does the crash also happen with GROMACS running on the CPU only (using
> > all cores)?
> > I'd recommend running some stress-tests that fully load the machine
> > for a few hours to see if the error persists.
> >
> > > I then tried running the two simulations on the same gpu without DD:
> > >
> > > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
> > > -gputasks 00 -pin on -pinoffset 0 -pinstride 1
> > >
> > > gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0
> > > -gputasks 00 -pin on -pinoffset 28 -pinstride 1
> > >
> > > and I obtained better performance (about 70 ns/day) with heavy use of
> > > the GPU (around 90%), compared to the two runs on two GPUs I reported
> > > in the previous post:
> > > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1 -gputasks 0000000 -pin on -pinoffset 0 -pinstride 1
> > > gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1 -gputasks 1111111 -pin on -pinoffset 28 -pinstride 1
> >
> > That is expected; domain-decomposition on a single GPU is unnecessary
> > and introduces overheads that limit performance.
> >
> > > As for pinning, the CPU topology according to the log file is:
> > > hardware topology: Basic
> > >     Sockets, cores, and logical processors:
> > >       Socket  0: [  0 32] [  1 33] [  2 34] [  3 35] [  4 36] [  5 37]
> > >                  [  6 38] [  7 39] [ 16 48] [ 17 49] [ 18 50] [ 19 51]
> > >                  [ 20 52] [ 21 53] [ 22 54] [ 23 55] [  8 40] [  9 41]
> > >                  [ 10 42] [ 11 43] [ 12 44] [ 13 45] [ 14 46] [ 15 47]
> > >                  [ 24 56] [ 25 57] [ 26 58] [ 27 59] [ 28 60] [ 29 61]
> > >                  [ 30 62] [ 31 63]
> > > If I understand correctly (absolutely not sure), it should not be that
> > > convenient to pin to consecutive threads,
> >
> > On the contrary, pinning to consecutive threads is the recommended
> > behavior. More generally, application threads are expected to be
> > pinned to consecutive cores (as threading parallelization will benefit
> > from the resulting cache access patterns). Now, CPU cores can have
> > multiple hardware threads, and whether it makes sense (performance-wise)
> > to use both hardware threads of a core or only one determines whether a
> > pin stride of 1 or 2 is best. Typically, when most work is offloaded to
> > a GPU and many CPU cores are available, 1 thread/core is best.
> >
> > Note that the above topology map simply means that the indexed
> > entities the operating system calls "CPU", grouped in "[]",
> > correspond to hardware threads of the same core, i.e. core 0 is [0
> > 32], core 1 is [1 33], etc. Pinning with a stride happens into this map:
> > - with -pinstride 1 the thread mapping (app thread -> hardware
> > thread) will be: 0->0, 1->32, 2->1, 3->33, ...
> > - with -pinstride 2 the thread mapping will be: 0->0, 1->1, 2->2,
> > 3->3, ...
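> >
> > For illustration, a minimal sketch of what the two strides look like on
> > this CPU (the file name is hypothetical):
> >
> >   # 28 threads packed onto both hardware threads of cores 0-13:
> >   gmx mdrun -deffnm run -ntomp 28 -pin on -pinoffset 0 -pinstride 1
> >   # 28 threads, one per core, on cores 0-27:
> >   gmx mdrun -deffnm run -ntomp 28 -pin on -pinoffset 0 -pinstride 2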
> >
> > > and indeed I found a slight degradation of performance for a single
> > > simulation, switching from:
> > > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0 -gputasks 00 -pin on
> > > to
> > > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0 -gputasks 00 -pin on -pinoffset 0 -pinstride 1.
> >
> > If you compare the log files of the two, you should notice that the
> > former used a pin stride of 2, resulting in the use of 28 cores, while
> > the latter used only 14 cores. The likely reason for only a small
> > difference is that there is not enough CPU work to scale to 28 cores;
> > additionally, these specific Threadripper CPUs are tricky to scale
> > across using wide multi-threaded parallelization.
> >
> > Cheers,
> > --
> > Szilárd
> >
> >
> > >
> > > Thanks again
> > > Stefano
> > >
> > >
> > >
> > >
> > > On Fri, 16 Aug 2019 at 17:48, Szilárd Páll <
> > > pall.szilard at gmail.com> wrote:
> > >
> > > > On Mon, Aug 5, 2019 at 5:00 PM Stefano Guglielmo
> > > > <stefano.guglielmo at unito.it> wrote:
> > > > >
> > > > > Dear Paul,
> > > > > thanks for the suggestions. Following them I managed to reach 91
> > > > > ns/day for the system I referred to in my previous post with the
> > > > > configuration:
> > > > > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1 -gputasks 0000111 -pin on
> > > > > (still 28 threads seems to be the best choice)
> > > > >
> > > > > and 56 ns/day for two independent runs:
> > > > > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1 -gputasks 0000000 -pin on -pinoffset 0 -pinstride 1
> > > > > gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 4 -ntmpi 7 -npme 1 -gputasks 1111111 -pin on -pinoffset 28 -pinstride 1
> > > > > which is a fairly good result.
> > > >
> > > > Don't use DD in single-GPU runs; for the latter, simply:
> > > > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 28 -ntmpi 1 -npme 0 -gputasks 00 -pin on -pinoffset 0 -pinstride 1
> > > >
> > > > You can also have mdrun's -multidir functionality manage an ensemble
> > > > of jobs (related or not) so you don't have to start them manually,
> > > > calculate pinning offsets, etc.
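> > > >
> > > > For illustration, a minimal sketch of such an ensemble run (the
> > > > directory names are hypothetical, and this requires an MPI build
> > > > of GROMACS):
> > > >
> > > >   # one run.tpr in each directory; mdrun distributes the ranks and
> > > >   # handles GPU assignment and pinning across the two members itself
> > > >   mpirun -np 2 gmx_mpi mdrun -multidir sim1 sim2 -deffnm run -nb gpu -pme gpu -pin on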
> > > >
> > > >
> > > > > I am still wondering if I should somehow pin the threads in a
> > > > > different way to reflect the CPU topology, and whether this can
> > > > > influence performance (if I remember correctly, NAMD allows the user
> > > > > to indicate explicitly the CPU cores/threads to use in a computation).
> > > >
> > > > Your pinning does reflect the CPU topology -- the 4x7=28 threads are
> > > > pinned to consecutive hardware threads (because of -pinstride 1, i.e.
> > > > don't skip the second hardware thread of the core). The mapping of
> > > > software to hardware threads happens based on the topology-based
> > > > hardware thread indexing; see the hardware detection report in the log
> > > > file.
> > > >
> > > > > When I tried to run two simulations with the following configuration:
> > > > > gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 4 -ntmpi 8 -npme 1 -gputasks 00001111 -pin on -pinoffset 0 -pinstride 1
> > > > > gmx mdrun -deffnm run2 -nb gpu -pme gpu -ntomp 4 -ntmpi 8 -npme 1 -gputasks 00001111 -pin on -pinoffset 0 -pinstride 32
> > > > > the system crashed down. Probably this is normal and I am missing
> > > > > something quite obvious.
> > > >
> > > > Not really. What do you mean by "crashed down"? Neither the machine
> > > > nor the simulation should crash. Even though your machine has 32
> > > > cores / 64 threads, using all of them may not always be beneficial:
> > > > using more threads where there is too little work to scale has an
> > > > overhead. Have you tried using all cores but only 1 thread / core
> > > > (i.e. 32 threads in total with pinstride 2)?
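> > > >
> > > > For instance, something along these lines (a sketch based on your
> > > > earlier commands; adjust -gputasks as needed):
> > > >
> > > >   gmx mdrun -deffnm run -nb gpu -pme gpu -ntomp 32 -ntmpi 1 -npme 0 -gputasks 00 -pin on -pinoffset 0 -pinstride 2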
> > > >
> > > > Cheers,
> > > > --
> > > > Szilárd
> > > >
> > > > >
> > > > > Thanks again for the valuable advices
> > > > > Stefano
> > > > >
> > > > >
> > > > >
> > > > > On Sun, 4 Aug 2019 at 01:40, paul buscemi <pbuscemi at q.com>
> > > > > wrote:
> > > > >
> > > > > > Stefano,
> > > > > >
> > > > > > A recent run with 140000 atoms, including 10000 isopropanol
> > > > > > molecules on top of an end-restrained PDMS surface of 74000 atoms
> > > > > > in a 20 x 20 x 30 nm box, ran at 67 ns/day NVT with the mdrun
> > > > > > conditions I posted. It took 120 ns for 100 molecules of an
> > > > > > adsorbate to go from solution to the surface. I don't think this
> > > > > > will set the world ablaze with any benchmarks, but it is
> > > > > > acceptable to get some work done.
> > > > > >
> > > > > > Linux Mint Mate 18, AMD Threadripper 2990WX 32-core @ 4.2 GHz,
> > > > > > 32 GB DDR4, 2x RTX 2080 Ti, GROMACS 2019 in the simplest
> > > > > > configuration for GPUs, CUDA version 10, NVIDIA driver 410.7p
> > > > > > loaded from the repository
> > > > > >
> > > > > > Paul
> > > > > >
> > > > > > > On Aug 3, 2019, at 12:58 PM, paul buscemi <pbuscemi at q.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > Stefano,
> > > > > > >
> > > > > > > Here is a typical run
> > > > > > >
> > > > > > > for minimization: gmx mdrun -deffnm grofile -nb gpu
> > > > > > >
> > > > > > > and for other runs, on a 32-core machine:
> > > > > > >
> > > > > > > gmx mdrun -deffnm grofile.nvt -nb gpu -pme gpu -ntomp 8 -ntmpi 8 -npme 1 -gputasks 0000000011111111 -pin on
> > > > > > >
> > > > > > > Depending on the molecular system/model, -ntomp 4 -ntmpi 16 may
> > > > > > > be faster, of course adjusting -gputasks accordingly.
> > > > > > >
> > > > > > > Rarely do I find that not setting -ntomp and -ntmpi is faster,
> > > > > > > but it is never bad.
> > > > > > >
> > > > > > > Let me know how it goes.
> > > > > > >
> > > > > > > Paul
> > > > > > >
> > > > > > >> On Aug 3, 2019, at 4:41 AM, Stefano Guglielmo
> > > > > > >> <stefano.guglielmo at unito.it> wrote:
> > > > > > >>
> > > > > > >> Hi Paul,
> > > > > > >> thanks for the reply. Would you mind posting the command you
> > > > > > >> used, or telling me how you balanced the work between CPU and
> > > > > > >> GPU?
> > > > > > >>
> > > > > > >> What about pinning? Does anyone know how to deal with a CPU
> > > > > > >> topology like the one reported in my previous post, and whether
> > > > > > >> it is relevant for performance?
> > > > > > >> Thanks
> > > > > > >> Stefano
> > > > > > >>
> > > > > > >> On Saturday, 3 August 2019, Paul Buscemi <pbuscemi at q.com>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >>> I run the same system and setup but no NVLink. Maestro runs
> > > > > > >>> both GPUs at 100 percent; GROMACS, typically at 50-60 percent,
> > > > > > >>> can do 600 ns/day on 20000 atoms.
> > > > > > >>>
> > > > > > >>> PB
> > > > > > >>>
> > > > > > >>>> On Jul 25, 2019, at 9:30 PM, Kevin Boyd
> > > > > > >>>> <kevin.boyd at uconn.edu> wrote:
> > > > > > >>>>
> > > > > > >>>> Hi,
> > > > > > >>>>
> > > > > > >>>> I've done a lot of research/experimentation on this, so I can
> > > > > > >>>> maybe get you started. If anyone has any questions about the
> > > > > > >>>> essay to follow, feel free to email me personally, and I'll
> > > > > > >>>> link it to the email thread if it ends up being pertinent.
> > > > > > >>>>
> > > > > > >>>> First, there are some more internet resources to check out.
> > > > > > >>>> See Mark's talk at:
> > > > > > >>>> https://bioexcel.eu/webinar-performance-tuning-and-optimization-of-gromacs/
> > > > > > >>>> GROMACS development moves fast, but a lot of it is still
> > > > > > >>>> relevant.
> > > > > > >>>>
> > > > > > >>>> I'll expand a bit here, with the caveat that GROMACS GPU
> > > > > > >>>> development is moving very fast, and so the correct commands
> > > > > > >>>> for optimal performance are both system-dependent and a
> > > > > > >>>> moving target between versions. This is a good thing - GPUs
> > > > > > >>>> have revolutionized the field, and with each iteration we
> > > > > > >>>> make better use of them. The downside is that it's unclear
> > > > > > >>>> exactly what sort of CPU-GPU balance you should look to
> > > > > > >>>> purchase to take advantage of future developments, though
> > > > > > >>>> the trend is certainly that more and more computation is
> > > > > > >>>> being offloaded to the GPUs.
> > > > > > >>>>
> > > > > > >>>> The most important consideration is that to get maximum total
> > > > > > >>>> throughput performance, you should be running not one but
> > > > > > >>>> multiple simulations simultaneously. You can do this through
> > > > > > >>>> the -multidir option, but I don't recommend that in this
> > > > > > >>>> case, as it requires compiling with MPI and limits some of
> > > > > > >>>> your options. My run scripts usually use "gmx mdrun ... &" to
> > > > > > >>>> initiate subprocesses, with combinations of -ntomp, -ntmpi,
> > > > > > >>>> -pin, -pinoffset, and -gputasks. I can give specific examples
> > > > > > >>>> if you're interested.
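> > > > > > >>>>
> > > > > > >>>> For illustration, a minimal sketch of such a script (the
> > > > > > >>>> -deffnm names are hypothetical; two runs, one per GPU, on a
> > > > > > >>>> 32-core machine):
> > > > > > >>>>
> > > > > > >>>>   gmx mdrun -deffnm sim1 -nb gpu -pme gpu -ntomp 16 -ntmpi 1 -npme 0 -gputasks 00 -pin on -pinoffset 0 -pinstride 1 &
> > > > > > >>>>   gmx mdrun -deffnm sim2 -nb gpu -pme gpu -ntomp 16 -ntmpi 1 -npme 0 -gputasks 11 -pin on -pinoffset 16 -pinstride 1 &
> > > > > > >>>>   wait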
> > > > > > >>>>
> > > > > > >>>> Another important point is that you can run more simulations
> > > > > > >>>> than the number of GPUs you have. Depending on CPU-GPU
> > > > > > >>>> balance and quality, you won't double your throughput by
> > > > > > >>>> e.g. putting 4 simulations on 2 GPUs, but you might increase
> > > > > > >>>> it up to 1.5x. This would involve targeting the same GPU
> > > > > > >>>> with -gputasks.
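> > > > > > >>>>
> > > > > > >>>> Again only as a sketch (hypothetical names), two of four
> > > > > > >>>> runs could share each GPU:
> > > > > > >>>>
> > > > > > >>>>   gmx mdrun -deffnm sim1 -nb gpu -pme gpu -ntomp 8 -ntmpi 1 -npme 0 -gputasks 00 -pin on -pinoffset 0 -pinstride 1 &
> > > > > > >>>>   gmx mdrun -deffnm sim2 -nb gpu -pme gpu -ntomp 8 -ntmpi 1 -npme 0 -gputasks 00 -pin on -pinoffset 8 -pinstride 1 &
> > > > > > >>>>   gmx mdrun -deffnm sim3 -nb gpu -pme gpu -ntomp 8 -ntmpi 1 -npme 0 -gputasks 11 -pin on -pinoffset 16 -pinstride 1 &
> > > > > > >>>>   gmx mdrun -deffnm sim4 -nb gpu -pme gpu -ntomp 8 -ntmpi 1 -npme 0 -gputasks 11 -pin on -pinoffset 24 -pinstride 1 &
> > > > > > >>>>   wait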
> > > > > > >>>>
> > > > > > >>>> Within a simulation, you should set up a benchmarking script
> > > > > > >>>> to figure out the best combination of thread-MPI ranks and
> > > > > > >>>> OpenMP threads - this can have pretty drastic effects on
> > > > > > >>>> performance. For example, if you want to use your entire
> > > > > > >>>> machine for one simulation (not recommended for maximal
> > > > > > >>>
> > > > > > >>
> > > > > > >>
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > >
> > >
> > >
>
>
>
> --
> Stefano GUGLIELMO PhD
> Assistant Professor of Medicinal Chemistry
> Department of Drug Science and Technology
> Via P. Giuria 9
> 10125 Turin, ITALY
> ph. +39 (0)11 6707178

