[gmx-users] simulation on 2 gpus

Mark Abraham mark.j.abraham at gmail.com
Fri Jul 26 14:57:37 CEST 2019


Hi,

It's rather like the example at
http://manual.gromacs.org/current/user-guide/mdrun-performance.html#examples-for-mdrun-on-one-node
where
instead of

gmx mdrun -nt 6 -pin on -pinoffset 0 -pinstride 1
gmx mdrun -nt 6 -pin on -pinoffset 6 -pinstride 1

to run on a machine with 12 hardware threads, you want to adapt the number
of threads and also specify disjoint GPU sets, e.g.

gmx mdrun -nt 32 -pin on -pinoffset 0 -pinstride 1 -gpu_id 0
gmx mdrun -nt 32 -pin on -pinoffset 32 -pinstride 1 -gpu_id 1

That lets mdrun choose the mix of thread-MPI ranks vs OpenMP threads on
those ranks, but you could replace -nt 32 with -ntmpi N -ntomp M so long as
the product of N and M is 32.
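
For example, one explicit split might be (just an illustration; the best
decomposition is system-dependent and worth benchmarking):

gmx mdrun -ntmpi 4 -ntomp 8 -pin on -pinoffset 0 -pinstride 1 -gpu_id 0
gmx mdrun -ntmpi 4 -ntomp 8 -pin on -pinoffset 32 -pinstride 1 -gpu_id 1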

Mark

On Fri, 26 Jul 2019 at 14:22, Gregory Man Kai Poon <gpoon at gsu.edu> wrote:

> Hi Kevin,
> Thanks for your very useful post.  Could you give a few command line
> examples of how to start multiple runs at different times (e.g., allocate a
> subset of the CPUs/GPUs to one run, and start another run later using a
> subset of the yet-unallocated CPUs/GPUs)?  Also, could you elaborate on the
> drawbacks of the MPI compilation that you hinted at?
> Gregory
>
> From: Kevin Boyd <kevin.boyd at uconn.edu>
> Sent: Thursday, July 25, 2019 10:31 PM
> To: gmx-users at gromacs.org
> Subject: Re: [gmx-users] simulation on 2 gpus
>
> Hi,
>
> I've done a lot of research/experimentation on this, so I can maybe get you
> started - if anyone has any questions about the essay to follow, feel free
> to email me personally, and I'll link it to the email thread if it ends up
> being pertinent.
>
> First, there are some more internet resources to check out. See Mark's talk
> at https://bioexcel.eu/webinar-performance-tuning-and-optimization-of-gromacs/
> Gromacs development moves fast, but a lot of it is still relevant.
>
> I'll expand a bit here, with the caveat that Gromacs GPU development is
> moving very fast and so the correct commands for optimal performance are
> both system-dependent and a moving target between versions. This is a good
> thing - GPUs have revolutionized the field, and with each iteration we make
> better use of them. The downside is that it's unclear exactly what sort of
> CPU-GPU balance you should look to purchase to take advantage of future
> developments, though the trend is certainly that more and more computation
> is being offloaded to the GPUs.
>
> The most important consideration is that to get maximum total throughput
> performance, you should be running not one but multiple simulations
> simultaneously. You can do this through the -multidir option, but I don't
> recommend that in this case, as it requires compiling with MPI and limits
> some of your options. My run scripts usually use "gmx mdrun ... &" to
> initiate subprocesses, with combinations of -ntomp, -ntmpi, -pin,
> -pinoffset, and -gputasks. I can give specific examples if you're
> interested; a rough sketch follows below.
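>
> For instance, a minimal sketch for two concurrent runs on a 2-GPU, 64-thread
> box like the one in this thread (sim1/sim2 are placeholder run names; thread
> counts and offsets must be adapted to your hardware):
>
> gmx mdrun -deffnm sim1 -nb gpu -pme gpu -ntmpi 1 -ntomp 16 -pin on -pinoffset 0 -pinstride 1 -gputasks 00 &
> gmx mdrun -deffnm sim2 -nb gpu -pme gpu -ntmpi 1 -ntomp 16 -pin on -pinoffset 16 -pinstride 1 -gputasks 11 &
>
> Because each run only claims 16 of the 64 hardware threads, you can start
> further runs later on the remaining, still-unpinned threads.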
>
> Another important point is that you can run more simulations than the
> number of GPUs you have. Depending on CPU-GPU balance and quality, you
> won't double your throughput by e.g. putting 4 simulations on 2 GPUs, but
> you might increase it up to 1.5x. This would involve targeting the same GPU
> with -gputasks.
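>
> Continuing that sketch, four runs sharing the two GPUs could look like this
> (again only an illustration, with placeholder names):
>
> gmx mdrun -deffnm sim1 -nb gpu -pme gpu -ntmpi 1 -ntomp 16 -pin on -pinoffset 0 -pinstride 1 -gputasks 00 &
> gmx mdrun -deffnm sim2 -nb gpu -pme gpu -ntmpi 1 -ntomp 16 -pin on -pinoffset 16 -pinstride 1 -gputasks 00 &
> gmx mdrun -deffnm sim3 -nb gpu -pme gpu -ntmpi 1 -ntomp 16 -pin on -pinoffset 32 -pinstride 1 -gputasks 11 &
> gmx mdrun -deffnm sim4 -nb gpu -pme gpu -ntmpi 1 -ntomp 16 -pin on -pinoffset 48 -pinstride 1 -gputasks 11 &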
>
> Within a simulation, you should set up a benchmarking script to figure out
> the best combination of thread-MPI ranks and OpenMP threads - this can
> have pretty drastic effects on performance. For example, if you want to use
> your entire machine for one simulation (not recommended for maximal
> efficiency), you have a lot of decomposition options (ignoring PME - which
> is important, see below):
>
> -ntmpi 2 -ntomp 32 -gputasks 01
> -ntmpi 4 -ntomp 16 -gputasks 0011
> -ntmpi 8 -ntomp 8  -gputasks 00001111
> -ntmpi 16 -ntomp 4 -gputasks 0000000011111111
> (and a few others - note that ntmpi * ntomp = total threads available)
>
> In my experience, you need to scan the options in a benchmarking script for
> each simulation size/content you want to simulate, and the difference
> between the best and the worst can be up to a factor of 2-4 in terms of
> performance. If you're splitting your machine among multiple simulations, I
> suggest running 1 mpi thread (-ntmpi 1) per simulation, unless your
> benchmarking suggests that the optimal performance lies elsewhere.
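>
> As a concrete sketch of such a scan (topol.tpr and the output names are
> placeholders; add the matching -gputasks string from the list above if you
> want explicit GPU mapping), short runs with the counters reset halfway give
> reasonably stable timings:
>
> for ntmpi in 2 4 8 16; do
>     ntomp=$((64 / ntmpi))
>     # short benchmark run; -resethway excludes startup/load-balancing from the timing
>     gmx mdrun -s topol.tpr -deffnm bench_ntmpi${ntmpi} \
>         -ntmpi $ntmpi -ntomp $ntomp \
>         -nsteps 20000 -resethway -noconfout
> done
>
> and then compare the ns/day reported at the end of each log file.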
>
> Things get more complicated when you start putting PME on the GPUs. For the
> machines I work on, putting PME on GPUs absolutely improves performance,
> but I'm not fully confident in that assessment without testing your
> specific machine - you have a lot of cores with that threadripper, and this
> is another area where I expect Gromacs 2020 might shift the GPU-CPU optimal
> balance.
>
> The issue with PME on GPUs is that we can (currently) only have one rank
> doing GPU PME work. So, on a machine with say 20 cores and 2 GPUs, if I run
> the following
>
> gmx mdrun .... -ntomp 10 -ntmpi 2 -pme gpu -npme 1 -gputasks 01
>
> then two ranks will be started - one, with cores 0-9, will work on the
> short-range interactions, offloading where it can to GPU 0, while the PME
> rank (cores 10-19) will offload to GPU 1. There is one significant problem
> (and one minor problem) with this setup. First, it is massively inefficient
> in terms of load balance. In a typical system (there are exceptions), PME
> takes up ~1/3 as much computation as the short-range interactions do, i.e.
> ~1/4 of the total. So, we are offloading 1/4 of our interactions to one GPU
> and 3/4 to the other, which leads to imbalance. In this specific case (2
> GPUs and sufficient cores), the optimal solution is often (but not always)
> to run with -ntmpi 4 (in this example, then -ntomp 5), as the PME rank then
> gets 1/4 of the GPU work, proportional to the computation needed.
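>
> For instance (a sketch, not a tested recipe), on that 20-core, 2-GPU machine
>
> gmx mdrun .... -ntmpi 4 -ntomp 5 -pme gpu -npme 1 -gputasks 0011
>
> puts two PP ranks on GPU 0 and the remaining PP rank plus the PME rank on
> GPU 1, so each GPU ends up with roughly half of the offloaded work.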
>
> The second (less critical - don't worry about this unless you're
> CPU-limited) problem is that PME-GPU mpi ranks only use 1 CPU core in their
> calculations. So, with a node of 20 cores and 2 GPUs, if I run a simulation
> with -ntmpi 4 -ntomp 5 -pme gpu -npme 1, each one of those ranks
> will have 5 CPUs, but the PME rank will only use one of them. You can
> specify the number of PME cores per rank with -ntomp_pme. This is useful in
> restricted cases. For example, given the above architecture setup (20
> cores, 2 GPUs), I could maximally exploit my CPUs with the following
> commands:
>
> gmx mdrun .... -ntmpi 4 -ntomp 3 -ntomp_pme 1 -pme gpu -npme 1 -gputasks 0000 -pin on -pinoffset 0 &
> gmx mdrun .... -ntmpi 4 -ntomp 3 -ntomp_pme 1 -pme gpu -npme 1 -gputasks 1111 -pin on -pinoffset 10
>
> where the first 10 cores are (0-2 PP) (3-5 PP) (6-8 PP) (9 PME),
> and similarly for the other 10 cores.
>
> There are a few other parameters to scan for minor improvements - for
> example nstlist, which I typically scan in a range between 80 and 140 for
> GPU simulations, with an effect of around 2-5% on performance.
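>
> If you want to scan nstlist without regenerating tpr files, it can be
> overridden on the mdrun command line (the value here is just an example):
>
> gmx mdrun .... -nstlist 100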
>
> I'm happy to expand the discussion with anyone who's interested.
>
> Kevin
>
>
> On Thu, Jul 25, 2019 at 1:47 PM Stefano Guglielmo <stefano.guglielmo at unito.it> wrote:
>
> > Dear all,
> > I am trying to run simulations with Gromacs 2019.2 on a workstation with
> > an AMD Threadripper cpu (32 cores, 64 threads), 128 GB RAM and two rtx
> > 2080 ti cards with an nvlink bridge. I read the user's guide section on
> > performance and I am exploring some possible combinations of cpu/gpu work
> > to run as fast as possible. I was wondering if some of you have experience
> > of running on more than one gpu with several cores and can give some hints
> > as a starting point.
> > Thanks
> > Stefano
> >
> >
> > --
> > Stefano GUGLIELMO PhD
> > Assistant Professor of Medicinal Chemistry
> > Department of Drug Science and Technology
> > Via P. Giuria 9
> > 10125 Turin, ITALY
> > ph. +39 (0)11 6707178