[gmx-users] simulation on 2 gpus

Kevin Boyd kevin.boyd at uconn.edu
Fri Jul 26 14:59:25 CEST 2019


Sure - you can do it 2 ways with normal Gromacs. Either run the simulations
in separate terminals, or use ampersands to run them in the background of 1
terminal.

I'll give a concrete example for your threadripper, using 32 of your cores,
so that you could run some other computation on the other 32. I typically
make a bash variable with all the common arguments.

Given tprs run1.tpr ...run4.tpr

gmx_common="gmx mdrun -ntomp 8 -ntmpi 1 -pme gpu -nb gpu -pin on -pinstride 1"
$gmx_common -deffnm run1 -pinoffset 32 -gputasks 00 &
$gmx_common -deffnm run2 -pinoffset 40 -gputasks 00 &
$gmx_common -deffnm run3 -pinoffset 48 -gputasks 11 &
$gmx_common -deffnm run4 -pinoffset 56 -gputasks 11

So run1 will run on cores 32-39 on GPU 0, run2 on cores 40-47 on the same
GPU, and the other two runs will use GPU 1. Note the ampersands on the first
three runs, so they'll go off in the background; the last run keeps the
terminal in the foreground.

I should also have mentioned one peculiarity of running with -ntmpi 1 and
-pme gpu: even though there's only one rank (with nonbonded and PME both
running on it), you still need two GPU tasks for that one rank, one for
each type of interaction.
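
For example, the run1 line above expands to a single rank carrying two GPU
tasks (nonbonded and PME), both mapped to GPU 0:

# one thread-MPI rank, but two entries in -gputasks: nonbonded and PME
gmx mdrun -ntomp 8 -ntmpi 1 -pme gpu -nb gpu -pin on -pinstride 1 \
    -deffnm run1 -pinoffset 32 -gputasks 00 &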

As for multidir, I forget what troubles I ran into exactly, but I was
unable to run some subset of simulations. Anyhow, if you aren't running on a
cluster, I see no reason to compile with MPI, have to launch through srun or
mpirun, and have to use gmx_mpi rather than gmx. The built-in thread-MPI
gives you up to 64 threads and can have a minor (<5% in my experience)
performance benefit over MPI.

Kevin

On Fri, Jul 26, 2019 at 8:21 AM Gregory Man Kai Poon <gpoon at gsu.edu> wrote:

> Hi Kevin,
> Thanks for your very useful post.  Could you give a few command line
> examples on how to start multiple runs at different times (e.g., allocate a
> subset of CPU/GPU to one run, and start another run later using another
> subset of yet-unallocated CPU/GPU).  Also, could you elaborate on the
> drawbacks of the MPI compilation that you hinted at?
> Gregory
>
> From: Kevin Boyd <kevin.boyd at uconn.edu>
> Sent: Thursday, July 25, 2019 10:31 PM
> To: gmx-users at gromacs.org
> Subject: Re: [gmx-users] simulation on 2 gpus
>
> Hi,
>
> I've done a lot of research/experimentation on this, so I can maybe get you
> started - if anyone has any questions about the essay to follow, feel free
> to email me personally, and I'll link it to the email thread if it ends up
> being pertinent.
>
> First, there are some more internet resources to check out. See Mark's talk
> at
>
> https://bioexcel.eu/webinar-performance-tuning-and-optimization-of-gromacs/
> Gromacs development moves fast, but a lot of it is still relevant.
>
> I'll expand a bit here, with the caveat that Gromacs GPU development is
> moving very fast and so the correct commands for optimal performance are
> both system-dependent and a moving target between versions. This is a good
> thing - GPUs have revolutionized the field, and with each iteration we make
> better use of them. The downside is that it's unclear exactly what sort of
> CPU-GPU balance you should look to purchase to take advantage of future
> developments, though the trend is certainly that more and more computation
> is being offloaded to the GPUs.
>
> The most important consideration is that to get maximum total throughput
> performance, you should be running not one but multiple simulations
> simultaneously. You can do this through the -multidir option, but I don't
> recommend that in this case, as it requires compiling with MPI and limits
> some of your options. My run scripts usually use "gmx mdrun ... &" to
> initiate subprocesses, with combinations of -ntomp, -ntmpi, -pin,
> -pinoffset, and -gputasks. I can give specific examples if you're
> interested.
>
> Another important point is that you can run more simulations than the
> number of GPUs you have. Depending on CPU-GPU balance and quality, you
> won't double your throughput by e.g. putting 4 simulations on 2 GPUs, but
> you might increase it up to 1.5x. This would involve targeting the same GPU
> with -gputasks.
>
> Within a simulation, you should set up a benchmarking script to figure out
> the best combination of thread-MPI ranks and OpenMP threads - this can
> have pretty drastic effects on performance. For example, if you want to use
> your entire machine for one simulation (not recommended for maximal
> efficiency), you have a lot of decomposition options (ignoring PME - which
> is important, see below):
>
> -ntmpi 2 -ntomp 32 -gputasks 01
> -ntmpi 4 -ntomp 16 -gputasks 0011
> -ntmpi 8 -ntomp 8  -gputasks 00001111
> -ntmpi 16 -ntomp 4 -gputasks 0000000011111111
> (and a few others - note that ntmpi * ntomp = total threads available)
>
> In my experience, you need to scan the options in a benchmarking script for
> each simulation size/content you want to simulate, and the difference
> between the best and the worst can be up to a factor of 2-4 in terms of
> performance. If you're splitting your machine among multiple simulations, I
> suggest running 1 mpi thread (-ntmpi 1) per simulation, unless your
> benchmarking suggests that the optimal performance lies elsewhere.
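>
> A minimal sketch of such a benchmarking scan, assuming a tpr called
> bench.tpr on the 64-thread, 2-GPU machine above (-nsteps and -resetstep
> just keep the timing runs short and exclude the load-balancing startup
> phase from the reported performance):
>
> for ntmpi in 2 4 8 16; do
>     ntomp=$((64 / ntmpi))
>     case ${ntmpi} in   # one nonbonded GPU task per rank, split across the 2 GPUs
>         2)  gputasks=01 ;;
>         4)  gputasks=0011 ;;
>         8)  gputasks=00001111 ;;
>         16) gputasks=0000000011111111 ;;
>     esac
>     gmx mdrun -s bench.tpr -deffnm bench_ntmpi${ntmpi} -ntmpi ${ntmpi} \
>         -ntomp ${ntomp} -nb gpu -gputasks ${gputasks} \
>         -nsteps 20000 -resetstep 10000 -pin on
> done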
>
> Things get more complicated when you start putting PME on the GPUs. For the
> machines I work on, putting PME on GPUs absolutely improves performance,
> but I'm not fully confident in that assessment without testing your
> specific machine - you have a lot of cores with that threadripper, and this
> is another area where I expect Gromacs 2020 might shift the GPU-CPU optimal
> balance.
>
> The issue with PME on GPUs is that we can (currently) only have one rank
> doing GPU PME work. So, if we have a machine with say 20 cores and 2 gpus,
> if I run the following
>
> gmx mdrun .... -ntomp 10 -ntmpi 2 -pme gpu -npme 1 -gputasks 01
>
> then two ranks will be started - one, with cores 0-9, will work on the
> short-range interactions, offloading where it can to GPU 0, and the PME
> rank (cores 10-19) will offload to GPU 1. There is one significant problem
> (and one minor problem) with this setup. First, it is massively inefficient
> in terms of load balance. In a typical system (there are exceptions), PME
> takes up ~1/3 of the computation that short-range interactions take. So, we
> are offloading 1/4 of our interactions to one GPU and 3/4 to the other,
> which leads to imbalance. In this specific case (2 GPUs and sufficient
> cores), the optimal solution is often (but not always) to run with
> -ntmpi 4 (in this example, then -ntomp 5), as the PME rank then gets 1/4 of
> the GPU instructions, proportional to the computation needed.
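>
> As a concrete sketch of that more balanced layout (hypothetical -deffnm,
> same 20-core / 2-GPU machine), something like:
>
> # 3 PP ranks + 1 PME rank, 5 cores each; PME is 1 of the 4 GPU tasks,
> # split 2/2 across the GPUs (check the GPU task assignment reported in
> # md.log to see which rank landed on which GPU)
> gmx mdrun -deffnm run -ntmpi 4 -ntomp 5 -npme 1 -nb gpu -pme gpu \
>     -gputasks 0011 -pin on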
>
> The second(less critical - don't worry about this unless you're
> CPU-limited) problem is that PME-GPU mpi ranks only use 1 CPU core in their
> calculations. So, with a node of 20 cores and 2 GPUs, if I run a simulation
> with -ntmpi 4 -ntomp 5 -pme gpu -npme 1, each one of those ranks
> will have 5 CPUs, but the PME rank will only use one of them. You can
> specify the number of PME cores per rank with -ntomp_pme. This is useful in
> restricted cases. For example, given the above architecture setup (20
> cores, 2 GPUs), I could maximally exploit my CPUs with the following
> commands:
>
> gmx mdrun .... -ntmpi 4 -ntomp 3 -ntomp_pme 1 -pme gpu -npme 1 -gputasks 0000 -pin on -pinoffset 0 &
> gmx mdrun .... -ntmpi 4 -ntomp 3 -ntomp_pme 1 -pme gpu -npme 1 -gputasks 1111 -pin on -pinoffset 10
>
> where the first 10 cores are (0-2 - PP) (3-5 - PP) (6-8 - PP) (9 - PME),
> and similar for the other 10 cores.
>
> There are a few other parameters to scan for minor improvements - for
> example nstlist, which I typically scan over a range of 80-140 for GPU
> simulations, with an effect of 2-5% on performance.
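>
> A minimal sketch of that kind of scan (again assuming a bench.tpr; the
> -nstlist value given to mdrun overrides the one in the .mdp):
>
> for nst in 80 100 120 140; do
>     gmx mdrun -s bench.tpr -deffnm bench_nstlist${nst} -nstlist ${nst} \
>         -nb gpu -nsteps 20000 -resetstep 10000 -pin on
> done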
>
> I'm happy to expand the discussion with anyone who's interested.
>
> Kevin
>
>
> On Thu, Jul 25, 2019 at 1:47 PM Stefano Guglielmo <
> stefano.guglielmo at unito.it> wrote:
>
> > Dear all,
> > I am trying to run simulations with Gromacs 2019.2 on a workstation with
> > an AMD Threadripper cpu (32 cores, 64 threads), 128 GB RAM, and two RTX
> > 2080 Ti with an NVLink bridge. I read the user's guide section regarding
> > performance and I am exploring some possible combinations of cpu/gpu work
> > to run as fast as possible. I was wondering if any of you has experience
> > of running on more than one gpu with several cores and can give some
> > hints as a starting point.
> > Thanks
> > Stefano
> >
> >
> > --
> > Stefano GUGLIELMO PhD
> > Assistant Professor of Medicinal Chemistry
> > Department of Drug Science and Technology
> > Via P. Giuria 9
> > 10125 Turin, ITALY
> > ph. +39 (0)11 6707178