[gmx-users] simulation on 2 gpus

Paul Buscemi pbuscemi at q.com
Sat Aug 3 00:29:45 CEST 2019


I run the same system and setup but no nvlink. Maestro runs both gpus at 100 percent. Gromacs typically 50 --60 percent can do 600ns/d on 20000 atoms 

PB

> On Jul 25, 2019, at 9:30 PM, Kevin Boyd <kevin.boyd at uconn.edu> wrote:
> 
> Hi,
> 
> I've done a lot of research/experimentation on this, so I can maybe get you
> started - if anyone has any questions about the essay to follow, feel free
> to email me personally, and I'll link it to the email thread if it ends up
> being pertinent.
> 
> First, there's some more internet resources to checkout. See Mark's talk at
> -
> https://bioexcel.eu/webinar-performance-tuning-and-optimization-of-gromacs/
> Gromacs development moves fast, but a lot of it is still relevant.
> 
> I'll expand a bit here, with the caveat that Gromacs GPU development is
> moving very fast and so the correct commands for optimal performance are
> both system-dependent and a moving target between versions. This is a good
> thing - GPUs have revolutionized the field, and with each iteration we make
> better use of them. The downside is that it's unclear exactly what sort of
> CPU-GPU balance you should look to purchase to take advantage of future
> developments, though the trend is certainly that more and more computation
> is being offloaded to the GPUs.
> 
> The most important consideration is that to get maximum total throughput
> performance, you should be running not one but multiple simulations
> simultaneously. You can do this through the -multidir option, but I don't
> recommend that in this case, as it requires compiling with MPI and limits
> some of your options. My run scripts usually use "gmx mdrun ... &" to
> initiate subprocesses, with combinations of -ntomp, -ntmpi, -pin
> -pinoffset, and -gputasks. I can give specific examples if you're
> interested.
> 
> Another important point is that you can run more simulations than the
> number of GPUs you have. Depending on CPU-GPU balance and quality, you
> won't double your throughput by e.g. putting 4 simulations on 2 GPUs, but
> you might increase it up to 1.5x. This would involve targeting the same GPU
> with -gputasks.
> 
> Within a simulation, you should set up a benchmarking script to figure out
> the best combination of thread-mpi ranks and open-mp threads - this can
> have pretty drastic effects on performance. For example, if you want to use
> your entire machine for one simulation (not recommended for maximal



More information about the gromacs.org_gmx-users mailing list