[gmx-users] On the choosing of GPU for MD runs

Szilárd Páll szilard.pall at cbr.su.se
Fri Jan 25 03:19:08 CET 2013


Typically, the faster the GPU, the better. On a more detailed level, fewer
multiprocessors (that's not "cores"!) with faster clock speeds help in
achieving higher performance with small systems. As GPUs are highly
parallel processors, 20-30k atoms/GPU can be considered small-ish,
especially with Kepler GPUs.

Our algorithms neither require large amounts of memory nor are affected very
much by the speed of the on-board GPU main memory. For this reason ECC,
which is available on Tesla compute cards, has a negligible effect on
performance.

The ideal CPU and GPU depend on the type of simulation and on the amount/mix
of force computation work, which is determined by the system and settings.
When PME is used, the bonded forces and long-range electrostatics are
calculated on the CPU while the GPU computes the non-bonded forces (see the
wiki for more details). Therefore, the longer the cut-off you provide in the
mdp file (which will result in a coarser PME grid), the higher the GPU
non-bonded workload will be relative to the CPU PME workload. mdrun does
automated particle-particle (PP) - PME load balancing, but I will not get
into those details now. The point is that if, for a given cut-off, the GPU
is not fast enough to finish before the CPU, the CPU will have to wait and
therefore idle; the opposite case will be handled by the PP-PME load
balancing. Therefore, because the GROMACS CPU code is also highly tuned, a
modern desktop CPU will typically need a mid- to high-end desktop GeForce
card or a Tesla GPU to keep up. If you use long cut-offs (>1.2 nm), my guess
is that you might need a faster GPU for the pretty fast Intel CPU you have,
but you can check that by looking at the amount of "Wait for GPU" time in
the log file.
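
To make the waiting argument concrete, here is a toy model of a single MD
step, assuming the CPU and GPU work concurrently and the step ends when the
slower of the two finishes. The function names and the millisecond timings
are made up for illustration; they are not GROMACS output or GROMACS code.

```python
def step_time(cpu_ms, gpu_ms):
    """Wall time of one step when CPU and GPU run concurrently."""
    return max(cpu_ms, gpu_ms)


def cpu_wait_fraction(cpu_ms, gpu_ms):
    """Fraction of the step the CPU spends idle, waiting for the GPU."""
    idle_ms = max(0.0, gpu_ms - cpu_ms)
    return idle_ms / step_time(cpu_ms, gpu_ms)


# Hypothetical per-step timings (ms): (CPU work, GPU work).
for cpu_ms, gpu_ms in [(10.0, 10.5), (10.0, 15.0), (12.0, 8.0)]:
    wait = cpu_wait_fraction(cpu_ms, gpu_ms)
    print(f"CPU {cpu_ms} ms, GPU {gpu_ms} ms -> "
          f"step {step_time(cpu_ms, gpu_ms)} ms, CPU idle {wait:.1%}")
```

In the last case the GPU finishes first, so the CPU never waits; that is
the situation the PP-PME load balancing is meant to handle.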

For reference, on an Intel i7-3930K and a GTX 680, with a 134k-atom solvated
protein system with PME, rc=1.0 nm, dt=5 fs (with virtual sites), I get 33
ns/day (15 ns/day with dt=2 fs without virtual sites) with pretty good
CPU-GPU balance (only a few % waiting for the GPU). If I increase the cut-off
to 1.2 nm, the performance drops to 26.8 ns/day because the GPU takes much
longer to compute with the larger cut-off (the force-only kernel takes
30% more time), and the CPU ends up waiting for 35% of the total time.
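
For comparing such numbers it can help to convert ns/day into wall time per
step. The sketch below is plain arithmetic on the two dt=5 fs figures quoted
above (33 and 26.8 ns/day); the helper names are my own and not part of
GROMACS.

```python
SECONDS_PER_DAY = 86400.0
NS_PER_FS = 1e-6  # 1 fs = 1e-6 ns


def steps_per_second(ns_per_day, dt_fs):
    """MD steps completed per wall-clock second."""
    steps_per_day = ns_per_day / (dt_fs * NS_PER_FS)
    return steps_per_day / SECONDS_PER_DAY


def ms_per_step(ns_per_day, dt_fs):
    """Average wall-clock time of one MD step, in milliseconds."""
    return 1000.0 / steps_per_second(ns_per_day, dt_fs)


# Figures quoted above: rc=1.0 nm vs. rc=1.2 nm, both with dt=5 fs.
for label, ns_day in [("rc=1.0 nm", 33.0), ("rc=1.2 nm", 26.8)]:
    print(f"{label}: {steps_per_second(ns_day, 5.0):.1f} steps/s, "
          f"{ms_per_step(ns_day, 5.0):.2f} ms/step")

# Relative loss in throughput when going from 33 to 26.8 ns/day.
print(f"throughput drop: {1.0 - 26.8 / 33.0:.1%}")
```

This works out to roughly 13 ms per step at rc=1.0 nm and roughly 16 ms per
step at rc=1.2 nm, i.e. a throughput drop of a bit under 19%.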



On Mon, Jan 14, 2013 at 9:00 AM, James Starlight <jmsstarlight at gmail.com> wrote:

> Dear Gromacs Users!
> I would like to know some details about choosing a GPU for MD with
> gromacs. In particular, which properties of the video adapter should I
> pay most attention to? Which modern NVIDIA GPU series might give the best
> performance (GTX 6xx, Tesla or Quadro series)? Could you provide me
> with some benchmarks besides the information present on the Gromacs web
> site?
> For instance, with a GeForce GTX 670 + Core i5 (4 cores) I get a
> performance of 10 ns/day for an explicit-solvent system with 67000 atoms
> (protein in tip3p water). Does anyone have better results with a common
> home-like desktop? :)
> James