[gmx-users] General conceptual question about advantage of GPUs

Szilárd Páll szilard.pall at cbr.su.se
Wed Apr 10 16:24:49 CEST 2013

Hi Andrew,

As others have said, 40x speedup with GPUs is certainly possible, but more
often than not comparisons leading to such numbers are not entirely fair -
at least from a computational perspective. The most common case is when
people compare legacy, poorly (SIMD)-optimized codes with some new
GPU-accelerated code that somebody has typically just spent a lot of effort
on. Of course, a performance improvement is beneficial for the users of any
code, but the way such achievements are pitched (and published) is IMHO rather

If you look around in the computing/computational science literature and
try to find sound comparisons of highly (or at least comparably) tuned
CPU and GPU codes, you will mostly see around 3-10x speedup reported. There
are cases where the performance difference can be much larger, e.g. solvers,
random number generators, and in general codes that are better suited to
GPUs and can make use of the teraflops of raw computational horsepower. For
a CPU-GPU comparison of "basic" algorithms you can have a look at this
recent report from NVIDIA, in which you'll see that even BLAS is only 2-10x
faster on GPUs: http://goo.gl/I7ADg

When it comes to GROMACS, you can expect a 2-4x performance improvement (with
PME), but the numbers will depend a *lot* on simulation settings, hardware,
and system size. Besides Erik Lindahl's soon-to-be-published
web seminar, you can also have a look at related talks from the 2012 and
2013 GTC conferences.

Finally, just a note on the ~5 ns/day on 24 CPUs (cores, I assume). I'm not
sure what exactly you are simulating, but the numbers seem to be rather
low. With a small solvated protein system of ~7000 atoms with vsites and 5
fs time steps, on a single 6-core Core i7-3930K CPU I get 250 ns/day, and by
adding a GeForce GTX 680 GPU I get 600 ns/day. Even accounting for the
2-2.5x performance difference due to the longer time steps, this is still more
than an order of magnitude higher than what you get. If you really need
particle decomposition (PD), you can consider using the Verlet scheme without
domain decomposition and with OpenMP multithreading, which is
roughly equivalent to PD and runs efficiently on up to 24-32 threads on

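To make the throughput comparison above concrete, here is a minimal Python sketch of the arithmetic. The function and variable names are purely illustrative (they are not part of GROMACS), and the 2-2.5x factor is simply the time-step correction quoted above, applied as a plain division.

```python
# Back-of-the-envelope check of the ns/day figures quoted in this thread.
# All names here are illustrative assumptions, not GROMACS identifiers.

def ns_per_day(simulated_ns, wall_days):
    """Throughput in ns/day from simulated time and wall-clock days."""
    return simulated_ns / wall_days

def adjusted_ratio(fast_ns_day, slow_ns_day, timestep_factor):
    """Ratio of two throughputs after dividing the faster one by the
    gain attributed to its longer time step."""
    return (fast_ns_day / timestep_factor) / slow_ns_day

# Andrew's report: 10 ns in about 2 days on 24 CPUs.
reported = ns_per_day(10.0, 2.0)

# Figures quoted above for the ~7000-atom system with vsites and 5 fs steps.
cpu_only = 250.0   # ns/day, single 6-core Core i7-3930K
with_gpu = 600.0   # ns/day, same CPU plus a GeForce GTX 680

print("reported throughput:", reported, "ns/day")
print("GPU gain over CPU-only:", with_gpu / cpu_only)

# Correct for the 2-2.5x gain attributed to the longer time steps.
for factor in (2.0, 2.5):
    ratio = adjusted_ratio(cpu_only, reported, factor)
    print("time-step factor", factor, "-> CPU-only ratio", ratio)
```

Even with the larger correction factor the CPU-only ratio stays above 10, which is where the "more than an order of magnitude" comes from; note that this compares 6 cores against 24 without any per-core normalization.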

On Tue, Apr 9, 2013 at 5:38 PM, Andrew DeYoung <adeyoung at andrew.cmu.edu> wrote:

> Hi,
> For the past 2 years I have been running Gromacs on a standard Linux
> cluster
> (with nodes containing 24 CPUs).  As you know, Gromacs scales excellently
> (and is super efficient), and since the CPUs are Intel Xeon 2.4 GHz
> processors, the simulations run quite fast (I can run 10 ns of ~7000 atoms
> in ~2 days on 24 CPUs).  Can I expect GPUs to be any or much faster than
> these CPUs?
> There is a rumor in the department that GPUs can give a performance
> increase
> of 10-40 times relative to CPUs, although that group is using another MD
> package.  I am curious whether this performance improvement is typical.  (I
> would not expect it for Gromacs, though, since Gromacs is already super
> fast!)
> If you have time, do you know of any review or opinion papers that might
> discuss the advantages (advantages of performance or otherwise) of using
> GPUs over CPUs?
> Thanks for your time!
> Andrew DeYoung
> Carnegie Mellon University
