[gmx-users] General conceptual question about advantage of GPUs
szilard.pall at cbr.su.se
Wed Apr 10 21:58:41 CEST 2013
On Wed, Apr 10, 2013 at 4:24 PM, Szilárd Páll <szilard.pall at cbr.su.se> wrote:
> Hi Andrew,
> As others have said, 40x speedup with GPUs is certainly possible, but more
> often than not comparisons leading to such numbers are not entirely fair -
> at least from a computational perspective. The most common case is when
> people compare legacy, poorly (SIMD)-optimized codes with some new GPU
> accelerated code that typically somebody just spent a lot of effort on. Of
> course, performance improvement for the users of any code is beneficial, but
> the way such achievements are pitched (and published) is IMHO rather misleading.
> If you look around in the computing/computational science literature and try
> to find sound comparisons of highly (or at least comparably) tuned CPU
> and GPU codes, you will mostly see speedups of around 3-10x reported. There are
> cases where the performance difference can be much greater, e.g. solvers, random
> number generators, and in general codes that are better suited to GPUs and
> can make use of the teraflops of raw computational horsepower. For a CPU-GPU
> comparison of "basic" algorithms you can have a look at a recent report
> from NVIDIA, in which you'll see that even BLAS is only 2-10x faster on GPUs.
> When it comes to GROMACS, you can expect 2-4x performance improvement (with
> PME), but the numbers will depend a *lot* on simulation settings, hardware,
> as well as system size. Besides Erik Lindahl's soon-to-be-published
> web-seminar, you can also have a look at the related talks from the 2012 and
> 2013 GTC conferences.
It has just occurred to me that our talks from this year's GTC are not
online yet, but they should be available in a few weeks. You can find
them by searching for "gromacs" in the GTC session archive.
> Finally, just a note on the ~5 ns/day on 24 CPUs (cores, I assume). I'm not
> sure exactly what you are simulating, but that number seems rather low.
> With a small solvated protein system of ~7000 atoms with vsites and 5 fs
> time steps, on a single 6-core Core i7-3930K CPU I get 250 ns/day and by
> adding a GeForce GTX 680 GPU I get 600 ns/day. Even with accounting for the
> 2-2.5x performance difference due to longer time steps, this is still more
> than an order of magnitude higher than what you get. If you really need
> particle decomposition, you can consider using the Verlet scheme without
> domain decomposition, with OpenMP multithreading parallelization, which is
> roughly equivalent to PD and runs efficiently on up to 24-32 threads.
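For the OpenMP-only Verlet run suggested above, a minimal sketch of the relevant settings might look like the following (option names as in GROMACS 4.6; the thread count and file name are placeholders to adapt to your node):

```
; in the .mdp file: select the Verlet cut-off scheme
cutoff-scheme = Verlet

# then run a single-rank, OpenMP-threaded mdrun (no domain decomposition):
mdrun -ntmpi 1 -ntomp 24 -deffnm topol
```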
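The arithmetic behind the comparison above can be checked in a few lines. This is just a sanity-check sketch; the benchmark numbers are the ones quoted in the email, and the variable names are illustrative, not GROMACS quantities:

```python
# Numbers quoted in the email above (i7-3930K benchmark vs. Andrew's cluster run).
cpu_only_ns_per_day = 250.0   # 6-core CPU alone, vsites + 5 fs time steps
cpu_gpu_ns_per_day = 600.0    # same CPU plus a GeForce GTX 680
timestep_credit = 2.5         # upper bound of the 2-2.5x credit for longer time steps
cluster_ns_per_day = 5.0      # 10 ns in ~2 days on 24 cores

gpu_speedup = cpu_gpu_ns_per_day / cpu_only_ns_per_day  # within the quoted 2-4x range
adjusted = cpu_gpu_ns_per_day / timestep_credit         # ns/day after discounting time-step gains
gap = adjusted / cluster_ns_per_day                     # remaining gap vs. the cluster run

print(f"GPU speedup: {gpu_speedup:.1f}x; adjusted: {adjusted:.0f} ns/day; gap: {gap:.0f}x")
```

Even after discounting the longer time steps, the gap works out to roughly 48x, which is the "more than an order of magnitude" claimed above.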
On Tue, Apr 9, 2013 at 5:38 PM, Andrew DeYoung <adeyoung at andrew.cmu.edu> wrote:
>> For the past 2 years I have been running Gromacs on a standard Linux cluster
>> (with nodes containing 24 CPUs). As you know, Gromacs scales excellently
>> (and is super efficient), and since the CPUs are 2.4 GHz Intel Xeon
>> processors, the simulations run quite fast (I can run 10 ns of a ~7000-atom
>> system in ~2 days on 24 CPUs). Can I expect GPUs to be noticeably faster than
>> these CPUs?
>> There is a rumor in the department that GPUs can give a performance
>> improvement of 10-40x relative to CPUs, although that group is using another MD
>> package. I am curious whether this performance improvement is typical. I
>> would not expect it for Gromacs, though, since Gromacs is already super efficient.
>> If you have time, do you know of any review or opinion papers that might
>> discuss the advantages (in performance or otherwise) of using
>> GPUs over CPUs?
>> Thanks for your time!
>> Andrew DeYoung
>> Carnegie Mellon University