[gmx-users] Gromacs GPU system question

Szilárd Páll szilard.pall at cbr.su.se
Thu Jun 27 04:31:09 CEST 2013

Thanks Mirco, good info, your numbers look quite consistent. The only
complicating factor is that your CPUs are overclocked by different
amounts, which changes the relative performances somewhat compared to
non-overclocked parts.

However, let me list some prices to show that the top-of-the-line AMD
and Intel CPUs are in a different league:
- AMD FX-8350 ~170 Eur;
- i7 3770 ~250-280 Eur;
- i7 3930K ~500 Eur.
It's pretty obvious that AMD wins in terms of price/performance,
especially if we also consider motherboards, which for AMD tend to be
slightly cheaper. AMD can't compete in performance with the 6-core
Intel Sandy Bridge, though.
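
To put rough numbers on the price/performance point, here is a minimal
sketch that combines the prices above with the CPU-only ns/day figures
from Mirco's table quoted further down. Treat it as illustrative only:
those machines are overclocked, the test uses a user-defined potential,
and pairing the i7 3770 price (midpoint of 250-280 Eur) with the 3770K
benchmark is my assumption. The function name and the "per 100 Eur"
metric are made up for this example.

```python
# Prices (Eur) come from the list above; ns/day values are the CPU-only
# figures from the quoted table (overclocked CPUs, user-defined potential).
# Assumption: the i7 3770 price (midpoint of 250-280 Eur) is paired with
# the i7 3770K benchmark, since no 3770K price was listed.
cpus = {
    "FX-8350": {"price_eur": 170.0, "ns_per_day": 34.175},
    "i7 3770": {"price_eur": 265.0, "ns_per_day": 41.931},
    "i7 3930K": {"price_eur": 500.0, "ns_per_day": 56.891},
}


def ns_per_day_per_100_eur(price_eur, ns_per_day):
    """Throughput obtained per 100 Eur of CPU price (hypothetical metric)."""
    return 100.0 * ns_per_day / price_eur


# Sort CPU names from best to worst price/performance.
ranking = sorted(
    cpus,
    key=lambda name: ns_per_day_per_100_eur(**cpus[name]),
    reverse=True,
)

for name in ranking:
    value = ns_per_day_per_100_eur(**cpus[name])
    print(f"{name:9s} {value:5.1f} ns/day per 100 Eur")
```

With these inputs the FX-8350 comes out first and the 3930K last on
price/performance, even though the 3930K is the fastest in absolute
ns/day.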

Note that the AMD-Intel difference is larger in CPU-only benchmarks
(the difference is quite large in the CPU non-bonded kernels). With
the non-bondeds offloaded to a GPU, the AMD-Intel difference shrinks: I
get only 1.3x with a fast enough GPU, compared to 1.6x without a GPU, on
a stock FX-8350 vs. i7 3930K (50k atoms, PME, vsites, rc=1.0 nm).

While we're at it, note that the new Haswell CPUs will give a
pretty serious performance boost once AVX2 kernels are out (work in
progress), and the i7 4770 costs pretty much the same as the 3770.

To conclude, for the price conscious I suggest AMD FX-8350, for max
performance Intel i7 4770 or 3930K.

Regarding Hyper-Threading, in GROMACS it can actually improve
performance by up to 25%. This is typically the case when running on a
single CPU+GPU or with CPUs only at not very high parallelization.
However, HT can often hurt performance because it requires running
twice as many threads. E.g. with multiple GPUs it's nearly always
better not to use HT, and at 100 atoms/core, running 50 atoms/thread
(with HT) will most probably be slower than 100 atoms/thread (without).
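
The atoms/thread arithmetic behind that last example, as a tiny sketch.
The function name is hypothetical, and two hardware threads per core
with HT is the usual Intel configuration assumed here.

```python
def atoms_per_thread(atoms_per_core, use_ht, ht_threads_per_core=2):
    """Atoms handled by each thread when every hardware thread is used.

    Assumption: with Hyper-Threading each core runs `ht_threads_per_core`
    threads (2 on the Intel CPUs discussed here), otherwise one thread.
    """
    threads_per_core = ht_threads_per_core if use_ht else 1
    return atoms_per_core / threads_per_core


# The example from the text: 100 atoms/core.
without_ht = atoms_per_thread(100, use_ht=False)
with_ht = atoms_per_thread(100, use_ht=True)
print(f"without HT: {without_ht:.0f} atoms/thread")
print(f"with HT:    {with_ht:.0f} atoms/thread")
```

The per-thread work halves with HT while the number of threads to
synchronize doubles, which is why HT tends to stop paying off at high
parallelization.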

When it comes to GPUs, what's best will very much depend on the
simulation settings. GPUs are used as co-processors, so it's always
best if they are fast enough to "keep up" with the host CPU, but it's
hard to say which GPU is fast enough for which CPU *in general*. E.g.
for a 6-core Intel you'll certainly want a GTX 770/780 or Titan, but
if you run with long cut-offs even with these you might get CPU-GPU
load imbalance.
Note that in general you are better off using fewer, faster GPUs
rather than more, slower ones, mainly because the domain-decomposition
overhead (when going from one to two GPUs) is quite high, and also
because performance on GPUs deteriorates quickly below 30-40k atoms/GPU.
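
As a rough sanity check on the atoms/GPU rule of thumb, something like
the following could be used. The 30k and 40k bounds are the range
mentioned above; the function name and the three labels are made up for
this sketch, and the 75,000-atom system is the one from the quoted
question. It ignores the domain-decomposition overhead entirely.

```python
# Range below which GPU performance deteriorates quickly (from the text).
LOW_ATOMS_PER_GPU = 30_000
HIGH_ATOMS_PER_GPU = 40_000


def gpu_load(total_atoms, n_gpus):
    """Return (atoms per GPU, rough verdict) for an even split over GPUs.

    The verdict labels are hypothetical: "too few" below the range,
    "borderline" inside it, "ok" at or above its upper end.
    """
    per_gpu = total_atoms / n_gpus
    if per_gpu < LOW_ATOMS_PER_GPU:
        verdict = "too few"
    elif per_gpu < HIGH_ATOMS_PER_GPU:
        verdict = "borderline"
    else:
        verdict = "ok"
    return per_gpu, verdict


# ~75,000 atoms, as in the system described in the quoted question.
for n_gpus in (1, 2, 4):
    per_gpu, verdict = gpu_load(75_000, n_gpus)
    print(f"{n_gpus} GPU(s): {per_gpu:8.0f} atoms/GPU -> {verdict}")
```

For a system of that size, a second GPU already lands in the 30-40k
range, so one fast GPU is probably the better buy.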


On Sun, Jun 23, 2013 at 12:05 AM, Mirco Wahab
<mirco.wahab at chemie.tu-freiberg.de> wrote:
> On 22.06.2013 22:18, Mare Libero wrote:
>> The vendor I contacted was pushing for one of the
>> high end i7 processors with hyper-threading. But from what I can read,
>> most MD software doesn't make any use of it. So, using a
>> multi-core AMD (like your FX-8350) can be a cheaper and more
>> advantageous option.
> Your vendor is, in my opinion, right. The AMD consumer multicores
> (Piledriver) aren't actually eight-core CPUs, but rather similar
> to 4-core CPUs (their core pairs are called 'modules').
> For testing a user-defined potential, I once compiled performance
> figures over a range of actual commodity hardware (available to me).
> These are all workstations and usually overclocked somehow by the
> students (but only if there's no crash at all in a year ;-)
> This is all *without GPU*, only the plain and raw CPU processing
> power for Gromacs is checked for (last column).
>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>  Test case:
>  Two coarse-grained implicit-solvent vesicles bumping into each other
>  SD integrator
>  480,000 particles
>  Box (110nm)³
>  User-defined potential (rc=0.8225nm)
>  dt=0.020ps
>   --------------------------------------------
>     CPU                 Arch    Cores   ns/day
>   --------------------------------------------
>   - X6/1090T;3.3GHz     SSE2    6C/6T   19.130
>   - FX-8350;4.5GHz      AVX_FMA 4M/8T   34.175
>   - i7/2600K;4.2GHz     AVX_256 4C/8T   39.073
>   - i7/3770K;4.4GHz     AVX_256 4C/8T   41.931
>   - i7/3930K;4.2GHz     AVX_256 6C/12T  56.891
>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> You can see here that, for CPU performance, you can't
> really choose anything other than the 6-core i7/3930K.
> It costs some bucks more than the 4-core CPUs but will run
> significantly faster for as long as you use it.
>> Most of what we do is protein-protein interactions and protein stability
>> studies with explicit water/ions. One of our projects now has <100,000
>> atoms in a 100 Ang water box (7,800 protein atoms + 67,000 water). It's
>> difficult to be more specific on the parameters since each project is
>> different, but in general we do not deviate much from a standard NPT run.
> 10nm box/75K atoms is not very large. I guess you'd use a time step
> of 0.002 ps and a united atom model + spc or spc/e water? 100 ns/day
> seems possible with any GPU from GTX-660 or higher. If you buy a mighty
> GPU (Titan), the question will be: can your n-core-CPU saturate such a
> fast GPU monster? A good compromise would be, probably, the GTX-780
> which is a slightly reduced Titan for half the price and all options
> open.
> my € 0.02
> M.
