[gmx-users] NVIDIA GTX cards in Rackable servers, how do you do it ?
stephane.teletchea at univ-nantes.fr
Tue Feb 24 16:02:04 CET 2015
On 24/02/2015 13:29, David McGiven wrote:
>> I never benchmarked 64-core AMD nodes with GPUs. With a 80 k atoms test
>> system using a 2 fs time step I get
>> 24 ns/d on 64 AMD cores 6272
>> 16 ns/d on 32 AMD cores 6380
>> 36 ns/d on 32 AMD cores 6380 with 1x GTX 980
>> 40 ns/d on 32 AMD cores 6380 with 2x GTX 980
>> 27 ns/d on 20 Intel cores 2680v2
>> 52 ns/d on 20 Intel cores 2680v2 with 1x GTX 980
>> 62 ns/d on 20 Intel cores 2680v2 with 2x GTX 980
> I think 20 Intel cores means 2 x 10 cores each.
> But Szilard just mentioned in this same thread :
>> If you can afford them get the 14/16 or 18 core v3 Haswells, those are
>> *really* fast, but a pair can cost as much as a decent car.
> I know for sure gromacs scales VERY well on 4 x 16 cores latest AMD
> (Interlagos, Bulldozer, etc.) machines. But have no experience with Intel
My experience with the latest gromacs and fftw builds on my machine is that
one should not count the "hyperthreaded" "cores", but only the real ones.
My system has 24 "cores" (E5-2620 v2 @ 2.10GHz + NVIDIA K4000), but
really only 12 "real" cores.
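
To check the real core count on a Linux box, here is a minimal sketch. It assumes /proc/cpuinfo reports "physical id" and "core id" for each logical CPU, as x86 Linux kernels normally do; the function name is mine and has nothing to do with gromacs.

```python
def count_physical_cores(cpuinfo_text):
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo-style text.

    Hyperthreaded siblings share the same pair, so they are counted once.
    """
    cores = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        value = value.strip()
        if key == "physical id":
            # Socket number; appears before "core id" in each processor block.
            physical_id = value
        elif key == "core id":
            cores.add((physical_id, value))
    return len(cores)
```

Calling it on the contents of /proc/cpuinfo and comparing with os.cpu_count() (which reports the logical CPUs) shows whether hyperthreading doubles the apparent count.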
Using pinning and running only one test system at a time, with optimized
conditions, I used the benchmarks available at the gromacs web site
(ADH, rnase, villin, ...).
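
For anyone who wants to repeat this kind of sweep, here is a rough sketch that only builds the command lines. The exact commands I ran are not reproduced here: the input name topol.tpr, the output prefixes and the helper names are placeholders, and it assumes a gromacs 5.x style "gmx mdrun" with the -nt, -pin and -nb options.

```python
def mdrun_command(tpr, threads, use_gpu, pin=True):
    """Build one `gmx mdrun` command line as an argument list."""
    mode = "gpu" if use_gpu else "cpu"
    return [
        "gmx", "mdrun",
        "-s", tpr,                          # run input file (placeholder name)
        "-nt", str(threads),                # total number of threads
        "-pin", "on" if pin else "off",     # thread pinning
        "-nb", mode,                        # where nonbonded interactions run
        "-deffnm", "bench_nt{}_{}".format(threads, mode),  # output prefix
    ]


def benchmark_sweep(tpr, runs):
    """Return one command per (threads, use_gpu) combination."""
    return [mdrun_command(tpr, threads, use_gpu) for threads, use_gpu in runs]


# Thread counts and GPU settings matching the runs listed in this message.
RUNS = [(6, True), (12, True), (24, True), (12, False), (24, False)]

for cmd in benchmark_sweep("topol.tpr", RUNS):
    print(" ".join(cmd))
```

The commands are printed rather than executed, so they can be checked by eye and then run one at a time.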
My results were:
45.75 ns/day with -nt 6 and gpu on
47.10 ns/day with -nt 12 and gpu on
27.66 ns/day with -nt 24 and gpu on
35.31 ns/day with -nt 12 and gpu off
21.37 ns/day with -nt 24 and gpu off
The results are more or less similar in the other benchmarks, 6 cores +
GPU close to 12 cores + GPU, and faster than 24 cores...
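
To put numbers on the comparison, here is a small sketch that expresses each run relative to a chosen baseline. The ns/day values are the ones listed above; taking the 12-thread CPU-only run as the baseline is just my choice for illustration.

```python
# (threads, gpu_on) -> ns/day, as reported in this message.
RESULTS = {
    (6, True): 45.75,
    (12, True): 47.10,
    (24, True): 27.66,
    (12, False): 35.31,
    (24, False): 21.37,
}


def relative_to(results, baseline):
    """Return each throughput divided by the baseline run's throughput."""
    base = results[baseline]
    return {key: value / base for key, value in results.items()}


ratios = relative_to(RESULTS, (12, False))
for (threads, gpu_on), ratio in sorted(ratios.items()):
    label = "gpu on" if gpu_on else "gpu off"
    print("-nt {:2d}, {:7s}: {:.2f}x".format(threads, label, ratio))
```

A ratio below 1 means the run was slower than 12 threads without the GPU, which is the case for both 24-thread runs.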
The difference in the GPU case is the average GPU usage, which is more
than 85 % during the test runs when not all processors are in use, while
it drops to 50 % if all cores are in use (a rough observation of the GPU
usage with the nvidia-smi tool).
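
A less rough way to get the average would be to log the utilization during the run and average it afterwards. The sketch below assumes a single GPU and a log produced with something like "nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -l 1", which writes one integer percentage per line; the function name is hypothetical.

```python
def mean_gpu_utilization(smi_output):
    """Average the utilization samples (one integer percentage per line)."""
    samples = [int(token) for token in smi_output.split()]
    if not samples:
        raise ValueError("no utilization samples found")
    return sum(samples) / len(samples)
```

Redirecting the nvidia-smi output to a file while mdrun is running and passing the file contents to this function gives one average per benchmark.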
I have no explanation for the CPU-only benchmark though, since I have
tried with pinning enabled and disabled, ensured that only one job was
running at a time, etc. I have not played a lot with the -nt options,
either omp or mpi, since this machine is a single node.
Hope this helps in showing that "more expensive" may not be the way...
Lecturer, UFIP, UMR 6286 CNRS, Team Protein Design In Silico
UFR Sciences et Techniques, 2, rue de la Houssinière, Bât. 25, 44322 Nantes cedex 03, France
Tél : +33 251 125 636 / Fax : +33 251 125 632
http://www.ufip.univ-nantes.fr/ - http://www.steletch.org