[gmx-users] NVIDIA GTX cards in Rackable servers, how do you do it ?
Téletchéa Stéphane
stephane.teletchea at univ-nantes.fr
Tue Feb 24 16:02:04 CET 2015
On 24/02/2015 13:29, David McGiven wrote:
> > I never benchmarked 64-core AMD nodes with GPUs. With an 80k-atom test
> > system using a 2 fs time step I get
> > 24 ns/d on 64 AMD cores 6272
> > 16 ns/d on 32 AMD cores 6380
> > 36 ns/d on 32 AMD cores 6380 with 1x GTX 980
> > 40 ns/d on 32 AMD cores 6380 with 2x GTX 980
> > 27 ns/d on 20 Intel cores 2680v2
> > 52 ns/d on 20 Intel cores 2680v2 with 1x GTX 980
> > 62 ns/d on 20 Intel cores 2680v2 with 2x GTX 980
> I think 20 Intel cores means 2 x 10 cores, i.e. two 10-core CPUs.
>
> But Szilard just mentioned in this same thread :
>
> > If you can afford them, get the 14-, 16- or 18-core v3 Haswells; those are
> > *really* fast, but a pair can cost as much as a decent car.
>
> I know for sure GROMACS scales VERY well on the latest 4 x 16-core AMD
> (Interlagos, Bulldozer, etc.) machines, but I have no experience with Intel
> Xeon.
My experience with the latest GROMACS and FFTW built on my machine is that
one should not count the "hyperthreaded" "cores", only the real (physical)
cores.
My system has 24 "cores" (E5-2620 v2 @ 2.10GHz + NVIDIA K4000), but
really only 12 real cores.
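To check how many real cores a Linux box has, the logical CPUs listed in /proc/cpuinfo can be grouped by their ("physical id", "core id") pair, since hyperthread siblings share the same pair. A minimal Python sketch, assuming the usual x86 Linux /proc/cpuinfo layout (the function name count_cores is just illustrative):

```python
def count_cores(cpuinfo_text: str) -> tuple[int, int]:
    """Return (logical_cpus, physical_cores) parsed from /proc/cpuinfo-style text.

    Assumes each logical CPU is a block of "key : value" lines separated by a
    blank line, containing "processor", "physical id" and "core id" keys.
    """
    logical = 0
    physical = set()  # unique (socket, core) pairs
    phys_id = core_id = None
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line closes one logical-CPU block.
            if phys_id is not None and core_id is not None:
                physical.add((phys_id, core_id))
            phys_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            phys_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return logical, len(physical)
```

On a machine like the one above, `count_cores(open("/proc/cpuinfo").read())` should report 24 logical CPUs but only 12 physical cores; the second number is the one to keep in mind when choosing -nt.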
Using pinning, and running only one test system at a time under optimized
conditions, I used the benchmarks available on the GROMACS web site
(ADH, rnase, villin, http://www.gromacs.org/GPU_acceleration).
My results were:
*** rnase_cubic
45.75 ns/day with -nt 6 and GPU on
47.10 ns/day with -nt 12 and GPU on
27.66 ns/day with -nt 24 and GPU on
35.31 ns/day with -nt 12 and GPU off
21.37 ns/day with -nt 24 and GPU off
The results are more or less similar for the other benchmarks: 6 cores +
GPU is close to 12 cores + GPU, and both are faster than using all 24 "cores"...
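To put the rnase_cubic numbers side by side, here is a small Python sketch that divides each result by the 12-thread CPU-only run; the values are copied from the list above and nothing else is assumed.

```python
# rnase_cubic results from the list above, in ns/day,
# keyed by (number of threads passed to -nt, GPU used?)
RESULTS = {
    (6, True): 45.75,
    (12, True): 47.10,
    (24, True): 27.66,
    (12, False): 35.31,
    (24, False): 21.37,
}


def relative_to(results, baseline_key):
    """Return each result divided by the baseline result (1.0 = same speed)."""
    base = results[baseline_key]
    return {key: value / base for key, value in results.items()}


for (nt, gpu), ratio in sorted(relative_to(RESULTS, (12, False)).items()):
    print(f"-nt {nt:2d}, GPU {'on ' if gpu else 'off'}: {ratio:.2f}x")
```

Relative to 12 threads without the GPU, 6 threads + GPU reach about 1.30x and 12 threads + GPU about 1.33x, while using all 24 hyperthreads drops to about 0.78x with the GPU and 0.61x without it.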
In the GPU case, the difference lies in the average GPU usage: it is above
85 % during the test runs when not all logical cores are in use, while it
drops to 50 % when all cores are in use (based on a rough observation of
the GPU usage with the nvidia-smi tool).
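For a less rough measurement than watching nvidia-smi by eye, one option is to log utilization once per second during the run with `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -l 1 > gpu_usage.log` and average the log afterwards. A minimal Python sketch, assuming a single GPU (as on this machine) so that each line of the log is one integer percentage; the log file name and the function name are only examples:

```python
import statistics


def mean_gpu_utilization(log_text: str) -> float:
    """Average GPU utilization (%) from nvidia-smi CSV output (noheader,nounits).

    Assumes one GPU, so every non-empty line holds a single integer percentage.
    """
    samples = [int(line) for line in log_text.splitlines() if line.strip()]
    if not samples:
        raise ValueError("no utilization samples found")
    return float(statistics.mean(samples))
```

For example, `mean_gpu_utilization(open("gpu_usage.log").read())`. Samples taken before mdrun starts or after it ends should be trimmed first, since idle readings pull the average down.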
I have no explanation for the CPU-only benchmarks, though: I have tried
with pinning both enabled and disabled, ensured that only one job was
running at a time, etc. I have not experimented much with -nt, either with
OpenMP or MPI threads, since this machine is a single node.
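A systematic scan of thread counts with and without the GPU could be scripted along these lines. This is a sketch only: it assumes the GROMACS 5.x `gmx mdrun` front end (4.6 uses a plain `mdrun` binary) and a hypothetical input file rnase.tpr; -nt, -pin, -nb and -deffnm are standard mdrun options, and the commands are only built and printed here, not executed.

```python
def mdrun_commands(tpr, thread_counts, nb_modes=("gpu", "cpu")):
    """Build one gmx mdrun command per (thread count, non-bonded mode) pair."""
    commands = []
    for nt in thread_counts:
        for mode in nb_modes:
            commands.append([
                "gmx", "mdrun",
                "-s", tpr,                          # run input file
                "-nt", str(nt),                     # total number of threads
                "-pin", "on",                       # pin threads to cores
                "-nb", mode,                        # non-bonded work on GPU or CPU
                "-deffnm", f"bench_nt{nt}_{mode}",  # separate output files per run
            ])
    return commands


if __name__ == "__main__":
    # Half the physical cores, all physical cores, all logical CPUs.
    for cmd in mdrun_commands("rnase.tpr", [6, 12, 24]):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to actually run it, one job at a time so that the runs do not compete for cores.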
I hope this helps show that "more expensive" is not necessarily the way to go...
Best,
Stéphane
--
Lecturer, UFIP, UMR 6286 CNRS, Team Protein Design In Silico
UFR Sciences et Techniques, 2, rue de la Houssinière, Bât. 25, 44322 Nantes cedex 03, France
Tel: +33 251 125 636 / Fax: +33 251 125 632
http://www.ufip.univ-nantes.fr/ - http://www.steletch.org