[gmx-users] GPU performance

Szilárd Páll szilard.pall at cbr.su.se
Wed Apr 10 01:53:45 CEST 2013


Hi Ben,

That performance is not reasonable at all - neither for a CPU-only run on
your quad-core Sandy Bridge, nor for the CPU+GPU run. For the latter you
should be getting more like 50 ns/day or so.

What's strange about your run is that the CPU-GPU load balancing is picking
a *very* long cut-off, which means that your CPU is for some reason
performing very badly. Check how mdrun is behaving while it runs by
watching it in top/htop, and if you are not seeing ~400% CPU utilization,
there is something wrong - perhaps threads getting locked to the same core
(to check that, try -pin off).
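
For example, something along these lines (hypothetical file names; adjust
-deffnm and the thread count to match your own run):

    # In a second terminal, watch the mdrun threads; with 4 OpenMP threads
    # you should see roughly 400% total CPU for the mdrun process.
    top -H -p $(pgrep -f mdrun)

    # Rerun with thread pinning disabled to rule out threads piling up
    # on the same core.
    mdrun -deffnm topol -ntomp 4 -pin off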

Secondly, note that you are using OpenMM-specific settings from the old
GROMACS-OpenMM comparison benchmarks, in which the grid spacing is overly
coarse (you could use something like fourier-spacing=0.125 or even larger
with rc=1.0).
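
For reference, a minimal sketch of the relevant .mdp lines (illustrative
values only, assuming an otherwise standard PME setup; mdrun's PME tuning
will still adjust the grid and cut-off at run time):

    ; coarser reciprocal-space grid, with matching real-space cut-offs
    coulombtype     = PME
    rcoulomb        = 1.0
    rvdw            = 1.0
    fourierspacing  = 0.125   ; the fourier-spacing value mentioned above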

Cheers,

--
Szilárd


On Tue, Apr 9, 2013 at 10:27 PM, Benjamin Bobay <bgbobay at ncsu.edu> wrote:

> Good afternoon -
>
> I recently installed gromacs-4.6 on CentOS6.3 and the installation went
> just fine.
>
> I have a Tesla C2075 GPU.
>
> I then downloaded the benchmark directories and ran a benchmark on
> GPU/dhfr-solv-PME.bench.
>
> This is what I got:
>
> Using 1 MPI thread
> Using 4 OpenMP threads
>
> 1 GPU detected:
>   #0: NVIDIA Tesla C2075, compute cap.: 2.0, ECC: yes, stat: compatible
>
> 1 GPU user-selected for this run: #0
>
>
> Back Off! I just backed up ener.edr to ./#ener.edr.1#
> starting mdrun 'Protein in water'
> -1 steps, infinite ps.
> step   40: timed with pme grid 64 64 64, coulomb cutoff 1.000: 4122.9 M-cycles
> step   80: timed with pme grid 56 56 56, coulomb cutoff 1.143: 3685.9 M-cycles
> step  120: timed with pme grid 48 48 48, coulomb cutoff 1.333: 3110.8 M-cycles
> step  160: timed with pme grid 44 44 44, coulomb cutoff 1.455: 3365.1 M-cycles
> step  200: timed with pme grid 40 40 40, coulomb cutoff 1.600: 3499.0 M-cycles
> step  240: timed with pme grid 52 52 52, coulomb cutoff 1.231: 3982.2 M-cycles
> step  280: timed with pme grid 48 48 48, coulomb cutoff 1.333: 3129.2 M-cycles
> step  320: timed with pme grid 44 44 44, coulomb cutoff 1.455: 3425.4 M-cycles
> step  360: timed with pme grid 42 42 42, coulomb cutoff 1.524: 2979.1 M-cycles
>               optimal pme grid 42 42 42, coulomb cutoff 1.524
> step 4300 performance: 1.8 ns/day
>
> and from the nvidia-smi output:
> Tue Apr  9 10:13:46 2013
> +------------------------------------------------------+
> | NVIDIA-SMI 4.304.37   Driver Version: 304.37         |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  Tesla C2075              | 0000:03:00.0      On |                    0 |
> | 30%   67C    P0    80W / 225W |   4%  200MB / 5375MB |      4%      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Compute processes:                                               GPU Memory |
> |  GPU       PID  Process name                                     Usage      |
> |=============================================================================|
> |    0     22568  mdrun                                                  59MB |
> +-----------------------------------------------------------------------------+
>
>
> So I am only getting 1.8 ns/day! Is that right? It seems very low,
> especially since it is the same as what I get from the CPU-only test:
>
> step 200 performance: 1.8 ns/day    vol 0.79  imb F 14%
>
> From the md.log of the GPU test:
> Detecting CPU-specific acceleration.
> Present hardware specification:
> Vendor: GenuineIntel
> Brand:  Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz
> Family:  6  Model: 45  Stepping:  7
> Features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc
> pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
> tdt x2apic
> Acceleration most likely to fit this hardware: AVX_256
> Acceleration selected at GROMACS compile time: AVX_256
>
>
> 1 GPU detected:
>   #0: NVIDIA Tesla C2075, compute cap.: 2.0, ECC: yes, stat: compatible
>
> 1 GPU user-selected for this run: #0
>
> Will do PME sum in reciprocal space.
>
> Any thoughts as to why it is so slow?
>
> many thanks!
> Ben
>
> --
> ____________________________________________
> Research Assistant Professor
> North Carolina State University
> Department of Molecular and Structural Biochemistry
> 128 Polk Hall
> Raleigh, NC 27695
> Phone: (919)-513-0698
> Fax: (919)-515-2047
> ____________________________________________


