[gmx-users] GROMACS performance with an NVidia Tesla k40c

Szilárd Páll pall.szilard at gmail.com
Tue Jan 19 22:05:29 CET 2016


Hi,


On Tue, Jan 19, 2016 at 8:34 PM, Michail Palaiokostas Avramidis <
m.palaiokostas at qmul.ac.uk> wrote:

> Dear GMX users,
>
>
> I have recently installed an Nvidia Tesla K40c in my workstation (already
> had a quadro k2000) and I am currently trying to optimize its usage with
> GROMACS. I used two compilations of GROMACS, one is the standard one as
> suggested in the beginning of the installation documentation and one where
> I added some more flags to see what will happen. The latter compilation
> used:
>
>
> cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DGMX_GPU=on
> -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-7.5
> -DNVML_INCLUDE_DIR=/usr/include/nvidia/gdk
> -DNVML_LIBRARY=/usr/lib/nvidia-352/libnvidia-ml.so
>
>
Looks reasonable.
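
If you want to double-check that this build actually picked up CUDA (and
NVML), the version header is the quickest place to look; something along
these lines should do (exact wording of the output varies between versions):

  # assumes the custom build's gmx is first in PATH
  gmx --version | grep -iE "cuda|gpu"

You should see a GPU support line reporting CUDA as enabled, plus the CUDA
toolkit and driver information the binary was built against.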


>
> So far I used 4 different combinations to test a water-membrane system of
> ~30500 atoms for 5000 steps:
>
> 1) CPU only,
>
> 2) CPU+2GPUs (the default),
>
> 3) CPU+Quadro and
>
> 4) CPU+Tesla.
>
> Obviously the fastest is the Tesla one with 31ns/day. This is 3.6 times
> faster than the CPU-only setup.
>
>
> While this is good, I am not entirely satisfied with the speed-up. Do you
> think this is normal? Would you expect more?
>

3.6x is perfectly normal; the typical GPU acceleration improvement is 2-4x.

What makes you unsatisfied; why do you expect more speedup? (If you happen
to be comparing to the speedup of other MD packages, do consider that
GROMACS has highly optimized SIMD CPU kernels which make it quite fast on
CPUs alone. With an already highly optimized baseline it is harder to get a
large speedup, no matter what kind of accelerator you use.)
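
For what it's worth, if you want to make the four comparisons above fully
reproducible, you can pin down each configuration explicitly instead of
relying on the defaults. A rough sketch, assuming the Tesla and the Quadro
show up as GPU ids 0 and 1 in your mdrun log:

  # CPU only: keep the non-bonded kernels on the CPU
  gmx mdrun -deffnm npt-ini -nb cpu
  # default: mdrun uses all compatible GPUs it detects
  gmx mdrun -deffnm npt-ini
  # restrict the run to one specific GPU
  gmx mdrun -deffnm npt-ini -gpu_id 0
  gmx mdrun -deffnm npt-ini -gpu_id 1

The header of each log file lists which GPU(s) were actually used, which is
worth checking before comparing timings.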


> One thing I noticed is that there was absolutely no difference with using
> the custom, GPU-oriented compilation of GROMACS. Did I miss something there?
>

Not sure what you're referring to here, could you clarify?


> The second thing I noticed is that even by increasing nstlist the
> performance remained the same (despite the suggestion in the documentation).
>

Increasing from what value to what value? Note that mdrun will by default
increase nstlist if the initial value is small.
See Table 2 and related text in http://doi.wiley.com/10.1002/jcc.24030.
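
If you do want to experiment with it, note that you can override the .mdp
value at run time without re-running grompp; something like this (the value
40 is only an example):

  # larger nstlist = less frequent pair search, larger pair list (more GPU work)
  gmx mdrun -v -deffnm npt-ini -gpu_id 0 -nstlist 40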

> Finally, in my log file I got the message (the actual log is attached to
> the message):
>
> Force evaluation time GPU/CPU: 2.232 ms/3.189 ms = 0.700
>
> For optimal performance this ratio should be close to 1!
>
> NOTE: The GPU has >25% less load than the CPU. This imbalance causes
> performance loss.
>
>
> Can you please help me solve this imbalance? At the moment I am executing
> gromacs with: gmx mdrun -v -deffnm npt-ini -gpu_id 0
>

The automated CPU-GPU load balancer should address this on its own - if
possible. If your CPU is relatively slow, there is often not much more to
do.
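
Beyond that, it is worth making sure all CPU cores are pinned and working on
the run while the GPU is active; something along these lines, with the
thread count adjusted to your hardware (the 8 below is only an example):

  gmx mdrun -v -deffnm npt-ini -gpu_id 0 -pin on -ntomp 8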

Post log files of your runs and we may be able to suggest more.

Cheers,
--
Szilárd

> Thank you in advance for your help.
>
>
> Best Regards,
>
> Michail
>
>
> -------------------------------------------------------------------
> Michail (Michalis) Palaiokostas
> PhD Student
> School of Engineering and Materials Science
> Queen Mary University of London
> -------------------------------------------------------------------
>
>

