[gmx-users] Question about GPU acceleration in GROMACS 5

Tomy van Batis tomyvanbatis at gmail.com
Fri Dec 12 15:48:02 CET 2014


Hi Mark

Thanks for your detailed response.

I still don't understand why the GPU load is only around 50%, nor why this
number increases with the number of CPU cores.

For example, when using 1 CPU core (-ntomp 1 in mdrun), the GPU load is
only about 25-30%, while with 4 CPU cores the GPU load is 55%.
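For reference, the comparison above can be reproduced by launching mdrun with different OpenMP thread counts and sampling the GPU in a second terminal. This is a sketch: the input file name `topol.tpr` and output prefixes are hypothetical, and the options follow GROMACS 5 syntax.

```shell
# One MPI rank, a single OpenMP thread (hypothetical input file topol.tpr)
mdrun -ntmpi 1 -ntomp 1 -s topol.tpr -deffnm run_1core

# Same run with four OpenMP threads
mdrun -ntmpi 1 -ntomp 4 -s topol.tpr -deffnm run_4core

# In another terminal, sample GPU utilization once per second
nvidia-smi --query-gpu=utilization.gpu --format=csv -l 1
```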

Considering that the work done on the GPU takes much longer than the work
done on the CPU, I would expect the GPU load not to change with the number
of OpenMP threads. Is this correct, or am I missing something here?

Additionally, I don't really see why the GPU is never loaded to 100%.
Is this because of the system size?

Tommy



> Hi,
>
> Only the short-ranged non-bonded work is offloaded to the GPU, but that's
> almost all the force-based work you are doing. So it is entirely
> unsurprising that the work done on the GPU takes a lot longer than it does
> on the CPU. That warning is aimed at the more typical PME-based simulation
> where the long-ranged part is done on the CPU, and now there is load to
> balance. Running constraints+update happens only on the CPU, which is
> always a bottleneck, and worse in your case.
>
> Ideally, we'd share some load that your simulations are doing solely on
> the GPU with the CPU, and/or do the update on the GPU, but none of the
> infrastructure is there for that.
>
> Mark


On Fri, Dec 12, 2014 at 2:00 PM, Tomy van Batis <tomyvanbatis at gmail.com>
wrote:
>
> Dear all
>
> I am working with a system of about 200,000 particles. All the non-bonded
> interactions in the system are of Lennard-Jones type (no Coulomb). I
> constrain the bond lengths with LINCS. No torsion or bending interactions
> are taken into account.
>
>
> I am running the simulations on a 4-core Xeon® E5-1620 @ 3.70GHz
> together with an NVIDIA Tesla K20Xm. I observe strange behavior when
> looking at the performance of the simulations:
>
>
> 1. Running on 4 cores + GPU
>
> GPU/CPU force evaluation time = 9.5 and GPU usage = 58% (I see that with
> the command nvidia-smi)
>
>
>
>
>
> 2. Running on 2 cores + GPU
>
> GPU/CPU force evaluation time = 9.9 and GPU usage = 45-50% (image not
> included due to size restrictions)
>
>
>
> The situation doesn't change if I include the option -nb gpu (or -nb
> gpu_cpu) in the mdrun command line.
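For completeness, the non-bonded offload setting mentioned above is selected on the mdrun command line like this. A sketch only: the input file name is hypothetical, and the flag values are the GROMACS 5 choices for -nb.

```shell
# Put all short-ranged non-bonded work on the GPU (hypothetical input file)
mdrun -nb gpu -s topol.tpr -deffnm run_gpu

# Split short-ranged non-bonded work between GPU and CPU
mdrun -nb gpu_cpu -s topol.tpr -deffnm run_gpu_cpu
```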
>
>
> I can see in the mailing list that the force evaluation time should be
> about 1, which means I am far from optimal performance.
>
>
> Does anybody have suggestions on how to improve the computational
> speed?
>
>
> Thanks in advance,
>
> Tommy
>

