[gmx-users] Question about GPU acceleration in GROMACS 5

Carsten Kutzner ckutzne at gwdg.de
Fri Dec 12 16:19:34 CET 2014


Hi,

On 12 Dec 2014, at 15:47, Tomy van Batis <tomyvanbatis at gmail.com> wrote:

> Hi Mark
> 
> Thanks for your detailed response.
> 
> I still don't see why the GPU load is only around 50%, nor why this
> number increases with the number of CPU cores.
> 
> For example, when using 1 CPU core (-ntomp 1 in the mdrun command), the GPU
> load is only about 25-30%, whereas with 4 CPU cores the GPU load is 55%.
> 
> Considering that the work done on the GPU takes a lot longer than the work
> done on the CPU, I would expect the GPU load not to change when changing
> the number of OpenMP threads. Is this correct, or am I missing something here?
> 
> Additionally, I don't really see why the GPU is not loaded to 100%.
> Is this because of the system size?
The GPU is idle for part of each time step while it waits for new positions
for which to calculate forces; the time-step integration is done on the CPU.
Additionally, the balancing between the real- and reciprocal-space parts of
the electrostatics calculation optimizes for the shortest possible wall-clock
time per step, not for the highest possible GPU load.
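
As a rough illustration with made-up numbers: if the CPU-only part of a step
(integration and, in your case, the LINCS constraints) takes about as long as
the non-bonded kernel on the GPU, the GPU sits idle for roughly half of each
step, which is about what nvidia-smi then reports. More OpenMP threads shrink
the CPU-only part, so the GPU waits less and the reported load goes up. You
can watch this yourself by comparing different thread counts with something
like

  gmx mdrun -ntomp 4 -nb gpu
  nvidia-smi -l 1      (in a second terminal; samples the GPU utilization every second)

and then repeating with -ntomp 1 and -ntomp 2. The timing table at the end of
md.log shows where the CPU time goes.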

Carsten

> 
> Tommy
> 
> 
> 
>> Hi,
>> 
>> Only the short-ranged non-bonded work is offloaded to the GPU, but that's
>> almost all the force-based work you are doing. So it is entirely
>> unsurprising that the work done on the GPU takes a lot longer than it does
>> on the CPU. That warning is aimed at the more typical PME-based simulation
>> where the long-ranged part is done on the CPU, and now there is load to
>> balance. Running constraints+update happens only on the CPU, which is
>> always a bottleneck, and worse in your case.
>> 
>> Ideally, we'd share some load that your simulations are doing solely on the
>> GPU with the CPU, and/or do the update on the GPU, but none of the
>> infrastructure is there for that.
>> 
>> Mark
> 
> 
> On Fri, Dec 12, 2014 at 2:00 PM, Tomy van Batis <tomyvanbatis at gmail.com>
> wrote:
>> 
>> Dear all
>> 
>> I am working with a system of about 200,000 particles. All the non-bonded
>> interactions in the system are of the Lennard-Jones type (no Coulomb). I
>> constrain the bond lengths with LINCS. No torsion or bending interactions
>> are taken into account.
>> 
>> 
>> I am running the simulations on a 4-core Xeon® E5-1620 @ 3.70GHz together
>> with an NVIDIA Tesla K20Xm. I observe a strange behavior when looking at
>> the performance of the simulations:
>> 
>> 
>> 1. Running on 4 cores + GPU
>> 
>> GPU/CPU force evaluation time=9.5 and GPU usage=58% (I see that with the
>> command nvidia-smi)
>> 
>> 
>> [image: Inline image 1]
>> 
>> 
>> 
>> 2. Running on 2 cores + GPU
>> 
>> GPU/CPU force evaluation time=9.9 and GPU usage=45-50% (Image is not
>> included due to size restrictions)
>> 
>> 
>> 
>> The situation doesn't change if I include the option -nb gpu (or gpu_cpu)
>> in the mdrun command.
>> 
>> 
>> I can see in the mailing list that the GPU/CPU force evaluation time ratio
>> should be about 1, which means that I am far from optimal performance.
>> 
>> 
>> Does anybody have any suggestions about how to improve the computational
>> speed?
>> 
>> 
>> Thanks in advance,
>> 
>> Tommy
>> 
>> 
>> 
>> 


--
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics
Am Fassberg 11, 37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/grubmueller/kutzner
http://www.mpibpc.mpg.de/grubmueller/sppexa


