[gmx-users] GPU waits for CPU, any remedies?
Szilárd Páll
pall.szilard at gmail.com
Wed Sep 17 16:18:13 CEST 2014
Dear Michael,
I checked and indeed, the Ryckaert-Bellemans dihedrals are not SIMD
accelerated - that's why they are quite slow. While your CPU is the
bottleneck and you're quite right that the PP-PME balancing can't do
much about this kind of imbalance, the good news is that it can be
faster - even without a new CPU.
With SIMD this will accelerate quite well and will likely cut down
your bonded time by a lot (I'd guess at least 3-4x with AVX, maybe
more with FMA). This code has not been SIMD-optimized yet, mostly
because in typical runs the RB computation takes relatively little
time, and additionally because rewriting these kernels for SIMD
acceleration is not exactly developer-friendly.
However, it will likely get implemented soon, which in your case
will bring big improvements.
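
To make the SIMD point a bit more concrete, below is a minimal sketch,
not the actual GROMACS kernel and with purely illustrative coefficients:
it evaluates the RB energy V(psi) = C0 + C1*cos(psi) + ... + C5*cos(psi)^5
with Horner's scheme, once in plain scalar C and once for eight dihedrals
at a time with AVX intrinsics.

/*
 * Minimal sketch, not the GROMACS kernel: evaluate the Ryckaert-Bellemans
 * energy  V(psi) = C0 + C1*cos(psi) + ... + C5*cos(psi)^5  for a batch of
 * dihedrals, once in scalar C and once with AVX, to show where the SIMD
 * speed-up would come from. The coefficients are illustrative only.
 * Build with e.g.:  gcc -O2 -mavx rb_sketch.c -o rb_sketch
 */
#include <immintrin.h>
#include <stdio.h>

#define NDIH 8   /* one AVX register worth of single-precision values */

/* Illustrative RB coefficients (kJ/mol) */
static const float C[6] = { 9.28f, 12.16f, -13.12f, -3.06f, 26.24f, -31.5f };

/* Scalar reference: Horner's scheme in cos(psi) */
static float rb_energy_scalar(float cospsi)
{
    float v = C[5];
    for (int n = 4; n >= 0; n--)
    {
        v = v*cospsi + C[n];
    }
    return v;
}

/* AVX version: the same Horner recurrence, 8 dihedrals per iteration */
static void rb_energy_avx(const float *cospsi, float *v, int ndih)
{
    for (int i = 0; i < ndih; i += 8)
    {
        __m256 c = _mm256_loadu_ps(cospsi + i);
        __m256 e = _mm256_set1_ps(C[5]);
        for (int n = 4; n >= 0; n--)
        {
            e = _mm256_add_ps(_mm256_mul_ps(e, c), _mm256_set1_ps(C[n]));
        }
        _mm256_storeu_ps(v + i, e);
    }
}

int main(void)
{
    float cospsi[NDIH], v_avx[NDIH];

    for (int i = 0; i < NDIH; i++)
    {
        cospsi[i] = -1.0f + 2.0f*i/(NDIH - 1);   /* spread cos(psi) over [-1,1] */
    }
    rb_energy_avx(cospsi, v_avx, NDIH);
    for (int i = 0; i < NDIH; i++)
    {
        printf("cos(psi) = %6.3f   scalar = %8.3f   avx = %8.3f\n",
               cospsi[i], rb_energy_scalar(cospsi[i]), v_avx[i]);
    }
    return 0;
}

The real kernel of course also needs the forces (dV/dpsi and the chain
rule back to the four atom positions), which is what makes rewriting it
in SIMD form more tedious than this energy-only sketch.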
Cheers,
--
Szilárd
On Wed, Sep 17, 2014 at 3:01 PM, Michael Brunsteiner <mbx0009 at yahoo.com> wrote:
>
> Dear Szilard,
> yes it seems i just should have done a bit more research regarding
> the optimal CPU/GPU combination ... and as you point out, the
> bonded interactions are the culprits ... most often people probably
> simulate aqueous systems, in which LINCS does most of this work ...
> here i have a polymer glass ... different story ...
> the flops table you miss was in my previous mail (see below for another
> copy) and indeed it tells me that 65% of the CPU load is "Force" while
> only 15.5% is for PME mesh, and i assume only the latter is what can
> be modified by dynamic load balancing ... i assume this means
> there is no way to improve things ... i guess i just have to live
> with the fact that for this type of system my slow CPU is the
> bottleneck ... if you have any other ideas please let me know...
> regards
> mic
>
>
>
> :
>
> Computing:            Num   Num      Call     Wall time     Giga-Cycles
>                       Ranks Threads  Count       (s)       total sum     %
> -----------------------------------------------------------------------------
> Neighbor search         1    12        251       0.574        23.403    2.1
> Launch GPU ops.         1    12      10001       0.627        25.569    2.3
> Force                   1    12      10001      17.392       709.604   64.5
> PME mesh                1    12      10001       4.172       170.234   15.5
> Wait GPU local          1    12      10001       0.206         8.401    0.8
> NB X/F buffer ops.      1    12      19751       0.239         9.736    0.9
> Write traj.             1    12         11       0.381        15.554    1.4
> Update                  1    12      10001       0.303        12.365    1.1
> Constraints             1    12      10001       1.458        59.489    5.4
> Rest                                              1.621        66.139    6.0
> -----------------------------------------------------------------------------
> Total                                            26.973      1100.493  100.0
>
> ===============================
>
> Why be happy when you could be normal?
>
> --------------------------------------------
> On Tue, 9/16/14, Szilárd Páll <pall.szilard at gmail.com> wrote:
>
> Subject: Re: [gmx-users] GPU waits for CPU, any remedies?
> To: "Michael Brunsteiner" <mbx0009 at yahoo.com>
> Cc: "Discussion list for GROMACS users" <gmx-users at gromacs.org>, "gromacs.org_gmx-users at maillist.sys.kth.se" <gromacs.org_gmx-users at maillist.sys.kth.se>
> Date: Tuesday, September 16, 2014, 6:52 PM
>
> Well, it looks like you are i) unlucky and ii) limited by the huge
> bonded workload.
>
> i) As your system is quite small, mdrun thinks that there are no
> convenient grids between 32x32x32 and 28x28x28 (see the PP-PME tuning
> output). As the latter corresponds to quite a big jump in cut-off
> (from 1.296 to 1.482 nm), which more than doubles the non-bonded
> workload and is slower than the former, mdrun sticks to using 1.296 nm
> as coulomb cut-off. You may be able to gain some performance by
> tweaking your fourier grid spacing a bit to help mdrun generate some
> additional grids that could give more cut-off settings in the 1.3-1.48
> range (a sketch of such a tweak is appended below). However, on second
> thought, I guess there aren't any more convenient grid sizes between
> 28 and 32.
>
> ii) The primary issue, however, is that your bonded workload is much
> higher than it normally is. I'm not fully familiar with the
> implementation, but I think this may be due to the RB term, which is
> quite slow. This time it's the flops table that could confirm this,
> but as you still have not shared the entire log file, we/I can't tell.
>
> Cheers,
> --
> Szilárd
>
>
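
For reference, the kind of .mdp tweak meant in point i) of the quoted
message would look roughly like the lines below; the spacing value is
illustrative only, and whether an intermediate grid actually becomes
available (and pays off) depends on the box and on what the PP-PME tuner
reports in the log.

; Illustrative sketch only, not a recommendation: coarsen the PME grid
; spacing slightly so that the PP-PME tuner has more grid/cut-off pairs
; to try between the 32x32x32 and 28x28x28 grids mentioned above.
fourierspacing  = 0.13    ; nm; the GROMACS default is 0.12
; The grid can also be pinned explicitly with fourier-nx/-ny/-nz,
; as long as the sizes are compatible with the box dimensions.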