[gmx-developers] std::vector<Rvec> slowness
hess at kth.se
Wed Mar 23 10:51:38 CET 2016
On 2016-03-23 10:50, Erik Lindahl wrote:
> I haven’t followed the discussion in detail, but a long time ago I
> remember having similar issues in the kernels when using a list of
> rvec (in plain-old-C, no C++) instead of extracting the pointer and
> handling the multiply-by-3 manually. Could it be something similar
> here, e.g. that the compiler thinks it does not know enough about RVec,
> rather than something going wrong with the outer list?
That would be my guess. The index used in the same loop comes from a
vector as well and doesn't seem to affect performance.
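
To make the two access patterns Erik describes concrete, here is a minimal sketch: indexing a list of 3-vectors versus extracting a flat pointer and handling the multiply-by-3 by hand. The `Real3` typedef and both function names are hypothetical stand-ins, not GROMACS code, and the sketch assumes an rvec is a plain array of three floats. It only shows that the two forms compute the same thing; it makes no claim about which one a given compiler optimizes better.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a C-style rvec: a plain array of three floats.
typedef float Real3[3];

// Variant A: index the list of 3-vectors directly.
void addIndexed(Real3* f, const Real3* fForeign, const std::vector<int>& atoms)
{
    for (std::size_t i = 0; i < atoms.size(); i++)
    {
        const int ind = atoms[i];
        for (int d = 0; d < 3; d++)
        {
            f[ind][d] += fForeign[ind][d];
        }
    }
}

// Variant B: work on a flat float pointer and do the multiply-by-3 manually.
void addFlat(float* f, const float* fForeign, const std::vector<int>& atoms)
{
    for (std::size_t i = 0; i < atoms.size(); i++)
    {
        const int base = 3 * atoms[i];
        f[base + 0] += fForeign[base + 0];
        f[base + 1] += fForeign[base + 1];
        f[base + 2] += fForeign[base + 2];
    }
}
```

In both variants the atom index still comes from a std::vector, matching the observation above that the index vector does not seem to matter.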
>> On 23 Mar 2016, at 10:44, Berk Hess <hess at kth.se> wrote:
>> On 2016-03-23 10:42, Mark Abraham wrote:
>>> On Wed, Mar 23, 2016 at 9:44 AM Berk Hess <hess at kth.se> wrote:
>>> Luckily Szilard does thorough testing and noticed a performance
>>> degradation in change set 25 of https://gerrit.gromacs.org/#/c/5232/
>>> The only significant change with respect to previous sets is replacing
>>> C pointers by std::vector. I traced the performance difference back to
>>> a single loop, which must have become several factors slower to explain
>>> the time difference. I get the performance back when replacing the
>>> vector by a pointer extracted with .data(), see below. I looked at the
>>> assembly code from gcc 5.3.1 and the vector case generated 200 extra
>>> instructions, which makes it difficult to see what the actual
>>> difference is. The pointer case uses a lot of vmovss and vaddss, which
>>> the vector one does much less, but these are only serial SIMD
>>> instructions. I thought that [] in vector might do bounds checking,
>>> Definitely not in release builds.
>>> but it seems it does not.
>>> Can anyone explain why the vector case can be so slow?
>>> If this is a general issue (with RVec or more?), we need to always
>>> extract a pointer with .data() for use in all inner loops. This is
>>> pretty annoying and difficult to enforce.
>>> const std::vector<RVec> f_foreign =
>>> This does a copy of the vector, and doesn't seem to be in any
>>> version of this patch in gerrit. Is this what you meant to write?
>> I tried this. But my original "vectorized" patch set took a pointer
>> from idt_foreign and did not copy the vector; that gives the same
>> slow performance.
>>> const RVec *f_foreign =
>>> int natom = atomList->atom.size();
>>> for (int i = 0; i < natom; i++)
>>> {
>>>     int ind = atomList->atom[i];
>>>     rvec_inc(f[ind], f_foreign[ind]);
>>> }
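
For reference, the two forms of the loop compared in the quoted message can be sketched in a self-contained way as follows. `Vec3` and `vecInc` are hypothetical stand-ins for `gmx::RVec` and `rvec_inc`, and the function names and signatures are assumptions made for illustration; the elided right-hand sides of the `f_foreign` declarations above are not reconstructed here. The sketch shows only the shape of each variant, not a measurement.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for gmx::RVec: three floats with operator[].
struct Vec3
{
    float v[3];
    float&       operator[](int d) { return v[d]; }
    const float& operator[](int d) const { return v[d]; }
};

// Hypothetical equivalent of rvec_inc(): a += b, component by component.
inline void vecInc(Vec3& a, const Vec3& b)
{
    for (int d = 0; d < 3; d++)
    {
        a[d] += b[d];
    }
}

// Shape reported as slow in the thread: index the std::vector in the loop.
void reduceViaVector(Vec3* f, const std::vector<Vec3>& fForeign, const std::vector<int>& atoms)
{
    const int natom = static_cast<int>(atoms.size());
    for (int i = 0; i < natom; i++)
    {
        const int ind = atoms[i];
        vecInc(f[ind], fForeign[ind]);
    }
}

// Workaround shape: extract a raw pointer with .data() once, before the loop.
void reduceViaPointer(Vec3* f, const std::vector<Vec3>& fForeign, const std::vector<int>& atoms)
{
    const Vec3* foreignPtr = fForeign.data();
    const int   natom      = static_cast<int>(atoms.size());
    for (int i = 0; i < natom; i++)
    {
        const int ind = atoms[i];
        vecInc(f[ind], foreignPtr[ind]);
    }
}
```

Both functions produce identical results; any difference would be in the code the compiler generates, which is best checked by comparing the assembly for the specific compiler and flags in use.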
>>> Gromacs Developers mailing list
>>> * Please search the archive at
>>> before posting!
>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>> * For (un)subscribe requests visit
>>> or send a mail to gmx-developers-request at gromacs.org
>>> <mailto:gmx-developers-request at gromacs.org>.