[gmx-developers] std::vector<Rvec> slowness
Berk Hess
hess at kth.se
Wed Mar 23 15:09:50 CET 2016
Hi,
It could be that I was wrong. I can't reproduce the performance
difference in this particular loop. It might have been caused by the
extra vector copy (but I am pretty sure I checked without it as well). So
it looks like there is no issue with std::vector<RVec> itself.
Szilard's observation of performance degradation between my vsite patch
with and without std::vector is still there though. We'll track down
where this comes from.
Cheers,
Berk
On 2016-03-23 14:46, Berk Hess wrote:
> On 2016-03-23 14:18, Mark Abraham wrote:
>> Hi,
>>
>> On Wed, Mar 23, 2016 at 1:56 PM Berk Hess <hess at kth.se> wrote:
>>
>> Hi,
>>
>> It's impractical and bad for performance to go from AoS to SoA.
>>
>>
>> Sure, but that's what we do every MD step on typical CPUs. We do DD
>> communication on XYZXYZXYZXYZ and then transpose that to XXXXYYYYZZZZ
>> for use in the short-ranged kernels. If we did DD communication on
>> XXXXX....YYYYY..... ZZZZZZ.... then we would still have gather-scatter
>> overhead, but no transposition in the mix.
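[Editor's note: the per-step transposition described here can be sketched as below. This is a minimal standalone illustration with made-up names (Rvec, transposeAosToSoa), not actual GROMACS code.]

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the GROMACS rvec: one atom's x, y, z (AoS layout).
struct Rvec
{
    float x, y, z;
};

// Transpose XYZXYZXYZ... (AoS) into XXXX... YYYY... ZZZZ... (SoA),
// the layout the short-ranged SIMD kernels consume.
void transposeAosToSoa(const std::vector<Rvec> &aos,
                       std::vector<float>      &x,
                       std::vector<float>      &y,
                       std::vector<float>      &z)
{
    const std::size_t n = aos.size();
    x.resize(n);
    y.resize(n);
    z.resize(n);
    for (std::size_t i = 0; i < n; i++)
    {
        x[i] = aos[i].x;
        y[i] = aos[i].y;
        z[i] = aos[i].z;
    }
}
```

Doing DD communication directly on the three flat arrays would remove this transposition step, at the cost of keeping the gather/scatter overhead.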
>>
>> The question now is whether we can fix this issue or whether we should
>> always use .data() in inner loops
>>
>>
>> I think you're missing my point... we've always done a cast from rvec
>> * to real * to pass to inner loops. :-) Using data() is a similar
>> idea - hence my suggestion of as_real_array(theVectorOfRVecs).
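[Editor's note: one possible shape for the suggested as_real_array() helper, purely illustrative; the real/RVec definitions are simplified stand-ins, and the reinterpret_cast mirrors the existing rvec* to real* cast.]

```cpp
#include <vector>

typedef float real;              // stand-in for the GROMACS real type
struct RVec
{
    real x, y, z;                // simplified stand-in for gmx::RVec
};

// View a std::vector<RVec> as a flat real array for inner loops.
// Assumes RVec is exactly three contiguous reals with no padding,
// which holds for this struct on common compilers.
static inline real *as_real_array(std::vector<RVec> &v)
{
    return reinterpret_cast<real *>(v.data());
}
```

An inner loop would then index the result as f[3*i + d], just as with the old C arrays.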
> No. The bonded, vsite and constraint code uses rvec, and we'd like to
> keep it that way in most cases so that we can use rvec operations.
>>
>> or if we should completely avoid std::vector<RVec> in performance
>> sensitive code and use good old C pointers with nasty manual
>> allocation.
>>
>>
>> Allocation is irrelevant to how one uses it. One must have a real *
>> to be sure the compiler isn't confused. To get that from a vector, we
>> use data(). Whether the allocation of the memory is managed by
>> std::vector or snew is immaterial.
> That's true. But we need a way to keep coders from using [] on
> std::vector<RVec> in performance-sensitive code. Or we could consider
> not allowing that anywhere, which is at least possible (I just
> discussed this with Mark).
>
> Berk
>>
>> Mark
>>
>> Cheers,
>>
>> Berk
>>
>>
>> On 2016-03-23 13:43, Mark Abraham wrote:
>>> Hi,
>>>
>>> RVec was intended as a convenience feature for people writing
>>> high-level, rather than low-level code. I would indeed not
>>> advocate using it in performance-sensitive mdrun code without
>>> checking for such performance effects.
>>>
>>> I think the real issue is closer to that the use of *rvec*
>>> generally confuses compilers because annotations about pointers
>>> do indeed get lost, and perhaps the [3] in real(*)[3] is not as
>>> close to being part of the type in the compiler implementation
>>> as we might expect. (For a long time, we've used very little
>>> rvec close to the short-ranged kernels. The first thing we do
>>> with the rvec * args to the SIMD bondeds is to cast them to real
>>> *. I think this is just as hard to police as Berk's concern
>>> about not using std::vector<RVec>::operator[] naively - the
>>> problem isn't the "fault" of std::vector or its use.)
>>>
>>> The use of rvec * engenders some transposing that might be
>>> avoided if we had SoA rather than AoS layout in mdrun. A single
>>> rvec is likely to have all three coordinates on the same cache
>>> line, but if one is anyway often using nearby atoms the caching
>>> behaviour of three separate real * arrays should be rather
>>> similar. One still tends to have gather/scatter overhead, however.
>>>
>>> Mark
>>>
>>>
>>> On Wed, Mar 23, 2016 at 11:24 AM Berk Hess <hess at kth.se
>>> <mailto:hess at kth.se>> wrote:
>>>
No.
Using the RVec * pointer obtained through .data() already
fixes the issue. We don't even need as_rvec_array().
>>>
The real issue is that it is nearly impossible to enforce this,
and to detect direct element access on vectors of RVec, if we
start using std::vector everywhere throughout the code.
>>>
>>> Cheers,
>>>
>>> Berk
>>>
>>>
>>> On 2016-03-23 11:19, Mark Abraham wrote:
>>>> Hi,
>>>>
>>>> If so, then we might want a helper function like
>>>> as_rvec_array, but as_real_array instead.
>>>>
>>>> Mark
>>>>
>>>> On Wed, Mar 23, 2016 at 10:51 AM Berk Hess <hess at kth.se
>>>> <mailto:hess at kth.se>> wrote:
>>>>
>>>> On 2016-03-23 10:50, Erik Lindahl wrote:
>>>>> Hi,
>>>>>
>>>>> I haven't followed the discussion in detail, but a
>>>>> long time ago I remember having similar issues in the
>>>>> kernels when using a list[] of rvec[] (in plain old C,
>>>>> no C++) instead of extracting the pointer and handling
>>>>> the multiply-by-3 manually. Could it be something
>>>>> similar here, e.g. that the compiler thinks it does
>>>>> not know enough about RVec, rather than something going
>>>>> wrong with the outer list?
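[Editor's note: the plain-C approach recalled here, extracting the pointer and handling the multiply-by-3 manually, looks roughly like this illustrative sketch with made-up function names.]

```cpp
typedef float real;
typedef real rvec[3];

// Summing through the rvec list[] directly, as described above...
real sumDirect(const rvec *f, int n)
{
    real s = 0;
    for (int i = 0; i < n; i++)
    {
        s += f[i][0] + f[i][1] + f[i][2];
    }
    return s;
}

// ...versus a flat real* with a manual multiply-by-3, which some
// compilers reportedly optimized noticeably better.
real sumFlat(const real *f, int n)
{
    real s = 0;
    for (int i = 0; i < n; i++)
    {
        s += f[3*i + 0] + f[3*i + 1] + f[3*i + 2];
    }
    return s;
}
```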
>>>> That would be my guess. The index used in the same loop
>>>> comes from a vector as well and doesn't seem to affect
>>>> performance.
>>>>
>>>>
>>>> Berk
>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Erik
>>>>>
>>>>>> On 23 Mar 2016, at 10:44, Berk Hess <hess at kth.se> wrote:
>>>>>>
>>>>>> On 2016-03-23 10:42, Mark Abraham wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Wed, Mar 23, 2016 at 9:44 AM Berk Hess
>>>>>>> <hess at kth.se> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Luckily Szilard does thorough testing and
>>>>>>> noticed a performance
>>>>>>> degradation in patch set 25 of
>>>>>>> https://gerrit.gromacs.org/#/c/5232/
>>>>>>> The only significant change with respect to previous
>>>>>>> sets is replacing C
>>>>>>> pointers by std::vector. I traced the
>>>>>>> performance difference back to a
>>>>>>> single loop, which must have become several
>>>>>>> factors slower to explain
>>>>>>> the time difference. I get the performance back
>>>>>>> when replacing the
>>>>>>> vector by a pointer extracted with .data(), see
>>>>>>> below. I looked at the
>>>>>>> assembly code from gcc 5.3.1 and the vector case
>>>>>>> generated 200 extra
>>>>>>> instructions, which makes it difficult to see
>>>>>>> what the actual difference
>>>>>>> is. The pointer case uses a lot of vmovss and
>>>>>>> vaddss, which the vector
>>>>>>> case uses much less, but these are only scalar,
>>>>>>> non-vectorized instructions. I thought
>>>>>>> that [] in vector might do bounds checking,
>>>>>>>
>>>>>>>
>>>>>>> Definitely not in release builds.
>>>>>>>
>>>>>>> but it seems it does not.
>>>>>>> Can anyone explain why the vector case can be so
>>>>>>> slow?
>>>>>>>
>>>>>>> If this is a general issue (with RVec or more?),
>>>>>>> we would always need to extract
>>>>>>> a pointer with .data() for use in all
>>>>>>> inner loops. This is pretty
>>>>>>> annoying and difficult to enforce.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Berk
>>>>>>>
>>>>>>> const std::vector<RVec> f_foreign =
>>>>>>> idt_foreign->force;
>>>>>>>
>>>>>>>
>>>>>>> This does a copy of the vector, and doesn't seem to
>>>>>>> be in any version of this patch in gerrit. Is this
>>>>>>> what you meant to write?
>>>>>> I tried this. But my original "vectorized" patch set
>>>>>> took a pointer from idt_foreign and did not copy the
>>>>>> vector; that gives the same slow performance.
>>>>>>
>>>>>> Berk
>>>>>>>
>>>>>>> Mark
>>>>>>>
>>>>>>> or
>>>>>>> const RVec *f_foreign =
>>>>>>> idt_foreign->force.data();
>>>>>>>
>>>>>>> int natom = atomList->atom.size();
>>>>>>> for (int i = 0; i < natom; i++)
>>>>>>> {
>>>>>>> int ind = atomList->atom[i];
>>>>>>> rvec_inc(f[ind], f_foreign[ind]);
>>>>>>> }
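[Editor's note: a self-contained version of the loop above, with simplified stand-ins for RVec and rvec_inc; hoisting the pointer out of the vector via .data() before the loop is the fix being discussed.]

```cpp
#include <vector>

typedef float real;
struct RVec
{
    real v[3];
};

// Simplified stand-in for the GROMACS rvec_inc(): a += b.
static inline void rvec_inc(real *a, const RVec &b)
{
    a[0] += b.v[0];
    a[1] += b.v[1];
    a[2] += b.v[2];
}

// Accumulate foreign forces into f. Taking f_foreign via .data()
// gives the compiler a plain pointer instead of repeated
// std::vector::operator[] calls in the hot loop.
void addForeignForces(real (*f)[3],
                      const std::vector<RVec> &force,
                      const std::vector<int>  &atom)
{
    const RVec *f_foreign = force.data();
    const int   natom     = static_cast<int>(atom.size());
    for (int i = 0; i < natom; i++)
    {
        int ind = atom[i];
        rvec_inc(f[ind], f_foreign[ind]);
    }
}
```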
>>>>>>> --
>>>>>>> Gromacs Developers mailing list
>>>>>>>
>>>>>>> * Please search the archive at
>>>>>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List
>>>>>>> before posting!
>>>>>>>
>>>>>>> * Can't post? Read
>>>>>>> http://www.gromacs.org/Support/Mailing_Lists
>>>>>>>
>>>>>>> * For (un)subscribe requests visit
>>>>>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
>>>>>>> or send a mail to
>>>>>>> gmx-developers-request at gromacs.org.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>