[gmx-developers] std::vector<Rvec> slowness
Berk Hess
hess at kth.se
Wed Mar 23 15:09:50 CET 2016
Hi,
It could be that I was wrong. I can't reproduce the performance
difference in this particular loop. It might have been caused by the
extra vector copy (but I am pretty sure I checked without it as well). So
it looks like there is no issue with std::vector<RVec> itself.
Szilard's observation of performance degradation between my vsite patch
with and without std::vector is still there though. We'll track down
where this comes from.
Cheers,
Berk
On 2016-03-23 14:46, Berk Hess wrote:
> On 2016-03-23 14:18, Mark Abraham wrote:
>> Hi,
>>
>> On Wed, Mar 23, 2016 at 1:56 PM Berk Hess <hess at kth.se> wrote:
>>
>> Hi,
>>
>> It's impractical and bad for performance to go from AoS to SoA.
>>
>>
>> Sure, but that's what we do every MD step on typical CPUs. We do DD
>> communication on XYZXYZXYZXYZ and then transpose that to XXXXYYYYZZZZ
>> for use in the short-ranged kernels. If we did DD communication on
>> XXXXX....YYYYY..... ZZZZZZ.... then we would still have gather-scatter
>> overhead, but no transposition in the mix.
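[Editor's note: the per-step transposition described here can be sketched as below. This is a minimal standalone illustration with made-up names (Rvec, transposeAosToSoa), not actual GROMACS code.]

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the GROMACS rvec: one atom's x, y, z (AoS layout).
struct Rvec
{
    float x, y, z;
};

// Transpose XYZXYZXYZ... (AoS) into XXXX... YYYY... ZZZZ... (SoA),
// the layout the short-ranged SIMD kernels consume.
void transposeAosToSoa(const std::vector<Rvec> &aos,
                       std::vector<float>      &x,
                       std::vector<float>      &y,
                       std::vector<float>      &z)
{
    const std::size_t n = aos.size();
    x.resize(n);
    y.resize(n);
    z.resize(n);
    for (std::size_t i = 0; i < n; i++)
    {
        x[i] = aos[i].x;
        y[i] = aos[i].y;
        z[i] = aos[i].z;
    }
}
```

Doing DD communication directly on the three flat arrays would remove this transposition step, at the cost of keeping the gather/scatter overhead.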
>>
>> The question now is whether we can fix this issue or whether we should
>> always use .data() in inner loops
>>
>>
>> I think you're missing my point... we've always done a cast from rvec
>> * to real * to pass to inner loops. :-) Using data() is a similar
>> idea - hence my suggestion of as_real_array(theVectorOfRVecs).
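[Editor's note: one possible shape for the suggested as_real_array() helper, purely illustrative; the real/RVec definitions are simplified stand-ins, and the reinterpret_cast mirrors the existing rvec* to real* cast.]

```cpp
#include <vector>

typedef float real;              // stand-in for the GROMACS real type
struct RVec
{
    real x, y, z;                // simplified stand-in for gmx::RVec
};

// View a std::vector<RVec> as a flat real array for inner loops.
// Assumes RVec is exactly three contiguous reals with no padding,
// which holds for this struct on common compilers.
static inline real *as_real_array(std::vector<RVec> &v)
{
    return reinterpret_cast<real *>(v.data());
}
```

An inner loop would then index the result as f[3*i + d], just as with the old C arrays.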
> No. The bonded, vsite and constraint code uses rvec, and we'd like to
> keep it that way in most cases so that we can use rvec operations.
>>
>> or if we should completely avoid std::vector<RVec> in performance
>> sensitive code and use good old C pointers with nasty manual
>> allocation.
>>
>>
>> Allocation is irrelevant to how one uses it. One must have a real *
>> to be sure the compiler isn't confused. To get that from a vector, we
>> use data(). Whether the allocation of the memory is managed by
>> std::vector or snew is immaterial.
> That's true. But we need a way to keep coders from using [] on
> std::vector<RVec> in performance-sensitive code. Or we could consider
> not allowing that anywhere, which is at least possible (I just
> discussed this with Mark).
>
> Berk
>>
>> Mark
>>
>> Cheers,
>>
>> Berk
>>
>>
>> On 2016-03-23 13:43, Mark Abraham wrote:
>>> Hi,
>>>
>>> RVec was intended as a convenience feature for people writing
>>> high-level, rather than low-level code. I would indeed not
>>> advocate using it in performance-sensitive mdrun code without
>>> checking for such performance effects.
>>>
>>> I think the real issue is closer to that the use of *rvec*
>>> generally confuses compilers because annotations about pointers
>>> do indeed get lost, and perhaps the [3] in real(*)[3] is not as
>>> close to being part of the type in the compiler implementation
>>> as we might expect. (For a long time, we've used very little
>>> rvec close to the short-ranged kernels. The first thing we do
>>> with the rvec * args to the SIMD bondeds is to cast them to real
>>> *. I think this is just as hard to police as Berk's concern
>>> about not using std::vector<RVec>::operator[] naively - the
>>> problem isn't the "fault" of std::vector or its use.)
>>>
>>> The use of rvec * engenders some transposing that might be
>>> avoided if we had SoA rather than AoS layout in mdrun. A single
>>> rvec is likely to have all three coordinates on the same cache
>>> line, but if one is anyway often using nearby atoms the caching
>>> behaviour of three separate real * arrays should be rather
>>> similar. One still tends to have gather/scatter overhead, however.
>>>
>>> Mark
>>>
>>>
>>> On Wed, Mar 23, 2016 at 11:24 AM Berk Hess <hess at kth.se
>>> <mailto:hess at kth.se>> wrote:
>>>
No.
Using the RVec * pointer obtained through .data() already
fixes the issue. We don't even need as_rvec_array().
>>>
The real issue is that it is nearly impossible to enforce this,
and to detect direct element access on vectors of RVec, if we
start using std::vector everywhere throughout the code.
>>>
>>> Cheers,
>>>
>>> Berk
>>>
>>>
>>> On 2016-03-23 11:19, Mark Abraham wrote:
>>>> Hi,
>>>>
>>>> If so, then we might want a helper function like
>>>> as_rvec_array, but as_real_array instead.
>>>>
>>>> Mark
>>>>
>>>> On Wed, Mar 23, 2016 at 10:51 AM Berk Hess <hess at kth.se
>>>> <mailto:hess at kth.se>> wrote:
>>>>
>>>> On 2016-03-23 10:50, Erik Lindahl wrote:
>>>>> Hi,
>>>>>
>>>>> I haven't followed the discussion in detail, but a
>>>>> long time ago I remember having similar issues in the
>>>>> kernels when using a list[] of rvec[] (in plain old C,
>>>>> no C++) instead of extracting the pointer and handling
>>>>> the multiply-by-3 manually. Could it be something
>>>>> similar here, e.g. that the compiler thinks it does
>>>>> not know enough about RVec, rather than something going
>>>>> wrong with the outer list?
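[Editor's note: the plain-C approach recalled here, extracting the pointer and handling the multiply-by-3 manually, looks roughly like this illustrative sketch with made-up function names.]

```cpp
typedef float real;
typedef real rvec[3];

// Summing through the rvec list[] directly, as described above...
real sumDirect(const rvec *f, int n)
{
    real s = 0;
    for (int i = 0; i < n; i++)
    {
        s += f[i][0] + f[i][1] + f[i][2];
    }
    return s;
}

// ...versus a flat real* with a manual multiply-by-3, which some
// compilers reportedly optimized noticeably better.
real sumFlat(const real *f, int n)
{
    real s = 0;
    for (int i = 0; i < n; i++)
    {
        s += f[3*i + 0] + f[3*i + 1] + f[3*i + 2];
    }
    return s;
}
```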
>>>> That would be my guess. The index used in the same loop
>>>> comes from a vector as well and doesn't seem to affect
>>>> performance.
>>>>
>>>>
>>>> Berk
>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Erik
>>>>>
>>>>>> On 23 Mar 2016, at 10:44, Berk Hess <hess at kth.se> wrote:
>>>>>>
>>>>>> On 2016-03-23 10:42, Mark Abraham wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Wed, Mar 23, 2016 at 9:44 AM Berk Hess
>>>>>>> <hess at kth.se> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Luckily Szilard does thorough testing and
>>>>>>> noticed a performance
>>>>>>> degradation in patch set 25 of
>>>>>>> https://gerrit.gromacs.org/#/c/5232/
>>>>>>> The only significant change with respect to previous
>>>>>>> sets is replacing C
>>>>>>> pointers by std::vector. I traced the
>>>>>>> performance difference back to a
>>>>>>> single loop, which must have become several
>>>>>>> factors slower to explain
>>>>>>> the time difference. I get the performance back
>>>>>>> when replacing the
>>>>>>> vector by a pointer extracted with .data(), see
>>>>>>> below. I looked at the
>>>>>>> assembly code from gcc 5.3.1 and the vector case
>>>>>>> generated 200 extra
>>>>>>> instructions, which makes it difficult to see
>>>>>>> what the actual difference
>>>>>>> is. The pointer case uses a lot of vmovss and
>>>>>>> vaddss, which the vector
>>>>>>> case uses much less, but these are only scalar,
>>>>>>> non-vectorized instructions. I thought
>>>>>>> that [] in vector might do bounds checking,
>>>>>>>
>>>>>>>
>>>>>>> Definitely not in release builds.
>>>>>>>
>>>>>>> but it seems it does not.
>>>>>>> Can anyone explain why the vector case can be so
>>>>>>> slow?
>>>>>>>
>>>>>>> If this is a general issue (with RVec or more?),
>>>>>>> we would always need to extract
>>>>>>> a pointer with .data() for use in all
>>>>>>> inner loops. This is pretty
>>>>>>> annoying and difficult to enforce.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Berk
>>>>>>>
>>>>>>> const std::vector<RVec> f_foreign =
>>>>>>> idt_foreign->force;
>>>>>>>
>>>>>>>
>>>>>>> This does a copy of the vector, and doesn't seem to
>>>>>>> be in any version of this patch in gerrit. Is this
>>>>>>> what you meant to write?
>>>>>> I tried this. But my original "vectorized" patch set
>>>>>> took a pointer from idt_foreign and did not copy the
>>>>>> vector; that gives the same slow performance.
>>>>>>
>>>>>> Berk
>>>>>>>
>>>>>>> Mark
>>>>>>>
>>>>>>> or
>>>>>>> const RVec *f_foreign =
>>>>>>> idt_foreign->force.data();
>>>>>>>
>>>>>>> int natom = atomList->atom.size();
>>>>>>> for (int i = 0; i < natom; i++)
>>>>>>> {
>>>>>>> int ind = atomList->atom[i];
>>>>>>> rvec_inc(f[ind], f_foreign[ind]);
>>>>>>> }
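[Editor's note: a self-contained version of the loop above, with simplified stand-ins for RVec and rvec_inc; hoisting the pointer out of the vector via .data() before the loop is the fix being discussed.]

```cpp
#include <vector>

typedef float real;
struct RVec
{
    real v[3];
};

// Simplified stand-in for the GROMACS rvec_inc(): a += b.
static inline void rvec_inc(real *a, const RVec &b)
{
    a[0] += b.v[0];
    a[1] += b.v[1];
    a[2] += b.v[2];
}

// Accumulate foreign forces into f. Taking f_foreign via .data()
// gives the compiler a plain pointer instead of repeated
// std::vector::operator[] calls in the hot loop.
void addForeignForces(real (*f)[3],
                      const std::vector<RVec> &force,
                      const std::vector<int>  &atom)
{
    const RVec *f_foreign = force.data();
    const int   natom     = static_cast<int>(atom.size());
    for (int i = 0; i < natom; i++)
    {
        int ind = atom[i];
        rvec_inc(f[ind], f_foreign[ind]);
    }
}
```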
>>>>>>> --
>>>>>>> Gromacs Developers mailing list
>>>>>>>
>>>>>>> * Please search the archive at
>>>>>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List
>>>>>>> before posting!
>>>>>>>
>>>>>>> * Can't post? Read
>>>>>>> http://www.gromacs.org/Support/Mailing_Lists
>>>>>>>
>>>>>>> * For (un)subscribe requests visit
>>>>>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
>>>>>>> or send a mail to
>>>>>>> gmx-developers-request at gromacs.org.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>