[gmx-developers] 4xN kernel using Advanced NEON (VL=128 bits) and double-FP
guido.giuntoli at huawei.com
Fri Nov 27 11:29:31 CET 2020
I understand. I am feeding this kernel on both architectures (ARM/NEON and x86/AVX2) with the same input structures (initialized on the stack; I have replaced the STL vectors with C-style arrays and I align them with the C++11 alignas(X) specifier, compiling with GCC):
// src/gromacs/nbnxm/atomdata.h -> src/nbnxb_class_t_new.h
const nbnxn_atomdata_t nbat;
// src/gromacs/nbnxm/pairlist.h -> src/NbnxnPairlistCpu_new.h
const NbnxnPairlistCpu nbl;
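Here is a minimal sketch of the approach described above. `CoordBuffer128`, `CoordBuffer256`, `kNumAtoms` and `isAligned` are hypothetical names and do not reflect the GROMACS `nbnxn_atomdata_t` layout. It shows fixed-size C-style arrays over-aligned with the C++11 `alignas` specifier so that each array starts on a register-width boundary: 16 bytes for 128-bit NEON and 32 bytes for 256-bit AVX2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an input buffer; NOT the GROMACS
// nbnxn_atomdata_t layout, only an illustration of alignas.
constexpr std::size_t kNumAtoms = 8;

// 16 bytes = width of one 128-bit NEON register.
struct alignas(16) CoordBuffer128
{
    double xyz[3 * kNumAtoms]; // x, y, z of each atom, packed
};

// 32 bytes = width of one 256-bit AVX2 register.
struct alignas(32) CoordBuffer256
{
    double xyz[3 * kNumAtoms];
};

static_assert(alignof(CoordBuffer128) == 16, "expected 16-byte alignment");
static_assert(alignof(CoordBuffer256) == 32, "expected 32-byte alignment");

// Returns true if p lies on an 'alignment'-byte boundary.
inline bool isAligned(const void* p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0); // power of two
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Declaring `CoordBuffer256 buf{};` inside a function places it on the stack. GCC honors such extended alignments for local variables, and `isAligned(buf.xyz, 32)` serves as a cheap sanity check before aligned SIMD loads.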
I suppose that on both architectures the resulting force vector should be the same, right? Same input, same output, regardless of the SIMD instructions used; is that correct? Are these dummy particles defined in one of these two structures?
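On the question of identical output: floating-point addition is not associative, so accumulating the same terms in a different order can change the last bits of the result. Different SIMD widths typically produce different accumulation orders. The sketch below uses a hypothetical function, `laneSum`, which only emulates lane-wise accumulation followed by a pairwise horizontal reduction and is not how GROMACS reduces forces. It shows that a 2-lane and a 4-lane accumulation of the same four values can disagree. A bitwise comparison of NEON-double and AVX2-double forces is therefore likely too strict, and a tolerance-based comparison is the safer check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Emulates accumulating a sum with 'lanes' SIMD lanes: element i goes into
// lane (i % lanes); the per-lane partial sums are then combined with a
// pairwise tree, as a horizontal reduction typically does.
// 'lanes' must be a power of two.
double laneSum(const std::vector<double>& values, std::size_t lanes)
{
    assert(lanes > 0 && (lanes & (lanes - 1)) == 0);
    std::vector<double> acc(lanes, 0.0);
    for (std::size_t i = 0; i < values.size(); ++i)
    {
        acc[i % lanes] += values[i];
    }
    // Pairwise (tree) reduction: each pass halves the number of partial sums.
    for (std::size_t width = lanes; width > 1; width /= 2)
    {
        for (std::size_t k = 0; k < width / 2; ++k)
        {
            acc[k] = acc[2 * k] + acc[2 * k + 1];
        }
    }
    return acc[0];
}
```

With `v = {1.0, 1e-16, 0.0, 1e-16}`, `laneSum(v, 1)` and `laneSum(v, 4)` return exactly 1.0. `laneSum(v, 2)` returns the next double above 1.0 because the two small terms are added to each other before they meet the 1.0.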
From: gromacs.org_gmx-developers-bounces at maillist.sys.kth.se On Behalf Of Szilárd Páll
Sent: Thursday, November 26, 2020 12:30 PM
To: Discussion list for GROMACS development <gmx-developers at gromacs.org>
Cc: gmx-developers mailing list <gromacs.org_gmx-developers at maillist.sys.kth.se>
Subject: Re: [gmx-developers] 4xN kernel using Advanced NEON (VL=128 bits) and double-FP
On Thu, Nov 26, 2020, 9:25 AM Guido Giuntoli <guido.giuntoli at huawei.com> wrote:
I am building a minikernel of the 4xN kernel and testing it on different architectures. Currently I am trying it on an ARMv8 machine with Advanced NEON in double precision.
I see that this kernel operates on 4 particles at a time. In double precision I can fit 2 FP numbers in each SIMD register (128 bits for Advanced NEON). I found that some interactions between the 4 particles are not computed in this kernel: particles 3 and 4 have a net force of 0 (in this minikernel). Am I missing something here?
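The lane arithmetic behind this observation can be sketched as follows. `lanesPerRegister` and `loadsPerJCluster` are hypothetical helpers; GROMACS exposes the SIMD width through its own SIMD module, e.g. `GMX_SIMD_REAL_WIDTH`. A 128-bit register holds 4 floats but only 2 doubles, so covering 4 j-particles takes one register in single precision and two in double precision.

```cpp
#include <cassert>
#include <cstddef>

// Number of lanes of type T in one SIMD register that is 'registerBits' wide.
template<typename T>
constexpr std::size_t lanesPerRegister(std::size_t registerBits)
{
    return registerBits / (8 * sizeof(T));
}

// Register loads needed to cover a j-cluster of 'jClusterSize' particles
// when each register holds 'lanes' values.
inline std::size_t loadsPerJCluster(std::size_t lanes, std::size_t jClusterSize)
{
    assert(lanes > 0);
    return (jClusterSize + lanes - 1) / lanes; // ceiling division
}

static_assert(lanesPerRegister<float>(128) == 4, "128-bit NEON, single: 4 lanes");
static_assert(lanesPerRegister<double>(128) == 2, "128-bit NEON, double: 2 lanes");
static_assert(lanesPerRegister<double>(256) == 4, "256-bit AVX2, double: 4 lanes");
```

Suppose a minikernel's input groups j-particles in fours but the kernel consumes one 2-lane register per group. Then only the first two j-particles of each group would be visited. This is one hypothetical way to get the single-versus-double difference reported here, not a diagnosis confirmed in the thread.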
Not all computed interactions will evaluate to non-zero values.
E.g. there could be "dummy" particles used for padding or, depending on your inputs, excluded particle pairs.
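To illustrate the reply, here is a scalar sketch of one 4xN tile, with N = 2 as for doubles in 128 bits, in which padding "dummy" atoms and excluded pairs contribute exactly zero force. `Atom`, `pairForce`, `computeTile`, the 1-D geometry and the placeholder force law are all hypothetical. A SIMD kernel would typically compute every lane and then zero the masked lanes rather than branch per pair.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar emulation of one 4xN cluster-pair tile; not GROMACS code.
constexpr std::size_t kI = 4; // i-cluster size
constexpr std::size_t kJ = 2; // j-atoms per tile (2 doubles fit in 128 bits)

struct Atom
{
    double x;      // 1-D position, for brevity
    bool   isReal; // false for a "dummy" padding atom
};

// Placeholder repulsive pair force along x: sign(dx) / dx^2 (illustrative only;
// assumes no two interacting atoms coincide).
inline double pairForce(double dx)
{
    return (dx > 0 ? 1.0 : -1.0) / (dx * dx);
}

// Accumulates forces for one tile into fi/fj (caller zero-initializes them).
// excluded[i][j] == true removes the pair, as a zero mask lane would.
void computeTile(const std::array<Atom, kI>& iAtoms, const std::array<Atom, kJ>& jAtoms,
                 const std::array<std::array<bool, kJ>, kI>& excluded,
                 std::array<double, kI>& fi, std::array<double, kJ>& fj)
{
    for (std::size_t i = 0; i < kI; ++i)
    {
        for (std::size_t j = 0; j < kJ; ++j)
        {
            // The "mask": true only for a real, non-excluded pair.
            const bool   interact = iAtoms[i].isReal && jAtoms[j].isReal && !excluded[i][j];
            const double f = interact ? pairForce(iAtoms[i].x - jAtoms[j].x) : 0.0;
            fi[i] += f;
            fj[j] -= f; // Newton's third law
        }
    }
}
```

An atom whose only partners in the tile are dummies or excluded ends up with a zero net force from that tile. When inspecting a single tile, this can look like a missing interaction.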
Is the kernel designed to operate in double precision with a vector length of 128 bits? Is the masking mechanism intended to handle this case?
Note: when I use single precision, all 4 forces are non-zero, as I expected.
Thanks a lot for the help ;)
Best regards
HUAWEI TECHNOLOGIES Duesseldorf GmbH
Hansaallee 205, 40549 Düsseldorf, Germany, www.huawei.com