[gmx-developers] 4xN kernel using Advanced NEON (VL=128 bits) and double-FP

Thu Nov 26 09:25:14 CET 2020

Hi,

I am building a minikernel of the 4xN kernel and testing it on different architectures. Currently I am trying it on an ARMv8 machine with Advanced NEON, using double-FP.

I see that this kernel operates on 4 particles at a time. When I use double-FP, each SIMD register (128 bits long for Advanced NEON) holds only 2 FP numbers. I checked and found that some interactions between the 4 particles are not computed in this kernel: particles 3 and 4 have zero net force (in this minikernel). Am I missing something here? Is the kernel prepared to operate in double precision with a vector length of 128 bits? Is the masking mechanism intended to solve this issue?

Note: when I use single-FP, all 4 forces are nonzero, as I expected.
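
To make the lane arithmetic concrete, here is a minimal scalar sketch in C of what I would expect from one 4xN cluster-pair step. It is not GROMACS code: the names lanes_per_register and cluster_pair_forces, the 1D positions, and the toy pair force are all assumptions made for illustration. It also assumes that, in a 4xN layout, each of the 4 i-particles gets its own register whose N lanes hold N different j-particles, which may not match what the real kernel does.

```c
#include <assert.h>
#include <stddef.h>

#define I_CLUSTER_SIZE 4 /* the "4" in 4xN: i-particles per cluster (assumed) */
#define MAX_LANES 8      /* upper bound on emulated SIMD lanes (hypothetical) */

/* Number of elements of a given size that fit in one SIMD register. */
int lanes_per_register(int register_bits, size_t element_bytes)
{
    return register_bits / (int)(8 * element_bytes);
}

/*
 * Scalar emulation of one 4xN cluster-pair step in 1D.
 * For every i-particle, an emulated register of n_lanes partial forces is
 * filled (one lane per j-particle) and then reduced into fi[i].
 * The pair force 1/dx^3 is a toy function, not a GROMACS potential.
 * Assumes no i-particle coincides with a j-particle (dx != 0).
 */
void cluster_pair_forces(const double xi[I_CLUSTER_SIZE], const double *xj,
                         int n_lanes, double fi[I_CLUSTER_SIZE])
{
    assert(n_lanes >= 1 && n_lanes <= MAX_LANES);

    for (int i = 0; i < I_CLUSTER_SIZE; i++) {
        double lane[MAX_LANES];

        /* "Vector" part: one lane per j-particle. */
        for (int l = 0; l < n_lanes; l++) {
            double dx = xi[i] - xj[l];
            double r2 = dx * dx;
            lane[l] = dx / (r2 * r2);
        }

        /* Horizontal reduction of the lanes into the force on particle i. */
        double sum = 0.0;
        for (int l = 0; l < n_lanes; l++) {
            sum += lane[l];
        }
        fi[i] += sum;
    }
}
```

Under these assumptions, calling cluster_pair_forces with n_lanes = lanes_per_register(128, sizeof(double)), i.e. 2 lanes, still accumulates a force on all 4 i-particles, because the number of lanes only limits how many j-particles are processed per step, not how many i-particles. This is why the zero force on particles 3 and 4 surprises me.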

Thanks a lot for the help ;)

Best regards

Guido Giuntoli

HUAWEI TECHNOLOGIES Duesseldorf GmbH
Hansaallee 205, 40549 Düsseldorf, Germany, www.huawei.com
Registered Office: Düsseldorf, Register Court Düsseldorf, HRB 56063,
Managing Director: Li Peng, Li Jian, Shi Yanli
