[gmx-developers] OpenACC Development
Erik Lindahl
erik.lindahl at gmail.com
Fri Jul 1 20:45:11 CEST 2016
Hi,
> On 01 Jul 2016, at 20:21, Millad Ghane <mghane at cs.uh.edu> wrote:
>
> I am not saying there is no performance loss. There is, but the
> performance loss shouldn't be more than like 30% "for the GPU codes".
I think you are completely underestimating the importance of data layout.
Of course OpenACC would do decently if we keep the 90% of the GPU-related code we wrote to provide e.g. GPU-optimized data layouts.
However, in that case there is no use whatsoever for OpenACC, since the few lines of “CUDA” code in the kernels are straightforward - and you are losing 30% for no benefit.
You would still need a different data layout for Xeon Phi.
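
To illustrate what "data layout" means here, a minimal C++ sketch follows. It is not the actual GROMACS layout; the names `AtomAoS`, `AtomsSoA`, `toSoA` and the two sum functions are hypothetical and exist only for this example. It contrasts an array-of-structures with a structure-of-arrays, where each coordinate is contiguous in memory, which is typically the friendlier arrangement for wide SIMD or coalesced GPU loads. The point is that the conversion is a change to the program's data structures, not something a pragma expresses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures: x, y and z of one atom are adjacent in memory.
struct AtomAoS {
    float x, y, z;
};

// Structure-of-arrays: all x values are contiguous, then all y, then all z.
struct AtomsSoA {
    std::vector<float> x, y, z;
};

// Explicit layout conversion (hypothetical helper for this sketch).
AtomsSoA toSoA(const std::vector<AtomAoS>& atoms) {
    AtomsSoA soa;
    soa.x.reserve(atoms.size());
    soa.y.reserve(atoms.size());
    soa.z.reserve(atoms.size());
    for (const AtomAoS& a : atoms) {
        soa.x.push_back(a.x);
        soa.y.push_back(a.y);
        soa.z.push_back(a.z);
    }
    return soa;
}

// Sum of squared norms, AoS version: each component is read with stride 3.
float sumSquaredNormAoS(const std::vector<AtomAoS>& atoms) {
    float sum = 0.0f;
    for (const AtomAoS& a : atoms) {
        sum += a.x * a.x + a.y * a.y + a.z * a.z;
    }
    return sum;
}

// Same computation, SoA version: each component is read with unit stride.
float sumSquaredNormSoA(const AtomsSoA& atoms) {
    assert(atoms.x.size() == atoms.y.size() && atoms.y.size() == atoms.z.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < atoms.x.size(); ++i) {
        sum += atoms.x[i] * atoms.x[i] + atoms.y[i] * atoms.y[i] + atoms.z[i] * atoms.z[i];
    }
    return sum;
}
```

Both functions compute the same number; only the memory access pattern differs, and which pattern is best depends on the target hardware.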
Adding OpenACC pragmas isn’t very difficult, so again: if you believe that you can write a completely general implementation that is only 30% slower in the accelerated kernels (~15% in total) and that works on all architectures, please do it and show us :-)
>> That's like saying all C++ implementations of molecular dynamics should
>> have the same performance because it's the same language.
>> If that was true, you should not see any performance difference when you
>> disable SIMD. After all, all floating-point math on x86 is implemented
>> with SSE/AVX instructions today.
>>
> It's different. By enabling and using SIMD commands you are actually
> accessing and exploiting some hardware features of CPU. So, since you are
> accessing high performance features, the code executes faster.
No. Please check the assembly output of your compiler. Your compiler WILL be generating AVX2 instructions with proper flags and "-O3", but it is not capable of reorganizing the data layout.
> But
> changing languages with the same SIMD configuration, the output should be
> roughly the same.
“same SIMD configuration” means keeping the 99% of the data layouts optimized for each architecture. Sorry, but that’s not making your implementation portable.
> However, I argue that in this case, C is a little bit
> faster than C++.
No. Intrinsics will generate identical assembly in C and C++ - and that is based on knowledge, not opinions. Once we exploit template parameters, C++ completely kills C for complex kernels.
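
As a rough illustration of the template-parameter point, here is a minimal C++ sketch. It is a toy harmonic-force loop, not a GROMACS kernel, and `harmonicKernel` and `runKernel` are hypothetical names. The boolean is a compile-time constant, so each instantiation is a separate function in which an optimizing compiler can drop the unused branch from the inner loop. In plain C the usual alternatives are a runtime branch inside the loop or duplicating the kernel with macros.

```cpp
#include <cstddef>
#include <vector>

// Toy kernel: f = -k * x per particle. Whether the energy is accumulated
// is chosen at compile time through the template parameter.
template <bool computeEnergy>
float harmonicKernel(const std::vector<float>& x, std::vector<float>& f, float k) {
    float energy = 0.0f;
    f.resize(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        f[i] = -k * x[i];
        if (computeEnergy) {  // constant within each instantiation
            energy += 0.5f * k * x[i] * x[i];
        }
    }
    return energy;  // stays 0 when computeEnergy is false
}

// The runtime decision is made once, outside the inner loop.
float runKernel(bool wantEnergy, const std::vector<float>& x, std::vector<float>& f, float k) {
    return wantEnergy ? harmonicKernel<true>(x, f, k)
                      : harmonicKernel<false>(x, f, k);
}
```

With several such flags the number of specialized kernels grows multiplicatively, and templates generate all of them from one source.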
Cheers,
Erik