[gmx-developers] OpenACC Development
pall.szilard at gmail.com
Fri Jul 1 19:03:27 CEST 2016
On Fri, Jul 1, 2016 at 6:17 PM, Millad Ghane <mghane at cs.uh.edu> wrote:
> Hi Berk,
> Thanks for your reply.
> I know that OpenACC is a programming model. What I meant by "OpenACC
> architecture" was "OpenACC software architecture". I know that eventually,
> the OpenACC code is executed by an underlying hardware architecture (which
> could be NVIDIA or AMD/ATI or even Xeon Phi from Intel).
> What I hope to achieve: introducing a set of kernels that are optimized
> for OpenACC.
There is no such thing as "optimized for OpenACC" (well, you could coin
that term to express data-flow optimizations, but you'd need to define it).
You implement kernels _using_ OpenACC + code transformations to cater
for the different architectures you target. TBH, OpenACC doesn't define
much more than the pragmas for data flow and constraints on this; the
code transformations needed to make the kernels run with OK performance
are up to the developers, on a case-by-case basis, while hoping that the
compiler does a reasonable job optimizing.
> What I have done: trying to parallelize the 4x4 Plain-C code for CPU using
> OpenACC compiler directives.
Plain C SIMD (reference SIMD) kernels are the ones to target
(GMX_SIMD=Reference). These run significantly faster than the no SIMD
ones (GMX_SIMD=None). 4x2 reference is typically the fastest with the
amount/quality of auto-vectorization compilers can accomplish.
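For reference, the kernel flavor is selected at GROMACS configure time via the CMake option mentioned above; a minimal sketch (paths are just an example):

```shell
# Build with the plain-C reference SIMD kernels (significantly faster
# than GMX_SIMD=None, per the above):
mkdir build && cd build
cmake .. -DGMX_SIMD=Reference
make -j
```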
> The problem is that the code, and especially the data structures (in the
> Plain-C CPU kernels), get complex at the kernel level to some extent.
> Therefore, that was what I was looking for: introducing kernels for the
> OpenACC programming model.
That still feels like it may be wrong target, but of course it depends
what _you_ want to accomplish/investigate.
There are two questions that come to my mind:
- How fast can you get some GROMACS kernels when expressed with OpenACC
- How can you implement the data/algorithmic flow with OpenACC to
target multiple architectures.
I think focusing on the kernels is not a very interesting or rewarding
task -- unless the question you are posing is exactly as expressed
above. In that case, I'd hope one's explicit goal is to i) investigate
the source of the differences between what the OpenACC compiler can
achieve vs. the manually tuned kernels and ii) provide solid data on
how to get the kernels faster (i.e. feedback to the OpenACC team).
The latter seems more interesting (and potentially more rewarding).
That's because you can potentially replace a huge amount of host-side
boilerplate code (far more than kernel code) with "just" a few
pragmas. It also feels like a more realistic target to implement decent
data flow and CPU-GPU (or CPU-CPU) concurrency using #pragma programming
while still achieving good performance. I'm not sure, but
it's quite likely that the existing fast kernels can be reused.
> Best Regards,
>> Hi Millad,
>> Welcome to the list.
>> GROMACS aims for close to optimal performance on all relevant hardware.
>> To achieve this, we write highly optimized non-bonded (and other)
>> kernels for all relevant architectures. Currently we have plain-C CPU
>> kernels (extremely slow), SIMD, CUDA and OpenCL non-bonded kernels. The
>> optimization is not so much in the arithmetics, but rather in the data
>> layout. OpenACC is not an architecture, but a programming standard. So
>> you would need to choose a target architecture for your OpenACC
>> "acceleration" and then choose one of our kernel types. But the only
>> gain of this would be semi-automated offloading. The question is then if
>> the offloading will be efficient enough.
>> On 2016-07-01 02:05, Millad Ghane wrote:
>>> Hello everyone,
>>> I am a PhD student in computer science at University of Houston.
>>> Currently, in this Summer as an intern, I am working with the physics
>>> department in our school to work on GROMACS in order to port it to
>>> OpenACC. My adviser for this project is Prof. Cheung.
>>> My understanding is that GROMACS currently supports NVIDIA GPUs (and also
>>> SIMDs on CPUs), however my job is to investigate the ability of
>>> transferring the code to OpenACC, which is more heterogeneous and
>>> ubiquitous compared to CUDA.
>>> My question regarding the development of GROMACS is whether you are
>>> supporting or planning to support OpenACC, currently or in the future. And, if
>>> you are not supporting OpenACC, my question is how I can introduce
>>> new "kernel functions" for supporting OpenACC. How much work should be
>>> expected? Based on my understanding, you had different kernel functions for
>>> different architectures (CPU, CPU with SIMD, GPU). I wanted to know how much
>>> work is required to introduce a new architecture like OpenACC?
>>> Before getting in touch with you, I dug into some of the code and tried to
>>> parallelize the CPU version of the kernel code using OpenACC constructs, but
>>> the code gets messy and the data is not transferred correctly to the
>>> device. So, my hope is to introduce new kernels for OpenACC, the way
>>> you introduced different ones for CPUs and GPUs. This way, we have more
>>> control over data transfers and kernel code.
>>> I hope I was clear enough and made you interested.
>>> Best Regards,
>>> Millad Ghane
>>> Computer Science Department
>>> University of Houston
>> Gromacs Developers mailing list
>> * Please search the archive at
>> http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List before posting!
>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>> * For (un)subscribe requests visit
>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers or
>> send a mail to gmx-developers-request at gromacs.org.