[gmx-developers] OpenACC Development

Fri Jul 1 18:38:57 CEST 2016

Hi Millad,

As Berk explained, the problem is that there is no single OpenACC programming model in which everything can be phrased optimally.

Xeon Phi, CUDA, AMD, and x86 have completely different properties. At some point we will have to choose a layout for higher-level data structures, and there is no single model that will be optimal for all of them.
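To make the layout point concrete, here is a minimal C sketch of one computation over two generic layouts: an array of structures (AoS) and a structure of arrays (SoA). This is an editorial illustration, not the layout GROMACS actually uses. The names `ParticleAoS`, `ParticlesSoA`, `sum_r2_aos`, `sum_r2_soa` and the fixed capacity `NMAX` are hypothetical. The arithmetic is identical and only the memory access pattern differs. Which pattern performs best depends on the target (SIMD width on x86, coalesced loads on GPUs), which is one reason a single layout is unlikely to suit every architecture.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts for illustration only; not GROMACS data structures. */
#define NMAX 8

/* Array of structures (AoS): x, y, z of one particle sit side by side,
   so consecutive particles' x values are three floats apart. */
typedef struct {
    float x, y, z;
} ParticleAoS;

/* Structure of arrays (SoA): all x values are contiguous, then all y,
   then all z, so consecutive loop iterations read consecutive addresses. */
typedef struct {
    float x[NMAX], y[NMAX], z[NMAX];
} ParticlesSoA;

/* Sum of squared distances from the origin, AoS version. */
float sum_r2_aos(const ParticleAoS *p, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        s += p[i].x * p[i].x + p[i].y * p[i].y + p[i].z * p[i].z;
    }
    return s;
}

/* Same arithmetic, SoA version (unit-stride access per coordinate). */
float sum_r2_soa(const ParticlesSoA *p, size_t n)
{
    assert(n <= NMAX); /* the SoA arrays have a fixed capacity */
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        s += p->x[i] * p->x[i] + p->y[i] * p->y[i] + p->z[i] * p->z[i];
    }
    return s;
}
```

For example, for particles at (1,0,0), (0,2,0) and (1,1,1), `sum_r2_aos` returns 8.0f, and `sum_r2_soa` returns the same value when the coordinates are stored column-wise.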

OpenACC will never be able to compete with:

1) The low-level CUDA optimization we already have for NVIDIA
2) The low-level SIMD implementation we have for all CPU architectures
3) The low-level OpenCL implementation we have for AMD 
4) The low-level AVX-512 kernels we have for KNL.

We are still not entirely happy with the KNL performance, but that is not something that can be fixed in the kernels; instead, we need to rewrite the highest-level code so that hundreds of threads can execute efficiently. Using OpenACC in the kernels won't fix that.

Cheers,

Erik

> On 01 Jul 2016, at 18:17, Millad Ghane <mghane at cs.uh.edu> wrote:
> 
> Hi Berk,
> 
> Thanks for your reply.
> 
> I know that OpenACC is a programming standard. What I meant by "OpenACC
> architecture" was "OpenACC software architecture". I know that
> eventually, the OpenACC code is executed by an underlying hardware
> architecture (which could be NVIDIA or AMD/ATI, or even Intel's Xeon Phi).
> 
> What I hope to achieve: introducing a set of kernels that are optimized
> for OpenACC.
> What I have done: tried to parallelize the 4x4 plain-C CPU code using
> OpenACC compiler directives.
> 
> The problem is that the code, and especially the data structures (in the
> plain-C CPU kernels), become rather complex at the kernel level. That is
> why I was looking to introduce kernels specifically for the OpenACC
> programming model.
> 
> 
> Best Regards,
> Millad
> 
> 
> 
> 
>> Hi Millad,
>> 
>> Welcome to the list.
>> 
>> GROMACS aims for close to optimal performance on all relevant hardware.
>> To achieve this, we write highly optimized non-bonded (and other)
>> kernels for all relevant architectures. Currently we have plain-C CPU
>> kernels (extremely slow), SIMD, CUDA and OpenCL non-bonded kernels. The
>> optimization is not so much in the arithmetic, but rather in the data
>> layout. OpenACC is not an architecture, but a programming standard. So
>> you would need to choose a target architecture for your OpenACC
>> "acceleration" and then choose one of our kernel types. But the only
>> gain of this would be semi-automated offloading. The question then is
>> whether the offloading would be efficient enough.
>> 
>> Cheers,
>> 
>> Berk
>> 
>> On 2016-07-01 02:05, Millad Ghane wrote:
>>> Hello everyone,
>>> 
>>> I am a PhD student in computer science at the University of Houston.
>>> This summer, as an intern, I am working with the physics department at
>>> our university on porting GROMACS to OpenACC. My adviser for this
>>> project is Prof. Cheung.
>>> 
>>> My understanding is that GROMACS currently supports NVIDIA GPUs (and
>>> SIMD on CPUs); however, my job is to investigate porting the code to
>>> OpenACC, which is more heterogeneous and ubiquitous than CUDA.
>>> 
>>> My question is whether you currently support, or plan to support,
>>> OpenACC. If you do not, how can I introduce new "kernel functions"
>>> for OpenACC, and how much work would that involve?
>>> 
>>> As I understand it, you have different kernel functions for different
>>> architectures (CPU, CPU with SIMD, GPU). How much effort is required to
>>> introduce a new architecture such as OpenACC?
>>> 
>>> Before getting in touch with you, I dug into some of the code and tried
>>> to parallelize the CPU version of the kernel code using OpenACC
>>> constructs, but the code gets messy and the data is not transferred
>>> correctly to the device. So my hope is to introduce new kernels for
>>> OpenACC, the way you introduced different ones for CPUs and GPUs. This
>>> way, we would have more control over data transfers and kernel code.
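One likely cause of data not arriving correctly on the device (an assumption, since the original code is not shown) is that OpenACC data clauses copy only the contiguous array sections they name; pointers stored inside nested structures are not followed automatically. The minimal sketch below assumes the coordinates have already been flattened into a plain `float` array of length `3*n`, and uses an explicit `data` region with `copyin` for read-only input and `copyout` for results. The function `restraint_forces` and the harmonic force it computes are hypothetical, chosen only to keep the kernel short; this is not a GROMACS kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not GROMACS code: harmonic restraint force
   f = -k * r toward the origin for n particles, with coordinates stored
   as one flat array (x0, y0, z0, x1, y1, z1, ...). */
void restraint_forces(const float *restrict coords, float *restrict forces,
                      size_t n, float k)
{
    size_t len = 3 * n;
    assert(coords != NULL && forces != NULL);

    /* copyin: coords are only read on the device.
       copyout: forces are only written on the device and are copied
       back to the host when the data region ends. */
    #pragma acc data copyin(coords[0:len]) copyout(forces[0:len])
    {
        /* present: the arrays are already on the device via the
           enclosing data region, so no extra transfer happens here. */
        #pragma acc parallel loop present(coords[0:len], forces[0:len])
        for (size_t i = 0; i < len; i++) {
            forces[i] = -k * coords[i];
        }
    }
}
```

With an OpenACC-capable compiler (for example GCC with `-fopenacc`, or the PGI compilers with `-acc`) the loop is offloaded; most other C compilers ignore the `acc` pragmas, possibly with a warning, and run the loop serially on the host. Either way, calling `restraint_forces(c, f, 2, 2.0f)` with `c = {1, 2, 3, -1, 0, 0.5}` gives `f[0] == -2.0f` and `f[5] == -1.0f`.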
>>> 
>>> 
>>> I hope I was clear enough and that this interests you.
>>> 
>>> 
>>> Best Regards,
>>> Millad Ghane
>>> Computer Science Department
>>> University of Houston
>>> 
>>> 
>> 
>> --
>> Gromacs Developers mailing list
>> 
>> * Please search the archive at
>> http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List before
>> posting!
>> 
>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>> 
>> * For (un)subscribe requests visit
>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers or
>> send a mail to gmx-developers-request at gromacs.org.
>> 