[gmx-developers] How to parallel Gromacs with multi-thread?

Tue Nov 24 15:57:33 CET 2015

Hi,

Please don't reply to a digest, because that is confusing for everybody...
(and for a mailing list with O(1) posts per day, in which you want to
participate in discussion, please subscribe in non-digest mode!)

Vinson said

> Hi, the architecture of my machine is CPU+GPU, with both on the same
> chip. I can use MPI, but it will only use the "CPU" part of my machine, so I
> want to parallelize the hotspot by offloading it to the "GPU" part. My idea
> is to modify the code to split the nblist, similar to what the OpenMP
> version of the code does. But since OpenMP is not supported, is there any
> way to split the nblist while compiling with OpenMP=OFF?

Such loops do O(100k) operations, which is far too few to be worth
offloading to a device with perhaps hundreds of cores, unless your GPU
shares L3 cache with the CPU.

You would do well to consider reading some of the published work about the
parallelization of GROMACS, e.g.
http://www.sciencedirect.com/science/article/pii/S0010465513001975. There's
already an OpenCL port in GROMACS, too.

What is your objective? Why is parallelising GROMACS on some mystery
hardware with neither SIMD nor OpenMP a good thing to do for someone
learning about GROMACS?

Mark

On Tue, Nov 24, 2015 at 9:25 AM Berk Hess <hess at kth.se> wrote:

> Hi,
>
> Why not simply use MPI parallelization?
>
> But what (exotic) architecture does not support OpenMP and SIMD? If you
> don't have SIMD, I don't think the machine is worth using for production. You
> get great performance from a cheap Intel CPU + NVIDIA GPU machine.
>
> Cheers,
>
> Berk
>
>
> On 2015-11-24 06:05, Vinson Leung wrote:
>
> Hi everyone, I am new to Gromacs and I want to use Gromacs on a
> multi-core CPU machine for my research. Because the machine we use only
> supports MPI (no OpenMP, no SIMD), I profiled the MPI-only version of
> Gromacs 5.0.4 and found that the hotspot was nbnxn_kernel_ref() in
> src/gromacs/mdlib/nbnxn_kernel_ref.c, which took 80% of the total running
> time. Naturally, I want to accelerate nbnxn_kernel_ref() by parallelizing
> it with multiple threads. After some simple analysis, I found that
> nbnxn_kernel_ref() is structured like this:
> ========================================================
> for (nb = 0; nb < nnbl; nb++)
> {
>     ......
>     for (n = 0; n < nbl->nci; n++)  // defined in nbnxn_kernel_ref_outer.h
>     {
>         ....
>     }
>     ...
> }
> ========================================================
> So here is my question. When I compile with OpenMP=OFF, the value of nnbl
> is 1 at runtime. Can I parallelize the inner loop by simply splitting its
> iterations (nbl->nci) evenly across multiple cores?
> Alternatively, I found that when I compile with OpenMP=ON, nnbl is not 1,
> so I could parallelize the outer loop with multiple threads. But my
> machine does not support OpenMP. Is there any way to modify the code so
> that nnbl is not 1 when compiling with OpenMP=OFF?
> Thanks.
>
>
>
>
> --
> Gromacs Developers mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
> or send a mail to gmx-developers-request at gromacs.org.