[gmx-developers] Any news on SIMD intrinsic accelerated kernel for Blue Gene/Q?

Szilárd Páll szilard.pall at cbr.su.se
Thu May 9 03:45:49 CEST 2013


On Wed, May 8, 2013 at 11:48 PM, Jeff Hammond <jhammond at alcf.anl.gov> wrote:
> Have you compiled Gromacs with auto-vectorization turned on (e.g.
> CFLAGS = -g -O3 -qarch=qp -qtune=qp -qsimd=auto -qhot=level=1
> -qprefetch -qunroll=yes -qreport) and run the code through HPM (see
> https://wiki.alcf.anl.gov/bgq-earlyaccess/images/4/4a/Hpct-bgq.pdf for
> instructions - it requires only modest source changes) to determine
> that the compiler is not capable of vectorizing the code?  I am far
> from an XL fanboy but it is imprudent to devote human effort when a
> machine can do the job already.

I doubt that auto-vectorization will give much performance benefit -
at least based on my experience on x86. However, as we have invested
no effort in making the code auto-vectorization friendly, it may well
be possible to make the plain C kernels more vectorizer-friendly.
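To illustrate what "vectorizer-friendly" means here, a minimal sketch
(the function and names are hypothetical, not actual GROMACS kernel
code): the restrict qualifiers promise the compiler that the arrays
do not alias, and the loop body is branch-free with no cross-iteration
dependencies, which is roughly what an auto-vectorizer needs to see.

    /* Hypothetical inner loop a compiler can auto-vectorize:
     * no aliasing, no branches, no cross-iteration dependencies. */
    void scale_add(float *restrict out, const float *restrict x,
                   const float *restrict y, float a, int n)
    {
        for (int i = 0; i < n; i++)
        {
            out[i] = a * x[i] + y[i];
        }
    }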

>
> The HPM results will be of value no matter what the compiler is doing,
> as this level of profiling information is absolutely critical to
> making intelligent choices about how to optimize any code for BGQ.
> You might discover, for example, that QPX is not the best place to
> start tuning anyway.  It could be that there are memory access issues
> that are costing Gromacs far more than a factor of 4 (that is the most
> one can expect from a 4-way FPU, of course - I assume that the FMA is
> already used because it is generic to PowerPC).

I'm not sure what you mean by "generic to PowerPC", but unless the
compiler happens to be able to auto-vectorize a lot of the kernel
code, FMA will not be used (intensively).
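(For concreteness: a scalar FMA computes a*b + c in one instruction
with a single rounding. In portable C99 it can be requested explicitly
with fma() from <math.h>; whether the compiler also contracts a plain
a*b + c expression depends on flags such as XL's -qfloat=maf. This
tiny example is purely illustrative:)

    #include <math.h>

    /* fma(a, b, c) == a*b + c with one rounding step instead of two. */
    double madd(double a, double b, double c)
    {
        return fma(a, b, c);
    }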

While we are at it, I'd like to ask your opinion on the ease of use
and potential advantage of a few BG features that looked particularly
interesting to me while briefly browsing through the docs. Note that
I have very limited knowledge of BG, so forgive me if I'm asking
trivial questions.

- Efficient atomics: how efficient is efficient? What are the
consequences of having to "predesignate" memory for atomic use? How
limiting is the number of memory translation entries?
What makes this sound particularly interesting is that stores are
queued and do not stall the CPU. So, more concretely: is it feasible
to implement a reduction with OpenMP threads doing potentially
conflicting updates (at a low conflict rate) and have this be faster
than code that produces N outputs on N threads and then reduces these?
(See the sketch after these questions.)

- Is OpenMP synchronization much faster than on x86 (due to the above)?

- Collective FP network operations: is it feasible to use them for
anything but huge problem sizes? I am thinking of a potential use in
the reduction required after halo exchange with domain decomposition
(which is otherwise quite lightweight).

- What is the efficiency of SMT, in particular compared to
HyperThreading, for compute- and cache-intensive code like MD?
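A minimal sketch of the two reduction strategies from the atomics
question above (names, sizes, and data layout are illustrative, not
GROMACS code): strategy A scatters contributions straight into one
shared array, made safe with atomics; strategy B gives every thread a
private output buffer and reduces the N partial results afterwards.
The question is whether BG/Q's queued atomic stores make A win
despite the conflicts.

    #include <stdlib.h>
    #include <omp.h>

    #define NOUT 10000   /* illustrative output-array size */

    /* Strategy A: potentially conflicting updates, guarded by atomics. */
    void reduce_atomic(double *f, const int *idx, const double *val, int n)
    {
        #pragma omp parallel for
        for (int k = 0; k < n; k++)
        {
            #pragma omp atomic
            f[idx[k]] += val[k];
        }
    }

    /* Strategy B: N private outputs on N threads, reduced afterwards. */
    void reduce_private(double *f, const int *idx, const double *val, int n)
    {
        int     nth = omp_get_max_threads();
        double *buf = calloc((size_t)nth * NOUT, sizeof(double));

        #pragma omp parallel
        {
            double *mine = buf + (size_t)omp_get_thread_num() * NOUT;

            #pragma omp for
            for (int k = 0; k < n; k++)
            {
                mine[idx[k]] += val[k];   /* private buffer: no conflicts */
            }
            /* implicit barrier, then reduce the per-thread buffers */
            #pragma omp for
            for (int i = 0; i < NOUT; i++)
            {
                for (int t = 0; t < nth; t++)
                {
                    f[i] += buf[(size_t)t * NOUT + i];
                }
            }
        }
        free(buf);
    }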


Cheers,
--
Szilárd

>
> Best,
>
> Jeff
>
> On Wed, May 8, 2013 at 3:46 PM, Bin Liu <fdusuperstring at gmail.com> wrote:
>> Dear developers,
>>
>>
>> On the GROMACS acceleration and parallelization webpage, I found:
>>
>> "We will add Blue Gene P and/or Q and AVX2 in the near future."
>>
>> I am quite excited about this news, since I am a researcher in Canada, and
>> the University of Toronto has purchased a Blue Gene Q cluster which will be
>> operated under Scinet Consortium.
>>
>> https://support.scinet.utoronto.ca/wiki/index.php/BGQ#System_Status:_BETA
>>
>> Without the SIMD intrinsic accelerated kernel for Blue Gene/Q, perhaps they
>> won't install GROMACS on it, since a lot of computational resources would
>> be wasted. If the GROMACS developers can implement the SIMD intrinsic
>> accelerated kernel, I will be more than grateful for that.
>>
>>
>> Bin
>>
>
>
>
> --
> Jeff Hammond
> Argonne Leadership Computing Facility
> University of Chicago Computation Institute
> jhammond at alcf.anl.gov / (630) 252-5381
> http://www.linkedin.com/in/jeffhammond
> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
> ALCF docs: http://www.alcf.anl.gov/user-guides


