[gmx-developers] Transactional memory technology for GROMACS

Jeff Hammond jhammond at alcf.anl.gov
Wed May 8 23:42:50 CEST 2013

It is our impression at ALCF that there is essentially no good use for
TM.  All supposed use cases can also be addressed without hardware TM,
either by eliminating the data race or recasting it such that some
other mutex/atomic approach is suitable.  The overhead of hardware TM
on BGQ is quite high and - in our opinion - too high to be useful for

In any case, I think that asking the question in the form of "can we
use hardware feature X for application Y?" is fundamentally the wrong
way to think in parallel computing.  The right way to ask the question
is "what does application Y need in order to run efficiently on
hardware Z?"  The purpose of hardware TM was to enable the use of
OpenMP in regions of code that were otherwise not thread-safe, but it
turns out that there is no magic bullet: there is no substitute for
intelligent programmers (who have been shown to always be able to
solve the threading problem better than hardware).
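
As a minimal sketch of the two software alternatives mentioned above
(recasting the update as an atomic, or eliminating the data race
outright), consider a scatter-add in which several loop iterations may
hit the same output element.  This is not GROMACS code: the function
names, the array layout, and the choice of a scatter-add as the example
are all assumptions made for illustration.

```c
#include <stdlib.h>

/* Hypothetical example: out[idx[i]] += val[i] for i in [0, n).
 * Several i may share the same idx[i], so a naive "parallel for"
 * would contain a data race on out[]. */

/* Option 1: keep the shared update, but make each update atomic. */
void scatter_add_atomic(double *out, const int *idx,
                        const double *val, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        out[idx[i]] += val[i];
    }
}

/* Option 2: eliminate the race. Each thread accumulates into its own
 * private buffer; the buffers are merged into out[] one thread at a
 * time. Returns 0 on success, -1 if any thread failed to allocate
 * (in which case out[] may be only partially updated). */
int scatter_add_private(double *out, int nout, const int *idx,
                        const double *val, int n)
{
    int failed = 0;

    #pragma omp parallel
    {
        double *local = calloc((size_t)nout, sizeof *local);

        if (local == NULL) {
            #pragma omp atomic write
            failed = 1;
        }

        /* Every thread must reach the work-sharing loop, even one
         * whose allocation failed, so the check sits inside it. */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            if (local != NULL) {
                local[idx[i]] += val[i];
            }
        }

        if (local != NULL) {
            #pragma omp critical
            for (int j = 0; j < nout; j++) {
                out[j] += local[j];
            }
            free(local);
        }
    }
    return failed ? -1 : 0;
}
```

Both functions give the same result when compiled without OpenMP, since
the pragmas are then ignored.  Which option is faster depends on how
often updates collide and on how large the output array is, so this is
something to measure rather than assume.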

Far and away the most important hardware features on BGQ to exploit in
GROMACS are: (1) QPX vector FPU instructions, (2) L1 prefetch
features, and (3) general cache optimizations.  Unless GROMACS is
truly exceptional among all the scientific codes that I've seen, the
OpenMP and MPI usage can be improved significantly to improve parallel
performance and scalability.  This is not a criticism but rather an
acknowledgment of the finite amount of time that any scientific
programmer has to profile and tune their code for the latest

Optimizing code for Blue Gene/Q is my day job and I would be happy to
help anyone who is interested in tuning GROMACS for this architecture.
I know that many of our users would be interested in seeing this
happen.  In an act of shameless self-promotion, I'll note that I have
tried to document all sorts of BGQ features, ranging from the mundane
to the insane, on this page:



On Wed, May 8, 2013 at 4:20 PM, Bin Liu <fdusuperstring at gmail.com> wrote:
> Dear developers,
> Could the transactional memory technology in Blue Gene/Q and Haswell be
> useful for future GROMACS kernels?  Perhaps for the OpenMP thread
> parallelization. Could GROMACS potentially benefit from the
> technology?
> If the answer is yes, does it require extra effort to make use of it? Of
> course, transactional memory is a programming paradigm. But as for Hardware
> Lock Elision, in theory the programmer doesn't need to do anything but wait
> for new versions of the dynamic libraries. Is it possible that future OpenMP
> standards and compilers (gcc, icc) will support HLE, so that GROMACS can use
> it by turning on some flags?
> Thank you very much.
> Bin
> --
> gmx-developers mailing list
> gmx-developers at gromacs.org
> http://lists.gromacs.org/mailman/listinfo/gmx-developers
> Please don't post (un)subscribe requests to the list. Use the
> www interface or send it to gmx-developers-request at gromacs.org.

Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
ALCF docs: http://www.alcf.anl.gov/user-guides
