[gmx-developers] GROMACS 4.6 pre-release: new algorithms, parallelization schemes, & GPU acceleration

Szilárd Páll szilard.pall at cbr.su.se
Thu Mar 1 00:59:20 CET 2012

Dear Developers,

With the GROMACS 4.6 release shaping up, I am writing to announce the
public availability of some of the major features this release will
introduce, and to invite you to test them:
- "Verlet" cut-off scheme and the "nbnxn" non-bonded algorithms;
- multi-level parallelization (MPI+OpenMP);
- heterogeneous parallelization with native GPU-acceleration (MPI+OpenMP+CUDA).

The code implementing the above features is brand new and requires
intensive testing before reaching the beta state. The aim of this
pre-beta phase is to uncover the remaining bugs and performance
issues. To speed up the testing process, we would like to ask for your
help with it.

The code is in a stable, close to feature-complete state and resides
in the nbnxn_hybrid_acc feature-branch during the pre-release testing.
The pre-release phase effectively starts now and lasts until the code
gets merged into the 4.6 release branch. Fixes and improvements will
be regularly pushed into the repository, so keep pulling for updates.
Also, if you experience problems, the first thing you should do is
pull the changes and see if there are any new fixes that solve your
problem.
Before jumping into testing, here is some documentation you should check out first:
- Detailed description of the GROMACS cut-off schemes:
- Introduction to acceleration & parallelization in GROMACS:
- CMake instructions:
(Note that these wiki pages are still under construction; if you wish
to help with improving them, contact us.)

One-liner for the impatient ones:
$ git clone git://git.gromacs.org/gromacs.git -b nbnxn_hybrid_acc &&
cd gromacs && mkdir build && cd build && cmake ../ && make install -j4

Bug reports, suggestions on improvements, as well as bugfixes are
welcome and much appreciated! Please consider submitting fixes to our
gerrit code-review page (gerrit.gromacs.org); it is the quickest and
easiest way to contribute a patch to GROMACS!


Footnote -- Known, yet-to-be-solved issues:
- preliminary implementation of run configuration setup -- related
mdrun options and environment variables will change;
- preliminary (naive) NUMA awareness implementation -- can't use HT
out of the box with (thread-)MPI+OpenMP;
- manual CPU/core affinity setting needed for multiple process/machine
runs (automatic thread pinning has to be turned off);
- *avoid CUDA 4.1* for now; the GPU nbnxn kernels are, due to a
regression in the nvcc compiler, 5-10% slower with CUDA 4.1 compared
to 4.0;
- CMake:
 - no support for advanced CPU acceleration (AVX, SSE 4.1, native optimizations)
 - no ICC 12 support
 - no Windows support
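
To check whether the CUDA 4.1 caveat above applies to your machine, a
minimal sketch along these lines may help. The function names are
hypothetical, and it assumes that "nvcc --version" prints a line
containing "release X.Y", as the toolkit versions I have seen do.

```shell
# Extract the "X.Y" release number from `nvcc --version` output on stdin.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Print a warning when the given release is 4.1, the version the
# announcement advises against; print nothing otherwise.
warn_if_cuda_41() {
    if [ "$1" = "4.1" ]; then
        echo "CUDA 4.1 detected: nbnxn GPU kernels may be 5-10% slower than with 4.0"
    fi
}

# Only query nvcc when it is actually on PATH.
if command -v nvcc >/dev/null 2>&1; then
    warn_if_cuda_41 "$(nvcc --version | cuda_release)"
fi
```

The parsing is kept separate from the nvcc call so it can be tried on a
pasted version string on a machine without the toolkit installed.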
