[gmx-users] (no subject)

Mark Abraham mark.j.abraham at gmail.com
Fri Nov 30 01:38:30 CET 2012

Hi GROMACS users!

We've finally got the first beta release of GROMACS 4.6 ready for you to
try out! We've put a lot of very hard work into it, and we hope you'll like
the good things we've done. Things won't be perfect yet, so we look
forward to your help finding whatever we haven't done well enough.
Remember, if you want the big performance gains that will be available
in 4.6, then you'll want to know that things build and work well on your
hardware, and the best way to ensure that is to help us test over the next
few weeks. At the same time, we discourage you from using this code for any
work whose scientific reliability you need to trust - this is very much a
draft version of the software!

You can find the manual here:
ftp://ftp.gromacs.org/pub/manual/gromacs-manual-4.6-beta1.pdf
and the source code here:
ftp://ftp.gromacs.org/pub/gromacs/gromacs-4.6-beta1.tar.gz

It would be great for us if some of you tried out the new code on
lots of different hardware and operating systems and reported build
problems, inconsistencies, strange or missing documentation and, in the
worst case, outright bugs. To tempt you to do so, here's a bit of a carrot
in the form of the new features:

* A brand-new native GPU implementation layer. Gromacs now does
heterogeneous parallelization using both CPUs and modern NVIDIA GPUs at the
same time. The GPU port also works in parallel, using either multiple cards
in a node or multiple nodes, and it's smoking fast. There's lots of heroic
work by Szilard Pall and Berk Hess here, and special thanks to NVIDIA and
Mark Berger for their assistance in making this happen.

* Gromacs can now use OpenMP parallelization for better scaling inside
nodes, in particular when doing the FFT part on the CPU while the GPU does
the normal nonbonded interactions.

* Automatic load balancing between direct-space and PME nodes, and lots of
improvements in domain decomposition load balancing and scaling.

* We have a brand new set of classical nonbonded interaction kernels, and
Gromacs can now use any of SSE2, SSE4.1, 128-bit AVX with FMA support (AMD)
or 256-bit AVX (Intel), all of them in both single and double precision.
The performance difference depends on your system and parallelization, but
it is quite large in many cases - we have seen >40% improvement on ion
channels running on modern AMD machines! Did we mention that the classical
C kernels are faster too, since we can now do force-only interactions for
most steps?

* There are new kernels using analytical switch/shift functions that are
quite a bit faster, and a new CPU implementation of Verlet kernels that
guarantees buffered interactions (no atoms drifting in/out of the neighbor
list range) and conserves energy extremely well.

* There is a large new module to do advanced free energy calculations,
thanks to Michael Shirts. Trust us, you need the full manual to decipher
all the possibilities…

* Gromacs has switched completely to CMake for configuration and building.
To be honest, we do expect some hiccups from this, but it has enabled us to
provide much more automation and advanced features as part of the setup -
and Gromacs now works on Windows out-of-the-box. Please test as many parts
of the build system as you can!

* All raw assembly has been replaced by machine intrinsics in C. This does
wonders for readability, but it means the compiler and compiler flags
matter. On x86, you will typically get 5-10% better performance from icc
than gcc.
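
To give a feel for what "machine intrinsics in C" means in the last item,
here is a minimal sketch, not taken from the GROMACS source. The function
name scaled_add is hypothetical and exists only for this illustration; the
_mm_* calls are the standard SSE intrinsics from emmintrin.h. The sketch
assumes an x86 compiler that defines __SSE2__ and falls back to plain C
otherwise.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE/SSE2 intrinsics */
#endif

/* Hypothetical helper (not a GROMACS function): y[i] += a * x[i]. */
void scaled_add(float a, const float *x, float *y, size_t n)
{
    size_t i = 0;

    assert(x != NULL && y != NULL);

#if defined(__SSE2__)
    {
        /* Broadcast the scalar a into all four lanes of a 128-bit register. */
        __m128 va = _mm_set1_ps(a);

        /* Process four floats per iteration; unaligned loads/stores are
         * used so the caller does not need 16-byte aligned arrays. */
        for (; i + 4 <= n; i += 4) {
            __m128 vx = _mm_loadu_ps(x + i);
            __m128 vy = _mm_loadu_ps(y + i);
            vy = _mm_add_ps(vy, _mm_mul_ps(va, vx));
            _mm_storeu_ps(y + i, vy);
        }
    }
#endif

    /* Scalar loop: handles the remainder, or everything without SSE2. */
    for (; i < n; i++) {
        y[i] += a * x[i];
    }
}
```

Unlike hand-written assembly, the compiler still chooses registers and
schedules the instructions here, which is why the compiler and its flags
affect the resulting performance.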
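
The "buffered interactions" in the Verlet kernel item above can be
illustrated with a toy example. This is only a sketch of the general
buffered pair-list idea, not the GROMACS implementation: it is a brute-force
O(N^2) search without periodic boundaries, and the names vec3,
build_pair_list, count_interacting and the parameter r_buffer are all
hypothetical. The list is built with a radius of r_cut + r_buffer, so a pair
that starts just outside the cut-off is already listed and can drift inside
it without a list rebuild.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y, z; } vec3;  /* hypothetical position type */

/* Squared distance between two points. */
static double dist2(vec3 a, vec3 b)
{
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

/* List every pair closer than r_cut + r_buffer. Returns the number of
 * pairs stored; at most max_pairs entries are written to pi/pj. */
size_t build_pair_list(const vec3 *pos, size_t n, double r_cut,
                       double r_buffer, size_t *pi, size_t *pj,
                       size_t max_pairs)
{
    double r_list = r_cut + r_buffer;
    size_t count = 0, i, j;

    assert(r_cut > 0.0 && r_buffer >= 0.0);

    for (i = 0; i < n; i++) {
        for (j = i + 1; j < n; j++) {
            if (dist2(pos[i], pos[j]) < r_list * r_list && count < max_pairs) {
                pi[count] = i;
                pj[count] = j;
                count++;
            }
        }
    }
    return count;
}

/* Count the listed pairs that are inside the real cut-off right now.
 * A force kernel would evaluate exactly these pairs. */
size_t count_interacting(const vec3 *pos, const size_t *pi, const size_t *pj,
                         size_t n_pairs, double r_cut)
{
    size_t count = 0, k;

    for (k = 0; k < n_pairs; k++) {
        if (dist2(pos[pi[k]], pos[pj[k]]) < r_cut * r_cut) {
            count++;
        }
    }
    return count;
}
```

In this toy setting the list stays complete as long as no particle has moved
more than r_buffer / 2 since the list was built, because two particles can
then approach each other by at most r_buffer.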

Happy simulating!

The GROMACS development team
