[gmx-developers] Posit-enablement of GROMACS

Mark Abraham mark.j.abraham at gmail.com
Wed Aug 1 16:12:05 CEST 2018


Hi,

On Tue, Jul 31, 2018, 20:30 Theodore Omtzigt <theo at stillwater-sc.com> wrote:

> On Tue, Jul 31, Mark Abraham wrote:
>
> > One of our supercomputer center customers wants to enable GROMACS with this
> > new number system, and I am looking for subject matter experts that we
> > should work with to enable GROMACS with the C++ posit arithmetic library
> > that has been developed for the posit-based hardware accelerators.
> >
>
> What advantage has that center identified?
>
> Memory and MPI communication bandwidth.
>

At scale, on CPU machines, GROMACS runs inside L1 cache, and is limited by
network latency of the messages sent (mostly a few KB each). So there's no
free win there.
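
To put rough numbers on that, here is a minimal sketch of the usual
latency-plus-bandwidth cost model for a single message. The 1 microsecond
latency, 10 GB/s bandwidth and 4 KB message size are illustrative
assumptions, not measurements from any particular machine or from GROMACS.

```python
def message_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimate the time to send one message as latency + size / bandwidth."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


# All three values below are assumed for illustration only.
LATENCY_S = 1.0e-6      # assumed network latency: 1 microsecond
BANDWIDTH = 10.0e9      # assumed link bandwidth: 10 GB/s
MESSAGE_BYTES = 4096    # "a few KB" per message

# Time with the current element size, and with elements half as large.
t_full = message_time(MESSAGE_BYTES, LATENCY_S, BANDWIDTH)
t_half = message_time(MESSAGE_BYTES // 2, LATENCY_S, BANDWIDTH)

# Fraction of the per-message time saved by halving the payload.
saving = 1.0 - t_half / t_full
print(f"full: {t_full * 1e6:.3f} us, half: {t_half * 1e6:.3f} us, "
      f"saving: {saving:.1%}")
```

Under these assumed numbers, halving the payload saves only about 15% of
the per-message time, because the fixed latency term dominates.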

> Posits typically compete with IEEE floating point twice their size. So
> 32-bit posits tend to beat 64-bit doubles in terms of numerical accuracy.
>

GROMACS already makes extensive use of mixed precision - essentially all
computation is done in single and some aspects of reduction are in double.
So e.g. if a 16-bit posit could reduce cache pressure vs a 32-bit float,
then some simulation system that used to be L2 bound could become L1 bound.
That would only be a benefit on hardware that can handle operations on the
posit type natively.
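
As a generic illustration of why the reduction step is the part worth
doing in double, here is a minimal sketch that sums single-precision
values with a single-precision accumulator and with a double-precision
one. It is not GROMACS code; the helper names and the test data are
made up for the example.

```python
import struct


def to_f32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_single(values):
    """Accumulate in single precision: round after every addition."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + v)
    return acc


def sum_double(values):
    """Accumulate the same single-precision inputs in double precision."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


# One million copies of 0.1 rounded to single precision (illustrative data).
term = to_f32(0.1)
values = [term] * 1_000_000
reference = term * len(values)

err_single = abs(sum_single(values) - reference)
err_double = abs(sum_double(values) - reference)
print(f"single accumulator error: {err_single:.6g}")
print(f"double accumulator error: {err_double:.6g}")
```

The inputs are identical in both cases; only the width of the
accumulator changes, which is the sense of "mixed precision" above.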

> Secondly, posits with their reproducibility create pure identity
> transformations when using forward/inverse transforms. We have FFT/iFFT
> kernels using 16-bit posits that beat 64-bit double implementations in terms
> of numerical accuracy.
>

Great, but as I mentioned we're not aware of cases where the existing
numerical accuracy of a float-based FFT implementation is insufficient at a
given set of FFT parameters (e.g. grid size and spline interpolation
order). Reproducibility is a nice attribute for debugging, but otherwise
there's no strong requirement for the trajectories to be reproducible.
Observing a specific rare event is sometimes useful, but if you only see it
in one trajectory, then it is difficult to defend such observations as
relevant.


> The FFT in particular was the first kernel we were going to enable with
> posits to improve the performance of the communication.
>

Typical 3D FFT grids used in biomolecular MD simulations are between 64
cubed and 256 cubed. Those are at the small end for most people doing
computations on FFTs, or development work to improve those computations.
Optimizations that reduce the number of messages (or data movement
generally) are likely useful, but bandwidth is not a problem.

> Given this additional information, do you think that they are mistaken in
> thinking that posits would be a net benefit to performance?
>

It does sound like a generic hope, rather than one based on extensive
understanding of how high performance MD codes work. :-) I don't mean to be
negative, but MD codes are probably not the low-hanging fruit for
demonstrating the benefits of this approach.

Mark

