[gmx-developers] gromacs.org_gmx-developers Digest, Vol 183, Issue 10

Wed Jul 24 19:19:28 CEST 2019

Hi,

Thanks for all the information. Unless I am misunderstanding the numbers,
yes, the GeForce cards are awful in double precision relative to their
single precision speed (about 1/30th of single precision performance).
That said, because single precision speed is absurdly high, the GeForce
1080 Ti still provides 330-350 GFLOPS in double. In comparison, the
i9-7940X provides about 60 GFLOPS in double according to the benchmarks
I've seen (
https://ranker.sisoftware.co.uk/top_device.php?q=c2ffc9f1d7bad7eaccbe83b395fcc1f0d6be83b690e8d5e4c2a7c2ffcfe1d4f281bc8c&l=en).
So, assuming the processor can keep the GPU fed, even the "awful"
performance of the GeForce 1080 Ti would be like adding ~5 CPUs.

If you were to move to the Volta-based cards, their double precision
performance is in the 6-7.4 TFLOPS range, or about 100X faster than a
high-end i9 (and presumably way beyond what you could get from a dual or
quad Xeon system). Granted, the Volta cards are expensive, but if you want
a whole cluster in a box, it seems like a pretty good deal. (Again, if the
processor can keep the GPU fed -- I have no idea if you could actually
realize anywhere near a 100X speedup.)
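
As a rough check on the two ratios above, here is a minimal Python sketch of
the arithmetic. The throughput figures are the ones quoted in this message;
the names `cpu_equivalents` and `GPUS` are made up for illustration, and the
result compares peak numbers only, not anything measured in GROMACS.

```python
# Peak double-precision throughput figures quoted in this message (GFLOPS).
CPU_I9_7940X = 60.0
GPUS = {
    "GeForce 1080 Ti": (330.0, 350.0),
    "Volta-based card": (6000.0, 7400.0),
}


def cpu_equivalents(gpu_gflops, cpu_gflops=CPU_I9_7940X):
    """Return how many CPUs of the given peak throughput match one GPU."""
    return gpu_gflops / cpu_gflops


for name, (low, high) in GPUS.items():
    print(f"{name}: {cpu_equivalents(low):.1f}x to "
          f"{cpu_equivalents(high):.1f}x one i9-7940X")
# GeForce 1080 Ti: 5.5x to 5.8x one i9-7940X
# Volta-based card: 100.0x to 123.3x one i9-7940X
```

These are upper bounds under the assumption that the GPU is always busy.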

Either of these scenarios seems pretty good to me -- unless the processor
cannot keep the GPU fed.
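
To put a number on the "keep the GPU fed" caveat, one simple model is
Amdahl's law: if only part of the run time moves to the GPU, the rest limits
the overall gain. The sketch below is only that model, not a measurement; the
function name `overall_speedup` and the offloaded fractions are hypothetical,
and I do not know what fraction of a GROMACS run could actually be offloaded
in double precision.

```python
def overall_speedup(offloaded_fraction, kernel_speedup):
    """Amdahl's law: whole-run speedup when only a fraction is accelerated."""
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    remaining = 1.0 - offloaded_fraction  # part that stays on the CPU
    return 1.0 / (remaining + offloaded_fraction / kernel_speedup)


# Hypothetical fractions of run time that could be moved to the GPU.
for fraction in (0.50, 0.90, 0.99):
    speedup = overall_speedup(fraction, 100.0)
    print(f"{fraction:.0%} offloaded at 100x: {speedup:.1f}x overall")
```

Under this model, even a 100X faster kernel gives only about a 9X overall
speedup if 10% of the run time stays on the CPU.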

Assuming that a substantial double precision speedup could be realized, I'd
be interested in knowing if anyone on the list would like to implement this
on a contract basis. If so, feel free to contact me directly.

Sincerely,
James

On Wed, Jul 24, 2019 at 7:17 AM <
gromacs.org_gmx-developers-request at maillist.sys.kth.se> wrote:

>
>
> Today's Topics:
>
>    1. Re: Double precision on GPU's (Szilárd Páll)
>    2. Re: Double precision on GPU's (Berk Hess)
>    3. Re: Double precision on GPU's (Erik Lindahl)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 24 Jul 2019 15:20:21 +0200
> From: Szilárd Páll <pall.szilard at gmail.com>
> To: Discussion list for GROMACS development
>         <gmx-developers at gromacs.org>
> Subject: Re: [gmx-developers] Double precision on GPU's
> Message-ID:
>         <CANnYEw7nm=36HCY29EXSRFZyYz8bcMZWM5L=
> pNd06PNehgKknw at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
>
> Indeed, a straight port without much performance optimization may not be a
> lot of effort, but integrating an additional kernel flavor into the
> existing codebase will mean added complexity which will probably require
> some refactoring and preliminary work to accommodate the new set of kernels
> without code duplication and avoiding introducing complexity or performance
> overhead in the current kernels.
>
> However, also note that most NVIDIA consumer cards -- which are very widely
> used by our users -- have a 32x lower DP throughput than SP, which is far
> more than what most people would find acceptable, I'd say.
>
> --
> Szilárd
>
>
> On Mon, Jul 22, 2019 at 10:28 PM Berk Hess <hess at kth.se> wrote:
>
> > Hi,
> >
> > IIRC all Nvidia Tesla cards have always had double precision, at half the
> > throughput of single precision. But there are very few cases where double
> > precision is needed. Energy drift in single precision is never an issue,
> > unless you really can not use a thermostat.
> >
> > But having said that, making the GPU code, either CUDA or OpenCL work in
> > double precision is probably not much effort. But making it work
> > efficiently requires optimizing several algorithmic parameters and maybe
> > changing the arrangement of some data in the different GPU memory levels.
> >
> > Cheers,
> >
> > Berk
> >
> > On 7/22/19 10:10 PM, James wrote:
> >
> > Hi,
> >
> > My apologies if this question has been previously discussed. I just
> > joined the list and all I know is that from reading the docs and
> > release comments, writing code for double precision on GPU's is not a
> > priority.
> >
> > However, I believe all recent upper-end Nvidia cards have native
> > double precision (which was not true several generations ago). So, you
> > don't have to have a real "scientific computing" GPU to take advantage
> > of this -- most people probably already have the hardware. Still, I
> > understand that most people do not need/want to run double precision.
> > But, some do (and you have to if you are concerned with conservation
> > of energy -- the energy drift in single precision is substantial).
> >
> > So, I would like to ask what the level of effort to do this is
> > believed to be? Would it require a lot of new code, or would it be
> > porting the single precision code to double precision?
> >
> > Sincerely,
> > James
> >
> >
> > --
> > Gromacs Developers mailing list
> >
> > * Please search the archive at
> > http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List before
> > posting!
> >
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
> > * For (un)subscribe requests visit
> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
> > or send a mail to gmx-developers-request at gromacs.org.
>
> ------------------------------
>
> Message: 2
> Date: Wed, 24 Jul 2019 15:25:53 +0200
> From: Berk Hess <hess at kth.se>
> To: gmx-developers at gromacs.org
> Subject: Re: [gmx-developers] Double precision on GPU's
> Message-ID: <e49ccd11-2dc5-1d52-efe7-e39299fcf436 at kth.se>
> Content-Type: text/plain; charset="windows-1252"; Format="flowed"
>
> Hi,
>
> The energy conservation check can still be done with a thermostat, as
> GROMACS keeps track of the amount of energy the thermostat adds to or
> removes from the system.
>
> I don't understand what you mean exactly with "energy balance". Either
> you are interested in energy dissipation between different parts of the
> system, in which case you often can not use a thermostat, or you are not
> and then you can, and probably should, use a thermostat (and still keep
> track of energy conservation).
>
> Cheers,
>
> Berk
>
> On 2019-07-23 21:19 , James wrote:
> > Hi Berk,
> >
> > Thank you for the information. I prefer not to use thermostats, at
> > least when trying to get quantitative values, because (and I may be
> > misunderstanding thermostats, but since changing the temperature
> > changes the energy, I assume this is true) then I cannot use "energy
> > in = energy out" as a sanity check. I'm more interested in energy
> > balance than in maintaining a given temperature.
> >
> > If I were to be able to get the double precision GPU stuff coded,
> > would the team be willing to maintain it? Or, since I don't know the
> > code, perhaps a better question is: Does someone who is already
> > familiar with the relevant code have time to do this as a side project
> > (with compensation)? If anyone is interested, please feel free to
> > contact me off list.
> >
> > Thanks,
> > James
> >
> > =======================================
> > Hi,
> >
> > IIRC all Nvidia Tesla cards have always had double precision, at half
> > the throughput of single precision. But there are very few cases where
> > double precision is needed. Energy drift in single precision is never an
> > issue, unless you really can not use a thermostat.
> >
> > But having said that, making the GPU code, either CUDA or OpenCL work in
> > double precision is probably not much effort. But making it work
> > efficiently requires optimizing several algorithmic parameters and maybe
> > changing the arrangement of some data in the different GPU memory levels.
> >
> > Cheers,
> >
> > Berk
> >
> > On 7/22/19 10:10 PM, James wrote:
> > > Hi,
> > >
> > > My apologies if this question has been previously discussed. I just
> > > joined the list and all I know is that from reading the docs and
> > > release comments, writing code for double precision on GPU's is not a
> > > priority.
> > >
> > > However, I believe all recent upper-end Nvidia cards have native
> > > double precision (which was not true several generations ago). So, you
> > > don't have to have a real "scientific computing" GPU to take advantage
> > > of this -- most people probably already have the hardware. Still, I
> > > understand that most people do not need/want to run double precision.
> > > But, some do (and you have to if you are concerned with conservation
> > > of energy -- the energy drift in single precision is substantial).
> > >
> > > So, I would like to ask what the level of effort to do this is
> > > believed to be? Would it require a lot of new code, or would it be
> > > porting the single precision code to double precision?
> > >
> > > Sincerely,
> > > James
> >
> >
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 24 Jul 2019 16:17:24 +0200
> From: Erik Lindahl <erik.lindahl at gmail.com>
> To: gmx-developers at gromacs.org
> Subject: Re: [gmx-developers] Double precision on GPU's
> Message-ID: <CF6EF0AC-6C50-4CC7-95EB-049E997C610B at gmail.com>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi,
>
> Just to add some minor things to what Berk already said:
>
> 1. With current nvidia cards, for the Tesla line the double precision
> floprate is 50% of single. However, for the consumer cards it's only
> 1/8-1/12th. In practice the latter is so low that it's often better to run
> on the CPU instead. This is the main reason we haven't bothered with it yet
> - most academic labs use GeForce :-)
>
> 2. You are likely well aware of this, but for users in general another
> problem with skipping thermostats for complex biophysical systems is that
> the potential energy often goes down as the system relaxes - which will
> make the temperature go up in an NVE ensemble.
>
> 3. Having said that, I think it should be a fairly straightforward minor
> project for someone. It's probably mostly a matter of changing variable
> types so they compile to double if GMX_DOUBLE is set.
>
> Cheers,
>
> Erik
>
> Erik Lindahl <erik.lindahl at scilifelab.se>
> Professor of Biophysics
> Science for Life Laboratory
> Stockholm University & KTH
> Office (SciLifeLab): +46 8 524 81567
> Cell (Sweden): +46 73 4618050
> Cell (US): +1 (650) 924 7674
>
>
>
> > On 24 Jul 2019, at 15:25, Berk Hess <hess at kth.se> wrote:
> >
> > Hi,
> >
> > The energy conservation check can still be done with a thermostat, as
> > GROMACS keeps track of the amount of energy the thermostat adds to or
> > removes from the system.
> >
> > I don't understand what you mean exactly with "energy balance". Either
> > you are interested in energy dissipation between different parts of the
> > system, in which case you often can not use a thermostat, or you are not
> > and then you can, and probably should, use a thermostat (and still keep
> > track of energy conservation).
> >
> > Cheers,
> >
> > Berk
> >
> >> On 2019-07-23 21:19 , James wrote:
> >> Hi Berk,
> >>
> >> Thank you for the information. I prefer not to use thermostats, at
> >> least when trying to get quantitative values, because (and I may be
> >> misunderstanding thermostats, but since changing the temperature
> >> changes the energy, I assume this is true) then I cannot use "energy
> >> in = energy out" as a sanity check. I'm more interested in energy
> >> balance than in maintaining a given temperature.
> >>
> >> If I were to be able to get the double precision GPU stuff coded,
> >> would the team be willing to maintain it? Or, since I don't know the
> >> code, perhaps a better question is: Does someone who is already
> >> familiar with the relevant code have time to do this as a side project
> >> (with compensation)? If anyone is interested, please feel free to
> >> contact me off list.
> >>
> >> Thanks,
> >> James
> >>
> >> =======================================
> >> Hi,
> >>
> >> IIRC all Nvidia Tesla cards have always had double precision, at half
> >> the throughput of single precision. But there are very few cases where
> >> double precision is needed. Energy drift in single precision is never an
> >> issue, unless you really can not use a thermostat.
> >>
> >> But having said that, making the GPU code, either CUDA or OpenCL work in
> >> double precision is probably not much effort. But making it work
> >> efficiently requires optimizing several algorithmic parameters and maybe
> >> changing the arrangement of some data in the different GPU memory levels.
> >>
> >> Cheers,
> >>
> >> Berk
> >>
> >> On 7/22/19 10:10 PM, James wrote:
> >> > Hi,
> >> >
> >> > My apologies if this question has been previously discussed. I just
> >> > joined the list and all I know is that from reading the docs and
> >> > release comments, writing code for double precision on GPU's is not a
> >> > priority.
> >> >
> >> > However, I believe all recent upper-end Nvidia cards have native
> >> > double precision (which was not true several generations ago). So, you
> >> > don't have to have a real "scientific computing" GPU to take advantage
> >> > of this -- most people probably already have the hardware. Still, I
> >> > understand that most people do not need/want to run double precision.
> >> > But, some do (and you have to if you are concerned with conservation
> >> > of energy -- the energy drift in single precision is substantial).
> >> >
> >> > So, I would like to ask what the level of effort to do this is
> >> > believed to be? Would it require a lot of new code, or would it be
> >> > porting the single precision code to double precision?
> >> >
> >> > Sincerely,
> >> > James
> >> >
> >>
> >>
> >
>
> ------------------------------
>
>
> End of gromacs.org_gmx-developers Digest, Vol 183, Issue 10
> ***********************************************************
>