[gmx-developers] gromacs.org_gmx-developers Digest, Vol 183, Issue 10
Szilárd Páll
pall.szilard at gmail.com
Mon Aug 12 12:08:55 CEST 2019
Hi,
Raw TFlops is a poor indicator of the performance of most real-world
code, including our compute-bound kernels, let alone those that are
more memory-bound.
For an overview of how the raw (SP) flop rate relates to the performance
of GROMACS GPU kernels, take a look at our analysis in a recent
publication (Figs. 2 and 3):
https://onlinelibrary.wiley.com/doi/full/10.1002/jcc.26011
Lots of TFlops can end up contributing very little to effective
application performance if other parts of the architecture, like the
memory subsystem, end up being the limiting factor -- as can often be
seen with GPU accelerators as well as modern wide-SIMD CPUs.
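To make that concrete with a back-of-the-envelope roofline estimate
(the numbers below are illustrative, not measurements of any particular
card): the attainable flop rate of a kernel is roughly

    min(peak flop rate, arithmetic intensity x memory bandwidth)

For a hypothetical accelerator with 10 TFLOPS SP peak and 500 GB/s of
memory bandwidth, a kernel doing ~4 flops per byte moved is capped by
the memory system at about 4 x 500 = 2000 GFLOPS, i.e. only ~20% of the
headline figure, however impressive the latter looks on paper.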
Cheers,
--
Szilárd
On Wed, Jul 24, 2019 at 7:20 PM James <james at ryley.com> wrote:
>
> Hi,
>
> Thanks for all the information. Unless I am misunderstanding the numbers, yes, compared to single precision, the GeForce cards are awful (about 1/30th of single precision performance). That being said, because single precision speed is absurdly high, the GeForce 1080 Ti still provides 330-350 GFLOPS in double. In comparison, the i9-7940X provides about 60 GFLOPS in double according to the benchmarks I've seen (https://ranker.sisoftware.co.uk/top_device.php?q=c2ffc9f1d7bad7eaccbe83b395fcc1f0d6be83b690e8d5e4c2a7c2ffcfe1d4f281bc8c&l=en). So, assuming the processor can keep the GPU fed, even the "awful" performance of the GeForce 1080 Ti would be like adding ~5 CPUs.
>
> If you were to move to the Volta-based cards, their double precision performance is in the 6 - 7.4 TFLOP range, or 100X faster than a high-end i9 (and presumably way beyond what you could get from a dual or quad Xeon system). Granted, the Volta cards are expensive, but if you want a whole cluster in a box, it seems like a pretty good deal. (Again, if the processor can keep the GPU fed -- I have no idea if you could actually realize anywhere near a 100X speedup).
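(Spelling out the arithmetic behind those ratios, using the figures
quoted above: 350 GFLOPS / 60 GFLOPS is roughly 6, hence the "~5 CPUs";
and 6,000-7,400 GFLOPS / 60 GFLOPS is roughly 100-120, hence the "100X"
-- in both cases only if the kernels are flop-bound and the host can
keep the GPU fed.)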
>
> Either of these scenarios seems pretty good to me -- unless the processor cannot keep the GPU fed.
>
> Assuming that a substantial double precision speedup could be realized, I'd be interested in knowing if anyone on the list would like to implement this on a contract basis. If so, feel free to contact me directly.
>
> Sincerely,
> James
>
>
> On Wed, Jul 24, 2019 at 7:17 AM <gromacs.org_gmx-developers-request at maillist.sys.kth.se> wrote:
>>
>>
>> Today's Topics:
>>
>> 1. Re: Double precision on GPU's (Szilárd Páll)
>> 2. Re: Double precision on GPU's (Berk Hess)
>> 3. Re: Double precision on GPU's (Erik Lindahl)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Wed, 24 Jul 2019 15:20:21 +0200
>> From: Szilárd Páll <pall.szilard at gmail.com>
>> To: Discussion list for GROMACS development
>> <gmx-developers at gromacs.org>
>> Subject: Re: [gmx-developers] Double precision on GPU's
>> Message-ID:
>> <CANnYEw7nm=36HCY29EXSRFZyYz8bcMZWM5L=pNd06PNehgKknw at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hi,
>>
>> Indeed, a straight port without much performance optimization may not be a
>> lot of effort, but integrating an additional kernel flavor into the
>> existing codebase adds complexity. It will probably require some
>> refactoring and preliminary work to accommodate the new set of kernels
>> without duplicating code and without introducing complexity or performance
>> overhead in the current kernels.
>>
>> However, also note that most NVIDIA consumer cards -- which are very widely
>> used by our users -- have 32x lower DP throughput than SP, a far larger
>> penalty than most people would find acceptable, I'd say.
>>
>> --
>> Szilárd
>>
>>
>> On Mon, Jul 22, 2019 at 10:28 PM Berk Hess <hess at kth.se> wrote:
>>
>> > Hi,
>> >
>> > IIRC all Nvidia Tesla cards have always had double precision, at half the
>> > throughput of single precision. But there are very few cases where double
>> > precision is needed. Energy drift in single precision is never an issue,
>> > unless you really can not use a thermostat.
>> >
>> > But having said that, making the GPU code, either CUDA or OpenCL, work in
>> > double precision is probably not much effort. But making it work
>> > efficiently requires optimizing several algorithmic parameters and maybe
>> > changing the arrangement of some data in the different GPU memory levels.
>> >
>> > Cheers,
>> >
>> > Berk
>> >
>> > On 7/22/19 10:10 PM, James wrote:
>> >
>> > Hi,
>> >
>> > My apologies if this question has been previously discussed. I just joined
>> > the list and all I know is that from reading the docs and release comments,
>> > writing code for double precision on GPU's is not a priority.
>> >
>> > However, I believe all recent upper-end Nvidia cards have native double
>> > precision (which was not true several generations ago). So, you don't have
>> > to have a real "scientific computing" GPU to take advantage of this -- most
>> > people probably already have the hardware. Still, I understand that most
>> > people do not need/want to run double precision. But, some do (and you have
>> > to if you are concerned with conservation of energy -- the energy drift in
>> > single precision is substantial).
>> >
>> > So, I would like to ask what the level of effort to do this is believed to
>> > be? Would it require a lot of new code, or would it be porting the single
>> > precision code to double precision?
>> >
>> > Sincerely,
>> > James
>> >
>> >
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Wed, 24 Jul 2019 15:25:53 +0200
>> From: Berk Hess <hess at kth.se>
>> To: gmx-developers at gromacs.org
>> Subject: Re: [gmx-developers] Double precision on GPU's
>> Message-ID: <e49ccd11-2dc5-1d52-efe7-e39299fcf436 at kth.se>
>> Content-Type: text/plain; charset="windows-1252"; Format="flowed"
>>
>> Hi,
>>
>> The energy conservation check can still be done with a thermostat, as
>> GROMACS keeps track of the amount of energy the thermostat adds to or
>> removes from the system.
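In other words (a simplified statement of the bookkeeping, not the exact
expression used internally): the conserved-energy quantity GROMACS writes
to the energy file is roughly

    E_conserved = E_kinetic + E_potential - (net energy added by the thermostat/barostat)

and its drift over the course of a run still serves as the conservation
check in NVT/NPT simulations.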
>>
>> I don't understand what you mean exactly with "energy balance". Either
>> you are interested in energy dissipation between different parts of the
>> system, in which case you often can not use a thermostat, or you are not
>> and then you can, and probably should, use a thermostat (and still keep
>> track of energy conservation).
>>
>> Cheers,
>>
>> Berk
>>
>> On 2019-07-23 21:19 , James wrote:
>> > Hi Berk,
>> >
>> > Thank you for the information. I prefer not to use thermostats, at
>> > least when trying to get quantitative values, because (and I may be
>> > misunderstanding thermostats, but since changing the temperature
>> > changes the energy, I assume this is true) then I cannot use "energy
>> > in = energy out" as a sanity check. I'm more interested in energy
>> > balance than in maintaining a given temperature.
>> >
>> > If I were to be able to get the double precision GPU stuff coded,
>> > would the team be willing to maintain it? Or, since I don't know the
>> > code, perhaps a better question is: Does someone who is already
>> > familiar with the relevant code have time to do this as a side project
>> > (with compensation)? If anyone is interested, please feel free to
>> > contact me off list.
>> >
>> > Thanks,
>> > James
>> >
>> > =======================================
>> > Hi,
>> >
>> > IIRC all Nvidia Tesla cards have always had double precision, at half
>> > the throughput of single precision. But there are very few cases where
>> > double precision is needed. Energy drift in single precision is never an
>> > issue, unless you really can not use a thermostat.
>> >
>> > But having said that, making the GPU code, either CUDA or OpenCL work in
>> > double precision is probably not much effort. But making it work
>> > efficiently requires optimizing several algorithmic parameters and maybe
>> > changing the arrangement of some data in the different GPU memory levels.
>> >
>> > Cheers,
>> >
>> > Berk
>> >
>> > On 7/22/19 10:10 PM, James wrote:
>> > > Hi,
>> > >
>> > > My apologies if this question has been previously discussed. I just
>> > > joined the list and all I know is that from reading the docs and
>> > > release comments, writing code for double precision on GPU's is not a
>> > > priority.
>> > >
>> > > However, I believe all recent upper-end Nvidia cards have native
>> > > double precision (which was not true several generations ago). So, you
>> > > don't have to have a real "scientific computing" GPU to take advantage
>> > > of this -- most people probably already have the hardware. Still, I
>> > > understand that most people do not need/want to run double precision.
>> > > But, some do (and you have to if you are concerned with conservation
>> > > of energy -- the energy drift in single precision is substantial).
>> > >
>> > > So, I would like to ask what the level of effort to do this is
>> > > believed to be? Would it require a lot of new code, or would it be
>> > > porting the single precision code to double precision?
>> > >
>> > > Sincerely,
>> > > James
>> >
>> >
>>
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Wed, 24 Jul 2019 16:17:24 +0200
>> From: Erik Lindahl <erik.lindahl at gmail.com>
>> To: gmx-developers at gromacs.org
>> Subject: Re: [gmx-developers] Double precision on GPU's
>> Message-ID: <CF6EF0AC-6C50-4CC7-95EB-049E997C610B at gmail.com>
>> Content-Type: text/plain; charset="us-ascii"
>>
>> Hi,
>>
>> Just to add some minor things to what Berk already said:
>>
>> 1. With current Nvidia cards, for the Tesla line the double precision flop rate is 50% of single. However, for the consumer cards it's only 1/8th-1/12th. In practice the latter is so low that it's often better to run on the CPU instead. This is the main reason we haven't bothered with it yet - most academic labs use GeForce :-)
>>
>> 2. You are likely well aware of this, but for users in general another problem with skipping thermostats for complex biophysical systems is that the potential energy often goes down as the system relaxes - which will make the temperature go up in an NVE ensemble.
>>
>> 3. Having said that, I think it should be a fairly straightforward minor project for someone. It's probably mostly a matter of changing variable types so they compile to double if GMX_DOUBLE is set.
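As a minimal sketch of that idea -- not the actual GROMACS kernel code; the
type and kernel names below are hypothetical -- a precision typedef selected
at compile time by GMX_DOUBLE, with the kernel written against that type:

#include <cuda_runtime.h>

#ifdef GMX_DOUBLE
typedef double gmx_real;   /* kernels compile in double precision */
#else
typedef float  gmx_real;   /* default: single precision */
#endif

/* Illustrative force-update-style kernel: the same source builds an SP
 * or a DP flavor depending on whether GMX_DOUBLE is defined. */
__global__ void scaleAndAccumulate(gmx_real* f, const gmx_real* x,
                                   gmx_real c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        f[i] += c * x[i];
    }
}

The real work, as noted earlier in the thread, is everything around this:
keeping both flavors in the codebase without duplication or overhead in the
SP path, and re-tuning launch parameters, memory layout and math functions
for the DP flavor.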
>>
>> Cheers,
>>
>> Erik
>>
>> Erik Lindahl <erik.lindahl at scilifelab.se>
>> Professor of Biophysics
>> Science for Life Laboratory
>> Stockholm University & KTH
>> Office (SciLifeLab): +46 8 524 81567
>> Cell (Sweden): +46 73 4618050
>> Cell (US): +1 (650) 924 7674
>>
>>
>>
>> > On 24 Jul 2019, at 15:25, Berk Hess <hess at kth.se> wrote:
>> >
>> > Hi,
>> >
>> > The energy conservation check can still be done with a thermostat, as GROMACS keeps track of the amount of energy the thermostat adds to or removes from the system.
>> >
>> > I don't understand what you mean exactly with "energy balance". Either you are interested in energy dissipation between different parts of the system, in which case you often can not use a thermostat, or you are not and then you can, and probably should, use a thermostat (and still keep track of energy conservation).
>> >
>> > Cheers,
>> >
>> > Berk
>> >
>> >> On 2019-07-23 21:19 , James wrote:
>> >> Hi Berk,
>> >>
>> >> Thank you for the information. I prefer not to use thermostats, at least when trying to get quantitative values, because (and I may be misunderstanding thermostats, but since changing the temperature changes the energy, I assume this is true) then I cannot use "energy in = energy out" as a sanity check. I'm more interested in energy balance than in maintaining a given temperature.
>> >>
>> >> If I were to be able to get the double precision GPU stuff coded, would the team be willing to maintain it? Or, since I don't know the code, perhaps a better question is: Does someone who is already familiar with the relevant code have time to do this as a side project (with compensation)? If anyone is interested, please feel free to contact me off list.
>> >>
>> >> Thanks,
>> >> James
>> >>
>> >> =======================================
>> >> Hi,
>> >>
>> >> IIRC all Nvidia Tesla cards have always had double precision, at half
>> >> the throughput of single precision. But there are very few cases where
>> >> double precision is needed. Energy drift in single precision is never an
>> >> issue, unless you really can not use a thermostat.
>> >>
>> >> But having said that, making the GPU code, either CUDA or OpenCL work in
>> >> double precision is probably not much effort. But making it work
>> >> efficiently requires optimizing several algorithmic parameters and maybe
>> >> changing the arrangement of some data in the different GPU memory levels.
>> >>
>> >> Cheers,
>> >>
>> >> Berk
>> >>
>> >> On 7/22/19 10:10 PM, James wrote:
>> >> > Hi,
>> >> >
>> >> > My apologies if this question has been previously discussed. I just
>> >> > joined the list and all I know is that from reading the docs and
>> >> > release comments, writing code for double precision on GPU's is not a
>> >> > priority.
>> >> >
>> >> > However, I believe all recent upper-end Nvidia cards have native
>> >> > double precision (which was not true several generations ago). So, you
>> >> > don't have to have a real "scientific computing" GPU to take advantage
>> >> > of this -- most people probably already have the hardware. Still, I
>> >> > understand that most people do not need/want to run double precision.
>> >> > But, some do (and you have to if you are concerned with conservation
>> >> > of energy -- the energy drift in single precision is substantial).
>> >> >
>> >> > So, I would like to ask what the level of effort to do this is
>> >> > believed to be? Would it require a lot of new code, or would it be
>> >> > porting the single precision code to double precision?
>> >> >
>> >> > Sincerely,
>> >> > James
>> >> >
>> >>
>> >>
>> >
>>
>> ------------------------------
>>
>>
>> End of gromacs.org_gmx-developers Digest, Vol 183, Issue 10
>> ***********************************************************
>