[gmx-developers] FLOP accounting in Gromacs 4.6

Tue Sep 17 03:29:39 CEST 2013

Agreed. I regretted that as soon as I sent it. Per node, per dollar
and/or per watt are better normalizations.

Jeff

Sent from my iPhone

On Sep 16, 2013, at 6:52 PM, "Szilárd Páll" <szilard.pall at cbr.su.se> wrote:

> On Mon, Sep 16, 2013 at 3:45 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>> +1 to Berk's comment.
>>
>> The fact that doing N-body w/ O(N^2) algorithm is the best way to hit
>> peak flop/s immediately suggests this is the wrong metric.
>>
>> The best (portable) performance metric I've seen for an MD code is
>> particle updates per second per core. That's what we use when
>> analyzing LAMMPS on BGQ vs x86, etc.
>
> Slightly off-topic, but even the "per core" metric is a bit
> dangerous these days when an AMD core, especially for a floating-point
> intensive code, isn't really the same as an Intel core.
>
> --
> Szilárd
>
>>
>> There are essentially no flops in comm btw (reductions are the
>> exception); there one often uses just total time as the figure of
>> merit.
>> Jeff
>>
>> Sent from my iPhone
>>
>> On Sep 16, 2013, at 3:38 AM, Berk Hess <hess at kth.se> wrote:
>>
>>> Hi,
>>>
>>> Every time we get such a question, our return question is: why are you asking?
>>> Any application should only care about application performance and not about any other measure, such as flops.
>>>
>>> The flop rate will depend very much on the algorithm, as well as on the hardware.
>>> On Sandy/Ivy Bridge, the Verlet PME kernels reach around 50% of peak, even more with GMX_NBNXN_SIMD_4XN set:
>>> http://www.sciencedirect.com/science/article/pii/S0010465513001975
>>> The FFT probably also gets around 50% of peak. But all other code is far less flop intensive. The total I get is around 30% of peak.
>>> But you can crank this up by shifting work from PME mesh to pair interactions, using GMX_NBNXN_SIMD_4XN, etc.
>>> RF will get lower flop rates, but higher ns/day, etc.
>>>
>>> So flop numbers are meaningless for most purposes.
>>> I think there are only two useful cases: analyzing algorithm performance (in combination with other measures), and satisfying people who cannot be convinced that flops are a useless measure. In the latter case we should make sure to maximize the flops by optimizing the settings for that purpose.
>>> But I think the flop count is still reasonably accurate (+-10%). Flops in communication should be negligible.
>>>
>>> Cheers,
>>>
>>> Berk
>>>
>>> On 09/16/2013 10:10 AM, Carsten Kutzner wrote:
>>>> Hi,
>>>>
>>>> can I use the Mega-Flops accounting at the end of the md.log file to
>>>> calculate how much of the theoretical peak performance of a processor
>>>> Gromacs is using? I understand that Flops used in communication are
>>>> not counted, so the accounting will give me a lower estimate.
>>>>
>>>> At what percentage of the theoretical peak performance will Gromacs 4.6
>>>> typically run using the Verlet kernels and PME (let's say we have a
>>>> big MD system)?
>>>>
>>>> Do I have to divide the reported Flops by two when running single precision?
>>>>
>>>> Thanks,
>>>>  Carsten
>>>
>>> --
>>> gmx-developers mailing list
>>> gmx-developers at gromacs.org
>>> http://lists.gromacs.org/mailman/listinfo/gmx-developers
>>> Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-developers-request at gromacs.org.
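
As a rough illustration of the metrics discussed in this thread, the Python
sketch below computes a percent-of-peak figure from a flop total and a
normalized particle-update rate. It is a minimal sketch, not GROMACS or LAMMPS
code: the function names and all input numbers are hypothetical, it assumes the
accounting total is an operation count in millions of flops for the whole run,
and the 16 single-precision flops per cycle per core is an assumption for an
AVX core without FMA (one 8-wide add plus one 8-wide multiply per cycle).

```python
def percent_of_peak(total_mflop, wall_seconds, cores, clock_ghz,
                    flops_per_cycle_per_core):
    """Achieved flop rate as a percentage of theoretical peak.

    total_mflop is assumed to be a flop *count* in millions for the whole
    run (hypothetical input, e.g. read by hand from a log file).
    """
    achieved_flops_per_s = total_mflop * 1.0e6 / wall_seconds
    peak_flops_per_s = cores * clock_ghz * 1.0e9 * flops_per_cycle_per_core
    return 100.0 * achieved_flops_per_s / peak_flops_per_s


def particle_updates_per_second_per_unit(n_particles, n_steps, wall_seconds,
                                         n_units):
    """Particle updates per second, normalized by n_units.

    n_units can be a core count, a node count, a cost in dollars or a power
    draw in watts, depending on which normalization is wanted.
    """
    return n_particles * n_steps / (wall_seconds * n_units)


if __name__ == "__main__":
    # All numbers below are made up for illustration only.
    pct = percent_of_peak(total_mflop=3.0e7, wall_seconds=300.0,
                          cores=8, clock_ghz=2.5,
                          flops_per_cycle_per_core=16)  # assumed AVX, SP
    rate = particle_updates_per_second_per_unit(n_particles=100000,
                                                n_steps=10000,
                                                wall_seconds=300.0,
                                                n_units=8)
    print("percent of peak: %.2f" % pct)
    print("particle updates / s / core: %.0f" % rate)
```

Passing the node count (or a cost or power figure) as n_units instead of the
core count gives the per-node, per-dollar or per-watt variants mentioned at the
top of the thread. The first function only says how busy the floating-point
units were; the second is closer to application performance.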