[gmx-developers] FLOP accounting in Gromacs 4.6
jeff.science at gmail.com
Tue Sep 17 03:29:39 CEST 2013
Agreed. I regretted that as soon as I sent it. Per node, per dollar
and/or per watt are better normalizations.
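
As a rough sketch of what those normalizations look like, applied to the particle-updates-per-second metric quoted below: the function name and all input numbers here are made up for illustration, not taken from any LAMMPS or Gromacs run.

```python
def normalized_rates(n_particles, n_steps, wall_seconds, nodes, watts, dollars):
    """Return particle updates per second, raw and per node / watt / dollar.

    All arguments are user-supplied measurements or estimates; nothing
    here is read from an MD log file.
    """
    rate = n_particles * n_steps / wall_seconds  # particle updates per second
    return {
        "per_second": rate,
        "per_node": rate / nodes,
        "per_watt": rate / watts,
        "per_dollar": rate / dollars,
    }


# Hypothetical run: 1M particles, 10k steps, 500 s wall time,
# on 4 nodes drawing 2 kW in total and costing 40k (any currency).
rates = normalized_rates(
    n_particles=1_000_000,
    n_steps=10_000,
    wall_seconds=500.0,
    nodes=4,
    watts=2000.0,
    dollars=40000.0,
)
for name, value in rates.items():
    print(f"{name}: {value:g}")
```

Comparing two machines then means comparing the same key across two such dictionaries, rather than comparing flop rates.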
On Sep 16, 2013, at 6:52 PM, "Szilárd Páll" <szilard.pall at cbr.su.se> wrote:
> On Mon, Sep 16, 2013 at 3:45 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>> +1 to Berk's comment.
>> The fact that doing N-body w/ O(N^2) algorithm is the best way to hit
>> peak flop/s immediately suggests this is the wrong metric.
>> The best (portable) performance metric I've seen for an MD code is
>> particle updates per second per core. That's what we use when
>> analyzing LAMMPS on BGQ vs x86, etc.
> Slightly off-topic, but even the "per core" metric is a bit
> dangerous these days when an AMD core, especially for a floating-point
> intensive code, isn't really the same as an Intel core.
>> There are essentially no flops in comm btw (reductions are the
>> exception); there one often uses just total time as the figure of merit.
>> On Sep 16, 2013, at 3:38 AM, Berk Hess <hess at kth.se> wrote:
>>> Every time we get such a question, our return question is: why are you asking?
>>> Any application should only care about application performance and not about any other measure, such as flops.
>>> The flop rate will depend very much on the algorithm, as well as on the hardware.
>>> On Sandy/Ivy Bridge, the Verlet PME kernels reach around 50% of peak, even more with GMX_NBNXN_SIMD_4XN set:
>>> The FFT probably also gets around 50% of peak. But all other code is far less flop intensive. The total I get is around 30% of peak.
>>> But you can crank this up by shifting work from PME mesh to pair interactions, using GMX_NBNXN_SIMD_4XN, etc.
>>> RF will get lower flop rates, but higher ns/day, etc.
>>> So flop numbers are meaningless for most purposes.
>>> I think there are only two useful cases: analyzing algorithm performance (but combined with other measures) and convincing people when they can't be convinced that flops are a useless measure. In the latter case we should make sure to maximize the flops by optimizing the settings for that purpose.
>>> But I think the flop count is still reasonably accurate (+-10%). Flops in communication should be negligible.
>>> On 09/16/2013 10:10 AM, Carsten Kutzner wrote:
>>>> can I use the Mega-Flops accounting at the end of the md.log file to
>>>> calculate how much of the theoretical peak performance of a processor
>>>> Gromacs is using? I understand that Flops used in communication are
>>>> not counted, so the accounting will give me a lower estimate.
>>>> At what percentage of the theoretical peak performance will Gromacs 4.6
>>>> typically run using the Verlet kernels and PME (let's say we have a
>>>> big MD system)?
>>>> Do I have to divide the reported Flops by two when running single precision?
>>> gmx-developers mailing list
>>> gmx-developers at gromacs.org
>>> Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-developers-request at gromacs.org.
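
For the percent-of-peak arithmetic in the original question, a minimal sketch follows. It assumes you already have an achieved Gflop/s figure (for instance derived from the accounting at the end of md.log) and that you supply the hardware numbers yourself. The function name is hypothetical, and the flops-per-cycle value depends on the CPU and on the precision used, so treat the inputs below as placeholders rather than measured values.

```python
def fraction_of_peak(achieved_gflops, cores, clock_ghz, flops_per_cycle_per_core):
    """Return achieved / theoretical-peak flop rate as a fraction.

    Theoretical peak (Gflop/s) = cores * clock (GHz) * flops per cycle per core.
    flops_per_cycle_per_core is hardware- and precision-dependent and must
    be supplied by the caller; it is not derived here.
    """
    peak_gflops = cores * clock_ghz * flops_per_cycle_per_core
    return achieved_gflops / peak_gflops


# Placeholder inputs, not measurements: 16 cores at 2.6 GHz, assuming
# 16 single-precision flops per cycle per core, and 200 Gflop/s achieved.
frac = fraction_of_peak(
    achieved_gflops=200.0,
    cores=16,
    clock_ghz=2.6,
    flops_per_cycle_per_core=16,
)
print(f"{100 * frac:.1f}% of theoretical peak")
```

As the thread points out, this number shifts with the algorithm and settings, so it is best read alongside ns/day rather than on its own.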