[gmx-developers] FLOP accounting in Gromacs 4.6

Jeff Hammond jeff.science at gmail.com
Mon Sep 16 15:45:19 CEST 2013

+1 to Berk's comment.

The fact that running N-body with an O(N^2) algorithm is the best way to
hit peak flop/s immediately suggests that flop/s is the wrong metric.

The best (portable) performance metric I've seen for an MD code is
particle updates per second per core. That's what we use when
analyzing LAMMPS on BGQ vs x86, etc.
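
As a rough sketch of how that metric could be computed from a finished
run (the function and parameter names below are illustrative assumptions,
not taken from LAMMPS or Gromacs, and the example numbers are made up):

```python
def particle_updates_per_second_per_core(n_particles, n_steps,
                                         wall_seconds, n_cores):
    """Hypothetical helper: particle updates per second per core.

    One "particle update" is one particle advanced by one MD time step,
    so a run performs n_particles * n_steps updates in total.
    """
    if wall_seconds <= 0 or n_cores <= 0:
        raise ValueError("wall_seconds and n_cores must be positive")
    total_updates = n_particles * n_steps
    return total_updates / (wall_seconds * n_cores)


# Made-up example: 100,000 particles, 10,000 steps, 250 s on 16 cores.
rate = particle_updates_per_second_per_core(100_000, 10_000, 250.0, 16)
print(f"{rate:.0f} particle updates/s/core")
```

Because the number is normalized by particle count, step count and core
count, it can be compared across machines and codes without any
assumption about how many flops each algorithm spends per update.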

There are essentially no flops in comm, by the way (reductions are the
exception); there one often uses just total time as the figure of merit.


On Sep 16, 2013, at 3:38 AM, Berk Hess <hess at kth.se> wrote:

> Hi,
> Every time we get such a question, our return question is: why are you asking?
> Any application should only care about application performance and not about any other measure, such as flops.
> The flop rate will depend very much on the algorithm, as well as on the hardware.
> On Sandy/Ivy Bridge, the Verlet PME kernels reach around 50% of peak, even more with GMX_NBNXN_SIMD_4XN set:
> http://www.sciencedirect.com/science/article/pii/S0010465513001975
> The FFT probably also gets around 50% of peak. But all other code is far less flop intensive. The total I get is around 30% of peak.
> But you can crank this up by shifting work from PME mesh to pair interactions, using GMX_NBNXN_SIMD_4XN, etc.
> RF will get lower flop rates, but higher ns/day, etc.
> So flop numbers are meaningless for most purposes.
> I think there are only two useful cases: analyzing algorithm performance (but combined with other measures) and convincing people when they can't be convinced that flops are a useless measure. In the latter case we should make sure to maximize the flops by optimizing the settings for that purpose.
> But I think the flop count is still reasonably accurate (+-10%). Flops in communication should be negligible.
> Cheers,
> Berk
> On 09/16/2013 10:10 AM, Carsten Kutzner wrote:
>> Hi,
>> can I use the Mega-Flops accounting at the end of the md.log file to
>> calculate how much of the theoretical peak performance of a processor
>> Gromacs is using? I understand that Flops used in communication are
>> not counted, so the accounting will give me a lower estimate.
>> At what percentage of the theoretical peak performance will Gromacs 4.6
>> typically run using the Verlet kernels and PME (let's say we have a
>> big MD system)?
>> Do I have to divide the reported Flops by two when running single precision?
>> Thanks,
>>   Carsten