[gmx-developers] automated performance testing

Szilárd Páll pall.szilard at gmail.com
Fri Oct 17 01:27:49 CEST 2014


There have been attempts at putting together a benchmark suite; the
first one, quite a few years ago, resulted in the gmxbench.sh script
and a few input systems. More additions were made a couple of
months ago; everything is here: git.gromacs.org/benchmarks.git

On Tue, Sep 30, 2014 at 6:22 PM, Mark Abraham <mark.j.abraham at gmail.com> wrote:
> Hi,
> Cherry-picking Michael's email into its own thread:
>> Is there a plan (long term) to do (essentially) automated performance
>> tests so that we
>> can perform consistent(ish) checks for new changes in code, then post the
>> results in an
>> easy(ish) way to interpret for others?
> There's no organized plan.  I've lately been trying to organize a dedicated
> machine here so we can start to do some of this - had we had it and the
> right kinds of tests then various bugs would not have gone unnoticed.

While a machine is useful, I think the what and how of the benchmarks
need to be clarified first; given the difficulties in concretizing the
benchmark setup in the past, these aspects require attention sooner
rather than later, I think. Some of the questions that previous
attempts brought up:
- facilitate comparing results (to other codes or to older versions of
GROMACS) while avoiding the pitfall of using only "lowest" common
denominator features/algorithms (like JACC or the STFC benchmarks);
- test algorithms or functionality: some may be interested in
algorithmic performance, while others want to know how fast one can
compute X.

> In
> principle, people could run that test suite on their own hardware, of
> course.

I think it would be best if many people ran the suite on many different kinds of hardware.

However, I think reproducibility is quite tricky. It can only be
ensured if we get a couple of identical machines, set them up with
identical software, and avoid most upgrades (e.g. kernel, libc[++],
etc.) that can affect performance, keeping one machine as a backup so
that all the reference runs need not be redone when hardware breaks.
Even this will only let us observe performance on one particular
piece of hardware, with the set of algorithms, parallelization
schemes, and optimizations actually exercised there.

> One option I've been toying with lately is dumping the mdrun performance
> data matrix to XML (or something) so that some existing plotting machinery
> can show the trend over time (and also observe a per-commit delta large
> enough to vote -1 on Jenkins).

Isn't per-commit testing overkill, and an ill-scaling setup? I think
it's better to do less frequent (e.g. weekly) and per-request
performance regression tests. Proper testing requires running dozens
of combinations of input+launch configurations on several platforms
anyway. It's not a fun task; I know, because I've been doing quite
extensive performance testing semi-manually.
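To make the "per-commit delta large enough to vote -1" idea concrete: if each benchmark run dumped its performance numbers to a machine-readable file, flagging regressions against a stored baseline is a small script. The sketch below is purely illustrative; the JSON layout, file names, and the 5% tolerance are assumptions, not an existing GROMACS or Jenkins format.

```python
# Hypothetical sketch: compare per-configuration mdrun performance
# (e.g. ns/day) against a stored baseline and flag regressions that
# exceed a tolerance. The JSON layout is an assumption for illustration.
import json

TOLERANCE = 0.05  # flag runs more than 5% slower than the baseline


def find_regressions(baseline_path, current_path, tolerance=TOLERANCE):
    """Return (config, baseline_perf, current_perf) for slowed-down runs.

    Each file is assumed to map a configuration name to a performance
    number where higher is better, e.g. {"rnase_pme_8ranks": 112.3}.
    """
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(current_path) as f:
        current = json.load(f)
    regressions = []
    for config, base_perf in baseline.items():
        perf = current.get(config)
        if perf is not None and perf < base_perf * (1.0 - tolerance):
            regressions.append((config, base_perf, perf))
    return regressions
```

A weekly or per-request job could run this over the benchmark matrix and post a summary (or a -1 vote) only when the list is non-empty, which sidesteps the cost of per-commit runs.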


> I also mean to have a poke around with
> http://www.phoromatic.com/ to see if maybe it already has infrastructure we
> could use.
> Mark
