[gmx-developers] Test Suite, CTest, CDash

Wed Aug 25 16:44:03 CEST 2010

>
>
> ----- Original Message -----
> From: "Esztermann, Ansgar" <Ansgar.Esztermann at mpi-bpc.mpg.de>
> Date: Wednesday, August 25, 2010 22:49
> Subject: [gmx-developers] Test Suite, CTest, CDash
> To: Discussion list for GROMACS development <gmx-developers at gromacs.org>
>
>> Hello everyone,
>>
>>
>> a few years ago, I was briefly involved in the gcc
>> project. I was extremely impressed by the huge test suite and by
>> the way it could be used to quickly detect regressions. Putting
>> aside for the moment the fact that "correct behaviour" is much
>> easier to define for a compiler than for a simulation engine, I
>> think that gromacs would profit from a similar test suite.
>
> Agreed. There are huge combinatorial and chaotic problems, though. I'd
> expect there must be some literature on dealing with the former.
>
> Testing an MD implementation might use a non-trivial amount of computing
> resources. I've seen more than a few bugs that required a combination of
> three or more moderately unusual conditions to occur. Those are the ones
> that regression suites are really useful for finding. However, how do you
> store the reference results? If you don't store them, you have to
> recompute them, which requires having the reference version of the code
> compiled, and some compute time.
>
>> So I've taken Mark Abraham's Regression Tests and started to
>
> Actually, I think David van der Spoel wrote them originally. I just did
> some updates a year or more ago. Rossen has done some recent things - see
> threads on gmx-developers.
>
>> convert them to CTest. The simple and complex tests are already
>> in my local git repository (although some of them are still
>> failing). There is CDash support as well:
>> http://my.cdash.org/index.php?project=Gromacs
>> I'm planning to extend and maintain these tests. Offhand, these
>> points strike me as important:
>>
>> -Integrate more of Mark's tests (i.e. kernel tests, double
>> precision tests)
>> -Fix the tests that are failing (or the software, if the failure
>> is genuine)
>
> The failing tests mostly fail on the virial calculation, because that's
> sensitive to just about everything else. There ought to be a tolerance
> within which the calculation is acceptably accurate, however. How do we
> work out what that is?
>

The same fundamental problem is also present for energies.
MD is chaotic, so trajectories diverge exponentially.

I have recently fixed gmxcheck for 4.5 to check off-diagonal
tensor elements relative to the diagonal. This fixes most
virial issues, which were caused by off-diagonal terms that
fluctuate around zero.
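
To illustrate the idea, here is a minimal sketch in Python. It is not the
actual gmxcheck code. The function name tensor_matches, the choice of the
mean absolute diagonal as the scale, and the default tolerance are all
assumptions made for the example.

```python
def tensor_matches(ref, test, rel_tol=1e-3):
    """Compare two 3x3 tensors given as nested lists.

    Assumption for this sketch: diagonal elements are compared relative
    to their own magnitude, while off-diagonal elements are compared
    relative to the mean absolute diagonal of the reference tensor.
    """
    diag_scale = sum(abs(ref[i][i]) for i in range(3)) / 3.0
    for i in range(3):
        for j in range(3):
            diff = abs(ref[i][j] - test[i][j])
            # Off-diagonal terms fluctuate around zero, so their own
            # magnitude is a poor yardstick; use the diagonal instead.
            scale = abs(ref[i][j]) if i == j else diag_scale
            if diff > rel_tol * scale:
                return False
    return True
```

With this kind of check, an off-diagonal element that changes sign between
two runs no longer triggers a failure, as long as the change is small
compared to the diagonal.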

I was thinking of a more general fix for checks.
Relative error checks on energy terms are nonsense,
since the allowed error is not necessarily related to the magnitude
of the term.
A possible fix would be to compare the energy and virial terms
against a fraction of the kinetic energy. This would take care
of both the system size and temperature dependencies of the deviations.
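
A rough sketch of what such a check could look like, again in Python and
not taken from any existing tool. The function name energies_match, the
dictionary keys and the value of the fraction are hypothetical; working out
a sensible fraction is exactly the open question.

```python
def energies_match(ref, test, ekin_key="ekin", fraction=1e-4):
    """Compare energy terms using one absolute tolerance for all terms.

    The tolerance is a fraction of the reference kinetic energy, so it
    scales with system size and temperature rather than with the
    magnitude of each individual term.
    Returns a dict mapping each term name to True (within tolerance)
    or False.
    """
    tol = fraction * abs(ref[ekin_key])
    return {name: abs(ref[name] - test[name]) <= tol for name in ref}
```

A small term such as a dihedral energy close to zero then gets the same
allowed deviation as a large Coulomb term, instead of an unrealistically
tight one.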

Berk

>> -Add more tests (unit tests, "bugzilla" tests triggering known bugs)
>> -Add coverage support
>
> One of my more frustrating experiences was a bug provoked when I did an
> "mdrun -rerun" on a 3.3.3 trajectory under 4.0.x. Trajectories written
> under each version have different properties, including whether broken
> molecules might be written, and the consequences of one such difference
> drove me up the wall for weeks. (Part of the problem was my being ignorant
> of the possibility that trajectory properties might have changed, of
> course!)
>
> Now, suppose there'd been a regression suite during the 3.3.3 - 4.0
> transition that was capable of flagging to the developers that, under some
> conditions, a 3.3.3 trajectory would rerun differently under 4.0. We'd
> have to have gotten lucky about the events in the 3.3.3 trajectory, and in
> the 4.0 rerun. Even so, there'd be no good solution, because the
> trajectory formats don't embed any version number. So we'd either have to
> have implemented after-the-fact a magic number scheme (e.g. as used for
> testing endian-ness) that changed with version number of the code that
> produced it, or made 4.0 print a warning against rerunning 3.3.3
> trajectories. Neither of those solutions is all that palatable. (Of
> course, prevention is better than cure - a file format that encodes a
> version number of the program writing it is better than one that doesn't,
> but that's not obvious when you go to design the format!)
>
> So I'm going to sound a bit negative here, but I'm skeptical that it's
> possible to do a sound job of a MD regression suite. Who's got a computer
> science theoretician looking for a project? :-)
>
> Mark
>
> --
> gmx-developers mailing list
> gmx-developers at gromacs.org
> http://lists.gromacs.org/mailman/listinfo/gmx-developers
> Please don't post (un)subscribe requests to the list. Use the
> www interface or send it to gmx-developers-request at gromacs.org.