[gmx-developers] Test Suite, CTest, CDash

Shirts, Michael (mrs5pt) mrs5pt at eservices.virginia.edu
Thu Aug 26 15:31:52 CEST 2010

> So I'm going to sound a bit negative here, but I'm skeptical that it's
> possible to do a sound job of a MD regression suite. Who's got a computer
> science theoretician looking for a project? :-)

As others have pointed out, a somewhat sound job is still far better
than the absence of a job.

A few thoughts: 

1) As algorithms and implementations change, a single reference version
doesn't necessarily make much sense.  For example, if a change in how PME
is calculated makes it a bit more accurate, all of a sudden it becomes
impossible to compare anything with electrostatics to the reference
version.

What I would argue is that, instead of comparing to a fixed reference, any
change in code that affects the output relative to the previous commit
needs to be justified and noted, and then either resolved as a bug fix or
recorded as being the intended consequence of the change.
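
A minimal sketch of that bookkeeping, in Python.  Everything here is an
assumption made for illustration: the function name, the idea of storing
one output digest per test, and the "acknowledged" set of tests whose
change was justified in the commit notes.  It is not an existing GROMACS
tool.

```python
def classify_changes(previous, current, acknowledged):
    """Sort tests by how their output changed since the previous commit.

    previous, current: dict mapping test name -> output digest (hypothetical format)
    acknowledged: set of test names whose change was justified and noted
    """
    report = {"unchanged": [], "intended": [], "unexplained": [], "new": []}
    for name in sorted(current):
        if name not in previous:
            report["new"].append(name)          # no earlier output to compare against
        elif previous[name] == current[name]:
            report["unchanged"].append(name)
        elif name in acknowledged:
            report["intended"].append(name)     # changed, and recorded as intended
        else:
            report["unexplained"].append(name)  # changed with no justification: needs a look
    return report
```

Anything that lands in "unexplained" is what the nightly report would
need to flag.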

2) To make this happen, there really needs to be a suite that runs every
night, over as many conditions as can be checked in that amount of time, and
posts a report in an easily accessible place that gets checked every time.
Emailing the report to the gmx-developers list might get a bit
overwhelming, but someone should be checking every night, and
complaining to people if there are issues that they are not resolving.  One
solution would perhaps be to automatically email only the people who
committed code during that day?
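
As a rough sketch of how the recipient list could be built, assuming the
source lives in a git repository and Python 3.7 or later is available:
`git log --since=... --format=%ae` prints one author email per commit,
and the helper below removes duplicates.  The function names are
hypothetical, and actually sending the mail is left out.

```python
import subprocess

def committer_emails(log_text):
    """Return unique author emails, in first-seen order, from one-email-per-line text."""
    seen = []
    for line in log_text.splitlines():
        email = line.strip()
        if email and email not in seen:
            seen.append(email)
    return seen

def todays_committers(repo_path):
    """Ask git for the author email of every commit made in the last day."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--since=1 day ago", "--format=%ae"],
        capture_output=True, text=True, check=True,
    )
    return committer_emails(result.stdout)
```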

3) In most cases, for these sorts of delta checks, ultra-short runs are
sufficient.  Indeed, probably nstlist steps are needed at most (so that at
least one round of neighbor-list searching is included), if we are looking
at exact binary matches.  If a difference doesn't show up by then, it is
unlikely to show up later.  There will be some binary differences that are
due only to floating-point roundoff, but these can probably be identified
and checked easily.
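
One way to separate exact matches from roundoff-level differences is to
count how many representable floating-point values lie between two
numbers (the "ULP distance").  The sketch below assumes the outputs are
finite double-precision values; the threshold of 4 ULPs and the function
names are arbitrary choices for illustration, and single-precision
output would need the 32-bit equivalents.

```python
import struct

def ulp_distance(a, b):
    """Number of representable doubles between a and b (finite values only)."""
    def ordered(x):
        # Reinterpret the double's bits as a signed 64-bit integer, then map
        # negative floats so that integer order matches floating-point order.
        (i,) = struct.unpack("<q", struct.pack("<d", x))
        return i if i >= 0 else -(i & 0x7FFFFFFFFFFFFFFF)
    return abs(ordered(a) - ordered(b))

def classify_difference(ref, new, max_ulps=4):
    """Return 'identical', 'roundoff', or 'different' for two output sequences."""
    if len(ref) != len(new):
        return "different"
    worst = max((ulp_distance(r, n) for r, n in zip(ref, new)), default=0)
    if worst == 0:
        return "identical"
    return "roundoff" if worst <= max_ulps else "different"
```

Note that a distance of zero treats 0.0 and -0.0 as identical, which may
or may not be what a strict binary comparison wants.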

4) Starting a list of options that should be checked would be a good thing
to put on the wiki.  I don't have time to manage a page like this now, but
would commit to helping update it.

5) The other thing that matters, in the end, is whether the physics is
correct.  So fundamentally, there should be tests that verify this.  For
example, for reversible integrators, the RMSD error should be proportional
to the square of dt as dt is decreased.  For symplectic integrators (both md
and md-vv without coupling), there should be no drift over time at all (side
note: we should eventually try to base the implementation on symplectic
integrators; right now, only the uncoupled case is symplectic).  These sorts
of validations don't need to be run as often, but they should be run, and
the code should really emphasize methods that have good properties.  I have
a test suite that looks at this, and would be willing to help integrate it.
This is something that might not quite work as an overnight suite, though.
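
To illustrate the dt-squared check on a toy problem (this is not the
test suite mentioned above), here is a sketch that runs velocity Verlet
on a unit harmonic oscillator and estimates the exponent p in
RMSD ~ dt**p from runs at dt and dt/2.  It assumes that "RMSD error"
means the fluctuation of the total energy about its mean; the system,
run length, and function names are all choices made for the example.

```python
import math

def energy_rmsd(dt, total_time=100.0):
    """RMS deviation of total energy for velocity Verlet, unit mass and spring constant."""
    x, v = 1.0, 0.0
    n_steps = int(round(total_time / dt))
    energies = []
    for _ in range(n_steps):
        v_half = v - 0.5 * dt * x      # half kick; force is -x
        x = x + dt * v_half            # drift
        v = v_half - 0.5 * dt * x      # second half kick with the new force
        energies.append(0.5 * v * v + 0.5 * x * x)
    mean = sum(energies) / len(energies)
    return math.sqrt(sum((e - mean) ** 2 for e in energies) / len(energies))

def observed_order(dt):
    """Estimate p in RMSD ~ dt**p by halving the time step."""
    return math.log2(energy_rmsd(dt) / energy_rmsd(dt / 2))
```

For a second-order integrator such as this one, observed_order(0.05)
should come out close to 2; a value well away from 2 in the analogous
test on a real system would be a sign that something is off.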

6) I think one serious discussion that needs to be held is about pruning
functionality: cruft is the bane of all long-term software projects, as it
makes everything gradually more and more difficult to implement.  Features
that are easy to maintain (or could be made easy to maintain with a bit of
re-engineering) are not a problem; the problem is features that make it
relatively difficult to add other features to the code.  The more cruft,
the harder it is to write bug-free code.

There should likely be a developers' wiki page listing features nominated
for removal, which would allow these sorts of discussions to take place.
Things shouldn't be axed without warning.  In many cases, recoding a
feature to fit more cleanly into the architecture might avoid the need to
eliminate it.

Michael Shirts
Assistant Professor
Department of Chemical Engineering
University of Virginia
michael.shirts at virginia.edu
