[gmx-developers] New Test Set

Sat Feb 11 08:35:08 CET 2012

On 2012-02-10 20:17, Roland Schulz wrote:
>
>
> On Sun, Feb 5, 2012 at 3:53 PM, Shirts, Michael (mrs5pt)
> <mrs5pt at eservices.virginia.edu> wrote:
>
>     Hi, all-
>
>     My opinion:
>
>     I think there should probably be two classes of sets: fast, fully
>     automated sets, and more sophisticated content-validation sets.
>
>     For the fast, fully automated test, I would suggest:
>     - testing a large range of input .mdps, tops and gros for whether
>       they run through grompp and mdrun. Not checking whether the output
>       is correct or not, because that is very hard to automate; just
>       testing whether it runs for, say, 20-100 steps or so.
>
>
> Yes, having that set of inputs is needed. Should we start a wiki page to
> list all the inputs we want to include? Or what would be the best way to
> collaboratively create this set of inputs?
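A minimal Python sketch of what a single case in the fast, fully automated tier could look like. It assumes a hypothetical layout in which every test case is a directory holding grompp.mdp, conf.gro and topol.top, and the helper names (shorten_mdp, smoke_test) are illustrative only. It rewrites nsteps to a short run and reports only whether grompp and mdrun both exit cleanly, without judging the output. The command runner is injectable so the logic can be exercised without GROMACS installed.

```python
"""Hypothetical smoke test: does one input case survive grompp plus a short mdrun?"""
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# A runner takes a command line and a working directory and returns an exit status.
Runner = Callable[[Sequence[str], Path], int]


def run_command(cmd: Sequence[str], cwd: Path) -> int:
    """Run one external command, discard its output, return its exit status."""
    try:
        return subprocess.run(list(cmd), cwd=cwd,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL).returncode
    except FileNotFoundError:
        return 127  # executable not found, treated as a failure


def shorten_mdp(text: str, nsteps: int) -> str:
    """Drop any existing nsteps line (';' starts a comment) and append a short one."""
    kept = []
    for line in text.splitlines():
        key = line.split(";", 1)[0].split("=", 1)[0].strip().lower()
        if key != "nsteps":
            kept.append(line)
    kept.append(f"nsteps = {nsteps}")
    return "\n".join(kept) + "\n"


def smoke_test(case_dir: Path, nsteps: int = 50,
               runner: Runner = run_command) -> bool:
    """True if grompp and then mdrun both exit with status 0 for this case."""
    original = (case_dir / "grompp.mdp").read_text()
    (case_dir / "short.mdp").write_text(shorten_mdp(original, nsteps))
    steps = [
        ["grompp", "-f", "short.mdp", "-c", "conf.gro",
         "-p", "topol.top", "-o", "short.tpr"],
        ["mdrun", "-s", "short.tpr", "-deffnm", "short"],
    ]
    # all() stops at the first failing step, so mdrun is skipped if grompp fails.
    return all(runner(cmd, case_dir) == 0 for cmd in steps)
```

Because the check is only "did it run", a case that grompp rejects fails immediately and never reaches mdrun.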

A while ago I talked about this with Nicu, who works with Siewert Jan
Marrink. He is a software engineer by training and suggested the
following. For each important parameter you take the extreme values
(e.g. a timestep of 5 fs and of 0.001 fs) and a random value in between.
With three values per parameter there would then be 3^N different
parameter combinations for N parameters, which is probably way too many
combinations, even if N were only 20. Therefore you pick a subset of,
say, 200 or 1000 out of these 3^N possible tests, and this becomes the
test set. With such a set-up it is quite easy to see that we would at
least test the extreme values, which is where things can go wrong. A few
of these tests would actually be rejected by grompp too, but in all
likelihood not nearly enough of them.
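The sampling scheme Nicu suggested can be sketched in Python as follows, assuming each parameter is given as a numeric (low, high) range; the function names and the dictionary layout are hypothetical. Each of the 3^N combinations is identified by an integer whose base-3 digits select the low, middle or high value of each parameter, so a random subset can be drawn without ever enumerating all combinations.

```python
import random
from typing import Dict, List, Tuple


def levels(low: float, high: float, rng: random.Random) -> Tuple[float, float, float]:
    """Three levels per parameter: both extremes plus one random value in between."""
    return (low, rng.uniform(low, high), high)


def sample_test_set(ranges: Dict[str, Tuple[float, float]], k: int,
                    seed: int = 0) -> List[Dict[str, float]]:
    """Draw k distinct combinations out of the 3^N possible ones."""
    rng = random.Random(seed)          # fixed seed keeps the test set reproducible
    names = sorted(ranges)
    table = {name: levels(*ranges[name], rng) for name in names}
    total = 3 ** len(names)
    k = min(k, total)
    # Sampling indices from range() avoids materializing all 3^N combinations.
    picks = rng.sample(range(total), k)
    tests = []
    for index in picks:
        combo = {}
        for name in names:
            index, digit = divmod(index, 3)   # next base-3 digit picks low/mid/high
            combo[name] = table[name][digit]
        tests.append(combo)
    return tests
```

For N = 20 there are 3^20 = 3,486,784,401 combinations, yet drawing 200 of them this way is cheap, and the same seed always yields the same test set.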

When Nicu & I discussed this, we even considered publishing it, since I
am not aware of any other scientific code that has such rigorous testing
tools.

>
>
>     Longer term, we should look more at validating code at a physical
>     level. Clearly, testing energy conservation is a good idea for
>     integrators; it's fairly sensitive. I think we need to discuss a bit
>     more how to evaluate energy conservation. This can actually take a
>     fair amount of time, and I'm thinking this is something that perhaps
>     should wait for 5.0. For thermostats and barostats, I'm working on a
>     very sensitive test of ensemble validity. I'll email a copy to the
>     list when it's ready to go (1-2 weeks?), and this is something that
>     can also be incorporated into an integrated testing regime, but
>     again, these sorts of tests will take multiple hours, not seconds.
>     That level of testing can't be part of the day-to-day build.
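One possible way to put a single number on energy conservation, offered only as a sketch for that discussion, is the least-squares slope of the total energy against time. The function name is hypothetical, and how the time series is extracted from the energy file is left open.

```python
from typing import Sequence


def energy_drift(times: Sequence[float], energies: Sequence[float]) -> float:
    """Least-squares slope of total energy versus time (energy units per time unit)."""
    n = len(times)
    if n != len(energies) or n < 2:
        raise ValueError("need at least two (time, energy) pairs of equal length")
    mean_t = sum(times) / n
    mean_e = sum(energies) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("times must not all be equal")
    sxy = sum((t - mean_t) * (e - mean_e) for t, e in zip(times, energies))
    return sxy / sxx
```

In practice the slope would probably be normalized per atom or per degree of freedom so that values from different systems can be compared.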
>
> Well, even if the tests take ~2000 CPU-hours, I think we (maybe even Erik
> by himself) have the resources to run them weekly.
>
>
>
>      > - What are the requirements for the new test set? E.g., how easy
>      > should it be to see what's wrong when a test fails?
>
>     For the first set of tests, I can imagine that it would be nice to be
>     able to look at the outputs of the tests, and diff outputs
>     corresponding to different code versions to help track down where
>     changes occurred. But I'm wary of having these tests be evaluated
>     automatically at first. I agree such differences should EVENTUALLY
>     be automated, but I'd prefer several months of investigation and
>     discussion before deciding exactly what "correct" is.
>
>
> I think a wrong reference value is better than no reference value. Even
> a wrong reference value would allow us to detect if, e.g., different
> compilers give significantly different results (maybe some give the
> correct value). It would also help us avoid introducing new bugs. Of
> course, we shouldn't release the test set to the outside before we are
> relatively sure that it is actually correct.
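A stored reference value, even an imperfect one, makes a tolerance-based comparison possible. The Python sketch below assumes results have been reduced to a dictionary of named energy terms; the function name, the term names and the default tolerances are placeholders, not agreed values.

```python
import math
from typing import Dict, List


def compare_to_reference(result: Dict[str, float], reference: Dict[str, float],
                         rel_tol: float = 1e-4, abs_tol: float = 1e-6) -> List[str]:
    """Return human-readable mismatches; an empty list means the run matches."""
    problems = []
    for key in sorted(reference):
        if key not in result:
            problems.append(f"{key}: missing from result")
            continue
        ref, got = reference[key], result[key]
        # math.isclose passes if |got - ref| <= max(rel_tol * max(|got|, |ref|), abs_tol)
        if not math.isclose(got, ref, rel_tol=rel_tol, abs_tol=abs_tol):
            problems.append(f"{key}: got {got!r}, reference {ref!r}")
    return problems
```

Reporting every mismatching term, rather than stopping at the first, also bears on how easy it is to see what is wrong when a test fails.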
>
>      > Should the test support being run
>      > under valgrind? Other?
>
>     Valgrind is incredibly slow and can fail for weird reasons; I'm not
>     sure it would add much to run the tests under valgrind.
>
> I have the current C++ tests (those written by Teemu) running under
> valgrind in Jenkins. It wasn't very hard to write a few suppression
> rules to keep valgrind from reporting false positives. Now Jenkins
> can automatically fail the build if the code has any memory errors.
> Obviously one wouldn't run any of the long-running tests under valgrind.
> But for the unit tests I think it might be very useful for catching bugs.
>
>     I DON'T think we should have any test set that starts to look at more
>     complicated features right now: it will take months to get that
>     working, and we need to get 4.6 out the door on the order of weeks,
>     so we can move on to the next improvements. 4.6 doesn't have to be
>     perfectly flawless, as long as it's closer to perfect than 4.5.
>
>
> My reason for delaying the 4.6 release would not be to improve the 4.6
> release. I agree with you that we probably can't guarantee that the
> reference values are correct in time anyhow, so we probably wouldn't even
> want to ship the tests with 4.6. My worry is that as soon as 4.6 is out,
> the focus will be on adding cool new features instead of working on these
> boring tasks that we should do because they help us in the long run.
> E.g., if we had agreed not to have a 4.6 release, the C++ conversion
> would most likely be much further along. And I don't see how we can
> create an incentive mechanism to work on these issues without somehow
> coupling it to releases.
>
> Roland
>
>
>     Best,
>     ~~~~~~~~~~~~
>     Michael Shirts
>     Assistant Professor
>     Department of Chemical Engineering
>     University of Virginia
>     michael.shirts at virginia.edu
>     (434)-243-1821
>
> --
> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
> 865-241-1537, ORNL PO BOX 2008 MS6309

-- 
David van der Spoel, Ph.D., Professor of Biology
Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone:	+46184714205.
spoel at xray.bmc.uu.se    http://folding.bmc.uu.se