[gmx-developers] New Test Set
David van der Spoel
spoel at xray.bmc.uu.se
Sat Feb 11 08:35:08 CET 2012
On 2012-02-10 20:17, Roland Schulz wrote:
> On Sun, Feb 5, 2012 at 3:53 PM, Shirts, Michael (mrs5pt)
> <mrs5pt at eservices.virginia.edu> wrote:
> Hi, all-
> My opinion:
> I think there should probably be two classes of sets -- fast fully
> automated sets, and more sophisticated content validation sets.
> For the fast fully automated test, I would suggest:
> -testing a large range of input .mdps, tops and gros for whether they
> run through grompp and mdrun. Not testing whether the output is
> correct or not, because that is very hard to automate -- just testing
> whether it runs for, say, 20-100 steps or so.
> Yes, having that set of inputs is needed. Should we start a wiki page to
> start listing all the inputs we want to include? Or what would be the
> best way to collaboratively create this set of inputs?
A while ago I talked about this with Nicu who works with Siewert Jan
Marrink. He is a software engineer by training and suggested the
following. For each input parameter you take extreme values (e.g.
timestep 5 fs and 0.001 fs) and a random value in between. Then there
would be 3^N different parameter combinations for N parameters, which
is probably way too many combinations, even if N were only 20.
Therefore you now pick a subset of, say, 200 or 1000 out of these 3^N
possible tests, and this becomes the test set. With such a set-up it is
quite easy to see that we'd test at least the extreme values, which are
probably where things can go wrong. A few of these tests would actually
be prohibited by grompp too, but in all likelihood not nearly enough.
At the time when Nicu & I discussed this we even considered publishing
this, since I am not aware of another scientific code that has such
rigorous testing tools.
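
The sampling scheme described above could be sketched roughly as follows.
This is only an illustration: the parameter names are real .mdp options,
but the ranges, the function names and the fixed seed are assumptions made
for the example, not an agreed test design. Each parameter gets three
candidate values (low extreme, random value in between, high extreme), so
the full grid has 3**N entries, and a random subset of distinct
combinations is drawn from it.

```python
import random

# Hypothetical (low, high) extremes per parameter; values are illustrative only.
PARAM_RANGES = {
    "dt": (0.000001, 0.005),   # ps, i.e. 0.001 fs to 5 fs
    "rcoulomb": (0.5, 2.0),    # nm
    "ref_t": (1.0, 1000.0),    # K
    "tau_t": (0.01, 10.0),     # ps
}


def candidate_values(low, high, rng):
    """Return the two extremes plus one random value in between."""
    return (low, rng.uniform(low, high), high)


def sample_test_set(ranges, n_tests, seed=0):
    """Pick up to n_tests distinct combinations out of the 3**N grid."""
    rng = random.Random(seed)  # fixed seed so the test set is reproducible
    names = sorted(ranges)
    candidates = {n: candidate_values(*ranges[n], rng) for n in names}
    n_tests = min(n_tests, 3 ** len(names))  # cannot exceed the full grid
    chosen = set()
    while len(chosen) < n_tests:
        # One index (0, 1 or 2) per parameter identifies a grid point.
        chosen.add(tuple(rng.randrange(3) for _ in names))
    return [
        {n: candidates[n][i] for n, i in zip(names, combo)}
        for combo in sorted(chosen)
    ]


if __name__ == "__main__":
    for params in sample_test_set(PARAM_RANGES, 5):
        print(params)
```

Each dictionary would then be written into an .mdp file and passed through
grompp and mdrun. Combinations that grompp rejects could either be dropped
or kept as tests that are expected to fail.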
> Longer term, we should look more at validating code at a physical level.
> Clearly testing energy conservation is a good idea for integrators; it's
> fairly sensitive. I think we need to discuss a bit more about how to
> evaluate energy conservation. This actually can take a fair amount of
> time, and I'm thinking this is something that perhaps should wait for
> 5.0. For thermostats and barostats, I'm working on a very sensitive test
> of validity. I'll email a copy to the list when it's ready to go (1-2
> and this is something that can also be incorporated in an integrated
> regime, but again, these sorts of tests will take multiple hours, not
> seconds. That sort of testing level can't be part of the day-to-day
> build.
> Well, even if the tests take ~2000 CPU-hours, I think we (maybe even Erik
> by himself) have the resources to run this weekly.
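
One possible way to evaluate the energy conservation mentioned above is to
fit a straight line to the total energy as a function of time and compare
the slope per atom against a tolerance. The sketch below is only an
assumption about how such a check might look. The function names and the
tolerance are hypothetical, and how to choose the tolerance is exactly the
open question raised in the thread.

```python
def energy_drift(times, energies):
    """Least-squares slope of total energy against time."""
    n = len(times)
    if n < 2 or n != len(energies):
        raise ValueError("need at least two matching (time, energy) samples")
    t_mean = sum(times) / n
    e_mean = sum(energies) / n
    num = sum((t - t_mean) * (e - e_mean) for t, e in zip(times, energies))
    den = sum((t - t_mean) ** 2 for t in times)
    if den == 0:
        raise ValueError("all time values are identical")
    return num / den


def conserves_energy(times, energies, n_atoms, tolerance):
    """True if the absolute drift per atom stays within the tolerance."""
    return abs(energy_drift(times, energies) / n_atoms) <= tolerance


if __name__ == "__main__":
    # Made-up samples: time in ps, total energy in kJ/mol.
    t = [0.0, 1.0, 2.0, 3.0]
    e = [10.0, 10.5, 11.0, 11.5]
    print(energy_drift(t, e))
```

The time and energy series would come from the energy output of a run. The
units of the tolerance follow from those of the inputs.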
> > - What are the requirements for the new test set? E.g. how easy
> > should it be to see what's wrong when a test fails?
> For the first set of tests, I can imagine that it would be nice to be
> able to look at the outputs of the tests, and diff different outputs
> corresponding to different code versions to help track down changes.
> But I'm suspicious about having the evaluation of these tests decided
> automatically at first. I agree the evaluation of such differences
> should EVENTUALLY be automated, but I'd prefer several months of
> investigation and discussion before deciding exactly what "correct" is.
> I think a wrong reference value is better than no reference value. Even
> a wrong reference value would allow us to detect if e.g. different
> compilers give significantly different results (maybe some give the
> correct value). Also it would help to avoid adding additional bugs. Of
> course we shouldn't release the test set to the outside before we are
> relatively sure that it is actually correct.
> > Should the test support being run
> > under valgrind? Other?
> Valgrind is incredibly slow and can fail for weird reasons -- I'm not
> sure it would add much to do it under valgrind.
> I have the current C++ tests (those written by Teemu) running under
> valgrind in Jenkins. It wasn't very hard to write a few suppression
> rules to make valgrind not report any false positives. Now Jenkins
> can automatically fail the build if the code has any memory errors.
> Obviously one wouldn't run any of the long-running tests with valgrind.
> But for the unit tests I think it might be very useful to catch bugs.
> I DON'T think we should have any test set that starts to look at more
> complicated features right now -- it will take months to get that
> and we need to get 4.6 out of the door on the order of weeks, so we can
> move on to the next improvements. 4.6 doesn't have to be perfectly
> flawless, as long as it's closer to perfect than 4.5.
> My reason for delaying the 4.6 release would not be to improve the 4.6
> release. I agree with you we probably can't guarantee that the reference
> values are correct in time anyhow, so we probably wouldn't even want to
> ship the tests with 4.6. My worry is that as soon as 4.6 is out the
> focus is on adding new cool features instead of working on these boring
> tasks we should do, because they help us in the long run. E.g. if we
> had agreed not to have a 4.6 release, the C++ conversion
> would most likely be much further along. And I don't see how we can
> create an incentive mechanism to work on these issues without somehow
> coupling it to releases.
> Michael Shirts
> Assistant Professor
> Department of Chemical Engineering
> University of Virginia
> michael.shirts at virginia.edu
> (434)-243-1821
> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
> 865-241-1537, ORNL PO BOX 2008 MS6309
David van der Spoel, Ph.D., Professor of Biology
Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone: +46184714205.
spoel at xray.bmc.uu.se http://folding.bmc.uu.se