[gmx-developers] New Test Set

Fri Feb 10 20:17:32 CET 2012

On Sun, Feb 5, 2012 at 3:53 PM, Shirts, Michael (mrs5pt) <
mrs5pt at eservices.virginia.edu> wrote:

> Hi, all-
>
> My opinion:
>
> I think there should probably be two classes of sets -- fast fully
> automated
> sets, and more sophisticated content validation sets.
>
> For the fast fully automated test, I would suggest:
> -testing a large range of input .mdps, tops and gros for whether they run
> through grompp and mdrun. Not checking whether the output is correct or
> not, because that is very hard to automate -- just testing whether it runs
> for, say, 20-100 steps or so.
>

Yes, having that set of inputs is needed. Should we start a wiki page to
list all the inputs we want to include? Or what would be the best way to
create this set of inputs collaboratively?
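
A minimal sketch of what such a fast "does it run" driver could look like,
assuming each test case lives in its own directory with the default file
names grompp.mdp, conf.gro and topol.top. The helper names (build_commands,
run_case, run_all) and the directory layout are hypothetical. The sketch also
assumes an mdrun that accepts -nsteps; if it does not, the step count would
have to be set in the .mdp file instead.

```python
import subprocess
from pathlib import Path


def build_commands(nsteps=50):
    """Return the grompp and mdrun command lines for one test case.

    File names are relative; the commands are meant to be run with the
    test case directory as the working directory.
    """
    grompp = ["grompp", "-f", "grompp.mdp", "-c", "conf.gro",
              "-p", "topol.top", "-o", "topol.tpr"]
    # Assumption: this mdrun supports -nsteps to override the .mdp value.
    mdrun = ["mdrun", "-s", "topol.tpr", "-nsteps", str(nsteps)]
    return [grompp, mdrun]


def run_case(case_dir, nsteps=50, runner=subprocess.run):
    """Return True if grompp and mdrun both exit with status 0."""
    for cmd in build_commands(nsteps):
        result = runner(cmd, cwd=str(case_dir),
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        if result.returncode != 0:
            return False
    return True


def run_all(root, nsteps=50, runner=subprocess.run):
    """Run every subdirectory of root that holds a grompp.mdp.

    Returns a dict mapping the case directory name to True (ran) or
    False (grompp or mdrun failed).
    """
    results = {}
    for mdp in sorted(Path(root).glob("*/grompp.mdp")):
        results[mdp.parent.name] = run_case(mdp.parent, nsteps, runner)
    return results
```

The check is only on the exit status, in line with the suggestion above not
to judge correctness of the output at this level.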

>
> Longer term, we should look more at validating code at a physical level.
> Clearly testing energy conservation is a good idea for integrators; it's
> fairly sensitive.  I think we need to discuss a bit more about how to
> evaluate energy conservation.  This actually can take a fair amount of
> time,
> and I'm thinking this is something that perhaps should wait for 5.0.  For
> thermostats and barostats, I'm working on a very sensitive test of ensemble
> validity. I'll email a copy to the list when it's ready to go (1-2 weeks?),
> and this is something that can also be incorporated in an integrated
> testing
> regime, but again, these sorts of tests will take multiple hours, not
> seconds.   That sort of testing level can't be part of the day to day
> build.
>
Well, even if the tests take ~2000 CPU-hours, I think we (maybe even Erik by
himself) have the resources to run this weekly.
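
As one possible starting point for the energy-conservation discussion (not an
agreed criterion), the drift could be estimated as the least-squares slope of
the total energy over time. The function name energy_drift is hypothetical,
and the time and energy series are assumed to have been extracted from the
run beforehand.

```python
def energy_drift(times, energies):
    """Least-squares slope of energy versus time.

    The result has units of (energy unit) per (time unit). A value close
    to zero suggests good conservation over the sampled window.
    """
    n = len(times)
    if n < 2 or n != len(energies):
        raise ValueError("need two or more points and equal-length series")
    t_mean = sum(times) / n
    e_mean = sum(energies) / n
    num = sum((t - t_mean) * (e - e_mean) for t, e in zip(times, energies))
    den = sum((t - t_mean) ** 2 for t in times)
    if den == 0:
        raise ValueError("all time values are identical")
    return num / den
```

What threshold on the drift should count as a failure is exactly the kind of
question that still needs discussion.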

>
>
> > - What are the requirements for the new test set? E.g. how easy should it
> > be to see what's wrong when a test fails?
>
> For the first set of tests, I can imagine that it would be nice to be able
> to look at the outputs of the tests, and diff different outputs
> corresponding to different code versions to help track down where the
> changes were.
> But I'm suspicious about making the evaluation of these tests decided on
> automatically at first.  I agree such differences should EVENTUALLY be
> automated, but I'd prefer several months of investigation and discussion
> before deciding exactly what "correct" is.
>

I think a wrong reference value is better than no reference value. Even a
wrong reference value would allow us to detect whether, e.g., different
compilers give significantly different results (maybe some give the correct
value). It would also help us avoid introducing additional bugs. Of course we
shouldn't release the test set to the outside before we are relatively sure
that it is actually correct.
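
A rough sketch of such a reference comparison, assuming the energy terms of a
run have already been collected into a dict. The function name
compare_to_reference and the default tolerances are placeholders, not agreed
values.

```python
import math


def compare_to_reference(values, reference, rel_tol=1e-5, abs_tol=1e-8):
    """Return (name, value, expected) for every term outside tolerance.

    A term that is missing from `values` is reported with value None.
    An empty list means the run matches the stored reference.
    """
    failures = []
    for name, expected in sorted(reference.items()):
        value = values.get(name)
        if value is None or not math.isclose(
                value, expected, rel_tol=rel_tol, abs_tol=abs_tol):
            failures.append((name, value, expected))
    return failures
```

Running the same inputs with binaries from different compilers and feeding
each result through this check would flag significant differences, even if
the stored reference itself later turns out to be wrong.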

> > Should the test support being run
> > under valgrind? Other?
>
> Valgrind is incredibly slow and can fail for weird reasons -- I'm not sure
> it would add much to do it under valgrind.
>
I have the current C++ tests (those written by Teemu) running under
valgrind in Jenkins. It wasn't very hard to write a few suppression rules
so that valgrind does not report any false positives. Now Jenkins
can automatically fail the build if the code has any memory errors.
Obviously one wouldn't run any of the long-running tests with valgrind. But
for the unit tests I think it might be very useful for catching bugs.
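
For illustration, a small wrapper along these lines could be what the CI job
calls. This is a sketch, not the actual Jenkins setup; the function names and
any file names passed in are hypothetical. The valgrind options used
(--error-exitcode, --leak-check, --suppressions) are standard ones.

```python
import subprocess


def valgrind_command(test_binary, suppressions, extra_args=()):
    """Build a valgrind command line for one unit-test binary."""
    cmd = [
        "valgrind",
        # Exit with status 1 when valgrind reports errors, so the CI job
        # can fail the build on them.
        "--error-exitcode=1",
        "--leak-check=full",
        # Suppression file that hides known false positives.
        "--suppressions=" + suppressions,
        test_binary,
    ]
    return cmd + list(extra_args)


def run_under_valgrind(test_binary, suppressions, runner=subprocess.call):
    """Return True if the test binary exits with status 0 under valgrind."""
    return runner(valgrind_command(test_binary, suppressions)) == 0
```

Note that the test binary's own non-zero exit status also makes this return
False, so an ordinary test failure and a memory error both fail the build.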

> I DON'T think we should have any test set that starts to look at more
> complicated features right now -- it will take months to get that working,
> and we need to get 4.6 out of the door on the order of weeks, so we can
> move
> on to the next improvements.  4.6 doesn't have to be perfectly flawless, as
> long as it's closer to perfect than 4.5.
>

My reason for delaying the 4.6 release would not be to improve the 4.6
release. I agree with you that we probably can't guarantee that the reference
values are correct in time anyhow, so we probably wouldn't even want to ship
the tests with 4.6. My worry is that as soon as 4.6 is out, the focus will be
on adding cool new features instead of working on the boring tasks we should
do because they help us in the long run. E.g., if we had agreed not to have
a 4.6 release, the C++ conversion would most likely be much further along.
And I don't see how we can create an incentive mechanism to work on these
issues without somehow coupling it to releases.

Roland

>
> Best,
> ~~~~~~~~~~~~
> Michael Shirts
> Assistant Professor
> Department of Chemical Engineering
> University of Virginia
> michael.shirts at virginia.edu
> (434)-243-1821
>

-- 
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309