[gmx-developers] New Test Set

Sat Feb 11 17:27:55 CET 2012

> Yes, having that set of inputs is needed. Should we start a wiki page to
> start listing all the inputs we want to include? Or what would be the best
> way to collaboratively create this set of inputs?

A wiki page is good.  I can commit to spending an hour or so this weekend
discussing each parameter and what issues can come up. I can start putting
together a variety of .mdp and .top files.

To what extent can we start with the older test suite, and modify it?

> Well, even if the tests take ~2000 CPU-hours, I think we (maybe even Erik by
> himself) have the resources to run this weekly.

We can definitely come up with a large number of physical tests that can
come in under that amount of time.

>> For the first set of tests, I can imagine that it would be nice to be able
>> to look at the outputs of the tests, and diff different outputs
>> corresponding to different code versions to help track down where changes were.
>> But I'm suspicious about making the evaluation of these tests decided on
>> automatically at first.  I agree such differences should EVENTUALLY be
>> automated, but I'd prefer several months of investigation and discussion
>> before deciding exactly what "correct" is.
>> 
> 
> I think a wrong reference value is better than no reference value. Even a
> wrong reference value would allow us to detect if e.g. different compilers
> give significantly different results (maybe some give the correct value).
> Also it would help to avoid adding additional bugs. Of course we shouldn't
> release the test set to the outside before we are relatively sure that it is
> actually correct.

I'm just saying that we shouldn't, at first, have automatic failures if the
reference values change.  I very much agree with SAVING the results with
each build so that changes can be tracked down.
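To make the "save and diff, but don't fail" idea concrete, here is a rough
sketch in Python. Everything in it is an assumption for illustration: the
function names, the observable names, the file layout, and the tolerance are
made up and are not part of any existing GROMACS test code.

```python
import json
import math


def save_results(path, build_id, results):
    """Store one build's observables as JSON so later builds can be diffed."""
    with open(path, "w") as handle:
        json.dump({"build": build_id, "results": results}, handle, indent=2)


def diff_against_reference(current, reference, rel_tol=1e-6):
    """List (name, reference_value, current_value) for observables that changed.

    Nothing is raised here: the caller decides what to do with the list,
    so a changed value is reported rather than treated as a failure.
    """
    changes = []
    for name in sorted(set(current) | set(reference)):
        ref = reference.get(name)
        cur = current.get(name)
        if ref is None or cur is None:
            # Observable added or removed between the two builds.
            changes.append((name, ref, cur))
        elif not math.isclose(ref, cur, rel_tol=rel_tol):
            changes.append((name, ref, cur))
    return changes
```

A build script could call diff_against_reference() on the previous build's
saved values and just print the list, leaving any pass/fail decision for
later, once we agree on what "correct" is.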

> I have the current C++ tests (those written by Teemu) running under
> valgrind in Jenkins. It wasn't very hard to write a few suppression rules
> to make valgrind not report any false positives. Now Jenkins
> can automatically fail the build if the code has any memory errors.
> Obviously one wouldn't run any of the long running tests with valgrind. But
> for the unit tests I think it might be very useful to catch bugs.

In that case, running some subset of the cases under valgrind certainly makes
sense.
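
As a sketch of what running a subset under valgrind could look like from a
script, something like the following might do. The binary name and the
suppression file name in the usage note are placeholders, and this is not
the actual Jenkins setup described above, which I haven't seen.

```python
import subprocess


def valgrind_command(test_binary, suppressions=None):
    """Build the argument list for running one test binary under valgrind."""
    # --error-exitcode makes valgrind exit with 1 when it detects errors;
    # otherwise it passes through the exit code of the program being run.
    cmd = ["valgrind", "--error-exitcode=1", "--leak-check=full"]
    if suppressions:
        # Suppression file that silences known false positives.
        cmd.append(f"--suppressions={suppressions}")
    cmd.append(test_binary)
    return cmd


def run_under_valgrind(test_binary, suppressions=None):
    """Return True if the run exited with status 0 (requires valgrind installed)."""
    completed = subprocess.run(valgrind_command(test_binary, suppressions))
    return completed.returncode == 0
```

For example, run_under_valgrind("./unit_test", "gmx.supp") would only be
called for the short unit tests, not the long physical ones.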

> My reason for delaying the 4.6 release would not be to improve the 4.6
> release. I agree with you we probably can't guarantee that the reference
> values are correct in time anyhow, so we probably wouldn't even want to ship
> the tests with 4.6. My worry is that as soon as 4.6 is out the focus is on
> adding new cool features instead of working on these boring tasks we should
> do, because they help us in the long run. E.g. if we would have agreed that
> we don't have a 4.6 release, the C++ conversion would most likely be much
> further along. And I don't see how we can create an incentive mechanism to
> work on these issues without somehow coupling it to releases.

But if we talk about releases without deciding on anything, then everybody
keeps developing new stuff.  At some point, we need to agree on what
conditions 4.6 will satisfy, and get it out the door, so everyone that is
developing new stuff has to do it within the context of master.

Best,
~~~~~~~~~~~~
Michael Shirts
Assistant Professor
Department of Chemical Engineering
University of Virginia
michael.shirts at virginia.edu
(434)-243-1821