[gmx-users] gromp error for gmxtest-4.0.4 and 3.3.3 on new gromacs-4.0.5 install

Wed Dec 2 00:43:44 CET 2009

Mark Abraham wrote:
> mrshirts at gmail.com wrote:
>> What other input might you need for a test set? As a minor developer 
>> and a stickler for accuracy, I would be very much interested in the 
>> sorts of inputs you're looking for, and have some ideas as well.
> 

I'd be willing to help, as well.

> There's a number of issues listed in the posts in the URLs below, chief 
> among them the absence of a bug-free reference version of GROMACS. The 
> other issues mostly arise because if there's no documentation of what is 
> being tested *for*, it's hard to do maintenance on the test.
> 

Can we guarantee any version will ever be bug-free?  Or should there just be a 
test set for every version such that the tests succeed given the inherent 
limitations or potential bugs in the software?  This could get a bit laborious, 
re-creating reference data for every version, but might be the most thorough way 
to proceed.

> There are currently no tests designed for GROMACS in parallel. It's far 
> from clear that there's a suitable reference GROMACS version anyway.
> 

Would it even be possible to design a meaningful parallel set, given the 
inherent potential for deviations due to, e.g., dynamic load balancing?  Even 
mdrun -reprod doesn't completely guarantee reproducibility, does it?  Might some 
of the more advanced features also depend on the FFT implementation?
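
If bitwise reproducibility can't be assumed, one option would be to compare 
test output against reference output within a tolerance rather than exactly. 
The following is only a minimal sketch of that idea in Python; the function 
name and the default tolerances are hypothetical, and it assumes the energies 
have already been extracted into plain lists of numbers. Because trajectories 
diverge over time, a check like this is probably only meaningful for short runs.

```python
import math


def energies_match(reference, test, rel_tol=1e-5, abs_tol=1e-8):
    """Return True if two energy series agree within tolerance.

    The tolerances are illustrative assumptions, not values taken
    from any GROMACS test set.
    """
    if len(reference) != len(test):
        # Series of different length cannot be compared step by step.
        return False
    return all(
        math.isclose(r, t, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, t in zip(reference, test)
    )
```

For example, energies_match([-1000.0, -1000.5], [-1000.0001, -1000.5001]) 
passes with the defaults above, while a difference of 1.0 in either value 
would fail.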

> One clear need is a mechanism to permit features to be tested in 
> combination in an automated manner. The set of "complex" tests that 
> already exist are a good start, but they're far from complete. It should 
> be possible to ask a script to test thermostats in (X,Y) with barostats 
> in (W,Z), using -sum/-nosum with the constraint of -npme 0. (This is not 
> at all silly - I spent several weeks this year proving that I'd found a 
> GROMACS bug. It transpired that the problem was with the V-rescale 
> thermostat under -nosum, and I only noticed that because I was using 
> -rerun!) To avoid combinatorial explosion of the reference data, that 
> data would have to be generated at the same time as the test data. Thus 
> the user would need to have installed some known good GROMACS version, 
> and done some "bootstrap" correctness tests of that against supplied 
> reference data, before moving on to more complex cases with 
> user-generated reference data. This requires that the script "know" how 
> to test each feature, so that it can correctly construct reference and 
> test runs. The above example is easy - the script knows that to test a 
> thermostat or barostat, the reference and test .mdp files need to have a 
> certain form, and testing a command line flag is easier still. The 
> script would also need to know how to reject tests of mutually-exclusive 
> features.
> 

That does sound relatively simple to do, but would probably also require a bit 
of re-organization in the test set.  For example, instead of the four or so 
directories now, we'd probably have to expand to substantially more depending on 
the features being tested (which also helps in determining what the tests do). 
README files are also a must in each directory, similar to the AMBER test set.
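
To make the combination idea concrete, here is a rough Python sketch of how a 
script might enumerate thermostat/barostat/flag combinations and skip the ones 
declared mutually exclusive. Everything in it is an assumption for 
illustration: the function name, the dictionary layout, and especially the 
exclusion rule in the usage note, which is a placeholder and not a real 
GROMACS restriction. The option values follow the .mdp names (tcoupl, pcoupl) 
and the mdrun flags mentioned above.

```python
from itertools import product

# Illustrative feature lists; a real script would read these from the user.
THERMOSTATS = ["berendsen", "v-rescale"]
BAROSTATS = ["berendsen", "parrinello-rahman"]
SUM_FLAGS = ["-sum", "-nosum"]


def build_test_matrix(thermostats, barostats, flags, exclusions=()):
    """Return one test case per allowed feature combination.

    `exclusions` is a sequence of dicts; a combination is skipped when
    it matches every key/value pair of any one exclusion rule.
    """
    cases = []
    for tcoupl, pcoupl, flag in product(thermostats, barostats, flags):
        combo = {"tcoupl": tcoupl, "pcoupl": pcoupl, "flag": flag}
        if any(
            all(combo.get(key) == value for key, value in rule.items())
            for rule in exclusions
        ):
            continue  # declared mutually exclusive, so reject it
        cases.append(
            {
                "mdp": {"tcoupl": tcoupl, "pcoupl": pcoupl},
                # -npme 0 is held fixed, as in the example quoted above
                "mdrun_args": [flag, "-npme", "0"],
            }
        )
    return cases
```

Called as build_test_matrix(THERMOSTATS, BAROSTATS, SUM_FLAGS), this yields 
2 x 2 x 2 = 8 cases. Passing a placeholder rule such as 
[{"tcoupl": "berendsen", "pcoupl": "parrinello-rahman"}] drops the two 
matching cases and leaves 6. The same settings would then be used to build 
both the reference run and the test run.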

> In principle, each new feature implemented should be regarded as 
> incomplete until there's a test that functions correctly. This means 
> that the author of the feature needs to designate a GROMACS version that 
> is a suitable reference case (e.g. you can't test V-rescale against a 
> 3.x reference version because it wasn't implemented back then!) That 
> becomes rapidly untenable for a user of the test suite, since they would 
> have to have access to multiple different versions - there'd have to be 
> a web server for providing reference data. There are further complications 
> if testing feature A (whose reference version is 4.0.2) in combination 
> with feature B (whose reference version is 4.0.4). Clearly you'd have to 
> use at least 4.0.4 to generate a reference case for A & B together, and 
> then have to test that A alone in 4.0.4 is correct with respect to 4.0.2.
> 
> I don't know how to bring order to this chaos! I do know that the lack 
> of a solution will continue to cost everyone time and money doing broken 
> simulations and chasing bugs.
> 
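
The reference-version bookkeeping for combined features could at least be 
automated. As a sketch only (the function names and the input format are 
hypothetical), the script could pick the newest reference version among the 
features under test, then list which features need to be cross-checked alone 
against their own older reference version:

```python
def parse_version(text):
    """Turn a version string such as '4.0.4' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def plan_reference(features):
    """Plan reference runs for a combination of features.

    `features` maps a feature name to the version designated as its
    reference. Returns the version to use for the combined reference
    run, plus (feature, own_reference, combined_reference) tuples for
    each feature that must also be verified alone across versions.
    """
    combined = max(features.values(), key=parse_version)
    cross_checks = [
        (name, ref, combined)
        for name, ref in sorted(features.items())
        if parse_version(ref) < parse_version(combined)
    ]
    return combined, cross_checks
```

With the example from the quoted text, plan_reference({"A": "4.0.2", 
"B": "4.0.4"}) selects 4.0.4 for the combined run and reports that A alone in 
4.0.4 must be checked against 4.0.2. It does nothing about the harder problem 
of actually having all of those versions available.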

Would it be useful to start a wiki page on the topic, perhaps somewhere within 
the development section, sort of like what was once done for features to be 
implemented in the main software?  That way, there's a central site for listing 
ideas, comments, and progress.

-Justin

-- 
========================================

Justin A. Lemkul
Ph.D. Candidate
ICTAS Doctoral Scholar
MILES-IGERT Trainee
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin

========================================