[gmx-users] gromp error for gmxtest-4.0.4 and 3.3.3 on new gromacs-4.0.5 install

Mon Nov 30 23:30:24 CET 2009

mrshirts at gmail.com wrote:
> What other input might you need for a test set? As a minor developer and 
> a stickler for accuracy, I would be very much interested in the sorts of 
> inputs you're looking for, and have some ideas as well.

There are a number of issues listed in the posts at the URLs below, chief 
among them the absence of a bug-free reference version of GROMACS. The 
other issues mostly arise because, without documentation of what each 
test is testing *for*, it's hard to maintain that test.

There are currently no tests designed for GROMACS in parallel. It's far 
from clear that there's a suitable reference GROMACS version anyway.

One clear need is a mechanism to permit features to be tested in 
combination in an automated manner. The "complex" tests that already 
exist are a good start, but they're far from complete. It should 
be possible to ask a script to test thermostats in (X,Y) with barostats 
in (W,Z), using -sum/-nosum with the constraint of -npme 0. (This is not 
at all silly - I spent several weeks this year proving that I'd found a 
GROMACS bug. It transpired that the problem was with the V-rescale 
thermostat under -nosum, and I only noticed that because I was using 
-rerun!) To avoid combinatorial explosion of the reference data, that 
data would have to be generated at the same time as the test data. Thus 
the user would need to have installed some known good GROMACS version, 
and done some "bootstrap" correctness tests of that against supplied 
reference data, before moving on to more complex cases with 
user-generated reference data. This requires that the script "know" how 
to test each feature, so that it can correctly construct reference and 
test runs. The above example is easy - the script knows that to test a 
thermostat or barostat, the reference and test .mdp files need to have a 
certain form, and testing a command line flag is easier still. The 
script would also need to know how to reject tests of mutually-exclusive 
features.
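
To make the idea concrete, here is a minimal sketch of how such a script 
might enumerate the combinations. It is only an illustration: the 
function name build_test_matrix and the "excluded" argument are 
hypothetical, X, Y, W and Z are the placeholders used above, and the 
excluded pair is made up purely to exercise the rejection step. Only the 
tcoupl and pcoupl .mdp keys and the mdrun flags are taken from GROMACS.

```python
from itertools import product


def build_test_matrix(thermostats, barostats, flags, excluded=frozenset()):
    """Return one test case per allowed (thermostat, barostat, flag) combination.

    `excluded` holds (thermostat, barostat) pairs that must never be
    tested together (hypothetical mechanism for mutually-exclusive features).
    """
    cases = []
    for tcoupl, pcoupl, flag in product(thermostats, barostats, flags):
        if (tcoupl, pcoupl) in excluded:
            continue  # reject mutually-exclusive features
        cases.append({
            # settings written into both the reference and the test .mdp file
            "mdp": {"tcoupl": tcoupl, "pcoupl": pcoupl},
            # every run carries the constraint -npme 0
            "mdrun_args": [flag, "-npme", "0"],
        })
    return cases


cases = build_test_matrix(
    thermostats=["X", "Y"],
    barostats=["W", "Z"],
    flags=["-sum", "-nosum"],
    excluded={("Y", "Z")},  # made-up exclusion, for illustration only
)
print(len(cases))
```

Each case would then be run twice, once with the known-good version and 
once with the version under test, and the two outputs compared.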

In principle, each new feature implemented should be regarded as 
incomplete until there's a test that functions correctly. This means 
that the author of the feature needs to designate a GROMACS version that 
is a suitable reference case (e.g. you can't test V-rescale against a 
3.x reference version because it wasn't implemented back then!). That 
rapidly becomes untenable for a user of the test suite, since they would 
need access to multiple different versions - there'd have to be a web 
server providing reference data. There are further complications 
if testing feature A (whose reference version is 4.0.2) in combination 
with feature B (whose reference version is 4.0.4). Clearly you'd have to 
use at least 4.0.4 to generate a reference case for A & B together, and 
then have to test that A alone in 4.0.4 is correct with respect to 4.0.2.
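
The version bookkeeping for that A & B example could be sketched as 
follows. This is an assumption-laden illustration, not an existing tool: 
plan_reference and parse_version are hypothetical names, and the sketch 
simply picks the newest per-feature reference version, which is the 
minimum version that satisfies the "at least 4.0.4" requirement above.

```python
def parse_version(version):
    """Turn a version string such as '4.0.4' into a comparable tuple (4, 0, 4)."""
    return tuple(int(part) for part in version.split("."))


def plan_reference(feature_refs):
    """Plan reference runs for a combination of features.

    feature_refs maps a feature name to the version designated as its
    reference. Returns the version to use for the combined reference run,
    plus the single-feature checks needed first, each given as
    (feature, its own reference version, combined reference version).
    """
    combined = max(feature_refs.values(), key=parse_version)
    prerequisite_checks = [
        (name, ref, combined)
        for name, ref in sorted(feature_refs.items())
        if parse_version(ref) < parse_version(combined)
    ]
    return combined, prerequisite_checks


combined, checks = plan_reference({"A": "4.0.2", "B": "4.0.4"})
print(combined)  # 4.0.4
print(checks)    # [('A', '4.0.2', '4.0.4')]
```

Comparing parsed tuples rather than raw strings matters once a version 
component reaches two digits.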

I don't know how to bring order to this chaos! I do know that the lack 
of a solution will continue to cost everyone time and money doing broken 
simulations and chasing bugs.

Mark

> > Frankly, the mentions of the test set on the web pages are
> > misleading. gmxtest-4.0.4 doesn't serve its purpose. gmxtest-3.3.3 was
> > useful for GROMACS 3-series installs, I expect, but almost nobody wants
> > to be using GROMACS 3. I've done a bunch of improvements on the
> > publicly-available git version
> > (http://lists.gromacs.org/pipermail/gmx-developers/2009-August/003573.html
> > and
> > http://lists.gromacs.org/pipermail/gmx-developers/2009-August/003586.html),
> > but input is needed from people other than me before a useful test set
> > can be released.
> >
> > Mark