[gmx-developers] regression tests - git failing?

Sat Aug 8 07:05:51 CEST 2009

Mark Abraham wrote:
> On 08/07/09, Berk Hess <hess at cbr.su.se> wrote:
>> Mark Abraham wrote:
>>> Michael Shirts wrote:
>>>> Hi, guys-
>>>>
>>>> I just pulled the latest git directory to see how my own changes were
>>>> faring on the regression tests -- I got the following failures.
>>>>
>>>> single precision:
>>>> Testing aminoacids . . . FAILED. Check checkpot.out (      35 errors),
>>>> checkvir.out (402 errors) files in aminoacids
>>>>
>>>> double precision:
>>>> Testing aminoacids . . . FAILED. Check checkpot.out (      35 errors),
>>>> checkvir.out (402 errors) files in aminoacids
>>>> N      Reference   This test
>>>>    10    -39.0984    -32.3783
>>>>    11    -39.0984    -32.3783
>>>>    12    -93.8123    -75.8899
>>>>    13    -93.8123    -75.8899
>>>>    21    -11007.5    -9914.46
>>>> There were 5 differences in final energy with the reference file
>>>> All 45 pdb2gmx tests PASSED
>>>> pdb2gmx tests FAILED
>>>>
>>>> Now that we have a good regression test set that runs in just a few
>>>> minutes, and have public git repositories that make it easier to share
>>>> beta code, I'm wondering if it would make sense to ask that commits to
>>>> the main repository pass the regression tests -- this would make it
>>>> easier to locate problems and eliminate many coding errors.
>>> There is a current problem with a handful of the regression tests
>>> inasmuch as the reference values are still computed with a 3.3.2
>>> version (IIRC). These manifest as a checkvir issue. I haven't
>>> committed the time to solving it, though as I did the last fixes to
>>> the regression tests, I probably should.
>>>
>>> Mark
>> This is due to a change I made in the Berendsen (and v-rescale) thermostat.
>> The velocities at t-dt/2 are now scaled instead of those at t+dt/2; this
>> gives much better energy conservation with the v-rescale thermostat.
>>
>> I guess the best solution would be to remove temperature coupling from all
>> test sets that are not intended to test temperature coupling (also, Berendsen
>> is not the right algorithm to test).
>> We would probably want the initial temperature to be around 300 K.
>> The references could still be made with 3.3 to keep the test sets
>> backwards compatible.
> 
> 
> OK, I can do that.
> 
> 1) Generate reference data with 3.3.3 with .mdp options that 4.x can reproduce, i.e. largely without T-coupling and with suitable grid search options.
> 
> 2) Generate a v-rescaling test with 4.0.5, and note that failure will be expected if testing with 3.3.x.
> 
> Any other gotchas?
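
For reference, the textbook Berendsen scheme rescales all velocities each step by lambda = sqrt(1 + (dt/tau_T)(T0/T - 1)). Berk's change concerns which half-step velocities of the leap-frog integrator this factor is applied to, not the factor itself. The Python sketch only computes lambda and its effect on the temperature; the helper name and the example values (310 K system, 300 K target, 2 fs step, 0.1 ps coupling time) are illustrative assumptions, not GROMACS code.

```python
import math


def berendsen_lambda(t_current, t_ref, dt, tau_t):
    """Textbook Berendsen scaling factor (hypothetical helper, not GROMACS code):
    lambda = sqrt(1 + (dt / tau_t) * (t_ref / t_current - 1))
    """
    return math.sqrt(1.0 + (dt / tau_t) * (t_ref / t_current - 1.0))


# Illustrative values: system at 310 K, 300 K target, 2 fs step, 0.1 ps tau_t.
lam = berendsen_lambda(t_current=310.0, t_ref=300.0, dt=0.002, tau_t=0.1)

# Scaling every velocity by lambda scales the kinetic energy, and hence the
# instantaneous temperature, by lambda**2.
t_after = 310.0 * lam ** 2
print(f"lambda = {lam:.6f}, T after one scaling step = {t_after:.3f} K")
```

A system hotter than the target gets lambda < 1 and is pulled toward T0 by (dt/tau_T)(T - T0) per step, 0.2 K in this example. Because the factor depends on the temperature at the moment it is evaluated, applying it to a different half-step of velocities changes the trajectory slightly, which is enough to move reference energies and virials.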

Found a gotcha. DvdS's README notes that "For the kernel tests, 
reference states were calculated with the double precision C loops in 
Gromacs 3.3, after making sure the results were identical to those of 
Gromacs 3.2". That would seem to imply that we intend to test the 
kernels against double-precision C kernels regardless of the precision 
of the kernel under test.

That doesn't seem like a valid test for the single-precision kernels. 
However, for at least kernel010, the double-precision reference .trr is 
twice the size of the single-precision one, and the double-precision 
.tpr is larger than the single-precision .tpr by about the same absolute 
margin, suggesting that these references really were produced by 
GROMACS compiled at the corresponding precision, contrary to the README. 
Comments?
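
The size argument is simple arithmetic: a .trr frame stores each per-atom vector array (coordinates, velocities, forces) as natoms x 3 reals, and a real is 4 bytes in a single-precision build and 8 in a double-precision build, so the vector data doubles while the small per-frame header does not scale with the atom count. A rough Python estimate follows; the helper name, atom count and frame count are hypothetical, and header and box data are deliberately ignored.

```python
# Bytes per real in single- and double-precision GROMACS builds.
SIZEOF_REAL = {"single": 4, "double": 8}


def trr_vector_bytes(n_atoms, n_frames, n_arrays, precision):
    """Rough size of the per-atom vector data in a .trr file (hypothetical helper).

    Each stored array (x, v or f) holds n_atoms * 3 reals per frame.
    Frame headers and box matrices are ignored, so this is only the
    dominant term, not the exact file size.
    """
    return n_frames * n_arrays * n_atoms * 3 * SIZEOF_REAL[precision]


# Hypothetical test system: 1000 atoms, 10 frames, x, v and f all written.
single = trr_vector_bytes(1000, 10, 3, "single")
double = trr_vector_bytes(1000, 10, 3, "double")
print(single, double, double / single)
```

Since the dominant term has a ratio of exactly 2, a reference .trr that is about twice the size of its single-precision counterpart points to a double-precision writer for one and a single-precision writer for the other.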

Mark