[gmx-developers] Does GROMACS pass random numbers through checkpoint/restart files?

Mon Apr 21 09:51:02 CEST 2008

David van der Spoel wrote:
> David Cerutti wrote:
>> Hello,
>>
>>    I'm not a frequent user of gromacs, but I'm on your developers 
>> list nonetheless... cause for introspection.
>>
>>    I noticed a very bizarre yet scary artifact in some simulations I 
>> was doing with a Langevin thermostat, and with some help of the AMBER 
>> code developers I traced it to the fact that I was feeding in the 
>> same random number seed with every restart of a new simulation 
>> segment.  The effect was to destabilize my protein to varying 
>> degrees, depending on how frequently I checkpointed the run.  The 
>> developers are now implementing a patch to address this behavior in 
>> the current release, but a more rigorous solution such as the way 
>> CHARMM passes the pseudo-random state vector through its checkpoint 
>> files will not be available until some future release.
>>
>>    The implications of what I'm finding could be summarized thus: 
>> every simulation done with AMBER, using a Langevin thermostat 
>> (probably an Andersen thermostat as well) is affected by this 
>> repeating random numbers problem UNLESS the user specified IGB=-1 (a 
>> Generalized Born parameter that can be used to set the random seed 
>> based on the clock time) or rewrote the input file with every restart 
>> (which is what I'm doing now). In the extreme case of 1000 steps per 
>> restart, the protein will unfold; in more subtle cases, with perhaps 
>> 25,000 to 100,000 steps per restart, it appears that the protein may 
>> be slightly destabilized.  I have a 145ns simulation of a 
>> dimer-of-dimers protein that breaks along its weak interface 110ns 
>> into the run, but I'm thinking that it tracks back to this random 
>> numbers problem.  There was a portion of the run where I lengthened 
>> the segments to 1,000,000 steps, which I now see corresponds to a 
>> brief healing of the tetramer before I went back to 100,000 steps and 
>> blew it apart--wild!
>>
>>    I am writing up some more tests as a communication to make sure 
>> that other investigators know this can happen.  I, for one, have just 
>> lost over 500ns of MD to this artifact, and I suspect that it lurks 
>> in many, many studies in the literature.  One of the developers also 
>> commented that numerous users were frustrated with the Langevin 
>> thermostat and so went back to Berendsen weak-coupling; it may all 
>> track back to these random numbers.
>>
>>    As it appears in the manual, GROMACS may be vulnerable to this 
>> sort of error if the user does not specify Id_seed=-1, unless the 
>> random number state vector is being saved in the checkpoint file for 
>> the next restart.  Please let me know how the code handles this issue.
>>
>> Thanking you,
>>
>> Dave Cerutti
>
> Dave,
>
> thanks for an insightful comment. I think we should investigate this 
> further. Very recently I learned of an approach for Monte Carlo 
> simulations of particles (in a very different) field, where each atom 
> carries it's own random number seed (and state variables), obviously 
> this state should be conserved over restarts. This is in particular 
> important for parallel runs, and if you continue a simulation on a 
> different number of processors. The overhead is not more than some 
> memory per atom. The reference for the paper is:
>
> The Monte Carlo Method: Versatility Unbounded In A Dynamic Computing 
> World, Chattanooga, Tennessee, April 17–21, 2005, on CD-ROM, American 
> Nuclear Society, LaGrange Park, IL (2005)
> PARALLEL MONTE CARLO PARTICLE TRANSPORT AND THE QUALITY OF RANDOM 
> NUMBER GENERATORS: HOW GOOD IS GOOD ENOUGH?
> Richard Procassini and Bret Beck
>
> Cheers,
Keeping a random state attached to a particle is nearly never important.
In that way you make a simulation reproducable. But as long as you have 
more different
random states than particles, this is never an issue for normal 
production runs.
Also this is easy to implement for particle decomposition, but the 
communication can be very
costly with domain decomposition.

The issue brought up by Dave can be very problematic, but only when the 
state is heavily influenced
by the random numbers. This will be the case for Monte Carlo and 
Brownian dynamics simulations.
During my PhD work, I experienced a nasty case of recurring dynamics 
with BD.
With stochastic dynamics (Langevin dynamics with momenta), the 
velocities will often contribute more
to the state than the random numbers. But for a protein in vacuum, or 
with implicit solvent (currently
not implemented in Gromacs), the velocity space could possibly be so 
small that the random state
can have a significant influence on the results.

In Gromacs 3 the random state is not stored, and even worse, with 
ld_seed=-1 grompp generates
a random seed, but tpbconv reuses the same seed. But as I said, I think 
that for SD is solvent
the velocity state of the solvent will swamp the memory effect of the SD 
random state.

Just a week ago I implemented checkpoint files for Gromacs 4, including 
storage of the complete ring state
for BD and SD.

Berk.