[gmx-developers] Does GROMACS pass random numbers through checkpoint/restart files?
David van der Spoel
spoel at xray.bmc.uu.se
Mon Apr 21 09:10:07 CEST 2008
David Cerutti wrote:
> Hello,
>
> I'm not a frequent user of gromacs, but I'm on your developers list
> nonetheless... cause for introspection.
>
> I noticed a very bizarre yet scary artifact in some simulations I was
> doing with a Langevin thermostat, and with some help of the AMBER code
> developers I traced it to the fact that I was feeding in the same random
> number seed with every restart of a new simulation segment. The effect
> was to destabilize my protein to varying degrees, depending on how
> frequently I checkpointed the run. The developers are now implementing
> a patch to address this behavior in the current release, but a more
> rigorous solution such as the way CHARMM passes the pseudo-random state
> vector through its checkpoint files will not be available until some
> future release.
>
> The implications of what I'm finding could be summarized thus: every
> simulation done with AMBER, using a Langevin thermostat (probably an
> Andersen thermostat as well) is affected by this repeating random
> numbers problem UNLESS the user specified IGB=-1 (a Generalized Born
> parameter that can be used to set the random seed based on the clock
> time) or rewrote the input file with every restart (which is what I'm
> doing now). In the extreme case of 1000 steps per restart, the protein
> will unfold; in more subtle cases, with perhaps 25,000 to 100,000 steps
> per restart, it appears that the protein may be slightly destabilized.
> I have a 145ns simulation of a dimer-of-dimers protein that breaks along
> its weak interface 110ns into the run, but I'm thinking that it tracks
> back to this random numbers problem. There was a portion of the run
> where I lengthened the segments to 1,000,000 steps, which I now see
> corresponds to a brief healing of the tetramer before I went back to
> 100,000 steps and blew it apart--wild!
>
> I am writing up some more tests as a communication to make sure that
> other investigators know this can happen. I, for one, have just lost
> over 500ns of MD to this artifact, and I suspect that it lurks in many,
> many studies in the literature. One of the developers also commented
> that numerous users were frustrated with the Langevin thermostat and so
> went back to Berendsen weak-coupling; it may all track back to these
> random numbers.
>
> As it appears in the manual, GROMACS may be vulnerable to this sort
> of error if the user does not specify ld_seed=-1, unless the random
> number state vector is being saved in the checkpoint file for the next
> restart. Please let me know how the code handles this issue.
>
> Thanking you,
>
> Dave Cerutti
Dave,
Thanks for an insightful comment. I think we should investigate this
further. Very recently I learned of an approach used for Monte Carlo
simulations of particles (in a very different field), where each atom
carries its own random number seed (and state variables). Obviously
this state should be conserved over restarts. This is particularly
important for parallel runs, and when you continue a simulation on a
different number of processors. The overhead is no more than some
memory per atom. The reference for the paper is:
Richard Procassini and Bret Beck, "Parallel Monte Carlo Particle
Transport and the Quality of Random Number Generators: How Good Is
Good Enough?", in The Monte Carlo Method: Versatility Unbounded in a
Dynamic Computing World, Chattanooga, Tennessee, April 17–21, 2005,
on CD-ROM, American Nuclear Society, LaGrange Park, IL (2005).
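
To make the idea concrete, here is a minimal sketch of per-atom random
streams whose state is carried through a checkpoint. It is only an
illustration under stated assumptions: Python's standard `random`
module stands in for the MD code's generator, the function names and
the seeding scheme are hypothetical, and nothing here is taken from
GROMACS, AMBER, or the paper above.

```python
import pickle
import random


def make_streams(base_seed, n_atoms):
    """One generator per atom; the string seeding scheme is an assumption."""
    return [random.Random(f"{base_seed}:{i}") for i in range(n_atoms)]


def run_segment(streams, n_steps):
    """Draw one Gaussian 'kick' per atom per step and return all kicks."""
    return [[rng.gauss(0.0, 1.0) for rng in streams] for _ in range(n_steps)]


def save_checkpoint(streams):
    """Serialize the full internal state of every per-atom generator."""
    return pickle.dumps([rng.getstate() for rng in streams])


def load_checkpoint(blob):
    """Rebuild the per-atom generators from a saved checkpoint."""
    streams = []
    for state in pickle.loads(blob):
        rng = random.Random()
        rng.setstate(state)
        streams.append(rng)
    return streams


# Reference: one uninterrupted run of 6 steps for 3 atoms.
reference = run_segment(make_streams(1993, 3), 6)

# Restart with the generator state passed through the checkpoint:
# the two segments together reproduce the uninterrupted run.
streams = make_streams(1993, 3)
first = run_segment(streams, 3)
streams = load_checkpoint(save_checkpoint(streams))
second = run_segment(streams, 3)
restart_matches = (first + second == reference)

# Restart that feeds in the original seed again: the second segment
# replays exactly the same noise as the first one.
reseeded = run_segment(make_streams(1993, 3), 3)
noise_repeats = (reseeded == first)
```

Because each atom owns its generator, the numbers an atom receives do
not depend on how atoms are divided over processors, which is what
should make a restart on a different processor count reproducible.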
Cheers,
--
David van der Spoel, Ph.D.
Molec. Biophys. group, Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone: +46184714205. Fax: +4618511755.
spoel at xray.bmc.uu.se spoel at gromacs.org http://folding.bmc.uu.se