[gmx-developers] Does GROMACS pass random numbers through checkpoint/restart files?

David van der Spoel spoel at xray.bmc.uu.se
Mon Apr 21 09:10:07 CEST 2008

David Cerutti wrote:
> Hello,
>    I'm not a frequent user of gromacs, but I'm on your developers list 
> nonetheless... cause for introspection.
>    I noticed a very bizarre yet scary artifact in some simulations I was 
> doing with a Langevin thermostat, and with some help of the AMBER code 
> developers I traced it to the fact that I was feeding in the same random 
> number seed with every restart of a new simulation segment.  The effect 
> was to destabilize my protein to varying degrees, depending on how 
> frequently I checkpointed the run.  The developers are now implementing 
> a patch to address this behavior in the current release, but a more 
> rigorous solution such as the way CHARMM passes the pseudo-random state 
> vector through its checkpoint files will not be available until some 
> future release.
>    The implications of what I'm finding could be summarized thus: every 
> simulation done with AMBER, using a Langevin thermostat (probably an 
> Andersen thermostat as well) is affected by this repeating random 
> numbers problem UNLESS the user specified IGB=-1 (a Generalized Born 
> parameter that can be used to set the random seed based on the clock 
> time) or rewrote the input file with every restart (which is what I'm 
> doing now). In the extreme case of 1000 steps per restart, the protein 
> will unfold; in more subtle cases, with perhaps 25,000 to 100,000 steps 
> per restart, it appears that the protein may be slightly destabilized.  
> I have a 145ns simulation of a dimer-of-dimers protein that breaks along 
> its weak interface 110ns into the run, but I'm thinking that it tracks 
> back to this random numbers problem.  There was a portion of the run 
> where I lengthened the segments to 1,000,000 steps, which I now see 
> corresponds to a brief healing of the tetramer before I went back to 
> 100,000 steps and blew it apart--wild!
>    I am writing up some more tests as a communication to make sure that 
> other investigators know this can happen.  I, for one, have just lost 
> over 500ns of MD to this artifact, and I suspect that it lurks in many, 
> many studies in the literature.  One of the developers also commented 
> that numerous users were frustrated with the Langevin thermostat and so 
> went back to Berendsen weak-coupling; it may all track back to these 
> random numbers.
>    As it appears in the manual, GROMACS may be vulnerable to this sort 
> of error if the user does not specify Id_seed=-1, unless the random 
> number state vector is being saved in the checkpoint file for the next 
> restart.  Please let me know how the code handles this issue.
> Thanking you,
> Dave Cerutti


thanks for an insightful comment. I think we should investigate this 
further. Very recently I learned of an approach for Monte Carlo 
simulations of particles (in a very different) field, where each atom 
carries it's own random number seed (and state variables), obviously 
this state should be conserved over restarts. This is in particular 
important for parallel runs, and if you continue a simulation on a 
different number of processors. The overhead is not more than some 
memory per atom. The reference for the paper is:

The Monte Carlo Method: Versatility Unbounded In A Dynamic Computing 
World, Chattanooga, Tennessee, April 17–21, 2005, on CD-ROM, American 
Nuclear Society, LaGrange Park, IL (2005)
Richard Procassini and Bret Beck

David van der Spoel, Ph.D.
Molec. Biophys. group, Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone:	+46184714205. Fax: +4618511755.
spoel at xray.bmc.uu.se	spoel at gromacs.org   http://folding.bmc.uu.se

More information about the gromacs.org_gmx-developers mailing list