[gmx-users] RE: About the binary identical results by restarting from the checkpoint file

Mark Abraham mark.j.abraham at gmail.com
Mon Jun 3 19:15:12 CEST 2013


On Mon, Jun 3, 2013 at 6:59 PM, Cuiying Jian <cuiying_jian at hotmail.com> wrote:

> Hi Mark,
>
> Thanks for your reply. I tested restarting simulations from .cpt files with
> GROMACS 4.6.1, and the problem is still there, i.e. I cannot get binary
> identical results between restarted simulations and continuous
> simulations. The command I used for restarting is as follows (only
> one processor is used during the simulations):
> mdrun -v -s md.tpr -cpt 0 -cpi md.cpt -deffnm md -reprod
>

This is not generally enough to generate a serial run in 4.6, by the way.
GROMACS tries very hard to automatically use all the resources available in
the best way. See mdrun -h for various -nt* options, and consult the
pre-step-0 part of the .log file for feedback.
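As a rough illustration of the point above, here is a minimal Python sketch
that assembles the restart command with an explicit request for one thread.
It assumes a thread-enabled (non-MPI) 4.6 build, where -nt is one of the
-nt* options listed by mdrun -h; the helper name serial_mdrun_command is
hypothetical and only builds the argument list, it does not run anything.

```python
import shlex


def serial_mdrun_command(deffnm="md", checkpoint="md.cpt"):
    """Build an mdrun argument list that explicitly asks for one thread.

    Assumption: a thread-enabled GROMACS 4.6 mdrun, where "-nt 1" limits
    the total number of threads. The other flags are copied from the
    command quoted in this thread.
    """
    return [
        "mdrun",
        "-nt", "1",              # explicit single thread instead of auto-detection
        "-v",
        "-s", f"{deffnm}.tpr",
        "-cpt", "0",
        "-cpi", checkpoint,
        "-deffnm", deffnm,
        "-reprod",
    ]


# Print the command line so it can be inspected or pasted into a shell.
print(shlex.join(serial_mdrun_command()))
```

Whatever command is used, the pre-step-0 part of the .log file remains the
place to confirm how many threads were actually started.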

> For further information, I attach my original .mdp file below:
> constraints          =  all-bonds  ; convert all bonds to constraints.
> integrator           =  md
> dt                   =  0.002      ; ps !
> nsteps               =  10000      ; total 20 ps (10000 steps x 0.002 ps).
> nstcomm              =  10         ; frequency for center of mass motion removal.
> nstxout              =  5          ; collect data every 0.01 ps (5 steps x 0.002 ps).
> nstxtcout            =  5          ; frequency to write coordinates to xtc trajectory.
> nstvout              =  5          ; frequency to write velocities to output trajectory.
> nstfout              =  5          ; frequency to write forces to output trajectory.
> nstlog               =  5          ; frequency to write energies to log file.
> nstenergy            =  5          ; frequency to write energies to energy file.
> nstlist              =  1          ; frequency to update the neighbor list.
> ns_type              =  grid
> rlist                =  1.4
> coulombtype          =  PME
> rcoulomb             =  1.4
> vdwtype              =  cut-off
> rvdw                 =  1.4
> pme_order            =  8          ; use 6, 8 or 10 when running in parallel
> ewald_rtol           =  1e-5
> optimize_fft         =  yes
> DispCorr             =  no         ; don't apply any correction
> ;open LINCS
> constraint_algorithm =  LINCS
> lincs_order          =  4          ; highest order in the expansion of the constraint coupling matrix
> lincs_warnangle      =  30         ; maximum angle that a bond can rotate before LINCS will complain
> lincs_iter           =  1          ; number of iterations to correct for a rotational lengthening in LINCS
> ; Temperature coupling is on
> Tcoupl                          = v-rescale
>

This coupling algorithm has a stochastic component, and at least at some
points in history the random number generator was either not checkpointed
properly, or not propagated in parallel properly. I'm not sure offhand if
any of that has been fixed yet (I doubt it), but you can test (parts of)
this hypothesis by using Berendsen (in any GROMACS 4.x), or really being
sure you've run a single thread.
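To make the hypothesis concrete, here is a toy Python sketch (not GROMACS
code) of why a stochastic algorithm only restarts bit-for-bit if the random
number generator state is part of the checkpoint. The update rule, function
names and seed are all made up for illustration; the only real API used is
Python's random.Random with getstate()/setstate().

```python
import random


def run(x, rng, n_steps):
    """Toy stochastic update: damp x and add a random kick each step."""
    for _ in range(n_steps):
        x = 0.9 * x + rng.gauss(0.0, 1.0)
    return x


def continuous(seed, n_steps):
    """Run all steps in one go."""
    return run(0.0, random.Random(seed), n_steps)


def restarted(seed, n_steps, stop_at, save_rng_state):
    """Stop after stop_at steps, write a 'checkpoint', then continue."""
    rng = random.Random(seed)
    x = run(0.0, rng, stop_at)
    # The checkpoint always holds x; the RNG state is optional on purpose.
    checkpoint = {
        "x": x,
        "rng_state": rng.getstate() if save_rng_state else None,
    }
    # Restart: a new generator is created, as a new process would do.
    new_rng = random.Random(seed)
    if checkpoint["rng_state"] is not None:
        new_rng.setstate(checkpoint["rng_state"])
    return run(checkpoint["x"], new_rng, n_steps - stop_at)


reference = continuous(2013, 100)
print(reference == restarted(2013, 100, 40, save_rng_state=True))
print(reference == restarted(2013, 100, 40, save_rng_state=False))
```

With the state saved, the restarted run draws exactly the random numbers the
continuous run would have drawn; without it, the restart replays the start
of the random sequence and the two trajectories part ways immediately.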

If Berendsen is fully reproducible, then the RNG is the issue. While that's
irritating, it probably won't get fixed before GROMACS 5 (as a side effect
of other stuff going on).
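One simple way to decide whether two runs are "fully reproducible" is a
byte-for-byte comparison of their output files. The sketch below is only an
illustration: the helper names are hypothetical, and it assumes the files
being compared (for example trajectory or energy files from a continuous and
a restarted run) contain nothing run-specific such as timestamps, which is
not true of .log files.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large trajectory files fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """True if both files have the same SHA-256 digest."""
    return file_digest(path_a) == file_digest(path_b)


# Example use (file names are placeholders, not taken from this thread):
# files_identical("continuous/md.trr", "restarted/md.trr")
```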

Mark

> tau_t                             = 0.1
> tc-grps                          = HEP
> ref_t                              =  300
> ; Pressure  coupling is on
> Pcoupl                          = parrinello-rahman
> Pcoupltype                  = isotropic
> tau_p                            = 1.0
> compressibility           = 4.5e-5
> ref_p                             = 1.0
> ; generate velocity is on at 300 K.
> gen_vel              = yes
> gen_temp          = 300.0
> gen_seed           = -1
>
> Is there something wrong with my .mdp file or my command? Thanks a lot.
>
> Cheers,
> Cuiying
> > On Sun, Jun 2, 2013 at 10:37 PM, Cuiying Jian <cuiying_jian at hotmail.com> wrote:
> >
> > > Hi GROMACS Users,
> > >
> > > These days, I am testing restarting simulations with .cpt files. I
> > > already set nstlist=1 in the .mdp file. I can restart my simulations
> > > (which are stopped manually) with the following command (version 4.0.7):
> > > mpiexec mdrun_s_mpi -v -s md.tpr -cpt 0 -cpi md.cpt -deffnm md -reprod
> > > -reprod is used to force binary identical simulations.
> > >
> > > During the restarted simulations, the same number of processors is used
> > > as in the interrupted simulation. The only case in which I can get
> > > results binary identical to those from the continuous simulations
> > > (which are not stopped manually) is SPC water molecules. For any other
> > > molecules (like -heptane), I can never get results binary identical to
> > > those from the continuous simulations.
> > >
> > > I also tried to get new .tpr files by:
> > > tpbconv_s -s md.tpr -f md.trr -e md.edr -c md_c.tpr -cont
> > > and then:
> > > mpiexec mdrun_s_mpi -v -s md_c.tpr -cpt 0 -cpi md.cpt -deffnm md_c -reprod
> > > But I still cannot get binary identical results.
> > >
> > > I also tested the simulations with only one processor, and binary
> > > identical results are still not obtained. Using double precision does
> > > not solve the problem.
> > >
> > > I think the above problems arise because some information may not be
> > > stored while the simulations are running.
> > >
> >
> > That seems likely. The leading candidate would be a random number
> > generator you're using for a stochastic integrator. Your .mdp file would
> > have been useful.
> >
> > > On the other hand, if I run two independent simulations using exactly
> > > the same number of processors, the same commands and the same input
> > > files, i.e.
> > > mpiexec mdrun_s_mpi -v -s md.tpr -deffnm md -reprod
> > > I can always get binary identical results from these two independent
> > > simulations.
> > >
> > > I understand that MD is chaotic and that, if we run simulations for a
> > > long enough time, the results should converge. Also, there are factors
> > > which may affect the reproducibility, as described on the GROMACS
> > > website. But for my purpose, I am curious whether there are methods
> > > through which I can get binary identical results from restarted
> > > simulations and continuous simulations. Thanks a lot.
> > >
> >
> > There are ways to be fully reproducible, but probably not every
> > combination of algorithms has that property. 4.0.7 is so old no problem
> > will be fixed, unless it can also be shown in 4.6 ;-)
> >
> > Mark