[gmx-users] What is the most reliable way to run repeats for reproducibility?

Wed Jan 10 15:59:39 CET 2018

Hi Mark,
Thank you very much.

1) Regarding the link you provided: I don't think I can control most of the computing resources, because I submit my jobs to our cluster and they are distributed randomly across whichever cores are available.

2) For the "random seed" of the velocities, I found the following option in the tutorial linked below and enabled it:
gen_vel = yes
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/06_equil.html

So does this mean it is better to start every repeat from the same em.tpr and run separate NVT, NPT, etc. steps for each one, so that each repeat is initialised with different velocities?
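A minimal Python sketch of how each repeat could get its own NVT .mdp that differs only in the velocity-generation lines. gen_vel, gen_temp and gen_seed are real GROMACS .mdp options (gen_seed = -1 lets grompp pick a pseudo-random seed, as in the tutorial's nvt.mdp). The helper names, the output names nvt_rep1.mdp, nvt_rep2.mdp, ... and the 300 K default are assumptions for illustration; gen_temp should match your thermostat's ref_t, and the base file is assumed to set continuation = no, as the tutorial's NVT step does. Choosing explicit, distinct seeds instead of -1 means each repeat's starting velocities can be regenerated later.

```python
from pathlib import Path

# .mdp keys that control initial-velocity generation (GROMACS accepts '-' or '_').
VELOCITY_KEYS = {"gen_vel", "gen_temp", "gen_seed"}


def mdp_key(line: str) -> str:
    """Return the normalised parameter name of an .mdp line, or '' if there is none."""
    body = line.split(";", 1)[0]  # drop ';' comments
    if "=" not in body:
        return ""
    return body.split("=", 1)[0].strip().lower().replace("-", "_")


def with_velocity_seed(base_mdp: str, seed: int, temperature_k: float = 300.0) -> str:
    """Drop any existing gen_* lines and append gen_vel = yes with the given seed."""
    kept = [ln for ln in base_mdp.splitlines() if mdp_key(ln) not in VELOCITY_KEYS]
    kept += [
        "gen_vel  = yes",
        f"gen_temp = {temperature_k:g}",  # should match ref_t
        f"gen_seed = {seed}",             # -1 would let grompp pick a random seed
    ]
    return "\n".join(kept) + "\n"


def write_repeat_mdps(base_path: Path, seeds: list[int], out_dir: Path) -> list[Path]:
    """Write one NVT .mdp per seed: nvt_rep1.mdp, nvt_rep2.mdp, ... (hypothetical names)."""
    base = base_path.read_text()
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, seed in enumerate(seeds, start=1):
        path = out_dir / f"nvt_rep{i}.mdp"
        path.write_text(with_velocity_seed(base, seed))
        paths.append(path)
    return paths
```

For example, write_repeat_mdps(Path("nvt.mdp"), [1101, 2202, 3303], Path(".")) would produce three NVT inputs whose only difference is gen_seed (the seed values themselves are arbitrary).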

3) How, and at which step, does the "natural chaotic divergence during equilibration" show up?

The link says: "The Central Limit Theorem tells us that in the case of infinitely long simulation all observables converge to their equilibrium values". But I think this kind of "equilibrium" is not reachable in practice for a protein in MD. For example, a protein simulated at 370 K will ultimately unfold, like an egg boiled in water, but boiling an egg takes about 10 min, whereas the time scales accessible to MD are far shorter, usually a few hundred ns. We could practically "never" see the protein converge within such a short period.
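To put a number on that gap, using the figures quoted above (about 10 min, and assuming 300 ns as a stand-in for "a few hundred ns"):

```python
unfold_time_s = 10 * 60   # ~10 min, the egg-boiling figure quoted above
md_length_s = 300e-9      # assumption: 300 ns for "a few hundred ns"

# How many trajectory lengths fit into the macroscopic time scale.
ratio = unfold_time_s / md_length_s
print(f"required time is ~{ratio:.0e} x the MD trajectory length")
```

That is a gap of roughly nine orders of magnitude.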

So my understanding is that "equilibrium" here refers to the equilibration of temperature, pressure and density, not of the protein itself. Is that correct?
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/06_equil.html
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/07_equil2.html
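One informal way to check that kind of equilibration is to extract temperature, pressure or density with gmx energy and confirm they fluctuate around a stable mean. A minimal sketch, assuming an .xvg file written by gmx energy (lines starting with # or @ are metadata, data columns are whitespace-separated, column 0 is time). The plateau criterion, that the means of the first and second halves agree within an absolute tolerance you choose, is a hypothetical heuristic, not a GROMACS rule; an absolute tolerance is used because pressure fluctuates by large amounts around a small mean.

```python
from pathlib import Path
from statistics import mean


def read_xvg_column(path: Path, column: int = 1) -> list[float]:
    """Read one data column from a GROMACS .xvg file (column 0 is time).

    Lines starting with '#' (comments) or '@' (xmgrace directives) are skipped.
    """
    values = []
    for line in path.read_text().splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#@":
            continue
        values.append(float(stripped.split()[column]))
    return values


def halves_agree(values: list[float], abs_tol: float) -> bool:
    """Heuristic plateau test: do the first- and second-half means differ by < abs_tol?"""
    if len(values) < 4:
        raise ValueError("need at least a few samples")
    half = len(values) // 2
    return abs(mean(values[:half]) - mean(values[half:])) < abs_tol
```

For instance, halves_agree(read_xvg_column(Path("density.xvg")), abs_tol=5.0) would test an NPT density series with an assumed 5 kg/m^3 tolerance.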

Yours sincerely
Cheng

------------------ Original ------------------
From:  "ZHANG Cheng";<272699575 at qq.com>;
Date:  Wed, Jan 10, 2018 09:11 PM
To:  "gromacs.org_gmx-users"<gromacs.org_gmx-users at maillist.sys.kth.se>;

Subject:  What is the most reliable way to run repeats for reproducibility?

Dear Gromacs,
After reading Justin's lysozyme tutorial, I can think of several ways of running repeats.

The 1st way: all repeats start from the same em.tpr produced by energy minimization (EM), and each repeat runs its own subsequent steps (NVT, NPT and production MD):
- repeat 1: same em.tpr → NVT → NPT → md_0_1.tpr → production MD
- repeat 2: same em.tpr → NVT → NPT → md_0_1.tpr → production MD
- repeat 3: same em.tpr → NVT → NPT → md_0_1.tpr → production MD
......
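A dry-run sketch of the 1st way, assuming the grompp/mdrun steps and file names of the lysozyme tutorial. Note that the tutorial feeds the minimised coordinates em.gro (rather than em.tpr) into the NVT grompp step. The per-repeat names (nvt_rep1.mdp, npt_rep1, md_0_1_rep1, ...) are hypothetical, and each nvt_rep<N>.mdp is assumed to carry its own gen_seed. Commands are only printed unless dry_run=False; on a cluster you would more likely put each repeat's commands into its own job script.

```python
import shlex
import subprocess


def way1_commands(rep: int) -> list[list[str]]:
    """gmx commands for one repeat of the 1st way (file names are assumptions).

    Every repeat starts from the same minimised structure em.gro but uses its
    own NVT .mdp (hypothetical nvt_rep<N>.mdp) with a distinct gen_seed.
    """
    nvt, npt, md = f"nvt_rep{rep}", f"npt_rep{rep}", f"md_0_1_rep{rep}"
    return [
        ["gmx", "grompp", "-f", f"{nvt}.mdp", "-c", "em.gro", "-r", "em.gro",
         "-p", "topol.top", "-o", f"{nvt}.tpr"],
        ["gmx", "mdrun", "-deffnm", nvt],
        ["gmx", "grompp", "-f", "npt.mdp", "-c", f"{nvt}.gro", "-r", f"{nvt}.gro",
         "-t", f"{nvt}.cpt", "-p", "topol.top", "-o", f"{npt}.tpr"],
        ["gmx", "mdrun", "-deffnm", npt],
        ["gmx", "grompp", "-f", "md.mdp", "-c", f"{npt}.gro", "-t", f"{npt}.cpt",
         "-p", "topol.top", "-o", f"{md}.tpr"],
        ["gmx", "mdrun", "-deffnm", md],
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; only execute it when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Usage: for rep in (1, 2, 3): run_all(way1_commands(rep)).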

The 2nd way: all repeats start from the same md_0_1.tpr, which is used for separate production runs:
- repeat 1: same em.tpr → same NVT → same NPT → same md_0_1.tpr → production MD
- repeat 2: same md_0_1.tpr → production MD
- repeat 3: same md_0_1.tpr → production MD
......
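A sketch of the commands for the 2nd way (the names md_0_1_rep1, ... are hypothetical). The shared .tpr is copied to a per-repeat name so that -deffnm gives each repeat its own output files. Because every repeat reads the same md_0_1.tpr, all of them start from identical coordinates and velocities, and with the tutorial's md.mdp (continuation = yes, gen_vel = no) the thermostat's random seed is stored in the .tpr as well. As far as I understand, such repeats then differ only because mdrun is not bitwise reproducible between runs (e.g. different parallel decomposition or reduction order), so the divergence is real but not controlled.

```python
def way2_commands(n_repeats: int, tpr: str = "md_0_1.tpr") -> list[list[str]]:
    """Commands for the 2nd way: every repeat re-runs the same production .tpr.

    The shared tpr is copied to a hypothetical per-repeat name
    (md_0_1_rep<N>.tpr) so that -deffnm keeps the outputs separate.
    """
    commands = []
    for rep in range(1, n_repeats + 1):
        stem = f"md_0_1_rep{rep}"
        commands.append(["cp", tpr, f"{stem}.tpr"])
        commands.append(["gmx", "mdrun", "-deffnm", stem])
    return commands
```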

The 3rd way: all repeats start from the same checkpoint file taken partway through the production run, and each continues the rest of the production MD from it:
- repeat 1: same em.tpr → same NVT → same NPT → same md_0_1.tpr → same production MD for 50 ns → same .cpt file → production MD for another 200 ns
- repeat 2: same .cpt file → production MD for another 200 ns
- repeat 3: same .cpt file → production MD for another 200 ns
......
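A sketch of the 3rd way, following the gmx convert-tpr -extend / gmx mdrun -cpi pattern that the GROMACS documentation describes for extending simulations (-extend is given in ps, so 200 ns = 200000 ps). The directory names rep1/, rep2/, ... are assumptions: each repeat runs in its own directory, and -noappend keeps mdrun from trying to append to the original md_0_1.* output files, so it writes new .partNNNN files instead. Since every repeat resumes from the same checkpoint, they all share the same coordinates and velocities at the branch point.

```python
from pathlib import Path


def way3_plan(n_repeats: int, extend_ps: int = 200_000) -> list[tuple[Path, list[str]]]:
    """(working directory, command) pairs for the 3rd way.

    1. Extend the run length once with gmx convert-tpr (-extend is in ps).
    2. Each repeat continues from the same checkpoint in its own directory
       (hypothetical rep1/, rep2/, ...), with -noappend so outputs don't collide.
    """
    plan = [(Path("."), ["gmx", "convert-tpr", "-s", "md_0_1.tpr",
                         "-extend", str(extend_ps), "-o", "md_ext.tpr"])]
    for rep in range(1, n_repeats + 1):
        workdir = Path(f"rep{rep}")
        plan.append((workdir, ["gmx", "mdrun", "-s", "../md_ext.tpr",
                               "-cpi", "../md_0_1.cpt", "-noappend"]))
    return plan
```

To execute the plan, each directory would need to be created first (workdir.mkdir(exist_ok=True)) before calling subprocess.run(cmd, cwd=workdir, check=True).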

Of course, the 3rd way is the easiest. But does that mean it may not sample enough conformations, since its repeats tend to resemble each other more closely than those of the 1st way? Is there a standard way to handle repeats?

Thank you.

Yours sincerely
Cheng