[gmx-users] Error bars - g_wham
Jochen Hub
jhub at gwdg.de
Fri Mar 1 15:43:33 CET 2013
Hi Stephen,
computing errors from umbrella sampling is not trivial at al.
Generally, there are two possibilities:
- If each histogram overlaps only with one neighboring histogram, you
*must* know the autocorrelation time of each window. This is often a
problem in MD simulations, because there may be hidden slow transitions.
In my experience, you always underestimate the autocorrelation time and
as a consequence the statistical error. However, if you do know the
autocorrelation time, then you can either (a) compute the PMF from
blocks of the data (how you did), but this only works if the blocks are
longer than the autocorrelation time or (b) use "g_wham -bs-method traj"
to generate new "synthentic" histograms for bootstrapping (which
incorporate the autocorrelation time). Important: You cannot do
bootstrapping of complete histograms, because there is only one
histogram at each window.
- If you have many histograms overlapping each other, or if you did
umbrella sampling at each window multiple times, you can assume that the
different histograms represent all possible histograms at the respective
position of the reaction coordinate. Then, you can do bootstrapping of
histograms with g_wham -bs-method b-hist. It is not obvious how many
histograms you need at each position, but maybe 10 is a reasonable number.
Coming to your error from blocks of data (10-30, 30-50, 50-70, 70-100
ns). The small error you get could mean two things:
a) you have a well-converged PMF (good)
b) you have long autocorrelations. Therefore, the histograms from the
blocks are similar. Therefore, you underestimate the error (bad).
So you see, it is not trivial to estimate errors from umbrella sampling.
My experience is that bootstrapping of histograms is more reliable, but
it requires that you have multiple histograms at each position (and
these histograms should be uncorrelated!!). But at least, this way you
do not need to know the autocorrelation times, but instead "only" need
to generate histograms which are independent. The latter is easier in
practice, because independent simulations are more likely to be
uncorrelated than frames *within* one simulation.
I hope this helps a bit.
Cheers,
Jochen
Am 2/19/13 3:22 PM, schrieb Steven Neumann:
> Dear Gmx Users,
>
> I run 10 US windows of 100 ns each - ion binding protein. I have a
> great convergence of profiles and good windows overlap.I tried to see
> PMF profiles from 10-30, 30-50, 50-70, 70-100 ns and they look very
> similar (Max. error would be 0.2 kcal/mol). The overal deltaG is about
> -5 kcal/mol.
>
> When I use g_wham with -nBootsrap 200 and -bins 200 I get error bars
> of -1.2 kcal/mol which is very significant.
> How can I impove my error bars? Why they are so large?
>
> Steven
>
--
---------------------------------------------------
Dr. Jochen Hub
Computational Molecular Biophysics Group
Institute for Microbiology and Genetics
Georg-August-University of Göttingen
Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany.
Phone: +49-551-39-14189
http://cmb.bio.uni-goettingen.de/
---------------------------------------------------
More information about the gromacs.org_gmx-users
mailing list