[gmx-users] Error bars - g_wham

Fri Mar 1 15:43:33 CET 2013

Hi Steven,

computing errors from umbrella sampling is not trivial at all.

Generally, there are two possibilities:

- If each histogram overlaps only with one neighboring histogram, you 
*must* know the autocorrelation time of each window. This is often a 
problem in MD simulations, because there may be hidden slow transitions. 
In my experience, you always underestimate the autocorrelation time and 
as a consequence the statistical error. However, if you do know the 
autocorrelation time, then you can either (a) compute the PMF from 
blocks of the data (as you did), but this only works if the blocks are 
longer than the autocorrelation time, or (b) use "g_wham -bs-method traj" 
to generate new "synthetic" histograms for bootstrapping (which 
incorporate the autocorrelation time). Important: You cannot do 
bootstrapping of complete histograms, because there is only one 
histogram at each window.

- If you have many histograms overlapping each other, or if you did 
umbrella sampling at each window multiple times, you can assume that the 
different histograms represent all possible histograms at the respective 
position of the reaction coordinate. Then, you can do bootstrapping of 
histograms with g_wham -bs-method b-hist. It is not obvious how many 
histograms you need at each position, but maybe 10 is a reasonable number.
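
To make the first case a bit more concrete, here is a minimal Python sketch of how one might estimate the integrated autocorrelation time from the reaction-coordinate time series of a single window. It is only an illustration and not what g_wham does internally: the function names (integrated_autocorr_time, ar1_series), the cutoff at the first non-positive value of the autocorrelation function, the max_lag cap, and the synthetic AR(1) test data are all assumptions made for this example.

```python
import random


def ar1_series(n, phi, seed=0):
    """Synthetic correlated series x[t] = phi * x[t-1] + noise (toy data)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def integrated_autocorr_time(x, dt=1.0, max_lag=500):
    """Crude estimate of the integrated autocorrelation time of x.

    Uses tau = 1/2 + sum of the normalized autocorrelation function over
    positive lags, truncated at the first non-positive value (a simple,
    assumed cutoff rule) or at max_lag, whichever comes first.
    The result is in units of dt (the time between frames).
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev) / n
    tau = 0.5  # contribution of lag 0
    for lag in range(1, min(max_lag, n // 2)):
        c = sum(dev[i] * dev[i + lag] for i in range(n - lag))
        c /= (n - lag) * var
        if c <= 0.0:
            break
        tau += c
    return tau * dt


# Toy usage: a correlated series and an uncorrelated one.
correlated = ar1_series(10000, 0.8, seed=1)
white = ar1_series(10000, 0.0, seed=2)
print("tau (correlated):", integrated_autocorr_time(correlated))
print("tau (white noise):", integrated_autocorr_time(white))
```

Keep the caveat from above in mind: an estimate like this only sees correlations that show up within the simulated time, so hidden slow transitions still lead to an underestimate.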

Regarding your error estimate from blocks of data (10-30, 30-50, 50-70, 
70-100 ns): the small error you get could mean one of two things:
a) you have a well-converged PMF (good)
b) you have long autocorrelations, so the histograms from the 
blocks are similar and you underestimate the error (bad).
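
Case (b) can be illustrated with synthetic data. The following Python sketch is a toy under stated assumptions: block_standard_error and ar1_series are hypothetical names, and an artificial AR(1) series stands in for a correlated observable from one window. It computes the standard error of the mean from block averages for several block lengths.

```python
import math
import random


def ar1_series(n, phi, seed=0):
    """Synthetic correlated series x[t] = phi * x[t-1] + noise (toy data)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def block_standard_error(x, block_len):
    """Standard error of the mean of x from non-overlapping block averages."""
    n_blocks = len(x) // block_len
    means = [
        sum(x[b * block_len:(b + 1) * block_len]) / block_len
        for b in range(n_blocks)
    ]
    grand = sum(means) / n_blocks
    var = sum((m - grand) ** 2 for m in means) / (n_blocks - 1)
    return math.sqrt(var / n_blocks)


# Strongly correlated toy series (phi = 0.95 is an arbitrary choice).
series = ar1_series(40000, 0.95, seed=3)
for block_len in (1, 10, 100, 1000):
    print(block_len, block_standard_error(series, block_len))
```

If the estimated error keeps growing as the blocks get longer, the blocks are still shorter than the autocorrelation time, and the smaller values should not be trusted.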

So you see, it is not trivial to estimate errors from umbrella sampling. 
My experience is that bootstrapping of histograms is more reliable, but 
it requires that you have multiple histograms at each position (and 
these histograms should be uncorrelated!!). But at least, this way you 
do not need to know the autocorrelation times, but instead "only" need 
to generate histograms which are independent. The latter is easier in 
practice, because independent simulations are more likely to be 
uncorrelated than frames *within* one simulation.
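
The idea of bootstrapping complete histograms can be sketched in a few lines of Python. This is ordinary resampling with replacement, shown for illustration only; it is not g_wham's implementation. The names bootstrap_error and pooled_mean_bin are hypothetical, the histogram counts are made up, and the toy estimator (mean bin index of the pooled counts) merely stands in for a real WHAM calculation of the PMF.

```python
import random
import statistics


def bootstrap_error(hists_by_window, estimator, n_boot=200, seed=0):
    """Bootstrap complete histograms and return the spread of an estimate.

    hists_by_window: one entry per umbrella window; each entry is a list
        of independent histograms (lists of bin counts) for that window.
    estimator: callable that maps a resampled set of histograms (same
        layout as hists_by_window) to a single number.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        # Within each window, draw as many histograms as there are,
        # with replacement, treating each histogram as one data point.
        resampled = [
            [rng.choice(hists) for _ in hists]
            for hists in hists_by_window
        ]
        values.append(estimator(resampled))
    return statistics.stdev(values)


def pooled_mean_bin(resampled):
    """Toy estimator (stand-in for WHAM): mean bin index of pooled counts."""
    total = 0
    weighted = 0
    for hists in resampled:
        for hist in hists:
            for i, count in enumerate(hist):
                total += count
                weighted += i * count
    return weighted / total


# Made-up data: two windows, three independent histograms each.
example = [
    [[5, 3, 0, 0], [3, 5, 0, 0], [4, 4, 0, 0]],
    [[0, 0, 6, 2], [0, 0, 2, 6], [0, 1, 4, 3]],
]
print("bootstrap error:", bootstrap_error(example, pooled_mean_bin))
```

With only one histogram per window, every resample is identical and the estimated error collapses to zero, which is why this approach needs multiple independent histograms at each position.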

I hope this helps a bit.

Cheers,
Jochen

Am 2/19/13 3:22 PM, schrieb Steven Neumann:
> Dear Gmx Users,
>
> I run 10 US windows of 100 ns each - ion binding protein. I have a
> great convergence of profiles and good window overlap. I tried to see
> PMF profiles from 10-30, 30-50, 50-70, 70-100 ns and they look very
> similar (Max. error would be 0.2 kcal/mol). The overall deltaG is about
> -5 kcal/mol.
>
> When I use g_wham with -nBootstrap 200 and -bins 200 I get error bars
> of -1.2 kcal/mol which is very significant.
> How can I improve my error bars? Why are they so large?
>
> Steven
>

-- 
---------------------------------------------------
Dr. Jochen Hub
Computational Molecular Biophysics Group
Institute for Microbiology and Genetics
Georg-August-University of Göttingen
Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany.
Phone: +49-551-39-14189
http://cmb.bio.uni-goettingen.de/
---------------------------------------------------