# [gmx-users] FEP and error estimation

David Mobley dmobley at gmail.com
Mon Jan 9 19:06:54 CET 2006

```
Maik,

The stdev/sqrt(n-1) will underestimate the true error, and even what your
estimate *should* be, because it neglects the time correlation in the data
series. Depending on what you're doing, the correlation time can be long
(for example, I have some binding affinity calculations where the ligand
can undergo large-scale changes in orientation on a timescale of order
1 ns). The point is that you really need to take this correlation time into
account: you *do* have a lot of samples, but they're not independent. What
you really want is to compute something like stdev/sqrt(n_eff), where n_eff
is the effective number of independent samples you have.

Anyway, the best way to do this is to compute the statistical inefficiency
g from the normalized autocorrelation function of the data (g is roughly
the number of correlated samples per independent one, so n_eff is about
n/g). Then you can compute stdev/sqrt((n-1)/g) or some such.

Check out this recent paper from our group for some information on this:
http://www.dillgroup.ucsf.edu/dl_papers/replica-exchange-wham.pdf. Most of
the paper is of course not relevant to what you are doing, but see
especially sections 2.4 and 5.2. I've personally been using some of the
techniques in this paper (and those it references) to do exactly the sort of
error analysis you're trying to do, in collaboration with J. Chodera, so get
back to us if you need any help or further information after reading the
relevant sections of the paper.

I want to emphasize that this careful accounting of error is *very*
important. One thing I've found when doing careful free energy calculations
with good error analysis in this way is that it's hard to get good results,
and probably a lot of the published results without careful error analysis
actually have very large errors that the authors are unaware of.

Thanks,
David

On 1/9/06, Maik Goette <mgoette at mpi-bpc.mpg.de> wrote:
>
> Dear all
>
> I was asking myself how to get an adequate error for FEP simulations.
> g_analyze spits out two values:
> The standard deviation (which does not seem suitable for a correct error
> estimate) and std.dev./sqrt(n-1).
> I now think that the 2nd one is the method Berk derived in his PhD
> thesis and therefore the better one(?). But I am not really sure about
> this.
> Any comment?
>
> Thank you
>
> --
> Maik Goette, Dipl. Biol.
> Max Planck Institute for Biophysical Chemistry
> Theoretical & computational biophysics department
> Am Fassberg 11
> 37077 Goettingen
> Germany
> Tel.  : ++49 551 201 2310
> Fax   : ++49 551 201 2302
> Email : mgoette[at]mpi-bpc.mpg.de
>          mgoette2[at]gwdg.de
> WWW   : http://www.mpibpc.gwdg.de/groups/grubmueller/
```
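
As a rough illustration of the correction David describes, here is a minimal Python sketch that estimates the statistical inefficiency g from the normalized autocorrelation function and uses it to rescale the naive standard error. It is not taken from the post or from the paper it links to. The function names, the use of NumPy, and the rule of truncating the sum at the first non-positive autocorrelation value are all assumptions made for this example, and the demo data is a synthetic correlated series rather than real FEP output.

```python
import numpy as np


def statistical_inefficiency(x):
    """Estimate g = 1 + 2 * sum_t (1 - t/n) * C(t) for a 1-D time series.

    C(t) is the normalized autocorrelation function. The sum is cut off at
    the first lag where the estimate of C(t) is not positive; this cutoff
    is a heuristic chosen for this sketch, not something the post specifies.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    dx = x - x.mean()
    var = np.dot(dx, dx) / n
    if var == 0.0:
        return 1.0  # constant series: nothing to correct
    g = 1.0
    for t in range(1, n - 1):
        c = np.dot(dx[:-t], dx[t:]) / ((n - t) * var)
        if c <= 0.0:
            break
        g += 2.0 * c * (1.0 - t / n)
    return max(g, 1.0)


def correlated_standard_error(x):
    """Return (standard error using n_eff = n / g, g).

    Using the sample stdev (ddof=1) with sqrt(n / g) is algebraically the
    same as the population stdev (ddof=0) divided by sqrt((n - 1) / g).
    """
    x = np.asarray(x, dtype=float)
    g = statistical_inefficiency(x)
    n_eff = x.size / g
    return x.std(ddof=1) / np.sqrt(n_eff), g


# Demo on synthetic correlated data (an AR(1) process standing in for a
# time series of free energy samples).
demo_rng = np.random.default_rng(0)
demo = np.zeros(5000)
for i in range(1, demo.size):
    demo[i] = 0.9 * demo[i - 1] + demo_rng.normal()

naive_err = demo.std(ddof=1) / np.sqrt(demo.size)
corrected_err, g_demo = correlated_standard_error(demo)
print(f"g = {g_demo:.1f}")
print(f"naive error = {naive_err:.4f}, corrected error = {corrected_err:.4f}")
```

For a strongly correlated series the corrected error should come out noticeably larger than the naive one, which is the underestimate the post warns about. In practice the series also has to be much longer than the correlation time for the estimate of g itself to be trustworthy.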