[gmx-users] error estimates of free energy calculations

Tue May 15 19:41:34 CEST 2007

Hi David and Berk,

Thanks a lot for all of your suggestions, which I used to take 
another look at my protocol. And I think I've found the main source 
of my problem:

Starting with a file containing three columns (for lambda, dG/dl, and 
the error in dG/dl calculated by "g_analyze -ee"), I ran "g_analyze -
integrate -xydy" on this file. Looking at the source code (function 
evaluate_integral in autocorr.c), this procedure works fine for 
integrating the dG/dl's using the trapezium rule.

However, in my opinion (assuming the total error should be calculated 
from standard error propagation), the way g_analyze calculates the 
total error is incorrect. It results in an overestimation of this 
total error by at least a factor of sqrt((number of lambda points) - 
1). This factor is 4.5 in my case, accounting for the larger part of 
my previoulsy mentioned loss of accuracy. And indeed, as you guys 
already mentioned, the remaining part may be explained from the fact 
that Berk uses twice as many lambda values.

Anyway, I'm writing my own small program now to correctly determine 
the total error through standard error propagation. Depending on 
whether this influences other code or functionality, the developers 
might also consider implementing this in evaluate_integral in 
autocorr.c. Just let me know if you need more info.

Thanks again,
Jeroen

> Date: Fri, 11 May 2007 17:33:06 +0200
> From: "Berk Hess" <gmx3 at hotmail.com>
> Subject: Re: [gmx-users] error estimates of free energy calculations
> To: gmx-users at gromacs.org
> Message-ID: <BAY110-F10785A66707D8D56CB91178E390 at phx.gbl>
> Content-Type: text/plain; format=flowed
> 
> 
> 
> 
> >From: "David Mobley" <dmobley at gmail.com>
> >Reply-To: Discussion list for GROMACS users <gmx-users at gromacs.org>
> >To: "Discussion list for GROMACS users" <gmx-users at gromacs.org>
> >Subject: Re: [gmx-users] error estimates of free energy calculations
> >Date: Fri, 11 May 2007 08:19:13 -0700
> >
> >Hi,
> >
> >Berk may weigh in on this, but...
> >
> >>To test my protocol, I have been trying to reproduce some of Berk
> >>Hess's hydration free energy results from his 2006 paper (J. Phys.
> >>Chem. B 110, 17616-17626, 2006). Although my obtained average free
> >>energies are quite comparable, my error estimates are much higher
> >>than the errors reported in that paper. I'd like to figure out what's
> >>causing this difference in accuracy, because I really need optimal
> >>accuracy for my future calculations.
> >>
> >>I'm using GROMACS 3.3.1, the G53a6 force field and 3.0 nm
> >>dodecahedron simulation boxes with ~631 SPC waters. After
> >>mimimization and 500 ps of equilibration for each of the 21 equally
> >>spaced lambda-values, I simulated 5 ns for the propane, phenylalanine
> >>and asparagine analogues. Theoretically, this should give me a
> >>significantly better accuracy than the paper's 500 ps production runs
> >>(approx. 1/sqrt(10) times better).
> >
> >Looks to me like Berk is using 42 lambda values in solvent, as he
> >spaces them more tightly around the feature in the curve, and 29 in
> >vacuum. This should give lower uncertainties.
> 
> I think this could already explain the difference.
> 
> When decoupling Coulomb and LJ at the same time you get nasty
> behavior especially around lambda=0. I would guess nearly all
> your error comes from this point and maybe some from the middle peak.
> As David says, it is much easier to decouple separately.
> If you want to do it at the same time, you need to use an optimized spacing.
> 
> Also you said you integrated the error directly?
> This is not correct, you should integrate the variance.
> That would make your errors even worse.
> 
> Berk.