[gmx-users] error estimates of free energy calculations

Thu May 10 23:57:59 CEST 2007

Dear GROMACS users,

To test my protocol, I have been trying to reproduce some of Berk 
Hess's hydration free energy results from his 2006 paper (J. Phys. 
Chem. B 110, 17616-17626, 2006). Although my obtained average free 
energies are quite comparable, my error estimates are much higher 
than the errors reported in that paper. I'd like to figure out what's 
causing this difference in accuracy, because I really need optimal 
accuracy for my future calculations.

I'm using GROMACS 3.3.1, the G53a6 force field and 3.0 nm 
dodecahedron simulation boxes with ~631 SPC waters. After 
mimimization and 500 ps of equilibration for each of the 21 equally 
spaced lambda-values, I simulated 5 ns for the propane, phenylalanine 
and asparagine analogues. Theoretically, this should give me a 
significantly better accuracy than the paper's 500 ps production runs 
(approx. 1/sqrt(10) times better).

Below the .mdp file I used for te production run of the solvated 
system:

=== .mdp file ===
; RUN CONTROL PARAMETERS
integrator               = sd
dt                       = 0.002
nsteps                   = 2500000
comm_mode                = none

; OUTPUT CONTROL OPTIONS
nstxout                  = 50000
nstvout                  = 50000
nstfout                  = 50000
nstlog                   = 50000
nstxtcout                = 100

; NEIGHBORSEARCHING PARAMETERS
nstlist                  = 5

; OPTIONS FOR ELECTROSTATICS AND VDW
coulombtype              = pme
rvdw                     = 1.4

; OPTIONS FOR WEAK COUPLING ALGORITHMS
tc_grps                  = system
tau_t                    = 1.0
ref_t                    = 298.15
pcoupl                   = berendsen
tau_p                    = 0.5
compressibility          = 4.5e-5
ref_p                    = 1.01325

; OPTIONS FOR BONDS
constraints              = all-bonds

; FREE ENERGY CONTROL STUFF
free_energy              = yes
init_lambda              = 0.00
sc_alpha                 = 0.6
sc_power                 = 1.0
sc_sigma                 = 0.28
=== end of .mdp file ===

For the in vacuo simulations a similar protocol was used, but without 
pbc and cutoffs. The error estimates of the in vacuo run were 
negligibly small compared to those of the solvated system.

To the best of my knowledge, the only real difference in parameters 
is the pressure coupling. The paper uses stochastic pressure coupling 
which is not available in standard GROMACS 3.3.1. But I doubt that 
that should make this much of a difference.

My results in kJ/mol:
Val: 8.0 +/- 0.3
Phe: -1.9 +/- 0.6
Asn: -47.0 +/- 0.6

When I limt my calculation to the first 500 ps of my production runs, 
to be able to compare them directly to the paper's results, it 
unsurprisingly gets even worse (paper's results in parentheses):
Val: -7.7 +/- 1.0   (7.8 +/- 0.2)
Phe: -1.7 +/- 2.0   (-2.0 +/- 0.3)
Asn: -47.1 +/- 1.7   (-43.1 +/- 0.3)

My averages and error estimates for each lamda value were calculated 
with the -ee option of g_analyze (a block average fitting procedure, 
similar to the paper). The average free energies and total error 
estimates for each lambda-value were integrated with g_analyze -
integrate -xydy.

My accuracy is also significantly worse than the ones reported in A. 
Villa, J Comput Chem 23, 548-553, 2002 and J. Maccallum, J Comput 
Chem 24, 1930-1935, 2003. And it doesn't even come near to the 
extraordinary accuracy reported by Michael Shirts in his papers (who 
btw also uses 5 ns runs).

I read the recent discussion on this list about error estimation of 
free energie calulations (starting with 
http://www.gromacs.org/pipermail/gmx-users/2007-May/027237.html). But 
since I'm trying to reproduce Berk's results, also using the same 
error estimation procedure, I don't think that discussion gives a 
good explanation for my deviations.

Does somebody have a clue why my error estimates consistently are 
approx. 6 times worse? Am I doing something totally wrong here, 
either during simulation or during analysis? Should my error 
estimates be divided by some number that I'm not aware of?

I know this is a rather detailed question, but I really hope someone 
can help me with this. 

Many thanks in advance,
Jeroen