[gmx-users] Very large fluctuations in dg/dl

Patrick Fuchs Patrick.Fuchs at ebgm.jussieu.fr
Tue May 8 12:20:54 CEST 2007


Hi John,
thanks a lot for your reply.
Indeed, the standard deviation I presented in my previous post is the one 
of dg/dl samples. I was just surprised by the fact the std. dev. is always 
larger than the value itself (since I'm starting with FE calculation I had 
no expectation of what the behavior would be and needed a confirmation).
I was also suprised about the convergence of the mean. I put a plot 
for each lambda value at the URL:
http://condor.ebgm.jussieu.fr/~fuchs/download/convergence.png
(at each time step, I recalculate the mean from the beginning). My first 
observation was that 1 ns seemed to be a minimum for certain lambda values
(e.g. lambda=0.70). I sometimes read in literature that some authors 
used a few hundreds of ps, which seemed (to me) not sufficient for 
proper convergence.
Now, if we come back to the error estimate of the mean, I found (for 
lambda=0.00) 0.2 kJ/mol using block averaging (using the -ee option of 
g_analyze), which is reasonable I imagine (even if higher precisions 
have been described in literature). I'm not sure whether this is the same
way of calculating the uncertainty compared to what you proposed.
Can you confirm?
I will have a look to the book you mentioned, thanks for the pointer.
Cheers,

Patrick

On Mon, 7 May 2007, John D. Chodera wrote:

> Hi Patrick,
>
>> I find a reasonable DeltaGsol value of 8.6 kJ/mol for methane (compared
>> to 8.7 in Geerke & van Gunsteren, ChemPhysChem 2006, 7, 671 ? 678) but
>> I get really huge fluctuations in the values of dg/dl:
>> lambda=0.00: 5.0 +/- 10.8 (mean +/- standard deviation)
>> lambda=0.05: 4.3 +/- 11.2
>
> Is the standard deviation you quote here the standard deviation of the dg/dl 
> samples, or the standard deviation of the mean?
>
> If the former, then this behavior is totally expected: While the standard 
> deviation of a random variable (your dg/dl samples) may be large, with enough 
> sampling, we can get a very precise estimate of the mean.  More sampling will 
> not change the standard deviation of the dg/dl samples, but it will reduce 
> the standard error in the mean, which is what we need for precise estimates 
> of free energy differences.
>
> The uncertainty in the estimate of <dg/dl> is given simply by
>
> d<dg/dl> = sigma / sqrt(N / g)
>
> where here, sigma is the standard deviation of your dg/dl samples, N is the 
> number of data points you have collected, and g is something called the 
> "statistical inefficiency", which can be estimated from the correlation time 
> of your dg/dl samples.  More information on this sort of analysis can be 
> found in reference [1] below.
>
> Once you have the uncertainty in each estimate of <dg/dl>, you still have to 
> combine these to get the uncertainty estimate for the integrated free energy 
> difference using standard propagation of error.  This depends on your choice 
> of quadrature for TI.  David Mobley has done a lot of this, and I'm sure 
> would be willing to help if you had trouble figuring it out.
>
> Good luck!
>
> - John
>
> [1] W. Janke. Statistical analysis of simulations: Data correlations and 
> error estimation. In J. Grotendorst, D. Marx, and A. Murmatsu, editors, 
> Quantum Simulations of Complex Many-Body Systems: From Theory to Algorithms, 
> volume 10, pages 423?445. John von Neumann Institute for Computing, 2002.
>
> --
> John Chodera <jchodera at gmail.com>             | Mobile    : 415 867-7384
> Postdoctoral researcher, Pande lab            | Lab phone : 650.723.1097
> Department of Chemistry, Stanford University  | Lab fax   : 650.724.4021
> http://www.dillgroup.ucsf.edu/~jchodera
> --
>
> Date: Mon, 07 May 2007 16:41:26 +0200
> From: Patrick Fuchs <Patrick.Fuchs at ebgm.jussieu.fr>
> Subject: [gmx-users] Very large fluctuations in dg/dl
> To: gmx-users at gromacs.org
> Message-ID: <463F3A96.9050206 at ebgm.jussieu.fr>
> Content-Type: text/plain; charset=windows-1252; format=flowed
>
> Hi Gromacs users,
> I have a few questions related to solvation free energy calculation via
> thermodynamic integration.
> I'm trying to reproduce some literature data (on e.g. methane,
> methanol...) using the GROMOS G53a6 force field. I followed the tutorial
> of David Mobley (thanks to him BTW), but I used the standard non bonded
> options of the G53a6 force field (instead of OPLS). For each lambda
> value I do a minimization, a 10 ps NVT followed by a 20 ps NPT
> equilibration, and a 1 ns NVT production using the sd integrator. I used
> 21 lambda values (0.00, 0.05...1.00).
> Here's my topology file:
> ----------------begining of methane.top------------------------
> ; topology for a methane molecule
>
> ; include GROMOS53a6 force field
> #include "ffG53a6.itp"
>
> ;;;;;;; begin methane definition ;;;;;;;
> [ moleculetype ]
> ; Name           nrexcl
> METH             3
>
> [ atoms ]
> ;nr type resnr residue atom cgnr charge mass    typeB chargeB massB
> 1  CH4  1     METH    C1   0    0.0000 16.0430 DUM   0.0000  16.04300
> ;;;;;; end methane definition ;;;;;;;;
>
> ; include water topology
> #ifdef FLEX_SPC
> #include "flexspc.itp"
> #else
> #include "spc.itp"
> #endif
>
> [ system ]
> ; name
> 1 methane molecule in water
>
> [ molecules ]
> ; name  number
> METH    1
> SOL     893
> -----------------end of methane.top------------------------
>
> And here is my mdp file for lambda=0:
> ---------------begining of prod.mdp---------------------
> title               = production NVT methane/water
> cpp                 = /lib/cpp
> ; OPTIONS FOR BOND CONSTRAINTS
> constraints         = all-bonds
> ; RUN CONTROL PARAMETERS
> integrator          = sd
> tinit               = 0
> dt                  = 0.002
> nsteps              = 500000 ; 1000 ps
> ; NUMBER OF STEPS FOR CENTER OF MASS MOTION REMOVAL
> nstcomm             = 100
> ; OUTPUT CONTROL OPTIONS
> nstxout                  = 500
> nstvout                  = 500
> nstfout                  = 0
> nstlog                   = 500
> nstenergy                = 100
> nstxtcout                = 5000
> xtc-precision            = 1000
> ; NON BONDED STUFF
> ns_type             =  grid
> nstlist                = 5
> rlist                  = 0.8
> coulombtype            = generalized-reaction-field
> rcoulomb               = 1.4
> rvdw                   = 1.4
> epsilon_rf             = 54.0
> ;OPTIONS FOR TEMPERATURE COUPLING
> tc_grps                  = system
> tau_t                    = 0.1 ; inverse langevin friction cst
> ref_t                    = 300
> ;OPTIONS FOR PRESSURE COUPLING
> Pcoupl                   = no
> tau_p                    = 0.5
> compressibility          = 4.5e-5
> ref_p                    = 1.0
> ; FREE ENERGY CONTROL STUFF
> free_energy              = yes
> init_lambda              = 0.00
> delta_lambda             = 0
> sc_alpha                 = 0.5
> sc-power                 = 1.0
> sc-sigma                 = 0.3
> ; VELOCITY GENERATION
> gen_vel                  = yes
> gen_temp                 = 300
> gen_seed                 = -1
> -----------------end of prod.mdp------------------------
> I find a reasonable DeltaGsol value of 8.6 kJ/mol for methane (compared
> to 8.7 in Geerke & van Gunsteren, ChemPhysChem 2006, 7, 671 ? 678) but
> I get really huge fluctuations in the values of dg/dl:
> lambda=0.00: 5.0 +/- 10.8 (mean +/- standard deviation)
> lambda=0.05: 4.3 +/- 11.2
> ...
> lambda=1.00: -0.3 +/- 4.0
> Furthermore, each of these mean value is very slow at converging (1 ns
> seems a minimum for certain lambda values...).
> I can't get reasonable fluctuations even if I sample more. In addition,
> there are very frequent warnings in the log file such as:
> ----
> Large VCM(group rest):      0.01363,      0.00818,      0.01147,
> ekin-cm:  3.09490e+00
> ----
> Here are my questions:
> 1) Has someone an idea of what could be the cause of these [very] large
> fluctuations? Does it come from my setup, or is this a normal behavior?
> 2) Are these 'Large VCM(group rest)' warnings related to the use of sd
> integrator (when I switch to md integrator, I no longer get these
> warnings) ?
> Thanks for your answer,
>
> Patrick
>



More information about the gromacs.org_gmx-users mailing list