[gmx-users] Possible free energy bug?

Thu Mar 10 19:55:54 CET 2011

Hi All,

I've been troubleshooting a problem for some time now and I wanted to report it 
here and solicit some feedback before I submit a bug report to see if there's 
anything else I can try.

Here's the situation: I ran some free energy calculations (thermodynamic 
integration) a long time ago using version 3.3.3 to determine the hydration free 
energy of a series of small molecules.  Results were good and they ended up as 
part of a paper, so I'm trying to reproduce the methodology with 4.5.3 (using 
BAR) to see if I understand the workflow completely.  The problem is my systems 
are crashing.  The runs simply stop randomly (usually within a few hundred ps) 
with lots of LINCS warnings and step*.pdb files being written.

I know the parameters are good, and produce stable trajectories, since I spent 
months on them some years ago. The system prep is steepest descents EM to Fmax < 
100 (always achieved), NVT at 298 K for 100 ps, NPT at 298K/1 bar for 100 ps, 
then 5 ns of data collection under NPT conditions.  Here's the rundown of what 
I'm seeing:

1. All LJ transformations work fine.  The problem only comes when I have a 
molecule with full LJ interaction and I am "charging" it (i.e., introducing 
charges to the partially-interacting species).

2. Simulations at lambda=1 (full interaction) work fine.

3. Simulations with the free energy code off entirely work fine under all 
conditions.

4. I cannot run in serial due to http://redmine.gromacs.org/issues/715.  The bug 
seems to affect other systems and is not specifically related to my free energy 
calculations.

5. Running with DD fails because my system is relatively small (more on this in 
a moment).

6. Running with mdrun -pd 2 works, but mdrun -pd 4 crashes for any value of 
lambda != 1.

7. I created a larger system (instead of a 3x3x3-nm cube of water with my 
molecule, I used 4x4x4) and ran on 4 CPU's with DD (lambda = 0, i.e. full vdW, 
no intermolecular Coulombic interactions - .mdp file is below).  This run also 
crashed with some warnings about DD cell size:

DD  load balancing is limited by minimum cell size in dimension X
DD  step 329999  vol min/aver 0.748! load imb.: force 31.5%

...and then the actual crash:

-------------------------------------------------------
Program mdrun_4.5.3_gcc_mpi, VERSION 4.5.3
Source code file: domdec_con.c, line: 693

Fatal error:
DD cell 0 0 0 could only obtain 14 of the 15 atoms that are connected via 
constraints from the neighboring cells. This probably means your constraint 
lengths are too long compared to the domain decomposition cell size. Decrease 
the number of domain decomposition grid cells or lincs-order or use the -rcon 
option of mdrun.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

Watching the trajectory doesn't seem to give any useful information.  The small 
molecule of interest is at a periodic boundary when the crash happens, but there 
are several crosses prior to the crash without incident, so I don't know if the 
issue is related to PBC or not, but it appears not.

8. I initially thought the problem might be related to the barostat, but 
switching from P-R to Berendsen does not alleviate the problem, nor does 
increasing tau_p (tested 0.5, 1.0, 2.0, and 5.0 - all crash).  Longer tau_p 
simply delays the crash, but does not prevent it.

So after all that, I'm wondering if (1) anyone has seen the same, or (2) if 
there's anything else I can try (environment variables, hidden tricks, etc) that 
I can use to get to the bottom of this before I give up and file a bug report.

If you made it this far, thanks for reading my novel and hopefully someone can 
give me some ideas.  The .mdp file I'm using is below, but it is just one of 
many that I've tried.  In theory, it should work, since the parameters are the 
same as my successful 3.3.3 runs, with the exception of the new free energy 
features in 4.5.3 and obvious keyword changes related to the difference in version.

-Justin

--- .mdp file ---

; Run control
integrator               = sd       ; Langevin dynamics
tinit                    = 0
dt                       = 0.002
nsteps                   = 2500000  ; 5 ns
nstcomm                  = 100
; Output control
nstxout                  = 500
nstvout                  = 500
nstfout                  = 0
nstlog                   = 500
nstenergy                = 500
nstxtcout                = 0
xtc-precision            = 1000
; Neighborsearching and short-range nonbonded interactions
nstlist                  = 5
ns_type                  = grid
pbc                      = xyz
rlist                    = 0.9
; Electrostatics
coulombtype              = PME
rcoulomb                 = 0.9
; van der Waals
vdw-type                 = cutoff
rvdw                     = 1.4
; Apply long range dispersion corrections for Energy and Pressure
DispCorr                  = EnerPres
; Spacing for the PME/PPPM FFT grid
fourierspacing           = 0.12
; EWALD/PME/PPPM parameters
pme_order                = 4
ewald_rtol               = 1e-05
epsilon_surface          = 0
optimize_fft             = no
; Temperature coupling
; tcoupl is implicitly handled by the sd integrator
tc_grps                  = system
tau_t                    = 1.0
ref_t                    = 298
; Pressure coupling is on for NPT
Pcoupl                   = Berendsen
tau_p                    = 2.0
compressibility          = 4.5e-05
ref_p                    = 1.0
; Free energy control stuff
free_energy              = yes
init_lambda              = 0.00
delta_lambda             = 0
foreign_lambda           = 0.05
sc-alpha                 = 0
sc-power                 = 1.0
sc-sigma                 = 0
couple-moltype           = MOR      ; name of moleculetype to couple
couple-lambda0           = vdw      ; vdW interactions
couple-lambda1           = vdw-q    ; turn on everything
couple-intramol          = no
dhdl_derivatives         = yes      ; this line (and the next two) are defaults
separate_dhdl_file       = yes      ; included only for pedantry
nstdhdl                  = 10
; Do not generate velocities
gen_vel                  = no
; options for bonds
constraints              = all-bonds
; Type of constraint algorithm
constraint-algorithm     = lincs
; Constrain the starting configuration
; since we are continuing from NPT
continuation             = yes
; Highest order in the expansion of the constraint coupling matrix
lincs-order              = 4

-- 
========================================

Justin A. Lemkul
Ph.D. Candidate
ICTAS Doctoral Scholar
MILES-IGERT Trainee
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin

========================================