[gmx-users] combining differently-generated force-fields

Fri May 2 07:57:54 CEST 2008

chris.neale at utoronto.ca wrote:
> I don't have a problem, per se, but would like to discuss the problems 
> that may, or may not, arise when mixing force fields.
> 
> It is clear to me why one would not want to calculate the free energy of 
> binding for two proteins, one using the amber ff and the other using the 
> opls ff; also it is clear that there would be problems simulating a box 
> of water half of which is tip3p and half of which is spc. The common 
> thing to these examples is that such simulations would apply dissimilar 
> parameter sets for similar functional groups and therefore any results 
> could be subject to significant biases, the source of which will not be 
> obvious to the user.
> 
> However, if one were simulating the binding of a protein to DNA, or a 
> protein embedded in a lipid bilayer, the functional groups are no longer 
> shared by different types of macromolecules. Since I work on membrane 
> proteins, let me take the case of an oplsaa protein in a Berger lipid 
> bilayer. Not only are these ff's differently generated, but one is 
> all-atom and one is united-atom. The important difference in this case 
> is that there are few functional groups of the lipids that resemble 
> those of the protein e.g. the NH3 of a lipid head-group choline and a 
> lysine of the protein. Generally though, the functional groups are 
> entirely different between these macromolecules. I believe that this is 
> also the case for protein-DNA simulations. Therefore, what biases can 
> possibly occur by the combination of different ff's in this case that 
> could not also occur by combinations that exclusively use a single ff?

Force field parameters are the result of some kind of global 
optimization procedure. As such it is well-known that you should not 
expect a strong correlation between a bond stretching parameter and any 
real measure of bond strength. This is because that real interaction is 
being modeled a) approximately, and b) through model interactions not 
necessarily localised to the two bonded atoms.

One would not expect to reach the same near-global minimum after 
optimizing over protein parameters for two given sets of water 
parameters. Trivially, the water-protein Coulomb interactions will have 
to be different. Thus, the intra-protein Coulomb interactions will have 
to be different. This may directly affect some bonded interactions, 
depending on your exclusion treatment. Finally, there can be all manner 
of indirect effects that might depend on which local minimum your 
optimization ended up in. The same goes for any other sets of 
constrained and free variables you might use in a parameterization 
process, and IMO makes for a clear presumption of numerical suicide from 
mixing force fields, possibly except in some fortuitous and well-tested 
cases. Hopefully this oplsaa-Berger mix is such a case, but I don't know 
anything about it.
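
As a toy sketch of the argument above (not a real parameterization 
procedure): suppose a single protein charge is fitted so that one 
water-protein Coulomb pair reproduces a fixed target energy. The target 
energy, the distances and the single-site picture are all hypothetical; 
the two water charges are only meant to resemble the TIP3P and SPC 
oxygen charges. The fitted protein charge changes with the water model, 
and so does the intra-protein Coulomb energy computed from it.

```python
# Toy illustration only: one protein site, one water site, point charges.
# All names and numbers below are assumptions made for this sketch.

COULOMB_K = 138.935  # kJ mol^-1 nm e^-2 (electric conversion factor, GROMACS units)


def coulomb(q1, q2, distance):
    """Coulomb energy (kJ/mol) of two point charges (e) at a distance (nm)."""
    return COULOMB_K * q1 * q2 / distance


def fit_protein_charge(target_energy, water_charge, distance):
    """Solve COULOMB_K * q_p * q_w / r = target_energy for q_p."""
    return target_energy * distance / (COULOMB_K * water_charge)


TARGET = -25.0  # hypothetical water-protein target energy, kJ/mol
R_PW = 0.3      # hypothetical protein-water distance, nm
R_PP = 0.4      # hypothetical distance between two identical protein sites, nm

# Illustrative oxygen charges resembling two common water models.
water_models = {"tip3p-like": -0.834, "spc-like": -0.82}

# "Optimize" the protein charge separately against each water model.
fitted = {
    name: fit_protein_charge(TARGET, q_w, R_PW)
    for name, q_w in water_models.items()
}

# The intra-protein Coulomb energy inherits the difference.
intra = {name: coulomb(q_p, q_p, R_PP) for name, q_p in fitted.items()}

for name in water_models:
    print(f"{name}: q_protein = {fitted[name]:.4f} e, "
          f"intra-protein E = {intra[name]:.4f} kJ/mol")
```

Both fits hit the same target exactly, yet they yield different protein 
charges, so a protein charge fitted against one water model no longer 
reproduces the target when paired with the other.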

> I take the extreme example and ask: what special relevance do the opls 
> ion parameters have to the opls protein parameters? It seems to me that, 
> although they "derive them in a manner consistent with how the rest of 
> the force field was originally derived" 
> (http://wiki.gromacs.org/index.php/Parameterization), in this extreme 
> case I believe that this is an entirely abstract concept of no 
> particular value. In other words, how can Na+ possibly be generated 
> consistently/inconsistently with an amino acid that contains no Na?

In part, the general advice you cite is sound for cases where one is not 
going to do a fully rigorous test of the performance of the parameters - 
e.g. the antechamber or PRODRG approach. Using a similar methodology 
gives one some basis for optimism. Using a different one *and not 
testing* is random and asks for trouble. Using a different one *and 
testing* for performance on observables relevant to the study you wish 
to perform using those parameters seems quite reasonable to me. The only 
value in an extended MM force field is its ability to model a physical 
system featuring the elements of that extension. If you can demonstrate 
it does that well enough, then the method by which you extended it seems 
irrelevant.

Also, it may be that, in practice, passing such a test has proven 
difficult unless one has followed a similar methodology.

> To clearly state my current point of view in the absence of a shred of 
> data, I suggest the following: "One should not combine parameters that 
> are derived inconsistently of one another except in cases where such 
> combination can be made without introducing multiple parametric 
> definitions of a given functional group." 

I would disagree strongly, for the kinds of reasons given above.

> If you believe that, it would 
> therefore be acceptable to combine the following in any way: i) protein, 
> ii) water, iii) ion, iv) DNA, v) lipid, vi) carbohydrate. The seventh 
> group: small molecules, is difficult to classify since one must take 
> into consideration the specific functional groups. For example, I would 
> suggest that ATP and a protein should be fine if different ff's are 
> used, but that ATP and DNA should use a consistent ff when simulated in 
> conjunction.
> 
> As we ramp up our simulations for ever-increasing cpu power and for 
> gromacs 4, these questions are well beyond pedantic. It is one thing to 
> develop parameters for a small molecule consistently with the 
> methodology used for the protein/DNA ff. However, simulations of more 
> than one different type of macromolecule (e.g. protein-DNA simulations) 
> would greatly benefit, it seems, from the ability to use the DNA 
> parameters that lead to the most accurate sampling of DNA phase space 
> and the protein parameters that lead to the most accurate sampling of 
> protein phase space. It is my conjecture that such combinations would 
> not only be appropriate, but that they would be optimal.

These phase spaces are not independent. A solute phase space is sampled 
differently in different solvent models. There is no reason to suppose 
that the combinations you suggest would even be close to effective, 
never mind optimal.
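
A minimal sketch of that dependence, assuming a solute with just two 
conformers whose Boltzmann populations are computed under two 
hypothetical solvent models. The intrinsic and solvation energies are 
invented for illustration and do not correspond to any real force 
field.

```python
import math

KT = 2.494  # kJ/mol, approximately R*T at 300 K


def populations(energies):
    """Boltzmann populations for a list of state energies (kJ/mol)."""
    weights = [math.exp(-e / KT) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical intrinsic solute energies of conformers A and B (kJ/mol).
INTRINSIC = [0.0, 2.0]

# Hypothetical solvation energies of A and B in two solvent models (kJ/mol).
SOLVATION = {
    "solvent model 1": [-10.0, -13.0],
    "solvent model 2": [-10.0, -11.0],
}

# Total energy of each conformer is intrinsic plus solvation.
pops = {
    name: populations([u + s for u, s in zip(INTRINSIC, solv)])
    for name, solv in SOLVATION.items()
}

for name, (p_a, p_b) in pops.items():
    print(f"{name}: P(A) = {p_a:.3f}, P(B) = {p_b:.3f}")
```

With identical solute parameters, the preferred conformer flips from B 
to A when only the solvent model changes, which is the sense in which 
the solute phase space is not sampled independently of the solvent.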

> Disclaimer: If you are considering combining differently-generated 
> force-fields, please do not take this post as encouragement. The 
> standard logic never to combine force-fields is still recommended. I 
> only wanted to have some discussion on this topic.
> 
> Thanks for all comments, especially those that are in disagreement with 
> my proposition.

You're welcome :-)

Mark