[gmx-users] PMF Transmembrane proteins

Nash, Anthony Anthony.Nash at warwick.ac.uk
Sun Dec 30 20:57:34 CET 2012

Dear Justin

Thanks for your reply.

I am studying a TM peptide, looking at how favourable the front and reverse face are. Hence, different orientations but the same composition of amino acids. 

I then have a duplicate of this system (call it system B), but with a couple of residue substitutions. I am comparing free energies of System A front orientation with System A back orientation (and then B with B). 

I have separated the peptides, and ran umbrella simulations every 1 angstrom apart (I experimented with windows closer together, in some regions hence the generalisation to half an angstrom). Papers I have looked through which sample up to 2 nm apart have been running umbrella windows every 0.2 of an angstrom. My PMF plateau is around 6.5-7.5 nm, which implies around 78 windows per system.

Taking one system as an example, if I normalise at 7.5 nm the difference at their global minima is around 37.63 kJ mol−1 (8.99 kCal mol−1), but then when I break down the PMF graph (0-1ns, 0-2ns, 0-3nm) the regions greater than 4.5 nm have not converged. Hence, normalising anywhere in this unconverged region would be wildly inaccurate. If I normalise to zero at the point in which one of the system pairs (A&A, or B&B) converges, the free energy different is 5 kJ mol−1. 

If by error estimate you mean from the inbuilt bootstrap analysis, then I am very confident that the error along the entire reaction coordinate is very small; which is confusing as I am not sure whether the slower converging regions of the reaction coordinate (4.5 - 7.5) should shower greater error!?

Yes it is heart breaking to hear that I may need to run the system for longer. This however, isn't a massive problem, given the time of the year the HPCs I am using are a bit empty. Would you suggest I need to run the 40 ns for longer (continuation run), or begin brand new windows from scratch and include those in the g_wham calculations? I guess the 40 ns one, given that lipid reorientation needs to be taken into account.

I don't have numbers in front of me, but the system box size is big (I went through your tutorial several times, and I have seen the PMF graph as a result of a box too small). The longest i.e., the reaction coordinate, is along the Y axes, Z is normal to the bilayer. When I calculate the PMF, I have pull_dim = Y Y N. 

I'll get around to putting up some ASAP.

Thanks very much for your help

From: gmx-users-bounces at gromacs.org [gmx-users-bounces at gromacs.org] on behalf of Justin Lemkul [jalemkul at vt.edu]
Sent: 30 December 2012 19:25
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] PMF Transmembrane proteins

On 12/30/12 12:28 PM, Nash, Anthony wrote:
> Dear gmx users,
> I posted a couple of weeks ago with regards to correctly using umbrella
> sampling and the WHAM on atomistic transmembrane proteins with a reaction
> coordinate as a function of interhelical distance. I have a single TM dimer,
> but with a different transmembrane domain face at the helix-helix interface.

So you have proteins with identical sequences in each case, but different

> What did I conclude from the replies? Correct calculation of the differences
> in free energy is taken by normalizing to zero at the plateau (flat) of the
> graph i.e., where your PMF graph (after g_wham) flattens out you apply the
> -zprof0 at this point, before calculating the difference between of each
> system.
> The problem I face is that this occurs around 6.8 - 7.5nm along the reaction
> coordinate. And, applying a umbrella window every half angstrom has made this
> extremely computationally expensive. However, I have persisted, and I have

Every half Angstrom?  That sounds like massive overkill for almost any system.
Do you mean half nanometer instead?

> all my graphs, and I have a plateau on all of them. Yet the PMF graphs
> between 4 - 7.5 nm are not yet converging or close to the same sampled
> energy, even though all four systems are identical in amino acid composition.
> Each window has ran for up to 40 ns.

Are there any differences at all in these systems, or are they simply intended
to be replicates of one another?  I'm just curious to know what you're comparing

> So when I normalise, depending on where I normalise I will get massively
> different difference in free energies.

Real numbers would help us understand what's going on here, as would a report of
the error estimate that g_wham cna provide.

> I have tried the strategy of taking the windows of a particular systems at
> 4.5 nm to 7.5 nm along the reaction coordinate (where there is no short range
> interaction), and applying those windows into the other three systems. This
> brings their region of plateau a lot closer together, whilst preserving the
> windows and the convergence of the region along the reaction coordinate where
> there are close-range forces in play (converges 0.8 nm - 4.5 nm).
> Alternatively, could I make the assumption that as their amino acid
> composition is identical across all four systems, I can normalise where each
> of the four graphs finish converging (around 4.5 nm)?

In theory, two peptides that are at a given distance from one another can sample
basically an infinite number of configurations, so at some point you should see
some similarities between the different systems.  During the length of a 40-ns
MD simulation, such an assumption likely is not valid, so picking and choosing
configurations that you think will make the results better is questionable, at
best.  Since you're dealing with a lipid bilayer, you can't consider only the
protein dynamics; you have lipid reorientation times that may be at play,
especially in the presence of artificial restraints between the proteins.  There
is an interplay between all elements of the system, and 40 ns may not be long
enough to assure that you really have sufficient sampling (heartbreaking to
hear, I know, especially for expensive simulations like umbrella sampling).

On a related note, how large are your boxes?  Is there adequate space between
periodic images at such a large COM separation?  I would imagine that your
systems would have to be quite large to avoid spurious PBC interactions.  Are
you pulling along a single dimension or along a diagonal?  Can you post any
images to demonstrate two systems that give very different results, but visually
seem that they should not?  There is so much to guess at here that it's very
difficult to make conclusive recommendations.

> I would really appreciate any advice from someone who has insight on this
> issue. I have looked through many journals and founds lots of entries where
> they just stop at 2.0 nm with no plateau, yet I haven't found a single WHAM
> of reconstructing umbrella windows where they normalise at the plateau. I

Shameless plug:


> have also contacted a few authors of journals with regards to why they cut
> off at 2 nm. None of them took into consideration that their PMF graphs were
> still raising.

Typical ;)  There are a fair number of very questionable PMF curves in the
literature, I'm afraid.



Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080

gmx-users mailing list    gmx-users at gromacs.org
* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
* Please don't post (un)subscribe requests to the list. Use the
www interface or send it to gmx-users-request at gromacs.org.
* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

More information about the gromacs.org_gmx-users mailing list