[gmx-users] Re: Reference structure for PCA.
modi.vivek2009 at gmail.com
Tue Feb 12 09:02:57 CET 2013
> Message: 1
> Date: Sun, 10 Feb 2013 21:32:15 +0000 (WET)
> From: baptista at itqb.unl.pt
> Subject: Re: [gmx-users] Reference structure for PCA.
> To: Discussion list for GROMACS users <gmx-users at gromacs.org>
> Message-ID: <alpine.DEB.2.00.1302102127430.8574 at simul36.itqb.unl.pt>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Thank you very much for the reply. I have gone through the paper you
mentioned. The central structure seems to be a good choice for the
> Hi Vivek,
> There are two distinct steps involved: (1) the fit of your trajectory to a
> reference structure, which corresponds to choosing a conformation space; (2)
> the use of the PCA method, which corresponds to finding in that space a new
> basis set whose ordered axes sequentially maximize dispersion (hopefully
> capturing the main features of the distribution with only a few of the new
> coordinates). The two steps just happen to be done by the same program.
> The structure chosen for fitting is related to step 1, while the average
> structure used to compute the covariance matrix is related to step 2. As
> Tsjerk already pointed out, the two structures are generally not the same.
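Step 2 can be sketched in pure Python as follows, assuming the frames have already been fitted (step 1) and are stored as flat coordinate lists (x1, y1, z1, x2, ...). The point illustrated is that the covariance is built around the average of the fitted frames, whatever structure was used for the fit. To keep the example short, only the largest eigenvalue is extracted by power iteration; a real analysis diagonalizes the whole matrix. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Frame = List[float]  # flattened coordinates of one frame that is already fitted


def average_structure(frames: List[Frame]) -> Frame:
    """Average of the fitted frames; this, not the fit reference, is the PCA origin."""
    n = len(frames)
    return [sum(f[k] for f in frames) / n for k in range(len(frames[0]))]


def covariance_matrix(frames: List[Frame]) -> List[List[float]]:
    """C_kl = <(x_k - <x_k>)(x_l - <x_l>)> averaged over all frames."""
    avg = average_structure(frames)
    dev = [[f[k] - avg[k] for k in range(len(avg))] for f in frames]
    n, d = len(frames), len(avg)
    return [[sum(v[k] * v[l] for v in dev) / n for l in range(d)] for k in range(d)]


def first_principal_component(
    cov: List[List[float]], iterations: int = 500
) -> Tuple[float, List[float]]:
    """Power iteration: returns (largest eigenvalue, unit eigenvector)."""
    d = len(cov)
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        new = [sum(cov[k][l] * vec[l] for l in range(d)) for k in range(d)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            return 0.0, vec
        vec = [x / norm for x in new]
    # Rayleigh quotient v^T C v gives the eigenvalue for the converged unit vector
    eigval = sum(vec[k] * sum(cov[k][l] * vec[l] for l in range(d)) for k in range(d))
    return eigval, vec
```

Changing the fit reference changes the fitted frames, and therefore this matrix, but the origin of the deviations is always their own average.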
> The aim of the fit is to get rid of the global translation and rotation of
> your protein in the simulation box, trying to place all the sampled
> structures in a single 3D space that reflects "only" the conformational
> differences. But this is necessarily approximate, because the
> superimposition of any pair of structures after the global fit will
> never be better than what you would get by directly fitting the pair. Thus,
> you want to get a final dispersion around the reference as small as
> possible. So, of the two average structures that you tried, you should
> choose the one computed from the last 30 ns (it's not surprising that it
> gives a smaller dispersion, because it refers to the segment you are
> analyzing). Still, using an average structure as a reference is a somewhat
> illusory solution, because that average must itself be obtained after
> fitting the trajectory to some reference... In a study of a small flexible
> peptide (where the choice of reference may have drastic effects), we found
> that a good reference seems to be the "central structure" of your sample,
> defined as the one that, when taken as a reference, leads to the lowest
> overall dispersion (http://dx.doi.org/10.1021/jp902991u). The article
> discusses the issues pointed out above, so you may want to give it a look.
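The trade-off described above can be illustrated with a deliberately simplified 2D toy in pure Python. Working in 2D is an assumption made so that the optimal rotation has a closed form; real trajectories need the 3D least-squares fit. The sketch shows that superimposing two frames through a common reference is never better than fitting them directly, and it picks a "central structure" by trying every frame as the reference. Here "dispersion" is taken as the mean squared deviation from the average of the fitted ensemble, which may not match the exact definition in the cited article. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Structure = List[Point]


def centered(s: Structure) -> Structure:
    """Translate a structure so that its geometric center is at the origin."""
    cx = sum(x for x, _ in s) / len(s)
    cy = sum(y for _, y in s) / len(s)
    return [(x - cx, y - cy) for x, y in s]


def fit(mobile: Structure, ref: Structure) -> Structure:
    """Least-squares superposition (translation + rotation) of mobile onto ref.

    Both are centered at the origin; in 2D the optimal rotation angle has a
    closed form, standing in here for the 3D fit used on real trajectories.
    """
    m, r = centered(mobile), centered(ref)
    num = sum(mx * ry - my * rx for (mx, my), (rx, ry) in zip(m, r))
    den = sum(mx * rx + my * ry for (mx, my), (rx, ry) in zip(m, r))
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in m]


def msd(a: Structure, b: Structure) -> float:
    """Mean squared deviation per atom between two structures (no fitting)."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in zip(a, b)) / len(a)


def dispersion(frames: List[Structure]) -> float:
    """Mean squared deviation of already-fitted frames from their own average."""
    n = len(frames)
    avg = [(sum(f[i][0] for f in frames) / n, sum(f[i][1] for f in frames) / n)
           for i in range(len(frames[0]))]
    return sum(msd(f, avg) for f in frames) / n


def central_structure(frames: List[Structure]) -> int:
    """Index of the frame that, used as the fit reference, minimizes the dispersion."""
    scores = [dispersion([fit(f, ref) for f in frames]) for ref in frames]
    return min(range(len(frames)), key=scores.__getitem__)
```

For two frames `a` and `b`, `msd(fit(a, b), centered(b))` is the direct pairwise result, and it is never larger than `msd(fit(a, ref), fit(b, ref))` after both are fitted to a common `ref`. The brute-force search costs one full fit of the trajectory per candidate, so on a long trajectory it would likely be restricted to a subset of frames.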
> You can also avoid the need for a reference by choosing a different
> conformation space for PCA, a popular alternative being the phi and psi
> dihedrals (look in the manual). Note that this dihedral space is a bit
> different from the more usual one discussed above, each reflecting a
> different kind of conformational proximity (this is also discussed in the
> article). It's up to you to decide which one better suits your problem.
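Because phi and psi are periodic, angles of 179 and -179 degrees describe nearly the same conformation but are numerically far apart, so raw angles are a poor space for a covariance analysis. One common workaround (so-called dihedral PCA) represents each angle by its cosine and sine. The sketch below shows only that mapping, assuming angles are given in degrees; the function names are hypothetical. The resulting vectors need no structural fit and would then go through the same covariance/eigenvector step as Cartesian coordinates.

```python
import math
from typing import List


def dihedral_features(angles_deg: List[float]) -> List[float]:
    """Map each phi/psi angle (degrees) to (cos, sin), removing the 360-degree wrap."""
    features: List[float] = []
    for a in angles_deg:
        r = math.radians(a)
        features.extend((math.cos(r), math.sin(r)))
    return features


def feature_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two feature vectors (chord length on each circle)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

With this mapping, 179 and -179 degrees end up about 0.035 apart (the chord of a 2-degree arc), while 0 and 180 degrees are the maximal distance of 2 apart.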
> Hope this helps.
> Thank you very much for your reply, Tsjerk.
I understand that the two reference structures are different. I asked
because I found the method to be very sensitive to the choice of the reference
structure for fitting. Most publications do not mention the reference
structure at all; only in a few papers did I find the initial structure used for
fitting. But the method gives different results depending on whether the
initial structure, the average over the complete trajectory, or the average
over a certain time window is used as the reference structure.
The average was calculated using the following command:
g_rmsf -f *.xtc -s *.tpr -ox average70-100ns.pdb -b 70000 -e 100000
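The -b and -e values are given in ps, so the command above averages over 70-100 ns. A minimal sketch of the equivalent computation, assuming the frames are already fitted to some reference and stored as (time in ps, flat coordinate list) pairs; the function name is hypothetical, and both window ends are treated as inclusive in this sketch. Whatever structure the frames were fitted to before averaging still shapes the result, which is the circularity pointed out in the reply above.

```python
from typing import List, Sequence, Tuple

# (time in ps, flattened coordinates of a frame already fitted to some reference)
TimedFrame = Tuple[float, Sequence[float]]


def window_average(frames: List[TimedFrame], begin_ps: float, end_ps: float) -> List[float]:
    """Average the coordinates of all frames with begin_ps <= t <= end_ps."""
    selected = [coords for t, coords in frames if begin_ps <= t <= end_ps]
    if not selected:
        raise ValueError("no frames in the requested time window")
    n = len(selected)
    return [sum(c[k] for c in selected) / n for k in range(len(selected[0]))]


# 70-100 ns expressed in ps, matching -b 70000 -e 100000
BEGIN_PS, END_PS = 70_000.0, 100_000.0
```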
> On Sat, 9 Feb 2013, Tsjerk Wassenaar wrote:
> > Hi,
> > The commands would certainly help, including the commands for getting the
> > reference structure. Do note that the reference is the reference for
> > fitting, which is 'external', i.e. provided by the user. This is not the
> > same as the structure used to calculate the deviations, which is the
> > average structure of the frames selected.
> > Cheers,
> > Tsjerk
> > On Sat, Feb 9, 2013 at 7:06 PM, bipin singh <bipinelmat at gmail.com> wrote:
> >> Hi vivek,
> >> I have a few questions related to your query:
> >> During the covariance matrix calculation, g_covar by default takes the average
> >> structure of the trajectory as the reference structure, so why are you
> >> giving it the average structure of your trajectory (0-100 ns) manually?
> >> Moreover, without seeing the commands you have used, it would be
> >> difficult for anyone to tell why you are getting these surprising results.
> >> On Thu, Feb 7, 2013 at 1:26 PM, vivek modi <modi.vivek2009 at gmail.com> wrote:
> >>> Hello,
> >>> I have troubled you with a similar question before, but I guess I need
> >>> some more clarification. My question is about the reference structure for
> >>> PCA analysis.
> >>> I have a 100 ns long protein simulation which I want to analyze using PCA.
> >>> The RMSD fluctuates during the initial 25-30 ns and then becomes very stable.
> >>> I have performed PCA on the last 30 ns window of the simulation, where I
> >>> assume the simulation has converged (I also did it on other time windows as
> >>> well).
> >>> The question is this:
> >>> I did the analysis on the last 30 ns window in two ways, taking two
> >>> different reference structures.
> >>> a. I take the average structure of the trajectory (0-100 ns) as
> >>> the reference, then do the fitting and calculate the covariance matrix for
> >>> the last 30 ns. This is done because I suspect that the average structure of
> >>> the full trajectory will reflect all the changes occurring in the protein.
> >>> It also gives me low cosine content (<0.1). The PCs show movement occurring
> >>> in certain regions of the protein.
> >>> b. I take the average structure from the same window (last 30 ns), then do
> >>> the fitting and calculate the covariance matrix for the same window. This is
> >>> done with the assumption that the reference structure must reflect the
> >>> equilibrated/stable part of the trajectory, unlike the previous case.
> >>> Surprisingly, it gives me high cosine content (>0.5). Unlike the previous
> >>> case, this method shows very small movement in the protein (very low RMSF).
> >>> The two methods give me different RMSF for the PCs: although they are
> >>> done on the same part of the trajectory, the reference structure is
> >>> influencing the output.
> >>> Which of the two protocols is appropriate? And how can we explain the high
> >>> cosine content in the second case, where the reference structure is the
> >>> average of the same time window (there should not be large deviations),
> >>> while I get low cosine content in the first case, where deviations are
> >>> calculated from the full-trajectory average (large deviations)?
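The "cosines" in the question presumably refer to the cosine content of the projections on the principal components, as introduced by B. Hess: for a projection p_i(t) over a time T, c_i = (2/T) (integral of cos(i*pi*t/T) p_i(t) dt)^2 / (integral of p_i(t)^2 dt). Values close to 1 indicate random-diffusion-like behaviour, which Hess proposed as a warning sign of insufficient sampling. A minimal discretized sketch in pure Python follows; the function name is hypothetical, and the midpoint discretization is an assumption that may differ from what the GROMACS tools use.

```python
import math
from typing import Sequence


def cosine_content(p: Sequence[float], i: int) -> float:
    """Cosine content of an equally spaced projection series p with the i-th cosine.

    Discretizes c_i = (2/T) (int cos(i*pi*t/T) p(t) dt)^2 / int p(t)^2 dt
    using midpoints t_k = (k + 0.5) * T / N, which reduces to
    (2/N) * (sum_k cos(i*pi*(k + 0.5)/N) * p_k)^2 / sum_k p_k^2.
    """
    n = len(p)
    if n == 0 or i < 1:
        raise ValueError("need a non-empty series and i >= 1")
    proj = sum(math.cos(i * math.pi * (k + 0.5) / n) * x for k, x in enumerate(p))
    norm = sum(x * x for x in p)
    if norm == 0.0:
        raise ValueError("projection is identically zero")
    return 2.0 / n * proj * proj / norm
```

For a series that is exactly one half-period of a cosine, `cosine_content(p, 1)` is 1 up to rounding and `cosine_content(p, 2)` is 0, because the midpoint cosines are mutually orthogonal.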
> >>> Any help is appreciated.
> >>> Thanks,
> >>> -Vivek Modi
> >>> Graduate Student
> >>> IITK.
> >>> --
> >>> gmx-users mailing list gmx-users at gromacs.org
> >>> http://lists.gromacs.org/mailman/listinfo/gmx-users
> >>> * Please search the archive at
> >>> http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> >>> * Please don't post (un)subscribe requests to the list. Use the
> >>> www interface or send it to gmx-users-request at gromacs.org.
> >>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >> --
> >> *-----------------------
> >> Thanks and Regards,
> >> Bipin Singh*
> > --
> > Tsjerk A. Wassenaar, Ph.D.
> > post-doctoral researcher
> > Biocomputing Group
> > Department of Biological Sciences
> > 2500 University Drive NW
> > Calgary, AB T2N 1N4
> > Canada
> Antonio M. Baptista
> Instituto de Tecnologia Quimica e Biologica, Universidade Nova de Lisboa
> Av. da Republica - EAN, 2780-157 Oeiras, Portugal
> phone: +351-214469619 email: baptista at itqb.unl.pt
> fax: +351-214411277 WWW: http://www.itqb.unl.pt/~baptista