[gmx-users] Re: Reference structure for PCA.

Ahmet yıldırım ahmedo047 at gmail.com
Tue Feb 12 13:53:18 CET 2013


Hi,

S = 1/N sum (x - ref) (x - ref)'

or

S = 1/(N-1) sum (x - ref) (x - ref)'

N: the number of frames

Which one is right?

2013/2/12 Tsjerk Wassenaar <tsjerkw at gmail.com>

> Hi Vivek,
>
> If you use the g_covar option -ref, you not only use the reference
> structure for fitting, you use it for calculating the deviations. Your
> covariance matrix is built as:
>
> S = 1/N sum (x - ref) (x - ref)'
>
> If you leave out the option -ref then the average structure will be
> used for the covariance matrix, which is what you (should) want. In
> that case the reference structure is only used for fitting. You might
> want to redo your calculations to check the effect of fitting.
>
> Cheers,
>
> Tsjerk
>
> On Tue, Feb 12, 2013 at 9:02 AM, vivek modi <modi.vivek2009 at gmail.com>
> wrote:
> >>
> >>
> >> Message: 1
> >> Date: Sun, 10 Feb 2013 21:32:15 +0000 (WET)
> >> From: baptista at itqb.unl.pt
> >> Subject: Re: [gmx-users] Reference structure for PCA.
> >> To: Discussion list for GROMACS users <gmx-users at gromacs.org>
> >> Message-ID: <alpine.DEB.2.00.1302102127430.8574 at simul36.itqb.unl.pt>
> >> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
> >>
> >
> > Hello Antonio,
> >
> > Thank you very much for the reply. I have gone through the paper you
> > mentioned. The central structure seems to be a good choice for the
> > reference structure.
> >
> >
> > Regards,
> >
> > -Vivek.
> >
> >
> >>
> >> Hi Vivek,
> >>
> >> There are two distinct steps involved: (1) the fit of your trajectory
> to a
> >> reference structure, which corresponds to choose a conformation space;
> (2)
> >> the use of the PCA method, which corresponds to find in that space a new
> >> basis set whose ordered axes sequentially maximize dispersion (hopefully
> >> capturing the distribution main features with only a few of the new
> >> coordinates). The two steps just happen to be done by the same program.
> >> The structure chosen for fitting is related to step 1, while the average
> >> structure used to compute the covariance matrix is related to step 2 --
> as
> >> already pointed by Tjerk, the two structures are generally not the same.
> >>
> >> The aim of the fit is to get rid of the global translation and rotation
> of
> >> your protein in the simulation box, trying to place all the sampled
> >> structures in a single 3D space that reflects "only" the conformational
> >> differences. But this is necessarily approximate, because the
> >> superimposition of any pair of structures after the global fit will be
> >> always worse than you would get by making a pairwise fit of the two.
> Thus,
> >> you want to get a final dispersion around the reference as small as
> >> possible. So, of the two average structures that you tried, you should
> >> choose the one computed from the last 30 ns (it's not surprising that it
> >> gives a smaller dispersion, because it refers to the segment you are
> >> analyzing). Still, using an average structure as a reference is a
> somewhat
> >> illusory solution, because that average must itself be obtained after
> >> fitting the trajectory to some reference... In a study of a small
> flexible
> >> peptide (where the choice of reference may have drastic effects), we
> found
> >> that a good reference seems to be the "central structure" of your
> sample,
> >> defined as the one that, when taken as a reference, leads to the lowest
> >> overall dispersion (http://dx.doi.org/10.1021/jp902991u). The article
> >> discusses the issues pointed above, so you may want to give it a look.
> >>
> >> You can also avoid the need of a reference by choosing a different
> >> conformation space for PCA, a popular alternative being the phi and psi
> >> dihedrals (look in the manual). Note that this dihedral space is a bit
> >> different from the more usual one discussed above, each reflecting a
> >> different kind of conformational proximity (this is also discussed in
> the
> >> article). It's up to you to decide which one better suits your problem.
> >>
> >> Hope this helps.
> >> Cheers,
> >> Antonio
> >>
> >> Thank you very much for your reply Tsjerk.
> > I understand that the two reference structures are different. I had a
> query
> > because I found the method is very sensitive to the choice of the
> reference
> > structure for fitting. Most of the publications either do not mention the
> > reference structure. Only in few papers I found initial structure for
> > fitting. But the method gives different  results if initial structure is
> > used; or average from complete trajectory;  or average over a certain
> time
> > window is used as the structure for reference.
> > The average was calculated using the following command:
> >
> > g_rmsf -f *.xtc -s *.tpr -ox average70-100ns.pdb -b 70000 -e 100000
> >
> >
> >
> > Regards,
> >
> > -Vivek.
> >
> >
> >
> >> On Sat, 9 Feb 2013, Tsjerk Wassenaar wrote:
> >>
> >> > Hi,
> >> >
> >> > The commands would certainly help, including the commands for getting
> the
> >> > reference structure. Do note that the reference is the reference for
> >> > fitting, which is 'external', i.e. provided by the user. This is not
> the
> >> > same as the structure used to calculate the deviations, which is the
> >> > average structure of the frames selected.
> >> >
> >> > Cheers,
> >> >
> >> > Tsjerk
> >> >
> >> > On Sat, Feb 9, 2013 at 7:06 PM, bipin singh <bipinelmat at gmail.com>
> >> wrote:
> >> >
> >> >> Hi vivek,
> >> >>
> >> >> I have few questions related to your query:
> >> >>
> >> >> During covariance matrix calculation, g_covar by default takes
> average
> >> >> structure of the trajectory as a reference structure then why you are
> >> >> giving it average structure of your trajectory (0-100ns) manually.
> >> >> Moreover without looking at your commands which you have used, it
> would
> >> be
> >> >> difficult for anyone that why are you getting these surprising
> results.
> >> >> On Thu, Feb 7, 2013 at 1:26 PM, vivek modi <modi.vivek2009 at gmail.com
> >
> >> >> wrote:
> >> >>
> >> >>> Hello,
> >> >>>
> >> >>> I have troubled you with a similar question before also, but I
> guess I
> >> >> need
> >> >>> some more clarification. My question is about the reference
> structure
> >> in
> >> >>> PCA analysis.
> >> >>> I have 100ns long protein simulation which I want to analyze using
> PCA.
> >> >> The
> >> >>> RMSD shows fluctuations upto initial 25-30ns and then becomes very
> >> >> stable.
> >> >>> I have performed PCA on the last 30ns window of the simulation
> where I
> >> >>> assume the simulation has converged (I also did on other time
> windows
> >> as
> >> >>> well).
> >> >>>
> >> >>> The question is this:
> >> >>> I did the analysis on the last 30ns window in two ways by taking two
> >> >>> different reference structures.
> >> >>>
> >> >>> a. I take the average structure of the trajectory (0-100ns) as
> >> >>> the reference and then do the fitting and calculate covariance
> matrix
> >> for
> >> >>> last 30ns. This is done because I suspect that the average structure
> >> over
> >> >>> full trajectory will reflect all the changes occurring in the
> protein.
> >> It
> >> >>> also gives me low cosines (<0.1). The PCs show movement occurring in
> >> >>> certain regions of the protein.
> >> >>>
> >> >>> b. I take the average structure from the same window (last 30ns)
> then
> >> do
> >> >>> the fitting and calculate covariance matrix for the same. This is
> done
> >> >> with
> >> >>> an assumption that the reference structure must reflect the
> >> >>> equilibriated/stable part of the trajectory unlike the previous
> case.
> >> >>> Surprisingly it gives me high cosines (>0.5). Unlike the previous
> case,
> >> >>> this method shows very small movement in the protein (very low
> RMSF).
> >> >>>
> >> >>> Both of these methods give me different RMSF for the PCs although
> they
> >> >> are
> >> >>> done on the same part of the trajectory but the reference structure
> is
> >> >>> influencing the output.
> >> >>>
> >> >>>  Which protocol among the two is appropriate ?  And how can we
> explain
> >> >> high
> >> >>> cosines in second case where the reference structure is the average
> of
> >> >> the
> >> >>> same time window (there must not be large deviation) while I get low
> >> >> cosine
> >> >>> for the first case where deviations are calculated from the full
> >> >> trajectory
> >> >>> average (large deviation) ?
> >> >>>
> >> >>> Any help is appreciated.
> >> >>>
> >> >>> Thanks,
> >> >>>
> >> >>> -Vivek Modi
> >> >>> Graduate Student
> >> >>> IITK.
> >> >>> --
> >> >>> gmx-users mailing list    gmx-users at gromacs.org
> >> >>> http://lists.gromacs.org/mailman/listinfo/gmx-users
> >> >>> * Please search the archive at
> >> >>> http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> >> >>> * Please don't post (un)subscribe requests to the list. Use the
> >> >>> www interface or send it to gmx-users-request at gromacs.org.
> >> >>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> *-----------------------
> >> >> Thanks and Regards,
> >> >> Bipin Singh*
> >> >> --
> >> >> gmx-users mailing list    gmx-users at gromacs.org
> >> >> http://lists.gromacs.org/mailman/listinfo/gmx-users
> >> >> * Please search the archive at
> >> >> http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> >> >> * Please don't post (un)subscribe requests to the list. Use the
> >> >> www interface or send it to gmx-users-request at gromacs.org.
> >> >> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Tsjerk A. Wassenaar, Ph.D.
> >> >
> >> > post-doctoral researcher
> >> > Biocomputing Group
> >> > Department of Biological Sciences
> >> > 2500 University Drive NW
> >> > Calgary, AB T2N 1N4
> >> > Canada
> >> > --
> >> > gmx-users mailing list    gmx-users at gromacs.org
> >> > http://lists.gromacs.org/mailman/listinfo/gmx-users
> >> > * Please search the archive at
> >> http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> >> > * Please don't post (un)subscribe requests to the list. Use the
> >> > www interface or send it to gmx-users-request at gromacs.org.
> >> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >> >
> >> --
> >> Antonio M. Baptista
> >> Instituto de Tecnologia Quimica e Biologica, Universidade Nova de Lisboa
> >> Av. da Republica - EAN, 2780-157 Oeiras, Portugal
> >> phone: +351-214469619         email: baptista at itqb.unl.pt
> >> fax:   +351-214411277         WWW:   http://www.itqb.unl.pt/~baptista
> >>
> --------------------------------------------------------------------------
> >>
> >>
> >>
> >> ------------------------------
> >>
> >>
> >>
> > --
> > gmx-users mailing list    gmx-users at gromacs.org
> > http://lists.gromacs.org/mailman/listinfo/gmx-users
> > * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> > * Please don't post (un)subscribe requests to the list. Use the
> > www interface or send it to gmx-users-request at gromacs.org.
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
>
>
> --
> Tsjerk A. Wassenaar, Ph.D.
>
> post-doctoral researcher
> Biocomputing Group
> Department of Biological Sciences
> 2500 University Drive NW
> Calgary, AB T2N 1N4
> Canada
> --
> gmx-users mailing list    gmx-users at gromacs.org
> http://lists.gromacs.org/mailman/listinfo/gmx-users
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> * Please don't post (un)subscribe requests to the list. Use the
> www interface or send it to gmx-users-request at gromacs.org.
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>



-- 
Ahmet Yıldırım



More information about the gromacs.org_gmx-users mailing list