[gmx-users] Reference structure for PCA.

baptista at itqb.unl.pt baptista at itqb.unl.pt
Sun Feb 10 22:32:15 CET 2013


Hi Vivek,

There are two distinct steps involved: (1) fitting your trajectory to a 
reference structure, which corresponds to choosing a conformation space; 
(2) applying the PCA method, which corresponds to finding in that space a 
new basis set whose ordered axes sequentially maximize dispersion 
(hopefully capturing the main features of the distribution with only a few 
of the new coordinates). The two steps just happen to be done by the same 
program. The structure chosen for fitting is related to step 1, while the 
average structure used to compute the covariance matrix is related to 
step 2; as Tsjerk already pointed out, the two structures are generally 
not the same.
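
The two steps can be sketched in a few lines of NumPy. This is only an 
illustration under stated assumptions, not what g_covar does internally: 
the function names (kabsch_fit, pca) are hypothetical, the fit is a plain 
least-squares superposition without mass weighting, and the trajectory is 
assumed to be an array of shape (n_frames, n_atoms, 3). Note that the 
reference enters only in step 1, while the average used in step 2 is 
computed from the fitted frames.

```python
import numpy as np

def kabsch_fit(mobile, reference):
    """Least-squares superimpose `mobile` (N x 3) onto `reference` (N x 3)."""
    mob_c = mobile - mobile.mean(axis=0)
    ref_c = reference - reference.mean(axis=0)
    # The SVD of the 3x3 correlation matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(mob_c.T @ ref_c)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(u @ vt))
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    return mob_c @ rot + reference.mean(axis=0)

def pca(traj, reference):
    """Return eigenvalues (descending) and eigenvectors (as columns)."""
    # Step 1: the fit to the external reference defines the space.
    fitted = np.array([kabsch_fit(frame, reference) for frame in traj])
    x = fitted.reshape(len(traj), -1)
    # Step 2: deviations are taken from the average of the fitted frames,
    # not from the fitting reference.
    dev = x - x.mean(axis=0)
    cov = dev.T @ dev / len(traj)
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]
```

Calling pca(traj, ref_a) and pca(traj, ref_b) on the same frames with two 
different references is a direct way to see how much the choice made in 
step 1 changes the result of step 2.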

The aim of the fit is to get rid of the global translation and rotation of 
your protein in the simulation box, trying to place all the sampled 
structures in a single 3D space that reflects "only" the conformational 
differences. But this is necessarily approximate, because the 
superimposition of any pair of structures after the global fit can never 
be better than the one you would get by fitting the two directly to each 
other. Thus, you want the final dispersion around the reference to be as 
small as possible. So, of the two average structures that you tried, you 
should choose the one computed from the last 30 ns (it's not surprising 
that it gives a smaller dispersion, because it refers to the segment you 
are analyzing). Still, using an average structure as a reference is a 
somewhat illusory solution, because that average must itself be obtained 
after fitting the trajectory to some reference... In a study of a small 
flexible peptide (where the choice of reference may have drastic effects), 
we found that a good reference seems to be the "central structure" of the 
sample, defined as the one that, when taken as a reference, leads to the 
lowest overall dispersion (http://dx.doi.org/10.1021/jp902991u). The 
article discusses the issues pointed out above, so you may want to have a 
look at it.
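
A brute-force sketch of how such a central structure could be picked. It 
assumes that "dispersion" is measured as the mean squared deviation of all 
frames after each one is fitted to the candidate reference, which may 
differ in detail from the definition used in the article; the function 
names (fit_msd, central_structure) are hypothetical. The cost grows with 
the square of the number of frames, so in practice one would probably run 
it on a subsample of the trajectory.

```python
import numpy as np

def fit_msd(mobile, reference):
    """Mean squared deviation of two (N x 3) structures after optimal fit."""
    p = mobile - mobile.mean(axis=0)
    q = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(u @ vt))  # avoid reflections
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    return np.mean(np.sum((p @ rot - q) ** 2, axis=1))

def central_structure(traj):
    """Index of the frame giving the lowest dispersion when used as reference."""
    n = len(traj)
    msd = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # The best-fit deviation is symmetric, so fill both halves.
            msd[i, j] = msd[j, i] = fit_msd(traj[i], traj[j])
    dispersion = msd.mean(axis=1)
    return int(np.argmin(dispersion)), dispersion
```

The returned frame can then be written out and used as the external 
reference for the fit.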

You can also avoid the need for a reference by choosing a different 
conformation space for PCA, a popular alternative being the space of the 
phi and psi dihedrals (see the manual). Note that this dihedral space is a 
bit different from the more usual Cartesian one discussed above, each 
reflecting a different kind of conformational proximity (this is also 
discussed in the article). It's up to you to decide which one better suits 
your problem.
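
For the dihedral alternative, a minimal sketch, assuming the phi and psi 
angles have already been extracted into an array of shape 
(n_frames, n_dihedrals) in radians. Each angle is replaced by its cosine 
and sine, a common way to deal with periodicity; the function name 
dihedral_pca is hypothetical. No fit, and hence no reference structure, is 
needed.

```python
import numpy as np

def dihedral_pca(angles):
    """PCA in the (cos, sin) space of a set of dihedral angles (radians)."""
    # Map each angle to (cos, sin) so that -pi and +pi land on the same point.
    x = np.concatenate([np.cos(angles), np.sin(angles)], axis=1)
    dev = x - x.mean(axis=0)
    cov = dev.T @ dev / len(x)
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    # Projections of every frame on the principal components.
    return evals, evecs, dev @ evecs
```

Distances in this space measure similarity of local backbone angles, not 
of atomic positions, which is why the two spaces reflect different kinds 
of conformational proximity.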

Hope this helps.
Cheers,
Antonio

On Sat, 9 Feb 2013, Tsjerk Wassenaar wrote:

> Hi,
>
> The commands would certainly help, including the commands for getting the
> reference structure. Do note that the reference is the reference for
> fitting, which is 'external', i.e. provided by the user. This is not the
> same as the structure used to calculate the deviations, which is the
> average structure of the frames selected.
>
> Cheers,
>
> Tsjerk
>
> On Sat, Feb 9, 2013 at 7:06 PM, bipin singh <bipinelmat at gmail.com> wrote:
>
>> Hi vivek,
>>
>> I have a few questions related to your query:
>>
>> During the covariance matrix calculation, g_covar by default takes the
>> average structure of the trajectory as the reference structure, so why
>> are you manually giving it the average structure of your trajectory
>> (0-100ns)? Moreover, without seeing the commands you used, it is
>> difficult for anyone to say why you are getting these surprising results.
>> On Thu, Feb 7, 2013 at 1:26 PM, vivek modi <modi.vivek2009 at gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I have troubled you with a similar question before, but I guess I need
>>> some more clarification. My question is about the reference structure in
>>> PCA analysis.
>>> I have a 100ns long protein simulation which I want to analyze using PCA.
>>> The RMSD shows fluctuations up to the initial 25-30ns and then becomes
>>> very stable. I have performed PCA on the last 30ns window of the
>>> simulation, where I assume the simulation has converged (I also did it
>>> on other time windows).
>>>
>>> The question is this:
>>> I did the analysis on the last 30ns window in two ways by taking two
>>> different reference structures.
>>>
>>> a. I take the average structure of the trajectory (0-100ns) as
>>> the reference and then do the fitting and calculate covariance matrix for
>>> last 30ns. This is done because I suspect that the average structure over
>>> full trajectory will reflect all the changes occurring in the protein. It
>>> also gives me low cosines (<0.1). The PCs show movement occurring in
>>> certain regions of the protein.
>>>
>>> b. I take the average structure from the same window (last 30ns), then do
>>> the fitting and calculate the covariance matrix for the same window. This
>>> is done with the assumption that the reference structure must reflect the
>>> equilibrated/stable part of the trajectory, unlike the previous case.
>>> Surprisingly, it gives me high cosines (>0.5). Unlike the previous case,
>>> this method shows very small movement in the protein (very low RMSF).
>>>
>>> Both of these methods give me different RMSF for the PCs: although they
>>> are done on the same part of the trajectory, the reference structure is
>>> influencing the output.
>>>
>>> Which protocol among the two is appropriate? And how can we explain the
>>> high cosines in the second case, where the reference structure is the
>>> average of the same time window (there must not be large deviation),
>>> while I get low cosines for the first case, where deviations are
>>> calculated from the full trajectory average (large deviation)?
>>>
>>> Any help is appreciated.
>>>
>>> Thanks,
>>>
>>> -Vivek Modi
>>> Graduate Student
>>> IITK.
>>> --
>>> gmx-users mailing list    gmx-users at gromacs.org
>>> http://lists.gromacs.org/mailman/listinfo/gmx-users
>>> * Please search the archive at
>>> http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
>>> * Please don't post (un)subscribe requests to the list. Use the
>>> www interface or send it to gmx-users-request at gromacs.org.
>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>
>>
>>
>>
>> --
>> *-----------------------
>> Thanks and Regards,
>> Bipin Singh*
>>
>
>
>
> -- 
> Tsjerk A. Wassenaar, Ph.D.
>
> post-doctoral researcher
> Biocomputing Group
> Department of Biological Sciences
> 2500 University Drive NW
> Calgary, AB T2N 1N4
> Canada
>
--
Antonio M. Baptista
Instituto de Tecnologia Quimica e Biologica, Universidade Nova de Lisboa
Av. da Republica - EAN, 2780-157 Oeiras, Portugal
phone: +351-214469619         email: baptista at itqb.unl.pt
fax:   +351-214411277         WWW:   http://www.itqb.unl.pt/~baptista
--------------------------------------------------------------------------



