[gmx-users] Re: PCA eigenvalue normalization

Fri Apr 7 22:37:19 CEST 2006

Hi Tsjerk,

The system I am working on is the C-terminal tails of tubulin.  The  
structure of the tails is missing in all of the crystal structures of  
tubulin, likely due to the flexibility of the tails.  Since tubulin  
is quite large, a heterodimer of almost 900 residues, it is not  
really possible for us to adequately sample the tails' configuration  
space by simulating the whole protein.  What we have done is to  
simulate nine different isotypes of the C-terminal tail fragment
(9-26 residues) using constant-pressure REMD.

Among the properties we are interested in is quantifying and
comparing the flexibility of the tails from the different isotypes. I
have performed PCA twice on this: first fitting all the C-alphas, and
then fitting the backbone atoms of the first three residues, to a
reference structure. The motivation for the latter fitting procedure
is that the fragments would be anchored to tubulin at the fragments'
N-terminus. This is to look at how the fragments behave in the absence
of tubulin's potential but as if they were still anchored. When
performing the PCA we fit to a reference structure but still use the
deviations from the mean, not from the reference structure. To compare
the flexibility between the isotypes, we had hoped to normalize the
eigenvalues.
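As an illustration of the second fitting procedure, here is a minimal numpy sketch of an unweighted least-squares (Kabsch) superposition in which only a chosen subset of atoms determines the rotation and translation, which are then applied to the whole frame. It assumes the trajectory is already loaded as an array of shape (frames, atoms, 3) and that the anchor-atom indices are known; the function names are hypothetical and this is not a description of how g_covar performs its fit internally.

```python
import numpy as np


def kabsch_rotation(mobile, target):
    """Rotation matrix R minimizing ||mobile @ R - target||.

    Both arrays have shape (n, 3) and must already be centered.
    """
    h = mobile.T @ target
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt


def fit_on_subset(traj, ref, fit_idx):
    """Superimpose every frame onto ref using only the atoms in fit_idx.

    traj: (n_frames, n_atoms, 3); ref: (n_atoms, 3).
    The rotation and translation are determined by the fit atoms alone
    but applied to the whole frame (no mass weighting).
    """
    ref_sub = ref[fit_idx]
    ref_center = ref_sub.mean(axis=0)
    fitted = np.empty_like(traj, dtype=float)
    for i, frame in enumerate(traj):
        sub = frame[fit_idx]
        center = sub.mean(axis=0)
        rot = kabsch_rotation(sub - center, ref_sub - ref_center)
        fitted[i] = (frame - center) @ rot + ref_center
    return fitted
```

For example, fitted = fit_on_subset(traj, ref, anchor_idx), where anchor_idx is a hypothetical index array holding the backbone atoms of the first three residues. Any covariance analysis would then be done on fitted, subtracting the mean structure rather than ref.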

I did notice that I was inadvertently using the deviation from the
reference structure rather than from the mean (-ref). Is this what
you meant by a non-central covariance matrix? Using the deviation
from the mean, I obtained standard deviations more in line with what I
originally expected: at most 0.8 nm if I normalize the variance by the
number of atoms.
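The difference between deviations from the mean and from a reference can be checked numerically. Below is a small numpy sketch (function name hypothetical; no mass weighting; the covariance is divided by the number of frames, which is an assumption) that returns the eigenvalues, sqrt(largest eigenvalue / N) and sqrt(trace / N) for N atoms. Because the non-central matrix equals the central covariance plus the outer product of (mean - reference) with itself, its trace is larger by the squared distance between the mean and the reference structure, which is consistent with the much larger values obtained with -ref.

```python
import numpy as np


def pca_summary(x, n_atoms, center=None):
    """PCA of fitted coordinates x with shape (n_frames, 3 * n_atoms).

    center=None subtracts the mean structure (central covariance).
    Passing a reference structure (flattened, length 3 * n_atoms) instead
    gives the non-central variant, i.e. deviations from a reference
    rather than from the average.
    """
    mu = x.mean(axis=0) if center is None else np.asarray(center, dtype=float)
    dx = x - mu
    cov = dx.T @ dx / len(x)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # descending order
    # Each eigenvalue is the variance summed over all atoms along its
    # eigenvector, so eigval / n_atoms is an average per-atom variance.
    sd_first_per_atom = np.sqrt(eigvals[0] / n_atoms)
    # The eigenvalues sum to the trace (total variance of all coordinates);
    # sqrt(trace / n_atoms) is the root-mean-square fluctuation per atom.
    rms_fluct = np.sqrt(eigvals.sum() / n_atoms)
    return eigvals, sd_first_per_atom, rms_fluct
```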

My two questions are then:

1) Does it make sense to fit the first three residues, as I described
above, for the purposes of PCA?

2) Is it possible to compare the relative flexibilities of the
fragments using PCA?
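For question 2, one option in the spirit of the subset approach suggested in the reply quoted below is to restrict the analysis, for every isotype, to the same number of comparable atoms (for example, hypothetically, the C-alphas of residues that all fragments share), so that the spectra have the same dimension. The sketch assumes already-fitted trajectories stored as numpy arrays and user-chosen index arrays; the names are hypothetical, and whether such a subset is really comparable between isotypes is an assumption that has to be justified separately.

```python
import numpy as np


def subset_spectrum(traj, subset_idx):
    """Descending eigenvalues of the central covariance of the atoms in subset_idx.

    traj: already-fitted trajectory, shape (n_frames, n_atoms, 3).
    """
    x = traj[:, subset_idx, :].reshape(len(traj), -1)
    dx = x - x.mean(axis=0)  # deviations from the mean, not from a reference
    cov = dx.T @ dx / len(x)
    return np.linalg.eigvalsh(cov)[::-1]


def compare_on_common_subset(trajs, subsets, n_top=3):
    """trajs, subsets: dicts keyed by isotype name (hypothetical labels).

    Every subset must hold the same number of atoms so that the spectra
    have the same dimension and can be compared eigenvalue by eigenvalue.
    """
    sizes = {len(idx) for idx in subsets.values()}
    if len(sizes) != 1:
        raise ValueError("all subsets must contain the same number of atoms")
    return {name: subset_spectrum(trajs[name], subsets[name])[:n_top]
            for name in trajs}
```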

Thank you,

Tyler

> Hi Tyler,
>
> First, what question are you trying to answer? Your different peptides
> have completely different conformational spaces, simply because of the
> differences in degrees of freedom, so you can't compare the PCA results
> from one system with the other. That is, unless you pick a subset from
> each system, consisting of comparable particles, for which you can
> safely make the assumption that under equal circumstances they should
> give the same eigenvectors and eigenvalues. From that assumption, you
> could try to assess whether the behaviour differs between the systems.
>
> Also, since you're using only the first three residues for fitting, you
> generate a non-central covariance matrix. That can be useful if you
> would like to exaggerate certain motional features, but it makes the
> interpretation of PCA results difficult. If it's for the purpose of
> comparing things, I wouldn't go there if I were you. The non-centrality
> is also the reason that your standard deviations end up high. You're
> not subtracting the mean, so your standard deviation is
> sqrt( sum(x^2)/N ) rather than sqrt( sum((x-average)^2)/N ). Is this
> really what you want to do? What are you expecting to get from this?
> I'd like to know the question you're trying to answer and your
> assumptions about the nature of the data...
>
> Cheers,
>
> Tsjerk
>
> On 4/7/06, Tyler Luchko <tluchko at ualberta.ca> wrote:
>>
>> Hello,
>>
>> Thank you for the previous responses. I still have some questions
>> about the eigenvalues, however.
>>
>> I should note that the frames of my trajectory have been fit to a
>> reference structure using the backbone atoms of the first three
>> residues.  This is because the peptide is a fragment of a much larger
>> protein.
>>
>> 1) If I wish to compare the eigenvalues of several peptides of
>> different lengths how would I normalize the eigenvalues?  Do I simply
>> divide by the number of atoms used in the calculation?
>>
>> 2) If the eigenvalue represents the sum of the variances for each
>> particle along the eigenvector, then dividing the eigenvalue by the
>> number of atoms used in the calculation should give the average
>> variance. Likewise, the square root of this should be the average
>> standard deviation per atom. In my case, the first eigenvector is a
>> stretching along the length of the peptide. Shouldn't the average
>> standard deviation per atom along this stretching motion be smaller
>> than the standard deviation in the length of the entire peptide, or
>> at least smaller than the extended length of the peptide?
>>
>> Thank you,
>>
>> Tyler
>>
>>> Hi Tyler,
>>>
>>> Note that the eigenvalue represents the sum of the variances for each
>>> particle along the associated eigenvector. That seems quite reasonable
>>> to me.
>>>
>>> Tsjerk
>>>
>>> On 4/6/06, Tyler Luchko <tluchko at ualberta.ca> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I have performed PCA, without mass weighting, on a peptide using
>>>> g_covar and g_anaeig. The first principal component generally
>>>> corresponds to the stretching of the peptide. I understand that each
>>>> eigenvalue represents the variance in the motion along the associated
>>>> eigenvector. However, the square root of the first eigenvalue is
>>>> ~20 nm, while the maximum extended length of any peptide is ~3 nm.
>>>> I have tried normalizing the eigenvalues by the number of atoms used
>>>> for the analysis (73), but this gives a standard deviation of the
>>>> motion of ~2.2 nm, still much too large. I would like to know how to
>>>> normalize the eigenvalues to obtain reasonable standard deviations.
>>>>
>>>> Thank you,
>>>>
>>>> Tyler
>>>>
>>>>
>>>>   ________________________________________________________________
>>>> (_    Tyler Luchko                           Ph.D. Candidate    _)
>>>>   _)   Department of Physics            University of Alberta   (_
>>>> (_    Edmonton, Alberta, Canada                                 _)
>>>>   _)   780-492-1063                       tluchko at ualberta.ca   (_
>>>> (________________________________________________________________)
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> gmx-users mailing list    gmx-users at gromacs.org
>>>> http://www.gromacs.org/mailman/listinfo/gmx-users
>>>> Please don't post (un)subscribe requests to the list. Use the
>>>> www interface or send it to gmx-users-request at gromacs.org.
>>>> Can't post? Read http://www.gromacs.org/mailing_lists/users.php
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Tsjerk A. Wassenaar, M.Sc.
>>> Groningen Biomolecular Sciences and Biotechnology Institute (GBB)
>>> Dept. of Biophysical Chemistry
>>> University of Groningen
>>> Nijenborgh 4
>>> 9747AG Groningen, The Netherlands
>>> +31 50 363 4336
>>>
>>
>