[gmx-users] [Fwd: please help me with PCA questions]

Tsjerk Wassenaar t.a.wassenaar at chem.rug.nl
Wed Nov 24 12:28:13 CET 2004


Jinzhi Tan,

Sorry that no reply was given before. First off, PCA is just a method to 
reformulate the fluctuations of the atoms you find in a trajectory. That 
basically means that you only get information about the motions that are 
actually sampled on the time scales you simulate. So, if you want to show 
something with PCA, you should first have an idea of the kind of things 
you want to find (and of what the possibilities and limitations of PCA 
are).
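
To make "reformulating the fluctuations" concrete, here is a minimal 
sketch in Python/NumPy, not what the Gromacs tools do internally. It 
assumes the frames have already been fitted to a reference structure and 
skips mass weighting; the function name and the random test data are 
made up for illustration.

```python
import numpy as np

def pca_from_trajectory(coords):
    """PCA of atomic fluctuations.

    coords: array of shape (n_frames, n_atoms, 3), assumed to be already
    fitted to a reference structure (no fitting is done here).
    """
    n_frames = coords.shape[0]
    x = coords.reshape(n_frames, -1)       # one 3N-vector per frame
    mean = x.mean(axis=0)                  # average structure
    dx = x - mean                          # fluctuations around the average
    cov = dx.T @ dx / n_frames             # 3N x 3N covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigval)[::-1]       # largest fluctuation first
    eigval, eigvec = eigval[order], eigvec[:, order]
    proj = dx @ eigvec                     # projections (principal components)
    return mean, eigval, eigvec, proj

# Hypothetical data: 200 frames of 5 atoms, random numbers only.
rng = np.random.default_rng(0)
traj = rng.normal(size=(200, 5, 3))
mean, eigval, eigvec, proj = pca_from_trajectory(traj)
```

The eigenvalues only redistribute the total fluctuation present in the 
frames over new axes, so whatever was not sampled in the run cannot show 
up in the analysis.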

With regard to the time scale, though it largely depends on what you're 
after, you should note that for runs shorter than about 15 ns (check 
some work by Berk Hess), the first few principal components you get may 
be due to random diffusion rather than a protein-specific effect. You 
can check the cosine content of the PCs to see whether it is random 
diffusion you are looking at.
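
As a rough illustration of that check, the sketch below assumes the 
usual definition of the cosine content of a projection p(t) over a run 
of length T, namely c_i = (2/T) (int cos(i pi t/T) p(t) dt)^2 / int 
p(t)^2 dt, evaluated on evenly spaced frames. The function name and the 
test signals are made up for illustration; verify the definition against 
the Gromacs manual before relying on it.

```python
import numpy as np

def cosine_content(proj, i=1):
    """Cosine content of one projection for cosine number i.

    Assumes evenly spaced frames and a projection with zero mean, as
    principal components have by construction.
    """
    p = np.asarray(proj, dtype=float)
    n = len(p)
    t = (np.arange(n) + 0.5) / n           # frame midpoints, run length T = 1
    c = np.cos(i * np.pi * t)
    # Discrete version of (2/T) * (int c*p dt)^2 / (int p^2 dt)
    return (2.0 / n) * np.dot(c, p) ** 2 / np.dot(p, p)

n = 5000
t = (np.arange(n) + 0.5) / n
half_cosine = np.cos(np.pi * t)            # exactly half a cosine period

rng = np.random.default_rng(0)
white_noise = rng.normal(size=n)           # uncorrelated fluctuations
random_walk = np.cumsum(rng.normal(size=n))
random_walk -= random_walk.mean()          # free diffusion in one dimension

for name, p in [("half cosine", half_cosine),
                ("white noise", white_noise),
                ("random walk", random_walk)]:
    print(f"{name:12s} {cosine_content(p):.3f}")
```

A value close to 1 for the first PC means the projection looks like half 
a cosine period, which is the pattern random diffusion produces; a value 
close to 0 means it does not.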

The second question again depends on what you want to see. Basically, 
you start from a point where you have a fair guess that the system is 
equilibrated. That's not the same as the RMSD having leveled off. 
Whether it's correct to do PCA on a 1 ns run depends on the questions 
asked and on the inferences and conclusions drawn afterwards :p

The values you get for the extreme projections are just what they say 
they are: the minimum and maximum projection of the trajectory onto that 
eigenvector, listed with the times at which they occur. Compare it to 
the RMSD value of a structure w.r.t. the average structure, but measured 
along a certain eigenvector. If you want it more correct and exact, 
first check the Gromacs manual and then get a statistics textbook that 
treats PCA.
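
A small sketch of what "value" and "time" mean, again in Python/NumPy 
with made-up names and random data. It assumes that an extreme structure 
is the average structure displaced along the eigenvector by the extreme 
projection value; that is an assumption about how the -extr output is 
built, so check it against the g_anaeig documentation.

```python
import numpy as np

def extreme_projections(x, times, vec):
    """Extreme projections of a trajectory onto one unit eigenvector.

    x: (n_frames, 3N) fitted coordinates, times: (n_frames,),
    vec: (3N,) eigenvector with unit length.
    """
    mean = x.mean(axis=0)
    proj = (x - mean) @ vec                # one value per frame
    lo, hi = int(np.argmin(proj)), int(np.argmax(proj))
    return {
        "min_value": proj[lo], "min_time": times[lo],
        "max_value": proj[hi], "max_time": times[hi],
        # Assumed construction: average structure moved along vec only.
        "min_structure": mean + proj[lo] * vec,
        "max_structure": mean + proj[hi] * vec,
        # The actual frame at the time of the minimum, for comparison.
        "min_snapshot": x[lo],
    }

# Hypothetical data: 100 frames with 12 coordinates each.
rng = np.random.default_rng(1)
x = rng.normal(size=(100, 12))
times = np.arange(100, dtype=float)
dx = x - x.mean(axis=0)
_, vecs = np.linalg.eigh(dx.T @ dx / len(x))
vec = vecs[:, -1]                          # eigh sorts ascending: last is largest
res = extreme_projections(x, times, vec)
```

Under that assumption, the snapshot at the listed time and the written 
extreme structure have the same projection on this eigenvector, but the 
snapshot also carries displacements along all the other eigenvectors, so 
the two structures are not expected to be identical.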

Hope this is of some use to you.

Cheers,

Tsjerk

David van der Spoel wrote:

>-------- Forwarded Message --------
>From: Jinzhi Tan <jztan at mail.shcnc.ac.cn>
>Reply-To: jztan at mail.shcnc.ac.cn
>To: spoel at xray.bmc.uu.se
>Subject: please help me with PCA questions
>Date: Wed, 24 Nov 2004 9:29:43 +0800
>Dear Prof. van der Spoel,
>
>I am a gmx-user. A few days ago I asked some questions on the gmx-users list. I think maybe they were very stupid questions, since nobody answered them. I hope you can help me. Thank you very much! Please see the following:
>
>After running a conventional MD simulation for several nanoseconds, I want to do PCA. I encountered some problems.
>
>Firstly, how long should I run the conventional MD when I want to do PCA? As long as possible? I was told that the sampling of conformational space will be sufficient if the simulation time is long enough. But I am not sure that this works, because I found that some loops are mobile: they moved within the first several hundred picoseconds and then held the new position for a long time. I wonder whether they would come back to their original conformation if I ran a long MD simulation. Another case is protein unfolding. Some papers reported protein unfolding after a long MD time (several nanoseconds), but I wonder whether, if the time is long enough, the protein can fold again automatically. What should we think about the effect of the force field?
>
>Secondly, which time should I select as the starting time for PCA? Should I select the time when the RMSD of the protein tends to level off, after about two nanoseconds, or should I use the whole MD simulation? In some papers, they run just one nanosecond in total and then do PCA. Is that correct?
> 
>Thirdly, I used two methods to analyze the first eigenvector and got different results. I am not sure why they are different. If I use: g_anaeig -v eigenvec.trr -first 1 -last 1 -extr vec1_extreme.pdb, I get the following result:
>
>1 eigenvectors selected for output: 1
>Last frame       9445 time 9445.000   
>eigenvector           Minimum           Maximum
>                 value       time      value       time
>      1      -6.273994      454.0   5.266299     9429.0
>Writing 2 frames along eigenvector 1 to vec1_extreme.pdb
> 
>When I use: g_anaeig -v eigenvec.trr -first 1 -last 8 -extr vec18_extreme.pdb, I got:
>
>8 eigenvectors selected for output: 1 2 3 4 5 6 7 8
>Last frame       9445 time 9445.000   
>eigenvector           Minimum           Maximum
>                 value       time      value       time
>      1      -6.273994      454.0   5.266299     9429.0
>      2      -4.850856       11.0   4.864636     5113.0
>      3      -2.722965     6113.0   2.619274     2238.0
>      4      -2.837103     3826.0   2.447154     8460.0
>      5      -3.493261     7502.0   2.076011      778.0
>      6      -2.219512     5995.0   2.655742      489.0
>      7      -1.916822     5302.0   2.395802     2613.0
>      8      -2.154755       62.0   1.883655     7235.0
>Writing 2 frames along eigenvector 1 to vec18_extreme1.pdb
>Writing 2 frames along eigenvector 2 to vec18_extreme2.pdb
>Writing 2 frames along eigenvector 3 to vec18_extreme3.pdb
>Writing 2 frames along eigenvector 4 to vec18_extreme4.pdb
>Writing 2 frames along eigenvector 5 to vec18_extreme5.pdb
>Writing 2 frames along eigenvector 6 to vec18_extreme6.pdb
>Writing 2 frames along eigenvector 7 to vec18_extreme7.pdb
>Writing 2 frames along eigenvector 8 to vec18_extreme8.pdb
> 
>So what is the meaning of "value"? Does the time correspond to the real simulation time? I compared the snapshot at 454.0 ps with vec1_extreme.pdb (the minimum) and vec18_extreme1.pdb (the minimum), and they are not the same! So what is the meaning of the time?
>
>For the two results, the information on the first eigenvector is the same (as above), but vec1_extreme.pdb and vec18_extreme1.pdb are actually different. Should they be the same?
>
>I am not sure whether I am confused about the basic theory of PCA or have made some other mistake. I hope you can give me some advice. Thank you very much!
>
>Best wishes,
>
>Jinzhi Tan  
>2004-11-10
>************************************
>E-mail: tanjinzhi at hotmail.com 
>        jztan at mail.shcnc.ac.cn   
>************************************ 


-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- :)
-- :) 	Tsjerk A. Wassenaar, M.Sc.
-- :) 	Molecular Dynamics Group
-- :) 	Dept. of Biophysical Chemistry
-- :) 	University of Groningen
-- :) 	Nijenborgh 4
-- :) 	9747 AG Groningen
-- :) 	The Netherlands
-- :)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- :)
-- :) 	Hi! I'm a .signature virus!
-- :) 	Copy me into your ~/.signature to help me spread!
-- :)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




