[gmx-users] How to make PCA effective?

Jinzhi Tan jztan at mail.shcnc.ac.cn
Thu Nov 11 13:48:24 CET 2004


Dear gmx-users,

After running a conventional MD simulation for several nanoseconds, I want to do PCA, but I have run into some problems.

First, how long should a conventional MD simulation run before doing PCA? As long as possible? I was told that conformational space will be sampled sufficiently if the simulation is long enough. But I am not sure this works in practice, because I found that some mobile loops moved only during the first several hundred picoseconds and then held their new positions for a long time. Would they return to their original conformation if I ran a longer simulation? Another case is protein unfolding. Some papers report protein unfolding after a long MD run (several nanoseconds), but I wonder whether the protein would refold spontaneously if the run were long enough. What do you think about the effect of the force field?

Second, which time should I choose as the starting point for PCA? Should I start from the time when the RMSD of the protein levels off (after about two nanoseconds), or should I use the whole trajectory? Some papers run only one nanosecond in total and then do PCA. Is that correct?
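
One way to see whether the choice of start time matters is to compute the first eigenvector twice, once from the whole trajectory and once from the part after a chosen start time, and compare the two. The following is only a minimal sketch with NumPy on synthetic data, not the g_covar implementation: the array layout (one row per fitted frame), the function names first_eigenvector and overlap, and the synthetic "drift then fluctuate" trajectory are all assumptions made for illustration.

```python
import numpy as np

def first_eigenvector(coords):
    """Return the eigenvector with the largest eigenvalue of the
    covariance matrix of coords (shape: n_frames x n_coordinates)."""
    centered = coords - coords.mean(axis=0)
    cov = centered.T @ centered / coords.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1]

def overlap(coords, times, t_start):
    """Absolute inner product between the first eigenvector of the whole
    trajectory and that of the frames with time >= t_start (1 = identical)."""
    v_all = first_eigenvector(coords)
    v_late = first_eigenvector(coords[times >= t_start])
    return abs(v_all @ v_late)

# Synthetic trajectory (hypothetical): 1000 frames, 4 coordinates.
rng = np.random.default_rng(0)
times = np.arange(1000.0)
coords = rng.normal(size=(1000, 4)) * np.array([0.3, 1.0, 0.3, 0.3])
# Initial relaxation: coordinate 0 drifts from -10 to 0 over the first 100 frames.
coords[:100, 0] += np.linspace(-10.0, 0.0, 100)

print("overlap, whole vs. whole:       ", overlap(coords, times, 0.0))
print("overlap, whole vs. after t=100: ", overlap(coords, times, 100.0))
```

In this constructed example the initial drift dominates the first eigenvector of the whole trajectory, so the overlap with the later part is small. Whether a real trajectory behaves this way depends on the system.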
 
Third, I used two methods to analyze the first eigenvector and got different results, and I am not sure why. If I use: g_anaeig -v eigenvec.trr -first 1 -last 1 -extr vec1_extreme.pdb, I got the following result:

1 eigenvectors selected for output: 1
Last frame       9445 time 9445.000   
eigenvector           Minimum           Maximum
                 value       time      value       time
      1      -6.273994      454.0   5.266299     9429.0
Writing 2 frames along eigenvector 1 to vec1_extreme.pdb
 
When I use: g_anaeig -v eigenvec.trr -first 1 -last 8 -extr vec18_extreme.pdb, I got:

8 eigenvectors selected for output: 1 2 3 4 5 6 7 8
Last frame       9445 time 9445.000   
eigenvector           Minimum           Maximum
                 value       time      value       time
      1      -6.273994      454.0   5.266299     9429.0
      2      -4.850856       11.0   4.864636     5113.0
      3      -2.722965     6113.0   2.619274     2238.0
      4      -2.837103     3826.0   2.447154     8460.0
      5      -3.493261     7502.0   2.076011      778.0
      6      -2.219512     5995.0   2.655742      489.0
      7      -1.916822     5302.0   2.395802     2613.0
      8      -2.154755       62.0   1.883655     7235.0
Writing 2 frames along eigenvector 1 to vec18_extreme1.pdb
Writing 2 frames along eigenvector 2 to vec18_extreme2.pdb
Writing 2 frames along eigenvector 3 to vec18_extreme3.pdb
Writing 2 frames along eigenvector 4 to vec18_extreme4.pdb
Writing 2 frames along eigenvector 5 to vec18_extreme5.pdb
Writing 2 frames along eigenvector 6 to vec18_extreme6.pdb
Writing 2 frames along eigenvector 7 to vec18_extreme7.pdb
Writing 2 frames along eigenvector 8 to vec18_extreme8.pdb
 
So what does "value" mean? Does the time correspond to the real simulation time? But when I compare the snapshot at 454.0 ps with vec1_extreme.pdb (taking the minimum frame) and vec18_extreme1.pdb (taking the minimum frame), they are not the same! So what does the time mean?

In both runs the reported information for the first eigenvector is identical (see above), yet vec1_extreme.pdb and vec18_extreme1.pdb are different. Should they be the same?
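
To make the question concrete, here is a minimal NumPy sketch of how a minimum/maximum table like the one above could be produced. It is not the g_anaeig source: it assumes that "value" is the projection of each mean-centered frame onto the eigenvector, that "time" is the time of the frame where that projection is extreme, and that the written extreme structure is the average structure displaced along the eigenvector by that value. The function names pca and extreme_along and the synthetic data are hypothetical, and mass weighting and fitting are ignored.

```python
import numpy as np

def pca(coords):
    """PCA of coords (n_frames x n_coordinates); returns the mean structure,
    eigenvalues and eigenvectors sorted by decreasing eigenvalue."""
    mean = coords.mean(axis=0)
    centered = coords - mean
    cov = centered.T @ centered / coords.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending order
    order = np.argsort(eigvals)[::-1]
    return mean, eigvals[order], eigvecs[:, order]

def extreme_along(coords, times, mean, eigvecs, k):
    """For eigenvector k return (value, time, structure) at the minimum and
    maximum projection. The structure is mean + value * eigenvector, i.e. a
    point on the eigenvector axis, not the actual frame at that time."""
    proj = (coords - mean) @ eigvecs[:, k]
    i_min, i_max = int(np.argmin(proj)), int(np.argmax(proj))
    lo = (proj[i_min], times[i_min], mean + proj[i_min] * eigvecs[:, k])
    hi = (proj[i_max], times[i_max], mean + proj[i_max] * eigvecs[:, k])
    return lo, hi

# Synthetic data (hypothetical): 200 frames, 6 coordinates.
rng = np.random.default_rng(0)
times = np.arange(200.0)
coords = rng.normal(size=(200, 6)) * np.array([3.0, 2.0, 1.0, 0.5, 0.5, 0.5])

mean, eigvals, eigvecs = pca(coords)
lo, hi = extreme_along(coords, times, mean, eigvecs, 0)
print("eigenvector 1: min %.3f at t=%.1f, max %.3f at t=%.1f"
      % (lo[0], lo[1], hi[0], hi[1]))
# The actual frame at the reported time also has components along the other
# eigenvectors, so under these assumptions it differs from the extreme structure.
print("extreme structure equals frame at that time:",
      np.allclose(lo[2], coords[int(lo[1])]))
```

Under these assumptions the snapshot at the reported time would not be expected to match the extreme structure. The sketch does not explain why the two output files for the same eigenvector differ from each other.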

I am not sure whether I have misunderstood the basic theory of PCA or made some other mistake. I hope you can give me some advice. Thank you very much!

Best wishes,

Jinzhi Tan  <tanjinzhi at hotmail.com>
2004-11-10
 


