[gmx-users] Re: questions about Principal Component Analysis

Thomas Evangelidis tevang3 at gmail.com
Sat Mar 10 15:18:07 CET 2012

Regarding my second question, I have been experimenting with the cosine
content using different portions of the trajectory and these are the
results I got for the first principal component:

proj-ev1_coscont_0-5ns.xvg 0.0174761
proj-ev1_coscont_0-10ns.xvg 0.0283423
proj-ev1_coscont_0-15ns.xvg 4.16906e-06
proj-ev1_coscont_0-20ns.xvg 0.0689592
proj-ev1_coscont_0-25ns.xvg 0.161691
proj-ev1_coscont_0-30ns.xvg 0.298431
proj-ev1_coscont_0-35ns.xvg 0.535732
proj-ev1_coscont_0-40ns.xvg 0.767029
proj-ev1_coscont_0-45ns.xvg 0.885829
proj-ev1_coscont_0-50ns.xvg 0.906473

proj-ev1_coscont_5-50ns.xvg  0.8823
proj-ev1_coscont_10-50ns.xvg 0.751018
proj-ev1_coscont_15-50ns.xvg 0.537473
proj-ev1_coscont_20-50ns.xvg 0.357136
proj-ev1_coscont_25-50ns.xvg 0.145889
proj-ev1_coscont_30-50ns.xvg 0.0150995
proj-ev1_coscont_35-50ns.xvg 0.00123905
proj-ev1_coscont_40-50ns.xvg 0.00675679
proj-ev1_coscont_45-50ns.xvg 0.0105643

I ran the simulation for 70 ns in total but, judging from the steep increase
of the RMSD (RMSD plot attached), I decided to exclude the first 20 ns from
the analysis. Hence the cosine content values above correspond to the last
50 ns, with time counted from the 20 ns mark (0 ns above is 20 ns of the
full trajectory).

Clearly the last 50 ns as a whole are not well converged
(proj-ev1_coscont_0-50ns.xvg: 0.906473), but the PC1 plot (attached) shows
a steep decrease at ~30 ns which looks like a conformational transition.
The cosine content also drops sharply once the window starts at around that
point (25-50ns: 0.145889, 30-50ns: 0.0150995, 35-50ns: 0.00123905, etc.),
reaching values close to zero.
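
For reference, the cosine content of the projection p(t) on principal
component i over a window of length T is, as far as I understand the usual
definition, c_i = (2/T) * (integral of cos(i*pi*t/T) p(t) dt)^2 /
(integral of p(t)^2 dt), which ranges from 0 (no cosine) to 1 (a perfect
cosine). Below is a minimal Python sketch of that formula. It is not the
script that produced the numbers above; the names load_xvg and
cosine_content are hypothetical, and the two-column layout (time,
projection) of the .xvg file is an assumption.

```python
import numpy as np


def load_xvg(path):
    """Read time and projection columns from an .xvg file.

    Assumes two numeric columns; lines starting with '#' or '@' are
    treated as comments/plot directives and skipped.
    """
    t, p = np.loadtxt(path, comments=("#", "@"), usecols=(0, 1), unpack=True)
    return t, p


def _trapezoid(y, x):
    """Trapezoidal-rule integral of y over x."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))


def cosine_content(t, p, i=1):
    """Cosine content of projection p(t) for principal component i.

    The window is whatever time range is passed in; time is shifted so
    that the window starts at zero and has length T.
    """
    t = np.asarray(t, dtype=float)
    p = np.asarray(p, dtype=float)
    s = t - t[0]          # time relative to the window start
    T = s[-1]             # window length
    cos_i = np.cos(i * np.pi * s / T)
    numerator = _trapezoid(cos_i * p, s) ** 2
    denominator = _trapezoid(p * p, s)
    return (2.0 / T) * numerator / denominator


def cosine_content_window(t, p, t_start, t_end, i=1):
    """Cosine content restricted to t_start <= t <= t_end."""
    t = np.asarray(t, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = (t >= t_start) & (t <= t_end)
    return cosine_content(t[mask], p[mask], i=i)
```

With the projection loaded as t, p = load_xvg("proj-ev1.xvg") (a
hypothetical file name), a call such as
cosine_content_window(t, p, 30000, 50000) would correspond to the 30-50ns
window, assuming the time column is in ps.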

Unfortunately, extending the simulation is not an option due to lack of
time (I am forced to finish the manuscript soon). So would you recommend
doing essential dynamics for the last 20ns?

I would GREATLY appreciate any comments!!!


On 7 March 2012 21:56, Thomas Evangelidis <tevang3 at gmail.com> wrote:

> Dear GROMACS community,
> I have two questions regarding PCA. I ran a 70 ns MD simulation of a
> protein of 1100 amino acids and decided, based on the RMSD, to analyze
> only the last 50 ns.
> 1) The protein consists of 5 domains, of which only one is of interest
> in this study. Is it valid to draw conclusions from a PCA restricted to
> that domain (400 aa)?
> 2) How can I find out if the simulation time is sufficient to do PCA?
> Thanks in advance for any feedback.
> Thomas
