[gmx-users] Does PCA depend on the number of frames?

Tsjerk Wassenaar tsjerkw at gmail.com
Mon Sep 26 08:53:28 CEST 2011

Hi Ricardo

> For cases (1) and (2), the most representative structure was used with the
> -s option (the one with the lowest RMSD with respect to the average of each
> cluster).
> In case (3), the initial structure of the MD run was used with the -s option.

If all three sets belong to the same system, it is better to use a
single reference structure, so that the conformational space is
defined in the same way for each and the results can be compared
directly.

> When I look at the eigenvalues for cases (1) and (2), I find that the
> eigenvalues become zero only after index = "number of frames" (see below).
> In case (3), the distribution is smooth.
> I would expect a similar distribution for cases (1) and (2), because the
> frames are representative of the dynamics of the protein.
> Why this difference?
> Does PCA depend on the number of frames?

Yes, it does. This has, in fact, been pointed out in the early papers
on PCA in MD. I think it's best to read up more on PCA, including
some introductory material from statistics. One thing I'll give away,
though... ;) Consider the motion of a particle in three dimensions. If
you have two frames, you can say something about motion along a line.
You need three frames to say something about motion in a plane, and you
need at least four points to say something about motion in all three
dimensions. Now in your case, each conformation is one point, and the
conformational space in which that point moves has 3N dimensions. If
you have two points, you can only say something about motion along a
line, i.e., you have one eigenvector with a nonzero eigenvalue. With
three points (conformations), you can obtain two eigenvectors, which
span a plane, etc. In general, M frames give at most M-1 nonzero
eigenvalues: the fluctuations are taken relative to the average
structure, so the M deviation vectors sum to zero and span at most
M-1 dimensions. That is why, in cases (1) and (2), the eigenvalues
vanish from the index equal to the number of frames onward.
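A minimal sketch of the counting argument above, in plain Python rather than anything GROMACS does internally. Random integer coordinates stand in for M conformations of a hypothetical N-atom system (an assumption; real frames are correlated, but that does not change the bound). The covariance matrix is built from deviations from the average structure; because it is symmetric positive semidefinite, its rank equals the number of nonzero eigenvalues. Exact `Fraction` arithmetic is used for clarity; floating-point codes will typically report the "zero" eigenvalues as tiny numbers instead. The helper names (`covariance_matrix`, `rank`, `nonzero_eigenvalue_count`) are hypothetical.

```python
import random
from fractions import Fraction


def covariance_matrix(frames):
    # frames: list of M conformations, each a flat list of 3N coordinates.
    m = len(frames)
    dim = len(frames[0])
    # Average structure, then each frame's deviation from it.
    mean = [sum(f[k] for f in frames) / m for k in range(dim)]
    dev = [[f[k] - mean[k] for k in range(dim)] for f in frames]
    # C[i][j] = (1/M) * sum over frames of dev_i * dev_j
    return [[sum(d[i] * d[j] for d in dev) / m for j in range(dim)]
            for i in range(dim)]


def rank(matrix):
    # Exact rank via Gauss-Jordan elimination (no round-off with Fractions).
    a = [row[:] for row in matrix]
    n_rows, n_cols = len(a), len(a[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if a[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(n_rows):
            if i != r and a[i][c] != 0:
                factor = a[i][c] / a[r][c]
                a[i] = [x - factor * y for x, y in zip(a[i], a[r])]
        r += 1
        if r == n_rows:
            break
    return r


def nonzero_eigenvalue_count(n_frames, n_atoms, seed=0):
    # Hypothetical stand-in for a trajectory: random integer coordinates.
    rng = random.Random(seed)
    frames = [[Fraction(rng.randint(-10**6, 10**6)) for _ in range(3 * n_atoms)]
              for _ in range(n_frames)]
    # The covariance matrix is positive semidefinite, so its rank is the
    # number of nonzero eigenvalues.
    return rank(covariance_matrix(frames))


if __name__ == "__main__":
    for m in range(1, 11):
        print(m, "frames ->", nonzero_eigenvalue_count(m, n_atoms=2),
              "nonzero eigenvalues")
```

For generic coordinates the count should come out as min(M-1, 3N): with two atoms (3N = 6), it climbs by one per added frame and saturates at 6, which mirrors the cutoff at index = "number of frames" seen in cases (1) and (2).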

Hope it helps,


Tsjerk A. Wassenaar, Ph.D.

post-doctoral researcher
Molecular Dynamics Group
* Groningen Institute for Biomolecular Research and Biotechnology
* Zernike Institute for Advanced Materials
University of Groningen
The Netherlands
