# [gmx-users] PCA

Tsjerk Wassenaar tsjerkw at gmail.com
Fri May 21 10:05:18 CEST 2010

```
Hi Pawan,

You may want to read up on PCA in some elementary multivariate
statistics textbook to get a better grasp on what it does and how it's
done.

> I have a small conceptual problem regarding principal component analysis.
> My questions about ED sampling are as follows:
>
> 1. I have read in the manual that g_covar calculates and diagonalizes the
> (mass-weighted) covariance matrix. What does "mass-weighted" mean for a
> covariance matrix?

In a mass-weighted analysis each coordinate is scaled by the square
root of the mass of its atom before the covariance matrix is built.
Coordinates are taken relative to the center of mass, rather than to
the center of geometry.

> 2. g_covar outputs eigenval.xvg and eigenvec.trr, but when I opened the
> eigenval.xvg file it showed nothing. I don't know what was wrong with
> it.

I don't know either; you didn't say anything that could give a pointer
to the problem. What was your command line, what were your
selections, and what was the output of the program? And does "opening"
mean looking at it as text or looking at it with xmgrace?

> 3. What is the difference between a covariance matrix and normal mode
> analysis, since both are used to generate the eigenval.xvg and
> eigenvec.trr files?

A covariance matrix is a matrix, and normal mode analysis is an
analysis technique. 'Essential dynamics' involves PCA on a positional
covariance matrix, whereas normal mode analysis applies the same kind
of eigendecomposition to the Hessian matrix. Reading some literature
wouldn't hurt here.

> 4. g_anaeig analyzes the eigenvectors, so is it possible to fit all the
> structures generated during the simulation of a single structure without
> using another structure?
> I mean, is it possible to use a single structure as the starting point
> for simulation and ED sampling?

If you've determined a set of eigenvectors you want to use for your
sampling, and these correspond to your system, yes. But you need a set
of structures to obtain the eigenvectors.

> 5. Why does g_anaeig need the eigenvec2.trr input file to generate the
> single number for the covariance matrices, as shown in the manual? I used
> only one eigenvec.trr and eigenval.xvg as input; is that right?

The single number is a measure for the overlap of one sampled space
with another. Surely you'll need two sets of vectors to calculate an
overlap.

> 6. I used eigenval.xvg as an input file for g_anaeig, but it shows
> nothing when opened in xmgrace. How, then, was this file used to generate
> eigcomp.xvg, proj.xvg, eigrmsf.xvg, 2dproj.xvg and 3dproj.pdb (which I
> generated successfully)?

The eigenvalues are not used for the projections, or for analysis of
the projections. But was the file really empty? That would imply
your PCA did not complete successfully. You should have a look
at the file as text.

> 7. One last question, about g_analyze: it is said to read an ASCII file
> and analyze data sets, but it actually uses some graph.xvg file as input.
> I am calculating the cosine content of the principal components.

It's a default file name that the program searches for if no file name
is specified.

Hope it helps,

Tsjerk

--
Tsjerk A. Wassenaar, Ph.D.

post-doctoral researcher
Molecular Dynamics Group
Groningen Institute for Biomolecular Research and Biotechnology
University of Groningen
The Netherlands

```
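
On question 1, the effect of mass-weighting can be illustrated with a small sketch. It assumes the convention in which each coordinate deviation is scaled by the square root of its atom's mass, and that the frames have already been fitted to a reference structure. The function name `mass_weighted_covariance` and the flat frame layout are hypothetical choices for this illustration, not part of g_covar.

```python
import math

def mass_weighted_covariance(frames, masses):
    """Build a mass-weighted covariance matrix.

    frames: list of frames, each a flat list [x1, y1, z1, x2, y2, z2, ...]
    masses: one mass per atom
    Returns a 3N x 3N matrix as a list of lists.
    """
    n_frames = len(frames)
    n_coords = len(frames[0])
    # Each atom's mass applies to its three Cartesian coordinates.
    weights = [math.sqrt(masses[k // 3]) for k in range(n_coords)]
    # Average structure over all frames.
    mean = [sum(f[k] for f in frames) / n_frames for k in range(n_coords)]
    # Mass-weighted deviations from the average structure.
    dev = [[weights[k] * (f[k] - mean[k]) for k in range(n_coords)]
           for f in frames]
    return [[sum(d[i] * d[j] for d in dev) / n_frames
             for j in range(n_coords)]
            for i in range(n_coords)]

# One atom of mass 4 oscillating along x between -1 and +1.
cov = mass_weighted_covariance([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]], [4.0])
print(cov[0][0])
```

With this convention the diagonal entry is the plain positional variance multiplied by the atom's mass, so heavy atoms contribute more to the leading eigenvectors than they would in an unweighted analysis. Diagonalizing the resulting matrix is the PCA step proper and is left out here.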
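
On questions 2 and 6, checking whether eigenval.xvg is really empty is easiest by reading it as text. The sketch below assumes the usual .xvg layout, where lines starting with `#` are comments, lines starting with `@` are xmgrace directives, and `&` separates data sets. The helper name `read_xvg` and the sample content are made up for illustration and are not real g_covar output.

```python
def read_xvg(text):
    """Return the numeric rows of an .xvg file passed in as a string."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments (#), xmgrace directives (@)
        # and data set separators (&).
        if not line or line[0] in "#@&":
            continue
        rows.append([float(field) for field in line.split()])
    return rows

# Illustrative content only, not taken from an actual run.
sample = """# hypothetical header comment
@    title "Eigenvalues of the covariance matrix"
1 2.5
2 0.75
"""
eigenvalues = [row[1] for row in read_xvg(sample)]
print(len(eigenvalues), "eigenvalues read")
```

If a file yields no rows at all, only the header was written, which would point to the analysis not having completed.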
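
On question 5, the reason two eigenvector sets are needed becomes clear once the overlap is written out. The sketch below assumes the normalized overlap is defined as 1 - d / sqrt(tr(M1) + tr(M2)) with d = sqrt(tr((sqrt(M1) - sqrt(M2))^2)), which is how I understand the g_anaeig documentation; verify it against the help text of your version. The function name `covariance_overlap` is hypothetical. Eigenvalues are assumed non-negative and eigenvectors orthonormal.

```python
import math

def covariance_overlap(vals1, vecs1, vals2, vecs2):
    """Normalized overlap of two covariance matrices, each given by its
    eigenvalues (vals) and matching orthonormal eigenvectors (vecs)."""
    trace_sum = sum(vals1) + sum(vals2)
    # tr(sqrt(M1) sqrt(M2)) written in terms of the eigenpairs.
    cross = 0.0
    for l1, v1 in zip(vals1, vecs1):
        for l2, v2 in zip(vals2, vecs2):
            dot = sum(a * b for a, b in zip(v1, v2))
            cross += math.sqrt(l1 * l2) * dot * dot
    # d^2 = tr(M1) + tr(M2) - 2 tr(sqrt(M1) sqrt(M2));
    # clamp at zero to absorb round-off.
    difference = math.sqrt(max(trace_sum - 2.0 * cross, 0.0))
    return 1.0 - difference / math.sqrt(trace_sum)

axes = [[1.0, 0.0], [0.0, 1.0]]
same = covariance_overlap([2.0, 1.0], axes, [2.0, 1.0], axes)
disjoint = covariance_overlap([1.0, 0.0], axes, [0.0, 1.0], axes)
print(same, disjoint)
```

Identical matrices give an overlap of 1, and matrices whose fluctuations lie in orthogonal subspaces give 0. With only one eigenvector set there is nothing to compare against, so no such number can be produced.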
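
On question 7, the cosine content of a principal component can be sketched as follows, assuming the definition c_i = (2/T) (∫ cos(iπt/T) p_i(t) dt)^2 / ∫ p_i(t)^2 dt for a projection p_i(t) over a trajectory of length T. The midpoint discretization and the function name `cosine_content` are choices made for this illustration; the numerical details in g_analyze may differ.

```python
import math

def cosine_content(projection, i=1):
    """Cosine content of one principal component.

    projection: projection of the trajectory on an eigenvector,
                sampled at uniform time intervals
    i: index of the cosine (1 for the first principal component)
    """
    n = len(projection)
    # Overlap with cos(i*pi*t/T), sampled at the midpoints of n intervals.
    overlap = sum(math.cos(i * math.pi * (k + 0.5) / n) * p
                  for k, p in enumerate(projection))
    norm = sum(p * p for p in projection)
    # The time step cancels between numerator and denominator.
    return (2.0 / n) * overlap * overlap / norm

n_points = 200
half_cosine = [math.cos(math.pi * (k + 0.5) / n_points)
               for k in range(n_points)]
full_cosine = [math.cos(2.0 * math.pi * (k + 0.5) / n_points)
               for k in range(n_points)]
print(cosine_content(half_cosine), cosine_content(full_cosine))
```

A projection shaped like half a cosine period gives a value close to 1, the signature of random diffusion, while a projection orthogonal to that cosine gives a value close to 0.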