[gmx-users] Principal Components Analysis in Gromacs

James Starlight jmsstarlight at gmail.com
Wed Mar 28 18:16:08 CEST 2012

Hi Tsjerk!

First, I'd also like to thank you for your help.

Today I tried to perform PCA on my X-ray data and to compare the results of
that PCA with EDA (PCA based on the MD trajectory of the same protein).

In general I have no problems with the X-ray PCA, but I ran into some when
comparing the results of both PCAs, because the two datasets have different
numbers of atoms (the X-ray structures have missing atoms). As I understand
it, this problem could be solved by selecting, from the MD trajectory (before
calculating its PCA), only the atoms that are present in all X-ray
structures.
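Selecting the atoms present in every structure amounts to a set intersection over atom identifiers. The sketch below assumes each structure has already been parsed into a list of hypothetical `(residue number, atom name)` keys and that the X-ray and MD files share the same residue numbering. It returns 1-based atom numbers in MD order and renders them as a `[ name ]` group in the plain-text layout of a GROMACS index file. The function names are illustrative, not part of GROMACS.

```python
from typing import List, Sequence, Tuple

# Hypothetical atom key: (residue number, atom name). Assumes the X-ray and
# MD files use the same residue numbering; parsing the files is omitted.
Atom = Tuple[int, str]


def common_atom_indices(md_atoms: Sequence[Atom],
                        xray_sets: Sequence[Sequence[Atom]]) -> List[int]:
    """1-based indices of MD atoms present in every X-ray structure, in MD order."""
    shared = set(md_atoms)
    for xray_atoms in xray_sets:
        shared &= set(xray_atoms)  # keep only atoms seen in this structure too
    return [i + 1 for i, atom in enumerate(md_atoms) if atom in shared]


def format_index_group(name: str, indices: Sequence[int], per_line: int = 15) -> str:
    """Render the indices as one group in the plain-text index-file layout."""
    lines = [f"[ {name} ]"]
    for start in range(0, len(indices), per_line):
        lines.append(" ".join(str(i) for i in indices[start:start + per_line]))
    return "\n".join(lines)


if __name__ == "__main__":
    md = [(1, "N"), (1, "CA"), (1, "C"), (2, "N"), (2, "CA"), (2, "C")]
    xray_a = list(md)
    xray_b = [(1, "N"), (1, "CA"), (1, "C"), (2, "CA")]  # residue 2 partly missing
    print(format_index_group("common", common_atom_indices(md, [xray_a, xray_b])))
```

For the toy input this prints the group `[ common ]` with atoms `1 2 3 5`, since atoms 4 and 6 are missing from the second X-ray structure.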

Because the X-ray dataset has some missing residues in the loops of the
structures, I had to reduce the number of atoms in the initial .tpr topology.
So, by means of tpbconv, I made a new .tpr file for these X-ray structures.
This new .tpr was based on the .tpr file I obtained from the MD of the
full-atom model of the protein. In the new .tpr I included only main-chain
atoms, and only for the residues present in the X-ray structures (I defined
them by means of an index.ndx file). Is this approach correct? Are there any
other ways to obtain a .tpr file for my X-ray dataset without reducing the
topology of the MD structure?

I'd also like to ask a bit more about the PCA results.

1) First, what exactly is the new compressed trajectory made by
-filt filtered.xtc ?

For the X-ray PCA this trajectory corresponds to the original number of PDB
files, but for the MD-based PCA this trajectory contains a reduced number of
atoms.
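In generic PCA terms, a filtered trajectory keeps, for every frame, only the motion along a chosen set of eigenvectors: each frame becomes the average structure plus the sum of its projections on those eigenvectors times the eigenvectors themselves. Because it is rebuilt from the average and the eigenvectors, it can only contain the atoms that entered the covariance analysis. The sketch below shows that general technique, not the g_anaeig source. It assumes frames are flat `[x1, y1, z1, x2, ...]` lists already fitted to the reference, orthonormal eigenvectors and no mass-weighting; the function names are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def filter_frame(frame: Sequence[float], average: Sequence[float],
                 eigvecs: Sequence[Sequence[float]],
                 selected: Sequence[int]) -> List[float]:
    """Average structure plus the frame's motion along `selected` eigenvectors (0-based)."""
    deviation = [x - m for x, m in zip(frame, average)]
    filtered = list(average)
    for k in selected:
        v = eigvecs[k]
        p = dot(deviation, v)  # projection of this frame on eigenvector k
        filtered = [f + p * vi for f, vi in zip(filtered, v)]
    return filtered


if __name__ == "__main__":
    axes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(filter_frame([2.0, 3.0, 4.0], [1.0, 1.0, 1.0], axes, [0]))
```

With the unit axes as toy eigenvectors, keeping only the first one moves just the x coordinate away from the average.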

2) I'd also like to know what exactly the output of
-extr extremePCA.pdb is.

For the MD-based PCA I obtained 20 different extreme.pdb trajectories, where
20 was the number of calculated principal components.

But for the X-ray PCA I obtained only one such file, which resembled a
visualization of the motion along the softest PC, although I had calculated
10 eigenvectors. Why did I obtain only one such file for the X-ray PCA, and
what exactly is this trajectory? Are there any other methods to visualize
motions along specified components (not for ensembles of components)?
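One common way to visualize the motion along a single component is to find the minimal and maximal projection of the data on that eigenvector and interpolate structures between them along the eigenvector, starting from the average structure. The sketch below implements that generic interpolation for one eigenvector. It is an illustration under stated assumptions (flat coordinate lists, fitted frames, orthonormal eigenvector), not a description of what `-extr` writes, and `extremes_along` is a hypothetical name. Writing the resulting frames to a PDB file is left out.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def extremes_along(frames: Sequence[Sequence[float]], average: Sequence[float],
                   eigvec: Sequence[float], nframes: int) -> List[List[float]]:
    """Interpolate `nframes` structures between the minimal and maximal
    projection of `frames` on one eigenvector (flat [x1, y1, z1, ...] lists)."""
    if nframes < 2:
        raise ValueError("need at least two frames to span the two extremes")
    projections = [dot([x - m for x, m in zip(f, average)], eigvec) for f in frames]
    p_min, p_max = min(projections), max(projections)
    path = []
    for i in range(nframes):
        p = p_min + (p_max - p_min) * i / (nframes - 1)  # evenly spaced projections
        path.append([m + p * v for m, v in zip(average, eigvec)])
    return path


if __name__ == "__main__":
    frames = [[-2.0, 5.0], [3.0, 1.0], [1.0, 0.0]]
    for structure in extremes_along(frames, [0.0, 0.0], [1.0, 0.0], 3):
        print(structure)
```

Calling it once per eigenvector of interest gives one short trajectory per component, which can be viewed on its own.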

3) How can I specify exactly which PCs are used in the projection graphs? As
I understand it, 2D and 3D projections are made along the -first and -last
components. How can I make such projections based on two or three other
specified components (e.g. along modes 2 and 7 out of 20 calculated)?
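A 2D projection is simply, for every frame, the pair of projections of the frame's deviation from the average on two eigenvectors, so any two modes (for example 2 and 7) can be combined once the per-frame projections are known. The sketch below assumes the eigenvectors and fitted frames are already available as flat coordinate lists; `project` and `to_columns` are hypothetical helpers, and the plain two-column text output is just one convenient format for plotting.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def project(frames: Sequence[Sequence[float]], average: Sequence[float],
            eigvecs: Sequence[Sequence[float]],
            components: Sequence[int]) -> List[Tuple[float, ...]]:
    """Per-frame projections on the chosen eigenvectors; `components` are
    1-based eigenvector numbers such as (2, 7)."""
    rows = []
    for frame in frames:
        deviation = [x - m for x, m in zip(frame, average)]
        rows.append(tuple(dot(deviation, eigvecs[c - 1]) for c in components))
    return rows


def to_columns(rows: Sequence[Tuple[float, ...]]) -> str:
    """Whitespace-separated columns, one frame per line, for plotting."""
    return "\n".join(" ".join(f"{value:.3f}" for value in row) for row in rows)


if __name__ == "__main__":
    axes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    frames = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    print(to_columns(project(frames, [0.0, 0.0, 0.0], axes, (1, 3))))
```

Passing three component numbers instead of two gives the columns for a 3D projection.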

Thanks for your help,


> Hi Thomas,
> > Thanks for all the clarifications about PCA you give on the mailing list!
> Thank you for the appreciation :)
> > I have a question about the command lines you wrote. Why do you use the
> > .tpr file with the "-s" flag? Is it because you want to compare the
> > mass-weighted covariance matrices? I usually calculate the covariance
> > matrices by giving g_covar a .pdb file with the "-s" flag and then
> > calculate the RMSIP without giving any structure file. I guess no masses
> > are used in that covariance analysis, right? Do you recommend using atom
> > masses for PCA in general?
> Well, I admit that in most cases I don't use mass-weighting myself.
> Unless you include hydrogens, it also doesn't matter much, as the
> masses are not very different. Only if you want to calculate
> frequencies, e.g. to connect to NMA and/or IR spectroscopy, would you
> really need masses.
> If you use a .pdb or .gro file, you don't get mass-weighting. And
> you're right that for calculating the RMSIP, the subspace overlap,
> and the matrix of inner products, you don't need a structure file,
> but only the eigenvectors, and possibly the eigenvalues.
> Cheers,
> Tsjerk
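The point that the RMSIP needs only eigenvectors follows from its usual definition, RMSIP = sqrt((1/n) Σ_i Σ_j (v_i · w_j)²), taken over the first n eigenvectors v_i and w_j of the two sets. The sketch below computes the matrix of inner products and the RMSIP from that definition. It assumes both eigenvector sets are flat coordinate lists over the same atoms in the same order, which is why matching atom selections between the X-ray and MD datasets matter for the comparison. The function names are illustrative, not a GROMACS API.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def inner_product_matrix(set_a: Sequence[Sequence[float]],
                         set_b: Sequence[Sequence[float]]) -> List[List[float]]:
    """Element [i][j] is the inner product of eigenvector i of A with eigenvector j of B."""
    return [[dot(a, b) for b in set_b] for a in set_a]


def rmsip(set_a: Sequence[Sequence[float]],
          set_b: Sequence[Sequence[float]], n: int) -> float:
    """Root mean square inner product of the first n eigenvectors of each set."""
    matrix = inner_product_matrix(set_a[:n], set_b[:n])
    return math.sqrt(sum(x * x for row in matrix for x in row) / n)


if __name__ == "__main__":
    a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    b = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(rmsip(a, a, 2), rmsip(a, b, 2))
```

Identical subspaces give an RMSIP of 1 and orthogonal ones give 0; the toy sets `a` and `b` share one direction out of two, giving about 0.707.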
