[gmx-users] Replicating the Results from gmx covar

Wed Jul 25 02:55:00 CEST 2018

Hi,

I've been developing codes to perform PCA of the alpha-carbons from a 
processed GROMACS trajectory of a protein in the human-readable .g96 
format. I perform the standard post-processing of the trajectory (still 
in the compressed .xtc version generated from gmx mdrun) as follows:

1) I correct any broken bonds across the periodic boundary conditions 
(PBCs) using gmx trjconv and simultaneously move the centre-of-mass to 
the origin of the box using -center -boxcenter zero and output my 
reference structure used for the rest of the processing workflow, 
ref.pdb:

echo "3 1" | gmx trjconv -f traj.xtc -s top.tpr -pbc whole -center 
-boxcenter zero -dump 0 -o ref.pdb

where traj.xtc is the 'raw' trajectory output from mdrun and top.tpr is 
the topology generated by gmx grompp called immediately prior to the 
mdrun command that generates traj.xtc.

2) I remove any bonds broken across the PBCs in the trajectory using 
-pbc whole. In the same step, I set the center-of-mass of the protein to 
the origin of the simulation box using the command -center -boxcenter 
zero:

echo "3 1" | gmx trjconv -f traj.xtc -s ref.pdb -pbc whole -center 
-boxcenter zero -o after_pbc.xtc

3) I remove the rotations in the trajectory by fitting to the reference 
structure in ref.pdb:

echo "3 1" | gmx trjconv -quiet -f after_pbc.xtc -s ref.pdb -fit rot -o 
after_rot.xtc

4) I convert the trajectory to .g96 format for analysis:

echo "3" | gmx trjconv -quiet -f after_rot.xtc -s $top -o traj.g96

keeping only the alpha-carbons.

Then, I run my analysis codes on the traj.g96 file. One thing I 
calculate is the average structure and write it to a PDB file. Then, to 
benchmark my calculations, I compare my results to what GROMACS finds 
for a PCA of the alpha-carbons of the same input trajectory (without the 
processing steps given above):

echo "3 3" | gmx covar -f traj.xtc -s top.tpr -ascii

Now, things get sticky when I compare my PCA results to those found 
using GROMACS. The eigenvalues of the covaraince matrix that I get are 
nearly identically to what GROMACS finds, viz, for the first ten 
eigenvalues:

Me:

1  0.19775810587712664
2   7.9646496095133204E-002
3   4.5068418039738017E-002
4   2.3479149896592204E-002
5   1.3160235463867121E-002
6   1.1623284734275811E-002
7   9.3204844796344801E-003
8   7.6473789980896715E-003
9   6.9955131377012871E-003
10   6.2603369172376079E-003

gmx covar:

1 0.197767
2 0.0796882
3 0.0450706
4 0.0234754
5 0.0131577
6 0.011621
7 0.0093207
8 0.0076525
9 0.00699542
10 0.00626022

But, the covariance matrices are subtly different, and different enough 
that it significantly effects the subsequent calculations using the 
covariance matrix.

Furthermore, I find that the average structures are significantly 
different; the average from gmx covar appears to be rotated relative to 
the average structure I find using my code. Checking the reference 
structure, it looks as though the average I find is oriented roughly in 
line with the reference, but the average structure from gmx covar is 
rotated in a way I don't understand.

I've done some seriously diagnostics on the code I wrote and checked all 
the obviously things ( is the covariance matrix symmetric? Are its 
eigenvectors orthogonal? Is the sum of the eigenvectors equal to the 
trace of the covariance matrix? Do I have six zero eigenvalues?, etc.) 
and everything passes.

So, I'm led to conclude that I am processing the trajectory differently 
than what GROMACS is doing in the gmx covar function. To the best of my 
knowledge, given my relative lack of knowledge of C and C++ that when 
gmx covar fits to the reference structure and calculates averages, it

1) Removes broken bonds across PBCs in the reference structure and moves 
it the origin of the simulation box, and

2) Removes broken bonds across PBCs in the trajectory, moves the 
center-of-mass of the protein to the origin, then performs a rotational 
fit to the reference structure.

But, it seems like that is what I am doing! My question (finally!) is

1) Is my interpretation of how GROMACS treats the trajectory inside gmx 
covar correct? If not, what is it doing?

2) Is it possible to reproduce the corrections GROMACS is making to the 
input trajectory in gmx covar using the tools in gmx trjconv? If not, is 
there any way to extract the corrected trajectory gmx covar is using to 
calculate the covariance matrix?

Thanks,

Eric Beyerle