[gmx-users] Replicating the Results from gmx covar
Eric R Beyerle
ebeyerle at uoregon.edu
Wed Jul 25 02:55:00 CEST 2018
Hi,
I've been developing codes to perform PCA of the alpha-carbons from a
processed GROMACS trajectory of a protein in the human-readable .g96
format. I perform the standard post-processing of the trajectory (still
in the compressed .xtc version generated from gmx mdrun) as follows:
1) I correct any broken bonds across the periodic boundary conditions
(PBCs) using gmx trjconv and simultaneously move the centre-of-mass to
the origin of the box using -center -boxcenter zero and output my
reference structure used for the rest of the processing workflow,
ref.pdb:
echo "3 1" | gmx trjconv -f traj.xtc -s top.tpr -pbc whole -center
-boxcenter zero -dump 0 -o ref.pdb
where traj.xtc is the 'raw' trajectory output from mdrun and top.tpr is
the topology generated by gmx grompp called immediately prior to the
mdrun command that generates traj.xtc.
2) I remove any bonds broken across the PBCs in the trajectory using
-pbc whole. In the same step, I set the center-of-mass of the protein to
the origin of the simulation box using the command -center -boxcenter
zero:
echo "3 1" | gmx trjconv -f traj.xtc -s ref.pdb -pbc whole -center
-boxcenter zero -o after_pbc.xtc
3) I remove the rotations in the trajectory by fitting to the reference
structure in ref.pdb:
echo "3 1" | gmx trjconv -quiet -f after_pbc.xtc -s ref.pdb -fit rot -o
after_rot.xtc
4) I convert the trajectory to .g96 format for analysis:
echo "3" | gmx trjconv -quiet -f after_rot.xtc -s $top -o traj.g96
keeping only the alpha-carbons.
Then, I run my analysis codes on the traj.g96 file. One thing I
calculate is the average structure and write it to a PDB file. Then, to
benchmark my calculations, I compare my results to what GROMACS finds
for a PCA of the alpha-carbons of the same input trajectory (without the
processing steps given above):
echo "3 3" | gmx covar -f traj.xtc -s top.tpr -ascii
Now, things get sticky when I compare my PCA results to those found
using GROMACS. The eigenvalues of the covaraince matrix that I get are
nearly identically to what GROMACS finds, viz, for the first ten
eigenvalues:
Me:
1 0.19775810587712664
2 7.9646496095133204E-002
3 4.5068418039738017E-002
4 2.3479149896592204E-002
5 1.3160235463867121E-002
6 1.1623284734275811E-002
7 9.3204844796344801E-003
8 7.6473789980896715E-003
9 6.9955131377012871E-003
10 6.2603369172376079E-003
gmx covar:
1 0.197767
2 0.0796882
3 0.0450706
4 0.0234754
5 0.0131577
6 0.011621
7 0.0093207
8 0.0076525
9 0.00699542
10 0.00626022
But, the covariance matrices are subtly different, and different enough
that it significantly effects the subsequent calculations using the
covariance matrix.
Furthermore, I find that the average structures are significantly
different; the average from gmx covar appears to be rotated relative to
the average structure I find using my code. Checking the reference
structure, it looks as though the average I find is oriented roughly in
line with the reference, but the average structure from gmx covar is
rotated in a way I don't understand.
I've done some seriously diagnostics on the code I wrote and checked all
the obviously things ( is the covariance matrix symmetric? Are its
eigenvectors orthogonal? Is the sum of the eigenvectors equal to the
trace of the covariance matrix? Do I have six zero eigenvalues?, etc.)
and everything passes.
So, I'm led to conclude that I am processing the trajectory differently
than what GROMACS is doing in the gmx covar function. To the best of my
knowledge, given my relative lack of knowledge of C and C++ that when
gmx covar fits to the reference structure and calculates averages, it
1) Removes broken bonds across PBCs in the reference structure and moves
it the origin of the simulation box, and
2) Removes broken bonds across PBCs in the trajectory, moves the
center-of-mass of the protein to the origin, then performs a rotational
fit to the reference structure.
But, it seems like that is what I am doing! My question (finally!) is
1) Is my interpretation of how GROMACS treats the trajectory inside gmx
covar correct? If not, what is it doing?
2) Is it possible to reproduce the corrections GROMACS is making to the
input trajectory in gmx covar using the tools in gmx trjconv? If not, is
there any way to extract the corrected trajectory gmx covar is using to
calculate the covariance matrix?
Thanks,
Eric Beyerle
More information about the gromacs.org_gmx-users
mailing list