[gmx-users] PCA analysis with different atoms in -s and -f

ZHANG Cheng 272699575 at qq.com
Fri Apr 3 13:26:50 CEST 2020


Dear Eduardo,


Many thanks for your detailed explanation. I am not an expert on PCA, so I may need some further explanation from you, if you do not mind. Please feel free to correct me if I am wrong.


I have run the common analyses on the MD runs under different conditions, including RMSD, Rg, secondary structure, native contacts, etc. I can see some differences among those conditions, but there seems to be an endless number of properties one could compare. So I am trying to find a more systematic way to quantify the differences. "gmx anaeig -over" seems to be an ideal option.


Can I ask,


1) How is "-ref" used in the PCA? i.e. how is the "deviation" used in the process of PCA? Do I need to fully understand the mathematical equations in order to understand it?
# -noref (default)
Use the deviation from the average of the trajectory
# -ref
Use the deviation from the structure file (i.e. -s name.pdb)
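
To make the question concrete, here is a minimal sketch of how I understand the "deviation" to enter: the covariance matrix is built from the displacement of each frame from some reference point, and that point is either the trajectory average or a supplied structure. This is plain Python for illustration only, not the GROMACS implementation; the function name and the toy coordinates are made up, and fitting and mass weighting are ignored.

```python
def covariance(frames, reference=None):
    """Covariance of coordinate deviations (illustrative, not GROMACS code).

    frames: list of frames, each a flat list of coordinates.
    reference: coordinates the deviations are measured from;
               if None, the trajectory average is used instead.
    """
    n_frames = len(frames)
    n_coords = len(frames[0])
    if reference is None:
        reference = [sum(f[k] for f in frames) / n_frames
                     for k in range(n_coords)]
    cov = [[0.0] * n_coords for _ in range(n_coords)]
    for f in frames:
        dev = [f[k] - reference[k] for k in range(n_coords)]
        for i in range(n_coords):
            for j in range(n_coords):
                cov[i][j] += dev[i] * dev[j] / n_frames
    return cov


# Toy "trajectory": three frames, two coordinates (hypothetical numbers).
frames = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]

cov_avg = covariance(frames)                        # deviation from the average
cov_ref = covariance(frames, reference=[1.0, 2.0])  # deviation from a given structure
```

In this toy case the variance along the first coordinate is larger when measured from the supplied structure than from the average, because the offset between the structure and the average is added on top of the fluctuations.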


2) So far, I have chosen "MainChain" for the least squares fit and "C-alpha" for the PCA. I hope to see a significant difference between the conditions at the C-alpha level. If not, I may choose "Backbone" or "MainChain" for the PCA instead.


3) Can you explain what is meant by "long enough to have at least two halves of trajectory 0.pdb CA. 100% overlap of covariance matrices", and by "block analysis to calculate overlap error as a function of time length"?


Thank you!


Yours sincerely
Cheng





------------------ Original ------------------
From: "ZHANG Cheng" <272699575 at qq.com>
Date: Wed, Apr 1, 2020 09:10 AM
To: "gromacs.org_gmx-users" <gromacs.org_gmx-users at maillist.sys.kth.se>
Cc: "ZHANG Cheng" <272699575 at qq.com>
Subject: PCA analysis with different atoms in -s and -f



I am trying to compare trajectories from different MD simulations, including different pH and different mutants. The initial PDB (i.e. 0.pdb) is the same, but the derived PDBs (1.pdb, 2.pdb, etc.) are different due to protonation states and mutations. Those different PDBs were used individually for the MD.


To obtain the eigenvectors, should I use 0.pdb as the reference structure?
# gmx covar -s 0.pdb -f 1.xtc -v 1.trr
(use 0.pdb as the reference, and calculate the eigenvectors from the trajectory of 1.pdb)


The first step is to choose the group for the least squares fit. Although the atoms in "Protein" and "Protein-H" differ between 0.pdb and 1.xtc, they are the same in "C-alpha", "Backbone" and "MainChain". However, when I choose "C-alpha" for the least squares fit, I still get the warning:
# WARNING: number of atoms in tpx (442) and trajectory (6622) do not match


The calculation still runs to completion. So must I provide "1.pdb" as the reference for "1.xtc", or is it still okay to use "-s 0.pdb"?


Afterwards, I want to run
# gmx anaeig -s 0.pdb -over overlap_1_2.xvg -v2 1.trr -v 2.trr
to compare the similarity between Condition 1 and Condition 2. Is this correct?
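
For reference, this is the kind of quantity I expect such a comparison to measure: how much of one set of eigenvectors lies inside the subspace spanned by the other. The sketch below is plain Python with made-up unit vectors; it shows one common definition of subspace overlap (mean squared projection, assuming orthonormal vectors) and is not necessarily the exact number that gmx anaeig writes to the .xvg file.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def subspace_overlap(vecs_a, vecs_b):
    """Mean squared projection of the vectors in vecs_b onto span(vecs_a).

    Both sets are assumed to be orthonormal. The result is 1.0 when every
    vector in vecs_b lies inside span(vecs_a) and 0.0 when every vector
    is orthogonal to it.
    """
    total = sum(dot(a, b) ** 2 for b in vecs_b for a in vecs_a)
    return total / len(vecs_b)


# Hypothetical eigenvectors in a 3-coordinate toy system.
e1 = [1.0, 0.0, 0.0]
e2 = [0.0, 1.0, 0.0]
e3 = [0.0, 0.0, 1.0]

same = subspace_overlap([e1, e2], [e1, e2])     # identical subspaces
partial = subspace_overlap([e1, e2], [e1, e3])  # one shared direction
disjoint = subspace_overlap([e1], [e3])         # orthogonal subspaces
```

With real data, the two vector sets would be the first few eigenvectors from each condition, and they must be defined on the same atoms in the same order for the inner products to be meaningful.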

