[gmx-users] Dihedral PCA documentation

Matthias Ernst matthias.ernst at physik.uni-freiburg.de
Fri Mar 3 17:41:42 CET 2017

Sorry for the delay, but I wanted to add a few points to this discussion
as I am doing my PhD in the group where dPCA was developed.
I hope these will clarify a little bit what's done in the tutorial and
why (took me a while to figure it out).
Obviously, it would be worth to rewrite my comments in a slightly more
concise form, maybe with examples, and make them available to the
community on some suitable platform. I will see if I find the time.
@Mark: which form should such a suggestion for an improved How-To have
and where could I submit it and how?

As an introduction and following up on what Mark said, I want to point
out that you do not need any GROMACS tool for doing a PCA, but you *can*
use them in a way that somewhat "misuses" trr and gro files.
The way it's done in the tutorial means that the covariance matrix,
eigenvectors and the PCs are stored in the trr files - but beware: the
trr format was meant for storing trajectory data, not PCA data. That's
why Mark called it a "hack" and that's why we need the "resized.gro", as
explained later.
In our group, we do a lot of PCA and usually just work on plain ASCII
data files as input and output files.
But there is not (yet) a black-box programm where you just plug in a
trajectory and in the end get nice results - you should know what's
going on under the hood, as only then you can interpret the PCA results
properly and judge if they make sense.

What you need for actually performing a PCA are
1) the data you want to perform the PCA on. This could be an MD
trajectory (in case there is *really* a good reason to use cartesian
coordinates, internal coordinates are more advisable), but as well an
ASCII file where you saved all the angles or distances or whatever
coordinates you want to use. Personally, I like the MDAnalysis Python
library as it easily allows to monitor e.g. a few angles or distances or
whatever quantity seems a good idea to try as input for a PCA.
You may also kick out irrelevant observables to speed up things and
reduce noise.
2) a tool to calculate the covariance matrix of your data, diagonalize
it and project the data on the eigenvectors, i.e. calculate the PCs.
In this step, it might be a good idea to save some files, namely the
statistics of the data, covariance matrix, eigenvectors and -values and
of course the PCs which is a LxD file with L lines (observations) for
the L frames you had in the trajectory and D variables (observables)
that your input file had. Note: if you started with an ASCII file, it
also had dimensions LxD (PCA is just a rotation of your coordinates).
3) analysis tools to create whatever plots you like. Usually, we use 1D
and 2D histograms as well as autocorrelation decays.

For step 2, I can recommend a tool written by a colleague of mine called
fastCPA that you can find at http://lettis.github.io/FastPCA

Now, I would like to quote a post I did some time ago at

dPCA was developed in the group where I'm working now and I have to say
that the way it is described in the GROMACS howto
(http://www.gromacs.org/Documentation/How-tos/Dihedral_PCA) feels like a
dirty hack. It works, though, if you do it correctly. The trick is that
you first do the sin- and cos-transformation of the dihedrals, save them
to a kind of "dummy" trajectory, create a kind of "dummy" structure just
to get the right number of coordinates and then evaluate it using the
standard GROMACS tools.

Let me briefly summarize what the tutorial tells you to do (I refer to
their numbering) and what is actually meant:
A) choose the dihedral angles you need (1. in tutorial, given by 4 atoms
per dihedral) and do the sin- and cos-transformation. This is then saved
as a new GROMACS trajectory (called dangle.trr in the tutorial). BUT,
the values are saved in triples of variables as the trajectory was
initially intended for cartesian coordinates, meaning there can be
"empty" coordinates which are 0.0 always if the number of coordinates
you use is not divisable by 3.
B) create a dummy index file which just tells g_covar how many atoms you
need for storing all values you have. Step 3 says you need to put
integer(2*N/3) atoms in the index file. This means: you have N dihedrals
and doing the sin- and cos-transformation, you get 2*N coordinates. They
are stored in triples of 3 as mentioned above, so you need M=2*N/3 atoms
but you need to round up (indicated by the "integer") to make sure you
don't drop one or two coordinates. That's why in the example you need to
count up to 4 in the index file (10 coordinates after transformation
meaning 3,33 atoms => round up to 4)
C) Create also a dummy gro file which also contains M atoms. This can
contain any values whatsoever, it just has to be a .gro file with the
correct number of atoms. Step 4 in the tutorial does this.
D) To actually perform the PCA, you need to diagonalize the covariance
matrix of your coordinates. This is done by g_covar in step 5. You need
to provide your dummy gro and index files as well as some options to
tell the tool not to do fitting, mass weighting or whatever, which is
why there are so many flags. If something goes wrong here, I would
assume that there is an error in the files you had to create in the
steps before.
E) To evaluate your PCA, you can plot one- or two-dimensional
projections of the data onto the eigenvectors of the covariance matrix
(the so-called principal components). This is done in steps 6 to 8.

Best Regards,

On 02/28/2017 05:17 PM, Mark Abraham wrote:
> Hi,
> On Tue, Feb 28, 2017 at 5:51 AM Anna Vernon <lappala.anna at gmail.com> wrote:
>> Thank you Justin,
>> that makes sense, thank you for help. However, I am struggling with the
>> rest, due to lacking documentation & examples. Like Alex said, this
>> web-page http://www.gromacs.org/Documentation/How-tos/Dihedral_PCA is
>> outdated, but there is no new version of it available...
> Well, when you figure out some improvements, please suggest them.
> Unfortunately it is often the case with such community projects that
> someone writes some code, contributes it, and then disappears leaving
> nobody quite sure what it is good for or how it works :-)
>> Or at least I
>> am not finding it (and I have noticed that people on the list have asked
>> questions similar to mine, but there were no answers to any of my
>> questions, sadly).
>> mk_angndx -s h3_md.tpr -n h3_ANGLE.ndx -type dihedral
>> is what I now have done, and my next steps should be the following, I
>> guess,
> Paraphrasing, the PCA howto at your link says "1. use gmx mk_angndx, 2. use
> gmx angle -or dangle.trr, 3. make covar.ndx 4. gmx trjconv -f dangle.trr"
> trjconv -s h3_md.tpr -f dangle.trr -o resized.gro -n covar.ndx -e 0
>> but I do not have a trr file because mk_angndx does not generate it
>> (unlike the old documentation command angle)
> mk_angndx never did that. You seem to be skipping the gmx angle command
> that makes that trr file.
>> and again, my tpr file
>> contains waters, not just the molecule of interest that I have in my
>> .gro file. I also do not understand the difference between the
>> dangle.ndx file and covar.ndx file... What is resized.gro?
> It's a hack that copes with the way somebody thought it was a good idea to
> abuse the trr format for storing non-trajectory data, to suit things like
> covariance analyses. Steps 3 and 4 seem fairly easy to follow, while I
> agree that the reason for doing them is mysterious. :-)
>> It would be
>> really helpful to write out those explanations in documentation and
>> perhaps give a more realistic example because this really becomes
>> frustrating.
> I'm open to suggested improvements, but a "realistic" example with 243
> dihedrals seems unlikely to make the instructions easier to follow...
> Mark
> Thanks for your time & help, it is very much appreciated.
>> Anna
>> On 27/02/17 19:58, Justin Lemkul wrote:
>>> On 2/27/17 9:55 PM, Anna Vernon wrote:
>>>> Thanks Justin,
>>>> but I did use this command, which is supposed to be dihedral:
>>>> g_angle -f h3_nowater.gro -n h3_ANGLE.ndx -or h3_ANGLE.trr -type
>>>> dihedral
>>> Sure, but g_angle is only going to give you the options presented to
>>> it in the index file.  If those groups were created incorrectly (note
>>> my comments were about mk_angndx, not g_angle) then you have to create
>>> them properly.  You can't get dihedrals from valence angles :)
>>> -Justin
>>>> thanks
>>>> A
>>>> On 27/02/17 19:35, Justin Lemkul wrote:
>>>>> On 2/27/17 9:29 PM, Anna Vernon wrote:
>>>>>> Thanks Alex.
>>>>>> tried the new syntax, below the output of g_angle, which did not
>>>>>> get any
>>>>>> easier... Хрень осталась...)))
>>>>>> Group     0 (UB_th=108.4_297.062f) has    36 elements
>>>>>> Group     1 (UB_th=108.9_384.092f) has    30 elements
>>>>>> Group     2 (UB_th=109.7_794.962f) has     6 elements
>>>>>> Group     3 (UB_th=111.5_376.562f) has     6 elements
>>>>>> Group     4 (UB_th=110.1_221.752f) has    36 elements
>>>>>> Group     5 (UB_th=109.0_297.062f) has    33 elements
>>>>>> Group     6 (UB_th=109.5_430.952f) has    36 elements
>>>>>> Group     7 (UB_th=113.5_585.762f) has    12 elements
>>>>>> Group     8 (UB_th=121.0_376.562f) has    12 elements
>>>>>> Group     9 (UB_th=119.5_351.462f) has    24 elements
>>>>>> Group    10 (UB_th=124.0_669.442f) has    12 elements
>>>>>> Group    11 (UB_th=112.5_167.362f) has    12 elements
>>>>>> Group    12 (UB_th=121.0_669.442f) has    15 elements
>>>>>> Group    13 (UB_th=109.5_276.142f) has    30 elements
>>>>>> Group    14 (UB_th=107.0_418.402f) has    12 elements
>>>>>> Group    15 (UB_th=112.2_365.682f) has     6 elements
>>>>>> Group    16 (UB_th=109.5_271.122f) has     6 elements
>>>>>> Group    17 (UB_th=109.5_271.122f) has    12 elements
>>>>>> Group    18 (UB_th=110.0_365.682f) has     3 elements
>>>>>> Group    19 (UB_th=111.0_292.882f) has     6 elements
>>>>>> Group    20 (UB_th=112.2_338.902f) has     3 elements
>>>>>> Group    21 (UB_th=111.0_359.822f) has     6 elements
>>>>>> Group    22 (UB_th=108.0_401.662f) has     6 elements
>>>>>> Group    23 (UB_th=110.1_288.702f) has     6 elements
>>>>>> Group    24 (UB_th=115.2_443.502f) has     6 elements
>>>>>> Group    25 (UB_th=107.5_433.462f) has     6 elements
>>>>>> Group    26 (UB_th=120.0_383.252f) has    12 elements
>>>>>> Group    27 (UB_th=120.0_334.722f) has    36 elements
>>>>>> Group    28 (UB_th=120.0_251.042f) has    60 elements
>>>>>> Group    29 (UB_th=110.1_279.742f) has    18 elements
>>>>>> Group    30 (UB_th=105.8_351.462f) has     3 elements
>>>>>> Group    31 (UB_th=112.1_343.092f) has     6 elements
>>>>>> Group    32 (UB_th=116.5_418.402f) has     3 elements
>>>>>> Group    33 (UB_th=122.5_627.602f) has     3 elements
>>>>>> Group    34 (UB_th=120.0_418.402f) has     6 elements
>>>>>> Group    35 (UB_th=120.0_192.462f) has     3 elements
>>>>> These are Urey-Bradley angles with the given equilibrium angles and
>>>>> force
>>>>> constants.  If you're using mk_angndx to create the index group, you
>>>>> want
>>>>> "-type dihedral" instead of "-type angle" (which is the default).
>>>>> -Justin
>>>>>> On 27/02/17 16:04, Alex wrote:
>>>>>>> I think you might be referring to the 2010 version of the online
>>>>>>> manual (
>>>>>>> http://www.gromacs.org/Documentation/How-tos/Dihedral_PCA), which
>>>>>>> is pretty
>>>>>>> outdated. A lot of syntax seems to be changing fairly rapidly, so
>>>>>>> хрень
>>>>>>> like this is to be expected. ;)
>>>>>>> The current manual points to gmx angle, gmx covar and, gmx anaeig for
>>>>>>> covariance analysis, and for those the updated help pages are
>>>>>>> separate:
>>>>>>> http://manual.gromacs.org/programs/gmx-angle.html
>>>>>>> http://manual.gromacs.org/programs/gmx-covar.html
>>>>>>> http://manual.gromacs.org/programs/gmx-anaeig.html
>>>>>>> Alex
>>>>>>> On Mon, Feb 27, 2017 at 3:44 PM, Anna Lappala
>>>>>>> <lappala.anna at gmail.com>
>>>>>>> wrote:
>>>>>>>> Dear Gromacs users,
>>>>>>>> The documentation on Dihedral PCA analysis is confusing.
>>>>>>>>   Firstly, the links are redirecting to the main gromacs page
>>>>>>>> which does
>>>>>>>> not help because one needs to search for the specific commands
>>>>>>>> from there.
>>>>>>>> Secondly, -s option does not work in step 2: invalid command line
>>>>>>>> argument
>>>>>>>> "-s".
>>>>>>>> Thirdly, my tpr file contains water which I do not want to
>>>>>>>> include in the
>>>>>>>> analysis, how can I change that?
>>>>>>>> .ndx file output produced by mk_angndx is a mess- first line, for
>>>>>>>> example,
>>>>>>>> is [UB_th=108.4_297.062f]
>>>>>>>> I need a small number of dihedrals, so I could write the file
>>>>>>>> myself but
>>>>>>>> there again is no clear explanation how to do it: the [foo] 1 2 3
>>>>>>>> 4 example
>>>>>>>> could be expanded- do I use "dihedrals" instead of foo? Or
>>>>>>>> filename? Or
>>>>>>>> molecule name?
>>>>>>>> Finally, I would really appreciate it if someone could give an
>>>>>>>> explanation
>>>>>>>> or a clear example of how this works, for a simple system or a
>>>>>>>> protein,
>>>>>>>> because documentation is unclear to me.
>>>>>>>> Kind regards,
>>>>>>>> Anna
>>>>>>>> --
>>>>>>>> Gromacs Users mailing list
>>>>>>>> * Please search the archive at http://www.gromacs.org/
>>>>>>>> Support/Mailing_Lists/GMX-Users_List before posting!
>>>>>>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>>>>>> * For (un)subscribe requests visit
>>>>>>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users
>>>>>>>> or
>>>>>>>> send a mail to gmx-users-request at gromacs.org.
>> --
>> Gromacs Users mailing list
>> * Please search the archive at
>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
>> posting!
>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>> * For (un)subscribe requests visit
>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>> send a mail to gmx-users-request at gromacs.org.


More information about the gromacs.org_gmx-users mailing list