[gmx-users] g_principal -- bug or very bad choice of filenames

Sat Feb 8 00:18:38 CET 2014

Dear all,

This is a follow-up to an old thread on g_principal, which continues 
(as of version 4.6.5) to suffer from what I would call a bug or, at 
least, a very bad and misleading choice of output file names. This 
message is primarily a further warning to users, but I also hope that it 
promotes the solution of the problem (I explicitly indicate below the 
small required code changes).

The program g_principal "calculates the three principal axes of inertia 
for a group of atoms" and generates three output files with the default 
names axis{1,2,3}.dat. Although the content of these files is not 
explained, their names naturally suggest that axisN.dat contains the xyz 
components of the Nth principal axis (presumably in the conventional 
major-to-minor order). Indeed, each of these files contains (besides the 
time) 3 real values per frame, and this interpretation also yields three 
vectors which are orthonormal, as one would expect for the new frame 
defining the principal axes.

However, this "natural" interpretation is wrong, and not because of a 
different axes order. As correctly identified by Chris Neale in the 
message included below, the file axis1.dat contains the x components of 
the 1st, 2nd and 3rd axes, the file axis2.dat their y components, and 
the file axis3.dat their z components (he checked with VMD, we did it 
with in-house programs in C and Octave). I think this a rather 
convoluted and extremely misleading way to output the principal axes.

My guess is that this problem derives from a simple confusion between 
the form of the orthogonal matrix obtained when solving the eigenvalue 
problem -- depending on the adopted algorithm, the vectors defining the 
new basis (the principal axes) can be either the columns or the rows of 
this matrix. I didn't look beyond gmx_principal.c, but the problem can 
be easily solved there by replacing the current lines

         fprintf(axis1, "%15.10f     %15.10f  %15.10f  %15.10f\n", t, 
axes[XX][XX], axes[YY][XX], axes[ZZ][XX]);
         fprintf(axis2, "%15.10f     %15.10f  %15.10f  %15.10f\n", t, 
axes[XX][YY], axes[YY][YY], axes[ZZ][YY]);
         fprintf(axis3, "%15.10f     %15.10f  %15.10f  %15.10f\n", t, 
axes[XX][ZZ], axes[YY][ZZ], axes[ZZ][ZZ]);

with

         fprintf(axis1, "%15.10f     %15.10f  %15.10f  %15.10f\n", t, 
axes[XX][XX], axes[XX][YY], axes[XX][ZZ]);
         fprintf(axis2, "%15.10f     %15.10f  %15.10f  %15.10f\n", t, 
axes[YY][XX], axes[YY][YY], axes[YY][ZZ]);
         fprintf(axis3, "%15.10f     %15.10f  %15.10f  %15.10f\n", t, 
axes[ZZ][XX], axes[ZZ][YY], axes[ZZ][ZZ]);

With this change, the file names would make perfect sense, with 
axisN.dat simply containing the components of the Nth principal axis. 
(Note that both the by-row and by-column readings give orthonormal 
vectors, because this is an orthogonal matrix.)

The file moi.dat also produced by g_principal is fine, containing the 
moments of inertia along the principal axes in the proper order (lowest 
to highest, since the inertia _around_ those axes increases as one goes 
from the major to the minor).

I believe that, as it stands now, g_principal is misleading many users 
into the wrong interpretation of its output. Maybe some developer wants 
to have a look at this issue and introduce my suggested fix.

Best,
Antonio

-- 
Antonio M. Baptista
Instituto de Tecnologia Quimica e Biologica, Universidade Nova de 
Lisboa
Av. da Republica - EAN, 2780-157 Oeiras, Portugal
phone: +351-214469619         email: baptista at itqb.unl.pt
fax:   +351-214411277         WWW:   http://www.itqb.unl.pt/~baptista
--------------------------------------------------------------------------

Nov 02, 2010; 4:46pm Chris Neale
Principal axis (g_principal)

Dear Sarah,

I just ran into the same thing so I'm posting this as a very late
response to your querry.

I think that g_principal prints the transpose of the principal axes
matrix, which is the exact opposite of what one would expect.

(1) First, I define a system that is longest in Z, second longest in
Y, and shortest in X:

cneale at iram:~$ cat a.gro
a
12
      1CL-     CL    1   0.000   0.000   0.000
      2CL-     CL    2   0.000   1.000   0.000
      3CL-     CL    3   0.000   2.000   0.000
      4CL-     CL    4   1.000   0.000   0.000
      5CL-     CL    5   1.000   1.000   0.000
      6CL-     CL    6   1.000   2.000   0.000
      7CL-     CL    7   0.000   0.000  10.000
      8CL-     CL    8   0.000   1.000  10.000
      9CL-     CL    9   0.000   2.000  10.000
     10CL-     CL   10   1.000   0.000  10.000
     11CL-     CL   11   1.000   1.000  10.000
     12CL-     CL   12   1.000   2.000  10.000
50 50 50
cneale at iram:~$ cat a.top
#include "ffoplsaa.itp"
#include "ions.itp"

[ system ]
; name
grid

[ molecules ]
; name        number
CL-  12

############

(2) Next I calculate the principal axes:

$ echo 0 |g_principal -f a.gro -s a.tpr >/dev/null 2>&1; cat axis1.dat
axis2.dat axis3.dat
     0.0000000000        0.0000000000     0.0000000000     1.0000000000
     0.0000000000        0.0000000000     1.0000000000     0.0000000000
     0.0000000000        1.0000000000     0.0000000000     0.0000000000

############

(3) Finally, I rotate it by 90 deg and calculate the principal axes:

$ editconf -f a.gro -o b.gro -rotate 90 0 0 >/dev/null 2>&1; echo 0
|g_principal -f b.gro -s a.tpr >/dev/null 2>&1; cat axis1.dat
axis2.dat axis3.dat
     0.0000000000        0.0000000000     0.0000000000     1.0000000000
     0.0000000000        1.0000000000     0.0000000000     0.0000000000
     0.0000000000        0.0000000000     1.0000000000     0.0000000000

############

(4) I use VMD to confirm that the longest axis is in Y. Thus one must
read the columns in axis1.dat as time1, X1, X2, X3 whereas one would
intuit that they read as time1, X1, Y1, Z1.

I didn't test this extensively, but if my colleague determines that I
am incorrect in what I have posted here then I will post that to this
list.

If I am correct, then it would be very useful to get some #titles in
the axisN.dat output files from g_principal.

Chris.

--------------------------------------------------------------------------

Dear gmx users,

I am doing several analyses (version 4.0.4) on my simulations with 
small
organic molecules inserting into a DMPC bilayer. Now I would like to 
calculate
whether the small molecule inserts into the membrane with a specific 
angle to
the membrane normal (or z-axis). I have used two approaches:

1) My molecule is a cyclohexene ring with two "para" substituents. I 
made an
index group with carbon number 3 and number 10 (in different groups) 
which
corresponds to the first substituent-atoms on each side of the ring. 
This
vector describes the shape of the molecule quite well. I then used 
g_bundle to
calculate the angle between this vector and the z-axis by echoing first 
the
C3-group and then the C10-group:

echo 0 1 | g_bundle -f xxx.xtc -s xxx.tpr -n xxx-vector.ndx -na 1 -z 
-ot
xxx-vector.xvg

This gives fairly reasonable results and angle distribution.

2) I also calculated the principal axis by g_principal where I echoed 
each
small molecule as a whole:

echo 0 | g_principal -f xxx.xtc -s xxx-1.tpr -n xxx-lim.ndx -a1
xxx-principal.dat -a2 xxx-axis2.dat -a3 xxx-axis3.dat

To get the angle between the principal axis and the z-axis one takes 
the arcos
the the z-component. This created quite a different and very broad 
angle
distribution.

This puzzled me very much since the difference between the C3-C10 
vector and
the principal axis should not be that great.

I then used a small python program to calculate the principal axis of 
my
molecule and the result was again quite different than from
g_principal. I have
tried to rotate each of my substituent groups by 120 degrees (creating 
9
different conformations) and then calculate the difference in the 
principal
axis but this creates only a deviation of max. 16-17 degrees not at all
accounting for the very different angle distributions seen. I have not
included
numbers but I can do so along with the python code. I would like to ask 
if
anyone can see any obvious mistakes or know of something special to be
aware of
when using g_principal?

Thank you,

Sarah