[gmx-users] output of g_covar with -ascii option , and the format of covar.dat

Fri Sep 7 02:19:53 CEST 2007

Ok, I think what was confusing me was the format of the covar.dat file 
(the covariance matrix as a text file), that's outputted from g_covar 
when using the -ascii function.   So I'll clarify this below, hopefully 
to relieve any future confusion . . .

Let N = the # of particles for which you'd like to calculate the 
covariance matrix.

You would expect the covariance matrix to have dimensions 3N x 3N.  But 
when you look at the covar.dat file (or however you wish to name it) 
obtained by using the -ascii option, the dimensions of that matrix in 
the text file is actually 3N^2 x 3 (always 3 columns).  I'm not sure why 
the format is this way . . . probably to save space in the horizontal 
direction.  I was unclear what the entries in the rectangular matrix were.

Let C = this matrix from the dat file, of dimensions 3N^2 x 3. 
Let's factor C = AB , such that A has dimensions (3N^2 x N) , and B has 
dimensions (N x 3).  Thus, the product of AB yields the correct 
dimensions of C.  (This factorization may help you in reconstructing the 
covariance matrix, if you desire something in 3N x 3N format, or if 
you're just trying to understand and pick out some of the entries in 
covar.dat). 

Let's look at an example for further illustration.  Suppose N = 3.  We 
construct the covariance matrix:
[ababakha at chemcca50 CovarAnalysis]$ g_covar -s ../../FullMD/FullMD1.tpr 
-f ../../FullMD/FullMD1.trr -n CAlphaPractice.ndx -ascii covar.dat
....
Choose a group for the least squares fit
Group     0 (CAlpha_practice) has     3 elements
There is one group in the index
Choose a group for the covariance analysis
Group     0 (CAlpha_practice) has     3 elements
There is one group in the index
Calculating the average structure ...
trn version: GMX_trn_file (single precision)
Last frame       1000 time 1000.000
Constructing covariance matrix (9x9) ...
Last frame       1000 time 1000.000
Read 1001 frames
.... and so on . . .

Now if we look at the covar.dat file:

[ababakha at chemcca50 CovarAnalysis]$ more covar.dat
3.26402e-05 -6.83794e-06 -3.9065e-05
-4.76002e-05 -2.60562e-05 2.47818e-06
1.49601e-05 3.28941e-05 3.65869e-05
-6.83794e-06 3.46278e-06 1.12547e-05
9.38582e-06 4.41876e-06 -1.57612e-06
-2.54786e-06 -7.88152e-06 -9.67853e-06
-3.9065e-05 1.12547e-05 5.13989e-05
5.60832e-05 2.96123e-05 -4.56461e-06
-1.70182e-05 -4.08668e-05 -4.68343e-05
-4.76002e-05 9.38582e-06 5.60832e-05
7.0521e-05 3.79785e-05 -4.61594e-06
-2.29209e-05 -4.73642e-05 -5.14673e-05
-2.60562e-05 4.41876e-06 2.96123e-05
3.79785e-05 2.14427e-05 -9.89136e-07
-1.19224e-05 -2.58614e-05 -2.86232e-05
2.47818e-06 -1.57612e-06 -4.56461e-06
-4.61594e-06 -9.89136e-07 2.5659e-06
2.13774e-06 2.56526e-06 1.9987e-06
1.49601e-05 -2.54786e-06 -1.70182e-05
-2.29209e-05 -1.19224e-05 2.13774e-06
7.96086e-06 1.44703e-05 1.48805e-05
3.28941e-05 -7.88152e-06 -4.08668e-05
-4.73642e-05 -2.58614e-05 2.56526e-06
1.44703e-05 3.37428e-05 3.83016e-05
3.65869e-05 -9.67853e-06 -4.68343e-05
-5.14673e-05 -2.86232e-05 1.9987e-06
1.48805e-05 3.83016e-05 4.48356e-05

Indeed, it is 3N^2 by 3, or 27 x 3 in this case, b/c N = 3. Let this 
matrix = C.  The entries of C are as follows:

C = [
x1x1 x1y1 x1z1
x2x1 x2y1 x2z1
x3x1 x3y1 x3z1
y1x1 y1y1 y1z1
y2x1 y2y1 y2z1
y3x1 y3y1 y3z1
z1x1 z1y1 z1z1
z2x1 z2y1 z2z1
z3x1 z3y1 z3z1
x1x2 x1y2 x1z2
x2x2 x2y2 x2z2
x3x2 x3y2 x3z2
y1x2 y1y2 y1z2
y2x2 y2y2 y2z2
y3x2 y3y2 y3z2
z1x2 z1y2 z1z2
z2x2 z2y2 z2z2
z3x2 z3y2 z3z2
x1x3 x1y3 x1z3
x2x3 x2y3 x2z3
x3x3 x3y3 x3z3
y1x3 y1y3 y1z3
y2x3 y2y3 y2z3
y3x3 y3y3 y3z3
z1x3 z1y3 z1z3
z2x3 z2y3 z2z3
z3x3 z3y3 z3z3
] ;

You can do some brief visual inspection to confirm this.  For instance, 
since z3x3 = x3z3, C(27,1) should be the same as C(21,3).  Indeed it is, 
the value is 1.48805e-05. 

And if you wanted to further factor this, you could:
C=AB, where the corresponding dimensions are:
27x3 = (27x3)(3x3)

A = [
x1  0  0
x2  0  0
x3  0  0
y1  0  0
y2  0  0
y3  0  0
z1  0  0
z2  0  0
z3  0  0
0  x1  0
0  x2  0
0  x3  0
0  y1  0
0  y2  0
0  y3  0
0  z1  0
0  z2  0
0  z3  0
0  0  x1
0  0  x2
0  0  x3
0  0  y1
0  0  y2
0  0  y3
0  0  z1
0  0  z2
0  0  z3
];

and
B = [
x1 y1 z1
x2 y2 z2
x3 y3 z3 ]

Arneh Babakhani wrote:
> Hi Tsjerk, yes I did make a mistake in the group that I chose . . . should
> be C-alpha.  Anyway, thanks for the help, appreciate it,
>
> Arneh
>
>   
>> Hi Arneh,
>>
>> I think this has been covered on the mailing list before. But anyway,
>> the number indeed refers to the atom index number, running from 1-N.
>> You mean that you have 1934427 X 3 elements in the covar.dat file? So
>> that corresponds to sqrt( 1934427 X 3 ) / 3 = 803 atoms. I think you
>> made a mistake with the group you chose (Protein rather than
>> c-alpha?). You would expect to have 90000 elements for a system of 100
>> atoms.
>>
>> Hope it helps,
>>
>> Tsjerk
>>
>> On 9/5/07, Arneh Babakhani <ababakha at mccammon.ucsd.edu> wrote:
>>     
>>> Thanks Tsjerk,
>>>
>>> One other question, regarding the covar.dat file (the covariance matrix
>>> that's outputted when using the -ascii flag):
>>>
>>> I dont' quite understand the format of it.  In the manual, it states:
>>> "The order of the elements is: x1x1, x1y1, x1z1, x1x2, ..."
>>>
>>> Does "1" and "2" refer to the particles in the group?  I'm asking b/c my
>>> particular covar.dat file is of dimmensions 1934427 X 3 , for a group of
>>> about 100 atoms and a traj of 1 NS.  Just trying to make sense out of it
>>> . . .
>>>
>>> Thanks,
>>>
>>> Arneh
>>>
>>>
>>>
>>> Tsjerk Wassenaar wrote:
>>>       
>>>> Hi Arneh,
>>>>
>>>> Yes. As you should be able to recall, the (linear, not "generalized")
>>>> correlation is formally given as:
>>>>
>>>> cor(x,y) = cov(x,y) / ( sqrt(var(x))*sqrt(var(y)) )
>>>>
>>>> The diagonal elements of the covariance matrix give the variances...
>>>>
>>>> Cheers,
>>>>
>>>> Tsjerk
>>>>
>>>> On 8/31/07, Arneh Babakhani <ababakha at mccammon.ucsd.edu> wrote:
>>>>
>>>>         
>>>>> Sorry for the confusion . I want the correlation.  So I can just use
>>>>> g_covar to get the covar. matrix, and divide each term by what you
>>>>>           
>>> stated,
>>>       
>>>>> to get the correlation matrix? (if I follow you correctly).
>>>>>
>>>>> Thanks,
>>>>>
>>>>>
>>>>>           
>>>>>> Hi Arneh,
>>>>>>
>>>>>> You're not clear on what you want. Is it a covariance matrix or a
>>>>>> correlation matrix. Correlation and covariance are different things.
>>>>>> g_covar lets you write out a covariance matrix using the option
>>>>>> -ascii. In case you want the correlation matrix (though IIRC Karplus
>>>>>> also used the covariance matrix), you have to divide each element
>>>>>>             
>>> m_ij
>>>       
>>>>>> by sqrt(m_ii)*sqrt(m_jj).
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Tsjerk
>>>>>>
>>>>>> On 8/31/07, Arneh Babakhani <ababakha at mccammon.ucsd.edu> wrote:
>>>>>>
>>>>>>             
>>>>>>> Can anyone briefly recommend a procedure for calculating the
>>>>>>>               
>>> correlation
>>>       
>>>>>>> matrix (not the diagonalized covariance matrix, as done by g_covar)
>>>>>>>               
>>> of a
>>>       
>>>>>>> specified group?
>>>>>>>
>>>>>>> In particular, I'm looking to calculate the covariance matrix, as
>>>>>>> specifed
>>>>>>> in the Karplus paper (Proteins: Vol 11:205-217, 1991), where each
>>>>>>>               
>>> entry
>>>       
>>>>>>> of
>>>>>>> the matrix is defined as the correlation between the i-th and j-th
>>>>>>>               
>>> atom:
>>>       
>>>>>>> c_ij = <delta r_i> <delta r_j>
>>>>>>>
>>>>>>> where delta r is the deviation of i from its average position,
>>>>>>>               
>>> averaged
>>>       
>>>>>>> over the ensemble.
>>>>>>>
>>>>>>> Or, is there a way to use g_covar and suppress the diagonalization
>>>>>>>               
>>> step,
>>>       
>>>>>>> so as to obtain only the translation correlation matrix??? (I
>>>>>>>               
>>> couldn't
>>>       
>>>>>>> find anything to this effect in the manual).
>>>>>>>
>>>>>>> Much thanks,
>>>>>>>
>>>>>>> Arneh
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> gmx-users mailing list    gmx-users at gromacs.org
>>>>>>> http://www.gromacs.org/mailman/listinfo/gmx-users
>>>>>>> Please search the archive at http://www.gromacs.org/search before
>>>>>>> posting!
>>>>>>> Please don't post (un)subscribe requests to the list. Use the
>>>>>>> www interface or send it to gmx-users-request at gromacs.org.
>>>>>>> Can't post? Read http://www.gromacs.org/mailing_lists/users.php
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> --
>>>>>> Tsjerk A. Wassenaar, Ph.D.
>>>>>> Junior UD (post-doc)
>>>>>> Biomolecular NMR, Bijvoet Center
>>>>>> Utrecht University
>>>>>> Padualaan 8
>>>>>> 3584 CH Utrecht
>>>>>> The Netherlands
>>>>>> P: +31-30-2539931
>>>>>> F: +31-30-2537623
>>>>>>
>>>>>>
>>>>>>             
>>>>
>>>>         
>> --
>> Tsjerk A. Wassenaar, Ph.D.
>> Junior UD (post-doc)
>> Biomolecular NMR, Bijvoet Center
>> Utrecht University
>> Padualaan 8
>> 3584 CH Utrecht
>> The Netherlands
>> P: +31-30-2539931
>> F: +31-30-2537623
>> _______________________________________________
>> gmx-users mailing list    gmx-users at gromacs.org
>> http://www.gromacs.org/mailman/listinfo/gmx-users
>> Please search the archive at http://www.gromacs.org/search before posting!
>> Please don't post (un)subscribe requests to the list. Use the
>> www interface or send it to gmx-users-request at gromacs.org.
>> Can't post? Read http://www.gromacs.org/mailing_lists/users.php
>>
>>     
>
> _______________________________________________
> gmx-users mailing list    gmx-users at gromacs.org
> http://www.gromacs.org/mailman/listinfo/gmx-users
> Please search the archive at http://www.gromacs.org/search before posting!
> Please don't post (un)subscribe requests to the list. Use the 
> www interface or send it to gmx-users-request at gromacs.org.
> Can't post? Read http://www.gromacs.org/mailing_lists/users.php
>
>