[gmx-users] output of g_covar with -ascii option , and the format of covar.dat

Tsjerk Wassenaar tsjerkw at gmail.com
Fri Sep 7 08:03:32 CEST 2007


Hi Arneh,

I usually take R (http://cran.r-project.org/) to work with the
covariance matrix:

S<-scan("covar.dat")
S<-matrix(S,ncol=sqrt(length(S)))
image(S)

Something similar will go for Matlab by the way.

Maybe worth a try...

Cheers,

Tsjerk

On 9/7/07, Arneh Babakhani <ababakha at mccammon.ucsd.edu> wrote:
> Ok, I think what was confusing me was the format of the covar.dat file
> (the covariance matrix as a text file), that's outputted from g_covar
> when using the -ascii function.   So I'll clarify this below, hopefully
> to relieve any future confusion . . .
>
> Let N = the # of particles for which you'd like to calculate the
> covariance matrix.
>
> You would expect the covariance matrix to have dimensions 3N x 3N.  But
> when you look at the covar.dat file (or however you wish to name it)
> obtained by using the -ascii option, the dimensions of that matrix in
> the text file is actually 3N^2 x 3 (always 3 columns).  I'm not sure why
> the format is this way . . . probably to save space in the horizontal
> direction.  I was unclear what the entries in the rectangular matrix were.
>
> Let C = this matrix from the dat file, of dimensions 3N^2 x 3.
> Let's factor C = AB , such that A has dimensions (3N^2 x N) , and B has
> dimensions (N x 3).  Thus, the product of AB yields the correct
> dimensions of C.  (This factorization may help you in reconstructing the
> covariance matrix, if you desire something in 3N x 3N format, or if
> you're just trying to understand and pick out some of the entries in
> covar.dat).
>
> Let's look at an example for further illustration.  Suppose N = 3.  We
> construct the covariance matrix:
> [ababakha at chemcca50 CovarAnalysis]$ g_covar -s ../../FullMD/FullMD1.tpr
> -f ../../FullMD/FullMD1.trr -n CAlphaPractice.ndx -ascii covar.dat
> ....
> Choose a group for the least squares fit
> Group     0 (CAlpha_practice) has     3 elements
> There is one group in the index
> Choose a group for the covariance analysis
> Group     0 (CAlpha_practice) has     3 elements
> There is one group in the index
> Calculating the average structure ...
> trn version: GMX_trn_file (single precision)
> Last frame       1000 time 1000.000
> Constructing covariance matrix (9x9) ...
> Last frame       1000 time 1000.000
> Read 1001 frames
> .... and so on . . .
>
> Now if we look at the covar.dat file:
>
> [ababakha at chemcca50 CovarAnalysis]$ more covar.dat
> 3.26402e-05 -6.83794e-06 -3.9065e-05
> -4.76002e-05 -2.60562e-05 2.47818e-06
> 1.49601e-05 3.28941e-05 3.65869e-05
> -6.83794e-06 3.46278e-06 1.12547e-05
> 9.38582e-06 4.41876e-06 -1.57612e-06
> -2.54786e-06 -7.88152e-06 -9.67853e-06
> -3.9065e-05 1.12547e-05 5.13989e-05
> 5.60832e-05 2.96123e-05 -4.56461e-06
> -1.70182e-05 -4.08668e-05 -4.68343e-05
> -4.76002e-05 9.38582e-06 5.60832e-05
> 7.0521e-05 3.79785e-05 -4.61594e-06
> -2.29209e-05 -4.73642e-05 -5.14673e-05
> -2.60562e-05 4.41876e-06 2.96123e-05
> 3.79785e-05 2.14427e-05 -9.89136e-07
> -1.19224e-05 -2.58614e-05 -2.86232e-05
> 2.47818e-06 -1.57612e-06 -4.56461e-06
> -4.61594e-06 -9.89136e-07 2.5659e-06
> 2.13774e-06 2.56526e-06 1.9987e-06
> 1.49601e-05 -2.54786e-06 -1.70182e-05
> -2.29209e-05 -1.19224e-05 2.13774e-06
> 7.96086e-06 1.44703e-05 1.48805e-05
> 3.28941e-05 -7.88152e-06 -4.08668e-05
> -4.73642e-05 -2.58614e-05 2.56526e-06
> 1.44703e-05 3.37428e-05 3.83016e-05
> 3.65869e-05 -9.67853e-06 -4.68343e-05
> -5.14673e-05 -2.86232e-05 1.9987e-06
> 1.48805e-05 3.83016e-05 4.48356e-05
>
> Indeed, it is 3N^2 by 3, or 27 x 3 in this case, b/c N = 3. Let this
> matrix = C.  The entries of C are as follows:
>
> C = [
> x1x1 x1y1 x1z1
> x2x1 x2y1 x2z1
> x3x1 x3y1 x3z1
> y1x1 y1y1 y1z1
> y2x1 y2y1 y2z1
> y3x1 y3y1 y3z1
> z1x1 z1y1 z1z1
> z2x1 z2y1 z2z1
> z3x1 z3y1 z3z1
> x1x2 x1y2 x1z2
> x2x2 x2y2 x2z2
> x3x2 x3y2 x3z2
> y1x2 y1y2 y1z2
> y2x2 y2y2 y2z2
> y3x2 y3y2 y3z2
> z1x2 z1y2 z1z2
> z2x2 z2y2 z2z2
> z3x2 z3y2 z3z2
> x1x3 x1y3 x1z3
> x2x3 x2y3 x2z3
> x3x3 x3y3 x3z3
> y1x3 y1y3 y1z3
> y2x3 y2y3 y2z3
> y3x3 y3y3 y3z3
> z1x3 z1y3 z1z3
> z2x3 z2y3 z2z3
> z3x3 z3y3 z3z3
> ] ;
>
> You can do some brief visual inspection to confirm this.  For instance,
> since z3x3 = x3z3, C(27,1) should be the same as C(21,3).  Indeed it is,
> the value is 1.48805e-05.
>
> And if you wanted to further factor this, you could:
> C=AB, where the corresponding dimensions are:
> 27x3 = (27x3)(3x3)
>
> A = [
> x1  0  0
> x2  0  0
> x3  0  0
> y1  0  0
> y2  0  0
> y3  0  0
> z1  0  0
> z2  0  0
> z3  0  0
> 0  x1  0
> 0  x2  0
> 0  x3  0
> 0  y1  0
> 0  y2  0
> 0  y3  0
> 0  z1  0
> 0  z2  0
> 0  z3  0
> 0  0  x1
> 0  0  x2
> 0  0  x3
> 0  0  y1
> 0  0  y2
> 0  0  y3
> 0  0  z1
> 0  0  z2
> 0  0  z3
> ];
>
> and
> B = [
> x1 y1 z1
> x2 y2 z2
> x3 y3 z3 ]
>
> Arneh Babakhani wrote:
> > Hi Tsjerk, yes I did make a mistake in the group that I chose . . . should
> > be C-alpha.  Anyway, thanks for the help, appreciate it,
> >
> > Arneh
> >
> >
> >> Hi Arneh,
> >>
> >> I think this has been covered on the mailing list before. But anyway,
> >> the number indeed refers to the atom index number, running from 1-N.
> >> You mean that you have 1934427 X 3 elements in the covar.dat file? So
> >> that corresponds to sqrt( 1934427 X 3 ) / 3 = 803 atoms. I think you
> >> made a mistake with the group you chose (Protein rather than
> >> c-alpha?). You would expect to have 90000 elements for a system of 100
> >> atoms.
> >>
> >> Hope it helps,
> >>
> >> Tsjerk
> >>
> >> On 9/5/07, Arneh Babakhani <ababakha at mccammon.ucsd.edu> wrote:
> >>
> >>> Thanks Tsjerk,
> >>>
> >>> One other question, regarding the covar.dat file (the covariance matrix
> >>> that's outputted when using the -ascii flag):
> >>>
> >>> I dont' quite understand the format of it.  In the manual, it states:
> >>> "The order of the elements is: x1x1, x1y1, x1z1, x1x2, ..."
> >>>
> >>> Does "1" and "2" refer to the particles in the group?  I'm asking b/c my
> >>> particular covar.dat file is of dimmensions 1934427 X 3 , for a group of
> >>> about 100 atoms and a traj of 1 NS.  Just trying to make sense out of it
> >>> . . .
> >>>
> >>> Thanks,
> >>>
> >>> Arneh
> >>>
> >>>
> >>>
> >>> Tsjerk Wassenaar wrote:
> >>>
> >>>> Hi Arneh,
> >>>>
> >>>> Yes. As you should be able to recall, the (linear, not "generalized")
> >>>> correlation is formally given as:
> >>>>
> >>>> cor(x,y) = cov(x,y) / ( sqrt(var(x))*sqrt(var(y)) )
> >>>>
> >>>> The diagonal elements of the covariance matrix give the variances...
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Tsjerk
> >>>>
> >>>> On 8/31/07, Arneh Babakhani <ababakha at mccammon.ucsd.edu> wrote:
> >>>>
> >>>>
> >>>>> Sorry for the confusion . I want the correlation.  So I can just use
> >>>>> g_covar to get the covar. matrix, and divide each term by what you
> >>>>>
> >>> stated,
> >>>
> >>>>> to get the correlation matrix? (if I follow you correctly).
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Hi Arneh,
> >>>>>>
> >>>>>> You're not clear on what you want. Is it a covariance matrix or a
> >>>>>> correlation matrix. Correlation and covariance are different things.
> >>>>>> g_covar lets you write out a covariance matrix using the option
> >>>>>> -ascii. In case you want the correlation matrix (though IIRC Karplus
> >>>>>> also used the covariance matrix), you have to divide each element
> >>>>>>
> >>> m_ij
> >>>
> >>>>>> by sqrt(m_ii)*sqrt(m_jj).
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>> Tsjerk
> >>>>>>
> >>>>>> On 8/31/07, Arneh Babakhani <ababakha at mccammon.ucsd.edu> wrote:
> >>>>>>
> >>>>>>
> >>>>>>> Can anyone briefly recommend a procedure for calculating the
> >>>>>>>
> >>> correlation
> >>>
> >>>>>>> matrix (not the diagonalized covariance matrix, as done by g_covar)
> >>>>>>>
> >>> of a
> >>>
> >>>>>>> specified group?
> >>>>>>>
> >>>>>>> In particular, I'm looking to calculate the covariance matrix, as
> >>>>>>> specifed
> >>>>>>> in the Karplus paper (Proteins: Vol 11:205-217, 1991), where each
> >>>>>>>
> >>> entry
> >>>
> >>>>>>> of
> >>>>>>> the matrix is defined as the correlation between the i-th and j-th
> >>>>>>>
> >>> atom:
> >>>
> >>>>>>> c_ij = <delta r_i> <delta r_j>
> >>>>>>>
> >>>>>>> where delta r is the deviation of i from its average position,
> >>>>>>>
> >>> averaged
> >>>
> >>>>>>> over the ensemble.
> >>>>>>>
> >>>>>>> Or, is there a way to use g_covar and suppress the diagonalization
> >>>>>>>
> >>> step,
> >>>
> >>>>>>> so as to obtain only the translation correlation matrix??? (I
> >>>>>>>
> >>> couldn't
> >>>
> >>>>>>> find anything to this effect in the manual).
> >>>>>>>
> >>>>>>> Much thanks,
> >>>>>>>
> >>>>>>> Arneh
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> gmx-users mailing list    gmx-users at gromacs.org
> >>>>>>> http://www.gromacs.org/mailman/listinfo/gmx-users
> >>>>>>> Please search the archive at http://www.gromacs.org/search before
> >>>>>>> posting!
> >>>>>>> Please don't post (un)subscribe requests to the list. Use the
> >>>>>>> www interface or send it to gmx-users-request at gromacs.org.
> >>>>>>> Can't post? Read http://www.gromacs.org/mailing_lists/users.php
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>> --
> >>>>>> Tsjerk A. Wassenaar, Ph.D.
> >>>>>> Junior UD (post-doc)
> >>>>>> Biomolecular NMR, Bijvoet Center
> >>>>>> Utrecht University
> >>>>>> Padualaan 8
> >>>>>> 3584 CH Utrecht
> >>>>>> The Netherlands
> >>>>>> P: +31-30-2539931
> >>>>>> F: +31-30-2537623
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >> --
> >> Tsjerk A. Wassenaar, Ph.D.
> >> Junior UD (post-doc)
> >> Biomolecular NMR, Bijvoet Center
> >> Utrecht University
> >> Padualaan 8
> >> 3584 CH Utrecht
> >> The Netherlands
> >> P: +31-30-2539931
> >> F: +31-30-2537623
> >> _______________________________________________
> >> gmx-users mailing list    gmx-users at gromacs.org
> >> http://www.gromacs.org/mailman/listinfo/gmx-users
> >> Please search the archive at http://www.gromacs.org/search before posting!
> >> Please don't post (un)subscribe requests to the list. Use the
> >> www interface or send it to gmx-users-request at gromacs.org.
> >> Can't post? Read http://www.gromacs.org/mailing_lists/users.php
> >>
> >>
> >
> > _______________________________________________
> > gmx-users mailing list    gmx-users at gromacs.org
> > http://www.gromacs.org/mailman/listinfo/gmx-users
> > Please search the archive at http://www.gromacs.org/search before posting!
> > Please don't post (un)subscribe requests to the list. Use the
> > www interface or send it to gmx-users-request at gromacs.org.
> > Can't post? Read http://www.gromacs.org/mailing_lists/users.php
> >
> >
> _______________________________________________
> gmx-users mailing list    gmx-users at gromacs.org
> http://www.gromacs.org/mailman/listinfo/gmx-users
> Please search the archive at http://www.gromacs.org/search before posting!
> Please don't post (un)subscribe requests to the list. Use the
> www interface or send it to gmx-users-request at gromacs.org.
> Can't post? Read http://www.gromacs.org/mailing_lists/users.php
>


-- 
Tsjerk A. Wassenaar, Ph.D.
Junior UD (post-doc)
Biomolecular NMR, Bijvoet Center
Utrecht University
Padualaan 8
3584 CH Utrecht
The Netherlands
P: +31-30-2539931
F: +31-30-2537623



More information about the gromacs.org_gmx-users mailing list