[gmx-users] g_cluster (gromos method) creates clusters with members having RMSD greater than the cutoff to the cluster "middle"

João M. Damas jmdamas at itqb.unl.pt
Wed Apr 1 06:09:08 CEST 2015


I think it _may_ even be more serious than what Christopher reported.

The "center of the cluster" (as defined by Daura et al. and not the "middle
structure" calculated by g_cluster) may not even be outputted with the
cluster that was generated from it. This is due to the way the algorithm
from Daura et al. is implemented on g_cluster (gromos method).

The "neighbour list" based on the cut-off is constructed only once and is
stored in the nnb[] structure. After that structure is sorted (qsort) by
the "number of neighbours" (nnb[].nr), the top-ranked "neighbour row"
(nnb[0].nb) is outputted as a (first) cluster, just like the original
article says. After this, as the original article states, "the structures
of this cluster [must be] eliminated from the pool of structures" for the
process to be repeated, and I think here is where the implementation
diverges from the original algorithm.

One way of doing this correctly would be be to construct a new neighbour
list based on the structures left, but I guess that, although correct, it
would lead to unnecessary calculations. The current implementation on
g_cluster does the structure elimination by removing the (first) cluster
structures from the nnb[].nb arrays, which store all the "neighbours". What
may not be evident is that the "neighbour rows" constructed based on the
structures marked for removal should also disappear, because they may be
outputted as "phantom clusters" built from a structures that are already
part of other clusters.

A "quick" test to prove that the method is poorly implemented is to save
the "center of the cluster" for later output. With this output, one can see
that, for many of the less populated clusters, the "center of the cluster"
is not even part of the cluster.

In practice, due to circumstances unknown to me, it seems like it does not
affect much the clusters outputted (from a test case I have), which
explains why this has passed unnoticed. I only noticed it because I had a
very weird cluster with average RMSD higher than the cut-off, which lead me
to these diggings. After correcting the code according to my findings, that
weird cluster turned out to be two clusters (it was a "phantom cluster").
So, for someone who may be dealing with similar issues, this may be their
source.

Going back to the redmine filed by Christopher: for the sake of keeping the
analyze_clusters function fairly general, I think the "middle structure"
should be the one being outputted to the .log (and etc). But maybe the
function can be updated to use the method variable and, in the case of
gromos, output the "center of the clusters" to stderr as a warning? Just an
idea.

Cheers,
João

PS: This was based on version 4.6.7. A quick diff against 5.0.4 shows
nothing changed in the gromos function of the gmx_cluster.c, so it applies
to version 5 too.
PSS: The gromos function has an unnecessary variable row being declared,
with memory being allocated and freed, without anything happening in the
middle. It can be removed.

On Sat, Feb 14, 2015 at 3:44 AM, Christopher Neale <
chris.neale at alum.utoronto.ca> wrote:

> Dear Users:
>
> I have sorted out the issue and filed a redmine:
>
> http://redmine.gromacs.org/issues/1688
>
> ________________________________________
> From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <
> gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of
> Christopher Neale <chris.neale at alum.utoronto.ca>
> Sent: 12 February 2015 22:53
> To: gromacs.org_gmx-users at maillist.sys.kth.se
> Subject: [gmx-users] g_cluster (gromos method) creates clusters with
> members having RMSD greater than the cutoff to the cluster "middle"
>
> Dear Users:
>
> I have run g_cluster from gromacs 4.6.5 as follows:
>
> g_cluster -s my.tpr -f tmp.xtc -method gromos -nofit -o
> rmsd-clust_nofit.xpm -g cluster_nofit.log -sz clust-size_nofit.xvg -cl
> clusters_nofit.pdb -n index.ndx -cutoff 0.275 -wcl 10 -cl finally.xtc
>
> The top of cluster_nofit.log looks like this:
> Using gromos method for clustering
> Using RMSD cutoff 0.275 nm
> The RMSD ranges from 0.0247008 to 5.03412 nm
> Average RMSD is 0.45568
> Number of structures for matrix 12391
> Energy of the matrix is 4177.11 nm
>
> Found 874 clusters
>
> Writing middle structure for each cluster to finally.xtc
> Writing all structures for the first 10 clusters with more than 1
> structures to finally.xtc%03d.xtc
>
> cl. | #st  rmsd | middle rmsd | cluster members
>   1 | 4219  0.202 | 784700 .154 |  92900 103300 116100 125900 126500
> 133000 156600
> ...
>     |           |             | 693900 694000 694200 694300 694600 694700
> 695700
> ...
>
> So the frame at 694700 is in the first cluster.
>
> However, when I use g_rms to look at the rmsd between frame at 784700 and
> 694700, I find that the RMSD is 0.29 nm.
>
> I am confused because frame 784700 is listed as the "middle" (which I
> assume is the cluster centroid) so frame 694700 should not be included in
> this cluster.
>
> When I put these two frames into a single .xtc with only 2 frames,
> g_cluster correctly puts them into two different clusters when I use a
> cutoff of 0.275 nm.
>
> I understand that some of the members in a cluster can have a pairwise
> RMSD greater than the cutoff, but I thought that the "middle" of the
> cluster should have an RMSD less than the cutoff to all cluster members.
>
> Can anybody help me figure out what I am missing?
>
> Thank you,
> Chris.
>
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-request at gromacs.org.
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-request at gromacs.org.
>



-- 
João M. Damas
PhD Student
Protein Modelling Group
ITQB-UNL, Oeiras, Portugal
Tel:+351-214469613


More information about the gromacs.org_gmx-users mailing list