[gmx-users] g_select syntax

Fri Aug 29 21:06:47 CEST 2014

Hi,

On Fri, Aug 29, 2014 at 4:19 AM, Bin Liu <fdusuperstring at gmail.com> wrote:

> I am recently puzzled by the syntax and behaviour of g_select. I want to
> obtain the residue index list of LIPID whose center of mass is within 1.0
> nm of the surface of protein. In my case, each LIPID molecule consists of
> only one residue. I wrote the selection.dat as follows, and set -selrpos to
> atom and -seltype to res_com. Here I think "Protein" is the reference
> group, so -selrpos should be atom because I care about the distance to its
> surface. "LIPID" is the analysis group and I care about their individual
> center of mass. So -seltype should be res_com.
>
> selection.dat:
> resname LIPID and within 1.0 of group "Protein";
>
> g_select -sf selection.dat -f traj.trr -s traj.tpr -n system.ndx  -oi
> index.dat -seltype res_com -selrpos atom
>

Yes, this should give you what you expect. It selects all LIPID atoms that
are within 1.0 nm from the protein, and then groups them by residue for the
-oi output.

> However I tried another selection. This time instead of retrieving the
> residue index, I tried to retrieve the index of a key atom of the LIPID
> molecule.
> selection.dat:
> rdist = res_com within 1.0 of group "Protein";
> group_C15 = (resname LIPID) and (rdist) and (name C15);
> group_C15;
>
> g_select -sf selection.dat -f traj.trr -s traj.tpr -n system.ndx  -oi
> index.dat -seltype atom -selrpos atom
>

"res_com within" has a different meaning from using -seltype res_com: your
second selection selects all C15 atoms that are in LIPID residues, and
where the center-of-mass of the whole residue is within 1 nm from the
protein (the last part is the "res_com within" expression).

-seltype res_com in the first example is equivalent to writing this, where
the res_com is in a very different location:
res_com of (resname LIPID and within 1.0 of group "Protein")

Hopefully this helps you understand where the difference between the
selections comes from.
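
To make the difference concrete, here is a toy Python sketch (not GROMACS
code). The coordinates, masses and residue numbers are all made up, the
system is 2D for brevity, and "within" is modelled as distance <= cutoff to
any protein atom, which is an assumption of this sketch.

```python
import math

# Hypothetical protein atom positions in nm (made-up values).
protein = [(0.0, 0.0), (0.5, 0.0)]

# Hypothetical lipids: residue id -> list of (atom name, mass, position).
lipids = {
    1: [("C15", 12.0, (0.9, 0.0)), ("C16", 12.0, (1.1, 0.0))],
    2: [("C15", 12.0, (1.4, 0.0)), ("C16", 12.0, (2.4, 0.0))],
    3: [("C15", 12.0, (3.0, 0.0)), ("C16", 12.0, (3.5, 0.0))],
}

def near_protein(pos, cutoff=1.0):
    """True if pos lies within cutoff of any protein atom."""
    return any(math.dist(pos, p) <= cutoff for p in protein)

def center_of_mass(atoms):
    """Mass-weighted mean position of a list of (name, mass, position)."""
    total = sum(mass for _, mass, _ in atoms)
    return tuple(
        sum(mass * pos[i] for _, mass, pos in atoms) / total
        for i in range(2)
    )

# First selection: pick atoms near the protein, then group by residue.
# A residue is reported as soon as ANY of its atoms is close enough.
by_atoms = sorted(
    resid for resid, atoms in lipids.items()
    if any(near_protein(pos) for _, _, pos in atoms)
)

# "res_com within ...": the residue center of mass itself must be close.
by_com = sorted(
    resid for resid, atoms in lipids.items()
    if near_protein(center_of_mass(atoms))
)

print(by_atoms)  # [1, 2]
print(by_com)    # [1]
```

Residue 2 has one atom inside the cutoff but its center of mass outside, so
only the atom-based criterion picks it up. This is why the first selection
can report more residues than the second.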

> I thought these two selections should give the same number of indices per
> frame, as the second selection merely retrieve the atom indices of the
> corresponding key atoms in the LIPID molecules selected by the first
> selection. However the first selection gives significantly more indices
> than the second selection does. I guess my understanding of g_select syntax
> might be flawed. Please point out my misunderstanding. Thank you very much.
>

If you want to select the key atoms that match those from your first
selection, you need to write something more complex:

name C15 and same residue as (resname LIPID and within 1.0 of group "Protein")

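The last selection reads from the inside out: the part in parentheses
selects the LIPID atoms within 1.0 nm of the protein, "same residue as"
expands that to all atoms of the residues they belong to, and "name C15"
keeps only the key atom of each of those residues.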
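
The "same residue as" expansion can be mimicked in a few lines of Python.
Again this is only a toy sketch with made-up 2D coordinates, not GROMACS
code; every atom in the list is assumed to belong to a LIPID residue, and
"within" is modelled as distance <= cutoff to any protein atom.

```python
import math

# Hypothetical protein atom positions in nm (made-up values).
protein = [(0.0, 0.0), (0.5, 0.0)]

# Hypothetical lipid atoms: (residue id, atom name, position).
atoms = [
    (1, "C15", (0.9, 0.0)), (1, "C16", (1.1, 0.0)),
    (2, "C15", (2.4, 0.0)), (2, "C16", (1.4, 0.0)),
    (3, "C15", (3.0, 0.0)), (3, "C16", (3.5, 0.0)),
]

def near_protein(pos, cutoff=1.0):
    """True if pos lies within cutoff of any protein atom."""
    return any(math.dist(pos, p) <= cutoff for p in protein)

# Inner part: residues that own at least one atom near the protein.
close_residues = {resid for resid, _, pos in atoms if near_protein(pos)}

# Outer part: expand to whole residues, then keep only the C15 atoms.
key_atoms = [
    (resid, name) for resid, name, _ in atoms
    if name == "C15" and resid in close_residues
]

print(key_atoms)  # [(1, 'C15'), (2, 'C15')]
```

In residue 2 the C15 atom itself is outside the cutoff, but it is still
selected because another atom of the same residue is inside.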

Hope this helps,
Teemu