[gmx-users] g_sas: unable to reproduce data from original article of the DCLM (Eisenhaber1995)

Teemu Murtola teemu.murtola at gmail.com
Thu Feb 27 15:16:00 CET 2014


Hi,

Thanks for the report. In relation to
https://gerrit.gromacs.org/#/c/2726/I had been planning to add some
tests for the actual algorithm used, and
this report made me prioritize this. Here's an intermediate result. I have
not yet tested the PBC-enabled computation at all, and haven't looked at
the development history in detail, so I cannot give a full picture.

But so far I've found that the algorithm implemented in g_sas does not work
correctly in the no-PBC case, except possibly with equally sized spheres.
It fails to reproduce even basic analytical results with two spheres of
unequal size (as long as the sphere that appears first in the coordinate
file is smaller). This appears to be fixed by removing (there should be
only one matching line)

  ra2max = radius[iat];

from src/tools/nsc.c (in 4.6 and earlier). This may not be the only problem
in the algorithm, but can you check whether such a modified version helps
with the -nopbc case? I'll post additional information once I have had time
to test more cases.

Best regards,
Teemu


On Sun, Feb 23, 2014 at 5:29 AM, João M. Damas <jmdamas at itqb.unl.pt> wrote:

> Thank you for your fast reply and your suggestions! The results are in
> Attached Table 2 (see below).
>
>    - I followed your PBC suggestion, but used the -pbc flag instead (I
>    still ended up doing yours, read further ahead). While turning off PBC
> for
>    GMX4.0.4 leaves the results unchanged, doing the same for GMX4.6.3
> changes
>    the results closer to the 4.0.4 results. Different results from changing
>    PBC in different versions could possibly mean different boxes coming
> from
>    possible different processing. Since I'm providing to g_sas a .pdb file
>    only with ATOM records, one could assume g_sas is taking the box from
> the
>    .tpr file provided in -s. The .tpr file was created using
> pdb2gmx+grompp. I
>    checked the .gro files boxes and they are all the same (for the same
>    protein, of course). Maybe g_sas is guessing a box on its own (and
>    differently for different versions)? Thinking the -pbc flag may be
> behaving
>    weirdly, I have also made a batch equal to Attached Table 2 but running
> a
>    editconf -d 5 in between pdb2gmx and grompp (your suggestion), and the
>    results are all the same as with the smaller box, confirming the
>    -nopbc/-pbc differences as really weird... Any ideas?
>    - changing from single to double precison only causes changes that are,
>    at maximum, of a few square angstroms, but most commonly under 1 square
>    angstrom. This does not seem to be the answer either...
>
> Further input on this issue is deeply appreciated!
>
> Best,
> João
>
> P.S: g_sas issues a warning that comes from atomprop.c that starts with
> "Masses and atomic (Van der Waals) radii will be guessed based on residue
> and atom names...". From what I read from the code, this means they will be
> fetched from vdwradii.dat and atommasses.dat instead of the topology. Maybe
> those files should mentioned in that warning? Just a minor remark.
>
> --
> Attached Table 2
> met  - method used, anl=analytical method, num=numerical method (DCLM)
> dots - number of points used for the numerical method
> src  - source of the data, paper=Eisenhaber1995, ASC=ASC software,
>        g{404,463}{sp,dp}=g_sas of GMX4.{0.4,6.3} {single,double} precision
> pbc  - PBC flag for g_sas, yes=-pbc, no=-nopbc
> Areas are in square angstrom
> ======================================================
>   met  dots     src  pbc      4PTI      3FXN      1TIM
> ======================================================
>   anl         paper        3973.80   6943.80  20002.80
>   anl           ASC        3973.81   6943.80  20002.80
>
>   num   122   paper        3961.40   6968.30  19970.90
>   num   122     ASC        3961.44   6968.33  19970.90
>   num   122  g404sp  yes   4165.31   7727.73  24110.10
>   num   122  g404sp   no   4165.31   7727.73  24110.10
>   num   122  g404dp  yes   4165.57   7722.80  24110.30
>   num   122  g404dp   no   4165.57   7722.80  24110.30
>   num   122  g463sp  yes   3825.86   6257.83  19867.90
>   num   122  g463sp   no   4169.21   7731.63  24080.20
>   num   122  g463dp  yes   3826.12   6258.51  19866.90
>   num   122  g463dp   no   4169.47   7726.70  24080.30
>
>   num   362   paper        3971.80   6933.40  19997.10
>   num   362     ASC        3971.79   6933.37  19997.10
>   num   362  g404sp  yes   4192.12   7704.33  24161.50
>   num   362  g404sp   no   4192.12   7704.33  24161.50
>   num   362  g404dp  yes   4191.91   7703.31  24160.20
>   num   362  g404dp   no   4191.91   7703.31  24160.20
>   num   362  g463sp  yes   3838.35   6248.43  19886.80
>   num   362  g463sp   no   4195.02   7704.70  24132.10
>   num   362  g463dp  yes   3838.24   6249.87  19884.70
>   num   362  g463dp   no   4194.81   7703.68  24130.80
>
>   num   642   paper        3967.90   6944.40  19998.70
>   num   642     ASC        3967.78   6944.37  19998.70
>   num   642  g404sp  yes   4187.54   7703.00  24148.90
>   num   642  g404sp   no   4187.54   7703.00  24148.90
>   num   642  g404dp  yes   4187.71   7698.46  24144.60
>   num   642  g404dp   no   4187.71   7698.46  24144.60
>   num   642  g463sp  yes   3831.64   6255.64  19882.70
>   num   642  g463sp   no   4188.63   7701.99  24119.40
>   num   642  g463dp  yes   3831.14   6254.56  19879.30
>   num   642  g463dp   no   4188.86   7697.44  24115.10
>
>   num  1002   paper        3974.10   6939.10  20012.20
>   num  1002     ASC        3974.07   6939.12  20012.20
>   num  1002  g404sp  yes   4191.68   7702.32  24173.20
>   num  1002  g404sp   no   4191.68   7702.32  24173.20
>   num  1002  g404dp  yes   4191.89   7697.87  24171.60
>   num  1002  g404dp   no   4191.89   7697.87  24171.60
>   num  1002  g463sp  yes   3840.58   6250.14  19897.40
>   num  1002  g463sp   no   4193.60   7700.64  24143.10
>   num  1002  g463dp  yes   3841.84   6249.88  19894.70
>   num  1002  g463dp   no   4194.01   7696.19  24141.40
>
>   num  1472   paper        3975.70   6943.30  19997.00
>   num  1472     ASC        3975.70   6943.35  19996.90
>   num  1472  g404sp  yes   4197.65   7710.93  24153.70
>   num  1472  g404sp   no   4197.65   7710.93  24153.70
>   num  1472  g404dp  yes   4197.79   7709.92  24153.90
>   num  1472  g404dp   no   4197.79   7709.92  24153.90
>   num  1472  g463sp  yes   3841.69   6256.92  19882.80
>   num  1472  g463sp   no   4198.08   7709.30  24123.30
>   num  1472  g463dp  yes   3842.52   6259.11  19882.70
>   num  1472  g463dp   no   4198.54   7708.29  24123.50
> ======================================================
>
>
> On Sat, Feb 22, 2014 at 9:13 PM, David van der Spoel
> <spoel at xray.bmc.uu.se>wrote:
>
> > On 2014-02-22 18:34, João M. Damas wrote:
> >
> >> Dear all,
> >>
> >> I have been having some puzzling results when using g_sas to analyze
> some
> >> trajectories, so I have decided to go back to basics.
> >>
> >> As described in the documentation, g_sas implements the double cubic
> >> lattice method (DCLM) originally presented in Eisenhaber1995 (doi:
> >> 10.1002/jcc.540160303 <http://dx.doi.org/10.1002/jcc.540160303>).
> Table I
> >>
> >> of that article shows a test case of three proteins (PDB code: 4PTI,
> 3FXN
> >> and 1TIM), where the DCLM (numerical method) at different point
> densities
> >> is compared against an analytic method (also by Eisenhaber, citation
> 30).
> >>
> >> My question was: can g_sas reproduce the areas reported in Table I of
> >> Eisenhaber1995?
> >>
> >> First of all, I went on to reproduce the reported areas with the
> software
> >> originally developed by Eisenhaber, which has both the analytical and
> >> numerical (DCLM) methods implemented - ASC software. As it can be seen
> in
> >> Attached Table (see below), running ASC provided results very similar
> (to
> >> the tenths of square angstrom), if not equal, to the reported ones.
> >> Besides
> >> assessing the reported areas, I now have a software which allows me to
> >> benchmark other cases.
> >>
> >> The next step was answering my question. For that, I had to make sure
> >> g_sas
> >> was performing the same calculation as ASC software:
> >>
> >>     - Both ASC and g_sas took the same input structures (only using the
> >> ATOM
> >>
> >>     records). In the case of g_sas, I had to build a .tpr file for the
> -s
> >> flag
> >>     of g_sas, which was done using pdb2gmx+grompp. Like with the ASC,
> the
> >> g_sas
> >>     calculation was done using only the heavy atoms (using the Protein-H
> >> group
> >>     for calculation and output groups).
> >>     - The radii used also had to be the same for both ASC and g_sas.
> g_sas
> >>
> >>     is still using the vdwradii.dat, so I created a new vdwradii.dat
> file
> >> with
> >>     the Eisenberg&McLachlan radii that are used in the ASC calculations
> >> (both
> >>     mine and the reported ones). The radii were correctly attributed to
> >> the
> >>     atoms, as checked in the g_sas.log file outputted by the debug mode.
> >>     - In both cases the probe radius was 1.4 angstroms (0.14 nm as per
> >> g_sas
> >>
> >>     input).
> >>
> >> The area calculations were then performed using g_sas (number of points
> >> used were provided using the -ndots flag). Originally, I used GROMACS
> >> 4.0.4, but afterwards I also performed the calculations using GROMACS
> >> 4.6.3. Results are presented in Attached Table (see below).
> >>
> >>     - GROMACS 4.0.4 g_sas DCLM overestimates the area by about 5, 10 and
> >> 20%
> >>
> >>     for 4PTI, 3FXN and 1TIM, respectively, in relation to ASC DCLM. This
> >> is
> >>     also the same overestimation of the areas obtained through an
> >> analytical
> >>     method, since ASC DCLM gives a good estimation of the analytical
> ones,
> >>     unlike g_sas.
> >>     - On the other hand, GROMACS 4.6.3 g_sas DCLM underestimates the
> area
> >> by
> >>
> >>     about 3, 10 and 0.5% for 4PTI, 3FXN and 1TIM, respectively, in
> >> relation to
> >>     ASC DCLM. I have checked that this different behavior appeared from
> >> the
> >>     4.0.X to the 4.5.X versions. This is odd since I have searched the
> >> release
> >>     notes for g_sas and I haven't found any change in this tool... Maybe
> >> this
> >>     is a clue to what is going on?
> >>     - From my understanding, DCLM implemented on ASC or on g_sas
> shouldn't
> >>
> >>     give different results. Moreover, even if there could be some
> >> differences
> >>     when using a low number of points (due some factor of randomness I
> >> may not
> >>     be aware), this difference should have already vanished when using
> >> more
> >>     than 1000 points per atom. Take note that increasing the number of
> >> points
> >>     in g_sas does not give areas nearer the analytical results (hence,
> >> more
> >>     dots may not mean more accuracy, maybe this is the reason 24 is the
> >> default
> >>     number...). Can anyone point me out on some mistake I may be doing?
> >> Maybe
> >>     there is a factor that I may be disregarding...
> >>
> >> I have some data which point out that these differences (errors?) may
> lead
> >> to very large errors when calculating properties like buried/contact
> >> areas,
> >> specially if the buried/contact areas have different magnitudes from the
> >> molecule's areas. But this may be the theme of a another/follow-up
> e-mail
> >> when I get ASC working on my trajectories.
> >>
> >> Best regards,
> >> João M. Damas
> >>
> >> --
> >> Attached Table
> >> met  - method used, anl=analytical method, num=numerical method (DCLM)
> >> dots - number of points used for the numerical method
> >> src  - source of the data, paper=Eisenhaber1995, ASC=ASC software,
> >> gmx404=g_sas of GMX4.0.4,
> >>         gmx463=g_sas of GMX4.6.3
> >> Areas are in square angstroms
> >> ==================================================
> >>    met  dots     src      4PTI      3FXN      1TIM
> >> ==================================================
> >>    anl         paper   3973.80   6943.80  20002.80
> >>    anl           ASC   3973.81   6943.80  20002.80
> >>
> >>    num   122   paper   3961.40   6968.30  19970.90
> >>    num   122     ASC   3961.44   6968.33  19970.90
> >>    num   122  gmx404   4165.31   7727.73  24110.10
> >>    num   122  gmx463   3825.86   6257.83  19867.90
> >>
> >>    num   362   paper   3971.80   6933.40  19997.10
> >>    num   362     ASC   3971.79   6933.37  19997.10
> >>    num   362  gmx404   4192.12   7704.33  24161.50
> >>    num   362  gmx463   3838.35   6248.43  19886.80
> >>
> >>    num   642   paper   3967.90   6944.40  19998.70
> >>    num   642     ASC   3967.78   6944.37  19998.70
> >>    num   642  gmx404   4187.54   7703.00  24148.90
> >>    num   642  gmx463   3831.64   6255.64  19882.70
> >>
> >>    num  1002   paper   3974.10   6939.10  20012.20
> >>    num  1002     ASC   3974.07   6939.12  20012.20
> >>    num  1002  gmx404   4191.68   7702.32  24173.20
> >>    num  1002  gmx463   3840.58   6250.14  19897.40
> >>
> >>    num  1472   paper   3975.70   6943.30  19997.00
> >>    num  1472     ASC   3975.70   6943.35  19996.90
> >>    num  1472  gmx404   4197.65   7710.93  24153.70
> >>    num  1472  gmx463   3841.69   6256.92  19882.80
> >> ==================================================
> >>
> >>  Thanks for a thorough analysis. Some more points to test:
> > - compile gromacs in double precision and rerun
> > - verify there are no periodic boundary effects by making the box larger.
>


More information about the gromacs.org_gmx-users mailing list