[gmx-users] g_sas: unable to reproduce data from original article of the DCLM (Eisenhaber1995)

João M. Damas jmdamas at itqb.unl.pt
Sat Feb 22 18:34:26 CET 2014


Dear all,

I have been having some puzzling results when using g_sas to analyze some
trajectories, so I have decided to go back to basics.

As described in the documentation, g_sas implements the double cubic
lattice method (DCLM) originally presented in Eisenhaber1995 (doi:
10.1002/jcc.540160303 <http://dx.doi.org/10.1002/jcc.540160303>). Table I
of that article shows a test case of three proteins (PDB code: 4PTI, 3FXN
and 1TIM), where the DCLM (numerical method) at different point densities
is compared against an analytic method (also by Eisenhaber, citation 30).

My question was: can g_sas reproduce the areas reported in Table I of
Eisenhaber1995?

First of all, I went on to reproduce the reported areas with the software
originally developed by Eisenhaber, which has both the analytical and
numerical (DCLM) methods implemented - ASC software. As it can be seen in
Attached Table (see below), running ASC provided results very similar (to
the tenths of square angstrom), if not equal, to the reported ones. Besides
assessing the reported areas, I now have a software which allows me to
benchmark other cases.

The next step was answering my question. For that, I had to make sure g_sas
was performing the same calculation as ASC software:

   - Both ASC and g_sas took the same input structures (only using the ATOM
   records). In the case of g_sas, I had to build a .tpr file for the -s flag
   of g_sas, which was done using pdb2gmx+grompp. Like with the ASC, the g_sas
   calculation was done using only the heavy atoms (using the Protein-H group
   for calculation and output groups).
   - The radii used also had to be the same for both ASC and g_sas. g_sas
   is still using the vdwradii.dat, so I created a new vdwradii.dat file with
   the Eisenberg&McLachlan radii that are used in the ASC calculations (both
   mine and the reported ones). The radii were correctly attributed to the
   atoms, as checked in the g_sas.log file outputted by the debug mode.
   - In both cases the probe radius was 1.4 angstroms (0.14 nm as per g_sas
   input).

The area calculations were then performed using g_sas (number of points
used were provided using the -ndots flag). Originally, I used GROMACS
4.0.4, but afterwards I also performed the calculations using GROMACS
4.6.3. Results are presented in Attached Table (see below).

   - GROMACS 4.0.4 g_sas DCLM overestimates the area by about 5, 10 and 20%
   for 4PTI, 3FXN and 1TIM, respectively, in relation to ASC DCLM. This is
   also the same overestimation of the areas obtained through an analytical
   method, since ASC DCLM gives a good estimation of the analytical ones,
   unlike g_sas.
   - On the other hand, GROMACS 4.6.3 g_sas DCLM underestimates the area by
   about 3, 10 and 0.5% for 4PTI, 3FXN and 1TIM, respectively, in relation to
   ASC DCLM. I have checked that this different behavior appeared from the
   4.0.X to the 4.5.X versions. This is odd since I have searched the release
   notes for g_sas and I haven't found any change in this tool... Maybe this
   is a clue to what is going on?
   - From my understanding, DCLM implemented on ASC or on g_sas shouldn't
   give different results. Moreover, even if there could be some differences
   when using a low number of points (due some factor of randomness I may not
   be aware), this difference should have already vanished when using more
   than 1000 points per atom. Take note that increasing the number of points
   in g_sas does not give areas nearer the analytical results (hence, more
   dots may not mean more accuracy, maybe this is the reason 24 is the default
   number...). Can anyone point me out on some mistake I may be doing? Maybe
   there is a factor that I may be disregarding...

I have some data which point out that these differences (errors?) may lead
to very large errors when calculating properties like buried/contact areas,
specially if the buried/contact areas have different magnitudes from the
molecule's areas. But this may be the theme of a another/follow-up e-mail
when I get ASC working on my trajectories.

Best regards,
João M. Damas

--
Attached Table
met  - method used, anl=analytical method, num=numerical method (DCLM)
dots - number of points used for the numerical method
src  - source of the data, paper=Eisenhaber1995, ASC=ASC software,
gmx404=g_sas of GMX4.0.4,
       gmx463=g_sas of GMX4.6.3
Areas are in square angstroms
==================================================
  met  dots     src      4PTI      3FXN      1TIM
==================================================
  anl         paper   3973.80   6943.80  20002.80
  anl           ASC   3973.81   6943.80  20002.80

  num   122   paper   3961.40   6968.30  19970.90
  num   122     ASC   3961.44   6968.33  19970.90
  num   122  gmx404   4165.31   7727.73  24110.10
  num   122  gmx463   3825.86   6257.83  19867.90

  num   362   paper   3971.80   6933.40  19997.10
  num   362     ASC   3971.79   6933.37  19997.10
  num   362  gmx404   4192.12   7704.33  24161.50
  num   362  gmx463   3838.35   6248.43  19886.80

  num   642   paper   3967.90   6944.40  19998.70
  num   642     ASC   3967.78   6944.37  19998.70
  num   642  gmx404   4187.54   7703.00  24148.90
  num   642  gmx463   3831.64   6255.64  19882.70

  num  1002   paper   3974.10   6939.10  20012.20
  num  1002     ASC   3974.07   6939.12  20012.20
  num  1002  gmx404   4191.68   7702.32  24173.20
  num  1002  gmx463   3840.58   6250.14  19897.40

  num  1472   paper   3975.70   6943.30  19997.00
  num  1472     ASC   3975.70   6943.35  19996.90
  num  1472  gmx404   4197.65   7710.93  24153.70
  num  1472  gmx463   3841.69   6256.92  19882.80
==================================================

-- 
João M. Damas
PhD Student
Protein Modelling Group
ITQB-UNL, Oeiras, Portugal
Tel:+351-214469613


More information about the gromacs.org_gmx-users mailing list