[gmx-users] g_sas: unable to reproduce data from original article of the DCLM (Eisenhaber1995)
João M. Damas
jmdamas at itqb.unl.pt
Sun Feb 23 04:30:18 CET 2014
Hello David,
Thank you for your fast reply and your suggestions! The results are in
Attached Table 2 (see below).
- I followed your PBC suggestion, but used the -pbc flag instead (I
still ended up doing yours, read further ahead). While turning off PBC for
GMX4.0.4 leaves the results unchanged, doing the same for GMX4.6.3 changes
the results closer to the 4.0.4 results. Different results from changing
PBC in different versions could possibly mean different boxes coming from
possible different processing. Since I'm providing to g_sas a .pdb file
only with ATOM records, one could assume g_sas is taking the box from the
.tpr file provided in -s. The .tpr file was created using pdb2gmx+grompp. I
checked the .gro files boxes and they are all the same (for the same
protein, of course). Maybe g_sas is guessing a box on its own (and
differently for different versions)? Thinking the -pbc flag may be behaving
weirdly, I have also made a batch equal to Attached Table 2 but running a
editconf -d 5 in between pdb2gmx and grompp (your suggestion), and the
results are all the same as with the smaller box, confirming the
-nopbc/-pbc differences as really weird... Any ideas?
- changing from single to double precison only causes changes that are,
at maximum, of a few square angstroms, but most commonly under 1 square
angstrom. This does not seem to be the answer either...
Further input on this issue is deeply appreciated!
Best,
João
P.S: g_sas issues a warning that comes from atomprop.c that starts with
"Masses and atomic (Van der Waals) radii will be guessed based on residue
and atom names...". From what I read from the code, this means they will be
fetched from vdwradii.dat and atommasses.dat instead of the topology. Maybe
those files should mentioned in that warning? Just a minor remark.
--
Attached Table 2
met - method used, anl=analytical method, num=numerical method (DCLM)
dots - number of points used for the numerical method
src - source of the data, paper=Eisenhaber1995, ASC=ASC software,
g{404,463}{sp,dp}=g_sas of GMX4.{0.4,6.3} {single,double} precision
pbc - PBC flag for g_sas, yes=-pbc, no=-nopbc
Areas are in square angstrom
======================================================
met dots src pbc 4PTI 3FXN 1TIM
======================================================
anl paper 3973.80 6943.80 20002.80
anl ASC 3973.81 6943.80 20002.80
num 122 paper 3961.40 6968.30 19970.90
num 122 ASC 3961.44 6968.33 19970.90
num 122 g404sp yes 4165.31 7727.73 24110.10
num 122 g404sp no 4165.31 7727.73 24110.10
num 122 g404dp yes 4165.57 7722.80 24110.30
num 122 g404dp no 4165.57 7722.80 24110.30
num 122 g463sp yes 3825.86 6257.83 19867.90
num 122 g463sp no 4169.21 7731.63 24080.20
num 122 g463dp yes 3826.12 6258.51 19866.90
num 122 g463dp no 4169.47 7726.70 24080.30
num 362 paper 3971.80 6933.40 19997.10
num 362 ASC 3971.79 6933.37 19997.10
num 362 g404sp yes 4192.12 7704.33 24161.50
num 362 g404sp no 4192.12 7704.33 24161.50
num 362 g404dp yes 4191.91 7703.31 24160.20
num 362 g404dp no 4191.91 7703.31 24160.20
num 362 g463sp yes 3838.35 6248.43 19886.80
num 362 g463sp no 4195.02 7704.70 24132.10
num 362 g463dp yes 3838.24 6249.87 19884.70
num 362 g463dp no 4194.81 7703.68 24130.80
num 642 paper 3967.90 6944.40 19998.70
num 642 ASC 3967.78 6944.37 19998.70
num 642 g404sp yes 4187.54 7703.00 24148.90
num 642 g404sp no 4187.54 7703.00 24148.90
num 642 g404dp yes 4187.71 7698.46 24144.60
num 642 g404dp no 4187.71 7698.46 24144.60
num 642 g463sp yes 3831.64 6255.64 19882.70
num 642 g463sp no 4188.63 7701.99 24119.40
num 642 g463dp yes 3831.14 6254.56 19879.30
num 642 g463dp no 4188.86 7697.44 24115.10
num 1002 paper 3974.10 6939.10 20012.20
num 1002 ASC 3974.07 6939.12 20012.20
num 1002 g404sp yes 4191.68 7702.32 24173.20
num 1002 g404sp no 4191.68 7702.32 24173.20
num 1002 g404dp yes 4191.89 7697.87 24171.60
num 1002 g404dp no 4191.89 7697.87 24171.60
num 1002 g463sp yes 3840.58 6250.14 19897.40
num 1002 g463sp no 4193.60 7700.64 24143.10
num 1002 g463dp yes 3841.84 6249.88 19894.70
num 1002 g463dp no 4194.01 7696.19 24141.40
num 1472 paper 3975.70 6943.30 19997.00
num 1472 ASC 3975.70 6943.35 19996.90
num 1472 g404sp yes 4197.65 7710.93 24153.70
num 1472 g404sp no 4197.65 7710.93 24153.70
num 1472 g404dp yes 4197.79 7709.92 24153.90
num 1472 g404dp no 4197.79 7709.92 24153.90
num 1472 g463sp yes 3841.69 6256.92 19882.80
num 1472 g463sp no 4198.08 7709.30 24123.30
num 1472 g463dp yes 3842.52 6259.11 19882.70
num 1472 g463dp no 4198.54 7708.29 24123.50
======================================================
On Sat, Feb 22, 2014 at 9:13 PM, David van der Spoel
<spoel at xray.bmc.uu.se>wrote:
> On 2014-02-22 18:34, João M. Damas wrote:
>
>> Dear all,
>>
>> I have been having some puzzling results when using g_sas to analyze some
>> trajectories, so I have decided to go back to basics.
>>
>> As described in the documentation, g_sas implements the double cubic
>> lattice method (DCLM) originally presented in Eisenhaber1995 (doi:
>> 10.1002/jcc.540160303 <http://dx.doi.org/10.1002/jcc.540160303>). Table I
>>
>> of that article shows a test case of three proteins (PDB code: 4PTI, 3FXN
>> and 1TIM), where the DCLM (numerical method) at different point densities
>> is compared against an analytic method (also by Eisenhaber, citation 30).
>>
>> My question was: can g_sas reproduce the areas reported in Table I of
>> Eisenhaber1995?
>>
>> First of all, I went on to reproduce the reported areas with the software
>> originally developed by Eisenhaber, which has both the analytical and
>> numerical (DCLM) methods implemented - ASC software. As it can be seen in
>> Attached Table (see below), running ASC provided results very similar (to
>> the tenths of square angstrom), if not equal, to the reported ones.
>> Besides
>> assessing the reported areas, I now have a software which allows me to
>> benchmark other cases.
>>
>> The next step was answering my question. For that, I had to make sure
>> g_sas
>> was performing the same calculation as ASC software:
>>
>> - Both ASC and g_sas took the same input structures (only using the
>> ATOM
>>
>> records). In the case of g_sas, I had to build a .tpr file for the -s
>> flag
>> of g_sas, which was done using pdb2gmx+grompp. Like with the ASC, the
>> g_sas
>> calculation was done using only the heavy atoms (using the Protein-H
>> group
>> for calculation and output groups).
>> - The radii used also had to be the same for both ASC and g_sas. g_sas
>>
>> is still using the vdwradii.dat, so I created a new vdwradii.dat file
>> with
>> the Eisenberg&McLachlan radii that are used in the ASC calculations
>> (both
>> mine and the reported ones). The radii were correctly attributed to
>> the
>> atoms, as checked in the g_sas.log file outputted by the debug mode.
>> - In both cases the probe radius was 1.4 angstroms (0.14 nm as per
>> g_sas
>>
>> input).
>>
>> The area calculations were then performed using g_sas (number of points
>> used were provided using the -ndots flag). Originally, I used GROMACS
>> 4.0.4, but afterwards I also performed the calculations using GROMACS
>> 4.6.3. Results are presented in Attached Table (see below).
>>
>> - GROMACS 4.0.4 g_sas DCLM overestimates the area by about 5, 10 and
>> 20%
>>
>> for 4PTI, 3FXN and 1TIM, respectively, in relation to ASC DCLM. This
>> is
>> also the same overestimation of the areas obtained through an
>> analytical
>> method, since ASC DCLM gives a good estimation of the analytical ones,
>> unlike g_sas.
>> - On the other hand, GROMACS 4.6.3 g_sas DCLM underestimates the area
>> by
>>
>> about 3, 10 and 0.5% for 4PTI, 3FXN and 1TIM, respectively, in
>> relation to
>> ASC DCLM. I have checked that this different behavior appeared from
>> the
>> 4.0.X to the 4.5.X versions. This is odd since I have searched the
>> release
>> notes for g_sas and I haven't found any change in this tool... Maybe
>> this
>> is a clue to what is going on?
>> - From my understanding, DCLM implemented on ASC or on g_sas shouldn't
>>
>> give different results. Moreover, even if there could be some
>> differences
>> when using a low number of points (due some factor of randomness I
>> may not
>> be aware), this difference should have already vanished when using
>> more
>> than 1000 points per atom. Take note that increasing the number of
>> points
>> in g_sas does not give areas nearer the analytical results (hence,
>> more
>> dots may not mean more accuracy, maybe this is the reason 24 is the
>> default
>> number...). Can anyone point me out on some mistake I may be doing?
>> Maybe
>> there is a factor that I may be disregarding...
>>
>> I have some data which point out that these differences (errors?) may lead
>> to very large errors when calculating properties like buried/contact
>> areas,
>> specially if the buried/contact areas have different magnitudes from the
>> molecule's areas. But this may be the theme of a another/follow-up e-mail
>> when I get ASC working on my trajectories.
>>
>> Best regards,
>> João M. Damas
>>
>> --
>> Attached Table
>> met - method used, anl=analytical method, num=numerical method (DCLM)
>> dots - number of points used for the numerical method
>> src - source of the data, paper=Eisenhaber1995, ASC=ASC software,
>> gmx404=g_sas of GMX4.0.4,
>> gmx463=g_sas of GMX4.6.3
>> Areas are in square angstroms
>> ==================================================
>> met dots src 4PTI 3FXN 1TIM
>> ==================================================
>> anl paper 3973.80 6943.80 20002.80
>> anl ASC 3973.81 6943.80 20002.80
>>
>> num 122 paper 3961.40 6968.30 19970.90
>> num 122 ASC 3961.44 6968.33 19970.90
>> num 122 gmx404 4165.31 7727.73 24110.10
>> num 122 gmx463 3825.86 6257.83 19867.90
>>
>> num 362 paper 3971.80 6933.40 19997.10
>> num 362 ASC 3971.79 6933.37 19997.10
>> num 362 gmx404 4192.12 7704.33 24161.50
>> num 362 gmx463 3838.35 6248.43 19886.80
>>
>> num 642 paper 3967.90 6944.40 19998.70
>> num 642 ASC 3967.78 6944.37 19998.70
>> num 642 gmx404 4187.54 7703.00 24148.90
>> num 642 gmx463 3831.64 6255.64 19882.70
>>
>> num 1002 paper 3974.10 6939.10 20012.20
>> num 1002 ASC 3974.07 6939.12 20012.20
>> num 1002 gmx404 4191.68 7702.32 24173.20
>> num 1002 gmx463 3840.58 6250.14 19897.40
>>
>> num 1472 paper 3975.70 6943.30 19997.00
>> num 1472 ASC 3975.70 6943.35 19996.90
>> num 1472 gmx404 4197.65 7710.93 24153.70
>> num 1472 gmx463 3841.69 6256.92 19882.80
>> ==================================================
>>
>> Thanks for a thorough analysis. Some more points to test:
> - compile gromacs in double precision and rerun
> - verify there are no periodic boundary effects by making the box larger.
>
>
> --
> David van der Spoel, Ph.D., Professor of Biology
> Dept. of Cell & Molec. Biol., Uppsala University.
> Box 596, 75124 Uppsala, Sweden. Phone: +46184714205.
> spoel at xray.bmc.uu.se http://folding.bmc.uu.se
> --
> Gromacs Users mailing list
>
> * Please search the archive at http://www.gromacs.org/
> Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-request at gromacs.org.
>
--
João M. Damas
PhD Student
Protein Modelling Group
ITQB-UNL, Oeiras, Portugal
Tel:+351-214469613
More information about the gromacs.org_gmx-users
mailing list