[gmx-users] help with chromophore of a GFP

Thu Mar 21 21:52:08 CET 2013

On 3/21/13 4:43 PM, Mark Abraham wrote:
> On Thu, Mar 21, 2013 at 4:30 PM, Anna MARABOTTI <amarabotti at unisa.it> wrote:
>
>>
>>
>> Dear Mark,
>>
>> Thank you for your message. I'm happy to be on the
>> right track; unfortunately, the end point still seems very far away...
>>
>>
>> I tried to make sure that the CFY hydrogens and the protein hydrogens all
>> match the aminoacids.rtp entries, in order to avoid dealing with
>> aminoacids.hdb. This is what I did:
>>
>> - starting from the pdb file of the protein, I removed the CFY entry
>> (prot_noCFY.pdb)
>>
>> - I used pdb2gmx to add H to the protein only:
>>   pdb2gmx -f prot_noCFY.pdb -o prot_noCFY_H.pdb -p topol.top
>>
>> - I inserted CFY_H.pdb (obtained in a previous step, in which I added H
>> with PyMOL to the whole protein, including CFY) into prot_noCFY_H.pdb,
>> obtaining prot_CFY_H.pdb.
>>
>> In this way, the H atoms bound to "regular" residues were added using
>> Amber99SB, so they are compatible with this force field, and the CFY atoms
>> (previously added with PyMOL) follow the same naming convention as in
>> aminoacids.rtp (which I edited using the atom types, charges, etc.
>> calculated with Antechamber on this molecule coming from PyMOL).
>> Obviously, the atom numbering is not sequential: the last atom of V63
>> (the last "regular" residue before CFY) is numbered 938, the first atom
>> of H68 (the first "regular" residue after CFY) is numbered 939, and the
>> atoms of CFY66 are numbered from 1 to 70. Moreover, since the order of
>> the atoms in aminoacids.rtp is not the same as in the CFY coordinates (I
>> ordered the atoms following the format of the other residues in
>> aminoacids.rtp), the CFY numbering in prot_CFY_H.pdb is not in order
>> (1-2-3-...-69-70) but scrambled (19-54-20-55...49-50-24-25).
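The splice step described above can be sketched in Python. This is a minimal sketch, assuming fixed-column PDB ATOM/HETATM records, a single chain, and that the PyMOL-protonated chromophore sits in its own file; the function names (`splice_after_residue`, `renumber_serials`) are hypothetical, and CONECT records, if present, are not updated. Mark's reply suggests the out-of-order serials are harmless for pdb2gmx, so the renumbering step is optional tidying.

```python
# Sketch: splice CFY atom records into a protein PDB and renumber atom serials.
# Assumes fixed-column PDB ATOM/HETATM records and a single chain.
# Function names are hypothetical, not part of GROMACS or PyMOL.


def is_atom(line: str) -> bool:
    """True for ATOM/HETATM coordinate records."""
    return line.startswith(("ATOM  ", "HETATM"))


def res_seq(line: str) -> int:
    """Residue sequence number from PDB columns 23-26."""
    return int(line[22:26])


def splice_after_residue(protein: list[str], fragment: list[str], after: int) -> list[str]:
    """Insert the atom records of `fragment` right after the last atom of residue `after`."""
    hits = [i for i, line in enumerate(protein) if is_atom(line) and res_seq(line) == after]
    if not hits:
        raise ValueError(f"residue {after} not found in protein records")
    last = hits[-1]
    atoms = [line for line in fragment if is_atom(line)]  # skip TER/END/CONECT of the fragment
    return protein[: last + 1] + atoms + protein[last + 1 :]


def renumber_serials(lines: list[str]) -> list[str]:
    """Rewrite atom serials (columns 7-11) as 1..N; all other columns are left as they were."""
    out, n = [], 0
    for line in lines:
        if is_atom(line):
            n += 1
            line = line[:6] + f"{n:5d}" + line[11:]
        out.append(line)
    return out
```

For example, `renumber_serials(splice_after_residue(prot, cfy, 63))`, where `prot` and `cfy` are the line lists of prot_noCFY_H.pdb and CFY_H.pdb (read with `open(...).read().splitlines()`).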
>>
>
> Seems fine. pdb2gmx is mostly about atom/residue naming. grompp is mostly
> about atom/residue/moleculetype ordering.
>
>> - At this stage, I used pdb2gmx again to create the topol.top file for
>> the complete structure:
>>
>>
>> pdb2gmx -f prot_CFY_H.pdb -o prot_complete.gro -p topol.top
>>
>>
>> (selecting the amber99sb force field and TIP3P water, the recommended
>> option)
>>
>> This is the error message from pdb2gmx:
>>
>> Read 'FLUORESCENT PROTEIN', 3346 atoms
>> Analyzing pdb file
>> Splitting PDB chains based on TER records or changing chain id.
>> There are 1 chains and 0 blocks of water and 218 residues with 3346 atoms
>>
>>   chain  #res #atoms
>>   1 'A'   213   3346
>>
>
> I'd be concerned about the difference in residue count here, but 4.5.4 is
> so old I've no idea whose fault this is.
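One way to look into the residue count is to list the residue numbers actually present. By simple arithmetic, SER3 to SER218 spans 216 numbers, and CFY66 stands in for residues 64-67; if the numbering is otherwise contiguous, that leaves 216 - 4 + 1 = 213 residues, which matches the chain line. Whether the 218 in the summary line is just the highest residue number is a guess that this sketch cannot confirm. It assumes fixed-column PDB records and treats a change of (chain, residue number, insertion code) between consecutive records as a new residue; the function names are hypothetical.

```python
# Sketch: list residues in file order and report gaps in the residue numbering.
# Assumes fixed-column PDB records; a new residue starts whenever (chain, resSeq, iCode)
# changes between consecutive ATOM/HETATM records. Function names are hypothetical.

ResidueId = tuple[str, int, str]


def residue_ids(lines: list[str]) -> list[ResidueId]:
    """(chain, residue number, insertion code) for each residue, in file order."""
    ids: list[ResidueId] = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            rid = (line[21], int(line[22:26]), line[26:27])
            if not ids or ids[-1] != rid:
                ids.append(rid)
    return ids


def numbering_gaps(ids: list[ResidueId]) -> list[list[int]]:
    """Residue numbers skipped between consecutive residues of the same chain."""
    gaps = []
    for (chain1, num1, _), (chain2, num2, _) in zip(ids, ids[1:]):
        if chain1 == chain2 and num2 > num1 + 1:
            gaps.append(list(range(num1 + 1, num2)))
    return gaps
```

For the merged file, `len(residue_ids(lines))` can be compared with the 213 that pdb2gmx reports for chain A, and `numbering_gaps(...)` would be expected to show `[64, 65]` and `[67]` around CFY66.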
>
>
>> All occupancies are one
>> Opening force field file ./amber99sb.ff/atomtypes.atp
>> Atomtype 1
>> Reading residue database... (amber99sb)
>> Opening force field file ./amber99sb.ff/aminoacids.rtp
>> Residue 94
>> Sorting it all out...
>> Opening force field file ./amber99sb.ff/dna.rtp
>> Residue 110
>> Sorting it all out...
>> Opening force field file ./amber99sb.ff/rna.rtp
>> Residue 126
>> Sorting it all out...
>> Opening force field file ./amber99sb.ff/aminoacids.hdb
>> Opening force field file ./amber99sb.ff/dna.hdb
>> Opening force field file ./amber99sb.ff/rna.hdb
>> Opening force field file ./amber99sb.ff/aminoacids.n.tdb
>> Opening force field file ./amber99sb.ff/aminoacids.c.tdb
>>
>> Processing chain 1 'A' (3346 atoms, 213 residues)
>> There are 327 donors and 319 acceptors
>> There are 539 hydrogen bonds
>> Will use HISE for residue 22
>> Will use HISD for residue 38
>> Will use HISE for residue 62
>> Will use HISE for residue 68
>> Will use HISD for residue 109
>> Will use HISE for residue 119
>> Will use HISE for residue 172
>> Will use HISH for residue 193
>> Will use HISH for residue 197
>> Will use HISE for residue 217
>> Identified residue SER3 as a starting terminus.
>> Identified residue SER218 as a ending terminus.
>> 8 out of 8 lines of specbond.dat converted successfully
>> Special Atom Distance matrix:
>>                      MET9   MET11   MET15   HIS22   HIS38   MET41   MET47
>>                     SD110   SD149   SD232  NE2317  NE2549   SD596   SD700
>>   MET11  SD149      0.807
>>   MET15  SD232      2.279   1.627
>>   HIS22  NE2317     3.707   2.983   1.466
>>   HIS38  NE2549     1.401   0.928   2.127   3.254
>>   MET41  SD596      1.458   0.665   1.144   2.384   1.001
>>   MET47  SD700      3.059   2.324   0.995   0.801   2.656   1.761
>>   MET53  SD777      2.786   1.999   0.990   1.171   2.160   1.373   0.603
>>   HIS62  NE2917     2.340   1.733   0.833   1.797   1.988   1.236   1.583
>>   HIS68  NE21002    0.884   0.597   1.466   2.916   1.356   0.885   2.347
>>   HIS109 NE21638    2.061   1.886   1.380   2.614   2.661   1.862   2.279
>>   HIS119 NE21803    1.459   0.967   0.923   2.372   1.617   0.812   1.870
>>   MET135 SD2041     3.480   2.751   1.316   0.606   2.919   2.121   0.993
>>   MET162 SD2439     2.521   1.976   1.656   2.412   1.855   1.543   2.264
>>   HIS172 NE22588    3.632   2.949   1.894   1.657   2.872   2.338   1.945
>>   CYS174 SG2623     2.968   2.372   1.452   1.861   2.428   1.848   1.924
>>   MET189 SD2891     2.167   2.379   2.736   4.000   2.754   2.569   3.722
>>   HIS193 NE22942    2.003   2.001   2.490   3.686   2.049   2.075   3.396
>>   HIS197 NE23011    2.012   1.634   1.830   2.896   1.554   1.426   2.614
>>   HIS217 NE23329    2.545   2.376   2.831   3.805   2.039   2.305   3.575
>>                     MET53   HIS62   HIS68  HIS109  HIS119  MET135  MET162
>>                     SD777  NE2917 NE21002 NE21638 NE21803  SD2041  SD2439
>>   HIS62  NE2917     1.363
>>   HIS68  NE21002    2.107   1.482
>>   HIS109 NE21638    2.365   1.568   1.372
>>   HIS119 NE21803    1.688   0.976   0.584   1.078
>>   MET135 SD2041     1.057   1.365   2.661   2.490   2.119
>>   MET162 SD2439     1.878   0.871   1.805   2.246   1.520   1.861
>>   HIS172 NE22588    1.721   1.401   2.829   2.860   2.359   1.067   1.342
>>   CYS174 SG2623     1.694   0.725   2.140   2.152   1.681   1.297   0.745
>>   MET189 SD2891     3.547   2.310   1.858   1.893   1.980   3.627   2.290
>>   HIS193 NE22942    3.076   1.890   1.639   2.197   1.760   3.221   1.547
>>   HIS197 NE23011    2.229   1.149   1.407   2.078   1.323   2.401   0.676
>>   HIS217 NE23329    3.146   2.112   2.205   2.935   2.272   3.263   1.402
>>                    HIS172  CYS174  MET189  HIS193  HIS197
>>                   NE22588  SG2623  SD2891 NE22942 NE23011
>>   CYS174 SG2623     0.826
>>   MET189 SD2891     3.417   2.599
>>   HIS193 NE22942    2.831   2.079   1.020
>>   HIS197 NE23011    2.011   1.324   1.766   0.939
>>   HIS217 NE23329    2.629   2.068   1.936   0.946   1.003
>> Opening force field file ./amber99sb.ff/aminoacids.arn
>> Opening force field file ./amber99sb.ff/dna.arn
>> Opening force field file ./amber99sb.ff/rna.arn
>> Checking for duplicate atoms....
>> Now there are 3345 atoms. Deleted 1 duplicates.
>>
>
> That also looks suspicious.
>
>
>> Now there are 213 residues with 3345 atoms
>> Making bonds...
>> Warning: Long Bond (988-989 = 0.453624 nm)
>>
>
> That seems like it might be a peptide bond bridging a "gap" where pdb2gmx
> was unable to recognize the intervening content as a peptide residue.
>
>
>>
>> WARNING: atom O1 is missing in residue CFY 66 in the pdb file
>>
>> -------------------------------------------------------
>> Program pdb2gmx_d, VERSION 4.5.4
>> Source code file: pdb2top.c, line: 1463
>>
>> Fatal error:
>> There were 1 missing atoms in molecule Protein_chain_A, if you want to use
>> this incomplete topology anyhow, use the option -missing
>> For more information and tips for troubleshooting, please check the GROMACS
>> website at http://www.gromacs.org/Documentation/Errors
>>
>> The strange thing is that I checked for this error, but atom O1 in residue
>> CFY66 is present BOTH in the starting .pdb file (the one I used for
>> pdb2gmx) AND in the aminoacids.rtp file!!!! I checked 4 or 5 times,
>> each time deleting the old file and checking the new one IMMEDIATELY
>> BEFORE submitting it to pdb2gmx. All atoms present in the aminoacids.rtp
>> entry for CFY are also present in the .pdb file and vice versa, and I am
>> sure I did not make the stupid mistake of naming the atom 01 (zero-one)
>> instead of O1 (o-one).
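Rather than checking by eye, a short script can diff the atom names of CFY 66 in the PDB against the `[ atoms ]` list of the `[ CFY ]` entry in aminoacids.rtp. This is a minimal sketch, assuming the bracketed .rtp layout used in the GROMACS force-field directories (residue entries in brackets, sub-sections such as `[ atoms ]` and `[ bonds ]`, `;` comments) and fixed-column PDB records with the residue name in columns 18-20; the function names are hypothetical. It compares names only, which, per Mark's comment in this thread, is mostly what pdb2gmx cares about.

```python
# Sketch: compare the atom names of one residue in a PDB file with its .rtp entry.
# Assumes the bracketed .rtp layout and fixed-column PDB records (resName in
# columns 18-20). Function names are hypothetical.
from collections import Counter

# Bracketed names that open a sub-section inside a residue entry rather than a new residue.
RTP_SUBSECTIONS = {"atoms", "bonds", "angles", "dihedrals", "impropers", "exclusions", "cmap"}


def rtp_atom_names(rtp_text: str, resname: str) -> list[str]:
    """Atom names listed under [ atoms ] in the rtp entry [ resname ]."""
    names: list[str] = []
    current = section = None
    for raw in rtp_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            tag = line[1:-1].strip()
            if tag in RTP_SUBSECTIONS:
                section = tag
            else:  # a new residue entry, or a global block such as [ bondedtypes ]
                current, section = tag, None
            continue
        if current == resname and section == "atoms":
            names.append(line.split()[0])
    return names


def pdb_atom_names(lines: list[str], resname: str, resseq: int) -> list[str]:
    """Atom names (columns 13-16) of one residue, in file order."""
    return [
        line[12:16].strip()
        for line in lines
        if line.startswith(("ATOM  ", "HETATM"))
        and line[17:20].strip() == resname
        and int(line[22:26]) == resseq
    ]


def compare_names(pdb_names: list[str], rtp_names: list[str]) -> dict[str, list[str]]:
    """Names the rtp expects but the PDB lacks, names the rtp does not know, and repeats."""
    counts = Counter(pdb_names)
    return {
        "missing": [n for n in rtp_names if n not in counts],
        "extra": [n for n in pdb_names if n not in rtp_names],
        "repeated": sorted(n for n, c in counts.items() if c > 1),
    }
```

For example, `compare_names(pdb_atom_names(pdb_lines, "CFY", 66), rtp_atom_names(rtp_text, "CFY"))` should return three empty lists when the file and the entry agree.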
>>
>> I suspect that this is the atom that gets deleted as a duplicate, but
>> I'm not sure, and I don't know how to check. I am sure there are no
>> duplicated atoms in CFY.
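To check the duplicate hypothesis, one can look for atoms that share chain, residue number, insertion code and atom name, and for atoms of the same residue that sit at (almost) the same position, which could happen if a hydrogen was added twice. The exact criterion pdb2gmx 4.5.4 uses when it deletes duplicates is not shown in this thread, so these two tests are assumptions about what is worth looking at; the function names and the 0.001 Å tolerance are hypothetical.

```python
# Sketch: look for duplicated atoms in fixed-column PDB records.
# Both criteria below are assumptions about what might trigger pdb2gmx's
# "Deleted 1 duplicates" message; function names and tolerance are hypothetical.
import math
from collections import defaultdict
from itertools import combinations


def _atoms(lines):
    """Yield (1-based line number, residue key, atom name, xyz in Å) for ATOM/HETATM records."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith(("ATOM  ", "HETATM")):
            key = (line[21], int(line[22:26]), line[26:27])  # chain, resSeq, iCode
            xyz = tuple(float(line[c:c + 8]) for c in (30, 38, 46))
            yield lineno, key, line[12:16].strip(), xyz


def same_name_duplicates(lines: list[str]) -> dict[tuple, list[int]]:
    """Atom names that occur more than once in the same residue, with their line numbers."""
    seen = defaultdict(list)
    for lineno, key, name, _ in _atoms(lines):
        seen[key + (name,)].append(lineno)
    return {k: v for k, v in seen.items() if len(v) > 1}


def coincident_atoms(lines: list[str], tol: float = 1e-3) -> list[tuple]:
    """Pairs of atoms in the same residue closer than `tol` Å, e.g. a hydrogen added twice."""
    by_res = defaultdict(list)
    for _, key, name, xyz in _atoms(lines):
        by_res[key].append((name, xyz))
    pairs = []
    for key, atoms in by_res.items():
        for (n1, p1), (n2, p2) in combinations(atoms, 2):
            if math.dist(p1, p2) < tol:
                pairs.append((key, n1, n2))
    return pairs
```

Running both functions on prot_CFY_H.pdb and getting empty results would at least rule out these two kinds of duplicate.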
>>
>>
>> I feel like this is a "fake" error message (i.e., there is an error in
>> my files, but it is not the one reported in the message: probably a
>> problem occurs around this atom, but not exactly ON this atom). However,
>> I am not able to find any errors.
>>
>
> Hmm that seems weird. Justin's theory sounds plausible, but I haven't seen
> someone stumble on that before. Also plausible is that pdb2gmx thinks your
> CFY is a disconnected part of the chain and needs terminating (which might
> happen with an oxygen named O1?).
>

I stumbled across it when working with the GFP chromophore a while back :)

http://redmine.gromacs.org/issues/567

Still technically an open "bug," though I agree that it's really expected 
behavior, provided one knows how pdb2gmx works, which involves lots of steps, of 
course.

-Justin

> It's possible there's buggy behaviour here that has been fixed in the two
> years since that code was released. There certainly has been an upgrade of
> the "is this really a new chain" machinery. Unless you have a strong
> scientific reason to keep 4.5.4, I'd switch to 4.6.1 (or 4.5.6 if you
> really have to keep 4.5). If Justin's fix doesn't work, and you have
> problems with a more recent version, then we can look closer.
>
>
>> BTW, the "long bond" in the other warning message does not involve
>> residue CFY.
>>
>
> Yeah, but my bet is those atoms are the C-terminus and N-terminus of the
> fragments that should form peptide bonds to CFY.
>
> Mark
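Mark's guess can be tested directly: a long bond at a peptide link should show up as a large distance between the C of one residue and the N of the next residue in the file. This sketch assumes fixed-column PDB records (coordinates in Å, converted to nm to match the pdb2gmx warning) and that the linking atoms are named `N` and `C` unless overridden per residue name through the hypothetical `link_atoms` argument (useful if the CFY entry names them differently). The 0.2 nm cutoff is chosen only for illustration; a peptide C-N bond is about 0.13 nm.

```python
# Sketch: report consecutive residues whose C(i)-N(i+1) distance is too long for a peptide bond.
# Assumes fixed-column PDB records with coordinates in Å; distances are returned in nm
# to match pdb2gmx's warning. The function name and the 0.2 nm cutoff are hypothetical.
import math

DEFAULT_LINK = ("N", "C")  # (atom bonded to the previous residue, atom bonded to the next)


def long_peptide_links(lines: list[str], cutoff_nm: float = 0.2, link_atoms=None) -> list[tuple]:
    """Return (residue i, residue i+1, distance in nm, or None if a linking atom is missing)."""
    link_atoms = link_atoms or {}
    order = []  # residue keys (chain, resSeq, iCode, resName) in file order
    coords = {}  # (residue key, atom name) -> xyz in nm
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        key = (line[21], int(line[22:26]), line[26:27], line[17:20].strip())
        if not order or order[-1] != key:
            order.append(key)
        coords[(key, line[12:16].strip())] = tuple(float(line[c:c + 8]) / 10.0 for c in (30, 38, 46))
    flagged = []
    for a, b in zip(order, order[1:]):
        if a[0] != b[0]:
            continue  # chain change, not a peptide link
        c_name = link_atoms.get(a[3], DEFAULT_LINK)[1]
        n_name = link_atoms.get(b[3], DEFAULT_LINK)[0]
        c, n = coords.get((a, c_name)), coords.get((b, n_name))
        if c is None or n is None:
            flagged.append((a, b, None))
        else:
            d = math.dist(c, n)
            if d > cutoff_nm:
                flagged.append((a, b, d))
    return flagged
```

On prot_CFY_H.pdb, an entry flagged next to CFY66, or a `None` because CFY lacks atoms named `N`/`C`, would point at the residue pdb2gmx fails to link.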
>
>
>> Any help is welcome.
>>
>> Thank you so much.
>>
>> Anna
>>
>> On 21.03.2013 12:00, gmx-users-request at gromacs.org wrote:
>>
>>>> Dear gmx-users, I've been trying to solve this problem for about two
>>>> weeks and I can't, so I'm asking for your help. I want to run some MD
>>>> simulations on a protein of the green fluorescent protein family. As
>>>> you know, this protein has a chromophore (CFY) derived from four
>>>> residues of the protein (F64-C65-Y66-G67) and covalently bound to the
>>>> rest of the protein chain. How can I parametrize this object, since it
>>>> is not recognized by pdb2gmx? I looked at the gmx-users list and the
>>>> suggestion was to create a new entry in the .rtp file of the selected
>>>> force field.
>>>
>>> Indeed, this kind of problem is most easily solved by making a new
>>> "residue" that contains the whole chromophore, such that it links to its
>>> neighbours with normal peptide links.
>>> ------------------------------
>>> Message: 5
>>> Date: Thu, 21 Mar 2013 11:46:12 +0100
>>> From: Mark Abraham <mark.j.abraham at gmail.com [2]>
>>> Subject: Re: [gmx-users] help with chromophore of a GFP
>>> To: Discussion list for GROMACS users <gmx-users at gromacs.org [3]>
>>> Message-ID: <CAMNuMASicyMGiVb_x5sY1YB44th8VKNioQVhzDqq-tAm9TnRqQ at mail.gmail.com [4]>
>>> Content-Type: text/plain; charset=ISO-8859-1
>>>
>>> On Wed, Mar 20, 2013 at 6:01 PM, Anna MARABOTTI <amarabotti at unisa.it [5]> wrote:
>>>
>>>> I decided to use Amber99SB since it seemed the best suited to my
>>>> purpose, and then I started trying to parameterize the chromophore.
>>>> This is what I did:
>>>> * I used PyMOL to add H to my pdb file, since I want to use an all-H
>>>>   force field and since Antechamber (see below) does not work without H
>>>> * I extracted the segment V63-CFY-H68 from my .pdb file. I did this
>>>>   because, when I extracted CFY only, I had problems with the termini
>>>> * Following the Antechamber tutorial, I used Antechamber (using the
>>>>   traditional Amber force field, not GAFF) to calculate charges and to
>>>>   assign atom types to this segment.
>>>> * I used these calculated parameters to add the CFY residue to
>>>>   aminoacids.rtp in the amber99sb.ff directory.
>>>> * I tried to modify aminoacids.hdb as well, but since it seemed too
>>>>   complicated, I decided to keep it unchanged and to give pdb2gmx the
>>>>   protein with H already present
>>>> * No need to add new atom/bond types to ffbonded.itp and
>>>>   ffnonbonded.itp: they all seem to be present. Since CFY is bound to
>>>>   the rest of the protein with ordinary peptide bonds, I did not change
>>>>   specbond.dat either.
>>>> * I added CFY to residuetypes.dat with the specification "Protein"
>>>>
>>>> I thought everything was ready to go, but... When I ran pdb2gmx on my
>>>> protein with H added by PyMOL, I immediately got an error:
>>>>
>>>> Fatal error: Atom H01 in residue SER 3 was not found in rtp entry NSER
>>>> with 13 atoms while sorting atoms. For a hydrogen, this can be a
>>>> different protonation state, or it might have had a different number in
>>>> the PDB file and was rebuilt (it might for instance have been H3, and we
>>>> only expected H1 & H2). Note that hydrogens might have been added to the
>>>> entry for the N-terminus. Remove this hydrogen or choose a different
>>>> protonation state to solve it. Option -ignh will ignore all hydrogens in
>>>> the input. For more information and tips for troubleshooting, please
>>>> check the GROMACS website at http://www.gromacs.org/Documentation/Errors
>>>> [1]
>>>>
>>>> From this error I understand that:
>>>> * the naming of H in PyMOL is different from the naming of H in Amber
>>>>   (read from aminoacids.rtp); to correct this error, I should add -ignh
>>>>   to ignore the H in the input.
>>>
>>> pdb2gmx has to be able to make sense of the atom naming. There are lots of
>>> different conventions for how to name atoms, particularly hydrogen atoms.
>>> pdb2gmx can't possibly encode the logic to convert all of those
>>> conventions. So the path of least resistance can be to ignore hydrogens
>>> and regenerate them according to the generation rules.
>>>
>>> However, you can just rename them in the input file so that pdb2gmx
>>> understands your meaning. The NSER entry in the .rtp file shows you the
>>> names pdb2gmx expects. If you edit the names of those hydrogen atoms
>>> (probably H01, H02, H03) in your input coordinate file accordingly (to H1,
>>> H2, H3), things will be fine. Be sure you don't break the required column
>>> formatting of the coordinate file!
>>>
>>> *
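Mark's renaming suggestion can be scripted so that only the atom-name field changes and the column layout he warns about stays intact. This minimal sketch assumes fixed-column PDB records (atom name in columns 13-16) and uses the H01/H02/H03 to H1/H2/H3 mapping from his message, applied to SER 3, the residue named in the error; the function name is hypothetical. Names shorter than four characters are written starting in column 14, the usual PDB convention; check the result against the NSER entry before relying on it.

```python
# Sketch: rename atoms of one residue in fixed-column PDB records without shifting columns.
# The function name is hypothetical; the mapping in the usage note comes from the thread.


def rename_atoms(lines: list[str], mapping: dict[str, str], resseq: int) -> list[str]:
    """Rename atoms of residue `resseq` according to `mapping` (old name -> new name),
    rewriting only columns 13-16 so every other column stays where it was."""
    out = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")) and int(line[22:26]) == resseq:
            old = line[12:16].strip()
            if old in mapping:
                new = mapping[old]
                if len(new) > 4:
                    raise ValueError(f"atom name {new!r} does not fit in columns 13-16")
                # Conventional alignment: names shorter than 4 characters start in column 14.
                field = new if len(new) == 4 else f" {new:<3s}"
                line = line[:12] + field + line[16:]
        out.append(line)
    return out
```

For example, `rename_atoms(lines, {"H01": "H1", "H02": "H2", "H03": "H3"}, 3)` touches only the N-terminal SER 3 hydrogens.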
>>
>>
>>
>> Links:
>> ------
>> [1] http://www.gromacs.org/Documentation/Errors
>> [2] mailto:mark.j.abraham at gmail.com
>> [3] mailto:gmx-users at gromacs.org
>> [4] mailto:CAMNuMASicyMGiVb_x5sY1YB44th8VKNioQVhzDqq-tAm9TnRqQ at mail.gmail.com
>> [5] mailto:amarabotti at unisa.it
>> --
>> gmx-users mailing list    gmx-users at gromacs.org
>> http://lists.gromacs.org/mailman/listinfo/gmx-users
>> * Please search the archive at
>> http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
>> * Please don't post (un)subscribe requests to the list. Use the
>> www interface or send it to gmx-users-request at gromacs.org.
>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>

-- 
========================================

Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin

========================================