[gmx-users] pdb2gmx: how to preserve original ions?

Mon Jul 23 18:19:26 CEST 2018

Thanks Justin and Mark!

Following up by replying to a digest message (and switched to html formatting) so hope this looks nice...

I decided to take Mark’s advice and convert to known restypes. Seems easy enough for ions since I haven’t started looking into what’s necessary for parameterization yet. I’ve pasted the first pass of my code below (python3) in case anybody finds it interesting. My project is to write a “workflow” tool which can automate as much cleanup/input generation as possible for an arbitrary pdb (but I’m just in the beginning stages of writing the gromacs input prep). It’s not very well tested (and certainly not validated scientifically), but at least you could review my approach. Of course I’d be happy for any of it to be integrated directly into pdb2gmx, too. ;-)

One question: why is ACE included in aminoacids.rtp for charm27 but NME is not? Well anyway, I’ll add logic later to do something more clever based on the forcefield.

Cheers,
Amos.

'''Use pdb2gmx to process the pdb. Mostly, we're using it to rename
residues, but also to validate the pdb file itself. However, pdb2gmx
requires some workarounds.

1) Its default values for some options aren't picked based on
context. For example, ACE is incompatible with NH3+ because it doesn't
have an N atom.

2) Many of its options, e.g., termini handling, can only be selected
interactively. I've used pexpect to solve this based on a regex on the
output. This seems fragile, but not sure what else I can do.

3) pdb2gmx seems unaware of which HETATMs (using "standard" restypes)
the forcefield already knows about and it will require them to be
removed from the input pdb. The next best approach would be for us to
parse their files. The quick solution here is to punt with hard coded
lists of what we think is known. Not carefully tested.

4) pdb2gmx doesn't recognize that the ACE residue is used for
termini. Therefore we inject TER records right before them.

'''
def makeGromacs(infile):
   print("\n\n# Making Gromacs files...\n\n")
   dirname = "gromacs"
   shutil.rmtree(dirname,ignore_errors=True)
   os.mkdir(dirname)

   unknown = os.path.realpath(dirname+"/UNKNOWN.pdb")
   known = os.path.realpath(dirname+"/KNOWN.pdb")
   knownIons = {" CL":"CLA",
                " NA":"SOD",
                "  K":"POT",
                " MG":" MG",
                " CA":"CAL",
                " ZN":"ZN2",
                " CS":"CES"}
   knownHetatms = {"HOH":"HOH",
                   "ACE":"ACE"}
   unknownSeen = set()
   unknownCounts = defaultdict(int)
   prevRestype = None
   with open(unknown,"w") as het:
      with open(known,"w") as atm:
         for line in open(infile,'r'):
            if not line.startswith("HETATM") and not line.startswith("ATOM"):
               prevRestype = None
               continue
            restype = line[17:20]
            if restype == "ACE" and prevRestype != "ACE" and prevRestype is not None:
               atm.write("TER\n")
            if not line.startswith("HETATM") or restype in knownIons or restype in knownHetatms:
               if restype in knownIons:
                  newName = knownIons[restype]
                  line = line[:12] + "%4s%1s%3s"%(newName,line[16],newName) + line[20:]
               atm.write(line)
            else:
               key = line[17:26] #restype and residx
               if key not in unknownSeen:
                  unknownCounts[restype] += 1
                  unknownSeen.add(key)
               het.write(line)
            prevRestype = restype
   print("\n\nSnipping unknown restypes:",dict(unknownCounts),"\n\n")
   origDir = os.getcwd()
   os.chdir(dirname)

   cmd = "gmx_d pdb2gmx -ter -o gromacs.pdb -p gromacs.top -ignh -water tip3p -ff charmm27 -f "+known
   print("# Using pexpect to handle interaction with:",cmd)
   p = pexpect.spawn(cmd,encoding="UTF-8")

   while p.isalive():
      try:
         p.expect("Select (start|end) terminus type for ([\w\-\d]*\r\n)(.+None\s*\r\n)")
         print(p.before)
         print(p.after)

         resname = p.match.group(2)
         print("# pdbCompiler parsed resname:",resname)

         options = { m.group(2):m.group(1) for m in re.finditer('\s+(\d+)\:\s+([\w\d\+\-]+)\r\n', p.match.groups()[-1])}
         print("# pdbCompiler parsed termini options:",options)

         selectedOption = "0" #the default option
         if resname.startswith("ACE") or resname.startswith("NME"):
            selectedOption = options["None"]

         found = False
         for k,v in options.items():
            if v == selectedOption:
               found = True
               print("# Selecting terminal option %s:%s for residue %s"%(v,k,resname))
         if not found: print("Unknown option but trying anyway:",selectedOption)
         p.sendline(selectedOption)

      except pexpect.EOF:
         print(p.before)
      except pexpect.TIMEOUT:
         print("TIMEOUT")
         print(p.before)

   os.chdir(origDir)
   runCmdsAndZipResults([],dirname)
   print("\n\n# Done making Gromacs files...\n\n")

Message: 2
Date: Tue, 17 Jul 2018 21:26:11 -0400
From: Justin Lemkul <jalemkul at vt.edu<mailto:jalemkul at vt.edu>>
To: gmx-users at gromacs.org<mailto:gmx-users at gromacs.org>
Subject: Re: [gmx-users] pdb2gmx: how to preserve original ions?
Message-ID: <dba7c125-0f3b-0a16-0d85-f0dd054b40a2 at vt.edu<mailto:dba7c125-0f3b-0a16-0d85-f0dd054b40a2 at vt.edu>>
Content-Type: text/plain; charset=windows-1252; format=flowed

On 7/17/18 9:22 PM, Anderson, Amos wrote:
Hi Gromacs users,

I?ve never used gromacs before, so sorry if this question has an obvious answer somewhere ? it seems like the sort I should have been able to find an answer for?

I want to write a python script to prepare an arbitrary pdb for use with gromacs (e.g., does these steps http://www.mdtutorials.com/gmx/complex/01_pdb2gmx.html for the user). The input pdb to my script may have waters, ions, and ligand(s) in it already. In the tutorial it says "you are going to want to strip out the crystal waters, PO4, and BME. Note that such a procedure is not universally appropriate (i.e., the case of a bound active site water molecule).? The wording maybe implies that other than active site waters and the ligand of interest, I?d never want to preserve HETATOMs in my input file?

If not, and if I do want to keep them, how do I? Do I just treat all of them like ligands in the tutorial, e.g., maybe a protocol like this (more precise for ions I care about)?

       *   grep HETATOM my.pdb > hetatoms.pdb

HETATM, but yes.

       *   gmx editconf -f hetatoms.pdb -o hetatoms.gro
       *   copy/paste hetatoms.gro into my main .gro file (is there python/utility anywhere for merging .gro files I could leverage?)

Conversion to .gro format is unnecessary; GROMACS handles PDB and other
formats, if you prefer.

       *   for ions gromacs doesn?t already know about (ions.itp), follow the instructions described here: http://www.mdtutorials.com/gmx/complex/02_topology.html

Parametrization servers won't generally deal with monoatomic species. In
a pairwise additive force field, the charge is already fixed and LJ have
to be tuned, typically based on free energy of solvation.

(in other words, I?m puzzled why pdb2gmx is complaining about ions it should already know about from the built-in ions.itp)

pdb2gmx doesn't know anything about ions.itp, that comes in later when
calling grompp. pdb2gmx only knows about what is defined in the force
field .rtp file(s).

-Justin

--
==================================================

Justin A. Lemkul, Ph.D.
Assistant Professor
Virginia Tech Department of Biochemistry

303 Engel Hall
340 West Campus Dr.
Blacksburg, VA 24061

jalemkul at vt.edu<mailto:jalemkul at vt.edu> | (540) 231-3129
http://www.thelemkullab.com

==================================================

------------------------------

Message: 4
Date: Wed, 18 Jul 2018 08:31:00 +0200
From: Mark Abraham <mark.j.abraham at gmail.com<mailto:mark.j.abraham at gmail.com>>
To: gmx-users at gromacs.org<mailto:gmx-users at gromacs.org>
Cc: "gromacs.org_gmx-users at maillist.sys.kth.se<mailto:gromacs.org_gmx-users at maillist.sys.kth.se>"
<gromacs.org_gmx-users at maillist.sys.kth.se<mailto:gromacs.org_gmx-users at maillist.sys.kth.se>>
Subject: Re: [gmx-users] pdb2gmx: how to preserve original ions?
Message-ID:
<CAMNuMASh_0j2oWuujDv69+daXHqp7EcDsa01G_Vscqv2ZTfyFw at mail.gmail.com<mailto:CAMNuMASh_0j2oWuujDv69+daXHqp7EcDsa01G_Vscqv2ZTfyFw at mail.gmail.com>>
Content-Type: text/plain; charset="UTF-8"

Hi,

The name of the residue in that force fields aminoacids.rtp is CLA, which
is the only thing pdb2gmx can understand. Otherwise your procedure should
just work if you rename your ion residues appropriately. Do let us know how
you go!

Mark

On Wed, Jul 18, 2018, 03:23 Anderson, Amos <Amos.Anderson at pfizer.com<mailto:Amos.Anderson at pfizer.com>> wrote:

Hi Gromacs users,

I?ve never used gromacs before, so sorry if this question has an obvious
answer somewhere ? it seems like the sort I should have been able to find
an answer for?

I want to write a python script to prepare an arbitrary pdb for use with
gromacs (e.g., does these steps
http://www.mdtutorials.com/gmx/complex/01_pdb2gmx.html for the user). The
input pdb to my script may have waters, ions, and ligand(s) in it already.
In the tutorial it says "you are going to want to strip out the crystal
waters, PO4, and BME. Note that such a procedure is not universally
appropriate (i.e., the case of a bound active site water molecule).? The
wording maybe implies that other than active site waters and the ligand of
interest, I?d never want to preserve HETATOMs in my input file?

If not, and if I do want to keep them, how do I? Do I just treat all of
them like ligands in the tutorial, e.g., maybe a protocol like this (more
precise for ions I care about)?

      *   grep HETATOM my.pdb > hetatoms.pdb
      *   gmx editconf -f hetatoms.pdb -o hetatoms.gro
      *   copy/paste hetatoms.gro into my main .gro file (is there
python/utility anywhere for merging .gro files I could leverage?)
      *   for ions gromacs doesn?t already know about (ions.itp), follow
the instructions described here:
http://www.mdtutorials.com/gmx/complex/02_topology.html

(in other words, I?m puzzled why pdb2gmx is complaining about ions it
should already know about from the built-in ions.itp)

Thanks for any advice,
Amos.

To help future googlers, I run pdb2gmx on a file that has CL in it like
this:

gmx_d pdb2gmx -v -o output.gro -p output.top -ignh -water tip3p -ff
charmm27 -f my.pdb

  I get this error:

Program:     gmx pdb2gmx, version 2016.3
Source file: src/gromacs/gmxpreprocess/resall.cpp (line 649)

Fatal error:
Residue 'CL' not found in residue topology database

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors

which surprises me because CL is defined in ions.itp. It says it?s finding
other files in the same directory as ions.itp, so it doesn?t seem like some
kind of misconfigured software install.
--
Gromacs Users mailing list

* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
send a mail to gmx-users-request at gromacs.org<mailto:gmx-users-request at gromacs.org>.