[gmx-users] Trjconv PDB files define solvent as "ATOM"?

Tue Mar 20 07:43:59 CET 2012

On 20/03/2012 5:10 AM, John Ladasky wrote:
> I am trying to import PDB file snapshots from a GROMACS 
> 4.5.4-generated trajectory into other software tools -- specifically, 
> Biopython.  I generate the snapshots using trjconv in GROMACS.
>
> I am interested in the water molecules from my solvent box, so I do 
> not discard them.  When trjconv prompts me to "Select group for 
> output", I select "Group 0 (System)".  However, in downstream 
> applications, I do want to differentiate the solvent atoms from my 
> protein polymer, and ensure that each group of atoms (protein atoms, 
> solvent atoms) is placed in a distinct category.
>
> Biopython's PDB file parser is not cooperating with me.  It is 
> attempting to append the water molecules as additional RESIDUES of my 
> polymer.  Obviously, this is incorrect.  So, where's the problem, 
> Biopython or GROMACS?  Looking through the PDB file specification, 
> version 3.2, I found the following passage:
>
> "The ATOM records present the atomic coordinates for standard amino 
> acids and nucleotides. They also present the occupancy and temperature 
> factor for each atom. Non-polymer chemical coordinates use the HETATM 
> record type."
>
> If I am reading this correctly, my solvent atoms should be tagged as 
> "HETATM" rather than as "ATOM".  But the files that trjconv produces 
> label every atom as "ATOM", whether it's an atom from the protein or 
> an atom from a water molecule.
>
> Is there any way to make trjconv use "HETATM" for solvent atoms?  I do 
> not see anything in the trjconv documentation.  I also do not 
> understand why trjconv might produce PDB files which do not adhere to 
> the standard.  There may be a good reason, I don't know.

Strict adherence by software to the PDB format is something of an
exception rather than the rule. Often you will see TER records and/or
chain IDs used to distinguish different parts of the same system. For
this reason, most software that claims to read PDB should have some way
of making subset selections that does not depend on the contents of the
PDB file. You should consult the Biopython documentation to see how it
likes to interpret things, and how you can customize that.

trjconv cannot attempt to guess how all possible pieces of software 
might like to interpret its results, and so it produces something 
generic and plausible. Depending on how flexible Biopython is, you may
need to use a shell script to post-process the trjconv output to do
something like Tsjerk suggested, or insert TER records, or change chain
IDs. Do read how Biopython works first.
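
If post-processing does turn out to be necessary, a minimal sketch in
Python (rather than shell) might look like the following. It rewrites
the record name from "ATOM" to "HETATM" on every line whose residue name
is in a solvent list. The names retag_solvent, retag_file and
SOLVENT_RESNAMES are made up for illustration, and the residue names are
an assumption: "SOL" is the usual GROMACS name for water, so check what
your own topology uses. The residue name is read from columns 18-21
because GROMACS may write four-character residue names there.

```python
# Assumed solvent residue names; adjust to match your own system.
SOLVENT_RESNAMES = {"SOL", "HOH", "WAT"}


def retag_solvent(lines, solvent_resnames=SOLVENT_RESNAMES):
    """Return a copy of the PDB lines with solvent ATOM records
    relabelled as HETATM. All other lines are passed through unchanged."""
    out = []
    for line in lines:
        # Columns 1-6 hold the record name; columns 18-21 (0-based
        # slice 17:21) cover both 3- and 4-character residue names.
        if line.startswith("ATOM  ") and line[17:21].strip() in solvent_resnames:
            # "HETATM" is exactly six characters, so the fixed-width
            # layout of the rest of the line is preserved.
            line = "HETATM" + line[6:]
        out.append(line)
    return out


def retag_file(in_path, out_path):
    """Read a PDB file, relabel its solvent records, write the result."""
    with open(in_path) as src:
        lines = src.readlines()
    with open(out_path, "w") as dst:
        dst.writelines(retag_solvent(lines))
```

Calling retag_file("frame.pdb", "frame_hetatm.pdb") on a snapshot
written by trjconv would then give a file in which only the solvent is
tagged HETATM (the file names here are placeholders). Whether that alone
is enough for Biopython to separate the water from the polymer depends
on how its parser treats HETATM records, so check its documentation.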

Mark