[gmx-users] Problem: pdb2gmx with a more complex system

Tue Mar 27 04:16:06 CEST 2012

On 27/03/2012 1:04 PM, Justin A. Lemkul wrote:
>
>
> Jernej Zidar wrote:
>> Hi.
>>   I'm trying to import a PDB containing the following components to
>> Gromacs using pdb2gmx:
>> - polymer (1 molecule composed of 5 residues)
>> - lipids (37 cholesterol molecules and 78 sphingomyelin molecules)
>> - water (2297 molecules)
>>
>>   The problem I have is that for some strange reason pdb2gmx does not
>> recognize the lipid part as being composed of 112 molecules, but
>> rather just as one molecule.
>>   If I attempt to manually correct this to 112, grompp complains about
>> the system containing 1345900 atoms (total number of lipid atoms
>> 11954*112 molecules+161 from the polymer bit+6891 from water) instead
>> of 19006 atoms (161 from the polymer bit, 11954 from the lipids, 6891
>> from water).
>>
>
> It sounds like you're confusing "molecule" with "moleculetype."  In 
> Gromacs, a moleculetype need not contain a single chemical molecule; 
> it can contain any combination of atoms.  If you tell grompp that 
> there are 112 copies of the moleculetype, then you get the behavior 
> you're seeing.

Yes.
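
The arithmetic bears this out. grompp counts atoms as (atoms in the moleculetype) times (number of copies listed in [ molecules ]), summed over all entries. A quick check with the numbers from the original message, assuming 3 atoms per water molecule (6891 / 2297); the function name is only for illustration:

```python
def total_atoms(entries):
    """Sum atoms the way grompp does: atoms per moleculetype times copy count."""
    return sum(atoms * copies for atoms, copies in entries)

# (atoms in the moleculetype, copies listed in [ molecules ])
polymer = (161, 1)
water = (3, 2297)

# The whole lipid block is ONE moleculetype of 11954 atoms.
correct = total_atoms([polymer, (11954, 1), water])
# Telling grompp there are 112 copies of that same moleculetype.
inflated = total_atoms([polymer, (11954, 112), water])

print(correct)   # 19006
print(inflated)  # 1345900
```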

> When dealing with a complex system like this one, it is often far 
> easier to:
>
> 1. Create a coordinate file of each individual molecule type (true 
> molecule, that is your polymer, cholesterol, and sphingomyelin)
> 2. Run pdb2gmx on each of these to obtain an individual molecule topology
> 3. Remove the unnecessary #include statements and directives from the 
> resulting molecule .top files to convert them to .itp
> 4. Create your own topology that simply uses #include statements
> 5. If you have alternating molecules in the coordinate file, re-order 
> it so it has each molecular species in blocks for far easier topology 
> handling
>
> Such a procedure creates a topology that is less redundant and 
> clearer.
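
For illustration, the master topology from step 4 could look something like the output of the sketch below. The .itp file names, the moleculetype names (POLY, CHL1, SM) and the force field directory name are hypothetical placeholders, not names taken from the original files; the molecule counts are the ones quoted in the original message.

```python
def build_top(forcefield_itp, molecule_itps, molecules, title):
    """Return the text of a minimal GROMACS .top built from #include lines."""
    lines = [f'#include "{forcefield_itp}"']
    lines += [f'#include "{itp}"' for itp in molecule_itps]
    lines += ["", "[ system ]", title, "", "[ molecules ]"]
    # One line per block of identical molecules: moleculetype name, then count.
    lines += [f"{name:<8}{count}" for name, count in molecules]
    return "\n".join(lines) + "\n"

top_text = build_top(
    "charmm36cgenff.ff/forcefield.itp",  # hypothetical directory name
    ["polymer.itp", "cholesterol.itp", "sphingomyelin.itp", "water.itp"],  # hypothetical
    [("POLY", 1), ("CHL1", 37), ("SM", 78), ("SOL", 2297)],
    "Polymer in a cholesterol/sphingomyelin bilayer",
)
print(top_text)
```

The names in [ molecules ] have to match the [ moleculetype ] names in the included .itp files, and the entries have to appear in the same order as the molecules in the coordinate file.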

I don't think this should be necessary with recent GROMACS versions. 
With suitable use of -chainsep, pdb2gmx is capable of recognizing each 
molecule in turn, writing a molecule .itp file for it, and constructing 
a simple .top that #includes the lot. The result can be more verbose 
than a minimal .top that deals elegantly with multiple copies of the 
same molecule, but that's not very important.
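
One way to give -chainsep something to split on is to put a TER record between molecules and then run pdb2gmx with -chainsep ter. The sketch below does that for residues that are each a complete molecule (one lipid = one residue). It is only a sketch and untested against this system: the residue names CHL1 and SM and the helper names are hypothetical, and it assumes standard fixed-column PDB lines.

```python
def add_ter_records(pdb_lines, single_residue_molecules):
    """Insert TER between molecules so each one can become its own chain.

    pdb_lines: PDB lines without trailing newlines.
    single_residue_molecules: residue names where one residue is one molecule.
    """
    out = []
    prev = None  # (residue name, chain ID, residue number) of the previous atom
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            key = (line[17:21].strip(), line[21:22], line[22:27])
            boundary = prev is not None and key != prev
            if boundary and (prev[0] in single_residue_molecules
                             or key[0] in single_residue_molecules):
                out.append("TER")
            prev = key
        elif line.startswith("TER"):
            prev = None  # a separator is already present here
        out.append(line)
    return out

def _atom(serial, name, resname, resseq):
    """Build a minimal fixed-column ATOM line (coordinates omitted) for the demo."""
    return f"ATOM  {serial:>5} {name:<4} {resname:<4} {resseq:>4}    "

demo = [
    _atom(1, "C1", "POLY", 1), _atom(2, "C2", "POLY", 2),  # one 2-residue polymer
    _atom(3, "C3", "CHL1", 1), _atom(4, "O3", "CHL1", 1),  # cholesterol
    _atom(5, "N", "SM", 1),                                # sphingomyelin
    _atom(6, "C3", "CHL1", 2),                             # cholesterol again
]
result = add_ter_records(demo, {"CHL1", "SM"})
print("\n".join(result))
```

The polymer residues stay together because neither side of that residue boundary is in the set; every lipid ends up between two TER records.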

>
>>   Questions:
>> a) Why are the water molecules properly recognized? The only thing I
>> had to do was to use some sed commands to change the segment name from
>> "bulk" to SOL and the atom names from TIP3 (OH2, H1, H2) to SPC (OW,
>> HW1, HW2).

$GMXLIB/residuetypes.dat gives pdb2gmx the clue that this is water. The 
other residues are unknown to it, so it doesn't have a cunning plan for them.

>>
>
> Atom naming must match whatever Gromacs conventions are in place.
>
>> b) I used CHARMM to generate the lipid bilayer. The membrane building
>> process occurs in two stages so the residues in the resulting bilayers
>> are arranged in this order: cholesterol, sphingomyelin, cholesterol,
>> sphingomyelin. Could this be the cause?
>>   Why aren't the lipid residues recognized as separate molecules?
>>
>
> Again I think you are confusing the terminology and convention here.

pdb2gmx -chainsep can probably just deal with this.

Mark
>
> -Justin
>
>>   The lipid molecules are defined as separate molecules in the joined
>> charmm36cgenff forcefield, where I used the existing lipid molecules
>> in lipids.rtp as a template to add my own molecules.
>>
>>   I should emphasize that all the residues/molecules work perfectly
>> within CHARMM, but then again CHARMM has a different modus operandi.
>>
>> Thanks in advance for any help,
>> Jernej Zidar
>