[gmx-users] converting all-atom pdb to cg.pdb

Tue Nov 10 12:50:07 CET 2009

Francesco Pietra wrote:
> I corrected the awk script as indicated (attached here)
> 
> With input
> ATOM      1  N   LEU     1     153.242  64.673  95.851  0.00  0.00           N
> ATOM      2  CA  LEU     1     154.534  64.963  95.169  0.00  0.00           C
> ..........
> the output
> ATOM      2  BN0 LEU     1     154.534  64.963  95.169  0.00  0.00
> ATOM      4  SC1 LEU     1     156.589  66.550  95.065  0.00  0.00
> ................
> 
> is correct and the cg pdb file opens correctly in viewers, but there are
> contacts that gromacs was unable to relax at the relaxation stage
> (nor could it at the all-atom stage of the input file). Therefore I
> relaxed the all-atom input file with AMBER until there were no contacts at
> 0.8 A VDW, then reran the awk script on the relaxed pdb file. Input
> 
> ATOM      1  N   LEU A   1     153.242  64.673  95.851  0.00  0.00           N
> ATOM      2  CA  LEU A   1     154.534  64.963  95.169  0.00  0.00           C
> .................
> the output
> ATOM      2  BN0 LEU     0       1.000 154.534  64.963 95.17  0.00
> ATOM      4  SC1 LEU     0       1.000 156.589  66.550 95.06  0.00
> .................
> is grossly incorrect. Note that both input files above give correct
> psf and cg pdb files with VMD, which shows that the files have a correct
> pdb layout.
> 
> I was unable to understand why the awk script works one time and not
> another, just when I have a relaxed file.
> 

Because now you have a chain identifier, a case that I believe I mentioned last
time.  If you look at the script, the printf statement expects a numeric field
($5) after the residue name; with a chain identifier, $5 is the chain letter, so
the script prints a zero instead of the actual residue number.  Note too that
in the output, every later field is shifted by exactly one place as a result.
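
To make the shift concrete, here is a small Python sketch (not the awk script
itself) that mimics how the FIXED LINE printf quoted further down is filled
from whitespace-separated fields.  The helper names awk_num and fixed_line are
made up for illustration, the string-to-number rule is a simplified stand-in
for awk's coercion, and $11 is left out because the format string has no
conversion left to consume it.

```python
import re

def awk_num(text):
    """Simplified awk-style coercion (assumption): leading number, else 0."""
    m = re.match(r"\s*[-+]?(\d+\.?\d*|\.\d+)", text)
    return float(m.group(0)) if m else 0.0

def fixed_line(pdb_line, bead="BN0"):
    """Mimic the FIXED LINE printf: $5 is printed as the residue number."""
    f = [""] + pdb_line.split()      # f[1] plays the role of awk's $1
    f += [""] * (12 - len(f))        # missing fields read as empty strings
    return "%4s  %5i %4s %3s  %4i    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        f[1], awk_num(f[2]), bead, f[4], awk_num(f[5]),
        awk_num(f[6]), awk_num(f[7]), awk_num(f[8]),
        awk_num(f[9]), awk_num(f[10]))

no_chain   = "ATOM      2  CA  LEU     1     154.534  64.963  95.169  0.00  0.00           C"
with_chain = "ATOM      2  CA  LEU A   1     154.534  64.963  95.169  0.00  0.00           C"

print(fixed_line(no_chain))    # $5 is the residue number, coordinates in place
print(fixed_line(with_chain))  # $5 is "A", which coerces to 0; the rest moves right
```

With the chain identifier present, the residue number lands in the first
coordinate slot and z ends up in the occupancy slot, which is the pattern seen
in the output quoted above.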

-Justin

> I would very much appreciate it if a stable version of the awk script
> were posted, in case my corrections were incorrect. Replacing the copy on
> the martini web page would also be appreciated, because one normally trusts
> what is officially posted.
> 
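
One way to make a conversion indifferent to the chain identifier is to read
ATOM records by their fixed PDB columns rather than by whitespace-separated
fields.  The Python sketch below only illustrates that idea and is not a
replacement for atom2cg_v2.1.awk; parse_atom and cg_line are hypothetical
names, and the bead name is passed in by hand rather than taken from any
mapping table.

```python
def parse_atom(line):
    """Read an ATOM record by fixed PDB columns (0-based slices)."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "resname": line[17:20].strip(),
        "chain": line[21],               # a blank when no chain is given
        "resseq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occ": float(line[54:60]),
        "bfac": float(line[60:66]),
    }

def cg_line(atom, bead):
    """Write a bead at the atom's position, keeping serial, chain and residue."""
    x, y, z = atom["xyz"]
    return "ATOM  %5d %4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        atom["serial"], bead, atom["resname"], atom["chain"],
        atom["resseq"], x, y, z, atom["occ"], atom["bfac"])

no_chain   = "ATOM      2  CA  LEU     1     154.534  64.963  95.169  0.00  0.00           C"
with_chain = "ATOM      2  CA  LEU A   1     154.534  64.963  95.169  0.00  0.00           C"

for line in (no_chain, with_chain):
    print(cg_line(parse_atom(line), "BN0"))
```

Because the residue number is always taken from columns 23-26, both input
lines give residue 1 and the same coordinates, with or without the chain
letter.
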
> thanks
> francesco pietra
> 
> On Mon, Nov 9, 2009 at 12:58 PM, Justin A. Lemkul <jalemkul at vt.edu> wrote:
>>
>> Francesco Pietra wrote:
>>> Does the atom2cg_v2.1.awk require the indication of the subunit (A, B,
>>> C, etc) in the pdb file of a multimeric protein?
>>>
>>> From
>>>
>>> ATOM      1  N   LEU     1     153.242  64.673  95.851  0.00  0.00           N
>>> ATOM      2  CA  LEU     1     154.534  64.963  95.169  0.00  0.00           C
>>> ATOM      3  CB  LEU     1     155.257  66.191  95.767  0.00  0.00           C
>>> ATOM      4  CG  LEU     1     156.589  66.550  95.065  0.00  0.00           C
>>> ATOM      5  CD1 LEU     1     156.406  66.834  93.574  0.00  0.00           C
>>> ATOM      6  CD2 LEU     1     157.222  67.770  95.727  0.00  0.00           C
>>> ATOM      7  C   LEU     1     155.425  63.717  95.081  0.00  0.00           C
>>> ATOM      8  O   LEU     1     155.371  63.026  94.063  0.00  0.00           O
>>> ATOM      9  N   SER     2     156.233  63.409  96.105  0.00  0.00           N
>>>
>>> I get
>>>
>>> ATOM      2  BN0 LEU  154.534      64.963  95.169   0.000  0.00  0.00
>>> ATOM      4  SC1 LEU  156.589      66.550  95.065   0.000  0.00  0.00
>>> ATOM     10  BN0 SER  157.124      62.235  96.094   0.000  0.00  0.00
>>>
>>> i.e., weird residue numbers.
>>>
>> The awk script simply copies the information from each line to the new file,
>> keeping the old atom numbers.  You can use genconf -renumber to fix this.  The
>> residue number isn't being written because of a problem with the atom2cg
>> script that I have posted here a number of times.  For example, you need to
>> fix each line of the script:
>>
>> OLD LINE
>> if($1=="ATOM" && $4=="ARG" && $3=="CA")
>>   printf("%4s  %5i %4s %3s  %4s    %8.3f%8.3f%8.3f%6.2f%6.2f    \n",
>>          $1, $2, "BN0", $4, $6, $7, $8, $9, $10, $11);
>>
>> FIXED LINE
>> if($1=="ATOM" && $4=="ARG" && $3=="CA")
>>   printf("%4s  %5i %4s %3s  %4i    %8.3f%8.3f%8.3f%6.2f%6.2f    \n",
>>          $1, $2, "BN0", $4, $5, $6, $7, $8, $9, $10, $11);
>>
>>
>>> In another case (coming from AMBER, where the subunit indication is
>>> omitted) with the subunit indicated, the residue numbers in the cg
>>> file are correct. I don't see any other difference between the two
>>> starting files. Or should I look for a different cause.
>>>
>> Then that's simply a matter of luck :)  The print statements in the original
>> awk script do not expect chain identifiers, so the printing worked due to
>> the extra field.
>>
>> -Justin
>>
>>> thanks
>>>
>>> francesco pietra
>>> .......

-- 
========================================

Justin A. Lemkul
Ph.D. Candidate
ICTAS Doctoral Scholar
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin

========================================