[gmx-users] converting all-atom pdb to cg.pdb
Justin A. Lemkul
jalemkul at vt.edu
Tue Nov 10 12:50:07 CET 2009
Francesco Pietra wrote:
> I corrected the awk script as indicated (attached here)
>
> With input
> ATOM 1 N LEU 1 153.242 64.673 95.851 0.00 0.00 N
> ATOM 2 CA LEU 1 154.534 64.963 95.169 0.00 0.00 C
> ..........
> the output
> ATOM 2 BN0 LEU 1 154.534 64.963 95.169 0.00 0.00
> ATOM 4 SC1 LEU 1 156.589 66.550 95.065 0.00 0.00
> ................
>
> is correct and the CG pdb file opens correctly in viewers, but there are
> contacts that gromacs was unable to relax at the relaxation stage
> (nor could it at the all-atom stage of the input file). Therefore I
> relaxed the all-atom input file with AMBER until there were no contacts
> at 0.8A VDW, then repeated the awk script with the relaxed pdb file. Input
>
> ATOM 1 N LEU A 1 153.242 64.673 95.851 0.00 0.00 N
> ATOM 2 CA LEU A 1 154.534 64.963 95.169 0.00 0.00 C
> .................
> the output
> ATOM 2 BN0 LEU 0 1.000 154.534 64.963 95.17 0.00
> ATOM 4 SC1 LEU 0 1.000 156.589 66.550 95.06 0.00
> .................
> is grossly incorrect. Notice that both input files above give correct
> psf and cg pdb files with VMD, which shows that the files have a correct
> pdb layout.
>
> I cannot understand why the awk script works one time and not another,
> failing just when I have a relaxed file.
>
Because now you have a chain identifier, a case that I believe I mentioned last
time. If you look at the script, the printf format expects an integer field
after the residue name; with a chain identifier, that field is a letter, so
the script prints a zero instead of the actual residue number. Note too that
in the output, every subsequent field is shifted by exactly one place as a result.
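
To illustrate the shift, here is a minimal sketch in Python (not the atom2cg awk
script itself). The helper name parse_atom_line is hypothetical. It assumes the
fields are separated by whitespace, as in the lines pasted above, and that a
chain identifier, when present, is non-numeric. Neither assumption is guaranteed
for real PDB files, where the format is column-based (chain ID in column 22,
residue number in columns 23-26), so reading fixed columns would be more robust.

```python
def parse_atom_line(line):
    """Split a whitespace-separated ATOM line, tolerating an optional chain ID.

    Assumption: a chain identifier, if present, is non-numeric (e.g. "A").
    """
    fields = line.split()
    if not fields or fields[0] != "ATOM":
        return None
    # Without a chain ID, the field after the residue name is the residue number.
    # With a chain ID, that field is a letter and every later field moves by one.
    shift = 0 if fields[4].lstrip("-").isdigit() else 1
    return {
        "serial": int(fields[1]),
        "name": fields[2],
        "resname": fields[3],
        "chain": fields[4] if shift else "",
        "resnum": int(fields[4 + shift]),
        "xyz": tuple(float(v) for v in fields[5 + shift:8 + shift]),
    }


no_chain = parse_atom_line("ATOM 2 CA LEU 1 154.534 64.963 95.169 0.00 0.00 C")
with_chain = parse_atom_line("ATOM 2 CA LEU A 1 154.534 64.963 95.169 0.00 0.00 C")
print(no_chain["resnum"], with_chain["resnum"], with_chain["chain"])
```

Both lines give the same residue number and coordinates once the optional chain
field is accounted for; the same idea could be applied in the awk script by
choosing the field offset before each printf.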
-Justin
> I would very much appreciate it if a stable version of the awk script
> were posted, in case my corrections were incorrect. Replacing the script
> on the martini web page would also be appreciated, because one normally
> trusts what is officially posted.
>
> thanks
> francesco pietra
>
> On Mon, Nov 9, 2009 at 12:58 PM, Justin A. Lemkul <jalemkul at vt.edu> wrote:
>>
>> Francesco Pietra wrote:
>>> Does the atom2cg_v2.1.awk require the indication of the subunit (A, B,
>>> C, etc) in the pdb file of a multimeric protein?
>>>
>>> From
>>>
>>> ATOM 1 N LEU 1 153.242 64.673 95.851 0.00 0.00 N
>>> ATOM 2 CA LEU 1 154.534 64.963 95.169 0.00 0.00 C
>>> ATOM 3 CB LEU 1 155.257 66.191 95.767 0.00 0.00 C
>>> ATOM 4 CG LEU 1 156.589 66.550 95.065 0.00 0.00 C
>>> ATOM 5 CD1 LEU 1 156.406 66.834 93.574 0.00 0.00 C
>>> ATOM 6 CD2 LEU 1 157.222 67.770 95.727 0.00 0.00 C
>>> ATOM 7 C LEU 1 155.425 63.717 95.081 0.00 0.00 C
>>> ATOM 8 O LEU 1 155.371 63.026 94.063 0.00 0.00 O
>>> ATOM 9 N SER 2 156.233 63.409 96.105 0.00 0.00 N
>>>
>>> I get
>>>
>>> ATOM 2 BN0 LEU 154.534 64.963 95.169 0.000 0.00 0.00
>>> ATOM 4 SC1 LEU 156.589 66.550 95.065 0.000 0.00 0.00
>>> ATOM 10 BN0 SER 157.124 62.235 96.094 0.000 0.00 0.00
>>>
>>> i.e., weird residue numbers.
>>>
>> The awk script simply copies the information from one line to the new file,
>> using the old atom numbers. You can use genconf -renumber to fix this. The
>> residue number isn't being written because of a problem with the atom2cg
>> script that I have posted about here a number of times. For example, you
>> need to fix each line of the script:
>>
>> OLD LINE
>> if($1=="ATOM" && $4=="ARG" && $3=="CA")
>> printf("%4s %5i %4s %3s %4s %8.3f%8.3f%8.3f%6.2f%6.2f \n",$1, $2,
>> "BN0", $4, $6, $7, $8, $9,$10,$11);
>>
>> FIXED LINE
>> if($1=="ATOM" && $4=="ARG" && $3=="CA")
>> printf("%4s %5i %4s %3s %4i %8.3f%8.3f%8.3f%6.2f%6.2f \n",$1, $2,
>> "BN0", $4, $5, $6, $7, $8, $9,$10,$11);
>>
>>
>>> In another case (coming from AMBER, where the subunit indication is
>>> omitted) with the subunit indicated, the residue numbers in the cg
>>> file are correct. I don't see any other difference between the two
>>> starting files. Or should I look for a different cause?
>>>
>> Then that's simply a matter of luck :) The print statements in the original
>> awk script do not expect chain identifiers, so the printing worked due to
>> the extra field.
>>
>> -Justin
>>
>>> thanks
>>>
>>> francesco pietra
>>> .......
>> --
>> ========================================
>>
>> Justin A. Lemkul
>> Ph.D. Candidate
>> ICTAS Doctoral Scholar
>> Department of Biochemistry
>> Virginia Tech
>> Blacksburg, VA
>> jalemkul[at]vt.edu | (540) 231-9080
>> http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
>>
>> ========================================