[gmx-users] support mmcif format?

Mark Abraham mark.j.abraham at gmail.com
Mon Nov 23 12:52:51 CET 2015


Hi,


On Mon, Nov 23, 2015 at 12:09 PM Hannes Loeffler <Hannes.Loeffler at stfc.ac.uk>
wrote:

> On Mon, 23 Nov 2015 09:57:55 +0000
> Mark Abraham <mark.j.abraham at gmail.com> wrote:
>
> > I think we are unlikely to plan to write mmCIF directly as a
> > trajectory format,
>
> It's hard for me to see why anyone would even want to use mmCIF as
> a trajectory format.  The format primarily serves the crystallography
> community, storing a single set of coordinates together with a huge
> amount of metadata, most of which is probably irrelevant to
> trajectories.  (For archiving purposes this is of course interesting,
> but there is no need to write it into every single trajectory, just as
> the topology is kept separate.)
>
> I think for the simulation community the mmCIF format should be what the
> PDB format should always have been: a read-only file format to be
> converted into a format developers think is best for their particular
> piece of software.  Any future setup program should be able to read
> mmCIF, and several libraries are already available to do exactly that.
>

Exactly. Albert's link shows an example of the new ATOM record at

http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Categories/atom_site.html

loop_
    _atom_site.group_PDB
    _atom_site.type_symbol
    _atom_site.label_atom_id
    _atom_site.label_comp_id
    _atom_site.label_asym_id
    _atom_site.label_seq_id
    _atom_site.label_alt_id
    _atom_site.Cartn_x
    _atom_site.Cartn_y
    _atom_site.Cartn_z
    _atom_site.occupancy
    _atom_site.B_iso_or_equiv
    _atom_site.footnote_id
    _atom_site.auth_seq_id
    _atom_site.id
    ATOM N  N   VAL  A  11  .  25.369  30.691  11.795  1.00  17.93  .  11   1
    ATOM C  CA  VAL  A  11  .  25.970  31.965  12.332  1.00  17.75  .  11   2
    ATOM C  C   VAL  A  11  .  25.569  32.010  13.808  1.00  17.83  .  11   3
    ATOM O  O   VAL  A  11  .  24.735  31.190  14.167  1.00  17.53  .  11   4
    ATOM C  CB  VAL  A  11  .  25.379  33.146  11.540  1.00  17.66  .  11   5
    ATOM C  CG1 VAL  A  11  .  25.584  33.034  10.030  1.00  18.86  .  11   6
    ATOM C  CG2 VAL  A  11  .  23.933  33.309  11.872  1.00  17.12  .  11   7
    ATOM N  N   THR  A  12  .  26.095  32.930  14.590  1.00  18.97  4  12   8
    ATOM C  CA  THR  A  12  .  25.734  32.995  16.032  1.00  19.80  4  12   9
    ATOM C  C   THR  A  12  .  24.695  34.106  16.113  1.00  20.92  4  12  10
    ATOM O  O   THR  A  12  .  24.869  35.118  15.421  1.00  21.84  4  12  11
    ATOM C  CB  THR  A  12  .  26.911  33.346  17.018  1.00  20.51  4  12  12
    ATOM O  OG1 THR  A  12  3  27.946  33.921  16.183  0.50  20.29  4  12  13
    ATOM O  OG1 THR  A  12  4  27.769  32.142  17.103  0.50  20.59  4  12  14
    ATOM C  CG2 THR  A  12  3  27.418  32.181  17.878  0.50  20.47  4  12  15
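
As a rough illustration of how such a loop_ block maps onto per-atom
records, here is a minimal Python sketch. It assumes the simple layout
shown above: one tag per line, followed by whitespace-separated values
with no quoting and no multi-line strings. Real mmCIF files allow both,
so an existing mmCIF library is the better choice for actual work. The
function name parse_loop is hypothetical.

```python
def parse_loop(text):
    """Parse one simple mmCIF loop_ block into a list of dicts.

    Assumption: values are plain whitespace-separated tokens
    (no quoted strings, no multi-line values).
    """
    names, rows = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "loop_" or line.startswith("#"):
            continue
        if line.startswith("_"):
            # Column tags such as _atom_site.Cartn_x
            names.append(line)
        else:
            values = line.split()
            if len(values) != len(names):
                raise ValueError("row does not match the declared columns")
            rows.append(dict(zip(names, values)))
    return rows


example = """
loop_
    _atom_site.group_PDB
    _atom_site.type_symbol
    _atom_site.label_atom_id
    _atom_site.label_comp_id
    _atom_site.label_asym_id
    _atom_site.label_seq_id
    _atom_site.label_alt_id
    _atom_site.Cartn_x
    _atom_site.Cartn_y
    _atom_site.Cartn_z
    _atom_site.occupancy
    _atom_site.B_iso_or_equiv
    _atom_site.footnote_id
    _atom_site.auth_seq_id
    _atom_site.id
    ATOM N  N   VAL  A  11  .  25.369  30.691  11.795  1.00  17.93  .  11   1
    ATOM C  CA  VAL  A  11  .  25.970  31.965  12.332  1.00  17.75  .  11   2
"""

rows = parse_loop(example)

# Only three of the fifteen columns are coordinates.
coords = [
    (
        float(r["_atom_site.Cartn_x"]),
        float(r["_atom_site.Cartn_y"]),
        float(r["_atom_site.Cartn_z"]),
    )
    for r in rows
]
```

Everything other than coords is per-atom metadata that does not change
from one frame to the next.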


That's similar to the old PDB ATOM record. Even considering only these
records, each line carries about 10x more bytes than we need, all of which
we would have to store, do I/O on and parse. We would have to store it for
every atom in every frame, and we have to plan for the possibility of
billions of both atoms and frames. This should be handled with metadata
pointers and such, rather than by repeating the metadata on every line.
That is what e.g. .tpr+.trr and .tpr+.xtc do (albeit poorly), and what .tng
can do well (when we finish with it).
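
A back-of-the-envelope sketch of that overhead in Python follows. It
compares one text row from the example above with the same coordinates
stored as three uncompressed 32-bit floats. The 12-byte binary layout,
the system size and the frame count are assumptions chosen for
illustration, not the layout of any particular trajectory format. The
exact ratio depends on the binary layout and on compression, which this
sketch does not model.

```python
# One atom_site row from the example above, including the newline.
line = "ATOM N  N   VAL  A  11  .  25.369  30.691  11.795  1.00  17.93  .  11   1\n"

text_bytes = len(line.encode("ascii"))  # bytes per atom per frame as text
raw_bytes = 3 * 4                       # assumption: x, y, z as 32-bit floats


def trajectory_bytes(bytes_per_atom, n_atoms, n_frames):
    """Total size if every atom is written in every frame."""
    return bytes_per_atom * n_atoms * n_frames


# Hypothetical system size and trajectory length.
n_atoms, n_frames = 100_000, 10_000

text_total = trajectory_bytes(text_bytes, n_atoms, n_frames)
raw_total = trajectory_bytes(raw_bytes, n_atoms, n_frames)

print(f"text row: {text_bytes} bytes, raw coordinates: {raw_bytes} bytes")
print(f"text trajectory: {text_total / 1e9:.1f} GB")
print(f"raw trajectory:  {raw_total / 1e9:.1f} GB")
print(f"ratio: {text_total / raw_total:.1f}x")
```

Compressed coordinate formats such as .xtc would widen the gap further.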

Mark

> Cheers,
> Hannes.

