[gmx-users] support mmcif format?
Mark Abraham
mark.j.abraham at gmail.com
Mon Nov 23 12:52:51 CET 2015
Hi,
On Mon, Nov 23, 2015 at 12:09 PM Hannes Loeffler <Hannes.Loeffler at stfc.ac.uk>
wrote:
> On Mon, 23 Nov 2015 09:57:55 +0000
> Mark Abraham <mark.j.abraham at gmail.com> wrote:
>
> > I think we are unlikely to plan to write mmCIF directly as a
> > trajectory format,
>
> It's hard for me to see why someone would even want to use mmCIF as
> a trajectory format. The format serves, primarily, the crystallography
> community to store a single set of coordinates together with a huge
> amount of metadata, most of which is probably rather irrelevant for
> trajectories. (For archiving purposes this is of course interesting,
> but there is no need to write this into every single trajectory, just
> as the topology is kept separate.)
>
> I think for the simulation community the mmCIF format should be what the
> PDB format should always have been: a read-only file format to be
> converted into a format developers think is best for their particular
> piece of software. Any future setup program should be able to read
> mmCIF but several libraries are already available to do exactly that.
>
Exactly. Albert's link shows an example of the new ATOM record at
http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Categories/atom_site.html
loop_
_atom_site.group_PDB
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.label_alt_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.footnote_id
_atom_site.auth_seq_id
_atom_site.id
ATOM N N VAL A 11 . 25.369 30.691 11.795 1.00 17.93 . 11 1
ATOM C CA VAL A 11 . 25.970 31.965 12.332 1.00 17.75 . 11 2
ATOM C C VAL A 11 . 25.569 32.010 13.808 1.00 17.83 . 11 3
ATOM O O VAL A 11 . 24.735 31.190 14.167 1.00 17.53 . 11 4
ATOM C CB VAL A 11 . 25.379 33.146 11.540 1.00 17.66 . 11 5
ATOM C CG1 VAL A 11 . 25.584 33.034 10.030 1.00 18.86 . 11 6
ATOM C CG2 VAL A 11 . 23.933 33.309 11.872 1.00 17.12 . 11 7
ATOM N N THR A 12 . 26.095 32.930 14.590 1.00 18.97 4 12 8
ATOM C CA THR A 12 . 25.734 32.995 16.032 1.00 19.80 4 12 9
ATOM C C THR A 12 . 24.695 34.106 16.113 1.00 20.92 4 12 10
ATOM O O THR A 12 . 24.869 35.118 15.421 1.00 21.84 4 12 11
ATOM C CB THR A 12 . 26.911 33.346 17.018 1.00 20.51 4 12 12
ATOM O OG1 THR A 12 3 27.946 33.921 16.183 0.50 20.29 4 12 13
ATOM O OG1 THR A 12 4 27.769 32.142 17.103 0.50 20.59 4 12 14
ATOM C CG2 THR A 12 3 27.418 32.181 17.878 0.50 20.47 4 12 15
That's similar to the old PDB ATOM record. Even considering just these new
records, each line carries about 10x more bytes than we need, all of which
we would have to store, read, write and parse. We would have to do that for
every atom in every frame, and we have to plan for the possibility of
billions of both atoms and frames. Per-atom metadata should instead be
stored once and referenced via pointers and such, which is what e.g.
.tpr+.trr and .tpr+.xtc do (albeit poorly), and what .tng can do well
(when we finish with it).
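
As a rough illustration of the size argument, here is a minimal Python sketch that compares the text size of one atom_site row from the example with the size of the same three coordinates packed as single-precision floats. The choice of uncompressed little-endian float32 is an assumption made for the comparison, not the actual layout of .trr, .xtc or .tng, and the row is taken with the whitespace as shown here, so the ratio it prints is only indicative.

```python
import struct

# One atom_site row copied from the example above (whitespace as shown there).
row = "ATOM N N VAL A 11 . 25.369 30.691 11.795 1.00 17.93 . 11 1"

# Columns 7..9 (0-based) are Cartn_x, Cartn_y, Cartn_z in this loop_ layout.
fields = row.split()
x, y, z = (float(v) for v in fields[7:10])

# Bytes needed to store the full text row, including the trailing newline.
text_bytes = len(row.encode("ascii")) + 1

# Bytes needed for the coordinates alone as three float32 values
# (an assumed encoding, chosen only for the comparison).
binary_bytes = len(struct.pack("<3f", x, y, z))

ratio = text_bytes / binary_bytes
print(f"text: {text_bytes} B, binary: {binary_bytes} B, ratio: {ratio:.1f}x")
```

The exact ratio depends on column padding, on how many atom_site columns a given file carries, and on the binary encoding; a compressed, reduced-precision format such as .xtc would shift it further.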
Mark
> Cheers,
> Hannes.