[gmx-developers] libxml2 versus JSON

Erik Lindahl erik.lindahl at gmail.com
Thu May 12 18:54:29 CEST 2016


Hi,

Yes, that's my point too :-)  First, I'm all for just picking JSON right
away, but there are a few additional things, none of them rocket science,
that we should also settle on before hacking away, so that each data file
doesn't end up inventing its own standard. For instance:

* How do we handle units? Lists?

* How do we specify any property that depends on the name/type of a
particle, residue, or chain? How do we specify things that depend on
pairwise combinations?

* How do we handle text fields/comments that we might want to be able to
read into Gromacs?

* We should have some sort of metadata for all parameters that describes the
source, and which program version or person wrote it. How should
references to scientific papers be specified so we don't have to parse them
from free-text format?

* It would be good to decide on some labels at the very highest level as we
implement things. Initially we might only have "ChemicalProperties", but
that way it will be more straightforward to later add more labels so we get
a nice hierarchical description - without having to decide all other labels
now. If I understand JSON correctly, it does not have functionality for
including one file in another, but if each file contains a hierarchical
description of its data it might be possible to just concatenate them?

* How should this type of (extensible) data be represented at the class level
in the code (so we avoid one implementation for each tool that needs
external data)? This should be independent of the file storage format - if
some of the tools are IO-bound this will make it possible to later implement a
binary alternative, possibly including automatic compilation from updated
text files.
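
To make a few of these points concrete, here is a rough Python sketch of
one way it could look. Everything in it is an assumption for illustration
only: the key names ("metadata", "entries", "unit", "comment", "types"),
the second-level labels, the PropertyTree/Quantity classes and the merge
rule are all made up here, and nothing like this exists in Gromacs. Since
JSON has no comment syntax, comments are stored as ordinary string fields.
Two files that share the "ChemicalProperties" top-level label are combined
by a recursive merge rather than literal concatenation, and tools only see
a format-independent class.

```python
import json
from dataclasses import dataclass

# Two hypothetical data files; all key names are illustrative assumptions.
ATOM_FILE = """
{"ChemicalProperties": {"atom_types": {
    "metadata": {"source": "hand-written example",
                 "written_by": "example-tool 0.1",
                 "comment": "free text that the program can read back",
                 "references": [{"doi": "PLACEHOLDER"}]},
    "entries": {"OW": {"mass": {"value": 15.9994, "unit": "u"}},
                "HW": {"mass": {"value": 1.008, "unit": "u"}}}}}}
"""
PAIR_FILE = """
{"ChemicalProperties": {"pair_properties": {
    "metadata": {"source": "hand-written example"},
    "entries": [{"types": ["HW", "OW"],
                 "bond_length": {"value": 0.1, "unit": "nm"}}]}}}
"""


@dataclass(frozen=True)
class Quantity:
    """A number that always travels together with its unit."""
    value: float
    unit: str


def merge(a: dict, b: dict, path: str = "") -> dict:
    """Recursively merge two parsed files; never overwrite silently."""
    out = dict(a)
    for key, value in b.items():
        where = f"{path}/{key}"
        if key not in out:
            out[key] = value
        elif isinstance(out[key], dict) and isinstance(value, dict):
            out[key] = merge(out[key], value, where)
        elif out[key] != value:
            raise ValueError(f"conflicting values at {where}")
    return out


class PropertyTree:
    """Format-independent view of the data; tools never touch raw JSON."""

    def __init__(self, data: dict):
        self._data = data

    @classmethod
    def from_json_texts(cls, *texts: str) -> "PropertyTree":
        # A binary reader could build the same tree from another format.
        merged: dict = {}
        for text in texts:
            merged = merge(merged, json.loads(text))
        return cls(merged)

    def quantity(self, *path: str) -> Quantity:
        """Follow a path of labels down to a {"value", "unit"} leaf."""
        node = self._data
        for key in path:
            node = node[key]
        return Quantity(float(node["value"]), node["unit"])

    def pair_quantity(self, type_a: str, type_b: str, name: str) -> Quantity:
        """Look up a pairwise property; the order of the two types is ignored."""
        wanted = sorted((type_a, type_b))
        entries = self._data["ChemicalProperties"]["pair_properties"]["entries"]
        for entry in entries:
            if sorted(entry["types"]) == wanted:
                node = entry[name]
                return Quantity(float(node["value"]), node["unit"])
        raise KeyError(f"no pair entry for {wanted}")


tree = PropertyTree.from_json_texts(ATOM_FILE, PAIR_FILE)
print(tree.quantity("ChemicalProperties", "atom_types", "entries", "OW", "mass"))
print(tree.pair_quantity("OW", "HW", "bond_length"))
```

Keeping the metadata inside each second-level section, rather than once
per file, is what lets the two files merge without conflicts in this
sketch. Whether that is the right place for it is exactly the kind of
thing we would need to agree on.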


Anything else we should consider?

Cheers,

Erik


On Thu, May 12, 2016 at 5:22 PM, Berk Hess <hess at kth.se> wrote:

> Hi,
>
> I think David's question is not so much about our current data handled by
> pdb2gmx, but rather about new data he needs to handle. We should of course
> keep the whole picture in mind, but we should not end up in endless
> topology formatting discussions, when all we need right away is a more
> generic container type format organization.
>
> Cheers,
>
> Berk
> On May 12, 2016 6:06 PM, Erik Lindahl <erik.lindahl at gmail.com> wrote:
>
> Hi,
>
> Rome wasn't built in a day, so no - I don't want the kitchen sink
> initially. However, it would be good if we could decide on some high-level
> principles and roughly how formats should later be implemented
> hierarchically.
>
> Where should comments and metadata go? Can we get a format where it is
> trivial to combine multiple files if a new analysis tool suddenly needs
> multiple types of data already implemented for other programs (without
> writing a new parser)?  For instance, if we were to rewrite pdb2gmx,
> roughly how should all those files be organized?
>
> With a bit of thought and a common parsing layer, it feels like it should
> be much easier for any developer to start adding/changing formats
> one-by-one instead of having a large discussion involving everybody each
> time somebody needs to add a file type!
>
> Cheers,
>
> Erik
>
>
> On Thu, May 12, 2016 at 4:54 PM, Mark Abraham <mark.j.abraham at gmail.com>
> wrote:
>
> Hi,
>
> What's the scope? If we want to write a wrapper layer around some JSON
> code so people can use it for parsing input files, while we have a way to
> replace the dependency with something else in the future, then that's
> pretty much fine by me.
>
> What I'm not keen on is the rabbit hole of converting all our existing
> parameter-like file formats to some JSON format. That amounts to re-writing
> lots of our setup tools. That would likely be a big improvement in code
> quality, but it's dozens of hours of input from quite a few people to agree
> on how it should look, and then a few coding months to write tests,
> re-implement the code, and get it reviewed. This creates a bunch of friction
> for users, for no immediate gain. Who's got the resources for that, and
> what's the big payoff?
>
> Mark
>
> On Thu, May 12, 2016 at 5:25 PM David van der Spoel <spoel at xray.bmc.uu.se>
> wrote:
>
> On 12/05/16 14:16, Erik Lindahl wrote:
> > Hi,
> >
> > While I’m all for JSON, the format per se isn’t critical. What is needed
> > is for somebody (or a couple of people) to sit down and start working on a
> > larger framework for writing and reading all sorts of data, and how to
> > handle this in an abstract way. Then the actual file format is simply a
> > module that can be replaced if we ever want to change it.  But, this will
> > require volunteers!
> >
> This is what we discussed in the gerrit patch, to have a module on top
> of it that would form the API for the rest of the code. For me this is
> the most important thing to decide in gmx development in the near future.
>
> > Just picking a format and then having dozens of modules all fire away
> > with creating their own data fields directly in that format doesn’t bring
> > any more portability than using raw text files, IMHO :-)
> >
> > Cheers,
> >
> > Erik
> >
> >
> >
> >
> >> On 12 May 2016, at 12:40, David van der Spoel <spoel at xray.bmc.uu.se>
> >> wrote:
> >>
> >> Hi,
> >>
> >> with the developer meeting coming up next week I would like to once
> >> again bring up the issue of what file format to use for database files. We
> >> have discussed this for over ten years and not having a decision is
> >> stopping innovation.
> >>
> >> I propose we vote on it at the meeting next week if we cannot reach
> >> consensus. Personally I am beyond caring which of the two as long as we
> >> make a decision - now we have nothing.
> >>
> >> Cheers,
> >> --
> >> David van der Spoel, Ph.D., Professor of Biology
> >> Dept. of Cell & Molec. Biol., Uppsala University.
> >> Box 596, 75124 Uppsala, Sweden. Phone:       +46184714205.
> >> spoel at xray.bmc.uu.se    http://folding.bmc.uu.se
> >> --
> >> Gromacs Developers mailing list
> >>
> >> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List before
> posting!
> >>
> >> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >>
> >> * For (un)subscribe requests visit
> >> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
> or send a mail to gmx-developers-request at gromacs.org.
> >
>
>
> --
> David van der Spoel, Ph.D., Professor of Biology
> Dept. of Cell & Molec. Biol., Uppsala University.
> Box 596, 75124 Uppsala, Sweden. Phone:  +46184714205.
> spoel at xray.bmc.uu.se    http://folding.bmc.uu.se
>
>
>
>
> --
> --
> Erik Lindahl <erik.lindahl at gmail.com>
> Professor of Biophysics, Dept. Biochemistry & Biophysics, Stockholm
> University
> Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
>
>
>



-- 
--
Erik Lindahl <erik.lindahl at gmail.com>
Professor of Biophysics, Dept. Biochemistry & Biophysics, Stockholm
University
Science for Life Laboratory, Box 1031, 17121 Solna, Sweden