[gmx-developers] libxml2 versus JSON

David van der Spoel spoel at xray.bmc.uu.se
Thu May 12 21:13:46 CEST 2016


On 12/05/16 18:54, Erik Lindahl wrote:
> Hi,
>
> Yes, that's my point too :-)  First, I'm all for just picking JSON right
> away, but there are a few additional things that are not rocket science,
> but that we also should settle on before hacking away to avoid each data
> file inventing its own standard. For instance:
>
> * How do we handle units? Lists?

I guess the three files now describing atom properties could be used as 
a test case. Of course units have to be explicit.
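
As a rough sketch of what "explicit units" could look like for an atom
property file: the layout below is purely an assumption for discussion,
not an agreed format. Only the "ChemicalProperties" label comes from the
list above; the field names ("atoms", "mass", "vdw_radius", "value",
"unit") and the helpers read_quantity and load_masses are hypothetical.

```python
import json

# Hypothetical layout: every numeric value carries its unit explicitly.
ATOM_PROPERTIES_JSON = """
{
  "ChemicalProperties": {
    "atoms": [
      {"element": "C",
       "mass": {"value": 12.011, "unit": "amu"},
       "vdw_radius": {"value": 0.17, "unit": "nm"}},
      {"element": "O",
       "mass": {"value": 15.999, "unit": "amu"},
       "vdw_radius": {"value": 0.152, "unit": "nm"}}
    ]
  }
}
"""


def read_quantity(entry, expected_unit):
    """Return the numeric value, rejecting an unexpected unit."""
    if entry["unit"] != expected_unit:
        raise ValueError(f"expected {expected_unit}, got {entry['unit']}")
    return float(entry["value"])


def load_masses(text):
    """Map element symbol to mass in amu from the hypothetical layout."""
    data = json.loads(text)
    atoms = data["ChemicalProperties"]["atoms"]
    return {a["element"]: read_quantity(a["mass"], "amu") for a in atoms}


masses = load_masses(ATOM_PROPERTIES_JSON)
```

The reader never has to guess a unit, and a file written in different
units fails loudly instead of being silently misread.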

>
> * How do we specify any property that depends on the name/type of a
> particle, residue, or chain? How do we specify things that depend on
> pairwise combinations?
>
> * How do we handle text fields/comments that we might want to be able to
> read into Gromacs?

Sigh... All this stuff is easy in XML.


>
> * We should have some sort of metadata for all parameters that describe
> the source, and which program version or person wrote it. How should
> references to scientific papers be specified so we don't have to parse
> them from free text format?
>
> * It would be good to decide on some labels on the very highest level as
> we implement things. Initially we might only have "ChemicalProperties",
> but that way it will be more straightforward to later add more labels so
> we get a nice hierarchical description - without having to decide all
> other labels now. If I understand JSON correctly, it does not have
> functionality for including a file into another, but if each file
> contains a hierarchy description of the data it might be possible to
> just concatenate them?
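
On concatenation: pasting two JSON documents together as text does not
give a valid JSON document, but merging the parsed hierarchies is cheap.
A minimal sketch, assuming each file has top-level labels such as
"ChemicalProperties"; the function merge_trees, the "Residues" label and
the rule that lists are appended are all assumptions for illustration.

```python
import json


def merge_trees(base, extra):
    """Recursively merge two parsed JSON objects.

    Nested objects are merged key by key, lists are appended, and any
    other clash is reported instead of being silently overwritten.
    """
    merged = dict(base)
    for key, value in extra.items():
        if key not in merged:
            merged[key] = value
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = merge_trees(merged[key], value)
        elif isinstance(merged[key], list) and isinstance(value, list):
            merged[key] = merged[key] + value
        else:
            raise ValueError(f"conflicting entries for {key!r}")
    return merged


# Two hypothetical data files, each with its own top-level labels.
file_a = json.loads('{"ChemicalProperties": {"atoms": [{"element": "C"}]}}')
file_b = json.loads(
    '{"ChemicalProperties": {"atoms": [{"element": "O"}]},'
    ' "Residues": {"ALA": {}}}'
)

combined = merge_trees(file_a, file_b)
```

A tool that needs several kinds of data would load each file and merge
them this way, without needing an include mechanism in the format.
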
>
> * How should this type of (extensible) data be represented on class
> level in the code (so we avoid one implementation for each tool that
> needs external data)? This should be independent of the file storage
> format - if some of the tools are IO-bound this will make it possible to
> later implement a binary alternative, possibly including automatic
> compilation from updated text files.
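
To make the "independent of the file storage format" point concrete,
here is a small sketch of one way to separate the in-memory data from
the backend that reads and writes it. Everything here is hypothetical
(DataBackend, JsonBackend, BinaryBackend, PropertyStore), and pickle is
only a stand-in for whatever binary format might be chosen later.

```python
import json
import pickle
from abc import ABC, abstractmethod


class DataBackend(ABC):
    """Converts between a plain dict tree and bytes on disk."""

    @abstractmethod
    def dumps(self, tree: dict) -> bytes: ...

    @abstractmethod
    def loads(self, blob: bytes) -> dict: ...


class JsonBackend(DataBackend):
    def dumps(self, tree):
        return json.dumps(tree, sort_keys=True).encode("utf-8")

    def loads(self, blob):
        return json.loads(blob.decode("utf-8"))


class BinaryBackend(DataBackend):
    # pickle is a placeholder for a real binary format.
    def dumps(self, tree):
        return pickle.dumps(tree)

    def loads(self, blob):
        return pickle.loads(blob)


class PropertyStore:
    """What tools see; it knows nothing about the file format."""

    def __init__(self, tree):
        self._tree = tree

    @classmethod
    def from_bytes(cls, blob, backend):
        return cls(backend.loads(blob))

    def to_bytes(self, backend):
        return backend.dumps(self._tree)

    def get(self, *path):
        """Walk the hierarchy, e.g. get("ChemicalProperties", "C")."""
        node = self._tree
        for key in path:
            node = node[key]
        return node


text = b'{"ChemicalProperties": {"C": {"mass": 12.011}}}'
store = PropertyStore.from_bytes(text, JsonBackend())

# "Compiling" the text file to binary is one load plus one dump.
compiled = store.to_bytes(BinaryBackend())
reloaded = PropertyStore.from_bytes(compiled, BinaryBackend())
```

Tools would only ever call PropertyStore.get, so swapping JSON for
something else touches one backend class and no tool code.
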
>
>
> Anything else we should consider?
>
> Cheers,
>
> Erik
>
>
> On Thu, May 12, 2016 at 5:22 PM, Berk Hess <hess at kth.se> wrote:
>
>     Hi,
>
>     I think David's question is not so much about our current data handled
>     by pdb2gmx, but rather for new data he needs to handle. We should of
>     course keep the whole picture in mind, but we should not end up in
>     endless topology formatting discussions, when all we need right away
>     is a more generic container type format organization.
>
>     Cheers,
>
>     Berk
>
>     On May 12, 2016 6:06 PM, Erik Lindahl <erik.lindahl at gmail.com> wrote:
>
>         Hi,
>
>         Rome wasn't built in a day, so no - I don't want the kitchen
>         sink initially. However, it would be good if we could decide on
>         some high-level principles and roughly how formats should later
>         be implemented hierarchically.
>
>         Where should comments and metadata go? Can we get a format where
>         it is trivial to combine multiple files if a new analysis tool
>         suddenly needs multiple types of data already implemented for
>         other programs (without writing a new parser)?  For instance, if
>         we were to rewrite pdb2gmx, roughly how should all those files
>         be organized?
>
>         With a bit of thought and a common parsing layer, it feels like
>         it should be much easier for any developer to start
>         adding/changing formats one-by-one instead of having a large
>         discussion involving everybody each time somebody needs to add a
>         file type!
>
>         Cheers,
>
>         Erik
>
>
>         On Thu, May 12, 2016 at 4:54 PM, Mark Abraham
>         <mark.j.abraham at gmail.com> wrote:
>
>             Hi,
>
>             What's the scope? If we want to write a wrapper layer around
>             some JSON code so people can use it for parsing input files,
>             while we have a way to replace the dependency with something
>             else in the future, then that's pretty much fine by me.
>
>             What I'm not keen on is the rabbit hole of converting all
>             our existing parameter-like file formats to some JSON
>             format. That amounts to re-writing lots of our setup tools.
>             That would likely be a big improvement in code quality, but
>             it's dozens of hours of input from quite a few people to
>             agree on how it should look, and then a few coding months to
>             write tests, re-implement the code, and get it reviewed.
>             This creates a bunch of friction for users, for no immediate
>             gain. Who's got the resources for that, and what's the big
>             payoff?
>
>             Mark
>
>             On Thu, May 12, 2016 at 5:25 PM David van der Spoel
>             <spoel at xray.bmc.uu.se> wrote:
>
>                 On 12/05/16 14:16, Erik Lindahl wrote:
>                  > Hi,
>                  >
>                  > While I’m all for JSON, the format per se isn’t
>                  > critical. What is needed is for somebody (or a couple of
>                  > people) to sit down and start working on a larger
>                  > framework for writing and reading all sorts of data, and
>                  > how to handle this in an abstract way. Then the actual
>                  > file format is simply a module that can be replaced if
>                  > we ever want to change it. But, this will require
>                  > volunteers!
>                  >
>                 This is what we discussed in the gerrit patch, to have a
>                 module on top of it that would form the API for the rest of
>                 the code. For me this is the most important thing to decide
>                 in gmx development in the near future.
>
>                  > Just picking a format and then having dozens of
>                  > modules all fire away with creating their own data
>                  > fields directly in that format doesn’t bring any more
>                  > portability than using raw text files, IMHO :-)
>                  >
>                  > Cheers,
>                  >
>                  > Erik
>                  >
>                  >
>                  >
>                  >
>                  >> On 12 May 2016, at 12:40, David van der Spoel
>                  >> <spoel at xray.bmc.uu.se> wrote:
>                  >>
>                  >> Hi,
>                  >>
>                  >> with the developer meeting coming up next week I
>                  >> would like to once again bring up the issue of what file
>                  >> format to use for database files. We have discussed this
>                  >> for over ten years and not having a decision is stopping
>                  >> innovation.
>                  >>
>                  >> I propose we vote on it at the meeting next week if
>                  >> we cannot reach consensus. Personally I am beyond
>                  >> caring which of the two as long as we make a decision -
>                  >> now we have nothing.
>                  >>
>                  >> Cheers,
>                  >> --
>                  >> David van der Spoel, Ph.D., Professor of Biology
>                  >> Dept. of Cell & Molec. Biol., Uppsala University.
>                  >> Box 596, 75124 Uppsala, Sweden. Phone: +46184714205.
>                  >> spoel at xray.bmc.uu.se    http://folding.bmc.uu.se
>                  >> --
>                  >> Gromacs Developers mailing list
>                  >>
>                  >> * Please search the archive at
>                 http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List
>                 before posting!
>                  >>
>                  >> * Can't post? Read
>                 http://www.gromacs.org/Support/Mailing_Lists
>                  >>
>                  >> * For (un)subscribe requests visit
>                  >>
>                 https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
>                 or send a mail to gmx-developers-request at gromacs.org
>                 <mailto:gmx-developers-request at gromacs.org>.
>                  >
>
>
>
>
>
>
>
>
>         --
>         --
>         Erik Lindahl <erik.lindahl at gmail.com>
>         Professor of Biophysics, Dept. Biochemistry & Biophysics,
>         Stockholm University
>         Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
>
>
>
>
>
>
>
>


-- 
David van der Spoel, Ph.D., Professor of Biology
Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone: +46184714205.
spoel at xray.bmc.uu.se    http://folding.bmc.uu.se

