[gmx-developers] libxml2 versus JSON
David van der Spoel
spoel at xray.bmc.uu.se
Thu May 12 21:13:46 CEST 2016
On 12/05/16 18:54, Erik Lindahl wrote:
> Hi,
>
> Yes, that's my point too :-) First, I'm all for just picking JSON right
> away, but there are a few additional things that are not rocket science,
> but that we also should settle on before hacking away to avoid each data
> file inventing its own standard. For instance:
>
> * How do we handle units? Lists?
I guess the three files now describing atom properties could be used as
a test case. Of course units have to be explicit.
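
A minimal sketch of what explicit units could look like, assuming a
hypothetical layout in which every numeric property is an object with
"value" and "unit" keys. The key names, the nesting under
"ChemicalProperties" and the numbers are illustrative only, not an
agreed format:

```python
import json

# Hypothetical atom-property entry; key names and values are illustrative only.
TEXT = """
{
  "ChemicalProperties": {
    "atoms": [
      {"name": "C",
       "mass": {"value": 12.011, "unit": "u"},
       "vdw_radius": {"value": 0.17, "unit": "nm"}}
    ]
  }
}
"""


def read_quantity(entry, key, expected_unit):
    """Return the numeric value of entry[key], refusing implicit or wrong units."""
    quantity = entry[key]
    if not isinstance(quantity, dict) or "unit" not in quantity:
        raise ValueError(f"{key}: unit must be given explicitly")
    if quantity["unit"] != expected_unit:
        raise ValueError(f"{key}: expected {expected_unit}, got {quantity['unit']}")
    return float(quantity["value"])


data = json.loads(TEXT)
carbon = data["ChemicalProperties"]["atoms"][0]
mass = read_quantity(carbon, "mass", "u")
radius = read_quantity(carbon, "vdw_radius", "nm")
```

A bare number such as "mass": 12.011 would be rejected by this reader,
which is the point of making units explicit.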
>
> * How do we specify any property that depends on the name/type of a
> particle, residue, or chain? How do we specify things that depend on
> pairwise combinations?
>
> * How do we handle text fields/comments that we might want to be able to
> read into Gromacs?
Sigh... All this stuff is easy in xml.
>
> * We should have some sort of metadata for all parameters that describe
> the source, and what program version or person who wrote it. How should
> references to scientific papers be specified so we don't have to parse
> them from free text format?
>
> * It would be good to decide on some labels on the very highest level as
> we implement things. Initially we might only have "ChemicalProperties",
> but that way it will be more straightforward to later add more labels so
> we get a nice hierarchical description - without having to decide all
> other labels now. If I understand JSON correctly, it does not have
> functionality for including a file into another, but if each file
> contains a hierarchy description of the data it might be possible to
> just concatenate them?
>
> * How should this type of (extensible) data be represented on class
> level in the code (so we avoid one implementation for each tool that
> needs external data)? This should be independent of the file storage
> format - if some of the tools are IO-bound this will make it possible to
> later implement a binary alternative, possibly including automatic
> compilation from updated text files.
>
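
To make the last two points concrete, here is a rough sketch of a
storage-independent container, assuming hypothetical class names
(DataReader, JsonReader, DataStore) that do not exist in the code. Two
JSON documents pasted back to back are not valid JSON, so the sketch
merges files after parsing, keyed on their top-level labels, rather
than concatenating text:

```python
import json
from abc import ABC, abstractmethod


class DataReader(ABC):
    """Hypothetical interface: turn text in some storage format into a dict."""

    @abstractmethod
    def parse(self, text):
        ...


class JsonReader(DataReader):
    """One possible backend; a binary or XML reader could replace it."""

    def parse(self, text):
        return json.loads(text)


class DataStore:
    """Hypothetical in-memory hierarchy keyed by top-level label."""

    def __init__(self, reader):
        self._reader = reader
        self._sections = {}

    def load(self, text):
        tree = self._reader.parse(text)
        if not isinstance(tree, dict):
            raise ValueError("top level must be a set of labelled sections")
        for label, content in tree.items():
            # Refuse silent overwrites when two files define the same label.
            if label in self._sections:
                raise ValueError(f"duplicate top-level label: {label}")
            self._sections[label] = content

    def labels(self):
        return sorted(self._sections)

    def section(self, label):
        return self._sections[label]


# Two separate "files", each with its own top-level label (names illustrative).
store = DataStore(JsonReader())
store.load('{"ChemicalProperties": {"atoms": []}}')
store.load('{"ResidueTypes": {"residues": []}}')
```

Tools would only talk to DataStore, so swapping the reader would not
touch them.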
>
> Anything else we should consider?
>
> Cheers,
>
> Erik
>
>
> On Thu, May 12, 2016 at 5:22 PM, Berk Hess <hess at kth.se> wrote:
>
> Hi,
>
> I think David's question is not so much about our current data handled
> by pdb2gmx, but rather for new data he needs to handle. We should of
> course keep the whole picture in mind, but we should not end up in
> endless topology formatting discussions, when all we need right away
> is a more generic container type format organization.
>
> Cheers,
>
> Berk
>
> On May 12, 2016 6:06 PM, Erik Lindahl <erik.lindahl at gmail.com> wrote:
>
> Hi,
>
> Rome wasn't built in a day, so no - I don't want the kitchen
> sink initially. However, it would be good if we could decide on
> some high-level principles and roughly how formats should later
> be implemented hierarchically.
>
> Where should comments and metadata go? Can we get a format where
> it is trivial to combine multiple files if a new analysis tool
> suddenly needs multiple types of data already implemented for
> other programs (without writing a new parser)? For instance, if
> we were to rewrite pdb2gmx, roughly how should all those files
> be organized?
>
> With a bit of thought and a common parsing layer, it feels like
> it should be much easier for any developer to start
> adding/changing formats one-by-one instead of having a large
> discussion involving everybody each time somebody needs to add a
> file type!
>
> Cheers,
>
> Erik
>
>
> On Thu, May 12, 2016 at 4:54 PM, Mark Abraham
> <mark.j.abraham at gmail.com> wrote:
>
> Hi,
>
> What's the scope? If we want to write a wrapper layer around
> some JSON code so people can use it for parsing input files,
> while we have a way to replace the dependency with something
> else in the future, then that's pretty much fine by me.
>
> What I'm not keen on is the rabbit hole of converting all
> our existing parameter-like file formats to some JSON
> format. That amounts to re-writing lots of our setup tools.
> That would likely be a big improvement in code quality, but
> it's dozens of hours of input from quite a few people to
> agree on how it should look, and then a few coding months to
> write tests, re-implement the code, and get it reviewed.
> This creates a bunch of friction for users, for no immediate
> gain. Who's got the resources for that, and what's the big
> payoff?
>
> Mark
>
> On Thu, May 12, 2016 at 5:25 PM David van der Spoel
> <spoel at xray.bmc.uu.se> wrote:
>
> On 12/05/16 14:16, Erik Lindahl wrote:
> > Hi,
> >
> > While I’m all for JSON, the format per se isn’t critical. What is
> > needed is for somebody (or a couple of people) to sit down and start
> > working on a larger framework for writing and reading all sorts of
> > data, and how to handle this in an abstract way. Then the actual file
> > format is simply a module that can be replaced if we ever want to
> > change it. But, this will require volunteers!
> >
> This is what we discussed in the gerrit patch, to have a module on top
> of it that would form the API for the rest of the code. For me this is
> the most important thing to decide in gmx development in the near
> future.
>
> > Just picking a format and then having dozens of modules all fire away
> > with creating their own data fields directly in that format doesn’t
> > bring any more portability than using raw text files, IMHO :-)
> >
> > Cheers,
> >
> > Erik
> >
> >
> >
> >
> >> On 12 May 2016, at 12:40, David van der Spoel <spoel at xray.bmc.uu.se> wrote:
> >>
> >> Hi,
> >>
> >> with the developer meeting coming up next week I would like to once
> >> again bring up the issue of what file format to use for database
> >> files. We have discussed this for over ten years and not having a
> >> decision is stopping innovation.
> >>
> >> I propose we vote on it at the meeting next week if we cannot reach
> >> consensus. Personally I am beyond caring which of the two as long as
> >> we make a decision - now we have nothing.
> >>
> >> Cheers,
> >> --
> >> David van der Spoel, Ph.D., Professor of Biology
> >> Dept. of Cell & Molec. Biol., Uppsala University.
> >> Box 596, 75124 Uppsala, Sweden. Phone: +46184714205.
> >> spoel at xray.bmc.uu.se http://folding.bmc.uu.se
> >> --
> >> Gromacs Developers mailing list
> >>
> >> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-developers_List
> before posting!
> >>
> >> * Can't post? Read
> http://www.gromacs.org/Support/Mailing_Lists
> >>
> >> * For (un)subscribe requests visit
> >>
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
> or send a mail to gmx-developers-request at gromacs.org
> <mailto:gmx-developers-request at gromacs.org>.
> >
>
>
>
>
>
>
> --
> --
> Erik Lindahl <erik.lindahl at gmail.com>
> Professor of Biophysics, Dept. Biochemistry & Biophysics,
> Stockholm University
> Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
>
>
>
>
>
>
>
>
--
David van der Spoel, Ph.D., Professor of Biology
Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone: +46184714205.
spoel at xray.bmc.uu.se http://folding.bmc.uu.se