[gmx-developers] libxml2 versus JSON
mark.j.abraham at gmail.com
Mon May 23 17:07:22 CEST 2016
I don't want to kick off a massive design discussion that will distract us
from fixing bugs and testing the beta, but I wanted to post a summary of
the discussion in Goettingen last week so people can digest and consider it
(particularly as David wasn't able to make it there).
When shown a very simple refdata XML file and a plausible sketch of
tip3p.itp as JSON, and asked how they'd prefer a future topol.top to look,
there was an overwhelming vote in favour of JSON. We technically didn't ask
for a preference for staying with .mdp for mdrun settings, but other
discussion identified .mdp as a high-value target, and most people seem to
agree that the friction from such a switch is well enough offset by the
kinds of benefits below. But once we've thought a bit more, we can throw
some examples out to discussion on gmx-users.
We've had a few recent bugs/fixes caused by insufficiently robust parsing
via sscanf (https://gerrit.gromacs.org/#/c/5588/,
https://gerrit.gromacs.org/#/c/5884/ come immediately to mind), so
separating code for syntax checks from code for simulation logic checks,
and from code for filling of data structures, would have obvious benefits to
everybody (and not just to people modifying GROMACS).
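To make the three-way separation concrete, here is a minimal Python sketch (Python only for brevity; GROMACS itself would do this in C++). The key names, the RunSettings struct and the short list of accepted integrators are assumptions chosen for illustration, not a proposed format.

```python
import json
from dataclasses import dataclass

# Hypothetical input; the keys are illustrative only.
TEXT = '{"integrator": "md", "nsteps": 5000, "dt": 0.002}'


@dataclass(frozen=True)
class RunSettings:
    integrator: str
    nsteps: int
    dt: float


def parse_syntax(text):
    """Stage 1: syntax only. json.loads raises json.JSONDecodeError on bad input."""
    return json.loads(text)


def check_logic(raw):
    """Stage 2: simulation-logic checks on values that already parsed cleanly."""
    errors = []
    # Illustrative subset of integrator names, not the full list.
    if raw.get("integrator") not in ("md", "sd", "steep"):
        errors.append("unknown integrator")
    nsteps = raw.get("nsteps")
    if not isinstance(nsteps, int) or nsteps < 0:
        errors.append("nsteps must be a non-negative integer")
    dt = raw.get("dt")
    if not isinstance(dt, (int, float)) or dt <= 0:
        errors.append("dt must be a positive number")
    return errors


def fill(raw):
    """Stage 3: fill the data structure; no checking happens here."""
    return RunSettings(raw["integrator"], raw["nsteps"], float(raw["dt"]))


def load(text):
    raw = parse_syntax(text)
    errors = check_logic(raw)
    if errors:
        raise ValueError("; ".join(errors))
    return fill(raw)


settings = load(TEXT)
```

Each stage can then be tested on its own, and a malformed file fails in stage 1 before any simulation logic sees it.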
YAML was suggested as an alternative - it seems better for humans to read
and write. Technically it is close to being a superset of JSON, which makes
it easier to design something nicer for humans to use (though harder for
computers to parse). yaml-cpp and yavl-cpp seem decent enough for the
read/write and validate tasks we respectively might want them to do (the
Boost dependency has been removed since their last release by moving to
C++11; builds use CMake).
Tradeoffs between YAML and JSON seem reasonably well covered at
TOML was suggested as an alternative - it looks nicer for a human to read,
but doesn't have schemas yet, so isn't useful for us.
Modules of mdrun (whether standard or via some kind of extension, which was
a frequent need for people at the Goettingen workshop) should be able to
declare their own mdp-style settings without modifying half a dozen files
needed to also support gmx check and dump along with the obvious grompp and
mdrun. A reasonably elegant way for them to do that is to register a schema
fragment, so grompp can do syntax checking. It would be nice also to
combine the fragments into a schema that perhaps editors can use to help
users edit files well. We might later merge grompp and mdrun for some uses,
but we can arrange e.g. for the build system to embed the schema into the
gmx binary rather than read and parse files to do string parsing.
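As a rough illustration of the schema-fragment idea, the Python sketch below lets each module register a fragment and combines them into one schema. The registry, the function names and the simplified settings (for instance a scalar tau-t) are all hypothetical, and the hand-rolled validator only understands the handful of JSON-Schema-like keywords used here; a real implementation would lean on a proper schema library.

```python
# Hypothetical registry: module name -> schema fragment for its settings.
_FRAGMENTS = {}

# Mapping from schema type names to Python types (note: bool passes as integer).
_TYPES = {"string": str, "number": (int, float), "integer": int, "object": dict}


def register_fragment(module, fragment):
    """Called once by each module to declare its own settings."""
    if module in _FRAGMENTS:
        raise ValueError(f"fragment for {module!r} already registered")
    _FRAGMENTS[module] = fragment


def combined_schema():
    """Merge all fragments into one schema, e.g. for editors or for embedding."""
    return {"type": "object", "properties": dict(_FRAGMENTS),
            "additionalProperties": False}


def validate(value, schema, path="$"):
    """Check type, enum, properties and additionalProperties only."""
    if not isinstance(value, _TYPES[schema["type"]]):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not in {schema['enum']}")
    if schema["type"] == "object":
        props = schema.get("properties", {})
        for key, sub in value.items():
            if key in props:
                errors.extend(validate(sub, props[key], f"{path}.{key}"))
            elif schema.get("additionalProperties", True) is False:
                errors.append(f"{path}.{key}: unknown setting")
    return errors


# Two modules declare their settings independently (simplified, illustrative).
register_fragment("temperature-coupling", {
    "type": "object", "additionalProperties": False,
    "properties": {"tcoupl": {"type": "string",
                              "enum": ["no", "berendsen", "v-rescale"]},
                   "tau-t": {"type": "number"}}})
register_fragment("pressure-coupling", {
    "type": "object", "additionalProperties": False,
    "properties": {"pcoupl": {"type": "string",
                              "enum": ["no", "berendsen", "parrinello-rahman"]},
                   "tau-p": {"type": "number"}}})

good = {"temperature-coupling": {"tcoupl": "v-rescale", "tau-t": 0.1}}
typo = {"temperature-coupling": {"tcuopl": "v-rescale"}}
```

Here validate(good, combined_schema()) returns an empty list, while the misspelt key in typo is reported as an unknown setting, which is the kind of syntax check grompp could do without knowing anything about the module itself.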
Probably we can produce a fully-written-out .mdp in the new format to go
into the .tpr as a flat string, and that might change the responsibility
for mdrun-vs-tpr version checking, hopefully usefully, but this might need
some thought. mdrun modules can do their own initialization by calling
routines that query the validated input string, filling their own structs
as necessary for convenience or perhaps performance. We have been moving
towards use of inputrec that is const very early in mdrun, but moving to
JSON gives us an opening to do this formally and to avoid, e.g., the
pressure-coupling module being able to read parameters from the
temperature-coupling module without the developer being forced to consider
that design issue up front.
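One way to picture that isolation, sketched in Python under stated assumptions: the ValidatedInput class, section_for and the PressureCoupling struct are hypothetical names, and the input is assumed to be already validated. Each module receives a read-only view of its own section only and copies what it needs into its own struct.

```python
from dataclasses import dataclass
from types import MappingProxyType


def freeze(value):
    """Recursively wrap dicts in read-only views and turn lists into tuples."""
    if isinstance(value, dict):
        return MappingProxyType({k: freeze(v) for k, v in value.items()})
    if isinstance(value, list):
        return tuple(freeze(v) for v in value)
    return value


class ValidatedInput:
    """Hypothetical holder for input that has already passed validation."""

    def __init__(self, settings):
        self._sections = freeze(settings)

    def section_for(self, module):
        # A module is handed only its own subtree, never the whole input.
        return self._sections[module]


@dataclass(frozen=True)
class PressureCoupling:
    pcoupl: str
    tau_p: float


def init_pressure_coupling(section):
    """Module-side initialization: fill a private struct from the section."""
    return PressureCoupling(section["pcoupl"], section["tau-p"])


inputs = ValidatedInput({
    "temperature-coupling": {"tcoupl": "v-rescale", "tau-t": 0.1},
    "pressure-coupling": {"pcoupl": "berendsen", "tau-p": 1.0},
})
pc = init_pressure_coupling(inputs.section_for("pressure-coupling"))
```

With this shape, reading a temperature-coupling parameter from the pressure-coupling module requires an explicit change to what the module is handed, so the design question comes up at that point rather than being silently bypassed.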
The external parsing library that we wrap for use by GROMACS will need good
support for schema handling, but I haven't identified a candidate for that
for JSON (David or Erik, did you?).
We'll want some code that transforms old .mdp into new format (whatever it
is). Maybe that will be a stripped version of grompp that persists for a
while as gmx convert-mdp or something.
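For a sense of scale, a bare-bones converter might look like the Python sketch below. It assumes only the basic .mdp shape of "key = value" lines with ";" starting a comment, emits JSON purely as an example target, and leaves multi-valued entries as plain strings; the function names are hypothetical, and a real tool would reuse grompp's own knowledge of each option.

```python
import json


def _convert(value):
    """Try int, then float; otherwise keep the value as a string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_mdp(text):
    """Read 'key = value' lines, dropping ';' comments and blank lines."""
    settings = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.split(";", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {lineno}: expected 'key = value'")
        settings[key.strip()] = _convert(value.strip())
    return settings


def mdp_to_json(text):
    return json.dumps(parse_mdp(text), indent=2, sort_keys=True)


OLD_MDP = """\
; a small illustrative input
integrator = md
nsteps     = 5000   ; number of steps
dt         = 0.002
tc-grps    = Protein SOL
"""
converted = mdp_to_json(OLD_MDP)
```

Note that this only handles syntax; deciding how entries such as tc-grps should be structured in the new format is exactly the design work still to be done.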
Happy bug fixing/hunting/testing ;-)