[gmx-developers] libxml2 versus JSON

Wed May 25 18:13:29 CEST 2016

Hi,

> On 25 May 2016, at 18:05, David van der Spoel <spoel at xray.bmc.uu.se> wrote:
>> 
>> 1) What grouping/nomenclature should we use for all stuff like this
>> (physical data) on the very highest level?
> Particle Properties
> |__> Particle Type
> |__> Particle Name
> |__> Property Type
> |__> Property Value
> |__> Property Unit
> |__> Reference
> 
> e.g.
> Particle Properties
> |__> Element
> |__> Cl
> |__> Electronegativity
> |__> 3.16
> |__> Volt
> |__> https://en.wikipedia.org/wiki/Electronegativity
> 

I guess the big question is whether we want all physical properties to be part of the particle property (in which case I guess they belong best in the force field?), or do we want separate blocks e.g. for electronegativity, scattering factors, etc?

Right now we have a bit of a mix, and maybe it’s easiest to continue that way (and maybe allow both options in the future). At least then we won’t have to move all the force fields to JSON just to get started using it.
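
To make the two options concrete, here is a rough sketch in Python of the same data in both layouts, with a helper that regroups one into the other. All key names (particle_properties, properties, and so on) and the function to_property_blocks are made up for illustration and are not an agreed format; the Cl entry, including its unit, is copied from David's example above.

```python
import json

# Layout A: every property is nested under its particle.
# Key names are hypothetical; the values come from the quoted example.
nested = {
    "particle_properties": [
        {
            "particle_type": "element",
            "particle_name": "Cl",
            "properties": [
                {
                    "type": "electronegativity",
                    "value": 3.16,
                    "unit": "Volt",  # unit as given in the quoted example
                    "reference": "https://en.wikipedia.org/wiki/Electronegativity",
                }
            ],
        }
    ]
}


def to_property_blocks(doc):
    """Regroup layout A into layout B: one separate block per property type."""
    blocks = {}
    for particle in doc["particle_properties"]:
        for prop in particle["properties"]:
            entry = {
                "particle_type": particle["particle_type"],
                "particle_name": particle["particle_name"],
                "value": prop["value"],
                "unit": prop["unit"],
                "reference": prop["reference"],
            }
            blocks.setdefault(prop["type"], []).append(entry)
    return blocks


blocks = to_property_blocks(nested)
print(json.dumps(blocks, indent=2))
```

Since one layout can be generated mechanically from the other, supporting both later should not lock us into either choice now.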

>> 
>> 2) How should we handle/classify (nota bene: not implement) different
>> scattering factors? They should likely go into separate objects, but
>> somebody should likely spend a little time making a list of the ones we
>> expect to implement, and then we can have a quick discussion in redmine
>> that it makes sense.
> In general it will be hard to predict everything we will ever use there, so the files have to be somewhat future-proof.

Right, we won’t predict everything, but if we at least try to come up with a few dozen things we might want (and consider how they would fit) I think we’ll make much better design choices.

>> 3) How do we handle the mapping of particles (in particular for
>> different force fields) to physical properties? I would guess that we
>> might want to use elements when we can (to make things FF independent),
>> but we also need ways to create arbitrary mappings, including wildcards.
>> How do we handle cases with multiple alternative names for a residue?
>> Some setups might use residue-like groups, but not all of them.
> Ideally we would use IUPAB names for atoms in proteins, with some kind of mapping for each force field. That would mean everything else could refer to IUPAB names. If we decide to keep the chaos in atom naming that comes with supporting all the force fields we do, then we should have one master file for scattering factors that is processed by a script to generate force-field-specific files. That could also keep track of strange residue names. It would be great to keep this kind of stuff outside the gmx binary.

I like IUPAB, in particular if we can get something that’s mostly generic (at least for all-atom force fields).
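
A minimal sketch of what such a script could look like, assuming a master table keyed by (residue, canonical atom name) and one name map per force field. The force field labels ff_a and ff_b, the property name scattering_length, the atom name HN and all numbers are placeholders, not real data or real naming rules.

```python
# Master table keyed by (residue, canonical atom name).
# Values are placeholders, not real scattering data.
master = {
    ("ALA", "H"): {"scattering_length": 1.0},
    ("ALA", "CA"): {"scattering_length": 2.0},
}

# Hypothetical per-force-field maps: canonical name -> force field name.
# Atoms that are not listed keep their canonical name.
name_maps = {
    "ff_a": {("ALA", "H"): "HN"},
    "ff_b": {},
}


def generate_table(master, name_map):
    """Build a force-field-specific table from the master table."""
    table = {}
    for (residue, atom), props in master.items():
        ff_atom = name_map.get((residue, atom), atom)
        key = (residue, ff_atom)
        if key in table:
            # Two canonical atoms mapped to the same name: refuse to guess.
            raise ValueError(f"duplicate entry for {key}")
        table[key] = props
    return table


tables = {ff: generate_table(master, m) for ff, m in name_maps.items()}
for ff, table in tables.items():
    print(ff, sorted(table))
```

The generated tables contain only explicit names, so the gmx binary would never need to know about the mapping itself.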
> 
> I would also strongly advise against wildcards. They are the best way to get irreproducible results.

Good point. Better to have a script that generates multiple entries when we absolutely need them.
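
Something along these lines might be enough, as a sketch only: a compact entry lists its alternative names explicitly, and the script expands it into one entry per (residue, atom) pair. The entry format, the function expand, the residue names and the value are all assumptions for illustration.

```python
from itertools import product

# Hypothetical compact entry: alternative residue names are spelled out,
# never matched by pattern. The value is a placeholder.
compact = [
    {"residues": ["HIS", "HID", "HIE"], "atoms": ["CA"], "value": 1.0},
]


def expand(entries):
    """Expand compact entries into one explicit entry per (residue, atom)."""
    expanded = {}
    for entry in entries:
        for residue, atom in product(entry["residues"], entry["atoms"]):
            key = (residue, atom)
            if key in expanded:
                # Overlapping entries would make the result order-dependent.
                raise ValueError(f"{key} is defined more than once")
            expanded[key] = entry["value"]
    return expanded


explicit = expand(compact)
for (residue, atom), value in sorted(explicit.items()):
    print(residue, atom, value)
```

Because every generated entry is explicit, the file that is actually read documents exactly which names were matched.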
>> 
>> 5) How do we handle literature references?
> In the input file.

I think we should avoid plain-text fields as much as we can, and instead design simple markup formats for things like this too.
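
As one possible shape for this, a reference could be a small object with a kind and a fixed set of required fields per kind, which a reader can check. The kinds, the field names and validate_reference below are invented for this sketch; only the URL is taken from the example earlier in the thread.

```python
# Hypothetical required fields for each kind of reference.
REQUIRED_FIELDS = {
    "url": {"url"},
    "article": {"authors", "title", "journal", "year"},
}


def validate_reference(ref):
    """Return True if ref has a known kind and all fields that kind needs."""
    kind = ref.get("kind")
    if kind not in REQUIRED_FIELDS:
        raise ValueError(f"unknown reference kind: {kind!r}")
    missing = REQUIRED_FIELDS[kind] - set(ref)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return True


reference = {
    "kind": "url",
    "url": "https://en.wikipedia.org/wiki/Electronegativity",
}
print(validate_reference(reference))
```

A structured entry like this can be validated and reformatted automatically, which a free-text citation cannot.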

My plan is to spend the next ~2 weeks fixing bugs for release-2016, but then a JSON module is among the things I’ll look into.

Cheers,

Erik