[gmx-developers] libxml2 versus JSON

Wed May 25 18:05:21 CEST 2016

On 25/05/16 17:13, Erik Lindahl wrote:
> Hi,
>
>> On 25 May 2016, at 16:40, David van der Spoel <spoel at xray.bmc.uu.se>
>> wrote:
>>>
>> You are absolutely right that my "approach" does not give us more than
>> independence from the underlying library. However my concern has been
>> for a while that waiting for one monster file format does not get us
>> further either. Fitting force field files, topologies and lots of
>> random data all in one structure is not the way forward either.
>>
>> But rather than whining I will propose a JSON schema for the WAXS
>> patch that is in gerrit, then we can take the discussion from there.
>>
>
> A couple of things we could probably decide on relatively easily (and
> these are probably all needed for the WAXS stuff):
>
>
> 1) What grouping/nomenclature should we use for all stuff like this
> (physical data) on the very highest level?
Particle Properties
|__> Particle Type
|__> Particle Name
|__> Property Type
|__> Property Value
|__> Property Unit
|__> Reference

e.g.
Particle Properties
|__> Element
|__> Cl
|__> Electronegativity
|__> 3.16
|__> Volt
|__> https://en.wikipedia.org/wiki/Electronegativity
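
As a minimal sketch of how the hierarchy above might look as JSON, the snippet below builds one record, checks that all six fields are present, and round-trips it through the standard json module. The key names (particle_type, property_unit, and so on) and the validate() helper are assumptions made for illustration, not an agreed schema.

```python
import json

# Hypothetical key names mirroring the six levels of the hierarchy above.
REQUIRED_KEYS = {
    "particle_type",
    "particle_name",
    "property_type",
    "property_value",
    "property_unit",
    "reference",
}

# The Cl example from the text, written as one entry in a list of properties.
record = {
    "particle_properties": [
        {
            "particle_type": "Element",
            "particle_name": "Cl",
            "property_type": "Electronegativity",
            "property_value": 3.16,
            "property_unit": "Volt",
            "reference": "https://en.wikipedia.org/wiki/Electronegativity",
        }
    ]
}


def validate(doc):
    """Raise ValueError if any entry lacks one of the six required fields."""
    for i, entry in enumerate(doc["particle_properties"]):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"entry {i} is missing: {sorted(missing)}")
    return True


# Serialize to JSON text and read it back; the data survives unchanged.
text = json.dumps(record, indent=2)
roundtrip = json.loads(text)
validate(roundtrip)
```

Keeping unit and reference as mandatory fields of every entry means a file that omits them is rejected at load time rather than silently accepted.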

>
> 2) How should we handle/classify (nota bene: not implement) different
> scattering factors? They should likely go into separate objects, but
> somebody should likely spend a little time making a list of the ones we
> expect to implement, and then we can have a quick discussion in redmine
> that it makes sense.
In general it will be hard to predict everything we will ever use there, 
so the files have to be somewhat future proof.
>
> 3) How do we handle the mapping of particles (in particular for
> different force fields) to physical properties? I would guess that we
> might want to use elements when we can (to make things FF independent),
> but we also need ways to create arbitrary mappings, including wildcards.
> How do we handle cases with multiple alternative names for a residue?
> Some setups might use residue-like groups, but not all of them.
Ideally we would use IUPAC names for atoms in proteins with some kind of 
mapping for each force field. That would mean everything else could 
refer to IUPAC names. If we decide to keep the chaos in atom naming that 
goes with supporting all force fields we do, then we should have one 
master file for scattering factors that is processed by a script to 
generate force field specific files. That could also keep track of 
strange residue names. It would be great to keep this kind of stuff 
outside the gmx binary.

I would also strongly advise against wild cards. Best way to get 
irreproducible results.
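
To illustrate the master-file idea, here is a rough sketch of a script that expands one master table into a force-field-specific table using only explicit name mappings, with no wildcards. The function generate_ff_table(), the table layout, and the example entries are hypothetical; the values are placeholder labels, not real scattering factors.

```python
# Hypothetical master table: (canonical residue, canonical atom) -> label of
# the scattering factor entry to use. Values are placeholders only.
master = {
    ("HIS", "CA"): "C",
    ("HIS", "ND1"): "N",
}

# Hypothetical per-force-field mapping: (FF residue, FF atom) -> canonical
# key. Every name is listed explicitly; nothing is matched by pattern.
ff_map = {
    ("HSD", "CA"): ("HIS", "CA"),
    ("HSD", "ND1"): ("HIS", "ND1"),
}


def generate_ff_table(master, ff_map):
    """Build a force-field-specific table from the master table.

    Raises KeyError if a force field name maps to a canonical key that the
    master table does not contain, so gaps are caught when the file is
    generated rather than at run time.
    """
    table = {}
    for ff_key, canonical_key in ff_map.items():
        if canonical_key not in master:
            raise KeyError(
                f"no master entry for {canonical_key} (mapped from {ff_key})"
            )
        table[ff_key] = master[canonical_key]
    return table


ff_table = generate_ff_table(master, ff_map)
```

Alternative residue names are handled by adding more explicit rows to the mapping, which keeps the generated file fully determined by its inputs.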

>
> 4) How will we handle units and possibly conversions?
Units should be explicit in the input file. We have code to convert 
units based on unit string. An unknown unit in the input file yields a 
fatal error.
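
A minimal sketch of that behaviour, assuming a simple lookup table of length units converted to nm. This is not the existing conversion code; the names TO_NM, UnknownUnitError and to_internal() are made up for the example.

```python
# Hypothetical table of conversion factors from a unit string to nm.
TO_NM = {
    "nm": 1.0,
    "A": 0.1,     # Angstrom
    "pm": 0.001,  # picometre
}


class UnknownUnitError(ValueError):
    """Stands in for the fatal error raised on an unrecognised unit."""


def to_internal(value, unit):
    """Convert value from the given unit string to nm.

    An unknown unit string is never guessed at; it raises immediately.
    """
    try:
        factor = TO_NM[unit]
    except KeyError:
        raise UnknownUnitError(f"unknown unit '{unit}' in input file") from None
    return value * factor


length_nm = to_internal(10.0, "A")
```

Failing hard on an unknown unit string avoids silently treating a value as if it were already in internal units.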

>
> 5) How do we handle literature references?
In the input file.

>
> Cheers,
>
> Erik
>
> PS: As both Teemu & I commented about a year ago, the old WAXS patch in
> gerrit suffers from a very large number of concepts in a single change. I
> added some ~120 comments a year ago that I think nobody got back to me about.
> I can kind of understand that given the complexity, but given the
> patch’s age and that we now are all C++, I would strongly recommend
> trying to break it up into smaller parts rather than extending it!
>

-- 
David van der Spoel, Ph.D., Professor of Biology
Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone:	+46184714205.
spoel at xray.bmc.uu.se    http://folding.bmc.uu.se