[gmx-developers] libxml2

Mon Nov 18 16:23:43 CET 2013

Looks like a good start! Thanks.

I'm not sure whether <gromacs xmlns:gmx="http://www.gromacs.org/schemas">
should name a schema file, or a place to look up a schema file. Does anyone
know? In the past I have seen XML files with namespace-specific content like

  <gmx:sfactors type="Fourier" force_field="any" displaced_solvent="true"
                reference="">
    <gmx:sfactor residue="ALA" atom="MW" type="1">

which I'm sure we'd like to avoid. Does David's suggestion achieve that?
Presumably we'd want something like xmllint to be able to validate against
the schema named in the XML file. Are we able to organize such a schema for
sfactor? I've never written one.
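A minimal sketch of such a W3C XML Schema for the sfactor data in David's draft. The element and attribute names come from the draft. The `quantity` type name, the attribute types (for example `xs:integer` for `type`) and the restriction of `sfactor` to `a0` and `q0` are assumptions, since the draft elides the other children. Because of `elementFormDefault="qualified"`, the schema expects every element to be in the `http://www.gromacs.org/schemas` namespace. Declaring that namespace as the default (`xmlns="..."`) achieves this without any `gmx:` prefixes, whereas `xmlns:gmx="..."` alone leaves the unprefixed elements in no namespace, so they would not match. Unlike a DTD, a schema can also require `a0` and `q0` to be numbers.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- sfactor.xsd (hypothetical file name): a sketch, not an agreed format. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://www.gromacs.org/schemas"
           targetNamespace="http://www.gromacs.org/schemas"
           elementFormDefault="qualified">

  <!-- A number carrying a unit attribute; used for a0 and q0. -->
  <xs:complexType name="quantity">
    <xs:simpleContent>
      <xs:extension base="xs:double">
        <xs:attribute name="unit" type="xs:string" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <xs:element name="gromacs">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="sfactors">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="sfactor" minOccurs="0" maxOccurs="unbounded">
                <xs:complexType>
                  <!-- Assumption: the elided children would be added here. -->
                  <xs:sequence>
                    <xs:element name="a0" type="quantity"/>
                    <xs:element name="q0" type="quantity"/>
                  </xs:sequence>
                  <xs:attribute name="residue" type="xs:string" use="required"/>
                  <xs:attribute name="atom" type="xs:string" use="required"/>
                  <xs:attribute name="type" type="xs:integer" use="required"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="type" type="xs:string" use="required"/>
            <xs:attribute name="force_field" type="xs:string" use="required"/>
            <xs:attribute name="displaced_solvent" type="xs:boolean" use="required"/>
            <xs:attribute name="reference" type="xs:string"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A file could then be checked with, e.g., `xmllint --noout --schema sfactor.xsd sfactors.xml` (both file names hypothetical). Because `xs:double` collapses surrounding whitespace, values written on their own lines, as in David's draft, still validate.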

Mark

On Sat, Nov 16, 2013 at 1:43 PM, David van der Spoel
<spoel at xray.bmc.uu.se> wrote:

> On 2013-11-14 06:18, Teemu Murtola wrote:
>
>> Hi all,
>>
>> just a few quick comments on some technical details.
>>
>>           > On 2013-11-11 02:43, Erik Lindahl wrote:
>>           >> I’m fine with having it as a hard dependency, provided we’ve
>>           >> had it compile automatically during installs for a while
>>           >> without complaints (it has been on by default for 4.6, right?).
>>
>>
>> We've had it on by default (if it is found) for ages, but the build system
>> has semi-silently dropped it from the build if it is not found, with no
>> user-visible consequences, so I'm not sure whether that qualifies as the
>> required testing.
>>
>>         On Nov 11, 2013 8:33 AM, "David van der Spoel"
>>         <spoel at xray.bmc.uu.se> wrote:
>>
>>          > In addition, for many small files you don't need a DTD or
>>          > schema (and in fact there isn't one for these XML files); it's
>>          > just that the libxml2 library demands you put it into the file.
>>          > If we're talking rtp files then that's another matter, where
>>          > more structure is needed.
>>
>>
>> Where does this "demand from libxml2" come from? The unit testing stuff
>> in master has been writing and reading XML files without DTDs or schemas
>> for years now using libxml2, and no one has reported any issues. So I
>> don't think there is any hard demand for a DTD in the XML file that it
>> parses. Additionally, I think that referencing a non-existent DTD file
>> serves no purpose whatsoever. I think that either you need to actually
>> write that DTD, or remove the reference.
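A sketch of what actually writing that DTD could look like for David's draft, as an internal subset so that no external DTD file has to exist. The element and attribute names come from the draft; the content model of `sfactor` (only `a0` and `q0`) and the `(true|false)` restriction on `displaced_solvent` are assumptions, since the draft elides the remaining children. DTDs predate XML Namespaces, so the sketch leaves the `xmlns` declaration out. `xmllint --noout --valid file.xml` checks a file against its DTD.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gromacs [
  <!-- Assumed structure, modelled on David's draft. -->
  <!ELEMENT gromacs (sfactors)>
  <!ELEMENT sfactors (sfactor*)>
  <!ATTLIST sfactors
            type              CDATA        #REQUIRED
            force_field       CDATA        #REQUIRED
            displaced_solvent (true|false) #REQUIRED
            reference         CDATA        #IMPLIED>
  <!-- Assumption: the real sfactor has more children than a0 and q0. -->
  <!ELEMENT sfactor (a0, q0)>
  <!ATTLIST sfactor
            residue CDATA #REQUIRED
            atom    CDATA #REQUIRED
            type    CDATA #REQUIRED>
  <!ELEMENT a0 (#PCDATA)>
  <!ATTLIST a0 unit CDATA #REQUIRED>
  <!ELEMENT q0 (#PCDATA)>
  <!ATTLIST q0 unit CDATA #REQUIRED>
]>
<gromacs>
  <sfactors type="Fourier" force_field="any" displaced_solvent="true"
            reference="">
    <sfactor residue="ALA" atom="MW" type="1">
      <a0 unit="e">10.0369</a0>
      <q0 unit="1/A">0</q0>
    </sfactor>
  </sfactors>
</gromacs>
```

A DTD can enforce structure and required attributes, but it treats `a0` and `q0` as plain text, so checking that they are numbers would still be left to the reading code.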
>>
>> On Wed, Nov 13, 2013 at 8:57 PM, David van der Spoel
>> <spoel at xray.bmc.uu.se> wrote:
>>
>>         The thing is that for small files it doesn't matter: neither DTD
>>         nor schema is used if you don't need it. I still have a hard
>>         time comprehending why we would like to mix e.g. simulation data
>>         with all possible other stuff.
>>
>>
>> I think others' point is that without DTD or Schema validation, you need
>> to write a lot of validation stuff yourself, which is a lot of code, or
>> just live with the fact that all kinds of malformed input can get
>> accepted, which isn't much better than text files.
>>
>>         Just checked libxml++, but that introduces another dependency, so
>>         that's out. I will draft a gromacs frontend in C++ for libxml2
>>         with just a subset of the functionality. There is however one
>>         issue: XML can be read in two fashions, using the DOM (Document
>>         Object Model) and using SAX (Simple API for XML). Until now I
>>         have used the DOM, which reads a whole document into memory, but
>>         the memory usage can be prohibitive. SAX should therefore be the
>>         preferred route. Any comments on that?
>>
>>
>> I think that unless we need to read very big XML files, DOM is a lot
>> more flexible. Parsing more complex data structures with SAX requires
>> that the code receiving the SAX callbacks be a relatively complex state
>> machine, as it needs to construct all the data structures incrementally.
>> Code with the same level of functionality and modularity is probably a
>> lot easier to write and understand if written using DOM. If you want to
>> keep the ability to not load the whole document, using the third option
>> in libxml2, the reader API, is probably a better idea.
>>
>> It would be nice if the frontend would also be able to abstract away the
>> current usage of libxml2 in src/testutils/refdata.cpp. However, that is
>> perhaps quite different from how most other Gromacs code will use it,
>> so it may not be the highest priority. It is already quite well
>> encapsulated in this single file. But this is something to think about
>> in the design.
>>
>> Teemu
>>
>>
> Some progress. How's this:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <gromacs xmlns:gmx="http://www.gromacs.org/schemas">
>   <sfactors type="Fourier" force_field="any" displaced_solvent="true"
>             reference="">
>     <sfactor residue="ALA" atom="MW" type="1">
>       <a0 unit="e">
>         10.0369
>       </a0>
>       <q0 unit="1/A">
>         0
>       </q0>
> ....
>   </sfactors>
> </gromacs>
>
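A sketch of the same document with the namespace declared as the default, so that every element is namespace-qualified without a `gmx:` prefix. Under the Namespaces in XML recommendation, the namespace name is only an identifier and need not resolve to a file; a hint for where to find a schema goes in `xsi:schemaLocation`, which lists namespace/location pairs. The file name `sfactor.xsd` is hypothetical, and the single closed `sfactor` entry stands in for the elided part of the draft.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Default namespace: every unprefixed element below belongs to it. -->
<!-- sfactor.xsd is a hypothetical schema file name. -->
<gromacs xmlns="http://www.gromacs.org/schemas"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.gromacs.org/schemas sfactor.xsd">
  <sfactors type="Fourier" force_field="any" displaced_solvent="true"
            reference="">
    <sfactor residue="ALA" atom="MW" type="1">
      <a0 unit="e">10.0369</a0>
      <q0 unit="1/A">0</q0>
    </sfactor>
  </sfactors>
</gromacs>
```

Unprefixed attributes are not affected by the default namespace declaration; they stay unqualified, which is what a schema with the default `attributeFormDefault` expects.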
>
>
> --
> David van der Spoel, Ph.D., Professor of Biology
> Dept. of Cell & Molec. Biol., Uppsala University.
> Box 596, 75124 Uppsala, Sweden. Phone:  +46184714205.
> spoel at xray.bmc.uu.se    http://folding.bmc.uu.se
> --
> gromacs.org_gmx-developers mailing list
> gromacs.org_gmx-developers at maillist.sys.kth.se
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-developers
>
> Please don't post (un)subscribe requests to the list. Use the
> www interface or send it to gmx-developers-request at gromacs.org.
>