[gmx-developers] libxml2

Thu Nov 14 06:18:13 CET 2013

Hi all,

just a few quick comments on some technical details.

> On 2013-11-11 02:43, Erik Lindahl wrote:
>> I’m fine with having it as a hard dependency, provided we’ve had it
>> compile automatically during installs for a while without complaints
>> (it has been on by default for 4.6, right?).
>
For ages it has been enabled by default whenever it is found, but the build
system has semi-silently dropped it from the build when it is not found,
with no user-visible consequences, so I'm not sure whether that qualifies
as the required testing.

> On Nov 11, 2013 8:33 AM, "David van der Spoel" <spoel at xray.bmc.uu.se>
> wrote:
>> In addition, for many small files you don't need a dtd or schema (and
>> in fact there isn't one for these xml files), it's just that the libxml2
>> library demands you put it into the file. If we're talking rtp files
>> then that's another matter where more structure is needed.
>
Where does this "demand from libxml2" come from? The unit testing stuff in
master has been writing and reading XML files without DTDs or schemas for
years now using libxml2, and no one has reported any issues. So I don't
think there is any hard demand for a DTD in the XML file that it parses.
Additionally, I think that referencing a non-existent DTD file serves no
purpose whatsoever. I think that either you need to actually write that
DTD, or remove the reference.
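
To illustrate the point that a non-validating parse only needs well-formed
input, here is a minimal sketch. It uses Python's standard-library
xml.dom.minidom as a stand-in for libxml2, so that it is self-contained, and
the element and attribute names are made up for the example rather than taken
from the actual reference data files.

```python
import xml.dom.minidom

# Hypothetical reference-data snippet: no DOCTYPE and no schema reference.
XML_TEXT = """<?xml version="1.0"?>
<ReferenceData>
  <Real Name="energy">-1.5</Real>
  <Int Name="steps">100</Int>
</ReferenceData>
"""


def read_values(text):
    """Parse without any DTD and return {Name attribute: element text}."""
    doc = xml.dom.minidom.parseString(text)
    values = {}
    for node in doc.documentElement.childNodes:
        # Skip the whitespace text nodes between the child elements.
        if node.nodeType == node.ELEMENT_NODE:
            values[node.getAttribute("Name")] = node.firstChild.data
    return values


values = read_values(XML_TEXT)
# The parsed document carries no document type declaration at all.
has_doctype = xml.dom.minidom.parseString(XML_TEXT).doctype is not None
```

The parse succeeds because the input is well-formed; nothing in it names a
DTD, and the parser does not ask for one.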

On Wed, Nov 13, 2013 at 8:57 PM, David van der Spoel <spoel at xray.bmc.uu.se>
 wrote:

> The thing is that for small files it doesn't matter, neither DTD nor
> Schema is used if you don't need it. I still have a hard time comprehending
> why we would like to mix e.g. simulation data with all possible other stuff.
>
I think others' point is that without DTD or Schema validation, you either
need to write the validation yourself, which is a lot of code, or just
live with the fact that all kinds of malformed input can get accepted,
which isn't much better than text files.
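
As a rough idea of what "writing the validation yourself" looks like, the
sketch below hand-codes a few checks that a schema (or, for the purely
structural part, a DTD) would otherwise state declaratively. It uses Python's
standard-library xml.etree.ElementTree instead of libxml2, and the <atoms>
format is invented for the example.

```python
import xml.etree.ElementTree as ET


def validate_atoms(text):
    """Return a list of problems found in a hypothetical <atoms> document."""
    errors = []
    root = ET.fromstring(text)  # this only guarantees well-formedness
    if root.tag != "atoms":
        errors.append("root element must be <atoms>")
    for i, child in enumerate(root):
        if child.tag != "atom":
            errors.append(f"child {i}: unexpected element <{child.tag}>")
            continue
        if "name" not in child.attrib:
            errors.append(f"child {i}: missing 'name' attribute")
        try:
            float(child.attrib.get("charge", ""))
        except ValueError:
            errors.append(f"child {i}: 'charge' is missing or not a number")
    return errors


good = '<atoms><atom name="OW" charge="-0.82"/></atoms>'
bad = '<atoms><atom charge="abc"/><bond/></atoms>'
good_errors = validate_atoms(good)
bad_errors = validate_atoms(bad)
```

Even for this tiny format the checks already outweigh the parsing call, and
every new element or attribute adds more of them.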

> Just checked libxml++ but that introduces another dependency so that's out. I
> will draft a gromacs frontend in C++ for libxml2 with just a subset of the
> functionality. There is however one issue: XML can be read in two fashions,
> using the DOM (Document Object Model) and using SAX (Simple API for XML).
> Until now I have used the DOM, which reads a whole document into memory,
> but the memory usage can be prohibitive. SAX should therefore be the
> preferred route. Any comments on that?
>
I think that unless we need to read very big XML files, DOM is a lot more
flexible. Parsing more complex data structures with SAX requires the handler
that receives all the SAX callbacks to be a relatively complex state
machine, as it needs to construct all the data structures incrementally.
Code with the same level of functionality and modularity is probably a lot
easier to write and understand if written using DOM. If you want to keep the
ability to not load the whole document, using the third option in libxml2,
the reader API, is probably a better idea.
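
The difference can be seen even on a toy input. The sketch below reads the
same made-up <residues> document twice, once through SAX callbacks and once
through an in-memory tree. It uses Python's standard-library xml.sax and
xml.etree.ElementTree rather than libxml2, purely so that it is
self-contained; ElementTree is a tree API in the spirit of DOM, not the W3C
DOM itself.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical input, invented for this comparison.
XML_TEXT = b"""<residues>
  <residue name="SOL"><atom name="OW"/><atom name="HW1"/></residue>
  <residue name="NA"><atom name="NA"/></residue>
</residues>"""


class ResidueHandler(xml.sax.ContentHandler):
    """SAX: the handler must remember where it is between callbacks."""

    def __init__(self):
        super().__init__()
        self.residues = {}
        self.current = None  # state: name of the residue we are inside

    def startElement(self, name, attrs):
        if name == "residue":
            self.current = attrs["name"]
            self.residues[self.current] = []
        elif name == "atom" and self.current is not None:
            self.residues[self.current].append(attrs["name"])

    def endElement(self, name):
        if name == "residue":
            self.current = None


def read_with_sax(data):
    handler = ResidueHandler()
    xml.sax.parseString(data, handler)
    return handler.residues


def read_with_tree(data):
    """Tree-based: the document is in memory, so nesting is followed directly."""
    root = ET.fromstring(data)
    return {r.get("name"): [a.get("name") for a in r.findall("atom")]
            for r in root.findall("residue")}


sax_result = read_with_sax(XML_TEXT)
tree_result = read_with_tree(XML_TEXT)
```

Both functions produce the same mapping, but the SAX version already needs
one piece of explicit state for a single level of nesting; each further level
or optional element adds more.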

It would be nice if the frontend could also abstract away the current usage
of libxml2 in src/testutils/refdata.cpp. However, that usage is perhaps
quite different from how most other Gromacs code will use the library, so
it may not be the highest priority, and it is already quite well
encapsulated in this single file. But this is something to think about in
the design.

Teemu