[gmx-developers] libxml2

Erik Lindahl erik.lindahl at scilifelab.se
Mon Nov 11 12:47:57 CET 2013


Hi,

On 10 Nov 2013, at 23:33, David van der Spoel <spoel at xray.bmc.uu.se> wrote:
> 
> I guess this will prevent us from using xml in practice. We have 
> discussed xml for ten years or so, but the transition to xml schema is a 
> real show stopper. I don't have the time to learn that as well. Does 
> that imply I should stop developing? In addition, for many small files 
> you don't need a dtd or schema (and in fact there isn't one for these 
> xml files), it's just that the libxml2 library demands you put it into 
> the file. If we're talking rtp files then that's another matter where 
> more structure is needed.

I think the ability to validate the contents of a file is the core concept we want from XML. An XML file that doesn’t have any DTD or schema is just a text file that looks fancier: you can add illegal data anywhere, and then you rely only on the internal logic of the program reading it to catch your error (or not). That won’t really be much safer than our current text files.
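
To make the point concrete, here is a minimal sketch in Python (not GROMACS code). It uses only the standard library, which has no DTD or schema validator, so a small hand-written rule table stands in for a real schema; the element names, attributes, and the `validate` helper are all hypothetical. The parser happily accepts the broken file because it is well-formed; only the explicit validation step notices the problems.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules standing in for a schema:
# element name -> attributes that must be present.
ALLOWED = {
    "residues": set(),
    "residue": {"name"},
    "atom": {"name", "charge"},
}

def validate(xml_text):
    """Return a list of problems; an empty list means the file passed."""
    problems = []
    root = ET.fromstring(xml_text)  # checks well-formedness only
    for elem in root.iter():
        if elem.tag not in ALLOWED:
            problems.append(f"unexpected element <{elem.tag}>")
            continue
        for attr in sorted(ALLOWED[elem.tag] - set(elem.attrib)):
            problems.append(f"<{elem.tag}> is missing attribute '{attr}'")
        if elem.tag == "atom" and "charge" in elem.attrib:
            try:
                float(elem.attrib["charge"])
            except ValueError:
                problems.append(f"<atom> charge {elem.attrib['charge']!r} is not a number")
    return problems

good = '<residues><residue name="ALA"><atom name="CA" charge="0.1"/></residue></residues>'
bad = '<residues><residue name="ALA"><atom name="CA" charge="abc"/><bond/></residue></residues>'

# Both strings parse without error; only validation tells them apart.
for label, text in (("good", good), ("bad", bad)):
    print(label, validate(text))
```

With a real schema the rule table would live in the schema file and be enforced by the XML library, rather than being re-implemented in every tool that reads the file.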

Learning to write a schema for a simple file takes less than an hour, and there are even free DTD-to-schema converters. Obviously, it will still be a lot of work to write an advanced schema, e.g. for topologies, but I don’t think that’s on the table right now. However, just as class design is a pain for all of us (well, maybe not Teemu :-), the reason for doing it is that it will save time for all developers and lead to fewer bugs in the long run.

> 
> Some other points, like having clear names and units I do agree with and 
> can change it in my present application.
> 
> Common modules for writing and reading implies that all possible data 
> should be merged into one or a few monster formats. This in itself will 
> create extra problems.

Well, it doesn’t necessarily have to be _one_ single format, but I think it is a far better solution to standardize on how we do it rather than have ~20 tools each invent their own structure for storing and reading data. That is what we have right now with the text files...

> As for changing names of files, this shouldn't be necessary as one 
> should be able to see from the content what kind of file this is. No 
> strong feelings here but it would be very confusing to add many new 
> file names.

If we have a good namespace structure we can probably get by without new file names. However, at some point we have to consider how to separate the topology XML file from the mdp XML file in each directory.
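
As a rough sketch of how a namespace could identify the kind of file from its content alone (Python standard library only; the namespace URIs and the `file_kind` helper are made up for illustration and nothing like them is defined anywhere yet):

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, one per kind of file.
KINDS = {
    "http://example.org/gmx/topology": "topology",
    "http://example.org/gmx/mdp": "mdp",
}

def file_kind(xml_text):
    """Guess the kind of file from the namespace of its root element."""
    root = ET.fromstring(xml_text)
    # ElementTree reports a namespaced tag as "{uri}localname".
    if root.tag.startswith("{"):
        uri = root.tag[1:].split("}", 1)[0]
        return KINDS.get(uri, "unknown")
    return "unknown"

print(file_kind('<system xmlns="http://example.org/gmx/topology"/>'))
print(file_kind('<parameters xmlns="http://example.org/gmx/mdp"/>'))
print(file_kind('<parameters/>'))
```

This only tells a program what it has been handed; it does not help a person looking at a directory listing, which is where the file-naming question comes back.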

> @Mark: an extra layer wouldn't help would it - there is no competing 
> package as far as I know. There is, however, libxml++, a C++ wrapper 
> around libxml2, which is slightly more logical to use in C++ code, but 
> it would imply an extra library. On the other hand that might function 
> as a thin wrapper around the library.

I know of at least Expat and MSXML, and quickly also found Mini-XML, Xerces, AsmXml and RapidXml, where the last two claim parsing speeds an order of magnitude faster than libxml2. I see no particular reason for using any of those libraries today, but this sounds like exactly the same situation as when we originally saw no reason to support any FFT library other than FFTW :-)

Cheers,

Erik
