[gmx-developers] libxml2

Mark Abraham mark.j.abraham at gmail.com
Wed Nov 13 15:00:38 CET 2013


On Nov 11, 2013 8:33 AM, "David van der Spoel" <spoel at xray.bmc.uu.se> wrote:
>
> On 2013-11-11 02:43, Erik Lindahl wrote:
>>
>> Hi,
>>
>> I’m fine with having it as a hard dependency, provided we’ve had it
>> compile automatically during installs for a while without complaints (it
>> has been on by default for 4.6, right?).
>>
>> However, I also sat down and looked a bit at the XML files in David’s
>> patch, and this made me realize we need a broader approach.
>>
>> Just introducing XML is not going to help us much, in particular not if
>> we just add a generic “XML” file type. This would be like merely having
>> “BIN” and “ASCII” types for all other files. Before we know it, we are
>> going to have a dozen XML formats for different programs that have
>> nothing to do with each other, and there will be lots of input/output
>> routines in all programs processing them just-so-slightly differently.
>> In addition, adding XML tags instead of relying on tabs/space/newline is
>> of course a small step forward, but just a very small one - to really
>> fix things we need to make things even more structured.
>>
>> Some things I would like to see before we start using XML:
>>
>> 1) We need proper namespaces and sub-namespaces, so we can tell
>> different XML components from each other. This will also require us to
>> think a bit about information in general, even if we don’t implement all
>> components from the start. There are going to be lots of places where we
>> specify information on a residue and atom basis - how should all these
>> relate to each other?  When are things forcefield-specific vs. general,
>> and when should they go in the same vs. different files?
>>
>> 2) I think it makes a lot of sense to separate different XML files, so a
>> future mdp replacement might have extension xmdp, while the XML topology
>> has extension xtop. We should still be able to merge all of them in a
>> single file (fine with namespaces), but this will avoid the problems
>> when we specify an XML mdp file where a program was expecting an XML top
>> file (in other words, no generic “XML” file format that can contain
>> anything).
>>
>> 3) We need to think through naming carefully. In particular: No custom
>> abbreviations unless it is really necessary. We should also use proper
>> names for types and similar settings, rather than merely translating our
>> old integer selectors to XML.
>>
>> 4) For any measurement, we should have units.
>>
>> 5) To enable XSLT transformations and better namespace handling, I think
>> we should standardize on (and require) schema descriptions for
>> validation, rather than the older DTDs.
>>
>> 6) We need some good common modules for reading/writing generic
>> structured data, so the actual files are isolated from the programs
>> using them.
>>
>>
>>
>> Some of this will take time, but my worry about pushing ahead and
>> starting to use XML anyway for individual programs is that it could
>> easily create the same kind of divergent mess we’ve had with the
>> current text files.
>>
>
> I guess this will prevent us from using xml in practice. We have
> discussed xml for ten years or so, but the transition to xml schema is a
> real show stopper. I don't have the time to learn that as well. Does that
> imply I should stop developing? In addition, for many small files you don't
> need a dtd or schema (and in fact there isn't one for these xml files),
> it's just that the libxml2 library demands you put it into the file. If
> we're talking rtp files then that's another matter where more structure is
> needed.

I agree with the desire to have a schema with which to validate and parse,
or the exercise just reduces to user-space punctuation spam. However, it's
too late now to embark on a wholesale schema design for 5.0. Even if it
were not, that schema would have to evolve as we learned our needs better.
So I suggest we/David come up with something that meets the present need
and is reasonably likely to work in a future context. Probably that's just
a matter of agreeing on a namespace name or two. I'm unsure of the
technical merits of schemas vs DTDs, but if we are going to throw out the
initial schema later anyway, it might as well be a DTD now, if that is
easier.
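As a rough illustration of what agreeing on "a namespace name or two" could buy, here is a minimal C sketch. It shows how a tool might tell XML document kinds apart by the namespace of the root element rather than by file extension. The namespace URIs, the enum and the function names are all hypothetical placeholders (no names had been agreed), and this is not existing GROMACS code. With libxml2, the URI would presumably come from the root element's `ns->href` field when `ns` is non-NULL (a `const xmlChar *`, cast to `const char *`).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical namespace URIs: placeholders only, no real names were agreed. */
#define GMX_NS_TOPOLOGY "http://www.example.org/gromacs/topology"
#define GMX_NS_MDP      "http://www.example.org/gromacs/mdp"

typedef enum
{
    GMX_XML_UNKNOWN = 0,
    GMX_XML_TOPOLOGY,
    GMX_XML_MDP
} gmx_xml_kind_t;

/* Map the namespace URI of a document's root element to the kind of
 * content it holds. ns_uri may be NULL (root element without a namespace). */
gmx_xml_kind_t gmx_xml_kind_from_namespace(const char *ns_uri)
{
    static const struct
    {
        const char    *uri;
        gmx_xml_kind_t kind;
    } known[] = {
        { GMX_NS_TOPOLOGY, GMX_XML_TOPOLOGY },
        { GMX_NS_MDP, GMX_XML_MDP },
    };

    if (ns_uri == NULL)
    {
        return GMX_XML_UNKNOWN;
    }
    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++)
    {
        if (strcmp(ns_uri, known[i].uri) == 0)
        {
            return known[i].kind;
        }
    }
    return GMX_XML_UNKNOWN;
}

/* Reject a document of the wrong kind, e.g. an mdp-style file passed to a
 * tool that expects a topology. Returns 0 if the kind matches, -1 otherwise. */
int gmx_xml_check_kind(const char *ns_uri, gmx_xml_kind_t expected)
{
    assert(expected != GMX_XML_UNKNOWN);
    return gmx_xml_kind_from_namespace(ns_uri) == expected ? 0 : -1;
}
```

Checking the namespace identifies the file from its content, as David suggests. It also still lets a topology tool refuse an mdp-style document, which is the problem Erik's separate extensions are meant to avoid.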

> Some other points, like having clear names and units, I do agree with and
> can change in my present application.
>
> Common modules for writing and reading imply that all possible data
> should be merged into one or a few monster formats. This in itself will
> create extra problems.
>
> As for changing file names, this shouldn't be necessary, as one should
> be able to see from the content what kind of file this is. No strong
> feelings here, but it would be very confusing to add many new file names.
>
> @Mark: an extra layer wouldn't help, would it? There is no competing
> package as far as I know. There is, however, libxml++, a C++ wrapper around
> libxml2, which is slightly more logical to use in C++ code, but it would
> imply an extra library. On the other hand, that might function as a thin
> wrapper around the library.

I'm seeking a GROMACS-implemented function layer so that, e.g., analysis
tool modules do not each include a libxml2 header and hard-code calls to
its API. For example, the PME code does not call FFTW directly; it calls
wrapper code that handles the differences between the supported FFT
libraries. This style delivers lots of benefits, and we do the same for
XDR. For XML, we probably
only need open, close, validate, read, parse and xpath-query functions.
Maybe some writing routines at some point. These functions only need to
pass through the arguments for now, but details depend on the actual use.
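Here is a minimal C sketch of such a layer, assuming a backend table in the spirit of the FFT wrapper. Callers only see the `gmx_xml_*` functions and a document handle, and a backend supplies pass-through implementations. All names are hypothetical, and only open, close, validate and an XPath query are covered. The stub backend exists only so callers can be exercised without any XML library. A libxml2 backend would presumably wrap calls such as `xmlReadFile`, `xmlFreeDoc`, `xmlSchemaValidateDoc` and `xmlXPathEvalExpression`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical backend interface. A libxml2 backend would fill this table
 * with thin pass-through functions; analysis modules never include it. */
typedef struct gmx_xml_backend
{
    const char *name;
    void *(*open_doc)(const char *filename);          /* NULL on failure */
    void (*close_doc)(void *impl);
    int (*validate)(void *impl, const char *schema);  /* 0 if valid */
    int (*xpath_count)(void *impl, const char *expr); /* matches, -1 on error */
} gmx_xml_backend_t;

/* Would be opaque in a real header; defined here to keep the sketch in one file. */
typedef struct gmx_xml_doc
{
    const gmx_xml_backend_t *backend;
    void                    *impl; /* backend-owned document, e.g. an xmlDocPtr */
} gmx_xml_doc_t;

gmx_xml_doc_t *gmx_xml_open(const gmx_xml_backend_t *backend, const char *filename)
{
    assert(backend != NULL);
    void *impl = backend->open_doc(filename);
    if (impl == NULL)
    {
        return NULL;
    }
    gmx_xml_doc_t *doc = malloc(sizeof(*doc));
    if (doc == NULL)
    {
        backend->close_doc(impl);
        return NULL;
    }
    doc->backend = backend;
    doc->impl    = impl;
    return doc;
}

void gmx_xml_close(gmx_xml_doc_t *doc)
{
    if (doc != NULL)
    {
        doc->backend->close_doc(doc->impl);
        free(doc);
    }
}

int gmx_xml_validate(gmx_xml_doc_t *doc, const char *schema)
{
    return doc->backend->validate(doc->impl, schema);
}

int gmx_xml_xpath_count(gmx_xml_doc_t *doc, const char *expr)
{
    return doc->backend->xpath_count(doc->impl, expr);
}

/* Stub backend: "opens" any non-empty name and reports a valid document
 * with no matches, so callers can be tested without an XML library. */
static void *stub_open(const char *filename)
{
    if (filename == NULL || filename[0] == '\0')
    {
        return NULL;
    }
    size_t len  = strlen(filename) + 1;
    char  *copy = malloc(len);
    if (copy != NULL)
    {
        memcpy(copy, filename, len);
    }
    return copy;
}

static void stub_close(void *impl) { free(impl); }

static int stub_validate(void *impl, const char *schema)
{
    (void)impl;
    return schema != NULL ? 0 : -1;
}

static int stub_xpath_count(void *impl, const char *expr)
{
    (void)impl;
    return expr != NULL ? 0 : -1;
}

const gmx_xml_backend_t gmx_xml_stub_backend = {
    "stub", stub_open, stub_close, stub_validate, stub_xpath_count
};
```

Changing the parser (or moving to libxml++) would then mean writing a new backend table rather than touching every caller. Read, parse and writing routines could be added to the table in the same way once an actual use needs them.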

> Finally: The transition to C++ is hard enough on most developers - I have
> been struggling with it over the last year, and slowly learning with lots
> of helpful comments from Teemu. Let's try to keep life as simple as we can
> - but not simpler.

Of course. Equally, trying to do things to save future-us pain has costs
now!

Mark

>> Cheers,
>>
>> Erik
>>
>>
>>
>>
>> On 10 Nov 2013, at 15:34, Mark Abraham <mark.j.abraham at gmail.com> wrote:
>>
>>> My experience of libxml2 has been favourable. I'm happy with a
>>> dependency on it, but someone needs to identify a version (preferably
>>> one that is known to be in package repos and/or has binaries
>>> available on the web). I would suggest we implement the dependency
>>> roughly as we do for FFTW:
>>>
>>> * the install guide drops suitable hints to go get libxml2-dev(el)
>>> from your favourite repo (note that libxml2 might be installed by
>>> default, but we might need the #include headers that are only in the
>>> -dev or -devel packages!)
>>> * CMake detects if those exist in CMAKE_PREFIX_PATH, and gives a fatal
>>> error if not found.
>>> * the fatal error can be avoided either by letting the user supply a
>>> libxml2 tarball (e.g. so we can test in Jenkins also), or by using cmake
>>> -DGMX_BUILD_OWN_LIBXML2 to do the same download-and-build thing we do
>>> for FFTW.
>>>
>>> Even if legal, I'm not so keen on bundling the libxml2 tarball at
>>> ~5MB, when gromacs is ~10MB. Bundling just the headers we need in
>>> order to use a system libxml2 might be a good option.
>>>
>>> The proposed bump to require CMake version 2.8.8 in Redmine/Gerrit
>>> should make this a little smoother than it has been in the past.
>>>
>>> I think there should be a wrapper layer between libxml2 and the
>>> GROMACS code that uses it, so that we have the option to change the
>>> implementation if we want to do so later.
>>>
>>> There was an interesting post from Marcus Hanwell from Kitware on this
>>> list earlier this year about how their projects handle this kind of
>>> thing
>>> (http://gromacs.5086.x6.nabble.com/parallel-make-problems-td5009226.html).
>>> That seems like the approach we should take now that we have several of
>>> these kinds of dependencies currently "living" in src/external (FFTW,
>>> Boost subset, TNG, now libxml2, maybe later PDBx or some FMM code,
>>> maybe gmxblas and gmxlapack should go live there). For 5.0, I can live
>>> with a hack that copies how we handle FFTW, though.
>>>
>>> Mark
>>>
>>>
>>> On Sun, Nov 10, 2013 at 9:35 PM, David van der Spoel <spoel at xray.bmc.uu.se> wrote:
>>>
>>>
>>>     On 2013-11-10 20:58, Erik Lindahl wrote:
>>>
>>>         Hi,
>>>
>>>         One reason could be that we haven’t really started to use
>>>         standardized XML input/output formats yet, although we’re
>>>         heading there long term. I’m also not enough of an expert to
>>>         say whether libxml2 is the best XML parser out there, since
>>>         there are quite a few alternatives?
>>>
>>>         If there are any specific new modules that would need it,
>>>         doesn’t it make more sense to have those modules go through
>>>         the normal code review (including a discussion of whether the
>>>         proposed XML formats are nicely designed, etc), rather than
>>>         separately making XML a hard requirement even if the current
>>>         code doesn’t rely on it?
>>>
>>>     This is why I asked. I think we should refrain from adding more
>>>     plain-text, poorly documented files that are prone to errors.
>>>     Therefore I used XML files in this patch
>>>     (https://gerrit.gromacs.org/#/c/2659/); however, Teemu pointed out
>>>     that it does not compile without XML (since there are no ifdefs).
>>>     So rather than implementing TWO versions of the code to read in
>>>     the necessary data, we have to decide now whether to make this
>>>     obligatory or not.
>>>
>>>     As regards compiling under windows (from the libxml2 website):
>>>
>>>     Libxml2 is known to be very portable, the library should build and
>>>     work without serious troubles on a variety of systems (Linux,
>>>     Unix, Windows, CygWin, MacOS, MacOS X, RISC Os, OS/2, VMS, QNX,
>>>     MVS, VxWorks, ...)
>>>
>>>     It is distributed under the MIT license so I guess we could even
>>>     include it in the source code as a backup. With the known
>>>     portability and the license I don't see any reason not to. It
>>>     comes pre-installed on Macs and Linux by the way.
>>>
>>>
>>>
>>>
>>>         Cheers,
>>>
>>>         Erik
>>>
>>>         On 10 Nov 2013, at 11:51, David van der Spoel <spoel at xray.bmc.uu.se> wrote:
>>>
>>>             Hi,
>>>
>>>             we decided a long time ago that for 5.0 libxml2 would
>>>             be required. If there is any reason why we should remain
>>>             able to compile gromacs WITHOUT libxml2, please speak up now.
>>>
>>>             If no good arguments are brought forward, then I will
>>>             change the main CMakeLists.txt so that gromacs will not
>>>             compile without it.
>>>
>>>             --
>>>             David van der Spoel, Ph.D., Professor of Biology
>>>             Dept. of Cell & Molec. Biol., Uppsala University.
>>>             Box 596, 75124 Uppsala, Sweden. Phone: +46184714205.
>>>             spoel at xray.bmc.uu.se    http://folding.bmc.uu.se
>>>             --
>>>             gmx-developers mailing list
>>>             gmx-developers at gromacs.org
>>>             http://lists.gromacs.org/mailman/listinfo/gmx-developers
>>>             Please don't post (un)subscribe requests to the list. Use the
>>>             www interface or send it to gmx-developers-request at gromacs.org.
>>>
>>
>>
>>
>>