[gmx-developers] libxml2

David van der Spoel spoel at xray.bmc.uu.se
Wed Nov 13 19:57:44 CET 2013


On 2013-11-13 15:00, Mark Abraham wrote:
>
> On Nov 11, 2013 8:33 AM, "David van der Spoel" <spoel at xray.bmc.uu.se> wrote:
>  >
>  > On 2013-11-11 02:43, Erik Lindahl wrote:
>  >>
>  >> Hi,
>  >>
>  >> I’m fine with having it as a hard dependency, provided we’ve had it
>  >> compile automatically during installs for a while without complaints (it
>  >> has been on by default for 4.6, right?).
>  >>
>  >> However, I also sat down and looked a bit at the XML files in David’s
>  >> patch, and this made me realize we need a broader approach.
>  >>
>  >> Just introducing XML is not going to help us much, in particular not if
>  >> we just add a generic “XML” file type. This would be like merely having
>  >> “BIN” and “ASCII” types for all other files. Before we know it, we are
>  >> going to have a dozen XML formats for different programs that have
>  >> nothing to do with each other, and there will be lots of input/output
>  >> routines in all programs processing them just-so-slightly differently.
>  >> In addition, adding XML tags instead of relying on tabs/space/newline is
>  >> of course a small step forward, but just a very small one - to really
>  >> fix things we need to make things even more structured.
>  >>
>  >> Some things I would like to see before we start using XML:
>  >>
>  >> 1) We need proper namespaces and sub-namespaces, so we can tell
>  >> different XML components from each other. This will also require us to
>  >> think a bit about information in general, even if we don’t implement all
>  >> components from the start. There are going to be lots of places where we
>  >> specify information on a residue and atom basis - how should all these
>  >> relate to each other?  When are things forcefield-specific vs. general,
>  >> and when should they go in the same vs. different files?
>  >>
>  >> 2) I think it makes a lot of sense to separate different XML files, so a
>  >> future mdp replacement might have extension xmdp, while the xml topology
>  >> has extension xtop. We should still be able to merge all of them in a
>  >> single file (fine with namespaces), but this will avoid the problems
>  >> when we specify an XML mdp file where a program was expecting an XML top
>  >> file (in other words, no generic “XML” file format that can contain
>  >> anything).
>  >>
>  >> 3) We need to think through naming carefully. In particular: No custom
>  >> abbreviations unless it is really necessary. We should also use proper
>  >> names for types and similar settings, rather than merely translating our
>  >> old integer selectors to XML.
>  >>
>  >> 4) For any measurement, we should have units.
>  >>
>  >> 5) To enable XSLT transformations and better namespace handling, I think
>  >> we should standardize on (and require) schema descriptions for
>  >> validation, rather than the older DTDs.
>  >>
>  >> 6) We need some good common modules for reading/writing generic
>  >> structured data, so the actual files are isolated from the programs
>  >> using them.
>  >>
>  >>
>  >>
>  >> Some of this will take time, but my worry about pushing ahead and
>  >> starting to use XML anyway for individual programs is that it could
>  >> easily create a divergent mess similar to the one we’ve had with the
>  >> current text files.
>  >>
>  >
>  > I guess this will prevent us from using xml in practice. We have
>  > discussed xml for ten years or so, but the transition to xml schema is a
>  > real show stopper. I don't have the time to learn that as well. Does
>  > that imply I should stop developing? In addition, for many small files
>  > you don't need a dtd or schema (and in fact there isn't one for these
>  > xml files), it's just that the libxml2 library demands you put it into
>  > the file. If we're talking rtp files then that's another matter where
>  > more structure is needed.
>
> I agree with the desire to have a schema with which to validate and
> parse, or the exercise just reduces to user-space punctuation spam.
> However, it's too late now to embark on a wholesale schema design for
> 5.0. Even if not, that schema would have to evolve as we learned our
> needs better. So I suggest we/David comes up with something that meets
> the present need, and is reasonably likely to work in a future context.
> Probably that's just a matter of agreeing on a namespace name or two.
> I'm unsure of the technical merits of schemas vs DTDs, but if we would
> chuck out the initial schema later, it might as well be a DTD now, if
> that is easier.
>
The thing is that for small files it doesn't matter, neither DTD nor 
Schema is used if you don't need it. I still have a hard time 
comprehending why we would like to mix e.g. simulation data with all 
possible other stuff.
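
To make the small-file point concrete, here is a minimal sketch. It is in Python rather than C++ purely for brevity, and the element and attribute names are invented for illustration, not taken from the patch. It shows that a well-formed document with no DOCTYPE and no schema reference parses without complaint, and that units can travel with each value.

```python
import xml.etree.ElementTree as ET

# Hypothetical small data file: no DOCTYPE, no schema reference.
SMALL_FILE = """<?xml version="1.0"?>
<atomproperties>
  <atom name="C" mass="12.011" mass_unit="u"/>
  <atom name="O" mass="15.999" mass_unit="u"/>
</atomproperties>
"""


def read_masses(text):
    """Return {atom name: (mass, unit)} from a well-formed XML string."""
    root = ET.fromstring(text)
    return {
        atom.get("name"): (float(atom.get("mass")), atom.get("mass_unit"))
        for atom in root.iter("atom")
    }


masses = read_masses(SMALL_FILE)
print(masses)
```

The parser only checks well-formedness here; nothing verifies that the attributes exist or hold sensible values, which is the gap a DTD or schema would fill for larger formats.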


>  > Some other points, like having clear names and units I do agree with
>  > and can change these in my present application.
>  >
>  > Common modules for writing and reading implies that all possible data
>  > should be merged into one or a few monster formats. This in itself will
>  > create extra problems.
>  >
>  > As for changing names of files, this shouldn't be necessary as one
>  > should be able to see from the content what kind of file this is. No
>  > strong feelings here but it would be very confusing to add many new
>  > file names.
>  >
>  > @Mark: an extra layer wouldn't help would it - there is no competing
>  > package as far as I know. There is, however, libxml++, a C++ wrapper
>  > around libxml2, which is slightly more logical to use in C++ code, but
>  > it would imply an extra library. On the other hand that might function
>  > as a thin wrapper around the library.
>
> I'm seeking a GROMACS-implemented function layer so that e.g. every
> analysis tool module is not including a libxml2 header and hard-coding
> calls to its API. For example, the PME code does not call FFTW directly,
> it calls the wrapper code that contains all the versioning by FFT
> library. This style delivers lots of benefits. Likewise XDR. For XML, we
> probably only need open, close, validate, read, parse and xpath-query
> functions. Maybe some writing routines at some point. These functions
> only need to pass through the arguments for now, but details depend on
> the actual use.
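
A rough sketch of what such a function layer could look like follows. It is written in Python for brevity, with the standard library's ElementTree standing in for libxml2. The class and method names are hypothetical, and only a parse step and a limited XPath query are shown.

```python
import xml.etree.ElementTree as ET  # stand-in backend; in the thread it is libxml2


class XmlDocument:
    """Hypothetical facade: callers never import the backend themselves."""

    def __init__(self, root):
        self._root = root

    @classmethod
    def from_string(cls, text):
        # Only this module knows which parser library is in use.
        try:
            return cls(ET.fromstring(text))
        except ET.ParseError as err:
            raise ValueError(f"not well-formed XML: {err}") from err

    def query(self, path):
        """Return the text of every element matching a limited XPath."""
        return [element.text for element in self._root.findall(path)]


doc = XmlDocument.from_string("<run><tool>rdf</tool><tool>msd</tool></run>")
tools = doc.query("./tool")
print(tools)
```

Because the backend's exception type is translated at the boundary, swapping the parser later would only touch this one module, which is the benefit Mark describes for the FFTW wrapper.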

I just checked libxml++, but that introduces another dependency, so
that's out. I will draft a GROMACS frontend in C++ for libxml2 with just
a subset of the functionality. There is, however, one issue: XML can be
read in two fashions, using the DOM (Document Object Model) and using
SAX (Simple API for XML). Until now I have used the DOM, which reads a
whole document into memory, but the memory usage can be prohibitive for
large files. SAX should therefore be the preferred route. Any comments
on that?
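
For anyone who has not used both styles, the difference can be sketched as follows, using Python's standard library instead of libxml2 so that it runs anywhere. The document content and the names are made up for illustration.

```python
import xml.dom.minidom
import xml.sax

# Synthetic input: 1000 empty <atom/> elements under one root.
DATA = "<atoms>" + "".join(f'<atom id="{i}"/>' for i in range(1000)) + "</atoms>"


def count_atoms_dom(text):
    # DOM: the whole tree is built in memory before we can look at it.
    document = xml.dom.minidom.parseString(text)
    return len(document.getElementsByTagName("atom"))


class AtomCounter(xml.sax.ContentHandler):
    # SAX: the parser calls back once per element; no tree is retained.
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "atom":
            self.count += 1


def count_atoms_sax(text):
    handler = AtomCounter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.count


dom_count = count_atoms_dom(DATA)
sax_count = count_atoms_sax(DATA)
print(dom_count, sax_count)
```

Both give the same answer, but the SAX handler keeps only a counter, so its memory use does not grow with the document. The price is that the calling code has to track its own state between callbacks.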


>
>  > Finally: The transition to c++ is hard enough on most developers - I
>  > have been struggling with it over the last year, and slowly learning
>  > with lots of helpful comments from Teemu. Let's try to keep life as
>  > simple as we can - but not simpler.
>
> Of course. Equally, trying to do things to save future-us pain has costs
> now!
>
> Mark
>
>  >> Cheers,
>  >>
>  >> Erik
>  >>
>  >>
>  >>
>  >>
>  >> On 10 Nov 2013, at 15:34, Mark Abraham <mark.j.abraham at gmail.com> wrote:
>  >>
>  >>> My experience of libxml2 has been favourable. I'm happy with a
>  >>> dependency on it, but someone needs to identify a version (preferably
>  >>> one that is known to be in package repos and/or have binaries
>  >>> available on the web). I would suggest we implement the dependency
>  >>> roughly as we do for FFTW:
>  >>>
>  >>> * the install guide drops suitable hints to go get libxml2-dev(el)
>  >>> from your favourite repo (note that libxml2 might be installed by
>  >>> default, but we might need the #include headers that are only in the
>  >>> -dev or -devel packages!)
>  >>> * CMake detects if those exist in CMAKE_PREFIX_PATH, and gives a fatal
>  >>> error if not found.
>  >>> * the fatal error can be avoided by either letting the user supply a
>  >>> libxml2 tarball (e.g. so we can test in Jenkins also), or use cmake
>  >>> -DGMX_BUILD_OWN_LIBXML2 to do the same download-and-build thing.
>  >>>
>  >>> Even if legal, I'm not so keen on bundling the libxml2 tarball at
>  >>> ~5MB, when gromacs is ~10MB. Bundling just the headers we need in
>  >>> order to use a system libxml2 might be a good option.
>  >>>
>  >>> The proposed bump to require CMake version 2.8.8 in Redmine/Gerrit
>  >>> should make this a little smoother than it has been in the past.
>  >>>
>  >>> I think there should be a wrapper layer between libxml2 and the
>  >>> GROMACS code that uses it, so that we have the option to change the
>  >>> implementation if we want to do so later.
>  >>>
>  >>> There was an interesting post from Marcus Hanwell from Kitware on this
>  >>> list earlier this year about how their projects handle this kind of
>  >>> thing
>  >>> (http://gromacs.5086.x6.nabble.com/parallel-make-problems-td5009226.html),
>  >>> which seems like what we should do now that we have several of
>  >>> these kinds of dependencies currently "living" in src/external (FFTW,
>  >>> Boost subset, TNG, now libxml2, maybe later PDBx or some FMM code,
>  >>> maybe gmxblas and gmxlapack should go live there). For 5.0, I can live
>  >>> with a hack that copies how we handle FFTW, though.
>  >>>
>  >>> Mark
>  >>>
>  >>>
>  >>> On Sun, Nov 10, 2013 at 9:35 PM, David van der Spoel <spoel at xray.bmc.uu.se> wrote:
>  >>>
>  >>>
>  >>>     On 2013-11-10 20:58, Erik Lindahl wrote:
>  >>>
>  >>>         Hi,
>  >>>
>  >>>         One reason could be that we haven’t really started to use
>  >>>         standardized XML input/output formats yet, although we’re
>  >>>         heading there long term. I’m also not enough of an expert to
>  >>>         say whether libxml2 is the best XML parser out there, since
>  >>>         there are quite a few alternatives?
>  >>>
>  >>>         If there are any specific new modules that would need it,
>  >>>         doesn’t it make more sense to have those modules go through
>  >>>         the normal code review (including a discussion of whether the
>  >>>         proposed XML formats are nicely designed, etc), rather than
>  >>>         separately making XML a hard requirement even if the current
>  >>>         code doesn’t rely on it?
>  >>>
>  >>>     This is why I asked. I think we should refrain from adding more
>  >>>     text-only, poorly documented files that are prone to errors.
>  >>>     Therefore I used XML files in this patch
>  >>>     <https://gerrit.gromacs.org/#/c/2659/>, however Teemu pointed out
>  >>>     that it does not compile without XML (since there are no ifdefs).
>  >>>     So rather than implementing TWO versions of the code to read in
>  >>>     the necessary data, we have to decide to make this obligatory now
>  >>>     or not.
>  >>>
>  >>>     As regards compiling under windows (from the libxml2 website):
>  >>>
>  >>>     Libxml2 is known to be very portable, the library should build and
>  >>>     work without serious troubles on a variety of systems (Linux,
>  >>>     Unix, Windows, CygWin, MacOS, MacOS X, RISC Os, OS/2, VMS, QNX,
>  >>>     MVS, VxWorks, ...)
>  >>>
>  >>>     It is distributed under the MIT license so I guess we could even
>  >>>     include it in the source code as a backup. With the known
>  >>>     portability and the license I don't see any reason not to. It
>  >>>     comes pre-installed on Macs and Linux by the way.
>  >>>
>  >>>
>  >>>
>  >>>
>  >>>         Cheers,
>  >>>
>  >>>         Erik
>  >>>
>  >>>         On 10 Nov 2013, at 11:51, David van der Spoel
>  >>>         <spoel at xray.bmc.uu.se> wrote:
>  >>>
>  >>>             Hi,
>  >>>
>  >>>             we decided a long time ago that for 5.0 libxml2 would
>  >>>             be required.
>  >>>             If there is any reason why we should remain able to compile
>  >>>             gromacs WITHOUT libxml2, please speak up now.
>  >>>
>  >>>             If no good arguments are brought forward then I will
>  >>>             change the main CMakeLists.txt such that gromacs will not
>  >>>             compile without it.
>  >>>
>  >>>             --
>  >>>             David van der Spoel, Ph.D., Professor of Biology
>  >>>             Dept. of Cell & Molec. Biol., Uppsala University.
>  >>>             Box 596, 75124 Uppsala, Sweden. Phone: +46184714205.
>  >>>             spoel at xray.bmc.uu.se    http://folding.bmc.uu.se
>  >>>             --
>  >>>             gmx-developers mailing list
>  >>>             gmx-developers at gromacs.org
>  >>>             http://lists.gromacs.org/mailman/listinfo/gmx-developers
>  >>>             Please don't post (un)subscribe requests to the list. Use the
>  >>>             www interface or send it to gmx-developers-request at gromacs.org.
>  >>>
>  >>
>  >>
>  >>
>  >>
>  >
>  >
>
>
>


-- 
David van der Spoel, Ph.D., Professor of Biology
Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone:	+46184714205.
spoel at xray.bmc.uu.se    http://folding.bmc.uu.se


