[gmx-developers] libxml2
David van der Spoel
spoel at xray.bmc.uu.se
Wed Nov 13 19:57:44 CET 2013
On 2013-11-13 15:00, Mark Abraham wrote:
>
> On Nov 11, 2013 8:33 AM, "David van der Spoel" <spoel at xray.bmc.uu.se> wrote:
> >
> > On 2013-11-11 02:43, Erik Lindahl wrote:
> >>
> >> Hi,
> >>
> >> I’m fine with having it as a hard dependency, provided we’ve had it
> >> compile automatically during installs for a while without complaints (it
> >> has been on by default for 4.6, right?).
> >>
> >> However, I also sat down and looked a bit at the XML files in David’s
> >> patch, and this made me realize we need a broader approach.
> >>
> >> Just introducing XML is not going to help us much, in particular not if
> >> we just add a generic “XML” file type. This would be like merely having
> >> “BIN” and “ASCII” types for all other files. Before we know it, we are
> >> going to have a dozen XML formats for different programs that have
> >> nothing to do with each other, and there will be lots of input/output
> >> routines in all programs processing them just-so-slightly differently.
> >> In addition, adding XML tags instead of relying on tabs/space/newline is
> >> of course a small step forward, but just a very small one - to really
> >> fix things we need to make things even more structured.
> >>
> >> Some things I would like to see before we start using XML:
> >>
> >> 1) We need proper namespaces and sub-namespaces, so we can tell
> >> different XML components from each other. This will also require us to
> >> think a bit about information in general, even if we don’t implement all
> >> components from the start. There are going to be lots of places where we
> >> specify information on a residue and atom basis - how should all these
> >> relate to each other? When are things forcefield-specific vs. general,
> >> and when should they go in the same vs. different files?
> >>
> >> 2) I think it makes a lot of sense to separate different XML files, so a
> >> future mdp replacement might have extension xmdp, while the xml topology
> >> has extension xtop. We should still be able to merge all of them in a
> >> single file (fine with namespaces), but this will avoid the problems
> >> when we specify an XML mdp file where a program was expecting an XML top
> >> file (in other words, no generic “XML” file format that can contain
> >> anything).
> >>
> >> 3) We need to think through naming carefully. In particular: No custom
> >> abbreviations unless it is really necessary. We should also use proper
> >> names for types and similar settings, rather than merely translating our
> >> old integer selectors to XML.
> >>
> >> 4) For any measurement, we should have units.
> >>
> >> 5) To enable XSLT transformations and better namespace handling, I think
> >> we should standardize on (and require) schema descriptions for
> >> validation, rather than the older DTDs.
> >>
> >> 6) We need some good common modules for reading/writing generic
> >> structured data, so the actual files are isolated from the programs
> >> using them.
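
[Editorial sketch] Points 1, 2 and 4 above (namespaces, components that can share one file, and units on every measurement) can be illustrated with a small Python example using only the standard library. The namespace URIs, element names and attribute names below are invented for illustration; nothing in this thread defines them.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, invented for this sketch only.
NS_MDP = "urn:example:gromacs:mdp"
NS_TOP = "urn:example:gromacs:topology"

# One merged file holding two unrelated components, kept apart by namespace.
DOC = f"""
<run xmlns:mdp="{NS_MDP}" xmlns:top="{NS_TOP}">
  <mdp:parameters>
    <mdp:timeStep unit="ps">0.002</mdp:timeStep>
  </mdp:parameters>
  <top:molecule name="water">
    <top:atom name="OW" mass="15.9994" massUnit="u"/>
  </top:molecule>
</run>
"""


def read_time_step(root):
    """Return (value, unit) for the time step; refuse values without a unit."""
    elem = root.find(f"{{{NS_MDP}}}parameters/{{{NS_MDP}}}timeStep")
    if elem is None:
        raise ValueError("no mdp:timeStep element found")
    unit = elem.get("unit")
    if unit is None:
        raise ValueError("timeStep has no unit attribute")
    return float(elem.text), unit


def atom_names(root):
    """Collect atom names from the topology namespace only."""
    return [atom.get("name") for atom in root.iter(f"{{{NS_TOP}}}atom")]


root = ET.fromstring(DOC)
print(read_time_step(root))
print(atom_names(root))
```

Because each reader looks up fully qualified names, a tool that only understands the mdp component simply ignores the topology elements, which is what would make merging several components in one file workable.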
> >>
> >>
> >>
> >> Some of this will take time, but my worry about pushing ahead and
> >> starting to use XML anyway for individual programs is that it might
> >> soon create a divergent mess similar to the one we’ve had with the
> >> current text files.
> >>
> >
> > I guess this will prevent us from using xml in practice. We have
> > discussed xml for ten years or so, but the transition to xml schema is a
> > real show stopper. I don't have the time to learn that as well. Does
> > that imply I should stop developing? In addition, for many small files
> > you don't need a dtd or schema (and in fact there isn't one for these
> > xml files), it's just that the libxml2 library demands you put it into
> > the file. If we're talking rtp files then that's another matter where
> > more structure is needed.
>
> I agree with the desire to have a schema with which to validate and
> parse, or the exercise just reduces to user-space punctuation spam.
> However, it's too late now to embark on a wholesale schema design for
> 5.0. Even if not, that schema would have to evolve as we learned our
> needs better. So I suggest we/David comes up with something that meets
> the present need, and is reasonably likely to work in a future context.
> Probably that's just a matter of agreeing on a namespace name or two.
> I'm unsure of the technical merits of schemas vs DTDs, but if we would
> chuck out the initial schema later, it might as well be a DTD now, if
> that is easier.
>
The thing is that for small files it doesn't matter, neither DTD nor
Schema is used if you don't need it. I still have a hard time
comprehending why we would like to mix e.g. simulation data with all
possible other stuff.
> > Some other points, like having clear names and units, I do agree with
> > and can change in my present application.
> >
> > Common modules for writing and reading implies that all possible data
> > should be merged into one or a few monster formats. This in itself will
> > create extra problems.
> >
> > As for changing names of files, this shouldn't be necessary as one
> > should be able to see from the content what kind of file this is. No
> > strong feelings here but it would be very confusing to add many new
> > file names.
> >
> > @Mark: an extra layer wouldn't help would it - there is no competing
> > package as far as I know. There is, however, libxml++, a C++ wrapper
> > around libxml2, which is slightly more logical to use in C++ code, but
> > it would imply an extra library. On the other hand that might function
> > as a thin wrapper around the library.
>
> I'm seeking a GROMACS-implemented function layer so that e.g. every
> analysis tool module is not including a libxml2 header and hard-coding
> calls to its API. For example, the PME code does not call FFTW directly,
> it calls the wrapper code that contains all the versioning by FFT
> library. This style delivers lots of benefits. Likewise XDR. For XML, we
> probably only need open, close, validate, read, parse and xpath-query
> functions. Maybe some writing routines at some point. These functions
> only need to pass through the arguments for now, but details depend on
> the actual use.
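
[Editorial sketch] The function layer described above can be pictured as a thin facade that is the only code touching the XML library. The sketch below uses Python's standard-library ElementTree as a stand-in backend, not libxml2; the class and method names are hypothetical, and it covers only parsing and path queries (ElementTree supports a limited XPath subset and no schema validation).

```python
import xml.etree.ElementTree as ET


class XmlDocument:
    """Hypothetical facade: callers use this class and never import the backend."""

    def __init__(self, root):
        self._root = root

    @classmethod
    def parse_string(cls, text):
        """Parse XML text, translating backend errors into a generic one."""
        try:
            return cls(ET.fromstring(text))
        except ET.ParseError as err:
            raise ValueError(f"malformed XML: {err}") from err

    def root_name(self):
        """Name of the top-level element, e.g. to check the file kind."""
        return self._root.tag

    def query_text(self, path):
        """Text content of every element matching a simple path."""
        return [elem.text for elem in self._root.findall(path)]

    def query_attribute(self, path, name):
        """Value of one attribute on every element matching a simple path."""
        return [elem.get(name) for elem in self._root.findall(path)]


doc = XmlDocument.parse_string(
    "<residues><residue name='ALA'><atom>CA</atom><atom>CB</atom></residue></residues>"
)
print(doc.root_name())
print(doc.query_text("./residue/atom"))
print(doc.query_attribute("./residue", "name"))
```

Swapping the backend later would mean changing only this class, since the analysis code sees nothing but `parse_string` and the query methods.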
Just checked libxml++ but that introduces another dependency so that's
out. I will draft a gromacs frontend in C++ for libxml2 with just a subset
of the functionality. There is however one issue: XML can be read in two
fashions, using the DOM (Document Object Model) and using SAX (Simple
API for XML). Until now I have used the DOM, which reads a whole
document into memory, but the memory usage can be prohibitive. SAX
should therefore be the preferred route. Any comments on that?
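
[Editorial sketch] The difference between the two reading styles can be shown with Python's standard-library `xml.dom.minidom` and `xml.sax` modules standing in for the libxml2 equivalents. The element and attribute names are made up for the example. The DOM version builds the whole tree before anything is queried; the SAX version receives one callback per element and keeps only what it chooses to store.

```python
import xml.sax
from xml.dom import minidom

DOC = "<molecule><atom name='OW'/><atom name='HW1'/><atom name='HW2'/></molecule>"


def atom_names_dom(text):
    """DOM style: the full document tree is held in memory, then queried."""
    dom = minidom.parseString(text)
    return [node.getAttribute("name") for node in dom.getElementsByTagName("atom")]


class AtomHandler(xml.sax.ContentHandler):
    """SAX style: react to each start tag as the parser streams past it."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        if name == "atom":
            self.names.append(attrs.get("name"))


def atom_names_sax(text):
    """Stream the document through the handler; no tree is ever built."""
    handler = AtomHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names


print(atom_names_dom(DOC))
print(atom_names_sax(DOC))
```

Both functions return the same list here. The trade-off is that the SAX handler has to track any context it needs (for example which residue an atom belongs to) by itself, whereas with the DOM that context can be looked up in the tree.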
>
> > Finally: The transition to C++ is hard enough on most developers - I
> > have been struggling with it over the last year, and slowly learning
> > with lots of helpful comments from Teemu. Let's try to keep life as
> > simple as we can - but not simpler.
>
> Of course. Equally, trying to do things to save future-us pain has costs
> now!
>
> Mark
>
> >> Cheers,
> >>
> >> Erik
> >>
> >>
> >>
> >>
> >> On 10 Nov 2013, at 15:34, Mark Abraham <mark.j.abraham at gmail.com> wrote:
> >>
> >>> My experience of libxml2 has been favourable. I'm happy with a
> >>> dependency on it, but someone needs to identify a version (preferably
> >>> one that is known to be in package repos and/or have binaries
> >>> available on the web). I would suggest we implement the dependency
> >>> roughly as we do for FFTW:
> >>>
> >>> * the install guide drops suitable hints to go get libxml2-dev(el)
> >>> from your favourite repo (note that libxml2 might be installed by
> >>> default, but we might need the #include headers that are only in the
> >>> -dev or -devel packages!)
> >>> * CMake detects if those exist in CMAKE_PREFIX_PATH, and gives a fatal
> >>> error if not found.
> >>> * the fatal error can be avoided by either letting the user supply a
> >>> libxml2 tarball (e.g. so we can test in Jenkins also), or use cmake
> >>> -DGMX_BUILD_OWN_LIBXML2 to do the same download-and-build thing.
> >>>
> >>> Even if legal, I'm not so keen on bundling the libxml2 tarball at
> >>> ~5MB, when gromacs is ~10MB. Bundling just the headers we need in
> >>> order to use a system libxml2 might be a good option.
> >>>
> >>> The proposed bump to require CMake version 2.8.8 in Redmine/Gerrit
> >>> should make this a little smoother than it has been in the past.
> >>>
> >>> I think there should be a wrapper layer between libxml2 and the
> >>> GROMACS code that uses it, so that we have the option to change the
> >>> implementation if we want to do so later.
> >>>
> >>> There was an interesting post from Marcus Hanwell from Kitware on this
> >>> list earlier this year about how their projects handle this kind of
> >>> thing
> >>> (http://gromacs.5086.x6.nabble.com/parallel-make-problems-td5009226.html),
> >>> which seems like what we should do now that we have several of
> >>> these kinds of dependencies currently "living" in src/external (FFTW,
> >>> Boost subset, TNG, now libxml2, maybe later PDBx or some FMM code,
> >>> maybe gmxblas and gmxlapack should go live there). For 5.0, I can live
> >>> with a hack that copies how we handle FFTW, though.
> >>>
> >>> Mark
> >>>
> >>>
> >>> On Sun, Nov 10, 2013 at 9:35 PM, David van der Spoel
> >>> <spoel at xray.bmc.uu.se> wrote:
> >>>
> >>>
> >>> On 2013-11-10 20:58, Erik Lindahl wrote:
> >>>
> >>> Hi,
> >>>
> >>> One reason could be that we haven’t really started to use
> >>> standardized XML input/output formats yet, although we’re
> >>> heading there long term. I’m also not enough of an expert to
> >>> say whether libxml2 is the best XML parser out there, since
> >>> there are quite a few alternatives?
> >>>
> >>> If there are any specific new modules that would need it,
> >>> doesn’t it make more sense to have those modules go through
> >>> the normal code review (including a discussion of whether the
> >>> proposed XML formats are nicely designed, etc), rather than
> >>> separately making XML a hard requirement even if the current
> >>> code doesn’t rely on it?
> >>>
> >>> This is why I asked. I think we should refrain from adding more
> >>> text-only, poorly documented files that are prone to errors.
> >>> Therefore I used XML files in this patch
> >>> <https://gerrit.gromacs.org/#/c/2659/>; however, Teemu pointed out
> >>> that it does not compile without XML (since there are no ifdefs).
> >>> So rather than implementing TWO versions of the code to read in
> >>> the necessary data, we have to decide to make this obligatory now
> >>> or not.
> >>>
> >>> As regards compiling under windows (from the libxml2 website):
> >>>
> >>> Libxml2 is known to be very portable, the library should build and
> >>> work without serious troubles on a variety of systems (Linux,
> >>> Unix, Windows, CygWin, MacOS, MacOS X, RISC Os, OS/2, VMS, QNX,
> >>> MVS, VxWorks, ...)
> >>>
> >>> It is distributed under the MIT license so I guess we could even
> >>> include it in the source code as a backup. With the known
> >>> portability and the license I don't see any reason not to. It
> >>> comes pre-installed on Macs and Linux by the way.
> >>>
> >>>
> >>>
> >>>
> >>> Cheers,
> >>>
> >>> Erik
> >>>
> >>> On 10 Nov 2013, at 11:51, David van der Spoel
> >>> <spoel at xray.bmc.uu.se> wrote:
> >>>
> >>> Hi,
> >>>
> >>> we decided a long time ago that for 5.0 libxml2 would
> >>> be required.
> >>> If there is any reason why we should remain able to compile
> >>> gromacs WITHOUT libxml2, please speak up now.
> >>>
> >>> If no good arguments are brought forward then I will
> >>> change the main CMakeLists.txt such that gromacs will not
> >>> compile without it.
> >>>
> >>> --
> >>> David van der Spoel, Ph.D., Professor of Biology
> >>> Dept. of Cell & Molec. Biol., Uppsala University.
> >>> Box 596, 75124 Uppsala, Sweden. Phone: +46184714205.
> >>> spoel at xray.bmc.uu.se http://folding.bmc.uu.se
> >>> --
> >>> gmx-developers mailing list
> >>> gmx-developers at gromacs.org
> >>> http://lists.gromacs.org/mailman/listinfo/gmx-developers
> >>> Please don't post (un)subscribe requests to the list. Use the
> >>> www interface or send it to gmx-developers-request at gromacs.org.
--
David van der Spoel, Ph.D., Professor of Biology
Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone: +46184714205.
spoel at xray.bmc.uu.se http://folding.bmc.uu.se