[gmx-users] GROMACS and XML

Peter Murray-Rust pm286 at cam.ac.uk
Wed Apr 17 09:42:59 CEST 2002

At 21:04 16/04/2002 +0200, David van der Spoel wrote:
>On Tue, 16 Apr 2002, Peter Murray-Rust wrote:
>I think the latest release will compile under Windows as well, IIRC with
>both Cygwin and MS tools.

Thanks (and to ErikL as well); it sounds as if I should install Cygwin.

>I am currently working on creating XML data
>formats using the libxml2 library. (src/gmxlib/xmlio.c is very primitive
>and not functional in the 3.1 release, in CVS it is beginning to get
>somewhat more complete). The main effort goes into creating DTDs right

After I mailed the list I noticed that there was an XML activity in 
GROMACS. This is really great because XML can significantly increase the 
(re)use of programs and data. CML (Chemical Markup Language) has been 
developed for "small molecules" and is now starting to become widely used. 
I have been extending the design to cover computational chemistry including 
MM, MD and MO methods. We are still at an early stage and I am very 
fortunate in that Herman Berendsen is spending 3 months in Cambridge. He 
and I have just started thinking about how we can abstract the input and 
output to such programs so that it is program-independent. This is a very 
significant challenge, of course, and we have to balance the generality 
(across all such computational experiments) against the complexity of the 
DTD (or XML Schema; I have just converted CML to XML Schema, which has 
considerable advantages for validation).

Some general principles of XML design... XML is designed to be re-used so 
it's worth looking at existing solutions for components of the information. 
Thus MathML (semantic) should be able to manage the equations, HTML any 
text and images, SVG the graphics and CML the (static) small molecule 
information. There is no complete agreement on macromolecular structures in 
XML but it may well be based on mmCIF (as this is the basis of the OMG 
submission). Much of the information in most scientific disciplines is 
scalar or array data whose dataTypes, units and semantics can be supplied 
from dictionaries. I have developed a language to support this (STMML), 
and it may be useful. In any case you are going to
have to build a dictionary of concepts, with dataTypes, units and 
definitions :-) The more that this dictionary can be shared with other 
codes, the more that generic tools can be built for input, validation, 
analysis and display.

One general rule: for every XML element ("tag") you create, software has to 
exist to process it! So always bear in mind how the XML is to be processed. 
For CML we have three approaches: XSLT (probably the easiest and most 
generally applicable), CML-DOM and CML-SAX. I expect that in computational 
chemistry the XSLT approach will initially be the most useful.

I certainly don't want to reinvent anything that has already been done - is 
the GROMACS DTD reachable from the website or is it in a CVS repository?


>If you don't want to use configure you can of course make your code
>conditional (#ifdef) and compile with certain CPPFLAGS and LDFLAGS.
>Groeten, David.
>Dr. David van der Spoel,        Biomedical center, Dept. of Biochemistry
>Husargatan 3, Box 576,          75123 Uppsala, Sweden
>phone:  46 18 471 4205          fax: 46 18 511 755
>spoel at xray.bmc.uu.se    spoel at gromacs.org   http://zorn.bmc.uu.se/~spoel

Peter Murray-Rust, pm286 AT cam.ac.uk
Unilever Centre for Molecular Informatics, Chemistry Department
Lensfield Road, Cambridge, CB2 1EW, UK
