[gmx-developers] Re: markup for macromolecules in biology

Erik Lindahl lindahl at csb.stanford.edu
Tue Oct 14 03:44:25 CEST 2003


This interaction is great - especially since some of you probably have 
more experience in actually using XML.

Our background is essentially this:

1. As we are implementing more forcefields, and people are starting to 
use Gromacs for non-protein stuff, we've realized we need a much more 
flexible way of specifying parameters and molecules. For now we've 
hacked and extended our formats - that is both a pain, and it hurts 
portability. I'm also working with Warren DeLano to add Gromacs support 
in Pymol. Reading files is trivial, but it would be better if we could 
also read the topology to get bonds and charges correct. Again, this 
won't work if the Gromacs topology format keeps changing and falls 
out of sync with Pymol.

2. We would also like to keep the description of the molecule 
connectivity separate from the actual implementation in a certain 
forcefield (atom types, united vs. all atoms, etc.) - this would also 
make it easier for other programs to read our files without parsing all 
the detailed atom types and bond parameters, etc.

XML is an obvious solution for this. My current plan is to use vanilla 
CML for the molecule description, and then have a program read our own 
XML force field files to convert it into a specific topology. That 
conversion will probably be internal to the program, though; it was 
just too difficult for me to do in XSLT :-)
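
As a rough illustration of the plan above (not actual Gromacs code), the 
sketch below keeps connectivity in a CML-style molecule and applies a 
separate force field file to produce typed atoms and bonds. The CML 
element and attribute names (molecule, atomArray, atom, elementType, 
bondArray, bond, atomRefs2) follow CML conventions, with namespaces 
omitted for brevity. The force field format, the function name 
build_topology, and the per-element type lookup are all assumptions 
made up for this example; a real force field would need to look at the 
chemical environment of each atom, not just its element.

```python
import xml.etree.ElementTree as ET

# Connectivity only: no force-field-specific information in here.
CML_WATER = """
<molecule id="water">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>
""".strip()

# HYPOTHETICAL force field format, invented for this sketch.
FORCEFIELD = """
<forcefield name="demo">
  <atomtype element="O" type="OW" charge="-0.82"/>
  <atomtype element="H" type="HW" charge="0.41"/>
</forcefield>
""".strip()


def build_topology(cml_text, ff_text):
    """Combine a CML-style molecule with a force field into a topology.

    Returns (atoms, bonds): atoms is a list of
    (index, cml_id, ff_type, charge) tuples and bonds is a list of
    (index_i, index_j) tuples, both using 1-based atom indices.
    """
    mol = ET.fromstring(cml_text)
    ff = ET.fromstring(ff_text)

    # Simplistic lookup: element symbol -> (atom type, partial charge).
    types = {
        t.get("element"): (t.get("type"), float(t.get("charge")))
        for t in ff.iter("atomtype")
    }

    atoms = []
    index_of = {}
    for i, atom in enumerate(mol.iter("atom"), start=1):
        ff_type, charge = types[atom.get("elementType")]
        index_of[atom.get("id")] = i
        atoms.append((i, atom.get("id"), ff_type, charge))

    bonds = []
    for bond in mol.iter("bond"):
        a, b = bond.get("atomRefs2").split()
        bonds.append((index_of[a], index_of[b]))

    return atoms, bonds


atoms, bonds = build_topology(CML_WATER, FORCEFIELD)
```

Another program, such as a viewer, could read only the CML part to get 
the bonds and skip the force field file entirely, which is the point of 
keeping the two descriptions separate.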

We are somewhat hesitant about using XML "everywhere" - at least not as 
a default format. It would be fairly straightforward to allow input 
parameters (number of steps, cutoff radius, etc) as XML, but it really 
doesn't buy us much, and it would be harder for users to hand-edit 
well-formed XML in their input files. The same goes for writing 
coordinates, trajectories and energies: we have very compact and 
portable binary formats that we probably want to stick to.

However, we're always open to good arguments :-)

Do I understand correctly that your focus is rather on how to 
communicate input and output between programs in grid applications?
We already support several file formats, so it should be 
straightforward to define another one - together we could probably 
"force" it as a virtual standard :-)

One issue is how detailed the markup should be. If it isn't the default 
format it might make sense to have it very detailed for coordinate 
files, but trajectories and compressed trajectories should probably be 
binary for size reasons. We have extremely efficient code for 
reading/writing compressed coordinates that we could release completely 
free (BSD license or something...)
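
To make the size argument concrete, here is a small sketch comparing one 
frame of coordinates written as detailed markup with the same frame 
packed as raw 32-bit floats. This is not the Gromacs compressed 
coordinate code, which is not described in this message; the function 
names and the frame layout are assumptions for illustration only, and 
the x3/y3/z3 attributes are loosely modeled on CML.

```python
import struct


def frame_as_markup(coords):
    """Write one frame as verbose XML-style text, one element per atom."""
    lines = ["<frame>"]
    for i, (x, y, z) in enumerate(coords, start=1):
        lines.append(
            f'  <atom id="a{i}" x3="{x:.3f}" y3="{y:.3f}" z3="{z:.3f}"/>'
        )
    lines.append("</frame>")
    return "\n".join(lines).encode("ascii")


def frame_as_binary(coords):
    """Pack one frame as a little-endian atom count plus float32 x, y, z."""
    flat = [c for xyz in coords for c in xyz]
    return struct.pack(f"<I{len(flat)}f", len(coords), *flat)


# 1000 atoms with made-up coordinates.
coords = [(0.001 * i, 0.002 * i, 0.003 * i) for i in range(1000)]
markup = frame_as_markup(coords)
binary = frame_as_binary(coords)

print(len(markup), "bytes as markup")
print(len(binary), "bytes as uncompressed binary")
```

Even without any compression the binary frame is several times smaller 
than the markup, and the gap is multiplied by the number of frames in a 
trajectory.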



On Sunday, October 12, 2003, at 07:56 AM, Kaihsu Tai wrote:

> (copying to the GROMACS developers mailing list as well)
> Peter Murray-Rust, 2003-10-12 11:39:58+0100:
>>> We are hoping to start a collaboration with the developers
>>> of the molecular dynamics simulation package GROMACS
>>> "http://www.gromacs.org/".
>> Have you contacted them?
>> The GROMACS group have contacted me about jointly creating an XML
>> representation for GROMACS. We don't have details yet but I would
>> expect that it would be able to support your requirements.
> Yes, we have contacted the GROMACS developers.  My colleague
> Stuart Murdock talked with David van der Spoel at a
> conference this week.  Mark Sansom suggested that we have a
> 'triangle' of collaboration....
>>> At the moment BioSimGrid has a relational database as its
>>> backend, but so far it appears that a frontend based on some
>>> standardized markup language across simulation packages
>>> would be helpful in depositing data.  Would you like to
>>> further discuss this matter with us, and possibly also the
>>> GROMACS developers?
>> I would be delighted to talk with you. Basically there is a family of
>> markup languages based on CML (Chemical Markup Language). We have
>> tools that allow each domain to create their own XML schema. We have
>> languages in the area of CMLComp (computational), CMLCM (condensed
>> matter) that would be most relevant for your interests. These are not
>> finalised and we are very happy to be driven by real problems.
>> How should we take this forward - should I come to Oxford - and
>> perhaps give a demonstration. Or do you wish to come to Cambridge?
> I think the most efficient way to start is for us to read
> some of your documents, so we know what to expect.  Would
> you please send some pointers?  Thanks.
> _______________________________________________
> gmx-developers mailing list
> gmx-developers at gromacs.org
> http://www.gromacs.org/mailman/listinfo/gmx-developers
> Please don't post (un)subscribe requests to the list. Use the
> www interface or send it to gmx-developers-request at gromacs.org.
