[gmx-developers] Re: TNG format in Gromacs

Szilárd Páll szilard.pall at cbr.su.se
Tue Apr 17 18:14:02 CEST 2012

On Tue, Apr 17, 2012 at 5:10 PM, Erik Lindahl <erik at kth.se> wrote:
> Hi,
> On Apr 17, 2012, at 4:47 PM, Roland Schulz wrote:
>> But now you are moving the goal-posts!  The aim of the present TNG-based project was NOT "fancy" IO, but a new default simple portable Gromacs trajectory format that (1) includes headers for atom names and stuff, (2) is a small free library that can easily be contributed to other codes so they can read/write our files, and (3) enable better compression.
>> What I meant with "fancy" IO was that it is optional. These 3 things aren't required to run a simulation on an exotic platform (e.g. Kei) and to be able to analyze the results (after potentially converting).
> No. A new default file format will not be "optional".
> To be fairly blunt: this work is financed from an EU project with a deliverable to produce a small library that we can offer for inclusion in other codes and push as a standard MD file format, so that part is simply not negotiable.
>> Option 1) Up to 100k lines we have to write and support. And the code can only use the subset of HDF5 supported.
>> Option 2) Users on very exotic platforms have to keep using XTC and in post-production convert their files (only if they want to benefit from the HDF5 advantages in analysis).
>> I really don't see how Option 1 could win in any reasonable cost benefit analysis. :-)
> Option (2) is not on the table. We simply are NOT introducing a new _default_ file format that does not work everywhere, since that does not fulfill the EU deliverable we have. In that case we are simply going with alternative (3), which is "ditch HDF5 completely" for TNG. This might sound extreme, but if we have to keep supporting XTC indefinitely we will in practice never get a new default format - that would de facto still be XTC!
> Again, let's separate things:
> 1) Optional file formats: Here I'm flexible and will accept almost anything.
> 2) Default core Gromacs file formats: We must be able to get these working on any platform with a reasonable amount of work, and we must be able to provide a small library for other programs to use.
> If (2) is possible with a very scaled-down HDF5 implementation that somebody is willing to support and fix portability issues for, that could be an interesting alternative for the container, but if that isn't the case we cannot use HDF5 for (2) - but it would still be fine for (1).
> However, I still haven't seen any discussion about the *concrete* features HDF5 would provide for the new XTC-like format?

Not trying to argue for or against HDF5, I just have an observation.

It sounds like the main arguments against HDF5 are:

a) The need for a new format which will *immediately* replace XTC as
the default in Gromacs and seems to have requirements that pretty much
exclude any external library (that is not as widespread as libc :).

b) It's not in line with the EU deliverable which requires a new
library with certain specs to be written. Could it be that this is the
classic case where the specification was created before the actual
requirements engineering?

c) The apparent need for ultimate portability to extremely rare and
exotic architectures without accepting XTC as a fallback on these
platforms (with conversion for post-processing and analysis). I might
be wrong, but to me it seems that these extremely rare architectures
are more often showcase platforms than the iron on which 99% of the
science is carried out.

Correct me if I'm wrong, but XTC can't just be phased out in a day or
two. Supporting it will be necessary for the foreseeable future and
will require considerable effort -- especially with the major code
reorganization imposed by 5.0. This just *adds* to the effort of
developing+testing+maintaining an entirely new format written from
scratch, which aims to be nothing less than a universal future-proof
format with aspirations toward becoming a standard (= lots of
design+development effort). I can safely say that the manpower
required by these two tasks is quite heavy. Choosing to do everything
from scratch might maximize the benefit in the long run, but it also
maximizes the effort required already for v1.0.

Showcasing GROMACS is certainly quite important e.g. for funding.
However, if requiring ultimate portability of *every* new piece of
code limits possibilities and considerably slows down the development,
I would argue that this is the textbook example of the software
engineering 80/20 rule. Wouldn't it be beneficial to strike a balance
between portability and effort/time required by accepting a short-term
"compromise" (which isn't really a compromise if we don't consider
ultimate portability a strict requirement :)? XTC will have to be
maintained anyway, so it could as well be kept as the
alternative for platforms where the new format is not supported in
early versions. So a file format that works on all reference platforms
(that we can and should simply list and track) with XTC as a fallback
for the exotic iron should be an acceptable compromise, I'd say.


> Cheers,
> Erik