[gmx-developers] Re: TNG format in Gromacs

Tue Apr 17 16:58:03 CEST 2012

On 2012-04-17 16:47, Roland Schulz wrote:
> Hi,
>
> On Tue, Apr 17, 2012 at 9:48 AM, Erik Lindahl <erik at kth.se
> <mailto:erik at kth.se>> wrote:
>
>
>     On Apr 17, 2012, at 3:18 PM, Roland Schulz wrote:
>
>      > E.g. for parallelization the issue is very similar to what it is for
>     portability. Supporting domain decomposition makes it more difficult
>     for everyone and everyone has to make sure that they don't break it.
>     And it is only included because it is essential to Gromacs and used by
>     almost everyone.
>
>     Right - and that's of course something we don't want to push down
>     just on the few people working with parallelization :-) We don't
>     have automated tests for it yet, but when we have more functional
>     tests the idea is that we should automatically reject patches that
>     break parallel runs!
>
> Yes. But we only do it for parallelization because the majority (in this
> case probably everyone) agrees that this is important. We wouldn't
> accept a feature which would be as time consuming for every developer as
> parallelization is, but only useful for a small minority. :-)
>
>     I simply don't buy the argument that just because these 1132 lines
>     are not perfect (they obviously aren't) portability doesn't matter
>     at all and we might as well include 10 megabytes of additional
>     source code where we have no control of the portability.
>
> I didn't say portability isn't important at all. All I'm saying is that
> portability shouldn't be treated as a Boolean. In practice portability
> is, like any other metric, a scale. And the decision to support 99.9% of
> platforms instead of 99.5% should be a matter of cost-benefit analysis,
> as is adding a new feature.
>
>      > But I think that "fancy" IO is also an optional feature. I agree
>     that it is a very important feature and it has many disadvantages if
>     the same format is not used everywhere. But it is also
>     non-essential. And at that point it should become a matter of
>     cost-benefit and not a matter of principle. I.e. how many people
>     benefit from features made possible by HDF5 (e.g. because limited
>     developer time wouldn't allow them without HDF5) versus how much of
>     a pain it is to the few people who have to live with XTC (and
>     conversion). And one very important factor in that cost-benefit
>     analysis is the ratio of users.
>
>     But now you are moving the goal-posts!  The aim of the present
>     TNG-based project was NOT "fancy" IO, but a new default simple
>     portable Gromacs trajectory format that (1) includes headers for
>     atom names and stuff, (2) is a small free library that can easily be
>     contributed to other codes so they can read/write our files, and (3)
>     enable better compression.
>
> What I meant by "fancy" IO was that it is optional. These 3 things
> aren't required to run a simulation on an exotic platform (e.g. Kei) and
> to be able to analyze the results (after potentially converting).
>
>     It would of course be nice if this format also allowed efficient
>     parallel IO and advanced slicing, but that has never been the
>     primary goal of the file format project, in particular not if it
>     starts to come in conflict with the aims above.
>
> As I said before, parallel IO isn't the issue. (Simple) parallel writing
> is easier without HDF5. Parallel reading (for analysis) is possible as
> long as the format is seekable (can be easily added even to XTC by
> creating a 2nd file with the index).
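
A minimal sketch of the "2nd file with the index" idea mentioned above,
assuming a made-up frame layout (a 4-byte length prefix followed by the
payload) rather than the real XTC format. The file names and the helpers
write_frames/read_frame are hypothetical; the point is only that a sidecar
file of byte offsets lets a reader jump to any frame without scanning the
ones before it.

```python
import os
import struct
import tempfile

OFFSET = struct.Struct("<Q")  # one unsigned 64-bit byte offset per frame


def write_frames(traj_path, index_path, frames):
    """Write variable-length frames and record each frame's start offset."""
    with open(traj_path, "wb") as traj, open(index_path, "wb") as index:
        for payload in frames:
            index.write(OFFSET.pack(traj.tell()))
            # Hypothetical framing: 4-byte length prefix, then the payload.
            traj.write(struct.pack("<I", len(payload)))
            traj.write(payload)


def read_frame(traj_path, index_path, frame_no):
    """Jump straight to frame `frame_no` using the sidecar index."""
    with open(index_path, "rb") as index:
        index.seek(frame_no * OFFSET.size)
        (offset,) = OFFSET.unpack(index.read(OFFSET.size))
    with open(traj_path, "rb") as traj:
        traj.seek(offset)
        (length,) = struct.unpack("<I", traj.read(4))
        return traj.read(length)


def demo():
    """Write three frames, then read them back out of order."""
    frames = [b"frame-0", b"a much longer frame 1", b"f2"]
    with tempfile.TemporaryDirectory() as tmp:
        traj = os.path.join(tmp, "traj.bin")
        index = os.path.join(tmp, "traj.idx")
        write_frames(traj, index, frames)
        return [read_frame(traj, index, i) for i in (2, 0, 1)]
```

Because every index entry has a fixed size, several analysis processes
could each seek to their own range of frames independently.
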
>
>
>     Having said that, we just discussed things here in the lab, and one
>     alternative could be to have a simple built-in HDF5 implementation
>     that can write correct headers for 1-3 dimensional arrays so our
>     normal files are HDF5-compliant when written on a single node. This
>     should be possible to do in ~100k of source code. If there is no
>     external HDF5 library present, this will be the only alternative
>     supported, and you will not be able to use e.g. parallel IO - but
>     the file format will work.
>
>
> Option 1) Up to 100k lines we have to write and support. And the code
> can only use the subset of HDF5 supported.
> Option 2) Users on very exotic platforms have to keep using XTC and in
> post-production convert their files (only if they want to benefit from
> HDF5's advantages in analysis)
>
> I really don't see how Option 1 could win in any reasonable
> cost-benefit analysis. :-)
>
> BTW: All of HDF5 is 135k lines (according to sloccount, excluding the C++,
> HL, or Fortran bindings). And HDF5 has all OS-dependent functions (IO,
> threads, ..) abstracted. Thus only a small part (18 files, 9300 lines in
> total - this includes the respective headers and the abstraction layer
> itself) has any #ifdef for Windows. Thus only those files would need to
> be touched to add support for an OS that is not POSIX, Windows, or VMS.
> It is even possible to write our own low-level file layer
> (http://www.hdfgroup.org/HDF5/doc/TechNotes/VFL.html) which could be
> based on futil.c to have our own OS abstraction.
>
>     The caveat is what happens to the physical file format when HDF5
>     writes parallel IO? Will this result in a file with different
>     properties that is difficult for us to read with a naive
>     implementation?
>
> No problem. HDF5 parallel IO doesn't produce different formats. It
> writes in standard chunks (which would need to be supported anyhow for
> block compression and fast seek).
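
To illustrate why chunking helps with both block compression and fast
seek, here is a rough sketch. It is not HDF5's actual on-disk layout or
API: the chunk size, the fixed frame size, and the helpers
pack_chunks/read_frame are all assumptions made for the example. Each
group of frames is compressed on its own, so reading one frame means
decompressing only the chunk that contains it.

```python
import zlib

CHUNK_FRAMES = 4  # hypothetical chunk size, in frames


def pack_chunks(frames):
    """Compress fixed-size groups of equally sized frames independently."""
    chunks = []
    for start in range(0, len(frames), CHUNK_FRAMES):
        block = b"".join(frames[start:start + CHUNK_FRAMES])
        chunks.append(zlib.compress(block))
    return chunks


def read_frame(chunks, frame_no, frame_size):
    """Decompress only the chunk that holds `frame_no`, then slice it."""
    chunk_no, within = divmod(frame_no, CHUNK_FRAMES)
    block = zlib.decompress(chunks[chunk_no])
    return block[within * frame_size:(within + 1) * frame_size]
```

The same chunk boundaries would also be a natural unit for different
writers to fill in independently, which is consistent with the point above
that parallel writing need not change the format.
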
>
> Roland
>

Nice discussion. Just wanted to point out that if GROMACS needs HDF5, the
big-iron vendors will help port HDF5 to their platforms.

By the way, has anyone worked on a port to iOS yet :) ?

-- 
David van der Spoel, Ph.D., Professor of Biology
Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone:	+46184714205.
spoel at xray.bmc.uu.se    http://folding.bmc.uu.se