[gmx-developers] Re: TNG format in Gromacs
David van der Spoel
spoel at xray.bmc.uu.se
Tue Apr 17 16:58:03 CEST 2012
On 2012-04-17 16:47, Roland Schulz wrote:
> Hi,
>
> On Tue, Apr 17, 2012 at 9:48 AM, Erik Lindahl <erik at kth.se> wrote:
>
>
> On Apr 17, 2012, at 3:18 PM, Roland Schulz wrote:
>
> > E.g. for parallelization the issue is very similar to what it is for
> portability. Supporting domain decomposition makes it more difficult
> for everyone and everyone has to make sure that they don't break it.
> And it is only included because it is essential to Gromacs and used by
> almost everyone.
>
> Right - and that's of course something we don't want to push down
> just on the few people working with parallelization :-) We don't
> have automated tests for it yet, but when we have more functional
> tests the idea is that we should automatically reject patches that
> break parallel runs!
>
> Yes. But we only do it for parallelization because the majority (in this
> case probably everyone) agrees that this is important. We wouldn't
> accept a feature which would be as time consuming for every developer as
> parallelization is, but only useful for a small minority. :-)
>
> I simply don't buy the argument that just because these 1132 lines
> are not perfect (they obviously aren't) portability doesn't matter
> at all and we might as well include 10 megabytes of additional
> source code where we have no control of the portability.
>
> I didn't say portability isn't important at all. All I'm saying is that
> portability shouldn't be treated as a Boolean. In practice portability
> is, like any other metric, a scale. And the decision to support 99.9% of
> platforms instead of 99.5% should be a matter of cost-benefit analysis,
> as is adding a new feature.
>
> > But I think that "fancy" IO is also an optional feature. I agree
> that it is a very important feature and it has many disadvantages if
> the same format is not used everywhere. But it is also
> non-essential. And at that point it should become a matter of
> cost-benefit and not a matter of principle. I.e. how many people
> benefit from features made possible by HDF5 (e.g. because limited
> developer time wouldn't allow them without HDF5) versus how much of
> a pain is it to the few people who have to live with XTC (and
> conversion). And one very important factor in that cost-benefit
> analysis is the ratio of users.
>
> But now you are moving the goal-posts! The aim of the present
> TNG-based project was NOT "fancy" IO, but a new default simple
> portable Gromacs trajectory format that (1) includes headers for
> atom names and stuff, (2) is a small free library that can easily be
> contributed to other codes so they can read/write our files, and (3)
> enables better compression.
>
> What I meant by "fancy" IO was that it is optional. These 3 things
> aren't required to run a simulation on an exotic platform (e.g. Kei) and
> to be able to analyze the results (after potentially converting).
>
> It would of course be nice if this format also allowed efficient
> parallel IO and advanced slicing, but that has never been the
> primary goal of the file format project, in particular not if it
> starts to come in conflict with the aims above.
>
> As I said before, parallel IO isn't the issue. (Simple) parallel writing
> is easier without HDF5. Parallel reading (for analysis) is possible as
> long as the format is seekable (can be easily added even to XTC by
> creating a 2nd file with the index).
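
To make the seekable-format point concrete, here is a minimal Python sketch of a sidecar index. It does NOT use the real XTC layout: the length-prefixed frame format and every function name below are assumptions made up for illustration. The idea is only that one sequential scan records the byte offset of each frame, the offsets go into a second file, and after that any reader (or any rank in a parallel analysis) can seek straight to its own frames.

```python
import io
import struct

# Toy trajectory format (an assumption, not XTC): each frame is a
# 4-byte little-endian length followed by that many payload bytes.

def write_frames(stream, frames):
    """Append every frame to the stream as length + payload."""
    for payload in frames:
        stream.write(struct.pack("<I", len(payload)))
        stream.write(payload)

def build_index(stream):
    """Scan the trajectory once and return the byte offset of each frame."""
    offsets = []
    stream.seek(0)
    while True:
        pos = stream.tell()
        header = stream.read(4)
        if len(header) < 4:
            break  # end of file
        (size,) = struct.unpack("<I", header)
        offsets.append(pos)
        stream.seek(size, io.SEEK_CUR)  # skip the payload without reading it
    return offsets

def save_index(stream, offsets):
    """Store the offsets in the second (index) file as 64-bit integers."""
    stream.write(struct.pack("<%dQ" % len(offsets), *offsets))

def load_index(stream):
    """Read all offsets back from the index file."""
    data = stream.read()
    return list(struct.unpack("<%dQ" % (len(data) // 8), data))

def read_frame(stream, offsets, i):
    """Random access: jump directly to frame i using the index."""
    stream.seek(offsets[i])
    (size,) = struct.unpack("<I", stream.read(4))
    return stream.read(size)

def frames_for_rank(n_frames, rank, n_ranks):
    """Round-robin split of frame numbers over analysis ranks."""
    return range(rank, n_frames, n_ranks)
```

In a parallel analysis each rank would open the trajectory itself, load the index once, and call `read_frame` only for the frame numbers returned by `frames_for_rank`, so no rank has to read frames that belong to another.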
>
>
> Having said that, we just discussed things here in the lab, and one
> alternative could be to have a simple built-in HDF5 implementation
> that can write correct headers for 1-3 dimensional arrays so our
> normal files are HDF5-compliant when written on a single node. This
> should be possible to do in ~100k of source code. If there is no
> external HDF5 library present, this will be the only alternative
> supported, and you will not be able to use e.g. parallel IO - but
> the file format will work.
>
>
> Option 1) Up to 100k lines we have to write and support. And the code
> can only use the subset of HDF5 supported.
> Option 2) Users on very exotic platforms have to keep using XTC and in
> post-production convert their files (only if they want to benefit from
> HDF5 advantages in analysis)
>
> I really don't see how Option 1 could win in any reasonable
> cost-benefit analysis. :-)
>
> BTW: All of HDF5 is 135k lines (according to sloccount, excluding the C++,
> HL or Fortran bindings). And HDF5 has all OS-dependent functions (IO,
> threads, ..) abstracted. Thus only a small part (18 files, total 9300
> lines - this includes the respective headers and the abstraction layer
> itself) has any #ifdef for windows. Thus only those files would need to
> be touched to add support for a non-POSIX, WINDOWS, or VMS OS. It is
> even possible to write our own low-level file layer
> (http://www.hdfgroup.org/HDF5/doc/TechNotes/VFL.html) which could be
> based on futil.c to have our own OS abstraction.
>
> The caveat is what happens to the physical file format when HDF5
> writes parallel IO? Will this result in a file with different
> properties that is difficult for us to read with a naive
> implementation?
>
> No problem. HDF5 parallel IO doesn't produce different formats. It
> writes in standard chunks (which would need to be supported anyhow for
> block compression and fast seek).
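
As a rough illustration of why chunks serve both block compression and fast seek, here is a small Python sketch. It is not HDF5's on-disk layout and makes no HDF5 calls: the fixed frame size, the chunk table, and both function names are assumptions for illustration. Frames are grouped into chunks, each chunk is compressed on its own with zlib, and a table of (offset, size) pairs lets a reader decompress only the chunk that holds the frame it wants.

```python
import zlib

def write_chunked(frames, frames_per_chunk):
    """Compress frames chunk by chunk.

    Returns (blob, table): blob is all compressed chunks concatenated,
    table holds one (byte offset, compressed size) pair per chunk.
    Assumes every frame has the same length.
    """
    blob = bytearray()
    table = []
    for start in range(0, len(frames), frames_per_chunk):
        raw = b"".join(frames[start:start + frames_per_chunk])
        packed = zlib.compress(raw)
        table.append((len(blob), len(packed)))
        blob += packed
    return bytes(blob), table

def read_chunked_frame(blob, table, frames_per_chunk, frame_size, i):
    """Return frame i, decompressing only the chunk that contains it."""
    chunk_no, within = divmod(i, frames_per_chunk)
    offset, size = table[chunk_no]
    raw = zlib.decompress(blob[offset:offset + size])
    return raw[within * frame_size:(within + 1) * frame_size]
```

Because each chunk is independent, it does not matter to the reader which process wrote which chunk, which is the property the reply above relies on.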
>
> Roland
>
Nice discussion. Just wanted to point out that if GROMACS needs HDF5, the
big-iron vendors will help port HDF5 to their platforms.
By the way, has anyone worked on a port to iOS yet :) ?
--
David van der Spoel, Ph.D., Professor of Biology
Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone: +46184714205.
spoel at xray.bmc.uu.se http://folding.bmc.uu.se