[gmx-developers] Re: TNG format in Gromacs
Roland Schulz
roland at utk.edu
Tue Apr 17 17:10:25 CEST 2012
On Tue, Apr 17, 2012 at 10:58 AM, David van der Spoel
<spoel at xray.bmc.uu.se> wrote:
> On 2012-04-17 16:47, Roland Schulz wrote:
> > Hi,
> >
> > On Tue, Apr 17, 2012 at 9:48 AM, Erik Lindahl <erik at kth.se> wrote:
> >
> >
> > On Apr 17, 2012, at 3:18 PM, Roland Schulz wrote:
> >
> > > E.g. for parallelization the issue is very similar to the one for
> > portability. Supporting domain decomposition makes it more difficult
> > for everyone and everyone has to make sure that they don't break it.
> > And it is only included because it is essential to Gromacs and used by
> > almost everyone.
> >
> > Right - and that's of course something we don't want to push down
> > just on the few people working with parallelization :-) We don't
> > have automated tests for it yet, but when we have more functional
> > tests the idea is that we should automatically reject patches that
> > break parallel runs!
> >
> > Yes. But we only do it for parallelization because the majority (in this
> > case probably everyone) agrees that this is important. We wouldn't
> > accept a feature which would be as time consuming for every developer as
> > parallelization is, but only useful for a small minority. :-)
> >
> > I simply don't buy the argument that just because these 1132 lines
> > are not perfect (they obviously aren't) portability doesn't matter
> > at all and we might as well include 10 megabytes of additional
> > source code where we have no control of the portability.
> >
> > I didn't say portability isn't important at all. All I'm saying is that
> > portability shouldn't be treated as a Boolean. In practice portability
> > is, like any other metric, a scale. And the decision to support 99.9% of
> > platforms instead of 99.5% should be a matter of cost-benefit analysis, as
> > is adding a new feature.
> >
> > > But I think that "fancy" IO is also an optional feature. I agree
> > that it is a very important feature and it has many disadvantages if
> > the same format is not used everywhere. But it is also
> > non-essential. And at that point it should become a matter of
> > cost-benefit and not a matter of principle. I.e. how many people
> > benefit from features made possible by HDF5 (e.g. because limited
> > developer time wouldn't allow them without HDF5) versus how much of
> > a pain it is for the few people who have to live with XTC (and
> > conversion). And one very important factor in that cost-benefit
> > analysis is the ratio of users.
> >
> > But now you are moving the goal-posts! The aim of the present
> > TNG-based project was NOT "fancy" IO, but a new default simple
> > portable Gromacs trajectory format that (1) includes headers for
> > atom names and stuff, (2) is a small free library that can easily be
> > contributed to other codes so they can read/write our files, and (3)
> > enable better compression.
> >
> > What I meant with "fancy" IO was that it is optional. These 3 things
> > aren't required to run a simulation on an exotic platform (e.g. Kei) and
> > to be able to analyze the results (after potentially converting).
> >
> > It would of course be nice if this format also allowed efficient
> > parallel IO and advanced slicing, but that has never been the
> > primary goal of the file format project, in particular not if it
> > starts to come in conflict with the aims above.
> >
> > As I said before, parallel IO isn't the issue. (Simple) parallel writing
> > is easier without HDF5. Parallel reading (for analysis) is possible as
> > long as the format is seekable (can be easily added even to XTC by
> > creating a 2nd file with the index).
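The "2nd file with the index" idea can be sketched as follows. This is only an illustration under stated assumptions: the length-prefixed frame layout and the names FRAME_HEADER, INDEX_ENTRY, build_index and read_frame are hypothetical and are not the real XTC layout or any Gromacs API. In-memory buffers stand in for the trajectory file and its index file.

```python
import io
import struct

# Hypothetical layout (NOT the real XTC format): each frame is a
# 4-byte little-endian payload length followed by the payload bytes.
FRAME_HEADER = struct.Struct("<I")
# The index file holds one 8-byte start offset per frame.
INDEX_ENTRY = struct.Struct("<Q")


def write_frames(traj, frames):
    """Write variable-length frames, each prefixed by its byte length."""
    for payload in frames:
        traj.write(FRAME_HEADER.pack(len(payload)))
        traj.write(payload)


def build_index(traj, index):
    """Scan the trajectory once and store the start offset of every frame."""
    traj.seek(0)
    count = 0
    while True:
        offset = traj.tell()
        header = traj.read(FRAME_HEADER.size)
        if len(header) < FRAME_HEADER.size:
            break  # end of file
        (length,) = FRAME_HEADER.unpack(header)
        traj.seek(length, io.SEEK_CUR)  # skip the payload without reading it
        index.write(INDEX_ENTRY.pack(offset))
        count += 1
    return count


def read_frame(traj, index, n):
    """Jump straight to frame n via the index; no earlier frame is read."""
    index.seek(n * INDEX_ENTRY.size)
    (offset,) = INDEX_ENTRY.unpack(index.read(INDEX_ENTRY.size))
    traj.seek(offset)
    (length,) = FRAME_HEADER.unpack(traj.read(FRAME_HEADER.size))
    return traj.read(length)


def demo():
    # In-memory buffers stand in for the trajectory file and its index file.
    traj, index = io.BytesIO(), io.BytesIO()
    frames = [b"frame-0", b"a-longer-frame-1", b"f2"]
    write_frames(traj, frames)
    count = build_index(traj, index)
    return count, [read_frame(traj, index, n) for n in (2, 0, 1)]
```

Once such an index exists, each analysis process can be handed its own range of frame numbers and seek to them independently, which is all that parallel reading needs from the format.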
> >
> >
> > Having said that, we just discussed things here in the lab, and one
> > alternative could be to have a simple built-in HDF5 implementation
> > that can write correct headers for 1-3 dimensional arrays so our
> > normal files are HDF5-compliant when written on a single node. This
> > should be possible to do in ~100k of source code. If there is no
> > external HDF5 library present, this will be the only alternative
> > supported, and you will not be able to use e.g. parallel IO - but
> > the file format will work.
> >
> >
> > Option 1) Up to 100k lines we have to write and support. And the code
> > can only use the subset of HDF5 supported.
> > Option 2) Users on very exotic platforms have to keep using XTC and in
> > post-production convert their files (only if they want to benefit from
> > the advantages of HDF5 in analysis)
> >
> > I really don't see how Option 1 could win in any reasonable
> > cost-benefit analysis. :-)
> >
> > BTW: All of HDF5 is 135k lines (according to sloccount, excluding the C++, HL
> > or Fortran bindings). And HDF5 has all OS-dependent functions (IO,
> > threads, ..) abstracted. Thus only a small part (18 files, total 9300
> > lines - this includes the respective headers and the abstraction layer
> > itself) has any #ifdef for Windows. Thus only those files would need to
> > be touched to add support for an OS that is not POSIX, Windows, or VMS. It is
> > even possible to write our own low-level file layer
> > (http://www.hdfgroup.org/HDF5/doc/TechNotes/VFL.html) which could be
> > based on futil.c to have our own OS abstraction.
> >
> > The caveat is what happens to the physical file format when HDF5
> > writes parallel IO? Will this result in a file with different
> > properties that is difficult for us to read with a naive
> > implementation?
> >
> > No problem. HDF5 parallel IO doesn't produce different formats. It
> > writes in standard chunks (which would need to be supported anyhow for
> > block compression and fast seek).
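Why a chunked layout gives both block compression and fast seek can be sketched like this. It is a toy illustration and not HDF5's actual on-disk format: the chunk size CHUNK_FRAMES, the (offset, length) chunk table and the function names are all assumptions made for the example, and frames are assumed to have a fixed size.

```python
import zlib

CHUNK_FRAMES = 4  # hypothetical chunk size, in frames


def write_chunked(frames):
    """Compress frames in groups of CHUNK_FRAMES.

    Returns the concatenated compressed bytes and a chunk table with one
    (offset, compressed_length) entry per chunk.
    """
    blob = bytearray()
    table = []
    for start in range(0, len(frames), CHUNK_FRAMES):
        raw = b"".join(frames[start:start + CHUNK_FRAMES])
        packed = zlib.compress(raw)
        table.append((len(blob), len(packed)))
        blob += packed
    return bytes(blob), table


def read_frame(blob, table, n, frame_size):
    """Decompress only the chunk that contains frame n and slice it out."""
    chunk, within = divmod(n, CHUNK_FRAMES)
    offset, length = table[chunk]
    raw = zlib.decompress(blob[offset:offset + length])
    return raw[within * frame_size:(within + 1) * frame_size]


def demo():
    frame_size = 8
    frames = [bytes([i]) * frame_size for i in range(10)]
    blob, table = write_chunked(frames)
    return len(table), read_frame(blob, table, 9, frame_size)
```

Because every chunk is compressed on its own and located through the table, a reader handles the file the same way whether the chunks were produced by one writer or by several.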
> >
> > Roland
> >
>
> Nice discussion. Just wanted to point out that if GROMACS needs HDF5 the
> big-iron vendors will help porting HDF5 to their platforms.
>
> By the way, has anyone worked on a port to iOS yet :) ?
>
It seems so ;-)
http://code.google.com/p/ios-face-detection/source/browse/OpenCV-2.2.0/include/opencv2/flann/hdf5.h?r=d35a62f475aa2813e4f3c80e50c33b7112389746
http://mail.hdfgroup.org/pipermail/hdf-forum_hdfgroup.org/2010-January/002357.html
Roland
>
>
> --
> David van der Spoel, Ph.D., Professor of Biology
> Dept. of Cell & Molec. Biol., Uppsala University.
> Box 596, 75124 Uppsala, Sweden. Phone: +46184714205.
> spoel at xray.bmc.uu.se http://folding.bmc.uu.se
--
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309