[gmx-developers] Re: TNG format in Gromacs

Roland Schulz roland at utk.edu
Tue Apr 17 15:18:36 CEST 2012


On Tue, Apr 17, 2012 at 2:44 AM, Erik Lindahl <lindahl at cbr.su.se> wrote:

>
> On Apr 16, 2012, at 10:08 PM, Roland Schulz wrote:
> >
> > Obviously, by requiring that an external library works everywhere, we
> decide that no external library can ever be used (because it is impossible
> to test against all platforms/compilers) :-). Also I don't think it makes
> any sense to choose portability over every other important metric
> (maintainability, features, ...), no matter how small the portability
> difference and how huge the other advantages. Additionally, I think it
> should be up to the developers who want a certain portability to put in the
> effort to get it (as it is with other features).
>
> Unlike other features (such as a better RNG or faster parallel IO),
> portability has to be a feature of every single line of code in Gromacs. It
> only takes 30 seconds for anybody to destroy it, while it can be months of
> time to fix it if it is necessary to reimplement code. You can't "put in"
> portability separately from the rest of the code.
>
For parallelization, for example, the issue is very similar to the one for
portability. Supporting domain decomposition makes development more
difficult for everyone, and everyone has to make sure that they don't break
it. It is only included because it is essential to Gromacs and used by
almost everyone.


> > I think we should have a list of supported platforms/compilers. As a
> developer it should be my duty to make sure that any code changes work on
> any of those (ideally Jenkins should do it for me) and I should be
> responsible to fix any bugs on any supported platform in my code. On the
> other hand, bugs on any unsupported platform should be the responsibility
> of the person who wants it.
>
> We already have this list: An ISO C (soon C++) compiler. If your code does
> not compile on a standards-compliant compiler, the code is buggy.
>
This is only an idea, not a practical rule. It is impossible to write
any OS-dependent code (e.g. file operations) under that requirement. E.g.
futil.c only works if the OS is either POSIX or Windows. The rule is also
unclear about how it is implemented/tested. Is it enough to show that
strict adherence to the standard is followed (e.g. gcc -pedantic), or
does it require testing all possible compilers (which is impossible)?


>
> > If we required all developers to support all platforms, we would
> shift the responsibility to support esoteric platforms to the developers of
> new features, even if the majority has no interest in the platform. This is
> not how we handle new features and I don't see any reason why we should do
> that for portability.
>
> Again, the difference between a feature and portability is that you can
> live without a new feature if it is possible to turn off, while
> non-portable code means Gromacs won't run at all. Only requiring an ISO
> compiler is a core value we have had for almost 20 years, and that decision
> won't be changed easily.
>
> Historically, the portability to new platforms has been a tremendous
> advantage for Gromacs.
>
As far as I know, at least for the last few years, portability to exotic
platforms (e.g. PS3, K) has been important to only a very small fraction of
users/developers.


> > I think we shouldn't require better portability of external code than we
> require of our own.
>
> Indeed - but the whole point is that Gromacs code that isn't portable when
> used with an ISO C/C++ compiler should result in a redmine issue and be
> fixed. When this has been urgent (e.g. for a new supercomputer or
> compiler), we have virtually always managed to fix it within 24h.
>
> We don't require this of external code or libraries for optional features,
> of course, but if a library is going to be used for a core Gromacs feature
> (like FFTs, or the XDR functionality) we have made sure there is a built-in
> version where we can take responsibility for the portability ourselves.
>

But I think that "fancy" IO is also an optional feature. I agree that it is
a very important feature and that there are many disadvantages if the same
format is not used everywhere. But it is also non-essential. At that point it
should become a matter of cost-benefit and not a matter of principle, i.e.
how many people benefit from features made possible by HDF5 (e.g. because
limited developer time wouldn't allow them without HDF5) versus how much of
a pain it is for the few people who have to live with XTC (and conversion).
One very important factor in that cost-benefit analysis is the ratio
between those two groups of users.


> >
> >
> > > I think the first step is to be concrete about exactly what useful
> features we will get automatically with HDF5 that will be very difficult or
> impossible to implement in our own format?
> > It is one option to do that research first. We have started it but are
> not finished. Another option is, we first agree on the requirements any
> potential external library has to fulfill. I think we currently don't have
> any guidelines for this.
>
> The route we have taken thus far is that you can use pretty much any
> external library, as long as there is Gromacs built-in functionality
> (potentially much slower) to make sure Gromacs always works. For instance,
> we have FFTPACK built-in although we prefer to use FFTW externally.
>
> > Well even if all of TNG would only be available with HDF5, I don't see
> why we would need to have it 100% portable. The XTC compression is not
> orders of magnitude worse.
>
> 1) Because we want to avoid producing XTC for new trajectories. We want a
> small library that other programs (VMD, PyMol, Babel, etc.) can include to
> be able to read/write these trajectory formats. Case in point: I believe
> there are at least two orders of magnitude more users that want Gromacs
> file formats they can directly feed into other programs compared to the
> number of users for whom more efficient parallel IO is important.
>
Parallel IO is not something which would benefit from HDF5. The advantages
I see (as I said, this is a preliminary list because we are in the middle of
looking into it):
- Users can use HDF5 tools to look at files
- Users can write easy scripts (simple bindings to scripting languages) to
analyze data
- No large extra code base for a container format in Gromacs
(maintainability)
- Less developer time (we can focus on MD-specific code and not on how to
implement a container format)
- Easy to implement convenience features (like adding structure and
topology to the trajectory; combining trajectory, energies, and extra output
(e.g. pull) and thus having (optionally) only one output file; having a
built-in history; making files user-extensible)
- Trajectories are easy to index and seek (e.g. for parallel analysis)
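
To make the last point concrete, here is a minimal sketch of what an index
buys you for random access. It is only an illustration: it uses neither HDF5
nor any existing Gromacs/TNG layout, and the frame layout (one time value
plus xyz per atom) and all names (N_ATOMS, write_frames, read_frame,
build_demo) are made up for this example.

```python
import io
import struct

N_ATOMS = 3  # hypothetical tiny system, for illustration only
# Hypothetical frame layout: one double (time) + 3 floats per atom.
FRAME_FMT = "<d" + "f" * (3 * N_ATOMS)
FRAME_SIZE = struct.calcsize(FRAME_FMT)


def write_frames(stream, frames):
    """Write frames sequentially and return the byte offset of each one."""
    index = []
    for time, coords in frames:
        index.append(stream.tell())
        stream.write(struct.pack(FRAME_FMT, time, *coords))
    return index


def read_frame(stream, index, i):
    """Jump straight to frame i using the index, without reading earlier frames."""
    stream.seek(index[i])
    values = struct.unpack(FRAME_FMT, stream.read(FRAME_SIZE))
    return values[0], list(values[1:])


def build_demo():
    """Create an in-memory 'trajectory' of 5 frames plus its index."""
    stream = io.BytesIO()
    frames = [(float(t), [float(t)] * (3 * N_ATOMS)) for t in range(5)]
    index = write_frames(stream, frames)
    return stream, index
```

With such an index, each analysis process can seek directly to its own
range of frames, which is what makes parallel analysis straightforward. A
container library would presumably maintain the equivalent bookkeeping for
us, instead of us writing and maintaining it ourselves.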

Obviously one could either keep our own container format very simple, and
thus miss most convenience features, tools, and language bindings, or
one could make it very complex, and thus require a lot of developer time
and have the maintainability issues. But without an external library one
cannot have both.

> 2) Because it would introduce internal incompatibilities in Gromacs. If the
> default file format isn't available on all platforms we would be back in
> the situation before we had XTC available everywhere, and Gromacs compiled
> on host B would not always be able to read the files produced on host A
> with the very same Gromacs version.
>
But what is the problem if all non-exotic systems support the new
file format? Then one can still guarantee that the analysis can be done in
the new format. The only thing the (very few) users of exotic platforms
have to do is convert the XTC output to the new format (post production)
to avoid any problems.

Roland



-- 
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309