[gmx-developers] Re: TNG format in Gromacs

Erik Lindahl erik at kth.se
Tue Apr 17 15:48:17 CEST 2012


On Apr 17, 2012, at 3:18 PM, Roland Schulz wrote:

> E.g. for parallelization the issue is very similar to the one for portability. Supporting domain decomposition makes it more difficult for everyone and everyone has to make sure that they don't break it. And it is only included because it is essential to Gromacs and used by almost everyone.

Right - and that's of course something we don't want to push down just on the few people working with parallelization :-) We don't have automated tests for it yet, but when we have more functional tests the idea is that we should automatically reject patches that break parallel runs!
> 
> 
> This is only an idea but not a practical rule. It is impossible to write any OS-dependent code (e.g. file operations) with that requirement. E.g. futil.c only works if the OS is either POSIX or Windows. And also this rule is unclear about how it is implemented/tested. Is it enough to show that strict adherence to the standard is followed (e.g. gcc -pedantic) or does it require testing all possible compilers (impossible)?

It is a rule in the sense that we will accept bug reports and work to fix any portability issues on such platforms. This is quite different from saying we don't care about platforms not included on a compatibility list.

By the way, this is also a difference:

futil.c: 1132 lines, 28811 characters, 32 short routines.

There are roughly a dozen OS-dependent #ifdefs in this file. The second somebody needs to run Gromacs on a non-POSIX, non-Windows platform, I can pretty much guarantee that we will get it working on that platform within 24h.
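
For readers who have not looked at this kind of code, here is a minimal sketch of what one such OS-dependent #ifdef block typically looks like. It is not taken from futil.c: the names sketch_file_exists and SKETCH_DIR_SEPARATOR are hypothetical, and the sketch assumes that anything that is not Windows is POSIX.

```c
#include <stdio.h>

#if defined(_WIN32)
#include <io.h>     /* _access() on Windows */
#define SKETCH_DIR_SEPARATOR '\\'
#else
#include <unistd.h> /* access() on POSIX; assumption: non-Windows means POSIX */
#define SKETCH_DIR_SEPARATOR '/'
#endif

/* Hypothetical helper (not from futil.c): returns 1 if path exists, 0 otherwise. */
static int sketch_file_exists(const char *path)
{
#if defined(_WIN32)
    return _access(path, 0) == 0;    /* mode 0: existence check only */
#else
    return access(path, F_OK) == 0;  /* F_OK: existence check only */
#endif
}
```

Supporting a third kind of platform would then mean adding one more branch to each block of this shape, which is why the amount of work stays bounded when the OS-dependent code is this small.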

I simply don't buy the argument that just because these 1132 lines are not perfect (they obviously aren't), portability doesn't matter at all and we might as well include 10 megabytes of additional source code where we have no control of the portability.

> 
> But I think that "fancy" IO is also an optional feature. I agree that it is a very important feature and it has many disadvantages if the same format is not used everywhere. But it is also non-essential. And at that point it should become a matter of cost-benefit and not a matter of principle. I.e. how many people benefit from features made possible by HDF5 (e.g. because limited developer time wouldn't allow them without HDF5) versus how much of a pain it is to the few people who have to live with XTC (and conversion). And one very important factor in that cost-benefit analysis is the ratio of users.

But now you are moving the goal-posts! The aim of the present TNG-based project was NOT "fancy" IO, but a new default simple portable Gromacs trajectory format that (1) includes headers for atom names and stuff, (2) is a small free library that can easily be contributed to other codes so they can read/write our files, and (3) enables better compression.

It would of course be nice if this format also allowed efficient parallel IO and advanced slicing, but that has never been the primary goal of the file format project, in particular not if it starts to come in conflict with the aims above.


Having said that, we just discussed things here in the lab, and one alternative could be to have a simple built-in HDF5 implementation that can write correct headers for 1-3 dimensional arrays so our normal files are HDF5-compliant when written on a single node. This should be possible to do in ~100k of source code. If there is no external HDF5 library present, this will be the only alternative supported, and you will not be able to use e.g. parallel IO - but the file format will work.

The caveat is what happens to the physical file format when HDF5 writes parallel IO? Will this result in a file with different properties that is difficult for us to read with a naive implementation? In that case we will probably have to use a different extension for the parallel IO files to make clear that they are less portable - but we certainly won't mind having that as an optional feature.
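
As a rough illustration of the very first step a naive built-in implementation would need, the sketch below checks for the 8-byte HDF5 format signature. It assumes the superblock starts at offset 0 of the file (the HDF5 format also permits it at later offsets), and the names sketch_hdf5_signature and sketch_has_hdf5_signature are hypothetical. It says nothing about the layout that follows the signature, which is exactly where the open question above lies.

```c
#include <stddef.h>
#include <string.h>

/* HDF5 format signature: \211 H D F \r \n \032 \n */
static const unsigned char sketch_hdf5_signature[8] =
    { 0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n' };

/* Hypothetical helper: returns 1 if buf (the first len bytes of a file)
 * starts with the HDF5 signature, 0 otherwise.
 * Assumption: the superblock is located at offset 0. */
static int sketch_has_hdf5_signature(const unsigned char *buf, size_t len)
{
    return len >= sizeof(sketch_hdf5_signature)
           && memcmp(buf, sketch_hdf5_signature,
                     sizeof(sketch_hdf5_signature)) == 0;
}
```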


Cheers,

Erik