[gmx-developers] Shall we ditch gro and g96 files?

Berk Hess hessb at mpip-mainz.mpg.de
Tue Apr 1 15:28:50 CEST 2008


Erik Lindahl wrote:

> Hi,
> On Apr 1, 2008, at 1:26 PM, Mathias PUETZ wrote:
>
>>
>> The Espresso file format, like most other ASCII formats, has a serious
>> problem if you worry about parallel IO: it is hardly parallelizable.
>> Even though IO may not be a problem today, given time it will be,
>> as simulated systems and numbers of compute nodes get larger
>> and serial CPU power for ASCII formatting on rank 0 no longer scales.
>> I would seriously recommend considering John's suggestion and going for
>> HDF5, which parallelizes well and offers high flexibility.
>> For those who want the simplicity of Fortran array IO, I would rather
>> spend a bit of extra effort to develop a comfortable reader tool that can
>> extract ASCII-readable data for those who don't want the complexity of
>> dealing with HDF5 directly (although HDF5 comes with its own flexible
>> ASCII readers, which might be sufficient for most tasks).
>
>
> We actually considered NetCDF a long time ago, but at that time we
> decided against it since HDF5 was coming but was still too new/unstable
> :-)
>
> I think a lot of people (including me...) like to be able to do
> "simple" coordinate manipulation through scripts that just grep/awk
> for atom names, but I like Mathias's suggestion of having a separate
> tool to translate back and forth instead, and keeping the "core" format HDF5.

I guess we would want two different formats:
one for trajectories and one for simple configuration files.

David started this discussion with pdb, gro and g96, which are (at least in
Gromacs) mainly used for single configurations that are manipulated before
and after runs, not really during runs.
For this purpose I think we want a plain ASCII format that is somewhat
flexible and extensible, but does not necessarily need to be easily
parallelizable or small.

For real trajectory files we want something that parallelizes.
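To illustrate why parallelization favors a binary trajectory layout, here is a minimal Python sketch (not Gromacs or HDF5 code; all function names are hypothetical). It assumes each atom is stored as three big-endian single-precision floats (XDR-style byte order) and that atoms are block-decomposed over ranks, with the ranks simulated sequentially in one process. With fixed-size binary records every rank can compute its own byte offset and write independently; with free-format ASCII the offset of a record depends on the formatted length of every record before it. (Fixed-column ASCII such as gro avoids the offset problem, but the formatting cost remains.)

```python
import random
import struct

FLOAT_BYTES = 4  # assumption: single-precision coordinates

def ascii_record(x, y, z):
    # Free-format ASCII: the record length depends on the printed values.
    return f"{x} {y} {z}\n".encode()

def ascii_offsets(coords):
    # The offset of each record is the sum of the lengths of all earlier
    # records, so a rank needs every preceding atom's formatted text.
    offsets, pos = [], 0
    for c in coords:
        offsets.append(pos)
        pos += len(ascii_record(*c))
    return offsets

def binary_offset(atom_index):
    # Fixed-size binary: each atom takes exactly 3 floats, so any atom's
    # offset is known without looking at other ranks' data.
    return atom_index * 3 * FLOAT_BYTES

def write_frame_binary(coords, n_ranks):
    """Write one frame into a shared buffer; each 'rank' fills its own slice."""
    n_atoms = len(coords)
    buf = bytearray(n_atoms * 3 * FLOAT_BYTES)
    chunk = (n_atoms + n_ranks - 1) // n_ranks
    ranks = list(range(n_ranks))
    random.shuffle(ranks)  # write order is irrelevant: no rank waits on another
    for rank in ranks:
        start = rank * chunk
        stop = min(start + chunk, n_atoms)
        for i in range(start, stop):
            # '>3f' = three big-endian IEEE floats, as in XDR
            struct.pack_into(">3f", buf, binary_offset(i), *coords[i])
    return bytes(buf)
```

For example, with coordinates (1.0, 2.5, -3.0) and (10.125, 0.0, 7.0) the ASCII records are 13 and 15 bytes long, while every binary record is 12 bytes, regardless of the values.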

Berk.

>
> The only thing that worries me (just a little bit :-) is that it would 
> make us entirely dependent on a big external library. I know that HDF5 
> is _very_ portable, but at least in theory we could end up in a 
> situation where Gromacs doesn't work on some obscure platform e.g. 
> because there's a compiler bug affecting HDF5.  
>
> Of course, that might be a reasonable compromise, but since I ended up 
> doing my own implementation of the Unix external data representation 
> (XDR) when we first ported Gromacs to Windows, I've toyed around with
> the idea of having some minimal built-in HDF5-generating code as a 
> backup...
>
> Mathias/John, do you or anybody else have any experience with using
> HDF5 for development? Have there been different library versions that
> you need to install, or do packages usually include their own copy of 
> the library?
>
> Cheers,
>
> Erik
>
