[gmx-developers] Gromacs parallel I/O?

David van der Spoel spoel at xray.bmc.uu.se
Wed Jul 7 07:14:37 CEST 2010

On 7/7/10 1:57 AM, Roland Schulz wrote:
> On Tue, Jul 6, 2010 at 7:18 PM, Shirts, Michael (mrs5pt)
> <mrs5pt at eservices.virginia.edu> wrote:
>     > BTW: Regarding parallel reading of XTC files in the analysis tools, I
>     > suggest we add an XTC meta-file. To read frames in parallel we need to
>     > know the starting position of each frame. Using a bisection search on
>     > XTC files in parallel will probably give poor performance on most
>     > parallel IO systems (many small random-access reads are exactly the IO
>     > pattern that parallel IO systems handle worst). Using TRR instead for
>     > parallel analysis is also not such a good idea, because even with
>     > parallel IO several analyses will be IO bound and would thus benefit
>     > from the XTC compression. Thus an XTC file with a meta-file containing
>     > the frame starting positions should give the best performance. A
>     > separate meta-file, instead of adding the positions to the header, has
>     > the advantage that we don't change the current format and thus don't
>     > break compatibility with 3rd-party software. Having a separate
>     > meta-file has the disadvantage of the bookkeeping required to make sure
>     > that the XTC file and the meta-file are up to date with each other, but
>     > I think this shouldn't be too difficult to solve. And if a meta-file is
>     > missing or not up to date, it can be generated on the fly.
>     I'm wondering if this is the sort of problem that eventually moving to
>     something like netCDF might help solve. Clearly, it would be a difficult
>     move, and would require interconversion utilities for backward
>     compatibility.
> I looked into this. The compression of XTC is very good, and good
> compression is important if you want a high IO rate (measured in
> uncompressed data). NetCDF3 doesn't support compression (there are only
> unsupported extensions). HDF5/NetCDF4 support compression, but only
> parallel reads of compressed data, not parallel writes of compressed
> data. Also, zlib compression would give a significantly lower
> compression ratio than XTC compression does.
> Thus none of them would, by itself, do everything we would like. Of course
> one could do the XTC compression within a NetCDF/HDF5 container, but I
> don't see how this would help anyone. Without full support for the
> required compression, the only other advantage I could see in moving to
> NetCDF/HDF5 is that it is easier for others to program readers/writers
> (which is already very easy since the xdrfile library has been released).
> And if we have our custom compression within NetCDF/HDF5, then reading or
> writing those files wouldn't be any easier than reading/writing current
> XTC files.
> Without compression we might as well use TRR. Writing a parallel
> reader/writer for that is dead simple (since the position of each frame
> is known from the number of atoms).
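A minimal sketch of the meta-file idea discussed above, assuming a sidecar file named <trajectory>.idx that stores the frame start offsets together with the trajectory's size and modification time. The file name, its JSON layout, and all function names are hypothetical and not part of GROMACS or xdrfile. Locating XTC frame boundaries needs a frame-header parser, which is not shown; it is passed in as scan_offsets. For the TRR-like case no scan is needed when every frame has the same size, so the offsets follow from arithmetic alone. partition then gives each reader one contiguous byte range of whole frames, which keeps the IO pattern large and sequential rather than small and random.

```python
import json
import os
from typing import Callable, List, Tuple


def fixed_stride_offsets(n_frames: int, frame_bytes: int) -> List[int]:
    """TRR-like case: if every frame has the same size (same atom count, same
    fields, same precision), frame i simply starts at i * frame_bytes."""
    return [i * frame_bytes for i in range(n_frames)]


def load_or_build_index(traj_path: str,
                        scan_offsets: Callable[[str], List[int]]) -> List[int]:
    """Return frame start offsets from a hypothetical sidecar index, rebuilding
    it if it is missing, unreadable, or describes a different file version."""
    idx_path = traj_path + ".idx"  # hypothetical naming convention
    st = os.stat(traj_path)
    try:
        with open(idx_path) as fh:
            meta = json.load(fh)
        # Trust the index only if it was built for this exact size and mtime.
        if meta["size"] == st.st_size and meta["mtime_ns"] == st.st_mtime_ns:
            return list(meta["offsets"])
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing, corrupt, or incomplete index: rebuild below
    offsets = scan_offsets(traj_path)  # one sequential pass over the file
    with open(idx_path, "w") as fh:
        json.dump({"size": st.st_size, "mtime_ns": st.st_mtime_ns,
                   "offsets": offsets}, fh)
    return offsets


def partition(offsets: List[int], file_size: int,
              nranks: int) -> List[Tuple[int, int]]:
    """Split whole frames into nranks contiguous byte ranges [start, end) so
    each rank issues one large sequential read instead of many small ones."""
    bounds = offsets + [file_size]  # frame i ends where frame i + 1 starts
    n = len(offsets)
    ranges = []
    for r in range(nranks):
        first = r * n // nranks
        last = (r + 1) * n // nranks  # exclusive frame index
        ranges.append((bounds[first], bounds[last]))
    return ranges
```

For example, with four 100-byte frames, partition(fixed_stride_offsets(4, 100), 400, 2) gives [(0, 200), (200, 400)]. A size and mtime comparison is a cheap staleness test but not a guarantee; a file rewritten to the same size within the timestamp resolution would slip through, so a stricter check (e.g. a checksum of the last frame) may be worth its extra reads.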

Someone here at UU (Daniel Spångberg) has developed a new trajectory 
library, TNG (trajectory next generation). We are about to submit a 
paper about it. Key advantages over XTC:

- slightly better compression (slightly slower in its strongest mode, but 
the algorithm is tunable)
- support for velocities
- support for additional information (e.g. atom names) in one or more frames
- random access to frames without binary search
- parallel compression
- open source

This will provide a very good basis for parallel trajectory I/O.

The main problem for parallel I/O is managing atom numbers in a 
domain decomposition setup. If atoms drift to another processor over 
time, the bookkeeping has to account for this, in particular when 
the trajectories are later assembled for analysis.
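One way to handle the bookkeeping described above, sketched under the assumption (not stated in this thread, and not necessarily what GROMACS or TNG do) that each rank writes the global indices of the atoms it currently owns next to their coordinates. Reassembly for analysis is then a scatter into global order, with checks that every atom appears exactly once per frame. The data layout and the function name assemble_frame are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def assemble_frame(natoms: int,
                   rank_chunks: Sequence[Tuple[Sequence[int], Sequence[Coord]]]
                   ) -> List[Coord]:
    """Rebuild one frame in global atom order from per-rank pieces.

    Each piece holds the global indices of the atoms a rank owned when the
    frame was written, plus their coordinates. Ownership may differ from
    frame to frame, which is why the indices have to be stored at all."""
    frame: List[Optional[Coord]] = [None] * natoms
    for global_ids, coords in rank_chunks:
        if len(global_ids) != len(coords):
            raise ValueError("index and coordinate counts differ")
        for gid, xyz in zip(global_ids, coords):
            if not 0 <= gid < natoms:
                raise ValueError(f"atom index {gid} out of range")
            if frame[gid] is not None:
                raise ValueError(f"atom {gid} written by more than one rank")
            frame[gid] = xyz
    missing = [i for i, xyz in enumerate(frame) if xyz is None]
    if missing:
        raise ValueError(f"{len(missing)} atoms missing, first: {missing[0]}")
    # Every slot is filled at this point; the filter only narrows the type.
    return [xyz for xyz in frame if xyz is not None]
```

The cost of this design is one extra index per atom per frame; the alternative is to gather and sort coordinates into global order before writing, which shifts that work onto the running simulation instead of the analysis.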

> Roland
>     Best,
>     ~~~~~~~~~~~~
>     Michael Shirts
>     Assistant Professor
>     Department of Chemical Engineering
>     University of Virginia
>     michael.shirts at virginia.edu
>     (434)-243-1821
>     --
>     gmx-developers mailing list
>     gmx-developers at gromacs.org
>     http://lists.gromacs.org/mailman/listinfo/gmx-developers
>     Please don't post (un)subscribe requests to the list. Use the
>     www interface or send it to gmx-developers-request at gromacs.org.
> --
> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
> 865-241-1537, ORNL PO BOX 2008 MS6309

David van der Spoel, PhD, Professor of Biology
Dept. of Cell and Molecular Biology, Uppsala University.
Husargatan 3, Box 596, 75124 Uppsala, Sweden
phone: 46 18 471 4205, fax: 46 18 511 755
spoel at xray.bmc.uu.se, spoel at gromacs.org, http://folding.bmc.uu.se
