[gmx-developers] Gromacs parallel I/O?
David van der Spoel
spoel at xray.bmc.uu.se
Wed Jul 7 07:14:37 CEST 2010
On 7/7/10 1:57 AM, Roland Schulz wrote:
> On Tue, Jul 6, 2010 at 7:18 PM, Shirts, Michael (mrs5pt)
> <mrs5pt at eservices.virginia.edu> wrote:
> > > BTW: Regarding parallel read of XTC for analysis tools, I suggest we add
> > > an XTC meta-file to solve the problem of parallel reading. To be able
> > > to read frames in parallel we need to know the starting position of
> > > each frame. Using the bisection search for XTC in parallel will
> > > probably give poor performance on most parallel IO systems (small,
> > > random-access IO patterns are what parallel IO systems like least).
> > > Using TRR instead for parallel analysis is also not a good idea,
> > > because even with parallel IO several analyses will be IO bound, and
> > > thus we could benefit from the XTC compression.
> > > Thus an XTC file with a meta-file containing the starting positions
> > > should give the best performance. A separate meta-file, instead of
> > > adding the positions to the header, has the advantage that we don't
> > > change the current format and thus don't break compatibility with
> > > 3rd-party software. Having a separate meta-file has the disadvantage
> > > of the bookkeeping required to make sure that the XTC file and the
> > > meta-file are up to date with each other, but I think this shouldn't
> > > be too difficult to solve. And if a meta-file is missing or not up to
> > > date, it can be generated on the fly.
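A minimal sketch of the proposed meta-file, assuming the XTC frame layout written by the xdrfile library (summarized in the comments). The `.idx` file name, its JSON contents, and all function names are hypothetical. The index is built by reading only the few header fields needed to skip each frame, and a size/mtime stamp is one simple way to do the up-to-date bookkeeping; a missing or stale index is regenerated on the fly.

```python
import json
import os
import struct

# Assumed XTC frame layout (big-endian XDR, as written by the xdrfile library):
#   int magic (1995), int natoms, int step, float time    -> 16 bytes
#   float box[3][3]                                       -> 36 bytes
#   int natoms (repeated by the coordinate writer)        ->  4 bytes
#   natoms <= 9: 3 * natoms uncompressed floats follow
#   natoms  > 9: float precision, int minint[3], int maxint[3], int smallidx,
#                int nbytes, then nbytes of compressed data padded to 4 bytes
XTC_MAGIC = 1995
SHORT_HEADER = 16 + 36 + 4                  # 56 bytes before coordinate data
NBYTES_AT = SHORT_HEADER + 4 + 12 + 12 + 4  # offset of the nbytes field (88)


def frame_size(f, start):
    """Size in bytes of the frame at `start`, read from its header only."""
    f.seek(start)
    head = f.read(SHORT_HEADER)
    if len(head) < SHORT_HEADER:
        raise ValueError("truncated frame at offset %d" % start)
    magic, natoms = struct.unpack(">ii", head[:8])
    if magic != XTC_MAGIC:
        raise ValueError("bad magic number at offset %d" % start)
    if natoms <= 9:
        return SHORT_HEADER + 12 * natoms   # small frames are not compressed
    f.seek(start + NBYTES_AT)
    (nbytes,) = struct.unpack(">i", f.read(4))
    if nbytes < 0:
        raise ValueError("corrupt byte count at offset %d" % start)
    return NBYTES_AT + 4 + (nbytes + 3) // 4 * 4


def build_index(path):
    """Walk the file frame by frame (no decompression) and collect offsets."""
    end = os.path.getsize(path)
    offsets, pos = [], 0
    with open(path, "rb") as f:
        while pos < end:
            offsets.append(pos)
            pos += frame_size(f, pos)
    return offsets


def load_or_build_index(path):
    """Use <path>.idx if it matches the XTC file; otherwise regenerate it."""
    meta = path + ".idx"                    # hypothetical meta-file name
    st = os.stat(path)
    stamp = [st.st_size, st.st_mtime_ns]    # changes if the XTC is rewritten
    try:
        with open(meta) as m:
            data = json.load(m)
        if data["stamp"] == stamp:
            return data["offsets"]
    except (OSError, ValueError, KeyError, TypeError):
        pass                                # missing or unreadable meta-file
    offsets = build_index(path)
    with open(meta, "w") as m:
        json.dump({"stamp": stamp, "offsets": offsets}, m)
    return offsets
```

With the offsets in hand, each reader can seek directly to its own frames with one large contiguous read instead of a bisection over the file.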
> > I'm wondering if this is the sort of problem that eventually moving to
> > something like netCDF might help solve. Clearly, it would be a
> > move, and would require interconversion utilities for backward
> > compatibility.
> I looked into this. The compression of XTC is very good, and good
> compression is important if you want a good IO rate (of the uncompressed
> data). NetCDF3 doesn't support compression (there are unsupported
> extensions). HDF5/NetCDF4 support compression, but only parallel read of
> compressed data, not parallel write of compressed data. Also, zlib
> compression would have a significantly lower compression ratio than the
> XTC compression does.
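To make the link between compression ratio and IO rate concrete, here is a deliberately simple model (an assumption, not a measurement): if reading and decompression overlap, the rate at which uncompressed coordinates reach the analysis is limited by the slower of two stages, disk bandwidth times compression ratio, and decompressor throughput. The function name and all numbers are hypothetical.

```python
def effective_read_rate(disk_mb_s, ratio, decompress_mb_s):
    """Uncompressed MB/s delivered to analysis under a pipelined model.

    The disk delivers disk_mb_s of compressed data, i.e. disk_mb_s * ratio
    of uncompressed data; the decompressor can emit at most decompress_mb_s.
    Assumes the two stages overlap, so the slower one sets the rate.
    """
    return min(disk_mb_s * ratio, decompress_mb_s)
```

For example, with a hypothetical 100 MB/s disk and a fast decompressor, a ratio of 3 yields 300 MB/s of uncompressed data while a ratio of 1.5 yields only 150 MB/s; only once decompression becomes the bottleneck does a better ratio stop helping.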
> Thus none of them would by itself do everything we would like. Of course
> one could do the XTC compression within a NetCDF/HDF5 container, but I
> don't see how this would help anyone. Without full support for
> compression, the only other advantage I could see in moving to
> NetCDF/HDF5 is that it is easier for others to program readers/writers
> (which is already very easy since the xdrfile library has been released).
> And if we have our custom compression within NetCDF/HDF5, then reading
> those files wouldn't be any easier than reading/writing current XTC files.
> Without compression we could just as well use TRR. Writing a parallel
> reader/writer for that is dead simple (since the position of each frame
> is known from the number of atoms).
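A minimal sketch of why fixed-size frames make parallel reading simple, assuming every frame has the same byte size (same atom count, same set of x/v/f blocks, same precision) and that `frame_size` has already been obtained, for example from the first frame's header. The function names are hypothetical; returning raw bytes stands in for a real TRR parser, and in an MPI program `rank` and `nranks` would come from the communicator.

```python
import os


def my_frames(nframes, rank, nranks):
    """Contiguous block of frame indices [start, stop) for one reader rank."""
    base, extra = divmod(nframes, nranks)
    start = rank * base + min(rank, extra)  # first `extra` ranks get one more
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def read_my_frames(path, frame_size, rank, nranks):
    """Each rank seeks straight to its first frame; no scanning is needed."""
    size = os.path.getsize(path)
    if frame_size <= 0 or size % frame_size:
        raise ValueError("file is not a whole number of equal-size frames")
    start, stop = my_frames(size // frame_size, rank, nranks)
    with open(path, "rb") as f:
        f.seek(start * frame_size)          # offset of frame k is k * frame_size
        return [f.read(frame_size) for _ in range(start, stop)]
```

Each rank issues one seek and a contiguous read, which is the large, sequential access pattern that parallel file systems handle well.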
A person here at UU (Daniel Spångberg) has developed a new trajectory
library (TNG - trajectory next generation). We are about to submit a
paper about it. Key advantages over XTC:
- slightly better compression (slightly slower in its best-compressing
form, but the algorithm is tunable)
- support for velocities
- support for additional information (e.g. atom names) in one or more frames
- random access (seeking to arbitrary frames) supported without binary search
- parallel compression
- open source
This will provide a very good basis for parallel trajectory I/O.
The main problem for parallel I/O is the management of atom numbers in a
domain decomposition setup. If atoms drift to another processor over
time, the bookkeeping has to deal with this, in particular when
assembling the trajectories later for analysis.
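One possible way to handle that bookkeeping, sketched under the assumption that each rank writes the global indices of its home atoms next to their coordinates in every frame (this is an illustration, not how GROMACS or TNG actually does it). Reassembly then scatters each rank's coordinates into a full frame by global index and checks that every atom appears exactly once. The function name and data layout are hypothetical.

```python
def assemble_frame(pieces, natoms):
    """Merge per-rank output into one frame ordered by global atom index.

    pieces: iterable of (global_ids, coords) pairs, one per rank, where
    coords[i] is the (x, y, z) of atom global_ids[i]. Because atoms migrate
    between domains, the set of ids a rank owns changes from frame to frame,
    so the ids must be stored alongside the coordinates.
    """
    frame = [None] * natoms
    for ids, coords in pieces:
        if len(ids) != len(coords):
            raise ValueError("ids and coords differ in length")
        for gid, xyz in zip(ids, coords):
            if not 0 <= gid < natoms:
                raise ValueError("atom index %d out of range" % gid)
            if frame[gid] is not None:
                raise ValueError("atom %d written by two ranks" % gid)
            frame[gid] = xyz
    missing = [i for i, xyz in enumerate(frame) if xyz is None]
    if missing:
        raise ValueError("atoms missing from frame: %s" % missing[:10])
    return frame
```

The cost of this scheme is one extra integer per atom per frame; keeping the ids also lets an analysis tool rebuild frames from per-rank files without knowing how the domains were decomposed at run time.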
> Michael Shirts
> Assistant Professor
> Department of Chemical Engineering
> University of Virginia
> michael.shirts at virginia.edu
> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
> 865-241-1537, ORNL PO BOX 2008 MS6309
David van der Spoel, PhD, Professor of Biology
Dept. of Cell and Molecular Biology, Uppsala University.
Husargatan 3, Box 596, 75124 Uppsala, Sweden
phone: 46 18 471 4205 fax: 46 18 511 755
spoel at xray.bmc.uu.se spoel at gromacs.org http://folding.bmc.uu.se