[gmx-developers] Re: flushing files
roland at utk.edu
Wed Oct 13 10:07:06 CEST 2010
On Wed, Oct 13, 2010 at 3:35 AM, Erik Lindahl <lindahl at cbr.su.se> wrote:
> On Oct 13, 2010, at 9:23 AM, Roland Schulz wrote:
>> This is not what we are doing at the moment. At the moment (flush after
>> frame, sync after checkpoint) it is possible that the trajectory is broken.
>> But the check-pointing append feature guarantees that it automatically fixes
>> it. I like the approach of fast writing + automatic fix in the worst case
>> better than having to guarantee that it is always correct from the
>> beginning. Also it would be extremely difficult to guarantee it for all
>> cases (e.g. for the case of a crash during writing of a frame).
> Yes, but that's a huge difference: Presently you might get broken frames if
> your simulation crashes. If you are on a file system that never flushes to
> disk with fflush() you won't get frames on the frontend, but at least they
> aren't broken.
I think it is also currently possible (but unlikely) for the trajectory to
appear broken: if the crash happens while a frame is being written, that frame
can end up truncated (I'm pretty sure I have encountered that before). But I
see the point that we want to make this as unlikely as possible (most of the
time the code is not in the middle of a write) without affecting performance.
This might actually not be a problem with MPI-IO, because we buffer the whole
frame in memory and then issue one MPI_File_write call for the whole frame
(or, more precisely, one MPI_File_write_ordered call for a couple of frames).
Because we always write a whole frame in one go, it should not be an issue.
We'll test to make sure.
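A minimal sketch of the "assemble the whole frame in memory, then write it in one call" idea, for illustration only. It uses plain POSIX write() instead of MPI_File_write_ordered() so it runs without MPI, and the frame layout (a step number, an atom count, then xyz floats) and the helpers write_all(), frame_bytes() and write_frame() are hypothetical, not GROMACS code. A single call narrows the window in which a crash leaves a partial frame; it does not make the write atomic.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Write all n bytes of buf to fd. For a regular file one write() call
 * normally suffices; the loop only handles short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t n) {
    const char *p = buf;
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Size of one hypothetical frame: step (long), natoms (int), 3*natoms floats. */
static size_t frame_bytes(int natoms) {
    return sizeof(long) + sizeof(int) + (size_t)natoms * 3 * sizeof(float);
}

/* Serialize the frame into one contiguous scratch buffer (at least
 * frame_bytes(natoms) bytes), then hand it to the kernel at once instead
 * of issuing one small write per field. */
static int write_frame(int fd, unsigned char *scratch,
                       long step, int natoms, const float *x) {
    unsigned char *p = scratch;
    memcpy(p, &step, sizeof step);
    p += sizeof step;
    memcpy(p, &natoms, sizeof natoms);
    p += sizeof natoms;
    memcpy(p, x, (size_t)natoms * 3 * sizeof(float));
    return write_all(fd, scratch, frame_bytes(natoms));
}
```

Reusing one scratch buffer across frames avoids an allocation per output step.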
If it is still an issue, we can buffer more frames per write, so that calling
MPI_File_sync after each write does not cause a performance problem.
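Batching could look roughly like this: frames accumulate in memory and are written and synced once every frames_per_sync frames, so the cost of the sync is amortized. Again a hypothetical sketch: batch_writer, bw_add_frame() and bw_flush() are made-up names, and POSIX write()/fsync() stand in for MPI_File_write_ordered()/MPI_File_sync().

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical batching writer; zero-initialize, then set fd and
 * frames_per_sync. */
typedef struct {
    int fd;
    unsigned char *buf;
    size_t len, cap;
    int nframes, frames_per_sync;
} batch_writer;

/* Write everything buffered so far, then fsync() once for the whole batch. */
static int bw_flush(batch_writer *bw) {
    size_t off = 0;
    while (off < bw->len) {
        ssize_t w = write(bw->fd, bw->buf + off, bw->len - off);
        if (w < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        off += (size_t)w;
    }
    bw->len = 0;
    bw->nframes = 0;
    return fsync(bw->fd);
}

/* Append one serialized frame; flush + sync when the batch is full. */
static int bw_add_frame(batch_writer *bw, const void *frame, size_t n) {
    if (bw->len + n > bw->cap) {
        size_t cap = bw->cap ? bw->cap : 4096;
        while (cap < bw->len + n) cap *= 2;
        unsigned char *p = realloc(bw->buf, cap);
        if (!p) return -1;
        bw->buf = p;
        bw->cap = cap;
    }
    memcpy(bw->buf + bw->len, frame, n);
    bw->len += n;
    if (++bw->nframes >= bw->frames_per_sync) return bw_flush(bw);
    return 0;
}
```

The trade-off is that up to frames_per_sync - 1 frames exist only in memory, so a crash loses them rather than leaving a broken frame on disk.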
Independent of my original question and the CollectiveIO work, we might want
to guarantee an fsync at least every 15 minutes, even when we don't
checkpoint or checkpoint only infrequently. This might be a fix worth adding
to the release branch.
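A possible shape for the time-based guarantee, assuming the trajectory is written through a stdio FILE*: fflush() after every frame (which only hands the data to the kernel), plus fsync() whenever the sync interval (e.g. 15 minutes) has passed since the last sync, with a checkpoint resetting the timer. sync_policy, after_frame(), on_checkpoint_synced() and monotonic_seconds() are hypothetical names; the current time is passed in so the policy is easy to test.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical sync policy, independent of the checkpoint interval. */
typedef struct {
    double sync_interval; /* seconds, e.g. 15 * 60 */
    double last_sync;     /* monotonic time of the last fsync, in seconds */
} sync_policy;

/* Seconds on a monotonic clock (not affected by wall-clock changes). */
static double monotonic_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Call after each frame written to fp. fflush() every time; fsync() only
 * if sync_interval has elapsed. Returns 1 if it synced, 0 if it only
 * flushed, -1 on error. */
static int after_frame(FILE *fp, sync_policy *sp, double now) {
    if (fflush(fp) != 0) return -1;
    if (now - sp->last_sync < sp->sync_interval) return 0;
    if (fsync(fileno(fp)) != 0) return -1;
    sp->last_sync = now;
    return 1;
}

/* A checkpoint already syncs the files, so it just restarts the timer. */
static void on_checkpoint_synced(sync_policy *sp, double now) {
    sp->last_sync = now;
}
```

In the MD loop one would call after_frame(fp, &sp, monotonic_seconds()) after each trajectory write, with sp.sync_interval = 15 * 60 and sp.last_sync initialized to the start time.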
More information about the gromacs.org_gmx-developers