[gmx-developers] Re: flushing files

Wed Oct 13 14:47:14 CEST 2010

On 10/13/2010 02:09 PM, Janne Blomqvist wrote:
> Sander Pronk wrote:
>> On Oct 13, 2010, at 11:23 , Berk Hess wrote:
>>> I just discussed the flushing with Erik.
>>> I forgot the motivation for this, but it was to have only whole
>>> frames on disk.
>>> If you don't flush, you'll often have partial frames.
>>> So the options are: either flush every frame, or buffer and then
>>> flush or fsync.
>>> For the mpi i/o this is no issue: since we buffer internally, we can
>>> simply do an fsync on write, which should happen at least when
>>> checkpointing.
>>>
>>> I guess the only remaining question is whether flushing could be slow
>>> in circumstances where we would not want to use the mpi buffered i/o?
>>
>> There are a couple of circumstances that could trigger that:
>> - we're writing to an unbuffered file system (NFS in synchronous mode,
>>   for example).
>> - the OS runs out of disk cache (i.e. RAM) and is forced to write out
>> to disk for each write() call. If this happens, there are bigger
>> problems to worry about for the user.
>>
>> The first case could be real (due to an overzealous system
>> administrator).
>
> For NFSv2, yes. But, NFSv3 introduced the concept of "unstable"
> writes, and a separate COMMIT operation. The server is allowed to
> acknowledge an unstable write as soon as the data is in server memory,
> no need to flush it to nonvolatile storage. The COMMIT operation,
> then, instructs the server to flush any dirty data for the file in
> question to nonvolatile storage. As one can probably guess from the
> above, an fsync() syscall is in practice translated into a bunch of
> unstable writes (i.e. copying dirty data from the client to the
> server) followed by a COMMIT. Alternatively, the client can issue
> writes with the stable bit set, making the COMMIT unnecessary (this is
> then equivalent to how NFSv2 worked back in the day). My guess is that
> in practice these stable writes are used only for files opened with
> O_SYNC or similar.
>
> Now, due to the NFS consistency model, any dirty data must be flushed
> to stable storage when a file is closed on the client, so COMMIT is
> still used even if there are no explicit fsync()'s.
>
> My understanding is that the reason why sync exports still perform
> worse than async ones is that async treats COMMITs as no-ops,
> although the performance difference should be much smaller than it
> was with NFSv2.
>
> That being said, I don't really see how an fsync() every 15 minutes
> could be an issue. If calling fsync() for every frame is too
> expensive, couldn't that be worked around e.g. by making the code
> robust enough to not crash on short frames, or maybe include a
> checksum of the frame data in the frame header to guard against
> otherwise corrupted frames? At least for my own usage, losing a few
> frames in the unlikely event of a crash is no big deal, as long as I
> don't lose everything (and checkpoints every 15 min should ensure I
> never lose more than 15 min of work, right?).
>
>
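Janne's suggestion (readers that do not crash on short frames, plus a per-frame checksum in the header) could look roughly like the C sketch below. Everything in it is an assumption for illustration: the header layout, the magic number, the choice of a 32-bit FNV-1a hash and the names write_frame/read_frame are hypothetical and are NOT the actual GROMACS trajectory format. The reader returns an error code for a truncated or corrupted frame instead of crashing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical frame layout for illustration only; NOT the real GROMACS
 * trajectory format. Written in host byte order, so not portable. */
#define FRAME_MAGIC 0x464D5231u

typedef struct {
    uint32_t magic;    /* marks the start of a frame */
    uint32_t length;   /* payload size in bytes */
    uint32_t checksum; /* 32-bit FNV-1a hash of the payload */
} frame_header;

/* 32-bit FNV-1a: cheap and not cryptographic, enough to spot garbage. */
static uint32_t fnv1a32(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Write header + payload. Returns 0 on success, -1 on a write error. */
int write_frame(FILE *f, const void *data, uint32_t length)
{
    assert(f != NULL && (data != NULL || length == 0));
    frame_header h = { FRAME_MAGIC, length, fnv1a32(data, length) };
    if (fwrite(&h, sizeof h, 1, f) != 1) {
        return -1;
    }
    if (length > 0 && fwrite(data, 1, length, f) != length) {
        return -1;
    }
    return 0;
}

/* Read one frame into a malloc'd buffer that the caller must free().
 * Returns 0 on success, 1 at a clean end of file, -1 for a short
 * (partially written) frame, -2 for a bad magic number or checksum,
 * -3 if memory allocation fails. */
int read_frame(FILE *f, void **data, uint32_t *length)
{
    frame_header h;
    size_t got = fread(&h, 1, sizeof h, f);
    if (got == 0) {
        return 1;
    }
    if (got != sizeof h) {
        return -1;
    }
    if (h.magic != FRAME_MAGIC) {
        return -2;
    }
    unsigned char *buf = malloc(h.length > 0 ? h.length : 1);
    if (buf == NULL) {
        return -3;
    }
    if (fread(buf, 1, h.length, f) != h.length) {
        free(buf);
        return -1;
    }
    if (fnv1a32(buf, h.length) != h.checksum) {
        free(buf);
        return -2;
    }
    *data = buf;
    *length = h.length;
    return 0;
}
```

A reader seeing -1 can treat it as "the writer stopped in the middle of this frame" and keep all earlier frames. A real format would also bound the header's length field before allocating, since a corrupted header could otherwise request an absurdly large buffer.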
fsync every 15 minutes is not the issue here.
The original issue we were discussing is flushing files after writing
each frame.
This is not critical; all Gromacs tools can handle partial frames nicely.
But it is nice to have, since then you avoid warnings about broken
frames.
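Flushing after each frame amounts to something like the minimal C sketch below. The helper name write_frame_flushed and its flush_each_frame switch are hypothetical; a parallel, buffered output path would simply pass 0. Note that fflush() only hands the stdio buffer to the operating system. That keeps the file ending on a frame boundary if the process dies between frames, but it does not force the data to stable storage, which is what fsync() is for.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: append one complete frame and, if requested, flush
 * the stdio buffer right away. With flushing enabled for every frame, the
 * data the OS holds for this file always ends on a frame boundary, unless
 * the process dies inside this call. fflush() does NOT force the data to
 * stable storage. */
int write_frame_flushed(FILE *f, const void *frame, size_t nbytes,
                        int flush_each_frame)
{
    assert(f != NULL);
    if (fwrite(frame, 1, nbytes, f) != nbytes) {
        return -1;
    }
    if (flush_each_frame && fflush(f) != 0) {
        return -1;
    }
    return 0;
}
```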

As I see it, there is no real issue.
At low parallelization flushing is not costly.
At high parallelization flushing might be costly, but there we want to
use some parallel, buffering i/o anyhow, so we don't need to flush.
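The "buffer and then flush or fsync" alternative can be sketched serially in C as below. This is not MPI I/O, and frame_buffer, fb_add_frame, fb_flush and fb_checkpoint are hypothetical names. Frames are collected in memory and only complete frames are handed to the OS, so no per-frame flush is needed; at a checkpoint the buffer is written out and fsync() is called. Per the NFSv3 description earlier in the thread, that fsync() is where the client would issue its COMMIT. fileno() and fsync() are POSIX, not ISO C.

```c
#define _POSIX_C_SOURCE 200809L /* for fileno() and fsync() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical in-memory frame buffer in front of a stdio stream. */
typedef struct {
    FILE  *f;
    char  *buf;
    size_t used; /* bytes of complete frames currently buffered */
    size_t cap;
} frame_buffer;

int fb_init(frame_buffer *fb, FILE *f, size_t cap)
{
    assert(f != NULL && cap > 0);
    fb->f = f;
    fb->buf = malloc(cap);
    fb->used = 0;
    fb->cap = cap;
    return fb->buf != NULL ? 0 : -1;
}

/* Hand all buffered (complete) frames to the OS in one go. */
int fb_flush(frame_buffer *fb)
{
    if (fb->used > 0 && fwrite(fb->buf, 1, fb->used, fb->f) != fb->used) {
        return -1;
    }
    fb->used = 0;
    return fflush(fb->f) == 0 ? 0 : -1;
}

/* Append one frame; write the buffer out first if the frame would not fit.
 * A frame larger than the whole buffer is written directly. */
int fb_add_frame(frame_buffer *fb, const void *frame, size_t n)
{
    if (n > fb->cap - fb->used) {
        if (fb_flush(fb) != 0) {
            return -1;
        }
        if (n > fb->cap) {
            if (fwrite(frame, 1, n, fb->f) != n) {
                return -1;
            }
            return fflush(fb->f) == 0 ? 0 : -1;
        }
    }
    memcpy(fb->buf + fb->used, frame, n);
    fb->used += n;
    return 0;
}

/* At a checkpoint: flush, then ask the OS to commit the data to stable
 * storage. */
int fb_checkpoint(frame_buffer *fb)
{
    if (fb_flush(fb) != 0) {
        return -1;
    }
    return fsync(fileno(fb->f)) == 0 ? 0 : -1;
}

void fb_free(frame_buffer *fb)
{
    free(fb->buf);
    fb->buf = NULL;
    fb->used = 0;
}
```

The trade-off is the one described in the thread: fewer, larger writes and no per-frame flush, at the cost of losing the frames still sitting in the buffer if the run crashes before the next write-out or checkpoint.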

I think Roland's original question was whether we really need to flush.
But we can leave the flushes in the normal code and not use them in
the parallel i/o code.

Berk