[gmx-developers] Re: flushing files

Wed Oct 13 14:09:15 CEST 2010

Sander Pronk wrote:
> On Oct 13, 2010, at 11:23 , Berk Hess wrote:
>> I just discussed the flushing with Erik.
>> I had forgotten the motivation for this, but it was to have only whole frames on disk.
>> If you don't flush, you'll often have partial frames.
>> So the options are: either flush every frame, or buffer and then flush or fsync.
>> For the mpi i/o this is no issue: since we buffer internally, we can simply
>> fsync on write, which should happen at least when checkpointing.
>>
>> I guess the only remaining question is if flushing could be slow under
>> circumstances where we would not want to use the mpi buffered i/o?
> 
> There are a couple of circumstances that could trigger that:
> - we're writing to an unbuffered file system (nfs in synchronous mode, for example).
> - the OS runs out of disk cache (i.e. RAM) and is forced to write out to disk for each write() call. If this happens, there are bigger problems to worry about for the user.
> 
> The first case could be real (due to an overzealous system administrator).

For NFSv2, yes. But NFSv3 introduced the concept of "unstable" writes
and a separate COMMIT operation. The server is allowed to acknowledge an
unstable write as soon as the data is in server memory, without first
flushing it to nonvolatile storage. The COMMIT operation then instructs
the server to flush any dirty data for the file in question to
nonvolatile storage. As one can probably guess from the above, an
fsync() syscall is in practice translated into a bunch of unstable
writes (i.e. copying dirty data from the client to the server) followed
by a COMMIT.
Alternatively, the client can issue writes with the stable bit set, 
making the COMMIT unnecessary (this is then equivalent to how NFSv2 
worked back in the day). My guess is that in practice these stable 
writes are used only for files opened with O_SYNC or similar.
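
To make the distinction concrete, here is a minimal sketch of a
per-frame flush in C. The function name write_frame_durable and the
do_fsync switch are made up for illustration and are not taken from the
GROMACS source. fflush() only hands the data to the kernel; fsync() is
the call that, on an NFSv3 client, should end up as unstable writes
followed by a COMMIT.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() and fsync() */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: append one frame and push it towards the disk.
 * Returns 0 on success, -1 on any error. */
int write_frame_durable(FILE *fp, const void *frame, size_t nbytes,
                        int do_fsync)
{
    assert(fp != NULL && frame != NULL);

    if (fwrite(frame, 1, nbytes, fp) != nbytes)
        return -1;

    /* Move the data from the stdio buffer into the kernel, so that a
     * whole frame (not a buffer-sized fragment) is handed over. */
    if (fflush(fp) != 0)
        return -1;

    /* Ask the kernel to commit the file data to stable storage. */
    if (do_fsync && fsync(fileno(fp)) != 0)
        return -1;

    return 0;
}
```

With do_fsync set to 0 this only guarantees that the kernel has the
whole frame, which is enough to survive a crash of the process but not
of the machine.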

Now, due to the NFS consistency model, any dirty data must be flushed to 
stable storage when a file is closed on the client, so COMMIT is still 
used even if there are no explicit fsync() calls.

My understanding is that the reason why sync exports still perform worse
than async ones is that async treats COMMITs as no-ops, although the
performance difference should be much smaller than it was with NFSv2.

That being said, I don't really see how an fsync() every 15 minutes 
could be an issue. If calling fsync() for every frame is too expensive, 
couldn't that be worked around e.g. by making the code robust enough to 
not crash on short frames, or maybe include a checksum of the frame data 
in the frame header to guard against otherwise corrupted frames? At 
least for my own usage, losing a few frames in the unlikely event of a 
crash is no big deal, as long as I don't lose everything (and 
checkpoints every 15 min should ensure I never lose more than 15 min
of work, right?).
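
The checksum idea could look roughly like the following sketch.
Everything in it is an assumption made for illustration: the
frame_header layout, the function names, the choice of CRC-32, and the
host-byte-order header have nothing to do with the actual trajectory
formats. The point is only that a reader can tell a short or corrupted
last frame from a good one and stop there instead of crashing.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical frame header: payload length plus CRC-32 of the payload. */
typedef struct {
    uint32_t nbytes;
    uint32_t crc;
} frame_header;

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_bytes(const void *data, size_t n)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Write header followed by payload. Returns 0 on success, -1 on error. */
int write_frame(FILE *fp, const void *data, uint32_t nbytes)
{
    frame_header h = { nbytes, crc32_bytes(data, nbytes) };
    if (fwrite(&h, sizeof h, 1, fp) != 1)
        return -1;
    if (nbytes > 0 && fwrite(data, 1, nbytes, fp) != nbytes)
        return -1;
    return 0;
}

/* Read one frame into buf.
 * Returns 1 for a valid frame, 0 when nothing is left to read,
 * and -1 for a short or corrupted frame. */
int read_frame(FILE *fp, void *buf, uint32_t bufsize, uint32_t *nbytes)
{
    frame_header h;
    size_t got = fread(&h, 1, sizeof h, fp);
    if (got == 0)
        return 0;                 /* no more data */
    if (got != sizeof h)
        return -1;                /* header itself is truncated */
    if (h.nbytes > bufsize)
        return -1;                /* garbage length or buffer too small */
    if (fread(buf, 1, h.nbytes, fp) != h.nbytes)
        return -1;                /* short frame */
    if (crc32_bytes(buf, h.nbytes) != h.crc)
        return -1;                /* payload does not match checksum */
    *nbytes = h.nbytes;
    return 1;
}
```

A reader would loop on read_frame() while it returns 1 and treat -1 as
"the rest of the file is unusable", so a crash costs at most the frames
after the last intact one.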

-- 
Janne Blomqvist