[gmx-developers] Re: flushing files

Wed Oct 13 09:41:05 CEST 2010

> On Wed, Oct 13, 2010 at 3:16 AM, Sander Pronk <pronk at cbr.su.se> wrote:
>
>>
>>> The difference is that an IO thread would virtually never run though; it
>>> would instantly block waiting for the filesystem, and in the meantime the
>>> real threads would get control back?
>>>
>> Yes, but you can only have one IO thread per file (otherwise the
>> synchronization becomes quite difficult). Thus if the overhead is larger
>> than the time between writes, then you are still waiting. The time for
>> MPI_File_sync can be *extremely* long (compared to fflush).
>>
>>
>> There is already quite a bit of code dealing with files and threads. We
>> should be able to add a single syncing thread without too much effort. Is
>> there any danger of running out of space for threads? (They each have a
>> stack, etc.)
>>
>> BTW MPI_File_sync is not a collective call, right?
>>
> It is a collective call over all MPI ranks that opened this file handle. In
> the CollectiveIO branch we have a number of nodes writing the XTC file (we
> haven't yet implemented it for any other file). Thus the sync for the XTC
> file has to be called by all of those ranks.
>
> Roland
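A minimal sketch of the single syncing thread proposed above, assuming POSIX threads and fsync(); sync_thread_t, sync_request() and sync_thread_stop() are hypothetical names, not existing GROMACS code. The writer still calls fflush() itself and hands only the slow fsync() to one background thread per file; requests that arrive while a sync is in progress are coalesced into a single follow-up sync, so a slow file system delays syncing rather than the simulation.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: one background thread per file that runs fsync(),
 * so the simulation threads never block on the file system. */
typedef struct {
    int             fd;         /* descriptor of the file to sync */
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            pending;    /* a sync has been requested */
    bool            quit;       /* the thread should exit */
    int             nsyncs;     /* number of fsync() calls performed */
    int             last_error; /* return value of the last fsync() */
    pthread_t       thread;
} sync_thread_t;

static void *sync_thread_main(void *arg)
{
    sync_thread_t *st = arg;

    pthread_mutex_lock(&st->lock);
    for (;;) {
        while (!st->pending && !st->quit)
            pthread_cond_wait(&st->cond, &st->lock);
        if (!st->pending)           /* quit requested, nothing left to do */
            break;
        st->pending = false;
        /* Release the lock while blocked in the kernel; requests that
         * arrive in the meantime are coalesced into the next fsync(). */
        pthread_mutex_unlock(&st->lock);
        int rc = fsync(st->fd);
        pthread_mutex_lock(&st->lock);
        st->nsyncs++;
        st->last_error = rc;
    }
    pthread_mutex_unlock(&st->lock);
    return NULL;
}

static int sync_thread_start(sync_thread_t *st, int fd)
{
    st->fd = fd;
    st->pending = false;
    st->quit = false;
    st->nsyncs = 0;
    st->last_error = 0;
    pthread_mutex_init(&st->lock, NULL);
    pthread_cond_init(&st->cond, NULL);
    return pthread_create(&st->thread, NULL, sync_thread_main, st);
}

/* Called by the writer after fflush(); returns without waiting. */
static void sync_request(sync_thread_t *st)
{
    pthread_mutex_lock(&st->lock);
    st->pending = true;
    pthread_cond_signal(&st->cond);
    pthread_mutex_unlock(&st->lock);
}

/* Complete any outstanding sync, join the thread, report the last result. */
static int sync_thread_stop(sync_thread_t *st)
{
    pthread_mutex_lock(&st->lock);
    st->quit = true;
    pthread_cond_signal(&st->cond);
    pthread_mutex_unlock(&st->lock);
    pthread_join(st->thread, NULL);
    pthread_mutex_destroy(&st->lock);
    pthread_cond_destroy(&st->cond);
    return st->last_error;
}
```

A writer would call fflush(fp) followed by sync_request(&st) after each frame, and sync_thread_stop(&st) when closing the file; a nonzero return means the last fsync() failed. On the stack-space question, pthread_attr_setstacksize() can give such a thread a small stack, since it does almost nothing besides calling fsync().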

I don't see a good reason for flushing after every frame.
It is nice to terminate immediately when you notice a frame cannot
be flushed, but it does not matter much whether we do this immediately
or within 15 minutes.
I guess the original reasoning for this was that we never checked,
so you could continue running uselessly for a whole day.
As long as we checkpoint, and thus fsync, often enough, I don't
see any disadvantages, except for losing up to 15 minutes of run time
when the file system is unreachable or full.
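A minimal sketch of this periodic approach, assuming POSIX fsync(): frames are written normally, but the file is flushed and synced only once a fixed interval (the 15 minutes above) has elapsed or a checkpoint forces it, and any failure is reported so the run can stop. frame_writer_t, flush_and_sync() and write_frame() are hypothetical names, not GROMACS API.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical writer state: data is forced to disk at most once per
 * interval (and at checkpoints) instead of after every frame. */
typedef struct {
    FILE   *fp;
    double  interval;  /* seconds between syncs, e.g. 15 * 60 */
    time_t  last_sync; /* time of the last successful sync */
} frame_writer_t;

/* Push stdio buffers to the kernel, then to the disk. */
static bool flush_and_sync(FILE *fp)
{
    if (fflush(fp) != 0)
        return false;
    return fsync(fileno(fp)) == 0;
}

/* Write one frame and sync if the interval has elapsed or a checkpoint
 * forces it. Returns false on any write or sync failure, so the caller
 * can stop the run instead of continuing uselessly. */
static bool write_frame(frame_writer_t *w, const char *frame, bool force)
{
    if (fputs(frame, w->fp) == EOF)
        return false;

    time_t now = time(NULL);
    if (force || difftime(now, w->last_sync) >= w->interval) {
        if (!flush_and_sync(w->fp))
            return false;
        w->last_sync = now;
    }
    return true;
}
```

In practice last_sync would be set to time(NULL) when the file is opened, and write_frame() called with force set to true at every checkpoint. Because stdio buffers the data, a full or unreachable file system typically shows up only at the next fflush() or fsync(), so the failure is detected within one interval, which is exactly the trade-off accepted here.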

Berk