[gmx-developers] Collective IO

Fri Oct 1 09:35:36 CEST 2010

----- Original Message -----
From: Roland Schulz <roland at utk.edu>
Date: Friday, October 1, 2010 16:58
Subject: Re: [gmx-developers] Collective IO
To: Discussion list for GROMACS development <gmx-developers at gromacs.org>

> 
> 
> On Thu, Sep 30, 2010 at 9:19 PM, Mark Abraham <mark.abraham at anu.edu.au> wrote:
  > 
> 
> ----- Original Message -----
> From: Roland Schulz <roland at utk.edu>
> Date: Friday, October 1, 2010 9:04
> Subject: Re: [gmx-developers] Collective IO
>   To: Discussion list for GROMACS development <gmx-developers at gromacs.org>
> 
> > 
  > > 
> > On Thu, Sep 30, 2010 at 6:21 PM, Szilárd Páll <szilard.pall at cbr.su.se> wrote:
> > Hi Roland,
> >
> > Nice work, I'll definitely take a look at it!
> >
> > Any idea how this improves scaling in general, and at what
> > problem size it starts to really matter? Does it introduce any overhead
> > in smaller simulations, or is it only conditionally turned on?
  > > 
> > At the moment it is always turned on for XTC when compiled with MPI. In serial or with threads nothing changes. At the moment we buffer at most 100 frames. If one uses fewer than 100 PP nodes, then we buffer as many frames as there are PP nodes. We also make sure that we don't buffer more than 250MB per node.
> >
> > The 100 frames and 250MB are both constants which should probably still be tuned.
  > 
> Indeed - and the user should be able to tune them, too. They won't want to exceed their available physical memory, since buffering frames to virtual memory (if any) loses any gains from collective I/O.
> Honestly, we hadn't thought much about the 250MB limit. We first wanted to get feedback on the approach and the code before doing more benchmarks and tuning these parameters. It is very likely that there are no cases which benefit from using more than 2MB per MPI process.
>
> If we limit the memory usage to 2MB, should we still make it configurable? I think adding too many mdrun options gets confusing. Should we make the number of buffered frames a hidden mdrun option or an environment variable (the default would be that the number is auto-tuned)?

Hmmm. 2MB feels like quite a low lower bound. Collective I/O requires of the order of several MB per process per operation to be worthwhile. OTOH you don't want to buffer excessively, because that loses more when hardware crashes occur. You do have the checkpoint interval as another upper bound, so that's probably fine. 250MB concerned me, because the BlueGene CPUs have up to about 1GB per CPU...

I think a hidden option is probably best.

Mark
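
For readers following the thread, the buffering limits discussed above (at most 100 frames, no more frames than PP nodes, and a per-node memory cap) can be sketched roughly as below. This is only an illustration of the policy as described in the messages, not the actual GROMACS code: the function name, its parameters, and the assumption that each node holds a fixed number of bytes per buffered frame are all hypothetical.

```python
# Hypothetical sketch of the frame-buffering limits described in the thread.
# None of these names come from GROMACS; they are illustrative only.

MAX_BUFFERED_FRAMES = 100            # frame limit quoted in the thread
MAX_BUFFER_BYTES = 250 * 1024 ** 2   # 250MB per-node limit quoted in the thread


def frames_to_buffer(n_pp_nodes, bytes_per_frame_per_node,
                     max_frames=MAX_BUFFERED_FRAMES,
                     max_bytes=MAX_BUFFER_BYTES):
    """Return how many frames to buffer before one collective write.

    Assumption: every node stores `bytes_per_frame_per_node` bytes for
    each buffered frame, so the memory cap translates into a frame count.
    """
    if n_pp_nodes < 1 or bytes_per_frame_per_node < 1:
        raise ValueError("node count and frame size must be positive")
    # Never buffer more frames than there are PP nodes, nor more than the cap.
    by_nodes = min(max_frames, n_pp_nodes)
    # Frames that fit inside the per-node memory budget.
    by_memory = max_bytes // bytes_per_frame_per_node
    # Always buffer at least one frame so output can make progress.
    return max(1, min(by_nodes, by_memory))
```

With the defaults, 512 PP nodes and small frames give 100 buffered frames, while 16 PP nodes give 16. Passing a smaller `max_bytes` (for example 2MB, as suggested in the thread) shows how a tighter memory limit would reduce the frame count; a hidden option or environment variable could supply `max_frames` in the same way.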
