[gmx-developers] Collective IO

Roland Schulz roland at utk.edu
Fri Oct 1 08:57:50 CEST 2010


On Thu, Sep 30, 2010 at 9:19 PM, Mark Abraham <mark.abraham at anu.edu.au> wrote:

>
>
> ----- Original Message -----
> From: Roland Schulz <roland at utk.edu>
> Date: Friday, October 1, 2010 9:04
> Subject: Re: [gmx-developers] Collective IO
> To: Discussion list for GROMACS development <gmx-developers at gromacs.org>
>
> >
> >
> > On Thu, Sep 30, 2010 at 6:21 PM, Szilárd Páll <szilard.pall at cbr.su.se> wrote:
>
>> > Hi Roland,
>> >
>> > Nice work, I'll definitely take a look at it!
>> >
>> > Any idea how this improves scaling in general, and at what problem size
>> > it starts to really matter? Does it introduce overhead in smaller
>> > simulations, or is it only conditionally turned on?
>>
> >
> > At the moment it is always turned on for XTC when compiled with MPI; in
> > serial or with threads nothing changes. We buffer at most 100 frames: if
> > fewer than 100 PP nodes are used, we buffer as many frames as there are
> > PP nodes. We also make sure that we don't buffer more than 250MB per node.
> >
> > The 100 frames and the 250MB cap are both constants that probably still
> > need tuning.
>
> Indeed - and the user should be able to tune them, too. They won't want to
> exceed their available physical memory, since buffering frames to virtual
> memory (if any) loses any gains from collective I/O.
>
Honestly, we hadn't thought much about the 250MB limit. We first wanted to
get feedback on the approach and the code before doing more benchmarks and
tuning these parameters. It is very likely that there are no cases that
benefit from using more than 2MB per MPI process.
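
To make the auto-tuning concrete, the frame-count selection could look
roughly like the sketch below. This is only an illustration of the limits
discussed above; MAX_BUFFERED_FRAMES, MAX_BUFFER_BYTES and
choose_nframes_to_buffer are made-up names, not the code in the
CollectiveIO branch:

#include <stdio.h>

#define MAX_BUFFERED_FRAMES 100              /* upper bound discussed above */
#define MAX_BUFFER_BYTES    (2*1024*1024)    /* ~2MB per MPI process        */

/* Pick how many XTC frames to buffer per rank before a collective write:
 * at most as many frames as there are PP nodes, at most 100 frames, and
 * never more than the per-rank memory cap. */
static int choose_nframes_to_buffer(int n_pp_nodes, long frame_bytes)
{
    int nframes = (n_pp_nodes < MAX_BUFFERED_FRAMES) ? n_pp_nodes
                                                     : MAX_BUFFERED_FRAMES;

    while (nframes > 1 && (long)nframes*frame_bytes > MAX_BUFFER_BYTES)
    {
        nframes--;
    }
    return nframes;
}

int main(void)
{
    /* 20 kB is just an example size for the locally stored part of a frame */
    printf("buffer %d frames\n", choose_nframes_to_buffer(8192, 20*1024));
    return 0;
}

With 8192 ranks and ~20 kB of local frame data this picks the full
100-frame buffer; smaller runs are capped by the number of PP nodes.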

If we limit the memory usage to 2MB, should we still make it configurable?
I think adding too many mdrun options gets confusing. Should we make the
number of buffered frames a hidden mdrun option or an environment variable
(with the default being an auto-tuned value)?
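
If we go the environment-variable route, the override could be as simple as
the following sketch (GMX_XTC_BUFFER_FRAMES is a hypothetical name, used
here only to illustrate the mechanism):

#include <stdio.h>
#include <stdlib.h>

/* Return the number of frames to buffer: an explicit environment-variable
 * override if set, otherwise the auto-tuned default. */
static int get_buffered_frames(int auto_tuned_default)
{
    const char *env = getenv("GMX_XTC_BUFFER_FRAMES");   /* hypothetical */

    if (env != NULL && atoi(env) > 0)
    {
        return atoi(env);          /* user override */
    }
    return auto_tuned_default;     /* default: auto-tuned value */
}

int main(void)
{
    printf("buffering %d frames\n", get_buffered_frames(100));
    return 0;
}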

Roland


> Mark
>
> > If the IO time is negligible then of course you won't see any speedup. The
> > IO time mostly shows up as comm. energies (because only 1 node is in
> > write_traj and the others are waiting at the next comm. energies). Usually
> > the IO time becomes important if you either have more than 100 nodes
> > and/or you write very frequently.
> >
> > It should not be slower in any case. The only situation where it might be
> > is when writing from many IO nodes to a non-parallel file system (e.g.
> > NFS); there the parallel IO could be slower. We will test this tomorrow.
> > Otherwise it should always be as fast or faster.
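
For reference, the collective write itself can be sketched roughly as
follows. This is a minimal stand-alone example of the MPI-IO pattern
(prefix-sum the per-rank byte counts, then one collective write); the names
and the offset/error handling are illustrative, not the actual CollectiveIO
branch code:

#include <mpi.h>
#include <string.h>

/* Each rank holds a buffer of already XTC-compressed frame data and all
 * ranks write it with one collective MPI-IO call. */
static void write_buffered_frames(MPI_Comm comm, const char *fn,
                                  const char *buf, long long nbytes)
{
    MPI_File  fh;
    long long my_offset = 0;
    int       rank;

    MPI_Comm_rank(comm, &rank);

    /* Prefix sum over the per-rank byte counts gives each rank its offset. */
    MPI_Exscan(&nbytes, &my_offset, 1, MPI_LONG_LONG, MPI_SUM, comm);
    if (rank == 0)
    {
        my_offset = 0;   /* MPI_Exscan leaves rank 0's result undefined */
    }

    MPI_File_open(comm, fn, MPI_MODE_WRONLY | MPI_MODE_CREATE,
                  MPI_INFO_NULL, &fh);
    /* Collective call: all ranks participate, so the MPI library can
     * aggregate the data onto a few IO nodes before it hits the disk. */
    MPI_File_write_at_all(fh, (MPI_Offset)my_offset, buf, (int)nbytes,
                          MPI_BYTE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
}

int main(int argc, char **argv)
{
    char frames[64];

    MPI_Init(&argc, &argv);
    memset(frames, 'x', sizeof(frames));     /* stand-in for buffered frames */
    write_buffered_frames(MPI_COMM_WORLD, "traj_sketch.bin",
                          frames, (long long)sizeof(frames));
    MPI_Finalize();
    return 0;
}

On a parallel file system the collective write lets the MPI library
aggregate the data onto a few IO aggregators, which is where the gain over
a single writing node comes from; on NFS that aggregation may not help,
which is the case we still want to test.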
> >
> > Ryan & Roland
> >
>
>>  >
>> >
>> > On Fri, Oct 1, 2010 at 12:11 AM, Roland Schulz <roland at utk.edu> wrote:
>> > > Hi,
>> > > we (Ryan & I) just uploaded our work on buffered MPI writing of XTC
>> > > trajectories. It can be found in the branch CollectiveIO.
>> > > We buffer a number of frames and use MPI IO to write those frames from
>> > > a number of nodes (see previous mails for details). The XTC trajectory
>> > > is written at least at every checkpoint, guaranteeing that no frames
>> > > are lost if a simulation crashes.
>> > > We have tested it in serial, with PME, with threads, and with multi,
>> > > and it seems to work in all cases.
>> > > For 3 million atoms on 8192, writing every 1000 steps, the performance
>> > > increases from 21ns/day to 34ns/day and the time spent in comm.
>> > > energies decreases from 47% to 7%.
>> > > Feedback on the code change is very welcome. If you want to look at
>> > > the diff, I suggest using:
>> > > git difftool afd66e48c4e608    #this is the origin/master from when we
>> > > uploaded the branch
>> > > Please let us know what you would like us to change before we merge
>> > > this into master.
>> > > Thanks
>> > > Ryan & Roland
>> > > --
>> > > ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
>> > > 865-241-1537, ORNL PO BOX 2008 MS6309
>> > >
>>
> >
> >
> >
> > --
> > ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
> > 865-241-1537, ORNL PO BOX 2008 MS6309
>
>



-- 
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309