[gmx-developers] MPI Datatypes and TMPI
szilard.pall at cbr.su.se
Wed Apr 4 15:49:15 CEST 2012
On Wed, Apr 4, 2012 at 3:31 PM, Berk Hess <hess at kth.se> wrote:
> On 04/04/2012 03:22 PM, Roland Schulz wrote:
> we are looking at how best to tackle the issue of packing data for
> MPI communication in 5.0. We believe this is an important issue because the
> two current approaches (serializing into a single buffer (e.g. global_stat) or
> sending many small messages (e.g. the initial bcast)) are both slow and
> produce hard-to-read code (global_stat). The many small messages are a
> serious scaling issue for large systems, and even the serializing is
> unnecessarily slow because it implies a potentially unnecessary copy.
> We first looked into Boost::MPI but while it is very nice in some aspects,
> it also has disadvantages. Thus we're looking at alternatives.
> Most interesting alternatives use MPI Datatypes to get high performance and
> avoid the unnecessary copy of serialization. The problem is that TMPI
> doesn't support MPI Datatypes.
> Thus my question: is it planned to add Datatypes to TMPI? If not, is TMPI
> still required in 5.0? Would it be sufficient to support OpenMP for non-MPI
> multi-core installations in 5.0? What was the reason for TMPI in the first
> place? Why did we not just bundle e.g. OpenMPI for those users missing an
> MPI implementation?
> Nothing is planned, but if it isn't much work, Sander might do it.
> For the old code path/kernels we would still like to have TMPI, as this makes it
> far easier for normal users to get maximum performance.
> For the new Verlet scheme OpenMP seems to do very well, but with multiple
> CPUs and/or GPUs TMPI is still very nice,
> as it makes configuring and starting runs trivial.
Right, it would be *very* advantageous to keep tMPI support, as on
multi-socket machines we will most probably need multi-level
parallelism, with the added benefit of the data locality that MPI-style
higher-level parallelization provides. This holds unless we manage to
get a full-featured, efficient task parallelization with
hardware-locality awareness into 5.0, which might make using
multi-threading only within a node a feasible option.
> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
> 865-241-1537, ORNL PO BOX 2008 MS6309