[gmx-developers] mpi and thread command line options

Szilárd Páll szilard.pall at cbr.su.se
Wed Jul 11 19:27:26 CEST 2012


On Wed, Jul 11, 2012 at 6:51 PM, Christoph Junghans <junghans at votca.org>wrote:

> 2012/7/11 Roland Schulz <roland at utk.edu>:
> > On Wed, Jul 11, 2012 at 7:19 AM, Alexey Shvetsov
> > <alexxy at omrb.pnpi.spb.ru> wrote:
> >> Roland Schulz wrote on 2012-07-11 10:47:
> >>> On Wed, Jul 11, 2012 at 2:33 AM, Alexey Shvetsov
> >>> <alexxy at omrb.pnpi.spb.ru> wrote:
> >>>> Hi!
> >>>>
> >>>> mvapich2/mvapich (as well as its derivatives like Platform MPI,
> >>>> PC-MPI and Intel MPI) will also behave differently. So the user can
> >>>> get a cryptic message about launching mpd when launching mdrun -np
> >>>> directly.
> >>> Not quite. mpich2 requires for MPI_Comm_spawn to work that the
> >>> application is run with mpirun/mpiexec. See
> >>>
> >>> http://lists.mcs.anl.gov/pipermail/mpich-discuss/2012-June/012638.html
> >>> for the details. We would need to detect that and don't try to spawn
> >>> in that case (and run in serial with a warning).
> >>> Thus mpich2 would require: mpirun mdrun -np x. Of course that isn't
> >>> more convenient than mpirun -np x mdrun. The only advantage would be
> >>> that as with tmpi "mpirun mdrun" would automatically use as many
> >>> cores
> >>> as are available and are useful for the system, whereas without spawn
> >>> the user needs to decides the number of cores and we can't have any
> >>> automatic mechanism helping the user.
> >>>
> >>> Roland
> >>
> >> Ok. But how will this work with batch systems that automatically pass
> >> the number of processes to mpiexec? The effective launch command will be
> >>
> >> $ mpiexec mdrun_mpi $mdrunargs
> >
> > The idea was to only do any spawn if mdrun_mpi is started in serial
> > (mpiexec -n 1). It was only meant to make mdrun_mpi behave the same as
> > tmpi mdrun for a single node. On clusters with batch system nothing
> > would have changed over the current situation.
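Roland's idea above can be illustrated with the launch modes involved; a
sketch only, where mdrun_mpi stands for an MPI-enabled build and the rank
counts are made up:

```shell
# Today: the user picks the rank count explicitly.
mpiexec -n 4 mdrun_mpi

# Proposed: start a single rank and let mdrun spawn the rest itself.
# Under MPICH this only works if mdrun was itself started via
# mpirun/mpiexec; a bare "mdrun_mpi" would fail to spawn because no
# process manager is attached.
mpiexec -n 1 mdrun_mpi
```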
> 1.) I agree with Roland that keeping the -nt option would be
> misleading, even if -nt gets a new meaning - "number of tasks", where
> tasks can be threads or MPI tasks.
> Also, from my experience, half of the users are not aware of the -nt
> option anyway; they just start mdrun without any special settings,
> which means "guess", and we should keep that behavior.
> So making -nt obsolete is not bad, I think.
>

Leaving aside the "I just want to type mdrun and nothing else" type of
user, it would make sense to require users to use -ntmpi and/or
-ntomp. However, I have the feeling that users could be
quite overwhelmed by the technical nature of the different
parallelization schemes. That's why it could be good to keep the
convenience option -nt.
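As an illustration of why a single convenience option is attractive, here
is a sketch of how -nt could map onto the two explicit options; the split
policy below is hypothetical, not the actual mdrun logic:

```shell
# Hypothetical split of -nt (total threads) into MPI ranks x OpenMP threads.
nt=8                    # what the user would pass as -nt
ntmpi=2                 # rank count picked by the auto-setup
ntomp=$((nt / ntmpi))   # OpenMP threads per rank
echo "${ntmpi} MPI ranks x ${ntomp} OpenMP threads = ${nt} threads total"
# prints: 2 MPI ranks x 4 OpenMP threads = 8 threads total
```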


> 2.) One class of systems that has not been discussed yet are those
> with a different number of OpenMP threads per node (like the Intel
> MIC); that should also be possible by explicitly defining
> OMP_NUM_THREADS on each node.
>

That should work, both automatically and manually, already now, as
both the detection and the run-configuration setup happen on a
per-process basis. You won't be able to have a different number of
OpenMP threads per process with thread-MPI, though.
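For reference, a per-process OMP_NUM_THREADS could be set from the
launcher itself; the MPMD colon syntax shown is OpenMPI's, and the
hostnames and thread counts are made up for illustration:

```shell
# Heterogeneous thread counts per process (OpenMPI MPMD syntax;
# hostnames and counts are hypothetical):
#   mpirun -np 1 -host cpu0 -x OMP_NUM_THREADS=16  mdrun_mpi : \
#          -np 1 -host mic0 -x OMP_NUM_THREADS=240 mdrun_mpi
# Each process then reads its own environment, e.g.:
OMP_NUM_THREADS=16 sh -c 'echo "this process would use $OMP_NUM_THREADS threads"'
# prints: this process would use 16 threads
```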

I'm tweaking the detection and thread-count code right now, so your
help/feedback would be useful. Do you still have access to such a
machine? If not, this scenario could be emulated by running on
machines that have different core counts.


> 3.) With all these thread options and combination, we will need
> something like g_tune_mdrun, which I guess could be an extension of
> g_tune_pme.
>

In the long run it would, but I'd rather see resources channeled into
a more flexible multi-level task-parallelization setup in which we can
dynamically tune the number of threads per task and per process, as
well as the number of processes. However, this is getting off-topic;
let's get back to it after 4.6.

Cheers,
--
Sz.


> Christoph
>
>
>
> >
> > Roland
> >
> >>
> >>
> >>
> >>>
> >>>>
> >>>> Roland Schulz wrote on 2012-07-11 05:09:
> >>>>> On Tue, Jul 10, 2012 at 8:09 PM, Szilárd Páll
> >>>>> <szilard.pall at cbr.su.se> wrote:
> >>>>>> On Tue, Jul 10, 2012 at 11:15 PM, Berk Hess <hess at kth.se> wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> We are working on the final part of the 4.6 release, which is
> >>>>>>> making the MPI
> >>>>>>> and OpenMP thread setup automated, fully checked and user
> >>>>>>> friendly.
> >>>>>>> We have to decide on the naming of the options.
> >>>>>> Roland has an implementation of MPI spawn ready. This would
> >>>>>> allow doing mdrun -np #processes instead of using mpirun (at
> >>>>>> least with OpenMPI).
> >>>>>>
> >>>>>> Would this feature add anything but the convenience of being able
> >>>>>> to
> >>>>>> run without mpirun on a single node? Without MPI spawning working
> >>>>>> reliably in most cases (or with the ability to detect with a high
> >>>>>> certainty when it does not), enabling an -np mdrun option would
> >>>>>> just lead to confusion when mdrun exits with a cryptic MPI error
> >>>>>> due to not being able to spawn.
> >>>>> The idea was to make mdrun behave the same whether it is compiled
> >>>>> with real MPI or tMPI, thus also only supporting a single node.
> >>>>> But MPICH behaves quite stupidly here and they also don't seem to
> >>>>> care. And only supporting it for OpenMPI is probably more
> >>>>> confusing than helpful (then tMPI+OpenMPI would behave the same
> >>>>> but MPICH/MVAPICH would behave differently). So you are probably
> >>>>> right that it is better not to add spawn at all.
> >>>>>
> >>>>>> Therefore, I'd be OK with a new *hidden* -np option that only
> >>>>>> works in the single-node case, but not with a non-hidden one
> >>>>>> advertised in the documentation/wiki.
> >>>>> As a hidden option it would only help for testing. But I don't
> >>>>> think it is worth adding it for just that.
> >>>>>
> >>>>> Roland
> >>>>
> >>>> --
> >>>> Best Regards,
> >>>> Alexey 'Alexxy' Shvetsov
> >>>> Petersburg Nuclear Physics Institute, NRC Kurchatov Institute,
> >>>> Gatchina, Russia
> >>>> Department of Molecular and Radiation Biophysics
> >>>> Gentoo Team Ru
> >>>> Gentoo Linux Dev
> >>>> mailto:alexxyum at gmail.com
> >>>> mailto:alexxy at gentoo.org
> >>>> mailto:alexxy at omrb.pnpi.spb.ru
> >>>> --
> >>>> gmx-developers mailing list
> >>>> gmx-developers at gromacs.org
> >>>> http://lists.gromacs.org/mailman/listinfo/gmx-developers
> >>>> Please don't post (un)subscribe requests to the list. Use the
> >>>> www interface or send it to gmx-developers-request at gromacs.org.
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
> >>> 865-241-1537, ORNL PO BOX 2008 MS6309
> >>
>
>
>
> --
> Christoph Junghans
> Web: http://www.compphys.de

