Re: [gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP

hess@kth.se hess at kth.se
Mon Apr 29 07:52:05 CEST 2013


Only OpenMP ALONE can be significantly faster than MPI. This you can, obviously, only do on a single node. On a single node, mixed MPI + OpenMP can also be faster than MPI alone, but probably not on old hardware. But you say you are using two nodes, in which case MPI alone is nearly always faster, except at the scaling limit.
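For illustration, the three launch modes compared above might look like this on two 8-core nodes (a minimal sketch, assuming a 4.6 thread-MPI binary called mdrun, a real-MPI binary called mdrun_mpi and an Open MPI style launcher; the binary names and launcher flags depend on your installation):

    # pure MPI: 16 ranks, one per core, spread across both nodes
    mpirun -np 16 mdrun_mpi -deffnm topol

    # pure OpenMP: only possible on a single node; one rank, 8 threads
    mdrun -ntmpi 1 -ntomp 8 -deffnm topol

    # hybrid MPI + OpenMP: one rank per node, 8 OpenMP threads per rank
    mpirun -np 2 -npernode 1 mdrun_mpi -ntomp 8 -deffnm topol

In each case the total thread count should match the 16 physical cores, otherwise mdrun will warn about oversubscription.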

cheers,

Berk

----- Reply message -----
From: "Jochen Hub" <jhub at gwdg.de>
To: "Discussion list for GROMACS development" <gmx-developers at gromacs.org>
Subject: [gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP
Date: Sun, Apr 28, 2013 11:06




On 4/25/13 2:09 PM, Berk Hess wrote:
> Hi,
>
> This question is of general use, so it should have been posted to
> gmx-users.
>
> Szilard corrected me that sometimes hybrid parallelization can actually
> be faster,
> especially with Intel.

Do you (or does Szilard) recall on which Intel machines you observed 
OpenMP threads being faster than MPI processes? I ask since the 
OpenMP parallelization was 34% (!) slower on the somewhat outdated 
Harpertowns that I tested (see my previous post: 100k atoms on two 8-core nodes).

Cheers,
Jochen


>
> I will add a table with the possible parallelization combinations
> to the acc.+par. page,
> with links to a benchmark page, where we put up some comparisons.
> To do this, we need the force calculation order patch, currently waiting
> in gerrit,
> which will improve the performance without PME nodes.
>
> Cheers,
>
> Berk
>
> On 04/25/2013 10:03 AM, Jochen Hub wrote:
>> Many thanks, Berk, for clarifying this.
>>
>> Cheers,
>> Jochen
>>
>> On 4/25/13 9:52 AM, hess at kth.se wrote:
>>> Hi,
>>>
>>> It allows for further scaling, when the domain decomposition is limiting
>>> the number of MPI ranks.
>>> It can be faster, especially on hundreds of cores.
>>> We need it with GPUs.
>>> OpenMP alone can be significantly faster than MPI alone.
>>>
>>> Cheers,
>>>
>>> Berk
>>>
>>>
>>> ----- Reply message -----
>>> From: "Erik Marklund" <erikm at xray.bmc.uu.se>
>>> To: "Discussion list for GROMACS development"
>>> <gmx-developers at gromacs.org>
>>> Subject: [gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP
>>> Date: Thu, Apr 25, 2013 09:47
>>>
>>>
>>> Hi,
>>>
>>> Please remind me why we allow for mixed OpenMP+MPI even though it is
>>> always slower. It ought to be more complicated to maintain code that
>>> allows such mixing.
>>>
>>> Best,
>>> Erik
>>>
>>> On 25 Apr 2013, at 09:43, "hess at kth.se <mailto:hess at kth.se>"
>>> <hess at kth.se <mailto:hess at kth.se>> wrote:
>>>
>>>> Hi
>>>>
>>>> Yes, that is expected.
>>>> Combined MPI + OpenMP is always slower than either of the two, except
>>>> close to the scaling limit.
>>>> Two OpenMP threads give the least overhead, especially with
>>>> hyperthreading. Although turning off hyperthreading is then probably
>>>> faster.
>>>>
>>>> Cheers,
>>>>
>>>> Berk
>>>>
>>>>
>>>> ----- Reply message -----
>>>> From: "Jochen Hub" <jhub at gwdg.de <mailto:jhub at gwdg.de>>
>>>> To: "Discussion list for GROMACS development"
>>>> <gmx-developers at gromacs.org <mailto:gmx-developers at gromacs.org>>
>>>> Subject: [gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP
>>>> Date: Thu, Apr 25, 2013 09:37
>>>>
>>>>
>>>>
>>>> On 4/24/13 9:53 PM, Mark Abraham wrote:
>>>> > I suspect -np 2 is not starting a process on each node like I suspect
>>>> > you think it should, because all the symptoms are consistent with
>>>> that.
>>>> > Possibly the Host field in the .log file output is diagnostic here.
>>>> > Check how your MPI configuration works.
>>>>
>>>> I fixed the issue with the MPI call. I made sure that only one MPI
>>>> process is started per node (mpiexec -n 2 -npernode=1 or -bynode). The
>>>> oversubscription warning does not appear, so everything seems fine.
>>>>
>>>> However, the performance is quite poor with MPI/OpenMP. Example:
>>>>
>>>> (100k atoms, PME, Verlet, cut-offs at 1 nm, nstlist=10)
>>>>
>>>> 16 MPI processes: 6.8 ns/day
>>>> 2 MPI processes, 8 OpenMP threads per MPI process: 4.46 ns/day
>>>> 4 MPI / 4 OpenMP each does not improve things.
>>>>
>>>> I use icc 13, and I tried different MPI implementations (Mvapich 1.8,
>>>> openmpi 1.33)
>>>>
>>>> Is that expected?
>>>>
>>>> Many thanks,
>>>> Jochen
>>>>
>>>
>>>
>>>
>>
>

-- 
---------------------------------------------------
Dr. Jochen Hub
Computational Molecular Biophysics Group
Institute for Microbiology and Genetics
Georg-August-University of Göttingen
Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany.
Phone: +49-551-39-14189
http://cmb.bio.uni-goettingen.de/
---------------------------------------------------