[gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP
hess at kth.se
Thu Apr 25 14:09:10 CEST 2013
This question is of general interest, so it should have been posted to gmx-users.
Szilard corrected me that sometimes hybrid parallelization can actually
be faster, especially with Intel.
I will add a table with the possible parallelization combinations
to the acc.+par. page,
with links to a benchmark page where we will put up some comparisons.
To do this, we need the force calculation order patch (currently still
pending), which will improve the performance without PME nodes.
On 04/25/2013 10:03 AM, Jochen Hub wrote:
> Many thanks, Berk, for clarifying this.
> On 4/25/13 9:52 AM, hess at kth.se wrote:
>> It allows for further scaling when the domain decomposition is limiting
>> the number of MPI ranks.
>> It can be faster, especially on hundreds of cores.
>> We need it with GPUs.
>> OpenMP alone can be significantly faster than MPI alone.
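>>
>> As an illustration of the scaling-limit case (only a sketch; the
>> mdrun_mpi binary name, the core counts and the file names are
>> assumptions, not taken from any benchmark in this thread), a hybrid
>> launch could look like:
>>
>>   # The domain decomposition caps this system at 128 domains, but the
>>   # machine has 512 cores; 4 OpenMP threads per MPI rank use the rest.
>>   # -ntomp sets the number of OpenMP threads per MPI rank.
>>   mpiexec -n 128 mdrun_mpi -ntomp 4 -deffnm topol
>>
>> With -ntomp 1 you get the pure-MPI behaviour back.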
>> ----- Reply message -----
>> From: "Erik Marklund" <erikm at xray.bmc.uu.se>
>> To: "Discussion list for GROMACS development"
>> <gmx-developers at gromacs.org>
>> Subject: [gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP
>> Date: Thu, Apr 25, 2013 09:47
>> Please remind me why we allow for mixed OpenMP+MPI even though it is
>> always slower. It ought to be more complicated to maintain code that
>> allows such mixing.
>> On 25 Apr 2013, at 09:43, "hess at kth.se" <hess at kth.se> wrote:
>>> Yes, that is expected.
>>> Combined MPI + OpenMP is always slower than either of the two, except
>>> close to the scaling limit.
>>> Two OpenMP threads give the least overhead, especially with
>>> hyperthreading. Although turning off hyperthreading is then probably
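>>>
>>> For example (a sketch only; the 16-core node and the mdrun_mpi binary
>>> name are assumptions), the two-threads-per-rank setup would be
>>> launched roughly as:
>>>
>>>   # 8 MPI ranks on a 16-core node, 2 OpenMP threads per rank;
>>>   # -pin on asks mdrun to pin threads to cores, which matters most
>>>   # when hyperthreading is enabled.
>>>   mpiexec -n 8 mdrun_mpi -ntomp 2 -pin on -deffnm topol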
>>> ----- Reply message -----
>>> From: "Jochen Hub" <jhub at gwdg.de <mailto:jhub at gwdg.de>>
>>> To: "Discussion list for GROMACS development"
>>> <gmx-developers at gromacs.org <mailto:gmx-developers at gromacs.org>>
>>> Subject: [gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP
>>> Date: Thu, Apr 25, 2013 09:37
>>> On 4/24/13 9:53 PM, Mark Abraham wrote:
>>> > I suspect -np 2 is not starting a process on each node the way you
>>> > think it should, because all the symptoms are consistent with that.
>>> > Possibly the Host field in the .log file output is diagnostic here.
>>> > Check how your MPI configuration works.
>>> I fixed the issue with the mpi call. I make sure that only one MPI
>>> process is started per node (mpiexec -n 2 -npernode=1 or -bynode). The
>>> oversubscription warning does not appear, so everything seems fine.
>>> However, the performance is quite poor with MPI/OpenMP. Example:
>>> (100 kAtoms, PME, Verlet, cutoffs at 1 nm, nstlist=10)
>>> 16 MPI processes: 6.8 ns/day
>>> 2 MPI processes, 8 OpenMP threads per MPI process: 4.46 ns/day
>>> 4 MPI / 4 OpenMP each does not improve things.
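>>>
>>> In case the exact invocations help (reconstructed from the description
>>> above, so the mdrun_mpi binary name and the file names are
>>> placeholders), the two runs compared were along the lines of:
>>>
>>>   # pure MPI: 16 ranks over the two 8-core nodes, 1 OpenMP thread each
>>>   mpiexec -n 16 mdrun_mpi -ntomp 1 -deffnm topol
>>>
>>>   # hybrid: 1 rank per node, 8 OpenMP threads per rank
>>>   mpiexec -n 2 -npernode 1 mdrun_mpi -ntomp 8 -deffnm topol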
>>> I use icc 13, and I tried different MPI implementations (MVAPICH 1.8,
>>> Open MPI 1.33).
>>> Is that expected?
>>> Many thanks,