[gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP
hess at kth.se
Thu Apr 25 14:09:10 CEST 2013
This question is of general interest, so it should have been posted to gmx-users.
Szilard corrected me that sometimes hybrid parallelization can actually
be faster, especially with Intel.
I will add a table with the possible parallelization combinations
to the acc.+par. page,
with links to a benchmark page where we will put up some comparisons.
To do this, we need the force calculation order patch (currently still
pending), which will improve the performance without PME nodes.
On 04/25/2013 10:03 AM, Jochen Hub wrote:
> Many thanks, Berk, for clarifying this.
> On 4/25/13 9:52 AM, hess at kth.se wrote:
>> It allows for further scaling when the domain decomposition is limiting
>> the number of MPI ranks.
>> It can be faster, especially on hundreds of cores.
>> We need it with GPUs.
>> OpenMP alone can be significantly faster than MPI alone.
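>>
>> As an illustration of the scaling-limit case (only a sketch; the
>> mdrun_mpi binary name, the core counts and the file names are
>> assumptions, not taken from any benchmark in this thread), a hybrid
>> launch could look like:
>>
>>   # The domain decomposition caps this system at 128 domains, but the
>>   # machine has 512 cores; 4 OpenMP threads per MPI rank use the rest.
>>   # -ntomp sets the number of OpenMP threads per MPI rank.
>>   mpiexec -n 128 mdrun_mpi -ntomp 4 -deffnm topol
>>
>> With -ntomp 1 you get the pure-MPI behaviour back.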
>> ----- Reply message -----
>> From: "Erik Marklund" <erikm at xray.bmc.uu.se>
>> To: "Discussion list for GROMACS development"
>> <gmx-developers at gromacs.org>
>> Subject: [gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP
>> Date: Thu, Apr 25, 2013 09:47
>> Please remind me why we allow for mixed OpenMP+MPI even though it is
>> always slower. It ought to be more complicated to maintain code that
>> allows such mixing.
>> On 25 Apr 2013, at 09:43, "hess at kth.se" <hess at kth.se> wrote:
>>> Yes, that is expected.
>>> Combined MPI + OpenMP is always slower than either of the two, except
>>> close to the scaling limit.
>>> Two OpenMP threads give the least overhead, especially with
>>> hyperthreading. Although turning off hyperthreading is then probably
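>>>
>>> For example (a sketch only; the 16-core node and the mdrun_mpi binary
>>> name are assumptions), the two-threads-per-rank setup would be
>>> launched roughly as:
>>>
>>>   # 8 MPI ranks on a 16-core node, 2 OpenMP threads per rank;
>>>   # -pin on asks mdrun to pin threads to cores, which matters most
>>>   # when hyperthreading is enabled.
>>>   mpiexec -n 8 mdrun_mpi -ntomp 2 -pin on -deffnm topol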
>>> ----- Reply message -----
>>> From: "Jochen Hub" <jhub at gwdg.de <mailto:jhub at gwdg.de>>
>>> To: "Discussion list for GROMACS development"
>>> <gmx-developers at gromacs.org <mailto:gmx-developers at gromacs.org>>
>>> Subject: [gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP
>>> Date: Thu, Apr 25, 2013 09:37
>>> On 4/24/13 9:53 PM, Mark Abraham wrote:
>>> > I suspect -np 2 is not starting a process on each node the way you
>>> > think it should, because all the symptoms are consistent with that.
>>> > Possibly the Host field in the .log file output is diagnostic here.
>>> > Check how your MPI configuration works.
>>> I fixed the issue with the mpi call. I make sure that only one MPI
>>> process is started per node (mpiexec -n 2 -npernode=1 or -bynode). The
>>> oversubscription warning does not appear, so everything seems fine.
>>> However, the performance is quite poor with MPI/OpenMP. Example:
>>> (100 kAtoms, PME, Verlet, cutoffs at 1 nm, nstlist=10)
>>> 16 MPI processes: 6.8 ns/day
>>> 2 MPI processes, 8 OpenMP threads per MPI process: 4.46 ns/day
>>> 4 MPI / 4 OpenMP each does not improve things.
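>>>
>>> In case the exact invocations help (reconstructed from the description
>>> above, so the mdrun_mpi binary name and the file names are
>>> placeholders), the two runs compared were along the lines of:
>>>
>>>   # pure MPI: 16 ranks over the two 8-core nodes, 1 OpenMP thread each
>>>   mpiexec -n 16 mdrun_mpi -ntomp 1 -deffnm topol
>>>
>>>   # hybrid: 1 rank per node, 8 OpenMP threads per rank
>>>   mpiexec -n 2 -npernode 1 mdrun_mpi -ntomp 8 -deffnm topol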
>>> I use icc 13, and I tried different MPI implementations (MVAPICH 1.8,
>>> Open MPI 1.33).
>>> Is that expected?
>>> Many thanks,