Re: [gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP

hess@kth.se hess at kth.se
Thu Apr 25 09:52:58 CEST 2013


Hi,

It allows further scaling when the domain decomposition limits the number of MPI ranks.
It can be faster, especially on hundreds of cores.
We need it with GPUs.
OpenMP alone can be significantly faster than MPI alone.
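As a rough sketch of what that looks like in practice (assuming GROMACS 4.6's mdrun_mpi binary with the -ntomp option; the 16-domain limit and 64-core allocation are invented for illustration):

  # Pure MPI: domain decomposition caps the run at 16 ranks, so only 16 cores are used.
  mpirun -np 16 mdrun_mpi -ntomp 1

  # Hybrid MPI+OpenMP: the same 16 ranks, but 4 OpenMP threads each, so all 64 cores are used.
  mpirun -np 16 mdrun_mpi -ntomp 4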

Cheers,

Berk

----- Reply message -----
From: "Erik Marklund" <erikm at xray.bmc.uu.se>
To: "Discussion list for GROMACS development" <gmx-developers at gromacs.org>
Subject: [gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP
Date: Thu, Apr 25, 2013 09:47
Hi,
Please remind me why we allow mixed OpenMP+MPI even though it is always slower. It must be more complicated to maintain code that allows such mixing.
Best,
Erik
On 25 Apr 2013, at 09:43, "hess at kth.se" <hess at kth.se> wrote:

Hi,

Yes, that is expected.
Combined MPI + OpenMP is always slower than either of the two alone, except close to the scaling limit.
Two OpenMP threads per rank give the least overhead, especially with hyperthreading, although turning hyperthreading off is then probably faster still.
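For example, on a node with 8 physical cores and 16 hardware threads (hyperthreading on), the low-overhead layout above would be something like this (core counts and binary name are assumptions, not from this thread):

  export OMP_NUM_THREADS=2
  mpirun -np 8 mdrun_mpi -ntomp 2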

Cheers,

Berk


----- Reply message -----
From: "Jochen Hub" <jhub at gwdg.de>
To: "Discussion list for GROMACS development" <gmx-developers at gromacs.org>
Subject: [gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP
Date: Thu, Apr 25, 2013 09:37

On 4/24/13 9:53 PM, Mark Abraham wrote:
> I suspect -np 2 is not starting a process on each node like I suspect
> you think it should, because all the symptoms are consistent with that.
> Possibly the Host field in the .log file output is diagnostic here.
> Check how your MPI configuration works.

I fixed the issue with the MPI call. I make sure that only one MPI process is started per node (mpiexec -n 2 -npernode=1, or -bynode). The oversubscription warning does not appear, so everything seems fine.

However, the performance is quite poor with MPI/OpenMP. Example (100 kAtoms, PME, Verlet, cut-offs at 1 nm, nstlist=10):

16 MPI processes: 6.8 ns/day
2 MPI processes, 8 OpenMP threads per MPI process: 4.46 ns/day

4 MPI / 4 OpenMP each does not improve things.

I use icc 13, and I tried different MPI implementations (MVAPICH 1.8, OpenMPI 1.33).

Is that expected?

Many thanks,
Jochen
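For reference, launch lines along these lines would correspond to the two timed configurations above; exact flag spellings differ between MPI implementations (Open MPI vs. MVAPICH), so treat this as a sketch rather than the exact commands used:

  # 16 MPI processes, 1 OpenMP thread each
  mpiexec -n 16 mdrun_mpi -ntomp 1

  # 2 MPI processes, one per node, 8 OpenMP threads per process
  mpiexec -n 2 -npernode 1 mdrun_mpi -ntomp 8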