[gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP
erikm at xray.bmc.uu.se
Thu Apr 25 09:54:35 CEST 2013
That makes a lot of sense. Thanks.
On 25 Apr 2013, at 09:52, "hess at kth.se" <hess at kth.se> wrote:
> It allows for further scaling, when the domain decomposition is limiting the number of MPI ranks.
> It can be faster, especially on hundreds of cores.
> We need it with GPUs.
> OpenMP alone can be significantly faster than MPI alone.
> ----- Reply message -----
> From: "Erik Marklund" <erikm at xray.bmc.uu.se>
> To: "Discussion list for GROMACS development" <gmx-developers at gromacs.org>
> Subject: [gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP
> Date: Thu, Apr 25, 2013 09:47
> Please remind me why we allow for mixed OpenMP+MPI even though it is always slower. It ought to be more complicated to maintain code that allows such mixing.
> On 25 Apr 2013, at 09:43, "hess at kth.se" <hess at kth.se> wrote:
>> Yes, that is expected.
>> Combined MPI+OpenMP is always slower than either of the two, except close to the scaling limit.
>> Two OpenMP threads give the least overhead, especially with hyperthreading, although turning off hyperthreading is then probably faster.
>> ----- Reply message -----
>> From: "Jochen Hub" <jhub at gwdg.de>
>> To: "Discussion list for GROMACS development" <gmx-developers at gromacs.org>
>> Subject: [gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP
>> Date: Thu, Apr 25, 2013 09:37
>> Am 4/24/13 9:53 PM, schrieb Mark Abraham:
>> > I suspect -np 2 is not starting a process on each node like I suspect
>> > you think it should, because all the symptoms are consistent with that.
>> > Possibly the Host field in the .log file output is diagnostic here.
>> > Check how your MPI configuration works.
>> I fixed the issue with the MPI call. I make sure that only one MPI
>> process is started per node (mpiexec -n 2 -npernode=1 or -bynode). The
>> oversubscription warning does not appear, so everything seems fine.
>> However, the performance is quite poor with MPI/OpenMP. Example:
>> (100 kAtoms, PME, Verlet, cutoffs at 1 nm, nstlist=10)
>> 16 MPI processes: 6.8 ns/day
>> 2 MPI processes, 8 OpenMP threads per MPI process: 4.46 ns/day
>> 4 MPI / 4 OpenMP each does not improve things.
>> I use icc 13, and I tried different MPI implementations (Mvapich 1.8,
>> openmpi 1.33).
>> Is that expected?
>> Many thanks,
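
For reference, the arithmetic behind the two points in this thread (oversubscription and the hybrid slowdown) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the figure of 8 cores per node is an assumption inferred from the "2 MPI processes, 8 OpenMP threads" run, not something stated in the thread.

```python
# Throughput figures quoted in the thread (100 kAtoms, PME, Verlet scheme).
PURE_MPI_NS_PER_DAY = 6.8   # 16 MPI processes
HYBRID_NS_PER_DAY = 4.46    # 2 MPI processes x 8 OpenMP threads each

# ASSUMPTION: 8 physical cores per node; the thread does not state this.
ASSUMED_CORES_PER_NODE = 8


def threads_on_node(ranks_on_node, omp_threads_per_rank):
    """Total number of threads that end up running on a single node."""
    return ranks_on_node * omp_threads_per_rank


def is_oversubscribed(ranks_on_node, omp_threads_per_rank, cores_per_node):
    """True if more threads are placed on the node than it has cores."""
    return threads_on_node(ranks_on_node, omp_threads_per_rank) > cores_per_node


def relative_slowdown(baseline, other):
    """Fraction by which `other` is slower than `baseline` (0.25 means 25%)."""
    return 1.0 - other / baseline


if __name__ == "__main__":
    # Both ranks landing on one node, as suspected in the earlier reply.
    print(is_oversubscribed(2, 8, ASSUMED_CORES_PER_NODE))
    # One rank per node, as in the corrected mpiexec call.
    print(is_oversubscribed(1, 8, ASSUMED_CORES_PER_NODE))
    # Hybrid run relative to the pure MPI run.
    print(f"{relative_slowdown(PURE_MPI_NS_PER_DAY, HYBRID_NS_PER_DAY):.1%}")
```

Under that assumption, two ranks with 8 threads each on one node would oversubscribe it, while one rank per node would not; the hybrid run is roughly a third slower than the pure MPI run.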