[gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP
Jochen Hub
jhub at gwdg.de
Thu Apr 25 10:03:30 CEST 2013
Many thanks, Berk, for clarifying this.
Cheers,
Jochen
On 4/25/13 9:52 AM, hess at kth.se wrote:
> Hi,
>
> It allows for further scaling, when the domain decomposition is limiting
> the number of MPI ranks.
> It can be faster, especially on hundreds of cores.
> We need it with GPUs.
> OpenMP alone can be significantly faster than MPI alone.
>
> Cheers,
>
> Berk
>
>
> ----- Reply message -----
> From: "Erik Marklund" <erikm at xray.bmc.uu.se>
> To: "Discussion list for GROMACS development" <gmx-developers at gromacs.org>
> Subject: [gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP
> Date: Thu, Apr 25, 2013 09:47
>
>
> Hi,
>
> Please remind me why we allow mixed OpenMP+MPI even though it is
> always slower. It ought to be more complicated to maintain code that
> allows such mixing.
>
> Best,
> Erik
>
> On 25 Apr 2013, at 09:43, "hess at kth.se <mailto:hess at kth.se>"
> <hess at kth.se <mailto:hess at kth.se>> wrote:
>
>> Hi
>>
>> Yes, that is expected.
>> Combined MPI + OpenMP is always slower than either of the two, except
>> close to the scaling limit.
>> Two OpenMP threads give the least overhead, especially with
>> hyperthreading, although turning off hyperthreading is then probably
>> faster.
>>
>> Cheers,
>>
>> Berk
>>
>>
>> ----- Reply message -----
>> From: "Jochen Hub" <jhub at gwdg.de <mailto:jhub at gwdg.de>>
>> To: "Discussion list for GROMACS development"
>> <gmx-developers at gromacs.org <mailto:gmx-developers at gromacs.org>>
>> Subject: [gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP
>> Date: Thu, Apr 25, 2013 09:37
>>
>>
>>
>> On 4/24/13 9:53 PM, Mark Abraham wrote:
>> > I suspect -np 2 is not starting a process on each node like I suspect
>> > you think it should, because all the symptoms are consistent with that.
>> > Possibly the Host field in the .log file output is diagnostic here.
>> > Check how your MPI configuration works.
>>
>> I fixed the issue with the MPI call: I make sure that only one MPI
>> process is started per node (mpiexec -n 2 -npernode 1 or -bynode). The
>> oversubscription warning no longer appears, so everything seems fine.
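[Editor's note: a minimal sketch of such a hybrid launch, using Open MPI-style mpiexec flags and GROMACS 4.6-era mdrun options. The binary name mdrun_mpi is an assumption, and the command is echoed rather than executed, since the exact flags depend on the local MPI installation.]

```shell
# One MPI rank per node, 8 OpenMP threads per rank (hypothetical sketch;
# mdrun_mpi and the mpiexec flags depend on the local installation).
export OMP_NUM_THREADS=8
CMD="mpiexec -n 2 -npernode 1 mdrun_mpi -ntomp 8"
# Echo instead of executing, so the sketch stays self-contained:
echo "$CMD"
```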
>>
>> However, the performance is quite poor with MPI/OpenMP. Example:
>>
>> (100 kAtoms, PME, Verlet, cutoffs at 1nm nstlist=10)
>>
>> 16 MPI processes: 6.8 ns/day
>> 2 MPI processes, 8 OpenMP threads per MPI process: 4.46 ns/day
>> 4 MPI / 4 OpenMP each does not improve things.
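[Editor's note: for concreteness, the figures above put the hybrid run at roughly 66% of the pure-MPI throughput; a quick check using only the numbers reported in the message:]

```shell
# Relative throughput of the hybrid run vs. pure MPI, from the figures above:
# 16 MPI ranks: 6.8 ns/day; 2 ranks x 8 OpenMP threads each: 4.46 ns/day.
awk 'BEGIN { printf "%.2f\n", 4.46 / 6.8 }'   # prints 0.66
```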
>>
>> I use icc 13, and I tried different MPI implementations (MVAPICH 1.8,
>> Open MPI 1.33).
>>
>> Is that expected?
>>
>> Many thanks,
>> Jochen
>>
>
>
>
--
---------------------------------------------------
Dr. Jochen Hub
Computational Molecular Biophysics Group
Institute for Microbiology and Genetics
Georg-August-University of Göttingen
Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany.
Phone: +49-551-39-14189
http://cmb.bio.uni-goettingen.de/
---------------------------------------------------