[gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP

Thu Apr 25 10:15:11 CEST 2013

Hi,

Even though it's generally slower, in some cases you can push the hard 
scaling limit 2-3 times further, because in hybrid mode the cell is 
decomposed into fewer domains for the same number of processors. It's 
also faster on clusters with slow networks.
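
As a rough illustration of the point about fewer domains, here is a minimal
Python sketch. It assumes one domain per MPI rank and ignores separate PME
ranks; the function name domains_and_atoms is made up for this example and
is not part of GROMACS. The 16 cores and 100,000 atoms are taken from
Jochen's benchmark quoted below.

```python
def domains_and_atoms(total_cores, omp_threads_per_rank, n_atoms):
    """Return (number of domains, average atoms per domain).

    Assumption: each MPI rank owns exactly one domain, and every core is
    used by exactly one OpenMP thread (no separate PME ranks).
    """
    if total_cores % omp_threads_per_rank != 0:
        raise ValueError("cores must divide evenly into ranks")
    n_ranks = total_cores // omp_threads_per_rank
    return n_ranks, n_atoms / n_ranks


# Same 16 cores, three different MPI/OpenMP splits.
layouts = {
    threads: domains_and_atoms(16, threads, 100_000)
    for threads in (1, 4, 8)
}

for threads, (domains, atoms) in layouts.items():
    print(f"{domains:2d} ranks x {threads} threads: "
          f"{domains:2d} domains, {atoms:.0f} atoms per domain")
```

With 8 threads per rank each domain holds about 50,000 atoms instead of
6,250, so the domains stay above the minimum usable size for longer as
more cores are added.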

Rossen

On 4/25/13 9:47 AM, Erik Marklund wrote:
> Hi,
>
> Please remind me why we allow for mixed OpenMP+MPI even though it is 
> always slower. It ought to be more complicated to maintain code that 
> allows such mixing.
>
> Best,
> Erik
>
> On 25 Apr 2013, at 09:43, "hess at kth.se" <hess at kth.se> wrote:
>
>> Hi
>>
>> Yes, that is expected.
>> Combined MPI+OpenMP is always slower than either of the two alone, 
>> except close to the scaling limit.
>> Two OpenMP threads give the least overhead, especially with 
>> hyperthreading, although turning off hyperthreading is then probably 
>> faster.
>>
>> Cheers,
>>
>> Berk
>>
>>
>> ----- Reply message -----
>> From: "Jochen Hub" <jhub at gwdg.de>
>> To: "Discussion list for GROMACS development" 
>> <gmx-developers at gromacs.org>
>> Subject: [gmx-developers] Oversubscribing on 4.62 with MPI / OpenMP
>> Date: Thu, Apr 25, 2013 09:37
>>
>>
>>
>> Am 4/24/13 9:53 PM, schrieb Mark Abraham:
>> > I suspect -np 2 is not starting a process on each node like I suspect
>> > you think it should, because all the symptoms are consistent with that.
>> > Possibly the Host field in the .log file output is diagnostic here.
>> > Check how your MPI configuration works.
>>
>> I fixed the issue with the MPI call. I now make sure that only one MPI
>> process is started per node (mpiexec -n 2 -npernode=1 or -bynode). The
>> oversubscription warning does not appear, so everything seems fine.
>>
>> However, the performance is quite poor with MPI/OpenMP. Example:
>>
>> (100 kAtoms, PME, Verlet, cutoffs at 1 nm, nstlist=10)
>>
>> 16 MPI processes: 6.8 ns/day
>> 2 MPI processes, 8 OpenMP threads per MPI process: 4.46 ns/day
>> 4 MPI / 4 OpenMP each does not improve things.
>>
>> I use icc13, and I tried different MPI implementations (Mvapich 1.8,
>> openmpi 1.33).
>>
>> Is that expected?
>>
>> Many thanks,
>> Jochen
>>
>
>
>
