[gmx-developers] [RFC] thread affinity in mdrun

Fri Sep 20 07:06:47 CEST 2013

Hi!

I saw issues with the demo Numascale system [1] and the default mdrun
behavior (built without external MPI): it pins all 128 threads to the
first ~20 cores. The version built with external MPI (Numascale
provides an OpenMPI offload module) works fine.

[1] http://numascale.com/numa_access.php
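
One way to check where the threads of a running process may execute is
to read each thread's affinity mask. The following is a minimal,
Linux-only sketch, not part of mdrun; the helper names
thread_affinities and cpus_in_use are made up for illustration. It
relies on /proc/<pid>/task listing the thread IDs and on
os.sched_getaffinity accepting a thread ID on Linux.

```python
import os


def thread_affinities(pid):
    """Map each thread ID of `pid` to the set of CPUs it may run on.

    Linux-only: thread IDs are read from /proc/<pid>/task.
    """
    result = {}
    for name in os.listdir(f"/proc/{pid}/task"):
        tid = int(name)
        try:
            result[tid] = set(os.sched_getaffinity(tid))
        except ProcessLookupError:
            # The thread exited between listing and querying; skip it.
            continue
    return result


def cpus_in_use(affinities):
    """Union of all CPUs that appear in any thread's affinity mask."""
    used = set()
    for cpus in affinities.values():
        used |= cpus
    return used


if __name__ == "__main__":
    aff = thread_affinities(os.getpid())
    for tid, cpus in sorted(aff.items()):
        print(tid, sorted(cpus))
    print("CPUs covered:", sorted(cpus_in_use(aff)))
```

Run against the PID of a process with 128 threads, a union that covers
only about 20 CPUs would correspond to the behavior described above.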

Szilárd Páll wrote on 19-09-2013 21:53:
> Hi,
> 
> I would like to get feedback on an issue (or more precisely a set of
> issues) related to thread/process affinities and
> i) the way we should (or should not) tweak the current behavior and
> ii) the way we should proceed in the future.
> 
> 
> Brief introduction, skip this if you are familiar with the
> implementation details:
> Currently, mdrun always sets per-thread affinity if the number of
> threads is equal to the number of "CPUs" detected (reported by the OS
> ~ number of hardware threads supported). However, if this is not the
> case, e.g. one wants to leave some cores empty (run multiple
> simulations per node) or avoid using HT, thread pinning will not be
> done. This can have quite harsh consequences on the performance -
> especially when OpenMP parallelization is used (most notably with
> GPUs).
> Additionally, we try hard to not override externally set affinities
> which means that if mdrun detects non-default affinity, it will not
> pin threads (not even if -pin on is used). This happens if the job
> scheduler sets the affinity, or if the user sets it e.g. with
> KMP_AFFINITY/GOMP_CPU_AFFINITY, taskset, etc., and even if the MPI
> implementation sets the affinity of only its own thread.
> 
> 
> On the one hand, there was a request (see
> http://redmine.gromacs.org/issues/1122) that we should allow forcing
> the affinity setting by mdrun either by "-pin on" acquiring more
> aggressive behavior or using a "-pin force" option. Please check out
> the discussion on the issue page and express your opinion on whether
> you agree/which behavior you support.
> 
> 
> On the other hand, more generally, I would like to get feedback on
> what people's experience is with affinity setting. I'll just list a
> few aspects of this issue that should be considered, but feel free to
> raise other issues:
> - per-process vs per-thread affinity;
> - affinity set by, or required (for optimal performance) by, the
> MPI/communication software stack;
> - GPU/accelerator NUMA aspects;
> - hwloc;
> - leaving a core empty, for interrupts (AMD/Cray?), MPI, NIC or GPU
> driver thread.
> 
> Note that this part of the discussion is aimed more at the behavior of
> mdrun in the future. This is especially relevant as the next major (?)
> version is being planned/developed and new tasking/parallelization
> design options are being explored.
> 
> Cheers,
> --
> Szilárd
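
The pinning rules described in the quoted introduction, together with
the proposed "-pin force" option, can be summarized as a small decision
function. This is only a sketch of the logic as stated in the text, not
the actual mdrun code. The function name should_pin, its parameters,
the treatment of "-pin off" as "never pin", and the definition of
"default affinity" as "all detected CPUs allowed" are assumptions made
for illustration.

```python
def should_pin(n_threads, n_cpus, current_mask, pin="auto"):
    """Decide whether threads would be pinned (illustrative sketch).

    n_threads    -- number of threads started
    n_cpus       -- number of "CPUs" (hardware threads) reported by the OS
    current_mask -- set of CPU indices the process is currently allowed on
    pin          -- "auto", "on", "off", or the proposed "force"
    """
    # Assumption: "default" affinity means every detected CPU is allowed.
    externally_set = set(current_mask) != set(range(n_cpus))

    if pin == "off":
        return False
    if pin == "force":
        # Proposed behavior: override even an externally set affinity.
        return True
    if externally_set:
        # Current behavior: never override, not even with "-pin on".
        return False
    if pin == "on":
        return True
    # "auto": pin only when all hardware threads are used.
    return n_threads == n_cpus


if __name__ == "__main__":
    all_cpus = set(range(8))
    print(should_pin(8, 8, all_cpus))              # full node
    print(should_pin(4, 8, all_cpus))              # cores left empty
    print(should_pin(8, 8, {0, 1, 2, 3}, "on"))    # affinity set externally
```

With these rules, the case of leaving cores empty is left unpinned
under "auto", and an externally set mask blocks pinning unless the
proposed "force" value is given.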

-- 
Best Regards,
Alexey 'Alexxy' Shvetsov
Petersburg Nuclear Physics Institute, NRC Kurchatov Institute, Gatchina, 
Russia
Department of Molecular and Radiation Biophysics
Gentoo Team Ru
Gentoo Linux Dev
mailto:alexxyum at gmail.com
mailto:alexxy at gentoo.org
mailto:alexxy at omrb.pnpi.spb.ru