[gmx-developers] [RFC] thread affinity in mdrun

Szilárd Páll szilard.pall at cbr.su.se
Sun Sep 22 17:49:11 CEST 2013


Hi,

On Fri, Sep 20, 2013 at 7:06 AM, Alexey Shvetsov
<alexxy at omrb.pnpi.spb.ru> wrote:
> Hi!
>
> I saw issues with a demo Numascale system [1] and the default (without
> external MPI) mdrun behavior -- it pins all 128 threads to the first
> ~20 cores. The version with external MPI (Numascale provides an OpenMPI
> offload module) works fine.

That's possible. I don't know of any testing done on Numascale systems
- at least not for 4.6. Feel free to file a bug report! However,
somebody with access to the machine would need to contribute a patch,
or at least help figure out what does not work correctly in the
current hardware detection code.
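
To help narrow it down, a small stand-alone check along the lines of
the sketch below (plain OpenMP plus the Linux sched_getaffinity() call,
nothing mdrun-specific), launched the same way as mdrun, should show
which CPUs each thread actually ends up allowed to run on:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <omp.h>

int main(void)
{
#pragma omp parallel
    {
        cpu_set_t mask;
        int       c;

        CPU_ZERO(&mask);
        /* on Linux, pid 0 means the calling thread */
        if (sched_getaffinity(0, sizeof(mask), &mask) == 0)
        {
#pragma omp critical
            {
                printf("thread %2d may run on CPUs:", omp_get_thread_num());
                for (c = 0; c < CPU_SETSIZE; c++)
                {
                    if (CPU_ISSET(c, &mask))
                    {
                        printf(" %d", c);
                    }
                }
                printf("\n");
            }
        }
    }
    return 0;
}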

Cheers,
--
Szilárd

>
> [1] http://numascale.com/numa_access.php
>
> Szilárd Páll wrote on 19-09-2013 21:53:
>
>> Hi,
>>
>> I would like to get feedback on an issue (or more precisely a set of
>> issues) related to thread/process affinities and
>> i) the way we should (or should not) tweak the current behavior and
>> ii) the way we should proceed in the future.
>>
>>
>> Brief introduction, skip this if you are familiar with the
>> implementation details:
>> Currently, mdrun always sets per-thread affinity if the number of
>> threads is equal to the number of "CPUs" detected (as reported by the
>> OS, i.e. roughly the number of hardware threads supported). However,
>> if this is not the case, e.g. when one wants to leave some cores empty
>> (to run multiple simulations per node) or to avoid using HT, thread
>> pinning will not be done at all. This can have quite severe
>> performance consequences - especially when OpenMP parallelization is
>> used (most notably with GPUs).
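>>
>> To illustrate, the pinning decision is roughly equivalent to the
>> following simplified, Linux-only sketch using plain OpenMP and
>> sched_setaffinity() (illustrative only, not the actual mdrun code):
>>
>> #define _GNU_SOURCE
>> #include <sched.h>
>> #include <unistd.h>
>> #include <omp.h>
>>
>> int main(void)
>> {
>>     int ncpus    = (int) sysconf(_SC_NPROCESSORS_ONLN); /* OS "CPUs" */
>>     int nthreads = omp_get_max_threads();
>>
>>     if (nthreads == ncpus)
>>     {
>> #pragma omp parallel
>>         {
>>             cpu_set_t mask;
>>
>>             CPU_ZERO(&mask);
>>             CPU_SET(omp_get_thread_num(), &mask); /* thread i -> hw thread i */
>>             sched_setaffinity(0, sizeof(mask), &mask); /* calling thread */
>>         }
>>     }
>>     /* otherwise: no pinning at all, which is the problematic case */
>>     return 0;
>> }
>>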
>> Additionally, we try hard not to override externally set affinities,
>> which means that if mdrun detects a non-default affinity, it will not
>> pin threads (not even if -pin on is used). This happens if the job
>> scheduler sets the affinity, or if the user sets it e.g. with
>> KMP_AFFINITY/GOMP_CPU_AFFINITY, taskset, etc., but also if the MPI
>> implementation sets the affinity of only its own thread.
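>>
>> The "has someone already set the affinity?" check is roughly
>> equivalent to the sketch below (again simplified and Linux-only, not
>> the real implementation): if the mask we start with covers fewer CPUs
>> than the OS reports, we assume it was set externally and back off.
>>
>> #define _GNU_SOURCE
>> #include <sched.h>
>> #include <stdio.h>
>> #include <unistd.h>
>>
>> int main(void)
>> {
>>     cpu_set_t mask;
>>
>>     CPU_ZERO(&mask);
>>     if (sched_getaffinity(0, sizeof(mask), &mask) == 0 &&
>>         CPU_COUNT(&mask) < (int) sysconf(_SC_NPROCESSORS_ONLN))
>>     {
>>         printf("non-default affinity detected: mdrun will not pin\n");
>>     }
>>     else
>>     {
>>         printf("default affinity: mdrun may pin\n");
>>     }
>>     return 0;
>> }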
>>
>>
>> On the one hand, there was a request (see
>> http://redmine.gromacs.org/issues/1122) that we should allow forcing
>> affinity setting by mdrun, either by making "-pin on" more aggressive
>> or by adding a "-pin force" option. Please check out the discussion on
>> the issue page and express your opinion on whether you agree and which
>> behavior you support.
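>>
>> To make the alternatives concrete, the decision would roughly boil
>> down to the sketch below; note that everything beyond the existing
>> -pin on/off/auto modes is hypothetical and is exactly what is up for
>> discussion:
>>
>> /* Purely illustrative; "force" is the hypothetical new mode. */
>> enum PinMode { PIN_AUTO, PIN_ON, PIN_OFF, PIN_FORCE };
>>
>> static int should_pin(enum PinMode mode, int counts_match,
>>                       int ext_affinity_set)
>> {
>>     switch (mode)
>>     {
>>         case PIN_OFF:
>>             return 0;
>>         case PIN_FORCE:
>>             return 1;                  /* always pin, override external */
>>         case PIN_ON:
>>             return !ext_affinity_set;  /* current behavior; the other
>>                                           proposal is to return 1 here */
>>         case PIN_AUTO:
>>             return counts_match && !ext_affinity_set;
>>     }
>>     return 0;
>> }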
>>
>>
>> On the other hand, more generally, I would like to get feedback on
>> people's experience with affinity setting. I'll list a few aspects of
>> this issue that should be considered, but feel free to raise other
>> ones:
>> - per-process vs per-thread affinity;
>> - affinity set by or required (for optimal performance)
>> MPI/communication software stack;
>> - GPU/accelerator NUMA aspects;
>> - hwloc (see the sketch after this list);
>> - leaving a core empty, for interrupts (AMD/Cray?), MPI, NIC or GPU
>> driver thread.
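>>
>> To make the hwloc point concrete, a minimal sketch of topology-aware
>> counting and binding with the hwloc 1.x API could look like the
>> following (illustrative only, not a concrete implementation proposal):
>>
>> #include <hwloc.h>
>> #include <stdio.h>
>>
>> int main(void)
>> {
>>     hwloc_topology_t topo;
>>     hwloc_obj_t      core;
>>
>>     hwloc_topology_init(&topo);
>>     hwloc_topology_load(topo);
>>
>>     /* physical cores vs. hardware threads (PUs), instead of the flat
>>        "CPU" count the OS reports */
>>     printf("cores: %d, hardware threads: %d\n",
>>            hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE),
>>            hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU));
>>
>>     /* bind the calling thread to the first core (all its hw threads) */
>>     core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, 0);
>>     if (core != NULL)
>>     {
>>         hwloc_set_cpubind(topo, core->cpuset, HWLOC_CPUBIND_THREAD);
>>     }
>>
>>     hwloc_topology_destroy(topo);
>>     return 0;
>> }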
>>
>> Note that this part of the discussion is aimed more at the future
>> behavior of mdrun. This is especially relevant as the next major (?)
>> version is being planned/developed and new tasking/parallelization
>> design options are being explored.
>>
>> Cheers,
>> --
>> Szilárd
>
>
> --
> Best Regards,
> Alexey 'Alexxy' Shvetsov
> Petersburg Nuclear Physics Institute, NRC Kurchatov Institute, Gatchina,
> Russia
> Department of Molecular and Radiation Biophysics
> Gentoo Team Ru
> Gentoo Linux Dev
> mailto:alexxyum at gmail.com
> mailto:alexxy at gentoo.org
> mailto:alexxy at omrb.pnpi.spb.ru


