[gmx-users] Thread affinity setting failed

Szilárd Páll szilard.pall at cbr.su.se
Mon Mar 4 21:45:38 CET 2013


Hi,

There are some clarifications needed, and as this might help you and others
understand what's going on, I'll take the time to explain things.

Affinity setting is a low-level, operating-system operation that "locks"
(= "pins") threads to physical cores of the CPU, preventing the OS from
migrating them between cores; such migration can cause a performance drop -
especially when using OpenMP multithreading on multi-socket and NUMA machines.
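As an illustration of what pinning means at the OS level (just a sketch, not
something mdrun needs): on Linux the same mechanism is exposed at the shell
by the taskset utility; mdrun does the equivalent per thread from inside the
process.

  taskset -p 1234         # query the affinity mask of a process (PID 1234 is made up)
  taskset -c 0-3 ./prog   # run ./prog pinned to cores 0-3 (./prog is a placeholder)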

Now, mdrun will by default *try* to set affinity if you use all cores
detected (i.e. if mdrun can be sure that it is the only application running
on the machine), but will by default *not* set thread affinities if the
number of threads/processes per compute node is less than the number of
cores detected. Hence, when you decrease -ntmpi to 7, you implicitly end up
turning off thread pinning; that's why the warnings don't show up.
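To make the defaults concrete on your 8-core node (the -pin option shown
here is simply the explicit override of that automatic behavior):

  mdrun -ntmpi 8 -deffnm em           # threads == cores: pinning attempted by default
  mdrun -ntmpi 7 -deffnm em           # threads < cores: pinning off by default
  mdrun -ntmpi 7 -pin on -deffnm em   # force pinning despite the free core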

The fact that affinity setting fails on your machine suggests that either
the system libraries don't support this or the mdrun code is not fully
compatible with your OS; the type of CPU, AFAIK, doesn't matter at all.
What OS are you using? Is it an old installation?

If you are not using OpenMP - which, btw, you probably should with the
Verlet scheme if you are running on a single node or at high
parallelization - the performance will not be affected very much by the
lack of thread pinning. While the warnings themselves can often be safely
ignored, if only some of the threads/processes can't set affinities, this
might indicate a problem. In your case, if you were really seeing only 5
cores being used with 3 warnings, this might suggest that while the
affinity setting failed, three threads were using already "busy" cores,
overlapping with others, which will cause a severe performance drop.
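For example, a hybrid run on your 8-core node could look like the following
(assuming the Verlet cut-off scheme is selected in the .mdp file; -ntomp
sets the number of OpenMP threads per thread-MPI rank):

  mdrun -ntmpi 2 -ntomp 4 -deffnm em   # 2 thread-MPI ranks x 4 OpenMP threads = 8 cores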

What you can do to avoid the performance drop is to turn off pinning by
passing "-pin off" to mdrun. Without OpenMP this will typically not cause a
large performance drop compared to having correct pinning, and it will avoid
the bad case of overlapping threads/processes.
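In your case that would simply be:

  mdrun -v -ntmpi 8 -pin off -deffnm em   # same run as before, pinning explicitly disabled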

I suspect that your machines might be running an old OS, which could be
causing the failed affinity setting. If that is the case, you should talk
to your sysadmins and have them figure out the issue. If you have a
moderately new OS, you should not be seeing such issues, so I suggest that
you file a bug report with details such as: OS + version + kernel version,
pthread library version, standard C library version.
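For reference, one way to collect those details on a typical Linux machine
(the getconf key assumes a glibc/NPTL system):

  uname -a                         # OS, kernel version, architecture
  cat /etc/*release                # distribution name and version
  getconf GNU_LIBPTHREAD_VERSION   # pthread library version
  ldd --version                    # standard C library (glibc) version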

Cheers,

--
Szilárd


On Mon, Mar 4, 2013 at 1:45 PM, Mark Abraham <mark.j.abraham at gmail.com> wrote:

> On Mon, Mar 4, 2013 at 6:02 AM, Reid Van Lehn <rvanlehn at mit.edu> wrote:
>
> > Hello users,
> >
> > I ran into a bug I do not understand today upon upgrading from v4.5.5 to
> > v4.6. I'm using older 8-core Intel Xeon E5430 machines, and when I
> > submitted a job for 8 cores to one of the nodes I received the following
> > error:
> >
> > NOTE: In thread-MPI thread #3: Affinity setting failed.
> >       This can cause performance degradation!
> >
> > NOTE: In thread-MPI thread #2: Affinity setting failed.
> >       This can cause performance degradation!
> >
> > NOTE: In thread-MPI thread #1: Affinity setting failed.
> >       This can cause performance degradation!
> >
> > I ran mdrun simply with the flags:
> >
> > mdrun -v -ntmpi 8 -deffnm em
> >
> > Using the top command, I confirmed that no other programs were running and
> > that mdrun was in fact only using 5 cores. Reducing -ntmpi to 7, however,
> > resulted in no error (only a warning about not using all of the logical
> > cores) and mdrun used 7 cores correctly. Since it warned about thread
> > affinity settings, I tried setting -pin on -pinoffset 0 even though I was
> > using all the cores on the machine. This resulted in the same error.
> > However, turning pinning off explicitly with -pin off (rather than -pin
> > auto) did correctly give me all 8 cores again.
> >
> > While I figured out a solution in this particular instance, my question is
> > whether I should have known from my hardware/settings that pinning
> > should be turned off (for future reference), or if this is a bug?
> >
>
> I'm not sure - those are 2007-era processors, so there may be some
> limitations in what they could do (or how well the kernel and system
> libraries support it). So investing time into working out the real problem
> is not really worthwhile. Thanks for reporting your work-around, though;
> others might benefit from it. If you plan on doing lengthy simulations, you
> might like to verify that you get linear scaling with increasing -ntmpi,
> and/or compare performance with the MPI version on the same hardware.
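>
> A quick sketch of such a scaling check (the log file names are only
> illustrative; -g names the log file written by each run):
>
>   for n in 1 2 4 8; do
>       mdrun -ntmpi $n -deffnm em -g scaling_$n.log
>   done
>
> and then compare the performance summary at the end of each log file.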
>
> Mark