[gmx-users] Thread affinity setting failed

Reid Van Lehn rvanlehn at gmail.com
Thu Mar 7 04:17:52 CET 2013


Hi Szilárd,

Thank you very much for the detailed write-up. To answer your question,
yes, I am using an old Linux distro, specifically CentOS 5.4, and upgrading
to 5.9 still showed the same problem. I have a few other machines with
different hardware running CentOS 6.3 which do not have this issue, so based
on your description it is likely an operating system issue. As I'm
(unfortunately...) also the sysadmin on this cluster, I'm unlikely to find
the time to upgrade all the nodes, so I'll probably stick with the "-pin
off" workaround for now. Hopefully this thread will help out other users!

As an aside, I found that the OpenMP + Verlet combination was slower for
this particular system, but I suspect that is because the system is almost
entirely water and hence benefits from the group-scheme optimizations for
water described on the GROMACS website.
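
(For reference, the scheme is chosen in the .mdp file; a minimal illustration,
assuming GROMACS 4.6 option names:

  cutoff-scheme = group    ; 4.6 default, has the optimized water loops
  cutoff-scheme = Verlet   ; needed for full OpenMP support (and GPUs)

so the comparison above is just a matter of switching that one setting and
re-running grompp.)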

Thanks again for the explanation,
Reid

On Mon, Mar 4, 2013 at 3:45 PM, Szilárd Páll <szilard.pall at cbr.su.se> wrote:

> Hi,
>
> There are some clarifications needed, and as this might help you and others
> understand what's going on, I'll take the time to explain things.
>
> Affinity setting is a low-level, operating-system operation that "locks"
> (= "pins") threads to physical cores of the CPU, preventing the OS from
> moving them between cores, which can cause a performance drop - especially
> when using OpenMP multithreading on multi-socket and NUMA machines.
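>
> As a concrete illustration of what pinning means, you can inspect the set of
> cores a running mdrun process (and its threads) is allowed to use from the
> shell; the PID below is only a placeholder:
>
>   pgrep mdrun                                        # find the process ID, e.g. 12345
>   taskset -cp 12345                                  # cores the main thread may run on
>   grep Cpus_allowed_list /proc/12345/task/*/status   # per-thread affinity masks
>
> With pinning in effect, each thread is restricted to one core rather than the
> whole list.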
>
> Now, mdrun will by default *try* to set affinities if you use all cores
> detected (i.e. if mdrun can be sure that it is the only application running
> on the machine), but will by default *not* set thread affinities if the
> number of threads/processes per compute node is less than the number of
> cores detected. Hence, when you decrease -ntmpi to 7, you implicitly end up
> turning off thread pinning, which is why the warnings don't show up.
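>
> For example, on the 8-core node from this thread:
>
>   mdrun -v -ntmpi 8 -deffnm em            # all detected cores in use -> mdrun tries to pin
>   mdrun -v -ntmpi 7 -deffnm em            # fewer threads than cores -> no pinning by default
>   mdrun -v -ntmpi 7 -pin on -deffnm em    # request pinning explicitly even so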
>
> The fact that affinity setting fails on your machine suggests that either
> the system libraries don't support this or the mdrun code is not fully
> compatible with your OS; the type of CPU, AFAIK, doesn't matter at all. What
> OS are you using? Is it an old installation?
>
> If you are not using OpenMP - which, btw, you probably should with the
> Verlet scheme if you are running on a single node or at high
> parallelization - the performance will not be affected very much by the
> lack of thread pinning. While the warnings themselves can often be safely
> ignored, if only some of the threads/processes can't set affinities, this
> might indicate a problem. In your case, if you were really seeing only 5
> cores being used with 3 warnings, this might suggest that while the
> affinity setting failed, three threads are running on already "busy" cores,
> overlapping with others, which will cause a severe performance drop.
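>
> If you do want to try OpenMP with the Verlet scheme, something along these
> lines is a starting point; the 2 x 4 split is only an illustration for an
> 8-core node and requires cutoff-scheme = Verlet in the .mdp:
>
>   mdrun -v -ntmpi 2 -ntomp 4 -deffnm em   # 2 thread-MPI ranks x 4 OpenMP threads each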
>
> What you can do to avoid the performance drop is to turn off pinning by
> passing "-pin off" to mdrun. Without OpenMP this will typically not cause a
> large performance drop compared to having correct pinning, and it will avoid
> the bad case of overlapping threads/processes.
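>
> For the run from this thread that would be, for example:
>
>   mdrun -v -ntmpi 8 -pin off -deffnm em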
>
> I suspect that your machines might be running an old OS which could be
> causing the failed affinity setting. If that is the case, you should talk
> to your sysadmins and have them figure out the issue. If you have a
> moderately new OS, you should not be seeing such issues, so I suggest that
> you file a bug report with details like: OS + version + kernel version,
> pthread library version, standard C library version.
>
> Cheers,
>
> --
> Szilárd
>
>
> On Mon, Mar 4, 2013 at 1:45 PM, Mark Abraham <mark.j.abraham at gmail.com
> >wrote:
>
> > On Mon, Mar 4, 2013 at 6:02 AM, Reid Van Lehn <rvanlehn at mit.edu> wrote:
> >
> > > Hello users,
> > >
> > > I ran into a bug I do not understand today upon upgrading from v. 4.5.5
> > > to v. 4.6. I'm using older 8-core Intel Xeon E5430 machines, and when I
> > > submitted a job for 8 cores to one of the nodes I received the following
> > > error:
> > >
> > > NOTE: In thread-MPI thread #3: Affinity setting failed.
> > >       This can cause performance degradation!
> > >
> > > NOTE: In thread-MPI thread #2: Affinity setting failed.
> > >       This can cause performance degradation!
> > >
> > > NOTE: In thread-MPI thread #1: Affinity setting failed.
> > >       This can cause performance degradation!
> > >
> > > I ran mdrun simply with the flags:
> > >
> > > mdrun -v -ntmpi 8 -deffnm em
> > >
> > > Using the top command, I confirmed that no other programs were running
> > > and that mdrun was in fact only using 5 cores. Reducing -ntmpi to 7,
> > > however, resulted in no error (only a warning about not using all of the
> > > logical cores) and mdrun used 7 cores correctly. Since it warned about
> > > thread affinity settings, I tried setting -pin on -pinoffset 0 even
> > > though I was using all the cores on the machine. This resulted in the
> > > same error. However, turning pinning off explicitly with -pin off
> > > (rather than -pin auto) did correctly give me all 8 cores again.
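> > >
> > > (Per-core usage is easiest to see by pressing "1" inside top, which
> > > toggles the per-CPU display.)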
> > >
> > > While I figured out a solution in this particular instance, my question
> > > is whether I should have known from my hardware/settings that pinning
> > > should be turned off (for future reference), or if this is a bug?
> > >
> >
> > I'm not sure - those are 2007-era processors, so there may be some
> > limitations in what they can do (or how well the kernel and system
> > libraries support it), so investing time into working out the real problem
> > is not really worthwhile. Thanks for reporting your work-around, however;
> > others might benefit from it. If you plan on doing lengthy simulations, you
> > might like to verify that you get linear scaling with increasing -ntmpi,
> > and/or compare performance with the MPI version on the same hardware.
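> >
> > A rough way to do that check (file names here are only an illustration):
> >
> >   for n in 1 2 4 8; do mdrun -ntmpi $n -deffnm em_scaling_$n; done
> >   grep Performance em_scaling_*.log    # ns/day summary at the end of each log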
> >
> > Mark
> --
> gmx-users mailing list    gmx-users at gromacs.org
> http://lists.gromacs.org/mailman/listinfo/gmx-users
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> * Please don't post (un)subscribe requests to the list. Use the
> www interface or send it to gmx-users-request at gromacs.org.
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>



-- 
Reid Van Lehn
NSF/MIT Presidential Fellow
Alfredo Alexander-Katz Research Group
Ph.D Candidate - Materials Science


