[gmx-users] Thread affinity setting failed

Szilárd Páll szilard.pall at cbr.su.se
Fri Mar 8 14:05:24 CET 2013


Hi Reid,

Just saw your bug report and realized that you have an ancient kernel which
could be causing the issue. Let's move the discussion to the bug page (
http://redmine.gromacs.org/issues/1184), hopefully we can narrow the issue
down and then post the conclusions to the list later.

Cheers,

--
Szilárd


On Thu, Mar 7, 2013 at 7:06 AM, Roland Schulz <roland at utk.edu> wrote:

> Hi Reid,
>
> I just tested Gromacs 4.6.1 compiled with ICC 13 and GCC 4.1.2 on CentOS
> 5.6 and I don't have any problems with pinning. So it might be useful to
> open a bug and provide more details, because it should work for CentOS 5.x.
>
> Yes, for pure water the group kernels are faster than Verlet.
>
> Roland
>
>
> On Wed, Mar 6, 2013 at 10:17 PM, Reid Van Lehn <rvanlehn at gmail.com> wrote:
>
> > Hi Szilárd,
> >
> > Thank you very much for the detailed write-up. To answer your question,
> > yes, I am using an old Linux distro, specifically CentOS 5.4, though
> > upgrading to 5.9 did not fix the problem. I have a few other machines
> > with different hardware running CentOS 6.3 which do not have this issue,
> > so it is likely an operating-system issue, as your description suggests.
> > As I'm (unfortunately...) also the sysadmin on this cluster, I'm unlikely
> > to find the time to upgrade all the nodes, so I'll probably stick with
> > the "-pin off" workaround for now. Hopefully this thread will help out
> > other users!
> >
> > As an aside, I found that the OpenMP + Verlet combination was slower for
> > this particular system, but I suspect that it's because it's almost
> > entirely water and hence probably benefits from the Group scheme
> > optimizations for water described on the Gromacs website.
> >
> > Thanks again for the explanation,
> > Reid
> >
> > On Mon, Mar 4, 2013 at 3:45 PM, Szilárd Páll <szilard.pall at cbr.su.se>
> > wrote:
> >
> > > Hi,
> > >
> > > Some clarifications are needed, and as this might help you and others
> > > understand what's going on, I'll take the time to explain things.
> > >
> > > Affinity setting is a low-level, operating-system operation that
> > > "locks" (= "pins") threads to physical cores of the CPU, preventing the
> > > OS from migrating them between cores, which would otherwise cause a
> > > performance drop - especially when using OpenMP multithreading on
> > > multi-socket and NUMA machines.
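> > >
> > > Purely as an illustration of the OS mechanism (mdrun sets its
> > > affinities itself through the system libraries, not via this tool),
> > > the standard Linux taskset utility does the same kind of pinning:
> > >
> > >     taskset -c 0 ./my_program    # run a program (placeholder name) pinned to core 0
> > >     taskset -cp 12345            # show which cores a given PID (placeholder) may run on
> > >
> > > It is the equivalent affinity request made by mdrun that is failing on
> > > your nodes.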
> > >
> > > Now, mdrun will by default *try* to set affinities if you use all cores
> > > detected (i.e. if mdrun can be sure that it is the only application
> > > running on the machine), but will by default *not* set thread
> > > affinities if the number of threads/processes per compute node is less
> > > than the number of cores detected. Hence, when you decrease -ntmpi to
> > > 7, you implicitly end up turning off thread pinning; that's why the
> > > warnings don't show up.
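> > >
> > > In other words, on your 8-core nodes, with the command line from your
> > > first mail:
> > >
> > >     mdrun -ntmpi 8 -deffnm em    # 8 threads on 8 cores: mdrun tries to pin (default -pin auto)
> > >     mdrun -ntmpi 7 -deffnm em    # fewer threads than cores: pinning stays off by default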
> > >
> > > The fact that affinity setting fails on your machine suggests that
> > > either the system libraries don't support this or the mdrun code is not
> > > fully compatible with your OS; the type of CPU, AFAIK, doesn't matter
> > > at all. What OS are you using? Is it an old installation?
> > >
> > > If you are not using OpenMP - which, btw, you probably should with the
> > > Verlet scheme if you are running on a single node or at high
> > > parallelization - the performance will not be affected very much by the
> > > lack of thread pinning. While the warnings themselves can often be
> > > safely ignored, if only some of the threads/processes can't set
> > > affinities, this might indicate a problem. In your case, if you were
> > > really seeing only 5 cores being used with 3 warnings, this might
> > > suggest that while the affinity setting failed, three threads are
> > > running on already "busy" cores, overlapping with other threads, which
> > > will cause a severe performance drop.
> > >
> > > What you can do to avoid the performance drop is to turn off pinning by
> > > passing "-pin off" to mdrun. Without OpenMP this will typically not
> > > cause a large performance loss compared to having correct pinning, and
> > > it will avoid the bad case of overlapping threads/processes.
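> > >
> > > For example, with the command line from your first mail, the workaround
> > > would be:
> > >
> > >     mdrun -v -ntmpi 8 -pin off -deffnm em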
> > >
> > > I suspect that your machines might be running an old OS which could be
> > > causing the failed affinity setting. If that is the case, you should
> > > talk to your sysadmins and have them figure out the issue. If you have
> > > a moderately new OS, you should not be seeing such issues, so I suggest
> > > that you file a bug report with details like: OS + version + kernel
> > > version, pthread library version, standard C library version.
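> > >
> > > On a CentOS-like system, one convenient way to collect those details
> > > (the exact commands below are only a suggestion, nothing GROMACS-specific)
> > > is:
> > >
> > >     cat /etc/redhat-release            # distribution version
> > >     uname -r                           # kernel version
> > >     ldd --version | head -n 1          # standard C library (glibc) version
> > >     getconf GNU_LIBPTHREAD_VERSION     # pthread library (NPTL) version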
> > >
> > > Cheers,
> > >
> > > --
> > > Szilárd
> > >
> > >
> > > On Mon, Mar 4, 2013 at 1:45 PM, Mark Abraham <mark.j.abraham at gmail.com> wrote:
> > >
> > > > On Mon, Mar 4, 2013 at 6:02 AM, Reid Van Lehn <rvanlehn at mit.edu> wrote:
> > > >
> > > > > Hello users,
> > > > >
> > > > > I ran into a bug I do not understand today upon upgrading from
> > > > > v. 4.5.5 to v. 4.6. I'm using older 8-core Intel Xeon E5430
> > > > > machines, and when I submitted a job for 8 cores to one of the
> > > > > nodes I received the following error:
> > > > >
> > > > > NOTE: In thread-MPI thread #3: Affinity setting failed.
> > > > >       This can cause performance degradation!
> > > > >
> > > > > NOTE: In thread-MPI thread #2: Affinity setting failed.
> > > > >       This can cause performance degradation!
> > > > >
> > > > > NOTE: In thread-MPI thread #1: Affinity setting failed.
> > > > >       This can cause performance degradation!
> > > > >
> > > > > I ran mdrun simply with the flags:
> > > > >
> > > > > mdrun -v -ntmpi 8 -deffnm em
> > > > >
> > > > > Using the top command, I confirmed that no other programs were
> > > > > running and that mdrun was in fact only using 5 cores. Reducing
> > > > > -ntmpi to 7, however, resulted in no error (only a warning about
> > > > > not using all of the logical cores) and mdrun used 7 cores
> > > > > correctly. Since it warned about thread affinity settings, I tried
> > > > > setting -pin on -pinoffset 0 even though I was using all the cores
> > > > > on the machine. This resulted in the same error. However, turning
> > > > > pinning off explicitly with -pin off (rather than -pin auto) did
> > > > > correctly give me all 8 cores again.
> > > > >
> > > > > While I figured out a solution in this particular instance, my
> > > > > question is whether I should have known from my hardware/settings
> > > > > that pinning should be turned off (for future reference), or
> > > > > whether this is a bug?
> > > > >
> > > >
> > > > I'm not sure - those are 2007-era processors, so there may be some
> > > > limitations in what they could do (or how well the kernel and system
> > > > libraries support it). So investing time into working out the real
> > > > problem is not really worthwhile. Thanks for reporting your
> > > > work-around, however; others might benefit from it. If you plan on
> > > > doing lengthy simulations, you might like to verify that you get
> > > > linear scaling with increasing -ntmpi, and/or compare performance
> > > > with the MPI version on the same hardware.
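> > > >
> > > > A rough sketch of such a scaling check (the log file names are only
> > > > placeholders; compare the ns/day reported at the end of each log):
> > > >
> > > >     for n in 1 2 4 8; do
> > > >         mdrun -ntmpi $n -deffnm em -g scaling_$n.log
> > > >     done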
> > > >
> > > > Mark
> >
> >
> >
> > --
> > Reid Van Lehn
> > NSF/MIT Presidential Fellow
> > Alfredo Alexander-Katz Research Group
> > Ph.D Candidate - Materials Science
> >
> >
> >
> >
> >
>
>
> --
> ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
> 865-241-1537, ORNL PO BOX 2008 MS6309