[gmx-developers] PME tuning-related hang [was Re: Gromacs 2016.3 (and earlier) freezing up.]

Szilárd Páll pall.szilard at gmail.com
Tue Sep 19 11:50:09 CEST 2017


Hi,

Why would you want to increase the MPI rank count in PME? Is it to
compensate for the thread scaling being worse than in the PP ranks?

It might be more worthwhile to improve PME multi-threading than to
allow a higher rank count.
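
For reference, the PME rank and thread counts can already be set
separately on the command line, roughly along these lines (the counts
here are only illustrative):

  # e.g. 8 separate PME ranks, 4 OpenMP threads per PME rank
  mpirun -np 32 gmx_mpi mdrun -s topol.tpr -npme 8 -ntomp_pme 4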

--
Szilárd


On Tue, Sep 19, 2017 at 10:01 AM, Berk Hess <hess at kth.se> wrote:
> On 2017-09-18 18:34, John Eblen wrote:
>
> Hi Szilárd
>
> These runs used 2M huge pages. I will file a redmine shortly.
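>
> (For reference, the huge page size on the compute nodes can be checked
> with something like
>
>   $ grep Hugepagesize /proc/meminfo
>
> which should report 2048 kB for 2M pages.)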
>
> On a related topic, how difficult would it be to modify GROMACS to
> support more than 50% PME nodes?
>
> That's not so hard, but I see little benefit, since then the MPI
> communication is not reduced much compared to all ranks doing PME.
>
> Berk
>
>
>
> John
>
> On Fri, Sep 15, 2017 at 6:37 PM, Szilárd Páll <pall.szilard at gmail.com>
> wrote:
>>
>> Hi John,
>>
>> Thanks for diagnosing the issue!
>>
>> We have been aware of this behavior; it is partly intentional (we
>> re-scan grids after the first pass at least once more), and it has also
>> simply been considered "not too big of a deal", given that in general
>> mdrun has a very low memory footprint. However, it seems that, at least
>> on this particular machine, that assumption was wrong. What is the page
>> size on Cori KNL?
>>
>> Can you please file a redmine with your observations?
>>
>> Thanks,
>> --
>> Szilárd
>>
>>
>> On Fri, Sep 15, 2017 at 8:25 PM, John Eblen <jeblen at acm.org> wrote:
>> > This issue appears to be not so much a GROMACS problem as a problem
>> > with "huge pages" that is triggered by PME tuning. PME tuning creates
>> > a large data structure for every cutoff that it tries, which is
>> > replicated on each PME node. These data structures are not freed
>> > during tuning, so memory usage grows. Normally the growth is still too
>> > small to cause problems. With huge pages, however, I get errors from
>> > "libhugetlbfs" and very slow runs if more than about five cutoffs are
>> > attempted.
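>> >
>> > One way to see the growth is simply to watch the resident set size of
>> > one mdrun process on a PME node while the grid scan is running, for
>> > example (the PID is a placeholder):
>> >
>> >   $ while sleep 10; do ps -o rss= -p <PID>; done
>> >
>> > If the structures are indeed never freed, the reported RSS should step
>> > up with each new cutoff that the tuning tries.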
>> >
>> > Sample output on NERSC Cori KNL with 32 nodes. Input system size is
>> > 248,101 atoms.
>> >
>> > step 0
>> > step 100, remaining wall clock time:    24 s
>> > step  140: timed with pme grid 128 128 128, coulomb cutoff 1.200: 66.2 M-cycles
>> > step  210: timed with pme grid 112 112 112, coulomb cutoff 1.336: 69.6 M-cycles
>> > step  280: timed with pme grid 100 100 100, coulomb cutoff 1.496: 63.6 M-cycles
>> > step  350: timed with pme grid 84 84 84, coulomb cutoff 1.781: 85.9 M-cycles
>> > step  420: timed with pme grid 96 96 96, coulomb cutoff 1.559: 68.8 M-cycles
>> > step  490: timed with pme grid 100 100 100, coulomb cutoff 1.496: 68.3 M-cycles
>> > libhugetlbfs [nid08887:140420]: WARNING: New heap segment map at 0x10001200000 failed: Cannot allocate memory
>> > libhugetlbfs [nid08881:97968]: WARNING: New heap segment map at 0x10001200000 failed: Cannot allocate memory
>> > libhugetlbfs [nid08881:97978]: WARNING: New heap segment map at 0x10001200000 failed: Cannot allocate memory
>> >
>> > Szilárd, to answer your questions: this is the Verlet scheme. The
>> > problem happens during tuning, and no problems occur if -notunepme is
>> > used. In fact, the best performance thus far has been with 50% PME
>> > nodes, using huge pages, and '-notunepme'.
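>> >
>> > In mdrun terms that corresponds to something like (launcher and rank
>> > counts are only illustrative, not a recommendation):
>> >
>> >   $ srun -n 64 gmx_mpi mdrun -s topol.tpr -npme 32 -notunepme
>> >
>> > i.e. half of the MPI ranks doing only PME, with the tuning disabled.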
>> >
>> >
>> > John
>> >
>> > On Wed, Sep 13, 2017 at 6:20 AM, Szilárd Páll <pall.szilard at gmail.com>
>> > wrote:
>> >>
>> >> Forking the discussion, as we've now learned more about the issue Åke
>> >> is reporting, and it is rather dissimilar.
>> >>
>> >> On Mon, Sep 11, 2017 at 8:09 PM, John Eblen <jeblen at acm.org> wrote:
>> >> > Hi Szilárd
>> >> >
>> >> > No, I'm not using the group scheme.
>> >>
>> >>  $ grep -i 'cutoff-scheme' md.log
>> >>    cutoff-scheme                  = Verlet
>> >>
>> >> > The problem seems similar because:
>> >> >
>> >> > 1) Deadlocks and very slow runs can be hard to distinguish.
>> >> > 2) Since Mark mentioned it, I assume he believes PME tuning is a
>> >> >     possible cause, which is also the cause in my situation.
>> >>
>> >> Does that mean you tested with "-notunepme" and the excessive memory
>> >> usage could not be reproduced? Did the memory usage increase only
>> >> during the tuning or did it keep increasing after the tuning
>> >> completed?
>> >>
>> >> > 3) Åke may be experiencing higher-than-normal memory usage, as far as
>> >> >     I know. Not sure how you know otherwise.
>> >> > 4) By "successful," I assume you mean the tuning had completed. That
>> >> >     doesn't mean, though, that the tuning could not be creating
>> >> >     conditions that cause the problem, like an excessively high cutoff.
>> >>
>> >> Sure. However, it's unlikely that the tuning creates conditions under
>> >> which the run proceeds past the initial tuning phase and then keeps
>> >> allocating memory (which is more prone to be the source of issues).
>> >>
>> >> I suggest first ruling out the bug I linked, and if that's not the
>> >> culprit, we can have a closer look.
>> >>
>> >> Cheers,
>> >> --
>> >> Szilárd
>> >>
>> >> >
>> >> >
>> >> > John
>> >> >
>> >> > On Mon, Sep 11, 2017 at 1:09 PM, Szilárd Páll
>> >> > <pall.szilard at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> John,
>> >> >>
>> >> >> In what way do you think your problem is similar? Åke seems to be
>> >> >> experiencing a deadlock after successful PME tuning, much later during
>> >> >> the run, but no excessive memory usage.
>> >> >>
>> >> >> Do you happen to be using the group scheme with 2016.x (release
>> >> >> code)?
>> >> >>
>> >> >> Your issue sounds more like it could be related to the excessive
>> >> >> tuning bug with the group scheme that was fixed quite a few months ago
>> >> >> but has yet to be released (https://redmine.gromacs.org/issues/2200).
>> >> >>
>> >> >> Cheers,
>> >> >> --
>> >> >> Szilárd
>> >> >>
>> >> >>
>> >> >> On Mon, Sep 11, 2017 at 6:50 PM, John Eblen <jeblen at acm.org> wrote:
>> >> >> > Hi
>> >> >> >
>> >> >> > I'm having a similar problem that is related to PME tuning. When it
>> >> >> > is enabled, GROMACS often, but not always, slows to a crawl and uses
>> >> >> > excessive amounts of memory. Using "huge pages" and setting a high
>> >> >> > number of PME processes seems to exacerbate the problem.
>> >> >> >
>> >> >> > Also, occurrences of this problem seem to correlate with how high the
>> >> >> > tuning raises the cutoff value.
>> >> >> >
>> >> >> > Mark, can you give us more information on the problems with PME
>> >> >> > tuning? Is there a redmine?
>> >> >> >
>> >> >> >
>> >> >> > Thanks
>> >> >> > John
>> >> >> >
>> >> >> > On Mon, Sep 11, 2017 at 10:53 AM, Mark Abraham
>> >> >> > <mark.j.abraham at gmail.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi,
>> >> >> >>
>> >> >> >> Thanks. Was PME tuning active? Does it reproduce if that is disabled?
>> >> >> >> Is the PME tuning still active? How many steps have taken place (at
>> >> >> >> least as reported in the log file, but ideally from the processes)?
>> >> >> >>
>> >> >> >> Mark
>> >> >> >>
>> >> >> >> On Mon, Sep 11, 2017 at 4:42 PM Åke Sandgren
>> >> >> >> <ake.sandgren at hpc2n.umu.se>
>> >> >> >> wrote:
>> >> >> >>>
>> >> >> >>> My debugger run finally got to the lockup.
>> >> >> >>>
>> >> >> >>> All processes are waiting on various MPI operations.
>> >> >> >>>
>> >> >> >>> Attached a stack dump of all 56 tasks.
>> >> >> >>>
>> >> >> >>> I'll keep the debug session running for a while in case anyone
>> >> >> >>> wants
>> >> >> >>> some more detailed data.
>> >> >> >>> This is a RelWithDebInfo build, though, so not everything is
>> >> >> >>> available.
>> >> >> >>>
>> >> >> >>> On 09/08/2017 11:28 AM, Berk Hess wrote:
>> >> >> >>> > But you should be able to get some (limited) information by attaching
>> >> >> >>> > a debugger to an already running process with a release build.
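>> >> >> >>> >
>> >> >> >>> > For example, something along these lines on one of the compute
>> >> >> >>> > nodes (the PID is a placeholder for the mdrun process):
>> >> >> >>> >
>> >> >> >>> >   $ gdb -p <PID>
>> >> >> >>> >   (gdb) thread apply all bt
>> >> >> >>> >
>> >> >> >>> > will at least show where each thread of that rank is waiting.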
>> >> >> >>> >
>> >> >> >>> > If you plan on compiling and running a new case, use a release +
>> >> >> >>> > debug symbols build. That should run as fast as a release build.
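>> >> >> >>> >
>> >> >> >>> > With the CMake build that would be something like (only the build
>> >> >> >>> > type matters here; the rest of your configuration stays the same):
>> >> >> >>> >
>> >> >> >>> >   $ cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo
>> >> >> >>> >   $ make -j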
>> >> >> >>> >
>> >> >> >>> > Cheers,
>> >> >> >>> >
>> >> >> >>> > Berk
>> >> >> >>> >
>> >> >> >>> > On 2017-09-08 11:23, Åke Sandgren wrote:
>> >> >> >>> >> We have at least one case that, when run over 2 or more nodes,
>> >> >> >>> >> quite often (always) hangs, i.e. there is no more output in md.log
>> >> >> >>> >> or otherwise, while mdrun still consumes CPU time. It takes a
>> >> >> >>> >> random time before it happens, like 1-3 days.
>> >> >> >>> >>
>> >> >> >>> >> The case can be shared if someone else wants to investigate. I'm
>> >> >> >>> >> planning to run it in the debugger to be able to break and look at
>> >> >> >>> >> the state when it happens, but since it takes so long with the
>> >> >> >>> >> production build it is not something I'm looking forward to.
>> >> >> >>> >>
>> >> >> >>> >> On 09/08/2017 11:13 AM, Berk Hess wrote:
>> >> >> >>> >>> Hi,
>> >> >> >>> >>>
>> >> >> >>> >>> We are far behind schedule for the 2017 release. We are working
>> >> >> >>> >>> hard on it, but I don't think we can promise a date yet.
>> >> >> >>> >>>
>> >> >> >>> >>> We have a 2016.4 release planned for this week (it might slip to
>> >> >> >>> >>> next week). But if you can give us enough details to track down
>> >> >> >>> >>> your hanging issue, we might be able to fix it in 2016.4.
>> >> >> >>> >
>> >> >> >>>
>> >> >> >>> --
>> >> >> >>> Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
>> >> >> >>> Internet: ake at hpc2n.umu.se   Phone: +46 90 7866134   Fax: +46 90-580 14
>> >> >> >>> Mobile: +46 70 7716134   WWW: http://www.hpc2n.umu.se