[gmx-developers] PME tuning-related hang [was Re: Gromacs 2016.3 (and earlier) freezing up.]
Szilárd Páll
pall.szilard at gmail.com
Wed Sep 13 12:21:02 CEST 2017
Forking the discussion now that we've learned more about the issue Åke
is reporting, and it is rather dissimilar.
On Mon, Sep 11, 2017 at 8:09 PM, John Eblen <jeblen at acm.org> wrote:
> Hi Szilárd
>
> No, I'm not using the group scheme.
> $ grep -i 'cutoff-scheme' md.log
> cutoff-scheme = Verlet
> The problem seems similar because:
>
> 1) Deadlocks and very slow runs can be hard to distinguish.
> 2) Since Mark mentioned it, I assume he believes PME tuning is a possible
> cause, which is also the cause in my situation.
Does that mean you tested with "-notunepme" and the excessive memory
usage could not be reproduced? Did the memory usage increase only
during the tuning or did it keep increasing after the tuning
completed?
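For reference, a minimal way to run with tuning disabled (a sketch; the
md.* file names are just placeholders for your own inputs and launcher):

$ gmx mdrun -deffnm md -notunepme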
> 3) Åke may be experiencing higher-than-normal memory usage as far as I know.
> Not sure how you know otherwise.
> 4) By "successful," I assume you mean the tuning had completed. That doesn't
> mean, though, that the tuning could not be creating conditions that cause
> the problem, like an excessively high cutoff.
Sure. However, it's unlikely that the tuning creates conditions under
which the run proceeds past the initial tuning phase and then keeps
allocating memory (which is the more likely source of the issue).
I suggest first ruling out the bug I linked; if that's not the
culprit, we can have a closer look.
Cheers,
--
Szilárd
>
>
> John
>
> On Mon, Sep 11, 2017 at 1:09 PM, Szilárd Páll <pall.szilard at gmail.com>
> wrote:
>>
>> John,
>>
>> In what way do you think your problem is similar? Åke seems to be
>> experiencing a deadlock after successful PME tuning, much later during
>> the run, but no excessive memory usage.
>>
>> Do you happen to be using the group scheme with 2016.x (release code)?
>>
>> Your issue sounds more like it could be related to the excessive
>> tuning bug with the group scheme that was fixed quite a few months ago,
>> but the fix has yet to be released (https://redmine.gromacs.org/issues/2200).
>>
>> Cheers,
>> --
>> Szilárd
>>
>>
>> On Mon, Sep 11, 2017 at 6:50 PM, John Eblen <jeblen at acm.org> wrote:
>> > Hi
>> >
>> > I'm having a similar problem that is related to PME tuning. When it is
>> > enabled, GROMACS often, but not always, slows to a crawl and uses
>> > excessive amounts of memory. Using "huge pages" and setting a high
>> > number of PME processes seems to exacerbate the problem.
>> >
>> > Also, occurrences of this problem seem to correlate with how high the
>> > tuning raises the cutoff value.
>> >
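For context, the number of separate PME ranks John mentions is normally
set on the mdrun command line; a minimal sketch (the rank counts below
are purely illustrative):

$ mpirun -np 56 gmx_mpi mdrun -deffnm md -npme 8

Dropping -npme (or setting it to 0) and adding -notunepme is one way to
check whether either factor matters here.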
>> > Mark, can you give us more information on the problems with PME tuning?
>> > Is there a redmine?
>> >
>> >
>> > Thanks
>> > John
>> >
>> > On Mon, Sep 11, 2017 at 10:53 AM, Mark Abraham
>> > <mark.j.abraham at gmail.com>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> Thanks. Was PME tuning active? Does it reproduce if that is disabled?
>> >> Is the PME tuning still active? How many steps have taken place (at
>> >> least as reported in the log file but ideally from processes)?
>> >>
>> >> Mark
>> >>
>> >> On Mon, Sep 11, 2017 at 4:42 PM Åke Sandgren
>> >> <ake.sandgren at hpc2n.umu.se>
>> >> wrote:
>> >>>
>> >>> My debugger run finally got to the lockup.
>> >>>
>> >>> All processes are waiting on various MPI operations.
>> >>>
>> >>> Attached a stack dump of all 56 tasks.
>> >>>
>> >>> I'll keep the debug session running for a while in case anyone wants
>> >>> some more detailed data.
>> >>> This is a RelWithDebInfo build, though, so not everything is available.
>> >>>
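For reference, a per-rank stack dump like the one Åke attached can be
collected by attaching gdb to the running processes; a minimal sketch,
to be repeated for each rank of interest (the PID lookup is only
illustrative):

$ gdb -batch -p $(pgrep -u $USER -f mdrun | head -n 1) \
      -ex 'thread apply all bt' -ex detach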
>> >>> On 09/08/2017 11:28 AM, Berk Hess wrote:
>> >>> > But you should be able to get some (limited) information by attaching
>> >>> > a debugger to an already running process with a release build.
>> >>> >
>> >>> > If you plan on compiling and running a new case, use a release +
>> >>> > debug symbols build. That should run as fast as a release build.
>> >>> >
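For reference, such a build is typically configured via CMake's standard
RelWithDebInfo build type; a minimal sketch (source paths and other
GROMACS options omitted):

$ cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo
$ make -j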
>> >>> > Cheers,
>> >>> >
>> >>> > Berk
>> >>> >
>> >>> > On 2017-09-08 11:23, Åke Sandgren wrote:
>> >>> >> We have at least one case that, when run over two or more nodes,
>> >>> >> quite often (in practice always) hangs, i.e. there is no more output
>> >>> >> in md.log or elsewhere while mdrun still consumes CPU time. It takes
>> >>> >> a random amount of time before it happens, on the order of 1-3 days.
>> >>> >>
>> >>> >> The case can be shared if someone else wants to investigate. I'm
>> >>> >> planning to run it in the debugger so that I can break and look at
>> >>> >> the state when it happens, but since it takes so long with the
>> >>> >> production build, it is not something I'm looking forward to.
>> >>> >>
>> >>> >> On 09/08/2017 11:13 AM, Berk Hess wrote:
>> >>> >>> Hi,
>> >>> >>>
>> >>> >>> We are far behind schedule for the 2017 release. We are working
>> >>> >>> hard on it, but I don't think we can promise a date yet.
>> >>> >>>
>> >>> >>> We have a 2016.4 release planned for this week (it might slip to
>> >>> >>> next week). But if you can give us enough details to track down
>> >>> >>> your hanging issue, we might be able to fix it in 2016.4.
>> >>> >
>> >>>
>> >>> --
>> >>> Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
>> >>> Internet: ake at hpc2n.umu.se Phone: +46 90 7866134 Fax: +46 90-580 14
>> >>> Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se