[gmx-developers] Gromacs 2016.3 (and earlier) freezing up.

John Eblen jeblen at acm.org
Mon Sep 11 20:09:26 CEST 2017


Hi Szilárd

No, I'm not using the group scheme.

The problem seems similar because:

1) Deadlocks and very slow runs can be hard to distinguish.
2) Since Mark mentioned it, I assume he believes PME tuning is a possible
    cause, which is also the cause in my situation.
3) For all I know, Åke may be experiencing higher-than-normal memory usage.
    I'm not sure how you know otherwise.
4) By "successful," I assume you mean the tuning had completed. That doesn't
    mean, though, that the tuning could not be creating conditions that cause
    the problem, like an excessively high cutoff. (One way to check is
    sketched below.)
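
For what it's worth, this is roughly the comparison I run to see whether the
tuning itself is responsible: the same TPR with and without PME tuning
(mdrun's -[no]tunepme flag). A minimal Python sketch; the launcher, rank
count, binary name, and file names are my assumptions, not Åke's setup:

    import subprocess

    # Hypothetical launcher, rank count, binary and file names; adjust to
    # the actual system. The only GROMACS-specific part is -[no]tunepme.
    TPR = "topol.tpr"

    for flag, tag in (("-tunepme", "tuned"), ("-notunepme", "untuned")):
        # Run the same input twice, once with PME tuning enabled (the
        # default) and once with it disabled, so any slowdown or memory
        # growth can be attributed to the tuning itself.
        subprocess.run(
            ["mpirun", "-np", "56", "gmx_mpi", "mdrun",
             "-s", TPR, "-deffnm", tag, flag],
            check=True,
        )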


John

On Mon, Sep 11, 2017 at 1:09 PM, Szilárd Páll <pall.szilard at gmail.com>
wrote:

> John,
>
> In what way do you think your problem is similar? Åke seems to be
> experiencing a deadlock after successful PME tuning, much later during
> the run, but no excessive memory usage.
>
> Do you happen to be using the group scheme with 2016.x (release code)?
>
> Your issue sounds more like it could be related to the excessive tuning
> bug with the group scheme that was fixed quite a few months ago but is
> yet to be released (https://redmine.gromacs.org/issues/2200).
>
> Cheers,
> --
> Szilárd
>
>
> On Mon, Sep 11, 2017 at 6:50 PM, John Eblen <jeblen at acm.org> wrote:
> > Hi
> >
> > I'm having a similar problem that is related to PME tuning. When it is
> > enabled, GROMACS often, but not always, slows to a crawl and uses
> > excessive amounts of memory. Using "huge pages" and setting a high
> > number of PME processes seems to exacerbate the problem.
> >
> > Also, occurrences of this problem seem to correlate with how high the
> > tuning raises the cutoff value.
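
[To make that correlation concrete, this is roughly how I pull the tuned
cutoffs out of md.log. A minimal Python sketch; the exact wording of the
PME-tuning lines in the log is an assumption and may differ between
GROMACS versions.]

    import re
    import sys

    # Scan md.log for PME-tuning lines assumed to look like
    # "... pme grid 48 48 48, coulomb cutoff 1.200 ..." and report the
    # largest cutoff the tuning reached. Adjust the pattern if the log
    # wording differs in your GROMACS version.
    pattern = re.compile(r"pme grid \d+ \d+ \d+, coulomb cutoff ([0-9.]+)")

    cutoffs = []
    with open(sys.argv[1]) as log:
        for line in log:
            m = pattern.search(line)
            if m:
                cutoffs.append(float(m.group(1)))

    if cutoffs:
        print("highest cutoff tried by tuning: %.3f nm" % max(cutoffs))
    else:
        print("no PME-tuning lines found in", sys.argv[1])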
> >
> > Mark, can you give us more information on the problems with PME tuning?
> > Is there a redmine?
> >
> >
> > Thanks
> > John
> >
> > On Mon, Sep 11, 2017 at 10:53 AM, Mark Abraham <mark.j.abraham at gmail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> Thanks. Was PME tuning active? Does it reproduce if that is disabled? Is
> >> the PME tuning still active? How many steps have taken place (at least as
> >> reported in the log file, but ideally from the processes)?
> >>
> >> Mark
> >>
> >> On Mon, Sep 11, 2017 at 4:42 PM Åke Sandgren <ake.sandgren at hpc2n.umu.se>
> >> wrote:
> >>>
> >>> My debugger run finally got to the lockup.
> >>>
> >>> All processes are waiting on various MPI operations.
> >>>
> >>> Attached is a stack dump of all 56 tasks.
> >>>
> >>> I'll keep the debug session running for a while in case anyone wants
> >>> some more detailed data. This is a RelWithDebInfo build, though, so not
> >>> everything is available.
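
[For anyone trying to reproduce this: a per-node stack dump like the attached
one can be collected by attaching gdb in batch mode to each rank. A minimal
Python sketch; the process name "gmx_mpi" is an assumption, and it only
covers the ranks running on the local node.]

    import subprocess

    # Find the mdrun ranks running on this node (process name is an
    # assumption) and dump a backtrace of every thread in each of them.
    pids = subprocess.run(["pgrep", "-f", "gmx_mpi"],
                          capture_output=True, text=True).stdout.split()

    for pid in pids:
        print("=== PID %s ===" % pid)
        subprocess.run(["gdb", "-p", pid, "-batch",
                        "-ex", "thread apply all bt"])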
> >>>
> >>> On 09/08/2017 11:28 AM, Berk Hess wrote:
> >>> > But you should be able to get some (limited) information by attaching
> >>> > a debugger to an already running process with a release build.
> >>> >
> >>> > If you plan on compiling and running a new case, use a release + debug
> >>> > symbols build. That should run as fast as a release build.
> >>> >
> >>> > Cheers,
> >>> >
> >>> > Berk
> >>> >
> >>> > On 2017-09-08 11:23, Åke Sandgren wrote:
> >>> >> We have at least one case that, when run over 2 or more nodes, quite
> >>> >> often (always) hangs, i.e. there is no more output in md.log or
> >>> >> otherwise while mdrun still consumes CPU time. It takes a random
> >>> >> amount of time before it happens, like 1-3 days.
> >>> >>
> >>> >> The case can be shared if someone else wants to investigate. I'm
> >>> >> planning to run it in the debugger to be able to break and look at
> >>> >> the state when it happens, but since it takes so long with the
> >>> >> production build, it is not something I'm looking forward to.
> >>> >>
> >>> >> On 09/08/2017 11:13 AM, Berk Hess wrote:
> >>> >>> Hi,
> >>> >>>
> >>> >>> We are far behind schedule for the 2017 release. We are working hard
> >>> >>> on it, but I don't think we can promise a date yet.
> >>> >>>
> >>> >>> We have a 2016.4 release planned for this week (might slip to next
> >>> >>> week). But if you can give us enough details to track down your
> >>> >>> hanging issue, we might be able to fix it in 2016.4.
> >>> >
> >>>
> >>> --
> >>> Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
> >>> Internet: ake at hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
> >>> Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se