[gmx-developers] Gromacs 2016.3 (and earlier) freezing up.
Åke Sandgren
ake.sandgren at hpc2n.umu.se
Mon Sep 25 17:51:55 CEST 2017
Update:
Building with IntelMPI instead of OpenMPI seems to have eliminated the
problem.
At least my second test run has run for >6 days without problems.
We used to see this within 3.5-4 days, often earlier.
I'm not 100% sure it solves the problem though, since the first test run
I did with this build did hang, or at least took long enough between
writing data that my little monitor script triggered.
I'll run the test a few more times, but results from that will take some
time :-)
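
A monitor of the kind mentioned above could look roughly like the sketch
below. It is not the script actually used here; it only assumes that a hang
shows up as an output file (for example the checkpoint file) no longer being
rewritten. The function names, the polling interval and the idle threshold
are all hypothetical and would need tuning to the checkpoint interval of the
run.

```python
import os
import time


def seconds_since_update(path, now=None):
    """Return the number of seconds since `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def is_stalled(path, max_idle_seconds, now=None):
    """Return True if `path` has not been rewritten within `max_idle_seconds`."""
    return seconds_since_update(path, now) > max_idle_seconds


def watch(path, max_idle_seconds, poll_seconds=60):
    """Poll `path` until it goes stale, then return its idle time in seconds.

    Assumption: the caller decides what to do on a stall (send a mail,
    attach a debugger, cancel the job); this function only detects it.
    """
    while not is_stalled(path, max_idle_seconds):
        time.sleep(poll_seconds)
    return seconds_since_update(path)

# Example call (hypothetical file name and threshold):
#     idle = watch("state.cpt", max_idle_seconds=3600)
```

The threshold should be comfortably larger than the interval between
checkpoints, otherwise a slow but healthy run triggers the monitor, which may
be what happened in the first test run.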
On 09/11/2017 05:24 PM, Åke Sandgren wrote:
> Command run was
> srun gmx_mpi mdrun -ntomp 1 -dlb yes -s ion_channel.tpr
>
> Not sure if PME tuning is on or off (not a user, just a sysadmin).
>
> Starting point is step 0, last checkpoint written is step 29776020.
> nstlog is 0 so no other output than checkpoint status.
> (This is ion_channel from Erik L. that we used as part of the benchmark
> for our system so he might know more of the details)
>
> On 09/11/2017 04:53 PM, Mark Abraham wrote:
>> Hi,
>>
>> Thanks. Was PME tuning active? Does it reproduce if that is disabled? Is
>> the PME tuning still active? How many steps have taken place (at least
>> as reported in the log file but ideally from processes)?
>>
>> Mark
>>
>> On Mon, Sep 11, 2017 at 4:42 PM Åke Sandgren <ake.sandgren at hpc2n.umu.se> wrote:
>>
>> My debugger run finally got to the lockup.
>>
>> All processes are waiting on various MPI operations.
>>
>> Attached a stack dump of all 56 tasks.
>>
>> I'll keep the debug session running for a while in case anyone wants
>> some more detailed data.
>> This is a RelWithDebInfo build though, so not everything is available.
>>
>> On 09/08/2017 11:28 AM, Berk Hess wrote:
>> > But you should be able to get some (limited) information by attaching a
>> > debugger to an already running process with a release build.
>> >
>> > If you plan on compiling and running a new case, use a release + debug
>> > symbols build. That should run as fast as a release build.
>> >
>> > Cheers,
>> >
>> > Berk
>> >
>> > On 2017-09-08 11:23, Åke Sandgren wrote:
>> >> We have, at least, one case that when run over 2 nodes, or more, quite
>> >> often (always) hangs, i.e. no more output in md.log or otherwise while
>> >> mdrun still consumes cpu time. It takes a random time before it happens,
>> >> like 1-3 days.
>> >>
>> >> The case can be shared if someone else wants to investigate. I'm
>> >> planning to run it in the debugger to be able to break and look at
>> >> states when it happens, but since it takes so long with the production
>> >> build it is not something I'm looking forward to.
>> >>
>> >> On 09/08/2017 11:13 AM, Berk Hess wrote:
>> >>> Hi,
>> >>>
>> >>> We are far behind schedule for the 2017 release. We are working hard on
>> >>> it, but I don't think we can promise a date yet.
>> >>>
>> >>> We have a 2016.4 release planned for this week (might slip to next
>> >>> week). But if you can give us enough details to track down your hanging
>> >>> issue, we might be able to fix it in 2016.4.
>> >
>>
>
--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: ake at hpc2n.umu.se Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se