[gmx-developers] next Gromacs release

David van der Spoel spoel at xray.bmc.uu.se
Mon Jun 14 15:34:31 CEST 2010


On 2010-06-14 15.13, Carsten Kutzner wrote:
> On Jun 12, 2010, at 12:14 PM, Carsten Kutzner wrote:
>
>> Hi,
>>
>> I have noticed that with some MPI implementations (Intel MPI, IBM's poe,
>> and most likely also MPICH2) the g_tune_pme tool sometimes gets
>> stuck after having successfully completed part of the test runs.
>>
>> This happens in cases where mdrun (in init_domain_decomposition) cannot
>> find a suitable decomposition and shortly afterwards calls MPI_Abort via
>> gmx_fatal. Some (buggy!?) MPI implementations cannot guarantee that all MPI
>> processes are cleanly cancelled after a call to MPI_Abort, and in those cases
>> control is never returned to g_tune_pme, which then thinks mdrun is still running.
>>
>> My question is: do we really have to call gmx_fatal when no suitable dd can
>> be found? At that point, all MPI processes are still alive and we could finish mdrun
>> cleanly with an MPI_Finalize (just as in successful runs), thus avoiding the hangs in
>> the tuning utility. I think that normal mdrun runs, when unable to find a dd grid,
>> would with those MPI implementations also end up as zombies until they are
>> eventually killed by the queueing system.
> Any comments?
>
> Should I check in a patch?
>
> Carsten
>

This is of course a special case, but one that can be expected to happen. 
It would be much nicer to fix gmx_fatal itself, but how?
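
For illustration, here is a minimal sketch of the clean-exit path Carsten
proposes, contrasted with the MPI_Abort path that causes the hangs. This is
hypothetical code, not the actual GROMACS source: find_dd_grid() stands in
for the real check in init_domain_decomposition(), and it assumes the dd
setup fails deterministically on every rank, so a collective exit is possible.

    /* dd_exit_sketch.c -- hypothetical example, not GROMACS code */
    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    /* Stand-in for the decomposition check; returns 0 if no grid fits. */
    static int find_dd_grid(int nnodes)
    {
        return (nnodes % 2 == 0); /* dummy criterion for the sketch */
    }

    int main(int argc, char *argv[])
    {
        int rank, nnodes;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nnodes);

        if (!find_dd_grid(nnodes))
        {
            /* Every rank reaches this point, because the check depends
             * only on data all ranks share. A collective shutdown is
             * therefore possible: */
            if (rank == 0)
            {
                fprintf(stderr, "Fatal error: no suitable domain decomposition\n");
            }
            MPI_Finalize();      /* clean, collective exit ...            */
            return EXIT_FAILURE; /* ... and g_tune_pme sees mdrun finish  */

            /* The current gmx_fatal path is effectively
             *     MPI_Abort(MPI_COMM_WORLD, 1);
             * which some MPI implementations do not propagate to all
             * processes, leaving zombies and hanging the caller. */
        }

        /* ... normal run ... */
        MPI_Finalize();
        return 0;
    }

Note that this only works because the dd failure is detected on all ranks at
the same point. In the general case gmx_fatal may be called from a single
rank, where MPI_Finalize (a collective call) would deadlock, which is
presumably why MPI_Abort is used there in the first place.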

-- 
David van der Spoel, Ph.D., Professor of Biology
Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone: +46184714205.
spoel at xray.bmc.uu.se    http://folding.bmc.uu.se


