[gmx-users] Gromacs 5.1.2 and OMP_NUM_THREADS
Susan Chacko
susanc at helix.nih.gov
Tue Jul 19 17:00:22 CEST 2016
We tried a rebuild of Gromacs 5.1.2 with OpenMPI 1.10.3, and also tried runs with -pin on, off, auto and --bind-to-none.
It seems that the results are non-deterministic: i.e. they sometimes succeed and sometimes fail.
The same was observed on both IB FDR and IB QDR fabrics.
Gromacs 5.1.2 built with OpenMPI 1.10.0
-------------------------------------------------------
- Running without any setting for '-pin' makes mdrun_mpi jobs fail randomly.
- No difference using explicit settings for -pin ('auto', 'on', 'off'); jobs hang randomly.
- No difference using '--bind-to-none' + any setting of '-pin'; jobs hang randomly.
- Tested using the exact same input files (topol.top, em.tpr, etc.).
- Also tested starting from a pdb as input for genbox, solvate, ions; the job then halts randomly during the mdrun_mpi minimization.
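
The combinations listed above can be enumerated in a short script. What follows is only a sketch under assumptions: it reuses the mpirun command quoted further down in this thread, writes the binding option in the "--bind-to none" form suggested by Szilárd, and the names NP, DEFFNM, DRY_RUN and build_cmd are illustrative, not part of Gromacs or OpenMPI. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: print (or run) one mdrun_mpi command per affinity setting.
# NP, DEFFNM and DRY_RUN are illustrative variables, not from the thread.
NP=${NP:-128}
DEFFNM=${DEFFNM:-em}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands

# build_cmd PIN BIND: echo the command line for one combination.
# BIND is "default" (leave OpenMPI's own binding alone) or "none".
build_cmd() {
    pin=$1
    bind=$2
    cmd="mpirun -np $NP"
    if [ "$bind" = "none" ]; then
        cmd="$cmd --bind-to none"
    fi
    echo "$cmd mdrun_mpi -nb cpu -v -deffnm $DEFFNM -pin $pin"
}

export OMP_NUM_THREADS=2   # the thread count used in the failing runs

for bind in default none; do
    for pin in auto on off; do
        cmd=$(build_cmd "$pin" "$bind")
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"
        else
            # One log per combination so failures can be compared later.
            $cmd > "run_${bind}_${pin}.log" 2>&1 || echo "FAILED: $cmd"
        fi
    done
done
```

Setting DRY_RUN=0 would run each command in turn and write one log file per combination.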
Gromacs 5.1.2 built with OpenMPI 1.10.3
--------------------------------------------------------
- As with OpenMPI 1.10.0, jobs halt randomly using any '-pin' setting.
- When multiple jobs are run with the exact same input files and parameters, some fail and some do not.
- Some of the failed jobs and some working jobs ran on the same nodes, so it is not likely to be a hardware problem.
- Commonly encountered the error "ORTE has lost communication with its daemon located on node: hostname: node#". The node varied between runs.
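
To check whether the lost-daemon error clusters on particular nodes, the hostnames can be pulled out of the job logs and counted. A rough sketch, assuming each job's output was saved to a log file and that the node name follows the text "hostname:" either on the same line as the ORTE message or on a later line; the function names lost_nodes and tally_nodes are illustrative.

```shell
#!/bin/sh
# lost_nodes LOG...: print the node named in each "ORTE has lost
# communication" message found in the given log files.
# Assumption: the node name follows "hostname:" on the same or a later line.
lost_nodes() {
    awk '
        FNR == 1 { seen = 0 }                      # reset for every file
        /ORTE has lost communication/ { seen = 1 }
        seen && /hostname:/ {
            sub(/.*hostname:[ \t]*/, "")           # drop text before the name
            sub(/[ \t\r]*$/, "")                   # drop trailing whitespace
            print
            seen = 0
        }
    ' "$@"
}

# tally_nodes LOG...: count how often each node was reported.
tally_nodes() {
    lost_nodes "$@" | sort | uniq -c | sort -rn
}

# Only run when log files are given on the command line.
if [ "$#" -gt 0 ]; then
    tally_nodes "$@"
fi
```

Saved under a name of your choice and called with the log files as arguments, it prints one count per node, most frequent first.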
Any further suggestions? We are not sure where to go from here. Should the user return to using Gromacs 5.0.4?
Susan.
> On Jul 5, 2016, at 2:34 PM, Szilárd Páll <pall.szilard at gmail.com> wrote:
>
> Susan,
>
> Have you tried mpirun --bind-to none? For the last few releases
> OpenMPI messes with the CPUSET/affinities by default which may be
> interacting badly with the Intel OpenMP library.
>
> What about running with -pin on (or -pin off)?
>
> Cheers,
> --
> Szilárd
>
>
> On Tue, Jul 5, 2016 at 4:13 PM, Mark Abraham <mark.j.abraham at gmail.com> wrote:
>> Hi,
>>
>> OpenMPI 1.10.0 has six months' worth of bugs now fixed in 1.10.3, some of
>> which seem plausible to explain this behaviour. There's been no GROMACS
>> issue that seems similar. Please try another OpenMPI and let us know how
>> you go!
>>
>> Mark
>>
>> On Tue, 5 Jul 2016 15:55 Susan Chacko <susanc at helix.nih.gov> wrote:
>>
>>>
>>> Hi all,
>>>
>>> One of our users is having problems with Gromacs 5.1.2 hanging at the
>>> start of an mdrun using OMP_NUM_THREADS=2. When run with OMP_NUM_THREADS=1,
>>> the job runs fine.
>>>
>>> The stalling command is:
>>> mpirun -np 128 mdrun_mpi -nb cpu -v -deffnm em
>>>
>>> The same command and job work fine in Gromacs 5.0.4 with OMP_NUM_THREADS=2
>>>
>>> Gromacs 5.0.4 and 5.1.2 were built on our system with Intel compiler
>>> 2015.1.133:
>>>
>>> cmake ../gromacs-5.1.2 \
>>> -DGMX_BUILD_OWN_FFTW=ON \
>>> -DREGRESSIONTEST_DOWNLOAD=ON \
>>> -DGMX_MPI=on \
>>> -DGMX_BUILD_MDRUN_ONLY=on \
>>> -DBUILD_SHARED_LIBS=off
>>>
>>> One difference I can see is that Gromacs 5.0.4 was built with OpenMPI
>>> 1.8.4, and Gromacs 5.1.2 was built with OpenMPI 1.10.0. Is that likely to
>>> be the cause of the problem? If so, I could rebuild Gromacs 5.1.2 with
>>> OpenMPI 1.8.4.
>>>
>>> Any ideas what might be causing the stall? Any other flags we should use
>>> to compile?
>>>
>>> All suggestions appreciated,
>>> Susan.
>>>
>>>
>>> Susan Chacko, PhD
>>> HPC @ NIH staff
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Gromacs Users mailing list
>>>
>>> * Please search the archive at
>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
>>> posting!
>>>
>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>>
>>> * For (un)subscribe requests visit
>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>>> send a mail to gmx-users-request at gromacs.org.
>>>