[gmx-users] Gromacs 5.1.2 and OMP_NUM_THREADS

Mark Abraham mark.j.abraham at gmail.com
Tue Jul 19 19:29:03 CEST 2016


Hi,

If you build GROMACS 5.0.4 with the OpenMPI 1.10.x stacks and observe the
same problems, that will confirm our expectation, based on widespread and
effective use of MPI-enabled GROMACS, that the problem lies either in
OpenMPI 1.10.x or in how it is configured. Or build GROMACS 5.1.2 with
OpenMPI 1.8.4 (as you suggested earlier).
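
A quick way to double-check which OpenMPI installation a given mdrun_mpi
binary actually picked up, assuming it is dynamically linked against MPI, is
something like:

ldd $(which mdrun_mpi) | grep -i libmpi
mpirun --version

The path of the resolved libmpi should point into the intended OpenMPI tree,
and mpirun --version (run from the same environment the job uses) confirms
which runtime actually launches it.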

Mark

On Tue, Jul 19, 2016 at 5:00 PM Susan Chacko <susanc at helix.nih.gov> wrote:

>
> We tried a rebuild of Gromacs 5.1.2 with OpenMPI 1.10.3, and also tried
> runs with -pin on, off, and auto, and with --bind-to none.
> The results appear to be non-deterministic: they sometimes succeed and
> sometimes fail.
> The same behaviour was observed on both IB FDR and IB QDR fabrics.
>
> Gromacs 5.1.2 built with OpenMPI 1.10.0
> -------------------------------------------------------
> - Running without any setting for '-pin' makes mdrun_mpi jobs fail
> randomly.
> - No difference using explicit settings for -pin ('auto', 'on', 'off');
> jobs hang randomly.
> - No difference using '--bind-to none' combined with any setting of '-pin';
> jobs hang randomly (an affinity check is sketched after this list).
> - Have tested using the exact same input files (topol.top, em.tpr, etc.).
> - Have also tested starting from a PDB file (genbox, solvate, ions); the
> run then halts randomly during minimization with mdrun_mpi.
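>
> One way to see what affinity mask each rank actually ends up with on a
> node, independent of what mdrun or mpirun report, is something along the
> lines of:
>
> for pid in $(pgrep mdrun_mpi); do taskset -cp $pid; done
>
> run on a compute node while a job is active; taskset -cp just prints the
> current CPU affinity list of each matching process.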
>
> Gromacs 5.1.2 built with OpenMPI 1.10.3
> --------------------------------------------------------
> - As with OpenMPI 1.10.0, jobs halt randomly with any '-pin' setting.
> - When multiple jobs are run with the exact same input files and
> parameters, some fail and some do not.
> - Some of the failed jobs and some working jobs ran on the same nodes, so
> it is not likely to be a hardware problem.
> - Commonly encountered the error "ORTE has lost communication with its
> daemon located on node: hostname: node#". The node varied between runs
> (a launcher-only check is sketched below).
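>
> Since that ORTE message comes from the OpenMPI runtime rather than from
> GROMACS itself, one way to separate the two is to launch something trivial
> with the same mpirun, rank count, and node allocation, e.g.:
>
> mpirun -np 128 hostname
>
> If that also fails or hangs intermittently, the problem is in the OpenMPI
> setup rather than in mdrun_mpi.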
>
> Any further suggestions? Not sure where to go from here... should the user
> return to using Gromacs 5.0.4?
>
> Susan.
>
>
>
> > On Jul 5, 2016, at 2:34 PM, Szilárd Páll <pall.szilard at gmail.com> wrote:
> >
> > Susan,
> >
> > Have you tried mpirun --bind-to none? For the last few releases,
> > OpenMPI has set CPU affinities (cpusets) by default, which may be
> > interacting badly with the Intel OpenMP library.
> >
> > What about running with -pin on (or -pin off)?
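> >
> > A minimal sketch of the combined invocation, reusing the 128-rank command
> > from the original report, would be something like:
> >
> > export OMP_NUM_THREADS=2
> > mpirun --bind-to none -np 128 mdrun_mpi -pin on -nb cpu -v -deffnm em
> >
> > (and likewise with -pin off), so that only one of OpenMPI and mdrun is
> > responsible for thread placement at a time.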
> >
> > Cheers,
> > --
> > Szilárd
> >
> >
> > On Tue, Jul 5, 2016 at 4:13 PM, Mark Abraham <mark.j.abraham at gmail.com>
> > wrote:
> >> Hi,
> >>
> >> OpenMPI 1.10.0 has six months' worth of bugs now fixed in 1.10.3, some of
> >> which seem plausible explanations for this behaviour. There has been no
> >> similar-looking GROMACS issue reported. Please try another OpenMPI and
> >> let us know how it goes!
> >>
> >> Mark
> >>
> >> On Tue, 5 Jul 2016 15:55 Susan Chacko <susanc at helix.nih.gov> wrote:
> >>
> >>>
> >>> Hi all,
> >>>
> >>> One of our users is having problems with Gromacs 5.1.2 hanging at the
> >>> start of an mdrun when using OMP_NUM_THREADS=2. When run with
> >>> OMP_NUM_THREADS=1, the job runs fine.
> >>>
> >>> The stalling command is:
> >>> mpirun -np 128 mdrun_mpi -nb cpu -v -deffnm em
> >>>
> >>> The same command and job work fine in Gromacs 5.0.4 with
> >>> OMP_NUM_THREADS=2.
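> >>>
> >>> In case it is relevant, a more explicit way to request the same layout,
> >>> assuming 2 OpenMP threads per MPI rank is what is intended, would be
> >>> something like:
> >>>
> >>> mpirun -np 128 mdrun_mpi -ntomp 2 -nb cpu -v -deffnm em
> >>>
> >>> since -ntomp sets the per-rank OpenMP thread count directly instead of
> >>> relying on OMP_NUM_THREADS being exported to every node.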
> >>>
> >>> Gromacs 5.0.4 and 5.1.2 were built on our system with Intel compiler
> >>> 2015.1.133:
> >>>
> >>> cmake ../gromacs-5.1.2  \
> >>> -DGMX_BUILD_OWN_FFTW=ON \
> >>> -DREGRESSIONTEST_DOWNLOAD=ON \
> >>> -DGMX_MPI=on \
> >>> -DGMX_BUILD_MDRUN_ONLY=on  \
> >>> -DBUILD_SHARED_LIBS=off
> >>>
> >>> One difference I can see is that Gromacs 5.0.4 was built with OpenMPI
> >>> 1.8.4, and Gromacs 5.1.2 was built with OpenMPI 1.10.0. Is that likely to
> >>> be the cause of the problem? If so, I could rebuild Gromacs 5.1.2 with
> >>> OpenMPI 1.8.4.
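> >>>
> >>> A minimal sketch of pointing such a rebuild at a specific OpenMPI
> >>> installation (the module name below is only a placeholder for whatever
> >>> the 1.8.4 stack is called here) would be:
> >>>
> >>> module load openmpi/1.8.4   # placeholder module name
> >>> cmake ../gromacs-5.1.2 \
> >>>   -DGMX_MPI=on \
> >>>   -DGMX_BUILD_MDRUN_ONLY=on \
> >>>   -DCMAKE_C_COMPILER=mpicc \
> >>>   -DCMAKE_CXX_COMPILER=mpicxx
> >>>
> >>> where mpicc/mpicxx are the compiler wrappers from that OpenMPI.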
> >>>
> >>> Any ideas what might be causing the stall? Any other flags we should use
> >>> to compile?
> >>>
> >>> All suggestions appreciated,
> >>> Susan.
> >>>
> >>>
> >>> Susan Chacko, PhD
> >>> HPC @ NIH staff
> >>>
> >>>
> >>>
> >>>
> >>>