[gmx-users] Gromacs 5.1.2 and OMP_NUM_THREADS

Susan Chacko susanc at helix.nih.gov
Tue Jul 19 17:00:22 CEST 2016


We tried a rebuild of Gromacs 5.1.2 with OpenMPI 1.10.3, and also tried runs with -pin on, off, auto and --bind-to none.
It seems that the results are non-deterministic: i.e. they sometimes succeed and sometimes fail. 
The same was observed on both IB FDR and IB QDR fabrics. 

Gromacs 5.1.2 built with OpenMPI 1.10.0
-------------------------------------------------------
- Running without any setting for '-pin' makes mdrun_mpi jobs fail randomly. 
- No difference using explicit settings for -pin (‘auto’, ‘on’, ‘off’); jobs hang randomly.
- No difference using '--bind-to none' + any setting of '-pin'; jobs hang randomly.
- Have tested using the exact same input files (topol.top, em.tpr, etc…)
- Have also tested starting from a pdb file as input for genbox, solvate, ions; the job then halts randomly during the mdrun_mpi minimization.
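
For anyone wanting to repeat the combinations listed above, here is a minimal dry-run sketch of the test matrix. It only prints the command lines and does not launch anything. The process count, thread count and 'em' prefix are taken from the original report; the helper name build_cmd, the use of '-x' to export OMP_NUM_THREADS, and the matching '-ntomp' are assumptions for illustration, not necessarily what was run here.

```shell
#!/bin/sh
# Dry-run sketch: print one mpirun command line per (-pin, binding) combination.
# NP, NTOMP and the 'em' file prefix mirror the original report; adjust as needed.
NP=128
NTOMP=2

# build_cmd PIN BIND -> echoes the full command line for one combination.
# BIND is "none" (adds --bind-to none) or "default" (keeps Open MPI's default binding).
# build_cmd is a hypothetical helper name used only in this sketch.
build_cmd() {
    pin=$1
    bind=$2
    bind_opt=""
    if [ "$bind" = "none" ]; then
        bind_opt=" --bind-to none"
    fi
    echo "mpirun -np $NP$bind_opt -x OMP_NUM_THREADS=$NTOMP mdrun_mpi -ntomp $NTOMP -pin $pin -nb cpu -v -deffnm em"
}

# Enumerate every combination that was tried by hand.
for bind in default none; do
    for pin in auto on off; do
        build_cmd "$pin" "$bind"
    done
done
```

Piping the output to a file gives a record of exactly which six command lines were compared.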

Gromacs 5.1.2 built with OpenMPI 1.10.3
--------------------------------------------------------
- As with OpenMPI 1.10.0, jobs halt randomly using any ‘-pin’ setting.
- When multiple jobs are run with the exact same input files and parameters, some fail and some do not.
- Some of the failed jobs and some working jobs ran on the same nodes, so it is not likely to be a hardware problem.
- Commonly encountered the error "ORTE has lost communication with its daemon located on node: hostname: node#". The node varied between runs.
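
Since identical jobs sometimes succeed, sometimes fail and sometimes hang, it may help to count the outcomes rather than judge by eye. The following is a rough sketch, assuming GNU coreutils 'timeout' is available (it exits with status 124 when the time limit is reached). The function names classify and run_trials, the trial count and the time limit are all hypothetical choices for illustration.

```shell
#!/bin/sh
# Sketch: run the same command several times and tally how each attempt ends.
# Relies on GNU coreutils 'timeout', which exits with status 124 on a timeout.

# classify RC -> "ok" (exit 0), "hang" (stopped by timeout), "fail" (anything else)
classify() {
    case "$1" in
        0)   echo ok ;;
        124) echo hang ;;
        *)   echo fail ;;
    esac
}

# run_trials N LIMIT CMD... -> run CMD N times, each limited to LIMIT seconds,
# then print a one-line summary of the outcomes.
run_trials() {
    n=$1; limit=$2; shift 2
    ok=0; hang=0; fail=0
    i=1
    while [ "$i" -le "$n" ]; do
        # Output is discarded here; redirect to a per-trial log file for real diagnosis.
        timeout "$limit" "$@" >/dev/null 2>&1
        rc=$?
        case "$(classify "$rc")" in
            ok)   ok=$((ok + 1)) ;;
            hang) hang=$((hang + 1)) ;;
            fail) fail=$((fail + 1)) ;;
        esac
        i=$((i + 1))
    done
    echo "ok=$ok hang=$hang fail=$fail"
}

# Example invocation (not executed in this sketch):
# run_trials 10 600 mpirun -np 128 mdrun_mpi -nb cpu -v -deffnm em
```

A run that dies with the ORTE daemon error would be counted as "fail", while a job stuck at startup would be counted as "hang", so the two symptoms can be separated.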

Any further suggestions? We are not sure where to go from here. Should the user return to using Gromacs 5.0.4?

Susan.



> On Jul 5, 2016, at 2:34 PM, Szilárd Páll <pall.szilard at gmail.com> wrote:
> 
> Susan,
> 
> Have you tried mpirun --bind-to none? For the last few releases
> OpenMPI messes with the CPUSET/affinities by default which may be
> interacting badly with the Intel OpenMP library.
> 
> What about running with -pin on (or -pin off)?
> 
> Cheers,
> --
> Szilárd
> 
> 
> On Tue, Jul 5, 2016 at 4:13 PM, Mark Abraham <mark.j.abraham at gmail.com> wrote:
>> Hi,
>> 
>> OpenMPI 1.10.0 has six months' worth of bugs now fixed in 1.10.3, some of
>> which seem plausible to explain this behaviour. There's been no GROMACS
>> issue that seems similar. Please try another OpenMPI and let us know how
>> you go!
>> 
>> Mark
>> 
>> On Tue, 5 Jul 2016 15:55 Susan Chacko <susanc at helix.nih.gov> wrote:
>> 
>>> 
>>> Hi all,
>>> 
>>> One of our users is having problems with Gromacs 5.1.2 hanging at the
>>> start of an mdrun using OMP_NUM_THREADS=2. When run with OMP_NUM_THREADS=1,
>>> the job runs fine.
>>> 
>>> The stalling command is:
>>> mpirun -np 128 mdrun_mpi -nb cpu -v -deffnm em
>>> 
>>> The same command and job work fine in Gromacs 5.0.4 with OMP_NUM_THREADS=2
>>> 
>>> Gromacs 5.0.4 and 5.1.2 were built on our system with Intel compiler
>>> 2015.1.133:
>>> 
>>> cmake ../gromacs-5.1.2  \
>>> -DGMX_BUILD_OWN_FFTW=ON \
>>> -DREGRESSIONTEST_DOWNLOAD=ON \
>>> -DGMX_MPI=on \
>>> -DGMX_BUILD_MDRUN_ONLY=on  \
>>> -DBUILD_SHARED_LIBS=off
>>> 
>>> One difference I can see is that Gromacs 5.0.4 was built with OpenMPI
>>> 1.8.4, and Gromacs 5.1.2 was built with OpenMPI 1.10.0. Is that likely to
>>> be the cause of the problem? If so, I could rebuild Gromacs 5.1.2 with
>>> OpenMPI 1.8.4.
>>> 
>>> Any ideas what might be causing the stall? Any other flags we should use
>>> to compile?
>>> 
>>> All suggestions appreciated,
>>> Susan.
>>> 
>>> 
>>> Susan Chacko, PhD
>>> HPC @ NIH staff
>>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Gromacs Users mailing list
>>> 
>>> * Please search the archive at
>>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
>>> posting!
>>> 
>>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>> 
>>> * For (un)subscribe requests visit
>>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>>> send a mail to gmx-users-request at gromacs.org.
>>> 


