[gmx-users] Simulation freezes but the job keeps on running

Szilárd Páll pall.szilard at gmail.com
Thu Jan 25 20:42:03 CET 2018


On Thu, Jan 25, 2018 at 7:23 PM, Searle Duay <searle.duay at uconn.edu> wrote:

> Hi Ake,
>
> I am not sure, and I don't know how to check the build. But, I see the
> following in the output log file whenever I run GROMACS in PSC bridges:
>
> GROMACS version:    2016
> Precision:          single
> Memory model:       64 bit
> MPI library:        MPI
> OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)
> GPU support:        CUDA
> SIMD instructions:  AVX2_256
> FFT library:        fftw-3.3.4-sse2-avx
> RDTSCP usage:       enabled
> TNG support:        enabled
> Hwloc support:      hwloc-1.7.0
> Tracing support:    disabled
> Built on:           Fri Oct  7 15:06:50 EDT 2016
> Built by:           mmadrid at gpu012.pvt.bridges.psc.edu [CMAKE]
> Build OS/arch:      Linux 3.10.0-327.4.5.el7.x86_64 x86_64
> Build CPU vendor:   Intel
> Build CPU brand:    Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz
> Build CPU family:   6   Model: 63   Stepping: 2
> Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf
> mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2
> sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> C compiler:         /usr/lib64/ccache/cc GNU 4.8.5
> C compiler flags:    -march=core-avx2     -O3 -DNDEBUG -funroll-all-loops
> -fexcess-precision=fast
> C++ compiler:       /usr/lib64/ccache/c++ GNU 4.8.5
> C++ compiler flags:  -march=core-avx2    -std=c++0x   -O3 -DNDEBUG
> -funroll-all-loops -fexcess-precision=fast
> CUDA compiler:      /opt/packages/cuda/8.0RC/bin/nvcc nvcc: NVIDIA (R) Cuda
> compiler driver; Copyright (c) 2005-2016 NVIDIA Corporation; Built on
> Wed_May__4_21:01:56_CDT_2016; Cuda compilation tools, release 8.0, V8.0.26
> CUDA compiler flags:
> -gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_30,code=sm_30;
> -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;
> -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;
> -gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;
> -gencode;arch=compute_60,code=compute_60;-gencode;arch=compute_61,code=compute_61;
> -use_fast_math;;;-Xcompiler;,-march=core-avx2,,,,,,;
> -Xcompiler;-O3,-DNDEBUG,-funroll-all-loops,-fexcess-precision=fast,,;
> CUDA driver:        9.0
> CUDA runtime:       8.0
>
> Would that have been built with OpenMPI?
>

Based on that it's hard to say. We don't detect MPI flavors; the only hint in
the version header would be the path to the compiler wrapper, which might
indicate which MPI implementation was used. However, in this case whoever
compiled GROMACS used ccache, so we can't see the full path to an mpicc
binary.
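If you want to check on the cluster itself, the library that gmx_mpi links
against usually gives away the MPI flavor. A quick check (assuming gmx_mpi
and mpirun are on your PATH after loading the module) would be something
like:

  # the path to libmpi typically contains "openmpi" or an Intel MPI directory
  ldd $(which gmx_mpi) | grep -i libmpi

  # the launcher also reports its flavor and version
  mpirun --version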

I suggest that you consult your admins and perhaps try a different MPI
implementation or version.
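Since Bridges uses environment modules (your script already does
"module load gromacs/2016_gpu"), something like the following should show
what alternative GROMACS builds and MPI stacks the site provides; the exact
output of course depends on what is installed there:

  module avail gromacs
  module avail mpi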


Åke, do you have any other data from our investigation (e.g. the version/range
that reproduced the hangs, frequency of the hangs, size of the runs, etc.)?

--
Szilárd



>
> Thanks!
>
> Searle
>
> On Thu, Jan 25, 2018 at 1:08 PM, Åke Sandgren <ake.sandgren at hpc2n.umu.se>
> wrote:
>
> > Is that build using OpenMPI?
> >
> > We've seen cases where GROMACS built with OpenMPI hangs repeatedly, while
> > the same build with Intel MPI works.
> >
> > We still haven't figured out why.
> >
> > On 01/25/2018 06:39 PM, Searle Duay wrote:
> > > Good day!
> > >
> > > I am running a 10 ns peptide-membrane simulation using GPUs on PSC
> > > Bridges. The simulation starts properly, but it does not finish by the
> > > time estimated by the software. The job keeps running, yet the
> > > simulation appears frozen: no simulation time has been added even after
> > > the job has run for another hour.
> > >
> > > I have submitted the following SLURM script:
> > >
> > > #!/bin/bash
> > > #SBATCH -J k80_1n_4g
> > > #SBATCH -o %j.out
> > > #SBATCH -N 1
> > > #SBATCH -n 28
> > > #SBATCH --ntasks-per-node=28
> > > #SBATCH -p GPU
> > > #SBATCH --gres=gpu:k80:4
> > > #SBATCH -t 48:00:00
> > > #SBATCH --mail-type=BEGIN,END,FAIL
> > > #SBATCH --mail-user=searle.duay at uconn.edu
> > >
> > > set echo
> > > set -x
> > >
> > > module load gromacs/2016_gpu
> > >
> > > echo SLURM_NPROCS= $SLURM_NPROCS
> > >
> > > cd $SCRATCH/prot_umbrella/gromacs/conv
> > >
> > > mpirun -np $SLURM_NPROCS gmx_mpi mdrun -deffnm umbrella8 -pf
> > > pullf-umbrella8.xvg -px pullx-umbrella8.xvg -v -ntomp 2
> > >
> > > exit
> > >
> > > I am not sure whether the error comes from the hardware or from my
> > > simulation setup. I have already run similar simulations (same system,
> > > only varying the number of nodes), and some of them were successful.
> > > There are just some that seem to freeze in the middle of the run.
> > >
> > > Thank you!
> > >
> >
> > --
> > Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
> > Internet: ake at hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
> > Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
>
>
>
> --
> Searle Aichelle S. Duay
> Ph.D. Student
> Chemistry Department, University of Connecticut
> searle.duay at uconn.edu

