[gmx-users] domain decomposition error >60 ns into simulation on a specific machine

Thu Feb 14 20:25:35 CET 2019

Hi all,

My student is running a fairly straightforward MD simulation: a protein
complex in water with ions and *no* pull coordinate.  It is on an NVIDIA
GPU-based machine running GROMACS 2018.3.

About 65 ns into the simulation, it dies with:

"an atom moved too far between two domain decomposition steps. This usually
means that your system is not well equilibrated"

If we restart from, say, 2 ns before the crash, it runs fine PAST the point
where it died before, for another ~63 ns or so, and then dies with the same
error.  Far larger and arguably more complex GROMACS jobs have run fine on
this same machine.

Even stranger, when we run the same problematic job on a different NVIDIA
GPU-based machine with slightly older CPUs, running GROMACS 2016.4, it runs
fine (it's currently at 200 ns).

Below are the GROMACS hardware and compilation specs from the log file on
the machine where it died, in case that helps anyone.  A note at the end of
the output might be useful.  Thanks in advance for any ideas.
-----------------------------------------

GROMACS version:    2018.3
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        CUDA
SIMD instructions:  AVX2_256
FFT library:        fftw-3.3.8-sse2-avx-avx2-avx2_128
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      disabled
Tracing support:    disabled
Built on:           2018-10-31 22:05:13
Build OS/arch:      Linux 3.10.0-693.21.1.el7.x86_64 x86_64
Build CPU vendor:   Intel
Build CPU brand:    Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
Build CPU family:   6   Model: 85   Stepping: 4
Build CPU features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler:         /usr/bin/cc GNU 4.8.5
C compiler flags:    -march=core-avx2     -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler:       /usr/bin/c++ GNU 4.8.5
C++ compiler flags:  -march=core-avx2    -std=c++11   -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
CUDA compiler:      /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2018 NVIDIA Corporation;Built on Sat_Aug_25_21:08:01_CDT_2018;Cuda compilation tools, release 10.0, V10.0.130
CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;;; ;-march=core-avx2;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver:        10.0
CUDA runtime:       10.0
Running on 1 node with total 20 cores, 40 logical cores, 4 compatible GPUs
Hardware detected:
  CPU info:
    Vendor: Intel
    Brand:  Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
    Family: 6   Model: 85   Stepping: 4
    Features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
    Number of AVX-512 FMA units: Cannot run AVX-512 detection - assuming 2
  Hardware topology: Basic
    Sockets, cores, and logical processors:
      Socket  0: [   0  20] [   1  21] [   2  22] [   3  23] [   4  24] [   5  25] [   6  26] [   7  27] [   8  28] [   9  29]
      Socket  1: [  10  30] [  11  31] [  12  32] [  13  33] [  14  34] [  15  35] [  16  36] [  17  37] [  18  38] [  19  39]
  GPU info:
    Number of GPUs detected: 4
    #0: NVIDIA GeForce GTX 1080 Ti, compute cap.: 6.1, ECC:  no, stat: compatible
    #1: NVIDIA GeForce GTX 1080 Ti, compute cap.: 6.1, ECC:  no, stat: compatible
    #2: NVIDIA GeForce GTX 1080 Ti, compute cap.: 6.1, ECC:  no, stat: compatible
    #3: NVIDIA GeForce GTX 1080 Ti, compute cap.: 6.1, ECC:  no, stat: compatible

Highest SIMD level requested by all nodes in run: AVX_512
SIMD instructions selected at compile time:       AVX2_256
This program was compiled for different hardware than you are running on,
which could influence performance. This build might have been configured on a
login node with only a single AVX-512 FMA unit (in which case AVX2 is faster),
while the node you are running on has dual AVX-512 FMA units.

-- 
Mala L. Radhakrishnan
Whitehead Associate Professor of Critical Thought
Associate Professor of Chemistry
Wellesley College
106 Central Street
Wellesley, MA 02481
(781)283-2981