[gmx-users] domain decomposition error >60 ns into simulation on a specific machine

Thu Feb 14 22:03:13 CET 2019

Hi Mark,

To my knowledge, she's not using CHARMM-related FF's at all -- I think she
is using Amber03 (Alyssa, correct me if I'm wrong). Visually and RMSD-wise
the trajectory looks totally normal, but is there something specific I
should be looking for in the trajectory, either visually or quantitatively?
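
For example, one quantitative check I could imagine running is the largest
per-atom displacement between consecutive saved frames, to see whether any
atom makes a suspicious jump shortly before the crash. Below is a minimal
sketch of that idea, assuming a rectangular box and coordinates already loaded
into plain Python lists. The function name, the threshold, and the toy data
are all hypothetical and not part of GROMACS. Saved frames are also much
sparser than the domain decomposition steps, so this would only catch gross
jumps.

```python
def flag_large_displacements(frames, box, threshold):
    """Return (frame_index, atom_index, distance) for each atom that moves
    more than `threshold` between consecutive frames.

    frames:    list of frames, each a list of (x, y, z) tuples (assumed nm)
    box:       (lx, ly, lz) edge lengths of a rectangular box (assumption)
    threshold: distance above which a move is reported (hypothetical value)
    """
    flagged = []
    for f in range(1, len(frames)):
        prev, curr = frames[f - 1], frames[f]
        for a, (p, c) in enumerate(zip(prev, curr)):
            d2 = 0.0
            for k in range(3):
                delta = c[k] - p[k]
                # Minimum-image convention: undo jumps caused only by
                # wrapping across the periodic boundary.
                delta -= box[k] * round(delta / box[k])
                d2 += delta * delta
            dist = d2 ** 0.5
            if dist > threshold:
                flagged.append((f, a, dist))
    return flagged


# Toy data: atom 1 merely wraps across the boundary between frames 0 and 1,
# while atom 0 really jumps by about 2.0 between frames 1 and 2.
frames = [
    [(1.0, 1.0, 1.0), (4.9, 2.0, 2.0)],
    [(1.1, 1.0, 1.0), (0.1, 2.0, 2.0)],
    [(3.1, 1.0, 1.0), (0.2, 2.0, 2.0)],
]
box = (5.0, 5.0, 5.0)
flagged = flag_large_displacements(frames, box, threshold=1.0)
for frame_index, atom_index, dist in flagged:
    print(frame_index, atom_index, round(dist, 3))
```

With the toy data only the real jump of atom 0 should be reported, not the
periodic wrap of atom 1.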

Thanks,

Mala

On Thu, Feb 14, 2019 at 3:35 PM Mark Abraham <mark.j.abraham at gmail.com>
wrote:

> Hi,
>
> What does the trajectory look like before it crashes?
>
> We did recently fix a bug relevant to simulations using CHARMM switching
> functions on GPUs, if that could be an explanation. We will probably put
> out a new 2018 version with that fix next week (or so).
>
> Mark
>
> On Thu., 14 Feb. 2019, 20:26 Mala L Radhakrishnan, <mradhakr at wellesley.edu> wrote:
>
> > Hi all,
> >
> > My student is trying to do a fairly straightforward MD simulation -- a
> > protein complex in water with ions with *no* pull coordinate.  It's on an
> > NVidia GPU-based machine and we're running gromacs 2018.3.
> >
> > About 65 ns into the simulation, it dies with:
> >
> > "an atom moved too far between two domain decomposition steps. This
> > usually means that your system is not well equilibrated"
> >
> > If we restart at, say, 2 ns before it died, it then runs fine, PAST where
> > it died before, for another ~63 ns or so, and then dies with the same
> > error.  We have had far larger and arguably more complex gromacs jobs run
> > fine on this same machine.
> >
> > Even stranger, when we run the same, problematic job on a different NVidia
> > GPU-based machine with slightly older CPUs that's running Gromacs 2016.4,
> > it runs fine (it's currently at 200 ns).
> >
> > Below are the Gromacs hardware and compilation specs of the machine on
> > which it died, in case that helps anyone. There is a note at the end of
> > this logfile output that might be useful. Thanks in advance for any
> > ideas.
> > -----------------------------------------
> >
> > GROMACS version:    2018.3
> > Precision:          single
> > Memory model:       64 bit
> > MPI library:        thread_mpi
> > OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
> > GPU support:        CUDA
> > SIMD instructions:  AVX2_256
> > FFT library:        fftw-3.3.8-sse2-avx-avx2-avx2_128
> > RDTSCP usage:       enabled
> > TNG support:        enabled
> > Hwloc support:      disabled
> > Tracing support:    disabled
> > Built on:           2018-10-31 22:05:13
> > Build OS/arch:      Linux 3.10.0-693.21.1.el7.x86_64 x86_64
> > Build CPU vendor:   Intel
> > Build CPU brand:    Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
> > Build CPU family:   6   Model: 85   Stepping: 4
> > Build CPU features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> > C compiler:         /usr/bin/cc GNU 4.8.5
> > C compiler flags:    -march=core-avx2     -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
> > C++ compiler:       /usr/bin/c++ GNU 4.8.5
> > C++ compiler flags:  -march=core-avx2    -std=c++11   -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
> > CUDA compiler:      /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2018 NVIDIA Corporation;Built on Sat_Aug_25_21:08:01_CDT_2018;Cuda compilation tools, release 10.0, V10.0.130
> > CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;;;;-march=core-avx2;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
> > CUDA driver:        10.0
> > CUDA runtime:       10.0
> > Running on 1 node with total 20 cores, 40 logical cores, 4 compatible GPUs
> > Hardware detected:
> >   CPU info:
> >     Vendor: Intel
> >     Brand:  Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
> >     Family: 6   Model: 85   Stepping: 4
> >     Features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> >     Number of AVX-512 FMA units: Cannot run AVX-512 detection - assuming 2
> >   Hardware topology: Basic
> >     Sockets, cores, and logical processors:
> >       Socket  0: [   0  20] [   1  21] [   2  22] [   3  23] [   4  24] [   5  25] [   6  26] [   7  27] [   8  28] [   9  29]
> >       Socket  1: [  10  30] [  11  31] [  12  32] [  13  33] [  14  34] [  15  35] [  16  36] [  17  37] [  18  38] [  19  39]
> >   GPU info:
> >     Number of GPUs detected: 4
> >     #0: NVIDIA GeForce GTX 1080 Ti, compute cap.: 6.1, ECC:  no, stat: compatible
> >     #1: NVIDIA GeForce GTX 1080 Ti, compute cap.: 6.1, ECC:  no, stat: compatible
> >     #2: NVIDIA GeForce GTX 1080 Ti, compute cap.: 6.1, ECC:  no, stat: compatible
> >     #3: NVIDIA GeForce GTX 1080 Ti, compute cap.: 6.1, ECC:  no, stat: compatible
> >
> > Highest SIMD level requested by all nodes in run: AVX_512
> > SIMD instructions selected at compile time:       AVX2_256
> > This program was compiled for different hardware than you are running on,
> > which could influence performance. This build might have been configured on a
> > login node with only a single AVX-512 FMA unit (in which case AVX2 is faster),
> > while the node you are running on has dual AVX-512 FMA units.
> >
> >
> >
> > --
> > Mala L. Radhakrishnan
> > Whitehead Associate Professor of Critical Thought
> > Associate Professor of Chemistry
> > Wellesley College
> > 106 Central Street
> > Wellesley, MA 02481
> > (781)283-2981
> > --
> > Gromacs Users mailing list
> >
> > * Please search the archive at
> > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> > posting!
> >
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
> > * For (un)subscribe requests visit
> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > send a mail to gmx-users-request at gromacs.org.
> >

-- 
Mala L. Radhakrishnan
Whitehead Associate Professor of Critical Thought
Associate Professor of Chemistry
Wellesley College
106 Central Street
Wellesley, MA 02481
(781)283-2981