[gmx-users] Gromacs 2018.3 with CUDA - segmentation fault (core dumped)
Krzysztof Kolman
krzysztof.kolman at gmail.com
Mon Nov 5 21:13:04 CET 2018
Dear Gromacs Users,
I have a problem with GROMACS 2018.3: it keeps crashing with a
segmentation fault after quite long simulation times (more than 12 h wall
clock). It is hard for me to tell why, because there is no information
beyond the segmentation fault message itself. Please find a shortened
output from the log file below:
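In case it helps with debugging, this is how I intend to capture a
backtrace on the next crash (a rough sketch; the core file name and the
gmx path depend on my system, and the kernel's core_pattern setting may
redirect core files elsewhere):

# Allow core files of unlimited size in the current shell
ulimit -c unlimited
# Re-run; on a segmentation fault a core file should be written
gmx mdrun -v -deffnm md_0_1
# Print the backtrace from the core file with gdb
gdb $(which gmx) core -ex bt -ex quit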
Command line:
gmx mdrun -v -deffnm md_0_1
GROMACS version: 2018.3
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX2_256
FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
Built on: 2018-10-17 19:53:24
Built by: kolman at kolman-B85-HD3 [CMAKE]
Build OS/arch: Linux 4.15.0-36-generic x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Build CPU family: 6 Model: 60 Stepping: 3
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt
intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd
rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /usr/bin/gcc-6 GNU 6.4.0
C compiler flags: -march=core-avx2 -O3 -DNDEBUG -funroll-all-loops
-fexcess-precision=fast
C++ compiler: /usr/bin/g++-6 GNU 6.4.0
C++ compiler flags: -march=core-avx2 -std=c++11 -O3 -DNDEBUG
-funroll-all-loops -fexcess-precision=fast
CUDA compiler: /usr/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
driver;Copyright (c) 2005-2017 NVIDIA Corporation;Built on
Fri_Nov__3_21:07:56_CDT_2017;Cuda compilation tools, release 9.1, V9.1.85
CUDA compiler flags: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;-D_FORCE_INLINES;; ;-march=core-avx2;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver: 9.10
CUDA runtime: 9.10
Running on 1 node with total 4 cores, 8 logical cores, 1 compatible GPU
Hardware detected:
CPU info:
Vendor: Intel
Brand: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Family: 6 Model: 60 Stepping: 3
Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt intel lahf
mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2
sse3 sse4.1 sse4.2 ssse3 tdt x2apic
Hardware topology: Basic
Sockets, cores, and logical processors:
Socket 0: [ 0 4] [ 1 5] [ 2 6] [ 3 7]
GPU info:
Number of GPUs detected: 1
#0: NVIDIA GeForce GTX 770, compute cap.: 3.0, ECC: no, stat: compatible
...
Input Parameters:
integrator = md
tinit = 0
dt = 0.002
nsteps = 50000000
init-step = 0
simulation-part = 1
comm-mode = Linear
nstcomm = 100
bd-fric = 0
ld-seed = -105855329
emtol = 10
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 500000
nstvout = 500000
nstfout = 0
nstlog = 500000
nstcalcenergy = 100
nstenergy = 50000
nstxout-compressed = 50000
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 10
ns-type = Grid
pbc = xyz
periodic-molecules = false
verlet-buffer-tolerance = 0.005
rlist = 1
coulombtype = PME
coulomb-modifier = Potential-shift
rcoulomb-switch = 0
rcoulomb = 1
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Potential-shift
rvdw-switch = 0
rvdw = 1
DispCorr = EnerPres
table-extension = 1
fourierspacing = 0.118
fourier-nx = 52
fourier-ny = 52
fourier-nz = 52
pme-order = 4
ewald-rtol = 1e-05
ewald-rtol-lj = 0.001
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
implicit-solvent = No
gb-algorithm = Still
nstgbradii = 1
rgbradii = 1
gb-epsilon-solvent = 80
gb-saltconc = 0
gb-obc-alpha = 1
gb-obc-beta = 0.8
gb-obc-gamma = 4.85
gb-dielectric-offset = 0.009
sa-algorithm = Ace-approximation
sa-surface-tension = 2.05016
tcoupl = V-rescale
nsttcouple = 10
nh-chain-length = 0
print-nose-hoover-chain-variables = false
pcoupl = Parrinello-Rahman
pcoupltype = Isotropic
nstpcouple = 10
tau-p = 1
compressibility (3x3):
compressibility[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
compressibility[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05}
ref-p (3x3):
ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
refcoord-scaling = COM
posres-com (3):
posres-com[0]= 0.00000e+00
posres-com[1]= 0.00000e+00
posres-com[2]= 0.00000e+00
posres-comB (3):
posres-comB[0]= 0.00000e+00
posres-comB[1]= 0.00000e+00
posres-comB[2]= 0.00000e+00
QMMM = false
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Lincs
continuation = true
Shake-SOR = false
shake-tol = 0.0001
lincs-order = 4
lincs-iter = 1
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = false
awh = false
rotation = false
interactiveMD = false
disre = No
disre-weighting = Conservative
disre-mixed = false
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = no
cos-acceleration = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
simulated-tempering = false
swapcoords = no
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
applied-forces:
electric-field:
x:
E0 = 0
omega = 0
t0 = 0
sigma = 0
y:
E0 = 0
omega = 0
t0 = 0
sigma = 0
z:
E0 = 0
omega = 0
t0 = 0
sigma = 0
grpopts:
nrdf: 7859.43 33729.6
ref-t: 300 300
tau-t: 0.1 0.1
annealing: No No
annealing-npoints: 0 0
acc: 0 0 0
nfreeze: N N N
energygrp-flags[ 0]: 0
Changing nstlist from 10 to 100, rlist from 1 to 1.148
Using 1 MPI thread
Using 8 OpenMP threads
1 GPU auto-selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
PP:0,PME:0
Pinning threads with an auto-selected logical core stride of 1
System total charge: 0.000
Will do PME sum in reciprocal space for electrostatic interactions.
...
Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
Potential shift: LJ r^-12: -1.000e+00 r^-6: -1.000e+00, Ewald -1.000e-05
Initialized non-bonded Ewald correction tables, spacing: 9.33e-04 size: 1073
Long Range LJ corr.: <C6> 3.3459e-04
Generated table with 1074 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 1074 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 1074 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 1074 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1074 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1074 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Using GPU 8x8 nonbonded short-range kernels
Using a dual 8x4 pair-list setup updated with dynamic, rolling pruning:
outer list: updated every 100 steps, buffer 0.148 nm, rlist 1.148 nm
inner list: updated every 12 steps, buffer 0.002 nm, rlist 1.002 nm
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would be:
outer list: updated every 100 steps, buffer 0.305 nm, rlist 1.305 nm
inner list: updated every 12 steps, buffer 0.050 nm, rlist 1.050 nm
Using Lorentz-Berthelot Lennard-Jones combination rule
Initializing LINear Constraint Solver
The number of constraints is 3840
There are: 20736 Atoms
Started mdrun on rank 0 Sun Nov 4 23:01:29 2018
Step Time
0 0.00000
Energies (kJ/mol)
U-B Proper Dih. Improper Dih. LJ-14 Coulomb-14
7.80480e+03 5.27100e+03 8.63175e+01 4.08652e+03 4.83769e+03
LJ (SR) Disper. corr. Coulomb (SR) Coul. recip. Potential
3.63164e+04 -2.90354e+03 -3.22530e+05 1.96307e+03 -2.65067e+05
Kinetic En. Total Energy Conserved En. Temperature Pres. DC (bar)
5.18776e+04 -2.13190e+05 -2.13177e+05 3.00053e+02 -2.32857e+02
Pressure (bar) Constr. rmsd
-5.67996e+01 9.57285e-06
step 200: timed with pme grid 52 52 52, coulomb cutoff 1.000: 581.8 M-cycles
step 400: timed with pme grid 44 44 44, coulomb cutoff 1.140: 618.2 M-cycles
step 600: timed with pme grid 40 40 40, coulomb cutoff 1.254: 692.9 M-cycles
step 800: timed with pme grid 42 42 42, coulomb cutoff 1.194: 669.0 M-cycles
step 1000: timed with pme grid 44 44 44, coulomb cutoff 1.140: 630.8 M-cycles
step 1200: timed with pme grid 48 48 48, coulomb cutoff 1.045: 546.1 M-cycles
step 1400: timed with pme grid 52 52 52, coulomb cutoff 1.000: 566.0 M-cycles
step 1600: timed with pme grid 48 48 48, coulomb cutoff 1.045: 546.5 M-cycles
step 1800: timed with pme grid 52 52 52, coulomb cutoff 1.000: 565.3 M-cycles
optimal pme grid 48 48 48, coulomb cutoff 1.045
Last checkpoint:
Writing checkpoint, step 22388100 at Mon Nov 5 08:31:29 2018
Step Time
22500000 45000.00000
Energies (kJ/mol)
U-B Proper Dih. Improper Dih. LJ-14 Coulomb-14
7.74565e+03 5.28043e+03 5.63610e+01 3.87191e+03 4.35044e+03
LJ (SR) Disper. corr. Coulomb (SR) Coul. recip. Potential
3.61122e+04 -2.92965e+03 -3.24570e+05 1.59058e+03 -2.68492e+05
Kinetic En. Total Energy Conserved En. Temperature Pres. DC (bar)
5.16199e+04 -2.16872e+05 -3.11535e+05 2.98562e+02 -2.37059e+02
Pressure (bar) Constr. rmsd
4.08107e+01 9.30833e-06
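In the meantime I restart the run from the last checkpoint so the
completed steps are not lost (a sketch; as far as I understand, -cpi just
makes the default checkpoint file explicit when -deffnm is used):

# Continue the crashed run from the last checkpoint
gmx mdrun -v -deffnm md_0_1 -cpi md_0_1.cpt
# If the GPU were the suspect, a CPU-only rerun could rule it out (much slower)
gmx mdrun -v -deffnm md_0_1 -cpi md_0_1.cpt -nb cpu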
Thank you in advance for any help. Please let me know if any additional
information is needed.
Best regards,
Krzysztof