[gmx-users] Gromacs 2018.3 with CUDA - segmentation fault (core dumped)
Krzysztof Kolman
krzysztof.kolman at gmail.com
Mon Nov 5 21:13:04 CET 2018
Dear Gromacs Users,
I have a problem with GROMACS 2018.3: it keeps crashing with a
segmentation fault after quite long simulation times (more than 12 h wall
clock). It is hard for me to tell why, because there is no information
beyond the segmentation fault message itself. Please find a shortened
output from the log file below:
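In case it helps with debugging, this is how I intend to capture a
backtrace on the next crash (a rough sketch; the core file name and the
gmx path depend on my system, and the kernel's core_pattern setting may
redirect core files elsewhere):

# Allow core files of unlimited size in the current shell
ulimit -c unlimited
# Re-run; on a segmentation fault a core file should be written
gmx mdrun -v -deffnm md_0_1
# Print the backtrace from the core file with gdb
gdb $(which gmx) core -ex bt -ex quit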
Command line:
gmx mdrun -v -deffnm md_0_1
GROMACS version: 2018.3
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX2_256
FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
Built on: 2018-10-17 19:53:24
Built by: kolman at kolman-B85-HD3 [CMAKE]
Build OS/arch: Linux 4.15.0-36-generic x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Build CPU family: 6 Model: 60 Stepping: 3
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt
intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd
rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /usr/bin/gcc-6 GNU 6.4.0
C compiler flags: -march=core-avx2 -O3 -DNDEBUG -funroll-all-loops
-fexcess-precision=fast
C++ compiler: /usr/bin/g++-6 GNU 6.4.0
C++ compiler flags: -march=core-avx2 -std=c++11 -O3 -DNDEBUG
-funroll-all-loops -fexcess-precision=fast
CUDA compiler: /usr/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
driver;Copyright (c) 2005-2017 NVIDIA Corporation;Built on
Fri_Nov__3_21:07:56_CDT_2017;Cuda compilation tools, release 9.1, V9.1.85
CUDA compiler flags: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;-D_FORCE_INLINES;; ;-march=core-avx2;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver: 9.10
CUDA runtime: 9.10
Running on 1 node with total 4 cores, 8 logical cores, 1 compatible GPU
Hardware detected:
CPU info:
Vendor: Intel
Brand: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Family: 6 Model: 60 Stepping: 3
Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt intel lahf
mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2
sse3 sse4.1 sse4.2 ssse3 tdt x2apic
Hardware topology: Basic
Sockets, cores, and logical processors:
Socket 0: [ 0 4] [ 1 5] [ 2 6] [ 3 7]
GPU info:
Number of GPUs detected: 1
#0: NVIDIA GeForce GTX 770, compute cap.: 3.0, ECC: no, stat: compatible
...
Input Parameters:
integrator = md
tinit = 0
dt = 0.002
nsteps = 50000000
init-step = 0
simulation-part = 1
comm-mode = Linear
nstcomm = 100
bd-fric = 0
ld-seed = -105855329
emtol = 10
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 500000
nstvout = 500000
nstfout = 0
nstlog = 500000
nstcalcenergy = 100
nstenergy = 50000
nstxout-compressed = 50000
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 10
ns-type = Grid
pbc = xyz
periodic-molecules = false
verlet-buffer-tolerance = 0.005
rlist = 1
coulombtype = PME
coulomb-modifier = Potential-shift
rcoulomb-switch = 0
rcoulomb = 1
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Potential-shift
rvdw-switch = 0
rvdw = 1
DispCorr = EnerPres
table-extension = 1
fourierspacing = 0.118
fourier-nx = 52
fourier-ny = 52
fourier-nz = 52
pme-order = 4
ewald-rtol = 1e-05
ewald-rtol-lj = 0.001
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
implicit-solvent = No
gb-algorithm = Still
nstgbradii = 1
rgbradii = 1
gb-epsilon-solvent = 80
gb-saltconc = 0
gb-obc-alpha = 1
gb-obc-beta = 0.8
gb-obc-gamma = 4.85
gb-dielectric-offset = 0.009
sa-algorithm = Ace-approximation
sa-surface-tension = 2.05016
tcoupl = V-rescale
nsttcouple = 10
nh-chain-length = 0
print-nose-hoover-chain-variables = false
pcoupl = Parrinello-Rahman
pcoupltype = Isotropic
nstpcouple = 10
tau-p = 1
compressibility (3x3):
compressibility[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
compressibility[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05}
ref-p (3x3):
ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
refcoord-scaling = COM
posres-com (3):
posres-com[0]= 0.00000e+00
posres-com[1]= 0.00000e+00
posres-com[2]= 0.00000e+00
posres-comB (3):
posres-comB[0]= 0.00000e+00
posres-comB[1]= 0.00000e+00
posres-comB[2]= 0.00000e+00
QMMM = false
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Lincs
continuation = true
Shake-SOR = false
shake-tol = 0.0001
lincs-order = 4
lincs-iter = 1
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = false
awh = false
rotation = false
interactiveMD = false
disre = No
disre-weighting = Conservative
disre-mixed = false
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = no
cos-acceleration = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
simulated-tempering = false
swapcoords = no
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
applied-forces:
electric-field:
x:
E0 = 0
omega = 0
t0 = 0
sigma = 0
y:
E0 = 0
omega = 0
t0 = 0
sigma = 0
z:
E0 = 0
omega = 0
t0 = 0
sigma = 0
grpopts:
nrdf: 7859.43 33729.6
ref-t: 300 300
tau-t: 0.1 0.1
annealing: No No
annealing-npoints: 0 0
acc: 0 0 0
nfreeze: N N N
energygrp-flags[ 0]: 0
Changing nstlist from 10 to 100, rlist from 1 to 1.148
Using 1 MPI thread
Using 8 OpenMP threads
1 GPU auto-selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
PP:0,PME:0
Pinning threads with an auto-selected logical core stride of 1
System total charge: 0.000
Will do PME sum in reciprocal space for electrostatic interactions.
...
Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
Potential shift: LJ r^-12: -1.000e+00 r^-6: -1.000e+00, Ewald -1.000e-05
Initialized non-bonded Ewald correction tables, spacing: 9.33e-04 size: 1073
Long Range LJ corr.: <C6> 3.3459e-04
Generated table with 1074 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 1074 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 1074 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 1074 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1074 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1074 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Using GPU 8x8 nonbonded short-range kernels
Using a dual 8x4 pair-list setup updated with dynamic, rolling pruning:
outer list: updated every 100 steps, buffer 0.148 nm, rlist 1.148 nm
inner list: updated every 12 steps, buffer 0.002 nm, rlist 1.002 nm
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would be:
outer list: updated every 100 steps, buffer 0.305 nm, rlist 1.305 nm
inner list: updated every 12 steps, buffer 0.050 nm, rlist 1.050 nm
Using Lorentz-Berthelot Lennard-Jones combination rule
Initializing LINear Constraint Solver
The number of constraints is 3840
There are: 20736 Atoms
Started mdrun on rank 0 Sun Nov 4 23:01:29 2018
Step Time
0 0.00000
Energies (kJ/mol)
U-B Proper Dih. Improper Dih. LJ-14 Coulomb-14
7.80480e+03 5.27100e+03 8.63175e+01 4.08652e+03 4.83769e+03
LJ (SR) Disper. corr. Coulomb (SR) Coul. recip. Potential
3.63164e+04 -2.90354e+03 -3.22530e+05 1.96307e+03 -2.65067e+05
Kinetic En. Total Energy Conserved En. Temperature Pres. DC (bar)
5.18776e+04 -2.13190e+05 -2.13177e+05 3.00053e+02 -2.32857e+02
Pressure (bar) Constr. rmsd
-5.67996e+01 9.57285e-06
step 200: timed with pme grid 52 52 52, coulomb cutoff 1.000: 581.8 M-cycles
step 400: timed with pme grid 44 44 44, coulomb cutoff 1.140: 618.2 M-cycles
step 600: timed with pme grid 40 40 40, coulomb cutoff 1.254: 692.9 M-cycles
step 800: timed with pme grid 42 42 42, coulomb cutoff 1.194: 669.0 M-cycles
step 1000: timed with pme grid 44 44 44, coulomb cutoff 1.140: 630.8 M-cycles
step 1200: timed with pme grid 48 48 48, coulomb cutoff 1.045: 546.1 M-cycles
step 1400: timed with pme grid 52 52 52, coulomb cutoff 1.000: 566.0 M-cycles
step 1600: timed with pme grid 48 48 48, coulomb cutoff 1.045: 546.5 M-cycles
step 1800: timed with pme grid 52 52 52, coulomb cutoff 1.000: 565.3 M-cycles
optimal pme grid 48 48 48, coulomb cutoff 1.045
Last checkpoint:
Writing checkpoint, step 22388100 at Mon Nov 5 08:31:29 2018
Step Time
22500000 45000.00000
Energies (kJ/mol)
U-B Proper Dih. Improper Dih. LJ-14 Coulomb-14
7.74565e+03 5.28043e+03 5.63610e+01 3.87191e+03 4.35044e+03
LJ (SR) Disper. corr. Coulomb (SR) Coul. recip. Potential
3.61122e+04 -2.92965e+03 -3.24570e+05 1.59058e+03 -2.68492e+05
Kinetic En. Total Energy Conserved En. Temperature Pres. DC (bar)
5.16199e+04 -2.16872e+05 -3.11535e+05 2.98562e+02 -2.37059e+02
Pressure (bar) Constr. rmsd
4.08107e+01 9.30833e-06
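In the meantime I restart the run from the last checkpoint so the
completed steps are not lost (a sketch; as far as I understand, -cpi just
makes the default checkpoint file explicit when -deffnm is used):

# Continue the crashed run from the last checkpoint
gmx mdrun -v -deffnm md_0_1 -cpi md_0_1.cpt
# If the GPU were the suspect, a CPU-only rerun could rule it out (much slower)
gmx mdrun -v -deffnm md_0_1 -cpi md_0_1.cpt -nb cpu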
Thank you in advance for any help. Please let me know if any additional
information is needed.
Best regards,
Krzysztof