[gmx-users] Gromacs 2018.3 with CUDA - segmentation fault (core dumped)
Krzysztof Kolman
krzysztof.kolman at gmail.com
Tue Nov 6 10:55:22 CET 2018
Dear Gromacs Users,
I just wanted to add some additional information. After a restart, the
simulation crashed again with a segmentation fault after the same interval:
12 h and 22500000 steps (so I am now at 45000000 steps out of
50000000). I think this observation shows that the problem is not caused by
an unstable simulation but by some kind of software issue.
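
As a quick sanity check on these numbers, here is a minimal Python sketch
that converts the step counts into simulated time. The values of dt and
nsteps are taken from the input parameters in the log quoted below; the
variable and function names are my own and purely illustrative.

```python
# Values copied from the quoted log (dt and nsteps) and from this message.
DT_PS = 0.002                  # dt: time step in ps
NSTEPS_TOTAL = 50_000_000      # nsteps requested for the whole run
STEPS_PER_CRASH = 22_500_000   # steps completed before each crash


def steps_to_ns(steps, dt_ps=DT_PS):
    """Convert a number of MD steps to simulated time in nanoseconds."""
    return steps * dt_ps / 1000.0


# Step counter after the first and the second crash.
first_crash_step = STEPS_PER_CRASH
second_crash_step = 2 * STEPS_PER_CRASH
remaining_steps = NSTEPS_TOTAL - second_crash_step

print(f"first crash:  step {first_crash_step}, "
      f"about {steps_to_ns(first_crash_step):.1f} ns")
print(f"second crash: step {second_crash_step}, "
      f"about {steps_to_ns(second_crash_step):.1f} ns")
print(f"remaining:    {remaining_steps} steps, "
      f"about {steps_to_ns(remaining_steps):.1f} ns")
```

The first value agrees with the last energy entry in the log (step 22500000
at 45000 ps), so the step count and the simulated time are consistent.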
Kind regards,
Krzysztof
On Mon, 5 Nov 2018 at 21:12, Krzysztof Kolman <krzysztof.kolman at gmail.com>
wrote:
> Dear Gromacs Users,
>
> I have a problem with my Gromacs 2018.3 installation: it keeps crashing
> with a segmentation fault after quite long simulation times (more than 12 h
> wall clock). It is hard for me to tell why, because the only information
> given is the segmentation fault message. Please find below shortened output
> from the log file:
> Command line:
> gmx mdrun -v -deffnm md_0_1
>
> GROMACS version: 2018.3
> Precision: single
> Memory model: 64 bit
> MPI library: thread_mpi
> OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
> GPU support: CUDA
> SIMD instructions: AVX2_256
> FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
> RDTSCP usage: enabled
> TNG support: enabled
> Hwloc support: disabled
> Tracing support: disabled
> Built on: 2018-10-17 19:53:24
> Built by: kolman at kolman-B85-HD3 [CMAKE]
> Build OS/arch: Linux 4.15.0-36-generic x86_64
> Build CPU vendor: Intel
> Build CPU brand: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
> Build CPU family: 6 Model: 60 Stepping: 3
> Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt
> intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd
> rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> C compiler: /usr/bin/gcc-6 GNU 6.4.0
> C compiler flags: -march=core-avx2 -O3 -DNDEBUG -funroll-all-loops
> -fexcess-precision=fast
> C++ compiler: /usr/bin/g++-6 GNU 6.4.0
> C++ compiler flags: -march=core-avx2 -std=c++11 -O3 -DNDEBUG
> -funroll-all-loops -fexcess-precision=fast
> CUDA compiler: /usr/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
> driver;Copyright (c) 2005-2017 NVIDIA Corporation;Built on
> Fri_Nov__3_21:07:56_CDT_2017;Cuda compilation tools, release 9.1, V9.1.85
> CUDA compiler
> flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;-D_FORCE_INLINES;;
> ;-march=core-avx2;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
> CUDA driver: 9.10
> CUDA runtime: 9.10
>
>
> Running on 1 node with total 4 cores, 8 logical cores, 1 compatible GPU
> Hardware detected:
> CPU info:
> Vendor: Intel
> Brand: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
> Family: 6 Model: 60 Stepping: 3
> Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt intel
> lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp
> sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> Hardware topology: Basic
> Sockets, cores, and logical processors:
> Socket 0: [ 0 4] [ 1 5] [ 2 6] [ 3 7]
> GPU info:
> Number of GPUs detected: 1
> #0: NVIDIA GeForce GTX 770, compute cap.: 3.0, ECC: no, stat:
> compatible
> ...
>
> Input Parameters:
> integrator = md
> tinit = 0
> dt = 0.002
> nsteps = 50000000
> init-step = 0
> simulation-part = 1
> comm-mode = Linear
> nstcomm = 100
> bd-fric = 0
> ld-seed = -105855329
> emtol = 10
> emstep = 0.01
> niter = 20
> fcstep = 0
> nstcgsteep = 1000
> nbfgscorr = 10
> rtpi = 0.05
> nstxout = 500000
> nstvout = 500000
> nstfout = 0
> nstlog = 500000
> nstcalcenergy = 100
> nstenergy = 50000
> nstxout-compressed = 50000
> compressed-x-precision = 1000
> cutoff-scheme = Verlet
> nstlist = 10
> ns-type = Grid
> pbc = xyz
> periodic-molecules = false
> verlet-buffer-tolerance = 0.005
> rlist = 1
> coulombtype = PME
> coulomb-modifier = Potential-shift
> rcoulomb-switch = 0
> rcoulomb = 1
> epsilon-r = 1
> epsilon-rf = inf
> vdw-type = Cut-off
> vdw-modifier = Potential-shift
> rvdw-switch = 0
> rvdw = 1
> DispCorr = EnerPres
> table-extension = 1
> fourierspacing = 0.118
> fourier-nx = 52
> fourier-ny = 52
> fourier-nz = 52
> pme-order = 4
> ewald-rtol = 1e-05
> ewald-rtol-lj = 0.001
> lj-pme-comb-rule = Geometric
> ewald-geometry = 0
> epsilon-surface = 0
> implicit-solvent = No
> gb-algorithm = Still
> nstgbradii = 1
> rgbradii = 1
> gb-epsilon-solvent = 80
> gb-saltconc = 0
> gb-obc-alpha = 1
> gb-obc-beta = 0.8
> gb-obc-gamma = 4.85
> gb-dielectric-offset = 0.009
> sa-algorithm = Ace-approximation
> sa-surface-tension = 2.05016
> tcoupl = V-rescale
> nsttcouple = 10
> nh-chain-length = 0
> print-nose-hoover-chain-variables = false
> pcoupl = Parrinello-Rahman
> pcoupltype = Isotropic
> nstpcouple = 10
> tau-p = 1
> compressibility (3x3):
> compressibility[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
> compressibility[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
> compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05}
> ref-p (3x3):
> ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
> ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
> ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
> refcoord-scaling = COM
> posres-com (3):
> posres-com[0]= 0.00000e+00
> posres-com[1]= 0.00000e+00
> posres-com[2]= 0.00000e+00
> posres-comB (3):
> posres-comB[0]= 0.00000e+00
> posres-comB[1]= 0.00000e+00
> posres-comB[2]= 0.00000e+00
> QMMM = false
> QMconstraints = 0
> QMMMscheme = 0
> MMChargeScaleFactor = 1
> qm-opts:
> ngQM = 0
> constraint-algorithm = Lincs
> continuation = true
> Shake-SOR = false
> shake-tol = 0.0001
> lincs-order = 4
> lincs-iter = 1
> lincs-warnangle = 30
> nwall = 0
> wall-type = 9-3
> wall-r-linpot = -1
> wall-atomtype[0] = -1
> wall-atomtype[1] = -1
> wall-density[0] = 0
> wall-density[1] = 0
> wall-ewald-zfac = 3
> pull = false
> awh = false
> rotation = false
> interactiveMD = false
> disre = No
> disre-weighting = Conservative
> disre-mixed = false
> dr-fc = 1000
> dr-tau = 0
> nstdisreout = 100
> orire-fc = 0
> orire-tau = 0
> nstorireout = 100
> free-energy = no
> cos-acceleration = 0
> deform (3x3):
> deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> simulated-tempering = false
> swapcoords = no
> userint1 = 0
> userint2 = 0
> userint3 = 0
> userint4 = 0
> userreal1 = 0
> userreal2 = 0
> userreal3 = 0
> userreal4 = 0
> applied-forces:
> electric-field:
> x:
> E0 = 0
> omega = 0
> t0 = 0
> sigma = 0
> y:
> E0 = 0
> omega = 0
> t0 = 0
> sigma = 0
> z:
> E0 = 0
> omega = 0
> t0 = 0
> sigma = 0
> grpopts:
> nrdf: 7859.43 33729.6
> ref-t: 300 300
> tau-t: 0.1 0.1
> annealing: No No
> annealing-npoints: 0 0
> acc: 0 0 0
> nfreeze: N N N
> energygrp-flags[ 0]: 0
>
> Changing nstlist from 10 to 100, rlist from 1 to 1.148
>
> Using 1 MPI thread
> Using 8 OpenMP threads
>
> 1 GPU auto-selected for this run.
> Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
> PP:0,PME:0
> Pinning threads with an auto-selected logical core stride of 1
> System total charge: 0.000
> Will do PME sum in reciprocal space for electrostatic interactions.
> ...
> Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
> Potential shift: LJ r^-12: -1.000e+00 r^-6: -1.000e+00, Ewald -1.000e-05
> Initialized non-bonded Ewald correction tables, spacing: 9.33e-04 size:
> 1073
>
> Long Range LJ corr.: <C6> 3.3459e-04
> Generated table with 1074 data points for Ewald.
> Tabscale = 500 points/nm
> Generated table with 1074 data points for LJ6.
> Tabscale = 500 points/nm
> Generated table with 1074 data points for LJ12.
> Tabscale = 500 points/nm
> Generated table with 1074 data points for 1-4 COUL.
> Tabscale = 500 points/nm
> Generated table with 1074 data points for 1-4 LJ6.
> Tabscale = 500 points/nm
> Generated table with 1074 data points for 1-4 LJ12.
> Tabscale = 500 points/nm
>
> Using GPU 8x8 nonbonded short-range kernels
>
> Using a dual 8x4 pair-list setup updated with dynamic, rolling pruning:
> outer list: updated every 100 steps, buffer 0.148 nm, rlist 1.148 nm
> inner list: updated every 12 steps, buffer 0.002 nm, rlist 1.002 nm
> At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would
> be:
> outer list: updated every 100 steps, buffer 0.305 nm, rlist 1.305 nm
> inner list: updated every 12 steps, buffer 0.050 nm, rlist 1.050 nm
>
> Using Lorentz-Berthelot Lennard-Jones combination rule
>
>
> Initializing LINear Constraint Solver
> The number of constraints is 3840
>
> There are: 20736 Atoms
>
> Started mdrun on rank 0 Sun Nov 4 23:01:29 2018
> Step Time
> 0 0.00000
>
> Energies (kJ/mol)
> U-B Proper Dih. Improper Dih. LJ-14 Coulomb-14
> 7.80480e+03 5.27100e+03 8.63175e+01 4.08652e+03 4.83769e+03
> LJ (SR) Disper. corr. Coulomb (SR) Coul. recip. Potential
> 3.63164e+04 -2.90354e+03 -3.22530e+05 1.96307e+03 -2.65067e+05
> Kinetic En. Total Energy Conserved En. Temperature Pres. DC (bar)
> 5.18776e+04 -2.13190e+05 -2.13177e+05 3.00053e+02 -2.32857e+02
> Pressure (bar) Constr. rmsd
> -5.67996e+01 9.57285e-06
>
> step 200: timed with pme grid 52 52 52, coulomb cutoff 1.000: 581.8
> M-cycles
> step 400: timed with pme grid 44 44 44, coulomb cutoff 1.140: 618.2
> M-cycles
> step 600: timed with pme grid 40 40 40, coulomb cutoff 1.254: 692.9
> M-cycles
> step 800: timed with pme grid 42 42 42, coulomb cutoff 1.194: 669.0
> M-cycles
> step 1000: timed with pme grid 44 44 44, coulomb cutoff 1.140: 630.8
> M-cycles
> step 1200: timed with pme grid 48 48 48, coulomb cutoff 1.045: 546.1
> M-cycles
> step 1400: timed with pme grid 52 52 52, coulomb cutoff 1.000: 566.0
> M-cycles
> step 1600: timed with pme grid 48 48 48, coulomb cutoff 1.045: 546.5
> M-cycles
> step 1800: timed with pme grid 52 52 52, coulomb cutoff 1.000: 565.3
> M-cycles
> optimal pme grid 48 48 48, coulomb cutoff 1.045
>
> Last checkpoint:
>
> Writing checkpoint, step 22388100 at Mon Nov 5 08:31:29 2018
>
>
> Step Time
> 22500000 45000.00000
>
> Energies (kJ/mol)
> U-B Proper Dih. Improper Dih. LJ-14 Coulomb-14
> 7.74565e+03 5.28043e+03 5.63610e+01 3.87191e+03 4.35044e+03
> LJ (SR) Disper. corr. Coulomb (SR) Coul. recip. Potential
> 3.61122e+04 -2.92965e+03 -3.24570e+05 1.59058e+03 -2.68492e+05
> Kinetic En. Total Energy Conserved En. Temperature Pres. DC (bar)
> 5.16199e+04 -2.16872e+05 -3.11535e+05 2.98562e+02 -2.37059e+02
> Pressure (bar) Constr. rmsd
> 4.08107e+01 9.30833e-06
>
>
> Thank you in advance for any help. Please let me know if any additional
> information is needed.
>
> Best regards,
> Krzysztof