[gmx-users] Gromacs 2018.3 with CUDA - segmentation fault (core dumped)
Szilárd Páll
pall.szilard at gmail.com
Tue Nov 6 13:46:33 CET 2018
Did it really crash after exactly the same number of steps the second time
too?
--
Szilárd
On Tue, Nov 6, 2018 at 10:55 AM Krzysztof Kolman <krzysztof.kolman at gmail.com>
wrote:
> Dear Gromacs Users,
>
> I just wanted to add some additional information. After restarting, the
> simulation crashed (again with a segmentation fault) after the same interval,
> i.e. 12 h of wall clock and 22500000 steps (so I am now at 45000000 steps out
> of 50000000). I think this observation shows that the crash is not caused by
> an unstable simulation but by some kind of software issue.
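>
> For reference, a restart like this is normally done from mdrun's last
> checkpoint; a minimal sketch of the command, assuming the default file names
> produced by -deffnm md_0_1 (the exact invocation used for the restart is not
> shown here):
>
>   # resume the run from the most recent checkpoint written before the crash
>   gmx mdrun -v -deffnm md_0_1 -cpi md_0_1.cpt
>
> (-cpi makes mdrun read the checkpoint and append to the existing output
> files; adding -noappend would instead write the continuation to separate,
> numbered files.)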
>
> Kind regards,
> Krzysztof
>
> On Mon, 5 Nov 2018 at 21:12 Krzysztof Kolman <krzysztof.kolman at gmail.com>
> wrote:
>
> > Dear Gromacs Users,
> >
> > I have a problem with GROMACS 2018.3: it keeps crashing with a
> > segmentation fault after a fairly long simulation time (more than 12 h of
> > wall clock). It is hard for me to tell why, because there is no output
> > other than the segmentation fault message. Please find below a shortened
> > excerpt from the log file:
> > Command line:
> > gmx mdrun -v -deffnm md_0_1
> >
> > GROMACS version: 2018.3
> > Precision: single
> > Memory model: 64 bit
> > MPI library: thread_mpi
> > OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
> > GPU support: CUDA
> > SIMD instructions: AVX2_256
> > FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
> > RDTSCP usage: enabled
> > TNG support: enabled
> > Hwloc support: disabled
> > Tracing support: disabled
> > Built on: 2018-10-17 19:53:24
> > Built by: kolman at kolman-B85-HD3 [CMAKE]
> > Build OS/arch: Linux 4.15.0-36-generic x86_64
> > Build CPU vendor: Intel
> > Build CPU brand: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
> > Build CPU family: 6 Model: 60 Stepping: 3
> > Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt
> > intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd
> > rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> > C compiler: /usr/bin/gcc-6 GNU 6.4.0
> > C compiler flags: -march=core-avx2 -O3 -DNDEBUG -funroll-all-loops
> > -fexcess-precision=fast
> > C++ compiler: /usr/bin/g++-6 GNU 6.4.0
> > C++ compiler flags: -march=core-avx2 -std=c++11 -O3 -DNDEBUG
> > -funroll-all-loops -fexcess-precision=fast
> > CUDA compiler: /usr/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
> > driver;Copyright (c) 2005-2017 NVIDIA Corporation;Built on
> > Fri_Nov__3_21:07:56_CDT_2017;Cuda compilation tools, release 9.1, V9.1.85
> > CUDA compiler flags: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;-D_FORCE_INLINES;;
> > ;-march=core-avx2;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
> > CUDA driver: 9.10
> > CUDA runtime: 9.10
> >
> >
> > Running on 1 node with total 4 cores, 8 logical cores, 1 compatible GPU
> > Hardware detected:
> > CPU info:
> > Vendor: Intel
> > Brand: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
> > Family: 6 Model: 60 Stepping: 3
> > Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt intel
> > lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp
> > sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> > Hardware topology: Basic
> > Sockets, cores, and logical processors:
> > Socket 0: [ 0 4] [ 1 5] [ 2 6] [ 3 7]
> > GPU info:
> > Number of GPUs detected: 1
> > #0: NVIDIA GeForce GTX 770, compute cap.: 3.0, ECC: no, stat:
> > compatible
> > ...
> >
> > Input Parameters:
> > integrator = md
> > tinit = 0
> > dt = 0.002
> > nsteps = 50000000
> > init-step = 0
> > simulation-part = 1
> > comm-mode = Linear
> > nstcomm = 100
> > bd-fric = 0
> > ld-seed = -105855329
> > emtol = 10
> > emstep = 0.01
> > niter = 20
> > fcstep = 0
> > nstcgsteep = 1000
> > nbfgscorr = 10
> > rtpi = 0.05
> > nstxout = 500000
> > nstvout = 500000
> > nstfout = 0
> > nstlog = 500000
> > nstcalcenergy = 100
> > nstenergy = 50000
> > nstxout-compressed = 50000
> > compressed-x-precision = 1000
> > cutoff-scheme = Verlet
> > nstlist = 10
> > ns-type = Grid
> > pbc = xyz
> > periodic-molecules = false
> > verlet-buffer-tolerance = 0.005
> > rlist = 1
> > coulombtype = PME
> > coulomb-modifier = Potential-shift
> > rcoulomb-switch = 0
> > rcoulomb = 1
> > epsilon-r = 1
> > epsilon-rf = inf
> > vdw-type = Cut-off
> > vdw-modifier = Potential-shift
> > rvdw-switch = 0
> > rvdw = 1
> > DispCorr = EnerPres
> > table-extension = 1
> > fourierspacing = 0.118
> > fourier-nx = 52
> > fourier-ny = 52
> > fourier-nz = 52
> > pme-order = 4
> > ewald-rtol = 1e-05
> > ewald-rtol-lj = 0.001
> > lj-pme-comb-rule = Geometric
> > ewald-geometry = 0
> > epsilon-surface = 0
> > implicit-solvent = No
> > gb-algorithm = Still
> > nstgbradii = 1
> > rgbradii = 1
> > gb-epsilon-solvent = 80
> > gb-saltconc = 0
> > gb-obc-alpha = 1
> > gb-obc-beta = 0.8
> > gb-obc-gamma = 4.85
> > gb-dielectric-offset = 0.009
> > sa-algorithm = Ace-approximation
> > sa-surface-tension = 2.05016
> > tcoupl = V-rescale
> > nsttcouple = 10
> > nh-chain-length = 0
> > print-nose-hoover-chain-variables = false
> > pcoupl = Parrinello-Rahman
> > pcoupltype = Isotropic
> > nstpcouple = 10
> > tau-p = 1
> > compressibility (3x3):
> > compressibility[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
> > compressibility[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
> > compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05}
> > ref-p (3x3):
> > ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
> > ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
> > ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
> > refcoord-scaling = COM
> > posres-com (3):
> > posres-com[0]= 0.00000e+00
> > posres-com[1]= 0.00000e+00
> > posres-com[2]= 0.00000e+00
> > posres-comB (3):
> > posres-comB[0]= 0.00000e+00
> > posres-comB[1]= 0.00000e+00
> > posres-comB[2]= 0.00000e+00
> > QMMM = false
> > QMconstraints = 0
> > QMMMscheme = 0
> > MMChargeScaleFactor = 1
> > qm-opts:
> > ngQM = 0
> > constraint-algorithm = Lincs
> > continuation = true
> > Shake-SOR = false
> > shake-tol = 0.0001
> > lincs-order = 4
> > lincs-iter = 1
> > lincs-warnangle = 30
> > nwall = 0
> > wall-type = 9-3
> > wall-r-linpot = -1
> > wall-atomtype[0] = -1
> > wall-atomtype[1] = -1
> > wall-density[0] = 0
> > wall-density[1] = 0
> > wall-ewald-zfac = 3
> > pull = false
> > awh = false
> > rotation = false
> > interactiveMD = false
> > disre = No
> > disre-weighting = Conservative
> > disre-mixed = false
> > dr-fc = 1000
> > dr-tau = 0
> > nstdisreout = 100
> > orire-fc = 0
> > orire-tau = 0
> > nstorireout = 100
> > free-energy = no
> > cos-acceleration = 0
> > deform (3x3):
> > deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> > deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> > deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> > simulated-tempering = false
> > swapcoords = no
> > userint1 = 0
> > userint2 = 0
> > userint3 = 0
> > userint4 = 0
> > userreal1 = 0
> > userreal2 = 0
> > userreal3 = 0
> > userreal4 = 0
> > applied-forces:
> > electric-field:
> > x:
> > E0 = 0
> > omega = 0
> > t0 = 0
> > sigma = 0
> > y:
> > E0 = 0
> > omega = 0
> > t0 = 0
> > sigma = 0
> > z:
> > E0 = 0
> > omega = 0
> > t0 = 0
> > sigma = 0
> > grpopts:
> > nrdf: 7859.43 33729.6
> > ref-t: 300 300
> > tau-t: 0.1 0.1
> > annealing: No No
> > annealing-npoints: 0 0
> > acc: 0 0 0
> > nfreeze: N N N
> > energygrp-flags[ 0]: 0
> >
> > Changing nstlist from 10 to 100, rlist from 1 to 1.148
> >
> > Using 1 MPI thread
> > Using 8 OpenMP threads
> >
> > 1 GPU auto-selected for this run.
> > Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
> > PP:0,PME:0
> > Pinning threads with an auto-selected logical core stride of 1
> > System total charge: 0.000
> > Will do PME sum in reciprocal space for electrostatic interactions.
> > ...
> > Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
> > Potential shift: LJ r^-12: -1.000e+00 r^-6: -1.000e+00, Ewald -1.000e-05
> > Initialized non-bonded Ewald correction tables, spacing: 9.33e-04 size:
> > 1073
> >
> > Long Range LJ corr.: <C6> 3.3459e-04
> > Generated table with 1074 data points for Ewald.
> > Tabscale = 500 points/nm
> > Generated table with 1074 data points for LJ6.
> > Tabscale = 500 points/nm
> > Generated table with 1074 data points for LJ12.
> > Tabscale = 500 points/nm
> > Generated table with 1074 data points for 1-4 COUL.
> > Tabscale = 500 points/nm
> > Generated table with 1074 data points for 1-4 LJ6.
> > Tabscale = 500 points/nm
> > Generated table with 1074 data points for 1-4 LJ12.
> > Tabscale = 500 points/nm
> >
> > Using GPU 8x8 nonbonded short-range kernels
> >
> > Using a dual 8x4 pair-list setup updated with dynamic, rolling pruning:
> > outer list: updated every 100 steps, buffer 0.148 nm, rlist 1.148 nm
> > inner list: updated every 12 steps, buffer 0.002 nm, rlist 1.002 nm
> > At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list
> > would be:
> > outer list: updated every 100 steps, buffer 0.305 nm, rlist 1.305 nm
> > inner list: updated every 12 steps, buffer 0.050 nm, rlist 1.050 nm
> >
> > Using Lorentz-Berthelot Lennard-Jones combination rule
> >
> >
> > Initializing LINear Constraint Solver
> > The number of constraints is 3840
> >
> > There are: 20736 Atoms
> >
> > Started mdrun on rank 0 Sun Nov 4 23:01:29 2018
> > Step Time
> > 0 0.00000
> >
> > Energies (kJ/mol)
> >            U-B    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
> >    7.80480e+03    5.27100e+03    8.63175e+01    4.08652e+03    4.83769e+03
> >        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.      Potential
> >    3.63164e+04   -2.90354e+03   -3.22530e+05    1.96307e+03   -2.65067e+05
> >    Kinetic En.   Total Energy  Conserved En.    Temperature Pres. DC (bar)
> >    5.18776e+04   -2.13190e+05   -2.13177e+05    3.00053e+02   -2.32857e+02
> > Pressure (bar)   Constr. rmsd
> >   -5.67996e+01    9.57285e-06
> >
> > step 200: timed with pme grid 52 52 52, coulomb cutoff 1.000: 581.8
> > M-cycles
> > step 400: timed with pme grid 44 44 44, coulomb cutoff 1.140: 618.2
> > M-cycles
> > step 600: timed with pme grid 40 40 40, coulomb cutoff 1.254: 692.9
> > M-cycles
> > step 800: timed with pme grid 42 42 42, coulomb cutoff 1.194: 669.0
> > M-cycles
> > step 1000: timed with pme grid 44 44 44, coulomb cutoff 1.140: 630.8
> > M-cycles
> > step 1200: timed with pme grid 48 48 48, coulomb cutoff 1.045: 546.1
> > M-cycles
> > step 1400: timed with pme grid 52 52 52, coulomb cutoff 1.000: 566.0
> > M-cycles
> > step 1600: timed with pme grid 48 48 48, coulomb cutoff 1.045: 546.5
> > M-cycles
> > step 1800: timed with pme grid 52 52 52, coulomb cutoff 1.000: 565.3
> > M-cycles
> > optimal pme grid 48 48 48, coulomb cutoff 1.045
> >
> > Last checkpoint:
> >
> > Writing checkpoint, step 22388100 at Mon Nov 5 08:31:29 2018
> >
> >
> > Step Time
> > 22500000 45000.00000
> >
> > Energies (kJ/mol)
> >            U-B    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
> >    7.74565e+03    5.28043e+03    5.63610e+01    3.87191e+03    4.35044e+03
> >        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.      Potential
> >    3.61122e+04   -2.92965e+03   -3.24570e+05    1.59058e+03   -2.68492e+05
> >    Kinetic En.   Total Energy  Conserved En.    Temperature Pres. DC (bar)
> >    5.16199e+04   -2.16872e+05   -3.11535e+05    2.98562e+02   -2.37059e+02
> > Pressure (bar)   Constr. rmsd
> >    4.08107e+01    9.30833e-06
> >
> >
> > Thank you in advance for any help. Please let me know if any additional
> > information is needed.
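> >
> > One piece of additional information that often helps with a "core dumped"
> > crash is a backtrace from the core file. A minimal sketch (this assumes
> > core dumps are enabled in the shell and that the core file ends up in the
> > working directory; the actual location depends on the system's
> > core_pattern setting):
> >
> >   # allow a core file to be written, then restart from the checkpoint
> >   ulimit -c unlimited
> >   gmx mdrun -v -deffnm md_0_1 -cpi md_0_1.cpt
> >
> >   # after the segfault, load the core file and print the stack trace
> >   gdb $(which gmx) core
> >   (gdb) bt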
> >
> > Best regards,
> > Krzysztof
> >
> >
> >
> >