[gmx-users] Gromacs 2018.3 with CUDA - segmentation fault (core dumped)
Szilárd Páll
pall.szilard at gmail.com
Tue Nov 6 13:46:33 CET 2018
Did it really crash after exactly the same number of steps the second time
too?
--
Szilárd
On Tue, Nov 6, 2018 at 10:55 AM Krzysztof Kolman <krzysztof.kolman at gmail.com>
wrote:
> Dear Gromacs Users,
>
> I just wanted to add some additional information. After restarting, the
> simulation crashed (again with a segmentation fault) after the same interval,
> i.e. 12 h of wall clock and 22500000 steps (so I am now at 45000000 steps out
> of 50000000). I think this observation shows that the crash is not caused by
> an unstable simulation but by some kind of software issue.
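>
> For reference, a restart like this is normally done from mdrun's last
> checkpoint; a minimal sketch of the command, assuming the default file names
> produced by -deffnm md_0_1 (the exact invocation used for the restart is not
> shown here):
>
>   # resume the run from the most recent checkpoint written before the crash
>   gmx mdrun -v -deffnm md_0_1 -cpi md_0_1.cpt
>
> (-cpi makes mdrun read the checkpoint and append to the existing output
> files; adding -noappend would instead write the continuation to separate,
> numbered files.)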
>
> Kind regards,
> Krzysztof
>
> On Mon, 5 Nov 2018 at 21:12 Krzysztof Kolman <krzysztof.kolman at gmail.com>
> wrote:
>
> > Dear Gromacs Users,
> >
> > I have a problem with GROMACS 2018.3: it keeps crashing with a
> > segmentation fault after a fairly long simulation time (more than 12 h of
> > wall clock). It is hard for me to tell why, because there is no output
> > other than the segmentation fault message. Please find below a shortened
> > excerpt from the log file:
> > Command line:
> > gmx mdrun -v -deffnm md_0_1
> >
> > GROMACS version: 2018.3
> > Precision: single
> > Memory model: 64 bit
> > MPI library: thread_mpi
> > OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
> > GPU support: CUDA
> > SIMD instructions: AVX2_256
> > FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
> > RDTSCP usage: enabled
> > TNG support: enabled
> > Hwloc support: disabled
> > Tracing support: disabled
> > Built on: 2018-10-17 19:53:24
> > Built by: kolman at kolman-B85-HD3 [CMAKE]
> > Build OS/arch: Linux 4.15.0-36-generic x86_64
> > Build CPU vendor: Intel
> > Build CPU brand: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
> > Build CPU family: 6 Model: 60 Stepping: 3
> > Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt
> > intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd
> > rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> > C compiler: /usr/bin/gcc-6 GNU 6.4.0
> > C compiler flags: -march=core-avx2 -O3 -DNDEBUG -funroll-all-loops
> > -fexcess-precision=fast
> > C++ compiler: /usr/bin/g++-6 GNU 6.4.0
> > C++ compiler flags: -march=core-avx2 -std=c++11 -O3 -DNDEBUG
> > -funroll-all-loops -fexcess-precision=fast
> > CUDA compiler: /usr/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
> > driver;Copyright (c) 2005-2017 NVIDIA Corporation;Built on
> > Fri_Nov__3_21:07:56_CDT_2017;Cuda compilation tools, release 9.1, V9.1.85
> > CUDA compiler flags: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;-D_FORCE_INLINES;;
> > ;-march=core-avx2;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
> > CUDA driver: 9.10
> > CUDA runtime: 9.10
> >
> >
> > Running on 1 node with total 4 cores, 8 logical cores, 1 compatible GPU
> > Hardware detected:
> > CPU info:
> > Vendor: Intel
> > Brand: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
> > Family: 6 Model: 60 Stepping: 3
> > Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt intel
> > lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp
> > sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> > Hardware topology: Basic
> > Sockets, cores, and logical processors:
> > Socket 0: [ 0 4] [ 1 5] [ 2 6] [ 3 7]
> > GPU info:
> > Number of GPUs detected: 1
> > #0: NVIDIA GeForce GTX 770, compute cap.: 3.0, ECC: no, stat:
> > compatible
> > ...
> >
> > Input Parameters:
> > integrator = md
> > tinit = 0
> > dt = 0.002
> > nsteps = 50000000
> > init-step = 0
> > simulation-part = 1
> > comm-mode = Linear
> > nstcomm = 100
> > bd-fric = 0
> > ld-seed = -105855329
> > emtol = 10
> > emstep = 0.01
> > niter = 20
> > fcstep = 0
> > nstcgsteep = 1000
> > nbfgscorr = 10
> > rtpi = 0.05
> > nstxout = 500000
> > nstvout = 500000
> > nstfout = 0
> > nstlog = 500000
> > nstcalcenergy = 100
> > nstenergy = 50000
> > nstxout-compressed = 50000
> > compressed-x-precision = 1000
> > cutoff-scheme = Verlet
> > nstlist = 10
> > ns-type = Grid
> > pbc = xyz
> > periodic-molecules = false
> > verlet-buffer-tolerance = 0.005
> > rlist = 1
> > coulombtype = PME
> > coulomb-modifier = Potential-shift
> > rcoulomb-switch = 0
> > rcoulomb = 1
> > epsilon-r = 1
> > epsilon-rf = inf
> > vdw-type = Cut-off
> > vdw-modifier = Potential-shift
> > rvdw-switch = 0
> > rvdw = 1
> > DispCorr = EnerPres
> > table-extension = 1
> > fourierspacing = 0.118
> > fourier-nx = 52
> > fourier-ny = 52
> > fourier-nz = 52
> > pme-order = 4
> > ewald-rtol = 1e-05
> > ewald-rtol-lj = 0.001
> > lj-pme-comb-rule = Geometric
> > ewald-geometry = 0
> > epsilon-surface = 0
> > implicit-solvent = No
> > gb-algorithm = Still
> > nstgbradii = 1
> > rgbradii = 1
> > gb-epsilon-solvent = 80
> > gb-saltconc = 0
> > gb-obc-alpha = 1
> > gb-obc-beta = 0.8
> > gb-obc-gamma = 4.85
> > gb-dielectric-offset = 0.009
> > sa-algorithm = Ace-approximation
> > sa-surface-tension = 2.05016
> > tcoupl = V-rescale
> > nsttcouple = 10
> > nh-chain-length = 0
> > print-nose-hoover-chain-variables = false
> > pcoupl = Parrinello-Rahman
> > pcoupltype = Isotropic
> > nstpcouple = 10
> > tau-p = 1
> > compressibility (3x3):
> > compressibility[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
> > compressibility[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
> > compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05}
> > ref-p (3x3):
> > ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
> > ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
> > ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
> > refcoord-scaling = COM
> > posres-com (3):
> > posres-com[0]= 0.00000e+00
> > posres-com[1]= 0.00000e+00
> > posres-com[2]= 0.00000e+00
> > posres-comB (3):
> > posres-comB[0]= 0.00000e+00
> > posres-comB[1]= 0.00000e+00
> > posres-comB[2]= 0.00000e+00
> > QMMM = false
> > QMconstraints = 0
> > QMMMscheme = 0
> > MMChargeScaleFactor = 1
> > qm-opts:
> > ngQM = 0
> > constraint-algorithm = Lincs
> > continuation = true
> > Shake-SOR = false
> > shake-tol = 0.0001
> > lincs-order = 4
> > lincs-iter = 1
> > lincs-warnangle = 30
> > nwall = 0
> > wall-type = 9-3
> > wall-r-linpot = -1
> > wall-atomtype[0] = -1
> > wall-atomtype[1] = -1
> > wall-density[0] = 0
> > wall-density[1] = 0
> > wall-ewald-zfac = 3
> > pull = false
> > awh = false
> > rotation = false
> > interactiveMD = false
> > disre = No
> > disre-weighting = Conservative
> > disre-mixed = false
> > dr-fc = 1000
> > dr-tau = 0
> > nstdisreout = 100
> > orire-fc = 0
> > orire-tau = 0
> > nstorireout = 100
> > free-energy = no
> > cos-acceleration = 0
> > deform (3x3):
> > deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> > deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> > deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> > simulated-tempering = false
> > swapcoords = no
> > userint1 = 0
> > userint2 = 0
> > userint3 = 0
> > userint4 = 0
> > userreal1 = 0
> > userreal2 = 0
> > userreal3 = 0
> > userreal4 = 0
> > applied-forces:
> > electric-field:
> > x:
> > E0 = 0
> > omega = 0
> > t0 = 0
> > sigma = 0
> > y:
> > E0 = 0
> > omega = 0
> > t0 = 0
> > sigma = 0
> > z:
> > E0 = 0
> > omega = 0
> > t0 = 0
> > sigma = 0
> > grpopts:
> > nrdf: 7859.43 33729.6
> > ref-t: 300 300
> > tau-t: 0.1 0.1
> > annealing: No No
> > annealing-npoints: 0 0
> > acc: 0 0 0
> > nfreeze: N N N
> > energygrp-flags[ 0]: 0
> >
> > Changing nstlist from 10 to 100, rlist from 1 to 1.148
> >
> > Using 1 MPI thread
> > Using 8 OpenMP threads
> >
> > 1 GPU auto-selected for this run.
> > Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
> > PP:0,PME:0
> > Pinning threads with an auto-selected logical core stride of 1
> > System total charge: 0.000
> > Will do PME sum in reciprocal space for electrostatic interactions.
> > ...
> > Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
> > Potential shift: LJ r^-12: -1.000e+00 r^-6: -1.000e+00, Ewald -1.000e-05
> > Initialized non-bonded Ewald correction tables, spacing: 9.33e-04 size:
> > 1073
> >
> > Long Range LJ corr.: <C6> 3.3459e-04
> > Generated table with 1074 data points for Ewald.
> > Tabscale = 500 points/nm
> > Generated table with 1074 data points for LJ6.
> > Tabscale = 500 points/nm
> > Generated table with 1074 data points for LJ12.
> > Tabscale = 500 points/nm
> > Generated table with 1074 data points for 1-4 COUL.
> > Tabscale = 500 points/nm
> > Generated table with 1074 data points for 1-4 LJ6.
> > Tabscale = 500 points/nm
> > Generated table with 1074 data points for 1-4 LJ12.
> > Tabscale = 500 points/nm
> >
> > Using GPU 8x8 nonbonded short-range kernels
> >
> > Using a dual 8x4 pair-list setup updated with dynamic, rolling pruning:
> > outer list: updated every 100 steps, buffer 0.148 nm, rlist 1.148 nm
> > inner list: updated every 12 steps, buffer 0.002 nm, rlist 1.002 nm
> > At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list
> > would be:
> > outer list: updated every 100 steps, buffer 0.305 nm, rlist 1.305 nm
> > inner list: updated every 12 steps, buffer 0.050 nm, rlist 1.050 nm
> >
> > Using Lorentz-Berthelot Lennard-Jones combination rule
> >
> >
> > Initializing LINear Constraint Solver
> > The number of constraints is 3840
> >
> > There are: 20736 Atoms
> >
> > Started mdrun on rank 0 Sun Nov 4 23:01:29 2018
> > Step Time
> > 0 0.00000
> >
> > Energies (kJ/mol)
> >            U-B    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
> >    7.80480e+03    5.27100e+03    8.63175e+01    4.08652e+03    4.83769e+03
> >        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.      Potential
> >    3.63164e+04   -2.90354e+03   -3.22530e+05    1.96307e+03   -2.65067e+05
> >    Kinetic En.   Total Energy  Conserved En.    Temperature Pres. DC (bar)
> >    5.18776e+04   -2.13190e+05   -2.13177e+05    3.00053e+02   -2.32857e+02
> > Pressure (bar)   Constr. rmsd
> >   -5.67996e+01    9.57285e-06
> >
> > step 200: timed with pme grid 52 52 52, coulomb cutoff 1.000: 581.8
> > M-cycles
> > step 400: timed with pme grid 44 44 44, coulomb cutoff 1.140: 618.2
> > M-cycles
> > step 600: timed with pme grid 40 40 40, coulomb cutoff 1.254: 692.9
> > M-cycles
> > step 800: timed with pme grid 42 42 42, coulomb cutoff 1.194: 669.0
> > M-cycles
> > step 1000: timed with pme grid 44 44 44, coulomb cutoff 1.140: 630.8
> > M-cycles
> > step 1200: timed with pme grid 48 48 48, coulomb cutoff 1.045: 546.1
> > M-cycles
> > step 1400: timed with pme grid 52 52 52, coulomb cutoff 1.000: 566.0
> > M-cycles
> > step 1600: timed with pme grid 48 48 48, coulomb cutoff 1.045: 546.5
> > M-cycles
> > step 1800: timed with pme grid 52 52 52, coulomb cutoff 1.000: 565.3
> > M-cycles
> > optimal pme grid 48 48 48, coulomb cutoff 1.045
> >
> > Last checkpoint:
> >
> > Writing checkpoint, step 22388100 at Mon Nov 5 08:31:29 2018
> >
> >
> > Step Time
> > 22500000 45000.00000
> >
> > Energies (kJ/mol)
> >            U-B    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
> >    7.74565e+03    5.28043e+03    5.63610e+01    3.87191e+03    4.35044e+03
> >        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.      Potential
> >    3.61122e+04   -2.92965e+03   -3.24570e+05    1.59058e+03   -2.68492e+05
> >    Kinetic En.   Total Energy  Conserved En.    Temperature Pres. DC (bar)
> >    5.16199e+04   -2.16872e+05   -3.11535e+05    2.98562e+02   -2.37059e+02
> > Pressure (bar)   Constr. rmsd
> >    4.08107e+01    9.30833e-06
> >
> >
> > Thank you in advance for any help. Please let me know if any additional
> > information is needed.
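> >
> > One piece of additional information that often helps with a "core dumped"
> > crash is a backtrace from the core file. A minimal sketch (this assumes
> > core dumps are enabled in the shell and that the core file ends up in the
> > working directory; the actual location depends on the system's
> > core_pattern setting):
> >
> >   # allow a core file to be written, then restart from the checkpoint
> >   ulimit -c unlimited
> >   gmx mdrun -v -deffnm md_0_1 -cpi md_0_1.cpt
> >
> >   # after the segfault, load the core file and print the stack trace
> >   gdb $(which gmx) core
> >   (gdb) bt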
> >
> > Best regards,
> > Krzysztof
> >
> >
> >
> >