[gmx-users] Gromacs 2018.3 with CUDA - segmentation fault (core dumped)

Benson Muite benson.muite at ut.ee
Tue Nov 6 11:27:59 CET 2018


Hi Krzysztof,

Not sure I can be helpful. Possible starting questions:

1) Have you tried other builds such as 2019-beta1 or 2018.2?

2) Have you tried the CPU-only version?

3) When you restart, are the results reasonable?

4) Are you able to monitor memory utilization?

5) Are you able to compile and run with debug information?
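
For question 4, a minimal sketch of one way to log host memory use over a long run, assuming a Linux machine where /proc/<pid>/status is available. The function names and the 60 s sampling interval are illustrative choices, not part of GROMACS. It only tracks host (CPU-side) resident memory; GPU memory would need a separate tool such as nvidia-smi.

```python
import os
import sys
import time


def parse_vmrss_kib(status_text):
    """Return the VmRSS value (in KiB) from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:      123456 kB"
            return int(line.split()[1])
    return None


def watch(pid, interval_s=60.0, out=sys.stdout):
    """Print a timestamped RSS sample every interval_s seconds until the process exits."""
    path = f"/proc/{pid}/status"
    while os.path.exists(path):
        try:
            with open(path) as fh:
                rss = parse_vmrss_kib(fh.read())
        except OSError:
            break  # the process exited between the existence check and the read
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} pid={pid} VmRSS_kiB={rss}", file=out, flush=True)
        time.sleep(interval_s)


if __name__ == "__main__":
    # Usage (hypothetical script name): python watch_rss.py <pid of gmx mdrun>
    if len(sys.argv) > 1:
        watch(int(sys.argv[1]))
```

If the logged value climbs steadily until the crash, that would point towards a memory leak; a flat profile would make that explanation less likely.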

Benson

On 11/6/18 11:55 AM, Krzysztof Kolman wrote:
> Dear Gromacs Users,
>
> I just wanted to add some additional information. After a restart, the
> simulation crashed (again with a segmentation fault) after the same time
> interval, which is 12 h and 22500000 steps (so now I am at 45000000 steps out
> of 50000000). I think this observation indicates that the crash is not
> related to an unstable simulation but to some kind of software issue.
>
> Kind regards,
> Krzysztof
>
> On Mon, 5 Nov 2018 at 21:12, Krzysztof Kolman <krzysztof.kolman at gmail.com>
> wrote:
>
>> Dear Gromacs Users,
>>
>> I have a problem with Gromacs 2018.3: it keeps crashing with a
>> segmentation fault after quite long simulation times (more than 12 h wall
>> clock). It is hard for me to tell why, because the only information given
>> is the segmentation fault message. Please find below shortened output
>> from the log file:
>> Command line:
>>   gmx mdrun -v -deffnm md_0_1
>>
>> GROMACS version:    2018.3
>> Precision:          single
>> Memory model:       64 bit
>> MPI library:        thread_mpi
>> OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
>> GPU support:        CUDA
>> SIMD instructions:  AVX2_256
>> FFT library:        fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
>> RDTSCP usage:       enabled
>> TNG support:        enabled
>> Hwloc support:      disabled
>> Tracing support:    disabled
>> Built on:           2018-10-17 19:53:24
>> Built by:           kolman at kolman-B85-HD3 [CMAKE]
>> Build OS/arch:      Linux 4.15.0-36-generic x86_64
>> Build CPU vendor:   Intel
>> Build CPU brand:    Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
>> Build CPU family:   6   Model: 60   Stepping: 3
>> Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt
>> intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd
>> rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
>> C compiler:         /usr/bin/gcc-6 GNU 6.4.0
>> C compiler flags:    -march=core-avx2     -O3 -DNDEBUG -funroll-all-loops
>> -fexcess-precision=fast
>> C++ compiler:       /usr/bin/g++-6 GNU 6.4.0
>> C++ compiler flags:  -march=core-avx2    -std=c++11   -O3 -DNDEBUG
>> -funroll-all-loops -fexcess-precision=fast
>> CUDA compiler:      /usr/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
>> driver;Copyright (c) 2005-2017 NVIDIA Corporation;Built on
>> Fri_Nov__3_21:07:56_CDT_2017;Cuda compilation tools, release 9.1, V9.1.85
>> CUDA compiler
>> flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;-D_FORCE_INLINES;;
>> ;-march=core-avx2;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
>> CUDA driver:        9.10
>> CUDA runtime:       9.10
>>
>>
>> Running on 1 node with total 4 cores, 8 logical cores, 1 compatible GPU
>> Hardware detected:
>>   CPU info:
>>     Vendor: Intel
>>     Brand:  Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
>>     Family: 6   Model: 60   Stepping: 3
>>     Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt intel
>> lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp
>> sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
>>   Hardware topology: Basic
>>     Sockets, cores, and logical processors:
>>       Socket  0: [   0   4] [   1   5] [   2   6] [   3   7]
>>   GPU info:
>>     Number of GPUs detected: 1
>>     #0: NVIDIA GeForce GTX 770, compute cap.: 3.0, ECC:  no, stat:
>> compatible
>> ...
>>
>> Input Parameters:
>>    integrator                     = md
>>    tinit                          = 0
>>    dt                             = 0.002
>>    nsteps                         = 50000000
>>    init-step                      = 0
>>    simulation-part                = 1
>>    comm-mode                      = Linear
>>    nstcomm                        = 100
>>    bd-fric                        = 0
>>    ld-seed                        = -105855329
>>    emtol                          = 10
>>    emstep                         = 0.01
>>    niter                          = 20
>>    fcstep                         = 0
>>    nstcgsteep                     = 1000
>>    nbfgscorr                      = 10
>>    rtpi                           = 0.05
>>    nstxout                        = 500000
>>    nstvout                        = 500000
>>    nstfout                        = 0
>>    nstlog                         = 500000
>>    nstcalcenergy                  = 100
>>    nstenergy                      = 50000
>>    nstxout-compressed             = 50000
>>    compressed-x-precision         = 1000
>>    cutoff-scheme                  = Verlet
>>    nstlist                        = 10
>>    ns-type                        = Grid
>>    pbc                            = xyz
>>    periodic-molecules             = false
>>    verlet-buffer-tolerance        = 0.005
>>    rlist                          = 1
>>    coulombtype                    = PME
>>    coulomb-modifier               = Potential-shift
>>    rcoulomb-switch                = 0
>>    rcoulomb                       = 1
>>    epsilon-r                      = 1
>>    epsilon-rf                     = inf
>>    vdw-type                       = Cut-off
>>    vdw-modifier                   = Potential-shift
>>    rvdw-switch                    = 0
>>    rvdw                           = 1
>>    DispCorr                       = EnerPres
>>    table-extension                = 1
>>    fourierspacing                 = 0.118
>>    fourier-nx                     = 52
>>    fourier-ny                     = 52
>>    fourier-nz                     = 52
>>    pme-order                      = 4
>>    ewald-rtol                     = 1e-05
>>    ewald-rtol-lj                  = 0.001
>>    lj-pme-comb-rule               = Geometric
>>    ewald-geometry                 = 0
>>    epsilon-surface                = 0
>>    implicit-solvent               = No
>>    gb-algorithm                   = Still
>>    nstgbradii                     = 1
>>    rgbradii                       = 1
>>    gb-epsilon-solvent             = 80
>>    gb-saltconc                    = 0
>>    gb-obc-alpha                   = 1
>>    gb-obc-beta                    = 0.8
>>    gb-obc-gamma                   = 4.85
>>    gb-dielectric-offset           = 0.009
>>    sa-algorithm                   = Ace-approximation
>>    sa-surface-tension             = 2.05016
>>    tcoupl                         = V-rescale
>>    nsttcouple                     = 10
>>    nh-chain-length                = 0
>>    print-nose-hoover-chain-variables = false
>>    pcoupl                         = Parrinello-Rahman
>>    pcoupltype                     = Isotropic
>>    nstpcouple                     = 10
>>    tau-p                          = 1
>>    compressibility (3x3):
>>       compressibility[    0]={ 4.50000e-05,  0.00000e+00,  0.00000e+00}
>>       compressibility[    1]={ 0.00000e+00,  4.50000e-05,  0.00000e+00}
>>       compressibility[    2]={ 0.00000e+00,  0.00000e+00,  4.50000e-05}
>>    ref-p (3x3):
>>       ref-p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}
>>       ref-p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}
>>       ref-p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}
>>    refcoord-scaling               = COM
>>    posres-com (3):
>>       posres-com[0]= 0.00000e+00
>>       posres-com[1]= 0.00000e+00
>>       posres-com[2]= 0.00000e+00
>>    posres-comB (3):
>>       posres-comB[0]= 0.00000e+00
>>       posres-comB[1]= 0.00000e+00
>>       posres-comB[2]= 0.00000e+00
>>    QMMM                           = false
>>    QMconstraints                  = 0
>>    QMMMscheme                     = 0
>>    MMChargeScaleFactor            = 1
>> qm-opts:
>>    ngQM                           = 0
>>    constraint-algorithm           = Lincs
>>    continuation                   = true
>>    Shake-SOR                      = false
>>    shake-tol                      = 0.0001
>>    lincs-order                    = 4
>>    lincs-iter                     = 1
>>    lincs-warnangle                = 30
>>    nwall                          = 0
>>    wall-type                      = 9-3
>>    wall-r-linpot                  = -1
>>    wall-atomtype[0]               = -1
>>    wall-atomtype[1]               = -1
>>    wall-density[0]                = 0
>>    wall-density[1]                = 0
>>    wall-ewald-zfac                = 3
>>    pull                           = false
>>    awh                            = false
>>    rotation                       = false
>>    interactiveMD                  = false
>>    disre                          = No
>>    disre-weighting                = Conservative
>>    disre-mixed                    = false
>>    dr-fc                          = 1000
>>    dr-tau                         = 0
>>    nstdisreout                    = 100
>>    orire-fc                       = 0
>>    orire-tau                      = 0
>>    nstorireout                    = 100
>>    free-energy                    = no
>>    cos-acceleration               = 0
>>    deform (3x3):
>>       deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
>>       deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
>>       deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
>>    simulated-tempering            = false
>>    swapcoords                     = no
>>    userint1                       = 0
>>    userint2                       = 0
>>    userint3                       = 0
>>    userint4                       = 0
>>    userreal1                      = 0
>>    userreal2                      = 0
>>    userreal3                      = 0
>>    userreal4                      = 0
>>    applied-forces:
>>      electric-field:
>>        x:
>>          E0                       = 0
>>          omega                    = 0
>>          t0                       = 0
>>          sigma                    = 0
>>        y:
>>          E0                       = 0
>>          omega                    = 0
>>          t0                       = 0
>>          sigma                    = 0
>>        z:
>>          E0                       = 0
>>          omega                    = 0
>>          t0                       = 0
>>          sigma                    = 0
>> grpopts:
>>    nrdf:     7859.43     33729.6
>>    ref-t:         300         300
>>    tau-t:         0.1         0.1
>> annealing:          No          No
>> annealing-npoints:           0           0
>>    acc:            0           0           0
>>    nfreeze:           N           N           N
>>    energygrp-flags[  0]: 0
>>
>> Changing nstlist from 10 to 100, rlist from 1 to 1.148
>>
>> Using 1 MPI thread
>> Using 8 OpenMP threads
>>
>> 1 GPU auto-selected for this run.
>> Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
>>   PP:0,PME:0
>> Pinning threads with an auto-selected logical core stride of 1
>> System total charge: 0.000
>> Will do PME sum in reciprocal space for electrostatic interactions.
>> ...
>> Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
>> Potential shift: LJ r^-12: -1.000e+00 r^-6: -1.000e+00, Ewald -1.000e-05
>> Initialized non-bonded Ewald correction tables, spacing: 9.33e-04 size:
>> 1073
>>
>> Long Range LJ corr.: <C6> 3.3459e-04
>> Generated table with 1074 data points for Ewald.
>> Tabscale = 500 points/nm
>> Generated table with 1074 data points for LJ6.
>> Tabscale = 500 points/nm
>> Generated table with 1074 data points for LJ12.
>> Tabscale = 500 points/nm
>> Generated table with 1074 data points for 1-4 COUL.
>> Tabscale = 500 points/nm
>> Generated table with 1074 data points for 1-4 LJ6.
>> Tabscale = 500 points/nm
>> Generated table with 1074 data points for 1-4 LJ12.
>> Tabscale = 500 points/nm
>>
>> Using GPU 8x8 nonbonded short-range kernels
>>
>> Using a dual 8x4 pair-list setup updated with dynamic, rolling pruning:
>>   outer list: updated every 100 steps, buffer 0.148 nm, rlist 1.148 nm
>>   inner list: updated every  12 steps, buffer 0.002 nm, rlist 1.002 nm
>> At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would
>> be:
>>   outer list: updated every 100 steps, buffer 0.305 nm, rlist 1.305 nm
>>   inner list: updated every  12 steps, buffer 0.050 nm, rlist 1.050 nm
>>
>> Using Lorentz-Berthelot Lennard-Jones combination rule
>>
>>
>> Initializing LINear Constraint Solver
>> The number of constraints is 3840
>>
>> There are: 20736 Atoms
>>
>> Started mdrun on rank 0 Sun Nov  4 23:01:29 2018
>>            Step           Time
>>               0        0.00000
>>
>>    Energies (kJ/mol)
>>             U-B    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>>     7.80480e+03    5.27100e+03    8.63175e+01    4.08652e+03    4.83769e+03
>>         LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.      Potential
>>     3.63164e+04   -2.90354e+03   -3.22530e+05    1.96307e+03   -2.65067e+05
>>     Kinetic En.   Total Energy  Conserved En.    Temperature Pres. DC (bar)
>>     5.18776e+04   -2.13190e+05   -2.13177e+05    3.00053e+02   -2.32857e+02
>>  Pressure (bar)   Constr. rmsd
>>    -5.67996e+01    9.57285e-06
>>
>> step  200: timed with pme grid 52 52 52, coulomb cutoff 1.000: 581.8
>> M-cycles
>> step  400: timed with pme grid 44 44 44, coulomb cutoff 1.140: 618.2
>> M-cycles
>> step  600: timed with pme grid 40 40 40, coulomb cutoff 1.254: 692.9
>> M-cycles
>> step  800: timed with pme grid 42 42 42, coulomb cutoff 1.194: 669.0
>> M-cycles
>> step 1000: timed with pme grid 44 44 44, coulomb cutoff 1.140: 630.8
>> M-cycles
>> step 1200: timed with pme grid 48 48 48, coulomb cutoff 1.045: 546.1
>> M-cycles
>> step 1400: timed with pme grid 52 52 52, coulomb cutoff 1.000: 566.0
>> M-cycles
>> step 1600: timed with pme grid 48 48 48, coulomb cutoff 1.045: 546.5
>> M-cycles
>> step 1800: timed with pme grid 52 52 52, coulomb cutoff 1.000: 565.3
>> M-cycles
>>               optimal pme grid 48 48 48, coulomb cutoff 1.045
>>
>> Last checkpoint:
>>
>> Writing checkpoint, step 22388100 at Mon Nov  5 08:31:29 2018
>>
>>
>>            Step           Time
>>        22500000    45000.00000
>>
>>    Energies (kJ/mol)
>>             U-B    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>>     7.74565e+03    5.28043e+03    5.63610e+01    3.87191e+03    4.35044e+03
>>         LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.      Potential
>>     3.61122e+04   -2.92965e+03   -3.24570e+05    1.59058e+03   -2.68492e+05
>>     Kinetic En.   Total Energy  Conserved En.    Temperature Pres. DC (bar)
>>     5.16199e+04   -2.16872e+05   -3.11535e+05    2.98562e+02   -2.37059e+02
>>  Pressure (bar)   Constr. rmsd
>>     4.08107e+01    9.30833e-06
>>
>>
>> Thank you in advance for any help. Please let me know if any additional
>> information is needed.
>>
>> Best regards,
>> Krzysztof
>>
>>
>>
>>


