[gmx-users] Assertion failed with single precision only

Fri Feb 1 13:41:01 CET 2019

Hi,
I was trying to run an MD simulation in both single precision and double precision. The double-precision run completes normally, but the single-precision run was terminated with an ‘Assertion failed’ message.
Any help to resolve this is welcome.
Best regards,
Prithwish

------------------------------------------------------------
The screen output is:
------------------------------------------------------------
comm-mode angular will give incorrect results when the comm group partially crosses a periodic boundary
Using 40 MPI processes
Using 1 OpenMP thread per MPI process

Non-default thread affinity set probably by the OpenMP library,
disabling internal thread affinity

WARNING: This run will generate roughly 11567 Mb of data

Program:     gmx mdrun, version 2018.4
Source file: src/gromacs/mdlib/vcm.cpp (line 394)
Function:    void do_stopcm_grp(const t_mdatoms &, float (*)[3], float (*)[3], const t_vcm &)
MPI rank:    5 (out of 40)

Assertion failed:
Condition: x
Need x to compute angular momentum correction

------------------------------------------------------------
The log file is as follows:
------------------------------------------------------------
C compiler flags:    -xCORE-AVX512 -qopt-zmm-usage=high   -mkl=sequential  -std=gnu99   -ip -funroll-all-loops -alias-const -ansi-alias -no-prec-div -fimf-domain-exclusion=14 -qoverride-limits
C++ compiler flags:  -xCORE-AVX512 -qopt-zmm-usage=high   -mkl=sequential  -std=c++11    -ip -funroll-all-loops -alias-const -ansi-alias -no-prec-div -fimf-domain-exclusion=14 -qoverride-limits

Running on 1 node with total 40 cores, 40 logical cores
Hardware detected on host n12 (the node of MPI rank 0):
  CPU info:
    Vendor: Intel
    Brand:  Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
    Family: 6   Model: 85   Stepping: 4
    Features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
    Number of AVX-512 FMA units: 2
  Hardware topology: Full, with devices
    Sockets, cores, and logical processors:
      Socket  0: [   0] [   1] [   2] [   3] [   4] [   5] [   6] [   7] [   8] [   9] [  10] [  11] [  12] [  13] [  14] [  15] [  16] [  17] [  18] [  19]
      Socket  1: [  20] [  21] [  22] [  23] [  24] [  25] [  26] [  27] [  28] [  29] [  30] [  31] [  32] [  33] [  34] [  35] [  36] [  37] [  38] [  39]
    Numa nodes:
      Node  0 (101696126976 bytes mem):   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
      Node  1 (103079215104 bytes mem):  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39
      Latency:
               0     1
         0  1.00  2.10
         1  2.10  1.00
    Caches:
      L1: 32768 bytes, linesize 64 bytes, assoc. 8, shared 1 ways
      L2: 1048576 bytes, linesize 64 bytes, assoc. 16, shared 1 ways
      L3: 28835840 bytes, linesize 64 bytes, assoc. 11, shared 20 ways
    PCI devices:
      0000:00:11.5  Id: 8086:a1d2  Class: 0x0106  Numa: 0
      0000:00:17.0  Id: 8086:a182  Class: 0x0106  Numa: 0
      0000:02:00.0  Id: 1a03:2000  Class: 0x0300  Numa: 0
      0000:18:00.0  Id: 8086:1563  Class: 0x0200  Numa: 0
      0000:18:00.1  Id: 8086:1563  Class: 0x0200  Numa: 0
      0000:5e:00.0  Id: 8086:24f0  Class: 0x0208  Numa: 0

Input Parameters:
   integrator                     = md
   tinit                          = 0
   dt                             = 0.00333333
   nsteps                         = 300000000
   init-step                      = 0
   simulation-part                = 1
   comm-mode                      = Angular
   nstcomm                        = 3000
   bd-fric                        = 0
   ld-seed                        = 3420309074
   emtol                          = 10
   emstep                         = 0.01
   niter                          = 20
   fcstep                         = 0
   nstcgsteep                     = 1000
   nbfgscorr                      = 10
   rtpi                           = 0.05
   nstxout                        = 3000
   nstvout                        = 0
   nstfout                        = 0
   nstlog                         = 3000
   nstcalcenergy                  = 3000
   nstenergy                      = 3000
   nstxout-compressed             = 0
   compressed-x-precision         = 1000
   cutoff-scheme                  = Verlet
   nstlist                        = 10
   ns-type                        = Grid
   pbc                            = xyz
   periodic-molecules             = false
   verlet-buffer-tolerance        = 0.005
   rlist                          = 0.8
   coulombtype                    = PME
   coulomb-modifier               = Potential-shift
   rcoulomb-switch                = 0
   rcoulomb                       = 0.8
   epsilon-r                      = 1
   epsilon-rf                     = inf
   vdw-type                       = Cut-off
   vdw-modifier                   = Potential-shift
   rvdw-switch                    = 0
   rvdw                           = 0.8
   DispCorr                       = No
   table-extension                = 1
   fourierspacing                 = 0
   fourier-nx                     = 8
   fourier-ny                     = 8
   fourier-nz                     = 8
   pme-order                      = 3
   ewald-rtol                     = 1e-05
   ewald-rtol-lj                  = 0.001
   lj-pme-comb-rule               = Geometric
   ewald-geometry                 = 0
   epsilon-surface                = 0
   implicit-solvent               = No
   gb-algorithm                   = Still
   nstgbradii                     = 1
   rgbradii                       = 1
   gb-epsilon-solvent             = 80
   gb-saltconc                    = 0
   gb-obc-alpha                   = 1
   gb-obc-beta                    = 0.8
   gb-obc-gamma                   = 4.85
   gb-dielectric-offset           = 0.009
   sa-algorithm                   = Ace-approximation
   sa-surface-tension             = 2.05016
   tcoupl                         = Nose-Hoover
   nsttcouple                     = 10
   nh-chain-length                = 1
   print-nose-hoover-chain-variables = false
   pcoupl                         = No
   pcoupltype                     = Isotropic
   nstpcouple                     = -1
   tau-p                          = 1
   compressibility (3x3):
      compressibility[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   ref-p (3x3):
      ref-p[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   refcoord-scaling               = No
   posres-com (3):
      posres-com[0]= 0.00000e+00
      posres-com[1]= 0.00000e+00
      posres-com[2]= 0.00000e+00
   posres-comB (3):
      posres-comB[0]= 0.00000e+00
      posres-comB[1]= 0.00000e+00
      posres-comB[2]= 0.00000e+00
   QMMM                           = false
   QMconstraints                  = 0
   QMMMscheme                     = 0
   MMChargeScaleFactor            = 1
qm-opts:
   ngQM                           = 0
   constraint-algorithm           = Lincs
   continuation                   = false
   Shake-SOR                      = false
   shake-tol                      = 0.0001
   lincs-order                    = 4
   lincs-iter                     = 2
   lincs-warnangle                = 30
   nwall                          = 0
   wall-type                      = 9-3
   wall-r-linpot                  = -1
   wall-atomtype[0]               = -1
   wall-atomtype[1]               = -1
   wall-density[0]                = 0
   wall-density[1]                = 0
   wall-ewald-zfac                = 3
   pull                           = false
   awh                            = false
   rotation                       = false
   interactiveMD                  = false
   disre                          = No
   disre-weighting                = Conservative
   disre-mixed                    = false
   dr-fc                          = 1000
   dr-tau                         = 0
   nstdisreout                    = 100
   orire-fc                       = 0
   orire-tau                      = 0
   nstorireout                    = 100
   free-energy                    = no
   cos-acceleration               = 0
   deform (3x3):
      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   simulated-tempering            = false
   swapcoords                     = no
   userint1                       = 0
   userint2                       = 0
   userint3                       = 0
   userint4                       = 0
   userreal1                      = 0
   userreal2                      = 0
   userreal3                      = 0
   userreal4                      = 0
   applied-forces:
     electric-field:
grpopts:
   nrdf:       14994
   ref-t:         150
   tau-t:           1
annealing:          No
annealing-npoints:           0
   acc:            0           0           0
   nfreeze:           N           N           N
   energygrp-flags[  0]: 0

Changing nstlist from 10 to 100, rlist from 0.8 to 0.8

Initializing Domain Decomposition on 40 ranks
Dynamic load balancing: off
Minimum cell size due to atom displacement: 0.761 nm
Initial maximum inter charge-group distances:
    two-body bonded interactions: 0.153 nm, Exclusion, atoms 8854 8855
Minimum cell size due to bonded interactions: 0.000 nm
Guess for relative PME load: 0.81
Using 0 separate PME ranks, as guessed by mdrun
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 40 cells with a minimum initial size of 0.951 nm
The maximum allowed number of cells is: X 25 Y 25 Z 25
Domain decomposition grid 4 x 5 x 2, separate PME ranks 0
PME domain decomposition: 4 x 10 x 1

comm-mode angular will give incorrect results when the comm group partially crosses a periodic boundary
Domain decomposition rank 0, coordinates 0 0 0

The initial number of communication pulses is: X 1 Y 1 Z 1
The initial domain decomposition cell size is: X 6.12 nm Y 4.90 nm Z 12.25 nm

The maximum allowed distance for charge groups involved in interactions is:
                 non-bonded interactions           0.800 nm
            two-body bonded interactions  (-rdd)   0.800 nm
          multi-body bonded interactions  (-rdd)   0.800 nm
              virtual site constructions  (-rcon)  4.900 nm
  atoms separated by up to 5 constraints  (-rcon)  4.900 nm

When dynamic load balancing gets turned on, these settings will change to:
The maximum number of communication pulses is: X 1 Y 1 Z 1
The minimum size for domain decomposition cells is 0.800 nm
The requested allowed shrink of DD cells (option -dds) is: 0.80
The allowed shrink of domain decomposition cells is: X 0.13 Y 0.16 Z 0.07
The maximum allowed distance for charge groups involved in interactions is:
                 non-bonded interactions           0.800 nm
            two-body bonded interactions  (-rdd)   0.800 nm
          multi-body bonded interactions  (-rdd)   0.800 nm
              virtual site constructions  (-rcon)  0.800 nm

Using 40 MPI processes
Using 1 OpenMP thread per MPI process

Non-default thread affinity set probably by the OpenMP library,
disabling internal thread affinity
System total charge: 0.000
Will do PME sum in reciprocal space for electrostatic interactions.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Using a Gaussian width (1/beta) of 0.25613 nm for Ewald
Potential shift: LJ r^-12: -1.455e+01 r^-6: -3.815e+00, Ewald -1.250e-05
Initialized non-bonded Ewald correction tables, spacing: 8.35e-04 size: 960

Using SIMD 4x8 nonbonded short-range kernels

Using a 4x8 pair-list setup:
  updated every 100 steps, buffer 0.000 nm, rlist 0.800 nm
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would be:
  updated every 100 steps, buffer 0.000 nm, rlist 0.800 nm

Using geometric Lennard-Jones combination rule

Removing pbc first time

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962