[gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)

Alex nedomacho at gmail.com
Fri May 10 02:31:46 CEST 2019


Yup, your assessment agrees with our guess. Our HPC guru will be taking his
findings, along with your quote, to the admins.

Thank you,

Alex

On Thu, May 9, 2019 at 2:51 PM Szilárd Páll <pall.szilard at gmail.com> wrote:

> On Thu, May 9, 2019 at 10:01 PM Alex <nedomacho at gmail.com> wrote:
>
> > Okay, we're positively unable to run a GROMACS (2019.1) test on Power9.
> > The test procedure is simple, using Slurm:
> > 1. Request an interactive session: srun -N 1 -n 20 --pty
> >    --partition=debug --time=1:00:00 --gres=gpu:1 bash
> > 2. Load the CUDA module: module load cuda
> > 3. Run the test batch. It starts with a CPU-only static energy
> >    minimization (EM) which, despite the mdrun options (repeated just
> >    below for reference), runs on a single thread. Any help will be
> >    highly appreciated.
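> >
> > For reference, the EM step of the test batch boils down to these two
> > commands (the mdrun invocation is exactly the one recorded in the md.log
> > below; input/output file names are from our test case):
> >
> >   module load cuda
> >   gmx mdrun -pin on -pinstride 2 -ntomp 4 -ntmpi 4 -pme cpu -nb cpu \
> >       -s em.tpr -o traj.trr -g md.log -c after_em.pdb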
> >
> >  md.log below:
> >
> > GROMACS:      gmx mdrun, version 2019.1
> > Executable:   /home/reida/ppc64le/stow/gromacs/bin/gmx
> > Data prefix:  /home/reida/ppc64le/stow/gromacs
> > Working dir:  /home/smolyan/gmx_test1
> > Process ID:   115831
> > Command line:
> >   gmx mdrun -pin on -pinstride 2 -ntomp 4 -ntmpi 4 -pme cpu -nb cpu -s
> > em.tpr -o traj.trr -g md.log -c after_em.pdb
> >
> > GROMACS version:    2019.1
> > Precision:          single
> > Memory model:       64 bit
> > MPI library:        thread_mpi
> > OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
> > GPU support:        CUDA
> > SIMD instructions:  IBM_VSX
> > FFT library:        fftw-3.3.8
> > RDTSCP usage:       disabled
> > TNG support:        enabled
> > Hwloc support:      hwloc-1.11.8
> > Tracing support:    disabled
> > C compiler:         /opt/rh/devtoolset-7/root/usr/bin/cc GNU 7.3.1
> > C compiler flags:   -mcpu=power9 -mtune=power9  -mvsx     -O2 -DNDEBUG
> > -funroll-all-loops -fexcess-precision=fast
> > C++ compiler:       /opt/rh/devtoolset-7/root/usr/bin/c++ GNU 7.3.1
> > C++ compiler flags: -mcpu=power9 -mtune=power9  -mvsx    -std=c++11   -O2
> > -DNDEBUG -funroll-all-loops -fexcess-precision=fast
> > CUDA compiler:      /usr/local/cuda-10.0/bin/nvcc nvcc: NVIDIA (R) Cuda
> > compiler driver;Copyright (c) 2005-2018 NVIDIA Corporation;Built on
> > Sat_Aug_25_21:10:00_CDT_2018;Cuda compilation tools, release 10.0,
> > V10.0.130
> > CUDA compiler flags:
> > -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;
> > -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;
> > -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;
> > -gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;
> > -gencode;arch=compute_75,code=compute_75;-use_fast_math;;;
> > -mcpu=power9;-mtune=power9;-mvsx;-std=c++11;-O2;-DNDEBUG;
> > -funroll-all-loops;-fexcess-precision=fast;
> > CUDA driver:        10.10
> > CUDA runtime:       10.0
> >
> >
> > Running on 1 node with total 160 cores, 160 logical cores, 1 compatible GPU
> > Hardware detected:
> >   CPU info:
> >     Vendor: IBM
> >     Brand:  POWER9, altivec supported
> >     Family: 0   Model: 0   Stepping: 0
> >     Features: vmx vsx
> >   Hardware topology: Only logical processor count
> >   GPU info:
> >     Number of GPUs detected: 1
> >     #0: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat:
> > compatible
> >
> >
> > ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> >
> > *SKIPPED*
> >
> > Input Parameters:
> >    integrator                     = steep
> >    tinit                          = 0
> >    dt                             = 0.001
> >    nsteps                         = 50000
> >    init-step                      = 0
> >    simulation-part                = 1
> >    comm-mode                      = Linear
> >    nstcomm                        = 100
> >    bd-fric                        = 0
> >    ld-seed                        = 1941752878
> >    emtol                          = 100
> >    emstep                         = 0.01
> >    niter                          = 20
> >    fcstep                         = 0
> >    nstcgsteep                     = 1000
> >    nbfgscorr                      = 10
> >    rtpi                           = 0.05
> >    nstxout                        = 0
> >    nstvout                        = 0
> >    nstfout                        = 0
> >    nstlog                         = 1000
> >    nstcalcenergy                  = 100
> >    nstenergy                      = 1000
> >    nstxout-compressed             = 0
> >    compressed-x-precision         = 1000
> >    cutoff-scheme                  = Verlet
> >    nstlist                        = 1
> >    ns-type                        = Grid
> >    pbc                            = xyz
> >    periodic-molecules             = true
> >    verlet-buffer-tolerance        = 0.005
> >    rlist                          = 1.2
> >    coulombtype                    = PME
> >    coulomb-modifier               = Potential-shift
> >    rcoulomb-switch                = 0
> >    rcoulomb                       = 1.2
> >    epsilon-r                      = 1
> >    epsilon-rf                     = inf
> >    vdw-type                       = Cut-off
> >    vdw-modifier                   = Potential-shift
> >    rvdw-switch                    = 0
> >    rvdw                           = 1.2
> >    DispCorr                       = No
> >    table-extension                = 1
> >    fourierspacing                 = 0.12
> >    fourier-nx                     = 52
> >    fourier-ny                     = 52
> >    fourier-nz                     = 52
> >    pme-order                      = 4
> >    ewald-rtol                     = 1e-05
> >    ewald-rtol-lj                  = 0.001
> >    lj-pme-comb-rule               = Geometric
> >    ewald-geometry                 = 0
> >    epsilon-surface                = 0
> >    tcoupl                         = No
> >    nsttcouple                     = -1
> >    nh-chain-length                = 0
> >    print-nose-hoover-chain-variables = false
> >    pcoupl                         = No
> >    pcoupltype                     = Isotropic
> >    nstpcouple                     = -1
> >    tau-p                          = 1
> >    compressibility (3x3):
> >       compressibility[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
> >       compressibility[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
> >       compressibility[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
> >    ref-p (3x3):
> >       ref-p[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
> >       ref-p[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
> >       ref-p[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
> >    refcoord-scaling               = No
> >    posres-com (3):
> >       posres-com[0]= 0.00000e+00
> >       posres-com[1]= 0.00000e+00
> >       posres-com[2]= 0.00000e+00
> >    posres-comB (3):
> >       posres-comB[0]= 0.00000e+00
> >       posres-comB[1]= 0.00000e+00
> >       posres-comB[2]= 0.00000e+00
> >    QMMM                           = false
> >    QMconstraints                  = 0
> >    QMMMscheme                     = 0
> >    MMChargeScaleFactor            = 1
> > qm-opts:
> >    ngQM                           = 0
> >    constraint-algorithm           = Lincs
> >    continuation                   = false
> >    Shake-SOR                      = false
> >    shake-tol                      = 0.0001
> >    lincs-order                    = 4
> >    lincs-iter                     = 1
> >    lincs-warnangle                = 30
> >    nwall                          = 0
> >    wall-type                      = 9-3
> >    wall-r-linpot                  = -1
> >    wall-atomtype[0]               = -1
> >    wall-atomtype[1]               = -1
> >    wall-density[0]                = 0
> >    wall-density[1]                = 0
> >    wall-ewald-zfac                = 3
> >    pull                           = false
> >    awh                            = false
> >    rotation                       = false
> >    interactiveMD                  = false
> >    disre                          = No
> >    disre-weighting                = Conservative
> >    disre-mixed                    = false
> >    dr-fc                          = 1000
> >    dr-tau                         = 0
> >    nstdisreout                    = 100
> >    orire-fc                       = 0
> >    orire-tau                      = 0
> >    nstorireout                    = 100
> >    free-energy                    = no
> >    cos-acceleration               = 0
> >    deform (3x3):
> >       deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
> >       deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
> >       deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
> >    simulated-tempering            = false
> >    swapcoords                     = no
> >    userint1                       = 0
> >    userint2                       = 0
> >    userint3                       = 0
> >    userint4                       = 0
> >    userreal1                      = 0
> >    userreal2                      = 0
> >    userreal3                      = 0
> >    userreal4                      = 0
> >    applied-forces:
> >      electric-field:
> >        x:
> >          E0                       = 0
> >          omega                    = 0
> >          t0                       = 0
> >          sigma                    = 0
> >        y:
> >          E0                       = 0
> >          omega                    = 0
> >          t0                       = 0
> >          sigma                    = 0
> >        z:
> >          E0                       = 0
> >          omega                    = 0
> >          t0                       = 0
> >          sigma                    = 0
> > grpopts:
> >    nrdf:       47805
> >    ref-t:           0
> >    tau-t:           0
> > annealing:          No
> > annealing-npoints:           0
> >    acc:            0           0           0
> >    nfreeze:           N           N           N
> >    energygrp-flags[  0]: 0
> >
> >
> > Initializing Domain Decomposition on 4 ranks
> > NOTE: disabling dynamic load balancing as it is only supported with
> > dynamics, not with integrator 'steep'.
> > Dynamic load balancing: auto
> > Using update groups, nr 10529, average size 2.5 atoms, max. radius 0.078 nm
> > Minimum cell size due to atom displacement: 0.000 nm
> > NOTE: Periodic molecules are present in this system. Because of this, the
> > domain decomposition algorithm cannot easily determine the minimum cell
> > size that it requires for treating bonded interactions. Instead, domain
> > decomposition will assume that half the non-bonded cut-off will be a
> > suitable lower bound.
> > Minimum cell size due to bonded interactions: 0.678 nm
> > Using 0 separate PME ranks, as there are too few total
> >  ranks for efficient splitting
> > Optimizing the DD grid for 4 cells with a minimum initial size of 0.678 nm
> > The maximum allowed number of cells is: X 8 Y 8 Z 8
> > Domain decomposition grid 1 x 4 x 1, separate PME ranks 0
> > PME domain decomposition: 1 x 4 x 1
> > Domain decomposition rank 0, coordinates 0 0 0
> >
> > The initial number of communication pulses is: Y 1
> > The initial domain decomposition cell size is: Y 1.50 nm
> >
> > The maximum allowed distance for atom groups involved in interactions is:
> >                  non-bonded interactions           1.356 nm
> >             two-body bonded interactions  (-rdd)   1.356 nm
> >           multi-body bonded interactions  (-rdd)   1.356 nm
> >               virtual site constructions  (-rcon)  1.503 nm
> >
> > Using 4 MPI threads
> > Using 4 OpenMP threads per tMPI thread
> >
> >
> > Overriding thread affinity set outside gmx mdrun
> >
> > Pinning threads with a user-specified logical core stride of 2
> >
> > NOTE: Thread affinity was not set.
> >
>
> The threads are not pinned (see the note above), but I can't say why. I
> suggest: i) talk to your admins; ii) try telling the job scheduler not to
> set affinities and let mdrun set them itself.
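>
> For ii), with Slurm something along these lines might work (untested on
> your setup; --cpu-bind=none just asks srun not to bind the tasks, the
> other options are taken from your test procedure):
>
>   srun -N 1 -n 20 --cpu-bind=none --partition=debug --time=1:00:00 \
>        --gres=gpu:1 --pty bash
>   # then run mdrun with -pin on as before and check that md.log no longer
>   # contains the "NOTE: Thread affinity was not set." line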
>
>
> > System total charge: 0.000
> > Will do PME sum in reciprocal space for electrostatic interactions.
> >
> > ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> > U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G.
> > Pedersen
> > A smooth particle mesh Ewald method
> > J. Chem. Phys. 103 (1995) pp. 8577-8592
> > -------- -------- --- Thank You --- -------- --------
> >
> > Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
> > Potential shift: LJ r^-12: -1.122e-01 r^-6: -3.349e-01, Ewald -8.333e-06
> > Initialized non-bonded Ewald correction tables, spacing: 1.02e-03 size:
> > 1176
> >
> > Generated table with 1100 data points for 1-4 COUL.
> > Tabscale = 500 points/nm
> > Generated table with 1100 data points for 1-4 LJ6.
> > Tabscale = 500 points/nm
> > Generated table with 1100 data points for 1-4 LJ12.
> > Tabscale = 500 points/nm
> >
> > Using SIMD 4x4 nonbonded short-range kernels
> >
> > Using a 4x4 pair-list setup:
> >   updated every 1 steps, buffer 0.000 nm, rlist 1.200 nm
> >
> > Using geometric Lennard-Jones combination rule
> >
> >
> > ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> > S. Miyamoto and P. A. Kollman
> > SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for
> > Rigid Water Models
> > J. Comp. Chem. 13 (1992) pp. 952-962
> > -------- -------- --- Thank You --- -------- --------
> >
> >
> > Linking all bonded interactions to atoms
> > There are 5407 inter charge-group virtual sites,
> > will perform an extra communication step for selected coordinates and forces
> >
> >
> > Note that activating steepest-descent energy minimization via the
> > integrator .mdp option and the command gmx mdrun may be available in a
> > different form in a future version of GROMACS, e.g. gmx minimize and an
> > .mdp option.
> > Initiating Steepest Descents
> >
> > Atom distribution over 4 domains: av 6687 stddev 134 min 6515 max 6792
> > Started Steepest Descents on rank 0 Thu May  9 15:49:36 2019
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-request at gromacs.org.
>

