[gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)
Szilárd Páll
pall.szilard at gmail.com
Thu May 9 22:51:26 CEST 2019
On Thu, May 9, 2019 at 10:01 PM Alex <nedomacho at gmail.com> wrote:
> Okay, we're positively unable to run a Gromacs (2019.1) test on Power9. The
> test procedure is simple, using slurm:
> 1. Request an interactive session:
>    srun -N 1 -n 20 --pty --partition=debug --time=1:00:00 --gres=gpu:1 bash
> 2. Load CUDA library: module load cuda
> 3. Run the test batch. This starts with a CPU-only static EM, which, despite
> the mdrun options, runs on a single thread. Any help will be highly
> appreciated.
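For reference, those three steps can be combined into one short batch script
along these lines (a sketch only; partition, module name, and file names are
taken from your steps and the log below, everything else is site-specific):

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=20
    #SBATCH --partition=debug
    #SBATCH --time=1:00:00
    #SBATCH --gres=gpu:1
    module load cuda
    # CPU-only EM step of the test batch, same options as in the md.log below
    gmx mdrun -pin on -pinstride 2 -ntomp 4 -ntmpi 4 -pme cpu -nb cpu \
        -s em.tpr -o traj.trr -g md.log -c after_em.pdb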
>
> md.log below:
>
> GROMACS: gmx mdrun, version 2019.1
> Executable: /home/reida/ppc64le/stow/gromacs/bin/gmx
> Data prefix: /home/reida/ppc64le/stow/gromacs
> Working dir: /home/smolyan/gmx_test1
> Process ID: 115831
> Command line:
> gmx mdrun -pin on -pinstride 2 -ntomp 4 -ntmpi 4 -pme cpu -nb cpu -s
> em.tpr -o traj.trr -g md.log -c after_em.pdb
>
> GROMACS version: 2019.1
> Precision: single
> Memory model: 64 bit
> MPI library: thread_mpi
> OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
> GPU support: CUDA
> SIMD instructions: IBM_VSX
> FFT library: fftw-3.3.8
> RDTSCP usage: disabled
> TNG support: enabled
> Hwloc support: hwloc-1.11.8
> Tracing support: disabled
> C compiler: /opt/rh/devtoolset-7/root/usr/bin/cc GNU 7.3.1
> C compiler flags: -mcpu=power9 -mtune=power9 -mvsx -O2 -DNDEBUG
> -funroll-all-loops -fexcess-precision=fast
> C++ compiler: /opt/rh/devtoolset-7/root/usr/bin/c++ GNU 7.3.1
> C++ compiler flags: -mcpu=power9 -mtune=power9 -mvsx -std=c++11 -O2
> -DNDEBUG -funroll-all-loops -fexcess-precision=fast
> CUDA compiler: /usr/local/cuda-10.0/bin/nvcc nvcc: NVIDIA (R) Cuda
> compiler driver;Copyright (c) 2005-2018 NVIDIA Corporation;Built on
> Sat_Aug_25_21:10:00_CDT_2018;Cuda compilation tools, release 10.0,
> V10.0.130
> CUDA compiler flags: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;; -mcpu=power9;-mtune=power9;-mvsx;-std=c++11;-O2;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
> CUDA driver: 10.10
> CUDA runtime: 10.0
>
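Since the thread subject also covers building: a configure step consistent
with the version header above would look roughly like this (a sketch for
GROMACS 2019 on a Power9 + Volta node; the compiler, CUDA toolkit, and install
paths are copied from the log, the rest is an assumption):

    cmake .. \
        -DCMAKE_C_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/cc \
        -DCMAKE_CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++ \
        -DGMX_SIMD=IBM_VSX \
        -DGMX_GPU=ON \
        -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.0 \
        -DCMAKE_INSTALL_PREFIX=/home/reida/ppc64le/stow/gromacs
    make -j 20 && make install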
>
> Running on 1 node with total 160 cores, 160 logical cores, 1 compatible GPU
> Hardware detected:
> CPU info:
> Vendor: IBM
> Brand: POWER9, altivec supported
> Family: 0 Model: 0 Stepping: 0
> Features: vmx vsx
> Hardware topology: Only logical processor count
> GPU info:
> Number of GPUs detected: 1
> #0: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat:
> compatible
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>
> *SKIPPED*
>
> Input Parameters:
> integrator = steep
> tinit = 0
> dt = 0.001
> nsteps = 50000
> init-step = 0
> simulation-part = 1
> comm-mode = Linear
> nstcomm = 100
> bd-fric = 0
> ld-seed = 1941752878
> emtol = 100
> emstep = 0.01
> niter = 20
> fcstep = 0
> nstcgsteep = 1000
> nbfgscorr = 10
> rtpi = 0.05
> nstxout = 0
> nstvout = 0
> nstfout = 0
> nstlog = 1000
> nstcalcenergy = 100
> nstenergy = 1000
> nstxout-compressed = 0
> compressed-x-precision = 1000
> cutoff-scheme = Verlet
> nstlist = 1
> ns-type = Grid
> pbc = xyz
> periodic-molecules = true
> verlet-buffer-tolerance = 0.005
> rlist = 1.2
> coulombtype = PME
> coulomb-modifier = Potential-shift
> rcoulomb-switch = 0
> rcoulomb = 1.2
> epsilon-r = 1
> epsilon-rf = inf
> vdw-type = Cut-off
> vdw-modifier = Potential-shift
> rvdw-switch = 0
> rvdw = 1.2
> DispCorr = No
> table-extension = 1
> fourierspacing = 0.12
> fourier-nx = 52
> fourier-ny = 52
> fourier-nz = 52
> pme-order = 4
> ewald-rtol = 1e-05
> ewald-rtol-lj = 0.001
> lj-pme-comb-rule = Geometric
> ewald-geometry = 0
> epsilon-surface = 0
> tcoupl = No
> nsttcouple = -1
> nh-chain-length = 0
> print-nose-hoover-chain-variables = false
> pcoupl = No
> pcoupltype = Isotropic
> nstpcouple = -1
> tau-p = 1
> compressibility (3x3):
> compressibility[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> compressibility[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> ref-p (3x3):
> ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> refcoord-scaling = No
> posres-com (3):
> posres-com[0]= 0.00000e+00
> posres-com[1]= 0.00000e+00
> posres-com[2]= 0.00000e+00
> posres-comB (3):
> posres-comB[0]= 0.00000e+00
> posres-comB[1]= 0.00000e+00
> posres-comB[2]= 0.00000e+00
> QMMM = false
> QMconstraints = 0
> QMMMscheme = 0
> MMChargeScaleFactor = 1
> qm-opts:
> ngQM = 0
> constraint-algorithm = Lincs
> continuation = false
> Shake-SOR = false
> shake-tol = 0.0001
> lincs-order = 4
> lincs-iter = 1
> lincs-warnangle = 30
> nwall = 0
> wall-type = 9-3
> wall-r-linpot = -1
> wall-atomtype[0] = -1
> wall-atomtype[1] = -1
> wall-density[0] = 0
> wall-density[1] = 0
> wall-ewald-zfac = 3
> pull = false
> awh = false
> rotation = false
> interactiveMD = false
> disre = No
> disre-weighting = Conservative
> disre-mixed = false
> dr-fc = 1000
> dr-tau = 0
> nstdisreout = 100
> orire-fc = 0
> orire-tau = 0
> nstorireout = 100
> free-energy = no
> cos-acceleration = 0
> deform (3x3):
> deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> simulated-tempering = false
> swapcoords = no
> userint1 = 0
> userint2 = 0
> userint3 = 0
> userint4 = 0
> userreal1 = 0
> userreal2 = 0
> userreal3 = 0
> userreal4 = 0
> applied-forces:
> electric-field:
> x:
> E0 = 0
> omega = 0
> t0 = 0
> sigma = 0
> y:
> E0 = 0
> omega = 0
> t0 = 0
> sigma = 0
> z:
> E0 = 0
> omega = 0
> t0 = 0
> sigma = 0
> grpopts:
> nrdf: 47805
> ref-t: 0
> tau-t: 0
> annealing: No
> annealing-npoints: 0
> acc: 0 0 0
> nfreeze: N N N
> energygrp-flags[ 0]: 0
>
>
> Initializing Domain Decomposition on 4 ranks
> NOTE: disabling dynamic load balancing as it is only supported with
> dynamics, not with integrator 'steep'.
> Dynamic load balancing: auto
> Using update groups, nr 10529, average size 2.5 atoms, max. radius 0.078 nm
> Minimum cell size due to atom displacement: 0.000 nm
> NOTE: Periodic molecules are present in this system. Because of this, the
> domain decomposition algorithm cannot easily determine the minimum cell
> size that it requires for treating bonded interactions. Instead, domain
> decomposition will assume that half the non-bonded cut-off will be a
> suitable lower bound.
> Minimum cell size due to bonded interactions: 0.678 nm
> Using 0 separate PME ranks, as there are too few total
> ranks for efficient splitting
> Optimizing the DD grid for 4 cells with a minimum initial size of 0.678 nm
> The maximum allowed number of cells is: X 8 Y 8 Z 8
> Domain decomposition grid 1 x 4 x 1, separate PME ranks 0
> PME domain decomposition: 1 x 4 x 1
> Domain decomposition rank 0, coordinates 0 0 0
>
> The initial number of communication pulses is: Y 1
> The initial domain decomposition cell size is: Y 1.50 nm
>
> The maximum allowed distance for atom groups involved in interactions is:
> non-bonded interactions 1.356 nm
> two-body bonded interactions (-rdd) 1.356 nm
> multi-body bonded interactions (-rdd) 1.356 nm
> virtual site constructions (-rcon) 1.503 nm
>
> Using 4 MPI threads
> Using 4 OpenMP threads per tMPI thread
>
>
> Overriding thread affinity set outside gmx mdrun
>
> Pinning threads with a user-specified logical core stride of 2
>
> NOTE: Thread affinity was not set.
>
The threads are not pinned (see the note above), but I can't say why. I
suggest: i) talk to your admins; ii) try telling the job scheduler not to set
affinities and let mdrun set them.
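For (ii), with Slurm that usually means disabling its CPU binding, e.g.
(a sketch; the spelling is --cpu_bind in older Slurm releases, and the right
value can depend on how your site configured task affinity):

    srun -N 1 -n 20 --cpu-bind=none --pty \
        --partition=debug --time=1:00:00 --gres=gpu:1 bash
    # inside the session, let mdrun do the pinning as before:
    gmx mdrun -pin on -pinstride 2 -ntomp 4 -ntmpi 4 -pme cpu -nb cpu \
        -s em.tpr -o traj.trr -g md.log -c after_em.pdb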
> System total charge: 0.000
> Will do PME sum in reciprocal space for electrostatic interactions.
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G.
> Pedersen
> A smooth particle mesh Ewald method
> J. Chem. Phys. 103 (1995) pp. 8577-8592
> -------- -------- --- Thank You --- -------- --------
>
> Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
> Potential shift: LJ r^-12: -1.122e-01 r^-6: -3.349e-01, Ewald -8.333e-06
> Initialized non-bonded Ewald correction tables, spacing: 1.02e-03 size:
> 1176
>
> Generated table with 1100 data points for 1-4 COUL.
> Tabscale = 500 points/nm
> Generated table with 1100 data points for 1-4 LJ6.
> Tabscale = 500 points/nm
> Generated table with 1100 data points for 1-4 LJ12.
> Tabscale = 500 points/nm
>
> Using SIMD 4x4 nonbonded short-range kernels
>
> Using a 4x4 pair-list setup:
> updated every 1 steps, buffer 0.000 nm, rlist 1.200 nm
>
> Using geometric Lennard-Jones combination rule
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> S. Miyamoto and P. A. Kollman
> SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
> Water Models
> J. Comp. Chem. 13 (1992) pp. 952-962
> -------- -------- --- Thank You --- -------- --------
>
>
> Linking all bonded interactions to atoms
> There are 5407 inter charge-group virtual sites,
> an extra communication step will be performed for selected coordinates and forces
>
>
> Note that activating steepest-descent energy minimization via the
> integrator .mdp option and the command gmx mdrun may be available in a
> different form in a future version of GROMACS, e.g. gmx minimize and an
> .mdp option.
> Initiating Steepest Descents
>
> Atom distribution over 4 domains: av 6687 stddev 134 min 6515 max 6792
> Started Steepest Descents on rank 0 Thu May 9 15:49:36 2019