[gmx-users] Gromacs 2019.2 on Power9 + Volta GPUs (building and running)
Alex
nedomacho at gmail.com
Thu May 9 22:00:28 CEST 2019
Okay, we're positively unable to run a GROMACS (2019.1) test on Power9. The
test procedure is simple, using slurm:
1. Request an interactive session: srun -N 1 -n 20 --pty
--partition=debug --time=1:00:00 --gres=gpu:1 bash
2. Load the CUDA module: module load cuda
3. Run the test batch. It starts with a CPU-only steepest-descent EM which,
despite the mdrun thread options, runs on a single thread (the full command
sequence is sketched below). Any help will be highly appreciated.
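For anyone trying to reproduce, the whole sequence boils down to the
following (partition and module names are obviously site-specific, so
treat this as a sketch):

    # request one node, one GPU, 20 tasks on our debug partition
    srun -N 1 -n 20 --pty --partition=debug --time=1:00:00 --gres=gpu:1 bash
    # load the CUDA toolkit module (name may differ on other sites)
    module load cuda
    # the EM step that misbehaves: 4 thread-MPI ranks x 4 OpenMP threads,
    # nonbonded and PME both forced onto the CPU
    gmx mdrun -pin on -pinstride 2 -ntomp 4 -ntmpi 4 -pme cpu -nb cpu \
        -s em.tpr -o traj.trr -g md.log -c after_em.pdb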
md.log below:
GROMACS: gmx mdrun, version 2019.1
Executable: /home/reida/ppc64le/stow/gromacs/bin/gmx
Data prefix: /home/reida/ppc64le/stow/gromacs
Working dir: /home/smolyan/gmx_test1
Process ID: 115831
Command line:
gmx mdrun -pin on -pinstride 2 -ntomp 4 -ntmpi 4 -pme cpu -nb cpu -s
em.tpr -o traj.trr -g md.log -c after_em.pdb
GROMACS version: 2019.1
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: IBM_VSX
FFT library: fftw-3.3.8
RDTSCP usage: disabled
TNG support: enabled
Hwloc support: hwloc-1.11.8
Tracing support: disabled
C compiler: /opt/rh/devtoolset-7/root/usr/bin/cc GNU 7.3.1
C compiler flags: -mcpu=power9 -mtune=power9 -mvsx -O2 -DNDEBUG
-funroll-all-loops -fexcess-precision=fast
C++ compiler: /opt/rh/devtoolset-7/root/usr/bin/c++ GNU 7.3.1
C++ compiler flags: -mcpu=power9 -mtune=power9 -mvsx -std=c++11 -O2
-DNDEBUG -funroll-all-loops -fexcess-precision=fast
CUDA compiler: /usr/local/cuda-10.0/bin/nvcc nvcc: NVIDIA (R) Cuda
compiler driver;Copyright (c) 2005-2018 NVIDIA Corporation;Built on
Sat_Aug_25_21:10:00_CDT_2018;Cuda compilation tools, release 10.0, V10.0.130
CUDA compiler
flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;;
-mcpu=power9;-mtune=power9;-mvsx;-std=c++11;-O2;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver: 10.10
CUDA runtime: 10.0
Running on 1 node with total 160 cores, 160 logical cores, 1 compatible GPU
Hardware detected:
CPU info:
Vendor: IBM
Brand: POWER9, altivec supported
Family: 0 Model: 0 Stepping: 0
Features: vmx vsx
Hardware topology: Only logical processor count
GPU info:
Number of GPUs detected: 1
#0: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat:
compatible
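(Aside: "Hardware topology: Only logical processor count" above suggests
mdrun could not read the full machine topology from hwloc inside the job.
For comparison, this is roughly how one can inspect what the allocation
actually exposes, using stock hwloc/Linux tools rather than anything
GROMACS-specific:

    # full topology as hwloc sees it inside the srun allocation
    lstopo-no-graphics --no-io
    # logical CPUs this shell is actually allowed to run on
    grep Cpus_allowed_list /proc/self/status

If that list shows only a single CPU, that would point at the job's CPU
binding rather than at mdrun itself.)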
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
*SKIPPED*
Input Parameters:
integrator = steep
tinit = 0
dt = 0.001
nsteps = 50000
init-step = 0
simulation-part = 1
comm-mode = Linear
nstcomm = 100
bd-fric = 0
ld-seed = 1941752878
emtol = 100
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 0
nstvout = 0
nstfout = 0
nstlog = 1000
nstcalcenergy = 100
nstenergy = 1000
nstxout-compressed = 0
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 1
ns-type = Grid
pbc = xyz
periodic-molecules = true
verlet-buffer-tolerance = 0.005
rlist = 1.2
coulombtype = PME
coulomb-modifier = Potential-shift
rcoulomb-switch = 0
rcoulomb = 1.2
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Potential-shift
rvdw-switch = 0
rvdw = 1.2
DispCorr = No
table-extension = 1
fourierspacing = 0.12
fourier-nx = 52
fourier-ny = 52
fourier-nz = 52
pme-order = 4
ewald-rtol = 1e-05
ewald-rtol-lj = 0.001
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
tcoupl = No
nsttcouple = -1
nh-chain-length = 0
print-nose-hoover-chain-variables = false
pcoupl = No
pcoupltype = Isotropic
nstpcouple = -1
tau-p = 1
compressibility (3x3):
compressibility[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
compressibility[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p (3x3):
ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
refcoord-scaling = No
posres-com (3):
posres-com[0]= 0.00000e+00
posres-com[1]= 0.00000e+00
posres-com[2]= 0.00000e+00
posres-comB (3):
posres-comB[0]= 0.00000e+00
posres-comB[1]= 0.00000e+00
posres-comB[2]= 0.00000e+00
QMMM = false
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Lincs
continuation = false
Shake-SOR = false
shake-tol = 0.0001
lincs-order = 4
lincs-iter = 1
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = false
awh = false
rotation = false
interactiveMD = false
disre = No
disre-weighting = Conservative
disre-mixed = false
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = no
cos-acceleration = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
simulated-tempering = false
swapcoords = no
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
applied-forces:
electric-field:
x:
E0 = 0
omega = 0
t0 = 0
sigma = 0
y:
E0 = 0
omega = 0
t0 = 0
sigma = 0
z:
E0 = 0
omega = 0
t0 = 0
sigma = 0
grpopts:
nrdf: 47805
ref-t: 0
tau-t: 0
annealing: No
annealing-npoints: 0
acc: 0 0 0
nfreeze: N N N
energygrp-flags[ 0]: 0
Initializing Domain Decomposition on 4 ranks
NOTE: disabling dynamic load balancing as it is only supported with
dynamics, not with integrator 'steep'.
Dynamic load balancing: auto
Using update groups, nr 10529, average size 2.5 atoms, max. radius 0.078 nm
Minimum cell size due to atom displacement: 0.000 nm
NOTE: Periodic molecules are present in this system. Because of this, the
domain decomposition algorithm cannot easily determine the minimum cell
size that it requires for treating bonded interactions. Instead, domain
decomposition will assume that half the non-bonded cut-off will be a
suitable lower bound.
Minimum cell size due to bonded interactions: 0.678 nm
Using 0 separate PME ranks, as there are too few total
ranks for efficient splitting
Optimizing the DD grid for 4 cells with a minimum initial size of 0.678 nm
The maximum allowed number of cells is: X 8 Y 8 Z 8
Domain decomposition grid 1 x 4 x 1, separate PME ranks 0
PME domain decomposition: 1 x 4 x 1
Domain decomposition rank 0, coordinates 0 0 0
The initial number of communication pulses is: Y 1
The initial domain decomposition cell size is: Y 1.50 nm
The maximum allowed distance for atom groups involved in interactions is:
non-bonded interactions 1.356 nm
two-body bonded interactions (-rdd) 1.356 nm
multi-body bonded interactions (-rdd) 1.356 nm
virtual site constructions (-rcon) 1.503 nm
Using 4 MPI threads
Using 4 OpenMP threads per tMPI thread
Overriding thread affinity set outside gmx mdrun
Pinning threads with a user-specified logical core stride of 2
NOTE: Thread affinity was not set.
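(Aside: the three lines above look like the heart of the problem: mdrun
claims to pin with a stride of 2 and then reports that thread affinity was
not set. It seems worth checking whether slurm has already confined the
shell to a subset of cores, which would also explain a single-threaded EM;
these are stock commands, not GROMACS ones:

    # CPU affinity mask of the current shell inside the allocation
    taskset -cp $$
    # what slurm believes it handed out
    echo "$SLURM_CPUS_ON_NODE" "$SLURM_JOB_CPUS_PER_NODE"
)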
System total charge: 0.000
Will do PME sum in reciprocal space for electrostatic interactions.
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G.
Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------
Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
Potential shift: LJ r^-12: -1.122e-01 r^-6: -3.349e-01, Ewald -8.333e-06
Initialized non-bonded Ewald correction tables, spacing: 1.02e-03 size: 1176
Generated table with 1100 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1100 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1100 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Using SIMD 4x4 nonbonded short-range kernels
Using a 4x4 pair-list setup:
updated every 1 steps, buffer 0.000 nm, rlist 1.200 nm
Using geometric Lennard-Jones combination rule
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------
Linking all bonded interactions to atoms
There are 5407 inter charge-group virtual sites,
will do an extra communication step for selected coordinates and forces
Note that activating steepest-descent energy minimization via the
integrator .mdp option and the command gmx mdrun may be available in a
different form in a future version of GROMACS, e.g. gmx minimize and an
.mdp option.
Initiating Steepest Descents
Atom distribution over 4 domains: av 6687 stddev 134 min 6515 max 6792
Started Steepest Descents on rank 0 Thu May 9 15:49:36 2019