[gmx-users] Workstation choice
Benson Muite
benson.muite at ut.ee
Mon Sep 10 20:20:48 CEST 2018
Here are some results (probably suboptimal) for the d.poly-ch2 benchmark on a
desktop running Fedora 28, using the GROMACS OpenCL build from the Fedora
repositories:
Log file opened on Mon Sep 10 21:00:25 2018
Host: mikihir pid: 32669 rank ID: 0 number of ranks: 1
:-) GROMACS - gmx mdrun, 2018.2 (-:
GROMACS is written by:
Emile Apol            Rossen Apostolov      Paul Bauer            Herman J.C. Berendsen
Par Bjelkmar          Aldert van Buuren     Rudi van Drunen       Anton Feenstra
Gerrit Groenhof       Aleksei Iupinov       Christoph Junghans    Anca Hamuraru
Vincent Hindriksen    Dimitrios Karkoulis   Peter Kasson          Jiri Kraus
Carsten Kutzner       Per Larsson           Justin A. Lemkul      Viveca Lindahl
Magnus Lundborg       Pieter Meulenhoff     Erik Marklund         Teemu Murtola
Szilard Pall          Sander Pronk          Roland Schulz         Alexey Shvetsov
Michael Shirts        Alfons Sijbers        Peter Tieleman        Teemu Virolainen
Christian Wennberg    Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2017, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.
GROMACS: gmx mdrun, version 2018.2
Executable: /usr/bin/gmx
Data prefix: /usr
Working dir: /home/benson/Projects/GromacsBench/d.poly-ch2
Command line:
gmx mdrun
GROMACS version: 2018.2
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: OpenCL
SIMD instructions: SSE2
FFT library: fftw-3.3.5-sse2-avx
RDTSCP usage: disabled
TNG support: enabled
Hwloc support: hwloc-1.11.6
Tracing support: disabled
Built on: 2018-07-19 19:45:21
Built by: mockbuild@ [CMAKE]
Build OS/arch: Linux 4.17.3-200.fc28.x86_64 x86_64
Build CPU vendor: Intel
Build CPU brand: Intel Core Processor (Haswell, no TSX)
Build CPU family: 6 Model: 60 Stepping: 1
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma intel
lahf mmx msr pcid pclmuldq popcnt pse rdrnd rdtscp sse2 sse3 sse4.1
sse4.2 ssse3 tdt x2apic
C compiler: /usr/bin/cc GNU 8.1.1
C compiler flags: -msse2 -O2 -g -pipe -Wall -Werror=format-security
-Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions
-fstack-protector-strong -grecord-gcc-switches
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
-fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
-Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong
-grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
-fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
-DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler: /usr/bin/c++ GNU 8.1.1
C++ compiler flags: -msse2 -O2 -g -pipe -Wall -Werror=format-security
-Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions
-fstack-protector-strong -grecord-gcc-switches
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
-fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
-std=c++11 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
OpenCL include dir: /usr/include
OpenCL library: /usr/lib64/libOpenCL.so
OpenCL version: 2.0
Running on 1 node with total 8 cores, 8 logical cores, 1 compatible GPU
Hardware detected:
CPU info:
Vendor: AMD
Brand: AMD FX(tm)-8350 Eight-Core Processor
Family: 21 Model: 2 Stepping: 0
Features: aes amd apic avx clfsh cmov cx8 cx16 f16c fma fma4 htt
lahf misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp
sse2 sse3 sse4a sse4.1 sse4.2 ssse3 xop
Hardware topology: Full, with devices
Sockets, cores, and logical processors:
Socket 0: [ 0] [ 1] [ 2] [ 3] [ 4] [ 5] [ 6] [ 7]
Numa nodes:
Node 0 (16714620928 bytes mem): 0 1 2 3 4 5 6 7
Latency:
0
0 1.00
Caches:
L1: 16384 bytes, linesize 64 bytes, assoc. 4, shared 1 ways
L2: 2097152 bytes, linesize 64 bytes, assoc. 16, shared 2 ways
L3: 8388608 bytes, linesize 64 bytes, assoc. 64, shared 8 ways
PCI devices:
0000:01:00.0 Id: 1002:67ef Class: 0x0300 Numa: 0
0000:02:00.0 Id: 10ec:8168 Class: 0x0200 Numa: 0
0000:00:11.0 Id: 1002:4391 Class: 0x0106 Numa: 0
GPU info:
Number of GPUs detected: 1
#0: name: Radeon RX 560 Series (POLARIS11 / DRM 3.23.0 /
4.16.3-301.fc28.x86_64, LLVM 6.0.0), vendor: AMD, device version: OpenCL
1.1 Mesa 18.0.5, stat: compatible
Highest SIMD level requested by all nodes in run: AVX_128_FMA
SIMD instructions selected at compile time: SSE2
This program was compiled for different hardware than you are running on,
which could influence performance.
The current CPU can measure timings more accurately than the code in
gmx mdrun was configured to use. This might affect your simulation
speed as accurate timings are needed for load-balancing.
Please consider rebuilding gmx mdrun with the GMX_USE_RDTSCP=ON CMake
option.
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for
Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------
Input Parameters:
integrator = md
tinit = 0
dt = 0.001
nsteps = 5000
init-step = 0
simulation-part = 1
comm-mode = Linear
nstcomm = 100
bd-fric = 0
ld-seed = -191216883
emtol = 10
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 0
nstvout = 0
nstfout = 0
nstlog = 0
nstcalcenergy = 100
nstenergy = 0
nstxout-compressed = 0
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 20
ns-type = Grid
pbc = xyz
periodic-molecules = false
verlet-buffer-tolerance = 0.005
rlist = 0.9
coulombtype = Cut-off
coulomb-modifier = Potential-shift
rcoulomb-switch = 0
rcoulomb = 0.9
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Potential-shift
rvdw-switch = 0
rvdw = 0.9
DispCorr = No
table-extension = 1
fourierspacing = 0.12
fourier-nx = 0
fourier-ny = 0
fourier-nz = 0
pme-order = 4
ewald-rtol = 1e-05
ewald-rtol-lj = 0.001
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
implicit-solvent = No
gb-algorithm = Still
nstgbradii = 1
rgbradii = 1
gb-epsilon-solvent = 80
gb-saltconc = 0
gb-obc-alpha = 1
gb-obc-beta = 0.8
gb-obc-gamma = 4.85
gb-dielectric-offset = 0.009
sa-algorithm = Ace-approximation
sa-surface-tension = 2.05016
tcoupl = Berendsen
nsttcouple = 20
nh-chain-length = 0
print-nose-hoover-chain-variables = false
pcoupl = No
pcoupltype = Isotropic
nstpcouple = -1
tau-p = 1
compressibility (3x3):
compressibility[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
compressibility[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p (3x3):
ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
refcoord-scaling = No
posres-com (3):
posres-com[0]= 0.00000e+00
posres-com[1]= 0.00000e+00
posres-com[2]= 0.00000e+00
posres-comB (3):
posres-comB[0]= 0.00000e+00
posres-comB[1]= 0.00000e+00
posres-comB[2]= 0.00000e+00
QMMM = false
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Lincs
continuation = false
Shake-SOR = false
shake-tol = 0.0001
lincs-order = 4
lincs-iter = 1
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = false
awh = false
rotation = false
interactiveMD = false
disre = No
disre-weighting = Conservative
disre-mixed = false
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = no
cos-acceleration = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
simulated-tempering = false
swapcoords = no
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
applied-forces:
electric-field:
x:
E0 = 0
omega = 0
t0 = 0
sigma = 0
y:
E0 = 0
omega = 0
t0 = 0
sigma = 0
z:
E0 = 0
omega = 0
t0 = 0
sigma = 0
grpopts:
nrdf: 17997
ref-t: 300
tau-t: 0.1
annealing: No
annealing-npoints: 0
acc: 0 0 0
nfreeze: N N N
energygrp-flags[ 0]: 0
Changing nstlist from 20 to 100, rlist from 0.9 to 0.905
Using 1 MPI thread
Using 8 OpenMP threads
1 GPU auto-selected for this run.
Mapping of GPU IDs to the 1 GPU task in the 1 rank on this node:
PP:0
Pinning threads with an auto-selected logical core stride of 1
System total charge: 0.000
Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Coulomb -1e+00
Using GPU 8x8 nonbonded short-range kernels
Using a 8x4 pair-list setup:
updated every 100 steps, buffer 0.005 nm, rlist 0.905 nm
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list
would be:
updated every 100 steps, buffer 0.076 nm, rlist 0.976 nm
Using geometric Lennard-Jones combination rule
Removing pbc first time
Intra-simulation communication will occur every 20 steps.
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: rest
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, J. P. M. Postma, A. DiNola and J. R. Haak
Molecular dynamics with coupling to an external bath
J. Chem. Phys. 81 (1984) pp. 3684-3690
-------- -------- --- Thank You --- -------- --------
There are: 6000 Atoms
There are: 6000 VSites
Initial temperature: 450.358 K
Started mdrun on rank 0 Mon Sep 10 21:00:27 2018
           Step           Time
              0        0.00000
   Energies (kJ/mol)
           Bond          Angle Ryckaert-Bell.        LJ (SR)   Coulomb (SR)
    1.10780e+04    1.13402e+04    1.88807e+04   -2.19619e+04    0.00000e+00
      Potential    Kinetic En.   Total Energy  Conserved En.    Temperature
    1.93369e+04    3.36615e+04    5.29983e+04    5.29983e+04    4.49913e+02
 Pressure (bar)
    8.20510e+02
           Step           Time
           5000        5.00000
Writing checkpoint, step 5000 at Mon Sep 10 21:04:37 2018
   Energies (kJ/mol)
           Bond          Angle Ryckaert-Bell.        LJ (SR)   Coulomb (SR)
    7.30979e+03    7.57440e+03    1.48801e+04   -2.30979e+04    0.00000e+00
      Potential    Kinetic En.   Total Energy  Conserved En.    Temperature
    6.66641e+03    2.25799e+04    2.92463e+04    5.28503e+04    3.01799e+02
 Pressure (bar)
   -8.06942e+01
<====== ############### ==>
<==== A V E R A G E S ====>
<== ############### ======>
   Statistics over 5001 steps using 51 frames
   Energies (kJ/mol)
           Bond          Angle Ryckaert-Bell.        LJ (SR)   Coulomb (SR)
    7.59408e+03    7.81450e+03    1.51294e+04   -2.29783e+04    0.00000e+00
      Potential    Kinetic En.   Total Energy  Conserved En.    Temperature
    7.55967e+03    2.30250e+04    3.05847e+04    5.29245e+04    3.07748e+02
 Pressure (bar)
    2.63622e+01
   Total Virial (kJ/mol)
    7.74123e+03    2.93639e+02    1.13344e+02
    2.93639e+02    7.68271e+03   -3.40627e+02
    1.13345e+02   -3.40625e+02    7.17150e+03
   Pressure (bar)
   -1.13044e+01   -5.10385e+01   -1.83614e+01
   -5.10385e+01   -2.53181e+00    6.71371e+01
   -1.83616e+01    6.71366e+01    9.29227e+01
    M E G A - F L O P S   A C C O U N T I N G
 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only
 Computing:                               M-Number         M-Flops % Flops
-----------------------------------------------------------------------------
 Pair Search distance check              27.160704         244.446     0.0
 NxN RF Elec. + LJ [F]                24321.496320      924216.860    96.8
 NxN RF Elec. + LJ [V&F]                250.586880       13531.692     1.4
 Shift-X                                  0.612000           3.672     0.0
 Bonds                                   30.000999        1770.059     0.2
 Angles                                  29.995998        5039.328     0.5
 RB-Dihedrals                            29.990997        7407.776     0.8
 Virial                                   0.614295          11.057     0.0
 Stop-CM                                  0.624000           6.240     0.0
 Calc-Ekin                                6.024000         162.648     0.0
 Virtual Site 3fd                        29.995998        2849.620     0.3
 Virtual Site 3fad                        0.010002           1.760     0.0
-----------------------------------------------------------------------------
 Total                                                  955245.158   100.0
-----------------------------------------------------------------------------
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
On 1 MPI rank, each using 8 OpenMP threads
 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Vsite constr.          1    8       5001      19.000        610.049   7.6
 Neighbor search        1    8         51       0.878         28.190   0.4
 Launch GPU ops.        1    8       5001      13.524        434.216   5.4
 Force                  1    8       5001      88.859       2853.066  35.5
 Wait GPU NB local      1    8       5001       1.060         34.044   0.4
 NB X/F buffer ops.     1    8       9951      41.072       1318.714  16.4
 Vsite spread           1    8       5001      38.567       1238.308  15.4
 Write traj.            1    8          1       0.062          1.999   0.0
 Update                 1    8       5001      44.615       1432.481  17.8
 Rest                                           2.560         82.197   1.0
-----------------------------------------------------------------------------
 Total                                        250.198       8033.266 100.0
-----------------------------------------------------------------------------
 GPU timings
-----------------------------------------------------------------------------
 Computing:                         Count  Wall t (s)      ms/step       %
-----------------------------------------------------------------------------
 Pair list H2D                         51       0.001        0.024     0.0
 X / q H2D                           5001       0.029        0.006     0.3
 Nonbonded F kernel                  4950       8.437        1.704    77.8
 Nonbonded F+ene+prune k.              51       0.213        4.167     2.0
 F D2H                               5001       2.171        0.434    20.0
-----------------------------------------------------------------------------
 Total                                         10.851        2.170   100.0
-----------------------------------------------------------------------------
Average per-step force GPU/CPU evaluation time ratio: 2.170 ms/17.768 ms = 0.122
               Core t (s)   Wall t (s)        (%)
       Time:     2001.585      250.198      800.0
                 (ns/day)    (hour/ns)
Performance:        1.727       13.897
Finished mdrun on rank 0 Mon Sep 10 21:04:37 2018
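As a sanity check on the summary figures, the short Python sketch below recomputes them from values copied out of the log above. The helper names are hypothetical and not part of GROMACS. The step count of 5001 is an assumption taken from the "Statistics over 5001 steps" line; it is the count that reproduces the reported hour/ns value.

```python
# Recompute the mdrun summary figures from values copied out of the log.
# All helper names are illustrative; none of them is a GROMACS API.

SECONDS_PER_DAY = 86400.0
SECONDS_PER_HOUR = 3600.0

# Values taken from the log.
STEPS = 5001          # assumption: "Statistics over 5001 steps"
DT_PS = 0.001         # dt from the input parameters, in ps
WALL_S = 250.198      # total wall time
CORE_S = 2001.585     # total core time
GPU_TOTAL_S = 10.851  # "Total" row of the GPU timings table
CPU_FORCE_S = 88.859  # "Force" row of the cycle accounting table


def simulated_ns(steps: int, dt_ps: float) -> float:
    """Simulated time in nanoseconds (1 ns = 1000 ps)."""
    return steps * dt_ps / 1000.0


def ns_per_day(steps: int, dt_ps: float, wall_s: float) -> float:
    """Simulated nanoseconds per day of wall-clock time."""
    return simulated_ns(steps, dt_ps) * SECONDS_PER_DAY / wall_s


def hours_per_ns(steps: int, dt_ps: float, wall_s: float) -> float:
    """Wall-clock hours needed per simulated nanosecond."""
    return wall_s / SECONDS_PER_HOUR / simulated_ns(steps, dt_ps)


def per_step_ms(total_s: float, steps: int) -> float:
    """Average time per step in milliseconds."""
    return total_s * 1000.0 / steps


gpu_ms = per_step_ms(GPU_TOTAL_S, STEPS)
cpu_ms = per_step_ms(CPU_FORCE_S, STEPS)

print(f"ns/day:         {ns_per_day(STEPS, DT_PS, WALL_S):.3f}")
print(f"hour/ns:        {hours_per_ns(STEPS, DT_PS, WALL_S):.3f}")
print(f"GPU ms/step:    {gpu_ms:.3f}")
print(f"CPU ms/step:    {cpu_ms:.3f}")
print(f"GPU/CPU ratio:  {gpu_ms / cpu_ms:.3f}")
print(f"core/wall (%):  {100.0 * CORE_S / WALL_S:.1f}")
```

With these inputs the script gives 1.727 ns/day, 13.897 hour/ns and a GPU/CPU ratio of 0.122, matching the log. A ratio this far below 1 suggests the GPU sits idle for most of each step and the run is limited by CPU-side work, which fits the SSE2-only build warning near the top of the log.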