[gmx-users] Workstation choice
Benson Muite
benson.muite at ut.ee
Sun Sep 9 08:09:50 CEST 2018
As an example, on the d.poly-ch2 benchmark from
ftp://ftp.gromacs.org/pub/benchmarks/gmxbench-3.0.tar.gz, I get the log below.
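For anyone wanting to reproduce it, the steps are roughly the following (a minimal
sketch; the mdrun command is the one shown in the log, while the grompp step assumes
the usual grompp.mdp/conf.gro/topol.top input names in the benchmark directory):

  # download and unpack the gmxbench-3.0 benchmark set
  wget ftp://ftp.gromacs.org/pub/benchmarks/gmxbench-3.0.tar.gz
  tar xzf gmxbench-3.0.tar.gz
  cd d.poly-ch2
  # preprocess the inputs into a run file, then run the benchmark
  gmx grompp -f grompp.mdp -c conf.gro -p topol.top -o mdbench.tpr
  gmx mdrun -deffnm mdbench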
Log file opened on Sun Sep 9 09:03:49 2018
Host: localhost.localdomain pid: 14194 rank ID: 0 number of ranks: 1
:-) GROMACS - gmx mdrun, 2018.3 (-:
GROMACS is written by:
Emile Apol, Rossen Apostolov, Paul Bauer, Herman J.C. Berendsen,
Par Bjelkmar, Aldert van Buuren, Rudi van Drunen, Anton Feenstra,
Gerrit Groenhof, Aleksei Iupinov, Christoph Junghans, Anca Hamuraru,
Vincent Hindriksen, Dimitrios Karkoulis, Peter Kasson, Jiri Kraus,
Carsten Kutzner, Per Larsson, Justin A. Lemkul, Viveca Lindahl,
Magnus Lundborg, Pieter Meulenhoff, Erik Marklund, Teemu Murtola,
Szilard Pall, Sander Pronk, Roland Schulz, Alexey Shvetsov,
Michael Shirts, Alfons Sijbers, Peter Tieleman, Teemu Virolainen,
Christian Wennberg, and Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2017, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.
GROMACS: gmx mdrun, version 2018.3
Executable:
/home/benson/Projects/GromacsTest/gmx-bench3/d.poly-ch2/../../gromacsinstall/bin/gmx
Data prefix:
/home/benson/Projects/GromacsTest/gmx-bench3/d.poly-ch2/../../gromacsinstall
Working dir: /home/benson/Projects/GromacsTest/gmx-bench3/d.poly-ch2
Command line:
gmx mdrun -deffnm mdbench
GROMACS version: 2018.3
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: OpenCL
SIMD instructions: AVX2_256
FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
Built on: 2018-08-24 19:14:49
Built by: benson at localhost.localdomain [CMAKE]
Build OS/arch: Linux 4.17.14-202.fc28.x86_64 x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
Build CPU family: 6 Model: 78 Stepping: 3
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /usr/bin/cc GNU 8.1.1
C compiler flags: -march=core-avx2 -O3 -DNDEBUG
-funroll-all-loops -fexcess-precision=fast
C++ compiler: /usr/bin/c++ GNU 8.1.1
C++ compiler flags: -march=core-avx2 -std=c++11 -O3 -DNDEBUG
-funroll-all-loops -fexcess-precision=fast
OpenCL include dir: /usr/include
OpenCL library: /usr/lib64/libOpenCL.so
OpenCL version: 2.0
Running on 1 node with total 2 cores, 4 logical cores, 0 compatible GPUs
Hardware detected:
CPU info:
Vendor: Intel
Brand: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
Family: 6 Model: 78 Stepping: 3
Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
Hardware topology: Basic
Sockets, cores, and logical processors:
Socket 0: [ 0 2] [ 1 3]
GPU info:
Number of GPUs detected: 1
#0: name: Intel(R) HD Graphics Skylake ULT GT2, vendor: Intel, device version: OpenCL 2.0 beignet 1.3, stat: incompatible
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for
Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------
Input Parameters:
integrator = md
tinit = 0
dt = 0.001
nsteps = 5000
init-step = 0
simulation-part = 1
comm-mode = Linear
nstcomm = 100
bd-fric = 0
ld-seed = 142701291
emtol = 10
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 0
nstvout = 0
nstfout = 0
nstlog = 0
nstcalcenergy = 100
nstenergy = 0
nstxout-compressed = 0
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 20
ns-type = Grid
pbc = xyz
periodic-molecules = false
verlet-buffer-tolerance = 0.005
rlist = 0.9
coulombtype = Cut-off
coulomb-modifier = Potential-shift
rcoulomb-switch = 0
rcoulomb = 0.9
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Potential-shift
rvdw-switch = 0
rvdw = 0.9
DispCorr = No
table-extension = 1
fourierspacing = 0.12
fourier-nx = 0
fourier-ny = 0
fourier-nz = 0
pme-order = 4
ewald-rtol = 1e-05
ewald-rtol-lj = 0.001
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
implicit-solvent = No
gb-algorithm = Still
nstgbradii = 1
rgbradii = 1
gb-epsilon-solvent = 80
gb-saltconc = 0
gb-obc-alpha = 1
gb-obc-beta = 0.8
gb-obc-gamma = 4.85
gb-dielectric-offset = 0.009
sa-algorithm = Ace-approximation
sa-surface-tension = 2.05016
tcoupl = Berendsen
nsttcouple = 20
nh-chain-length = 0
print-nose-hoover-chain-variables = false
pcoupl = No
pcoupltype = Isotropic
nstpcouple = -1
tau-p = 1
compressibility (3x3):
compressibility[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
compressibility[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p (3x3):
ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
refcoord-scaling = No
posres-com (3):
posres-com[0]= 0.00000e+00
posres-com[1]= 0.00000e+00
posres-com[2]= 0.00000e+00
posres-comB (3):
posres-comB[0]= 0.00000e+00
posres-comB[1]= 0.00000e+00
posres-comB[2]= 0.00000e+00
QMMM = false
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Lincs
continuation = false
Shake-SOR = false
shake-tol = 0.0001
lincs-order = 4
lincs-iter = 1
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = false
awh = false
rotation = false
interactiveMD = false
disre = No
disre-weighting = Conservative
disre-mixed = false
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = no
cos-acceleration = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
simulated-tempering = false
swapcoords = no
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
applied-forces:
electric-field:
x:
E0 = 0
omega = 0
t0 = 0
sigma = 0
y:
E0 = 0
omega = 0
t0 = 0
sigma = 0
z:
E0 = 0
omega = 0
t0 = 0
sigma = 0
grpopts:
nrdf: 17997
ref-t: 300
tau-t: 0.1
annealing: No
annealing-npoints: 0
acc: 0 0 0
nfreeze: N N N
energygrp-flags[ 0]: 0
Changing nstlist from 20 to 100, rlist from 0.9 to 0.905
Using 1 MPI thread
Using 4 OpenMP threads
Pinning threads with an auto-selected logical core stride of 1
System total charge: 0.000
Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Coulomb -1e+00
Using SIMD 4x8 nonbonded short-range kernels
Using a 4x8 pair-list setup:
updated every 100 steps, buffer 0.005 nm, rlist 0.905 nm
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list
would be:
updated every 100 steps, buffer 0.076 nm, rlist 0.976 nm
Using geometric Lennard-Jones combination rule
Removing pbc first time
Intra-simulation communication will occur every 20 steps.
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: rest
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, J. P. M. Postma, A. DiNola and J. R. Haak
Molecular dynamics with coupling to an external bath
J. Chem. Phys. 81 (1984) pp. 3684-3690
-------- -------- --- Thank You --- -------- --------
There are: 6000 Atoms
There are: 6000 VSites
Initial temperature: 450.358 K
Started mdrun on rank 0 Sun Sep 9 09:03:49 2018
Step Time
0 0.00000
Energies (kJ/mol)
Bond Angle Ryckaert-Bell. LJ (SR) Coulomb (SR)
1.10780e+04 1.13402e+04 1.88807e+04 -2.19620e+04 0.00000e+00
Potential Kinetic En. Total Energy Conserved En. Temperature
1.93369e+04 3.36615e+04 5.29983e+04 5.29983e+04 4.49913e+02
Pressure (bar)
8.20511e+02
Step Time
5000 5.00000
Writing checkpoint, step 5000 at Sun Sep 9 09:04:00 2018
Energies (kJ/mol)
Bond Angle Ryckaert-Bell. LJ (SR) Coulomb (SR)
7.34672e+03 7.53008e+03 1.48310e+04 -2.30530e+04 0.00000e+00
Potential Kinetic En. Total Energy Conserved En. Temperature
6.65487e+03 2.24076e+04 2.90624e+04 5.28535e+04 2.99496e+02
Pressure (bar)
-2.27502e+02
<====== ############### ==>
<==== A V E R A G E S ====>
<== ############### ======>
Statistics over 5001 steps using 51 frames
Energies (kJ/mol)
Bond Angle Ryckaert-Bell. LJ (SR) Coulomb (SR)
7.57761e+03 7.80533e+03 1.51049e+04 -2.29507e+04 0.00000e+00
Potential Kinetic En. Total Energy Conserved En. Temperature
7.53714e+03 2.29835e+04 3.05206e+04 5.29261e+04 3.07193e+02
Pressure (bar)
4.62342e+01
Total Virial (kJ/mol)
7.64724e+03 1.85579e+02 9.92485e+01
1.85576e+02 7.22973e+03 -1.43850e+02
9.92489e+01 -1.43849e+02 7.35308e+03
Pressure (bar)
5.66092e+00 -3.86826e+01 -1.72257e+01
-3.86822e+01 7.39537e+01 2.80894e+01
-1.72257e+01 2.80893e+01 5.90880e+01
M E G A - F L O P S A C C O U N T I N G
NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
W3=SPC/TIP3p W4=TIP4p (single or pairs)
V&F=Potential and force V=Potential only F=Force only
Computing: M-Number M-Flops % Flops
-----------------------------------------------------------------------------
Pair Search distance check 144.857066 1303.714 0.3
NxN LJ [F] 14451.085440 476885.820 95.0
NxN LJ [V&F] 148.897120 6402.576 1.3
Shift-X 0.612000 3.672 0.0
Bonds 30.000999 1770.059 0.4
Angles 29.995998 5039.328 1.0
RB-Dihedrals 29.990997 7407.776 1.5
Virial 0.614295 11.057 0.0
Stop-CM 0.624000 6.240 0.0
Calc-Ekin 6.024000 162.648 0.0
Virtual Site 3fd 29.995998 2849.620 0.6
Virtual Site 3fad 0.010002 1.760 0.0
-----------------------------------------------------------------------------
Total 501844.270 100.0
-----------------------------------------------------------------------------
R E A L C Y C L E A N D T I M E A C C O U N T I N G
On 1 MPI rank, each using 4 OpenMP threads
Computing: Num Num Call Wall time Giga-Cycles
Ranks Threads Count (s) total sum %
-----------------------------------------------------------------------------
Vsite constr. 1 4 5001 0.623 5.985 5.7
Neighbor search 1 4 51 0.293 2.814 2.7
Force 1 4 5001 8.681 83.340 78.9
NB X/F buffer ops. 1 4 9951 0.277 2.660 2.5
Vsite spread 1 4 5001 0.874 8.392 7.9
Write traj. 1 4 1 0.051 0.489 0.5
Update 1 4 5001 0.152 1.462 1.4
Rest 0.047 0.450 0.4
-----------------------------------------------------------------------------
Total 10.999 105.592 100.0
-----------------------------------------------------------------------------
Core t (s) Wall t (s) (%)
Time: 43.996 10.999 400.0
(ns/day) (hour/ns)
Performance: 39.285 0.611
Finished mdrun on rank 0 Sun Sep 9 09:04:00 2018
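As a quick check on the performance figure: nsteps = 5000 at dt = 0.001 ps is 5 ps
of simulation in about 11 s of wall time, i.e. 0.005 ns * (86400 s/day / 11 s) ≈ 39
ns/day, which matches the Performance line above. For comparing workstation
configurations, the same run can be repeated with different thread and offload
settings; a minimal sketch using standard mdrun options (the integrated GPU here was
detected as incompatible, so the GPU variant only applies on a machine with a
supported GPU):

  # try different thread-MPI rank / OpenMP thread splits, with thread pinning
  gmx mdrun -deffnm mdbench -ntmpi 1 -ntomp 4 -pin on
  gmx mdrun -deffnm mdbench -ntmpi 2 -ntomp 2 -pin on
  # on a machine with a compatible GPU, offload the nonbonded work
  gmx mdrun -deffnm mdbench -nb gpu -pin on

The ns/day value at the end of each log is the number to compare across hardware.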
On 09/09/2018 08:59 AM, Benson Muite wrote:
> This is old, but seems to indicate Beowulf clusters work quite well:
>
> https://docs.uabgrid.uab.edu/wiki/Gromacs_Benchmark
>
> Szilárd had helped create a benchmark data set available at:
> http://www.gromacs.org/About_Gromacs/Benchmarks
> http://www.gromacs.org/@api/deki/files/240/=gromacs-5.0-benchmarks.pdf
> ftp://ftp.gromacs.org/pub/benchmarks/gmxbench-3.0.tar.gz
>
> Does your use case involve a large number of ensemble simulations
> which can be done in single precision without error correction? If so,
> might you be better off building a small Beowulf cluster with lower-spec
> processors that have integrated GPUs? For example, a Ryzen 3 with
> integrated graphics is about $100; motherboard, RAM, and a power supply
> would probably get you to about $300. An Intel Core i3 bundle would be
> about $350. Setup could be done using the OpenHPC stack:
> http://www.openhpc.community/
>
> This would get you a personal 5-7 node in-house cluster. However, the
> ability to do maintenance and the availability of local repair support
> may also be important when considering system lifetime cost, not just
> the initial purchase price. GROMACS's current and future support for
> OpenCL is likely also important here.
>
> At least one computer store in my region has allowed benchmarking.
>
> On 09/07/2018 09:40 PM, Olga Selyutina wrote:
>> Hi,
>> Many thanks for the valuable information.
>> If it isn't too much trouble, could you tell me how the performance gain
>> from using a second GPU for a single simulation has changed in GROMACS
>> 2018 vs older versions (2016, 5.1), where it was 20-30% higher?
>>
>>
>> 2018-09-07 23:25 GMT+07:00 Szilárd Páll <pall.szilard at gmail.com>:
>>
>>> Are you intending to use it mostly/only for running simulations or also
>>> as a desktop computer?
>>>
>>> Yes, it will be mostly used for simulations.
>>
>>> I'm not on top of pricing details, so you should probably look at some
>>> configs and get back with concrete CPU + GPU (+ price) combinations, and
>>> we might be able to guesstimate what's best.
>>>
>>>
>> The following CPUs and GPUs are suitable in terms of price (in our region):
>> *GPU*
>> GTX 1070: ~1700 MHz, 1920 CUDA cores - $514
>> GTX 1080: ~1700 MHz, 2560 CUDA cores - $615
>> GTX 1070 Ti: ~1700 MHz, 2432 CUDA cores - $615
>> GTX 1080 Ti: ~1600 MHz, 3584 CUDA cores - $930
>>
>> *CPU*
>> Ryzen 7 2700X - $357
>> 4200 MHz, 8/16 cores/threads, L1/L2/L3 cache 768 KB/4 MB/16 MB, 105 W, max. T 85°C
>>
>> Threadripper 1950X - $930
>> 4000 MHz, 16/32 cores/threads, L1/L2/L3 cache 1.5/8/32 MB, 180 W, max. T 68°C
>>
>> i7 8086K - $515
>> 4800 MHz, 6/12 cores/threads, L2/L3 cache 1.5/12 MB, 95 W, max. T 100°C
>>
>> i7 8700K - $442
>> 4600 MHz, 6/12 cores/threads, L2/L3 cache 1.5/12 MB, 95 W, max. T 100°C
>>
>> The most suitable CPU+GPU combinations are as follows:
>> 1) Ryzen 7 2700X + two GTX 1080 - $1587
>> 1.1) Ryzen 7 2700X + one GTX 1080 + one GTX 1080*Ti* - $1900 (maybe?)
>> 2) Threadripper 1950X + one GTX 1080Ti - $1860
>> 3) i7 8700K + two GTX 1080 - $1672
>> 4) Ryzen 7 2700X + three GTX 1070 - $1900
>> My suggestions:
>> Variant 1 seems to be the most suitable.
>> Variant 2 seems to be suitable only if a single simulation is running on
>> the workstation.
>> It's a bit confusing that in synthetic tests/games the performance of the
>> i7 8700 is higher than that of the Ryzen 7 2700.
>> Thanks a lot again for your advice, it has already clarified a lot!
--
Research Fellow of Distributed Systems
Institute of Computer Science
University of Tartu
J. Liivi 2, 50409 Tartu, Estonia
http://kodu.ut.ee/~benson/