[gmx-users] Workstation choice
Szilárd Páll
pall.szilard at gmail.com
Tue Sep 11 16:55:50 CEST 2018
Sadly, I can't recommend packaged versions of GROMACS for anything other
than pre-/post-processing or non-performance-critical work; these are
compiled without proper SIMD support (so they run on any CPU), which
generally wastes a large fraction of the machine's performance.
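
If you do want production performance on such a machine, building from
source with matching options is straightforward. A minimal sketch (the
AVX_128_FMA value is what the log below reports as the best SIMD level
for this FX-8350; the install prefix and job count are just examples):

    tar xf gromacs-2018.2.tar.gz && cd gromacs-2018.2
    mkdir build && cd build
    # pick the SIMD level of the machine that will run mdrun,
    # and enable RDTSCP, as the log below also suggests
    cmake .. -DGMX_SIMD=AVX_128_FMA -DGMX_USE_RDTSCP=ON \
             -DGMX_GPU=ON -DGMX_USE_OPENCL=ON \
             -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-2018.2
    make -j 8 && make install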
Also, I can't (yet) recommend AMD GPUs as a buying option for
consumer-grade machines, since we don't yet have PME offload support in
OpenCL; this will change soon, though.
Additionally, and importantly, I can't recommend the Mesa OpenCL stack:
it's just not competitive in performance. Use ROCm (or AMDGPU-PRO)
instead.
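
To verify which OpenCL stack your build will actually pick up, clinfo is
handy; a quick check (the grep pattern is only illustrative):

    # list the platforms/devices the OpenCL ICD loader sees
    clinfo | grep -iE 'platform name|device version'
    # Mesa/Clover reports e.g. "OpenCL 1.1 Mesa 18.0.5" (as in the log
    # below), whereas ROCm or AMDGPU-PRO expose the AMD platform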
--
Szilárd
On Mon, Sep 10, 2018 at 8:21 PM Benson Muite <benson.muite at ut.ee> wrote:
> Some results (probably suboptimal) for d.poly-ch2 on a desktop running
> Fedora 28, using the GROMACS OpenCL package from the Fedora repositories:
>
> Log file opened on Mon Sep 10 21:00:25 2018
> Host: mikihir pid: 32669 rank ID: 0 number of ranks: 1
> :-) GROMACS - gmx mdrun, 2018.2 (-:
>
> GROMACS is written by:
>      Emile Apol          Rossen Apostolov    Paul Bauer          Herman J.C. Berendsen
>      Par Bjelkmar        Aldert van Buuren   Rudi van Drunen     Anton Feenstra
>      Gerrit Groenhof     Aleksei Iupinov     Christoph Junghans  Anca Hamuraru
>      Vincent Hindriksen  Dimitrios Karkoulis Peter Kasson        Jiri Kraus
>      Carsten Kutzner     Per Larsson         Justin A. Lemkul    Viveca Lindahl
>      Magnus Lundborg     Pieter Meulenhoff   Erik Marklund       Teemu Murtola
>      Szilard Pall        Sander Pronk        Roland Schulz       Alexey Shvetsov
>      Michael Shirts      Alfons Sijbers      Peter Tieleman      Teemu Virolainen
>      Christian Wennberg  Maarten Wolf
> and the project leaders:
> Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
>
> Copyright (c) 1991-2000, University of Groningen, The Netherlands.
> Copyright (c) 2001-2017, The GROMACS development team at
> Uppsala University, Stockholm University and
> the Royal Institute of Technology, Sweden.
> check out http://www.gromacs.org for more information.
>
> GROMACS is free software; you can redistribute it and/or modify it
> under the terms of the GNU Lesser General Public License
> as published by the Free Software Foundation; either version 2.1
> of the License, or (at your option) any later version.
>
> GROMACS: gmx mdrun, version 2018.2
> Executable: /usr/bin/gmx
> Data prefix: /usr
> Working dir: /home/benson/Projects/GromacsBench/d.poly-ch2
> Command line:
> gmx mdrun
>
> GROMACS version: 2018.2
> Precision: single
> Memory model: 64 bit
> MPI library: thread_mpi
> OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
> GPU support: OpenCL
> SIMD instructions: SSE2
> FFT library: fftw-3.3.5-sse2-avx
> RDTSCP usage: disabled
> TNG support: enabled
> Hwloc support: hwloc-1.11.6
> Tracing support: disabled
> Built on: 2018-07-19 19:45:21
> Built by: mockbuild@ [CMAKE]
> Build OS/arch: Linux 4.17.3-200.fc28.x86_64 x86_64
> Build CPU vendor: Intel
> Build CPU brand: Intel Core Processor (Haswell, no TSX)
> Build CPU family: 6 Model: 60 Stepping: 1
> Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma intel
> lahf mmx msr pcid pclmuldq popcnt pse rdrnd rdtscp sse2 sse3 sse4.1
> sse4.2 ssse3 tdt x2apic
> C compiler: /usr/bin/cc GNU 8.1.1
> C compiler flags: -msse2 -O2 -g -pipe -Wall -Werror=format-security
> -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions
> -fstack-protector-strong -grecord-gcc-switches
> -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
> -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
> -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
> -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong
> -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
> -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
> -DNDEBUG -funroll-all-loops -fexcess-precision=fast
> C++ compiler: /usr/bin/c++ GNU 8.1.1
> C++ compiler flags: -msse2 -O2 -g -pipe -Wall -Werror=format-security
> -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions
> -fstack-protector-strong -grecord-gcc-switches
> -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
> -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
> -std=c++11 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
> OpenCL include dir: /usr/include
> OpenCL library: /usr/lib64/libOpenCL.so
> OpenCL version: 2.0
>
>
> Running on 1 node with total 8 cores, 8 logical cores, 1 compatible GPU
> Hardware detected:
> CPU info:
> Vendor: AMD
> Brand: AMD FX(tm)-8350 Eight-Core Processor
> Family: 21 Model: 2 Stepping: 0
> Features: aes amd apic avx clfsh cmov cx8 cx16 f16c fma fma4 htt
> lahf misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp
> sse2 sse3 sse4a sse4.1 sse4.2 ssse3 xop
> Hardware topology: Full, with devices
> Sockets, cores, and logical processors:
> Socket 0: [ 0] [ 1] [ 2] [ 3] [ 4] [ 5] [ 6] [ 7]
> Numa nodes:
> Node 0 (16714620928 bytes mem): 0 1 2 3 4 5 6 7
> Latency:
> 0
> 0 1.00
> Caches:
> L1: 16384 bytes, linesize 64 bytes, assoc. 4, shared 1 ways
> L2: 2097152 bytes, linesize 64 bytes, assoc. 16, shared 2 ways
> L3: 8388608 bytes, linesize 64 bytes, assoc. 64, shared 8 ways
> PCI devices:
> 0000:01:00.0 Id: 1002:67ef Class: 0x0300 Numa: 0
> 0000:02:00.0 Id: 10ec:8168 Class: 0x0200 Numa: 0
> 0000:00:11.0 Id: 1002:4391 Class: 0x0106 Numa: 0
> GPU info:
> Number of GPUs detected: 1
> #0: name: Radeon RX 560 Series (POLARIS11 / DRM 3.23.0 /
> 4.16.3-301.fc28.x86_64, LLVM 6.0.0), vendor: AMD, device version: OpenCL
> 1.1 Mesa 18.0.5, stat: compatible
>
> Highest SIMD level requested by all nodes in run: AVX_128_FMA
> SIMD instructions selected at compile time: SSE2
> This program was compiled for different hardware than you are running on,
> which could influence performance.
> The current CPU can measure timings more accurately than the code in
> gmx mdrun was configured to use. This might affect your simulation
> speed as accurate timings are needed for load-balancing.
> Please consider rebuilding gmx mdrun with the GMX_USE_RDTSCP=ON CMake
> option.
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
> Lindahl
> GROMACS: High performance molecular simulations through multi-level
> parallelism from laptops to supercomputers
> SoftwareX 1 (2015) pp. 19-25
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
> Tackling Exascale Software Challenges in Molecular Dynamics Simulations
> with
> GROMACS
> In S. Markidis & E. Laure (Eds.), Solving Software Challenges for
> Exascale 8759 (2015) pp. 3-27
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
> Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E.
> Lindahl
> GROMACS 4.5: a high-throughput and highly parallel open source molecular
> simulation toolkit
> Bioinformatics 29 (2013) pp. 845-54
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
> GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
> molecular simulation
> J. Chem. Theory Comput. 4 (2008) pp. 435-447
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
> Berendsen
> GROMACS: Fast, Flexible and Free
> J. Comp. Chem. 26 (2005) pp. 1701-1719
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> E. Lindahl and B. Hess and D. van der Spoel
> GROMACS 3.0: A package for molecular simulation and trajectory analysis
> J. Mol. Mod. 7 (2001) pp. 306-317
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> H. J. C. Berendsen, D. van der Spoel and R. van Drunen
> GROMACS: A message-passing parallel molecular dynamics implementation
> Comp. Phys. Comm. 91 (1995) pp. 43-56
> -------- -------- --- Thank You --- -------- --------
>
> Input Parameters:
> integrator = md
> tinit = 0
> dt = 0.001
> nsteps = 5000
> init-step = 0
> simulation-part = 1
> comm-mode = Linear
> nstcomm = 100
> bd-fric = 0
> ld-seed = -191216883
> emtol = 10
> emstep = 0.01
> niter = 20
> fcstep = 0
> nstcgsteep = 1000
> nbfgscorr = 10
> rtpi = 0.05
> nstxout = 0
> nstvout = 0
> nstfout = 0
> nstlog = 0
> nstcalcenergy = 100
> nstenergy = 0
> nstxout-compressed = 0
> compressed-x-precision = 1000
> cutoff-scheme = Verlet
> nstlist = 20
> ns-type = Grid
> pbc = xyz
> periodic-molecules = false
> verlet-buffer-tolerance = 0.005
> rlist = 0.9
> coulombtype = Cut-off
> coulomb-modifier = Potential-shift
> rcoulomb-switch = 0
> rcoulomb = 0.9
> epsilon-r = 1
> epsilon-rf = inf
> vdw-type = Cut-off
> vdw-modifier = Potential-shift
> rvdw-switch = 0
> rvdw = 0.9
> DispCorr = No
> table-extension = 1
> fourierspacing = 0.12
> fourier-nx = 0
> fourier-ny = 0
> fourier-nz = 0
> pme-order = 4
> ewald-rtol = 1e-05
> ewald-rtol-lj = 0.001
> lj-pme-comb-rule = Geometric
> ewald-geometry = 0
> epsilon-surface = 0
> implicit-solvent = No
> gb-algorithm = Still
> nstgbradii = 1
> rgbradii = 1
> gb-epsilon-solvent = 80
> gb-saltconc = 0
> gb-obc-alpha = 1
> gb-obc-beta = 0.8
> gb-obc-gamma = 4.85
> gb-dielectric-offset = 0.009
> sa-algorithm = Ace-approximation
> sa-surface-tension = 2.05016
> tcoupl = Berendsen
> nsttcouple = 20
> nh-chain-length = 0
> print-nose-hoover-chain-variables = false
> pcoupl = No
> pcoupltype = Isotropic
> nstpcouple = -1
> tau-p = 1
> compressibility (3x3):
> compressibility[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> compressibility[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> ref-p (3x3):
> ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> refcoord-scaling = No
> posres-com (3):
> posres-com[0]= 0.00000e+00
> posres-com[1]= 0.00000e+00
> posres-com[2]= 0.00000e+00
> posres-comB (3):
> posres-comB[0]= 0.00000e+00
> posres-comB[1]= 0.00000e+00
> posres-comB[2]= 0.00000e+00
> QMMM = false
> QMconstraints = 0
> QMMMscheme = 0
> MMChargeScaleFactor = 1
> qm-opts:
> ngQM = 0
> constraint-algorithm = Lincs
> continuation = false
> Shake-SOR = false
> shake-tol = 0.0001
> lincs-order = 4
> lincs-iter = 1
> lincs-warnangle = 30
> nwall = 0
> wall-type = 9-3
> wall-r-linpot = -1
> wall-atomtype[0] = -1
> wall-atomtype[1] = -1
> wall-density[0] = 0
> wall-density[1] = 0
> wall-ewald-zfac = 3
> pull = false
> awh = false
> rotation = false
> interactiveMD = false
> disre = No
> disre-weighting = Conservative
> disre-mixed = false
> dr-fc = 1000
> dr-tau = 0
> nstdisreout = 100
> orire-fc = 0
> orire-tau = 0
> nstorireout = 100
> free-energy = no
> cos-acceleration = 0
> deform (3x3):
> deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> simulated-tempering = false
> swapcoords = no
> userint1 = 0
> userint2 = 0
> userint3 = 0
> userint4 = 0
> userreal1 = 0
> userreal2 = 0
> userreal3 = 0
> userreal4 = 0
> applied-forces:
> electric-field:
> x:
> E0 = 0
> omega = 0
> t0 = 0
> sigma = 0
> y:
> E0 = 0
> omega = 0
> t0 = 0
> sigma = 0
> z:
> E0 = 0
> omega = 0
> t0 = 0
> sigma = 0
> grpopts:
> nrdf: 17997
> ref-t: 300
> tau-t: 0.1
> annealing: No
> annealing-npoints: 0
> acc: 0 0 0
> nfreeze: N N N
> energygrp-flags[ 0]: 0
>
> Changing nstlist from 20 to 100, rlist from 0.9 to 0.905
>
>
> Using 1 MPI thread
> Using 8 OpenMP threads
>
> 1 GPU auto-selected for this run.
> Mapping of GPU IDs to the 1 GPU task in the 1 rank on this node:
> PP:0
> Pinning threads with an auto-selected logical core stride of 1
> System total charge: 0.000
> Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Coulomb -1e+00
>
> Using GPU 8x8 nonbonded short-range kernels
>
> Using a 8x4 pair-list setup:
> updated every 100 steps, buffer 0.005 nm, rlist 0.905 nm
> At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list
> would be:
> updated every 100 steps, buffer 0.076 nm, rlist 0.976 nm
>
> Using geometric Lennard-Jones combination rule
>
> Removing pbc first time
>
> Intra-simulation communication will occur every 20 steps.
> Center of mass motion removal mode is Linear
> We have the following groups for center of mass motion removal:
> 0: rest
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> H. J. C. Berendsen, J. P. M. Postma, A. DiNola and J. R. Haak
> Molecular dynamics with coupling to an external bath
> J. Chem. Phys. 81 (1984) pp. 3684-3690
> -------- -------- --- Thank You --- -------- --------
>
> There are: 6000 Atoms
> There are: 6000 VSites
> Initial temperature: 450.358 K
>
> Started mdrun on rank 0 Mon Sep 10 21:00:27 2018
> Step Time
> 0 0.00000
>
> Energies (kJ/mol)
> Bond Angle Ryckaert-Bell. LJ (SR) Coulomb (SR)
> 1.10780e+04 1.13402e+04 1.88807e+04 -2.19619e+04 0.00000e+00
> Potential Kinetic En. Total Energy Conserved En. Temperature
> 1.93369e+04 3.36615e+04 5.29983e+04 5.29983e+04 4.49913e+02
> Pressure (bar)
> 8.20510e+02
>
> Step Time
> 5000 5.00000
>
> Writing checkpoint, step 5000 at Mon Sep 10 21:04:37 2018
>
>
> Energies (kJ/mol)
> Bond Angle Ryckaert-Bell. LJ (SR) Coulomb (SR)
> 7.30979e+03 7.57440e+03 1.48801e+04 -2.30979e+04 0.00000e+00
> Potential Kinetic En. Total Energy Conserved En. Temperature
> 6.66641e+03 2.25799e+04 2.92463e+04 5.28503e+04 3.01799e+02
> Pressure (bar)
> -8.06942e+01
>
> <====== ############### ==>
> <==== A V E R A G E S ====>
> <== ############### ======>
>
> Statistics over 5001 steps using 51 frames
>
> Energies (kJ/mol)
> Bond Angle Ryckaert-Bell. LJ (SR) Coulomb (SR)
> 7.59408e+03 7.81450e+03 1.51294e+04 -2.29783e+04 0.00000e+00
> Potential Kinetic En. Total Energy Conserved En. Temperature
> 7.55967e+03 2.30250e+04 3.05847e+04 5.29245e+04 3.07748e+02
> Pressure (bar)
> 2.63622e+01
>
> Total Virial (kJ/mol)
> 7.74123e+03 2.93639e+02 1.13344e+02
> 2.93639e+02 7.68271e+03 -3.40627e+02
> 1.13345e+02 -3.40625e+02 7.17150e+03
>
> Pressure (bar)
> -1.13044e+01 -5.10385e+01 -1.83614e+01
> -5.10385e+01 -2.53181e+00 6.71371e+01
> -1.83616e+01 6.71366e+01 9.29227e+01
>
>
> M E G A - F L O P S A C C O U N T I N G
>
> NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
> RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
> W3=SPC/TIP3p W4=TIP4p (single or pairs)
> V&F=Potential and force V=Potential only F=Force only
>
> Computing: M-Number M-Flops % Flops
>
> -----------------------------------------------------------------------------
> Pair Search distance check 27.160704 244.446 0.0
> NxN RF Elec. + LJ [F] 24321.496320 924216.860 96.8
> NxN RF Elec. + LJ [V&F] 250.586880 13531.692 1.4
> Shift-X 0.612000 3.672 0.0
> Bonds 30.000999 1770.059 0.2
> Angles 29.995998 5039.328 0.5
> RB-Dihedrals 29.990997 7407.776 0.8
> Virial 0.614295 11.057 0.0
> Stop-CM 0.624000 6.240 0.0
> Calc-Ekin 6.024000 162.648 0.0
> Virtual Site 3fd 29.995998 2849.620 0.3
> Virtual Site 3fad 0.010002 1.760 0.0
>
> -----------------------------------------------------------------------------
> Total 955245.158 100.0
>
> -----------------------------------------------------------------------------
>
>
> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>
> On 1 MPI rank, each using 8 OpenMP threads
>
> Computing: Num Num Call Wall time Giga-Cycles
> Ranks Threads Count (s) total sum %
>
> -----------------------------------------------------------------------------
> Vsite constr. 1 8 5001 19.000 610.049 7.6
> Neighbor search 1 8 51 0.878 28.190 0.4
> Launch GPU ops. 1 8 5001 13.524 434.216 5.4
> Force 1 8 5001 88.859 2853.066 35.5
> Wait GPU NB local 1 8 5001 1.060 34.044 0.4
> NB X/F buffer ops. 1 8 9951 41.072 1318.714 16.4
> Vsite spread 1 8 5001 38.567 1238.308 15.4
> Write traj. 1 8 1 0.062 1.999 0.0
> Update 1 8 5001 44.615 1432.481 17.8
> Rest 2.560 82.197 1.0
>
> -----------------------------------------------------------------------------
> Total 250.198 8033.266 100.0
>
> -----------------------------------------------------------------------------
>
> GPU timings
>
> -----------------------------------------------------------------------------
> Computing: Count Wall t (s) ms/step %
>
> -----------------------------------------------------------------------------
> Pair list H2D 51 0.001 0.024 0.0
> X / q H2D 5001 0.029 0.006 0.3
> Nonbonded F kernel 4950 8.437 1.704 77.8
> Nonbonded F+ene+prune k. 51 0.213 4.167 2.0
> F D2H 5001 2.171 0.434 20.0
>
> -----------------------------------------------------------------------------
> Total 10.851 2.170 100.0
>
> -----------------------------------------------------------------------------
>
> Average per-step force GPU/CPU evaluation time ratio: 2.170 ms/17.768 ms
> = 0.122
>
> Core t (s) Wall t (s) (%)
> Time: 2001.585 250.198 800.0
> (ns/day) (hour/ns)
> Performance: 1.727 13.897
> Finished mdrun on rank 0 Mon Sep 10 21:04:37 2018
>