[gmx-users] Workstation choice
Benson Muite
benson.muite at ut.ee
Sun Sep 9 08:09:50 CEST 2018
As an example, on the d.poly-ch2 benchmark from
ftp://ftp.gromacs.org/pub/benchmarks/gmxbench-3.0.tar.gz, I get the log below.
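For anyone wanting to reproduce it, the steps are roughly the following (a minimal
sketch; the mdrun command is the one shown in the log, while the grompp step assumes
the usual grompp.mdp/conf.gro/topol.top input names in the benchmark directory):

  # download and unpack the gmxbench-3.0 benchmark set
  wget ftp://ftp.gromacs.org/pub/benchmarks/gmxbench-3.0.tar.gz
  tar xzf gmxbench-3.0.tar.gz
  cd d.poly-ch2
  # preprocess the inputs into a run file, then run the benchmark
  gmx grompp -f grompp.mdp -c conf.gro -p topol.top -o mdbench.tpr
  gmx mdrun -deffnm mdbench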
Log file opened on Sun Sep 9 09:03:49 2018
Host: localhost.localdomain pid: 14194 rank ID: 0 number of ranks: 1
:-) GROMACS - gmx mdrun, 2018.3 (-:
GROMACS is written by:
Emile Apol, Rossen Apostolov, Paul Bauer, Herman J.C. Berendsen,
Par Bjelkmar, Aldert van Buuren, Rudi van Drunen, Anton Feenstra,
Gerrit Groenhof, Aleksei Iupinov, Christoph Junghans, Anca Hamuraru,
Vincent Hindriksen, Dimitrios Karkoulis, Peter Kasson, Jiri Kraus,
Carsten Kutzner, Per Larsson, Justin A. Lemkul, Viveca Lindahl,
Magnus Lundborg, Pieter Meulenhoff, Erik Marklund, Teemu Murtola,
Szilard Pall, Sander Pronk, Roland Schulz, Alexey Shvetsov,
Michael Shirts, Alfons Sijbers, Peter Tieleman, Teemu Virolainen,
Christian Wennberg, and Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2017, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.
GROMACS: gmx mdrun, version 2018.3
Executable:
/home/benson/Projects/GromacsTest/gmx-bench3/d.poly-ch2/../../gromacsinstall/bin/gmx
Data prefix:
/home/benson/Projects/GromacsTest/gmx-bench3/d.poly-ch2/../../gromacsinstall
Working dir: /home/benson/Projects/GromacsTest/gmx-bench3/d.poly-ch2
Command line:
gmx mdrun -deffnm mdbench
GROMACS version: 2018.3
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: OpenCL
SIMD instructions: AVX2_256
FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128-avx512
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
Built on: 2018-08-24 19:14:49
Built by: benson at localhost.localdomain [CMAKE]
Build OS/arch: Linux 4.17.14-202.fc28.x86_64 x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
Build CPU family: 6 Model: 78 Stepping: 3
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /usr/bin/cc GNU 8.1.1
C compiler flags: -march=core-avx2 -O3 -DNDEBUG
-funroll-all-loops -fexcess-precision=fast
C++ compiler: /usr/bin/c++ GNU 8.1.1
C++ compiler flags: -march=core-avx2 -std=c++11 -O3 -DNDEBUG
-funroll-all-loops -fexcess-precision=fast
OpenCL include dir: /usr/include
OpenCL library: /usr/lib64/libOpenCL.so
OpenCL version: 2.0
Running on 1 node with total 2 cores, 4 logical cores, 0 compatible GPUs
Hardware detected:
CPU info:
Vendor: Intel
Brand: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
Family: 6 Model: 78 Stepping: 3
Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
Hardware topology: Basic
Sockets, cores, and logical processors:
Socket 0: [ 0 2] [ 1 3]
GPU info:
Number of GPUs detected: 1
#0: name: Intel(R) HD Graphics Skylake ULT GT2, vendor: Intel, device version: OpenCL 2.0 beignet 1.3, stat: incompatible
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for
Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------
Input Parameters:
integrator = md
tinit = 0
dt = 0.001
nsteps = 5000
init-step = 0
simulation-part = 1
comm-mode = Linear
nstcomm = 100
bd-fric = 0
ld-seed = 142701291
emtol = 10
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 0
nstvout = 0
nstfout = 0
nstlog = 0
nstcalcenergy = 100
nstenergy = 0
nstxout-compressed = 0
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 20
ns-type = Grid
pbc = xyz
periodic-molecules = false
verlet-buffer-tolerance = 0.005
rlist = 0.9
coulombtype = Cut-off
coulomb-modifier = Potential-shift
rcoulomb-switch = 0
rcoulomb = 0.9
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Potential-shift
rvdw-switch = 0
rvdw = 0.9
DispCorr = No
table-extension = 1
fourierspacing = 0.12
fourier-nx = 0
fourier-ny = 0
fourier-nz = 0
pme-order = 4
ewald-rtol = 1e-05
ewald-rtol-lj = 0.001
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
implicit-solvent = No
gb-algorithm = Still
nstgbradii = 1
rgbradii = 1
gb-epsilon-solvent = 80
gb-saltconc = 0
gb-obc-alpha = 1
gb-obc-beta = 0.8
gb-obc-gamma = 4.85
gb-dielectric-offset = 0.009
sa-algorithm = Ace-approximation
sa-surface-tension = 2.05016
tcoupl = Berendsen
nsttcouple = 20
nh-chain-length = 0
print-nose-hoover-chain-variables = false
pcoupl = No
pcoupltype = Isotropic
nstpcouple = -1
tau-p = 1
compressibility (3x3):
compressibility[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
compressibility[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p (3x3):
ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
refcoord-scaling = No
posres-com (3):
posres-com[0]= 0.00000e+00
posres-com[1]= 0.00000e+00
posres-com[2]= 0.00000e+00
posres-comB (3):
posres-comB[0]= 0.00000e+00
posres-comB[1]= 0.00000e+00
posres-comB[2]= 0.00000e+00
QMMM = false
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Lincs
continuation = false
Shake-SOR = false
shake-tol = 0.0001
lincs-order = 4
lincs-iter = 1
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = false
awh = false
rotation = false
interactiveMD = false
disre = No
disre-weighting = Conservative
disre-mixed = false
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = no
cos-acceleration = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
simulated-tempering = false
swapcoords = no
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
applied-forces:
electric-field:
x:
E0 = 0
omega = 0
t0 = 0
sigma = 0
y:
E0 = 0
omega = 0
t0 = 0
sigma = 0
z:
E0 = 0
omega = 0
t0 = 0
sigma = 0
grpopts:
nrdf: 17997
ref-t: 300
tau-t: 0.1
annealing: No
annealing-npoints: 0
acc: 0 0 0
nfreeze: N N N
energygrp-flags[ 0]: 0
Changing nstlist from 20 to 100, rlist from 0.9 to 0.905
Using 1 MPI thread
Using 4 OpenMP threads
Pinning threads with an auto-selected logical core stride of 1
System total charge: 0.000
Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Coulomb -1e+00
Using SIMD 4x8 nonbonded short-range kernels
Using a 4x8 pair-list setup:
updated every 100 steps, buffer 0.005 nm, rlist 0.905 nm
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list
would be:
updated every 100 steps, buffer 0.076 nm, rlist 0.976 nm
Using geometric Lennard-Jones combination rule
Removing pbc first time
Intra-simulation communication will occur every 20 steps.
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: rest
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, J. P. M. Postma, A. DiNola and J. R. Haak
Molecular dynamics with coupling to an external bath
J. Chem. Phys. 81 (1984) pp. 3684-3690
-------- -------- --- Thank You --- -------- --------
There are: 6000 Atoms
There are: 6000 VSites
Initial temperature: 450.358 K
Started mdrun on rank 0 Sun Sep 9 09:03:49 2018
Step Time
0 0.00000
Energies (kJ/mol)
Bond Angle Ryckaert-Bell. LJ (SR) Coulomb (SR)
1.10780e+04 1.13402e+04 1.88807e+04 -2.19620e+04 0.00000e+00
Potential Kinetic En. Total Energy Conserved En. Temperature
1.93369e+04 3.36615e+04 5.29983e+04 5.29983e+04 4.49913e+02
Pressure (bar)
8.20511e+02
Step Time
5000 5.00000
Writing checkpoint, step 5000 at Sun Sep 9 09:04:00 2018
Energies (kJ/mol)
Bond Angle Ryckaert-Bell. LJ (SR) Coulomb (SR)
7.34672e+03 7.53008e+03 1.48310e+04 -2.30530e+04 0.00000e+00
Potential Kinetic En. Total Energy Conserved En. Temperature
6.65487e+03 2.24076e+04 2.90624e+04 5.28535e+04 2.99496e+02
Pressure (bar)
-2.27502e+02
<====== ############### ==>
<==== A V E R A G E S ====>
<== ############### ======>
Statistics over 5001 steps using 51 frames
Energies (kJ/mol)
Bond Angle Ryckaert-Bell. LJ (SR) Coulomb (SR)
7.57761e+03 7.80533e+03 1.51049e+04 -2.29507e+04 0.00000e+00
Potential Kinetic En. Total Energy Conserved En. Temperature
7.53714e+03 2.29835e+04 3.05206e+04 5.29261e+04 3.07193e+02
Pressure (bar)
4.62342e+01
Total Virial (kJ/mol)
7.64724e+03 1.85579e+02 9.92485e+01
1.85576e+02 7.22973e+03 -1.43850e+02
9.92489e+01 -1.43849e+02 7.35308e+03
Pressure (bar)
5.66092e+00 -3.86826e+01 -1.72257e+01
-3.86822e+01 7.39537e+01 2.80894e+01
-1.72257e+01 2.80893e+01 5.90880e+01
M E G A - F L O P S A C C O U N T I N G
NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
W3=SPC/TIP3p W4=TIP4p (single or pairs)
V&F=Potential and force V=Potential only F=Force only
Computing: M-Number M-Flops % Flops
-----------------------------------------------------------------------------
Pair Search distance check 144.857066 1303.714 0.3
NxN LJ [F] 14451.085440 476885.820 95.0
NxN LJ [V&F] 148.897120 6402.576 1.3
Shift-X 0.612000 3.672 0.0
Bonds 30.000999 1770.059 0.4
Angles 29.995998 5039.328 1.0
RB-Dihedrals 29.990997 7407.776 1.5
Virial 0.614295 11.057 0.0
Stop-CM 0.624000 6.240 0.0
Calc-Ekin 6.024000 162.648 0.0
Virtual Site 3fd 29.995998 2849.620 0.6
Virtual Site 3fad 0.010002 1.760 0.0
-----------------------------------------------------------------------------
Total 501844.270 100.0
-----------------------------------------------------------------------------
R E A L C Y C L E A N D T I M E A C C O U N T I N G
On 1 MPI rank, each using 4 OpenMP threads
Computing: Num Num Call Wall time Giga-Cycles
Ranks Threads Count (s) total sum %
-----------------------------------------------------------------------------
Vsite constr. 1 4 5001 0.623 5.985 5.7
Neighbor search 1 4 51 0.293 2.814 2.7
Force 1 4 5001 8.681 83.340 78.9
NB X/F buffer ops. 1 4 9951 0.277 2.660 2.5
Vsite spread 1 4 5001 0.874 8.392 7.9
Write traj. 1 4 1 0.051 0.489 0.5
Update 1 4 5001 0.152 1.462 1.4
Rest 0.047 0.450 0.4
-----------------------------------------------------------------------------
Total 10.999 105.592 100.0
-----------------------------------------------------------------------------
Core t (s) Wall t (s) (%)
Time: 43.996 10.999 400.0
(ns/day) (hour/ns)
Performance: 39.285 0.611
Finished mdrun on rank 0 Sun Sep 9 09:04:00 2018
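As a quick check on the performance figure: nsteps = 5000 at dt = 0.001 ps is 5 ps
of simulation in about 11 s of wall time, i.e. 0.005 ns * (86400 s/day / 11 s) ≈ 39
ns/day, which matches the Performance line above. For comparing workstation
configurations, the same run can be repeated with different thread and offload
settings; a minimal sketch using standard mdrun options (the integrated GPU here was
detected as incompatible, so the GPU variant only applies on a machine with a
supported GPU):

  # try different thread-MPI rank / OpenMP thread splits, with thread pinning
  gmx mdrun -deffnm mdbench -ntmpi 1 -ntomp 4 -pin on
  gmx mdrun -deffnm mdbench -ntmpi 2 -ntomp 2 -pin on
  # on a machine with a compatible GPU, offload the nonbonded work
  gmx mdrun -deffnm mdbench -nb gpu -pin on

The ns/day value at the end of each log is the number to compare across hardware.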
On 09/09/2018 08:59 AM, Benson Muite wrote:
> This is old, but seems to indicate Beowulf clusters work quite well:
>
> https://docs.uabgrid.uab.edu/wiki/Gromacs_Benchmark
>
> Szilárd had helped create a benchmark data set available at:
> http://www.gromacs.org/About_Gromacs/Benchmarks
> http://www.gromacs.org/@api/deki/files/240/=gromacs-5.0-benchmarks.pdf
> ftp://ftp.gromacs.org/pub/benchmarks/gmxbench-3.0.tar.gz
>
> Does your use case involve a large number of ensemble simulations
> which can be done in single precision without error correction? If so,
> might you be better off building a small Beowulf cluster with lower-spec
> processors that have integrated GPUs? For example, a Ryzen 3 with
> integrated graphics is about $100; motherboard, RAM, and a power supply
> would probably get you to about $300. An Intel Core i3 bundle would be
> about $350. Setup could be done using the OpenHPC stack:
> http://www.openhpc.community/
>
> This would get you a personal 5-7 node in-house cluster. However, the
> ability to do maintenance and the availability of local repair support
> may also be important when considering system lifetime cost, not just
> the initial purchase price. GROMACS's current and future support for
> OpenCL is likely also important here.
>
> At least one computer store in my region has allowed benchmarking.
>
> On 09/07/2018 09:40 PM, Olga Selyutina wrote:
>> Hi,
>> Many thanks for the valuable information.
>> If it isn't too much trouble, could you tell me how the performance gain
>> from using a second GPU for a single simulation has changed in GROMACS
>> 2018 vs older versions (2016, 5.1), where it was 20-30% higher?
>>
>>
>> 2018-09-07 23:25 GMT+07:00 Szilárd Páll <pall.szilard at gmail.com>:
>>
>>> Are you intending to use it mostly/only for running simulations or also
>>> as a desktop computer?
>>>
>>> Yes, it will be mostly used for simulations.
>>
>>> I'm not on top of pricing details, so you should probably look at some
>>> configs and get back with concrete CPU + GPU (+ price) combinations, and
>>> we might be able to guesstimate what's best.
>>>
>>>
>> The following CPUs and GPUs are suitable in terms of price (in our region):
>> *GPU*
>> GTX 1070: ~1700 MHz, 1920 CUDA cores - $514
>> GTX 1080: ~1700 MHz, 2560 CUDA cores - $615
>> GTX 1070 Ti: ~1700 MHz, 2432 CUDA cores - $615
>> GTX 1080 Ti: ~1600 MHz, 3584 CUDA cores - $930
>>
>> *CPU*
>> Ryzen 7 2700X - $357
>> 4200 MHz, 8/16 cores/threads, L1/L2/L3 cache 768 KB/4 MB/16 MB, 105 W, max. T 85°C
>>
>> Threadripper 1950X - $930
>> 4000 MHz, 16/32 cores/threads, L1/L2/L3 cache 1.5/8/32 MB, 180 W, max. T 68°C
>>
>> i7 8086K - $515
>> 4800 MHz, 6/12 cores/threads, L2/L3 cache 1.5/12 MB, 95 W, max. T 100°C
>>
>> i7 8700K - $442
>> 4600 MHz, 6/12 cores/threads, L2/L3 cache 1.5/12 MB, 95 W, max. T 100°C
>>
>> The most suitable CPU+GPU combinations are as follows:
>> 1) Ryzen 7 2700X + two GTX 1080 - $1587
>> 1.1) Ryzen 7 2700X + one GTX 1080 + one GTX 1080*Ti* - $1900 (maybe?)
>> 2) Threadripper 1950X + one GTX 1080Ti - $1860
>> 3) i7 8700K + two GTX 1080 - $1672
>> 4) Ryzen 7 2700X + three GTX 1070 - $1900
>> My suggestions:
>> Variant 1 seems to be the most suitable.
>> Variant 2 seems to be suitable only if a single simulation is running on
>> the workstation.
>> It's a bit confusing that in synthetic tests/games the performance of the
>> i7 8700 is higher than that of the Ryzen 7 2700.
>> Thanks a lot again for your advice, it has already clarified a lot!
--
Research Fellow of Distributed Systems
Institute of Computer Science
University of Tartu
J. Liivi 2, 50409 Tartu, Estonia
http://kodu.ut.ee/~benson/