[gmx-users] Workstation choice

Mon Sep 10 20:20:48 CEST 2018

Some results (probably suboptimal) for the d.poly-ch2 benchmark on a desktop
running Fedora 28, using the GROMACS OpenCL build from the Fedora repositories:

Log file opened on Mon Sep 10 21:00:25 2018
Host: mikihir  pid: 32669  rank ID: 0  number of ranks:  1
                       :-) GROMACS - gmx mdrun, 2018.2 (-:

                             GROMACS is written by:
      Emile Apol      Rossen Apostolov      Paul Bauer     Herman J.C. Berendsen
     Par Bjelkmar    Aldert van Buuren   Rudi van Drunen     Anton Feenstra
   Gerrit Groenhof    Aleksei Iupinov   Christoph Junghans   Anca Hamuraru
  Vincent Hindriksen Dimitrios Karkoulis    Peter Kasson        Jiri Kraus
   Carsten Kutzner      Per Larsson      Justin A. Lemkul    Viveca Lindahl
   Magnus Lundborg   Pieter Meulenhoff    Erik Marklund      Teemu Murtola
     Szilard Pall       Sander Pronk      Roland Schulz     Alexey Shvetsov
    Michael Shirts     Alfons Sijbers     Peter Tieleman    Teemu Virolainen
  Christian Wennberg    Maarten Wolf
                            and the project leaders:
         Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2017, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS:      gmx mdrun, version 2018.2
Executable:   /usr/bin/gmx
Data prefix:  /usr
Working dir:  /home/benson/Projects/GromacsBench/d.poly-ch2
Command line:
   gmx mdrun

GROMACS version:    2018.2
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        OpenCL
SIMD instructions:  SSE2
FFT library:        fftw-3.3.5-sse2-avx
RDTSCP usage:       disabled
TNG support:        enabled
Hwloc support:      hwloc-1.11.6
Tracing support:    disabled
Built on:           2018-07-19 19:45:21
Built by:           mockbuild@ [CMAKE]
Build OS/arch:      Linux 4.17.3-200.fc28.x86_64 x86_64
Build CPU vendor:   Intel
Build CPU brand:    Intel Core Processor (Haswell, no TSX)
Build CPU family:   6   Model: 60   Stepping: 1
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma intel 
lahf mmx msr pcid pclmuldq popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 
sse4.2 ssse3 tdt x2apic
C compiler:         /usr/bin/cc GNU 8.1.1
C compiler flags:    -msse2   -O2 -g -pipe -Wall -Werror=format-security 
-Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions 
-fstack-protector-strong -grecord-gcc-switches 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic 
-fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection 
-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 
-Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong 
-grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic 
-fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection  
-DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler:       /usr/bin/c++ GNU 8.1.1
C++ compiler flags:  -msse2   -O2 -g -pipe -Wall -Werror=format-security 
-Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions 
-fstack-protector-strong -grecord-gcc-switches 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic 
-fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection 
-std=c++11   -DNDEBUG -funroll-all-loops -fexcess-precision=fast
OpenCL include dir: /usr/include
OpenCL library:     /usr/lib64/libOpenCL.so
OpenCL version:     2.0

Running on 1 node with total 8 cores, 8 logical cores, 1 compatible GPU
Hardware detected:
   CPU info:
     Vendor: AMD
     Brand:  AMD FX(tm)-8350 Eight-Core Processor
     Family: 21   Model: 2   Stepping: 0
     Features: aes amd apic avx clfsh cmov cx8 cx16 f16c fma fma4 htt 
lahf misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp 
sse2 sse3 sse4a sse4.1 sse4.2 ssse3 xop
   Hardware topology: Full, with devices
     Sockets, cores, and logical processors:
       Socket  0: [   0] [   1] [   2] [   3] [   4] [   5] [   6] [   7]
     Numa nodes:
       Node  0 (16714620928 bytes mem):   0   1   2   3   4   5   6   7
       Latency:
                0
          0  1.00
     Caches:
       L1: 16384 bytes, linesize 64 bytes, assoc. 4, shared 1 ways
       L2: 2097152 bytes, linesize 64 bytes, assoc. 16, shared 2 ways
       L3: 8388608 bytes, linesize 64 bytes, assoc. 64, shared 8 ways
     PCI devices:
       0000:01:00.0  Id: 1002:67ef  Class: 0x0300  Numa: 0
       0000:02:00.0  Id: 10ec:8168  Class: 0x0200  Numa: 0
       0000:00:11.0  Id: 1002:4391  Class: 0x0106  Numa: 0
   GPU info:
     Number of GPUs detected: 1
     #0: name: Radeon RX 560 Series (POLARIS11 / DRM 3.23.0 / 
4.16.3-301.fc28.x86_64, LLVM 6.0.0), vendor: AMD, device version: OpenCL 
1.1 Mesa 18.0.5, stat: compatible

Highest SIMD level requested by all nodes in run: AVX_128_FMA
SIMD instructions selected at compile time:       SSE2
This program was compiled for different hardware than you are running on,
which could influence performance.
The current CPU can measure timings more accurately than the code in
gmx mdrun was configured to use. This might affect your simulation
speed as accurate timings are needed for load-balancing.
Please consider rebuilding gmx mdrun with the GMX_USE_RDTSCP=ON CMake option.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for 
Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

Input Parameters:
    integrator                     = md
    tinit                          = 0
    dt                             = 0.001
    nsteps                         = 5000
    init-step                      = 0
    simulation-part                = 1
    comm-mode                      = Linear
    nstcomm                        = 100
    bd-fric                        = 0
    ld-seed                        = -191216883
    emtol                          = 10
    emstep                         = 0.01
    niter                          = 20
    fcstep                         = 0
    nstcgsteep                     = 1000
    nbfgscorr                      = 10
    rtpi                           = 0.05
    nstxout                        = 0
    nstvout                        = 0
    nstfout                        = 0
    nstlog                         = 0
    nstcalcenergy                  = 100
    nstenergy                      = 0
    nstxout-compressed             = 0
    compressed-x-precision         = 1000
    cutoff-scheme                  = Verlet
    nstlist                        = 20
    ns-type                        = Grid
    pbc                            = xyz
    periodic-molecules             = false
    verlet-buffer-tolerance        = 0.005
    rlist                          = 0.9
    coulombtype                    = Cut-off
    coulomb-modifier               = Potential-shift
    rcoulomb-switch                = 0
    rcoulomb                       = 0.9
    epsilon-r                      = 1
    epsilon-rf                     = inf
    vdw-type                       = Cut-off
    vdw-modifier                   = Potential-shift
    rvdw-switch                    = 0
    rvdw                           = 0.9
    DispCorr                       = No
    table-extension                = 1
    fourierspacing                 = 0.12
    fourier-nx                     = 0
    fourier-ny                     = 0
    fourier-nz                     = 0
    pme-order                      = 4
    ewald-rtol                     = 1e-05
    ewald-rtol-lj                  = 0.001
    lj-pme-comb-rule               = Geometric
    ewald-geometry                 = 0
    epsilon-surface                = 0
    implicit-solvent               = No
    gb-algorithm                   = Still
    nstgbradii                     = 1
    rgbradii                       = 1
    gb-epsilon-solvent             = 80
    gb-saltconc                    = 0
    gb-obc-alpha                   = 1
    gb-obc-beta                    = 0.8
    gb-obc-gamma                   = 4.85
    gb-dielectric-offset           = 0.009
    sa-algorithm                   = Ace-approximation
    sa-surface-tension             = 2.05016
    tcoupl                         = Berendsen
    nsttcouple                     = 20
    nh-chain-length                = 0
    print-nose-hoover-chain-variables = false
    pcoupl                         = No
    pcoupltype                     = Isotropic
    nstpcouple                     = -1
    tau-p                          = 1
    compressibility (3x3):
       compressibility[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
       compressibility[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
       compressibility[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
    ref-p (3x3):
       ref-p[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
       ref-p[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
       ref-p[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
    refcoord-scaling               = No
    posres-com (3):
       posres-com[0]= 0.00000e+00
       posres-com[1]= 0.00000e+00
       posres-com[2]= 0.00000e+00
    posres-comB (3):
       posres-comB[0]= 0.00000e+00
       posres-comB[1]= 0.00000e+00
       posres-comB[2]= 0.00000e+00
    QMMM                           = false
    QMconstraints                  = 0
    QMMMscheme                     = 0
    MMChargeScaleFactor            = 1
qm-opts:
    ngQM                           = 0
    constraint-algorithm           = Lincs
    continuation                   = false
    Shake-SOR                      = false
    shake-tol                      = 0.0001
    lincs-order                    = 4
    lincs-iter                     = 1
    lincs-warnangle                = 30
    nwall                          = 0
    wall-type                      = 9-3
    wall-r-linpot                  = -1
    wall-atomtype[0]               = -1
    wall-atomtype[1]               = -1
    wall-density[0]                = 0
    wall-density[1]                = 0
    wall-ewald-zfac                = 3
    pull                           = false
    awh                            = false
    rotation                       = false
    interactiveMD                  = false
    disre                          = No
    disre-weighting                = Conservative
    disre-mixed                    = false
    dr-fc                          = 1000
    dr-tau                         = 0
    nstdisreout                    = 100
    orire-fc                       = 0
    orire-tau                      = 0
    nstorireout                    = 100
    free-energy                    = no
    cos-acceleration               = 0
    deform (3x3):
       deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
       deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
       deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
    simulated-tempering            = false
    swapcoords                     = no
    userint1                       = 0
    userint2                       = 0
    userint3                       = 0
    userint4                       = 0
    userreal1                      = 0
    userreal2                      = 0
    userreal3                      = 0
    userreal4                      = 0
    applied-forces:
      electric-field:
        x:
          E0                       = 0
          omega                    = 0
          t0                       = 0
          sigma                    = 0
        y:
          E0                       = 0
          omega                    = 0
          t0                       = 0
          sigma                    = 0
        z:
          E0                       = 0
          omega                    = 0
          t0                       = 0
          sigma                    = 0
grpopts:
    nrdf:       17997
    ref-t:         300
    tau-t:         0.1
annealing:          No
annealing-npoints:           0
    acc:               0           0           0
    nfreeze:           N           N           N
    energygrp-flags[  0]: 0

Changing nstlist from 20 to 100, rlist from 0.9 to 0.905

Using 1 MPI thread
Using 8 OpenMP threads

1 GPU auto-selected for this run.
Mapping of GPU IDs to the 1 GPU task in the 1 rank on this node:
   PP:0
Pinning threads with an auto-selected logical core stride of 1
System total charge: 0.000
Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Coulomb -1e+00

Using GPU 8x8 nonbonded short-range kernels

Using a 8x4 pair-list setup:
   updated every 100 steps, buffer 0.005 nm, rlist 0.905 nm
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would be:
   updated every 100 steps, buffer 0.076 nm, rlist 0.976 nm

Using geometric Lennard-Jones combination rule

Removing pbc first time

Intra-simulation communication will occur every 20 steps.
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
   0:  rest

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, J. P. M. Postma, A. DiNola and J. R. Haak
Molecular dynamics with coupling to an external bath
J. Chem. Phys. 81 (1984) pp. 3684-3690
-------- -------- --- Thank You --- -------- --------

There are: 6000 Atoms
There are: 6000 VSites
Initial temperature: 450.358 K

Started mdrun on rank 0 Mon Sep 10 21:00:27 2018
            Step           Time
               0        0.00000

    Energies (kJ/mol)
            Bond          Angle Ryckaert-Bell.        LJ (SR)   Coulomb (SR)
     1.10780e+04    1.13402e+04    1.88807e+04   -2.19619e+04    0.00000e+00
       Potential    Kinetic En.   Total Energy  Conserved En.    Temperature
     1.93369e+04    3.36615e+04    5.29983e+04    5.29983e+04    4.49913e+02
  Pressure (bar)
     8.20510e+02

            Step           Time
            5000        5.00000

Writing checkpoint, step 5000 at Mon Sep 10 21:04:37 2018

    Energies (kJ/mol)
            Bond          Angle Ryckaert-Bell.        LJ (SR)   Coulomb (SR)
     7.30979e+03    7.57440e+03    1.48801e+04   -2.30979e+04    0.00000e+00
       Potential    Kinetic En.   Total Energy  Conserved En.    Temperature
     6.66641e+03    2.25799e+04    2.92463e+04    5.28503e+04    3.01799e+02
  Pressure (bar)
    -8.06942e+01

     <======  ###############  ==>
     <====  A V E R A G E S  ====>
     <==  ###############  ======>

     Statistics over 5001 steps using 51 frames

    Energies (kJ/mol)
            Bond          Angle Ryckaert-Bell.        LJ (SR)   Coulomb (SR)
     7.59408e+03    7.81450e+03    1.51294e+04   -2.29783e+04    0.00000e+00
       Potential    Kinetic En.   Total Energy  Conserved En.    Temperature
     7.55967e+03    2.30250e+04    3.05847e+04    5.29245e+04    3.07748e+02
  Pressure (bar)
     2.63622e+01

    Total Virial (kJ/mol)
     7.74123e+03    2.93639e+02    1.13344e+02
     2.93639e+02    7.68271e+03   -3.40627e+02
     1.13345e+02   -3.40625e+02    7.17150e+03

    Pressure (bar)
    -1.13044e+01   -5.10385e+01   -1.83614e+01
    -5.10385e+01   -2.53181e+00    6.71371e+01
    -1.83616e+01    6.71366e+01    9.29227e+01

     M E G A - F L O P S   A C C O U N T I N G

  NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
  RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
  W3=SPC/TIP3p  W4=TIP4p (single or pairs)
  V&F=Potential and force  V=Potential only  F=Force only

  Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
  Pair Search distance check              27.160704         244.446     0.0
  NxN RF Elec. + LJ [F]                24321.496320      924216.860    96.8
  NxN RF Elec. + LJ [V&F]                250.586880       13531.692     1.4
  Shift-X                                  0.612000           3.672     0.0
  Bonds                                   30.000999        1770.059     0.2
  Angles                                  29.995998        5039.328     0.5
  RB-Dihedrals                            29.990997        7407.776     0.8
  Virial                                   0.614295          11.057     0.0
  Stop-CM                                  0.624000           6.240     0.0
  Calc-Ekin                                6.024000         162.648     0.0
  Virtual Site 3fd                        29.995998        2849.620     0.3
  Virtual Site 3fad                        0.010002           1.760     0.0
-----------------------------------------------------------------------------
  Total                                                  955245.158   100.0
-----------------------------------------------------------------------------

      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 1 MPI rank, each using 8 OpenMP threads

  Computing:          Num   Num      Call    Wall time         Giga-Cycles
                      Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
  Vsite constr.          1    8       5001      19.000        610.049   7.6
  Neighbor search        1    8         51       0.878         28.190   0.4
  Launch GPU ops.        1    8       5001      13.524        434.216   5.4
  Force                  1    8       5001      88.859       2853.066  35.5
  Wait GPU NB local      1    8       5001       1.060         34.044   0.4
  NB X/F buffer ops.     1    8       9951      41.072       1318.714  16.4
  Vsite spread           1    8       5001      38.567       1238.308  15.4
  Write traj.            1    8          1       0.062          1.999   0.0
  Update                 1    8       5001      44.615       1432.481  17.8
  Rest                                           2.560         82.197   1.0
-----------------------------------------------------------------------------
  Total                                        250.198       8033.266 100.0
-----------------------------------------------------------------------------

  GPU timings
-----------------------------------------------------------------------------
  Computing:                         Count  Wall t (s)      ms/step       %
-----------------------------------------------------------------------------
  Pair list H2D                         51       0.001        0.024     0.0
  X / q H2D                           5001       0.029        0.006     0.3
  Nonbonded F kernel                  4950       8.437        1.704    77.8
  Nonbonded F+ene+prune k.              51       0.213        4.167     2.0
  F D2H                               5001       2.171        0.434    20.0
-----------------------------------------------------------------------------
  Total                                         10.851        2.170   100.0
-----------------------------------------------------------------------------

Average per-step force GPU/CPU evaluation time ratio: 2.170 ms/17.768 ms = 0.122

                Core t (s)   Wall t (s)        (%)
        Time:     2001.585      250.198      800.0
                  (ns/day)    (hour/ns)
Performance:        1.727       13.897
Finished mdrun on rank 0 Mon Sep 10 21:04:37 2018
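
For anyone comparing against their own hardware, the summary figures at the
end of the log can be recomputed from the raw timings. The short Python
sketch below is not part of GROMACS; the variable names are my own, and it
assumes that mdrun counts nsteps + 1 evaluations (steps 0 to 5000), which is
consistent with the call count of 5001 in the accounting tables.

```python
# Figures copied from the log above.
nsteps = 5000            # mdp "nsteps"
dt_ps = 0.001            # mdp "dt", in ps
wall_s = 250.198         # "Wall t (s)" in the final summary
n_threads = 8            # OpenMP threads on the single rank
force_wall_s = 88.859    # "Force" row, cycle and time accounting
gpu_total_s = 10.851     # "Total" row, GPU timings

# Assumption: steps 0..nsteps are all counted, i.e. nsteps + 1 evaluations.
n_calls = nsteps + 1
simulated_ns = n_calls * dt_ps / 1000.0

# Throughput figures from the "Performance:" line.
ns_per_day = simulated_ns / wall_s * 86400.0
hours_per_ns = wall_s / 3600.0 / simulated_ns

# Core time is wall time summed over all threads.
core_s = wall_s * n_threads

# Per-step cost of the GPU nonbonded work versus the CPU "Force" work.
gpu_ms_per_step = gpu_total_s / n_calls * 1000.0
cpu_ms_per_step = force_wall_s / n_calls * 1000.0
gpu_cpu_ratio = gpu_ms_per_step / cpu_ms_per_step

print(f"ns/day           {ns_per_day:.3f}")
print(f"hour/ns          {hours_per_ns:.3f}")
print(f"core time (s)    {core_s:.3f}")
print(f"GPU ms/step      {gpu_ms_per_step:.3f}")
print(f"CPU ms/step      {cpu_ms_per_step:.3f}")
print(f"GPU/CPU ratio    {gpu_cpu_ratio:.3f}")
```

The results should agree with the log to within rounding of the printed
wall times. A ratio of about 0.12 means the GPU finishes its nonbonded work
in roughly an eighth of the time the CPU spends in the Force section of
each step.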