[gmx-users] Workstation choice
Szilárd Páll
pall.szilard at gmail.com
Tue Sep 11 16:55:50 CEST 2018
Sadly, I can't recommend packaged versions of GROMACS for anything other
than pre-/post-processing or non-performance-critical work; these are
compiled without proper SIMD support (so they run on any CPU), which
generally wastes a large fraction of the machine's performance.
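
If you do want production performance on such a machine, building from
source with matching options is straightforward. A minimal sketch (the
AVX_128_FMA value is what the log below reports as the best SIMD level
for this FX-8350; the install prefix and job count are just examples):

    tar xf gromacs-2018.2.tar.gz && cd gromacs-2018.2
    mkdir build && cd build
    # pick the SIMD level of the machine that will run mdrun,
    # and enable RDTSCP, as the log below also suggests
    cmake .. -DGMX_SIMD=AVX_128_FMA -DGMX_USE_RDTSCP=ON \
             -DGMX_GPU=ON -DGMX_USE_OPENCL=ON \
             -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-2018.2
    make -j 8 && make install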
Also, I can't (yet) recommend AMD GPUs as a buying option for
consumer-grade machines, since we don't yet have PME offload support in
OpenCL; this will change soon, though.
Additionally, and importantly, I can't recommend the Mesa OpenCL stack:
it's just not competitive in performance. Use ROCm (or AMDGPU-PRO)
instead.
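
To verify which OpenCL stack your build will actually pick up, clinfo is
handy; a quick check (the grep pattern is only illustrative):

    # list the platforms/devices the OpenCL ICD loader sees
    clinfo | grep -iE 'platform name|device version'
    # Mesa/Clover reports e.g. "OpenCL 1.1 Mesa 18.0.5" (as in the log
    # below), whereas ROCm or AMDGPU-PRO expose the AMD platform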
--
Szilárd
On Mon, Sep 10, 2018 at 8:21 PM Benson Muite <benson.muite at ut.ee> wrote:
> Some results (probably suboptimal) for d.poly-ch2 on a desktop running
> Fedora 28, using the GROMACS OpenCL package from the Fedora repositories:
>
> Log file opened on Mon Sep 10 21:00:25 2018
> Host: mikihir pid: 32669 rank ID: 0 number of ranks: 1
> :-) GROMACS - gmx mdrun, 2018.2 (-:
>
> GROMACS is written by:
>      Emile Apol          Rossen Apostolov    Paul Bauer          Herman J.C. Berendsen
>      Par Bjelkmar        Aldert van Buuren   Rudi van Drunen     Anton Feenstra
>      Gerrit Groenhof     Aleksei Iupinov     Christoph Junghans  Anca Hamuraru
>      Vincent Hindriksen  Dimitrios Karkoulis Peter Kasson        Jiri Kraus
>      Carsten Kutzner     Per Larsson         Justin A. Lemkul    Viveca Lindahl
>      Magnus Lundborg     Pieter Meulenhoff   Erik Marklund       Teemu Murtola
>      Szilard Pall        Sander Pronk        Roland Schulz       Alexey Shvetsov
>      Michael Shirts      Alfons Sijbers      Peter Tieleman      Teemu Virolainen
>      Christian Wennberg  Maarten Wolf
> and the project leaders:
> Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
>
> Copyright (c) 1991-2000, University of Groningen, The Netherlands.
> Copyright (c) 2001-2017, The GROMACS development team at
> Uppsala University, Stockholm University and
> the Royal Institute of Technology, Sweden.
> check out http://www.gromacs.org for more information.
>
> GROMACS is free software; you can redistribute it and/or modify it
> under the terms of the GNU Lesser General Public License
> as published by the Free Software Foundation; either version 2.1
> of the License, or (at your option) any later version.
>
> GROMACS: gmx mdrun, version 2018.2
> Executable: /usr/bin/gmx
> Data prefix: /usr
> Working dir: /home/benson/Projects/GromacsBench/d.poly-ch2
> Command line:
> gmx mdrun
>
> GROMACS version: 2018.2
> Precision: single
> Memory model: 64 bit
> MPI library: thread_mpi
> OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
> GPU support: OpenCL
> SIMD instructions: SSE2
> FFT library: fftw-3.3.5-sse2-avx
> RDTSCP usage: disabled
> TNG support: enabled
> Hwloc support: hwloc-1.11.6
> Tracing support: disabled
> Built on: 2018-07-19 19:45:21
> Built by: mockbuild@ [CMAKE]
> Build OS/arch: Linux 4.17.3-200.fc28.x86_64 x86_64
> Build CPU vendor: Intel
> Build CPU brand: Intel Core Processor (Haswell, no TSX)
> Build CPU family: 6 Model: 60 Stepping: 1
> Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma intel
> lahf mmx msr pcid pclmuldq popcnt pse rdrnd rdtscp sse2 sse3 sse4.1
> sse4.2 ssse3 tdt x2apic
> C compiler: /usr/bin/cc GNU 8.1.1
> C compiler flags: -msse2 -O2 -g -pipe -Wall -Werror=format-security
> -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions
> -fstack-protector-strong -grecord-gcc-switches
> -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
> -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
> -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
> -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong
> -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
> -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
> -DNDEBUG -funroll-all-loops -fexcess-precision=fast
> C++ compiler: /usr/bin/c++ GNU 8.1.1
> C++ compiler flags: -msse2 -O2 -g -pipe -Wall -Werror=format-security
> -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions
> -fstack-protector-strong -grecord-gcc-switches
> -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
> -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
> -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
> -std=c++11 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
> OpenCL include dir: /usr/include
> OpenCL library: /usr/lib64/libOpenCL.so
> OpenCL version: 2.0
>
>
> Running on 1 node with total 8 cores, 8 logical cores, 1 compatible GPU
> Hardware detected:
> CPU info:
> Vendor: AMD
> Brand: AMD FX(tm)-8350 Eight-Core Processor
> Family: 21 Model: 2 Stepping: 0
> Features: aes amd apic avx clfsh cmov cx8 cx16 f16c fma fma4 htt
> lahf misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp
> sse2 sse3 sse4a sse4.1 sse4.2 ssse3 xop
> Hardware topology: Full, with devices
> Sockets, cores, and logical processors:
> Socket 0: [ 0] [ 1] [ 2] [ 3] [ 4] [ 5] [ 6] [ 7]
> Numa nodes:
> Node 0 (16714620928 bytes mem): 0 1 2 3 4 5 6 7
> Latency:
> 0
> 0 1.00
> Caches:
> L1: 16384 bytes, linesize 64 bytes, assoc. 4, shared 1 ways
> L2: 2097152 bytes, linesize 64 bytes, assoc. 16, shared 2 ways
> L3: 8388608 bytes, linesize 64 bytes, assoc. 64, shared 8 ways
> PCI devices:
> 0000:01:00.0 Id: 1002:67ef Class: 0x0300 Numa: 0
> 0000:02:00.0 Id: 10ec:8168 Class: 0x0200 Numa: 0
> 0000:00:11.0 Id: 1002:4391 Class: 0x0106 Numa: 0
> GPU info:
> Number of GPUs detected: 1
> #0: name: Radeon RX 560 Series (POLARIS11 / DRM 3.23.0 /
> 4.16.3-301.fc28.x86_64, LLVM 6.0.0), vendor: AMD, device version: OpenCL
> 1.1 Mesa 18.0.5, stat: compatible
>
> Highest SIMD level requested by all nodes in run: AVX_128_FMA
> SIMD instructions selected at compile time: SSE2
> This program was compiled for different hardware than you are running on,
> which could influence performance.
> The current CPU can measure timings more accurately than the code in
> gmx mdrun was configured to use. This might affect your simulation
> speed as accurate timings are needed for load-balancing.
> Please consider rebuilding gmx mdrun with the GMX_USE_RDTSCP=ON CMake
> option.
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
> Lindahl
> GROMACS: High performance molecular simulations through multi-level
> parallelism from laptops to supercomputers
> SoftwareX 1 (2015) pp. 19-25
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
> Tackling Exascale Software Challenges in Molecular Dynamics Simulations
> with
> GROMACS
> In S. Markidis & E. Laure (Eds.), Solving Software Challenges for
> Exascale 8759 (2015) pp. 3-27
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
> Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E.
> Lindahl
> GROMACS 4.5: a high-throughput and highly parallel open source molecular
> simulation toolkit
> Bioinformatics 29 (2013) pp. 845-54
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
> GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
> molecular simulation
> J. Chem. Theory Comput. 4 (2008) pp. 435-447
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
> Berendsen
> GROMACS: Fast, Flexible and Free
> J. Comp. Chem. 26 (2005) pp. 1701-1719
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> E. Lindahl and B. Hess and D. van der Spoel
> GROMACS 3.0: A package for molecular simulation and trajectory analysis
> J. Mol. Mod. 7 (2001) pp. 306-317
> -------- -------- --- Thank You --- -------- --------
>
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> H. J. C. Berendsen, D. van der Spoel and R. van Drunen
> GROMACS: A message-passing parallel molecular dynamics implementation
> Comp. Phys. Comm. 91 (1995) pp. 43-56
> -------- -------- --- Thank You --- -------- --------
>
> Input Parameters:
> integrator = md
> tinit = 0
> dt = 0.001
> nsteps = 5000
> init-step = 0
> simulation-part = 1
> comm-mode = Linear
> nstcomm = 100
> bd-fric = 0
> ld-seed = -191216883
> emtol = 10
> emstep = 0.01
> niter = 20
> fcstep = 0
> nstcgsteep = 1000
> nbfgscorr = 10
> rtpi = 0.05
> nstxout = 0
> nstvout = 0
> nstfout = 0
> nstlog = 0
> nstcalcenergy = 100
> nstenergy = 0
> nstxout-compressed = 0
> compressed-x-precision = 1000
> cutoff-scheme = Verlet
> nstlist = 20
> ns-type = Grid
> pbc = xyz
> periodic-molecules = false
> verlet-buffer-tolerance = 0.005
> rlist = 0.9
> coulombtype = Cut-off
> coulomb-modifier = Potential-shift
> rcoulomb-switch = 0
> rcoulomb = 0.9
> epsilon-r = 1
> epsilon-rf = inf
> vdw-type = Cut-off
> vdw-modifier = Potential-shift
> rvdw-switch = 0
> rvdw = 0.9
> DispCorr = No
> table-extension = 1
> fourierspacing = 0.12
> fourier-nx = 0
> fourier-ny = 0
> fourier-nz = 0
> pme-order = 4
> ewald-rtol = 1e-05
> ewald-rtol-lj = 0.001
> lj-pme-comb-rule = Geometric
> ewald-geometry = 0
> epsilon-surface = 0
> implicit-solvent = No
> gb-algorithm = Still
> nstgbradii = 1
> rgbradii = 1
> gb-epsilon-solvent = 80
> gb-saltconc = 0
> gb-obc-alpha = 1
> gb-obc-beta = 0.8
> gb-obc-gamma = 4.85
> gb-dielectric-offset = 0.009
> sa-algorithm = Ace-approximation
> sa-surface-tension = 2.05016
> tcoupl = Berendsen
> nsttcouple = 20
> nh-chain-length = 0
> print-nose-hoover-chain-variables = false
> pcoupl = No
> pcoupltype = Isotropic
> nstpcouple = -1
> tau-p = 1
> compressibility (3x3):
> compressibility[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> compressibility[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> ref-p (3x3):
> ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> refcoord-scaling = No
> posres-com (3):
> posres-com[0]= 0.00000e+00
> posres-com[1]= 0.00000e+00
> posres-com[2]= 0.00000e+00
> posres-comB (3):
> posres-comB[0]= 0.00000e+00
> posres-comB[1]= 0.00000e+00
> posres-comB[2]= 0.00000e+00
> QMMM = false
> QMconstraints = 0
> QMMMscheme = 0
> MMChargeScaleFactor = 1
> qm-opts:
> ngQM = 0
> constraint-algorithm = Lincs
> continuation = false
> Shake-SOR = false
> shake-tol = 0.0001
> lincs-order = 4
> lincs-iter = 1
> lincs-warnangle = 30
> nwall = 0
> wall-type = 9-3
> wall-r-linpot = -1
> wall-atomtype[0] = -1
> wall-atomtype[1] = -1
> wall-density[0] = 0
> wall-density[1] = 0
> wall-ewald-zfac = 3
> pull = false
> awh = false
> rotation = false
> interactiveMD = false
> disre = No
> disre-weighting = Conservative
> disre-mixed = false
> dr-fc = 1000
> dr-tau = 0
> nstdisreout = 100
> orire-fc = 0
> orire-tau = 0
> nstorireout = 100
> free-energy = no
> cos-acceleration = 0
> deform (3x3):
> deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> simulated-tempering = false
> swapcoords = no
> userint1 = 0
> userint2 = 0
> userint3 = 0
> userint4 = 0
> userreal1 = 0
> userreal2 = 0
> userreal3 = 0
> userreal4 = 0
> applied-forces:
> electric-field:
> x:
> E0 = 0
> omega = 0
> t0 = 0
> sigma = 0
> y:
> E0 = 0
> omega = 0
> t0 = 0
> sigma = 0
> z:
> E0 = 0
> omega = 0
> t0 = 0
> sigma = 0
> grpopts:
> nrdf: 17997
> ref-t: 300
> tau-t: 0.1
> annealing: No
> annealing-npoints: 0
> acc: 0 0 0
> nfreeze: N N N
> energygrp-flags[ 0]: 0
>
> Changing nstlist from 20 to 100, rlist from 0.9 to 0.905
>
>
> Using 1 MPI thread
> Using 8 OpenMP threads
>
> 1 GPU auto-selected for this run.
> Mapping of GPU IDs to the 1 GPU task in the 1 rank on this node:
> PP:0
> Pinning threads with an auto-selected logical core stride of 1
> System total charge: 0.000
> Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Coulomb -1e+00
>
> Using GPU 8x8 nonbonded short-range kernels
>
> Using a 8x4 pair-list setup:
> updated every 100 steps, buffer 0.005 nm, rlist 0.905 nm
> At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list
> would be:
> updated every 100 steps, buffer 0.076 nm, rlist 0.976 nm
>
> Using geometric Lennard-Jones combination rule
>
> Removing pbc first time
>
> Intra-simulation communication will occur every 20 steps.
> Center of mass motion removal mode is Linear
> We have the following groups for center of mass motion removal:
> 0: rest
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> H. J. C. Berendsen, J. P. M. Postma, A. DiNola and J. R. Haak
> Molecular dynamics with coupling to an external bath
> J. Chem. Phys. 81 (1984) pp. 3684-3690
> -------- -------- --- Thank You --- -------- --------
>
> There are: 6000 Atoms
> There are: 6000 VSites
> Initial temperature: 450.358 K
>
> Started mdrun on rank 0 Mon Sep 10 21:00:27 2018
> Step Time
> 0 0.00000
>
> Energies (kJ/mol)
> Bond Angle Ryckaert-Bell. LJ (SR) Coulomb (SR)
> 1.10780e+04 1.13402e+04 1.88807e+04 -2.19619e+04 0.00000e+00
> Potential Kinetic En. Total Energy Conserved En. Temperature
> 1.93369e+04 3.36615e+04 5.29983e+04 5.29983e+04 4.49913e+02
> Pressure (bar)
> 8.20510e+02
>
> Step Time
> 5000 5.00000
>
> Writing checkpoint, step 5000 at Mon Sep 10 21:04:37 2018
>
>
> Energies (kJ/mol)
> Bond Angle Ryckaert-Bell. LJ (SR) Coulomb (SR)
> 7.30979e+03 7.57440e+03 1.48801e+04 -2.30979e+04 0.00000e+00
> Potential Kinetic En. Total Energy Conserved En. Temperature
> 6.66641e+03 2.25799e+04 2.92463e+04 5.28503e+04 3.01799e+02
> Pressure (bar)
> -8.06942e+01
>
> <====== ############### ==>
> <==== A V E R A G E S ====>
> <== ############### ======>
>
> Statistics over 5001 steps using 51 frames
>
> Energies (kJ/mol)
> Bond Angle Ryckaert-Bell. LJ (SR) Coulomb (SR)
> 7.59408e+03 7.81450e+03 1.51294e+04 -2.29783e+04 0.00000e+00
> Potential Kinetic En. Total Energy Conserved En. Temperature
> 7.55967e+03 2.30250e+04 3.05847e+04 5.29245e+04 3.07748e+02
> Pressure (bar)
> 2.63622e+01
>
> Total Virial (kJ/mol)
> 7.74123e+03 2.93639e+02 1.13344e+02
> 2.93639e+02 7.68271e+03 -3.40627e+02
> 1.13345e+02 -3.40625e+02 7.17150e+03
>
> Pressure (bar)
> -1.13044e+01 -5.10385e+01 -1.83614e+01
> -5.10385e+01 -2.53181e+00 6.71371e+01
> -1.83616e+01 6.71366e+01 9.29227e+01
>
>
> M E G A - F L O P S A C C O U N T I N G
>
> NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
> RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
> W3=SPC/TIP3p W4=TIP4p (single or pairs)
> V&F=Potential and force V=Potential only F=Force only
>
> Computing: M-Number M-Flops % Flops
>
> -----------------------------------------------------------------------------
> Pair Search distance check 27.160704 244.446 0.0
> NxN RF Elec. + LJ [F] 24321.496320 924216.860 96.8
> NxN RF Elec. + LJ [V&F] 250.586880 13531.692 1.4
> Shift-X 0.612000 3.672 0.0
> Bonds 30.000999 1770.059 0.2
> Angles 29.995998 5039.328 0.5
> RB-Dihedrals 29.990997 7407.776 0.8
> Virial 0.614295 11.057 0.0
> Stop-CM 0.624000 6.240 0.0
> Calc-Ekin 6.024000 162.648 0.0
> Virtual Site 3fd 29.995998 2849.620 0.3
> Virtual Site 3fad 0.010002 1.760 0.0
>
> -----------------------------------------------------------------------------
> Total 955245.158 100.0
>
> -----------------------------------------------------------------------------
>
>
> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>
> On 1 MPI rank, each using 8 OpenMP threads
>
> Computing: Num Num Call Wall time Giga-Cycles
> Ranks Threads Count (s) total sum %
>
> -----------------------------------------------------------------------------
> Vsite constr. 1 8 5001 19.000 610.049 7.6
> Neighbor search 1 8 51 0.878 28.190 0.4
> Launch GPU ops. 1 8 5001 13.524 434.216 5.4
> Force 1 8 5001 88.859 2853.066 35.5
> Wait GPU NB local 1 8 5001 1.060 34.044 0.4
> NB X/F buffer ops. 1 8 9951 41.072 1318.714 16.4
> Vsite spread 1 8 5001 38.567 1238.308 15.4
> Write traj. 1 8 1 0.062 1.999 0.0
> Update 1 8 5001 44.615 1432.481 17.8
> Rest 2.560 82.197 1.0
>
> -----------------------------------------------------------------------------
> Total 250.198 8033.266 100.0
>
> -----------------------------------------------------------------------------
>
> GPU timings
>
> -----------------------------------------------------------------------------
> Computing: Count Wall t (s) ms/step %
>
> -----------------------------------------------------------------------------
> Pair list H2D 51 0.001 0.024 0.0
> X / q H2D 5001 0.029 0.006 0.3
> Nonbonded F kernel 4950 8.437 1.704 77.8
> Nonbonded F+ene+prune k. 51 0.213 4.167 2.0
> F D2H 5001 2.171 0.434 20.0
>
> -----------------------------------------------------------------------------
> Total 10.851 2.170 100.0
>
> -----------------------------------------------------------------------------
>
> Average per-step force GPU/CPU evaluation time ratio: 2.170 ms/17.768 ms
> = 0.122
>
> Core t (s) Wall t (s) (%)
> Time: 2001.585 250.198 800.0
> (ns/day) (hour/ns)
> Performance: 1.727 13.897
> Finished mdrun on rank 0 Mon Sep 10 21:04:37 2018
>