[gmx-developers] Branches w/working OpenCL support

Mirco Wahab mirco.wahab at chemie.tu-freiberg.de
Thu Jun 18 23:18:12 CEST 2015


Hi Mark,

On 17.06.2015 23:31, Mark Abraham wrote:
> I've solved three correctness issues with the OpenCL implementation now,
> and NPT on AMD looks quite nice. Can you please try the code at
> https://gerrit.gromacs.org/#/c/4314/26 and let us know if the situation
> is improved?

I tried, but could not get it to compile under VS-2013/MSVC-18. The
problem, as already stated in a comment, is the macro expansion into an
alignment declaration that MSVC does not support:

[basedefinitions.h, line 244]
   #if defined(_MSC_VER) && (_MSC_VER >= 1700 || defined(__ICL))
   #  define GMX_ALIGNMENT 1
   #  define GMX_ALIGNED(type, alignment) __declspec(align(alignment*sizeof(type))) type
   #elif defined(__GNUC__) || defined(__clang__)
   #  define GMX_ALIGNMENT 1
   ...


I then replaced the "(alignment*sizeof(type))" expression with the
literal "32", which is probably the value it would have expanded to:
  # define GMX_ALIGNED(type, alignment) __declspec(align(32)) type
I also tested other values (16, 64).
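
To make the arithmetic behind the replacement value concrete, here is a
minimal C++11 sketch (not GROMACS code). The names requestedAlignment,
isPowerOfTwo, isAligned, AlignedFloat4 and selfCheck are made up for
illustration, the standard alignas specifier stands in for the
compiler-specific __declspec(align(...)), and sizeof(float) == 4 is
assumed. It shows that the byte count requested by
"alignment*sizeof(type)" depends on the element type and count at each
use site, so a single hard-coded literal may not match all of them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the byte alignment that the expression
// "alignment*sizeof(type)" from the quoted macro would request.
template <typename T>
constexpr std::size_t requestedAlignment(std::size_t elements)
{
    return elements * sizeof(T);
}

// Alignment values must be powers of two.
constexpr bool isPowerOfTwo(std::size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

// Runtime check that an address is a multiple of the given byte count.
inline bool isAligned(const void* p, std::size_t bytes)
{
    return reinterpret_cast<std::uintptr_t>(p) % bytes == 0;
}

// Assumption: sizeof(float) == 4, so four floats request 16 bytes.
static_assert(requestedAlignment<float>(4) == 16, "expects 4-byte float");

// alignas accepts a constant expression, here 4 * sizeof(float).
struct AlignedFloat4
{
    alignas(requestedAlignment<float>(4)) float data[4];
};

// Small self-check tying the pieces together.
inline void selfCheck()
{
    AlignedFloat4 v{};
    assert(isPowerOfTwo(requestedAlignment<float>(4)));
    assert(isAligned(v.data, requestedAlignment<float>(4)));
}
```

Under these assumptions, 4 floats give 16 bytes and 8 floats give 32
bytes. Which element counts the real call sites pass is not visible in
the excerpt above, so the literal that matches the original expression
may differ from one use of the macro to the next.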

I'm not sure this is the show-stopper, but every attempt gave the same
result: mdrun segfaults immediately after initialization, independent
of the input file (it works fine in nb=cpu mode).

I'm attaching the log file here.

regards,

M.



--------------------- 8< [crashed md.log w/gpu] ------------------------
Log file opened on Thu Jun 18 23:09:02 2015
Host: DENEB  pid: 4884  rank ID: 0  number of ranks:  1
                :-) GROMACS - gmx mdrun, VERSION 5.1-beta1-dev (-:

                             GROMACS is written by:
      Emile Apol      Rossen Apostolov  Herman J.C. Berendsen    Par Bjelkmar
  Aldert van Buuren   Rudi van Drunen     Anton Feenstra   Sebastian Fritsch
   Gerrit Groenhof   Christoph Junghans   Anca Hamuraru    Vincent Hindriksen
  Dimitrios Karkoulis    Peter Kasson     Carsten Kutzner      Per Larsson
   Justin A. Lemkul   Magnus Lundborg   Pieter Meulenhoff    Erik Marklund
    Teemu Murtola       Szilard Pall       Sander Pronk      Roland Schulz
   Alexey Shvetsov     Michael Shirts     Alfons Sijbers     Peter Tieleman
   Teemu Virolainen  Christian Wennberg    Maarten Wolf
                            and the project leaders:
         Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2015, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS:      gmx mdrun, VERSION 5.1-beta1-dev
Executable:   D:\GromacsCL\bin\gmx.exe
Data prefix:  D:\GromacsCL
Command line:
   gmx mdrun -v

GROMACS version:    VERSION 5.1-beta1-dev
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support:        enabled
OpenCL support:     enabled
invsqrt routine:    gmx_software_invsqrt(x)
SIMD instructions:  SSE2
FFT library:        fftw3
RDTSCP usage:       enabled
C++11 compilation:  disabled
TNG support:        enabled
Tracing support:    disabled
Built on:           Unknown date
Built by:           Anonymous at unknown [CMAKE]
Build OS/arch:      Windows-6.2 AMD64
Build CPU vendor:   AuthenticAMD
Build CPU brand:    AMD Phenom(tm) II X6 1090T Processor
Build CPU family:   16   Model: 10   Stepping: 0
Build CPU features: apic clfsh cmov cx8 cx16 htt lahf_lm misalignsse mmx msr nonstop_tsc pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a
C compiler:         C:/Program Files (x86)/Microsoft Visual Studio 12.0/VC/bin/x86_amd64/cl.exe MSVC 18.0.31101.0
C compiler flags:        /DWIN32 /D_WINDOWS /W3  /MD /O2 /Ob2 /D NDEBUG
C++ compiler:       C:/Program Files (x86)/Microsoft Visual Studio 12.0/VC/bin/x86_amd64/cl.exe MSVC 18.0.31101.0
C++ compiler flags:      /DWIN32 /D_WINDOWS /W3 /GR /EHsc /wd4800 /wd4355 /wd4996 /wd4305 /wd4244 /wd4101 /wd4267 /wd4090  /MD /O2 /Ob2 /D NDEBUG
Boost version:      1.57.0 (external)
OpenCL include dir: C:/Program Files (x86)/AMD APP SDK/3.0-0-Beta/include
OpenCL library:     C:/Program Files (x86)/AMD APP SDK/3.0-0-Beta/lib/x86_64/OpenCL.lib
OpenCL version:     2.0


Running on 1 node with total 6 cores, 6 hardware threads, 1 compatible GPU
Hardware detected:
   CPU info:
     Vendor: AuthenticAMD
     Brand:  AMD Phenom(tm) II X6 1090T Processor
     Family: 16  model: 10  stepping:  0
     CPU features: apic clfsh cmov cx8 cx16 htt lahf_lm misalignsse mmx msr nonstop_tsc pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a
     SIMD instructions most likely to fit this hardware: SSE2
     SIMD instructions selected at GROMACS compile time: SSE2
   GPU info:
     Number of GPUs detected: 1
     #0: name: Pitcairn, vendor: Advanced Micro Devices, Inc., device version: OpenCL 1.2 AMD-APP (1642.5), stat: compatible


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

Input Parameters:
    integrator                     = md
    tinit                          = 0
    dt                             = 0.001
    nsteps                         = 100000
    init-step                      = 0
    simulation-part                = 1
    comm-mode                      = Linear
    nstcomm                        = 100
    bd-fric                        = 0
    ld-seed                        = 665864
    emtol                          = 10
    emstep                         = 0.01
    niter                          = 20
    fcstep                         = 0
    nstcgsteep                     = 1000
    nbfgscorr                      = 10
    rtpi                           = 0.05
    nstxout                        = 0
    nstvout                        = 0
    nstfout                        = 0
    nstlog                         = 100
    nstcalcenergy                  = 100
    nstenergy                      = 1000
    nstxout-compressed             = 100
    compressed-x-precision         = 1000
    cutoff-scheme                  = Verlet
    nstlist                        = 25
    ns-type                        = Grid
    pbc                            = xyz
    periodic-molecules             = FALSE
    verlet-buffer-tolerance        = 0.005
    rlist                          = 1.097
    rlistlong                      = 1.097
    nstcalclr                      = 25
    coulombtype                    = Reaction-Field
    coulomb-modifier               = Potential-shift
    rcoulomb-switch                = 0
    rcoulomb                       = 1
    epsilon-r                      = 1
    epsilon-rf                     = inf
    vdw-type                       = Cut-off
    vdw-modifier                   = Potential-shift
    rvdw-switch                    = 0
    rvdw                           = 1
    DispCorr                       = No
    table-extension                = 1
    fourierspacing                 = 0.16
    fourier-nx                     = 0
    fourier-ny                     = 0
    fourier-nz                     = 0
    pme-order                      = 4
    ewald-rtol                     = 1e-005
    ewald-rtol-lj                  = 0.001
    lj-pme-comb-rule               = Geometric
    ewald-geometry                 = 0
    epsilon-surface                = 0
    implicit-solvent               = No
    gb-algorithm                   = Still
    nstgbradii                     = 1
    rgbradii                       = 1
    gb-epsilon-solvent             = 80
    gb-saltconc                    = 0
    gb-obc-alpha                   = 1
    gb-obc-beta                    = 0.8
    gb-obc-gamma                   = 4.85
    gb-dielectric-offset           = 0.009
    sa-algorithm                   = Ace-approximation
    sa-surface-tension             = 2.05016
    tcoupl                         = V-rescale
    nsttcouple                     = 25
    nh-chain-length                = 0
    print-nose-hoover-chain-variables = FALSE
    pcoupl                         = No
    pcoupltype                     = Isotropic
    nstpcouple                     = -1
    tau-p                          = 2
    compressibility (3x3):
       compressibility[    0]={0.00000e+000, 0.00000e+000, 0.00000e+000}
       compressibility[    1]={0.00000e+000, 0.00000e+000, 0.00000e+000}
       compressibility[    2]={0.00000e+000, 0.00000e+000, 0.00000e+000}
    ref-p (3x3):
       ref-p[    0]={0.00000e+000, 0.00000e+000, 0.00000e+000}
       ref-p[    1]={0.00000e+000, 0.00000e+000, 0.00000e+000}
       ref-p[    2]={0.00000e+000, 0.00000e+000, 0.00000e+000}
    refcoord-scaling               = No
    posres-com (3):
       posres-com[0]=0.00000e+000
       posres-com[1]=0.00000e+000
       posres-com[2]=0.00000e+000
    posres-comB (3):
       posres-comB[0]=0.00000e+000
       posres-comB[1]=0.00000e+000
       posres-comB[2]=0.00000e+000
    QMMM                           = FALSE
    QMconstraints                  = 0
    QMMMscheme                     = 0
    MMChargeScaleFactor            = 1
qm-opts:
    ngQM                           = 0
    constraint-algorithm           = Lincs
    continuation                   = FALSE
    Shake-SOR                      = FALSE
    shake-tol                      = 0.0001
    lincs-order                    = 4
    lincs-iter                     = 1
    lincs-warnangle                = 30
    nwall                          = 0
    wall-type                      = 9-3
    wall-r-linpot                  = -1
    wall-atomtype[0]               = -1
    wall-atomtype[1]               = -1
    wall-density[0]                = 0
    wall-density[1]                = 0
    wall-ewald-zfac                = 3
    pull                           = FALSE
    rotation                       = FALSE
    interactiveMD                  = FALSE
    disre                          = No
    disre-weighting                = Conservative
    disre-mixed                    = FALSE
    dr-fc                          = 1000
    dr-tau                         = 0
    nstdisreout                    = 100
    orire-fc                       = 0
    orire-tau                      = 0
    nstorireout                    = 100
    free-energy                    = no
    cos-acceleration               = 0
    deform (3x3):
       deform[    0]={0.00000e+000, 0.00000e+000, 0.00000e+000}
       deform[    1]={0.00000e+000, 0.00000e+000, 0.00000e+000}
       deform[    2]={0.00000e+000, 0.00000e+000, 0.00000e+000}
    simulated-tempering            = FALSE
    E-x:
       n = 0
    E-xt:
       n = 0
    E-y:
       n = 0
    E-yt:
       n = 0
    E-z:
       n = 0
    E-zt:
       n = 0
    swapcoords                     = no
    adress                         = FALSE
    userint1                       = 0
    userint2                       = 0
    userint3                       = 0
    userint4                       = 0
    userreal1                      = 0
    userreal2                      = 0
    userreal3                      = 0
    userreal4                      = 0
grpopts:
    nrdf:       17493
    ref-t:         300
    tau-t:         0.2
annealing:          No
annealing-npoints:           0
    acc:	           0           0           0
    nfreeze:           N           N           N
    energygrp-flags[  0]: 0


