[gmx-users] Running GPU issue
Kovalskyy, Dmytro
Kovalskyy at uthscsa.edu
Wed Nov 14 19:58:14 CET 2018
Mark,
Thank you. Then I have an issue I cannot find a way to solve.
My GPU-enabled MD run fails at the very beginning, while a CPU-only run with the same tpr file works without any problem.
I cannot figure out what "HtoD cudaMemcpyAsync failed: invalid argument" means.
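For reference, my rough understanding is that the failing call is an ordinary host-to-device copy. A minimal standalone sketch of such a call (just the CUDA runtime API with made-up buffer names and sizes, not the actual GROMACS code) looks like this:

// Sketch of a host-to-device (HtoD) cudaMemcpyAsync with error checking.
// "invalid argument" (cudaErrorInvalidValue) means the runtime rejected one
// of the arguments, e.g. a bad pointer, a bad size, or a bad stream handle.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t n = 1024;
    float *h_buf = nullptr, *d_buf = nullptr;
    cudaMallocHost(&h_buf, n * sizeof(float));   // pinned host buffer
    cudaMalloc(&d_buf, n * sizeof(float));       // device buffer

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cudaError_t stat = cudaMemcpyAsync(d_buf, h_buf, n * sizeof(float),
                                       cudaMemcpyHostToDevice, stream);
    if (stat != cudaSuccess) {
        fprintf(stderr, "HtoD cudaMemcpyAsync failed: %s\n",
                cudaGetErrorString(stat));
    }
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}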
Here are some diagnostics.
$ uname -a
Linux didesk 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
dikov at didesk ~ $ gcc --version
gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
dikov at didesk ~ $ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
dikov at didesk ~ $
GPU setup:
sudo nvidia-smi -pm ENABLED -i 0     # enable persistence mode on GPU 0
sudo nvidia-smi -ac 4513,1733 -i 0   # set application clocks (memory,graphics MHz) on GPU 0
MD.log:
Log file opened on Tue Nov 13 16:16:22 2018
Host: didesk pid: 45669 rank ID: 0 number of ranks: 1
:-) GROMACS - gmx mdrun, 2018.3 (-:
GROMACS is written by:
Emile Apol Rossen Apostolov Paul Bauer Herman J.C. Berendsen
Par Bjelkmar Aldert van Buuren Rudi van Drunen Anton Feenstra
Gerrit Groenhof Aleksei Iupinov Christoph Junghans Anca Hamuraru
Vincent Hindriksen Dimitrios Karkoulis Peter Kasson Jiri Kraus
Carsten Kutzner Per Larsson Justin A. Lemkul Viveca Lindahl
Magnus Lundborg Pieter Meulenhoff Erik Marklund Teemu Murtola
Szilard Pall Sander Pronk Roland Schulz Alexey Shvetsov
Michael Shirts Alfons Sijbers Peter Tieleman Teemu Virolainen
Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2017, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.
GROMACS: gmx mdrun, version 2018.3
Executable: /usr/local/gromacs/bin/gmx
Data prefix: /usr/local/gromacs
Working dir: /home/dikov/Documents/Cients/DavidL/MD/GPU
Command line:
gmx mdrun -deffnm md200ns -v
GROMACS version: 2018.3
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX_512
FFT library: fftw-3.3.7-sse2-avx
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: hwloc-1.11.6
Tracing support: disabled
Built on: 2018-11-13 21:31:10
Built by: dikov at didesk [CMAKE]
Build OS/arch: Linux 4.15.0-36-generic x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
Build CPU family: 6 Model: 85 Stepping: 4
Build CPU features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /usr/bin/cc GNU 7.3.0
C compiler flags: -mavx512f -mfma -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler: /usr/bin/c++ GNU 7.3.0
C++ compiler flags: -mavx512f -mfma -std=c++11 -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2018 NVIDIA Corporation;Built on Sat_Aug_25_21:08:01_CDT_2018;Cuda compilation tools, release 10.0, V10.0.130
CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;-D_FORCE_INLINES;; ;-mavx512f;-mfma;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver: 10.0
CUDA runtime: 10.0
Running on 1 node with total 36 cores, 72 logical cores, 1 compatible GPU
Hardware detected:
CPU info:
Vendor: Intel
Brand: Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
Family: 6 Model: 85 Stepping: 4
Features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
Number of AVX-512 FMA units: 2
Hardware topology: Full, with devices
Sockets, cores, and logical processors:
Socket 0: [ 0 36] [ 1 37] [ 2 38] [ 3 39] [ 4 40] [ 5 41] [ 6 42] [ 7 43] [ 8 44] [ 9 45] [ 10 46] [ 11 47] [ 12 48] [ 13 49] [ 14 50] [ 15 51] [ 16 52] [ 17 53]
Socket 1: [ 18 54] [ 19 55] [ 20 56] [ 21 57] [ 22 58] [ 23 59] [ 24 60] [ 25 61] [ 26 62] [ 27 63] [ 28 64] [ 29 65] [ 30 66] [ 31 67] [ 32 68] [ 33 69] [ 34 70] [ 35 71]
Numa nodes:
Node 0 (33376423936 bytes mem): 0 36 1 37 2 38 3 39 4 40 5 41 6 42 7 43 8 44 9 45 10 46 11 47 12 48 13 49 14 50 15 51 16 52 17 53
Node 1 (33792262144 bytes mem): 18 54 19 55 20 56 21 57 22 58 23 59 24 60 25 61 26 62 27 63 28 64 29 65 30 66 31 67 32 68 33 69 34 70 35 71
Latency:
0 1
0 1.00 2.10
1 2.10 1.00
Caches:
L1: 32768 bytes, linesize 64 bytes, assoc. 8, shared 2 ways
L2: 1048576 bytes, linesize 64 bytes, assoc. 16, shared 2 ways
L3: 25952256 bytes, linesize 64 bytes, assoc. 11, shared 36 ways
PCI devices:
0000:00:11.5 Id: 8086:a1d2 Class: 0x0106 Numa: 0
0000:00:16.2 Id: 8086:a1bc Class: 0x0101 Numa: 0
0000:00:17.0 Id: 8086:2826 Class: 0x0104 Numa: 0
0000:02:00.0 Id: 8086:1533 Class: 0x0200 Numa: 0
0000:00:1f.6 Id: 8086:15b9 Class: 0x0200 Numa: 0
0000:91:00.0 Id: 144d:a808 Class: 0x0108 Numa: 0
0000:d5:00.0 Id: 10de:1bb0 Class: 0x0300 Numa: 0
GPU info:
Number of GPUs detected: 1
#0: NVIDIA Quadro P5000, compute cap.: 6.1, ECC: no, stat: compatible
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------
Input Parameters:
integrator = md
tinit = 0
dt = 0.002
nsteps = 100000000
init-step = 0
simulation-part = 1
comm-mode = Linear
nstcomm = 100
bd-fric = 0
ld-seed = 718849372
emtol = 10
emstep = 0.01
niter = 20
fcstep = 0
nstcgsteep = 1000
nbfgscorr = 10
rtpi = 0.05
nstxout = 0
nstvout = 0
nstfout = 0
nstlog = 1000
nstcalcenergy = 100
nstenergy = 5000
nstxout-compressed = 5000
compressed-x-precision = 1000
cutoff-scheme = Verlet
nstlist = 20
ns-type = Grid
pbc = xyz
periodic-molecules = false
verlet-buffer-tolerance = 0.005
rlist = 0.931
coulombtype = PME
coulomb-modifier = Potential-shift
rcoulomb-switch = 0
rcoulomb = 0.9
epsilon-r = 1
epsilon-rf = inf
vdw-type = Cut-off
vdw-modifier = Potential-shift
rvdw-switch = 0
rvdw = 0.9
DispCorr = EnerPres
table-extension = 1
fourierspacing = 0.16
fourier-nx = 52
fourier-ny = 60
fourier-nz = 72
pme-order = 4
ewald-rtol = 1e-05
ewald-rtol-lj = 0.001
lj-pme-comb-rule = Geometric
ewald-geometry = 0
epsilon-surface = 0
implicit-solvent = No
gb-algorithm = Still
nstgbradii = 1
rgbradii = 1
gb-epsilon-solvent = 80
gb-saltconc = 0
gb-obc-alpha = 1
gb-obc-beta = 0.8
gb-obc-gamma = 4.85
gb-dielectric-offset = 0.009
sa-algorithm = Ace-approximation
sa-surface-tension = 2.05016
tcoupl = V-rescale
nsttcouple = 20
nh-chain-length = 0
print-nose-hoover-chain-variables = false
pcoupl = Parrinello-Rahman
pcoupltype = Isotropic
nstpcouple = 20
tau-p = 2
compressibility (3x3):
compressibility[ 0]={ 4.50000e-05, 0.00000e+00, 0.00000e+00}
compressibility[ 1]={ 0.00000e+00, 4.50000e-05, 0.00000e+00}
compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 4.50000e-05}
ref-p (3x3):
ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}
ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}
ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}
refcoord-scaling = No
posres-com (3):
posres-com[0]= 0.00000e+00
posres-com[1]= 0.00000e+00
posres-com[2]= 0.00000e+00
posres-comB (3):
posres-comB[0]= 0.00000e+00
posres-comB[1]= 0.00000e+00
posres-comB[2]= 0.00000e+00
QMMM = false
QMconstraints = 0
QMMMscheme = 0
MMChargeScaleFactor = 1
qm-opts:
ngQM = 0
constraint-algorithm = Lincs
continuation = true
Shake-SOR = false
shake-tol = 0.0001
lincs-order = 4
lincs-iter = 1
lincs-warnangle = 30
nwall = 0
wall-type = 9-3
wall-r-linpot = -1
wall-atomtype[0] = -1
wall-atomtype[1] = -1
wall-density[0] = 0
wall-density[1] = 0
wall-ewald-zfac = 3
pull = false
awh = false
rotation = false
interactiveMD = false
disre = No
disre-weighting = Conservative
disre-mixed = false
dr-fc = 1000
dr-tau = 0
nstdisreout = 100
orire-fc = 0
orire-tau = 0
nstorireout = 100
free-energy = no
cos-acceleration = 0
deform (3x3):
deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
simulated-tempering = false
swapcoords = no
userint1 = 0
userint2 = 0
userint3 = 0
userint4 = 0
userreal1 = 0
userreal2 = 0
userreal3 = 0
userreal4 = 0
applied-forces:
electric-field:
x:
E0 = 0
omega = 0
t0 = 0
sigma = 0
y:
E0 = 0
omega = 0
t0 = 0
sigma = 0
z:
E0 = 0
omega = 0
t0 = 0
sigma = 0
grpopts:
nrdf: 16011.7 141396
ref-t: 300 300
tau-t: 0.1 0.1
annealing: No No
annealing-npoints: 0 0
acc: 0 0 0
nfreeze: N N N
energygrp-flags[ 0]: 0
Changing nstlist from 20 to 80, rlist from 0.931 to 1.049
Using 1 MPI thread
Using 36 OpenMP threads
1 GPU auto-selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
PP:0,PME:0
Application clocks (GPU clocks) for Quadro P5000 are (4513,1733)
Application clocks (GPU clocks) for Quadro P5000 are (4513,1733)
Pinning threads with an auto-selected logical core stride of 2
System total charge: -0.000
Will do PME sum in reciprocal space for electrostatic interactions.
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------
Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Ewald -1.111e-05
Initialized non-bonded Ewald correction tables, spacing: 8.85e-04 size: 1018
Long Range LJ corr.: <C6> 3.3851e-04
Generated table with 1024 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 1024 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 1024 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 1024 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1024 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1024 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Using GPU 8x8 nonbonded short-range kernels
Using a dual 8x4 pair-list setup updated with dynamic, rolling pruning:
outer list: updated every 80 steps, buffer 0.149 nm, rlist 1.049 nm
inner list: updated every 10 steps, buffer 0.003 nm, rlist 0.903 nm
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would be:
outer list: updated every 80 steps, buffer 0.292 nm, rlist 1.192 nm
inner list: updated every 10 steps, buffer 0.043 nm, rlist 0.943 nm
Using Lorentz-Berthelot Lennard-Jones combination rule
Initializing LINear Constraint Solver
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
LINCS: A Linear Constraint Solver for molecular simulations
J. Comp. Chem. 18 (1997) pp. 1463-1472
-------- -------- --- Thank You --- -------- --------
The number of constraints is 8054
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------
Intra-simulation communication will occur every 20 steps.
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
0: rest
++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------
There are: 78646 Atoms
Started mdrun on rank 0 Tue Nov 13 16:16:25 2018
Step Time
0 0.00000
-------------------------------------------------------
Program: gmx mdrun, version 2018.3
Source file: src/gromacs/gpu_utils/cudautils.cu (line 110)
Fatal error:
HtoD cudaMemcpyAsync failed: invalid argument
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
Thank you,
Dmytro
________________________________________
From: gromacs.org_gmx-users-bounces at maillist.sys.kth.se <gromacs.org_gmx-users-bounces at maillist.sys.kth.se> on behalf of Mark Abraham <mark.j.abraham at gmail.com>
Sent: Tuesday, November 13, 2018 10:29 PM
To: gmx-users at gromacs.org
Cc: gromacs.org_gmx-users at maillist.sys.kth.se
Subject: Re: [gmx-users] Running GPU issue
Hi,
It can share.
Mark
On Mon, Nov 12, 2018 at 10:19 PM Kovalskyy, Dmytro <Kovalskyy at uthscsa.edu>
wrote:
> Hi,
>
> To run GROMACS on a GPU, does it require exclusive access to the card, or can
> GROMACS share the video card with the X server?
>
> Thank you
>
> Dmytro