[gmx-developers] 4.6 beta1 not detecting GPU or CUDA

Mon Dec 3 15:02:04 CET 2012

Hi All,

I'm having a strange problem and I'm hoping I can get some help diagnosing it. 
I compiled the beta release on our AMD cluster that has Tesla S2050 GPUs, but I 
haven't been able to successfully run anything yet.  Our admins don't know much 
specifically about Gromacs, and it seems everything should be working, but 
somehow mdrun is not detecting CUDA or finding the GPU card on the compute nodes.

We have CUDA 3.1, 3.2, and 4.0 available on the cluster, and I can replicate 
this problem with both 3.2 and 4.0.  It appears that mdrun is linked properly to 
the CUDA libraries:

$ ldd mdrun
	libdl.so.2 => /lib64/libdl.so.2 (0x00002aaaaacc8000)
	libgmxpreprocess.so.6 => /home/jalemkul/ATHENA/software/gromacs-46beta1/bin/../lib/libgmxpreprocess.so.6 (0x00002aaaaaecc000)
	libmd.so.6 => /home/jalemkul/ATHENA/software/gromacs-46beta1/bin/../lib/libmd.so.6 (0x00002aaaab1b2000)
	libgmx.so.6 => /home/jalemkul/ATHENA/software/gromacs-46beta1/bin/../lib/libgmx.so.6 (0x00002aaaab7a2000)
	libm.so.6 => /lib64/libm.so.6 (0x00002aaaabfd0000)
	libcudart.so.4 => /cm/shared/apps/cuda40/toolkit/4.0.17/lib64/libcudart.so.4 (0x00002aaaac253000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00002aaaac4c1000)
	libfftw3f.so.3 => /home/jalemkul/ATHENA/software/fftw-3.3.3/lib/libfftw3f.so.3 (0x00002aaaac6dc000)
	libstdc++.so.6 => /cm/shared/apps/gcc/4.3.4/lib64/libstdc++.so.6 (0x00002aaaaca58000)
	libgomp.so.1 => /cm/shared/apps/gcc/4.3.4/lib64/libgomp.so.1 (0x00002aaaacd5f000)
	libgcc_s.so.1 => /cm/shared/apps/gcc/4.3.4/lib64/libgcc_s.so.1 (0x00002aaaacf67000)
	libc.so.6 => /lib64/libc.so.6 (0x00002aaaad17d000)
	/lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)
	librt.so.1 => /lib64/librt.so.1 (0x00002aaaad4d5000)

We have a module system, so by simply using 'module load cuda40' I get all of 
the CUDA stuff loaded properly, i.e.:

$ echo $LD_LIBRARY_PATH
/cm/shared/apps/gcc/4.3.4/lib:/cm/shared/apps/gcc/4.3.4/lib64:/cm/shared/apps/cuda40/toolkit/4.0.17/lib64:/cm/local/apps/cuda40/libs/270.41.19/usr/lib64:/cm/shared/apps/cuda40/sdk/4.0.17/C/lib:/cm/shared/apps/cuda40/sdk/4.0.17/OpenCL/common/lib/Linux64
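
As a cross-check that does not rely on the module system, a short script could list which entries on that path actually contain the CUDA runtime (libcudart.so) and the driver library (libcuda.so), since the ldd output above only shows the runtime. This is a rough sketch, assuming a reasonably recent Python on the node; the function name find_cuda_libs and the two glob patterns are my own choices for illustration.

```python
import glob
import os


def find_cuda_libs(path_value, patterns=("libcuda.so*", "libcudart.so*")):
    """Map each glob pattern to the directories in path_value that match it."""
    hits = {pattern: [] for pattern in patterns}
    for directory in path_value.split(":"):
        if not directory:
            continue  # skip empty entries from a leading, trailing, or doubled ':'
        for pattern in patterns:
            if glob.glob(os.path.join(directory, pattern)):
                hits[pattern].append(directory)
    return hits


if __name__ == "__main__":
    found = find_cuda_libs(os.environ.get("LD_LIBRARY_PATH", ""))
    for pattern, directories in found.items():
        print(pattern, "->", directories or "not found on LD_LIBRARY_PATH")
```

Running it on a compute node rather than the login node would show whether the driver library directory is present where the job actually executes.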

So it appears mdrun should be able to load the necessary libraries.  When I 
execute the run:

$ mdrun -deffnm md -nb gpu -gpu_id 0

I get the following in the .log file:

Log file opened on Mon Dec  3 08:55:51 2012
Host: athena002  pid: 21368  nodeid: 0  nnodes:  1
Gromacs version:    VERSION 4.6-beta1
Precision:          single
MPI library:        thread_mpi
OpenMP support:     enabled
GPU support:        enabled
invsqrt routine:    gmx_software_invsqrt(x)
CPU acceleration:   SSE2
FFT library:        fftw-3.3.3-sse2
Large file support: enabled
RDTSCP usage:       enabled
Built on:           Thu Nov 29 21:38:43 EST 2012
Built by:           jalemkul at athena1 [CMAKE]
Build OS/arch:      Linux 2.6.18-194.11.4.el5 x86_64
Build CPU vendor:   AuthenticAMD
Build CPU brand:    AMD Opteron(tm) Processor 6134
Build CPU family:   16   Model: 9   Stepping: 1
Build CPU features: apic clfsh cmov cx8 cx16 htt lahf_lm misalignsse mmx msr nonstop_tsc pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a
C compiler:         /cm/shared/apps/gcc/4.3.4/bin/gcc GNU gcc (GCC) 4.3.4
C compiler flags:   -msse2  -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wall -Wno-unused -Wunused-value   -fomit-frame-pointer -funroll-all-loops  -O3 -DNDEBUG
C++ compiler:       /cm/shared/apps/gcc/4.3.4/bin/g++ GNU g++ (GCC) 4.3.4
C++ compiler flags: -msse2  -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wall -Wno-unused -Wunused-value   -fomit-frame-pointer -funroll-all-loops  -O3 -DNDEBUG
CUDA compiler:      nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2011 NVIDIA Corporation;Built on Thu_May_12_11:09:45_PDT_2011;Cuda compilation tools, release 4.0, V0.2.1221
CUDA driver:        0.0
CUDA runtime:       0.0

From the last two lines, it seems that the driver and runtime libraries are not 
found.  Later in the .log file, it seems that the GPU itself is not found:

NOTE: Error occurred during GPU detection:
       CUDA driver version is insufficient for CUDA runtime version
       Can not use GPU acceleration, will fall back to CPU kernels.

No GPUs detected

-------------------------------------------------------
Program mdrun, VERSION 4.6-beta1
Source code file: /home/jalemkul/gromacs-4.6-beta1/src/gmxlib/gmx_detect_hardware.c, line: 567

Fatal error:
Some of the requested GPUs do not exist, behave strangely, or are not compatible:
     GPU #0: inexistent

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
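
To look at the two version numbers outside of mdrun, something like the following could be run on a compute node. It is only a sketch under a few assumptions: that a Python with ctypes is available there, and that the runtime can be loaded under the name libcudart.so.4 shown by ldd. The calls cudaDriverGetVersion and cudaRuntimeGetVersion are part of the CUDA runtime API and report versions as 1000*major + 10*minor; the helper names decode_cuda_version and query_cuda_versions are made up for illustration. As far as I understand the CUDA documentation, a driver version of 0 means no driver was found at all.

```python
import ctypes


def decode_cuda_version(raw):
    """Split CUDA's integer version (1000*major + 10*minor) into (major, minor)."""
    return raw // 1000, (raw % 1000) // 10


def query_cuda_versions(libname="libcudart.so.4"):
    """Return raw driver and runtime versions plus the two CUDA error codes."""
    cudart = ctypes.CDLL(libname)  # raises OSError if the runtime cannot be loaded
    driver = ctypes.c_int(0)
    runtime = ctypes.c_int(0)
    driver_err = cudart.cudaDriverGetVersion(ctypes.byref(driver))
    runtime_err = cudart.cudaRuntimeGetVersion(ctypes.byref(runtime))
    return driver.value, runtime.value, driver_err, runtime_err


if __name__ == "__main__":
    try:
        driver, runtime, driver_err, runtime_err = query_cuda_versions()
    except OSError as exc:
        print("could not load the CUDA runtime:", exc)
    else:
        # An error code of 0 corresponds to cudaSuccess.
        print("driver  %d.%d (error code %d)"
              % (decode_cuda_version(driver) + (driver_err,)))
        print("runtime %d.%d (error code %d)"
              % (decode_cuda_version(runtime) + (runtime_err,)))
```

If this also reports a driver version of 0.0 on the compute node, that would point at the node's driver installation rather than at the GROMACS build.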

Any ideas on troubleshooting, or things I can tell our admins to help move 
things along?  I've got 4.6beta1 working fine on a local workstation in our lab, 
but the installation on the university's cluster is nonfunctional from the 
standpoint of the GPU.

-Justin

-- 
========================================

Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin

========================================