[gmx-users] FW: v2018.3; GPU not recognised
Tresadern, Gary [RNDBE]
gtresade at its.jnj.com
Thu Oct 4 17:36:13 CEST 2018
Hi,
We are trying to build a simple workstation installation of GROMACS v2018.3 that will run with GPU support.
The build and the tests appear to complete without errors, but when we run new jobs the GPU is not recognized and we see: "NOTE: Detection of GPUs failed. The API reported..."
We previously built v5 without these problems. Can you suggest which settings we may need to adjust?
Thanks
Gary
# Now switch to the sofinst user (I was not able to do this)
scl enable devtoolset-7 bash
export PATH=$PATH:/usr/local/bin/
export PATH=$PATH:/usr/local/cuda-9.2/bin/
export CUDA_HOME=/usr/local/cuda-9.2/
export PATH=$PATH:/usr/lib64/openmpi/bin/
export LD_LIBRARY_PATH="/usr/local/cuda-9.2/lib64:/usr/local/cuda-9.2/targets/x86_64-linux/lib/:${LD_LIBRARY_PATH}"
# The device list below depends on the number of GPUs in the workstation
export CUDA_VISIBLE_DEVICES=0,1
# Start the GROMACS installation: download and unpack the source
wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-2018.3.tar.gz
tar xfz gromacs-2018.3.tar.gz
cd gromacs-2018.3
mkdir build
cd build
# Configure the build. I chose to install into the local /tmp folder; it would be good to keep the same path on all workstations.
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-9.2/ -DGMX_GPU=on -DCMAKE_INSTALL_PREFIX=/tmp/gromacs-2018.3/
make
make check
make install
source /tmp/gromacs-2018.3/bin/GMXRC
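Because the CUDA_VISIBLE_DEVICES line in the steps above has to be edited per workstation, the device list could be generated from a GPU count instead. This is only a sketch: the helper name visible_devices is hypothetical, and it assumes GNU seq and that the GPUs are numbered contiguously from 0.

```shell
#!/usr/bin/env bash
# visible_devices N: print "0,1,...,N-1" for N GPUs (hypothetical helper).
# Assumes GNU seq, which joins the numbers with the separator given by -s.
visible_devices() {
    local n="$1"
    # Reject anything that is not a positive integer.
    case "$n" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$n" -ge 1 ] || return 1
    seq -s, 0 "$((n - 1))"
}

# Example for the two-GPU workstation described in this post.
visible_devices 2

# On a live machine the count could come from the driver, for example:
#   export CUDA_VISIBLE_DEVICES="$(visible_devices "$(nvidia-smi -L | wc -l)")"
```

Leaving CUDA_VISIBLE_DEVICES unset makes all devices visible to CUDA applications, so the export may not be needed at all on single-purpose machines.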
-bash-4.2$ nvidia-smi -a
==============NVSMI LOG==============
Timestamp : Wed Oct 3 20:03:24 2018
Driver Version : 390.77
Attached GPUs : 2
GPU 00000000:03:00.0
Product Name : Quadro K4200
Product Brand : Quadro
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : Enabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0420315044134
GPU UUID : GPU-bdae121b-23e1-dd89-5366-57761927ec39
Minor Number : 0
VBIOS Version : 80.04.FE.00.15
MultiGPU Board : No
Board ID : 0x300
GPU Part Number : N/A
Inforom Version
Image Version : 2004.0503.01.02
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
PCI
Bus : 0x03
Device : 0x00
Domain : 0x0000
Device Id : 0x11B410DE
Bus Id : 00000000:03:00.0
Sub System Id : 0x109610DE
GPU Link Info
PCIe Generation
Max : 2
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : N/A
Rx Throughput : N/A
Fan Speed : 30 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : N/A
HW Power Brake Slowdown : N/A
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 4036 MiB
Used : 279 MiB
Free : 3757 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 5 MiB
Free : 251 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 3 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending : N/A
Temperature
GPU Current Temp : 37 C
GPU Shutdown Temp : 102 C
GPU Slowdown Temp : 97 C
GPU Max Operating Temp : 80 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 15.45 W
Power Limit : 110.00 W
Default Power Limit : 110.00 W
Enforced Power Limit : 110.00 W
Min Power Limit : 100.00 W
Max Power Limit : 130.00 W
Clocks
Graphics : 324 MHz
SM : 324 MHz
Memory : 324 MHz
Video : 405 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 888 MHz
SM : 888 MHz
Memory : 2700 MHz
Video : 540 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
Process ID : 3360
Type : G
Name : /usr/bin/X
Used GPU Memory : 116 MiB
Process ID : 12028
Type : G
Name : /prd/pkgs/schrodinger/pymol/2.0/bin/python
Used GPU Memory : 33 MiB
Process ID : 24619
Type : G
Name : /usr/bin/gnome-shell
Used GPU Memory : 126 MiB
GPU 00000000:81:00.0
Product Name : Tesla K40c
Product Brand : Tesla
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Enabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0320415010473
GPU UUID : GPU-db0da2e2-ea71-1d14-9812-d7c59b6bf63a
Minor Number : 1
VBIOS Version : 80.80.3E.00.02
MultiGPU Board : No
Board ID : 0x8100
GPU Part Number : 900-22081-1750-000
Inforom Version
Image Version : 2081.0206.01.04
OEM Object : 1.1
ECC Object : 3.0
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
PCI
Bus : 0x81
Device : 0x00
Domain : 0x0000
Device Id : 0x102410DE
Bus Id : 00000000:81:00.0
Sub System Id : 0x098310DE
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : N/A
Rx Throughput : N/A
Fan Speed : 23 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : N/A
HW Power Brake Slowdown : N/A
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 11441 MiB
Used : 0 MiB
Free : 11441 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 2 MiB
Free : 254 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Texture Shared : N/A
CBU : N/A
Total : 0
Aggregate
Single Bit
Device Memory : 10
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Texture Shared : N/A
CBU : N/A
Total : 10
Double Bit
Device Memory : 9
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : 0
Texture Shared : N/A
CBU : N/A
Total : 9
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 4
Pending : No
Temperature
GPU Current Temp : 41 C
GPU Shutdown Temp : 95 C
GPU Slowdown Temp : 90 C
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 22.90 W
Power Limit : 235.00 W
Default Power Limit : 235.00 W
Enforced Power Limit : 235.00 W
Min Power Limit : 180.00 W
Max Power Limit : 235.00 W
Clocks
Graphics : 324 MHz
SM : 324 MHz
Memory : 324 MHz
Video : 405 MHz
Applications Clocks
Graphics : 875 MHz
Memory : 3004 MHz
Default Applications Clocks
Graphics : 745 MHz
Memory : 3004 MHz
Max Clocks
Graphics : 875 MHz
SM : 875 MHz
Memory : 3004 MHz
Video : 540 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes : None
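One thing that may be worth checking against the log above is whether the installed driver (390.77) is new enough for the CUDA 9.2 toolkit used in the build. The sketch below compares the two versions. The minimum of 396.26 is an assumption taken from my recollection of NVIDIA's CUDA 9.2 release notes and should be verified there; the helper name version_ge is hypothetical, and the comparison relies on GNU sort -V.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when dotted version A >= B (uses GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# ASSUMPTION: minimum Linux driver for CUDA 9.2; check NVIDIA's release notes.
min_driver="396.26"

# Driver version copied from the nvidia-smi log in this post. On a live
# machine it could be queried with:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
driver="390.77"

if version_ge "$driver" "$min_driver"; then
    echo "driver $driver satisfies the assumed minimum $min_driver"
else
    echo "driver $driver is older than the assumed minimum $min_driver"
fi
```

If the driver does turn out to be too old, the usual options would be to update the driver or to rebuild against an older CUDA toolkit that the installed driver supports.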
Scanning dependencies of target tests
[100%] Built target tests
Scanning dependencies of target run-ctest-nophys
[100%] Running all tests except physical validation
Test project /prd/pkgs/gromacs/2018.3/gromacs-gpu-build/build
Start 1: TestUtilsUnitTests
1/39 Test #1: TestUtilsUnitTests ............... Passed 0.42 sec
Start 2: TestUtilsMpiUnitTests
2/39 Test #2: TestUtilsMpiUnitTests ............ Passed 0.27 sec
Start 3: MdlibUnitTest
3/39 Test #3: MdlibUnitTest .................... Passed 0.27 sec
Start 4: AppliedForcesUnitTest
4/39 Test #4: AppliedForcesUnitTest ............ Passed 0.25 sec
Start 5: ListedForcesTest
5/39 Test #5: ListedForcesTest ................. Passed 0.29 sec
Start 6: CommandLineUnitTests
6/39 Test #6: CommandLineUnitTests ............. Passed 0.32 sec
Start 7: EwaldUnitTests
7/39 Test #7: EwaldUnitTests ................... Passed 2.39 sec
Start 8: FFTUnitTests
8/39 Test #8: FFTUnitTests ..................... Passed 0.34 sec
Start 9: GpuUtilsUnitTests
9/39 Test #9: GpuUtilsUnitTests ................ Passed 3.93 sec
Start 10: HardwareUnitTests
10/39 Test #10: HardwareUnitTests ................ Passed 0.27 sec
Start 11: MathUnitTests
11/39 Test #11: MathUnitTests .................... Passed 0.28 sec
Start 12: MdrunUtilityUnitTests
12/39 Test #12: MdrunUtilityUnitTests ............ Passed 0.25 sec
Start 13: MdrunUtilityMpiUnitTests
13/39 Test #13: MdrunUtilityMpiUnitTests ......... Passed 0.28 sec
Start 14: OnlineHelpUnitTests
14/39 Test #14: OnlineHelpUnitTests .............. Passed 0.28 sec
Start 15: OptionsUnitTests
15/39 Test #15: OptionsUnitTests ................. Passed 0.27 sec
Start 16: RandomUnitTests
16/39 Test #16: RandomUnitTests .................. Passed 0.29 sec
Start 17: TableUnitTests
17/39 Test #17: TableUnitTests ................... Passed 0.35 sec
Start 18: TaskAssignmentUnitTests
18/39 Test #18: TaskAssignmentUnitTests .......... Passed 0.24 sec
Start 19: UtilityUnitTests
19/39 Test #19: UtilityUnitTests ................. Passed 0.32 sec
Start 20: FileIOTests
20/39 Test #20: FileIOTests ...................... Passed 0.30 sec
Start 21: PullTest
21/39 Test #21: PullTest ......................... Passed 0.25 sec
Start 22: AwhTest
22/39 Test #22: AwhTest .......................... Passed 0.26 sec
Start 23: SimdUnitTests
23/39 Test #23: SimdUnitTests .................... Passed 0.27 sec
Start 24: GmxAnaTest
24/39 Test #24: GmxAnaTest ....................... Passed 0.41 sec
Start 25: GmxPreprocessTests
25/39 Test #25: GmxPreprocessTests ............... Passed 0.72 sec
Start 26: CorrelationsTest
26/39 Test #26: CorrelationsTest ................. Passed 0.80 sec
Start 27: AnalysisDataUnitTests
27/39 Test #27: AnalysisDataUnitTests ............ Passed 0.33 sec
Start 28: SelectionUnitTests
28/39 Test #28: SelectionUnitTests ............... Passed 0.63 sec
Start 29: TrajectoryAnalysisUnitTests
29/39 Test #29: TrajectoryAnalysisUnitTests ...... Passed 0.94 sec
Start 30: EnergyAnalysisUnitTests
30/39 Test #30: EnergyAnalysisUnitTests .......... Passed 0.40 sec
Start 31: CompatibilityHelpersTests
31/39 Test #31: CompatibilityHelpersTests ........ Passed 0.26 sec
Start 32: MdrunTests
32/39 Test #32: MdrunTests ....................... Passed 12.53 sec
Start 33: MdrunMpiTests
33/39 Test #33: MdrunMpiTests .................... Passed 4.08 sec
Start 34: regressiontests/simple
34/39 Test #34: regressiontests/simple ........... Passed 26.14 sec
Start 35: regressiontests/complex
35/39 Test #35: regressiontests/complex .......... Passed 138.00 sec
Start 36: regressiontests/kernel
36/39 Test #36: regressiontests/kernel ........... Passed 252.22 sec
Start 37: regressiontests/freeenergy
37/39 Test #37: regressiontests/freeenergy ....... Passed 26.16 sec
Start 38: regressiontests/pdb2gmx
38/39 Test #38: regressiontests/pdb2gmx .......... Passed 77.12 sec
Start 39: regressiontests/rotation
39/39 Test #39: regressiontests/rotation ......... Passed 21.36 sec
100% tests passed, 0 tests failed out of 39
Label Time Summary:
GTest = 33.48 sec*proc (33 tests)
IntegrationTest = 17.02 sec*proc (3 tests)
MpiTest = 4.63 sec*proc (3 tests)
UnitTest = 16.47 sec*proc (30 tests)
Total Test time (real) = 574.67 sec
[100%] Built target run-ctest-nophys
Scanning dependencies of target check
[100%] Built target check
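The test summary above does not show whether any test actually ran on a GPU, so it may help to record what the installed binary itself reports. A minimal sketch, assuming that `gmx --version` prints lines labelled "GPU support", "CUDA driver" and "CUDA runtime" (I believe the 2018 series does, but the exact labels are an assumption). The function name gpu_summary is hypothetical, and the sample text is a placeholder, not output from our machine.

```shell
#!/usr/bin/env bash
# gpu_summary: keep only the GPU-related lines from `gmx --version`-style
# text read on stdin. ASSUMPTION: the labels below match what gmx prints.
gpu_summary() {
    grep -E '^(GPU support|CUDA driver|CUDA runtime):' || true
}

# Placeholder sample for illustration only (N.NN stands for a version number).
sample='GROMACS version:    2018.3
GPU support:        CUDA
CUDA driver:        N.NN
CUDA runtime:       N.NN'
printf '%s\n' "$sample" | gpu_summary

# On the workstation, after sourcing GMXRC:
#   gmx --version | gpu_summary
```

If the reported driver value were lower than the runtime value, that would be consistent with a driver that is too old for the toolkit used in the build, though other causes are possible.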
____________________________________________________
Gary Tresadern, MChem, Ph.D
Senior Principal Scientist, Discovery Sciences
Janssen Research & Development
Tel.: +32 1464 1569
mailto:gtresade at its.jnj.com