[gmx-users] FW: v2018.3; GPU not recognised
Szilárd Páll
pall.szilard at gmail.com
Thu Oct 4 17:49:39 CEST 2018
On Thu, Oct 4, 2018 at 5:36 PM Tresadern, Gary [RNDBE] <gtresade at its.jnj.com>
wrote:
> Hi,
> We are trying to build a simple workstation installation of v2018.3 that
> will run with GPU support.
> The build and tests seem to go without errors, but when we run new
> jobs the GPU is not recognized: "NOTE: Detection of GPUs
> failed. The API reported..."
>
That sounds like an insufficient driver for the CUDA runtime. What's the full
error message? Doesn't it say exactly that?
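
One way to check this without waiting for a job to fail is to compare the CUDA driver and runtime versions that a GPU-enabled build reports about itself. The sketch below assumes the version header contains lines of the form "CUDA driver: 9.10" and "CUDA runtime: 9.20"; the function name compare_cuda_versions is made up for illustration, and the exact header layout may differ between GROMACS versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads version-header text on stdin and reports whether
# the CUDA driver version is at least the CUDA runtime version.
# Assumption: the text has lines like "CUDA driver:  9.10" and "CUDA runtime:  9.20".
compare_cuda_versions() {
    local text driver runtime lowest
    text="$(cat)"
    driver="$(printf '%s\n' "$text" | awk -F': *' '/CUDA driver:/ {print $2}')"
    runtime="$(printf '%s\n' "$text" | awk -F': *' '/CUDA runtime:/ {print $2}')"
    if [ -z "$driver" ] || [ -z "$runtime" ]; then
        echo "unknown"
        return
    fi
    # sort -V orders version strings; if the runtime sorts first (or ties),
    # the driver is at least as new as the runtime.
    lowest="$(printf '%s\n%s\n' "$runtime" "$driver" | sort -V | head -n 1)"
    if [ "$lowest" = "$runtime" ]; then
        echo "driver sufficient"
    else
        echo "driver too old"
    fi
}

# Only query gmx when it is on the PATH (e.g. after sourcing GMXRC).
if command -v gmx >/dev/null 2>&1; then
    gmx --version 2>/dev/null | compare_cuda_versions
fi
```

If this prints "driver too old", the installed driver does not support the CUDA runtime the binary was built against, which would match the detection failure described.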
> We have previously built v5 without these problems. Can you give us some
> tips for settings we may need to adjust?
>
> Thanks
> Gary
>
> #Now switch to sofinst user, (I was not able to do this)
> scl enable devtoolset-7 bash
>
> export PATH=$PATH:/usr/local/bin/
> export PATH=$PATH:/usr/local/cuda-9.2/bin/
> export CUDA_HOME=/usr/local/cuda-9.2/
> export PATH=$PATH:/usr/lib64/openmpi/bin/
> export LD_LIBRARY_PATH="/usr/local/cuda-9.2/lib64:/usr/local/cuda-9.2/targets/x86_64-linux/lib/:${LD_LIBRARY_PATH}"
>
I believe you need 396.xx or later drivers for CUDA 9.2.
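
A quick way to check the installed driver against that requirement is sketched below. The threshold of 396 is only the estimate given above, not a value taken from NVIDIA's release notes, and the function name check_driver is made up for illustration.

```shell
#!/usr/bin/env bash
# Assumed minimum driver branch for CUDA 9.2 ("396.xx or later", see above).
MIN_DRIVER_MAJOR=396

# Prints "ok" if the driver's major version reaches the minimum, else "too old".
check_driver() {
    local version="$1"            # e.g. 390.77
    local major="${version%%.*}"  # keep only the part before the first dot
    if [ "$major" -ge "$MIN_DRIVER_MAJOR" ]; then
        echo "ok"
    else
        echo "too old"
    fi
}

# Query the installed driver only if nvidia-smi is available and returns a value.
if command -v nvidia-smi >/dev/null 2>&1; then
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if [ -n "$installed" ]; then
        echo "driver ${installed}: $(check_driver "$installed")"
    fi
fi
```

With the 390.77 driver from the nvidia-smi log quoted in this thread, check_driver reports "too old".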
--
Szilárd
> #the command below changes depending on the number of GPUs in the workstation
> export CUDA_VISIBLE_DEVICES=0,1
>
> #start installation of gromacs, download gromacs
> wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-2018.3.tar.gz
> tar xfz gromacs-2018.3.tar.gz
> cd gromacs-2018.3
> mkdir build
> cd build
>
> #this is the command to set the variables and stuff prior to installation,
> #I chose to install in local /tmp folder, it would be good to keep this the
> #same path on all workstations
> cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON \
>   -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-9.2/ -DGMX_GPU=on \
>   -DCMAKE_INSTALL_PREFIX=/tmp/gromacs-2018.3/
> make
> make check
> make install
> source /tmp/gromacs-2018.3/bin/GMXRC
>
> -bash-4.2$
> -bash-4.2$ nvidia-smi -a
>
> ==============NVSMI LOG==============
>
> Timestamp : Wed Oct 3 20:03:24 2018
> Driver Version : 390.77
>
> Attached GPUs : 2
> GPU 00000000:03:00.0
> Product Name : Quadro K4200
> Product Brand : Quadro
> Display Mode : Enabled
> Display Active : Enabled
> Persistence Mode : Enabled
> Accounting Mode : Disabled
> Accounting Mode Buffer Size : 4000
> Driver Model
> Current : N/A
> Pending : N/A
> Serial Number : 0420315044134
> GPU UUID : GPU-bdae121b-23e1-dd89-5366-57761927ec39
> Minor Number : 0
> VBIOS Version : 80.04.FE.00.15
> MultiGPU Board : No
> Board ID : 0x300
> GPU Part Number : N/A
> Inforom Version
> Image Version : 2004.0503.01.02
> OEM Object : 1.1
> ECC Object : N/A
> Power Management Object : N/A
> GPU Operation Mode
> Current : N/A
> Pending : N/A
> GPU Virtualization Mode
> Virtualization mode : None
> PCI
> Bus : 0x03
> Device : 0x00
> Domain : 0x0000
> Device Id : 0x11B410DE
> Bus Id : 00000000:03:00.0
> Sub System Id : 0x109610DE
> GPU Link Info
> PCIe Generation
> Max : 2
> Current : 1
> Link Width
> Max : 16x
> Current : 16x
> Bridge Chip
> Type : N/A
> Firmware : N/A
> Replays since reset : 0
> Tx Throughput : N/A
> Rx Throughput : N/A
> Fan Speed : 30 %
> Performance State : P8
> Clocks Throttle Reasons
> Idle : Active
> Applications Clocks Setting : Not Active
> SW Power Cap : Not Active
> HW Slowdown : Not Active
> HW Thermal Slowdown : N/A
> HW Power Brake Slowdown : N/A
> Sync Boost : Not Active
> SW Thermal Slowdown : Not Active
> Display Clock Setting : Not Active
> FB Memory Usage
> Total : 4036 MiB
> Used : 279 MiB
> Free : 3757 MiB
> BAR1 Memory Usage
> Total : 256 MiB
> Used : 5 MiB
> Free : 251 MiB
> Compute Mode : Default
> Utilization
> Gpu : 0 %
> Memory : 3 %
> Encoder : 0 %
> Decoder : 0 %
> Encoder Stats
> Active Sessions : 0
> Average FPS : 0
> Average Latency : 0
> Ecc Mode
> Current : N/A
> Pending : N/A
> ECC Errors
> Volatile
> Single Bit
> Device Memory : N/A
> Register File : N/A
> L1 Cache : N/A
> L2 Cache : N/A
> Texture Memory : N/A
> Texture Shared : N/A
> CBU : N/A
> Total : N/A
> Double Bit
> Device Memory : N/A
> Register File : N/A
> L1 Cache : N/A
> L2 Cache : N/A
> Texture Memory : N/A
> Texture Shared : N/A
> CBU : N/A
> Total : N/A
> Aggregate
> Single Bit
> Device Memory : N/A
> Register File : N/A
> L1 Cache : N/A
> L2 Cache : N/A
> Texture Memory : N/A
> Texture Shared : N/A
> CBU : N/A
> Total : N/A
> Double Bit
> Device Memory : N/A
> Register File : N/A
> L1 Cache : N/A
> L2 Cache : N/A
> Texture Memory : N/A
> Texture Shared : N/A
> CBU : N/A
> Total : N/A
> Retired Pages
> Single Bit ECC : N/A
> Double Bit ECC : N/A
> Pending : N/A
> Temperature
> GPU Current Temp : 37 C
> GPU Shutdown Temp : 102 C
> GPU Slowdown Temp : 97 C
> GPU Max Operating Temp : 80 C
> Memory Current Temp : N/A
> Memory Max Operating Temp : N/A
> Power Readings
> Power Management : Supported
> Power Draw : 15.45 W
> Power Limit : 110.00 W
> Default Power Limit : 110.00 W
> Enforced Power Limit : 110.00 W
> Min Power Limit : 100.00 W
> Max Power Limit : 130.00 W
> Clocks
> Graphics : 324 MHz
> SM : 324 MHz
> Memory : 324 MHz
> Video : 405 MHz
> Applications Clocks
> Graphics : N/A
> Memory : N/A
> Default Applications Clocks
> Graphics : N/A
> Memory : N/A
> Max Clocks
> Graphics : 888 MHz
> SM : 888 MHz
> Memory : 2700 MHz
> Video : 540 MHz
> Max Customer Boost Clocks
> Graphics : N/A
> Clock Policy
> Auto Boost : N/A
> Auto Boost Default : N/A
> Processes
> Process ID : 3360
> Type : G
> Name : /usr/bin/X
> Used GPU Memory : 116 MiB
> Process ID : 12028
> Type : G
> Name : /prd/pkgs/schrodinger/pymol/2.0/bin/python
> Used GPU Memory : 33 MiB
> Process ID : 24619
> Type : G
> Name : /usr/bin/gnome-shell
> Used GPU Memory : 126 MiB
>
> GPU 00000000:81:00.0
> Product Name : Tesla K40c
> Product Brand : Tesla
> Display Mode : Disabled
> Display Active : Disabled
> Persistence Mode : Enabled
> Accounting Mode : Disabled
> Accounting Mode Buffer Size : 4000
> Driver Model
> Current : N/A
> Pending : N/A
> Serial Number : 0320415010473
> GPU UUID : GPU-db0da2e2-ea71-1d14-9812-d7c59b6bf63a
> Minor Number : 1
> VBIOS Version : 80.80.3E.00.02
> MultiGPU Board : No
> Board ID : 0x8100
> GPU Part Number : 900-22081-1750-000
> Inforom Version
> Image Version : 2081.0206.01.04
> OEM Object : 1.1
> ECC Object : 3.0
> Power Management Object : N/A
> GPU Operation Mode
> Current : N/A
> Pending : N/A
> GPU Virtualization Mode
> Virtualization mode : None
> PCI
> Bus : 0x81
> Device : 0x00
> Domain : 0x0000
> Device Id : 0x102410DE
> Bus Id : 00000000:81:00.0
> Sub System Id : 0x098310DE
> GPU Link Info
> PCIe Generation
> Max : 3
> Current : 1
> Link Width
> Max : 16x
> Current : 16x
> Bridge Chip
> Type : N/A
> Firmware : N/A
> Replays since reset : 0
> Tx Throughput : N/A
> Rx Throughput : N/A
> Fan Speed : 23 %
> Performance State : P8
> Clocks Throttle Reasons
> Idle : Active
> Applications Clocks Setting : Not Active
> SW Power Cap : Not Active
> HW Slowdown : Not Active
> HW Thermal Slowdown : N/A
> HW Power Brake Slowdown : N/A
> Sync Boost : Not Active
> SW Thermal Slowdown : Not Active
> Display Clock Setting : Not Active
> FB Memory Usage
> Total : 11441 MiB
> Used : 0 MiB
> Free : 11441 MiB
> BAR1 Memory Usage
> Total : 256 MiB
> Used : 2 MiB
> Free : 254 MiB
> Compute Mode : Default
> Utilization
> Gpu : 0 %
> Memory : 0 %
> Encoder : 0 %
> Decoder : 0 %
> Encoder Stats
> Active Sessions : 0
> Average FPS : 0
> Average Latency : 0
> Ecc Mode
> Current : Enabled
> Pending : Enabled
> ECC Errors
> Volatile
> Single Bit
> Device Memory : 0
> Register File : 0
> L1 Cache : 0
> L2 Cache : 0
> Texture Memory : 0
> Texture Shared : N/A
> CBU : N/A
> Total : 0
> Double Bit
> Device Memory : 0
> Register File : 0
> L1 Cache : 0
> L2 Cache : 0
> Texture Memory : 0
> Texture Shared : N/A
> CBU : N/A
> Total : 0
> Aggregate
> Single Bit
> Device Memory : 10
> Register File : 0
> L1 Cache : 0
> L2 Cache : 0
> Texture Memory : 0
> Texture Shared : N/A
> CBU : N/A
> Total : 10
> Double Bit
> Device Memory : 9
> Register File : 0
> L1 Cache : 0
> L2 Cache : 0
> Texture Memory : 0
> Texture Shared : N/A
> CBU : N/A
> Total : 9
> Retired Pages
> Single Bit ECC : 0
> Double Bit ECC : 4
> Pending : No
> Temperature
> GPU Current Temp : 41 C
> GPU Shutdown Temp : 95 C
> GPU Slowdown Temp : 90 C
> GPU Max Operating Temp : N/A
> Memory Current Temp : N/A
> Memory Max Operating Temp : N/A
> Power Readings
> Power Management : Supported
> Power Draw : 22.90 W
> Power Limit : 235.00 W
> Default Power Limit : 235.00 W
> Enforced Power Limit : 235.00 W
> Min Power Limit : 180.00 W
> Max Power Limit : 235.00 W
> Clocks
> Graphics : 324 MHz
> SM : 324 MHz
> Memory : 324 MHz
> Video : 405 MHz
> Applications Clocks
> Graphics : 875 MHz
> Memory : 3004 MHz
> Default Applications Clocks
> Graphics : 745 MHz
> Memory : 3004 MHz
> Max Clocks
> Graphics : 875 MHz
> SM : 875 MHz
> Memory : 3004 MHz
> Video : 540 MHz
> Max Customer Boost Clocks
> Graphics : N/A
> Clock Policy
> Auto Boost : N/A
> Auto Boost Default : N/A
> Processes : None
>
>
>
> Scanning dependencies of target tests
> [100%] Built target tests
> Scanning dependencies of target run-ctest-nophys
> [100%] Running all tests except physical validation
> Test project /prd/pkgs/gromacs/2018.3/gromacs-gpu-build/build
> Start 1: TestUtilsUnitTests
> 1/39 Test #1: TestUtilsUnitTests ............... Passed 0.42 sec
> Start 2: TestUtilsMpiUnitTests
> 2/39 Test #2: TestUtilsMpiUnitTests ............ Passed 0.27 sec
> Start 3: MdlibUnitTest
> 3/39 Test #3: MdlibUnitTest .................... Passed 0.27 sec
> Start 4: AppliedForcesUnitTest
> 4/39 Test #4: AppliedForcesUnitTest ............ Passed 0.25 sec
> Start 5: ListedForcesTest
> 5/39 Test #5: ListedForcesTest ................. Passed 0.29 sec
> Start 6: CommandLineUnitTests
> 6/39 Test #6: CommandLineUnitTests ............. Passed 0.32 sec
> Start 7: EwaldUnitTests
> 7/39 Test #7: EwaldUnitTests ................... Passed 2.39 sec
> Start 8: FFTUnitTests
> 8/39 Test #8: FFTUnitTests ..................... Passed 0.34 sec
> Start 9: GpuUtilsUnitTests
> 9/39 Test #9: GpuUtilsUnitTests ................ Passed 3.93 sec
> Start 10: HardwareUnitTests
> 10/39 Test #10: HardwareUnitTests ................ Passed 0.27 sec
> Start 11: MathUnitTests
> 11/39 Test #11: MathUnitTests .................... Passed 0.28 sec
> Start 12: MdrunUtilityUnitTests
> 12/39 Test #12: MdrunUtilityUnitTests ............ Passed 0.25 sec
> Start 13: MdrunUtilityMpiUnitTests
> 13/39 Test #13: MdrunUtilityMpiUnitTests ......... Passed 0.28 sec
> Start 14: OnlineHelpUnitTests
> 14/39 Test #14: OnlineHelpUnitTests .............. Passed 0.28 sec
> Start 15: OptionsUnitTests
> 15/39 Test #15: OptionsUnitTests ................. Passed 0.27 sec
> Start 16: RandomUnitTests
> 16/39 Test #16: RandomUnitTests .................. Passed 0.29 sec
> Start 17: TableUnitTests
> 17/39 Test #17: TableUnitTests ................... Passed 0.35 sec
> Start 18: TaskAssignmentUnitTests
> 18/39 Test #18: TaskAssignmentUnitTests .......... Passed 0.24 sec
> Start 19: UtilityUnitTests
> 19/39 Test #19: UtilityUnitTests ................. Passed 0.32 sec
> Start 20: FileIOTests
> 20/39 Test #20: FileIOTests ...................... Passed 0.30 sec
> Start 21: PullTest
> 21/39 Test #21: PullTest ......................... Passed 0.25 sec
> Start 22: AwhTest
> 22/39 Test #22: AwhTest .......................... Passed 0.26 sec
> Start 23: SimdUnitTests
> 23/39 Test #23: SimdUnitTests .................... Passed 0.27 sec
> Start 24: GmxAnaTest
> 24/39 Test #24: GmxAnaTest ....................... Passed 0.41 sec
> Start 25: GmxPreprocessTests
> 25/39 Test #25: GmxPreprocessTests ............... Passed 0.72 sec
> Start 26: CorrelationsTest
> 26/39 Test #26: CorrelationsTest ................. Passed 0.80 sec
> Start 27: AnalysisDataUnitTests
> 27/39 Test #27: AnalysisDataUnitTests ............ Passed 0.33 sec
> Start 28: SelectionUnitTests
> 28/39 Test #28: SelectionUnitTests ............... Passed 0.63 sec
> Start 29: TrajectoryAnalysisUnitTests
> 29/39 Test #29: TrajectoryAnalysisUnitTests ...... Passed 0.94 sec
> Start 30: EnergyAnalysisUnitTests
> 30/39 Test #30: EnergyAnalysisUnitTests .......... Passed 0.40 sec
> Start 31: CompatibilityHelpersTests
> 31/39 Test #31: CompatibilityHelpersTests ........ Passed 0.26 sec
> Start 32: MdrunTests
> 32/39 Test #32: MdrunTests ....................... Passed 12.53 sec
> Start 33: MdrunMpiTests
> 33/39 Test #33: MdrunMpiTests .................... Passed 4.08 sec
> Start 34: regressiontests/simple
> 34/39 Test #34: regressiontests/simple ........... Passed 26.14 sec
> Start 35: regressiontests/complex
> 35/39 Test #35: regressiontests/complex .......... Passed 138.00 sec
> Start 36: regressiontests/kernel
> 36/39 Test #36: regressiontests/kernel ........... Passed 252.22 sec
> Start 37: regressiontests/freeenergy
> 37/39 Test #37: regressiontests/freeenergy ....... Passed 26.16 sec
> Start 38: regressiontests/pdb2gmx
> 38/39 Test #38: regressiontests/pdb2gmx .......... Passed 77.12 sec
> Start 39: regressiontests/rotation
> 39/39 Test #39: regressiontests/rotation ......... Passed 21.36 sec
>
> 100% tests passed, 0 tests failed out of 39
>
> Label Time Summary:
> GTest = 33.48 sec*proc (33 tests)
> IntegrationTest = 17.02 sec*proc (3 tests)
> MpiTest = 4.63 sec*proc (3 tests)
> UnitTest = 16.47 sec*proc (30 tests)
>
> Total Test time (real) = 574.67 sec
> [100%] Built target run-ctest-nophys
> Scanning dependencies of target check
> [100%] Built target check
>
>
>
>
>
> ____________________________________________________
> Gary Tresadern, MChem, Ph.D
> Senior Principal Scientist, Discovery Sciences
> Janssen Research & Development
> Tel.: +32 1464 1569
> mailto:gtresade at its.jnj.com
>
>
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
> posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-request at gromacs.org.
>