[gmx-users] FW: v2018.3; GPU not recognised

Szilárd Páll pall.szilard at gmail.com
Thu Oct 4 17:49:39 CEST 2018


On Thu, Oct 4, 2018 at 5:36 PM Tresadern, Gary [RNDBE] <gtresade at its.jnj.com>
wrote:

> Hi,
> We are trying to build a simple workstation installation of v2018.3 that
> will run with GPU support.
> The build and tests seem to complete without errors, but when we run new
> jobs the GPU is not recognized: "NOTE: Detection of GPUs
> failed. The API reported..."
>

That sounds like an insufficient driver for the CUDA runtime. What's the full
error message? Doesn't it say exactly that?


> We have previously built v5 without these problems. Can you give us some
> tips for settings we may need to adjust?
>
> Thanks
> Gary
>
> #Now switch to sofinst user, (I was not able to do this)
> scl enable devtoolset-7 bash
>
> export PATH=$PATH:/usr/local/bin/
> export PATH=$PATH:/usr/local/cuda-9.2/bin/
> export CUDA_HOME=/usr/local/cuda-9.2/
> export PATH=$PATH:/usr/lib64/openmpi/bin/
> export LD_LIBRARY_PATH="/usr/local/cuda-9.2/lib64:/usr/local/cuda-9.2/targets/x86_64-linux/lib/:${LD_LIBRARY_PATH}"
>

I believe you need 396.xx or later drivers for CUDA 9.2; the nvidia-smi
output you quote below reports driver version 390.77.
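
A quick way to check this on a workstation is to compare the installed driver's
major version against the minimum the toolkit needs. The snippet below is only
a minimal sketch: the function name check_driver and the variable
MIN_DRIVER_MAJOR are made up for illustration, and the 396 threshold is simply
the figure mentioned in this reply, so verify it against NVIDIA's release notes
for the exact CUDA 9.2 build in use.

```shell
#!/usr/bin/env bash
# Assumed minimum driver major version for CUDA 9.2 (the "396.xx" from this
# thread); check NVIDIA's release notes before relying on it.
MIN_DRIVER_MAJOR=396

# check_driver VERSION
# Prints "ok" if the major part of VERSION is >= MIN_DRIVER_MAJOR,
# otherwise prints "too old". (Hypothetical helper, not part of any tool.)
check_driver() {
    local version="$1"
    local major="${version%%.*}"   # strip everything after the first dot
    if [ "$major" -ge "$MIN_DRIVER_MAJOR" ]; then
        echo "ok"
    else
        echo "too old"
    fi
}

# Driver version reported by nvidia-smi in this thread
check_driver 390.77   # prints "too old"
```

On a live system the version string could be taken from the driver itself
instead of being typed in, for example with
check_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)".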

--
Szilárd



> #the command below changes depending on the number of GPUs in the workstation
> export CUDA_VISIBLE_DEVICES=0,1
>
> #start installation of gromacs, download gromacs
> wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-2018.3.tar.gz
> tar xfz gromacs-2018.3.tar.gz
> cd gromacs-2018.3
> mkdir build
> cd build
>
> #this is the command to set the variables and stuff prior to installation,
> #I chose to install in local /tmp folder, it would be good to keep this the
> #same path on all workstations
> cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON \
>     -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-9.2/ -DGMX_GPU=on \
>     -DCMAKE_INSTALL_PREFIX=/tmp/gromacs-2018.3/
> make
> make check
> make install
> source /tmp/gromacs-2018.3/bin/GMXRC
>
> -bash-4.2$
> -bash-4.2$ nvidia-smi -a
>
> ==============NVSMI LOG==============
>
> Timestamp                           : Wed Oct  3 20:03:24 2018
> Driver Version                      : 390.77
>
> Attached GPUs                       : 2
> GPU 00000000:03:00.0
>     Product Name                    : Quadro K4200
>     Product Brand                   : Quadro
>     Display Mode                    : Enabled
>     Display Active                  : Enabled
>     Persistence Mode                : Enabled
>     Accounting Mode                 : Disabled
>     Accounting Mode Buffer Size     : 4000
>     Driver Model
>         Current                     : N/A
>         Pending                     : N/A
>     Serial Number                   : 0420315044134
>     GPU UUID                        : GPU-bdae121b-23e1-dd89-5366-57761927ec39
>     Minor Number                    : 0
>     VBIOS Version                   : 80.04.FE.00.15
>     MultiGPU Board                  : No
>     Board ID                        : 0x300
>     GPU Part Number                 : N/A
>     Inforom Version
>         Image Version               : 2004.0503.01.02
>         OEM Object                  : 1.1
>         ECC Object                  : N/A
>         Power Management Object     : N/A
>     GPU Operation Mode
>         Current                     : N/A
>         Pending                     : N/A
>     GPU Virtualization Mode
>         Virtualization mode         : None
>     PCI
>         Bus                         : 0x03
>         Device                      : 0x00
>         Domain                      : 0x0000
>         Device Id                   : 0x11B410DE
>         Bus Id                      : 00000000:03:00.0
>         Sub System Id               : 0x109610DE
>         GPU Link Info
>             PCIe Generation
>                 Max                 : 2
>                 Current             : 1
>             Link Width
>                 Max                 : 16x
>                 Current             : 16x
>         Bridge Chip
>             Type                    : N/A
>             Firmware                : N/A
>         Replays since reset         : 0
>         Tx Throughput               : N/A
>         Rx Throughput               : N/A
>     Fan Speed                       : 30 %
>     Performance State               : P8
>     Clocks Throttle Reasons
>         Idle                        : Active
>         Applications Clocks Setting : Not Active
>         SW Power Cap                : Not Active
>         HW Slowdown                 : Not Active
>             HW Thermal Slowdown     : N/A
>             HW Power Brake Slowdown : N/A
>         Sync Boost                  : Not Active
>         SW Thermal Slowdown         : Not Active
>         Display Clock Setting       : Not Active
>     FB Memory Usage
>         Total                       : 4036 MiB
>         Used                        : 279 MiB
>         Free                        : 3757 MiB
>     BAR1 Memory Usage
>         Total                       : 256 MiB
>         Used                        : 5 MiB
>         Free                        : 251 MiB
>     Compute Mode                    : Default
>     Utilization
>         Gpu                         : 0 %
>         Memory                      : 3 %
>         Encoder                     : 0 %
>         Decoder                     : 0 %
>     Encoder Stats
>         Active Sessions             : 0
>         Average FPS                 : 0
>         Average Latency             : 0
>     Ecc Mode
>         Current                     : N/A
>         Pending                     : N/A
>     ECC Errors
>         Volatile
>             Single Bit
>                 Device Memory       : N/A
>                 Register File       : N/A
>                 L1 Cache            : N/A
>                 L2 Cache            : N/A
>                 Texture Memory      : N/A
>                 Texture Shared      : N/A
>                 CBU                 : N/A
>                 Total               : N/A
>             Double Bit
>                 Device Memory       : N/A
>                 Register File       : N/A
>                 L1 Cache            : N/A
>                 L2 Cache            : N/A
>                 Texture Memory      : N/A
>                 Texture Shared      : N/A
>                 CBU                 : N/A
>                 Total               : N/A
>         Aggregate
>             Single Bit
>                 Device Memory       : N/A
>                 Register File       : N/A
>                 L1 Cache            : N/A
>                 L2 Cache            : N/A
>                 Texture Memory      : N/A
>                 Texture Shared      : N/A
>                 CBU                 : N/A
>                 Total               : N/A
>             Double Bit
>                 Device Memory       : N/A
>                 Register File       : N/A
>                 L1 Cache            : N/A
>                 L2 Cache            : N/A
>                 Texture Memory      : N/A
>                 Texture Shared      : N/A
>                 CBU                 : N/A
>                 Total               : N/A
>     Retired Pages
>         Single Bit ECC              : N/A
>         Double Bit ECC              : N/A
>         Pending                     : N/A
>     Temperature
>         GPU Current Temp            : 37 C
>         GPU Shutdown Temp           : 102 C
>         GPU Slowdown Temp           : 97 C
>         GPU Max Operating Temp      : 80 C
>         Memory Current Temp         : N/A
>         Memory Max Operating Temp   : N/A
>     Power Readings
>         Power Management            : Supported
>         Power Draw                  : 15.45 W
>         Power Limit                 : 110.00 W
>         Default Power Limit         : 110.00 W
>         Enforced Power Limit        : 110.00 W
>         Min Power Limit             : 100.00 W
>         Max Power Limit             : 130.00 W
>     Clocks
>         Graphics                    : 324 MHz
>         SM                          : 324 MHz
>         Memory                      : 324 MHz
>         Video                       : 405 MHz
>     Applications Clocks
>         Graphics                    : N/A
>         Memory                      : N/A
>     Default Applications Clocks
>         Graphics                    : N/A
>         Memory                      : N/A
>     Max Clocks
>         Graphics                    : 888 MHz
>         SM                          : 888 MHz
>         Memory                      : 2700 MHz
>         Video                       : 540 MHz
>     Max Customer Boost Clocks
>         Graphics                    : N/A
>     Clock Policy
>         Auto Boost                  : N/A
>         Auto Boost Default          : N/A
>     Processes
>         Process ID                  : 3360
>             Type                    : G
>             Name                    : /usr/bin/X
>             Used GPU Memory         : 116 MiB
>         Process ID                  : 12028
>             Type                    : G
>             Name                    : /prd/pkgs/schrodinger/pymol/2.0/bin/python
>             Used GPU Memory         : 33 MiB
>         Process ID                  : 24619
>             Type                    : G
>             Name                    : /usr/bin/gnome-shell
>             Used GPU Memory         : 126 MiB
>
> GPU 00000000:81:00.0
>     Product Name                    : Tesla K40c
>     Product Brand                   : Tesla
>     Display Mode                    : Disabled
>     Display Active                  : Disabled
>     Persistence Mode                : Enabled
>     Accounting Mode                 : Disabled
>     Accounting Mode Buffer Size     : 4000
>     Driver Model
>         Current                     : N/A
>         Pending                     : N/A
>     Serial Number                   : 0320415010473
>     GPU UUID                        : GPU-db0da2e2-ea71-1d14-9812-d7c59b6bf63a
>     Minor Number                    : 1
>     VBIOS Version                   : 80.80.3E.00.02
>     MultiGPU Board                  : No
>     Board ID                        : 0x8100
>     GPU Part Number                 : 900-22081-1750-000
>     Inforom Version
>         Image Version               : 2081.0206.01.04
>         OEM Object                  : 1.1
>         ECC Object                  : 3.0
>         Power Management Object     : N/A
>     GPU Operation Mode
>         Current                     : N/A
>         Pending                     : N/A
>     GPU Virtualization Mode
>         Virtualization mode         : None
>     PCI
>         Bus                         : 0x81
>         Device                      : 0x00
>         Domain                      : 0x0000
>         Device Id                   : 0x102410DE
>         Bus Id                      : 00000000:81:00.0
>         Sub System Id               : 0x098310DE
>         GPU Link Info
>             PCIe Generation
>                 Max                 : 3
>                 Current             : 1
>             Link Width
>                 Max                 : 16x
>                 Current             : 16x
>         Bridge Chip
>             Type                    : N/A
>             Firmware                : N/A
>         Replays since reset         : 0
>         Tx Throughput               : N/A
>         Rx Throughput               : N/A
>     Fan Speed                       : 23 %
>     Performance State               : P8
>     Clocks Throttle Reasons
>         Idle                        : Active
>         Applications Clocks Setting : Not Active
>         SW Power Cap                : Not Active
>         HW Slowdown                 : Not Active
>             HW Thermal Slowdown     : N/A
>             HW Power Brake Slowdown : N/A
>         Sync Boost                  : Not Active
>         SW Thermal Slowdown         : Not Active
>         Display Clock Setting       : Not Active
>     FB Memory Usage
>         Total                       : 11441 MiB
>         Used                        : 0 MiB
>         Free                        : 11441 MiB
>     BAR1 Memory Usage
>         Total                       : 256 MiB
>         Used                        : 2 MiB
>         Free                        : 254 MiB
>     Compute Mode                    : Default
>     Utilization
>         Gpu                         : 0 %
>         Memory                      : 0 %
>         Encoder                     : 0 %
>         Decoder                     : 0 %
>     Encoder Stats
>         Active Sessions             : 0
>         Average FPS                 : 0
>         Average Latency             : 0
>     Ecc Mode
>         Current                     : Enabled
>         Pending                     : Enabled
>     ECC Errors
>         Volatile
>             Single Bit
>                 Device Memory       : 0
>                 Register File       : 0
>                 L1 Cache            : 0
>                 L2 Cache            : 0
>                 Texture Memory      : 0
>                 Texture Shared      : N/A
>                 CBU                 : N/A
>                 Total               : 0
>             Double Bit
>                 Device Memory       : 0
>                 Register File       : 0
>                 L1 Cache            : 0
>                 L2 Cache            : 0
>                 Texture Memory      : 0
>                 Texture Shared      : N/A
>                 CBU                 : N/A
>                 Total               : 0
>         Aggregate
>             Single Bit
>                 Device Memory       : 10
>                 Register File       : 0
>                 L1 Cache            : 0
>                 L2 Cache            : 0
>                 Texture Memory      : 0
>                 Texture Shared      : N/A
>                 CBU                 : N/A
>                 Total               : 10
>             Double Bit
>                 Device Memory       : 9
>                 Register File       : 0
>                 L1 Cache            : 0
>                 L2 Cache            : 0
>                 Texture Memory      : 0
>                 Texture Shared      : N/A
>                 CBU                 : N/A
>                 Total               : 9
>     Retired Pages
>         Single Bit ECC              : 0
>         Double Bit ECC              : 4
>         Pending                     : No
>     Temperature
>         GPU Current Temp            : 41 C
>         GPU Shutdown Temp           : 95 C
>         GPU Slowdown Temp           : 90 C
>         GPU Max Operating Temp      : N/A
>         Memory Current Temp         : N/A
>         Memory Max Operating Temp   : N/A
>     Power Readings
>         Power Management            : Supported
>         Power Draw                  : 22.90 W
>         Power Limit                 : 235.00 W
>         Default Power Limit         : 235.00 W
>         Enforced Power Limit        : 235.00 W
>         Min Power Limit             : 180.00 W
>         Max Power Limit             : 235.00 W
>     Clocks
>         Graphics                    : 324 MHz
>         SM                          : 324 MHz
>         Memory                      : 324 MHz
>         Video                       : 405 MHz
>     Applications Clocks
>         Graphics                    : 875 MHz
>         Memory                      : 3004 MHz
>     Default Applications Clocks
>         Graphics                    : 745 MHz
>         Memory                      : 3004 MHz
>     Max Clocks
>         Graphics                    : 875 MHz
>         SM                          : 875 MHz
>         Memory                      : 3004 MHz
>         Video                       : 540 MHz
>     Max Customer Boost Clocks
>         Graphics                    : N/A
>     Clock Policy
>         Auto Boost                  : N/A
>         Auto Boost Default          : N/A
>     Processes                       : None
>
>
>
> Scanning dependencies of target tests
> [100%] Built target tests
> Scanning dependencies of target run-ctest-nophys
> [100%] Running all tests except physical validation
> Test project /prd/pkgs/gromacs/2018.3/gromacs-gpu-build/build
>       Start  1: TestUtilsUnitTests
> 1/39 Test  #1: TestUtilsUnitTests ...............   Passed    0.42 sec
>       Start  2: TestUtilsMpiUnitTests
> 2/39 Test  #2: TestUtilsMpiUnitTests ............   Passed    0.27 sec
>       Start  3: MdlibUnitTest
> 3/39 Test  #3: MdlibUnitTest ....................   Passed    0.27 sec
>       Start  4: AppliedForcesUnitTest
> 4/39 Test  #4: AppliedForcesUnitTest ............   Passed    0.25 sec
>       Start  5: ListedForcesTest
> 5/39 Test  #5: ListedForcesTest .................   Passed    0.29 sec
>       Start  6: CommandLineUnitTests
> 6/39 Test  #6: CommandLineUnitTests .............   Passed    0.32 sec
>       Start  7: EwaldUnitTests
> 7/39 Test  #7: EwaldUnitTests ...................   Passed    2.39 sec
>       Start  8: FFTUnitTests
> 8/39 Test  #8: FFTUnitTests .....................   Passed    0.34 sec
>       Start  9: GpuUtilsUnitTests
> 9/39 Test  #9: GpuUtilsUnitTests ................   Passed    3.93 sec
>       Start 10: HardwareUnitTests
> 10/39 Test #10: HardwareUnitTests ................   Passed    0.27 sec
>       Start 11: MathUnitTests
> 11/39 Test #11: MathUnitTests ....................   Passed    0.28 sec
>       Start 12: MdrunUtilityUnitTests
> 12/39 Test #12: MdrunUtilityUnitTests ............   Passed    0.25 sec
>       Start 13: MdrunUtilityMpiUnitTests
> 13/39 Test #13: MdrunUtilityMpiUnitTests .........   Passed    0.28 sec
>       Start 14: OnlineHelpUnitTests
> 14/39 Test #14: OnlineHelpUnitTests ..............   Passed    0.28 sec
>       Start 15: OptionsUnitTests
> 15/39 Test #15: OptionsUnitTests .................   Passed    0.27 sec
>       Start 16: RandomUnitTests
> 16/39 Test #16: RandomUnitTests ..................   Passed    0.29 sec
>       Start 17: TableUnitTests
> 17/39 Test #17: TableUnitTests ...................   Passed    0.35 sec
>       Start 18: TaskAssignmentUnitTests
> 18/39 Test #18: TaskAssignmentUnitTests ..........   Passed    0.24 sec
>       Start 19: UtilityUnitTests
> 19/39 Test #19: UtilityUnitTests .................   Passed    0.32 sec
>       Start 20: FileIOTests
> 20/39 Test #20: FileIOTests ......................   Passed    0.30 sec
>       Start 21: PullTest
> 21/39 Test #21: PullTest .........................   Passed    0.25 sec
>       Start 22: AwhTest
> 22/39 Test #22: AwhTest ..........................   Passed    0.26 sec
>       Start 23: SimdUnitTests
> 23/39 Test #23: SimdUnitTests ....................   Passed    0.27 sec
>       Start 24: GmxAnaTest
> 24/39 Test #24: GmxAnaTest .......................   Passed    0.41 sec
>       Start 25: GmxPreprocessTests
> 25/39 Test #25: GmxPreprocessTests ...............   Passed    0.72 sec
>       Start 26: CorrelationsTest
> 26/39 Test #26: CorrelationsTest .................   Passed    0.80 sec
>       Start 27: AnalysisDataUnitTests
> 27/39 Test #27: AnalysisDataUnitTests ............   Passed    0.33 sec
>       Start 28: SelectionUnitTests
> 28/39 Test #28: SelectionUnitTests ...............   Passed    0.63 sec
>       Start 29: TrajectoryAnalysisUnitTests
> 29/39 Test #29: TrajectoryAnalysisUnitTests ......   Passed    0.94 sec
>       Start 30: EnergyAnalysisUnitTests
> 30/39 Test #30: EnergyAnalysisUnitTests ..........   Passed    0.40 sec
>       Start 31: CompatibilityHelpersTests
> 31/39 Test #31: CompatibilityHelpersTests ........   Passed    0.26 sec
>       Start 32: MdrunTests
> 32/39 Test #32: MdrunTests .......................   Passed   12.53 sec
>       Start 33: MdrunMpiTests
> 33/39 Test #33: MdrunMpiTests ....................   Passed    4.08 sec
>       Start 34: regressiontests/simple
> 34/39 Test #34: regressiontests/simple ...........   Passed   26.14 sec
>       Start 35: regressiontests/complex
> 35/39 Test #35: regressiontests/complex ..........   Passed  138.00 sec
>       Start 36: regressiontests/kernel
> 36/39 Test #36: regressiontests/kernel ...........   Passed  252.22 sec
>       Start 37: regressiontests/freeenergy
> 37/39 Test #37: regressiontests/freeenergy .......   Passed   26.16 sec
>       Start 38: regressiontests/pdb2gmx
> 38/39 Test #38: regressiontests/pdb2gmx ..........   Passed   77.12 sec
>       Start 39: regressiontests/rotation
> 39/39 Test #39: regressiontests/rotation .........   Passed   21.36 sec
>
> 100% tests passed, 0 tests failed out of 39
>
> Label Time Summary:
> GTest              =  33.48 sec*proc (33 tests)
> IntegrationTest    =  17.02 sec*proc (3 tests)
> MpiTest            =   4.63 sec*proc (3 tests)
> UnitTest           =  16.47 sec*proc (30 tests)
>
> Total Test time (real) = 574.67 sec
> [100%] Built target run-ctest-nophys
> Scanning dependencies of target check
> [100%] Built target check
>
>
>
>
>
> ____________________________________________________
> Gary Tresadern, MChem, Ph.D
> Senior Principal Scientist, Discovery Sciences
> Janssen Research & Development
> Tel.: +32 1464  1569
> mailto:gtresade at its.jnj.com
>
>

