[gmx-users] FW: v2018.3; GPU not recognised

Tresadern, Gary [RNDBE] gtresade at its.jnj.com
Thu Oct 4 17:36:13 CEST 2018


Hi,
We are trying to build a simple workstation installation of v2018.3 with GPU support.
The build and the tests appear to complete without errors, but when we run new jobs the GPU is not recognized and we see: NOTE: Detection of GPUs failed. The API reported...
We previously built v5 without these problems. Could you suggest any settings we may need to adjust?

Thanks
Gary

# Now switch to the sofinst user (I was not able to do this)
scl enable devtoolset-7 bash

export PATH=$PATH:/usr/local/bin/
export PATH=$PATH:/usr/local/cuda-9.2/bin/
export CUDA_HOME=/usr/local/cuda-9.2/
export PATH=$PATH:/usr/lib64/openmpi/bin/
export LD_LIBRARY_PATH="/usr/local/cuda-9.2/lib64:/usr/local/cuda-9.2/targets/x86_64-linux/lib/:${LD_LIBRARY_PATH}"
# the value below depends on the number of GPUs in the workstation
export CUDA_VISIBLE_DEVICES=0,1
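
A quick sanity check of the environment set up above might look like the following minimal sketch. The helper name path_contains is hypothetical (it is not part of any tool); it only tests whether a directory appears in a colon-separated list. Because the CUDA and OpenMPI directories are appended to the end of PATH, any other nvcc found earlier in PATH would take precedence, so the last line prints which nvcc the shell actually resolves.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if directory $2 appears in the
# colon-separated list $1 (trailing slashes are ignored).
path_contains() {
    local list="$1" dir="${2%/}" entry
    local IFS=':'
    for entry in $list; do
        if [ "${entry%/}" = "$dir" ]; then
            return 0
        fi
    done
    return 1
}

# Directories taken from the export lines above.
for d in /usr/local/cuda-9.2/bin /usr/lib64/openmpi/bin; do
    if path_contains "$PATH" "$d"; then
        echo "on PATH:      $d"
    else
        echo "not on PATH:  $d"
    fi
done

# Show which nvcc the shell would actually run, if any.
command -v nvcc || echo "nvcc not found on PATH"
```

The same helper can be pointed at LD_LIBRARY_PATH, for example path_contains "$LD_LIBRARY_PATH" /usr/local/cuda-9.2/lib64.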

#start installation of gromacs, download gromacs
wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-2018.3.tar.gz
tar xfz gromacs-2018.3.tar.gz
cd gromacs-2018.3
mkdir build
cd build

# Configure the build before installation. I chose to install under the local /tmp folder; it would be good to keep the same path on all workstations.
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-9.2/ -DGMX_GPU=on -DCMAKE_INSTALL_PREFIX=/tmp/gromacs-2018.3/ 
make
make check
make install
source /tmp/gromacs-2018.3/bin/GMXRC
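
After sourcing GMXRC, it can be worth confirming that the installed binary was really built with CUDA, and which CUDA driver and runtime versions it reports. The sketch below assumes that gmx --version prints "Key: value" lines including "GPU support", "CUDA driver" and "CUDA runtime", as 2018-series builds are understood to do; check the field names against your own output. version_field is a hypothetical helper, not a GROMACS command.

```shell
#!/bin/bash
# Hypothetical helper: read "Key:   value" lines on stdin and print the
# value belonging to the key given as $1.
version_field() {
    awk -F': *' -v key="$1" '$1 == key { print $2; exit }'
}

if command -v gmx >/dev/null 2>&1; then
    # Capture both streams, since the banner and the version table may
    # be written to different ones.
    info="$(gmx --version 2>&1)"
    for key in "GPU support" "CUDA driver" "CUDA runtime"; do
        value="$(printf '%s\n' "$info" | version_field "$key")"
        echo "$key: ${value:-<not reported>}"
    done
else
    echo "gmx not found; source /tmp/gromacs-2018.3/bin/GMXRC first"
fi
```

If the reported CUDA driver version is lower than the CUDA runtime version, the installed NVIDIA driver may be too old for the toolkit used in the build.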

-bash-4.2$ nvidia-smi -a

==============NVSMI LOG==============

Timestamp                           : Wed Oct  3 20:03:24 2018
Driver Version                      : 390.77

Attached GPUs                       : 2
GPU 00000000:03:00.0
    Product Name                    : Quadro K4200
    Product Brand                   : Quadro
    Display Mode                    : Enabled
    Display Active                  : Enabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0420315044134
    GPU UUID                        : GPU-bdae121b-23e1-dd89-5366-57761927ec39
    Minor Number                    : 0
    VBIOS Version                   : 80.04.FE.00.15
    MultiGPU Board                  : No
    Board ID                        : 0x300
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : 2004.0503.01.02
        OEM Object                  : 1.1
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization mode         : None
    PCI
        Bus                         : 0x03
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x11B410DE
        Bus Id                      : 00000000:03:00.0
        Sub System Id               : 0x109610DE
        GPU Link Info
            PCIe Generation
                Max                 : 2
                Current             : 1
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays since reset         : 0
        Tx Throughput               : N/A
        Rx Throughput               : N/A
    Fan Speed                       : 30 %
    Performance State               : P8
    Clocks Throttle Reasons
        Idle                        : Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : N/A
            HW Power Brake Slowdown : N/A
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 4036 MiB
        Used                        : 279 MiB
        Free                        : 3757 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 5 MiB
        Free                        : 251 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 3 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            Single Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
        Aggregate
            Single Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending                     : N/A
    Temperature
        GPU Current Temp            : 37 C
        GPU Shutdown Temp           : 102 C
        GPU Slowdown Temp           : 97 C
        GPU Max Operating Temp      : 80 C
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : Supported
        Power Draw                  : 15.45 W
        Power Limit                 : 110.00 W
        Default Power Limit         : 110.00 W
        Enforced Power Limit        : 110.00 W
        Min Power Limit             : 100.00 W
        Max Power Limit             : 130.00 W
    Clocks
        Graphics                    : 324 MHz
        SM                          : 324 MHz
        Memory                      : 324 MHz
        Video                       : 405 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 888 MHz
        SM                          : 888 MHz
        Memory                      : 2700 MHz
        Video                       : 540 MHz
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes
        Process ID                  : 3360
            Type                    : G
            Name                    : /usr/bin/X
            Used GPU Memory         : 116 MiB
        Process ID                  : 12028
            Type                    : G
            Name                    : /prd/pkgs/schrodinger/pymol/2.0/bin/python
            Used GPU Memory         : 33 MiB
        Process ID                  : 24619
            Type                    : G
            Name                    : /usr/bin/gnome-shell
            Used GPU Memory         : 126 MiB

GPU 00000000:81:00.0
    Product Name                    : Tesla K40c
    Product Brand                   : Tesla
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0320415010473
    GPU UUID                        : GPU-db0da2e2-ea71-1d14-9812-d7c59b6bf63a
    Minor Number                    : 1
    VBIOS Version                   : 80.80.3E.00.02
    MultiGPU Board                  : No
    Board ID                        : 0x8100
    GPU Part Number                 : 900-22081-1750-000
    Inforom Version
        Image Version               : 2081.0206.01.04
        OEM Object                  : 1.1
        ECC Object                  : 3.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization mode         : None
    PCI
        Bus                         : 0x81
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x102410DE
        Bus Id                      : 00000000:81:00.0
        Sub System Id               : 0x098310DE
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 1
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays since reset         : 0
        Tx Throughput               : N/A
        Rx Throughput               : N/A
    Fan Speed                       : 23 %
    Performance State               : P8
    Clocks Throttle Reasons
        Idle                        : Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : N/A
            HW Power Brake Slowdown : N/A
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 11441 MiB
        Used                        : 0 MiB
        Free                        : 11441 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 2 MiB
        Free                        : 254 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : Enabled
        Pending                     : Enabled
    ECC Errors
        Volatile
            Single Bit
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
            Double Bit
                Device Memory       : 0
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 0
        Aggregate
            Single Bit
                Device Memory       : 10
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 10
            Double Bit
                Device Memory       : 9
                Register File       : 0
                L1 Cache            : 0
                L2 Cache            : 0
                Texture Memory      : 0
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : 9
    Retired Pages
        Single Bit ECC              : 0
        Double Bit ECC              : 4
        Pending                     : No
    Temperature
        GPU Current Temp            : 41 C
        GPU Shutdown Temp           : 95 C
        GPU Slowdown Temp           : 90 C
        GPU Max Operating Temp      : N/A
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : Supported
        Power Draw                  : 22.90 W
        Power Limit                 : 235.00 W
        Default Power Limit         : 235.00 W
        Enforced Power Limit        : 235.00 W
        Min Power Limit             : 180.00 W
        Max Power Limit             : 235.00 W
    Clocks
        Graphics                    : 324 MHz
        SM                          : 324 MHz
        Memory                      : 324 MHz
        Video                       : 405 MHz
    Applications Clocks
        Graphics                    : 875 MHz
        Memory                      : 3004 MHz
    Default Applications Clocks
        Graphics                    : 745 MHz
        Memory                      : 3004 MHz
    Max Clocks
        Graphics                    : 875 MHz
        SM                          : 875 MHz
        Memory                      : 3004 MHz
        Video                       : 540 MHz
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes                       : None
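
One thing worth ruling out with the log above is a driver that is older than the CUDA toolkit requires. The sketch below compares the reported driver version (390.77) with 396.26, which is understood to be the minimum Linux driver listed in NVIDIA's CUDA release notes for CUDA 9.2; treat that number as an assumption and confirm it for the exact toolkit installed. version_ge is a hypothetical helper and relies on GNU sort -V.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if dotted version $1 is greater than or
# equal to dotted version $2 (uses GNU sort's version ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

driver="390.77"        # "Driver Version" from the nvidia-smi log above
min_for_cuda="396.26"  # ASSUMED minimum for CUDA 9.2; verify in the release notes

if version_ge "$driver" "$min_for_cuda"; then
    echo "driver $driver meets the assumed minimum $min_for_cuda"
else
    echo "driver $driver is older than the assumed minimum $min_for_cuda"
fi
```

On a live system the driver version can be read directly with nvidia-smi --query-gpu=driver_version --format=csv,noheader instead of being hard-coded.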



Scanning dependencies of target tests
[100%] Built target tests
Scanning dependencies of target run-ctest-nophys
[100%] Running all tests except physical validation
Test project /prd/pkgs/gromacs/2018.3/gromacs-gpu-build/build
      Start  1: TestUtilsUnitTests
1/39 Test  #1: TestUtilsUnitTests ...............   Passed    0.42 sec
      Start  2: TestUtilsMpiUnitTests
2/39 Test  #2: TestUtilsMpiUnitTests ............   Passed    0.27 sec
      Start  3: MdlibUnitTest
3/39 Test  #3: MdlibUnitTest ....................   Passed    0.27 sec
      Start  4: AppliedForcesUnitTest
4/39 Test  #4: AppliedForcesUnitTest ............   Passed    0.25 sec
      Start  5: ListedForcesTest
5/39 Test  #5: ListedForcesTest .................   Passed    0.29 sec
      Start  6: CommandLineUnitTests
6/39 Test  #6: CommandLineUnitTests .............   Passed    0.32 sec
      Start  7: EwaldUnitTests
7/39 Test  #7: EwaldUnitTests ...................   Passed    2.39 sec
      Start  8: FFTUnitTests
8/39 Test  #8: FFTUnitTests .....................   Passed    0.34 sec
      Start  9: GpuUtilsUnitTests
9/39 Test  #9: GpuUtilsUnitTests ................   Passed    3.93 sec
      Start 10: HardwareUnitTests
10/39 Test #10: HardwareUnitTests ................   Passed    0.27 sec
      Start 11: MathUnitTests
11/39 Test #11: MathUnitTests ....................   Passed    0.28 sec
      Start 12: MdrunUtilityUnitTests
12/39 Test #12: MdrunUtilityUnitTests ............   Passed    0.25 sec
      Start 13: MdrunUtilityMpiUnitTests
13/39 Test #13: MdrunUtilityMpiUnitTests .........   Passed    0.28 sec
      Start 14: OnlineHelpUnitTests
14/39 Test #14: OnlineHelpUnitTests ..............   Passed    0.28 sec
      Start 15: OptionsUnitTests
15/39 Test #15: OptionsUnitTests .................   Passed    0.27 sec
      Start 16: RandomUnitTests
16/39 Test #16: RandomUnitTests ..................   Passed    0.29 sec
      Start 17: TableUnitTests
17/39 Test #17: TableUnitTests ...................   Passed    0.35 sec
      Start 18: TaskAssignmentUnitTests
18/39 Test #18: TaskAssignmentUnitTests ..........   Passed    0.24 sec
      Start 19: UtilityUnitTests
19/39 Test #19: UtilityUnitTests .................   Passed    0.32 sec
      Start 20: FileIOTests
20/39 Test #20: FileIOTests ......................   Passed    0.30 sec
      Start 21: PullTest
21/39 Test #21: PullTest .........................   Passed    0.25 sec
      Start 22: AwhTest
22/39 Test #22: AwhTest ..........................   Passed    0.26 sec
      Start 23: SimdUnitTests
23/39 Test #23: SimdUnitTests ....................   Passed    0.27 sec
      Start 24: GmxAnaTest
24/39 Test #24: GmxAnaTest .......................   Passed    0.41 sec
      Start 25: GmxPreprocessTests
25/39 Test #25: GmxPreprocessTests ...............   Passed    0.72 sec
      Start 26: CorrelationsTest
26/39 Test #26: CorrelationsTest .................   Passed    0.80 sec
      Start 27: AnalysisDataUnitTests
27/39 Test #27: AnalysisDataUnitTests ............   Passed    0.33 sec
      Start 28: SelectionUnitTests
28/39 Test #28: SelectionUnitTests ...............   Passed    0.63 sec
      Start 29: TrajectoryAnalysisUnitTests
29/39 Test #29: TrajectoryAnalysisUnitTests ......   Passed    0.94 sec
      Start 30: EnergyAnalysisUnitTests
30/39 Test #30: EnergyAnalysisUnitTests ..........   Passed    0.40 sec
      Start 31: CompatibilityHelpersTests
31/39 Test #31: CompatibilityHelpersTests ........   Passed    0.26 sec
      Start 32: MdrunTests
32/39 Test #32: MdrunTests .......................   Passed   12.53 sec
      Start 33: MdrunMpiTests
33/39 Test #33: MdrunMpiTests ....................   Passed    4.08 sec
      Start 34: regressiontests/simple
34/39 Test #34: regressiontests/simple ...........   Passed   26.14 sec
      Start 35: regressiontests/complex
35/39 Test #35: regressiontests/complex ..........   Passed  138.00 sec
      Start 36: regressiontests/kernel
36/39 Test #36: regressiontests/kernel ...........   Passed  252.22 sec
      Start 37: regressiontests/freeenergy
37/39 Test #37: regressiontests/freeenergy .......   Passed   26.16 sec
      Start 38: regressiontests/pdb2gmx
38/39 Test #38: regressiontests/pdb2gmx ..........   Passed   77.12 sec
      Start 39: regressiontests/rotation
39/39 Test #39: regressiontests/rotation .........   Passed   21.36 sec

100% tests passed, 0 tests failed out of 39

Label Time Summary:
GTest              =  33.48 sec*proc (33 tests)
IntegrationTest    =  17.02 sec*proc (3 tests)
MpiTest            =   4.63 sec*proc (3 tests)
UnitTest           =  16.47 sec*proc (30 tests)

Total Test time (real) = 574.67 sec
[100%] Built target run-ctest-nophys
Scanning dependencies of target check
[100%] Built target check
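
As the results above show, the test suite can pass while production runs still print the GPU-detection NOTE, so it may help to search a run's log or captured terminal output for that message. A minimal sketch, assuming the file is mdrun's default md.log in the current directory unless another path is given as the first argument; has_gpu_detection_note is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: print (with line numbers) any lines of file $1 that
# contain the GPU-detection NOTE; returns 0 only if at least one is found.
has_gpu_detection_note() {
    grep -n "Detection of GPUs failed" "$1"
}

log="${1:-md.log}"   # assumption: mdrun's default log file name
if [ ! -f "$log" ]; then
    echo "no such file: $log"
elif has_gpu_detection_note "$log"; then
    echo "GPU detection failed according to $log"
else
    echo "no GPU-detection NOTE found in $log"
fi
```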





____________________________________________________
Gary Tresadern, MChem, Ph.D
Senior Principal Scientist, Discovery Sciences
Janssen Research & Development
Tel.: +32 1464  1569
mailto:gtresade at its.jnj.com 



