[gmx-users] 2018 installation make check errors, probably CUDA related
Tresadern, Gary [RNDBE]
gtresade at its.jnj.com
Sat Mar 17 16:46:15 CET 2018
Hi,
I am unable to pass the make check tests for a 2018 build. I had a working build earlier in the week, but since we updated the CUDA toolkit and NVIDIA driver it now fails.
Below are some details of the installation procedure.
I tried manually setting variables such as CUDA_VISIBLE_DEVICES, but that also didn't help.
I am running out of ideas; if you have any tips, please let me know.
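In case it is useful, the assertion that fails below is in findGpus(), which, as far as I can tell, checks the CUDA runtime's pending error state after enumerating the devices. A minimal standalone probe of the same runtime calls would look roughly like the sketch below (I have not run this; cuda_probe.cu and the compile line in the comment are only illustrative):

// cuda_probe.cu -- minimal sketch of the runtime calls behind the failing
// assertion: device enumeration followed by a check of the pending error.
// Compile with e.g.: /usr/local/cuda-9.1/bin/nvcc cuda_probe.cu -o cuda_probe
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaError_t stat = cudaGetDeviceCount(&count);
    std::printf("cudaGetDeviceCount: %s (%d device(s))\n",
                cudaGetErrorString(stat), count);

    // GROMACS asserts that no error is still pending at this point.
    cudaError_t pending = cudaPeekAtLastError();
    std::printf("cudaPeekAtLastError: %s\n", cudaGetErrorString(pending));

    return (stat == cudaSuccess && pending == cudaSuccess) ? 0 : 1;
}

If a probe like that also reports an error (for example that the driver version is insufficient for the runtime), that would point at the updated driver/toolkit pairing rather than at the GROMACS build itself.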
Thanks
Gary
bash-4.1$ su softinst
bash-4.1$ scl enable devtoolset-2 bash
bash-4.1$ which cmake
/usr/local/bin/cmake
bash-4.1$ cmake --version
cmake version 3.6.2
CMake suite maintained and supported by Kitware (kitware.com/cmake).
bash-4.1$ gcc --version
gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
bash-4.1$ ls /usr/local/cuda-9.1/
bin/ extras/ lib64/ libnvvp/ nsightee_plugins/ nvvm/ samples/ src/ tools/
doc/ include/ libnsight/ LICENSE nvml/ README share/ targets/ version.txt
bash-4.1$ ls /usr/local/cuda-9.1/bin/
bin2c cuda-gdb fatbinary nvcc.profile nvvp
computeprof cuda-gdbserver gpu-library-advisor nvdisasm ptxas
crt/ cuda-install-samples-9.1.sh nsight nvlink
cudafe cuda-memcheck nsight_ee_plugins_manage.sh nvprof
cudafe++ cuobjdump nvcc nvprune
bash-4.1$ export PATH=$PATH:/usr/local/bin/
bash-4.1$ export CUDA_HOME=/usr/local/cuda-9.1/
bash-4.1$ export PATH=$PATH:/usr/lib64/mpich/bin/
bash-4.1$ export LD_LIBRARY_PATH="/usr/local/cuda-9.1/lib64/:${LD_LIBRARY_PATH}"
bash-4.1$ export LD_LIBRARY_PATH="/usr/local/cuda-9.1/lib64:/usr/local/cuda-9.1/targets/x86_64-linux/lib/:${LD_LIBRARY_PATH}"
bash-4.1$ export LD_LIBRARY_PATH=/usr/lib64/openmpi-1.10/lib/openmpi/:$LD_LIBRARY_PATH
bash-4.1$ export MPI_CXX_INCLUDE_PATH=/usr/include/openmpi-1.10-x86_64/openmpi/ompi/mpi/cxx/
bash-4.1$ export PATH=$PATH:/usr/lib64/openmpi-1.10/bin/
bash-4.1$ cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-9.1/ -DGMX_GPU=on -DCMAKE_INSTALL_PREFIX=/prd/pkgs/gromacs/gromacs-2018/ -DGMX_MPI=on
bash-4.1$ make
bash-4.1$ make check
Test project /prd/pkgs/gromacs/gromacs-2018/build
Start 1: TestUtilsUnitTests
1/39 Test #1: TestUtilsUnitTests ............... Passed 0.41 sec
Start 2: TestUtilsMpiUnitTests
2/39 Test #2: TestUtilsMpiUnitTests ............ Passed 0.29 sec
Start 3: MdlibUnitTest
3/39 Test #3: MdlibUnitTest .................... Passed 0.24 sec
Start 4: AppliedForcesUnitTest
4/39 Test #4: AppliedForcesUnitTest ............ Passed 0.22 sec
Start 5: ListedForcesTest
5/39 Test #5: ListedForcesTest ................. Passed 0.25 sec
Start 6: CommandLineUnitTests
6/39 Test #6: CommandLineUnitTests ............. Passed 0.29 sec
Start 7: EwaldUnitTests
7/39 Test #7: EwaldUnitTests ...................***Failed 0.92 sec
[==========] Running 257 tests from 10 test cases.
[----------] Global test environment set-up.
-------------------------------------------------------
Program: ewald-test, version 2018
Source file: src/gromacs/gpu_utils/gpu_utils.cu (line 735)
Function: void findGpus(gmx_gpu_info_t*)
Assertion failed:
Condition: cudaSuccess == cudaPeekAtLastError()
Should be cudaSuccess
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Start 8: FFTUnitTests
8/39 Test #8: FFTUnitTests ..................... Passed 0.37 sec
Start 9: GpuUtilsUnitTests
9/39 Test #9: GpuUtilsUnitTests ................***Failed 0.91 sec
[==========] Running 35 tests from 7 test cases.
[----------] Global test environment set-up.
[----------] 7 tests from HostAllocatorTest/0, where TypeParam = int
[ RUN ] HostAllocatorTest/0.EmptyMemoryAlwaysWorks
-------------------------------------------------------
Program: gpu_utils-test, version 2018
Source file: src/gromacs/gpu_utils/gpu_utils.cu (line 735)
Function: void findGpus(gmx_gpu_info_t*)
Assertion failed:
Condition: cudaSuccess == cudaPeekAtLastError()
Should be cudaSuccess
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Start 10: HardwareUnitTests
10/39 Test #10: HardwareUnitTests ................ Passed 0.24 sec
Start 11: MathUnitTests
11/39 Test #11: MathUnitTests .................... Passed 0.25 sec
Start 12: MdrunUtilityUnitTests
12/39 Test #12: MdrunUtilityUnitTests ............ Passed 0.22 sec
Start 13: MdrunUtilityMpiUnitTests
13/39 Test #13: MdrunUtilityMpiUnitTests ......... Passed 0.35 sec
Start 14: OnlineHelpUnitTests
14/39 Test #14: OnlineHelpUnitTests .............. Passed 0.24 sec
Start 15: OptionsUnitTests
15/39 Test #15: OptionsUnitTests ................. Passed 0.25 sec
Start 16: RandomUnitTests
16/39 Test #16: RandomUnitTests .................. Passed 0.26 sec
Start 17: TableUnitTests
17/39 Test #17: TableUnitTests ................... Passed 0.41 sec
Start 18: TaskAssignmentUnitTests
18/39 Test #18: TaskAssignmentUnitTests .......... Passed 0.21 sec
Start 19: UtilityUnitTests
19/39 Test #19: UtilityUnitTests ................. Passed 0.32 sec
Start 20: FileIOTests
20/39 Test #20: FileIOTests ...................... Passed 0.26 sec
Start 21: PullTest
21/39 Test #21: PullTest ......................... Passed 0.24 sec
Start 22: AwhTest
22/39 Test #22: AwhTest .......................... Passed 0.23 sec
Start 23: SimdUnitTests
23/39 Test #23: SimdUnitTests .................... Passed 0.29 sec
Start 24: GmxAnaTest
24/39 Test #24: GmxAnaTest ....................... Passed 0.38 sec
Start 25: GmxPreprocessTests
25/39 Test #25: GmxPreprocessTests ............... Passed 0.58 sec
Start 26: CorrelationsTest
26/39 Test #26: CorrelationsTest ................. Passed 1.23 sec
Start 27: AnalysisDataUnitTests
27/39 Test #27: AnalysisDataUnitTests ............ Passed 0.30 sec
Start 28: SelectionUnitTests
28/39 Test #28: SelectionUnitTests ............... Passed 0.61 sec
Start 29: TrajectoryAnalysisUnitTests
29/39 Test #29: TrajectoryAnalysisUnitTests ...... Passed 1.19 sec
Start 30: EnergyAnalysisUnitTests
30/39 Test #30: EnergyAnalysisUnitTests .......... Passed 0.58 sec
Start 31: CompatibilityHelpersTests
31/39 Test #31: CompatibilityHelpersTests ........ Passed 0.23 sec
Start 32: MdrunTests
32/39 Test #32: MdrunTests .......................***Failed 0.98 sec
[==========] Running 29 tests from 11 test cases.
[----------] Global test environment set-up.
[----------] 6 tests from BondedInteractionsTest
[ RUN ] BondedInteractionsTest.NormalBondWorks
NOTE 1 [file /prd/pkgs/gromacs/gromacs-2018/build/src/programs/mdrun/tests/Testing/Temporary/BondedInteractionsTest_NormalBondWorks_input.mdp, line 1]:
/prd/pkgs/gromacs/gromacs-2018/build/src/programs/mdrun/tests/Testing/Temporary/BondedInteractionsTest_NormalBondWorks_input.mdp did not specify a value for the .mdp option "cutoff-scheme". Probably it
was first intended for use with GROMACS before 4.6. In 4.6, the Verlet
scheme was introduced, but the group scheme was still the default. The
default is now the Verlet scheme, so you will observe different behaviour.
NOTE 2 [file /prd/pkgs/gromacs/gromacs-2018/build/src/programs/mdrun/tests/Testing/Temporary/BondedInteractionsTest_NormalBondWorks_input.mdp]:
For a correct single-point energy evaluation with nsteps = 0, use
continuation = yes to avoid constraining the input coordinates.
Setting the LD random seed to 417973934
Generated 3 of the 3 non-bonded parameter combinations
Excluding 3 bonded neighbours molecule type 'butane'
Removing all charge groups because cutoff-scheme=Verlet
NOTE 3 [file BondedInteractionsTest_NormalBondWorks_butane1.top, line 31]:
In moleculetype 'butane' 2 atoms are not bound by a potential or
constraint to any other atom in the same moleculetype. Although
technically this might not cause issues in a simulation, this often means
that the user forgot to add a bond/potential/constraint or put multiple
molecules in the same moleculetype definition by mistake. Run with -v to
get information for each atom.
Number of degrees of freedom in T-Coupling group rest is 9.00
NOTE 4 [file /prd/pkgs/gromacs/gromacs-2018/build/src/programs/mdrun/tests/Testing/Temporary/BondedInteractionsTest_NormalBondWorks_input.mdp]:
NVE simulation with an initial temperature of zero: will use a Verlet
buffer of 10%. Check your energy drift!
There were 4 notes
-------------------------------------------------------
Program: mdrun-test, version 2018
Source file: src/gromacs/gpu_utils/gpu_utils.cu (line 735)
Function: void findGpus(gmx_gpu_info_t*)
Assertion failed:
Condition: cudaSuccess == cudaPeekAtLastError()
Should be cudaSuccess
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
This run will generate roughly 0 Mb of data
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Start 33: MdrunMpiTests
33/39 Test #33: MdrunMpiTests ....................***Failed 2.06 sec
[==========] Running 7 tests from 5 test cases.
[----------] Global test environment set-up.
[----------] 1 test from MultiSimTerminationTest
[ RUN ] MultiSimTerminationTest.WritesCheckpointAfterMaxhTerminationAndThenRestarts
NOTE 1 [file /prd/pkgs/gromacs/gromacs-2018/build/src/programs/mdrun/tests/Testing/Temporary/MultiSimTerminationTest_WritesCheckpointAfterMaxhTerminationAndThenRestarts_input1.mdp, line 14]:
/prd/pkgs/gromacs/gromacs-2018/build/src/programs/mdrun/tests/Testing/Temporary/MultiSimTerminationTest_WritesCheckpointAfterMaxhTerminationAndThenRestarts_input1.mdp did not specify a value for the .mdp option "cutoff-scheme". Probably it
was first intended for use with GROMACS before 4.6. In 4.6, the Verlet
scheme was introduced, but the group scheme was still the default. The
default is now the Verlet scheme, so you will observe different behaviour.
NOTE 1 [file /prd/pkgs/gromacs/gromacs-2018/build/src/programs/mdrun/tests/Testing/Temporary/MultiSimTerminationTest_WritesCheckpointAfterMaxhTerminationAndThenRestarts_input0.mdp, line 14]:
/prd/pkgs/gromacs/gromacs-2018/build/src/programs/mdrun/tests/Testing/Temporary/MultiSimTerminationTest_WritesCheckpointAfterMaxhTerminationAndThenRestarts_input0.mdp did not specify a value for the .mdp option "cutoff-scheme". Probably it
was first intended for use with GROMACS before 4.6. In 4.6, the Verlet
scheme was introduced, but the group scheme was still the default. The
default is now the Verlet scheme, so you will observe different behaviour.
Setting the LD random seed to 73630723
Generated 3 of the 3 non-bonded parameter combinations
Generating 1-4 interactions: fudge = 0.5
Generated 3 of the 3 1-4 parameter combinations
Excluding 2 bonded neighbours molecule type 'SOL'
Setting gen_seed to -1322183961
Velocities were taken from a Maxwell distribution at 288 K
Removing all charge groups because cutoff-scheme=Verlet
Number of degrees of freedom in T-Coupling group System is 9.00
Determining Verlet buffer for a tolerance of 0.005 kJ/mol/ps at 298 K
Calculated rlist for 1x1 atom pair-list as 1.026 nm, buffer size 0.026 nm
Set rlist, assuming 4x4 atom pair-list, to 1.024 nm, buffer size 0.024 nm
Note that mdrun will redetermine rlist based on the actual pair-list setup
This run will generate roughly 0 Mb of data
NOTE 2 [file /prd/pkgs/gromacs/gromacs-2018/build/src/programs/mdrun/tests/Testing/Temporary/MultiSimTerminationTest_WritesCheckpointAfterMaxhTerminationAndThenRestarts_input1.mdp]:
You are using a plain Coulomb cut-off, which might produce artifacts.
You might want to consider using PME electrostatics.
There were 2 notes
Setting the LD random seed to 408678750
Generated 3 of the 3 non-bonded parameter combinations
Generating 1-4 interactions: fudge = 0.5
Generated 3 of the 3 1-4 parameter combinations
Excluding 2 bonded neighbours molecule type 'SOL'
Setting gen_seed to 1490520586
Velocities were taken from a Maxwell distribution at 298 K
Removing all charge groups because cutoff-scheme=Verlet
Number of degrees of freedom in T-Coupling group System is 9.00
Determining Verlet buffer for a tolerance of 0.005 kJ/mol/ps at 298 K
NOTE 2 [file /prd/pkgs/gromacs/gromacs-2018/build/src/programs/mdrun/tests/Testing/Temporary/MultiSimTerminationTest_WritesCheckpointAfterMaxhTerminationAndThenRestarts_input0.mdp]:
You are using a plain Coulomb cut-off, which might produce artifacts.
You might want to consider using PME electrostatics.
There were 2 notes
Calculated rlist for 1x1 atom pair-list as 1.026 nm, buffer size 0.026 nm
Set rlist, assuming 4x4 atom pair-list, to 1.024 nm, buffer size 0.024 nm
Note that mdrun will redetermine rlist based on the actual pair-list setup
This run will generate roughly 0 Mb of data
-------------------------------------------------------
Program: mdrun-mpi-test, version 2018
Source file: src/gromacs/gpu_utils/gpu_utils.cu (line 735)
Function: void findGpus(gmx_gpu_info_t*)
MPI rank: 0 (out of 2)
Assertion failed:
Condition: cudaSuccess == cudaPeekAtLastError()
Should be cudaSuccess
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Start 34: regressiontests/simple
34/39 Test #34: regressiontests/simple ........... Passed 25.95 sec
Start 35: regressiontests/complex
35/39 Test #35: regressiontests/complex .......... Passed 80.79 sec
Start 36: regressiontests/kernel
36/39 Test #36: regressiontests/kernel ........... Passed 223.69 sec
Start 37: regressiontests/freeenergy
37/39 Test #37: regressiontests/freeenergy ....... Passed 16.11 sec
Start 38: regressiontests/pdb2gmx
38/39 Test #38: regressiontests/pdb2gmx .......... Passed 92.77 sec
Start 39: regressiontests/rotation
39/39 Test #39: regressiontests/rotation ......... Passed 20.51 sec
90% tests passed, 4 tests failed out of 39
Label Time Summary:
GTest = 15.83 sec (33 tests)
IntegrationTest = 3.42 sec (3 tests)
MpiTest = 2.70 sec (3 tests)
UnitTest = 12.41 sec (30 tests)
Total Test time (real) = 475.81 sec
The following tests FAILED:
7 - EwaldUnitTests (Failed)
9 - GpuUtilsUnitTests (Failed)
32 - MdrunTests (Failed)
33 - MdrunMpiTests (Failed)
Errors while running CTest
make[3]: *** [CMakeFiles/run-ctest-nophys] Error 8
make[2]: *** [CMakeFiles/run-ctest-nophys.dir/all] Error 2
make[1]: *** [CMakeFiles/check.dir/rule] Error 2
make: *** [check] Error 2
[softinst at wrndbeberhel13 build]$ bin/gmx_mpi mdrun -version
:-) GROMACS - gmx mdrun, 2018 (-:
GROMACS is written by:
Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
Aldert van Buuren Rudi van Drunen Anton Feenstra Gerrit Groenhof
Christoph Junghans Anca Hamuraru Vincent Hindriksen Dimitrios Karkoulis
Peter Kasson Jiri Kraus Carsten Kutzner Per Larsson
Justin A. Lemkul Viveca Lindahl Magnus Lundborg Pieter Meulenhoff
Erik Marklund Teemu Murtola Szilard Pall Sander Pronk
Roland Schulz Alexey Shvetsov Michael Shirts Alfons Sijbers
Peter Tieleman Teemu Virolainen Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2017, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.
GROMACS: gmx mdrun, version 2018
Executable: /toledo/prd/pkgs/gromacs/gromacs-2018/build/bin/gmx_mpi
Data prefix: /prd/pkgs/gromacs/gromacs-2018 (source tree)
Working dir: /toledo/prd/pkgs/gromacs/gromacs-2018/build
Command line:
gmx_mpi mdrun -version
GROMACS version: 2018
Precision: single
Memory model: 64 bit
MPI library: MPI
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX2_256
FFT library: fftw-3.3.5-fma-sse2-avx-avx2-avx2_128
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
Built on: 2018-03-16 13:16:04
Built by: softinst at wrndbeberhel13 [CMAKE]
Build OS/arch: Linux 2.6.32-573.12.1.el6.x86_64 x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz
Build CPU family: 6 Model: 63 Stepping: 2
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /opt/rh/devtoolset-2/root/usr/bin/cc GNU 4.8.2
C compiler flags: -march=core-avx2 -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler: /opt/rh/devtoolset-2/root/usr/bin/c++ GNU 4.8.2
C++ compiler flags: -march=core-avx2 -std=c++11 -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
CUDA compiler: /usr/local/cuda-9.1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2017 NVIDIA Corporation;Built on Fri_Nov__3_21:07:56_CDT_2017;Cuda compilation tools, release 9.1, V9.1.85
CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;;; ;-march=core-avx2;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver: 9.10
CUDA runtime: 9.10