[gmx-users] 2019.2 not using all available cores

Wed May 8 23:55:29 CEST 2019

gmx 2019.2 compiled using threads only uses a single core, and mdrun_mpi
compiled using MPI also only uses a single core. gmx 2016.3 using threads
uses all 12 cores.

To compile the thread version of 2019.2 I used:
cmake .. -DGMX_GPU=ON -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs/gromacs-2019.2

To compile the MPI version of 2019.2 I used:
cmake .. -DGMX_MPI=ON -DBUILD_SHARED_LIBS=OFF -DGMX_GPU=ON \
  -DCMAKE_CXX_COMPILER=/usr/lib64/mpi/gcc/openmpi/bin/mpiCC \
  -DCMAKE_C_COMPILER=/usr/lib64/mpi/gcc/openmpi/bin/mpicc \
  -DGMX_BUILD_MDRUN_ONLY=ON \
  -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs/gromacs-2019.2

Between building those two, I deleted the build directory.

####################################################
GROMACS:      gmx, version 2019.2
Executable:   /usr/local/gromacs/gromacs-2019.2/bin/gmx
Data prefix:  /usr/local/gromacs/gromacs-2019.2
Working dir:  /home/dallas/experiments/current/19-064/P6DLO
Command line:
  gmx -version

GROMACS version:    2019.2
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        CUDA
SIMD instructions:  AVX_256
FFT library:        fftw-3.3.8-sse2
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      disabled
Tracing support:    disabled
C compiler:         /usr/bin/cc GNU 7.4.0
C compiler flags:    -mavx     -O3 -DNDEBUG -funroll-all-loops
-fexcess-precision=fast
C++ compiler:       /usr/bin/c++ GNU 7.4.0
C++ compiler flags:  -mavx    -std=c++11   -O3 -DNDEBUG
-funroll-all-loops -fexcess-precision=fast
CUDA compiler:      /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda
compiler driver;Copyright (c) 2005-2019 NVIDIA Corporation;Built on
Fri_Feb__8_19:08:17_PST_2019;Cuda compilation tools, release 10.1,
V10.1.105
CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;-D_FORCE_INLINES;;
;-mavx;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver:        10.10
CUDA runtime:       10.10

####################################################
GROMACS:      mdrun_mpi, version 2019.2
Executable:   /usr/local/gromacs/gromacs-2019.2/bin/mdrun_mpi
Data prefix:  /usr/local/gromacs/gromacs-2019.2
Working dir:  /home/dallas/experiments/current/19-064/P6DLO
Command line:
  mdrun_mpi -version

GROMACS version:    2019.2
Precision:          single
Memory model:       64 bit
MPI library:        MPI
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        CUDA
SIMD instructions:  AVX_256
FFT library:        fftw-3.3.8-sse2
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      disabled
Tracing support:    disabled
C compiler:         /usr/lib64/mpi/gcc/openmpi/bin/mpicc GNU 7.4.0
C compiler flags:    -mavx     -O3 -DNDEBUG -funroll-all-loops
-fexcess-precision=fast
C++ compiler:       /usr/lib64/mpi/gcc/openmpi/bin/mpiCC GNU 7.4.0
C++ compiler flags:  -mavx    -std=c++11   -O3 -DNDEBUG
-funroll-all-loops -fexcess-precision=fast
CUDA compiler:      /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda
compiler driver;Copyright (c) 2005-2019 NVIDIA Corporation;Built on
Fri_Feb__8_19:08:17_PST_2019;Cuda compilation tools, release 10.1,
V10.1.105
CUDA compiler flags:-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;-D_FORCE_INLINES;;
;-mavx;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver:        10.10
CUDA runtime:       10.10

####################################################
/usr/local/gromacs/gromacs-2016.3/bin/gmx -version

GROMACS:      gmx, version 2016.3
Executable:   /usr/local/gromacs/gromacs-2016.3/bin/gmx
Data prefix:  /usr/local/gromacs/gromacs-2016.3
Working dir:  /home/dallas/experiments/current/19-064/P6DLO
Command line:
  gmx -version

GROMACS version:    2016.3
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support:        CUDA
SIMD instructions:  AVX_256
FFT library:        fftw-3.3.8-sse2
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      disabled
Tracing support:    disabled
Built on:           Tue Mar 21 13:21:15 AEDT 2017
Built by:           dallas at morph [CMAKE]
Build OS/arch:      Linux 4.4.49-16-default x86_64
Build CPU vendor:   Intel
Build CPU brand:    Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
Build CPU family:   6   Model: 45   Stepping: 7
Build CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf mmx msr
nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3
sse4.1 sse4.2 ssse3 tdt x2apic
C compiler:         /usr/bin/cc GNU 4.8.5
C compiler flags:    -mavx     -O3 -DNDEBUG -funroll-all-loops
-fexcess-precision=fast
C++ compiler:       /usr/bin/c++ GNU 4.8.5
C++ compiler flags:  -mavx    -std=c++0x   -O3 -DNDEBUG
-funroll-all-loops -fexcess-precision=fast
CUDA compiler:      /usr/local/cuda-8.0/bin/nvcc nvcc: NVIDIA (R) Cuda
compiler driver;Copyright (c) 2005-2016 NVIDIA Corporation;Built on
Tue_Jan_10_13:22:03_CST_2017;Cuda compilation tools, release 8.0,
V8.0.61
CUDA compiler flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_60,code=compute_60;-gencode;arch=compute_61,code=compute_61;-use_fast_math;;;-Xcompiler;,-mavx,,,,,,;-Xcompiler;-O3,-DNDEBUG,-funroll-all-loops,-fexcess-precision=fast,,;
CUDA driver:        10.10
CUDA runtime:       8.0
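
To make the version dumps above easier to compare, a small script along
these lines could parse the "Key: value" lines from -version output and
list only the fields that differ. This is a minimal sketch, not part of
GROMACS: the names parse_version and diff_versions are made up for
illustration, wrapped continuation lines and fields with no space after
the colon (such as the CUDA compiler flags) are skipped, and the sample
strings are excerpts copied from the output above rather than live calls
to gmx.

```python
import re

# Matches "Key:   value" lines as printed by `gmx -version`.
# Requires whitespace after the colon, so wrapped continuation lines
# and timestamps such as 19:08:17 are not mistaken for fields.
FIELD_RE = re.compile(r"^([A-Za-z][^:]*):\s+(\S.*)$")


def parse_version(text):
    """Return a dict of the "Key: value" fields found in -version output.

    Lines that do not match the pattern are ignored (simplifying assumption).
    """
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields


def diff_versions(a, b):
    """Return {key: (value_in_a, value_in_b)} for every field that differs."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Excerpts copied from the output above.
GMX_2019 = """\
GROMACS version:    2019.2
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
SIMD instructions:  AVX_256
"""

GMX_2016 = """\
GROMACS version:    2016.3
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)
SIMD instructions:  AVX_256
"""

if __name__ == "__main__":
    changes = diff_versions(parse_version(GMX_2016), parse_version(GMX_2019))
    for key, (old, new) in changes.items():
        print(f"{key}: {old} -> {new}")
```

Fed the complete 2016.3 and 2019.2 dumps instead of these excerpts, the
same comparison would also pick up the compiler and CUDA runtime
differences.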

Catch ya,

Dr. Dallas Warren
Drug Delivery, Disposition and Dynamics
Monash Institute of Pharmaceutical Sciences, Monash University
381 Royal Parade, Parkville VIC 3052
dallas.warren at monash.edu
---------------------------------
When the only tool you own is a hammer, every problem begins to resemble a nail.