[gmx-users] performance issue with many short MD runs
Michael Brunsteiner
mbx0009 at yahoo.com
Mon Mar 27 15:52:01 CEST 2017
Hi,

I have to run a large number (many thousands) of very short MD reruns with gmx. Using gmx-2016.3 this works without problems; however, the overall performance (in terms of real execution time, as measured with the unix time command) that I get on a relatively new computer is worse than what I get on a much older machine (by a factor of about 2, even though gmx reports better performance for the new machine in the log file).

Both machines run Linux (Debian); the old one has eight Intel cores, the newer one twelve. On the newer machine gmx uses a supposedly faster SIMD instruction set; otherwise the hardware (including hard drives) is comparable.

Below is the output of a typical job (gmx mdrun -rerun with a trajectory containing not more than a couple of thousand conformations of a single small molecule) on both machines (mdp file content below).
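For reference, each job has roughly the following shape (the exact flags are elided above; topol.tpr and traj.xtc are placeholder names, not the real files):

```shell
# Illustrative only: a single rerun timed with the shell's time builtin.
# topol.tpr / traj.xtc are placeholder filenames; only -s and -rerun
# are implied by the description above.
time gmx mdrun -s topol.tpr -rerun traj.xtc
```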
old machine:
prompt> time gmx mdrun ...
in the log file:
Core t (s) Wall t (s) (%)
Time: 4.527 0.566 800.0
(ns/day) (hour/ns)
Performance: 1.527 15.719
on the command line:
real 2m45.562s <====================================
user 15m40.901s
sys 0m33.319s
new machine:
prompt> time gmx mdrun ...
in the log file:
Core t (s) Wall t (s) (%)
Time: 6.030 0.502 1200.0
(ns/day) (hour/ns)
Performance: 1.719 13.958
on the command line:
real 5m30.962s <====================================
user 20m2.208s
sys 3m28.676s
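A quick sanity check on these numbers (plain arithmetic on the figures quoted above, nothing gmx-specific): the "Wall t" that gmx reports covers only the MD loop, so on both machines nearly all of the real time measured by the shell is spent outside it:

```python
# Gap between gmx's reported wall time (MD loop only) and the shell's
# real time (whole process, incl. startup and I/O), per the output above.
old_wall, old_real = 0.566, 2 * 60 + 45.562   # seconds, old machine
new_wall, new_real = 0.502, 5 * 60 + 30.962   # seconds, new machine

old_overhead = old_real - old_wall
new_overhead = new_real - new_wall

print(f"old machine: {old_overhead:.1f} s outside the MD loop "
      f"({100 * old_overhead / old_real:.1f}% of real time)")
print(f"new machine: {new_overhead:.1f} s outside the MD loop "
      f"({100 * new_overhead / new_real:.1f}% of real time)")
```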
The specs of the two gmx installations are given below. I'd be grateful if anyone could suggest ways to improve performance on the newer machine!

cheers,
Michael
the older machine (here the jobs run faster): gmx --version
GROMACS version: 2016.3
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support: CUDA
SIMD instructions: SSE4.1
FFT library: fftw-3.3.5-sse2
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: hwloc-1.8.0
Tracing support: disabled
Built on: Tue Mar 21 11:24:42 CET 2017
Built by: root at rcpetemp1 [CMAKE]
Build OS/arch: Linux 3.13.0-79-generic x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz
Build CPU family: 6 Model: 26 Stepping: 5
Build CPU features: apic clfsh cmov cx8 cx16 htt lahf mmx msr nonstop_tsc pdcm popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
C compiler: /usr/bin/cc GNU 4.8.4
C compiler flags: -msse4.1 -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler: /usr/bin/c++ GNU 4.8.4
C++ compiler flags: -msse4.1 -std=c++0x -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2015 NVIDIA Corporation;Built on Tue_Aug_11_14:27:32_CDT_2015;Cuda compilation tools, release 7.5, V7.5.17
CUDA compiler flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_52,code=compute_52;-use_fast_math;;;-Xcompiler;,-msse4.1,,,,,,;-Xcompiler;-O3,-DNDEBUG,-funroll-all-loops,-fexcess-precision=fast,,;
CUDA driver: 7.50
CUDA runtime: 7.50
the newer machine (here execution is slower by a factor of 2): gmx --version
GROMACS version: 2016.3
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support: CUDA
SIMD instructions: AVX_256
FFT library: fftw-3.3.5
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: hwloc-1.10.0
Tracing support: disabled
Built on: Fri Mar 24 11:18:29 CET 2017
Built by: root at rcpe-sbd-node01 [CMAKE]
Build OS/arch: Linux 3.14-2-amd64 x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
Build CPU family: 6 Model: 62 Stepping: 4
Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /usr/bin/cc GNU 4.9.2
C compiler flags: -mavx -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler: /usr/bin/c++ GNU 4.9.2
C++ compiler flags: -mavx -std=c++0x -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
CUDA compiler: /usr/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on Wed_Jul_17_18:36:13_PDT_2013;Cuda compilation tools, release 5.5, V5.5.0
CUDA compiler flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;;;-Xcompiler;,-mavx,,,,,,;-Xcompiler;-O3,-DNDEBUG,-funroll-all-loops,-fexcess-precision=fast,,;
CUDA driver: 6.50
CUDA runtime: 5.50
mdp-file:
integrator = md
dt = 0.001
nsteps = 0
comm-grps = System
cutoff-scheme = verlet
;
nstxout = 0
nstvout = 0
nstfout = 0
nstlog = 0
nstenergy = 1
;
nstlist = 10000
ns_type = grid
pbc = xyz
rlist = 3.9
;
coulombtype = cut-off
rcoulomb = 3.9
vdw_type = cut-off
rvdw = 3.9
DispCorr = no
;
constraints = none
;
continuation = yes