[gmx-users] No performance increase with single vs multiple nodes
Matthew W Hanley
mwhanley at syr.edu
Sun Oct 8 02:39:41 CEST 2017
I am running GROMACS 2016.3 on CentOS 7.3, submitted through a PBS scheduler with the following job script:
#PBS -N TEST
#PBS -l nodes=1:ppn=32
export OMP_NUM_THREADS=1
mpirun -N 32 mdrun_mpi -deffnm TEST -dlb yes -pin on -nsteps 50000 -cpi TEST
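For completeness, the larger jobs use the same script with only the node request scaled up (if this mpirun is Open MPI's, -N 32 means 32 ranks per node, so the total rank count follows the node count). E.g. the 128-rank run:

#PBS -N TEST
#PBS -l nodes=4:ppn=32
# assumed layout: 4 nodes x 32 ranks/node = 128 MPI ranks
export OMP_NUM_THREADS=1
mpirun -N 32 mdrun_mpi -deffnm TEST -dlb yes -pin on -nsteps 50000 -cpi TEST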
However, I am seeing essentially no performance increase when scaling to more nodes:
On 32 MPI ranks

               Core t (s)   Wall t (s)        (%)
       Time:    28307.873      884.621     3200.0
                 (ns/day)    (hour/ns)
Performance:      195.340        0.123

On 64 MPI ranks

               Core t (s)   Wall t (s)        (%)
       Time:    25502.709      398.480     6400.0
                 (ns/day)    (hour/ns)
Performance:      216.828        0.111

On 96 MPI ranks

               Core t (s)   Wall t (s)        (%)
       Time:    51977.705      541.434     9600.0
                 (ns/day)    (hour/ns)
Performance:      159.579        0.150

On 128 MPI ranks

               Core t (s)   Wall t (s)        (%)
       Time:   111576.333      871.690    12800.0
                 (ns/day)    (hour/ns)
Performance:      198.238        0.121
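In other words, relative to the 32-rank baseline the scaling efficiency works out to roughly (216.828 / 195.340) / 2 ≈ 55% at 64 ranks, (159.579 / 195.340) / 3 ≈ 27% at 96, and (198.238 / 195.340) / 4 ≈ 25% at 128; per-core throughput drops off sharply past a single node.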
Running strace on the mdrun process shows mostly this:
gettimeofday({1502811207, 567216}, NULL) = 0
poll([{fd=5, events=POLLIN}, {fd=17, events=POLLIN}, {fd=19, events=POLLIN}, {fd=20, events=POLLIN}, {fd=21, events=POLLIN}], 5, 0) = 0 (Timeout)
clock_gettime(CLOCK_MONOTONIC, {988952, 423108300}) = 0
Process 4818 attached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.62    0.032132           0   1243696           clock_gettime
 20.41    0.012954           0    399833           futex
 12.59    0.007991         363        22           close
  4.71    0.002989         498         6         2 fsync
  4.06    0.002579          18       141           write
  2.39    0.001515           0     46632           gettimeofday
  2.35    0.001494           0     23316           poll
  1.55    0.000981         981         1           rename
  1.16    0.000734         147         5         3 open
  0.15    0.000093           1        70           munmap
  0.03    0.000018           9         2         1 epoll_ctl
  0.00    0.000002           1         4           nanosleep
  0.00    0.000001           0         9           lseek
  0.00    0.000000           0         7           read
  0.00    0.000000           0         4         1 stat
  0.00    0.000000           0         5           fstat
  0.00    0.000000           0         2           mmap
  0.00    0.000000           0        10           mprotect
  0.00    0.000000           0         8           brk
  0.00    0.000000           0         2           shutdown
  0.00    0.000000           0         1           uname
  0.00    0.000000           0        10           getdents
  0.00    0.000000           0         1           rmdir
  0.00    0.000000           0        17        16 unlink
  0.00    0.000000           0         7         1 openat
------ ----------- ----------- --------- --------- ----------------
100.00    0.063483               1713811        24 total
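For reference, both traces were taken by attaching to one of the mdrun_mpi ranks for a few seconds, along the lines of:

strace -p 4818       # live trace; mostly the gettimeofday/poll loop shown above
strace -c -p 4818    # Ctrl-C after a few seconds prints the syscall summary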
And here is the compilation information:
GROMACS version: 2016.3
Precision: single
Memory model: 64 bit
MPI library: MPI
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support: disabled
SIMD instructions: AVX2_256
FFT library: fftw-3.3.5
RDTSCP usage: disabled
TNG support: enabled
Hwloc support: hwloc-1.11.0
Tracing support: disabled
Built on: Fri Aug 11 16:23:00 EDT 2017
Built by: citadmin at CRUSH-LCS-10-51-51-163 [CMAKE]
Build OS/arch: Linux 3.10.0-514.21.2.el7.x86_64 x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Xeon(R) CPU E7-8867 v3 @ 2.50GHz
Build CPU family: 6 Model: 63 Stepping: 4
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle lahf mmx msr pclmuldq popcnt pse rdrnd rtm sse2 sse3 sse4.1 sse4.2 ssse3
C compiler: /usr/local/bin/mpicc GNU 6.2.1
C compiler flags: -march=core-avx2 -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler: /usr/local/bin/mpicxx GNU 6.2.1
C++ compiler flags: -march=core-avx2 -std=c++0x -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
Any help would be appreciated, thank you!
-Matt
Matthew Hanley
IT Analyst
College of Engineering and Computer Science
Syracuse University
mwhanley at syr.edu