[gmx-users] Performance gains with AVX_512 ?
Kutzner, Carsten
ckutzne at gwdg.de
Tue Dec 12 15:07:43 CET 2017
Hi,
What are the expected performance benefits of AVX_512 SIMD instructions
on Intel Skylake processors compared to AVX2_256? In many cases, I see
significantly (about 15 %) higher GROMACS 2016 / 2018b2 performance with
AVX2_256 than with AVX_512. I would have expected AVX_512 to be at least
no slower than the narrower instruction set.
Some quick benchmark results:
Node with 2x12 cores (48 threads) Xeon Gold 6146 plus 2x GTX 1080Ti
80k atoms membrane benchmark system, 2 fs time step, PME on CPU
GROMACS v.   SIMD       ns/d
2016         AVX_512    102.3
2016         AVX2_256   119.3
2018b2       AVX_512    107.9
2018b2       AVX2_256   123.2
I realize that AVX_512 turbo frequencies are significantly lower than
AVX2_256 turbo frequencies when all cores are in use, and for a serial
run AVX_512 is indeed about 6% faster than AVX2_256.
GROMACS 2018b2, -nb cpu

thread-MPI   ns/day    ns/day     improvement
threads      AVX_512   AVX2_256   over AVX2
 1            2.880     2.702     1.065
 2            5.451     5.209     1.046
 4            9.617     9.332     1.031
 8           17.469    17.276     1.011
12           21.852    24.245     0.901
16           28.579    31.691     0.902
24           39.731    41.576     0.956
48           41.831    39.336     1.063
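
For anyone who wants to recheck the ratios, here is a minimal Python sketch that recomputes AVX_512 / AVX2_256 from the ns/day numbers in the two tables above. The variable and function names are only illustrative, and the values are simply copied from this message. Recomputed ratios may differ from the table in the last digit because of rounding.

```python
# ns/day values copied from the tables in this message; names are illustrative.

# GPU runs (2x GTX 1080Ti, PME on CPU): GROMACS version -> ns/day per SIMD setting
gpu_runs = {
    "2016": {"AVX_512": 102.3, "AVX2_256": 119.3},
    "2018b2": {"AVX_512": 107.9, "AVX2_256": 123.2},
}

# CPU-only runs (GROMACS 2018b2, -nb cpu): thread-MPI threads -> (AVX_512, AVX2_256)
cpu_runs = {
    1: (2.880, 2.702),
    2: (5.451, 5.209),
    4: (9.617, 9.332),
    8: (17.469, 17.276),
    12: (21.852, 24.245),
    16: (28.579, 31.691),
    24: (39.731, 41.576),
    48: (41.831, 39.336),
}


def ratio(avx512_ns_day, avx2_ns_day):
    """Return AVX_512 performance relative to AVX2_256 (>1 means AVX_512 is faster)."""
    return avx512_ns_day / avx2_ns_day


gpu_ratios = {v: ratio(r["AVX_512"], r["AVX2_256"]) for v, r in gpu_runs.items()}
cpu_ratios = {n: ratio(a, b) for n, (a, b) in cpu_runs.items()}

# Thread counts at which AVX_512 is slower than AVX2_256
slower_with_avx512 = [n for n, r in cpu_ratios.items() if r < 1.0]

for version, r in gpu_ratios.items():
    print(f"GPU run, GROMACS {version}: AVX_512 / AVX2_256 = {r:.3f}")
for n, r in cpu_ratios.items():
    print(f"{n:2d} threads: AVX_512 / AVX2_256 = {r:.3f}")
print("AVX_512 slower at thread counts:", slower_with_avx512)
```

With these inputs, the ratio drops below 1 for the 12, 16 and 24 thread runs only.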
Can anyone comment on whether that is the expected behavior and why?
Thanks!
Carsten