[gmx-users] Performance gains with AVX_512 ?

Kutzner, Carsten ckutzne at gwdg.de
Tue Dec 12 15:07:43 CET 2017


Hi,

what are the expected performance benefits of AVX_512 SIMD instructions
on Intel Skylake processors, compared to AVX2_256? In many cases, I see
significantly (about 15 %) higher GROMACS 2016 / 2018b2 performance when
using AVX2_256 instead of AVX_512. I would have expected AVX_512 to be at
least as fast as the narrower instruction sets.

Some quick benchmark results:
Node with 2x 12-core Xeon Gold 6146 (48 threads) plus 2x GTX 1080Ti
80k atoms membrane benchmark system, 2 fs time step, PME on CPU

GROMACS v.    SIMD        ns/d
2016          AVX_512     102.3
2016          AVX2_256    119.3
2018b2        AVX_512     107.9
2018b2        AVX2_256    123.2
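
For reference, a minimal Python sketch that recomputes the relative
AVX2_256 advantage from the numbers above. The dictionary layout and the
function name are only illustrative; the ns/d values are copied from the
table.

```python
# ns/day values copied from the table above (GPU node, PME on CPU).
results = {
    "2016": {"AVX_512": 102.3, "AVX2_256": 119.3},
    "2018b2": {"AVX_512": 107.9, "AVX2_256": 123.2},
}


def avx2_advantage_percent(perf):
    """Return how much faster AVX2_256 is than AVX_512, in percent."""
    return 100.0 * (perf["AVX2_256"] / perf["AVX_512"] - 1.0)


for version, perf in results.items():
    print(f"{version}: AVX2_256 is {avx2_advantage_percent(perf):.1f} % faster")
```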

I realize that AVX_512 turbo frequencies are significantly lower than
AVX2_256 turbo frequencies when all cores are in use, and for a serial
run, AVX_512 is indeed about 6% faster than AVX2_256.

GROMACS 2018b2, -nb cpu
thread-MPI  ns/day   ns/day     ratio
threads     AVX_512  AVX2_256   AVX_512 / AVX2_256
 1           2.880    2.702     1.065
 2           5.451    5.209     1.046
 4           9.617    9.332     1.031
 8          17.469   17.276     1.011
12          21.852   24.245      .901
16          28.579   31.691      .902
24          39.731   41.576      .956
48          41.831   39.336     1.063
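
The last column can be reproduced with a short Python sketch like the
one below, which also lists the thread counts where AVX_512 falls behind.
The variable and function names are hypothetical; the ns/day values are
copied from the table.

```python
# (threads, ns/day AVX_512, ns/day AVX2_256) copied from the table above.
runs = [
    (1, 2.880, 2.702),
    (2, 5.451, 5.209),
    (4, 9.617, 9.332),
    (8, 17.469, 17.276),
    (12, 21.852, 24.245),
    (16, 28.579, 31.691),
    (24, 39.731, 41.576),
    (48, 41.831, 39.336),
]


def ratios(rows):
    """Map thread count to the AVX_512 / AVX2_256 performance ratio."""
    return {threads: avx512 / avx2 for threads, avx512, avx2 in rows}


# Thread counts at which AVX_512 is slower than AVX2_256 (ratio below 1).
slower = [threads for threads, ratio in ratios(runs).items() if ratio < 1.0]

for threads, ratio in ratios(runs).items():
    print(f"{threads:2d} threads: {ratio:.3f}")
print("AVX_512 slower at thread counts:", slower)
```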

Can anyone comment on whether that is the expected behavior and why?

Thanks!
  Carsten