[gmx-users] Performance gains with AVX_512 ?

Szilárd Páll pall.szilard at gmail.com
Tue Dec 12 17:58:59 CET 2017


Hi Carsten,

The performance behavior you observe is expected; I have seen it myself.
Nothing in the performance numbers you report looks unusual.

The AVX512 clock throttle comes on top of the AVX2 throttle (an additional
10-20% IIRC), and the only code that really gains significantly from AVX512
is the nonbonded kernels. When those are offloaded to the GPU, the higher
clocks with AVX2 translate into better CPU performance (and if the run is
CPU-bound, that makes a significant difference).
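
To make that trade-off concrete, here is a rough back-of-the-envelope
sketch. It is an Amdahl-style estimate, not how GROMACS models anything,
and every input value is a hypothetical placeholder: kernel_fraction is the
share of CPU time spent in code that benefits from wider SIMD,
kernel_speedup is the per-clock gain of that code with AVX512, and
clock_ratio is the AVX512 clock divided by the AVX2 clock.

```python
def relative_cpu_speed(kernel_fraction, kernel_speedup, clock_ratio):
    """Estimate AVX_512 CPU throughput relative to AVX2_256 (1.0 = equal).

    kernel_fraction: share of AVX2_256 CPU time in code that gains from AVX_512
    kernel_speedup:  per-clock speedup of that code with AVX_512
    clock_ratio:     AVX_512 clock / AVX2_256 clock (< 1 when throttled)
    """
    # Run time at equal clocks, normalized so the AVX2_256 run takes 1.0.
    time_at_equal_clock = (1.0 - kernel_fraction) + kernel_fraction / kernel_speedup
    # A lower clock scales throughput down proportionally.
    return clock_ratio / time_at_equal_clock


# Hypothetical inputs, chosen only to illustrate the two regimes.
nonbonded_on_cpu = relative_cpu_speed(kernel_fraction=0.6, kernel_speedup=1.5,
                                      clock_ratio=0.85)
nonbonded_offloaded = relative_cpu_speed(kernel_fraction=0.0, kernel_speedup=1.5,
                                         clock_ratio=0.85)

print(f"nonbondeds on CPU:    {nonbonded_on_cpu:.2f}x")
print(f"nonbondeds offloaded: {nonbonded_offloaded:.2f}x")
```

With the nonbondeds offloaded, kernel_fraction drops towards zero and the
estimate collapses to the clock ratio alone, i.e. a net loss for AVX512.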

BTW, on the low- and mid-range CPUs ("Bronze"/"Silver" and "cut-down" i9s)
AVX512 is even less likely to ever be worth it.

Cheers,

--
Szilárd

On Tue, Dec 12, 2017 at 3:07 PM, Kutzner, Carsten <ckutzne at gwdg.de> wrote:

> Hi,
>
> what are the expected performance benefits of AVX_512 SIMD instructions
> on Intel Skylake processors, compared to AVX2_256? In many cases, I see
> a significantly (15 %) higher GROMACS 2016 / 2018b2 performance when using
> AVX2_256 instead of AVX_512. I would have guessed that AVX_512 is at least
> not slower than inferior instruction sets.
>
> Some quick benchmarks results:
> Node with 2x12 core (48 threads) Xeon Gold 6146 plus 2x GTX 1080Ti
> 80k atoms membrane benchmark system, 2 fs time step, pme on cpu
>
> GROMACS v.    SIMD        ns/d
> 2016          AVX_512     102.3
> 2016          AVX2_256    119.3
> 2018b2        AVX_512     107.9
> 2018b2        AVX2_256    123.2
>
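
For reference, the relative AVX2_256 advantage in the table above can be
recomputed directly from the quoted ns/day values. This is plain arithmetic
on the reported numbers; only the variable names are mine.

```python
# ns/day values copied from the quoted benchmark table.
ns_per_day = {
    "2016":   {"AVX_512": 102.3, "AVX2_256": 119.3},
    "2018b2": {"AVX_512": 107.9, "AVX2_256": 123.2},
}

# Fractional gain of AVX2_256 over AVX_512 for each GROMACS version.
gain = {
    version: result["AVX2_256"] / result["AVX_512"] - 1.0
    for version, result in ns_per_day.items()
}

for version, g in gain.items():
    print(f"GROMACS {version}: AVX2_256 is {100 * g:.1f}% faster than AVX_512")
```

Both versions land between 14% and 17%, consistent with the "15 %" quoted
above.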
> I realize that AVX_512 turbo frequencies are significantly lower
> compared to AVX2_256 if all cores are in use, and for a serial run,
> AVX_512 is indeed by about 6% faster than AVX2_256.
>

By "serial" do you mean single-threaded runs? Single-core turbo on this 165W
CPU will be pretty high (>=4.2 GHz), and it is not likely to reflect the
relative difference at the base clock.

> Gromacs 2018b2, -nb cpu
> thread-MPI  ns/day   ns/day     improvement
> threads     AVX_512  AVX2_256   over AVX2
>  1           2.880    2.702     1.065
>  2           5.451    5.209     1.046
>  4           9.617    9.332     1.031
>  8          17.469   17.276     1.011
> 12          21.852   24.245      .901
> 16          28.579   31.691      .902
> 24          39.731   41.576      .956
> 48          41.831   39.336     1.063
>
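
A quick way to see where the crossover happens is to recompute the
"improvement" column and the scaling relative to the 1-thread run from the
quoted numbers. This is just a sketch over the reported ns/day values. The
efficiency figure is a naive one (ns/day divided by threads times the
1-thread ns/day), so the 48-thread row, which uses hyperthreads, is not
directly comparable to the others.

```python
# (threads, ns/day AVX_512, ns/day AVX2_256) copied from the quoted table.
rows = [
    (1, 2.880, 2.702),
    (2, 5.451, 5.209),
    (4, 9.617, 9.332),
    (8, 17.469, 17.276),
    (12, 21.852, 24.245),
    (16, 28.579, 31.691),
    (24, 39.731, 41.576),
    (48, 41.831, 39.336),
]

_, base_512, base_256 = rows[0]

# AVX_512 performance relative to AVX2_256 for each thread count.
ratios = [avx512 / avx2 for _, avx512, avx2 in rows]

# Thread counts at which AVX_512 is slower than AVX2_256.
slower_with_avx512 = [threads for (threads, _, _), r in zip(rows, ratios) if r < 1.0]

print("threads  512/256  eff(512)  eff(256)")
for (threads, avx512, avx2), ratio in zip(rows, ratios):
    eff_512 = avx512 / (threads * base_512)  # naive parallel efficiency
    eff_256 = avx2 / (threads * base_256)
    print(f"{threads:7d}  {ratio:7.3f}  {eff_512:8.2f}  {eff_256:8.2f}")

print("AVX_512 slower at thread counts:", slower_with_avx512)
```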

Does this mean that for all but rows 5, 7, and 8 you left the socket(s)
partially empty?


Cheers,
--
Szilárd


> Can anyone comment on whether that is the expected behavior and why?
>
> Thanks!
>   Carsten

