[gmx-users] Performance gains with AVX_512 ?
Szilárd Páll
pall.szilard at gmail.com
Tue Dec 12 17:58:59 CET 2017
Hi Carsten,
The performance behavior you observe is expected; I have observed it
myself. Nothing in the performance numbers you report seems unusual.
The AVX512 clock throttle comes on top of the AVX2 throttle (an additional
10-20%, IIRC), and the only code that gains significantly from AVX512 is
the nonbonded kernels. When those are offloaded, the higher clocks with
AVX2 translate to better CPU performance (and if the run is CPU-bound,
that makes a significant difference).
BTW, on the low- and mid-range CPUs ("Bronze"/"Silver" and "cut-down" i9s)
AVX512 is even less likely to ever be worth it.
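As a rough illustration of the trade-off above, the sketch below treats the throughput of an AVX512 build relative to an AVX2 build as the relative clock speed divided by the relative run time at equal clocks, with a SIMD speedup applied only to the fraction of work that benefits. The function name, the 60% kernel fraction, and the 1.5x kernel speedup are hypothetical placeholders, not measured GROMACS numbers; only the 10-20% throttle range comes from the text above.

```python
def relative_throughput(clock_ratio, simd_fraction, simd_speedup):
    """Toy Amdahl-style model: AVX512 throughput relative to AVX2.

    clock_ratio:   AVX512 clock / AVX2 clock (0.85 for a 15% throttle)
    simd_fraction: fraction of AVX2 run time in code that gains from AVX512
    simd_speedup:  per-clock speedup of that code with AVX512 (hypothetical)
    """
    # Run time at equal clocks: the unaffected part plus the accelerated part
    time_at_equal_clock = (1.0 - simd_fraction) + simd_fraction / simd_speedup
    # A lower clock slows all of the code uniformly
    return clock_ratio / time_at_equal_clock


# Nonbonded kernels offloaded to the GPU: assume nothing left on the CPU gains
offloaded = relative_throughput(clock_ratio=0.85, simd_fraction=0.0, simd_speedup=1.5)

# Nonbonded kernels on the CPU: assume (hypothetically) 60% of the time is in them
cpu_only = relative_throughput(clock_ratio=0.85, simd_fraction=0.6, simd_speedup=1.5)

print(f"offloaded: {offloaded:.3f}, cpu-only: {cpu_only:.3f}")
```

Under these made-up inputs the offloaded case comes out below 1 (AVX2 wins) and the CPU-only case slightly above 1, which is the qualitative pattern described above; the actual break-even point depends on the real kernel fraction and clocks.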
Cheers,
--
Szilárd
On Tue, Dec 12, 2017 at 3:07 PM, Kutzner, Carsten <ckutzne at gwdg.de> wrote:
> Hi,
>
> what are the expected performance benefits of AVX_512 SIMD instructions
> on Intel Skylake processors, compared to AVX2_256? In many cases, I see
> a significantly (15 %) higher GROMACS 2016 / 2018b2 performance when using
> AVX2_256 instead of AVX_512. I would have guessed that AVX_512 is at least
> not slower than inferior instruction sets.
>
> Some quick benchmarks results:
> Node with 2x12 core (48 threads) Xeon Gold 6146 plus 2x GTX 1080Ti
> 80k atoms membrane benchmark system, 2 fs time step, pme on cpu
>
> GROMACS v.   SIMD       ns/d
> 2016         AVX_512    102.3
> 2016         AVX2_256   119.3
> 2018b2       AVX_512    107.9
> 2018b2       AVX2_256   123.2
>
> I realize that AVX_512 turbo frequencies are significantly lower
> than AVX2_256 frequencies if all cores are in use, and for a serial
> run, AVX_512 is indeed about 6% faster than AVX2_256.
>
By "serial" do you mean single-threaded runs? Single-core turbo on this 165W
CPU will be pretty high (>=4.2 GHz), so it is unlikely to reflect the
relative difference at the base clock.
> Gromacs 2018b2, -nb cpu
> thread-MPI   ns/day    ns/day     improvement
> threads      AVX_512   AVX2_256   over AVX2
>  1            2.880     2.702     1.065
>  2            5.451     5.209     1.046
>  4            9.617     9.332     1.031
>  8           17.469    17.276     1.011
> 12           21.852    24.245     0.901
> 16           28.579    31.691     0.902
> 24           39.731    41.576     0.956
> 48           41.831    39.336     1.063
>
Does this mean that for all rows except 5, 7, and 8 you left the
socket(s) partially empty?
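To make the question concrete, the sketch below recomputes the AVX_512/AVX2_256 ratio from the quoted table and marks which thread counts are whole multiples of the 12 cores per socket. It assumes one thread per core, with one socket filled before the next is used; the thread does not confirm this placement, and the names used are illustrative only.

```python
# ns/day from the quoted table: threads -> (AVX_512, AVX2_256)
results = {
    1: (2.880, 2.702),
    2: (5.451, 5.209),
    4: (9.617, 9.332),
    8: (17.469, 17.276),
    12: (21.852, 24.245),
    16: (28.579, 31.691),
    24: (39.731, 41.576),
    48: (41.831, 39.336),
}

CORES_PER_SOCKET = 12  # 2x12-core Xeon Gold 6146 node from the post


def summarize(results, cores_per_socket=CORES_PER_SOCKET):
    """Return (threads, AVX_512/AVX2_256 ratio, fills_whole_sockets) per row."""
    rows = []
    for threads, (avx512, avx2) in sorted(results.items()):
        ratio = avx512 / avx2
        # Assumption: sockets are only "full" at multiples of the core count
        fills_whole_sockets = threads % cores_per_socket == 0
        rows.append((threads, round(ratio, 3), fills_whole_sockets))
    return rows


for threads, ratio, full in summarize(results):
    label = "full socket(s)" if full else "partially filled"
    print(f"{threads:>3} threads  ratio {ratio:.3f}  {label}")
```

Under that placement assumption, only the 12-, 24-, and 48-thread rows fill whole sockets, and the ratio falls below 1 for the 12-, 16-, and 24-thread rows.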
Cheers,
--
Szilárd
> Can anyone comment on whether that is the expected behavior and why?
>
> Thanks!
> Carsten