[gmx-users] Performance gains with AVX_512 ?
Kutzner, Carsten
ckutzne at gwdg.de
Tue Dec 12 23:11:18 CET 2017
Hi Szilárd,
> On 12. Dec 2017, at 17:58, Szilárd Páll <pall.szilard at gmail.com> wrote:
>
> Hi Carsten,
>
> The performance behavior you observe is expected; I have seen it
> myself. Nothing seems unusual in the performance numbers you report.
>
> The AVX512 clock throttle comes on top of the AVX2 throttle (10-20% IIRC),
> and the only code that really gains significantly from AVX512 is the
> nonbonded kernels. When those are offloaded, the gain from higher clocks
> with AVX2 will translate to better CPU performance (and especially if the
> run is CPU-bound, that will make a significant difference).
>
> BTW, on the low- and mid-range CPUs ("Bronze"/"Silver" and "cut-down" i9s)
> AVX512 is even less likely to ever be worth it.
So using AVX2 on GPU nodes seems generally to be the fastest option.
Thanks a lot for the info!
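
For reference, the relative difference on the GPU node can be recomputed
from the ns/day values in the benchmark table quoted further down. This is
only a rough sketch: the dictionary layout and the helper name
avx2_gain_percent are made up for illustration, and the numbers are copied
from that table, not re-measured.

```python
# ns/day values copied from the GPU-node benchmark table quoted in this thread
# (2x Xeon Gold 6146 plus 2x GTX 1080Ti, PME on CPU).
results = {
    "2016": {"AVX_512": 102.3, "AVX2_256": 119.3},
    "2018b2": {"AVX_512": 107.9, "AVX2_256": 123.2},
}


def avx2_gain_percent(ns_per_day):
    """Percent by which AVX2_256 outperforms AVX_512 (hypothetical helper)."""
    return 100.0 * (ns_per_day["AVX2_256"] / ns_per_day["AVX_512"] - 1.0)


gains = {version: avx2_gain_percent(values) for version, values in results.items()}

for version, gain in gains.items():
    print(f"GROMACS {version}: AVX2_256 is {gain:.1f}% faster than AVX_512")
```

Both versions come out in the 14-17% range, consistent with the roughly
15 % quoted in the original question.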
Best,
Carsten
>
> Cheers,
>
> --
> Szilárd
>
> On Tue, Dec 12, 2017 at 3:07 PM, Kutzner, Carsten <ckutzne at gwdg.de> wrote:
>
>> Hi,
>>
>> what are the expected performance benefits of AVX_512 SIMD instructions
>> on Intel Skylake processors, compared to AVX2_256? In many cases, I see
>> significantly (about 15 %) higher GROMACS 2016 / 2018b2 performance when
>> using AVX2_256 instead of AVX_512. I would have expected AVX_512 to be at
>> least as fast as the narrower instruction sets.
>>
>> Some quick benchmark results:
>> Node with 2x 12-core (48 threads) Xeon Gold 6146 plus 2x GTX 1080Ti
>> 80k-atom membrane benchmark system, 2 fs time step, PME on CPU
>>
>> GROMACS v.   SIMD       ns/day
>> 2016         AVX_512    102.3
>> 2016         AVX2_256   119.3
>> 2018b2       AVX_512    107.9
>> 2018b2       AVX2_256   123.2
>>
>> I realize that AVX_512 turbo frequencies are significantly lower
>> than those of AVX2_256 if all cores are in use, and for a serial run,
>> AVX_512 is indeed about 6% faster than AVX2_256.
>>
>
> By "serial" you mean single-threaded runs? Single-core turbo on this 165W
> CPU will be pretty high (>=4.2 GHz), so it is unlikely to reflect the
> relative difference at the base clock.
>
>> GROMACS 2018b2, -nb cpu
>> thread-MPI   ns/day    ns/day     improvement
>> threads      AVX_512   AVX2_256   over AVX2
>>  1            2.880     2.702     1.065
>>  2            5.451     5.209     1.046
>>  4            9.617     9.332     1.031
>>  8           17.469    17.276     1.011
>> 12           21.852    24.245     0.901
>> 16           28.579    31.691     0.902
>> 24           39.731    41.576     0.956
>> 48           41.831    39.336     1.063
>>
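
The "improvement over AVX2" column in the table above is the AVX_512 ns/day
divided by the AVX2_256 ns/day. A minimal sketch that recomputes it and lists
the thread counts where AVX2_256 comes out ahead; the variable names are
illustrative only, and the values are copied from the table, not re-measured.

```python
# (threads, AVX_512 ns/day, AVX2_256 ns/day), copied from the table above
# (GROMACS 2018b2, -nb cpu).
rows = [
    (1, 2.880, 2.702),
    (2, 5.451, 5.209),
    (4, 9.617, 9.332),
    (8, 17.469, 17.276),
    (12, 21.852, 24.245),
    (16, 28.579, 31.691),
    (24, 39.731, 41.576),
    (48, 41.831, 39.336),
]

# Ratio > 1 means AVX_512 is faster, ratio < 1 means AVX2_256 is faster.
ratios = {threads: avx512 / avx2 for threads, avx512, avx2 in rows}

# Thread counts at which the AVX2_256 build wins.
avx2_wins = [threads for threads, ratio in ratios.items() if ratio < 1.0]

for threads, ratio in ratios.items():
    winner = "AVX_512" if ratio > 1.0 else "AVX2_256"
    print(f"{threads:2d} threads: AVX_512/AVX2_256 = {ratio:.3f} ({winner} faster)")
```

With these numbers, AVX2_256 is ahead at 12, 16 and 24 threads, while AVX_512
is ahead at low thread counts and at 48 threads.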
>
> Does this mean that for all but rows 5, 7, and 8 you left the
> socket(s) partially empty?
>
>
> Cheers,
> --
> Szilárd
>
>
>> Can anyone comment on whether that is the expected behavior and why?
>>
>> Thanks!
>> Carsten
>>
>>
>>
>> --
>> Gromacs Users mailing list
>>
>> * Please search the archive at
>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>>
>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>
>> * For (un)subscribe requests visit
>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>> send a mail to gmx-users-request at gromacs.org.
>>