In this article, they saw the following speeds:

Original: 18 GB/s

AVX2: 20 GB/s

AVX512: 21 GB/s

This is an AMD CPU, but it's clear that the AVX512 benefit over the AVX2 version is marginal: roughly 5% (21 vs. 20 GB/s). Note that Intel's consumer chips do support AVX2, even on the E-cores.
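To put "marginal" in numbers, here is a small Python sketch that computes the ratios between the three figures quoted above. The helper name `speedup` is just for illustration. AVX2 gives about an 11% gain over the original, and AVX512 adds only about 5% on top of that.

```python
# Single-threaded throughput figures quoted in the article (GB/s).
results = {"Original": 18.0, "AVX2": 20.0, "AVX512": 21.0}

def speedup(new: str, old: str) -> float:
    """Return the throughput ratio results[new] / results[old]."""
    return results[new] / results[old]

for new, old in [("AVX2", "Original"), ("AVX512", "Original"), ("AVX512", "AVX2")]:
    print(f"{new} vs {old}: {speedup(new, old):.3f}x")
# AVX2 vs Original: 1.111x
# AVX512 vs Original: 1.167x
# AVX512 vs AVX2: 1.050x
```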

But there's more to the story: this is a single-threaded benchmark. Intel gave up AVX512 to free up die space for more cores. As a result, Intel's top-of-the-line consumer part has 24 cores, whereas AMD's top consumer part has 16. We'd have to look at actual Intel benchmarks to be sure, but if the AVX2-to-AVX512 improvement is marginal, a multithreaded AVX2 version running across more cores would likely outperform a multithreaded AVX512 version running across fewer cores. Intel's E-cores do run AVX2 instructions slower than the P-cores, but again, the AVX512 boost is marginal in this benchmark anyway.
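The core-count argument can be made concrete with a break-even calculation. This is a rough sketch under loudly idealized assumptions: every core runs at the single-threaded speeds above, throughput scales linearly with core count, and nothing else (clocks, memory bandwidth, scheduling) differs between the chips. The function name `break_even_cores` is hypothetical.

```python
# Single-threaded throughput figures quoted in the article (GB/s).
AVX2_GBPS = 20.0
AVX512_GBPS = 21.0

def break_even_cores(avx512_cores: int) -> float:
    """Number of AVX2 cores needed to match `avx512_cores` AVX512 cores,
    assuming all cores are equally fast apart from the ISA and that
    throughput scales linearly with core count (idealized)."""
    return avx512_cores * AVX512_GBPS / AVX2_GBPS

print(break_even_cores(16))  # 16.8
```

Under those assumptions, about 17 AVX2 cores would match 16 AVX512 cores, so 24 equally fast cores would have headroom to spare. In practice, a multithreaded run of a benchmark measured in GB/s may hit memory bandwidth limits before core count matters, and not every core on a hybrid chip is equally fast.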

I know people like to get angry at Intel for taking a feature away, but the real-world benefit of having AVX512 instead of only AVX2 is minimal. In most cases, it's probably offset by having extra cores working on the problem. There are very specific workloads, often single-threaded, that benefit from AVX-512, but on a blended mix of applications and benchmarks, I suspect Intel made an informed decision.



> We'd have to look at actual Intel benchmarks to see, but if the AVX2 to AVX512 improvements are marginal, a multithreaded AVX2 version across more cores would likely outperform a multithreaded AVX512 version across fewer cores.

Look at any existing heavily multithreaded benchmark like Blender rendering. The E-cores are so weak that it just about takes 2 of them to match the performance of an AMD core. If the only difference was AVX512 support then yeah, 24 AVX2 cores would beat 16 AVX-512 cores. But that's not the only difference, not even close.

That's not to say a 24-core Core Ultra 9 Whatever would be slower than a 16-core 9950X in this workload, just that the E-cores are kinda shit, especially in the wonky counts Intel is using (too many to just be about power efficiency, too few to really offset how slow they are).
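The E-core point can be folded into the same kind of back-of-envelope estimate. The sketch below assumes, hypothetically, that one Intel P-core is worth one AMD core, turns the "2 E-cores per AMD core" figure above into a weight of 0.5, and uses the 8 P-core + 16 E-core split of Intel's 24-core consumer parts. It also applies the ~5% AVX512 advantage from the article's single-threaded numbers and assumes linear scaling. The function name `core_equivalents` is hypothetical.

```python
def core_equivalents(p_cores: int, e_cores: int, e_core_weight: float) -> float:
    """Hybrid-CPU throughput in 'AMD-core equivalents' (hypothetical model):
    one P-core counts as 1.0, one E-core counts as e_core_weight."""
    return p_cores * 1.0 + e_cores * e_core_weight

# Single-threaded AVX512 vs AVX2 ratio from the article (21 / 20 GB/s).
AVX512_GAIN = 21.0 / 20.0

intel_avx2 = core_equivalents(p_cores=8, e_cores=16, e_core_weight=0.5)
amd_avx512 = 16 * AVX512_GAIN  # 16 AMD cores, each ~5% faster with AVX512

print(f"Intel (AVX2): {intel_avx2:.1f} core-equivalents")  # 16.0
print(f"AMD (AVX512): {amd_avx512:.1f} core-equivalents")  # 16.8
```

With these crude weights the two land within about 5% of each other, which is too close to call and consistent with the point that a 24-core Intel part would not necessarily be slower. The per-core weights, not the raw core count, decide the outcome.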


> The E-cores are so weak that it just about takes 2 of them to match the performance of an AMD core.

That's not "weak". If you look at available die-shot analyses, the E-cores are tiny compared to the P-cores: each takes up a lot less than half the area of a P-core, and even less of the power. P-cores are really only useful for the rare purely single-threaded workload; E-cores win otherwise.


We're not comparing to Intel's P-cores but to AMD's cores. 8 of AMD's cores fit in 70.6 mm² on a high-performance process, and take up a fraction of that space on a high-density process (see the 192-core Zen 5c chips).


AVX2 vs. AVX512 may be somewhat misleading in this case. In .NET, even if you use 256-bit-wide vectors, the JIT will still take advantage of AVX512VL whenever it is available, fusing chained operations into masked instructions, vpternlogd, etc.[0] In addition, standard operations like stack zeroing, struct copying, string comparison, and element search, among others, can use the full width.[1]
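To illustrate what vpternlogd buys: it evaluates any 3-input bitwise function in a single instruction, chosen by an 8-bit truth table (imm8). A chain like the bitwise select `(a & b) | (~a & c)` typically takes three AVX2 instructions (vpand, vpandn, vpor). The Python below is only a bit-level model for illustration, not how the .NET JIT is implemented, and the helper names are hypothetical. It also models AVX-512 merge-masking, where a mask register chooses per lane between the new result and the old value.

```python
MASK32 = 0xFFFFFFFF

def ternlog_imm8(f) -> int:
    """Derive the vpternlogd imm8 for a 3-input bitwise function f(a, b, c).
    Bit i of 0xF0/0xCC/0xAA equals bit 2/1/0 of i, so evaluating f on those
    constants yields the truth table directly."""
    return f(0xF0, 0xCC, 0xAA) & 0xFF

def ternlog(a: int, b: int, c: int, imm8: int, width: int = 32) -> int:
    """Bit-level model of vpternlogd on one `width`-bit lane: each result bit
    is looked up in imm8 at index (a_bit << 2) | (b_bit << 1) | c_bit."""
    out = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        out |= ((imm8 >> idx) & 1) << i
    return out

def masked_merge(result: list[int], src: list[int], k: int) -> list[int]:
    """Model of merge-masking: lane j takes result[j] if bit j of mask k is
    set, otherwise it keeps src[j]."""
    return [r if (k >> j) & 1 else s for j, (r, s) in enumerate(zip(result, src))]

def select(a: int, b: int, c: int) -> int:
    """Bitwise select: take bits of b where a is 1, bits of c where a is 0."""
    return (a & b) | (~a & c)

imm = ternlog_imm8(select)
print(hex(imm))  # 0xca

a, b, c = 0x12345678, 0x9ABCDEF0, 0x0F0F0F0F
assert ternlog(a, b, c, imm) == select(a, b, c) & MASK32
```

The same `ternlog_imm8` trick gives 0x96 for a three-way XOR, which is why chains of AND/OR/XOR/NOT over three inputs collapse into one instruction.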

So to force true AVX2, the benchmark would have to be run with `DOTNET_EnableAVX512F=0`, which I assume is not the case here.

[0]: https://devblogs.microsoft.com/dotnet/performance-improvemen...

[1]: https://devblogs.microsoft.com/dotnet/performance-improvemen...