Hacker News
Show HN: 26x speedup on BitNet sparse ops with AVX-512 and 2-bit encoding (github.com/microsoft)
2 points by HyperFoldUK 10 days ago
I've been optimizing ternary operations for BitNet b1.58 and found significant overhead in the current implementation.

I wrote a dependency-free C kernel (sparse-ternary-fma) using 2-bit encoding and AVX-512 instructions.

Benchmarks on Intel Xeon (N=4096):

Throughput (Dense): 2.38x faster (8.21 GFLOPS vs 3.45 GFLOPS for the AVX2 baseline)

Throughput (Sparse, 80% zeros): 26.12x faster (23.25 GFLOPS vs 0.89 GFLOPS for the scalar baseline)

Memory: 4x denser (2 bits per weight vs the standard 8 bits)

This approach packs 4 trits per byte and uses a sparsity-aware FMA that skips zero-valued weights, which is critical for 1.58-bit quantization efficiency.
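
For illustration only, here is a minimal scalar sketch of the two ideas (2-bit packing and skipping zero weights). It is not the code from the PR and uses no AVX-512. The bit codes (00 = 0, 01 = +1, 10 = -1), the low-bits-first layout, and the function names trit_encode, trit_decode, trits_pack, and ternary_dot are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 2-bit codes (hypothetical, not taken from the PR):
   0b00 = 0, 0b01 = +1, 0b10 = -1, 0b11 unused. */
uint8_t trit_encode(int8_t t) {
    assert(t >= -1 && t <= 1);
    return (uint8_t)(t == 0 ? 0u : (t > 0 ? 1u : 2u));
}

/* Inverse of trit_encode: low bit means +1, high bit means -1. */
int8_t trit_decode(uint8_t code) {
    return (int8_t)((int)(code & 1u) - (int)((code >> 1) & 1u));
}

/* Pack n trits into (n + 3) / 4 bytes: 4 trits per byte, lowest bits first. */
void trits_pack(const int8_t *trits, size_t n, uint8_t *packed) {
    for (size_t i = 0; i < (n + 3) / 4; ++i) {
        packed[i] = 0;
    }
    for (size_t i = 0; i < n; ++i) {
        packed[i / 4] |= (uint8_t)(trit_encode(trits[i]) << (2 * (i % 4)));
    }
}

/* Ternary dot product of packed weights with int8 activations.
   A packed byte equal to 0 holds four zero weights, so the whole group
   is skipped. Nonzero weights need only an add or a subtract. */
int32_t ternary_dot(const uint8_t *packed, const int8_t *x, size_t n) {
    int32_t acc = 0;
    for (size_t b = 0; b * 4 < n; ++b) {
        uint8_t byte = packed[b];
        if (byte == 0) {
            continue; /* four zero weights: no work to do */
        }
        for (size_t k = 0; k < 4 && b * 4 + k < n; ++k) {
            int8_t w = trit_decode((uint8_t)((byte >> (2 * k)) & 3u));
            if (w > 0) {
                acc += x[b * 4 + k];
            } else if (w < 0) {
                acc -= x[b * 4 + k];
            }
        }
    }
    return acc;
}
```

With this layout, 4096 weights occupy 1024 bytes instead of 4096. A vectorized kernel would apply the same skip test to whole registers of packed weights rather than to single bytes.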

The PR is pending on the Microsoft BitNet repo. The code is open source here: https://github.com/microsoft/BitNet/pull/365




