Old fart version: Back in the days of $1000+ 10mbps ethernet cards, I figured out how to transfer 3mbps over the net and then use an 8 bit at a time state machine (not enough RAM for more) to de-huffman 12bit diffed pixel values to achieve 10mbps lossless transfer of greyscale images with a 3mip CPU.
With a fortuitous natural interleaving effect on the network combined with an inability of the computers to send more than about 3mbps, it could support three of these transfers simultaneously, 9mbps on the wire, leaving people scratching their head about ethernet being "proved"[1] to not be able to carry more than 1mbps.
Interestingly, the state machine approach is almost certainly a loser now that CPUs are so much faster than memory. Better to take the 8 little branches (~4 mis-predictions) and stay in cache. Probably… maybe a 4 bit at a time state machine would keep the table small enough for L1 cache…
[1] proved for a certain set of givens, which turned out not to be true in real life, thank goodness, or we'd all be using Token Ring and FDDI networks now.