
CPU inference, even of large transformers, is usually cheaper and is viable with enough strong CPUs.


CPU inference of even the smaller versions of GPT-3 would be far too slow for a public API.

What CPU did you benchmark on that gave you a cheaper inference price than a GPU? In my experience with GPT-like transformers, CPUs don't come anywhere near what you can squeeze out of something like an Nvidia T4, in either raw throughput or $/token.
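The $/token comparison above can be made concrete with a back-of-envelope calculation: divide the instance's hourly price by the tokens it can generate per hour. The figures below are purely illustrative assumptions (not benchmarks of any specific instance or model), just to show how the arithmetic works:

```python
# Back-of-envelope $/token comparison. All prices and throughput
# numbers below are illustrative assumptions, not measured benchmarks.

def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """Instance cost divided by the number of tokens generated per hour,
    scaled to dollars per million tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000

# Hypothetical figures: a T4 GPU instance vs. a large CPU instance.
gpu = cost_per_million_tokens(hourly_price_usd=0.53, tokens_per_second=40)
cpu = cost_per_million_tokens(hourly_price_usd=0.40, tokens_per_second=4)
print(f"GPU: ${gpu:.2f}/M tokens, CPU: ${cpu:.2f}/M tokens")
```

With these assumed numbers the GPU wins on $/token despite the higher hourly price, because its throughput advantage is larger than its price premium; the CPU case only becomes competitive if its per-token throughput closes most of that gap.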



