CPU inference of even the smaller GPT-3 variants would be far too slow for a public API.
What CPU did you benchmark on that gave you a cheaper inference price than a GPU? In my experience with GPT-like transformers, CPUs don't come anywhere near what you can squeeze out of something like an Nvidia T4, in either throughput or $/token.
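For a rough sense of the gap, here's a back-of-envelope sketch. The ~2 FLOPs per parameter per generated token rule of thumb is standard for dense decoder models; the sustained-throughput figures below are my assumptions for illustration, not measurements:

```python
# Back-of-envelope: for a dense GPT-style model under compute-bound batched
# serving, decode throughput is roughly sustained FLOP/s divided by
# ~2 FLOPs per parameter per token.
def tokens_per_sec(params: float, flops_per_sec: float) -> float:
    return flops_per_sec / (2.0 * params)

# Illustrative (assumed) sustained throughputs, not measured benchmarks:
CPU_FLOPS = 2e11  # ~0.2 TFLOP/s for a server CPU socket on large GEMMs
T4_FLOPS = 3e13   # ~30 TFLOP/s sustained FP16 on a T4 (65 TFLOP/s peak)

gpt3_small = 125e6  # "GPT-3 small" parameter count
gpt3_full = 175e9

print(f"CPU, 125M params: {tokens_per_sec(gpt3_small, CPU_FLOPS):.0f} tok/s")
print(f"T4,  125M params: {tokens_per_sec(gpt3_small, T4_FLOPS):.0f} tok/s")
print(f"CPU, 175B params: {tokens_per_sec(gpt3_full, CPU_FLOPS):.4f} tok/s")
```

Note this assumes the compute-bound batched case; single-stream decoding is memory-bandwidth bound, which makes the CPU picture worse still. Either way the T4 comes out roughly two orders of magnitude ahead per dollar at typical cloud prices.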