
CPU inference, even of large transformers, is usually cheaper and is viable with enough strong CPUs.


CPU inference of even the smaller versions of GPT-3 would be far too slow for a public API.

What CPU did you benchmark on that gave you a cheaper inference price than a GPU? In my experience with GPT-like transformers, CPUs don't come anywhere near what you can squeeze out of something like an Nvidia T4, in either raw throughput or $/token.
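The $/token comparison above can be made concrete with a back-of-envelope calculation: divide the instance's hourly price by the tokens it can generate per hour. The figures below are purely illustrative assumptions (not benchmarks of any specific instance or model), just to show how the arithmetic works:

```python
# Back-of-envelope $/token comparison. All prices and throughput
# numbers below are illustrative assumptions, not measured benchmarks.

def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """Instance cost divided by the number of tokens generated per hour,
    scaled to dollars per million tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000

# Hypothetical figures: a T4 GPU instance vs. a large CPU instance.
gpu = cost_per_million_tokens(hourly_price_usd=0.53, tokens_per_second=40)
cpu = cost_per_million_tokens(hourly_price_usd=0.40, tokens_per_second=4)
print(f"GPU: ${gpu:.2f}/M tokens, CPU: ${cpu:.2f}/M tokens")
```

With these assumed numbers the GPU wins on $/token despite the higher hourly price, because its throughput advantage is larger than its price premium; the CPU case only becomes competitive if its per-token throughput closes most of that gap.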



