Hacker News | new | past | comments | ask | show | jobs | submit | login

All they need is an API-compatible client library, so there is no actual switching cost between models other than configuration. There's a reason OpenAI is adding all sorts of add-on features like assistants and file upload: they know the models themselves are going to become a commodity, and they need something to lock developers onto their platform.


Code execution and RAG are not going to lock people in. They are 1000x easier to replicate than the model, which, as you say, is already becoming a commodity.

My pet theory is that OpenAI is harvesting high-quality user data by empowering GPT with all these tools plus a human in the loop. The purpose is to use this data as a kind of continual evaluation: sifting for weak points and enhancing their fine-tuning datasets.
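A hypothetical sketch of that "sifting" idea: treat user feedback on each exchange as a cheap quality label, keep positively rated exchanges as fine-tuning candidates, and collect negatively rated prompts as weak points to target. The field names and the feedback encoding here are illustrative assumptions, not OpenAI's actual pipeline.

```python
# Illustrative conversation logs; "feedback" stands in for any explicit or
# implicit user signal (thumbs up/down, retry, copy-to-clipboard, etc.).
logs = [
    {"prompt": "write a regex for emails", "reply": "...", "feedback": +1},
    {"prompt": "solve this integral",      "reply": "...", "feedback": -1},
    {"prompt": "summarize this article",   "reply": "...", "feedback": +1},
]

# Positively rated exchanges become fine-tuning candidates.
finetune_set = [ex for ex in logs if ex["feedback"] > 0]

# Negatively rated prompts mark weak points worth targeting.
weak_points = [ex["prompt"] for ex in logs if ex["feedback"] < 0]

print(len(finetune_set), weak_points)
```

At scale, the same split doubles as a continual evaluation: the weak-point bucket tells you where the model fails, and the curated bucket feeds the next round of training.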

Every human response can carry a positive or negative connotation, and the model can use that as a reward signal. They claim 100M users; at, say, 10K tokens per user per month, that's 1T synthetic tokens a month. Over a whole year they generate roughly as much text as the original ~13T-token training dataset. And we know that LLMs can benefit a lot from synthetic data when it is filtered and engineered for quality.
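The back-of-envelope arithmetic above, spelled out (the per-user token figure is the commenter's assumption, not a published number):

```python
users = 100_000_000              # claimed user count
tokens_per_user_month = 10_000   # assumed tokens per user per month

tokens_per_month = users * tokens_per_user_month   # 1T tokens/month
tokens_per_year = tokens_per_month * 12            # 12T tokens/year

print(f"{tokens_per_month / 1e12:.0f}T tokens/month")
print(f"{tokens_per_year / 1e12:.0f}T tokens/year vs ~13T original dataset")
```

So a year of usage lands in the same ballpark as the claimed original pretraining corpus.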

So I think OpenAI's moat is the data they generate.



