Hacker News | new | past | comments | ask | show | jobs | submit | login

All they need is an API-compatible client library, so there is no actual switching cost between models other than configuration. There's a reason OpenAI is adding all sorts of add-on features like assistants and file upload: they know the models themselves are going to become a commodity, and they need something to lock developers onto their platform.


Code execution and RAG are not going to lock people in. They are 1000x easier to replicate than the model, which, as you say, is already becoming a commodity.

My pet theory is that OpenAI is harvesting high-quality user data by empowering GPT with all these tools plus a human in the loop. The purpose is to use this data as a kind of continual evaluation: sifting for weak points and enhancing their fine-tuning datasets.
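A hypothetical sketch of that "sifting" idea: treat user feedback on each exchange as a cheap quality label, keep positively rated exchanges as fine-tuning candidates, and collect negatively rated prompts as weak points to target. The field names and the feedback encoding here are illustrative assumptions, not OpenAI's actual pipeline.

```python
# Illustrative conversation logs; "feedback" stands in for any explicit or
# implicit user signal (thumbs up/down, retry, copy-to-clipboard, etc.).
logs = [
    {"prompt": "write a regex for emails", "reply": "...", "feedback": +1},
    {"prompt": "solve this integral",      "reply": "...", "feedback": -1},
    {"prompt": "summarize this article",   "reply": "...", "feedback": +1},
]

# Positively rated exchanges become fine-tuning candidates.
finetune_set = [ex for ex in logs if ex["feedback"] > 0]

# Negatively rated prompts mark weak points worth targeting.
weak_points = [ex["prompt"] for ex in logs if ex["feedback"] < 0]

print(len(finetune_set), weak_points)
```

At scale, the same split doubles as a continual evaluation: the weak-point bucket tells you where the model fails, and the curated bucket feeds the next round of training.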

Every human response can carry a positive or negative connotation, and the model can use that as a reward signal. They claim 100M users; at, say, 10K tokens per user per month, that's 1T synthetic tokens a month. Over a whole year they generate roughly as much text as the original ~13T-token training dataset. And we know that LLMs can benefit a lot from synthetic data when it is filtered and engineered for quality.
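The back-of-envelope arithmetic above, spelled out (the per-user token figure is the commenter's assumption, not a published number):

```python
users = 100_000_000              # claimed user count
tokens_per_user_month = 10_000   # assumed tokens per user per month

tokens_per_month = users * tokens_per_user_month   # 1T tokens/month
tokens_per_year = tokens_per_month * 12            # 12T tokens/year

print(f"{tokens_per_month / 1e12:.0f}T tokens/month")
print(f"{tokens_per_year / 1e12:.0f}T tokens/year vs ~13T original dataset")
```

So a year of usage lands in the same ballpark as the claimed original pretraining corpus.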

So I think OpenAI's moat is the data they generate.



