The fear is twofold: that you won't get a better model by training on synthetic data from a worse model, and that you will miss a lot of modern-day knowledge.
Imho the first is not necessarily so, but the second would be a real hindrance. You can talk about the pandemic with LLMs because they were trained on people's comments on the subject; if training data had been cut off earlier, you wouldn't be able to.
Otoh in many cases a sufficiently smart AI can just reach out to openly available material - e.g. if you want to know about the modern-day politics of Europe, you can read commission reports, and if you want info on a new framework/tech, you can just read the source code or scientific papers.