META managed to spend a lot of money on AI only to achieve inferior results. Something must change for sure, and in my opinion you don't want an LLM skeptic in house, especially since the problem is not what LeCun is saying right now (that LLMs are not the straight path to AGI), but the fact that for some time he said LLMs were just statistical models, stochastic parrots. And that is a precise claim, something most people do not understand: it means two things, no understanding of the prompt whatsoever in the activation states, and no internal representation of the idea/sentence the model is about to express. That is an incredibly weak claim, one that top AI scientists rejected from the start simply on the basis of the models' functional behavior. He then slowly changed his position. But this shit show, and the friction he created inside META, is not something to forget.
The problem is the framing. Reductionism always sounds smart and is rhetorically effective, but it usually just loses all nuance or meaning. I've never met a parrot (stochastic or otherwise) that could write Python code or rewrite my emails, so what is the point of describing it like that besides wanting to sound smug and dismissive?
The point is that next-token prediction produces output by sampling from distributions assembled from text the model has seen previously (hence "stochastic"). The "ding", or claim, is that, like a parrot, LLMs can't produce responses that are truly novel in concept or make logical out-of-sample leaps; they can only recombine words they were explicitly taught in the past.
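To make the "sampling from distributions" part concrete, here is a toy sketch in Python. The vocabulary and logits are made up for illustration; a real model produces logits over tens of thousands of tokens at every step, conditioned on the whole context so far.

    import numpy as np

    # Toy stand-ins: a tiny vocabulary and made-up scores for the next token.
    vocab = ["the", "cat", "sat", "on", "mat", "."]
    logits = np.array([2.0, 0.5, 1.2, 0.3, 1.8, 0.1])

    def sample_next_token(logits, temperature=1.0):
        # Softmax turns raw scores into a probability distribution.
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        # The "stochastic" part: draw one token at random according to those probabilities.
        return np.random.choice(len(probs), p=probs)

    idx = sample_next_token(logits, temperature=0.8)
    print(vocab[idx])

Whether you read that loop as "parroting" or as something more depends entirely on what you think the distributions encode, which is exactly where the disagreement lies.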
So you think "stochastic parrot" is an accurate term and not an attempt to be dismissive? If someone woke up from a coma and asked what ChatGPT is, would you say "stochastic parrot" and think you'd explained things?
While "stochastic parrot" is obviously an over-simplification, and the phrase as originally coined was clearly intended to be dismissive, I think the analogy holds. I see them as a lossy storage system.
My expectation is that simply continuing to scale the transformer architecture is not likely to produce the type of "intelligence" for which _researchers_ are looking.
For my taste, the most interesting development in NLP from this latest AI wave (and LLMs in general) is RAG. I have also always wondered why the tokenization process hasn't historically been considered more important; to me, it seems like THE MOST critical part of how deep learning works.
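The RAG pattern itself is simple: embed your documents, retrieve the ones closest to the query, and prepend them to the prompt. Here is a toy sketch; the documents, the query, and the bag-of-words "embedding" are all made-up stand-ins so the example runs on its own, whereas a real system would use a learned embedding model and a vector store.

    from collections import Counter
    import math
    import re

    docs = [
        "Tokenization splits text into subword units before the model sees it.",
        "RAG retrieves relevant documents and adds them to the prompt.",
        "Transformers use attention to mix information across token positions.",
    ]

    def embed(text):
        # Stand-in "embedding": a bag-of-words count vector.
        return Counter(re.findall(r"[a-z]+", text.lower()))

    def cosine(a, b):
        dot = sum(count * b[token] for token, count in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def retrieve(query, k=1):
        q = embed(query)
        return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

    query = "What does RAG add to the prompt?"
    context = "\n".join(retrieve(query, k=1))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    print(prompt)  # this augmented prompt is what would actually be sent to the LLM

Everything that matters in practice (chunking, the embedding model, how much context you can afford to stuff in) lives in the details that this sketch glosses over.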