> there are some benchmarks which show a fundamental inability of LLMs to perform certain tasks which humans can, for example adding 100-digit numbers.
Fundamental inability? No. Current SOTA LLMs' (4o, Claude, Gemini) woes with arithmetic are not a transformer weakness, never mind a large-language-modelling one. Those benchmarks show that those particular models have accuracy problems at that many digits, not that LLMs fundamentally cannot be accurate at that many digits.
Ultimately, numbers are represented in GPT-like systems in a pretty weird way. That representation affects how well they can learn to do things like arithmetic and counting, but it isn't a necessary way. You don't have to represent numbers like that.
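To make that concrete, here is a quick sketch (assuming the tiktoken package and the stock cl100k_base encoding) of how a BPE tokenizer chops a long number into uneven multi-digit chunks rather than individual digits:

```python
# Rough illustration of how a BPE tokenizer "sees" a long number.
# Assumes `pip install tiktoken`; "cl100k_base" is one of its built-in encodings.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
number = "31415926535897932384626433832795028841971"
pieces = [enc.decode([t]) for t in enc.encode(number)]
print(pieces)
# Typically prints chunks of 1-3 digits, e.g. ['314', '159', '265', ...],
# so the model never sees the number as an aligned sequence of single digits.
```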
They built a specialized model which, after a bunch of trickery, still has only 99% accuracy on a very simple deterministic algorithm (a naive model had very low accuracy). I also think most of the accuracy came from memorization of the training set (the model didn't provide intermediate results, and started failing significantly at slightly larger inputs). In my book that is a fundamental inability to learn and reproduce an algorithm.
They also demonstrated that a transformer can't learn sorting.
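For what it's worth, the memorization-vs-algorithm question above can be probed directly by measuring exact-match accuracy as a function of operand length. A rough sketch, where `ask_model` is just a placeholder for however you query the model under test:

```python
# Probe length generalization: exact-match addition accuracy per digit count.
import random

def ask_model(prompt: str) -> str:
    # Placeholder: plug in the model you actually want to test.
    raise NotImplementedError

def addition_accuracy(n_digits: int, trials: int = 100) -> float:
    correct = 0
    for _ in range(trials):
        a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
        b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
        answer = ask_model(f"{a} + {b} = ")
        correct += answer.strip() == str(a + b)
    return correct / trials

# If accuracy is high only up to the digit lengths seen in training and
# collapses just beyond them, that points toward interpolation/memorization
# rather than a learned carrying algorithm.
for n in (10, 20, 50, 100, 120, 160):
    print(n, addition_accuracy(n))
```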
Their method does not require building specialized models from scratch (you can but you don't have to) and they did not prove transformers can't learn sorting. If you think they did, then you don't understand what it means to prove something.
In my book, what they built (specialized training data + a specialized embedding format) is exactly a specialized model. You can disagree, of course, and say again that I don't understand something, but then the discussion will be over.
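The "specialized embeddings" being argued about are roughly digit-position embeddings in the spirit of the Abacus-embeddings paper linked below: each digit token gets an extra learned embedding for its offset within the number it belongs to, so digits of equal significance can be aligned across operands. A simplified, illustrative sketch (class and parameter names are made up, not the paper's actual implementation):

```python
import torch
import torch.nn as nn

class DigitOffsetEmbedding(nn.Module):
    """Illustrative sketch: add a learned embedding keyed by a digit's offset
    within its number. Loosely inspired by the Abacus-embeddings idea."""
    def __init__(self, d_model: int, max_digits: int = 64):
        super().__init__()
        self.offset_emb = nn.Embedding(max_digits, d_model)

    def forward(self, token_emb: torch.Tensor, digit_offsets: torch.Tensor) -> torch.Tensor:
        # token_emb:     (batch, seq, d_model) ordinary token embeddings
        # digit_offsets: (batch, seq) ints; 0 for non-digit tokens,
        #                1..k for the k-th digit inside a number
        return token_emb + self.offset_emb(digit_offsets)
```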
A poor result when testing one model is not proof that the architecture behind the model is incapable of getting good results. It's just that simple. In the same way, seeing the OG GPT-3 fail at chess was not proof that LLMs can't play chess.
This
> They also demonstrated that a transformer can't learn sorting.
> No it didn't. They tested up to 100 digits with very high accuracy. I don't think you even read the abstract of this, never mind the actual paper.
They report two OOD (out-of-distribution) accuracies in the paper: OOD, up to 100 digits, and 100+ OOD, 100-160 digits. The 100+ OOD accuracy is significantly worse: around 30%.
https://arxiv.org/abs/2405.17399
https://arxiv.org/abs/2307.03381
https://arxiv.org/abs/2310.02989