
The 5-year-old counts with an algorithm: they remember the current number (working memory, roughly analogous to context), scan the page, and move their finger to the next letter. They were taught this.

It's not much different from ChatGPT being trained to write a Python script.
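The child's procedure is essentially this loop, a rough sketch where the running count plays the role of working memory:

    def count_letter(word, target):
        count = 0                  # the "remembered" running total (working memory)
        for letter in word:        # the finger scanning letter by letter
            if letter == target:
                count += 1
        return count

    count_letter("strawberry", "r")  # -> 3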

A notable difference is that it's much more efficient to teach something new to a 5-year-old than to fine-tune or retrain an LLM.



A theory behind LLM intelligence is that the layer structure forms some sort of world model with much higher fidelity than simple pattern-matching over text. In specific cases, like where the language is a DSL that maps perfectly to a representation of an Othello gameboard, this appears to actually be the case. But basic operations like returning the number of times the letter r appears in 'strawberry' form a useful counterexample: the LLM has ingested many hundreds of books explaining how letters spell out words and how to count (pretty simple concepts, very easily stored in small amounts of computer memory), and yet its layers apparently couldn't model that from all the input. The failure seems to be an inability to connect the token 'strawberry' with its constituent letters... not exactly high-level reasoning.
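To make the tokenization point concrete, here is a rough sketch; the subword split and IDs below are made up for illustration, not taken from any real vocabulary:

    # Hypothetical BPE-style split of "strawberry" -- real vocabularies differ,
    # but the shape is the same: the model receives opaque token IDs, not letters.
    tokens = ["str", "awberry"]      # illustrative subword pieces
    token_ids = [496, 675]           # illustrative vocabulary IDs

    # Trivial once you can see the characters; opaque if all you ever saw were the IDs:
    print(sum(piece.count("r") for piece in tokens))  # -> 3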

It appears LLMs got RLHF'd into generating suitable Python scripts after the issue was exposed, which is an efficient way of getting better answers, but it feels rather like handing a child who is really struggling with their arithmetic a calculator...
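The "calculator" amounts to something like this one-liner (a guess at the shape of such a generated script, not any particular model's actual output):

    # Counting deferred to the interpreter instead of done in the model's weights:
    print("strawberry".count("r"))  # -> 3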



