
No previous tool was able to learn from its own mistakes (RLVR).

It might not be enough by itself, but it shows that something has changed compared with the previous 70-odd years.
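
For anyone unfamiliar: RLVR (reinforcement learning with verifiable rewards) means the training signal comes from an automated check rather than from human preference. A toy sketch of such a reward in Python (the candidate programs, the check, and the update step are stand-ins, not any real API):

    def verifiable_reward(candidate_src: str) -> float:
        # Run the candidate against a known-good check; reward is
        # 1.0 only if it behaves correctly, 0.0 otherwise.
        scope = {}
        try:
            exec(candidate_src, scope)
            return 1.0 if scope["add"](2, 3) == 5 else 0.0
        except Exception:
            return 0.0

    candidates = [
        "def add(a, b): return a - b",  # buggy attempt
        "def add(a, b): return a + b",  # correct attempt
    ]

    for src in candidates:
        # In real RLVR this reward would weight a policy-gradient
        # update (e.g. PPO/GRPO) on the model that generated `src`.
        print(verifiable_reward(src), "<-", src)

The point is that the signal is mechanical: pass/fail from a verifier, with no human in the loop.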





LLMs don't learn from their own mistakes the way real developers and businesses do, at least not in a way that lends itself to RLVR.

The meaningful consequences of software mistakes don't manifest through compilation errors but through business impacts, which remain far outside the scope of what an AI-assisted coding tool can comprehend.


> through business impacts, which remain far outside the scope of what an AI-assisted coding tool can comprehend.

That is, the problems are a) how to generate a training signal without formally verifiable results, b) hierarchical planning, and c) credit assignment in a hierarchical planning system. All three are being worked on.
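
To make c) concrete, here's a toy sketch of credit assignment over a plan: a single delayed outcome (the "business impact") is spread back across the sub-steps that produced it, discounted by distance from the outcome. The step names and the discounting scheme are illustrative only, not taken from any particular paper:

    def assign_credit(steps, outcome_reward, gamma=0.9):
        # Later steps sit closer to the observed outcome, so they
        # receive a larger share of the (possibly delayed) reward.
        n = len(steps)
        return {step: outcome_reward * gamma ** (n - 1 - i)
                for i, step in enumerate(steps)}

    plan = ["gather requirements", "write code", "ship feature"]
    print(assign_credit(plan, outcome_reward=1.0))
    # ~ {'gather requirements': 0.81, 'write code': 0.9, 'ship feature': 1.0}

The hard part is that real outcomes arrive months after the steps, and attributing them to any particular step is far messier than a fixed discount.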

There are preliminary research results suggesting that RL induces hierarchical reasoning in LLMs.



