
There's a benchmark that works similarly but asks harder questions, also based on books: https://fiction.live/stories/Fiction-liveBench-Feb-21-2025/o...

I guess they have to add more questions as these context windows get bigger.


