I feel the same, but cannot measure the effect in any context benchmark like fic... | Hacker News


		arnaudsm 6 months ago | on: Gemini 2.5 Deep Think

I feel the same, but I can't measure the effect in any long-context benchmark like Fiction.LiveBench. Are they aggressively quantizing, or are our expectations silently increasing?

nusl 6 months ago

Yeah, it's hard to measure. I'm not sure about our expectations, though I recall much better output when I first started using Gemini 2.5 than now. It seems stupider and more headstrong somehow?
