Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It also destroys models ability to follow phonetic or syntactic constraints: https://paperswithcode.com/paper/most-language-models-can-be...

I've been waiting for real innovation to come in tokenization algorithms. I think a truly well trained character level model would be awesome. Far easier to guide, and might even know how to "count" it's own output.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: