
"Strings are arrays of bytes" combined with the assumption that "characters are a single byte" sounds basically the same as the "array of code points" that the parent comment is disagreeing with.


Code points are not bytes.


Sure, but if you're insisting that the string be represented as one byte per character, you end up with exactly the same properties for "array of code points" and "array of bytes".


Sort-of, but no, because code points are not characters.

There's a big difference between "get the 5th code point" and "get the 5th character".

Because multiple code points can be used in a single character, it is not possible to do random character access in a Unicode-encoded string.
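To make the code point vs. character distinction concrete, here is a minimal Python sketch. It relies on Python's `str` being indexed by code point; the variable names are just for illustration.

```python
import unicodedata

# "é" written as a base letter plus a combining acute accent (U+0301):
# one character on screen, two code points in the string.
decomposed = "e\u0301"
# The same user-perceived character as one precomposed code point (U+00E9).
composed = "\u00e9"

code_point_count = len(decomposed)  # len() counts code points, not characters
first_code_point = decomposed[0]    # indexing returns a code point: the bare "e"

# A flag emoji is a pair of regional indicator code points (here U and S).
# It renders as one character, and no precomposed single code point exists.
flag = "\U0001F1FA\U0001F1F8"
flag_code_points = len(flag)

# Normalization can merge some sequences, but not all of them.
normalized = unicodedata.normalize("NFC", decomposed)
```

So "get the 1st code point" of `decomposed` hands back `e` without its accent, which is not what a reader would call the first character. Getting actual characters (grapheme clusters) needs a segmentation pass over the string, which the standard library does not provide.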


No, it's impossible to do random access to retrieve a character if you are dealing with code points, because code points do not have a fixed byte size in encodings such as UTF-8. I thought this was a good intro: <https://tonsky.me/blog/unicode/>.
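A rough Python sketch of the byte-size point, assuming the string is stored as UTF-8. The helper `byte_offset_of_code_point` is a made-up name for illustration, not a library function.

```python
# Four code points whose UTF-8 encodings take 1, 2, 3 and 4 bytes.
text = "a\u00e9\u20ac\U0001F600"  # a, e-acute, euro sign, grinning face
widths = [len(ch.encode("utf-8")) for ch in text]

data = text.encode("utf-8")


def byte_offset_of_code_point(buf: bytes, index: int) -> int:
    """Return the byte offset where the index-th code point starts.

    The buffer has to be scanned from the beginning because the width of
    every earlier code point is unknown until its bytes are inspected.
    """
    seen = 0
    for offset, byte in enumerate(buf):
        # UTF-8 continuation bytes look like 0b10xxxxxx; any other byte
        # starts a new code point.
        if (byte & 0xC0) != 0x80:
            if seen == index:
                return offset
            seen += 1
    raise IndexError("code point index out of range")


offset_of_last = byte_offset_of_code_point(data, 3)
```

Finding the n-th code point is a linear scan here rather than a constant-time `buf[n]`. A fixed-width encoding such as UTF-32 would make code point access constant-time, but it still would not give random access to characters, since one character can span several code points.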



