"strings are arrays of bytes" combined with the assumption that "characters are ...

foolswisdom · on Nov 27, 2023

Code points are not bytes.

saghm · on Nov 27, 2023

Sure, but if you insist that the string be represented as one byte per character, you end up with the exact same properties for "array of code points" and "array of bytes".

bruce511 · on Nov 27, 2023

Sort-of, but no, because code points are not characters.

There's a big difference between "get the 5th code point" and "get the 5th character".

Because multiple code points can combine into a single character, it is not possible to do random character access in a Unicode-encoded string.
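
A quick Python sketch of the difference, in case it helps. The sample strings are chosen purely for illustration, and Python's `str` happens to index by code point, which is what makes the mismatch visible:

```python
import unicodedata

# "é" written as a base letter plus a combining acute accent (U+0065 U+0301)
decomposed = "e\u0301"
# The same character as a single precomposed code point (U+00E9)
composed = "\u00e9"

print(len(decomposed))  # number of code points in the decomposed form
print(len(composed))    # number of code points in the precomposed form

# NFC normalization maps the two-code-point form onto the precomposed one
print(unicodedata.normalize("NFC", decomposed) == composed)

# "cafés" spelled with the combining accent: 5 characters, 6 code points
word = "cafe\u0301s"
fifth_code_point = word[4]

# Indexing by code point lands on the bare combining accent, not on a character
print(unicodedata.name(fifth_code_point))
```

Normalization only helps for sequences that have a precomposed form; many characters (emoji sequences, for example) stay multi-code-point no matter what, so "the 5th character" generally needs a grapheme segmentation pass over the string.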

foolswisdom · on Nov 27, 2023

No, it's impossible to do random access to retrieve a character if you are dealing with code points, because code points do not have a fixed byte size (in UTF-8 a code point takes one to four bytes). I thought this was a good intro: <https://tonsky.me/blog/unicode/>.
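
A small Python sketch of the variable byte size, assuming the string is stored as UTF-8. The helper name `utf8_lengths` and the sample text are made up for illustration:

```python
def utf8_lengths(text: str) -> list[int]:
    """Return the UTF-8 byte length of each code point in text."""
    return [len(ch.encode("utf-8")) for ch in text]


# Four code points: "a", "é" (U+00E9), "€" (U+20AC), and U+1F600 (an emoji)
sample = "a\u00e9\u20ac\U0001F600"

lengths = utf8_lengths(sample)
total_bytes = len(sample.encode("utf-8"))

print(lengths)      # bytes used by each code point
print(total_bytes)  # bytes used by the whole string

# The byte offset of the n-th code point depends on everything before it,
# so finding it means scanning from the start rather than computing n * width.
offset_of_last = sum(lengths[:3])
print(offset_of_last)
```

A fixed-width encoding such as UTF-32 would make code point access constant-time, but that still would not give constant-time access to characters.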