Some people here seem to think that indexing, measuring and slicing operations based on runes (code points) instead of bytes (UTF-8 code units) by default would be a good idea. It's not - you get the worst of both worlds: indexing is no longer a constant-time operation, and a code point is still not a user-perceived character, because combining character sequences consist of multiple code points. Even normalization doesn't help in general.
Other languages like C# seem different on the surface, but in fact they index and measure by code units as well (2-byte UTF-16 code units), not by code points.
> It's not - you get the worst of both worlds: indexing is not a constant-time operation
You can't usefully index a unicode stream in constant time and still do correct, useful textual work anyway. Combining codepoints may have no precombined form - there is no defined limit to the number of combining codepoints tacked onto the base, so normalization will not save you. And some codepoints are invisible to the user, which you may or may not want to see depending on the work you're doing.
People really need to come to terms that a unicode stream is exactly that, a stream.
> You can't usefully index a unicode stream in constant time and do correct and useful textual stuff anyway
To find the index of a substring you do need to scan the string, right. But once you have the byte index you can jump to its position in the string in constant time, e.g. when you do a slice operation based on that index: s[i:].
If strings.Index() returned a code point index and not a byte index you would have to scan the string again.
> To find an index of a substring you need to scan the string, right. But once you have the byte index you can quickly jump to its position in the string
Stop doing that and just get the bit of string you want in the first place?