Some people here seem to think that indexing, measuring and slicing operations b...

masklinn · on Oct 24, 2013

> It's not - you get the worst of both worlds: indexing is not a constant-time operation

You can't usefully index a unicode stream in constant time and do correct and useful textual stuff anyway. One reason is combining code points, which may have no precomposed form (if only because there is no defined limit to the number of combining code points tacked onto a base, so normalization will not save you). Another is code points that are invisible to the user, which you may or may not want to see depending on the work you're doing.
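A small Go sketch of both cases (the helper `describe` is hypothetical, written for illustration). It shows that bytes, code points and user-perceived characters are three different counts. The example assumes that no precomposed code point exists for "q with dot above and dot below", so NFC normalization leaves all three code points in place.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// describe returns the byte length, the code point count and the number
// of combining marks (general categories Mn, Mc, Me) in s.
func describe(s string) (nBytes, nRunes, nMarks int) {
	for _, r := range s {
		if unicode.In(r, unicode.Mn, unicode.Mc, unicode.Me) {
			nMarks++
		}
	}
	return len(s), utf8.RuneCountInString(s), nMarks
}

func main() {
	// "q" + U+0307 COMBINING DOT ABOVE + U+0323 COMBINING DOT BELOW:
	// three code points, one user-perceived character.
	fmt.Println(describe("q\u0307\u0323")) // 5 3 2

	// U+200B ZERO WIDTH SPACE is a code point the user never sees.
	fmt.Println(describe("ab\u200Bc")) // 6 4 0
}
```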

People really need to come to terms with the fact that a unicode stream is exactly that: a stream.
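To make the "stream" point concrete, here is a sketch in Go, assuming UTF-8 strings as in Go. Because UTF-8 is variable-width, the only way to find the n-th code point is to decode from the start. The helper `nthRune` is hypothetical.

```go
package main

import "fmt"

// nthRune walks s from the start and returns the byte offset of the n-th
// code point (0-based), or -1 if s has fewer than n+1 code points.
// The walk is O(n): there is no arithmetic shortcut from a code point
// count to a byte offset in UTF-8.
func nthRune(s string, n int) int {
	i := 0
	for off := range s { // range over a string decodes one rune at a time
		if i == n {
			return off
		}
		i++
	}
	return -1
}

func main() {
	s := "na\u00efve caf\u00e9" // "naïve café", with precomposed ï and é
	for off, r := range s {
		fmt.Printf("%d:%c ", off, r) // byte offsets skip where a rune is 2 bytes
	}
	fmt.Println()
	fmt.Println(nthRune(s, 3)) // 4, the byte offset of 'v'
}
```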

iv_08 · on Oct 24, 2013

> You can't usefully index a unicode stream in constant time and do correct and useful textual stuff anyway

To find the index of a substring you need to scan the string, right. But once you have the byte index you can jump straight to that position in the string, e.g. when you do a slice operation based on that index: s[i:]. If strings.Index() returned a code point index rather than a byte index, you would have to scan the string again.

masklinn · on Oct 24, 2013

> To find an index of a substring you need to scan the string, right. But once you have the byte index you can quickly jump to its position in the string

Stop doing that and just get the bit of string you want in the first place?
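One reading of this suggestion, sketched in Go: use operations that hand back the pieces you want, so the calling code never stores or computes a numeric index. This assumes Go 1.18+ for `strings.Cut`, and the `"k=v; k=v"` input format and the `parsePairs` helper are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePairs turns "k1=v1; k2=v2" into a map. The caller splits on
// separators and receives substrings; no index ever appears in this code.
func parsePairs(s string) map[string]string {
	m := make(map[string]string)
	for _, field := range strings.Split(s, "; ") {
		if k, v, ok := strings.Cut(field, "="); ok {
			m[k] = v
		}
	}
	return m
}

func main() {
	m := parsePairs("user=Zo\u00eb; lang=\u03b5\u03bb")
	fmt.Println(m["user"], m["lang"]) // Zoë ελ
}
```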

derefr · on Oct 24, 2013

> and a code point is still not a user-perceived character

How about indexing, measuring and slicing operations based on user-perceived characters, then?
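A deliberately naive Go sketch of what "user-perceived character" operations might look like: group each base code point with the combining marks that follow it. This is only an approximation and is *not* the Unicode extended grapheme cluster algorithm (UAX #29), which also handles ZWJ emoji sequences, regional-indicator flags, Hangul jamo and more. `naiveClusters` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"unicode"
)

// isMark reports whether r is a combining mark (categories Mn, Mc, Me).
func isMark(r rune) bool {
	return unicode.In(r, unicode.Mn, unicode.Mc, unicode.Me)
}

// naiveClusters splits s into "base code point + trailing combining marks".
// Rough approximation only; see UAX #29 for the real segmentation rules.
func naiveClusters(s string) []string {
	var out []string
	start := -1
	for i, r := range s {
		if start < 0 {
			start = i // first code point opens the first cluster
		} else if !isMark(r) {
			out = append(out, s[start:i]) // a non-mark closes the previous cluster
			start = i
		}
	}
	if start >= 0 {
		out = append(out, s[start:])
	}
	return out
}

func main() {
	cs := naiveClusters("q\u0307\u0323ab")
	fmt.Println(len(cs)) // 3: measuring by cluster
	fmt.Printf("%q\n", cs[0])
}
```

Indexing or slicing by cluster then means first building this list, which is again a linear scan of the string.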

iv_08 · on Oct 24, 2013

I think the number of displayed characters is even font-dependent.

codeka · on Oct 24, 2013

Because even that is not exactly trivial, particularly for non-Latin languages.
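A Go sketch of where a simple "every non-mark code point starts a new character" rule breaks. The Korean syllable 가 can be written as the precomposed U+AC00 or as the conjoining jamo U+1100 U+1161, which are ordinary letters (category Lo), not combining marks. An emoji ZWJ sequence has the same problem. The counter `marksOnlyCount` is hypothetical; UAX #29 treats each example as a single grapheme cluster.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// marksOnlyCount counts characters by the rule "every code point that is
// not a combining mark (Mn, Mc, Me) starts a new character".
func marksOnlyCount(s string) int {
	n := 0
	for _, r := range s {
		if !unicode.In(r, unicode.Mn, unicode.Mc, unicode.Me) {
			n++
		}
	}
	return n
}

func main() {
	precomposed := "\uAC00"     // 가 as one code point
	jamo := "\u1100\u1161"      // 가 spelled with conjoining jamo
	fmt.Println(marksOnlyCount(precomposed), marksOnlyCount(jamo)) // 1 2
	fmt.Println(utf8.RuneCountInString(jamo))                      // 2

	// WOMAN + ZERO WIDTH JOINER + PERSONAL COMPUTER: often shown as one emoji.
	fmt.Println(marksOnlyCount("\U0001F469\u200D\U0001F4BB")) // 3
}
```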