Given that one can make graphemes that contain many Unicode code points (213 in this extreme example: https://www.reddit.com/r/Unicode/comments/4yie0a/tallest_lon...), I wondered whether that can be correct for any reasonable definition of "fully composed character".
As far as I can tell, it turned out to be correct, though. Perl 6 has its own Unicode normalization variant called NFG (https://design.perl6.org/S15.html#NFG):
"Formally Perl 6 graphemes are defined exactly according to Unicode Grapheme Cluster Boundaries at level "extended" (in contrast to "tailored" or "legacy"), see Unicode Standard Annex #29 UNICODE TEXT SEGMENTATION 3 Grapheme Cluster Boundaries. This is the same as the Perl 5 character class
\X Match Unicode "eXtended grapheme cluster"
With NFG, strings start by being run through the normal NFC process, compressing any given character sequences into precomposed characters.
Any graphemes remaining without precomposed characters, such as ậ or नि, are given their own internal designation to refer to them, at least 32 bits in length, in such a way that they avoid clashing with any potential future changes to Unicode. The mapping between these internal designations and graphemes in this form is not guaranteed constant, even between strings in the same process."
From that, I guess Perl 6 extends its mapping between NFG code points (an extension of Unicode code points) and Unicode grapheme clusters whenever it encounters a grapheme cluster it hasn't seen before. Ignoring performance concerns (it might not be bad, but I'm not sure about that), that seems like a nice approach.
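To make my guess concrete, here is a rough Python sketch of such an on-demand mapping. It is not how Perl 6 actually implements NFG: the NFGTable class and its methods are made up for illustration, the use of negative integers for the internal designations is my assumption (it just guarantees they never clash with real code points), and the cluster splitting is a crude "base character plus following marks" rule rather than the full UAX #29 extended grapheme cluster algorithm.

```python
import unicodedata


class NFGTable:
    """Toy interning table (hypothetical): one synthetic ID per multi-code-point cluster."""

    def __init__(self):
        self._ids = {}       # cluster string -> synthetic (negative) ID
        self._clusters = []  # index (-id - 1) -> cluster string

    @staticmethod
    def _split(text):
        # Simplification, NOT full UAX #29: a cluster is one code point
        # followed by any code points whose general category is a mark (M*).
        clusters = []
        for ch in text:
            if clusters and unicodedata.category(ch).startswith("M"):
                clusters[-1] += ch
            else:
                clusters.append(ch)
        return clusters

    def encode(self, text):
        # Step 1: NFC, so anything with a precomposed form becomes one code point.
        # Step 2: clusters still longer than one code point get a synthetic ID,
        # allocated the first time that cluster is seen.
        out = []
        for cluster in self._split(unicodedata.normalize("NFC", text)):
            if len(cluster) == 1:
                out.append(ord(cluster))
                continue
            if cluster not in self._ids:
                self._clusters.append(cluster)
                self._ids[cluster] = -len(self._clusters)
            out.append(self._ids[cluster])
        return out

    def decode(self, codes):
        # Real code points map back via chr(); synthetic IDs via the table.
        return "".join(chr(c) if c >= 0 else self._clusters[-c - 1] for c in codes)
```

With this, NFGTable().encode("e\u0301q\u0303") gives [233, -1]: the e plus combining acute collapses to the precomposed é (U+00E9), while q plus combining tilde has no precomposed form and gets the first synthetic ID. Every cluster is then a single integer, so length and indexing work per grapheme, at the cost of a table that grows with every new cluster seen.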