[iip] Conjuncts are not selected as a single unit when styling initials (#116)

r12a has just created a new issue for https://github.com/w3c/iip:

== Conjuncts are not selected as a single unit when styling initials ==
When a line starts with a consonant cluster that is rendered as a conjunct (rather than with a visible virama), ::first-letter should highlight the whole cluster. Modern Tamil normally uses only two such conjuncts, KSHA and SHRI, but one of them can be encoded in two ways, making a total of three clusters to test.

This fails if segmentation relies on Unicode grapheme clusters, because a conjunct of two consonants is parsed as two grapheme clusters: the first ends after the virama, and the second starts with the second consonant and includes any following vowel-signs or other combining characters.

For these situations the segmentation algorithm must be tailored so that it recognises the whole consonant cluster, plus any attached vowel-signs or combining characters, as a single unit. This is a particular issue for Tamil, since all other clusters are typically decomposed and show the virama.
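The tailoring described above can be sketched in Python. This is a minimal illustration under stated assumptions, not an implementation of UAX #29 or of any browser's segmenter. `simple_clusters` roughly approximates default grapheme segmentation for Tamil by never breaking before a combining mark. `tailored_clusters` then merges a consonant+virama cluster with the following cluster when the two together spell one of the conjuncts in a hypothetical `CONJUNCTS` table. The table lists KSHA and two encodings of SHRI, assuming that SHRI is the conjunct that can be written with either SA or SHA.

```python
import unicodedata

VIRAMA = "\u0BCD"  # TAMIL SIGN VIRAMA (pulli)

# Hypothetical tailoring table for the conjuncts named in this issue.
# Assumption: SHRI is the conjunct that may be encoded with SA or with SHA.
CONJUNCTS = (
    "\u0B95\u0BCD\u0BB7",        # KA + virama + SSA (KSHA)
    "\u0BB8\u0BCD\u0BB0\u0BC0",  # SA + virama + RA + II (SHRI)
    "\u0BB6\u0BCD\u0BB0\u0BC0",  # SHA + virama + RA + II (SHRI, alternative)
)


def simple_clusters(text: str) -> list[str]:
    """Rough stand-in for default grapheme segmentation in Tamil:
    never break before a combining mark (categories Mn, Mc, Me)."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # attach vowel sign / virama to its base
        else:
            clusters.append(ch)
    return clusters


def tailored_clusters(text: str) -> list[str]:
    """Merge a cluster ending in a virama with the next cluster when the
    pair spans one of the conjuncts in CONJUNCTS."""
    result: list[str] = []
    for cluster in simple_clusters(text):
        prev = result[-1] if result else ""
        # len(prev) < len(c) ensures the conjunct crosses this boundary,
        # so an already-merged KSHA + virama is not extended again.
        joins_conjunct = prev.endswith(VIRAMA) and any(
            len(prev) < len(c) and (prev + cluster).startswith(c)
            for c in CONJUNCTS
        )
        if joins_conjunct:
            result[-1] = prev + cluster
        else:
            result.append(cluster)
    return result


if __name__ == "__main__":
    word = "\u0B95\u0BCD\u0BB7\u0BC7\u0BA4"  # KA, virama, SSA, EE, TA
    print(simple_clusters(word)[0])    # default: KA + virama only
    print(tailored_clusters(word)[0])  # tailored: KSHA + vowel sign EE
```

Only sequences listed in the table are merged. Every other virama+consonant sequence keeps the default split, which matches the visible-pulli rendering of the remaining Tamil clusters.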



<b class="subhead">Specs:</b>

[css-text-3](https://drafts.csswg.org/css-text-3/#typographic-character-unit) The CSS specs use the concept of <a href="https://drafts.csswg.org/css-text-3/#typographic-character-unit">'typographic character unit'</a> rather than grapheme cluster, explaining that cases like those just described go beyond the scope of the grapheme cluster concept and that implementations should provide appropriate support. The spec does not, however, detail the support needed for each language.

The Unicode Consortium has made some attempts to address this issue, but so far they have not yielded results. CLDR now flags a few scripts in which conjuncts are common; Tamil is not among them.


<b class="subhead">Tests & results:</b>
<i>Interactive test</i>, [When ::first-letter is applied to Tamil the browser will select the KSHA and SHRI conjuncts as a single unit](https://github.com/w3c/line_paragraph_tests/issues/72)<br>
<span class="pass">Gecko</span> produces the expected result. <span class="fail">Blink</span> and <span class="fail">WebKit</span> select only the first consonant+pulli.




<b class="subhead">Priority:</b>
Keeping conjuncts together is a pretty basic requirement. Without a fix, authors have to mark up text manually to apply initial-letter styling, which is not a very useful workaround.
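Generating that markup can at least be automated. The following is a hypothetical helper, not part of any spec or browser. `wrap_initial` wraps the opening unit of a string in a `<span>` with an illustrative class name, `initial`, so the unit can be styled with a class selector instead of ::first-letter. It uses an assumed table of the conjuncts named in this issue (KSHA, plus SHRI encoded with either SA or SHA) and extends the unit over any combining marks that follow. It ignores other ::first-letter rules, such as the inclusion of leading punctuation.

```python
import html
import unicodedata

# Hypothetical table of the Tamil conjuncts named in the issue. SHRI is listed
# twice on the assumption that it may be encoded with SA or with SHA.
CONJUNCTS = (
    "\u0B95\u0BCD\u0BB7",        # KA + virama + SSA (KSHA)
    "\u0BB8\u0BCD\u0BB0\u0BC0",  # SA + virama + RA + II (SHRI)
    "\u0BB6\u0BCD\u0BB0\u0BC0",  # SHA + virama + RA + II (SHRI, alternative)
)


def initial_length(text: str) -> int:
    """Number of code points in the initial unit: a known conjunct, or else
    one base character, extended over any combining marks that follow."""
    if not text:
        return 0
    end = next((len(c) for c in CONJUNCTS if text.startswith(c)), 1)
    while end < len(text) and unicodedata.category(text[end]).startswith("M"):
        end += 1
    return end


def wrap_initial(text: str, css_class: str = "initial") -> str:
    """Return HTML with the initial unit wrapped in a <span>; the default
    class name is illustrative only."""
    n = initial_length(text)
    if n == 0:
        return ""
    first, rest = html.escape(text[:n]), html.escape(text[n:])
    return f'<span class="{html.escape(css_class)}">{first}</span>{rest}'


if __name__ == "__main__":
    print(wrap_initial("\u0BB8\u0BCD\u0BB0\u0BC0\u0B95"))  # SHRI wrapped whole
```

Because the conjunct list is hard-coded, this only shifts the manual work from the markup to the table, which is part of why a proper fix in the browser's segmentation is preferable.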


Please view or discuss this issue at https://github.com/w3c/iip/issues/116 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Tuesday, 30 March 2021 14:16:30 UTC