ISSUE-313: Definition of grapheme clusters ⓒ

State:
CLOSED
Product:
css-text
Raised by:
Richard Ishida
Opened on:
2013-12-11
Description:
1.3. Terminology
http://www.w3.org/TR/css3-text/#terms

"A grapheme cluster is what a language user considers to be a character or a basic unit of the script."
"The UA may further tailor the definition as required by typographical tradition."
Example 1

I think a grapheme cluster should be defined in the CSS spec as follows: A grapheme cluster is a sequence of characters as defined by the Unicode specification that should be treated as a unit for typographic processing. This generally approximates to what a language user considers to be a letter or basic unit of the script.

I don't think applications should redefine what a grapheme cluster is; that definition is established by the Unicode standard. Rather, we should say that applications sometimes require additional rules beyond the use of 'grapheme clusters' in order to handle the typographic traditions of particular scripts.

An appropriate example for this section of where further rules are needed is that of Devanagari syllables, where the grapheme cluster only includes part of the syllable. For an example, see the last picture on the page at http://rishida.net/docs/unicode-tutorial/part3#graphemes and the text below it. For most operations that rely on grapheme clusters, Devanagari needs additional rules to keep together the whole typographic syllable. This issue is relevant for a large proportion of complex scripts.
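The kind of additional rule described above can be sketched as follows. This is only an illustration, not an implementation of the Unicode segmentation rules: `simple_clusters` is a hypothetical stand-in that attaches combining marks (general categories Mn, Mc, Me) to the preceding base character, and `typographic_units` is a hypothetical extra rule that joins a cluster ending in the Devanagari virama (U+094D) to the cluster that follows it. Joiner characters such as ZWJ and ZWNJ, and scripts other than Devanagari, are deliberately ignored.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA


def simple_clusters(text):
    """Rough approximation of grapheme clusters (assumption, not UAX #29):
    a base character followed by any combining marks."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            clusters[-1] += ch  # attach the mark to the current cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


def typographic_units(text):
    """Additional rule layered on top of the clusters: a cluster that ends
    in a virama is kept together with the cluster that follows it."""
    units = []
    for cluster in simple_clusters(text):
        if units and units[-1].endswith(VIRAMA):
            units[-1] += cluster
        else:
            units.append(cluster)
    return units


# KA + VIRAMA + SSA + VOWEL SIGN I: one typographic syllable
word = "\u0915\u094D\u0937\u093F"

print(len(simple_clusters(word)))    # 2
print(len(typographic_units(word)))  # 1
```

The cluster step alone splits the syllable after the virama, which is the behaviour the issue describes; the second pass is where a script-specific rule would keep the syllable whole.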

I think the example of the Thai behaviour would be better placed as a note in the letter-spacing and justification sections, especially since I believe the behaviour described is not relevant to line breaking and other operations.

It may be worth mentioning, also, that although the Thai examples show that U+0E33 THAI CHARACTER SARA AM needs to be decomposed first, the desired behaviour still relies on correct application of the standard grapheme cluster rules thereafter to ensure that the small circle resulting from the decomposition stays with the base character and other associated diacritics.
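A minimal sketch of that two-step behaviour, under stated assumptions: `decompose_sara_am` is a hypothetical helper that replaces U+0E33 with U+0E4D THAI CHARACTER NIKHAHIT followed by U+0E32 THAI CHARACTER SARA AA, and `simple_clusters` is a hypothetical approximation of grapheme clustering that attaches combining marks (general categories Mn, Mc, Me) to the preceding base character. It is not the full Unicode segmentation algorithm, and it does not reorder the nikhahit relative to any tone mark.

```python
import unicodedata

SARA_AM = "\u0E33"   # THAI CHARACTER SARA AM
NIKHAHIT = "\u0E4D"  # THAI CHARACTER NIKHAHIT (the small circle)
SARA_AA = "\u0E32"   # THAI CHARACTER SARA AA


def decompose_sara_am(text):
    """Split SARA AM into NIKHAHIT + SARA AA before clustering."""
    return text.replace(SARA_AM, NIKHAHIT + SARA_AA)


def simple_clusters(text):
    """Rough approximation of grapheme clusters (assumption, not UAX #29):
    a base character followed by any combining marks."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            clusters[-1] += ch  # attach the mark to the current cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


# NO NU + MAI THO (tone mark) + SARA AM
word = "\u0E19\u0E49\u0E33"
clusters = simple_clusters(decompose_sara_am(word))

print(len(clusters))                # 2
print(clusters[0][-1] == NIKHAHIT)  # True
```

After decomposition the nikhahit lands in the same cluster as the base consonant and the tone mark, while SARA AA forms a separate unit, so spacing inserted between units does not detach the circle from its base.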

Related Action Items:
No related actions
Related emails:
  1. Re: [css-text] I18N-ISSUE-313: Definition of grapheme clusters (from ishida@w3.org on 2014-08-07)
  2. Re: [css-text] I18N-ISSUE-313: Definition of grapheme clusters (from lang.support@gmail.com on 2014-06-26)
  3. Re: [css-text] I18N-ISSUE-313: Definition of grapheme clusters (from jjc@jclark.com on 2014-06-26)
  4. Re: [css-text] I18N-ISSUE-313: Definition of grapheme clusters (from fantasai.lists@inkedblade.net on 2014-06-25)
  5. [minutes] Internationalization telecon 2014-05-29 (from ishida@w3.org on 2014-05-29)
  6. [minutes] Internationalization telecon 2014-04-24 (from ishida@w3.org on 2014-05-28)
  7. Re: [css-text] I18N-ISSUE-313: Definition of grapheme clusters (from ishida@w3.org on 2014-05-22)
  8. Re: [css-text] I18N-ISSUE-313: Definition of grapheme clusters (from ishida@w3.org on 2014-02-21)
  9. RE: [css-text] I18N-ISSUE-313: Definition of grapheme clusters (from addison@lab126.com on 2014-01-24)
  10. Re: [css-text] I18N-ISSUE-313: Definition of grapheme clusters (from cowan@mercury.ccil.org on 2014-01-24)
  11. [css-text] I18N-ISSUE-313: Definition of grapheme clusters (from addison@lab126.com on 2014-01-24)
  12. [minutes] Internationalization telecon 2013-12-19 (from ishida@w3.org on 2014-01-14)
  13. I18N-ISSUE-313: Definition of grapheme clusters [.prep-CSS3-text] (from sysbot+tracker@w3.org on 2013-12-11)

Related notes:

WG comment.

Addison Phillips, 23 Jan 2014, 18:05:04

I think the text now looks pretty good in the latest editor's draft.

Richard Ishida, 26 Jun 2014, 13:30:36
