[csswg-drafts] [css-text] Need additional value of word-break for Korean (#4285) from Florian Rivoal via GitHub on 2019-09-09 (public-css-archive@w3.org from September 2019)

From: Florian Rivoal via GitHub <sysbot+gh@w3.org>
Date: Mon, 09 Sep 2019 02:33:02 +0000
To: public-css-archive@w3.org
Message-ID: <issues.opened-490847663-1567996380-sysbot+gh@w3.org>
frivoal has just created a new issue for https://github.com/w3c/csswg-drafts:

== [css-text] Need additional value of word-break for Korean ==
In regular situations, `word-break: normal` is expected to pick the right kind of word breaking for various scripts, keeping the letters of a word together in languages that have word-based line breaking, while allowing wraps between the letters of a word in languages where that's the normal behavior.

However, Korean typography has been evolving. While the `normal` value corresponds to what used to be normal, and needs to keep this behavior for compat reasons, the preferred behavior is increasingly the one achieved by `keep-all`.

In a document where all parts are properly language tagged, `* { word-break: normal; } :lang(ko) { word-break: keep-all; }` achieves the desired behavior.

However, this is not quite enough to solve the problem in the case of documents with user-generated content: when a user types content in a textarea or a contenteditable (or if user-generated content is retrieved from a database), the author of the page does not generally know what the language is, and cannot tag it in the markup. The following options are available to them, none of them great:
* use `word-break: normal` on elements accepting user input: This will do "the right thing" for all languages, except for Korean, which will break too often.
* use `word-break: keep-all` on elements accepting user input: this will do "the right thing" for space-separated languages, including Korean, but will badly break languages like Japanese or Chinese, by disabling wrapping opportunities and causing potential overflow.
* use `word-break: normal` on elements accepting user input, but also add a piece of JavaScript that monitors the content for changes, and switches the whole element to `word-break: keep-all` if any hangul text is detected:
  - This breaks if the content input by the user contains a mixture of Korean and languages like Japanese or Chinese, as it would apply `keep-all` to them as well.
  - This isn't a purely declarative solution, so it fails if JavaScript is disabled.
* Use `* { word-break: normal; } :lang(ko) { word-break: keep-all; }` together with a piece of JavaScript that adds the `lang=ko` attribute (and creates spans/divs as necessary to apply it) on the parts of the text input by the user that contain hangul, and lang="" (or lang=somethingelse, if the somethingelse can be detected reliably) on parts that don't:
  - Getting this script right is very difficult. Not merely because of how it must analyse the content and adjust the markup accordingly, but also because of how it would need to integrate with editing operations: how to make these DOM modifications inside a `contenteditable` in a way that is compatible with the browser's undo stack? How to make them in a way that doesn't interfere with ongoing IME operations? How to make them in a way that is compatible with the hodgepodge of markup that different browsers may generate inside a `contenteditable` element? etc.
  - Getting this script to be correct AND performant is even harder. But performance is important: not all user input is tweet-sized. Think for instance of an online document editor, which may contain multiple pages of (multilingual) rich text.
  - This isn't a purely declarative solution, so it fails if JavaScript is disabled.
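
To make the limitation of the third option concrete, here is a minimal sketch of the detection step only. `pickWordBreak` is a hypothetical helper, not an existing API, and treating "hangul" as any code point with the Unicode `Script=Hangul` property is an assumption of this sketch. A real page would still have to watch the element for changes and assign the result to its `word-break` style.

```javascript
// Hypothetical helper, for illustration only.
// Assumption: "hangul" means any code point whose Unicode Script property is Hangul.
const HANGUL = /\p{Script=Hangul}/u;

// Returns the word-break value the script would apply to the WHOLE element.
function pickWordBreak(text) {
  return HANGUL.test(text) ? "keep-all" : "normal";
}

console.log(pickWordBreak("Hello world"));          // normal
console.log(pickWordBreak("안녕하세요 여러분"));      // keep-all
// Mixed content: keep-all is chosen for the whole element,
// so the Japanese part loses its wrapping opportunities too.
console.log(pickWordBreak("안녕하세요 こんにちは"));  // keep-all
```

The last call shows the problem described above: a single element-wide value cannot be right for both the Korean and the Japanese part.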

So, to solve this, I propose that we add a `keep-all-hangul` value that behaves the same as `keep-all` for the Unicode characters that correspond to hangul, and as `normal` for everything else.
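
As a rough illustration of the per-character granularity this implies, the sketch below splits a string into runs of hangul and non-hangul characters. It is not an implementation of line breaking; `splitByHangul` is a hypothetical name, and using the Unicode `Script=Hangul` property to decide which characters "correspond to hangul" is an assumption, since the exact set is not defined here.

```javascript
// Hypothetical helper, for illustration only.
// Assumption: a character "corresponds to hangul" if its Unicode Script is Hangul.
const HANGUL_CHAR = /\p{Script=Hangul}/u;

// Groups consecutive characters into runs that are either all hangul or all non-hangul.
function splitByHangul(text) {
  const runs = [];
  for (const ch of text) { // iterates by code point, not by UTF-16 unit
    const hangul = HANGUL_CHAR.test(ch);
    const last = runs[runs.length - 1];
    if (last && last.hangul === hangul) {
      last.text += ch;
    } else {
      runs.push({ text: ch, hangul });
    }
  }
  return runs;
}

console.log(splitByHangul("한국어 日本語"));
// [ { text: '한국어', hangul: true }, { text: ' 日本語', hangul: false } ]
```

Under the proposal, runs marked `hangul: true` would get `keep-all` behavior and the rest would get `normal` behavior, without any script or markup changes on the page.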

Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/4285 using your GitHub account
Received on Monday, 9 September 2019 02:33:03 UTC