[csswg-drafts] [css-text-3] Discarding Line Breaks Adjacent to Ambiguous Characters (#5017)

fantasai has just created a new issue for https://github.com/w3c/csswg-drafts:

== [css-text-3] Discarding Line Breaks Adjacent to Ambiguous Characters ==
The discussion in #337 has veered off in many directions, but @hax originally filed that issue to raise the question of "ambiguous" characters, i.e. those commonly used both within and outside Chinese and Japanese contexts:

> https://drafts.csswg.org/css-text-3/#line-break-transform
> 
> > Otherwise, if the East Asian Width property [UAX11] of both the character before and after the line feed is F, W, or H (not A), and neither side is Hangul, then the segment break is removed.
> 
> Under this rule, common uses of quotation marks in Chinese
>
> ```
> 简体中文的
> “引号”
> 两边不应该有空格。
> ```
> 
> will have unexpected spaces, because quotation marks are _A_.
> 
> Ideally, we should consider the language information of the context. If the context is an East Asian language, _A_ should be treated as _W_. Even in an unknown language context, if either side of the line feed is _A_ and the other side is _F_, _W_ or _H_, the segment break should also be removed.

We decided to switch to a Unicode Block listing instead of relying on the East Asian Width property (in particular due to some backwards-incompatible changes on Unicode's side). The current draft has no concept of ambiguous characters: every character is strongly either "discard" or "don't discard", and the line break is discarded only when the characters on both sides of it are "discard".
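
To make the two-class rule concrete, here is a minimal Python sketch. The ranges in `DISCARD_RANGES` are an illustrative subset picked for this example, not the draft's actual block listing, and `is_discard` and `removes_break` are hypothetical names.

```python
# ASSUMPTION: placeholder ranges for illustration only; the draft's real
# block listing is longer and may differ.
DISCARD_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]


def is_discard(ch: str) -> bool:
    """Return True if the character falls in one of the listed ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in DISCARD_RANGES)


def removes_break(before: str, after: str) -> bool:
    """Two-class rule: the break is removed only if both sides are 'discard'."""
    return is_discard(before) and is_discard(after)


print(removes_break("的", "两"))       # True: ideograph on both sides
print(removes_break("的", "\u201C"))   # False: the quotation mark is "don't discard"
```

Under this sketch the line break before the opening quotation mark in @hax's example is kept, so it still turns into a space.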

We might want to consider classifying some characters as "ambiguous", particularly symbols and maybe also the few common punctuation marks used in Chinese (double quotes, specifically). These could defer to the character on the other side, and if both are ambiguous, default to "don't discard".
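
A rough Python sketch of how such a three-class rule could behave, under stated assumptions: `AMBIGUOUS_CHARS` contains only the two double quotation marks, `DISCARD_RANGES` is an illustrative subset rather than the draft's listing, and all names here are hypothetical.

```python
from enum import Enum


class BreakClass(Enum):
    DISCARD = "discard"
    KEEP = "don't discard"
    AMBIGUOUS = "ambiguous"


# ASSUMPTION: only the double quotation marks are ambiguous in this sketch.
AMBIGUOUS_CHARS = {"\u201C", "\u201D"}

# ASSUMPTION: placeholder ranges, not the draft's real block listing.
DISCARD_RANGES = [
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]


def classify(ch: str) -> BreakClass:
    if ch in AMBIGUOUS_CHARS:
        return BreakClass.AMBIGUOUS
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in DISCARD_RANGES):
        return BreakClass.DISCARD
    return BreakClass.KEEP


def removes_break(before: str, after: str) -> bool:
    a, b = classify(before), classify(after)
    # An ambiguous side defers to the class of the other side.
    if a is BreakClass.AMBIGUOUS:
        a = b
    if b is BreakClass.AMBIGUOUS:
        b = a
    # If both sides were ambiguous, both stay AMBIGUOUS and the break is kept.
    return a is BreakClass.DISCARD and b is BreakClass.DISCARD


print(removes_break("的", "\u201C"))      # True: the quote defers to the ideograph
print(removes_break("\u201D", "两"))      # True
print(removes_break("\u201C", "\u201D"))  # False: both sides ambiguous
print(removes_break("a", "\u201C"))       # False: the quote defers to a Latin letter
```

With this variant both line breaks in @hax's example are removed, while quotation marks next to Latin text keep their break. Whether the classification should also depend on the content language is left open here, as in the question below.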

Do we want to do this? If so, should it be language-dependent or universal?

Please view or discuss this issue at https://github.com/w3c/csswg-drafts/issues/5017 using your GitHub account

Received on Tuesday, 28 April 2020 07:30:07 UTC