W3C

– DRAFT –
Clreq Editors' Call

07 November 2019

Attendees

Present
Eric, xfq
Regrets
huijing, Yijun
Chair
xfq
Scribe
xfq

Meeting minutes

Line breaking for ambiguous characters

https://github.com/w3c/csswg-drafts/issues/4419

Eric: I saw this issue
… two things
… the code point of the dash is the same in Chinese and English
… in Latin, there are En Dash and Em Dash
… in Chinese, there are connector marks and two-em dashes
… @@
… I wrote an article about two-em dashes
… in strict line-breaking rules, there are prohibition rules for line start/end

https://thetype.com/2019/03/14918/

Eric: the two-em dash is already two characters wide
… applying prohibition rules would cause three character positions to be moved
… that would be very ugly, especially where the column width is narrow, as in magazines
… personally, I don't think prohibition rules for line start/end are needed for two-em dashes
… but two-em dashes shouldn't be broken in the middle
… the situation for other dash-like characters is the same
… in strict line-breaking rules, there are prohibition rules, but normally we don't enforce them
… these dash-like characters can appear at line end, but not line start
… for example, connector marks are used to indicate ranges
… I think Unicode also has line breaking algorithm for these characters

xfq: do you mean UAX #14?

Eric: yes
… U+2010 (hyphen) is not used in Chinese
… it is only used for hyphenation in English
… English uses U+2013 (En Dash) to indicate range
… Chinese sometimes uses U+2013 for @@
… U+2013 is shorter than U+2014
… I added a note in the Connector Marks section

https://w3c.github.io/clreq/#h-note-23

Eric: The Guobiao standard does not state the corresponding code point for the three types of connector marks
… we can make the deduction that the long connector mark [—] is U+2014 EM DASH [—]
… and the tilde [～] can be U+FF5E FULLWIDTH TILDE [～]
… the width of U+301C (wave dash) is font-dependent and cannot be guaranteed

xfq: U+FF5E should be fullwidth

Eric: but the width of U+301C is not clear
… so I recommend using U+FF5E
… I am worried that clreq and UAX #14 are not consistent

xfq: UAX #14 behavior can be overridden by browsers if necessary

Eric: since the short connector mark should take half the width of the long connector mark
… it should be U+2013 EN DASH [–]
… the position of dash/hyphen is usually not vertically centered
… it's in the middle between ascender and descender
… but in Chinese it should be vertically centered
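
As a reading aid for the code points discussed above, here is a minimal sketch that looks up the Unicode name and East Asian Width property of each candidate using Python's standard `unicodedata` module. The role labels and the `describe` helper are illustrative assumptions, not terms from clreq or the Guobiao standard, and the East Asian Width property is a character property, not a guarantee of rendered glyph width.

```python
import unicodedata

# Code points named in the discussion; the role labels are informal.
CANDIDATES = {
    "long connector mark": "\u2014",
    "short connector mark": "\u2013",
    "tilde (recommended)": "\uFF5E",
    "wave dash": "\u301C",
    "hyphen": "\u2010",
}

def describe(ch):
    """Return (code point, Unicode name, East Asian Width) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.east_asian_width(ch),
    )

for role, ch in CANDIDATES.items():
    print(role, *describe(ch))
```

The printed property values depend on the Unicode version bundled with the Python interpreter.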

xfq: can we adjust the position of the dash character when lang=zh-* ?

Eric: it is possible
… but the font needs to support it
… let's go back to the issue itself

xfq: Koji mentioned that Gecko supports the "breaks before hyphens" rule only when the hyphen-like character follows Japanese characters, not when it follows Latin letters
… and this seems to make sense to be added to the CSS spec

Eric: https://github.com/w3c/csswg-drafts/issues/4419#issuecomment-550116964
… "I checked JLREQ line break table, and found that it prohibits break before cl-03."
… it prohibits break only *before* hyphens, not after them
… so hyphens can be used at line end
… Chinese is the same as Japanese in this respect
… we should also check UAX #14
https://unicode.org/reports/tr14/
https://unicode.org/reports/tr14/#Table1
… Em dash
… is B2
… hyphens are BA
… if Em dash is used in Chinese, it may appear at line start, and this is wrong

xfq: so Em dash
… should be BA in Chinese?

Eric: yes
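
The difference between the two classes mentioned above can be sketched as follows. This is a deliberately simplified model, not an implementation of UAX #14: the table is hand-copied for only the characters named in the discussion, the helper name `may_start_line` is hypothetical, and the pair rules and context handling of the real algorithm are ignored.

```python
# Line break classes for the characters named in the discussion,
# hand-copied for illustration. A real implementation would read
# the Unicode LineBreak.txt data file instead.
LINE_BREAK_CLASS = {
    "\u2010": "BA",  # HYPHEN
    "\u2013": "BA",  # EN DASH
    "\u2014": "B2",  # EM DASH
}

def may_start_line(ch):
    """Simplified check: may a line break occur directly before ch?

    BA (Break After) offers a break only after the character, so the
    character stays at line end. B2 (Break Opportunity Before and After)
    also offers a break before it, so the character can land at line start.
    Raises KeyError for characters outside the small table above.
    """
    return LINE_BREAK_CLASS[ch] == "B2"

for ch in LINE_BREAK_CLASS:
    print(f"U+{ord(ch):04X}", LINE_BREAK_CLASS[ch], may_start_line(ch))
```

Under this model, changing the entry for U+2014 to BA for Chinese text is what would keep the em dash off the line start.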

xfq: Em dash is not mentioned in https://github.com/w3c/csswg-drafts/issues/4419
… that's a different issue

Eric: yes
… if you use two consecutive em dashes for two-em dash, it might be broken in the middle, per UAX #14
… U+2E3A TWO-EM DASH is not widely supported by fonts and input methods
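
One possible authoring-side workaround for keeping a two-em dash written as two U+2014 together, sketched here as an assumption and not something proposed in the meeting, is to insert U+2060 WORD JOINER between adjacent em dashes. The function name `join_em_dash_runs` is hypothetical.

```python
import re

EM_DASH = "\u2014"
WORD_JOINER = "\u2060"  # U+2060 WORD JOINER prohibits a line break on either side

def join_em_dash_runs(text):
    """Insert U+2060 between adjacent U+2014 so a run is not split across lines."""
    # The lookahead matches every em dash that is directly followed by another one.
    return re.sub(f"{EM_DASH}(?={EM_DASH})", EM_DASH + WORD_JOINER, text)

sample = "\u4ed6\u8bf4\u2014\u2014\u597d"
print([f"U+{ord(c):04X}" for c in join_em_dash_runs(sample)])
```

A caveat on this design: the inserted character may interfere with font features that expect two directly adjacent U+2014, so it is a trade-off and not a general fix.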

xfq: Ellipses might suffer from similar issues

Eric: Source Han Sans uses ccmp (Glyph Composition) to render two U+2014 as a U+2E3A

xfq: in summary, the rules for Chinese and Japanese are the same for https://github.com/w3c/csswg-drafts/issues/4419

Eric: yes

What does fangsong map to for non chinese text

https://github.com/w3c/csswg-drafts/issues/4425

xfq: what font should fangsong map to for characters outside of Chinese?
… we need to think more, not only for fangsong, but also for other new generic font families

Minutes manually created (not a transcript), formatted by Bert Bos's scribe.perl version Mon Apr 15 13:11:59 2019 UTC, a reimplementation of David Booth's scribe.perl. See history.