Re: [css-text-4] feedback on hyphenation

[tl;dr - angst about hyphens that can be ignored]

On Tue, 2014-05-20 at 04:42 +0000, Koji Ishii wrote:

> If you still have use cases to specify this word is “food thief” and
> that word is “carpet thief”, it looks to me that it’s a semantic issue
> since you don’t want to change the meaning of words when styles were
> changed.

This happens in quite a few languages, and today's word processing and
typesetting systems (including TeX) deal with it using a dictionary.

It's not an edge case; it's part of doing hyphenation. A good approach
might be to say that a user agent should not normally attempt to
hyphenate text in languages it does not support. That way, a user who
reads Swedish will probably have the Swedish locale and dictionary
installed, and will see hyphenated text, even if they are not in Sweden
and even if their primary language is (say) Japanese. If they do not have
the Swedish locale and dictionary installed, the text will still make
sense to them, and another user who sees the text, and perhaps copies
and pastes it, won't get hyphenations that could change the meaning.
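As a rough illustration of that policy, here is a minimal Python sketch.
It assumes a hypothetical DICTIONARIES table mapping a primary language
subtag to whole-word hyphenation entries (loosely like TeX's exception
list); the entries and the break_opportunities() function are made up
for illustration and are not taken from any user agent. The only point
is that a language with no installed dictionary yields no break
opportunities at all, rather than guessed ones.

```python
# Hypothetical per-language dictionaries: word -> hyphenated form.
# Entries are illustrative only; a real user agent would ship full
# pattern sets plus exception lists.
DICTIONARIES = {
    "en": {"hyphenation": "hy-phen-ation", "typesetting": "type-set-ting"},
}


def break_opportunities(word, lang, dictionaries=DICTIONARIES):
    """Return character offsets at which `word` may be hyphenated.

    Assumed policy: if no dictionary is installed for the primary
    language of `lang`, or the word is not in it, return no
    opportunities instead of guessing (a guess could change meaning).
    """
    primary = lang.split("-")[0].lower()
    table = dictionaries.get(primary)
    if table is None:
        return []  # unsupported language: never hyphenate
    entry = table.get(word.lower())
    if entry is None:
        return []  # unknown word: leave it whole
    offsets, pos = [], 0
    for part in entry.split("-")[:-1]:
        pos += len(part)
        offsets.append(pos)
    return offsets


print(break_opportunities("hyphenation", "en-US"))  # [2, 6]
print(break_opportunities("hyphenation", "sv"))     # []
```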

The spec also needs to be clearer about how &shy; (U+00AD SOFT HYPHEN)
interacts with the user agent, e.g. with copy/paste and search, and what
to do if the character is supplied as part of the value of a "content"
property.
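For the copy/paste and search part, one possible behaviour (an
assumption, not anything the spec currently says) is to treat U+00AD as
invisible to both. A minimal Python sketch, with hypothetical helper
names:

```python
SOFT_HYPHEN = "\u00ad"


def normalize_for_search(text: str) -> str:
    """Drop soft hyphens so they never affect matching or copied text.

    Hypothetical policy, for illustration only.
    """
    return text.replace(SOFT_HYPHEN, "")


def page_contains(page_text: str, query: str) -> bool:
    """In-page search that ignores soft hyphens on both sides."""
    return normalize_for_search(query) in normalize_for_search(page_text)


print(page_contains("rules for hy\u00adphen\u00adation", "hyphenation"))  # True
```

The cost of this choice is that nobody can ever search for a soft
hyphen itself, which is exactly the kind of decision the spec ought to
make explicitly.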

If hyphenation is under CSS control, how do you allow a word break with
no added hyphen after a / in one stylesheet and not in another? If a
long word contains a soft hyphen, can the formatter break the word
elsewhere? What if it contains a "-"?

If the user agent hyphenates automatically, do the inserted hyphens
appear in the DOM or not? (This varies between browsers today, I'm
told.) And that in turn can affect in-page search.
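One way to keep in-page search stable is never to put automatically
inserted hyphens into the DOM, and to add them only to what is painted.
A minimal Python sketch of that separation, assuming a hypothetical
LineFragment record (this is not how any particular browser represents
line boxes):

```python
from dataclasses import dataclass


@dataclass
class LineFragment:
    source: str    # text as it exists in the DOM (what search and copy see)
    rendered: str  # text as painted, possibly ending with an inserted hyphen


def hyphenate_at(word: str, offset: int) -> list[LineFragment]:
    """Split `word` at a hyphenation opportunity.

    The inserted "-" appears only in `rendered`, so joining the `source`
    fields always gives back the original word.
    """
    if not 0 < offset < len(word):
        raise ValueError("offset must fall strictly inside the word")
    head, tail = word[:offset], word[offset:]
    return [LineFragment(head, head + "-"), LineFragment(tail, tail)]


fragments = hyphenate_at("hyphenation", 6)
print([f.rendered for f in fragments])       # ['hyphen-', 'ation']
print("".join(f.source for f in fragments))  # hyphenation
```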

People are creating Web pages with hyphenation in various incompatible
ways today. So, if hyphenation is included, we should be clearer about
what it means. The Unicode Line Breaking Algorithm has some examples,
but they are not sufficient for implementations: we need normative text,
not just examples. The Unicode LBA document has an example of SHY
followed by NBHY, and examples suggesting what happens for Polish, but
not for English.

Maybe it's OK if we work quickly on the level 4 text, specify it more
fully, and have good tests. But I fear that the text as written is
difficult to write tests for, because it's not sufficiently precise and
firm.

But maybe I should just welcome css text 3 hyphenation as a step forward
towards documenting what Web browsers do today, and hope for better in
the future. Antenna House Formatter, for example, supports at least some
(maybe all) of the XSL-FO 2 hyphenation properties, and any rewrite of a
Web browser formatter's line breaking would need to take hyphenation
requirements into account.

So I think I've talked myself into accepting css text 3 hyphenation,
although I think it's crap :-), and into trying to improve it for css
text 4 (the draft is still pretty minimal).

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml

Received on Friday, 23 May 2014 23:43:35 UTC