Re: CSS3 Text: Line-breaking Properties

On Mon, 21 Apr 2003, fantasai wrote:

>   # In the most general case, (assuming no hyphenation dictionary is
>   # available to the UA), a line break can occur only at white space
>   # characters or hyphens, including U+00AD SOFT HYPHEN.
>
> This doesn't seem to match UAX 14.

In what sense? UAX 14 is complex and confusing, and too implicit in some
basic statements, but the reasonable interpretation is that it defines
_default_ line breaking rules for characters. The rules _permit_ line
breaks at certain points but do not require any particular behavior.
Unfortunately, some software vendors take those rules literally,
causing line breaks even in strings like a-b or even -b (literally!), but
it would not be fair to blame UAX 14 for that. Surely the idea is that
the default rules can be applied with discretion, using various criteria
to prevent line breaks where UAX 14 would allow them, and applying
additional line breaking principles when appropriate. And CSS is a "higher
level protocol" which can override any character-level rules.
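To illustrate "applying the default rules with discretion", here is a minimal sketch of a higher-level layer vetoing break opportunities that a character-level rule permits. The candidate rule below is a deliberately over-permissive stand-in (break after any space or hyphen-minus), not UAX 14 itself, and the function names and the min_part threshold are hypothetical choices made only for illustration.

```python
import re


def permissive_candidates(text: str) -> list[int]:
    """Hypothetical character-level rule (NOT UAX 14): allow a break before
    text[i] whenever text[i-1] is a space or hyphen-minus."""
    return [i for i in range(1, len(text))
            if text[i - 1] in " -" and not text[i].isspace()]


def discretion_filter(text: str, candidates: list[int],
                      min_part: int = 2) -> list[int]:
    """Higher-level veto: drop a break after '-' if the word fragment on
    either side of the hyphen is shorter than min_part (an assumed
    threshold). This suppresses breaks in strings like 'a-b' and '-b'."""
    kept = []
    for i in candidates:
        if text[i - 1] == "-":
            left = re.search(r"\w*$", text[: i - 1]).group()  # fragment before '-'
            right = re.match(r"\w*", text[i:]).group()         # fragment after '-'
            if len(left) < min_part or len(right) < min_part:
                continue
        kept.append(i)
    return kept
```

For "a-b well-known", the permissive rule offers breaks at indices 2, 4 and 9; the filter keeps only 4 (after the space) and 9 (inside "well-known").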

Since UAX 14 exists, it would probably be useful to have the _option_ of
suggesting UAX 14 rules in CSS, but they should surely not be the default.
As I discuss at http://www.cs.tut.fi/~jkorpela/unicode/linebr.html
the UAX 14 rules are far too mechanical to form a sound basis for general
text processing and display. Just breaking a line at some point, with no
indication of what has happened, does no good to many constructs that
are actually used on Web pages.

> line-break-general
>    normal   - as defined in UAX 14 for non-ideographic
>    strict   - only break on spaces and other explicit opportunities like zwsp
>    anywhere - as for "word-break-cjk: break-all"

Presumably "normal" is supposed to be the initial value, and I strongly
disagree. What you describe as "strict" is what dominated on the Web for
years and is easily understood, except for the zwsp part. It should be the
default, and the UAX 14 based method should have a name that clearly
reflects its definition, like "unicode-line-breaking". And for practical
reasons, a value (e.g., "after-hyphen") that allows line breaks after
hyphen-minus characters and is otherwise identical to the default would
be useful.
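A minimal sketch of how the proposed default ("strict": break only after white space) would differ from the suggested "after-hyphen" value. The function name is hypothetical, and the extra condition that the hyphen must follow a letter or digit (so that a string like -b is never split) is an assumption; soft hyphens and zwsp are not handled.

```python
def break_opportunities(text: str, mode: str = "strict") -> list[int]:
    """Return indices i such that a line break may occur before text[i].

    'strict': break only after a white space character.
    'after-hyphen': additionally break after a hyphen-minus that follows
    a letter or digit (assumed rule, so '-b' stays intact)."""
    if mode not in ("strict", "after-hyphen"):
        raise ValueError(f"unknown mode: {mode}")
    breaks: list[int] = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if cur.isspace():
            continue  # never start a line with white space
        if prev.isspace():
            breaks.append(i)
        elif (mode == "after-hyphen" and prev == "-"
              and i >= 2 and text[i - 2].isalnum()):
            breaks.append(i)
    return breaks
```

With "foo bar-baz", "strict" yields only [4], while "after-hyphen" yields [4, 8], i.e. one extra opportunity right after "bar-".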

In principle, it would be nice to have the option of explicitly
enumerating the characters after which a line break is permitted. If you
need to include a URL literally into a document, you might use some
delimiters like "<" and ">" and permit line breaks after some characters
like "/", "?", and "&" but not others.
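Explicitly enumerating the permitted characters could look roughly like the following sketch. The name breaks_after_chars and its default set are hypothetical, and refusing to break between two consecutive listed characters (so that the "//" in "http://" stays together) is an added assumption.

```python
def breaks_after_chars(text: str, allowed: str = "/?&") -> list[int]:
    """Return indices i where a break is permitted before text[i]:
    only right after a character in `allowed`, and never between two
    consecutive allowed characters (assumed rule)."""
    breaks = []
    for i in range(1, len(text)):
        if text[i - 1] in allowed and text[i] not in allowed:
            breaks.append(i)
    return breaks
```

For "a/b?c=1&d=2" this permits breaks at [2, 4, 8], i.e. after "/", "?" and "&", but nowhere else, not even after "=".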

Besides, when word division by language-dependent algorithms becomes a
reality in browsers, it becomes important to be able to prevent such division.
It's an interesting question whether they should be allowed by default. I
would say no, both by Web traditions and by the fact that the algorithms
won't work perfectly - they are more or less bound to create wrong
hyphenations at times. This is more serious than in text processing, where
the author can, at least in principle, check what happens; when CSS is
used, the normal situation is that the author is nowhere near when the
actual document formatting for presentation takes place. Thus, an author
should have the opportunity to ask for hyphenation, _if_ he regards it
as useful enough for his document, considering all the pros and cons,
including the need to add detailed markup and explicit hyphenation
information if he wishes to prevent bad hyphenations.

But first and foremost, CSS development should not encourage wider
application of UAX 14 line breaking without the author's (or, in some
cases, the user's) discretion.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Monday, 21 April 2003 15:24:46 UTC