Re: [css-text] I18N-ISSUE-316: Line breaking defaults

On 10/22/2014 06:49 PM, Asmus Freytag wrote:
> On 10/22/2014 3:12 PM, fantasai wrote:
>>
>> If you're asking about the BA category, in order to safely
>> make a normative requirement, I need it split into two sets:
>
> BA Category 1
>> - characters after which a break is always permissible
>>     and recommended, such as the visible word separators
>
> BA Category 2
>> - characters after which a break is sometimes a good
>>     idea but not always, such as hyphens and slashes
>
> Are there any other members of Category 2?

I am unsure and don't have the time to solve this particular
problem within the next 2 weeks.

If you or the i18nwg would like to go through the entire list
and annotate it over the next couple of weeks, then perhaps we
could ask the CSSWG to reconsider this issue.

Personally I don't see why we are so concerned. UAX14 is already
referenced normatively for all the non-tailorable categories and
informatively for all the rest. I am sure that any implementer
would be happy to accept bugs filed against their implementation
for specific cases where the UAX14 behavior is clearly better than
the line-breaking behavior they have now.

I am not in favor of normatively requiring all of UAX14 because I
don't want anyone to go filing bugs against implementers where they
violate UAX14's tailorable rules and say "you should follow these
rules because they're required [unless you can justify otherwise]".
If we're filing line-breaking bugs, I want them to be argued on
correctness for the particular characters that are not compliant.
I want UAX14 to be used as a source of information, not as a source
of rules, and for that an informative reference is the right approach.

UAX14 line breaking is great *iff* you have a more sophisticated
algorithm that is not simply a pairs table, that has some level
of prioritization-by-distance or perhaps some other kind of
heuristics. It is not, in its current state, suitable for
compliance by a pairwise implementation.

> Is the issue "generic" to all kinds of hyphens and slashes,
> or is it "specific" to special strings like dates, path names
> or identifiers?

It's fairly broad. "E-mail", for example, shouldn't be broken
at the hyphen. Neither should ":-)" or "-x". And of course, as
you mention, neither should dates.
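
To make that concrete, here is a minimal Python sketch of a
deliberately dumb rule of the kind a pairs-only implementation might
end up with: allow a break after every hyphen-minus, looking at
nothing else. The function name naive_breaks and the rule itself are
made up for illustration; this is not the actual UAX14 table, which
has more rules than this.

```python
# Hypothetical context-free rule, for illustration only: a break is
# allowed after every hyphen-minus, regardless of what surrounds it.
def naive_breaks(text):
    """Return the offsets at which this rule would allow a line break."""
    # text[:-1] skips a trailing hyphen: nothing follows it to break before.
    return [i + 1 for i, ch in enumerate(text[:-1]) if ch == "-"]

for sample in ["e-mail", ":-)", "-x", "2014-10-22"]:
    print(sample, naive_breaks(sample))
```

Every sample gets at least one break opportunity, and every one of
those is a break a reader would consider wrong.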

>> I will not issue a normative recommendation to honor BA
>> behavior of the second category. This will result in bad
>> line-breaking when implementations try to comply without
>> performing a thoughtful survey of each individual case
>> and what contextual information the line break may need
>> to consider. Please note that this is not a theoretical
>> concern: we have already run into this exact problem.
>
> I suspect that the issue is more about substrings that represent
> some special context, rather than the generic occurrence of
> these in running text.

It was both.

When unsure, it is safer to not break than to break. Knowing
that the UAX14 pairs table is insufficient for acceptable
line breaking, and that UAs attempting to "improve" their
implementation by following it will regress, I cannot in good
conscience require it as a baseline. I believe, based on past
experience of doing exactly that, that this approach will
result in problems for our implementers.
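
As a rough sketch of what "don't break when unsure" could look like
for the hyphen case, the Python below only allows a break after a
hyphen-minus when it sits between two runs of letters of some
minimum length, and otherwise leaves the text unbroken. The name
cautious_breaks and the min_run threshold are my own assumptions for
illustration, not something taken from UAX14 or from any existing
implementation, and a real solution would need a proper survey of
each case.

```python
# Hypothetical conservative heuristic, for illustration only.
def cautious_breaks(text, min_run=3):
    """Return offsets where a break after '-' looks safe.

    A break is allowed only when at least min_run letters sit
    directly on each side of the hyphen. Anything else is treated as
    "unsure", and no break is offered.
    """
    breaks = []
    for i, ch in enumerate(text):
        if ch != "-":
            continue
        # Count consecutive letters immediately before the hyphen.
        left = 0
        j = i - 1
        while j >= 0 and text[j].isalpha():
            left += 1
            j -= 1
        # Count consecutive letters immediately after the hyphen.
        right = 0
        k = i + 1
        while k < len(text) and text[k].isalpha():
            right += 1
            k += 1
        if left >= min_run and right >= min_run:
            breaks.append(i + 1)
    return breaks

for sample in ["line-breaking", "e-mail", ":-)", "-x", "2014-10-22"]:
    print(sample, cautious_breaks(sample))
```

With the default threshold, only "line-breaking" gets a break
opportunity; the other samples are left alone. The cost is missed
breaks in some legitimate compounds, which is the safer failure.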

I stand by my answer in
   http://lists.w3.org/Archives/Public/www-style/2014Jul/0500.html
and I think the existing references to UAX14 are sufficient given
the current situation.

That doesn't mean we can't work on creating a safer pairs table
that is suitable for dumb line-breaking implementations applied
to Web content, and require that in the future. But as Koji and
I keep reiterating, that is a significantly larger project than
is in scope for us right now.

~fantasai

Received on Thursday, 23 October 2014 05:18:21 UTC