Re: First strong on strings surrounded by isolate controls

I do not think that there is a problem with the directionality algorithm.
Aside from alignment, it makes no difference whether that isolate paragraph
is displayed as LTR overall or RTL overall. And for the much more important and
common cases of all-neutral text - phone numbers and, to a smaller extent,
dates, times, and signed numbers - its being LTR by default is extremely
important.
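The LTR default for such paragraphs comes from rules P2 and P3 of the UBA. P2 looks for the first character of type L, R, or AL, skipping everything between an isolate initiator and its matching PDI, and P3 picks LTR when no such character is found. Below is a minimal Python sketch. It assumes a single paragraph with no paragraph separators and takes Bidi_Class values from the standard library's unicodedata.bidirectional. It is an illustration of those two rules, not a full UBA implementation.

```python
import unicodedata

ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def first_strong(text):
    """Rule P2 (sketch): return 'L', 'R' or 'AL' for the first strong
    character outside any isolate, or None if there is none."""
    depth = 0  # how many nested isolates we are currently skipping
    for ch in text:
        bc = unicodedata.bidirectional(ch)
        if bc in ISOLATE_INITIATORS:
            depth += 1
        elif bc == "PDI":
            if depth > 0:  # an unmatched PDI is simply ignored
                depth -= 1
        elif depth == 0 and bc in ("L", "R", "AL"):
            return bc
    return None


def paragraph_level(text):
    """Rule P3 (sketch): level 1 (RTL) if P2 found R or AL, else 0 (LTR)."""
    return 1 if first_strong(text) in ("R", "AL") else 0


print(paragraph_level("+1 (555) 123-4567"))                   # 0
print(paragraph_level("\u2067\u0641\u0639\u0627\u0644\u2069"))  # 0
print(paragraph_level("\u0641\u0639\u0627\u0644"))              # 1
```

A phone number and a paragraph consisting only of an RLI ... PDI isolate both come out LTR, while the same Arabic letters without the isolate controls come out RTL.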

I *do* think that there is a problem with the definition of start
alignment, which is not controlled by Unicode, but by various other specs,
such as CSS. The problem affects the alignment of all-neutral paragraphs
generally, like the examples cited above, not just the corner case of an
isolate. Start alignment is generally defined as "left" for an LTR
paragraph, and "right" for an RTL paragraph, and this means that an
all-neutral paragraph, which is LTR by the UBA, is left-aligned - even if
the alignment outside that paragraph is right. This makes the all-neutral
paragraph needlessly different from its surroundings, and usually makes it
look pretty bad.

The solution is to make start alignment match the alignment outside the
paragraph if the paragraph's directionality is determined from its content
and its content is all-neutral.
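The proposed rule can be written as a small decision function. This is only a sketch of the idea, not text from any spec. The function and parameter names are hypothetical, and "all-neutral" is taken to mean that rule P2 found no strong character outside isolates. That reading covers both the phone-number case and the isolate-only paragraph.

```python
def current_start_alignment(paragraph_is_rtl: bool) -> str:
    # Usual definition: start is "left" for LTR and "right" for RTL.
    return "right" if paragraph_is_rtl else "left"


def proposed_start_alignment(paragraph_is_rtl: bool,
                             direction_from_content: bool,
                             has_strong_char: bool,
                             outer_alignment: str) -> str:
    # Proposed exception: if the direction was determined from the content
    # and the content is all-neutral, follow the alignment outside it.
    if direction_from_content and not has_strong_char:
        return outer_alignment
    return current_start_alignment(paragraph_is_rtl)


# A phone number (LTR by the UBA, no strong characters) in right-aligned text:
print(current_start_alignment(False))                         # left
print(proposed_start_alignment(False, True, False, "right"))  # right
```

A paragraph whose direction is set explicitly (direction_from_content=False) keeps the usual behavior, since its direction was chosen deliberately rather than guessed from content.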

In the context of CSS, the relevant spec is that of the start and end edges
of a line box whose containing block has ‘unicode-bidi: plaintext’
(https://www.w3.org/TR/css-text-3/#bidi-linebox). It already makes an
exception for an empty line box, which "takes its inline base direction
from the preceding line box (if any), or, if this is the first line box in
the containing block, then from the ‘direction’ property of the containing
block." I think that this exception should be broadened to an all-neutral
line box.
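A rough Python model of the css-text-3 line-box rule and the suggested broadening follows. The LineBox type and its fields are hypothetical and are not taken from the spec or any engine. paragraph_direction is the base direction of the bidi paragraph the line box holds content for (None for an empty line box). all_neutral marks a line box whose content has no strong characters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LineBox:
    paragraph_direction: Optional[str]  # "ltr" / "rtl", or None if empty
    all_neutral: bool = False           # no strong characters (assumption)


def line_box_directions(lines: List[LineBox], container_direction: str,
                        broaden: bool) -> List[str]:
    """Inline base direction of each line box under 'unicode-bidi: plaintext'.
    With broaden=True, all-neutral line boxes are treated like empty ones."""
    result: List[str] = []
    for line in lines:
        inherits = line.paragraph_direction is None or (broaden and line.all_neutral)
        if inherits:
            # Preceding line box if any, else the containing block's 'direction'.
            result.append(result[-1] if result else container_direction)
        else:
            result.append(line.paragraph_direction)
    return result


lines = [LineBox("rtl"), LineBox("ltr", all_neutral=True), LineBox(None)]
print(line_box_directions(lines, "rtl", broaden=False))  # ['rtl', 'ltr', 'ltr']
print(line_box_directions(lines, "rtl", broaden=True))   # ['rtl', 'rtl', 'rtl']
```

Under the current rule, the all-neutral line (for example a phone number on its own line) switches to LTR and drags the following empty line with it. With the broadened exception, both keep the RTL direction of the surrounding text.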

On Thu, Sep 15, 2016 at 1:15 PM, r12a <ishida@w3.org> wrote:

> On 15/09/2016 10:49, Simon Montagu wrote:
>
>> On 15/09/16 07:51, r12a wrote:
>>
>>> On 15/09/2016 05:44, Martin J. Dürst wrote:
>>>
>>>> This is a very high level, speculative comment, but I'll make it anyway:
>>>>
>>>> You sound as if the isolates are too isolated. My understanding is that
>>>> we introduced the isolates because the embeddings were not independent
>>>> (isolated) enough and interacted with their surroundings too much.
>>>>
>>>> Did we overdo (if maybe even just so slightly) the isolation when we
>>>> created isolates? Or would we (at least in theory) need a third kind of
>>>> range, somewhere in between isolates and embeddings in independence?
>>>>
>>>
>>> i don't think the level of isolation is the problem, i think it's more
>>> to do with an isolated range being treated as a neutral character
>>> (whereas a non-isolated embedded range (e.g. RLE) is treated as a strong
>>> character).
>>>
>>> ri
>>>
>>>
>> That sounds to me like the same issue: as soon as an embedded sequence
>> is treated as a strong character, it stops being isolated: for example
>> it can affect the resolved level of an adjacent numeral. IIUARC this was
>> one of the chief reasons, if not THE reason, for treating isolated
>> sequences as neutral characters in their containers.
>>
>
> i agree that it's probably an inseparable issue. The question is how to
> ascertain that a string like "RLI فعالیت بین‌المللی‌سازی، PDI", which i
> think should be regarded by default as an RTL string, can be perceived as
> such - especially if those controls have been added by something else along
> the way, such as an application that wraps strings, and which therefore
> removes the previously existing clues.
>
> Asmus, i hear what you're saying about higher level protocols, but i can't
> help thinking that those protocols would need to be adopted by just about
> any application that deals with strings of this kind - which makes me think
> that perhaps there should be a standard mechanism described by the UBA (?).
>
> ri

Received on Thursday, 15 September 2016 17:30:38 UTC