This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 10807 - i18n comment 1 : new attribute: ubi
Summary: i18n comment 1 : new attribute: ubi
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-09-29 12:15 UTC by i18n bidi group
Modified: 2010-11-11 00:57 UTC
CC List: 9 users

See Also:


Attachments
Use case (should be <span dir=rtl ubi>) (126 bytes, text/html)
2010-10-06 21:39 UTC, Aharon Lanin

Description i18n bidi group 2010-09-29 12:15:22 UTC
Comment from the i18n review of:
http://dev.w3.org/html5/spec/

Comment 1
At http://www.w3.org/International/reviews/html5-bidi/
Editorial/substantive: S
Tracked by: AL

Location in reviewed document:
undefined [http://dev.w3.org/html5/spec/spec.html#contents]

Comment: Expose in HTML the new "isolate" value added to the unicode-bidi style in CSS3 (http://dev.w3.org/csswg/css3-text-layout/#unicode-bidi), by adding an element attribute tentatively named ubi, for "Unicode Bidi Isolate", as in <span dir="rtl" ubi>.

This is a part of the proposals made by the "Additional Requirements for Bidi in HTML" W3C First Public Working Draft. For a full description of the use cases, please see http://www.w3.org/International/docs/html-bidi-requirements/#bidi-isolation. Here is the proposal made there:

The ubi attribute directionally isolates an inline-display element from its surroundings: neither will affect the bidi ordering of the other, and no part of the surrounding content will get ordered between parts of the element's content. Furthermore, the effects of LRE, RLE, LRO, RLO, and PDF characters appearing in the element never extend beyond it.

This is achieved (via unicode-bidi:isolate) by treating the contents of the element as a separate, independent "paragraph" or paragraphs for the purposes of the bidi resolution within the element. These paragraphs' base direction is the element's computed direction. For the purpose of the bidi resolution of the element as a whole in its containing paragraph (if any), the element is treated as if it were an Object Replacement Character (U+FFFC).
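
As a rough illustration of the model described above (a sketch, not part of the proposal text), the following Python fragment splits content into the outer paragraph, where each isolated element is stood in for by U+FFFC, and the inner paragraphs that would be resolved on their own. The function name split_for_bidi and the (text, isolated) pair representation are assumptions made for this example, and no actual bidi reordering is performed.

```python
import unicodedata

OBJECT_REPLACEMENT = "\ufffc"  # U+FFFC OBJECT REPLACEMENT CHARACTER


def split_for_bidi(segments):
    """Split (text, isolated) pairs into an outer paragraph and inner paragraphs.

    Hypothetical helper: every isolated segment is replaced by U+FFFC in the
    outer paragraph and returned separately, so each can be resolved alone.
    """
    outer_parts = []
    inner_paragraphs = []
    for text, isolated in segments:
        if isolated:
            outer_parts.append(OBJECT_REPLACEMENT)
            inner_paragraphs.append(text)
        else:
            outer_parts.append(text)
    return "".join(outer_parts), inner_paragraphs


# A Hebrew word stands in for the isolated element's content.
outer, inner = split_for_bidi(
    [("\u05e4\u05d9\u05e6\u05d4", True), (" - 3 reviews", False)]
)

# The outer paragraph sees only a placeholder where the element was.
print(outer == "\ufffc - 3 reviews")  # True
# The placeholder is a neutral character, not a strong right-to-left one.
print(unicodedata.bidirectional(OBJECT_REPLACEMENT))  # ON
```

Because the placeholder is neutral, the surrounding text no longer has a strong RTL character next to it, which is the property the isolation proposal relies on.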

The attribute would take three values:

- "off", specifying no special action. This is the default except in the two cases indicated below (for which "off" would have to be explicitly specified when no isolation is desired). There is no inheritance.

- "ubi", specifying isolation. (Alternatively, this value could be named "on". We chose "ubi" for similarity with pre-HTML5 boolean attributes, e.g. selected. It is up to the HTML WG to decide which is better.) It is implemented by setting the unicode-bidi CSS property for the element to "isolate" - or "isolate bidi-override" for a <bdo> element. It is the default value for:

 * Elements whose dir attribute value is "auto" (which is being proposed in a separate bug).

 * Block elements with CSS display:inline (for discussion, see http://www.w3.org/International/docs/html-bidi-requirements/#blocks-as-separators).

- empty string, specifying isolation just like the "ubi" value. The empty string value allows specifying the attribute without a value for conciseness, e.g. <span dir="rtl" ubi>.
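
The value rules listed above can be summarized in a short Python sketch. This is only one reading of the proposal: the function name ubi_unicode_bidi and its parameters are hypothetical, the attribute being absent is modeled as None, and rejecting unrecognized values with an error is an assumption, since the proposal does not say how they should be handled.

```python
def ubi_unicode_bidi(ubi, tag="span", dir_attr=None, block_displayed_inline=False):
    """Return the unicode-bidi value the proposed ubi attribute would set.

    Returns None when no isolation applies. `ubi` is None when the
    attribute is absent, otherwise the attribute's string value.
    """
    if ubi is None:
        # Attribute absent: isolation is the default only for dir=auto
        # elements and for block elements displayed inline.
        isolate = dir_attr == "auto" or block_displayed_inline
    elif ubi in ("", "ubi"):
        isolate = True
    elif ubi == "off":
        isolate = False
    else:
        # Assumption: the proposal does not define other values.
        raise ValueError("unrecognized ubi value: %r" % ubi)

    if not isolate:
        return None
    # <bdo> keeps its override behaviour in addition to isolation.
    return "isolate bidi-override" if tag == "bdo" else "isolate"


print(ubi_unicode_bidi(""))                      # isolate
print(ubi_unicode_bidi("ubi", tag="bdo"))        # isolate bidi-override
print(ubi_unicode_bidi(None, dir_attr="auto"))   # isolate
print(ubi_unicode_bidi("off", dir_attr="auto"))  # None
```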

Applications generating HTML would use ubi routinely on elements that wrap an inserted data string (usually in conjunction with indicating its direction using the dir attribute). In particular, it will be recommended to use ubi on the <a> element once any browsers support it. It is the anticipated frequency of use that mandated the somewhat cryptic but blessedly short name and syntax: ubi.

Although in theory unicode-bidi:isolate could be used directly to achieve the same effect as ubi, the recommended approach will be to use ubi, since the bidi properties of content are an intimate part of the content and should be specified directly on it as HTML mark-up. They are not simply an issue of presentation that should properly be specified separately in CSS. The new unicode-bidi:isolate CSS property value has been added because, like the dir attribute, the ubi attribute has to be implemented via CSS.
Comment 1 Maciej Stachowiak 2010-09-29 15:40:51 UTC
unicode-bidi takes 5 different values and maybe will take more in the future. Currently, I believe two of those are exposed via HTML - the dir="" attribute on most elements sets "unicode-bidi: embed" (in addition to setting "direction") and the dir="" attribute on the <bdo> element sets "unicode-bidi: override" (again in addition to setting "direction"). Perhaps instead of adding another boolean attribute for yet another value, it would make more sense to add a bidi="" attribute that can express any of the unicode-bidi values. That would be more future-proof, as we won't have to proliferate attributes if it turns out more of these are important to expose to markup.
Comment 2 Ian 'Hixie' Hickson 2010-10-05 00:33:13 UTC
I don't understand what problem this solves.

Can you give an example of a page that does not use CSS where it would be necessary to use this feature to solve a problem?
Comment 3 Aharon Lanin 2010-10-06 21:39:09 UTC
Created attachment 921
Use case (should be <span dir=rtl ubi>)
Comment 4 Aharon Lanin 2010-10-06 21:43:19 UTC
(In reply to comment #2)
> I don't understand what problem this solves.
> 
> Can you give an example of a page that does not use CSS where it would be
> necessary to use this feature to solve a problem?

Several use cases are given and discussed in <http://www.w3.org/International/docs/html-bidi-requirements/#bidi-isolation>. Please do go through that; there is a lot of important stuff there. Nevertheless, let me quote one use case here (uppercase English is used to stand in for RTL script):

<span dir="rtl">PURPLE PIZZA</span> - <a href="ppreviews.html">3 reviews</a>

The intent is to have it appear as

AZZIP ELPRUP - 3 reviews
               ---------

Believe it or not, it is currently (correctly!) displayed as:

3 - AZZIP ELPRUP reviews
-               --------

With ubi, the fix would be to say <span dir="rtl" ubi>PURPLE PIZZA</span>.

I have attached an html file demonstrating this.
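
To see why the "3" is not anchored to the LTR text, it can help to look at the bidi classes of the characters that follow the RTL name. The Python sketch below uses the standard unicodedata module; the sample string and the helper name bidi_classes are illustrative assumptions, with a single Hebrew letter standing in for the end of the name.

```python
import unicodedata


def bidi_classes(text):
    """Return the Unicode Bidi_Class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# U+05D0 (Hebrew alef) stands in for the end of the RTL name; the rest is
# the start of the text that follows the span.
sample = "\u05d0 - 3 r"
print(bidi_classes(sample))
# ['R', 'WS', 'ES', 'WS', 'EN', 'WS', 'L']
```

Only the first and last characters are strongly directional (R and L). The spaces, the hyphen and the digit are whitespace, separator and number classes, so their placement is decided by their context rather than by the characters themselves.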
Comment 5 Aharon Lanin 2010-10-09 20:16:25 UTC
(In reply to comment #1)
> unicode-bidi takes 5 different values and maybe will take more in the future.
> Currently, I believe two of those are exposed via HTML - the dir="" attribute
> on most elements sets "unicode-bidi: embed" (in addition to setting
> "direction") and the dir="" attribute on the <bdo> element sets "unicode-bidi:
> override" (again in addition to setting "direction").

"Two" is not quite correct. Besides "embed" and "override", the third pre-CSS3 value, "normal", is basically the current default, so it too is already available. The remaining two values are new to CSS3: "plaintext", which is part of the dir=auto discussion and belongs there, and "isolate", which is what the ubi attribute is meant to expose (not yet another beyond these five). With ubi, no unicode-bidi values are left unexposed in HTML.

> Perhaps instead of adding
> another boolean attribute for yet another value, it would make more sense to
> add a bidi="" attribute that can express any of the unicode-bidi values. That
> would be more future proof, as we won't have to proliferate attributes if it
> turns out more of these are important to expose to markup.

I don't think the result would have very desirable qualities. Setting unicode-bidi to "embed" or "override" without giving a new direction value is not at all a common use-case. That's why the dir attribute, which sets both unicode-bidi and direction, is designed the way it is, and is perfect for the job. The dir attribute would handle "plaintext" too. So, all that's left is the new "isolate" value - and any future ones. It is difficult to say whether the future ones would be a good fit for the new attribute, or a poor one like most of the current values. And even isolate does not fit very well with the new attribute because of the wordiness: unicode-bidi=isolate instead of just ubi. And wordiness matters here because we expect the new attribute to be used quite a bit.

I therefore think that ubi as currently designed would work better.
Comment 6 Ian 'Hixie' Hickson 2010-10-12 10:23:25 UTC
Wouldn't it be better to use an element, like <bdo>? <bdi> or something (Bi Di Isolate)?
Comment 7 Maciej Stachowiak 2010-10-12 10:34:28 UTC
(In reply to comment #6)
> Wouldn't it be better to use an element, like <bdo>? <bdi> or something (Bi Di
> Isolate)?

<bdo> doesn't actually do anything by itself, it only sets "unicode-bidi: bidi-override" if dir is set. From the HTML5 rendering section (which I think roughly matches what browsers do):

bdo[dir=ltr], bdo[dir=rtl] { unicode-bidi: bidi-override; } /* case-insensitive */

The proposal for ubi/bdi/whatever is that it should force unicode-bidi: isolate regardless of dir setting (or, if set to "off", force it off, but I'm not sure how that would work with the CSS model). So it's not really parallel to <bdo>, as proposed.

I'm mildly skeptical of using an element, because, as I mentioned in an earlier comment, there are 5 unicode-bidi values currently specified, maybe more in the future. Adding an element for each does not seem like a winning strategy long-term.
Comment 8 Aharon Lanin 2010-10-13 15:40:37 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > Wouldn't it be better to use an element, like <bdo>? <bdi> or something (Bi Di
> > Isolate)?
> 
> <bdo> doesn't actually do anything by itself, it only sets "unicode-bidi:
> bidi-override" if dir is set. From the HTML5 rendering section (which I think
> roughly matches what browsers do):
> 
> bdo[dir=ltr], bdo[dir=rtl] { unicode-bidi: bidi-override; } /* case-insensitive
> */
> 
> The proposal for ubi/bdi/whatever is that it should force unicode-bidi: isolate
> regardless of dir setting (or, if set to "off", force it off, but I'm not sure
> how that would work with the CSS model). So it's not really parallel to <bdo>,
> as proposed.
> 
> I'm mildly skeptical of using an element, because, as I mentioned in an earlier
> comment, there are 5 unicode-bidi values currently specified, maybe more in the
> future. Adding an element for each does not seem like a winning strategy
> long-term.

I do not like a <ubi> element for a few reasons, none of which is very big:
- Longer syntax (need a closing tag)
- You sometimes want isolation on a <div> and sometimes a <span>. <ubi> would have to have the same maybe-phrasing semantics as <a>.
- I want isolation by default for dir=auto elements, but I want to be able to suppress it with ubi=off, which I can't do with an element.
- Adding an element is "heavier" than adding an attribute.

On the other hand, it is possible to argue that it should be an element because it has special semantics - a "self-contained entity".

On balance, I prefer an attribute.

As I explained earlier, however, I do not like simply exposing unicode-bidi as an HTML attribute, for the reasons given before.
Comment 9 Ian 'Hixie' Hickson 2010-10-14 06:51:51 UTC
This seems to be the same use case as the problem described in bug 10808. Am I mistaken?
Comment 10 Aharon Lanin 2010-10-14 08:33:04 UTC
(In reply to comment #9)
> This seems to be the same use case as the problem described in bug 10808. Am I
> mistaken?

Isolation is a very good idea whenever a piece of data is being inserted whose direction need not be the same as that of its context and which is logically separate from the stuff round it.

This is almost always the case when dir=auto (but I still want the user to have a way out, just in case).

But it is also very often the case when one does know the direction (and letting the browser guess is just asking for trouble). For example, one should never knowingly put a dir=auto on a phone number (it is always ltr), but a phone number will often benefit from ubi.

Aharon
Comment 11 Ian 'Hixie' Hickson 2010-10-14 09:21:26 UTC
I'm just talking about use cases here, not solutions. What solutions we pick depends on the use cases.

It seems that the problems we are trying to solve are all basically the same, and fall into two subcategories:

 - embedding user-provided freeform text (e.g. place names)

 - embedding user-provided data with a known direction (e.g. phone numbers)

Is that right?
Comment 12 Ian 'Hixie' Hickson 2010-10-15 00:21:44 UTC
*** Bug 10808 has been marked as a duplicate of this bug. ***
Comment 13 Ian 'Hixie' Hickson 2010-10-15 00:26:59 UTC
It seems to me that the use cases described in this bug can be most easily addressed as follows:

1. Add an 'auto' value for the CSS 'direction' property that determines the direction in a suitably automatic way.

2. Recommend that authors use the <output> element to mark up information from users, and make <output> default to 'direction:auto'. This element defaults to unicode-bidi:isolate.

So for a place name, you'd write:

   <output>Purple Pizza</output> - <a href="ppreviews.html">3 reviews</a>

If you knew the direction, e.g. a phone number, you could write:

   <output dir=ltr>+1 555 123 4567</output>

Are there any use cases that this would not address?
Comment 14 Aharon Lanin 2010-10-18 13:19:59 UTC
(In reply to comment #13)
> It seems to me that the use cases described in this bug can be most easily
> addressed as follows:
> 
> 1. Add an 'auto' value for the CSS 'direction' property that determines the
> direction in a suitably automatic way.
> 
> 2. Recommend that authors use the <output> element to mark up information from
> users, and make <output> default to 'direction:auto'. This element defaults to
> unicode-bidi:isolate.
> 
> So for a place name, you'd write:
> 
>    <output>Purple Pizza</output> - <a href="ppreviews.html">3 reviews</a>
> 
> If you knew the direction, e.g. a phone number, you could write:
> 
>    <output dir=ltr>+1 555 123 4567</output>
> 
> Are there any use cases that this would not address?

I have several problems with this solution.

1. It is not correct to characterize all or even most content that needs isolation (and/or auto-direction) as "user-provided" or as the "output of a calculation" or in any way associated with forms (which the output element seems to be). For example, let's say I am just authoring a simple HTML document in an RTL language, and want to list a few of my favorite bands, e.g. "I LIKE a, b, AND c." If I do not use isolation on a, b, and c, this will be displayed as:

.c DNA ,a, b EKIL I

instead of the intended

.c DNA ,b ,a EKIL I

This can be solved by using &rlm; ("a&rlm;, b&rlm;, AND c"), but this is ugly and has other problems.

What I really want to do is bidi-isolate each of a, b, and c even though they are not user-provided or calculated, and I have no form in my page.

For a different example take a web app, e.g. a search app. None of the things that need isolation in each search result - the title, the snippet, the filename, the size - is generated by the app's user, and there is no association with a form. In a sense, they are calculated, but not in the way intended for the output element.

In brief, I do not think that the output element is a good fit for most use cases - although having isolation on for output element by default (in addition to a more general solution) is probably a good idea.

2. It is quite a common occurrence that the item needing isolation is already wrapped in an element like <a> or <q> or <span> (or <output>). In fact, if <a> and <q> were being invented today, we would want isolation for them by default - but we dare not do that now because it would most certainly break some existing documents. But having to wrap such items in *two* elements, e.g. <a><ubi>BLAH BLAH</ubi></a> - or is it <ubi><a>BLAH BLAH</a></ubi>, is surely rubbing salt in the wounds. An attribute with a short name and no need to specify a value is a lot less painful to use.

3. As far as I understand, adding an "auto" value to the CSS direction property is a non-starter. Fantasai should be able to provide more details. For this reason, I am also reopening bug 10808.
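
For reference, the &rlm; workaround mentioned in point 1 amounts to appending U+200F after each LTR item. A minimal Python sketch follows; the helper name join_with_rlm is hypothetical and the code only builds the string, it does not render it.

```python
import unicodedata

RLM = "\u200f"  # U+200F RIGHT-TO-LEFT MARK, written &rlm; in HTML


def join_with_rlm(items, separator=", "):
    """Join items, appending an RLM to each one (hypothetical helper)."""
    return separator.join(item + RLM for item in items)


joined = join_with_rlm(["a", "b"])
print(joined == "a\u200f, b\u200f")    # True
print(unicodedata.bidirectional(RLM))  # R
print(len(joined))                     # 6, although only 4 characters are visible
```

The invisible marks become part of the data, which is one of the reasons the workaround is described above as ugly.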


Let's deal with the auto-direction issue separately in bug 10808, which I am re-opening for reasons described there.

Basically, you are saying that isolation will be provided by the <output> element.
Comment 15 Aharon Lanin 2010-10-18 13:24:55 UTC
(In reply to comment #14)
> [...]
> Let's deal with the auto-direction issue separately in bug 10808, which I am
> re-opening for reasons described there.
> 
> Basically, you are saying that isolation will be provided by the <output>
> element.

Sorry, I had intended to delete the last two paragraphs of this comment ("Let's deal ..." and "Basically..."); please disregard them.
Comment 16 Aryeh Gregor 2010-10-18 18:22:13 UTC
Use-cases seem to be:

1) You're outputting a short inline string that's a logical unit and should not mix up the directionality of adjacent inline things.  E.g., the pizza review example given.  This can currently be fixed only by inserting magic invisible characters, which are hard to produce or see with normal text editors.

2) You're outputting a block of text (or maybe even a <textarea>) that might contain RTL text, LTR, or a mix -- you don't know.  If you let it inherit LTR directionality from the page, RTL paragraphs will look very weird:

HEBREW TEXT!

should display as

!TXET WERBEH

but actually displays as

TXET WERBEH!

This effect is even worse for inputs and textareas, where your cursor jumps between the beginning and end of the text as you type depending on whether the character you just typed is RTL, LTR, or neither.


This bug deals with a solution to use-case 1.  Bug 10808 deals with use-case 2.
Comment 17 Ian 'Hixie' Hickson 2010-10-19 06:16:14 UTC
Ok, now we're getting somewhere. Thanks. This is why use cases are more important than proposed solutions. Bugs should always be focused on problems first, not proposals.

(by the way, please don't quote the whole comment you're replying to, especially if it's immediately above your comment, as it just makes reading the bug more difficult.)

(In reply to comment #14)
> 
> It is quite a common occurrence that the item needing isolation is already
> wrapped in an element like <a> or <q> or <span> (or <output>). [...]
> But having to wrap such items in *two* elements, e.g.
> <a><ubi>BLAH BLAH</ubi></a> - or is it <ubi><a>BLAH BLAH</a></ubi>, is surely
> rubbing salt in the wounds. An attribute with a short name and no need to
> specify a value is a lot less painful to use.

By that argument, we shouldn't have <bdo>, or indeed <a> (many links are given on elements that are already in the markup) or indeed many of the phrasing elements... I don't think that argument holds water.


Given the use cases of text that need isolation, I agree that <output>'s semantic is inappropriate.

Is it the case that all these cases should also have a language specified? All the examples so far seem to be English text mixed in with Hebrew; would it be correct to say that they should all be marked up with lang="" attributes? If so, can we just make all elements with lang="" attributes have unicode-bidi:isolate? Or are there examples where the language doesn't change (and you do know the language doesn't change, it's not just that you don't know the language) but you still want this isolation behaviour?
Comment 18 Maciej Stachowiak 2010-10-19 08:00:32 UTC
(In reply to comment #14)
> 
> 3. As far as I understand, adding an "auto" value to the CSS direction property
> is a non-starter. Fantasai should be able to provide more details. For this
> reason, I am also reopening bug 10808.
> 

Having dir values that don't map to CSS will be problematic for implementors, since the dir attribute is currently implemented 100% by mapping to CSS. Getting the right interaction with the values that *do* map to CSS would be particularly tricky. I expect the likely outcome would be to map to a nonstandard CSS value for the direction property. That seems to me like CSS is falling down on the job.

Maybe it would have been better if CSS never got involved in defining directionality, but that's not the world we live in. Having text direction controlled by a mix of CSS and non-CSS mechanisms is likely to be needlessly confusing and hard to implement.
Comment 19 Aharon Lanin 2010-10-19 12:10:23 UTC
(In reply to comment #17)
> Is it the case that all these cases should also have a language specified? All
> the examples so far seem to be English text mixed in with Hebrew; would it be
> correct to say that they should all be marked up with lang="" attributes?

Most, but not all. Phone numbers, urls, file sizes, etc. usually need isolation in an RTL document, but are not in any particular language.

> If so, can we just make all elements with lang="" attributes have
> unicode-bidi:isolate? Or are there examples of where setting the language
> doesn't change (and you do know the language doesn't change, it's not just that
> you don't know the language) but you still want this isolation behaviour?

I would hate to make it dependent on the lang attribute, for the reason given above, but also because the exact language of the data is very rarely known. Also, it is possible that adding isolation by default would break existing documents. This is also the argument against doing isolation by default any time the dir attribute is set.
Comment 20 Aharon Lanin 2010-10-19 12:20:44 UTC
(In reply to comment #18)

Discussion of whether there should be an auto value for CSS direction belongs under bug 10808. Answering there.
Comment 21 Aryeh Gregor 2010-10-19 17:03:41 UTC
(In reply to comment #17)
> Is it the case that all these cases should also have a language specified? All
> the examples so far seem to be English text mixed in with Hebrew; would it be
> correct to say that they should all be marked up with lang="" attributes? If
> so, can we just make all elements with lang="" attributes have
> unicode-bidi:isolate? Or are there examples of where setting the language
> doesn't change (and you do know the language doesn't change, it's not just that
> you don't know the language) but you still want this isolation behaviour?

Assuming that every language is written either LTR or RTL, not either interchangeably -- this should be true at least if you use language codes like kk-ar and kk-cy to distinguish -- then clearly you don't need isolation if the language of the whole string is known to be the same as the language of the surrounding page.

However, the provided text is generally going to be in an unknown language, and might be in a mix of languages.  For instance, on an English page, a user might submit a one-line input (like a wiki edit summary, or a username) that contains a Hebrew word.  It's possible for this to mess up direction if this isn't contained somehow, e.g.:

Logical:          comments: "abc", "def GHI", "KJL mno"
Expected display: comments: "abc", "def IHG", "LJK mno"
Actual display:   comments: "abc", "def LJK" ,"IHG mno"

Not to mention the possibility that the comments might actually contain directionality marks themselves.  Again, this can mostly be fixed by inserting control characters, but those are a pain to work with, e.g., they get caught in copy-paste.  Note that this affects even LTR text on an LTR page, if the app is bidi-aware -- it's simplest to just output the isolation character unconditionally, and then copy-paste will include invisible garbage that will foil simple string matches and so on.  (Bidi control characters are supposed to be ignored for string matching, but that's generally not done in practice.)
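
The string-matching problem can be shown with a small Python sketch. It wraps a string in the existing embedding controls (LRE or RLE, closed by PDF) and shows that a naive comparison then fails unless the controls are stripped first. The helper names and the exact set of characters stripped are assumptions made for this illustration.

```python
# Bidi control characters referred to in this bug.
BIDI_CONTROLS = {
    "\u200e", "\u200f",            # LRM, RLM
    "\u202a", "\u202b", "\u202c",  # LRE, RLE, PDF
    "\u202d", "\u202e",            # LRO, RLO
}


def wrap_embedding(text, rtl=False):
    """Wrap text in RLE...PDF or LRE...PDF (hypothetical helper)."""
    return ("\u202b" if rtl else "\u202a") + text + "\u202c"


def strip_bidi_controls(text):
    """Remove the control characters listed in BIDI_CONTROLS."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


wrapped = wrap_embedding("abc")
print(wrapped == "abc")                       # False: invisible characters differ
print(len(wrapped))                           # 5
print(strip_bidi_controls(wrapped) == "abc")  # True
```

A copy-paste of the rendered text carries the two extra characters along, so any consumer that compares strings exactly would need a step like strip_bidi_controls.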
Comment 22 Ian 'Hixie' Hickson 2010-11-02 19:40:41 UTC
In general, it seems like a <bdi> element is the way to go.

However, I think it would also make sense to make sure that <output> elements are always isolated (this is already in the spec), and also that any element which has a lang="" attribute (regardless of value) should have this isolation behaviour. I'll leave the issue of the automatic direction determination to the other bug.

So for a place name, you'd write:

   <output>Purple Pizza</output> - <a href="ppreviews.html">3 reviews</a>

If you knew the direction, e.g. a phone number, you could write:

   <bdi dir=ltr>+1 555 123 4567</bdi>

Comment 21's example would be marked up as:

   "<bdi>abc</bdi>", "<bdi>def GHI</bdi>", "<bdi>KJL mno</bdi>"

...or maybe:

   "<span lang="">abc</span>", "<span lang="">def GHI</span>", "<span lang="">KJL mno</span>"


(In reply to comment #19)
> 
> Most, but not all. Phone numbers, urls, file sizes, etc. usually need isolation
> in an RTL document, but are not in any particular language.

Fair enough.


> I would hate to make it dependent on the lang attribute, for the reason given
> above, but also because the exact language of the data is very rarely known.

If it's not known, then you should be setting lang="" (empty), so that's ok.

> Also, it is possible that adding isolation by default would break existing
> documents. This is also the argument against doing isolation by default any
> time the dir attribute is set.

How could it break an existing document? Might it not fix as many if not more documents than it breaks?

Indeed, doing this automatically any time dir="" is explicitly set might not be a bad idea either... do we have any data on how many pages would change rendering if we did this? Might this not actually make more sense overall?

I'm very much of the opinion that we should make this work as automatically as possible, because there's no way most authors are going to learn or understand this stuff.
Comment 23 CE Whitehead 2010-11-02 22:43:32 UTC
(In reply to comment #22)
> In general, it seems like a <bdi> element is the way to go.

>> I would hate to make it dependent on the lang attribute, for the reason given
>> above, but also because the exact language of the data is very rarely known. 

> If it's not known, then you should be setting lang="" (empty), so that's ok. 

I believe that phone numbers are often embedded in text that is a mixture of English for example plus an rtl language (I need to verify this with someone whose primary language is rtl however) . . . so I am not sure about lang="" in these cases; if these are just numbers then lang=zxx would be right. 

Best,

--C. E. Whitehead
cewcathar@hotmail.com
Comment 24 Ian 'Hixie' Hickson 2010-11-03 08:08:24 UTC
Yeah, for phone numbers, prices, and the like, I would suggest <bdi dir=ltr>.
Comment 25 Aryeh Gregor 2010-11-03 14:00:12 UTC
Sounds good to me.  Does anyone have an example of where you'd embed something with different direction or a different language and *not* want it to automatically isolate?  I can't think of one offhand.
Comment 26 Aharon Lanin 2010-11-03 18:49:51 UTC
(In reply to comment #22)
> In general, it seems like a <bdi> element is the way to go.

1. Regarding the attribute-vs-element issue, I still prefer attribute as the more easily used and less disruptive solution, but you are the HTML expert, and I defer to you.

If it is going to be an element, perhaps we could make dir=auto the default for it? It would be needed most of the time, it can be overridden with an explicit dir value, and there is no backward compatibility problem. (Note we still need dir=auto itself for the cases where you want it to affect the alignment of a block element.)

2. Regarding the name, I am wondering why you seem to prefer bdi over ubi. The name originally suggested (about a year ago) was indeed bdi, but people saw a couple of problems with it (leading to the switch to ubi):

- Could be confused with bdo.

- Could be confused with the term "bidi" generally. Although that is not necessarily bad, especially if it gets dir=auto by default...

> However, I think it would also make sense to make sure that <output> elements
> are always isolated (this is already in the spec)

Sounds good to me. I am not sure I would want the extra expense of dir=auto by default on output, though, which has lots of use cases where bidi can not possibly be a concern.

> and also that any element
> which has a lang="" attribute (regardless of value) should have this isolation
> behaviour.

No, see below.

> > Also, it is possible that adding isolation by default would break existing
> > documents. This is also the argument against doing isolation by default any
> > time the dir attribute is set.
> 
> How could it break an existing document?
>
> Indeed, doing this automatically any time dir="" is explicitly set might not be
> a bad idea either... do we have any data on how many pages would change
> rendering if we did this?

Authors do not always know what they are doing, especially when it comes to bidi. Consider the following:

i spoke to JOHN. <span dir=ltr>susan</span>, MIKE and ollie spoke to him too.

Of course, the dir=ltr on susan is unnecessary, while dir=rtl would have been a good idea on JOHN and MIKE, but like I said, people often get really confused when it comes to bidi. Currently, despite all the nonsense, it is rendered as intended:

i spoke to NHOJ. susan, EKIM and ollie spoke to him too.

With isolation snuck in by default, though, one would get:

i spoke to EKIM ,susan .NHOJ and ollie spoke to him too.

Not convinced? Let's try this one, as might be output by a web app that is trying to visualize some sort of relationship between FOO and BAR, which are names from its database:

Summary: FOO <span dir=ltr>==&gt;</span> BAR

This gets rendered as

Summary: OOF ==> RAB

The dir=ltr on the ==> was put in to prevent it from being displayed as

Summary: RAB <== OOF

which might not be to the app UI designer's liking for some good reason. Of course, another way to fix this would have been with an &lrm; somewhere between FOO and BAR, but nearly no one knows how to use &lrm;. Also, dir=rtl on both FOO and BAR would have been a good idea, but that would not have fixed the UI designer's original problem, and it may be that they had not yet run into the issue of the names themselves getting garbled, so they did not do it. This scenario is very, very realistic. Unfortunately, the introduction of isolate-by-default onto the dir=ltr will break their fix and make their application suddenly regress.
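For illustration, the two alternatives mentioned above might look like the following in markup. This is only a sketch: uppercase words stand in for right-to-left text, and placing the &lrm; immediately before the arrow is an assumption, since the comment only says "somewhere between FOO and BAR". Under today's non-isolating behavior, both variants should display the arrow as ==> between the two names.

```html
<!-- Variant 1: a left-to-right mark keeps the arrow from being treated as
     part of one right-to-left run spanning FOO and BAR. -->
<p>Summary: FOO &lrm;==&gt; BAR</p>

<!-- Variant 2: explicit direction on the names as well as on the arrow. -->
<p>Summary: <span dir="rtl">FOO</span> <span dir="ltr">==&gt;</span> <span dir="rtl">BAR</span></p>
```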

> Might it not fix as many if not more
> documents than it breaks?

The breakage that one gets due to lack of isolation, when it happens, is quite obvious. If the page gets any QA, it will be found and fixed, somehow - if anyone cares enough about it. If the page doesn't get QA, it likely has a dozen other bidi problems that we won't fix automatically. Besides, one bug added due to lack of backward compatibility is worth about a hundred that got fixed "for free" - but which apparently no one cared about enough to fix themselves.

I am therefore extremely against doing isolation automatically any time dir is specified. If we were inventing the dir attribute today, I would be all for it, but not as things stand today.

Similarly, I would not do it automatically when lang is specified. This has the additional handicap of being unimportant due to the low incidence of lang attribute use, especially inline.
Comment 27 Ian 'Hixie' Hickson 2010-11-03 23:06:18 UTC
Good points regarding lang="" and dir=""; I won't change their defaults.

Defaulting to dir=auto on the new element seems reasonable.

I think <ubi> is just as confusing as <bdi>; the advantage of the latter is in fact the similarity with <bdo>: they'd appear together on alphabetical lists, they'd have an obvious relationship, and people looking for one but remembering the other could thus likely find what they wanted more easily.

I think an element makes more sense than an attribute here, for the same reason <bdo> is an element. There's never a reason to make this paragraph-level, right? (Paragraphs are always isolated?) Generally we only add global attributes when you'd use the attribute on all kinds of elements, not just phrasing content.
Comment 28 Aharon Lanin 2010-11-04 00:03:20 UTC
(In reply to comment #27)
> I think <ubi> is just as confusing as <bdi>; the advantage of the latter is in
> fact the similarity with <bdo>: they'd appear together on alphabetical lists,
> they'd have an obvious relationship, and people looking for one but remembering
> the other could thus likely find what they wanted more easily.

Good points.
 
> There's never a reason to make this paragraph-level,
> right? (Paragraphs are always isolated?)

Right.

> Generally we only add global
> attributes when you'd use the attribute on all kinds of elements, not just
> phrasing content.

Ok.

The proposed solution, as I understand it:
- new bdi element, gets unicode-bidi:isolate in default stylesheet, otherwise same as span.
- bdi element has dir=auto by default (pending the final outcome of bug 10808)
- output element gets unicode-bidi:isolate in default stylesheet.

If so, looks good to me.
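As a rough sketch of what this summary amounts to, the rules below approximate the proposed default-stylesheet behavior in an ordinary page. A user agent would supply such rules itself; the explicit dir=auto stands in for the proposed default, which was still pending at this point; and USERNAME is a placeholder for user-generated text of unknown direction.

```html
<style>
  /* Approximation of the proposed user-agent defaults, written as author
     CSS purely for illustration. */
  bdi, output { unicode-bidi: isolate; }
</style>

<!-- dir="auto" is written out because the default was not yet specified. -->
<p>Top poster: <bdi dir="auto">USERNAME</bdi>: 3 posts.</p>
```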
Comment 29 Ian 'Hixie' Hickson 2010-11-05 00:55:16 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: see diff given below
Rationale: See discussion above.

I've not done the dir=auto thing yet, but I've left a marker in the spec reminding me to fix it as soon as I've added the dir=auto stuff. (Markers get flagged to me every time I check anything in, and are tracked as pending issues in my progress charts, so I don't tend to forget them.)
Comment 30 contributor 2010-11-05 00:55:38 UTC
Checked in as WHATWG revision r5669.
Check-in comment: Add a <bdi> element to safely let people insert user-generated content that may have bidi implications.
http://html5.org/tools/web-apps-tracker?from=5668&to=5669
Comment 31 Aharon Lanin 2010-11-08 13:36:35 UTC
Almost there - some wording tweaks:

1. In the definition of the bdi element, the following sentence appears: "For the purposes of the bidirectional algorithm, the user agent must act as if the contents of the element were a self-contained paragraph not present in the parent element." This is not a complete specification of the expected behavior. For example, the "not present" is not very helpful in terms of determining where the element's content should appear within the surrounding content, or in what order two bdi elements placed next to each other should appear. I would thus suggest something like this instead:

For the purposes of bidirectional resolution of the element's content in the surrounding bidi paragraph (if any), the user agent must treat the element as if it contained just an Object Replacement Character (U+FFFC). For the purposes of bidirectional resolution within the element, the user agent must treat the element's content as a self-contained paragraph (or sequence of paragraphs) with a base direction corresponding to the element's computed direction.

2. In the example following the bdi definition, make the following replacements:
"If the bdi element was not used" -> "If the bdi element were not used"
"would put the colon next and the number" -> "would put the colon and the number"
Comment 32 Ian 'Hixie' Hickson 2010-11-10 17:31:05 UTC
I'll try to fix those problems. Thanks for the feedback and for your patience!
Comment 33 Ian 'Hixie' Hickson 2010-11-11 00:54:49 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: see diff given below
Rationale: Concurred with reporter's comments.
Comment 34 contributor 2010-11-11 00:57:24 UTC
Checked in as WHATWG revision r5677.
Check-in comment: Reword some <bdi> requirements and correct some typos.
http://html5.org/tools/web-apps-tracker?from=5676&to=5677