This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 23260 - Make the dir attribute use isolation instead of embedding
Summary: Make the dir attribute use isolation instead of embedding
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: PC Linux
Importance: P2 critical
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
URL:
Whiteboard:
Keywords:
Depends on: 22326
Blocks:
Reported: 2013-09-16 23:03 UTC by Travis Leithead [MSFT]
Modified: 2013-12-10 10:36 UTC
CC List: 9 users


Description Travis Leithead [MSFT] 2013-09-16 23:03:21 UTC
+++ This bug was initially created as a clone of Bug #22326 +++

Cloned at filer's request.

As discussed in http://www.w3.org/International/wiki/Html-bidi-isolation, the dir attribute should start using isolation instead of embedding.

There is some additional discussion in https://www.w3.org/Bugs/Public/show_bug.cgi?id=18490, but further discussion on isolation for the dir attribute should happen here.

The only change that appears to be necessary in the spec is in
http://www.w3.org/html/wg/drafts/html/master/rendering.html#bidirectional-text,
replacing all of the following:

=========
address, blockquote, center, div, figure, figcaption, footer, form,
header, hr, legend, listing, p, plaintext, pre, summary, xmp, article,
aside, h1, h2, h3, h4, h5, h6, hgroup, main, nav, section, table, caption,
colgroup, col, thead, tbody, tfoot, tr, td, th, dir, dd, dl, dt, menu,
ol, ul, li {
  unicode-bidi: isolate;
}

:matches([dir=ltr i], [dir=rtl i], [dir=auto i]):not(address):not(blockquote
):not(center):not(div):not(figure):not(figcaption):not(footer):not(form
):not(header):not(hr):not(legend):not(listing):not(main):not(p):not(plaintext):not(pre
):not(summary):not(xmp):not(article):not(aside):not(h1):not(h2):not(h3):not(h4
):not(h5):not(h6):not(hgroup):not(nav):not(section):not(table):not(caption
):not(colgroup):not(col):not(thead):not(tbody):not(tfoot):not(tr):not(td
):not(th):not(dir):not(dd):not(dl):not(dt):not(menu):not(ol):not(ul):not(li) {
  unicode-bidi: embed;
}

bdi, bdi:matches([dir=ltr i], [dir=rtl i]),
output, output:matches([dir=ltr i], [dir=rtl i]),
[dir=auto i] {
  unicode-bidi: isolate;
}

bdo, bdo:matches([dir=ltr i], [dir=rtl i]) { unicode-bidi: bidi-override; }
bdo[dir=auto i] { unicode-bidi: isolate-override; }
=========

with the following:

=========
address, blockquote, center, div, figure, figcaption, footer, form,
header, hr, legend, listing, p, plaintext, pre, summary, xmp, article,
aside, h1, h2, h3, h4, h5, h6, hgroup, main, nav, section, table, caption,
colgroup, col, thead, tbody, tfoot, tr, td, th, dir, dd, dl, dt, menu,
ol, ul, li,
[dir=ltr i], [dir=rtl i], [dir=auto i],
bdi, output  {
  unicode-bidi: isolate;
}

bdo, bdo[dir=ltr i], bdo[dir=rtl i] { unicode-bidi: bidi-override; }
bdo[dir=auto i] { unicode-bidi: isolate-override; }
=========

Please note that the above leaves <bdo dir="ltr|rtl"> with unicode-bidi: bidi-override instead of the new unicode-bidi: isolate-override. We could move them too to isolate-override, and it may well be cleaner, but this has not been discussed.
Comment 1 Aharon Lanin 2013-09-17 19:42:01 UTC
Comment 1 from 22326 didn't get cloned. Here it is:

Come to think of it, it would be best to make a small change in the section on the dir attribute itself, http://www.w3.org/html/wg/drafts/html/master/dom.html#the-dir-attribute. At the very start, we have the following three paragraphs:

======
The ltr keyword, which maps to the ltr state
Indicates that the contents of the element are explicitly directionally embedded left-to-right text.

The rtl keyword, which maps to the rtl state
Indicates that the contents of the element are explicitly directionally embedded right-to-left text.

The auto keyword, which maps to the auto state
Indicates that the contents of the element are explicitly embedded text, but that the direction is to be determined programmatically using the contents of the element (as described below).
======

In the first two, the term "directionally embedded" would be best changed to "directionally isolated".

In the third, "explicitly embedded" should be changed to either "directionally isolated" or "explicitly directionally isolated".
Comment 2 Ian 'Hixie' Hickson 2013-09-17 20:13:33 UTC
Why are we changing this stuff _again_? Can't we leave it for a few years? There's so much churn in this part of the spec that I'd be surprised if any browser vendor even looked at the spec any more.

Are browsers planning on changing to this?

What problem is it solving?

What's the back-compat impact?
Comment 3 Martin Dürst 2013-09-18 03:55:38 UTC
(In reply to Ian 'Hixie' Hickson from comment #2)
> Why are we changing this stuff _again_? Can't we leave it for a few years?
> There's so much churn in this part of the spec that I'd be surprised if any
> browser vendor even looked at the spec any more.

There is indeed quite some churn here, and browser vendors may wait and look at the spec again once things (including the Unicode Bidi Algorithm itself) have settled down.

> Are browsers planning on changing to this?
> 
> What problem is it solving?

The actual change that Aharon proposed in comment #1 makes sure that browsers have leeway when implementing this. They can make sure that the dir attribute uses isolation by explicitly including the relevant parts of the default stylesheet. Or they can make sure that the default behavior is just the same as the default stylesheet.

Another way to interpret Aharon's tweak in comment #1 is that it makes sure the spec is consistent, nothing more. And I wouldn't want to wait a few years to make that happen.

> What's the back-compat impact?

If we assume that the default stylesheet part trumps the definition in the 'dir' attribute section, then Aharon's tweak in comment #1 doesn't change anything.

If you ask about the back-compat impact of the overall bug, I'd describe it as follows:

There is a difference between embedding and isolation only in weird corner cases. And these cases are such that when you hit them, you want isolation, not embedding. You have to fake that by adding some &lrm; or &rlm; or some such. The fake doesn't do any harm when interpretation changes from embedding to isolation; it just becomes unnecessary. So the back-compat impact is essentially non-existent.
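
As a rough illustration of the kind of fake mentioned above (a sketch only: the Hebrew string is placeholder text and the sentence is borrowed from the example used later in this bug), an author working around embedding in an LTR page might write:

```html
<!DOCTYPE html>
<html lang="en" dir="ltr">
<head>
<meta charset="utf-8">
<title>LRM workaround for an embedded RTL link</title>
</head>
<body>
  <!-- The &lrm; after the link is a strong left-to-right character, so the
       following " - 27" resolves with the LTR paragraph instead of
       attaching to the right-to-left run inside the link. -->
  <p><a dir="rtl">שלום עולם</a>&lrm; - 27 september 2013</p>
</body>
</html>
```

If dir becomes isolating, the same markup should keep rendering the same way; the mark simply no longer does any work.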
Comment 4 Aharon Lanin 2013-09-18 14:38:22 UTC
> What problem is it solving?

If you ask someone who uses the dir attribute what it does, they will answer (correctly) that it sets the direction of the stuff inside the element. What not one in a hundred will realize is that, as currently specified, it also affects the ordering of the content outside as would a strong character of that direction. Most of the time, this does not have a visible effect. But not infrequently, it has a disastrous effect. For example, in an LTR page,

<a dir="rtl">HEBREW ARTICLE NAME</a> - 27 september 2013

is displayed unreadably:

27 - EMAN ELCITRA WERBEH september 2013

No one expects it, but it happens.

This is why a couple of years ago we asked for directionality isolation (https://www.w3.org/Bugs/Public/show_bug.cgi?id=10807). And we got it - as the <bdi> element. The idea was that when you want it, you use it. Thus,

<a><bdi dir="rtl">HEBREW ARTICLE NAME</bdi></a> - 27 september 2013

displays as expected:

EMAN ELCITRA WERBEH - 27 september 2013
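
The two renderings above can be compared side by side with a small test page. This is only a sketch: the Hebrew string is placeholder text, the class names are made up for the illustration, and unicode-bidi is set explicitly on the link (rather than through a <bdi> wrapper) so that the result does not depend on the browser's default style sheet.

```html
<!DOCTYPE html>
<html lang="en" dir="ltr">
<head>
<meta charset="utf-8">
<title>dir: embedding vs. isolation</title>
<style>
  /* Hypothetical class names; author rules override the UA default. */
  .embed   { unicode-bidi: embed; }    /* behaviour as currently specified */
  .isolate { unicode-bidi: isolate; }  /* behaviour proposed in this bug */
</style>
</head>
<body>
  <!-- Expected to show the date fragment pulled to the left of the name. -->
  <p><a class="embed" dir="rtl">שלום עולם</a> - 27 september 2013</p>

  <!-- Expected to show the name first, then " - 27 september 2013". -->
  <p><a class="isolate" dir="rtl">שלום עולם</a> - 27 september 2013</p>
</body>
</html>
```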

Now, the problem, as fully explained in http://www.w3.org/International/wiki/Html-bidi-isolation:

Non-isolation is almost never intended and usually only causes problems. As Unicode admitted, isolation should have been the way that embedding was defined when UAX#9 first came out all those years ago. LRE and RLE are deprecated in favor of LRI and RLI. Thus, in HTML, isolation should be the default.
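
For reference, the plain-text counterparts of the two behaviours can be written with numeric character references. This is a sketch with placeholder Hebrew text; the code points are those assigned by Unicode (RLE U+202B and PDF U+202C for embedding, RLI U+2067 and PDI U+2069 for isolation), and whether a given browser honours the isolate characters is an assumption to verify.

```html
<!DOCTYPE html>
<html lang="en" dir="ltr">
<head>
<meta charset="utf-8">
<title>RLE/PDF vs. RLI/PDI</title>
</head>
<body>
  <!-- RLE ... PDF: right-to-left embedding; affects the text that follows. -->
  <p>&#x202B;שלום עולם&#x202C; - 27 september 2013</p>

  <!-- RLI ... PDI: right-to-left isolate; the surrounding text is unaffected. -->
  <p>&#x2067;שלום עולם&#x2069; - 27 september 2013</p>
</body>
</html>
```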

<bdi> can not continue to be the only or even the primary way to achieve isolation in markup, since it relegates isolation to being a little-known power tool instead of the default for bidi content, and since using a special element for this purpose is impractical in some scenarios.

- As long as isolates are more difficult to set up than embeddings, embeddings will be the default, and isolates the exception; the use of isolates will not replace the use of embeddings.

- A single attribute has historically been and should continue to be sufficient to do all the bidi in HTML. Why should the preferred way to embed opposite-direction content inline now require the use of both a special-purpose element (<bdi>) and a special attribute (dir)?

- HTML document authors must be instructed that when a “block” element like <p> gets opposite-direction content, they should indicate it by putting a dir attribute on that element. For “inline” elements, however, it depends. An element like <textarea> or <input> or <option> whose content is inherently “out-of-flow” and thus directionally isolated can also get the dir attribute directly on it. However, when an “ordinary” “inline” element like <cite> gets opposite-direction content, they should not just put the dir attribute directly on it, but on a special <bdi> element especially inserted for that purpose either within the <cite> or around it. (Which, by the way?) The distinctions are impossible to justify or explain!

- When an HTML or XHTML document tags a data item with microformatting or some other form of data export, it makes good sense to also indicate the data item’s direction using an attribute on the tagged element, so that consumers of the data will know how to display it properly. It makes little sense to put it on a surrounding element, where consumers of the data will ignore it (unless they bother to ask for the tagged element’s computed direction style) or on an element especially introduced within the tagged element for the purpose of carrying the attribute, suddenly turning what had been a nice plain-text data item into HTML. If the attribute goes on the tagged element, and it happens to be inline, we want it to be isolated, so now the tagged element suddenly has to be <bdi>. Do we need to update the RFCs on microformatting to require the use of <bdi> for all microformatting (except where a “block” element is used)?

In brief, we must make it possible to set up bidi isolates by using the dir attribute alone.

> Can't we leave it for a few years?

No.

> What's the back-compat impact?

Overall, good, for two reasons:

- We believe it will fix more problems in existing pages than it will create.

- There is no browser interoperability in this respect anyway because IE8-10 does not follow the current spec. It displays an inline element with a dir attribute as if that element were followed by an LRM or an RLM, whichever fits the directionality of the parent. Thus, the example above is displayed as intended - and as it would display if dir were defined as isolating.

> Why are we changing this stuff _again_?

Actually, we aren't changing *this* stuff again. The effect of the basic dir attribute was not changed in the last go-around. We added a new value, auto, which is great and which the current proposal is not touching in any way. And we added the <bdi> element, which we are now not asking to change in any way either - it remains handy in some situations. In hindsight, we should have asked for this in the last go-around. We didn't because we were afraid of backward compatibility issues, and because we were not aware that IE had silently changed the semantics of dir to something that usually has the same effect as isolation. I guess they had more chutzpah than we did - and should have had.

> There's so much churn in this part of the spec that I'd be surprised if any
> browser vendor even looked at the spec any more.

They do. Mozilla and WebKit and Blink implemented dir=auto and <bdi>. IE hasn't yet, which I very much hope will no longer be the case in IE11 (anyone listening? :-), but as evidenced by what they changed in IE8, it seems that they weren't looking at this part of the spec anyway.

> Are browsers planning on changing to this?

I don't know about IE - I don't have any contacts there. I have every reason to believe that Mozilla will change this if the spec changes; I have discussed this with fantasai and Simon. I believe I can also vouch for Blink (once the spec changes). I have not discussed this with anyone at WebKit, but I have no reason to think that they won't change it. But the best thing would be if the various representatives responded on this bug.

> If we assume that the default stylesheet part trumps the definition in the
> 'dir' attribute section, then Aharon's tweak in comment #1 doesn't change
> anything.

It does trump it, the change would be mandatory, browsers would not have leeway, and the tweak doesn't change anything. It just makes things clearer in the part of the spec that humans read.
Comment 5 Travis Leithead [MSFT] 2013-09-18 23:28:17 UTC
These are great clarifications, thank you.
Comment 6 Richard Ishida 2013-09-19 11:59:41 UTC
Aharon, thanks for responding so much more clearly and comprehensively than what I was preparing.
Comment 7 Ian 'Hixie' Hickson 2013-09-23 19:54:49 UTC
So does this mean that with this change we can drop <bdi>?

Why didn't we do this to start with instead of introducing <bdi>?
Comment 8 Aharon Lanin 2013-09-24 08:21:59 UTC
(In reply to Ian 'Hixie' Hickson from comment #7)
> So does this mean that with this change we can drop <bdi>?

If we were doing this from scratch now, we would not bother introducing <bdi>. I am not sure it is a good idea to drop it, though, because it is starting to be used, and because of its dir="auto" default, it is handy for marking off unknown-direction inserts.

> Why didn't we do this to start with instead of introducing <bdi>?

When we filed https://www.w3.org/Bugs/Public/show_bug.cgi?id=10807, we were afraid to make backward-incompatible changes. Furthermore, at that time, there were no Unicode isolates, and it was not yet well accepted that Unicode LRE and RLE should have been defined as isolates from the get-go.

However, we did not suggest adding <bdi>. What we asked to do was to add a new boolean attribute that would control isolation. Thus, <span dir="rtl" ubi> and <span ubi> would be isolating; <span dir="rtl"> (or <span dir="rtl" ubi="off">) would not.

That proposal was not accepted. You proposed doing isolation via an element instead of an attribute in https://www.w3.org/Bugs/Public/show_bug.cgi?id=10807#c6. I objected several times in follow-up comments, but you insisted, and in the end I deferred to you. I was not yet aware that requiring an element gives the additional problems I described above (microformatting and the block/inline dichotomy).
Comment 9 Ian 'Hixie' Hickson 2013-09-24 22:09:04 UTC
> When we filed bug 10807, we were afraid to make backward-incompatible changes. 

Why are we not afraid any more? What's changed?

In bug 10807 comment 19, you wrote:

| Also, it is possible that adding isolation by default would break existing 
| documents. This is also the argument against doing isolation by default any 
| time the dir attribute is set.

In bug 10807 comment 14, you wrote:

| In fact, if <a> and <q> were being invented today, we would want isolation
| for them by default - but we dare not do that now because it would most
| certainly break some existing documents.

I agree! This change would "most certainly break some existing documents". This usually makes it a non-starter. Do we have data now suggesting that we are wrong to expect breakage?


> However, we did not suggest adding <bdi>. What we asked to do was to add a
> new boolean attribute that would control isolation. Thus, <span dir="rtl"
> ubi> and <span ubi> would be isolating; <span dir="rtl"> (or <span dir="rtl"
> ubi="off") would not.
> 
> That proposal was not accepted.

That proposal wasn't a problem, it was a solution. When the problems were described, it turned out that an attribute didn't make sense to solve them. (I note that with this bug, again, initially just a solution was described, with no description of the problem.)

I'm still not sure I understand what has changed since then. Why are the use cases that led to <bdi> no longer satisfied by <bdi>?


> You proposed doing isolation via an element instead of an attribute in bug 
> 10807. I objected several times in follow-up comments

Objecting without providing a reason ("I still prefer attribute as the more easily used and less disruptive solution", bug 10807 comment 26) is not a useful objection. Bug 10807 comment 14 is the only place where you attempted to provide an actual reason to prefer a global attribute, but I provided counter arguments in bug 10807 comment 17 and bug 10807 comment 27.


Indeed in bug 10807 comment 22 I specifically proposed exactly what this bug is now asking for:

| How could it break an existing document? Might it not fix as many if not
| more documents than it breaks?
|
| Indeed, doing this automatically any time dir="" is explicitly set might
| not be a bad idea either... do we have any data on how many pages would
| change rendering if we did this? Might this not actually make more sense
| overall?
|
| I'm very much of the opinion that we should make this work as automatically
| as possible, because there's no way most authors are going to learn or
| understand this stuff.

...and it was only because of your detailed arguments that I abandoned that line of research — see the last 17 paragraphs of bug 10807 comment 26 (also quoted at the end of this comment).


(In reply to Aharon Lanin from comment #4)
> 
> <bdi> can not continue to be the only or even the primary way to achieve
> isolation in markup, since it relegates isolation to being a little-known
> power tool instead of the default for bidi content

I don't see why <bdi dir=""> need be a "little-known power tool". Plenty of pages use <a href="">, which is arguably more complicated.


> and since using a
> special element for this purpose is impractical in some scenarios.

What scenarios? The scenarios for which <bdi> was invented are those listed in bug 10807 comment 16, primarily the first of those two. Why would <bdi> not work for those? Or are there new use cases that haven't been listed yet?


> - As long as isolates are more difficult to set up than embeddings,
> embeddings will be the default, and isolates the exception; the use of
> isolates will not replace the use of embeddings.

I don't see why

   <span dir="ltr">...</span>

...is any easier than:

   <bdi dir="ltr">...</bdi>


> - A single attribute has historically been and should continue to be
> sufficient to do all the bidi in HTML.

I disagree with both premises in this statement. It's never been sufficient, and even if it was, that isn't a reason to continue that way. (Or not continue. It's just not a relevant factor.)


> Why should the preferred way to embed
> opposite-direction content inline now require the use of both a
> special-purpose element (<bdi>) and a special attribute (dir)?

For the same reason that every semantic in HTML requires a special-purpose element, basically, with the attribute to provide fine-grained control. That's how HTML works.

Global attributes make sense when they apply globally, but isolation only makes sense at the phrasing level (since all non-phrasing elements are always isolated, due to earlier changes). So if we're not to just change dir=""'s semantics, something which three years ago you convincingly argued we should never do, then an element makes sense, as you agreed in bug 10807 comment 28.


> - HTML document authors must be instructed that when a “block” element like
> <p> gets opposite-direction content, they should indicate it by putting a
> dir attribute on that element. For “inline” elements, however, it depends.

You say this like it's complicated, but I don't think it is. All you have to say is "Set your text directionality on your container element, such as <body> or <p>, using the dir="" attribute. When you embed text from a different directionality inside other text, use a <bdi> element".

(Note that HTML has neither "block" nor "inline" anymore. There are "flow" elements, like <p>, <li>, <bdi>, or <span>, and some of those are also "phrase" elements, like <bdi> and <span>.)


> An element like <textarea> or <input> or <option> whose content is
> inherently “out-of-flow” and thus directionally isolated can also get the
> dir attribute directly on it. However, when an “ordinary” “inline” element
> like <cite> gets opposite-direction content, they should not just put the
> dir attribute directly on it, but on a special <bdi> element especially
> inserted for that purpose either within the <cite> or around it. (Which, by
> the way?) The distinctions are impossible to justify or explain!

As I said in bug 10807 comment 17: "By that argument, we shouldn't have <bdo>, or indeed <a> (many links are given on elements that are already in the markup) or indeed many of the phrasing elements... I don't think that argument holds water". Authors have no trouble figuring out that they can do <a><cite>, why would they have trouble figuring out they can do <bdi><cite>?


> - When an HTML or XHTML document tags a data item with microformatting or
> some other form of data export, it makes good sense to also indicate the
> data item’s direction using an attribute on the tagged element, so that
> consumers of the data will know how to display it properly. It makes little
> sense to put it on a surrounding element, where consumers of the data will
> ignore it (unless they bother to ask for the tagged element’s computed
> direction style) or on an element especially introduced within the tagged
> element for the purpose of carrying the attribute, suddenly turning what had
> been a nice plain-text data item into HTML. If the attribute goes on the
> tagged element, and it happens to be inline, we want it to be isolated, so
> now the tagged element suddenly has to be <bdi>. Do we need to update the
> RFCs on microformatting to require the use of <bdi> for all microformatting
> (except where a “block” element is used)?

I'm not sure what you mean here.

If you're referring to microdata, then it completely ignores directionality, so it doesn't matter where you put the dir="" attribute (microdata only operates on text strings). If you mean microformats, then Tantek informs me that it honours HTML's semantics transparently, so it doesn't matter if the dir="" is on the element with the class attribute, its parent, or its child.


> In brief, we must make it possible to set up bidi isolates by using the dir
> attribute alone.

Please respond to bug 10807 comment 26 (reproduced below) explaining why those comments are no longer true.


> > Why are we changing this stuff _again_?
> 
> Actually, we aren't changing *this* stuff again.

By "this stuff" I mean anything touching bidi. It seems to me we've changed bidi stuff in the HTML spec at least once a year for the past four or five years.


The end of bug 10807 comment 26 follows:
===============================================================================
> Indeed, doing this automatically any time dir="" is explicitly set might not 
> be a bad idea either... do we have any data on how many pages would change
> rendering if we did this?

Authors do not always know what they are doing, especially when it comes to bidi. Consider the following:

i spoke to JOHN. <span dir=ltr>susan</span>, MIKE and ollie spoke to him too.

Of course, the dir=ltr on susan is unnecessary, while dir=rtl would have been a good idea on JOHN and MIKE, but like I said, people often get really confused when it comes to bidi. Currently, despite all the nonsense, it is rendered as intended:

i spoke to NHOJ. susan, EKIM and ollie spoke to him too.

With isolation snuck in by default, though, one would get:

i spoke to EKIM ,susan .NHOJ and ollie spoke to him too.

Not convinced? Let's try this one, as might be output by a web app that is trying to visualize some sort of relationship between FOO and BAR, which are names from its database:

Summary: FOO <span dir=ltr>==&gt;</span> BAR

This gets rendered as

Summary: OOF ==> RAB

The dir=ltr on the ==> was put in to prevent it from being displayed as

Summary: RAB <== OOF

which might not be to the app UI designer's liking for some good reason. Of course, another way to fix this would have been with an &lrm; somewhere between FOO and BAR, but nearly no one knows how to use &lrm;. Also, dir=rtl on both FOO and BAR would have been a good idea, but that would not have fixed the UI designer's original problem, and it may be that they had not yet run into the issue of the names themselves getting garbled yet, so they did not do it. This scenario is very, very realistic. Unfortunately, the introduction of isolate-by-default onto the dir=ltr will break their fix and make their application suddenly regress.

> Might it not fix as many if not more
> documents than it breaks?

The breakage that one gets due to lack of isolation, when it happens, is quite obvious. If the page gets any QA, it will be found and fixed, somehow - if anyone cares enough about it. If the page doesn't get QA, it likely has a dozen other bidi problems that we won't fix automatically. Besides, one bug added due to lack of backward compatibility is worth about a hundred that got fixed "for free" - but which apparently no one cared about enough to fix themselves.

I am therefore extremely against doing isolation automatically any time dir is specified. If we were inventing the dir attribute today, I would be all for it, but not as things stand today.

Similarly, I would not do it automatically when lang is specified. This has the additional handicap of being unimportant due to the low incidence of lang attribute use, especially inline.
===============================================================================
Comment 10 Richard Ishida 2013-09-25 06:28:30 UTC
Ian, have you read http://www.w3.org/International/wiki/Html-bidi-isolation?

The main thing that has changed is that Unicode 6.3, which will be released in a few days' time, now recommends that isolation be the default for *all* future inline bidirectional text embeddings, and that document summarises a discussion that spanned several months about how best to achieve that.

We started, initially, by trying to make bdi work, but came upon a number of problems - not least, how to ensure that isolation is henceforth always applied (and in the simplest way) in all situations where people mark up their content for direction. The solution of making dir isolate not only solves that problem, but provides a useful fix for existing pages, and establishes consistency with behaviours already in IE.

Making dir isolate also has the potential to significantly reduce the complexity of markup for most situations. See http://www.w3.org/International/articles/inline-bidi-markup/#whattodo, where all the various rules for dealing with bidi in HTML4 and HTML5 when you know the direction of the text can simply be reduced to "continue to use dir as you always have". No need to learn about or remember to use new markup or RLM/LRM.
Comment 11 Ian 'Hixie' Hickson 2013-09-26 18:15:09 UTC
My point is that I proposed all this three years ago, and was told it was impossible because of backwards compatibility concerns, which seemed (and still seem) quite valid and compelling. If the idea was bad three years ago because of compatibility concerns, why is it not bad now because of compatibility concerns? I don't understand how it could have gotten _less_ bad. The content that would have broken then is still there, and we've probably added an order of magnitude more content since.
Comment 12 Aharon Lanin 2013-10-09 08:08:17 UTC
Primarily, what changed for me is that, unbeknownst to us, IE 8 - 10 started treating the dir attribute in a way that is:
- incompatible with the current spec
- not backward compatible
- not interoperable with other browsers
- similar in effect to isolation (in the usual case).

The interesting part is that when IE did this, the world did not come to an end. No complaints made their way to me. No bugs were filed on products I follow that were due to this change. I heard nothing about it on the web.

Thus, the arguments and dire predictions I made in bug 10807 comment 26 against making isolation the default behavior of the dir attribute simply did not stand the test of time. Some of the use cases I cited there did come from real pages, but that usage was changed soon after I cited because it was broken in other ways - not because the IE8 change had broken it.

It turned out that I was just too obsessed with backward compatibility. 

You did suggest three years ago precisely what we are suggesting now, and I was the one who argued not to do it. In March of this year, I was dumbfounded to discover the change in IE, and gradually became convinced that I was wrong to argue against changing the semantics of dir. By the time I was writing comment 8 above, it appears that I had completely repressed that history. I simply did not remember your suggestion and my response. I profoundly apologize for my very partial (and self-serving) recollection. I realized just how unfair it was when you answered my comment, and have been beating myself up over it for the past couple of weeks. Once again, I apologize.

In any case, I am only addressing here the backward compatibility issue. It appears I also need to clarify further the problems with using <bdi>, but want to do so separately, and only if we can get past the backward compatibility issue.
Comment 13 Ian 'Hixie' Hickson 2013-10-09 16:43:34 UTC
(In reply to Aharon Lanin from comment #12)
> Primarily, what changed for me is that, unbeknownst to us, IE 8 - 10 started
> treating the dir attribute in a way that is [...]

Ah, interesting. I didn't realise that the problems with IE's behaviour were as radical, and as clear of negative effects, as that. That's good to know.


> I profoundly apologize for my very partial (and self-serving) recollection.

No worries (apology fully accepted, thank you!). My concern wasn't so much that I had proposed it, so much as we had decided against it before, and it wasn't clear to me what had changed. Thank you for clarifying the impact of IE's bugs on this decision.


Based on the lack of effect of IE's similar (though not identical) change, it seems reasonable to make the change proposed above, to satisfy the use cases in bug 10807. This makes <bdi> pointless, as far as I can tell. Should we drop <bdi>? (Should we replace it with <bde>, as a way to use the embeddings, should anyone want them?)

Note that one set of pages that will be deeply affected by this are bidi test cases. That's gonna be a mess and a half.
Comment 14 Aharon Lanin 2013-10-10 08:07:31 UTC
> No worries (apology fully accepted, thank you!).

Thank you!

> This makes <bdi> pointless, as far as I can tell.
> Should we drop <bdi>?
> Should we replace it with <bde>, as a way to use the embeddings,
> should anyone want them?


If we were doing this from scratch now, we would not bother introducing <bdi>. I am not sure it is a good idea to drop it, though, because it is starting to be used, and because it is handy for marking off unknown-direction inserts due to its dir="auto" default. It could be re-labeled as "bidi insert" instead of "bidi isolate".

I don't think we need <bde>. One can always get a very close approximation of embedding by using &lrm;s or &rlm;s around an inline element with (an isolating) dir attribute (whichever matches the parent's directionality). Or one can use CSS in the page to override the default CSS change that will make dir isolating.
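
A sketch of the CSS route, assuming the default style sheet change proposed in this bug has been made. The class name legacy-embed is hypothetical; it is used so that the override applies only to elements the author tags, and does not also strip the behaviour of <bdo>, <bdi> or <output>.

```html
<!DOCTYPE html>
<html lang="en" dir="ltr">
<head>
<meta charset="utf-8">
<title>Opting back into embedding</title>
<style>
  /* Author rules take precedence over the default style sheet, so this
     restores embedding for the tagged elements only. */
  .legacy-embed[dir] { unicode-bidi: embed; }
</style>
</head>
<body>
  <!-- Placeholder Hebrew text; this link keeps the old, embedding behaviour. -->
  <p><a class="legacy-embed" dir="rtl">שלום עולם</a> - 27 september 2013</p>
</body>
</html>
```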
Comment 15 Ian 'Hixie' Hickson 2013-10-10 22:13:29 UTC
The dir=auto default on <bdi> making it still somewhat useful makes sense, true.
Comment 16 Erika Doyle Navara 2013-10-24 23:40:37 UTC
FWIW, Travis and I mined the IE archives for more background on the bidi changes we made in IE8 and their impact.

The change in behavior was a byproduct of our rewrite of the bidi algorithm to comply with Unicode, not an intentionally specced behavior. Nevertheless, the compat servicing fallout seems to have been minimal: I couldn't find any bugs directly related to this issue since then that we didn't already fix during the course of IE8 development (2 public betas + 1 release candidate build), and our developers from the time don't recall any major issues.

With IE11, we actually changed our behavior to match browser interop, but even so, IE is on board with this change.

I went ahead and posted the change for public review:

http://lists.w3.org/Archives/Public/public-html/2013Oct/0113.html

How does this look:

https://github.com/w3c/html/commit/efbf692e44fa3e6b9d66a0823fc2133ff1d18f1e
Comment 17 Aharon Lanin 2013-10-25 02:33:54 UTC
The change basically looks good to me, with two exceptions.

1. The "<!-- basically anything that is display:block-like -->" comment should no longer be on the "unicode-bidi: isolate;" line, since it is talking only about the list of "block-like" elements at the start of the selector. It does not apply to the "[dir=ltr i], [dir=rtl i], [dir=auto i]" and "bdi, output" parts of the selector. Thus, those lines should really look something like this (if we want to retain comments):

ol, ul, li, <!-- anything that by default has display:block, in case document gives it display:inline -->
[dir=ltr i], [dir=rtl i], [dir=auto i], <!-- anything with a dir attribute -->
bdi, output <!-- inlines defined as directionally isolated -->
{
  unicode-bidi: isolate;
}

2. When we filed this bug (see above), we speculated that perhaps <bdo dir="ltr|rtl"> should also be directionally isolated, i.e. given unicode-bidi:isolate-override instead of the current unicode-bidi:bidi-override, but noted that this has not been discussed. Since then, we have discussed this some more in the i18n WG and (I think) feel that indeed this should be done. The arguments, if I remember correctly, are that it would make the effects of dir more consistent across different kinds of elements, and should be helpful (for exactly the same reasons as it is helpful on other inline elements) on bdo in its most common use case of displaying "visual Hebrew" text.
Comment 18 Ian 'Hixie' Hickson 2013-11-14 22:17:46 UTC
Let me know if I missed something.
Comment 19 contributor 2013-11-14 22:20:10 UTC
Checked in as WHATWG revision r8283.
Check-in comment: Change how dir='' works, from being an embedding to being an override, for better results on mixed-directionality sites. THIS IS A HIGH RISK CHANGE, EXPECT BREAKAGE. Please report breakage on the bug if it's higher than acceptable, so we can revert the change if necessary.
http://html5.org/tools/web-apps-tracker?from=8282&to=8283
Comment 20 Aharon Lanin 2013-12-02 08:32:29 UTC
Sorry I am replying so late, but there are a number of problems in the latest set of changes.

1. There is a problem in the CSS. The selector of "bdo { unicode-bidi: isolate-override; }" has a lower specificity than the selectors of "[dir=ltr i], [dir=rtl i], [dir=auto i] { unicode-bidi: isolate; }". As a result, this assigns unicode-bidi:isolate to <bdo dir="ltr"> instead of the intended isolate-override. To fix this, "bdo { unicode-bidi: isolate-override; }" should be replaced with "bdo, bdo[dir=ltr i], bdo[dir=rtl i], bdo[dir=auto i] { unicode-bidi: isolate-override; }". Note that this does assign isolate-override to <bdo> and <bdo dir="auto">, though, which according to the spec for bdo are not allowed. If that is a problem (even though we have not worried about it up to now), we need to break this up into two parts:

bdo, bdo[dir=auto i] { unicode-bidi: normal; }

bdo[dir=ltr i], bdo[dir=rtl i] { unicode-bidi: isolate-override; }
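
The specificity argument in point 1 can be checked with a rough sketch. The helpers below (specificity, winning_value) are hypothetical and written only for illustration; they handle nothing beyond a type selector plus #id, .class and [attr] parts, and the caller states by hand which selectors match the element. The rule order shown is an assumption and does not affect the result here, since the specificities differ.

```python
import re


def specificity(selector):
    """Return (ids, classes_and_attributes, types) for one compound selector.

    Simplified on purpose: no combinators, pseudo-classes or selector lists.
    """
    attribute_parts = re.findall(r"\[[^\]]*\]", selector)
    rest = re.sub(r"\[[^\]]*\]", "", selector)
    ids = len(re.findall(r"#[\w-]+", rest))
    classes = len(re.findall(r"\.[\w-]+", rest))
    types = 1 if re.match(r"[A-Za-z]", rest) else 0
    return (ids, classes + len(attribute_parts), types)


def winning_value(rules, matching_selectors):
    """Pick the winning declaration among rules of the same origin.

    rules: list of (selector, value) pairs in stylesheet order.
    matching_selectors: selectors assumed to match the element.
    Higher specificity wins; on a tie, the later rule wins.
    """
    best_key, best_value = None, None
    for order, (selector, value) in enumerate(rules):
        if selector not in matching_selectors:
            continue
        key = (specificity(selector), order)
        if best_key is None or key > best_key:
            best_key, best_value = key, value
    return best_value


# The two rules discussed in point 1, reduced to what matters for <bdo dir="ltr">.
before = [("[dir=ltr i]", "isolate"), ("bdo", "isolate-override")]

# The same rules plus the more specific selector proposed as the fix.
after = before + [("bdo[dir=ltr i]", "isolate-override")]

matches_before = {"[dir=ltr i]", "bdo"}
matches_after = matches_before | {"bdo[dir=ltr i]"}

print(winning_value(before, matches_before))
print(winning_value(after, matches_after))
```

With the attribute selector at (0, 1, 0) and the bare type selector at (0, 0, 1), the attribute rule wins for <bdo dir="ltr">; adding bdo[dir=ltr i] at (0, 1, 1) puts the bdo rule back on top.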

2. The spec for bdo, which previously stated that the element behaves as if it were surrounded by LRO ... PDF or RLO ... PDF, now states that it behaves as if it were surrounded by LRI ... PDI or RLI ... PDI. This is wrong, or at least incomplete, because the isolate formatting characters alone do not achieve the required override effect. It has to behave as if surrounded by LRI LRO ... PDF PDI or RLI RLO ... PDF PDI. But actually, I think it would be best to remove any reference to implementation via Unicode formatting characters entirely, because this is actually in the province of http://www.w3.org/TR/css-writing-modes-3. The specification there is actually much more involved because it has to deal with the possibility that an element (that is supposed to behave as surrounded by bidi formatting characters) happens to contain within it various things that are bidi paragraph breaks. (These include newlines when white-space is preformatted, <br>, and nested elements with display:block.) According to the Unicode spec, a paragraph break ends the effect of bidi formatting character(s), but this does not make sense for HTML/CSS, where one would expect the effect of a dir attribute to last until the end of the element. Thus, when such a paragraph break is encountered, the spec says that the opening Unicode formatting characters for all the relevant ancestor elements have to be repeated after the paragraph break. For example, for <div dir="ltr">hello <bdo dir="rtl">foo<br>bar</bdo> world</div>, the equivalent plain text is "[LRE]hello [RLI][RLO]foo[PDF][PDI]\n[RLI][RLO]bar[PDF][PDI] world[PDF]". I do not think that any of this should be specified on the HTML level because it needs to refer to style - display, white-space, unicode-bidi. (And, if we were to specify it on the HTML level, the place to start would be the dir attribute, not the relatively unimportant <bdo>.)

Thus I think that the two paragraphs in the <bdo> spec that start with "If the element's directionality is" should be removed. If you do not want to make such a radical change at this late date, then they should be restored to the text they had prior to http://html5.org/tools/web-apps-tracker?from=8282&to=8283.
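
The close-and-reopen behaviour at a paragraph break described in point 2 can be sketched as follows. This is only an illustration of the idea, not the Writing Modes algorithm: wrap_across_breaks is a hypothetical helper, the bracketed names stand in for the real Unicode formatting characters, <br> is modelled as a newline, and only a single inline element is handled (no nesting).

```python
def wrap_across_breaks(opening, closing, content, paragraph_break="\n"):
    """Wrap content in bidi controls, closing them before each paragraph
    break and reopening them after it, so the effect spans the whole element.
    """
    segments = content.split(paragraph_break)
    return paragraph_break.join(opening + segment + closing for segment in segments)


# <bdo dir="rtl">foo<br>bar</bdo> with isolate-override semantics.
bdo_text = wrap_across_breaks("[RLI][RLO]", "[PDF][PDI]", "foo\nbar")

# Surrounding <div dir="ltr">, written as in the example from point 2.
plain_text = "[LRE]hello " + bdo_text + " world[PDF]"

print(plain_text)
```

The assembled string has the same shape as the plain-text equivalent quoted in point 2: the override is closed before the break and reopened after it, rather than being silently ended by the paragraph break as plain Unicode rules would have it.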

3. A minor typo: "a such an element" should be "such an element".

4. While it probably is not absolutely necessary, I think that the sentence (in the spec for the dir attribute) that currently reads "For the purposes of applying the bidirectional algorithm to the contents of elements with a dir attribute that is in one of the states defined above, user agents must treat the element as an independent and isolated segment" would be more complete if the words "of the directionality determined above" or something like them were added. But I would not have reopened the bug for that alone.
Comment 21 Ian 'Hixie' Hickson 2013-12-02 20:06:58 UTC
(In reply to Aharon Lanin from comment #20)
> 1. There is a problem in the CSS. The selector of "bdo { unicode-bidi:
> isolate-override; }" has a lower specificity than the selectors of "[dir=ltr
> i], [dir=rtl i], [dir=auto i] { unicode-bidi: isolate; }".

Oops, fixed.


> But actually, I think it would
> be best to remove any reference to implementation via Unicode formatting
> characters entirely, because this is actually in the province of
> http://www.w3.org/TR/css-writing-modes-3.

Unfortunately CSS is optional so we can't rely on it being present.

But I suppose we could just require that UAs act as if it was present. Done. This supersedes your other comments in point 2 and your points 3 and 4.

Note that the above specification doesn't seem to actually define any of this in sufficient detail. The bidi spec lists the precise ways in which it can be overridden by a higher-level spec (the HL* rules), but I don't see anything in the Writing Modes spec that actually does this adequately. So right now, this is still far too vaguely defined to get interop. (We are also relying on the 'content' property, which is currently basically unspecified.)
Comment 22 contributor 2013-12-02 20:07:12 UTC
Checked in as WHATWG revision r8317.
Check-in comment: Move all requirements about bidi out and just rely on CSS instead. Also, fix the CSS rules for <bdo>.
http://html5.org/tools/web-apps-tracker?from=8316&to=8317
Comment 23 Aharon Lanin 2013-12-03 12:28:47 UTC
> fixed the CSS rules for <bdo>.

Thanks, looks good.

> Note that the above specification

Not sure what you mean by "above specification"

> doesn't seem to actually define any of this in sufficient detail.
> The bidi spec lists the precise ways in which it can be overridden by a
> higher-level spec (the HL* rules), but I don't see anything in
> the Writing Modes spec that actually does this adequately. So right now, this
> is still far too vaguely defined to get interop.

Here is some of the relevant text from Writing Modes:

"User agents that support bidirectional text must apply the Unicode bidirectional algorithm to every sequence of inline-level boxes uninterrupted by any block boundary or “bidi type B” forced paragraph break. This sequence forms the paragraph unit in the bidirectional algorithm."

"In general, the paragraph embedding level is set according to the direction property of the paragraph’s containing block rather than by the heuristic given in steps P2 and P3 of the Unicode algorithm. [UAX9] When the computed unicode-bidi of the paragraph’s containing block is plaintext, however, the Unicode heuristics (rules P2 and P3) are used instead."

"If an inline element is broken around a bidi paragraph boundary (e.g. if split by a block or forced paragraph break), then the bidi control codes assigned to the end of the element are added before the interruption and the codes assigned to the start of the element are added after it. (In other words, any embedding levels or overrides started by the element are closed at the paragraph break and reopened on the other side of it.)"

If you think that this is inadequate in some specific way, perhaps it should be brought up on the CSS list?

> We are also relying on the 'content' property, which is currently basically
> unspecified.

Indeed, this bothers me, and I am not sure that it is a good idea to remove the specific bidi requirements from the description of <br> and <wbr>.

Furthermore, the CSS for <br> and <wbr> is in http://www.whatwg.org/specs/web-apps/current-work/#phrasing-content-1, and not in http://www.whatwg.org/specs/web-apps/current-work/#bidirectional-text, so it is not at all obvious that there is anything bidi-significant there.

Perhaps a note should be added to http://www.whatwg.org/specs/web-apps/current-work/#bidirectional-text saying that http://www.whatwg.org/specs/web-apps/current-work/#phrasing-content-1 also contains some rules relevant to bidirectional reordering, in that the content specified for <br> and <wbr> determines their bidirectional properties.
Comment 24 Ian 'Hixie' Hickson 2013-12-05 18:32:03 UTC
> > Note that the above specification
> 
> Not sure what you mean by "above specification"

http://dev.w3.org/csswg/css-writing-modes


> "In general, the paragraph embedding level is set according to the direction
> property of the paragraph’s containing block rather than by the heuristic
> given in steps P2 and P3 of the Unicode algorithm.

That's non-normative (no MUST). And it doesn't use the HL* rules to define the override formally as the bidi spec requests.


> [UAX9] When the computed
> unicode-bidi of the paragraph’s containing block is plaintext, however, the
> Unicode heuristics (rules P2 and P3) are used instead."

That's non-normative (no MUST).


> "If an inline element is broken around a bidi paragraph boundary (e.g. if
> split by a block or forced paragraph break), then the bidi control codes
> assigned to the end of the element are added before the interruption and the
> codes assigned to the start of the element are added after it."

That's non-normative (no MUST). Also it's vague — is "the bidi control codes assigned to the end" defined anywhere? Does it handle nested elements? How about bidi formatting codes that come from the block? (e.g. a block-level bdo)


> If you think that this is inadequate in some specific way, perhaps it should
> be brought up on the CSS list?

Filed bug 24006.


> > We are also relying on the 'content' property, which is currently basically
> > unspecified.
> 
> Indeed, this bothers me

We just need to spec the 'content' property. We're relying on it for more than just bidi.


> Furthermore, the CSS for <br> and <wbr> is in
> http://www.whatwg.org/specs/web-apps/current-work/#phrasing-content-1, and
> not in
> http://www.whatwg.org/specs/web-apps/current-work/#bidirectional-text, so it
> is not at all obvious that there is anything bidi-significant there.

I've added cross-references.
Comment 25 Aharon Lanin 2013-12-06 06:59:53 UTC
I don't see the cross-references yet.
Comment 26 Aharon Lanin 2013-12-10 10:36:02 UTC
I see them now, thanks.