<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>11234</bug_id>
          
          <creation_ts>2010-11-05 11:30:31 +0000</creation_ts>
          <short_desc>Invalidate documents whose text content contains improperly balanced bidi formatting characters</short_desc>
          <delta_ts>2011-09-04 17:38:48 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML WG</product>
          <component>LC1 HTML5 spec</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P3</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Aharon Lanin">aharon.lists.lanin</reporter>
          <assigned_to name="Ian &apos;Hixie&apos; Hickson">ian</assigned_to>
          <cc>addison</cc>
    
    <cc>aharon.lists.lanin</cc>
    
    <cc>ayg</cc>
    
    <cc>fantasai.bugs</cc>
    
    <cc>ian</cc>
    
    <cc>mike</cc>
    
    <cc>public-html-admin</cc>
    
    <cc>public-html-wg-issue-tracking</cc>
    
    <cc>public-i18n-bidi</cc>
    
    <cc>shachar</cc>
          
          <qa_contact name="HTML WG Bugzilla archive list">public-html-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>42177</commentid>
    <comment_count>0</comment_count>
    <who name="Aharon Lanin">aharon.lists.lanin</who>
    <bug_when>2010-11-05 11:30:31 +0000</bug_when>
    <thetext>As has surfaced in the discussion of bug 10809, it would be helpful to declare invalid documents where any element&apos;s text node children (*not* descendants generally) contain improperly balanced LRE, RLE, LRO, RLO, or PDF characters. In other words, for the purposes of validation, treat every LRE, RLE, LRO, or RLO character as the opening tag of an imaginary element, something like &lt;bidi-formatting&gt;, and PDF as that imaginary element&apos;s closing tag. This applies to these characters&apos; entities, as well, of course.
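
The balance rule described above can be sketched in Python as follows. This is only an illustrative sketch, not text from the spec or from any validator: the helper name is hypothetical, and it assumes the caller passes in the concatenation of one element's own text node children, with character references already decoded.

```python
# Hypothetical helper for illustration; not taken from the spec or a validator.
# U+202A..U+202E are, in order: LRE, RLE, PDF, LRO, RLO.
LRE, RLE, PDF, LRO, RLO = (chr(c) for c in range(0x202A, 0x202F))
OPENERS = {LRE, RLE, LRO, RLO}


def is_bidi_balanced(text):
    """Return True if every LRE/RLE/LRO/RLO in text is closed by a later PDF
    and no PDF appears without an unclosed opener before it."""
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1  # acts like an opening tag of the imaginary element
        elif ch == PDF:
            if depth == 0:
                return False  # a PDF with nothing left to close
            depth -= 1
    return depth == 0  # any opener still pending is unbalanced
```

Under that assumption, each element is checked separately against its own direct text, so an opener in a parent and its PDF in a child are reported as two unbalanced strings rather than one balanced one.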

Examples of invalid usage:

1. &lt;div&gt;&amp;#x202A;&lt;/div&gt;
2. &lt;div&gt;&amp;#x202C;&lt;/div&gt;
3. &lt;div&gt;&amp;#x202C;&amp;#x202A;&lt;/div&gt;
4. &lt;div&gt;&amp;#x202A;&amp;#x202A;&amp;#x202C;&lt;/div&gt;
5. &lt;div&gt;&amp;#x202A;&lt;br&gt;&amp;#x202A;&amp;#x202C;&lt;/div&gt;
6. &lt;div&gt;&amp;#x202A;&lt;span&gt;&amp;#x202C;&lt;/span&gt;&lt;/div&gt;
7. &lt;div&gt;&lt;span&gt;&amp;#x202A;&lt;/span&gt;&amp;#x202C;&lt;/div&gt;

An example of valid (but not recommended!) usage:

&lt;div&gt;&amp;#x202A;&lt;span&gt;...&lt;/span&gt;&amp;#x202C;&lt;/div&gt;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>42233</commentid>
    <comment_count>1</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-11-08 08:01:28 +0000</bug_when>
    <thetext>This shouldn&apos;t be too hard to add to the spec.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>42275</commentid>
    <comment_count>2</comment_count>
    <who name="Simon Pieters">zcorpan</who>
    <bug_when>2010-11-08 15:17:36 +0000</bug_when>
    <thetext>What about attribute values?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>42280</commentid>
    <comment_count>3</comment_count>
    <who name="Aharon Lanin">aharon.lists.lanin</who>
    <bug_when>2010-11-08 17:11:24 +0000</bug_when>
    <thetext>(In reply to comment #2)
&gt; What about attribute values?

Not sure what you mean.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>42286</commentid>
    <comment_count>4</comment_count>
    <who name="Simon Pieters">zcorpan</who>
    <bug_when>2010-11-08 21:07:45 +0000</bug_when>
    <thetext>I mean, should the following be invalid?

&lt;p title=&quot;&amp;#x202A;&quot;&gt;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>42303</commentid>
    <comment_count>5</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-11-09 02:17:12 +0000</bug_when>
    <thetext>Yes.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>42305</commentid>
    <comment_count>6</comment_count>
    <who name="Aharon Lanin">aharon.lists.lanin</who>
    <bug_when>2010-11-09 07:17:12 +0000</bug_when>
    <thetext>(In reply to comment #5)
&gt; Yes.

Yeah, it makes sense. They should be balanced within an attribute value.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>43981</commentid>
    <comment_count>7</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2011-01-10 09:31:46 +0000</bug_when>
    <thetext>EDITOR&apos;S RESPONSE: This is an Editor&apos;s Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: see diff given below
Rationale: Concurred with reporter&apos;s comments.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>43982</commentid>
    <comment_count>8</comment_count>
    <who name="">contributor</who>
    <bug_when>2011-01-10 09:55:28 +0000</bug_when>
    <thetext>Checked in as WHATWG revision r5754.
Check-in comment: Define conformance criteria around bidi formatting characters
http://html5.org/tools/web-apps-tracker?from=5753&amp;to=5754</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>53920</commentid>
    <comment_count>9</comment_count>
    <who name="Michael[tm] Smith">mike</who>
    <bug_when>2011-08-04 05:34:29 +0000</bug_when>
    <thetext>mass-move component to LC1</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>54378</commentid>
    <comment_count>10</comment_count>
    <who name="Aharon Lanin">aharon.lists.lanin</who>
    <bug_when>2011-08-08 13:42:59 +0000</bug_when>
    <thetext>The checked-in change seems to say that the use of the formatting characters (when restricted as specified) is perfectly fine:

&quot;[Text content] may contain characters in the range U+202A to U+202E (the bidirectional-algorithm formatting characters).&quot;

&quot;Note: *For convenience*, where possible authors will likely prefer to use the dir attribute, the bdo element, and the bdi element, rather than maintaining the bidirectional-algorithm formatting characters manually.&quot; (emphasis mine)

The use of the formatting characters, even when they obey the given rules, should still be discouraged. It is *not* equivalent to the use of the dir attribute and the bdo element, for two reasons. (BTW, the bdi element should not be mentioned at all. There is no way to faithfully emulate its behavior using the formatting characters.)

1. The dir attribute sets the element&apos;s directionality. The formatting characters don&apos;t. That means that they do not affect the proposed CSS4 :dir(ltr|rtl) pseudo-class.

2. When used around an element that introduces a bidi paragraph break, e.g. &quot;LRE &lt;br&gt; PDF&quot; or &quot;LRE &lt;div&gt;&lt;/div&gt; PDF&quot;, the formatting characters go completely haywire: the paragraph break resets the bidirectional state, so the effect of the opening character is lost after the paragraph break and the closing formatting character is left unmatched. The effects of the dir attribute, on the other hand, are carefully defined in CSS (via its effect on unicode-bidi) to be reopened after the paragraph break.

Neither of these can be fixed. Thus, the use of the formatting characters, even when they obey the given rules, should be discouraged wherever mark-up can be used instead. The bug as opened suggested ruling certain uses of formatting characters completely invalid. It did not suggest pronouncing the remaining use perfectly fine.

Certainly the use of the dir attribute etc. is more than a matter of convenience. It is *the only recommended way* of declaring text direction in HTML (except for those places where mark-up cannot be used, e.g. inside &lt;option&gt; and &lt;title&gt;). The use of both CSS and formatting characters for this purpose is discouraged (for different reasons).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>54379</commentid>
    <comment_count>11</comment_count>
    <who name="Shachar Shemesh">shachar</who>
    <bug_when>2011-08-08 13:48:46 +0000</bug_when>
    <thetext>(In reply to comment #10)
&gt; Certainly the use of the dir attribute etc. is more than a matter of
&gt; convenience. It is *the only recommended way* of declaring text direction in
&gt; HTML

While I do not disagree with you on this point (which is to say, I agree), I think we should not go as far as recommending against (&quot;should not&quot;). The BiDi control characters can come in handy when different sources produce the HTML entities and the content, and are sometimes the only practical option available.

Shachar</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>54390</commentid>
    <comment_count>12</comment_count>
    <who name="fantasai">fantasai.bugs</who>
    <bug_when>2011-08-08 19:34:35 +0000</bug_when>
    <thetext>(In reply to comment #10)
&gt; (BTW, the bdi element should
&gt; not be mentioned at all. There is no way to faithfully emulate its behavior
&gt; using the formatting characters.)

I disagree on this point; you can&apos;t faithfully emulate &lt;bdi&gt; with formatting characters as it&apos;s not equivalent to any one of them, but some of the problems that are solved with formatting characters (like &amp;rlm;) are better solved with &lt;bdi&gt;, so this cross-reference should be given.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>54451</commentid>
    <comment_count>13</comment_count>
    <who name="Aharon Lanin">aharon.lists.lanin</who>
    <bug_when>2011-08-09 11:38:56 +0000</bug_when>
    <thetext>(In reply to comment #12)
&gt; (In reply to comment #10)
&gt; &gt; (BTW, the bdi element should
&gt; &gt; not be mentioned at all. There is no way to faithfully emulate its behavior
&gt; &gt; using the formatting characters.)
&gt; 
&gt; I disagree on this point; you can&apos;t faithfully emulate &lt;bdi&gt; with formatting
&gt; characters as it&apos;s not equivalent to any one of them, but some of the problems
&gt; that are solved with formatting characters (like &amp;rlm;) are better solved
&gt; with &lt;bdi&gt;, so this cross-reference should be given.

Currently, the sentence says that the mark-up is just a convenience that translates to formatting characters, which is not really true for dir= and &lt;bdo&gt;, and completely untrue for &lt;bdi&gt;. If the sentence is changed to encourage people to use dir=, &lt;bdo&gt;, and &lt;bdi&gt; instead of formatting characters, then I fully agree with fantasai.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>54681</commentid>
    <comment_count>14</comment_count>
    <who name="Aharon Lanin">aharon.lists.lanin</who>
    <bug_when>2011-08-11 07:40:21 +0000</bug_when>
    <thetext>(In reply to comment #11)
&gt; (In reply to comment #10)
&gt; &gt; Certainly the use of the dir attribute etc. is more than a matter of
&gt; &gt; convenience. It is *the only recommended way* of declaring text direction in
&gt; &gt; HTML
&gt; 
&gt; While I do not disagree with you on this point (which is to say, I agree), I
&gt; think we should not go as far as recommending against (&quot;should not&quot;). The BiDi
&gt; control characters can come in handy when different sources produce the HTML
&gt; entities and the content, and are sometimes the only practical option
&gt; available.
&gt; 
&gt; Shachar

The spec could recommend using directional mark-up instead of directional formatting characters whenever feasible. It could also have a note warning that placing an element between an LRE, RLE, LRO, or RLO and its matching PDF does not work well with various HTML and CSS features, and has effects that vary radically depending on the element&apos;s style.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>55327</commentid>
    <comment_count>15</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2011-08-17 19:23:52 +0000</bug_when>
    <thetext>EDITOR&apos;S RESPONSE:

Status: Accepted
Change Description: see diff given below
Rationale: Concurred with reporter&apos;s comments. Specifically, I changed the spec to encourage authors to use the elements instead, and made the conformance rules not allow &quot;LRE &lt;div&gt;&lt;/div&gt; PDF&quot;.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>55328</commentid>
    <comment_count>16</comment_count>
    <who name="">contributor</who>
    <bug_when>2011-08-17 19:24:27 +0000</bug_when>
    <thetext>Checked in as WHATWG revision r6487.
Check-in comment: More useful conformance rules and advice for bidi formatting characters
http://html5.org/tools/web-apps-tracker?from=6486&amp;to=6487</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>55547</commentid>
    <comment_count>17</comment_count>
    <who name="Aharon Lanin">aharon.lists.lanin</who>
    <bug_when>2011-08-21 15:39:59 +0000</bug_when>
    <thetext>The change looks great, with two small flaws:

1. The treatment given to an element that is flow content but is not also phrasing content should be extended to &lt;br&gt;, which also serves as a bidi paragraph break, and thus (by design) terminates the effects of the bidi formatting characters.

2. The comment that the formatting characters interact poorly with CSS is too narrow - they also interact poorly with some HTML features (even when used as currently spec&apos;ed). An example:

&lt;div dir=rtl&gt;&amp;#x202A;If this works I will eat my &lt;input /&gt;.&amp;#x202C;&lt;/div&gt;

The &lt;input&gt; will have RTL directionality despite being between an LRE and its matching PDF.

I am not suggesting adding this example or changing the validity spec - just expanding the note to include some unspecified HTML features (as opposed to just CSS).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>55661</commentid>
    <comment_count>18</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2011-08-23 05:25:01 +0000</bug_when>
    <thetext>I&apos;ll add something about &lt;br&gt;.
 

&gt; &lt;div dir=rtl&gt;&amp;#x202A;If this works I will eat my &lt;input /&gt;.&amp;#x202C;&lt;/div&gt;

That&apos;s not a poor interaction IMHO.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>56080</commentid>
    <comment_count>19</comment_count>
    <who name="Addison Phillips">addison</who>
    <bug_when>2011-08-31 05:18:11 +0000</bug_when>
    <thetext>BTW: The I18N WG supported re-opening this bug and Aharon&apos;s comments generally (I18N-ACTION-66).

In looking at the changes, I note that there may be a very minor typo where it says:

--
The strings resulting from the applying the following algorithm...
--

It should say &quot;The string&quot;, since &quot;output&quot; is a single string?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>56291</commentid>
    <comment_count>20</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2011-09-04 17:35:32 +0000</bug_when>
    <thetext>(In reply to comment #19)
&gt; It should say &quot;The string&quot;, since &quot;output&quot; is a single string?

&quot;output&quot; is a list of strings.


EDITOR&apos;S RESPONSE:

Status: Partially Accepted
Change Description: see diff given below
Rationale: Addressed the &lt;br&gt; issue.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>56292</commentid>
    <comment_count>21</comment_count>
    <who name="">contributor</who>
    <bug_when>2011-09-04 17:38:48 +0000</bug_when>
    <thetext>Checked in as WHATWG revision r6533.
Check-in comment: Make sure &lt;br&gt; is handled right in the requirements regarding bidi formatting characters.
http://html5.org/tools/web-apps-tracker?from=6532&amp;to=6533</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>