<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>10806</bug_id>
          
          <creation_ts>2010-09-29 11:45:56 +0000</creation_ts>
          <short_desc>ignoring escapes is not needed for compatibility with existing content</short_desc>
          <delta_ts>2011-03-24 12:38:22 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML WG</product>
          <component>pre-LC1 HTML5 spec (editor: Ian Hickson)</component>
          <version>unspecified</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows NT</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>WONTFIX</resolution>
          
          
          <bug_file_loc>http://dev.w3.org/html5/spec/Overview.html#content-type-sniffing</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords>NE, WGDecision</keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          <blocked>10804</blocked>
          <everconfirmed>1</everconfirmed>
          <reporter name="Julian Reschke">julian.reschke</reporter>
          <assigned_to name="Ian &apos;Hixie&apos; Hickson">ian</assigned_to>
          <cc>annevk</cc>
    
    <cc>hsivonen</cc>
    
    <cc>ian</cc>
    
    <cc>mike</cc>
    
    <cc>mjs</cc>
    
    <cc>public-html-admin</cc>
    
    <cc>public-html-wg-issue-tracking</cc>
    
    <cc>rubys</cc>
          
          <qa_contact name="HTML WG Bugzilla archive list">public-html-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>39929</commentid>
    <comment_count>0</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2010-09-29 11:45:56 +0000</bug_when>
    <thetext>&lt;http://dev.w3.org/html5/spec/Overview.html#content-type-sniffing&gt;:

&quot;If it is a U+0022 QUOTATION MARK (&apos;&quot;&apos;) and there is a later U+0022 QUOTATION MARK (&apos;&quot;&apos;) in s
If it is a U+0027 APOSTROPHE (&quot;&apos;&quot;) and there is a later U+0027 APOSTROPHE (&quot;&apos;&quot;) in s
    Return the encoding corresponding to the string between this character and the next earliest occurrence of this character.&quot;

This is indeed a violation of the Content-Type syntax defined in RFC 2616, in that it does not handle backslash-escapes inside a quoted-string.
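
For illustration, a minimal Python sketch of the two readings, applied to the value used in the attached test case (charset="\U\TF-8"). The function names are hypothetical, and this is neither the spec's full algorithm nor any browser's code; it only contrasts "take everything up to the next quote" with RFC 2616 quoted-pair handling.

```python
def spec_quoted_value(s, start):
    """Rule quoted above: the text between the quote at `start`
    and the next occurrence of the same quote character."""
    quote = s[start]
    end = s.find(quote, start + 1)
    if end == -1:
        return None  # no later quote; the rule quoted above does not apply
    return s[start + 1:end]


def rfc2616_quoted_string(s, start):
    """RFC 2616 quoted-string: inside double quotes, a backslash
    makes the following character literal (quoted-pair)."""
    if s[start] != '"':
        return None
    out = []
    i = start + 1
    while i != len(s):
        ch = s[i]
        if ch == "\\" and i + 1 != len(s):
            out.append(s[i + 1])  # keep the escaped character, drop the backslash
            i += 2
        elif ch == '"':
            return "".join(out)   # unescaped closing quote ends the string
        else:
            out.append(ch)
            i += 1
    return None  # unterminated quoted-string


value = r'"\U\TF-8"'  # parameter value from the attached test case
print(spec_quoted_value(value, 0))
print(rfc2616_quoted_string(value, 0))
```

With the test case's value, the first function yields \U\TF-8, which is not a charset name, while the second yields UTF-8.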

The spec claims that this is required for &quot;backwards compatibility with legacy
content&quot;.

I&apos;m attaching a test case that shows that the following browsers *do* handle escapes despite what the spec says:

Opera, Safari, Konqueror 4.4

Please remove the requirement to violate the base syntax.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39930</commentid>
    <comment_count>1</comment_count>
      <attachid>917</attachid>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2010-09-29 11:46:35 +0000</bug_when>
    <thetext>Created attachment 917
test case for backslash-escapes in quoted-string</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39931</commentid>
    <comment_count>2</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2010-09-29 11:51:29 +0000</bug_when>
    <thetext>It seems better for Opera to match Chrome / Gecko / IE here. We may very well hit problems because of this.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39932</commentid>
    <comment_count>3</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2010-09-29 12:03:21 +0000</bug_when>
    <thetext>(In reply to comment #2)
&gt; It seems better for Opera to match Chrome / Gecko / IE here. We may very well
&gt; hit problems because of this.

Please elaborate.

The backslash character isn&apos;t even allowed in charset names.

I believe that requiring an incompatibility with a base spec needs much stronger justification.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39933</commentid>
    <comment_count>4</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2010-09-29 12:07:06 +0000</bug_when>
    <thetext>(In reply to comment #3)
&gt; The backslash character isn&apos;t even allowed in charset names.

Right, so we may need to ignore the rule rather than handle it.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39937</commentid>
    <comment_count>5</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2010-09-29 12:22:31 +0000</bug_when>
    <thetext>(In reply to comment #4)
&gt; Right, so we may need to ignore the rule rather than handle it.

Nope, unless you can prove that there is existing content that &quot;breaks&quot; because it uses &quot;\x&quot; although it wants to say &quot;x&quot;.

In the absence of that proof, the right thing to do is to handle escape sequences as specified.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40045</commentid>
    <comment_count>6</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-09-30 03:54:05 +0000</bug_when>
    <thetext>EDITOR&apos;S RESPONSE: This is an Editor&apos;s Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale:

WebKit trunk doesn&apos;t support this. Are you sure you haven&apos;t set your default encoding to UTF-8 in Safari? What version are you testing?

Opera doesn&apos;t support this either. It just ignores all punctuation. For instance, see:
   http://www.hixie.ch/tests/adhoc/html/parsing/encoding/121.html
This is known to be incompatible with legacy content, however (the spec used to do this too, but had to change for compat reasons).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40069</commentid>
    <comment_count>7</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2010-09-30 07:41:24 +0000</bug_when>
    <thetext>1) I&apos;m running Safari 5.0.2 on Windows, and &quot;Text Encoding&quot; is set to &quot;default&quot;.

2) The question of *why* Opera is handling the escapes properly isn&apos;t really relevant; what counts is the observable behavior.

You have claimed that this violation is needed for &quot;compatibility&quot;, yet neither the shipping versions of Opera and Safari on Windows nor Konqueror do this. I think this is proof that the claim is incorrect.

Please provide more data.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40072</commentid>
    <comment_count>8</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-09-30 07:52:33 +0000</bug_when>
    <thetext>What does Safari get for you on this test (the expected and actual encodings, not the pass/fail state)?:
   http://www.hixie.ch/tests/adhoc/html/parsing/encoding/113.html</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40083</commentid>
    <comment_count>9</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2010-09-30 08:14:14 +0000</bug_when>
    <thetext>(In reply to comment #8)
&gt; What does Safari get for you on this test (the expected and actual encodings,
&gt; not the pass/fail state)?:
&gt;    http://www.hixie.ch/tests/adhoc/html/parsing/encoding/113.html

Expected result: Windows-1252

Encoding used by browser is: Windows-1254</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40084</commentid>
    <comment_count>10</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-09-30 08:23:12 +0000</bug_when>
    <thetext>Upon further investigation, it turns out WebKit has changed behaviour. It used to ignore all punctuation (it didn&apos;t support backslash escapes, just ignored punctuation like Opera). Newer trunk builds however now don&apos;t ignore punctuation.

(In reply to comment #7)
&gt; 2) The question of *why* Opera is handling the escapes properly isn&apos;t really
&gt; relevant; what counts is the observable behavior.

The point is that Opera isn&apos;t handling the escapes at all. For example, if you put a backslash before the final quote, it doesn&apos;t continue the string. It still ends at the quote. For example, consider:

   http://www.hixie.ch/tests/adhoc/html/parsing/encoding/122.html
   http://www.hixie.ch/tests/adhoc/html/parsing/encoding/123.html
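
A rough Python sketch of that distinction, under stated assumptions: loose_label is a hypothetical model of "end at the next quote, then ignore punctuation when matching", not Opera's actual code, and the second input is made up for illustration rather than taken from the tests linked above.

```python
import re


def loose_label(value):
    """Hypothetical model: stop at the next quote, then drop every
    non-alphanumeric character before matching the label."""
    end = value.find('"', 1)
    raw = value[1:end]
    return "".join(c for c in raw if c.isalnum()).lower()


def rfc_unescape(value):
    """RFC 2616 quoted-string: an escaped quote does not end the
    string, and each quoted-pair yields the escaped character."""
    m = re.match(r'"((?:[^"\\]|\\.)*)"', value)
    if m is None:
        return None
    return re.sub(r'\\(.)', r'\1', m.group(1))


# Backslashes are dropped as punctuation, so this looks like escape handling.
print(loose_label(r'"\U\TF-8"'))

# A backslash before a quote: the loose model still ends at that quote,
# while real escape handling continues to the unescaped closing quote.
print(loose_label(r'"UTF-8\"x"'))
print(rfc_unescape(r'"UTF-8\"x"'))
```

Both approaches agree on the first input, which is why the observable result alone cannot tell them apart; they diverge only once a backslash precedes a quote.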



Status: Rejected
Change Description: no spec change
Rationale: As far as I can tell, no browsers with more than 1% market share do anything with backslashes at all, and the browsers that ignored backslashes are actively moving towards what the spec says.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40107</commentid>
    <comment_count>11</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2010-09-30 08:58:53 +0000</bug_when>
    <thetext>You still haven&apos;t provided evidence that this is needed for compatibility.

Will raise a tracker issue.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40129</commentid>
    <comment_count>12</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2010-09-30 09:51:52 +0000</bug_when>
    <thetext>(In reply to comment #11)
&gt; You still haven&apos;t provided evidence that this is needed for compatibility.
&gt; 
&gt; Will raise a tracker issue.

Are you suggesting that backslash-processing complexity be introduced into implementations without compatibility with existing content requiring it?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40130</commentid>
    <comment_count>13</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2010-09-30 10:02:29 +0000</bug_when>
    <thetext>What I&apos;m suggesting is that implementations should treat quoted-strings in HTTP parameters uniformly, so special-casing Content-Type is *adding* complexity.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40131</commentid>
    <comment_count>14</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2010-09-30 10:06:48 +0000</bug_when>
    <thetext>Opera will very soon match Safari nightlies and everyone else here. Us &quot;supporting&quot; this was just a side effect of supporting the way too liberal UTS22 matching rules.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40133</commentid>
    <comment_count>15</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2010-09-30 10:17:55 +0000</bug_when>
    <thetext>(In reply to comment #14)
&gt; Opera will very soon match Safari nightlies and everyone else here. Us
&gt; &quot;supporting&quot; this was just a side effect of supporting the way too liberal
&gt; UTS22 matching rules.

A self-fulfilling prophecy.

Anyway, I have shown that the claim that this is &quot;required for compatibility&quot; is wrong; thus I will escalate the issue.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40140</commentid>
    <comment_count>16</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2010-09-30 11:54:52 +0000</bug_when>
    <thetext>Raised as

  http://www.w3.org/html/wg/tracker/issues/126</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>917</attachid>
            <date>2010-09-29 11:46:35 +0000</date>
            <delta_ts>2010-09-29 11:46:35 +0000</delta_ts>
            <desc>test case for backslash-escapes in quoted-string</desc>
            <filename>enctest2.html</filename>
            <type>text/html</type>
            <size>2473</size>
            <attacher name="Julian Reschke">julian.reschke</attacher>
            
              <data encoding="base64">PCFET0NUWVBFIEhUTUw+DQo8bWV0YSBodHRwLWVxdWl2PSJDb250ZW50LVR5cGUiIGNvbnRlbnQ9
J3RleHQvaHRtbDsgY2hhcnNldD0iXFVcVEYtOCInPg0KPHA+VGVzdDoNCjxwcmU+Jmx0OyFET0NU
WVBFIEhUTUw+DQombHQ7bWV0YSBodHRwLWVxdWl2PSJDb250ZW50LVR5cGUiIGNvbnRlbnQ9J3Rl
eHQvaHRtbDsgY2hhcnNldD0iXFVcVEYtOCInPjwvcHJlPg0KPHA+RXhwZWN0ZWQgcmVzdWx0OiA8
c3BhbiBpZD0iZXhwZWN0ZWQiPlVURi04PC9zcGFuPg0KPGRpdj4NCiA8c3R5bGUgc2NvcGVkPg0K
ICAucGFzcyB7IGJhY2tncm91bmQ6IGdyZWVuOyBjb2xvcjogd2hpdGU7IHBhZGRpbmc6IDAuNWVt
OyBmb250LXdlaWdodDogYm9sZDsgfQ0KICAuZmFpbCB7IGJhY2tncm91bmQ6IHJlZDsgY29sb3I6
IHllbGxvdzsgcGFkZGluZzogMC41ZW07IGZvbnQtd2VpZ2h0OiBib2xkOyB9DQogPC9zdHlsZT4N
CiA8cD5FbmNvZGluZyB1c2VkIGJ5IGJyb3dzZXIgaXM6IDxzcGFuIGlkPSJlbmNvZGluZyI+U2Ny
aXB0IGRpZCBub3QgcnVuLjwvc3Bhbj4NCiA8cD5SZXN1bHQ6IDxzcGFuIGlkPSJyZXN1bHQiPlNj
cmlwdCBkaWQgbm90IHJ1bi48L3NwYW4+DQogPHNjcmlwdD4NCiAgIHZhciBlbmNvZGluZyA9ICd1
bmtub3duJzsNCiAgIGlmICgn4CcgPT0gJ1x1MDBFMCcpIHsgLy8gMHhFMA0KICAgICBpZiAoJ5kn
ID09ICdcdTIxMjInKSB7IC8vIDB4OTkNCiAgICAgICBpZiAoJ/4nID09ICdcdTAwRkUnKSAvLyAw
eEZFDQogICAgICAgICBlbmNvZGluZyA9ICdXaW5kb3dzLTEyNTInOw0KICAgICAgIGVsc2UgaWYg
KCf+JyA9PSAnXHUwMTVGJykNCiAgICAgICAgIGVuY29kaW5nID0gJ1dpbmRvd3MtMTI1NCc7DQog
ICAgICAgZWxzZSBpZiAoJ/4nID09ICdcdTIwMEYnKQ0KICAgICAgICAgZW5jb2RpbmcgPSAnV2lu
ZG93cy0xMjU2JzsNCiAgICAgICBlbHNlIGlmICgn/icgPT0gJ1x1MjBBQicpDQogICAgICAgICBl
bmNvZGluZyA9ICdXaW5kb3dzLTEyNTgnOw0KICAgICAgIGVsc2UNCiAgICAgICAgIGVuY29kaW5n
ID0gJ3Vua25vd24gd2l0aCAweEUwID0gVSswMEUwIGFuZCAweDk5ID0gVSsyMTIyJzsNCiAgICAg
fSBlbHNlIGlmICgn/ycgPT0gJ1x1MDJEOScpIC8vIDB4RkYNCiAgICAgICBlbmNvZGluZyA9ICdJ
U08tODg1OS0zJzsNCiAgICAgZWxzZSBpZiAoJ/4nID09ICdcdTAxNzcnKSAvLyAweEZFDQogICAg
ICAgZW5jb2RpbmcgPSAnSVNPLTg4NTktMTQnOw0KICAgICBlbHNlIGlmICgn/icgPT0gJ1x1MDIx
QicpDQogICAgICAgZW5jb2RpbmcgPSAnSVNPLTg4NTktMTYnOw0KICAgICBlbHNlIGlmICgn/icg
PT0gJ1x1MDE1RicpDQogICAgICAgZW5jb2RpbmcgPSAnSVNPLTg4NTktOSc7DQogICAgIGVsc2Ug
aWYgKCe+JyA9PSAnXHUwMEJFJykgLy8gMHhCRQ0KICAgICAgIGVuY29kaW5nID0gJ0lTTy04ODU5
LTEnOw0KICAgICBlbHNlIGlmICgnvicgPT0gJ1x1MDE3OCcpDQogICAgICAgZW5jb2RpbmcgPSAn
SVNPLTg4NTktMTUnOw0KICAgICBlbHNlDQogICAgICAgZW5jb2RpbmcgPSAndW5rbm93biB3aXRo
IDB4RTAgPSBVKzAwRTAnOw0KICAgfSBlbHNlIGlmICgn4CcgPT0gJ1x1MDEwMScpIC8vIDB4RTAN
CiAgICAgZW5jb2RpbmcgPSAnSVNPLTg4NTktMTAnOw0KICAgZWxzZSBpZiAoJ+AnID09ICdcdTBF
NDAnKSAvLyAweEUwDQogICAgIGVuY29kaW5nID0gJ0lTTy04ODU5LTExJzsNCiAgIGVsc2UgaWYg
KCfgJyA9PSAnXHVGRkZEJykgeyAvLyAweEUwDQogICAgIGlmICgn4pi6JyA9PSAnXHUyNjNBJykg
Ly8gMHhFMiAweDk4IDB4QkENCiAgICAgICBlbmNvZGluZyA9ICdVVEYtOCc7DQogICAgIGVsc2UN
CiAgICAgICBlbmNvZGluZyA9ICd1bmtub3duIChidXQgQVNDSUktY29tcGF0aWJsZSknOw0KICAg
fQ0KICAgZG9jdW1lbnQuZ2V0RWxlbWVudEJ5SWQoJ2VuY29kaW5nJykuZmlyc3RDaGlsZC5kYXRh
ID0gZW5jb2Rpbmc7DQogICB2YXIgZXhwZWN0ZWQgPSBkb2N1bWVudC5nZXRFbGVtZW50QnlJZCgn
ZXhwZWN0ZWQnKS5maXJzdENoaWxkLmRhdGE7DQogICBpZiAoZXhwZWN0ZWQgPT0gJyhkZW1vIC0g
bm8gZXhwZWN0ZWQgcmVzdWx0KScpIHsNCiAgICAgZG9jdW1lbnQuZ2V0RWxlbWVudEJ5SWQoJ3Jl
c3VsdCcpLmZpcnN0Q2hpbGQuZGF0YSA9ICdkZW1vJzsNCiAgIH0gZWxzZSBpZiAoZW5jb2Rpbmcg
PT0gZXhwZWN0ZWQpIHsNCiAgICAgZG9jdW1lbnQuZ2V0RWxlbWVudEJ5SWQoJ3Jlc3VsdCcpLmZp
cnN0Q2hpbGQuZGF0YSA9ICdQQVNTJzsNCiAgICAgZG9jdW1lbnQuZ2V0RWxlbWVudEJ5SWQoJ3Jl
c3VsdCcpLmNsYXNzTmFtZSA9ICdwYXNzJzsNCiAgIH0gZWxzZSB7DQogICAgIGRvY3VtZW50Lmdl
dEVsZW1lbnRCeUlkKCdyZXN1bHQnKS5maXJzdENoaWxkLmRhdGEgPSAnRkFJTCc7DQogICAgIGRv
Y3VtZW50LmdldEVsZW1lbnRCeUlkKCdyZXN1bHQnKS5jbGFzc05hbWUgPSAnZmFpbCc7DQogICB9
DQogPC9zY3JpcHQ+DQo8L2Rpdj4NCg==
</data>

          </attachment>
      

    </bug>

</bugzilla>