This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 10806 - ignoring escapes is not needed for compatibility with existing content
Summary: ignoring escapes is not needed for compatibility with existing content
Status: CLOSED WONTFIX
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL: http://dev.w3.org/html5/spec/Overview...
Whiteboard:
Keywords: NE, WGDecision
Depends on:
Blocks: 10804
 
Reported: 2010-09-29 11:45 UTC by Julian Reschke
Modified: 2011-03-24 12:38 UTC

See Also:


Attachments
test case for backslash-escapes in quoted-string (2.42 KB, text/html)
2010-09-29 11:46 UTC, Julian Reschke

Description Julian Reschke 2010-09-29 11:45:56 UTC
<http://dev.w3.org/html5/spec/Overview.html#content-type-sniffing>:

"If it is a U+0022 QUOTATION MARK ('"') and there is a later U+0022 QUOTATION MARK ('"') in s
If it is a U+0027 APOSTROPHE ("'") and there is a later U+0027 APOSTROPHE ("'") in s
    Return the encoding corresponding to the string between this character and the next earliest occurrence of this character."

This is indeed a violation of the Content-Type syntax defined in RFC 2616, in not handling backslash-escapes inside quoted-string properly.
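
For illustration, the difference can be sketched as follows. This is a minimal sketch under stated assumptions, not the spec's full algorithm: it covers only the quoted branch cited above, and the function names are hypothetical.

```python
from typing import Optional


def charset_spec_style(s: str) -> Optional[str]:
    """Quoted branch as cited from the spec: return the text between the
    opening quote and the next occurrence of the same quote character.
    Backslashes get no special treatment."""
    if not s or s[0] not in "\"'":
        return None
    quote = s[0]
    end = s.find(quote, 1)
    if end == -1:
        return None
    return s[1:end]


def charset_rfc2616_style(s: str) -> Optional[str]:
    """RFC 2616 quoted-string: double quotes only, and a backslash
    escapes the character that follows it (quoted-pair)."""
    if not s or s[0] != '"':
        return None
    out = []
    i = 1
    while i < len(s):
        ch = s[i]
        if ch == "\\" and i + 1 < len(s):
            out.append(s[i + 1])  # keep the escaped character, drop the backslash
            i += 2
        elif ch == '"':
            return "".join(out)   # unescaped closing quote ends the string
        else:
            out.append(ch)
            i += 1
    return None  # no closing quote found


value = '"utf\\-8"'  # the eight characters: "utf\-8"
print(charset_spec_style(value))     # backslash kept in the result
print(charset_rfc2616_style(value))  # backslash consumed as an escape
```

With the input `"utf\-8"`, the first function yields `utf\-8` (not a valid charset name), while the second yields `utf-8`.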

The spec claims that this is required for "backwards compatibility with legacy
content".

I'm attaching a test case that shows that the following browsers *do* handle escapes despite what the spec says:

Opera, Safari, Konqueror 4.4

Please remove the requirement to violate the base syntax.
Comment 1 Julian Reschke 2010-09-29 11:46:35 UTC
Created attachment 917
test case for backslash-escapes in quoted-string
Comment 2 Anne 2010-09-29 11:51:29 UTC
It seems better for Opera to match Chrome / Gecko / IE here. We may very well hit problems because of this.
Comment 3 Julian Reschke 2010-09-29 12:03:21 UTC
(In reply to comment #2)
> It seems better for Opera to match Chrome / Gecko / IE here. We may very well
> hit problems because of this.

Please elaborate.

The backslash character isn't even allowed in charset names.

I believe that requiring an incompatibility with a base spec needs much stronger justification.
Comment 4 Anne 2010-09-29 12:07:06 UTC
(In reply to comment #3)
> The backslash character isn't even allowed in charset names.

Right, so we may need to ignore the rule rather than handle it.
Comment 5 Julian Reschke 2010-09-29 12:22:31 UTC
(In reply to comment #4)
> Right, so we may need to ignore the rule rather than handle it.

Nope, unless you can prove that there is existing content that "breaks" because it uses "\x" although it wants to say "x".

In absence of that proof, the right thing to do is to handle escape sequences as specified.
Comment 6 Ian 'Hixie' Hickson 2010-09-30 03:54:05 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale:

WebKit trunk doesn't support this. Are you sure you haven't set your default encoding to UTF-8 in Safari? What version are you testing?

Opera doesn't support this either. It just ignores all punctuation. For instance, see:
   http://www.hixie.ch/tests/adhoc/html/parsing/encoding/121.html
This is known to be incompatible with legacy content, however (the spec used to do this too, but had to change for compat reasons).
Comment 7 Julian Reschke 2010-09-30 07:41:24 UTC
1) I'm running Safari 5.0.2 on Windows, and "Text Encoding" is set to "default".

2) The question of *why* Opera is handling the escapes properly isn't really relevant; what counts is the observable behavior.

You have claimed that this violation is needed for "compatibility", yet neither the shipping versions of Opera and Safari on Windows nor Konqueror do this. I think this is proof that the claim is incorrect.

Please provide more data.
Comment 8 Ian 'Hixie' Hickson 2010-09-30 07:52:33 UTC
What does Safari get for you on this test (the expected and actual encodings, not the pass/fail state)?:
   http://www.hixie.ch/tests/adhoc/html/parsing/encoding/113.html
Comment 9 Julian Reschke 2010-09-30 08:14:14 UTC
(In reply to comment #8)
> What does Safari get for you on this test (the expected and actual encodings,
> not the pass/fail state)?:
>    http://www.hixie.ch/tests/adhoc/html/parsing/encoding/113.html

Expected result: Windows-1252

Encoding used by browser is: Windows-1254
Comment 10 Ian 'Hixie' Hickson 2010-09-30 08:23:12 UTC
Upon further investigation, it turns out WebKit has changed behaviour. It used to ignore all punctuation (it didn't support backslash escapes, just ignored punctuation like Opera). Newer trunk builds however now don't ignore punctuation.

(In reply to comment #7)
> 2) The question of *why* Opera is handling the escapes properly isn't really
> relevant; what counts is the observable behavior.

The point is that Opera isn't handling the escapes at all. For example, if you put a backslash before the final quote, it doesn't continue the string. It still ends at the quote. For example, consider:

   http://www.hixie.ch/tests/adhoc/html/parsing/encoding/122.html
   http://www.hixie.ch/tests/adhoc/html/parsing/encoding/123.html
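
The distinction between ignoring punctuation and handling escapes can be sketched roughly as below. This is an illustration only, not Opera's actual code: the helper names are hypothetical, and the loose comparison is a simplified stand-in for liberal label matching (lowercase, drop everything that is not a letter or digit).

```python
from typing import Optional


def extract_until_next_quote(s: str) -> Optional[str]:
    """Return the text between the opening quote and the next identical
    quote. A backslash before that quote does not continue the string."""
    if not s or s[0] not in "\"'":
        return None
    end = s.find(s[0], 1)
    return s[1:end] if end != -1 else None


def loose_key(label: str) -> str:
    """Simplified loose matching key: lowercase, alphanumerics only."""
    return "".join(c for c in label.lower() if c.isalnum())


def same_encoding_loosely(a: str, b: str) -> bool:
    return loose_key(a) == loose_key(b)


# Looks like escape handling: the backslash is discarded as punctuation.
label = extract_until_next_quote('"utf\\-8"')
print(label, same_encoding_loosely(label, "UTF-8"))

# Not escape handling: the string still ends at the first quote, even
# though that quote is preceded by a backslash.
label = extract_until_next_quote('"utf-8\\" trailing"')
print(label, same_encoding_loosely(label, "UTF-8"))
```

In both cases the extracted label matches `UTF-8` only because punctuation is dropped during comparison; a parser that actually honoured quoted-pair would have read past the escaped quote in the second input.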


EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: As far as I can tell, no browsers with more than 1% market share do anything with backslashes at all, and the browsers that ignored backslashes are actively moving towards what the spec says.
Comment 11 Julian Reschke 2010-09-30 08:58:53 UTC
You still haven't provided evidence that this is needed for compatibility.

Will raise a tracker issue.
Comment 12 Henri Sivonen 2010-09-30 09:51:52 UTC
(In reply to comment #11)
> You still haven't provided evidence that this is needed for compatibility.
> 
> Will raise a tracker issue.

Are you suggesting that backslash-processing complexity be introduced into implementations without compatibility with existing content requiring it?
Comment 13 Julian Reschke 2010-09-30 10:02:29 UTC
What I'm suggesting is that implementations should treat quoted-strings in HTTP parameters uniformly, so special-casing Content-Type is *adding* complexity.
Comment 14 Anne 2010-09-30 10:06:48 UTC
Opera will very soon match Safari nightlies and everyone else here. Us "supporting" this was just a side effect of supporting the way too liberal UTS22 matching rules.
Comment 15 Julian Reschke 2010-09-30 10:17:55 UTC
(In reply to comment #14)
> Opera will very soon match Safari nightlies and everyone else here. Us
> "supporting" this was just a side effect of supporting the way too liberal
> UTS22 matching rules.

Ah, a self-fulfilling prophecy.

Anyway, I have shown that the claim that this is "required for compatibility" is wrong; thus will escalate the issue.
Comment 16 Julian Reschke 2010-09-30 11:54:52 UTC
Raised as

  http://www.w3.org/html/wg/tracker/issues/126