14676 – For UTF-16, the oder of the steps in "change the encoding" doesn't seem right.

This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED FIXED

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	HTML
Version:	unspecified
Hardware:	Other other

Importance:	P3 normal
Target Milestone:	Unsorted
Assignee:	Ian 'Hixie' Hickson
QA Contact:	contributor

URL:	http://www.whatwg.org/specs/web-apps/...
Reported:	2011-11-02 10:03 UTC by contributor
Modified:	2012-07-18 18:46 UTC
CC List:	3 users

Description contributor 2011-11-02 10:03:30 UTC

Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html
Multipage: http://www.whatwg.org/C#changing-the-encoding-while-parsing
Complete: http://www.whatwg.org/c#changing-the-encoding-while-parsing

Comment:
For UTF-16, the oder of the steps in "change the encoding" doesn't seem right.

Posted from: 114.43.127.97 by kennyluck@csail.mit.edu
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0.1) Gecko/20100101 Firefox/7.0.1

Comment 1 KangHao Lu 2011-11-02 10:19:22 UTC

s/oder/order/

Consider a simple test case such as <script>alert(document.characterSet||document.charset)</script><meta http-equiv="content-type" content="charset=utf-16">. Per the pre-scanning algorithm, the first attempt should be utf-8. But in step 1 of "change the encoding", utf-16 is not yet treated as equivalent to utf-8 at that point, so a reload is possible, depending on how you interpret the "may" in step 4.

Gecko doesn't reload in this case. IE first reports the default encoding (which contradicts the pre-scanning algorithm, but that's another issue) and then "unicode" (yet it decodes the content as "utf-8").

Anyway, is allowing reloading in my example intentional? If not, I propose we move step 3 before step 1.
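
To make the ordering issue concrete, here is a minimal sketch of the two step orders discussed above. It is not the spec text: the function names, the return labels, and the simplified string comparison used for "equivalent" are all assumptions made for illustration, and only the two steps under discussion are modelled (the equivalence check and the utf-16 to utf-8 substitution).

```python
def is_utf16(label: str) -> bool:
    # Hypothetical helper: treats these three labels as "a UTF-16 encoding".
    return label.lower() in {"utf-16", "utf-16le", "utf-16be"}


def change_encoding(current: str, new: str, substitute_first: bool) -> str:
    """Model only the two steps discussed in this bug.

    Returns "keep" when the parser would stay on the current encoding,
    or "switch-or-reload" when it would fall through to the later steps
    (changing the decoder on the fly, or reloading).
    """
    current, new = current.lower(), new.lower()
    if substitute_first:
        # Proposed order: replace a declared UTF-16 with UTF-8, then compare.
        if is_utf16(new):
            new = "utf-8"
        if new == current:
            return "keep"
    else:
        # Reported order: compare first, substitute afterwards.
        if new == current:
            return "keep"
        if is_utf16(new):
            new = "utf-8"
    return "switch-or-reload"


# The test case from this comment: parsed as utf-8, <meta> declares utf-16.
print(change_encoding("utf-8", "utf-16", substitute_first=False))
print(change_encoding("utf-8", "utf-16", substitute_first=True))
```

With the reported order the utf-8 page falls through to the later steps even though the effective encoding ends up being utf-8 again; with the substitution moved first, the comparison succeeds and the sketch stops early.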

Comment 2 contributor 2011-11-02 20:40:37 UTC

Checked in as WHATWG revision r6814.
Check-in comment: When a page interpreted as UTF-8 has a <meta charset> saying UTF-16, the spec used to say to reload even though the encoding didn't change.
http://html5.org/tools/web-apps-tracker?from=6813&to=6814
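
As a rough illustration of why a UTF-16 declaration cannot be taken at face value here: if the parser was able to read the <meta> element as ASCII-compatible bytes, those same bytes would not spell out the markup when decoded as UTF-16. The sample bytes below are hypothetical, and utf-16-le is chosen explicitly so the result does not depend on platform byte order.

```python
# Hypothetical page bytes: ASCII-only markup, as a parser reading <meta> would see them.
page = b'<meta charset="utf-16"><p>hello</p>'

as_utf8 = page.decode("utf-8")
# errors="replace" turns a dangling final byte into U+FFFD instead of raising.
as_utf16 = page.decode("utf-16-le", errors="replace")

print("meta" in as_utf8)   # True
print("meta" in as_utf16)  # False
```

Each pair of ASCII bytes collapses into a single unrelated code unit under UTF-16, so the declaration would never have been found if the page really were UTF-16.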