17861 – i18n-ISSUE-107: replacement characters

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Summary: i18n-ISSUE-107: replacement characters

Status:	RESOLVED INVALID

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	HTML
Version:	unspecified
Hardware:	Other other

Importance:	P3 normal
Target Milestone:	Unsorted
Assignee:	Ian 'Hixie' Hickson
QA Contact:	contributor

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2012-07-18 07:07 UTC by contributor
Modified:	2012-09-28 19:51 UTC
CC List:	5 users

See Also:

Description contributor 2012-07-18 07:07:54 UTC

This was cloned from bug 16972 as part of operation convergence.
Originally filed: 2012-05-07 17:45:00 +0000
Original reporter: Addison Phillips <addison@lab126.com>

================================================================================
 #0   Addison Phillips                                2012-05-07 17:45:48 +0000 
--------------------------------------------------------------------------------
2.6.3 Resolving URLs
http://www.w3.org/TR/html5/urls.html#resolving-urls

Step 8.1 replaces characters that cannot be encoded into the target encoding with the question mark character (0x3F). Should this be, instead, the replacement character for the target encoding? For example, UTF-8 would use U+FFFD. Some encodings use _.
================================================================================
 #1   Ian 'Hixie' Hickson                             2012-05-10 17:58:18 +0000 
--------------------------------------------------------------------------------
Please provide test cases demonstrating the proposed behaviour is compatible with legacy implementations.
================================================================================

Comment 1 Addison Phillips 2012-09-28 00:01:39 UTC

I validated that FF, Chrome, IE8, and Opera 11 all show U+FFFD, or a tofu box that evaluates to U+FFFD, and not 0x3F for the UTF-8 test. All but FF show the same behavior for the SJIS test I created (FF shows random junk instead). Please see:

http://www.inter-locale.com/test/html5test/17861-t2.html (UTF-8)
http://www.inter-locale.com/test/html5test/17861-t1.html (SJIS)

Comment 2 Ian 'Hixie' Hickson 2012-09-28 02:57:15 UTC

Anne, is this an issue for your URL spec?

Comment 3 Anne 2012-09-28 07:02:05 UTC

Yes, it would be. However, those test cases are not testing URLs. They are testing bytes -> unicode for the HTML parser, whereas the URL requirement is about unicode -> bytes for the query component (and only the query component). And I'm pretty sure browsers all use 0x3F there for non-utf-8 encodings.
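
The distinction between the two directions can be sketched in a few lines. This is only an illustration that uses Python's built-in codecs as a stand-in; it is not a claim about what any browser or the URL spec does, and the helper names `decode_with_replacement`, `encode_with_replacement`, and `percent_encode_query` are hypothetical. With Python's "replace" error handler, decoding malformed bytes yields U+FFFD, while encoding a character the target encoding cannot represent yields 0x3F.

```python
from urllib.parse import quote

# Assumption: Python's codecs and its "replace" error handler are used
# purely as an analogy for the two directions discussed in this bug.


def decode_with_replacement(raw: bytes, encoding: str) -> str:
    """bytes -> Unicode (what the linked test pages exercise)."""
    return raw.decode(encoding, errors="replace")


def encode_with_replacement(text: str, encoding: str) -> bytes:
    """Unicode -> bytes (the direction the URL query requirement covers)."""
    return text.encode(encoding, errors="replace")


def percent_encode_query(text: str, encoding: str) -> str:
    """Encode a query value to bytes, then percent-encode those bytes."""
    return quote(text, safe="", encoding=encoding, errors="replace")


# 0xFF can never appear in well-formed UTF-8, so the decoder substitutes U+FFFD.
decoded = decode_with_replacement(b"a\xffb", "utf-8")

# U+20AC (euro sign) has no mapping in Python's shift_jis codec,
# so the encoder substitutes "?" (0x3F).
encoded = encode_with_replacement("a\u20acb", "shift_jis")

# The same substitution, seen as it would appear inside a query string.
query = percent_encode_query("a\u20acb", "shift_jis")

print(repr(decoded), encoded, query)
```

U+FFFD is a Unicode code point, so it is only available as a substitute when the output is Unicode text; a legacy byte encoding such as Shift_JIS has no way to represent it, which is why the encoder in this sketch falls back to an ASCII character.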

Comment 4 Ian 'Hixie' Hickson 2012-09-28 19:51:02 UTC

Ok. Marking INVALID; please reopen if Anne and I misunderstand the issue here.