This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 16972 - i18n-ISSUE-107: replacement characters
Status: CLOSED INVALID
Alias: None
Product: HTML WG
Classification: Unclassified
Component: HTML5 spec
Version: unspecified
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: This bug has no owner yet - up for the taking
QA Contact: HTML WG Bugzilla archive list
Reported: 2012-05-07 17:45 UTC by Addison Phillips
Modified: 2014-03-03 16:44 UTC

Description Addison Phillips 2012-05-07 17:45:48 UTC
2.6.3 Resolving URLs
http://www.w3.org/TR/html5/urls.html#resolving-urls

Step 8.1 replaces characters that cannot be encoded into the target encoding with the question mark character (0x3F). Should this be, instead, the replacement character for the target encoding? For example, UTF-8 would use U+FFFD. Some encodings use _.
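
For illustration only, here is a minimal Python sketch of the behaviour described above: characters that the target encoding cannot represent are replaced with the question mark (0x3F) before percent-encoding. The function name `encode_query_component` is hypothetical and is not taken from the spec; the sketch relies on Python's built-in "replace" error handler, which substitutes "?" when encoding.

```python
from urllib.parse import quote_from_bytes


def encode_query_component(text: str, encoding: str) -> str:
    """Hypothetical helper: encode text into the target encoding,
    replacing unencodable characters with '?' (0x3F), then
    percent-encode the resulting bytes."""
    # errors="replace" substitutes b"?" for each unencodable character.
    raw = text.encode(encoding, errors="replace")
    # safe="" percent-encodes every reserved byte, including "?" itself.
    return quote_from_bytes(raw, safe="")


# U+00E9 exists in ISO-8859-1; U+2603 (snowman) does not.
print(encode_query_component("caf\u00e9 \u2603", "iso-8859-1"))
print(encode_query_component("caf\u00e9 \u2603", "utf-8"))
```

With ISO-8859-1 as the target, the snowman comes out as %3F, which is indistinguishable from a literal question mark in the input; with UTF-8 nothing needs to be replaced.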
Comment 1 Ian 'Hixie' Hickson 2012-05-10 17:58:18 UTC
Please provide test cases demonstrating the proposed behaviour is compatible with legacy implementations.
Comment 2 contributor 2012-07-18 07:07:56 UTC
This bug was cloned to create bug 17861 as part of operation convergence.
Comment 3 Robin Berjon 2012-09-06 16:39:40 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the Editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this document:

   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Additional Information Needed
Rationale:

Addison, can you please provide further information as per Ian's comment #1?
Comment 4 Addison Phillips 2014-02-28 21:44:10 UTC
The text in this section has changed and I can't find the "offending bits" any longer. Actually, I believe this has been taken over by the Encodings document and the issue is being discussed in a bug there. Since there isn't anything in HTML to change, closing the bug.

Regarding Ian's comment, the replacement character for UTF-8 is well known to be U+FFFD and all of the browsers use that code point when presented with malformed UTF-8 data. Other encodings do work as described in my original comment. However, I will stipulate that URL conversion may be handled specially. Since this is no longer a bug, I haven't produced a test case to test it with.
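
As a rough illustration of the UTF-8 point (not a browser test case), Python's UTF-8 decoder with the "replace" error handler also substitutes U+FFFD for a malformed byte. The helper name `decode_lenient` is an assumption made for this sketch.

```python
def decode_lenient(data: bytes) -> str:
    """Hypothetical helper: decode bytes as UTF-8, substituting
    U+FFFD (REPLACEMENT CHARACTER) for malformed sequences."""
    return data.decode("utf-8", errors="replace")


# 0xFF can never appear in well-formed UTF-8.
malformed = b"abc\xffdef"
decoded = decode_lenient(malformed)
print([hex(ord(ch)) for ch in decoded])
```

This only shows decoding of malformed input; it says nothing about how URL processing should treat characters that cannot be encoded.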
Comment 5 Richard Ishida 2014-03-03 16:40:25 UTC
You can find this in the Encoding spec at http://encoding.spec.whatwg.org/#encodings. hth
Comment 6 Addison Phillips 2014-03-03 16:44:22 UTC
Yes, I know it's there :-). Although that's not really the spec for URL processing.