<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>16972</bug_id>
          
          <creation_ts>2012-05-07 17:45:48 +0000</creation_ts>
          <short_desc>i18n-ISSUE-107: replacement characters</short_desc>
          <delta_ts>2014-03-03 16:44:22 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML WG</product>
          <component>HTML5 spec</component>
          <version>unspecified</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows NT</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>INVALID</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Addison Phillips">addison</reporter>
          <assigned_to name="This bug has no owner yet - up for the taking">dave.null</assigned_to>
          <cc>ian</cc>
          <cc>ishida</cc>
          <cc>mike</cc>
          <cc>public-html-admin</cc>
          <cc>public-html-wg-issue-tracking</cc>
          <cc>public-i18n-core</cc>
          <cc>robin</cc>
          
          <qa_contact name="HTML WG Bugzilla archive list">public-html-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>67407</commentid>
    <comment_count>0</comment_count>
    <who name="Addison Phillips">addison</who>
    <bug_when>2012-05-07 17:45:48 +0000</bug_when>
    <thetext>2.6.3 Resolving URLs
http://www.w3.org/TR/html5/urls.html#resolving-urls

Step 8.1 replaces characters that cannot be encoded into the target encoding with the question mark character (0x3F). Should this be, instead, the replacement character for the target encoding? For example, UTF-8 would use U+FFFD. Some encodings use _.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>67575</commentid>
    <comment_count>1</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2012-05-10 17:58:18 +0000</bug_when>
    <thetext>Please provide test cases demonstrating the proposed behaviour is compatible with legacy implementations.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>70193</commentid>
    <comment_count>2</comment_count>
    <who name="">contributor</who>
    <bug_when>2012-07-18 07:07:56 +0000</bug_when>
    <thetext>This bug was cloned to create bug 17861 as part of operation convergence.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>73332</commentid>
    <comment_count>3</comment_count>
    <who name="Robin Berjon">robin</who>
    <bug_when>2012-09-06 16:39:40 +0000</bug_when>
    <thetext>EDITOR&apos;S RESPONSE: This is an Editor&apos;s Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the Editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this document:

   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Additional Information Needed
Rationale:

Addison, can you please provide further information as per Ian&apos;s comment #1?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>101658</commentid>
    <comment_count>4</comment_count>
    <who name="Addison Phillips">addison</who>
    <bug_when>2014-02-28 21:44:10 +0000</bug_when>
    <thetext>The text in this section has changed and I can&apos;t find the &quot;offending bits&quot; any longer. I believe this has been taken over by the Encoding specification, and the issue is being discussed in a bug there. Since there isn&apos;t anything in HTML to change, I am closing the bug.

Regarding Ian&apos;s comment, the replacement character for UTF-8 is well known to be U+FFFD, and all of the browsers use that code point when presented with malformed UTF-8 data. Other encodings do work as described in my original comment. However, I will stipulate that URL conversion may be handled specially. Since this is no longer a bug, I haven&apos;t produced a test case for it.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>101757</commentid>
    <comment_count>5</comment_count>
    <who name="Richard Ishida">ishida</who>
    <bug_when>2014-03-03 16:40:25 +0000</bug_when>
    <thetext>You can find this in the Encoding spec at http://encoding.spec.whatwg.org/#encodings. hth</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>101758</commentid>
    <comment_count>6</comment_count>
    <who name="Addison Phillips">addison</who>
    <bug_when>2014-03-03 16:44:22 +0000</bug_when>
    <thetext>Yes, I know it&apos;s there :-). Although that&apos;s not really the spec for URL processing.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>