<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>17861</bug_id>
          
          <creation_ts>2012-07-18 07:07:54 +0000</creation_ts>
          <short_desc>i18n-ISSUE-107: replacement characters</short_desc>
          <delta_ts>2012-09-28 19:51:02 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WHATWG</product>
          <component>HTML</component>
          <version>unspecified</version>
          <rep_platform>Other</rep_platform>
          <op_sys>other</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>INVALID</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P3</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>Unsorted</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter>contributor</reporter>
          <assigned_to name="Ian &apos;Hixie&apos; Hickson">ian</assigned_to>
          <cc>addison</cc>
          <cc>annevk</cc>
          <cc>ian</cc>
          <cc>mike</cc>
          <cc>public-i18n-core</cc>
          <qa_contact>contributor</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>70192</commentid>
    <comment_count>0</comment_count>
    <who name="">contributor</who>
    <bug_when>2012-07-18 07:07:54 +0000</bug_when>
    <thetext>This bug was cloned from bug 16972 as part of operation convergence.
Originally filed: 2012-05-07 17:45:00 +0000
Original reporter: Addison Phillips &lt;addison@lab126.com&gt;

================================================================================
 #0   Addison Phillips                                2012-05-07 17:45:48 +0000 
--------------------------------------------------------------------------------
2.6.3 Resolving URLs
http://www.w3.org/TR/html5/urls.html#resolving-urls

Step 8.1 replaces characters that cannot be encoded into the target encoding with the question mark character (0x3F). Should this be, instead, the replacement character for the target encoding? For example, UTF-8 would use U+FFFD. Some encodings use _.
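
A minimal sketch of the two directions involved. It uses Python's standard codecs as a stand-in for a browser's encoder and decoder, which is an assumption: no browser is implemented this way. encode_query and decode_bytes are hypothetical helper names. Encoding a query string with errors="replace" substitutes "?" (0x3F) for each character the target encoding cannot represent, as step 8.1 describes, before the bytes are percent-encoded. Decoding bytes with errors="replace" instead yields U+FFFD for each invalid sequence.

```python
# Sketch only: Python's codecs stand in for a browser's encoder/decoder.
from urllib.parse import quote_from_bytes


def encode_query(query, encoding):
    """Hypothetical helper: unicode to bytes for a URL query component.

    Characters that `encoding` cannot represent become "?" (0x3F),
    then the resulting bytes are percent-encoded ("=" and "?" kept as-is).
    """
    raw = query.encode(encoding, errors="replace")  # unencodable chars become b"?"
    return quote_from_bytes(raw, safe="=?")


def decode_bytes(data, encoding):
    """Hypothetical helper: bytes to unicode, invalid sequences become U+FFFD."""
    return data.decode(encoding, errors="replace")


if __name__ == "__main__":
    # U+65E5 is encodable in Shift_JIS; U+1F600 is not, so it becomes "?".
    print(encode_query("q=\u65e5\U0001F600", "shift_jis"))  # q=%93%FA?
    # The same character is encodable in UTF-8, so no "?" appears.
    print(encode_query("q=\U0001F600", "utf-8"))            # q=%F0%9F%98%80
    # 0xFF is never valid in UTF-8, so decoding yields U+FFFD.
    print(decode_bytes(b"\xff", "utf-8") == "\ufffd")       # True
```

In this sketch U+FFFD appears only when decoding. U+FFFD is itself not representable in Shift_JIS, so it could not serve as the replacement when encoding into that encoding.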
================================================================================
 #1   Ian &apos;Hixie&apos; Hickson                             2012-05-10 17:58:18 +0000 
--------------------------------------------------------------------------------
Please provide test cases demonstrating the proposed behaviour is compatible with legacy implementations.
================================================================================</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>74694</commentid>
    <comment_count>1</comment_count>
    <who name="Addison Phillips">addison</who>
    <bug_when>2012-09-28 00:01:39 +0000</bug_when>
    <thetext>I verified that FF, Chrome, IE8, and Opera 11 all show U+FFFD (or a tofu box that evaluates to U+FFFD), not 0x3F, for the UTF-8 test. All but FF show the same behavior for the SJIS test I created; FF shows random junk instead. Please see:

http://www.inter-locale.com/test/html5test/17861-t2.html (UTF-8)
http://www.inter-locale.com/test/html5test/17861-t1.html (SJIS)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>74704</commentid>
    <comment_count>2</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2012-09-28 02:57:15 +0000</bug_when>
    <thetext>Anne, is this an issue for your URL spec?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>74757</commentid>
    <comment_count>3</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-09-28 07:02:05 +0000</bug_when>
    <thetext>Yes, it would be. However, those test cases are not testing URLs. They are testing bytes -&gt; unicode for the HTML parser, whereas the URL requirement is about unicode -&gt; bytes for the query component (and only the query component). And I&apos;m pretty sure browsers all use 0x3F there for non-utf-8 encodings.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>74832</commentid>
    <comment_count>4</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2012-09-28 19:51:02 +0000</bug_when>
    <thetext>Ok. Marking INVALID; please reopen if Anne and I misunderstand the issue here.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>