<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>9989</bug_id>
          
          <creation_ts>2010-06-23 09:51:30 +0000</creation_ts>
          <short_desc>Is the number of replacement characters supposed to be well-defined? If not this should be explicitly noted. If it is then more detail is required.</short_desc>
          <delta_ts>2010-09-30 08:28:25 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebAppsWG</product>
          <component>WebSocket API (editor: Ian Hickson)</component>
          <version>unspecified</version>
          <rep_platform>Other</rep_platform>
          <op_sys>other</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc>http://www.whatwg.org/specs/web-apps/current-work/#handling-errors-in-utf-8-from-the-server</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P3</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter>contributor</reporter>
          <assigned_to name="Ian &apos;Hixie&apos; Hickson">ian</assigned_to>
          <cc>annevk</cc>
    
    <cc>ian</cc>
    
    <cc>mike</cc>
    
    <cc>public-webapps</cc>
          
          <qa_contact>public-webapps-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>36368</commentid>
    <comment_count>0</comment_count>
    <who name="">contributor</who>
    <bug_when>2010-06-23 09:51:30 +0000</bug_when>
    <thetext>Section: http://www.whatwg.org/specs/web-apps/current-work/complete.html#handling-errors-in-utf-8-from-the-server

Comment:
Is the number of replacement characters supposed to be well-defined? If not
this should be explicitly noted. If it is then more detail is required.

Posted from: 88.131.66.80</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>37058</commentid>
    <comment_count>1</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-07-22 05:27:48 +0000</bug_when>
    <thetext>I don&apos;t understand what isn&apos;t well-defined.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>37064</commentid>
    <comment_count>2</comment_count>
    <who name="Simon Pieters">zcorpan</who>
    <bug_when>2010-07-22 13:25:19 +0000</bug_when>
    <thetext>The spec says to replace bytes *or* sequences of bytes that are not valid UTF-8 with U+FFFD. It is thus not well-defined how many U+FFFD are expected for any given sequence of bytes that is not valid UTF-8. It could be one, or as many as there are invalid bytes, or anything in between.

(The same applies to text/html parsing.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>37094</commentid>
    <comment_count>3</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2010-07-26 06:29:11 +0000</bug_when>
    <thetext>This really ought to be fixed in the UTF-8 specification or in some encoding layer specification, as a whole bunch of specifications are affected by this.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>37095</commentid>
    <comment_count>4</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2010-07-26 06:29:37 +0000</bug_when>
    <thetext>Scrap that bit about UTF-8, misread something.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>37389</commentid>
    <comment_count>5</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-08-13 07:27:56 +0000</bug_when>
    <thetext>What would you consider an acceptable replacement for the current text? I intentionally use the same prose throughout the Web Apps 1.0 spec, because I thought it was what we&apos;d agreed was clear, but I&apos;m happy to change it to something else if you have a concrete proposal for what to change it to.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39291</commentid>
    <comment_count>6</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-09-24 14:56:48 +0000</bug_when>
    <thetext>EDITOR&apos;S RESPONSE: This is an Editor&apos;s Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Did Not Understand Request
Change Description: no spec change
Rationale: see comment 5</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39565</commentid>
    <comment_count>7</comment_count>
    <who name="Simon Pieters">zcorpan</who>
    <bug_when>2010-09-27 11:22:36 +0000</bug_when>
    <thetext>It seems the bugzilla monster ate my comment. Trying again:


First try, probably isn&apos;t quite right:
 
Numbers are bytes in hex. &quot;Anything but ...&quot; includes EOF.
 
Stray 80-BF:
FE-FF:
replace with one U+FFFD.
 
C0-C1 followed by 80-BF:
replace the 2-byte sequence with one U+FFFD.
 
C0-FD followed by anything but 80-BF:
replace the first byte with one U+FFFD and reprocess the second byte.
 
E0-FD followed by 80-BF followed by anything but 80-BF:
replace the first two bytes with one U+FFFD and reprocess the third byte.
 
F0-FD followed by two 80-BF followed by anything but 80-BF:
replace the first three bytes with one U+FFFD and reprocess the fourth byte.
 
F0-F4 followed by three 80-BF that represent a code point above U+10FFFF:
replace all four bytes with one U+FFFD.
 
F5-FD followed by three 80-BF followed by anything but 80-BF:
replace the first four bytes with one U+FFFD and reprocess the fifth byte.
 
FC-FD followed by four 80-BF followed by anything but 80-BF:
replace the first five bytes with one U+FFFD and reprocess the sixth byte.
 
Overlong forms (e.g. F0 80 80 A0):
replace the whole byte sequence with one U+FFFD.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39698</commentid>
    <comment_count>8</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-09-28 07:29:37 +0000</bug_when>
    <thetext>Any volunteers for a Web UTF-8 spec?

I guess I&apos;ll put this in the HTML spec&apos;s infrastructure section and then refer to it from all the other specs of relevance.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40087</commentid>
    <comment_count>9</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-09-30 08:28:03 +0000</bug_when>
    <thetext>something weird is happening with this bug</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40088</commentid>
    <comment_count>10</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2010-09-30 08:28:25 +0000</bug_when>
    <thetext>anyway this bug is fixed now</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>