<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>16157</bug_id>
          
          <creation_ts>2012-02-29 01:15:34 +0000</creation_ts>
          <short_desc>WebSocket shouldn&apos;t throw SyntaxError on unpaired surrogates</short_desc>
          <delta_ts>2012-05-02 20:06:47 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebAppsWG</product>
          <component>WebSocket API (editor: Ian Hickson)</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Glenn Maynard">glenn</reporter>
          <assigned_to name="Ian &apos;Hixie&apos; Hickson">ian</assigned_to>
          <cc>brian.raymor</cc>
    
    <cc>ian</cc>
    
    <cc>jonas</cc>
    
    <cc>mike</cc>
    
    <cc>public-webapps</cc>
    
    <cc>zcorpan</cc>
          
          <qa_contact>public-webapps-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>64718</commentid>
    <comment_count>0</comment_count>
    <who name="Glenn Maynard">glenn</who>
    <bug_when>2012-02-29 01:15:34 +0000</bug_when>
    <thetext>&gt; If the method&apos;s second argument has any unpaired surrogates, then throw a SyntaxError exception and abort these steps.

and

&gt; If the data argument has any unpaired surrogates, then throw a SyntaxError exception.

Don&apos;t throw exceptions on unpaired surrogates.  Instead, use the WebIDL &quot;convert a DOMString to a sequence of Unicode characters&quot; [1] algorithm, which converts unpaired surrogates to U+FFFD, as well as defining the conversion itself.
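
For illustration only, the conversion described above can be sketched roughly as follows. This is a minimal sketch and not the WebIDL algorithm text: the function name replaceLoneSurrogates is hypothetical, and it returns a JS string rather than a sequence of Unicode characters.

```typescript
// Illustrative sketch only: replaceLoneSurrogates is a hypothetical name,
// not an API defined by WebIDL or by the WebSocket spec.
function replaceLoneSurrogates(input: string): string {
  let out = "";
  let i = 0;
  while (input.length > i) {
    const unit = input.charCodeAt(i);
    const isHigh = unit >= 0xd800 ? 0xdbff >= unit : false;
    const isLow = unit >= 0xdc00 ? 0xdfff >= unit : false;
    if (isHigh) {
      const next = input.length > i + 1 ? input.charCodeAt(i + 1) : 0;
      const nextIsLow = next >= 0xdc00 ? 0xdfff >= next : false;
      if (nextIsLow) {
        // Well-formed pair: copy both code units unchanged.
        out += input.slice(i, i + 2);
        i += 2;
        continue;
      }
      out += "\ufffd"; // high surrogate with no following low surrogate
    } else if (isLow) {
      out += "\ufffd"; // low surrogate with no preceding high surrogate
    } else {
      out += input[i]; // ordinary code unit
    }
    i += 1;
  }
  return out;
}
```

With this sketch, a string ending in a lone high surrogate would come out ending in U+FFFD, while a correctly paired surrogate sequence passes through unchanged.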


http://lists.w3.org/Archives/Public/public-webapps/2011JulSep/1589.html

[1] http://dev.w3.org/2006/webapi/WebIDL/#dfn-obtain-unicode</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>64941</commentid>
    <comment_count>1</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2012-03-02 23:28:49 +0000</bug_when>
    <thetext>Silently scrambling data seems like a bad idea. Why would we do this?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>64942</commentid>
    <comment_count>2</comment_count>
    <who name="Glenn Maynard">glenn</who>
    <bug_when>2012-03-02 23:30:17 +0000</bug_when>
    <thetext>Please see the thread at http://lists.w3.org/Archives/Public/public-webapps/2011JulSep/1589.html, so we don&apos;t start the discussion from scratch.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>64943</commentid>
    <comment_count>3</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2012-03-02 23:46:14 +0000</bug_when>
    <thetext>I read that thread before and didn&apos;t see any reason to do this.

The only argument I&apos;ve seen so far is about what happens if a user types in a message with astral characters and the script truncates it naïvely half-way through a surrogate and then sends it through the socket. That does seem like a potentially rare case (wouldn&apos;t be caught in the design). Not clear that replacing the half-surrogate with U+FFFD is especially nice either but it seems better than crashing.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>65655</commentid>
    <comment_count>4</comment_count>
    <who name="Jonas Sicking (Not reading bugmail)">jonas</who>
    <bug_when>2012-03-16 07:33:16 +0000</bug_when>
    <thetext>How is this different from the &quot;draconian&quot; error handling that XML parsers are required to do, and which many people, you included, have argued strongly against?

The problem with throwing for unpaired surrogates is that easy-to-make, data-dependent mistakes produce fatal results. For example, if you want to send string data in smaller chunks, a very easy &quot;mistake&quot; to make would be to simply chop up the JS string into 10k-sized chunks and send each separately. This will generally work great; however, in languages which produce a lot of surrogates this will fail 50%-67% of the time.
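
As a rough illustration of that chunking mistake, and of one possible way to avoid it, here is a minimal sketch. The helper names naiveChunks and pairSafeChunks are hypothetical, sizes are counted in UTF-16 code units, and the pair-safe variant assumes a chunk size of at least 2.

```typescript
// Hypothetical helpers for illustration; size is in UTF-16 code units.

// Splits purely by code-unit index, so a surrogate pair that straddles
// a boundary is cut in half.
function naiveChunks(text: string, size: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; text.length > i; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// Same split, but a chunk that would end on a high surrogate is cut one
// code unit earlier, so the whole pair moves into the next chunk.
function pairSafeChunks(text: string, size: number): string[] {
  const chunks: string[] = [];
  let i = 0;
  while (text.length > i) {
    let end = Math.min(i + size, text.length);
    const last = text.charCodeAt(end - 1);
    const endsOnHigh = last >= 0xd800 ? 0xdbff >= last : false;
    // Only back off if more text follows and the chunk stays non-empty.
    if (endsOnHigh) {
      if (text.length > end) {
        if (end - i > 1) {
          end -= 1;
        }
      }
    }
    chunks.push(text.slice(i, end));
    i = end;
  }
  return chunks;
}
```

Whether the naive version breaks depends entirely on where the astral characters happen to fall relative to the chunk boundaries, which is what makes the failure data-dependent.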

If we could make it throw consistently then I agree it would have been a more reasonable strategy. But I can&apos;t think of a way to make this not very data-dependent, which means that it&apos;s likely to not fail on developers&apos; machines, but fail in the real world.

And yes, putting in a replacement character also results in destroyed data. However, in the example stated above, having one destroyed character every 10k of data should be a low enough error rate that the message is still understandable to a human, just as the layout errors produced by a missing end tag likely still produce a page understandable to humans.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>67256</commentid>
    <comment_count>5</comment_count>
    <who name="">contributor</who>
    <bug_when>2012-05-02 20:06:47 +0000</bug_when>
    <thetext>Checked in as WHATWG revision r7084.
Check-in comment: Make WebSocket silently convert isolated surrogates to U+FFFD rather than throwing an exception. This will result in data corruption when a user types in astral-plane characters that get truncated by naive script half-way through, rather than crashing the application.
http://html5.org/tools/web-apps-tracker?from=7083&amp;to=7084</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>