<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>26967</bug_id>
          
          <creation_ts>2014-10-04 12:38:09 +0000</creation_ts>
          <short_desc>Use USVString instead of DOMString for url argument and send() method (removes lone surrogates)</short_desc>
          <delta_ts>2016-03-10 08:49:16 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WHATWG</product>
          <component>HTML</component>
          <version>unspecified</version>
          <rep_platform>Other</rep_platform>
          <op_sys>other</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>MOVED</resolution>
          
          
          <bug_file_loc>https://html.spec.whatwg.org/#dom-websocket</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P3</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>Unsorted</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter>contributor</reporter>
          <assigned_to name="Anne">annevk</assigned_to>
          <cc>annevk</cc>
    
    <cc>d</cc>
    
    <cc>ian</cc>
    
    <cc>mike</cc>
    
    <cc>rubys</cc>
    
    <cc>simon.sapin</cc>
    
    <cc>zcorpan</cc>
          
          <qa_contact>contributor</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>112692</commentid>
    <comment_count>0</comment_count>
    <who name="">contributor</who>
    <bug_when>2014-10-04 12:38:09 +0000</bug_when>
    <thetext>Specification: https://html.spec.whatwg.org/multipage/comms.html
Multipage: https://html.spec.whatwg.org/multipage/#dom-websocket
Complete: https://html.spec.whatwg.org/#dom-websocket
Referrer: https://html.spec.whatwg.org/multipage/

Comment:
Use USVString instead of DOMString for url argument and send() method (removes
lone surrogates)

Posted from: 46.127.136.57 by annevk@annevk.nl
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:35.0) Gecko/20100101 Firefox/35.0</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>112981</commentid>
    <comment_count>1</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2014-10-11 00:33:07 +0000</bug_when>
    <thetext>Why the URL argument?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>112988</commentid>
    <comment_count>2</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-10-11 07:24:08 +0000</bug_when>
    <thetext>The URL parser deals with scalar values only.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>113046</commentid>
    <comment_count>3</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2014-10-13 22:27:39 +0000</bug_when>
    <thetext>So this applies to anywhere and everywhere that I get a URL and pass it to the URL parser?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>113068</commentid>
    <comment_count>4</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-10-14 07:10:15 +0000</bug_when>
    <thetext>Yes.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>113141</commentid>
    <comment_count>5</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2014-10-14 21:23:33 +0000</bug_when>
    <thetext>So how does that work with, say, content attributes and reflecting IDL attributes?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>113167</commentid>
    <comment_count>6</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-10-15 07:09:15 +0000</bug_when>
    <thetext>I suppose those still need http://heycam.github.io/webidl/#dfn-obtain-unicode

(It seems browsers however have some kind of IDL extension named [Reflect] that would better allow for this kind of type sharing. Hopefully we can do something like that at some point.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>113232</commentid>
    <comment_count>7</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2014-10-15 18:30:43 +0000</bug_when>
    <thetext>I don&apos;t understand. Why can&apos;t URL just take care of this when I hand in a DOMString?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>113276</commentid>
    <comment_count>8</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-10-16 07:31:45 +0000</bug_when>
    <thetext>For the same reason e.g. the network doesn&apos;t take care of it? It&apos;s a different system and expects different values.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>113326</commentid>
    <comment_count>9</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2014-10-16 16:28:30 +0000</bug_when>
    <thetext>I think this should happen at the URL level. Otherwise we have to have prose all over the place doing conversions back and forth. There&apos;s really no need for it when it could be a single sentence in the URL spec that does the conversion. (I&apos;m not really even clear on why it needs to be any prose at all. You just treat the bytes differently.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>113332</commentid>
    <comment_count>10</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-10-16 16:35:03 +0000</bug_when>
    <thetext>The current setup works well for any API. The only thing it does not seem to work well for is reflected attributes, which is also a one-sentence fix.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>115559</commentid>
    <comment_count>11</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2014-11-26 20:07:28 +0000</bug_when>
    <thetext>I&apos;m very confused. The content and IDL attributes here are just regular DOMString attributes; they&apos;re not anything special until they are later parsed as URLs. Are you going to change e.g. getAttribute() to return a USVString?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>115596</commentid>
    <comment_count>12</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-11-27 08:05:02 +0000</bug_when>
    <thetext>I mean that for the case of reflected attributes you would have to invoke the conversion yourself before handing it to the URL parser.

I hope that at some point we can define reflected attributes as such, which is already the case in Chromium as I understand it:

  [Reflect=URL] attribute USVString href;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>115643</commentid>
    <comment_count>13</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2014-11-28 04:26:49 +0000</bug_when>
    <thetext>Oh you&apos;re talking just about how these attributes resolve themselves? Not about how the URL is actually used?

I really don&apos;t understand the problem here. I pass DOMString strings to the URL parser all the time (e.g. whenever I take a content attribute and resolve it to get an absolute URL). Why would reflecting attributes be any different?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>115649</commentid>
    <comment_count>14</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-11-28 07:53:24 +0000</bug_when>
    <thetext>The problem is that the URL parser does not take a DOMString.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>115748</commentid>
    <comment_count>15</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2014-12-01 18:59:17 +0000</bug_when>
    <thetext>That is indeed the problem. I&apos;m saying that instead of everyone having to convert their strings to Unicode before ever interacting with the URL spec, the URL spec should just act like everyone else and take the same kind of string as all the other APIs. If it needs to then act on them as if they&apos;re Unicode and not UTF-16, then that&apos;s fine, but that&apos;s an internal concern, not something you should expose in your API.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>115835</commentid>
    <comment_count>16</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-12-03 14:43:33 +0000</bug_when>
    <thetext>I disagree, but I&apos;m happy to add something like a &quot;DOMString-accepting URL parser&quot; hook.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>115858</commentid>
    <comment_count>17</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2014-12-03 20:10:34 +0000</bug_when>
    <thetext>Why don&apos;t you agree?

Why not keep the current hook, but make it accept both?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>115873</commentid>
    <comment_count>18</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-12-03 21:51:07 +0000</bug_when>
    <thetext>So I was convinced by Simon that this was a better strategy as it keeps the URL parser surrogate free. Reversing that would be somewhat painful, but is definitely doable of course.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>115879</commentid>
    <comment_count>19</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2014-12-03 22:25:30 +0000</bug_when>
    <thetext>You can still keep the parser surrogate free. I&apos;m just saying that whatever prose you would have me put at all the call sites, you would just put at the top of whatever algorithm I&apos;m calling.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>115880</commentid>
    <comment_count>20</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-12-03 22:28:16 +0000</bug_when>
    <thetext>That was my suggestion in comment 16...</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>115882</commentid>
    <comment_count>21</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2014-12-03 22:54:05 +0000</bug_when>
    <thetext>Right, but you said you disagreed. I&apos;m trying to figure out why you disagree.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>115883</commentid>
    <comment_count>22</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-12-03 23:00:38 +0000</bug_when>
    <thetext>I would prefer addressing this per comment 6, but I&apos;m okay with addressing this per comment 16 for now (until something like IDL [Reflect] becomes feasible which seems like a better solution).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>115886</commentid>
    <comment_count>23</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2014-12-03 23:33:03 +0000</bug_when>
    <thetext>I don&apos;t understand how comment 6 would help. Most of the call sites for this aren&apos;t the one &quot;reflecting IDL attribute&quot; call site. Can you elaborate? Why is having this in multiple call sites, and having additional IDL syntax, better than just having one line of prose in the URL spec?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>115921</commentid>
    <comment_count>24</comment_count>
    <who name="Simon Pieters">zcorpan</who>
    <bug_when>2014-12-04 14:08:12 +0000</bug_when>
    <thetext>This seems similar to the various algorithms in css-syntax that need to be invoked with different kinds of input from different places (a token stream or a string).

http://dev.w3.org/csswg/css-syntax/#parser-entry-points

I agree with Hixie that it seems nicer to normalize the input on your end instead of having all other specs convert to the input you want. It centralizes the conversion so it is done consistently, and it is less prose for other specs.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125426</commentid>
    <comment_count>25</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2016-03-10 08:49:16 +0000</bug_when>
    <thetext>https://github.com/whatwg/html/pull/840</thetext>
  </long_desc>
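<!--
Editorial illustration, not part of the original thread. The discussion above
turns on where the DOMString to USVString conversion should live: at every call
site, or once at the top of the URL parser's entry point. The sketch below
shows the second option under stated assumptions. A DOMString is modelled as a
list of 16-bit code units, to_scalar_values follows the general shape of the
Web IDL conversion (lone surrogates become U+FFFD), and parse_url_from_domstring
is a hypothetical name for the "DOMString-accepting URL parser" hook floated in
comment 16. urllib.parse.urlsplit is only a stand-in for a parser that expects
scalar values; it is not the WHATWG URL parser.

```python
from urllib.parse import urlsplit

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def to_scalar_values(units):
    """Convert a list of UTF-16 code units to a str of Unicode scalar values.

    Lone (unpaired) surrogates are replaced with U+FFFD.
    """
    out = []
    i = 0
    n = len(units)
    while i < n:
        c = units[i]
        if c < 0xD800 or c > 0xDFFF:
            # Not a surrogate: already a scalar value.
            out.append(chr(c))
        elif c >= 0xDC00:
            # Low surrogate with no preceding high surrogate.
            out.append(REPLACEMENT)
        elif i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Valid high/low pair: combine into one supplementary code point.
            high = c & 0x3FF
            low = units[i + 1] & 0x3FF
            out.append(chr(0x10000 + (high << 10) + low))
            i += 1
        else:
            # High surrogate at end of input or not followed by a low one.
            out.append(REPLACEMENT)
        i += 1
    return "".join(out)


def parse_url_from_domstring(units):
    """Hypothetical hook: normalize once, then call the scalar-only parser."""
    return urlsplit(to_scalar_values(units))
```

With this shape, callers hand over their code units unchanged and the parser
itself never sees a surrogate, which is the outcome both sides of the thread
say they want.
-->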
      
      

    </bug>

</bugzilla>