26967 – Use USVString instead of DOMString for url argument and send() method (removes lone surrogates)

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.



Status:	RESOLVED MOVED

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	HTML
Version:	unspecified
Hardware:	Other other

Importance:	P3 normal
Target Milestone:	Unsorted
Assignee:	Anne
QA Contact:	contributor

URL:	https://html.spec.whatwg.org/#dom-web...
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-10-04 12:38 UTC by contributor
Modified:	2016-03-10 08:49 UTC
CC List:	7 users

See Also:

Attachments

Description contributor 2014-10-04 12:38:09 UTC

Specification: https://html.spec.whatwg.org/multipage/comms.html
Multipage: https://html.spec.whatwg.org/multipage/#dom-websocket
Complete: https://html.spec.whatwg.org/#dom-websocket
Referrer: https://html.spec.whatwg.org/multipage/

Comment:
Use USVString instead of DOMString for url argument and send() method (removes lone surrogates)
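For context, converting a DOMString to a USVString (as Web IDL does for USVString arguments) means keeping well-formed surrogate pairs and replacing every unpaired ("lone") surrogate code unit with U+FFFD REPLACEMENT CHARACTER. A minimal TypeScript sketch, assuming JavaScript strings stand in for DOMString; `toUSVString` is a hypothetical helper name, not a platform API:

```typescript
// Hypothetical helper: converts a DOMString (arbitrary UTF-16 code units)
// into a USVString (Unicode scalar values only) by replacing each lone
// surrogate with U+FFFD.
function toUSVString(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const c = input.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      // High surrogate: valid only if immediately followed by a low surrogate.
      const next = i + 1 < input.length ? input.charCodeAt(i + 1) : -1;
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += String.fromCharCode(c, next);
        i++; // skip the low half of the pair
      } else {
        out += "\uFFFD"; // lone high surrogate
      }
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      out += "\uFFFD"; // low surrogate without a preceding high surrogate
    } else {
      out += String.fromCharCode(c);
    }
  }
  return out;
}
```

For example, `toUSVString("a\uD800b")` yields `"a\uFFFDb"`, while a proper pair such as `"\uD83D\uDE00"` passes through unchanged.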

Posted from: 46.127.136.57 by annevk@annevk.nl
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:35.0) Gecko/20100101 Firefox/35.0

Comment 1 Ian 'Hixie' Hickson 2014-10-11 00:33:07 UTC

Why the URL argument?

Comment 2 Anne 2014-10-11 07:24:08 UTC

The URL parser deals with scalar values only.
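One way to see why the parser wants scalar values: percent-encoding works on UTF-8 bytes, and a lone surrogate has no UTF-8 encoding. The sketch below assumes a simplified encoder that escapes every byte outside ASCII alphanumerics (the URL Standard instead uses specific percent-encode sets); `percentEncodeUTF8` is a hypothetical name. It relies on `TextEncoder.encode()`, which takes a USVString and therefore turns a lone surrogate into U+FFFD (bytes EF BF BD) before encoding:

```typescript
// Simplified, hypothetical percent-encoder: escapes every UTF-8 byte that is
// not an ASCII letter or digit. Not the URL Standard's percent-encode sets.
function percentEncodeUTF8(input: string): string {
  // encode() accepts a USVString, so lone surrogates become U+FFFD here.
  const bytes = new TextEncoder().encode(input);
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i] as number;
    const ch = String.fromCharCode(b);
    if (/^[A-Za-z0-9]$/.test(ch)) {
      out += ch;
    } else {
      out += "%" + (b < 16 ? "0" : "") + b.toString(16).toUpperCase();
    }
  }
  return out;
}
```

So `percentEncodeUTF8("\u00E9")` gives `"%C3%A9"`, and `percentEncodeUTF8("\uD800")` gives `"%EF%BF%BD"`: the surrogate had to become a scalar value before any bytes existed.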

Comment 3 Ian 'Hixie' Hickson 2014-10-13 22:27:39 UTC

So this applies to anywhere and everywhere that I get a URL and pass it to the URL parser?

Comment 4 Anne 2014-10-14 07:10:15 UTC

Yes.

Comment 5 Ian 'Hixie' Hickson 2014-10-14 21:23:33 UTC

So how does that work with, say, content attributes and reflecting IDL attributes?

Comment 6 Anne 2014-10-15 07:09:15 UTC

I suppose those still need to invoke Web IDL's "obtain Unicode" step: http://heycam.github.io/webidl/#dfn-obtain-unicode

(It seems, however, that browsers have some kind of IDL extension named [Reflect] that would better allow for this kind of type sharing. Hopefully we can do something like that at some point.)

Comment 7 Ian 'Hixie' Hickson 2014-10-15 18:30:43 UTC

I don't understand. Why can't URL just take care of this when I hand in a DOMString?

Comment 8 Anne 2014-10-16 07:31:45 UTC

For the same reason e.g. the network doesn't take care of it? It's a different system and expects different values.

Comment 9 Ian 'Hixie' Hickson 2014-10-16 16:28:30 UTC

I think this should happen at the URL level. Otherwise we have to have prose all over the place doing conversions back and forth. There's really no need for it when it could be a single sentence in the URL spec that does the conversion. (I'm not even clear on why it needs any prose at all. You just treat the bytes differently.)

Comment 10 Anne 2014-10-16 16:35:03 UTC

The current setup works well for APIs. It seems that the only thing it does not work well for is reflected attributes, which is also a one-sentence fix.

Comment 11 Ian 'Hixie' Hickson 2014-11-26 20:07:28 UTC

I'm very confused. The content and IDL attributes here are just regular DOMString attributes, they're not anything special until they are later parsed as URLs. Are you going to change e.g. getAttribute() to return a USVString?

Comment 12 Anne 2014-11-27 08:05:02 UTC

I mean that for the case of reflected attributes you would have to invoke the conversion yourself before handing it to the URL parser.

I hope that at some point we can define reflected attributes as such, which is already the case in Chromium as I understand it:

  [Reflect=URL] attribute USVString href;
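A sketch of what "invoke the conversion yourself" could look like for a reflected URL attribute. It assumes a hypothetical `FakeElement` whose content attributes are plain strings, and a getter that roughly follows HTML's reflection rules for URL attributes (assumed here: an absent attribute gives the empty string, and a parse failure returns the attribute value unchanged). The DOMString-to-scalar-value step is a TextEncoder/TextDecoder round trip:

```typescript
// Hypothetical stand-in for an element; content attributes are plain
// DOMStrings, so they may contain lone surrogates.
class FakeElement {
  private attrs = new Map<string, string>();
  readonly baseURL: string;

  constructor(baseURL: string) {
    this.baseURL = baseURL;
  }

  setAttribute(name: string, value: string): void {
    this.attrs.set(name, value);
  }

  // Sketch of a reflected URL attribute getter, e.g. for `href`.
  get href(): string {
    const value = this.attrs.get("href");
    if (value === undefined) return "";
    // Call-site conversion: encode() replaces lone surrogates with U+FFFD;
    // ignoreBOM stops decode() from stripping a leading U+FEFF.
    const scalar = new TextDecoder("utf-8", { ignoreBOM: true }).decode(
      new TextEncoder().encode(value),
    );
    try {
      return new URL(scalar, this.baseURL).href;
    } catch {
      return value; // parse failure: return the attribute value as-is
    }
  }
}
```

With a base of `https://example.com/dir/` and `href` set to `"page\uD800"`, the getter returns `https://example.com/dir/page%EF%BF%BD`.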

Comment 13 Ian 'Hixie' Hickson 2014-11-28 04:26:49 UTC

Oh you're talking just about how these attributes resolve themselves? Not about how the URL is actually used?

I really don't understand the problem here. I pass DOMString strings to the URL parser all the time (e.g. whenever I take a content attribute and resolve it to get an absolute URL). Why would reflecting attributes be any different?

Comment 14 Anne 2014-11-28 07:53:24 UTC

The problem is that the URL parser does not take a DOMString.

Comment 15 Ian 'Hixie' Hickson 2014-12-01 18:59:17 UTC

That is indeed the problem. I'm saying that instead of everyone having to convert their strings to Unicode before ever interacting with the URL spec, the URL spec should just act like everyone else and take the same kind of string as all the other APIs. If it needs to then act on them as if they're Unicode and not UTF-16, then that's fine, but that's an internal concern, not something you should expose in your API.

Comment 16 Anne 2014-12-03 14:43:33 UTC

I disagree, but I'm happy to add something like a "DOMString-accepting URL parser" hook.

Comment 17 Ian 'Hixie' Hickson 2014-12-03 20:10:34 UTC

Why don't you agree?

Why not keep the current hook but make it accept both?

Comment 18 Anne 2014-12-03 21:51:07 UTC

So I was convinced by Simon that this was a better strategy as it keeps the URL parser surrogate-free. Reversing that would be somewhat painful, but is definitely doable of course.

Comment 19 Ian 'Hixie' Hickson 2014-12-03 22:25:30 UTC

You can still keep the parser surrogate-free. I'm just saying that whatever prose you would have me put at all the call sites, you would just put at the top of whatever algorithm I'm calling.
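The designs under discussion differ only in where the conversion lives. A sketch, assuming `scalarValueURLParser` is a hypothetical stand-in for a parser defined only over scalar value strings (it throws on lone surrogates to make that precondition visible), and `domStringURLParser` is the kind of DOMString-accepting hook proposed in comment 16, converting once at the top of the algorithm as comment 19 suggests:

```typescript
// Hypothetical parser defined only over scalar value strings.
function scalarValueURLParser(input: string, base?: string): URL | null {
  for (let i = 0; i < input.length; i++) {
    const c = input.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = input.charCodeAt(i + 1); // NaN past the end
      if (!(next >= 0xdc00 && next <= 0xdfff)) throw new TypeError("lone surrogate");
      i++; // well-formed pair
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      throw new TypeError("lone surrogate");
    }
  }
  try {
    return new URL(input, base);
  } catch {
    return null; // URL parsing failure
  }
}

// Hypothetical DOMString-accepting hook: converts once, so call sites
// need no conversion prose of their own.
function domStringURLParser(input: string, base?: string): URL | null {
  const scalar = new TextDecoder("utf-8", { ignoreBOM: true }).decode(
    new TextEncoder().encode(input),
  );
  return scalarValueURLParser(scalar, base);
}
```

Under the status quo every caller must do the conversion before calling `scalarValueURLParser`; with the hook, callers hand in whatever string they have and the parser itself stays surrogate-free.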

Comment 20 Anne 2014-12-03 22:28:16 UTC

That was my suggestion in comment 16...

Comment 21 Ian 'Hixie' Hickson 2014-12-03 22:54:05 UTC

Right, but you said you disagreed. I'm trying to figure out why you disagree.

Comment 22 Anne 2014-12-03 23:00:38 UTC

I would prefer addressing this per comment 6, but I'm okay with addressing this per comment 16 for now (until something like IDL [Reflect] becomes feasible which seems like a better solution).

Comment 23 Ian 'Hixie' Hickson 2014-12-03 23:33:03 UTC

I don't understand how comment 6 would help. Most of the call sites for this aren't the one "reflecting IDL attribute" call site. Can you elaborate? Why is having this in multiple call sites, and having additional IDL syntax, better than just having one line of prose in the URL spec?

Comment 24 Simon Pieters 2014-12-04 14:08:12 UTC

This seems similar to the various algorithms in css-syntax that need to be invoked with different kinds of input from different places (a token stream or a string).

http://dev.w3.org/csswg/css-syntax/#parser-entry-points

I agree with Hixie that it seems nicer to normalize the input on your end instead of having all other specs convert to the input you want. It centralizes the conversion so it is done consistently, and it is less prose for other specs.
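The css-syntax pattern referred to here can be sketched as an entry point that accepts either a string or an existing token list and normalizes it internally. Everything below is hypothetical and heavily simplified: `Token`, the toy `tokenize` (which only splits whitespace and non-whitespace runs), and the example entry point `countIdents` are illustrations, not the css-syntax algorithms:

```typescript
// Hypothetical token type; real CSS tokens are far richer.
type Token = { kind: "ident" | "whitespace"; value: string };

// Toy tokenizer for illustration only: whitespace vs non-whitespace runs.
function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  const re = /\s+|\S+/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(input)) !== null) {
    const text = m[0];
    tokens.push({ kind: /^\s/.test(text) ? "whitespace" : "ident", value: text });
  }
  return tokens;
}

// Normalization at the entry point: callers pass either form, and the
// conversion happens here once instead of at every call site.
function normalizeIntoTokenStream(input: string | Token[]): Token[] {
  return typeof input === "string" ? tokenize(input) : input;
}

// Example entry point built on the normalizer.
function countIdents(input: string | Token[]): number {
  return normalizeIntoTokenStream(input).filter((t) => t.kind === "ident").length;
}
```

Both `countIdents("a b  c")` and `countIdents(tokenize("a b  c"))` count three identifiers, which is the consistency argument made above.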

Comment 25 Anne 2016-03-10 08:49:16 UTC

https://github.com/whatwg/html/pull/840