This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.
http://url.spec.whatwg.org/#application/x-www-form-urlencoded
[[ The application/x-www-form-urlencoded parser takes a string input using code points in the range U+0000 to U+007F ]]
As in bug 23958, this parser seems to be called with unrestricted (potentially non-ASCII) input, for example: new URLSearchParams('a=☃')
So the idea is to make the parser accept a byte sequence and then overload it with a parser that accepts a string. And then if you pass a string it'll simply utf-8 encode the string and invoke the byte version with the result.
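A minimal sketch of the two entry points described in this comment, assuming a simplified parser. The function names (percentDecodeBytes, parseUrlencodedBytes, parseUrlencodedString) are hypothetical and not taken from the URL Standard, and the encoding override is left out entirely.

```javascript
// Hypothetical names; a simplified sketch, not the URL Standard's algorithm text.

// Turn each %XX in a byte sequence into the byte 0xXX; copy everything else.
function percentDecodeBytes(bytes) {
  const isHex = (b) =>
    (b >= 0x30 && b <= 0x39) || (b >= 0x41 && b <= 0x46) || (b >= 0x61 && b <= 0x66);
  const out = [];
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b === 0x25 && i + 2 < bytes.length && isHex(bytes[i + 1]) && isHex(bytes[i + 2])) {
      out.push(parseInt(String.fromCharCode(bytes[i + 1], bytes[i + 2]), 16));
      i += 2;
    } else {
      out.push(b);
    }
  }
  return Uint8Array.from(out);
}

// Byte version: split on "&", split each piece on the first "=",
// map "+" to space, percent-decode, then UTF-8 decode.
function parseUrlencodedBytes(bytes) {
  const decoder = new TextDecoder("utf-8");
  const decodePart = (part) =>
    decoder.decode(percentDecodeBytes(part.map((b) => (b === 0x2b ? 0x20 : b))));
  const pairs = [];
  let start = 0;
  for (let i = 0; i <= bytes.length; i++) {
    if (i === bytes.length || bytes[i] === 0x26) {
      const seq = bytes.slice(start, i);
      start = i + 1;
      if (seq.length === 0) continue; // skip empty pieces such as "a=1&&b=2"
      const eq = seq.indexOf(0x3d);
      const name = eq === -1 ? seq : seq.slice(0, eq);
      const value = eq === -1 ? new Uint8Array(0) : seq.slice(eq + 1);
      pairs.push([decodePart(name), decodePart(value)]);
    }
  }
  return pairs;
}

// String version: UTF-8 encode the string, then hand the bytes to the byte version.
function parseUrlencodedString(input) {
  return parseUrlencodedBytes(new TextEncoder().encode(input));
}
```

With this shape, a call such as parseUrlencodedString('a=☃') never feeds non-ASCII code points to the parser itself; it only ever sees the UTF-8 bytes.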
I’m a bit worried that, when used with a character encoding not as resilient as UTF-8, percent-escaped sequences could change the meaning of neighboring non-ASCII code points (since the character decoder that is called next doesn’t know which bytes come from the character encoder and which come from percent-escaping). I think I prefer the design where consecutive percent-escaped sequences are character-decoded together (i.e., percent decoding maps text to text and takes an encoding).
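The text-to-text design preferred in this comment could look roughly like the sketch below. The name percentDecodeText and its default encoding are assumptions made for illustration; each run of consecutive percent-escapes is decoded on its own, so the bytes it produces cannot combine with bytes from neighboring literal characters.

```javascript
// Hypothetical helper: percent decoding as a text-to-text operation that takes an encoding.
function percentDecodeText(input, encoding = "utf-8") {
  const decoder = new TextDecoder(encoding);
  // Match maximal runs of consecutive %XX escapes and decode each run as one unit.
  return input.replace(/(?:%[0-9A-Fa-f]{2})+/g, (run) => {
    const bytes = Uint8Array.from(run.slice(1).split("%"), (hex) => parseInt(hex, 16));
    return decoder.decode(bytes);
  });
}
```

For example, percentDecodeText("a=%E2%98%83☃") decodes the three escaped bytes together as one UTF-8 sequence and leaves the literal ☃ untouched. As the later comments show, the thread settled on the byte-based design instead.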
The version that accepts strings would only ever use utf-8 and then pass the bytes to the byte version.
What about application/x-www-form-urlencoded’s encoding override?
I cannot really think of a scenario where that would apply and the input would not be bytes. Probably need to add warnings and such though.
*** Bug 24146 has been marked as a duplicate of this bug. ***
Implemented comment 2 plus comment 5. https://github.com/whatwg/url/commit/3cfaa1779bfb9a3ba2b907a5802ca0251ca9a7e6