This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 23969 - Non-ASCII application/x-www-form-urlencoded
Summary: Non-ASCII application/x-www-form-urlencoded
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: URL
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+urlspec
URL:
Whiteboard:
Keywords:
Duplicates: 24146
Depends on:
Blocks:
 
Reported: 2013-12-03 13:48 UTC by Simon Sapin
Modified: 2014-01-14 16:12 UTC

Description Simon Sapin 2013-12-03 13:48:36 UTC
http://url.spec.whatwg.org/#application/x-www-form-urlencoded

[[
The application/x-www-form-urlencoded parser takes a string input using code points in the range U+0000 to U+007F
]]

Similarly to bug 23958, this parser seems to be called with unrestricted (potentially non-ASCII) input:

new URLSearchParams('a=☃')
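
For illustration only, here is a small sketch of the mismatch being reported. It checks that the example input falls outside U+0000 to U+007F and shows what a present-day URLSearchParams implementation does with it. The variable names are made up for this example, and the observed behavior reflects current engines, not the spec text quoted above.

```javascript
// The input from the report; the snowman is U+2603, outside U+0000..U+007F.
const input = "a=☃";

// True when at least one code point lies outside the ASCII range.
const hasNonAscii = /[^\u0000-\u007F]/.test(input);

// Current engines accept the string and serialize the value as
// percent-encoded UTF-8 bytes.
const params = new URLSearchParams(input);
const value = params.get("a");
const serialized = params.toString();

console.log(hasNonAscii, value, serialized);
```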
Comment 1 Anne 2013-12-03 14:17:01 UTC
So the idea is to make the parser accept a byte sequence and then overload it with a version that accepts a string. If you pass a string, it'll simply UTF-8-encode the string and invoke the byte version with the result.
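
A minimal sketch of that layering, assuming a simplified parser: it splits on "&" and "=", maps "+" to a space, percent-decodes, and always decodes the result as UTF-8. The function names (parseFormBytes, parseFormString, percentDecodeBytes, splitBytes, isHexDigit) are hypothetical and this is not the spec's algorithm text.

```javascript
const utf8Encoder = new TextEncoder(); // TextEncoder only ever produces UTF-8
const utf8Decoder = new TextDecoder("utf-8");

// True for the bytes of 0-9, A-F and a-f.
function isHexDigit(b) {
  return (b >= 0x30 && b <= 0x39) || (b >= 0x41 && b <= 0x46) || (b >= 0x61 && b <= 0x66);
}

// Bytes to bytes: "+" becomes a space, "%XX" becomes the byte 0xXX.
function percentDecodeBytes(bytes) {
  const out = [];
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b === 0x2b) {
      out.push(0x20);
    } else if (b === 0x25 && i + 2 < bytes.length &&
               isHexDigit(bytes[i + 1]) && isHexDigit(bytes[i + 2])) {
      out.push(parseInt(String.fromCharCode(bytes[i + 1], bytes[i + 2]), 16));
      i += 2;
    } else {
      out.push(b);
    }
  }
  return Uint8Array.from(out);
}

// Split a byte sequence on every occurrence of a separator byte.
function splitBytes(bytes, sep) {
  const parts = [];
  let start = 0;
  for (let i = 0; i <= bytes.length; i++) {
    if (i === bytes.length || bytes[i] === sep) {
      parts.push(bytes.subarray(start, i));
      start = i + 1;
    }
  }
  return parts;
}

// The byte version: does all of the actual parsing.
function parseFormBytes(bytes) {
  const pairs = [];
  for (const seq of splitBytes(bytes, 0x26)) { // "&"
    if (seq.length === 0) continue;
    const eq = seq.indexOf(0x3d); // first "="
    const name = eq === -1 ? seq : seq.subarray(0, eq);
    const value = eq === -1 ? new Uint8Array(0) : seq.subarray(eq + 1);
    pairs.push([
      utf8Decoder.decode(percentDecodeBytes(name)),
      utf8Decoder.decode(percentDecodeBytes(value)),
    ]);
  }
  return pairs;
}

// The string overload: UTF-8-encode, then defer to the byte version.
function parseFormString(input) {
  return parseFormBytes(utf8Encoder.encode(input));
}
```

With this shape, parseFormString("a=☃") and parseFormBytes applied to the bytes of "a=%E2%98%83" both yield the pair ("a", "☃"), since the string path only adds an encoding step in front of the byte path.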
Comment 2 Simon Sapin 2013-12-03 14:28:36 UTC
I’m a bit worried that, when used with a character encoding not as resilient as UTF-8, percent-escaped sequences could change the meaning of neighboring non-ASCII code points (since the character decoder that is called next doesn’t know which bytes come from the character encoder and which from percent-escaping).

I think I prefer the design where consecutive percent-escaped sequences are character-decoded together (i.e. percent decoding maps text to text and takes an encoding).
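
As a rough sketch of that alternative design, the function below maps text to text: each run of consecutive "%XX" escapes is collected into bytes and decoded as a unit with the given encoding, while all other characters pass through untouched. The name percentDecodeText and its encodingLabel parameter are hypothetical, and which labels are accepted depends on the TextDecoder implementation at hand.

```javascript
// Text to text: decode each run of consecutive %XX escapes together,
// leaving every other character of the input exactly as it was.
function percentDecodeText(input, encodingLabel = "utf-8") {
  const decoder = new TextDecoder(encodingLabel);
  return input.replace(/(?:%[0-9A-Fa-f]{2})+/g, (run) => {
    // Each escape is three characters long: "%", then two hex digits.
    const bytes = new Uint8Array(run.length / 3);
    for (let i = 0; i < bytes.length; i++) {
      bytes[i] = parseInt(run.slice(i * 3 + 1, i * 3 + 3), 16);
    }
    return decoder.decode(bytes);
  });
}
```

Because literal non-ASCII characters are never turned into bytes here, they cannot be merged with bytes that came from percent-escapes, which is the interaction described above.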
Comment 3 Anne 2013-12-03 14:30:40 UTC
The version that accepts strings would only ever use UTF-8 and then pass the bytes to the byte version.
Comment 4 Simon Sapin 2013-12-03 14:44:00 UTC
What about application/x-www-form-urlencoded’s encoding override?
Comment 5 Anne 2013-12-03 14:54:45 UTC
I cannot really think of a scenario where that would apply and the input would not be bytes. Probably need to add warnings and such though.
Comment 6 Anne 2014-01-14 16:07:29 UTC
*** Bug 24146 has been marked as a duplicate of this bug. ***