23969 – Non-ASCII application/x-www-form-urlencoded

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED FIXED

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	URL
Version:	unspecified
Hardware:	PC Linux

Importance:	P2 normal
Target Milestone:	Unsorted
Assignee:	Anne
QA Contact:	sideshowbarker+urlspec

Duplicates (1):	24146

Reported:	2013-12-03 13:48 UTC by Simon Sapin
Modified:	2014-01-14 16:12 UTC
CC List:	2 users

Description Simon Sapin 2013-12-03 13:48:36 UTC

http://url.spec.whatwg.org/#application/x-www-form-urlencoded

[[
The application/x-www-form-urlencoded parser takes a string input using code points in the range U+0000 to U+007F
]]

Similarly to bug 23958, this parser seems to be called with unrestricted (potentially non-ASCII) input:

new URLSearchParams('a=☃')
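For reference, a minimal sketch of the call in question, assuming a runtime where `URLSearchParams` is available as a global (current browsers, Node.js). It shows how implementations behave today, which is not necessarily what the spec text quoted above described at the time:

```javascript
// The constructor receives a JavaScript string containing U+2603 (snowman),
// which lies outside the U+0000 to U+007F range the quoted text assumes.
const params = new URLSearchParams("a=☃");

// The non-ASCII value survives parsing as-is.
const value = params.get("a");

// Serializing percent-escapes the UTF-8 bytes of the value.
const serialized = params.toString();

console.log(value, serialized);
```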

Comment 1 Anne 2013-12-03 14:17:01 UTC

The idea is to make the parser accept a byte sequence and then overload it with a version that accepts a string. If you pass a string, it simply UTF-8 encodes the string and invokes the byte version with the result.
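A rough sketch of that layering, not the spec's algorithm text. The names `parseUrlencodedBytes`, `parseUrlencoded`, and `percentDecode` are made up for illustration, the byte version here always decodes as UTF-8 (no encoding override), and `TextEncoder`/`TextDecoder` are assumed to be available as globals:

```javascript
// Hypothetical helper: true if the byte is an ASCII hex digit.
const isHexDigit = (b) =>
  (b >= 0x30 && b <= 0x39) || (b >= 0x41 && b <= 0x46) || (b >= 0x61 && b <= 0x66);

// Hypothetical helper: '+' becomes a space, %XX becomes the byte 0xXX.
function percentDecode(bytes) {
  const out = [];
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b === 0x2b) {
      out.push(0x20);
    } else if (b === 0x25 && i + 2 < bytes.length &&
               isHexDigit(bytes[i + 1]) && isHexDigit(bytes[i + 2])) {
      out.push(parseInt(String.fromCharCode(bytes[i + 1], bytes[i + 2]), 16));
      i += 2;
    } else {
      out.push(b);
    }
  }
  return Uint8Array.from(out);
}

// Byte version: split on '&', then on the first '=', then decode each part.
function parseUrlencodedBytes(bytes) {
  const decoder = new TextDecoder("utf-8");
  const pairs = [];
  let start = 0;
  for (let i = 0; i <= bytes.length; i++) {
    if (i === bytes.length || bytes[i] === 0x26) {
      const seq = bytes.subarray(start, i);
      start = i + 1;
      if (seq.length === 0) continue; // skip empty sequences such as "&&"
      const eq = seq.indexOf(0x3d);
      const name = eq === -1 ? seq : seq.subarray(0, eq);
      const value = eq === -1 ? new Uint8Array(0) : seq.subarray(eq + 1);
      pairs.push([
        decoder.decode(percentDecode(name)),
        decoder.decode(percentDecode(value)),
      ]);
    }
  }
  return pairs;
}

// String overload: UTF-8 encode, then hand the bytes to the byte version.
function parseUrlencoded(input) {
  const bytes = typeof input === "string" ? new TextEncoder().encode(input) : input;
  return parseUrlencodedBytes(bytes);
}

console.log(parseUrlencoded("a=☃"));
console.log(parseUrlencoded("a=%E2%98%83&b=x+y"));
```

With this arrangement a literal snowman and its percent-escaped UTF-8 bytes end up as the same value, because both reach the byte version as the same three bytes.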

Comment 2 Simon Sapin 2013-12-03 14:28:36 UTC

I’m a bit worried that, when used with a character encoding not as resilient as UTF-8, percent-escaped sequences could change the meaning of neighboring non-ASCII code points (since the character decoder that is called next doesn’t know which bytes come from the character encoder and which from percent-escaping).

I think I prefer the design where consecutive percent-escaped sequences are character-decoded together. (I.e., percent decoding maps text to text and takes an encoding.)
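A minimal sketch of that text-to-text design, under stated assumptions: the function name `percentDecodeText` and its `encoding` parameter are hypothetical, `TextDecoder` is assumed to be a global, and '+' handling is left out. Each run of consecutive percent-escapes is decoded on its own, so escaped bytes can never combine with neighboring characters that were already text:

```javascript
// Hypothetical text-to-text percent decoder that takes an encoding label.
function percentDecodeText(text, encoding = "utf-8") {
  const decoder = new TextDecoder(encoding);
  // Match maximal runs of %XX escapes and decode each run in isolation.
  return text.replace(/(?:%[0-9A-Fa-f]{2})+/g, (run) => {
    const bytes = Uint8Array.from(
      run.match(/[0-9A-Fa-f]{2}/g),
      (hex) => parseInt(hex, 16)
    );
    return decoder.decode(bytes);
  });
}

console.log(percentDecodeText("a=%E2%98%83"));
console.log(percentDecodeText("☃%20☃"));
```

Literal non-ASCII code points pass through untouched, and only the bytes that came from percent-escaping are ever handed to the character decoder.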

Comment 3 Anne 2013-12-03 14:30:40 UTC

The version that accepts strings would only ever use utf-8 and then pass the bytes to the byte version.

Comment 4 Simon Sapin 2013-12-03 14:44:00 UTC

What about application/x-www-form-urlencoded’s encoding override?

Comment 5 Anne 2013-12-03 14:54:45 UTC

I cannot really think of a scenario where that would apply and the input would not be bytes. Probably need to add warnings and such though.

Comment 6 Anne 2014-01-14 16:07:29 UTC

*** Bug 24146 has been marked as a duplicate of this bug. ***

Comment 7 Anne 2014-01-14 16:12:10 UTC

Implemented comment 2 plus comment 5.

https://github.com/whatwg/url/commit/3cfaa1779bfb9a3ba2b907a5802ca0251ca9a7e6