19978 – The decoding algorithm uses ampersand as the separator. But HTML4 recommended semicolon. http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2 Therefore both semicolon and ampersand should be the separator.

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.


Status:	RESOLVED WONTFIX

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	URL
Version:	unspecified
Hardware:	Other other

Importance:	P3 normal
Target Milestone:	Unsorted
Assignee:	Anne
QA Contact:	sideshowbarker+urlspec

URL:	http://www.whatwg.org/specs/web-apps/...

Reported:	2012-11-16 07:28 UTC by contributor
Modified:	2014-01-15 14:03 UTC
CC List:	4 users

Description contributor 2012-11-16 07:28:59 UTC

Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html
Multipage: http://www.whatwg.org/C#url-encoded-form-data
Complete: http://www.whatwg.org/c#url-encoded-form-data

Comment:
The decoding algorithm uses ampersand as the separator. But HTML4 recommended
semicolon.
http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2
Therefore both semicolon and ampersand should be the separator.

Posted from: 218.45.212.2
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11

Comment 1 Anne 2012-11-24 16:44:22 UTC

That's not compatible with implementations.

Comment 2 NARUSE, Yui 2012-11-26 05:55:52 UTC

(In reply to comment #1)
> That's not compatible with implementations.

What I am pointing out concerns decoding, not encoding, so this should increase compatibility.

Comment 3 Anne 2012-11-26 10:29:14 UTC

Compatibility with which encoding implementations? Or do decoding implementations typically implement this? (Though if you cannot get it generated that seems pointless.)

Comment 4 NARUSE, Yui 2012-11-26 13:27:59 UTC

(In reply to comment #3)
> Compatibility with which encoding implementations? Or do decoding
> implementations typically implement this? (Though if you cannot get it
> generated that seems pointless.)

For example CGI.pm of perl emits semicolon-separated query string:
> perl -e'use CGI;$q=CGI->new;$q->param(foo => "bar");$q->param(hoge => "fuga");print $q->query_string()'
foo=bar;hoge=fuga

Ruby's cgi.rb emits &-separated string, but can parse ;-separated one.

Comment 5 Anne 2012-11-26 13:35:09 UTC

I guess we might want to consider it a bit more then.

In any event, I want to use this algorithm for the URLQuery API, and there I definitely do not want ; to count as a separator. "?na;me=value&name;2=othervalue" should just be split on & and then =.
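
To illustrate the splitting described in this comment, here is a minimal Python sketch. It is not the specification's algorithm: percent-decoding and "+" handling are left out, and the function name `split_on_ampersand` is hypothetical.

```python
def split_on_ampersand(query):
    """Split a query string on '&', then split each piece on its first '='."""
    # Drop a single leading "?" if present.
    if query.startswith("?"):
        query = query[1:]
    pairs = []
    for piece in query.split("&"):
        if not piece:
            continue  # ignore empty pieces, e.g. from "a=1&&b=2"
        name, _, value = piece.partition("=")
        pairs.append((name, value))
    return pairs


print(split_on_ampersand("?na;me=value&name;2=othervalue"))
# [('na;me', 'value'), ('name;2', 'othervalue')]
```

With this rule the semicolons stay inside the names, so the example yields two pairs.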

Comment 6 NARUSE, Yui 2012-11-26 18:15:27 UTC

Is there a web browser which doesn't escape ; in a name when submitting a form?

Comment 7 Anne 2012-11-26 18:27:57 UTC

I'm not sure, but you can get such URLs by manipulating via JavaScript or simply with <a>.

Comment 8 NARUSE, Yui 2012-11-26 21:14:42 UTC

What creates such URLs?

When it is via JavaScript, there is no function that directly generates a query string from a form, such as form.queryString().
If it uses escape(), it escapes ; and &.
If it uses encodeURI(), it escapes neither ; nor &, but that is incorrect use.
If it uses encodeURIComponent(), it escapes ; and &.
So there's no problem.
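
The difference between encodeURIComponent() and encodeURI() noted above can be approximated in Python with `urllib.parse.quote`. This is only a sketch: the `safe` sets below are an assumption chosen to mirror the characters each JavaScript function leaves unescaped, the helper names are hypothetical, and escape() is not modeled.

```python
from urllib.parse import quote

# Punctuation left unescaped by each JavaScript function, in addition to
# letters, digits and "-_.~", which quote() never escapes (Python 3.7+).
ENCODE_URI_COMPONENT_SAFE = "!*'()"
ENCODE_URI_SAFE = ENCODE_URI_COMPONENT_SAFE + ";/?:@&=+$,#"


def like_encode_uri_component(text):
    """Approximate encodeURIComponent(): ';' and '&' are escaped."""
    return quote(text, safe=ENCODE_URI_COMPONENT_SAFE)


def like_encode_uri(text):
    """Approximate encodeURI(): ';' and '&' pass through unchanged."""
    return quote(text, safe=ENCODE_URI_SAFE)


name = "na;me&x"
print(like_encode_uri_component(name))  # na%3Bme%26x
print(like_encode_uri(name))            # na;me&x
```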

When it is a simple a href, it seems to be generated by some library or written by hand.

If it uses libraries,
Perl's CGI.pm decodes with ; and & as separators, and encodes with ; by default.
Python's cgi.py decodes with ; and & as separators, and urllib.py encodes with &.
Ruby's cgi.rb decodes with ; and & as separators, and uri.rb encodes with &.
PHP decodes with & as a separator (this can be changed with arg_separator.input), and http_build_query encodes with &.
All of them encode both & and ; in keys.
So there's no problem.
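
As a small check of the encoding claim for Python, the sketch below uses the Python 3 `urllib.parse.urlencode` (the comment refers to the older urllib.py). The field names are made up for illustration.

```python
from urllib.parse import urlencode

# Illustrative fields whose names contain both candidate separators.
fields = [("na;me", "value"), ("a&b", "c d")]

encoded = urlencode(fields)
print(encoded)  # na%3Bme=value&a%26b=c+d
```

The pairs are joined with &, and both ; and & inside a key are percent-encoded, so the output is unambiguous under either splitting rule.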

If it is written by hand, there are many possibilities.
It may use an odd separator such as !, ,, |, $, and so on.
Of course, that includes a query string like "?na;me=value&name;2=othervalue".

Therefore I think splitting with [&;] is a reasonable de facto standard.
The current decoding algorithm breaks CGI.pm, the majority case, in order to protect rare edge cases.
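
The trade-off argued here can be seen by running both splitting rules over the two example strings from this thread. This is a minimal Python sketch, not any library's actual parser; `split_pairs` is a hypothetical helper and percent-decoding is omitted.

```python
import re


def split_pairs(query, separator_pattern):
    """Split on a separator regex, then split each piece on its first '='."""
    pairs = []
    for piece in re.split(separator_pattern, query):
        if piece:
            name, _, value = piece.partition("=")
            pairs.append((name, value))
    return pairs


cgi_pm_output = "foo=bar;hoge=fuga"               # as emitted by CGI.pm in comment 4
hand_written = "na;me=value&name;2=othervalue"    # Anne's example from comment 5

print(split_pairs(cgi_pm_output, r"&"))     # [('foo', 'bar;hoge=fuga')]
print(split_pairs(cgi_pm_output, r"[&;]"))  # [('foo', 'bar'), ('hoge', 'fuga')]
print(split_pairs(hand_written, r"&"))      # [('na;me', 'value'), ('name;2', 'othervalue')]
print(split_pairs(hand_written, r"[&;]"))   # [('na', ''), ('me', 'value'), ('name', ''), ('2', 'othervalue')]
```

Splitting on & alone misreads the CGI.pm output as a single pair, while splitting on [&;] breaks apart names that contain an unescaped semicolon.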

Comment 9 Ian 'Hixie' Hickson 2012-12-27 00:14:46 UTC

This is a URL spec bug now right?

Comment 10 Ian 'Hixie' Hickson 2012-12-27 00:15:20 UTC

(assuming it's a bug at all, I mean; I personally think it should be WONTFIXed as I see no value in using semicolons as well, and supporting multiple syntaxes is a recipe for security bugs, typically)

Comment 11 Anne 2012-12-27 12:40:37 UTC

It's URL once HTML starts making that dependency. Agreed about WONTFIX. Seems better if everyone aims to converge to a single format.

Comment 12 NARUSE, Yui 2012-12-27 15:01:52 UTC

I'm OK with WONTFIX if you decide on it with an understanding of the facts I showed above.

Comment 13 Ian 'Hixie' Hickson 2012-12-27 21:34:41 UTC

Ok, Anne, your call. (Assume HTML defers to URL for this stuff.)

Comment 14 Anne 2014-01-15 14:03:53 UTC

Current libraries on the server also have other quirks as illustrated in bug 24222.

They are free to implement other things I think. The specification just defines parsing for the format produced by the web platform.