This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 19978 - The decoding algorithm uses ampersand as the separator. But HTML4 recommended semicolon. http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2 Therefore both semicolon and ampersand should be the separator.
Status: RESOLVED WONTFIX
Alias: None
Product: WHATWG
Classification: Unclassified
Component: URL
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+urlspec
URL: http://www.whatwg.org/specs/web-apps/...
Reported: 2012-11-16 07:28 UTC by contributor
Modified: 2014-01-15 14:03 UTC

Description contributor 2012-11-16 07:28:59 UTC
Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html
Multipage: http://www.whatwg.org/C#url-encoded-form-data
Complete: http://www.whatwg.org/c#url-encoded-form-data

Comment:
The decoding algorithm uses ampersand as the separator. But HTML4 recommended
semicolon.
http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2
Therefore both semicolon and ampersand should be the separator.

Posted from: 218.45.212.2
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11
Comment 1 Anne 2012-11-24 16:44:22 UTC
That's not compatible with implementations.
Comment 2 NARUSE, Yui 2012-11-26 05:55:52 UTC
(In reply to comment #1)
> That's not compatible with implementations.

My point concerns decoding, not encoding, so this should increase compatibility.
Comment 3 Anne 2012-11-26 10:29:14 UTC
Compatibility with which encoding implementations? Or do decoding implementations typically implement this? (Though if you cannot get it generated that seems pointless.)
Comment 4 NARUSE, Yui 2012-11-26 13:27:59 UTC
(In reply to comment #3)
> Compatibility with which encoding implementations? Or do decoding
> implementations typically implement this? (Though if you cannot get it
> generated that seems pointless.)

For example CGI.pm of perl emits semicolon-separated query string:
> perl -e'use CGI;$q=CGI->new;$q->param(foo => "bar");$q->param(hoge => "fuga");print $q->query_string()'
foo=bar;hoge=fuga

Ruby's cgi.rb emits an &-separated string, but can also parse a ;-separated one.
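
For comparison with the encoder output above, here is a minimal Python sketch using the standard library's urllib.parse.urlencode. It emits an &-separated string and percent-encodes ; and & when they occur inside a name or value. The sample names and values are taken from this thread or made up for illustration.

```python
from urllib.parse import urlencode

# Same two parameters as the CGI.pm one-liner, joined with "&" here.
plain = urlencode([("foo", "bar"), ("hoge", "fuga")])
print(plain)    # foo=bar&hoge=fuga

# Illustrative input: ";" and "&" inside a name or value are percent-encoded,
# so they cannot be mistaken for separators by either style of decoder.
escaped = urlencode([("na;me", "a&b")])
print(escaped)  # na%3Bme=a%26b
```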
Comment 5 Anne 2012-11-26 13:35:09 UTC
I guess we might want to consider it a bit more then.

In any event, I want to use this algorithm for the URLQuery API and there I definitely do not want ; to count as a separator. "?na;me=value&name;2=othervalue" should just be split on & and then =.
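
The split described above can be sketched in Python as follows. This is only an illustration of "split on & and then =", not the specification's algorithm text; the function name parse_ampersand_only is hypothetical, and details such as encoding handling are simplified.

```python
from urllib.parse import unquote_plus

def parse_ampersand_only(query):
    """Split on '&', then on the first '='; ';' is treated as ordinary data."""
    if query.startswith("?"):
        query = query[1:]
    pairs = []
    for field in query.split("&"):
        if not field:
            continue  # skip empty fields such as those produced by "a=1&&b=2"
        name, _, value = field.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs

print(parse_ampersand_only("?na;me=value&name;2=othervalue"))
# [('na;me', 'value'), ('name;2', 'othervalue')]
```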
Comment 6 NARUSE, Yui 2012-11-26 18:15:27 UTC
Is there a web browser that doesn't escape ; in a name when submitting a form?
Comment 7 Anne 2012-11-26 18:27:57 UTC
I'm not sure, but you can get such URLs by manipulating via JavaScript or simply with <a>.
Comment 8 NARUSE, Yui 2012-11-26 21:14:42 UTC
What creates such URLs?

If it is via JavaScript, there is no function that generates a query string directly from a form, such as a form.queryString().
If it uses escape(), it escapes ; and &.
If it uses encodeURI(), it escapes neither ; nor &, but that is a misuse.
If it uses encodeURIComponent(), it escapes ; and &.
So there's no problem.

If it is a plain a href, the URL was presumably produced by a library or written by hand.

If it uses libraries:
Perl's CGI.pm decodes with ; and & as separators, and encodes with ; by default.
Python's cgi.py decodes with ; and & as separators, and urllib.py encodes with &.
Ruby's cgi.rb decodes with ; and & as separators, and uri.rb encodes with &.
PHP decodes with & as a separator (can change by arg_separator.input), and http_build_query encodes with &.
All of them encode both & and ; in keys.
So there's no problem.

If it is written by hand, there are many possibilities.
It may use odd separators such as !, ,, |, $, and so on.
Of course, that includes a query string like "?na;me=value&name;2=othervalue".

Therefore I think splitting on [&;] is a reasonable de facto standard.
The current decoding algorithm breaks CGI.pm, the majority case, in order to protect rare edge cases.
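
The trade-off argued in this comment can be made concrete with a small Python sketch that decodes the same strings with & only and with both & and ; as separators. The helper parse_query and its separators parameter are hypothetical, written only for this comparison; it does not reproduce any of the libraries named above.

```python
from urllib.parse import unquote_plus

def parse_query(query, separators="&"):
    """Split on every character in `separators`, then on the first '='."""
    fields = [query]
    for sep in separators:
        fields = [part for field in fields for part in field.split(sep)]
    pairs = []
    for field in fields:
        if not field:
            continue  # ignore empty fields between adjacent separators
        name, _, value = field.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs

cgi_pm_style = "foo=bar;hoge=fuga"              # as emitted by CGI.pm
hand_written = "na;me=value&name;2=othervalue"  # unescaped ; inside names

print(parse_query(cgi_pm_style, "&"))   # [('foo', 'bar;hoge=fuga')]
print(parse_query(cgi_pm_style, "&;"))  # [('foo', 'bar'), ('hoge', 'fuga')]
print(parse_query(hand_written, "&"))   # [('na;me', 'value'), ('name;2', 'othervalue')]
print(parse_query(hand_written, "&;"))  # [('na', ''), ('me', 'value'), ('name', ''), ('2', 'othervalue')]
```

Each choice of separator set decodes one of the two inputs as its author intended and misreads the other, which is the disagreement in this thread.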
Comment 9 Ian 'Hixie' Hickson 2012-12-27 00:14:46 UTC
This is a URL spec bug now, right?
Comment 10 Ian 'Hixie' Hickson 2012-12-27 00:15:20 UTC
(assuming it's a bug at all, I mean; I personally think it should be WONTFIXed as I see no value in using semicolons as well, and supporting multiple syntaxes is a recipe for security bugs, typically)
Comment 11 Anne 2012-12-27 12:40:37 UTC
It's URL once HTML starts making that dependency. Agreed about WONTFIX. Seems better if everyone aims to converge to a single format.
Comment 12 NARUSE, Yui 2012-12-27 15:01:52 UTC
I'm OK with WONTFIX as long as you decide it with an understanding of the facts I showed above.
Comment 13 Ian 'Hixie' Hickson 2012-12-27 21:34:41 UTC
Ok, Anne, your call. (Assume HTML defers to URL for this stuff.)
Comment 14 Anne 2014-01-15 14:03:53 UTC
Current libraries on the server also have other quirks as illustrated in bug 24222.

They are free to implement other things I think. The specification just defines parsing for the format produced by the web platform.