<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>19978</bug_id>
          
          <creation_ts>2012-11-16 07:28:59 +0000</creation_ts>
          <short_desc>The decoding algorithm uses only the ampersand as a separator, but HTML4 recommended that servers also accept the semicolon. http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2 Therefore both semicolon and ampersand should be treated as separators.</short_desc>
          <delta_ts>2014-01-15 14:03:53 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WHATWG</product>
          <component>URL</component>
          <version>unspecified</version>
          <rep_platform>Other</rep_platform>
          <op_sys>other</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>WONTFIX</resolution>
          
          
          <bug_file_loc>http://www.whatwg.org/specs/web-apps/current-work/#url-encoded-form-data</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P3</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>Unsorted</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter>contributor</reporter>
          <assigned_to name="Anne">annevk</assigned_to>
          <cc>annevk</cc>
    
    <cc>ian</cc>
    
    <cc>mike</cc>
    
    <cc>naruse</cc>
          
          <qa_contact>sideshowbarker+urlspec</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>78380</commentid>
    <comment_count>0</comment_count>
    <who name="">contributor</who>
    <bug_when>2012-11-16 07:28:59 +0000</bug_when>
    <thetext>Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html
Multipage: http://www.whatwg.org/C#url-encoded-form-data
Complete: http://www.whatwg.org/c#url-encoded-form-data

Comment:
The decoding algorithm uses only the ampersand as a separator, but HTML4
recommended that servers also accept the semicolon:
http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2
Therefore both semicolon and ampersand should be treated as separators.

Posted from: 218.45.212.2
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>78741</commentid>
    <comment_count>1</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-11-24 16:44:22 +0000</bug_when>
    <thetext>That&apos;s not compatible with implementations.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>78815</commentid>
    <comment_count>2</comment_count>
    <who name="NARUSE, Yui">naruse</who>
    <bug_when>2012-11-26 05:55:52 +0000</bug_when>
    <thetext>(In reply to comment #1)
&gt; That&apos;s not compatible with implementations.

My point is about decoding, not encoding, so this should increase compatibility.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>78820</commentid>
    <comment_count>3</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-11-26 10:29:14 +0000</bug_when>
    <thetext>Compatibility with which encoding implementations? Or do decoding implementations typically implement this? (Though if you cannot get it generated that seems pointless.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>78824</commentid>
    <comment_count>4</comment_count>
    <who name="NARUSE, Yui">naruse</who>
    <bug_when>2012-11-26 13:27:59 +0000</bug_when>
    <thetext>(In reply to comment #3)
&gt; Compatibility with which encoding implementations? Or do decoding
&gt; implementations typically implement this? (Though if you cannot get it
&gt; generated that seems pointless.)

For example, Perl&apos;s CGI.pm emits a semicolon-separated query string:
&gt; perl -e&apos;use CGI;$q=CGI-&gt;new;$q-&gt;param(foo =&gt; &quot;bar&quot;);$q-&gt;param(hoge =&gt; &quot;fuga&quot;);print $q-&gt;query_string()&apos;
foo=bar;hoge=fuga

Ruby&apos;s cgi.rb emits an &amp;-separated string, but can parse a ;-separated one.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>78825</commentid>
    <comment_count>5</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-11-26 13:35:09 +0000</bug_when>
    <thetext>I guess we might want to consider it a bit more then.

In any event, I want to use this algorithm for the URLQuery API and there I definitely do not want ; to count as separator. &quot;?na;me=value&amp;name;2=othervalue&quot; should just be split on &amp; and then =.</thetext>
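<!--
The comment above contrasts two ways of decoding the same query string: splitting
only on "&" and then on the first "=", versus also treating ";" as a separator, as
comment 8 reports CGI.pm, cgi.py and cgi.rb do. The minimal Python sketch below shows
how the example string (without its leading "?") comes apart under each rule. The
helper name parse_pairs is hypothetical; urllib.parse.unquote_plus is used as an
approximation of the "+"-to-space and percent-decoding steps, and a sequence without
"=" is assumed to be a name with an empty value. Individual libraries may differ in
these details.

```python
import re
from urllib.parse import unquote_plus


def parse_pairs(query, separators="&"):
    """Split a query string into (name, value) pairs.

    separators="&" keeps the ampersand-only rule; separators="&;" also
    splits on semicolons, like the lenient server-side decoders in comment 8.
    """
    pattern = "[" + re.escape(separators) + "]"
    pairs = []
    for sequence in re.split(pattern, query):
        if not sequence:
            continue  # skip empty sequences, e.g. from "a=1&&b=2"
        name, _, value = sequence.partition("=")  # split at the first "=" only
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


query = "na;me=value&name;2=othervalue"
print(parse_pairs(query))        # [('na;me', 'value'), ('name;2', 'othervalue')]
print(parse_pairs(query, "&;"))  # [('na', ''), ('me', 'value'), ('name', ''), ('2', 'othervalue')]
print(parse_pairs("foo=bar;hoge=fuga"))  # [('foo', 'bar;hoge=fuga')]
```

With "&" alone, "na;me" and "name;2" survive as names, while the CGI.pm output from
comment 4 collapses into a single pair; with "&;" the roles are reversed.
-->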
  </long_desc><long_desc isprivate="0" >
    <commentid>78843</commentid>
    <comment_count>6</comment_count>
    <who name="NARUSE, Yui">naruse</who>
    <bug_when>2012-11-26 18:15:27 +0000</bug_when>
    <thetext>Is there a web browser that doesn&apos;t escape ; in names when submitting a form?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>78846</commentid>
    <comment_count>7</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-11-26 18:27:57 +0000</bug_when>
    <thetext>I&apos;m not sure, but you can get such URLs by manipulating them via JavaScript, or simply with &lt;a&gt;.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>78867</commentid>
    <comment_count>8</comment_count>
    <who name="NARUSE, Yui">naruse</who>
    <bug_when>2012-11-26 21:14:42 +0000</bug_when>
    <thetext>What creates such URLs?

If the URL is built via JavaScript, there is no built-in function that generates a query string directly from a form (something like form.queryString()).
If it uses escape(), ; and &amp; are escaped.
If it uses encodeURI(), neither ; nor &amp; is escaped, but that is the wrong function for this purpose.
If it uses encodeURIComponent(), ; and &amp; are escaped.
So there&apos;s no problem.

When it is a plain &lt;a href&gt;, the URL is presumably generated by a library or by hand.

If it uses a library:
Perl&apos;s CGI.pm decodes with ; and &amp; as separators, and encodes with ; by default.
Python&apos;s cgi.py decodes with ; and &amp; as separators, and urllib.py encodes with &amp;.
Ruby&apos;s cgi.rb decodes with ; and &amp; as separators, and uri.rb encodes with &amp;.
PHP decodes with &amp; as a separator (configurable via arg_separator.input), and http_build_query encodes with &amp;.
All of them percent-encode both &amp; and ; in keys.
So there&apos;s no problem.

If it is written by hand, there are many possibilities.
It may use an unusual separator such as !, comma, |, $, and so on.
That of course includes query strings like &quot;?na;me=value&amp;name;2=othervalue&quot;.

Therefore I think splitting on [&amp;;] is a reasonable de facto standard.
The current decoding algorithm breaks CGI.pm, the majority case, to defend rare edge cases.</thetext>
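<!--
The argument above rests on encoders percent-encoding ";" and "&" inside names and
values, so that a lenient decoder splitting on either character cannot misread the
URLs they generate. A minimal Python sketch of that round trip follows, assuming
urllib.parse.quote_plus with safe="" as a stand-in for the library encoders listed
above. encode_pairs and decode_pairs are hypothetical helper names, and the separator
argument mimics CGI.pm's ";" default versus the "&" used by the other encoders.

```python
import re
from urllib.parse import quote_plus, unquote_plus


def encode_pairs(pairs, separator="&"):
    # quote_plus(..., safe="") percent-encodes ";", "&" and "=", so none of
    # them can appear unescaped inside an encoded name or value.
    return separator.join(
        quote_plus(name, safe="") + "=" + quote_plus(value, safe="")
        for name, value in pairs
    )


def decode_pairs(query):
    # Lenient decoder: treat both "&" and ";" as separators.
    pairs = []
    for sequence in re.split(r"[&;]", query):
        if sequence:
            name, _, value = sequence.partition("=")
            pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


pairs = [("na;me", "value"), ("name&2", "other value")]
amp = encode_pairs(pairs)        # 'na%3Bme=value&name%262=other+value'
semi = encode_pairs(pairs, ";")  # 'na%3Bme=value;name%262=other+value'
print(decode_pairs(amp) == pairs, decode_pairs(semi) == pairs)  # True True
```

The hand-written string from comment 5 is the case this does not cover: its ";"
characters are unescaped, so an "&"-only decoder and a "[&;]" decoder disagree on it.
-->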
  </long_desc><long_desc isprivate="0" >
    <commentid>80627</commentid>
    <comment_count>9</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2012-12-27 00:14:46 +0000</bug_when>
    <thetext>This is a URL spec bug now right?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>80628</commentid>
    <comment_count>10</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2012-12-27 00:15:20 +0000</bug_when>
    <thetext>(assuming it&apos;s a bug at all, I mean; I personally think it should be WONTFIXed as I see no value in using semicolons as well, and supporting multiple syntaxes is a recipe for security bugs, typically)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>80636</commentid>
    <comment_count>11</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-12-27 12:40:37 +0000</bug_when>
    <thetext>It&apos;s a URL bug once HTML starts depending on URL for this. Agreed about WONTFIX. It seems better if everyone aims to converge on a single format.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>80637</commentid>
    <comment_count>12</comment_count>
    <who name="NARUSE, Yui">naruse</who>
    <bug_when>2012-12-27 15:01:52 +0000</bug_when>
    <thetext>I&apos;m OK with WONTFIX if you decide it with an understanding of the facts I showed above.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>80646</commentid>
    <comment_count>13</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2012-12-27 21:34:41 +0000</bug_when>
    <thetext>Ok, Anne, your call. (Assume HTML defers to URL for this stuff.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>98500</commentid>
    <comment_count>14</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-01-15 14:03:53 +0000</bug_when>
    <thetext>Current libraries on the server also have other quirks as illustrated in bug 24222.

I think they are free to implement other behavior. The specification just defines parsing for the format produced by the web platform.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>