This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.
This bug was cloned from bug 16909 as part of operation convergence.
Originally filed: 2012-05-02 20:09:00 +0000

================================================================================
#0 contributor@whatwg.org 2012-05-02 20:09:21 +0000
--------------------------------------------------------------------------------

Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html
Multipage: http://www.whatwg.org/C#multipart-form-data
Complete: http://www.whatwg.org/c#multipart-form-data

Comment:
The specification is unclear about how field names should be encoded. In particular, what should be done if they include special characters (e.g. quotes, newlines, Unicode, etc.)? I started a mailing list thread on this issue...

Posted from: 74.66.64.60
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5

================================================================================
#1 Evan Jones 2012-05-02 20:10:52 +0000
--------------------------------------------------------------------------------

The specification is unclear about how field names should be encoded. In particular, what should be done if they include special characters (e.g. quotes, newlines, Unicode, etc.)?

================================================================================
#2 Evan Jones 2012-05-02 20:41:21 +0000
--------------------------------------------------------------------------------

Argh; whoops. Sorry for the bugzilla spam. I didn't realize that the "comment" thingy just filed a bugzilla bug.

HTML5 states: "Encode the (now mutated) form data set using the rules described by RFC 2388". However, it then modifies the rules: "The parts of the generated multipart/form-data resource that correspond to non-file fields must not have a Content-Type header specified.
Their names and values must be encoded using the character encoding selected above (field names in particular do not get converted to a 7-bit safe encoding as suggested in RFC 2388)."
http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#multipart-form-data

So the problem is: what are we supposed to do with field names? In particular, what if they contain "special" MIME characters (e.g. \r\n newlines, backslashes, double quotes, or semicolons)? Different browsers do different things, so server code currently has to detect the browser to do the right thing.

Example:

<input name='bàz%22\"\' value="foo">

Firefox 13b: Content-Disposition: form-data; name="bàz%22\\"\"
WebKit nightly: Content-Disposition: form-data; name="bàz%22\%22\"

Firefox backslash-escapes double quotes, but fails to escape backslashes. This means its header fails to parse according to the MIME specification (it sort of decodes as bàz%22\ with an extra trailing \").

WebKit %-escapes the double quotes, but does not %-escape the percent sign. Thus the above form control could be either name='bàz"\"\' or the desired name. WebKit has a bug open on this issue, asking for specification guidance: https://bugs.webkit.org/show_bug.cgi?id=62107

HTML5 should specify exactly how field names are encoded. Some potential solutions:

1) Bless Firefox's backslash quoting rules (they are very weird, but I think they are unambiguous?). This means WebKit POSTs will be decoded to the wrong field names, and POSTs to older servers may parse incorrectly if the name includes a \ (but that must already happen for Firefox?).

2) Bless WebKit's percent-escaping rules (ideally also escaping %). Servers that strictly parse this format will fail to parse Firefox POSTs if the name includes a \, and will

3) Adopt RFC 6266's approach of having two name parameters when there are special characters: one with the existing escaping, and one with an unambiguously escaped version.
Ideally, existing servers will parse the first name and not break unless the field name contains a special character. As servers are upgraded, they will be able to parse the new header unambiguously. See: http://tools.ietf.org/html/rfc6266

Aside: The *same* issue happens for uploaded file names.

I started a mailing list thread to attempt to collect more information about this: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2012-May/035610.html

================================================================================
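The sketch below makes the ambiguity in comment #2 concrete. It is a minimal Python sketch that assumes only the behaviors reported above. The helper names firefox_style, webkit_style, parse_quoted_string, rfc6266_style and decode_ext_value are hypothetical and are not browser or library APIs. It reproduces the two reported serializations of the example name. It shows that a strict quoted-string parse truncates the Firefox form and that the WebKit form is identical for two different names. It then contrasts both with proposal 3. For proposal 3, the sketch assumes that RFC 6266's filename/filename* pattern carries over to name/name*, using RFC 5987's UTF-8'' ext-value syntax. The "_" fallback in the legacy parameter is an arbitrary choice for illustration.

```python
# Sketch: serializing one field name into a Content-Disposition "name" parameter.
# All function names here are hypothetical helpers, not browser or library APIs.
from urllib.parse import quote, unquote


def firefox_style(name: str) -> str:
    # As reported for Firefox 13b: escape '"' with a backslash, leave '\' as-is.
    return 'name="' + name.replace('"', '\\"') + '"'


def webkit_style(name: str) -> str:
    # As reported for WebKit nightly: '"' becomes %22, but '%' is not escaped.
    return 'name="' + name.replace('"', '%22') + '"'


def parse_quoted_string(param: str) -> str:
    # Strict MIME-style quoted-string reader for a 'name="..."' parameter:
    # a backslash escapes the next character, an unescaped '"' ends the value,
    # and anything after the closing quote is ignored.
    prefix = 'name="'
    if not param.startswith(prefix):
        raise ValueError("expected a name=\"...\" parameter")
    out = []
    i = len(prefix)
    while i < len(param):
        ch = param[i]
        if ch == "\\" and i + 1 < len(param):
            out.append(param[i + 1])
            i += 2
        elif ch == '"':
            break
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def rfc6266_style(name: str) -> str:
    # Proposal 3, assuming RFC 6266's filename/filename* pattern carries over to
    # name/name*. Legacy fallback: replace non-ASCII, '"', '\', CR and LF with
    # '_' (an arbitrary choice for this sketch).
    legacy = "".join(
        c if c.isascii() and c not in '"\\\r\n' else "_" for c in name
    )
    # Extended value in RFC 5987 syntax: charset'language'percent-encoded-bytes.
    extended = "UTF-8''" + quote(name, safe="")
    return f'name="{legacy}"; name*={extended}'


def decode_ext_value(ext: str) -> str:
    # Inverse of the name* encoding above: split off charset and language tag.
    charset, _language, encoded = ext.split("'", 2)
    return unquote(encoded, encoding=charset)


if __name__ == "__main__":
    original = 'bàz%22\\"\\'   # value of name='bàz%22\"\' in the HTML example
    lookalike = 'bàz"\\"\\'    # a different name that WebKit serializes identically

    ff = firefox_style(original)
    print(ff)                                  # name="bàz%22\\"\"
    print(parse_quoted_string(ff))             # bàz%22\ (cut off at the bare quote)
    print(webkit_style(original))              # name="bàz%22\%22\"
    print(webkit_style(original) == webkit_style(lookalike))  # True
    new = rfc6266_style(original)
    print(new)
    print(decode_ext_value(new.split("name*=", 1)[1]) == original)  # True
```

In this sketch, only the name* value survives a round trip unchanged. Whether servers would tolerate an extra name* parameter in multipart/form-data is the compatibility question that proposal 3 raises.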
Making this a higher priority to actively seek more feedback from implementers and webdevs.
HTML5.1 Bugzilla Bug Triage: Moved

Moved the summary and tracking of follow-up on this issue to GitHub: https://github.com/w3c/html/issues/222

If this resolution is not satisfactory, please copy the relevant bug details/proposal into a new issue at the W3C HTML5 Issue tracker: https://github.com/w3c/html/issues/new where it will be re-triaged. Thanks!