This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 18135 - multipart/form-data: field name encoding is not specified; browsers do incompatible things
Summary: multipart/form-data: field name encoding is not specified; browsers do incomp...
Status: RESOLVED MOVED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: HTML5 spec
Version: unspecified
Hardware: Other other
Importance: P1 normal
Target Milestone: ---
Assignee: Robin Berjon
QA Contact: HTML WG Bugzilla archive list
URL: http://www.whatwg.org/specs/web-apps/...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-07-18 17:24 UTC by contributor
Modified: 2016-04-19 22:39 UTC
CC List: 6 users

See Also:


Description contributor 2012-07-18 17:24:10 UTC
This was cloned from bug 16909 as part of operation convergence.
Originally filed: 2012-05-02 20:09:00 +0000

================================================================================
 #0   contributor@whatwg.org                          2012-05-02 20:09:21 +0000 
--------------------------------------------------------------------------------
Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html
Multipage: http://www.whatwg.org/C#multipart-form-data
Complete: http://www.whatwg.org/c#multipart-form-data

Comment:
The specification is unclear about how field names should be encoded. In
particular, what should be done if they include special characters (e.g.
quotes, newlines, Unicode)? I started a mailing list thread on this
issue...

Posted from: 74.66.64.60
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5
================================================================================
 #1   Evan Jones                                      2012-05-02 20:10:52 +0000 
--------------------------------------------------------------------------------
The specification is unclear about how field names should be encoded. In particular, what should be done if they include special characters (e.g. quotes, newlines, Unicode)?
================================================================================
 #2   Evan Jones                                      2012-05-02 20:41:21 +0000 
--------------------------------------------------------------------------------
Argh; whoops. Sorry for the Bugzilla spam. I didn't realize that the "comment" thingy would file a Bugzilla bug.

HTML5 states: "Encode the (now mutated) form data set using the rules described by RFC 2388". However, it then modifies the rules:

"The parts of the generated multipart/form-data resource that correspond to non-file fields must not have a Content-Type header specified. Their names and values must be encoded using the character encoding selected above (field names in particular do not get converted to a 7-bit safe encoding as suggested in RFC 2388)."

http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#multipart-form-data

So the problem is: what are we supposed to do with field names? In particular, what if they contain "special" MIME characters (e.g. \r\n newlines, backslashes, double quotes, or semicolons)? Different browsers do different things, so server code currently has to detect the browser in order to decode names correctly.
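A minimal Python sketch of the failure mode, assuming a serializer that does no more than the quoted spec text requires: it places the field name, unconverted and unescaped, inside a quoted `name` parameter. The helper `naive_disposition` is hypothetical and is not taken from any browser.

```python
def naive_disposition(name: str) -> str:
    # Hypothetical serializer: the field name is interpolated verbatim,
    # with no escaping of quotes, backslashes, CR or LF.
    return f'Content-Disposition: form-data; name="{name}"'


# A name containing a double quote and CRLF ends the quoted-string early
# and injects a second header line into the part.
header = naive_disposition('a"\r\nX-Injected: 1')
for line in header.split("\r\n"):
    print(line)
```

This prints `Content-Disposition: form-data; name="a"` followed by a separate `X-Injected: 1"` line, so the receiving server sees a different field name plus an extra header.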


Example: <input name='bàz%22\"\' value="foo">

Firefox 13b: Content-Disposition: form-data; name="bàz%22\\"\"
Webkit nightly: Content-Disposition: form-data; name="bàz%22\%22\"
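The two headers above can be reproduced with a small Python sketch. Both functions are hypothetical re-implementations inferred only from these two observed outputs, not code from Firefox or WebKit. Note that the attribute value is bàz%22\"\ because HTML does not treat the backslash as an escape character.

```python
# Hypothetical re-implementations inferred from the two headers above;
# they are not taken from the Firefox or WebKit source code.

def firefox13_style(name: str) -> str:
    # Backslash-escapes double quotes, but leaves backslashes untouched.
    return 'name="' + name.replace('"', '\\"') + '"'


def webkit_style(name: str) -> str:
    # Percent-escapes double quotes, but leaves "%" itself untouched.
    return 'name="' + name.replace('"', "%22") + '"'


# Value of the HTML attribute name='bàz%22\"\' (backslashes kept literally).
example = 'bàz%22\\"\\'
print(firefox13_style(example))  # name="bàz%22\\"\"
print(webkit_style(example))     # name="bàz%22\%22\"
```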

Firefox backslash-escapes double quotes, but it fails to escape backslashes. This means its header fails to parse according to the MIME specification: it decodes as bàz%22\ with a stray trailing \" left over.
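A minimal strict parser for an RFC 822/RFC 2045-style quoted-string, where a backslash makes the following character literal, shows what happens to the Firefox header. `parse_quoted_string` is a hypothetical helper written for illustration, not a particular server library.

```python
def parse_quoted_string(s: str, start: int) -> tuple[str, int]:
    """Parse a quoted-string that begins at s[start] == '"'.

    Returns the decoded text and the index just past the closing quote.
    A backslash makes the next character literal (a "quoted-pair").
    """
    if s[start] != '"':
        raise ValueError("expected opening double quote")
    out = []
    i = start + 1
    while i < len(s):
        ch = s[i]
        if ch == "\\" and i + 1 < len(s):
            out.append(s[i + 1])  # quoted-pair: take the next char literally
            i += 2
        elif ch == '"':
            return "".join(out), i + 1  # closing quote found
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated quoted-string")


# The Firefox 13b header value from the example above.
value = 'form-data; name="bàz%22\\\\"\\"'
name, end = parse_quoted_string(value, value.index('"'))
print(name)        # bàz%22\
print(value[end:])  # \"
```

The escaped backslash pair is consumed as a single literal backslash, the unescaped quote after it closes the string, and the final `\"` is left dangling outside the parameter.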

Webkit %-escapes the double quotes, but does not %-escape the percent sign itself. Thus the header above could come from either name='bàz"\"\' or the desired name. Webkit has a bug open on this issue, asking for specification guidance: https://bugs.webkit.org/show_bug.cgi?id=62107
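The ambiguity can be demonstrated directly. This Python sketch reuses the escaping inferred from the Webkit header in the example; `webkit_escape` and `naive_decode` are hypothetical names, not WebKit functions.

```python
def webkit_escape(name: str) -> str:
    # Inferred from the observed header: '"' becomes %22, '%' is left alone.
    return name.replace('"', "%22")


def naive_decode(escaped: str) -> str:
    # The only way a server can undo that escaping.
    return escaped.replace("%22", '"')


intended = 'bàz%22\\"\\'  # the name from the example: bàz%22\"\
other = 'bàz"\\"\\'       # a different name: bàz"\"\

print(webkit_escape(intended))  # bàz%22\%22\
print(webkit_escape(other))     # bàz%22\%22\
print(naive_decode(webkit_escape(intended)) == intended)  # False
```

Two distinct field names produce byte-identical headers, so no decoder can recover the intended name in every case.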


HTML5 should specify exactly how field names are encoded. Some potential solutions:

1) Bless Firefox's backslash quoting rules (they are very weird but I think they are unambiguous?). This means Webkit POSTs will be decoded to the wrong field names, and POSTs to older servers may parse incorrectly if the name includes a \ (but that must already happen for Firefox?).

2) Bless Webkit's percent escaping rules (ideally also escaping %). Servers that strictly parse this format will fail to parse Firefox POSTs if the name includes a \, and will 

3) Adopt RFC 6266's approach of having two name parameters when there are special characters: one with the existing escaping, and one with an unambiguously escaped version. Ideally, existing servers will parse the first name and not break unless the field name contains a special character. As servers are upgraded, they will be able to unambiguously parse the new header. See: http://tools.ietf.org/html/rfc6266
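Options 2 and 3 can be sketched in Python. This is only an illustration under stated assumptions: escaping the backslash in option 2, and reusing that escaping as the fallback parameter in option 3, are choices made for the sketch rather than part of the proposals above, and all helper names are hypothetical. The starred parameter uses the RFC 5987 ext-value syntax (charset, empty language tag, percent-encoded UTF-8) that RFC 6266 relies on for `filename*`.

```python
import re
from urllib.parse import quote

# Option 2, sketched: percent-escape "%" itself along with '"', CR and LF.
# Escaping "\" as well is an extra assumption made here so that strict
# quoted-string parsers never see a backslash inside the value.
_ESCAPES = {"%": "%25", '"': "%22", "\\": "%5C", "\r": "%0D", "\n": "%0A"}


def escape_name(name: str) -> str:
    return "".join(_ESCAPES.get(ch, ch) for ch in name)


def unescape_name(escaped: str) -> str:
    # Every "%" in escaped output was written by escape_name, so each %XX
    # maps back to exactly one original character.
    return re.sub(r"%([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), escaped)


# Option 3, sketched: an RFC 6266-style pair of parameters. The starred copy
# is an RFC 5987 ext-value; using escape_name for the plain copy is an
# assumption standing in for "the existing escaping".
_ATTR_CHAR_PUNCT = "!#$&+^`|"  # attr-char punctuation quote() would otherwise encode


def disposition_with_name_star(name: str) -> str:
    fallback = escape_name(name)
    ext_value = "UTF-8''" + quote(name, safe=_ATTR_CHAR_PUNCT, encoding="utf-8")
    return f'form-data; name="{fallback}"; name*={ext_value}'


example = 'bàz%22\\"\\'  # the field name from the example above
assert unescape_name(escape_name(example)) == example
print(escape_name(example))  # bàz%2522%5C%22%5C
print(disposition_with_name_star(example))
```

Because `%` is always escaped, `unescape_name` can reverse the escaping without knowing which browser sent the request, which is exactly the property the Webkit behaviour lacks.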


Aside: The *same* issue happens for uploaded file names. I started a mailing list thread to attempt to collect more information about this: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2012-May/035610.html
================================================================================
Comment 1 Michael[tm] Smith 2015-06-16 10:16:07 UTC
Raising the priority of this bug so that we actively seek more feedback on it from implementers and web developers.
Comment 2 Travis Leithead [MSFT] 2016-04-19 22:39:40 UTC
HTML5.1 Bugzilla Bug Triage: Moved

Moved the summary and tracking of followup on this issue to GitHub:
https://github.com/w3c/html/issues/222

If this resolution is not satisfactory, please copy the relevant bug details/proposal into a new issue at the W3C HTML5 Issue tracker: https://github.com/w3c/html/issues/new where it will be re-triaged. Thanks!