<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>16909</bug_id>
          
          <creation_ts>2012-05-02 20:09:21 +0000</creation_ts>
          <short_desc>multipart/form-data: field name encoding is not specified; browsers do incompatible things</short_desc>
          <delta_ts>2019-03-29 21:49:01 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WHATWG</product>
          <component>HTML</component>
          <version>unspecified</version>
          <rep_platform>Other</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>MOVED</resolution>
          
          
          <bug_file_loc>http://www.whatwg.org/specs/web-apps/current-work/#multipart-form-data</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P3</priority>
          <bug_severity>major</bug_severity>
          <target_milestone>Unsorted</target_milestone>
          
          <blocked>19879</blocked>
          <everconfirmed>1</everconfirmed>
          <reporter>contributor</reporter>
          <assigned_to name="Ian &apos;Hixie&apos; Hickson">ian</assigned_to>
          <cc>anforowicz</cc>
          <cc>annevk</cc>
          <cc>d</cc>
          <cc>ej</cc>
          <cc>ian</cc>
          <cc>julian.reschke</cc>
          <cc>lmm</cc>
          <cc>masinter</cc>
          <cc>mike</cc>
          <cc>slave.loren</cc>
          <cc>w3bugs</cc>
          
          <qa_contact>contributor</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>67257</commentid>
    <comment_count>0</comment_count>
    <who name="">contributor</who>
    <bug_when>2012-05-02 20:09:21 +0000</bug_when>
    <thetext>Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html
Multipage: http://www.whatwg.org/C#multipart-form-data
Complete: http://www.whatwg.org/c#multipart-form-data

Comment:
The specification is unclear about how field names should be encoded. In
particular, what should be done if they include special characters (e.g.
quotes, newlines, Unicode, etc.)? I started a mailing list thread on this
issue...

</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>67259</commentid>
    <comment_count>1</comment_count>
    <who name="Evan Jones">ej</who>
    <bug_when>2012-05-02 20:10:52 +0000</bug_when>
    <thetext>The specification is unclear about how field names should be encoded. In particular, what should be done if they include special characters? (eg. quotes, new lines, unicode, etc?).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>67261</commentid>
    <comment_count>2</comment_count>
    <who name="Evan Jones">ej</who>
    <bug_when>2012-05-02 20:41:21 +0000</bug_when>
    <thetext>Argh; whoops. Sorry for the bugzilla spam. I didn&apos;t realize that the &quot;comment&quot; thingy just filed a bugzilla bug.

HTML5 states: &quot;Encode the (now mutated) form data set using the rules described by RFC 2388&quot;. However, it then modifies the rules:

&quot;The parts of the generated multipart/form-data resource that correspond to non-file fields must not have a Content-Type header specified. Their names and values must be encoded using the character encoding selected above (field names in particular do not get converted to a 7-bit safe encoding as suggested in RFC 2388).&quot;

http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html#multipart-form-data

So the problem is: what are we supposed to do with field names? In particular, what if they contain &quot;special&quot; MIME characters (e.g. \r\n newlines, backslashes, double quotes, or semicolons)? Different browsers do different things, meaning that server code currently has to detect the browser to do the right thing.


Example: &lt;input name=&apos;bàz%22\&quot;\&apos; value=&quot;foo&quot;&gt;

Firefox 13b: Content-Disposition: form-data; name=&quot;bàz%22\\&quot;\&quot;
Webkit nightly: Content-Disposition: form-data; name=&quot;bàz%22\%22\&quot;

Firefox backslash-escapes double quotes but fails to escape backslashes. This means its header fails to parse according to the MIME specification (it sort of decodes as bàz%22\ with an extra trailing \&quot;).

Webkit %-escapes the double quotes, but does not %-escape the percent sign. Thus the header above could have come from either name=&apos;bàz&quot;\&quot;\&apos; or the intended name. Webkit has a bug open on this issue, asking for specification guidance: https://bugs.webkit.org/show_bug.cgi?id=62107
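
To make the ambiguity concrete, here is a rough Python sketch of the two behaviours as described above. The function names are made up for illustration and this is not actual browser code; it only mimics the quoting seen in the two headers.

```python
# Sketch only: mimics the escaping described in this comment.

def firefox_style(name):
    # Backslash-escape double quotes; backslashes themselves are left untouched.
    return 'name="' + name.replace('"', '\\"') + '"'


def webkit_style(name):
    # Percent-escape double quotes; a literal percent sign is left untouched.
    return 'name="' + name.replace('"', '%22') + '"'


# The field name from the example above: b à z % 2 2 \ " \
field = 'bàz%22\\"\\'

print(firefox_style(field))
print(webkit_style(field))

# A different field name, with a real double quote where the example
# has a literal %22 ...
other = 'bàz"\\"\\'
# ... yields the same WebKit-style header, so a server cannot tell them apart.
print(webkit_style(field) == webkit_style(other))
```

Escaping the percent sign as well (as suggested in option 2 below) would make the second mapping reversible.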


HTML5 should specify exactly how field names are encoded. Some potential solutions:

1) Bless Firefox&apos;s backslash quoting rules (they are very weird but I think they are unambiguous?). This means Webkit POSTs will be decoded to the wrong field names, and POSTs to older servers may parse incorrectly if the name includes a \ (but that must already happen for Firefox?).

2) Bless Webkit&apos;s percent escaping rules (ideally also escaping %). Servers that strictly parse this format will fail to parse Firefox POSTs if the name includes a \, and will 

3) Adopt RFC 6266&apos;s approach of having two name parameters when there are special characters: one with the existing escaping, and one with an unambiguously escaped version. Ideally, existing servers will parse the first name and not break unless the form value contains a special character. As servers are upgraded, they will be able to unambiguously parse the new header. See: http://tools.ietf.org/html/rfc6266
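
A rough Python sketch of what option 3 could look like, assuming the second parameter is called name* and uses the RFC 5987 style of encoding that RFC 6266 uses for filename*. This is purely hypothetical: the helper names are invented here, and no browser is known to send such a header for form fields.

```python
from urllib.parse import quote, unquote


def ext_value(name):
    # RFC 5987 style: charset, empty language tag, then percent-encoded UTF-8.
    # safe="" makes quote() escape everything except letters, digits and "_.-~".
    return "UTF-8''" + quote(name, safe="")


def dual_name_params(name):
    # Hypothetical header value: a legacy name (WebKit-like %22 escaping)
    # followed by an unambiguous name* parameter.
    legacy = name.replace('"', "%22")
    return 'form-data; name="' + legacy + '"; name*=' + ext_value(name)


def decode_ext_value(value):
    # Split off the charset and the (empty) language tag, then percent-decode.
    charset, _language, encoded = value.split("'", 2)
    return unquote(encoded, encoding=charset)


field = 'bàz%22\\"\\'
print(dual_name_params(field))
print(decode_ext_value(ext_value(field)) == field)
```

An upgraded server would prefer name* when present and fall back to name otherwise.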


Aside: The *same* issue happens for uploaded file names. I started a mailing list thread to attempt to collect more information about this: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2012-May/035610.html</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>70738</commentid>
    <comment_count>3</comment_count>
    <who name="">contributor</who>
    <bug_when>2012-07-18 17:24:14 +0000</bug_when>
    <thetext>This bug was cloned to create bug 18135 as part of operation convergence.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>79235</commentid>
    <comment_count>4</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2012-12-01 21:59:07 +0000</bug_when>
    <thetext>Larry, any chance RFC 2388 will get updated to resolve this issue?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>83080</commentid>
    <comment_count>5</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2013-02-13 00:41:17 +0000</bug_when>
    <thetext>Dropped a mail to Larry, we&apos;ll see what he says.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>85074</commentid>
    <comment_count>6</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2013-03-27 23:47:51 +0000</bug_when>
    <thetext>http://www.ietf.org/mail-archive/web/apps-discuss/current/msg08908.html

I&apos;m marking this with the same milestone as other form-related stuff, but I doubt I&apos;ll actually do this in the HTML spec. Any volunteers want to write this up as a new spec? See the e-mail above if you want to do this in the IETF space, or contact me on IRC if you want to do it in the WHATWG space, I&apos;m sure either way you&apos;ll find people eager to help you.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>85076</commentid>
    <comment_count>7</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2013-03-27 23:53:45 +0000</bug_when>
    <thetext>*** Bug 19879 has been marked as a duplicate of this bug. ***</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>90786</commentid>
    <comment_count>8</comment_count>
    <who name="Larry Masinter">lmm</who>
    <bug_when>2013-07-16 07:04:58 +0000</bug_when>
    <thetext>RFC 2388 was clear:
   Field names originally in non-ASCII character sets may be encoded
   within the value of the &quot;name&quot; parameter using the standard method
   described in RFC 2047.
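
As a rough illustration of that method (a sketch, not what any browser is known to send), an RFC 2047 encoded-word for a UTF-8 field name could be built like this in Python. The helper name is invented for the example, and the 75-character limit on encoded-words is ignored.

```python
import base64
from email.header import decode_header


def rfc2047_b(name):
    # "B" encoding: base64 of the UTF-8 bytes, wrapped as =?charset?B?...?=
    payload = base64.b64encode(name.encode("utf-8")).decode("ascii")
    return "=?UTF-8?B?" + payload + "?="


encoded = rfc2047_b("bàz")
print(encoded)

# Round trip through the standard library decoder.
raw, charset = decode_header(encoded)[0]
print(raw.decode(charset))
```

The result is pure ASCII, so it survives 7-bit transports, but a server has to know to decode it.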

For reasons I don&apos;t understand, browsers did different, incompatible
things. 

I think the main advice is: 

* those creating HTML forms 
   SHOULD use ASCII field names, since deployed HTML processors vary,
   and field names shouldn&apos;t be visible to the user anyway.

* Those developing server infrastructure to read multipart/form-data uploads
   SHOULD be aware of the varying behavior of the browsers in translating
   non-ASCII field names, and look for any of the variants (if they&apos;re 
   expecting non-ASCII field names). 

* Those developing browsers should migrate toward a standard 
  encoding, but the server infrastructure will still have to do
  fuzzy match for a long while.

What should the browsers migrate to?

 http://www.rfc-editor.org/rfc/rfc5987.txt 
seems like a more recent proposal and possibly implemented in HTTP anyway.

Sites that use non-ASCII field names and want to work with multiple
browsers already have to do fuzzy matching.

The problem is that the fuzzy matchers already deployed might not
recognize any *NEW* encodings.

So I suppose having a name* value would be necessary.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>93265</commentid>
    <comment_count>9</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2013-09-11 18:37:32 +0000</bug_when>
    <thetext>(In reply to Larry Masinter from comment #8)
&gt; RFC 2388 was clear:
&gt;    Field names originally in non-ASCII character sets may be encoded
&gt;    within the value of the &quot;name&quot; parameter using the standard method
&gt;    described in RFC 2047.

&quot;may&quot; is what makes this not clear. It means that the above is one option, but what are the other options? What else can they do?

Specs should basically never say MAY or SHOULD when it comes to describing what they put on the wire.


&gt; * those creating HTML forms 
&gt;    SHOULD use ASCII field names, since deployed HTML processors vary,
&gt;    and field names shouldn&apos;t be visible to the user anyway.

The goal on the HTML side is to have HTML processors not vary.


&gt; * Those developing server infrastructure to read multipart/form-data uploads
&gt;    SHOULD be aware of the varying behavior of the browsers in translating
&gt;    non-ASCII field names, and look for any of the variants (if they&apos;re 
&gt;    expecting non-ASCII field names). 

If the servers have to look for variants, we should define those variants.


&gt; * Those developing browsers should migrate toward a standard 
&gt;   encoding, but the server infrastructure will still have to do
&gt;   fuzzy match for a long while.
&gt;
&gt; What should the browsers migrate to?

What do they do now? Presumably what they do now is the right answer.


&gt; So I suppose having a name* value would be necessary.

I don&apos;t think adding new features here is viable. We should specify what most browsers do, and just stick with that. IMHO.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>93400</commentid>
    <comment_count>10</comment_count>
    <who name="Larry Masinter">lmm</who>
    <bug_when>2013-09-15 22:06:56 +0000</bug_when>
    <thetext>In reply to comment 9:

&gt; &quot;may&quot; is what makes this not clear

draft-masinter-multipart-form-data-00 (current revision
as of this note) doesn&apos;t use may or MAY

&gt; The goal on the HTML side is to have HTML processors not vary.

=&gt;  https://github.com/masinter/multipart-form-data/issues/8


That&apos;s a great goal for HTML, and this definition of
multipart/form-data shouldn&apos;t interfere with that goal.

&gt; If the servers have to look for variants,
&gt; we should define those variants.

&gt; &gt; What should the browsers migrate to?
&gt;
&gt; What do they do now? Presumably what they do now is the right answer.


draft-masinter-multipart-form-data fixes</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>93403</commentid>
    <comment_count>11</comment_count>
    <who name="Larry Masinter">lmm</who>
    <bug_when>2013-09-15 23:00:43 +0000</bug_when>
    <thetext>See https://github.com/masinter/multipart-form-data/ 

It contains a draft of RFC2388bis, plus a proposed HTML spec (excerpted content).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>125787</commentid>
    <comment_count>12</comment_count>
    <who name="Domenic Denicola">d</who>
    <bug_when>2016-04-08 23:54:08 +0000</bug_when>
    <thetext>I believe this might have been fixed in https://github.com/whatwg/html/pull/710. Larry, could you confirm that https://html.spec.whatwg.org/#multipart/form-data-encoding-algorithm correctly delegates to RFC 7578, and that RFC 7578 handles the cases discussed here?

From my reading I am not so sure... the HTML spec now says &quot;Encode the (now mutated) form data set using the rules described by RFC 7578&quot;, but I can&apos;t find an algorithm in RFC 7578 that takes as input a form data set and gives as output a byte stream.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>126252</commentid>
    <comment_count>13</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2016-04-28 11:25:37 +0000</bug_when>
    <thetext>How to encode names containing, e.g., quotes is still not defined. Apparently Chrome/WebKit uses percent-encoding to some extent and Firefox did not. See https://bugzilla.mozilla.org/show_bug.cgi?id=136676. I suppose at some point we&apos;ll need to define this format completely someplace.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>128834</commentid>
    <comment_count>14</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2017-08-11 08:18:03 +0000</bug_when>
    <thetext>Another problem: do filenames get normalized? See https://bugzilla.mozilla.org/show_bug.cgi?id=695995. (Though I suspect this may also affect application/x-www-form-urlencoded.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>128854</commentid>
    <comment_count>15</comment_count>
    <who name="">anforowicz</who>
    <bug_when>2017-08-23 17:18:03 +0000</bug_when>
    <thetext>Another problem: which characters are allowed in a MIME multipart boundary?

For example, Chromium and WebKit restrict the allowed characters to a subset of what is allowed by RFC 2046. The restriction helps achieve compatibility with some servers; in particular, see the analysis in https://bugs.webkit.org/show_bug.cgi?id=13352#c29, which says that some servers cannot process boundaries that include the &apos;/&apos; character.
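
A minimal Python sketch of a boundary generator that follows the same restriction, drawing the random part only from ASCII letters and digits. The prefix, the length and the function name are assumptions made for this example, not what Chromium or WebKit actually use.

```python
import secrets
import string

# Same character set as the Chromium constant quoted below: digits, then
# lowercase, then uppercase ASCII letters.
BOUNDARY_CHARS = string.digits + string.ascii_lowercase + string.ascii_uppercase


def make_boundary(prefix="----FormBoundary", length=16):
    # The hyphen in the prefix is legal per RFC 2046 and is not among the
    # characters reported to break servers. RFC 2046 caps boundaries at
    # 70 characters, so keep the prefix and length short.
    suffix = "".join(secrets.choice(BOUNDARY_CHARS) for _ in range(length))
    return prefix + suffix


boundary = make_boundary()
print("Content-Type: multipart/form-data; boundary=" + boundary)
```

Because no character outside letters, digits and the hyphen appears, the boundary parameter never needs quoting in the Content-Type header.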

References:

1) https://crbug.com/575779#c10 which tracks the following TODO in the Chromium code https://chromium.googlesource.com/chromium/src/+/79420989569478d5b9a05e35a841a10d9d836cc4/net/base/mime_util.cc#592 :

    // Characters to be used for mime multipart boundary.
    //
    // TODO(rsleevi): crbug.com/575779: Follow the spec or fix the spec.
    // The RFC 2046 spec says the alphanumeric characters plus the
    // following characters are legal for boundaries:  &apos;()+_,-./:=?
    // However the following characters, though legal, cause some sites
    // to fail: (),./:=+
    const char kMimeBoundaryCharacters[] =
        &quot;0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ&quot;;

2) Equivalent code and comment in WebKit: https://github.com/WebKit/webkit/blob/d071f76012298b17327ca14981ca5ffdbd1621df/Source/WebCore/platform/network/FormDataBuilder.cpp#L79</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>129705</commentid>
    <comment_count>16</comment_count>
    <who name="Domenic Denicola">d</who>
    <bug_when>2019-03-29 21:49:01 +0000</bug_when>
    <thetext>https://github.com/whatwg/html/issues/3223 appears to be the best current tracking issue for this.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>