This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 11452 - 4.10.22.5.5. should describe how to send the filename when the field is a file field. It should send the filename as "filename" parameter and its character encoding must be _charset_.
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: LC1 HTML5 spec
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL: http://www.whatwg.org/specs/web-apps/...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-12-02 07:27 UTC by contributor
Modified: 2011-08-04 05:13 UTC
CC List: 9 users

Description contributor 2010-12-02 07:27:04 UTC
Specification: http://www.whatwg.org/specs/web-apps/current-work/
Section: http://www.whatwg.org/specs/web-apps/current-work/#multipart-form-data

Comment:
4.10.22.5.5. should describe how to send the filename when the field is a file
field. It should send the filename as "filename" parameter and its character
encoding must be _charset_.

Posted from: 218.45.212.2
Comment 1 Anne 2010-12-02 11:19:27 UTC
Ideally this is described in terms of Blob/File objects to make it easier for FormData to describe this stuff.
Comment 2 Ian 'Hixie' Hickson 2011-01-11 00:06:32 UTC
I don't really follow what's being requested here. Why isn't this an RFC2388 issue? What do you mean by "its character encoding must be _charset_"? Also I'm not sure what a Blob or File would mean here. Can you elaborate?
Comment 3 NARUSE, Yui 2011-01-11 01:44:19 UTC
(In reply to comment #2)
> I don't really follow what's being requested here. Why isn't this an RFC2388 issue?

RFC2388 says "The sending application MAY supply a
   file name; if the file name of the sender's operating system is not
   in US-ASCII, the file name might be approximated, or encoded using
   the method of RFC 2231."
But IE, Firefox, Opera, Safari, and Chrome send the filename without any escaping.

>  What do you mean by "its character encoding must be _charset_"?

It means "selected character encoding".
Comment 4 Anne 2011-01-11 09:20:44 UTC
Well RFC 2388 certainly does not reflect reality so it would be nice if HTML5 explicitly defined how it fit in and what it violated.

An alternative (for which I have a preference) would be to take over the registration of multipart/form-data and fix it.
Comment 5 Anne 2011-02-16 09:57:32 UTC
So the problems are:

* Incorrect with respect to encodings. I.e. what octets are to be transmitted.
* Suggests files can be sent together while they should be in separate fields for compatibility with existing server software.

In general, processing on the server for this type is pretty restricted, while the RFC seems to allow for countless ways to encode things. The RFC is also very hard to read for something that comes down to encoding a bit of data.

One way to avoid upsetting people much might be to give a serialization algorithm in the HTML specification, effectively subsetting RFC 2388. (Though the encoding problem pointed out here would presumably be a willful violation...)
Comment 6 Ian 'Hixie' Hickson 2011-03-01 23:28:39 UTC
I'm not really sure what the first bullet point means. Can you elaborate on that? (To the level of detail that would be necessary to file a bug on the RFC, I mean.)

The second one isn't a problem, as previously discussed. The HTML side defines the fields for the purposes of the RFC, and they don't map 1:1 to the form controls, and there's only ever one file per field.

I wouldn't want to rewrite the RFC for such a small number of minor problems.
Comment 7 NARUSE, Yui 2011-03-02 01:32:56 UTC
Current implementations behave as follows:
* get the filename (or file path) of the target file selected in the input form
* the encoding of the filename string depends on the filesystem on which the file exists
* convert the filename string to the "selected character encoding" defined in 4.10.22.5.
* send the string without escaping

Does this behavior count as "be approximated" in RFC2388?
If so, HTML5 should clarify that, because it is not documented yet.
If not, it should be a willful violation.
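
The steps listed in this comment can be sketched roughly as follows. This is only an illustration, not any browser's actual code: the function name content_disposition_unescaped and its parameters are hypothetical, and the sketch assumes the filename is already available as a Unicode string and can be represented in the selected character encoding.

```python
def content_disposition_unescaped(field_name: str, filename: str,
                                  encoding: str = "utf-8") -> bytes:
    """Build a Content-Disposition header line for a file field.

    Hypothetical helper: the filename is converted to the selected
    character encoding and placed between the quotes as raw octets,
    with no RFC 2231 encoding and no escaping, as the comment describes.
    """
    prefix = 'Content-Disposition: form-data; name="{}"; filename="'.format(field_name)
    # Raises UnicodeEncodeError if the name cannot be represented in `encoding`;
    # what browsers do in that case is not covered by this sketch.
    return prefix.encode("ascii") + filename.encode(encoding) + b'"'


if __name__ == "__main__":
    print(content_disposition_unescaped("f", "five stars ★★★★★.txt"))
```

Note that a name containing a double quote or newline would produce a malformed header under this scheme, which is the gap the later comments examine.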
Comment 8 Anne 2011-03-04 13:19:18 UTC
In reply to comment 6 I think the second one is a problem. Server implementors will do more work than necessary and client implementors might create code that does not work with popular servers. Only in combination with the HTML specification would it become clear what is actually needed which is not very desirable.
Comment 9 Ian 'Hixie' Hickson 2011-05-05 22:25:16 UTC
Re comment 8, this is a dupe of bug 12065 as far as I can tell.

(In reply to comment #7)
> * send the string without escaping

This is definitely not entirely accurate. For example, consider a file whose filename is what is given between the following single quote marks:

'filename newline
colon : quote " percent % five stars ★★★★★ '


Firefox uploads it with 'Content-Disposition: form-data; name="x";' followed by:

 filename="filename newline colon : quote \" percent % five stars ★★★★★ "


WebKit uploads it with 'Content-Disposition: form-data; name="x";' followed by:

 filename="filename newline%0Acolon : quote %22 percent % five stars ★★★★★ "


(Opera couldn't find the file and I couldn't test IE.)


So conclusion: there is some escaping going on ('\"' in Firefox, '%0A' and '%22' in WebKit), and some approximation going on (Firefox drops the newline). But characters that aren't special are just sent in the current encoding (UTF-8 in this case).
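
For illustration, the two behaviours observed in this comment could be approximated as below. The function names are hypothetical and the rules are inferred only from the two strings quoted in this comment (newline and double quote); how either browser treats other characters is not covered here. The Firefox-style function replaces the newline with a space because that reproduces the quoted output.

```python
def escape_firefox_style(filename: str) -> str:
    """Approximate the Firefox output quoted in this comment:
    the newline becomes a space and the double quote is backslash-escaped."""
    return filename.replace("\n", " ").replace('"', '\\"')


def escape_webkit_style(filename: str) -> str:
    """Approximate the WebKit output quoted in this comment:
    newline and double quote are percent-encoded, a literal % is left alone."""
    return filename.replace("\n", "%0A").replace('"', "%22")


if __name__ == "__main__":
    name = 'filename newline\ncolon : quote " percent % five stars ★★★★★ '
    print('filename="' + escape_firefox_style(name) + '"')
    print('filename="' + escape_webkit_style(name) + '"')
```

In both cases the remaining characters, including the stars, would then be sent as octets in the selected character encoding.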
Comment 10 Ian 'Hixie' Hickson 2011-05-05 22:26:02 UTC
(Bugzilla screwed me over there. There should be no newline before the stars on the lines that start filename= in the previous comment.)
Comment 11 Ian 'Hixie' Hickson 2011-05-05 22:31:57 UTC
The RFC2388 prose is pretty vague here (it doesn't give any requirements), but I agree we could be more helpful here. Would it be ok to just add a note saying that "implementations are encouraged to transmit non-ASCII characters in filenames using the selected character encoding rather than approximating the filename or using RFC 2231, as suggested in RFC 2388"?
Comment 12 NARUSE, Yui 2011-05-06 10:03:56 UTC
(In reply to comment #9)
> Re comment 8, this is a dupe of bug 12065 as far as I can tell.

Ah, yes, bug 12065 is also about form file fields and RFC2388.
Anyway, this bug is mainly about the filename parameter of file fields.

> (In reply to comment #7)
> > * send the string without escaping
> 
> This is definitely not entirely accurate. For example, consider a file whose
> filename is what is given between the following single quote marks:

Yes, I meant non-ASCII characters; some ASCII characters, as in your example, are escaped by Firefox and WebKit.
 
> (Opera couldn't find the file and I couldn't test IE.)

I tested IE. On Windows, newlines, colons and quotes are not allowed in filenames, so that case cannot occur.
(Even so, a literal %22 can't be distinguished from an escaped double quote.)

filename="C:\Documents and Settings\naruse\My Documents\filename percent % percent22 %22 five stars ★★★★★.txt"

> The RFC2388 prose is pretty vague here (it doesn't give any requirements), but
> I agree we could be more helpful here. Would it be ok to just add a note saying
> that "implementations are encouraged to transmit non-ASCII characters in
> filenames using the selected character encoding rather than approximating the
> filename or using RFC 2231, as suggested in RFC 2388"?

I'm OK with that conclusion.

Anyway, if a browser that complies with RFC2231 sent the filename above, the result would be

Content-Disposition: form-data; name="f";
 filename*0*=UTF-8'en'filename newline%0Acolon : quote %22 percent %25 five stars
 filename*1*=%E2%98%85%E2%98%85%E2%98%85%E2%98%85%E2%98%85%20

This is completely different from current implementations.
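
As a rough point of comparison, Python's standard library can produce an RFC 2231 extended parameter value. The wrapper rfc2231_filename_param below is hypothetical, and it emits the single-parameter filename*= form rather than the filename*0*/filename*1* continuations shown above; the library also percent-encodes spaces and most punctuation, so its output will not match the hand-written example byte for byte.

```python
from email.utils import encode_rfc2231


def rfc2231_filename_param(filename: str, charset: str = "utf-8",
                           language: str = "en") -> str:
    """Return a filename*= parameter in RFC 2231 extended notation.

    Hypothetical helper around email.utils.encode_rfc2231, which
    percent-encodes the value and prefixes it with charset'language'.
    """
    return "filename*=" + encode_rfc2231(filename, charset, language)


if __name__ == "__main__":
    print(rfc2231_filename_param("five stars ★★★★★ "))
```

Either way, the receiver has to parse the charset'language' prefix and decode percent escapes, which servers expecting a raw filename="..." value would not do.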
Comment 13 Ian 'Hixie' Hickson 2011-06-09 18:17:15 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: see diff given below
Rationale: see above
Comment 14 contributor 2011-06-09 18:18:17 UTC
Checked in as WHATWG revision r6206.
Check-in comment: Add requirements for how to express file names in formdata
http://html5.org/tools/web-apps-tracker?from=6205&to=6206
Comment 15 Michael[tm] Smith 2011-08-04 05:13:38 UTC
mass-move component to LC1