Bug 16303 - meaning of "all" charset parameters of content-type header
Status: RESOLVED WORKSFORME
Product: WebAppsWG
Classification: Unclassified
Component: XHR
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assigned To: Anne
QA Contact: public-webapps-bugzilla
http://dvcs.w3.org/hg/xhr/raw-file/8d...
Reported: 2012-03-10 07:06 UTC by Glenn Adams
Modified: 2012-11-27 14:21 UTC (History)
Description Glenn Adams 2012-03-10 07:06:27 UTC
Section 4.7.6 step 3 states:

"If a Content-Type header is in author request headers and its value is a valid MIME type that has a charset parameter whose value is not a case-insensitive match for encoding, and encoding is not null, set all the charset parameters of that Content-Type header to encoding."

Questions: What does *all* mean in "set all the charset parameters of that Content-Type"? Could you give a concrete example of a case with more than one charset parameter in the single media type value of Content-Type?
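
For readers following along, the quoted step could be sketched roughly as below. This is only an illustration under stated assumptions, not the spec's algorithm or any UA's code: the function name overrideCharsets is hypothetical, the header value is split naively on ";" (quoted strings and the "valid MIME type" check are ignored), and the sample header values are made up.

```typescript
// Hypothetical helper, not taken from the XHR spec or any browser.
// Assumption: parameters are separated by ";" and values are unquoted.
function overrideCharsets(contentType: string, encoding: string | null): string {
  if (encoding === null) return contentType;
  const wanted: string = encoding;
  const parts = contentType.split(";");
  // parts[0] is the media type itself; only later segments are parameters.
  const isCharset = (p: string, i: number): boolean =>
    i > 0 && /^\s*charset\s*=/i.test(p);
  const valueOf = (p: string): string => p.slice(p.indexOf("=") + 1).trim();
  // Rewrite only if at least one charset value is not a case-insensitive match.
  const mismatch = parts.some(
    (p, i) => isCharset(p, i) && valueOf(p).toLowerCase() !== wanted.toLowerCase()
  );
  if (!mismatch) return contentType;
  // "All" is taken literally: every charset parameter is replaced.
  return parts.map((p, i) => (isCharset(p, i) ? `charset=${wanted}` : p)).join(";");
}

// Made-up input with two charset parameters.
const twoCharsets = overrideCharsets("text/plain;charset=latin1;charset=utf-16", "UTF-8");
// twoCharsets is "text/plain;charset=UTF-8;charset=UTF-8"
```

In this sketch a value such as "text/plain;charset=Utf-8" is returned unchanged when the encoding is "UTF-8", because the comparison ignores case.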
Comment 1 Julian Reschke 2012-03-10 09:18:49 UTC
IMHO this is a good example of over-specification. Having multiple charset parameters is invalid, specifying the wrong charset is a bug, and relying on case in charset names is a bug. So this specifies behavior for a case of a double client bug + a server bug at the same time.

As far as I can tell, no UA except FF has tried to implement this yet; and the implementation in FF has required lots of hacks and layering violations (essentially, the header field parser needs to preserve all kinds of state that otherwise wouldn't be needed). Furthermore, it also doesn't do this for "all" charset parameters.

My suggestion would be to drop that silly requirement, and just clarify that if you specify the charset although XHR will override it, you're on your own.
Comment 2 Anne 2012-03-26 17:22:23 UTC
"text/html;charset=utf-6;charset=utf-9" would be an example. The text should be taken literally, like all text.
Comment 3 Julian Reschke 2012-04-12 07:13:26 UTC
I believe this should be left open until you have evidence of at least two implementations doing what the spec asks for.
Comment 4 Anne 2012-04-12 07:16:27 UTC
Oh please. We're not going to open bugs for everything where implementations currently mismatch.
Comment 5 Julian Reschke 2012-04-12 07:28:22 UTC
This requirement has been in the spec for years. Last time I checked, only Firefox attempted to implement it, but not as specified.

I think the logical next step is to actually try to *remove* it from Firefox, and then to simplify the spec.
Comment 6 Anne 2012-04-12 07:54:22 UTC
What do you mean by "it" and what would removing "it" mean? I don't really care that much what we do here, but I disagree that this is over-specification. Multiple charset parameters is a legitimate situation that can come up and implementors need to know what to do.
Comment 7 Julian Reschke 2012-04-12 08:44:59 UTC
(In reply to comment #6)
> What do you mean by "it" and what would removing "it" mean? I don't really care
> that much what we do here, but I disagree that this is over-specification.
> Multiple charset parameters is a legitimate situation that can come up and
> implementors need to know what to do.

things the spec says (last time I checked):

a) rewrite Content-Type request header field, because XHR *will* use UTF-8

b) if original charset matched UTF-8, preserve its exact representation (so don't rewrite "Utf-8" to "UTF-8")

c) in addition, do that for all additional charset params

So this is an edge case of an edge case of an edge case.

Optimally, we can get rid of all of this; just declare that if the sender doesn't specify UTF-8, it's his problem (resulting in an inconsistent request).

b) seems to be a workaround for one broken server seen in the past. Maybe it was fixed?

c) over-specifies b) for broken input. Nobody implements it. Why do you care about this edge case in the first place?
Comment 8 Anne 2012-04-12 08:48:15 UTC
The spec does not do b). It seemed best to replace all charset parameters rather than just the first or last. We could also define either first or last, but I'm not sure how that is better.
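
To make the three options mentioned here concrete (replace all, only the first, or only the last charset parameter), a rough sketch follows. It is an illustration only: replaceCharset and CharsetStrategy are hypothetical names, and the value is split naively on ";" with no handling of quoted strings.

```typescript
type CharsetStrategy = "all" | "first" | "last";

// Hypothetical sketch: replace charset parameters according to a strategy.
function replaceCharset(
  contentType: string,
  encoding: string,
  strategy: CharsetStrategy
): string {
  const parts = contentType.split(";");
  // Indexes of the segments that look like a charset parameter.
  const hits: number[] = [];
  parts.forEach((p, i) => {
    if (i > 0 && /^\s*charset\s*=/i.test(p)) hits.push(i);
  });
  const targets =
    strategy === "all" ? hits : strategy === "first" ? hits.slice(0, 1) : hits.slice(-1);
  return parts
    .map((p, i) => (targets.indexOf(i) !== -1 ? `charset=${encoding}` : p))
    .join(";");
}

const header = "text/html;charset=utf-6;charset=utf-9";
const replacedAll = replaceCharset(header, "UTF-8", "all");     // both replaced
const replacedFirst = replaceCharset(header, "UTF-8", "first"); // utf-9 remains
const replacedLast = replaceCharset(header, "UTF-8", "last");   // utf-6 remains
```

With "first" or "last", the resulting header still carries a charset value that does not match the encoding actually used, which is presumably why replacing all of them seemed best.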
Comment 9 Julian Reschke 2012-04-12 08:55:28 UTC
(In reply to comment #8)
> The spec does not do b). It seemed best to replace all charset parameters
> rather than just the first or last. We could also define either first or last,
> but I'm not sure how that is better.

Well, but UAs do. I haven't seen a UA that does what you want for multiple charset params. It's a garbage parameter.
Comment 10 Anne 2012-04-12 08:58:23 UTC
I don't think all UAs do b) and the specification has to deal with input garbage as we cannot control it. We have to do something.
Comment 11 Julian Reschke 2012-04-12 09:08:05 UTC
(In reply to comment #10)
> I don't think all UAs do b) and the specification has to deal with input
> garbage as we cannot control it. We have to do something.

No, you don't "have to do something". It's your choice.

Seems this is going nowhere without test cases and comparison of what UAs do, which I'll try to get done as soon as time permits.
Comment 12 Anne 2012-04-12 09:10:33 UTC
I'm not going to leave edge cases undefined.
Comment 13 Julian Reschke 2012-04-12 09:19:56 UTC
(In reply to comment #12)
> I'm not going to leave edge cases undefined.

The problem here is that you assume that UAs use a full-blown Content-Type parser here (needed to extract proper type and charset information in the first place), *and* that that parser preserves all the information you want (about duplicated parameters).

This is not the case right now, as far as I can tell.

That's why I'm calling this "overspecification".

If you absolutely *want* to handle this case, a much much simpler approach is to treat it as error and throw an exception.
Comment 14 Anne 2012-04-12 09:24:01 UTC
I'm not assuming that. I'm just saying I don't want to leave edge cases undefined. We could also make this about the first or last charset parameter, as I've indicated.

Throwing an exception instead (where? for what method?) is not at all a simpler solution nor a solution that is going to work.
Comment 15 Julian Reschke 2012-04-12 09:35:04 UTC
(In reply to comment #14)
> I'm not assuming that. I'm just saying I don't want to leave edge cases
> undefined. We could also make this about the first or last charset parameter,
> as I've indicated.

"If a Content-Type header is in author request headers and its value is a valid MIME type that has a charset parameter whose value is not a case-insensitive match for encoding, and encoding is not null, set all the charset parameters of that Content-Type header to encoding."

To make this decision, a UA will have to run the field value through a parser, otherwise it will not know about individual parameters.
 
> Throwing an exception instead (where? for what method?) is not at all a simpler
> solution nor a solution that is going to work.

For setRequestHeader() if possible, otherwise for send().

It is simpler as it doesn't require the UA to have a parser for broken field values.
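
As an illustration of the throwing alternative, a check along these lines could run before the request is sent. This is a sketch under assumptions: rejectDuplicateCharset is a hypothetical name, the plain Error type is a placeholder rather than anything the spec defines, and the value is still split on ";" to count parameters.

```typescript
// Hypothetical check; the error type is an assumption, not taken from the spec.
function rejectDuplicateCharset(contentType: string): void {
  const count = contentType
    .split(";")
    .slice(1) // skip the media type itself
    .filter((p) => /^\s*charset\s*=/i.test(p)).length;
  if (count > 1) {
    throw new Error(`Content-Type has ${count} charset parameters`);
  }
}

let threw = false;
try {
  rejectDuplicateCharset("text/html;charset=utf-6;charset=utf-9");
} catch (e) {
  threw = true; // the duplicated parameter is treated as an error
}
```

Note that even this variant has to look inside the header value to count the parameters.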

Also, when you claim "is not going to work" it would be awesome if you could explain why.
Comment 16 Anne 2012-04-12 09:38:51 UTC
Because setRequestHeader() would have to get special purpose parsers. That's not how setRequestHeader() works.
Comment 17 Julian Reschke 2012-04-12 10:05:59 UTC
(In reply to comment #16)
> Because setRequestHeader() would have to get special purpose parsers. That's
> not how setRequestHeader() works.

"For setRequestHeader() if possible, otherwise for send()."
Comment 18 Anne 2012-04-12 10:08:26 UTC
We cannot start throwing for something we previously did not throw for. Also, I do not see how it's easier as you would still have to parse the header whereas you say a simpler pattern is used (which we could define, I don't mind).
Comment 19 Julian Reschke 2012-04-12 10:17:20 UTC
(In reply to comment #18)
> We cannot start throwing for something we previously did not throw for. Also, I

Why not?

> do not see how it's easier as you would still have to parse the header whereas
> you say a simpler pattern is used (which we could define, I don't mind).

No, if you want to rewrite charset parameters, you need to parse the header.

The spec, as written currently, assumes that these parsers are written in a way that a broken field can get parsed, and that information about duplicated parameters is returned. As far as I can tell, this doesn't reflect reality.
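
The point about duplicated parameters can be illustrated with two toy parsers. Both are hypothetical and simplified (split on ";", no quoted-string handling); neither is meant to reflect what any UA actually ships. A parser that stores parameters in a map keeps only one charset value, while one that returns a list keeps both.

```typescript
// Hypothetical parsers for illustration only.
function parseToMap(contentType: string): Map<string, string> {
  const result = new Map<string, string>();
  for (const p of contentType.split(";").slice(1)) {
    const eq = p.indexOf("=");
    if (eq < 0) continue;
    // A later duplicate silently overwrites the earlier value.
    result.set(p.slice(0, eq).trim().toLowerCase(), p.slice(eq + 1).trim());
  }
  return result;
}

function parseToList(contentType: string): Array<[string, string]> {
  const result: Array<[string, string]> = [];
  for (const p of contentType.split(";").slice(1)) {
    const eq = p.indexOf("=");
    if (eq < 0) continue;
    // Every occurrence is kept, in order.
    result.push([p.slice(0, eq).trim().toLowerCase(), p.slice(eq + 1).trim()]);
  }
  return result;
}

const sample = "text/html;charset=utf-6;charset=utf-9";
const asMap = parseToMap(sample);   // one entry: the duplicate is lost
const asList = parseToList(sample); // two entries: the duplicate is preserved
```

Only the list-style result carries enough information to tell that there was more than one charset parameter to rewrite.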
Comment 20 Julian Reschke 2012-11-27 13:58:06 UTC
Does this part of the spec have any test coverage yet?