<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>16303</bug_id>
          
          <creation_ts>2012-03-10 07:06:27 +0000</creation_ts>
          <short_desc>meaning of &quot;all&quot; charset parameters of content-type header</short_desc>
          <delta_ts>2012-11-27 14:21:09 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebAppsWG</product>
          <component>XHR</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>WORKSFORME</resolution>
          
          
          <bug_file_loc>http://dvcs.w3.org/hg/xhr/raw-file/8d4e9ccfdbd4/Overview.html#the-send()-method</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Glenn Adams">glenn</reporter>
          <assigned_to name="Anne">annevk</assigned_to>
          <cc>julian.reschke</cc>
    
    <cc>mike</cc>
    
    <cc>public-webapps</cc>
          
          <qa_contact>public-webapps-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>65279</commentid>
    <comment_count>0</comment_count>
    <who name="Glenn Adams">glenn</who>
    <bug_when>2012-03-10 07:06:27 +0000</bug_when>
    <thetext>Section 4.7.6 step 3 states:

&quot;If a Content-Type header is in author request headers and its value is a valid MIME type that has a charset parameter whose value is not a case-insensitive match for encoding, and encoding is not null, set all the charset parameters of that Content-Type header to encoding.&quot;

Questions: what does *all* mean in &quot;set all the charset parameters of that Content-Type&quot;? Could you give a concrete example of a case with more than one charset parameter in the single media type value of Content-Type?</thetext>
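    <!--
A minimal Python sketch of step 3 as quoted above, read literally. The helper names split_params and rewrite_all_charsets are hypothetical. Parameters are split naively on ";" and "=", so quoted values are not handled, and the "valid MIME type" check is omitted. This is not taken from any browser's implementation.

```python
def split_params(value):
    """Split a Content-Type value into (media_type, [(name, value), ...]).

    Naive: splits on ';' and '=' only, so quoted values are not handled and
    parameters without '=' are skipped.
    """
    parts = [p.strip() for p in value.split(";")]
    params = []
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        if sep:
            params.append((name.strip(), val.strip()))
    return parts[0], params


def rewrite_all_charsets(value, encoding):
    """Literal reading of step 3: if encoding is not None and any charset
    parameter is not a case-insensitive match for it, set *all* charset
    parameters to encoding. Otherwise return the value untouched."""
    if encoding is None:
        return value
    media_type, params = split_params(value)
    charsets = [v for n, v in params if n.lower() == "charset"]
    if not any(c.lower() != encoding.lower() for c in charsets):
        return value  # no mismatch (or no charset): original spelling is kept
    rebuilt = [(n, encoding if n.lower() == "charset" else v) for n, v in params]
    # The value is re-serialized without whitespace around ';'.
    return ";".join([media_type] + [f"{n}={v}" for n, v in rebuilt])


print(rewrite_all_charsets("text/html;charset=utf-6;charset=utf-9", "UTF-8"))
# text/html;charset=UTF-8;charset=UTF-8
print(rewrite_all_charsets("text/plain;charset=Utf-8", "UTF-8"))
# text/plain;charset=Utf-8
```

Under this reading, a value whose only charset already matches case-insensitively (such as "Utf-8") is left exactly as the author wrote it.
    -->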
  </long_desc><long_desc isprivate="0" >
    <commentid>65283</commentid>
    <comment_count>1</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2012-03-10 09:18:49 +0000</bug_when>
    <thetext>IMHO this is a good example of over-specification. Having multiple charset parameters is invalid, specifying the wrong charset is a bug, and relying on case in charset names is a bug. So this specifies behavior for a case of a double client bug + a server bug at the same time.

As far as I can tell, no UA except FF has tried to implement this yet; and the implementation in FF has required lots of hacks and layering violations (essentially, the header field parser needs to preserve all kinds of state that otherwise wouldn&apos;t be needed). Furthermore, it also doesn&apos;t do this for &quot;all&quot; charset parameters.

My suggestion would be to drop that silly requirement, and just clarify that if you specify the charset even though XHR will override it, you&apos;re on your own.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66068</commentid>
    <comment_count>2</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-03-26 17:22:23 +0000</bug_when>
    <thetext>&quot;text/html;charset=utf-6;charset=utf-9&quot; would be an example. The text should be taken literally, like all text.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66631</commentid>
    <comment_count>3</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2012-04-12 07:13:26 +0000</bug_when>
    <thetext>I believe this should be left open until you have evidence of at least two implementations doing what the spec asks for.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66632</commentid>
    <comment_count>4</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-04-12 07:16:27 +0000</bug_when>
    <thetext>Oh please. We&apos;re not going to open bugs for everything where implementations currently mismatch.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66634</commentid>
    <comment_count>5</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2012-04-12 07:28:22 +0000</bug_when>
    <thetext>This requirement has been in the spec for years. Last time I checked, only Firefox attempted to implement it, but didn&apos;t as specified. 

I think the logical next step is to actually try to *remove* it from Firefox, and then to simplify the spec.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66644</commentid>
    <comment_count>6</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-04-12 07:54:22 +0000</bug_when>
    <thetext>What do you mean by &quot;it&quot; and what would removing &quot;it&quot; mean? I don&apos;t really care that much what we do here, but I disagree that this is over-specification. Multiple charset parameters is a legitimate situation that can come up and implementors need to know what to do.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66646</commentid>
    <comment_count>7</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2012-04-12 08:44:59 +0000</bug_when>
    <thetext>(In reply to comment #6)
&gt; What do you mean by &quot;it&quot; and what would removing &quot;it&quot; mean? I don&apos;t really care
&gt; that much what we do here, but I disagree that this is over-specification.
&gt; Multiple charset parameters is a legitimate situation that can come up and
&gt; implementors need to know what to do.

things the spec says (last time I checked):

a) rewrite Content-Type request header field, because XHR *will* use UTF-8

b) if original charset matched UTF-8, preserve its exact representation (so don&apos;t rewrite &quot;Utf-8&quot; to &quot;UTF-8&quot;)

c) in addition, do that for all additional charset params

So this is an edge case of an edge case of an edge case.

Optimally, we can get rid of all of this; just declare that if the sender doesn&apos;t specify UTF-8, it&apos;s his problem (resulting in an inconsistent request).

b) seems to be a workaround for one broken server seen in the past. Maybe it was fixed?

c) over-specifies b) for broken input. Nobody implements it. Why do you care about this edge case in the first place?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66647</commentid>
    <comment_count>8</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-04-12 08:48:15 +0000</bug_when>
    <thetext>The spec does not do b). It seemed best to replace all charset parameters rather than just the first or last. We could also define either first or last, but I&apos;m not sure how that is better.</thetext>
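    <!--
A minimal Python sketch comparing the "all" rule with the "first" and "last" alternatives mentioned above. It assumes the header value has already been parsed into an ordered list of (name, value) pairs with duplicates kept; the function name rewrite_charsets and the policy strings are hypothetical.

```python
def rewrite_charsets(params, encoding, policy="all"):
    """Return a copy of params with charset values replaced according to policy.

    params: ordered list of (name, value) pairs, duplicates included.
    policy: "all", "first" or "last" (illustrative names only).
    """
    positions = [i for i, (n, _) in enumerate(params) if n.lower() == "charset"]
    if policy == "all":
        targets = set(positions)
    elif policy == "first":
        targets = set(positions[:1])
    elif policy == "last":
        targets = set(positions[-1:])
    else:
        raise ValueError(f"unknown policy: {policy}")
    return [(n, encoding if i in targets else v) for i, (n, v) in enumerate(params)]


params = [("charset", "utf-6"), ("charset", "utf-9")]
for policy in ("all", "first", "last"):
    print(policy, rewrite_charsets(params, "UTF-8", policy))
# all [('charset', 'UTF-8'), ('charset', 'UTF-8')]
# first [('charset', 'UTF-8'), ('charset', 'utf-9')]
# last [('charset', 'utf-6'), ('charset', 'UTF-8')]
```

Under "first" or "last", the rewritten header still contains a charset that differs from the encoding actually used.
    -->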
  </long_desc><long_desc isprivate="0" >
    <commentid>66648</commentid>
    <comment_count>9</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2012-04-12 08:55:28 +0000</bug_when>
    <thetext>(In reply to comment #8)
&gt; The spec does not do b). It seemed best to replace all charset parameters
&gt; rather than just the first or last. We could also define either first or last,
&gt; but I&apos;m not sure how that is better.

Well, but UAs do. I haven&apos;t seen a UA that does what you want for multiple charset params. It&apos;s a garbage parameter.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66649</commentid>
    <comment_count>10</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-04-12 08:58:23 +0000</bug_when>
    <thetext>I don&apos;t think all UAs do b) and the specification has to deal with input garbage as we cannot control it. We have to do something.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66650</commentid>
    <comment_count>11</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2012-04-12 09:08:05 +0000</bug_when>
    <thetext>(In reply to comment #10)
&gt; I don&apos;t think all UAs do b) and the specification has to deal with input
&gt; garbage as we cannot control it. We have to do something.

No, you don&apos;t &quot;have to do something&quot;. It&apos;s your choice.

It seems this is going nowhere without test cases and a comparison of what UAs do, which I&apos;ll try to get done as soon as time permits.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66651</commentid>
    <comment_count>12</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-04-12 09:10:33 +0000</bug_when>
    <thetext>I&apos;m not going to leave edge cases undefined.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66652</commentid>
    <comment_count>13</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2012-04-12 09:19:56 +0000</bug_when>
    <thetext>(In reply to comment #12)
&gt; I&apos;m not going to leave edge cases undefined.

The problem here is that you assume that UAs use a full-blown Content-Type parser here (needed to extract proper type and charset information in the first place), *and* that that parser preserves all the information you want (about duplicated parameters).

This is not the case right now, as far as I can tell.

That&apos;s why I&apos;m calling this &quot;overspecification&quot;.

If you absolutely *want* to handle this case, a much much simpler approach is to treat it as error and throw an exception.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66653</commentid>
    <comment_count>14</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-04-12 09:24:01 +0000</bug_when>
    <thetext>I&apos;m not assuming that. I&apos;m just saying I don&apos;t want to leave edge cases undefined. We could also make this about the first or last charset parameter, as I&apos;ve indicated.

Throwing an exception instead (where? for what method?) is not at all a simpler solution nor a solution that is going to work.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66654</commentid>
    <comment_count>15</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2012-04-12 09:35:04 +0000</bug_when>
    <thetext>(In reply to comment #14)
&gt; I&apos;m not assuming that. I&apos;m just saying I don&apos;t want to leave edge cases
&gt; undefined. We could also make this about the first or last charset parameter,
&gt; as I&apos;ve indicated.

&quot;If a Content-Type header is in author request headers and its value is a valid MIME type that has a charset parameter whose value is not a case-insensitive match for encoding, and encoding is not null, set all the charset parameters of that Content-Type header to encoding.&quot;

To make this decision, a UA will have to run the field value through a parser, otherwise it will not know about individual parameters.
 
&gt; Throwing an exception instead (where? for what method?) is not at all a simpler
&gt; solution nor a solution that is going to work.

For setRequestHeader() if possible, otherwise for send().

It is simpler as it doesn&apos;t require the UA to have a parser for broken field values.

Also, when you claim &quot;is not going to work&quot; it would be awesome if you could explain why.</thetext>
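    <!--
A minimal Python sketch of the alternative proposed above: reject a Content-Type value that has more than one charset parameter instead of rewriting it. XHR defines no such check; DuplicateCharsetError and check_single_charset are hypothetical, and a Python exception stands in for whatever setRequestHeader() or send() would throw. The split on ";" and "=" is naive.

```python
class DuplicateCharsetError(ValueError):
    """Hypothetical error; the XHR spec defines no such exception."""


def check_single_charset(value):
    """Return value unchanged, or raise if it has more than one charset parameter."""
    names = [p.partition("=")[0].strip().lower() for p in value.split(";")[1:]]
    count = names.count("charset")
    if count > 1:
        raise DuplicateCharsetError(f"{count} charset parameters in {value!r}")
    return value


check_single_charset("text/plain;charset=UTF-8")  # passes
try:
    check_single_charset("text/html;charset=utf-6;charset=utf-9")
except DuplicateCharsetError as e:
    print(e)  # 2 charset parameters in 'text/html;charset=utf-6;charset=utf-9'
```

Even this check has to split the value into parameters before it can count them.
    -->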
  </long_desc><long_desc isprivate="0" >
    <commentid>66655</commentid>
    <comment_count>16</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-04-12 09:38:51 +0000</bug_when>
    <thetext>Because setRequestHeader() would have to get special purpose parsers. That&apos;s not how setRequestHeader() works.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66656</commentid>
    <comment_count>17</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2012-04-12 10:05:59 +0000</bug_when>
    <thetext>(In reply to comment #16)
&gt; Because setRequestHeader() would have to get special purpose parsers. That&apos;s
&gt; not how setRequestHeader() works.

&quot;For setRequestHeader() if possible, otherwise for send().&quot;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66657</commentid>
    <comment_count>18</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-04-12 10:08:26 +0000</bug_when>
    <thetext>We cannot start throwing for something we previously did not throw for. Also, I do not see how it&apos;s easier as you would still have to parse the header whereas you say a simpler pattern is used (which we could define, I don&apos;t mind).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>66658</commentid>
    <comment_count>19</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2012-04-12 10:17:20 +0000</bug_when>
    <thetext>(In reply to comment #18)
&gt; We cannot start throwing for something we previously did not throw for. Also, I

Why not?

&gt; do not see how it&apos;s easier as you would still have to parse the header whereas
&gt; you say a simpler pattern is used (which we could define, I don&apos;t mind).

No, if you want to rewrite charset parameters, you need to parse the header.

The spec, as written currently, assumes that these parsers are written in a way that a broken field can get parsed, and that information about duplicated parameters is returned. As far as I can tell, this doesn&apos;t reflect reality.</thetext>
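    <!--
A minimal Python sketch of the parser concern raised above: a parser that stores parameters in a map keeps only one charset, so a rule about "all" charset parameters cannot be applied to its output, whereas an order-preserving list keeps every occurrence. Both functions are hypothetical, split naively on ";" and "=", and are not taken from any real UA.

```python
def params_as_dict(value):
    """Map-based parse: a later duplicate silently overwrites an earlier one."""
    out = {}
    for part in value.split(";")[1:]:
        name, sep, val = part.partition("=")
        if sep:
            out[name.strip().lower()] = val.strip()
    return out


def params_as_list(value):
    """Order-preserving parse: every occurrence is kept."""
    out = []
    for part in value.split(";")[1:]:
        name, sep, val = part.partition("=")
        if sep:
            out.append((name.strip().lower(), val.strip()))
    return out


value = "text/html;charset=utf-6;charset=utf-9"
print(params_as_dict(value))  # {'charset': 'utf-9'}
print(params_as_list(value))  # [('charset', 'utf-6'), ('charset', 'utf-9')]
```
    -->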
  </long_desc><long_desc isprivate="0" >
    <commentid>78908</commentid>
    <comment_count>20</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2012-11-27 13:58:06 +0000</bug_when>
    <thetext>Does this part of the spec have any test coverage yet?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>78910</commentid>
    <comment_count>21</comment_count>
    <who name="Julian Reschke">julian.reschke</who>
    <bug_when>2012-11-27 14:08:47 +0000</bug_when>
    <thetext>Related Mozilla issue: https://bugzilla.mozilla.org/show_bug.cgi?id=397234

Test cases: http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/0955.html</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>