<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>13396</bug_id>
          <creation_ts>2011-07-27 18:20:45 +0000</creation_ts>
          <short_desc>i18n-ISSUE-77: HTTP and defaulting to UTF-16LE</short_desc>
          <delta_ts>2011-11-18 15:58:24 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML WG</product>
          <component>LC1 HTML5 spec</component>
          <version>unspecified</version>
          <rep_platform>PC</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords>a11y, a11ytf</keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          <everconfirmed>1</everconfirmed>
          <reporter name="I18n Core WG">public-i18n-core</reporter>
          <assigned_to name="Ian &apos;Hixie&apos; Hickson">ian</assigned_to>
          <cc>duerst</cc>
          <cc>hsivonen</cc>
          <cc>ian</cc>
          <cc>joshue.oconnor</cc>
          <cc>mike</cc>
          <cc>public-html-admin</cc>
          <cc>public-html-wg-issue-tracking</cc>
          <qa_contact name="HTML WG Bugzilla archive list">public-html-bugzilla</qa_contact>
          <comment_sort_order>oldest_to_newest</comment_sort_order>
          <long_desc isprivate="0" >
    <commentid>51569</commentid>
    <comment_count>0</comment_count>
    <who name="I18n Core WG">public-i18n-core</who>
    <bug_when>2011-07-27 18:20:45 +0000</bug_when>
    <thetext>8.2.2.2 Character encodings
http://www.w3.org/TR/html5/parsing.html#character-encodings-0

Supported by the i18n WG.

&quot;When a user agent is to use the UTF-16 encoding but no BOM has been found, user agents must default to UTF-16LE.&quot;

If the HTTP header declares the file to be UTF-16BE, which I believe it can, and in which case a BOM should *not* be used, then I think that this would not be true. If the HTTP header declares the file to be UTF-16, then there must be a BOM, so I assume that this is a recovery mechanism if someone does declare UTF-16 in HTTP but omits the BOM. I&apos;d think that some kind of clarification and perhaps error message would be in order though.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>51609</commentid>
    <comment_count>1</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2011-07-28 08:03:42 +0000</bug_when>
    <thetext>(In reply to comment #0)
&gt; 8.2.2.2 Character encodings
&gt; http://www.w3.org/TR/html5/parsing.html#character-encodings-0
&gt; 
&gt; Supported by the i18n WG.
&gt; 
&gt; &quot;When a user agent is to use the UTF-16 encoding but no BOM has been found,
&gt; user agents must default to UTF-16LE.&quot;
&gt; 
&gt; If the HTTP header declares the file to be UTF-16BE, which I believe it can,
&gt; and in which case a BOM should *not* be used, then I think that this would not
&gt; be true.

Then the user agent isn&apos;t to use the UTF-16 encoding but the UTF-16BE encoding. The quoted sentence shouldn&apos;t say &quot;UTF-16LE&quot;. It should say &quot;little-endian UTF-16&quot;, unless the spec intends the reported encoding for the document to change and I&apos;m pretty sure that&apos;s not the intention.

&gt; If the HTTP header declares the file to be UTF-16, then there must be
&gt; a BOM, so I assume that this is a recovery mechanism if someone does declare
&gt; UTF-16 in HTTP but omits the BOM.

Yes.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>51722</commentid>
    <comment_count>2</comment_count>
    <who name="Martin Dürst">duerst</who>
    <bug_when>2011-07-29 02:02:56 +0000</bug_when>
    <thetext>(In reply to comment #1)
&gt; (In reply to comment #0)
&gt; &gt; 8.2.2.2 Character encodings
&gt; &gt; http://www.w3.org/TR/html5/parsing.html#character-encodings-0
&gt; &gt; 
&gt; &gt; Supported by the i18n WG.
&gt; &gt; 
&gt; &gt; &quot;When a user agent is to use the UTF-16 encoding but no BOM has been found,
&gt; &gt; user agents must default to UTF-16LE.&quot;
&gt; &gt; 
&gt; &gt; If the HTTP header declares the file to be UTF-16BE, which I believe it can,
&gt; &gt; and in which case a BOM should *not* be used, then I think that this would not
&gt; &gt; be true.
&gt; 
&gt; Then the user agent isn&apos;t to use the UTF-16 encoding but the UTF-16BE encoding.
&gt; The quoted sentence shouldn&apos;t say &quot;UTF-16LE&quot;. It should say &quot;little-endian
&gt; UTF-16&quot;, unless the spec intends the reported encoding for the document to
&gt; change and I&apos;m pretty sure that&apos;s not the intention.

This would definitely make things clearer. I&apos;d also suggest changing &quot;is to use the UTF-16 encoding&quot; at the start of the sentence to something that makes it clearer that this applies to content *labeled* &quot;UTF-16&quot; (with the label in explicit quotes).

&gt; &gt; If the HTTP header declares the file to be UTF-16, then there must be
&gt; &gt; a BOM, so I assume that this is a recovery mechanism if someone does declare
&gt; &gt; UTF-16 in HTTP but omits the BOM.
&gt; 
&gt; Yes.

It may help to make this clear in the text.

Regards,   Martin.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>53974</commentid>
    <comment_count>3</comment_count>
    <who name="Michael[tm] Smith">mike</who>
    <bug_when>2011-08-04 05:34:59 +0000</bug_when>
    <thetext>mass-move component to LC1</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>55364</commentid>
    <comment_count>4</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2011-08-17 22:27:57 +0000</bug_when>
    <thetext>EDITOR&apos;S RESPONSE: This is an Editor&apos;s Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: see diff given below
Rationale: Concurred with reporter&apos;s comments.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>55365</commentid>
    <comment_count>5</comment_count>
    <who name="">contributor</who>
    <bug_when>2011-08-17 22:28:43 +0000</bug_when>
    <thetext>Checked in as WHATWG revision r6498.
Check-in comment: Clean up how we refer to UTF-16.
http://html5.org/tools/web-apps-tracker?from=6497&amp;to=6498</thetext>
  </long_desc>
    </bug>

</bugzilla>