<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>23089</bug_id>
          
          <creation_ts>2013-08-29 11:51:34 +0000</creation_ts>
          <short_desc>Incomplete coverage of Cyrillic languages that should imply a windows-1251 fallback</short_desc>
          <delta_ts>2013-11-06 21:38:43 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WHATWG</product>
          <component>HTML</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>Unsorted</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Henri Sivonen">hsivonen</reporter>
          <assigned_to name="Ian &apos;Hixie&apos; Hickson">ian</assigned_to>
          <cc>ian</cc>
    
    <cc>mike</cc>
          
          <qa_contact>contributor</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>92699</commentid>
    <comment_count>0</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2013-08-29 11:51:34 +0000</bug_when>
    <thetext>http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding is incomplete in its coverage of Cyrillic languages.

be: The Belarusian localization of Firefox has windows-1251 as the fallback, so it&apos;s virtually certain that the spec should require this.

kk: The Kazakh localization of Firefox currently has UTF-8 as the fallback, and we have telemetry data indicating that it is a bad fallback, so it&apos;s virtually certain that the spec should require a windows-1251 fallback for Kazakh.

Considering the Windows code page legacy and, in some cases, the relationship with Russia, it&apos;s reasonable to guess that the following should *probably* also fall back to windows-1251:
ba (Bashkir)
ky (Kyrgyz)
mk (Macedonian)
tg (Tajik)
tt (Tatar)
sah (Yakut)

Probably best to check this latter list with someone who actually knows.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>92752</commentid>
    <comment_count>1</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2013-08-30 18:06:41 +0000</bug_when>
    <thetext>The current requirements are from bug 21087, where you said &quot;In order to avoid spreading bugs, please remove all the entries that haven&apos;t been cross-checked to agree with the defaults of a version of Internet Explorer that predates the inclusion of the table in the spec&quot;. What changed?

These are the notes I have in the spec for those locales:

&lt;!-- be, Belarusian, is not listed here because Windows Vista wanted windows-1251, Chrome wanted &lt;none&gt;, and Firefox wanted ISO-8859-5 --&gt;
&lt;!-- ba-RU, Bashkir (Russia), is not listed here because neither Chrome nor Firefox knew about it. For what it&apos;s worth, Windows Vista wanted windows-1251 --&gt;
&lt;!-- ky, Kyrgyz, is not listed here because neither Chrome nor Firefox knew about it. For what it&apos;s worth, Windows Vista wanted windows-1251 --&gt;
&lt;!-- mk, Macedonian, is not listed here because neither Chrome nor Firefox knew about it. For what it&apos;s worth, Windows Vista wanted windows-1251 --&gt;
&lt;!-- tg-Cyrl-TJ, Tajik (Cyrillic, Tajikistan), is not listed here because neither Chrome nor Firefox knew about it. For what it&apos;s worth, Windows Vista wanted windows-1251 --&gt;
&lt;!-- tt, Tatar, is not listed here because neither Chrome nor Firefox knew about it. For what it&apos;s worth, Windows Vista wanted windows-1251 --&gt;
&lt;!-- sah-RU, Yakut (Russia), is not listed here because neither Chrome nor Firefox knew about it. For what it&apos;s worth, Windows Vista wanted windows-1251 --&gt;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>93083</commentid>
    <comment_count>2</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2013-09-06 07:36:12 +0000</bug_when>
    <thetext>&gt; The current requirements are from bug 21087, where you said &quot;In order to 
&gt; avoid spreading bugs, please remove all the entries that haven&apos;t been 
&gt; cross-checked to agree with the defaults of a version of Internet 
&gt; Explorer that predates the inclusion of the table in the spec&quot;. What 
&gt; changed?

Doesn&apos;t windows-1251 agree with IE?

&gt; &lt;!-- be, Belarusian, is not listed here because Windows Vista wanted windows-1251, Chrome wanted &lt;none&gt;, and Firefox wanted ISO-8859-5 --&gt;

https://mxr.mozilla.org/l10n-mozilla-release/search?string=intl.charset.default&amp;find=intl.properties says windows-1251 in Firefox.

&gt; &lt;!-- ba-RU, Bashkir (Russia), is not listed here because neither Chrome nor Firefox knew about it. For what it&apos;s worth, Windows Vista wanted windows-1251 --&gt;
&gt; &lt;!-- tt, Tatar, is not listed here because neither Chrome nor Firefox knew about it. For what it&apos;s worth, Windows Vista wanted windows-1251 --&gt;
&gt; &lt;!-- sah-RU, Yakut (Russia), is not listed here because neither Chrome nor Firefox knew about it. For what it&apos;s worth, Windows Vista wanted windows-1251 --&gt;

These are minority languages of Russia that use the Cyrillic script. It seems reasonable to expect that their users often have to browse ru-RU content, and it would seem strange for the Cyrillic legacy of these languages in Russia to differ from the legacy of ru-RU.

&gt; &lt;!-- mk, Macedonian, is not listed here because neither Chrome nor Firefox knew about it. For what it&apos;s worth, Windows Vista wanted windows-1251 --&gt;

Firefox now has a localization for mk that sets UTF-8 as the fallback. For obvious reasons, I find it *extremely* hard to believe that UTF-8 is the right answer.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>95894</commentid>
    <comment_count>3</comment_count>
    <who name="Ian &apos;Hixie&apos; Hickson">ian</who>
    <bug_when>2013-11-06 21:38:02 +0000</bug_when>
    <thetext>Ok, seems reasonable.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>95895</commentid>
    <comment_count>4</comment_count>
    <who name="">contributor</who>
    <bug_when>2013-11-06 21:38:43 +0000</bug_when>
    <thetext>Checked in as WHATWG revision r8258.
Check-in comment: Add some more locales to the default encoding logic.
http://html5.org/tools/web-apps-tracker?from=8257&amp;to=8258</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>