This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding is incomplete in its coverage of languages written in the Cyrillic script.

be: The Belarusian localization of Firefox has windows-1251 as the fallback, so it's virtually certain that the spec should require this.

kk: The Kazakh localization of Firefox currently has UTF-8 as the fallback, and we have telemetry data indicating that it is a bad fallback, so it's virtually certain that the spec should require a windows-1251 fallback for Kazakh.

Considering the Windows code page legacy and, in some cases, the relationship with Russia, it's reasonable to guess that the following should *probably* also fall back to windows-1251:

ba (Bashkir)
ky (Kyrgyz)
mk (Macedonian)
tg (Tajik)
tt (Tatar)
sah (Yakut)

It is probably best to check this latter list with someone who actually knows.
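To make the proposal concrete, here is a minimal sketch of the kind of locale-to-fallback lookup being discussed. The language tags are the ones named in this report; the names `PROPOSED_WINDOWS_1251` and `fallback_encoding`, the primary-subtag matching, and the windows-1252 default for everything else are assumptions made for illustration, not the spec's actual table or any browser's implementation.

```python
# Sketch only: the language tags come from this bug report. The table name,
# the function name, and the windows-1252 default are illustrative assumptions.
PROPOSED_WINDOWS_1251 = frozenset(
    {"be", "kk", "ba", "ky", "mk", "tg", "tt", "sah"}
)


def fallback_encoding(locale: str, default: str = "windows-1252") -> str:
    """Return the assumed fallback encoding for a locale such as 'tg-Cyrl-TJ'."""
    # Match on the primary language subtag only, ignoring script and region.
    primary = locale.replace("_", "-").split("-")[0].lower()
    return "windows-1251" if primary in PROPOSED_WINDOWS_1251 else default


if __name__ == "__main__":
    for tag in ("be", "kk", "tg-Cyrl-TJ", "sah-RU", "en-US"):
        print(tag, fallback_encoding(tag))
```

Matching on the primary subtag is a simplification: a real table may need to distinguish script or region, for example for languages that are written in more than one script.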
The current requirements are from bug 21087, where you said "In order to avoid spreading bugs, please remove all the entries that haven't been cross-checked to agree with the defaults of a version of Internet Explorer that predates the inclusion of the table in the spec". What changed?

These are the notes I have in the spec for those locales:

<!-- be, Belarusian, is not listed here because Windows Vista wanted windows-1251, Chrome wanted <none>, and Firefox wanted ISO-8859-5 -->
<!-- ba-RU, Bashkir (Russia), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<!-- ky, Kyrgyz, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<!-- mk, Macedonian, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<!-- tg-Cyrl-TJ, Tajik (Cyrillic, Tajikistan), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<!-- tt, Tatar, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
<!-- sah-RU, Yakut (Russia), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
> The current requirements are from bug 21087, where you said "In order to
> avoid spreading bugs, please remove all the entries that haven't been
> cross-checked to agree with the defaults of a version of Internet
> Explorer that predates the inclusion of the table in the spec". What
> changed?

Doesn't windows-1252 agree with IE?

> <!-- be, Belarusian, is not listed here because Windows Vista wanted windows-1251, Chrome wanted <none>, and Firefox wanted ISO-8859-5 -->

https://mxr.mozilla.org/l10n-mozilla-release/search?string=intl.charset.default&find=intl.properties says windows-1251 in Firefox.

> <!-- ba-RU, Bashkir (Russia), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
> <!-- tt, Tatar, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->
> <!-- sah-RU, Yakut (Russia), is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->

These are minority languages of Russia that use the Cyrillic script. It seems reasonable to expect that users often have to browse ru-RU content, and it would be strange for the Cyrillic legacy for these languages in Russia to differ from the legacy of ru-RU.

> <!-- mk, Macedonian, is not listed here because neither Chrome nor Firefox knew about it. For what it's worth, Windows Vista wanted windows-1251 -->

Firefox now has a localization for mk which sets UTF-8 as the fallback. For the obvious reasons, I find it *extremely* hard to believe that UTF-8 is the right answer.
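One plausible reading of why UTF-8 is a poor fallback here: the fallback only matters for unlabeled legacy pages, and if those pages were saved in windows-1251 (an assumption, not something measured in this thread), decoding them as UTF-8 yields replacement characters. A small sketch, with the sample word and the helper name `decode_with_fallback` chosen purely for illustration, and Python's codecs standing in for a browser's decoder:

```python
# Sketch only: simulates an unlabeled legacy page saved in windows-1251 and
# shows the result of decoding it with two candidate fallback encodings.
text = "Македонски"  # sample Cyrillic word, chosen for illustration
legacy_bytes = text.encode("windows-1251")


def decode_with_fallback(data: bytes, fallback: str) -> str:
    """Decode bytes with the given fallback, replacing malformed sequences."""
    return data.decode(fallback, errors="replace")


as_utf8 = decode_with_fallback(legacy_bytes, "utf-8")
as_1251 = decode_with_fallback(legacy_bytes, "windows-1251")

if __name__ == "__main__":
    print("utf-8 fallback:       ", as_utf8)
    print("windows-1251 fallback:", as_1251)
```

With the windows-1251 fallback the original word round-trips; with the UTF-8 fallback the high bytes do not form valid UTF-8 sequences and come out as U+FFFD.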
Ok, seems reasonable.
Checked in as WHATWG revision r8258.
Check-in comment: Add some more locales to the default encoding logic.
http://html5.org/tools/web-apps-tracker?from=8257&to=8258