This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 24336 - Encoding names should match what people actually call them
Summary: Encoding names should match what people actually call them
Status: RESOLVED WONTFIX
Alias: None
Product: WHATWG
Classification: Unclassified
Component: Encoding
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+encodingspec
Reported: 2014-01-20 18:18 UTC by Geoffrey Sneddon
Modified: 2014-01-20 23:30 UTC

Description Geoffrey Sneddon 2014-01-20 18:18:42 UTC
http://gsnedders.html5.org/web-encoding-names/results.html shows what document.characterSet returns in current versions of browsers. Notably, Firefox and Chrome both return uppercased names for many of these encodings. (IE returns them all lowercase except "GB18030"; ZombieOpera returns them all lowercase.)

Googling these encoding names makes it clear that almost everyone refers to "UTF-8", "ISO-8859-n", etc. (uppercased). As there is currently no interop here, and the proposed behaviour matches Firefox/Chrome, it would seem better to just give the encodings the names that are in common usage.

As such, I propose to change the names to the following (thereby changing case only):

 - UTF-8
 - IBM866
 - ISO-8859-n
 - ISO-8859-8-I
 - KOI8-R
 - KOI8-U
 - HZ-GB-2312
 - Big5
 - EUC-JP
 - ISO-2022-JP
 - Shift_JIS
 - EUC-KR
 - UTF-16BE
 - UTF-16LE
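
The list above can be read as a case-only mapping from the lowercase names to the proposed spellings. The following is a minimal Python sketch of that reading, not anything taken from the spec or from a browser. The names `PROPOSED_NAMES`, `ISO_8859_N` and `proposed_name` are hypothetical, and it assumes that the "n" in "ISO-8859-n" stands for a decimal part number and that any name missing from the list keeps its existing spelling.

```python
import re

# Proposed spellings from the list above, keyed by their lowercase form.
# Assumption: the lowercase form is the name in use before the proposal.
PROPOSED_NAMES = {
    name.lower(): name
    for name in [
        "UTF-8", "IBM866", "ISO-8859-8-I", "KOI8-R", "KOI8-U",
        "HZ-GB-2312", "Big5", "EUC-JP", "ISO-2022-JP", "Shift_JIS",
        "EUC-KR", "UTF-16BE", "UTF-16LE",
    ]
}

# "ISO-8859-n" is a pattern in the list; n is assumed to be a decimal number.
ISO_8859_N = re.compile(r"iso-8859-(\d+)")


def proposed_name(lowercase_name: str) -> str:
    """Return the proposed casing, or the input unchanged if it is not listed."""
    if lowercase_name in PROPOSED_NAMES:
        return PROPOSED_NAMES[lowercase_name]
    match = ISO_8859_N.fullmatch(lowercase_name)
    if match:
        return f"ISO-8859-{match.group(1)}"
    # Not in the proposal (e.g. windows-1252, gb18030): spelling is unchanged.
    return lowercase_name
```

Under these assumptions, `proposed_name("shift_jis")` gives `"Shift_JIS"`, while `proposed_name("windows-1252")` comes back unchanged. The need for a lookup table rather than a single case rule is visible in entries such as "Big5" and "Shift_JIS".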
Comment 1 Anne 2014-01-20 23:30:22 UTC
I place more value on being able to predict what characterSet returns, as you can now. With your proposed change you would need to know that windows-1252 is not spelled Windows-1252, and that gb18030 is an exception.
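
The predictability argument can be illustrated with a small Python sketch. It is only an illustration of the comment, not spec text: the helper `unpredicted` and the two sample lists are hypothetical, and the sketch assumes that names not covered by the proposal (windows-1252, gb18030) would keep their lowercase spelling.

```python
def unpredicted(names, rule):
    """Return the names whose spelling a single case rule fails to predict."""
    return [name for name in names if rule(name) != name]


# Three sample names spelled as they are now (all lowercase) ...
CURRENT = ["utf-8", "windows-1252", "gb18030"]
# ... and as they would be spelled under the proposal (assumed, see above).
PROPOSED = ["UTF-8", "windows-1252", "gb18030"]

# One rule ("lowercase everything") covers every current name.
current_misses = unpredicted(CURRENT, str.lower)

# Under the proposal, neither simple rule covers all three names.
proposed_misses_lower = unpredicted(PROPOSED, str.lower)
proposed_misses_upper = unpredicted(PROPOSED, str.upper)
```

With these sample lists, the lowercase rule misses nothing for the current names, whereas for the proposed names the lowercase rule misses "UTF-8" and the uppercase rule misses "windows-1252" and "gb18030".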