24336 – Encoding names should match what people actually call them

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Status:	RESOLVED WONTFIX

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	Encoding
Version:	unspecified
Hardware:	PC Linux

Importance:	P2 normal
Target Milestone:	Unsorted
Assignee:	Anne
QA Contact:	sideshowbarker+encodingspec

Reported:	2014-01-20 18:18 UTC by Geoffrey Sneddon
Modified:	2014-01-20 23:30 UTC
CC List:	2 users

Description Geoffrey Sneddon 2014-01-20 18:18:42 UTC

http://gsnedders.html5.org/web-encoding-names/results.html shows what document.characterSet returns in current versions of browsers. Notably, Firefox and Chrome both return uppercased names for many of these encodings. (IE returns them all in lowercase except "GB18030"; ZombieOpera returns them all in lowercase.)

Googling these encoding names makes it clear that almost everyone writes "UTF-8", "ISO-8859-n", etc. (uppercased). Since there is currently no interop here, and the proposed behaviour matches Firefox/Chrome, it would seem better to give the encodings the names that are in common usage.

As such, I propose changing the names to the following (a change of case only):

 - UTF-8
 - IBM866
 - ISO-8859-n
 - ISO-8859-8-I
 - KOI8-R
 - KOI8-U
 - HZ-GB-2312
 - Big5
 - EUC-JP
 - ISO-2022-JP
 - Shift_JIS
 - EUC-KR
 - UTF-16BE
 - UTF-16LE
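
To make the "case only" nature of the proposal concrete, here is a minimal Python sketch that maps a current all-lowercase name to the spelling proposed in the list above. The function name `proposed_name` and the lookup table are hypothetical and exist only for illustration; the handling of "ISO-8859-n" assumes that n is a plain number, and any name not covered by the list is returned unchanged.

```python
# Proposed spellings copied from the list above. "ISO-8859-n" is a family,
# so it is handled separately in proposed_name() rather than listed here.
PROPOSED = [
    "UTF-8", "IBM866", "ISO-8859-8-I", "KOI8-R", "KOI8-U", "HZ-GB-2312",
    "Big5", "EUC-JP", "ISO-2022-JP", "Shift_JIS", "EUC-KR",
    "UTF-16BE", "UTF-16LE",
]

# Because the proposal changes case only, lowercasing each proposed
# spelling yields the corresponding current name.
BY_LOWERCASE = {name.lower(): name for name in PROPOSED}


def proposed_name(current: str) -> str:
    """Return the proposed spelling, or the input unchanged if not covered."""
    key = current.lower()
    if key in BY_LOWERCASE:
        return BY_LOWERCASE[key]
    # Assumption: "ISO-8859-n" means the prefix followed by a number.
    prefix = "iso-8859-"
    if key.startswith(prefix) and key[len(prefix):].isdigit():
        return key.upper()
    return current


if __name__ == "__main__":
    for name in ["utf-8", "big5", "shift_jis", "iso-8859-15", "windows-1252"]:
        print(name, "->", proposed_name(name))
```

Note that "Big5" and "Shift_JIS" are mixed case, so the mapping cannot be reduced to a plain call to `upper()`.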

Comment 1 Anne 2014-01-20 23:30:22 UTC

I place more value on the fact that you can now predict what characterSet returns. With your proposed change, you would need to know that windows-1252 is not spelled Windows-1252, and that gb18030 is an exception.
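
The predictability argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample is limited to names mentioned in this bug, and the helper `follows_rule` is hypothetical. It checks whether a single case rule reproduces every name in a set.

```python
from typing import Callable, Iterable

# A sample of spellings as they would appear under the proposal,
# using only names mentioned in this bug.
PROPOSED_SAMPLE = ["UTF-8", "Big5", "Shift_JIS", "windows-1252", "gb18030"]

# The same names under the existing all-lowercase convention.
CURRENT_SAMPLE = [name.lower() for name in PROPOSED_SAMPLE]


def follows_rule(names: Iterable[str], rule: Callable[[str], str]) -> bool:
    """True if applying `rule` to each name leaves it unchanged."""
    return all(rule(name) == name for name in names)


if __name__ == "__main__":
    # One rule ("lowercase everything") predicts every current name.
    print(follows_rule(CURRENT_SAMPLE, str.lower))
    # Neither simple rule predicts every proposed name, so a caller
    # would need a per-name table instead.
    print(follows_rule(PROPOSED_SAMPLE, str.lower))
    print(follows_rule(PROPOSED_SAMPLE, str.upper))
```

With the all-lowercase convention, code comparing against characterSet can normalise with one call; with the proposed spellings it would have to remember each exception.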