This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
IE takes the labels koi8-u and koi8-ru to mean KOI8-RU and not KOI8-U. The difference is that KOI8-RU has an additional letter needed for Byelorussian, 0xAE: U+045E (ў) and 0xBE: U+040E (Ў), where KOI8-U has line-drawing characters, 0xAE: U+255D (╝) and 0xBE: U+256C (╬). Letters are arguably more important than box-drawing characters, so KOI8-RU might be a better choice than KOI8-U, at least if it can be shown that koi8-(r)u is used for Byelorussian (i.e., that 0xAE/0xBE are used to encode ў/Ў).
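To make the difference concrete, here is a minimal Python sketch of the KOI8-RU reading described above. It assumes Python's built-in `koi8_u` codec for every byte except 0xAE and 0xBE, which it overrides with the two Byelorussian letters. The names `KOI8_RU_OVERRIDES` and `decode_koi8_ru` are hypothetical, and only the two byte positions mentioned in this comment are modeled; this is not a complete KOI8-RU table.

```python
# Sketch only: models just the two code points discussed in this bug.
# KOI8_RU_OVERRIDES and decode_koi8_ru are illustrative names, not an existing API.
KOI8_RU_OVERRIDES = {
    0xAE: "\u045e",  # ў, CYRILLIC SMALL LETTER SHORT U
    0xBE: "\u040e",  # Ў, CYRILLIC CAPITAL LETTER SHORT U
}


def decode_koi8_ru(data: bytes) -> str:
    """Decode bytes as KOI8-U, except that 0xAE/0xBE map to ў/Ў."""
    out = []
    for byte in data:
        if byte in KOI8_RU_OVERRIDES:
            out.append(KOI8_RU_OVERRIDES[byte])
        else:
            # Every other byte is left to the standard KOI8-U codec.
            out.append(bytes([byte]).decode("koi8_u"))
    return "".join(out)


if __name__ == "__main__":
    print(decode_koi8_ru(b"\xae\xbe"))
```

Decoding byte by byte is slow but keeps the override explicit; a real decoder would use a full 256-entry table instead.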
KOI8-RU is not one of the aliases supported by ICU (see <http://demo.icu-project.org/icu-bin/convexp>). Is the encoding itself supported by ICU? Having such back-end support would be useful for getting it supported in browsers. I don't have any data on real-life use of this encoding.
(In reply to comment #1)
> Is [KOI8-RU] supported by ICU?
Apparently not:
$ grep '042F.*F1' mappings/*
mappings/ibm-1168_P100-2002.ucm:<U042F> \xF1 |0
mappings/ibm-878_P100-1996.ucm:<U042F> \xF1 |0
All KOI8 encodings encode the basic modern Russian letters identically. In particular, Я (U+042F) is encoded as 0xF1. Only KOI8-R (IBM-878) and KOI8-U (IBM-1168) match, so KOI8-RU is not supported.
> I don't have any data on real life use of this encoding.
Have you looked for 0xAE bytes in data labelled KOI8-U (or possibly KOI8-R)?
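One rough way to look for such bytes is sketched below in Python. It counts 0xAE and 0xBE bytes in a byte string and notes how many sit directly next to a byte in the 0xC0-0xFF range, where the KOI8 encodings place the basic Russian letters. The adjacency heuristic is an assumption made for illustration: a hit next to a letter hints at ў/Ў rather than a box-drawing character, but proves nothing by itself. The function name `scan_ambiguous_bytes` is hypothetical.

```python
# Sketch only: a heuristic scan, not a validated detection method.
AMBIGUOUS_BYTES = {0xAE, 0xBE}


def is_koi8_letter(byte: int) -> bool:
    """True for bytes in the range where KOI8 encodings place Russian letters."""
    return 0xC0 <= byte <= 0xFF


def scan_ambiguous_bytes(data: bytes) -> dict:
    """Count 0xAE/0xBE bytes and how many of them touch a letter byte."""
    total = 0
    adjacent_to_letter = 0
    for i, byte in enumerate(data):
        if byte not in AMBIGUOUS_BYTES:
            continue
        total += 1
        before = data[i - 1] if i > 0 else None
        after = data[i + 1] if i + 1 < len(data) else None
        if (before is not None and is_koi8_letter(before)) or (
            after is not None and is_koi8_letter(after)
        ):
            adjacent_to_letter += 1
    return {"total": total, "adjacent_to_letter": adjacent_to_letter}


if __name__ == "__main__":
    print(scan_ambiguous_bytes(b"\xc1\xae\xc1 \x80\xbe\x80"))
```

Run over a corpus of pages labelled koi8-u or koi8-r, a high share of letter-adjacent hits would suggest the bytes are being used as letters.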
Does IE also report koi8-ru as the encoding name (via the DOM)? I suppose if IE does this it might be more compatible, although IE is not dominant in that region (afaik).
(In reply to comment #3)
> Does IE also report koi8-ru as the encoding name (via the DOM)?
document.charset returns koi8-u in IE9. The encoding vector appears to have been changed from KOI8-U to KOI8-RU at some point between IE6 and IE9. I assume this would not have happened in the absence of KOI8-RU content labelled as KOI8-U, but this may not be an issue for current Web content.
Adrian, Travis, any idea here?
I will need to have an encoding expert on our team look into this; offhand I don't know how prevalent this encoding is or why this change may have been made.
We've searched IE's code base, and found that we've had this behavior since at least IE4. I can't prove it at the moment, but I suspect that what we're seeing here is an encoding compatibility decision that was made to align with Netscape at the time. Due to the longevity of this behavior, I'm not very keen on changing it unless you can prove a significant web compatibility problem with it.
It seems koi8-r also removes a few letters in favor of line-drawing characters. I wonder if just supporting koi8-ru would be sufficient.
Simon, Jungshik, Joshua, Henri, last year Travis expressed disinterest in changing Internet Explorer for this encoding. Are Chromium and Gecko willing to change their implementation to match Internet Explorer? Comment 0 describes the minor difference between the mapping in browsers.
Feel free to reopen this once someone can address the question in comment 9. Long live the status quo of the majority...
Hmm, I missed this bug. Without doing any research but purely based on comment 0 and comment 7, I don't see a big issue with changing those two (0xAE, 0xBE). I'm not sure if it's worthwhile to get to the bottom of it (data collection, etc.) as it gets less significant as time goes on.
Alright, let's change it then. IE's behavior does seem slightly better.
I did not change the name of the encoding per comment 4. https://github.com/whatwg/encoding/commit/52f08a6259d331197685c6b417ee753b817c5a79