<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>17053</bug_id>
          
          <creation_ts>2012-05-14 22:23:43 +0000</creation_ts>
          <short_desc>Support KOI8-RU mapping for KOI8-U</short_desc>
          <delta_ts>2015-08-19 13:36:56 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WHATWG</product>
          <component>Encoding</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>Windows 3.1</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>Unsorted</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter>pub-w3</reporter>
          <assigned_to name="Anne">annevk</assigned_to>
          <cc>adrianba</cc>
    
    <cc>ap</cc>
    
    <cc>hsivonen</cc>
    
    <cc>jsbell</cc>
    
    <cc>jshin</cc>
    
    <cc>mike</cc>
    
    <cc>smontagu</cc>
    
    <cc>travil</cc>
    
    <cc>www-international</cc>
          
          <qa_contact>sideshowbarker+encodingspec</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>67697</commentid>
    <comment_count>0</comment_count>
    <who name="">pub-w3</who>
    <bug_when>2012-05-14 22:23:43 +0000</bug_when>
    <thetext>IE takes the labels koi8-u and koi8-ru to mean KOI8-RU and not KOI8-U.  The difference is that KOI8-RU has an additional letter needed for Byelorussian,

AE:  U+045E  (ў) and
BE:  U+040E  (Ў),

where KOI8-U has line-drawing characters,

AE:  U+255D  (╝) and
BE:  U+256C  (╬).
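
For illustration, the difference can be sketched in Python. This is only a sketch under stated assumptions: decode_koi8_ru is a hypothetical helper name, and it assumes that the two bytes listed above are the only difference that matters between the two encodings. It relies on the built-in koi8_u codec in Python and swaps the two code points after decoding.

```python
# Sketch only: emulate the KOI8-RU reading of 0xAE/0xBE on top of KOI8-U.
# Assumption: these two bytes are the only difference that matters here.
KOI8_RU_OVERRIDES = {
    0x255D: 0x045E,  # byte 0xAE: box-drawing character becomes U+045E
    0x256C: 0x040E,  # byte 0xBE: box-drawing character becomes U+040E
}


def decode_koi8_ru(data):
    """Hypothetical helper: decode as KOI8-U, then apply the two overrides."""
    # Each of the two box-drawing characters appears at exactly one byte
    # position in KOI8-U, so translating after decoding is equivalent to
    # remapping the bytes 0xAE and 0xBE before decoding.
    return data.decode("koi8_u").translate(KOI8_RU_OVERRIDES)


if __name__ == "__main__":
    sample = bytes([0xAE, 0xBE])
    print(sample.decode("koi8_u"))  # KOI8-U reading of the two bytes
    print(decode_koi8_ru(sample))   # KOI8-RU reading of the same bytes
```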

Letters are arguably more important than box-drawing characters, so KOI8-RU might be a better choice than KOI8-U, at least if it can be shown that koi8-(r)u is used for Byelorussian (i.e., that AE/BE are used to encode ў/Ў).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>76032</commentid>
    <comment_count>1</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2012-10-11 20:17:58 +0000</bug_when>
    <thetext>KOI8-RU is not one of the aliases supported by ICU (see &lt;http://demo.icu-project.org/icu-bin/convexp&gt;). Is the encoding itself supported by ICU? Having such back-end support would be useful for getting it supported in browsers.

I don&apos;t have any data on real life use of this encoding.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>76928</commentid>
    <comment_count>2</comment_count>
    <who name="">pub-w3</who>
    <bug_when>2012-10-22 21:30:40 +0000</bug_when>
    <thetext>(In reply to comment #1)
&gt; Is [KOI8-RU] supported by ICU?

Apparently not:

$ grep &apos;042F.*F1&apos; mappings/* 
mappings/ibm-1168_P100-2002.ucm:&lt;U042F&gt; \xF1 |0
mappings/ibm-878_P100-1996.ucm:&lt;U042F&gt; \xF1 |0

All KOI8 encodings encode the basic modern Russian letters identically.  In particular, Я (U+042F) is encoded as 0xF1.  Only the mapping files for KOI8-R (IBM-878) and KOI8-U (IBM-1168) match, so ICU does not support KOI8-RU.

&gt; I don&apos;t have any data on real life use of this encoding.

Have you looked for 0xAE bytes in data labelled KOI8-U (or possibly KOI8-R)?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>78412</commentid>
    <comment_count>3</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2012-11-16 14:32:21 +0000</bug_when>
    <thetext>Does IE also report koi8-ru as the encoding name (via the DOM)? I suppose if IE does this it might be more compatible, although IE is not dominant in that region (afaik).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>79758</commentid>
    <comment_count>4</comment_count>
    <who name="">pub-w3</who>
    <bug_when>2012-12-08 15:23:29 +0000</bug_when>
    <thetext>(In reply to comment #3)
&gt; Does IE also report koi8-ru as the encoding name (via the DOM)?

document.charset returns koi8-u in IE9.

The encoding vector appears to have been changed from KOI8-U to KOI8-RU at some point between IE6 and IE9.  I assume this would not have happened in the absence of KOI8-RU content labelled as KOI8-U, but this may not be an issue for current Web content.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>92949</commentid>
    <comment_count>5</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2013-09-04 09:32:28 +0000</bug_when>
    <thetext>Adrian, Travis, any idea here?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>92995</commentid>
    <comment_count>6</comment_count>
    <who name="Travis Leithead [MSFT]">travil</who>
    <bug_when>2013-09-04 17:58:51 +0000</bug_when>
    <thetext>I will need to have an encoding expert on our team look into this; offhand I don&apos;t know how prevalent this encoding is or why this change may have been made.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>93299</commentid>
    <comment_count>7</comment_count>
    <who name="Travis Leithead [MSFT]">travil</who>
    <bug_when>2013-09-12 17:32:22 +0000</bug_when>
    <thetext>We&apos;ve searched IE&apos;s code base, and found that we&apos;ve had this behavior since at least IE4. I can&apos;t prove it at the moment, but I suspect that what we&apos;re seeing here is an encoding compatibility decision that was made to align with Netscape at the time.

Due to the longevity of this behavior, I&apos;m not very keen on changing it unless you can prove a significant web compatibility problem with it.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>97453</commentid>
    <comment_count>8</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2013-12-11 16:32:38 +0000</bug_when>
    <thetext>It seems koi8-r also removes a few letters in favor of line-drawing characters. I wonder if just supporting koi8-ru would be sufficient.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>114489</commentid>
    <comment_count>9</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-11-04 15:07:57 +0000</bug_when>
    <thetext>Simon, Jungshik, Joshua, Henri, last year Travis expressed reluctance to change Internet Explorer for this encoding. Are Chromium and Gecko willing to change their implementations to match Internet Explorer?

Comment 0 describes the minor difference between the mappings in browsers.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>122656</commentid>
    <comment_count>10</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2015-08-19 11:15:12 +0000</bug_when>
    <thetext>Feel free to reopen this once someone can address the question in comment 9. Long live the status quo of the majority...</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>122660</commentid>
    <comment_count>11</comment_count>
    <who name="Jungshik Shin">jshin</who>
    <bug_when>2015-08-19 12:33:47 +0000</bug_when>
    <thetext>Hmm, I missed this bug. Without doing any research, but purely based on comment 0 and comment 7, I don&apos;t see a big issue with changing those two (0xAE, 0xBE). I&apos;m not sure it&apos;s worthwhile to get to the bottom of it (data collection, etc.) as it gets less significant as time goes on.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>122662</commentid>
    <comment_count>12</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2015-08-19 12:57:55 +0000</bug_when>
    <thetext>Alright, let&apos;s change it then. IE&apos;s behavior does seem slightly better.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>122663</commentid>
    <comment_count>13</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2015-08-19 13:36:56 +0000</bug_when>
    <thetext>I did not change the name of the encoding per comment 4.

https://github.com/whatwg/encoding/commit/52f08a6259d331197685c6b417ee753b817c5a79</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>