<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>27235</bug_id>
          
          <creation_ts>2014-11-04 19:43:19 +0000</creation_ts>
          <short_desc>Bring back gbk encoder</short_desc>
          <delta_ts>2015-03-06 18:51:45 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WHATWG</product>
          <component>Encoding</component>
          <version>unspecified</version>
          <rep_platform>PC</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>Unsorted</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Anne">annevk</reporter>
          <assigned_to name="Anne">annevk</assigned_to>
          <cc>hsivonen</cc>
    
    <cc>jsbell</cc>
    
    <cc>jshin</cc>
    
    <cc>mike</cc>
    
    <cc>travil</cc>
    
    <cc>www-international</cc>
          
          <qa_contact>sideshowbarker+encodingspec</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>114501</commentid>
    <comment_count>0</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-11-04 19:43:19 +0000</bug_when>
    <thetext>Firefox ended up not following the plan from bug 16862 comment 18. Its gbk decoder is identical to its gb18030 decoder, but its gbk encoder per https://bugzilla.mozilla.org/show_bug.cgi?id=951691 is distinct.

So we should probably bring the gbk encoder back. When fixing this we should pay attention to the EURO sign and PUA code points. See

  https://bugzilla.mozilla.org/show_bug.cgi?id=951691#c16
  https://bugzilla.mozilla.org/show_bug.cgi?id=951691#c19

Having said that, if other browsers have meanwhile converged on not having a distinct gbk encoder, perhaps Firefox should revisit its approach. Input welcome.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>114506</commentid>
    <comment_count>1</comment_count>
    <who name="Joshua Bell">jsbell</who>
    <bug_when>2014-11-04 20:59:05 +0000</bug_when>
    <thetext>Data point: Chromium has NOT aligned with the Encoding standard here.

Our tracking bug is http://crbug.com/339862

As usual, Jungshik has a lot more context than I do, but we were definitely hesitant about trying to make this change.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>114714</commentid>
    <comment_count>2</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-11-08 10:22:47 +0000</bug_when>
    <thetext>Anticipated changes:

* Partially revert https://github.com/whatwg/encoding/commit/182ad9e607a7c6f0fa51d9dd6c638edaa5ec59fd to restore gb18030 as an independent encoding with a single label, and gbk as an independent encoding with nine labels.
* Map gbk&apos;s decoder to gb18030&apos;s decoder (no flags).
* Introduce a flag for gb18030&apos;s encoder that limits it to what gbk can output. (Still need to look into € and PUA.)
* Use that flag to define gbk&apos;s encoder.
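
A minimal sketch of the last two points, assuming Python's built-in gb18030 codec can stand in for the spec's encoder and that "what gbk can output" means no four-byte sequences. The is_gbk flag and the function names are hypothetical, and the € case is left out since it is still open:

```python
class GbkSubsetError(ValueError):
    """Raised when a code point has no representation in the gbk subset."""


def encode(text, is_gbk=False):
    """Encode text with Python's gb18030 codec, optionally limited to gbk.

    is_gbk is a hypothetical flag: when set, any code point that gb18030
    would emit as a four-byte sequence is rejected instead of encoded.
    """
    out = bytearray()
    for ch in text:
        encoded = ch.encode("gb18030")
        if is_gbk and len(encoded) == 4:
            raise GbkSubsetError("U+%04X needs four bytes" % ord(ch))
        out += encoded
    return bytes(out)


def decode(data):
    # Both encodings share one decoder in this plan, so no flag is needed.
    return data.decode("gb18030")
```

With the flag unset this behaves like a plain gb18030 encoder; with it set, one- and two-byte output is unchanged and only the four-byte range is refused.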

(Per that commit we apparently historically defined gb18030 in terms of gbk, but that doesn&apos;t make much sense. So now we&apos;ll define gbk as a subset of gb18030.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>114721</commentid>
    <comment_count>3</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-11-08 19:52:51 +0000</bug_when>
    <thetext>https://github.com/whatwg/encoding/commit/c8838716fc6f575f50506e5b82f12c434b5be6bb

(It turns out that gbk supports the same PUA code points as far as I can tell.)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>114737</commentid>
    <comment_count>4</comment_count>
    <who name="Jungshik Shin">jshin</who>
    <bug_when>2014-11-10 07:01:18 +0000</bug_when>
    <thetext>Sorry that I didn&apos;t get back here in a timely manner. I was out at internal/external conferences last week. Chromium was hesitant, but I had been considering merging gbk and gb18030 per the spec before the latest revision.

Moreover, the latest revision makes it a bit hard to implement GBK/GB18030 without touching ICU&apos;s gb18030 implementation (even though I agree with the approach: 1. decoding is identical for both encodings; 2. gbk encoding is limited to &apos;the gbk subset&apos;). I&apos;ve just read the latest revision and this is only my first thought. There might be an easier way. I&apos;ll give it more thought.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>118402</commentid>
    <comment_count>5</comment_count>
    <who name="Jungshik Shin">jshin</who>
    <bug_when>2015-03-06 18:51:45 +0000</bug_when>
    <thetext>I filed bug 28156 suggesting that GBK and GB18030 be completely separated even when decoding (toUnicode).</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>