This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 16697 - Indexes: additional gbk mappings
Summary: Indexes: additional gbk mappings
Status: RESOLVED DUPLICATE of bug 16862
Alias: None
Product: WHATWG
Classification: Unclassified
Component: Encoding
Version: unspecified
Hardware: PC Windows 3.1
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+encodingspec
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-11 07:39 UTC by Anne
Modified: 2013-12-16 16:09 UTC
CC List: 3 users

Description Anne 2012-04-11 07:39:31 UTC
Comparing your GBK index to my tables reveals a few differences:
 
> 6 57 coq only FE10
> 6 58 coq only FE12
> 6 59 coq only FE11
> 6 60 coq only FE13
> 6 61 coq only FE14
> 6 62 coq only FE15
> 6 63 coq only FE16
> 6 76 coq only FE17
> 6 77 coq only FE18
> 6 83 coq only FE19
 
Vertical variants of punctuation marks in GBK/1 (GBK additions to GB2312, Row 6) are missing from the index. These were apparently missing from the original GBK standard.
 
> 8 28 coq only 1E3F
 
Another GBK/1 addition (ḿ).
 
> 203 96 annevk only 3000
 
An additional codepoint for ideographic space that is missing from my tables. This looks a bit random, but (at least some) browsers do this, so I guess it is needed. More information would be nice.
 
> 294 18 coq only 20087
> 294 19 coq only 20089
> 294 20 coq only 200CC
> 294 26 coq only 9FB4
> 294 34 coq only 9FB5
> 294 39 coq only 9FB6
> 294 40 coq only 9FB7
> 294 45 coq only 215D7
> 294 46 coq only 9FB8
> 294 55 coq only 2298F
> 294 63 coq only 9FB9
> 294 80 coq only 9FBA
> 294 81 coq only 241FE
> 294 96 coq only 9FBB
 
This is the entire Unihan G9 repertoire, 14 of the 101 non-Unicode 1.0 hanzi included at the end of GBK/4.
 
Apart from the ideographic space, the codepoints listed above all mapped to PUA or FFFD in browsers when I last checked (cf. <http://coq.no/character-tables/chinese-simplified/en> under GBK), but they render as expected in IE and should probably be added to the index.
 
Øistein
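
The row/cell pairs in the listing above can be related to GBK byte pairs with a small helper. This is only a sketch: the numbering rule is an assumption inferred by matching the listing against the byte values quoted in comments 1 and 4 of this bug (for example `6 57` against A6 D9 and `294 18` against FE 51), not taken from any documentation of the tables. The function name `rowcell_to_gbk_bytes` is hypothetical.

```python
def rowcell_to_gbk_bytes(row, cell):
    """Convert a row/cell pair from the listing to (lead, trail) GBK bytes.

    Assumed rule (inferred from the byte values in this bug, not documented):
      * rows 1-94, cells 1-94: GB2312-style, each byte is 0xA0 + number
      * rows 201-294, cells 1-96: same lead bytes as rows 1-94, but the
        trail byte runs over 0x40-0x7E and 0x80-0xA0 (0x7F is skipped)
    """
    if 1 <= row <= 94 and 1 <= cell <= 94:
        return 0xA0 + row, 0xA0 + cell
    if 201 <= row <= 294 and 1 <= cell <= 96:
        lead = 0xA0 + (row - 200)
        # Cells 1-63 map to 0x40-0x7E; cells 64-96 map to 0x80-0xA0.
        trail = 0x3F + cell if cell <= 63 else 0x40 + cell
        return lead, trail
    raise ValueError("row/cell outside the ranges this sketch covers")


if __name__ == "__main__":
    for row, cell in [(6, 57), (8, 28), (294, 18), (294, 96)]:
        lead, trail = rowcell_to_gbk_bytes(row, cell)
        print(f"{row} {cell} -> {lead:02X} {trail:02X}")
```

Under the same assumed rule, `203 96` would correspond to the byte pair A3 A0, which would be consistent with it being an extra mapping for the ideographic space, but that pairing is not spelled out anywhere in this bug.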
Comment 1 pub-w3 2012-04-25 19:41:26 UTC
A more useful list of the missing characters, including the GBK/GB18030 encoding, the old PUA mapping, and the new non-PUA mapping:

A6 D9  U+E78D  U+FE10  ︐
A6 DA  U+E78E  U+FE12  ︒
A6 DB  U+E78F  U+FE11  ︑
A6 DC  U+E790  U+FE13  ︓
A6 DD  U+E791  U+FE14  ︔
A6 DE  U+E792  U+FE15  ︕
A6 DF  U+E793  U+FE16  ︖
A6 EC  U+E794  U+FE17  ︗
A6 ED  U+E795  U+FE18  ︘
A6 F3  U+E796  U+FE19  ︙

A8 BC  U+E7C7  U+1E3F  ḿ

FE 51  U+E816  U+20087  
Comment 2 Anne 2012-04-25 19:47:15 UTC
What about the code points listed after U+20087 in comment 0?
Comment 3 pub-w3 2012-04-25 19:49:31 UTC
FE 51  U+E816  U+20087  
Comment 4 pub-w3 2012-04-25 19:51:26 UTC
W3C cannot handle astral characters, apparently.  :-(

FE 51  U+E816  U+20087  [astral]
FE 52  U+E817  U+20089  [astral]
FE 53  U+E818  U+200CC  [astral]
FE 59  U+E81E  U+9FB4  龴
FE 61  U+E826  U+9FB5  龵
FE 66  U+E82B  U+9FB6  龶
FE 67  U+E82C  U+9FB7  龷
FE 6C  U+E831  U+215D7  [astral]
FE 6D  U+E832  U+9FB8  龸
FE 76  U+E83B  U+2298F  [astral]
FE 7E  U+E843  U+9FB9  龹
FE 90  U+E854  U+9FBA  龺
FE 91  U+E855  U+241FE  [astral]
FE A0  U+E864  U+9FBB  龻
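
For turning rows like the ones above into index entries, the following sketch parses a row and computes the two-byte index pointer. The pointer arithmetic follows the formula used by the Encoding Standard's gb18030 decoder as I understand it (190 trail positions per lead byte, with 0x7F skipped); the helper names `gbk_pointer`, `parse_row`, `is_pua` and `is_astral` are hypothetical and introduced only for illustration. It does not consult any actual index file.

```python
def gbk_pointer(lead, trail):
    """Compute the index pointer for a two-byte GBK sequence.

    Assumes the Encoding Standard arithmetic:
    pointer = (lead - 0x81) * 190 + (trail - offset),
    where offset is 0x40 for trail bytes below 0x7F and 0x41 otherwise.
    """
    if not 0x81 <= lead <= 0xFE:
        raise ValueError("lead byte out of range")
    if not (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFE):
        raise ValueError("trail byte out of range")
    offset = 0x40 if trail < 0x7F else 0x41
    return (lead - 0x81) * 190 + (trail - offset)


def parse_row(line):
    """Parse 'FE 51  U+E816  U+20087 ...' into (lead, trail, old_cp, new_cp)."""
    fields = line.split()
    lead, trail = int(fields[0], 16), int(fields[1], 16)
    old_cp = int(fields[2][2:], 16)  # strip the 'U+' prefix
    new_cp = int(fields[3][2:], 16)
    return lead, trail, old_cp, new_cp


def is_pua(cp):
    """True for code points in the BMP Private Use Area (U+E000-U+F8FF)."""
    return 0xE000 <= cp <= 0xF8FF


def is_astral(cp):
    """True for code points outside the Basic Multilingual Plane."""
    return cp > 0xFFFF


if __name__ == "__main__":
    rows = [
        "A6 D9  U+E78D  U+FE10",
        "FE 51  U+E816  U+20087  [astral]",
        "FE 59  U+E81E  U+9FB4",
    ]
    for row in rows:
        lead, trail, old_cp, new_cp = parse_row(row)
        note = " (astral)" if is_astral(new_cp) else ""
        print(f"pointer {gbk_pointer(lead, trail)}: "
              f"U+{old_cp:04X} -> U+{new_cp:04X}{note}")
```

Working with numeric code points rather than the literal characters sidesteps the problem seen in comments 1 and 3, where the astral characters did not survive posting.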
Comment 5 Anne 2013-12-13 15:22:25 UTC
What about U+E7C9 from bug 21145? But yeah, I need to fix this mess.
Comment 6 Anne 2013-12-16 16:09:53 UTC

*** This bug has been marked as a duplicate of bug 16862 ***