Web Internationalization

Web Text Processing Methods

May 18, 2004

Martin J. Dürst (テュールスト マーティン ヤコブ)

Internationalization Activity Lead, W3C

[this document is at http://www.w3.org/People/D%c3%bcrst/SFC/2004/0418Hagino.html]

Overview

Internationalization and Localization

Data Representation

Representing Text (single byte)

Representing Text (multibyte)

Representing Text (model)

Japanese Character Encodings

An End to Encoding Confusion?

Unicode: One Code Table, Several Encodings

UTF-8 Patterns

bytes  1st byte   2nd byte   3rd byte   4th byte   payload bits
1      0xxx xxxx                                   7
2      110x xxxx  10xx xxxx                        8-11
3      1110 xxxx  10xx xxxx  10xx xxxx             12-16
4      1111 0xxx  10xx xxxx  10xx xxxx  10xx xxxx  17-21

No overlong encodings! Each character must use the shortest pattern that fits it; longer forms cause security problems.
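The byte patterns above can be tried out with a small sketch. It is only an illustration, not a production codec: the function names `utf8_encode`, `utf8_payload`, and `is_overlong` are hypothetical and do not come from the slides. The encoder fills the `x` positions of the matching pattern with the bits of the code point. The overlong check extracts the payload again and asks whether a shorter pattern would have been enough.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point by filling the x bits of the matching pattern."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:        # up to 7 payload bits: 0xxx xxxx
        return bytes([cp])
    if cp < 0x800:       # 8-11 payload bits: 110x xxxx 10xx xxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 12-16 payload bits: 1110 xxxx 10xx xxxx 10xx xxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 17-21 payload bits: 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8_payload(seq: bytes) -> int:
    """Return the payload bits of one sequence that matches a table pattern."""
    first = seq[0]
    if first < 0x80:
        length, value = 1, first
    elif first & 0xE0 == 0xC0:
        length, value = 2, first & 0x1F
    elif first & 0xF0 == 0xE0:
        length, value = 3, first & 0x0F
    elif first & 0xF8 == 0xF0:
        length, value = 4, first & 0x07
    else:
        raise ValueError("invalid first byte")
    if len(seq) != length:
        raise ValueError("wrong sequence length")
    for b in seq[1:]:
        if b & 0xC0 != 0x80:
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (b & 0x3F)
    return value


def is_overlong(seq: bytes) -> bool:
    """True if the payload would have fit into a shorter pattern."""
    value = utf8_payload(seq)
    minimal = 1 if value < 0x80 else 2 if value < 0x800 else 3 if value < 0x10000 else 4
    return len(seq) > minimal


print(utf8_encode(0xFC).hex())      # the "ü" in Dürst
print(is_overlong(b"\xc0\xaf"))     # "/" (U+002F) squeezed into the 2-byte pattern
```

The second call shows why overlong forms are a security concern. The bytes C0 AF carry the payload of "/", so a filter that only looks for the single byte 0x2F would miss them, while a lenient decoder would still turn them into a slash. Strict decoders, including Python's built-in `bytes.decode("utf-8")`, reject such sequences.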

UTF-8 Example

Character Encoding in HTML/XML

Indicating Character Encoding on the Web

Kanji Unification (包摂)

Kanji Unification Guidelines

General Criticism of Unicode (history?)

Kanji-related Criticism of Unicode (history?)

Maybe the Web can offer a solution?