
Character sets, coded character sets, and encodings

It is important to distinguish clearly between the concepts of character set and character encoding.

A character set or repertoire comprises the set of characters one might use for a particular purpose – be it those required to support Western European languages in computers, or those a Chinese child will learn at school in the third grade (nothing to do with computers).

A coded character set is a set of characters in which each character has been assigned a unique number. These numbers are known as code points. For example, the code point for the letter à in the Unicode coded character set is 224 in decimal, or E0 in hexadecimal notation. (Note that hexadecimal notation is commonly used for identifying such characters, and will be used here.)
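As a rough illustration of the relationship between a character and its code point, the following Python sketch looks up the Unicode code point of a character and prints it in both notations. The helper name describe_code_point is made up for this example; ord, chr and unicodedata.name are standard Python.

```python
import unicodedata


def describe_code_point(ch: str) -> str:
    """Return the code point of a single character in hex and decimal,
    followed by its Unicode character name."""
    cp = ord(ch)  # the code point as an integer
    return f"U+{cp:04X} (decimal {cp}) {unicodedata.name(ch)}"


# "\u00e0" is the letter a with grave accent, written as an escape so the
# example does not depend on how this file itself is encoded.
print(describe_code_point("\u00e0"))
# prints: U+00E0 (decimal 224) LATIN SMALL LETTER A WITH GRAVE

# The mapping also works in the other direction: number to character.
assert chr(0xE0) == "\u00e0"
```

Nothing here involves bytes yet: a code point is only a number assigned to a character.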

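A character encoding defines how the code points of a coded character set are mapped to bytes for manipulation in a computer. The same code point can map to different byte sequences in different encodings: the letter à is stored as the single byte E0 in ISO 8859-1, but as the two bytes C3 A0 in UTF-8.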
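To make the distinction between a code point and its encoded bytes concrete, here is a minimal Python sketch that encodes the same character with several encodings. The helper name encoded_bytes_hex and the choice of encodings are assumptions made for this example; it relies on bytes.hex with a separator, which needs Python 3.8 or later.

```python
def encoded_bytes_hex(ch: str, encoding: str) -> str:
    """Encode a character and return its bytes as space-separated hex."""
    return ch.encode(encoding).hex(" ").upper()


letter = "\u00e0"  # the letter a with grave accent

# One character and one code point, but different byte sequences.
for encoding in ("iso-8859-1", "utf-8", "utf-16-be"):
    print(f"{encoding:>10}: {encoded_bytes_hex(letter, encoding)}")
# prints:
# iso-8859-1: E0
#      utf-8: C3 A0
#  utf-16-be: 00 E0

# Decoding with the matching encoding recovers the original character.
assert bytes.fromhex("C3A0").decode("utf-8") == letter
```

The code point stays the same in all three cases; only the byte representation changes with the encoding.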

This explanation glosses over some of the detailed nomenclature related to encoding. More detail can be found in Unicode Technical Report #17.


Version: $Id: Slide0060.html,v 1.2 2006/02/02 07:54:31 rishida Exp $