Numeric character references always refer to the number of a character in the Unicode repertoire, no matter what encoding you use. For example, it is a common error for people working on a page encoded in Windows code page 1252 to try to represent the euro sign using &#x80;. They do this because the euro appears at position 80 (hexadecimal) in the Windows 1252 code page. However, &#x80; actually refers to a control character, because the escape is interpreted as the character at position 80 hex in the Unicode repertoire (U+0080). What was really needed was &#x20AC;, the Unicode code point of the euro sign.
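The following minimal Python sketch illustrates the mismatch. The helper name numeric_ref is hypothetical. The code decodes byte 0x80 with the standard cp1252 codec to show which Unicode character the Windows-1252 byte stands for. It then builds the correct reference from that character's code point. One caveat: HTML5 parsers treat &#x80; as a parse error and, as error recovery, display it as the euro sign. XML processors have no such rule, so the reference should not be relied on.

```python
import unicodedata

def numeric_ref(ch: str) -> str:
    """Hypothetical helper: hex numeric character reference from the character's Unicode code point."""
    return f"&#x{ord(ch):X};"

# Byte 0x80 in a Windows-1252 file decodes to the euro sign...
euro = bytes([0x80]).decode("cp1252")
print(unicodedata.name(euro))   # EURO SIGN
print(numeric_ref(euro))        # &#x20AC;

# ...but code point 0x80 in Unicode is a C1 control character, not the euro.
wrong = chr(0x80)
print(unicodedata.category(wrong))  # Cc
```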
Typically, when the Unicode Standard refers to or lists characters, it does so using a hexadecimal value. For instance, the code point for the letter á may be referred to as U+00E1. Given the prevalence of this convention, it is often useful, though not required, to use hexadecimal numeric values in escapes rather than decimal values. You do not need to use leading zeros in escapes: &#xE1; works just as well as &#x00E1;.
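As a small sketch, the following Python code writes the same character as a hexadecimal and a decimal escape. The function names hex_ref and dec_ref are illustrative only. The code then uses the standard library's html.unescape to check that the short hex form, the zero-padded hex form and the decimal form all resolve to the same character.

```python
import html

def hex_ref(ch: str) -> str:
    """Hexadecimal numeric character reference, written without leading zeros."""
    return f"&#x{ord(ch):X};"

def dec_ref(ch: str) -> str:
    """Decimal numeric character reference."""
    return f"&#{ord(ch)};"

a_acute = "\u00e1"  # U+00E1 LATIN SMALL LETTER A WITH ACUTE
print(hex_ref(a_acute))  # &#xE1;
print(dec_ref(a_acute))  # &#225;

# All three spellings denote the same code point.
for ref in (hex_ref(a_acute), "&#x00E1;", dec_ref(a_acute)):
    assert html.unescape(ref) == a_acute
```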
If you use entities (such as &aacute;) to represent characters, take care whenever your content is processed with XML tools or converted to XML. Apart from the five entities predefined in XML (&lt;, &gt;, &amp;, &quot; and &apos;), entities must be declared in a Document Type Definition (DTD) before they can be used. For this reason, it may be safer to use numeric character references.
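To illustrate the difference, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. It assumes a simple document whose root element is p. The parse_text helper is hypothetical and returns None when the parser rejects the input. With no DTD, the numeric reference parses but the named entity is rejected. Declaring the entity in an internal DTD subset makes it usable.

```python
import xml.etree.ElementTree as ET

def parse_text(doc: str):
    """Hypothetical helper: root element's text, or None if the XML parser rejects the document."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None

# A numeric reference needs no declaration.
numeric = parse_text("<p>&#xE1;</p>")
# An undeclared named entity is a well-formedness error in XML.
undeclared = parse_text("<p>&aacute;</p>")
# Declaring the entity in an internal DTD subset makes it usable.
declared = parse_text('<!DOCTYPE p [<!ENTITY aacute "&#xE1;">]><p>&aacute;</p>')

print(repr(numeric), repr(undeclared), repr(declared))
```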
Supplementary characters are those Unicode characters whose code points lie above the Basic Multilingual Plane (BMP), that is, above U+FFFF. In UTF-16, a supplementary character is encoded as a pair of 16-bit surrogate code units, whose values come from the range D800 to DFFF reserved within the BMP. Because of this, some people think that supplementary characters need to be represented using two escapes. This is incorrect: you must use the single scalar value for the character. For example, use &#x233B4; rather than &#xD84C;&#xDFB4;.
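The Python sketch below derives the surrogate pair for U+233B4 using the standard UTF-16 arithmetic. It cross-checks the result against Python's own UTF-16 encoder. It then uses the standard xml.etree.ElementTree parser to show that the escape with the single scalar value is accepted, while escapes for the two surrogates are rejected. The helper names utf16_surrogates and parse_text are hypothetical.

```python
import xml.etree.ElementTree as ET

def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Hypothetical helper: split a supplementary code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000                      # 20-bit offset above the BMP
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

cp = 0x233B4
hi, lo = utf16_surrogates(cp)
print(f"U+{cp:X} -> {hi:X} {lo:X}")  # U+233B4 -> D84C DFB4
# Cross-check against Python's own UTF-16 encoder.
assert "\U000233B4".encode("utf-16-be") == bytes.fromhex("D84CDFB4")

def parse_text(doc: str):
    """Hypothetical helper: root element's text, or None if the XML parser rejects it."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None

print(parse_text("<p>&#x233B4;</p>") == "\U000233B4")  # True
print(parse_text("<p>&#xD84C;&#xDFB4;</p>"))           # None
```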
Version: $Id: Slide0480.html,v 1.2 2006/02/02 07:54:32 rishida Exp $