HTML 4.01 Test Suite - Assertions
Testable Assertions: Section 5 HTML Document Representation
5 HTML Document Representation - Character sets, character encodings, and entities
(must) Names for character encodings are case-insensitive, so that for example "SHIFT_JIS", "Shift_JIS", and "shift_jis" are equivalent.
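As an illustration of this assertion, here is a minimal sketch of a case-insensitive comparison of encoding names. The helper name `same_encoding_name` is hypothetical and is not taken from the specification or the test suite.

```python
def same_encoding_name(a: str, b: str) -> bool:
    """Compare two character-encoding names without regard to letter case."""
    # Encoding names are ASCII, so lowercasing both sides is sufficient here.
    return a.strip().lower() == b.strip().lower()


names = ["SHIFT_JIS", "Shift_JIS", "shift_jis"]
# Every spelling should be treated as the same encoding.
print(all(same_encoding_name(names[0], n) for n in names))
```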
(must) Conforming user agents must correctly map to ISO 10646 all characters in any character encodings that they recognize (or they must behave as if they did).
(must) Because some servers don't allow a "charset" parameter to be sent, and others may not be configured to send the parameter, user agents must not assume any default value for the "charset" parameter when it is absent from the HTTP "Content-Type" header field.
(must) Conforming user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):
1. An HTTP "charset" parameter in a "Content-Type" field.
2. A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
3. The charset attribute set on an element that designates an external resource.
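The priority order above can be sketched as a simple first-match lookup. The function name `choose_encoding` and its three parameters are assumptions made for illustration. A real user agent would obtain these values from the HTTP response, the parsed META element, and the referring element. When no source declares an encoding, the sketch returns `None` instead of assuming a default.

```python
from typing import Optional


def choose_encoding(http_charset: Optional[str],
                    meta_charset: Optional[str],
                    link_charset: Optional[str]) -> Optional[str]:
    """Return the declared encoding with the highest priority, or None."""
    # Order matters: HTTP header first, then META, then the charset
    # attribute on the element that referenced the resource.
    for candidate in (http_charset, meta_charset, link_charset):
        if candidate:
            # Normalize case, since encoding names are case-insensitive.
            return candidate.lower()
    # Nothing declared: no default value is assumed here.
    return None


print(choose_encoding(None, "EUC-JP", "ISO-8859-1"))
```

Heuristics or a user-configured local default, where a user agent chooses to apply them, would come into play only after this function returns `None`.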
(may) In addition to this list of priorities, the user agent may use heuristics and user settings. For example, many user agents use a heuristic to distinguish the various encodings used for Japanese text. Also, user agents typically have a user-definable, local default character encoding which they apply in the absence of other indicators.
(may) User agents may provide a mechanism that allows users to override incorrect "charset" information. However, if a user agent offers such a mechanism, it should only offer it for browsing and not for editing, to avoid the creation of Web pages marked with an incorrect "charset" parameter.
(must) Numeric character references specify the code position of a character in the document character set. Numeric character references may take two forms:
- The syntax "&#D;", where D is a decimal number, refers to the ISO 10646 decimal character number D.
- The syntax "&#xH;" or "&#XH;", where H is a hexadecimal number, refers to the ISO 10646 hexadecimal character number H. Hexadecimal numbers in numeric character references are case-insensitive.
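The two forms can be illustrated with a small decoder. This is a sketch only: the name `decode_numeric_refs` is hypothetical, and the choice to leave out-of-range references untouched is an assumption, not behavior required by the assertion.

```python
import re

# Matches "&#D;" (decimal) and "&#xH;" / "&#XH;" (hexadecimal).
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_refs(text: str) -> str:
    """Replace numeric character references with the characters they name."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        # int(..., 16) accepts upper- and lowercase hex digits alike.
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code > 0x10FFFF:
            # Assumption: leave references beyond the code space unchanged.
            return match.group(0)
        return chr(code)

    return _NUMERIC_REF.sub(replace, text)


# All three spellings refer to the same character, code position 229 (0xE5).
print(decode_numeric_refs("&#229; &#xE5; &#Xe5;"))
```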
(must) Character entity references are case-sensitive.
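A quick way to observe this case sensitivity is with Python's standard `html.unescape`. Note that this function follows HTML5 parsing rules rather than HTML 4.01, so it serves here only as an illustration for entity names that exist in both.

```python
import html

lower = html.unescape("&aring;")  # LATIN SMALL LETTER A WITH RING ABOVE
upper = html.unescape("&Aring;")  # LATIN CAPITAL LETTER A WITH RING ABOVE

# The two references differ only in case, yet name different characters.
print(lower, upper, lower != upper)
```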
(must) Four character entity references deserve special mention since they are frequently used to escape special characters:
- "&lt;" represents the < sign.
- "&gt;" represents the > sign.
- "&amp;" represents the & sign.
- "&quot;" represents the " mark.
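The four references above are typically applied when escaping text for inclusion in markup. A minimal sketch follows, assuming a hypothetical helper named `escape_markup`. It handles only these four characters.

```python
def escape_markup(text: str) -> str:
    """Escape the four characters that are special in HTML markup."""
    # "&" must be replaced first; otherwise the ampersands introduced by
    # the later replacements would themselves be escaped a second time.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;"))


print(escape_markup('a < b & "c" > d'))
```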
(should) Depending on the implementation, undisplayable characters may also be handled by the underlying display system and not the application itself. In the absence of more sophisticated behavior, for example tailored to the needs of a particular script or language, we recommend the following behavior for user agents:
- Adopt a clearly visible, but unobtrusive mechanism to alert the user of missing resources.
- If missing characters are presented using their numeric representation, use the hexadecimal (not decimal) form since this is the form used in character set standards.
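One possible way to present a missing character in hexadecimal form is sketched below. The bracketed "U+" notation and the helper name `missing_char_placeholder` are assumptions chosen for illustration. The recommendation itself only asks for hexadecimal rather than decimal.

```python
def missing_char_placeholder(ch: str) -> str:
    """Return a visible hexadecimal placeholder for an undisplayable character."""
    # Uppercase hex, padded to at least four digits, as is common in
    # character set standards.
    return f"[U+{ord(ch):04X}]"


print(missing_char_placeholder("\u6c34"))
```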