HTML Testable Assertions Section 5 HTML Document Representation

Assertion 5.1-1

Reference: Section 5.1

(author)(must) To promote interoperability, SGML requires that each application (including HTML) specify its document character set.

Tests: None

Assertion 5.1-2

Reference: Section 5.1

(author)(must) User agents must also know the specific character encoding that was used to transform the document character stream into a byte stream.

Tests: None

Assertion 5.2-1

Reference: Section 5.2

(author)(may) Authoring tools (e.g., text editors) may encode HTML documents in the character encoding of their choice, and the choice largely depends on the conventions used by the system software. These tools may employ any convenient encoding that covers most of the characters contained in the document, provided the encoding is correctly labeled.

Tests: None

Assertion 5.2.1-1

Reference: Section 5.2.1

(must) Names for character encodings are case-insensitive, so that for example "SHIFT_JIS", "Shift_JIS", and "shift_jis" are equivalent.
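
As an illustration only, the following Python sketch compares encoding labels through the standard `codecs` registry. The helper name `same_encoding` is hypothetical and not part of the test suite. Note that Python's lookup also resolves aliases, which goes beyond the case-insensitivity this assertion requires.

```python
import codecs


def same_encoding(name_a: str, name_b: str) -> bool:
    """Return True when both labels resolve to the same registered codec.

    Hypothetical helper: codecs.lookup() normalizes case (and aliases)
    before searching the registry.
    """
    return codecs.lookup(name_a).name == codecs.lookup(name_b).name


labels = ["SHIFT_JIS", "Shift_JIS", "shift_jis"]
# Every spelling should resolve to the same codec as the first one.
print(all(same_encoding(labels[0], other) for other in labels))
```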

Tests: None

Assertion 5.2.1-2

Reference: Section 5.2.1

(must) Conforming user agents must correctly map to ISO 10646 all characters in any character encodings that they recognize (or they must behave as if they did).
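
A minimal sketch of what this mapping means in practice, assuming Python's built-in codecs stand in for a user agent's decoders: the same character stored in two different encodings should decode to the same ISO 10646 code position. The function name `code_points` is an assumption made for this example.

```python
def code_points(data: bytes, encoding: str) -> list[str]:
    """Decode bytes and list the resulting code positions as U+XXXX."""
    text = data.decode(encoding)
    return [f"U+{ord(ch):04X}" for ch in text]


# LATIN SMALL LETTER E WITH ACUTE in two encodings, one code position.
print(code_points(b"\xe9", "iso-8859-1"))
print(code_points(b"\xc3\xa9", "utf-8"))
```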

Tests: None

Assertion 5.2.2-1

Reference: Section 5.2.2

(must) Because some servers don't allow a "charset" parameter to be sent, and others may not be configured to send the parameter, user agents must not assume any default value for the "charset" parameter when it is absent from the HTTP "Content-Type" header field.
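
One way to picture this requirement is a header parser that reports "no charset" rather than substituting a default. The sketch below uses Python's standard `email.message.Message` to parse the field. The function name `http_charset` is hypothetical, and returning `None` is this example's way of signalling an absent parameter.

```python
from email.message import Message
from typing import Optional


def http_charset(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None.

    No fallback value is substituted when the header or the parameter
    is missing; the caller must consult other sources.
    """
    if content_type is None:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None
    # when there is no charset parameter.
    return msg.get_content_charset()


print(http_charset("text/html; charset=ISO-8859-1"))
print(http_charset("text/html"))
```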

Tests: None

Assertion 5.2.2-2

Reference: Section 5.2.2

(author)(may) HTML documents may include explicit information about the document's character encoding; the META element can be used to provide user agents with this information.

Tests: None

Assertion 5.2.2-3

Reference: Section 5.2.2

(author)(must) The META declaration must only be used when the character encoding is organized such that ASCII-valued bytes stand for ASCII characters (at least until the META element is parsed).
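
The ASCII restriction exists because a user agent has to find the META declaration before it knows the encoding. A rough sketch of such a pre-scan follows, assuming a simplified regular expression that expects `http-equiv` before `content` and inspects only the first `limit` bytes. The names `META_CHARSET`, `meta_charset`, and `limit` are hypothetical, and real user agents use more robust parsing.

```python
import re
from typing import Optional

# Simplified pattern: matches ASCII bytes only, http-equiv listed first.
META_CHARSET = re.compile(
    rb"""<meta[^>]*http-equiv\s*=\s*["']?content-type["']?[^>]*charset\s*=\s*["']?([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)


def meta_charset(document: bytes, limit: int = 1024) -> Optional[str]:
    """Scan the start of the byte stream for a META charset label."""
    match = META_CHARSET.search(document[:limit])
    return match.group(1).decode("ascii") if match else None


markup = '<HEAD><META http-equiv="Content-Type" content="text/html; charset=EUC-JP">'
print(meta_charset(markup.encode("ascii")))   # label found
print(meta_charset(markup.encode("utf-16")))  # not ASCII-compatible: None
```

The second call shows why the assertion matters: in an encoding where ASCII-valued bytes do not stand for ASCII characters, this byte-level scan cannot see the declaration at all.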

Tests: None

Assertion 5.2.2-4

Reference: Section 5.2.2

(author)(should) META declarations should appear as early as possible in the HEAD element.

Tests: None

Assertion 5.2.2-5

Reference: Section 5.2.2

(must) Conforming user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):

1. An HTTP "charset" parameter in a "Content-Type" field.
2. A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
3. The charset attribute set on an element that designates an external resource.
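
The priority order above can be sketched as a simple first-match selection. The function and parameter names are assumptions for illustration; each argument is the label obtained from the corresponding source, or `None` when that source supplied nothing.

```python
from typing import Optional


def choose_encoding(
    http_charset: Optional[str],
    meta_charset: Optional[str],
    link_charset: Optional[str],
) -> Optional[str]:
    """Pick the encoding label from the highest-priority source present."""
    for candidate in (http_charset, meta_charset, link_charset):
        if candidate:
            # Encoding names are case-insensitive, so normalize the label.
            return candidate.lower()
    return None  # nothing declared; heuristics or user settings may apply


print(choose_encoding("ISO-8859-1", "EUC-JP", None))
print(choose_encoding(None, "EUC-JP", "Shift_JIS"))
```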

Tests: None

Assertion 5.2.2-6

Reference: Section 5.2.2

(may) In addition to this list of priorities, the user agent may use heuristics and user settings. For example, many user agents use a heuristic to distinguish the various encodings used for Japanese text. Also, user agents typically have a user-definable, local default character encoding which they apply in the absence of other indicators.
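
As a deliberately simplistic illustration of a heuristic with a user default, the sketch below tries a list of candidate encodings in order and falls back to a user-configured one. This trial-decoding approach is an assumption chosen for brevity, not a description of how any particular user agent works. The names `guess_encoding`, `candidates`, and `user_default` are hypothetical.

```python
from typing import Sequence


def guess_encoding(data: bytes, candidates: Sequence[str], user_default: str) -> str:
    """Return the first candidate that decodes the data without error.

    Falls back to the user's local default when no candidate fits.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return user_default


sample = "\u65e5\u672c\u8a9e".encode("utf-8")
print(guess_encoding(sample, ["ascii", "utf-8"], "iso-8859-1"))
```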

Tests: None

Assertion 5.2.2-7

Reference: Section 5.2.2

(may) User agents may provide a mechanism that allows users to override incorrect "charset" information. However, if a user agent offers such a mechanism, it should only offer it for browsing and not for editing, to avoid the creation of Web pages marked with an incorrect "charset" parameter.

Tests: None

Assertion 5.3.1-1

Reference: Section 5.3.1

(must) Numeric character references specify the code position of a character in the document character set. Numeric character references may take two forms:

The syntax "&#D;", where D is a decimal number, refers to the ISO 10646 decimal character number D.
The syntax "&#xH;" or "&#XH;", where H is a hexadecimal number, refers to the ISO 10646 hexadecimal character number H. Hexadecimal numbers in numeric character references are case-insensitive.
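
The two forms above can be illustrated with a small expander written for this example. The names `NUMERIC_REF` and `expand_numeric_refs` are hypothetical, and the sketch assumes well-formed references terminated by a semicolon.

```python
import re

# Group 1 captures hexadecimal digits, group 2 captures decimal digits.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def expand_numeric_refs(text: str) -> str:
    """Replace decimal and hexadecimal character references with characters."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # chr() raises ValueError for numbers outside the Unicode range.
        return chr(code)

    return NUMERIC_REF.sub(replace, text)


# Decimal 229 and hexadecimal E5 (in either case) name the same character.
print(expand_numeric_refs("&#229; &#xE5; &#Xe5;"))
```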

Tests: 5_3_1-BF-01.html

Assertion 5.3.2-1

Reference: Section 5.3.2

(must) Character entity references are case-sensitive.
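
For instance, "&Aring;" and "&aring;" name different characters. The sketch below checks this against Python's standard `html.entities.name2codepoint` table, which lists the HTML 4 entity names. The helper `entity_code_point` is hypothetical.

```python
from html.entities import name2codepoint
from typing import Optional


def entity_code_point(name: str) -> Optional[int]:
    """Look up an entity name exactly as written (case-sensitive)."""
    return name2codepoint.get(name)


for name in ("Aring", "aring", "ARING"):
    code = entity_code_point(name)
    print(name, "->", "undefined" if code is None else f"U+{code:04X}")
```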

Tests: 5_3_1-BF-01.html

Assertion 5.3.2-2

Reference: Section 5.3.2

(must) Four character entity references deserve special mention since they are frequently used to escape special characters:

"&lt;" represents the < sign.
"&gt;" represents the > sign.
"&amp;" represents the & sign.
"&quot;" represents the " mark.
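
A short sketch of escaping with these four references, using Python's standard `html.escape` and `html.unescape`. The wrapper name `escape_markup` is an assumption for this example. Note that `html.escape` also rewrites the apostrophe, which is outside the four references listed here.

```python
import html


def escape_markup(text: str) -> str:
    """Escape <, >, & and the double quote so text can appear in markup."""
    return html.escape(text, quote=True)


source = '<a href="x">R&D</a>'
escaped = escape_markup(source)
print(escaped)
# Unescaping restores the original text.
print(html.unescape(escaped) == source)
```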

Tests: 5_3_2-BF-01.html

Assertion 5.4-1

Reference: Section 5.4

(should) Depending on the implementation, undisplayable characters may also be handled by the underlying display system and not the application itself. In the absence of more sophisticated behavior, for example tailored to the needs of a particular script or language, we recommend the following behavior for user agents:

Adopt a clearly visible, but unobtrusive mechanism to alert the user of missing resources.
If missing characters are presented using their numeric representation, use the hexadecimal (not decimal) form since this is the form used in character set standards.
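
The hexadecimal recommendation can be sketched as follows. The `can_display` predicate stands in for whatever font or glyph check a real user agent performs. It and the names `missing_char_placeholder` and `render` are assumptions made for this example.

```python
from typing import Callable


def missing_char_placeholder(ch: str) -> str:
    """Show an undisplayable character in hexadecimal reference form."""
    return f"&#x{ord(ch):X};"


def render(text: str, can_display: Callable[[str], bool]) -> str:
    """Keep displayable characters; substitute a hex placeholder otherwise."""
    return "".join(ch if can_display(ch) else missing_char_placeholder(ch) for ch in text)


# Pretend only ASCII glyphs are available on this display.
print(render("a\u6c34", str.isascii))
```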

Tests: 5_3_1-BF-01.html

HTML 4.01 Test Suite - Assertions

5 HTML Document Representation - Character sets, character encodings, and entities
