Authoring Techniques for XHTML & HTML Internationalization 1.0
Checklist format

W3C Working Draft dd mmmm 2003

This version:
http://www.w3.org/International/Group/charmod-edit
Latest version:
http://www.w3.org/TR/2002/WD-charmod-20020430
Previous version:
http://www.w3.org/TR/2002/WD-charmod-20020430
Editor:
Richard Ishida, W3C <ishida@w3.org>

Table of contents

- Document structure & metadata
- Character sets, character encodings and entities
- Specifying the language of content
- Text direction
- Handling elements that vary by locale

Document structure & metadata

See detailed explanations...Creating an internationalised page header

  • Use the meta element in HTML documents to explicitly declare the document's character encoding.

  • The meta declaration must only be used when the character encoding is organized such that ASCII-valued bytes stand for ASCII characters (at least until the meta element is parsed).

  • meta declarations should appear as early as possible in the head element.

  • Use the lang and xml:lang attributes in the html tag.
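The following Python sketch illustrates the items above. Python is used only for illustration, and the function name page_header is hypothetical. It emits the start of an XHTML 1.0 page with the encoding declared in a meta element placed first in head, and with the primary language declared on the html element using both lang and xml:lang. It assumes the page is served as text/html. A page processed as XML would declare its encoding in the XML declaration instead.

```python
from html import escape

# XHTML 1.0 namespace URI, needed when the page is processed as XML.
XHTML_NS = "http://www.w3.org/1999/xhtml"


def page_header(title: str, lang: str = "en", charset: str = "UTF-8") -> str:
    """Return the opening of an internationalised XHTML 1.0 page (hypothetical helper).

    The charset meta element comes first in head, before the title, and the
    primary language is declared with both lang (HTML) and xml:lang (XML).
    """
    return (
        f'<html xmlns="{XHTML_NS}" lang="{lang}" xml:lang="{lang}">\n'
        "<head>\n"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}" />\n'
        f"<title>{escape(title)}</title>\n"
        "</head>"
    )


print(page_header("Café & bistro", lang="fr"))
```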

Character sets, character encodings and entities

See detailed explanations...Choosing an encoding

  • A document encoding SHOULD be chosen that maximizes the opportunity to represent characters directly and minimizes the need to represent them by markup means such as character escapes.

  • Encode web pages in UTF-8 unless there is a good reason not to.

  • Use IANA's preferred names for charset declarations.

  • Use character sets and encodings that are widely supported by the browsers and tools your users have.
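The first item can be illustrated with a minimal sketch that uses Python's built-in codecs. It counts how many characters of a sample text would need escapes in each candidate encoding, then picks the candidate that needs the fewest. The helper names and the candidate list are hypothetical. The candidates are written with IANA preferred names, which Python's codec lookup also accepts.

```python
def is_representable(ch: str, encoding: str) -> bool:
    """True if ch can be written directly (unescaped) in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def count_escapes_needed(text: str, encoding: str) -> int:
    """Number of characters in text that would have to be written as escapes."""
    return sum(1 for ch in text if not is_representable(ch, encoding))


def choose_encoding(text: str, candidates=("UTF-8", "ISO-8859-1", "Shift_JIS")) -> str:
    """Pick the candidate that needs the fewest escapes.

    min() keeps the first of several equally good candidates, so listing
    UTF-8 first makes it win every tie.
    """
    return min(candidates, key=lambda enc: count_escapes_needed(text, enc))


sample = "日本語 café"
for enc in ("UTF-8", "ISO-8859-1", "Shift_JIS"):
    print(enc, count_escapes_needed(sample, enc))
```

UTF-8 can represent every character, so it never needs escapes. This is one reason for the UTF-8 default above.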

See detailed explanations...Specifying the character encoding

  • Use the meta element in HTML documents to explicitly declare the document's character encoding.

  • The meta declaration must only be used when the character encoding is organized such that ASCII-valued bytes stand for ASCII characters (at least until the meta element is parsed).

  • meta declarations should appear as early as possible in the head element.
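The ASCII condition in the second item can be checked mechanically. The hypothetical Python helper below treats an encoding as safe for a meta declaration only if every ASCII character encodes to the same single byte. This is a conservative test, but it is easy to verify. UTF-16 and EBCDIC (Python's codec cp037) fail it, so documents in those encodings need their encoding declared some other way, such as in the HTTP Content-Type header.

```python
def is_ascii_compatible(encoding: str) -> bool:
    """True if every ASCII character encodes to its own single ASCII byte.

    Under this condition a parser can read the meta element before it
    knows which encoding the document uses.
    """
    for code in range(128):
        try:
            encoded = chr(code).encode(encoding)
        except UnicodeEncodeError:
            return False
        if encoded != bytes([code]):
            return False
    return True


# cp037 is Python's codec name for EBCDIC (IBM037).
for enc in ("UTF-8", "ISO-8859-1", "UTF-16", "cp037"):
    print(enc, is_ascii_compatible(enc))
```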

See detailed explanations...Referring to specific characters

  • Escapes SHOULD be avoided when the characters to be expressed are representable in the character encoding of the document.

  • Content SHOULD use the hexadecimal form of character escapes rather than the decimal form when both are available.

  • If, for a specific application, it becomes necessary to refer to characters not encoded in [ISO10646], assign them to a private use zone to avoid conflicts with present or future versions of the standard. This is strongly discouraged, however, because such characters are not portable.

  • Don't use entities in XHTML?

  • Something about the use of inline images to represent characters
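Here is a minimal Python sketch of the first two items. It assumes the target encoding is known. Characters the encoding can represent are written directly, and only the remaining ones become hexadecimal numeric character references. A second hypothetical helper flags private-use code points (Unicode general category Co), which the third item discourages for portability reasons. The sketch deliberately does not escape markup characters such as & and <.

```python
import unicodedata


def escape_unrepresentable(text: str, encoding: str) -> str:
    """Write characters directly where the encoding allows, and use a
    hexadecimal numeric character reference (&#xHHHH;) only where it does not.

    Markup-significant characters such as & and < are NOT handled here.
    """
    out = []
    for ch in text:
        try:
            ch.encode(encoding)
            out.append(ch)
        except UnicodeEncodeError:
            out.append(f"&#x{ord(ch):X};")
    return "".join(out)


def private_use_chars(text: str) -> list[str]:
    """Return the private-use code points (category Co) found in text."""
    return [ch for ch in text if unicodedata.category(ch) == "Co"]


print(escape_unrepresentable("naïve 日本", "ISO-8859-1"))
print(private_use_chars("abc\ue000"))
```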

See detailed explanations...Dealing with undisplayable fonts

  • Some guidelines for content authors who know that users won't have all the necessary fonts.

Specifying the language of content

See detailed explanations...Identifying the primary language

  • Use the lang and xml:lang attributes in the html tag.

See detailed explanations...Identifying language change

  • Use the lang and xml:lang attributes on an element that surrounds the text, such as span.

See detailed explanations...Specifying the language of a link destination

  • Use the hreflang attribute on the a element.
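The Python sketch below shows the markup these items call for. The helper names are hypothetical, and the URL is a placeholder. Text and attribute values are escaped with the standard library's html.escape.

```python
from html import escape


def lang_span(text: str, lang: str) -> str:
    """Mark a run of text whose language differs from its surroundings,
    setting both lang (HTML) and xml:lang (XML)."""
    return f'<span lang="{lang}" xml:lang="{lang}">{escape(text)}</span>'


def link_with_hreflang(href: str, text: str, hreflang: str) -> str:
    """Build a link that announces the language of its destination."""
    return f'<a href="{escape(href)}" hreflang="{hreflang}">{escape(text)}</a>'


print("The French say " + lang_span("bon appétit", "fr") + ".")
print(link_with_hreflang("http://example.org/de/", "German version", "de"))
```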

See detailed explanations...Specifying language codes

  • Follow the guidelines in RFC 3066.

  • Use the two-letter ISO 639 codes for the language code and the two-letter ISO 3166 codes for the country code wherever possible.
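RFC 3066 defines a tag as a primary subtag of one to eight letters, optionally followed by hyphen-separated subtags of one to eight letters or digits. The following Python sketch checks whether a tag has that form and whether it follows the two-letter advice above. It does not consult the ISO 639 or ISO 3166 code lists, so a well-formed tag may still use an unassigned code. The function names are hypothetical.

```python
import re

# RFC 3066 syntax: 1-8 letters, then any number of "-" + 1-8 letters/digits.
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_well_formed(tag: str) -> bool:
    """True if tag matches the RFC 3066 syntax (codes are not validated)."""
    return RFC3066_TAG.fullmatch(tag) is not None


def uses_two_letter_codes(tag: str) -> bool:
    """Heuristic for the checklist advice: a two-letter language code and,
    if a second subtag is present, a two-letter country code."""
    if not is_well_formed(tag):
        return False
    parts = tag.split("-")
    if len(parts[0]) != 2:
        return False
    return len(parts) == 1 or (len(parts[1]) == 2 and parts[1].isalpha())


print(is_well_formed("en-GB"), uses_two_letter_codes("eng-GB"))
```

RFC 3066 treats tags as case-insensitive. Capitalization conventions such as en-GB exist, but they carry no meaning.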

Text direction

See detailed explanations...Setting directionality for an entire document in a bidirectional script

  • Add dir="rtl" to the html tag.

  • Enter all text in logical order (the order in which it is read and typed), and let the Unicode bidirectional algorithm order it for display.

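  • If Hebrew text is stored in an ISO 8859 encoding, choose ISO-8859-8-I (logical order) rather than ISO-8859-8, which is conventionally interpreted as visual order. (Alternatively, use UTF-8 or UTF-16.)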
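Text stored in logical order can be inspected with Python's unicodedata module. The hypothetical helper below reports the direction of the first strongly directional character, roughly as rules P2 and P3 of the Unicode bidirectional algorithm do when they choose a paragraph direction. Isolate handling is omitted. The checklist advice is still to declare dir explicitly. A check like this only suggests which value the content calls for.

```python
import unicodedata
from typing import Optional


def first_strong_direction(text: str) -> Optional[str]:
    """Return "rtl" or "ltr" from the first strongly directional character,
    or None if there is none. Digits, spaces and punctuation are skipped
    because their bidi classes are weak or neutral."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return None


print(first_strong_direction("2003 שלום"))
```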

See detailed explanations...Changing the directional properties of a part of the text

  • Add the dir attribute to an element that encompasses all the text.

  • Use markup to achieve bidirectional effects rather than CSS styling or the Unicode control characters for bidi embedding.
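Existing text that contains the Unicode embedding and override controls can be converted to equivalent markup. In this Python sketch, whose function name is hypothetical, LRE and RLE become span elements with a dir attribute, and LRO and RLO become bdo elements. PDF closes whichever element is open. The sketch assumes the text is otherwise already escaped for HTML. It ignores an unmatched PDF and closes anything still open at the end of the text.

```python
# Unicode bidi embedding and override control characters.
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"

# Each opening control maps to (element name, dir value).
OPEN_TAGS = {
    LRE: ("span", "ltr"),
    RLE: ("span", "rtl"),
    LRO: ("bdo", "ltr"),
    RLO: ("bdo", "rtl"),
}


def controls_to_markup(text: str) -> str:
    """Replace embeddings with <span dir=...> and overrides with <bdo dir=...>."""
    out, stack = [], []
    for ch in text:
        if ch in OPEN_TAGS:
            tag, direction = OPEN_TAGS[ch]
            out.append(f'<{tag} dir="{direction}">')
            stack.append(tag)
        elif ch == PDF:
            if stack:  # an unmatched PDF is dropped
                out.append(f"</{stack.pop()}>")
        else:
            out.append(ch)
    # Close any elements left open at the end of the text.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)


print(controls_to_markup("a\u202bשלום\u202cb"))
```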

See detailed explanations...Overriding the Unicode bidirectional algorithm

  • Use the bdo element to force the directionality of a sequence of inline characters.

  • Use the character entities &lrm; and &rlm; to force the directionality of directionally neutral characters.
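A common use of &rlm; arises when right-to-left text, such as a Hebrew phrase ending in an exclamation mark, is quoted inside a left-to-right paragraph. Without a mark, the trailing punctuation takes the paragraph's direction and is displayed at the wrong end of the phrase. The Python sketch below is a hypothetical helper that uses the standard unicodedata module. It appends &rlm; when a string ends with neutral characters that follow a right-to-left letter. It simplifies the bidirectional algorithm and does not replace it.

```python
import unicodedata

# Bidi classes that take their direction from context at the end of a run.
# Separators and terminators (CS, ES, ET) are included because, with no
# adjacent number, the algorithm resolves them like other neutrals.
TRAILING_NEUTRALS = {"ON", "WS", "CS", "ES", "ET", "B", "S"}


def add_rlm_after_trailing_neutrals(text: str) -> str:
    """Append &rlm; if text ends with neutrals that follow an RTL letter."""
    end = len(text)
    while end > 0 and unicodedata.bidirectional(text[end - 1]) in TRAILING_NEUTRALS:
        end -= 1
    if 0 < end < len(text) and unicodedata.bidirectional(text[end - 1]) in ("R", "AL"):
        return text + "&rlm;"
    return text


print(add_rlm_after_trailing_neutrals("שלום!"))
```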

Handling elements that vary by locale

See detailed explanations...Date & time

  • Always use a four-digit number for the year.

  • Always use words (abbreviated if necessary) for the month.

  • For forms, use structured fields or popup menus for time and date input.
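Here is a minimal Python sketch of these items. It formats a date with the month as a word and a four-digit year, and it builds a popup menu for the month part of a structured date field. The English month names and the helper names are assumptions for illustration. A localized page would use month names in its own language.

```python
from datetime import date

# English month names; a localized page would substitute its own words.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def unambiguous_date(d: date) -> str:
    """Format a date with the month as a word and a four-digit year, so it
    cannot be misread as day/month or month/day."""
    return f"{d.day} {MONTHS[d.month - 1]} {d.year:04d}"


def month_select(name: str = "month") -> str:
    """Build a popup menu for the month part of a structured date field."""
    options = "".join(
        f'<option value="{i}">{m}</option>' for i, m in enumerate(MONTHS, start=1)
    )
    return f'<select name="{name}">{options}</select>'


print(unambiguous_date(date(2003, 4, 5)))  # 5 April 2003
```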