Authoring Techniques for XHTML & HTML Internationalization 1.0
Outline View

This is an outline of the Working Draft, Authoring Techniques for XHTML and HTML Internationalization 1.0 dated 9 October 2003. Links to the full document and to resource information are provided at the beginning of each section.

Table of contents

-Introduction
-Document structure & metadata
-Character sets, character encodings and entities
-Fonts
-Specifying the language of content
-Handling bidirectional text
-* Handling vertical text
-* Text formatting
-* Lists
-* Tables
-* Links
-* Objects
-* Images
-Handling data that varies by locale
-Forms
-* Keyboard shortcuts
-* Writing source text
-* Navigation
-* File management
-* Supplying data for localization

Notes on document use

This document is in early draft form. It is undergoing constant and frequent modification and does not yet contain accurate content.

Use the icons to the right of each section header to view the full text or view resources for a given section.

The yellow cells to the right indicate whether a technique is supported by a given user agent (IE: Internet Explorer, NS: Netscape Navigator, Op: Opera). The possible alternatives are:

  • Y : the earliest version of the user agent considered for this document supports the technique. Earliest versions are: Internet Explorer 6, Netscape Navigator 7, Opera 7
  • - : not supported
  • (digit): supported from the version with that number onward
  • (blank space): needs further investigation.

2 Document structure & metadata

2.1 Internationalizing the page header

IE NS Op

For HTML documents and XHTML documents served as text/html, always use the meta element to explicitly declare the document's character encoding.

YYY

Use meta charset declarations as early as possible in the head element.

YYY

For HTML use the lang attribute, and for XHTML use the lang and xml:lang attributes in the html tag.

YYY
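The three techniques in section 2.1 can be combined in a small generator. The following is a minimal sketch rather than anything the draft prescribes: the function name page_header and its parameters are hypothetical, and the output assumes a page served as text/html.

```python
from html import escape


def page_header(title, lang="en", charset="utf-8", xhtml=True):
    """Return the opening lines of a page with encoding and language declared."""
    if xhtml:
        # XHTML carries both lang and xml:lang on the html tag.
        html_tag = (
            '<html xmlns="http://www.w3.org/1999/xhtml" '
            f'lang="{lang}" xml:lang="{lang}">'
        )
        meta_end = " />"
    else:
        # Plain HTML uses lang only.
        html_tag = f'<html lang="{lang}">'
        meta_end = ">"
    meta = (
        '<meta http-equiv="Content-Type" '
        f'content="text/html; charset={charset}"{meta_end}'
    )
    # The meta declaration is the first child of head, ahead of the title text.
    return "\n".join(
        [html_tag, "<head>", meta, f"<title>{escape(title)}</title>", "</head>"]
    )


print(page_header("Bonjour", lang="fr"))
```

Placing the meta element before the title means a user agent has seen the encoding declaration before it reaches any non-markup text in the head.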

2.2 International layout considerations

IE NS Op

Whenever possible, avoid HTML attributes with values of right and left. Use CSS in a linked stylesheet instead.

YYY

Whenever possible, avoid using CSS constructs that specify values of right and left. Use before and after if available.

YYY
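One way to find HTML attributes that hard-code a side is a small lint pass over the markup. The sketch below uses the standard library html.parser module; the names SideAttributeFinder and find_side_attributes are hypothetical, and the check looks only at attribute values, not at CSS.

```python
from html.parser import HTMLParser


class SideAttributeFinder(HTMLParser):
    """Collect (tag, attribute, value) triples whose value is left or right."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is not None and value.strip().lower() in ("left", "right"):
                self.hits.append((tag, name, value))


def find_side_attributes(markup):
    """Return every attribute in markup that is set to left or right."""
    finder = SideAttributeFinder()
    finder.feed(markup)
    finder.close()
    return finder.hits


print(find_side_attributes('<p align="right">x</p><td class="a">y</td>'))
```

Each hit is a candidate for moving into a linked stylesheet, where a localizer can change it in one place when adapting the page to a right-to-left script.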

3 Character sets, character encodings and entities

3.1 Choosing a page encoding

IE NS Op

Choose UTF-8 or another Unicode encoding for all content.

YYY

If you don't use a Unicode encoding, select an encoding that best supports the languages / characters to be included in the page text. [Ed. note: What does this mean? Does it mean, which maximizes the opportunity to directly represent characters and minimizes the need to represent characters by markup means such as character escapes? Does it include the idea that you should choose the most commonly used encoding for a region?]

YYY

Check that user agents (all agents that must render the page) adequately support the page encoding that you have selected. If not, you might need to use a more widely supported encoding to achieve an adequate degree of user agent support. [Ed. note: Couldn't this be rolled into the previous technique?]

YYY

Use character sets and encodings that will be accessible and common to your users.

YYY
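When comparing candidate encodings for a page, it can help to list which characters of a representative text sample each one cannot represent directly. The sketch below is an illustration only: the function name missing_characters and the sample text are hypothetical, and the result reflects Python's codec tables, not what any particular user agent supports.

```python
def missing_characters(text, encodings):
    """Map each candidate encoding to the characters of text it cannot represent."""
    report = {}
    for encoding in encodings:
        missing = []
        for ch in text:
            try:
                ch.encode(encoding)
            except UnicodeEncodeError:
                # Record each unrepresentable character once, in order of appearance.
                if ch not in missing:
                    missing.append(ch)
        report[encoding] = missing
    return report


sample = "Prix : 5 €, naïve, 日本語"
report = missing_characters(sample, ["utf-8", "iso-8859-1"])
for encoding, missing in report.items():
    print(encoding, missing)
```

An empty list means the sample needs no character escapes in that encoding, which is always the case for UTF-8 with ordinary text.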

3.2 Specifying a page encoding

IE NS Op

Where practical, declare the page's character encoding by setting the charset parameter in the HTTP Content-Type header.

YYY

For XHTML served as text/html, where practical use an XML declaration with an encoding attribute.

YYY

For XHTML served as application/xhtml+xml, always use an XML declaration with an encoding attribute.

-YY

For HTML documents and XHTML documents served as text/html, always use the meta element to explicitly declare the document's character encoding.

YYY

Use meta charset declarations as early as possible in the head element.

YYY

Use the preferred names from IANA's charset registry.

YYY
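The HTTP header and XML declaration techniques in section 3.2 both come down to emitting a charset label in the right place. The following sketch shows one possible shape. The function names are hypothetical, and the PREFERRED table is a small hand-picked excerpt, not a copy of the IANA registry, so check any label against the registry itself.

```python
# Illustrative excerpt: labels sometimes seen in practice, mapped to the name
# the IANA registry lists as the name or preferred MIME name.
PREFERRED = {
    "utf-8": "UTF-8",
    "utf8": "UTF-8",  # common misspelling, not a registered name
    "iso-8859-1": "ISO-8859-1",
    "latin1": "ISO-8859-1",
}


def preferred_name(label):
    """Return the preferred name for a known label, else the label unchanged."""
    return PREFERRED.get(label.strip().lower(), label)


def content_type_header(media_type, charset):
    """Build an HTTP Content-Type header line with a charset parameter."""
    return f"Content-Type: {media_type}; charset={preferred_name(charset)}"


def xml_declaration(charset):
    """Build an XML declaration carrying the encoding."""
    return f'<?xml version="1.0" encoding="{preferred_name(charset)}"?>'


print(content_type_header("text/html", "utf8"))
print(xml_declaration("latin1"))
```

Routing every declaration through one helper keeps the HTTP header, the XML declaration and the meta element from drifting apart.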

3.3 Referring to specific characters

IE NS Op

Avoid escapes when the characters to be expressed are representable in the character encoding of the document.

YYY

When using escapes, use the hexadecimal form.

YYY

If, for a specific application, it becomes necessary to refer to characters outside [ISO10646], characters should be assigned to a private zone to avoid conflicts with present or future versions of the standard. Use of private use characters is highly discouraged, however, for reasons of portability.

YYY

[Ed. note: Add something about the use of inline images to represent characters.]

YYY
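The first two techniques in section 3.3 can be applied together: leave characters alone when the document encoding can represent them, and write the rest as hexadecimal numeric character references. The sketch below assumes a hypothetical helper named escape_for. Python's built-in xmlcharrefreplace error handler does a similar job but emits decimal references, which is why the loop is written out by hand.

```python
def escape_for(text, encoding):
    """Escape only the characters that encoding cannot represent, in hex form."""
    out = []
    for ch in text:
        try:
            ch.encode(encoding)
            out.append(ch)  # representable: keep the character itself
        except UnicodeEncodeError:
            out.append(f"&#x{ord(ch):X};")  # hexadecimal character reference
    return "".join(out)


print(escape_for("café €5", "iso-8859-1"))
print(escape_for("café €5", "utf-8"))
```

With ISO-8859-1 only the euro sign is escaped, and with UTF-8 nothing is. Note that the helper does not escape markup-significant characters such as < and &.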

4 Fonts

4.1 Choosing & specifying fonts

IE NS Op

Do not use <font> tags; use CSS styles instead.

YYY

Always end a font list with the generic serif or sans-serif fallback.

YYY

Don't assume you know which fonts will be available on the client.

YYY

Don't rely on text fitting exactly within a given space.

YYY
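As an illustration of the fallback technique in section 4.1, the sketch below builds a CSS font-family declaration that always ends in a generic family. The function name font_family is hypothetical, and the font names in the example are placeholders, not recommendations.

```python
GENERIC_FAMILIES = ("serif", "sans-serif")


def font_family(preferred, generic):
    """Build a font-family declaration that ends with a generic fallback."""
    if generic not in GENERIC_FAMILIES:
        raise ValueError(f"expected one of {GENERIC_FAMILIES}, got {generic!r}")
    # Quote family names that contain spaces; the generic keyword stays unquoted.
    names = [f'"{name}"' if " " in name else name for name in preferred]
    return "font-family: " + ", ".join(names + [generic]) + ";"


print(font_family(["Georgia", "Times New Roman"], "serif"))
```

Because the client may have none of the named fonts installed, the trailing generic family is what the user agent falls back on.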

4.2 Dealing with undisplayable characters

IE NS Op

Some guidelines for content authors who know that users won't have all the necessary fonts.

YYY

5 Specifying the language of content

5.1 Specifying the overall language of a document

IE NS Op

For HTML use the lang attribute, and for XHTML use the lang and xml:lang attributes in the html tag.

YYY

5.2 Identifying language change

IE NS Op

Use the lang and xml:lang attributes on an element enclosing any text in a language other than that of the whole document.

YYY

5.3 Specifying the language of a link destination

IE NS Op

Use the hreflang attribute on the a element.

5.4 Specifying language codes

IE NS Op

Follow the guidelines in RFC3066.

YYY

Use the two-letter ISO 639 codes for the language code wherever possible, rather than the three-letter codes.

YYY
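The two techniques in section 5.4 can be approximated with a syntax check and a small substitution table. The sketch below checks only the RFC 3066 tag syntax (a primary subtag of 1 to 8 letters, then subtags of 1 to 8 letters or digits); it does not verify that a tag is registered. The function names and the three-entry TWO_LETTER table are hypothetical and cover only a tiny excerpt of ISO 639.

```python
import re

# RFC 3066 syntax: 1-8 letters, then zero or more "-" plus 1-8 letters or digits.
_TAG_SYNTAX = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

# Illustrative excerpt: three-letter codes that have a two-letter equivalent.
TWO_LETTER = {"eng": "en", "fra": "fr", "deu": "de"}


def is_well_formed(tag):
    """Return True if tag matches the RFC 3066 language tag syntax."""
    return _TAG_SYNTAX.fullmatch(tag) is not None


def prefer_two_letter(tag):
    """Swap a known three-letter primary subtag for its two-letter form."""
    primary, sep, rest = tag.partition("-")
    return TWO_LETTER.get(primary.lower(), primary) + sep + rest


print(is_well_formed("en-US"), is_well_formed("en_US"))
print(prefer_two_letter("fra-CA"))
```

The underscore form en_US is a common slip from locale identifiers and fails the syntax check.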

6 Handling bidirectional text

'Bidirectional', or 'bidi', text refers to text written using a script such as Arabic or Hebrew. In such scripts the text flows predominantly from right to left, but embedded numbers or text in other scripts (such as Latin script) still runs left to right.

6.1 Enabling easy localization for RTL scripts

IE NS Op

Whenever possible, avoid HTML attributes with values of right and left. Use CSS in a linked stylesheet instead.

YYY

Whenever possible, avoid using CSS constructs that specify values of right and left. Use before and after if available.

YYY

6.2 General use of bidi markup

IE NS Op

Do not use CSS styling to control directionality in XHTML/HTML. Use markup.

YYY

Only use bidi markup where it is needed.

YY-

6.3 Basic setup for pages in RTL scripts

IE NS Op

Add dir="rtl" to the html tag any time the overall document direction is right-to-left.

YY-

Do not add dir="rtl" to the body tag.

YY-

Use logical ordering, not visual ordering, for Hebrew.

YY-

If using an ISO character encoding for Hebrew, choose iso-8859-8-i and use logical ordering.

YY-
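The first two techniques in section 6.3 lend themselves to an automated check of a right-to-left page. The sketch below uses the standard library html.parser module; the names DirCollector and rtl_setup_problems are hypothetical, and the check covers only the html and body tags.

```python
from html.parser import HTMLParser


class DirCollector(HTMLParser):
    """Record the dir attribute of the first html and body start tags."""

    def __init__(self):
        super().__init__()
        self.dirs = {}

    def handle_starttag(self, tag, attrs):
        if tag in ("html", "body") and tag not in self.dirs:
            self.dirs[tag] = dict(attrs).get("dir")


def rtl_setup_problems(markup):
    """List deviations from the basic setup for a right-to-left page."""
    collector = DirCollector()
    collector.feed(markup)
    collector.close()
    problems = []
    if (collector.dirs.get("html") or "").lower() != "rtl":
        problems.append('html tag lacks dir="rtl"')
    if (collector.dirs.get("body") or "").lower() == "rtl":
        problems.append('body tag carries dir="rtl"')
    return problems


good = '<html dir="rtl" lang="he"><head><title>t</title></head><body><p>x</p></body></html>'
bad = '<html lang="he"><body dir="rtl"><p>x</p></body></html>'
print(rtl_setup_problems(good))
print(rtl_setup_problems(bad))
```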

6.4 Changing the directionality of a block element

IE NS Op

Add the dir attribute to a block level element (only) to change its directionality.

YY-

6.5 Mixing text direction inline

IE NS Op

Use a Unicode right-to-left mark (RLM) or left-to-right mark (LRM) to make neutral characters such as punctuation and spaces appear in the right place when they fall between different directional runs.

YY-

Use a Unicode right-to-left mark (RLM) or left-to-right mark (LRM) to correctly order separate runs of same direction text separated by neutral characters such as punctuation and spaces.

YY-

Use the dir attribute on an inline element to resolve problems with nested directional runs.

YY-

For attribute text or element text that allows no internal markup, use Unicode control characters for bidirectional control.

YYY

Do not use Unicode control characters for bidirectional control if markup is available.

YY-

Do not leave white space at the end of inline elements that mark a directional boundary.

YY-
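To make the RLM and control-character techniques in section 6.5 concrete, the sketch below builds two strings. The helper names are hypothetical. The first helper places an RLM after punctuation that belongs to a right-to-left run, so that the neutral character sits between two strong right-to-left characters. The second wraps text in the embedding controls RLE and PDF, which is intended only for places such as attribute values where no markup is allowed.

```python
import unicodedata

LRM = "\u200E"  # LEFT-TO-RIGHT MARK
RLM = "\u200F"  # RIGHT-TO-LEFT MARK
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def keep_with_rtl_run(rtl_text, punctuation):
    """Append punctuation to an RTL run, followed by RLM so it stays with the run."""
    return rtl_text + punctuation + RLM


def embed_rtl(text):
    """Wrap text in RLE ... PDF for contexts that allow no markup."""
    return RLE + text + PDF


title = "مفتاح معايير الويب"
sentence = 'The title is "' + keep_with_rtl_run(title, "!") + '" in Arabic.'

# The marks are invisible but carry a strong direction; "!" is a neutral.
for ch in (LRM, RLM, "!"):
    print(unicodedata.name(ch), unicodedata.bidirectional(ch))
```

Without the RLM, the exclamation mark falls between right-to-left and left-to-right text and takes the direction of the surrounding left-to-right paragraph, so it is displayed on the wrong side of the Arabic title.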

6.6 Handling parentheses & other mirrored characters

IE NS Op

Treat mirrored characters as if the word 'left' in the character name meant 'opening' and the word 'right' meant 'closing'.

YY

6.7 Overriding the Unicode bidirectional algorithm

IE NS Op

Use the bdo element to force the directionality of a sequence of inline characters.

YY-

14 Handling data that varies by locale

14.1 Date & time

IE NS Op

Use the full form of the year.

YYY

Use words (abbreviated if necessary) for the month.

YYY

For forms, use structured fields or popup menus for date and time input.

YYY
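The first two date techniques in section 14.1 are easy to show in code. A purely numeric date such as 09/10/03 can be read as 9 October, 10 September or 3 October 2009, depending on the reader's locale. The sketch below assumes a hypothetical format_date helper and an English month list; a localized page would supply its own month names.

```python
from datetime import date

# English month names, used here for illustration only.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def format_date(d, abbreviate=False):
    """Format a date with the month as a word and the year in full."""
    month = MONTHS[d.month - 1]
    if abbreviate:
        month = month[:3]
    return f"{d.day} {month} {d.year:04d}"


print(format_date(date(2003, 10, 9)))
print(format_date(date(2003, 10, 9), abbreviate=True))
```

Writing the month as a word removes the day and month ambiguity, and the four-digit year removes the century ambiguity.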