This is an outline of the Working Draft, Authoring Techniques for XHTML and HTML Internationalization 1.0 dated 9 October 2003. Links to the full document and to resource information are provided at the beginning of each section.
Table of contents
- Introduction
- Document structure & metadata
- Character sets, character encodings and entities
- Fonts
- Specifying the language of content
- Handling bidirectional text
- * Handling vertical text
- * Text formatting
- * Tables
- * Links
- * Objects
- * Images
- Handling data that varies by locale
- Forms
- * Keyboard shortcuts
- * Writing source text
- * Navigation
- * File management
- * Supplying data for localization

Notes on document use
This document is in early draft form. It is undergoing constant and frequent modification and does not yet contain accurate content. Use the icons to the right of each section header to view the full text or view resources for a given section. The yellow cells to the right indicate whether a technique is supported by a given user agent. The possible alternatives are:
2 Document structure & metadata
| IE | NS | Op | ||
• | For HTML documents and XHTML documents served as text/html, always use the meta element to explicitly declare the document's character encoding. | Y | Y | Y |
• | Use | Y | Y | Y |
• | For HTML use the | Y | Y | Y |
| IE | NS | Op | ||
• | Choose UTF-8 or another Unicode encoding for all content. | Y | Y | Y |
• | If you don't use a Unicode encoding, select an encoding that best supports the languages / characters to be included in the page text. [Ed. note: What does this mean? Does it mean, which maximizes the opportunity to directly represent characters and minimizes the need to represent characters by markup means such as character escapes? Does it include the idea that you should choose the most commonly used encoding for a region?] | Y | Y | Y |
• | Check that user agents (all agents that must render the page) adequately support the page encoding that you have selected. If not, you might need to use a more widely supported encoding to achieve an adequate degree of user agent support. [Ed. note: Couldn't this be rolled into the previous technique?] | Y | Y | Y |
• | Use character sets and encodings that will be accessible and common to your users. | Y | Y | Y |
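
One rough way to judge how well a non-Unicode encoding supports the page text is to list the characters it cannot represent directly. The following is a minimal Python sketch of that check; the helper name `unrepresentable` and the sample text are illustrative assumptions, not part of the Working Draft.

```python
def unrepresentable(text, encoding):
    """Return the distinct characters in `text` that `encoding` cannot represent.

    Hypothetical helper for illustration only.
    """
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Record each problem character once, in order of first appearance.
            if ch not in missing:
                missing.append(ch)
    return missing


# Sample text: "café" followed by a Greek capital omega (U+03A9).
sample = "caf\u00e9 \u03a9"
print(unrepresentable(sample, "iso-8859-1"))  # the omega is not in ISO-8859-1
print(unrepresentable(sample, "utf-8"))       # UTF-8 covers every character
```

An empty result suggests the encoding can carry the text without escapes. It says nothing about whether user agents support that encoding, which still needs to be checked separately.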
| IE | NS | Op | ||
• | Where practical, declare the page's character encoding by setting the | Y | Y | Y |
• | For XHTML served as text/html, where practical use an XML declaration with an encoding attribute. | Y | Y | Y |
• | For XHTML served as | - | Y | Y |
• | For HTML documents and XHTML documents served as text/html, always use the meta element to explicitly declare the document's character encoding. | Y | Y | Y |
• | Use | Y | Y | Y |
• | Use the preferred names from IANA's charset registry. | Y | Y | Y |
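
To check that a page actually carries a meta declaration, a script can look for the declared encoding in the markup. The sketch below uses Python's standard `html.parser` module and assumes the declaration takes the common `http-equiv="Content-Type"` form; the class name `MetaCharsetFinder`, the function `declared_charset`, and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Record the charset named in the first Content-Type meta element."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "content-type":
            return
        # The content value looks like "text/html; charset=utf-8".
        for part in attr.get("content", "").split(";"):
            name, _, value = part.strip().partition("=")
            if name.strip().lower() == "charset" and value.strip():
                self.charset = value.strip()


def declared_charset(markup):
    """Return the charset declared in a meta element, or None if absent."""
    finder = MetaCharsetFinder()
    finder.feed(markup)
    return finder.charset


page = (
    '<html><head>'
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    '</head><body></body></html>'
)
print(declared_charset(page))
```

The returned name can then be compared by hand against the preferred names in IANA's charset registry; this sketch does not perform that lookup.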
| IE | NS | Op | ||
• | Avoid escapes when the characters to be expressed are representable in the character encoding of the document. | Y | Y | Y |
• | When using escapes, use the hexadecimal form. | Y | Y | Y |
• | If, for a specific application, it becomes necessary to refer to characters outside [ISO10646], characters should be assigned to a private zone to avoid conflicts with present or future versions of the standard. Use of private use characters is highly discouraged, however, for reasons of portability. | Y | Y | Y |
• | [Ed. note: Add something about the use of inline images to represent characters ] | Y | Y | Y |
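
The first two techniques above can be combined in a small routine: leave a character alone when the document encoding can represent it, and otherwise emit a hexadecimal numeric character reference. This is a minimal Python sketch under that reading; the function name `escape_for_encoding` is an assumption made for illustration.

```python
def escape_for_encoding(text, encoding):
    """Escape only the characters that `encoding` cannot represent.

    Unrepresentable characters become hexadecimal numeric character
    references such as &#x3A9;. Hypothetical helper for illustration.
    """
    out = []
    for ch in text:
        try:
            ch.encode(encoding)
            out.append(ch)  # representable: no escape needed
        except UnicodeEncodeError:
            out.append("&#x{:X};".format(ord(ch)))  # hexadecimal form
    return "".join(out)


# "café" followed by a Greek capital omega (U+03A9).
print(escape_for_encoding("caf\u00e9 \u03a9", "iso-8859-1"))
```

The sketch does not escape markup-significant characters such as `&` and `<`; those would still need handling before the text is placed in a page.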
| IE | NS | Op | ||
• | Do not use <font> tags - use CSS styles instead. | Y | Y | Y |
• | Always use the serif and sans-serif fallbacks. | Y | Y | Y |
• | Don't assume you know which fonts will be available on the client. | Y | Y | Y |
• | Don't rely on text just fitting in a space. | Y | Y | Y |
| IE | NS | Op | ||
• | For HTML use the | Y | Y | Y |
| IE | NS | Op | ||
• | Use the | Y | Y | Y |
'Bidirectional', or 'bidi', text refers to text written using a script such as Arabic or Hebrew. In such scripts the text flows predominantly from right to left, but embedded numbers and text in other scripts (such as Latin script) still run left to right.
| IE | NS | Op | ||
• | Whenever possible, avoid HTML attributes with values of | Y | Y | Y |
• | Whenever possible, avoid using CSS constructs that specify values of | Y | Y | Y |
| IE | NS | Op | ||
• | Do not use CSS styling to control directionality in XHTML/HTML. Use markup. | Y | Y | Y |
• | Only use bidi markup where it is needed. | Y | Y | - |
| IE | NS | Op | ||
• | Add | Y | Y | - |
• | Do not add | Y | Y | - |
• | Use logical ordering, not visual ordering, for Hebrew. | Y | Y | - |
• | If using an ISO character encoding for Hebrew, choose iso-8859-8-i and use logical ordering. | Y | Y | - |
| IE | NS | Op | ||
• | Add the | Y | Y | - |
| IE | NS | Op | ||
• | Use a Unicode right-to-left mark (RLM) or left-to-right mark (LRM) to make neutral characters such as punctuation and spaces appear in the right place when they fall between different directional runs. | Y | Y | - |
• | Use a Unicode right-to-left mark (RLM) or left-to-right mark (LRM) to correctly order separate runs of same direction text separated by neutral characters such as punctuation and spaces. | Y | Y | - |
• | Use the | Y | Y | - |
• | For attribute text or element text that allows no internal markup, use Unicode control characters for bidirectional control. | Y | Y | Y |
• | Do not use Unicode control characters for bidirectional control if markup is available. | Y | Y | - |
• | Do not leave white space at the end of inline elements that mark a directional boundary. | Y | Y | - |
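
Where markup is not available (for example in attribute text), the RLM and LRM techniques above can be applied programmatically. The Python sketch below uses the standard `unicodedata` module to detect a trailing neutral character and append a directional mark after it, so that the neutral sits between two strong characters of the same direction. The helper names and the sample string are assumptions for illustration only.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK
LRM = "\u200e"  # LEFT-TO-RIGHT MARK

# Bidirectional classes that the Unicode bidi algorithm treats as neutral.
NEUTRAL_CLASSES = {"B", "S", "WS", "ON"}


def ends_with_neutral(run):
    """True if the last character of `run` is a neutral (e.g. punctuation)."""
    return bool(run) and unicodedata.bidirectional(run[-1]) in NEUTRAL_CLASSES


def anchor_trailing_neutral(run, mark):
    """Append `mark` after a trailing neutral so it stays with the run."""
    return run + mark if ends_with_neutral(run) else run


# A Hebrew word followed by an exclamation mark, written with escapes
# to keep the source itself free of bidirectional text.
hebrew = "\u05e9\u05dc\u05d5\u05dd!"
fixed = anchor_trailing_neutral(hebrew, RLM)
print([unicodedata.name(ch) for ch in fixed[-2:]])
```

When the text can carry markup, the techniques above favor markup over control characters, so a routine like this would be reserved for plain-text contexts.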
| IE | NS | Op | ||
• | Treat mirrored characters as if any word | Y | Y |
While this section awaits content, you can find a W3C Internationalization FAQ that answers the question "What is the best way to deal with encoding issues in forms that may use multiple languages and scripts?"