Notes for HTML i18n Guidelines Development
These are very rough notes, intended as a first step towards developing a set of techniques for the use of HTML.
Each rule attempts to list its likely audiences. Current categories are (authoring) tools developers, web masters,
authors, and user agent developers. Note that 'author' is used in the sense described by the HTML 4.01 spec, i.e., a person or program that
writes or generates HTML documents.
- Choosing character encodings and representing characters
- Choosing an encoding
- Authoring tools (e.g., text editors) should allow the encoding of HTML documents in the character encoding of the
user's choice.
- Relevance: tools developers
- Could list some of the more common encodings for particular languages; point to the page in hints & tips.
- Servers and proxies may change a document's character encoding on the fly (called transcoding) to meet the requests of user agents.
- A document encoding SHOULD be chosen which maximizes the opportunity to directly represent characters and minimizes the need to
represent characters by markup means such as character escapes.
- Relevance: authors, web masters
- Point out benefits of UTF-8 everywhere.
- Encode web pages in UTF-8 unless there is a good reason not to.
- Relevance: authors, web masters
- Refer to any issues with particular browsers, but also point out which browsers (and versions) already support UTF-8.
- Point out benefits of UTF-8 everywhere.
- Use IANA's preferred names for charset declarations.
- Relevance: authors, web masters
- Use character sets that will be accessible and common to your users.
- Relevance: authors, web masters
- Point to the table in hints & tips.
- Do not use the UTF-1 transformation format of [ISO10646].
- Relevance: authors, web masters, tools developers, UA developers
- References:
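As an illustration of the rule above about maximizing direct representation, the following minimal Python sketch lists the characters of a text that a candidate encoding cannot represent directly and that would therefore need escapes. The function name `unrepresentable` and the sample text are assumptions made for this example; the encoding names are passed to Python's codec machinery, which accepts many (but not necessarily all) IANA names.

```python
def unrepresentable(text, encoding):
    """Return the distinct characters of `text` that `encoding` cannot encode directly."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # This character would have to be written as an escape in this encoding.
            if ch not in missing:
                missing.append(ch)
    return missing


sample = "Größe: 10 €, 日本語"  # hypothetical page content
for candidate in ("iso-8859-1", "utf-8"):
    print(candidate, unrepresentable(sample, candidate))
```

For this sample, ISO-8859-1 leaves the euro sign and the Japanese characters unrepresented, while UTF-8 leaves nothing, which is one way to show the benefit of UTF-8 for mixed-language content.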
- Transmitting encoded text
- When HTML text is transmitted in UTF-16 (charset=UTF-16), text data should be transmitted in network byte order
("big-endian", high-order byte first).
- Relevance: tools developers
- It is recommended that documents transmitted as UTF-16 always begin with a Byte Order Mark.
- Relevance: tools developers
- References:
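The two UTF-16 rules above can be sketched in Python as follows. This is only an illustration for tools developers; the helper name `encode_utf16_network_order` is hypothetical. It encodes explicitly as big-endian and prepends the Byte Order Mark, rather than relying on a platform-dependent default byte order.

```python
import codecs


def encode_utf16_network_order(text):
    """Encode `text` as big-endian UTF-16 with a leading Byte Order Mark."""
    # codecs.BOM_UTF16_BE is the two bytes FE FF; "utf-16-be" itself adds no BOM.
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


payload = encode_utf16_network_order("<p>é</p>")
print(payload[:4].hex())  # first two bytes are the BOM, then the high-order byte of "<"
```

A receiver that honors the BOM can then decode the bytes without further information about byte order.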
- Specifying the character encoding
- Servers should send out an HTTP "charset" parameter whenever possible, but take care not to identify a document with
the wrong "charset" parameter value.
- Relevance: web masters
- This could be done by server-side configuration controls, a database of files and encodings, sniffing the document...
- Show examples of how to do this in Apache, etc.
- Show example of an HTTP header
- Use the META element in HTML documents to explicitly declare the document's character encoding.
- The META declaration must only be used when the character encoding is organized such that ASCII-valued bytes stand for ASCII characters
(at least until the META element is parsed).
- META declarations should appear as early as possible in the HEAD element.
- References:
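To make the rules in this subsection concrete, here is a minimal Python sketch of what an HTTP "charset" parameter and a META declaration look like, together with a rough check for the condition that ASCII-valued bytes stand for ASCII characters. All function names are hypothetical, the check only probes the characters of the declaration itself, and encoding names are resolved by Python's codecs, so this is an approximation rather than a definitive test.

```python
def content_type_header(charset):
    """Build the HTTP header line a server could send with an HTML document."""
    return f"Content-Type: text/html; charset={charset}"


def meta_declaration(charset):
    """Build the equivalent in-document declaration (HTML 4.01 style)."""
    return f'<META http-equiv="Content-Type" content="text/html; charset={charset}">'


def is_ascii_compatible(encoding):
    """Rough check: does the declaration encode to the same bytes as in ASCII?"""
    probe = meta_declaration("x")
    return probe.encode(encoding) == probe.encode("ascii")


def head_with_declaration(charset, title):
    """Place the META declaration as early as possible in HEAD."""
    if not is_ascii_compatible(charset):
        raise ValueError(f"{charset} is not ASCII-compatible; declare it via HTTP instead")
    return f"<HEAD>\n{meta_declaration(charset)}\n<TITLE>{title}</TITLE>\n</HEAD>"


print(content_type_header("UTF-8"))
print(head_with_declaration("UTF-8", "Example"))
```

With these assumptions, UTF-8 passes the check while UTF-16 does not, which matches the restriction on when a META declaration may be relied upon.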
- Detecting document encodings
- User agents must not assume any default value for the "charset" parameter.
- Conforming user agents must observe the following priorities when determining a document's character encoding (from
highest priority to lowest):
- An HTTP "charset" parameter in a "Content-Type" field.
- A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
- The charset attribute set on an element that designates an external resource.
- In addition to the list of priorities, the user agent may use heuristics and user settings.
- In addition to the list of priorities, user agents may consider a user-definable, local default character encoding, which they
apply in the absence of other indicators.
- If a user agent offers a mechanism that allows users to override incorrect "charset" information, it should offer it only for
browsing and not for editing, to avoid the creation of Web pages marked with an incorrect "charset" parameter.
- Relevance: UA developers, tools developers
- References:
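The priority order described in this subsection can be sketched in Python as below. This is a simplified illustration, not a conforming implementation: the function and parameter names are hypothetical, the META scan uses a regular expression over the first 1024 bytes instead of a real HTML parser, and heuristics are reduced to a single optional local default.

```python
import re

_CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)
_META_RE = re.compile(
    rb"<meta[^>]+http-equiv\s*=\s*[\"']?content-type[\"']?[^>]*>", re.IGNORECASE
)


def charset_from_content_type(value):
    """Extract the charset parameter from a Content-Type value, if present."""
    if not value:
        return None
    match = _CHARSET_RE.search(value)
    return match.group(1) if match else None


def detect_encoding(http_content_type, body, link_charset=None, local_default=None):
    """Return (charset, source), checking sources from highest to lowest priority."""
    # 1. HTTP "charset" parameter in the Content-Type field.
    charset = charset_from_content_type(http_content_type)
    if charset:
        return charset, "http"
    # 2. META declaration with http-equiv set to Content-Type.
    meta = _META_RE.search(body[:1024])
    if meta:
        charset = charset_from_content_type(meta.group(0).decode("ascii", "replace"))
        if charset:
            return charset, "meta"
    # 3. charset attribute on the element that designated this resource.
    if link_charset:
        return link_charset, "link"
    # Outside the priority list: a user-definable local default, if configured.
    if local_default:
        return local_default, "user-default"
    return None, "unknown"


page = b'<HTML><HEAD><META http-equiv="Content-Type" content="text/html; charset=ISO-8859-5"></HEAD></HTML>'
print(detect_encoding("text/html; charset=UTF-8", page))
print(detect_encoding("text/html", page))
```

Returning the source alongside the charset is a design choice for this sketch; it lets a user agent show where the value came from when offering an override while browsing.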
- Referring to specific characters
- Escapes SHOULD be avoided when the characters to be expressed are representable in the character encoding of the document.
- Relevance: authors
- Provide an overview of how to use escapes: hexadecimal and decimal numeric character references (NCRs), and character entities.
- Since character set standards usually list character numbers as hexadecimal, content SHOULD use the hexadecimal form
of character escapes when there is one.
- If, for a specific application, it becomes necessary to refer to characters outside [ISO10646], characters should be assigned to a
private zone to avoid conflicts with present or future versions of the standard. This is highly discouraged, however, for reasons of portability.
- Don't use entities in XHTML?
- Relevance: authors
- Maximises compatibility with XML?
- Add something about the use of inline images to represent characters.
- Relevance: authors
- Cover how best to handle accessibility needs and how to allow exact identification of the intended character(s).
- References:
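As a sketch of the escape rules in this subsection, the Python below leaves characters alone when the document encoding can represent them and otherwise emits a hexadecimal numeric character reference. The function name `escape_unrepresentable` is an assumption for this example, and the function deliberately does not escape markup-significant characters such as "&" and "<". The last lines show that the hexadecimal, decimal, and entity forms of one character all resolve to the same character when unescaped with the standard library's `html.unescape`.

```python
import html


def escape_unrepresentable(text, encoding):
    """Use hexadecimal NCRs only for characters that `encoding` cannot represent."""
    out = []
    for ch in text:
        try:
            ch.encode(encoding)
            out.append(ch)  # representable: no escape needed
        except UnicodeEncodeError:
            out.append(f"&#x{ord(ch):X};")  # hexadecimal form, as in the code charts
    return "".join(out)


print(escape_unrepresentable("café €", "iso-8859-1"))
print(escape_unrepresentable("café €", "utf-8"))

# Three ways of referring to U+00E9: hexadecimal NCR, decimal NCR, entity.
forms = ["&#xE9;", "&#233;", "&eacute;"]
print([html.unescape(form) for form in forms])
```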
- Dealing with undisplayable characters
- Adopt a clearly visible, but unobtrusive mechanism to alert the user to missing resources.
- If missing characters are presented using their numeric representation, use the hexadecimal (not decimal) form since
this is the form used in character set standards.
- Provide some guidelines for content authors who know that users will not have all the necessary fonts.
- References:
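One possible way to present missing characters by their hexadecimal code, as suggested above, is sketched in Python below. The bracketed "[U+XXXX]" notation and the `displayable` callback (standing in for a font coverage check) are assumptions of this example, not something these notes prescribe.

```python
def hex_placeholder(ch):
    """Visible but compact stand-in that identifies the character by its hex code."""
    return f"[U+{ord(ch):04X}]"


def render_with_placeholders(text, displayable):
    """Replace each character the display cannot show with its hex placeholder."""
    return "".join(ch if displayable(ch) else hex_placeholder(ch) for ch in text)


def ascii_only(ch):
    # Hypothetical coverage check: pretend the available font covers only ASCII.
    return ord(ch) < 128


print(render_with_placeholders("Price: 10 €", ascii_only))
```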