These are very rough notes as a first step towards developing a set of techniques for use of HTML.
Each rule broadly lists its likely audiences. Current categories are authoring tool developers, webmasters, authors, and user agent developers. Note that 'author' is used in the sense described by the HTML 4.01 specification, i.e. a person or program that writes or generates HTML documents.
Adding dir="rtl" to the html tag will cause block elements and table columns to start on the right and flow from right to left. The Unicode bidirectional algorithm should automatically handle the inline text directionality. All block elements in the document will inherit this setting unless it is explicitly overridden.
No dir attribute is needed for documents that have a base directionality of ltr, since this is the default.
Note: in Internet Explorer adding the dir attribute to the html tag also moves the scroll bar to the left of the browser window.
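A minimal sketch of setting the base direction on the html tag, written in XHTML style to match the other examples in these notes. The Hebrew text is reused from the examples in these notes; the left-to-right paragraph is placeholder content added only to illustrate an explicit override.

```html
<!-- dir="rtl" on the root element sets the base direction for the whole document -->
<html xmlns="http://www.w3.org/1999/xhtml" dir="rtl">
  <head>
    <title>פעילות הבינאום</title>
  </head>
  <body>
    <!-- Inherits rtl from html: the paragraph starts at the right edge -->
    <p>להוביל את הרשת למיצוי הפוטנציאל שלה…</p>
    <!-- An explicit dir overrides the inherited value for this block only -->
    <p dir="ltr">The title says W3C.</p>
  </body>
</html>
```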
Example: the Hebrew text that reads as follows
The title says "פעילות הבינאום, W3C" in Hebrew.
would be typed into the editor and stored in the computer's memory (though not necessarily displayed this way, if your editor is bidi-aware) as
The title says "פעילות הבינאום, W3C" in Hebrew.
Occasionally there may be situations where the Unicode bidirectional algorithm doesn't quite do what is required. Alternatively, you may want to assign a different directionality to a part of the page. In these cases you can apply additional markup to override the default ordering.
See the explanation in the HTML 4.01 specification.
If the dir attribute is added to a block element, all subordinate elements inherit the directionality (unless their directionality is changed explicitly using a different value for dir). With dir="rtl", elements and their contents will flow from the right of the displayed page towards the left.
Example: The following lines of text show a right-to-left paragraph embedded in this left-to-right page.
להוביל את הרשת למיצוי הפוטנציאל שלה…
The code underlying this paragraph, ordered as per the characters in memory, is:
<p dir="rtl">להוביל את הרשת למיצוי הפוטנציאל שלה… <img src="globe.gif" alt="globe"/></p>
Note the effect of the dir attribute in placing the image to the left of the text.
At a simple level the Unicode bidirectional algorithm takes care of the reordering of inline text, but where there is nesting of directionality the dir attribute needs to be used.
Example: The following line of text is coded without any dir attributes. Note that the order of the two Hebrew words is correct, but the text 'W3C' should appear on the left-hand side of the quotation.
The title says "פעילות הבינאום, W3C" in Hebrew.
To get the correct result we surround the text within the quote marks with a span element and set the dir attribute to rtl, as shown here (with all characters as ordered in memory).
<p>The title says "<span dir="rtl">פעילות הבינאום, W3C</span>" in Hebrew.</p>
The result when displayed is:
The title says "פעילות הבינאום, W3C" in Hebrew.
'bdo' stands for 'bidirectional override'. This inline element can be used to override the Unicode bidirectional algorithm if the dir attribute doesn't produce the desired result or if you want to produce a different result.
Example: Illustrations of the characters as stored in memory in earlier examples are produced by simply applying a bdo tag to produce a left to right flow of characters regardless of the directionality of the characters involved. So the earlier example showing how text was stored in the computer's memory
The title says "פעילות הבינאום, W3C" in Hebrew.
can be produced using the following underlying code
<p><bdo dir="ltr">The title says "פעילות הבינאום, W3C" in Hebrew.</bdo></p>
Without the bdo tag, the Unicode bidirectional algorithm would have produced the following result
The title says "פעילות הבינאום, W3C" in Hebrew.
The &rlm; and &lrm; entities represent two special characters in Unicode, the right-to-left mark (U+200F) and the left-to-right mark (U+200E), that can be used after a neutral character whose directionality is ambiguous. Problems typically arise for punctuation that falls between characters in a bidirectional script and characters in a non-bidirectional script. Both marks are invisible characters that are strongly typed (that is, they have a strong directionality), so they help disambiguate the context for the Unicode bidirectional algorithm.
Example: In the following sentence, despite the use of the dir attribute, the commas between the items of English text, which are part of the Hebrew right-to-left flow, end up in the wrong place. This is because they are surrounded by Latin text, and the Unicode bidirectional algorithm assumes that they are part of the Latin text flow that goes from left to right.
פעילות הבינאום, W3C, W3C, W3C, פעילות הבינאום, W3C
This can be easily remedied by adding an &rlm; entity immediately after the commas, as shown here
פעילות הבינאום, W3C, W3C, W3C, פעילות הבינאום, W3C
The code that produced this result is
<p dir="rtl">פעילות הבינאום, W3C,&rlm; W3C,&rlm; W3C,&rlm; פעילות הבינאום, W3C</p>
Should we suggest the use of hex values, since XHTML is XML? Actually, I think XML ought to have meaningful entity names for these invisible formatting control characters - makes the code a lot easier to manage/understand.
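On the question of hex values: the two marks can also be written as numeric character references, which an XML parser accepts without any entity declarations. The sketch below repeats the comma example using the reference for U+200F (right-to-left mark). It illustrates the option being considered here and is not a recommendation either way.

```html
<!-- &#x200F; is the right-to-left mark, the same character as &rlm; -->
<!-- &#x200E; would be the left-to-right mark, the same character as &lrm; -->
<p dir="rtl">פעילות הבינאום, W3C,&#x200F; W3C,&#x200F; W3C,&#x200F; פעילות הבינאום, W3C</p>
```

The trade-off raised in the note still applies: the numeric form is portable, but it is harder to recognize in source than a named entity.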
Use the lang and xml:lang attributes in the html tag
Use the lang and xml:lang attributes around the text.
Use the hreflang attribute on the a element.
Need to think about this - don't think it is supported by browsers.
Do we include detail here or under section on links?
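A minimal sketch of the three language-related rules above, in XHTML style. The file name accueil.html and the link text are hypothetical, chosen only for illustration; browser support for hreflang is not assumed.

```html
<!-- Declare the default language of the document on the html tag -->
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <title>Language declarations</title>
  </head>
  <body>
    <!-- Mark a change of language on an element around the text it applies to -->
    <p>The title says "<span lang="he" xml:lang="he" dir="rtl">פעילות הבינאום, W3C</span>" in Hebrew.</p>
    <!-- hreflang describes the language of the link target, not of the link text -->
    <p><a href="accueil.html" hreflang="fr">French version</a></p>
  </body>
</html>
```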
Follow the guidelines in RFC 3066.
Note that the HTML spec still says RFC 1766, but this has been obsoleted by RFC 3066.
Explain the basic principles here.
Use the two-letter ISO 639 codes for the language code and the two-letter ISO 3166 codes for the country code wherever possible.
This aids interoperability, and increases the likelihood of recognition by browsers.
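A short sketch of what such language tags look like in markup. The paragraph contents are placeholders; the tag values follow the pattern of a two-letter ISO 639 language code, optionally followed by a hyphen and a two-letter ISO 3166 country code.

```html
<!-- Language code only: two-letter ISO 639 code -->
<p lang="fr" xml:lang="fr">Bonjour</p>
<!-- Language plus country: ISO 639 code, hyphen, two-letter ISO 3166 code -->
<p lang="fr-CA" xml:lang="fr-CA">Bonjour</p>
<p lang="en-GB" xml:lang="en-GB">Colour</p>
```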
Use the lang and xml:lang attributes in the html tag
Point to or include detail
Use the META element in HTML documents to explicitly declare the document's character encoding.
Point to or include detail
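A minimal sketch of such a declaration, in XHTML style. UTF-8 is assumed here purely for illustration; the charset value should name whatever encoding the document is actually saved in.

```html
<head>
  <!-- Place the declaration as early as possible in head, before any non-ASCII text such as the title -->
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>פעילות הבינאום</title>
</head>
```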