This document contains examples in another language or script.
This tutorial is aimed at HTML/XHTML and CSS content authors who want a general idea of what the future holds for CSS support of non-Latin text.
CSS3 will introduce a large number of properties designed to support non-Latin text, from vertical script support to kashida justification, from ruby positioning to list numbering. This tutorial will give you a glimpse of some of the properties that lie in store, and discuss how you can help to make it happen.
This material is organized around a set of presentation slides which can be viewed in several ways. Each view is identified by an icon as described below.
All in one: A single page containing all explanatory text followed by small accompanying slides.
Slide by slide: One page per slide view. This is particularly useful if you need to see the detail on a slide.
Slide text: This page by page version of the slides is provided mainly for those who want to cut and paste the text on the slides. (You will need appropriate fonts and rendering software to see the text correctly.)
Overview: The overview provides a list of headings to help you navigate around the presentation quickly.
Please send any comments to ishida@w3.org.
After reading this tutorial you should:
This tutorial will not expose you to all of the properties planned for inclusion in CSS3 for the support of international text. It will content itself with giving you a flavor of what is to come.
None of the specifications we will discuss here are finalized.
Hopefully the tutorial will raise your expectations and motivate you, where appropriate, to become involved in bringing these modules to a final state and getting them implemented by user agents.
The numbering of lists varies from script to script. In CSS2 nine non-Latin numbering systems were specified. Unfortunately, user agents didn't implement all of these, and as part of its mission to represent a snapshot of current usage, the CSS2.1 specification reduced that number to two.
This shows how important it is to make your voice heard if you want non-Latin features to be supported in specifications and user agents.
It has to be said that the expected behaviour was also poorly specified for these options.
The CSS3 Lists Module currently specifies almost 70 non-Latin schemes for list numbering, and provides much more rigorous rules about their use. (If you want to avoid this number shrinking, however, don't take this for granted, and make your voice heard!)
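As a rough sketch of what this means in a style sheet today, the fragment below uses the existing list-style-type property. The class names are hypothetical. The armenian and georgian values are defined in CSS2.1, while hebrew is one of the CSS2 values that CSS2.1 dropped, so it is paired with a fallback.

```css
/* Hypothetical class names; values come from the CSS2 and CSS2.1
   definitions of list-style-type. */

/* Non-Latin numbering that is defined in CSS2.1 */
ol.armenian-steps { list-style-type: armenian; }
ol.georgian-steps { list-style-type: georgian; }

/* Defined in CSS2 but dropped from CSS2.1, so support cannot be assumed.
   A user agent that does not recognize 'hebrew' ignores the second
   declaration and keeps the decimal fallback. */
ol.hebrew-steps {
  list-style-type: decimal;
  list-style-type: hebrew;
}
```

Declaring a widely supported value first is a general CSS technique for degrading gracefully; it does not depend on anything in the CSS3 drafts.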
CSS3 will enable the use of vertical text, and should do so in a way that makes the choice of vertical or horizontal a purely stylistic decision. That is, you would be able to switch from well-rendered horizontal text to well-rendered vertical text and back with a few simple tweaks to the style sheet, with no need to change the markup. You will also be able to mix horizontal and vertical text within the same page.
Key parameters here are the block-progression direction, the inline text direction, and the glyph orientation.
The block-progression property allows you to specify the direction in which you would read the lines of text. tb (top to bottom) would be used for horizontal text, since the lines are stacked top to bottom. rl (right to left) would be the normal setting for vertical Chinese, Japanese and Korean, where lines progress from the right to the left of the page or block. lr (left to right) would be appropriate for a script like Mongolian, which is also vertical but where your eyes start reading from the top-left position.
The direction property allows you to specify the inline directionality of text. rtl would be appropriate for scripts such as Arabic and Hebrew. This is interpreted relative to the direction of the block progression, so in vertical text it means 'bottom to top'.
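The fragment below sketches how these two properties might be written, using only the draft property names and values described above. The selectors are hypothetical, and because the specification is not finalized the syntax may well change.

```css
/* Draft property names and values as described in this tutorial;
   selectors are hypothetical and the syntax is not final. */

/* Horizontal text: lines stack from top to bottom */
.horizontal { block-progression: tb; }

/* Vertical Chinese, Japanese or Korean: lines progress right to left */
.vertical-cjk { block-progression: rl; }

/* Vertical Mongolian: lines progress left to right */
.vertical-mongolian { block-progression: lr; }

/* Arabic or Hebrew: the inline direction is right to left */
.arabic { direction: rtl; }
```

Since the markup carries no layout information, the same paragraph could be moved between .horizontal and .vertical-cjk by changing only its class or the style sheet.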
A writing-mode property is currently proposed as a shortcut that combines both block and inline progression into a set of common combinations.
glyph-orientation-vertical and glyph-orientation-horizontal properties can be used to control whether the Latin letters in vertical Japanese appear on their side (like 'Johansson' on the slide), or upright (like 'FIFA' on the slide).
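A possible use of glyph-orientation-vertical is sketched below. The angle values follow the SVG 1.1 definition of this property, where 0 degrees leaves a glyph upright and 90 degrees rotates it clockwise; whether the CSS3 draft keeps exactly that syntax is an assumption, and the selectors are hypothetical.

```css
/* Assumption: angle values as in the SVG 1.1 definition of
   glyph-orientation-vertical. Selectors are hypothetical. */

/* Latin letters kept upright, one under another (like 'FIFA' on the slide) */
.vertical .upright-latin { glyph-orientation-vertical: 0deg; }

/* Latin letters rotated onto their side (like 'Johansson' on the slide) */
.vertical .sideways-latin { glyph-orientation-vertical: 90deg; }
```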
Some of the more difficult aspects of achieving this relate to giving users their preferred way of handling inline text, such as embedded Latin, in vertically set paragraphs.
When lines of text wrap, the type of script affects the expected behavior - particularly with regard to the treatment of white space around line breaks in the code.
Chinese and Japanese scripts do not delimit words with spaces, and wrap on a character-by-character basis. There are, however, some rules (called kinsoku rules in Japan) that forbid certain characters (mostly final punctuation) from appearing at the beginning of a line, and others that forbid certain characters appearing at the end of a line.
Thai script uses spaces to delimit phrases, rather than words, and typically does not use sentence final punctuation. There is, however, a strong concept of a word, and text should be wrapped at word boundaries. Some Thai systems rely on users adding zero-width spaces to indicate where wrapping is appropriate, but it is more common to use a dictionary to determine word boundaries.
Unless the style sheet requires that space be preserved, when dealing with the Latin script, the general process of displaying text that is wrapped onto a new line in the source involves first reducing all white space at the beginning and end of the line to a single space. Then, unless the style sheet says otherwise, the line break characters and any surrounding spaces are boiled down to a single space.
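The choice between collapsing and preserving white space is already expressed with the CSS2.1 white-space property. The fragment below is a small illustration with hypothetical class names.

```css
/* white-space is an existing CSS2.1 property; class names are hypothetical. */

/* Default behavior: a line break in the source and any surrounding
   spaces collapse to a single space */
p.collapsed { white-space: normal; }

/* Spaces and line breaks in the source are preserved as written */
p.preserved { white-space: pre; }
```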
The line breaking properties in CSS3 will allow you to indicate appropriate behavior for scripts like Chinese, Japanese and Thai, where adding a space between wrapped text may not be appropriate. (This is quite a complicated area, and discussion is still going on about how best to handle this.)
Some of the CSS3 properties will allow you to specify whether or not to wrap differently in the middle of embedded text from another script. For example, if you have Latin text in the middle of Chinese, should it wrap character by character or word by word? Both are valid, and the word-break-cjk property allows you to choose according to your general preferences or the context. (See the two examples on the slide.)
In addition, the line-break property allows you to specify preferences relating to the kinsoku rules and their Chinese and Korean equivalents. The top example on the slide shows the expected result of setting line-break: normal, where a small katakana character begins the second line. This tends to be the preference in modern Japanese typography, and is particularly useful in text with narrow columns. The second example shows the result of line-break: strict. The line wrapping algorithm now pulls down the last katakana character from the previous line so that the small katakana character is no longer in line-initial position.
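In a style sheet, the two settings just described might look like the sketch below. The property name and values are those given in this tutorial for the draft; the selectors are hypothetical.

```css
/* Draft property and values as named in this tutorial;
   selectors are hypothetical. */

/* Looser rules: a small katakana character may begin a line,
   which suits narrow columns */
.narrow-column { line-break: normal; }

/* Stricter kinsoku rules: a small katakana character is kept
   out of line-initial position */
.body-text { line-break: strict; }
```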
One way of applying the end-of-line kinsoku rules just mentioned is to wrap two characters down to the next line. This can be done, for example, to avoid a sentence final delimiter appearing at the start of a line.
An alternative is to leave the punctuation hanging out of the margin. The hanging-punctuation property will allow you to control this.
Among the properties for specifying alignment and justification of text is text-justify. One of the values for this property will allow you to specify how justification is applied to mixed script text. The top example on the slide shows how text-justify: inter-ideograph can cause inter-character spacing to be applied to ideographic characters but not Latin text.
The text-justify property will also allow appropriate types of justification for various different scripts, for example ideographic scripts, Indic scripts with baseline connectors, South-East Asian scripts that don't use spaces between words, and cursive scripts like Arabic.
If you apply text-justify: kashida to Arabic script you can extend the baseline to justify Arabic words.
The text-kashida-space property allows you to specify the degree to which such stretching is applied during justification.
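The justification settings above could be combined with the existing text-align property roughly as follows. The text-justify values are the ones named in this tutorial. The percentage value for text-kashida-space is an assumption based on the Internet Explorer implementation, and the selectors are hypothetical.

```css
/* text-justify values as named in this tutorial; selectors are hypothetical. */

/* Mixed script text: ideographic characters receive the extra
   spacing, embedded Latin text does not */
.mixed-cjk {
  text-align: justify;
  text-justify: inter-ideograph;
}

/* Arabic: justify by extending the baseline (kashida).
   Assumption: text-kashida-space takes a percentage, as it does in
   Internet Explorer; the draft syntax may differ. */
.arabic-body {
  text-align: justify;
  text-justify: kashida;
  text-kashida-space: 50%;
}
```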
The text-justify-trim property will allow you to specify whether and how blank space compression behaves in a script like Japanese. The slide shows an example of a full-width parenthesis that has its blank space removed to allow for compression.
The punctuation-trim property will allow you to specify whether a full-width punctuation mark in ideographic text should be narrowed at the beginning of a line so that its 'ink' lines up with the first glyph in the lines above and below.
When non-ideographic text or numbers appear in ideographic text it is often preferable to separate the two with a little additional space. The text-autospace property will allow you to add such spacing without the need for spaces in the content. You can apply it to a number of types of embedded text, and combine them as you wish.
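A sketch of how this might be written follows. The value names are those implemented in Internet Explorer, and the space-separated combination reflects the statement above that the types can be combined; both are assumptions about the draft syntax, and the selector is hypothetical.

```css
/* Assumptions: value names as implemented in Internet Explorer, and
   space-separated combination of values. The selector is hypothetical. */

/* Add a little space between ideographs and embedded Latin letters,
   and between ideographs and embedded digits */
.cjk-body { text-autospace: ideograph-alpha ideograph-numeric; }
```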
It is common for the glyphs in documents written in East Asian languages to be laid out on a page in a grid pattern. This approach is helped by the fact that ideographic, kana and hangul characters tend to be the same width.
CSS3 Text specifies a set of properties for applying grids and for managing non-full width characters inside the grid, such as Latin text, in various ways.
This slide shows some vertical Japanese text with no grid applied.
On this slide we see the effect of applying one type of grid layout to the text on the previous slide.
Kumimoji is a Japanese word referring to the practice of combining up to 5 glyphs within a single wide-character glyph space. The first example on the slide shows how this might look.
Warichu is a Japanese word for a type of inline note, where the text runs on two lines. See the second example on the slide.
Both of these effects will be available through the text-combine property of CSS3.
The fonts module of CSS3 will allow alternate forms of emphasis, such as the boten marks used in Japanese. These are similar in use to italicization or bolding in Latin text, neither of which works well in small font sizes on screen.
Ruby is a type of annotation associated with a base text. It is often used in Japanese, and to some extent in Chinese, to provide pronunciation information for ideographs, and sometimes short explanations. The name 'ruby' originated from a named font size (about half the size of the normal 10 point font) used by British typesetters. In Japanese, ruby annotation is known as furigana.
The Ruby Annotation Recommendation describes how to mark up text so that it is clear which is the base text and which is the ruby annotation. There are simple and complex ruby models. This slide shows the simplest form of markup described in the Recommendation. The text in the <rt> element is the annotation. The base text is in the <rb> element.
A user agent that displays ruby will normally display simple ruby text above horizontal base text and in a smaller font. For vertical text the ruby would by default be displayed to the right.
For a slightly more detailed description see Ruby Markup and Styling.
Sometimes you may want to be able to control the location of the ruby text. For example, the specification says that it is common to place pinyin annotations below horizontal Chinese text, rather than above. The ruby-position property will give you control over the location.
If you specify before, the ruby text will appear above horizontal base text and to the right of vertical text. Specify after, and the ruby text appears below horizontal base text and to the left of vertical text.
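The two values just described might be applied as in the sketch below. The class names are hypothetical, and the rules are intended for simple ruby markup of the kind described earlier, with the base text in an rb element and the annotation in an rt element.

```css
/* Values 'before' and 'after' as described in this tutorial;
   class names are hypothetical.
   Intended for markup such as:
   <ruby><rb>base text</rb><rt>annotation</rt></ruby> */

/* Annotation above horizontal base text, or to the right of vertical text */
ruby.furigana { ruby-position: before; }

/* Annotation below horizontal base text, or to the left of vertical text,
   for example pinyin under horizontal Chinese */
ruby.pinyin { ruby-position: after; }
```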
The ruby-align property addresses the relative alignment of ruby and base text when one is longer than the other. Basically the effect is applied to whichever is shorter, the ruby text or the base text.
The slide shows how text would look if ruby-align is set to end. (Ignore the green line, which is there just to show the boundaries more clearly on the slide.)
If the ruby text is wider than the base text you can specify whether or not it overhangs any surrounding base text, and to what amount, using the ruby-overhang property.
The example on the slide shows the effect of setting ruby-overhang to start. Note how the ruby text overlaps the preceding characters, but not the following base text. Note also that ruby does not overlap itself.
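The alignment and overhang settings shown on the slides could be written roughly as follows. The values end and start are the ones mentioned in this tutorial; the class names are hypothetical.

```css
/* Values as mentioned in this tutorial; class names are hypothetical. */

/* Align whichever is shorter, the ruby text or the base text,
   to the end of the longer one */
ruby.aligned-end { ruby-align: end; }

/* Let over-wide ruby text overlap the base text that precedes it,
   but not the base text that follows */
ruby.overhang-start { ruby-overhang: start; }
```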
We have looked very briefly at some, but by no means all, of the international support that CSS3 modules will offer. What is more, we have only dipped our toes into the properties we have described. Each slide in this tutorial could become a tutorial in its own right.
Some of the features we have discussed have been implemented in Internet Explorer 5+, but the specifications have changed since then, so you should be very careful about using those features. There is no guarantee that anything you implement currently on Internet Explorer will be interoperable code in the long term. For example, the grid layout properties are now completely different.
With the possible exception of the list-style-type property, there do not appear to be any implementations of these features on other user agents as yet.
The following list indicates the current status of the modules discussed here. Note that work is still ongoing on all of these specifications. Even the Candidate Recommendations may return to Working Draft again before they go on to Recommendation.
This is not an exhaustive list of specifications that contain properties relevant to international text. For example, the proposed CSS3 Line module, which has not yet been published as a Working Draft, promises to deliver some important control over behavior related to baseline alignment across scripts.
There is also additional work that must be completed in modules that are dependencies for these specifications.
If you are interested in seeing these features become available, please let the W3C know. It is always useful to us to hear what users want. There are also a number of practical ways to get involved.
The CSS Working Group has a lot of work on its hands at the moment, and work on these modules has been going slowly. As mentioned before, there is even discussion going on currently around features in the Candidate Recommendations. Your assistance in developing these specifications will be appreciated.
You can help move the specifications forward by reviewing and commenting on the public drafts that are made available. If you have expertise in this area, you might also consider participating in the CSS or Internationalization Working Groups to help move the work along. A significant amount of specification work benefited from the Japanese JIS 4051 specification, but there are still a number of areas where it would be useful to have local input about, or confirmation of, the requirements and expected behaviors of text in other scripts.
Even when we move these specifications to Recommendation stage, the battle is not yet won. User agent developers must implement these features so that we can use them, and we need the properties to be widely implemented on a range of user agents. Again, the voice of local users is important in making that happen. User agent developers are unlikely to implement these features if they hear no one asking for them.
Finally, you can help by keeping informed about progress on these features and implementing them in your content when they become available.
CSS3 Fonts http://www.w3.org/TR/css3-fonts/
CSS3 Lists http://www.w3.org/TR/css3-lists/
CSS3 Ruby http://www.w3.org/TR/css3-ruby
Ruby Markup and Styling http://www.w3.org/International/tutorials/ruby/
CSS3 Text http://www.w3.org/TR/css3-text/
Author: Richard Ishida.