This document contains examples in another language or script.

Tutorial: CSS3 and International Text (Draft)

Front matter

Intended audience

HTML/XHTML and CSS content authors who want to get a general idea of what the future holds for CSS support of non-Latin text.

Why should you read this?

CSS3 will introduce a large number of properties designed to support non-Latin text, from vertical script support to kashida justification, from ruby positioning to list numbering. This tutorial will give you a glimpse of some of the properties that lie in store, and discuss how you can help to make it happen.

How to use this material

This material is organized around a set of presentation slides which can be viewed in several ways. The views are described below.

All in one: A single page containing all explanatory text followed by small accompanying slides.

Slide by slide: One page per slide. This is particularly useful if you need to see the detail on a slide.

Slide text: This page-by-page version of the slides is provided mainly for those who want to cut and paste the text on the slides. (You will need appropriate fonts and rendering software to see the text correctly.)

Overview: The overview provides a list of headings to help you navigate around the presentation quickly.

Please send any comments to ishida@w3.org.

Objectives

After reading this tutorial you should:

This tutorial will not expose you to all of the properties planned for inclusion in CSS3 for the support of international text. It will content itself with giving you a flavor of what is to come.

None of the specifications we will discuss here are finalized.

Hopefully the tutorial will raise your expectations and motivate you, where appropriate, to become involved in bringing these modules to a final state and getting them implemented by user agents.

Lists module

The numbering of lists varies from script to script. In CSS2 nine non-Latin numbering systems were specified. Unfortunately, user agents didn't implement all of these, and as part of its mission to represent a snapshot of current usage, the CSS2.1 specification reduced that number to two.

This shows how important it is to make your voice heard if you want non-Latin features to be supported in specifications and user agents.

It has to be said that the expected behaviour was also poorly specified for these options.

The CSS3 Lists Module currently specifies almost 70 non-Latin schemes for list numbering, and provides much more rigorous rules about their use. (If you want to avoid this number shrinking, however, don't take this for granted, and make your voice heard!)
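
As a rough illustration of what a numbering scheme involves, the simplest non-Latin schemes just substitute another script's digits for 0 to 9. The Python sketch below does this for Arabic-Indic digits (U+0660 to U+0669). The function name is hypothetical, and the schemes in the Lists module are governed by the specification's own rules, not by this code.

```python
# Illustration only: a purely numeric scheme that swaps in another script's digits.
# Arabic-Indic digits occupy U+0660 (zero) through U+0669 (nine).
ARABIC_INDIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))
DIGIT_TABLE = str.maketrans("0123456789", ARABIC_INDIC_DIGITS)


def arabic_indic_marker(n):
    """Return the marker text for list item n using Arabic-Indic digits."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative counters")
    return str(n).translate(DIGIT_TABLE)


# Markers for the first three items of a list.
markers = [arabic_indic_marker(i) for i in (1, 2, 3)]
```

Schemes that are not simple digit substitutions (alphabetic or additive systems, for example) need more elaborate rules, which is where rigorous specification matters most.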

Text module

Text direction

CSS3 will enable the use of vertical text, and should do so in a way that makes the choice of vertical or horizontal a purely stylistic decision, i.e. you would be able to switch from well-rendered horizontal text to well-rendered vertical text and back with a few simple tweaks to the style sheet but no need to change the markup. You will also be able to mix horizontal and vertical text within the same page.

Key parameters here are the block-progression direction, the inline text direction, and the glyph orientation.

The block-progression property allows you to specify the direction in which you would read the lines of text. tb (top to bottom) would be used for horizontal text, since the lines are stacked top to bottom. rl (right to left) would be the normal setting for vertical Chinese, Japanese and Korean, where lines progress from the right to the left of the page or block. lr (left to right) would be appropriate for a script like Mongolian, which is also vertical but where your eyes start reading from the top-left position.

The direction property allows you to specify the inline directionality of text. rtl would be appropriate for scripts such as Arabic and Hebrew. This is interpreted relative to the direction of the block progression, so in vertical text it means 'bottom to top'.

A writing-mode property is currently proposed as a shortcut that combines both block and inline progression into a set of common combinations.

glyph-orientation-vertical and glyph-orientation-horizontal properties can be used to control whether the Latin letters in vertical Japanese appear on their side (like 'Johansson' on the slide), or upright (like 'FIFA' on the slide).

Some of the more difficult aspects of achieving this relate to how one provides the user's preferred way of supporting inline text in vertically set paragraphs.

Line breaking

When lines of text wrap, the type of script affects the expected behavior - particularly with regard to the treatment of white space around line breaks in the code.

Chinese and Japanese scripts do not delimit words with spaces, and wrap on a character-by-character basis. There are, however, some rules (called kinsoku rules in Japan) that forbid certain characters (mostly final punctuation) from appearing at the beginning of a line, and others that forbid certain characters appearing at the end of a line.

Thai script uses spaces to delimit phrases, rather than words, and typically does not use sentence final punctuation. There is, however, a strong concept of a word, and text should be wrapped at word boundaries. Some Thai systems rely on users adding zero-width spaces to indicate where wrapping is appropriate, but it is more common to use a dictionary to determine word boundaries.

Unless the style sheet requires that space be preserved, when dealing with the Latin script, the general process of displaying text that is wrapped onto a new line in the source involves first reducing all white space at the beginning and end of the line to a single space. Then, unless the style sheet says otherwise, the line break characters and any surrounding spaces are boiled down to a single space.

The line breaking properties in CSS3 will allow you to indicate appropriate behavior for scripts like Chinese, Japanese and Thai, where adding a space between wrapped text may not be appropriate. (This is quite a complicated area, and discussion is still going on about how best to handle this.)
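
The collapsing behavior described above, and the reason it is awkward for Chinese and Japanese, can be sketched in Python. This is a simplified model, not the algorithm from any specification: the character ranges in is_cjk and the keep_cjk_tight switch are assumptions made for the illustration.

```python
import re


def is_cjk(ch):
    """Rough test for kana and ideographs; the ranges are assumptions of this sketch."""
    code = ord(ch)
    return 0x3040 <= code <= 0x30FF or 0x4E00 <= code <= 0x9FFF


def collapse_source_breaks(source, keep_cjk_tight=False):
    """Collapse each source line break, plus surrounding spaces, to one space.

    With keep_cjk_tight=True (a hypothetical switch), a break that sits
    between two CJK characters is removed instead of becoming a space.
    """
    def replace(match):
        before = source[match.start() - 1] if match.start() > 0 else ""
        after = source[match.end()] if match.end() < len(source) else ""
        if keep_cjk_tight and before and after and is_cjk(before) and is_cjk(after):
            return ""
        return " "

    # One or more line breaks, together with any spaces or tabs around them.
    return re.sub(r"[ \t]*(?:\n[ \t]*)+", replace, source)
```

With the default setting, Japanese source text that happens to be wrapped in the middle of a word gains an unwanted space at that point; with the switch on, the two characters stay adjacent while breaks next to Latin text still become spaces.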

Some of the CSS3 properties will allow you to specify whether or not to wrap differently in the middle of embedded text from another script. For example, if you have Latin text in the middle of Chinese, should it wrap character by character or word by word? Both are valid, and the word-break-cjk property allows you to choose according to your general preferences or the context. (See the two examples on the slide.)

In addition, the line-break property allows you to specify preferences relating to the kinsoku rules and their Chinese and Korean equivalents. The top example on the slide shows the expected result of setting line-break: normal - a small katakana character begins the second line. This tends to be the preference in modern Japanese typography, and is particularly useful in text with narrow columns. The second example shows the result of line-break: strict. The line wrapping algorithm now pulls down the last katakana character from the previous line so that the small katakana character is no longer in line-initial position.
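
The difference between the two settings can be modeled with a small character-by-character wrapper in Python. This is only a sketch: the function name, the strict flag, and the two character sets are assumptions chosen for the illustration, and real kinsoku tables are considerably longer.

```python
# Characters this sketch refuses to start a line with. Both sets are small
# illustrative assumptions, not the lists used by any specification.
ALWAYS_FORBIDDEN_AT_START = set("、。」）")
SMALL_KANA = set("ァィゥェォッャュョ")


def wrap(text, width, strict=False):
    """Break text into lines of at most `width` characters.

    If the next line would start with a forbidden character, the last
    character of the current line is pulled down to the next line too.
    """
    forbidden = ALWAYS_FORBIDDEN_AT_START | (SMALL_KANA if strict else set())
    lines = []
    start = 0
    while start < len(text):
        end = min(start + width, len(text))
        # Move the break back while the next line would start badly,
        # but always keep at least one character on the current line.
        while end < len(text) and text[end] in forbidden and end - start > 1:
            end -= 1
        lines.append(text[start:end])
        start = end
    return lines


normal_lines = wrap("ニュースをチェック", 3)               # ['ニュー', 'スをチ', 'ェック']
strict_lines = wrap("ニュースをチェック", 3, strict=True)  # ['ニュー', 'スを', 'チェッ', 'ク']
```

In the normal result the third line begins with a small katakana character; in the strict result the preceding character has been pulled down so that it no longer does, at the cost of a shorter second line.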

One way of applying the end-of-line kinsoku rules just mentioned is to wrap two characters down to the next line. This can be done, for example, to avoid a sentence final delimiter appearing at the start of a line.

An alternative is to leave the punctuation hanging out of the margin. The hanging-punctuation property will allow you to control this.
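
The two strategies can be compared side by side in a Python sketch. The HANGABLE set, the function name, and the hang flag are hypothetical, and the sketch only handles a single punctuation mark at the break point.

```python
# Marks that this sketch allows to hang outside the line; an assumption
# made for the illustration, not a list taken from the specification.
HANGABLE = set("、。")


def wrap_with_punctuation(text, width, hang=False):
    """Return (line, hanging_mark) pairs; each line has at most `width` characters."""
    rows = []
    start = 0
    while start < len(text):
        end = min(start + width, len(text))
        mark = ""
        if end < len(text) and text[end] in HANGABLE:
            if hang:
                mark = text[end]   # set outside the line, in the margin
            elif end - start > 1:
                end -= 1           # the previous character wraps down as well
        rows.append((text[start:end], mark))
        start = end + len(mark)
    return rows


wrapped = wrap_with_punctuation("これは例です。次の行", 6)
hanging = wrap_with_punctuation("これは例です。次の行", 6, hang=True)
```

Without hanging, the first line gives up its last character so that the full stop does not start the second line. With hanging, the first line stays full and the full stop is returned separately, to be drawn beyond the line's end edge.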

Text alignment & justification

Among the properties for specifying alignment and justification of text is text-justify. One of the values for this property will allow you to specify how justification is applied to mixed script text. The top example on the slide shows how text-justify: inter-ideograph can cause inter-character spacing to be applied to ideographic characters but not Latin text.

The text-justify property will also allow appropriate types of justification for various different scripts, for example ideographic scripts, Indic scripts with baseline connectors, South-East Asian scripts that don't use spaces between words, cursive scripts like Arabic, etc.

If you apply text-justify: kashida to Arabic script you can extend the baseline to justify Arabic words.

The text-kashida-space property allows you to specify the degree to which such stretching is applied during justification.

The text-justify-trim property will allow you to specify whether and how blank space compression behaves in a script like Japanese. The slide shows an example of a full-width parenthesis that has its blank space removed to allow for compression.

Text spacing

The punctuation-trim property will allow you to specify whether a full-width punctuation mark in ideographic text should be narrowed at the beginning of a line so that its 'ink' lines up with the first glyph in the lines above and below.

When non-ideographic text or numbers appear in ideographic text it is often preferable to separate the two with a little additional space. The text-autospace property will allow you to add such spacing without the need for spaces in the content. You can apply it to a number of types of embedded text, and combine them as you wish.
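
To make the boundaries involved concrete, the Python sketch below marks each point where ideographic text meets embedded Latin letters or digits. The property itself would adjust spacing at rendering time and leave the content untouched; inserting a thin space character here is just a way to make those points visible. The function names, the latin and digits switches, and the character ranges are all assumptions of this sketch.

```python
def is_ideographic(ch):
    """Rough test for kana and ideographs; the ranges are assumptions of this sketch."""
    code = ord(ch)
    return 0x3040 <= code <= 0x30FF or 0x4E00 <= code <= 0x9FFF


def autospace(text, latin=True, digits=True, gap="\u2009"):
    """Insert `gap` wherever ideographic text meets embedded Latin letters or digits."""
    def is_embedded(ch):
        if not ch.isascii():
            return False
        return (latin and ch.isalpha()) or (digits and ch.isdigit())

    out = []
    prev = ""
    for ch in text:
        crossing = prev and (
            (is_ideographic(prev) and is_embedded(ch))
            or (is_embedded(prev) and is_ideographic(ch))
        )
        if crossing:
            out.append(gap)
        out.append(ch)
        prev = ch
    return "".join(out)
```

The two switches mirror the idea that spacing can be requested for different types of embedded text and combined as needed: turning digits off, for instance, leaves numbers set tight against the surrounding ideographs.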

Document grids

It is common for the glyphs in documents written in East Asian languages to be laid out on a page in a grid pattern. This approach is helped by the fact that ideographic, kana and hangul characters tend to be the same width.

CSS3 Text specifies a set of properties for applying grids and for managing non-full width characters inside the grid, such as Latin text, in various ways.

This slide shows some vertical Japanese text with no grid applied.

On this slide we see the effect of applying one type of grid layout to the text on the previous slide.

Kumimoji and warichu

Kumimoji is a Japanese word referring to the practice of combining up to 5 glyphs within a single wide-character glyph space. The first example on the slide shows how this might look.

Warichu is a Japanese word for a type of inline note, where the text runs on two lines. See the second example on the slide.

Both of these effects will be available through the text-combine property of CSS3.

Fonts module

The fonts module of CSS3 will allow alternate forms of emphasis, such as the boten marks used in Japanese. These are similar in use to italicisation or bolding in Latin text, neither of which works well at small font sizes on screen.

Ruby module

Ruby markup

Ruby is a type of annotation associated with a base text that is often used in Japanese, and to some extent in Chinese also, to provide pronunciation information for ideographs, and sometimes short explanations. The name 'ruby' originated from a named font size (about half the size of the normal 10 point font) used by British typesetters. In Japanese this is known as furigana.

The Ruby Annotation Recommendation describes how to mark up text so that it is clear which is the base text and which is the ruby annotation. There are simple and complex ruby models. This slide shows the simplest form of markup described in the Recommendation. The text in the <rt> element is the annotation. The base text is in the <rb> element.
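
For readers who generate markup programmatically, the simple form described above can be produced with a small Python helper. The function name is hypothetical; the element structure follows the description in the text (base in rb, annotation in rt, both inside ruby).

```python
from html import escape


def simple_ruby(base, annotation):
    """Return simple ruby markup: base text in <rb>, annotation in <rt>."""
    return "<ruby><rb>{}</rb><rt>{}</rt></ruby>".format(
        escape(base), escape(annotation)
    )


# Pronunciation (in hiragana) attached to a two-character ideographic word.
markup = simple_ruby("東京", "とうきょう")
```

The helper escapes both strings so that characters such as < or & in the text cannot break the surrounding markup.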

A user agent that displays ruby will normally display simple ruby text above horizontal base text and in a smaller font. For vertical text the ruby would by default be displayed to the right.

For a slightly more detailed description see Ruby Markup and Styling.

Ruby styling

Sometimes you may want to be able to control the location of the ruby text. For example, the specification says that it is common to place pinyin annotations below horizontal Chinese text, rather than above. The ruby-position property will give you control over the location.

If you specify before, the ruby text will appear above horizontal base text and to the right of vertical text. Specify after, and the ruby text appears below horizontal base text and to the left of vertical text.

The ruby-align property addresses the relative alignment of ruby and base text when one is longer than the other. Basically the effect is applied to whichever is shorter, the ruby text or the base text.
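
The idea that the effect applies to whichever text is shorter can be expressed as a small calculation. In this Python sketch the widths are plain numbers in any consistent unit, the function name is hypothetical, and the start and center values are assumed here as counterparts of end for the sake of comparison.

```python
def ruby_offsets(base_width, ruby_width, align="end"):
    """Return (base_offset, ruby_offset), measured from the start edge of the pair.

    Only the shorter of the two texts is shifted; the longer one stays at 0.
    """
    slack = abs(base_width - ruby_width)
    shift = {"start": 0.0, "center": slack / 2, "end": float(slack)}[align]
    if ruby_width < base_width:
        return (0.0, shift)
    return (shift, 0.0)


# Ruby narrower than its base: the ruby text moves to the end edge.
narrow_ruby = ruby_offsets(40, 24, "end")
# Ruby wider than its base: the base text moves instead.
wide_ruby = ruby_offsets(20, 50, "end")
```

Values that distribute the slack between characters, rather than shifting the whole run, would need per-character positions and are not modeled here.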

The slide shows how text would look if ruby-align is set to end. (Ignore the green line, which is there just to show the boundaries more clearly on the slide.)

If the ruby text is wider than the base text you can specify whether or not it overhangs any surrounding base text, and to what amount, using the ruby-overhang property.

The example on the slide shows the effect of setting ruby-overhang to start. Note how the ruby text overlaps the preceding characters, but not the following base text. Note also that ruby does not overlap itself.

When will it be ready

Where things stand

We have looked very briefly at some, but by no means all, of the international support CSS3 modules will offer. What is more, we have only dipped our toes into the properties we have described: each slide in this tutorial could become a tutorial in its own right.

Some of the features we have discussed have been implemented in Internet Explorer 5+, but the specifications have changed since then, so you should be very careful about using those features. There is no guarantee that anything you implement currently on Internet Explorer will be interoperable code in the long term. For example, the grid layout properties are now completely different.

With the possible exception of the list type property, there do not appear to be any implementations of these features on other user agents as yet.

If you are curious about these implementations you can find some examples at the following pages:

The following list indicates the current status of the modules discussed here. Note that work is still ongoing on all of these specifications. Even the Candidate Recommendations may return to Working Draft again before they go on to Recommendation.

This is not an exhaustive list of specifications that contain properties relevant to international text. For example, the proposed CSS3 Line module, which has not yet been published as a Working Draft, promises to deliver some important control over behavior related to baseline alignment across scripts.

There is also additional work that must be completed in modules that are dependencies for these specifications.

How to move things faster?

If you are interested in seeing these features become available, please let the W3C know. It is always useful to us to hear what users want. There are also a number of practical ways to get involved.

The CSS Working Group has a lot of work on its hands at the moment, and work on these modules has been going slowly. As mentioned before, there is even discussion going on currently around features in the Candidate Recommendations. Your assistance in developing these specifications will be appreciated.

You can help move the specifications forward by reviewing and commenting on the public drafts that are made available. If you have expertise in this area, you might also consider participating in the CSS or Internationalization Working Groups to help move the work along. A significant amount of specification work benefitted from the Japanese JIS 4051 specification, but there still are a number of areas where it would be useful to have local input about or confirmation of the requirements and expected behaviors of text in other scripts.

Even when we move these specifications to Recommendation stage, the battle is not yet won. User agent developers must implement these features so that we can use them, and we need the properties to be widely implemented on a range of user agents. Again, the voice of local users is important in making that happen. User agent developers are unlikely to implement these features if they hear no-one asking for them.

Finally, you can help by keeping informed about progress on these features and implementing them in your content when they become available.

Further reading

Author: Richard Ishida.

Content created 22 April, 2005. Last update 2005-04-29 18:49 GMT