Intended audience: HTML/XHTML and CSS content authors who want a general idea of what the future holds for CSS support for non-Latin text. The article assumes a basic understanding of CSS.
The CSS3 modules currently in development will introduce a large number of properties designed to support non-Latin text, from vertical script support to kashida justification, from ruby positioning to list numbering. This article will give you a glimpse of some of the properties that lie in store, and discuss how you can help to make these improvements a reality.
The article will not expose you to all of the properties planned for inclusion in CSS3 for the support of international text. It will give you a flavor of what is to come, so that you can see the potential benefits of completing the work.
None of the specifications we will discuss here are finalized.
Hopefully the article will raise your expectations and motivate you, where appropriate, to become involved in bringing these modules to a final state and getting them implemented by user agents.
The way items are numbered in a list varies from script to script. CSS2 specified nine non-Latin numbering systems: lower-greek, hebrew, armenian, georgian, cjk-ideographic, hiragana, katakana, hiragana-iroha, and katakana-iroha. Unfortunately, user agents didn't implement all of these, and as part of its mission to represent a snapshot of current usage, the CSS2.1 specification reduced that number to three: lower-greek, armenian, and georgian.
This shows how important it is to make your voice heard if you want non-Latin features to be supported in specifications and user agents.
It has to be said that the expected behavior for these options was also poorly specified.
The CSS3 Lists Module currently specifies almost 70 non-Latin schemes for list numbering, and provides much more rigorous rules about their use. (If you want to avoid this number shrinking, however, don't take this for granted, and make your voice heard! Implementers tend to ask themselves from time to time whether they need to support so many possibilities.)
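To give a feel for what a numbering scheme involves, here is a minimal Python sketch of a purely alphabetic scheme applied to lower-greek. The function name `alphabetic_marker`, the 24-letter symbol list, and the way the sequence continues past the last letter (αα, αβ, and so on) are assumptions made for illustration; the rules actually adopted are for the Lists Module to define.

```python
# Assumed symbol list: the 24 lowercase Greek letters, final sigma omitted.
LOWER_GREEK = "αβγδεζηθικλμνξοπρστυφχψω"


def alphabetic_marker(n: int, symbols: str = LOWER_GREEK) -> str:
    """Return the marker for list item n (1-based) in an alphabetic scheme.

    After the last symbol the sequence continues with two-symbol markers,
    in the same way that a, b, ..., z is followed by aa, ab, ...
    """
    if n < 1:
        raise ValueError("list item numbers start at 1")
    base = len(symbols)
    marker = []
    while n > 0:
        n -= 1                          # shift to 0-based so there is no "zero" digit
        marker.append(symbols[n % base])
        n //= base
    return "".join(reversed(marker))


if __name__ == "__main__":
    for item in (1, 2, 24, 25, 26):
        print(item, alphabetic_marker(item))
```

Schemes such as hebrew or cjk-ideographic are additive or positional rather than alphabetic, which is one reason the expected behavior needs to be specified carefully for each scheme.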
CSS3 will enable the use of vertical text, and should do so in a way that makes the choice of vertical or horizontal a purely stylistic decision; i.e., you would be able to switch from well-rendered horizontal text to well-rendered vertical text and back with a few simple tweaks to the style sheet and no change to the markup. You will also be able to mix horizontal and vertical text within the same page.
Key parameters here are the block-progression direction, the inline text direction, and the glyph orientation.
The block-progression direction allows you to specify the direction in which lines of text flow. Top-to-bottom would be appropriate for horizontal text, since the lines are stacked top to bottom. Right-to-left would be the normal setting for vertical Chinese, Japanese and Korean, where lines progress from the right to the left of the page or block. Left-to-right would be appropriate for a script like Mongolian, which is also vertical but where your eyes start reading from the top left.
Within a given line it must be possible to specify the inline directionality of text. Right-to-left would be appropriate for scripts such as Arabic and Hebrew. This needs to be interpreted relative to the direction of the block progression.
A writing-mode property has been proposed as a shortcut that combines both block and inline progression into a set of common combinations.
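The idea of a shortcut over two independent directions can be modeled in a few lines of Python. This is only a sketch of the concept: the labels in `COMMON_MODES`, the `Flow` type and the helper functions are hypothetical names chosen for illustration, not values of the proposed property.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    block_progression: str  # direction in which successive lines stack
    inline_direction: str   # direction in which text runs within a line


# Hypothetical labels for the combinations described in the text above.
COMMON_MODES = {
    "horizontal-latin": Flow("top-to-bottom", "left-to-right"),
    "horizontal-arabic": Flow("top-to-bottom", "right-to-left"),
    "vertical-cjk": Flow("right-to-left", "top-to-bottom"),
    "vertical-mongolian": Flow("left-to-right", "top-to-bottom"),
}


def is_vertical(flow: Flow) -> bool:
    """Lines are vertical when they stack sideways across the block."""
    return flow.block_progression in ("right-to-left", "left-to-right")


def reading_start(flow: Flow) -> str:
    """Corner of the block where reading begins.

    Only covers the combinations above, all of which start at the top.
    """
    if is_vertical(flow):
        side = "right" if flow.block_progression == "right-to-left" else "left"
    else:
        side = "left" if flow.inline_direction == "left-to-right" else "right"
    return "top-" + side


if __name__ == "__main__":
    for name, flow in COMMON_MODES.items():
        print(name, is_vertical(flow), reading_start(flow))
```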
It is then necessary to provide a means to control whether the Latin letters in vertical Japanese appear on their side (like 'Johansson' in the picture) or upright (like 'FIFA' in the picture), and whether text (left-to-right and right-to-left) runs up the vertical line or down.
Some of the more difficult aspects of text direction relate to how one provides the user's preferred way of supporting inline text in vertically set paragraphs. The CSS Working Group is currently discussing how this should work (May 2006).
When lines of text wrap, the type of script affects the expected behavior - particularly with regard to the treatment of white space around line breaks in the code.
Chinese and Japanese scripts do not delimit words with spaces, and wrap on a character-by-character basis. There are, however, some rules (called kinsoku rules in Japan) that forbid certain characters (mostly final punctuation) from appearing at the beginning of a line, and others that forbid certain characters appearing at the end of a line.
Thai script uses spaces to delimit phrases, rather than words, and typically does not use sentence final punctuation. There is, however, a strong concept of a word, and text should be wrapped at word boundaries. Some Thai systems rely on users adding zero-width spaces to indicate where wrapping is appropriate, but it is more common to use a dictionary to determine word boundaries.
Unless the style sheet requires that space be preserved, when dealing with the Latin script, the general process of displaying text that is wrapped onto a new line in the source involves first reducing all white space at the beginning and end of the line to a single space. Then, unless the style sheet says otherwise, the line break characters and any surrounding spaces are boiled down to a single space.
The line breaking properties in CSS3 will allow you to indicate appropriate behavior for scripts like Chinese, Japanese and Thai, where adding a space between wrapped text may not be appropriate. (This is quite a complicated area, and discussion is still going on about how best to handle this.)
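The difference between the two behaviors can be sketched in Python. The function below is an illustration only: `join_wrapped_source_lines` and its `joiner` parameter are hypothetical names standing in for whatever control the line breaking properties eventually provide, and the handling of white space is deliberately simplified.

```python
import re

# A source line break together with any spaces or tabs on either side of it.
_SOURCE_BREAK = re.compile(r"[ \t]*\r?\n[ \t]*")


def join_wrapped_source_lines(text: str, joiner: str = " ") -> str:
    """Replace each line break in the source, plus surrounding spaces, by `joiner`.

    joiner=" " models the usual treatment of Latin text; joiner="" models
    a script such as Chinese or Japanese where no space should be added.
    """
    return _SOURCE_BREAK.sub(joiner, text)


if __name__ == "__main__":
    print(join_wrapped_source_lines("wrapped in  \n   the source"))
    print(join_wrapped_source_lines("日本\n  語", joiner=""))
```

With the default joiner, Japanese text that is wrapped in the source would gain an unwanted space in the middle of a word, which is the problem the new properties are meant to address.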
Some of the CSS3 properties will allow you to specify whether or not to wrap differently in the middle of embedded text from another script. For example, if you have Latin text in the middle of Chinese, should it wrap character by character or word by word? Both are valid, and CSS should allow you to choose according to your general preferences or the context. (See the two examples in the figure below.)
In addition, it is also important to allow the user to express preferences relating to the kinsoku rules and their Chinese and Korean equivalents. The top example in the figure below shows a typical result when the last character on a line is followed by a small katakana character.
The small katakana character begins the second line. This tends to be the preference in modern Japanese typography, and is particularly useful in text with thin columns. The second example shows the result of a different approach, often referred to as 'strict'. The line wrapping algorithm now pulls down the last katakana character from the previous line so that the small katakana character is no longer in line-initial position, leaving a gap at the end of the line if the text is not justified.
One way of applying the end-of-line kinsoku rules just mentioned is to wrap two characters down to the next line. This may be done, for example, to avoid a sentence final delimiter appearing at the start of a line.
An alternative is to leave the punctuation hanging out of the margin. CSS3 should allow you to control this.
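The two approaches can be compared with a small Python sketch. It wraps text character by character and treats a short, illustrative set of characters as forbidden at the start of a line. The names `NO_LINE_START` and `wrap_cjk`, the mode names, and the contents of the set are assumptions made for this example; real kinsoku rules cover more characters and more cases.

```python
# Illustrative subset of characters that should not begin a line (an assumption).
NO_LINE_START = set("。、」）ッャュョ")


def wrap_cjk(text: str, width: int, mode: str = "pull-down") -> list[str]:
    """Wrap `text` into lines of at most `width` characters.

    mode="pull-down": the last character of the line is wrapped down together
                      with the forbidden character, leaving a gap on the line.
    mode="hang":      the forbidden character hangs past the end of the line.
    Simplification: only one character is ever pulled down or hung.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    if mode not in ("pull-down", "hang"):
        raise ValueError("unknown mode: " + mode)
    lines = []
    start = 0
    while start < len(text):
        end = min(start + width, len(text))
        if end < len(text) and text[end] in NO_LINE_START:
            if mode == "hang":
                end += 1            # let the punctuation hang out of the margin
            elif end - start > 1:
                end -= 1            # pull the previous character down as well
        lines.append(text[start:end])
        start = end
    return lines


if __name__ == "__main__":
    sample = "あいうえお。かきく"
    print(wrap_cjk(sample, 5, "pull-down"))
    print(wrap_cjk(sample, 5, "hang"))
```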
When specifying alignment and justification of text, CSS needs to allow you to specify how justification is to be applied to mixed script text. The example below shows how one approach can cause inter-character spacing to be applied to ideographic characters but not Latin text.
Properties to control justification will also allow appropriate types of justification for various different scripts, for example ideographic, indic scripts with baseline connectors, South-East Asian scripts that don't use spaces between words, cursive scripts like Arabic, etc.
Arabic script text typically justifies by stretching the connected baselines of letters within a word, rather than stretching the spaces between the words. CSS3 should provide properties to allow you to extend the baseline in this way.
You also need to be able to specify the degree to which such stretching is applied during justification.
As part of the process of justifying CJK text, you may want to specify whether and how blank space compression behaves. The picture below shows an example of a full-width parenthesis in Japanese that has its blank space removed to allow for compression.
You should also be able to specify whether a full-width punctuation mark in ideographic text should be narrowed at the beginning of a line so that its 'ink' lines up with the first glyph in the lines above and below. This makes the left margin or top margin look neater, but also affects the justification of the line on which it occurs.
When non-ideographic text or numbers appear in ideographic text it is often preferable to separate the two with a little additional space. CSS3 will provide a property that allows you to add such spacing without the need for spaces in the content. You can apply it around a number of different types of embedded text, and combine them as you wish.
It is common for the glyphs in documents written in East Asian languages to be laid out on a page in a grid pattern. This approach is helped by the fact that ideographic, kana and hangul characters tend to be the same width.
CSS3 may specify a set of properties for applying grids and for managing non-full-width characters inside the grid, such as Latin text, in various ways. Such properties would indicate whether the grid should apply to non-ideographic characters in the same way as to CJK characters.
This picture shows some vertical Japanese text with no grid applied.
In the next picture we see the effect of applying one type of grid layout to the previous text.
Kumimoji is a Japanese word referring to the practice of combining up to 5 glyphs within a single wide-character glyph space.
Warichu is a Japanese word for a type of inline note, where the text runs on two lines.
Both of these effects will be available through CSS3 properties.
The fonts module of CSS3 will allow native forms of emphasis, such as the boten marks used in Japanese. These are similar in use to italicisation or bolding in Latin text - neither of which work well in small font sizes for on-screen ideographic text.
Various properties are needed to specify the type of boten character to be used, and its position relative to the base text (i.e., top or bottom for horizontal text, and left or right for vertical text).
Ruby is a type of annotation associated with a base text that is often used in Japanese, and to some extent in Chinese also, to provide pronunciation information for ideographs, and sometimes short explanations. In Japanese, ruby is known as furigana. The name 'ruby' originated from a named font size (about half the size of the normal 10 point font) used by British typesetters.
The Ruby Annotation Recommendation describes how to mark up text so that it is clear which is the base text and which is the ruby annotation. (This is not a CSS specification.) There are simple and complex ruby models. The picture below shows the simplest form of markup described in the Recommendation. The text in the <rt> element is the annotation. The base text is in the <rb> element.
A user agent that displays ruby will normally display simple ruby text above horizontal base text and in a smaller font. For vertical text the ruby would by default be displayed to the right.
For a more detailed description see Ruby Markup and Styling.
Sometimes you may want to be able to control the location of the ruby text. For example, the Ruby Annotation specification says that it is common to place pinyin annotations below horizontal Chinese text, rather than above. CSS3 will provide a property that gives you control over the location.
If you specify before, the ruby text will appear above horizontal base text and to the right of vertical text. Specify after, and the ruby text appears below horizontal base text and to the left of vertical text.
A value of 'right' might be useful for dealing with bopomofo annotations in Traditional Chinese text, where the ruby appears in a column to the right of each ideographic character, whether the text is horizontal or vertical.
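The mapping from these values to a physical side can be written out as a small Python function. This simply restates the behavior described above; the function name `ruby_side` and the returned labels are hypothetical, and the draft may change.

```python
def ruby_side(position: str, vertical: bool) -> str:
    """Physical side of the base text on which the ruby text is placed.

    position: one of the values discussed above: "before", "after", "right".
    vertical: True for vertical base text, False for horizontal.
    """
    if position == "right":
        return "right"  # same side whether the text is horizontal or vertical
    if position == "before":
        return "right" if vertical else "above"
    if position == "after":
        return "left" if vertical else "below"
    raise ValueError("unknown ruby position: " + position)


if __name__ == "__main__":
    for value in ("before", "after", "right"):
        print(value, ruby_side(value, vertical=False), ruby_side(value, vertical=True))
```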
Properties are also needed to indicate the relative alignment of ruby and base text when one is longer than the other. Basically the effect is applied to whichever is shorter, the ruby text or the base text.
The picture below shows how text would look if a ruby-align property were set to end. (Ignore the green line, which is there just to show the boundaries more clearly.)
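The arithmetic behind this kind of alignment is simple and can be sketched in Python. The function name `align_shorter` is hypothetical, widths are in arbitrary units, and the `start` and `center` values are assumptions added for comparison; only `end` is taken from the example above.

```python
def align_shorter(base_width: float, ruby_width: float, align: str = "end"):
    """Return (which text is shorter, its offset from the start of the longer one).

    The alignment is applied to whichever of the two is shorter,
    measured along the inline direction.
    """
    shorter = "ruby" if ruby_width <= base_width else "base"
    slack = abs(base_width - ruby_width)  # room left over alongside the shorter text
    if align == "start":
        offset = 0
    elif align == "center":
        offset = slack / 2
    elif align == "end":
        offset = slack
    else:
        raise ValueError("unknown alignment: " + align)
    return shorter, offset


if __name__ == "__main__":
    # Ruby text 24 units wide over base text 40 units wide, aligned to the end.
    print(align_shorter(40, 24, "end"))
```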
If the ruby text is wider than the base text you can specify whether or not it overhangs any surrounding base text, and to what amount, using a property for ruby overhang.
The example in the following picture shows the effect of setting a ruby-overhang property to start. Note how the ruby text overlaps the preceding characters, but not the following base text. Note also that ruby does not overlap itself.
For a more detailed description of the currently proposed approach to ruby styling see Ruby Markup and Styling.
We have looked very briefly at some, but by no means all, of the international support that CSS3 modules will offer. What is more, we have only dipped our toes into the properties we have described. When CSS3 is done, many a section in this article could become an article in its own right.
Some of the features we have discussed have been implemented in Internet Explorer 5+, but the specifications have changed since then, so you should be very careful about using those features. There is no guarantee that anything you implement currently on Internet Explorer will be interoperable code in the long term. For example, the grid layout properties are now completely different.
With the possible exception of the list type property, there do not appear to be any implementations of these features on other user agents as yet.
The following list indicates the current status of the modules discussed here. Note that work is still ongoing on all of these specifications. Even the Candidate Recommendation may return to Working Draft again before it goes on to Recommendation.
This is not an exhaustive list of specifications that contain properties relevant to international text. For example, the proposed CSS3 Line Module, which has not yet been published as a Working Draft, promises to deliver some important control over behavior related to baseline alignment across scripts.
There is also additional work that must be completed in modules that are dependencies for these specifications.
If you are interested in seeing these features become available, please let the W3C know. It is always useful to us to hear what users want. There are also a number of practical ways to get involved.
The CSS Working Group has a lot of work on its hands at the moment, and work on these modules has been going slowly. As mentioned before, there is even currently discussion going on around features in the Candidate Recommendations. Your assistance in developing these specifications will be appreciated.
You can help move the specifications forward by reviewing and commenting on the public drafts that are made available. If you have expertise in this area, you might also consider participating in the CSS or Internationalization Working Groups to help move the work along. A significant amount of specification work benefitted from the Japanese JIS 4051 specification, but there still are a number of areas where it would be useful to have local input about or confirmation of the requirements and expected behaviors of text in other scripts.
Even when we move these specifications to Recommendation stage, the battle is not yet won. User agent developers must implement these features so that we can use them, and we need the properties to be widely implemented on a range of user agents. Again, the voice of local users is important in making that happen. User agent developers are unlikely to implement these features if they hear no-one asking for them.
Finally, you can help by keeping informed about progress on these features and implementing them in your content when they become available.
Content first published 2005-04-22. Last substantive update 2006-05-17 GMT. This version 2011-05-04 8:51 GMT.
For the history of document changes, search for article-css3-text in the i18n blog.
Copyright © 2005-2011 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved.