W3C WD-css3-text-20050627

CSS3 Text Effects Module

W3C Working Draft 27 June 2005

This version:
http://www.w3.org/TR/2005/WD-css3-text-20050627/
Latest version:
http://www.w3.org/TR/css3-text/
Previous version:
http://www.w3.org/TR/2003/CR-css3-text-20030514/
Editor:
Elika J. Etemad
Previous Editor:
Michel Suignard (Microsoft)

Abstract

This CSS3 module defines properties for text manipulation and specifies their processing model. It covers line breaking, justification and alignment, white space handling, text decoration and text transformation.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is a Working Draft, and it is still very incomplete. In fact, the majority of its sections have not been added in. Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress. Feedback on this draft should be posted to the www-style@w3.org mailing list with [CSS3 Text] in the subject line. You are strongly encouraged to complain if you see something stupid in this draft. I will do my best to respond to all feedback.

This Text Effects module and a separate (upcoming) Text Layout module replace and obsolete the May 2003 CSS3 Text Module Candidate Recommendation. Since this is a thorough overhaul of the previous version, a list of changes has been provided instead of a diff.

IF YOU HAVE IMPLEMENTED PROPERTIES FROM CSS3 TEXT CR please let me know so I can take that into account as I redraft the spec. You can post to www-style (public), post to the CSS WG mailing list (Member-restricted), or email me directly (personal).

This document has been produced as a combined effort of the W3C Internationalization Activity and the Style Activity, and is maintained by the CSS Working Group. It also includes contributions made by participants in the XSL Working Group (members only). Patent disclosures relevant to CSS may be found on the Working Group's public patent disclosure page.

1. Introduction

[document here]

2. Conformance

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 (see [RFC2119]). However, for readability, these words do not typically appear in all uppercase letters in this specification.

Additional key words, e.g. "User agent (UA)", are defined by CSS 2.1 ([CSS21], section 3.1).

2.1. Partial and Experimental Implementations

UAs must treat as invalid any properties or values they do not support. Experimental implementations should support only a vendor-prefixed syntax for the property/value.
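
As an illustration only, an experimental implementation of one of this draft's properties might expose it solely under a vendor prefix. The prefix '-example-' and the selector below are hypothetical, chosen for this sketch; actual prefixes are vendor-specific.

```css
/* Hypothetical prefix "-example-"; real prefixes are vendor-specific. */
p {
  -example-hyphenate: auto;  /* experimental UA: recognizes the prefixed syntax only */
  hyphenate: auto;           /* treated as invalid by any UA that does not support it */
}
```

A UA that supports neither declaration treats both as invalid, so the rule has no effect there.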

3. White Space Processing

White space processing in CSS interprets white space characters for rendering: it has no effect on the underlying document data. In the context of CSS, the document white space set is defined to be any space characters (Unicode value U+0020), tab characters (U+0009), or line break characters (defined by the document format: typically line feed, U+000A). Control characters besides the white space characters and the bidi formatting characters (U+202x) are treated as normal characters and rendered according to the same rules.

The document parser must normalize line break character sequences according to its own format rules before CSS processing takes effect. However, in generated content strings the line feed character (U+000A) and only the line feed character is considered a line break sequence. For CSS white space processing all line breaks must be normalized to a single character representation—usually the line feed character (U+000A)—here called a "line break". This way, all recognized line breaks are treated the same and style rules behave consistently across systems.

The document parser may have not only normalized line break characters, but also collapsed other space characters or otherwise processed white space according to markup rules. Because CSS processing occurs after the parsing stage, it is not possible to restore these characters for styling. Therefore, some of the behavior specified below can be affected by these limitations and may be user agent dependent.

3.1. White Space Collapsing: the 'white-space-collapse' property

Name: white-space-collapse
Value: preserve | collapse | preserve-breaks | discard
Initial: collapse
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value

This property declares whether and how white space inside the element is collapsed. Values have the following meanings:

collapse
This value directs user agents to collapse sequences of white space into a single character (or in some cases, no character).
preserve
This value prevents user agents from collapsing sequences of white space. Line breaks are preserved.
preserve-breaks
This value collapses white space as for 'collapse', but preserves line breaks.
discard
This value directs user agents to discard all white space in the element.
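
A brief sketch of how the four values might be used. The selectors are hypothetical and the comments only restate the definitions above.

```css
/* Selectors are hypothetical. */
p          { white-space-collapse: collapse; }         /* initial value: sequences of white space collapse */
pre        { white-space-collapse: preserve; }         /* no collapsing; line breaks are preserved */
.address   { white-space-collapse: preserve-breaks; }  /* collapse as for 'collapse', but keep line breaks */
ul.toolbar { white-space-collapse: discard; }          /* discard all white space in the element */
```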

3.2. The White Space Processing Rules

Any text that is directly contained inside a block (not inside an inline) is treated as being inside an anonymous inline element.

For each inline (including anonymous inlines), white space characters are handled as follows, ignoring bidi formatting characters as if they were not there:

Then, the entire block is rendered. Inlines are laid out, taking bidi reordering into account, and wrapping as specified by the 'text-wrap' property.

As each line is laid out,

  1. A sequence of collapsible spaces (U+0020) at the beginning of a line is removed.
  2. A tab (U+0009) is rendered as a horizontal shift that lines up the start edge of the next glyph with the next tab stop. Tab stops occur at points that are multiples of 8 times the width of a space (U+0020) rendered in the block's font from the block's starting content edge.
  3. A sequence of collapsible spaces (U+0020) at the end of a line is removed.
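
The tab stop arithmetic in item 2 can be sketched with a worked example. The symbols are assumptions of this sketch: w is the width of a space (U+0020) in the block's font, x is the offset of the tab from the block's starting content edge, and a tab that already sits on a tab stop is assumed to advance to the following stop, which the text above does not state explicitly.

```latex
\begin{align*}
t_k &= 8wk, \quad k = 0, 1, 2, \dots
  && \text{(tab stop positions)} \\
t_{\text{next}}(x) &= 8w\left(\left\lfloor \frac{x}{8w} \right\rfloor + 1\right)
  && \text{(first stop strictly after } x\text{)} \\
t_{\text{next}}(47\,\text{px}) &= 40\,\text{px}\cdot\left(\left\lfloor \tfrac{47}{40} \right\rfloor + 1\right) = 80\,\text{px}
  && \text{(with } w = 5\,\text{px, so } 8w = 40\,\text{px)}
\end{align*}
```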

3.2.1. Example of bidirectionality with white space collapsing

Consider the following markup fragment, taking special note of spaces (with varied backgrounds and borders for emphasis and identification):

<ltr>A <rtl> B </rtl> C</ltr>

where the <ltr> element represents a left-to-right embedding and the <rtl> element represents a right-to-left embedding. If the 'white-space-collapse' property is set to 'collapse', the above processing model would result in the following:

This would leave two spaces, one after the A in the left-to-right embedding level, and one after the B in the right-to-left embedding level. This is then ordered according to the Unicode bidirectional algorithm, with the end result being:

A  BC

Note that there are two spaces between A and B, and none between B and C. This is best avoided by putting spaces outside the element instead of just inside the opening and closing tags and, where practical, by relying on implicit bidirectionality instead of explicit embedding levels.

3.2.2. Line Break Transformation Rules

When line breaks are collapsible, they are either transformed into a space (U+0020) or removed depending on the script context before and after the line break.

The script context is determined by the Unicode-given script value [UAX24] of the first character on that side of the line break. However, characters such as punctuation that belong to the COMMON and INHERITED scripts are ignored in this check; the next character is examined instead. The UA must not examine characters outside the block and may limit its examination to as few as four characters on each side of the line break. If the check fails to find an acceptable script value (i.e. it has hit the check limits), then the script context is neutral.

Comments on how well this would work in practice would be very much appreciated, particularly from people who work with Thai and similar scripts.

3.2.3. Informative Summary of White Space Collapsing Effects

4. Line Breaking and Word Boundaries

In many writing systems, words are always separated by spaces or punctuation. In the absence of a hyphenation dictionary, a line break can occur only at these explicit word boundaries. In Chinese and Japanese typography, however, neither spaces nor any other word-separating characters are used. In these systems a line can break anywhere except between certain character combinations. Additionally, the level of strictness in these restrictions can vary with the typesetting style.

Scripts like Thai, which use a space to separate clauses rather than words, present another type of line breaking case. The lack of visible word delimiters makes Thai similar to the CJK systems. However, like English in the absence of a hyphenation dictionary, Thai never breaks inside words. As a result, knowledge of the vocabulary is necessary to be able to correctly break a line of Thai text. To explicitly mark word boundaries, the zero width space (U+200B) can be used as a word delimiter in Thai and similar scripts.

4.1. Line Breaking Restrictions: the 'word-break' property

CSS distinguishes between two levels of strictness in the rules for implicit line breaking in CJK text. The precise set of rules in effect for the strict and loose levels is up to the UA and should follow language conventions. However, this specification does recommend that the following breaks be forbidden in strict line breaking and allowed in loose:

Information on line breaking conventions can be found in [JIS4051] for Japanese, [标点符号] for Chinese, and [?] for Korean, and in [UAX14].

Name: word-break
Value: normal | keep-all | loose | break-strict | break-all
Initial: normal
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value

This property specifies which set of line breaking restrictions is in effect within the element. Values have the following meanings:

normal
Breaks non-CJK scripts according to their own rules while using a strict set of line breaking restrictions for CJK scripts (Hangul, Japanese Kana, and CJK ideographs).
keep-all
Same as 'normal' for all non-CJK scripts. However, sequences of CJK characters can no longer break on implied break points. This option should only be used where the presence of white space characters still creates line-breaking opportunities, as in Korean.
loose
As for 'normal', but CJK scripts use a less restrictive set of line-breaking restrictions.
break-strict
Same as 'normal' for CJK scripts, but non-CJK scripts can break anywhere. This option is used mostly when the text is predominantly CJK characters with few non-CJK excerpts and it is desired that the text be more evenly distributed on each line.
break-all
As for 'break-strict', except CJK scripts break according to the rules for 'loose'.

When shaping scripts such as Arabic are allowed to break within words due to 'break-all' or 'break-strict', the characters must still be shaped as if the word were not broken.
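
A short sketch of how these values might be applied. The selectors and the pairing of values with particular languages are illustrative assumptions, not recommendations made by this draft.

```css
/* Selectors and language choices are illustrative only. */
:lang(ko)     { word-break: keep-all; }      /* CJK sequences do not break at implied break points */
:lang(ja)     { word-break: loose; }         /* less restrictive set of CJK line-breaking restrictions */
td.cjk-mostly { word-break: break-strict; }  /* non-CJK scripts can break anywhere; CJK as for 'normal' */
```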

4.2. Hyphenation: the 'hyphenate' property

Name: hyphenate
Value: none | auto
Initial: none
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value

This property determines whether the line-breaking algorithm is allowed to use a hyphenation engine to break within words. Intra-word breaking restrictions have no effect when 'word-break' is 'break-all'. Possible values:

none
No intra-word breaking.
auto
Words can be broken at an appropriate hyphenation point. This requires that the user agent have a hyphenation dictionary for the language of the text being broken.
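
For illustration, a style sheet might enable hyphenation for body text only. The selectors are hypothetical.

```css
/* Hypothetical selectors. */
p      { hyphenate: auto; }  /* words can break at hyphenation points, given a dictionary for the language */
h1, h2 { hyphenate: none; }  /* initial value: no intra-word breaking */
```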

5. Text Wrapping

Text wrapping is controlled by the 'text-wrap' and 'word-wrap' properties:

5.1. Text Wrap Settings: the 'text-wrap' property

Name: text-wrap
Value: normal | unrestricted | none | suppress
Initial: normal
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value

This property specifies the mode of text wrap. Possible values:

normal
Lines may break at allowed break points, as determined by the line-breaking rules in effect.
none
Lines may not break; text that does not fit within the block box overflows it.
unrestricted
Lines may break between any two grapheme clusters. Line-breaking restrictions have no effect and hyphenation does not take place. Character shaping must ignore the break.
suppress
Line breaking is suppressed within the element: breaking is allowed, but priority is given to valid breakpoints before and after the element. If the text breaks, line-breaking restrictions are honored as for 'normal'.

When restricted text-wrapping is enabled, UAs that allow breaks at punctuation other than spaces should prioritize breakpoints. For example, if breaks after slashes have a lower priority than spaces, the sequence "check /etc" will never break between the '/' and the 'e'. The UA may use the width of the containing block, the document language, and other factors in assigning priorities.

Example of using 'text-wrap: suppress' in presenting a footer

The priority of breakpoints can be set to reflect the intended grouping of text.

Given the rules

footer { text-wrap: suppress; }
      

and the following markup:

<footer>
  <venue>27th Internationalization and Unicode Conference</venue>
  &#8226; <date>April 7, 2005</date> &#8226;
  <place>Berlin, Germany</place>
</footer>
      

In a narrow window the footer could be broken as

27th Internationalization and Unicode Conference •
April 7, 2005 • Berlin, Germany
      

or in a narrower window as

27th Internationalization and Unicode
Conference • April 7, 2005 •
Berlin, Germany
      

but not as

27th Internationalization and Unicode Conference • April
7, 2005 • Berlin, Germany
      

5.2. Force Wrapping: the 'word-wrap' property

Name: word-wrap
Value: normal | break-word
Initial: normal
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value

This property specifies whether the UA may break within a word to prevent overflow when an otherwise-unbreakable string is too long to fit within the containing block. It only has an effect when 'text-wrap' is either 'normal' or 'suppress'. Possible values:

normal
Lines may break only at allowed break points.
break-word
An unbreakable 'word' may be broken at an arbitrary point if there are no otherwise-acceptable break points in the line. Shaping characters are still shaped as if the word were not broken, and grapheme clusters must stay as one unit.
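
A minimal sketch of a typical use, assuming a table cell that holds long unbreakable strings such as URLs. The selector 'td.url' is hypothetical.

```css
/* "td.url" is a hypothetical selector. */
td.url {
  text-wrap: normal;      /* 'word-wrap' only has an effect with 'normal' or 'suppress' */
  word-wrap: break-word;  /* a too-long unbreakable string may be broken at an arbitrary point */
}
```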

6. Alignment and Justification

6.1. Text Alignment: the 'text-align' property

Name: text-align
Value: start | end | left | right | center | justify | <string>
Initial: start
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value

This property describes how inline contents of a block are horizontally aligned. Values have the following meanings:

start
The inline contents are aligned to the start edge of the line box.
end
The inline contents are aligned to the end edge of the line box.
left
The inline contents are aligned to the left edge of the line box. In vertical text, "left" is interpreted with respect to the beginning of the line stack rather than the top of the page.
right
The inline contents are aligned to the right edge of the line box. In vertical text, "right" is interpreted with respect to the beginning of the line stack rather than the top of the page.
center
The inline contents are centered within the line box.
justify
The text is justified according to the method specified by the 'text-justify' property.
<string>
When applied to a table cell, specifies a string on which all cells in its table column that also have a string value for 'text-align' will align (see the section on horizontal alignment in a column for details and an example). When applied to any other element, it is treated as 'start'.

A block of text is a stack of line boxes. In the case of 'start', 'end', 'left', 'right' and 'center', this property specifies how the inline boxes within each line box align with respect to the line box's sides: alignment is not with respect to the viewport. In the case of 'justify', the UA may stretch the inline boxes in addition to adjusting their positions. (See also the 'text-justify', 'text-justify-trim', 'text-kashida-space', 'letter-spacing' and 'word-spacing' properties.)
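
The following sketch shows a few of the values in use. The selectors are hypothetical, and the comments only restate the definitions given above.

```css
/* Hypothetical selectors; values as defined in this section. */
p.body     { text-align: justify; }  /* justified according to 'text-justify' */
blockquote { text-align: end; }      /* aligned to the end edge of the line box */
td.amount  { text-align: "."; }      /* cells in the column that have a string value align on their string */
```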

6.2. Last Line Alignment: the 'text-align-last' property

Name: text-align-last
Value: start | end | left | right | center | justify
Initial: start
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value

This property describes how the last line of a block or a line right before a forced line break is aligned when 'text-align' is set to 'justify'. Values have the same meaning as for 'text-align'.
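
As a small illustration, with a hypothetical selector:

```css
p.caption {
  text-align: justify;
  text-align-last: center;  /* the last line, and any line right before a forced break, is centered */
}
```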

6.3. Justification Method: the 'text-justify' property

Name: text-justify
Value: auto | inter-word | inter-ideograph | inter-character | inter-cluster | kashida | size
Initial: auto
Applies to: all elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value

This property selects the justification method used when 'text-align' is set to 'justify'. Different values affect different types of writing systems in different ways. For justification purposes, writing systems are grouped as follows:

block
CJK (including Hangul and half-width kana) and by extension all "wide" characters. (See [UAX11])
clustered
South-East Asian scripts that have discrete units but do not use space between words (such as Thai, Lao, Khmer, Myanmar)
connected
Devanagari and other South Asian scripts using spaces between words and baseline connectors within words (such as Bengali and Gurmukhi)
cursive
Arabic and similar cursive scripts
discrete
Scripts that use spaces between words and have discrete, unconnected (in print) units within words, such as Latin, Greek, Cyrillic, Hebrew; this category also includes symbols and punctuation.

'text-justify' takes the following values:

auto
The UA determines the justification algorithm to follow, based on a balance between performance and adequate presentation quality. The UA may not, however, change the font size (height).
inter-word
Justification primarily flexes the word spaces.
inter-ideograph
Justification primarily flexes spaces in all scripts and inter-graphemic boundaries in scripts that use no word spaces.
inter-character
Justification primarily flexes both spaces and the graphemic boundaries in all scripts except those in the connected and cursive groups.
inter-cluster
Justification primarily flexes spaces in all scripts and grapheme cluster boundaries in cluster scripts.
kashida
Justification primarily stretches Arabic and related scripts through the use of kashida. Second and third priority flex points are the same as the 'inter-word' first- and second-priority flex points.
size
Instead of increasing spacing, this justification method increases the font size of all text on the line until the line box is full or until one of the font sizes reaches its maximum, whichever comes first. The exact sizing algorithm is UA-dependent. Any justification beyond that is done as for 'auto'.
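
A sketch of how a style sheet might choose a justification method per language. The selectors and the pairing of languages with methods are illustrative assumptions; the draft does not prescribe them.

```css
/* Language selectors are illustrative. 'text-justify' only matters
   where 'text-align' is 'justify'. */
p          { text-align: justify; text-justify: inter-word; }
p:lang(ja) { text-justify: inter-ideograph; }
p:lang(th) { text-justify: inter-cluster; }
p:lang(ar) { text-justify: kashida; }
```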

How should Tibetan get classified? Should it be treated similarly to Latin, with each grapheme cluster as a separate in-word unit and tseks justified like word spaces?

When justifying text, the user agent takes the remaining space between the ends of a line's contents and the edges of its line box, and distributes that space at flex points throughout the contents so that the contents exactly fill the line box. If the 'letter-spacing' and 'word-spacing' property values allow it, the user agent may also distribute negative space, putting more content on the line than would otherwise fit under normal spacing conditions. The exact algorithm is UA-dependent; however, CSS defines some general guidelines which must be followed when any justification method other than 'auto' is specified.

CSS defines two or three priorities for justification flex points, depending on the method. The first priority flex points must be evenly expanded or compressed to their limits before second priority flex points can be adjusted. The 'kashida' value also defines third-priority flex points; they may only be used after the second-priority flex points reach their limits. How any remaining space is distributed once the last-priority flex points also hit their limits is left to the UA. If the inline contents of a line cannot be stretched to the full width of the line box, then they must be aligned as specified by the 'text-align-last' property or as 'start' if 'text-align-last' is 'justify'.

The flex point prioritization for values of 'text-justify' is given below. Spacing must be distributed evenly between all flex points in a given prioritization group except for kashida; the priority of kashida compared to other flex points in the group may be different and this behavior is UA-dependent. The different types of flex points are defined as follows:

spaces
flex at spaces and other visible word separators
graphemes
flex between two grapheme clusters when at least one of them belongs to the relevant script group
kashida
apply kashida elongation. This may be done in discrete kashida units, and the prioritization of kashida points is UA-dependent: for example, the UA may apply more at the end of the line. The UA should not apply kashida to fonts for which it is inappropriate. It may instead rely on other justification methods that lengthen Arabic segments (e.g. by substituting in swash forms).
font-size
flex by adjusting the font size, as described above

method           priority  discrete           block              clustered          connected     cursive
inter-word       1st       spaces             spaces             spaces             spaces        spaces
                 2nd       graphemes          graphemes          graphemes          n/a           kashida
inter-ideograph  1st       spaces             spaces, graphemes  spaces, graphemes  spaces        spaces
                 2nd       graphemes          n/a                n/a                n/a           kashida
inter-character  1st       spaces, graphemes  spaces, graphemes  spaces, graphemes  spaces        spaces
                 2nd       n/a                n/a                n/a                n/a           kashida
inter-cluster    1st       spaces             spaces             spaces, graphemes  spaces        spaces
                 2nd       graphemes          graphemes          n/a                n/a           kashida
kashida          1st                                                                              kashida
                 2nd       spaces             spaces             spaces, graphemes  spaces        spaces
                 3rd       graphemes          graphemes          n/a                n/a           n/a
size             1st       font-size          font-size          font-size          font-size     font-size
                 2nd       UA-dependent       UA-dependent       UA-dependent       UA-dependent  UA-dependent

7. Spacing

The next two properties refer to the <spacing-limit> value type, which is defined as follows:

<spacing-limit>
normal | <length> | <percentage>

If only two values are specified, the third is assumed to be the same as the second. If only one value is specified, all three values are the same.

normal
Specifies the normal optimum/minimum/maximum spacing, as defined by the current font and/or the user agent.
<length> or <percentage>
Specifies extra spacing in addition to the normal spacing. Percentages are with respect to the width of a space (U+0020). Values can be negative, but there may be implementation-dependent limits.

Should these descriptions be copied into each property?

7.1. Word Spacing: the 'word-spacing' property

Name: word-spacing
Value: <spacing-limit>{1,3}
Initial: normal
Applies to: all elements
Inherited: yes
Percentages: refers to width of space (U+0020) glyph
Media: visual
Computed value: 'normal' or computed value or percentage

This property specifies spacing behavior between words.

The first 'word-spacing' value specifies the desired (optimum) spacing. The second value specifies the desired minimum spacing limit, and the third specifies the desired maximum spacing limit. If the minimum spacing value is greater than the optimum spacing value, then the used minimum spacing value becomes the optimum spacing value. If the maximum spacing value is less than the optimum spacing value, then the used maximum spacing value becomes the optimum spacing value. The text justification process must not violate the minimum spacing limit and should also avoid exceeding the maximum. (See the 'text-justify' property.)
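
The value forms and the clamping rule can be sketched as follows. The selectors are hypothetical, and the comments apply the rules stated in this section to the example values.

```css
/* Value order is <optimum> <minimum> <maximum>; lengths and percentages
   are extra spacing added to the normal spacing. Selectors are hypothetical. */
p.loose { word-spacing: 0.1em; }            /* one value: optimum, minimum and maximum are all 0.1em */
p.flex  { word-spacing: 0 -0.05em 0.3em; }  /* justification must not tighten past -0.05em, should not exceed 0.3em */
p.clamp { word-spacing: 0.2em 0.5em 1em; }  /* minimum is greater than optimum: used minimum becomes 0.2em */
p.pct   { word-spacing: 0 -10% 50%; }       /* percentages refer to the width of a space (U+0020) */
```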

Spacing is applied to each word-separator character left in the text after the white space processing rules have been applied and should be applied half on each side of the character. Word-separator characters include the space (U+0020), the no-break space (U+00A0), the Ethiopic word space (U+1361), the ideographic space (U+3000), the Aegean word separators (U+10100, U+10101), the Ugaritic word divider (U+1039F), and the Tibetan tsek (U+0F0B, U+0F0C). Is this list correct? If there are no word-separator characters, or if the word-separating character has a zero advance width (such as the zero width space, U+200B), the user agent must not create additional spacing between words. General punctuation and fixed-width spaces are not considered word-separators.

7.2. Tracking: the 'letter-spacing' property

Name: letter-spacing
Value: <spacing-limit>{1,3}
Initial: normal
Applies to: all elements
Inherited: yes
Percentages: refers to width of space (U+0020) glyph
Media: visual
Computed value: computed value

This property specifies spacing behavior between grapheme clusters.

A grapheme cluster is what a language user considers to be a character or a basic unit of the language. The term is described in detail in Unicode Standard Annex #29, Text Boundaries [UAX29]. This specification relies on the default (not tailored) rules only.

The first 'letter-spacing' value specifies the desired (optimum) spacing. The second value specifies the desired minimum spacing limit, and the third specifies the desired maximum spacing limit. If the minimum spacing value is greater than the optimum spacing value, then the used minimum spacing value becomes the optimum spacing value. If the maximum spacing value is less than the optimum spacing value, then the used maximum spacing value becomes the optimum spacing value. The text justification process must not violate the minimum spacing limit and should also avoid exceeding the maximum. (See the 'text-justify' property.)

Spacing should be applied half on each side of the grapheme cluster. Spacing must not be applied at the beginning or at the end of a line.

UAs must not apply letter-spacing to connected and cursive scripts.

When the resultant space between two characters is not the same as the default space, user agents should not use ligatures.
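
A brief sketch with hypothetical selectors, using the same optimum, minimum and maximum value order as 'word-spacing':

```css
/* Hypothetical selectors. */
h1     { letter-spacing: 0.1em; }                 /* one value: optimum, minimum and maximum are all 0.1em */
p.body { letter-spacing: normal -0.02em 0.05em; } /* normal optimum, with limits for the justification process */
```

In the 'h1' rule the resulting space differs from the default, so the UA should not use ligatures there.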

Should letter-spacing affect the spacing between (connected) Arabic segments? Or should it do something else? For scripts like Devanagari, should it extend baseline connectors between grapheme clusters, if possible, instead of ignoring the spacing?

7.3. Kashida Elongation: the 'text-kashida-space' property

Put something here. What sort of settings are needed?

To be continued...

8. Changes from the May 2003 CSS3 Text CR

Much of the text has been rewritten or heavily revised, so not all changes are listed here. Highlights include:

Many sections intended for this module are not yet represented in this draft. In particular, the 'text-justify-trim', 'text-indent', 'text-overflow', 'text-decoration', 'text-transformation', 'punctuation-trim', 'text-autospace', 'text-shadow', 'hanging-punctuation', 'kerning-mode', and related properties have not yet been evaluated.

Sections relating to text layout (vertical text, grids, 'text-combine') will be moved to a separate Text Layout module. These features may change greatly from the last revision, but they have not been dropped. The vertical text feature, for example, will likely be based on the methods described in Unicode Technical Note #22.

9. Acknowledgements

This specification would not have been possible without help from: Ayman Aldahleh, Bert Bos, Tantek Çelik, Stephen Deach, Martin Dürst, Laurie Anna Edlund, Ben Errez, Yaniv Feinberg, Arye Gittelman, Martin Heijdra, Richard Ishida, Koji Ishii, Masayasu Ishikawa, Michael Jochimsen, Eric LeVine, Chris Lilley, Paul Nelson, Chris Pratley, Martin Sawicki, Rahul Sonnad, Frank Tang, Chris Thrasher, Etan Wexler, Chris Wilson, Masafumi Yabe and Steve Zilles.

10. References

10.1. Normative References

[CSS21]
Bert Bos; et al. Cascading Style Sheets, level 2 revision 1. 25 February 2004. W3C Candidate Recommendation. (Work in progress.) URL: http://www.w3.org/TR/2004/CR-CSS21-20040225
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. Internet RFC 2119. URL: http://www.ietf.org/rfc/rfc2119.txt
[UAX11]
Asmus Freytag. East Asian Width. 28 March 2005. Unicode Standard Annex #11. URL: http://www.unicode.org/unicode/reports/tr11/tr11-14.html
[UAX24]
Mark Davis. Script Names. 28 March 2005. Unicode Standard Annex #24. URL: http://www.unicode.org/unicode/reports/tr24/tr24-7.html
[UAX29]
Mark Davis. Text Boundaries. 25 March 2005. Unicode Standard Annex #29. URL: http://www.unicode.org/unicode/reports/tr29/tr29-9.html

10.2. Informative References

[标点符号]
标点符号用法 (Punctuation Mark Usage). 中华人民共和国国家标准. 1995.
[JIS4051]
JIS X 4051-1995. Line Composition Rules for Japanese Documents. (『日本語文書の行組版方法』) Japanese Standards Association. 1995.
[UAX14]
Asmus Freytag. Line Breaking Properties. 29 March 2005. Unicode Standard Annex #14. URL: http://www.unicode.org/unicode/reports/tr14/tr14-17.html