W3C

CSS3 Text Module

W3C Candidate Recommendation 14 May 2003

This version:
http://www.w3.org/TR/2003/CR-css3-text-20030514
Latest version:
http://www.w3.org/TR/css3-text
Previous version:
http://www.w3.org/TR/2003/WD-css3-text-20030226
Editor:
Michel Suignard (Microsoft)

Abstract

This document presents a set of text formatting properties for CSS3. Many of these properties already existed in CSS2 [CSS2]. Many of the new properties have been added to address basic requirements in international text layout, particularly for East Asian and bidirectional text.

Status of This Document

This specification is one of the "modules" for the upcoming CSS level 3 (CSS3) specification. It has been developed by the CSS Working Group which is part of the Style Activity (see summary). It contains features to be included in CSS level 3.

This is a Candidate Recommendation, which means W3C believes the specification is ready to be implemented.

All persons are encouraged to review and implement this specification and send comments to the (archived) public mailing list www-style (see instructions). W3C Members can also send comments directly to the CSS Working Group.

For this specification to become a W3C Recommendation, the following criteria must be met:

  1. There must be at least two interoperable implementations for every feature in the specification.

    For the purposes of this criterion, we define the following terms:

    feature
    a section or subsection in the specification
    interoperable
    passing the respective test case(s) in the test suite, or, if the implementation is not a web browser, an equivalent test. Every relevant test in the test suite should have an equivalent test created if such a user agent (UA) is to be used to claim interoperability. In addition, if such a UA is to be used to claim interoperability, then there must be one or more additional UAs which can also pass those equivalent tests in the same way for the purpose of interoperability. The equivalent tests must be made publicly available for the purposes of peer review.
    implementation
    a user agent which:
    1. implements the feature.
    2. is available (i.e. publicly downloadable or available through some other public point of sale mechanism). This is the "show me" requirement.
    3. is shipping (i.e. development, private or unofficial versions are insufficient).
    4. is not experimental (i.e. is intended for a wide audience and could be used on a daily basis.)
  2. A minimum of six months of the CR period must have elapsed. This is to ensure that enough time is given for any remaining major errors to be caught.

The comments that the CSS WG received on the last working draft, together with responses and resulting changes, are listed in the disposition of comments.

This document has been produced as a combined effort of the W3C Internationalization Activity and the Style Activity. It also includes extensive contributions made by participants in the XSL Working Group (members only). Finally, some of the proposals first surfaced in the Scalable Vector Graphics (SVG) 1.1 Specification [SVG1.1]. The text has been duplicated in this document to reflect which properties and specifications should eventually be referenced in CSS itself.

Patent disclosures relevant to CSS may be found on the Working Group's public patent disclosure page.

To find the latest version of this document, please follow the "Latest version" link above, or visit the list of W3C Technical Reports.

1. Dependencies on other modules

This CSS3 module depends on the following CSS3 modules:

It has non-normative (informative) references to the following CSS3 modules:

2. Introduction

In both CSS1 and CSS2, text formatting has been limited to simple effects such as text decoration, text alignment and letter spacing. However, international typography contains types of formatting that could not be achieved without using special workarounds or graphics.

Along with already existing text-related properties, this document presents a number of new CSS properties to represent such formatting. The features this proposal covers include two of the most important features for East Asian typography: vertical text and layout grid.

There are a number of illustrations in this document for which the following legend is used:

Wide n
- wide-cell glyph (e.g. Han) which is the n-th glyph in the text run,
Narrow n
- narrow-cell glyph (e.g. Latin) which is the n-th glyph in the text run,
Connected n
- connected glyph (e.g. Arabic) which is the n-th glyph in the text run.

Many typographical properties in East Asian typography depend on the fact that a character is typically rendered as either a wide or narrow glyph. All characters described by the Unicode Standard [UNICODE] can be categorized by a width property. This is covered by the Unicode Standard Annex #11, East Asian Width [UAX-11].

The orientation which the above symbols assume in the diagrams corresponds to the orientation that the glyphs they represent are intended to assume when rendered in the UA (user agent). Spacing between these glyphs in the diagrams is usually symbolic, unless intentionally changed to make a point.

Furthermore, all properties, in addition to the noted values, take 'initial' and 'inherit'. These values are not repeated in property value enumerations.

This module uses extensively the 'before', 'after', 'start' and 'end' notation to specify the four edges of a box relative to its text advance direction, independently of its absolute orientation in terms of 'top', 'bottom', 'left' and 'right' (corresponding respectively to the 'before', 'after', 'start' and 'end' positions in a typical Western text layout). This notation is also used extensively in [XSL1.0] for the same purpose.

The term 'Latin' is used frequently in this document to designate behavior shared among popular writing scripts in Europe and America based on the Latin, Greek and Cyrillic scripts.

Finally, in this document, requirements are expressed using the key words "MUST", "MUST NOT", "REQUIRED", "SHALL" and "SHALL NOT". Recommendations are expressed using the key words "SHOULD", "SHOULD NOT" and "RECOMMENDED". "MAY" and "OPTIONAL" are used to indicate optional features or behavior. These keywords are used in accordance with [RFC2119]. For legibility these keywords are used in small caps form.


3. Text layout

3.1. Text layout introduction

This section describes the text layout features supported by CSS, which include support for various international writing directions, such as left-to-right (e.g., Latin scripts), right-to-left (e.g., Hebrew or Arabic), bidirectional (e.g., mixing Latin with Arabic) and vertical (e.g., Asian scripts).

The 'direction' property, already defined in CSS2, determines an inline-progression. The 'block-progression' property determines a block-progression. The 'writing-mode' shorthand combines inline and block progression together. For example, Latin scripts are typically written with a left to right inline-progression and a top to bottom block-progression.

The glyph orientation is the orientation of the rendered visual shape of characters relative to the block-progression and the bottom of the block box.

Within a line, the inline-progression for characters is based on the current glyph orientation, the metrics of the glyph just rendered, kerning tables in the font, and the current values of various attributes and properties, such as the spacing properties.

For many combinations of 'direction', 'block-progression' and glyph orientation values, the proper directionality and ordering of text are determined by the Unicode Bidirectional Algorithm [UAX9]. CSS relies on that algorithm to achieve proper bidirectional text rendering and possible reordering. Furthermore, with the 'unicode-bidi' property, the style sheet can influence the bidirectional algorithm by allowing new embedding levels and direction overrides.

Note: The Unicode Standard Annex #9, The Bidirectional Algorithm [UAX9] defines a bidirectional algorithm that determines the character directionality for bidirectional text. The display ordering of bidirectional text depends upon the directional properties of the characters in the text.

The HTML 4.01 specification ([HTML401], section 8.2) defines bidirectionality behavior for HTML elements. Conforming HTML user agents MAY therefore ignore the 'direction' and 'unicode-bidi' properties in author and user style sheets. The style sheet rules that would achieve the bidirectionality behavior specified in HTML 4.01 are given in the sample style sheet. The HTML 4.01 specification also contains more information on bidirectionality issues.

Note: HTML 4.01 only allows the change of inline-progression whereas the 'block-progression' and the 'writing-mode' properties allow the change of the block-progression.

3.2. Setting the inline-progression and block-progression: the 'direction', 'block-progression' properties and the shorthand 'writing-mode' property

Name: direction
Value: ltr | rtl
Initial: ltr
Applies to: all elements and generated content, but see prose
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)

The 'direction' property sets the inline-progression value. Possible values:

ltr
Left-to-right direction.
rtl
Right-to-left direction.

This property specifies the inline-progression and the direction of embeddings and overrides (see 'unicode-bidi') for the Unicode Bidirectional Algorithm [UAX9]. The values 'ltr' and 'rtl' are interpreted relative to the 'block-progression'. For example, a 'ltr' inline-progression goes from the left to the right of the box when the 'block-progression' is set to 'tb'; the same 'ltr' inline-progression goes from the top to the bottom of the box when 'block-progression' is set to 'rl'.

This property also affects the direction of table column layout, the direction of the overflow when determined by the inline-progression (such as the 'start' and 'end' values of the 'text-align' property), the initial alignment of text and the position of an incomplete last line in a block in case of 'text-align: justify' and many other properties affected by inline-progression changes.

Note: Even when the inline-progression is left-to-right or right-to-left, some or all of the character content within a given element might advance in the opposite direction because of the Unicode Bidirectional Algorithm [UAX9] or because of explicit text advance overrides due to the usage of this property and 'unicode-bidi' on child elements.

For the 'direction' property to have any effect on inline-level elements, the following conditions must be met:

For more on bidirectional text, see the section about Embedding and override.

Note: The 'direction' property, when specified for table column elements, is not inherited by cells in the column since column elements are never the ancestors of their constituent cell elements. Thus, CSS cannot easily capture the "dir" attribute inheritance rules described in [HTML401], section 11.3.2.

Name: block-progression
Value: tb | rl | lr
Initial: tb
Applies to: all elements and generated content
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)

The 'block-progression' property sets the block-progression value and the flow orientation. Possible values:

tb
Top-to-bottom direction. The flow orientation is horizontal.
rl
Right-to-left direction. The flow orientation is vertical.
lr
Left-to-right direction. The flow orientation is vertical.

An inline-level element that has a different 'block-progression' from its parent becomes an 'inline-block' element [CSS3-box]. Two cases are possible:

  1. The two block-progressions are perpendicular to each other (for example, 'tb' and 'lr'). In such cases, the content height of the element within the line box height is determined by its maximum inline progression dimension (advance width). However, the resulting line height is determined by other properties such as the 'text-height' and the 'line-stacking-strategy' [CSS3-line].
  2. The two block-progressions are parallel to each other (for example, 'rl' and 'lr'). In such cases, the content width of the element within the line box width is determined by its maximum inline progression dimension.

In horizontal flow orientations, the top and bottom margins can be collapsed. For vertical flow orientations, the left and right margins can be collapsed. See "Collapsing margins" in the CSS3 Box module [CSS3-box] for the details of collapsing margins.

Name: writing-mode
Value: lr-tb | rl-tb | tb-rl | tb-lr
Initial: not defined for shorthand properties
Applies to: all elements and generated content
Inherited: yes
Percentages: N/A
Media: visual
Computed value: see individual properties

The 'writing-mode' property is a shorthand property for the 'direction' property and the 'block-progression' property. Although, strictly speaking, the property has no initial value, it is equivalent to 'lr-tb'. The definitions of the property values are established by the following table, which shows the setting of the constituent properties and examples of common usage.

writing-mode   direction   block-progression   Common usage
lr-tb          ltr         tb                  Latin-based, Greek, Cyrillic writing systems (and many others)
rl-tb          rtl         tb                  Arabic, Hebrew writing systems
tb-rl          ltr         rl                  some East Asian writing systems
tb-lr          rtl         lr                  Mongolian writing system
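
The shorthand expansion in the table above can be read as a simple lookup. The following is a minimal, non-normative sketch in Python; the names WRITING_MODES, FLOW_ORIENTATION and expand_writing_mode are hypothetical and are not defined by this module. The flow orientation is taken from the 'block-progression' value definitions, and the default argument reflects the statement that the shorthand is equivalent to 'lr-tb' when unspecified.

```python
# Non-normative sketch: expands a 'writing-mode' value into its constituent
# properties, using the values given in the table above.
# All identifiers here are illustrative assumptions, not part of CSS.

WRITING_MODES = {
    "lr-tb": ("ltr", "tb"),
    "rl-tb": ("rtl", "tb"),
    "tb-rl": ("ltr", "rl"),
    "tb-lr": ("rtl", "lr"),
}

# Flow orientation as stated in the 'block-progression' value definitions.
FLOW_ORIENTATION = {"tb": "horizontal", "rl": "vertical", "lr": "vertical"}


def expand_writing_mode(value: str = "lr-tb") -> dict:
    """Return the constituent property values for a 'writing-mode' keyword."""
    direction, block_progression = WRITING_MODES[value]
    return {
        "direction": direction,
        "block-progression": block_progression,
        "flow-orientation": FLOW_ORIENTATION[block_progression],
    }
```

An unknown keyword raises KeyError in this sketch; how a user agent treats an invalid value is outside what the sketch models.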

In the following example, two block elements (1 and 3) separated by an image (2) are presented in various flow orientations.

Here is a diagram of a horizontal flow ("writing-mode: lr-tb"):

Diagram of a horizontal flow

Here is a diagram for a vertical flow used in East Asia ("writing-mode: tb-rl"):

Diagram of a vertical flow with right-to-left block-progression

And finally, here is a diagram for another flow used for Uighur and Mongolian ("writing-mode: tb-lr"):

Diagram of a vertical flow with left-to-right block-progression

In East Asian documents, it is often preferred to display certain Latin-based strings, such as numerals in a year, always in a horizontal flow orientation regardless of the flow orientation of the line of text these strings appear in, as in:

Layout of Tate Naka Yoko, showing a group of glyphs appearing horizontally in a vertical column of text
Example of Tate Naka Yoko, showing the year 1996 appearing horizontally in a column of vertical text

Horizontal in vertical ("Tate-chu-yoko")

In Japanese, this effect is known as "Tate-chu-yoko". In order to achieve it in an XHTML context, the Latin string should be enclosed in a span element with a horizontal flow orientation, as in:

.date {writing-mode: lr-tb;}
<span class="date">1996</span>

3.3. Glyph orientation within a text run: the 'glyph-orientation-vertical' and 'glyph-orientation-horizontal' properties

In some cases, it is required to alter the orientation of a sequence of glyphs relative to the block-progression. The requirement is particularly applicable to vertical layouts of East Asian documents, where sometimes half-width Latin text is to be displayed horizontally and other times vertically.

Two properties control the glyph orientation relative to the block-progression. 'glyph-orientation-vertical' controls glyph orientation when the flow orientation is vertical. 'glyph-orientation-horizontal' controls glyph orientation when the flow orientation is horizontal.

Name: glyph-orientation-vertical
Value: <angle> | auto | upright | inline
Initial: auto
Applies to: all elements and generated content
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)
<angle>
Although any angle value may be used, the behavior related to the value is determined by rounding it to the nearest multiple of 90 degrees.
auto
The glyph orientation is determined automatically based on the Unicode character code of the rendered character.
upright
Glyphs are oriented as if an <angle> of "0deg" had been specified. However, all vertical alternates of the glyphs should be used. Enclosing punctuation such as parentheses should be oriented to face the text it encloses. The user agent may use heuristics to determine the best orientation for symbols that are flow orientation dependent.
inline
All glyphs are laid out top to bottom regardless of inherent direction. The embedding levels, as determined by the bidirectional algorithm [UAX9], are used to set the orientation of some glyphs (see following prose). For this value of 'glyph-orientation-vertical', the directionality of characters cannot be changed by the 'direction' property.

The bidirectional algorithm [UAX9] applies differently depending on the value of the glyph vertical orientation, either specified as an <angle> value, or as implied by one of the 'glyph-orientation-vertical' keyword values. Possible effects:

Conforming user agents MUST at least support the 'auto' and "90deg" values. The user agent MAY round the actual value of the angle to the values of glyph rotation supported by the user agent. However, this does not affect the computed value.
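
The rounding of an <angle> value to the nearest multiple of 90 degrees can be sketched as follows. This is a non-normative Python illustration; the function name effective_glyph_angle is hypothetical. Two choices are assumptions of the sketch and are not stated in this module: angles exactly halfway between two multiples (such as 45deg) round upward, and the result is normalized to the range 0 to 270.

```python
import math


def effective_glyph_angle(angle_deg: float) -> int:
    """Round an angle in degrees to the nearest multiple of 90.

    Assumptions of this sketch (not specified by the module):
    - exact halves (45, 135, ...) round toward the larger multiple;
    - the result is normalized into {0, 90, 180, 270}.
    The computed value of the property is unaffected; only the angle used
    for rendering behavior is rounded.
    """
    steps = math.floor(angle_deg / 90 + 0.5)
    return (steps * 90) % 360
```

For instance, under these assumptions an angle of 100 degrees behaves as 90 degrees, and an angle of -80 degrees behaves as 270 degrees.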

The glyph orientation affects the amount that the current text position advances as each glyph is rendered. It also affects how the glyph is aligned relative to the baseline. When the flow orientation is vertical and the 'glyph-orientation-vertical' value results in a glyph orientation angle which is a multiple of 180deg, then the current text position is incremented according to the vertical metrics of the glyph, and the glyph is aligned using the vertical alignment-point as described in the CSS3 Line module [CSS3-line].

The diagrams below illustrate different uses of 'glyph-orientation-vertical'. The diagram on the left shows the result of mixing full-width ideographic characters with normal-width Latin characters when 'glyph-orientation-vertical' for the span containing the Latin characters is either 'auto' or "90deg". The diagram on the right shows the result of mixing full-width ideographic characters with normal-width Latin characters when the span containing the Latin characters is specified to have a 'glyph-orientation-vertical' of "0deg".

Note: The effect on the right can also be achieved by using full-width Latin characters and using 'glyph-orientation-vertical: auto' for the span containing the ideographic characters and the full-width Latin characters.

Layout of mixed glyphs in vertical-ideographic mode. Wide-cell glyphs are upright, non-wide-cell glyphs are rotated by 90 degrees.
Example of mixed Japanese and English in vertical-ideographic layout. Japanese glyphs are upright, English rotated.
Layout of mixed glyphs in vertical mode. All glyphs are upright.
Example of mixed Japanese and English in vertical layout. All glyphs are upright.

Name: glyph-orientation-horizontal
Value: <angle> | auto | inline
Initial: auto
Applies to: all elements and generated content
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)
<angle>
Although any angle value may be used, the behavior related to the value is determined by rounding it to the nearest multiple of 90 degrees.
auto
The glyph orientation relative to the inline-progression is determined automatically based on the Unicode character code of the rendered character.
inline
All glyphs are laid out left to right regardless of inherent direction. The embedding levels, as determined by the bidirectional algorithm [UAX9], are used to set the orientation of some glyphs (see following prose). For this value of 'glyph-orientation-horizontal', the directionality of characters cannot be changed by the 'direction' property.

The bidirectional algorithm [UAX9] applies differently depending on the value of the glyph horizontal orientation, either specified as an <angle> value, or as implied by one of the 'glyph-orientation-horizontal' keyword values. Possible effects:

Conforming user agents MUST at least support the 'auto' and "0deg" values. The user agent MAY round the actual value of the angle to the values of glyph rotation supported by the user agent. However, this does not affect the computed value.

The glyph orientation affects the amount that the current text position advances as each glyph is rendered. It also affects how the glyph is aligned relative to the baseline. When the inline-progression is horizontal and the 'glyph-orientation-horizontal' value results in a glyph orientation angle which is a multiple of "180deg", then the current text position is incremented according to the horizontal metrics of the glyph, and the glyph is aligned using the horizontal alignment-point as described in the CSS3 Line module [CSS3-line].

3.4. Embedding and override: the 'unicode-bidi' property

Name: unicode-bidi
Value: normal | embed | bidi-override
Initial: normal
Applies to: all elements and generated content, but see prose
Inherited: no
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)

This property allows further control of the Unicode Bidirectional Algorithm [UAX9] by allowing new embedding levels or direction overrides. Values for this property have the following meanings:

normal
The element does not open an additional level of embedding with respect to the bidirectional algorithm. For inline-level elements, implicit reordering works across element boundaries.
embed
If the element is inline-level, this value opens an additional level of embedding with respect to the bidirectional algorithm. The direction of this embedding level is given by the 'direction' property. Inside the element, reordering is done implicitly. This corresponds to adding a LRE (U+202A; for 'direction: ltr') or RLE (U+202B; for 'direction: rtl') at the start of the element and a PDF (U+202C) at the end of the element.
bidi-override
For inline-level elements this creates an override. For block-level elements this creates an override for inline-level descendants not within another block. This means that inside the element, reordering is strictly in sequence according to the 'direction' property; the implicit part of the bidirectional algorithm is ignored. This corresponds to adding a LRO (U+202D; for 'direction: ltr') or RLO (U+202E; for 'direction: rtl') at the start of the element and a PDF (U+202C) at the end of the element.

The final order of characters in each block-level element is the same as if the bidirectional control codes had been added as described above, mark-up had been stripped, non-textual entities such as images treated as object replacement characters (U+FFFC), and the resulting character sequence had been passed to an implementation of the Unicode Bidirectional Algorithm [UAX9] for plain text that produced the same line-breaks as the styled text.
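
The correspondence between 'unicode-bidi' values and bidirectional control codes described above can be sketched as follows. This is a non-normative Python illustration for the content of a single inline-level element; the function name add_bidi_controls is hypothetical. It only adds the control characters, and does not implement the reordering performed by the Unicode Bidirectional Algorithm [UAX9] itself.

```python
# Bidirectional control characters named in the value definitions above.
LRE = "\u202A"  # left-to-right embedding
RLE = "\u202B"  # right-to-left embedding
PDF = "\u202C"  # pop directional formatting
LRO = "\u202D"  # left-to-right override
RLO = "\u202E"  # right-to-left override


def add_bidi_controls(text: str, unicode_bidi: str = "normal",
                      direction: str = "ltr") -> str:
    """Wrap the text of an inline-level element in the control codes that
    correspond to its 'unicode-bidi' and 'direction' values."""
    if direction not in ("ltr", "rtl"):
        raise ValueError(f"unknown direction: {direction!r}")
    if unicode_bidi == "normal":
        # No additional embedding level is opened.
        return text
    if unicode_bidi == "embed":
        opener = LRE if direction == "ltr" else RLE
    elif unicode_bidi == "bidi-override":
        opener = LRO if direction == "ltr" else RLO
    else:
        raise ValueError(f"unknown unicode-bidi value: {unicode_bidi!r}")
    return opener + text + PDF
```

The resulting string could then be passed, together with the surrounding text, to a plain-text implementation of [UAX9] to obtain the display order.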

Note: In order to be able to flow inline boxes in a uniform direction (either entirely left-to-right or entirely right-to-left), more inline boxes (including anonymous inline boxes) may have to be created, and some inline boxes may have to be split up and reordered before flowing.

Because the Unicode algorithm has a limit of 61 levels of embedding, care should be taken not to use 'unicode-bidi' with a value other than 'normal' unless appropriate. In particular, a value of 'inherit' should be used with extreme caution. However, for elements that are, in general, intended to be displayed as blocks, a setting of 'unicode-bidi: embed' is preferred to keep the element together in case display is changed to inline (see example below).

The following example shows an XML document with bidirectional text. It illustrates an important design principle: DTD designers should take bidirectionality into account both in the language proper (elements and attributes) and in any accompanying style sheets. The style sheets should be designed so that bidirectional rules are separate from other style rules. The bidirectional rules should not be overridden by other style sheets so that the document language's or DTD's bidirectional behavior is preserved.

Example(s):

In this example, lowercase letters in element contents stand for inherently left-to-right characters and uppercase letters represent inherently right-to-left characters:

<div xml:lang="he">
  <par>HEBREW1 HEBREW2 english3 HEBREW4 HEBREW5</par>
  <par>HEBREW6 <emph>HEBREW7</emph> HEBREW8</par>
</div>
<div xml:lang="en">
  <par>english9 english10 english11 HEBREW12 HEBREW13</par>
  <par>english14 english15 english16</par>
  <par>english17 <quo xml:lang="he">HEBREW18 english19 HEBREW20</quo></par>
</div>

Since this is XML, the style sheet is responsible for setting the writing direction. This is the style sheet:

/* Rules for bidirectional */
div:lang(he)   {direction: rtl}
quo:lang(he)   {direction: rtl; unicode-bidi:  embed}
par:lang(en)   {direction: ltr}

/* Rules for presentation */
div, par  {display: block}
emph      {font-weight: bold}

The div element with xml:lang="he" is a block with a right-to-left base direction; the div element with xml:lang="en" is a block with a left-to-right base direction. The par elements are blocks that inherit the base direction from their parents. Thus, the first two par elements are read starting at the top right, and the final three are read starting at the top left.

The emph element is inline-level, and since its value for 'unicode-bidi' is 'normal' (the initial value), it has no effect on the ordering of the text. The quo element, on the other hand, creates an embedding.

The formatting of this text might look like this if the line length is long:

               5WERBEH 4WERBEH english3 2WERBEH 1WERBEH

                                8WERBEH 7WERBEH 6WERBEH

english9 english10 english11 13WERBEH 12WERBEH

english14 english15 english16

english17 20WERBEH english19 18WERBEH

Note that the quo embedding causes HEBREW18 to be to the right of english19.

If lines have to be broken, it might be more like this:

       2WERBEH 1WERBEH
  -EH 4WERBEH english3
                 5WERB

   -EH 7WERBEH 6WERBEH
                 8WERB

english9 english10 en-
glish11 12WERBEH
13WERBEH

english14 english15
english16

english17 18WERBEH
20WERBEH english19

Because HEBREW18 must be read before english19, it is on the line above english19. Just breaking the long line from the earlier formatting would not have worked. Note also that the first syllable from english19 might have fit on the previous line, but hyphenation of left-to-right words in a right-to-left context, and vice versa, is usually suppressed to avoid having to display a hyphen in the middle of a line.

3.5. Script character classification: the 'text-script' property

Many text layout behaviors are relative to the script classification of the text content. The Unicode Technical Report "Script names" [UAX-24] determines a script identifier for all characters.

Note: There is also an ISO draft standard [ISO15924] addressing script identification.

For some operations, such as baseline alignment, a dominant script is required to determine an alignment strategy for the whole element. A dominant script is established by setting the 'text-script' property to an explicit script identifier in conformance with [UAX-24], or by using the heuristic determination computed by the user agent when the 'text-script' value is set to 'auto'.

In many other cases, such as white space handling or text justification, the script property is used on a character by character basis. In those cases, the 'text-script' property can be used to set a homogeneous value for all characters of the element through the usage of an explicit script identifier. But, if the 'text-script' is set to 'auto', the user agent will establish a script property value for each character of the element.

Name: text-script
Value: auto | <script>
Initial: auto
Applies to: all elements and generated content
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)

Values have the following meanings:

auto
The user agent uses its own heuristic to determine the dominant script of the element's content. In the absence of any textual components with an unambiguous script identifier (or no textual content at all), the dominant script is 'LATIN'. An unambiguous script identifier is any script value other than 'COMMON' or 'INHERITED'. For operations that do not require a dominant script, but instead use a script determination on a character by character basis within the element, the user agent will determine a script value for each character based on its inherent script property as established by [UAX-24]. In such cases, the values 'COMMON' or 'INHERITED' are valid.

Note: The heuristic determining the dominant script may compare the script of the first character and last character descendant which have unambiguous script identifiers [UAX-24]. If they are identical the script is determined, otherwise a weighted method associated with the element's language setting or any other hints may be used.

<script>
A script identifier in conformance with [UAX-24]. If the script identifier is either 'COMMON' or 'INHERITED' the dominant script value is 'LATIN', otherwise the dominant script value is the specified value. All characters in the element are reclassified as belonging to this dominant script. For example, setting 'text-script' to 'HAN' makes the content behave as CJK content for line-breaking. Typically, an explicit script value should be used only when the textual content is script ambiguous and a specific behavior is sought.

Note: Script identifiers are not case-sensitive.
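
The determination of a dominant script described above can be sketched as follows. This is a non-normative Python illustration; the function name dominant_script is hypothetical, and the per-character script identifiers are assumed to be supplied by the caller (for example from a [UAX-24] data table), since the sketch does not look them up itself. For 'auto', the sketch follows the heuristic suggested in the note above, comparing the first and last unambiguous scripts; the fallback to the most frequent script is purely an assumption standing in for the unspecified "weighted method".

```python
from collections import Counter

# Script values that do not identify a script unambiguously.
AMBIGUOUS = {"COMMON", "INHERITED"}


def dominant_script(text_script: str, char_scripts=()) -> str:
    """Return the dominant script for an element.

    text_script  -- the 'text-script' value: 'auto' or a script identifier
    char_scripts -- script identifiers of the element's characters, in
                    order (only consulted when text_script is 'auto')
    """
    # Script identifiers are not case-sensitive.
    if text_script.lower() != "auto":
        script = text_script.upper()
        return "LATIN" if script in AMBIGUOUS else script

    unambiguous = [s.upper() for s in char_scripts
                   if s.upper() not in AMBIGUOUS]
    if not unambiguous:
        # No textual content with an unambiguous script identifier.
        return "LATIN"
    if unambiguous[0] == unambiguous[-1]:
        return unambiguous[0]
    # Assumption: a simple frequency count replaces the weighted method.
    return Counter(unambiguous).most_common(1)[0][0]
```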


4. Text alignment and justification

4.1. Text alignment: the 'text-align' property

Name: text-align
Value: start | end | left | right | center | justify | <string>
Initial: start
Applies to: block-level and inline-block elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value (except for initial, inherit and <string>, see prose for the latter)

This property describes how inline content of a block is aligned. Values have the following meanings:

start
The text is aligned on the start of the inline-progression.
end
The text is aligned on the end of the inline-progression.
left, right
In horizontal flow orientation, the text is aligned on the left or right respectively. In vertical flow orientation, the alignment SHOULD be interpreted relative to the 'block-progression'. That is, if the 'block-progression' value is 'rl', 'left' means 'the top of the block' and 'right' means 'the bottom of the block'. If the 'block-progression' value is 'lr', 'left' means 'the bottom of the block' and 'right' means 'the top of the block'. Unlike the 'start' and 'end' values, 'left' and 'right' are not related to the current inline-progression.
center
The text is center aligned.
justify
The text is justified. The justification algorithm can be further refined by using the 'text-justify' property. Although conforming CSS2 user agents could interpret the value 'justify' as 'start', conforming CSS3 user agents may not, unless a profile specifies otherwise.
<string>
Specifies a string on which cells in a table column will align (see the section on horizontal alignment in a column for details and an example). This value applies only to table cells. If set on other elements, the computed value is 'start', otherwise it is as specified.
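
The interpretation of the 'left' and 'right' values relative to the 'block-progression' can be summarized as a lookup. The following non-normative Python sketch restates the value definition above; the names PHYSICAL_EDGE and resolve_left_right are hypothetical. The 'start' and 'end' values are deliberately not covered, since they depend on the inline-progression rather than on the 'block-progression' alone.

```python
# Physical edge of the block on which text is aligned for 'left' and
# 'right', keyed by the 'block-progression' value.
PHYSICAL_EDGE = {
    "tb": {"left": "left", "right": "right"},   # horizontal flow
    "rl": {"left": "top", "right": "bottom"},   # vertical flow
    "lr": {"left": "bottom", "right": "top"},   # vertical flow
}


def resolve_left_right(text_align: str, block_progression: str = "tb") -> str:
    """Return the physical edge used for 'text-align: left' or 'right'."""
    return PHYSICAL_EDGE[block_progression][text_align]
```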

A block of text is a stack of line boxes. In the case of 'start', 'end', 'left', 'right' and 'center', this property specifies how the inline boxes within each line box align with respect to the line box's start and end sides; alignment is not with respect to the viewport. In the case of 'justify', the UA may stretch the inline boxes in addition to adjusting their positions. (See also 'letter-spacing' and 'word-spacing'.)

Example(s):

In this example, note that since 'text-align' is inherited, all block-level elements inside the div element with 'class=important' will have their inline content centered.

div.important { text-align: center }
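
The <string> value can be illustrated in the same way. The following is a minimal sketch, assuming a hypothetical class name 'price' on the cells of a table column; it only shows the syntax described above and does not imply support in any particular UA.

```css
/* Hypothetical class name. Cell content is aligned on the "." character,
   so that decimal points line up down the column. */
td.price { text-align: "." }
```

If the same declaration were set on an element that is not a table cell, the computed value would be 'start', as stated in the description of the <string> value.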

Note: The property's initial value has changed between CSS2 and CSS3, from being UA dependent in CSS2 to being related to the current inline-progression in CSS3 (through the usage of the 'start' value).

4.2. Justification: the 'text-justify' property

Name: text-justify
Value: auto | inter-word | inter-ideograph | distribute | newspaper | inter-cluster | kashida
Initial: auto
Applies to: block-level and inline-block elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)

This property selects the justification algorithm used when 'text-align' is set to 'justify'. Most values affect different types of writing systems in different ways. Writing systems are grouped as follows:

Depending on script classification value (controlled by the 'text-script' property value) and the 'text-justify' property value, spacing may be altered between words or letters or both.

The possible values for the text-justify property are:

auto
The UA determines the justification algorithm to follow, based on a balance between performance and adequate presentation quality. Inter-word expansion is typically used for all scripts that use space as word delimiter. The concept of a word is script dependent, although the user agent determines the exact algorithm. If the 'text-kashida-space' property has a non-zero percentage value, it is recommended to use kashida elongation for Arabic text. Inter-cluster spacing may also occur.
inter-word
Selects the simplest and fastest full justification behavior, which spreads the text evenly across the line by increasing the width of the space between words only. The concept of a word is script dependent, although the user agent determines the exact algorithm. At minimum, justification is expected to occur at each white space boundary. No expansion or compression occurs within the words, i.e. no additional letter spacing is created. No kashida effect takes place.

Note: White space does not include zero-width-space; therefore, justification should not expand these characters. However, justification is expected to expand white space with explicit width set by the 'word-spacing' property.

The diagram below illustrates this mode, by showing how the glyphs are laid out in the last two lines of an element:

[Figure: Mixed glyph layout in the last two lines of an inter-word justified element]

For example a viewer could render an 'inter-word' justified paragraph in the following way:

[Figure: Inter-word justification applied to mixed Japanese and English text]

newspaper
Selects the justification behavior in which both inter-word and inter-letter spacing can be expanded or reduced to spread the text across the whole line. Also, text distribution on any given line may depend on the layout or the contents of the previous or the following several lines. This is the slower, more sophisticated form of full justification preferred in newspapers and magazines, and it is especially useful for narrow columns. Typically, compression is tried first. If that is unsuccessful, expansion occurs: inter-word spaces are expanded up to a threshold, and finally inter-letter expansion is performed. Inter-letter spacing is not applied to Devanagari and other South Asian writing systems using baseline connectors. Kashida elongation and inter-cluster spacing may occur. The threshold value may be related to the ratio of column width to font size. The exact layout algorithm is determined by the user agent. Further explanation about multi-column layout can be found in the CSS3 Multi-column layout module.

The diagram below illustrates this mode:

[Figure: Mixed glyph layout in the last two lines of a newspaper justified element]

Note: In CSS3 a value of 'letter-spacing: 0' no longer strictly inhibits spacing-out of words for justification. The letter-spacing value is just an entry to the letter-spacing process that occurs prior to the possible justification process. Justification may alter the initial spacing between letters, especially with the 'text-justify: newspaper' value.

inter-ideograph
In this mode, letter-spacing modification only occurs for the CJK group. Other groups only use inter-word expansion. No kashida effect takes place. This is the preferred justification in the context of the Japanese writing system, but not for Latin or Korean.

The diagram below illustrates this mode:

[Figure: Mixed glyph layout in the last two lines of an inter-ideograph justified element]

Below is an example of how this mode would work:

[Figure: Inter-ideograph justification applied to mixed Japanese and English text]

distribute
Like 'newspaper', it allows letter spacing modification for most script groups (except the Devanagari group), but unlike 'newspaper', it does not prioritize between word spacing and letter spacing, i.e. the space character gets the same letter spacing modification as other characters. As a consequence, there are no variations between narrow and wide columns. Kashida elongation and inter-cluster spacing may occur. This value is best used in an East Asian context.

The diagram below illustrates this mode:

[Figure: Mixed glyph layout in the last two lines of a distribute justified element]

For example a viewer could render a 'distribute' justified paragraph in the following way:

[Figure: Distribute justification applied to mixed Japanese and English text]

inter-cluster
This is the Southeast Asian counterpart to 'inter-ideograph'. That is, letter spacing only occurs between script-defined grapheme clusters occurring in Southeast Asian scripts.

Note: A grapheme cluster is what a language user considers to be a character or a basic unit of the language. The term is described in detail in the Unicode Standard Annex [UAX-29]: Text Boundaries.

kashida
This is the Arabic counterpart to 'inter-ideograph'. Letter spacing may be increased between Arabic letters, the extra space being filled by kashida. The amount of kashida elongation is controlled by the 'text-kashida-space' property. If 'text-kashida-space' is set to '0%' (which is the initial value), there will be no kashida effect. No letter spacing occurs for other scripts.

The following table describes the expansion/compression strategy for each combination of script group and 'text-justify' property value:

text-justify property value
Script groups | auto* | inter-word | newspaper | inter-ideograph | distribute | inter-cluster | kashida
Latin** | word-spacing only* | word-spacing only | prioritization between word-spacing and letter-spacing | word-spacing only | word-spacing and letter-spacing | word-spacing only | word-spacing only
CJK | no extra spacing* | no extra spacing | letter-spacing | letter-spacing | letter-spacing | no extra spacing | no extra spacing
Devanagari*** | word-spacing* | word-spacing | word-spacing | word-spacing | word-spacing | word-spacing | word-spacing
Arabic | kashida and word-spacing* | word-spacing | kashida and word-spacing | word-spacing | kashida and word-spacing | word-spacing | kashida and word-spacing
SE Asian clusters | inter-cluster spacing* | inter-cluster spacing | inter-cluster spacing | no extra spacing | inter-cluster spacing | inter-cluster spacing | no extra spacing

Interaction between text-justify values and script groups

*The values shown for the auto column are only a recommendation. The UAs might implement a different strategy.

**The Latin entry represents as well other scripts and writing systems used in Europe and America that use the same typographic convention for justification such as Greek, Cyrillic, etc.

***The Devanagari entry represents as well other scripts and writing systems used in India that use baseline connectors (such as Bengali and Gurmukhi).
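
As an illustration of how the values above might be chosen per language, the following is a minimal sketch. The selectors, the language codes and the class name 'column' are assumptions made for the example; the declarations use only values defined in this section.

```css
/* Japanese: letter spacing is adjusted only for the CJK group;
   other script groups fall back to word spacing. */
p:lang(ja) { text-align: justify; text-justify: inter-ideograph }

/* Thai: spacing is added only between grapheme clusters of
   Southeast Asian scripts. */
p:lang(th) { text-align: justify; text-justify: inter-cluster }

/* Hypothetical narrow column: word spacing first, then letter spacing. */
div.column p { text-align: justify; text-justify: newspaper }
```

With 'text-justify: auto' (the initial value), none of these rules would be needed and the UA would pick a strategy itself, possibly a different one from the recommendation in the table.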

4.3. Last line alignment: the 'text-align-last' property

Name: text-align-last
Value: start | end | center | left | right | justify | size
Initial: start
Applies to: block-level and inline-block elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)

This property describes how the last line of the inline content of a block is aligned when 'text-align' is set to 'justify'. This also applies to the only line of a block if it contains a single line, to the line preceding a br element in an XHTML context or a hard line break in other languages, and to the last lines of anonymous blocks. Possible values:

start, end and center
Start, end and center text respectively.
left, right
In horizontal flow orientation, the last line is aligned on the left or right respectively. In vertical flow orientation, the alignment SHOULD be interpreted relative to the 'block-progression'. That is, if the 'block-progression' value is 'rl', 'left' means 'the top of the block' and 'right' means 'the bottom of the block'. If the 'block-progression' value is 'lr', 'left' means 'the bottom of the block' and 'right' means 'the top of the block'. Unlike the 'start' and 'end' values, 'left' and 'right' are not related to the current inline-progression.
justify
The last line will be justified according to the 'text-justify' property value. However, if there is no expansion opportunity in the last line, the line might not appear justified.
size
The line content is scaled to fit on the line. All the fonts on the line MUST be scaled by the same factor. Typically this value is used for single-line elements. This value, unlike the others, may change (i.e. decrease) the number of lines in a block element.

The following XHTML example shows the usage of the alignment properties in a case where all lines are justified in a distributed justification. This is commonly found in East Asian typography:

p.distributealllines
 { text-align: justify;
   text-justify: distribute;
   text-align-last: justify }

4.4. Minimum and maximum font size: the 'min-font-size' and 'max-font-size' properties

The two following properties are only used in conjunction with the 'text-align-last' property set to 'size'. They control the font-size adjustments allowed to fit the line content within the line.

Name: min-font-size
Value: <'font-size'> | auto
Initial: auto
Applies to: all elements and generated content
Inherited: yes
Percentages: element's computed 'font-size'
Media: visual
Computed value: <font-size>

Possible values:

<'font-size'>
The font sizes of the last line of an element are not allowed to become smaller than the smaller of the computed 'font-size' value and the <'font-size'> value set to 'min-font-size'.
auto
The user agent determines the minimum readable font-size for the media. For example, a value of '8px' (relative to the viewing device) is recommended for Latin scripts.

Name: max-font-size
Value: <'font-size'> | auto
Initial: auto
Applies to: all elements and generated content
Inherited: yes
Percentages: element's computed 'font-size'
Media: visual
Computed value: <font-size>

Possible values:

<'font-size'>
The font sizes of the last line of an element are not allowed to become larger than the larger of the computed 'font-size' value and the <'font-size'> value set to 'max-font-size'.
auto
There is no upper limit to the font sizes of the last line of an element.
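
The interaction between 'text-align-last: size' and the two limits can be sketched as follows. The selector and all the lengths are assumptions chosen for the example.

```css
/* Hypothetical single-line banner heading. */
h1.banner {
  text-align: justify;     /* 'text-align-last' applies to justified text */
  text-align-last: size;   /* scale the line content to fit the line */
  font-size: 24px;
  min-font-size: 12px;     /* lower bound: the smaller of 24px and 12px */
  max-font-size: 48px      /* upper bound: the larger of 24px and 48px */
}
```

Under these assumptions, the fonts on the last line could be scaled anywhere between 12px and 48px. With 'max-font-size: auto', only the lower bound would remain.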

4.5. Additional compression: the 'text-justify-trim' property

Name: text-justify-trim
Value: none | punctuation | punctuation-and-kana
Initial: punctuation
Applies to: block-level and inline-block elements
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)

This property sets the permissions for compressing the blank space within individual glyphs during text justification, when 'text-justify' is anything other than 'inter-word'. This special type of space compression occurs on the font level, i.e. the blank space within the glyphs themselves may be reduced without affecting the appearance of the filled parts of glyphs. This applies to wide-cell glyphs only. Possible values:

none
No wide-cell font space compression is allowed.

[Figure: Glyph layout with no compression]

punctuation
Space can be taken away only from wide-cell punctuation glyphs.

[Figure: Glyph layout with punctuation compression]

punctuation-and-kana
Space compression is allowed on wide-cell punctuation and wide-cell Kana glyphs.

[Figure: Glyph layout with punctuation and Kana compression]
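
The three values might be used as in the following sketch. The class names are hypothetical, and 'inter-ideograph' is chosen only because the property has no effect with 'inter-word'.

```css
/* Hypothetical class: tight setting, blank space may be taken from
   wide-cell punctuation and wide-cell Kana glyphs. */
p.compact {
  text-align: justify;
  text-justify: inter-ideograph;
  text-justify-trim: punctuation-and-kana
}

/* Hypothetical class: wide-cell glyphs always keep their full width. */
p.fullwidth {
  text-align: justify;
  text-justify: inter-ideograph;
  text-justify-trim: none
}
```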

4.6. Kashida effect: the 'text-kashida-space' property

Name: text-kashida-space
Value: <percentage>
Initial: 0%
Applies to: block-level and inline-block elements
Inherited: yes
Percentages: as described
Media: visual
Computed value: <percentage>

Kashida is a typographic effect used in Arabic writing systems that allows glyph elongation at some carefully chosen points. Each elongation can be accomplished using a number of kashida glyphs, a single graphic or character elongation on each side of the kashida point. (The user agent MAY use either mechanism based on font or system capability.) The 'text-kashida-space' property expresses the ratio of the kashida expansion size to the white space expansion size. The value '0%' means no kashida expansion. The value '100%' means kashida expansion only. This property has a visible effect with any justification style where kashida expansion is allowed (currently if the 'text-justify' property is set to: auto, kashida, distribute or newspaper).

In the diagram below showing two identical paragraphs of Arabic text, the blue line in the second line (not justified) shows the length that is used for kashida and divided among the elongation opportunities in the first line (justified), as indicated by the red underlines:

[Figure: Kashida applied to Arabic text]

The 'text-kashida-space' property is set to 100% in this example, so all expansion occurs in the elongated glyphs and none between the words themselves.
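
A style sheet producing this kind of rendering could look like the following sketch. The class names are hypothetical, and the 50% rule is an assumed intermediate setting added for contrast.

```css
/* All expansion goes into kashida elongation, none into white space. */
p.kashida-only {
  text-align: justify;
  text-justify: kashida;
  text-kashida-space: 100%
}

/* Assumed intermediate ratio: expansion is shared between kashida
   elongation and white space. */
p.kashida-mixed {
  text-align: justify;
  text-justify: kashida;
  text-kashida-space: 50%
}
```

Leaving 'text-kashida-space' at its initial value of '0%' would give no kashida effect, even with 'text-justify: kashida'.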


5. Text indentation: the 'text-indent' property

Text indentation is controlled by the 'text-indent' property.

Name: text-indent
Value: [ <length> | <percentage> ] hanging?
Initial: 0
Applies to: block-level, inline-block elements and table cells
Inherited: yes
Percentages: refers to width of containing block
Media: visual
Computed value: absolute <length> or <percentage>, with the 'hanging' keyword if specified

This property specifies the indentation applied to lines of inline content in a block. The indentation only affects the first line of inline content in the block unless the 'hanging' keyword is specified, in which case it affects all lines except the first. Possible values:

<length>
The indentation is a fixed length.
<percentage>
The indentation is a percentage of the containing block inline-progression dimension.
hanging
When specified, the indentation affects all lines except the first.

The amount of indentation is given by the length or percentage value. Percentages are relative to the containing block, even in the presence of floats. They are inherited as percentages, not as absolute lengths.

The box is indented with respect to the starting edge of the line box. User agents should render this indentation as blank space. When the 'text-align' property is not set to align the text at the starting edge, this property only specifies a minimum indentation. When the 'text-align' property is set to 'center', the content of the first line is centered within the line box inline progression minus the indentation.

The value of 'text-indent' may be negative, but there may be implementation-specific limits. If the value of 'text-indent' is negative, the value of 'overflow' [CSS3-box] will affect whether the text is visible.

Note: Since the 'text-indent' property inherits, when specified on a block element, it will affect descendant inline-block elements. For this reason, it is often wise to specify 'text-indent: 0' on elements that are specified 'display: inline-block'.

Example(s):

The following example causes the first line of an XHTML p element to be indented by '3em'.

     p     { text-indent: 3em; }

The following example causes the first line of an XHTML p element to be flush with the content edge and the following lines to be indented by '3em'.

     p     { text-indent: 3em hanging; }
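
The inheritance issue mentioned in the note above can be sketched as follows. The class names 'article' and 'badge' are hypothetical.

```css
/* Indentation set on a block; the value inherits to descendants. */
div.article { text-indent: 10% }   /* 10% of the containing block width */

/* Without this reset, the first line inside each inline-block
   would also be indented by the inherited percentage. */
div.article span.badge {
  display: inline-block;
  text-indent: 0
}
```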

6. Line breaking

6.1. Types of line breaking

In documents written in Latin-based languages, where runs of characters make up words and words are separated by spaces or hyphens, line breaking is relatively simple. In the most general case (assuming no hyphenation dictionary is available to the UA), a line break can occur only at white space characters or hyphens, including U+00AD SOFT HYPHEN.

In ideographic typography, however, where what appears as a single glyph can represent an entire word and no spaces or other word-separating characters are needed, a line breaking opportunity is not as obvious as a space. It can occur after or before many other characters. Certain line breaking restrictions still apply, but they are not as strict as they are in Latin typography.

Thai is another interesting example with its own special line breaking rules. Since Thai words are made up of runs of characters, it resembles Latin in that respect. But the lack of spaces as word delimiters, or in fact any consistent word delimiters, makes it similar to CJK. Thai, like Latin in the absence of a hyphenating dictionary, never breaks inside of words. In fact, a knowledge of the vocabulary is necessary to be able to correctly break a line of Thai text. To specify an explicit line breaking opportunity, the character U+200B ZERO WIDTH SPACE can be inserted in documents of Thai and similar scripts.

A number of levels of line-breaking strictness can be used in Japanese typography. These levels add or remove line breaking restrictions. The model presented in this specification distinguishes between the two most commonly used line breaking levels for Japanese text, using the 'line-break' property.

In ideographic typography, it is also possible, though not always preferred, to allow line breaks to occur inside of quoted Latin and Hangul (Korean) words without following the line breaking rules of those particular scripts. The model proposed in this document gives the author control over that behavior through the 'word-break-cjk' property.

In addition, hyphenation is controlled by 'word-break-inside'.

The 'word-break' shorthand property sets 'word-break-cjk' and 'word-break-inside'.

Finally, there is an additional property 'wrap-option' which may influence line-breaking, especially the property value 'wrap-option: emergency' which provides for emergency word-breaking for long words.

Note: Line breaking is covered by the Unicode Standard Annex [UAX-14], available from the Unicode Web site. It contains a detailed recommendation and corresponding data for each Unicode character. The line breaking data for a character is formally independent from its inherent script value, although both are tightly correlated. Consequently, the 'text-script' property has no influence on line breaking and word breaking processing. The following property descriptions commonly use script classification because it conveniently describes the specific cases of line breaking and word breaking.

6.2. Line breaking: the 'line-break' property

Name: line-break
Value: normal | strict
Initial: normal
Applies to: all elements and generated content
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)

This property selects the set of line breaking rules to be used for text. The values described below are especially useful to CJK authors, but the property itself is open to other, not yet specified settings for non-CJK authors as well. (This is an area for future expansion.)

normal
Selects the normal line breaking mode for CJK. While the UA is free to define its own line breaking restrictions for the 'normal' mode, it is recommended that breaks between a standard katakana or hiragana character and a small katakana or hiragana (respectively) character be allowed. That is the preference in modern Japanese typography, and is especially desirable for narrow columns. Japanese kana words may be long, and it is preferable to allow line breaks to occur among such characters than to have excessive expansion due to justification.
strict
Selects a more restrictive line breaking mode for CJK text. While the UA is free to define its own line breaking restrictions for the 'strict' mode, it is recommended that the restrictions specified by the Unicode Standard Annex [UAX-14] be followed. That implies that in this mode, small katakana and hiragana characters are not allowed to start a line if they follow a standard katakana or hiragana character.

Note: In Japanese, a set of line breaking restrictions is referred to as "Kinsoku". JIS X-4051 [JIS-X-4051] is a popular source of reference for this behavior using the strict set of rules. The rules described by JIS X-4051 have been superseded by the Unicode Standard Annex #14 [UAX-14].

Note: Both values, 'normal' and 'strict', imply that a set of line-breaking restrictions is in use.
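
As a sketch of how an author might choose between the two values, the rules below use hypothetical class names for a narrow sidebar and for body text.

```css
/* Narrow column: allow the extra break opportunities before small kana
   recommended for 'normal', to limit excessive justification expansion. */
div.sidebar p:lang(ja) { line-break: normal }

/* Body text: follow the more restrictive set of rules. */
div.body p:lang(ja) { line-break: strict }
```

Since the UA is free to define its own restrictions for both modes, the exact difference between the two rules may vary between implementations.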

6.3. Word breaking: the 'word-break-cjk', 'word-break-inside' properties and the shorthand 'word-break' property

Name: word-break-cjk
Value: normal | break-all | keep-all
Initial: normal
Applies to: all elements and generated content
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)

This property controls line-breaking behavior inside of words from a CJK point of view. Possible values:

normal
Keeps non-CJK scripts together (according to their own rules), while Hangul and CJK ideographs (including the Korean Hanja characters) break according to the rules set by the 'line-break' property. Typically CJK ideographs and Hangul characters can break everywhere, with a limited set of exceptions controlled by the 'line-break' property. The behavior of non-CJK scripts can also be superseded by using the value 'emergency' in the 'wrap-option' property, or the value 'hyphenate' in the 'word-break-inside' property.
break-all
Same as 'normal' for CJK ideographs and Hangul, but non-CJK scripts can break anywhere. This option is used mostly in a context where the text is predominantly using CJK characters with few non-CJK excerpts and it is desired that the text be better distributed on each line. The UAs MAY however limit the break-everywhere behavior for scripts using clusters, such as Thai.
keep-all
Same as 'normal' for all non-CJK scripts. CJK ideographs and Hangul are kept together. This removes line breaking opportunities between CJK ideographs and Hangul characters. This option should only be used in the context of CJK ideographs used in small clusters, as in the Korean writing system, where the presence of white space characters still creates line breaking opportunities.

The following example shows a paragraph style where all non-CJK scripts can break anywhere.

p.anywordbreaks { word-break: break-all }

Name: word-break-inside
Value: normal | hyphenate
Initial: normal
Applies to: all elements and generated content
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)

This property controls the hyphenation behavior inside of words. Possible values:

normal
A word should always stay in a single line. However, this can be superseded by using the value 'break-all' in the 'word-break-cjk' property, or the value 'emergency' in the 'wrap-option' property. Moreover, explicit hyphenation characters (hyphen, soft hyphen, etc.) still create line breaking opportunities.
hyphenate
Words can be broken at an appropriate hyphenation point. It requires that the user agent have a hyphenation dictionary for the language of the text being broken. Setting this value activates the hyphenation engine in the user agent.

Note: Intra-word breaks may or may not be indicated by a visible hyphen, depending on the language. The hyphenation glyph may appear at the end of the line or at the start of the next line, and its actual shape may depend on the text language.

Name: word-break
Value: <'word-break-cjk'> || <'word-break-inside'>
Initial: see individual properties
Applies to: all elements and generated content
Inherited: yes
Percentages: N/A
Media: visual
Computed value: see individual properties

The 'word-break' property is a shorthand property for setting 'word-break-cjk' and 'word-break-inside' at the same place in the style sheet.

The properties 'word-break-cjk' and 'word-break-inside' are first reset to their initial values (all 'normal'). Then, those properties that are given explicit values in the 'word-break' shorthand are set to those values.
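
The reset behavior can be illustrated with the following sketch. The class names are hypothetical.

```css
/* Longhand: non-CJK scripts may break anywhere in all paragraphs. */
p { word-break-cjk: break-all }

/* Shorthand with only one value given. For p.note this is equivalent to
   'word-break-cjk: normal; word-break-inside: hyphenate', so the
   'break-all' setting above no longer applies to these paragraphs. */
p.note { word-break: hyphenate }

/* Shorthand with both values given explicitly. */
p.korean { word-break: keep-all hyphenate }
```

An author who wants to keep 'break-all' while enabling hyphenation would therefore write 'word-break: break-all hyphenate', or set 'word-break-inside' directly.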


7. Text Wrapping, White space Control and Text Overflow

The following section describes text wrapping, white space handling and text overflow. Text wrapping and white space handling are interrelated through the CSS2 'white-space' property, which combines these two effects. Text wrapping and text overflow both deal with situations where the text reaches the flow after-edge of its containing box.

CSS3 clearly separates these three effects into different sets of properties while keeping the 'white-space' property for compatibility reasons.

The following section frequently uses the term line feed character to specify the normalized newline indicator. In XML and HTML context, the line feed character is the LINE FEED (U+000A). In other contexts, it may be represented differently, for example by a CARRIAGE RETURN (U+000D). The term 'line feed character' represents the normalized newline character native to a given framework.

7.1. Text wrapping: the 'wrap-option' property

Name: wrap-option
Value: wrap | no-wrap | soft-wrap | emergency
Initial: wrap
Applies to: all elements and generated content
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)

This property controls whether or not text wraps when it reaches the flow edge of its containing block box. Several value descriptions use the term preserved line feed characters. A preserved line feed character (either from the source content or from occurrence of "\A" in generated content) is maintained for presentation purpose and may therefore influence text wrapping. The preserved status of line feed characters is determined by the 'linefeed-treatment' property. The 'wrap-option' possible values are:

wrap
The text is wrapped at the best line-breaking opportunity (if required) within the available block inline-progression dimension (block width in horizontal text flow). The best line-breaking opportunity is determined in priority by the existence of preserved line feed characters, or by the line-breaking algorithm controlled by the 'line-break' and 'word-break' properties.
no-wrap
The text is only wrapped where explicitly specified by preserved line feed characters. In the case when lines are longer than the available block width, the overflow will be treated in accordance with the 'overflow' property specified in the element.
soft-wrap
The text is wrapped after the last character which can fit before the ending-edge of the line and where explicitly specified by preserved line feed characters. No line-breaking algorithm is invoked. The intended usage is the rendering of a character terminal emulation.
emergency
The text is wrapped as in the 'wrap' case, except that the line-breaking algorithm will allow, as a last resort, a text wrap after the last character which can fit before the ending edge of the line box, independently of the 'line-break', 'word-break-cjk' and 'word-break-inside' properties. For example, this addresses the situation of very long words constrained in a fixed-width container with no scrolling allowed.
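
The values above might be applied as in the following sketch. The class names and the width are assumptions made for the example.

```css
/* Fixed-width cell holding long strings such as URLs: break inside a
   word only when no other line-breaking opportunity fits. */
td.url { width: 12em; wrap-option: emergency }

/* Terminal emulation: wrap after the last character that fits,
   without invoking the line-breaking algorithm. */
pre.terminal { wrap-option: soft-wrap }

/* Single-line title: wrap only at preserved line feed characters and
   let 'overflow' decide what happens to the excess. */
h1.title { wrap-option: no-wrap; overflow: hidden }
```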

7.2. White space control: the 'linefeed-treatment', 'white-space-treatment', 'all-space-treatment' properties and the 'white-space' shorthand property

White space processing in the context of CSS is the mechanism by which all white space characters are interpreted for rendering purpose. The white space set is determined by the XML [XML1.0] specification as being a combination of one or more space characters (Unicode value U+0020), carriage returns (U+000D), line feed characters (U+000A), or tabs (U+0009).

Note: [HTML401] also defines the form feed character (U+000C) as a white space character, but that character is not part of any XHTML versions as they are all based on XML.

The amount of white space processing that can be achieved by a user agent that supports CSS is directly related to the CSS processing model, especially the document parsing and validation. After parsing and possible validation, the document tree may contain text nodes that contain unprocessed white space characters, or the document tree may already have been processed in a way that white space characters have been collapsed and partially removed (white space normalization).

In that respect, the CSS properties related to white space processing can only be effective if the CSS processor has access to the white space characters that were originally encoded in the document. However, end-of-line characters are typically handled (like by XML processors) in such a way that any arbitrary combination of end-of-line characters is replaced by a single line feed character.

Note: The first version of XML [XML1.0] only normalizes the two-character sequence (U+000D U+000A), or any U+000D not followed by U+000A, to a single U+000A. The forthcoming version of XML [XML1.1] adds U+0085 (NEL) and U+2028 (LINE SEPARATOR) to the line feed normalization process. However, the set of white space characters is unchanged. Notably, the character U+2029 (PARAGRAPH SEPARATOR) is not part of that set. If the characters U+2028 and U+2029 appear in text, they are treated as zero-width characters without semantic meaning.

Note: XML Schema, through its 'whiteSpace' facet can constrain exactly the type of white space characters still available to a rendering process like CSS for elements containing string datatype. In addition, some XML languages like [XHTML1.0] may have their own white space processing rules when parsing and validating documents with white space characters. Therefore, some of the behaviors described below may be affected by these limitations and may be user agent dependent in these contexts.

In addition, line feed characters can be inserted in generated content by using the '\A' string. The behavior of these inserted line feed characters is identical to that of original line feed characters that are part of the source document, and is controlled by the same set of properties.
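
A minimal sketch of a line feed inserted through generated content follows. The selector and the string are hypothetical, and the example assumes that the 'preserve' value of 'linefeed-treatment' keeps the line feed for presentation.

```css
/* "\A" inserts a line feed character into the generated content;
   the space after it only terminates the escape sequence. */
p.signature:after {
  content: "\A Sent from the reading room";
  linefeed-treatment: preserve
}
```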

White space processing

Any text that is directly contained inside a block (not inside an inline) should be treated as an anonymous inline element.

For each inline (including anonymous inlines), the following steps are performed, ignoring bidirectional formatting characters as if they were not there:

  1. Each non-line feed white space character is treated as per the 'white-space-treatment' property.
  2. If 'all-space-treatment' is set to 'preserve', any sequence of spaces (U+0020) unbroken by an element boundary is treated as a sequence of non-breaking spaces. However, a line breaking opportunity exists at the end of the sequence.
  3. Each line feed character is treated as per the 'linefeed-treatment' property.
  4. If 'all-space-treatment' is set to 'collapse',
    1. every tab (U+0009) is converted to a space (U+0020)
    2. any space (U+0020) following another space (U+0020)--even a space before the inline, if that space also has 'all-space-treatment' set to 'collapse'--is removed.

Then, the entire block is rendered. Inlines are laid out, taking bidirectional reordering into account, and wrapping as specified by the 'wrap-option', 'line-break' and 'word-break' properties.
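
Note: The collapsing performed in step 4 above can be illustrated by the following non-normative Python sketch. The function name 'collapse_spaces' and its 'preceded_by_space' parameter are hypothetical and are introduced only for illustration. The sketch assumes that the text has already been through steps 1 to 3 and that 'all-space-treatment' is set to 'collapse'.

```python
def collapse_spaces(text: str, preceded_by_space: bool = False) -> str:
    """Illustrative sketch of step 4 (hypothetical helper, not part of CSS).

    preceded_by_space models a collapsible space (U+0020) that ends the
    content immediately before this inline.
    """
    out = []
    prev_space = preceded_by_space
    # Step 4.1: every tab (U+0009) is converted to a space (U+0020).
    for ch in text.replace("\t", " "):
        # Step 4.2: a space following another space is removed.
        if ch == " " and prev_space:
            continue
        out.append(ch)
        prev_space = (ch == " ")
    return "".join(out)
```

For example, a run made of a space, a tab and two more spaces between two letters is reduced to a single space.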

As each line is laid out,

  1. If a space (U+0020) at the beginning of a line has 'all-space-treatment' set to 'collapse', it is removed.
  2. All tabs (U+0009) are rendered as a horizontal shift that lines up the start edge of the next glyph with the next tab stop. Tab stops occur at points that are multiples of 8 times the width of a space (U+0020) rendered in the block's font from the block's starting content edge.
  3. If a space (U+0020) at the end of a line has 'all-space-treatment' set to 'collapse', it is also removed.

Note: Tab stops line up in the block regardless of font change.
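
Note: The tab stop rule above amounts to simple arithmetic, sketched below in non-normative Python. The function name 'next_tab_stop' and its parameters are hypothetical. Positions are measured from the block's starting content edge, and 'space_width' is the width of a space (U+0020) in the block's font. The sketch assumes that a tab found exactly on a tab stop advances to the following stop; the text above does not state this case explicitly.

```python
import math


def next_tab_stop(x: float, space_width: float, multiple: int = 8) -> float:
    """Return the position of the first tab stop strictly after x.

    x is the current inline position, measured from the block's starting
    content edge. Tab stops occur every (multiple * space_width) units.
    """
    interval = multiple * space_width
    return (math.floor(x / interval) + 1) * interval
```

With a space width of 5 units, tab stops fall at 40, 80, 120 and so on, whatever font is in use at the point where the tab occurs.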

These rendering rules make no assumption about the storage model of these white space character sequences. It is outside the scope of CSS to determine the character code values accessible through a programming interface such as the DOM. These rules do not apply to elements that have an explicit white space rendering behavior (like the pre element in XHTML).

When white space characters are collapsed for rendering purposes, the text decoration style applied to the collapsed set is the one that would be applied to the first white space character of the original sequence.

The 'white-space' property is a shorthand property for 'linefeed-treatment', 'white-space-treatment', 'all-space-treatment' and 'wrap-option'.

Name: linefeed-treatment
Value: auto | ignore | preserve | treat-as-space | treat-as-zero-width-space | ignore-if-after-linefeed
Initial: auto
Applies to: all elements and generated content
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)

This property specifies the treatment of line feed characters for rendering purposes. Values have the following meanings:

auto
The user agent either transforms each line feed character to a space character (U+0020), transforms each line feed character to a zero width space character (U+200B), or removes the line feed characters, following the line feed conversion algorithm. The choice of the resulting character is conditioned by the script value of the characters preceding and following the line feed character which are part of the same inline text flow in the same block element. The script value of each character is determined by the 'text-script' property.
ignore
Line feed characters are ignored. They are removed and are not rendered.
preserve
Line feed characters indicate an end of line boundary.
treat-as-space
Line feed characters are transformed for rendering purposes into a space character (U+0020). The result of the transformation can be treated by subsequent CSS processing (including white space collapsing).
treat-as-zero-width-space
Line feed characters are transformed for rendering purposes into a zero width space character (U+200B). The result of the transformation can be treated by subsequent CSS processing (including white space collapsing).
ignore-if-after-linefeed
Specifies that any line feed character that immediately follows another line feed character SHALL be discarded. This collapses multiple consecutive line feed characters into a single line feed.

Note: The Unicode Standard [UNICODE] specifies that the zero width space is considered a valid line-break point and that if two characters with a zero width space in between are placed on the same line they are placed with no space between them; and that if they are placed on two lines no additional glyph area, such as for a hyphen, is created at the line-break.
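
Note: The explicit values of 'linefeed-treatment' can be summarized by the following non-normative Python sketch. The function name 'apply_linefeed_treatment' is hypothetical. The sketch assumes that line feeds have already been normalized to U+000A, and it deliberately leaves out the 'auto' value, which depends on the script of the surrounding characters.

```python
import re


def apply_linefeed_treatment(text: str, value: str) -> str:
    """Illustrative mapping of the explicit 'linefeed-treatment' values."""
    if value == "ignore":
        return text.replace("\n", "")          # removed, not rendered
    if value == "preserve":
        return text                            # kept as end of line boundaries
    if value == "treat-as-space":
        return text.replace("\n", " ")         # U+0020
    if value == "treat-as-zero-width-space":
        return text.replace("\n", "\u200b")    # U+200B
    if value == "ignore-if-after-linefeed":
        # A line feed that immediately follows a line feed is discarded,
        # so each run of line feeds collapses to a single one.
        return re.sub(r"\n{2,}", "\n", text)
    raise ValueError(f"value not covered by this sketch: {value!r}")
```

The replacement characters produced by 'treat-as-space' and 'treat-as-zero-width-space' remain subject to later processing, such as white space collapsing.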

Line feed conversion algorithm

This algorithm is used when 'linefeed-treatment' is set to 'auto'. In determining how to convert a line feed character, a user agent should consider the following cases, whereby the script values of the characters preceding and following the line feed determine the choice of the replacement. The script value of each character is determined by the 'text-script' property. (Note that if 'text-script' is set to 'auto', the determination is done character by character within the element, otherwise all characters share the same script value within the element.) Characters of COMMON script (such as punctuation) are treated as having the script of the character on the other side of the line feed:

  1. If the characters preceding and following the line feed character have a script value in which the space character (U+0020) is used as a word separator, the line feed character should be converted into a space character. Examples of such scripts are Latin, Greek, and Cyrillic.

  2. If the characters preceding and following the line feed character have either an ideographic-based script value or a script value which makes them part of an ideographic-based writing system in which there is no word separator, the line feed should be converted into no character. Examples of such scripts or writing systems are Chinese and Japanese.

  3. If the characters preceding and following the line feed character have a non-ideographic-based script value in which there is no word separator, the line feed should be converted into a zero width space character (U+200B) or no character. Examples of such scripts are Thai and Khmer.

  4. If none of the conditions in (1) through (3) are true, the line feed character should be converted into a space character (U+0020).
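
Note: The four cases above can be sketched in non-normative Python as follows. The function name 'convert_linefeed' and the three script sets are assumptions made for illustration only. The sets use Unicode script names and are far from complete. A user agent would derive the script values from the 'text-script' property. For case 3, the sketch picks the zero width space, although removing the line feed is equally permitted.

```python
# Hypothetical, incomplete script classifications used only by this sketch.
SPACE_SEPARATED = {"Latin", "Greek", "Cyrillic"}   # case 1
IDEOGRAPHIC = {"Han", "Hiragana", "Katakana"}      # case 2
UNSEPARATED = {"Thai", "Khmer"}                    # case 3
COMMON = "Common"


def convert_linefeed(before: str, after: str) -> str:
    """Return the replacement for a line feed between two script values."""
    # COMMON characters take the script found on the other side.
    if before == COMMON:
        before = after
    elif after == COMMON:
        after = before
    if before in SPACE_SEPARATED and after in SPACE_SEPARATED:
        return " "          # case 1: space character (U+0020)
    if before in IDEOGRAPHIC and after in IDEOGRAPHIC:
        return ""           # case 2: no character
    if before in UNSEPARATED and after in UNSEPARATED:
        return "\u200b"     # case 3: zero width space (assumed choice)
    return " "              # case 4: fallback to a space character
```

A line feed between Latin and Han text, for instance, matches none of the first three cases and is therefore converted into a space.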

Name: white-space-treatment
Value: ignore | preserve | ignore-if-before-linefeed | ignore-if-after-linefeed | ignore-if-surrounding-linefeed
Initial: ignore-if-surrounding-linefeed
Applies to: all elements and generated content
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)

This property specifies the treatment, for rendering purposes, of the space character (U+0020) and other white space characters (except for line feed characters, since their treatment is determined by the 'linefeed-treatment' property). White space characters, when rendered as an advance width, use the width of the empty glyph normally used for the space character (U+0020). Values have the following meanings:

ignore
White space characters, except for line feed characters, are ignored. They are removed and are not rendered.
preserve
All white space characters other than line feed are rendered as they are (with advance width).
ignore-if-before-linefeed
Specifies that any white space character, except for line feed characters, that immediately precedes a line feed character SHALL be discarded. This action SHALL take place regardless of the setting of the 'linefeed-treatment' property.
ignore-if-after-linefeed
Specifies that any white space character, except for line feed characters, that immediately follows a line feed character SHALL be discarded. This action SHALL take place regardless of the setting of the 'linefeed-treatment' property.
ignore-if-surrounding-linefeed
Specifies that any white space character, except for line feed characters, that immediately precedes or follows a line feed character SHALL be discarded. This action SHALL take place regardless of the setting of the 'linefeed-treatment' property.
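
Note: The values of 'white-space-treatment' can be illustrated by the following non-normative Python sketch. The function name 'apply_white_space_treatment' is hypothetical. The sketch makes two assumptions: only space (U+0020) and tab (U+0009) are considered as non-line feed white space, and a whole run of such characters adjacent to a line feed is discarded, not just the single adjacent character.

```python
import re


def apply_white_space_treatment(text: str, value: str) -> str:
    """Illustrative handling of non-line feed white space (space and tab)."""
    if value == "ignore":
        return re.sub(r"[ \t]+", "", text)
    if value == "preserve":
        return text
    if value == "ignore-if-before-linefeed":
        return re.sub(r"[ \t]+(?=\n)", "", text)
    if value == "ignore-if-after-linefeed":
        return re.sub(r"(?<=\n)[ \t]+", "", text)
    if value == "ignore-if-surrounding-linefeed":
        return re.sub(r"[ \t]+(?=\n)|(?<=\n)[ \t]+", "", text)
    raise ValueError(f"unknown value: {value!r}")
```

Because the line feeds themselves are left untouched, the result does not depend on the setting of 'linefeed-treatment'.
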
Name: all-space-treatment
Value: preserve | collapse
Initial: collapse
Applies to: all elements and generated content
Inherited: yes
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)

The 'all-space-treatment' property specifies the treatment of all consecutive white space characters (with no exception for line feed characters, unlike the 'white-space-treatment' property). Values have the following meanings:

preserve
All white space characters are rendered as they are. The rendering of tab characters (U+0009) is described in White space processing.
collapse
The white space characters are collapsed according to the rules described in White space processing.
Name: white-space
Value: normal | pre | nowrap | pre-wrap | pre-line
Initial: not defined for shorthand properties
Applies to: all elements and generated content
Inherited: yes
Percentages: N/A
Media: visual
Computed value: see individual properties

This property declares how white space inside the element is handled. Setting a value on the 'white-space' property sets the respective values on 'wrap-option', 'linefeed-treatment', 'white-space-treatment' and 'all-space-treatment'. Although, strictly speaking, the property has no initial value, it is equivalent to 'normal'. The definitions of the property values are established by the following table, which shows the settings of the constituent properties.

white-space:  wrap-option:  linefeed-treatment:  white-space-treatment:          all-space-treatment:
normal        wrap          auto                 ignore-if-surrounding-linefeed  collapse
pre           no-wrap       preserve             preserve                        preserve
nowrap        no-wrap       auto                 ignore-if-surrounding-linefeed  collapse
pre-wrap      wrap          preserve             preserve                        preserve
pre-line      wrap          preserve             ignore-if-surrounding-linefeed  collapse
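
Note: The table above can be read as a simple lookup from the shorthand value to its four constituent properties, as in the following non-normative Python sketch. The names 'WHITE_SPACE_SHORTHAND' and 'expand_white_space' are hypothetical.

```python
# Each row: (wrap-option, linefeed-treatment, white-space-treatment,
#            all-space-treatment), copied from the table above.
_ROWS = {
    "normal":   ("wrap",    "auto",     "ignore-if-surrounding-linefeed", "collapse"),
    "pre":      ("no-wrap", "preserve", "preserve",                       "preserve"),
    "nowrap":   ("no-wrap", "auto",     "ignore-if-surrounding-linefeed", "collapse"),
    "pre-wrap": ("wrap",    "preserve", "preserve",                       "preserve"),
    "pre-line": ("wrap",    "preserve", "ignore-if-surrounding-linefeed", "collapse"),
}
_LONGHANDS = ("wrap-option", "linefeed-treatment",
              "white-space-treatment", "all-space-treatment")

WHITE_SPACE_SHORTHAND = {
    value: dict(zip(_LONGHANDS, row)) for value, row in _ROWS.items()
}


def expand_white_space(value: str) -> dict:
    """Return the longhand settings implied by a 'white-space' value."""
    return dict(WHITE_SPACE_SHORTHAND[value])
```

The lookup makes it easy to see that 'normal' and 'nowrap' differ only in 'wrap-option', as do 'pre' and 'pre-wrap'.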

Example(s):

The following examples show what white-space behavior is expected from the pre and p elements, the "nowrap" attribute in XHTML, and in generated content.

pre        { white-space: pre }
p          { white-space: normal }
td[nowrap] { white-space: nowrap }
:before, :after { white-space: pre-line }

In addition, the effect of an XHTML pre element with the non-standard "wrap" attribute is demonstrated by the following example:

pre[wrap]  { white-space: pre-wrap }

7.3. Text overflow: the 'text-overflow-mode', 'text-overflow-ellipsis' properties and the shorthand 'text-overflow' property

Text overflow deals with the situation where some textual content is clipped when it overflows the element's box in its inline-progression direction as determined by the 'direction' property value. This situation only occurs when the 'overflow' property has the values: hidden, scroll or auto (in the latter case only when the user agent introduces a scrolling mechanism).

Text overflow allows the author to introduce a visual hint at the two ending boundaries of the text flow within the element box (after and end). The hint is typically a horizontal ellipsis character (U+2026), although the hint may be some other string or even an image. Setting a non-empty string (or a URI for an image) for either text flow boundary enables the presentation of the hint. If both hints are enabled, only the 'after' hint is rendered. Initially, only the end-of-line hint is shown (corresponding to the right of any overflowing lines for left-to-right inline-progression).

Control over text overflow is divided among properties: 'text-overflow-mode' controls the presence and position of the hint, and 'text-overflow-ellipsis' controls what constitutes the hint. The shorthand property 'text-overflow' sets the other text overflow properties.

Name: text-overflow-mode
Value: clip | ellipsis | ellipsis-word
Initial: clip
Applies to: block-level and inline-block elements
Inherited: no
Percentages: N/A
Media: visual
Computed value: specified value (except for initial and inherit)
clip
Clip text as appropriate for the text content. Glyph representations of the text may be only partially rendered.
ellipsis
A visual hint is inserted at each box boundary where text overflow occurs. The 'text-overflow-ellipsis' property determines the content of the hint. The insertions take place after the last letter that entirely fits on the line.
ellipsis-word
A visual hint is inserted at each box boundary where text overflow occurs. The 'text-overflow-ellipsis' property determines the content of the hint. The insertions take place after the last word that entirely fits on the line.

The overflow hints are active only for textual content. That is, the user agent MUST NOT render an overflow hint when only replaced content overflows.
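
Note: The difference between the three 'text-overflow-mode' values can be illustrated by the following non-normative Python sketch. The function name 'overflow_line' and its parameters are hypothetical. The sketch is a strong simplification: it assumes a single line, left to right text, and a font in which every character, including each character of the hint, occupies exactly one cell of the box width. It also assumes that the hint itself must fit inside the box. Partial glyph rendering under 'clip' is not modeled.

```python
def overflow_line(text: str, width: int, mode: str = "clip",
                  hint: str = "\u2026") -> str:
    """Return what is shown for one line in a box that is `width` cells wide."""
    if len(text) <= width:
        return text                      # no overflow, no hint
    if mode == "clip":
        return text[:width]
    room = max(width - len(hint), 0)     # cells left once the hint is placed
    kept = text[:room]
    if mode == "ellipsis":
        # Hint after the last letter that entirely fits.
        return kept + hint
    if mode == "ellipsis-word":
        # Hint after the last word that entirely fits: if the cut falls
        # inside a word, back up to the end of the previous word.
        if not text[room].isspace():
            kept = kept[:kept.rfind(" ") + 1]
        return kept.rstrip() + hint
    raise ValueError(f"unknown mode: {mode!r}")
```

With a box 12 cells wide, the text "I didn't like the play," is cut inside the word "like" under 'ellipsis', whereas 'ellipsis-word' places the hint directly after "didn't".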

Although the property is not inherited, overflowing child blocks that are either statically or relatively positioned and do not have a specified width or height will be hinted as specified by their parent's 'text-overflow-mode' property value.

Consider the following example:

<blockquote>
<p class="sentence"><span class="nowrap">I didn't like the play,</span> but then I saw
it under adverse conditions - the curtain was up.
<div class="attributed-to">_Groucho_Marx_</div>
</p>
</blockquote>

Here is the style sheet controlling the overflow situations:

blockquote        { width: 100px; border: thin solid red; overflow: hidden;
                    text-overflow-mode: ellipsis; font-size: 14px }
span.nowrap       { white-space: nowrap }
div.attributed-to { position: relative; left: 8px }

As a result, the content of the span is only partially visible and an ellipsis is shown. The inner div, which is relatively positioned, shows only a partial ellipsis because it is offset by a few pixels: