Copyright© 2000 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
XSL is a language for expressing stylesheets. It consists of two parts:
a language for transforming XML documents, and
an XML vocabulary for specifying formatting semantics.
An XSL stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses the formatting vocabulary.
This is a W3C Working Draft for review by W3C members and other interested parties. It adds functionality to what was described in the previous draft. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. The XSL Working Group will not allow early implementation to constrain its ability to make changes to this specification prior to final release. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current W3C working drafts can be found at http://www.w3.org/TR/.
This document has been produced as part of the W3C Style Activity by the XSL Working Group (members only).
Comments may be sent to xsl-editors@w3.org. Public discussion of XSL takes place on the XSL-List mailing list.
This specification defines the Extensible Stylesheet Language (XSL). XSL is a language for expressing stylesheets. Given a class of arbitrarily structured XML [W3C XML] documents or data files, designers use an XSL stylesheet to express their intentions about how that structured content should be presented; that is, how the source content should be styled, laid out, and paginated onto some presentation medium, such as a window in a Web browser or a hand-held device, or a set of physical pages in a catalog, report, pamphlet, or book.
An XSL stylesheet processor accepts a document or data in XML and an XSL stylesheet and produces the presentation of that XML source content that was intended by the designer of that stylesheet. There are two aspects of this presentation process: first, constructing a result tree from the XML source tree and second, interpreting the result tree to produce formatted results suitable for presentation on a display, on paper, in speech, or onto other media. The first aspect is called tree transformation and the second is called formatting. The process of formatting is performed by the formatter. This formatter may simply be a rendering engine inside a browser.
Tree transformation allows the structure of the result tree to be significantly different from the structure of the source tree. For example, one could add a table-of-contents as a filtered selection of an original source document, or one could rearrange source data into a sorted tabular presentation. In constructing the result tree, the tree transformation process also adds the information necessary to format that result tree.
Formatting is enabled by including formatting semantics in the result tree. Formatting semantics are expressed in terms of a catalog of classes of formatting objects. The nodes of the result tree are formatting objects. The classes of formatting objects denote typographic abstractions such as page, paragraph, table, and so forth. Finer control over the presentation of these abstractions is provided by a set of formatting properties, such as those controlling indents, word- and letter-spacing, and widow, orphan, and hyphenation control. In XSL, the classes of formatting objects and formatting properties provide the vocabulary for expressing presentation intent.
The XSL processing model is intended to be conceptual only. An implementation is not mandated to provide these as separate processes. Furthermore, implementations are free to process the source document in any way that produces the same result as if it were processed using the conceptual XSL processing model. A diagram depicting the detailed conceptual model is shown below.
Tree transformation constructs the result tree. In XSL, this tree is called the element and attribute tree, with objects primarily in the "formatting object" namespace. In this tree, a formatting object is represented as an XML element, with the properties represented by a set of XML attribute-value pairs. The content of the formatting object is the content of the XML element. Tree transformation is defined in the XSLT Recommendation [XSLT]. A diagram depicting this conceptual process is shown below.
The XSL stylesheet is used in tree transformation. A stylesheet contains a set of tree construction rules. The tree construction rules have two parts: a pattern that is matched against elements in the source tree and a template that constructs a portion of the result tree. This allows a stylesheet to be applicable to a wide class of documents that have similar source tree structures.
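As an informal illustration of the pattern/template pairing, the following Python sketch applies a small rule table to a source tree. It is a simplification under stated assumptions: real patterns are XPath-based as defined in [XSLT], whereas here a pattern is only an element name, and the source vocabulary (doc, title, para) and the names RULES and transform are hypothetical.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"

# Each rule pairs a pattern (here only a source element name) with a
# template (here only the name of the formatting object to construct).
# The source vocabulary is hypothetical.
RULES = {
    "doc": "flow",
    "title": "block",
    "para": "block",
}

def transform(source):
    """Construct the portion of the result tree for one source element."""
    fo_name = RULES.get(source.tag)
    if fo_name is None:
        raise ValueError(f"no rule matches <{source.tag}>")
    result = ET.Element(f"{{{FO}}}{fo_name}")
    result.text = source.text
    for child in source:
        # Each child is matched against the same rule set in turn.
        result.append(transform(child))
    return result

source = ET.fromstring("<doc><title>Intro</title><para>Hello</para></doc>")
result = transform(source)
```

Because the rules refer to element names rather than to a particular document, the same table applies to any source tree built from the same vocabulary.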
Formatting interprets the result tree in its formatting object tree form to produce the presentation intended by the designer of the stylesheet from which the XML element and attribute tree in the "fo" namespace was constructed.
The vocabulary of formatting objects supported by XSL - the set of fo: element types - represents the set of typographic abstractions available to the designer. Semantically, each formatting object represents a specification for a part of the pagination, layout, and styling information that will be applied to the content of that formatting object as a result of formatting the whole result tree. Each formatting object class represents a particular kind of formatting behavior. For example, the block formatting object class represents the breaking of the content of a paragraph into lines. Other parts of the specification may come from other formatting objects; for example, the formatting of a paragraph (block formatting object) depends on both the specification of properties on the block formatting object and the specification of the layout structure into which the block is placed by the formatter.
The properties associated with an instance of a formatting object control the formatting of that object. Some of the properties, for example "color", directly specify the formatted result. Other properties, for example "space-before", only constrain the set of possible formatted results without specifying any particular formatted result. Among the results that satisfy these constraints, the formatter may choose on the basis of other considerations, such as esthetics.
Formatting consists of the generation of a tree of geometric areas, called the area tree. The geometric areas are positioned on a sequence of one or more pages (a browser typically uses a single page). Each geometric area has a position on the page and a specification of what to display in that area, and may have a background, padding, and borders. For example, formatting a single character generates an area large enough to hold the glyph that is used to present the character visually, and the glyph is what is displayed in this area. These areas may be nested. For example, the glyph may be positioned within a line, within a block, within a page.
Rendering takes the area tree, the abstract model of the presentation (in terms of pages and their collections of areas), and causes a presentation to appear on the relevant medium, such as a browser window on a computer display screen or sheets of paper. The semantics of rendering are not described in detail in this specification.
The first step in formatting is to "objectify" the element and attribute tree obtained via an XSLT transformation. Objectifying the tree basically consists of turning the elements in the tree into formatting object nodes and the attributes into property specifications. The result of this step is the formatting object tree.
As part of the step of objectifying, the characters that occur in the result tree are replaced by fo:character nodes. The first phase of the Unicode Bidirectional Algorithm is used to convert implicit bidirectional mark-up to explicit nodes with the appropriate directional properties. Care is taken to ensure that the introduced explicit nodes are properly nested in the formatting object tree.
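The objectifying step can be sketched in Python as follows. This is an illustrative sketch, not a prescribed algorithm: the FormattingObject class and the objectify function are hypothetical names, the use of a "character" property to carry each character is an assumption, and the bidirectional processing is omitted entirely.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

FO_NS = "http://www.w3.org/1999/XSL/Format"

@dataclass
class FormattingObject:
    name: str                    # local name of the element, e.g. "block"
    properties: dict             # property specifications, from attributes
    children: list = field(default_factory=list)

def _characters(text):
    # Assumption: each character becomes a "character" node that records
    # the character in a property of the same name.
    return [FormattingObject("character", {"character": ch}) for ch in text or ""]

def objectify(element):
    """Turn an element of the result tree into a formatting object node."""
    prefix = f"{{{FO_NS}}}"
    if not element.tag.startswith(prefix):
        raise ValueError(f"not in the XSL namespace: {element.tag}")
    node = FormattingObject(element.tag[len(prefix):], dict(element.attrib))
    node.children.extend(_characters(element.text))
    for child in element:
        node.children.append(objectify(child))
        # Text following a child element belongs to the parent's content.
        node.children.extend(_characters(child.tail))
    return node
```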
The second step in formatting is to refine the formatting object tree to produce the refined formatting object tree. The refinement process handles the mapping from properties to traits. This consists of: (1) shorthand expansion into individual properties, (2) mapping of corresponding properties, (3) determining computed values (which may include expression evaluation), and (4) inheritance. Details on refinement are found in [5 Property Refinement / Resolution].
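Two of the refinement operations, shorthand expansion (1) and inheritance (4), can be sketched in Python. The sketch makes several assumptions that are not taken from this specification: only a single-valued "margin" shorthand is handled, the set of inherited properties is an illustrative subset, and the function names are hypothetical. Operations (2) and (3) are not modelled.

```python
# Assumption: an illustrative subset of properties that inherit.
INHERITED = {"color", "font-size"}

def expand_shorthands(props):
    """Replace a single-valued 'margin' shorthand by its four sub-properties."""
    out = dict(props)
    if "margin" in out:
        value = out.pop("margin")
        for side in ("top", "right", "bottom", "left"):
            # An explicitly specified sub-property wins over the shorthand.
            out.setdefault(f"margin-{side}", value)
    return out

def refine(props, parent_traits):
    """Compute the traits of an object from its properties and its parent."""
    # Inherited properties are propagated implicitly.
    traits = {k: v for k, v in parent_traits.items() if k in INHERITED}
    for name, value in expand_shorthands(props).items():
        if value == "inherit":
            # Explicit inheritance works for any property the parent has.
            if name in parent_traits:
                traits[name] = parent_traits[name]
        else:
            traits[name] = value
    return traits

parent = refine({"color": "red", "margin": "2pt"}, {})
child = refine({"margin-top": "inherit"}, parent)
```

Here the child receives "color" implicitly and "margin-top" through the explicit "inherit" value; the remaining margins of the parent are not propagated.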
The refinement step is depicted in the diagram below.
The third step in formatting is the construction of the area tree. The area tree is generated as described in the semantics of each formatting object. The traits applicable to each formatting object class control how the areas are generated. Although every formatting property may be specified on every formatting object, for each formatting object class, only a subset of the formatting properties are used to determine the traits for objects of that class.
Area generation is depicted in the diagram below.
Unlike the case of HTML, element names in XML have no intrinsic presentation semantics. Absent a stylesheet, a processor could not possibly know how to render the content of an XML document other than as an undifferentiated string of characters. XSL provides a comprehensive model and a vocabulary for writing such stylesheets using XML syntax.
This document is intended for implementors of such XSL processors. Although it can be used as a reference manual for writers of XSL style sheets, it is not tutorial in nature.
XSL builds on the prior work on Cascading Style Sheets [CSS2] and the Document Style Semantics and Specification Language [DSSSL]. While many of XSL's formatting objects and properties correspond to the common set of properties, this would not be sufficient by itself to accomplish all the goals of XSL. In particular, XSL introduces a model for pagination and layout that extends what is currently available and that can in turn be extended, in a straightforward way, to page structures beyond the simple page models described in this specification.
Supporting both scrollable document windows and pagination introduces new complexities to the styling (and pagination) of XML content. Because pagination introduces arbitrary boundaries (pages or regions on pages) on the content, concepts such as the control of spacing at page, region, and block boundaries become extremely important. There are also concepts related to adjusting the spaces between lines (to adjust the page vertically) and between words and letters (to justify the lines of text). These do not always arise with simple scrollable document windows, such as those found in today's browsers. However, there is a correspondence between a page with multiple regions, such as a body, header, footer, and left and right side-bars, and a Web presentation using "frames". The distribution of content into the regions is basically the same in both cases, and XSL handles both cases in an analogous fashion.
XSL was developed to give designers control over the features needed when documents are paginated as well as to provide an equivalent "frame" based structure for browsing on the Web. To achieve this control, XSL has extended the set of formatting objects and formatting properties. In addition, the selection of XML source components that can be styled (elements, attributes, text nodes, comments, and processing instructions) is based on XSLT and XPath, thus providing the user with an extremely powerful selection mechanism.
The design of the formatting objects and properties extensions was first inspired by DSSSL. The actual extensions, however, do not always look like the DSSSL constructs on which they were based. To either conform more closely with the CSS2 specification or to handle cases more simply than in DSSSL, some extensions have diverged from DSSSL.
There are several ways in which extensions were made. In some cases, it sufficed to add new values, as in the case of those added to reflect a variety of writing-modes, such as top-to-bottom and bottom-to-top, rather than just left-to-right and right-to-left.
In other cases, common properties that are expressed in CSS2 as one property with multiple simultaneous values are split into several new properties to provide independent control over independent aspects of the property. For example, the "white-space" property was split into four properties: a "space-treatment" property that controls how white-space is processed, a "line-feed" property that controls how line-feeds are processed, a "white-space-collapse" property that controls how multiple consecutive spaces are collapsed, and a "wrap-option" property that controls whether lines are automatically wrapped when they encounter a boundary, such as the edge of a column. The effect of splitting a property into two or more (sub-)properties is to make the equivalent existing CSS2 property a "shorthand" for the set of sub-properties it subsumes.
In still other cases, it was necessary to create new properties. For example, there are a number of new properties that control how hyphenation is done. These include identifying the script and country the text is from as well as such properties as "hyphenation-character" (which varies from script to script).
Some of the formatting objects and many of the properties in XSL come from the CSS2 specification, ensuring compatibility between the two.
There are four classes of XSL properties that can be identified as:
CSS properties by copy (unchanged from their CSS2 semantics)
CSS properties with extended values
CSS properties broken apart and/or extended
XSL-only properties
As mentioned above, XSL uses XSLT and XPath for tree construction and pattern selection, thus providing a high degree of control over how portions of the source content are presented, and what properties are associated with those content portions, even where mixed namespaces are involved.
For example, the patterns of XPath allow the selection of a portion of a string or the Nth text node in a paragraph. This allows users to have a rule that makes all third paragraphs in procedural steps appear in bold, for instance. In addition, properties can be associated with a content portion based on the numeric value of that content portion or attributes on the containing element. This allows one to have a style rule that makes negative values appear in "red" and positive values appear in "black". Also, text can be generated depending on a particular context in the source tree, or portions of the source tree may be presented multiple times with different styles.
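The rule that styles a value according to its sign can be sketched in Python. This is an informal sketch: in XSL the selection would be expressed with XSLT and XPath, the source element name "amount" and the function name style_amount are hypothetical, and treating zero as a positive value is an assumption.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"

def style_amount(source):
    """Wrap a numeric source element in an inline object coloured by sign."""
    value = float(source.text)
    # Assumption: zero is presented like a positive value.
    color = "red" if value < 0 else "black"
    inline = ET.Element(f"{{{FO}}}inline", {"color": color})
    inline.text = source.text
    return inline

loss = style_amount(ET.fromstring("<amount>-12.5</amount>"))
gain = style_amount(ET.fromstring("<amount>40</amount>"))
```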
There is a set of formatting objects in XSL to describe both the layout structure of a page or "frame" (how big is the body; are there multiple columns; are there headers, footers, or side-bars; how big are these) and the rules by which the XML source content is placed into these "containers".
The layout structure is defined in terms of one or more instances of a "simple-page-master" formatting object. This formatting object allows one to define independently filled regions for the body (with multiple columns), a header, a footer, and side-bars on a page. These simple-page-masters can be used in page sequences that specify in which order the various simple-page-masters shall be used. The page sequence also specifies how styled content is to fill those pages. This model allows one to specify a sequence of simple-page-masters for a book chapter where the page instances are automatically generated by the formatter or an explicit sequence of pages such as used in a magazine layout. Styled content is assigned to the various regions on a page by associating the name of the region with names attached to styled content in the result tree.
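The association of styled content with regions by name can be sketched in Python. The sketch is deliberately minimal and rests on assumptions: the region names "body" and "header", the function name assign_to_regions, and the choice to treat an unmatched name as an error are all illustrative and are not taken from this specification.

```python
def assign_to_regions(region_names, named_content):
    """Distribute (name, content) pairs into the regions with matching names."""
    placed = {name: [] for name in region_names}
    for name, content in named_content:
        if name not in placed:
            # Assumption: content naming an unknown region is an error.
            raise KeyError(f"no region named {name!r} on this page master")
        placed[name].append(content)
    return placed

page = assign_to_regions(
    ["body", "header"],
    [("header", "Chapter 1"), ("body", "First paragraph"), ("body", "Second paragraph")],
)
```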
In addition to these layout formatting objects and properties, there are properties designed to provide the level of control over formatting that is typical of paginated documents. This includes control over hyphenation, and expanding the control over text that is kept with other text in the same line, column, or on the same page.
The extension of the properties and formatting objects, particularly in the area of control over the spacing of blocks, lines, and page regions and within lines, necessitated an extension of the CSS2 box formatting model. This extended model is described in [4 Area Model] of this specification. The CSS2 box model is a subset of this model. See the mapping of the CSS2 box model terminology to the XSL Area Model terminology in [7.2 XSL Areas and the CSS Box Model]. The area model provides a vocabulary for describing the relationships and space-adjustment between letters, words, lines, and blocks.
There are many scripts, in particular in the Far East, that are typically set with words proceeding from top-to-bottom and lines proceeding either from right-to-left (most common) or from left-to-right. Other directions are also used. Properties expressed in terms of a fixed, absolute frame of reference (using top, bottom, left, and right) and which apply only to a notion of words proceeding from left to right or right to left do not generalize well to the languages based on these scripts.
For this reason XSL (and before it DSSSL) uses a relative frame of reference for the formatting object and property descriptions. Just as the CSS2 frame of reference has four directions (top, bottom, left and right), so does the XSL relative frame of reference have four directions (before, after, start, and end), but these are relative to the "writing-mode". The "writing-mode" property is a way of controlling the directions needed by a formatter to correctly place glyphs, words, lines, blocks, etc. on the page or screen. The "writing-mode" expresses the basic directions noted above. There are writing-modes for "left-to-right - top-to-bottom" (denoted as "lr-tb"), "right-to-left - top-to-bottom" (denoted as "rl-tb"), "top-to-bottom - right-to-left" (denoted as "tb-rl"), and more; see [7.21.43 "writing-mode"] for the description of the "writing-mode" property. Typically, the writing-mode value specifies two directions: the first is the inline-progression-direction, which determines the direction in which words will be placed, and the second is the block-progression-direction, which determines the direction in which blocks (and lines) are placed one after another.
Besides the directions that are explicit in the name of the value of the "writing-mode" property, the writing-mode determines other directions needed by the formatter, such as the shift-direction (used for sub- and super-scripts), etc.
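The relationship between a writing-mode value, the two progression directions, and the relative frame of reference can be sketched in Python. The sketch covers only values built from the "lr", "rl", and "tb" components mentioned above, ignores any other property that may affect orientation, and uses the hypothetical function name resolve_writing_mode.

```python
# Components that appear in the writing-mode values quoted in the text.
DIRECTION = {"lr": "left-to-right", "rl": "right-to-left", "tb": "top-to-bottom"}
FIRST_SIDE = {"lr": "left", "rl": "right", "tb": "top"}    # where progression begins
LAST_SIDE = {"lr": "right", "rl": "left", "tb": "bottom"}  # where progression ends

def resolve_writing_mode(value):
    """Map a value such as 'lr-tb' to directions and absolute sides."""
    inline, block = value.split("-")
    return {
        "inline-progression-direction": DIRECTION[inline],
        "block-progression-direction": DIRECTION[block],
        "start": FIRST_SIDE[inline],
        "end": LAST_SIDE[inline],
        "before": FIRST_SIDE[block],
        "after": LAST_SIDE[block],
    }

western = resolve_writing_mode("lr-tb")
vertical = resolve_writing_mode("tb-rl")
```

For "lr-tb" the relative directions coincide with the familiar absolute ones (start is left, before is top); for "tb-rl" the start edge is the top and the before edge is the right.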
Because XML, unlike HTML, has no built-in semantics, there is no built-in notion of a hypertext link. Therefore, XSL has a formatting object that expresses the dual semantics of formatting the content of the link reference and the semantics of following the link.
XSL provides a few mechanisms for changing the presentation of a link target that is being visited. One of these mechanisms permits indicating the link target as such; another allows for control over the placement of the link target in the viewing area; still another permits some degree of control over the way the link target is displayed in relationship to the originating link anchor.
The Tree Construction is described in "XSL Transformations" [XSLT].
The provisions in "XSL Transformations" form an integral part of this recommendation and are considered normative.
The XSL namespace has the URI http://www.w3.org/1999/XSL/Format.
NOTE: The 1999 in the URI indicates the year in which the URI was allocated by the W3C. It does not indicate the version of XSL being used.
XSL processors must use the XML namespaces mechanism [W3C XML Names] to recognize elements and attributes from this namespace. Elements from the XSL namespace are recognized only in the stylesheet, not in the source document. Implementors must not extend the XSL namespace with additional elements or attributes. Instead, any extension must be in a separate namespace.
This specification uses the prefix fo: for referring to elements in the XSL namespace. However, XSL stylesheets are free to use any prefix, provided that there is a namespace declaration that binds the prefix to the URI of the XSL namespace.
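That recognition depends on the namespace URI and not on the prefix can be illustrated with Python's standard xml.etree.ElementTree module, which expands prefixed names to the form {uri}local-name. The helper name is_formatting_object is hypothetical.

```python
import xml.etree.ElementTree as ET

XSL_FO_NS = "http://www.w3.org/1999/XSL/Format"

def is_formatting_object(element):
    """True if the element's expanded-name is in the XSL namespace."""
    return element.tag.startswith(f"{{{XSL_FO_NS}}}")

# The same element written with two different prefixes.
with_fo = ET.fromstring(f'<fo:block xmlns:fo="{XSL_FO_NS}"/>')
with_x = ET.fromstring(f'<x:block xmlns:x="{XSL_FO_NS}"/>')
# An element named "block" that is in no namespace.
plain = ET.fromstring("<block/>")
```

Both prefixed forms yield the same expanded-name, while the unprefixed element is not recognized as a formatting object.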
An element from the XSL namespace may have any attribute not from the XSL namespace, provided that the expanded-name of the attribute has a non-null namespace URI. The presence of such attributes must not change the behavior of XSL elements and functions defined in this document. Thus, an XSL processor is always free to ignore such attributes, and must ignore such attributes without giving an error if it does not recognize the namespace URI. Such attributes can provide, for example, unique identifiers, optimization hints, or documentation.
It is an error for an element from the XSL namespace to have attributes with expanded-names that have null namespace URIs (i.e., attributes with unprefixed names) other than attributes defined for the element in this document.
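The two attribute rules can be sketched as a single check in Python. This is a sketch under assumptions: the function name check_attributes and the example extension namespace are hypothetical, the set of attributes defined for an element is supplied by the caller, and attributes whose names are in the XSL namespace itself are not given special treatment.

```python
import xml.etree.ElementTree as ET

def check_attributes(element, defined_for_element):
    """Return the recognized property specifications of an XSL element.

    Attributes with a non-null namespace URI are ignored; an unprefixed
    attribute that is not defined for the element is reported as an error.
    """
    properties = {}
    for name, value in element.attrib.items():
        if name.startswith("{"):
            # ElementTree writes namespaced names as "{uri}local": skip them.
            continue
        if name not in defined_for_element:
            raise ValueError(f"attribute {name!r} is not defined for this element")
        properties[name] = value
    return properties

element = ET.fromstring(
    '<block xmlns:ext="urn:example:ext" color="red" ext:hint="fast"/>'
)
recognized = check_attributes(element, {"color"})
```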
NOTE: The conventions used for the names of XSL elements, attributes, and functions are as follows: names are all lower-case, hyphens are used to separate words, dots are used to separate names for the components of complex datatypes, and abbreviations are used only if they already appear in the syntax of a related language such as XML or HTML.
The aim of this section is to describe the general process of formatting, enough to read the area model and the formatting-object descriptions and properties and to understand the process of refinement.
Formatting is the process of turning the result of an XSL transformation into a tangible form for the reader or listener. This process comprises several steps, some of which depend on others in a non-sequential way. Our model for formatting will be the construction of an area tree, which is an ordered tree containing geometric information for the placement of every glyph, shape, and image in the document, together with information embodying spacing constraints and other rendering information; this information is referred to under the rubric of traits, which are to areas what properties are to formatting objects and attributes are to XML nodes. Section 4 (see [4 Area Model]) will describe the area tree and define the default placement-constraints on stacked areas. However, this is an abstract model which need not be actually implemented in this way in a formatter, so long as the resulting tangible form obeys the implied constraints.
Formatting objects are elements in the formatting-object tree, whose names are from the XSL namespace; a formatting object belongs to a class of formatting objects identified by its element name. The formatting behavior of each class of formatting objects is described in terms of what areas are created by a formatting object of that class, how the traits of the areas are established, and how the areas are structured hierarchically with respect to areas created by other formatting objects. Section 6 (see [6 Formatting Objects]) and Section 7 (see [7 Formatting Properties]) describe formatting objects and their properties.
Some formatting objects are block-level and others are inline-level. This refers to the types of areas which they generate, which in turn refers to their default placement method. Inline-areas (for example, glyph-areas) are collected into lines and the direction in which they are stacked is the inline-progression-direction. Lines are a type of block-area and these are stacked in a direction perpendicular to the inline-progression-direction, called the block-progression-direction. See Section 4 for detailed descriptions of these area types and directions.
In Western writing systems, the block-progression-direction is "top-to-bottom" and the inline-progression-direction is "left-to-right". This specification treats other writing systems as well and introduces the terms "block" and "inline" instead of using absolute indicators like "vertical" and "horizontal". Similarly this specification tries to give relatively-specified directions ("before" and "after" in the block-progression-direction, "start" and "end" in the inline-progression-direction) where appropriate, either in addition to or in place of absolutely-specified directions such as "top", "bottom", "left", and "right". These are interpreted according to the value of the writing-mode property.
Central to this model of formatting is refinement. This is a computational process which finalizes the specification of properties based on the attribute values in the XML result tree. Though the XML result tree and the formatting-object tree have very similar structure, it is helpful to think of them as separate conceptual entities. Refinement involves
propagating the various inherited values of properties (both implicitly and those with an attribute value of "inherit"),
evaluating expressions in property value specifications into actual values, which are then used to determine the value of the properties,
converting relative numerics to absolute numerics, and
constructing some composite properties from more than one attribute.
Some of these operations (particularly evaluating expressions) depend on knowledge of the area tree. Thus refinement is not necessarily a straightforward, sequential procedure, but may involve look-ahead, back-tracking, or control-splicing with other processes in the formatter. Refinement is described more fully in Section 5 (see [5 Property Refinement / Resolution]).
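The conversion of relative numerics to absolute numerics can be sketched for the simplest case in Python. The sketch assumes that only the units "pt" and "em" occur, that an em is resolved against a font size already known in points, and that nothing depends on the area tree; the function name to_points is hypothetical.

```python
import re

# A number followed by one of the two units this sketch understands.
_LENGTH = re.compile(r"^(-?\d+(?:\.\d+)?)(pt|em)$")

def to_points(value, font_size_pt):
    """Convert a length such as '12pt' or '1.5em' to a number of points."""
    match = _LENGTH.match(value)
    if match is None:
        raise ValueError(f"unsupported length: {value!r}")
    number, unit = float(match.group(1)), match.group(2)
    # An em is relative to the font size; a point is already absolute.
    return number * font_size_pt if unit == "em" else number
```

For example, to_points("1.5em", 10.0) yields 15.0. A value that depends on the size of an area cannot be resolved this way, which is why refinement may need to interleave with area generation.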
To summarize, formatting proceeds by constructing an area tree (containing areas and their traits) which satisfies constraints based on information contained in the XML result tree (containing element nodes and their attributes). Conceptually, there are intermediate steps of constructing a formatting-object tree (containing formatting objects and their properties) and refinement; these steps may proceed in an interleaved fashion during the construction of the area tree.
This subsection contains a conceptual description of how formatting could work. This conceptual procedure does not mandate any particular algorithms or data structures as long as the result obeys the implied constraints.
The procedure works by processing formatting objects. Each object, while being processed, may initiate processing in other objects. While the objects are hierarchically structured, the processing is not; processing of a given object is rather like a co-routine which may pass control to other processes, but pick up again later where it left off. The procedure starts by initiating the processing of the fo:root formatting object.
Unless otherwise specified, processing a formatting object creates areas and returns them to its parent to be placed in the area tree. Like a co-routine, it resumes control later and initiates formatting of its own children (if any), or some subset of them. The formatting object supplies parameters to its children based on the traits of areas already in the area tree, possibly including areas generated by the formatting object or its ancestors. It then disposes of the areas returned by its formatting-object children. It might simply return such an area to its parent (and will always do this if it does not generate areas itself), or alternatively it might arrange the area in the area tree according to the semantics of the formatting object; this may involve changing its geometric position. It terminates processing when all its children have terminated processing (if initiated) and it is finished generating areas.
Some formatting objects do not themselves generate areas; instead, these formatting objects simply return the areas returned to them by their children. Alternatively, a formatting object may continue to generate (and return) areas based on information discovered while formatting its own children; for example, the fo:page-sequence formatting object will continue generating pages as long as it contains a flow with unprocessed descendants.
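The co-routine flavour of this conceptual procedure can be sketched with Python generators. The sketch is not a prescribed algorithm: areas are reduced to (name, children) pairs, no parameters are passed to children, no geometry is computed, and the names FormattingObject, process, and "wrapper" are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class FormattingObject:
    name: str
    generates_area: bool = True
    children: list = field(default_factory=list)

def process(fo):
    """Yield the areas that this formatting object returns to its parent."""
    if not fo.generates_area:
        # Pass the areas returned by the children straight to the parent.
        for child in fo.children:
            yield from process(child)
        return
    area = (fo.name, [])
    for child in fo.children:
        # Place the areas returned by each child inside this object's area.
        area[1].extend(process(child))
    yield area

tree = FormattingObject("block", children=[
    FormattingObject("wrapper", generates_area=False, children=[
        FormattingObject("inline"),
        FormattingObject("inline"),
    ]),
])
areas = list(process(tree))
```

The two inline areas end up as children of the block area, because the intermediate object generates no area of its own.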
Areas received by an fo:root formatting object are pages, and are simply placed as children of the area tree root in the order in which they are returned, with no geometrical implications.
As a general rule, the order of the area tree parallels the order of the formatting-object tree. That is, if one formatting object precedes another in the depth-first traversal of the formatting-object tree, with neither containing the other, then all the areas generated by the first will precede all the areas generated by the second in the depth-first traversal of the area tree, unless otherwise specified. Typical exceptions to this rule would be things like inline floats, block floats, and footnotes.
At the end of the procedure, the areas and their traits have been constructed, and they are required to satisfy constraints described in the definitions of their associated formatting objects, and in the area model section. In particular, size and position of the areas will be subject to the placement and spacing constraints described in the area model, unless the formatting-object definition indicates otherwise.
The formatting-object definitions, property descriptions, and area model are not algorithms. Thus, the formatting-object semantics do not specify how the line-breaking algorithm must work in collecting characters into words, positioning words within lines, shifting lines within a container, etc. Rather this specification assumes that the formatter has done these things and describes the constraints which the result is supposed to satisfy.
In XSL, one creates a tree of formatting objects that serve as inputs or specifications to a formatter. The formatter generates a hierarchical arrangement of areas which comprise the formatted result. This section defines the general model of areas and how they interact. The purpose is to present an abstract framework which is used in describing the semantics of formatting objects. It should be seen as describing a series of constraints for conforming implementations, and not as prescribing particular algorithms.
The formatter generates an ordered tree, the area tree, which describes a geometric structuring of the output medium. The terms child, sibling, parent, descendant, and ancestor refer to this tree structure. The tree has a root node.
Each area tree node other than the root is called an area and is associated with a rectangular portion of the output medium. Areas are not formatting objects; rather, a formatting object generates zero or more rectangular areas, and normally each area is generated by a unique object in the formatting object tree.
NOTE: The only exceptions are when several leaf nodes of the formatting object tree are combined to generate a single area, for example when several characters in sequence generate a single ligature glyph. In all such cases, relevant properties such as font-family and font-size are the same for all the generating formatting objects.
An area has a content-rectangle, the portion in which its child areas are assigned, and optional padding and border. The diagram shows how these portions are related to one another. The outer bound of the border is called the border-rectangle, and the outer bound of the padding is called the padding-rectangle.
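The nesting of the three rectangles can be sketched in Python. The sketch assumes, for simplicity, that padding and border have the same width on all four sides; the names Rect, grow, and area_rectangles are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float

    def grow(self, amount):
        """Return the rectangle extended by `amount` on every side."""
        return Rect(self.x - amount, self.y - amount,
                    self.width + 2 * amount, self.height + 2 * amount)

def area_rectangles(content, padding, border):
    """Derive the padding- and border-rectangles from the content-rectangle."""
    padding_rect = content.grow(padding)      # outer bound of the padding
    border_rect = padding_rect.grow(border)   # outer bound of the border
    return padding_rect, border_rect

padding_rect, border_rect = area_rectangles(Rect(10, 10, 100, 50), padding=2, border=1)
```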
Each area has a set of traits, a mapping of names to values, in the way elements have attributes and formatting objects have properties. Individual traits are used either for rendering the area or for defining constraints on the result of formatting, or both. Traits used strictly for formatting purposes or for defining constraints may be called formatting traits, and traits used for rendering may be called rendering traits. For the complete list of the type of traits see [C Property Index].
The semantics of each type of formatting object that generates areas are given in terms of which areas it generates and their place in the area-tree hierarchy. This may be further modified by interactions between the various types of formatting objects. The properties of the formatting object determine what areas are generated and how the formatting object's content is distributed among them. (For example, a word that is not to be hyphenated may not have its glyphs distributed into areas on two separate line-areas.)
The traits of an area are either:
1. "directly-derived" -- The values of directly-derived traits are the computed value of a property of the same name on the generating formatting object, or
2. "indirectly-derived" -- The values of indirectly-derived traits are the result of a computation involving the computed values of one or more properties on the generating formatting object, other traits on this area or other interacting areas (ancestors, parent, siblings, and/or children) and/or one or more values constructed by the formatter. The calculation formula may depend on the type of the formatting object.
This description assumes that refined values have been computed for all properties of formatting objects in the result tree, i.e., all relative and corresponding values have been computed and the inheritable values have been propagated as described in [5 Property Refinement / Resolution]. This allows the process of inheritance to be described once and avoids a need to repeat information on computing values in this description.
There are two types of areas: block-areas and inline-areas. These differ according to how they are typically stacked by the formatter. An area can have block-area children or inline-area children as determined by the generating formatting object, but a given area's children must all be of one type. Although block-areas and inline-areas are typically stacked, some areas can be explicitly positioned.
A line-area is a special kind of block-area whose children are all inline-areas. A glyph-area is a special kind of inline-area which has no child areas, and has a single glyph image as content.
Typical examples would be: a paragraph rendered by using an fo:block formatting object, which generates block-areas, and a character rendered by using an fo:character formatting object, which generates an inline-area (in fact, a glyph-area).
Associated with any area are two directions, which are derived from the generating formatting object's "writing-mode" and "reference-orientation" properties: the block-progression-direction is the direction for stacking block-area descendants of the area, and the inline-progression-direction is the direction for stacking inline-area descendants of the area. Another trait, the shift-direction, is present on inline-areas and refers to the direction in which baseline shifts are applied. Also the glyph-orientation defines the orientation of glyph-images in the rendered result.
The Boolean trait is-indent-reference determines whether or not an area establishes a coordinate system for specifying indents. An area for which this trait is true is called a reference-area. Only a reference-area may have a block-progression-direction which is different from that of its parent. A reference-area may be either a block-area or an inline-area.
A set of traits describes the position and dimensions of the area. Other traits include:
the is-first and is-last traits, which are Boolean traits indicating the order in which areas are generated by a given formatting object. is-first is true for the first area (or only area) generated by a formatting object, and is-last is true for the last area (or only area).
the amount of space outside the border-rectangle: space-before, space-after, space-start, and space-end (though some of these may be required to be zero on certain classes of area);
the thickness of each of the four sides of the padding: padding-before, padding-after, padding-start, and padding-end;
the style, thickness, and color of each of the four sides of the border: border-before, etc.; and
the background rendering of the area: background-color, etc.
NOTE: 'Before', 'after', 'start', and 'end' refer to relative directions and are defined below.
a nominal-font, which identifies the font that is deemed to be used within that area.
Unless otherwise specified, the traits of a formatting object are present on each area it generates, with the same name and value.
As described above, the content-rectangle is the rectangle bounding the inside of the padding and is used to describe the constraints on the positions of descendant areas. It is possible that marks from glyph contents or descendant areas may appear outside the content-rectangle.
Related to this is the allocation-rectangle of an area, which is used to describe the constraints on the position of the area within its parent area. For an inline-area this extends to the content-rectangle in the block-progression-direction and to the border-rectangle in the inline-progression-direction.
Figure: Allocation- and content-rectangles of an inline-area
For a block-area, it extends to the border-rectangle in the block-progression-direction and outside the border-rectangle in the inline-progression-direction by an amount equal to the end-indent, and in the opposite direction by an amount equal to the start-indent. The traits actual-block-progression-dimension and actual-inline-progression-dimension of an area apply to the content-rectangle.
NOTE: The inclusion of space outside the border-rectangle of a block-area in the inline-progression-direction does not affect placement constraints, and is intended to promote compatibility with the CSS box model.
Figure: Allocation- and content-rectangles of a block-area
The edges of a rectangle are designated as follows:
the before-edge is the edge occurring first in the block-progression-direction and perpendicular to it;
the after-edge is the edge opposite the before-edge;
the start-edge is the edge occurring first in the inline-progression-direction and perpendicular to it; and
the end-edge is the edge opposite the start-edge.
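As an illustration of these four designations, the following Python sketch maps a pair of progression directions to physical sides. The function name `edge_names` and the direction strings such as "top-to-bottom" are hypothetical notations chosen for this sketch, not values defined here.

```python
OPPOSITE = {"top": "bottom", "bottom": "top", "left": "right", "right": "left"}


def edge_names(block_progression_direction, inline_progression_direction):
    """Map relative edge names to physical sides.

    Directions are written as "<from>-to-<to>", e.g. "top-to-bottom"
    (an assumed notation). The edge occurring first in a direction is
    the side that the direction starts from.
    """
    before = block_progression_direction.split("-to-")[0]
    start = inline_progression_direction.split("-to-")[0]
    if start in (before, OPPOSITE[before]):
        raise ValueError("the two progression directions must be perpendicular")
    return {
        "before-edge": before,
        "after-edge": OPPOSITE[before],
        "start-edge": start,
        "end-edge": OPPOSITE[start],
    }


western = edge_names("top-to-bottom", "left-to-right")
vertical = edge_names("right-to-left", "top-to-bottom")
```

With these inputs, `western` places the before-edge at the top and the start-edge at the left, while `vertical` places the before-edge at the right and the start-edge at the top.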
The following diagram shows the correspondence between the various edge names for a mixed writing-mode example:
For purposes of this definition, the content-rectangle of an area uses the inline-progression-direction and block-progression-direction of that area; but the border-rectangle, padding-rectangle, and allocation-rectangle use the directions of its parent area. Thus the edges designated for the content-rectangle may not correspond with the same-named edges on the padding-, border-, and allocation-rectangles. This is important in the case of nested block-areas with different writing-modes.
Each inline-area has a position-point determined by the formatter, on the start-edge of its allocation-rectangle; for a glyph-area, this is a point on the leading edge of the glyph on its preferred baseline (see below). This is script-dependent and does not necessarily correspond to the (0,0) coordinate point used for the data describing the glyph shape.
In the area tree, the set of areas with a given parent is ordered. The terms initial, final, preceding, and following refer to this ordering.
In any ordered tree, this sibling order extends to an ordering of the entire tree in at least two ways.
In the pre-order traversal order of a tree, the children of each node (their order unchanged relative to one another) come after the node, but before any following siblings of the node or of its ancestors.
In the post-order traversal order of a tree, the children of each node come before the node, but after any preceding siblings of the node or of its ancestors.
"Preceding" and "following", when applied to non-siblings, will depend on the extension order used, which must be specified. However, in either of these given orders, the leaves of the tree (nodes without children) are unambiguously ordered.
Given a particular order for the tree, a subset S of the tree is contiguous if for any elements A and B of S, S also contains any node that follows A and precedes B in the given order. There is a relative version of this: for a particular subset C of nodes of a tree, if S is a subset of C, then S is contiguous relative to C if for any elements A and B of S, S also contains any node of C that follows A and precedes B.
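The two traversal orders and the contiguity test can be sketched in Python as follows. The dictionary representation of the tree, the node names, and the function names are assumptions made for this illustration.

```python
# A hypothetical ordered tree: each node maps to its ordered list of children.
TREE = {"R": ["A", "B"], "A": ["C", "D"], "B": [], "C": [], "D": []}


def pre_order(tree, node):
    """Each node comes before its children, which keep their sibling order."""
    order = [node]
    for child in tree[node]:
        order.extend(pre_order(tree, child))
    return order


def post_order(tree, node):
    """Each node comes after its children, which keep their sibling order."""
    order = []
    for child in tree[node]:
        order.extend(post_order(tree, child))
    order.append(node)
    return order


def is_contiguous(order, subset, relative_to=None):
    """True if subset has no gaps in order (optionally restricted to relative_to)."""
    universe = [n for n in order if relative_to is None or n in relative_to]
    positions = [i for i, n in enumerate(universe) if n in subset]
    return not positions or positions[-1] - positions[0] + 1 == len(positions)


def leaves(tree, order):
    """The nodes without children, in the given order."""
    return [n for n in order if not tree[n]]


pre = pre_order(TREE, "R")
post = post_order(TREE, "R")
```

In this sample tree the two orders differ for interior nodes, but `leaves(TREE, pre)` and `leaves(TREE, post)` agree, which matches the statement that the leaves are unambiguously ordered.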
This section defines the notion of block-stacking constraints and inline-stacking constraints involving areas. These are defined as ordered relations, i.e. if A and B have a stacking constraint it does not necessarily mean that B and A have a stacking constraint.
The area-class trait is an enumerated value which is xsl-normal for an area which is stacked with other areas in sequence. A normally-sequenced area is an area for which this trait is xsl-normal. Other values mark an area as not following the main sequence (e.g., floats, footnotes and absolutely positioned areas).
If P is a block-area, then there is a fence before P if P is a reference-area or if the border-before-width or padding-before-width of P is nonzero. Similarly, there is a fence after P if P is a reference-area or if the border-after-width or padding-after-width of P is nonzero.
If A and B are normally-sequenced areas, and S is a sequence of space-specifiers, it is defined that A and B have block-stacking constraint S if any of the following conditions holds:
1. B is a block-area which is the first normally-sequenced child of A, and S is the sequence consisting of the space-before of B.
2. A is a block-area which is the last normally-sequenced child of B, and S is the sequence consisting of the space-after of A.
3. A and B are both block-areas, and either
a. B is the next normally-sequenced sibling area of A, and S is the sequence consisting of the space-after of A and the space-before of B;
b. B is the first normally-sequenced child of a block-area P, there is no fence before P, A and P have a block-stacking constraint S', and S consists of S' followed by the space-before of B; or
c. A is the last normally-sequenced child of a block-area P, there is no fence after P, P and B have a block-stacking constraint S'', and S consists of the space-after of A followed by S''.
When A and B have a block-stacking constraint, the adjacent edges of A and B are an ordered pair defined as:
In case 1, the before-edge of the content-rectangle of A and the before-edge of the allocation-rectangle of B.
In case 2, the after-edge of the content-rectangle of A and the after-edge of the allocation-rectangle of B.
In case 3a, the after-edge of the allocation-rectangle of A and the before-edge of the allocation-rectangle of B.
In case 3b, the first of the adjacent edges of A and P, and the before-edge of the allocation-rectangle of B.
In case 3c, the after-edge of the allocation-rectangle of A and the second of the adjacent edges of P and B.
NOTE: The intention of the definition is to identify areas at any level of the tree which have only space between them.
Figure: Block-stacking constraint example
Example. In this diagram each node represents a block-area. Assume that all padding and border widths are zero, and none of the areas are reference-areas. Then P and A have a block-stacking constraint, as do A and B, A and C, B and C, C and D, D and B, B and E, D and E, and E and P; these are the only pairs in the diagram having block-stacking constraints. If B had non-zero padding-after, then D and E would not have any block-stacking constraint (though B and E would continue to have a block-stacking constraint).
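The pairs listed in this example can be reproduced with a small Python sketch of the block-stacking definition. Several things here are assumptions rather than part of this specification: the `Block` class and function names are hypothetical, fences are supplied as plain Booleans, and the tree shape (P with children A, B, E, and B with children C, D) is inferred from the listed pairs because the diagram is not reproduced. The sketch also assumes that in cases 3b and 3c the other area lies outside the subtree of P, which the example's list of pairs implies.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality keeps instances hashable
class Block:
    name: str
    fence_before: bool = False
    fence_after: bool = False
    children: list = field(default_factory=list)
    parent: "Block | None" = None

    def add(self, *kids):
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def is_descendant(x, p):
    while x.parent is not None:
        x = x.parent
        if x is p:
            return True
    return False


def block_stacking_constraints(root):
    """Return {(A, B): S} where S lists the space-specifiers by label."""
    nodes, stack = [], [root]
    while stack:
        node = stack.pop()
        nodes.append(node)
        stack.extend(node.children)
    found = {}
    for n in nodes:
        kids = n.children
        if kids:
            found[(n, kids[0])] = [f"space-before({kids[0].name})"]   # case 1
            found[(kids[-1], n)] = [f"space-after({kids[-1].name})"]  # case 2
        for a, b in zip(kids, kids[1:]):                              # case 3a
            found[(a, b)] = [f"space-after({a.name})", f"space-before({b.name})"]
    changed = True
    while changed:  # apply cases 3b and 3c until no new pair appears
        changed = False
        for (a, b), s in list(found.items()):
            # Case 3b: b plays the role of P; descend to its first child.
            if b.children and not b.fence_before and not is_descendant(a, b):
                key = (a, b.children[0])
                if key not in found:
                    found[key] = s + [f"space-before({b.children[0].name})"]
                    changed = True
            # Case 3c: a plays the role of P; descend to its last child.
            if a.children and not a.fence_after and not is_descendant(b, a):
                key = (a.children[-1], b)
                if key not in found:
                    found[key] = [f"space-after({a.children[-1].name})"] + s
                    changed = True
    return found


def example_tree(b_fence_after=False):
    p, a, b, c, d, e = (Block(n) for n in "PABCDE")
    b.fence_after = b_fence_after
    b.add(c, d)
    p.add(a, b, e)
    return p


def pair_names(root):
    return {(x.name, y.name) for (x, y) in block_stacking_constraints(root)}
```

Calling `pair_names(example_tree())` yields the nine pairs named in the example, and `pair_names(example_tree(b_fence_after=True))` drops only the pair D and E, matching the remark about non-zero padding-after on B.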
Inline stacking constraints. This section will define the inline-stacking-constraints between two areas, together with the notion of fence-before and fence-after. This parallels the definition for block-stacking constraints, but with the additional complication that we may have a stacking constraint between inline-areas which are stacked in opposite inline-progression-directions. (This is not an issue for block-stacking constraints because a block-area which is not a reference area may not have block-progression-direction different from that of its parent.)
If P and Q have an inline-stacking constraint, then P has a fence before Q if P is a reference area or has non-zero border width or padding width at the first adjacent edge of P and Q. Similarly, Q has a fence after P if Q is a reference area or has non-zero border width or padding width at the second adjacent edge of P and Q.
If A and B are normally-sequenced areas, and S is a sequence of space-specifiers, it is defined that A and B have inline-stacking constraint S if any of the following conditions holds:
1. B is an inline-area which is the first normally-sequenced child of A, and S is the sequence consisting of the space-start of B.
2. A is an inline-area which is the last normally-sequenced child of B, and S is the sequence consisting of the space-end of A.
3. A and B are both inline-areas, and either
a. B is the next normally-sequenced sibling area of A, and S is the sequence consisting of the space-end of A and the space-start of B;
b. B is the first normally-sequenced child of an inline-area P, P has no fence after A, A and P have an inline-stacking constraint S', the inline-progression-direction of P is the same as the inline-progression-direction of the nearest common ancestor area of A and P, and S consists of S' followed by the space-start of B;
c. A is the last normally-sequenced child of an inline-area P, P has no fence before B, P and B have an inline-stacking constraint S'', the inline-progression-direction of P is the same as the inline-progression-direction of the nearest common ancestor area of A and P, and S consists of the space-end of A followed by S'';
d. B is the last normally-sequenced child of an inline-area P, P has no fence after A, A and P have an inline-stacking constraint S', the inline-progression-direction of P is opposite to the inline-progression-direction of the nearest common ancestor area of A and P, and S consists of S' followed by the space-end of B; or
e. A is the first normally-sequenced child of an inline-area P, P has no fence before B, P and B have an inline-stacking constraint S'', the inline-progression-direction of P is opposite to the inline-progression-direction of the nearest common ancestor area of A and P, and S consists of the space-start of A followed by S''.
When A and B have an inline-stacking constraint, the adjacent edges of A and B are an ordered pair defined as:
In case 1, the start-edge of the content-rectangle of A and the start-edge of the allocation-rectangle of B.
In case 2, the end-edge of the content-rectangle of A and the end-edge of the allocation-rectangle of B.
In case 3a, the end-edge of the allocation-rectangle of A and the start-edge of the allocation-rectangle of B.
In case 3b, the first of the adjacent edges of A and P, and the start-edge of the allocation-rectangle of B.
In case 3c, the end-edge of the allocation-rectangle of A and the second of the adjacent edges of P and B.
In case 3d, the first of the adjacent edges of A and P, and the end-edge of the allocation-rectangle of B.
In case 3e, the start-edge of the allocation-rectangle of A and the second of the adjacent edges of P and B.
Two areas are adjacent if they have a block-stacking constraint or an inline-stacking constraint. It follows from the definitions that areas of the same type (inline or block) can be adjacent only if all their non-common descendants are also of the same type (up to but not including their nearest common ancestor). Thus, for example, two inline-areas which reside in different line-areas are never adjacent.
An area A begins an area P if A is a descendant of P and P and A have either a block-stacking constraint or an inline-stacking constraint. In this case the second of the adjacent edges of P and A is defined to be a leading edge in P.
Similarly, an area A ends an area P if A is a descendant of P and A and P have either a block-stacking constraint or an inline-stacking constraint. In this case the first of the adjacent edges of A and P is defined to be a trailing edge in P.
Each script has its preferred "baseline" for aligning glyphs from that script. Western scripts typically use an "alphabetic" baseline that touches at or near the bottom of capital letters. Further, for each font there is a preferred way of aligning embedded characters from different scripts, e.g. for a Western font there is a separate baseline for aligning embedded ideographic or Indic characters.
Each block-area and inline-area has a "dominant-baseline" trait which is a baseline-type, an enumerated type corresponding to the type of alignment expected for the nominal font for that area. Similarly, each inline-area has a "baseline-identifier" trait, corresponding to the type of alignment preferred for that area. The geometric line identified by this trait is called the preferred baseline of the inline-area.
Associated to each font there is a table of baseline shifts, called the baseline table, which associates to each pair of possible baseline types the distance between the corresponding baselines in that font.
Example. For a Western font, the baseline table would associate to the pair <alphabetic, hanging> a distance approximately equal to the ascender height of the font, representing the offset from the alphabetic baseline of the designated alignment point for embedded hanging-aligned characters.
Some font standards, e.g. OpenType, define a baseline table as part of the font data.
Certain baselines which are not part of the registered set of baselines are defined as follows. The offset of the text-before-edge baseline is determined by the height of the font relative to the dominant baseline. The determination of the text-after-edge baseline offset is analogous; the descent of the nominal font is used for "text-after-edge".
For each line-area, the offset of the "before-edge" baseline is determined by ignoring all inline-areas whose baseline-identifier is "before-edge". The "before-edge" baseline offset is set to the maximum extent, in the direction opposite the block-progression-direction, of the before-edges of the remaining inline-areas. If all the inline-areas in a line-area are aligned "before-edge", then "text-before-edge" is used as the "before-edge" alignment offset. The determination of the "after-edge" baseline is analogous.
For each area define two quantities, before-baseline-height and after-baseline-height, as the respective distances from the area's dominant baseline to the before-edge and after-edge baselines.
A space-specifier is a compound datatype whose components are minimum, optimum, maximum, conditionality, and precedence.
Minimum, optimum, and maximum are lengths and can be used to define a constraint on a distance, namely that the distance should preferably be the optimum, and in any case no less than the minimum nor more than the maximum. Any of these values may be negative, which can (for example) cause areas to overlap, but in any case the minimum should be less than or equal to the optimum value, and the optimum less than or equal to the maximum value.
Conditionality is an enumerated value which controls whether a space-specifier has effect at the beginning or end of a reference-area or a line-area. Possible values are retain and discard; a conditional space-specifier is one for which this value is discard.
Precedence has a value which is either an integer or the special token force. A forcing space-specifier is one for which this value is force.
Space-specifiers occurring in sequence may interact with each other. The constraint imposed by a sequence of space-specifiers is computed by calculating for each space-specifier its associated resolved space-specifier in accordance with their conditionality and precedence, as shown below in the space-resolution rules.
The constraint imposed on a distance by a sequence of resolved space-specifiers is additive; that is, the distance is constrained to be no less than the sum of the resolved minimum values and no larger than the sum of the resolved maximum values.
To compute the resolved space-specifier of a given space-specifier S, consider the maximal inline-stacking constraint or block-stacking constraint containing S. The resolved space-specifier of S is a non-conditional space-specifier computed in terms of this sequence.
If any of the space-specifiers (in the maximal sequence) is conditional, and begins a reference-area or line-area, then it is suppressed, which means that its resolved space-specifier is zero. Further, any conditional space-specifiers which consecutively follow it in the sequence are also suppressed.
If a conditional space-specifier ends a reference-area or line-area, then it is suppressed together with any other conditional space-specifiers which consecutively precede it in the sequence.
If any of the remaining space-specifiers is forcing, all non-forcing space-specifiers are suppressed, and the value of each of the forcing space-specifiers is taken as its resolved value.
Alternatively if all of the remaining space-specifiers are non-forcing, then the resolved space-specifier is defined in terms of those space-specifiers whose precedence is highest, and among these those whose optimum value is the greatest. All other space-specifiers are suppressed. If there is only one of these then its value is taken as its resolved value.
Otherwise the resolved space-specifier of the last space-specifier in the sequence is derived from these spaces by taking their common optimum value as its optimum, the greatest of their minimum values as its minimum, and the least of their maximum values as its maximum, and all other space-specifiers are suppressed.
Example. Suppose the sequence of space values occurring at the beginning of a reference-area is: first, a space with value 10 points (that is, minimum, optimum, and maximum all equal to 10 points) and conditionality discard; second, a space with value 4 points and conditionality retain; and third, a space with value 5 points and conditionality discard; all three spaces having precedence zero. Then the first (10 point) space is suppressed under rule 1, and the second (4 point) space is suppressed under rule 3. The resolved value of the third space is a non-conditional 5 points, even though it originally came from a conditional space.
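The space-resolution rules can be sketched in Python as follows. This is an illustration under stated assumptions, not a normative algorithm: the `Space` class and the `resolve` and `distance_bounds` functions are hypothetical names, the sequence passed in is taken to be the whole maximal sequence, and the flags `begins_area` and `ends_area` assume that the sequence starts or ends exactly at the edge of a reference-area or line-area.

```python
from dataclasses import dataclass, replace
from typing import Union


@dataclass(frozen=True)
class Space:
    minimum: float
    optimum: float
    maximum: float
    conditionality: str = "discard"   # "retain" or "discard"
    precedence: Union[int, str] = 0   # an integer or the token "force"


ZERO = Space(0, 0, 0, "retain", 0)    # resolved value of a suppressed space


def resolve(seq, begins_area=False, ends_area=False):
    """Return the resolved space-specifier for each member of seq."""
    alive = [True] * len(seq)
    if begins_area:  # suppress the leading run of conditional spaces
        for i in range(len(seq)):
            if seq[i].conditionality != "discard":
                break
            alive[i] = False
    if ends_area:    # suppress the trailing run of conditional spaces
        for i in reversed(range(len(seq))):
            if seq[i].conditionality != "discard":
                break
            alive[i] = False
    resolved = [ZERO] * len(seq)
    live = [i for i in range(len(seq)) if alive[i]]
    forcing = [i for i in live if seq[i].precedence == "force"]
    if forcing:      # forcing spaces keep their values, all others are suppressed
        for i in forcing:
            resolved[i] = replace(seq[i], conditionality="retain")
    elif live:       # highest precedence first, then greatest optimum
        top = max(seq[i].precedence for i in live)
        best = max(seq[i].optimum for i in live if seq[i].precedence == top)
        winners = [i for i in live
                   if seq[i].precedence == top and seq[i].optimum == best]
        resolved[winners[-1]] = Space(
            max(seq[i].minimum for i in winners), best,
            min(seq[i].maximum for i in winners), "retain", top)
    return resolved


def distance_bounds(resolved):
    """The additive constraint: (sum of minimums, sum of maximums)."""
    return (sum(s.minimum for s in resolved), sum(s.maximum for s in resolved))


example = [Space(10, 10, 10, "discard"), Space(4, 4, 4, "retain"),
           Space(5, 5, 5, "discard")]
result = resolve(example, begins_area=True)
```

For the sequence in the example above, `result` has optimum values 0, 0, and 5, and the surviving 5-point space is non-conditional, in agreement with the text.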
The padding of a block-area does not interact with any space-specifier (except that, by definition, the presence of padding at the before-edge or after-edge prevents areas on either side of it from having a stacking constraint).
The border or padding at the before-edge or after-edge of a block-area may be specified as conditional. If so, then it is set to zero if its associated edge is a leading or trailing edge in a reference-area. In this case, the border or padding is taken to be zero for purposes of the stacking constraint definitions.
Block-areas have several traits which typically affect the placement of their children. The line-height is used in line placement calculations. So is its dominant-glyph-height, which is the size (in the block-progression-direction) of a glyph-area in the nominal-font of the block-area. This depends only on the font and not on which glyphs (or fonts) actually occur among descendants of the block-area.
The line-stacking-strategy trait controls what kind of allocation is used for descendant line-areas and has an enumerated value (either font-height, max-height, or line-height). This is all rigorously described below. All areas have these traits, but they only have relevance for areas which have stacked line-area children.
The space-before and space-after determine the distance between the block-area and surrounding block-areas.
A block-area which is not a line-area typically has its size in the inline-progression-direction determined by its start-indent and end-indent and by the size of its nearest ancestor reference-area. A block-area which is not a line-area typically varies its block-progression-dimension to accommodate its descendants. Alternatively the generating formatting object may specify a block-progression-dimension for the block-area.
Block-area children of an area are typically stacked in the block-progression-direction within their parent area, and this is the default method of positioning block-areas. However, formatting objects are free to specify other methods of positioning child areas of areas which they generate, for example list-items or tables.
For a parent area P whose children are block-areas, P is defined to be properly stacked if all of the following conditions hold:
For each block-area B which is a descendant of P, the following hold:
the before-edge and after-edge of its allocation-rectangle are parallel to the before-edge and after-edge of the content-rectangle of P,
the start-edge of its allocation-rectangle is parallel to the start-edge of the content-rectangle of R (where R is the closest ancestor reference-area of B), and offset from it by a distance equal to the block-area's start-indent plus its start-intrusion-adjustment, minus its border-start, padding-start, and space-start values, and
the end-edge of its allocation-rectangle is parallel to the end-edge of the content-rectangle of R, and offset from it by a distance equal to the block-area's end-indent plus its end-intrusion-adjustment, minus its border-end, padding-end, and space-end values.
For each pair of normally-sequenced areas B and B' in the subtree below P, if B and B' have a block-stacking constraint S, then the distance between the adjacent edges of B and B' is consistent with the constraint imposed by the resolved values of the space-specifiers in S.
NOTE: The start-intrusion-adjustment and end-intrusion-adjustment are traits used to deal with intrusions from floats in the inline-progression-direction. The notion of indent is intended to apply to the content-rectangle, but the constraint is written in terms of the allocation-rectangle, because as noted earlier the edges of the content-rectangle may not correspond to like-named edges of the allocation-rectangle.
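The start-edge offset stated in the conditions above is simple arithmetic, shown here as a Python sketch. The function name and the sample values are hypothetical; the end-edge offset follows the same pattern with the corresponding end traits.

```python
def allocation_start_offset(start_indent, start_intrusion_adjustment,
                            border_start, padding_start, space_start):
    """Offset of a block-area's allocation-rectangle start-edge from the
    start-edge of the content-rectangle of its nearest ancestor
    reference-area, following the formula in the text."""
    return (start_indent + start_intrusion_adjustment
            - border_start - padding_start - space_start)


# Hypothetical block-area: 24-unit start-indent, no float intrusion,
# a 1-unit border and 5 units of padding on the start side.
offset = allocation_start_offset(24, 0, 1, 5, 0)
```

With these sample values the allocation-rectangle starts 18 units in from the reference-area, so that after the border and padding are added back the content begins at the 24-unit indent.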
Example. In the diagram, if area A has a space-after value of 3 points, B a space-before of 1 point, and C a space-before of 2 points, all with precedence of force, and with zero border and padding, then the constraints will place B's allocation-rectangle 4 points below that of A, and C's allocation-rectangle 6 points below that of A. Thus the 4-point gap receives the background color from P, and the 2-point gap before C receives the background color from B.
A line-area is a special type of block-area, and is generated by the same formatting object which generated its parent. Line-areas do not have borders and padding, i.e., border-before-width, padding-before-width, etc. are all zero. Inline-areas are stacked within a line-area relative to a baseline-start-point which is a point determined by the formatter, on the start-edge of its content-rectangle.
The allocation-rectangle of a line is determined by the value of the line-stacking-strategy trait: if the value is font-height, the allocation-rectangle is the nominal-requested-line-rectangle, defined below; if the value is max-height, the allocation-rectangle is the maximum-line-rectangle, defined below; and if the value is line-height, the allocation-rectangle is the per-inline-height-rectangle, defined below.
The nominal-requested-line-rectangle for a line-area is the rectangle whose start-edge and end-edge are parallel to and coincident with the start-edge and end-edge of the content-rectangle of the parent block-area (as modified by typographic properties such as indents), whose before-edge is separated from the baseline-start-point by the before-baseline-height, and whose after-edge is separated from the baseline-start-point by the after-baseline-height. It has the same block-progression-dimension for each line-area child of a block-area.
The maximum-line-rectangle for a line-area has the same length as the nominal-requested-line-rectangle in the inline-progression-direction. Its block-progression-dimension is the minimum required to enclose both the nominal-requested-line-rectangle and the allocation-rectangles of all the inline-areas stacked within the line-area; this may vary depending on the descendants of the line-area.
Figure: Nominal and Maximum Line Rectangles
The per-inline-height-rectangle has the same length as the nominal-requested-line-rectangle in the inline-progression-direction. For each inline-area the half-leading is defined to be half the difference of its line-height minus its actual-block-progression-dimension. The expanded-rectangle of an inline-area is the rectangle with start-edge and end-edge coincident with those of its allocation-rectangle, and whose before-edge and after-edge are outside those of its allocation-rectangle by a distance equal to the half-leading. The per-inline-height-rectangle is then defined to have the minimum block-progression-dimension required to enclose both the nominal-requested-line-rectangle and the expanded-rectangles of all the inline-areas stacked within the line-area; this may vary depending on the descendants of the line-area.
NOTE: Using the nominal-requested-line-rectangle allows equal baseline-to-baseline spacing. Using the maximum-line-rectangle allows constant space between line-areas. Using the per-inline-height-rectangle and zero space-before and space-after allows CSS-style line box stacking.
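The three line-stacking strategies can be compared with a short Python sketch that computes only the block-progression extent of the line's allocation-rectangle. The `Inline` class and `line_allocation` function are hypothetical, and the sketch assumes that each inline-area's extents are already measured from the line's baseline, with any baseline shifts applied.

```python
from dataclasses import dataclass


@dataclass
class Inline:
    before: float       # allocation-rectangle extent before the line baseline
    after: float        # allocation-rectangle extent after the line baseline
    line_height: float  # the inline-area's own line-height trait

    @property
    def half_leading(self):
        # Half of (line-height minus actual-block-progression-dimension).
        return (self.line_height - (self.before + self.after)) / 2


def line_allocation(strategy, before_baseline_height, after_baseline_height,
                    inlines):
    """Return (before, after) extents of the line's allocation-rectangle."""
    nominal = (before_baseline_height, after_baseline_height)
    if strategy == "font-height":    # nominal-requested-line-rectangle
        return nominal
    if strategy == "max-height":     # maximum-line-rectangle
        extra = lambda inline: 0.0
    elif strategy == "line-height":  # per-inline-height-rectangle
        extra = lambda inline: inline.half_leading
    else:
        raise ValueError(f"unknown line-stacking-strategy: {strategy}")
    before = max([nominal[0]] + [i.before + extra(i) for i in inlines])
    after = max([nominal[1]] + [i.after + extra(i) for i in inlines])
    return (before, after)


# Hypothetical line: nominal heights 8 and 2, one tall inline-area.
tall = [Inline(before=12, after=3, line_height=20)]
```

For this sample line, font-height keeps the nominal extents (8, 2), max-height grows them to (12, 3) to enclose the tall inline-area, and line-height adds the half-leading of 2.5 on each side to give (14.5, 5.5).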
An inline-area has its own line-height trait, which may be different from the line-height of its containing block-area. This may affect the placement of its ancestor line-area when the line-stacking-strategy is line-height. An inline-area has a baseline-table for its nominal-font. It has a dominant-baseline trait which determines how its stacked inline-area descendants are to be aligned.
An inline-area may or may not have child areas, and if so it may or may not be a reference-area. The dimensions of the content-rectangle for an inline-area without children are computed as specified by the generating formatting object, as are those of an inline-area with block-area children.
An inline-area with inline-area children has a content-rectangle which is the minimum rectangle (with sides parallel to those of the content-rectangle of its parent area) which includes the allocation-rectangles of all of its children, and which extends from its position-point by at least the after-baseline-height in the block-progression-direction, and in the opposite direction by at least the before-baseline-height (these latter quantities are derived from the nominal-font of the area, as defined in section 4.2.6).
Examples of inline-areas with children might include portions of inline mathematical expressions or areas arising from mixed writing systems (left-to-right within right-to-left, for example).
Inline-area children of an area are typically stacked in the inline-progression-direction within their parent area, and this is the default method of positioning inline-areas.
Inline-areas are stacked relative to a baseline, defined as follows:
1. If P is a line-area, the baseline of P is defined to be the line through the baseline-start-point which is parallel to the inline-progression-direction;
2. If P is an inline-area, the baseline of P is defined to be the line through the position-point of P which is parallel to the inline-progression-direction.
For a parent area P whose children are inline-areas, P is defined to be properly stacked if all of the following conditions hold:
For each inline-area descendant I of P, the start-edge, end-edge, before-edge and after-edge of the allocation-rectangle of I are parallel to corresponding edges of the content-rectangle of the nearest ancestor reference-area of I.
For each pair of normally-sequenced areas I and I' in the subtree below P, if I and I' have an inline-stacking constraint S, then the distance between the adjacent edges of I and I' is consistent with the constraint imposed by the resolved values of the space-specifiers in S.
For any inline-area descendant I of P, the distance in the shift-direction from the baseline of P to the position-point of I equals the distance between the dominant-baseline of P and the preferred baseline of I (as determined by the dominant-baseline-table of P), plus the sum of the baseline-shifts for I and all of its ancestors which are descendants of P. This alignment is done with respect to the line-area's dominant baseline, and not with respect to the baseline of any intermediate area.
The first summand is computed to compensate for mixed writing systems with different nominal glyph baselines, and the other summands involve deliberate baseline shifts for things like superscripts and subscripts.
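The two kinds of summand can be illustrated with a small sketch. It is not part of this specification: it assumes that a dominant-baseline-table can be modelled as a mapping from baseline names to offsets measured in the shift-direction, and the function name, parameter names, and sign convention are hypothetical.

```python
from typing import Mapping, Sequence


def position_point_offset(
    dominant_baseline_table: Mapping[str, float],
    dominant_baseline: str,
    preferred_baseline: str,
    baseline_shifts: Sequence[float],
) -> float:
    """Distance in the shift-direction from the baseline of P to the
    position-point of an inline-area descendant I.

    baseline_shifts holds the baseline-shift of I and of each ancestor of I
    that is itself a descendant of P (assumed already resolved to lengths).
    """
    # First summand: offset between the dominant-baseline of P and the
    # preferred baseline of I, both looked up in P's dominant-baseline-table.
    script_offset = (dominant_baseline_table[preferred_baseline]
                     - dominant_baseline_table[dominant_baseline])
    # Remaining summands: deliberate shifts such as superscripts or subscripts.
    return script_offset + sum(baseline_shifts)
```

For example, with an assumed table `{"alphabetic": 0.0, "hanging": 8.0}`, a dominant baseline of `"alphabetic"`, a preferred baseline of `"hanging"`, and shifts `[3.0, 1.5]`, the sketch yields 8.0 + 4.5 = 12.5.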
The most common inline-area is a glyph-area, which contains the representation for a character in a particular font.
A glyph-area has an associated font, determined by its typographic traits, which apply to its character data.
The position-point and preferred baseline of a glyph-area are assigned according to the writing-system in use (e.g., the glyph baseline in Western languages), and are used to control placement of inline-area descendants of a line-area. The formatter may generate inline-areas with different inline-progression-directions from their parent to accommodate correct inline-area stacking in the case of mixed writing systems.
A glyph-area has no children. Its actual-block-progression-dimension and baseline-table are the same for all glyphs in a font.
This section describes tree-structure constraints on the result of formatting a fo:block or similar block-level object.
A block-level formatting-object F which constructs lines does so by constructing block-areas which it returns to its parent formatting-object, and placing areas returned to F by its child formatting-objects as children of those block-areas or of line-areas which it constructs as children of those block-areas.
For each such formatting-object F, it must be possible to form an ordered partition P consisting of ordered subsets S1, S2, ..., Sn of the normally-sequenced areas returned by the child formatting-objects, such that the following are all satisfied:
Each subset consists of a sequence of inline-areas, or of a single block-area.
The ordering of the partition follows the ordering of the formatting-object tree. Specifically, if A is in Si and B is in Sj with i < j, or if A and B are both in the same subset Si with A before B in the subset order, then either A is returned by a preceding sibling formatting-object of B, or A and B are returned by the same formatting-object with A being returned before B.
The partitioning occurs at legal line-breaks. Specifically, if A is the last area of Si and B is the first area of Si+1, then the rules of the language and script in effect must permit a line break between A and B, within the context of all areas in Si and Si+1.
The partition follows the ordering of the area tree, except for certain glyph substitutions and deletions. Specifically, if B1, B2, ..., Bp are the normally-sequenced child areas of the block-area or block-areas returned by F (ordered in the pre-order traversal order of the area tree), then there is a one-to-one correspondence between these child areas and the partition subsets (i.e., n = p), and for each i,
if Si consists of a single block-area then Bi is that block-area, and
if Si consists of inline-areas then Bi is a line-area whose child areas are the same as the inline-areas in Si, and in the same order, except that where the rules of the language and script in effect call for glyph-areas to be substituted, inserted, or deleted, then the substituted or inserted glyph-areas appear in the area tree in the corresponding place, and the deleted glyph-areas do not appear in the area tree. Deletions occur in the case of a glyph-area with a remove-at-line-break value of true. Insertions and substitutions may occur because of addition of hyphens or spelling changes due to hyphenation, or glyph image construction from syllabification, or ligature formation.
Substitutions that r