W3C

Requirements and Use Cases for XSLT 2.1

W3C Working Draft 10 June 2010

This version:
http://www.w3.org/TR/2010/WD-xslt-21-requirements-20100610/
Latest version:
http://www.w3.org/TR/xslt-21-requirements/
Editor:
Petr Cimprich, UNITY Mobile <http://www.unitymobile.com/>

Abstract

This document is a characterization of requirements and use cases for [XSL Transformations (XSLT) Version 2.1]. The Requirements section lists enhancements requested over time that may be addressed in XSLT 2.1.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index.

This is the First Public Working Draft of the Requirements and Use Cases for XSLT 2.1, produced by the W3C XSL Working Group, which is part of the XML Activity. The Working Group expects to eventually publish this document as a Working Group Note.

Please report errors in this document using W3C's public Bugzilla system (instructions can be found at ). If access to that system is not feasible, you may send your comments to the W3C XSLT/XPath/XQuery public comments mailing list, public-qt-comments@w3.org. It will be very helpful if you include the string "[XSLT21Req]" in the subject line of your report, whether made in Bugzilla or in email. Please use multiple Bugzilla entries (or, if necessary, multiple email messages) if you have more than one comment to make. Archives of the comments and responses are available at .

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. This document is informative only. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the XSL Working Group; those pages also include instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
2 Requirements
    2.1 Enabling Streamable Processing
    2.2 Modes and Schema-awareness
    2.3 Composite Keys
    2.4 The xsl:analyze-string Instruction Applied to an Empty Sequence
    2.5 Context Item for a Named Template
    2.6 Traditional Hebrew Numbering
    2.7 Separate Compilation of Stylesheet Modules
    2.8 The start-at Attribute of xsl:number
    2.9 Allowing xsl:variable before xsl:param
    2.10 Combining group-starting-with and group-ending-with
    2.11 Improvements to Schema for Stylesheets
    2.12 Setting Initial Template Parameters
    2.13 Invoking XQuery from XSLT
    2.14 Enhancement to Sorting and Grouping
    2.15 Enhancement to Conditional Modes
    2.16 Default Initial Template
3 Real-World Scenarios
    3.1 Transforming MPEG-21 BSDL
    3.2 Validation of SOAP Digital Signatures
    3.3 Transformation of the RDF Dump of the Open Directory
    3.4 Transformations on a Cell Phone
    3.5 XSL FO Multiple Extraction/Processing
    3.6 EFT/EDI Transformation
4 Tasks
    4.1 Splitting Flat Data
    4.2 Splitting Nested Data
    4.3 Joining
    4.4 Concatenation
    4.5 Adding Children
    4.6 Renaming and Counting Nested Elements
    4.7 Renaming and Counting Nested Elements and Counting Other Elements
    4.8 Filtering According to Attribute
    4.9 Filtering According to Child
    4.10 Histogram
    4.11 Hierarchical to Flat
    4.12 Flat to Hierarchical
    4.13 CSV Result
    4.14 Local Sorting
    4.15 Resolving References
    4.16 Multiple Extraction/Processing
    4.17 Grouping
    4.18 Iterations
    4.19 Making Explicit Sections
    4.20 Merging Sorted Sequences

Appendices

A Sample Data
    A.1 Flat Collection
    A.2 Nested Collection
    A.3 Product Catalog
    A.4 Hierarchical to Flat
    A.5 Rows and Columns
    A.6 Transactions and Balance
    A.7 Explicit Sections
B References


1 Introduction

This document is a characterization of requirements and use cases for [XSL Transformations (XSLT) Version 2.1]. The section 2 Requirements lists enhancements requested over time that may be addressed in XSLT 2.1. The relative priorities to be assigned to these different enhancements are still being decided.

Use cases are presented in two different styles: section 3 Real-World Scenarios contains real-world scenarios illustrating some shortcomings of [XSL Transformations (XSLT) Version 2.0], while section 4 Tasks contains descriptions of specific transformation tasks that make it possible to analyze the implementation in XSLT 2.0 and the proposed implementation in XSLT 2.1.

2 Requirements

2.1 Enabling Streamable Processing

XSLT should provide some facilities to enable transformation of a source document on the fly without constructing a complete tree representation of the document in memory. Difficulties with transformations when the entire document cannot fit into memory or when results must be produced while reading the input are the main motivation for this requirement.

The streaming facilities can impose constraints on stylesheets to ensure that streamable processing is possible. There must be a way to determine if a construct is streamable and whether the processor can guarantee that it will be processed using streaming.

To facilitate the analysis of streamability, new explicit constructs for some typical tasks may be added to the language. The constructs would be useful in themselves, not only in conjunction with streaming. Candidates include:

  • Merging several sorted input sequences.

  • Computing multiple results during a single scan of the input data.

  • Adding an explicit instruction for iterative processing of a sequence.

  • Adding a declaration of mode so that properties like the streamability can be declared on the mode.

2.2 Modes and Schema-awareness

The ability to take advantage of schema-awareness in XSLT 2.0 is limited by the fact that most of the code consists of template rules, and in a typical template rule written with match="elementname" there is no type information available statically about the type of the context node. Rewriting all the template rules to use match="schema-element(elementname)" is laborious, and only works for elements declared globally; it also makes it very difficult to maintain parallel schema-aware and non-schema-aware versions of the stylesheet.

This problem can be reduced by making schema-awareness a property of a mode. A mode could be declared so that its rules match only untyped nodes, or so that an element name E used at the start of a match pattern is treated as schema-element(E), either for all elements or only for those elements whose names correspond to a global element declaration.

2.3 Composite Keys

Composite (multi-part) sort keys are allowed in XSLT 2.0, but composite access keys (xsl:key) or grouping keys are not allowed. Users are required to construct such keys by string concatenation, which is clumsy and error prone because the result may not be unique, and it prevents use of non-string types as keys.

Composite access keys and composite grouping keys can be allowed.
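
The concatenation workaround mentioned above might look like the following XSLT 2.0 sketch. The input vocabulary (order elements with customer and year attributes), the key name, the looked-up values and the "|" separator are assumptions made only for illustration.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input:
       <orders><order customer="..." year="..."/>...</orders> -->

  <!-- The two key parts are joined into a single string. The separator
       must not occur in either part, otherwise distinct pairs can
       produce the same key value. -->
  <xsl:key name="order-by-customer-and-year"
           match="order"
           use="concat(@customer, '|', @year)"/>

  <xsl:template match="/">
    <matches>
      <!-- The lookup value has to be built in exactly the same way. -->
      <xsl:copy-of select="key('order-by-customer-and-year',
                               concat('C42', '|', '2010'))"/>
    </matches>
  </xsl:template>

</xsl:stylesheet>

Both parts are compared as strings in this sketch, so a typed comparison of the year is lost, and the lookup is only reliable if the separator never occurs in a key part.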

2.4 The xsl:analyze-string Instruction Applied to an Empty Sequence

The fn:analyze-string() function which has been introduced in [XPath and XQuery Functions and Operators 1.1] behaves like most string functions in that it accepts an empty sequence as input, and treats it in the same way as a zero-length string. The xsl:analyze-string instruction in XSLT 2.0 does not work this way: it reports an error if the input is an empty sequence.

This can be changed for usability, for consistency, and to make it a little bit easier for implementations to reuse code between xsl:analyze-string and fn:analyze-string().
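
In XSLT 2.0 a stylesheet can avoid the error by converting the input explicitly, as in the following sketch. The parameter name, the output elements and the regular expression are hypothetical; the sketch relies on the string() function returning a zero-length string when its argument is an empty sequence.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter; by default it is an empty sequence. -->
  <xsl:param name="input" select="()"/>

  <xsl:template name="main">
    <numbers>
      <!-- string() maps an empty sequence to a zero-length string,
           so xsl:analyze-string never receives an empty sequence. -->
      <xsl:analyze-string select="string($input)" regex="[0-9]+">
        <xsl:matching-substring>
          <number><xsl:value-of select="."/></number>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </numbers>
  </xsl:template>

</xsl:stylesheet>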

2.5 Context Item for a Named Template

The scope for static checking of named templates against a schema is very limited in XSLT 2.0, because the type of the context item is not known and cannot be declared.

A mechanism is needed to declare the type and other properties of the context item at the level of the initial stylesheet invocation. It would be useful to reuse this construct to allow declaration of the context item supplied to a named template.

2.6 Traditional Hebrew Numbering

There are issues with "Traditional Hebrew" numbering. Sometimes numbers are printed with additional marks to indicate that they are numbers, and sometimes they are not. The XSLT 2.0 specification uses both conventions, once in the example for dates and once in the example for numbering. The types of additional marks also vary. In modern texts, numbers are sometimes marked with a geresh following the number, and sometimes with a gershayim; in archaic texts, overdots are sometimes used to indicate that the value is numeric and not a word. When the number is represented as words, it could be masculine or feminine, in both ordinal and cardinal forms. There is currently no way to specify masculine or feminine for cardinal forms. There are two conventions for how to specify a number in words: the modern convention (the equivalent of representing 1234 as "one thousand two hundred thirty four") and the archaic convention ("four and thirty and two hundred and one thousand").

An additional way to provide the XSLT processor with nonstandard language-specific options would help.

2.7 Separate Compilation of Stylesheet Modules

As XSLT applications become larger, there is a requirement for separate compilation of stylesheet modules. The design of XSLT 2.0 makes this difficult because there are few constraints on what an importing/including stylesheet can do to change the behavior of an imported/included stylesheet. Some of the changes that are needed to make separate compilation viable include:

  • a change to the syntax and/or semantics of xsl:include and xsl:import to recognize the existence of precompiled stylesheet modules,

  • an addition of attributes controlling visibility of the declarations of functions, named templates, global variables and other objects such as attribute sets in a precompiled module,

  • rules constraining the ability to override variables, templates and functions,

  • some kind of connection between importing and modes,

  • making some declarations such as xsl:strip-space and xsl:output less global.

Some constraints will apply in stylesheet modules that are suitable for separate compilation.

2.8 The start-at Attribute of xsl:number

A simple and useful addition to xsl:number would be an attribute start-at="expression" to control the first number in the numbering sequence (defaulting to 1). This will be useful for example where numbering is to run across the documents in a collection.
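
Without such an attribute, a similar effect can be approximated in XSLT 2.0 by computing the number explicitly and passing it to xsl:number through the value attribute. The sketch below covers only single-level numbering of siblings; the list and item elements, the nr attribute and the start-at stylesheet parameter are assumptions for this example.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs">

  <!-- Hypothetical parameter: the number given to the first item. -->
  <xsl:param name="start-at" as="xs:integer" select="1"/>

  <xsl:template match="list">
    <list>
      <xsl:apply-templates select="item"/>
    </list>
  </xsl:template>

  <xsl:template match="item">
    <item>
      <xsl:attribute name="nr">
        <!-- Zero-based position among the item siblings, shifted
             by the requested starting number. -->
        <xsl:number value="count(preceding-sibling::item) + $start-at"/>
      </xsl:attribute>
      <xsl:copy-of select="node()"/>
    </item>
  </xsl:template>

</xsl:stylesheet>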

2.9 Allowing xsl:variable before xsl:param

The XSLT 2.0 specification forbids intermixing of xsl:variable and xsl:param in templates. This seems to be unnecessarily restrictive to some users. Allowing xsl:variable before xsl:param in a template would be useful for some use cases, for example to calculate default parameter values.
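
The following XSLT 2.0 sketch shows the kind of repetition that can result. Because no xsl:variable may precede the parameters, a sub-expression shared by several defaults is written out in each of them. The figure, title and caption elements and the parameter names are hypothetical.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="figure">
    <xsl:call-template name="caption"/>
  </xsl:template>

  <xsl:template name="caption">
    <!-- normalize-space(title) appears in both defaults, because
         xsl:param elements must come first in the template. -->
    <xsl:param name="label"
               select="upper-case(normalize-space(title))"/>
    <xsl:param name="anchor"
               select="translate(normalize-space(title), ' ', '_')"/>
    <caption id="{$anchor}">
      <xsl:value-of select="$label"/>
    </caption>
  </xsl:template>

</xsl:stylesheet>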

2.10 Combining group-starting-with and group-ending-with

The group-starting-with and group-ending-with attributes are not allowed to coexist on the xsl:for-each-group instruction in XSLT 2.0. Removing this restriction would provide a natural solution to some grouping use cases, for example the grouping of the following sequence of elements into a true hierarchy.

<start/>
<item/>
<item/>
<start/>
<item/>
<end/>
<item/>
<end/>

2.11 Improvements to Schema for Stylesheets

The patterns for NCNames and QNames should be made consistent and more precise regarding the naming rules for the first character and later characters. This affects xsl:QName, nametests, and method, and could be an opportunity to define "QName-but-not-NCName" as a type.

The complexType declarations for "text-element-base-type" and "transform-element-base-type" belong in Part A.

2.12 Setting Initial Template Parameters

Parameters passed to the transformation are matched against stylesheet parameters, not against the template parameters declared within the initial template. The initial template parameters take their default values.

This restriction can be relaxed: APIs would be permitted to set the parameters of the initial template. This does not mean that every invocation API must offer this capability; some invocation interfaces do not allow parameters to be set at all.

2.13 Invoking XQuery from XSLT

XSLT should have a way to invoke XQuery, including one or more of these ways:

  • Dynamic evaluation, similar to an instruction to evaluate XSLT code dynamically from XSLT.

  • Importing an XQuery library, so that its functions can be called from an XSLT stylesheet.

  • Embedding XQuery in a stylesheet.

  • Invoking statically known queries, e.g., xquery-invoke("query.xqy", $src).

2.14 Enhancement to Sorting and Grouping

The following extensions could be made to XSLT grouping and sorting capabilities:

  • Allow xsl:variable before xsl:sort, to compute a value that can be used both in the sort key expression and in the subsequent processing of the relevant item.

  • Allow grouping keys to be specified in a separate group element.

  • Use this to allow composite grouping keys.

  • Allow control over how a sequence-valued group key is handled.

  • Allow variables to be declared before the group-by OR group-starting-when in place of group-starting-with; the value is an expression rather than a pattern, and a new group starts when the expression is true.
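
The first item in this list can be illustrated with an XSLT 2.0 sketch in which the same expression has to be written twice, once in xsl:sort and once in a variable used by the body. The orders vocabulary and the total attribute are hypothetical.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input:
       <orders><order price="..." quantity="..."/>...</orders> -->
  <xsl:template match="orders">
    <orders>
      <xsl:for-each select="order">
        <!-- xsl:sort must come first, so the expression is written here -->
        <xsl:sort select="number(@price) * number(@quantity)"/>
        <!-- and repeated here for use in the rest of the body. -->
        <xsl:variable name="total"
                      select="number(@price) * number(@quantity)"/>
        <order total="{$total}"/>
      </xsl:for-each>
    </orders>
  </xsl:template>

</xsl:stylesheet>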

2.15 Enhancement to Conditional Modes

It would be useful to be able to set the mode conditionally, based on the current mode. Beyond reusing the current mode itself, it would help to select a mode that depends on the current mode but is not the same as it. In other words, the requirement is to dispatch to a different mode depending on what the current mode is.

This requirement does not imply that the mode attribute on xsl:apply-templates must be settable dynamically. Other options, such as a current-mode() function, should be considered.

2.16 Default Initial Template

It would be useful for the stylesheet author to be able to define a default initial template within the stylesheet. This would allow a transformation to be run with no input without the user having to supply the name of the initial template. For example:

<xsl:stylesheet ... 
  default-initial-template="main">

  <xsl:template name="main">
  ...

3 Real-World Scenarios

The use cases described in this section illustrate where real users reach the limits of existing XML transformation standards. The use cases are elaborated in the form of short stories.

3.1 Transforming MPEG-21 BSDL

The BSDL (Bitstream Syntax Description Language) is an XML schema developed within the [ISO/IEC 21000-7:2004] standard (part of the MPEG-21 framework) in order to describe the high-level structure of a scalable video bitstream. The strength of BSDL lies in the fact that it allows bitstream adaptation by changing an XML-based description of the bitstream, which makes it possible to create a universal adaptation engine.

As the size of BSDL files is proportional to the number of bitstream frames, the BSDL files can be rather large. Apart from the number of frames, the size of a BSDL file depends on the coding format of the video stream and the level of detail of the BSDL. The more detail a BSDL file contains, the larger it is.

For example, an H.264/AVC encoded video stream lasting 7 minutes has a size of 155 MB and contains approximately 10200 frames. The size of the corresponding BSDL file is 7.7 MB. XSLT transformations of BSDL files for longer streams often reach the limits of the processing environment. Transformations of BSDL descriptions of "infinite" live streams require custom transformation tools.

The following fragment of a BSDL file (Bitstream Syntax schema for temporal scalable H.264/AVC bitstreams) contains a byte_stream_nal_unit element, representing a NAL (Network Abstraction Layer) unit. A BSDL file can contain many thousands of such or similar repeating elements.

<?xml version="1.0"?>
<Byte_stream xmlns="h264_avc"
             bs1:bitstreamURI="example_cif.264"
             xmlns:bs1="urn:mpeg:mpeg21:2003:01-DIA-BSDL1-NS"
             xsi:schemaLocation="h264_avc h264_avc.xsd"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xmlns:jvt="h264_avc">

  <byte_stream_nal_unit>
    <zero_byte>00</zero_byte>
    <startcode>000001</startcode>
    <nal_unit>
      <forbidden_zero_bit>0</forbidden_zero_bit>
      <nal_ref_idc>3</nal_ref_idc>
      <nal_unit_type>5</nal_unit_type>
      <raw_byte_sequence_payload>
        <slice_layer_without_partitioning_rbsp>
          <slice_header>
            <first_mb_in_slice>0</first_mb_in_slice>
            <slice_type>7</slice_type>
            <pic_parameter_set_id>0</pic_parameter_set_id>
            <frame_num xsi:type="b4">0</frame_num>
            <idr_pic_id>0</idr_pic_id>
            <pic_order_cnt_lsb xsi:type="b6">0</pic_order_cnt_lsb>
          </slice_header>
          <stuffbits>0</stuffbits>
          <payload_data>29 24031</payload_data>
        </slice_layer_without_partitioning_rbsp>
      </raw_byte_sequence_payload>
    </nal_unit>
  </byte_stream_nal_unit>
  :

</Byte_stream>

See [BSDL: Application of Content Adaptation] for more details.

3.2 Validation of SOAP Digital Signatures

The [XML Signature] technology has been widely adopted by Web Services to provide message-level security. As the design of XML Signature introduces a number of complex processing steps, the validation of signatures often leads to performance and scalability problems.

The processing steps include:

  1. selection of a nodeset

  2. canonicalization

  3. applying a digest algorithm

While the third step is a specific cryptographic task, the first and second steps can be seen as the transformation of an XML message into an XML fragment. With traditional XML tools like DOM, XPath and XSLT, the first two steps are considered a bottleneck of secure Web Service systems. With larger XML messages the processing time becomes unacceptable for real-time services.

Current services requiring better performance and scalability have to fall back on proprietary solutions, as described in [Streaming Validation for Digital Signatures].

3.3 Transformation of the RDF Dump of the Open Directory

The Open Directory (http://www.dmoz.org) is a large open source web catalog, whose content is organized into topics. These topics are hierarchically organized (topics may contain subtopics). Every topic contains a list of resources, consisting of a title, its URL, and a description. The complete content of the Open Directory is available for download as one very large (> 1 GB) RDF/XML dump.

Processing this RDF/XML file with XML software obviously requires streaming techniques. One possible task is to create a human readable representation by transforming the RDF file into multiple HTML pages. The resulting HTML should be similar to the existing web pages under www.dmoz.org.

The required transformation is rather simple: create a single HTML page for every topic that contains links to its subtopics as well as the title, the description and the URL of its resources. Since all topic elements occur as a flat list, this transformation can be done using transformation strategies similar to those demonstrated in 4.12 Flat to Hierarchical. More detailed information about this RDF transformation using STX is provided in [Transforming XML on the Fly].

Another variant is to start a new group for each Topic containing values from all the following ExternalPage elements. This is the same task as 4.17 Grouping, task b2.
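
A non-streaming XSLT 2.0 sketch of this grouping variant is shown below. It assumes that the Topic and ExternalPage elements are siblings under the document element and that the resource URL is carried in an about attribute; the output file names are hypothetical, and namespaces are sidestepped with the *: wildcard. On the full dump, a conventional processor would typically have to build the whole document in memory, which is the limitation this scenario illustrates.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/*">
    <!-- Each Topic starts a new group that also contains the
         ExternalPage elements following it. -->
    <xsl:for-each-group select="*:Topic | *:ExternalPage"
                        group-starting-with="*:Topic">
      <!-- Hypothetical naming scheme: topic1.html, topic2.html, ... -->
      <xsl:result-document href="topic{position()}.html" method="html">
        <html>
          <body>
            <ul>
              <xsl:for-each select="current-group()[self::*:ExternalPage]">
                <!-- Assumption: the URL is held in an about attribute. -->
                <li><xsl:value-of select="@about"/></li>
              </xsl:for-each>
            </ul>
          </body>
        </html>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>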

3.4 Transformations on a Cell Phone

Mobile devices such as cell phones and PDAs often provide very limited RAM. Applications for such devices must be specially designed to respect these limitations. XML processing that takes place on these devices should not need to store both the XML source and the result in memory concurrently. A strategy that consumes the source XML and produces the result simultaneously is much more appropriate.

A mobile blogging application is an example of an application that needs to process XML in such a constrained environment. Using this application, people may create blog entries on their mobile device and post them to special blog servers (also known as blog service providers, BSPs). As different BSPs use different XML formats, the challenge is to provide an architecture for one mobile application that works with different BSPs. This can be achieved by transforming the entered blog data (which is represented as XML in the mobile blog application) into the XML format required by the receiving BSP directly on the mobile device. For every BSP there is a special plugin that knows the transformation rules.

Source XML:

<?xml version="1.0"?>
<entry>
  <title type='text'>New Post</title>
  <content type='xhtml'>
    <div id='content'>Text embedded with the picture. </div>
    <div id='picture'>
      <object type='image/jpeg' id='pic[0]'
          data='data:image/jpeg;base64,Base64CodeEmbedded'/>
    </div>
  </content>
  <author>
    <name>This is where the authors are posted.</name>
  </author>
</entry>

Target XML (Flickr):

<?xml version="1.0" encoding="ISO-8859-1" ?>
<a:entry xmlns:a="http://purl.org/atom/ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title mode="escaped">New Post</title>
  <summary mode="escaped">Text embedded with the picture. </summary>
  <content type="image/jpeg" mode="base64">
    Base64CodeEmbedded
  </content>
  <issued />
  <standalone xmlns="http://sixapart.com/atom/typepad#">
    1
  </standalone>
</a:entry> 

One of the specific problems was the base64-encoded text representing images. It would be desirable to stream this text node, too. The current XML data model represents this text as one text node, so it is difficult or even impossible to transform this text in smaller parts using XSLT, even if the whole task is to copy the text as it is to the result.

See [Plug-in Based Architecture for Mobile Blogging] for more details.

3.5 XSL FO Multiple Extraction/Processing

The task is the transformation of an extensive XML document consisting of sections, headings, paragraphs, and figures. The result consists of a formatted document containing three consecutive parts:

  • heading titles extracted from the source document (aka table of contents)

  • figure titles extracted from the source document (aka list of figures)

  • the source document transformed in a simple, mostly linear, way

This kind of transformation is very common for producing an XSL FO instance that is then formatted.

The complete stylesheet for this transformation can be downloaded from http://www.w3.org/2010/06/ABmp_doc.xsl.

3.6 EFT/EDI Transformation

The input is a huge (more than 1GB) denormalized XML extraction from a database or other data source. The XSLT implementation needs to perform nested regrouping and sorting along with various calculations, and produce grouped and sorted output as plain text.

This is a rather simplified version of a typical EFT/EDI (Electronic Funds Transfer/Electronic Data Interchange) transformation that an Oracle product handles. In real life such an XSLT transformation is not written by hand; instead, the product uses a processor to compile a table-based EFT/EDI definition with a PL/SQL-like syntax to XSLT, which usually yields a complicated transformation. Nevertheless, even the simplified version includes some of the most challenging parts of XSLT 2.0 in terms of streaming, e.g. regrouping with sorting, sorting within grouped data, and aggregation.

The XML data is sometimes normalized with structure, but most of the time it is just a straightforward rowset/row dataset like the following XML, whose size can easily reach hundreds of megabytes or even gigabytes:

<?xml version="1.0"?>
<rowset>
  <row>
    <c1>aa</c1>
    <c2>ab</c2>   
    <c3>ac</c3>
    :
  </row>
  <row>
    <c1>ba</c1>
    <c2>bb</c2>   
    <c3>bc</c3>
    :
  </row>
  :
</rowset>

The XSLT is like this:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each-group select="rowset/row" group-by="c1">
      <xsl:sort select="current-grouping-key()"/>
      <xsl:call-template name="process_rows"/>
    </xsl:for-each-group>
    <xsl:text>GRAND TOTAL:</xsl:text>
    <xsl:value-of select="sum(rowset/row/c3)"/>
  </xsl:template>

  <xsl:template name="process_rows">
    <xsl:for-each select="current-group()">
      <xsl:sort select="c2"/>
      <xsl:text>FROM:</xsl:text>
      <xsl:value-of select="c1"/>
      <xsl:text>,TO:</xsl:text>
      <xsl:value-of select="c2"/>
      <xsl:text>,AMOUNT:</xsl:text>
      <xsl:value-of select="c3"/>
    </xsl:for-each>
    <xsl:text>TOTAL:</xsl:text>
    <xsl:value-of select="sum(current-group()/c3)"/>
  </xsl:template>
</xsl:stylesheet>

4 Tasks

Tasks are examples of relatively simple transformations whose definitions in XSLT 2.0 are not easy, straightforward or even possible. Some of these tasks are difficult solely because of the fact that one or more input or output XML documents is so large that the entire document cannot be held in memory. Other difficulties are related to merging and forking documents, restricted capabilities to iterate and the lack of common constructs (dynamic evaluation of expressions, try/catch).

The transformation task illustrating troubles with huge XML documents (4.1 Splitting Flat Data) can be defined in XSLT 2.0. In some cases the processor can even recognize that there is no need to keep the entire document in memory and can run the transformation in a memory-efficient way. But there is no guarantee of this behavior. New facilities suggested for XSLT 2.1 aim to guarantee that a transformation is processed in a streaming manner.

4.1 Splitting Flat Data

Task: Split the document A.1 Flat Collection so that each chapter child is copied to a separate XML document, with a URI of the form outer/chapterN.xml where N is a sequence number. The input document A.1 Flat Collection is too large to fit into memory but each chapter subtree (and thus each output document) fits into memory.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/wrapper">
    <xsl:for-each select="chapter">
      <xsl:result-document href="chapter{position()}.xml">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. The only difference is that the unnamed mode is explicitly marked as capable of being processed in a streaming manner.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>
  
  <xsl:template match="/wrapper">  
    <xsl:for-each select="chapter">
      <xsl:result-document href="chapter{position()}.xml">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

4.2 Splitting Nested Data

The same task as 4.1 Splitting Flat Data but with different input data. The main difference is that chapter elements are not necessarily children of the wrapper element.

Task: Split the document A.2 Nested Collection so that each chapter which is not a descendant of another chapter element is copied to a separate XML document, with a URI of the form chapterN.xml where N is a sequence number.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/wrapper">
    <xsl:for-each select="//chapter[not(ancestor::chapter)]">
      <xsl:result-document href="chapter{position()}.xml">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is explicitly marked as streamable, and the chapter elements are selected with an outermost() function instead of the not(ancestor::chapter) predicate.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>

  <xsl:template match="/wrapper">    
    <xsl:for-each select="outermost(//chapter)">
      <xsl:result-document href="chapter{position()}.xml">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:for-each>  
  </xsl:template>

</xsl:stylesheet>

4.3 Joining

Task: Do the inverse of the 4.1 Splitting Flat Data use case. That is, join documents produced by the 4.1 Splitting Flat Data use case and create a single A.1 Flat Collection document on the output.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="last-doc"/>

  <xsl:template name="main">
    <wrapper>
      <xsl:for-each select="1 to $last-doc">
        <xsl:copy-of select="document(concat('chapter', ., '.xml'))"/>
      </xsl:for-each>
    </wrapper>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. This version uses a new construct xsl:stream that reads a source document and processes the content of the document in a streaming manner.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="last-doc"/>

  <xsl:template name="main">
    <wrapper>
      <xsl:for-each select="1 to $last-doc">
        <xsl:stream href="{concat('chapter', ., '.xml')}">
          <xsl:copy-of select="."/>
        </xsl:stream>
      </xsl:for-each>
    </wrapper>
  </xsl:template>

</xsl:stylesheet>

4.4 Concatenation

Task: Given two 1GB documents with the structure of A.1 Flat Collection, create a single 2GB file with the same structure that contains first all the chapter children from the first file, then all the chapter children from the second file. A relevant difference between this use case and 4.3 Joining is that the two input documents are too large to fit into memory in this use case, while 4.3 Joining concatenates a number of smaller input documents, each of which can be held in memory.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="doc1"/>
  <xsl:param name="doc2"/>

  <xsl:template name="main">
    <wrapper>
      <xsl:copy-of select="document($doc1)/wrapper/chapter"/>
      <xsl:copy-of select="document($doc2)/wrapper/chapter"/>
    </wrapper>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is explicitly marked as streamable and the documents are read using xsl:stream.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>
  
  <xsl:param name="doc1"/>
  <xsl:param name="doc2"/>

  <xsl:template name="main">
    <wrapper>
      <xsl:stream href="{$doc1}">
        <xsl:copy-of select="wrapper/chapter"/>
      </xsl:stream>
      <xsl:stream href="{$doc2}">  
        <xsl:copy-of select="wrapper/chapter"/>
      </xsl:stream>  
    </wrapper>
  </xsl:template>

</xsl:stylesheet>

4.5 Adding Children

Task: Given an input document with the structure of A.1 Flat Collection, produce a new 1GB document where a predefined nested content (child elements) is added to each chapter element. The existing contents of the chapter elements are retained. The new contents are added at the beginning.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="content_to_add"/>

  <xsl:template match="chapter">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:copy-of select="document($content_to_add)"/>
      <xsl:copy-of select="node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is marked as streamable. The on-no-match attribute specifies which built-in rules to use to process a node that does not match any user-written template. The value "copy" means that the source tree is copied unchanged to the output. This is why the "identity template" can be left out of the stylesheet.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes" on-no-match="copy"/>

  <xsl:param name="content_to_add"/>  
  
  <xsl:template match="chapter">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:copy-of select="document($content_to_add)"/>
      <xsl:copy-of select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

4.6 Renaming and Counting Nested Elements

Task: Rename all chapter elements in A.2 Nested Collection to section. Additionally, print the number of renamed elements at the end of the document.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/wrapper">
    <xsl:copy>
      <xsl:apply-templates />
      <renamed count="{count(//chapter)}" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="chapter">
    <section>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
    </section>
  </xsl:template>  

  <xsl:template match="node()">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is marked as streamable. The default built-in rule is "copy". A new instruction, xsl:fork, is used to enable streamed processing in the case where several constructs (xsl:apply-templates, count()) need to be evaluated during a single pass over the input data. The result is exactly the same as if the xsl:fork element were not there; it only provides a hint to the processor that the contained instructions should be evaluated during a single pass. The instructions must be independent of one another.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes" on-no-match="copy"/>

  <xsl:template match="/wrapper">
    <xsl:copy>
      <xsl:fork>  
        <xsl:apply-templates />
        <renamed count="{count(//chapter)}" />
      </xsl:fork>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="chapter">
    <section>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
    </section>  
  </xsl:template>
  
</xsl:stylesheet>

4.7 Renaming and Counting Nested Elements and Counting Other Elements

Task: The same task as 4.6 Renaming and Counting Nested Elements, but in addition we also want to count the removed elements in A.2 Nested Collection. The number of renamed chapter elements and the number of removed elements are printed at the end of the document.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/wrapper">
    <xsl:copy>
      <xsl:apply-templates />
      <renamed count="{count(//chapter)}" />
      <removed count="{count(//removed)}" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="chapter">
    <section>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
    </section>
  </xsl:template>  
  
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is marked as streamable. The default built-in rule is "copy". The xsl:fork instruction is used to enable streamed processing of three independent constructs: xsl:apply-templates, count(//chapter), count(//removed).

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes" on-no-match="copy"/>

  <xsl:template match="/wrapper">
    <xsl:copy>
      <xsl:fork>  
        <xsl:apply-templates />
        <renamed count="{count(//chapter)}" />
        <removed count="{count(//removed)}" />
      </xsl:fork>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="chapter">
    <section>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
    </section>  
  </xsl:template>

</xsl:stylesheet>

4.8 Filtering According to Attribute

Task: Given an input document with the structure of A.1 Flat Collection, remove all chapter elements which have the removed attribute.

XSLT 2.0 implementation.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
               version="2.0">

  <xsl:template match="chapter[@removed]" />
  
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>    

</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is marked as streamable. The default built-in rule "copy" is used for all nodes except chapter elements with a removed attribute.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes" on-no-match="copy"/>
   
  <xsl:template match="chapter[@removed]" />

</xsl:stylesheet>

4.9 Filtering According to Child

Task: Given an input document with the structure of A.1 Flat Collection, remove all chapter elements which have at least one removed child.

XSLT 2.0 implementation.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
               version="2.0">

  <xsl:template match="chapter[removed]"/>
  
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>    

</xsl:stylesheet>

XSLT 2.1 implementation. This is a windowing example. Each chapter is processed in non-streaming mode, but independently of the other chapters. The transformation is initiated in the unnamed streamable mode. A copy of the subtree rooted at the chapter element is created for each chapter and processed in a non-streamable "chapter" mode.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes" />  
  <xsl:mode name="chapter" streamable="no" />

  <xsl:template match="/wrapper">
    <xsl:copy>
      <xsl:apply-templates select="copy-of(chapter)" mode="chapter" />
    </xsl:copy>  
  </xsl:template>
  
  <xsl:template match="chapter" mode="chapter">
    <xsl:if test="not(removed)">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:copy-of select="node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>  
  
</xsl:stylesheet>

4.10 Histogram

Task: Given a 1GB document with the structure of A.1 Flat Collection, produce a histogram showing the frequency distribution of chapter elements by the number of paragraphs (descendant p elements) in each chapter.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <xsl:template match="/wrapper">
    <!-- count the number of <p> elements in each <chapter> -->
    <xsl:variable name="counted_p">
      <count>
        <xsl:for-each select="chapter">
          <ps><xsl:value-of select="count(p)"/></ps>
        </xsl:for-each>
      </count>
    </xsl:variable>
    <!-- find min and max -->
    <xsl:variable name="min_ps" select="min($counted_p/count/ps) cast as xs:integer" />
    <xsl:variable name="max_ps" select="max($counted_p/count/ps) cast as xs:integer" />

    <!-- do the histogram -->
    <xsl:text>Number of "chapter" elements with N "p" elements; N from </xsl:text>
    <xsl:value-of select="$min_ps"/><xsl:text> to </xsl:text>
    <xsl:value-of select="$max_ps"/>
    <xsl:text>&#010;</xsl:text>
    <xsl:for-each select="$min_ps to $max_ps">
      <xsl:variable name="nr_ps" select="."/>
      <xsl:variable name="nr_chapters" select="count($counted_p/count/ps[ . = $nr_ps])"/>
      <xsl:call-template name="do_histo_bar">
        <xsl:with-param name="nr" select="$nr_chapters"/>
      </xsl:call-template>
      <xsl:text>&#010;</xsl:text>
    </xsl:for-each>
  </xsl:template>

  <xsl:template name="do_histo_bar">
    <xsl:param name="nr" select="0"/>

    <xsl:for-each select="1 to $nr">
      <xsl:text>X</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
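
The same distribution can also be computed with the XSLT 2.0 grouping instruction instead of an intermediate tree. The following is only a sketch and not one of the implementations above: it labels each bar with its N, and, unlike the stylesheet above, it emits no line for a value of N that occurs in no chapter.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/wrapper">
    <!-- one group per distinct number of p children -->
    <xsl:for-each-group select="chapter" group-by="count(p)">
      <!-- the grouping key is an xs:integer, so the sort is numeric -->
      <xsl:sort select="current-grouping-key()"/>
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>: </xsl:text>
      <!-- one X per chapter in the group -->
      <xsl:for-each select="current-group()">
        <xsl:text>X</xsl:text>
      </xsl:for-each>
      <xsl:text>&#010;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Like the implementation above, this sketch counts only the p children of each chapter, which is sufficient for A.1 Flat Collection.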

XSLT 2.1 implementation. The unnamed mode is marked as streamable, which is the only change needed to make this stylesheet streamable. The data is stored in a variable during a single pass through the input document. The subsequent processing uses only the stored data.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>
  <xsl:mode streamable="yes"/>

  <xsl:template match="/wrapper">
    <!-- count the number of <p> elements in each <chapter> -->
    <xsl:variable name="counted_p">
      <count>
        <xsl:for-each select="chapter">
          <ps><xsl:value-of select="count(p)"/></ps>
        </xsl:for-each>
      </count>
    </xsl:variable>
    <!-- find min and max -->
    <xsl:variable name="min_ps" select="min($counted_p/count/ps) cast as xs:integer" />
    <xsl:variable name="max_ps" select="max($counted_p/count/ps) cast as xs:integer" />

    <!-- do the histogram -->
    <xsl:text>Number of "chapter" elements with N "p" elements; N from </xsl:text>
    <xsl:value-of select="$min_ps"/><xsl:text> to </xsl:text>
    <xsl:value-of select="$max_ps"/>
    <xsl:text>&#010;</xsl:text>
    <xsl:for-each select="$min_ps to $max_ps">
      <xsl:variable name="nr_ps" select="."/>
      <xsl:variable name="nr_chapters" select="count($counted_p/count/ps[ . = $nr_ps])"/>
      <xsl:call-template name="do_histo_bar">
        <xsl:with-param name="nr" select="$nr_chapters"/>
      </xsl:call-template>
      <xsl:text>&#010;</xsl:text>
    </xsl:for-each>
  </xsl:template>

  <xsl:template name="do_histo_bar">
    <xsl:param name="nr" select="0"/>

    <xsl:for-each select="1 to $nr">
      <xsl:text>X</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

4.11 Hierarchical to Flat

Task: Starting with a tree structure, convert it to a flat list of nodes that keeps the relations between the nodes (with the addition of two attributes, @parent and @preceding-sibling). See A.4 Hierarchical to Flat.

XSLT 2.0 implementation. This version reads the parent and preceding-sibling IDs from the tree. The parent and preceding-sibling axes are used, which makes streaming difficult.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/tree">
    <nodes>
      <xsl:apply-templates select="node"/>
    </nodes>
  </xsl:template>

  <xsl:template match="node">
  <xsl:text>&#010;</xsl:text>
    <node>
      <xsl:attribute name="id" select="@id"/>
      <xsl:attribute name="parent" select="if (parent::tree) then 'ROOT' else parent::node/@id" />
      <xsl:attribute name="preceding-sibling" select="preceding-sibling::node[1]/@id" />
      <xsl:copy-of select="content"/>
    </node>
    <xsl:apply-templates select="node"/>
  </xsl:template>

</xsl:stylesheet>

Another XSLT 2.0 implementation. The parent and preceding-sibling IDs are passed along as parameters, which avoids both the parent and preceding-sibling axes and is more convenient for streaming.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/tree">
    <nodes>
      <xsl:apply-templates select="node[1]"/>
    </nodes>
  </xsl:template>

  <xsl:template match="node">
    <xsl:param name="pid" select="'ROOT'"/>
    <xsl:param name="sid"/>
    <xsl:text>&#010;</xsl:text>    
    <node>
      <xsl:attribute name="id" select="@id"/>
      <xsl:attribute name="parent" select="$pid"/>
      <xsl:attribute name="preceding-sibling" select="$sid"/>
      <xsl:copy-of select="content"/>
    </node>
    <xsl:apply-templates select="node[1]">
      <xsl:with-param name="pid" select="@id"/>
      <xsl:with-param name="sid" select="''"/>
    </xsl:apply-templates>
    <xsl:apply-templates select="following-sibling::node[1]">
      <xsl:with-param name="pid" select="$pid"/>
      <xsl:with-param name="sid" select="@id"/>
    </xsl:apply-templates>
  </xsl:template>
 
</xsl:stylesheet>

XSLT 2.1 implementation. It is based on the second XSLT 2.0 implementation of the task above. The unnamed mode is marked as streamable. There are two downwards selections in the last template: child::node[1] and following-sibling::node[1]. These two selections are streamable in this order, but the XSLT processor need not recognize this fact. This transformation is not guaranteed streamable.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>  

  <xsl:template match="/tree">
    <nodes>
      <xsl:apply-templates select="node[1]"/>
    </nodes>
  </xsl:template>

  <xsl:template match="node">
    <xsl:param name="pid" select="'ROOT'"/>
    <xsl:param name="sid"/>
    <xsl:text>&#010;</xsl:text>    
    <node>
      <xsl:attribute name="id" select="@id"/>
      <xsl:attribute name="parent" select="$pid"/>
      <xsl:attribute name="preceding-sibling" select="$sid"/>
      <xsl:copy-of select="content"/>
    </node>
    <xsl:apply-templates select="node[1]">
      <xsl:with-param name="pid" select="@id"/>
      <xsl:with-param name="sid" select="''"/>
    </xsl:apply-templates>
    <xsl:apply-templates select="following-sibling::node[1]">
      <xsl:with-param name="pid" select="$pid"/>
      <xsl:with-param name="sid" select="@id"/>
    </xsl:apply-templates>
  </xsl:template>
 
</xsl:stylesheet>

Another XSLT 2.1 implementation, with xsl:iterate rather than recursion. This removes the issue with two downwards selections and is guaranteed streamable. However, it relies on the fact that content is the first element child of node.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>

  <xsl:template match="/tree">
    <nodes>
      <xsl:apply-templates select="*"/>
    </nodes>
  </xsl:template>

  <xsl:template match="node">
    <xsl:param name="pid" select="'ROOT'"/>
    <xsl:param name="sid"/>

    <xsl:iterate select="*">
      <!-- initialised from the template parameters of the same name -->
      <xsl:param name="pid" select="$pid"/>
      <xsl:param name="sid" select="$sid"/>

      <xsl:variable name="myid" select="string(@id)"/>
      <xsl:apply-templates select=".">
        <xsl:with-param name="gpid" select="(ancestor::node[2]/@id,'ROOT')[1]"/>
        <xsl:with-param name="pid" select="parent::node/@id"/>
        <xsl:with-param name="sid" select="$sid"/>
      </xsl:apply-templates>

      <xsl:next-iteration>
        <xsl:with-param name="pid" select="$pid"/>
        <xsl:with-param name="sid" select="if (self::content) then '' else $myid"/>
      </xsl:next-iteration>
    </xsl:iterate>
  </xsl:template>

  <xsl:template match="content">
    <xsl:param name="gpid"/>
    <xsl:param name="pid"/>
    <xsl:param name="sid"/>

    <xsl:text>&#xa;</xsl:text>
    <node id="{$pid}" parent="{$gpid}" preceding-sibling="{$sid}">
      <xsl:copy-of select="."/>
    </node>
  </xsl:template>

</xsl:stylesheet>

4.12 Flat to Hierarchical

Task: The reverse of 4.11 Hierarchical to Flat: convert a flat list of nodes to a tree structure. See A.4 Hierarchical to Flat.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/nodes">
    <tree>
      <xsl:apply-templates select="node[1]"/>
    </tree>
  </xsl:template>

  <xsl:template match="node">
    <xsl:variable name="id" select="@id"/>
    <node id="{@id}">
      <xsl:copy-of select="content"/>
      <!-- descendants -->
      <xsl:apply-templates select="following-sibling::node[@parent = $id and @preceding-sibling = ''][1]"/>
    </node>
    <!-- following sibling -->
    <xsl:apply-templates select="following-sibling::node[@preceding-sibling = $id]"/>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. This transformation is in theory streamable, because all nodes found by the first apply-templates (descendants) precede the nodes matched by the second apply-templates (following siblings). But this fact is evident only to those who fully understand the meaning of the input data (A.4 Hierarchical to Flat) and the semantics of its elements and attributes. It would be rather difficult to reach the same conclusion by automatic analysis of the stylesheet and input data. This task is therefore another example of a transformation that is not recognized as streamable by an XSLT 2.1 processor despite the fact that it could be run in a streaming way. This transformation is not guaranteed streamable.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes" />  

  <xsl:template match="/nodes">
    <tree>
      <xsl:apply-templates select="node[1]"/>
    </tree>
  </xsl:template>

  <xsl:template match="node">
    <xsl:variable name="id" select="@id"/>
    <node id="{@id}">
      <xsl:copy-of select="content"/>
      <!-- descendants -->
      <xsl:apply-templates select="following-sibling::node[@parent = $id and @preceding-sibling = ''][1]"/>
    </node>
    <!-- following sibling -->
    <xsl:apply-templates select="following-sibling::node[@preceding-sibling = $id]"/>
  </xsl:template>

</xsl:stylesheet>

4.13 CSV Result

Task: Given a 1GB input document containing multiple row elements with col children (A.5 Rows and Columns), produce a CSV document with the content of the col elements.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="row">
    <xsl:value-of select="col" separator=", "/>
    <xsl:text>&#010;</xsl:text>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is marked as streamable.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes" />  
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="row">
    <xsl:value-of select="col" separator=", "/>
    <xsl:text>&#010;</xsl:text>
  </xsl:template>

</xsl:stylesheet>

4.14 Local Sorting

Task: Given a 1GB document with the structure of A.1 Flat Collection, produce an output document containing the same data, but with all p elements within each chapter element sorted in alphabetical order. The other elements within the chapter element follow the sorted p elements, in their original document order.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/wrapper">
    <xsl:copy> 
      <xsl:apply-templates select="chapter"/>
    </xsl:copy>  
  </xsl:template>

  <xsl:template match="chapter">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="p">
        <xsl:sort />
        <xsl:copy-of select="."/>
      </xsl:for-each>
      <xsl:apply-templates select="* except p"/>
    </xsl:copy>  
  </xsl:template>
  
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. Another windowing example. Each chapter is processed in non-streaming mode, but independently of the other chapters. The transformation is initiated in the unnamed streamable mode. Each chapter is then sorted in a non-streamable "chapter" mode.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>  
  <xsl:mode name="chapter" streamable="no" on-no-match="copy"/>

  <xsl:template match="/wrapper">
    <xsl:copy> 
      <xsl:apply-templates select="copy-of(chapter)" mode="chapter"/>
    </xsl:copy>  
  </xsl:template>

  <xsl:template match="chapter" mode="chapter">
    <xsl:copy>
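      <!-- stay in the "chapter" mode so that on-no-match="copy" applies -->
      <xsl:apply-templates select="@*" mode="#current"/>
      <xsl:for-each select="p">
        <xsl:sort />
        <xsl:copy-of select="."/>
      </xsl:for-each>
      <xsl:apply-templates select="* except p" mode="#current"/>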
    </xsl:copy>  
  </xsl:template>

</xsl:stylesheet>

4.15 Resolving References

Task: Given the two documents of A.3 Product Catalog, produce a new document in which the code attribute is replaced by a description attribute, where the description is derived from the product code by a lookup in a 100kB product codes document.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:variable name="product_codes" select="document('data-2-codes.xml')"/>

  <xsl:template match="product">
    <product description="{$product_codes/*/code[@id = current()/@code]}">
      <xsl:apply-templates/>
    </product>
  </xsl:template>

  <!-- identity transform template -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is marked as streamable. All codes and their descriptions are stored in a variable.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes" on-no-match="copy" />
  <xsl:variable name="product_codes" select="document('data-2-codes.xml')"/>

  <xsl:template match="product">
    <product description="{$product_codes/*/code[@id = current()/@code]}">
      <xsl:apply-templates/>
    </product>
  </xsl:template>

</xsl:stylesheet>

4.16 Multiple Extraction/Processing

Task: Process A.2 Nested Collection to produce a series of chapter-name elements containing the content of the chapter/@name attributes, followed by a series of chapter-id elements containing the content of the chapter/@id attributes, followed by a body element containing all p elements and their text content.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/wrapper">
    <result>
      <xsl:apply-templates select=".//chapter" mode="name"/>
      <xsl:apply-templates select=".//chapter" mode="id"/>
      <body>
        <xsl:apply-templates select=".//p"/>
      </body>
    </result>
  </xsl:template>

  <xsl:template match="chapter" mode="name">
    <chapter-name>
      <xsl:value-of select="@name"/>
    </chapter-name>
  </xsl:template>
  
  <xsl:template match="chapter" mode="id">
    <chapter-id>
      <xsl:value-of select="@id"/>
    </chapter-id>
  </xsl:template>
 
  <xsl:template match="p">
    <p>
      <xsl:value-of select="text()"/>
    </p>
  </xsl:template>

</xsl:stylesheet>

This transformation requires multiple scans of the input data. Processing it in a single scan would require buffering essentially the whole document. Neither the streaming facilities of XSLT 2.1 nor xsl:fork can help to avoid the multiple scans or the extensive buffering.

4.17 Grouping

Task: Process A.1 Flat Collection data. Group chapter elements by position and insert new contents between the groups. Copy the input and add an empty pagebreak element every 3 chapters.

XSLT 2.0 implementation.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                version="2.0">

  <xsl:template match="/*">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
    
  <xsl:template match="chapter">
    <xsl:variable name="position">
      <xsl:number />
    </xsl:variable> 
    <xsl:if test="$position != 1  and $position mod 3 = 1">
      <pagebreak />
    </xsl:if>
    <xsl:copy-of select="." />
  </xsl:template>

</xsl:stylesheet>
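
Since the task is phrased as grouping by position, it can also be written with xsl:for-each-group. The following XSLT 2.0 stylesheet is a sketch rather than a drop-in replacement: it copies only the chapter children of wrapper, so any other children that the stylesheet above passes to the built-in rules are dropped.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:template match="/wrapper">
    <xsl:copy>
      <!-- chapters 1 to 3 get key 0, chapters 4 to 6 get key 1, and so on -->
      <xsl:for-each-group select="chapter"
                          group-adjacent="(position() - 1) idiv 3">
        <!-- here position() is the position of the group, not of a chapter -->
        <xsl:if test="position() != 1">
          <pagebreak />
        </xsl:if>
        <xsl:copy-of select="current-group()" />
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

In the group-adjacent expression, position() is the position of the chapter within the selected sequence, so no xsl:number call is needed.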

XSLT 2.1 implementation. The unnamed mode is marked as streamable. The xsl:number instruction is not always guaranteed streamable but in this specific case the streamed evaluation is possible.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                version="2.1">

  <xsl:mode streamable="yes" on-no-match="copy"/>

  <xsl:template match="chapter">
    <xsl:variable name="position">
      <xsl:number />
    </xsl:variable> 
    <xsl:if test="$position != 1  and $position mod 3 = 1">
      <pagebreak />
    </xsl:if>
    <xsl:copy-of select="." />
  </xsl:template>

</xsl:stylesheet>

4.18 Iterations

Task: Transform the input document to the required output as described in A.6 Transactions and Balance. The data of individual transactions are accumulated and the current balance is maintained for each transaction.

XSLT 2.0 implementation. A template is called recursively.

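<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:template match="/transactions">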
    <account>
      <xsl:apply-templates select="transaction[1]" />
    </account>
  </xsl:template>  
  
  <xsl:template match="transaction">
    <xsl:param name="balance" select="0.00" as="xs:decimal"/>
    <xsl:variable name="newBalance" 
                    select="$balance + xs:decimal(@value)"/>
    <balance date="{@date}" value="{$newBalance}" change="{@value}"/>
    <xsl:apply-templates select="following-sibling::transaction[1]">
      <xsl:with-param name="balance" select="$newBalance"/>
    </xsl:apply-templates>
  </xsl:template>

</xsl:stylesheet>    

XSLT 2.1 implementation. The tail recursion is replaced with iteration, using the new xsl:iterate construct.

<?xml version="1.0"?>
<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                xmlns:xs="http://www.w3.org/2001/XMLSchema">
                
  <xsl:mode streamable="yes"/>

  <xsl:template match="/transactions">
    <account>
      <xsl:iterate select="transaction">
        <xsl:param name="balance" select="0.00" as="xs:decimal"/>
        <xsl:variable name="newBalance" 
                    select="$balance + xs:decimal(@value)"/>
        <balance date="{@date}" value="{$newBalance}"/>
        <xsl:next-iteration>
          <xsl:with-param name="balance" select="$newBalance"/>
        </xsl:next-iteration>
      </xsl:iterate>
    </account>  
  </xsl:template>

</xsl:stylesheet>

4.19 Making Explicit Sections

Task: Process A.7 Explicit Sections data. Convert a structure with implicit sections to a structure with explicit sections.

This use case has been described in [XQuery 1.1 Use Cases] (4.2.2. - Windowing Q2).

XSLT 2.0 implementation.

<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/body">
    <chapter>
      <xsl:for-each select="h2">
        <section title="{text()}">
          <xsl:apply-templates select="following-sibling::p[1]" />
        </section>
      </xsl:for-each>
    </chapter>
  </xsl:template>  
  
  <xsl:template match="p">
    <para>
      <xsl:value-of select="text()" />
    </para>  
    <xsl:if test="name(following-sibling::*[1]) = 'p'">
      <xsl:apply-templates select="following-sibling::p[1]"/>
    </xsl:if>
  </xsl:template>
  
</xsl:stylesheet>    
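
The same result can usually be obtained in XSLT 2.0 without recursion, using positional grouping. The sketch below assumes that body contains only h2 and p children and that the first child is an h2. Neither assumption is stated in this document, and if p elements precede the first h2 they end up in a section with an empty title.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/body">
    <chapter>
      <!-- a new group begins at every h2 -->
      <xsl:for-each-group select="h2 | p" group-starting-with="h2">
        <!-- the context item is the first item of the group -->
        <section title="{self::h2/text()}">
          <xsl:for-each select="current-group()[self::p]">
            <para>
              <xsl:value-of select="text()" />
            </para>
          </xsl:for-each>
        </section>
      </xsl:for-each-group>
    </chapter>
  </xsl:template>

</xsl:stylesheet>
```

Each group consists of one h2 and the p elements that follow it up to the next h2, which is the implicit section the task asks to make explicit.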

XSLT 2.1 implementation. The unnamed mode is marked as streamable. The tail recursion is replaced with iteration.

<?xml version="1.0"?>
<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>
                
  <xsl:template match="/body">
    <chapter>
      <xsl:for-each select="h2">
        <section title="{text()}">        
          <xsl:iterate select="following-sibling::*">
            <para>
              <xsl:value-of select="text()" />
            </para>  
            <xsl:if test="name(following-sibling::*[1]) != 'p'">
              <xsl:break />
            </xsl:if>
          </xsl:iterate>        
        </section>
      </xsl:for-each>
    </chapter>
  </xsl:template>  

</xsl:stylesheet>    

4.20 Merging Sorted Sequences

Task: Merge the input document specified in A.6 Transactions and Balance with another instance of the same document type to produce an output document of the same type that contains all transactions from both input documents. Both input documents are already sorted. The output keeps the same order.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:variable name="other" select="document('transactions-2.xml')"/>
                
  <xsl:template match="/transactions">
    <xsl:copy>
      <xsl:apply-templates select="transaction[1]">
        <xsl:with-param name="date" select="$other/transactions/transaction[1]/@date"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>  
  
  <xsl:template match="transaction">
    <xsl:param name="date"/>
    <xsl:variable name="current_date" select="@date"/>
    <xsl:for-each select="$other/transactions/transaction[@date &gt;= $date][@date &lt; $current_date]">
      <transaction date="{@date}" value="{@value}"/>
    </xsl:for-each>
    <transaction date="{@date}" value="{@value}"/>
    <xsl:apply-templates select="following-sibling::transaction[1]">
      <xsl:with-param name="date" select="$current_date"/>
    </xsl:apply-templates>
    <xsl:if test="not(following-sibling::transaction)">
      <xsl:for-each select="$other/transactions/transaction[@date &gt;= $current_date]">
        <transaction date="{@date}" value="{@value}"/>
      </xsl:for-each>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>    
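
When both documents fit in memory, the expected output can also be produced in XSLT 2.0 by sorting the combined transactions. This is a sketch for comparison only: it does not exploit the fact that the inputs are already sorted, it copies the transaction elements whole, and it assumes that the date attribute uses a lexical form whose string order matches chronological order (for example ISO 8601 dates).

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:variable name="other" select="document('transactions-2.xml')"/>

  <xsl:template match="/transactions">
    <xsl:copy>
      <!-- sort the combined transactions; xsl:sort is stable by default -->
      <xsl:perform-sort select="transaction | $other/transactions/transaction">
        <xsl:sort select="@date"/>
      </xsl:perform-sort>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The relative order of two transactions with the same date but from different documents depends on how the processor orders nodes from different documents.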

XSLT 2.1 implementation. This transformation uses the xsl:merge instruction, which constructs a sorted sequence of items by merging several pre-sorted input sequences. The xsl:merge instruction is designed to enable streaming.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>
  
  <xsl:template match="/transactions">
    <xsl:copy>
      <xsl:merge>
        <xsl:merge-source select="doc('transactions-1.xml'), doc('transactions-2.xml')">
          <xsl:merge-input select="transactions/transaction">
            <xsl:merge-key select="@date"/>
          </xsl:merge-input>
        </xsl:merge-source>
        <xsl:merge-action>
          <xsl:copy-of select="current-group()"/>
        </xsl:merge-action>
      </xsl:merge>
   </xsl:copy>
  </xsl:template>  
  
</xsl:stylesheet>    

A Sample Data

The following XML data are used in the use cases.

A.1 Flat Collection

A 1GB document consisting of a single wrapper element with a number of chapter children, each of them having several p children and an optional removed child. There are no nested chapter elements.

<?xml version="1.0"?>
<wrapper>
  <chapter id="1" name="a_chapter_1">
    <p>S the first element of the list.</p>
    <p>Ele.</p>
    <p>He first element of the list, passing the rema.</p>
  </chapter>
  <removed/>
  <chapter id="2" name="a_chapter_2" removed="yes">
    <p>A.</p>
    <removed/>
    <p>Fied as the first el.</p>
    <p>Fied as the first element of the list, passing the remaining elements as.</p>
    <p>Ified as the first ele.</p>
    <p>First element of the list, passing the remaining elements as.</p>
  </chapter>
  <chapter id="3" name="b_chapter_3" removed="yes">
    <p>As the first element of the list, passing the remaining element.</p>
    <removed/>
  </chapter>
  :
</wrapper>

A.2 Nested Collection

A less regular version of the strict A.1 Flat Collection document. The chapter elements are not necessarily children of wrapper, and they are not all siblings: chapters may be nested in other chapters or in other elements such as set. Also, the content of chapter is not limited to p elements. The size of the document is still about 1GB.

<?xml version="1.0"?>
<wrapper>
  <chapter id="1" name="chapter_1">
    <p>S the first element of the list.</p>
    <p>Ele.</p>
    <chapter id="2" name="chapter_2">
      <p>Element of the list, pao the syst.</p>
    </chapter>
    <p>He first element of tht, passing the rema.</p>
  </chapter>
  <set>
    <chapter id="3" name="chapter_3">
      <p>A.</p>
      <chapter id="4" name="chapter_4" removed="yes">
        <p>.</p>
        <p>T element o.</p>
      </chapter>
      <removed/>
      <p>Fied as the first el.</p>
      <p>Fied as the fig the remaining elements as.</p>
      <p>Ified as the first ele.</p>
      <p>First element of the list, passing the remaining elements as.</p>
    </chapter>
  </set>  
  <chapter id="5" name="chapter_5" removed="yes">
    <p>As the first element of the list, passing the remaining element.</p>
  </chapter>
  <removed/>
  :
</wrapper>

A.3 Product Catalog

A 1GB catalog document that contains product elements with code attributes, and a 100kB product codes document.

Main document:

<?xml version="1.0"?>
<catalog>
  <product code="111">
    <description>
      <p>This amazing carburettor choke valve is the best thing for you since 
        pre-sliced bread. That is, unless, you live in a country where the bread is baked 
        fresh and delivered to you for eating within a short period of time.
        In this case this product is the best thing since steamed fresh lobster.</p>
      <p>Use of this product will make your car go twice as fast, consume less petrol, 
        and pollute less.</p>
    </description>
  </product>
  <product code="112">
    <description>
      <p>This amazing carburettor choke nut is the best thing for you since 
        pre-sliced bread. That is, unless, you live in a country where the bread is baked 
        fresh and delivered to you for eating within a short period of time.
        In this case this product is the best thing since steamed fresh lobster.</p>
      <p>Use of this product will make your car go twice as fast, consume less petrol, 
        and pollute less.</p>
    </description>
  </product>
   :
</catalog>

Product codes document:

<?xml version="1.0"?>
<product-codes>
  <code id="111">carburettor choke valve</code>
  <code id="112">carburettor choke nut</code>
  <code id="113">carburettor choke bolt</code>
  <code id="114">carburettor choke screw</code>
  <code id="115">carburettor choke spanner</code>
  <code id="116">carburettor choke screw driver</code>
  <code id="117">carburettor choke chisel</code>
  <code id="118">carburettor choke hammer</code>
  <code id="119">carburettor choke jack</code>
  :
</product-codes>

A.4 Hierarchical to Flat

This sample data consists of two documents:

The first one is a 1GB document that contains a tree structure of node elements with id attributes. Each node has exactly one content element, which is the first child of the node. There are no node descendants of a content element.

<?xml version="1.0"?>
<tree>
  <node id="id1">
    <content>...</content>
    <node id="id2">
      <content>...</content>
      :
    </node>
    <node id="id3">
      <content>...</content>
      :
    </node>
    :
  </node>
</tree>

The second document is a 1GB document that contains a flat structure of node elements with id attributes, and additional parent and preceding-sibling attributes that record the hierarchical structure of the first document.

<?xml version="1.0"?>
<nodes>
  <node id="id1" parent="ROOT">
    <content>.....</content>
  </node>
  <node id="id2" parent="id1" preceding-sibling="">
    <content>.....</content>
  </node>
  <node id="id3" parent="id1" preceding-sibling="id2">
    <content>.....</content>
  </node>
  :
</nodes>
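
The relationship between the two documents can be sketched in XSLT 2.0. The stylesheet below is an illustration only, not one of the use case solutions. It assumes, as the sample above suggests, that a top-level node gets parent="ROOT" and no preceding-sibling attribute, and that a first child gets an empty preceding-sibling attribute. It is not streamable and needs the whole tree in memory.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Visit every node element in document order and emit it flat. -->
  <xsl:template match="/tree">
    <nodes>
      <xsl:apply-templates select=".//node"/>
    </nodes>
  </xsl:template>

  <xsl:template match="node">
    <!-- parent: id of the enclosing node, or 'ROOT' at the top level
         (assumption taken from the sample data). -->
    <node id="{@id}" parent="{(parent::node/@id, 'ROOT')[1]}">
      <!-- preceding-sibling: id of the nearest preceding sibling node,
           or the empty string for a first child (assumption). -->
      <xsl:if test="parent::node">
        <xsl:attribute name="preceding-sibling"
                       select="string(preceding-sibling::node[1]/@id)"/>
      </xsl:if>
      <!-- Only the content child is copied; nested nodes are emitted
           separately by the apply-templates call above. -->
      <xsl:copy-of select="content"/>
    </node>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the tree sample, this yields node elements for id1, id2 and id3 in that order, with the parent and preceding-sibling values shown in the flat sample.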

A.5 Rows and Columns

This 1GB sample document contains multiple row elements with col children.

<?xml version="1.0"?>
<table>
  <row>
    <col>aa</col>
    <col>ab</col>   
    <col>ac</col>
    :
  </row>
  <row>
    <col>ba</col>
    <col>bb</col>   
    <col>bc</col>
    :
  </row>
  :
</table>

A.6 Transactions and Balance

The input XML document has this structure:

<transactions>
  <transaction date="2008-09-01" value="12.00"/>
  <transaction date="2008-09-01" value="8.00"/>
  <transaction date="2008-09-02" value="-2.00"/>
  <transaction date="2008-09-02" value="5.00"/>
  <transaction date="2008-09-03" value="6.00"/>
  <transaction date="2008-09-04" value="-3.00"/>
   :
</transactions>

The required output structure is:

<account>
  <balance date="2008-09-01" value="12.00"/>
  <balance date="2008-09-01" value="20.00"/>
  <balance date="2008-09-02" value="18.00"/>
  <balance date="2008-09-02" value="23.00"/>
  <balance date="2008-09-03" value="29.00"/>
  <balance date="2008-09-04" value="26.00"/>
  :
</account>
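
Each balance is the sum of all transaction values up to and including the current one, for example 12.00 + 8.00 - 2.00 = 18.00 for the third entry. A minimal XSLT 2.0 sketch of that rule follows. It is an illustration rather than the use case solution: because it re-reads all preceding siblings for every transaction, it is neither streamable nor efficient on large inputs.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/transactions">
    <account>
      <xsl:for-each select="transaction">
        <!-- Running total: this transaction plus all earlier ones,
             formatted with two decimal places as in the sample. -->
        <balance date="{@date}"
                 value="{format-number(
                           sum((., preceding-sibling::transaction)/@value),
                           '0.00')}"/>
      </xsl:for-each>
    </account>
  </xsl:template>

</xsl:stylesheet>
```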

A.7 Explicit Sections

The input XML document:

<body>
  <h2>heading1</h2>
  <p>para1</p>
  <p>para2</p>
  <h2>heading2</h2>
  <p>para3</p>
  <p>para4</p>
  <p>para5</p>
</body>

The expected result is:

<chapter>
  <section title="heading1">
    <para>para1</para>
    <para>para2</para>
  </section>
  <section title="heading2">
    <para>para3</para>
    <para>para4</para>
    <para>para5</para>
  </section>
</chapter>
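
The grouping behind this result can be sketched with the XSLT 2.0 xsl:for-each-group instruction: every h2 starts a new section, and the p elements that follow it become para children. The stylesheet below is a minimal, non-streaming illustration. It assumes that body contains only h2 and p children and that the first child is an h2.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/body">
    <chapter>
      <!-- A new group starts at each h2 child of body. -->
      <xsl:for-each-group select="*" group-starting-with="h2">
        <!-- The context item is the first item of the group, i.e. the h2. -->
        <section title="{self::h2}">
          <xsl:for-each select="current-group()[self::p]">
            <para><xsl:value-of select="."/></para>
          </xsl:for-each>
        </section>
      </xsl:for-each-group>
    </chapter>
  </xsl:template>

</xsl:stylesheet>
```

If p elements preceded the first h2, they would form a leading group whose section has an empty title attribute.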

B References

XSL Transformations (XSLT) Version 2.0
W3C. XSL Transformations (XSLT) Version 2.0. W3C Recommendation. See http://www.w3.org/TR/xslt20/.
XSL Transformations (XSLT) Version 2.1
W3C. XSL Transformations (XSLT) Version 2.1. W3C Working Draft. See http://www.w3.org/TR/xslt-21/.
XPath and XQuery Functions and Operators 1.1
W3C. XPath and XQuery Functions and Operators 1.1. W3C Working Draft. See http://www.w3.org/TR/xpath-functions-11/.
XQuery 1.1 Use Cases
W3C. XQuery 1.1 Use Cases. W3C Working Draft. See http://www.w3.org/TR/xquery-11-use-cases/.
ISO/IEC 21000-7:2004
ISO/IEC. MPEG-21 -- Part 7: Digital Item Adaptation. ISO Standard. See http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=37379.
BSDL: Application of Content Adaptation
Myriam Amielh, Sylvain Devillers. Bitstream Syntax Description Language: Application of XML-Schema to Multimedia Content Adaptation. See http://www2002.org/CDROM/alternate/334/.
XML Signature
W3C. XML-Signature Syntax and Processing. W3C Recommendation. See http://www.w3.org/TR/xmldsig-core/.
Streaming Validation for Digital Signatures
Wei Lu, Kenneth Chiu, Aleksander Slominski and Dennis Gannon. A Streaming Validation Model for SOAP Digital Signature. See http://www.cs.indiana.edu/~welu/c14n_hpdc05.pdf.
Plug-in Based Architecture for Mobile Blogging
César Zapata, Christoffer Jakobsen. Feasibility Study of a Plug-in Based Architecture for Mobile Blogging. Master Thesis, Jönköping University. See http://www.diva-portal.org/hj/abstract.xsql?dbid=989.
Transforming XML on the Fly
Oliver Becker. Transforming XML on the Fly. XML Europe 2003. See http://www.idealliance.org/papers/dx_xmle03/papers/04-02-02/04-02-02.html.