XML Inclusion Proposal (XInclude)

The purpose of this document is to set forth a minimal set of requirements and introduce a processing model and syntax for a general purpose inclusion facility. Inclusion is accomplished by merging a number of XML Infosets into a single composite Infoset. Specification of the XML documents (infosets) to be merged and control over the merging process uses an XML-friendly syntax (elements, attributes, URI-References). The general purpose inclusion mechanism is usable in well-formed but not necessarily valid XML documents.

Status of this document

The XML Linking Working Group has decided to publish the XInclude proposal as a W3C Note from the XML Linking Working Group. This is the result of the evolution of the show="parsed" behaviour found in early XLink Working Drafts. It was decided that this functionality would be better handled in the core XML specification. Hence, at this time, this document is for discussion purposes only. This Note may be updated, replaced or rendered obsolete by other W3C documents at any time. It is inappropriate to use W3C Notes as reference material or to cite them as other than "work in progress". This document is for discussion only and does not imply endorsement by the W3C membership.

This document has been produced as part of the W3C XML Activity by the XML Linking Working Group. The XML Linking WG charter currently expires in March 2000, but can be extended, if necessary.

A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.


1. Introduction

This proposal offers a mechanism that enables composite infoset creation by including resources or subresources using element, attribute, and URI-reference syntax.

Many programming languages provide an inclusion mechanism to facilitate modularity. Many markup languages also have need of such a mechanism. Some examples of related "inclusions" in W3C technologies include:

The purpose of this document is to set forth a minimal set of requirements and introduce a processing model and syntax for a general purpose inclusion facility. The general purpose inclusion mechanism is usable in well-formed but not necessarily valid XML documents. The syntax consists of XML element and attribute syntax. No DTD style syntax is required, unlike entities.

The main audience is XML document authors who have need of modularity mechanisms in their production systems and language designs. While consolidating the special-purpose inclusion features already in use or development in the W3C is not a requirement for this effort, it is hoped that a general purpose mechanism would reduce future proliferation of such grammar-specific solutions.

Inclusion features differ from the linking features described in XLink in that they require specific behavior from the inclusion processor. This behavior occurs at a low level (the infoset), whereas XLink provides a mechanism for storing metadata for the use of higher-level applications (up to and including hypertext browsers). Specifically, inclusion differs from transclusion, which is generally thought to provide much more contextual information (style sheets, schemas, etc.) than simple node inclusion.

2. Requirements

3. Processing Model

Inclusion as defined in this document is a specific type of infoset transformation. A source infoset is transformed into a result infoset using the processing model specified in this document.

The infosets used or created by an XInclude processor support all required information items and properties as specified in the XML Infoset (http://www.w3.org/TR/xml-infoset), and may support optional properties as well. In addition, XInclude requires the Base URI property to be surfaced on information items. This property is optional in the XML Infoset.

The result of performing the inclusion transformation is a result infoset which merges the source infoset with the infosets of resources identified by URI-references appearing in xinclude:include elements. The transformation input consists of a source infoset. A mechanism to resolve URLs and return the identified resources as infosets is needed.

By defining inclusion as an infoset-to-infoset transformation, only a minimal extension to the XML Infoset is envisioned (making the base URI property required instead of optional). This approach simplifies the coordination problems between working groups, because we operate upon infosets instead of within them. One disadvantage of this strategy is that a single infoset does not simultaneously provide both "included" and "non-included" views. This simplification does not address some interesting applications which need to determine which content has been included, such as editing a document with an included copyright notice, then using DAV to save the non-included part back to a server.

3.1 Locating an Inclusion

The XInclude namespace shall be associated with the following URI: http://www.w3.org/1999/XML/xinclude. The declaration of the include element and attributes will require the usual declaration of namespaces in XML.

When performing inclusions, an XInclude processor identifies an xinclude:include element in the source infoset and acquires the resource specified. The information set for the resource is created and merged with the source infoset. This process is repeated until all xinclude:include elements have been processed.
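The first step, identifying xinclude:include elements in a source document, might be sketched as follows. This is only an illustration using Python's standard xml.etree.ElementTree module; the function name find_includes and the sample document are assumptions made for this sketch and are not part of the proposal.

```python
import xml.etree.ElementTree as ET

# Namespace URI given in section 3.1 of this Note.
XINCLUDE_NS = "http://www.w3.org/1999/XML/xinclude"
# ElementTree names elements as "{namespace-uri}local-name".
INCLUDE_TAG = "{%s}include" % XINCLUDE_NS


def find_includes(root):
    """Return every xinclude:include element under root, in document order.

    Hypothetical helper for illustration only.
    """
    return list(root.iter(INCLUDE_TAG))


SOURCE = """<mydocument xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
<p>Before</p>
<xinclude:include steps="1" href="myurl.xml"/>
<p>After</p>
</mydocument>"""

root = ET.fromstring(SOURCE)
hrefs = [inc.get("href") for inc in find_includes(root)]
print(hrefs)
```

Because the match is made on the namespace URI rather than on the prefix, a document that binds a different prefix to the XInclude namespace would be handled in the same way.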

3.2 Acquiring the Resource to be Included

The value of the href attribute on an xinclude:include is combined with the base URI of the xinclude:include element upon which it appears to describe a full URI-reference. The resource identified by this URI-reference is acquired and an infoset created, either by parsing the resource, or by turning it into an infoset consisting of a single text information item. This latter behavior facilitates the inclusion of "working examples" into explanatory text. The method for creating the infoset is selected using the parse attribute, which takes the value "xml" (parse the resource) or one of "text" or "cdata" (treat the resource as text).
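The acquisition step might look roughly like the following Python sketch. The acquire function, its fetch parameter, and the example.org URIs are all assumptions introduced for illustration; in particular, fetch stands in for whatever retrieval mechanism a real processor would use, and no default for the parse attribute is assumed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def acquire(href, base_uri, parse, fetch):
    """Resolve href against base_uri and build the content to include.

    fetch is a hypothetical callable mapping an absolute URI to the
    text of the resource. Any exception it raises models an
    unavailable resource.
    """
    uri = urljoin(base_uri, href)
    content = fetch(uri)
    if parse == "xml":
        # Raises xml.etree.ElementTree.ParseError if not well-formed.
        return uri, ET.fromstring(content)
    if parse in ("text", "cdata"):
        # The resource is kept as a single run of character data.
        return uri, content
    raise ValueError("unknown parse value: %r" % (parse,))


# In-memory stand-in for the network, for illustration only.
RESOURCES = {"http://example.org/docs/data.xml": "<data><item>1</item></data>"}
BASE = "http://example.org/docs/main.xml"

uri, tree = acquire("data.xml", BASE, "xml", RESOURCES.__getitem__)
_, text = acquire("data.xml", BASE, "text", RESOURCES.__getitem__)
print(uri)       # the href resolved against the base URI
print(tree.tag)  # root element of the parsed resource
print(text)      # the same resource kept as text
```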

Resources that are unavailable for any reason result in an error. Resources that resolve to non-well-formed XML given the parse="xml" option result in an error. Resources that resolve to something other than text when parse="text" or parse="cdata" is specified result in an error.

Any xinclude:include elements in this infoset are recursively processed to the depth given by their steps attribute, up to the maximum recursion depth specified by the steps attribute on the initial source document.

3.2.1 Inclusion loops

When processing nested xinclude:include elements with parse="xml", it is an error to include a resource that contains an xinclude:include containing a URI-reference that has already been processed in the inclusion chain.
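One plausible way to detect such loops is to carry the chain of URI-references currently being processed and to reject any reference already on it. The following Python sketch assumes that URIs are compared as plain strings after resolution; check_no_loop and the file names are hypothetical.

```python
def check_no_loop(chain, uri):
    """Return the chain extended with uri, or raise on an inclusion loop.

    chain is the list of resolved URIs of the inclusions in progress.
    Hypothetical helper for illustration only.
    """
    if uri in chain:
        raise ValueError("inclusion loop: " + " -> ".join(chain + [uri]))
    return chain + [uri]


chain = check_no_loop([], "main.xml")
chain = check_no_loop(chain, "chapter.xml")
try:
    # chapter.xml tries to include main.xml again.
    check_no_loop(chain, "main.xml")
except ValueError as err:
    message = str(err)
    print(message)
```

Returning a new list rather than modifying the chain in place means that sibling inclusions of the same resource, which are not loops, are not rejected.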

3.2.2 Namespaces

An XInclude processor is by definition namespace aware, and performs namespace processing as described in the Infoset WD. The namespace URI is thus considered part of the element information item, and merging the infosets preserves the namespace of the item. This can result in a different result than a simple cut and paste of XML sources. A serialized result infoset may thus contain additional namespace declarations when including a sub-resource.

This differs from a text-level copy and paste in that it retains the integrity of the items from the uri2 namespace. A straight copy and paste could result in either the remapping of element names to an unintended namespace, or a document that is not well-formed with respect to namespaces.

Applications performing serialization of the result infoset are not constrained on where they place the namespace declarations, as long as the result preserves the namespaces of the included items.
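The effect can be observed with a namespace-aware toolkit. In the Python sketch below, the element names and the namespace names uri1 and uri2 are placeholders chosen for illustration, and the placeholder child simply marks where an xinclude:include would have been.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring('<a xmlns="uri1"><placeholder/></a>')
included = ET.fromstring('<b xmlns="uri2"><c/></b>')

# Swap the placeholder child for the included element.
source.remove(source[0])
source.append(included)

# Each element keeps the namespace it had in its own document.
tags = [el.tag for el in source.iter()]
print(tags)

# The serializer chooses its own prefixes and declaration placement,
# but both namespaces must be declared somewhere in the output.
serialized = ET.tostring(source, encoding="unicode")
print(serialized)
```

A textual paste of the included markup without its xmlns declaration would instead have placed b and c in the uri1 namespace.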

3.3 Merging infosets

The acquired infoset is merged with the source infoset to create a new infoset by replacing the information items representing the xinclude:include elements with information items in the acquired infoset. The xinclude:include element, its attributes and any children, are not represented in the result infoset.

The base URI property of the information items in the acquired infoset is not changed by merging: each included item retains the base URI it had in the resource it came from.
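A merge of this kind could be sketched as follows with Python's xml.etree.ElementTree. The replace_include helper is hypothetical, and the handling of tail text is specific to ElementTree, which stores the character data following an element on the element itself. Base URI bookkeeping is omitted here.

```python
import xml.etree.ElementTree as ET

INCLUDE_TAG = "{http://www.w3.org/1999/XML/xinclude}include"


def replace_include(parent, include, acquired):
    """Replace include (a child of parent) with the acquired elements, in order.

    Hypothetical helper for illustration only.
    """
    index = list(parent).index(include)
    tail = include.tail or ""
    parent.remove(include)
    for offset, item in enumerate(acquired):
        parent.insert(index + offset, item)
    # Re-attach the character data that followed the include element.
    if acquired:
        acquired[-1].tail = (acquired[-1].tail or "") + tail
    elif index > 0:
        parent[index - 1].tail = (parent[index - 1].tail or "") + tail
    else:
        parent.text = (parent.text or "") + tail


source = ET.fromstring(
    '<mydocument xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">'
    '<p>one</p><xinclude:include href="myurl.xml"/><p>two</p>'
    '</mydocument>'
)
acquired = [ET.fromstring("<copyright><p>Copyright notice</p></copyright>")]

replace_include(source, source.find(INCLUDE_TAG), acquired)
tags = [child.tag for child in source]
print(tags)
```

Passing a list allows the same helper to cover the case where several information items replace a single xinclude:include.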

3.3.1 Document nodes

An acquired infoset will often represent a complete XML document. In this case the top-level children of the document information item will replace the xinclude:include element, in the order in which they appear in the acquired infoset. This applies to comments, processing instructions, and the document element.

The XML declaration in the included document is used to parse the external file but the declaration itself is not preserved in the resulting infoset.

3.3.2 Multiple nodes

An xinclude:include may identify a subresource that consists of more than a single information item. In this case these information items replace the information item representing xinclude:include in the order in which they appear in the included document.

If the document element in the source infoset is an xinclude:include, it is an error to attempt to replace it with more than a single element.

3.3.3 Attribute nodes

An href with an XPointer may identify an attribute or a collection of nodes containing an attribute. Attempting inclusion of attributes results in an error.

3.3.4 Non-node locations

An href with an XPointer may identify a location that is not a node, such as a range or a string location. Attempting inclusion of non-node locations results in an error.

4. The XInclude Processor

An XInclude processor is a class of XML processor that conforms to all the behavior of the XML and XML Namespaces Recommendations, and additionally supports the inclusion behavior specified in this document. For purposes of this document, the term "XInclude processor" includes all the functionality of an "XML Processor".

Note that a simple application-defined switch would be sufficient to flip between XML processors and XInclude processors.

4.1 Example of an application-level switch (non-normative)

The XML component shipped with Microsoft Internet Explorer 5 can operate as either a validating parser or a non-validating one. This option is provided to a user through the addition of a property to the DOM document object, called validateOnParse. A similar switch (processIncludes) could be provided for toggling between a normal XML processor and an XInclude processor.

4.2 Exposing the Base URI

An XInclude processor may expose the base URI of a document, element, or processing instruction information item. This enables applications which resolve URI-references to process them correctly. Two examples where this is necessary are XLink and the xml-stylesheet processing instruction.

4.2.1 Example of a DOM extension to expose base URIs (non-normative)

The base URI information could be provided through the addition of a method to the DOM node object. Rather than simply exposing the base as a property, it may be more useful instead to be able to resolve URI-references using the base. The resolveURI method is passed a relative URI-reference, which is resolved to a full URI-reference using the base URI of a particular node.

This example finds the first xlink:simple element, extracts the relative URI-reference, and resolves it into a full URI-reference in context of the xlink:simple element.
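For readers who want something concrete, the same idea can be sketched in Python. This is not the DOM extension described above: ElementTree has no base URI property, so the sketch records one per element in a dictionary, and the resolve_uri function, the base_uris mapping, and the example.org URI are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# XLink namespace URI as used in the examples of Appendix C.
XLINK_SIMPLE = "{http://www.w3.org/1999/xlink/namespace}simple"

doc = ET.fromstring(
    '<foo xmlns:xlink="http://www.w3.org/1999/xlink/namespace">'
    '<xlink:simple href="chapter2.xml#intro"/>'
    '</foo>'
)

# Hypothetical bookkeeping: pretend every element here was included
# from the same resource and therefore shares its base URI.
base_uris = {el: "http://example.org/included/part.xml" for el in doc.iter()}


def resolve_uri(node, reference):
    """Resolve a relative URI-reference against the base URI of node."""
    return urljoin(base_uris[node], reference)


link = doc.find(XLINK_SIMPLE)           # first xlink:simple child
full = resolve_uri(link, link.get("href"))
print(full)
```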

4.3 Validation

XML 1.0 validation is not performed on the results of the inclusion, nor on the included elements. The include mechanism introduces the notion of infoset validation. After all inclusions are completed, an include processor will validate the infoset against the original document's DTD if it contains a doctype declaration.

4.4 IDs

The interaction of IDs and IDREFs with the inclusion mechanism raises a few issues with respect to XML and inclusion infoset validation.

If an attribute declares an ID that has already been declared, processing is the same as if duplicate IDs had been encountered in a single XML document. This condition would be discovered during infoset validation, after all inclusions are performed. For example, processing could be halted and an appropriate error surfaced.

ID rewriting is a possibility for inclusion of documents with nodes containing IDs. The following condition may occur: An including document contains an ID. The inclusion specified is to the subnode of a separate document. The separate document contains the same ID outside the scope of the inclusion, and the inclusion scope contains an IDREF to the ID. It is unclear whether this should be an error condition or not. It is conceivable that authors would design their modularity to use this aspect of IDs. It is also possible that IDs should be rewritten to be local to the scope of the document.

This proposal suggests that ID rewriting should not be performed. In the previous use-case, the document will infoset validate if the infoset after inclusion contains an IDREF to an ID that is in the document.
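A duplicate-ID check over the merged result could be sketched as below. Without a DTD there is no way to know which attributes are of type ID, so this Python sketch simply assumes that an attribute named id plays that role; the duplicate_ids function and the sample document are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def duplicate_ids(root, id_attr="id"):
    """Return the sorted ID values that occur more than once under root.

    Assumes, for illustration, that id_attr names the ID-typed attribute.
    """
    values = (el.get(id_attr) for el in root.iter())
    counts = Counter(v for v in values if v is not None)
    return sorted(v for v, n in counts.items() if n > 1)


# A result infoset in which an included <note> reuses an existing ID.
merged = ET.fromstring(
    '<doc><sec id="intro"/><sec id="body"/><note id="intro"/></doc>'
)
dupes = duplicate_ids(merged)
print(dupes)
```

Running the check only after all inclusions are complete matches the ordering described above, where the condition is discovered during infoset validation.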

4.5 Relation to other XML standards

The relationship between XInclude and other XML standards is defined by the concept of an 'XInclude processor'. Such a processor leverages XML 1.0 and XML Namespaces in its syntax, and uses the XML Infoset to describe a specific processing model. In general, XInclude processing should occur between the generation of an infoset by a processor and the consumption of that infoset by a higher-level application, so that the inclusion results are transparent to those applications.

Although XInclude may be implemented as an independent layer, it also may be implemented at a lower level with the same results, but with potentially greater performance.

The relationship between XInclude and DTD or XML Schema validation needs additional exploration (as noted by issues within this document). In particular DTD validation as defined in XML 1.0 does not support validation of the result infoset within this 'layered' strategy.

5. Syntax

The syntax for specifying inclusion is an element similar to the simple links defined by XLink. XInclude defines a namespace associated with the URI http://www.w3.org/1999/XML/xinclude.

The XInclude namespace contains a single element, xinclude:include, with the following attributes:

6. Conformance

A. Examples

A.1 Infoset inclusion example

The following XML document fragments specify a document containing an xinclude:include element which points to an external document.

<?xml version='1.0'?>
<mydocument xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
<p>I'm asking a silly question about the following content</p>
<xinclude:include steps="1" href="myurl.xml"/>
<p>Now, what did the included content tell us?</p>
</mydocument>

myurl.xml contains:

<?xml version='1.0'?>
<copyright>
<p>Copyright notice for all content by the W3C</p>
</copyright>

The result of processing this document with an XInclude processor is the same as the result of processing the following document with an XML processor.

<?xml version='1.0'?>
<mydocument>
<p>I'm asking a silly question about the following content</p>
<copyright>
<p>Copyright notice for all content by the W3C</p>
</copyright>
<p>Now, what did the included content tell us?</p>
</mydocument>

A.2 Textual inclusion example

The following XML document includes a working example as text, once with parse="cdata" and once with parse="text".

<?xml version='1.0'?>
<mydocument xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
<p>The following is the source of the "data.xml" file:</p>
<example><xinclude:include href="data.xml" parse="cdata"/></example>
<example><xinclude:include href="data.xml" parse="text"/></example>
</mydocument>

data.xml contains:

<?xml version='1.0'?>
<data>
<item><![CDATA[Brooks & Sheilds]]></item>
</data>

The result of processing this document with an XInclude processor is the same as the result of processing the following document with an XML processor.

<?xml version='1.0'?>
<mydocument xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
<p>The following is the source of the "data.xml" file:</p>
<example><![CDATA[<data>
<item><![CDATA[Brooks & Sheilds]]]]><![CDATA[></item>
</data>]]></example>
<example>&lt;data&gt;
&lt;item&gt;&lt;![CDATA[Brooks &amp; Sheilds]]&gt;&lt;/item&gt;
&lt;/data&gt;</example>
</mydocument>

Note that CDATA notation can itself be escaped by replacing occurrences of "]]>" with "]]]]><![CDATA[>". At the DOM level, this may mean that several CDATA nodes result from an inclusion, instead of just one.
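The escaping rule is small enough to sketch directly. The Python function below, to_cdata, is a hypothetical helper written for illustration; the round trip through a parser is only there to check that the original text is recovered.

```python
import xml.etree.ElementTree as ET


def to_cdata(text):
    """Serialize text as one or more adjacent CDATA sections.

    Every "]]>" in the text is split across two sections so that it
    cannot terminate the section early.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


original = "<item><![CDATA[Brooks & Sheilds]]></item>"
escaped = to_cdata(original)
print(escaped)

# Parsing the serialized form yields the original characters again.
round_trip = ET.fromstring("<example>" + escaped + "</example>").text
print(round_trip == original)
```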

B. Relationship to xsl:import and xsl:include

While the intention of XInclude is not to replace the xsl:import or xsl:include functionality of XSLT, it may be useful to examine how these features could have been formulated to leverage the benefits of xinclude:include.

xsl:include could be replaced by xinclude:include by allowing the xsl:stylesheet to appear within itself. If the following stylesheet were legal:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
  
  <xsl:stylesheet>
    <xsl:template match="foo">
      <foo2><xsl:apply-templates /></foo2>
    </xsl:template>
  </xsl:stylesheet>
</xsl:stylesheet>

and if the processor were directed to ignore the extra level of hierarchy introduced by the nested xsl:stylesheet element, it is easy to see that the nested xsl:stylesheet element could be placed in an external file and replaced by xinclude:include. XSLT currently handles this case by conceptually stripping the extra level of hierarchy during the inclusion process.

The case where the included stylesheet uses the "single template structure" is more problematic, as the trivial solution requires the author of the including stylesheet to have knowledge of this structure:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xinclude:include href="sing-temp-ss.xsl" xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"/>
  </xsl:template>
</xsl:stylesheet>

XSLT currently handles this case by implying the existence of the xsl:template match='/' wrapper when it detects a stylesheet of this form.

xsl:import is similar to xsl:include, differing only in that it assigns "importance" levels to each file. This could be accomplished with a wrapper element. The following stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="base-stylesheet.xsl"/>
  
  <xsl:template match="foo">
    <foo2><xsl:apply-templates/></foo2>
  </xsl:template>
</xsl:stylesheet>

would become:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:importance>
    <xinclude:include href="base-stylesheet.xsl" xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"/>
  </xsl:importance>
  
  <xsl:template match="foo">
    <foo2><xsl:apply-templates/></foo2>
  </xsl:template>
</xsl:stylesheet>

Note that the include functionality becomes completely orthogonal to determining importance.

C. Comparison with XLink extended and simple

XLink defines no processing model for links, only a syntax that enables link detection. XInclude, on the other hand, defines a clear processing model at the infoset level. XLinks are processed at a high level (the application) while XIncludes are processed at a low level (infoset). Because of these differences, this proposal considers XInclude to be a different kind of link than those described in XLink. It would also be reasonable to consider XInclude as a specific type of XLink. This appendix explores the syntax that might arise from this approach.

C.1 Comparison with simple links in XLink

Modifying the xlink:simple element to provide inclusion services would require adding the steps and parse attributes to simple, and using the now-defunct "parse" value for the show attribute. The steps and parse attributes could have the same defaults in xlink:simple as in the proposed xinclude:include.

Repeating the simple include example,

<foo xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
 <xinclude:include href="common.xml#xptr(a/b)"/>
</foo>

The minimal xlink:simple comparison is shown below.

<foo xmlns:xlink="http://www.w3.org/1999/xlink/namespace">
 <xlink:simple href="common.xml#xptr(a/b)" show="parse" actuate="auto"></xlink:simple>
</foo>

A maximal xlink:simple comparison is shown below.

<foo xmlns:xlink="http://www.w3.org/1999/xlink/namespace">
 <xlink:simple href="common.xml#xptr(a/b)" show="parse" actuate="auto" parse="xml"
 steps="1" title="commonxmlfile" role="container"></xlink:simple>
</foo>

C.2 Comparison with extended links in XLink

Modifying the xlink:extended element would require adding the steps and parse attributes to arc, and using the now-defunct "parse" value for the show attribute. The steps and parse attributes could have the same defaults in xlink:arc as in the proposed xinclude:include. Repeating the simple include example,

<foo xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
 <xinclude:include href="common.xml#xptr(a/b)"/>
</foo>

The minimal xlink:extended comparison is shown below,

<foo xmlns:xlink="http://www.w3.org/1999/xlink/namespace">
  <xlink:extended id="mylink1">
    <xlink:arc show="parse" actuate="auto" from="linksource" to="target"/>
    <xlink:locator role="linksource" href="#xptr(here())"/>
    <xlink:locator role="target" href="common.xml#xptr(a/b)"/>
  </xlink:extended>
</foo>

D. Issues List

The following issues were raised at the XML Link F2F in September 1999:

dtd validation
id/idref rewriting or whatever
stylesheet PIs in included documents
doctype declarations in included documents
base URI issues
recursion issues including authors/readers intent (note, same issues as with XML 1.0 external entity refs)
range inclusion
do we want an infoitem to tell us that some node came from an inclusion
cdata section issues (the included part is some characters within a cdata section)
any whitespace issues?
should this facility be built on top of xlink or something else
behavior
character set issue/I18N
relationship with schema group requirements

E. History

The following shows the timeline of the XML Inclusion facility:

June 25, 1999: David Orchard's original XIL (XML Inclusion Language) proposal. Inclusion without parse.
July 15, 1999: Jonathan Marsh becomes co-author. Parse attribute added.
July 29, 1999: Renamed to SNIP, Simple Node Inclusion Proposal.
Aug 18, 1999: Proposed to XML Link WG.
Sept 29, 1999: Reviewed extensively in XML Link Face to Face meeting, many issues added.
Oct 19, 1999: Renamed to XInclude for purpose of submission as W3C note.

F. Open Issues List

A tabulation of open issues flagged above follows:


W3C Note 23 November 1999
