This document specifies a processing model and syntax for general purpose inclusion. Inclusion is accomplished by merging a number of XML Infosets into a single composite Infoset. The XML documents (infosets) to be merged, and control over the merging process, are specified using an XML-friendly syntax (elements, attributes, URI References). The general purpose inclusion mechanism is usable in well-formed but not necessarily valid XML documents.

1. Introduction

Many programming languages provide an inclusion mechanism to facilitate modularity. Markup languages also often have need of such a mechanism. This proposal introduces a generic mechanism for merging XML documents (as represented by their information sets). The syntax leverages existing XML constructs: elements, attributes, and URI references.

1.1 Relationship to XLink

XInclude differs from the linking features described in the XML Linking Language [XLink], specifically links with the attribute value show="embed". Such links provide a media-type independent syntax for indicating that a resource is to be embedded graphically within the display of the document. XLink does not specify a specific processing model, but simply facilitates the detection of links and recognition of associated metadata by a higher level application.

XInclude, on the other hand, specifies a media-type specific (XML into XML) transformation. It defines a specific processing model for merging information sets. XInclude processing occurs at a low level, often by a generic XInclude processor which makes the resulting information set available to higher level applications.

Simple node inclusion as described in this specification differs from transclusion, which preserves contextual information such as style.

1.2 Relationship to XML External Entities

There are a number of differences between XInclude and XML external entities [XML] which make them complementary technologies.

Processing of external entities (as with the rest of DTDs) occurs at parse time. XInclude operates on information sets and thus is orthogonal to parsing.

Declaration of external entities requires a DTD or internal subset. This places a set of dependencies on inclusion, for instance, the syntax for the DOCTYPE declaration requires that the document element be named - clearly orthogonal to inclusion in many cases. Validating parsers must have a complete content model defined. XInclude is orthogonal to validation and the name of the document element.

External entities provide a level of indirection - the external entity must be declared and named, and separately invoked. XInclude uses direct references. Applications which generate XML output incrementally can benefit from not having to pre-declare inclusions.

The syntax for an internal subset is cumbersome to many authors of simple well-formed XML documents. XInclude syntax is based on familiar XML constructs.

Note also that XInclude together with XPointer [XPointer] can replace certain forms of internal entities, although XInclude syntax is not optimized for this purpose.

1.3 Relationship to Grammar-specific Inclusions

Special purpose inclusion mechanisms have been introduced into specific XML grammars. XInclude provides a generic mechanism for recognizing and processing inclusions, and as such can offer a simpler overall authoring experience, greater performance, and less code redundancy.

3. Processing Model

Inclusion as defined in this document is a specific type of infoset transformation. A source infoset is transformed into a result infoset using the processing model specified in this document.

The infosets used or created by an XInclude processor support all required information items and properties as specified in the XML Infoset [XML Infoset], and may support any optional properties as well. In addition, XInclude requires the Base URI property to be surfaced on information items. This property is optional in the XML Infoset.

The input for the inclusion transformation consists of a source infoset. The output is a new infoset which merges the source infoset with the infosets of resources identified by URI references appearing in xinclude:include elements. Thus a mechanism to resolve URLs and return the identified resources as infosets is assumed. There is no attempt to preserve information in the result infoset indicating where inclusion has been performed - for this information the original infoset must be examined.

3.1 Locating an Inclusion

The existence of an include is asserted by an include element.

When performing inclusions, an XInclude processor identifies an xinclude:include element in the source infoset and acquires the resource specified. The information set for the resource is created and merged with the source infoset. This process is repeated until all xinclude:include elements have been processed. The order in which include elements are processed is not defined by this specification. Intra-document references within include elements must be resolved against the original infoset, instead of resolving them against some intermediate state.

In the following example, the infoset representing something.xml will appear twice.

<x xmlns:xinclude="...">
  <xinclude:include href="something.xml"/>
  <xinclude:include href="#xpointer(x/xinclude:include[1])"/>
</x>
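
As an illustration of the first step, the following minimal sketch locates the include elements in a parsed source document. It assumes Python's standard xml.etree.ElementTree as a stand-in for an infoset implementation (it is not one), and the helper name find_includes is hypothetical. The namespace URI is the one given in the Syntax section. The intra-document XPointer in the second include is only reported here, not resolved.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the Syntax section of this document.
XINCLUDE_NS = "http://www.w3.org/1999/XML/xinclude"
INCLUDE_TAG = "{%s}include" % XINCLUDE_NS


def find_includes(root):
    """Return (href, parse) for each xinclude:include element, in document order.

    Hypothetical helper: the parse attribute is assumed to default to "xml".
    """
    return [(el.get("href"), el.get("parse", "xml")) for el in root.iter(INCLUDE_TAG)]


SOURCE = """<x xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <xinclude:include href="something.xml"/>
  <xinclude:include href="#xpointer(x/xinclude:include[1])"/>
</x>"""

includes = find_includes(ET.fromstring(SOURCE))
print(includes)
```

Both references are collected from the original tree before any replacement takes place, which mirrors the requirement that intra-document references be resolved against the original infoset.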

3.2 Acquiring the Resource to be Included

The value of the href attribute on an xinclude:include is combined with the base URI of the xinclude:include element as specified in XML Base [XMLBase]. The resource identified by the full URI reference is acquired and an infoset created, either by parsing the resource as XML, or by converting it into an infoset consisting of a single text information item. This latter behavior allows the inclusion of "working examples" into explanatory text. Which of the two methods for creating an infoset is to be used is determined by the parse attribute, which may take the values "xml", "text", or "cdata".

Note that the character encodings of the including and included resources can be different. This does not affect the resulting infoset, but may need to be taken into account during any subsequent serialization.

Resources that are unavailable for any reason result in an error. Resources that resolve to non-well-formed XML given the parse="xml" option result in an error. Resources that resolve to something other than text when parse="text" or parse="cdata" is specified result in an error.
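
The acquisition rules in this section can be sketched as follows. This is a simplified illustration, not a conforming implementation: urllib.parse.urljoin stands in for XML Base resolution, xml.etree.ElementTree stands in for infoset construction, the resource is passed in as bytes instead of being fetched, and the names InclusionError, resolve_href and build_infoset, the example.org URI, and the UTF-8 default encoding are all assumptions made for the example.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class InclusionError(Exception):
    """Hypothetical error type for the error conditions listed above."""


def resolve_href(base_uri, href):
    # Combine the href value with the base URI of the include element.
    return urljoin(base_uri, href)


def build_infoset(raw_bytes, parse="xml", encoding="utf-8"):
    """Turn an acquired resource into an element tree or a text string."""
    if parse == "xml":
        try:
            return ET.fromstring(raw_bytes)
        except ET.ParseError as exc:
            raise InclusionError("resource is not well-formed XML") from exc
    if parse in ("text", "cdata"):
        try:
            # The encoding is an assumption; a real processor would detect it.
            return raw_bytes.decode(encoding)
        except UnicodeDecodeError as exc:
            raise InclusionError("resource is not text") from exc
    raise InclusionError("unknown parse value: %r" % parse)


resolved = resolve_href("http://example.org/docs/main.xml", "disclaimer.xml")
as_tree = build_infoset(b"<disclaimer/>", parse="xml")
as_text = build_infoset(b"<disclaimer/>", parse="text")
```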

Any xinclude:include elements in this infoset are recursively processed.

Issue (XInclude:03-nesting-optimization): The proposal implies that the destination documents are knitted before inclusion, which we agree is the right behaviour, but we need some way to optimise this (including an element which doesn't have any links in it, but is in a document which does, should not require following all the links in that document). [Richard Tobin]

3.2.1 Inclusion loops

When processing nested xinclude:include elements with parse="xml", it is an error to include a resource that contains an xinclude:include containing a URI reference that has already been processed in the inclusion chain.

In other words, the following are all legal inclusions:

An inclusion with parse="text" or parse="cdata" may reference itself.
An inclusion may identify a different part of the same local resource.
Two non-nested inclusions may identify a resource which itself contains a legal inclusion.

The following are illegal inclusions:

An inclusion of the xinclude:include element itself or any ancestor thereof.
An inclusion of any xinclude:include element or ancestor thereof which has already been processed by a higher-level inclusion.
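
One way to picture the loop rule is to carry the chain of URI references currently being processed and to reject any parse="xml" inclusion that reappears in it. The sketch below works under strong assumptions: resources are modeled as a dictionary mapping a full URI reference to the list of (href, parse) includes it contains, and the names expand and InclusionLoopError are hypothetical.

```python
class InclusionLoopError(Exception):
    """Hypothetical error raised when a URI reference recurs in the chain."""


def expand(uri, documents, chain=()):
    """Return the URI references visited while expanding `uri`, depth first.

    `documents` maps a URI reference to the (href, parse) pairs of the
    include elements found in that resource (an assumed, simplified model).
    """
    if uri in chain:
        raise InclusionLoopError("inclusion loop: %s" % " -> ".join(chain + (uri,)))
    visited = [uri]
    for href, parse in documents.get(uri, []):
        # Only parse="xml" inclusions are processed recursively; text and
        # cdata inclusions cannot contain further include elements.
        if parse == "xml":
            visited.extend(expand(href, documents, chain + (uri,)))
    return visited


LEGAL = {
    "a.xml": [("b.xml", "xml"), ("b.xml", "xml")],  # two non-nested inclusions
    "b.xml": [("c.xml", "xml")],
    "c.xml": [("c.xml", "text")],                   # textual self-reference
}
LOOP = {"a.xml": [("b.xml", "xml")], "b.xml": [("a.xml", "xml")]}

order = expand("a.xml", LEGAL)
```

The chain is passed down per branch, so including b.xml twice from a.xml is accepted, while the a.xml to b.xml to a.xml cycle in LOOP is rejected.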

3.2.2 Namespaces

An XInclude processor is by definition aware of XML Namespaces [XML Names], and performs namespace processing as described in the Infoset WD. The namespace URI is thus considered part of the element information item, and merging the infosets preserves the namespace of the item. The outcome can therefore differ from a simple cut and paste of XML sources. A serialized result infoset may thus contain additional namespace declarations when including a sub-resource.

For example, the following document:

<foo xmlns:x="uri1">
 <xinclude:include href="common.xml#xptr(a/b)"/>
</foo>

including a node from common.xml:

<a xmlns:x="uri2">
  <b>
    <x:a/>
  </b>
</a>

results in a document that could be serialized as:

<foo xmlns:x="uri1">
  <b xmlns:x="uri2">
    <x:a/>
  </b>
</foo>

This differs from a text-level copy and paste in that it retains the integrity of the items from the uri2 namespace. A straight copy and paste could result in either the remapping of element names to an unintended namespace, or a document that is not well-formed with respect to namespaces.

Applications performing serialization of the result infoset are not constrained on where they place the namespace declarations, as long as the result preserves the namespaces of the included items.
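
The difference between merging parsed items and pasting text can be observed with any namespace-aware parser. The sketch below assumes Python's xml.etree.ElementTree, which stores the namespace URI as part of each element name; the placeholder element and the variable names are hypothetical and only stand in for the include element and its replacement.

```python
import xml.etree.ElementTree as ET

COMMON = """<a xmlns:x="uri2">
  <b>
    <x:a/>
  </b>
</a>"""

# Tree-level merge: move the parsed <b> element into the host document.
host = ET.fromstring('<foo xmlns:x="uri1"><placeholder/></foo>')
host[0] = ET.fromstring(COMMON).find("b")
merged_name = host.find("b")[0].tag

# Text-level paste: the same characters re-parsed under the host's declarations.
pasted = ET.fromstring('<foo xmlns:x="uri1"><b><x:a/></b></foo>')
pasted_name = pasted.find("b")[0].tag

print(merged_name)
print(pasted_name)
```

In the merged tree the inner element keeps the uri2 namespace, whereas the pasted text silently rebinds the x prefix to uri1.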

3.3 Merging infosets

The acquired infoset is merged with the source infoset to create a new infoset by replacing the information items representing the xinclude:include elements with information items in the acquired infoset. The xinclude:include element, its attributes and any children, are not represented in the result infoset.

The base URI property of the information items in the acquired infoset is not changed as a result of merging.

Issue (XInclude:02-base-uri-syntax): A reserialised document will lose the base URL information; do we need an [xinclude:base-url] attribute that can be added to any element? [Richard Tobin]

Issue (XInclude:36-infoset-entities): The infoset exposes entity information items http://www.w3.org/TR/xml-infoset#infoitem.entity. XInclude does not define whether entity information items are copied via the infoset or not.

3.3.1 Document nodes

An acquired infoset will often represent a complete XML document. In this case the document information item does not appear in the resulting infoset. The top-level children of the document information item replace the xinclude:include element, in the order in which they appear in the acquired infoset. This applies to comments, processing instructions, and the document element.

The XML declaration in the included document is ignored. The document type declaration information item in the included document is ignored.

Ed. note: Add example of ignorable and non-ignorable whitespace.

3.3.2 Multiple nodes

An xinclude:include may identify a subresource that consists of more than a single information item. In this case these information items replace the information item representing xinclude:include in the order in which they appear in the included document.

If the document element in the source infoset is an xinclude:include, it is an error to attempt to replace it with more than a single element.
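
The replacement step described in this section can be sketched on an element tree. This is a minimal illustration, assuming xml.etree.ElementTree in place of an infoset and handling element items only (comments and processing instructions are dropped by the default parser); replace_include is a hypothetical helper, and the disclaimer text is shortened.

```python
import xml.etree.ElementTree as ET

XINCLUDE_NS = "http://www.w3.org/1999/XML/xinclude"
INCLUDE_TAG = "{%s}include" % XINCLUDE_NS


def replace_include(parent, include_el, items):
    """Replace include_el in parent with items, preserving their order."""
    if not items:
        raise ValueError("this sketch expects at least one replacement item")
    index = list(parent).index(include_el)
    trailing_text = include_el.tail  # text that followed the include element
    parent.remove(include_el)
    for offset, item in enumerate(items):
        parent.insert(index + offset, item)
    items[-1].tail = trailing_text


DOCUMENT = """<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>120Mz is adequate for an average home user.</p>
  <xinclude:include href="disclaimer.xml"/>
</document>"""
DISCLAIMER = "<disclaimer><p>Opinions are those of the individual.</p></disclaimer>"

doc = ET.fromstring(DOCUMENT)
replace_include(doc, doc.find(INCLUDE_TAG), [ET.fromstring(DISCLAIMER)])
merged_tags = [child.tag for child in doc]
print(merged_tags)
```

Passing a list with several elements models the multiple-node case: the items take the place of the include element in the order given.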

3.3.3 Attribute nodes

An href with an XPointer may identify an attribute or a collection of nodes containing an attribute. Attempting inclusion of attributes results in an error.

Issue (XInclude:32-include-attributes): Currently, it is not possible to set the value of an attribute through an include mechanism. This makes it difficult to generate XLinks, for example. Should a mechanism be developed to include text as attribute values?
source:
  <x>
    <uri>theUri</uri>
    <link xmlns:xlink="...">
      <xinclude:include href="#xpointer(x/uri/text())"
         as-an-attribute-named="xlink:href"/>
    </link>
  </x>
result:
  <x>
    <uri>theUri</uri>
    <link xlink:href="theUri" xmlns:xlink="..."/>
  </x>

Issue (XInclude:12-ignore-attributes): Should attempted inclusion of attributes be ignored instead of generating an error?

3.3.4 Range locations

An href with an XPointer may identify a location set that represents a range or a set of ranges. Information items within these ranges appear in the result tree.

[Definition: ] An information item is said to be selected by a range if it occurs after (in document order) the starting point of the range and before the ending point of the range. [Definition: ] An information item is said to be partially selected by a range if it contains only the starting point of the range, or only the ending point of the range. By definition, a character information item cannot be partially selected.

A range is included by including in document order the set of information items selected or partially selected by the range. The children of selected information items are included. The children of partially selected information items are included if they in turn are either selected or partially selected.

A location set containing multiple ranges is included as if each range in the location set were included in order.
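
The two definitions above can be modeled with numeric positions. The sketch below is only an approximation under stated assumptions: each information item is reduced to a hypothetical (start, end) pair of document-order offsets, character items are zero-width points, boundary positions are treated as inside the range, and the offsets and names are invented for illustration.

```python
def contains_point(item, point):
    """True if the point falls strictly inside the item's extent."""
    start, end = item
    return start < point < end


def classify(item, rng):
    """Return "selected", "partially selected", or None for an item."""
    start, end = item
    range_start, range_end = rng
    if range_start <= start and end <= range_end:
        return "selected"
    # Partially selected: contains exactly one of the two range end points.
    if contains_point(item, range_start) != contains_point(item, range_end):
        return "partially selected"
    return None


# Invented offsets loosely shaped like a chapter with two paragraphs.
ITEMS = {
    "chapter": (0, 100),
    "p1": (1, 40),
    "p2": (41, 99),
    "i": (42, 80),
    "char_inside": (30, 30),
    "char_outside": (10, 10),
}
RANGE = (20, 60)

classification = {name: classify(span, RANGE) for name, span in ITEMS.items()}
```

Because a character item has no interior in this model, it can only be selected or not selected, which matches the statement that a character information item cannot be partially selected.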

4. The XInclude Processor

[Definition: ] An XInclude processor is a class of XML processor that conforms to all the behavior of the XML and XML Namespaces Recommendations, and additionally supports the inclusion behavior specified in this document. For purposes of this document, the term "XInclude processor" includes all the functionality of an "XML processor".

Note that a simple application-defined switch would be sufficient to flip between XML processors and XInclude processors.

4.1 Exposing the Base URI

An XInclude processor may expose the base URI of a document, element, or processing instruction information item. This enables applications which resolve URI References to process them correctly. Two examples where this is necessary are XLink, and the xml-stylesheet processing instruction.

Issue (XInclude:14-exposing-base-url): Should exposure of this information be required? It appears necessary for applications that wish to operate on URIs in the result.

4.2 Validation

XML 1.0 validation is not performed on the results of the inclusion, nor on the included elements. The include mechanism introduces the notion of infoset validation. After all inclusions are completed, an include processor will validate the infoset against the original document's DTD if it contains a doctype declaration.

NOTE: The DTD or Schema used for validation may need to be adjusted when running a particular document through an XML processor instead of an XInclude processor. A validating XInclude document is not necessarily a validating XML document, and vice versa.

Issue (XInclude:15-validation-relationship): I do not believe that XInclude should hard-code its relationship to schema validation. If I want to write an application that does inclusion and then validates the resulting document, I should be allowed to. [Paul Prescod: http://lists.w3.org/Archives/Member/w3c-xml-linking-ig/1999Aug/0211.html (W3C Members only)]

Issue (XInclude:16-dtd-validation): Technically speaking, XInclude inclusion *cannot* occur before DTD validation. DTD validation is done by the XML processor: by definition it is accomplished before an information set is created. If you want DTD-syntax validation that works on information sets then you need to specify it yourself as the HyTime people did. SGML and XML just do not support it natively. [Paul Prescod: http://lists.w3.org/Archives/Member/w3c-xml-linking-ig/1999Aug/0211.html (W3C Members only)]
From Ben Trafford: Couldn't you guys define a normative addition to the internal subset that would allow for XInclude validation, and then state that an XInclude-aware processor makes this addition to the infoset based on the parsing of the internal subset? Basically, a 'virtual internal subset'.

4.3 IDs

The interaction of IDs and IDREFs with the inclusion mechanism raises a few issues with respect to XML and inclusion infoset validation.

If an attribute declares an ID that has already been declared, processing is the same as if duplicate IDs had been encountered in a single XML document. This condition would be discovered during infoset validation, after all inclusions are performed. For example, processing could be halted and an appropriate error surfaced.

ID rewriting is a possibility for inclusion of documents with nodes containing IDs. The following condition may occur: An including document contains an ID. The inclusion specified is to the subnode of a separate document. The separate document contains the same ID outside the scope of the inclusion, and the inclusion scope contains an IDREF to the ID. It is unclear whether this should be an error condition or not. It is conceivable that authors would design their modularity to use this aspect of IDs. It is also possible that IDs should be re-written to be local to the scope of the document.

This proposal suggests that ID rewriting should not be performed. In the previous use-case, the document will infoset validate if the infoset after inclusion contains an IDREF to an ID that is in the document.
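
The checks discussed in this section could be run over the merged result along the following lines. This is a sketch under loud assumptions: without a DTD, xml.etree.ElementTree does not know which attributes have type ID or IDREF, so the attribute names "id" and "ref" are hypothetical stand-ins, as are the function names.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def duplicate_ids(root, id_attr="id"):
    """Return ID values that occur more than once in the merged tree."""
    counts = Counter(
        el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


def dangling_idrefs(root, id_attr="id", ref_attr="ref"):
    """Return reference values that match no ID in the merged tree."""
    ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}
    refs = {el.get(ref_attr) for el in root.iter() if el.get(ref_attr) is not None}
    return sorted(refs - ids)


# A merged result in which an included section repeats an ID and refers to
# an ID that was left behind in the included document.
MERGED = '<doc><sec id="s1"/><sec id="s1"/><link ref="s2"/><link ref="s1"/></doc>'

merged_root = ET.fromstring(MERGED)
duplicates = duplicate_ids(merged_root)
dangling = dangling_idrefs(merged_root)
```

Both checks operate on the tree after inclusion, with no rewriting of ID values, which follows the suggestion made above.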

Issue (XInclude:17-id-validation-redundant): ID validation is merely a schema validation issue and should not be separated out as its own "point." [Paul Prescod: http://lists.w3.org/Archives/Member/w3c-xml-linking-ig/1999Aug/0211.html (W3C Members only)]

4.4 Relation to other XML standards

The relationship between XInclude and other XML standards is defined by the concept of an 'XInclude processor'. Such a processor leverages XML 1.0 and XML Namespaces in its syntax, and uses the XML Infoset to describe a specific processing model. In general, XInclude processing should occur between the generation of an Infoset by a processor, and the consumption of that infoset by a higher-level application, so that the inclusion results are transparent to those applications.

Although XInclude may be implemented as an independent layer, it also may be implemented at a lower level with the same results, but with potentially greater performance.

The relationship between XInclude and DTD or XML Schema validation needs additional exploration (as noted by issues within this document). In particular DTD validation as defined in XML 1.0 does not support validation of the result infoset within this 'layered' strategy.

Issue (XInclude:27-schema): Are there any requirements in particular that the Schema WG has of XInclude? For instance, Schema has a facility for mapping included documents to the including document's namespace instead. We could provide this feature as well.

5. Syntax

The syntax for specifying inclusion is an element similar to the simple links defined by XLink. XInclude defines a namespace associated with the URI http://www.w3.org/1999/XML/xinclude .

[Definition: ] The XInclude namespace contains a single element, the include element, or xinclude:include. This element has the following attributes:

href: A URI Reference containing the address of the resource to include.
parse: An enumeration specifying whether or not to include the resource as parsed XML or as text. A value of "xml" indicates that the resource should be parsed as XML and the infosets merged. A value of "text" indicates that the resource should be included as the contents of a text node. A value of "cdata" indicates that the resource should be included as the contents of a CDATA node or a sequence of CDATA nodes.

<!ELEMENT xinclude:include EMPTY>
<!ATTLIST xinclude:include
    href           CDATA               #REQUIRED
    parse          (xml|text|cdata)    "xml"
>
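
The attribute rules described above (a required href, and a parse attribute limited to "xml", "text" or "cdata" with "xml" as the default) could be checked by a processor roughly as follows. The sketch assumes xml.etree.ElementTree elements; read_include_attributes is a hypothetical helper, and the use of ValueError is an arbitrary choice for the example.

```python
import xml.etree.ElementTree as ET

XINCLUDE_NS = "http://www.w3.org/1999/XML/xinclude"
ALLOWED_PARSE = ("xml", "text", "cdata")


def read_include_attributes(el):
    """Return (href, parse) for an include element, or raise ValueError."""
    href = el.get("href")
    if href is None:
        raise ValueError("xinclude:include requires an href attribute")
    parse = el.get("parse", "xml")  # "xml" is the assumed default
    if parse not in ALLOWED_PARSE:
        raise ValueError("parse must be one of %s" % ", ".join(ALLOWED_PARSE))
    return href, parse


def make_include(attributes):
    # Build an include element in the XInclude namespace for the examples.
    return ET.Element("{%s}include" % XINCLUDE_NS, attributes)


defaulted = read_include_attributes(make_include({"href": "data.xml"}))
explicit = read_include_attributes(make_include({"href": "data.xml", "parse": "cdata"}))
```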

Issue (XInclude:29-add-id-attribute): Should an id attribute be added to XInclude? If so, how is it given the ID datatype? [Paul Grosso in http://lists.w3.org/Archives/Member/w3c-xml-core-wg/2000JanMar/0290.html (W3C Members only)]

Issue (XInclude:30-allow-other-attributes): Should the permission to add non-XInclude attributes such as ID be made explicit? [John Cowen in http://lists.w3.org/Archives/Member/w3c-xml-core-wg/2000JanMar/0292.html (W3C Members only)]

Issue (XInclude:31-which-namespace): The authors suggest that the xml: namespace should be the namespace of the include element. The use of the xml: namespace allows all xml documents to reference the inclusion mechanism without requiring additional namespace declarations to support inclusion. As inclusion is useful to most or all xml vocabularies, we suggest that it is reasonable to add to the xml: namespace. The authors do not suggest a mechanism for the W3C to determine the body that works on the specification of the xml:include element.

Issue (XInclude:33-atribute-only-syntax): XInclude requires an XML element. This has implications for re-use in other vocabularies. It may be advantageous to have an attribute only syntax for XInclude to allow vocabularies the ability to create their own include elements. XLink, faced with a similar problem, chose to only support an attribute-based syntax.

Appendices

A. References

XLink: World Wide Web Consortium. XML Linking Language (XLink). W3C Working Draft. See http://www.w3.org/TR/xlink
XML: World Wide Web Consortium. Extensible Markup Language (XML) 1.0. W3C Recommendation. See http://www.w3.org/TR/REC-xml
XML Infoset: World Wide Web Consortium. XML Information Set. W3C Working Draft. See http://www.w3.org/TR/xml-infoset
XML Names: World Wide Web Consortium. Namespaces in XML. W3C Recommendation. See http://www.w3.org/TR/REC-xml-names
XMLBase: World Wide Web Consortium. XML Base (XBase). W3C Working Draft. See http://www.w3.org/TR/xmlbase
XPointer: World Wide Web Consortium. XML Pointer Language (XPointer). W3C Working Draft. See http://www.w3.org/TR/xptr

B. Examples (Non-Normative)

B.1 Infoset inclusion example

The following XML document contains an xinclude:include element which points to an external document.

<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>120Mz is adequate for an average home user.</p>
  <xinclude:include href="disclaimer.xml"/>
</document>

disclaimer.xml contains:

<?xml version='1.0'?>
<disclaimer>
  <p>The opinions represented herein represent those of the individual
  and should not be interpreted as official policy endorsed by this
  organization.</p>
</disclaimer>

The infoset resulting from resolving inclusions on this document could be serialized as:

<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>120Mz is adequate for an average home user.</p>
  <disclaimer>
  <p>The opinions represented herein represent those of the individual
  and should not be interpreted as official policy endorsed by this
  organization.</p>
</disclaimer>
</document>

B.2 Range inclusion example

The following illustrates the results of including a range specified by an XPointer.

<?xml version='1.0'?>
<document>
  <p>The relevant excerpt is:</p>
  <quotation>
    <xinclude:include xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"
      href="source.xml#xpointer(string-range(chapter/p[1],'Sentence 2') to string-range(chapter/p[2]/i,'3.',0,11))"/>
  </quotation>
</document>

source.xml contains:

<chapter>
  <p>Sentence 1.  Sentence 2.</p>
  <p><i>Sentence 3.  Sentence 4.</i>  Sentence 5.</p>
</chapter>

The infoset resulting from resolving inclusions on this document could be serialized as:

<?xml version='1.0'?>
<document>
  <p>The relevant excerpt is:</p>
  <quotation>
    <p>Sentence 2.</p>
  <p><i>Sentence 3.</i></p>
  </quotation>
</document>

B.3 Textual inclusion example

The following XML document includes a working example as text, once with parse="cdata" and once with parse="text".

<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>The following is the source of the "data.xml" file:</p>
  <example><xinclude:include href="data.xml" parse="cdata"/></example>
  <example><xinclude:include href="data.xml" parse="text"/></example>
</document>

data.xml contains:

<?xml version='1.0'?>
<data>
  <item><![CDATA[Brooks & Sheilds]]></item>
</data>

The infoset resulting from resolving inclusions on this document could be serialized as:

<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>The following is the source of the "data.xml" file:</p>
  <example><![CDATA[<data>
  <item><![CDATA[Brooks & Sheilds]]]]><![CDATA[></item>
</data>]]></example>
  <example>&lt;data&gt;
  &lt;item&gt;&lt;![CDATA[Brooks &amp; Sheilds]]&gt;&lt;/item&gt;
&lt;/data&gt;</example>
</document>

Note that CDATA notation can itself be escaped at the textual level by replacing occurrences of "]]>" with "]]]]><![CDATA[>". At the DOM level, this may mean that several CDATA nodes result from an inclusion, instead of just one.
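
The textual replacement described in the note above can be sketched in a few lines. The function name to_cdata is hypothetical, and the round trip through xml.etree.ElementTree is only a convenient way to confirm that the escaped form parses back to the original text. The split point used is the one shown in the note, which is one of several possible choices.

```python
import xml.etree.ElementTree as ET


def to_cdata(text):
    """Wrap text in CDATA sections, splitting wherever "]]>" occurs."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


ORIGINAL = "<item><![CDATA[Brooks & Sheilds]]></item>"

escaped = to_cdata(ORIGINAL)
round_trip = ET.fromstring("<example>" + escaped + "</example>").text

print(escaped)
```

A parser reads the two resulting CDATA sections as adjacent character data, so the text content of the example element is the original string again.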

Issue (XInclude:34-cdata-breaking): The above implies one way to split a CDATA section into parts, but other ways exist, e.g. splitting ]-]> instead of ]]->. Do we want to mandate a specific split point?

Issue (XInclude:35-multiple-cdata-nodes): It is unclear whether this is necessary. The CDATA start and end markers can be inserted around the include, and since the resource is acquired as text, there isn't really any necessity to double escape these. In any case the normative description should be worded in terms of CDATA markers.

C. Open Issues List (Non-Normative)

A tabulation of open issues flagged above follows:

Issue (XInclude:37-next-number): Dummy issue used to record the next unused issue number.

XML Inclusions (XInclude)

W3C Working Draft 22-March-2000

Abstract

Status of this document
