This document specifies a processing model and syntax for general purpose inclusion. Inclusion is accomplished by merging a number of XML Infosets into a single composite Infoset. Specification of the XML documents (infosets) to be merged and control over the merging process is expressed in XML-friendly syntax (elements, attributes, URI References).

Status of this document

The XML Core Working Group, with this 2000 July 17 XInclude working draft, invites comment on this specification.

The W3C Membership and other interested parties are invited to review the specification, provide comment, and report early implementation experience. The area of work covered by this specification was outlined in the XML Inclusion Proposal (XInclude), W3C Note of 23 November 1999 [XInclude]. The purpose of publishing this draft is to update the community on our progress in this area and to solicit feedback on the current draft. It should be noted that the WG plans to take this specification to a Last Call review in the near future.

While the WG has decided to publish this working draft, outstanding issues remain as noted in the draft. Especially note the change from an element-based syntax to an attribute-based syntax. This change is being revisited by the WG and feedback is specifically solicited on this issue.

Comments on this document should be sent to www-xml-xinclude-comments@w3.org, which is publicly archived. While we welcome implementation experience reports, the XML Core Working Group will not allow early implementation to constrain its ability to make changes to this specification prior to final release.

It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current W3C working drafts can be found at http://www.w3.org/TR/.

1. Introduction

Many programming languages provide an inclusion mechanism to facilitate modularity. Markup languages also often have need of such a mechanism. This proposal introduces a generic mechanism for merging XML documents (as represented by their information sets). The syntax leverages existing XML constructs: attributes and URI references.

The requirements used to guide the development of XInclude may be found in the XML Inclusion Proposal W3C Note of 23 November 1999 [XInclude].

1.1. Relationship to XLink

XInclude differs from the linking features described in the XML Linking Language [XLink], specifically links with the attribute value show="embed". Such links provide a media-type independent syntax for indicating that a resource is to be embedded graphically within the display of the document. XLink does not specify a specific processing model, but simply facilitates the detection of links and recognition of associated metadata by a higher level application.

XInclude, on the other hand, specifies a media-type specific (XML into XML) transformation. It defines a specific processing model for merging information sets. XInclude processing occurs at a low level, often by a generic XInclude processor which makes the resulting information set available to higher level applications.

Simple node inclusion as described in this specification differs from transclusion, which preserves contextual information such as style.

1.2. Relationship to XML External Entities

There are a number of differences between XInclude and XML external entities [XML] which make them complementary technologies.

Processing of external entities (as with the rest of DTDs) occurs at parse time. XInclude operates on information sets and thus is orthogonal to parsing.

Declaration of external entities requires a DTD or internal subset. This places a set of dependencies on inclusion, for instance, the syntax for the DOCTYPE declaration requires that the document element be named - clearly orthogonal to inclusion in many cases. Validating parsers must have a complete content model defined. XInclude is orthogonal to validation and the name of the document element.

External entities provide a level of indirection - the external entity must be declared and named, and separately invoked. XInclude uses direct references. Applications which generate XML output incrementally can benefit from not having to pre-declare inclusions.

The syntax for an internal subset is cumbersome to many authors of simple well-formed XML documents. XInclude syntax is based on familiar XML constructs.

Note also that XInclude together with XPointer [XPointer] can replace certain forms of internal entities, although XInclude syntax is not optimized for this purpose.

1.3. Relationship to DTDs

XInclude defines no relationship to DTD validation. XInclude describes an infoset-to-infoset transformation and not a change in XML 1.0 parsing behavior. XInclude does not define a mechanism for DTD validation of the resulting infoset.

1.4. Relationship to XML Schemas

XInclude defines no relationship to the augmented infosets produced by applying an XML Schema. Such an augmented infoset can be supplied as the input infoset, or such augmentation may be applied to the infoset resulting from the inclusion.

1.5. Relationship to Grammar-specific Inclusions

Special-purpose inclusion mechanisms have been introduced into specific XML grammars. XInclude provides a generic mechanism for recognizing and processing inclusions, and as such can offer a simpler overall authoring experience, greater performance, and less code redundancy. The attribute-based syntax facilitates incorporation of inclusion capabilities into other markup languages.

3. Processing Model

Inclusion as defined in this document is a specific type of infoset transformation. A source infoset as defined by the XML Infoset [XML Infoset] is transformed into a result infoset according to the rules specified in this document.

The input for the inclusion transformation consists of a source infoset. The output is a new infoset which merges the source infoset with the infosets of resources identified by URI references appearing in include elements. Thus a mechanism to resolve URI references and return the identified resources as infosets is assumed. Well-formed XML entities that do not have defined infosets (e.g. an external entity file with multiple top-level elements) are outside the scope of this specification, either for use as source infosets or the result infoset.

There is no attempt to preserve information in the result infoset indicating where inclusion has been performed - for this information the original infoset must be examined.

Issue (XInclude-44-preserve-include-element): We could preserve this information rather trivially, by describing an Infoset addition. For instance, a property [include-element] could be defined to contain the element information item for the include element in the original infoset. Applications unaware of this property would treat the inclusion as transparent; applications aware of this property (e.g. editors) could use it to switch between included and non-included "views" of the document. I don't see any harm in providing such an optional property.

3.1. Include Elements

The existence of an include is asserted by an include element, which is any element bearing the attributes required by this specification.

Each include element in the source infoset is examined, its xinclude:href attribute dereferenced and parsed into an infoset, and this infoset merged with the source infoset, as described in following sections.

The order in which include elements are processed is not defined by this specification. The result must nevertheless be as if intra-document references within include elements were resolved against the original source infoset, rather than against some intermediate state.

In the following example, the second include points to the <myinclude> element itself, not to its replacement.

<x xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <myinclude xinclude:href="something.xml"/>
  <myinclude xinclude:href="#xpointer(x/myinclude[1])"
             xinclude:parse="text"/>
</x>

3.2. Acquiring the Resource to be Included

The value of the xinclude:href attribute is a URI reference. The set of characters allowed in an xinclude:href attribute is the same as for XML, namely [Unicode]. However, some Unicode characters are disallowed from URI references, and thus processors must encode and escape these characters to obtain a valid URI reference from the attribute value.

The disallowed characters include all non-ASCII characters, plus the excluded characters listed in Section 2.4 of [IETF RFC 2396], except for the crosshatch (#) and percent sign (%) characters and the square bracket characters re-allowed in [IETF RFC 2732]. Disallowed characters must be escaped as follows:

Each disallowed character is converted to UTF-8 [IETF RFC 2279] as one or more bytes.
Any octets corresponding to a disallowed character are escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the hexadecimal notation of the byte value).
The original character is replaced by the resulting character sequence.
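
The three escaping steps above can be sketched in Python as follows. This is a minimal illustration rather than a normative algorithm: the names escape_href, is_disallowed and EXCLUDED_ASCII are hypothetical, and the set of excluded ASCII characters reflects our reading of Section 2.4.3 of [IETF RFC 2396] (control characters, space, and the "delims" and "unwise" characters), with '#', '%', '[' and ']' left unescaped.

```python
# Excluded printable ASCII characters (RFC 2396 "delims" and "unwise"),
# minus '#', '%', '[' and ']', which are left unescaped.
EXCLUDED_ASCII = set('<>"{}|\\^`')


def is_disallowed(ch: str) -> bool:
    """Return True if ch must be escaped before use in a URI reference."""
    code = ord(ch)
    # Control characters and space (<= 0x20), DEL and all non-ASCII (>= 0x7F).
    if code <= 0x20 or code >= 0x7F:
        return True
    return ch in EXCLUDED_ASCII


def escape_href(value: str) -> str:
    """Escape each disallowed character as %HH over its UTF-8 octets."""
    pieces = []
    for ch in value:
        if is_disallowed(ch):
            # Step 1: convert to UTF-8. Step 2: escape every octet as %HH.
            octets = ch.encode("utf-8")
            pieces.append("".join("%{:02X}".format(octet) for octet in octets))
        else:
            pieces.append(ch)
    # Step 3: the escaped sequences replace the original characters.
    return "".join(pieces)
```

Under these assumptions, escape_href("my résumé.xml#xpointer(a/b)") returns "my%20r%C3%A9sum%C3%A9.xml#xpointer(a/b)": the space and the two non-ASCII characters are escaped, while the crosshatch and the XPointer are left intact.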

The URI reference is combined with the base URI of the include element as specified in XML Base [XML Base]. The resource identified by the full escaped and encoded URI reference is acquired and an infoset or collection of information items is created, either by parsing the resource as XML, or by treating the resource as a plain text file and converting it into a collection of character information items. This latter behavior allows the inclusion of "working examples" into explanatory text. Which of the two methods for creating an infoset is to be used is determined by the xinclude:parse attribute, which must take either the value "xml" or the value "text".

Issue (XInclude-53-parse-text-fragments): Presumably a URI reference for an include with parse="text" should not be allowed to contain a fragment (xpath), since there is no infoset, just a collection of character items. Or do we want to allow ranges of characters from a text file? This seems to be getting into the same area as infosets-from-external-entities.

Note that the character encodings of the including and included resources can be different. This does not affect the resulting infoset, but may need to be taken into account during any subsequent serialization.

Resources that are unavailable for any reason result in an error. Resources that resolve to non-well-formed XML given the parse="xml" option result in an error. Resources that resolve to something other than text when parse="text" is specified result in an error.
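
The acquisition step and its error conditions might be sketched as below. Everything named here is an assumption made for illustration: acquire, InclusionError and the caller-supplied fetch callable (absolute URI to bytes) are hypothetical, the href is assumed to be already escaped, the fragment is split off and ignored, and "not text" is approximated as "not decodable as UTF-8", since this draft leaves the criterion for a failed text inclusion open.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag, urljoin


class InclusionError(Exception):
    """Raised when a resource cannot be acquired or parsed."""


def acquire(href, base_uri, parse, fetch):
    """Resolve href against base_uri, fetch it, and parse as 'xml' or 'text'.

    fetch is a hypothetical callable mapping an absolute URI to bytes.
    """
    if parse not in ("xml", "text"):
        raise InclusionError("parse must be 'xml' or 'text', got %r" % (parse,))
    absolute = urljoin(base_uri, href)
    # The fragment (e.g. an XPointer) is split off and ignored in this sketch.
    resource_uri, _fragment = urldefrag(absolute)
    try:
        data = fetch(resource_uri)
    except Exception as exc:  # unavailable for any reason
        raise InclusionError("cannot acquire %s" % resource_uri) from exc
    if parse == "xml":
        try:
            return ET.fromstring(data)
        except ET.ParseError as exc:
            raise InclusionError("%s is not well-formed XML" % resource_uri) from exc
    try:
        # Assumption: text resources are UTF-8 encoded.
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise InclusionError("%s is not text" % resource_uri) from exc
```

With parse="xml" the function returns an element tree standing in for the acquired infoset; with parse="text" it returns a string standing in for the collection of character information items.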

Issue (XInclude-45-fail-text): It is easy to see how to fail a non-xml resource - it's not well-formed. Is there a similarly well-defined mechanism for determining the success of a parse="text" inclusion? Or do we need to rely on the media type text/*? (We intentionally don't rely on text/xml, as we want to enable things like image/svg.)

When parse="xml" is specified, include elements in the infoset are recursively expanded.

3.2.1. Inclusion loops

When processing nested include elements with parse="xml", it is an error to include a resource that contains an include element containing a URI reference that has already been processed in the inclusion chain.

Issue (XInclude-46-compare-uris): (Oh no not again!) Is the literal value of the URI Reference compared, or its absolutized and escaped version?

In other words, the following are all legal:

An include element with parse="text" may reference itself.
An include element may identify a different part of the same local resource.
Two non-nested include elements may identify a resource which itself contains an include element.

The following are illegal:

An include element pointing to itself or any ancestor thereof.
An include element pointing to any include element or ancestor thereof which has already been processed at a higher level.
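
One way to picture the loop rule is a depth-first walk that remembers only the current inclusion chain. The sketch below is illustrative: check_inclusions, InclusionLoopError and the includes_of callable are hypothetical names, URIs are assumed to be compared in absolutized form (which this draft leaves open), and ancestor relationships between fragments of one resource are not modeled.

```python
class InclusionLoopError(Exception):
    """Raised when a resource is included again within its own inclusion chain."""


def check_inclusions(uri, includes_of, chain=()):
    """Walk nested parse="xml" inclusions depth-first and reject loops.

    includes_of is a hypothetical callable returning the absolute URIs
    referenced by the parse="xml" include elements found in a resource.
    """
    if uri in chain:
        raise InclusionLoopError(" -> ".join(chain + (uri,)))
    for target in includes_of(uri):
        # Only the chain leading to this point is remembered, so sibling
        # branches may include the same resource without triggering an error.
        check_inclusions(target, includes_of, chain + (uri,))
```

Because the check is made against the chain rather than against a global set of visited resources, two non-nested include elements may identify the same resource, while a resource that includes itself directly or indirectly is rejected.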

3.2.2. Namespaces

A source infoset might contain namespace information items. The namespace URI property is considered to be part of the element information item, and merging infosets preserves the namespace of the item. This can produce a different result from a simple cut and paste of XML text source. A serialized result infoset might contain additional namespace declarations when including a sub-resource.

For example, the following document:

<foo xmlns:x="uri1">
 <include xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"
          xinclude:href="common.xml#xpointer(a/b)"/>
</foo>

including a node from common.xml:

<a xmlns:x="uri2">
  <b>
    <x:a/>
  </b>
</a>

results in a document that could be serialized as:

<foo xmlns:x="uri1">
  <b xmlns:x="uri2">
    <x:a/>
  </b>
</foo>

This differs from a text-level copy and paste in that it retains the integrity of the items from the uri2 namespace. A straight copy and paste could result in either the remapping of element names to an unintended namespace, or a document that is not well-formed with respect to namespaces.
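
The same effect can be observed with any tree model that stores expanded names instead of prefixed names. The following sketch uses Python's xml.etree.ElementTree as a stand-in for an infoset; the include element is simplified to a bare placeholder and replaced by hand, so this illustrates namespace preservation only, not XInclude processing.

```python
import xml.etree.ElementTree as ET

# The acquired resource: <x:a> is bound to the uri2 namespace.
common = ET.fromstring('<a xmlns:x="uri2"><b><x:a/></b></a>')
included = common.find("b")

# The including document binds the same prefix to a different namespace.
foo = ET.fromstring('<foo xmlns:x="uri1"><include/></foo>')
foo[0] = included  # replace the placeholder with the acquired <b>

# The tree stores the expanded name, so the child remains in uri2.
child_tag = foo[0][0].tag

# Serializing and reparsing keeps the namespace name, although the
# prefix chosen by the serializer may differ from "x".
reparsed_tag = ET.fromstring(ET.tostring(foo))[0][0].tag
```

Both child_tag and reparsed_tag hold the expanded name "{uri2}a"; a text-level paste of the element into the including document would instead have left it bound to uri1.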

Serialization, and specifically where additional namespace declarations might appear, is not constrained by this specification.

Issue (XInclude-52-infoset-properties): We specifically say that the namespace name property of an element is preserved when the infosets are merged. What about the in-scope namespaces property? This seems to be needed so that qnames in the included nodes can be resolved. What about the "declared namespaces" property? More generally, should there be a list of infoset properties that must be preserved or deleted?

3.3. Merging infosets

The acquired infoset is merged with the source infoset to create a new infoset by replacing the information items representing the include elements with information items in the acquired infoset. The include element, its attributes and any children, are not represented in the result infoset.

The base URI property of the acquired infoset is not changed as a result of merging. Thus relative URIs in the included infoset resolve to the same URI despite being included into a document with a potentially different base URI in effect.
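
A small worked example may help here. The URIs below are hypothetical; the sketch only shows that a relative URI carried by included content continues to resolve against the base URI of the resource it came from.

```python
from urllib.parse import urljoin

including_base = "http://example.org/docs/main.xml"     # hypothetical
included_base = "http://example.org/shared/common.xml"  # hypothetical

relative_ref = "images/logo.png"  # a relative URI inside the included content

# The included items keep their own base URI, so this is the effective target.
resolved = urljoin(included_base, relative_ref)

# Resolving against the including document would give a different URI,
# which is exactly what preserving the base URI property avoids.
not_used = urljoin(including_base, relative_ref)
```

Here resolved is "http://example.org/shared/images/logo.png", whereas not_used is "http://example.org/docs/images/logo.png".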

Issue (XInclude-36-infoset-entities): The infoset exposes entity information items http://www.w3.org/TR/xml-infoset#infoitem.entity. XInclude does not define whether entity information items are copied via the infoset or not.

3.3.1. Document nodes

An acquired infoset will often represent a complete XML document. In this case the document information item does not appear in the resulting infoset. The top-level children of the document information item replace the include element in the order in which they appear in the acquired infoset. This applies to comments, processing instructions, and the document element.

The XML declaration and the document type declaration information item in the included document do not appear in the result infoset.

Ed. note: Add example of ignorable and non-ignorable whitespace.

3.3.2. Multiple nodes

An include element might identify a subresource that consists of more than a single information item. In this case these information items replace the information item representing the include element in the order in which they appear in the included document.
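
The replacement of one include element by several items can be sketched with xml.etree.ElementTree, used here as an approximation of an infoset. The function name replace_include is hypothetical. Because ElementTree attaches following text to an element as its "tail", the sketch moves the tail of the include element onto the last inserted item so that surrounding character data is kept.

```python
import copy
import xml.etree.ElementTree as ET


def replace_include(parent, include, items):
    """Replace include (a child of parent) with copies of items, in order."""
    index = list(parent).index(include)
    trailing = include.tail  # character data that followed the include element
    parent.remove(include)

    clones = []
    for item in items:
        clone = copy.deepcopy(item)
        clone.tail = None  # text after the item in its source is not part of it
        clones.append(clone)
    for offset, clone in enumerate(clones):
        parent.insert(index + offset, clone)

    if clones:
        clones[-1].tail = trailing
    elif trailing:
        # Nothing was inserted: keep the trailing text in the parent.
        if index > 0:
            previous = parent[index - 1]
            previous.tail = (previous.tail or "") + trailing
        else:
            parent.text = (parent.text or "") + trailing
```

For example, replacing the inc element in <doc><p>one</p><inc/>after</doc> with the two children of <r><a/><b/></r> leaves doc with the children p, a and b, and the text "after" following b.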

If the document element in the source infoset is an include element, it is an error to attempt to replace it with more than a single element.

3.3.3. Attribute and Namespace nodes

An xinclude:href with an XPointer might identify an attribute, a namespace node, or a collection of nodes containing an attribute or namespace node. Attempting inclusion of nodes that are not allowed as a child of an element results in an error.

3.3.4. Range locations

An xinclude:href with an XPointer might identify a location set that represents a range or a set of ranges. Information items within these ranges appear in the result infoset.

An information item is said to be selected by a range if it occurs after (in document order) the starting point of the range and before the ending point of the range. An information item is said to be partially selected by a range if it contains only the starting point of the range, or only the ending point of the range. By definition, a character information item cannot be partially selected.

A range is included by including in document order the set of information items selected or partially selected by the range. The children of selected information items are included. The children of partially selected information items are included if they in turn are either selected or partially selected.
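
The two definitions can be made concrete by modeling every information item, and the range itself, as a pair of positions in document order. This model is an assumption made purely for illustration: the Span type, the integer offsets and the function names are hypothetical and do not appear in XPointer or the Infoset.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Start and end positions in document order (hypothetical offsets)."""
    start: int
    end: int


def contains_point(item: Span, point: int) -> bool:
    """True if the point falls strictly inside the item."""
    return item.start < point < item.end


def is_selected(item: Span, rng: Span) -> bool:
    """The item lies entirely between the start and end points of the range."""
    return rng.start <= item.start and item.end <= rng.end


def is_partially_selected(item: Span, rng: Span) -> bool:
    """The item contains exactly one of the two points of the range."""
    return contains_point(item, rng.start) != contains_point(item, rng.end)


def is_included(item: Span, rng: Span) -> bool:
    return is_selected(item, rng) or is_partially_selected(item, rng)
```

Suppose a chapter element spans positions 0 to 100, its first paragraph 10 to 40 and its second paragraph 50 to 90, and the range runs from 25 to 62. Both paragraphs contain exactly one end of the range and are partially selected; the chapter contains both ends and is therefore neither selected nor partially selected. A character occupying positions 30 to 31 is selected, and since no point can fall strictly inside a single character, a character is never partially selected.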

A location set containing multiple ranges is included as if each range in the location set were included in order.

Appendices

A. References

IETF RFC 2119: RFC 2119: Key words for use in RFCs to Indicate Requirement Levels. Internet Engineering Task Force, 1997. (See http://www.ietf.org/rfc/rfc2119.txt.)
IETF RFC 2279: RFC 2279: UTF-8, a transformation format of ISO 10646. Internet Engineering Task Force, 1998. (See http://www.ietf.org/rfc/rfc2279.txt.)
IETF RFC 2396: RFC 2396: Uniform Resource Identifiers. Internet Engineering Task Force, 1998. (See http://www.ietf.org/rfc/rfc2396.txt.)
IETF RFC 2732: RFC 2732: Format for Literal IPv6 Addresses in URL's. Internet Engineering Task Force, 1999. (See http://www.ietf.org/rfc/rfc2732.txt.)
Unicode: The Unicode Consortium. The Unicode Standard. (See http://www.unicode.org/unicode/standard/standard.html.)
XML: Tim Bray, Jean Paoli, and C.M. Sperberg-McQueen, editors. Extensible Markup Language (XML) 1.0. World Wide Web Consortium, 1998. (See http://www.w3.org/TR/REC-xml.)
XML Base: Jonathan Marsh, editor. XML Base. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/xmlbase.)
XML Infoset: John Cowan and David Megginson, editors. XML Information Set. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/xml-infoset.)
XML Names: Tim Bray, Dave Hollander, and Andrew Layman, editors. Namespaces in XML. Textuality, Hewlett-Packard, and Microsoft. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/REC-xml-names/.)
XPointer: Steve DeRose, Ron Daniel, Eve Maler, editors. XML Pointer Language (XPointer). World Wide Web Consortium, 1999. (See http://www.w3.org/TR/xptr.)

B. References (Non-Normative)

XInclude: Jonathan Marsh, David Orchard, editors. XML Inclusion Proposal (XInclude). World Wide Web Consortium, 1999. (See http://www.w3.org/TR/1999/NOTE-xinclude-19991123.)
XLink: Steve DeRose, Eve Maler, David Orchard, and Ben Trafford, editors. XML Linking Language (XLink). World Wide Web Consortium, 2000. (See http://www.w3.org/TR/xlink/.)

C. Examples (Non-Normative)

C.1. Infoset inclusion example

The following XML document contains an include element which points to an external document.

<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>120Mz is adequate for an average home user.</p>
  <boilerplate xinclude:href="disclaimer.xml"/>
</document>

disclaimer.xml contains:

<?xml version='1.0'?>
<disclaimer>
  <p>The opinions represented herein represent those of the individual
  and should not be interpreted as official policy endorsed by this
  organization.</p>
</disclaimer>

The infoset resulting from resolving inclusions on this document could be serialized as:

<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>120Mz is adequate for an average home user.</p>
  <disclaimer>
  <p>The opinions represented herein represent those of the individual
  and should not be interpreted as official policy endorsed by this
  organization.</p>
</disclaimer>
</document>
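
For readers who want to experiment, the whole-document case shown above can be approximated in a few lines of Python. This is a rough sketch under several assumptions, not a conforming processor: expand and load are hypothetical names, resources come from a caller-supplied callable, and fragments, parse="text", loop detection, base URI handling and an include element at the document root are all ignored.

```python
import xml.etree.ElementTree as ET

XINCLUDE_NS = "http://www.w3.org/1999/XML/xinclude"
HREF = "{%s}href" % XINCLUDE_NS


def expand(element, load):
    """Replace each child bearing xinclude:href with the referenced document element.

    load is a hypothetical callable mapping an href to the XML source text.
    """
    for index, child in enumerate(list(element)):
        href = child.get(HREF)
        if href is None:
            expand(child, load)  # not an include element: descend into it
            continue
        included = ET.fromstring(load(href))
        expand(included, load)       # nested includes are expanded recursively
        included.tail = child.tail   # keep the text that followed the include
        element[index] = included    # the include element itself disappears
```

Calling expand on the parsed document above, with a load function that returns the content of disclaimer.xml, leaves the document element with a p child followed by a disclaimer child.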

C.2. Range inclusion example

The following illustrates the results of including a range specified by an XPointer.

<?xml version='1.0'?>
<document>
  <p>The relevant excerpt is:</p>
  <quotation>
    <include xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"
      xinclude:href="source.xml#xpointer(string-range(chapter/p[1],'Sentence 2') to string-range(chapter/p[2]/i,'3.',0,11))"/>
  </quotation>
</document>

source.xml contains:

<chapter>
  <p>Sentence 1.  Sentence 2.</p>
  <p><i>Sentence 3.  Sentence 4.</i>  Sentence 5.</p>
</chapter>

The infoset resulting from resolving inclusions on this document could be serialized as:

<?xml version='1.0'?>
<document>
  <p>The relevant excerpt is:</p>
  <quotation>
    <p>Sentence 2.</p>
  <p><i>Sentence 3.</i></p>
  </quotation>
</document>

C.3. Textual inclusion example

The following XML document includes a working example as text.

<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>The following is the source of the "data.xml" file:</p>
  <example><example-body xinclude:href="data.xml" xinclude:parse="text"/></example>
</document>

data.xml contains:

<?xml version='1.0'?>
<data>
  <item><![CDATA[Brooks & Sheilds]]></item>
</data>

The infoset resulting from resolving inclusions on this document could be serialized as:

<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>The following is the source of the "data.xml" file:</p>
  <example>&lt;?xml version='1.0'?&gt;
&lt;data&gt;
  &lt;item&gt;&lt;![CDATA[Brooks &amp; Sheilds]]&gt;&lt;/item&gt;
&lt;/data&gt;</example>
</document>
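
The escaping seen in this serialization is produced by the serializer, not by the inclusion itself: the included resource becomes plain character data, and any markup-significant characters in it are escaped when the result is written out. A short sketch with Python's xml.etree.ElementTree, used here only as an illustration, shows the effect.

```python
import xml.etree.ElementTree as ET

# The raw content of data.xml, treated as text rather than parsed as XML.
data_xml = (
    "<?xml version='1.0'?>\n"
    "<data>\n"
    "  <item><![CDATA[Brooks & Sheilds]]></item>\n"
    "</data>"
)

example = ET.Element("example")
example.text = data_xml  # the included text becomes character data

# Serialization escapes '<', '>' and '&' in the character data.
serialized = ET.tostring(example, encoding="unicode")

# Parsing the serialized form recovers the original text unchanged.
round_tripped = ET.fromstring(serialized).text
```

The serialized string contains "&lt;data&gt;" and "Brooks &amp; Sheilds", and round_tripped equals the original content of data.xml.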

D. Open Issues List (Non-Normative)

A tabulation of open issues flagged above follows:

Issue (XInclude-55-next-number): Dummy issue used to record the next unused issue number.

XML Inclusions (XInclude) Version 1.0

W3C Working Draft 17-July-2000
