The purpose of this document is to set forth a minimal set of requirements and introduce a processing model and syntax for a general purpose inclusion facility. Inclusion is accomplished by merging a number of XML Infosets into a single composite Infoset. Specification of the XML documents (infosets) to be merged and control over the merging process uses an XML-friendly syntax (elements, attributes, URI-References). The general purpose inclusion mechanism is usable in well-formed but not necessarily valid XML documents.
The XML Linking Working Group has decided to publish the XInclude proposal as a W3C Note from the XML Linking Working Group. This is the result of the evolution of the show="parsed" behaviour found in early XLink Working Drafts. It was decided that this functionality would be better handled in the core XML specification. Hence, at this time, this document is for discussion purposes only. This Note may be updated, replaced or rendered obsolete by other W3C documents at any time. It is inappropriate to use W3C Notes as reference material or to cite them as other than "work in progress". This document is for discussion only and does not imply endorsement by the W3C membership.
This document has been produced as part of the W3C XML Activity by the XML Linking Working Group. The XML Linking WG charter currently expires in March 2000, but can be extended, if necessary.
Please send detailed comments on this document to the archived mailing-list firstname.lastname@example.org.
A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.
This proposal offers a mechanism that enables composite infoset creation by including resources or subresources using element, attribute, and URI-reference syntax.
Many programming languages provide an inclusion mechanism to facilitate modularity. Many markup languages also have need of such a mechanism. Some examples of related "inclusions" in W3C technologies include:
The purpose of this document is to set forth a minimal set of requirements and introduce a processing model and syntax for a general purpose inclusion facility. The general purpose inclusion mechanism is usable in well-formed but not necessarily valid XML documents. The syntax consists of XML elements and attributes; unlike entities, it requires no DTD-style syntax.
The main audience is XML document authors who have need of modularity mechanisms in their production systems and language designs. While consolidating the special-purpose inclusion features already in use or development in the W3C is not a requirement for this effort, it is hoped that a general purpose mechanism would reduce future proliferation of such grammar-specific solutions.
Inclusion features differ from the linking features described in XLink in that they require specific behavior from the inclusion processor. This behavior occurs at a low level (the infoset), whereas XLink provides a mechanism for storing metadata for the use of higher-level applications (up to and including hypertext browsers). Inclusion also differs from transclusion, which is generally thought to provide much more contextual information (style sheets, schemas, etc.) than simple node inclusion.
Inclusion as defined in this document is a specific type of infoset transformation. A source infoset is transformed into a result infoset using the processing model specified in this document.
The infosets used or created by an XInclude processor support all required information items and properties as specified in the XML Infoset (http://www.w3.org/TR/xml-infoset), and may support optional properties as well. In addition, XInclude requires the Base URI property, which is optional in the XML Infoset, to be surfaced on information items.
The result of performing the inclusion transformation is a result infoset which merges the source infoset with the infosets of resources identified by URI-references appearing in xinclude:include elements. The transformation input consists of a source infoset. A mechanism to resolve URLs and return the identified resources as infosets is needed.
By defining inclusion as an infoset-to-infoset transformation, only a minimal extension to the XML Infoset is envisioned (making the base URI property required instead of optional). This approach simplifies the coordination problems between working groups, because we operate upon infosets instead of within them. One disadvantage of this strategy is that a single infoset does not simultaneously provide both "included" and "non-included" views. This simplification therefore does not address some interesting applications which need to determine which content has been included, such as editing a document with an included copyright notice and then using DAV to save the non-included part back to a server.
The existence of an include is asserted by an include element.
The XInclude namespace shall be associated with the following URI: http://www.w3.org/1999/XML/xinclude. The declaration of the include element and attributes will require the usual declaration of namespaces in XML.
NOTE: The authors suggest that the xml: namespace should be the namespace of the include element. The use of the xml: namespace allows all XML documents to reference the inclusion mechanism without requiring additional namespace declarations to support inclusion. As inclusion is useful to most or all XML vocabularies, we suggest that it is reasonable to add it to the xml: namespace. The authors do not suggest a mechanism for the W3C to determine the body that works on the specification of the xml: namespace.
When performing inclusions, an XInclude processor identifies an xinclude:include element in the source infoset and acquires the resource specified. The information set for the resource is created and merged with the source infoset. This process is repeated until all xinclude:include elements have been processed.
Issue (XInclude:3-nesting-optimization): The proposal implies that the destination documents are knitted before inclusion, which we agree is the right behaviour, but we need some way to optimise this (including an element which doesn't have any links in it, but is in a document which does, should not require following all the links in that document). [Richard Tobin]
The value of the href attribute on an xinclude:include is combined with the base URI of the xinclude:include element upon which it appears to describe a full URI-reference. The resource identified by this URI-reference is acquired and an infoset created, either by parsing the resource, or turning it into an infoset consisting of a single text information item. This latter behavior facilitates the inclusion of "working examples" into explanatory text. The two methods for creating an infoset are specified using the parse attribute, which has the values "xml", "text", or "cdata".
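The combination of href and base URI described above can be sketched in Python. This is only an illustration, not part of the proposal: the function name resolve_href and the example.org URIs are assumptions, and the fragment is returned uninterpreted (no XPointer evaluation is attempted).

```python
from typing import Tuple
from urllib.parse import urljoin, urldefrag


def resolve_href(base_uri: str, href: str) -> Tuple[str, str]:
    """Combine an include's href with the base URI of the include element.

    Returns (resource URI, fragment). The fragment would carry any
    XPointer part and is not evaluated here.
    """
    full = urljoin(base_uri, href)          # relative reference resolution
    resource, fragment = urldefrag(full)    # split off the '#...' part
    return resource, fragment


# Hypothetical base URI of the xinclude:include element.
base = "http://example.org/docs/source.xml"
print(resolve_href(base, "common.xml#xptr(a/b)"))
```

The resource part identifies what is to be acquired; how the fragment selects a subresource is left to the XPointer processing the proposal relies on.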
Resources that are unavailable for any reason result in an error. Resources that resolve to non-well-formed XML given the parse="xml" option result in an error. Resources that resolve to something other than text when parse="cdata" is specified result in an error.
xinclude:include elements in this infoset are recursively processed to the depth of their steps attribute, up to the maximum recursion depth specified by the steps attribute on the initial source document.
Issue (XInclude:21-steps-utility): The steps attribute is included to allow a parser to protect itself from combinatorial explosion of includes. Should steps be removed because it cannot solve the combinatorial include problem? It was pointed out in the XML Link face to face meeting that steps may be an incorrect mechanism to solve this problem. The issue is that an author will probably want to specify steps="*" for any and all nodes and inclusion, or the author will want to specify steps="1" to simply include the next set of nodes. It will be very difficult for an author to know the correct level of steps to prevent inclusion explosion. A separate mechanism must be used by an include processor to allow protection from this case.
Issue (XInclude:5-steps-violoation): Do we leave remaining xinclude:includes alone at that point or do we signal an error if any remain? In other words, is this an assertion of the max depth, or the definition of the max depth?
Issue (XInclude:23-include-indirection-using-locators): An advantage of XML 1.0 external entities is that external references may be defined and then re-used. Include could be augmented by the same notion of defining external entities and then consumption by include elements. A locator element could specify a remote resource using an href attribute and named by a name attribute. An include element could use an eref attribute instead of an href. The eref on the include would match the name attribute of the locator element. While it is possible for include to specify the href attribute via an xpointer expression, include does not currently allow a dereference of the resulting attribute. Example:
<locator href="xyz" name="abc"/><include eref="abc"/>
Issue (XInclude:6-shortcut-syntax): Should we investigate a shorthand for inclusion? The syntax for entity references is minimal (although the overhead of a DTD is not), and may provide a model.
David Orchard: External entities allow the definition and then use of a resource, currently defined in XML 1.0 DTDs. A proposal to combine XInclude and entities could involve the creation of a Locator element that specifies a name and an href. Similar to the include-indirection-using-locators issue, xinclude:include could add the attribute of eref (entityname) that specifies the name of a Locator element. This would be used instead of href when eref was present. Then parse and steps are used as normal. &name; means steps="1" and parse="xml", i.e. &name; is short for <xinclude:include steps="1" parse="xml" ename="name">. This involves changing the XML 1.0 specification for name references.
Paul Prescod notes [http://lists.w3.org/Archives/Member/w3c-xml-linking-ig/1999Aug/0211.html (W3C Members only)]: "You guys are in the same jam as the XML Schema people. You have to decide what level you want to work at and live with it. If you are working on infosets then &foo; syntax is long past resolved (or reported as an error) unless you are going to go in and hack the XML 1.0 specification."
When processing nested xinclude:include elements with parse="xml", it is an error to include a resource that contains an xinclude:include element containing a URI-reference that has already been processed in the inclusion chain.
In other words, the following are all legal inclusions:
parse="cdata" may reference itself.
The following are illegal inclusions:
xinclude:include element itself or any ancestor thereof.
xinclude:include element or ancestor thereof which has already been processed by a higher-level inclusion.
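One way an include processor might detect the loop condition above is to carry the chain of URI-references already processed and reject any reference that reappears in it. The sketch below is an assumption-laden illustration: find_inclusion_loop is a hypothetical name, the includes mapping stands in for actually acquiring and scanning resources, and only whole-resource references with parse="xml" are modelled.

```python
from typing import Dict, List, Optional


def find_inclusion_loop(start: str,
                        includes: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return the chain of URIs that forms a loop, or None if there is none.

    `includes` maps each resource URI to the URIs referenced by the
    xinclude:include elements it contains (a stand-in for real parsing).
    """
    def visit(uri: str, chain: List[str]) -> Optional[List[str]]:
        if uri in chain:                      # already in the inclusion chain
            return chain + [uri]
        for target in includes.get(uri, []):
            loop = visit(target, chain + [uri])
            if loop:
                return loop
        return None

    return visit(start, [])


# c.xml is included twice below, but never within its own chain: legal.
legal = {"a.xml": ["b.xml", "c.xml"], "b.xml": ["c.xml"]}
# a.xml -> b.xml -> c.xml -> a.xml closes a loop: illegal.
illegal = {"a.xml": ["b.xml"], "b.xml": ["c.xml"], "c.xml": ["a.xml"]}

print(find_inclusion_loop("a.xml", legal))
print(find_inclusion_loop("a.xml", illegal))
```

Tracking the chain rather than a global "seen" set keeps repeated inclusion of the same resource from different branches legal, which matches the wording "already been processed in the inclusion chain".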
Issue (XInclude:7-steps-redundant): Do you want to outlaw (section 3.3.1) apparent recursion that is in fact prevented by the steps attribute? It would be nice to be able to use circular links to build circular data structures. [Richard Tobin]
An XInclude processor is by definition namespace aware, and performs namespace processing as described in the Infoset WD. The namespace URI is thus considered part of the element information item, and merging the infosets preserves the namespace of the item. The outcome can therefore differ from a simple cut and paste of XML sources: a serialized result infoset may contain additional namespace declarations when including a sub-resource.
For example, the following document:
<foo xmlns:x="uri1">
  <xinclude:include href="common.xml#xptr(a/b)"/>
</foo>
including a node from common.xml:
<a xmlns:x="uri2">
  <b>
    <x:a/>
  </b>
</a>
results in a document that could be serialized as:
<foo xmlns:x="uri1">
  <b xmlns:x="uri2">
    <x:a/>
  </b>
</foo>
This differs from a text-level copy and paste in that it retains the integrity of the items from the uri2 namespace. A straight copy and paste could result in either the remapping of element names to an unintended namespace, or a document that is not well-formed with respect to namespaces.
Applications performing serialization of the result infoset are not constrained on where they place the namespace declarations, as long as the result preserves the namespaces of the included items.
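The namespace behavior in this example can be observed with a namespace-aware parser. The following is a rough sketch using Python's xml.etree.ElementTree, which stores each element name together with its namespace URI. It is not an XInclude implementation: the xinclude prefix is declared explicitly, both documents are inline strings, and a plain child lookup stands in for evaluating xptr(a/b).

```python
import xml.etree.ElementTree as ET

XI = "http://www.w3.org/1999/XML/xinclude"

source = ET.fromstring(
    '<foo xmlns:x="uri1" xmlns:xinclude="%s">'
    '<xinclude:include href="common.xml#xptr(a/b)"/>'
    '</foo>' % XI
)
common = ET.fromstring('<a xmlns:x="uri2"><b><x:a/></b></a>')

# Stand-in for evaluating xptr(a/b) against common.xml.
included = common.find("b")

# Replace the include element with the selected item, keeping its position.
include_el = source.find("{%s}include" % XI)
index = list(source).index(include_el)
source.remove(include_el)
source.insert(index, included)

# The included x:a element still belongs to uri2, not to the
# uri1 binding of the x prefix in the including document.
print(source[0][0].tag)   # {uri2}a
```

Because the name is carried as a (namespace URI, local name) pair, a serializer is free to choose where to emit the declarations, as the paragraph above allows.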
The acquired infoset is merged with the source infoset to create a new infoset by replacing the information items representing the xinclude:include element with information items in the acquired infoset. The xinclude:include element, its attributes and any children, are not represented in the result infoset.
The base URI property of the acquired infoset is not changed as a result of merging, so each included information item retains its original base URI.
Issue (XInclude:2-base-uri-syntax): A reserialised document will lose the base URL information; do we need an [xinclude:base-url] attribute that can be added to any element? [Richard Tobin]
Issue (XInclude:8-internationalization): Do we need to specify anything about merging documents with different character sets? For instance, what about merging a Japanese document with a US-ASCII document? Or does the Infoset already imply normalization into Unicode?
Issue (XInclude:9-include-children-syntax): Should we have a simpler mechanism for including the children of the identified node? This would be especially useful for simulating text entities:
<entity-declaration id="nbsp"> </entity-declaration>
<xinclude:include href="#xptr(id('nbsp')/text())"/>
Could be instead specified by something like:
<xinclude:include href="#nbsp" children-only="yes"/>
or
<xinclude:include-children href="#nbsp"/>
An acquired infoset will often represent a complete XML document. In this case the top-level children of the document information item will replace the xinclude:include element, in the order in which they appear in the acquired infoset. This applies to comments, processing instructions, and the document element.
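The replacement of an include element by the top-level children of a complete document might look roughly like the following, using Python's xml.dom.minidom as a stand-in for an infoset. The function name merge and the sample strings are assumptions made for illustration; resource acquisition, steps, and error handling are left out, and only the first include element is handled.

```python
from xml.dom import minidom

XI_NS = "http://www.w3.org/1999/XML/xinclude"


def merge(source_xml: str, acquired_xml: str) -> str:
    """Replace the first xinclude:include in source_xml with the top-level
    children (comments, PIs, document element) of acquired_xml."""
    source = minidom.parseString(source_xml)
    acquired = minidom.parseString(acquired_xml)

    include = source.getElementsByTagNameNS(XI_NS, "include")[0]
    parent = include.parentNode

    for node in list(acquired.childNodes):
        if node.nodeType == node.DOCUMENT_TYPE_NODE:
            continue  # a doctype is not a candidate for inclusion
        # Copy into the source document, in document order.
        parent.insertBefore(source.importNode(node, True), include)

    # The include element itself is not represented in the result.
    parent.removeChild(include)
    return source.documentElement.toxml()


SOURCE = ('<mydocument xmlns:xinclude="%s"><p>before</p>'
          '<xinclude:include href="myurl.xml"/><p>after</p></mydocument>' % XI_NS)
ACQUIRED = '<!--legal--><copyright><p>Copyright notice</p></copyright>'

print(merge(SOURCE, ACQUIRED))
```

The XML declaration of the acquired document never appears as a node, so it drops out of the result without special handling.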
The XML declaration in the included document is used to parse the external file but the declaration itself is not preserved in the resulting infoset.
Issue (XInclude:10-whitespace): Are there any tricky whitespace issues to consider?
Issue (XInclude:11-fragments): XSLT transformations can operate on document fragments. Should we expand our scope to include fragments as well?
Issue (XInclude:22-document-element): Should something other than ignoring the document element declaration be performed on document nodes?
An xinclude:include may identify a subresource that consists of more than a single information item. In this case these information items replace the information item representing xinclude:include in the order in which they appear in the included document.
If the document element in the source infoset is an xinclude:include, it is an error to attempt to replace it with more than a single element.
Issue (XInclude:24-range-inclusions): When the href attribute specifies a range, should the range nodes be included in order or should ranges be disallowed. Currently decided to disallow.
An href with an XPointer may identify an attribute or a collection of nodes containing an attribute. Attempting inclusion of attributes results in an error.
Issue (XInclude:12-ignore-attributes): Should attempted inclusion of attributes be ignored instead of generating an error?
An href with an XPointer may identify a location that is not a node, such as a range or a string location. Attempting inclusion of non-node locations results in an error.
Issue (XInclude:13-substring-inclusion): Should we support inclusion of arbitrary ranges? What about simple text substrings?
Issue (XInclude:26-cdata-inclusion): Should we support the inclusion of characters within a CDATA section?
An XInclude processor is a class of XML processor that conforms to all the behavior of the XML and XML Namespaces Recommendations, and additionally supports the inclusion behavior specified in this document. For purposes of this document, the term "XInclude processor" includes all the functionality of an "XML Processor".
Note that a simple application-defined switch would be sufficient to flip between XML processors and XInclude processors.
The XML component shipped with Microsoft Internet Explorer 5 can operate as either a validating parser or a non-validating one. This option is provided to a user through the addition of a property to the DOM document object, called validateOnParse. A similar switch (processIncludes) could be provided for toggling between a normal XML processor and an XInclude processor.
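// JScript. processIncludes is the hypothetical switch suggested in this document,
// not an existing property of the Microsoft XML component.
var xmldoc = new ActiveXObject("Microsoft.XMLDOM");
xmldoc.validateOnParse = false;  // do not validate while parsing
xmldoc.processIncludes = true;   // request XInclude processing
xmldoc.load("source.xml");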
Issue (XInclude:28-document-level-switch): Ben Trafford: The example used to switch between the processors is fine as programming code, but how could someone specifically make a request within the document to activate the inclusion mechanism? I'm envisioning people using an XInclude-aware processor that only processes the inclusions when requested to (via a stylesheet mechanism seems the most obvious way to me).
An XInclude processor may expose the base URI of a document, element, or processing instruction information item. This enables applications which resolve URI References to process them correctly. Two examples where this is necessary are XLink and the xml-stylesheet processing instruction.
Issue (XInclude:14-exposing-base-url): Should exposure of this information be required? It appears necessary for applications that wish to operate on URIs in the result.
The base URI information could be provided through the addition of a method to the DOM node object. Rather than simply exposing the base as a property, it may be more useful instead to be able to resolve URI-references using the base. The resolveURI method is passed a relative URI-reference, which is resolved to a full URI-reference using the base URI of a particular node.
This example finds the first xlink:simple element, extracts the relative URI-reference, and resolves it into a full URI-reference in context of the xlink:simple element.
// resolveURI is the method suggested in this document; it is not part of the W3C DOM.
var xmlnode = xmldoc.getElementsByTagName("xlink:simple").item(0);
var relativeURI = xmlnode.attributes.getNamedItem("href").nodeValue;
var fullURI = xmlnode.resolveURI(relativeURI);
Issue (XInclude:25-included-node-attribute): Should an XInclude processor expose attribute(s) indicating whether a node was included or not? If so, should any relevant inclusion information be present, such as the include href, steps and parse?
XML 1.0 validation is not performed on the results of the inclusion, nor on the included elements. The include mechanism introduces the notion of infoset validation. After all inclusions are completed, an include processor will validate the infoset against the original document's DTD if it contains a doctype declaration.
NOTE: The DTD or Schema used for validation may need to be adjusted when running a particular document through an XML processor instead of an XInclude processor. A validating XInclude document is not necessarily a validating XML document, and vice versa.
Issue (XInclude:15-validation-relationship): I do not believe that XInclude should hard-code its relationship to schema validation. If I want to write an application that does inclusion and then validates the resulting document, I should be allowed to. [Paul Prescod: http://lists.w3.org/Archives/Member/w3c-xml-linking-ig/1999Aug/0211.html (W3C Members only)]
Issue (XInclude:16-dtd-validation): Technically speaking, XInclude inclusion *cannot* occur before DTD validation. DTD validation is done by the XML processor: by definition it is accomplished before an information set is created. If you want DTD-syntax validation that works on information sets then you need to specify it yourself as the HyTime people did. SGML and XML just do not support it natively. [Paul Prescod: http://lists.w3.org/Archives/Member/w3c-xml-linking-ig/1999Aug/0211.html (W3C Members only)]
From Ben Trafford: Couldn't you guys define a normative addition to the internal subset that would allow for XInclude validation, and then state that an XInclude-aware processor makes this addition to the infoset based on the parsing of the internal subset? Basically, a 'virtual internal subset'.
The intersection of IDs and IDREFs with the inclusion mechanism surfaces a few issues with respect to XML and inclusion infoset validation.
If an attribute declares an ID that has already been declared, processing is the same as if duplicate IDs had been encountered in a single XML document. This condition would be discovered during infoset validation, after all inclusions are performed. For example, processing could be halted and an appropriate error surfaced.
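A post-inclusion check for duplicate IDs could be sketched as below. This is only an illustration under a loud assumption: without a DTD there is no way to know which attributes have type ID, so the sketch treats an attribute literally named id as the ID attribute. The function name duplicate_ids and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List


def duplicate_ids(root: ET.Element, id_attr: str = "id") -> List[str]:
    """Return ID values that occur more than once in the merged tree.

    Assumption: the attribute named `id_attr` plays the role of an
    ID-typed attribute; a real processor would consult the DTD.
    """
    counts = Counter(
        el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


# A result infoset in which an included section repeats id="a".
merged = ET.fromstring(
    '<doc><p id="a"/><section><p id="a"/><p id="b"/></section></doc>'
)
print(duplicate_ids(merged))   # ['a']
```

A processor following the text above would run such a check only after all inclusions are complete, then halt or report an error if the list is non-empty.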
ID rewriting is a possibility for inclusion of documents with nodes containing IDs. The following condition may occur: An including document contains an ID. The inclusion specified is to the subnode of a separate document. The separate document contains the same ID outside the scope of the inclusion, and the inclusion scope contains an IDREF to the ID. It is unclear whether this should be an error condition or not. It is conceivable that authors would design their modularity to use this aspect of IDs. It is also possible that IDs should be rewritten to be local to the scope of the document.
This proposal suggests that ID rewriting should not be performed. In the previous use-case, the document will infoset validate if the infoset after inclusion contains an IDREF to an ID that is in the document.
Issue (XInclude:17-id-validation-redundant): ID validation is merely a schema validation issue and should not be separated out as its own "point." [Paul Prescod: http://lists.w3.org/Archives/Member/w3c-xml-linking-ig/1999Aug/0211.html (W3C Members only)]
The relationship between XInclude and other XML standards is defined by the concept of an 'XInclude processor'. Such a processor leverages XML 1.0 and XML Namespaces in its syntax, and uses the XML Infoset to describe a specific processing model. In general, XInclude processing should occur between the generation of an Infoset by a processor, and the consumption of that infoset by a higher-level application, so that the inclusion results are transparent to those applications.
Although XInclude may be implemented as an independent layer, it also may be implemented at a lower level with the same results, but with potentially greater performance.
The relationship between XInclude and DTD or XML Schema validation needs additional exploration (as noted by issues within this document). In particular DTD validation as defined in XML 1.0 does not support validation of the result infoset within this 'layered' strategy.
Issue (XInclude:27-schema): Are there any requirements in particular that the Schema WG has of XInclude?
The syntax for specifying inclusion is an element similar to the simple links defined by XLink. XInclude defines a namespace associated with the URI http://www.w3.org/1999/XML/xinclude.
The XInclude namespace contains a single element, xinclude:include, with the following attributes:
<!ELEMENT xinclude:include EMPTY>
<!ATTLIST xinclude:include
  href  CDATA #REQUIRED
  steps CDATA "1"
  parse CDATA "xml"
>
NOTE: A new element name is used instead of a simple link with a new value for the show attribute to differentiate the inclusion link from XLinks. Differentiation is desirable since the processing models are quite different, namely XInclude defines a specific processing model, while XLink does not.
Issue (XInclude:18-role-syntax): Should the syntax be expressed instead through a qualified role value? [Paul Prescod in http://lists.w3.org/Archives/Member/w3c-xml-linking-ig/1999Aug/0209.html (W3C Members only)]
Issue (XInclude:19-role-syntax-2): It would also be possible for it to be a completely different [xlink:type] value, making it clearer that other XLink attributes were not relevant. [Richard Tobin]
xinclude:include element according to the semantics given in this specification.
The following XML document fragments specify a document containing an xinclude:include element which points to an external document.
<?xml version='1.0'?>
<mydocument xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>I'm asking a silly question about the following content</p>
  <xinclude:include steps="1" href="myurl.xml"/>
  <p>Now, what did the included content tell us?</p>
</mydocument>

<?xml version='1.0'?>
<copyright>
  <p>Copyright notice for all content by the W3C</p>
</copyright>

The result of processing this document with an XInclude processor is the same as the result of processing the following document with an XML processor.

<?xml version='1.0'?>
<mydocument>
  <p>I'm asking a silly question about the following content</p>
  <copyright>
    <p>Copyright notice for all content by the W3C</p>
  </copyright>
  <p>Now, what did the included content tell us?</p>
</mydocument>
The following XML document includes a working example, once with parse="cdata" and once with parse="text".
<?xml version='1.0'?> <mydocument xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"> <p>The following is the source of the "data.xml" file:</p> <example><xinclude:include href="data.xml" parse="cdata"/></example> <example><xinclude:include href="data.xml" parse="text"/></example> </mydocument>
<?xml version='1.0'?> <data> <item><![CDATA[Brooks & Sheilds]]></item> </data>
The result of processing this document with an XInclude processor is the same as the result of processing the following document with an XML processor.
<?xml version='1.0'?> <mydocument xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"> <p>The following is the source of the "data.xml" file:</p> <example><![CDATA[<data> <item><![CDATA[Brooks & Sheilds]]]]><![CDATA[></item> </data>]]></example> <example><data> <item><![CDATA[Brooks & Sheilds]]></item> </data></example> </mydocument>
Note that CDATA notation can itself be escaped by replacing occurrences of "]]>" with "]]]]><![CDATA[>". At the DOM level, this may mean several CDATA nodes may result from an inclusion, instead of just one.
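The escaping rule above is a plain string substitution. A minimal sketch in Python follows; the function name to_cdata_sections is an assumption, and the wrapper element e in the usage lines exists only to give a parser something well-formed to read back.

```python
import xml.etree.ElementTree as ET


def to_cdata_sections(text: str) -> str:
    """Wrap text in CDATA, splitting wherever the text itself contains ']]>'."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# The data.xml body from the example above, taken as raw text.
raw = "<data> <item><![CDATA[Brooks & Sheilds]]></item> </data>"
escaped = to_cdata_sections(raw)
print(escaped)

# Reading the escaped form back yields the original characters.
assert ET.fromstring("<e>" + escaped + "</e>").text == raw
```

The round trip shows why the split is safe: the two adjacent CDATA sections contribute "]]" and ">" respectively, which concatenate back into the original "]]>".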
While the intention of XInclude is not to replace the xsl:include functionality of XSLT, it may be useful to examine how these features could have been formulated to leverage the benefits of XInclude. xsl:include could be replaced by xinclude:include by allowing the xsl:stylesheet element to appear within itself. If the following stylesheet were allowed,
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <xsl:apply-templates/> </xsl:template> <xsl:stylesheet> <xsl:template match="foo"> <foo2><xsl:apply-templates /></foo2> </xsl:template> </xsl:stylesheet> </xsl:stylesheet>
and if the processor were directed to ignore the extra level of hierarchy introduced by the nested xsl:stylesheet element, it is easy to see that the nested xsl:stylesheet element could be placed in an external file and replaced by an xinclude:include. XSLT currently handles this case by conceptually stripping the extra level of hierarchy during the inclusion process.
The case where the included stylesheet uses the "single template structure" is more problematic, as the trivial solution requires the author of the including stylesheet to have knowledge of this structure:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xinclude:include href="sing-temp-ss.xsl" xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"/>
  </xsl:template>
</xsl:stylesheet>
XSLT currently handles this case by implying the existence of the xsl:template match='/' wrapper when it detects a stylesheet of this form.
xsl:import is similar to xsl:include, differing only in that it assigns "importance" levels to each file. This could be accomplished with a wrapper element. The following stylesheet is shown first in its current form and then rewritten with a hypothetical xsl:importance wrapper:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:import href="base-stylesheet.xsl"/> <xsl:template match="foo"> <foo2><xsl:apply-templates/></foo2> </xsl:template> </xsl:stylesheet>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:importance> <xinclude:include href="base-stylesheet.xsl"/> </xsl:importance> <xsl:template match="foo"> <foo2><xsl:apply-templates/></foo2> </xsl:template> </xsl:stylesheet>
Note that the include functionality becomes completely orthogonal to determining importance.
XLink defines no processing model for links, only a syntax that enables link detection. XInclude, on the other hand, defines a clear processing model at the infoset level. XLinks are processed at a high level (the application) while XIncludes are processed at a low level (infoset). Because of these differences, this proposal considers XInclude to be a different kind of link than those described in XLink. It would also be reasonable to consider XInclude as a specific type of XLink. This appendix explores the syntax that might arise from this approach.
Modification of the xlink:simple element to provide inclusion services would require the steps and parse attributes to be added to simple, and the now-defunct parse value to be used for show. The steps and parse attributes could have the same defaults in xlink:simple as in the proposed xinclude:include.
Repeating the simple include example,
<foo xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"> <xinclude:include href="common.xml#xptr(a/b)"/> </foo>
The minimal xlink:simple comparison is shown below.
<foo xmlns:xlink="http://www.w3.org/1999/xlink/namespace">
  <xlink:simple href="common.xml#xptr(a/b)" show="parse" actuate="auto"></xlink:simple>
</foo>
A maximal xlink:simple comparison is shown below.
<foo xmlns:xlink="http://www.w3.org/1999/xlink/namespace">
  <xlink:simple href="common.xml#xptr(a/b)" show="parse" actuate="auto"
                parse="xml" steps="1" title="commonxmlfile" role="container"></xlink:simple>
</foo>
Modification of the xlink extended element would require the steps and parse attributes to be added to arc, and the now-defunct parse value to be used for show. The steps and parse attributes could have the same defaults in xlink:arc as in the proposed xinclude:include. Repeating the simple include example,
<foo xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <xinclude:include href="common.xml#xptr(a/b)"/>
</foo>
The minimal xlink:extended comparison is shown below.
<foo xmlns:xlink="http://www.w3.org/1999/xlink/namespace"> <xlink:extended id="mylink1"> <xlink:arc show="parse" actuate="auto" from="linksource" to="target"/> <xlink:locator role="linksource" href="#xptr(here())"/> <xlink:locator role="target" href="common.xml#xptr(a/b)"/> </xlink:extended> </foo>
The following issues were raised at the XML Link F2F in September 1999:
The following shows the timeline of the XML Inclusion facility:
A tabulation of open issues flagged above follows: