Copyright © 2000 W3C® ( MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document specifies a processing model and syntax for general purpose inclusion. Inclusion is accomplished by merging a number of XML Infosets into a single composite Infoset. Specification of the XML documents (infosets) to be merged and control over the merging process is expressed in XML-friendly syntax (elements, attributes, URI References).
The XML Core Working Group, with this 2000 July 17 XInclude working draft, invites comment on this specification.
The W3C Membership and other interested parties are invited to review the specification, provide comment, and report early implementation experience. The area of work covered by this specification was outlined in the XML Inclusion Proposal (XInclude), W3C Note of 23 November 1999 [XInclude]. The purpose of publishing this draft is to update the community on our progress in this area and to solicit feedback on the current draft. It should be noted that the WG plans to take this specification to a Last Call review in the near future.
While the WG has decided to publish this working draft, outstanding issues remain as noted in the draft. Especially note the change from an element-based syntax to an attribute-based syntax. This change is being revisited by the WG and feedback is specifically solicited on this issue.
Comments on this document should be sent to www-xml-xinclude-comments@w3.org, which is publicly archived. While we welcome implementation experience reports, the XML Core Working Group will not allow early implementation to constrain its ability to make changes to this specification prior to final release.
It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current W3C working drafts can be found at http://www.w3.org/TR/.
Many programming languages provide an inclusion mechanism to facilitate modularity. Markup languages also often have need of such a mechanism. This proposal introduces a generic mechanism for merging XML documents (as represented by their information sets). The syntax leverages existing XML constructs: attributes and URI references.
The requirements used to guide the development of XInclude may be found in the XML Inclusion Proposal W3C Note of 23 November 1999 [XInclude].
XInclude differs from the linking features described in the XML Linking Language [XLink], specifically links with the attribute value show="embed". Such links provide a media-type independent syntax for indicating that a resource is to be embedded graphically within the display of the document. XLink does not specify a particular processing model, but simply facilitates the detection of links and recognition of associated metadata by a higher level application.
XInclude, on the other hand, specifies a media-type specific (XML into XML) transformation. It defines a specific processing model for merging information sets. XInclude processing occurs at a low level, often by a generic XInclude processor which makes the resulting information set available to higher level applications.
Simple node inclusion as described in this specification differs from transclusion, which preserves contextual information such as style.
There are a number of differences between XInclude and XML external entities [XML] which make them complementary technologies.
Processing of external entities (as with the rest of DTDs) occurs at parse time. XInclude operates on information sets and thus is orthogonal to parsing.
Declaration of external entities requires a DTD or internal subset. This places a set of dependencies on inclusion, for instance, the syntax for the DOCTYPE declaration requires that the document element be named - clearly orthogonal to inclusion in many cases. Validating parsers must have a complete content model defined. XInclude is orthogonal to validation and the name of the document element.
External entities provide a level of indirection: the external entity must be declared and named, and separately invoked. XInclude uses direct references. Applications which generate XML output incrementally can benefit from not having to pre-declare inclusions.
The syntax for an internal subset is cumbersome to many authors of simple well-formed XML documents. XInclude syntax is based on familiar XML constructs.
Note also that XInclude together with XPointer [XPointer] can replace certain forms of internal entities, although XInclude syntax is not optimized for this purpose.
XInclude defines no relationship to DTD validation. XInclude describes an infoset-to-infoset transformation and not a change in XML 1.0 parsing behavior. XInclude does not define a mechanism for DTD validation of the resulting infoset.
XInclude defines no relationship to the augmented infosets produced by applying an XML Schema. Such an augmented infoset can be supplied as the input infoset, or such augmentation may be applied to the infoset resulting from the inclusion.
Special-purpose inclusion mechanisms have been introduced into specific XML grammars. XInclude provides a generic mechanism for recognizing and processing inclusions, and as such can offer a simpler overall authoring experience, greater performance, and less code redundancy. The attribute-based syntax facilitates incorporation of inclusion capabilities into other markup languages.
The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [IETF RFC 2119].
Inclusion as defined in this document is a specific type of infoset transformation. A source infoset as defined by the XML Infoset [XML Infoset] is transformed into a result infoset according to the rules specified in this document.
The input for the inclusion transformation consists of a source infoset. The output is a new infoset which merges the source infoset with the infosets of resources identified by URI references appearing in include elements. Thus a mechanism to resolve URI references and return the identified resources as infosets is assumed. Well-formed XML entities that do not have defined infosets (e.g. an external entity file with multiple top-level elements) are outside the scope of this specification, either as source infosets or as the result infoset.
There is no attempt to preserve information in the result infoset indicating where inclusion has been performed - for this information the original infoset must be examined.
Issue (XInclude-44-preserve-include-element): We could preserve this information rather trivially by describing an Infoset addition, for instance a property [include-element] defined to contain the element information item for the include element in the original infoset. Applications unaware of this property would treat the inclusion as transparent; applications aware of it (e.g. editors) could use it to switch between included and non-included "views" of the document. I don't see any harm in providing such an optional property.
The existence of an include is asserted by an include element, which is any element bearing the attributes required by this specification.
Each include element in the source infoset is examined, its xinclude:href attribute dereferenced and parsed into an infoset, and this infoset merged with the source infoset, as described in following sections.
The order in which include elements are processed is not defined by this specification. As a consequence, intra-document references within include elements must be resolved against the original infoset rather than against some intermediate state.
In the following example, the second include points to the first <myinclude> element itself, not to its replacement.
<x xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <myinclude xinclude:href="something.xml"/>
  <myinclude xinclude:href="#xpointer(x/myinclude[1])" xinclude:parse="text"/>
</x>
The value of the xinclude:href attribute is a URI reference. The set of characters allowed in an xinclude:href attribute is the same as for XML, namely [Unicode]. However, some Unicode characters are disallowed from URI references, and thus processors must encode and escape these characters to obtain a valid URI reference from the attribute value.
The disallowed characters include all non-ASCII characters, plus the excluded characters listed in Section 2.4 of [IETF RFC 2396], except for the crosshatch (#) and percent sign (%) characters and the square bracket characters re-allowed in [IETF RFC 2732]. Disallowed characters must be escaped as follows:
Each disallowed character is converted to UTF-8 [IETF RFC 2279] as one or more bytes.
Any octets corresponding to a disallowed character are escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the hexadecimal notation of the byte value).
The original character is replaced by the resulting character sequence.
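The three escaping steps above can be sketched in Python. This is a minimal illustration, not a reference implementation; it assumes our reading of the disallowed set: all non-ASCII characters, the RFC 2396 excluded characters (control characters, space, and < > " { } | \ ^ `), with #, % and the square brackets left alone. The names `escape_href` and `is_disallowed` are hypothetical.

```python
# Assumed reading of the rule: RFC 2396 section 2.4 excluded characters,
# minus '#' and '%', and minus '[' and ']' (re-allowed by RFC 2732).
_EXCLUDED_ASCII = set(' <>"{}|\\^`')


def is_disallowed(ch: str) -> bool:
    code = ord(ch)
    # Non-ASCII, C0 controls, DEL, or one of the excluded ASCII characters.
    return code > 0x7F or code < 0x20 or code == 0x7F or ch in _EXCLUDED_ASCII


def escape_href(value: str) -> str:
    """Escape disallowed characters of an xinclude:href value."""
    out = []
    for ch in value:
        if is_disallowed(ch):
            # Step 1: convert the character to UTF-8 bytes.
            # Step 2: escape each byte as %HH.
            # Step 3: the %HH sequence replaces the original character.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)
```

For example, `escape_href("dir/my file.xml")` yields `dir/my%20file.xml`, while a fragment such as `#xpointer(x/myinclude[1])` passes through unchanged.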
The URI reference is combined with the base URI of the include element as specified in XML Base [XML Base]. The resource identified by the fully escaped and encoded URI reference is acquired, and an infoset or collection of information items is created, either by parsing the resource as XML or by treating the resource as a plain text file and converting it into a collection of character information items. This latter behavior allows the inclusion of "working examples" into explanatory text. Which of the two methods is used to create the infoset is determined by the xinclude:parse attribute, which must take either the value "xml" or the value "text".
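The acquisition step can be sketched in Python as follows. This is a hedged sketch under several assumptions: the href is taken to be already escaped, `fetch` is a hypothetical resolver callable (URI to bytes) standing in for whatever mechanism a processor uses, fragment identifiers are stripped rather than evaluated, and text resources are decoded as UTF-8, which this draft does not specify.

```python
from urllib.parse import urldefrag, urljoin
from xml.dom import minidom


def acquire(base_uri, href, parse, fetch):
    """Resolve href against base_uri and build the content to include.

    fetch is a hypothetical callable mapping an absolute URI to bytes;
    it should raise if the resource is unavailable.
    """
    if parse not in ("xml", "text"):
        raise ValueError(f"parse must be 'xml' or 'text', not {parse!r}")
    # Combine with the base URI (XML Base); drop any fragment identifier,
    # since evaluating XPointers is outside this sketch.
    uri, _fragment = urldefrag(urljoin(base_uri, href))
    data = fetch(uri)
    if parse == "xml":
        # Raises xml.parsers.expat.ExpatError on non-well-formed input.
        return minidom.parseString(data)
    # parse="text": each Python character stands in for one character
    # information item. UTF-8 decoding is an assumption.
    return data.decode("utf-8")
```

With a dictionary as the resolver, `acquire("http://example.org/docs/main.xml", "disclaimer.xml", "xml", store.__getitem__)` would parse `http://example.org/docs/disclaimer.xml`; an unavailable resource surfaces as the resolver's own error, matching the rule that such resources result in an error.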
Issue (XInclude-53-parse-text-fragments): Presumably a URI reference for an include with parse="text" should not be allowed to contain a fragment identifier (XPointer), since there is no infoset, just a collection of character items. Or do we want to allow ranges of characters from a text file? This seems to be getting into the same area as infosets-from-external-entities.
Note that the character encodings of the including and included resources can be different. This does not affect the resulting infoset, but may need to be taken into account during any subsequent serialization.
Resources that are unavailable for any reason result in an error. Resources that resolve to non-well-formed XML when parse="xml" is specified result in an error. Resources that resolve to something other than text when parse="text" is specified result in an error.
Issue (XInclude-45-fail-text): It is easy to see how to fail a non-xml resource - it's not well-formed. Is there a similarly well-defined mechanism for determining the success of a parse="text" inclusion? Or do we need to rely on the media type text/*? (We intentionally don't rely on text/xml, as we want to enable things like image/svg.)
When parse="xml" is specified, include elements in the infoset are recursively expanded.
When processing nested include elements with parse="xml", it is an error to include a resource that contains an include element containing a URI reference that has already been processed in the inclusion chain.
Issue (XInclude-46-compare-uris): (Oh no not again!) Is the literal value of the URI Reference compared, or its absolutized and escaped version?
In other words, the following are all legal:
An include element with parse="text" may reference itself.
An include element may identify a different part of the same local resource.
Two non-nested include elements may identify a resource which itself contains an include element.
The following are illegal:
An include element pointing to itself or any ancestor thereof.
An include element pointing to any include element or ancestor thereof which has already been processed at a higher level.
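The recursive expansion and the inclusion-chain rule can be sketched in Python with ElementTree. This is a minimal sketch that only models whole-document inclusion with parse="xml": XPointer fragments and parse="text" (including the legal self-reference) are not handled. It compares URI references literally, which is one possible answer to the open issue XInclude-46. The `docs` dictionary, `resolve`, and `InclusionLoopError` are hypothetical. Because the chain is tracked per path rather than globally, two sibling includes of the same resource remain legal.

```python
import xml.etree.ElementTree as ET

# Namespace URI from this draft, in ElementTree's {uri}local notation.
XI = "{http://www.w3.org/1999/XML/xinclude}"


class InclusionLoopError(Exception):
    """Raised when a URI reference recurs within one inclusion chain."""


def resolve(elem, chain, docs):
    """Return elem with its include elements expanded (parse="xml" only).

    chain: tuple of URI references already being processed on this path.
    docs:  hypothetical dict mapping a URI reference to XML text.
    """
    href = elem.get(XI + "href")
    if href is not None:
        # Literal comparison of URI references (see issue XInclude-46).
        if href in chain:
            raise InclusionLoopError(f"{href!r} already in chain {chain}")
        included = ET.fromstring(docs[href])
        # Recurse with the extended chain; the include element and its
        # children are discarded.
        return resolve(included, chain + (href,), docs)
    for index, child in enumerate(list(elem)):
        replacement = resolve(child, chain, docs)
        if replacement is not child:
            replacement.tail = child.tail  # keep text that followed the include
            elem[index] = replacement
    return elem
```

A top-level call passes the document's own URI as the initial chain, e.g. `resolve(ET.fromstring(docs["main.xml"]), ("main.xml",), docs)`.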
A source infoset might contain namespace information items. The namespace URI property is considered to be part of the element information item, and merging infosets preserves the namespace of the item. This can produce a result different from a simple cut and paste of XML source text. A serialized result infoset might contain additional namespace declarations when including a sub-resource.
For example, the following document:
<foo xmlns:x="uri1">
  <include xmlns:xinclude="http://www.w3.org/1999/XML/xinclude" xinclude:href="common.xml#xpointer(a/b)"/>
</foo>
including a node from common.xml:
<a xmlns:x="uri2">
  <b>
    <x:a/>
  </b>
</a>
results in a document that could be serialized as:
<foo xmlns:x="uri1">
  <b xmlns:x="uri2">
    <x:a/>
  </b>
</foo>
This differs from a text-level copy and paste in that it retains the integrity of the items from the uri2 namespace. A straight copy and paste could result in either the remapping of element names to an unintended namespace, or a document that is not well-formed with respect to namespaces.
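The namespace example above can be reproduced with Python's ElementTree, which stores each element's namespace name with the element (as `{uri}local`), so a merged item keeps its namespace whatever prefixes the host document declares. This is an illustrative sketch: `common.find("b")` is a hypothetical stand-in for evaluating `xpointer(a/b)`, and the prefixes ElementTree chooses on output are its own, which illustrates that serialization is not constrained here.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring(
    '<foo xmlns:x="uri1">'
    '<include xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"'
    ' xinclude:href="common.xml#xpointer(a/b)"/>'
    '</foo>'
)
common = ET.fromstring('<a xmlns:x="uri2"><b><x:a/></b></a>')

# Hypothetical stand-in for xpointer(a/b): the b child of the root a.
target = common.find("b")

# Replace the include element with the selected item.
source[0] = target

# The included x:a still belongs to uri2, not to the uri1 bound to x in foo.
print(source[0][0].tag)
print(ET.tostring(source, encoding="unicode"))
```

A textual copy and paste of `<x:a/>` into `<foo>` would instead have silently rebound it to `uri1`.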
Serialization, and specifically where additional namespace declarations might appear, is not constrained by this specification.
Issue (XInclude-52-infoset-properties): We specifically say that the namespace name property of an element is preserved when the infosets are merged. What about the in-scope namespaces property? This seems to be needed so that qnames in the included nodes can be resolved. What about the "declared namespaces" property? More generally, should there be a list of infoset properties that must be preserved or deleted?
The acquired infoset is merged with the source infoset to create a new infoset by replacing the information items representing the include elements with information items in the acquired infoset. The include element, its attributes, and any children are not represented in the result infoset.
The base URI property of the acquired infoset is not changed as a result of merging. Thus relative URIs in the included infoset resolve to the same URI despite being included into a document with a potentially different base URI in effect.
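A toy model in Python shows why keeping each item's base URI matters. Here every item is a hypothetical `(relative_uri, base_uri)` pair. Merging copies the pairs unchanged, so a relative reference from the included document still resolves against that document's location. The URIs and helper names are illustrative assumptions.

```python
from urllib.parse import urljoin


def merge(source_items, include_index, included_items):
    """Replace the include item with the included items.

    Items are hypothetical (relative_uri, base_uri) pairs; no base URI is
    rewritten during the merge.
    """
    return source_items[:include_index] + included_items + source_items[include_index + 1:]


def resolve_all(items):
    """Resolve every relative URI against the base URI it carries."""
    return [urljoin(base, relative) for relative, base in items]


MAIN = "http://example.org/docs/main.xml"
LIB = "http://example.org/lib/common.xml"
merged = merge([("intro.html", MAIN), ("INCLUDE", MAIN), ("notes.html", MAIN)],
               1, [("images/logo.png", LIB)])
print(resolve_all(merged))
```

Had the merge rebased the included item onto `MAIN`, `images/logo.png` would wrongly resolve under `/docs/` instead of `/lib/`.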
Issue (XInclude-36-infoset-entities): The infoset exposes entity information items http://www.w3.org/TR/xml-infoset#infoitem.entity. XInclude does not define whether entity information items are copied via the infoset or not.
An acquired infoset will often represent a complete XML document. In this case the document information item does not appear in the resulting infoset. The top-level children of the document information item replace the include element in the order in which they appear in the acquired infoset. This applies to comments, processing instructions, and the document element.
The XML declaration and the document type declaration information item in the included document do not appear in the result infoset.
Ed. note: Add example of ignorable and non-ignorable whitespace.
An include element might identify a subresource that consists of more than a single information item. In this case these information items replace the information item representing the include element in the order in which they appear in the included document.
If the document element in the source infoset is an include element, it is an error to attempt to replace it with more than a single element.
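The replacement rules for whole documents and multi-item subresources can be sketched with Python's `xml.dom.minidom`. This is a sketch, not the draft's algorithm. `document_items` drops the document type declaration; minidom never represents the XML declaration as a node, so it is absent already. `replace_include` inserts copies of the items in order in place of the include element. As a choice made for this sketch, it also rejects an empty replacement at the document level, not only one with more than one element. Both function names are hypothetical.

```python
from xml.dom import minidom


def document_items(doc):
    """Top-level children of an acquired document, without the DOCTYPE.

    Comments, processing instructions and the document element remain.
    """
    return [n for n in doc.childNodes if n.nodeType != n.DOCUMENT_TYPE_NODE]


def replace_include(include_el, items):
    """Replace include_el with copies of items, preserving their order."""
    parent = include_el.parentNode
    owner = include_el.ownerDocument
    copies = [owner.importNode(n, True) for n in items]
    if parent.nodeType == parent.DOCUMENT_NODE:
        # Replacing the document element: exactly one element is allowed
        # (this sketch also rejects zero), checked before any mutation.
        elements = sum(1 for n in copies if n.nodeType == n.ELEMENT_NODE)
        if elements != 1:
            raise ValueError("document element must be replaced by exactly one element")
    ref = include_el.nextSibling
    parent.removeChild(include_el)
    for node in copies:
        parent.insertBefore(node, ref)  # ref None means append
```

For instance, including a document whose prolog holds a DOCTYPE and a comment leaves the comment, followed by the document element, where the include element stood.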
An xinclude:href with an XPointer might identify an attribute, a namespace node, or a collection of nodes containing an attribute or namespace node. Attempting inclusion of nodes that are not allowed as a child of an element results in an error.
An xinclude:href with an XPointer might identify a location set that represents a range or a set of ranges. Information items within these ranges appear in the result infoset.
An information item is said to be selected by a range if it occurs after (in document order) the starting point of the range and before the ending point of the range. An information item is said to be partially selected by a range if it contains only the starting point of the range, or only the ending point of the range. By definition, a character information item cannot be partially selected.
A range is included by including, in document order, the set of information items selected or partially selected by the range. The children of selected information items are included. The children of partially selected information items are included if they in turn are either selected or partially selected.
A location set containing multiple ranges is included as if each range in the location set were included in order.
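The selection rules above can be made concrete with a toy Python model. It assumes that each information item occupies a half-open interval of document positions [start, end), with start and end tags taking one position each, and that range endpoints are positions between items. The `Item` class and the `build` and `serialize` helpers are hypothetical scaffolding. An item is selected when its interval lies inside the range, and partially selected when it strictly contains exactly one endpoint; a one-position character can never contain an endpoint. An item that contains both endpoints (a common ancestor) is neither, so only its descendants contribute. This literal reading reproduces the range example later in this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """An information item occupying positions [start, end) in document order."""
    name: str                      # element name, or "#char" for a character
    start: int
    end: int
    children: List["Item"] = field(default_factory=list)
    text: str = ""                 # the character itself, for "#char" items


def build(name, children, pos=0):
    """Hypothetical helper: assign positions; start and end tags take one each."""
    start = pos
    pos += 1
    items = []
    for child in children:
        if isinstance(child, str):
            for ch in child:
                items.append(Item("#char", pos, pos + 1, text=ch))
                pos += 1
        else:
            item, pos = build(child[0], child[1], pos)
            items.append(item)
    pos += 1
    return Item(name, start, pos, items), pos


def contains(item, point):
    return item.start < point < item.end


def include_range(item, rstart, rend):
    """Items selected or partially selected by the range [rstart, rend)."""
    if rstart <= item.start and item.end <= rend:
        return [item]  # selected: include it with all its children
    inner = [x for child in item.children for x in include_range(child, rstart, rend)]
    if contains(item, rstart) != contains(item, rend):
        # Partially selected: keep the item with only qualifying children.
        return [Item(item.name, item.start, item.end, inner)]
    return inner  # neither: only its descendants may contribute


def include_location_set(root, ranges):
    """Include each range of a location set in order."""
    return [x for rstart, rend in ranges for x in include_range(root, rstart, rend)]


def serialize(items):
    return "".join(it.text if it.name == "#char"
                   else f"<{it.name}>{serialize(it.children)}</{it.name}>"
                   for it in items)
```

For the tree built from `("r", [("p", ["ab"]), ("q", ["cd"])])`, a range from between a and b to between c and d gives `<p>b</p><q>c</q>`: both p and q contain exactly one endpoint, so they are partially selected, while r contains both and is dropped.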
The syntax for specifying inclusion is similar to that of the simple links defined by XLink. XInclude defines a namespace associated with the URI http://www.w3.org/1999/XML/xinclude. For convenience, the prefix "xinclude" is used within this specification to indicate this namespace URI.
Issue (XInclude-54-syntax): Previous working drafts employed an element in the XInclude namespace as an include element. This draft employs an attribute-based syntax. The WG plans to revisit this issue and encourages public input on this issue.
The XInclude namespace contains the following attributes:
The following DTD fragment illustrates a declaration for indicating that an element serves as an include element.
<!ELEMENT include EMPTY>
<!ATTLIST include
  xmlns:xinclude CDATA      #FIXED "http://www.w3.org/1999/XML/xinclude"
  xinclude:href  CDATA      #REQUIRED
  xinclude:parse (xml|text) "xml" >
Issue (XInclude-31-which-namespace): The authors suggest that the xml: namespace should be the namespace of the include element. The use of the xml: namespace allows all XML documents to reference the inclusion mechanism without requiring additional namespace declarations to support inclusion. As inclusion is useful to most or all XML vocabularies, we suggest that it is reasonable to add to the xml: namespace.
An element information item is XInclude-conformant if it meets the syntactic requirements for include elements defined in this specification. This specification imposes no particular constraints on DTDs; conformance applies only to elements and attributes.
An application conforms to XInclude if it:
supports XML 1.0, XML Namespaces, and XML Base
observes the mandatory conditions (must) set forth in this specification, and for any optional conditions (should and may) it chooses to observe, observes them in the way prescribed
performs markup conformance testing according to all the conformance constraints appearing in this specification.
The following XML document contains an include element which points to an external document.
<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>120Mz is adequate for an average home user.</p>
  <boilerplate xinclude:href="disclaimer.xml"/>
</document>
disclaimer.xml contains:
<?xml version='1.0'?>
<disclaimer>
  <p>The opinions represented herein represent those of the individual
  and should not be interpreted as official policy endorsed by this
  organization.</p>
</disclaimer>
The infoset resulting from resolving inclusions on this document could be serialized as:
<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>120Mz is adequate for an average home user.</p>
  <disclaimer>
    <p>The opinions represented herein represent those of the individual
    and should not be interpreted as official policy endorsed by this
    organization.</p>
  </disclaimer>
</document>
The following illustrates the results of including a range specified by an XPointer.
<?xml version='1.0'?>
<document>
  <p>The relevant excerpt is:</p>
  <quotation>
    <include xmlns:xinclude="http://www.w3.org/1999/XML/xinclude"
       xinclude:href="source.xml#xpointer(string-range(chapter/p[1],'Sentence 2') to string-range(chapter/p[2]/i,'3.',0,11))"/>
  </quotation>
</document>
source.xml contains:
<chapter>
  <p>Sentence 1. Sentence 2.</p>
  <p><i>Sentence 3. Sentence 4.</i> Sentence 5.</p>
</chapter>
The infoset resulting from resolving inclusions on this document could be serialized as:
<?xml version='1.0'?>
<document>
  <p>The relevant excerpt is:</p>
  <quotation>
    <p>Sentence 2.</p>
    <p><i>Sentence 3.</i></p>
  </quotation>
</document>
The following XML document includes a working example as text.
<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>The following is the source of the "data.xml" file:</p>
  <example><example-body xinclude:href="data.xml" xinclude:parse="text"/></example>
</document>
data.xml contains:
<?xml version='1.0'?> <data> <item><![CDATA[Brooks & Sheilds]]></item> </data>
The infoset resulting from resolving inclusions on this document could be serialized as:
<?xml version='1.0'?>
<document xmlns:xinclude="http://www.w3.org/1999/XML/xinclude">
  <p>The following is the source of the "data.xml" file:</p>
  <example>&lt;?xml version='1.0'?&gt; &lt;data&gt; &lt;item&gt;&lt;![CDATA[Brooks &amp; Sheilds]]&gt;&lt;/item&gt; &lt;/data&gt;</example>
</document>
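With parse="text", the included resource becomes ordinary character information items, so any markup-significant characters in it must be escaped when the result infoset is serialized. The Python sketch below illustrates this with the standard `xml.sax.saxutils.escape` function, which escapes `&`, `<` and `>`. The wrapper function name is hypothetical, and the input mirrors the single-line form of data.xml shown above.

```python
from xml.sax.saxutils import escape


def serialize_text_inclusion(wrapper, text):
    """Serialize characters included with parse="text" inside an element.

    The characters are plain text, not markup, so '&', '<' and '>' are
    escaped on output.
    """
    return f"<{wrapper}>{escape(text)}</{wrapper}>"


data = "<?xml version='1.0'?> <data> <item><![CDATA[Brooks & Sheilds]]></item> </data>"
print(serialize_text_inclusion("example", data))
```

Parsing the serialized element again yields exactly the original characters as its text content, which is what distinguishes a text inclusion from an XML one.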
A tabulation of open issues flagged above follows:
Issue (XInclude-55-next-number): Dummy issue used to record the next unused issue number.