XInclude 1.1 Requirements and Use Cases

Support for RFC 5147

XInclude offers facilities for both XML inclusion and plain text inclusion. In its current design, the use of fragment identifiers (in the xpointer attribute) is forbidden when performing plain text inclusion. RFC 5147 introduces a fragment identifier syntax for text/plain content. It would be very useful to be able to extract portions of a text document with RFC 5147 fragment identifiers, just as it is useful to be able to extract portions of an XML document with XPointer.

XInclude should be extended to support RFC 5147 fragment identifiers when parse="text" is specified.
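To make the requirement concrete, the following Python sketch shows how a processor might apply an RFC 5147 fragment identifier to already-loaded text. The function name select_text is hypothetical, and the sketch makes simplifying assumptions: it handles only the line= and char= schemes, and it drops any integrity checks (such as ;length=...) without verifying them. Positions count from 0 and fall between lines or characters, so line=1,3 selects the second and third lines.

```python
import re

# Lines end with CRLF, CR, or LF; a final unterminated line also counts.
_LINE = re.compile(r"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+")


def select_text(text, fragment):
    """Return the part of `text` selected by a 'line=' or 'char=' fragment."""
    # Assumption of this sketch: integrity checks are dropped, not verified.
    selector = fragment.split(";", 1)[0]
    match = re.fullmatch(r"(line|char)=(\d*)(,?)(\d*)", selector)
    if match is None or not (match.group(2) or match.group(4)):
        raise ValueError(f"unsupported fragment: {fragment!r}")
    scheme, first, comma, last = match.groups()

    units = _LINE.findall(text) if scheme == "line" else list(text)
    start = int(first) if first else 0
    if not comma:
        end = start  # a single position selects no characters
    else:
        end = int(last) if last else len(units)
    return "".join(units[start:end])


sample = "one\ntwo\r\nthree\rfour"
print(repr(select_text(sample, "line=1,3")))
```

A fuller implementation would also need to check the length and md5 integrity values and decide how to report a mismatch; those steps are omitted here.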

Text pointer design

It's worth considering, briefly, the possible design choices that could be made in adding support for RFC 5147. Three seem apparent:

1. Support a new fragment identifier scheme in the xpointer attribute, for example xpointer="text(line=12,19;length=1859)". Unfortunately, the xpointer attribute is described as being specifically for XPointer fragment identifiers. XPointer, in turn, is specifically about XML.

2. Add a new attribute, for example textpointer, to hold the fragment identifier for text/plain content.

3. Add a new attribute, for example fragid, to hold the fragment identifier and deprecate the use of xpointer. In some respects, this seems the most logical choice, but it would invalidate (or at least deprecate) all existing documents that are using fragment identifiers. What's more, implementors would have to support both attributes indefinitely in order to handle legacy content, so this doesn't seem to provide much benefit.

Adding a new attribute alongside xpointer (the second option) seems like the best compromise.
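As an illustration of the separate-attribute design, the sketch below shows how a processor might choose which attribute to read based on the parse attribute. The attribute name textpointer is the example name used above, not a settled design, and the function fragment_for is hypothetical. The sketch assumes that xpointer remains forbidden with parse="text" and simply ignores textpointer when parsing as XML.

```python
import xml.etree.ElementTree as ET


def fragment_for(include):
    """Return (kind, fragment) for an xi:include element.

    Assumption: text inclusions read the hypothetical 'textpointer'
    attribute, while XML inclusions keep using 'xpointer'.
    """
    parse = include.get("parse", "xml")
    if parse == "text":
        if include.get("xpointer") is not None:
            raise ValueError('xpointer must not be used with parse="text"')
        return "text", include.get("textpointer")
    return "xml", include.get("xpointer")


include = ET.fromstring(
    '<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" '
    'href="log.txt" parse="text" textpointer="line=12,19"/>'
)
print(fragment_for(include))
```

Existing documents that use xpointer with XML inclusion are unaffected by this arrangement, which is the main attraction of the approach.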

Improved communication between the pre- and post-included infosets

XInclude is a transformative process. It begins with two or more infosets and produces a new, single infoset that represents the result of the transformation process. (For the purpose of this discussion, it's sufficient to consider the case where parse="text" is specified as including an infoset that consists entirely of character information items.)

There are aspects of the included infosets that must be made manifest in the resulting infoset in order to preserve semantics. Specifically, an xml:base attribute may be added to included elements in order to preserve the XML Base of the included items, and an xml:lang attribute may be added to included elements in order to preserve the language of the included items. These additional bits of communication across the transclusion boundary preserve important semantic information present in the original infosets.

It is possible to imagine other kinds of important semantic information that an author might want to preserve. In particular, one troubling aspect of XInclude processing is the potential damage done to ID/IDREF relationships. If the same ID is defined in several included documents (each of which is entirely valid on its own), the resulting document cannot be valid because it contains duplicate ID values. By the same token, IDREF values that ostensibly point to one of these now-duplicated IDs are left in an unfortunate state.

Authors might have several strategies in mind for resolving these problems:

- Perform textual transformation of IDs in (some) included documents to force uniqueness across the entire resulting infoset.
- Perform IDREF fixup using any of several algorithms:
  - Within an included fragment, point to the locally included ID.
  - Within an included fragment, point to the first ID globally.
  - Outside an included fragment, point to the first preceding ID.
  - Outside an included fragment, point to the first following ID.
  - Point to the closest ID.

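The first strategy, combined with the "point to the locally included ID" fixup, can be sketched in Python as follows. This is only one possible approach under stated assumptions: IDs are taken to be xml:id attributes, and IDREFs are taken to live in a linkend attribute (as in DocBook). Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def duplicate_ids(root):
    """Return the sorted xml:id values that occur more than once."""
    ids = (el.get(XML_ID) for el in root.iter())
    counts = Counter(i for i in ids if i is not None)
    return sorted(i for i, n in counts.items() if n > 1)


def prefix_ids(fragment, prefix, idref_attrs=("linkend",)):
    """Prefix every ID in `fragment`, and every IDREF that targets one.

    References to IDs defined outside the fragment are left alone.
    """
    local = {el.get(XML_ID) for el in fragment.iter()} - {None}
    for el in fragment.iter():
        if el.get(XML_ID) is not None:
            el.set(XML_ID, prefix + el.get(XML_ID))
        for name in idref_attrs:
            if el.get(name) in local:
                el.set(name, prefix + el.get(name))


book = ET.fromstring(
    '<book>'
    '<chapter xml:id="intro"><xref linkend="intro"/></chapter>'
    '<chapter xml:id="intro"><xref linkend="intro"/><xref linkend="other"/></chapter>'
    '</book>'
)
print(duplicate_ids(book))
prefix_ids(book[1], "ch2-")
print(duplicate_ids(book))
```

The sketch works only because it is told which attributes carry IDs and IDREFs, and because it can be handed the included fragment directly.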
It's clear that there is no single strategy that would satisfy all authors all the time. In fact, on examination, it becomes clear that there's no single strategy that would satisfy all authors within the same document. (For additional background and detailed analysis of one real use case, see and .)

Designing a language (or extending XInclude) to support this degree of flexibility would be complicated. This complexity would be exacerbated by the fact that generic XML tools may not even be able to identify all of the ID and IDREF values in a document. Addressing this problem may require vocabulary-specific knowledge of the documents involved.

Attempts to solve the problem in a vocabulary-specific manner, however, run afoul of the fact that XInclude leaves no trace of its actions. A processor cannot come back to the result of an XInclude transformation and identify the boundaries of inclusion. The ability to do that, which is arguably semantic information at least as valuable as the base URI and language of the included documents, would provide the hooks necessary to develop application-specific solutions without requiring that those solutions encompass all of the features of XInclude.

One method of providing this information would be to pass attributes present on the xi:include element through to the root element(s) included. For example, if a chapter were included with this XInclude:

    <xi:include href="chapter.xml" ex:root="true" ex:fixup="nearest"/>

The resulting infoset might include the chapter in this way:

    <chapter xml:base="base/chapter.xml" xml:lang="en-us" ex:root="true" ex:fixup="nearest"/>

Passing additional attributes through would provide a mechanism for authors to communicate with downstream processes.

XInclude should be extended to support improved communication between the pre- and post-included infosets.
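A minimal Python sketch of this idea follows. It assumes one particular copying rule (namespace-qualified attributes only, skipping the XML namespace), binds the ex: prefix to a made-up namespace URI, and uses hypothetical function names. It then shows how a downstream process could locate inclusion boundaries from the copied attributes alone.

```python
import xml.etree.ElementTree as ET

EX = "http://example.com/ns"  # hypothetical namespace for the ex: prefix
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def pass_through(include, included_root):
    """Copy namespace-qualified attributes from xi:include to the root.

    Assumption: attributes in the XML namespace are skipped because
    XInclude already handles xml:base and xml:lang itself.
    """
    for name, value in include.attrib.items():
        if name.startswith("{") and not name.startswith(XML_NS):
            included_root.set(name, value)


def inclusion_roots(doc):
    """Find elements marked as inclusion roots by the copied attribute."""
    return [el for el in doc.iter() if el.get(f"{{{EX}}}root") == "true"]


include = ET.fromstring(
    '<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" '
    f'xmlns:ex="{EX}" href="chapter.xml" ex:root="true" ex:fixup="nearest"/>'
)
chapter = ET.fromstring("<chapter><title>One</title></chapter>")
pass_through(include, chapter)

book = ET.Element("book")
book.append(chapter)
print([el.tag for el in inclusion_roots(book)])
```

With the boundary recoverable in this way, a vocabulary-specific tool could apply whatever ID fixup the ex:fixup value requests without reimplementing XInclude.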

Communication design

It's worth considering, briefly, the possible design choices that could be made in adding support of this kind:

- All attributes could be copied.
- All attributes except href, parse, xml:base, and xml:lang could be copied.
- Only namespace-qualified attributes could be copied.
- Only non-namespace-qualified attributes could be copied.
- The attributes to be copied could be explicitly enumerated in another new attribute.
- Some record of which attributes have been copied could be added to the result.

There are merits to each of these options, and there may be others, but ideally the solution will be as simple as possible.

It's also worth noting that none of these approaches offers any solution for inclusions that consist of top-level nodes other than elements. It doesn't appear possible to address those cases.
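To compare the selection options side by side, the following Python sketch expresses each one as a predicate over attribute names. Names are written in the {namespace}local form used by Python's ElementTree; the policy labels, the helper names, and the sample attributes are all illustrative assumptions, and the option of recording which attributes were copied is not modelled.

```python
XML_NS = "{http://www.w3.org/XML/1998/namespace}"
RESERVED = {"href", "parse", XML_NS + "base", XML_NS + "lang"}

# Each policy answers: should the attribute with this name be copied?
POLICIES = {
    "all": lambda name: True,
    "all-but-reserved": lambda name: name not in RESERVED,
    "qualified-only": lambda name: name.startswith("{"),
    "unqualified-only": lambda name: not name.startswith("{"),
}


def enumerated(names):
    """Policy for the 'explicitly enumerated' option."""
    allowed = set(names)
    return lambda name: name in allowed


def select(attrs, keep):
    """Return the attributes that the policy `keep` would copy."""
    return {name: value for name, value in attrs.items() if keep(name)}


attrs = {
    "href": "chapter.xml",
    "parse": "xml",
    "role": "draft",
    "{http://example.com/ns}fixup": "nearest",
    XML_NS + "lang": "en",
}
for label, keep in POLICIES.items():
    print(label, sorted(select(attrs, keep)))
```

Running the policies over the same attribute set makes the trade-offs visible: for instance, copying only qualified attributes would also pick up xml:lang unless the XML namespace is excluded explicitly.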