XInclude 1.1 Requirements and Use Cases

W3C Working Group Note 14 February 2012

This Version: http://www.w3.org/TR/2012/NOTE-xinclude-11-requirements-20120214/
Latest Version: http://www.w3.org/TR/xinclude-11-requirements/
Editor: Norman Walsh, MarkLogic Corporation <norman.walsh@marklogic.com>

This document is also available in these non-normative formats: XML

Abstract

This document summarizes requirements and use cases for possible enhancements to XInclude.

Status of this Document

Implementation experience has led users of XInclude to suggest a number of enhancements. These enhancements would allow XInclude to support the needs of richer applications by providing mechanisms to address uniqueness constraints, to communicate with processes that occur after XInclude, and to take advantage of the fragment identifier scheme for text/plain documents. This document attempts to enumerate these enhancements.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This requirements document is being published as a Working Group Note. Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document is a product of the W3C XML Core Working Group as part of the XML Activity.

Please submit any comments on this document to www-xml-xinclude-comments@w3.org; public archives are available.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

1 Introduction

2 Support for RFC 5147

2.1 Text pointer design

3 Improved communication between the pre- and post-included infosets

3.1 Communication design

4 References

1 Introduction

It has been several years since the XInclude Recommendation was published. This document outlines the requirements and use cases for two changes to XInclude: support for [RFC 5147] and improved communication between the pre- and post-inclusion Infosets.

2 Support for RFC 5147

XInclude offers facilities for both XML inclusion and plain text inclusion. In its current design, the use of fragment identifiers (in the xpointer attribute) is forbidden when performing plain text inclusion.

The publication of RFC 5147 introduces a fragment identifier syntax for text/plain content. It would be very useful to be able to extract portions of a text document with RFC 5147 fragment identifiers just as it is useful to be able to extract portions of an XML document with XPointer.

XInclude should be extended to support RFC 5147 fragment identifiers when parse="text" is specified.
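
As a rough illustration of what such support might involve, the following Python sketch resolves a subset of RFC 5147 fragment identifiers against a string. The function name extract_text_fragment is hypothetical and is not part of XInclude or of any library. Integrity checks (length, md5) are recognized syntactically but are not verified, which a conforming processor would be expected to do.

```python
import re

# Scheme, optional start, optional ",end", then any ";integrity-check" parts.
_FRAGID = re.compile(r"^(char|line)=(\d*)(?:(,)(\d*))?((?:;[^;]*)*)$")
# A line is text up to and including CR, LF or CRLF, or a final unterminated run.
_LINE = re.compile(r"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+")


def extract_text_fragment(text, fragid):
    """Return the part of `text` selected by a char= or line= fragment.

    Hypothetical helper: integrity checks are parsed but ignored.
    """
    m = _FRAGID.match(fragid)
    if m is None:
        raise ValueError("unsupported fragment identifier: %r" % fragid)
    scheme, start, comma, end, _checks = m.groups()
    if not start and not end:
        raise ValueError("no position given: %r" % fragid)
    # Positions count the gaps between characters (or lines), starting at 0.
    units = text if scheme == "char" else _LINE.findall(text)
    lo = int(start) if start else 0
    if comma is None:
        hi = lo  # a single position selects a zero-length fragment
    else:
        hi = int(end) if end else len(units)
    selected = units[lo:hi]
    return selected if scheme == "char" else "".join(selected)


text = "alpha\nbeta\ngamma\ndelta\n"
second_and_third = extract_text_fragment(text, "line=1,3")
```

Because positions count from zero, line=1,3 selects the second and third lines of the text, line endings included.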

2.1 Text pointer design

It's worth considering, briefly, the possible design choices that could be made in adding support for RFC 5147. Three seem apparent:

1. Support a new fragment identifier scheme in the xpointer attribute, for example xpointer="text(line=12,19;length=1859)". Unfortunately, the xpointer attribute is described as being specifically for XPointer fragment identifiers. XPointer, in turn, is specifically about XML.
2. Add a new attribute, for example textpointer, to hold the fragment identifier for text/plain content.
3. Add a new attribute, for example fragid, to hold the fragment identifier and deprecate the use of xpointer. In some respects, this seems the most logical choice, but it would invalidate (or at least deprecate) all existing documents that are using fragment identifiers. What's more, implementors would have to support both attributes indefinitely in order to handle legacy content, so this doesn't seem to provide much benefit.

Adding a new attribute seems like the best compromise.

3 Improved communication between the pre- and post-included infosets

XInclude is a transformative process. It begins with two or more infosets and produces a new, single infoset that represents the result of the transformation process. (For the purpose of this discussion, it's sufficient to consider the case where parse="text" is specified as including an infoset that consists entirely of character information items.)

There are aspects of the included infosets that must be made manifest in the resulting infoset in order to preserve semantics. Specifically, an xml:base attribute may be added to included elements in order to preserve the XML Base of the included items and an xml:lang attribute may be added to included elements in order to preserve the language of the included items.

These additional bits of communication across the transclusion boundary preserve important semantic information present in the original infosets. It is possible to imagine other kinds of important semantic information that an author might want to preserve.

In particular, one troubling aspect of XInclude processing is the potential damage done to ID/IDREF relationships. If the same ID is defined in several included documents (each of which is entirely valid on its own), the resulting document cannot be valid because it contains duplicate ID values. By the same token, IDREF values that ostensibly point to one of these now duplicated IDs are now left in an unfortunate state.
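
The problem can be reproduced in a few lines. The following Python sketch assumes, purely for illustration, that IDs are carried in xml:id attributes (the documents in question may instead declare ID attributes in a DTD) and simulates inclusion by placing two parsed chapters under a single parent. The element names and the duplicate_ids helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree's expanded name for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def duplicate_ids(root):
    """Return the sorted list of xml:id values that occur more than once."""
    counts = Counter(
        el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Each chapter is free of duplicates on its own.
chapter1 = ET.fromstring('<chapter xml:id="ch1"><para xml:id="intro"/></chapter>')
chapter2 = ET.fromstring('<chapter xml:id="ch2"><para xml:id="intro"/></chapter>')

# Simulate the result of including both chapters in one book.
book = ET.Element("book")
book.extend([chapter1, chapter2])

clashes = duplicate_ids(book)
```

Each chapter passes the check in isolation, but the combined tree reports "intro" as a duplicate.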

Authors might have several strategies in mind for resolving these problems:

Perform textual transformation of IDs in (some) included documents to force uniqueness across the entire resulting infoset.
Perform IDREF fixup using any of several algorithms:
1. Within an included fragment, point to the locally included ID.
2. Within an included fragment, point to the globally first ID.
3. Outside an included fragment, point to the first preceding ID.
4. Outside an included fragment, point to the first following ID.
5. Point to the closest ID.
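
One combination of these strategies, rewriting IDs to force uniqueness and keeping references inside an included fragment pointed at the locally included ID, might be sketched as follows. This is an illustration under stated assumptions, not a proposed algorithm: IDs are assumed to be xml:id attributes, the IDREF attribute is assumed to be a DocBook-style linkend, and the make_ids_unique helper and its prefix argument are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree's expanded name for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def make_ids_unique(fragment, prefix, idref_attrs=("linkend",)):
    """Prefix every xml:id in `fragment` and retarget local references.

    Only references to IDs defined inside the fragment are rewritten;
    references to IDs defined elsewhere are left untouched.
    """
    # Collect the original IDs before anything is modified.
    local_ids = {
        el.get(XML_ID) for el in fragment.iter() if el.get(XML_ID) is not None
    }
    for el in fragment.iter():
        for name in idref_attrs:
            if el.get(name) in local_ids:
                el.set(name, prefix + el.get(name))
        if el.get(XML_ID) is not None:
            el.set(XML_ID, prefix + el.get(XML_ID))
    return fragment


fragment = ET.fromstring(
    '<chapter xml:id="ch"><para xml:id="intro"/>'
    '<xref linkend="intro"/><xref linkend="elsewhere"/></chapter>'
)
make_ids_unique(fragment, "c2-")
```

The list of IDREF attributes has to be supplied by the caller, which reflects the vocabulary-specific knowledge that such fixup requires.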

It's clear that there is no single strategy that would satisfy all authors all the time. In fact, on examination, it becomes clear that there's no single strategy that would satisfy all authors within the same document. (For additional background and detailed analysis of one real use case, see [DBTRANS-REQ] and [DBTRANS].)

Designing a language (or extending XInclude) to support this degree of flexibility would be complicated. This complexity would be exacerbated by the fact that generic XML tools may not even be able to identify all of the ID and IDREF values in a document. Addressing this problem may require vocabulary-specific knowledge of the documents involved.

Attempts to solve the problem in a vocabulary-specific manner, however, run afoul of the fact that XInclude leaves no trace of its actions. A processor cannot come back to the result of an XInclude transformation and identify the boundaries of inclusion.

A record of those boundaries, which is arguably semantic information at least as valuable as the base URI and language of the included documents, would provide the hooks necessary to develop application-specific solutions without requiring that those solutions encompass all of the features of XInclude.

One method of providing this information would be to pass attributes present on the xi:include element through to the root element(s) included. For example, if a chapter was included with this XInclude:

<xi:include href="chapter.xml" ex:root="true" ex:fixup="nearest"/>

The resulting infoset might include the chapter in this way:

<chapter xml:base="base/chapter.xml" xml:lang="en-us"
         ex:root="true" ex:fixup="nearest"/>

Passing additional attributes through would provide a mechanism for authors to communicate with down-stream processes.
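
A processor might implement such pass-through along the following lines. This Python sketch copies only namespace-qualified attributes and skips those in the XML namespace so that xml:base and xml:lang are not overwritten; both choices are assumptions made for the example, as are the copy_include_attributes helper and the http://example.com/ns namespace bound to the ex prefix.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"
EX_NS = "{http://example.com/ns}"  # hypothetical namespace for the ex: prefix


def copy_include_attributes(include, included_root):
    """Copy namespace-qualified attributes from xi:include to the included root."""
    for name, value in include.attrib.items():
        # ElementTree writes qualified names as "{namespace-uri}local".
        if name.startswith("{") and not name.startswith(XML_NS):
            included_root.set(name, value)
    return included_root


include = ET.fromstring(
    '<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" '
    'xmlns:ex="http://example.com/ns" '
    'href="chapter.xml" ex:root="true" ex:fixup="nearest"/>'
)
chapter = ET.fromstring('<chapter xml:lang="en-us"/>')
copy_include_attributes(include, chapter)
```

After the call, the chapter element carries ex:root and ex:fixup, while href stays behind on the xi:include element.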

XInclude should be extended to support improved communication between the pre- and post-included infosets.

3.1 Communication design

It's worth considering, briefly, the possible design choices that could be made in adding support of this kind:

All attributes could be copied.
All attributes except href, parse, xml:base, and xml:lang could be copied.
Only namespace-qualified attributes could be copied.
Only non-namespace-qualified attributes could be copied.
The attributes to be copied could be explicitly enumerated in another new attribute.
Some record of which attributes have been copied could be added to the result.

There are merits to each of these options, and there may be others, but ideally the solution will be as simple as possible.

It's also worth noting that none of these approaches offers any solution for inclusions that consist of top-level nodes other than elements. It doesn't appear possible to address those cases.

4 References

[RFC 5147] URI Fragment Identifiers for the text/plain Media Type. E. Wilde, M. Duerst, IETF, April 2008. (See http://www.ietf.org/rfc/rfc5147.txt.)

[DBTRANS-REQ] Requirements for transclusion in DocBook. Jirka Kosek, 09 December 2010. (See http://docbook.org/docs/transclusion-requirements/.)

[DBTRANS] DocBook Transclusion. Jirka Kosek, 20 April 2011. (See http://docbook.org/docs/transclusion/.)