<?xml-stylesheet href="spec.xsl" type="text/xsl"?><specification xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:cs="http://www.w3.org/XML/XProc/2006/04/components#" xmlns:e="http://www.w3.org/1999/XSL/Spec/ElementSyntax" class="note" version="5.0-extension">
<info>
<title>XInclude 1.1 Requirements and Use Cases</title>
<w3c-shortname>xinclude-11-requirements</w3c-shortname>
<pubdate>2012-02-14</pubdate>
<bibliorelation type="isformatof" xlink:href="Overview.xml">XML</bibliorelation>
<!--
<bibliorelation type="isformatof" xlink:href="diff.html">Revision markup</bibliorelation>
<bibliorelation type="replaces" xlink:href="http://www.w3.org/TR/2010/PR-xproc-20100309/"/>
-->
<authorgroup>
  <author>
    <personname>Norman Walsh</personname>
    <affiliation>
      <orgname>MarkLogic Corporation</orgname>
    </affiliation>
    <email>norman.walsh@marklogic.com</email>
  </author>
</authorgroup>

<abstract>
<para>This document summarizes requirements and use cases for possible enhancements
to XInclude.</para>
</abstract>

<legalnotice role="status">

<para>Implementation experience has led users of XInclude to suggest a
number of enhancements. These enhancements would allow XInclude to support
the needs of richer applications by providing mechanisms to address
uniqueness constraints, to communicate with processes that run after
XInclude, and to take advantage of the fragment identifier scheme that
RFC 5147 defines for <code>text/plain</code> documents.
This document enumerates these enhancements.</para>

<para><emphasis>This section describes the status of this document at
the time of its publication. Other documents may supersede this
document. A list of current W3C publications and the latest revision
of this technical report can be found in the <link xlink:href="http://www.w3.org/TR/">W3C technical reports index</link>
at http://www.w3.org/TR/.</emphasis></para>

<para>This requirements document is being published as a Working Group
Note. Publication as a Working Group Note does not imply endorsement
by the W3C Membership. This is a draft document and may be updated,
replaced or obsoleted by other documents at any time. It is
inappropriate to cite this document as other than work in
progress.</para>

<para>This document
is a product of the <link xlink:href="http://www.w3.org/XML/Core/">W3C XML
Core Working Group</link> as part of the
<link xlink:href="http://www.w3.org/XML/">XML Activity</link>.</para>

<para>Please submit any comments on this document to
<link xlink:href="mailto:www-xml-xinclude-comments@w3.org">www-xml-xinclude-comments@w3.org</link>;
<link xlink:href="http://lists.w3.org/Archives/Public/www-xml-xinclude-comments/">public
archives</link> are available.</para>

<para>This document was produced by a group operating under the
<link xlink:href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5
February 2004 W3C Patent Policy</link>. W3C maintains a
<link xlink:href="http://www.w3.org/2004/01/pp-impl/18796/status#disclosures">public
list of any patent disclosures</link> made in connection with the
deliverables of the group; that page also includes instructions for
disclosing a patent. An individual who has actual knowledge of a
patent which the individual believes contains
<link xlink:href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential
Claim(s)</link> must disclose the information in accordance with
<link xlink:href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section
6 of the W3C Patent Policy</link>.</para>
</legalnotice>
</info>

<section xml:id="introduction">
<title>Introduction</title>

<para>It has been several years since the XInclude Recommendation
was published. This document outlines the requirements and use cases for
two changes to XInclude: support for the <code>text/plain</code> fragment
identifiers defined in <biblioref linkend="rfc-5147"/>
and improved communication between the pre- and post-inclusion infosets.</para>
</section>

<section xml:id="rfc5147">
<title>Support for RFC 5147</title>

<para>XInclude offers facilities for both XML inclusion and plain text
inclusion. In XInclude 1.0, fragment identifiers (in
the <tag class="attribute">xpointer</tag> attribute) are forbidden when
performing plain text inclusion.</para>

<para>RFC 5147 defines a fragment identifier syntax for
<code>text/plain</code> content. Extracting portions of a text document with
RFC 5147 fragment identifiers would be just as useful as extracting portions
of an XML document with XPointer.</para>

<para>XInclude should be extended to support RFC 5147 fragment identifiers
when <code>parse="text"</code> is specified.</para>
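<para>To illustrate what an implementation would have to do, the following
Python sketch resolves an RFC 5147 fragment identifier against the complete
text of a resource. The function name <code>resolve_text_fragment</code> is
hypothetical, and the sketch makes several simplifying assumptions: it treats
CRLF, CR, and LF as line ends, counts characters as Python string elements,
verifies only the <code>length</code> integrity check (rejecting
<code>md5</code>), and raises an error for a reversed range rather than
following any particular normative treatment.</para>

<programlisting language="python"><![CDATA[import re

# Assumption of this sketch: CRLF, CR and LF all count as line ends.
LINE_END = re.compile(r"\r\n|\r|\n")
NUMBER = re.compile(r"[0-9]+")


def _line_starts(text):
    """Character offset of every line position: 0, the start of each
    following line, and len(text) if the text lacks a final line end."""
    starts = [0] + [m.end() for m in LINE_END.finditer(text)]
    if starts[-1] != len(text):
        starts.append(len(text))
    return starts


def _number(s, default=None):
    """Parse a non-negative decimal; an empty string yields `default`."""
    if s == "" and default is not None:
        return default
    if not NUMBER.fullmatch(s):
        raise ValueError(f"not a number: {s!r}")
    return int(s)


def resolve_text_fragment(text, fragid):
    """Return the part of `text` selected by an RFC 5147 fragment
    identifier such as 'line=12,19;length=1859' (hypothetical helper)."""
    scheme_part, *checks = fragid.split(";")
    scheme, sep, spec = scheme_part.partition("=")
    if not sep or scheme not in ("char", "line"):
        raise ValueError(f"unsupported scheme in {fragid!r}")

    # Integrity checks: only length= is verified, against the character
    # count of the whole text; an optional ",charset" suffix is ignored.
    for check in checks:
        name, _, value = check.partition("=")
        if name != "length":
            raise ValueError(f"unsupported integrity check: {check!r}")
        if _number(value.split(",", 1)[0]) != len(text):
            raise ValueError(f"length check failed for {fragid!r}")

    # Map each position (0, 1, 2, ...) to a character offset.
    bounds = range(len(text) + 1) if scheme == "char" else _line_starts(text)
    last = len(bounds) - 1

    if "," in spec:
        first, _, second = spec.partition(",")
        if first == "" and second == "":
            raise ValueError(f"empty range in {fragid!r}")
        start, end = _number(first, 0), _number(second, last)
    else:
        # A single position identifies a point, i.e. an empty selection.
        start = end = _number(spec)

    # Positions past the end of the text are clamped to the end.
    start, end = min(start, last), min(end, last)
    if start > end:
        raise ValueError(f"reversed range in {fragid!r}")
    return text[bounds[start]:bounds[end]]


sample = "alpha\nbeta\ngamma\ndelta\n"
print(repr(resolve_text_fragment(sample, "line=1,3")))  # 'beta\ngamma\n'
print(resolve_text_fragment(sample, "char=0,5"))         # alpha
]]></programlisting>

<para>In this reading, positions fall between characters or lines, so
<code>line=1,3</code> selects the second and third lines, while a single
position such as <code>char=3</code> selects nothing.</para>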

<section xml:id="text-pointer-design">
<title>Text pointer design</title>

<para>It's worth considering, briefly, the possible design choices that
could be made in adding support for RFC 5147. Three seem apparent:</para>

<orderedlist>
<listitem>
<para>Support a new fragment identifier scheme in the <tag class="attribute">xpointer</tag>
attribute, for example <code>xpointer="text(line=12,19;length=1859)"</code>.
Unfortunately, the <tag class="attribute">xpointer</tag> attribute is described
as being specifically for XPointer fragment identifiers. XPointer, in turn, is specifically
about XML.</para>
</listitem>
<listitem>
<para>Add a new attribute, for example <tag class="attribute">textpointer</tag>,
to hold the fragment identifier for <code>text/plain</code> content.</para>
</listitem>
<listitem>
<para>Add a new attribute, for example <tag class="attribute">fragid</tag>,
to hold the fragment identifier, and deprecate the use of
<tag class="attribute">xpointer</tag>. In some respects, this seems the most
logical choice, but it would make the markup of every existing document that
uses fragment identifiers invalid (or at least deprecated). What's more, implementors would
have to support both attributes indefinitely in order to handle legacy content,
so this doesn't seem to provide much benefit.</para>
</listitem>
</orderedlist>

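<para>Adding a new attribute for <code>text/plain</code> fragment identifiers,
as in the second option, while leaving <tag class="attribute">xpointer</tag>
unchanged, seems like the best compromise.</para>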
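<para>To make the second option concrete, the following Python sketch shows
how a processor might decide which fragment identifier applies to an
<tag>xi:include</tag> element if a <tag class="attribute">textpointer</tag>
attribute were added. Both the attribute and the function
<code>pointer_for</code> are hypothetical; the sketch uses only the standard
<code>xml.etree.ElementTree</code> module, and its choice to reject
<tag class="attribute">textpointer</tag> with <code>parse="xml"</code> is an
assumption, not part of any specification.</para>

<programlisting language="python"><![CDATA[import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def pointer_for(include):
    """Return (kind, fragid) for an xi:include element. kind is 'xpointer'
    or 'rfc5147'; fragid is None when the whole resource is included.

    'textpointer' is the hypothetical attribute from the second option;
    it does not exist in XInclude 1.0."""
    if include.tag != XI_INCLUDE:
        raise ValueError(f"not an xi:include element: {include.tag}")
    parse = include.get("parse", "xml")
    xpointer = include.get("xpointer")
    textpointer = include.get("textpointer")
    if parse == "xml":
        # Assumption of this sketch: textpointer is meaningless for XML.
        if textpointer is not None:
            raise ValueError("textpointer requires parse='text'")
        return ("xpointer", xpointer)
    if parse == "text":
        # XInclude 1.0 already forbids xpointer for plain text inclusion.
        if xpointer is not None:
            raise ValueError("xpointer is not allowed with parse='text'")
        return ("rfc5147", textpointer)
    raise ValueError(f"unknown parse value: {parse!r}")


element = ET.fromstring(
    '<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="code.txt"'
    ' parse="text" textpointer="line=12,19;length=1859"/>')
print(pointer_for(element))  # ('rfc5147', 'line=12,19;length=1859')
]]></programlisting>

<para>Because the two attributes never share a meaning, existing documents
that use <tag class="attribute">xpointer</tag> are unaffected by this
design.</para>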
</section>
</section>

<section xml:id="improved-communication">
<title>Improved communication between the pre- and post-included infosets</title>

<para>XInclude is a transformative process. It begins with two or more infosets
and produces a new, single infoset that represents the result of the transformation
process. (For the purpose of this discussion, it's sufficient to consider the
case where <code>parse="text"</code> is specified as including an infoset that
consists entirely of character information items.)</para>

<para>There are aspects of the included infosets that must be made manifest in
the resulting infoset in order to preserve semantics. Specifically, an
<tag class="attribute">xml:base</tag> attribute may be added to top-level included elements
in order to preserve the XML Base of the included items, and an
<tag class="attribute">xml:lang</tag> attribute may be added to top-level included elements
in order to preserve the language of the included items.</para>
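<para>A minimal Python sketch of these two fixups, applied to a single
top-level included element, might look like the following. The function
<code>fixup_top_level</code> is hypothetical and simplifies the XInclude
rules: it writes the <tag class="attribute">href</tag> value into
<tag class="attribute">xml:base</tag> as-is (assuming it is relative to the
including document's base URI), resolves any
<tag class="attribute">xml:base</tag> already present against that value, and
adds <tag class="attribute">xml:lang</tag> only when the language of the
included item is known and differs from the language at the inclusion
point.</para>

<programlisting language="python"><![CDATA[import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def fixup_top_level(element, href, included_lang=None, include_lang=None):
    """Record the base URI and language of one top-level included element.

    href is written into xml:base as-is, i.e. it is assumed to be relative
    to the base URI of the including document (a simplification)."""
    existing = element.get(XML_BASE)
    # An xml:base already on the element was relative to the included
    # resource; resolving it against href keeps the same effective base.
    element.set(XML_BASE, urljoin(href, existing) if existing else href)
    # Language fixup: only when the included language is known and differs.
    if included_lang is not None and included_lang != include_lang:
        element.set(XML_LANG, included_lang)
    return element


chapter = fixup_top_level(ET.fromstring("<chapter/>"), "base/chapter.xml",
                          included_lang="en-us", include_lang="fr")
print(ET.tostring(chapter, encoding="unicode"))
# <chapter xml:base="base/chapter.xml" xml:lang="en-us" />
]]></programlisting>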

<para>These additional bits of communication across the transclusion boundary
preserve important semantic information present in the original infosets.
It is possible to imagine <emphasis>other</emphasis> kinds of important semantic
information that an author might want to preserve.</para>

<para>In particular, one troubling aspect of XInclude processing is the potential
damage done to ID/IDREF relationships. If the same ID is defined in several included
documents (each of which is entirely valid on its own), the resulting document
cannot be valid because it contains duplicate ID values. By the same token, IDREF values
that ostensibly point to one of these now-duplicated IDs are left ambiguous:
nothing indicates which of the duplicates they should refer to.</para>
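<para>The problem is easy to demonstrate. The following Python sketch
(standard library only) simulates inclusion by appending two separately
parsed chapters to a new <code>book</code> element, then reports the
duplicated IDs and the IDREFs that have become ambiguous. It assumes that IDs
are carried in <tag class="attribute">xml:id</tag> and IDREFs in a
<tag class="attribute">linkend</tag> attribute, as in DocBook; for an
arbitrary vocabulary, neither assumption holds in general. The function names
are hypothetical.</para>

<programlisting language="python"><![CDATA[import xml.etree.ElementTree as ET
from collections import Counter

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def duplicate_ids(root):
    """Sorted xml:id values that occur more than once under root."""
    counts = Counter(e.get(XML_ID) for e in root.iter() if e.get(XML_ID) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


def ambiguous_refs(root, idref_attr="linkend"):
    """IDREF values (in document order) whose target ID is duplicated."""
    dups = set(duplicate_ids(root))
    return [e.get(idref_attr) for e in root.iter() if e.get(idref_attr) in dups]


# Two chapters, each valid on its own, that both define xml:id="intro".
chapter_a = '<chapter xml:id="a"><section xml:id="intro"/><xref linkend="intro"/></chapter>'
chapter_b = '<chapter xml:id="b"><section xml:id="intro"/><xref linkend="intro"/></chapter>'

# Appending the parsed chapters stands in for XInclude processing.
book = ET.Element("book")
for source in (chapter_a, chapter_b):
    book.append(ET.fromstring(source))

print(duplicate_ids(book))   # ['intro']
print(ambiguous_refs(book))  # ['intro', 'intro']
]]></programlisting>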

<para>Authors might have several strategies in mind for resolving these problems:</para>

<orderedlist>
<listitem>
<para>Perform textual transformation of IDs in (some) included documents to force
uniqueness across the entire resulting infoset.</para>
</listitem>
<listitem>
<para>Perform IDREF fixup using any of several algorithms:</para>
<orderedlist>
<listitem>
<para>Within an included fragment, point to the locally included ID.</para>
</listitem>
<listitem>
<para>Within an included fragment, point to the first matching ID in the whole document.</para>
</listitem>
<listitem>
<para>Outside an included fragment, point to the first preceding ID.</para>
</listitem>
<listitem>
<para>Outside an included fragment, point to the first following ID.</para>
</listitem>
<listitem>
<para>Point to the closest ID.</para>
</listitem>
</orderedlist>
</listitem>
</orderedlist>
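<para>The first strategy, combined with the fixup that points to the locally
included ID, can be sketched in Python as follows. The function
<code>prefix_ids</code> and the prefixes <code>d1.</code> and <code>d2.</code>
are hypothetical, and the same <tag class="attribute">xml:id</tag> and
<tag class="attribute">linkend</tag> assumptions apply. The IDs of each
included fragment are prefixed, and only IDREFs whose targets lie inside the
same fragment are rewritten.</para>

<programlisting language="python"><![CDATA[import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def prefix_ids(fragment, prefix, idref_attr="linkend"):
    """Prefix every xml:id in an included fragment and repoint IDREFs whose
    target is defined in that same fragment. IDREFs to any other ID are
    left unchanged for a different fixup strategy to handle."""
    local = {e.get(XML_ID) for e in fragment.iter() if e.get(XML_ID) is not None}
    for e in fragment.iter():
        if e.get(XML_ID) is not None:
            e.set(XML_ID, prefix + e.get(XML_ID))
        if e.get(idref_attr) in local:
            e.set(idref_attr, prefix + e.get(idref_attr))
    return fragment


chapter_a = ('<chapter xml:id="a"><section xml:id="intro"/>'
             '<xref linkend="intro"/><xref linkend="glossary"/></chapter>')
chapter_b = '<chapter xml:id="b"><section xml:id="intro"/><xref linkend="intro"/></chapter>'

book = ET.Element("book")
for n, source in enumerate((chapter_a, chapter_b), start=1):
    # The prefix must keep each ID a valid NCName; "d1." does.
    book.append(prefix_ids(ET.fromstring(source), f"d{n}."))

print([e.get(XML_ID) for e in book.iter() if e.get(XML_ID) is not None])
# ['d1.a', 'd1.intro', 'd2.b', 'd2.intro']
print([e.get("linkend") for e in book.iter() if e.get("linkend") is not None])
# ['d1.intro', 'glossary', 'd2.intro']
]]></programlisting>

<para>The reference to <code>glossary</code>, whose target is not in the
fragment, is left untouched; resolving it requires one of the other
strategies.</para>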

<para>No single strategy would satisfy all authors
all the time. In fact, on examination, no single strategy would satisfy
all authors even <emphasis>within the same document</emphasis>.
(For additional background and detailed analysis of one real use case, see
<biblioref linkend="dbtrans-req"/> and <biblioref linkend="dbtrans"/>.)
</para>

<para>Designing a language (or extending XInclude) to support this
degree of flexibility would be complicated. This complexity would be
exacerbated by the fact that generic XML tools may not even be able to
identify all of the ID and IDREF values in a document. Addressing this
problem may require vocabulary-specific knowledge of the documents
involved.</para>

<para>Attempts to solve the problem in a vocabulary-specific manner, however,
run afoul of the fact that XInclude leaves no trace of its actions. A processor
cannot come back to the result of an XInclude transformation and identify the
boundaries of inclusion.</para>

<para>The ability to identify those boundaries, which is arguably semantic information at least
as valuable as the base URI and language of the included documents, would provide
the hooks necessary to develop application-specific solutions without requiring
that those solutions encompass <emphasis>all</emphasis> of the features of
XInclude.</para>

<para>One method of providing this information would be to pass attributes present
on the <tag>xi:include</tag> element through to the root element(s) included.
For example, if a <code>chapter</code> was included with this
XInclude:</para>

<programlisting>&lt;xi:include href="chapter.xml" ex:root="true" ex:fixup="nearest"/&gt;</programlisting>

<para>The resulting infoset might include the chapter in this way:</para>

<programlisting>&lt;chapter xml:base="base/chapter.xml" xml:lang="en-us"
         ex:root="true" ex:fixup="nearest"/&gt;</programlisting>

<para>Passing additional attributes through would provide a mechanism for authors
to communicate with downstream processes.</para>
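<para>The pass-through mechanism itself is simple. The following Python
sketch, which is illustrative only, copies the namespace-qualified attributes
of an <tag>xi:include</tag> element onto each top-level element produced by
the inclusion, reproducing the example above. The function
<code>pass_through</code> and the namespace URI bound to the
<code>ex:</code> prefix are hypothetical; attributes in the XML namespace are
excluded on the assumption that they keep their existing XInclude treatment,
and included text is represented as plain strings, which cannot receive
attributes.</para>

<programlisting language="python"><![CDATA[import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"
EX = "http://example.com/ns/ex"  # hypothetical URI for the ex: prefix
ET.register_namespace("ex", EX)


def pass_through(include, included_nodes):
    """Copy namespace-qualified attributes (excluding xml:*) from an
    xi:include element onto each included top-level element. Items of
    included_nodes are Elements or, for included text, plain strings."""
    extra = {name: value for name, value in include.attrib.items()
             if name.startswith("{") and not name.startswith(XML_NS)}
    for node in included_nodes:
        # Text cannot carry attributes, so only elements receive copies.
        if isinstance(node, ET.Element):
            node.attrib.update(extra)
    return included_nodes


include = ET.fromstring(
    '<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"'
    f' xmlns:ex="{EX}" href="chapter.xml" ex:root="true" ex:fixup="nearest"/>')
chapter = ET.fromstring("<chapter/>")
pass_through(include, [chapter, "\n"])
print(ET.tostring(chapter, encoding="unicode"))
]]></programlisting>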

<para>XInclude should be extended to support improved communication between
the pre- and post-included infosets.</para>

<section xml:id="communication-design">
<title>Communication design</title>

<para>It's worth considering, briefly, the possible design choices that
could be made in adding support of this kind:</para>

<orderedlist>
<listitem>
<para>All attributes could be copied.</para>
</listitem>
<listitem>
<para>All attributes except <tag class="attribute">href</tag>,
<tag class="attribute">parse</tag>, <tag class="attribute">xml:base</tag>,
and <tag class="attribute">xml:lang</tag> could be copied.</para>
</listitem>
<listitem>
<para>Only namespace-qualified attributes could be copied.</para>
</listitem>
<listitem>
<para>Only <emphasis>non-</emphasis>namespace-qualified attributes could
be copied.</para>
</listitem>
<listitem>
<para>The attributes to be copied could be explicitly enumerated in another
new attribute.</para>
</listitem>
<listitem>
<para>Some record of which attributes <emphasis>have been</emphasis> copied could be
added to the result.</para>
</listitem>
</orderedlist>
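<para>The first five options differ only in which attributes they select.
The following Python sketch expresses each one as a filter over the
attributes of an <tag>xi:include</tag> element. The policy names, the
function <code>select_attributes</code>, the namespace URI for
<code>ex:</code>, and the <tag class="attribute">role</tag> attribute in the
example are all hypothetical; for the fifth option, the caller passes in the
names that the new enumerating attribute would list.</para>

<programlisting language="python"><![CDATA[import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"
EXCLUDED = {"href", "parse", XML_NS + "base", XML_NS + "lang"}


def select_attributes(include, policy, listed=frozenset()):
    """Return the attributes of an xi:include element that a copy policy
    would pass through. Names use ElementTree's {uri}local notation."""
    def qualified(name):
        return name.startswith("{")

    rules = {
        "all": lambda name: True,                           # option 1
        "all-but-core": lambda name: name not in EXCLUDED,  # option 2
        "qualified": qualified,                             # option 3
        "unqualified": lambda name: not qualified(name),    # option 4
        "listed": lambda name: name in listed,              # option 5
    }
    if policy not in rules:
        raise ValueError(f"unknown policy: {policy!r}")
    keep = rules[policy]
    return {name: value for name, value in include.attrib.items() if keep(name)}


EX = "http://example.com/ns/ex"  # hypothetical namespace
include = ET.fromstring(
    '<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"'
    f' xmlns:ex="{EX}" href="chapter.xml" parse="xml"'
    ' xml:lang="en" role="summary" ex:root="true"/>')

for policy in ("all", "all-but-core", "qualified", "unqualified"):
    print(policy, sorted(select_attributes(include, policy)))
print("listed", select_attributes(include, "listed", {f"{{{EX}}}root"}))
]]></programlisting>

<para>One consequence is visible immediately: because
<tag class="attribute">xml:lang</tag> is itself a namespace-qualified
attribute, the third option would copy it unless the XML namespace were
excluded explicitly.</para>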

<para>There are merits to each of these options, and there may be others,
but ideally the solution will be as simple as possible.</para>

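<para>It's also worth noting that none of these approaches offers any
solution for inclusions whose top-level nodes are not elements, such as
text included with <code>parse="text"</code>. Because only elements can carry
attributes, it doesn't appear possible to address those cases with this
mechanism.</para>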
</section>
</section>

<section xml:id="references">
<title>References</title>

<bibliolist>
<bibliomixed xml:id="rfc-5147"><abbrev>RFC 5147</abbrev>
<citetitle xlink:href="http://www.ietf.org/rfc/rfc5147.txt">URI Fragment
Identifiers for the text/plain Media Type</citetitle>.
E. Wilde, M. Duerst, IETF, April 2008. (See http://www.ietf.org/rfc/rfc5147.txt.)
</bibliomixed>

<bibliomixed xml:id="dbtrans-req"><abbrev>DBTRANS-REQ</abbrev>
<citetitle xlink:href="http://docbook.org/docs/transclusion-requirements/">Requirements
for transclusion in DocBook</citetitle>.
Jirka Kosek, 09 December 2010. (See http://docbook.org/docs/transclusion-requirements/.)
</bibliomixed>

<bibliomixed xml:id="dbtrans"><abbrev>DBTRANS</abbrev>
<citetitle xlink:href="http://docbook.org/docs/transclusion/">DocBook
Transclusion</citetitle>. Jirka Kosek, 20 April 2011.
(See http://docbook.org/docs/transclusion/.)</bibliomixed>
</bibliolist>
</section>

</specification>