W3C NOTE-XML-FRAG-REQ-19981123


XML Fragment Interchange Requirements
Version 1.0

W3C Note 23-Nov-1998

This version:
http://www.w3.org/TR/1998/NOTE-XML-FRAG-REQ-19981123
Latest version:
http://www.w3.org/TR/NOTE-XML-FRAG-REQ
Previous versions: (Member Only)
http://www.w3.org/XML/Group/1998/09/xml-frag-req
http://www.w3.org/XML/Group/1998/09/xml-frag-req-19980828
http://www.w3.org/XML/Group/1998/10/xml-frag-req-19981030
Editor:
Paul Grosso (Arbortext) <paul@arbortext.com>

Status of this document

This is a W3C Note produced as a deliverable of the XML Fragment WG (members only) according to its charter and the current XML Activity process. A list of current W3C working drafts and notes can be found at http://www.w3.org/TR .

This document is a work in progress representing the current consensus of the W3C XML Fragment Working Group. This version of the XML Fragment Interchange Requirements document has been approved by the XML Fragment working group and the XML Plenary to be posted for review by W3C members and other interested parties. Publication as a Note does not imply endorsement by the W3C membership. Comments should be sent to www-xml-fragment-comments@w3.org, which is an automatically and publicly archived email list.

This document is being processed according to the following review schedule:

Review Schedule
Process                  Closing date  Status              Contact
XML Fragment WG signoff  1998/11/04    done                XML Fragment WG
XML Plenary signoff      1998/11/20    done                paul@arbortext.com, veillard@w3.org
Publish as W3C Note      1998/11/23    accepting comments  www-xml-fragment-comments@w3.org
Checkpoint of comments   1999/01/08

Comments about this document should be submitted to the "contact" listed above for each process.

Copyright ©1998 W3C (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.

Abstract

The XML standard supports logical documents composed of possibly several entities. It may be desirable to view or edit one or more of the entities or parts of entities while having no interest, need, or ability to view or edit the entire document. The problem, then, is how to provide to a recipient of such a fragment the appropriate information about the context that fragment had in the larger document that is not available to the recipient. The XML Fragment WG is chartered with defining a way to send fragments of an XML document--regardless of whether the fragments are predetermined entities or not--without having to send all of the containing document up to the part in question. This document specifies the design principles and requirements for this activity.

1. Overview

The XML standard supports logical documents composed of possibly several entities. It may be desirable to view or edit one or more of the entities or parts of entities while having no interest, need, or ability to view or edit the entire document. The problem, then, is how to provide to a recipient of such a fragment the appropriate information about the context that fragment had in the larger document that is not available to the recipient.

The XML Fragment WG is chartered to work on a mechanism to address these issues. The SGML Open Technical Resolution 9601:1996 on Fragment Interchange provides a basis for this work. This document specifies the design principles and requirements for this activity.

In the case of many XML documents, it is suboptimal to have to receive and parse the entire document when only a fragment of it is desired. If the user asked to look at chapter 20, one shouldn't need to parse 19 whole chapters before getting to the part of interest. The goal of this activity is to define a way senders can send small parts of an XML document without having to send everything up to the part needed. This can be done regardless of whether the parts are entities or not, and the parts can either be viewed immediately or accumulated for later use, assembly, or other processing.

The challenge is that an isolated element from an XML document may not contain quite enough information to be parsed correctly. The goal of this activity is to enable senders to provide the remaining information required so that systems can interchange any XML elements they choose, from books or chapters all the way down to paragraphs, tables, footnotes, book titles, and so on, without having to manage each as a separate entity or having to risk incorrect parsing due to loss of context.
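
The loss of context can be illustrated with a minimal sketch. It assumes Python's standard xml.etree.ElementTree parser and a hypothetical document whose internal DTD subset declares an entity named "pub"; the entity, the element names, and the sample text are illustrative assumptions and do not come from this Note. The same paragraph parses in its document context but not in isolation:

```python
import xml.etree.ElementTree as ET

# Hypothetical complete document: the entity "pub" is declared in the
# internal DTD subset, far away from the paragraph that uses it.
DOCUMENT = """<!DOCTYPE book [
  <!ENTITY pub "Example Press">
]>
<book>
  <chapter><para>Published by &pub;.</para></chapter>
</book>"""

# The same paragraph, cut out of the document with no context attached.
FRAGMENT = "<para>Published by &pub;.</para>"


def try_parse(text):
    """Return the parsed root element, or None if the text is not parseable."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


# In context, the parser knows the declaration and expands the reference.
in_context = try_parse(DOCUMENT).find("chapter/para").text

# In isolation, the reference to "pub" is undefined and parsing fails.
isolated = try_parse(FRAGMENT)
```

Here the entity declaration is one example of the "remaining information" a sender would need to supply; other kinds of context may matter as well.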

2. Design Principles

In the design of any language, trade-offs in the solution space are necessary. To aid in making these trade-offs, the following design principles will be used (the order of these principles is not necessarily significant):

  1. XML fragment specifications should be usable over the internet.
  2. XML fragment specifications should support the specification of context for any well-formed chunk of XML; the definition of a fragment may be broadened to allow any chunk of XML that matches XML's "content" production (production [43]). Chunks of XML that do not match XML's "content" production (i.e., that are not well-formed entities) are specifically out of scope.
  3. XML fragment specifications should be optimized to work with simpler XML fragments (such as those conforming to the simpler XML profile being developed by the XML Syntax WG), though the language should also work with any XML ("the easy stuff should be easy, and the harder stuff should be possible"); working with SGML features not included in XML (including those, such as tag omission, allowed in HTML) is not a goal.
  4. XML fragment specifications should be capable of being specified both in the same storage object as the fragment body itself as well as in a separate object linked in some fashion to the fragment body.
  5. XML fragment specifications should support interaction with XML browsers, editors, repositories, and other XML applications.
  6. SGML features and characteristics not included in XML shall not be taken into consideration in the design of our fragment context specification solution.
  7. It is specifically not a goal that XML fragment specifications be designed in consideration of non-XML HTML browsers, parsers, or other non-XML applications.
  8. Since interoperability is a primary goal, there should be only one language for the fragment context specification rather than multiple "features." However, since the goal is to provide enough information to parse the fragment, and well-formed XML may not require any extra information to allow it to be parsed, no specific set of context information should be required in all context specifications. (No implementation should choke on any valid piece of context information, but no implementation should be considered non-compliant for choosing to ignore [on the receiving end]--or not include [on the sending end]--a specific piece of context information if doing so makes sense in the particular environment.)
  9. XML fragment specifications should leverage other recommendations and standards, including XML 1.0, XML Namespace, XPointer, XML Information Set, the SGML Open TR9601:1996 on Fragment Interchange, and relevant IETF work.
  10. XML fragment specifications should be human-readable and reasonably clear.
  11. Terseness in XML fragment specification syntax is of minimal importance.
  12. Issues involved with the possible "return" of any fragment to its original context and the determination of the possible validity of the "returned" fragment in its original context are beyond the scope of this activity.

3. Requirements

This activity will enable interchanging portions of XML documents while retaining the ability to parse them correctly (that is, as they would be parsed in their originating document context), and, as far as practical, to be formatted, edited, and otherwise processed in useful ways.

Conceptually, a sender examines a fragment to be sent and, using the notation to be defined by this activity, constructs a fragment context specification. The object representing the fragment removed from its source document is called the fragment body. The sender sends the fragment context specification and the fragment body to the recipient. The storage object in which the fragment body is transmitted is called the fragment entity. (In some packaging schemes, the fragment context specification may also be embedded in the fragment entity.) The recipient processes the fragment context specification to determine the proper parser state for the context at the beginning of the fragment and uses that information to enable the XML parser to parse the fragment body.
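
The conceptual flow above can be sketched in code. The notation for a fragment context specification is still to be defined by this activity, so the FragmentContext class below is a purely hypothetical stand-in that carries only two kinds of context (ancestor element names and internal entity declarations). The function name parse_fragment and the wrapper name "fragment-wrapper" are likewise assumptions made for illustration:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class FragmentContext:
    """Hypothetical stand-in for a fragment context specification."""
    # Element names from the document root down to the fragment's parent.
    ancestors: list = field(default_factory=list)
    # Internal general entities: name -> replacement text.
    # Assumption: values contain no quotes, '%', or '&'.
    entities: dict = field(default_factory=dict)


def parse_fragment(context, fragment_body):
    """Recipient side: rebuild a minimal parser context, then parse the body.

    Returns the element standing in for the fragment's parent; its content
    is the parsed fragment body.
    """
    # An empty ("null") context still works: use a neutral wrapper element.
    names = context.ancestors or ["fragment-wrapper"]
    decls = "".join(
        f'<!ENTITY {name} "{value}">' for name, value in context.entities.items()
    )
    doctype = f"<!DOCTYPE {names[0]} [{decls}]>" if decls else ""
    opening = "".join(f"<{name}>" for name in names)
    closing = "".join(f"</{name}>" for name in reversed(names))
    root = ET.fromstring(doctype + opening + fragment_body + closing)
    # Walk down the reconstructed ancestor chain to the fragment's parent.
    node = root
    for _ in names[1:]:
        node = node[0]
    return node


# Sender side: describe the context, then ship it along with the body.
context = FragmentContext(ancestors=["book", "chapter"],
                          entities={"pub": "Example Press"})
body = "<para>Published by &pub;.</para><para>Second.</para>"
parent = parse_fragment(context, body)
```

Reconstructing a wrapper document is only one way a recipient might establish parser state; a real implementation could instead configure the parser directly.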

The point of the fragment context information is to provide information that is not available in the fragment body itself but that would be available from the complete XML document. Specifically, any information not available from the XML document as a whole (plus knowledge of the location of the fragment body within the document) is out of scope for inclusion in the fragment context information. Such information may well be useful and important metadata in a variety of applications, but there are (or need to be) other mechanisms for handling this information.

Specifically, a successful XML Fragment WG Recommendation will enable the following scenarios:

  1. A sender can send a fragment that consists of any element or any sequence of XML data that matches production 43 for "content" in the XML 1.0 Recommendation. Most commonly this means an element or a sequence of contiguous sibling elements, but character data, processing instructions, comments, whitespace, and certain other XML constructs may also be permitted.
  2. The fragment can be parsed correctly at the recipient end to produce precisely the same information set (XML structure and content information as defined by the XML Information Set WG) that the sender got when it parsed the fragment in its complete document context.
  3. Where feasible, the fragment will be able to include information to aid a recipient in determining how to present the fragment (e.g., to allow the recipient to number headings as if the fragment were section 3 of appendix B). This same information should allow the recipient to compute many link anchors based on a hierarchical location in the original document.
  4. Where feasible, the fragment will be able to include enough context information to allow a recipient, such as a validating editor, to determine what modifications would be valid or invalid given the larger document context. Note that the maintenance of certain global validity constraints--such as document-wide uniqueness of IDs--may be deemed unfeasible.
  5. To allow for optimized interchange between systems that have special knowledge of each other's capabilities and requirements, no specific piece of fragment context information will be required (in particular, the "null" fragment context specification would be a valid fragment context specification).
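
As an illustration of scenario 3, the following sketch shows how a recipient might number a heading from hierarchical location information. It assumes that the context supplies each ancestor's element name and its position among same-named siblings; the Ancestor class, the element names, and the labeling convention (letters for appendixes, digits for chapters and sections) are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Ancestor:
    name: str      # element type name, e.g. "appendix"
    position: int  # 1-based position among same-named siblings


def heading_label(path):
    """Build a label such as "B.3" from the fragment's hierarchical location.

    Assumed convention: appendixes are lettered (valid for positions 1-26),
    chapters and sections are numbered, other elements contribute nothing.
    """
    parts = []
    for ancestor in path:
        if ancestor.name == "appendix":
            parts.append(chr(ord("A") + ancestor.position - 1))
        elif ancestor.name in ("chapter", "section"):
            parts.append(str(ancestor.position))
    return ".".join(parts)


# The fragment is section 3 of appendix B of the original document.
label = heading_label([
    Ancestor("book", 1),
    Ancestor("appendix", 2),
    Ancestor("section", 3),
])
```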

To accomplish these ends, this activity will define:

A. Potential reference scenarios (Non-normative)

A.1 One element of a transaction record as a fragment

The user has an XML document that represents a customer's set of purchases at a bookstore, and the part of that document that represents the purchase of a particular book needs to be represented as a fragment.

A.2 A user selection (aka highlighted region) as a fragment

The user makes a well-balanced selection in the original document and wants to make the contents of that selection a fragment.
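
A rough way to test whether a selection is well-balanced is to wrap it in a dummy element and check that the result is well-formed. The sketch below assumes Python's standard xml.etree.ElementTree parser; the function name is_well_balanced and the wrapper element name are hypothetical. It is only an approximation of XML 1.0's "content" production: because no context is supplied, it also rejects selections that use entity references or namespace prefixes declared elsewhere in the document.

```python
import xml.etree.ElementTree as ET


def is_well_balanced(chunk):
    """Approximate check that `chunk` could appear as element content.

    The chunk is wrapped in a dummy element; if the wrapped text is
    well-formed, every tag opened in the chunk is also closed in it.
    """
    try:
        ET.fromstring("<wrapper>" + chunk + "</wrapper>")
    except ET.ParseError:
        return False
    return True


# Mixed content with a comment and a processing instruction is acceptable.
balanced = is_well_balanced("text <b>bold</b> more <!-- note --> <?app hint?>")

# A selection that starts inside one element and ends inside another is not.
unbalanced = is_well_balanced("</i> and <i>")
```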

A.3 Entities as fragments

A user has an XML document composed of several entities, and she wants to be able to edit each entity standalone while keeping each one referenceable from the parent document (i.e., each entity has to be both a valid XML entity and a legal fragment at the same time).

A.4 Indexes into a large document

The user has very large XML documents, possibly a gigabyte or more in size, and wishes to be able to view portions of the document without parsing the whole document. In order to do this the user creates an "index" for each document portion (fragment) that they wish to so address. The "index" consists of a fragment context specification in combination with a packaging mechanism designed for quick access to the fragment body.
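
One possible shape for such an "index" is sketched below, assuming the packaging mechanism is a byte offset and length into the stored document and the context is reduced to a list of ancestor element names. The IndexEntry class, the read_fragment function, and the sample document are hypothetical and are not part of this Note:

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class IndexEntry:
    offset: int       # byte offset of the fragment body in the stored document
    length: int       # byte length of the fragment body
    ancestors: tuple  # hypothetical context: names from the root to the parent
                      # (assumed non-empty in this sketch)


def read_fragment(stream, entry):
    """Seek straight to the fragment body and parse only that portion."""
    stream.seek(entry.offset)
    body = stream.read(entry.length).decode("utf-8")
    opening = "".join(f"<{name}>" for name in entry.ancestors)
    closing = "".join(f"</{name}>" for name in reversed(entry.ancestors))
    root = ET.fromstring(opening + body + closing)
    # Walk down the reconstructed ancestor chain to the fragment's parent.
    node = root
    for _ in entry.ancestors[1:]:
        node = node[0]
    return list(node)


# Stand-in for a very large document; a real one would live on disk.
DOCUMENT = (b"<book><chapter><title>One</title></chapter>"
            b"<chapter><title>Twenty</title><para>Text</para></chapter></book>")

# The index would normally be built once, ahead of time, by the sender.
body = b"<title>Twenty</title><para>Text</para>"
entry = IndexEntry(DOCUMENT.index(body), len(body), ("book", "chapter"))

elements = read_fragment(io.BytesIO(DOCUMENT), entry)
```

With this layout, the cost of viewing a portion depends on the size of that portion rather than on the size of the whole document.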

