This is a W3C Note produced as a deliverable of the XML Linking WG according to its charter and Published as a W3C NOTE February 23 1999. A list of current W3C working drafts and notes can be found at http://www.w3.org/TR.

This document was originally a portion of the XPointer Requirements document. It was separated out by decision of the XML Linking WG taken at the meeting of November 21-22, 1998 in Chicago, to become a liaison statement to the XML Information Set Working Group.

Requirements for a virtual abstract structure tree

The abstract information structure of an XML document defines the components that links address into using XPointers. This structure must be explicit and unambiguous for XPointer to work, because:

Otherwise an XPointer would be ambiguous: two conforming implementations could interpret the same XPointer as pointing to two different places. Questions as simple as "What is the element with id='foo'?" (is id a name or a type?), and "Which node is the third child of this other node?" (do PIs count?) would become unanswerable.
Otherwise XPointers could not achieve robustness even against trivial, non-structural changes in syntax (such as inserting spaces around "=" in start-tags). Such robustness is highly desirable, especially when XML is generated automatically or dynamically. It would be very bad if XPointers could be broken by such semantically insignificant changes, and far worse if they could make XPointers quietly point to different places.

Given this, some requirements on that abstract structure tree representation are laid out below. Merely to have a term in this document, we refer to any XML information representation as a Virtual Abstract Structure Tree, or "VAST". We choose such a generic term to prevent tieing these constraints to any particular model, API, or implementation. These requirements should be considered liaison information for DOM and especially for the XML Information Set WG, but are not intended to constrain the details of any API or implementation, beyond the need to somehow support the aspects of functionality needed by XPointer.

A: Completeness requirements

An information structure to support XPointer (call it "VAST" for the moment) must represent elements, attributes, characters, PIs, and unparsed entities. The treatment of comments is still under consideration by the XML Linking WG.

All these things might be needed link targets for someone. Although an end user may only rarely want to link to a PI or comment, there is no principled reason they should not, and there are real cases already where crucial information (such as scripts) is buried in comments. User annotations in document authoring and review applications must be able to point to all these things in order to comment on them. Also, XPointer is not only for end-user navigation links. Programs can use it to represent internal connections; in such scenarios they certainly may want to refer to a particular processing instruction to attach further data to it, connect it to workflow, version control, or other meta-information, and so on. In short, any structure XML defines is a real object in XML documents, and needs to be supported; syntax details such as whitespace within start-tags are on quite a different level from whole events such as PIs.
A VAST must represent all the containment and order relations among the nodes that constitute information.

For example, the order of paragraphs in a section is normally information, but the order of attributes in a tag is not. An unordered model, such as one that would require retrieving all children of a node and then sorting by an internal field in order to get them in order, or one that would require visiting all descendants in order to reach a sibling, is unacceptable (though such an implementation need not conflict with this conceptual model at all).
A VAST must be able to represent documents in which there are multiple nodes surrounding the document root element (comments, PI, DOCTYPE, etc).

This is commonly done by positing a reserved true-root node as parent of the document element, which thus contains all this stuff. This is natural because most of the same methods are meaningful for this node as for any other. However, many other implementations and interfaces are possible.

B: Robustness requirements

The same structure must be generated regardless of whether the parser was WF or validating.

This is critical for XPointer, because otherwise links couldn't be reliably followed when the link-maker and link-user happened to use different kinds of parsers. For invalid documents, of course, a validating parser will stop parsing and therefore result in a radically different VAST, or no VAST at all; but for valid documents the VASTs must be the same.

The XML Linking WG realizes there are considerable difficulties with completely fulfilling this requirement. We do not see default attributes as the crucial aspect, since they may be relatively unlikely to be referenced by Xpointers; more important are distinction of child-counting, that would throw off geometric pointing into element structure trees, on which we depend for certain aspects of addressing.

[Implementor note: Since attributes may receive default values based on declarations in the internal subset, these must be supported by WF systems that are to fully support XPointer.]
A VAST must distinguish the element/attribute relationship from the element/subelement relationship. That is, attributes are 'owned' (or some other term), not 'children'; thus they do not confuse traversals, child counting for linking, etc.
A VAST must include all whitespace in content from the parser.

As with attribute internals, a WF parser cannot determine which whitespace is insignificant, and so interoperability of XPointers appears to require this.

C: Mechanical and syntactic requirements

A VAST must permit efficient implementations for reaching a complete resource given a sub-resource of it, and for reaching intermediate-scope nodes between the two.

For example, if a link results in references to one or more P elements, it is usually not user-useful to retrieve the individual P elements in isolation, and be unable to display them in context. The interpretation of text requires context. There are certain scenarios where information portions out of context suffice but in far more scenarios context is required. For example:
- In nearly all style languages, even a P element cannot be formatted correctly without knowing where it lives in the larger tree context, due to style inheritance, auto-numbering algorithms, and so on.
- In many applications the user must be able to move up or down to read the preceding and following contexts, or to identify the current node in a more global view such as a table of contents.
- Intellectual property issues may impose requirements that information be displayed with certain context. One obvious piece of contextual information is a copyright notice on the document out of which the linked data is drawn.
- In general, the content one gets by discarding markup is not the correct text; this is because some tags imply a word-break (ITEM) and some do not (I); some tags imply much more, such as a digression to a completely different text (FOOTNOTE). So returning "the content" may simply produce the wrong result.
XPointer's abstract structural requirements must permit varying implementations of how children are attached to parents, how attributes are attached to element nodes, and so on, but need not closely mirror any of them.

[Implementor note: There are many well-known and efficient data structures for representing structure trees: arrays of children at each node, doubly-linked sibling lists, threaded arrays, and so on. Any are fine so long as they provide the necessary context access.]
Characters must be accessible just as other nodes are.

This is critical for XPointer because it is extremely common to point to characters, especially with the ever-popular "select/create link" interfaces.

If characters could not be treated as easily as other nodes, the XPointer definition would have to become more complicated. It could not simply state that the result of evaluating an XPointer is just a node, or a list of nodes or locations, or even a list of start/end pairs. Instead, a pointer to a character would have to consist of 3 things together: some pointer to a nearby node (say, the one to the right, or the parent), plus a bit to say that this time it's pointing to a character associated with that node instead of to the node itself, plus something to say which associated character. For example,
```
     Node 378 (say, a P node)
```
versus
```
     Text preceding node 378, character 12
```
Note: This obviously does not constrain how particular implementations allocate or arrange their internal data structures. But XPointer needs to be able to access and count positions without regard to how particular implementations may or may not "chunk" things. Propogating such distinctions into XPointer would hugely complicate implementations. Accessing and counting characters without concern for possible intermediate text-chunk nodes is the conceptual model needed by XPointer, though we fully understand that using some form of strings or chunks may often be the best implementation approach. Our constraints are specified in terms of a conceptual model for determining how things are counted, and so use characters. So long as we can do that, we are not concerned about the underlying implementation method(s), from storing internal text nodes, to forwarding-pointer-enhanced linked lists, to indexed arrays; these could all present things consistent with the way our abstract model counts.]

D: Non-requirements

A VAST need not represent the internal structure of attributes (such as tokenization).

One reason for this is that it follows from the requirement that WF and validating parsers provide the same tree: since WF parsers do not in general know whether an attribute is NMTOKENS or CDATA, they cannot determine what the internal structure is.

Also, since schemas will likely provide internal structuring capability for attribute values, anything put in now for this will be soon obsolete. Applications can interpret attributes however they want (such as interpreting them as quantities and doing math on them). However, we do know that attributes will continue to be strings, and will not contain true elements or other nodes, so we can leave them at that for now.
A VAST need not represent DTD structures at this time.

If schema efforts using XML syntax are successful, then VAST can clearly be used for DTDs. If not, an abstract representation of DTDs can be added in a later revision. The abstract tree, however, must be able to represent the DOCTYPE reference to a DTD, so the DTD can be found by any applicable process.

[Implementor note: The effect of attribute defaults declared in the internal subset is represented, but the VAST need not include a representation of the declarations that created that effect.
A VAST may, but need not for XPointer, provide information about CDATA marked sections and entity boundaries. It must, however, easily provide a view without this information.

If such information is included in an implementation, there must still be access to a simpler view in an extremely simple, fast way, equivalent to mere deletion of nodes from the more complex view: no re-parenting, reordering, promotion, demotion, or merging of other node, etc. This is so that node counting and traversal will not be adversely affected by whether the user is in an application that requires such information for other legitimate purposes, such as editing.

As always, any of the countless possible implementations that can provide these external requirements is fine; the requirement for XPointer is that its task not be made unduly costly or painful because in consequence.

[Implementor note: One way to do this is to define "raw" and "cooked" views, where the raw view includes the extra nodes. Editors and some other applications would probably need the full raw information internally, but XPointer and some other applications can be defined strictly in terms of the (subset) cooked view. This is crucial because only a cooked view can be guaranteed identical across applications whether or not they support raw views.]
A VAST need not directly represent derived relationships such as the connection between IDREFs and the like-keyed ID, or the relationship between all elements with the same element type, so long as it does not make it impossible to associate such information in other ways.

For example, the conceptual model does not impose any requirement that implementations embed hard pointers to resolve all IDREFs, as opposed to building some other method to get from an IDREF to the corresponding ID. Strictly speaking, such information is semantic rather than purely syntactic: XML syntax determines a tree; semantic relationships (the most obvious example being IDREFs) may determine arbitrarily more complex structures (this is also clear from the fact that such semantic information cannot affect well-formedness).

[Implementor note: Putting such information directly into a VAST would pose several problems:
- It would add great complexity to the specification, and would introduce subtle questions about just which derived relationships to represent (for example, should attributes containing XPointers be connected just as IDREF attributes are?).
- It could mislead implementors to believe they must implement such relationships via direct pointers in the VAST, when we know that differing implementations have proven useful in established applications with varying user requirements (separate ID indexes, serial search, hard internal pointers, etc.).
- A VAST that included such relationships on par with the primary syntactic element/sub-element structure would not be a tree. It would not even be a directed acyclic graph, but would be a fully general cyclic graph structure. Cyclic graphs are far more sophisticated and subtle to deal with, and many implementors are not familiar with them. Even such an everyday task as traversing the structure is hard (and is often done by creating a subset of the graph that is a tree, and traversing that "spanning tree" -- see Liu 1977). Many operations, such as order comparisons, traversals, import/export, and so on, are far easier to define on a tree.
Such relationships clearly exist, and can be handled just fine (many systems have done so for a long time); indeed representing them is much of the purpose of XPointer. Problems only arise from describing them as on par with the basic syntactic node relationships of the defining structural model. Doing so would complicate implementing and teaching the model:

On the one hand, structure trees have repeatedly proven easy to teach and to motivate. Describing the element/text tree is easy; adding PIs and comments is easy from there; and describing attributes as properties rather than children still leaves the structure a tree, and is not difficult. Perhaps this ease of teaching is because this model fits the HTML and XML overall models so clearly.

On the other hand, we know from many attempts that drawing all the other (semantic) relationships on top of the tree structure obscures it. When told the conceptual structure isn't "really" a tree, less-technical users' eyes quickly glaze over, while implementors are quickly tempted to discard well-known and extremely efficient algorithms that they know apply only to trees.

[Note: Most implementations of trees are not themselves trees anyway. One of many examples is a useful representation of n-ary trees (mentioned in Knuth 1973) that uses a doubly-linked list of children for each node. Obviously this is not a tree if you "count" all those back-links; but what defines a data structure as a tree representation is its conceptual topology, not its implementation. Knuth's formal definition is simply:

a finite set T of one or more nodes such that
a) There is one specially designated node called the root of the tree, root(T); and
b) The remaining nodes (excluding the root) are partitioned into m>=0 disjoint sets T1,..., Tm, and each of these sets in turn is a tree. The trees T1,..., Tm are called the subtrees of the root.

In simpler terms, this just means that as long as there is a root with no parent and every other node has one parent, it's a tree. It quite simply doesn't matter how many other pointers, links, arcs, or whatever else you have floating around, as long as each node but the root has exactly one connection that you call "parent". ]

This version:	http://www.w3.org/TR/1999/NOTE-xptr-infoset-liaison-19990224
Latest version:	http://www.w3.org/TR/NOTE-xptr-infoset-liaison
Editor:	Steven J. DeRose (Inso Corp. & Brown Univ.) <Steven_DeRose@Brown.edu>

XPointer-Information Set Liaison Statement

W3C Note 24-Feb-1999

Status of this document

Related documents

Abstract

Table of Contents

Requirements for a virtual abstract structure tree

A: Completeness requirements

B: Robustness requirements

C: Mechanical and syntactic requirements

D: Non-requirements

Bibliography