XPointer Framework

1 Introduction

This specification defines the XML Pointer Language (XPointer) Framework, an extensible system for XML addressing that underlies additional XPointer scheme specifications. The framework is intended to be used as a basis for fragment identifiers for any resource whose Internet media type is one of text/xml, application/xml, text/xml-external-parsed-entity, or application/xml-external-parsed-entity. Other XML-based media types are also encouraged to use this framework in defining their own fragment identifier languages.

Many types of XML-processing applications need to address into the internal structures of XML-encoded resources using URI references; examples include the XML Linking Language [XLink], XML Inclusions [XInclude], the Resource Description Framework [RDF], and SOAP V1.2 [SOAP12]. This specification does not constrain the types of applications that utilize URI references to XML-encoded resources, nor does it constrain or dictate the behavior of those applications once they locate the desired information in those resources.

1.1 Notation

[Definition: The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC 2119].]

The formal grammar for the XPointer Framework is given using simple Extended Backus-Naur Form (EBNF) notation, as described in the XML Recommendation [XML].

1.2 Terminology

[Definition: pointer]: A string conforming to this specification. This specification defines the syntax and semantics of pointers.
[Definition: pointer part]: A portion of a pointer that provides a scheme name and some pointer data that conforms to the definition of that scheme.
[Definition: scheme]: A specialized pointer data format that has a name and is defined in a specification.
[Definition: XPointer processor]: A software component that identifies some subresource of an XML-encoded resource by applying a pointer to it. This specification defines the behavior of XPointer processors.
[Definition: application]: A software component that incorporates or uses an XPointer processor because it needs to access XML-encoded resources by means of URI references. The occurrence and usage of URI references, and the behavior to be applied to resources and subresources obtained by accessing those URI references, are governed by the definition of each application's corresponding data format (which could be XML-based or non-XML-based). For example, HTML [HTML] Web browsers and XInclude processors are applications that might use XPointer processors.
[Definition: error]: A violation of the rules of this specification; results are undefined. Specifications of XPointer schemes may define their own error conditions that have different consequences from XPointer Framework errors.
[Definition: failure]: The inability of an XPointer processor to identify a subresource within an XML-encoded resource. Note that while failure of a shorthand pointer (shorthand) causes an XPointer Framework error, failure of a pointer part does not.
[Definition: namespace binding context]: A list of zero or more pairs of XML Namespace-defined [XML-Names] namespace prefixes and their associated namespace URIs.

2 Conformance

This specification defines a framework; it does not currently define a minimum conformance level for XPointer processors. Thus, the information in this section defines conformance requirements only for the framework portion of any minimum conformance level.

XPointer processors normatively depend on reversing any fragment identifier encoding and escaping done in conformance to [RFC 2396] (as updated by [RFC 2732]).

XPointer processors normatively depend on sufficient input about an XML resource to identify the following information items and properties, in the cases where they exist in the resource:

From the XML Information Set [Infoset]:
- document information item
  - [document element] property
    Note that if the XML resource is not a document but rather an external parsed entity, this property will not be reported. Rather, the information set is effectively extended to report the one or more top-level elements in the entity as ordered "root element" properties for the entity.
- element information item
  - [attributes] property
  - [children] property
- attribute information item
  - [attribute type] property
  - [normalized value] property
From the XML Schema post-schema validation information set (PSVI) [XMLSchema]:
- [schema normalized value] property
- Either:
  - [member type definition] property
  - [type definition] property
  or:
  - [member type definition namespace] property
  - [member type definition name] property
  - [type definition namespace] property
  - [type definition name] property

Software components claiming to be XPointer processors must conform to this XPointer Framework specification and any other specifications that, together with this specification, define the minimum conformance level for XPointer, and may conform to additional XPointer scheme specifications. XPointer processors must document the additional scheme specifications to which they conform. Specifications that depend on XPointer processing should document the schemes they require and allow.

Conforming XPointer processors must report XPointer Framework errors to the application. Applications may terminate or recover from XPointer Framework errors in any fashion. XPointer processors should not report failure conditions that do not result in error conditions.

3 Language and Processing

This section describes the XPointer Framework and the behavior of XPointer processors with respect to the framework.

An XPointer processor takes as input an XML-encoded resource and a fragment identifier taken from the URI reference that was used to access the resource, and produces as output either an identification of some subresource within that resource based on the pointer extracted from the fragment identifier, or one or more errors, or both. If the fragment identifier contains any characters that have been encoded or escaped to conform to [RFC 2396] (as updated by [RFC 2732]) requirements, the XPointer processor must reverse the encoding or escaping in order to interpret the pointer.
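
The unescaping step can be illustrated with a minimal Python sketch. It assumes the fragment identifier was escaped as UTF-8 octets in %HH form; the function name `pointer_from_uri` and the sample URI reference are hypothetical and not part of this specification.

```python
from urllib.parse import unquote, urlsplit


def pointer_from_uri(uri_reference: str) -> str:
    """Return the pointer carried in a URI reference's fragment identifier.

    Assumption: escaped octets in the fragment represent UTF-8 text.
    """
    fragment = urlsplit(uri_reference).fragment
    # unquote() reverses %HH escaping and decodes the octets as UTF-8.
    return unquote(fragment)


if __name__ == "__main__":
    print(pointer_from_uri("doc.xml#element(/1/2)%20xpointer(id('a%5E(b'))"))
```

The result is the pointer string on which the syntax rules of section 3.1 are then applied.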

3.1 Syntax

If a string used as a pointer does not adhere to the syntax defined in this section, it is an error.

XPointer Framework Syntax

[1]	`pointer`	::=	`shorthand | schemebased`
[2]	`shorthand`	::=	`Name`
[3]	`schemebased`	::=	`ptrpart (S? ptrpart)*`
[4]	`ptrpart`	::=	`scheme '(' schemedata ')'`	[VC: Parenthesis escaping]
[5]	`scheme`	::=	`NCName`
[6]	`schemedata`	::=	`Char*`

Validity constraint: Parenthesis escaping

The end of a pointer part is signaled by the right parenthesis ")" character that is balanced with the left parenthesis "(" character that began the part. If either a left or a right parenthesis occurs without being balanced by its counterpart, it must be escaped with a circumflex (^) character preceding it. Any literal occurrences of the circumflex must be escaped with an additional circumflex (that is, ^^). Any other use of a circumflex is an error.
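
As an illustration of the parenthesis-escaping rules, the following Python sketch splits a scheme-based pointer into its parts. It is not a normative algorithm. The function name `split_pointer_parts` is hypothetical, the scheme-name pattern is an ASCII-only approximation of NCName, and returning the scheme data with circumflex escapes already reversed is a choice made by this sketch.

```python
import re

# Assumption: ASCII-only approximation of NCName; the real production
# admits many more Unicode characters.
_SCHEME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def split_pointer_parts(pointer: str) -> list[tuple[str, str]]:
    """Split a scheme-based pointer into (scheme, unescaped data) pairs."""
    parts = []
    i, n = 0, len(pointer)
    while i < n:
        m = _SCHEME.match(pointer, i)
        if m is None or m.end() >= n or pointer[m.end()] != "(":
            raise ValueError(f"expected scheme name and '(' at offset {i}")
        i = m.end() + 1                  # first character after the opening "("
        depth, data = 1, []
        while i < n:
            ch = pointer[i]
            if ch == "^":                # only ^(, ^) and ^^ are permitted
                if i + 1 >= n or pointer[i + 1] not in "()^":
                    raise ValueError(f"illegal circumflex at offset {i}")
                data.append(pointer[i + 1])
                i += 2
                continue
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:           # the balancing ")" ends this part
                    break
            data.append(ch)
            i += 1
        if depth != 0:
            raise ValueError("unbalanced parenthesis in scheme data")
        parts.append((m.group(), "".join(data)))
        i += 1                           # step past the closing ")"
        j = i
        while j < n and pointer[j] in " \t\r\n":   # optional S between parts
            j += 1
        if j == n and j != i:
            raise ValueError("white space after the last pointer part")
        i = j
    if not parts:
        raise ValueError("empty pointer")
    return parts
```

For example, under these assumptions `split_pointer_parts("x(a^)b^^c)")` yields a single part whose scheme is `x` and whose data is `a)b^c`, while `x(a^b)` is rejected because the circumflex escapes nothing.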

3.2 Shorthand Pointer

A shorthand pointer consists of an XML-defined Name alone. The Name identifies a single element in the XML resource by ID as follows, in terms of the XML resource's information set:

If the document has an XML Schema [XMLSchema] PSVI, and exposes the [type definition] and [member type definition] properties, then:
Return the first element information item in document order with an [attributes] collection containing an attribute information item, or whose [children] collection contains an element information item, which has a [schema normalized value] equal to the Name and which has a type definition (the [member type definition], if one is present, otherwise the [type definition]) which has either:
1. a {target namespace} of "http://www.w3.org/2001/XMLSchema" and a {name} of "ID" (having the effect of identifying the element information item associated with an identifier that is directly assigned the XML Schema ID type), or
2. a {base type definition} that directly or recursively satisfies item 1 (having the effect of identifying the element information item associated with an identifier that is directly or indirectly derived from the XML Schema ID type);
or, if an element information item was not identified in the previous step because the document has an XML Schema PSVI but does not expose the [type definition] and [member type definition] properties, then:
Return the first element information item in document order with an [attributes] collection containing an attribute information item, or whose [children] collection contains an element information item, which has a [schema normalized value] equal to the Name and which has a [type definition namespace] of "http://www.w3.org/2001/XMLSchema" and a [type definition name] of "ID" or a [member type definition namespace] of "http://www.w3.org/2001/XMLSchema" and a [member type definition name] of "ID";
or, if an element information item was not identified in the previous step:
Return the first element information item in document order with an [attributes] collection containing an attribute information item with an [attribute type] of "ID" and a [normalized value] equal to the Name.

If no element is identified by this process, the Name fails and the pointer is in error.
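
The last, DTD-based step of this process can be sketched in Python as follows. The `Attribute` and `Element` classes are hypothetical stand-ins for the corresponding information items, holding only the properties that step needs; the PSVI-based steps are omitted.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Attribute:
    """Hypothetical stand-in for an attribute information item."""
    name: str
    attribute_type: str      # the [attribute type] property, e.g. "ID" or "CDATA"
    normalized_value: str    # the [normalized value] property


@dataclass
class Element:
    """Hypothetical stand-in for an element information item."""
    name: str
    attributes: list[Attribute] = field(default_factory=list)
    children: list["Element"] = field(default_factory=list)


def document_order(root: Element) -> Iterator[Element]:
    """Yield the element and its descendants in document order."""
    yield root
    for child in root.children:
        yield from document_order(child)


def resolve_shorthand(root: Element, name: str) -> Optional[Element]:
    """Return the first element with an ID-typed attribute equal to name."""
    for element in document_order(root):
        for attr in element.attributes:
            if attr.attribute_type == "ID" and attr.normalized_value == name:
                return element
    return None  # the Name fails; the caller reports a framework error
```

An attribute that happens to be named `id` but is typed CDATA does not match in this sketch, which mirrors the dependence on ID typing information described above.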

Note:

A shorthand pointer provides, for resources with XML-based media types, a rough analog of HTML fragment identifier behavior. However, if ID typing information is not available because no DTD or schema information is available, the pointer will not identify any element. There are several ways to make element identification more reliable. For example, the creator of a resource can use an internal DTD subset to indicate the presence of ID-typed attributes, and the creator of a pointer can, instead of a shorthand pointer, use a scheme-based pointer and provide one or more schemes that address the desired element in other ways.

3.3 Scheme-Based Pointer

A scheme-based pointer consists of one or more pointer parts, optionally separated by XML-defined white space (S). Each part has a scheme name and contains, within parentheses, data (zero or more of the XML-defined Char) conforming to the named scheme. If the scheme data contains parentheses, they must be either balanced or escaped.

In the case of URI references referring to any resource whose Internet media type is one of text/xml, application/xml, text/xml-external-parsed-entity, or application/xml-external-parsed-entity, this specification reserves all scheme names for definition in additional W3C XPointer scheme specifications. However, the scheme mechanism provides a general framework for extensibility; other XML-based media types are also encouraged to use this framework in defining their own fragment identifier languages.

When multiple pointer parts are provided, an XPointer processor must evaluate them in left-to-right order. If a part being evaluated fails because the XPointer processor does not support the scheme, because the scheme data is syntactically in error according to the specification governing that scheme, or because the scheme identifies no subresource, that part is consumed and the next, if any, is evaluated. The result of the first pointer part whose evaluation succeeds is reported by the XPointer processor as the subresource identified by the pointer as a whole, and evaluation stops. If all the parts fail, it is an error. If a scheme-based pointer has an error in its construction as a whole, evaluation stops and pointer parts are not consumed.
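
The left-to-right evaluation rule can be sketched in Python as below. Everything named here is an assumption made for illustration: the handler registry, the convention that a handler raises `ValueError` for scheme data it considers syntactically in error and returns `None` when it identifies no subresource, and the exception class `XPointerFrameworkError`.

```python
from typing import Callable, Mapping, Optional, Sequence


class XPointerFrameworkError(Exception):
    """Hypothetical exception used to report a framework error."""


# A handler receives the scheme data and the resource, and returns the
# identified subresource or None (assumed convention for this sketch).
SchemeHandler = Callable[[str, object], Optional[object]]


def evaluate_parts(
    parts: Sequence[tuple[str, str]],
    resource: object,
    handlers: Mapping[str, SchemeHandler],
) -> object:
    """Return the result of the first pointer part that succeeds."""
    for scheme, data in parts:
        handler = handlers.get(scheme)
        if handler is None:
            continue                 # unsupported scheme: the part fails
        try:
            result = handler(data, resource)
        except ValueError:
            continue                 # scheme data in error: the part fails
        if result is not None:
            return result            # first success wins; evaluation stops
    raise XPointerFrameworkError("all pointer parts failed")
```

Later parts are never evaluated once one part succeeds, so the order in which a pointer's author lists the parts expresses a preference among alternative ways of addressing the same subresource.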

A scheme-based pointer might contain characters that are not allowed in a URI reference. For example, this XPointer Framework specification allows spaces between pointer parts, and individual XPointer scheme specifications might allow or require other characters that are disallowed in URI references. During creation of the pointer, it is typically not necessary to perform encoding or escaping on disallowed characters. However, by the time a pointer is fed as input (as a fragment identifier on a URI reference) into a URI resolver, any such characters must have been encoded and/or escaped as defined in [RFC 2396] (as updated by [RFC 2732]).
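
A minimal Python sketch of the escaping step follows. It assumes the pointer is converted to UTF-8 before escaping and that characters in the "reserved" and "mark" sets of [RFC 2396] may be left as they are; the function name `escape_pointer` and the exact set of characters kept are choices of this sketch, not requirements stated here.

```python
from urllib.parse import quote

# Assumption: the RFC 2396 "reserved" and "mark" characters are left alone.
# ASCII letters and digits are never escaped by quote().
_KEEP = ";/?:@&=+$,-_.!~*'()"


def escape_pointer(pointer: str) -> str:
    """Escape a pointer for use as a fragment identifier.

    Disallowed characters (spaces, circumflexes, "%", non-ASCII text)
    become %HH escapes of their UTF-8 octets.
    """
    return quote(pointer, safe=_KEEP)


if __name__ == "__main__":
    print(escape_pointer("xpointer(id('a b')) element(/1/2)"))
```

Note that circumflex escaping of parentheses is applied when the pointer is written, before this step, so an escaped parenthesis such as `^)` appears in the URI reference as `%5E)`.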

3.4 Namespace Binding Context

Scheme specifications may define ways to bind XML Namespaces [XML-Names] prefixes to namespace names for the purpose of interpreting element and attribute names' namespace prefixes appearing in pointer parts. These bindings contribute to a namespace binding context that applies to all pointer parts to the right of the pointer part making the binding, unless exceptions are explicitly made by the schemes in question. The documentation for any namespace-binding scheme must specify whether its bindings remain in effect for later pointer parts. The documentation for every scheme must specify whether it uses the namespace binding context.

The initial namespace binding context prior to evaluation of the first pointer part consists of a single entry: the xml prefix bound to the URI http://www.w3.org/XML/1998/namespace. Pointer parts must not attempt to redefine the xml prefix; any attempt to do so results in no change being made to the namespace binding context.
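
The behavior of the namespace binding context can be sketched in Python as follows. The class name `NamespaceBindingContext`, its methods, and the use of a dictionary to hold the prefix and URI pairs are all assumptions of this sketch; how a scheme actually contributes bindings is left to scheme specifications.

```python
from typing import Optional

XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace"


class NamespaceBindingContext:
    """Hypothetical model of the prefix-to-URI pairs seen by pointer parts."""

    def __init__(self) -> None:
        # Initial context: a single entry for the xml prefix.
        self._bindings = {"xml": XML_NAMESPACE}

    def bind(self, prefix: str, uri: str) -> bool:
        """Add or replace a binding; return False if the request was ignored."""
        if prefix == "xml":
            return False             # the xml prefix cannot be redefined
        self._bindings[prefix] = uri
        return True

    def lookup(self, prefix: str) -> Optional[str]:
        """Return the namespace URI bound to prefix, or None if unbound."""
        return self._bindings.get(prefix)
```

In a processor built along these lines, one context object would be created per pointer and passed to each pointer part in turn, so that bindings made by one part are visible to the parts on its right.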

W3C Working Draft 10 July 2002