W3C

XPointer Framework

W3C Working Draft 10 July 2002

This version:
http://www.w3.org/TR/2002/WD-xptr-framework-20020710/
Latest version:
http://www.w3.org/TR/xptr-framework/
Previous version:
http://www.w3.org/TR/2001/CR-xptr-20010911/
Editors:
Paul Grosso, Arbortext, Inc. <pgrosso@arbortext.com>
Eve Maler, Sun Microsystems <eve.maler@sun.com>
Jonathan Marsh, Microsoft <jmarsh@microsoft.com>
Norman Walsh, Sun Microsystems <Norman.Walsh@Sun.COM>

This document is also available in the following non-normative format: XML (DTD, XSL).


Abstract

This specification defines the XML Pointer Language (XPointer) Framework, an extensible system for XML addressing that underlies additional XPointer scheme specifications. The framework is intended to be used as a basis for fragment identifiers for any resource whose Internet media type is one of text/xml, application/xml, text/xml-external-parsed-entity, or application/xml-external-parsed-entity. Other XML-based media types are also encouraged to use this framework in defining their own fragment identifier languages.

Status of this Document

This is a Last Call W3C Working Draft for review by W3C members and other interested parties. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress." Comments on this document should be sent no later than 31 July 2002 to the public mailing list www-xml-linking-comments@w3.org (archive).

This document has been produced by the W3C XML Linking Working Group as part of the XML Activity. The goals of this work are set out in the XPointer Requirements document.

There are patent disclosures and license commitments associated with this working draft, which may be found on the XPointer IPR Statement page in conformance with W3C policy.

Even though it has not been seen before in this form, this specification is being published as a Last Call Working Draft because it is essentially a subset of the previous specification. This specification contains only the "bare names" and scheme mechanism features that were in the XPointer Candidate Recommendation published on 11 September 2001. Note that the "bare names" functionality has been extended slightly to include a schema-based ID addressing option.

We are specifically seeking input on what the XML Linking WG should recommend as a minimum conformance level for the purposes of XPointer usage in fragment identifiers for any resource whose Internet media type is one of text/xml, application/xml, text/xml-external-parsed-entity, or application/xml-external-parsed-entity. Actually specifying this level is an issue that will have to be taken up normatively in the successor to IETF RFC 3023 [RFC 3023].

A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/

Table of Contents

1 Introduction
    1.1 Notation
    1.2 Terminology
2 Conformance
3 Language and Processing
    3.1 Syntax
    3.2 Shorthand Pointer
    3.3 Scheme-Based Pointer
    3.4 Namespace Binding Context

Appendix

A References
    A.1 Normative References
    A.2 Non-Normative References


1 Introduction

This specification defines the XML Pointer Language (XPointer) Framework, an extensible system for XML addressing that underlies additional XPointer scheme specifications. The framework is intended to be used as a basis for fragment identifiers for any resource whose Internet media type is one of text/xml, application/xml, text/xml-external-parsed-entity, or application/xml-external-parsed-entity. Other XML-based media types are also encouraged to use this framework in defining their own fragment identifier languages.

Many types of XML-processing applications need to address into the internal structures of XML-encoded resources using URI references, for example, the XML Linking Language [XLink], XML Inclusions [XInclude], the Resource Description Framework [RDF], and SOAP V1.2 [SOAP12]. This specification does not constrain the types of applications that utilize URI references to XML-encoded resources, nor does it constrain or dictate the behavior of those applications once they locate the desired information in those resources.

1.1 Notation

[Definition: The key words must, must not, required, shall, shall not, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC 2119].]

The formal grammar for the XPointer Framework is given using simple Extended Backus-Naur Form (EBNF) notation, as described in the XML Recommendation [XML].

1.2 Terminology

[Definition: pointer]

A string conforming to this specification. This specification defines the syntax and semantics of pointers.

[Definition: pointer part]

A portion of a pointer that provides a scheme name and some pointer data that conforms to the definition of that scheme.

[Definition: scheme]

A specialized pointer data format that has a name and is defined in a specification.

[Definition: XPointer processor]

A software component that identifies some subresource of an XML-encoded resource by applying a pointer to it. This specification defines the behavior of XPointer processors.

[Definition: application]

A software component that incorporates or uses an XPointer processor because it needs to access XML-encoded resources by means of URI references. The occurrence and usage of URI references, and the behavior to be applied to resources and subresources obtained by accessing those URI references, are governed by the definition of each application's corresponding data format (which could be XML-based or non-XML-based). For example, HTML [HTML] Web browsers and XInclude processors are applications that might use XPointer processors.

[Definition: error]

A violation of the rules of this specification; results are undefined. Specifications of XPointer schemes may define their own error conditions that have different consequences from XPointer Framework errors.

[Definition: failure]

The inability of an XPointer processor to identify a subresource within an XML-encoded resource. Note that while failure of a shorthand pointer (shorthand) causes an XPointer Framework error, failure of a pointer part does not.

[Definition: namespace binding context]

A list of zero or more pairs of XML Namespace-defined [XML-Names] namespace prefixes and their associated namespace URIs.

2 Conformance

This specification defines a framework; it does not currently define a minimum conformance level for XPointer processors. Thus, the information in this section defines conformance requirements only for the framework portion of any minimum conformance level.

XPointer processors normatively depend on reversing any fragment identifier encoding and escaping done in conformance to [RFC 2396] (as updated by [RFC 2732]).

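XPointer processors normatively depend on sufficient input about an XML resource to identify the following information items and properties [Infoset], in the cases where they exist in the resource. These are the items and properties that shorthand pointer evaluation (3.2) relies on:

  1. element information items, with their [attributes] and [children] properties;

  2. attribute information items, with their [attribute type] and [normalized value] properties;

  3. where an XML Schema [XMLSchema] PSVI is available, the [schema normalized value], [type definition], [member type definition], [type definition namespace], [type definition name], [member type definition namespace], and [member type definition name] properties.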

Software components claiming to be XPointer processors must conform to this XPointer Framework specification and any other specifications that, together with this specification, define the minimum conformance level for XPointer, and may conform to additional XPointer scheme specifications. XPointer processors must document the additional scheme specifications to which they conform. Specifications that depend on XPointer processing should document the schemes they require and allow.

Conforming XPointer processors must report XPointer Framework errors to the application. Applications may terminate or recover from XPointer Framework errors in any fashion. XPointer processors should not report failure conditions that do not result in error conditions.

3 Language and Processing

This section describes the XPointer Framework and the behavior of XPointer processors with respect to the framework.

An XPointer processor takes as input an XML-encoded resource and a fragment identifier taken from the URI reference that was used to access the resource, and produces as output either an identification of some subresource within that resource based on the pointer extracted from the fragment identifier, or one or more errors, or both. If the fragment identifier contains any characters that have been encoded or escaped to conform to [RFC 2396] (as updated by [RFC 2732]) requirements, the XPointer processor must reverse the encoding or escaping in order to interpret the pointer.
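The first processing step, recovering the pointer from a URI reference, can be sketched in Python with the standard library alone: `urldefrag` splits off the fragment identifier and `unquote` reverses %-escaping. The function name `pointer_from_uri_reference` is hypothetical, and decoding escaped octets as UTF-8 is an assumption; this section does not say which character encoding the escaped octets represent.

```python
from urllib.parse import unquote, urldefrag


def pointer_from_uri_reference(uri_ref: str) -> str:
    """Extract the fragment identifier and reverse its %-escaping.

    Hypothetical helper; assumes the escaped octets encode UTF-8 text.
    """
    _, fragment = urldefrag(uri_ref)  # text after the first '#', or ''
    # errors="strict" raises on malformed UTF-8 instead of silently
    # inserting replacement characters into the pointer
    return unquote(fragment, encoding="utf-8", errors="strict")


print(pointer_from_uri_reference("doc.xml#foo(a%20b)%5E%5E"))
# prints: foo(a b)^^
```

The result is the raw pointer, which can then be checked against the syntax in 3.1.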

3.1 Syntax

If a string used as a pointer does not adhere to the syntax defined in this section, it is an error.

XPointer Framework Syntax
[1]   pointer   ::=   shorthand | schemebased
[2]   shorthand   ::=   Name
[3]   schemebased   ::=   ptrpart (S? ptrpart)*
[4]   ptrpart   ::=   scheme '(' schemedata ')'   [VC: Parenthesis escaping]
[5]   scheme   ::=   NCName
[6]   schemedata   ::=   Char*

Validity constraint: Parenthesis escaping

The end of a pointer part is signaled by the right parenthesis ")" character that is balanced with the left parenthesis "(" character that began the part. If either a left or a right parenthesis occurs without being balanced by its counterpart, it must be escaped with a circumflex (^) character preceding it. Any literal occurrences of the circumflex must be escaped with an additional circumflex (that is, ^^). Any other use of a circumflex is an error.
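The grammar and the parenthesis-escaping constraint together are enough to split a pointer into its parts. The following Python sketch shows one way to do that. `parse_pointer` and `XPointerSyntaxError` are hypothetical names, the `Name`/`NCName` checks are ASCII-only approximations of the XML productions (real XML names admit many more characters), scheme data is not checked against `Char`, and returning the data with its circumflex escapes removed is an assumption about what a scheme receives.

```python
import re


class XPointerSyntaxError(ValueError):
    """Hypothetical: raised when a pointer violates the framework grammar."""


# ASCII-only approximations of the XML Name and NCName productions (assumption)
_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")
_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")
_S = " \t\r\n"  # XML white space characters (production S)


def parse_pointer(pointer: str):
    """Return ("shorthand", name) or ("schemebased", [(scheme, data), ...])."""
    if _NAME.fullmatch(pointer):
        return ("shorthand", pointer)
    if pointer != pointer.rstrip(_S):
        # ptrpart (S? ptrpart)* allows white space only between parts
        raise XPointerSyntaxError("trailing white space")
    parts, i, n = [], 0, len(pointer)
    while i < n:
        m = _NCNAME.match(pointer, i)
        if m is None or m.end() >= n or pointer[m.end()] != "(":
            raise XPointerSyntaxError(f"expected scheme name and '(' at offset {i}")
        scheme, i = m.group(), m.end() + 1  # i now points just past '('
        depth, data = 1, []
        while True:
            if i >= n:
                raise XPointerSyntaxError(f"unterminated pointer part {scheme!r}")
            c = pointer[i]
            if c == "^":
                # only ^( ^) ^^ are legal; escaped parens do not affect balance
                if i + 1 < n and pointer[i + 1] in "()^":
                    data.append(pointer[i + 1])
                    i += 2
                    continue
                raise XPointerSyntaxError(f"invalid circumflex at offset {i}")
            if c == "(":
                depth += 1
            elif c == ")":
                depth -= 1
                if depth == 0:  # the balancing ")" ends the part
                    i += 1
                    break
            data.append(c)
            i += 1
        parts.append((scheme, "".join(data)))
        while i < n and pointer[i] in _S:  # optional S between parts
            i += 1
    if not parts:
        raise XPointerSyntaxError("empty pointer")
    return ("schemebased", parts)
```

For example, `parse_pointer("foo(^)^^)")` yields one part whose data is `)^`, while `parse_pointer("foo(a^b)")` raises, since a circumflex may only precede a parenthesis or another circumflex.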

3.2 Shorthand Pointer

A shorthand pointer consists of an XML-defined Name alone. The Name identifies a single element in the XML resource by ID as follows, in terms of the XML resource's information set:

  1. If the document has an XML Schema [XMLSchema] PSVI, and exposes the [type definition] and [member type definition] properties, then:

    Return the first element information item in document order with an [attributes] collection containing an attribute information item, or whose [children] collection contains an element information item, which has a [schema normalized value] equal to the Name and which has a type definition (the [member type definition], if one is present, otherwise the [type definition]) which has either:

    (a) a {target namespace} of "http://www.w3.org/2001/XMLSchema" and a {name} of "ID" (having the effect of identifying the element information item associated with an identifier that is directly assigned the XML Schema ID type), or

    (b) a {base type definition} that directly or recursively satisfies (a) (having the effect of identifying the element information item associated with an identifier that is directly or indirectly derived from the XML Schema ID type);

  2. or, if an element information item was not identified in the previous step because the document has an XML Schema PSVI but does not expose the [type definition] and [member type definition] properties, then:

    Return the first element information item in document order with an [attributes] collection containing an attribute information item, or whose [children] collection contains an element information item, which has a [schema normalized value] equal to the Name and which has a [type definition namespace] of "http://www.w3.org/2001/XMLSchema" and a [type definition name] of "ID" or a [member type definition namespace] of "http://www.w3.org/2001/XMLSchema" and a [member type definition name] of "ID";

  3. or, if an element information item was not identified in the previous step:

    Return the first element information item in document order with an [attributes] collection containing an attribute information item with an [attribute type] of "ID" and a [normalized value] equal to the Name.

If no element is identified by this process, the Name fails and the pointer is in error.
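The lookup above can be sketched in Python over a small in-memory model of the information set. The `TypeDef`, `Attr`, and `Elem` classes are hypothetical stand-ins, not the API of any real PSVI implementation. Only steps 1 and 3 are modelled: the sketch assumes that any PSVI present exposes the [type definition] and [member type definition] properties, so step 2 never applies.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional, Union

XSD_NS = "http://www.w3.org/2001/XMLSchema"


class XPointerError(Exception):
    """Hypothetical: raised when a shorthand pointer identifies no element."""


@dataclass
class TypeDef:  # stand-in for a schema simple type definition
    target_namespace: Optional[str]
    name: Optional[str]
    base: Optional[TypeDef] = None  # {base type definition}


@dataclass
class Attr:
    name: str
    normalized_value: str
    attribute_type: Optional[str] = None  # DTD [attribute type], e.g. "ID"
    schema_normalized_value: Optional[str] = None  # PSVI properties
    type_definition: Optional[TypeDef] = None
    member_type_definition: Optional[TypeDef] = None


@dataclass
class Elem:
    name: str
    attributes: list[Attr] = field(default_factory=list)
    children: list[Elem] = field(default_factory=list)
    schema_normalized_value: Optional[str] = None  # PSVI properties
    type_definition: Optional[TypeDef] = None
    member_type_definition: Optional[TypeDef] = None


def derives_from_xsd_id(t: Optional[TypeDef]) -> bool:
    # (a) t is the XML Schema ID type, or (b) its base recursively satisfies (a)
    while t is not None:
        if t.target_namespace == XSD_NS and t.name == "ID":
            return True
        t = t.base
    return False


def document_order(root: Elem) -> Iterator[Elem]:
    stack = [root]  # pre-order depth-first traversal is document order
    while stack:
        e = stack.pop()
        yield e
        stack.extend(reversed(e.children))


def _schema_id(item: Union[Attr, Elem], name: str) -> bool:
    # use [member type definition] if present, otherwise [type definition]
    t = item.member_type_definition or item.type_definition
    return item.schema_normalized_value == name and derives_from_xsd_id(t)


def resolve_shorthand(root: Elem, name: str, has_psvi: bool) -> Elem:
    if has_psvi:  # step 1: schema-typed attributes or child elements
        for e in document_order(root):
            if any(_schema_id(i, name) for i in [*e.attributes, *e.children]):
                return e
    for e in document_order(root):  # step 3: DTD-declared ID attributes
        if any(a.attribute_type == "ID" and a.normalized_value == name
               for a in e.attributes):
            return e
    raise XPointerError(f"shorthand pointer {name!r} identifies no element")
```

Note that a matching child element identifies its parent, as the [children] clause of step 1 requires, and that an attribute whose type is only schema-derived is invisible when no PSVI is available.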

Note:

A shorthand pointer provides, for resources with XML-based media types, a rough analog of HTML fragment identifier behavior. However, if ID typing information is not available because no DTD or schema information is available, the pointer will not identify any element. There are several ways to make element identification more reliable. For example, the creator of a resource can use an internal DTD subset to indicate the presence of ID-typed attributes, and the creator of a pointer can, instead of a shorthand pointer, use a scheme-based pointer and provide one or more schemes that address the desired element in other ways.

3.3 Scheme-Based Pointer

A scheme-based pointer consists of one or more pointer parts, optionally separated by XML-defined white space (S). Each part has a scheme name and contains, within parentheses, data (zero or more of the XML-defined Char) conforming to the named scheme. If the scheme data contains parentheses, they must be either balanced or escaped.
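When scheme data is built programmatically, the escaping rule can be applied mechanically: double every circumflex and put a circumflex before any parenthesis that has no partner. The Python helper `escape_scheme_data` below is a hypothetical name; it deliberately leaves balanced parentheses alone, since the framework requires escapes only for unbalanced ones.

```python
def escape_scheme_data(data: str) -> str:
    """Escape raw scheme data so it can appear inside scheme( ... )."""
    open_stack = []  # indices of "(" not yet matched
    unbalanced = set()
    for i, c in enumerate(data):
        if c == "(":
            open_stack.append(i)
        elif c == ")":
            if open_stack:
                open_stack.pop()  # this ")" balances the most recent "("
            else:
                unbalanced.add(i)  # ")" with no partner
    unbalanced.update(open_stack)  # "(" left without a partner
    out = []
    for i, c in enumerate(data):
        if c == "^" or i in unbalanced:
            out.append("^")  # "^" becomes "^^"; lone parens become "^(" or "^)"
        out.append(c)
    return "".join(out)
```

A caller would then wrap the result, as in `f"{scheme}({escape_scheme_data(raw)})"`.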

In the case of URI references referring to any resource whose Internet media type is one of text/xml, application/xml, text/xml-external-parsed-entity, or application/xml-external-parsed-entity, this specification reserves all scheme names for definition in additional W3C XPointer scheme specifications. However, the scheme mechanism provides a general framework for extensibility; other XML-based media types are also encouraged to use this framework in defining their own fragment identifier languages.

When multiple pointer parts are provided, an XPointer processor must evaluate them in left-to-right order. If a part being evaluated fails because the XPointer processor does not support the scheme, because the scheme data is syntactically in error according to the specification governing that scheme, or because the scheme identifies no subresource, that part is consumed and the next, if any, is evaluated. The result of the first pointer part whose evaluation succeeds is reported by the XPointer processor as the subresource identified by the pointer as a whole, and evaluation stops. If all the parts fail, it is an error. If a scheme-based pointer has an error in its construction as a whole, evaluation stops and pointer parts are not consumed.
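The left-to-right fallback rule can be sketched in Python as a loop over already-parsed parts; a pointer that is malformed as a whole is assumed to have been rejected by the parser before this loop runs. Scheme handlers are modelled as a hypothetical mapping from scheme name to a function that returns the identified subresource or `None`, and raises the hypothetical `SchemeDataError` when its data is malformed. The toy `key` scheme in the usage lines is invented for illustration and is not a W3C scheme.

```python
from typing import Any, Callable, Iterable, Mapping, Optional, Tuple


class XPointerError(Exception):
    """Framework error: no pointer part identified a subresource."""


class SchemeDataError(Exception):
    """Hypothetical: a handler's signal that its scheme data is malformed."""


Handler = Callable[[Any, str], Optional[Any]]


def evaluate_parts(resource: Any,
                   parts: Iterable[Tuple[str, str]],
                   handlers: Mapping[str, Handler]) -> Any:
    for scheme, data in parts:  # strictly left to right
        handler = handlers.get(scheme)
        if handler is None:
            continue  # unsupported scheme: the part fails and is consumed
        try:
            result = handler(resource, data)
        except SchemeDataError:
            continue  # data is in error for this scheme: the part fails
        if result is not None:
            return result  # first success is the pointer's result
        # None: the scheme identified no subresource, so try the next part
    raise XPointerError("every pointer part failed")


# Usage with a toy scheme that looks up a key in a dict
def key_scheme(resource: dict, data: str) -> Optional[str]:
    if not data.isidentifier():
        raise SchemeDataError(data)
    return resource.get(data)


doc = {"intro": "Intro text", "body": "Body text"}
parts = [("unknown", "x"), ("key", "9bad"), ("key", "missing"), ("key", "body")]
print(evaluate_parts(doc, parts, {"key": key_scheme}))  # prints: Body text
```

The first three parts each fail for a different reason (unsupported scheme, malformed data, nothing identified), and only the fourth succeeds.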

A scheme-based pointer might contain characters that are not allowed in a URI reference. For example, this XPointer Framework specification allows spaces between pointer parts, and individual XPointer scheme specifications might allow or require other characters that are disallowed in URI references. During creation of the pointer, it is typically not necessary to perform encoding or escaping on disallowed characters. However, by the time a pointer is fed as input (as a fragment identifier on a URI reference) into a URI resolver, any such characters must have been encoded and/or escaped as defined in [RFC 2396] (as updated by [RFC 2732]).
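Before a pointer is placed in a URI reference, characters such as spaces and circumflexes must be %-escaped. A minimal Python sketch uses `urllib.parse.quote`. The helper name and the exact safe set are assumptions: the set is chosen here as the RFC 2396 reserved and mark characters plus `[` and `]`, which RFC 2732 adds to the reserved set. Encoding non-ASCII characters as UTF-8 octets is likewise an assumption.

```python
from urllib.parse import quote

# RFC 2396 reserved + mark characters, plus "[" and "]" (reserved per
# RFC 2732). quote() never escapes letters, digits, or "_.-~".
_FRAGMENT_SAFE = ";/?:@&=+$,!*'()[]"


def fragment_for_pointer(pointer: str) -> str:
    """Escape a pointer for use as a URI-reference fragment identifier."""
    # "%" itself is not in the safe set, so a literal "%" becomes "%25"
    return quote(pointer, safe=_FRAGMENT_SAFE, encoding="utf-8")


print("doc.xml#" + fragment_for_pointer("foo(a b)^^"))
# prints: doc.xml#foo(a%20b)%5E%5E
```

The circumflex is escaped because RFC 2396 lists it among the "unwise" characters, which may not appear literally in a URI reference.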

3.4 Namespace Binding Context

Scheme specifications may define ways to bind XML namespace prefixes [XML-Names] to namespace names for the purpose of interpreting the namespace prefixes of element and attribute names that appear in pointer parts. These bindings contribute to a namespace binding context that applies to all pointer parts to the right of the pointer part making the binding, unless exceptions are explicitly made by the schemes in question. The documentation for any namespace-binding scheme must specify whether its bindings remain in effect for later pointer parts. The documentation for every scheme must specify whether it uses the namespace binding context.

The initial namespace binding context prior to evaluation of the first pointer part consists of a single entry: the xml prefix bound to the URI http://www.w3.org/XML/1998/namespace. Pointer parts must not attempt to redefine the xml prefix; any attempt to do so results in no change being made to the namespace binding context.
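An immutable Python sketch of the namespace binding context: each binding produces a new context for the parts to its right, and an attempt to rebind xml leaves the context unchanged. The class name is hypothetical, and how a scheme actually declares bindings is left to the scheme specifications. The choice that a later binding of an already bound prefix replaces the earlier one is an assumption, since this section does not address it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

XML_NS = "http://www.w3.org/XML/1998/namespace"


@dataclass(frozen=True)
class NamespaceBindingContext:
    # A list of (prefix, namespace URI) pairs; initially only "xml"
    bindings: Tuple[Tuple[str, str], ...] = (("xml", XML_NS),)

    def bind(self, prefix: str, uri: str) -> "NamespaceBindingContext":
        """Return the context seen by pointer parts to the right."""
        if prefix == "xml":
            return self  # redefining xml results in no change
        # Assumption: a new binding replaces any earlier one for the prefix
        kept = tuple((p, u) for p, u in self.bindings if p != prefix)
        return NamespaceBindingContext(kept + ((prefix, uri),))

    def lookup(self, prefix: str) -> Optional[str]:
        for p, u in self.bindings:
            if p == prefix:
                return u
        return None
```

Because `bind` returns a new object, an evaluator can hand each pointer part the context produced by the parts to its left without affecting the context that earlier parts saw.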

A References

A.1 Normative References

Infoset
John Cowan and Richard Tobin, editors. XML Information Set. World Wide Web Consortium, 2001. (See http://www.w3.org/TR/xml-infoset/.)
RFC 2119
RFC 2119: Key words for use in RFCs to Indicate Requirement Levels. Internet Engineering Task Force, 1997. (See http://www.ietf.org/rfc/rfc2119.txt.)
RFC 2396
RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax. Internet Engineering Task Force, 1998. (See http://www.ietf.org/rfc/rfc2396.txt.)
RFC 2732
RFC 2732: Format for Literal IPv6 Addresses in URL's. Internet Engineering Task Force, 1999. (See http://www.ietf.org/rfc/rfc2732.txt.)
RFC 3023
RFC 3023: XML Media Types. Internet Engineering Task Force, 2001. (See http://www.ietf.org/rfc/rfc3023.txt.)
XML
Tim Bray, Jean Paoli, C.M. Sperberg-McQueen, and Eve Maler, editors. Extensible Markup Language (XML) 1.0 (Second Edition). World Wide Web Consortium, 2000. (See http://www.w3.org/TR/REC-xml.)
XML-Names
Tim Bray, Dave Hollander, and Andrew Layman, editors. Namespaces in XML. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/REC-xml-names/.)
XMLSchema
Henry Thompson et al., editors. XML Schema Part 1: Structures. World Wide Web Consortium, 2001. (See http://www.w3.org/TR/xmlschema-1/.)

A.2 Non-Normative References

HTML
HTML 4.01 Specification. World Wide Web Consortium, 1999. (See http://www.w3.org/TR/html4/.)
RDF
Dave Beckett, editor. RDF/XML Syntax Specification. World Wide Web Consortium, 2001. (See http://www.w3.org/TR/REC-rdf-syntax/.)
SOAP12
Nilo Mitra et al., editors. SOAP Version 1.2 Parts 0, 1, and 2. World Wide Web Consortium, 2001. (See http://www.w3.org/TR/soap12-part0/.)
XInclude
Jonathan Marsh and David Orchard, editors. XML Inclusions (XInclude) Version 1.0. World Wide Web Consortium, 2001. (See http://www.w3.org/TR/xinclude/.)
XLink
Steve DeRose, Eve Maler, and David Orchard, editors. XML Linking Language (XLink). World Wide Web Consortium, 2001. (See http://www.w3.org/TR/xlink/.)