XQuery 1.0 and XPath 2.0 Data Model



W3C Working Draft

&date.day; &date.month; &date.year; &url.this; XML http://www.w3.org/TR/xpath-datamodel/ http://www.w3.org/TR/2004/WD-xpath-datamodel-20040723/ http://www.w3.org/TR/2003/WD-xpath-datamodel-20031112/ Mary Fernández (XML Query WG) AT&T Labs mff@research.att.com Ashok Malhotra (XML Query and XSL WGs) Oracle Corporation ashok.malhotra@alum.mit.edu Jonathan Marsh (XSL WG) Microsoft jmarsh@microsoft.com Marton Nagy (XML Query WG) Science Applications International Corporation (SAIC) marton.nagy@saic.com Norman Walsh (XSL WG) Sun Microsystems Norman.Walsh@Sun.COM

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a Public Working Draft for review by W3C Members and other interested parties. Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

The XQuery 1.0 and XPath 2.0 Data Model has been defined jointly by the XML Query Working Group and the XSL Working Group (both part of the XML Activity).

This working draft includes a number of changes made in response to comments received during the Last Call period that ended on Feb. 15, 2004. The working group is continuing to process these comments, and additional changes are expected.

This document reflects decisions taken up to and including the face-to-face meeting in Cambridge, MA during the week of June 21, 2004. These decisions are recorded in the Last Call issues list (http://www.w3.org/2004/10/data-model-issues.html). However, some of these decisions may not yet have been made in this document.

Public comments on this document and its open issues are invited. Comments should be sent to the W3C mailing list public-qt-comments@w3.org (archived at http://lists.w3.org/Archives/Public/public-qt-comments/) with “[DM]” at the beginning of the subject field.

The patent policy for this document is the 5 February 2004 W3C Patent Policy. Patent disclosures relevant to this specification may be found on the XML Query Working Group's patent disclosure page and the XSL Working Group's patent disclosure page. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification should disclose the information in accordance with section 6 of the W3C Patent Policy.

This document defines the W3C XQuery 1.0 and XPath 2.0 Data Model, which is the data model of , , and , and any other specifications that reference it. This data model is based on the data model and earlier work on an . This document is the result of joint work by the and the .


Introduction

This document defines the XQuery 1.0 and XPath 2.0 Data Model, which is the data model of XSLT 2.0, XQuery 1.0, and XPath 2.0.

The XQuery 1.0 and XPath 2.0 Data Model (henceforth "data model") serves two purposes. First, it defines the information contained in the input to an XSLT or XQuery processor. Second, it defines all permissible values of expressions in the XSLT, XQuery, and XPath languages. A language is closed with respect to a data model if the value of every expression in the language is guaranteed to be in the data model. XSLT 2.0, XQuery 1.0, and XPath 2.0 are all closed with respect to the data model.

The data model is based on the (henceforth "Infoset"), but it requires the following new features to meet the and :

Support for XML Schema types. The XML Schema recommendations define features, such as structures () and simple data types (), that extend the XML Information Set with precise type information.

Representation of collections of documents and of complex values. ()

As with the Infoset, the XQuery 1.0 and XPath 2.0 Data Model specifies what information in the documents is accessible, but it does not specify the programming-language interfaces or bindings used to represent or access the data.

The data model can represent various values including not only the input and the output of a stylesheet or query, but all values of expressions used during the intermediate calculations. Examples include the input document or document repository (represented as a &documentNode; or a sequence of &documentNode;s), the result of a path expression (represented as a sequence of nodes), the result of an arithmetic or a logical expression (represented as an atomic value), a sequence expression resulting in a sequence of items, etc.

This document provides a precise definition of the properties of nodes in the XQuery 1.0 and XPath 2.0 Data Model, how they are accessed, and how they relate to values in the Infoset and PSVI.

Concepts

This section outlines a number of general concepts that apply throughout this specification.

Terminology

For a full glossary of terms, see .

In this specification the words must, must not, should, should not, may and recommended are to be interpreted as described in .

This specification distinguishes between the data model as a general concept and specific items (documents, elements, atomic values, etc.) that are concrete examples of the data model by identifying all concrete examples as instances of the data model.

Every instance of the data model is a sequence.

A sequence is an ordered collection of zero or more items. A sequence cannot be a member of a sequence. A single item appearing on its own is modeled as a sequence containing one item. Sequences are defined in .

An item is either a node or an atomic value.

Every node is one of the seven kinds of nodes defined in . Nodes form a tree that consists of a root node plus all the nodes that are reachable directly or indirectly from the root node via the children, attributes, and namespaces accessors. Every node belongs to exactly one tree, and every tree has exactly one root node.

A tree whose root node is a &documentNode; is referred to as a document.

A tree whose root node is not a &documentNode; is referred to as a fragment.

An atomic value is a value in the value space of an atomic type and is labeled with the name of that atomic type.

An atomic type is a primitive simple type or a type derived by restriction from another atomic type. (Types derived by list or union are not atomic.)

There are 24 primitive simple types: the 19 defined in of and xdt:anyAtomicType, xdt:untyped, xdt:untypedAtomic, xdt:dayTimeDuration, and xdt:yearMonthDuration, defined in .

A type is represented in the data model by an expanded-QName.

An expanded-QName is a pair of values consisting of a possibly empty namespace URI and a local name. They belong to the value space of the XML Schema type xs:QName. References to xs:QName in this document always mean the value space, i.e. a namespace URI, local name pair (and not the lexical space referring to constructs of the form “prefix:local-name”).

Implementation-defined indicates an aspect that may differ between implementations, but must be specified by the implementor for each particular implementation.

Implementation-dependent indicates an aspect that may differ between implementations, is not specified by this or any W3C specification, and is not required to be specified by the implementor for any particular implementation.

In all cases where this specification leaves the behavior implementation-defined or implementation-dependent, the implementation has the option of providing mechanisms that allow the user to influence the behavior.

This document normatively defines the XQuery 1.0 and XPath 2.0 Data Model. In this document, examples and material labeled as "Note" are provided for explanatory purposes and are not normative.

Notation

In addition to prose, this specification defines a set of accessor functions to explain the data model. The accessors are shown with the prefix dm:. This prefix is always shown in italics to emphasize that these functions are abstract; they exist to explain the interface between the data model and specifications that rely on the data model: they are not accessible directly from the host language.

Several prefixes are used throughout this document for notational convenience. The following bindings are assumed.

xs: bound to http://www.w3.org/2001/XMLSchema

xsi: bound to http://www.w3.org/2001/XMLSchema-instance

xdt: bound to http://www.w3.org/&date.year;/&date.MM;/xpath-datatypes

fn: bound to http://www.w3.org/2004/10/xpath-functions

In practice, any prefix that is bound to the appropriate URI may be used.

The signature of accessor functions is shown using the same style as , described in .

This document relies on the Infoset and PSVI. Information items and properties are indicated by the styles information item and infoset property, respectively.

Some aspects of type assignment rely on the ability to access properties of the schema components. Such properties are indicated by the style {component property}. Note that this does not mean a lightweight schema processor cannot be used; it only means that the application must have some mechanism to access the necessary properties.

Node Identity

Each node has a unique identity. Every node in an instance of the data model is unique: identical to itself, and not identical to any other node. (Atomic values do not have identity; every instance of the value “5” as an integer is identical to every other instance of the value “5” as an integer.)

The concept of node identity should not be confused with the concept of a unique ID, which is a unique name assigned to an element by the author to represent references using ID/IDREF correlation.
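The difference between node identity and atomic-value equality can be illustrated with a minimal Python sketch. The `Node` class and the `(value, type name)` tuple representation of atomic values are assumptions made for this illustration only; the data model does not prescribe any programming-language representation.

```python
class Node:
    """Hypothetical node: compared by identity, never by content."""

    def __init__(self, name: str) -> None:
        self.name = name


# Two nodes with the same content are still two distinct nodes.
first = Node("para")
second = Node("para")
same_node = first is first        # a node is identical to itself
different_nodes = first is second  # and to no other node

# Atomic values have no identity: equal value and type means "the same".
five_a = (5, "xs:integer")
five_b = (5, "xs:integer")
same_atomic = five_a == five_b
```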

Document Order

A document order is defined among all the nodes accessible during a given query or transformation. Document order is a total ordering, although the relative order of some nodes is implementation-dependent. Informally, document order is the order in which nodes appear in the XML serialization of a document. Document order is stable, which means that the relative order of two nodes will not change during the processing of a given query or transformation, even if this order is implementation-dependent.

Within a tree, document order satisfies the following constraints:

The root node is the first node.

Every node occurs before all of its children and descendants.

&namespaceNode;s immediately follow the &elementNode; with which they are associated. The relative order of &namespaceNode;s is stable but implementation-dependent.

&attributeNode;s immediately follow the &namespaceNode;s of the element with which they are associated. If there are no &namespaceNode;s associated with a given element, then the &attributeNode;s associated with that element immediately follow the element. The relative order of &attributeNode;s is stable but implementation-dependent.

The relative order of siblings is the order in which they occur in the &dm.prop.children; property of their parent node.

Children and descendants occur before following siblings.

The relative order of nodes in distinct trees is stable but implementation-dependent, subject to the following constraint: If any node in a given tree, T1, occurs before any node in a different tree, T2, then all nodes in T1 are before all nodes in T2.
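The within-tree constraints above can be sketched as a simple recursive traversal. This is an illustrative Python sketch, not a normative algorithm: the `Node` class and its field names are hypothetical, and the order of namespace and attribute nodes (which the specification leaves implementation-dependent) is simply taken to be list order. Ordering across distinct trees is not covered.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass(eq=False)  # eq=False keeps identity-based comparison for nodes
class Node:
    kind: str
    name: str = ""
    namespaces: List["Node"] = field(default_factory=list)
    attributes: List["Node"] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def document_order(node: Node) -> Iterator[Node]:
    """Yield the nodes of one tree in document order."""
    yield node                       # a node precedes its descendants
    yield from node.namespaces       # namespace nodes follow the element
    yield from node.attributes       # attribute nodes follow the namespace nodes
    for child in node.children:      # siblings in children order; descendants
        yield from document_order(child)  # come before following siblings


text = Node("text", "hello")
b = Node("element", "b", children=[text])
c = Node("element", "c")
a = Node(
    "element",
    "a",
    namespaces=[Node("namespace", "xml")],
    attributes=[Node("attribute", "id")],
    children=[b, c],
)
doc = Node("document", "doc", children=[a])

names_in_order = [n.name for n in document_order(doc)]
```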

Sequences

An important characteristic of the data model is that there is no distinction between an item (a node or an atomic value) and a singleton sequence containing that item. An item is equivalent to a singleton sequence containing that item and vice versa.

A sequence may contain nodes, atomic values, or any mixture of nodes and atomic values. When a node is added to a sequence its identity remains the same. Consequently a node may occur in more than one sequence and a sequence may contain duplicate items.

Sequences never contain other sequences; if sequences are combined, the result is always a “flattened” sequence. In other words, appending “(d e)” to “(a b c)” produces a sequence of length 5: “(a b c d e)”. It does not produce a sequence of length 4, “(a b c (d e))”; such a nested sequence never occurs.
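The flattening rule can be illustrated with a short Python sketch. The function name `concat` and the use of Python lists to stand in for sequences are assumptions of this example, not part of the data model.

```python
from typing import Any, List


def concat(*parts: Any) -> List[Any]:
    """Combine items and sequences into one flat sequence."""
    result: List[Any] = []
    for part in parts:
        if isinstance(part, (list, tuple)):
            # A sequence never becomes a member: its items are spliced in.
            result.extend(concat(*part))
        else:
            # A single item behaves like a singleton sequence.
            result.append(part)
    return result


combined = concat(["a", "b", "c"], ["d", "e"])
```

Here `combined` has five members, matching the “(a b c d e)” example in the text.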

Sequences replace node-sets from XPath 1.0. In XPath 1.0, node-sets do not contain duplicates. In generalizing node-sets to sequences in XPath 2.0, duplicate removal is provided by functions on node sequences.

Types

The data model supports strongly typed languages such as and that have a type system based on . The type system is formally defined in .

Every item in the data model has both a value and a type. In addition to nodes, the data model can represent atomic values like the number 5 or the string “Hello World.” For each of these atomic values, the data model contains both the value of the item (such as 5 or “Hello World”) and its type name (such as xs:integer or xs:string).

Representation of Types

The data model uses expanded-QNames to represent the names of schema types, which include the built-in types defined by , five additional types defined by this specification, and may include other user- or implementation-defined types.

For XML Schema types, the namespace name of the expanded-QName is the target namespace property of the type definition, and its local name is the name property of the type definition.

The data model relies on the fact that an expanded-QName uniquely identifies every named type. (Although it is possible for different schemas to define different types with the same expanded-QName, at most one of them can be used in any given validation episode.)

For anonymous types, the processor must construct an anonymous type name that is distinct from the name of every named type and the name of every other anonymous type. An anonymous type name is an implementation-defined, unique type name provided by the processor for every anonymous type declared in the schemas available in the static context. Anonymous type names must be globally unique across all anonymous types that are accessible to the processor. In the formalism of this specification, the anonymous type names are assumed to be xs:QNames, but in practice implementations are not required to use xs:QNames to represent the implementation-defined names of anonymous types.

The scope over which the names of anonymous types must be meaningful and distinct depends on the processing context. In XSLT, it is the duration of an entire transformation. In XQuery, it is the duration of the evaluation of a top-level expression, i.e. an expression not contained in any other expression.

The data model associates schema type information with &elementNode;s, &attributeNode;s and atomic values. The item is guaranteed to be an instance of that kind of item with the given schema type.

The data model does not represent element or attribute declaration schema components, but it supports various type-related operations. The semantics of other operations, for example checking whether a particular instance of an &elementNode; has a given schema type, are defined in .

Predefined Types

In addition to the 19 types defined in of , the data model defines five additional types: xdt:anyAtomicType, xdt:untyped, xdt:untypedAtomic, xdt:dayTimeDuration, and xdt:yearMonthDuration:

xdt:anyAtomicType

The abstract datatype xdt:anyAtomicType is a child of xs:anySimpleType and is the base type for all the primitive atomic types described in . This datatype cannot be used in type declarations, nor can it be used as a base for user-defined atomic types. It can be used, as discussed in , to define a required type (for example in a function signature) to indicate that any of the primitive atomic types or xdt:untypedAtomic is acceptable.

xdt:untyped

The datatype xdt:untyped is a child of xs:anyType and serves as a special type annotation to indicate types that have not been validated by an XML Schema or a DTD. This type cannot be used in type declarations, nor can it be used as a base for user-defined types. It can be used, as discussed in , to define a required type (for example in a function signature) to indicate that only an untyped value is acceptable.

xdt:untypedAtomic

The datatype xdt:untypedAtomic is a child of xdt:anyAtomicType and serves as a special type annotation to indicate atomic values that have not been validated by an XML Schema or a DTD or have received an instance type annotation of xs:anySimpleType in the PSVI. This datatype cannot be used in type declarations, nor can it be used as a base for user-defined atomic types. It can be used, as discussed in , to define a required type (for example in a function signature) to indicate that only an untyped atomic value is acceptable.

xdt:dayTimeDuration

The type xdt:dayTimeDuration is derived from xs:duration by restricting its lexical representation to contain only the days, hours, minutes and seconds components. The value space of xdt:dayTimeDuration is the set of fractional second values. The components of xdt:dayTimeDuration correspond to the day, hour, minute and second components defined in Section 5.5.3.2 of , respectively. xdt:dayTimeDuration is derived from xs:duration as follows:


To make the long pattern easier to read, it has been formatted on six lines using additional new line and space characters in the pattern string. These additional characters should not be interpreted as part of the pattern.

xdt:yearMonthDuration

The type xdt:yearMonthDuration is derived from xs:duration by restricting its lexical representation to contain only the year and month components. The value space of xdt:yearMonthDuration is the set of xs:integer month values. The year and month components of xdt:yearMonthDuration correspond to the Gregorian year and month components defined in section 5.5.3.2 of , respectively.

The type xdt:yearMonthDuration is derived from xs:duration as follows:

Type Hierarchy

The diagram below shows how the nodes, primitive simple types, and user defined types fit together into a hierarchy.

The xs:IDREFS, xs:NMTOKENS, xs:ENTITIES and user-defined list and union types are special types in that these types are lists or unions rather than true subtypes.

Atomic Values

An atomic value can be constructed from a lexical representation. Given a string and an atomic type, the atomic value is constructed in such a way as to be consistent with validation. If the string does not represent a valid value of the type, an error is raised. When xdt:untypedAtomic is specified as the type, no validation takes place. The details of the construction are described in and the related section of .
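As a rough illustration of construction from a lexical representation, the following Python sketch handles only three type names. The function `construct`, the `(value, type name)` tuple representation, and the choice of `ValueError` for the error case are assumptions of this example; the normative rules are those of the referenced specifications.

```python
import re
from typing import Any, Tuple

XML_WHITESPACE = " \t\r\n"
BOOLEAN_FORMS = {"true": True, "1": True, "false": False, "0": False}


def construct(lexical: str, type_name: str) -> Tuple[Any, str]:
    """Build an atomic value labelled with its type name."""
    if type_name == "xdt:untypedAtomic":
        # No validation takes place for untyped atomic values.
        return (lexical, type_name)
    text = lexical.strip(XML_WHITESPACE)
    if type_name == "xs:integer":
        if not re.fullmatch(r"[+-]?[0-9]+", text):
            raise ValueError(f"{lexical!r} is not a valid xs:integer")
        return (int(text), type_name)
    if type_name == "xs:boolean":
        if text not in BOOLEAN_FORMS:
            raise ValueError(f"{lexical!r} is not a valid xs:boolean")
        return (BOOLEAN_FORMS[text], type_name)
    raise NotImplementedError(f"type {type_name} is not covered by this sketch")
```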

String Values

A string value can be constructed from an atomic value. Such a value is constructed by converting the atomic value to its string representation as described in . Using the canonical lexical representation for atomic values may not always be compatible with XPath 1.0. These and other backwards incompatibilities are described in .

Data Model Construction

This section describes the constraints on instances of the data model.

The data model supports well-formed XML documents conforming to or . Documents that are not well-formed are, by definition, not XML. XML documents that do not conform to or are not supported (nor are they supported by ).

In other words, the data model supports the following classes of XML documents:

Well-formed documents conforming to or .

DTD-valid documents conforming to or , and

W3C XML Schema-validated documents.

This document describes how to construct an instance of the data model from an Infoset or a Post Schema Validation Infoset (PSVI), the augmented infoset produced by an XML Schema validation episode.

An instance of the data model can also be constructed directly through application APIs, or from non-XML sources such as relational tables in a database.

The data model supports some kinds of values that are not supported by . Examples of these are document fragments and sequences of &documentNode;s. The data model also supports values that are not nodes. Examples of these are sequences of atomic values, or sequences mixing nodes and atomic values. These are necessary to be able to represent the results of intermediate expressions in the data model during expression processing.

Direct Construction

Although this document describes construction of an instance of the data model in terms of infoset properties, an infoset is not an absolutely necessary precondition for building an instance of the data model.

There are no constraints on how an instance of the data model may be constructed directly, save that the resulting instance must satisfy all of the constraints described in this document.

Construction from an Infoset

An instance of the data model can be constructed from an Infoset that satisfies the following general constraints:

All general and external parsed entities must be fully expanded. The Infoset must not contain any unexpanded entity reference information items.

The infoset must provide all of the properties identified as required in this document. The properties identified as optional may be used, if they are present. All other properties are ignored.

An instance of the data model constructed from an information set must be consistent with the description provided for each node kind.

Construction from a PSVI

An instance of the data model can be constructed from a PSVI, whose element and attribute information items have been strictly assessed, laxly assessed, or have not been assessed. Constructing an instance of the data model from a PSVI must be consistent with the description provided in this section and with the description provided for each node kind.

Data model construction requires that the PSVI provide unique names for all anonymous schema types.

does not require all schema processors to provide unique names for anonymous schema types. In order to build an instance of the data model from a PSVI produced by a processor that does not provide the names, some post-processing will be required in order to assure that they are all uniquely identified before construction begins.

An incompletely validated document is an XML document that has a corresponding schema but whose schema-validity assessment has resulted in one or more element or attribute information items being assigned values other than 'valid' for the validity property in the PSVI.

The data model supports incompletely validated documents. Elements and attributes that are not valid are treated as having unknown schema types.

The most significant difference between Infoset construction and PSVI construction occurs in the area of schema type assignment. Other differences can also arise from schema processing: default attribute and element values may be provided, white space normalization of element content may occur, and the user-supplied lexical form of elements and attributes with atomic schema types may be lost.

Mapping PSVI Additions to Type Names

A PSVI element or attribute information item may have a validity property. The validity property may be valid, invalid, or notKnown and reflects the outcome of schema-validity assessment. In the data model, precise schema type information is exposed for &elementNode;s and &attributeNode;s that are valid. Nodes that are not valid are treated as if they were simply well-formed XML and only very general schema type information is associated with them.

Element and Attribute Node Type Names

The precise definition of the schema type of an element or attribute information item depends on the properties of the PSVI. In the PSVI, only guarantees the existence of either the type definition property, or the type definition namespace, type definition name and type definition anonymous properties. If the type definition refers to a union type, there are further properties defined, that refer to the type definition which actually validated the item's normalized value. These properties are not used to determine the schema type of the node.

If the validity and validation attempted properties exist and have the values valid and full, respectively, the schema type of an element or attribute information item is represented by an expanded-QName whose namespace and local name correspond to the first applicable items in the following list:

If the type definition property exists:

If the {name} property is not absent, the {target namespace} and {name} properties of the type definition property;

Otherwise, the namespace and local name of the appropriate anonymous type name.

If type definition anonymous exists:

If it is false: the type definition namespace and the type definition name properties;

Otherwise, the namespace and local name of the appropriate anonymous type name.

If the validity property does not exist or is not valid, or the validation attempted property does not exist or is not full, the schema type of an element is xdt:untyped and the type of an attribute is xdt:untypedAtomic.
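The selection rules above can be sketched in Python as follows. This is an illustration under stated assumptions: the PSVI item is modelled as a plain dictionary keyed by property name, `anonymous_name` is a hypothetical callback that supplies the processor's anonymous type name, the string `"xdt"` stands in for the namespace URI bound to the xdt: prefix, and raising `ValueError` when neither property is present is a choice of this sketch rather than something the text specifies.

```python
from typing import Callable, Dict, Tuple

ExpandedQName = Tuple[str, str]  # (namespace URI, local name)
XDT = "xdt"  # placeholder for the namespace URI bound to xdt:


def schema_type(
    item: Dict[str, object],
    kind: str,  # "element" or "attribute"
    anonymous_name: Callable[[Dict[str, object]], ExpandedQName],
) -> ExpandedQName:
    """Pick the type name of an element or attribute information item."""
    fully_valid = (
        item.get("validity") == "valid"
        and item.get("validation attempted") == "full"
    )
    if not fully_valid:
        return (XDT, "untyped") if kind == "element" else (XDT, "untypedAtomic")

    type_def = item.get("type definition")
    if isinstance(type_def, dict):
        if type_def.get("name") is not None:
            return (type_def.get("target namespace") or "", type_def["name"])
        return anonymous_name(item)

    if "type definition anonymous" in item:
        if item["type definition anonymous"] is False:
            return (
                str(item["type definition namespace"]),
                str(item["type definition name"]),
            )
        return anonymous_name(item)

    raise ValueError("PSVI item carries no type definition information")
```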

Typed Value Determination

The typed value of &attributeNode;s and some &elementNode;s is a sequence of atomic values. (Elements that have a complex type with element-only or mixed content do not contain atomic values; such nodes have no typed value and this section does not apply to them.) The types of the items in the typed value of a node may not be the same as the type of the node itself. This section describes how the typed value of a node is derived from the properties of an information item in a PSVI.

The types of the items in the typed value of a node are determined by a recursive process called typed value determination. This process begins with T, the schema type of the node itself, as represented in the PSVI. The type T has a variety, which is either atomic, union, or list. The typed value determination process is defined as follows:

If the {variety} of T is atomic,

If T is xdt:untyped, the typed value is an instance of xdt:untypedAtomic.

Otherwise, the typed value is an instance of T.

If the {variety} of T is union, then the type of the typed value is determined by the type definition that actually validated the content of the node, as follows:

If member type definition exists: If the {name} property exists, the {target namespace} and {name} properties of the member type definition; otherwise, the appropriate anonymous type name.

If member type definition anonymous exists: If it is false, the member type definition namespace and member type definition name properties; otherwise, the appropriate anonymous type name.

The resulting type is substituted for T, and the typed value determination process is invoked recursively.

If the {variety} of T is list, the schema normalized value of the node is considered to be a space-separated list of lexical forms, each of which has its own type. For each of these lexical forms, the type of the corresponding item is found in {item type definition}. This type is then substituted for T, and the typed value determination process is invoked recursively for each member of the list.
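The recursion described above can be sketched in Python. Everything named here is hypothetical: `SchemaType` is a stand-in for a schema type definition, and `member_of` is a callback that plays the role of the PSVI member type definition properties by reporting which member type validated a given lexical form. For brevity the sketch keeps each item as a `(lexical form, type name)` pair instead of converting it to a value.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class SchemaType:
    name: str
    variety: str  # "atomic", "union", or "list"
    item_type: Optional["SchemaType"] = None  # {item type definition} of a list


Item = Tuple[str, str]  # (lexical form, atomic type name)
MemberLookup = Callable[[str, SchemaType], SchemaType]


def typed_value(lexical: str, t: SchemaType, member_of: MemberLookup) -> List[Item]:
    """Determine the atomic types of the items in a node's typed value."""
    if t.variety == "atomic":
        name = "xdt:untypedAtomic" if t.name == "xdt:untyped" else t.name
        return [(lexical, name)]
    if t.variety == "union":
        # Substitute the member type that validated the content, then recurse.
        return typed_value(lexical, member_of(lexical, t), member_of)
    if t.variety == "list":
        if t.item_type is None:
            raise ValueError("a list type needs an item type definition")
        items: List[Item] = []
        for token in lexical.split():  # space-separated lexical forms
            items.extend(typed_value(token, t.item_type, member_of))
        return items
    raise ValueError(f"unknown variety: {t.variety}")


INTEGER = SchemaType("xs:integer", "atomic")
NMTOKEN = SchemaType("xs:NMTOKEN", "atomic")
INT_OR_TOKEN = SchemaType("my:intOrToken", "union")
LIST_TYPE = SchemaType("my:list", "list", item_type=INT_OR_TOKEN)


def pick_member(lexical: str, union: SchemaType) -> SchemaType:
    # Stand-in for the PSVI: digits validate as xs:integer, the rest as xs:NMTOKEN.
    return INTEGER if lexical.isdigit() else NMTOKEN


result = typed_value("1 abc 30", LIST_TYPE, pick_member)
```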

The typed value determination process is guaranteed to result in a sequence of atomic values, each having a well-defined atomic type. This sequence of atomic values, in turn, determines the string-value and typed-value properties of the node in the data model. However, implementations are allowed some flexibility in how these properties are stored. An implementation may choose to store the string value only and derive the typed value from it, or to store the typed value only and derive the string value from it, or to store both the string value and the typed value.

In order to permit these various implementation strategies, some variations in the string value of a node are defined as insignificant. Implementations that store only the typed value of a node are permitted to return a string value that is different from the original lexical form of the node content. For example, consider the following element:

Assuming that the node is valid, it has a typed value of 30 as an xs:integer. An implementation may return either "30" or "0030" as the string value of the node. Any string that is a valid lexical representation of the typed value is acceptable. In this specification, we express this rule by saying that the relationship between the string value of a node and its typed value must be "consistent with schema validation."
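For xs:integer, the "consistent with schema validation" rule can be checked with a small Python sketch. The function name is hypothetical, and the regular expression is a simplified rendering of the xs:integer lexical space (optional sign followed by decimal digits).

```python
import re

INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def consistent_integer_string(string_value: str, typed_value: int) -> bool:
    """True if string_value is a valid lexical form of typed_value."""
    if INTEGER_LEXICAL.fullmatch(string_value) is None:
        return False
    return int(string_value) == typed_value
```

Under this check both `"30"` and `"0030"` are acceptable string values for a node whose typed value is 30, while `"31"` is not.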

Mapping xsi:nil on &elementNode;s

introduced a mechanism for signaling that an element should be accepted as valid when it has no content despite a content type which does not require or even necessarily allow empty content. That mechanism is the xsi:nil attribute.

The data model exposes this special semantic in the &dm.prop.nilled; property. (It also exposes the attribute, irrespective of whether or not schema processing has been performed.)

If the validity property exists on an information item and is valid, and the nil property exists and is true, then the &dm.prop.nilled; property is true. In all other cases, including all cases where schema validity assessment was not attempted or did not succeed, the &dm.prop.nilled; property is false.
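Expressed as a Python sketch, with the information item modelled as a dictionary keyed by PSVI property name (an assumption of this example):

```python
from typing import Dict


def nilled(item: Dict[str, object]) -> bool:
    """Derive the nilled property from the validity and nil properties."""
    return item.get("validity") == "valid" and item.get("nil") is True
```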

Dates and Times

The date and time types require special attention. The following sections apply to xs:dateTime, xs:date, and xs:time types and types derived from them.

Storing xs:dateTime, xs:date, and xs:time Values in the Data Model

permits xs:dateTime, xs:date, and xs:time values both with and without timezones and therefore only specifies a partial ordering among date and time values. In the data model, it is necessary to preserve timezone information.

In order to achieve this goal, xs:dateTime, xs:date, and xs:time values must be stored with care. If the lexical representation of the value includes a timezone, it is converted to UTC as defined by and the timezone in the lexical representation is converted to an xdt:dayTimeDuration value (as an offset from UTC). Implementations must keep track of both these values for each xs:dateTime, xs:date, and xs:time stored.

Lexical representations that do not have a timezone are assumed to be in UTC for the purposes of normalization only. An empty sequence is used for their timezone.

Thus, for the purpose of validation, 2003-01-02T11:30:00-05:00 is converted to 2003-01-02T16:30:00Z, but in the data model it must be stored as (2003-01-02T16:30:00Z, -PT5H0M). The value 2003-01-16T16:30:00 is stored as (2003-01-16T16:30:00Z, ()) because it has no timezone.

Retrieving the Typed Value of xs:dateTime, xs:date, and xs:time Values

For xs:dateTime, xs:date and xs:time, the typed value is the atomic value that is determined from its stored form as follows:

If the timezone component is not the empty sequence (the timezone was specified), then the value contains the time component, normalized to the timezone specified by the timezone component, as well as the timezone component. The stored values "(2003-01-02T16:30:00Z, -PT5H0M)" produce the value "2003-01-02T11:30:00-05:00".

If the timezone component is the empty sequence (the timezone was not specified), then the value contains the time component without any indication of timezone. The stored values "(2003-01-02T16:30:00Z, ())" produce the value "2003-01-02T16:30:00".
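The store-and-retrieve round trip can be sketched for xs:dateTime using Python's standard `datetime` module. This is an illustration only: the function names are hypothetical, a naive `datetime` stands in for the UTC-normalized value, `None` stands in for the empty sequence, and a `timedelta` stands in for the xdt:dayTimeDuration offset. Lexical forms ending in "Z" and the xs:date and xs:time types are not handled.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

Stored = Tuple[datetime, Optional[timedelta]]  # (UTC-normalized value, offset)


def store(lexical: str) -> Stored:
    """Split a lexical xs:dateTime into a normalized value and an offset."""
    parsed = datetime.fromisoformat(lexical)
    offset = parsed.utcoffset()
    if offset is None:
        # No timezone: treated as UTC for normalization, offset is "empty".
        return parsed, None
    normalized = parsed.astimezone(timezone.utc).replace(tzinfo=None)
    return normalized, offset


def retrieve(stored: Stored) -> str:
    """Rebuild the typed value's lexical form from the stored pair."""
    value, offset = stored
    if offset is None:
        return value.isoformat()
    local = (value + offset).replace(tzinfo=timezone(offset))
    return local.isoformat()


with_zone = store("2003-01-02T11:30:00-05:00")
without_zone = store("2003-01-16T16:30:00")
```

With these definitions, `with_zone` holds the normalized time 16:30 together with an offset of minus five hours, and `retrieve(with_zone)` reproduces the original local time and timezone.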

QNames and NOTATIONS

The QName and NOTATION data types require special attention. The following sections apply to xs:QName, xs:NOTATION, and types derived from them. These types are referred to collectively as “qualified names”.

As defined in XML Schema, the lexical space for qualified names includes a local name and an optional namespace prefix. The value space for qualified names contains a local name and an optional namespace URI. Therefore, it is not possible to derive a lexical value from the typed value, or vice versa, without access to some context that defines the namespace bindings.

When qualified names exist as values of nodes in a well-formed document, it is always possible to determine such a namespace context. However, the data model also allows qualified names to exist as freestanding atomic values, or as the name or value of a parentless attribute node, and in these cases no namespace context is available.

In this Data Model, therefore, the value space for qualified names contains a local name, an optional namespace URI, and an optional prefix. The prefix is used only when producing a lexical representation of the value, that is, when casting the value to a string. The prefix plays no part in other operations involving qualified names: in particular, two qualified names are equal if their local names and namespace URIs match, regardless of whether they have the same prefix.
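The role of the prefix can be illustrated with a minimal, non-normative Python sketch. The class QName and its field names are assumptions made for illustration: the prefix is kept for string conversion but excluded from equality and hashing.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class QName:
    """Illustrative qualified name: prefix is retained but never compared."""
    local_name: str
    namespace_uri: Optional[str] = None
    # compare=False keeps the prefix out of __eq__ and __hash__.
    prefix: Optional[str] = field(default=None, compare=False)

    def __str__(self) -> str:
        # The prefix matters only when producing a lexical representation.
        if self.prefix:
            return f"{self.prefix}:{self.local_name}"
        return self.local_name


a = QName("item", "http://www.example.com/catalog", "cat")
b = QName("item", "http://www.example.com/catalog", "c")
print(a == b)          # True: same local name and namespace URI
print(str(a), str(b))  # cat:item c:item
```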

The following consistency constraints apply:

If the namespace URI of a qualified name is absent, then the prefix must also be absent.

For every element node whose name has a prefix, the prefix must be one that is bound to the namespace URI of the element name by one of the namespace nodes of the element.

For every element node whose name has no prefix, the element must have a namespace node that binds the empty prefix to the namespace URI of the element name, or must have no namespace node that binds the empty prefix in the case where the name of the element has no namespace URI.

For every attribute node whose name has a prefix, the attribute node must either be parentless, or the prefix must be one that is bound to the namespace URI of the attribute name by one of the namespace nodes of the parent element.

For every qualified name that contains a prefix and that is included in the typed value of an element node, or of an attribute node that has an element node as its parent, the prefix must be one that is bound to the namespace URI of the qualified name by one of the namespace nodes of that element.

For every qualified name that contains a namespace URI and no prefix, and that is included in the typed value of an element node, or of an attribute node that has an element node as its parent, that element node must have a namespace node that binds the empty prefix to that namespace URI.

For every qualified name that contains neither a namespace URI nor a prefix, and that is included in the typed value of an element node, or of an attribute node that has an element node as its parent, that element node must have no namespace node that binds the empty prefix.
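As a non-normative sketch, the constraints on element names (the first three in the list above) could be checked as shown below. The function name and the representation of namespace nodes as a dictionary from prefix to namespace URI, with "" for the empty prefix, are assumptions made for illustration; None represents an absent prefix or namespace URI.

```python
from typing import Dict, Optional


def element_name_is_consistent(
    prefix: Optional[str],
    namespace_uri: Optional[str],
    bindings: Dict[str, str],
) -> bool:
    """Check an element name against the element's own namespace bindings."""
    if namespace_uri is None:
        # Absent namespace URI: the prefix must be absent too, and the
        # empty prefix must not be bound on this element.
        return prefix is None and "" not in bindings
    if prefix is None:
        # No prefix: the empty prefix must be bound to the element's URI.
        return bindings.get("") == namespace_uri
    # Prefixed name: the prefix must be bound to the element's URI.
    return bindings.get(prefix) == namespace_uri


ns = {"cat": "http://www.example.com/catalog"}
print(element_name_is_consistent("cat", "http://www.example.com/catalog", ns))
print(element_name_is_consistent("cat", None, ns))
```

The same lookup applies to attribute names and to qualified names in typed values, using the bindings of the parent element; parentless attribute nodes are exempt because they have no namespace context.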

Infoset Mapping

This specification describes how to map each kind of node to the corresponding information item. This mapping produces an Infoset; it does not and cannot produce a PSVI. Validation must be used to obtain a PSVI for a (portion of a) data model instance.

An Infoset can also be constructed by serializing an instance of the data model and parsing it. Serialization is governed by XSLT 2.0 and XQuery 1.0 Serialization.

Accessors

A set of accessors is defined on nodes in the data model. Some accessors return a constant empty sequence on certain node kinds. The unparsed-entity-system-id, unparsed-entity-public-id, and document-uri accessors, which are only available on &documentNode;s, are not included in this summary.

In order for processors to be able to operate on instances of the data model, the model must expose the properties of the items it contains. The data model does this by defining a family of accessor functions. These are not functions in the literal sense; they are not available for users or applications to call directly. Rather, they are descriptions of the information that an implementation of the data model must expose to applications. Functions and operators available to end-users are described in XQuery 1.0 and XPath 2.0 Functions and Operators.

Some typed values in the data model are undefined. Attempting to access an undefined property always raises an error.

base-uri Accessor

The base-uri accessor returns the base URI of a node as a sequence containing zero or one URI reference. For more information about base URIs, see .

It is defined on all seven node kinds.

node-kind Accessor

The node-kind accessor returns a string identifying the kind of node. It will be one of the following, depending on the kind of node: “document”, “element”, “attribute”, “namespace”, “processing-instruction”, “comment”, or “text”.

It is defined on all seven node kinds.

node-name Accessor

The node-name accessor returns the name of the node as a sequence of zero or one xs:QNames. Note that the QName value includes an optional prefix as described in .

It is defined on all seven node kinds.

parent Accessor

The parent accessor returns the &dm.prop.parent; of a node as a sequence containing zero or one nodes.

It is defined on all seven node kinds.

string-value Accessor

The string-value accessor returns the string value of a node.

It is defined on all seven node kinds.

typed-value Accessor

The typed-value accessor returns the typed-value of the node as a sequence of zero or more atomic values.

It is defined on all seven node kinds.

type-name Accessor

The type-name accessor returns the name of the schema type of a node as a sequence of zero or one xs:QNames.

It is defined on all seven node kinds.

children Accessor

The children accessor returns the children of a node as a sequence containing zero or more nodes.

It is defined on all seven node kinds.

attributes Accessor

The attributes accessor returns the attributes of a node as a sequence containing zero or more &attributeNode;s. The order of &attributeNode;s is stable but implementation dependent.

It is defined on all seven node kinds.

namespace-nodes Accessor

The namespace-nodes accessor returns the dynamic, in-scope namespaces associated with a node as a sequence containing zero or more &namespaceNode;s. The order of &namespaceNode;s is stable but implementation dependent.

It is defined on all seven node kinds.

Note: this accessor and the namespace-bindings accessor provide two views of the same information. Implementations that do not need to expose &namespaceNode;s might choose not to implement this accessor.

namespace-bindings Accessor

The namespace-bindings accessor returns the dynamic, in-scope namespaces associated with a node as a set of prefix/URI pairs. In the formalism of this specification, these pairs are represented as a list of strings where each odd-numbered list item is the prefix and the following even-numbered item is the URI. In practice, implementations may choose a more efficient return type.

The prefix for the default namespace is "".
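The flat list representation can be converted to a more convenient structure, as in the following non-normative Python sketch. The function name bindings_to_dict and the choice of a dictionary as the return type are assumptions made for illustration, not part of the formalism.

```python
from typing import Dict, List


def bindings_to_dict(flat: List[str]) -> Dict[str, str]:
    """Pair up a flat [prefix, URI, prefix, URI, ...] list of strings."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of items (prefix, URI, ...)")
    # Odd-numbered items (1st, 3rd, ...) are prefixes; the item that
    # follows each one is the URI bound to it.
    return dict(zip(flat[0::2], flat[1::2]))


flat = [
    "", "http://www.example.com/catalog",      # default namespace
    "xlink", "http://www.w3.org/1999/xlink",
]
print(bindings_to_dict(flat))
```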

The namespace-bindings accessor is defined on all seven node kinds.

Note: this accessor and the namespace-nodes accessor provide two views of the same information.

nilled Accessor

The nilled accessor returns true if the node is nilled, see .

It is defined on all seven node kinds.

is-id Accessor

The is-id accessor returns true if the node is an XML ID.

It is defined on Element and Attribute Nodes.

is-idrefs Accessor

The is-idrefs accessor returns true if the node is an XML IDREF or IDREFS.

It is defined on Element and Attribute Nodes.

Nodes

There are seven kinds of Nodes in the data model: document, element, attribute, text, namespace, processing instruction, and comment. Each kind of node is described in the following sections.

All nodes must satisfy the following general constraints:

Every node must have a unique identity, distinct from all other nodes.

The &dm.prop.children; property of a node must not contain two consecutive &textNode;s.

The &dm.prop.children; property of a node must not contain any empty &textNode;s.

The &dm.prop.children; and &dm.prop.attributes; properties of a node must not contain two nodes with the same identity.
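One way an implementation might enforce the two constraints on text nodes is to normalize a candidate list of children before storing it, as in this non-normative Python sketch. The classes Text and Element and the function normalize_children are hypothetical; node identity is modeled by Python object identity.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(eq=False)  # eq=False: nodes compare by identity, not by content
class Text:
    content: str


@dataclass(eq=False)
class Element:
    name: str


Node = Union[Text, Element]


def normalize_children(children: List[Node]) -> List[Node]:
    """Drop empty text nodes and merge consecutive text nodes."""
    result: List[Node] = []
    for child in children:
        if isinstance(child, Text):
            if child.content == "":
                continue  # no empty text nodes
            if result and isinstance(result[-1], Text):
                # Merge with the preceding text node instead of appending.
                result[-1] = Text(result[-1].content + child.content)
                continue
        result.append(child)
    return result


kids = [Text("a"), Text(""), Text("b"), Element("e"), Text("c")]
print([getattr(n, "content", n) for n in normalize_children(kids)])
```

Merging creates a new text node here, so the merged node has a new identity; an implementation could equally extend the existing node in place.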

&Document; &Element; &Attribute; &Namespace; &ProcessingInstruction; &Comment; &Text;

Conformance

The data model is intended primarily as a component that can be used by other specifications. Therefore, the data model relies on specifications that use it (such as , , and ) to specify conformance criteria for the data model in their respective environments. Specifications that set conformance criteria for their use of the data model must not relax the constraints expressed in this specification.

Authors of conformance criteria for the use of the data model should pay particular attention to the following features of the data model:

Support for DTD processing (both validation and unparsed entities).

Support for W3C XML Schema processing.

Support for the normative construction from an infoset described in .

Support for the normative construction from a PSVI described in .

Support for XML 1.0 and XML 1.1.

XML Information Set Conformance

This specification conforms to the XML Information Set . The following information items must be exposed by the infoset producer to construct a data model unless they are explicitly identified as optional:

The Document Information Item with base URI and children properties.

Element Information Items with base URI, children, attributes, in-scope namespaces, local name, namespace name, and parent properties.

Attribute Information Items with namespace name, local name, normalized value, attribute type, and owner element properties.

Character Information Items with character code and parent properties.

Processing Instruction Information Items with base URI, target, content and parent properties.

Comment Information Items with content and parent properties.

Namespace Information Items with prefix (optional) and namespace name properties.

Other information items and properties made available by the Infoset processor are ignored. In addition to the properties above, the following properties are required from the PSVI if the data model is constructed from a PSVI:

validity, type definition, type definition namespace, type definition name, type definition anonymous, member type definition, member type definition namespace, member type definition name, member type definition anonymous and schema normalized value properties on Element Information Items.

Error Summary

This error is raised whenever an accessor is called for a property that is undefined.

References

Normative References

XML Path Language (XPath) 2.0, Mary F. Fernández, Michael Kay, Jonathan Robie, et. al., Editors. World Wide Web Consortium, 29 Oct 2004. This version is http://www.w3.org/TR/2004/WD-xpath20-20041029. The latest version is available at http://www.w3.org/TR/xpath20.

XQuery 1.0 and XPath 2.0 Functions and Operators, Ashok Malhotra, Jim Melton, and Norman Walsh, Editors. World Wide Web Consortium, 29 Oct 2004. This version is http://www.w3.org/TR/2004/WD-xpath-functions-20041029/. The latest version is available at http://www.w3.org/TR/xpath-functions/.

XSLT 2.0 and XQuery 1.0 Serialization, Michael Kay, Norman Walsh, and Henry Zongaro, Editors. World Wide Web Consortium, 29 Oct 2004. This version is http://www.w3.org/TR/2004/WD-xslt-xquery-serialization-20041029/. The latest version is available at http://www.w3.org/TR/xslt-xquery-serialization/.

Other References

XML Query Data Model, Mary Fernández and Jonathan Robie, Editors. World Wide Web Consortium, 15 Feb 2001.

XML Query Working Group, World Wide Web Consortium. Home page: http://www.w3.org/XML/Query

XSL Working Group, World Wide Web Consortium. Home page: http://www.w3.org/Style/XSL/

XQuery 1.0: An XML Query Language, Daniela Florescu, Jonathan Robie, Jérôme Siméon, et. al., Editors. World Wide Web Consortium, 29 Oct 2004. This version is http://www.w3.org/TR/2004/WD-xquery-20041029/. The latest version is available at http://www.w3.org/TR/xquery.

ISO (International Organization for Standardization). Representations of dates and times, 2000-08-03. Available from: http://www.iso.ch/

Glossary

Example

The following XML document is used to illustrate the information contained in a data model:

&dm-example.xml;

The document is associated with the URI http://www.example.com/catalog.xml, and is valid with respect to the following XML schema:

&dm-example.xsd;

This example exposes the data model for a document that has an associated schema and has been validated successfully against it. In general, an XML Schema is not required, that is, the data model can represent a schemaless, well-formed XML document with the rules described in .

The XML document is represented by the nodes described below. The value D1 represents a &documentNode;; the values E1, E2, etc. represent &elementNode;s; the values A1, A2, etc. represent &attributeNode;s; the values N1, N2, etc. represent &namespaceNode;s; the values P1, P2, etc. represent &processingInstructionNode;s; the values T1, T2, etc. represent &textNode;s.

For brevity:

&textNode;s in the data model that contain only white space are not shown.

Literal strings are shown in quotes without the xs:string() constructor.

Literal decimals are shown without the xs:decimal() constructor.

Nodes are referred to using the syntax [nodeID].

xs:QNames are used with the following prefix bindings:

xs	http://www.w3.org/2001/XMLSchema
xsi	http://www.w3.org/2001/XMLSchema-instance
cat	http://www.example.com/catalog
xlink	http://www.w3.org/1999/xlink
html	http://www.w3.org/1999/xhtml
anon	An implementation-dependent prefix associated with anonymous type names

The abbreviation \n is used in string literals to represent a newline character; this isn't supported in XPath, but it makes this presentation clearer.

Accessors that return the empty sequence have been omitted.

To simplify the presentation, we’re assuming an implementation that does not expose the namespace axis. Therefore, &namespaceNode;s are shared across multiple elements. See .

&dm-example.tbl;

A graphical representation of the data model for the preceding example is shown below. Document order in this representation can be found by following the traditional in-order, left-to-right, depth-first traversal; however, because the image has been rotated for easier presentation, this appears to be in-order, bottom-to-top, depth-first order.

Graphic representation of the data model.
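The traversal that yields document order can be illustrated with a short, non-normative Python sketch. The class Node and the node identifiers are hypothetical and follow the labeling style of this example; only the children relationship is modeled, so attribute and namespace nodes are omitted.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    node_id: str
    children: List["Node"] = field(default_factory=list)


def document_order(node: Node) -> Iterator[Node]:
    """Depth-first, left-to-right: each node precedes its children."""
    yield node
    for child in node.children:
        yield from document_order(child)


# A small hypothetical tree, not the catalog document itself.
tree = Node("D1", [Node("E1", [Node("E2", [Node("T1")]), Node("E3")])])
print([n.node_id for n in document_order(tree)])
# ['D1', 'E1', 'E2', 'T1', 'E3']
```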