W3C

XQuery 1.0 and XPath 2.0 Data Model

W3C Working Draft 02 May 2003

This version:
http://www.w3.org/TR/2003/WD-xpath-datamodel-20030502/
Latest version:
http://www.w3.org/TR/xpath-datamodel/
Previous versions:
http://www.w3.org/TR/2002/WD-query-datamodel-20021115/ http://www.w3.org/TR/2002/WD-query-datamodel-20020816/
Editors:
Mary Fernández (XML Query WG), AT&T Labs <mff@research.att.com>
Ashok Malhotra (XML Query and XSL WGs), Microsoft <ashokma@microsoft.com>
Jonathan Marsh (XSL WG), Microsoft <jmarsh@microsoft.com>
Marton Nagy (XML Query WG), Science Applications International Corporation (SAIC) <marton.nagy@saic.com>
Norman Walsh (XSL WG), Sun Microsystems <Norman.Walsh@Sun.COM>

This document is also available in these non-normative formats: XML.


Abstract

This document defines the W3C XQuery 1.0 and XPath 2.0 Data Model, which is the data model of at least [XPath 2.0], [XSLT 2.0], and [XQuery 1.0: A Query Language for XML], and any other specifications that reference it. This data model is based on the [XPath 1.0] data model and earlier work on an [XML Query Data Model]. This document is the result of joint work by the [XSL Working Group] and the [XML Query Working Group].

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.

This is a Public Working Draft for review by W3C Members and other interested parties. It is a draft document and may be updated, replaced or made obsolete by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". This is work in progress and does not imply endorsement by the W3C membership.

The XQuery 1.0 and XPath 2.0 Data Model has been defined jointly by the XML Query Working Group and the XSL Working Group (both part of the XML Activity).

This is a Last Call Working Draft. Comments on this document are due on 30 June 2003. Comments should be sent to the W3C mailing list public-qt-comments@w3.org (archived at http://lists.w3.org/Archives/Public/public-qt-comments/).

A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.

Patent disclosures relevant to this specification may be found on the XML Query Working Group's patent disclosure page at http://www.w3.org/2002/08/xmlquery-IPR-statements and the XSL Working Group's patent disclosure page at http://www.w3.org/Style/XSL/Disclosures.html.

Table of Contents

1 Introduction
2 Notation
3 Concepts
    3.1 Node Identity
    3.2 Document Order
    3.3 XML Schemas and the XML Information Set
    3.4 Types
    3.5 Typed Value and String Value
    3.6 Mapping PSV Infoset additions to Types
        3.6.1 Mapping xs:dateTime, xs:date, and xs:time Values
        3.6.2 Mapping xsi:nil on Element Nodes
    3.7 Comments, Processing Instructions, and Whitespace
4 Nodes
    4.1 Accessors
        4.1.1 base-uri Accessor
        4.1.2 node-kind Accessor
        4.1.3 node-name Accessor
        4.1.4 parent Accessor
        4.1.5 string-value Accessor
        4.1.6 typed-value Accessor
        4.1.7 type Accessor
        4.1.8 children Accessor
        4.1.9 attributes Accessor
        4.1.10 namespaces Accessor
        4.1.11 nilled Accessor
    4.2 Documents
        4.2.1 Overview
        4.2.2 Accessors
        4.2.3 PSVI to Data Model Mapping
        4.2.4 Data Model to Infoset Mapping
    4.3 Elements
        4.3.1 Overview
        4.3.2 Accessors
        4.3.3 PSVI to Data Model Mapping
        4.3.4 Data Model to Infoset Mapping
    4.4 Attributes
        4.4.1 Overview
        4.4.2 Accessors
        4.4.3 PSVI to Data Model Mapping
        4.4.4 Data Model to Infoset Mapping
    4.5 Namespaces
        4.5.1 Overview
        4.5.2 Accessors
        4.5.3 PSVI to Data Model Mapping
        4.5.4 Data Model to Infoset Mapping
    4.6 Processing Instructions
        4.6.1 Overview
        4.6.2 Accessors
        4.6.3 PSVI to Data Model Mapping
        4.6.4 Data Model to Infoset Mapping
    4.7 Comments
        4.7.1 Overview
        4.7.2 Accessors
        4.7.3 PSVI to Data Model Mapping
        4.7.4 Data Model to Infoset Mapping
    4.8 Text
        4.8.1 Overview
        4.8.2 Accessors
        4.8.3 PSVI to Data Model Mapping
        4.8.4 Data Model to Infoset Mapping
5 Atomic Values
6 Sequences

Appendices

A XML Information Set Conformance
B References
    B.1 Normative References
    B.2 Other References
C Glossary (Non-Normative)
D Example (Non-Normative)
E Issues List (Non-Normative)
F Recently Closed Issues (Non-normative)
G Accessor Summary (Non-normative)


1 Introduction

This document defines the XQuery 1.0 and XPath 2.0 Data Model, which is the data model of [XPath 2.0], [XSLT 2.0] and [XQuery 1.0: A Query Language for XML].

The XQuery 1.0 and XPath 2.0 Data Model (henceforth "data model") serves two purposes. First, it defines precisely the information contained in the input to an XSLT or XQuery processor. Second, it defines all permissible values of expressions in the XSLT, XQuery, and XPath languages. A language is closed with respect to a data model if the value of every expression in a language is guaranteed to be in the data model. XSLT 2.0, XQuery 1.0, and XPath 2.0 are all closed with respect to the data model.

The data model is based on the [XML Information Set] (henceforth "Infoset"), but it requires the following new features to meet the [XPath Requirements Version 2.0] and [XML Query Requirements]:

As with the Infoset, the XQuery 1.0 and XPath 2.0 Data Model specifies what information in the documents is accessible, but it does not specify the programming-language interfaces or bindings used to represent or access the data.

Every value handled by the data model is a sequence of zero or more items. An item is either a node or an atomic value. A node is defined in 4 Nodes and is one of seven node kinds. An atomic value encapsulates an XML Schema atomic type and a corresponding value of that type. Atomic values are defined in 5 Atomic Values. A sequence is an ordered collection of nodes, atomic values, or any mixture of nodes and atomic values. A sequence cannot be a member of a sequence. A single item appearing on its own is modeled as a sequence containing one item. Sequences are defined in 6 Sequences.
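As an informal illustration of these rules (sequences never nest, and a single item is modeled as a singleton sequence), the following Python sketch represents a sequence as a flat tuple. The names Node, AtomicValue, and make_sequence are hypothetical and are not defined by the data model.

```python
from dataclasses import dataclass
from typing import Any, Tuple, Union


@dataclass(frozen=True)
class AtomicValue:
    """Hypothetical stand-in: a type name paired with a value of that type."""
    type_name: str
    value: Any


class Node:
    """Hypothetical stand-in for any of the seven node kinds."""
    def __init__(self, kind: str) -> None:
        self.kind = kind


Item = Union[Node, AtomicValue]


def make_sequence(*parts: Any) -> Tuple[Item, ...]:
    """Build a flat sequence; nested sequences are spliced in, never stored."""
    flat = []
    for part in parts:
        if isinstance(part, tuple):
            # A sequence cannot be a member of a sequence, so flatten it.
            flat.extend(make_sequence(*part))
        else:
            flat.append(part)
    return tuple(flat)
```

Under this sketch, make_sequence(x) for a single item x yields a one-item tuple, and combining sequences always produces another flat sequence.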

Note:

In XPath 1.0, the data model only defines nodes. The primitive data types (number, boolean, string, node-set) are part of the expression language, not the data model.

The data model can represent various values including not only the input and the output of a stylesheet or query, but all values of expressions used during the intermediate calculations. Examples include the input document or document repository (represented as a document node or a sequence of document nodes), the result of a path expression (represented as a sequence of nodes), the result of an arithmetic or a logical expression (represented as an atomic value), a sequence expression resulting in a sequence of items, etc.

In this document, we provide a precise definition of the properties of nodes in the XQuery 1.0 and XPath 2.0 Data Model, how they are accessed, and how they relate to values in the Infoset. We note wherever the XQuery 1.0 and XPath 2.0 Data Model differs from that of XPath 1.0.

2 Notation

In addition to prose, we define a set of accessor functions to explain the data model. The accessors defined by the data model are shown with the prefix dm. The prefix is always shown in italics to emphasize that these functions are abstract; they exist to explain the interface between the data model and specifications that rely on the data model: they are not and cannot be made accessible directly from the host language.

The signature of accessors is shown using the same style as [XQuery 1.0 and XPath 2.0 Functions and Operators]. For example:

dm:typed-value($n as Node) as xdt:anyAtomicType*

In the notation syntax, the term Node denotes the category of node values and Item refers to the category of either node values or atomic values.

Some accessors can accept or return sequences. The following notation is used to denote sequence values:

In a sequence, V may be a Node or AtomicValue, or the union (choice) of several categories of Items.

There are some functions in the data model that are partial functions. We use the occurrence indicators ? or * when specifying the return type of such functions. For example, a node may have one parent node or no parent. If the node argument has a parent, the dm:parent accessor returns a singleton sequence. If the node argument does not have a parent, it returns the empty sequence. The signature of dm:parent specifies that it returns an empty sequence or a sequence containing one node:

dm:parent($n as Node) as Node?

This document relies on the [XML Information Set]. Information items and properties are indicated by the styles information item and [property], respectively.

This document frequently uses the term expanded-QName. [Definition: An expanded-QName is a pair of values consisting of a namespace URI and a local name. They belong to the value space of the XML Schema type xs:QName. When this document refers to xs:QName we always mean the value space, i.e. a namespace URI, local name pair (and not the lexical space referring to constructs of the form prefix:local-name).]
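The distinction between the value space and the lexical space can be sketched in Python as follows. This is only an illustration: ExpandedQName and expand are hypothetical names, and the prefix bindings are supplied by the caller.

```python
from typing import Dict, NamedTuple


class ExpandedQName(NamedTuple):
    """A (namespace URI, local name) pair; the prefix is not part of the value."""
    namespace_uri: str
    local_name: str


def expand(lexical: str, bindings: Dict[str, str]) -> ExpandedQName:
    """Resolve a lexical prefix:local-name form against prefix bindings."""
    prefix, _, local = lexical.rpartition(":")
    return ExpandedQName(bindings[prefix], local)
```

Two lexical forms that use different prefixes bound to the same namespace URI expand to equal values, which is why the data model works with the pair rather than with the lexical form.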

3 Concepts

3.1 Node Identity

Because XML documents are tree-structured, we define the data model using conventional terminology for trees. The data model is a node-labeled, directed graph, in which each node has a unique identity. Every node in the data model is unique: identical to itself, and not identical to any other node.

This concept should not be confused with the concept of a unique ID, which is a unique name assigned to an element by the author to represent references using ID/IDREF correlation.

3.2 Document Order

[Definition: A document order is defined on all the nodes in a document. Document order is a total ordering, although the relative order of some nodes is implementation-dependent. Informally, document order is the order returned by a pre-order, depth-first, left-to-right traversal of the data model.] There is precisely one document order and it satisfies the following constraints.

  • The document node is the first node.

  • The relative order of siblings is determined by their order in the XML representation. A node N1 occurs before a node N2 in document order if and only if the start of N1 occurs before the start of N2 in the XML document.

  • Element nodes occur before their children; children occur before following-siblings.

  • Namespace nodes immediately follow the element node with which they are associated. The relative order of namespace nodes is stable but implementation-dependent.

  • Attribute nodes immediately follow the namespace nodes of the element with which they are associated. The relative order of attribute nodes is stable but implementation-dependent.

The relative order of nodes in distinct documents is implementation-dependent but stable. In other words, given two distinct documents A and B, if a node in document A is before a node in document B, then every node in document A is before every node in document B.

The relative order of free-standing nodes (elements, attributes, and other nodes created outside the context of a particular document) is also implementation-dependent but stable.
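The constraints on document order within a single tree can be sketched as a depth-first traversal that visits each node before its descendants. The Node class and the document_order function below are hypothetical; the sketch simply uses list order for namespace and attribute nodes, whose relative order is implementation-dependent.

```python
from typing import Iterator, List, Optional


class Node:
    """Hypothetical minimal node: a kind, a label, and three node lists."""
    def __init__(self, kind: str, label: str = "",
                 namespaces: Optional[List["Node"]] = None,
                 attributes: Optional[List["Node"]] = None,
                 children: Optional[List["Node"]] = None) -> None:
        self.kind = kind
        self.label = label
        self.namespaces = namespaces or []
        self.attributes = attributes or []
        self.children = children or []


def document_order(node: Node) -> Iterator[Node]:
    """Yield each node before its namespaces, attributes, and children."""
    yield node
    # Namespace nodes immediately follow their element, then attribute nodes.
    yield from node.namespaces
    yield from node.attributes
    # Children follow in sibling order; each subtree is finished before
    # the next sibling starts.
    for child in node.children:
        yield from document_order(child)
```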

3.3 XML Schemas and the XML Information Set

This document describes how to construct an instance of the data model from an [XML Information Set]. Some aspects of the data model are dependent upon XML Schema validity assessment; this document describes how to determine those aspects of the data model from a Post Schema Validation Infoset. [Definition: A Post Schema Validation Infoset, or PSVI, is the augmented infoset produced by an XML Schema validation episode.]

Although we describe construction of a data model in terms of infoset properties, an infoset is not an absolutely necessary precondition for building an instance of the Data Model. Purely synthetic data model instances are entirely appropriate as long as they obey all of the constraints described in this document.

The data model supports well-formed XML documents conforming to [Namespaces in XML]. XML documents that are not well-formed are not XML, by definition. XML documents that do not conform to [Namespaces in XML] are not supported (nor are they supported by [XML Information Set]).

In other words, the data model supports the following classes of XML documents:

The data model supports some kinds of values that are not supported by [XML Information Set]. Examples of these are well-formed document fragments, sequences of fragments or sequences of documents. The data model also supports values that are not nodes. Examples of these are atomic values, sequences of atomic values, or sequences mixing nodes and atomic values. These are necessary to be able to represent the results of intermediate expressions in the data model during expression processing.

Schema-validated documents include documents in which some elements or attributes have been validated by "lax" or "skip" validation ([XMLSchema Part 1]).

An "incompletely validated document" is an XML document that has a corresponding schema but whose schema-validity assessment has resulted in one or more element or attribute information items being assigned values other than 'valid' for the [validity] property in the PSVI.

The data model supports incompletely validated documents, but inconsistent data models are forbidden. Elements and attributes that are not valid are treated as untyped.

In addition to specifying the transformation from the Post Schema Validation Infoset (PSVI) to the data model, this document also specifies the transformation from the data model back to the XML Information Set. This is a useful notion that can be used for defining serialization and validation. Serialization can be viewed as a two step process, first transforming to the XML Infoset and then to an XML document. Validation is described conceptually as a process of mapping the data model to the XML Infoset followed by XML Schema validation producing a PSVI which is then loaded into the data model.

3.4 Types

The data model supports a representation of named types as stipulated by [XQuery 1.0 Formal Semantics].

For named types, which includes both the built-in types defined by [XMLSchema Part 2] and named user-defined types declared in a schema and imported by a stylesheet or query, the data model uses expanded-QNames to represent their names. Since named types in XML Schema are global, an expanded-QName uniquely identifies such a type. The namespace name of the expanded-QName is the target namespace of the schema and its local name is the name of the type.

For anonymous types, the processor must construct an anonymous type name that is distinct from the name of every named type and the name of every other anonymous type. [Definition: An anonymous type name is an implementation-defined, globally unique type name provided by the processor for every anonymous type declared in an imported schema.]

In either case, the type names must also appear in the In-scope Schema Definitions (as defined in [XPath 2.0]) available to the processor.

The data model associates type information with element nodes, attribute nodes and atomic values. The item is guaranteed to be a valid instance of that type as defined by XML Schema.

The data model defines an accessor dm:type that returns an expanded-QName corresponding to the type of the element node, attribute node or atomic value. It returns xs:anyType or xs:anySimpleType if no type information exists, or if the item failed W3C XML Schema validity assessment.

When no type information exists for an element or an attribute node we frequently use the terminology "element with unknown type" or "attribute with unknown simple type".

The data model does not represent element or attribute declaration schema components, but it supports various type-related operations. The semantics of such operations, e.g. checking whether a particular instance of an element node has a given type, are defined in [XQuery 1.0 Formal Semantics].

3.5 Typed Value and String Value

The content of a text, attribute, or element node can be interpreted in two ways: as a string value or as a typed value. For these types of nodes, the typed value can be extracted by the dm:typed-value accessor, and the string value can be extracted by the dm:string-value accessor.

The string value of a node is a single xs:string derived from the content of the node as described in the definitions of the accessor functions for each kind of node.

The typed value of a node is a sequence of atomic values derived from its string value and its type in a way that is consistent with schema validation, as described in the definitions of the accessor functions for each kind of node.

3.6 Mapping PSV Infoset additions to Types

This section specifies how the type of an element or attribute node is computed from the PSVI properties that specify validity and type assessment for the node's corresponding information item.

A PSVI element or attribute information item has a [validity] property. The [validity] property may be "valid", "invalid", or "notKnown" and reflects the outcome of schema-validity assessment. The only information that can be inferred from an invalid or not known validity value is that the information item is well-formed. Therefore, we must associate some general type information with the element or attribute node if it is not known to be valid.

The precise definition of the type of an element or attribute information item depends on the properties of the Infoset or PSVI. In a PSVI, XML Schema only guarantees the existence of either the [type definition] property, or the [type definition namespace], [type definition name] and [type definition anonymous] properties. If the type definition refers to a union type, there are further properties defined, that refer to the type definition which actually validated the item's normalized value. These properties are either the [member type definition], or the [member type definition namespace], [member type definition name] and [member type definition anonymous] properties. If these are available, the type of an element or attribute will refer to the member type that actually validated the schema normalized value.

If a PSVI is not available, then the data model is constructed from the Infoset in a manner that is compatible with the expectations of well-formed or DTD-validated parsing of an XML document.

The type of an element information item is represented by an expanded-QName whose namespace and local name correspond to the first applicable items in the following list:

  • If the [validity] property exists and is "valid":

    • If [member type definition] exists and its {name} property is present:

      • The {target namespace} and {name} properties of the [member type definition] property.

    • If the [type definition] property exists and its {name} property is present:

      • The {target namespace} and {name} properties of the [type definition] property.

    • If [member type definition anonymous] exists:

      • If it is false: the [member type definition namespace] and the [member type definition name].

      • Otherwise, the namespace and local name of the appropriate anonymous type name.

    • If [type definition anonymous] exists:

      • If it is false: the [type definition namespace] and the [type definition name]

      • Otherwise, the namespace and local name of the appropriate anonymous type name.

  • If the [validity] property does not exist on this node or any of its ancestors, Infoset-only processing is applied:

    • If the [attribute type] property exists and has one of the following values: ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, or NMTOKENS, the {target namespace} is "http://www.w3.org/2001/XMLSchema" and the {name} is the [attribute type].

    Note that this processing is only performed if no part of the subtree that contains the node was schema validated. In particular, Infoset-only processing does not apply to subtrees that are "skip" validated in a document.

  • Otherwise, xs:anyType for elements or xs:anySimpleType for attributes.

If the expanded-QName that results from this derivation is not available in the processor's In-Scope Schema Definitions, the expanded-QName is promoted to xs:anyType for elements or xs:anySimpleType for attributes. This can occur, for example, if the processor does not support the schema import feature or if it was unable to import the necessary schema.
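The "first applicable item" derivation and the promotion rule can be sketched as follows. This is a simplified illustration, not a normative algorithm: the information item is assumed to be a Python dict keyed by property name, derive_type and promote are hypothetical names, and the namespace used for anonymous type names is invented for the example.

```python
from typing import Dict, Set, Tuple

XS = "http://www.w3.org/2001/XMLSchema"
DTD_TYPES = {"ID", "IDREF", "IDREFS", "ENTITY", "ENTITIES", "NMTOKEN", "NMTOKENS"}

QName = Tuple[str, str]  # (namespace URI, local name)


def derive_type(item: Dict, is_element: bool,
                ancestors_validated: bool = False) -> QName:
    """Return the first applicable type name for an information item."""
    fallback = (XS, "anyType" if is_element else "anySimpleType")
    validity = item.get("validity")
    if validity == "valid":
        # Named member type first, then named type definition.
        for prop in ("member type definition", "type definition"):
            definition = item.get(prop)
            if definition is not None and definition.get("name"):
                return (definition["target namespace"], definition["name"])
        # Then the flattened namespace/name/anonymous properties.
        for prefix in ("member type definition", "type definition"):
            anonymous = item.get(prefix + " anonymous")
            if anonymous is not None:
                if anonymous is False:
                    return (item[prefix + " namespace"], item[prefix + " name"])
                # Anonymous type names are implementation-defined;
                # this namespace and key are made up for the sketch.
                return ("urn:example:anonymous-types",
                        item.get("anonymous type name", "anonymous"))
        return fallback
    if validity is None and not ancestors_validated:
        # Infoset-only processing: DTD attribute types map to XML Schema types.
        if item.get("attribute type") in DTD_TYPES:
            return (XS, item["attribute type"])
    return fallback


def promote(name: QName, in_scope: Set[QName], is_element: bool) -> QName:
    """Promote a type name that is not among the in-scope schema definitions."""
    if name in in_scope:
        return name
    return (XS, "anyType" if is_element else "anySimpleType")
```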

Attributes from the XML Schema instance namespace, "http://www.w3.org/2001/XMLSchema-instance", (xsi:schemaLocation, xsi:type, etc.) appear as ordinary attributes in the data model. They will be validated appropriately by schema processors and will simply appear as attributes of type xs:anySimpleType if they haven't been schema validated.

3.6.1 Mapping xs:dateTime, xs:date, and xs:time Values

[XMLSchema Part 2] permits xs:dateTime, xs:date, and xs:time values both with and without timezones. In the context of validation, this is a purely lexical distinction. In order to compare dates and times, an XML Schema validator converts all times to Coordinated Universal Time (UTC or timezone Z). But in the data model, it is necessary to preserve timezone information.

In order to achieve this goal, xs:dateTime, xs:date, and xs:time values are represented as tuples in the data model: a time value normalized to UTC and a timezone represented as an xdt:dayTimeDuration.

The lexical representation of the value is converted to UTC as defined by [XMLSchema Part 2] and the timezone in the lexical representation is converted to an xdt:dayTimeDuration value. These two values are stored in the tuple.

Lexical representations that do not have a timezone are assumed to be in UTC for the purposes of normalization. An empty sequence is used for their timezone in the tuple.

Thus, for the purpose of validation, "2003-01-02T11:30:00-05:00" is converted to "2003-01-02T16:30:00Z", but the data model stores it as "(2003-01-02T16:30:00Z, -PT5H0M)". The value "2003-01-02T16:30:00" has no timezone, so it is stored as "(2003-01-02T16:30:00Z, ())".
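The mapping from a lexical xs:dateTime to its tuple can be sketched with Python's standard datetime module. The function name to_tuple is hypothetical, a timedelta stands in for xdt:dayTimeDuration, and None stands in for the empty sequence. Only lexical forms accepted by datetime.fromisoformat are handled; a trailing "Z" is accepted only by newer Python versions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def to_tuple(lexical: str) -> Tuple[datetime, Optional[timedelta]]:
    """Map a lexical dateTime to (UTC-normalized time, timezone or None)."""
    parsed = datetime.fromisoformat(lexical)
    if parsed.tzinfo is None:
        # No timezone: assume UTC for normalization and store no timezone.
        return parsed.replace(tzinfo=timezone.utc), None
    # Normalize to UTC and keep the original offset as a duration.
    return parsed.astimezone(timezone.utc), parsed.utcoffset()
```

For "2003-01-02T11:30:00-05:00" this yields 16:30 UTC on the same day together with an offset of minus five hours, matching the example in the text.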

3.6.2 Mapping xsi:nil on Element Nodes

[XMLSchema Part 1] introduced a mechanism for signaling that an element should be accepted as valid when it has no content despite a content type which does not require or even necessarily allow empty content. That mechanism is the xsi:nil attribute.

The data model exposes this special semantic in the nilled property.

If the [validity] property exists on an element node and is "valid" then nilled may be set. The nilled property is never set for nodes that have not been successfully schema validated.

If the element is valid and has a PSVI [nil] property and that property is true, then nilled is true. In all other cases, nilled is false.
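Stated as a predicate, the rule reads as follows. This is a minimal sketch that assumes the element information item is available as a Python dict with "validity" and "nil" keys; both the representation and the function name are hypothetical.

```python
from typing import Dict


def nilled(element_item: Dict) -> bool:
    """True only for a valid element whose PSVI [nil] property is true."""
    return (element_item.get("validity") == "valid"
            and element_item.get("nil") is True)
```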

3.7 Comments, Processing Instructions, and Whitespace

Although the data model is able to represent comments, processing instructions, and insignificant whitespace, preservation of this information may be unnecessary and onerous for some applications.

An instance of the data model can be constructed from an Infoset, a PSVI, or from some other data source entirely. Different applications may or may not choose to construct nodes in the data model to represent comments, processing instructions, and insignificant whitespace. These decisions are considered outside the scope of the data model. Consequently the data model makes no attempt to control or identify the sort of processing in this regard that an application uses to construct a data model instance.

4 Nodes

The category of Node values contains seven distinct kinds of nodes: document, element, attribute, text, namespace, processing instruction, and comment. The seven kinds of nodes are defined in the following subsections.

A tree contains a root plus all nodes that are reachable directly or indirectly from the root via the dm:children, dm:attributes, and dm:namespaces accessors. Every node belongs to exactly one tree, and every tree has exactly one root node. A tree whose root node is a document node is referred to as a document. A tree whose root node is some other kind of node is referred to as a fragment.

4.1 Accessors

A set of accessors is defined on all seven kinds of Nodes. Some accessors return a constant empty sequence on certain node kinds. Some node kinds have additional accessors that are not summarized here.

In order for applications to be able to operate on instances of the data model, the model must expose properties of the items it contains. The data model does this by defining a family of accessor functions. These are not functions in the literal sense: they are not available for users or applications to call directly. Rather, they are descriptions of the interface that an implementation of the data model must expose to applications. Functions and operators available to end-users are described in [XQuery 1.0 and XPath 2.0 Functions and Operators].

4.1.1 base-uri Accessor

dm:base-uri($n as Node) as xs:anyURI?

The dm:base-uri accessor returns a sequence containing zero or one URI references.

Document, element, and processing-instruction nodes have a base-uri property. The base-uri of all other node kinds is the empty sequence.

If the base-uri property of a document, element, or processing-instruction node is non-empty, its value is returned.

If the accessor is called on a node that does not have a base-uri property, or whose base-uri property is empty, the base-uri of that node's parent is returned. If the node has no parent, the empty sequence is returned.
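The lookup described above can be sketched as a walk up the parent chain. The sketch assumes that "the base-uri of that node's parent" means the result of applying the same accessor to the parent, so the walk continues until a non-empty value is found. The Node class is hypothetical and None stands in for the empty sequence.

```python
from typing import Optional

KINDS_WITH_BASE_URI = {"document", "element", "processing-instruction"}


class Node:
    """Hypothetical minimal node with a kind, an optional base URI, and a parent."""
    def __init__(self, kind: str, base_uri: Optional[str] = None,
                 parent: Optional["Node"] = None) -> None:
        self.kind = kind
        self.base_uri = base_uri
        self.parent = parent


def base_uri(node: Node) -> Optional[str]:
    """Return the nearest non-empty base-uri, or None if there is none."""
    current: Optional[Node] = node
    while current is not None:
        if current.kind in KINDS_WITH_BASE_URI and current.base_uri:
            return current.base_uri
        # No usable base-uri on this node: defer to its parent.
        current = current.parent
    return None
```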

4.1.2 node-kind Accessor

dm:node-kind($n as Node) as xs:string

The dm:node-kind accessor returns a string value identifying the kind of node on which the accessor was called. One of the following values is returned:

  • "document" for document nodes.

  • "element" for element nodes.

  • "attribute" for attribute nodes.

  • "text" for text nodes.

  • "namespace" for namespace nodes.

  • "processing-instruction" for processing instruction nodes.

  • "comment" for comment nodes.

4.1.3 node-name Accessor

dm:node-name($n as Node) as xs:QName?

The dm:node-name accessor returns a sequence of zero or one xs:QNames.

  • For element and attribute nodes, dm:node-name returns the qualified name of the element or attribute.

  • For processing-instruction nodes, dm:node-name returns an xs:QName with the processing instruction target name in the local-name and no namespace URI.

  • For namespace nodes, dm:node-name returns an xs:QName with the prefix of the namespace declaration in the local-name and no namespace URI. If the namespace declaration declares the default namespace, which has no prefix, an empty sequence is returned.

    Some implementations may not preserve information about the prefixes declared. In these cases, the dm:node-name accessor returns the empty sequence when applied to namespace nodes.

4.1.4 parent Accessor

dm:parent($n as Node) as Node?

The dm:parent accessor returns a sequence containing zero or one nodes.

For nodes that have a parent, dm:parent returns the parent node. For all other nodes, it returns the empty sequence.

If the return value is not the empty sequence, it will always be either an element node or a document node.

4.1.5 string-value Accessor

dm:string-value($n as Node) as xs:string

Every node has a string value; the way in which the string value of a node is computed is different for each kind of node and is specified in the sections on nodes below.

The string value of an atomic value is computed by casting it to an xs:string as per the rules described in [XQuery 1.0 and XPath 2.0 Functions and Operators].

4.1.6 typed-value Accessor

dm:typed-value($n as Node) as xdt:anyAtomicType*

The dm:typed-value accessor returns the typed-value of the node, which is a sequence of zero or more atomic values derived from the string-value of the node and its type in such a way as to be consistent with validation.

  • If the node is a comment, document, namespace, processing-instruction, or text node, then its typed value is equal to its string value as an instance of xdt:untypedAtomic.

  • If the node is an attribute node with type xs:anySimpleType, then its typed value is equal to its string value as an instance of xdt:untypedAtomic. The typed value of an attribute node with any other type is derived from its string value and type annotation in a way that is consistent with XML Schema validation.

  • If the node is an element node with type xs:anyType, then its typed value is equal to its string value, as an instance of xdt:untypedAtomic.

  • If the node is an element node with a simple type or with a complex type of simple content, then its typed value is derived from its string value and type in a way that is consistent with XML Schema validation.

  • If the node is an element node with a complex type of empty content, then its typed value is the empty sequence.

  • If the node is an element node with a complex type of mixed content, then its typed value is its string value as an instance of xdt:untypedAtomic.

  • If the node is an element node with a complex type of complex content, then its typed value is undefined and dm:typed-value raises a type error, which may be handled by the host language.

For detailed semantics see [XQuery 1.0 Formal Semantics].

For xs:dateTime, xs:date and xs:time, the typed value is the atomic value that is determined from its tuple representation as follows:

  • If the timezone component is not the empty sequence, then the value contains the time component, normalized to the timezone specified by the timezone component, as well as the timezone component. The tuple "(2003-01-02T16:30:00Z, -PT5H0M)" produces the value "2003-01-02T11:30:00-05:00".

  • If the timezone component is the empty sequence, then the value contains the time component without any indication of timezone. The tuple "(2003-01-02T16:30:00Z, ())" produces the value "2003-01-02T16:30:00".
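The two cases above can be sketched in Python as follows. The function name from_tuple is hypothetical, a timedelta stands in for the timezone component, and None stands in for the empty sequence. Note that Python writes a zero offset as "+00:00" rather than "Z".

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def from_tuple(utc_time: datetime, tz: Optional[timedelta]) -> str:
    """Rebuild the lexical form of the typed value from its tuple."""
    if tz is None:
        # No timezone component: emit the time with no timezone indication.
        return utc_time.replace(tzinfo=None).isoformat()
    # Shift the UTC time into the stored timezone and keep the offset.
    return utc_time.astimezone(timezone(tz)).isoformat()
```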

4.1.7 type Accessor

dm:type($n as Node) as xs:QName?

The dm:type accessor returns the name of the type of a node.

For element nodes and attribute nodes, dm:type returns the name of the type of the node (as an xs:QName) if it has one. If the type is anonymous, or if no type information exists, the name returned will be unique but implementation defined.

Note:

The use of xs:QName in this signature is part of the data model formalism. In practice, implementations are not required to use xs:QNames to represent the implementation-defined names of anonymous types.

For text nodes, dm:type returns xdt:untypedAtomic.

For other node kinds, it always returns the empty sequence.

4.1.8 children Accessor

dm:children($n as Node) as Node*

The dm:children accessor returns a sequence containing zero or more nodes.

For document and element nodes, it returns the nodes that are the children of that node in document order. It returns the empty sequence for document and element nodes that have no children. If children exist, they will always consist exclusively of element, processing-instruction, comment, and text nodes. Attribute, namespace, and document nodes can never appear as children.

For all other nodes, it always returns the empty sequence.

A document node or an element node is the parent of each of its child nodes. Nodes never share children: if two nodes have distinct identities, then no child of one node will be a child of the other node.

The sequence of children will never contain adjacent text nodes.

4.1.9 attributes Accessor

dm:attributes($n as Node) as AttributeNode*

The dm:attributes accessor returns a sequence containing zero or more attribute nodes.

For element nodes, these are the attributes of the node. For all other nodes, it always returns the empty sequence.

4.1.10 namespaces Accessor

dm:namespaces($n as Node) as NamespaceNode*

The dm:namespaces accessor returns a sequence containing zero or more namespace nodes.

For element nodes, these are the namespaces of the node. For all other nodes, it always returns the empty sequence.

4.1.11 nilled Accessor

dm:nilled($n as Node) as xs:boolean?

The dm:nilled accessor returns the setting of the nilled property of an element node. See 3.6.2 Mapping xsi:nil on Element Nodes.

For all other nodes, it always returns the empty sequence.

4.2 Documents

4.2.1 Overview

Document nodes encapsulate XML documents. Documents have the following properties:

  • base-uri, possibly empty.

  • children, possibly empty.

  • unparsed-entities, possibly empty.

  • document-uri, possibly empty.

Document nodes must satisfy the following constraints.

  1. Every document node must have a unique identity, distinct from all other nodes.

  2. If the children property is not empty, it must consist exclusively of element, processing instruction, comment, and text nodes. Attribute, namespace, and document nodes can never appear as children.

  3. The sequence of nodes in the children property is ordered and must be in document order.

  4. The children property must not contain two consecutive text nodes.

  5. If a node N is a child of a document D, then the parent of N must be D.

  6. If a node N has a parent document D, then N must be among the children of D.

  7. Every child of a document must be distinct.

In a well-formed document, the children of the document node must not be empty and must consist exclusively of element nodes, processing-instruction nodes, and comment nodes, exactly one of which is an element node. A document node in the data model is more permissive: it may be empty, it may have more than one element node as a child, and it may have text nodes as children.
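
The difference between the two levels of strictness can be sketched as a pair of checks. This is a minimal illustration that assumes children are represented only by their node-kind strings; the function names are hypothetical and only constraints 2 and 4 are checked.

```python
ALLOWED_CHILD_KINDS = {"element", "processing-instruction", "comment", "text"}


def valid_document_children(kinds):
    """Check constraints 2 and 4 on a list of child node kinds."""
    if any(k not in ALLOWED_CHILD_KINDS for k in kinds):
        return False
    # No two consecutive text nodes are allowed.
    return not any(a == b == "text" for a, b in zip(kinds, kinds[1:]))


def well_formed_document_children(kinds):
    """Stricter check: non-empty, no text nodes, exactly one element node."""
    return (
        valid_document_children(kinds)
        and len(kinds) > 0
        and "text" not in kinds
        and kinds.count("element") == 1
    )
```

Under these assumptions, an empty list or a list with two element nodes passes the first check but fails the second.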

Note:

Document nodes and XPath 1.0 root nodes are essentially identical.

Implementations that support DTD processing and provide access to the unparsed entity accessors use the unparsed-entities property to associate information about an unordered collection of unparsed entities with a document node.

4.2.2 Accessors

Accessor Returns:
dm:base-uri The value of the base-uri property
dm:node-kind "document"
dm:node-name ()
dm:parent ()
dm:string-value The concatenation of the string-values of all the text node descendants of the document in document order
dm:typed-value The string value of the document node as an xdt:untypedAtomic value
dm:type ()
dm:children The children of the document node
dm:attributes ()
dm:namespaces ()
dm:nilled ()

Three additional accessors are defined on document nodes:

dm:unparsed-entity-system-id($node as DocumentNode, $entityname as xs:string) as xs:string?

The dm:unparsed-entity-system-id accessor returns the system identifier of an unparsed external entity declared in the specified document. If no entity with the name specified in $entityname exists, or if the entity is not an external unparsed entity, the empty sequence is returned.

dm:unparsed-entity-public-id($node as DocumentNode, $entityname as xs:string) as xs:string?

The dm:unparsed-entity-public-id accessor returns the public identifier of an unparsed external entity declared in the specified document. If no entity with the name specified in $entityname exists, or if the entity is not an external unparsed entity, or if the entity has no public identifier, the empty sequence is returned.

dm:document-uri($node as DocumentNode) as xs:string?

The dm:document-uri accessor returns the absolute URI of the resource from which the document node was constructed, if the absolute URI is available. If there is no URI available, or if it cannot be made absolute when the data model is constructed, the empty sequence is returned.

For example, if a collection of documents is returned by the fn:collection function, the dm:document-uri may serve to distinguish between them even though each has the same dm:base-uri.

4.2.3 PSVI to Data Model Mapping

When a data model fragment is created from the PSVI, a document information item is mapped to a Document Node. The precise transformation is described by specifying the PSVI property corresponding to each property of a document node.

base-uri

The value of the [base URI] property.

children

The sequence of nodes constructed from the information items found in the [children] property.

To construct the value of the children property, for each element, processing instruction, comment, and maximal sequence of adjacent character information items found in the [children] property, a corresponding Element, Processing Instruction, Comment, and Text node is constructed and that sequence of nodes is used as the value. If present among the [children], the [document type declaration] information item is ignored.
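
The construction of the children property can be sketched as a grouping step. The representation below is an assumption made for illustration: each information item is a (kind, data) pair, and the kind names "character" and "doctype" are hypothetical labels for character and document type declaration information items.

```python
from itertools import groupby


def map_children(items):
    """Map (kind, data) information items to (kind, data) data model nodes."""
    # The document type declaration item is ignored.
    kept = [item for item in items if item[0] != "doctype"]
    nodes = []
    for kind, run in groupby(kept, key=lambda item: item[0]):
        run = list(run)
        if kind == "character":
            # A maximal run of adjacent characters becomes one text node.
            nodes.append(("text", "".join(data for _, data in run)))
        else:
            # Elements, processing instructions and comments map one to one.
            nodes.extend(run)
    return nodes
```

Dropping the ignored item before grouping is a design choice in this sketch: it guarantees that the result never contains two adjacent text nodes.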

4.2.4 Data Model to Infoset Mapping

The mapping of the data model to the XML Information Set maps a Document Node to a document information item. The properties of the document information item are constructed as follows:

Property Value:
[base URI] The value returned by the dm:base-uri accessor
[children] The sequence of information items constructed from the nodes returned by the dm:children accessor. In other words, for each node returned by the dm:children accessor, a corresponding information item is constructed and that sequence of information items is used as the value for the [children] property.
[document element], [notations], [unparsed entities], [character encoding scheme], [standalone], [version], [all declarations processed] The values of these properties are implementation-defined but must be consistent with the rest of the Infoset constructed.

Note:

Since Document Nodes are more permissive than document information items, the resulting Infoset may be invalid.

4.3 Elements

4.3.1 Overview

Element nodes encapsulate XML elements. Elements have the following properties:

  • base-uri, possibly empty.

  • node-name

  • parent, possibly empty

  • type

  • children, possibly empty

  • attributes, possibly empty

  • namespaces, possibly empty

  • nilled

Element nodes must satisfy the following constraints.

  1. Every element node must have a unique identity, distinct from all other nodes.

  2. If the children property is not empty, it must consist exclusively of element, processing instruction, comment, and text nodes. Attribute, namespace, and document nodes can never appear as children.

  3. The sequence of nodes in the children property is ordered and must be in document order.

  4. The children property must not contain two consecutive text nodes.

  5. Every child of an element must be distinct.

  6. The attributes of an element must have distinct names.

  7. The namespace nodes of an element must have distinct names. At most one of the namespace nodes of an element has no name (this is the default namespace). A namespace node whose namespace URI is the zero-length string must have no name. No namespace node may have the name "xmlns".

  8. If a node N is a child of an element E, then the parent of N must be E.

  9. Exclusive of attribute nodes, if a node N has a parent element E, then N must be among the children of E. (Attribute nodes have a parent, but they do not appear among the children of their parent.)

    The data model permits element nodes without parents (to represent partial results during expression processing, for example). Such elements must not appear among the children of any other node.

  10. If an attribute node A has a parent element E, then A must be among the attributes of E.

    The data model permits attribute nodes without parents (to represent partial results during expression processing, for example). Such attributes must not appear among the attributes of any element node.

The data model does not enforce a constraint that the namespaces of an element must be a superset of the namespaces of its parent, nor does it enforce a constraint that the namespaces of an element must include namespace nodes for each of the namespace URIs used in the element name and the names of its attributes, or of namespace URIs used in the content of elements and attributes of type xs:QName. Applications of the data model (such as XSLT and XQuery) may enforce such constraints in particular circumstances, but these constraints are not part of the data model.
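
As an illustration of constraint 7 on namespace nodes, the following sketch checks a list of namespace nodes. It assumes each namespace node is a (name, uri) pair and that a namespace node with no name is represented by a name of None; the function name is hypothetical.

```python
def check_namespace_nodes(namespaces):
    """Return True if a list of (name, uri) pairs satisfies constraint 7."""
    names = [name for name, _ in namespaces]
    # Distinct names; this also allows at most one unnamed (default) node.
    if len(names) != len(set(names)):
        return False
    for name, uri in namespaces:
        # A zero-length namespace URI is only allowed on the unnamed node.
        if uri == "" and name is not None:
            return False
        # No namespace node may be named "xmlns".
        if name == "xmlns":
            return False
    return True
```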

4.3.2 Accessors

Accessor Returns:
dm:base-uri The value of the base-uri property or its parent's base URI
dm:node-kind "element"
dm:node-name The xs:QName of the element
dm:parent The parent element or document node
dm:string-value The concatenation of the string-values of all the text node descendants of the element in document order
dm:typed-value The typed value of the node
dm:type The name of the type of the element
dm:children The children of the element node
dm:attributes The attributes of the element node
dm:namespaces The namespaces of the element node
dm:nilled The status of the nilled property of the element node

The dm:base-uri accessor returns the base-uri property of the element node, if it exists. If it does not exist, the base URI of the element's parent is returned.
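
A minimal sketch of this fallback, assuming a hypothetical Node class with base_uri and parent fields. Applying the same rule recursively to the parent, and returning None (the empty sequence) for a parentless node with no base-uri property, are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    base_uri: Optional[str] = None
    parent: Optional["Node"] = None


def dm_base_uri(node: Node) -> Optional[str]:
    """Return the node's own base-uri if present, else the parent's base URI."""
    if node.base_uri is not None:
        return node.base_uri
    if node.parent is None:
        # Assumption: no property and no parent yields the empty sequence.
        return None
    return dm_base_uri(node.parent)


doc = Node(base_uri="http://example.org/doc.xml")
outer = Node(parent=doc)
inner = Node(parent=outer)
```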

The accessors dm:namespaces and dm:attributes return the same set of namespace and attribute nodes (respectively) associated with the element. They are not constrained to return them in any particular order.

The dm:parent accessor returns the empty sequence if the element has no parent.

If the element node's type is xs:anyType, the dm:typed-value accessor returns the node's string value as xs:anySimpleType. If the type is a complex type with complex content, invoking dm:typed-value raises an error.

The dm:typed-value accessor returns the typed-value of the node, which is a sequence of zero or more atomic values. The typed-value is closely related to the node's string-value and its type. For example:

  • When the node's string-value is "3.14" and its type is xs:decimal, the typed-value is a sequence containing the atomic value 3.14 of type xs:decimal.

  • When the node's string-value is "foo bar baz" and its type is xs:IDREFS, the typed-value is a sequence containing the atomic values "foo", "bar", and "baz", each of type xs:IDREF.

  • When the node's string-value is "17" and its type is xs:anyType, the typed-value is a sequence containing the atomic value "17" of type xs:anySimpleType.

In fact, when the type is an atomic type, typed-value is always the atomic-value constructed from the string-value and the type.

In the general case, dm:typed-value constructs a sequence of atomic values. These values are derived from the string-value of the element and its type, in such a way as to be consistent with validation.
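
The first two examples above can be sketched as follows. This covers only xs:decimal and xs:IDREFS and is not a general implementation; the function name typed_value is hypothetical, and Python's Decimal and str stand in for the atomic values.

```python
from decimal import Decimal


def typed_value(string_value, type_name):
    """Build a sequence of atomic values from a string-value and a type name."""
    if type_name == "xs:decimal":
        # An atomic type yields a single atomic value.
        return [Decimal(string_value)]
    if type_name == "xs:IDREFS":
        # A list type yields one atomic value per whitespace-separated token.
        return string_value.split()
    raise NotImplementedError(type_name)
```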

One additional accessor is defined on element nodes:

dm:element-declaration($node as ElementNode) as xs:string*

The dm:element-declaration accessor returns the xs:QName of the global element declaration associated with this element. If the element declaration is local, it returns a sequence consisting of the xs:QName of the local element declaration and the SchemaGlobalContext of the declaration.

This declaration can be used by implementations to identify substitution groups, nillability, and other aspects of the declaration.

4.3.3 PSVI to Data Model Mapping

When a data model fragment is created from the PSVI, an element information item is mapped to an Element Node. The precise transformation is described by specifying the PSVI property corresponding to each property of an element node.

base-uri

The value of the [base URI] property.

node-name

An xs:QName constructed from the [local name] property and the [namespace name] property

parent

The value of the [parent] property.

type

The xs:QName computed as described in 3.6 Mapping PSV Infoset additions to Types. Note that if the type referenced would be a union type then type refers to the member type that actually validated the schema normalized value.

children

  • If the [schema normalized value] PSVI property exists and is not absent, the processor may, depending on the implementation, use a sequence of nodes containing the Processing Instruction and Comment nodes corresponding to the processing instruction and comment information items found in the [children] property, plus a single text node whose string value is the [schema normalized value]. The order of these nodes is implementation defined.

  • Otherwise, a sequence of nodes constructed in the following way from the information items found in the [children] property: for each element, processing instruction, comment, and maximal sequence of adjacent character information items found in the [children] property, a corresponding Element, Processing Instruction, Comment, and Text node is constructed.

Because the data model requires that all general entities be expanded, there will never be unexpanded entity reference information item children.

attributes

A set of Attribute Nodes constructed from the attribute information items appearing in the [attributes] property. This includes all of the "special" attributes (xml:lang, xml:space, xsi:type, etc.) but does not include namespace declarations (because they are not attributes).

namespaces

A set of Namespace Nodes constructed from the namespace information items appearing in the [in-scope namespaces] property.

Some implementations may choose to use only a subset of the namespaces present in the PSVI. In particular, they may exclude namespace nodes for namespaces which do not appear in the qualified name of any element or attribute information item. This can arise when xs:QNames are used in content.

nilled

If the [validity] property exists and is "valid" and the [attributes] property contains an attribute with the local-name "nil" and the namespace URI "http://www.w3.org/2001/XMLSchema-instance", then "true", otherwise "false".
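
Read literally, the rule above can be sketched as below. The representation is an assumption: validity is None when the [validity] property does not exist, and each attribute is a (namespace URI, local name) pair. Like the rule as worded, this sketch does not inspect the attribute's value.

```python
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def nilled(validity, attributes):
    """Compute the nilled property from [validity] and [attributes]."""
    has_nil = any(ns == XSI_NS and local == "nil" for ns, local in attributes)
    return validity == "valid" and has_nil
```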

4.3.4 Data Model to Infoset Mapping

The mapping of the data model to the XML Information Set maps an Element Node to an element information item. The properties of the element information item are constructed as follows:

Property Value:
[namespace name] The namespace name of the xs:QName returned by the dm:node-name accessor
[local name] The local name of the xs:QName returned by the dm:node-name accessor
[prefix] An appropriate namespace prefix, as described below
[children] The sequence of information items constructed from the nodes returned by the dm:children accessor. In other words, for each node returned by the dm:children accessor, a corresponding information item is constructed and that sequence of information items is used as the value for the [children] property.
[attributes] The sequence of attribute information items constructed from the nodes returned by the dm:attributes accessor.
[in-scope namespaces] The sequence of namespace information items constructed from the nodes returned by the dm:namespaces accessor.
[base URI] The value returned by the dm:base-uri accessor
[parent] The information item constructed from the node returned by the dm:parent accessor. If the node has no parent, the property must be left absent and the resulting Infoset will not be valid.
[namespace attributes] The sequence of namespace information items constructed from the nodes that are present in the difference between the sequence of nodes returned by the dm:namespaces accessor on this element and the sequence of nodes returned by the dm:namespaces accessor of this element's dm:parent; see below.

An implementation must construct the value of the [prefix] property as if the following algorithm were applied: if the element has at least one namespace node whose namespace URI is the same as the namespace name of the xs:QName returned by the dm:node-name accessor, the algorithm returns the local part of the name of that namespace node, or the empty string if the namespace node has no name. If there are several such namespace nodes, it chooses one of them arbitrarily. If there is no such namespace node, it generates an arbitrary prefix that is distinct from the dm:node-name of any of the element's namespaces. The [prefix] is the empty string if the element has an empty [namespace name] (that is, if it is in the null namespace).

If a new prefix is generated, a corresponding namespace information item must be added to the [in-scope namespaces] property of the element information item. The namespace information item must associate the generated prefix with the namespace name of the xs:QName returned by the element's dm:node-name accessor.
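
The prefix algorithm can be sketched as follows. This is one possible reading under stated assumptions: namespace nodes are (name, uri) pairs with None for an unnamed node, the "arbitrary" choice is the first match, and generated prefixes follow a hypothetical "ns0", "ns1", ... scheme.

```python
from itertools import count


def choose_prefix(element_ns_uri, namespaces):
    """Return (prefix, in_scope_namespaces) for an element information item."""
    if not element_ns_uri:
        # Element in the null namespace: the prefix is the empty string.
        return "", namespaces
    for name, uri in namespaces:
        if uri == element_ns_uri:
            # Reuse an existing namespace node; an unnamed node gives "".
            return (name or ""), namespaces
    # No matching namespace node: generate a prefix not already in use.
    used = {name for name, _ in namespaces}
    prefix = next(f"ns{i}" for i in count() if f"ns{i}" not in used)
    # The generated prefix must also be added to the in-scope namespaces.
    return prefix, namespaces + [(prefix, element_ns_uri)]
```

The function returns the possibly extended list rather than modifying its argument, so the caller can see whether a namespace information item was added.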

Note:

If the implementation has allowed in-scope namespaces to be discarded from the data model, then these namespaces may need to be reintroduced when creating an Infoset in order to ensure that the Infoset corresponds to a document that is namespace well-formed as defined in [Namespaces in XML].

Note:

The algorithm used to calculate namespace attributes will need to be adjusted to cater for XML Namespaces 1.1, which allows the "undeclaration" of all namespaces, whether they have a prefix or not.

The [namespace attributes] property is computed so that it contains the smallest possible set of namespace attributes. For example, suppose that the dm:namespaces accessor for this element returns namespace nodes for the "foo", "bar", and "baz" namespaces and the dm:namespaces accessor for this element's parent returns namespace nodes for the "foo" and "bar" namespaces. In this case, the [namespace attributes] property will contain a single namespace information item for the "baz" namespace.
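
The example above amounts to a difference between two sequences of namespace nodes. A minimal sketch, assuming namespace nodes are compared as (name, uri) pairs and using hypothetical URIs for "foo", "bar", and "baz":

```python
def namespace_attributes(own, parent):
    """Return the namespace nodes of an element that its parent does not have."""
    inherited = set(parent)
    # Keep the element's own order; drop anything already on the parent.
    return [ns for ns in own if ns not in inherited]


own = [("foo", "urn:foo"), ("bar", "urn:bar"), ("baz", "urn:baz")]
parent = [("foo", "urn:foo"), ("bar", "urn:bar")]
result = namespace_attributes(own, parent)  # [("baz", "urn:baz")]
```

For an element without a parent, passing an empty list for the parent yields all of the element's namespace nodes.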

4.4 Attributes

4.4.1 Overview

Attribute nodes encapsulate XML attributes. Attributes have the following properties:

  • node-name

  • string-value

  • parent, possibly empty

  • type

Attribute nodes must satisfy the following constraints.

  1. Every attribute node must have a unique identity, distinct from all other nodes.

  2. If an attribute node A has a parent element E, then A must be among the attributes of E.

    The data model permits attribute nodes without parents (to represent partial results during expression processing, for example). Such attributes must not appear among the attributes of any element node.

For convenience, the element node that owns this attribute is called its "parent" even though an attribute node is not a "child" of its parent element.

4.4.2 Accessors

Accessor Returns:
dm:base-uri ()
dm:node-kind "attribute"
dm:node-name The xs:QName of the attribute
dm:parent The parent element node
dm:string-value The value of the attribute
dm:typed-value The typed value of the attribute
dm:type The name of the type of the attribute
dm:children ()
dm:attributes ()
dm:namespaces ()
dm:nilled ()

If the attribute node's type is xs:anySimpleType, the dm:typed-value accessor returns the node's string value as xdt:untypedAtomic.

The dm:typed-value accessor returns the typed-value of the node, which is a sequence of zero or more atomic values. The typed-value is closely related to the node's string-value and its type. For example:

  • When the node's string-value is "3.14" and its type is xs:decimal, the typed-value is a sequence containing the atomic value 3.14 of type xs:decimal.

  • When the node's string-value is "foo bar baz" and its type is xs:IDREFS, the typed-value is a sequence containing the atomic values "foo", "bar", and "baz", each of type xs:IDREF.

  • When the node's string-value is "17" and its type is xs:anySimpleType, the typed-value is a sequence containing the atomic value "17" of type xdt:untypedAtomic.

In fact, when the type is an atomic type, typed-value is always the atomic-value constructed from the string-value and the type.

In the general case, dm:typed-value constructs a sequence of atomic values. These values are derived from the string-value of the attribute and its type, in such a way as to be consistent with validation.

4.4.3 PSVI to Data Model Mapping

When a data model fragment is created from the PSVI, an attribute information item is mapped to an Attribute Node. The precise transformation is described by specifying the PSVI property corresponding to each property of an attribute node.

node-name

An xs:QName constructed from the [local name] property and the [namespace name] property

string-value
  • The [schema normalized value] PSVI property if that exists, or

  • the [normalized value] property.

parent

The value of the [parent] property.

type

The xs:QName computed as described in 3.6 Mapping PSV Infoset additions to Types. Note that if the type referenced would be a union type then type refers to the member type that actually validated the schema normalized value.

4.4.4 Data Model to Infoset Mapping

The mapping of the data model to the XML Information Set maps an Attribute Node to an attribute information item. The properties of the corresponding attribute information item are constructed as follows:

Property Value:
[namespace name] The namespace name of the xs:QName returned by the