Copyright © 2001 W3C ® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document defines the W3C XQuery 1.0 and XPath 2.0 Data Model, which is the data model of at least [XSLT 2.0] and [XQuery 1.0: A Query Language for XML], and of any other specifications that reference it. This data model is based on the data models of [XPath] and [XML Query Data Model] and replaces [XML Query Data Model]. This document is the result of joint work by the [XSL Working Group] and the [XML Query Working Group].
This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.
This is a Public Working Draft for review by W3C Members and other interested parties. It is a draft document and may be updated, replaced or made obsolete by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". This is work in progress and does not imply endorsement by the W3C membership.
This document has been produced as part of the [XML Activity], following the procedures set out for the W3C Process. The document has been written by the [XSL Working Group] and [XML Query Working Group].
Comments on this document should be sent to the W3C mailing list www-xml-query-comments@w3.org (archived at http://lists.w3.org/Archives/Public/www-xml-query-comments/).
A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.
1 Introduction
2 Notation and Pseudo-code Syntax
3 Concepts
3.1 Node Identity
3.2 Document Order
3.3 XML Schemas and the XML Information Set
3.4 Schema Components and Values
3.5 Text Nodes and Simple-Typed Values
3.6 Ignoring Comments, Processing Instructions, and Whitespace
4 Nodes
4.1 Documents
4.2 Elements
4.3 Attributes
4.4 Namespaces
4.5 Processing Instructions
4.6 Comments
4.7 Text
5 Simple Typed Values
6 Sequences
7 Error
8 Schema Components
8.1 Mapping PSV Infoset additions to Schema Components
A XML Information Set Conformance
B References
C References (Non-Normative)
D Example (Non-Normative)
E Issues (Non-Normative)
F Open Issues (Non-normative)
G Resolved Issues (Non-normative)
This document defines the XQuery 1.0 and XPath 2.0 Data Model, which is the data model of [XSLT 2.0] and [XQuery 1.0: A Query Language for XML].
The XQuery 1.0 and XPath 2.0 Data Model (henceforth "data model") serves two purposes. First, it defines precisely the information contained in the input to an XSLT or XQuery processor. Second, it defines all permissible values of expressions in the XSLT, XQuery, and XPath languages. A language is closed with respect to a data model if the value of every expression in a language is guaranteed to be in the data model. XSLT 2.0, XQuery 1.0, and XPath 2.0 are all closed with respect to the data model.
The data model is based on the [XML Information Set] (henceforth "Infoset"), but it requires the following new features to meet the [XPath Requirements Version 2.0] and [XML Query Requirements]:
Support for XML Schema types. The XML Schema recommendations define features, such as structures ([XMLSchema Part 1]) and simple data types ([XMLSchema Part 2]), that extend the XML Information Set with precise type information.
Representation of collections of documents and of complex values. ([XML Query Requirements])
As with the Infoset, the XQuery 1.0 and XPath 2.0 Data Model specifies what information in the documents is accessible, but it does not specify the programming-language interfaces or bindings used to represent or access the data.
[Definition: Values in the data model fall into four non-overlapping value categories: nodes, simple typed values, error, and schema components. Additionally, we distinguish a fifth, overlapping value category, sequences, which includes nodes and simple typed values, given the convention that a node or a simple typed value is identified with a one-element sequence containing it.] A node is defined in 4 Nodes and is one of seven node kinds. A simple typed value encapsulates an XML Schema simple type and a corresponding value of that type. Simple typed values are defined in 5 Simple Typed Values. A sequence is an ordered collection of nodes, simple typed values, or any mixture of nodes and simple typed values. A sequence cannot be a member of a sequence. Sequences are defined in 6 Sequences. The error value is defined in 7 Error. A schema component represents the type of an element node, attribute node, or simple typed value; schema components are defined in 8 Schema Components.
Note: In XPath 1.0, the data model only defines nodes. The primitive data types (number, boolean, string, node-set) are part of the expression language, not the data model.
Given the above value categories, the data model can represent various values, including not only the input and the output of a query but also all values of expressions used during the intermediate calculations. Examples include the input document or document repository (represented as a document node or a sequence of document nodes), the result of a path expression (represented as a sequence of nodes), the result of an arithmetic or a logical expression (represented as a simple typed value), and a sequence expression resulting in a sequence of integers, dates, QNames or other XML Schema simple typed values (represented as a sequence of simple typed values). Examples of values that cannot be expressed directly by the data model include sequences containing schema components, sets containing both simple typed values and the error value, and simple typed values whose type is not an XML Schema simple type.
In this document, we provide a precise definition of how values in the XQuery 1.0 and XPath 2.0 Data Model are constructed and accessed, and how they relate to values in the Infoset. We note wherever the XQuery 1.0 and XPath 2.0 Data Model differs from that of XPath 1.0.
In addition to using prose, we define the data model using a functional notation. We chose this notation because it is simple and permits a precise definition of the data model, suitable for use by the formal semantics of XQuery. Although the notation has a functional style, we emphasize that the data model can be realized in a variety of programming languages and styles, for example, as object classes and methods in an object-oriented language.
Pseudo-code syntax is highlighted as follows:
f : (x) -> y
In the pseudo-code syntax, the term Node denotes the category of node values, SimpleTypedValue denotes the category of simple typed values, and UnitValue refers to the category of either node values or simple typed values. The term SchemaComponent denotes the category of schema component values. Sequence<V> denotes the category of sequence values all of whose members are in category V. In a sequence, V may be a Node or SimpleTypedValue, or the union (choice) of several categories of UnitValues. For example, the following denotes a category of sequence containing any combination of comment and processing instruction nodes:
Sequence<CommentNode | ProcessingInstructionNode>
There are some functions in the data model that are partial functions, for example, a node may have one parent node or no parent. We use bounded sequences, Sequence(m,n)<V>, to denote a sequence of at least m and at most n V values. The unbounded sequence Sequence<V> is equivalent to Sequence(0,*)<V>, where * denotes unbounded. For example, if the node argument has a parent, the parent accessor returns a singleton sequence. If the node argument does not have a parent, it returns the empty sequence. The signature of parent specifies that it returns an empty sequence or a sequence containing one element or document node:
parent : (Node) -> Sequence(0,1)<ElementNode | DocumentNode>
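As an illustration only, the bounded sequence Sequence(0,1) can be sketched in Python by returning a list that is either empty or a singleton. The `Node` class and its `_parent` field are hypothetical names introduced for this sketch; the data model itself does not prescribe any programming-language binding.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # eq=False: nodes compare by identity, not by content
class Node:
    kind: str                         # e.g. "document", "element", "comment"
    _parent: Optional["Node"] = None  # hypothetical storage for the parent link


def parent(node: Node) -> List[Node]:
    """Sketch of parent : (Node) -> Sequence(0,1)<ElementNode | DocumentNode>."""
    # A partial function is modelled as a sequence of length zero or one.
    return [] if node._parent is None else [node._parent]


doc = Node("document")
root = Node("element", doc)
```

Here `parent(doc)` yields the empty sequence and `parent(root)` yields a singleton sequence containing `doc`, so callers never need a separate "no value" marker.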
The pseudo-code syntax defines functions to construct values, called constructors; and functions to access parts of values, called accessors (see [Issue-0033: Unclear relationship between values passed to the constructor, and those returned by the accessor]).
Note: The XPath 1.0 data model defines accessors, but does not define constructors.
The term signature of a function specifies the value category of its zero or more inputs and the value category of its one output. The following signature denotes a function f that takes values in the categories V1, ..., Vm and returns an output value in the category Vn (see [Issue-0067: Align function call syntax]).
f : (V1, ..., Vm) -> Vn
A member of a particular category is a permissible argument to any function that accepts the category, for example, a ProcessingInstructionNode is a permissible argument to a function expecting a Node.
This document relies on the [XML Information Set]. Information items and properties are indicated by the styles information item and [property], respectively.
This document also provides pseudo-code syntax describing the mapping
from the Infoset to the data model. To facilitate this we introduce accessors
to the required infoset properties through InfoItem objects.
These infoset accessors are for rhetorical purposes only and are not intended to be exposed outside this
specification. We name the accessors of the Infoset using the convention
infoset-<item-name>-<property-name>.
Similarly, accessor functions that return PSV Infoset properties use the
naming convention psvi-<item-name>-<property-name>.
For example, infoset-element-attributes is the accessor that
returns an element information item's [attributes]
property:
infoset-element-attributes : (ElementItem) -> Sequence<AttributeItem>
An InfoItem is one of eleven kinds of item: document item, element item, attribute item, processing instruction item, unexpanded entity reference item, character item, comment item, doctype item, unparsed entity item, notation item, and namespace item.
The infoitem-kind accessor
returns a string value representing the information item's kind, either
"document", "element", "attribute",
"character", "namespace",
"processing-instruction", "comment", "doctype",
"notation", or "unparsed-entity".
infoitem-kind : (InfoItem) -> xs:string
Throughout this document, the namespace prefix xs indicates
the [XMLSchema Part 1] namespace name
http://www.w3.org/2001/XMLSchema.
The namespace prefixes xf and op indicate
the namespace of the functions and operators defined in
[XQuery 1.0 and XPath 2.0 Functions and Operators]. They are associated with the namespace names
http://www.w3.org/2001/12/xquery-functions and
http://www.w3.org/2001/12/xquery-operators respectively.
The following functions and operators are defined in [XQuery 1.0 and XPath 2.0 Functions and Operators]:
xf:anyURI
xf:concat
xf:decimal
xf:NCName
xf:node-equal
xf:QName
xf:QName-from-uri
xf:get-local-name
xf:get-namespace-uri
xf:string
op:value-equal
op:concatenate
xf:count
xf:empty
op:item-at
xf:sublist
Because XML documents are tree-structured, we define the data model using conventional terminology for trees. The data model is a node-labeled, tree-constructor representation, but also includes a concept of node identity. The identity of a node is established when a node-constructor is applied to create the node: each application of a node constructor creates a new node that is identical to itself, and not identical to any other node (see 4 Nodes).
This concept should not be confused with the concept of unique ID, which is a unique name assigned to an element by the author to represent references using ID/IDREF correlation.
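The distinction between node identity and node content can be sketched as follows. This is a minimal illustration in Python, assuming that object identity stands in for node identity; `CommentNode` and `comment_node` are hypothetical names, not constructors defined by this document.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps Python's default identity-based equality
class CommentNode:
    content: str


def comment_node(content: str) -> CommentNode:
    """Each application of the constructor creates a new, distinct node."""
    return CommentNode(content)


a = comment_node("same text")
b = comment_node("same text")

same_identity = a is b                  # False: two constructor applications
same_content = a.content == b.content   # True: the contents happen to agree
```

Two nodes built from the same arguments have equal content but distinct identities, and a node is always identical to itself.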
A document order is defined on all the nodes in a document. The document node is the first node. Element nodes, comment nodes, and processing instruction nodes occur in the order of their representation in the XML (after expansion of entities). Element nodes occur before their children. The namespace nodes of an element immediately follow the element node (see [Issue-0051: Document order of shared namespace nodes]). The relative order of namespace nodes is implementation-dependent. The attribute nodes of an element immediately follow the namespace nodes of the element. The relative order of attribute nodes is implementation-dependent. Reverse document order is the reverse of document order.
The relative order of nodes in distinct documents is implementation-dependent but stable. In other words, given two distinct documents A and B, if a node in document A is before a node in document B, then every node in document A is before every node in document B.
Note: The relative order of free-floating nodes (those not in a document) is not defined. See [Issue-0050: Relative order of free-floating nodes].
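The ordering rules for the nodes of a single tree can be sketched as a traversal. The following Python is an illustration under stated assumptions: the `Node` class and its fields are hypothetical, and the namespace and attribute lists are emitted in stored order only because some order must be chosen (their relative order is implementation-dependent).

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass(eq=False)
class Node:
    kind: str
    label: str = ""
    namespaces: List["Node"] = field(default_factory=list)
    attributes: List["Node"] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def document_order(node: Node) -> Iterator[Node]:
    """Yield a node, its namespace nodes, its attribute nodes, then its children."""
    yield node
    yield from node.namespaces   # immediately follow the element node
    yield from node.attributes   # immediately follow the namespace nodes
    for child in node.children:  # children come after their parent
        yield from document_order(child)


doc = Node("document", children=[
    Node("element", "a",
         namespaces=[Node("namespace", "xml")],
         attributes=[Node("attribute", "id")],
         children=[Node("text", "hi"), Node("element", "b")]),
    Node("comment", "end"),
])
order = [(n.kind, n.label) for n in document_order(doc)]
```

Reverse document order is then simply `list(reversed(order))`.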
The data model is defined in terms of the [XML Information Set] after XML Schema validity assessment. XML Schema validity assessment is the process of assessing an XML element information item with respect to an XML Schema and augmenting it and some or all of its descendants with properties that provide information about validity and type assignment. The result of schema validity assessment is an augmented Infoset, known as the Post Schema-Validation Infoset, or PSVI.
The data model supports the following classes of XML documents:
Schema-validated documents, i.e., those validated with respect to a schema,
DTD-valid documents, i.e., those documents validated with respect to a DTD, and
Well-formed documents with no corresponding DTD or schema.
The data model does not support XML documents that are not supported by the XML Information Set, for example, non-well-formed documents and documents that don't conform to XML Namespaces.
Schema-validated documents include documents in which some elements or attributes have been validated by "lax" or "skip" validation ([XMLSchema Part 1]).
An "incompletely validated document" is an XML document that has a corresponding schema but whose schema-validity assessment has resulted in one or more element or attribute information items being assigned values other than 'valid' for the [validity] property in the PSVI.
The data model supports incompletely validated documents. 8 Schema Components specifies how such documents are represented in the data model. See [Issue-0024: Support for Schema-invalid documents].
Note: This implies accommodation for the case where both a DTD and a schema are applied. This will probably require some reconciliation of the [attribute type] property with type information from the PSVI. See issue [Issue-0004: Schema/DTD].
The [XML Schema: Formal Description] (henceforth "XSFD") is a formal, declarative system for describing and naming XML Schema information, specifying XML instance type information, and validating instances against schemas. XSFD includes a component model that defines four schema components (element, attribute, simple type, and complex type), and it defines the mapping from the XML Schema component model to the XSFD model. In addition, it specifies "normalized, universal" names for all components of an XML Schema, so that they can be uniquely identified by URIs.
The data model provides a representation for schema components, which are used to represent the types of values. All element nodes, attribute nodes, and simple typed values have an associated schema component. We use the term SchemaComponent to collectively refer to the data model's schema component values element-declaration, attribute-declaration, simple-type-definition, and complex-type-definition. The accessors for schema components are defined in 8 Schema Components. The schema component of element and attribute nodes is derived from the PSV Infoset additions of the nodes' corresponding element and attribute information items.
A schema simple type consists of a lexical space, a value space, and a set of facets [XMLSchema Part 2]. A simple type is either primitive (e.g., xs:string, xs:boolean, xs:float, xs:double) or derived (e.g., xs:language, xs:NMTOKEN, xs:long, xs:ID, xs:IDREF, or a user-defined type). If a simple typed value is in the value space of a schema simple type, we say that the simple typed value is an instance of that type. Because the value spaces of schema simple types may overlap, a simple typed value may be an instance of more than one schema simple type; e.g., an instance of xs:long is also an instance of xs:integer.
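The overlap of value spaces can be illustrated with a small Python sketch. It assumes that a Python `int` stands in for the value, and it hard-codes only two types: xs:integer (unbounded) and xs:long (a 64-bit signed range, as defined in XML Schema Part 2). The function name `in_value_space` is hypothetical.

```python
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1  # value space bounds of xs:long


def in_value_space(value: int, type_name: str) -> bool:
    """Return True if the value is an instance of the named simple type."""
    if type_name == "xs:integer":
        return True                            # every integer qualifies
    if type_name == "xs:long":
        return LONG_MIN <= value <= LONG_MAX   # restricted subset of xs:integer
    raise ValueError(f"type not covered by this sketch: {type_name}")


both = in_value_space(42, "xs:integer") and in_value_space(42, "xs:long")
too_big_for_long = not in_value_space(2**63, "xs:long")
```

The value 42 is an instance of both types, whereas 2**63 is an instance of xs:integer only.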
A schema complex type defines the permissible structure and content of an element [XMLSchema Part 1].
A schema attribute declaration specifies an attribute's name and the simple type of its value.
A schema element declaration specifies an element's name and the simple or complex type of its content.
The data model supports two representations of the character data in an XML document: text nodes and simple-typed values. A text node represents a string of consecutive character information items and never has another text node as its immediately following sibling. An element node, for example, has child nodes that may include text nodes, comment nodes, processing instruction nodes, and other element nodes. In addition, the text content of an element may be interpreted as a simple-typed value, such as an integer, a date, or a sequence of prices. To illustrate, consider an element node whose type is a sequence of double-precision numbers. The element's children are three nodes: a text node with string contents " 12.00 ", followed by a comment node, followed by a text node with contents " 13.0", whereas its simple-typed value is a sequence containing the double-precision numbers 12.0 and 13.0.
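The example above can be sketched in Python. The sketch assumes that the simple-typed value is obtained by parsing the concatenated text content as a whitespace-separated list, and that a Python `float` stands in for a double-precision number; the tuple representation of children and both function names are hypothetical.

```python
from typing import List, Tuple

# (node-kind, content) pairs standing in for the element's three children
children: List[Tuple[str, str]] = [
    ("text", " 12.00 "),
    ("comment", "a remark"),
    ("text", " 13.0"),
]


def text_content(kids: List[Tuple[str, str]]) -> str:
    """Concatenate the contents of the text-node children, skipping the comment."""
    return "".join(content for kind, content in kids if kind == "text")


def typed_value_as_doubles(kids: List[Tuple[str, str]]) -> List[float]:
    """Interpret the text content as a sequence of double-precision numbers."""
    return [float(token) for token in text_content(kids).split()]


values = typed_value_as_doubles(children)
```

The two views coexist: the children preserve the lexical forms " 12.00 " and " 13.0", while `values` holds only the numbers 12.0 and 13.0.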
The data model logically supports both text nodes and simple-typed values, but it does not specify that they both must be implemented. An implementation might choose to only store simple-typed values and reconstruct text nodes on demand, or vice versa. This allows for the efficient storage of and access to simple typed values but it has an impact on interoperability; for instance, searching for leading zeros in nodes with numerical simple typed values may yield results in some implementations but not in others.
Although the data model is able to represent comments, processing instructions, and insignificant whitespace, preservation of this information may be unnecessary and onerous for some applications.
Construction of a document from an XML information set is parameterized by three flags, ignore-comments, ignore-processing-instructions, and ignore-whitespace. If the ignore-comments flag is true, comment nodes are not preserved in the data model. If the ignore-processing-instructions flag is true, processing-instruction nodes are not preserved in the data model. If the ignore-whitespace flag is true, insignificant whitespace is not preserved.
ignore-comments : xs:boolean
ignore-processing-instructions : xs:boolean
ignore-whitespace : xs:boolean
Note: By whom these flags are set is not defined. See [Issue-0040: Setting and examining construction flags].
Expressions which rely upon the presence or absence of comments, processing instructions, or insignificant whitespace may produce different results for two data models created from the same infoset (XML document), when each data model is constructed with different settings of these flags.
Insignificant whitespace is defined as a text node that:
contains no characters other than whitespace characters (as defined in XML 1.0), and
has a parent element with a [validity] property with the value "valid", and a [type definition] property yielding a complex type with content-type of element-only.
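The two conditions above can be read as a predicate. The following Python sketch is illustrative only: `ParentInfo` and its fields are hypothetical stand-ins for the parent element's [validity] property and the content-type of its [type definition], and the whitespace set is the one defined by XML 1.0 (space, tab, carriage return, line feed).

```python
from dataclasses import dataclass

XML_WHITESPACE = frozenset(" \t\r\n")  # whitespace characters per XML 1.0


@dataclass
class ParentInfo:
    validity: str      # stands in for the [validity] property
    content_type: str  # content-type of the [type definition], e.g. "element-only"


def is_insignificant_whitespace(text: str, parent: ParentInfo) -> bool:
    """True only if both conditions of the definition hold."""
    only_whitespace = all(ch in XML_WHITESPACE for ch in text)
    return (only_whitespace
            and parent.validity == "valid"
            and parent.content_type == "element-only")


valid_parent = ParentInfo("valid", "element-only")
mixed_parent = ParentInfo("valid", "mixed")
```

Under this reading, a whitespace-only text node inside mixed content is still significant, as is any whitespace under a parent that was not assessed as valid.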
Note: See [Issue-0034: Interaction of insignificant whitespace with comments]. Removal of insignificant whitespace might be performed automatically when consistent with the schema. See [Issue-0028: Whitespace handling]. XSLT's whitespace handling mechanism needs to be supported; see [Issue-0057: Support for XSLT whitespace stripping].
The category of Node values contains seven distinct kinds of nodes: document, element, attribute, text, namespace, processing instruction, and comment. The seven kinds of nodes are defined in the following subsections.
Each kind of node has its own constructor. The effect of a node constructor is to create a new node with a unique identity, distinct from all other nodes.
A set of accessors is defined on all seven kinds of Nodes. Some accessors return a constant empty sequence on certain node kinds.
The node-kind accessor returns a string value
representing the node's kind: either "document",
"element", "attribute", "text",
"namespace", "processing-instruction", or
"comment".
The name accessor returns a sequence containing one expanded QName for node kinds that can have names. For other node kinds, it always returns an empty sequence. An expanded QName is in the value space of xs:QName, and consists of a namespace URI and a local name. See [Issue-0063: Is prefix preserved?], [Issue-0070: Should the name accessor return "" or ()?].
The parent accessor returns a sequence containing zero or one nodes for node kinds that can have parents. For other node kinds, it always returns the empty sequence.
The string-value accessor returns the node's string representation. For some kinds of node, the string-value is part of the node; for other kinds of node, the string-value is computed from the string-value of its descendant nodes.
The typed-value accessor returns a sequence of simple typed values corresponding to the node. This may be a non-empty sequence for element and attribute nodes, but it is always the empty sequence for other node kinds.
Every node has at most one parent, which is either an element node or the document node. A node that has no parent is regarded as the root of a tree. The one exception is a namespace node, which never has a parent.
Note: In XPath 1.0, Namespace nodes have parents.
Document nodes and element nodes have a sequence of children nodes. A document node or an element node is the parent of each of its child nodes. Nodes never share children: if two nodes have distinct identities, then no child of one node will be a child of the other node.
The return types of the Node accessors are given below. Some kinds of node further restrict the return types; notably, many node kinds return a constant empty sequence for some of the accessors.
node-kind : (Node) -> xs:string
name : (Node) -> Sequence(0,1)<xs:QName>
base-uri : (Node) -> Sequence(0,1)<xs:anyURI>
string-value : (Node) -> xs:string
typed-value : (Node) -> Sequence<SimpleTypedValue>
parent : (Node) -> Sequence(0,1)<ElementNode | DocumentNode>
children : (Node) -> Sequence<ElementNode | TextNode | ProcessingInstructionNode | CommentNode>
attributes : (Node) -> Sequence<AttributeNode>
namespaces : (Node) -> Sequence<NamespaceNode>
declaration : (Node) -> Sequence(0,1)<SchemaComponent>
type : (Node) -> Sequence(0,1)<SchemaComponent>
unique-ID : (Node) -> Sequence(0,1)<xs:ID>
A tree contains a root plus all nodes that are reachable directly or indirectly from the root via the children, attributes, and namespaces accessors. Every node belongs to exactly one tree, and every tree has exactly one root node. A tree whose root node is a document node is referred to as a document. A tree whose root node is some other kind of node is referred to as a fragment.
| Document Node accessors | possible values |
| node-kind | "document" |
| name | empty sequence |
| base-uri | xs:anyURI |
| string-value | xs:string |
| typed-value | empty sequence |
| parent | empty sequence |
| children | arbitrary sequence of one or more element nodes, zero or more processing instruction nodes, and zero or more comment nodes |
| attributes | empty sequence |
| namespaces | empty sequence |
| declaration | empty sequence |
| type | empty sequence |
| unique-ID | empty sequence |
A document is represented by a document node, which corresponds to a document information item.
Note: Document nodes and XPath 1.0 root nodes are essentially identical.
A document node does not have an expanded-QName.
The base-uri of the document corresponds to the [base URI] property.
The string-value of the document node is the concatenation of the string-values of all text-node descendants of the document node in document order.
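This rule can be sketched as a recursive concatenation. The Python below is an illustration, assuming a hypothetical `Node` class in which only document and element nodes have children; comment and processing instruction nodes contribute nothing because they are not text nodes and have no text-node descendants.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Node:
    kind: str
    content: str = ""
    children: List["Node"] = field(default_factory=list)


def document_string_value(node: Node) -> str:
    """Concatenate the contents of all text-node descendants in document order."""
    if node.kind == "text":
        return node.content
    return "".join(document_string_value(child)
                   for child in node.children
                   if child.kind in ("element", "text"))


doc = Node("document", children=[
    Node("comment", "ignored"),
    Node("element", children=[
        Node("text", "Hello, "),
        Node("element", children=[Node("text", "world")]),
        Node("processing-instruction", "skip"),
    ]),
])
value = document_string_value(doc)
```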
The parent of the document node is always the empty sequence. A document node always represents the root of a tree.
The children of the document node are nodes corresponding to the information items found in the [children] property, omitting any document type declaration information items.
Note: There is no way to determine what DTD might apply to the data model. See [Issue-0042: System Id and Public Id are not exposed].
In a well-formed document, the children of the document node consist exclusively of element nodes, processing-instruction nodes, and comment nodes, and exactly one of these children is an element node. A document node in the data model is more permissive: it permits more than one element node as a child and also permits text nodes as children.
A document node has the constructor document-node, which takes a base URI value and a non-empty sequence of its children nodes. Like all other node constructors, the document-node constructor has the effect of creating a new node with a unique identity, distinct from all other nodes.
document-node : (xs:anyURI, Sequence(1,*)<ElementNode | TextNode | ProcessingInstructionNode | CommentNode>) -> DocumentNode
The accessors base-uri and children return a document node's constituent parts:
base-uri : (DocumentNode) -> xs:anyURI
children : (DocumentNode) -> Sequence(1,*)<ElementNode | TextNode | ProcessingInstructionNode | CommentNode>
The accessors node-kind and string-value also apply to document nodes and return results other than the empty sequence. The accessors name, typed-value, parent, attributes, namespaces, declaration, type and unique-ID applied to a document node always return the empty sequence.
A document node is constructed from a Document Information Item by the infoitem-to-document-node function:
/* Accessors for document information items: */
infoset-document-children : (DocumentItem) -> Sequence<ElementItem | ProcessingInstructionItem | CommentItem | DocTypeItem>
infoset-document-base-uri : (DocumentItem) -> xs:anyURI

infoitem-to-document-node : (DocumentItem) -> DocumentNode
function infoitem-to-document-node(d) {
  let kids := collapse-text-nodes(sequence-map(infoitem-to-node, infoset-document-children(d)))
  return document-node(infoset-document-base-uri(d), kids)
}
The collapse-text-nodes
function synthesizes a single text node from multiple text nodes. The
sequence-map function
applies its first function argument to each member of its second
sequence argument and returns a new sequence containing the result
of applying the function to each member of the sequence. In the pseudo-code
above, infoitem-to-node is applied to each child of the
document information item value
d and a new sequence of children nodes is constructed,
each of which is a Node. The constructor
document-node constructs the document node in the data
model.
sequence-map : ((UnitValue1 -> UnitValue2), Sequence<UnitValue1>) -> Sequence<UnitValue2>
The infoitem-to-node function maps an information item to a sequence of zero or one node.
infoitem-to-node : (InfoItem) -> Sequence(0,1)<Node>
function infoitem-to-node(i) {
  return
    if (infoitem-kind(i) = "element") then
      infoitem-to-element-node(i)
    else if (infoitem-kind(i) = "character") then
      infoitem-to-single-character-text-node(i)
    else if (infoitem-kind(i) = "processing-instruction") then
      if (not(ignore-processing-instructions)) then
        infoitem-to-processing-instruction-node(i)
      else
        empty-sequence()
    else if (infoitem-kind(i) = "comment") then
      if (not(ignore-comments)) then
        infoitem-to-comment-node(i)
      else
        empty-sequence()
    else /* infoitem-kind(i) = "doctype" | "notation" | "unparsed-entity" */
      empty-sequence()
}
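The interplay of infoitem-to-node, sequence-map, and collapse-text-nodes can be sketched in Python. This is an illustration under stated assumptions, not a normative realization: information items and nodes are modelled as hypothetical (kind, content) tuples, the flags are grouped in a hypothetical `Flags` class, and `sequence_map` is assumed to flatten the zero-or-one results, since a sequence cannot be a member of a sequence.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Item = Tuple[str, str]  # (infoitem-kind, content): stand-in for an InfoItem
Node = Tuple[str, str]  # (node-kind, content): stand-in for a Node


@dataclass
class Flags:
    ignore_comments: bool = False
    ignore_processing_instructions: bool = False


def infoitem_to_node(item: Item, flags: Flags) -> List[Node]:
    """Map one information item to a sequence of zero or one node."""
    kind, content = item
    if kind == "element":
        return [("element", content)]
    if kind == "character":
        return [("text", content)]  # single-character text node
    if kind == "processing-instruction":
        return [] if flags.ignore_processing_instructions else [(kind, content)]
    if kind == "comment":
        return [] if flags.ignore_comments else [(kind, content)]
    return []  # doctype, notation, unparsed-entity: no node


def sequence_map(f: Callable[[Item], List[Node]], seq: List[Item]) -> List[Node]:
    """Apply f to each member and flatten the results into one sequence."""
    out: List[Node] = []
    for member in seq:
        out.extend(f(member))
    return out


def collapse_text_nodes(nodes: List[Node]) -> List[Node]:
    """Merge each run of adjacent text nodes into a single text node."""
    out: List[Node] = []
    for kind, content in nodes:
        if kind == "text" and out and out[-1][0] == "text":
            out[-1] = ("text", out[-1][1] + content)
        else:
            out.append((kind, content))
    return out


items = [("character", "a"), ("comment", "x"), ("character", "b"), ("doctype", "")]
kept = collapse_text_nodes(sequence_map(lambda i: infoitem_to_node(i, Flags()), items))
dropped = collapse_text_nodes(
    sequence_map(lambda i: infoitem_to_node(i, Flags(ignore_comments=True)), items))
```

With comments preserved, the result is a text node, a comment node, and another text node. With ignore-comments set, the two text nodes become adjacent and collapse into one, which shows how the flags can change the results of expressions over the same infoset.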
| Element Node accessors | possible values |
| node-kind | "element" |
| name | xs:QName |
| base-uri | xs:anyURI |
| string-value | xs:string |
| typed-value | sequence of zero or more simple typed values |
| parent | sequence of zero or one element or document node |
| children | sequence of zero or more element nodes, zero or more processing instruction nodes, zero or more comment nodes, and zero or more text nodes |
| attributes | sequence of zero or more attribute nodes |
| namespaces | sequence of zero or more namespace nodes |
| declaration | schema component |
| type | schema component |
| unique-ID | zero or one xs:ID |
Each element node corresponds to an element information item.
An element node has an expanded-QName. The local part of the expanded-QName corresponds to the [local name] property. The namespace URI of the expanded-QName of the element node corresponds to the [namespace name] property.
The parent of the element node corresponds to the node corresponding to the [parent] property.
An element node has an associated typed-value, which is a sequence of zero or more simple typed values. Examples of such values include a sequence containing an integer, a user-defined simple value, or several dates. For a document with a schema, the element's typed-value corresponds to the [schema normalized value] PSVI property. If the element has a complex type, the typed-value is the empty sequence (see [Issue-0054: Complex types with simple content]). For an element in a well-formed document with no associated schema, the element's typed-value is the empty sequence. If the element was created with an xsi:nil attribute set to true, then typed-value returns the empty sequence. See [Issue-0071: Magic Attributes].
The children nodes of the element node correspond to the element, comment, processing instruction, and character information items appearing in the [children] property. This correspondence is not one-to-one, as consecutive character information item children are coalesced into a single text node. Because the data model requires that all general entities be expanded, there will never be unexpanded entity reference information item children.
The attributes of the element node are nodes corresponding to attribute information items appearing in the [attributes] property. The attributes of an element always have distinct names.
The namespaces of the element node are nodes corresponding to namespace information items appearing in the [in-scope namespaces] property. The namespaces of an element always have distinct prefixes. See [Issue-0062: Namespace fixups required].
The declaration of an element is a schema component and corresponds to the [element declaration] PSVI property. The type of an element is a schema component and corresponds to the [type definition] PSVI property. The representation of schema component information is defined in 8 Schema Components. See [Issue-0064: Access to member type definition].
The unique ID of the element node is an identifier optionally assigned by the user. It corresponds to the [normalized value] property of the attribute information item in the [attributes] property that has a type ID, if one exists.
Note: Using this definition, only IDs declared in a DTD are effective. See [Issue-0004: Schema/DTD]. Even so, this definition is not backward compatible with XPath 1.0. See [Issue-0038: XPath 1.0 treatment of non-unique IDs]. Furthermore, it doesn't even work as spec'd, see [Issue-0044: Unable to construct an element with unique ID].
An element node can be constructed in one of two ways: element-complex-node is useful for constructing an element from the PSVI, element-simple-node is useful when constructing an element via embedded expressions. The difference in the constructors is whether the children of the node are specified as a sequence of nodes or as a sequence of simple typed values.
The constructor element-complex-node takes an expanded-QName, a sequence of namespace nodes, a sequence of attribute nodes, a sequence of child nodes, and the node's element declaration, which is a schema component.
element-complex-node : (xs:QName, Sequence<NamespaceNode>, Sequence<AttributeNode>, Sequence<ElementNode | TextNode | ProcessingInstructionNode | CommentNode>, SchemaComponent) -> ElementNode
Editorial Note: MF: The constructor only takes the element declaration, because it's possible to derive the type of an element or attribute from its corresponding declaration. But would it be cleaner to include the type in the constructor as well?
The constructor element-simple-node takes an expanded-QName, a sequence of namespace nodes, a sequence of attribute nodes, a sequence of simple typed values, and the node's element declaration, which is a schema component.
element-simple-node : (xs:QName, Sequence<NamespaceNode>, Sequence<AttributeNode>, Sequence<SimpleTypedValue>, SchemaComponent) -> ElementNode
Like all other node constructors, the element node constructors have the effect of creating a new node with a unique identity, distinct from all other nodes.
To guarantee that the parent-child relationship is invertible, the element constructors logically create a copy of all of their namespace, attribute, and children arguments and set the parent property of these nodes to the newly created element node. As long as the parent-child constraint is satisfied, an implementation of the data model may choose to use specialized techniques to avoid creating physical copies of the arguments to an element constructor. See [Issue-0052: Element constructor copies nodes?].
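The copy semantics described above can be illustrated with a minimal Python sketch. The names `Node`, `_copy_under` and `element_complex_node` are hypothetical stand-ins, not part of the data model; namespace nodes and the element declaration are omitted for brevity, and the eager copy is only one way to satisfy the parent-child constraint.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # eq=False: nodes are compared by identity, not by value
class Node:
    kind: str
    name: Optional[str] = None
    value: str = ""
    parent: Optional["Node"] = None
    attributes: List["Node"] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def _copy_under(node: Node, parent: Node) -> Node:
    """Copy `node` and its subtree, attaching the copy to `parent`."""
    clone = Node(kind=node.kind, name=node.name, value=node.value, parent=parent)
    clone.attributes = [_copy_under(a, clone) for a in node.attributes]
    clone.children = [_copy_under(c, clone) for c in node.children]
    return clone


def element_complex_node(name: str, attributes: List[Node], children: List[Node]) -> Node:
    """Create a new element whose attributes and children are copies of the arguments."""
    elem = Node(kind="element", name=name)
    elem.attributes = [_copy_under(a, elem) for a in attributes]
    elem.children = [_copy_under(c, elem) for c in children]
    return elem


text = Node(kind="text", value="hello")
attr = Node(kind="attribute", name="id", value="x1")
para = element_complex_node("para", [attr], [text])
```

In this sketch the original `text` and `attr` nodes are left untouched and keep no parent, while the copies inside `para` point back to it. The attribute copy has `para` as its parent but does not appear among `para.children`.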
Note: An alternative interface is suggested by James Clark: See [Issue-0019: Element constructor that performs schema processing].
Neither constructor allows specifying the children of the node as a heterogeneous sequence, i.e. a sequence mixing nodes and simple typed values. It is still possible to construct an element (with mixed content) from such a sequence in two steps: first, the heterogeneous sequence is turned into a homogeneous one by converting the simple typed values to text nodes; then the now homogeneous sequence is passed to the element-complex-node constructor. Note that the precise type information of the simple typed values is lost in the process, which is in line with XML Schema, which cannot represent the type of such mixed content.
Editorial Note: Both element constructors take a SchemaComponent as one of their arguments. The question of what the value of this argument should be when no type information is available (for instance when dealing with well formed, schemaless documents), and how the typed-value accessor should behave in these cases, is currently under discussion. See also [Issue-0036: Support for abstract types].
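The first of the two steps can be sketched in Python as follows. The classes `TextNode` and `ElementNode` and the function `to_homogeneous` are hypothetical, and Python's `str()` stands in for the lexical representation of a simple typed value, which this document does not define.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(eq=False)
class TextNode:
    value: str


@dataclass(eq=False)
class ElementNode:
    name: str


# Stand-in for schema simple types (an assumption made for illustration).
SimpleTypedValue = Union[int, float, str]
Item = Union[TextNode, ElementNode, int, float, str]


def to_homogeneous(items: List[Item]) -> List[Union[TextNode, ElementNode]]:
    """Replace each simple typed value with a text node holding its string form."""
    result = []
    for item in items:
        if isinstance(item, (TextNode, ElementNode)):
            result.append(item)                 # nodes pass through unchanged
        else:
            result.append(TextNode(str(item)))  # the value's type is lost here
    return result


mixed = ["Total: ", ElementNode("b"), 42]
nodes = to_homogeneous(mixed)
```

The resulting `nodes` sequence contains only nodes and could then be handed to a constructor in the style of element-complex-node. The integer 42 survives only as the text "42".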
The accessors name, namespaces, attributes and declaration return an element node's constituent parts. The children accessor returns the sequence of child nodes if the element was created with element-complex-node; if it was created with element-simple-node, it returns a singleton sequence containing a text node that corresponds to the sequence of simple typed values. The type accessor returns the schema component corresponding to the type of the element's content: either a complex-type-definition or a simple-type-definition. It is possible to derive the element's type from its declaration. See [Issue-0068: Retaining the type of a sequence of heterogeneous simple typed values.].
name : (ElementNode) -> xs:QName
namespaces : (ElementNode) -> Sequence<NamespaceNode>
attributes : (ElementNode) -> Sequence<AttributeNode>
children : (ElementNode) -> Sequence<ElementNode | TextNode | ProcessingInstructionNode | CommentNode>
declaration : (ElementNode) -> SchemaComponent
type : (ElementNode) -> SchemaComponent
If an element has a simple type, the accessor function typed-value returns a sequence of the simple-typed values of the element; otherwise, it returns the empty sequence. If the element was created with an xsi:nil attribute set to true, then typed-value returns the empty sequence.
typed-value : (ElementNode) -> Sequence<SimpleTypedValue>
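The three cases of typed-value can be written out as a small Python sketch. The `ElementNode` class and its `has_simple_type` and `nilled` fields are hypothetical; in the data model this information would come from the element's type and from the xsi:nil attribute. An empty tuple models the empty sequence.

```python
from dataclasses import dataclass
from typing import Tuple, Union

SimpleTypedValue = Union[int, float, str]  # stand-in for schema simple types


@dataclass
class ElementNode:
    name: str
    has_simple_type: bool                      # hypothetical flag derived from the type
    values: Tuple[SimpleTypedValue, ...] = ()
    nilled: bool = False                       # True when xsi:nil is set to true


def typed_value(element: ElementNode) -> Tuple[SimpleTypedValue, ...]:
    """Return the simple typed values, or () for the empty sequence."""
    if element.nilled or not element.has_simple_type:
        return ()
    return element.values


sizes = ElementNode("sizes", has_simple_type=True, values=(8, 10, 12))
nil_price = ElementNode("price", has_simple_type=True, nilled=True)
section = ElementNode("section", has_simple_type=False)
```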
The string-value accessor, applied to an element node created with element-complex-node, returns the concatenation of the string-values of all text-node descendants of the element node in document order. Applied to an element node created with element-simple-node, it returns the string representation of the sequence of simple typed values.
string-value : (ElementNode) -> xs:string
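For the element-complex-node case, the concatenation can be sketched as a depth-first traversal. The `Node` class below is a hypothetical simplification that keeps only the node kind, character data and children; it is an illustration, not the data model's definition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                      # "element", "text", "comment", ...
    value: str = ""                # character data for text and comment nodes
    children: List["Node"] = field(default_factory=list)


def string_value(node: Node) -> str:
    """Concatenate the text-node descendants of `node` in document order."""
    if node.kind == "text":
        return node.value
    # A depth-first, left-to-right walk visits descendants in document order.
    # Comments and processing instructions contribute nothing.
    return "".join(
        string_value(child)
        for child in node.children
        if child.kind in ("element", "text")
    )


doc = Node("element", children=[
    Node("text", "Hello, "),
    Node("comment", "ignored"),
    Node("element", children=[Node("text", "world")]),
    Node("text", "!"),
])
```

Here `string_value(doc)` yields "Hello, world!": the comment is skipped and the nested element contributes its own text in place.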
If an element has a unique ID, the accessor function unique-ID returns a sequence containing the unique ID of the node; otherwise, it returns the empty sequence.
unique-ID : (ElementNode) -> Sequence(0,1)<xs:ID>
The node accessors base-uri, node-kind, parent, and string-value also apply to element nodes.
An element node is constructed from an Element Information Item by the infoitem-to-element-node function:
/* Accessors for element information items: */
infoset-element-namespace-name : (ElementItem) -> Sequence(0,1)<xs:anyURI>
infoset-element-local-name : (ElementItem) -> xs:string
infoset-element-children : (ElementItem) -> Sequence<InfoItem>
infoset-element-attributes : (ElementItem) -> Sequence<AttributeItem>
infoset-element-in-scope-namespaces : (ElementItem) -> Sequence<NamespaceItem>
infoset-element-base-uri : (ElementItem) -> xs:anyURI /* unused ? */
psvi-element-validity : (ElementItem) -> xs:string
psvi-element-element-declaration : (ElementItem) -> ElementItem
psvi-element-type-definition : (ElementItem) -> ElementItem
psvi-element-schema-normalized-value : (ElementItem) -> xs:string
infoitem-to-element-node : (ElementItem) -> ElementNode
function infoitem-to-element-node(e) {
  let name := xf:QName-from-uri(infoset-element-namespace-name(e), infoset-element-local-name(e)),
      nsnodes := sequence-map(infoitem-to-namespace-node, infoset-element-in-scope-namespaces(e)),
      attrnodes := sequence-map(infoitem-to-attribute-node, infoset-element-attributes(e)),
      kids := collapse-text-nodes(sequence-map(infoitem-to-node, infoset-element-children(e))),
      declaration := infoitem-to-schema-component(psvi-element-validity(e), psvi-element-element-declaration(e)),
      type := infoitem-to-schema-component(psvi-element-validity(e), psvi-element-type-definition(e))
  return element-complex-node(name, nsnodes, attrnodes, kids, declaration)
}
Editorial Note: MF: Even though it's possible to derive the type of an element from its corresponding element declaration, it seems cleaner to compute explicitly the schema components for both the element declaration and its simple or complex type.
Note: [base URI] is discarded. See [Issue-0030: Base URI is a property of element nodes].
| Attribute Node accessors | possible values |
| node-kind | "attribute" |
| name | xs:QName |
| base-uri | empty sequence |
| string-value | xs:string |
| typed-value | sequence of zero or more simple typed values |
| parent | sequence of zero or one element nodes |
| children | empty sequence |
| attributes | empty sequence |
| namespaces | empty sequence |
| declaration | schema component |
| type | schema component |
| unique-ID | empty sequence |
Each element node has an associated set of attribute nodes, each corresponding to an attribute information item.
An attribute node has an expanded-QName. The local part of the expanded-QName corresponds to the [local name] property. The namespace name of the expanded-QName corresponds to the [namespace name] property.
An attribute node has an associated string-value, which corresponds to the [normalized value] property.
An attribute node also has a typed-value. For a document with a schema, the attribute's typed-value corresponds to the [schema normalized value] PSVI property.
For convenience, the element node is called the "parent" of each of these attribute nodes even though an attribute node is not a "child" of its parent element. The parent of the attribute node corresponds to the [owner element] property.
The declaration of an attribute is a schema component and corresponds to the [attribute declaration] PSVI property. The type of an attribute is a schema component and corresponds to the [type definition] PSVI property. The representation of schema component information is defined in 8 Schema Components.
An attribute node can be constructed in one of two ways: attribute-complex-node is useful for constructing an attribute from the PSVI, whereas attribute-simple-node is useful when constructing an attribute with embedded expressions.
The constructor attribute-complex-node takes the attribute's name, a string value and the node's attribute declaration, which is a schema component.
attribute-complex-node : (xs:QName, xs:string, SchemaComponent) -> AttributeNode
The constructor attribute-simple-node takes the attribute's name, a simple typed value and the node's attribute declaration, which is a schema component.
attribute-simple-node : (xs:QName, Sequence<SimpleTypedValue>, SchemaComponent) -> AttributeNode
Like all other node constructors, the attribute node constructors have the effect of creating a new node with a unique identity, distinct from all other nodes.
Editorial Note: Both attribute constructors take a SchemaComponent as one of their arguments. The question of what the value of this argument should be when no type information is available (for instance when dealing with well formed, schemaless documents), and how the typed-value accessor should behave in these cases, is currently under discussion. See also [Issue-0036: Support for abstract types].
The accessors name and declaration return an attribute's constituent parts. The accessor string-value returns the string value supplied to the constructor if the attribute was created with attribute-complex-node; if it was created with attribute-simple-node, it returns the string representation of the sequence of simple typed values. The type accessor returns the schema component corresponding to the simple type of the attribute's value. It is possible to derive the attribute's type from its declaration. The accessor function typed-value returns a sequence of the simple-typed values of an attribute.
name : (AttributeNode) -> xs:QName
string-value : (AttributeNode) -> xs:string
declaration : (AttributeNode) -> SchemaComponent
type : (AttributeNode) -> SchemaComponent
typed-value : (AttributeNode) -> Sequence<SimpleTypedValue>
The node accessors node-kind and parent also apply to attribute nodes and may return results other than the empty sequence. The accessors base-uri, children, attributes, namespaces and unique-ID applied to an attribute node always return the empty sequence.
An attribute node is constructed from an Attribute Information Item by the infoitem-to-attribute-node function:
/* Accessors for attribute information items: */
infoset-attribute-namespace-name : (AttributeItem) -> Sequence(0,1)<xs:anyURI>
infoset-attribute-local-name : (AttributeItem) -> xs:string
infoset-attribute-normalized-value : (AttributeItem) -> xs:string
infoset-attribute-owner-element : (AttributeItem) -> ElementItem
psvi-attribute-validity : (AttributeItem) -> xs:string
psvi-attribute-attribute-declaration : (AttributeItem) -> ElementItem
psvi-attribute-type-definition : (AttributeItem) -> ElementItem
psvi-attribute-schema-normalized-value : (AttributeItem) -> xs:string
infoitem-to-attribute-node : (AttributeItem) -> AttributeNode
function infoitem-to-attribute-node(a) {
  let name := xf:QName-from-uri(infoset-attribute-namespace-name(a), infoset-attribute-local-name(a)),
      declaration := infoitem-to-schema-component(psvi-attribute-validity(a), psvi-attribute-attribute-declaration(a)),
      type := infoitem-to-schema-component(psvi-attribute-validity(a), psvi-attribute-type-definition(a))
  return attribute-complex-node(name, infoset-attribute-normalized-value(a), declaration)
}
Editorial Note: JM: Update the above to accommodate the possibility of schema-less and DTD validation.
| Namespace Node accessors | possible values |
| node-kind | "namespace" |
| name | sequence of zero or one xs:QName |
| base-uri | empty sequence |
| string-value | xs:string |
| typed-value | empty sequence |
| parent | empty sequence |
| children | empty sequence |
| attributes | empty sequence |
| namespaces | empty sequence |
| declaration | empty sequence |
| type | empty sequence |
| unique-ID | empty sequence |
Each element node has an associated set of namespace nodes, each corresponding to a namespace information item.
A namespace node has an expanded-QName. The local part of the QName corresponds to the [prefix] property. The namespace URI of the QName is the empty sequence.
The string-value of the namespace node corresponds to the [namespace name] property.
A namespace node has no parent.
Note: From XPath 1.0: "The parent of the namespace node is the element node in whose namespaces collection this node appears." See [Issue-0039: Parent of namespace nodes], [Issue-0060: Sharing namespace nodes], and [Issue-0061: No access to prefix on free-floating attributes].
A namespace node has the constructor namespace-node, which takes a namespace prefix and the absolute URI of the namespace being declared. The namespace prefix may be the empty sequence. If the URI is the zero-length string, the prefix must be the empty sequence. Like all other node constructors, the namespace node constructor has the effect of creating a new node with a unique identity, distinct from all other nodes.
namespace-node : (Sequence(0,1)<xs:string>, xs:string) -> NamespaceNode
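The constraint linking the prefix and the URI can be sketched in Python as below. `NamespaceNode` and `namespace_node` are hypothetical names, `None` models the empty sequence, and raising `ValueError` is an assumption: this document does not say how a violation is reported.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # eq=False: each constructed node has its own identity
class NamespaceNode:
    prefix: Optional[str]   # None models the empty sequence
    uri: str


def namespace_node(prefix: Optional[str], uri: str) -> NamespaceNode:
    """Create a namespace node, enforcing the zero-length-URI rule."""
    if uri == "" and prefix is not None:
        raise ValueError("a zero-length namespace URI requires an empty prefix")
    return NamespaceNode(prefix, uri)


default_ns = namespace_node(None, "http://example.org/ns")
prefixed_ns = namespace_node("ex", "http://example.org/ns")
no_ns = namespace_node(None, "")
```

Calling `namespace_node("ex", "")` is rejected in this sketch, because a prefix cannot be bound to the zero-length URI.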
A namespace node's constituent parts may be obtained by applying the accessors name (with the function xf:get-local-name) and string-value.
name : (NamespaceNode) -> Sequence(0,1)<xs:QName>
string-value : (NamespaceNode) -> xs:string
The node accessor node-kind also applies to namespace nodes and returns a result other than the empty sequence. The accessors base-uri, typed-value, parent, children, attributes, namespaces, declaration, type and unique-ID applied to a namespace node always return the empty sequence.
A namespace node is constructed from a Namespace Information Item by the infoitem-to-namespace-node function:
infoset-namespace-prefix : (NamespaceItem) -> Sequence(0,1)<xs:string>
infoset-namespace-namespace-name : (NamespaceItem) -> xs:string
infoitem-to-namespace-node : (NamespaceItem) -> NamespaceNode
function infoitem-to-namespace-node(i) {
  return namespace-node(infoset-namespace-prefix(i), infoset-namespace-namespace-name(i))
}
| Processing Instruction Node accessors | possible values |
| node-kind | "processing-instruction" |
| name | xs:QName |
| base-uri | empty sequence |
| string-value | xs:string |
| typed-value | empty sequence |
| parent | sequence of zero or one element or document nodes |
| children | empty sequence |
| attributes | empty sequence |
| namespaces | empty sequence |
| declaration | empty sequence |
| type | empty sequence |
| unique-ID | empty sequence |
A processing instruction node corresponds to a processing instruction information item. There are no processing instruction nodes for processing instructions that are children of a document type declaration information item.
A processing instruction node has an expanded-QName. The local part of the expanded-QName corresponds to the [target] property. The namespace URI of the expanded-QName is the empty sequence. The local part is a string value that must be an NCName.
The string '?>' must not occur within a processing instruction's content ([XML Recommendation]).
The string-value of the processing instruction node corresponds to the [content] property.
The parent of the processing instruction node corresponds to the [parent] property.
A processing-instruction node has the constructor processing-instruction-node, which takes an NCName representing the target and a string representing the content. Like all other node constructors, the processing-instruction node constructor has the effect of creating a new node with a unique identity, distinct from all other nodes.
processing-instruction-node : (xs:NCName, xs:string) -> ProcessingInstructionNode
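A constructor that checks its arguments might look like the following Python sketch. The names are hypothetical, and the NCName pattern is deliberately simplified to ASCII; the full production in the Namespaces in XML Recommendation admits many more characters. The sketch also rejects '?>' in the content, since the XML Recommendation does not allow that string inside a processing instruction's data.

```python
import re
from dataclasses import dataclass

# Simplified NCName: an ASCII letter or '_' followed by letters, digits,
# '_', '-' or '.', and no colon. This is an assumption, not the full production.
_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


@dataclass(eq=False)
class ProcessingInstructionNode:
    target: str
    content: str


def processing_instruction_node(target: str, content: str) -> ProcessingInstructionNode:
    """Create a processing-instruction node after validating both parts."""
    if not _NCNAME.fullmatch(target):
        raise ValueError(f"target is not an NCName: {target!r}")
    if "?>" in content:
        raise ValueError("'?>' would end the processing instruction early")
    return ProcessingInstructionNode(target, content)


pi = processing_instruction_node("xml-stylesheet", 'type="text/xsl" href="style.xsl"')
```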
A processing instruction's constituent parts may be obtained by applying the accessors name (with the function xf:get-local-name) and string-value.
name : (ProcessingInstructionNode) -> xs:QName
string-value : (ProcessingInstructionNode) -> xs:string
The node accessors node-kind and parent also apply to processing-instruction nodes and may return results other than the empty sequence. The accessors base-uri, typed-value, children, attributes, namespaces, declaration, type and unique-ID applied to a processing-instruction node always return the empty sequence.
A processing-instruction node is constructed from a Processing Instruction Information Item by the infoitem-to-processing-instruction-node function:
/* Accessors for processing instruction information items */
infoset-processing-instruction-target : (ProcessingInstructionItem) -> xs:string
infoset-processing-instruction-content : (ProcessingInstructionItem) -> xs:string
infoitem-to-processing-instruction-node : (ProcessingInstructionItem) -> ProcessingInstructionNode
function infoitem-to-processing-instruction-node(i) {
  return processing-instruction-node(xf:NCName(infoset-processing-instruction-target(i)), infoset-processing-instruction-content(i))
}
| Comment Node accessors | possible values |
| node-kind | "comment" |
| name | empty sequence |
| base-uri | empty sequence |
| string-value | xs:string |
| typed-value | empty sequence |
| parent | sequence of zero or one element or document nodes |
| children | empty sequence |
| attributes | empty sequence |
| namespaces | empty sequence |
| declaration | empty sequence |
| type | empty sequence |
| unique-ID | empty sequence |
A comment node corresponds to a comment information item. There are no comment nodes for comments that are children of a document type declaration information item.
A comment node does not have an expanded-QName.
The string-value of the comment node corresponds to the [content] property.
The parent of the comment node corresponds to the [parent] property.
The string "--" (double-hyphen) must not occur within a comment's string value ([XML Recommendation]).
A comment node has the constructor comment-node, which takes a string value. Like all other node constructors, the comment node constructor has the effect of creating a new node with a unique identity, distinct from all other nodes.
comment-node : (xs:string) -> CommentNode
A comment node's constituent parts may be obtained by applying the accessor string-value.
string-value : (CommentNode) -> xs:string
The node accessors node-kind and parent also apply to comment nodes and may return results other than the empty sequence. The accessors name, base-uri, typed-value, children, attributes, namespaces, declaration, type and unique-ID applied to a comment node always return the empty sequence.
A comment node is constructed from a Comment Information Item by the infoitem-to-comment-node function:
/* Accessors for comment information items */
infoset-comment-value : (CommentItem) -> xs:string
infoitem-to-comment-node : (CommentItem) -> CommentNode
function infoitem-to-comment-node(i) {
  return comment-node(infoset-comment-value(i))
}
| Text Node accessors | possible values |
| node-kind | "text" |
| name | empty sequence |
| base-uri | empty sequence |
| string-value | xs:string |
| typed-value | empty sequence |
| parent | sequence of zero or one element or document nodes |
| children | empty sequence |
| attributes | empty sequence |
| namespaces | empty sequence |
| declaration | empty sequence |
| type | empty sequence |
| unique-ID | empty sequence |
A text node corresponds to a sequence of one or more consecutive character information items. As much character data as possible is grouped into each text node: a text node never has an immediately following or preceding sibling that is a text node.
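The invariant that no two text nodes are adjacent siblings can be maintained by merging runs of text nodes. The Python sketch below does this iteratively, which is a different formulation from a recursive one but produces the same grouping. `Node` and `merge_adjacent_text` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    kind: str          # "text", "element", "comment", ...
    value: str = ""    # character data for text nodes


def merge_adjacent_text(nodes: List[Node]) -> List[Node]:
    """Return a new sequence in which each run of text nodes is one text node."""
    merged: List[Node] = []
    for node in nodes:
        if node.kind == "text" and merged and merged[-1].kind == "text":
            # Extend the current run by replacing its last node with a combined one.
            merged[-1] = Node("text", merged[-1].value + node.value)
        else:
            merged.append(node)
    return merged


siblings = [
    Node("text", "a"), Node("text", "b"),
    Node("element"),
    Node("text", "c"), Node("text", "d"), Node("text", "e"),
]
result = merge_adjacent_text(siblings)
```

After merging, `result` holds a text node "ab", the element, and a text node "cde". The input nodes are not modified, since each merge creates a new text node.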
A text node does not have an expanded-QName.
The string-value of a text node is the character data, which corresponds to the concatenated [character code] properties of each of the character information items.
Note: The string-value is not W3C normalized as described in the Character Model for the World Wide Web version 1.0 draft. See [Issue-0045: Text nodes are not W3C-normalized text].
The parent of the text node corresponds to the [parent] property of any one of the consecutive character information items (consecutive characters always have the same parent).
A text node has the constructor text-node, which takes a string value. Like all other node constructors, the text node constructor has the effect of creating a new node with a unique identity, distinct from all other nodes.
The string-value of a text node is simply its content.
The node accessors node-kind and parent also apply to text nodes and may return results other than the empty sequence. The accessors name, base-uri, typed-value, children, attributes, namespaces, declaration, type and unique-ID applied to a text node always return the empty sequence.
The mapping from character information items to text nodes occurs in the infoitem-to-element-node function. The infoset-character-code accessor maps a character information item to the ISO 10646 character code (in the range 0 to #x10FFFF, though not every value in this range is a legal XML character code) of the character.
infoset-character-code : (CharacterItem) -> Code
The function infoitem-to-single-character-text-node takes one character information item and maps it to a text node with a string value of length one.
infoitem-to-single-character-text-node : (CharacterItem) -> TextNode
function infoitem-to-single-character-text-node(c) {
  /* convert character code to string of length 1 */
  return text-node(code2string(infoset-character-code(c)))
}
Editorial Note: JM: need a definition or description of code2string.
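Since code2string is left undefined here, the following Python sketch shows one plausible reading of it and should not be taken as this document's definition. It assumes that a code outside the `Char` production of the XML Recommendation is rejected; `is_xml_char` is a hypothetical helper.

```python
def is_xml_char(code: int) -> bool:
    """True if `code` is a legal character under the XML 1.0 Char production."""
    return (
        code in (0x9, 0xA, 0xD)
        or 0x20 <= code <= 0xD7FF
        or 0xE000 <= code <= 0xFFFD
        or 0x10000 <= code <= 0x10FFFF
    )


def code2string(code: int) -> str:
    """Map an ISO 10646 character code to a string holding that one character."""
    if not is_xml_char(code):
        raise ValueError(f"not a legal XML character code: {code:#x}")
    return chr(code)
```

In Python, `chr` returns a one-character string for any code point in range, including those above #xFFFF, so the result always has length one.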
The collapse-text-nodes function synthesizes a single text node from multiple text nodes. It calls infoitem-to-text-nodes to collapse recursively one or more consecutive text nodes in its argument sequence. If insignificant whitespace is ignored, any text node containing only whitespace is eliminated. All other nodes are returned unchanged.
collapse-text-nodes : (Sequence<Node>) -> Sequence<Node>
infoitem-to-text-nodes : (Sequence<Node>) -> Sequence<Node>
function collapse-text-nodes(nodes) {
  let newnodes := infoitem-to-text-nodes(nodes)
  return
    if (ignore-whitespace) then
      sequence-map(delete-whitespace-node, newnodes)
    else newnodes
}
function infoitem-to-text-nodes(nodes) {
  if (xf:empty(nodes)) then return empty-sequence()
  else
    let head := op:item-at(nodes, 1),
        tail := xf:sublist(nodes, 2)
    return
      if (node-kind(head) = "text") then
        /* Collapse two consecutive text nodes and apply
           infoitem-to-text-nodes recursively */
        if (xf:empty(tail)) then head
        else if (node-kind(op:item-at(tail,1)) = "text") then
          infoitem-to-text-nodes(
            op:concatenate(
              text-node(xf:concat(string-value(head), string-value(op:item-at(tail,1)))),
              xf:sublist(tail, 2)
            )
          )
        /* head is text but the next node is not: keep both and
           continue with the nodes that follow them */
        else op:concatenate(head, op:concatenate(op:item-at(tail,1), infoitem-to-text-nodes(xf:sublist(tail, 2))))
      else op:concatenate(head, infoitem-to-text-nodes(tail))
}
Editorial Note: JM: need a defin