XQuery 1.0 and XPath 2.0 Data Model



W3C Working Draft

http://www.w3.org/TR/xpath-datamodel/
http://www.w3.org/TR/2002/WD-query-datamodel-20021115/
http://www.w3.org/TR/2002/WD-query-datamodel-20020816/
Mary Fernández (XML Query WG), AT&T Labs, mff@research.att.com
Ashok Malhotra (XML Query and XSL WGs), Microsoft, ashokma@microsoft.com
Jonathan Marsh (XSL WG), Microsoft, jmarsh@microsoft.com
Marton Nagy (XML Query WG), Science Applications International Corporation (SAIC), marton.nagy@saic.com
Norman Walsh (XSL WG), Sun Microsystems, Norman.Walsh@Sun.COM

This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.

This is a Public Working Draft for review by W3C Members and other interested parties. It is a draft document and may be updated, replaced or made obsolete by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". This is work in progress and does not imply endorsement by the W3C membership.

The XQuery 1.0 and XPath 2.0 Data Model has been defined jointly by the XML Query Working Group and the XSL Working Group (both part of the XML Activity).

This is a Last Call Working Draft. Comments on this document are due on 30 June 2003. Comments should be sent to the W3C mailing list public-qt-comments@w3.org (archived at http://lists.w3.org/Archives/Public/public-qt-comments/).

A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.

Patent disclosures relevant to this specification may be found on the XML Query Working Group's patent disclosure page at http://www.w3.org/2002/08/xmlquery-IPR-statements and the XSL Working Group's patent disclosure page at http://www.w3.org/Style/XSL/Disclosures.html.

This document defines the W3C XQuery 1.0 and XPath 2.0 Data Model, which is the data model of at least XSLT 2.0, XQuery 1.0, and XPath 2.0, and any other specifications that reference it. This data model is based on the XPath 1.0 data model and earlier work on an XML Query data model. This document is the result of joint work by the XSL Working Group and the XML Query Working Group.

English

See the CVS changelog.

Introduction

This document defines the XQuery 1.0 and XPath 2.0 Data Model, which is the data model of XSLT 2.0, XQuery 1.0, and XPath 2.0.

The XQuery 1.0 and XPath 2.0 Data Model (henceforth "data model") serves two purposes. First, it defines precisely the information contained in the input to an XSLT or XQuery processor. Second, it defines all permissible values of expressions in the XSLT, XQuery, and XPath languages. A language is closed with respect to a data model if the value of every expression in a language is guaranteed to be in the data model. XSLT 2.0, XQuery 1.0, and XPath 2.0 are all closed with respect to the data model.

The data model is based on the XML Information Set (henceforth "Infoset"), but it requires the following new features to meet the XPath 2.0 and XML Query requirements:

Support for XML Schema types. The XML Schema recommendations define features, such as structures (XML Schema Part 1) and simple data types (XML Schema Part 2), that extend the XML Information Set with precise type information.

Representation of collections of documents and of complex values.

As with the Infoset, the XQuery 1.0 and XPath 2.0 Data Model specifies what information in the documents is accessible, but it does not specify the programming-language interfaces or bindings used to represent or access the data.

Every value handled by the data model is a sequence of zero or more items. An item is either a node or an atomic value. A node is one of seven node kinds; nodes are defined in the section on nodes. An atomic value encapsulates an XML Schema atomic type and a corresponding value of that type; atomic values are defined in the section on atomic values. A sequence is an ordered collection of nodes, atomic values, or any mixture of nodes and atomic values. A sequence cannot be a member of a sequence. A single item appearing on its own is modeled as a sequence containing one item. Sequences are defined in the section on sequences.
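The flat, non-nesting character of sequences can be illustrated with a minimal Python sketch. The names AtomicValue, Node, and make_sequence are hypothetical and exist only for this illustration; the data model itself prescribes no programming-language binding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicValue:
    """Hypothetical stand-in: an atomic type name paired with a value."""
    type_name: str   # for example "xs:integer"
    value: object


class Node:
    """Hypothetical stand-in for any of the seven node kinds."""
    def __init__(self, kind: str):
        self.kind = kind


def make_sequence(*parts) -> tuple:
    """Build a flat sequence of items.

    Nested tuples or lists are spliced into the result, because a
    sequence can never be a member of another sequence.
    """
    flat = []
    for part in parts:
        if isinstance(part, (tuple, list)):
            flat.extend(make_sequence(*part))
        else:
            flat.append(part)
    return tuple(flat)
```

Under these assumptions, a single item passed on its own comes back as a one-item sequence, and combining sequences always yields one flat sequence.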

In XPath 1.0, the data model only defines nodes. The primitive data types (number, boolean, string, node-set) are part of the expression language, not the data model.

The data model can represent various values including not only the input and the output of a stylesheet or query, but all values of expressions used during the intermediate calculations. Examples include the input document or document repository (represented as a document node or a sequence of document nodes), the result of a path expression (represented as a sequence of nodes), the result of an arithmetic or a logical expression (represented as an atomic value), a sequence expression resulting in a sequence of items, etc.

In this document, we provide a precise definition of the properties of nodes in the XQuery 1.0 and XPath 2.0 Data Model, how they are accessed, and how they relate to values in the Infoset. We note wherever the XQuery 1.0 and XPath 2.0 Data Model differs from that of XPath 1.0.

Notation

In addition to prose, we define a set of accessor functions to explain the data model. The accessors defined by the data model are shown with the prefix dm. The prefix is always shown in italics to emphasize that these functions are abstract; they exist to explain the interface between the data model and specifications that rely on the data model: they are not and cannot be made accessible directly from the host language.

The signature of accessors is shown using the same style as the XQuery 1.0 and XPath 2.0 Functions and Operators specification. For example:

In the notation syntax, the term Node denotes the category of node values and Item refers to the category of either node values or atomic values.

Some accessors can accept or return sequences. The following notation is used to denote sequence values:

V* denotes a sequence of zero or more items of type V.

V? denotes a sequence of exactly zero or one items of type V.

V+ denotes a sequence of one or more items of type V.

In a sequence, V may be a Node or AtomicValue, or the union (choice) of several categories of Items.

Some functions in the data model are partial functions. We use the occurrence indicators ? or * when specifying the return type of such functions. For example, a node may have one parent node or no parent. If the node argument has a parent, the parent accessor returns a singleton sequence. If the node argument does not have a parent, it returns the empty sequence. The signature of parent therefore gives its return type as Node?, that is, an empty sequence or a sequence containing one node.

This document relies on the XML Information Set. Information items and properties are indicated by the styles information item and property, respectively.

This document frequently uses the term expanded-QName. An expanded-QName is a pair of values consisting of a namespace URI and a local name. Expanded-QNames belong to the value space of the XML Schema type xs:QName. When this document refers to xs:QName we always mean the value space, that is, a pair of namespace URI and local name (and not the lexical space, which consists of constructs of the form prefix:local-name).

Concepts

Node Identity

Because XML documents are tree-structured, we define the data model using conventional terminology for trees. The data model is a node-labeled, directed graph, in which each node has a unique identity. Every node in the data model is unique: identical to itself, and not identical to any other node.

This concept should not be confused with the concept of a unique ID, which is a unique name assigned to an element by the author to represent references using ID/IDREF correlation.

Document Order

A document order is defined on all the nodes in a document. Document order is a total ordering, although the relative order of some nodes is implementation-dependent. Informally, document order is the order returned by a pre-order, depth-first, left-to-right traversal of the data model. There is precisely one document order and it satisfies the following constraints.

The document node is the first node.

The relative order of siblings is determined by their order in the XML representation. A node N1 occurs before a node N2 in document order if and only if the start of N1 occurs before the start of N2 in the XML document.

Element nodes occur before their children; children occur before following-siblings.

Namespace nodes immediately follow the element node with which they are associated. The relative order of namespace nodes is stable but implementation-dependent.

Attribute nodes immediately follow the namespace nodes of the element with which they are associated. The relative order of attribute nodes is stable but implementation-dependent.

The relative order of nodes in distinct documents is implementation-dependent but stable. In other words, given two distinct documents A and B, if a node in document A is before a node in document B, then every node in document A is before every node in document B.

The relative order of free-standing nodes (elements, attributes, and other nodes created outside the context of a particular document) is also implementation-dependent but stable.
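The constraints above, for the nodes of a single tree, can be sketched as a traversal. This is an illustrative Python sketch only: the Node class and its field names are assumptions made for the example, and the order in which namespace and attribute nodes are stored in their lists stands in for the implementation-dependent (but stable) order that the data model allows.

```python
class Node:
    """Hypothetical node with just enough structure for the traversal."""
    def __init__(self, kind, name="", namespaces=(), attributes=(), children=()):
        self.kind = kind
        self.name = name
        self.namespaces = list(namespaces)   # namespace nodes (elements only)
        self.attributes = list(attributes)   # attribute nodes (elements only)
        self.children = list(children)       # child nodes in sibling order


def document_order(root):
    """Yield the nodes of one tree in document order.

    A node comes first, then its namespace nodes, then its attribute
    nodes, then each child subtree from left to right.
    """
    yield root
    yield from root.namespaces
    yield from root.attributes
    for child in root.children:
        yield from document_order(child)
```

Because the generator visits a node before its children and finishes one child subtree before starting the next, children precede their following siblings, as the constraints require.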

XML Schemas and the XML Information Set

This document describes how to construct an instance of the data model from an XML Information Set (Infoset). Some aspects of the data model are dependent upon XML Schema validity assessment; this document describes how to determine those aspects of the data model from a Post Schema Validation Infoset. A Post Schema Validation Infoset, or PSVI, is the augmented infoset produced by an XML Schema validation episode.

Although we describe construction of a data model in terms of infoset properties, an infoset is not an absolutely necessary precondition for building an instance of the Data Model. Purely synthetic data model instances are entirely appropriate as long as they obey all of the constraints described in this document.

The data model supports well-formed XML documents conforming to Namespaces in XML. XML documents that are not well-formed are not XML, by definition. XML documents that do not conform to Namespaces in XML are not supported (nor are they supported by the XML Information Set).

In other words, the data model supports the following classes of XML documents:

Well-formed documents conforming to Namespaces in XML,

DTD-valid documents conforming to Namespaces in XML, and

W3C XML Schema-validated documents.

The data model supports some kinds of values that are not supported by the XML Information Set. Examples of these are well-formed document fragments, sequences of fragments, or sequences of documents. The data model also supports values that are not nodes. Examples of these are atomic values, sequences of atomic values, or sequences mixing nodes and atomic values. These values are needed to represent the results of intermediate expressions during expression processing.

Schema-validated documents include documents in which some elements or attributes have been validated by "lax" or "skip" validation.

An "incompletely validated document" is an XML document that has a corresponding schema but whose schema-validity assessment has resulted in one or more element or attribute information items being assigned values other than 'valid' for the validity property in the PSVI.

The data model supports incompletely validated documents, but inconsistent data models are forbidden. Elements and attributes that are not valid are treated as untyped.

In addition to specifying the transformation from the Post Schema Validation Infoset (PSVI) to the data model, this document also specifies the transformation from the data model back to the XML Information Set. This is a useful notion that can be used for defining serialization and validation. Serialization can be viewed as a two step process, first transforming to the XML Infoset and then to an XML document. Validation is described conceptually as a process of mapping the data model to the XML Infoset followed by XML Schema validation producing a PSVI which is then loaded into the data model.

Types

The data model supports a representation of named types as stipulated by XML Schema.

For named types, which include both the built-in types defined by XML Schema and named user-defined types declared in a schema and imported by a stylesheet or query, the data model uses expanded-QNames to represent their names. Since named types in XML Schema are global, an expanded-QName uniquely identifies such a type. The namespace name of the expanded-QName is the target namespace of the schema and its local name is the name of the type.

For anonymous types, the processor must construct an anonymous type name that is distinct from the name of every named type and the name of every other anonymous type. An anonymous type name is an implementation-defined, globally unique type name provided by the processor for every anonymous type declared in an imported schema.

In either case, the type names must also appear in the In-scope Schema Definitions (as defined in XPath 2.0) available to the processor.

The data model associates type information with element nodes, attribute nodes and atomic values. The item is guaranteed to be a valid instance of that type as defined by XML Schema.

The data model defines an accessor type that returns an expanded-QName corresponding to the type of the element node, attribute node or atomic value. It returns xs:anyType or xs:anySimpleType if no type information exists, or if the item failed W3C XML Schema validity assessment.

When no type information exists for an element or an attribute node we frequently use the terminology element with unknown type or attribute with unknown simple type.

The data model does not represent element or attribute declaration schema components, but it supports various type-related operations. The semantics of such operations, e.g. checking if a particular instance of an element node has a given type is defined in .

Typed Value and String Value

The content of a text, attribute, or element node can be interpreted in two ways: as a string value or as a typed value. For these types of nodes, the typed value can be extracted by the typed-value accessor, and the string value can be extracted by the string-value accessor.

The string value of a node is a single xs:string derived from the content of the node as described in the definitions of the accessor functions for each kind of node.

The typed value of a node is a sequence of atomic values derived from its string value and its type in a way that is consistent with schema validation, as described in the definitions of the accessor functions for each kind of node.

Mapping PSV Infoset additions to Types

This section specifies how the type of an element or attribute node is computed from the PSVI properties that specify validity and type assessment for the node's corresponding information item.

A PSVI element or attribute information item has a validity property. The validity property may be "valid", "invalid", or "notKnown" and reflects the outcome of schema-validity assessment. The only information that can be inferred from an invalid or not known validity value is that the information item is well-formed. Therefore, we must associate some general type information with the element or attribute node if it is not known to be valid.

The precise definition of the type of an element or attribute information item depends on the properties of the Infoset or PSVI. In a PSVI, XML Schema only guarantees the existence of either the type definition property, or the type definition namespace, type definition name and type definition anonymous properties. If the type definition refers to a union type, there are further properties defined, that refer to the type definition which actually validated the item's normalized value. These properties are either the member type definition, or the member type definition namespace, member type definition name and member type definition anonymous properties. If these are available, the type of an element or attribute will refer to the member type that actually validated the schema normalized value.

If a PSVI is not available, then the data model is constructed from the Infoset in a manner that is compatible with the expectations of well-formed or DTD-validated parsing of an XML document.

The type of an element information item is represented by an expanded-QName whose namespace and local name correspond to the first applicable items in the following list:

If the validity property exists and is "valid":

If member type definition exists and its {name} property is present:

The {target namespace} and {name} properties of the member type definition property.

If the type definition property exists and its {name} property is present:

The {target namespace} and {name} properties of the type definition property.

If member type definition anonymous exists:

If it is false: the member type definition namespace and the member type definition name.

Otherwise, the namespace and local name of the appropriate anonymous type name.

If type definition anonymous exists:

If it is false: the type definition namespace and the type definition name

Otherwise, the namespace and local name of the appropriate anonymous type name.

If the validity property does not exist on this node or any of its ancestors, Infoset-only processing is applied:

If the attribute type property exists and has one of the following values: ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, or NMTOKENS, the {target namespace} is "http://www.w3.org/2001/XMLSchema" and the {name} is the [attribute type].

Note that this processing is only performed if no part of the subtree that contains the node was schema validated. In particular, Infoset-only processing does not apply to subtrees that are "skip" validated in a document.

Otherwise, xs:anyType for elements or xs:anySimpleType for attributes.

If the expanded-QName that results from this derivation is not available in the processor's In-Scope Schema Definitions, the expanded-QName is promoted to xs:anyType for elements or xs:anySimpleType for attributes. This can occur, for example, if the processor does not support the schema import feature or if it was unable to import the necessary schema.
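The derivation above, including the final promotion step, can be sketched in Python as follows. This is a non-normative illustration: the function name derive_type, the dictionary representation of an information item, the ancestor_has_validity flag, and the anonymous_name placeholder are all assumptions introduced for the example. An expanded-QName is modeled as a (namespace, local name) pair.

```python
XS = "http://www.w3.org/2001/XMLSchema"
DTD_TYPES = {"ID", "IDREF", "IDREFS", "ENTITY", "ENTITIES", "NMTOKEN", "NMTOKENS"}


def fallback(kind):
    """xs:anyType for elements, xs:anySimpleType for attributes."""
    return (XS, "anyType" if kind == "element" else "anySimpleType")


def derive_type(item, kind, in_scope, ancestor_has_validity=False,
                anonymous_name=("urn:example:anonymous", "anon-1")):
    """Return the expanded-QName of the type for an information item.

    item: dict keyed by PSVI/Infoset property names (assumed encoding).
    kind: "element" or "attribute".
    in_scope: set of expanded-QNames in the In-scope Schema Definitions.
    anonymous_name: placeholder for the implementation-defined name.
    """
    name = None
    if item.get("validity") == "valid":
        member = item.get("member type definition")
        typedef = item.get("type definition")
        if member and member.get("name"):
            name = (member.get("target namespace"), member["name"])
        elif typedef and typedef.get("name"):
            name = (typedef.get("target namespace"), typedef["name"])
        elif "member type definition anonymous" in item:
            if item["member type definition anonymous"]:
                name = anonymous_name
            else:
                name = (item.get("member type definition namespace"),
                        item.get("member type definition name"))
        elif "type definition anonymous" in item:
            if item["type definition anonymous"]:
                name = anonymous_name
            else:
                name = (item.get("type definition namespace"),
                        item.get("type definition name"))
    elif "validity" not in item and not ancestor_has_validity:
        # Infoset-only processing: only DTD attribute types are recognized.
        if item.get("attribute type") in DTD_TYPES:
            name = (XS, item["attribute type"])
    # Promote names that are missing or not in scope to the general type.
    if name is None or name not in in_scope:
        return fallback(kind)
    return name
```

The if/elif chain mirrors the "first applicable item" wording of the list, and the last step applies the promotion to xs:anyType or xs:anySimpleType when the derived name is not in scope.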

Attributes from the XML Schema instance namespace, "http://www.w3.org/2001/XMLSchema-instance", (xsi:schemaLocation, xsi:type, etc.) appear as ordinary attributes in the data model. They will be validated appropriately by schema processors and will simply appear as attributes of type xs:anySimpleType if they haven't been schema validated.

Mapping xs:dateTime, xs:date, and xs:time Values

XML Schema permits xs:dateTime, xs:date, and xs:time values both with and without timezones. In the context of validation, this is a purely lexical distinction. In order to compare dates and times, an XML Schema validator converts all times to Coordinated Universal Time (UTC or timezone Z). But in the data model, it is necessary to preserve timezone information.

In order to achieve this goal, xs:dateTime, xs:date, and xs:time values are represented as tuples in the data model: a time value normalized to UTC and a timezone represented as an xdt:dayTimeDuration.

The lexical representation of the value is converted to UTC as defined by XML Schema, and the timezone in the lexical representation is converted to an xdt:dayTimeDuration value. These two values are stored in the tuple.

Lexical representations that do not have a timezone are assumed to be in UTC for the purposes of normalization. An empty sequence is used for their timezone in the tuple.

Thus, for the purpose of validation, 2003-01-02T11:30:00-05:00 is converted to 2003-01-02T16:30:00Z, but the data model stores it as (2003-01-02T16:30:00Z, -PT5H0M). The value 2003-01-16T16:30:00 is stored as (2003-01-16T16:30:00Z, ()) because it has no timezone.
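As an illustration of the tuple representation, the following Python sketch maps an xs:dateTime lexical form to a pair of values. It is a sketch under stated assumptions: the function name to_tuple is hypothetical, a Python timedelta stands in for xdt:dayTimeDuration, None stands in for the empty sequence, and only lexical forms accepted by datetime.fromisoformat are handled.

```python
from datetime import datetime, timedelta, timezone


def to_tuple(lexical: str):
    """Return (value normalized to UTC, timezone offset or None)."""
    parsed = datetime.fromisoformat(lexical)
    offset = parsed.utcoffset()   # None when the lexical form has no timezone
    if offset is None:
        # No timezone: assume UTC for normalization, keep an empty timezone.
        return parsed.replace(tzinfo=timezone.utc), None
    return parsed.astimezone(timezone.utc), offset
```

Keeping the offset next to the normalized value is what lets the original timezone be recovered later, while comparisons can still use the UTC component alone.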

Mapping xsi:nil on Element Nodes

XML Schema introduced a mechanism for signaling that an element should be accepted as valid when it has no content despite a content type which does not require or even necessarily allow empty content. That mechanism is the xsi:nil attribute.

The data model exposes this special semantic in the nilled property.

If the validity property exists on an element node and is "valid" then nilled may be set. The nilled property is never set for nodes that have not been successfully schema validated.

If the element is valid and has a PSVI nil property and that property is true, then nilled is true. In all other cases, nilled is false.
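The rule for the nilled property can be stated compactly. In this minimal Python sketch the function name and its two parameters are assumptions; None models a PSVI property that is absent.

```python
def nilled(validity, psvi_nil):
    """Compute the nilled property of an element node.

    validity: "valid", "invalid", "notKnown", or None when absent.
    psvi_nil: True, False, or None when the PSVI nil property is absent.
    """
    # Only a successfully validated element with a true nil property is nilled.
    return validity == "valid" and psvi_nil is True
```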

Comments, Processing Instructions, and Whitespace

Although the data model is able to represent comments, processing instructions, and insignificant whitespace, preservation of this information may be unnecessary and onerous for some applications.

An instance of the data model can be constructed from an Infoset, a PSVI, or from some other data source entirely. Different applications may or may not choose to construct nodes in the data model to represent comments, processing instructions, and insignificant white space. These decisions are considered outside the scope of the data model. Consequently the data model makes no attempt to control or identify the sort of processing in this regard that an application uses to construct a data model instance.

Nodes

The category of Node values contains seven distinct kinds of nodes: document, element, attribute, text, namespace, processing instruction, and comment. The seven kinds of nodes are defined in the following subsections.

A tree contains a root plus all nodes that are reachable directly or indirectly from the root via the children, attributes, and namespaces accessors. Every node belongs to exactly one tree, and every tree has exactly one root node. A tree whose root node is a document node is referred to as a document. A tree whose root node is some other kind of node is referred to as a fragment.

Accessors

A set of accessors is defined on all seven kinds of Nodes. Some accessors return a constant empty sequence on certain node kinds. Some node kinds have additional accessors that are not summarized here.

In order for applications to be able to operate on instances of the data model, the model must expose properties of the items it contains. The data model does this by defining a family of accessor functions. These are not functions in the literal sense: they are not available for users or applications to call directly. Rather, they are descriptions of the interface that an implementation of the data model must expose to applications. Functions and operators available to end-users are described in the XQuery 1.0 and XPath 2.0 Functions and Operators specification.

base-uri Accessor

The base-uri accessor returns a sequence containing zero or one URI references.

Document, element, and processing-instruction nodes have a base-uri property. The base-uri of all other node types is the empty sequence.

If the base-uri property of a document, element, or processing-instruction node is non-empty, its value is returned.

If the accessor is called on a node that does not have a base-uri property, or whose base-uri property is empty, the base-uri of that node's parent is returned. If the node has no parent, the empty sequence is returned.
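The lookup rule for base-uri, including the fallback to the parent, can be sketched as follows. This Python sketch is illustrative only: the Node class and the function base_uri are hypothetical names, and None models both an empty property and the empty sequence.

```python
class Node:
    """Hypothetical node carrying only what the lookup needs."""
    def __init__(self, kind, base_uri=None, parent=None):
        self.kind = kind
        self.base_uri = base_uri   # None models an empty base-uri property
        self.parent = parent


# Node kinds that have a base-uri property of their own.
HAS_BASE_URI = {"document", "element", "processing-instruction"}


def base_uri(node):
    """Return the node's base URI, or None for the empty sequence."""
    if node.kind in HAS_BASE_URI and node.base_uri is not None:
        return node.base_uri
    if node.parent is None:
        return None
    # No usable property on this node: defer to the parent.
    return base_uri(node.parent)
```

For example, an attribute node has no base-uri property, so the call walks up to its element and, if that element's property is empty, on to the document node.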

node-kind Accessor

The node-kind accessor returns a string value identifying the kind of node on which the accessor was called. One of the following values is returned:

"document" for document nodes.

"element" for element nodes.

"attribute" for attribute nodes.

"text" for text nodes.

"namespace" for namespace nodes.

"processing-instruction" for processing instruction nodes.

"comment" for comment nodes.

node-name Accessor

The node-name accessor returns a sequence of zero or one xs:QNames.

For element and attribute nodes, node-name returns the qualified name of the element or attribute.

For processing-instruction nodes, node-name returns an xs:QName with the processing instruction target name in the local-name and no namespace URI.

For namespace nodes, node-name returns an xs:QName with the prefix of the namespace declaration in the local-name and no namespace URI. If the namespace declaration declares the default namespace, which has no prefix, an empty sequence is returned.

Some implementations may not preserve information about the prefixes declared. In these cases, the node-name accessor returns the empty sequence when applied to namespace nodes.

parent Accessor

The parent accessor returns a sequence containing zero or one nodes.

For nodes that have a parent, parent returns the parent node. For all other nodes, it returns the empty sequence.

If the return value is not the empty sequence, it will always be either an element node or a document node.

string-value Accessor

Every node has a string value; the way in which the string value of a node is computed is different for each kind of node and is specified in the sections on nodes below.

The string value of an atomic value is computed by casting it to an xs:string as per the rules described in the XQuery 1.0 and XPath 2.0 Functions and Operators specification.

typed-value Accessor

The typed-value accessor returns the typed-value of the node, which is a sequence of zero or more atomic values derived from the string-value of the node and its type in such a way as to be consistent with validation.

If the node is a comment, document, namespace, processing-instruction, or text node, then its typed value is equal to its string value as an instance of xdt:untypedAtomic.

If the node is an attribute node with type xs:anySimpleType, then its typed value is equal to its string value as an instance of xdt:untypedAtomic. The typed value of an attribute node with any other type is derived from its string value and type annotation in a way that is consistent with XML Schema validation.

If the node is an element node with type xs:anyType, then its typed value is equal to its string value, as an instance of xdt:untypedAtomic.

If the node is an element node with a simple type or with a complex type of simple content, then its typed value is derived from its string value and type in a way that is consistent with XML Schema validation.

If the node is an element node with a complex type of empty content, then its typed value is the empty sequence.

If the node is an element node with a complex type of mixed content, then its typed value is its string value as an instance of xdt:untypedAtomic.

If the node is an element node with a complex type of complex content, then its typed value is undefined and typed-value raises a type error, which may be handled by the host language.

For detailed semantics see .
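The case analysis above can be summarized as a dispatch on node kind, type, and content kind. The following Python sketch is non-normative: the function signature, the content labels ("simple", "empty", "mixed", "element-only"), and the parse callback that stands in for schema-aware conversion of a string value are all assumptions made for illustration. Atomic values are modeled as (type name, value) pairs.

```python
UNTYPED = "xdt:untypedAtomic"
UNTYPED_KINDS = {"comment", "document", "namespace",
                 "processing-instruction", "text"}


def typed_value(kind, string_value, type_name=None, content="simple", parse=None):
    """Return the typed value of a node as a list of (type, value) pairs.

    parse: hypothetical callable that converts a string value into atomic
    values in a way consistent with schema validation.
    """
    untyped = [(UNTYPED, string_value)]
    if kind in UNTYPED_KINDS:
        return untyped
    if kind == "attribute":
        return untyped if type_name == "xs:anySimpleType" else parse(string_value)
    # Remaining case: element nodes.
    if type_name == "xs:anyType" or content == "mixed":
        return untyped
    if content == "simple":    # simple type, or complex type with simple content
        return parse(string_value)
    if content == "empty":
        return []
    # Complex content (element-only): the typed value is undefined.
    raise TypeError("typed value is undefined for complex content")
```

Raising a Python TypeError here merely models the type error mentioned above; how a host language reports or handles it is outside this sketch.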

For xs:dateTime, xs:date and xs:time, the typed value is the atomic value that is determined from its tuple representation as follows:

If the timezone component is not the empty sequence, then the value contains the time component, normalized to the timezone specified by the timezone component, as well as the timezone component. The tuple "(2003-01-02T16:30:00Z, -PT5H0M)" produces the value "2003-01-02T11:30:00-05:00".

If the timezone component is the empty sequence, then the value contains the time component without any indication of timezone. The tuple "(2003-01-02T16:30:00Z, ())" produces the value "2003-01-02T16:30:00".
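The two rules for recovering a typed value from its tuple can be sketched in Python. The function name from_tuple is hypothetical, a timedelta stands in for the xdt:dayTimeDuration timezone component, and None stands in for the empty sequence.

```python
from datetime import datetime, timedelta, timezone


def from_tuple(utc_value, tz_offset):
    """Rebuild a date/time value from (UTC value, timezone offset or None)."""
    if tz_offset is None:
        # Empty timezone component: drop any indication of timezone.
        return utc_value.replace(tzinfo=None)
    # Otherwise shift the UTC value into the recorded timezone.
    return utc_value.astimezone(timezone(tz_offset))
```

Calling isoformat() on the result gives a lexical form comparable to the examples above.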

type Accessor

The type accessor returns the name of the type of a node.

For element nodes and attribute nodes, type returns the name of the type of the node (as an xs:QName) if it has one. If the type is anonymous, or if no type information exists, the name returned will be unique but implementation defined.

The use of xs:QName in this signature is part of the data model formalism. In practice, implementations are not required to use xs:QNames to represent the implementation-defined names of anonymous types.

For text nodes, type returns xdt:untypedAtomic.

For other node kinds, it always returns the empty sequence.

children Accessor

The children accessor returns a sequence containing zero or more nodes.

For document and element nodes, it returns the nodes that are the children of that node in document order. It returns the empty sequence for document and element nodes that have no children. If children exist, they will always consist exclusively of element, processing-instruction, comment, and text nodes. Attribute, namespace, and document nodes can never appear as children.

For all other nodes, it always returns the empty sequence.

A document node or an element node is the parent of each of its child nodes. Nodes never share children: if two nodes have distinct identities, then no child of one node will be a child of the other node.

The sequence of children will never contain adjacent text nodes.

attributes Accessor

The attributes accessor returns a sequence containing zero or more attribute nodes.

For element nodes, these are the attributes of the node. For all other nodes, it always returns the empty sequence.

namespaces Accessor

The namespaces accessor returns a sequence containing zero or more namespace nodes.

For element nodes, these are the namespaces of the node. For all other nodes, it always returns the empty sequence.

nilled Accessor

The nilled accessor returns the setting of the &dm.prop.nilled; property of an element node. See .

For all other nodes, it always returns the empty sequence.

Documents

Overview

Document nodes encapsulate XML documents. Documents have the following properties:

base-uri, possibly empty.

children, possibly empty.

unparsed-entities, possibly empty.

document-uri, possibly empty.

Document nodes must satisfy the following constraints.

Every document node must have a unique identity, distinct from all other nodes.

The children must consist exclusively of element, processing instruction, comment, and text nodes if it is not empty. Attribute, namespace, and document nodes can never appear as children.

The sequence of nodes in the children property is ordered and must be in document order.

The children property must not contain two consecutive text nodes.

If a node N is a child of a document D, then the parent of N must be D.

If a node N has a parent document D, then N must be among the children of D.

Every child of a document must be distinct.

In a well-formed document, the children of the document node must not be empty and must consist exclusively of element nodes, processing-instruction nodes, and comment nodes, exactly one of which is an element node. A document node in the data model is more permissive: it may be empty, it allows more than one element node as a child, and it also permits text nodes as children.

Document nodes and XPath 1.0 root nodes are essentially identical.

Implementations that support DTD processing and access to the unparsed entity accessors use the unparsed-entities property to associate information about an unordered collection of unparsed entities with a document node.

Accessors

Accessor	Returns:
base-uri	The value of the base-uri property, or the empty sequence if it is empty
node-kind	"document"
node-name	The empty sequence
parent	The empty sequence
string-value	The concatenation of the string values of all text node descendants in document order
typed-value	The string value as an instance of xdt:untypedAtomic
type	The empty sequence
children	The value of the children property
attributes	The empty sequence
namespaces	The empty sequence
nilled	The empty sequence

Three additional accessors are defined on document nodes:

The unparsed-entity-system-id accessor returns the system identifier of an unparsed external entity declared in the specified document. If no entity with the name specified in $entityname exists, or if the entity is not an external unparsed entity, the empty sequence is returned.

The unparsed-entity-public-id accessor returns the public identifier of an unparsed external entity declared in the specified document. If no entity with the name specified in $entityname exists, or if the entity is not an external unparsed entity, or if the entity has no public identifier, the empty sequence is returned.

The document-uri accessor returns the absolute URI of the resource from which the document node was constructed, if the absolute URI is available. If there is no URI available, or if it cannot be made absolute when the data model is constructed, the empty sequence is returned.

For example, if a collection of documents is returned by the fn:collection function, the document-uri may serve to distinguish between them even though each has the same base-uri.

PSVI to Data Model Mapping

When a data model fragment is created from the PSVI, a document information item is mapped to a Document Node. The precise transformation is described by specifying the PSVI property corresponding to each property of a document node.

&dm.prop.base-uri;

The value of the base URI property.

&dm.prop.children;

The sequence of nodes constructed from the information items found in the children property.

To construct the value of the &dm.prop.children; property, for each element, processing instruction, comment, and maximal sequence of adjacent character information items found in the children property, a corresponding Element, Processing Instruction, Comment, and Text node is constructed and that sequence of nodes is used as the value. If present among the children, the document type declaration information item is ignored.
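
The construction rule above can be illustrated with a minimal sketch. It assumes information items are modelled as (kind, value) pairs with the hypothetical kind labels "element", "pi", "comment", "char", and "doctype"; none of these names come from this specification.

```python
from itertools import groupby

def build_children(items):
    """Map (kind, value) information items to (kind, value) data model nodes."""
    # The document type declaration information item is ignored.
    kept = [item for item in items if item[0] != "doctype"]
    nodes = []
    # groupby yields each maximal run of adjacent items sharing a kind.
    for kind, run in groupby(kept, key=lambda item: item[0]):
        run = list(run)
        if kind == "char":
            # One text node per maximal run of character information items.
            nodes.append(("text", "".join(value for _, value in run)))
        else:
            # Every other item maps to one node of the corresponding kind.
            nodes.extend(run)
    return nodes

items = [
    ("comment", " header "),
    ("doctype", None),
    ("char", "a"),
    ("char", "b"),
    ("element", "catalog"),
    ("char", "z"),
]
children = build_children(items)
```

Dropping the document type declaration before grouping keeps character items on either side of it in a single run, so the sketch never produces two adjacent text nodes.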

Data Model to Infoset Mapping

The mapping of the data model to the XML Information Set maps a Document Node to a document information item. The properties of the document information item are constructed as follows:

Property	Value:
base URI	The value returned by the base-uri accessor
children	The sequence of information items constructed from the nodes returned by the children accessor. In other words, for each node returned by the children accessor, a corresponding information item is constructed and that sequence of information items is used as the value for the children property.
document element, notations, unparsed entities, character encoding scheme, standalone, version, all declarations processed	The values of these properties are implementation-defined but must be consistent with the rest of the Infoset constructed.

Since Document Nodes are more permissive than document information items, the resulting Infoset may be invalid.

Elements Overview

Element nodes encapsulate XML elements. Elements have the following properties:

&dm.prop.base-uri;, possibly empty.

&dm.prop.node-name;

&dm.prop.parent;, possibly empty

&dm.prop.type;

&dm.prop.children;, possibly empty

&dm.prop.attributes;, possibly empty

&dm.prop.namespaces;, possibly empty

&dm.prop.nilled;

Element nodes must satisfy the following constraints.

Every element node must have a unique identity, distinct from all other nodes.

The &dm.prop.children; must consist exclusively of element, processing instruction, comment, and text nodes if it is not empty. Attribute, namespace, and document nodes can never appear as children.

The sequence of nodes in the &dm.prop.children; property is ordered and must be in document order.

The &dm.prop.children; property must not contain two consecutive text nodes.

Every child of an element must be distinct.

The attributes of an element must have distinct names.

The namespace nodes of an element must have distinct names. At most one of the namespace nodes of an element has no name (this is the default namespace). A namespace node whose namespace URI is the zero-length string must have no name. No namespace node may have the name "xmlns".

If a node N is a child of an element E, then the parent of N must be E.

Exclusive of attribute nodes, if a node N has a parent element E, then N must be among the children of E. (Attribute nodes have a parent, but they do not appear among the children of their parent.)

The data model permits element nodes without parents (to represent partial results during expression processing, for example). Such elements must not appear among the children of any other node.

If an attribute node A has a parent element E, then A must be among the attributes of E.

The data model permits attribute nodes without parents (to represent partial results during expression processing, for example). Such attributes must not appear among the attributes of any element node.

The data model does not enforce a constraint that the namespaces of an element must be a superset of the namespaces of its parent, nor does it enforce a constraint that the namespaces of an element must include namespace nodes for each of the namespace URIs used in the element name and the names of its attributes, or of namespace URIs used in the content of elements and attributes of type xs:QName. Applications of the data model (such as XSLT and XQuery) may enforce such constraints in particular circumstances, but these constraints are not part of the data model.
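
As an illustration only, several of the constraints above can be expressed as a checking function. The Node class, its field names, and the wording of the messages are assumptions made for this sketch; the data model does not prescribe any concrete representation, and the namespace constraints are not covered.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)  # eq=False keeps identity-based comparison between nodes
class Node:
    kind: str                      # "element", "text", "comment", "attribute", ...
    name: Optional[str] = None
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    attributes: List["Node"] = field(default_factory=list)

ALLOWED_CHILD_KINDS = {"element", "processing-instruction", "comment", "text"}

def element_violations(element):
    """Return a list of messages, one per violated constraint occurrence."""
    problems = []
    previous_kind = None
    for child in element.children:
        if child.kind not in ALLOWED_CHILD_KINDS:
            problems.append(f"{child.kind} node among children")
        if child.parent is not element:
            problems.append("child whose parent is not this element")
        if child.kind == "text" and previous_kind == "text":
            problems.append("two consecutive text nodes")
        previous_kind = child.kind
    # Every child must be distinct (compared by node identity).
    if len({id(child) for child in element.children}) != len(element.children):
        problems.append("same node appears twice among children")
    names = [attr.name for attr in element.attributes]
    if len(set(names)) != len(names):
        problems.append("two attributes share a name")
    for attr in element.attributes:
        if attr.parent is not element:
            problems.append("attribute whose parent is not this element")
    return problems

good = Node("element", "item")
good.children = [Node("text", parent=good), Node("comment", parent=good)]
good.attributes = [Node("attribute", "id", parent=good)]

bad = Node("element", "item")
bad.children = [Node("text", parent=bad), Node("text", parent=bad)]
bad.attributes = [Node("attribute", "id", parent=bad),
                  Node("attribute", "id", parent=bad)]
```

Here element_violations(good) reports nothing, while element_violations(bad) reports the adjacent text nodes and the duplicated attribute name.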

Accessors

Accessor	Returns:
base-uri	&element.node.base-uri.returns;
node-kind	&element.node.node-kind.returns;
node-name	&element.node.node-name.returns;
parent	&element.node.parent.returns;
string-value	&element.node.string-value.returns;
typed-value	&element.node.typed-value.returns;
type	&element.node.type.returns;
children	&element.node.children.returns;
attributes	&element.node.attributes.returns;
namespaces	&element.node.namespaces.returns;
nilled	&element.node.nilled.returns;

The base-uri accessor returns the &dm.prop.base-uri; property of the element node, if it exists. If it does not exist, the base URI of the element's parent is returned.

The accessors namespaces and attributes return the same set of namespace and attribute nodes (respectively) associated with the element. They are not constrained to return them in any particular order.

The parent accessor returns the empty sequence if the element has no parent.

If the element node's type is xs:anyType, the typed-value accessor returns the node's string value as xs:anySimpleType. If the type is a complex type with complex content, invoking typed-value raises an error.

The typed-value accessor returns the typed-value of the node, which is a sequence of zero or more atomic values. The typed-value is closely related to the node's string-value and its type. For example:

When the node's string-value is "3.14" and its type is xs:decimal, the typed-value is a sequence containing the atomic value 3.14 of type xs:decimal.

When the node's string-value is "foo bar baz" and its type is xs:IDREFS, the typed-value is a sequence containing the atomic values "foo", "bar", and "baz", each of type xs:IDREF.

When the node's string-value is "17" and its type is xs:anyType, the typed-value is a sequence containing the atomic value "17" of type xs:anySimpleType.

In fact, when the type is an atomic type, typed-value is always the atomic-value constructed from the string-value and the type.

In the general case, typed-value constructs a sequence of atomic values. These values are derived from the string-value of the element and its type, in such a way as to be consistent with validation.
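
The three examples above can be reproduced with a small sketch. It assumes an atomic value is modelled as a (type name, value) pair and handles only the three types mentioned; it is not a general implementation of the typed-value accessor.

```python
from decimal import Decimal

def typed_value(string_value, type_name):
    """Return the typed value as a list of (type name, value) pairs."""
    if type_name == "xs:decimal":
        # Atomic type: a single atomic value built from the string-value.
        return [("xs:decimal", Decimal(string_value))]
    if type_name == "xs:IDREFS":
        # List type: one atomic value of the item type per whitespace-separated token.
        return [("xs:IDREF", token) for token in string_value.split()]
    if type_name == "xs:anyType":
        # As described above for element nodes of type xs:anyType.
        return [("xs:anySimpleType", string_value)]
    raise NotImplementedError(f"type {type_name} is not covered by this sketch")
```

For instance, typed_value("foo bar baz", "xs:IDREFS") yields three pairs, one per token, each labelled xs:IDREF.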

One additional accessor is defined on element nodes:

The element-declaration accessor returns the xs:QName of the global element declaration associated with this element. If the element declaration is local, it returns a sequence consisting of the xs:QName of the local element declaration and the SchemaGlobalContext of the declaration.

This declaration can be used by implementations to identify substitution groups, nillability, and other aspects of the declaration.

PSVI to Data Model Mapping

When a data model fragment is created from the PSVI, an element information item is mapped to an Element Node. The precise transformation is described by specifying the PSVI property corresponding to each property of an element node.

&dm.prop.base-uri;

The value of the base URI property.

&dm.prop.node-name;

An xs:QName constructed from the local name property and the namespace name property

&dm.prop.parent;

The value of the parent property.

&dm.prop.type;

The xs:QName computed as described in . Note that if the type referenced would be a union type then type refers to the member type that actually validated the schema normalized value.

&dm.prop.children;

If the schema normalized value PSVI property exists and is not absent, the processor may, depending on the implementation, use a sequence of nodes containing the Processing Instruction and Comment nodes corresponding to the processing instruction and comment information items found in the children property, plus a single text node whose string value is the schema normalized value. The order of these nodes is implementation-defined.

Otherwise, a sequence of nodes constructed in the following way from the information items found in the children property: for each element, processing instruction, comment, and maximal sequence of adjacent character information items found in the children property, a corresponding Element, Processing Instruction, Comment, and Text node is constructed.

Because the data model requires that all general entities be expanded, there will never be unexpanded entity reference information item children.

&dm.prop.attributes;

A set of Attribute Nodes constructed from the attribute information items appearing in the attributes property. This includes all of the special attributes (xml:lang, xml:space, xsi:type, etc.) but does not include namespace declarations (because they are not attributes).

&dm.prop.namespaces;

A set of Namespace Nodes constructed from the namespace information items appearing in the in-scope namespaces property.

Some implementations may choose to use only a subset of the namespaces present in the PSVI. In particular, they may exclude namespace nodes for namespaces which do not appear in the qualified name of any element or attribute information item. This can arise when xs:QNames are used in content.

&dm.prop.nilled;

If the validity property exists and is valid and the attributes property contains an attribute with the local-name nil and the namespace URI http://www.w3.org/2001/XMLSchema-instance, then true, otherwise false.
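
A minimal sketch of this rule follows, assuming the validity property is passed as a string (or None when it does not exist) and each attribute as a (namespace URI, local name) pair; both representations are choices made for this illustration.

```python
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

def nilled(validity, attributes):
    """validity: PSVI validity property or None; attributes: (namespace URI, local name) pairs."""
    has_nil = any(ns == XSI_NS and local == "nil" for ns, local in attributes)
    # True only when the element is valid and carries an xsi:nil attribute.
    return validity == "valid" and has_nil
```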

Data Model to Infoset Mapping

The mapping of the data model to the XML Information Set maps an Element Node to an element information item. The properties of the element information item are constructed as follows:

Property	Value:
namespace name	The namespace name of the xs:QName returned by the node-name accessor
local name	The local name of the xs:QName returned by the node-name accessor
prefix	An appropriate namespace prefix, as described below
children	The sequence of information items constructed from the nodes returned by the children accessor. In other words, for each node returned by the children accessor, a corresponding information item is constructed and that sequence of information items is used as the value for the children property.
attributes	The sequence of attribute information items constructed from the nodes returned by the attributes accessor.
in-scope namespaces	The sequence of namespace information items constructed from the nodes returned by the namespaces accessor.
base URI	The value returned by the base-uri accessor
parent	The information item constructed from the node returned by the parent accessor. If the node has no parent, the property must be left absent and the resulting Infoset will not be valid.
namespace attributes	The sequence of namespace information items constructed from the nodes that are present in the difference between the sequence of nodes returned by the namespaces accessor on this element and the sequence of nodes returned by the namespaces accessor of this element's parent; see below.

An implementation must construct the value of the prefix property as if the following algorithm was applied: if the element has at least one namespace node whose namespace URI is the same as the namespace name of the xs:QName returned by the node-name accessor, it returns the local part of the name of that namespace node or the empty string if the namespace node has no name. If there are several such namespace nodes, it chooses one of them arbitrarily. If there is no such namespace node, it generates an arbitrary prefix that is distinct from the node-name of any of the element's namespaces. The prefix is the empty string if the element has an empty namespace name (if it is in the null namespace).

If a new prefix is generated, a corresponding namespace information item must be added to the in-scope namespaces property of the element information item. The namespace information item must associate the generated prefix with the namespace name of the xs:QName returned by the element's node-name accessor.
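
The prefix algorithm can be sketched as follows. The sketch assumes namespace nodes are modelled as (name, URI) pairs, with the empty string standing for a namespace node that has no name; the generated prefixes "ns0", "ns1", and so on are an arbitrary choice, and where several namespace nodes match, the first one is taken.

```python
from itertools import count

def element_prefix(namespace_name, namespaces):
    """Return (prefix, in-scope namespaces) for an element.

    namespaces is a list of (name, uri) pairs; name "" means "no name".
    """
    in_scope = list(namespaces)
    if not namespace_name:
        # An element in the null namespace has the empty string as prefix.
        return "", in_scope
    for name, uri in in_scope:
        if uri == namespace_name:
            # Any matching namespace node may be chosen; take the first.
            return name, in_scope
    # No match: generate a prefix distinct from every existing name and
    # add the corresponding namespace to the in-scope namespaces.
    used = {name for name, _ in in_scope}
    prefix = next(f"ns{i}" for i in count() if f"ns{i}" not in used)
    in_scope.append((prefix, namespace_name))
    return prefix, in_scope
```

Returning the updated in-scope namespaces alongside the prefix reflects the requirement that a generated prefix be accompanied by a matching namespace information item.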

If the implementation has allowed in-scope namespaces to be discarded from the data model, then these namespaces may need to be reintroduced when creating an Infoset in order to ensure that the Infoset corresponds to a document that is namespace well-formed as defined in .

The algorithm used to calculate namespace attributes will need to be adjusted to cater for XML Namespaces 1.1, which allows the "undeclaration" of all namespaces, whether they have a prefix or not.

The namespace attributes property is computed so that it contains the smallest possible set of namespace attributes. For example, suppose that the namespaces accessor for this element returns namespace nodes for the foo, bar, and baz namespaces and the namespaces accessor for this element's parent returns namespace nodes for the foo and bar namespaces. In this case, the namespace attributes property will contain a single namespace information item for the baz namespace.
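
The foo, bar, and baz example can be sketched as a simple difference, assuming namespace nodes are modelled as (prefix, URI) pairs. The example.org URIs are placeholders invented for this illustration, and the sketch does not address the Namespaces 1.1 undeclaration issue noted above.

```python
def namespace_attributes(own, parent):
    """Namespaces present on the element but not on its parent."""
    inherited = set(parent)
    return [ns for ns in own if ns not in inherited]

parent_ns = [("foo", "http://example.org/foo"), ("bar", "http://example.org/bar")]
own_ns = parent_ns + [("baz", "http://example.org/baz")]
result = namespace_attributes(own_ns, parent_ns)
```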

Attributes Overview

Attribute nodes encapsulate XML attributes. Attributes have the following properties:

&dm.prop.node-name;

&dm.prop.string-value;

&dm.prop.parent;, possibly empty

&dm.prop.type;

Attribute nodes must satisfy the following constraints.

Every attribute node must have a unique identity, distinct from all other nodes.

If an attribute node A has a parent element E, then A must be among the attributes of E.

For convenience, the element node that owns this attribute is called its "parent" even though an attribute node is not a "child" of its parent element.

Accessors

Accessor	Returns:
base-uri	&attribute.node.base-uri.returns;
node-kind	&attribute.node.node-kind.returns;
node-name	&attribute.node.node-name.returns;
parent	&attribute.node.parent.returns;
string-value	&attribute.node.string-value.returns;
typed-value	&attribute.node.typed-value.returns;
type	&attribute.node.type.returns;
children	&attribute.node.children.returns;
attributes	&attribute.node.attributes.returns;
namespaces	&attribute.node.namespaces.returns;
nilled	&attribute.node.nilled.returns;

If the attribute node's type is xs:anySimpleType, the typed-value accessor returns the node's string value as xdt:untypedAtomic.

When the node's string-value is "3.14" and its type is xs:decimal, the typed-value is a sequence containing the atomic value 3.14 of type xs:decimal.

When the node's string-value is "foo bar baz" and its type is xs:IDREFS, the typed-value is a sequence containing the atomic values "foo", "bar", and "baz", each of type xs:IDREF.

When the node's string-value is "17" and its type is xs:anySimpleType, the typed-value is a sequence containing the atomic value "17" of type xdt:untypedAtomic.

In fact, when the type is an atomic type, typed-value is always the atomic-value constructed from the string-value and the type.

In the general case, typed-value constructs a sequence of atomic values. These values are derived from the string-value of the attribute and its type, in such a way as to be consistent with validation.

PSVI to Data Model Mapping

When a data model fragment is created from the PSVI, an attribute information item is mapped to an Attribute Node. The precise transformation is described by specifying the PSVI property corresponding to each property of an attribute node.

&dm.prop.node-name;

An xs:QName constructed from the local name property and the namespace name property

&dm.prop.string-value;

The schema normalized value PSVI property if that exists, or the normalized value property.

&dm.prop.parent;

The value of the parent property.

&dm.prop.type;

The xs:QName computed as described in . Note that if the type referenced would be a union type then type refers to the member type that actually validated the schema normalized value.

Data Model to Infoset Mapping

The mapping of the data model to the XML Information Set maps an Attribute Node to an attribute information item. The properties of the corresponding attribute information item are constructed as follows:

Property	Value:
namespace name	The namespace name of the xs:QName returned by the node-name accessor
local name	The local name of the xs:QName returned by the node-name accessor
prefix	An appropriate namespace prefix, as described below
normalized value	The value returned by the string-value accessor
owner element	The information item constructed from the node returned by the parent accessor. If the node has no parent, the property must be left absent and the resulting Infoset will not be valid.
specified, attribute type, references	The values of these properties are implementation-defined but must be consistent with the rest of the Infoset constructed.

An implementation must construct the value of the prefix property in the following way: if the attribute has a parent, in the same way that a prefix would be constructed for that element, otherwise a non-empty prefix is chosen arbitrarily, and no attempt is made to associate the prefix with the namespace URI.

Namespaces Overview

Namespace nodes encapsulate XML namespaces. Namespaces have the following properties:

&dm.prop.prefix;, possibly empty.

&dm.prop.uri;

&dm.prop.parent;, possibly empty

Namespace nodes must satisfy the following constraints.

Every namespace node must have a unique identity, distinct from all other nodes.

The namespace prefix may be the empty sequence. If the URI is the zero-length string, the prefix must be the empty sequence.

In XPath 1.0, namespace nodes were directly accessible by applications, by means of the namespace axis. In XPath 2.0 the namespace axis is deprecated, and it is not available at all in XQuery 1.0. XPath 2.0 implementations are not required to expose the namespace axis, though they may do so if they wish to offer backwards compatibility. The information held in namespace nodes is instead made available to applications using two functions defined in , namely fn:get-in-scope-namespaces and fn:get-namespace-uri-for-prefix. Certain properties of namespace nodes are not exposed by these functions: in particular, properties related to the identity of namespace nodes, their parentage, and their position in document order. Implementations that do not expose the namespace axis can therefore avoid the overhead of maintaining this information.

Accessors

Accessor	Returns:
base-uri	&namespace.node.base-uri.returns;
node-kind	&namespace.node.node-kind.returns;
node-name	&namespace.node.node-name.returns;
parent	&namespace.node.parent.returns;
string-value	&namespace.node.string-value.returns;
typed-value	&namespace.node.typed-value.returns;
type	&namespace.node.type.returns;
children	&namespace.node.children.returns;
attributes	&namespace.node.attributes.returns;
namespaces	&namespace.node.namespaces.returns;
nilled	&namespace.node.nilled.returns;

PSVI to Data Model Mapping

When a data model fragment is created from the PSVI, a namespace information item is mapped to a Namespace Node. The precise transformation is described by specifying the PSVI property corresponding to each property of a namespace node.

&dm.prop.prefix;

The prefix property.

&dm.prop.uri;

The namespace name property.

Data Model to Infoset Mapping

The mapping of the data model to the XML Information Set maps a Namespace Node to a namespace information item. The properties of the namespace information item are constructed as follows:

Property	Value:
prefix	An appropriate namespace prefix, as described below
namespace name	The value returned by the string-value accessor

Processing Instructions Overview

Processing instruction nodes encapsulate XML processing instructions. Processing instructions have the following properties:

&dm.prop.target;

&dm.prop.content;

&dm.prop.base-uri;, possibly empty

&dm.prop.parent;, possibly empty

Processing instruction nodes must satisfy the following constraints.

Every processing instruction node must have a unique identity, distinct from all other nodes.

The &dm.prop.target; must be an NCName.

The string ?> must not occur within the &dm.prop.target; or &dm.prop.content;.
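
The last two constraints can be checked with a short sketch. The NCName test below is a deliberate ASCII-only simplification (an assumption of this sketch): the real NCName production also admits many non-ASCII characters, so this check rejects some valid targets.

```python
import re

# ASCII-only approximation of NCName: a letter or underscore, followed by
# letters, digits, ".", "-", or "_". Colons are never allowed in an NCName.
ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def pi_violations(target, content):
    """Return a list of messages describing violated constraints."""
    problems = []
    if not ASCII_NCNAME.fullmatch(target):
        problems.append("target is not an NCName")
    if "?>" in target or "?>" in content:
        problems.append('"?>" occurs in target or content')
    return problems
```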

Accessors

Accessor	Returns:
base-uri	&pi.node.base-uri.returns;
node-kind	&pi.node.node-kind.returns;
node-name	&pi.node.node-name.returns;
parent	&pi.node.parent.returns;
string-value	&pi.node.string-value.returns;
typed-value	&pi.node.typed-value.returns;
type	&pi.node.type.returns;
children	&pi.node.children.returns;
attributes	&pi.node.attributes.returns;
namespaces	&pi.node.namespaces.returns;
nilled	&pi.node.nilled.returns;

PSVI to Data Model Mapping

When a data model fragment is created from the PSVI, a processing instruction information item is mapped to a Processing Instruction Node. The precise transformation is described by specifying the PSVI property corresponding to each property of a processing instruction node.

&dm.prop.target;

The value of the target property.

&dm.prop.content;

The value of the content property.

&dm.prop.base-uri;

The value of the base URI property.

&dm.prop.parent;

The value of the parent property.

There are no processing instruction nodes for processing instructions that are children of a document type declaration information item.

Data Model to Infoset Mapping

The mapping of the data model to the XML Information Set maps a Processing Instruction Node to a processing instruction information item. The properties of the processing instruction information item are constructed as follows:

Property	Value:
target	The local name of the xs:QName returned by the node-name accessor
content	The value of the string-value accessor
parent	The value of the parent accessor.
notation	This property has no value.
base URI	The value of the base-uri accessor

Comments Overview

Comment nodes encapsulate XML comments. Comments have the following properties:

&dm.prop.content;

&dm.prop.parent;

Comment nodes must satisfy the following constraints.

Every comment node must have a unique identity, distinct from all other nodes.

The string -- must not occur within the &dm.prop.content;.

Accessors

Accessor	Returns:
base-uri	&comment.node.base-uri.returns;
node-kind	&comment.node.node-kind.returns;
node-name	&comment.node.node-name.returns;
parent	&comment.node.parent.returns;
string-value	&comment.node.string-value.returns;
typed-value	&comment.node.typed-value.returns;
type	&comment.node.type.returns;
children	&comment.node.children.returns;
attributes	&comment.node.attributes.returns;
namespaces	&comment.node.namespaces.returns;
nilled	&comment.node.nilled.returns;

PSVI to Data Model Mapping

When a data model fragment is created from the PSVI, a comment information item is mapped to a Comment Node. The precise transformation is described by specifying the PSVI property corresponding to each property of a comment node.

&dm.prop.content;

The value of the content property.

&dm.prop.parent;

The value of the parent property.

There are no comment nodes for comments that are children of a document type declaration information item.

Data Model to Infoset Mapping

The mapping of the data model to the XML Information Set maps a Comment Node to a comment information item. The properties of the corresponding comment information item are constructed as follows:

Property	Value:
content	The value of the string-value accessor
parent	The value of the parent accessor

Text Overview

Text nodes encapsulate XML character content. Text has the following properties:

&dm.prop.content;

&dm.prop.parent;

Text nodes must satisfy the following constraint:

A text node cannot contain the empty string as its content.

In addition, document and element nodes impose the constraint that two text nodes can never occur as adjacent siblings.

Accessors

Accessor	Returns:
base-uri	&text.node.base-uri.returns;
node-kind	&text.node.node-kind.returns;
node-name	&text.node.node-name.returns;
parent	&text.node.parent.returns;
string-value	&text.node.string-value.returns;
typed-value	&text.node.typed-value.returns;
type	&text.node.type.returns;
children	&text.node.children.returns;
attributes	&text.node.attributes.returns;
namespaces	&text.node.namespaces.returns;
nilled	&text.node.nilled.returns;

PSVI to Data Model Mapping

When a data model fragment is created from the PSVI, a maximal sequence of consecutive character information items is mapped to a Text Node. The precise transformation is described by specifying the PSVI property corresponding to each property of a text node.

&dm.prop.content;

A string comprised of characters that correspond to the character code properties of each of the character information items.

&dm.prop.parent;

The value of the parent property.

The string-value is not W3C normalized as described in the Character Model for the World Wide Web version 1.0 draft.

Data Model to Infoset Mapping

The mapping of the data model to the XML Information Set maps a Text Node to a sequence of character information items. The properties of the corresponding character information items are constructed as follows:

Property	Value:
character code	The ISO 10646 character code of the character in question
element content whitespace	`false`
parent	The value of the parent accessor.
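
A minimal sketch of this mapping, assuming each character information item is modelled as a dictionary keyed by property name (a representation chosen for this illustration). Python's ord returns the Unicode code point, which coincides with the ISO 10646 character code.

```python
def text_to_character_items(content, parent):
    """Map a text node's content to one character information item per character."""
    return [
        {
            "character code": ord(ch),
            "element content whitespace": False,
            "parent": parent,
        }
        for ch in content
    ]

items = text_to_character_items("Hi", parent="E1")
```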

Atomic Values

An atomic value is a value in the value space of an atomic type, labeled with that atomic type. The typed values of nodes whose type is unknown (for instance because they have not been validated) are labeled with the type xs:anySimpleType. An atomic type is a primitive simple type or a type derived by restriction from a primitive simple type. Types derived by list or union are not atomic.

The primitive simple types are those defined by XML Schema: xs:string, xs:boolean, xs:decimal, xs:float, xs:double, xs:duration, xs:dateTime, xs:time, xs:date, xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gDay, xs:gMonth, xs:hexBinary, xs:base64Binary, xs:anyURI, xs:QName, and xs:NOTATION. A derived atomic type is derived by restriction and has a primitive base type and a set of constraining facets.

The value space of the atomic values is the union of the value spaces of the nineteen primitive XML Schema types. This value space clearly includes those atomic values whose type is primitive, but it also includes those whose type is derived, as derivation by restriction always limits the value space.

An XML Schema simple type may be primitive or derived by restriction, list, or union.

The values of nodes whose type is an XML Schema primitive simple type or is derived by restriction from an XML Schema primitive simple type are represented as atomic values of that type.

The values of nodes whose type is derived by list from an XML Schema primitive simple type (or from a type derived by restriction from an XML Schema primitive simple type) are represented by a sequence of atomic values whose type is the item type.

The values of nodes whose type is derived by union from an XML Schema primitive type are represented by a sequence of atomic values, each of whose type is one of the individual types from the union. The union type information is lost and only the specific type of each individual item is retained.

An atomic value can be constructed from the value's lexical representation. Given a string and an atomic type, the atomic value is constructed in such a way as to be consistent with validation. In particular the construction takes into consideration the facets of the type. If the string does not represent a valid value of the type, an error is raised. When xs:anySimpleType is specified as the type, no validation takes place. The details of the construction are described in the Constructor Functions and the related Casting Functions section of .
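
As a rough illustration of construction that is consistent with validation, the sketch below builds a value for xs:decimal, or for a hypothetical type derived from it by restriction with minInclusive and maxInclusive facets. The function name and parameters are assumptions of this sketch; the normative rules are those of the constructor and casting functions referred to above.

```python
import re
from decimal import Decimal

# Lexical space of xs:decimal: optional sign, digits, optional fraction.
DECIMAL_LEXICAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")

def construct_decimal(lexical, min_inclusive=None, max_inclusive=None):
    """Build a Decimal from its lexical form, honouring two range facets."""
    text = lexical.strip()  # approximates whitespace collapsing
    if not DECIMAL_LEXICAL.fullmatch(text):
        raise ValueError(f"{lexical!r} is not a valid xs:decimal")
    value = Decimal(text)
    if min_inclusive is not None and value < min_inclusive:
        raise ValueError(f"{value} violates minInclusive {min_inclusive}")
    if max_inclusive is not None and value > max_inclusive:
        raise ValueError(f"{value} violates maxInclusive {max_inclusive}")
    return value
```

A string such as "abc", or a value outside the facet range, raises an error rather than producing an atomic value.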

A string value can be constructed from an atomic value. Such a value is constructed by converting the atomic value to its string representation as described in the Casting Functions section of . Using the canonical lexical representation for atomic values may not always be compatible with XPath 1.0.

Sequences

A sequence is an ordered collection of zero or more items. An item may be a node or an atomic value, i.e. a sequence may contain nodes, atomic values, or any mixture of nodes and atomic values. When a node is added to a sequence its identity remains the same. Consequently a node may occur in more than one sequence and a sequence may contain duplicate items. Sequences are flat: they may not contain other sequences.

An important characteristic of the data model is that there is no distinction between an item (a node or an atomic value) and a singleton sequence containing that item. An item is equivalent to a singleton sequence containing that item and vice versa.
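
Flatness and the equivalence between an item and a singleton sequence can be illustrated with a small sketch. It assumes Python lists stand in for sequences and any non-list value stands in for an item; the function name is hypothetical.

```python
def sequence(*parts):
    """Build a flat sequence from items and (possibly nested) sequences."""
    flat = []
    for part in parts:
        if isinstance(part, list):
            # A sequence nested inside another is spliced in, never kept nested.
            flat.extend(sequence(*part))
        else:
            flat.append(part)
    return flat
```

With this construction, sequence("a") and sequence(["a"]) produce the same result, and duplicates are preserved rather than removed.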

Sequences replace node-sets from XPath 1.0. In XPath 1.0, node-sets do not contain duplicates. In generalizing node-sets to sequences in XPath 2.0, duplicate removal is provided by functions on node sequences.

A collection of documents is represented in the data model as a sequence of document nodes.

A sequence has no identity. Equality comparison of sequences is performed only by comparing the items of the sequences.

XML Information Set Conformance

This specification conforms to the XML Information Set . The following information items must be exposed by the infoset producer to construct a data model fragment:

The Document Information Item with base URI and children properties.

Element Information Items with children, attributes, in-scope namespaces, local name, namespace name, parent properties.

Attribute Information Items with namespace name, local name, normalized value, owner element properties.

Character Information Items with character code and parent properties.

Processing Instruction Information Items with target, content and parent properties.

Comment Information Items with content and parent properties.

Namespace Information Items with prefix and namespace name properties.

Other information items and properties made available by the Infoset processor are ignored. In addition to the properties above, the following properties from the PSV Infoset are required:

validity, type definition, type definition namespace, type definition name, type definition anonymous, member type definition, member type definition namespace, member type definition name, member type definition anonymous and schema normalized value properties on Element Information Items.

References

Normative References

World Wide Web Consortium, XML Information Set (Infoset). See http://www.w3.org/TR/xml-infoset/.

World Wide Web Consortium, Namespaces in XML. See http://www.w3.org/TR/REC-xml-names.

World Wide Web Consortium, XQuery 1.0 and XPath 2.0 Functions and Operators. See http://www.w3.org/TR/xpath-functions/.

World Wide Web Consortium, XML Schema Part 1: Structures. See http://www.w3.org/TR/xmlschema-1.

World Wide Web Consortium, XML Schema Part 2: Datatypes. See http://www.w3.org/TR/xmlschema-2.

Other References

World-Wide Web Consortium XML Query Data Model, Working Draft, Feb 2001. See http://www.w3.org/TR/2001/WD-query-datamodel-20010215/.

World-Wide Web Consortium XML Path Language (XPath): Version 1.0. November, 1999. See http://www.w3.org/TR/xpath.html.

World Wide Web Consortium, XPath Requirements Version 2.0. See http://www.w3.org/TR/xpath20req.

World-Wide Web Consortium XML Path Language (XPath): Version 2.0. See http://www.w3.org/TR/xpath20/.

World Wide Web Consortium, XSL Transformations Language (XSLT): Version 2.0. See http://www.w3.org/TR/xslt20/.

World Wide Web Consortium, XQuery 1.0 and XPath 2.0 Formal Semantics. See http://www.w3.org/TR/xquery-semantics/.

World Wide Web Consortium, XML Query Working Group. Home page: http://www.w3.org/XML/Query.

World Wide Web Consortium, XSL Working Group. Home page: http://www.w3.org/Style/XSL/.

World Wide Web Consortium, XQuery 1.0: A Query Language for XML. See http://www.w3.org/TR/xquery/.

World Wide Web Consortium, XML Query Requirements. See http://www.w3.org/TR/2003/WD-xquery-requirements-20030502.

Glossary

Example

We use the following XML document to illustrate the information contained in a data model fragment:

&dm-example.xml;

The document is associated with the URI http://www.example.com/catalog.xml, and is valid with respect to the following XML schema:

&dm-example.xsd;

This example exposes the data model for a document that has an associated schema and has been validated successfully against it. In general, an XML Schema is not required, that is, the data model can represent a schemaless, well-formed XML document with the rules described in .

The XML document is represented by the nodes described below. The value D1 represents a document node; the values E1, E2, etc. represent element nodes; the values A1, A2, etc. represent attribute nodes; the values N1, N2, etc. represent namespace nodes; the values P1, P2, etc. represent processing-instruction nodes; the values T1, T2, etc. represent text nodes.

For brevity:

The data model doesn't include whitespace-only text nodes.

Literal strings are shown without the xs:string() constructor

Literal decimals are shown without the xs:decimal() constructor

Nodes are referred to using the syntax [nodeID]

xs:QNames are used with the following prefixes:

xs	http://www.w3.org/2001/XMLSchema
xsi	http://www.w3.org/2001/XMLSchema-instance
cat	http://www.example.com/catalog
xlink	http://www.w3.org/1999/xlink
html	http://www.w3.org/1999/xhtml

The abbreviation \n is used in string literals to represent a newline character; this isn't supported in XPath, but it makes this presentation clearer.

Accessors that return the empty sequence have been omitted.

&dm-example.tbl;

A graphical representation of the data model for the preceding example is shown below. Document order in this representation can be found by following the traditional in-order, left-to-right, depth-first traversal; however, because the image has been rotated for easier presentation, this appears to be in-order, bottom-to-top, depth-first order.

Graphic representation of the data model.

Issues List

The issues in this section serve as a design history for this document. The ordering of issues is irrelevant. Each issue has a unique id of the form Issue-<dddd> (where d is a digit). This can be used for referring to the issue by <url-of-this-document>#Issue-<dddd>. Furthermore, each issue has a mnemonic header, a date, an optional description, and an optional resolution.

Some of the issues contain references to W3C internal archives. These are marked with "members only". Some of the descriptions of the resolved issues are obsolete with respect to the current version of the document.

Starting with the November 2002 publication, only issues that are still open are displayed. All of the issues are still available in the XML sources for this document.

As of 26 Mar 2003, there are no open issues in this document. In the future, issues relating to the Data Model will be compiled outside this document.

PSV Infoset identity constraints

What should be the data-model representation, if any, of PSV Infoset identity-constraint tables?

JM: Duplicate of .

Representation of atomic values

This function assumes that the character information items for an atomic value (e.g., string, integer, floating-point number) are not interleaved with other information items (e.g., PIs or comments). The treatment of such interleaved values is not handled in this definition. This issue is addressed in threads beginning at: http://lists.w3.org/Archives/Member/w3c-archive/2000Jun/0090.html (members only) and http://lists.w3.org/Archives/Member/w3c-xml-query-wg/2000Sep/0079.html (members only).

MF: The data model does not preserve information items interleaved with the character info items of an atomic value.

Example parent

Remark Michael: An IDREF cannot point to an empty string.

JM: Issue is obsolete, probably had something to do with Reference Nodes, which have been removed.

Schema/DTD

A document may refer to a DTD and have an associated schema. Currently, the content model from the DTD is ignored, as are unique IDs from the schema. A coherent priority or merging strategy is needed.

Any strategy developed must also address the issue of types derived from xs:ID.

Infoset-only processing is performed if and only if schema processing was not. http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Jan/0117.html

Lists of Simple Values

The current data model draft takes into account only singleton value-nodes. It must represent lists of simple-type values as well. See http://lists.w3.org/Archives/Member/w3c-xml-query-wg/2000May/0060.html (members only).

Peter suggests having a special-purpose kind of a TextNode that represents lists of simple types. An advantage of this approach is that the constraint that lists of simple types be homogeneous/monomorphic can be enforced. However, lists/forests already can be modeled in current data model, without adding more complexity. For example, an attribute's value could be modeled as a list of TextNodes:

value : (AttributeNode) -> Sequence<TextNode>

A disadvantage of this approach is that the monomorphism constraint on lists derived from simple types is not enforced. However, given a type system for Query, such a constraint could be enforced. So Mary is in favor of not having a special-purpose kind of TextNode to represent lists, but instead model them by forests directly in the data model.
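The forest-based modelling discussed above can be illustrated with a minimal sketch. The TextNode class, the attribute_value function, and the is_monomorphic check are hypothetical names chosen for this illustration; they are not part of the data model, and the sketch assumes a whitespace-separated list value whose items all share one declared item type.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextNode:
    """Hypothetical text node: a lexical value tagged with a simple type name."""
    value: str
    type_name: str


def attribute_value(lexical: str, item_type: str) -> List[TextNode]:
    """Split a whitespace-separated list value into one TextNode per item."""
    return [TextNode(token, item_type) for token in lexical.split()]


def is_monomorphic(nodes: List[TextNode]) -> bool:
    """True when every item shares one type (vacuously true when empty)."""
    return len({node.type_name for node in nodes}) <= 1
```

In this sketch the monomorphism constraint is not enforced by the representation itself; it is a separate check that a type system could apply, which is the trade-off described above.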

JM: note that the pseudo-code doesn't appear to handle schema list values.

decision record (members only).

Collections

We need a more thorough definition of collections, perhaps in a separate section, which includes bags and defines collections formally.

In particular, the algebra (probably) will not support arbitrarily nested collections (i.e., lists of lists, sets of sets, etc.). We need to specify how collections are constructed. For example, in the data model, the basic collection is a forest, i.e., a list of Nodes. The forest constructor creates a singleton forest from one Node; or it creates a forest from two forests by concatenating the two forests:

Forest = Node | Sequence<Node>
forest : (Node) -> Forest
function forest(Node n) = Sequence<n>
union : (Forest, Forest) -> Forest
function union(f1, f2) = list-append f1 f2
bagunion : (NodeBag, NodeBag) -> NodeBag
setunion : (NodeSet, NodeSet) -> NodeSet

Similar constructors would exist for bags with and without duplicates.

unordered : (Forest) -> NodeBag
unique : (NodeBag) -> NodeSet
set = unique o unordered : (Forest) -> NodeSet
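The constructors named in this issue can be approximated with a short sketch. It assumes that nodes are hashable values, that a forest is a Python list, that a bag is a collections.Counter, and that a set is a frozenset; these representations are choices made for this illustration and are not prescribed by the data model.

```python
from collections import Counter
from typing import Hashable, List, Sequence

Node = Hashable  # hypothetical stand-in for a data model node


def forest(n: Node) -> List[Node]:
    """Create a singleton forest from one node."""
    return [n]


def union(f1: Sequence[Node], f2: Sequence[Node]) -> List[Node]:
    """Concatenate two forests, preserving order (list-append)."""
    return list(f1) + list(f2)


def unordered(f: Sequence[Node]) -> Counter:
    """Forget order but keep duplicates: a bag of nodes."""
    return Counter(f)


def unique(bag: Counter) -> frozenset:
    """Forget multiplicity: a set of nodes."""
    return frozenset(bag)


def to_set(f: Sequence[Node]) -> frozenset:
    """The composition set = unique o unordered."""
    return unique(unordered(f))
```

Note that union never produces a nested forest: concatenating two forests yields a flat list, consistent with the remark that arbitrarily nested collections would probably not be supported.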

Added section on Collections .

Text Nodes

An alternative representation is to have a single TextNode whose base type is string:

text-node : (xs:string, S, Sequence<Node>) -> TextNode

This representation is more closely aligned with other node types in the data model, but it makes the simple type of leaf-node values opaque.

Peter Fankhauser compares and contrasts these options in: http://lists.w3.org/Archives/Member/w3c-xml-query-wg/2000Apr/0174.html (members only).

decision record (members only).

Node vs edge centric data model

Cite:

Let me summarize my issues with a node-centric datamodel right at the beginning. The first two are mentioned in the doc later on:

As long as (1) the data represents a tree, (2) easy bi-directional is not required, (3) projection/extension operations with object-preserving semantics are not required, a node-centric datamodel is isomorphic to an edge-centric datamodel and is easier to represent and understand.

As soon as any one of the above requirements changes, an edge model has several advantages:

data represents a graph: naming the edges (relationships) becomes a must, since the names are now on the relationships and not on the objects. Uniform treatment of all edges (even the so far anonymous containment edges) makes defining operations easier since they are more orthogonal. With the possibility of distinguishing "type" from "name", even subelement names now semantically represent relationship names. For example, ShipAddr and BillAddr in

are denoting relationships (ownership to be exact) from the order element to the Address elements.

As soon as backwards pointers are introduced into a node-centric model, the representation becomes more complex and less elegant. Transforming data becomes more complex since the backwards pointer becomes part of the object state. Thus, if I define views where an element changes the parent, in the edge-centric case, this just adds a new relationship, the object state is unchanged, in the node-centric approach, I need to express now two parents in the object state.

Projection/extension operations. Assume that I pose a query that projects name and address but hides the age of a person element. In the edge-centric approach, this means that the query logically transforms the graph context on which the query operates by removing the age edge from the context without touching the object state (the object keeps its basetype); in the node-centric approach, the object state needs to change since the context transformation will remove the attribute property age. While both operations transform the context, I find the former to be more elegant than the latter.

MF: To align with XPath 1.0 and the Algebra, the data model is node centric.

Schema info

Cite:Sometimes one wants to use different schemata over the same basic XML fragment. So I would rather start with that in principle, the data model is schemaless and can provide the data model of any XML fragment given a schema. Thus, the schema postprocessing becomes a datamodel transformation that we make explicit (and that could be optimized with other operations that transform the datamodel graph).

MN: This was an alternate design to the named typing proposal. With the acceptance of named typing this issue is closed. When using a different schema over the same XML fragment (by invoking validate, see ) a new instance is constructed and there is no need for postprocessing.

Node identity

Should the data model require that an implementation guarantee that the identity of a node is always preserved?

MF: The data model always preserves node identity; the only operator that does not preserve node identity is copy.

Access to facets

In XML Schema, facets such as "nullable" are associated with an element declaration, which is an element name, complex type pair. If the query language needs access to such facets, we may need to replace ReferenceNode by a reference to the element declaration.

MN: The data model represents named types as expanded-QNames. This way it supports type-related operations, but the semantics of such operations and the information that must be available to support them (e.g. facets) is described in the .

Representation of reference values

Cite: The current representation of reference values is too much IDREF(S) centric. I would prefer a more general representation for XLink and the schema (and potentially graph operation) introduced reference mechanisms.

JM: Removed reference nodes.

Equality operators on collections

Equality operators '=' on collections are not defined.

MF: Added Functions (subsequently removed to Functions and Operators spec).

Elements with unordered children

Should the element constructor element-node also permit bags of children?

MF: decision to use sequences everywhere in data model.

Semantics of value equality operator '='

The semantics of the value equality operator '=' are undefined.

JM: Defined in the Functions and Operators document.

PSV Infoset Mapping - undefined terms

Code is undefined.

Defined in .

Relationship between Ordered and Unordered collections

The relationship between ordered and unordered collections is not specified. Any ordered collection can be treated as an unordered collection.

Unordered collections removed.

Representation of lists of IDREFS and NMTOKENS

How are IDREF lists and NMTOKEN lists represented in the data model?

JM: Duplicate of .

Element constructor that performs schema processing

An alternative is to separate element construction from schema validity assessment. The element constructor would construct an element corresponding to an element information item in the Infoset before schema validity assessment. To produce elements with types, the schema-process function would schema process an element with respect to a schema type to yield a new element with the full PSV infoset. The schema-process function would ignore any type information on attributes and elements and would assess the untyped value with respect to the given type.

element-node : (xs:QName, Sequence<NamespaceNode>, Sequence<AttributeNode>, Sequence<ElementNode | ProcessingInstructionNode | TextNode | CommentNode>) -> ElementNode
schema-process : (ElementNode | AttributeNode, SchemaComponent) -> ElementNode | AttributeNode

MN: The working groups decided not to pursue this alternate design. It is noted though that using the current constructors with the type set to either xs:anyType or to a more specific type makes them similar to either the first or the second function proposed here.

Semantics of copy

Deep copy on a node is defined only informally. For example, does deep copy preserve base URI?

JM: Obsolete - copy function removed (not used).

Declared vs. In-scope namespaces

Currently, an element node preserves its declared namespace nodes, not its in-scope namespaces. Members of the XSLT WG point out this may make it impossible to determine the meaning of data-model values that refer to the default namespace. This is a big, nasty problem.

JM: Element constructor uses in-scope namespaces.

Abstraction of Run-time type information

The representation of run-time type information is very concrete—it's the data model representation of a Schema type. The XPath task force would like a more abstract representation of runtime type that is not bound so tightly to XML Schema. This is an open design problem.

MN: The data model currently represents named types in a more abstract manner, as expanded-QNames.

Support for document repositories

Many people would like to see support for document repositories in XPath 2.0 with a corresponding notion in the data model. A document repository is easy to model as a sequence or bag of document nodes. It may have some additional properties, such as, for an ordered repository, an order among all the nodes in the repository.

NW: We now have fn:collection(), fn:input(), and fn:document() functions. Ashok and I believe that these functions satisfy the requirements raised by this issue.

Support for Schema-invalid documents

In its current state, the data model clearly does not cover schema-invalid documents: section 3.3 says "We assume that the element is an instance of the type represented by Def-Type, i.e., the document 'type checks' or is valid with respect to the given schema."

I believe we may wish to extend / modify the data model to specify that:

if the element is marked valid (i.e. if the [validity] property for the element information item has the value "valid"), then we assume that the element is an instance of the type represented by Def-Type

otherwise, if the element is marked invalid (i.e. the [validity] property has the value "invalid" or "notKnown"), and if the element has neither attributes nor child elements, then we assume [observe] that the element is an instance of the type anySimpleType

otherwise, we assume that the element is an instance of the type anyType

This would allow / require schema systems to be robust in the face of invalid documents. At first glance, that seems like a win.
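The three-way rule proposed above can be written as a small decision function. This is only a sketch of the proposal as stated in this issue: the function name assumed_type and its parameters are hypothetical, the declared type stands in for Def-Type, and the xs prefix is used for the two built-in type names.

```python
def assumed_type(validity: str, declared_type: str,
                 has_attributes: bool, has_child_elements: bool) -> str:
    """Return the type an element is assumed to be an instance of.

    validity is the [validity] property: "valid", "invalid" or "notKnown".
    declared_type stands in for the type represented by Def-Type.
    """
    if validity == "valid":
        # A valid element is an instance of its declared type.
        return declared_type
    if not has_attributes and not has_child_elements:
        # Invalid or not-known element with no attributes and no child elements.
        return "xs:anySimpleType"
    # Every other invalid or not-known element.
    return "xs:anyType"
```

For example, an element whose [validity] is "notKnown" and that carries an attribute falls through to xs:anyType, whatever its declared type.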

The data model answers the above issue in the following way: Section 3.3 specifically states that the data model supports incompletely validated documents. Section 3.4 (formerly 8.1) describes a function used in the pseudo-code segments that maps the elements with the different [validity] properties to the appropriate types. The mapping is in line with the solution suggested by the issue above. For further details see the discussion thread and decision record (members only).

Types of Sequences

Should sequence values carry their type as do simple typed values and element and attribute nodes?

NW: Given that sequences are heterogeneous, the editors feel that this issue no longer makes any sense.

Schema Component Values vs. Nodes

If schema component values become nodes, then does that mean they can occur anywhere in a document tree? I.e., can they be children of other nodes? What does this mean when a data model is serialized as a document?

MN: The data model represents named types as expanded-QNames. They can be associated with element nodes, attribute nodes and atomic values. Consequently the question raised by the issue no longer applies.

Lexical representation of simple-typed values

Given a simple-typed value, it may be necessary to recover its lexical representation, for example, when creating a text node that contains the value. It is not always possible to compute a unique lexical representation of a simple typed value.

Changed the definition of the string-value accessor to return the canonical lexical representation of the simple typed value (as defined by XML Schema Part 2: Datatypes). decision record (members only)

Whitespace handling

Whitespace handling needs to be more explicit. In the presence of a schema we have full knowledge of which whitespace is significant and which isn't, and can either mark whitespace as insignificant (and thus exclude it from text() and string-range() for instance), or automatically suppress whitespace in the data model. The former is appropriate given the dual representation of text nodes and values, the latter is appropriate if we only expose values.

Closed. http://lists.w3.org/Archives/Member/w3c-xsl-query/2002Oct/att-0275/01-2002-10-16-pres.html.

Use of Reference Nodes

Reference nodes may be part of the data model, but will never appear from a mapping from the infoset. In addition they cannot be serialized. Without these two features there doesn't seem to be much point in having them. Should we leverage an existing syntax (e.g. IDREFS) or design a new syntax to represent them?

Removed Reference Nodes.

Base URI is a property of element nodes

With external entities, and now with XML Base, the base URI can be scoped to various parts of the document. A base URI property should be added to Element Nodes, and the constructor and infoset mapping updated. Otherwise relative URIs in content cannot be correctly resolved.

Processing instructions also require an infoset-derived base URI. The base URI of attributes, for instance, should probably not be the empty sequence, if that does not adequately imply that the base URI of the element should be used instead.

Closed. http://lists.w3.org/Archives/Member/w3c-xsl-query/2002Oct/att-0275/01-2002-10-16-pres.html.

Schema component does not reveal [content] property

Schema component does not reveal [content] property of XML Schema: Formal Description schema component. MF: Problem with revealing [content] property is that we/Schema/Query have to agree on syntax for component content (Sec 2.2.1 in XML Schema: Formal Description).

Keys and key references not represented

Note that the data model does not currently represent key values and key reference values as described in XML Schema Part 1 : Structures . In a future draft of this document, keys and key references will be represented in the data model.

Not in V1. http://lists.w3.org/Archives/Member/w3c-xsl-query/2002Oct/att-0275/01-2002-10-16-pres.html.

Unclear relationship between values passed to the constructor, and those returned by the accessor

http://lists.w3.org/Archives/Member/w3c-xsl-query/2001Apr/0312.html (members only). Asks for inference rules, especially for the constructor, describing when values returned by an accessor are the same as those set by the corresponding constructor. Especially unclear are when adjacent text nodes are collapsed, base URI and namespace declarations.

Overtaken by events; constructors have been removed.

Interaction of insignificant whitespace with comments

http://lists.w3.org/Archives/Member/w3c-xsl-query/2001May/0053.html (members only). Clarify whether whitespace is classified as insignificant before or after PI and comment removal.

NW: insignificant whitespace is orthogonal to PIs and comments. http://lists.w3.org/Archives/Member/w3c-xsl-query/2002Sep/0043.html

Eliminate heterogeneous sequences

http://lists.w3.org/Archives/Member/w3c-xsl-query/2001May/0054.html (members only), http://lists.w3.org/Archives/Member/w3c-xsl-query/2001May/0048.html (members only). Simplify operations such as distinct() by disallowing sequences mixing nodes and simple typed values. Suggests converting nodes in such a heterogeneous sequence to their typed values.

Heterogeneous sequences are here to stay.

Support for abstract types

xsd:anySimpleType (and other abstract types) are not supported in the data model. The string-value of an xsd:anySimpleType value is apparently the empty sequence - indicating that you can't really do useful operations on anySimpleType'd values. However, we apply a default schema which assigns xsd:anySimpleType, resulting in a proliferation of these types. How can we operate on these documents? Should we at least return a non-empty string-value() for an anySimpleType? Should we define additional operations such as +, *, for compatibility with XPath 1.0?

MN: The data model now supports xs:anySimpleType and xs:anyType. The typed-value() of elements or attributes of this type is the string-value() as xs:anySimpleType. The semantics of the operations on atomic values of type xs:anySimpleType is defined in and .

Axis functions

Define (somewhere other than the data model document?) axis functions for non-primitive axes like descendants-or-self.

NW: This issue seems out of place in the DM document. Closed with no action.

XPath 1.0 treatment of non-unique IDs

From XPath 1.0: "If an XML processor reports two elements in a document as having the same unique ID (which is possible only if the document is invalid) then the second element in document order must be treated as not having a unique ID." This has not been incorporated into this document.
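The XPath 1.0 rule quoted above amounts to "the first element in document order wins". A minimal sketch, assuming elements are supplied in document order as (element, ID value) pairs; the function name build_id_index and the pair representation are hypothetical:

```python
from typing import Dict, Iterable, Tuple


def build_id_index(elements: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each ID value to the first element, in document order, that has it.

    Later elements reporting an already-seen ID are treated as not having
    a unique ID, so they never replace the earlier entry.
    """
    index: Dict[str, str] = {}
    for element, id_value in elements:
        index.setdefault(id_value, element)
    return index
```

With the input E1 (ID "x"), E2 (ID "y"), E3 (ID "x"), only E1 is reachable through the ID "x".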

Overtaken by events; the unique-ID() accessor no longer exists. This is no longer a data model issue.

Parent of namespace nodes

In XPath 1.0 namespace nodes have a parent. Should we adopt the XPath 1.0 behavior, the current behavior (no parent), or some other parent (e.g. the document)?

Closed by Mike Kay's namespace proposal

Setting and examining construction flags

[Jim] found [him]self wondering how those flags (parameters) get set/passed. More importantly, can a process ask whether "this instance" of the data model has those flags set or not? If so, how? If not, why not?

Overtaken by events. http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Jan/0251.html

Document node permissiveness unnecessary

What are the use cases for being permissive about children of the document node? According to the note above, sequences of elements do not need a container node. Suggest removing this permissiveness.

MN: The working groups decided to keep the document node permissiveness to support "well-balanced" trees. decision record (members only).

System Id and Public Id are not exposed

In our model, there is no way for a query to determine what DTD is relevant for the data model instance. That seems like a piece of information that might be wanted occasionally (though probably not often).

The WGs decided not to add this functionality.

Treatment of common accessors inconsistent

In some cases, when an accessor is inappropriate for a node, we omit that accessor. In other cases, we define it to always return the empty sequence. We need to rationalize these two design patterns.

Use empty sequence throughout. decision record (members only).

Unable to construct an element with unique ID

The unique ID property is defined on an element node, but is a function of an attribute information item. When an element node is constructed it is given an attribute node - not an info item. An attribute node is insufficient to remember the appropriate properties from the infoset in order for the element constructor to detect when an attribute is an ID declared in the DTD.

Overtaken by events. http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Jan/0253.html

Text nodes are not W3C-normalized text

The Character Model for the World Wide Web version 1.0 working draft defines W3C-normalized text. The algorithm for constructing text nodes from character information items does not perform normalization to this form. Should it?

The bottom line is that a well-formed document can contain unnormalized text, and therefore our data model can also contain unnormalized text. If normalization happens, it's not done by the data model.

Value of derived types?

In XML Schema, the type unsignedInt is a simple type, but it's a derived type (not a primitive type); the nearest corresponding primitive type is decimal. If I were to invoke the value accessor with something analogous to "value(cast as unsignedInt('10'))", does it return a decimal value, and is the SchemaComponent returned by type a component that describes decimal? I do not believe that this is a non-issue because the only types supported in the data model are XPath 1.0's types (string, number, boolean, and node-set), since this document makes a strong point about using all of Schema's simple types. Thus, does value return the primitive value, or does it return something like the schema-normalized value of the specified simple type?

Simple typed values encapsulate both the value and the type. The type of an unsignedInt is the SchemaComponent corresponding to the unsignedInt. The string representation on the other hand is the value's canonical lexical representation for the base type (which is a primitive type) as specified by XML Schema. Decision record (members only).

Should errors be allowed in sequences?

Generating sequences in which one or more members are Error may sometimes be a useful way to produce partial results that can satisfy some sorts of queries.

MN: The error value has been removed from the data model. This issue is now moot.

Imprecise behavior of errors

I am very concerned that "How the error value is handled in a query processor is implementation-defined." I think we have to do a bit better than this, although leaving flexibility for implementations to do more for their users is a Good Thing.

MN: The error value has been removed from the data model. The error behavior is described in .

Alternative design of schema components

MF cite James: An alternative would be to have an extends accessor that returns deref([base]) if [derivation] is extension and empty sequence otherwise, and a restricts accessor that returns deref([base]) if [derivation] is restriction and empty sequence otherwise.
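The two proposed accessors can be sketched as follows. The TypeDef class and its fields are hypothetical stand-ins for a schema component with [base] and [derivation] properties; a Python list models a sequence, so the empty list plays the role of the empty sequence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TypeDef:
    """Hypothetical schema component with [base] and [derivation] properties."""
    name: str
    base: Optional["TypeDef"] = None
    derivation: Optional[str] = None  # "extension" or "restriction"


def extends(t: TypeDef) -> List[TypeDef]:
    """Return the base type if t is derived by extension, else the empty sequence."""
    if t.derivation == "extension" and t.base is not None:
        return [t.base]
    return []


def restricts(t: TypeDef) -> List[TypeDef]:
    """Return the base type if t is derived by restriction, else the empty sequence."""
    if t.derivation == "restriction" and t.base is not None:
        return [t.base]
    return []
```

For any given type at most one of the two accessors returns a non-empty result, since a type has a single derivation method.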

MN: This proposal is not in the scope of the data model: The data model represents named types as expanded-QNames. The semantics of the type operations and the information that must be available to support them is described in the .

Relative order of free-floating nodes

Are newly-constructed nodes in any particular order, such as some kind of document order? Does the order of these nodes have any relationship to the document order of the "input" data model instance? In fact, is the process being described properly characterized as "create a new data model instance from information derived from an existing data model instance", or something similar? Can more than one "new" document instance be created by a single query? In the fourth paragraph [Section 4], we see the phrase "the document node"— is this the "existing document"'s document node, the "new document"'s document node, both, neither? Can more than one "existing" document instance be the source of information for a query?

The relative order of free-floating nodes is implementation-dependent but stable.

Document order of shared namespace nodes

Section 3.2 states that in the document ordering, the namespace nodes of an element follow the element but precede its attributes. This is inconsistent with the idea, suggested but not spelled out in 4.4, that a namespace node can be shared by several elements. In fact, the question of namespace node identity is not really tackled. My view is that namespace node identity should be determined by the combination of (document identity, namespace prefix, namespace URI), that the parent of a namespace node should be the document node, and that namespace nodes should be ordered after every other node in the document. (This is easier for implementations than placing them at the start of the document, because the number of namespace nodes is not known until parsing is complete).

Closed by Mike Kay's namespace proposal

Element constructor copies nodes?

In section 4.2 Elements, the notion that the constructor makes a copy of the supplied child nodes seems strange. It's hard to square this with the definition of node identity. Also, I don't see why the provision is needed here, but not for the document node constructor. Wouldn't it be cleaner to define a precondition that all the child nodes supplied to the constructor must be parentless?

Overtaken by events; constructors have been removed.

Semantics of head and tail on empty sequences

Head would appear to be a partial function, it does not apply to empty sequences. If we follow the same conventions as elsewhere, that means head returns a Sequence(0,1)<item>, which perhaps begs the question as to how you extract the first member of this sequence...

Replaced the accessors head and tail by some of the sequence accessors defined in the Functions and Operators document.

Complex types with simple content

Currently, the typed-value of a complex type is the empty sequence. Should complex types with simple content (i.e. an element that contains only text but that also has an attribute) be treated differently than complex types with complex content?

Now typed-value corresponds to the schema normalized value PSVI property if that exists, or the empty sequence otherwise. This way if an element has a complex type with simple content its typed-value may be something other than the empty sequence. For further details see the discussion thread and decision record (members only).
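The resolution above can be read as a simple rule, sketched here under the assumption that an absent schema normalized value property is modelled as None and a sequence as a Python list. The function name typed_value is hypothetical, and the sketch deliberately leaves the value as a string rather than converting it to the element's type.

```python
from typing import List, Optional


def typed_value(schema_normalized_value: Optional[str]) -> List[str]:
    """Return the schema normalized value as a one-item sequence,
    or the empty sequence when the property does not exist."""
    if schema_normalized_value is None:
        return []
    return [schema_normalized_value]
```

Under this reading, an element with complex content (no schema normalized value) yields the empty sequence, while an element with simple content whose value is the empty string yields a sequence containing one empty string.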

Effect of xsi:nil

The XSL WG wishes xsi:nil="true" to result in a typed-value of the empty sequence. This allows the differentiation of a null string value and an empty string.

This is a duplicate of the resolved XPath Issue 0021: Handling of xsi:nil on Input. See the review of the XPath issues (members only). Accordingly closed this issue, updated the document to reflect the decision and added the related data model Issue-0071.

Lightweight Schema Components

Summary: The only necessary operation on Schema Components is instance-of(schema-component1, schema-component2). There is no need to expose the internal structure of Schema Components to enable this functionality. Instead we could provide an abstract Type object whose internal structure can be treated as a black box. See http://lists.w3.org/Archives/Member/w3c-xsl-query/2001Jul/0042.html (members only).

MN: The data model represents named types as expanded-QNames, which is in line with the suggestion of the issue.

Support for XSLT whitespace stripping

XSLT allows a stylesheet to designate elements whose whitespace is to be stripped. We need to support this in the data model, or possibly elsewhere.

Similarly, a data model instance for a stylesheet has provisions for stripping comments and processing instructions.

NW: this functionality does not need to be supported in the data model. http://lists.w3.org/Archives/Member/w3c-xsl-query/2002Sep/0071.html

Node constructors formalism of questionable value

Node constructors. I'm a bit concerned that this is lacking in rigor. Some of this is exposed in . The idea that the constructor for a parent node (Element or Document) takes a copy of the supplied children node doesn't seem to be fully worked out. What does "taking a copy" mean, where is it defined? It has to be a deep copy to make sense; it has to preserve its name, its type, and its children, but not its base URI or its node identity, and it acquires a new position in document order. What happens about the namespace nodes when an element or attribute is copied? Altogether, I'm worried that this idea of node constructors looks formal, but is actually just as informal as the 1.0 specification. It's actually a very procedural description, and I can't really see why it's needed: if it's intended as a target vocabulary for the formal semantics of the language, then it's a pretty shaky foundation. I'd be much happier with a model that only defines the valid states in the system; if we are going to define the permitted state transitions, we need to be much more rigorous about it.

Overtaken by events; constructors have been removed.

Pseudo-formalism provides no value

"The sequence-map function applies its first function argument to each member of its second sequence argument and returns a new sequence containing the result of applying the function to each member of the sequence." I really wonder whether it is a good idea to define the data model using a pseudo-formal language that we make up and half-explain as we go along? If we can't use an existing formal notation like Z or VDM that has a good specification we can reference, we should do the whole thing in English.

NW: The pseudo-formalism has largely been removed. Closed with no action ("overtaken by events").

Sharing namespace nodes

We need to say something about the identity of namespace nodes, and about the fact that two namespace nodes for the same prefix and uri may need to be combined when an element is added to a document. Also "the element constructor logically creates a copy of all of its namespace..." - namespace nodes do not need to be copied(?)

Closed by Mike Kay's namespace proposal

No access to prefix on free-floating attributes

An observation: if an attribute node has no parent element (a floating attribute), then there is also no access to any namespace nodes. This makes it impossible to support the XPath 1.0 name() function, which returns a lexical QName for the node by finding a prefix that maps to the node's namespace URI.
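The difficulty can be illustrated with a minimal sketch. The helper `lexical_name` and the dictionary representation of in-scope namespaces are hypothetical, introduced only for illustration; they are not part of the data model. The sketch assumes that a prefix can only be recovered from the namespace nodes of a parent element.

```python
from typing import Dict, Optional


def lexical_name(local_name: str,
                 namespace_uri: str,
                 in_scope: Optional[Dict[str, str]]) -> Optional[str]:
    """Hypothetical stand-in for the XPath 1.0 name() function on an attribute.

    in_scope maps prefix -> namespace URI for the parent element,
    or is None when the attribute has no parent (a floating attribute).
    """
    if not namespace_uri:
        # An attribute in no namespace needs no prefix.
        return local_name
    if in_scope is None:
        # No parent element, hence no namespace nodes to search.
        return None
    for prefix, uri in in_scope.items():
        # Attributes cannot use the default namespace, so skip the empty prefix.
        if uri == namespace_uri and prefix:
            return f"{prefix}:{local_name}"
    return None


XLINK = "http://www.w3.org/1999/xlink"
with_parent = lexical_name("href", XLINK, {"xl": XLINK})
floating = lexical_name("href", XLINK, None)
```

Under these assumptions the attribute with a parent yields a lexical QName, while the floating attribute yields nothing, which is the gap this issue describes.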

[JM: Sounds like we need a QName object that is a triple (local-name, namespace-uri, and prefix) to support XSLT.]

Closed by Mike Kay's namespace proposal

Namespace fixups required

The current XSLT draft includes a substantial piece of text on namespace fixup. This was introduced in the XSLT 1.1 working draft, and is basically designed to ensure that when an element or attribute is added to a result tree, namespace nodes are also added to declare any namespace URI used by the element or attribute. (At XSLT 1.0, this was handled at serialization time, but this had to change when temporary trees became accessible to the stylesheet). The description of namespace fixup logically belongs with the description of the node construction process in the data model document. http://lists.w3.org/Archives/Member/w3c-xsl-query/2001Oct/0036.html (members only).

Closed by Mike Kay's namespace proposal

Is prefix preserved?

Although XSLT and the XPath 2.0 data model agree that element and attribute nodes do not hold a namespace prefix, XSLT has always hinted that prefixes might be preserved through a transformation where possible. http://lists.w3.org/Archives/Member/w3c-xsl-query/2001Oct/0036.html (members only).

[JM: the ability to recover the lexical value of an xs:QName simple type seems useful, perhaps even necessary. It is at least needed to support the name() function. A better description of the xs:QName accessors is required, perhaps unifying xs:QName and expanded-QName.]

No, but you can get back a reasonable prefix from the namespace nodes.

http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Mar/0013.html

Including type definition in element constructors

The constructor for the element node probably should include the type definition in the constructor as well, for cases where the [type definition] of the element information item is not the same as the [type definition] of the [element declaration], which can occur if xsi:type is used in the document. Should the same be done for attribute node constructors for consistency, (even though attributes cannot have xsi:type attributes)? http://lists.w3.org/Archives/Public/www-xml-query-comments/2001Sep/0012.html.

MN: The definitions of the element and attribute constructors have been changed to take an xs:QName that represents the type definition and not the element or attribute declaration. The type definition can be either the [type definition] of the element information item or the more specific [type definition] of the [element declaration] when xsi:type is used.

Proposal for alternative constructor

The element and attribute constructors are not convenient when constructing elements or attributes with simple-type content. An alternative constructor that takes a typed value directly (instead of only text nodes or normalized string values) would simplify dependent specifications. http://lists.w3.org/Archives/Member/w3c-xsl-query/2001Nov/0068.html (members only).

Add the two new constructors proposed by Mary with the change that the element/attribute declarations are not optional and they are named differently (so we won't overload the constructors). decision record (members only)

SchemaComponent support for substitution groups

SchemaComponents currently don't expose any substitution group information. http://lists.w3.org/Archives/Member/w3c-xml-query-wg/2001Nov/0358.html (members only).

Align function call syntax

The syntax of the function calls needs to be aligned with the other working documents, in particular the Functions and Operators and the XQuery documents. Possible candidates are f:(T1,T2)->T or f(T1 $v1, T2 $v2)=>T.

MN: The pseudo-calls and examples now use the XQuery syntax.

Retaining the type of a sequence of heterogeneous simple typed values.

When a sequence of heterogeneous simple typed values is passed as the content of an element constructor, is the type information associated with the simple typed values preserved? For example, when using a sequence containing a decimal, a string and a boolean, are these types preserved (or possibly changed to anySimpleType*)? Note that XML Schema cannot describe the former. Related to this, should we ever allow an element constructor to create an instance that cannot be serialized as XML text without loss of type information?

MN: As a result of the adoption of Alternative 1 for element/attribute constructors, the data model no longer provides constructors accepting a sequence of atomic values as arguments. The current constructors can only create instances whose type can be described by XML Schema.

Canonical form for derived types.

Should derived types have a canonical form? Should we ask XML Schema to fix this?

When you derive, you restrict its value space. The canonical form maps from the value space to a lexical form. Hence the canonical form is the mapping from the restricted value space to the lexical form.

Note, however, that if the derived type is restricted with a pattern, the resulting lexical form may not validate.
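A minimal sketch of this caveat follows. The derived type (xs:integer restricted to a two-digit pattern) is a hypothetical example, and the helper names are assumptions made for illustration only.

```python
import re


def canonical_integer(value: int) -> str:
    """Canonical lexical form of an xs:integer value: no leading zeros, no '+'."""
    return str(value)


# Hypothetical derived type: xs:integer restricted by the pattern [0-9]{2}.
TWO_DIGIT_PATTERN = re.compile(r"[0-9]{2}")


def validates(lexical: str) -> bool:
    """True if the lexical form satisfies the pattern facet of the derived type."""
    return TWO_DIGIT_PATTERN.fullmatch(lexical) is not None


original = "07"                         # valid against the derived type
canonical = canonical_integer(int(original))
original_ok = validates(original)       # the original lexical form
canonical_ok = validates(canonical)     # the canonical form of the same value
```

Here the value 7 is in the restricted value space, but its canonical form no longer matches the pattern, so the canonical lexical form would not validate against the derived type.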

http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Mar/0013.html

Should the name accessor return "" or ()?

Should the name accessor return "" or () for nodes that have no name?

Returns ().

Magic Attributes

Should the following attributes be represented in the data model? xmlns attributes, xml:lang, xml:space, xsi:nil (etc). If so, how?

Namespace declarations do not appear in the attributes property. All of the other attributes except xsi:nil do.

As of Dec 2002, xsi:nil also appears in the attribute list. It is a normal attribute. In valid documents, it has the side-effect of setting the nilled property.
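The resolution can be sketched roughly as follows. The function `model_attributes` and its dictionary-based input are hypothetical, and the sketch assumes that the xsi prefix is bound to the XML Schema instance namespace and that "valid" is supplied as a simple flag.

```python
from typing import Dict, Tuple


def model_attributes(raw: Dict[str, str],
                     validated: bool) -> Tuple[Dict[str, str], Dict[str, str], bool]:
    """Split lexical attributes into namespaces, the attributes property, and nilled.

    raw maps lexical attribute names to values, as written in the start tag.
    """
    namespaces: Dict[str, str] = {}
    attributes: Dict[str, str] = {}
    for name, value in raw.items():
        if name == "xmlns":
            namespaces[""] = value                      # default namespace declaration
        elif name.startswith("xmlns:"):
            namespaces[name[len("xmlns:"):]] = value    # prefixed declaration
        else:
            attributes[name] = value                    # xml:lang, xml:space, xsi:nil, ...
    # xsi:nil is an ordinary attribute; only in valid documents does it set nilled.
    nilled = validated and attributes.get("xsi:nil", "").strip() in ("true", "1")
    return namespaces, attributes, nilled


raw = {
    "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "xml:lang": "en",
    "xsi:nil": "true",
}
namespaces, attributes, nilled = model_attributes(raw, validated=True)
```

Namespace declarations end up only in the namespaces map, while xml:lang and xsi:nil remain in the attribute list.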

Lexical representation of Schema primitive types

Unfortunately the XML Schema Datatypes Rec doesn't detail the canonical lexical representation of all of the primitive types. In particular, no canonical lexical representation is specified for:

xs:string, xs:base64Binary, xs:anyURI (but that's OK, I think we can guess)

xs:duration - presumably the lexical representation contains all components of the duration (years, months, days, hours, minutes and seconds), even those that occur 0 times? Or are these omitted? In the latter case, what's the canonical lexical representation of PT0S? Since the number of seconds can be a decimal, is this decimal represented with a decimal point (i.e. using the canonical lexical representation for xs:decimal)?

xs:date - what happens to the timezone component? Presumably, unlike xs:dateTime and xs:time, this isn't normalized to Z? (And similarly for xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gMonth, and xs:gDay.)

xs:QName and xs:NOTATION - these are the trickiest (their value spaces are the same). The XML Schema Rec states that the lexical representation of a QName depends on the in-scope namespaces. Does this mean the ones in the query/stylesheet or the ones from the source document? What if there's more than one namespace declaration for the namespace URI? What if there aren't any?

http://lists.w3.org/Archives/Public/www-xml-query-comments/2002Jan/0268.html

Closed. Use the casting tables in F&O: for xs:string, whatever the user put in; for xs:anyURI, whatever the user put in after whitespace handling; for xs:base64Binary, Schema says what to do (Schema 1.0 errata); for xs:duration, use what's in F&O casting table; for xs:date, with timezone has one form; for xs:date w/o timezone serialize without; for xs:QName, use casting table; for xs:NOTATION, not on the table (you can't build one); for xs:boolean = true/false.

http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Mar/0013.html

Whitespace normalization of the string-value of elements with simple content

Currently the data model says that the string value of an attribute node is normalized (according to the whitespace facet of its type); on the other hand, the string value of an element node is not normalized. The string value of elements with simple content should also be normalized in the same way as the string value of attributes. http://lists.w3.org/Archives/Public/www-xml-query-comments/2002Jan/0269.html

MN: The mapping from the PSVI to the data model has been updated to use the schema normalized value PSVI property (if it exists) for both attributes and elements.

Do we need Document fragments

Currently the Document node in the data model is permissive in the number of element nodes it can directly contain, whereas the infoset only allows a single element node. Since preserving the single element node constraint is important for ensuring that queries generate only well-formed documents, the question is whether the data model needs to introduce a permissive docfragment node and keep the document node non-permissive. See http://lists.w3.org/Archives/Member/w3c-xsl-query/2002Feb/0111.html (members only).

The constraints on document nodes have been relaxed, so fragments would be redundant. http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Jan/0256.html

Support for unparsed entities

XSL WG has a requirement to access unparsed entities in a document. This is needed to support the XSLT 1.0 unparsed-entity-uri() function, and the unparsed-entity-public-id() function which we are adding for XSLT 2.0. In XSLT 1.0, unparsed entities were not described in the XPath data model, but in an XSLT addition to the data model. This solution is unsatisfactory: the data model should describe all the information that is available. The information is available in the Infoset so there is no architectural problem in providing it. There is room for debate about how it is best provided (we don't really want another node type if we can help it), but the information should be there. We are not proposing, at this stage, that the functions unparsed-entity-uri() and unparsed-entity-public-id() be moved from XSLT into XPath, though that can easily be done if people want them. The information about unparsed entities in the data model would therefore not be available to XQuery users or to XPath users outside an XSLT environment.

Closed: added data model accessors. http://lists.w3.org/Archives/Member/w3c-xsl-query/2002Oct/att-0275/01-2002-10-16-pres.html

PSVI to Type mapping supporting derived types

The last bullet in the definition of the PSVI to Type mapping does not handle derived types properly. In those cases we should not bottom out at xs:anyType, but attempt to use the rules all over again with the type from which the current type is derived. Also note that these rules need to be slightly different if we decide to use generated type identifiers when no type names are available.

The PSVI to Type Mapping has been extensively reworked since this issue was raised. I believe it is no longer an issue and the names of (globally declared) derived types are now properly represented in the data model. http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Jan/0257.html

PSVI to Type mapping dependence on conformance levels

The rules specifying the computation of the types from the PSVI do not yet reflect that this process depends on the conformance level of the processor. For Basic, are any type annotations preserved? The Query/XPath book says the data types are mapped to the nearest supertype, the Data Model needs to agree with this.

Closed with no changes. Changes to the data model may be required to implement processing model conformance level changes.

http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Mar/0013.html

Data model does not represent CDATA sections

It is not clear how the current data model can represent or accommodate CDATA sections. (This is also captured as Issue 0293 on the XPath issue list.)

Subsumed by query issue #293. http://lists.w3.org/Archives/Member/w3c-xsl-query/2002Oct/att-0275/01-2002-10-16-pres.html.

String-value vs. string-value of the typed-value

Note that the current definition of string-value($n) is such that it may differ from string-value(dm:typed-value($n)) for some element or attribute nodes $n. For instance given a node element-node(my:a,(),(),dm:text-node("01"),xs:integer), its string-value gives "01", but the string-value of the typed-value is "1".
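The discrepancy in this example can be reproduced with a small sketch. The three helper functions are hypothetical stand-ins for the accessors, and Python's int is assumed to approximate the xs:integer value space.

```python
def string_value(text: str) -> str:
    """String-value of the node: the text content exactly as it appears."""
    return text


def typed_value_as_integer(text: str) -> int:
    """Typed-value of the node when its type is xs:integer (approximated by int)."""
    return int(text.strip())


def string_value_of_typed_value(text: str) -> str:
    """String-value of the typed-value: the lexical form of the integer value."""
    return str(typed_value_as_integer(text))


content = "01"                                        # text node content of my:a
direct = string_value(content)                        # string-value($n)
via_typed = string_value_of_typed_value(content)      # string-value(dm:typed-value($n))
```

The two results differ because the leading zero survives in the text node but not in the integer value.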

The possible resolutions are that we (a) prohibit, (b) allow or (c) mandate string-value($n) to be string-value(dm:typed-value($n)). The current text reflects (a). Option (b) would result in non-interoperable implementations. Option (c) is promising. In that case the element constructor would need to be a little smart and change the text node to contain the string value of atomic-value-sequence applied to the string-value of the original text node and the type passed to the constructor. A similar change would need to be made to the attribute constructor.

Further clarifications of typed-value() and string-value() have been made in the document. It is not a goal to make string-value() be the same as string-value(typed-value()). Nor is it a goal to make them different. Jan 31, Mary, 097-02, xsl-query

http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Mar/0013.html

Typed value of Document, PI and Comment nodes

There is currently an asymmetry between the data model and the F&O documents. In the data model, the typed-value accessor on document, comment, PI nodes returns the empty sequence. In the F&O document, the data() function applied to a document, namespace, comment or processing instruction node, raises an error.

My intuition is that data() should be defined on all nodes, just as string-value() is defined. Raising an error on document, comment, and PI nodes seems draconian and has tripped me up in writing queries that iterate over a variety of nodes. If data() raises an error on such nodes, one ends up with a lot of code checking what kind of node is bound to a variable, etc.

Made consistent with F&O.

Schema-less documents with a DTD

If a document has a DTD, do we really have to lose the IDness of all ID attributes and other information that's available from DTD validation?

See Issue-0004.

Identifying element/attribute type

Note that currently the last bullet point cannot be reached from a legal PSVI because every element in a PSVI must have one of the combinations of properties listed. Elements whose type definition is anonymous still have a [type definition] property, it's just that the type definition's [name] is absent (the property exists, I think, but it has no value). Under the scheme above, such elements would have type whose namespace was the target namespace of the schema and whose name was nothing. http://lists.w3.org/Archives/Public/public-qt-comments/2002Aug/0018.html

The PSVI to type mapping now takes the existence of the [name] into consideration. http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Jan/0261.html

Distinction between {name} and [name]

I'm not sure what distinction you're making between [name] and {name}. As I understand it in the XML Schema spec, {name} is a property on a schema component while [name] is a property on an information item in the PSVI. Since you're talking about properties in the PSVI, I believe you should be using the notation [name] rather than {name}. http://lists.w3.org/Archives/Public/public-qt-comments/2002Aug/0018.html

This editorial issue has been resolved.

How are unparsed entities represented?

Suppose I have an attribute that has the type xs:ENTITY and the value "foo". How can I find out the public and system identifiers of the external unparsed entity "foo"?

Duplicate of .

Globally declared namespaces in the infoset

There is an open issue about how to map namespaces declared globally in the query prolog during the data model to infoset transformation.

This issue replaces an editorial note in the previous draft that said there was an issue.

Since constructors have been removed, this is now simply a matter of setting up the data model correctly.

Nodes returned by namespaces and attributes

Does the constraint that the same set of nodes is returned restrict the possibilities for implementation? Should this constraint be relaxed?

For instance, an XML 1.0 store such as a DOM stores namespace information as attributes, and has no mechanism for undeclaring arbitrary namespaces. If a series of constructor functions is called to construct a data model instance that has fewer namespaces on children than on a parent element, the store will be unable to represent this, and might return extra namespace nodes. I claim (without proof :-) that these extra namespace nodes are harmless, and constraining implementations in this way is a burden.

Overtaken by events; constructors have been removed so the constraint no longer exists.

base-uri should return ()?

Calling base-uri on a node that has no (transitive) base URI property raises an error. Would it be better to return ()?

See http://lists.w3.org/Archives/Member/w3c-xsl-query/2002Oct/att-0275/01-2002-10-16-pres.html.

NW: The empty sequence is a better and more consistent answer than an error.
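A minimal sketch of the resolved behaviour, under stated assumptions: the `Node` class is hypothetical, "transitive" is read as inheriting the property from the nearest ancestor that has one, and Python's None stands in for the empty sequence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical node with an optional base URI property and an optional parent."""
    base_uri_property: Optional[str] = None
    parent: Optional["Node"] = None


def base_uri(node: Node) -> Optional[str]:
    """Return the nearest base URI on the node or its ancestors, else None.

    None stands in for the empty sequence (); no error is raised.
    """
    current: Optional[Node] = node
    while current is not None:
        if current.base_uri_property is not None:
            return current.base_uri_property
        current = current.parent
    return None


document = Node(base_uri_property="http://example.org/doc.xml")
child = Node(parent=document)
orphan = Node()
```

Callers can then test for an absent result rather than trapping an error.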

Content type is not preserved

Calculating the value of element content whitespace requires knowing the element's content type which doesn't seem to be available.

The element content whitespace property of an infoset constructed from the data model is always false.

How is typed-value calculated?

I don't think the spec is clear enough on how typed-value is calculated. I think the following proposal would work, but perhaps I'm wrong or perhaps we've already got another proposal somewhere else.

If the element has a schema normalized value property:

() if the property is absent

Otherwise the result of casting the schema normalized value to the appropriate type (the element type or its content type).

Otherwise ().
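The proposal above can be sketched as follows. The function `typed_value`, its `cast` parameter, and the use of None for an absent property are assumptions made for illustration; a Python list stands in for a sequence, and list types (which would yield several items) are ignored.

```python
from decimal import Decimal
from typing import Callable, List, Optional


def typed_value(schema_normalized_value: Optional[str],
                cast: Callable[[str], object]) -> List[object]:
    """Compute a typed-value from the schema normalized value property.

    schema_normalized_value is None when the property is absent.
    cast converts the lexical form to the element type or its content type.
    """
    if schema_normalized_value is None:
        return []                                # the empty sequence ()
    return [cast(schema_normalized_value)]       # a singleton sequence


absent = typed_value(None, Decimal)
present = typed_value("1.50", Decimal)
```

An element with complex content has no schema normalized value, so it yields the empty sequence, while an element of complex type with simple content yields a typed atomic value.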

NW: AM provided new text for the typed-value accessor for the 17 Jan draft.

Documents can be empty

I notice that there's a requirement that a document node has at least one child. This seems to have been in previous drafts, but I overlooked it. In XSLT it's legal to create a document node with no children.

http://lists.w3.org/Archives/Member/w3c-xsl-wg/2002Nov/0005.html

Constraint removed. http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Jan/0260.html

Support for substitution groups

I suggest that we add a dm:substitution-groups() accessor to the data model that returns a sequence of xs:QNames derived from the {substitution group affiliations} of the [element declaration] reported for the element and use this list to work out whether an element "has been validated and found to be a member of a substitution group whose head element has the required name" rather than guessing on the basis of the element's name (and type).

http://lists.w3.org/Archives/Member/w3c-xsl-query/2002Nov/0007.html.

Return the element declaration and schema context in element-declaration()

http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Mar/0013.html

Anonymous type names must be expanded QNames?

Does the DM have to return QNames for anonymous types, or is it sufficient to return any{Simple}Type when the underlying type is anonymous. Or should the closest non-anonymous supertype be returned? What does the Formal Semantics need? (MFF has action to investigate from the 21 Jan 2003 XPath TF telcon.)

Reworded. http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Jan/0435.html

dm:typed-value of complex types

The F&O document says fn:data() on nodes of complex type raises an error. What should DM say?

Closed by TF revisions of F&O and DM. http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Feb/0274.html

Namespaces 1.0 or 1.1?

Are we building the document model on top of Namespaces 1.0 or 1.1? In Namespaces 1.1, namespaces may be undeclared.

Closed with no action in the data model. Will be dealt with at the language level.

http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Mar/0013.html

Expose validation in DM?

At present, there's no way to tell from the DM what if any validation was performed. Should it be possible to tell?

Not in this version. http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Jan/0264.html

Unparsed-entity-* in DM?

There are accessors for unparsed-entity-public-id() and unparsed-entity-system-id(), but there aren't actually any properties in the DM to support them. Nor is there any description of how they're obtained from the Infoset. This is probably editorial.

Overtaken by events; constructors have been removed so these are simply properties of the data model.

Type information for xsi:schemaLocation, xsi:type, etc.

W3C XML Schema always allows xsi:schemaLocation, xsi:type, etc. What are the types of those attributes in the data model?

They are handled correctly by schema. They will be normal attributes in untyped documents.

Relationship between xs:anySimpleType and xdt:untypedAtomic

We have taken it upon ourselves to invent a new type, xdt:untypedAtomic, which never occurs in a PSVI. Having done this, it becomes very important to specify clearly the relationship between this type and xs:anySimpleType, which can occur in a PSVI.

Closed with following change: Modify data model so that well-formed documents are aligned with XML Schema skip validation, i.e., attribute nodes from well-formed documents are annotated with xs:anySimpleType.

http://lists.w3.org/Archives/Member/w3c-xsl-query/2003Mar/0013.html