W3C

XQuery 1.0 Formal Semantics

W3C Working Draft 26 March 2002

This version:
http://www.w3.org/TR/2002/WD-query-semantics-20020326/
Latest version:
http://www.w3.org/TR/query-semantics/
Previous versions:
http://www.w3.org/TR/2001/WD-query-semantics-20010607/
http://www.w3.org/TR/2001/WD-query-algebra-20010215/
http://www.w3.org/TR/2000/WD-query-algebra-20001204/
Editors:
Denise Draper (XML Query WG), Nimble Technology <ddraper@nimble.com>
Peter Fankhauser (XML Query WG), Infonyte GmbH <fankhaus@infonyte.com>
Mary Fernández (XML Query WG), AT&T Labs - Research <mff@research.att.com>
Ashok Malhotra (XML Query and XSL WGs), Microsoft <ashokma@microsoft.com>
Kristoffer Rose (XSL WG), IBM Research <krisrose@us.ibm.com>
Michael Rys (XML Query WG), Microsoft <mrys@microsoft.com>
Jérôme Siméon (XML Query WG), Lucent Technologies <simeon@research.bell-labs.com>
Philip Wadler (XML Query WG), Avaya <wadler@avaya.com>

Abstract

This document presents the formal semantics of XQuery [XQuery 1.0: A Query Language for XML].

Status of this Document

This is a public W3C Working Draft for review by W3C Members and other interested parties. This section describes the status of this document at the time of its publication. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress." A list of current public W3C technical reports can be found at http://www.w3.org/TR/.

Much of this document is the result of joint work by the XML Query and XSL Working Groups, which are jointly responsible for XPath 2.0, a language derived from both XPath 1.0 and XQuery. This document defines the formal semantics for XQuery 1.0. A future version of this document will also define the formal semantics for XPath 2.0. Further information on the progress of the work of the XML Query WG can be found at http://www.w3.org/XML/Query.

This document is a work in progress. It contains many open issues, and should not be considered to be fully stable. Vendors who wish to create preview implementations based on this document do so at their own risk. While this document reflects the general consensus of the working groups, there are still controversial areas that may be subject to change.

A complete list of issues is maintained in [C Issues]. In particular, the reader should note the following important issues.

Comments on this document should be sent to the W3C XML Query mailing list www-xml-query-comments@w3.org; note that any email information sent to this list is publicly available in the W3C XML Query mailing list archive http://lists.w3.org/Archives/Public/www-xml-query-comments/.

XQuery 1.0 has been defined jointly by the XML Query Working Group (part of the XML Activity) and the XSL Working Group (part of the Style Activity).

Table of Contents

1 Introduction
2 Preliminaries
    2.1 Processing model
    2.2 Data Model
        2.2.1 Data model components
        2.2.2 Node identity
        2.2.3 Document order and sequence order
        2.2.4 Errors
    2.3 Schemas and types
        2.3.1 The elements of a (static) type system
        2.3.2 Subtyping
            2.3.2.1 Type equivalence
            2.3.2.2 Type substitutability
            2.3.2.3 Subtyping and XML Schema derivation
        2.3.3 Static types and dynamic types
    2.4 Functions
        2.4.1 Functions and operators
        2.4.2 Functions and static typing
        2.4.3 Data Model Accessors and XPath Axes
        2.4.4 Other formal semantics functions
    2.5 Notation
        2.5.1 Judgments and patterns
        2.5.2 Inference rules
        2.5.4 Environments
        2.5.5 Static type inference
3 The XQuery Type System
    3.1 XQuery Types
        3.1.1 Wildcard types
    3.2 Instances, and Domain of a Type
        3.2.1 Domain of a NameTest
        3.2.2 Semantics of the Domain of a Type
    3.3 Subtyping
        3.3.1 NameTest Subtyping
        3.3.2 Semantics of Subtyping
        3.3.3 Type equivalence
    3.4 Prime types
        3.4.1 Prime types and sequences of items with similar types
        3.4.2 Computing Prime Types
        3.4.3 Common Prime Types
            3.4.3.1 Common Occurrence Indicator
    3.5 Importing types from XML Schema
        3.5.1 Schema
        3.5.2 QNames
        3.5.3 Complex Type Definitions
            3.5.3.1 Global complex type
            3.5.3.2 Anonymous local complex type
        3.5.4 Groups
            3.5.4.1 Global named group declarations
            3.5.4.2 Local groups
        3.5.5 minOccurs and maxOccurs
        3.5.6 Elements
            3.5.6.1 Global elements
            3.5.6.2 Local elements
        3.5.7 Attributes
            3.5.7.1 Use
            3.5.7.2 Global attributes
            3.5.7.3 Local attributes
        3.5.8 Simple Content
        3.5.9 Attribute Group
            3.5.9.1 Attribute group declaration
            3.5.9.2 Attribute group reference
            3.5.9.3 Attribute wildcard
        3.5.10 Complex Content
        3.5.11 Simple Types
    3.6 Major type issues
4 Semantics of Expressions
    4.1 Basics
        4.1.1 Expression Context
            4.1.1.1 Static Context
            4.1.1.2 Evaluation Context
        4.1.2 Type Conversions
    4.2 Primary Expressions
        4.2.1 Literals
        4.2.2 Variables
        4.2.3 Parenthesized Expressions
        4.2.4 Function Calls
        4.2.5 Comments
    4.3 Path expressions
        4.3.1 Axis Steps
            4.3.1.1 Axes
            4.3.1.2 Node Tests
        4.3.2 General Steps
        4.3.3 Step Qualifiers
            4.3.3.1 Predicates
            4.3.3.2 Dereferences
        4.3.4 Unabbreviated Syntax
        4.3.5 Abbreviated Syntax
    4.4 Sequence Expressions
        4.4.1 Constructing Sequences
        4.4.2 Combining Sequences
    4.5 Arithmetic Expressions
    4.6 Comparison Expressions
        4.6.1 Value Comparisons
        4.6.2 General Comparisons
        4.6.3 Node Comparisons
        4.6.4 Order Comparisons
    4.7 Logical Expressions
    4.8 Constructors
        4.8.1 Element Constructors
        4.8.2 Computed Element and Attribute Constructors
        4.8.3 Other Constructors and Comments
    4.9 FLWR Expressions
        4.9.1 FLWR expressions
        4.9.2 For expression
        4.9.3 Let expression
    4.10 Sorting Expressions
    4.11 Conditional Expressions
    4.12 Quantified Expressions
    4.13 Datatypes
        4.13.1 Referring to Datatypes
        4.13.2 Expressions on Datatypes
            4.13.2.1 Instance of
            4.13.2.2 Cast expressions
                4.13.2.2.1 Cast as
                4.13.2.2.2 Treat as
                4.13.2.2.3 Assert as
            4.13.2.3 Typeswitch
5 The Query Prolog
    5.1 Namespace Declarations and Schema Imports
    5.2 Function Definitions
6 Additional Semantics of Functions
    6.1 Formal Semantics Functions
        6.1.1 The fs:document function
        6.1.2 The fs:data function
    6.2 Functions with specific typing rules
        6.2.1 The dm:error function
        6.2.2 The op:union, op:intersect and op:expect operators
        6.2.3 The op:to operator

Appendices

A Normalized core grammar
B References
    B.1 Normative References
    B.2 Non-normative References
    B.3 Background References
C Issues
    C.1 Introduction
    C.2 Issues list
    C.3 Alphabetic list of issues
        C.3.1 Open Issues
        C.3.2 Resolved (or redundant) Issues
    C.4 Delegated Issues
        C.4.1 XPath 2.0
        C.4.2 XQuery
        C.4.3 Operators


1 Introduction

This document defines the formal semantics of XQuery. The present document is part of a set of documents that together define the XQuery 1.0 language:

The scope and goals for the XQuery language were laid out in the charter of the XML Query Working Group and in the XQuery requirements [XML Query 1.0 Requirements].

XQuery is a powerful language, capable of selecting and extracting complex patterns from XML documents and of reformulating them into results in arbitrary ways. This document defines the semantics of XQuery by giving a precise formal meaning to each of the constructions of the XQuery specification in terms of the XQuery data model. This document assumes that the reader is already familiar with the XQuery language.

Two important design aspects of XQuery is that it is functional and typed. These two aspects play an important role in the XQuery Formal Semantics.

XQuery is a functional language. It is built from expressions , called queries, rather than statements. Every construct in the language (except for the query prolog) is an expression and expressions can be composed arbitrarily. The result of one expression can be used as the input to any other expression, as long as the type of the result of the former expression is compatible with the input type of the later expression with which it is composed. Another aspect of the functional approach is that variables are always passed by value and their value cannot be modified through side-effects.

XQuery is a typed language. Types can be imported from one or several XML Schemas (typically describing the documents that will be processed), and the XQuery language can then perform operations based on these types (e.g., using a treat as operation). In addition, XQuery also supports a level of static type analysis. This means that the system can perform some inference on the type of a query, based of the type of its inputs. Static typing allows early error detection, and may be the basis for certain forms of optimization. The type system of XQuery is based on [XML Schema Part 1], but does not support all the features of XML Schema. For example, XQuery does not provide support for derivation. The relationship between the XQuery type system and XML Schema is defined in [3.5 Importing types from XML Schema]. Issues with respect to the XQuery type system are discussed in [3.6 Major type issues].

The XQuery formal semantics builds on long-standing traditions in the database and in the programming languages communities. In particular, we have been inspired by works on SQL [SQL], OQL [ODMG], [BKD90], [CM93], and nested relational algebra (NRA) [BNTW95], [Col90], [LW97], [LMW96]. We have also been inspired by systems such as Quilt [Quilt], UnQL [BFS00], XDuce [HP2000], XML-QL [XMLQL99], XPath [XPath], XQL [XQL99], XSLT [XSLT 99], and YaTL [YAT99]. Additional citations are found in the bibliography [B References].

This document is organized as follows. In [2 Preliminaries], we introduce concepts and notations used for defining the XQuery formal semantics. In [3 The XQuery Type System], we describe the XQuery type system, operations on the XQuery type system, and explain the relationship between the XQuery type system and XML Schema. In [4 Semantics of Expressions], we describe the dynamic and static semantics of XQuery. In [5 The Query Prolog], we describe the semantics of the XQuery prolog. In [6 Additional Semantics of Functions], we describe any additional semantics required for functions.

2 Preliminaries

Ed. Note: Status: This section contains introductory material and is mostly new.

In this section, we provide background concepts for the XQuery formal semantics and introduce the notations that are used.

2.1 Processing model

We first define a processing model for query evaluation. This processing model is not intended to be a model for an actual implementation, although a (naive) implementation might be obtained from it. It does not require or constrain any implementation technique, but any implementation should produce the same results as the one obtained by following this processing model and applying the rest of the formal semantics specification.

The processing model is based of four phases; each phase consumes the result of the previous phase and generates output for the next phase:

  1. Parsing. The grammar for the XQuery syntax is defined in [XQuery 1.0: A Query Language for XML]. Parsing may generate syntax errors. If no error occurs, some internal representation of the parsed syntax tree is created.

  2. Normalization. To simplify the semantics specification, we first perform some normalization over the query. The XQuery language provides many powerful features that make querys simpler to write and use, but are also redundant. For instance, a complex for expression might be rewritten as a composition of several simple for expressions. The language composed of these simpler query is called the XQuery Core language and is a subset of the complete XQuery language. The grammar of the XQuery Core language is given in [A Normalized core grammar].

    During the normalization phase, each XQuery query is mapped into its equivalent query in the core. (Note that this has nothing to do with Unicode Normalization which works on character strings). Normalization works by bottom-up application of normalization rules over expressions, starting with normalization of literals. After normalization, the full semantics can be obtained just by giving a semantics to the normalized Core query. This is done during the last two phases.

    Ed. Note: The query prolog needs to be processed before normalization starts. [5 The Query Prolog] explains how the query prolog is processed. This is not yet reflected in the processing model. See [Issue-0130:  When to process the query prolog].

  3. Static type analysis. Static type analysis checks if each query is type safe, and if so, determines their static types. Static type analysis is defined only for Core query. Static type analysis works by bottom-up application of type inference rules over expressions, taking the type of literals and the known types of input documents into account.

    Static type analysis can result in a static error, if the query is not type-safe. For instance, a comparison between an integer value and a string value might be detected as an error during the static type analysis. If static type analysis succeeds, it results in a syntax tree where each sub-expression is "decorated" with its static type, and in an output type for the result of the query.

    Static typing does not imply that the content of XML documents must be rigidly fixed or even known in advance. The XQuery type system accommodates "flexible" types, such as elements that can contain any content. Schema-less documents are handled in XQuery by associating a standard type with the document, such that it may include any legal XML content.

  4. Dynamic Evaluation. This is the phase in which the result of the query is computed. The semantics of evaluation is defined only for Core query terms. Evaluation works by bottom-up application of evaluation rules over expressions, starting with evaluation of literals. This guarantees that every expression can be unambiguously reduced to a value, which is the final result of the query.

The first three phases above are "analysis-time" (sometimes also called "compile-time") steps, which is to say that they can be performed on a query before examining any input document. Indeed, analysis-time processing can be performed before such document even exists. Analysis-time processing can detect many errors early-on, e.g., syntax errors or type errors. If no error occurs, the result of analysis-time processing could be some compiled form of query, suitable for execution by a compiled-query processor. The last phase is an "execution-time" (sometimes also called "run-time") step, which is to say that it is performed on the actual input document.

Static analysis catches only certain classes of errors. For instance, it can detect a comparison operation applied between incompatible types (e.g., xsd:int and xsd:date). Some other classes of errors cannot be detected by the static analysis and are only detected at execution time. For instance, whether an arithmetic expression on 32 bits integers (xsd:int) yields an out-of-bound value can only be detected by looking at the data, and is raised as an evaluation error.

While implementations may be free to employ different processing models, the XQuery static semantics relies on the existence of a static type analysis phase that precedes any access to the input data. Statically typed implementations are required to find and report static type errors, as specified in this document. It is still an open issue (See [Issue-0098:  Implementation of and conformance levels for static type checking]) whether to make the static type analysis phase optional to allow implementations to return a result for a type-invalid query in special cases where the query happens to perform correctly on a particular instance of input data. (Note that this is not the same as merely permitting that the static type error is emitted at evaluation time as we do below.)

Notice that the separation of logical processing into phases is not meant to imply that implementations must separate analysis-time from evaluation-time processing: query processors may choose to perform all phases simultaneously at evaluation-time and may even mix the phases in their internal implementations. All the processing model defines is what the final result of processing should be.

The above processing phases are all internal to the query processor. They do not deal with how the query processor interacts with the outside world, notably how it accesses actual documents and types. A typical query engine would support at least three other important processing phases:

  1. XML Schema import phase. The XQuery type system is based on XML Schema. In order to perform static type analysis, the XQuery processor needs to build type descriptions that correspond to the schema of the input documents. This phase is achieved by mapping all schemas required by the query into the XQuery type system. The XML Schema import phase is described in this document in [3.5 Importing types from XML Schema].

  2. XML loading phase. During the evaluation phase, expressions are applied on instances of the [XQuery 1.0 and XPath 2.0 Data Model]. How XML documents are loaded into the data model is described in the [XQuery 1.0 and XPath 2.0 Data Model] and is not be discussed further here.

  3. Serialization phase. Once the query is evaluated, processors might want to serialize the result of the query as actual XML documents. Serialization of data model instances is still an open issue (See [Issue-0116:  Serialization]) and is not be discussed further here.

We do not describe the parsing phase formally. Notably, we do not specify data structures for the syntax trees. Instead, we use the XQuery concrete syntax directly as a support for the formal semantics specification. More details about parsing can be found in the [XQuery 1.0: A Query Language for XML] document. No further discussion of parsing is included here.

For the other three phases (normalization, static type analysis and evaluation), instead or giving an algorithm in pseudo-code, we describe the semantics formally by means of inference rules. Inference rules provide a notation to describe how the semantics of an expression derives from the semantics of its sub-expressions. Hence, they provide a concise and precise way for specifying bottom-up algorithms, as required by the normalization, static type analysis, and evaluation phases.

2.2 Data Model

In this section we first present the basic components of the XQuery data model pertinent to the semantic description. The remainder of the section is devoted to several related features that merit special attention: node identity, order, and error values.

2.2.1 Data model components

The components manipulated in XQuery, such as simple-typed values and nodes (attributes, elements, etc.), are defined in the [XQuery 1.0 and XPath 2.0 Data Model] document. In this section we present only a short review of these components. We refer the reader to the [XQuery 1.0 and XPath 2.0 Data Model] document for more details.

There are five kinds of components in the [XQuery 1.0 and XPath 2.0 Data Model]: Error, SimpleValue, Node, Sequence, and SchemaComponent.

  • A single error value.

  • A SimpleValue: a value from any of the value spaces of XML Schema simple types, as defined in [XML Schema Part 2].

  • A Node; which is one of the following kinds: DocumentNode, ElementNode, AttributeNode, NamespaceNode, CommentNode, PINode and TextNode. XML documents and their content are represented in the data model as trees composed of nodes and values. [XQuery 1.0 and XPath 2.0 Data Model] defines a mapping from the PSVI (Post Schema Validation Information Set) to such trees.

  • A Sequence: which is an ordered collection of nodes, an ordered collection of simple-typed values, or an ordered collection of items (i.e., any mixture of nodes and simple-typed values). An important characteristic of sequences is that they cannot be nested. That is, all sequences are "flat", and cannot be composed of other sequences. Moreover, there is no distinction between a single item and a singleton sequence containing that item. That is, a single item and a singleton sequence containing that item are equivalent. It is still open whether sequences of entire documents are permitted [Issue-0068:  Document Collections].

  • A SchemaComponent: the type of a node. The content and use of this schema component is still an open issue.

[XQuery 1.0 and XPath 2.0 Data Model] defines constructors, functions to construct each of these kinds of data model component, as well as accessors, functions to access their contents. For instance, the xf:children function can be used to retrieve the children nodes of an element node.

2.2.2 Node identity

In the [XQuery 1.0 and XPath 2.0 Data Model], nodes have an identity. Node identity follows from the unique location of any node within exactly one XML document (or document fragment, in the case of XML being constructed within the query itself). It is possible for two expressions to return not only the same value, but also the exact same node. On the other hand, two expressions could have results with the same document structure, but different identities. XQuery supports comparison of two nodes using either "node equality" (equality by identity) or "value equality" (equality by equal values), for instance using one of the two operators == or =.

Simple-typed values do not have an associated identity and are therefore always compared by value equality.

The difference between node equality and value equality is illustrated on the following example:

let $a1 := <book><author>Suciu</author></book>,
    $a2 := <author>Suciu</author>,
    $a3 := $a1/author
return
    ($a1/author == $a3), {-- true, they refer to the same node --}
    ($a1/author = $a2),  {-- true, they have the same value --}
    ($a1/author == $a2)  {-- false, they are not the same node --}

All XQuery's operators preserve node identity with the exception of explicit copy operations and node (element, attribute, etc.) constructors. An element constructor always creates a deep copy of its attributes and children nodes. Other operators, such as sequence construction or path expressions, do not result in copies of the corresponding nodes. So, for example, ($a3, $a2) creates a new sequence of nodes, the first of which has the same identity as $a1/author.

2.2.3 Document order and sequence order

There are two kinds of order in XQuery: sequence order and document order.

Sequence order. Sequences of items in the XQuery data model are ordered. Sequence order refers to the order of these items within a given sequence.

Document order. Document order refers to the total order among all nodes within a given document. It is defined as the order of appearance of the nodes when performing a pre-order, depth-first traversal of a tree. For elements, this corresponds to the order of appearance of their opening tags in the corresponding XML serialization. Document order is equivalent to the definition used in [XML Path Language (XPath) : Version 1.0].

Depending on the context, it may be possible to talk about two different notions of order for the same (set of) nodes. For example:

let $e := <list>
              <item id="1"/>
              <item id="2"/>
          </list>,
    $s := ($e/item[@id='2'], $e/item[@id='1'])
...

The item "1" is before item "2" in document order, but the same two nodes are in the opposite order in the sequence $s.

In the case of nodes among several documents, the order between the document is implementation defined but must be stable. I.e., it must not change for the duration of query evaluation. Support for document order among elements in distinct documents or document fragments is discussed in [Issue-0003:  Document Order].

2.2.4 Errors

Some queries result in an error. Errors that are detected at compile-time, like parse errors or type errors, are reported to the user by the system. Dynamic errors must return the Error value defined by the [XQuery 1.0 and XPath 2.0 Data Model].

Whenever necessary, we use the notation dm:error() to refer to the error value in the [XQuery 1.0 and XPath 2.0 Data Model]. General handling of errors in XQuery is still an open issue (See [Issue-0094:  Static type errors and warnings] and Issue-98 in [XQuery 1.0: A Query Language for XML]).

2.3 Schemas and types

The Data Model establishes the space of values that may be manipulated by XQuery, but it does not establish a complete type system. For example, we may know that a node is an ElementNode, but that does not tell us what kind of element it is, what its legal contents is, etc.

The XQuery type system is based on [XML Schema Part 1]. An introduction to XML Schema can be found in the primer [XML Schema Part 0]. An important notion underlying XML Schema are regular expressions, and their tree counterparts, regular tree grammars. An introduction to regular (tree) languages can be found in [Languages] or in [TATA]. In order to denote regular expressions, the XQuery type system uses the familiar notation of XML DTDs. We refer the reader to [XML] for more on DTDs.

Ed. Note: We do not give an introduction to regular expressions and tree grammars for now, but a future version of the document should add some introductory text about: regular expressions, tree grammars, and basic algorithms over these that are relevant for XQuery. Notably, some text about closure properties, complexity, subsumption algorithm, etc. could be provided there.

This section presents an introduction to (static) type systems in general, and a conceptual overview of the XQuery type system in particular. The XQuery type system is discussed formally in [3 The XQuery Type System].

2.3.1 The elements of a (static) type system

A type system relates values, types, and expressions. A (static) type system requires several constituent parts:

  • A universe of possible types. Every type system starts with a set of primitive types and most (including XQuery's) have type constructors that allow the construction of complex types.

    In XQuery, the primitive types are the built-in simple datatypes types of [XML Schema Part 2]. There are also type constructors for sequences and node types such as attributes and elements. These correspond to the concepts of group, attribute and element in [XML Schema Part 1] (or more precisely, in the formalization of XML Schema in [XML Schema : Formal Description]).

  • A notation for types. A syntax is necessary to define and manipulate types. In order to write the type related semantics of XQuery, this document uses its own notation, which is used notably in type inference and dynamic semantics rules.

  • A notion of instances, and domain of a type. Formally, a type is defined to be the (usually infinite) set of instance values that belong to that type. Thus the type xs:integer consists of the set of all integer values, and the type element address { xs:string } consists of all elements named "address" whose contents are a single string value.

  • A notion of subtyping. Subtyping is a relationship between type, i.e., it relates types to each other.

    Subtyping plays a crucial role in a type system. Most notably, it is used to determine the legality of arguments to functions.

  • Rules that relate expressions to types. The heart of static type inference is associating a static type with each expression in a query. In a strongly-typed language like XQuery, the static type is a guarantee that, no matter what the input to the query is, the result of evaluating the expression is a value that belongs to its static type.

  • Rules that relate values to types. During query evaluation, expressions are evaluated to yield values. The dynamic component of a typing system determines the actual types of data values as they are computed.

2.3.2 Subtyping

One type is a subtype of another if every value belonging to the first type also belongs to the other. Here is a simple example of subtyping:

( xs:double )*
( xs:integer | xs:double )*

The first type is a subtype of the second, because any sequence that contains only doubles is also an example of a sequence containing either integers or doubles.

We write Type1 <: Type2 to indicate that type Type1 is a subtype Type2. Note that we use the subtype notation <: when defining the XQuery semantics, but it is not part of the XQuery syntax.

2.3.2.1 Type equivalence

Two types are equivalent if every value of one type is also a value of the other type, and vice versa, i.e., two types are equivalent if and only if they are both subtypes of the other. Here is a simple example of two equivalent types:

( xs:integer | xs:double )*
( xs:double | xs:integer )*

Both of these types mean a sequence of any number of doubles or integers.

We write type equivalence as Type1 = Type2, where Type1 and Type2 are both types. If Type1 = Type2 then Type1 <: Type2 and Type2 <: Type1 are true.

2.3.2.2 Type substitutability

Subtyping plays a crucial role in the semantics of function application since all functions have an associated signature that declares the types of input parameters and the type of the result of the function.

Intuitively, one can call a function not only by passing an argument of the declared type, but also any argument whose type is a subtype of the declared type. This property is called type substitutability.

Ed. Note: This section should contain an example.

2.3.2.3 Subtyping and XML Schema derivation

Subtyping in XQuery is not the same as "derivation" in XML Schema. A derived type is always a subtype of its base type, but the opposite is not true. For instance, consider the following types:

( xs:double )+
xs:double, ( xs:integer | xs:double )*

The first type is a subtype of the second, because any sequence that contains one or more doubles is also an example of a sequence containing a double followed by a sequence of zero or more integers or doubles. But derivation as defined in XML Schema does not hold between these two types.

The relationship between subtyping and XML Schema derivation is discussed in [Issue-0019:  Support derived types].

2.3.3 Static types and dynamic types

One consequence of having a static type system is that for the same expression, a type is first computed during the static phase, then a type for the actual value which is the result of that expression is computed during the dynamic phase. We use the terms static type and dynamic type respectively for each of these computed types for a given expression. It is important to note that for the same expression the dynamic type is often more precise than the static one, since the real value is not known statically. Indeed, the static type can be considered as an approximation of the dynamic one based on the available static information. Consider, for example, an XPath expression

      
$x/(foo | bar)
    

In this expression the static type might be

      
( element foo | element bar )*
    

If during execution, $x happens to contain exactly two foo elements, the dynamic type of (the result of evaluating) $x is a sequence of two foo elements.

      
( element foo, element foo ).
    

Of course, the basic guarantee of static type inference is that the dynamic type is always a subtype of the static type. Note that dynamic types are specific: they never contain choices, all groups, or quantifiers.

Ed. Note: MFF: This next paragraph should be changed to describe type annotations.

The XQuery data model also defines SchemaComponents associated with Nodes. SchemaComponents relate to the type associated with a Node through schema validation. This may be different from either the static or the dynamic type of the node. The relationship between static types, dynamic types and schema components in the data model is still an open issue. See [Issue-0018:  Align algebra types with schema].

2.4 Functions

2.4.1 Functions and operators

The [XQuery 1.0 and XPath 2.0 Functions and Operators] document provides a number of useful basic functions over the components of the XQuery data model (simple-typed values, nodes, sequences, etc). We use a number of these functions in the course of describing the XQuery semantics. Here is the list of functions from the [XQuery 1.0 and XPath 2.0 Functions and Operators] document that are used in the XQuery formal semantics:

  • op:intersect
  • op:union
  • op:except
  • op:concatenate
  • op:boolean-and
  • op:boolean-or
  • op:union-value
  • op:except-value
  • op:to
  • op:numeric-equal
  • op:numeric-add
  • op:get-end-datetime
  • op:numeric-subtract
  • op:get-duration
  • op:get-start-datetime
  • op:numeric-multiply
  • op:numeric-divide
  • op:numeric-mod
  • op:boolean-equal
  • op:datetime-equal
  • op:duration-equal
  • op:hex-binary-equal
  • op:base64-binary-equal
  • op:numeric-greater-than
  • op:datetime-greater-than
  • op:duration-greater-than
  • op:numeric-less-than
  • op:datetime-less-than
  • op:duration-less-than
  • op:node-equal
  • op:node-before
  • op:node-after
  • op:numeric-unary-plus
  • op:numeric-unary-minus
  • op:operation
  • xf:false
  • xf:true
  • xf:length
  • xf:boolean
  • xf:item-at
  • xf:get-namespace-uri
  • xf:get-local-name
  • xf:round
  • xf:compare
  • xf:not
  • xf:not3
  • xf:empty-sequence
  • xf:root
  • xf:index-of

2.4.2 Functions and static typing

Many functions in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document are generic: they perform operations on arbitrary components of the data model, e.g., any kind of node, or any sequence of items. For instance, the xf:distinct-nodes function removes duplicates in any sequence of nodes. As a result, the signature given in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document is also generic. For instance, the signature of the xf:distinct-nodes function is:

define function xf:distinct-nodes(node*) returns node*

If applied as is, this signature would result in poor static type information. In general, it would be preferable if some of these function signatures could take the type of input parameters into account. For instance, if the function xf:distinct-nodes is applied on a parameter of type element a*, element b, one can easily deduce that the resulting sequence is a collection of either a or b elements.

In order to provide better static typing, we specify typing rules for some of the functions in [XQuery 1.0 and XPath 2.0 Functions and Operators]. These additional typing rules are given in [6 Additional Semantics of Functions]. Here is the list of functions that are given specific typing rules.

  • dm:error
  • op:union
  • op:intersect
  • op:expect
  • op:to

Ed. Note: This list reflects the content of Sections 6, but should be expanded. See [Issue-0135:  Semantics of special functions].

2.4.3 Data Model Accessors and XPath Axes

The XQuery Data Model provides operations to construct or access component of the Data Model. Some of these operations are used in the course of defining the formal semantics. We use the namespace prefix dm: those functions to distinguish them for other functions from the [XQuery 1.0 and XPath 2.0 Functions and Operators] document.

Here is a list of data model constructors or accessors used for formal semantics specification.

  • dm:error
  • dm:atomic-value
  • dm:children
  • dm:attributes
  • dm:parent
  • dm:name
  • dm:node-kind
  • dm:dereference
  • dm:empty-sequence
  • dm:namespaces
  • dm:type
  • dm:typed-value
  • dm:attribute-simple-node
  • dm:element-simple-node
  • dm:text-node

XPath axes are used to describe tree navigation in instances of the XQuery data model. The semantics of axis navigation is described in terms of data model functions. Some of the XPath axes are directly supported through existing data model accessors. Some axes (e.g., ancestor) are defined as a simple recursive application of a data model accessor and others (e.g., following) require more complex computation on top of existing data model accessors. The correspondence between XPath axis and data model accessors is specified in section [4.3.1 Axis Steps]

2.4.4 Other formal semantics functions

In a few cases, the formal semantics makes use of some functions that are not currently in the [XQuery 1.0 and XPath 2.0 Functions and Operators] or in the [XQuery 1.0 and XPath 2.0 Data Model] documents. We use the namespace prefix fs: for those functions, to distinguish them from functions in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document and from Data Model accessors.

Here is a list of additional functions used for formal semantics specification.

  • fs:document
  • fs:distinct-doc-order
  • fs:data
  • fs:unique-item-at

2.5 Notation

Ed. Note: Status: This section is intended to provide an introduction to formal notations, such as inference rules, for readers which are not familiar with these notations. It can be skipped by the reader which is familiar with these notations. This introductory material is still considered a draft and will be improved in further rewritings of the Formal Semantics document.

In order to make the semantics specification both concise and precise, we use logical notation in the form of logical inference rules. In this section, we introduce the notations for inference rules and sorts of semantic values, specifically environments.

2.5.1 Judgments and patterns

Ed. Note: The following is a somewhat terse explanation of inference rules using formal semantic lingo informally. It needs some examples.

The basic building block of an inference rule is a judgment that expresses some property. A judgment is specified by a description that contains a combination of basic patterns, each identifying a sort of value that can be inserted, interspersed with symbols. Symbols must occur literally in judgments between values of a sort that corresponds to the used patterns.

A pattern is identified by one or more italic words. The names of patterns are significant: each pattern name corresponds to a sort of value that can be substituted legally for the pattern. For this reason, a pattern is sometimes called a metavariable that is instantiated to some value of the appropriate sort.

We employ the following conventions for pattern names:

  1. patterns for syntactic sorts denoting lexical items begin with upper case and their names correspond to terminals or nonterminals in the XQuery grammar. For example, Expr is a pattern that can be instantiated with any XQuery Expr.

  2. patterns for semantic sorts, to be defined, begin with lower case letters. For example, denotes a semantic value.

Furthermore, all patterns may appear with subscripts (e.g. Expr1, Expr2), which simply name different instances of the same pattern sort.

For example, '3 => 3' and '$x+0 => 3' are both instances of the judgment described by 'Expr => value'. It is no problem that we are overloading the symbol '3' to denote both the '3' in an expression and the value 3: this is unambiguous because of the strict use of sorts in the judgment descriptions.

Finally, we require that each distinct pattern is instantiated to a single expression or value so if the same pattern occurs twice in a judgment description then it should be instantiated with the same value. This means that '3 => 3' is an instance of the judgment described by 'Expr => Expr' but '$x+0 => 3' is not. (Both, however, are instances of the judgment described by 'Expr1 => Expr2'.)

2.5.2 Inference rules

Inference rules are used to express the logical relation between judgments and define how complex judgments can be concluded from simpler premise judgments. (As explained in [2.1 Processing model], this approach lends itself well to bottom-up algorithms.)

A logical inference rule is written as two collections of premise and conclusion judgments, written above and below a dividing line, respectively:

premise1 ... premisen

conclusion1 ... conclusionn

The interpretation of an inference rule is: if all the premise judgments above the line are true, then we know that each of the conclusion judgments below the line are also true.

Here is a simple example to illustrate (based on the example judgment from above described by 'Expr1 => Expr2'):

$x => 0      3 => 3

$x + 3 => 3

This rule expresses the property of the evaluation judgment 'Expr=>value': if an expression containing the variable reference '$x' yields the value zero, and the literal expression '3' yields the value '3', then we know that '$x + 3' yields the value '3'.

It is also possible for the expression above the line to have no judgments at all; this simply means that the expression below the line is true under all premises:


3 => 3

This judgment says that evaluating the expression '3' always yields the value three.

The rules above are expressed in terms of specific variables and values, but usually rules are more abstract. That is, the judgments they relate contain patterns that match sorts of things. For example, here is a rule that says for any variable, adding 0 yields the same value:

Variable => value

Variable + 0 => value

Formally, the rule states that if some variable in XQuery, to which Variable is instantiated, evaluates to some value, to which value is instantiated, then we can conclude that the query 'Variable + 0', with Variable substituted by the actual variable, evaluates to the same value.

We require that each unique patterns used in a particular inference rules is instantiated to the same value for the entire rule. This means that we can refer to "the value of Variable" instead of the more precise "what Variable is instantiated to in (this particular instantiation of) the inference rule".

Putting these together, here is an example of an inference rule that occurs later in this document:

statEnvs |- Expr1 : xs:boolean      statEnvs |- Expr2 : Type2      statEnvs |- Expr3 : Type3

statEnvs |- if Expr1 then Expr2 else Expr3 : ( Type2 | Type3 )

This rule states that if the conditional expression of an if expression has type boolean, then the type of the entire expression is one of the two types of its then and else clauses. Note that this is represented as a choice type: '(Type2|Type3)'.

The rule also illustrates a convention of judgment expressions: Judgments usually consist of three parts: the assumptions, the problem, and the result, organized as "assumptions |- problem =>> result" The "|-" symbol is pronounced "turnstile" and is ubiquitous in the rules whereas the actual symbol used instead of "=>>" varies with the defined judgment -- in this particular example it is the ":" symbol. The idea is that with the assumption defined one can compute the result from the problem.

Ed. Note: Jonathan suggests to explain how to 'chain' inference rules. I.e., how several inference rules are applied recursively.

2.5.3 Normalization rules

Normalization is presented as logical rewrite rules, which specify those expressions that are rewritten into some other expression in the XQuery core. The static semantics is presented as type inference rules, which relate XQuery expressions to types and specify under what conditions an expression is well typed. The dynamic, or operational, semantics is presented as value inference rules, which specify the order in which an XQuery expression is evaluated and relate XQuery expressions to values.

Ed. Note: I'm leaving this unchanged, but I think we can and should change the definition of normalization so that it doesn't force the use of "no-op" rules; i.e. instead of forcing the recurrent execution of more mapping rules, allow them to be pattern matched just as the rest of the inference rules are.

Normalization applies a translation function to each expression in the XQuery syntax and returns an expression in the XQuery core syntax; The notation [ Expr ] denotes the result of translating the XQuery expression Expr into an expression in the XQuery core. All normalization rules have the following structure:

 
[Expr]
==
coreExpr

The [ Expr ] above the == sign denotes an expression in the XQuery language before translation and the coreExpr beneath the == sign denotes an equivalent expression in the XQuery core.

For convenience, the translation of an optional expression, e.g., Expr?, is denoted by ( [ Expr ] )?, meaning that the optional expression is translated, if it exists. Similarly, the translation of a repeated expression, e.g., Expr* is denoted by [ Expr ]*, meaning that each expression in the repetition is translated individually.

Many of the mapping rules from XQuery into the core convert expressions that have implicit arguments, types, or complex semantics into core expressions that have explicit arguments, types, and a more verbose, but explicit semantics.

2.5.4 Environments

Logical inference rules use environments to record information computed during static type inference or dynamic evaluation so that information can be used by other logical inference rules. For example, the type signature of a user-defined function in a query prolog can be recorded in an environment and used by subsequence rules. Similarly, the value assigned to a variable within a let statement can be captured in an environment and used for further evaluations.

In general an environment is a dictionary that maps a symbol (e.g., a function name or a variable name) to a value. If env is an environment then "env( symbol )" denotes the value that symbol is mapped to by env. Depending on the kind of environment, the "value" can be a type, a data value, etc. The notation is intentionally akin to function application as an environment can be seen as a "function" from the argument symbol to the value that symbol maps to.

In this specification we use environment groups that group related environments. If "envs" is an environment group with the member "mem", then that environment is denoted "envs.mem" and the value that it maps symbol to is denoted "envs.mem(symbol)".

In particular, updating is only defined on environment groups:

  • "envs [ mem(symbol |-> value) ]" denotes the environment group which is just like envs except that the mem environment has been updated to map symbol to value.

  • If the "value" is a type then we use the conventional variant notation "envs [ mem(symbol : value) ]".

  • We allow the shorthand "envs [ mem( symbol1 |-> value1 ; ... ; symboln |-> valuen ) ]" for the nested " (...(envs[mem(symbol1 |-> value1)]) ... [mem(symboln |-> valuen)])".

Note that updating the environment overrides any previous binding that might exist for the same name. Updating the environment is used to properly capture the scope of variables, namespaces, etc. Also, note that there are no operations to remove entries from environments: the need never arises because the environment group from "before" the update remains accessible concurrently with the updated version.

Finally, an example that brings several of the notations above together. Here is a rule that extends the varType member of the static environment group statEnvs with a typing of a new variable:

statEnvs |- Expr1 : Type1      statEnvs [ varType(QName : Type1) ] |- Expr2 : Type2

statEnvs |- let $QName = Expr1 return Expr2   :   Type2

2.5.5 Static type inference

The result of static type inference is to associate a static type with every expression in a query, such that any evaluation of that expression is guaranteed to yield a value that belongs to that type. This section outlines how this is done.

For each type of expression (as defined by the grammar production rules of XQuery), we define static type inference rules. Here is a simple example of such a rule:

Expr1 : Type1      Expr2 : Type2

Expr1 , Expr2 : Type1, Type2

This rule is read as follows: if two expressions Expr1 and Expr2 have already been statically inferred to have types Type1 and Type2 (that's the premises above the line), then it is the case that the expression below the line "Expr1 , Expr2" must have the type "Type1, Type2". Basically, it says that when two expressions are concatenated together, the result type is the concatenation of their types.

The "left half" of the expression below the line (the part before the ":") corresponds to some phrase in a query, for which we want to compute a type. If the query has been parsed into an internal parse tree, this usually corresponds to some node in that tree. The expression usually has patterns in it (Expr1 and Expr2) that need to be matched against the children of the node in the parse tree. The expressions above the line indicate things that we need to know or compute to use this rule; in this case, we need to know the types of the two sub-expressions of the "," operator. Once we know those types (by applying static inference rules recursively to the expressions on each side), then we can compute the type of this expression. This illustrates a general feature of the type system: the type of an expression depends only on the type of its sub-expressions. The overall static type inference algorithm is a recursive, following the parse structure of the query. At each point in the recursion, an appropriate matching inference rule is sought; if at any point there is no applicable rule, then static type inference has failed: the query is not type-safe.

As stated earlier, the types computed by static inference rules are sound, which means that they guarantee that any value computed by an expression must belong to the static type of the expression. Another characteristic of a static type system is completeness. A type system is complete if it computes the "tightest possible" static type for every expression. It is very rare for type systems to be complete, and XQuery is no exception: there are expressions for which the static type system of XQuery computes a type that is not the tightest type possible.

2.5.6 Evaluation rules

Dynamic semantics associates values with query expressions. As with static type inference, we use logical inference rules to determine the value of expressions given values of sub-expressions, and information contained in environments. XQuery's dynamic semantics is modeled on the dynamic semantics presented in [Milner].

The dynamic rules use judgments of the form: dynEnvs |- phrase => value, where phrase is a phrase in XQuery syntax, and value is a value.

The inference rules used for dynamic evaluation, like those for static type inference, follow a top-down recursive structure, computing the value of expressions from the values of their sub-expressions.

The dynamic type of data values is completely determined by the values themselves. Thus, there are no extra rules or machinery for computing dynamic types.

3 The XQuery Type System

Ed. Note: Status:The XQuery type system defined in this section is based on structural typing. The definition and formal specification of the XQuery type system will be revisited when support for named typing is be added. Support for named typing is discussed in [Issue-0104:  Support for named typing].

The XQuery type system is used for the specification of both the dynamic and the static semantics of XQuery. It is used to describe the semantics of datatype operations and type conversion rules in XQuery. It is also the basis for the static semantics of XQuery.

The XQuery type system is based on XML Schema. XML Schema defines a notion of validation for XML documents. When doing validation, an XML Schema processor checks if the structure, and the text content of a document verifies the constraints on the structure, and the constraints on values, specified in the schema. As an example of constraints on the document structure, all book elements might be declared to contain an isbn attribute, and a title element, followed by a sequence of one or more author elements. As an example of constraints on values, the isbn attribute might be declared as a subset of the XML Schema xsd:string values, using a XML Schema pattern facet.

The XQuery type system is based on the structural constraints of XML schema, which have been formalized as [XML Schema : Formal Description]. There are two main difference between the XQuery type system and [XML Schema : Formal Description]. First the XQuery type system does not capture the notion of XML Schema names (also called normalized universal names in [XML Schema : Formal Description]). Second, the XQuery type system uses a specific syntax, as needed for semantics specification, notably the static type inference.

This section describes the XQuery type system, the essential operations on the XQuery type system, and the relationship between the XQuery type system and XML Schema.

3.1 XQuery Types

Types in the XQuery type system are described with the following grammar:

[75]   TypeDeclaration   ::=   ("define" "element" QName "{" Type? "}")
|  ("define" "attribute" QName "{" Type? "}")
|  ("define" "type" QName "{" Type? "}")
[76]   Type   ::=   TypeUnion
|  TypeBoth
|  TypeSequence
|  TypeRepeat
|  ItemType
|  TypeRef
|  TypeParenthesized
|  TypeNone
[77]   TypeUnion   ::=   Type  "|"  Type
[78]   TypeBoth   ::=   Type  "&"  Type
[80]   TypeRepeat   ::=   Type TypeOccurrenceIndicator
[79]   TypeSequence   ::=   Type  ","  Type
[81]   ItemType   ::=   ElementRef
|  AttributeRef
|  SimpleType
[82]   TypeRef   ::=   "type" QName
[83]   TypeParenthesized   ::=   "(" Type? ")"
[84]   TypeNone   ::=   "none"
[55]   SimpleType   ::=   QName
[85]   AttributeRef   ::=   ("attribute" QName)
|  ("attribute" NameTest "{" Type? "}")
[86]   ElementRef   ::=   ("element" QName)
|  ("element" NameTest "{" Type?)
[87]   TypeOccurrenceIndicator   ::=   "*" |  "+" |  "?"

Ed. Note: This type system does not support text, process-instruction and document nodes. Such types have to be added (See [Issue-0105:  Types for nodes in the data model.], and [Issue-0068:  Document Collections]).

In addition, some constraints, not specified by the grammar, must hold:

SimpleType

SimpleType must be one of the built-in simple datatypes from [XML Schema Part 2].

AttributeRef

The content model of a AttributeRef must be consistent with the allowed content of attributes. Namely, it must be a subtype of (using wilcards types as defined in section on [3.1.1 Wildcard types]):

            
xs:AnySimpleType *
         

In the case the content of the attribute is not specified (e.g., attribute isbn), this indicates a reference to a globally defined attribute in the schema. Therefore the corresponding attribute must exist in the schema and be globally declared.

ElementRef

The content model of a ElementRef must be consistent with the allowed content of elements. Namely, it must be a subtype of:

  ( xs:AnyAttribute* ,
       xs:AnySimpleType* |
       ( xs:AnyElement | xs:string )* )
         

Note that comments and processing instructions, while they are allowed data values within an element, do not constitute part of an element's type.

In the case the content of the element is not specified (e.g., element title), this indicates a reference to a globally defined element in the schema. Therefore the corresponding element must exist in the schema and be globally declared.

TypeNone

The type TypeNone indicates a type error. This type can be generated by the type system during the course of static type analysis, but it is not a legal value in any type declaration in a query. Note that TypeNone is the identity for choice (I.e., (t | TypeNone) == t)).

Ed. Note: Jerome: The status of the constraints on attribute and element type constructors is unclear. When are they enforced and checked ? (See [Issue-0106:  Constraint on attribute and element content models]

In [XML Schema Part 2], a simple datatype is either a primitive datatype, or is derived from another simple datatype by specifying a set of facets. The type syntax of XQuery does not provide any way to define datatypes with facets. Such types can be imported from XML Schemas and may be referenced as Type xs:QName, where xs:QName names the definition of the faceted type in the schema.

The occurrence indicators "*", "?" and "+" indicate a sequence of any, zero or one, or more than one items, respectively. This plays the same role as, but is more limited than, the minoccurs/maxoccurs facets of [XML Schema Part 1], and follows the notations of DTDs.

A WildCard denotes a set of names. A WildCard is either

  • a QName denoting the name with the given namespace prefix and local name;

  • * denoting any name in any namespace;

  • NCName:* denoting any local name in namespace NCName;

  • or *:NCName denoting the local name NCName in any namespace;

The element, and attribute, type constructors may use a wildcard to specify sets of one or more matching elements, or attributes. For example,

  element bib:* { xs:integer* }  

indicates any element in the bib namespace whose content consists of zero or more integers.

Empty content is indicated with the explicit empty sequence, as in

  element bib:* { ( ) }  

If the content model is omitted, then the element, (resp. attribute or type) name must be a proper QName, and refers to the global element (resp. attribute or type) with that name, which must be defined (by importation from a schema or through a type definition expression).

The type system includes three operators that can combine types: ",", "|" and "&".

The "," operator builds the "sequence" of two types. For example,

  xs:integer , xs:double  

means a sequence of an integer followed by a double. This corresponds to the xs:sequence construct of XML Schema for elements; XML Schema does not recognize similar properties for simple types.

The "|" operator builds a "choice" between two types. For example,

  xs:integer | (element bib:author)*  

means either an integer or a sequence of bib:author elements. The "|" operator corresponds to the xs:choice construct of XML Schema for elements and the xs:union construct for simple types.

The "&" operator builds the "interleaved product" of two types. The type t1 & t2 matches any sequence that is an interleaving of a sequence that matches t1 and a sequence that matches t2. For example,

  (element a , element b) & xs:string   =
                  element a, element b, xs:string
                | element a, xs:string, element b
                | xs:string, element a, element b   

In each case, the order of element a and element b is unchanged, but the xs:string can be interleaved in any position.

The interleaved product is a generalization of the xs:all construct in XML Schema for elements, and has no corresponding representation in XML Schema for simple types. The xs:all construct in XML Schema may only consist of global or local element declarations with lower bound 0 or 1, and upper bound 1.

3.1.1 Wildcard types

XQuery defines the following built-in datatypes:

define type xs:AnySimpleType { xs:integer | xs:float | xs:string ... }
define type xs:AnyAttribute  { attribute * { type xs:AnySimpleType * }}
define type xs:AnyElement    { element *  { type xs:AnyType } }
define type xs:AnyNode       { type xs:AnyAttribute | type xs:AnyElement }
define type xs:AnyItem       { type xs:AnySimpleType | type xs:AnyNode }
define type xs:AnyType       { type xs:AnyItem* }

The concepts of xs:AnySimpleType and xs:AnyType have the same interpretation as anySimpleType and anyType of XML Schema, respectively, though this version of xs:AnyType contains types (such as comments and processing instructions) that do not occur in XML Schema.

3.2 Instances, and Domain of a Type

Ed. Note: Status:This section is still incomplete and based on structural typing. The notion of domain of a type will be revised when support for named typing is added.

The semantics of a type is given by the notion of domain, I.e., the set of all values which are instance of that type. A tree in the [XQuery 1.0 and XPath 2.0 Data Model] is an instance of a type in the XQuery type system, if and only if it verifies the structural constraints described by the type. For instance the query:

  <bib:pubdate> { 1954, 1966, 1974, 1986 } </bib:pubdate> 

creates an instance of the following type:

  element bib:* { attribute country { xs:string }?, xs:integer* }  

because the element name "bib:pubdate" is an instance of the wildcard nametest "bib:*", the attribute "country" is optional, and the content of that element is indeed a sequence of integers.

The notion of instance of a type in XQuery is structural. Notably, it does not take XML Schema names into account. For instance, the two following XQuery types:

  define element entry { type Book }
  define type Book {
    attribute isbn { xs:string },
    element title { xs:string },
    element author { xs:string }*,
  }

and

  define element entry { type Article }
  define type Article {
    attribute isbn { xs:string }?
    (element title { xs:string } | element ref { xs:string }),
    element author { xs:string }+,
  }

might be declared in two different schemas, one for books and one for research articles. Because the notion of instance in the XQuery type system is structural, the following document:

<entry> isbn="0-618-15398-5">
  <title>The Lord of the Rings</title>
  <author>J.R.R. Tolkien</author>
</entry>

would always be an instance of both types "Book" and "Article". In XML Schema, depending against which schema the document is validated, only one of the two above types would hold.

In this section, we define formally the domain of a XQuery type. We first define the domain of a nametest, then define the domain of a complete type.

3.2.1 Domain of a NameTest

The semantics of a given NameTest is defined by the set of names that matches a that NameTest.

Notation

We denote access to the domain of a NameTest with the following auxiliary judgment. The notation:

statEnvs |- QName1 inwild NameTest2

indicates that the QName QName is in the domain defined by the NameTest NameTest2.

We specify the relationship QName1 inwild NameTest2 by decomposing both sides in to QNames or WildCards, and then defining the relationship between each component of the name.

First, is always in the domain of QName is always a singleton compose of itself. Note that the following inference rule takes namespace resolution into account.

statEnvs |- expand(QName1) => qname(URI1,ncname1)            URI1 = URI2
statEnvs |- expand(QName2) => qname(URI2,ncname2)            ncname1 = ncname2

statEnvs |- QName1 inwild QName2

statEnvs |- expand(QName1) => qname(ncname1)
statEnvs |- expand(QName2) => qname(ncname2)            ncname1 = ncname2

statEnvs |- QName1 inwild QName2

Now the rules for QName inwild WildCard:

statEnvs |- expand(QName1) => qname(URI1,ncname1)            URI1 = URI2
statEnvs |- expandNCName(NCName2) => URI2

statEnvs |- QName1 inwild NCName2:*

statEnvs |- expand(QName1) => qname(URI1,ncname1)      ncname1 = NCName2

statEnvs |- QName1 inwild *:NCName2


statEnvs |- QName1 inwild *

3.2.2 Semantics of the Domain of a Type

Notation

We denote the domain of a type using the following auxiliary judgment. The notation:

statEnvs |- value in Type

indicates that the value is the domain of type Type.

The XQuery type system is based on tree grammars. The semantics for the domain of a type is based on the set of a trees accepted by the corresponding tree grammar.

We do now give a full specification of the notion of language recognized by a tree grammar at this point, but refer the reader to the literature on tree grammars, for instance [Languages], or [TATA]

Ed. Note: Future versions of that document will include a more complete description of the notion of domain of a type.

Ed. Note: The use of pure tree grammars for defining the domain yields a structural definition of the domain of a type. This definition will have to be revised as soon as the named typing proposal is added.

3.3 Subtyping

This section defines the semantics of subtyping in XQuery. Subtyping is an important relationship between type, which is used notably, when doing function applications to check whether the input parameters of the function are valid.

We first define subtyping between nametests, then define the notion of subtyping between arbitrary types.

3.3.1 NameTest Subtyping

Notation

We denote NameTest subtyping with the following auxiliary judgment. The notation:

statEnvs |- NameTest1 <:wild NameTest2

indicates that NameTest NameTest1 is a subtype of the NameTest NameTest2.

We evaluate NameTest1 <:wild NameTest2 by decomposing both sides in to QNames or WildCards, and then defining a subtyping relation between each component of the name.

First QName <:wild QName is only true if they represent the same fully qualified name:

statEnvs |- expand(QName1) => qname(URI1,ncname1)            URI1 = URI2
statEnvs |- expand(QName2) => qname(URI2,ncname2)            ncname1 = ncname2

statEnvs |- QName1 <:wild QName2

statEnvs |- expand(QName1) => qname(ncname1)
statEnvs |- expand(QName2) => qname(ncname2)            ncname1 = ncname2

statEnvs |- QName1 <:wild QName2

Now the rules for QName <:wild WildCard:

statEnvs |- expand(QName1) => qname(URI1,ncname1)            URI1 = URI2
statEnvs |- expandNCName(NCName2) => URI2

statEnvs |- QName1 <:wild NCName2:*

statEnvs |- expand(QName1) => qname(URI1,ncname1)      ncname1 = NCName2

statEnvs |- QName1 <:wild *:NCName2


statEnvs |- QName1 <:wild *

And finally rules for WildCard1 <:wild WildCard2. Note that the second rule also prevents NCName:* from being a subtype of anything unless it is a valid known prefix.


statEnvs |- * <:wild *

expandNCName(NCName) => URI

statEnvs |- NCName:* <:wild *


statEnvs |- *:NCName <:wild *

Ed. Note: DD: (1) I've taken liberties comparing, e.g. NCName = ncname, (2) this doesn't take into account any resolution of URIs.