W3C

XQuery 1.0 Formal Semantics

W3C Working Draft 26 March 2002

This version:
http://www.w3.org/TR/2002/WD-query-semantics-20020326/
Latest version:
http://www.w3.org/TR/query-semantics/
Previous versions:
http://www.w3.org/TR/2001/WD-query-semantics-20010607/
http://www.w3.org/TR/2001/WD-query-algebra-20010215/
http://www.w3.org/TR/2000/WD-query-algebra-20001204/
Editors:
Denise Draper (XML Query WG), Nimble Technology <ddraper@nimble.com>
Peter Fankhauser (XML Query WG), Infonyte GmbH <fankhaus@infonyte.com>
Mary Fernández (XML Query WG), AT&T Labs - Research <mff@research.att.com>
Ashok Malhotra (XML Query and XSL WGs), Microsoft <ashokma@microsoft.com>
Kristoffer Rose (XSL WG), IBM Research <krisrose@us.ibm.com>
Michael Rys (XML Query WG), Microsoft <mrys@microsoft.com>
Jérôme Siméon (XML Query WG), Lucent Technologies <simeon@research.bell-labs.com>
Philip Wadler (XML Query WG), Avaya <wadler@avaya.com>

Abstract

This document presents the formal semantics of XQuery [XQuery 1.0: A Query Language for XML].

Status of this Document

This is a public W3C Working Draft for review by W3C Members and other interested parties. This section describes the status of this document at the time of its publication. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress." A list of current public W3C technical reports can be found at http://www.w3.org/TR/.

Much of this document is the result of joint work by the XML Query and XSL Working Groups, which are jointly responsible for XPath 2.0, a language derived from both XPath 1.0 and XQuery. This document defines the formal semantics for XQuery 1.0. A future version of this document will also define the formal semantics for XPath 2.0. Further information on the progress of the work of the XML Query WG can be found at http://www.w3.org/XML/Query.

This document is a work in progress. It contains many open issues, and should not be considered to be fully stable. Vendors who wish to create preview implementations based on this document do so at their own risk. While this document reflects the general consensus of the working groups, there are still controversial areas that may be subject to change.

A complete list of issues is maintained in [C Issues]. In particular, the reader should note the following important issues.

Comments on this document should be sent to the W3C XML Query mailing list www-xml-query-comments@w3.org; note that any email information sent to this list is publicly available in the W3C XML Query mailing list archive http://lists.w3.org/Archives/Public/www-xml-query-comments/.

XQuery 1.0 has been defined jointly by the XML Query Working Group (part of the XML Activity) and the XSL Working Group (part of the Style Activity).

Table of Contents

1 Introduction
2 Preliminaries
    2.1 Processing model
    2.2 Data Model
        2.2.1 Data model components
        2.2.2 Node identity
        2.2.3 Document order and sequence order
        2.2.4 Errors
    2.3 Schemas and types
        2.3.1 The elements of a (static) type system
        2.3.2 Subtyping
            2.3.2.1 Type equivalence
            2.3.2.2 Type substitutability
            2.3.2.3 Subtyping and XML Schema derivation
        2.3.3 Static types and dynamic types
    2.4 Functions
        2.4.1 Functions and operators
        2.4.2 Functions and static typing
        2.4.3 Data Model Accessors and XPath Axes
        2.4.4 Other formal semantics functions
    2.5 Notation
        2.5.1 Judgments and patterns
        2.5.2 Inference rules
        2.5.4 Environments
        2.5.5 Static type inference
3 The XQuery Type System
    3.1 XQuery Types
        3.1.1 Wildcard types
    3.2 Instances, and Domain of a Type
        3.2.1 Domain of a NameTest
        3.2.2 Semantics of the Domain of a Type
    3.3 Subtyping
        3.3.1 NameTest Subtyping
        3.3.2 Semantics of Subtyping
        3.3.3 Type equivalence
    3.4 Prime types
        3.4.1 Prime types and sequences of items with similar types
        3.4.2 Computing Prime Types
        3.4.3 Common Prime Types
            3.4.3.1 Common Occurrence Indicator
    3.5 Importing types from XML Schema
        3.5.1 Schema
        3.5.2 QNames
        3.5.3 Complex Type Definitions
            3.5.3.1 Global complex type
            3.5.3.2 Anonymous local complex type
        3.5.4 Groups
            3.5.4.1 Global named group declarations
            3.5.4.2 Local groups
        3.5.5 minOccurs and maxOccurs
        3.5.6 Elements
            3.5.6.1 Global elements
            3.5.6.2 Local elements
        3.5.7 Attributes
            3.5.7.1 Use
            3.5.7.2 Global attributes
            3.5.7.3 Local attributes
        3.5.8 Simple Content
        3.5.9 Attribute Group
            3.5.9.1 Attribute group declaration
            3.5.9.2 Attribute group reference
            3.5.9.3 Attribute wildcard
        3.5.10 Complex Content
        3.5.11 Simple Types
    3.6 Major type issues
4 Semantics of Expressions
    4.1 Basics
        4.1.1 Expression Context
            4.1.1.1 Static Context
            4.1.1.2 Evaluation Context
        4.1.2 Type Conversions
    4.2 Primary Expressions
        4.2.1 Literals
        4.2.2 Variables
        4.2.3 Parenthesized Expressions
        4.2.4 Function Calls
        4.2.5 Comments
    4.3 Path expressions
        4.3.1 Axis Steps
            4.3.1.1 Axes
            4.3.1.2 Node Tests
        4.3.2 General Steps
        4.3.3 Step Qualifiers
            4.3.3.1 Predicates
            4.3.3.2 Dereferences
        4.3.4 Unabbreviated Syntax
        4.3.5 Abbreviated Syntax
    4.4 Sequence Expressions
        4.4.1 Constructing Sequences
        4.4.2 Combining Sequences
    4.5 Arithmetic Expressions
    4.6 Comparison Expressions
        4.6.1 Value Comparisons
        4.6.2 General Comparisons
        4.6.3 Node Comparisons
        4.6.4 Order Comparisons
    4.7 Logical Expressions
    4.8 Constructors
        4.8.1 Element Constructors
        4.8.2 Computed Element and Attribute Constructors
        4.8.3 Other Constructors and Comments
    4.9 FLWR Expressions
        4.9.1 FLWR expressions
        4.9.2 For expression
        4.9.3 Let expression
    4.10 Sorting Expressions
    4.11 Conditional Expressions
    4.12 Quantified Expressions
    4.13 Datatypes
        4.13.1 Referring to Datatypes
        4.13.2 Expressions on Datatypes
            4.13.2.1 Instance of
            4.13.2.2 Cast expressions
                4.13.2.2.1 Cast as
                4.13.2.2.2 Treat as
                4.13.2.2.3 Assert as
            4.13.2.3 Typeswitch
5 The Query Prolog
    5.1 Namespace Declarations and Schema Imports
    5.2 Function Definitions
6 Additional Semantics of Functions
    6.1 Formal Semantics Functions
        6.1.1 The fs:document function
        6.1.2 The fs:data function
    6.2 Functions with specific typing rules
        6.2.1 The dm:error function
        6.2.2 The op:union, op:intersect and op:expect operators
        6.2.3 The op:to operator

Appendices

A Normalized core grammar
B References
    B.1 Normative References
    B.2 Non-normative References
    B.3 Background References
C Issues
    C.1 Introduction
    C.2 Issues list
    C.3 Alphabetic list of issues
        C.3.1 Open Issues
        C.3.2 Resolved (or redundant) Issues
    C.4 Delegated Issues
        C.4.1 XPath 2.0
        C.4.2 XQuery
        C.4.3 Operators


1 Introduction

This document defines the formal semantics of XQuery. The present document is part of a set of documents that together define the XQuery 1.0 language:

The scope and goals for the XQuery language were laid out in the charter of the XML Query Working Group and in the XQuery requirements [XML Query 1.0 Requirements].

XQuery is a powerful language, capable of selecting and extracting complex patterns from XML documents and of reformulating them into results in arbitrary ways. This document defines the semantics of XQuery by giving a precise formal meaning to each of the constructions of the XQuery specification in terms of the XQuery data model. This document assumes that the reader is already familiar with the XQuery language.

Two important design aspects of XQuery is that it is functional and typed. These two aspects play an important role in the XQuery Formal Semantics.

XQuery is a functional language. It is built from expressions , called queries, rather than statements. Every construct in the language (except for the query prolog) is an expression and expressions can be composed arbitrarily. The result of one expression can be used as the input to any other expression, as long as the type of the result of the former expression is compatible with the input type of the later expression with which it is composed. Another aspect of the functional approach is that variables are always passed by value and their value cannot be modified through side-effects.

XQuery is a typed language. Types can be imported from one or several XML Schemas (typically describing the documents that will be processed), and the XQuery language can then perform operations based on these types (e.g., using a treat as operation). In addition, XQuery also supports a level of static type analysis. This means that the system can perform some inference on the type of a query, based of the type of its inputs. Static typing allows early error detection, and may be the basis for certain forms of optimization. The type system of XQuery is based on [XML Schema Part 1], but does not support all the features of XML Schema. For example, XQuery does not provide support for derivation. The relationship between the XQuery type system and XML Schema is defined in [3.5 Importing types from XML Schema]. Issues with respect to the XQuery type system are discussed in [3.6 Major type issues].

The XQuery formal semantics builds on long-standing traditions in the database and in the programming languages communities. In particular, we have been inspired by works on SQL [SQL], OQL [ODMG], [BKD90], [CM93], and nested relational algebra (NRA) [BNTW95], [Col90], [LW97], [LMW96]. We have also been inspired by systems such as Quilt [Quilt], UnQL [BFS00], XDuce [HP2000], XML-QL [XMLQL99], XPath [XPath], XQL [XQL99], XSLT [XSLT 99], and YaTL [YAT99]. Additional citations are found in the bibliography [B References].

This document is organized as follows. In [2 Preliminaries], we introduce concepts and notations used for defining the XQuery formal semantics. In [3 The XQuery Type System], we describe the XQuery type system, operations on the XQuery type system, and explain the relationship between the XQuery type system and XML Schema. In [4 Semantics of Expressions], we describe the dynamic and static semantics of XQuery. In [5 The Query Prolog], we describe the semantics of the XQuery prolog. In [6 Additional Semantics of Functions], we describe any additional semantics required for functions.

2 Preliminaries

Ed. Note: Status: This section contains introductory material and is mostly new.

In this section, we provide background concepts for the XQuery formal semantics and introduce the notations that are used.

2.1 Processing model

We first define a processing model for query evaluation. This processing model is not intended to be a model for an actual implementation, although a (naive) implementation might be obtained from it. It does not require or constrain any implementation technique, but any implementation should produce the same results as the one obtained by following this processing model and applying the rest of the formal semantics specification.

The processing model is based of four phases; each phase consumes the result of the previous phase and generates output for the next phase:

  1. Parsing. The grammar for the XQuery syntax is defined in [XQuery 1.0: A Query Language for XML]. Parsing may generate syntax errors. If no error occurs, some internal representation of the parsed syntax tree is created.

  2. Normalization. To simplify the semantics specification, we first perform some normalization over the query. The XQuery language provides many powerful features that make querys simpler to write and use, but are also redundant. For instance, a complex for expression might be rewritten as a composition of several simple for expressions. The language composed of these simpler query is called the XQuery Core language and is a subset of the complete XQuery language. The grammar of the XQuery Core language is given in [A Normalized core grammar].

    During the normalization phase, each XQuery query is mapped into its equivalent query in the core. (Note that this has nothing to do with Unicode Normalization which works on character strings). Normalization works by bottom-up application of normalization rules over expressions, starting with normalization of literals. After normalization, the full semantics can be obtained just by giving a semantics to the normalized Core query. This is done during the last two phases.

    Ed. Note: The query prolog needs to be processed before normalization starts. [5 The Query Prolog] explains how the query prolog is processed. This is not yet reflected in the processing model. See [Issue-0130:  When to process the query prolog].

  3. Static type analysis. Static type analysis checks if each query is type safe, and if so, determines their static types. Static type analysis is defined only for Core query. Static type analysis works by bottom-up application of type inference rules over expressions, taking the type of literals and the known types of input documents into account.

    Static type analysis can result in a static error, if the query is not type-safe. For instance, a comparison between an integer value and a string value might be detected as an error during the static type analysis. If static type analysis succeeds, it results in a syntax tree where each sub-expression is "decorated" with its static type, and in an output type for the result of the query.

    Static typing does not imply that the content of XML documents must be rigidly fixed or even known in advance. The XQuery type system accommodates "flexible" types, such as elements that can contain any content. Schema-less documents are handled in XQuery by associating a standard type with the document, such that it may include any legal XML content.

  4. Dynamic Evaluation. This is the phase in which the result of the query is computed. The semantics of evaluation is defined only for Core query terms. Evaluation works by bottom-up application of evaluation rules over expressions, starting with evaluation of literals. This guarantees that every expression can be unambiguously reduced to a value, which is the final result of the query.

The first three phases above are "analysis-time" (sometimes also called "compile-time") steps, which is to say that they can be performed on a query before examining any input document. Indeed, analysis-time processing can be performed before such document even exists. Analysis-time processing can detect many errors early-on, e.g., syntax errors or type errors. If no error occurs, the result of analysis-time processing could be some compiled form of query, suitable for execution by a compiled-query processor. The last phase is an "execution-time" (sometimes also called "run-time") step, which is to say that it is performed on the actual input document.

Static analysis catches only certain classes of errors. For instance, it can detect a comparison operation applied between incompatible types (e.g., xsd:int and xsd:date). Some other classes of errors cannot be detected by the static analysis and are only detected at execution time. For instance, whether an arithmetic expression on 32 bits integers (xsd:int) yields an out-of-bound value can only be detected by looking at the data, and is raised as an evaluation error.

While implementations may be free to employ different processing models, the XQuery static semantics relies on the existence of a static type analysis phase that precedes any access to the input data. Statically typed implementations are required to find and report static type errors, as specified in this document. It is still an open issue (See [Issue-0098:  Implementation of and conformance levels for static type checking]) whether to make the static type analysis phase optional to allow implementations to return a result for a type-invalid query in special cases where the query happens to perform correctly on a particular instance of input data. (Note that this is not the same as merely permitting that the static type error is emitted at evaluation time as we do below.)

Notice that the separation of logical processing into phases is not meant to imply that implementations must separate analysis-time from evaluation-time processing: query processors may choose to perform all phases simultaneously at evaluation-time and may even mix the phases in their internal implementations. All the processing model defines is what the final result of processing should be.

The above processing phases are all internal to the query processor. They do not deal with how the query processor interacts with the outside world, notably how it accesses actual documents and types. A typical query engine would support at least three other important processing phases:

  1. XML Schema import phase. The XQuery type system is based on XML Schema. In order to perform static type analysis, the XQuery processor needs to build type descriptions that correspond to the schema of the input documents. This phase is achieved by mapping all schemas required by the query into the XQuery type system. The XML Schema import phase is described in this document in [3.5 Importing types from XML Schema].

  2. XML loading phase. During the evaluation phase, expressions are applied on instances of the [XQuery 1.0 and XPath 2.0 Data Model]. How XML documents are loaded into the data model is described in the [XQuery 1.0 and XPath 2.0 Data Model] and is not be discussed further here.

  3. Serialization phase. Once the query is evaluated, processors might want to serialize the result of the query as actual XML documents. Serialization of data model instances is still an open issue (See [Issue-0116:  Serialization]) and is not be discussed further here.

We do not describe the parsing phase formally. Notably, we do not specify data structures for the syntax trees. Instead, we use the XQuery concrete syntax directly as a support for the formal semantics specification. More details about parsing can be found in the [XQuery 1.0: A Query Language for XML] document. No further discussion of parsing is included here.

For the other three phases (normalization, static type analysis and evaluation), instead or giving an algorithm in pseudo-code, we describe the semantics formally by means of inference rules. Inference rules provide a notation to describe how the semantics of an expression derives from the semantics of its sub-expressions. Hence, they provide a concise and precise way for specifying bottom-up algorithms, as required by the normalization, static type analysis, and evaluation phases.

2.2 Data Model

In this section we first present the basic components of the XQuery data model pertinent to the semantic description. The remainder of the section is devoted to several related features that merit special attention: node identity, order, and error values.

2.2.1 Data model components

The components manipulated in XQuery, such as simple-typed values and nodes (attributes, elements, etc.), are defined in the [XQuery 1.0 and XPath 2.0 Data Model] document. In this section we present only a short review of these components. We refer the reader to the [XQuery 1.0 and XPath 2.0 Data Model] document for more details.

There are five kinds of components in the [XQuery 1.0 and XPath 2.0 Data Model]: Error, SimpleValue, Node, Sequence, and SchemaComponent.

  • A single error value.

  • A SimpleValue: a value from any of the value spaces of XML Schema simple types, as defined in [XML Schema Part 2].

  • A Node; which is one of the following kinds: DocumentNode, ElementNode, AttributeNode, NamespaceNode, CommentNode, PINode and TextNode. XML documents and their content are represented in the data model as trees composed of nodes and values. [XQuery 1.0 and XPath 2.0 Data Model] defines a mapping from the PSVI (Post Schema Validation Information Set) to such trees.

  • A Sequence: which is an ordered collection of nodes, an ordered collection of simple-typed values, or an ordered collection of items (i.e., any mixture of nodes and simple-typed values). An important characteristic of sequences is that they cannot be nested. That is, all sequences are "flat", and cannot be composed of other sequences. Moreover, there is no distinction between a single item and a singleton sequence containing that item. That is, a single item and a singleton sequence containing that item are equivalent. It is still open whether sequences of entire documents are permitted [Issue-0068:  Document Collections].

  • A SchemaComponent: the type of a node. The content and use of this schema component is still an open issue.

[XQuery 1.0 and XPath 2.0 Data Model] defines constructors, functions to construct each of these kinds of data model component, as well as accessors, functions to access their contents. For instance, the xf:children function can be used to retrieve the children nodes of an element node.

2.2.2 Node identity

In the [XQuery 1.0 and XPath 2.0 Data Model], nodes have an identity. Node identity follows from the unique location of any node within exactly one XML document (or document fragment, in the case of XML being constructed within the query itself). It is possible for two expressions to return not only the same value, but also the exact same node. On the other hand, two expressions could have results with the same document structure, but different identities. XQuery supports comparison of two nodes using either "node equality" (equality by identity) or "value equality" (equality by equal values), for instance using one of the two operators == or =.

Simple-typed values do not have an associated identity and are therefore always compared by value equality.

The difference between node equality and value equality is illustrated on the following example:

let $a1 := <book><author>Suciu</author></book>,
    $a2 := <author>Suciu</author>,
    $a3 := $a1/author
return
    ($a1/author == $a3), {-- true, they refer to the same node --}
    ($a1/author = $a2),  {-- true, they have the same value --}
    ($a1/author == $a2)  {-- false, they are not the same node --}

All XQuery's operators preserve node identity with the exception of explicit copy operations and node (element, attribute, etc.) constructors. An element constructor always creates a deep copy of its attributes and children nodes. Other operators, such as sequence construction or path expressions, do not result in copies of the corresponding nodes. So, for example, ($a3, $a2) creates a new sequence of nodes, the first of which has the same identity as $a1/author.

2.2.3 Document order and sequence order

There are two kinds of order in XQuery: sequence order and document order.

Sequence order. Sequences of items in the XQuery data model are ordered. Sequence order refers to the order of these items within a given sequence.

Document order. Document order refers to the total order among all nodes within a given document. It is defined as the order of appearance of the nodes when performing a pre-order, depth-first traversal of a tree. For elements, this corresponds to the order of appearance of their opening tags in the corresponding XML serialization. Document order is equivalent to the definition used in [XML Path Language (XPath) : Version 1.0].

Depending on the context, it may be possible to talk about two different notions of order for the same (set of) nodes. For example:

let $e := <list>
              <item id="1"/>
              <item id="2"/>
          </list>,
    $s := ($e/item[@id='2'], $e/item[@id='1'])
...

The item "1" is before item "2" in document order, but the same two nodes are in the opposite order in the sequence $s.

In the case of nodes among several documents, the order between the document is implementation defined but must be stable. I.e., it must not change for the duration of query evaluation. Support for document order among elements in distinct documents or document fragments is discussed in [Issue-0003:  Document Order].

2.2.4 Errors

Some queries result in an error. Errors that are detected at compile-time, like parse errors or type errors, are reported to the user by the system. Dynamic errors must return the Error value defined by the [XQuery 1.0 and XPath 2.0 Data Model].

Whenever necessary, we use the notation dm:error() to refer to the error value in the [XQuery 1.0 and XPath 2.0 Data Model]. General handling of errors in XQuery is still an open issue (See [Issue-0094:  Static type errors and warnings] and Issue-98 in [XQuery 1.0: A Query Language for XML]).

2.3 Schemas and types

The Data Model establishes the space of values that may be manipulated by XQuery, but it does not establish a complete type system. For example, we may know that a node is an ElementNode, but that does not tell us what kind of element it is, what its legal contents is, etc.

The XQuery type system is based on [XML Schema Part 1]. An introduction to XML Schema can be found in the primer [XML Schema Part 0]. An important notion underlying XML Schema are regular expressions, and their tree counterparts, regular tree grammars. An introduction to regular (tree) languages can be found in [Languages] or in [TATA]. In order to denote regular expressions, the XQuery type system uses the familiar notation of XML DTDs. We refer the reader to [XML] for more on DTDs.

Ed. Note: We do not give an introduction to regular expressions and tree grammars for now, but a future version of the document should add some introductory text about: regular expressions, tree grammars, and basic algorithms over these that are relevant for XQuery. Notably, some text about closure properties, complexity, subsumption algorithm, etc. could be provided there.

This section presents an introduction to (static) type systems in general, and a conceptual overview of the XQuery type system in particular. The XQuery type system is discussed formally in [3 The XQuery Type System].

2.3.1 The elements of a (static) type system

A type system relates values, types, and expressions. A (static) type system requires several constituent parts:

  • A universe of possible types. Every type system starts with a set of primitive types and most (including XQuery's) have type constructors that allow the construction of complex types.

    In XQuery, the primitive types are the built-in simple datatypes types of [XML Schema Part 2]. There are also type constructors for sequences and node types such as attributes and elements. These correspond to the concepts of group, attribute and element in [XML Schema Part 1] (or more precisely, in the formalization of XML Schema in [XML Schema : Formal Description]).

  • A notation for types. A syntax is necessary to define and manipulate types. In order to write the type related semantics of XQuery, this document uses its own notation, which is used notably in type inference and dynamic semantics rules.

  • A notion of instances, and domain of a type. Formally, a type is defined to be the (usually infinite) set of instance values that belong to that type. Thus the type xs:integer consists of the set of all integer values, and the type element address { xs:string } consists of all elements named "address" whose contents are a single string value.

  • A notion of subtyping. Subtyping is a relationship between type, i.e., it relates types to each other.

    Subtyping plays a crucial role in a type system. Most notably, it is used to determine the legality of arguments to functions.

  • Rules that relate expressions to types. The heart of static type inference is associating a static type with each expression in a query. In a strongly-typed language like XQuery, the static type is a guarantee that, no matter what the input to the query is, the result of evaluating the expression is a value that belongs to its static type.

  • Rules that relate values to types. During query evaluation, expressions are evaluated to yield values. The dynamic component of a typing system determines the actual types of data values as they are computed.

2.3.2 Subtyping

One type is a subtype of another if every value belonging to the first type also belongs to the other. Here is a simple example of subtyping:

( xs:double )*
( xs:integer | xs:double )*

The first type is a subtype of the second, because any sequence that contains only doubles is also an example of a sequence containing either integers or doubles.

We write Type1 <: Type2 to indicate that type Type1 is a subtype Type2. Note that we use the subtype notation <: when defining the XQuery semantics, but it is not part of the XQuery syntax.

2.3.2.1 Type equivalence

Two types are equivalent if every value of one type is also a value of the other type, and vice versa, i.e., two types are equivalent if and only if they are both subtypes of the other. Here is a simple example of two equivalent types:

( xs:integer | xs:double )*
( xs:double | xs:integer )*

Both of these types mean a sequence of any number of doubles or integers.

We write type equivalence as Type1 = Type2, where Type1 and Type2 are both types. If Type1 = Type2 then Type1 <: Type2 and Type2 <: Type1 are true.

2.3.2.2 Type substitutability

Subtyping plays a crucial role in the semantics of function application since all functions have an associated signature that declares the types of input parameters and the type of the result of the function.

Intuitively, one can call a function not only by passing an argument of the declared type, but also any argument whose type is a subtype of the declared type. This property is called type substitutability.

Ed. Note: This section should contain an example.

2.3.2.3 Subtyping and XML Schema derivation

Subtyping in XQuery is not the same as "derivation" in XML Schema. A derived type is always a subtype of its base type, but the opposite is not true. For instance, consider the following types:

( xs:double )+
xs:double, ( xs:integer | xs:double )*

The first type is a subtype of the second, because any sequence that contains one or more doubles is also an example of a sequence containing a double followed by a sequence of zero or more integers or doubles. But derivation as defined in XML Schema does not hold between these two types.

The relationship between subtyping and XML Schema derivation is discussed in [Issue-0019:  Support derived types].

2.3.3 Static types and dynamic types

One consequence of having a static type system is that for the same expression, a type is first computed during the static phase, then a type for the actual value which is the result of that expression is computed during the dynamic phase. We use the terms static type and dynamic type respectively for each of these computed types for a given expression. It is important to note that for the same expression the dynamic type is often more precise than the static one, since the real value is not known statically. Indeed, the static type can be considered as an approximation of the dynamic one based on the available static information. Consider, for example, an XPath expression

      
$x/(foo | bar)
    

In this expression the static type might be

      
( element foo | element bar )*
    

If during execution, $x happens to contain exactly two foo elements, the dynamic type of (the result of evaluating) $x is a sequence of two foo elements.

      
( element foo, element foo ).
    

Of course, the basic guarantee of static type inference is that the dynamic type is always a subtype of the static type. Note that dynamic types are specific: they never contain choices, all groups, or quantifiers.

Ed. Note: MFF: This next paragraph should be changed to describe type annotations.

The XQuery data model also defines SchemaComponents associated with Nodes. SchemaComponents relate to the type associated with a Node through schema validation. This may be different from either the static or the dynamic type of the node. The relationship between static types, dynamic types and schema components in the data model is still an open issue. See [Issue-0018:  Align algebra types with schema].

2.4 Functions

2.4.1 Functions and operators

The [XQuery 1.0 and XPath 2.0 Functions and Operators] document provides a number of useful basic functions over the components of the XQuery data model (simple-typed values, nodes, sequences, etc). We use a number of these functions in the course of describing the XQuery semantics. Here is the list of functions from the [XQuery 1.0 and XPath 2.0 Functions and Operators] document that are used in the XQuery formal semantics:

  • op:intersect
  • op:union
  • op:except
  • op:concatenate
  • op:boolean-and
  • op:boolean-or
  • op:union-value
  • op:except-value
  • op:to
  • op:numeric-equal
  • op:numeric-add
  • op:get-end-datetime
  • op:numeric-subtract
  • op:get-duration
  • op:get-start-datetime
  • op:numeric-multiply
  • op:numeric-divide
  • op:numeric-mod
  • op:boolean-equal
  • op:datetime-equal
  • op:duration-equal
  • op:hex-binary-equal
  • op:base64-binary-equal
  • op:numeric-greater-than
  • op:datetime-greater-than
  • op:duration-greater-than
  • op:numeric-less-than
  • op:datetime-less-than
  • op:duration-less-than
  • op:node-equal
  • op:node-before
  • op:node-after
  • op:numeric-unary-plus
  • op:numeric-unary-minus
  • op:operation
  • xf:false
  • xf:true
  • xf:length
  • xf:boolean
  • xf:item-at
  • xf:get-namespace-uri
  • xf:get-local-name
  • xf:round
  • xf:compare
  • xf:not
  • xf:not3
  • xf:empty-sequence
  • xf:root
  • xf:index-of

2.4.2 Functions and static typing

Many functions in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document are generic: they perform operations on arbitrary components of the data model, e.g., any kind of node, or any sequence of items. For instance, the xf:distinct-nodes function removes duplicates in any sequence of nodes. As a result, the signature given in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document is also generic. For instance, the signature of the xf:distinct-nodes function is:

define function xf:distinct-nodes(node*) returns node*

If applied as is, this signature would result in poor static type information. In general, it would be preferable if some of these function signatures could take the type of input parameters into account. For instance, if the function xf:distinct-nodes is applied on a parameter of type element a*, element b, one can easily deduce that the resulting sequence is a collection of either a or b elements.

In order to provide better static typing, we specify typing rules for some of the functions in [XQuery 1.0 and XPath 2.0 Functions and Operators]. These additional typing rules are given in [6 Additional Semantics of Functions]. Here is the list of functions that are given specific typing rules.

  • dm:error
  • op:union
  • op:intersect
  • op:expect
  • op:to

Ed. Note: This list reflects the content of Sections 6, but should be expanded. See [Issue-0135:  Semantics of special functions].

2.4.3 Data Model Accessors and XPath Axes

The XQuery Data Model provides operations to construct or access component of the Data Model. Some of these operations are used in the course of defining the formal semantics. We use the namespace prefix dm: those functions to distinguish them for other functions from the [XQuery 1.0 and XPath 2.0 Functions and Operators] document.

Here is a list of data model constructors or accessors used for formal semantics specification.

  • dm:error
  • dm:atomic-value
  • dm:children
  • dm:attributes
  • dm:parent
  • dm:name
  • dm:node-kind
  • dm:dereference
  • dm:empty-sequence
  • dm:namespaces
  • dm:type
  • dm:typed-value
  • dm:attribute-simple-node
  • dm:element-simple-node
  • dm:text-node

XPath axes are used to describe tree navigation in instances of the XQuery data model. The semantics of axis navigation is described in terms of data model functions. Some of the XPath axes are directly supported through existing data model accessors. Some axes (e.g., ancestor) are defined as a simple recursive application of a data model accessor and others (e.g., following) require more complex computation on top of existing data model accessors. The correspondence between XPath axis and data model accessors is specified in section [4.3.1 Axis Steps]

2.4.4 Other formal semantics functions

In a few cases, the formal semantics makes use of some functions that are not currently in the [XQuery 1.0 and XPath 2.0 Functions and Operators] or in the [XQuery 1.0 and XPath 2.0 Data Model] documents. We use the namespace prefix fs: for those functions, to distinguish them from functions in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document and from Data Model accessors.

Here is a list of additional functions used for formal semantics specification.

  • fs:document
  • fs:distinct-doc-order
  • fs:data
  • fs:unique-item-at

2.5 Notation

Ed. Note: Status: This section is intended to provide an introduction to formal notations, such as inference rules, for readers which are not familiar with these notations. It can be skipped by the reader which is familiar with these notations. This introductory material is still considered a draft and will be improved in further rewritings of the Formal Semantics document.

In order to make the semantics specification both concise and precise, we use logical notation in the form of logical inference rules. In this section, we introduce the notations for inference rules and sorts of semantic values, specifically environments.

2.5.1 Judgments and patterns

Ed. Note: The following is a somewhat terse explanation of inference rules using formal semantic lingo informally. It needs some examples.

The basic building block of an inference rule is a judgment that expresses some property. A judgment is specified by a description that contains a combination of basic patterns, each identifying a sort of value that can be inserted, interspersed with symbols. Symbols must occur literally in judgments between values of a sort that corresponds to the used patterns.

A pattern is identified by one or more italic words. The names of patterns are significant: each pattern name corresponds to a sort of value that can be substituted legally for the pattern. For this reason, a pattern is sometimes called a metavariable that is instantiated to some value of the appropriate sort.

We employ the following conventions for pattern names:

  1. patterns for syntactic sorts denoting lexical items begin with upper case and their names correspond to terminals or nonterminals in the XQuery grammar. For example, Expr is a pattern that can be instantiated with any XQuery Expr.

  2. patterns for semantic sorts, to be defined, begin with lower case letters. For example, denotes a semantic value.

Furthermore, all patterns may appear with subscripts (e.g. Expr1, Expr2), which simply name different instances of the same pattern sort.

For example, '3 => 3' and '$x+0 => 3' are both instances of the judgment described by 'Expr => value'. It is no problem that we are overloading the symbol '3' to denote both the '3' in an expression and the value 3: this is unambiguous because of the strict use of sorts in the judgment descriptions.

Finally, we require that each distinct pattern is instantiated to a single expression or value so if the same pattern occurs twice in a judgment description then it should be instantiated with the same value. This means that '3 => 3' is an instance of the judgment described by 'Expr => Expr' but '$x+0 => 3' is not. (Both, however, are instances of the judgment described by 'Expr1 => Expr2'.)

2.5.2 Inference rules

Inference rules are used to express the logical relation between judgments and define how complex judgments can be concluded from simpler premise judgments. (As explained in [2.1 Processing model], this approach lends itself well to bottom-up algorithms.)

A logical inference rule is written as two collections of premise and conclusion judgments, written above and below a dividing line, respectively:

premise1 ... premisen

conclusion1 ... conclusionn

The interpretation of an inference rule is: if all the premise judgments above the line are true, then we know that each of the conclusion judgments below the line are also true.

Here is a simple example to illustrate (based on the example judgment from above described by 'Expr1 => Expr2'):

$x => 0      3 => 3

$x + 3 => 3

This rule expresses the property of the evaluation judgment 'Expr=>value': if an expression containing the variable reference '$x' yields the value zero, and the literal expression '3' yields the value '3', then we know that '$x + 3' yields the value '3'.

It is also possible for the expression above the line to have no judgments at all; this simply means that the expression below the line is true under all premises:


3 => 3

This judgment says that evaluating the expression '3' always yields the value three.

The rules above are expressed in terms of specific variables and values, but usually rules are more abstract. That is, the judgments they relate contain patterns that match sorts of things. For example, here is a rule that says for any variable, adding 0 yields the same value:

Variable => value

Variable + 0 => value

Formally, the rule states that if some variable in XQuery, to which Variable is instantiated, evaluates to some value, to which value is instantiated, then we can conclude that the query 'Variable + 0', with Variable substituted by the actual variable, evaluates to the same value.

We require that each unique patterns used in a particular inference rules is instantiated to the same value for the entire rule. This means that we can refer to "the value of Variable" instead of the more precise "what Variable is instantiated to in (this particular instantiation of) the inference rule".

Putting these together, here is an example of an inference rule that occurs later in this document:

statEnvs |- Expr1 : xs:boolean      statEnvs |- Expr2 : Type2      statEnvs |- Expr3 : Type3

statEnvs |- if Expr1 then Expr2 else Expr3 : ( Type2 | Type3 )

This rule states that if the conditional expression of an if expression has type boolean, then the type of the entire expression is one of the two types of its then and else clauses. Note that this is represented as a choice type: '(Type2|Type3)'.

The rule also illustrates a convention of judgment expressions: Judgments usually consist of three parts: the assumptions, the problem, and the result, organized as "assumptions |- problem =>> result" The "|-" symbol is pronounced "turnstile" and is ubiquitous in the rules whereas the actual symbol used instead of "=>>" varies with the defined judgment -- in this particular example it is the ":" symbol. The idea is that with the assumption defined one can compute the result from the problem.

Ed. Note: Jonathan suggests to explain how to 'chain' inference rules. I.e., how several inference rules are applied recursively.

2.5.3 Normalization rules

Normalization is presented as logical rewrite rules, which specify those expressions that are rewritten into some other expression in the XQuery core. The static semantics is presented as type inference rules, which relate XQuery expressions to types and specify under what conditions an expression is well typed. The dynamic, or operational, semantics is presented as value inference rules, which specify the order in which an XQuery expression is evaluated and relate XQuery expressions to values.

Ed. Note: I'm leaving this unchanged, but I think we can and should change the definition of normalization so that it doesn't force the use of "no-op" rules; i.e. instead of forcing the recurrent execution of more mapping rules, allow them to be pattern matched just as the rest of the inference rules are.

Normalization applies a translation function to each expression in the XQuery syntax and returns an expression in the XQuery core syntax; The notation [ Expr ] denotes the result of translating the XQuery expression Expr into an expression in the XQuery core. All normalization rules have the following structure:

 
[Expr]
==
coreExpr

The [ Expr ] above the == sign denotes an expression in the XQuery language before translation and the coreExpr beneath the == sign denotes an equivalent expression in the XQuery core.

For convenience, the translation of an optional expression, e.g., Expr?, is denoted by ( [ Expr ] )?, meaning that the optional expression is translated, if it exists. Similarly, the translation of a repeated expression, e.g., Expr* is denoted by [ Expr ]*, meaning that each expression in the repetition is translated individually.

Many of the mapping rules from XQuery into the core convert expressions that have implicit arguments, types, or complex semantics into core expressions that have explicit arguments, types, and a more verbose, but explicit semantics.

2.5.4 Environments

Logical inference rules use environments to record information computed during static type inference or dynamic evaluation so that information can be used by other logical inference rules. For example, the type signature of a user-defined function in a query prolog can be recorded in an environment and used by subsequence rules. Similarly, the value assigned to a variable within a let statement can be captured in an environment and used for further evaluations.

In general an environment is a dictionary that maps a symbol (e.g., a function name or a variable name) to a value. If env is an environment then "env( symbol )" denotes the value that symbol is mapped to by env. Depending on the kind of environment, the "value" can be a type, a data value, etc. The notation is intentionally akin to function application as an environment can be seen as a "function" from the argument symbol to the value that symbol maps to.

In this specification we use environment groups that group related environments. If "envs" is an environment group with the member "mem", then that environment is denoted "envs.mem" and the value that it maps symbol to is denoted "envs.mem(symbol)".

In particular, updating is only defined on environment groups:

  • "envs [ mem(symbol |-> value) ]" denotes the environment group which is just like envs except that the mem environment has been updated to map symbol to value.

  • If the "value" is a type then we use the conventional variant notation "envs [ mem(symbol : value) ]".

  • We allow the shorthand "envs [ mem( symbol1 |-> value1 ; ... ; symboln |-> valuen ) ]" for the nested " (...(envs[mem(symbol1 |-> value1)]) ... [mem(symboln |-> valuen)])".

Note that updating the environment overrides any previous binding that might exist for the same name. Updating the environment is used to properly capture the scope of variables, namespaces, etc. Also, note that there are no operations to remove entries from environments: the need never arises because the environment group from "before" the update remains accessible concurrently with the updated version.

Finally, an example that brings several of the notations above together. Here is a rule that extends the varType member of the static environment group statEnvs with a typing of a new variable:

statEnvs |- Expr1 : Type1      statEnvs [ varType(QName : Type1) ] |- Expr2 : Type2

statEnvs |- let $QName = Expr1 return Expr2   :   Type2

2.5.5 Static type inference

The result of static type inference is to associate a static type with every expression in a query, such that any evaluation of that expression is guaranteed to yield a value that belongs to that type. This section outlines how this is done.

For each type of expression (as defined by the grammar production rules of XQuery), we define static type inference rules. Here is a simple example of such a rule:

Expr1 : Type1      Expr2 : Type2

Expr1 , Expr2 : Type1, Type2

This rule is read as follows: if two expressions Expr1 and Expr2 have already been statically inferred to have types Type1 and Type2 (that's the premises above the line), then it is the case that the expression below the line "Expr1 , Expr2" must have the type "Type1, Type2". Basically, it says that when two expressions are concatenated together, the result type is the concatenation of their types.

The "left half" of the expression below the line (the part before the ":") corresponds to some phrase in a query, for which we want to compute a type. If the query has been parsed into an internal parse tree, this usually corresponds to some node in that tree. The expression usually has patterns in it (Expr1 and Expr2) that need to be matched against the children of the node in the parse tree. The expressions above the line indicate things that we need to know or compute to use this rule; in this case, we need to know the types of the two sub-expressions of the "," operator. Once we know those types (by applying static inference rules recursively to the expressions on each side), then we can compute the type of this expression. This illustrates a general feature of the type system: the type of an expression depends only on the type of its sub-expressions. The overall static type inference algorithm is a recursive, following the parse structure of the query. At each point in the recursion, an appropriate matching inference rule is sought; if at any point there is no applicable rule, then static type inference has failed: the query is not type-safe.

As stated earlier, the types computed by static inference rules are sound, which means that they guarantee that any value computed by an expression must belong to the static type of the expression. Another characteristic of a static type system is completeness. A type system is complete if it computes the "tightest possible" static type for every expression. It is very rare for type systems to be complete, and XQuery is no exception: there are expressions for which the static type system of XQuery computes a type that is not the tightest type possible.

2.5.6 Evaluation rules

Dynamic semantics associates values with query expressions. As with static type inference, we use logical inference rules to determine the value of expressions given values of sub-expressions, and information contained in environments. XQuery's dynamic semantics is modeled on the dynamic semantics presented in [Milner].

The dynamic rules use judgments of the form: dynEnvs |- phrase => value, where phrase is a phrase in XQuery syntax, and value is a value.

The inference rules used for dynamic evaluation, like those for static type inference, follow a top-down recursive structure, computing the value of expressions from the values of their sub-expressions.

The dynamic type of data values is completely determined by the values themselves. Thus, there are no extra rules or machinery for computing dynamic types.

3 The XQuery Type System

Ed. Note: Status:The XQuery type system defined in this section is based on structural typing. The definition and formal specification of the XQuery type system will be revisited when support for named typing is be added. Support for named typing is discussed in [Issue-0104:  Support for named typing].

The XQuery type system is used for the specification of both the dynamic and the static semantics of XQuery. It is used to describe the semantics of datatype operations and type conversion rules in XQuery. It is also the basis for the static semantics of XQuery.

The XQuery type system is based on XML Schema. XML Schema defines a notion of validation for XML documents. When doing validation, an XML Schema processor checks if the structure, and the text content of a document verifies the constraints on the structure, and the constraints on values, specified in the schema. As an example of constraints on the document structure, all book elements might be declared to contain an isbn attribute, and a title element, followed by a sequence of one or more author elements. As an example of constraints on values, the isbn attribute might be declared as a subset of the XML Schema xsd:string values, using a XML Schema pattern facet.

The XQuery type system is based on the structural constraints of XML schema, which have been formalized as [XML Schema : Formal Description]. There are two main difference between the XQuery type system and [XML Schema : Formal Description]. First the XQuery type system does not capture the notion of XML Schema names (also called normalized universal names in [XML Schema : Formal Description]). Second, the XQuery type system uses a specific syntax, as needed for semantics specification, notably the static type inference.

This section describes the XQuery type system, the essential operations on the XQuery type system, and the relationship between the XQuery type system and XML Schema.

3.1 XQuery Types

Types in the XQuery type system are described with the following grammar:

[75]   TypeDeclaration   ::=   ("define" "element" QName "{" Type? "}")
|  ("define" "attribute" QName "{" Type? "}")
|  ("define" "type" QName "{" Type? "}")
[76]   Type   ::=   TypeUnion
|  TypeBoth
|  TypeSequence
|  TypeRepeat
|  ItemType
|  TypeRef
|  TypeParenthesized
|  TypeNone
[77]   TypeUnion   ::=   Type  "|"  Type
[78]   TypeBoth   ::=   Type  "&"  Type
[80]   TypeRepeat   ::=   Type TypeOccurrenceIndicator
[79]   TypeSequence   ::=   Type  ","  Type
[81]   ItemType   ::=   ElementRef
|  AttributeRef
|  SimpleType
[82]   TypeRef   ::=   "type" QName
[83]   TypeParenthesized   ::=   "(" Type? ")"
[84]   TypeNone   ::=   "none"
[55]   SimpleType   ::=   QName
[85]   AttributeRef   ::=   ("attribute" QName)
|  ("attribute" NameTest "{" Type? "}")
[86]   ElementRef   ::=   ("element" QName)
|  ("element" NameTest "{" Type?)
[87]   TypeOccurrenceIndicator   ::=   "*" |  "+" |  "?"

Ed. Note: This type system does not support text, process-instruction and document nodes. Such types have to be added (See [Issue-0105:  Types for nodes in the data model.], and [Issue-0068:  Document Collections]).

In addition, some constraints, not specified by the grammar, must hold:

SimpleType

SimpleType must be one of the built-in simple datatypes from [XML Schema Part 2].

AttributeRef

The content model of a AttributeRef must be consistent with the allowed content of attributes. Namely, it must be a subtype of (using wilcards types as defined in section on [3.1.1 Wildcard types]):

            
xs:AnySimpleType *
         

In the case the content of the attribute is not specified (e.g., attribute isbn), this indicates a reference to a globally defined attribute in the schema. Therefore the corresponding attribute must exist in the schema and be globally declared.

ElementRef

The content model of a ElementRef must be consistent with the allowed content of elements. Namely, it must be a subtype of:

  ( xs:AnyAttribute* ,
       xs:AnySimpleType* |
       ( xs:AnyElement | xs:string )* )
         

Note that comments and processing instructions, while they are allowed data values within an element, do not constitute part of an element's type.

In the case the content of the element is not specified (e.g., element title), this indicates a reference to a globally defined element in the schema. Therefore the corresponding element must exist in the schema and be globally declared.

TypeNone

The type TypeNone indicates a type error. This type can be generated by the type system during the course of static type analysis, but it is not a legal value in any type declaration in a query. Note that TypeNone is the identity for choice (I.e., (t | TypeNone) == t)).

Ed. Note: Jerome: The status of the constraints on attribute and element type constructors is unclear. When are they enforced and checked ? (See [Issue-0106:  Constraint on attribute and element content models]

In [XML Schema Part 2], a simple datatype is either a primitive datatype, or is derived from another simple datatype by specifying a set of facets. The type syntax of XQuery does not provide any way to define datatypes with facets. Such types can be imported from XML Schemas and may be referenced as Type xs:QName, where xs:QName names the definition of the faceted type in the schema.

The occurrence indicators "*", "?" and "+" indicate a sequence of any, zero or one, or more than one items, respectively. This plays the same role as, but is more limited than, the minoccurs/maxoccurs facets of [XML Schema Part 1], and follows the notations of DTDs.

A WildCard denotes a set of names. A WildCard is either

  • a QName denoting the name with the given namespace prefix and local name;

  • * denoting any name in any namespace;

  • NCName:* denoting any local name in namespace NCName;

  • or *:NCName denoting the local name NCName in any namespace;

The element, and attribute, type constructors may use a wildcard to specify sets of one or more matching elements, or attributes. For example,

  element bib:* { xs:integer* }  

indicates any element in the bib namespace whose content consists of zero or more integers.

Empty content is indicated with the explicit empty sequence, as in

  element bib:* { ( ) }  

If the content model is omitted, then the element, (resp. attribute or type) name must be a proper QName, and refers to the global element (resp. attribute or type) with that name, which must be defined (by importation from a schema or through a type definition expression).

The type system includes three operators that can combine types: ",", "|" and "&".

The "," operator builds the "sequence" of two types. For example,

  xs:integer , xs:double  

means a sequence of an integer followed by a double. This corresponds to the xs:sequence construct of XML Schema for elements; XML Schema does not recognize similar properties for simple types.

The "|" operator builds a "choice" between two types. For example,

  xs:integer | (element bib:author)*  

means either an integer or a sequence of bib:author elements. The "|" operator corresponds to the xs:choice construct of XML Schema for elements and the xs:union construct for simple types.

The "&" operator builds the "interleaved product" of two types. The type t1 & t2 matches any sequence that is an interleaving of a sequence that matches t1 and a sequence that matches t2. For example,

  (element a , element b) & xs:string   =
                  element a, element b, xs:string
                | element a, xs:string, element b
                | xs:string, element a, element b   

In each case, the order of element a and element b is unchanged, but the xs:string can be interleaved in any position.

The interleaved product is a generalization of the xs:all construct in XML Schema for elements, and has no corresponding representation in XML Schema for simple types. The xs:all construct in XML Schema may only consist of global or local element declarations with lower bound 0 or 1, and upper bound 1.

3.1.1 Wildcard types

XQuery defines the following built-in datatypes:

define type xs:AnySimpleType { xs:integer | xs:float | xs:string ... }
define type xs:AnyAttribute  { attribute * { type xs:AnySimpleType * }}
define type xs:AnyElement    { element *  { type xs:AnyType } }
define type xs:AnyNode       { type xs:AnyAttribute | type xs:AnyElement }
define type xs:AnyItem       { type xs:AnySimpleType | type xs:AnyNode }
define type xs:AnyType       { type xs:AnyItem* }

The concepts of xs:AnySimpleType and xs:AnyType have the same interpretation as anySimpleType and anyType of XML Schema, respectively, though this version of xs:AnyType contains types (such as comments and processing instructions) that do not occur in XML Schema.

3.2 Instances, and Domain of a Type

Ed. Note: Status:This section is still incomplete and based on structural typing. The notion of domain of a type will be revised when support for named typing is added.

The semantics of a type is given by the notion of domain, I.e., the set of all values which are instance of that type. A tree in the [XQuery 1.0 and XPath 2.0 Data Model] is an instance of a type in the XQuery type system, if and only if it verifies the structural constraints described by the type. For instance the query:

  <bib:pubdate> { 1954, 1966, 1974, 1986 } </bib:pubdate> 

creates an instance of the following type:

  element bib:* { attribute country { xs:string }?, xs:integer* }  

because the element name "bib:pubdate" is an instance of the wildcard nametest "bib:*", the attribute "country" is optional, and the content of that element is indeed a sequence of integers.

The notion of instance of a type in XQuery is structural. Notably, it does not take XML Schema names into account. For instance, the two following XQuery types:

  define element entry { type Book }
  define type Book {
    attribute isbn { xs:string },
    element title { xs:string },
    element author { xs:string }*,
  }

and

  define element entry { type Article }
  define type Article {
    attribute isbn { xs:string }?
    (element title { xs:string } | element ref { xs:string }),
    element author { xs:string }+,
  }

might be declared in two different schemas, one for books and one for research articles. Because the notion of instance in the XQuery type system is structural, the following document:

<entry> isbn="0-618-15398-5">
  <title>The Lord of the Rings</title>
  <author>J.R.R. Tolkien</author>
</entry>

would always be an instance of both types "Book" and "Article". In XML Schema, depending against which schema the document is validated, only one of the two above types would hold.

In this section, we define formally the domain of a XQuery type. We first define the domain of a nametest, then define the domain of a complete type.

3.2.1 Domain of a NameTest

The semantics of a given NameTest is defined by the set of names that matches a that NameTest.

Notation

We denote access to the domain of a NameTest with the following auxiliary judgment. The notation:

statEnvs |- QName1 inwild NameTest2

indicates that the QName QName is in the domain defined by the NameTest NameTest2.

We specify the relationship QName1 inwild NameTest2 by decomposing both sides in to QNames or WildCards, and then defining the relationship between each component of the name.

First, is always in the domain of QName is always a singleton compose of itself. Note that the following inference rule takes namespace resolution into account.

statEnvs |- expand(QName1) => qname(URI1,ncname1)            URI1 = URI2
statEnvs |- expand(QName2) => qname(URI2,ncname2)            ncname1 = ncname2

statEnvs |- QName1 inwild QName2

statEnvs |- expand(QName1) => qname(ncname1)
statEnvs |- expand(QName2) => qname(ncname2)            ncname1 = ncname2

statEnvs |- QName1 inwild QName2

Now the rules for QName inwild WildCard:

statEnvs |- expand(QName1) => qname(URI1,ncname1)            URI1 = URI2
statEnvs |- expandNCName(NCName2) => URI2

statEnvs |- QName1 inwild NCName2:*

statEnvs |- expand(QName1) => qname(URI1,ncname1)      ncname1 = NCName2

statEnvs |- QName1 inwild *:NCName2


statEnvs |- QName1 inwild *

3.2.2 Semantics of the Domain of a Type

Notation

We denote the domain of a type using the following auxiliary judgment. The notation:

statEnvs |- value in Type

indicates that the value is the domain of type Type.

The XQuery type system is based on tree grammars. The semantics for the domain of a type is based on the set of a trees accepted by the corresponding tree grammar.

We do now give a full specification of the notion of language recognized by a tree grammar at this point, but refer the reader to the literature on tree grammars, for instance [Languages], or [TATA]

Ed. Note: Future versions of that document will include a more complete description of the notion of domain of a type.

Ed. Note: The use of pure tree grammars for defining the domain yields a structural definition of the domain of a type. This definition will have to be revised as soon as the named typing proposal is added.

3.3 Subtyping

This section defines the semantics of subtyping in XQuery. Subtyping is an important relationship between type, which is used notably, when doing function applications to check whether the input parameters of the function are valid.

We first define subtyping between nametests, then define the notion of subtyping between arbitrary types.

3.3.1 NameTest Subtyping

Notation

We denote NameTest subtyping with the following auxiliary judgment. The notation:

statEnvs |- NameTest1 <:wild NameTest2

indicates that NameTest NameTest1 is a subtype of the NameTest NameTest2.

We evaluate NameTest1 <:wild NameTest2 by decomposing both sides in to QNames or WildCards, and then defining a subtyping relation between each component of the name.

First QName <:wild QName is only true if they represent the same fully qualified name:

statEnvs |- expand(QName1) => qname(URI1,ncname1)            URI1 = URI2
statEnvs |- expand(QName2) => qname(URI2,ncname2)            ncname1 = ncname2

statEnvs |- QName1 <:wild QName2

statEnvs |- expand(QName1) => qname(ncname1)
statEnvs |- expand(QName2) => qname(ncname2)            ncname1 = ncname2

statEnvs |- QName1 <:wild QName2

Now the rules for QName <:wild WildCard:

statEnvs |- expand(QName1) => qname(URI1,ncname1)            URI1 = URI2
statEnvs |- expandNCName(NCName2) => URI2

statEnvs |- QName1 <:wild NCName2:*

statEnvs |- expand(QName1) => qname(URI1,ncname1)      ncname1 = NCName2

statEnvs |- QName1 <:wild *:NCName2


statEnvs |- QName1 <:wild *

And finally rules for WildCard1 <:wild WildCard2. Note that the second rule also prevents NCName:* from being a subtype of anything unless it is a valid known prefix.


statEnvs |- * <:wild *

expandNCName(NCName) => URI

statEnvs |- NCName:* <:wild *


statEnvs |- *:NCName <:wild *

Ed. Note: DD: (1) I've taken liberties comparing, e.g. NCName = ncname, (2) this doesn't take into account any resolution of URIs.

3.3.2 Semantics of Subtyping

Notation

We denote subtyping with the following auxiliary judgment. The notation:

statEnvs |- Type1 <: Type2

indicates that type Type1 is a subtype of Type2.

The definition of subtyping is based on the notion of domain, as defined in [3.2 Instances, and Domain of a Type]

By definition, Type1 <: Type2 if and only if all the instances in the domain of Type1 are also in the domain of Type2. That is, for every value value such that:

statEnvs |- value in Type1

it is also the case that:

statEnvs |- value in Type2

It is easy to see that the subtype relation <: is a partial order, i.e. it is reflexive: statEnvs |- Type <: Type

and it is transitive: if, statEnvs |- Type1 <: Type2      and, statEnvs |- Type2 <: Type3      then, statEnvs |- Type1 <: Type3

Note

The above definition although complete and precise, does not give a simple means to compute subtyping.

The XQuery type system is based on tree grammars. Computing subtyping between two types can be done by computing inclusion between their corresponding tree grammar.

We do now give a full algorithm to compute inclusion between tree grammar at this point, but refer the reader to the literature on tree grammars, for instance [Languages], or [TATA]

Ed. Note: Future versions of that document will include a more complete description of algorithms to compute subtyping.

Ed. Note: The use of pure tree grammars for defining the domain yields a structural definition of the notion of subtyping. This definition will have to be revised as soon as the named typing proposal is added.

3.3.3 Type equivalence

In a few cases, we use the fact that two types are equivalent, I.e., that they define exactly the same domain over data model instances.

By definition, we write Type1 = Type2 if and only if, Type1<:Type2 and Type2<: Type1.

3.4 Prime types

This section defines the notion of Prime Type, and operations on prime types. Prime types play an important role in defining the static semantics of XQuery.

3.4.1 Prime types and sequences of items with similar types

Some expressions operate on sequences of similar items. These include FLWR expressions, sorting, and duplicate elimination. For example, consider the following small document:

  <paper>
      <author>Buneman</author>
      <author>Davidson</author>
      <author>Fernandez</author>
      <author>Suciu</author>
      <title>Adding structure to unstructured data</title>
      <editor>Afrati</editor>
      <editor>Kolaitis</editor>
   </paper>
:  element paper {
      element author {xsd:string}+,
      element title {xsd:string},
      element editor {xsd:string}*
   }

The following query extracts a sequence of authors followed by editors and sorts them by content. This results in a sequence with authors and editors arbitrarily interleaved. The result type reflects this by repeating a choice of author and editor.

   (/paper/author , /paper/editor) sortby .
    => <editor>Afrati</editor>
       <author>Buneman</author>
       <author>Davidson</author>
       <author>Fernandez</author>
       <editor>Kolaitis</editor>
       <author>Suciu</author>
    :  (element author {xsd:string} | element editor {xsd:string})+
    

Such operation always operate on sequences over choices of items, that we call a Prime type.

Prime types can be described by the following grammar production.

[88]   PrimeType   ::=   ItemType
|  (PrimeType "|" PrimeType)

This section explains the details about computing over prime types and sequences of prime types. For instance, we explain why the above type ends in a plus rather than a star.

3.4.2 Computing Prime Types

We use two auxiliary functions on types. Note that both functions are only used by XQuery's type inference rules; they are not part of the XQuery syntax.

The type function prime(Type) extracts all item types from the type expression Type and combines them by a choice.

prime(ItemType)  =  ItemType
prime(())  =  none
prime(none)  =  none
prime(Type1 , Type2)  =  prime(Type1) | prime(Type2)
prime(Type1 & Type2)  =  prime(Type1) | prime(Type2)
prime(Type1 | Type2)  =  prime(Type1) | prime(Type2)
prime(Type?)  =  prime(Type)
prime(Type*)  =  prime(Type)
prime(Type+)  =  prime(Type)

The type function quantifier(Type) approximates the possible number of items in Type with the quantifiers supported by XQuery type expressions (?, +, *). To quantify interim results, we use the auxiliary quantifiers 1 for exactly one occurrence, 0 for exactly zero occurrences, i.e., the quantifier of the empty list, and - for infinitely many occurrences, i.e., the quantifier of the empty choice none, which is used to type errors.

quantifier(ItemType)  =  1
quantifier(())  =  0
quantifier(none)  =  0
quantifier(Type1 , Type2)  =  quantifier(Type1) , quantifier(Type2)
quantifier(Type1 & Type2)  =  quantifier(Type1) , quantifier(Type2)
quantifier(Type1 | Type2)  =  quantifier(Type1) | quantifier(Type2)
quantifier(Type?)  =  quantifier(Type) · ?
quantifier(Type*)  =  quantifier(Type) · *
quantifier(Type+)  =  quantifier(Type) · +

The tables below specify the necessary arithmetics to compute the sum (q1 , q2), the choice (q1 | q2), and the product (q1 · q2) of two quantifiers q1, q2.

 ,  0  1  ?  +  * 
 0  0  1  ?  +  * 
 1  1  +  +  +  + 
 ?  ?  +  *  +  * 
 +  +  +  +  +  + 
 *  *  +  *  +  * 
    
 |  0  1  ?  +  * 
 0  0  ?  ?  *  * 
 1  ?  1  ?  +  * 
 ?  ?  ?  ?  *  * 
 +  *  +  *  +  * 
 *  *  *  *  *  * 
    
 ·  0  1  ?  +  * 
 0  0  0  0  0  0 
 1  0  1  ?  +  * 
 ?  0  ?  ?  *  * 
 +  0  +  *  +  * 
 *  0  *  *  *  * 

The auxiliary quantifiers for interim results are eliminated with the following type simplification rules.

Type · 0  =  ()
Type · 1  =  Type
Type · ?  =  Type?
Type · +  =  Type+
Type · *  =  Type*
Type · -  =  none

prime(Type) · quantifier(Type) enjoys two desirable properties:

(1) for all types Type1, Type2: Type1 <: Type2 implies prime(Type1) · quantifier(Type1) <: prime(Type2) · quantifier(Type2).
For example, for the two syntactically different but semantically equivalent types Type1 = (xsd:string & xsd:integer)* and Type2 =((xsd:string,xsd:integer) | (xsd:integer,xsd:string))*, prime(Type1) · quantifier(Type1) = prime(Type2) · quantifier(Type2) = (xsd:string | xsd:integer)*.

(2) for all types Type: Type <: prime(Type) · quantifier(Type).
Note that for some type expressions prime(Type) · quantifier(Type) may be regarded as too general. For example, the result-type of sorting a sequence of type xsd:integer , xsd:string is (xsd:integer | xsd:string)+. A more specific result-type would be xsd:integer & xsd:string, which preserves the individual quantifiers for xsd:integer and xsd:string. However, computing such result-types for arbitrary type expressions loses type precision for choice types (e.g. (xsd:integer | xsd:string)+ becomes (xsd:integer* & xsd:string*)), and it requires more complicated typing rules.

Ed. Note: (Peter): I'm not sure whether we need to elaborate on this here, but at least Denise may wonder why we ditched disorder(). In addition, I'm aware that (2) is a corollary of Phil's famous factoring theorem, I'm not sure whether this also holds for (1), but haven't spend much thought on it. BTW. Impotence holds of course as well.

Using these two auxiliary functions, the result type of a SortExpr can be computed as follows. If Expr has type Type then Expr sortby SortSpecList has type prime(Type) · quantifier(Type). For the formal type inference rule see [4.10 Sorting Expressions]. Similar rules are applied to type the other operations that destroy sequence order - union, intersect, except, distinct-node, distinct-value, and unorder.

Ed. Note: (Peter): I deliberately do not introduce the formal inference rule here, because (a) the machinery for type inference probably hasn't been explained yet at this place in the document, and (b) the current static semantics of SortExpr is incomplete anyway. Maybe the example should use unorder() instead of sortby.

3.4.3 Common Prime Types

Ed. Note: Status: This section contains new material necessary for the simplified semantics of typeswitch.

When performing type checking for certain XQuery expressions, we also need to compute the common prime types between two types, and the common occurrence indicator between two types.

The type function common-primes(Type1,Type2) extracts all item types that are common from the prime types and and combines them into a new prime type.

Notation

In order to define the "common-primes" function, we use the following auxiliary judgment.

statEnvs |- common-primes(ItemType1, ItemType2) = Type

specifies that under the current static environment, the common prime of ItemType1, ItemType2 is type Type.

We define the common primes by looking at each combination of items in each prime type.

common-primes( ( ItemType1 | ··· | ItemTypen ), ( ItemType1' | ··· | ItemTyper' ) )
  =  
common-primes( ItemType1, ItemType1' ) | ··· | common-primes( ItemTypen, ItemTyper' )

We then we define whether each item is kept or discarded using the auxiliary judgment. In the case the item is discarded, the result of common-primes is none. Remember that none is the unit for choice.

In the case of a simple type, or globally defined attribute or element, one keeps the corresponding item if both items in the functions are the same.

statEnvs |- SimpleType1 = SimpleType2

statEnvs |- common-primes(SimpleType1, SimpleType2) = SimpleType1

statEnvs |- element QName1 = element QName2

statEnvs |- common-primes(element QName1, element QName2) = element QName1

statEnvs |- attribute QName1 = attribute QName2

statEnvs |- common-primes(attribute QName1, attribute QName2) = attribute QName1

When matching an element (resp. attribute), if the left-hand side item type is a subtype of the right-hand side item type, then the function returns the left-hand side item type. This rule is particularly useful, when the right hand side is a wildcard type, as might be often the case in a typeswitch expression.

statEnvs |- element NameTest1 { Type1 } <: element NameTest2 { Type2 }

statEnvs |- common-primes(element NameTest1 { Type1 }, element NameTest2 { Type2 }) = element NameTest1 { Type1 }

statEnvs |- element NameTest1 { Type1 } <: element QName2

statEnvs |- common-primes(element NameTest1 { Type1 }, element QName2) = element NameTest1 { Type1 }

statEnvs |- element QName1 <: element NameTest2 { Type2 }

statEnvs |- common-primes(element QName1, element NameTest2 { Type2 }) = element QName1

statEnvs |- attribute NameTest1 { Type1 } <: attribute NameTest2 { Type2 }

statEnvs |- common-primes(attribute NameTest1 { Type1 }, attribute NameTest2 { Type2 }) => attribute NameTest1 { Type1 }

statEnvs |- attribute NameTest1 { Type1 } <: attribute QName2

statEnvs |- common-primes(attribute NameTest1 { Type1 }, attribute QName2) => attribute NameTest1 { Type1 }

statEnvs |- attribute QName1 <: attribute NameTest2 { Type2 }

statEnvs |- common-primes(attribute QName1, attribute NameTest2 { Type2 }) => attribute QName1

Otherwise, if the left-hand side item and the right-hand side item have compatible names, then the function returns the right-hand side item type.

statEnvs |- not ( element NameTest1 { Type1 } <: element NameTest2 { Type2 } )
statEnvs |- (NameTest1 <:wild NameTest2) or (NameTest2 <:wild NameTest1)

statEnvs |- common-primes(element NameTest1 { Type1 }, element NameTest2 { Type2 }) => element NameTest2 { Type2 }

statEnvs |- not( element NameTest1 { Type1 } <: element QName2 )
statEnvs |- (NameTest1 <:wild QName2) or (QName2 <:wild NameTest1)

statEnvs |- common-primes(element NameTest1 { Type1 }, element QName2) => element QName2

statEnvs |- not( element QName1 <: element NameTest2 { Type2 } )
statEnvs |- (QName1 <:wild NameTest2) or (NameTest2 <:wild QName1)

statEnvs |- common-primes(element QName1, element NameTest2 { Type2 }) => element NameTest2 { Type2 }

statEnvs |- not( attribute NameTest1 { Type1 } <: attribute NameTest2 { Type2 } )
statEnvs |- (NameTest1 <:wild NameTest2) or (NameTest2 <:wild NameTest1)

statEnvs |- common-primes(attribute NameTest1 { Type1 }, attribute NameTest2 { Type2 }) => attribute QName2 { Type2 }

statEnvs |- not( attribute NameTest1 { Type1 } <: attribute QName2 )
statEnvs |- (NameTest1 <:wild QName2) or (QName2 <:wild NameTest1)

statEnvs |- common-primes(attribute NameTest1 { Type1 }, attribute QName2) => attribute QName2

statEnvs |- not( attribute QName1 <: attribute NameTest2 { Type2 } )
statEnvs |- (QName1 <:wild NameTest2) or (NameTest2 <:wild QName1)

statEnvs |- common-primes(attribute QName1, element NameTest2 { Type2 }) => attribute NameTest2 { Type2 };

Finally, in all the other cases, the common type is none. I.e., otherwise:

statEnvs |- common-primes(ItemType1, ItemType2) => none
3.4.3.1 Common Occurrence Indicator

The following table is computing the common occurrence indicator and is used in the typing of the typeswitch expression.

 common-occur  0  1  ?  +  * 
 0  0  0  0  0  0 
 1  0  1  1  1  1 
 ?  0  1  ?  1  ? 
 +  0  1  1  +  + 
 *  0  1  ?  +  * 

3.5 Importing types from XML Schema

Ed. Note: Status: This section is still incomplete, needs detailed review, and will be fully revised to support named typing. This section is enclose here merely to illustrate what mapping from XML Schema into the XQuery type system will look like in future versions of the document.

The XQuery type system support a static subset of XML Schema. At compile time, the XQuery environment imports XML Schema declarations and load them as declarations in the XQuery type system for further static reasoning. The semantics of that loading process is defined by a simple mapping from XML Schema to the XQuery type system.

The mapping applies a translation function to each XML Schema expression in the XML Schema syntax and returns an expression in the XQuery type system syntax. The notation [ SchemaExpr ] denotes the result of translating the XML Schema expression Expr into an expression in the XQuery type system. All mapping rules have the following structure:

 
[SchemaExpr]
==
Type

The [ SchemaExpr ] above the horizontal rule denotes an expression in XML Schema before translation and the coreExpr beneath the horizontal rule denotes an equivalent expression in the XQuery core.

3.5.1 Schema

Schemas are imported by the 'schema' declaration in the preamble of a query. To import a schema, the document referred to by the given URI is opened and the schema declarations contained in the document are translated into the corresponding in-line type definitions.

 
[schema NCName { URI }]
==
[open-schema-document(URI)]sort_decl(NCName)

To make the translation rules a bit easier to read, we use grammar rules to distinguish between the various parts of a <schema> element:

[89]   Pragma   ::=   ("include" |  "import" |  "redefine" |  "annotation")*
[90]   Content   ::=   (("simpleType" |  "complexType" |  "element" |  "attribute" |  "attributeGroup""attributeGroup" |  "group" |  "notation") "annotation"*)*

(Note: attributes that are ignored are emphasized)

 
[
  <schema
       attributeFormDefault = (qualified | unqualified) : unqualified
       elementFormDefault = (qualified | unqualified) : unqualified
       finalDefault = (#all | List of (extension | restriction)) : ''
       id = ID
       targetNamespace = anyURI
       version = token
       xml:lang = language
       {any attributes with non-schema namespace . . .}>
       Pragma Content
  </schema>
]sort_decl(NS)
==
[Pragma]sort_decl(NS) [Content]sort_decl(NS)
 
[
  <include
       id = ID
       schemaLocation = anyURI
       {any attributes with non-schema namespace . . .}>
       Content: (annotation?)
  </include>
]
==
[schema targetNS { anyURI }]sort_decl(targetNS)

An imported schema is associated with the namespace prefix that it defines:

 
[
  <import
       id = ID
       namespace = NCName
       schemaLocation = anyURI
       {any attributes with non-schema namespace . . .}>
       Content: (annotation?)
  </import>
]
==
[schema NCName { anyURI }]sort_decl(namespace)

Redefine is unsupported.

 
[
  <redefine
       id = ID
       schemaLocation = anyURI
       {any attributes with non-schema namespace . . .}>
       Content: (annotation | (simpleType | complexType | group | attributeGroup))*
  </redefine>
]sort_decl(targetNS)
==

Annotations are unsupported.

 
[annotation]sort_decl(targetNS)
==

3.5.2 QNames

 
[NCName]name(targetNS)
==
targetNS:NCName
 
[NS:NCName]name(targetNS)
==
NS:NCName

3.5.3 Complex Type Definitions

The following productions describe the structure of a complex type definition in XML Schema.

[91]   ComplexTypeContent   ::=   "annotation"? ("simpleContent" |  "complexContent" |  "group" |  "all" |  "choice" |  "sequence")
[92]   Attributes   ::=   ("attribute" |  "attributeGroup""attributeGroup")* "anyAttribute"?
  <complexType
       abstract = boolean : false
       block = (#all | List of (extension | restriction))
       final = (#all | List of (extension | restriction))
       id = ID
       mixed = boolean : false
       name = NCName
       {any attributes with non-schema namespace . . .}>
       ComplexTypeContent
       Attributes
  </complexType>

We distinguish between global complex types (which are mapped to sort declarations) and local complex types (which are mapped to type definitions).

3.5.3.1 Global complex type
 
[
  <complexType name=NCName>
      ComplexTypeContent
      Attributes
  </complexType>
]sort_decl(prefix)
==
define type [NCName ]name(prefix) { [Attributes]group(prefix), [ComplexTypeContent]group(prefix) }

In case of mixed complex type, the mapping introduces TEXT in the corresponding type system representation.

 
[
  <complexType name=NCName mixed="true">
      ComplexTypeContent
      Attributes
  </complexType>
]sort_decl(prefix)
==
define type [NCName ]name(prefix) { [Attributes]group(prefix), TEXT* & [ComplexTypeContent]group(prefix) }
3.5.3.2 Anonymous local complex type
 
[
  <complexType>
      ComplexTypeContent
    Attributes
  </complexType>
]sort_decl(prefix)
==
[Attributes]group(prefix), [ComplexTypeContent]group(prefix)

In case of mixed complex type, the mapping introduces TEXT in the corresponding type system representation.

 
[
  <complexType mixed="true">
      ComplexTypeContent
      Attributes
  </complexType>
]sort_decl(prefix)
==
[Attributes]group(prefix), TEXT* &[ComplexTypeContent]group(prefix)

3.5.4 Groups

3.5.4.1 Global named group declarations
 
[
  <group
      name = NCName>
      Content: (annotation?, (all | choice | sequence))
  </group>
]group(prefix)
==
define type [NCName]name(prefix) { [Content]group(prefix) }
3.5.4.2 Local groups

Anonymous local groups

 
[
  <group>
      Content: (annotation?, (all | choice | sequence))
  </group>
]sort_decl(prefix)
==
[Content]group(prefix)

All groups

 
[
  <all
      id = ID
      maxOccurs = 1 : 1
      minOccurs = (0 | 1) : 1
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, element1 ... elementn)
  </all>
]group(prefix)
==
[element1]group(prefix) & ... & [elementn]group(prefix)

Choice groups

 
[
  <choice
      id = ID
      Occurs
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, (element | group | choice | sequence | any)*)
  </choice>
]group(prefix)
==
([Content1]group(prefix) | ... | [Contentn]group(prefix)) [Occurs]bound

Sequence groups

 
[
  <sequence
      id = ID
      Occurs
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, (element | group | choice | sequence | any)*)
  </sequence>
]group(prefix)
==
([Content1]group(prefix), ... , [Contentn]group(prefix)) [Occurs]bound

3.5.5 minOccurs and maxOccurs

The following structure describes minOccurs and maxOccurs attributes in XML Schema.

[93]   Occurs   ::=   ("maxOccurs" ("nonNegativeInteger" |  "unbounded"))
|  ("minOccurs" "nonNegativeInteger")

minOccurs and maxOccurs are mapped to the type system in the following way.

 
[minOccurs=0]bound
==
?
 
[maxOccurs="unbounded"]bound
==
+
 
[minOccurs=0 maxOccurs="unbounded"]bound
==
*
 
[minOccurs=1 maxOccurs=1]bound
==

Right now, the proposed type system grammar does not capture other occurrences cases. Therefore, these are converted into a collection.

 
[minOccurs=m maxOccurs=n]bound
==
*

3.5.6 Elements

The following structure describes element declarations in XML Schema.

  <element
      abstract = boolean : false
      block = (#all | List of (extension | restriction | substitution))
      default = string
      final = (#all | List of (extension | restriction))
      fixed = string
      form = (qualified | unqualified)
      id = ID
      Occurs
      name = NCName
      nillable = boolean : false
      ref = QName
      substitutionGroup = QName
      type = QName
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, ((simpleType | complexType)?, (unique | key | keyref)*))
  </element>
3.5.6.1 Global elements
 
[<element name=NCName1 type=QName>]sort_decl(prefix)
==
define element [NCName1]name(prefix) { [QName]name(prefix) }
 
[
  <element name=NCName>
      Content
  </element>
]sort_decl(prefix)
==
define element [NCName]name(prefix) { [Content]group(prefix) }
3.5.6.2 Local elements

Local elements with explicit bounds:

 
[<element ref=NCName Bound>]group(prefix)
==
element [NCName]name(prefix) [Bound]bound
 
[<element name=NCName type=QName Bound>]group(prefix)
==
element [NCName]name(prefix) { [QName]name(prefix) } [Bound]bound
 
[
  <element name=NCName Bound>
      Content
  </element>
]group(prefix)
==
element [NCName]name(prefix) { [Content]group(prefix) } [Bound]bound

Local elements with no explicit bounds:

 
[<element ref=NCName>]group(prefix)
==
element [NCName]name(prefix)
 
[<element name=NCName type=QName>]group(prefix)
==
element [NCName]name(prefix) { [QName]name(prefix) }
 
[
  <element name=NCName>
      Content
  </element>
]group(prefix)
==
element [NCName]name(prefix) { [Content]group(prefix) }

3.5.7 Attributes

The following structure describes attribute declarations in XML Schema.

  <attribute
      default = string
      fixed = string
      form = (qualified | unqualified)
      id = ID
      name = NCName
      ref = QName
      type = QName
      Use
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, (simpleType?))
  </attribute>
3.5.7.1 Use

The "use" attribute is used to indicate whether attributes are optional or not. The following production describes the structure of the "use"attribute.

The meaning of the "use" attribute is naturally captured in the type system by an occurrence indicator, as reflected in the following mapping rules.

Optional attributes are mapped to optional groups in the type system, using the '?' notation.

 
[optional]use
==
?

Required attributes indicate a required group.

 
[required]use
==
?????????????

It is an open issue what should be the semantics of a prohibited attribute in the type system.

 
[prohibited]use
==
3.5.7.2 Global attributes
 
[<attribute name=NCName type=QName>]sort_decl(prefix)
==
define attribute [NCName]name(prefix) { [QName]name(prefix) }
 
[
  <attribute name=NCName>
      Content
  </element>
]sort_decl(prefix)
==
define attribute [NCName]name(prefix) { [Content]group(prefix) }
3.5.7.3 Local attributes

Local attributes with explicit use:

 
[<attribute ref=NCName Use>]group(prefix)
==
attribute [NCName]name(prefix) [Use]use
 
[<attribute name=NCName type=QName Use>]group(prefix)
==
attribute [NCName]name(prefix) { [QName]name(prefix) } [Use]use
 
[
  <attribute name=NCName Use>
      Content
  </attribute>
]group(prefix)
==
attribute [NCName]name(prefix) { [Content]group(prefix) } [Use]use

Local attributes with no explicit use:

 
[<attribute ref=NCName>]group(prefix)
==
attribute [NCName]name(prefix)?
 
[<attribute name=NCName type=QName>]group(prefix)?
==
attribute [NCName]name(prefix) { [QName]name(prefix) }?
 
[
  <attribute name=NCName>
      Content
  </attribute>
]group(prefix)
==
attribute [NCName]name(prefix) { [Content]group(prefix) }?

3.5.8 Simple Content

For simple content, one just map their content in the type system. The mapping ignores most of the other information, notably facets.

 
[
  <simpleContent
      id = ID
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, (restriction | extension))
  </simpleContent>
]group(prefix)
==
[Content]group(prefix)

The only part of the <restriction> content that is mapped is simpleType? and the attributes.

 
[
  <restriction
      base = QName
      id = ID
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, (simpleType?, (minExclusive | minInclusive | maxExclusive | maxInclusive |
      totalDigits | fractionDigits | length | minLength | maxLength | enumeration | whiteSpace | pattern)*)?,
      ((attribute | attributeGroup)*, anyAttribute?))
  </restriction>
]group(prefix)
==
[Content]group(prefix)

All facets are ignored:

 
[
  (minExclusive | minInclusive | maxExclusive | maxInclusive |
  totalDigits | fractionDigits | length | minLength |
  maxLength | enumeration | whiteSpace | pattern)*)?
]group(prefix)
==

We also ignore <extension> for now.

 
[
  <extension
      base = QName
      id = ID
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, ((attribute | attributeGroup)*, anyAttribute?))
  </extension>
]group(prefix)
==

3.5.9 Attribute Group

3.5.9.1 Attribute group declaration
 
[
  <attributeGroup
      id = ID
      name = NCName
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, ((attribute | attributeGroup)*, anyAttribute?))
  </attributeGroup>
]sort_decl(prefix)
==
define type [NCName]name(prefix) { [Content]group(prefix) }
3.5.9.2 Attribute group reference

Attribute group reference can only occur in a complex type:

 
[
  <attributeGroup
      id = ID
      ref = QName
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?)
  </attributeGroup>
]group(prefix)
==
[QName]name(prefix)
3.5.9.3 Attribute wildcard

The anyAttribute schema component is mapped to the corresponding wildcard in the XQuery type system. The 'namespace' modifier is ignored.

 
[
  <anyAttribute
      id = ID
      namespace = ((##any | ##other) | List of (anyURI | (##targetNamespace | ##local)) ) : ##any
      processContents = (lax | skip | strict) : strict
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?)
  </anyAttribute>
]
==
xs:AnyAttribute

3.5.10 Complex Content

 
[
  <complexContent
      id = ID
      mixed = false
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, (restriction | extension))
  </complexContent>
]group(prefix)
==
[Content]group(prefix)
 
[
  <complexContent
      id = ID
      mixed = boolean
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, (restriction | extension))
  </complexContent>
]group(prefix)
==
TEXT* & [Content]group(prefix)
 
[
  <restriction
      base = QName
      id = ID
      {any attributes with non-schema namespace . . .}>
      Content
  </restriction>
]group(prefix)
==
[Content]group(prefix)

3.5.11 Simple Types

Global simple types:

 
[
  <simpleType
      final = (#all | (list | union | restriction))
      id = ID
      name = NCName
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, (restriction | list | union))
  </simpleType>
]
==
define type [NCName]group(prefix) { [Content]group(prefix) }

Anonymous local simple type:

 
[
  <simpleType
      final = (#all | (list | union | restriction))
      id = ID
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, (restriction | list | union))
  </simpleType>
]group(prefix)
==
[Content]group(prefix)

Again, we ignore all facets and only translate simpleType?

 
[
  <restriction
      base = QName
      id = ID
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, (simpleType?, (minExclusive | minInclusive | maxExclusive | maxInclusive |
      totalDigits | fractionDigits | length | minLength | maxLength | enumeration | whiteSpace | pattern)*))
  </restriction>
]group(prefix)
==
[Content]group(prefix)
 
[
  <list
      id = ID
      itemType = QName
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, (simpleType?))
  </list>
]
==
[Content]group(prefix)*
 
[
  <union
      id = ID
      memberTypes = QName1, ..., QNamen
      {any attributes with non-schema namespace . . .}>
      Content: (annotation?, simpleType_1, ... simpleType_n)
  </union>
]
==
[QName1]name(prefix) | ... | [QNamen]name(prefix) | [simpleType1]group(prefix) | ... | [simpleTypen]group(prefix)

3.6 Major type issues

Ed. Note: Alignment between XQuery, the [XQuery 1.0 and XPath 2.0 Data Model], and the XQuery formal semantics. There are some known discrepancies between the type models in the Data Model document, the XQuery document and this document. Currently the XQuery/XPath grammar does not contain comments, processing-instructions or schema-components. Currently neither XQuery nor this document contain text nodes or document nodes, which are present in the Data Model. See issues [Issue-0068:  Document Collections], [Issue-0105:  Types for nodes in the data model.].

Ed. Note: Syntax for types. With respect to the currently published XQuery document, there is another major misalignment as well: the current XQuery document does not have a syntax for type constructors in XQuery.

Ed. Note: Complexity of type operations. There are concerns about the complexity of type inference, notably about type subsumption used during type inference, and validation used during evaluation of some of the type operations (e.g., typeswitch). Additionally, there are many language features for which it is possible to define special type inference rules that would give tighter bounds. The question is, which features, which rules, and how do they interact with the complexity of type inference and subtype computation. See [Issue-0080:  Typing of parent], [Issue-0083:  Expressive power and complexity of typeswitch expression], [Issue-0091:  Attribute expression].

Ed. Note: XML Schema mapping. The mapping from XML Schema to the XQuery type system is new and has not yet been reviewed by the XML Query Working Group See [Issue-0019:  Support derived types], [Issue-0020:  Structural vs. name equivalence], [Issue-0073:  Facets for simple types and their role for typechecking].

4 Semantics of Expressions

Ed. Note: Status: This section contained a fully revised semantics for XQuery, based on the December 2001 working drafts. Some sections might still await further material or editorial consolidation, as indicated by additional status note included in each specific section.

This section defines semantics of all XQuery expressions. The organization of this section parallels the organization of Section 2 of the XQuery document.

[4]   Expr   ::=   SortExpr
|  OrExpr
|  AndExpr
|  FLWRExpr
|  QuantifiedExpr
|  TypeswitchExpr
|  IfExpr
|  GeneralComp
|  ValueComp
|  NodeComp
|  OrderComp
|  InstanceofExpr
|  RangeExpr
|  AdditiveExpr
|  MultiplicativeExpr
|  UnionExpr
|  IntersectExceptExpr
|  UnaryExpr
|  CastExpr
|  Constructor
|  PathExpr

For each expression, we give a short description of the expression to be defined and include its grammar productions. The semantics of an expression includes the normalization, static analysis, and dynamic evaluation phases. Recall that normalization rules translate XQuery syntax into core XQuery syntax. In the sections that contain normalization rules, we include the Core grammar productions into which the expression is normalized. After normalization, sections on static type inference and dynamic evaluation define the static type and dynamic value for the core expression.

The different phases are covered by the judgments shown in the following table where the environment patterns are defined as part of the context below.

statEnvs |- query phrase The query phrase is well-typed in the static environment group statEnvs, [2.5.5 Static type inference].
statEnvs |- query phrase : type expression The query phrase is well-typed in the static environment group statEnvs, and its static type is given by type expression, [2.5.5 Static type inference].
dynEnvs |- query phrase => value phrase The evaluation of query phrase yields the value given by value phrase in the dynamic environment group dynEnvs, [2.5.6 Evaluation rules]

4.1 Basics

4.1.1 Expression Context

Introduction

The expression context for a given expression consists of all the information that can affect the result of the expression. This information is organized into the static context and the evaluation context. This section specifies the environments that represent the context information used by XQuery expressions.

4.1.1.1 Static Context

The XQuery static inference rules use the "static" environment group statEnvs containing the environments built during static type checking and used during both static type checking and dynamic evaluation.

The following environments are maintained in the static environment group:

statEnvs.namespace

The namespace environment maps namespace prefixes (NCNames) onto their fully qualified definitions (an xs:anyURI), as determined by the appropriate namespace declaration.

statEnvs.varType

The static variable type environment maps variable and parameter names (Variables) to their static type (a Type).

statEnvs.funcType

The function declaration environment stores the static type signatures of functions. Because XQuery allows multiple functions with the same name differing only in the number of arguments, this environment maps function name/arity (QName, integer) pairs to function type signatures (Typer, Type1, ..., Typen) where the first is the return type and the rest are the types of the parameters, in order.

statEnvs.elemDecl

The element declaration environment maps element type names (QNames) onto their element type definition (a Type).

statEnvs.attrDecl

The attribute declaration environment maps attribute type names (QNames) onto their attribute type definition (a Type).

statEnvs.typeDecl

The type declaration environment maps type names (QNames) onto their type definition (a Type).

Environments have an initial state when query processing begins, containing, for example, the function signatures of all built-in functions.

A common use of the static environment is to expand QNames by looking up namespace prefixes in the statEnvs.namespace environment. The QName production in the XQuery grammar is as follows:

[223]   QName   ::=   ":"? NCName (":" NCName)?

which corresponds to either a single local name, or a namespace prefix and a local name.

We define a helper function, expand to expand QNames by looking up the namespace prefix in statEnvs.namespace. We write it as follows:

   statEnvs |- expand(QName) = qname(anyURI, ncname)

or

   statEnvs |- expand(QName) = qname(ncname)

where what is on the right side is a QName value (not a QName expression).

The helper expand function is defined as follows:

  • If QName matches NCName1:NCName2, and if statEnvs.namespace(NCName1) = anyURI, then the expression yields qname(anyURI,NCName2). If the namespace prefix NCName1 is not found in statEnvs.namespace, then the expression does not apply (that is, the inference rule will not match).

  • If QName matches NCName, and if statEnvs.namespace("*default*") = anyURI, then the expression yields qname(anyURI,NCName), where anyURI is the default namespace in effect, otherwise the expression yields qname(NCName).

Ed. Note: The above rules could be given as proper inference rules defining the expand judgment.

Here is an example that shows modifying a static environment in response to a namespace definition.

statEnvs [ namespace(NCName |-> StringLiteral) ] |- Expr*

statEnvs |- namespace NCName = StringLiteral Expr*

This rule is read: "the phrase on the bottom (a namespace declaration followed by a sequence of expressions) is well-typed (accepted by the static type inference rules) within an environment statEnvs if (above the line, now) the sequence of expressions is well-typed in the environment obtained from statEnvs by adding the namespace declaration".

This is the common idiom for passing new information in an environment along to sub-expressions. In the case where we need to update an environment with a completely new component, we write:

statEnvs [ namespace = (NewEnvironment) ]
4.1.1.2 Evaluation Context

We use dynEnv to denote the group of "dynamic" environments, i.e., those environments built and used only during dynamic evaluation.

The following environments are dynamic:

dynEnvs.funcDefn

The dynamic function environment maps a function name to a function definition. The function definition consists of an expression, which is the function's body, and a list of variables, which are the function's formal parameters. As with statEnvs.funcType, dynEnvs.funcDefn requires a function name and a parameter count: it maps a (QName, integer) pair to a value list of the form (Expr , Variable1, ..., Variablen).

dynEnvs.varValue

The dynamic value environment maps a variable name (QName) onto the variables current value (value).

Ed. Note: DD: this still does not account for those functions that are genuinely overloaded, as some of the built-in functions are. Probably this will be handled by special rules for those functions in the main semantics section of the document. See [Issue-0122:  Overloaded functions]

Finally, dynamic environments have an initial state when query processing begins, containing, for example, the function declarations of all built-in functions.

Ed. Note: Somewhere we need an exact definition of what is in the initial environments? Notice that for XPath this is partially defined by the containing language. See [Issue-0115:  What is in the default context?].

Several built-in variables represent other parts of the evaluation context:

Built-in Variable  Represents:
$fs:dotcontext item
$fs:positioncontext position
$fs:lastcontext size
context document

Variables with the "xq" namespace prefix are reserved for use in the definition of the formal semantics. It is a static error to define a variable in the "xq" namespace.

Ed. Note: "xq" is actually a namespace prefix and should be replaced with a proper namespace URI unique to this spec. See [Issue-0100:  Namespace resolution].

Ed. Note: The following dynamic contexts have no formal representation yet: current date and time. See [Issue-0114:  Dynamic context for current date and time]

4.1.2 Type Conversions

Notation

For the basic-conversion rules, we define three normalization functions: []Optional_Atomic_Value []Atomic_Value_Sequence, and []Node_Sequence. They are used in the definition of arithmetic, comparison, and function-call expressions.

Normalization

There are two variants of the normalization function []Optional_Atomic_Value: the first variant is not parameterized by the required type of the expression and the second variant is parameterized by the required type. The function converts an expression into an expression that returns an optional atomic value and is used in the normalization of expressions whose required type is an optional atomic value. A node is converted to an optional atomic value by extracting its typed content.

 
[Expr]Optional_Atomic_Value
==
typeswitch (Expr) as $v
case fs:atomic? return $v
case node return
   (typeswitch (fs:data($v)) as $w
     case fs:atomic? return $w
     default return [$w]Type_Exception_Opt_Atomic)
default return [$v]Type_Exception_Other

In the parameterized variant, an atomic value with type fs:UnknownSimpleType is always cast to the required type:

 
[Expr]Optional_Atomic_Value(Type)
==
typeswitch (Expr) as $v
case fs:UnknownSimpleType return cast as Type ($v)
case fs:atomic? return $v
case node return
   (typeswitch (fs:data($v)) as $w
     case fs:UnknownSimpleType return cast as Type ($w)
     case fs:atomic? return $w
     default return [$w]Type_Exception_Opt_Atomic(Type))
default return [$v]Type_Exception_Opt_Atomic(Type)

Ed. Note: The type fs:UnknownSimpleType denotes untyped character data. The name of this type is an open XPath/XQuery issue (Issue-174).

The function []Atomic_Value_Sequence(Type) is parameterized by the atomic type of the sequence. This rule is used in the normalization of expressions whose required type is a sequence of atomic values. The rule converts each value in the argument expression to an atomic value. An atomic value with type fs:UnknownSimpleType is always cast to the required type. A node is converted to a sequence of atomic values by extracted its typed content and, for each atomic value in that sequence, cast any value with type fs:UnknownSimpleType to the required type and return all other values unchanged.

 
[Expr]Atomic_Value_Sequence(Type)
==
for $v in Expr
return
   typeswitch ($v) as $w
   case fs:UnknownSimpleType return cast as Type ($w)
   case atomic return $w
   case node return for $x in fs:data($w) return [$x]Optional_Atomic_Value(Type)
   default return [$v]Type_Exception_Other

The function []Node_Sequence is used in the normalization of expressions whose required type is node sequence. They all map an XQuery core expression to an XQuery core expression.

 
[Expr]Node_Sequence
==
typeswitch (Expr) as $v
case node* return $v
default return [$v]Type_Exception_Other

Notation

Two more normalization functions implement type exceptions They are: []Type_Exception_Opt_Atomic(Type) and []Type_Exception_Other.

Normalization

These definitions implement the strict type exception policy. The strict policy always raises an error:

 
[Expr]Type_Exception_Opt_Atomic(Type)
==
dm:error()
 
[Expr]Type_Exception_Other
==
dm:error()

In the first rule, the required type is xs:boolean. The conditional expression in the "default return" clause below guarantees that in an arbitrary sequence of items, if at least one value is a node, then the boolean expression evaluates to true.

 
[Expr]Type_Exception_Opt_Atomic(boolean) = ...
==
typeswitch (Expr) as $v
case () return xf:false()
default return
   if (xf:length([$v/self::node()]Path) >= 1) then xf:true()
   else xf:boolean(xf:item-at($v, 1))

Ed. Note: MFF: The rule above requires that $v/self::node() on a simple value returns () rather than raise an error.

Ed. Note: The semantics of Boolean node tests over sequences is still an open issue. See [Issue-0131:  Boolean node test and sequences].

In the next rule, the required type is node type. The conditional expression in the "default return" clause below guarantees that in an arbitrary sequence of items, if the first item is a node, then the boolean expression evaluates to true.

 
[Expr]Type_Exception_Opt_Atomic(node) = ...
==
typeswitch (Expr) as $v
case item+ return
     (typeswitch (xf:item-at($v, 1)) as $w
     case node return $w
     default return dm:error())
default return dm:error()

Ed. Note: MFF: All the remaining rules in the fallback conversions table must be specified. See [Issue-0113:  Incomplete specification of type conversions]

 
[Expr]Type_Exception_Other
==
dm:error()

4.2 Primary Expressions

This section defines the semantics of [2.2 Primary Expressions] in [XQuery 1.0: A Query Language for XML]. Primary expressions are the basic primitives of the language.

A primary expression is a variable, literal, an element name, a function call, a wildcard expression, a node-kind test, or a parenthesized expression.

[47]   PrimaryExpr   ::=   Variable
|  Literal
|  ElementNameOrFunctionCall
|  Wildcard
|  KindTest
|  ParenthesizedExpr

4.2.1 Literals

Introduction

A literal is a direct syntactic representation of a atomic value. XQuery supports two kinds of literals: string literals and numeric literals.

[49]   Literal   ::=   NumericLiteral |  StringLiteral
[48]   NumericLiteral   ::=   IntegerLiteral |  DecimalLiteral |  DoubleLiteral
[201]   IntegerLiteral   ::=   Digits
[202]   DecimalLiteral   ::=   ("." Digits) |  (Digits "." [0-9]*)
[203]   DoubleLiteral   ::=   (("." Digits) |  (Digits ("." [0-9]*)?)) ([e] | [E]) ([+] | [-])? Digits
[214]   StringLiteral   ::=   (["] [^"]* ["]) |  (['] [^']* ['])

Normalization

All literals are core expressions, therefore there are no normalization rules for literals.

Core Grammar

The core grammar rules for literals are:

[49]   Literal   ::=   NumericLiteral |  StringLiteral
[48]   NumericLiteral   ::=   IntegerLiteral |  DecimalLiteral |  DoubleLiteral

Static Type Analysis

In the static semantics, the type of an integer literal is simply xs:integer:


statEnvs |- IntegerLiteral : xs:integer

Dynamic Evaluation

In the dynamic semantics, an integer literal is evaluated by constructing a atomic value in the data model, which consists of the literal value and its type:


dynEnvs |- IntegerLiteral => dm:atomic-value(IntegerLiteral, xs:integer)

The formal definitions of decimal, double, and string literals are analogous to those for integer.

Static Type Analysis


statEnvs |- DecimalLiteral : xs:decimal

Dynamic Evaluation


dynEnvs |- DecimalLiteral => dm:atomic-value(DecimalLiteral, xs:decimal)

Static Type Analysis


statEnvs |- DoubleLiteral : xs:double

Dynamic Evaluation


dynEnvs |- DoubleLiteral => dm:atomic-value(DoubleLiteral, xs:double)

Static Type Analysis


statEnvs |- StringLiteral : xs:string

Dynamic Evaluation


dynEnvs |- StringLiteral => dm:atomic-value(StringLiteral, xs:string)

Ed. Note: MFF: Phil has noted that the data model should support primitive literals in their lexical form, in which case no explicit dynamic semantic rule would be necessary. See [Issue-0118:  Data model syntax and literal values].

4.2.2 Variables

Introduction

A variable evaluates to the value to which the variable's QName is bound in the evaluation context.

[192]   Variable   ::=   "$" QName

Normalization

A variable is a core expression, therefore there is no normalization rule for a variable.

Core Grammar

The core grammar rules for variables are:

[192]   Variable   ::=   "$" QName

Static Type Analysis

In the static semantics, the type of a variable is simply its type in the static type environment statEnvs.varType:

statEnvs.varType(Variable) = Type

statEnvs |- Variable : Type

In case the variable is not bound in the environment, the system raises an error.

Dynamic Evaluation

In the dynamic semantics, a variable is evaluated by "looking up" its value in dynEnvs.varValue:

dynEnvs.varValue(Variable) = value

dynEnvs |- Variable => value

In case the variable is not bound in the environment, the system raises an error.

4.2.3 Parenthesized Expressions

The formal definition of parenthesized expressions is in [4.4 Sequence Expressions].

4.2.4 Function Calls

Introduction

A function call consists of a QName followed by a parenthesized list of zero or more expressions.

[51]   ElementNameOrFunctionCall   ::=   QName ("(" (Expr ("," Expr)*)? ")")?

Notation

We define a normalization function,[]Formal_Argument for mapping the formal argument of a function call. There are three variants of this mapping rule. If the required type of the formal argument is an optional atomic value, the following rule is applied: (Type is the required atomic type.)

 
[Expr]Formal_Argument
==
[Expr]Optional_Atomic_Value(Type)

If the required type of the formal argument is a sequence of atomic values, the following rule is applied: (Type is the atomic type of the sequence.)

 
[Expr]Formal_Argument
==
[Expr]Atomic_Value_Sequence(Type)

If the required type of the formal argument is neither an optional atomic value or a sequence of atomic values, the expression is simply mapped to its corresponding core expression:

 
[Expr]Formal_Argument
==
[Expr]

Normalization

First, each argument in a function call is normalized to its corresponding core expression by applying []Formal_Argument.

 
[ QName (Expr1, ..., Exprn) ]
==
[QName] ( [Expr1 ]Formal_Argument, ..., [ Exprn ]Formal_Argument)

Note that this normalization rule depends on the static environment containing function signatures. This is working as the static context is built from the query prolog, before the normalization phase. [5 The Query Prolog] explains how the query prolog is processed.

Core Grammar

The core grammar rule for a function call is:

[51]   ElementNameOrFunctionCall   ::=   QName "(" (Expr ("," Expr)*)? ")"

Static Type Analysis

In the static semantics for core function calls, the type of each actual argument to the function must be a subtype of the corresponding formal argument to the function, i.e., it is not necessary that the actual and formal types be equal.

statEnvs |- QName => qname
statEnvs.funcType(qname, n) = qname(Type1, ..., Typen) returns Type
statEnvs |- Expr1 : Type'1     Type'1 <: Type1
...
statEnvs |- Exprn : Type'n     Type'n <: Typen

statEnvs |- QName (Expr1, ..., Exprn) : Type

Dynamic Evaluation

In the dynamic semantics for core function calls, we first evaluate each function argument. Then we extend dynEnvs.varValue by binding each formal variable to its corresponding actual value, and then evaluate the body of the function in the new environment. The resulting value is the value of the function call.

dynEnvs |- QName => qname
dynEnvs.funcDefn(qname, n) = (Expr, Variable1, ... , Variablen)
dynEnvs |- Expr1 => value1 ... dynEnvs |- Exprn => valuen
dynEnvs [ varValue = (Variable1 |-> value1;...; Variablen |-> valuen) ] |- Expr => value

dynEnvs |- QName ( Expr1, ..., Exprn ) => value

Note that the function body is evaluated in the default environment,

Ed. Note: This semantics only works for non-overloaded function. This is always the case for user defined functions, but not for [XQuery 1.0 and XPath 2.0 Functions and Operators] functions. See [Issue-0122:  Overloaded functions].

4.2.5 Comments

[106]   ExprComment   ::=   "{--" [^}]* "--}"

Comments are lexical constructs only, and have no meaning within the query and therefore have no formal semantics.

4.3 Path expressions

Introduction

Path expressions are used to locate nodes within a tree. There are two kinds of path expressions, absolute path expressions and relative path expressions. An absolute path expression is a rooted relative path expression. A relative path expression is composed of a sequence of steps.

[25]   PathExpr   ::=   AbsolutePathExpr |  RelativePathExpr
[31]   AbsolutePathExpr   ::=   ("/" RelativePathExpr?) |  ("//" RelativePathExpr)
[32]   RelativePathExpr   ::=   StepExpr (("/" |  "//") StepExpr)*
[33]   StepExpr   ::=   AxisStep |  GeneralStep

Notation

Besides the general [Expr]Expr normalization rule for expressions, we use an auxiliary normalization rule for path expressions, denoted [PathExpr]Path.

Normalization

A path expression always returns a sequence in document order. The complete normalization of a Path expression is obtained by applying the Path normalization and then sorting the result by document order and removing duplicates.

 
[PathExpr]Expr
==
fs:distinct-doc-order( [PathExpr]Path )

Ed. Note: Jerome: the restriction that XPath expressions operate on nodes here seems too strict and is still an open issue. See [Issue-0125:  Operations on node only in XPath].

Absolute path expressions are path expressions starting with the / symbol, indicating that the expression must be applied on the root node in the current context. Remember that the root node in the current context is the topmost ancestor of the context node. The following two rules are used to normalize absolute path expressions, and rely on the use of the fs:document() function, which computes the document node from the context node.

 
["/"]Path
==
[xf:document( $fs:dot ) ]Path
 
["/" RelativePathExpr]Path
==
[xf:document( $fs:dot ) "/" RelativePathExpr]Path

Ed. Note: Some of the semantics of the root expression '/' is still an open issue. For instance, what should be the semantics of '/' in case of a document fragment (e.g., created using XQuery element constructor). See [Issue-0123:  Semantics of /].

The // syntax is part of the Abbreviated syntax in XPath. We give separate normalization rules for Abbreviated XPath expressions in [4.3.5 Abbreviated Syntax]

Normalization of relative path expressions is done by normalizing each step, one after the other. A Step is normalized by using a for expression in the following way.

 
[StepExpr "/" RelativePathExpr]Path
==
let $fs:sequence := fs:distinct-doc-order( [StepExpr]Path ) return
  let $fs:last := xf:length($fs:sequence) return
    for $fs:dot in $fs:sequence return
      let $fs:position := xf:index-of($fs:sequence, $fs:dot) return
        [RelativePathExpr]Path

As explained in the XQuery document, applying a step in XPath changes the focus (or context). The change of focus is made explicit by the normalization rule, which binds the variable $fs:dot to the node currently being processed, and the variable $fs:position to the position (i.e., the position within the input sequence) of that node.

Ed. Note: This rule is using the function xf:index-of to bind the position of a node in a sequence. This is working since path expressions should only work on nodes. See [Issue-0125:  Operations on node only in XPath]. Still, it might be preferable to be able to bind both the current node and its position through a single processing of the collection within the for expression. The FS editors proposed several syntax for such an operation. For instance: for $v at $i in E1 return E2. See [Issue-0124:  Binding position in FLWR expressions].

The semantics of Step expressions, which are either axis or general steps, is given next.

4.3.1 Axis Steps

Introduction

An axis step is either a combination of an axis accessor, a node test and a step qualifier, or an abbreviated step. We define here the semantics of Axis NodeTest pairs. The semantics of StepQualifiers is defined in [4.3.3 Step Qualifiers] and the semantics of abbreviated steps is defined in [4.3.5 Abbreviated Syntax].

[34]   AxisStep   ::=   (Axis NodeTest StepQualifiers) |  AbbreviatedStep

Ed. Note: (Michael Kay): there is a bug here in the formal semantics which does not deal properly with principal node types. See [Issue-0108:  Principal node types in XPath].

Notation

During the normalization process, we have to distinguish between forward and reverse Axis. In order to do so, we need to slightly reorganize the XQuery grammar production to make that distinction. This reorganization does not actually change the syntax, but merely regroup together different productions. Here is the grammar that is the basis for the following normalization of steps.

[96]   FSAxisStep   ::=   (ForwardAxis NodeTest StepQualifiers)
|  (ReverseAxis NodeTest StepQualifiers)
|  AbbreviatedStep
[97]   ForwardAxis   ::=   "self" "::"
|  "child" "::"
|  "descendant" "::"
|  "descendant-or-self" "::"
|  "attribute" "::"
[98]   ReverseAxis   ::=   "parent" "::"

We then use two auxiliary normalization rules denoted [...]ForwardPath and [...]ReversePath

Normalization

A forward axis is directly translated to core XQuery

 
[ForwardAxis NodeTest StepQualifier]Path
==
[ForwardAxis NodeTest StepQualifier]ForwardPath

A reverse axis is directly translated to core XQuery

 
[ReverseAxis NodeTest StepQualifier]Path
==
[ReverseAxis NodeTest StepQualifier]ReversePath

Core Grammar

We have now normalized all Axis Steps into the following core grammar.

[34]   AxisStep   ::=   Axis NodeTest
[35]   Axis   ::=   "child" "::"
|  "descendant" "::"
|  "parent" "::"
|  "attribute" "::"
|  "self" "::"
|  "descendant-or-self" "::"
[36]   NodeTest   ::=   NameTest |  KindTest
[37]   NameTest   ::=   QName |  Wildcard
[38]   Wildcard   ::=   "*" |  ":"? NCName ":" "*" |  "*" ":" NCName
[39]   KindTest   ::=   ProcessingInstructionTest
|  CommentTest
|  TextTest
|  AnyKindTest
[40]   ProcessingInstructionTest   ::=   "processing-instruction" "(" StringLiteral? ")"
[41]   CommentTest   ::=   "comment" "(" ")"
[42]   TextTest   ::=   "text" "(" ")"
[43]   AnyKindTest   ::=   "node" "(" ")"

Next, we give the static and dynamic semantics for Axis Steps.

Notation

We use the term Filter to denote either an axis or a node test.

[95]   Filter   ::=   Axis |  NodeTest

To define the semantics of axis and nodetest, we use the two following auxiliary judgments for filters. The first judgment is defining the effect of a filter (either an axis or a nodetest) on a value, and is used in the dynamic semantics. This judgment should be read as follows: in the current context (i.e., with respect to the dynamic environment dynEnvs.varValue), applying the filter Filter on value Value1 yields the value Value2

dynEnvs.varValue ; Value1 |- Filter => Value2

The second judgment is defining the effect of a filter (either an axis or a nodetest) on a type and is used in the static semantics. This judgment should be read as follows: in the current context, (i.e., with respect to the the static environment statEnvs.varType), applying the filter Filter on type Type1 yields the type Type2

statEnvs.varType ; Type1 |- Filter : Type2

In a few cases, when applying the type filter judgment recursively, we need to use an additional type environment that contains local type information. We use the following notation.

statEnvs ; localTypeEnv ; Type1 |- Filter : Type2

We write updates to the local type environment as "localTypeEnv[qname : Type]"

Static Type Analysis

The static semantics of an Axis NodeTest pair is obtained by retrieving the type of the context node, and applying the two filters (Axis, then NodeTest) on the result. The application of each filter is expressed through the static filter judgment as follows.

statEnvs.varType($fs:dot) = Type1
statEnvs ; Type1 |- Axis : Type2
statEnvs ; Type2 |- NodeTest : Type3

statEnvs |- Axis NodeTest : Type3

Dynamic Evaluation

The dynamic semantics of an Axis NodeTest pair is obtained by retrieving the context node, and applying the two filters (Axis, then NodeTest) on the result. The application of each filter is expressed through the filter judgment as follows.

dynEnvs.varValue($fs:dot) = Value1
dynEnvs.varValue ; Value1 |- Axis => Value2
dynEnvs.varValue ; Value2 |- NodeTest => Value3

dynEnvs.varValue |- Axis NodeTest => Value3

We now define the filter judgment used in the dynamic semantics. This is done in two sets of inference rules. The first set of inference rules is common to all filters, and defines the part of the semantics which applies to both Axis and NodeTest. The second set of inference rules is specific to either Axis or NodeTest and is given in the corresponding subsections below.

Static Type Analysis

Here are the common set of inference rules defining the static semantics of filters. Essentially, these rules are used to process complex types through a filter, breaking the types down to the type of a node or of a value. The semantics of a filter applied to a node or value type is then specific to each Axis and NodeTest.


statEnvs ; () |- Filter : ()


statEnvs ; none |- Filter : none

statEnvs ; Type1 |- Filter : Type'1
statEnvs ; type2 |- Filter : Type'2

statEnvs ; Type1, Type2 |- Filter : Type'1,Type'2

statEnvs ; Type1 |- Filter : Type'1
statEnvs ; type2 |- Filter : Type'2

statEnvs ; Type1 | Type2 |- Filter : Type'1 | Type'2

statEnvs ; Type1 |- Filter : Type'1
statEnvs ; type2 |- Filter : Type'2

statEnvs ; Type1 & Type2 |- Filter : Type'1 & Type'2

Dynamic Evaluation

Here are the common set of inference rules defining the dynamic semantics of filters. These rules apply filters to a sequence, by breaking the sequence down to a node or value. The semantics of a filter applied to a node or value is then specific to each Axis and NodeTest.


dynEnvs.varValue ; () |- Filter => ()

dynEnvs.varValue ; Value1 |- Filter => Value'1
dynEnvs.varValue ; Value2 |- Filter => Value'2

dynEnvs.varValue ; Value1, Value2 |- Filter => Value'1,Value'2

Ed. Note: Jerome: These rules use a notation abuse. Value'1,Value'2 indicates the concatenation of Value'1 and Value'2, and is used for both constructing a new sequence and deconstructing it. When doing the construction, it is equivalent to applying the op:concatenate operator. We should strive for more consistent notation for data model constructions and deconstruction, throughout the formal semantics specification. See also [Issue-0118:  Data model syntax and literal values].

4.3.1.1 Axes

Introduction

[35]   Axis   ::=   "child" "::"
|  "descendant" "::"
|  "parent" "::"
|  "attribute" "::"
|  "self" "::"
|  "descendant-or-self" "::"

Axis are used to traverse the document, returning nodes related to the context node inside the tree (children, parent, attribute, etc).

Static Type Analysis

The following rules define the static semantics of the filter judgment when applied to an Axis.

The type for the self axis is the same type as the type of the context node.


statEnvs ; NodeType |- self:: : NodeType

In case of elements, the type of the child axis is obtained by extracting the children types out of the content model describing the type of the context node.


statEnvs ; element qname { AttrType, ChildType } |- child:: : ChildType

The type of the attribute axis is obtained by extracting the attribute types out of the content model describing the type of the context node.


statEnvs ; element qname { AttrType, ChildType } |- attribute:: : AttrType

The type for the parent axis is always an element, possibly optional.


statEnvs ; element |- parent:: : element?

Ed. Note: Better typing for the parent axis is still an open issue. See [Issue-0080:  Typing of parent].

The type for the namespace axis is always empty.


statEnvs NodeType |- namespace:: : ()

Ed. Note: Jerome: The type of namespace nodes is still an open issue. See [Issue-0105:  Types for nodes in the data model.].

The types for the descendant, and descendant-or-self axis are implemented through recursive application of the children and parent filters. The corresponding inference rules use the auxiliary filter judgment with a local type environment. This local type environment is used to keep track of the visited elements in order to deal with recursive types.

statEnvs ; { } ; Type |- descendant:: : Type'

statEnvs ; Type |- descendant:: : Type'

The two following rules are used to deal with global elements. Global elements need careful treatment here in order to deal with possible recursion. Notably, if the global element has been seen before, then the rule terminates and returns the union of all types already visited in the local environment.

localTypeEnv(qname) = Type
localTypeEnv = { Type1, Type2, ... }

statEnvs ; element qname |- descendant:: : (Type1 | Type2 | ...)*

localTypeEnv(qname) is an error
statEnvs.varType(qname) = Type
statEnvs ; localTypeEnv[qname : Type] ; Type |- descendant:: : Type'

statEnvs localTypeEnv ; element qname |- descendant:: : element qname, Type'

Ed. Note: Peter: I need yet to figure out, whether this works, there still appear a few glitches. See [Issue-0132:  Typing for descendant].

In all other cases, the following rule applies

statEnvs ; NodeType |- child:: : Type1
statEnvs ; Type1 |- descendant:: : Type2

statEnvs NodeType |- descendant:: : Type1, Type2

statEnvs ; NodeType |- self:: : Type1
statEnvs ; Type1 |- descendant:: : Type2

statEnvs ; NodeType |- descendant-or-self:: : Type1, Type2

In all the other cases, the filter application results in an empty type.

statEnvs ; NodeType |- AxisName:: : () otherwise.

Dynamic Evaluation

The following rules define the dynamic semantics of the filter judgment when applied to an axis.

The self axis just returns the context node.


dynEnvs.varValue ; NodeValue |- self:: => NodeValue

The child, parent, attribute and namespace axis are implemented through their corresponding accessors in the [XQuery 1.0 and XPath 2.0 Data Model].


dynEnvs.varValue ; NodeValue |- child:: => dm:children(NodeValue)


dynEnvs.varValue ; NodeValue |- attribute:: => dm:attributes(NodeValue)


dynEnvs.varValue ; NodeValue |- parent:: => dm:parent(NodeValue)

The descendant, descendant-or-self, ancestor, and ancestor-or-self axis are implemented through recursive application of the children and parent filters.

dynEnvs.varValue ; NodeValue |- child:: => Value1
dynEnvs.varValue ; Value1 |- descendant:: => Value2

dynEnvs.varValue ; NodeValue |- descendant:: => Value1, Value2

dynEnvs.varValue ; NodeValue |- self:: => Value1
dynEnvs.varValue ; Value1 |- descendant:: => Value2

dynEnvs.varValue ; NodeValue |- descendant-or-self:: => Value1, Value2

dynEnvs.varValue ; NodeValue |- parent:: => Value1
dynEnvs.varValue ; Value1 |- ancestor:: => Value2

dynEnvs.varValue ; NodeValue |- ancestor:: => Value1, Value2

dynEnvs.varValue ; NodeValue |- self:: => Value1
dynEnvs.varValue ; Value1 |- ancestor:: => Value2

dynEnvs.varValue ; NodeValue |- ancestor-or-self:: => Value1, Value2

In all the other cases, the filter application results in an empty sequence.

dynEnvs.varValue ; NodeValue |- AxisName:: => () otherwise.

Ed. Note: Peter: Generally steps may operate on nodes and values alike; the axis rules only can operate on nodes (NodeValue). Is it a dynamic error to apply an axis rule on a value? See [Issue-0125:  Operations on node only in XPath].

4.3.1.2 Node Tests

Introduction

A node test is a condition applied on the nodes selected by an axis step. Possible node tests are described by the following grammar productions.

[36]   NodeTest   ::=   NameTest |  KindTest
[37]   NameTest   ::=   QName |  Wildcard
[38]   Wildcard   ::=   "*" |  ":"? NCName ":" "*" |  "*" ":" NCName
[39]   KindTest   ::=   ProcessingInstructionTest
|  CommentTest
|  TextTest
|  AnyKindTest
[40]   ProcessingInstructionTest   ::=   "processing-instruction" "(" StringLiteral? ")"
[41]   CommentTest   ::=   "comment" "(" ")"
[42]   TextTest   ::=   "text" "(" ")"
[43]   AnyKindTest   ::=   "node" "(" ")"

The inference rules for the filter node tests indicate, for each node step and each input node, whether the node should be kept or left out. Therefore, each rule either returns the input node, or returns the empty sequence. The overall result is then obtained by putting together all of the remaining nodes, following the generic semantics for filters.

Static Type Analysis

Node tests on elements and attributes always accomplish the most specific type possible. For example, if $v is bound to an element with a computed name, the type of $v is element * { Type }. The static type computed for the expression $v/self::foo is element foo {Type}, which makes use of the nametest foo to arrive at a more specific type.

qname2 = prefix2:local2
xf:get-namespace-uri( qname1) = statEnvs.varType(prefix2)
xf:get-local-name( qname1 ) = local2

statEnvs ; element qname1 {Type} |- qname2 : element qname1 {Type}

qname2 = prefix2:local2
local1 = local2

statEnvs ; element *:local1 {Type} |- qname2 : element qname2 {Type}

qname2 = prefix2:local2
statEnvs.namespace(prefix1) = statEnvs.namespace(prefix2)

statEnvs ; element prefix1:* {Type} |- qname2: element prefix1:local2{Type}


statEnvs ; element * {Type} |- qname2 : element qname2 {Type}

xf:get-local-name( qname1 ) = local2

statEnvs ; element qname1 {Type} |- *:local2 : element qname1 {Type}

local1 = local2

statEnvs ; element *:local1 {Type} |- *:local2 : element *:local2 {Type}


statEnvs ; element prefix1:* {Type} |- *:local2: element prefix1:local2{Type}


statEnvs ; element * { Type } |- *:local2 : element *:local2 {Type}

xf:get-namespace-uri( qname1) = statEnvs.namespace(prefix2)

statEnvs ; element qname1 {Type} |- prefix2:* : element qname1 {Type}


statEnvs ; element *:local1 {Type} |- prefix2:* : element prefix2:local1 {Type}

statEnvs.namespace(prefix1) = statEnvs.namespace(prefix2)

statEnvs ; element prefix1:* {Type} |- prefix2:* : element prefix1:* {Type}


statEnvs ; element * {Type} |- prefix2:* : element prefix2:* {Type}


statEnvs ; element prefix:local {Type} |- * : element prefix:local {Type}

Very similar typing rules apply to attributes:

qname2 = prefix2:local2
xf:get-namespace-uri( qname1) = statEnvs.namespace(prefix2)
xf:get-local-name( qname1 ) = local2

statEnvs ; attribute qname1 {Type} |- qname2 : attribute qname1 {Type}

qname2 = prefix2:local2
local1 = local2

statEnvs ; attribute *:local1 {Type} |- qname2 : attribute qname2 {Type}

qname2 = prefix2:local2
statEnvs.namespace(prefix1) = statEnvs.namespace(prefix2)

statEnvs ; attribute prefix1:* {Type} |- qname2: attribute prefix1:local2{Type}


statEnvs ; attribute * {Type} |- qname2 : attribute qname2 {Type}

xf:get-local-name( qname1 ) = local2

statEnvs ; attribute qname1 {Type} |- *:local2 : attribute qname1 {Type}

local1 = local2

statEnvs ; attribute *:local1 {Type} |- *:local2 : attribute *:local2 {Type}


statEnvs ; attribute prefix1:* {Type} |- *:local2: attribute prefix1:local2{Type}


statEnvs ; attribute * {Type} |- *:local2 : attribute *:local2 {Type}

xf:get-namespace-uri( qname1) = statEnvs.namespace(prefix2)

statEnvs ; attribute qname1 {Type} |- prefix2:* : attribute qname1 {Type}


statEnvs ; attribute *:local1 {Type} |- prefix2:* : attribute prefix2:local1 {Type}

statEnvs.namespace(prefix1) = statEnvs.namespace(prefix2)

statEnvs ; attribute prefix1:* {Type} |- prefix2:* : attribute prefix1:* {Type}


statEnvs ; attribute * {Type} |- prefix2:* : attribute prefix2:* {Type}


statEnvs ; attribute prefix:local {Type} |- * : attribute prefix:local {Type}

Comments, processing instructions, and text:


statEnvs ; processing-instruction |- processing-instruction () : processing-instruction


statEnvs ; processing-instruction |- processing-instruction (string) : processing-instruction


statEnvs ; comment |- comment () : comment


statEnvs ; text |- text () : text


statEnvs ; NodeType |- node () : NodeType

If none of the above rules applies then the node test returns the empty sequence, and the following dynamic rule is applied:

statEnvs ; NodeType |- node () : ()

Ed. Note: Peter: Except for self::, all axes guarantee that the NodeType is not the generic typenode. However, when the type node is encountered, it has to be interpreted as element | attribute | text | comment | processing-instruction for these typing rules to work.

Dynamic Evaluation

dm:name( NodeValue ) = qname1
qname2 = prefix2:local2
xf:get-namespace-uri ( qname1 ) = statEnvs.namespace(prefix2)
xf:get-local-name ( qname1 ) = local2

dynEnvs.varValue ; NodeValue |- qname2 => NodeValue

dm:name ( NodeValue ) = qname

dynEnvs.varValue ; NodeValue |- * => NodeValue

dm:name ( NodeValue ) = qname
xf:get-namespace-uri ( qname ) = statEnvs.namespace(prefix)

dynEnvs.varValue ; NodeValue |- prefix:* => NodeValue

dm:name ( NodeValue ) = qname
xf:get-local-name ( qname ) = local

dynEnvs.varValue ; NodeValue |- *:local => NodeValue

dm:node-kind ( NodeValue ) = "processing-instruction"

dynEnvs.varValue ; NodeValue |- processing-instruction () => NodeValue

dm:node-kind ( NodeValue ) = "processing-instruction"
dm:name ( NodeValue ) = qname
xf:get-local-name ( qname ) = String

dynEnvs.varValue ; NodeValue |- processing-instruction ( String ) => NodeValue

Ed. Note: Note the use of the xf:get-local-name function to extract the local name out of the name of the node.

dm:node-kind ( NodeValue ) = "comment"

dynEnvs.varValue ; NodeValue |- comment () => NodeValue

dm:node-kind ( NodeValue ) = "text"

dynEnvs.varValue ; NodeValue |- text () => NodeValue

The node() node test is true for all nodes. Therefore, the following rule does not have any precondition (remember that an empty upper part in the rule indicates that the rule is always true).


dynEnvs.varValue ; NodeValue |- node () => NodeValue

If none of the above rules applies then the node test returns the empty sequence, and the following dynamic rule is applied:

dynEnvs.varValue ; NodeValue |- node () => ()

4.3.2 General Steps

Introduction

General steps are composed of primary expressions, followed by a step qualifier. The semantics of step qualifiers is defined in the next subsection.

[44]   GeneralStep   ::=   PrimaryExpr StepQualifiers

Normalization

Primary expressions are normalized as expressions. I.e., their Path normalization is obtained by applying the general normalization rules for expressions. In GeneralSteps, StepQualifiers are always applied as forward steps. Note that the following rule makes use of a typewitch expression to ensure the path expression is always applied on a sequence of nodes.

Ed. Note: Jerome: where to enforce the restriction that XPath expressions operate on nodes is still an open issue. See [Issue-0125:  Operations on node only in XPath].

 
[PrimaryExpr]Path
==
typeswitch [PrimaryExpr]Expr as $fs:new
  case node* return $fs:new
  default return dm:error()
 
[PrimaryExpr StepQualifiers]Path
==
[[PrimaryExpr]Path StepQualifiers]ForwardPath

4.3.3 Step Qualifiers

Introduction

Step qualifiers are composed of zero or more predicates (using the [ ... ] notation) or dereference operations (using the => notation).

[46]   StepQualifiers   ::=   (("[" Expr "]") |  ("=>" NameTest))*

Normalization

Normalization is first applied on all the components of a step qualifier. Remember that StepQualifiers might be normalized through either a ForwardPath rule, or a ReversePath rule.

 
[StepQualifier1 StepQualifier2 ... ]ForwardPath
==
[StepQualifier1]ForwardPath [StepQualifier2]ForwardPath ...
 
[StepQualifier1 StepQualifier2 ... ]ReversePath
==
[StepQualifier1]ReversePath [StepQualifier2]ReversePath ...
4.3.3.1 Predicates

Normalization

We now define the semantics of predicates, for both forward and reverse steps.

 
[Expr1 [ Expr2 ] ]ForwardPath
==
let $fs:sequence := [Expr1]Path return
let $fs:last := xf:length(Sequence) return
for $fs:position in op:to(1,$fs:last) return
let $fs:last := xf:length($fs:sequence) return
let $fs:dot := fs:unique-item-at($fs:sequence, $fs:position) return
if [Expr2]Predicate then $fs:dot else ()
 
[Expr1 [ Expr2 ] ]ReversePath
==
let $fs:sequence := [Expr1]Path return
let $fs:last := xf:length(Sequence)
for $fs:counter in op:to(1,$fs:last) return
let $fs:position := $fs:last + 1 - $fs:counter return
let $fs:last := 1 return
let $fs:dot := fs:unique-item-at($fs:sequence, $fs:position)
return [Expr2]Predicate) then $fs:dot else ()

Predicates in path expressions are normalized with a special mapping rule:

 
[Expr]Predicate
==
typeswitch [Expr] as $v
case () return xf:false()
case numeric return op:numeric-equal(xf:round($v), $fs:position)
case xs:boolean return $v
default return
if (xf:length([$v/self::node()]Path)>=1)
then xf:true()
else dm:error()
4.3.3.2 Dereferences

Normalization

Dereference steps are mapped into an iteration, followed by a call to the dereference function, then followed by the appropriate NameTest

 
[ ForwardStep "=>" NameTest ]Path
==
for $fs:dot in [ ForwardStep ]_Expr return
let $fs:dot := dm:dereference($fs:dot) return
self::NameTest

4.3.4 Unabbreviated Syntax

Ed. Note: XQuery Section 2.3.4 has no semantic content -- it just contains examples! (This suggests that perhaps the section structure is a little weird.)

4.3.5 Abbreviated Syntax

[45]   AbbreviatedStep   ::=   "." |  ".." |  ("@" NameTest StepQualifiers)

Normalization

Here are normalization rules for the abbreviated syntax.

 
[ // RelativePathExpr ]Path
==
/ descendant-or-self::node() / [RelativePathExpr]Path
 
[ StepExpr // RelativePathExpr ]Path
==
[StepExpr]Step / descendant-or-self::node() / [RelativePathExpr]Path
 
[ . ]Path
==
 
[ .. ]Path
==
parent::node()
 
[ @ NodeTest ]Path
==
attribute :: NodeTest
 
[ NodeTest ]Path
==
child :: NodeTest

4.4 Sequence Expressions

Introduction

This section defines the semantics of [2.4 Sequence Expressions] in [XQuery 1.0: A Query Language for XML].

XQuery supports operators to construct and combine sequences. A sequence is an ordered collection of zero or more items. An item is either an atomic value or a node.

4.4.1 Constructing Sequences

[50]   ParenthesizedExpr   ::=   "(" ExprSequence? ")"
[3]   ExprSequence   ::=   Expr ("," Expr)*
[17]   RangeExpr   ::=   Expr  "to"  Expr

Normalization

The sequence expression is normalized into a sequence of core expressions:

 
["(" ExprSequence ")"]
==
"(" [ ExprSequence ] ")"
 
[Expr1 , ... , Exprn]
==
[Expr1], ..., [Exprn]

Ed. Note: Mike Kay remarks that it would be cleaner to have binary rules and recursion to define the semantics rather than the ... notation.

Core Grammar

The core grammar rule for sequence expressions is:

[3]   ExprSequence   ::=   Expr ("," Expr)*

Static Type Analysis

The static semantics of the sequence expression follows. The type of the sequence expression is the sequence over the types of individual expressions.

statEnvs |- Expr1 : Type1      ...     statEnvs |- Exprn : Typen

statEnvs |- Expr1 , ... , Exprn : Type1, ..., Typen

Dynamic Evaluation

The dynamic semantics of the sequence expression follows. Each expression in the sequence is evaluated and the resulting values are concatenated into one sequence.

dynEnvs |- Expr1 => value1     dynEnvs |- Expr2 => value2     ...     dynEnvs |- Exprn => valuen

dynEnvs |- Expr1, ..., Exprn => op:concatenate(value1, op:concatenate(value2, op:concatenate(... , valuen)...)

Normalization

The normalization of the infix "to" operator maps it into an application of op:to.

 
[Expr1 "to" Expr2]
==
[op:to (Expr1, Expr2)]

Static Type Analysis

The static semantics of the op:to function is defined in [6 Additional Semantics of Functions].

Dynamic Evaluation

The dynamic semantics rules for function calls given in [4.2.4 Function Calls] are applied to the function call op:to above.

Ed. Note: Should the "to" operator be defined formally? See [Issue-0133:  Should to also be described in the formal semantics?].

4.4.2 Combining Sequences

XQuery provides several operators for combining sequences.

[20]   UnionExpr   ::=   Expr  ("union" |  "|")  Expr
[21]   IntersectExceptExpr   ::=   Expr  ("intersect" |  "except")  Expr

Notation

First, a union, intersect, or except expression is normalized into a corresponding core expression. The functions []SequenceOp and []SequenceValueOp are defined by the following tables:

SequenceOp[SequenceOp]SequenceOp
"union"op:union
"|"op:union
"intersect"op:intersect
"except"op:except
SequenceOp[SequenceOp]SequenceValueOp
"union"op:union-value
"|"op:union-value
"intersect"op:intersect-value
"except"op:except-value

Normalization

 
[Expr1 SequenceOp Expr2]
==
typeswitch (Expr1) as $v1
case fs:atomic+ return
     (typeswitch (Expr2) as $v2
     case fs:atomic+ return [SequenceOp]SequenceValueOp($v1, $v2)
     default return dm:error())
case node+
     (typeswitch (Expr2) as $v2
     case node+ return [SequenceOp]SequenceOp($v1, $v2)
     default return dm:error())
default return dm:error()

Ed. Note: MFF: this mapping assumes that there are two versions of union: one based on node identity and one based on value equality. What happens if we have a heterogeneous sequence is not clear. See [Issue-0120:  Sequence operations: value vs. node identity].

Static Type Analysis

The static semantics of the functions that operate on sequences are defined in [6 Additional Semantics of Functions].

Dynamic Evaluation

The dynamic semantics rules for function calls given in [4.2.4 Function Calls] are applied to the calls to functions on sequences above.

4.5 Arithmetic Expressions

This section defines the semantics of [2.5 Arithmetic Expressions] in [XQuery 1.0: A Query Language for XML].

XQuery provides arithmetic operators for addition, subtraction, multiplication, division, and modulus, in their usual binary and unary forms.

[18]   AdditiveExpr   ::=   Expr  ("+" |  "-")  Expr
[19]   MultiplicativeExpr   ::=   Expr  ( "*" |  "div" |  "mod")  Expr
[22]   UnaryExpr   ::=   ("-" |  "+")  Expr

Notation

Ed. Note: MFF: The operator table in this section was produced by Don Chamberlin. This table should be in one place, probably the formal semantics

The tables in this section list the combinations of datatypes for which the various operators of XQuery are defined. For each valid combination of datatypes, the table indicates the name of the function that implements the operator and the datatype of the result. Definitions of the functions can be found in [XQuery 1.0 and XPath 2.0 Functions and Operators].

In the following tables, the term fs:numeric refers to the types xs:integer, xs:decimal, xs:float, and xs:double. When the result type of an operator is listed as fs:numeric, it means "same as the highest type of any input operand, in promotion order." For example, when invoked with operands of type xs:integer and xs:float, the binary + operator returns a result of type xs:float.

In the following tables, the term Gregorian refers to the types xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gDay, and xs:gMonth. For binary operators that accept two Gregorian-type operands, both operands must have the same type (for example, if one operand is of type xs:gDay, the other operand must be of type xs:gDay.)

The functions []BinaryOp and []UnaryOp are defined by the following two tables. The function []BinaryOp takes the left-hand expression (A) and its type, the operator, the right-hand expression (b) and its type and returns a new expression, which applies the type-appropriate operator to the two expressions. The function []UnaryOp takes an operator and an expression and returns a new expression, which applies the appropriate operator to the expression.

OperatorType(A)Type(B)[A; Type(A); Operator; B; Type(B)]BinaryOpResult type
A + Bfs:numericfs:numericop:numeric-add(A, B)fs:numeric
A + Bxs:datexs:duration(missing)xs:date
A + Bxs:durationxs:date(missing)xs:date
A + Bxs:timexs:duration(missing)xs:time
A + Bxs:durationxs:time(missing)xs:time
A + Bxs:dateTimexs:durationop:get-end-datetime(A, B)xs:dateTime
A + Bxs:durationxs:dateTimeop:get-end-datetime(B, A)xs:dateTime
A + Bxs:durationxs:duration(missing)xs:duration
A - Bfs:numericfs:numericop:numeric-subtract(A, B)fs:numeric
A - Bxs:datexs:date(missing)xs:duration
A - Bxs:datexs:duration(missing)xs:date
A - Bxs:timexs:time(missing)xs:duration
A - Bxs:timexs:duration(missing)xs:time
A - Bxs:dateTimexs:dateTimeop:get-duration(A, B)xs:duration
A - Bxs:dateTimexs:durationop:get-start-datetime(A, B)xs:dateTime
A - Bxs:durationxs:duration(missing)xs:duration
A * Bfs:numericfs:numericop:numeric-multiply(A, B)fs:numeric
A div Bfs:numericfs:numericop:numeric-divide(A, B)fs:numeric
A mod Bfs:numericfs:numericop:numeric-mod(A, B)fs:numeric
A eq Bfs:numericfs:numericop:numeric-equal(A, B)xs:boolean
A eq Bxs:booleanxs:booleanop:boolean-equal(A, B)xs:boolean
A eq Bxs:stringxs:stringop:numeric-equal(xf:compare(A, B), 1)xs:boolean
A eq Bxs:datexs:date(missing)xs:boolean?
A eq Bxs:timexs:time(missing)xs:boolean?
A eq Bxs:dateTimexs:dateTimeop:datetime-equal(A, B)xs:boolean?
A eq Bxs:durationxs:durationop:duration-equal(A, B)xs:boolean?
A eq BGregorianGregorian(missing)xs:boolean
A eq Bxs:hexBinaryxs:hexBinaryop:hex-binary-equal(A, B)xs:boolean
A eq Bxs:base64Binaryxs:base64Binaryop:base64-binary-equal(A, B)xs:boolean
A eq Bxs:anyURIxs:anyURI(missing)xs:boolean
A eq Bxs:QNamexs:QName(missing)xs:boolean
A ne Bfs:numericfs:numericxf:not(op:numeric-equal(A, B))xs:boolean
A ne Bxs:booleanxs:booleanxf:not(op:boolean-equal(A, B))xs:boolean
A ne Bxs:stringxs:stringxf:not(op:numeric-equal(xf:compare(A, B), 1))xs:boolean
A ne Bxs:datexs:date(missing)xs:boolean?
A ne Bxs:timexs:time(missing)xs:boolean?
A ne Bxs:dateTimexs:dateTimexf:not3(op:datetime-equal(A, B))xs:boolean?
A ne Bxs:durationxs:durationxf:not3(op:duration-equal(A, B))xs:boolean?
A ne BGregorianGregorian(missing)xs:boolean
A ne Bxs:hexBinaryxs:hexBinaryxf:not(op:hex-binary-equal(A, B))xs:boolean
A ne Bxs:base64Binaryxs:base64Binaryxf:not(op:base64-binary-equal(A, B))xs:boolean
A ne Bxs:anyURIxs:anyURI(missing)xs:boolean
A ne Bxs:QNamexs:QName(missing)xs:boolean
A eq Bxs:NOTATIONxs:NOTATION(missing)xs:boolean
A ne Bxs:NOTATIONxs:NOTATION(missing)xs:boolean
A gt Bfs:numericfs:numericop:numeric-greater-than(A, B)xs:boolean
A gt Bxs:booleanxs:boolean(missing)xs:boolean
A gt Bxs:stringxs:stringop:numeric-greater-than(xf:compare(A, B), 0)xs:boolean
A gt Bxs:datexs:date(missing)xs:boolean?
A gt Bxs:timexs:time(missing)xs:boolean?
A gt Bxs:dateTimexs:dateTimeop:datetime-greater-than(A, B)xs:boolean?
A gt Bxs:durationxs:durationop:duration-greater-than(A, B)xs:boolean?
A lt Bfs:numericfs:numericop:numeric-less-than(A, B)xs:boolean
A lt Bxs:booleanxs:boolean(missing)xs:boolean
A lt Bxs:stringxs:stringop:numeric-less-than(xf:compare(A, B), 0)xs:boolean
A lt Bxs:datexs:date(missing)xs:boolean?
A lt Bxs:timexs:time(missing)xs:boolean?
A lt Bxs:dateTimexs:dateTimeop:datetime-less-than(A, B)xs:boolean?
A lt Bxs:durationxs:durationop:duration-less-than(A, B)xs:boolean?
A ge Bfs:numericfs:numericop:numeric-less-than(B, A)xs:boolean
A ge Bxs:stringxs:stringop:numeric-greater-than(xf:compare(A, B), -1)xs:boolean
A ge Bxs:datexs:date(missing)xs:boolean?
A ge Bxs:timexs:time(missing)xs:boolean?
A ge Bxs:dateTimexs:dateTimeop:datetime-less-than(B, A)xs:boolean?
A ge Bxs:durationxs:durationop:duration-less-than(B, A)xs:boolean?
A le Bfs:numericfs:numericop:numeric-greater-than(B, A)xs:boolean
A le Bxs:stringxs:stringop:numeric-less-than(xf:compare(A, B), 1)xs:boolean
A le Bxs:datexs:date(missing)xs:boolean?
A le Bxs:timexs:time(missing)xs:boolean?
A le Bxs:dateTimexs:dateTimeop:datetime-greater-than(B, A)xs:boolean?
A le Bxs:durationxs:durationop:duration-greater-than(B, A)xs:boolean?
A == Bnodenodeop:node-equal(A, B)xs:boolean
A !== Bnodenodexf:not(op:node-equal(A, B))xs:boolean
A << Bnodenodeop:node-before(A, B)xs:boolean
A >> Bnodenodeop:node-after(A, B)xs:boolean
A precedes Bnodenode(missing)xs:boolean
A follows Bnodenode(missing)xs:boolean
A union Bitem*item*op:union(A, B)item*
A | Bitem*item*op:union(A, B)item*
A intersect Bitem*item*op:intersect(A, B)item*
A except Bitem*item*op:except(A, B)item*
A to Bxs:decimalxs:decimalop:to(A, B)xs:integer+
A , Bitem*item*op:concatenate(A, B)item*

An analogous table exists for unary operators.

OperatorOperand type [Operator; Expr]UnaryOpResult type
+ Afs:numericop:numeric-unary-plus(A)same as operand
- Afs:numericop:numeric-unary-minus(A)same as operand

Normalization

The normalization rules for the arithmetic operators "+" and "-" are similar, but not identical, because as the table above illustrates, "-" is not commutative.

The following normalization rule for "+" first applies []Optional_Atomic_Type to each argument expression, binding the results of these expressions to two new variables, $e1 and $e2. It then applies a typeswitch on the left-hand operand $e1, and for each left-hand operand type, it applies a second typeswitch on the right-hand operand $e2. The function [Operator]BinaryOp takes the operator, the left-hand type, and the right-hand type and returns the appropriate function, which is applied to the argument values.

 
[Expr1 "+" Expr2]
==
let $coree1 := ([ Expr1 ]),
     $coree2 := ([ Expr2 ]),
     $e1 := [ $coree1 ]Optional_Atomic_Value,
     $e2 := [ $coree2 ]Optional_Atomic_Value
return
typeswitch ($e1) as $v1
case () return ()
case fs:numeric return
     (typeswitch ($e2) as $v2
     case () return ()
         case fs:numeric return [ $v1; fs:numeric; "+"; $v2; fs:numeric ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; fs:numeric; "+"; (cast ($v2) as xs:double); fs:numeric ]BinaryOp
     default return dm:error())
case xs:date return
     (typeswitch ($e2) as $v2
     case xs:duration return [ $v1; xs:date; "+"; $v2; xs:duration ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:date; "+"; (cast ($v2) as xs:duration ); xs:duration ]BinaryOp
     default return dm:error())
case xs:time return
     (typeswitch ($e2) as $v2
     case xs:duration return [ $v1; xs:time; "+"; $v2; xs:duration ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:time; "+"; (cast ($v2) as xs:duration ); xs:duration ]BinaryOp
     default return dm:error())
case xs:dateTime return
     (typeswitch ($e2) as $v2
     case xs:duration return [$v1; xs:dateTime; "+"; $v2; xs:duration ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:dateTime; "+"; (cast ($v2) as xs:duration ); xs:duration ]BinaryOp
     default return dm:error())
case xs:duration return
     (typeswitch ($e2) as $v2
     case xs:duration return [ $v1; xs:duration; "+"; $v2; xs:duration ]BinaryOp
     case xs:date return [$v1; xs:duration; "+"; $v2; xs:date ]BinaryOp
     case xs:time return [ $v1; xs:duration; "+"; $v2; xs:time ]BinaryOp
     case xs:dateTime return [ $v1; xs:duration; "+"; $v2; xs:dateTime ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:duration; "+"; (cast ($v2) as xs:duration); xs:duration ]BinaryOp
     default return dm:error())
case fs:UnknownSimpleType return
     (typeswitch ($e2) as $v2
     case () return ()
     case fs:numeric return
         [ (cast ($v1) as xs:double); xs:double; "+"; $v2; xs:double ]BinaryOp
     case xs:duration return
         [(cast ($v1) as xs:duration); xs:duration; "+"; $v2; xs:duration; ]BinaryOp
     case xs:time
         [ (cast ($v1) as xs:duration); xs:duration; "+"; $v2; xs:time ]BinaryOp
     case xs:dateTime return
         [(cast ($v1) as xs:duration); xs:duration; "+"; $v2; xs:dateTime ]BinaryOp
     case fs:UnknownSimpleType return
         [ (cast ($v1) as xs:double); xs:double; "+";(cast ($v2) as xs:double); xs:double ]BinaryOp
     default return dm:error())
default return dm:error()

Ed. Note: MFF: The XQuery document specifies that the casting should depend on the lexical form. This cannot be captured by normalization. See [Issue-0128:  Casting based on the lexical form].

Ed. Note: MFF: The Datatype production does not permit choices of item types -- this is annoying. See [Issue-0127:  Datatype limitations].

Ed. Note: Peter: the static semantics of operators is as strict as the one of functions, and is still under discussion. See [Issue-0129:  Static typing of union].

The following normalization rule for "-" is analogous to that for "+".

 
[Expr1 "-" Expr2]
==
let $coree1 := ([ Expr1 ]),
     $coree2 := ([ Expr2 ]),
     $e1 := [ $coree1 ]Optional_Atomic_Value,
     $e2 := [ $coree2 ]Optional_Atomic_Value
return
typeswitch ($e1) as $v1
case () return ()
case fs:numeric return
     (typeswitch ($e2) as $v2
     case () return ()
     case fs:numeric return [ $v1; fs:numeric; "-"; $v2; fs:numeric ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; fs:numeric; "-"; (cast ($v2) as xs:double); fs:numeric ]BinaryOp
default return dm:error())
case xs:date return
     (typeswitch ($e2) as $v2
     case xs:duration return [ $v1; xs:date; "-"; $v2; xs:duration ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:date; "-"; (cast ($v2) as xs:duration ); xs:duration ]BinaryOp
     default return dm:error())
case xs:time return
     (typeswitch ($e2) as $v2
     case xs:duration return [$v1; xs:time; "-"; $v2; xs:duration ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:time; "-"; (cast ($v2) as xs:duration ); xs:duration ]BinaryOp
     default return dm:error())
case xs:dateTime return
     (typeswitch ($e2) as $v2
     case xs:duration return [$v1; xs:dateTime; "-"; $v2; xs:duration ]BinaryOp
     case fs:UnknownSimpleType return
         [$v1; xs:dateTime; "-"; (cast ($v2) as xs:duration );xs:duration ]BinaryOp
     default return dm:error())
case xs:duration return
     (typeswitch ($e2) as $v2
     case xs:duration return [ $v1; xs:duration; "-"; $v2; xs:duration ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:duration; "-"; (cast ($v2) as xs:duration); xs:duration ]BinaryOp
     default return dm:error())
case fs:UnknownSimpleType return
     (typeswitch ($e2) as $v2
     case () return ()
     case fs:numeric return
         [ (cast ($v1) as xs:double); xs:double; "-" $v2; fs:numeric]BinaryOp
     case xs:duration return
         [(cast ($v1) as xs:duration); xs:duration; "-"; $v2; xs:duration ]BinaryOp
     case xs:time return
         [ (cast ($v1) as xs:duration); xs:duration; "-"; $v2 ; xs:time ]BinaryOp
     case xs:dateTime return
         [ (cast ($v1) as xs:duration); xs:duration; "-"; $v2; xs:dateTime]BinaryOp
     case fs:UnknownSimpleType return
         [ (cast ($v1) as xs:double); xs:double; "-"; (cast ($v2) as xs:double)' xs:double ]BinaryOp
     default return dm:error())
default return dm:error()

The multiplicative operators "*", "div", and "mod" are only defined on fs:numeric, so their normalization rule is simple. For convenience, MultOp denotes "*", "div", or "mod".

 
[Expr1 MultOp Expr2]
==
let $coree1 := ([ Expr1 ]),
     $coree2 := ([ Expr2 ]),
     $e1 := [ $coree1 ]Optional_Atomic_Value ,
     $e2 := [ $coree2 ]Optional_Atomic_Value
return
typeswitch ($e1) as $v1
case () return ()
case fs:numeric return
     (typeswitch ($e2) as $v2
     case () return ()
     case fs:numeric return [ $v1; fs:numeric; MultOp; $v2; fs:numeric ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; fs:numeric; MultOp; (cast ($v2) as xs:double); fs:numeric ]BinaryOp
default return dm:error())
case fs:UnknownSimpleType return
     (typeswitch ($e2) as $v2
     case () return ()
     case fs:numeric return
         [(cast ($v1) as xs:double); xs:doube; "-"; $v2; fs:numeric ]BinaryOp
     case fs:UnknownSimpleType return
         [ (cast ($v1) as xs:double); xs:double; "-"; (cast ($v2) as xs:double); xs:double ]BinaryOp
     default return dm:error())
default return dm:error()

Ed. Note: MFF: The XQuery document specifies that the casting should depend on the lexical form. This cannot be captured by normalization. See [Issue-0128:  Casting based on the lexical form].

For convenience, UnaryOp denotes the unary operators "+" and "-". The normalization rule for unary operators is straightforward:

 
[UnaryOp Expr]
==
let $coree1 := ([ Expr1 ]),
     $e1 := [ $coree1 ]Optional_Atomic_Value(xs:double),
return [ UnaryOp; $e1 ]UnaryOp

Core Grammar

There are no core grammar rules for arithmetic expressions as they are normalized to function calls.

Static Type Analysis

In the [XQuery 1.0 and XPath 2.0 Functions and Operators], type promotion rules are given for all the arithmetic operators, denoted by op:operation, and the result types of these operations. The following static semantics rules specifies the result types for all arithmetic operators when applied to specific fs:numeric types.

statEnvs |- Expr1 : Type1      Type1 <: xs:decimal      statEnvs |- Expr2 : Type2      Type2 <: xs:decimal

statEnvs |- op:operation(Expr1, Expr2) : xs:decimal

statEnvs |- Expr1 : Type1      Type1 <: xs:float      statEnvs |- Expr2 : Type2      Type2 <: xs:float

statEnvs |- op:operation(Expr1, Expr2) : xs:float

statEnvs |- Expr1 : Type1      Type1 <: xs:double      statEnvs |- Expr2 : Type2      Type2 <: xs:double

statEnvs |- op:operation(Expr1, Expr2) : xs:double

Ed. Note: MFF: Can the three rules above be factored?

Analogous static type rules are given for the unary arithmetic operators.

statEnvs |- Expr1 : Type1      Type1 <: xs:decimal

statEnvs |- op:operation(Expr1) : xs:decimal

statEnvs |- Expr1 : Type1      Type1 <: xs:float

statEnvs |- op:operation(Expr1) : xs:float

statEnvs |- Expr1 : Type1      Type1 <: xs:double

statEnvs |- op:operation(Expr1) : xs:double

Dynamic Evaluation

The normalization rules map all arithmetic operators into core expressions, whose dynamic semantics is defined in other sections, therefore there are no dynamic semantics rules for arithmetic operators. The dynamic semantics rules for function calls given in [4.2.4 Function Calls] are applied to all the function calls op:numeric-add, etc.

4.6 Comparison Expressions

Introduction

This section defines the semantics of [2.6 Comparison Expressions] in [XQuery 1.0: A Query Language for XML].

Comparison expressions allow two values to be compared. XQuery provides four kinds of comparison expressions, called value comparisons, general comparisons, node comparisons, and order comparisons.

[13]   ValueComp   ::=   Expr  ("eq" |  "ne" |  "lt" |  "le" |  "gt" |  "ge")  Expr
[12]   GeneralComp   ::=   Expr  ("=" |  "!=" |  "<" S |  "<=" |  ">" |  ">=")  Expr
[14]   NodeComp   ::=   Expr  ("==" |  "!==")  Expr
[15]   OrderComp   ::=   Expr  ("<<" |  ">>" |  "precedes" |  "follows")  Expr

4.6.1 Value Comparisons

Normalization

The value comparison equality operators "eq" and "ne" are defined on a large set of types.

 
[Expr1 ValueEqOp Expr2]
==
let $coree1 := ([ Expr1 ]),
     $coree2 := ([ Expr2 ]),
     $e1 := [ $coree1 ]Optional_Atomic_Value ,
     $e2 := [ $coree2 ]Optional_Atomic_Value
return
typeswitch ($e1) as $v1
case () return ()
case fs:numeric return
     (typeswitch ($e2) as $v2
     case () return ()
     case fs:numeric return [ $v1; fs:numeric; ValueEqOp; $v2; fs:numeric ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; fs:numeric; ValueEqOp; (cast ($v2) as xs:double); fs:numeric ]BinaryOp
     default return dm:error())
case xs:boolean return
     (typeswitch ($e2) as $v2
     case () return ()
     case xs:boolean return [ $v1; xs:boolean; ValueEqOp; $v2; xs:boolean ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:boolean; ValueEqOp; (cast ($v2) as xs:boolean); xs:boolean ]BinaryOp
     default return dm:error())
case xs:string return
     (typeswitch ($e2) as $v2
     case () return ()
     case xs:string return [ $v1; xs:string; ValueEqOp; $v2; xs:string ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:string; ValueEqOp; (cast ($v2) as xs:string); xs:string ]BinaryOp
     default return dm:error())
case xs:date return
     (typeswitch ($e2) as $v2
     case () return ()
     case xs:date return [ $v1; xs:date; ValueEqOp; $v2; xs:date ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:date; ValueEqOp; (cast ($v2) as xs:date); xs:date ]BinaryOp
     default return dm:error())
case xs:time return
     (typeswitch ($e2) as $v2
     case () return ()
     case xs:time return [ $v1; xs:time; ValueEqOp; $v2; xs:time ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:time; ValueEqOp; (cast ($v2) as xs:time); xs:time ]BinaryOp
     default return dm:error())
case xs:dateTime return
     (typeswitch ($e2) as $v2
     case () return ()
     case xs:dateTime return [ $v1; xs:dateTime; ValueEqOp; $v2; xs:dateTime ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:dateTime; ValueEqOp; (cast ($v2) as xs:dateTime); xs:dateTime ]BinaryOp
     default return dm:error())
case xs:duration return
     (typeswitch ($e2) as $v2
     case () return ()
     case xs:duration return [ $v1; xs:duration; ValueEqOp; $v2; xs:duration ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:duration; ValueEqOp; (cast ($v2) as xs:duration); xs:duration ]BinaryOp
     default return dm:error())
case Gregorian return
     (typeswitch ($e2) as $v2
     case () return ()
     case Gregorian return [ $v1; Gregorian; ValueEqOp; $v2; Gregorian ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; Gregorian; ValueEqOp; (cast ($v2) as Gregorian); Gregorian ]BinaryOp
     default return dm:error())
case xs:hexBinary return
     (typeswitch ($e2) as $v2
     case () return ()
     case xs:hexBinary return [ $v1; xs:hexBinary; ValueEqOp; $v2; xs:hexBinary ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:hexBinary; ValueEqOp; (cast ($v2) as xs:hexBinary); xs:hexBinary ]BinaryOp
     default return dm:error())
case xs:base64Binary return
     (typeswitch ($e2) as $v2
     case () return ()
     case xs:base64Binary return [ $v1; xs:base64Binary; ValueEqOp; $x2; xs:base64Binary ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:base64Binary; ValueEqOp; (cast ($v2) as xs:base64Binary); xs:base64Binary ]BinaryOp
     default return dm:error())
case xs:anyURI return
     (typeswitch ($e2) as $v2
     case () return ()
     case xs:anyURI return [ $v1; xs:anyURI; ValueEqOp; $v2; xs:anyURI ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:anyURI; ValueEqOp; (cast ($v2) as xs:anyURI); xs:anyURI ]BinaryOp
     default return dm:error())
case xs:QName return
     (typeswitch ($e2) as $v2
     case () return ()
     case xs:QName return [ $v1; xs:QName; ValueEqOp; $v2; xs:QName ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xsQName; ValueEqOp; (cast ($v2) as xs:QName) ; xs:QName ]BinaryOp
     default return dm:error())
case fs:UnknownSimpleType return
     (typeswitch ($e2) as $v2
     case () return ()
     case fs:numeric return
         [ (cast ($v1) as xs:double); xs:double; ValueEqOp; $v2; xs:double; fs:numeric]BinaryOp
     case xs:string return
         [ (cast ($v1) as xs:string); xs:string; ValueEqOp; $v2; xs:string]BinaryOp
     case xs:date return
         [ (cast ($v1) as xs:date); xs:date; ValueEqOp; $v2; xs:date]BinaryOp
     case xs:time return
         [ (cast ($v1) as xs:time); xs:time; ValueEqOp; $v2; xs:time]BinaryOp
     case xs:dateTime return
         [ (cast ($v1) as xs:dateTime); xs:dateTime; ValueEqOp; $v2; xs:dateTime]BinaryOp
     case xs:duration return
         [ (cast ($v1) as xs:duration); xs:duration; ValueEqOp; $v2; xs:duration]BinaryOp,
     case fs:UnknownSimpleType return
         [ (cast ($v1) as xs:string); xs:string; ValueEqOp; (cast ($v2) as xs:string); xs:string ]BinaryOp
     default return dm:error())
default return dm:error()

Ed. Note: MFF: The definition of equality operators could be factored by introducing another normalization function, which would be applied to the bodies of the cases. Is it clearer (albeit longer) to just enumerate all the cases? For now, they are enumerated.

 
[Expr2 Type2 ValueEqOp; ]ValueOp
==
     (typeswitch (Expr2) as $v2
     case () return ()
     case return [ $v1; ValueEqOp; $v2; ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; Type2 ValueEqOp; (cast ($v2) as Type2); $v2 ]BinaryOp
     default return dm:error())

Normalization

The value comparison in-equality operators "lt", "le", "gt", and "ge" are defined on a smaller set of types than are the equality operators "eq" and "ne". For convenience, ValueInEqOp denotes "lt", "le", "gt", or "ge".

 
[Expr1 ValueInEqOp Expr2]
==
let $coree1 := ([ Expr1 ]),
     $coree2 := ([ Expr2 ]),
     $e1 := [ $coree1 ]Optional_Atomic_Value ,
     $e2 := [ $coree2 ]Optional_Atomic_Value
return
typeswitch ($e1) as $v1
case () return ()
case fs:numeric return
     (typeswitch ($e2) as $v2
     case () return ()
     case fs:numeric return [ $v1; fs:numeric; ValueInEqOp; $v2; fs:numeric ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; fs:numeric; ValueInEqOp; (cast ($v2) as xs:double); fs:numeric ]BinaryOp
     default return dm:error())
case xs:string return
     (typeswitch ($e2) as $v2
     case () return ()
     case xs:string return [ $v1; xs:string; ValueInEqOp; $v2; xs:string ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:string; ValueInEqOp; (cast ($v2) as xs:string); xs:string ]BinaryOp
     default return dm:error())
case xs:date return
     (typeswitch ($e2) as $v2
     case () return ()
     case xs:date return [ $v1; xs:date; ValueInEqOp; $v2; xs:date ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:date; ValueInEqOp; (cast ($v2) as xs:date); xs:date ]BinaryOp
     default return dm:error())
case xs:time return
     (typeswitch ($e2) as $v2
     case () return ()
     case xs:time return [ $v1; xs:time; ValueInEqOp; $v2; xs:time ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:time; ValueInEqOp; (cast ($v2) as xs:time); xs:time ]BinaryOp
     default return dm:error())
case xs:dateTime return
     (typeswitch ($e2) as $v2
     case () return ()
     case xs:dateTime return [ $v1; xs:dateTime; ValueInEqOp; $v2; xs:dateTime ]BinaryOp
     case fs:UnknownSimpleType return
         [ $v1; xs:dateTime; ValueInEqOp; (cast ($v2) as xs:dateTime); xs:dateTime ]BinaryOp
     default return dm:error())
case fs:UnknownSimpleType return
     (typeswitch ($e2) as $v2
     case () return ()
     case fs:numeric return
         [ (cast ($v1) as xs:double); xs:double; ValueInEqOp; $v2; fs:numeric]BinaryOp
     case xs:string return
         [(cast ($v1) as xs:string); xs:string; ValueInEqOp; $v2; xs:string]BinaryOp
     case xs:date return
         [ ValueInEqOp; xs:date; xs:date]BinaryOp(cast ($v1) as xs:date, $v2)
     case xs:time return
         [ (cast ($v1) as xs:time); xs:time; ValueInEqOp; $v2; xs:time]BinaryOp
     case xs:dateTime return
         [ (cast ($v1) as xs:dateTime); xs:dateTime; ValueInEqOp; $v2; xs:dateTime]BinaryOp
     case xs:duration return
         [ (cast ($v1) as xs:duration); xs:duration; ValueInEqOp; $v2; xs:duration]BinaryOp
     case fs:UnknownSimpleType return
         [ (cast ($v1) as xs:string); xs:string; ValueInEqOp; (cast ($v2) as xs:string); xs:string ]BinaryOp
     default return dm:error())
default return dm:error()

Core Grammar

There are no core grammar rules for value comparisons as they are normalized to function calls.

Static Type Analysis

There are no static type rules for the general comparison operators. They all have return type xs:boolean, as specified in [XQuery 1.0 and XPath 2.0 Functions and Operators].

Dynamic Evaluation

The normalization rules map all value comparison operators into core expressions, whose dynamic semantics is defined in other sections, therefore there are no dynamic semantics rules for value comparison operators. The dynamic semantics rules for function calls given in [4.2.4 Function Calls] are applied to all the function calls op:numeric-less-than, etc.

4.6.2 General Comparisons

Introduction

General comparisons are defined by adding existential semantics to value comparisons. The operands of a general comparison may be sequences of any length. The result of a general comparison is always true or false.

Notation

For convenience, GeneralOp denotes "=", "!=", "<", "", ">", or "".

The function []ValueOp is defined by the following table:

GeneralOp[GeneralOp]ValueOp
=eq
!=ne
<lt
<=le
>gt
>=ge

Normalization

A general comparison expression is normalized by mapping it into an existentially quantified, value-comparison expression, which is normalized recursively.

 
[Expr1 GeneralOp Expr2]
==
[some $v1 in Expr1 satisfies (some $v2 in Expr2 satisfies [GeneralCompOp]ValueOp($v1, $v2))]

Core Grammar

There are no core grammar rules for general comparisons as they are normalized to existentially quantified core expressions.

Static Type Analysis

There are no static type rules for the general comparison operators. The existentially quantified some expression always returns xs:boolean. Its static typing semantics is given in [4.12 Quantified Expressions].

Dynamic Evaluation

The normalization rules map all general comparison operators into core expressions, whose dynamic semantics is defined in other sections, therefore there are no dynamic semantics rules for general comparison operators.

4.6.3 Node Comparisons

Notation

For convenience, NodeOp denotes "==" and "!==".

Normalization

The normalization rule for node comparison expressions checks that both operands are optional node values, otherwise generates an error. If both operands are nodes, it applies the operator specified by the [ ]BinaryOp function.

 
[Expr1 NodeOp Expr2]
==
let $e1 := ([ Expr1 ]),
     $e2 := ([ Expr2 ]),
return
typeswitch ($e1) as $v1
case () return
     (typeswitch ($e2) as $v2
     case node? return ()
     default return dm:error())
case node return
     (typeswitch ($e2) as $v2
     case () return ()
     case node return [ $v1; node; NodeOp; $v2; node ]BinaryOp
     default return dm:error())
default return dm:error()

Core Grammar

There are no core grammar rules for node comparisons as they are normalized to function calls.

Static Type Analysis

There are no static type rules for the node comparison operators.

Dynamic Evaluation

The normalization rules map the node comparison operators into core expressions, whose dynamic semantics is defined in other sections, therefore there are no dynamic semantics rules for node comparison operators.

4.6.4 Order Comparisons

Notation

For convenience, OrderOp denotes "follow", "precedes", "<<", and ">>".

Normalization

The normalization rule for order comparison expressions checks that both operands are optional node values, otherwise generates an error. If both operands are nodes, it applies the operator specified by the [ ]BinaryOp function.

 
[Expr1 OrderOp Expr2]
==
let $e1 := ([ Expr1 ]),
     $e2 := ([ Expr2 ]),
return
typeswitch ($e1) as $v1
case () return
     (typeswitch ($e2) as $v2
     case node? return ()
     default return dm:error())
case node return
     (typeswitch ($e2) as $v2
     case () return ()
     case node return [$v1; node; OrderOp; $v2; node ]BinaryOp
     default return dm:error())
default return dm:error()

Core Grammar

There are no core grammar rules for order comparisons as they are normalized to function calls.

Static Type Analysis

There are no static type rules for the order comparison operators.

Dynamic Evaluation

The normalization rules map the order comparison operators into core expressions, whose dynamic semantics is defined in other sections, therefore there are no dynamic semantics rules for order comparison operators.

4.7 Logical Expressions

Introduction

This section defines the semantics of [2.7 Logical Expressions] in [XQuery 1.0: A Query Language for XML].

A logical expression is either an and-expression or an or-expression. The value of a logical expression is always one of the boolean values true or false.

[6]   OrExpr   ::=   Expr  "or"  Expr
[7]   AndExpr   ::=   Expr  "and"  Expr

Notation

The first step in evaluating a logical expression is to reduce each of its operands to an effective boolean value. The function []Effective_Boolean_Value takes an expression and normalizes it into an effective boolean value. The conditional expression in the "default return" clause below guarantees that in an arbitrary sequence of items, if at least one value is a node, then the boolean expression evaluates to true.

 
[Expr]Effective_Boolean_Value =
==
typeswitch (Expr) as $v
case () return xf:false()
case xs:boolean return $v
default return
   if (xf:length([$v/self::node()]Path) >= 1) then xf:true()
   else [$v]Type_Exception_Opt_Atomic(boolean)

Ed. Note: The semantics of Boolean node tests over sequences is still an open issue. See [Issue-0131:  Boolean node test and sequences].

Normalization

The normalization rules for "and" and "or" first get the effective boolean value of each argument, then apply the appropriate operand.

 
[Expr1 and Expr2]
==
let $e1 := [ Expr1 ]Effective_Boolean_Value,
     $e2 := [ Expr2 ]Effective_Boolean_Value,
return op:boolean-and($e1, $e2)
 
[Expr1 or Expr2]
==
let $e1 := [ Expr1 ]Effective_Boolean_Value,
     $e2 := [ Expr2 ]Effective_Boolean_Value,
return op:boolean-or($e1, $e2)

Core Grammar

There are no core grammar rules for logical expressions as they are normalized to function calls.

Static Type Analysis

There are no static type rules for the logical comparison operators. They both have return type xs:boolean, as specified in [XQuery 1.0 and XPath 2.0 Functions and Operators].

Dynamic Evaluation

The normalization rules map the logical comparison operators into core expressions, whose dynamic semantics is defined in other sections, therefore there are no dynamic semantics rules for logical comparison operators.

4.8 Constructors

Ed. Note: Status: This section is still draft and needs further revision. Revision of the section is pending agreement about the semantics of element and attribute constructors. See [Issue-0110:  Semantics of element and attribute constructors]

This section defines the semantics of [2.8 Constructors] in [XQuery 1.0: A Query Language for XML].

XQuery supports two forms of constructors: a "literal" form that follows the XML syntax, and element and attribute constructors that can be used to construct stand-alone elements and attributes, possibly with a computed name.

[24]   Constructor   ::=   ElementConstructor |  ComputedElementConstructor |  ComputedAttributeConstructor
[57]   ElementConstructor   ::=   "<" QName AttributeList ("/>" |  (">" ElementContent* "</" QName ">"))
[63]   ElementContent   ::=   Char
|  "{{"
|  "}}"
|  ElementConstructor
|  EnclosedExpr
|  CdataSection
|  CharRef
|  PredefinedEntityRef
|  XmlComment
|  XmlProcessingInstruction
|  ComputedElementConstructor
|  ComputedAttributeConstructor
[64]   AttributeList   ::=   (QName "=" (AttributeValue |  EnclosedExpr))*
[65]   AttributeValue   ::=   (["] AttributeValueContent* ["])
|  (['] AttributeValueContent* ['])
[66]   AttributeValueContent   ::=   Char
|  CharRef
|  "{{"
|  "}}"
|  EnclosedExpr
|  PredefinedEntityRef
[67]   EnclosedExpr   ::=   "{" ExprSequence "}"

4.8.1 Element Constructors

Introduction

We define the semantics of the literal form of element constructors by normalizing it to a computed element constructor.

Normalization

Ed. Note: Status:The specification of element constructor normalization is pending adoption of a proposal on the treatment of whitespace in XQuery expressions and of a proposal on the semantics of element constructors. See [Issue-0110:  Semantics of element and attribute constructors].

4.8.2 Computed Element and Attribute Constructors

[58]   ComputedElementConstructor   ::=   "element" (QName |  EnclosedExpr) "{" ExprSequence? "}"
[59]   ComputedAttributeConstructor   ::=   "attribute" (QName |  EnclosedExpr) "{" ExprSequence? "}"

Normalization

Computed element and attribute expressions are normalized by mapping their sub-expressions:

 
[element (QName | { Expr }) { ExprSequence }]
==
element ([QName] | { [Expr] }) { [ExprSequence ]}
 
[attribute (QName | { Expr }) { ExprSequence }]
==
attribute ([QName] | { [Expr] }) { [ExprSequence ]}

Core Grammar

The core grammar rules for computed constructors are:

[58]   ComputedElementConstructor   ::=   "element" (QName |  EnclosedExpr) "{" ExprSequence? "}"
[59]   ComputedAttributeConstructor   ::=   "attribute" (QName |  EnclosedExpr) "{" ExprSequence? "}"

Static Type Analysis

The normalization rules leave us with only a the operator form of element (resp. attribute) constructor to handle. The element (resp. attribute) operator still has two form: one in which a QName is supplied as the element name, and one in which a computed expression is supplied. In the latter case, we are unable to provide anything other than a wildcard type, since the element name cannot be known until runtime.

Note