W3C

XQuery 1.0 and XPath 2.0 Formal Semantics

W3C Working Draft 16 August 2002

This version:
http://www.w3.org/TR/2002/WD-query-semantics-20020816/
Latest version:
http://www.w3.org/TR/query-semantics/
Previous versions:
http://www.w3.org/TR/2002/WD-query-semantics-20020326/
http://www.w3.org/TR/2001/WD-query-semantics-20010607/
http://www.w3.org/TR/2001/WD-query-algebra-20010215/
http://www.w3.org/TR/2000/WD-query-algebra-20001204/
Editors:
Denise Draper (XML Query WG), Nimble Technology <ddraper@nimble.com>
Peter Fankhauser (XML Query WG), Infonyte GmbH <fankhaus@infonyte.com>
Mary Fernández (XML Query WG), AT&T Labs - Research <mff@research.att.com>
Ashok Malhotra (XML Query and XSL WGs), Microsoft <ashokma@microsoft.com>
Kristoffer Rose (XSL WG), IBM Research <krisrose@us.ibm.com>
Michael Rys (XML Query WG), Microsoft <mrys@microsoft.com>
Jérôme Siméon (XML Query WG), Lucent Technologies <simeon@research.bell-labs.com>
Philip Wadler (XML Query WG), Avaya <wadler@avaya.com>

Abstract

This document defines the formal semantics of XQuery 1.0 [XQuery 1.0: A Query Language for XML] and XPath 2.0 [XML Path Language (XPath) 2.0].

Status of this Document

This is a public W3C Working Draft for review by W3C Members and other interested parties. This section describes the status of this document at the time of its publication. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress." A list of current public W3C technical reports can be found at http://www.w3.org/TR/.

This document is a work in progress. It contains many open issues, and should not be considered to be fully stable. Vendors who wish to create preview implementations based on this document do so at their own risk. While this document reflects the general consensus of the working groups, there are still controversial areas that may be subject to change.

Among the most important changes from the previous version of this document are:

A complete list of issues is maintained in [C Issues]. The following are identified as high priority issues.

Public comments on this document and its open issues are welcome. Comments should be sent to the W3C XPath/XQuery mailing list, public-qt-comments@w3.org (archived at http://lists.w3.org/Archives/Public/public-qt-comments/).

XQuery 1.0, XPath 2.0, and their formal semantics has been defined jointly by the XML Query Working Group (part of the XML Activity) and the XSL Working Group (part of the Style Activity).

Patent disclosures relevant to this specification may be found on the XML Query Working Group's patent disclosure page at http://www.w3.org/2002/08/xmlquery-IPR-statements and the XSL Working Group's patent disclosure page at http://www.w3.org/Style/XSL/Disclosures.

Table of Contents

1 Introduction
2 Preliminaries
    2.1 Processing model
    2.2 Namespaces
    2.3 Data Model
        2.3.1 Data model overview
        2.3.2 Node identity
        2.3.3 Document order and sequence order
        2.3.4 Type annotations
    2.4 Schemas and types
        2.4.1 The elements of a (statically) typed language
        2.4.2 XML Schema and the XQuery type system
        2.4.3 Structural and named typing
        2.4.4 Subtyping
            2.4.4.1 Type substitutability
            2.4.4.2 Subtyping and XML Schema derivation
    2.5 Functions
        2.5.1 Functions and operators
        2.5.2 Functions and static typing
        2.5.3 The error function
        2.5.4 Data Model Accessors and XPath Axes
        2.5.5 Other Formal Semantics functions
    2.6 Notations
        2.6.1 Grammar productions
        2.6.2 Judgments
        2.6.3 Inference rules
        2.6.4 Environments
        2.6.5 Putting it together
    2.7 The Formal Semantics
        2.7.1 Normalization
        2.7.2 Static type inference
        2.7.3 Dynamic Evaluation
3 The XQuery Type System
    3.1 Values and Types
        3.1.1 Values
        3.1.2 Types
        3.1.3 Top level definitions
        3.1.4 Built-in type declarations
        3.1.5 Syntactic constraints on types
        3.1.6 Example
    3.2 Auxiliary judgments
        3.2.1 Derivation and substitution
            3.2.1.1 Derives
            3.2.1.2 Substitutes
        3.2.2 Extension
        3.2.3 Mixed content
        3.2.4 Adjusts
        3.2.5 Resolution
        3.2.6 Derives
        3.2.7 Lookup
        3.2.8 Interleaving
        3.2.9 Filtering
    3.3 Matching
        3.3.1 Nil-matches
        3.3.2 Matches
        3.3.3 Optimized matching
    3.4 Erase and Annotate
        3.4.1 Erasure
            3.4.1.1 Simply erases
            3.4.1.2 Erases
        3.4.2 Annotate
            3.4.2.1 Simply annotate
            3.4.2.2 Nil-annotate
            3.4.2.3 Annotate
    3.5 Subtyping
        3.5.1 Subtype
        3.5.2 Type equivalence
    3.6 Auxiliary typing judgments for for, unordered, and sortby expressions
        3.6.1 Prime types
        3.6.2 Computing Prime Types and Occurrence Indicators
    3.7 Auxiliary typing judgments for typeswitch expressions
        3.7.1 Computing common types and occurrence in typeswitch
        3.7.2 Computing Common Prime Types and Occurrence Indicators
    3.8 Major type issues
4 Basics
    4.1 Expression Context
        4.1.1 Static Context
        4.1.2 Evaluation Context
    4.2 Input Functions
    4.3 Expression Processing
    4.4 Types
        4.4.1 Type Checking
        4.4.2 SequenceType
        4.4.3 Type Conversions
            4.4.3.1 Atomization
            4.4.3.2 Effective Boolean Value
            4.4.3.3 Fallback Conversion
    4.5 Errors and Conformance
5 Expressions
    5.1 Primary Expressions
        5.1.1 Literals
        5.1.2 Variables
        5.1.3 Parenthesized Expressions
        5.1.4 Function Calls
        5.1.5 Comments
    5.2 Path Expressions
        5.2.1 Steps
            5.2.1.1 Axes
            5.2.1.2 Node Tests
        5.2.2 Predicates
        5.2.3 Unabbreviated Syntax
        5.2.4 Abbreviated Syntax
    5.3 Sequence Expressions
        5.3.1 Constructing Sequences
        5.3.2 Combining Sequences
    5.4 Arithmetic Expressions
    5.5 Comparison Expressions
        5.5.1 Value Comparisons
        5.5.2 General Comparisons
        5.5.3 Node Comparisons
        5.5.4 Order Comparisons
    5.6 Logical Expressions
    5.7 Constructors
        5.7.1 Element Constructors
        5.7.2 Computed Element and Attribute Constructors
            5.7.2.1 Computed Element Nodes
            5.7.2.2 Constructed Attribute Nodes
        5.7.3 Whitespace in Constructors
        5.7.4 Other Constructors and Comments
    5.8 [For/FLWR] Expressions
        5.8.1 FLWR expressions
        5.8.2 For expression
        5.8.3 Let Expression
    5.9 Order-Related Expressions
        5.9.1 Sorting Expressions
        5.9.2 Unordered Expressions
    5.10 Conditional Expressions
    5.11 Quantified Expressions
    5.12 Expressions on SequenceTypes
        5.12.1 Instance Of
        5.12.2 Typeswitch
        5.12.3 Cast and Treat
            5.12.3.1 Cast as
            5.12.3.2 Treat as
    5.13 Validate Expressions
6 The Query Prolog
    6.1 Namespace Declarations
    6.2 Schema Imports
    6.3 Xmlspace Declaration
    6.4 Default Collation
    6.5 Function Definitions
7 Additional Semantics of Functions
    7.1 Formal Semantics Functions
        7.1.1 The fs:characters-to-string function
        7.1.2 The fs:distinct-doc-order function
        7.1.3 The fs:item-sequence-to-node-sequence function
        7.1.4 The fs:item-sequence-to-string function
    7.2 Functions with specific typing rules
        7.2.1 The xf:error function
        7.2.2 The xf:distinct-nodes, and xf:distinct-values functions
        7.2.3 The op:union, op:intersect and op:except operators
        7.2.4 The op:to operator
        7.2.5 The xf:data function
8 Importing Schemas
    8.1 Introduction
        8.1.1 Features
        8.1.2 Organization
        8.1.3 Main mapping rules
        8.1.4 Special attributes
            8.1.4.1 use
            8.1.4.2 minOccurs and maxOccurs
            8.1.4.3 mixed
            8.1.4.4 nillable
            8.1.4.5 substitutionGroup
    8.2 Attribute Declarations
        8.2.1 Global attributes declarations
        8.2.2 Local attribute declarations
    8.3 Element Declarations
        8.3.1 Global element declarations
        8.3.2 Local element declarations
    8.4 Complex Type Definitions
        8.4.1 Global complex type
        8.4.2 Local complex type
        8.4.3 Complex type with simple content
        8.4.4 Complex type with complex content
    8.5 Attribute Uses
    8.6 Attribute Group Definitions
        8.6.1 Attribute group definitions
        8.6.2 Attribute group reference
    8.7 Model Group Definitions
    8.8 Model Groups
        8.8.1 All groups
        8.8.2 Choice groups
        8.8.3 Sequence groups
    8.9 Particles
        8.9.1 Element reference
        8.9.2 Group reference
    8.10 Wildcards
        8.10.1 Attribute wildcards
        8.10.2 Element wildcards
    8.11 Identity-constraint Definitions
    8.12 Notation Declarations
    8.13 Annotation
    8.14 Simple Type Definitions
        8.14.1 Global simple type definition
        8.14.2 Local simple type definition
        8.14.3 Simple type content
    8.15 Schemas as a whole
        8.15.1 Schema
        8.15.2 Include
        8.15.3 Redefine
        8.15.4 Import

Appendices

A Normalized core grammar
    A.1 Core lexical structure
        A.1.1 Syntactic Constructs
    A.2 Core BNF
B References
    B.1 Normative References
    B.2 Non-normative References
    B.3 Background References
C Issues
    C.1 Introduction
    C.2 Issues list
    C.3 Alphabetic list of issues
        C.3.1 Open Issues
        C.3.2 Resolved (or redundant) Issues
    C.4 Delegated Issues
        C.4.1 XPath 2.0
        C.4.2 XQuery
        C.4.3 Operators


1 Introduction

This document defines the formal semantics of XQuery 1.0 and XPath 2.0. The present document is part of a set of documents that together define the XQuery 1.0 and XPath 2.0 languages:

Ed. Note: [Kristoffer/XSL] The role of the formal semantics document could be clarified: does it supplement or subsume the language specification in case of conflict?

The scope and goals for the [XPath/XQuery] language are discussed in the charter of the W3C [XSL/XML Query] Working Group and in the [XPath/XQuery] requirements [XML Query 1.0 Requirements].

[XPath/XQuery] is a powerful language, capable of selecting and extracting complex patterns from XML documents , reformulating them into results in arbitrary ways, and computing expressions over the result. This document defines the semantics of [XPath/XQuery] by giving a precise formal meaning to each of the constructions of the [XPath/XQuery] specification in terms of the [XPath/XQuery] data model. This document assumes that the reader is already familiar with the [XPath/XQuery] language.

Two important design aspects of [XPath/XQuery] are that it is functional and that it is typed. These two aspects play an important role in the [XPath/XQuery] Formal Semantics.

[XPath/XQuery] is a functional language. It is built from expressions, rather than statements. Every construct in the language (except for the query prolog) is an expression and expressions can be composed arbitrarily. The result of one expression can be used as the input to any other expression, as long as the type of the result of the former expression is compatible with the input type of the latter expression with which it is composed. Another aspect of the functional approach is that variables are always passed by value and their value cannot be modified through side-effects.

[XPath/XQuery] is a typed language. Types can be imported from one or more XML Schemas describing the documents that will be processed, and the [XPath/XQuery] language can then perform operations based on these types (e.g., using a treat as operation). In addition, [XPath/XQuery] also supports a level of static type analysis. This means that the system can perform infer the type of a [expression/query], based on the type of its inputs. Static typing allows early error detection, and can be used as the basis for certain forms of optimization. The type system of [XPath/XQuery] is based on [XML Schema]. The [XPath/XQuery] type system models most of the features of [XML Schema Part 1], including global and local element and attribute declarations, complex and simple type definitions, named and anonymous types, derivation by restriction, extension, list and union, substitution groups, and wildcard types. It does not model uniqueness constraints and facet constraints on simple types. The main issues with respect to the [XPath/XQuery] type system are discussed in [3.8 Major type issues].

The [XPath/XQuery] formal semantics builds on long-standing traditions in the database and in the programming languages communities. In particular, the [XPath/XQuery] formal semantics has been inspired by works on SQL [SQL], OQL [ODMG], [BKD90], [CM93], and nested relational algebra (NRA) [BNTW95], [Col90], [LW97], [LMW96]. Other references include Quilt [Quilt], UnQL [BFS00], XDuce [HP2000], XML-QL [XMLQL99], XPath 1.0 [XML Path Language (XPath) : Version 1.0], XQL [XQL99], XSLT [XSLT 99], and YATL [YAT99]. Additional related work can be found in the bibliography [B References].

This document is organized as follows. [2 Preliminaries] introduces notations used to define the [XPath/XQuery] formal semantics. [3 The XQuery Type System] describes the [XPath/XQuery] type system, operations on the [XPath/XQuery] type system, and explains the relationship between the [XPath/XQuery] type system and XML Schema. [4 Basics] describes semantics for basic [XPath/XQuery] concepts, and [5 Expressions], describes the dynamic and static semantics of [XPath/XQuery] expressions. [6 The Query Prolog] describes the semantics of the [XPath/XQuery] prolog. [7 Additional Semantics of Functions] describes any additional semantics required for [XPath/XQuery] functions. Finally, [8 Importing Schemas], specifies how XML Schemas are imported into the [XPath/XQuery] type system.

2 Preliminaries

This section provides the background necessary to understand the [XPath/XQuery] Formal Semantics and introduces the notations that are used.

2.1 Processing model

First, a processing model for [expression/query] evaluation is defined. This processing model is not intended to describe an actual implementation, although a naive implementation might be based upon it. It does not prescribe an implementation technique, but any implementation should produce the same results as obtained by following this processing model and applying the rest of the Formal Semantics specification.

The processing model consists of five phases; each phase consumes the result of the previous phase and generates output for the next phase. For each processing phase, we furthermore point to the relevant notations introduced later in the document.

  1. Parsing. The grammar for the [XPath/XQuery] syntax is defined in [XQuery 1.0: A Query Language for XML]. Parsing may generate syntax errors. If no error occurs, an internal representation of the parsed syntax tree corresponding to the query is created: this ensures that we can unambiguously specify the following phases by a case per syntax production.

  2. Context Processing. The semantics of [expression/query] depends on the input context. The input context needs to be generated before the [expression/query] can be processed. In XQuery, the input context is described in the Query Prolog (See [6 The Query Prolog]). In XPath, the input context is generated by the processing environment. The input context is split in a static and a dynamic part (denoted statEnv and dynEnv, respectively).

  3. Normalization. To simplify the semantics specification, some normalization must first be performed over the [expression/query]. The [XPath/XQuery] language provides many powerful features that make [expression/query]s simpler to write and use, but are also redundant. For instance, a complex for expression might be rewritten as a composition of several simple for expressions. The language composed of these simpler [expression/query] is called the [XPath/XQuery] Core language and is a subset of the complete [XPath/XQuery] language. The grammar of the [XPath/XQuery] Core language is given in [A Normalized core grammar].

    During the normalization phase, each [XPath/XQuery] [expression/query] is mapped into its equivalent [expression/query] in the core. (Note that this has nothing to do with Unicode Normalization, which works on character strings.) Normalization works by bottom-up application of normalization rules over expressions, starting with normalization of literals.

    Specifically the normalization phase is defined in terms of the static part of the context (statEnv) and a [expression/query] (Expr) abstract syntax tree. Formal notations for the normalization phase are introduced in [2.7.1 Normalization].

    After normalization, the full semantics can be obtained just by giving a semantics to the normalized Core [expression/query]. This is done during the last two phases.

  4. Static type analysis. Static type analysis checks whether each [expression/query] is type safe, and if so, determines its static type. Static type analysis is defined only for Core [expression/query]. Static type analysis works by bottom-up application of type inference rules over expressions, taking the type of literals and the known types of input documents into account.

    If the [expression/query] is not type-safe, static type analysis can result in a static error. For instance, a comparison between an integer value and a string value might be detected as an error during the static type analysis. If static type analysis succeeds, it results in a syntax tree where each sub-expression is "decorated" with its static type, and in an output type for the result of the [expression/query].

    More precisely the static analysis phase is defined in terms of the static context (statEnv) and a core [expression/query] (CoreExpr). Formal notations for the static analysis phase are introduced in [2.7.2 Static type inference].

    Static typing does not imply that the content of XML documents must be rigidly fixed or even known in advance. The [XPath/XQuery] type system accommodates "flexible" types, such as elements that can contain any content. Schema-less documents are handled in [XPath/XQuery] by associating a standard type with the document, such that it may include any legal XML content.

  5. Dynamic Evaluation. This is the phase in which the result of the [expression/query] is computed. The semantics of evaluation is defined only for Core [expression/query] terms. Evaluation works by bottom-up application of evaluation rules over expressions, starting with evaluation of literals. This guarantees that every expression can be unambiguously reduced to a value, which is the final result of the [expression/query].

    The dynamic evaluation phase is defined in terms of the static context (statEnv) and evaluation context (), and a core [expression/query] (CoreExpr). Formal notations for the dynamic evaluation phase are introduced in [2.7.3 Dynamic Evaluation].

Ed. Note: Issue:[Kristoffer/XSL] The distinction between errors manifest by not being able to prove a judgment and special error values is not clear. Also the processing model does not allow for skipping static typing. See [Issue-0094:  Static type errors and warnings], [Issue-0171:  Raising errors], and [Issue-0169:  Conformance Levels].

The first four phases above are "analysis-time" (sometimes also called "compile-time") steps, which is to say that they can be performed on a [expression/query] before examining any input document. Indeed, analysis-time processing can be performed before such document even exists. Analysis-time processing can detect many errors early-on, e.g., syntax errors or type errors. If no error occurs, the result of analysis-time processing could be some compiled form of [expression/query], suitable for execution by a compiled-[expression/query] processor. The last phase is an "execution-time" (sometimes also called "run-time") step, which is to say that the query is evaluated on actual input document(s).

Static analysis catches only certain classes of errors. For instance, it can detect a comparison operation applied between incompatible types (e.g., xs:int and xs:date). Some other classes of errors cannot be detected by the static analysis and are only detected at execution time. For instance, whether an arithmetic expression on 32 bits integers (xs:int) yields an out-of-bound value can only be detected at run-time by looking at the data.

While implementations may be free to employ different processing models, the [XPath/XQuery] static semantics relies on the existence of a static type analysis phase that precedes any access to the input data. Statically typed implementations are required to find and report static type errors, as specified in this document. It is still an open issue (See [Issue-0098:  Implementation of and conformance levels for static type checking]) whether to make the static type analysis phase optional to allow implementations to return a result for a type-invalid [expression/query] in special cases where the [expression/query] happens to perform correctly on a particular instance of input data. (Note that this is not the same as merely permitting that the static type error is emitted at evaluation time as is done below.)

Notice that the separation of logical processing into phases is not meant to imply that implementations must separate analysis-time from evaluation-time processing: [expression/query] processors may choose to perform all phases simultaneously at evaluation-time and may even mix the phases in their internal implementations. The processing model defines only what the final result.

The above processing phases are all internal to the [expression/query] processor. They do not deal with how the [expression/query] processor interacts with the outside world, notably how it accesses actual documents and types. A typical [expression/query] engine would support at least three other important processing phases:

  1. XML Schema import phase. The [XPath/XQuery] type system is based on XML Schema. In order to perform static type analysis, the [XPath/XQuery] processor needs to build type descriptions that correspond to the schema(s) of the input documents. This phase is achieved by mapping all schemas required by the [expression/query] into the [XPath/XQuery] type system. The XML Schema import phase is described in [8 Importing Schemas].

  2. XML loading phase. Expressions are evaluated on values in the [XQuery 1.0 and XPath 2.0 Data Model]. XML documents must be loaded into the [XQuery 1.0 and XPath 2.0 Data Model] before the evaluation phase. This is described in the [XQuery 1.0 and XPath 2.0 Data Model] and is not discussed further here.

  3. Serialization phase. Once the [expression/query] is evaluated, processors might want to serialize the result of the [expression/query] as actual XML documents. Serialization of data model instances is still an open issue (See [Issue-0116:  Serialization]) and is not discussed further here.

The parsing phase is not specified formally; the formal semantics does not define a formal model for the syntax trees, but uses the [XPath/XQuery] concrete syntax directly. More details about parsing for XQuery 1.0 can be found in the [XQuery 1.0: A Query Language for XML] document and more details about parsing for XPath 2.0 can be found in the [XML Path Language (XPath) 2.0] document. No further discussion of parsing is included here.

For the other three phases (normalization, static type analysis and evaluation), instead or giving an algorithm in pseudo-code, the semantics is described formally by means of inference rules. Inference rules provide a notation to describe how the semantics of an expression derives from the semantics of its sub-expressions. Hence, they provide a concise and precise way for specifying bottom-up algorithms, as required by the normalization, static type analysis, and evaluation phases.

2.2 Namespaces

The Formal Semantics uses the following namespace prefixes.

All these prefixes are assumed to be bound to the appropriate URIs.

Ed. Note: [Kristoffer/XSL] The fs: prefix URI should be defined here.

2.3 Data Model

2.3.1 Data model overview

This section gives an overview of the main data model features that are relevant to the Formal Semantics. The reader is referred to the [XQuery 1.0 and XPath 2.0 Data Model] document for more details.

  • An atomic value is a value in the value space of one of the [XML Schema Part 2] atomic types (that is, a primitive simple type that is not derived by list or union).

  • A node is one of the following kinds: document, element, attribute, namespace, comment, processing-instruction and text. The [XPath/XQuery] Formal Semantics does not currently discuss namespace, comment, and processing-instruction nodes (See [Issue-0105:  Types for nodes in the data model.]). Components of nodes can be typed values, string values, names, type annotations, or other nodes.

  • XML documents and their contents are represented in the data model as trees composed of nodes and atomic values. [XQuery 1.0 and XPath 2.0 Data Model] defines a mapping from the PSVI (Post Schema Validation Information Set) to such trees.

  • [XQuery 1.0 and XPath 2.0 Data Model] defines constructors, which are functions that construct nodes and atomic values, and accessors, which are functions that access a value's properties or components.

    For instance, the children function returns all the child nodes of an element node. The typed value of a node is a sequence of zero or more atomic values. The string value of a node is an instance of the type xs:string. The name of a node is an instance of the type xs:QName.

  • An item is either an atomic value or a node.

  • Every value in the data model is a sequence. A sequence is an ordered collection of zero or more items. Sequences are represented by items separated by commas, enclosed in parentheses.

    A sequence containing exactly one item is called a singleton sequence. An item is identical to a singleton sequence containing that item. Sequences are never nested--for example, combining the values 1, (2, 3), and () into a single sequence results in the sequence (1, 2, 3). A sequence containing zero items is called an empty sequence.

  • The data model annotates element nodes, attribute nodes, and atomic values with type names. This type annotation is the type name given to the element, attribute, or value through the validation process.

The remainder of the section is devoted to several related features that merit special attention: node identity, order, and type annotations.

2.3.2 Node identity

In the [XQuery 1.0 and XPath 2.0 Data Model], nodes have identity. Node identity follows from the unique location of the node within exactly one XML document (or document fragment, in the case of XML being constructed within the [expression/query] itself). It is possible for two expressions to return not only the same value, but also the identical node. On the other hand, two expressions could return results with the same document structure, but different identities. [XPath/XQuery] supports comparison of nodes using either "node equality" (equality by identity) or "value equality" (equality by value), for instance by using the operators is or =.

Atomic values do not have an associated identity and are therefore always compared by value.

The difference between node equality and value equality is illustrated by the following example:

   let $a1 := <book><author>Suciu</author></book>,
       $a2 := <author>Suciu</author>,
       $a3 := $a1/author
   return
       ($a1/author is $a3), {-- true, they refer to the same node --}
       ($a1/author = $a2),  {-- true, they have the same value --}
       ($a1/author is $a2)  {-- false, they are not the same node --}

All [XPath/XQuery]'s operators preserve node identity with the exception of explicit copy operations, node (element, attribute, etc.) constructors and validate expressions. An element constructor always creates a deep copy of its attributes and child nodes. Other operators, such as sequence constructors or path expressions, do not result in copies of the corresponding nodes. So, for example, ($a3, $a2) creates a new sequence of nodes, the first of which has the same identity as $a1/author.

2.3.3 Document order and sequence order

There are two kinds of order in [XPath/XQuery]: sequence order and document order.

Sequence order. Sequences of items in the [XPath/XQuery] data model are ordered. Sequence order refers to the order of items within a given sequence.

Document order. Document order refers to the total order among all nodes in a given document. It is defined as the order of appearance of the nodes when performing a pre-order, depth-first traversal of a tree. For elements, this corresponds to the order of appearance of their opening tags in the corresponding XML serialization. Document order is equivalent to the definition used in [XML Path Language (XPath) : Version 1.0].

Depending on the context, it may be possible to talk about two different notions of order for the same (set of) nodes. For example:

   let $e := <list>
               <item id="1"/>
               <item id="2"/>
             </list>,
       $s := ($e/item[@id='2'], $e/item[@id='1'])
   ...

The item "1" is before item "2" in document order, but the same two nodes are in the opposite order in the sequence $s.

The order between nodes from distinct documents is implementation defined but must be stable. That is, it must not change for the duration of query evaluation.

2.3.4 Type annotations

A type annotation indicates the type name for each element and attribute node, which results from the validation process or is assigned by default (depending on the way the data was built). In the case of anonymous types, the data model annotates element and attributes with xs:anyType and xs:anySimpleType respectively.

For instance, consider the document fragment

  <fact>The cat weighs <weight>12</weight> pounds.</fact>

and the associated schema

  <xs:element name="fact" type="FactType"/>

  <xs:complexType name="FactType" mixed="true">
    <xs:sequence>
      <xs:element name="weight" type="xs:integer">
    </xs:sequence>
  </xs:complexType>

Before validation, the two elements fact and weight in the document are annotated by xs:anyType.

After validation, the element fact is annotated with the type name FactType and the element weight is annotated with the type name xs:integer.

Type annotations are taken into account during [expression/query] evaluation and have an impact on the semantics of [XPath/XQuery].

For instance, consider the following function declaration

  define function convert_weight(element of type FactType $x)
    { ... }

This function only accepts as input an element annotated with the type FactType, or with one of the types derived from it. Therefore, the previous document fragment is only accepted by the function after it has been validated against the given schema.

[3 The XQuery Type System] gives a formal description of data model values with type annotations and explain how type annotations are taken into account when matching a value against a given type.

2.4 Schemas and types

This section presents an introduction to (statically) typed languages in general, and a conceptual overview of the [XPath/XQuery] type system in particular. The [XPath/XQuery] type system is discussed formally in [3 The XQuery Type System].

2.4.1 The elements of a (statically) typed language

A type system relates values, types, and expressions in the language.

The main constituent parts of a typed system are:

  • A universe of possible types. Type are composed of other types, starting with primitive types, and can be built using type constructors.

    In [XPath/XQuery], the primitive types are the primitive atomic types of [XML Schema Part 2]. Type constructors are also based on schema, and can be used to build types for attributes and elements, sequence and choice groups, etc. The [XPath/XQuery] type system is defined in [3.1 Values and Types].

  • A notation for specifying types.

    In [XPath/XQuery], types are imported from XML Schema, and can be used in expressions using the SequenceType syntax. In order to model and reason about types in [XPath/XQuery], this document introduces its own notation for types. This notation is defined in [3.1 Values and Types].

  • A relationship between values and types. Formally, the semantics of a type is defined to be the (usually infinite) set of instance values which match that type.

    For example, the type xs:integer consists of the set of all integer values, and the type element address { xs:string } consists of all elements named "address" whose contents are a single string value, etc.

    Matching a [XPath/XQuery] value against a type is defined in [3.3 Matching].

  • A notion of subtyping. Subtyping is a relationship between types that plays an important role in a statically typed language. Notably, it is used to determine whether function calls are legal or not.

    Subtyping in [XPath/XQuery] is defined in [3.5.1 Subtype].

For a statically typed language one also needs to define:

  • Rules that relate expressions to types. Static type analysis infers a type for each expression in the language. The most important property for a static type system is that the type inferred statically must be such that for all valid inputs, evaluating the expression returns a result that matches the inferred type. If this property holds, then the static typing provides guarantees that certain kinds of errors can not occur at run-time.

    The static typing rules for [XPath/XQuery] are defined for all [XPath/XQuery] expressions in [5 Expressions], [6 The Query Prolog] and [7 Additional Semantics of Functions].

Finally, types can also be used during language evaluation if the language provides:

  • Expressions on types. A language can support operations that take types into account, such as casting, type matching, etc.

    [XPath/XQuery] provide several such expressions: "typeswitch", "cast as", "treat as", and "validate as". [XPath/XQuery] also support a "validate" expression, which performs XML Schema validation. Validation is related to matching in that if validation succeeds, it returns a value which matches the type used for validation. Expressions on types are defined in [5.12 Expressions on SequenceTypes]. The validate expression is defined in [3.4 Erase and Annotate].

The remainder of the section is devoted to several related features that merit special attention: the relationship between the [XPath/XQuery] type system and XML Schema, structural and named typing, and subtyping.

2.4.2 XML Schema and the XQuery type system

The [XPath/XQuery] type system is based on [XML Schema]. An introduction to XML Schema can be found in [XML Schema Part 0].

The [XPath/XQuery] type system captures a large subset of XML Schema. This section gives a summary of which features are captured and in some cases deviations from XML Schema.

The following components and features of XML Schema are captured exactly in the [XPath/XQuery] type system:

  • Element and attribute declarations;

  • Simple and complex type definitions;

  • Local attributes and elements;

  • Named and anonymous types;

  • the type name hierarchy;

  • Sequence, choice and all groups;

  • Wildcards;

  • Derivation by extension, list and union;

  • Substitution groups.

The following components or features of XML Schema are not represented in the [XPath/XQuery] type system.

  • Simple type facets;

  • Identity-constraint definitions;

  • Notations and annotations.

The following components or features of XML Schema are represented partially or differently in the [XPath/XQuery] type system.

  • The [XPath/XQuery] type system only captures an approximation of minOccurs and maxOccurs. Like DTDs, the [XPath/XQuery] type system only supports *, +, and ? which correspond to [minOccurs="0" maxOccurs="unbounded"], [minOccurs="1" maxOccurs="unbounded"], and [minOccurs="0" maxOccurs="1"] respectively.

  • The [XPath/XQuery] type system generalizes the notion of derivation by restriction for the needs of static type analysis. Derivation by restriction in XML Schema relies on a syntactic relationship between content models. The [XPath/XQuery] type system uses a generalization of this relationship based on the notion of inclusion between types -- also called structural subtyping.

At compile time, the [XPath/XQuery] environment imports XML Schema declarations, and loads them as declarations in the [XPath/XQuery] type system. This loading process is specified by a mapping from XML Schema to the [XPath/XQuery] type system, given in [8 Importing Schemas].

2.4.3 Structural and named typing

XML Schema is a powerful schema language for XML. XML Schema can describe both structural constraints as well as type annotations that apply to XML documents. These aspects are referred to as structural typing and named typing respectively.

  • Structural typing. An important notion underlying XML Schema is the concept of regular expressions, and their tree counterparts, regular tree grammars. An introduction to regular (tree) languages can be found in [Languages] or in [TATA].

    Tree grammars can be used to capture the structural constraints imposed by an XML Schema. Automata are the traditional way of implementing tree grammars and can be used to implement part of the XML Schema validation process.

    For instance, consider the following element declaration:

      <xs:element name="root">
        <xs:complexType>
          <xs:sequence maxOccurs="unbounded">
    	<xs:choice minOccurs="0" maxOccurs="unbounded">
    	  <xs:element ref="a"/>
    	  <xs:element ref="b" minOccurs="0"/>
    	</xs:choice>
    	<xs:element ref="c"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    

    The structural constraints imposed by the definition of the content of element root can be described by the following regular expression (written here in DTD notation): ((a|b?)*,c)+. Such a constraint can be checked using a finite state automaton applied on the children of the element root in the instance document.

  • Named typing. In addition, XML Schema provides the ability to associate names with type definitions and to declare relationships between type names.

    For instance, consider the following two element declarations:

      <xs:element name="person">
        <xs:complexType name="Person">
          <xs:sequence>
            <xs:element ref="ssn">
    	<xs:choice>
              <xs:sequence>
      	    <xs:element ref="major" minOccurs="0"/>
      	    <xs:element ref="minors" maxOccurs="unbounded"/>
              </xs:sequence>
              <xs:element ref="job_desc"/>
    	</xs:choice>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    
      <xs:element name="student">
        <xs:complexType name="Student">
          <xs:complexContent>
            <xs:restriction base="Person">
              <xs:sequence>
                <xs:element ref="ssn">
                <xs:element ref="major" minOccurs="0"/>
                <xs:element ref="minors" maxOccurs="unbounded"/>
              </xs:sequence>
            </xs:restriction>
          </xs:complexType>
        </xs:complexContent>
      </xs:element>
    
    

    The type Studentof the element student is derived by restriction from the type Person. Such a declaration not only defines a new type, but also creates a relationship between the type names Person and Student which needs to be taken into account when doing validation and matching against a given type. For instance, the following function

      define function salary(element of type Person $x)
        { .... }
    

    accepts as parameters elements of type Person as well as elements with types derived by restriction from Person, but does not accept elements with the same content model as the element person but a type which does not derive from the type Person.

2.4.4 Subtyping

A type is a subtype of another type if every value matching the first type also matches the second type. Since matching takes both type structure and type names into account, so is subtyping.

Here is a simple example of subtyping on two regular expressions:

  ( xs:double )*
  ( xs:integer | xs:double )*

The first type is a subtype of the second, because any sequence that contains only doubles is also an example of a sequence that contains either integers or doubles. This example only illustrates the structural subtyping.

Type1 <: Type2 indicates that type Type1 is a subtype of Type2. Note that the notation <: is used when defining the [XPath/XQuery] semantics, but is not part of the [XPath/XQuery] syntax.

2.4.4.1 Type substitutability

Subtyping plays a crucial role in the semantics of function application since all functions have an associated signature that declares the types of input parameters and the type of the result of the function.

Intuitively, one can call a function not only by passing an argument of the declared type, but also any argument whose type is a subtype of the declared type. This property is called type substitutability.

Ed. Note: [Kristoffer/XSL] Type substitutability should really be true for any subexpression, not just function arguments (it just happens that in functional languages, where type substitutability is usually defined, the situations are equivalent). Specifically type substitutability does not apply to any subexpression of [XPath/XQuery]: typeswitch violates it. In particular this means that while one may replace a function argument with one whose type is a subtype this does not guarantee that the resulting value is the same even if the value is the same (but cast to a subtype).

2.4.4.2 Subtyping and XML Schema derivation

Subtyping in [XPath/XQuery] is related to, but not identical to, "derivation by restriction" in XML Schema. A type derived by restriction is always a subtype of its base type, but the opposite is not true. For instance, consider the following types:

  ( xs:double )+
  xs:double, ( xs:integer | xs:double )*

The first type is a subtype of the second, because any sequence that contains one or more doubles is also an example of a sequence containing a double followed by a sequence of zero or more integers or doubles. But the first type cannot be derived from the second type using derivation by restriction as defined in XML Schema.

2.5 Functions

2.5.1 Functions and operators

The [XQuery 1.0 and XPath 2.0 Functions and Operators] document provides a number of useful basic functions over the components of the [XPath/XQuery] data model (atomic values, nodes, sequences, etc). A number of these functions are used in the course of describing the [XPath/XQuery] semantics. Here is the list of functions from the [XQuery 1.0 and XPath 2.0 Functions and Operators] document that are used in the [XPath/XQuery] Formal Semantics:

  • op:numeric-add
  • op:numeric-subtract
  • op:numeric-multiply
  • op:numeric-divide
  • op:numeric-mod
  • op:numeric-unary-plus
  • op:numeric-unary-minus
  • op:numeric-equal
  • op:numeric-less-than
  • op:numeric-greater-than
  • op:boolean-equal
  • op:boolean-less-than
  • op:boolean-greater-than
  • op:yearMonthDuration-equal
  • op:yearMonthDuration-less-than
  • op:yearMonthDuration-greater-than
  • op:dayTimeDuration-equal
  • op:dayTimeDuration-less-than
  • op:dayTimeDuration-greater-than
  • op:dateTime-equal
  • op:dateTime-less-than
  • op:dateTime-greater-than
  • op:add-yearMonthDurations
  • op:subtract-yearMonthDurations
  • op:multiply-yearMonthDuration
  • op:divide-yearMonthDuration
  • op:add-dayTimeDurations
  • op:subtract-dayTimeDurations
  • op:multiply-dayTimeDuration
  • op:divide-dayTimeDuration
  • op:add-yearMonthDuration-to-dateTime
  • op:add-dayTimeDuration-to-dateTime
  • op:subtract-yearMonthDuration-from-dateTime
  • op:subtract-dayTimeDuration-from-dateTime
  • op:add-yearMonthDuration-to-date
  • op:add-dayTimeDuration-to-date
  • op:subtract-yearMonthDuration-from-date
  • op:subtract-dayTimeDuration-from-date
  • op:add-dayTimeDuration-to-time
  • op:subtract-dayTimeDuration-from-time
  • op:QName-equal
  • op:anyURI-equal
  • op:hex-binary-equal
  • op:base64-binary-equal
  • op:NOTATION-equal
  • op:node-equal
  • op:node-before
  • op:node-after
  • op:node-precedes
  • op:node-follows
  • op:to
  • op:concatenate
  • op:item-at
  • op:union
  • op:intersect
  • op:except
  • op:operation
  • xf:false
  • xf:true
  • xf:count
  • xf:boolean
  • xf:get-namespace-uri
  • xf:get-local-name
  • xf:round
  • xf:compare
  • xf:not
  • xf:empty
  • xf:root
  • xf:error

2.5.2 Functions and static typing

Many functions in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document are generic: they perform operations on arbitrary components of the data model, e.g., any kind of node, or any sequence of items. For instance, the xf:distinct-nodes function removes duplicates in any sequence of nodes. As a result, the signature given in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document is also generic. For instance, the signature of the xf:distinct-nodes function is:

  define function xf:distinct-nodes(node*) returns node*

If applied as is, this signature would result in poor static type information. In general, it would be preferable if some of these function signatures could take the type of input parameters into account. For instance, if the function xf:distinct-nodes is applied on a parameter of type element a*, element b, one can easily deduce that the resulting sequence is a collection of either a or b elements.

In order to provide better static typing, some specific typing rules are specified for some of the functions in [XQuery 1.0 and XPath 2.0 Functions and Operators]. These additional typing rules are given in [7 Additional Semantics of Functions]. Here is the list of functions that are given specific typing rules.

  • xf:error
  • xf:distinct-nodes
  • xf:distinct-values
  • op:union
  • op:intersect
  • op:except
  • op:to
  • xf:data

Ed. Note: There might be other functions that need specific typing rules. See [Issue-0135:  Semantics of special functions].

2.5.3 The error function

Some queries result in an error. Errors that are detected at compile-time, like parse errors or type errors, are reported to the user by the system. Dynamic errors are raised using the xf:error() function. General handling of errors in the Formal Semantics is still an open issue (See [Issue-0094:  Static type errors and warnings], [Issue-0098:  Implementation of and conformance levels for static type checking], [Issue-0171:  Raising errors], and [Issue-0169:  Conformance Levels]).

2.5.4 Data Model Accessors and XPath Axes

The [XPath/XQuery] Data Model provides operations to construct or access components of the Data Model, some of which are used in the course of defining the [XPath/XQuery] semantics. The Formal Semantics uses the namespace prefix dm: for those functions to distinguish them from other functions in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document.

Here is a list of data model constructors or accessors used for Formal Semantics specification.

  • dm:atomic-value
  • dm:children
  • dm:attributes
  • dm:parent
  • dm:name
  • dm:node-kind
  • dm:dereference
  • dm:empty-sequence
  • dm:namespaces
  • dm:type
  • dm:typed-value
  • dm:attribute-node-atomic
  • dm:element-node-atomic
  • dm:text-node

XPath axes are used to describe tree navigation in instances of the [XPath/XQuery] data model. The semantics of axis navigation is described in terms of data model functions. Some of the XPath axes are directly supported through existing data model accessors. Some axes (e.g., ancestor) are defined as a simple recursive application of a data model accessor and others (e.g., following) require more complex computation on top of existing data model accessors. The correspondence between XPath axis and data model accessors is specified in section [5.2.1.1 Axes]

2.5.5 Other Formal Semantics functions

In a few cases, the Formal Semantics makes use of functions that are not currently in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document. The namespace prefix fs: is used for those functions, to distinguish them from functions in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document.

Here is a list of additional functions used for Formal Semantics specification.

  • fs:characters-to-string
  • fs:distinct-doc-order
  • fs:item-sequence-to-node-sequence
  • fs:item-sequence-to-string

2.6 Notations

In order to make the specification of the semantics both concise and precise, formal notations in the form of judgments and inference rules are used. This section, introduces the notations for judgments, inference rules and mapping rules, as well as the notion of an environment.

This section is intended for the reader who may not be familiar with these notations. The reader familiar with these notations can skip this section and go directly to [2.7 The Formal Semantics].

2.6.1 Grammar productions

Grammar productions are used in this document to describe "objects" (values, types, [XPath/XQuery] expressions, etc.) manipulated by the Formal Semantics. The Formal Semantics makes use of several kinds of grammar productions.

XQuery grammar productions describe the XQuery language and expressions. XQuery productions are identified by a number, which corresponds to the number in the [XQuery 1.0: A Query Language for XML] document, and are annotated with "(XQuery)". For instance, the following production describes FLWR expressions in XQuery.

[10 (XQuery)]   FLWRExpr   ::=   ((ForClause |  LetClause)+ WhereClause? "return")* QuantifiedExpr

For the purpose of this document, the differences between the XQuery 1.0 and the XPath 2.0 grammars are mostly irrelevant. By default, this document uses XQuery 1.0 grammar productions. Whenever the grammar for XPath 2.0 differs from the one for XQuery 1.0, the corresponding XPath 2.0 productions are also given. XPath productions are identified by a number, which corresponds to the number in the document, and are annotated with "(Path)". For instance, the following production describes for expressions in XPath.

[13 (XPath)]   ForExpr   ::=   (ForClause "return")* QuantifiedExpr

XQuery Core grammar productions describe the XQuery Core. The complete XQuery Core grammar is given in [A Normalized core grammar]. XQuery Core productions are identified by a number, which corresponds to the number in [A Normalized core grammar], and are annotated by "(Core)". For instance, the following production describes the simpler form of "for" expressions present in the XQuery Core.

[8 (Core)]   ForExpr   ::=   (ForClause "return")* TypeswitchExpr

The Formal Semantics sometimes needs to manipulate "objects" (values, types, expressions, etc.) for which there is no existing grammar production in the [XQuery 1.0: A Query Language for XML] document. In these cases, specific grammar productions are introduced. Notably, additional productions are used to describe the [XPath/XQuery] type system. XQuery Formal Semantics productions are identified by a number, and are annotated by "(Formal)". For instance, the following production describes global type definitions in the [XPath/XQuery] type system.

[27 (Formal)]   Definition   ::=   (<"define" "element"> ElementName Substitution? Nillable? TypeSpecifier)
|  (<"define" "attribute"> AttributeName TypeSpecifier)
|  (<"define" "type"> TypeName TypeDerivation)

Note that grammar productions which are specific to the Formal Semantics (i.e., with the "(Formal)" annotation) are not part of [XPath/XQuery]. They are not accessible to the user and are only used in the course of defining the language's semantics.

2.6.2 Judgments

The basic building block of the formal specification is called a judgment. A judgment expresses whether a property holds or not.

For example:

Notation

The judgment

Painting is beautiful

holds if the painting Painting is beautiful.

Or more relevant, here are two example of judgments that are used extensively in the rest of this document.

Notation

The judgment

Expr => Value

holds if the expression Expr yields (or evaluates to) the value Value.

Notation

The judgment

Expr : Type

holds when the expression Expr has type Type.

A judgment can contain symbols and patterns.

Symbols are purely syntactic and are used to write the judgment itself. In general, symbols in a judgment are chosen to reflect its meaning. For example, 'is beautiful', '=>' and ':' are symbols, the second and third of which should be read "yields", and "has type" respectively.

Patterns are written with italicized words. The name of a pattern is significant: each pattern name corresponds to an "object" (a value, a type, an expression, etc.) that can be substituted legally for the pattern. By convention, all patterns in the Formal Semantics correspond to grammar non-terminals, and are used to represent entities that can be constructed through application of the corresponding grammar production. For example, Expr can be used to represent any [XPath/XQuery] expression, Value can be used to represent any value in the [XPath/XQuery] data model.

When applying the judgment, each pattern must be instantiated to an appropriate sort of "object" (value, type, expression, etc). For example, '3 => 3' and '$x+0 => 3' are both instances of the judgment 'Expr => Value'. Note that in the first judgment, '3' corresponds to both the expression '3' (on the left-hand side of the => symbol) and to the the value '3' (on the right-hand side of the => symbol).

Patterns may appear with subscripts (e.g. Expr1, Expr2), which simply name different instances of the same sort of pattern. Each distinct pattern must be instantiated to a single "object" (value, type, expression, etc.). If the same pattern occurs twice in a judgment description then it should be instantiated with the same "object". For example, '3 => 3' is an instance of the judgment 'Expr1 => Expr1' but '$x+0 => 3' is not since the two expressions '$x+0' and '3' cannot be both instance of the pattern Expr1. On the contrary, '$x+0 => 3' is an instance of the judgment 'Expr1 => Expr2'.

In a few cases, patterns may have a name which is not exactly the name of a grammar production but is based on it. For instance, a BaseTypeName is a pattern which stands for a type name, as would TypeName, or TypeName2. This usage is limited, and only occurs to improve the readability of some of the inference rules.

2.6.3 Inference rules

Whether a judgments holds or not is specified by means of inference rules. Inference rules express the logical relation between judgments and describe how complex judgments can be concluded from simpler premise judgments. (As explained in [2.1 Processing model], this approach lends itself well to bottom-up algorithms.)

A logical inference rule is written as a collection of premises and a conclusion, respectively written above and below a dividing line:

premise1 ... premisen

conclusion

All premises and the conclusion are judgments. The interpretation of an inference rule is: if all the premise judgments above the line hold, then the conclusion judgment below the line must also hold.

Here is a simple example of inference rule, which uses the example judgment 'Expr => Value' from above:

$x => 0      3 => 3

$x + 3 => 3

This inference rule expresses the following property of the judgment 'Expr => Value': if the variable expression '$x' yields the value '0', and the literal expression '3' yields the value '3', then the expression '$x + 3' yields the value '3'.

It is also possible for an inference rule to have no premises above the line to have no judgments at all; this simply means that the expression below the line always holds:


3 => 3

This inference rule expresses the following property of the judgment 'Expr => Value': evaluating the literal expression '3' always yields the value '3'.

The two above rules are expressed in terms of specific variables and values, but usually rules are more abstract. That is, the judgments they relate contain patterns. For example, here is a rule that says that for any variable Variable that yields the integer value Integer, adding '0' yields the same integer value:

Variable => Integer

Variable + 0 => Integer

As in a judgment, each pattern in a particular inference rule must be instantiated to the same "object" within the entire rule. This means that one can talk about "the value of Variable" instead of the more precise "what Variable is instantiated to in (this particular instantiation of) the inference rule".

2.6.4 Environments

Logical inference rules use environments to record information computed during static type analysis or dynamic evaluation so that this information can be used by other logical inference rules. For example, the type signature of a user-defined function in a [expression/query] prolog can be recorded in an environment and used by subsequent rules. Similarly, the value assigned to a variable within a "let" statement can be captured in an environment and used for further evaluations.

An environment is a dictionary that maps a symbol (e.g., a function name or a variable name) to an "object" (e.g., a function body, a type, a value). One can either access existing information from an environment, or update the environment.

If "env" is an environment, then "env(symbol)" denotes the "object" to which symbol is mapped to. The notation is intentionally akin to function application as an environment can be seen as a "function" from the argument symbol to the "object" that the symbol is mapped to.

This specification uses environment groups that group related environments. If "env" is an environment group with the member "mem", then that environment is denoted "env.mem" and the value that it maps symbol to is denoted "env.mem(symbol)".

Updating is only defined on environment groups:

  • "env [ mem(symbol |-> object) ]" denotes the a new environment group which is identical to env except that the mem environment has been updated to map symbol to object. The notation symbol |-> object indicates that symbol is mapped to object in the new environment.

  • If the "object" is a type then the following conventional variant notation is used: "env [ mem(symbol : object) ]".

  • The following shorthand is also allowed: "env [ mem( symbol1 |-> object1 ; ... ; symboln |-> objectn ) ]" in which each symbol is mapped to a corresponding object in the new environment.

    This notation is equivalent to nested simple updates, as in " (...(env[mem(symbol1 |-> object1)]) ... [mem(symboln |-> objectn)])".

Note that updating the environment overrides any previous binding that might exist for the same name. Updating the environment is used to capture the scope of variables, namespaces, etc. Also, note that there are no operations to remove entries from environments: the need never arises because the environment group from "before" the update remains accessible concurrently with the updated version.

Environments are typically used as part of a judgment, to capture some of the context in which the judgment is computed. Indeed, most judgments are computed assuming that some environment is given. This assumption is denoted by prefixing the judgment with "env |-". The "|-" symbol is called a "turnstile" and is ubiquitous in inference rules.

For instance, the judgment

dynEnv |- Expr => Value

should be read: assuming the dynamic environment dynEnv, the expression Expr yields the value Value.

The two main environments used in the Formal Semantics are: a dynamic environment (dynEnv), which captures the [XPath/XQuery]'s dynamic context, and a static environment (statEnv), which captures the [XPath/XQuery]'s static context. Both are defined in [4.1 Expression Context].

2.6.5 Putting it together

Putting the above notations together, here is an example of an inference rule that occurs later in this document:

statEnv |- Expr1 : Type1      statEnv |- Expr2 : Type2

statEnv |- Expr1 , Expr2 : Type1, Type2

This rule is read as follows: if two expressions Expr1 and Expr2 have already been statically inferred to have types Type1 and Type2 (the two premises above the line), then it is the case that the expression below the line "Expr1 , Expr2" must have the type "Type1, Type2", which is the sequence of types Type1 and Type2.

The above inference rule, does not modify the (static) environment. The following rule defines the static semantics of a "let/return" expression. The binding of the new variable is captured by an update to the varType component of the original static environment.

statEnv |- Expr1 : Type1      statEnv [ varType(QName : Type1) ] |- Expr2 : Type2

statEnv |- let $QName := Expr1 return Expr2 : Type2

This rule should be read as follows. First, the type Type1 for the "let" input expression Expr1 is computed. Second the "let" variable is added into the varType member of the static environment group statEnv, with type Type1. Finally, the type Type2 of Expr2 is computed in that new environment.

Ed. Note: Jonathan suggests that we should explain 'chain' inference rules. I.e., how several inference rules are applied recursively.

2.7 The Formal Semantics

Finally, this section introduces the top-level notations defining the processing model phases described above. These notations are used to define the normalization, static type analysis, and dynamic evaluation processing phases.

2.7.1 Normalization

Normalization is specified using mapping rules which describe how a [XPath/XQuery] expression is rewritten into another expression in the [XPath/XQuery] Core. Note that mapping rules are also used in [8 Importing Schemas] to specify how XML Schemas are imported into the [XPath/XQuery] Type System.

Notation

Mapping rules are written using a square bracket notation, as follows:

 
[Object]Subscript
==
Mapped Object

The original "object" is written above the == sign. The rewritten "object" is written beneath the == sign. The subscript is used to indicate what kind of "object" is mapped, and sometimes to pass some information between mapping rules.

(Since normalization is always done in the context of the static context the above is really a shorthand for

statEnv |- [Object] Subscript == Mapped Object

but we shall stay with the shorthand because statEnv will always be implied.)

The static environment is used in certain cases (e.g. for normalization of function calls) during normalization. To keep the notation simpler, the static environment is not written in the normalization rules, but it is assumed to be available.

Ed. Note: [Kristoffer/XSL] We should decide whether to use a shorthand notation as suggested or modify the mapping rules throughout.

Specifically the normalization rule that is used to map "top-level" expressions in the [XPath/XQuery] syntax into expressions in the [XPath/XQuery] Core is

 
[Expr]Expr
==
CoreExpr

which indicates that the expression Expr is normalized to the expression CoreExpr in the [XPath/XQuery] core (with the implied statEnv).

Example

For instance, the following [expression/query]

    for $i in (1, 2),
        $j in (3, 4)
    return
      element pair { ($i,$j) }

is normalized to the core expression

    for $i in (1, 2) return
      for $j in (3, 4) return
        return
          element pair { ($i,$j) }

in which the complex "FWLR" expression is mapped into a composition of two simpler "for" expressions.

2.7.2 Static type inference

The static semantics is specified using type inference rules, which relate [XPath/XQuery] expressions to types and specify under what conditions an expression is well typed.

Notation

The judgment

statEnv |- Expr : Type

holds when, assuming the static environment statEnv is given, the expression Expr has type Type.

Example

The result of static type inference is to associate a static type with every [expression/query], such that any evaluation of that [expression/query] is guaranteed to yield a value that belongs to that type.

For instance, the following expression.

   let $v := 3
   return
       $v+5

has type xs:integer. This can be inferred as follows: the input literals '3' and '5' have type integer, so the variable $v also has type integer. Since the sum of two integers is an integer, the complete expression has type integer.

Note

The type of an expression is computed by inference. Static type inference rules are given to describe, for each kind of expression, how to compute the type of the expression given the types of its sub-expressions. Here is a simple example of such a rule:

statEnv |- Expr1 : xs:boolean      statEnv |- Expr2 : Type2      statEnv |- Expr3 : Type3

statEnv |- if Expr1 then Expr2 else Expr3 : ( Type2 | Type3 )

This rule states that if the conditional expression of an "if" expression has type boolean, then the type of the entire expression is one of the two types of its "then" and "else" clauses. Note that the resulting type is represented as a choice: '(Type2|Type3)'.

The "left half" of the expression below the line (the part before the :) corresponds to some phrase in a [expression/query], for which a type is computed. If the [expression/query] has been parsed into an internal parse tree, this usually corresponds to some node in that tree. The expression usually has patterns in it (here Expr1, Expr2, and Expr3) that need to be matched against the children of the node in the parse tree. The expressions above the line indicate things that need to be computed to use this rule; in this case, the types of the two sub-expressions of the "," operator. Once those types are computed (by further applying static inference rules recursively to the expressions on each side), then the type of the expression below the line can be computed. This illustrates a general feature of the type system: the type of an expression depends only on the type of its sub-expressions. The overall static type inference algorithm is recursive, following the parse structure of the [expression/query]. At each point in the recursion, an appropriate matching inference rule is sought; if at any point there is no applicable rule, then static type inference has failed and the [expression/query] is not type-safe.

2.7.3 Dynamic Evaluation

The dynamic, or operational, semantics is specified using value inference rules, which relate [XPath/XQuery] expressions to values, and in some cases specify the order in which an [XPath/XQuery] expression is evaluated.

Notation

The judgment

statEnv ; dynEnv |- Expr => Value

holds when, assuming the static environment statEnv and dynamic environment dynEnv are given, the expression Expr yields the value Value.

The static environment is used in certain cases (e.g. for type matching) during evaluation. To keep the notation simpler, the static environment is not written in the dynamic inference rules, but it is assumed to be available.

Example

For instance, the following expression.

   let $v := 3
   return
       $v+5

yields the integer value 8. This can be inferred as follows: the input literals '3' and '5' denote the values 3 and 5, respectively, so the variable $v has the value 3. Since the sum of 3 and 5 is 8, the complete expression has the value 8.

Note

As with static type inference, logical inference rules are used to determine the value of each expression, given the dynamic environment and the values of its sub-expressions. [XPath/XQuery]'s dynamic semantics is modeled on the dynamic semantics presented in [Milner].

The inference rules used for dynamic evaluation, like those for static type inference, follow a top-down recursive structure, computing the value of expressions from the values of their sub-expressions.

3 The XQuery Type System

Ed. Note: Status:This section has been extensively revised to improve the alignment between the [XPath/XQuery] type system and XML Schema. It now incorporates "named typing". Feedback on the new design is solicited.

The [XPath/XQuery] type system is used for the specification of both the dynamic and the static semantics of [XPath/XQuery]. It is used to describe the semantics of expressions on sequence types and in type conversion rules. It is also the basis for the static semantics of [XPath/XQuery].

This section defines formal values and types, four main operations on types. Three operations (match, erase, and annotate) are used for the dynamic semantics. The last operation (subtyping) is used for the static semantics.

3.1 Values and Types

This section introduces formal notations for describing XQuery types and values. Those notations are used extensively throughout this document to represent values and types in inference rules. We use formal notations for values and types because they are more compact, and are easier to manipulate within the inference rules. The formal notations for values and types introduced here are not exposed to the XQuery user.

3.1.1 Values

The following grammar introduces formal notations for values. This notation is used to formally represent values in the the [XQuery 1.0 and XPath 2.0 Data Model].

A value is a sequence of zero or more items. An item is either a node or an atomic value. A node is either an element, an attribute, a document node, or a text node. Text nodes always contain string values of type xs:anySimpleType. Elements, attributes, and atomic values have a type annotation, which is the QName of a type.

Elements may be annotated with any type, and attributes may be annotated with any simple type.

A atomic value encapsulates an XML Schema atomic type (which is its type annotation) and a corresponding value of that type. An XML Schema atomic type [XML Schema Part 2] may be primitive or derived, or xs:anySimpleType. The corresponding atomic type value may be a value in the value space of one of the 19 primitive XML Schema types.

Type annotations are always present. Untyped elements or attributes (e.g., from well-formed documents before validation has been performed) are annotated with xs:anyType and attributes and atomic values with no other type information are annotated with xs:anySimpleType.

A value is called a simple value if it consists of a sequence of zero or more atomic values.

[1 (Formal)]   Value   ::=   Item
|  (Value ","