W3C

XQuery 1.0 and XPath 2.0 Formal Semantics

W3C Working Draft 15 November 2002

This version:
http://www.w3.org/TR/2002/WD-query-semantics-20021115/
Latest version:
http://www.w3.org/TR/query-semantics/
Previous versions:
http://www.w3.org/TR/2002/WD-query-semantics-20020816/
http://www.w3.org/TR/2002/WD-query-semantics-20020326/
http://www.w3.org/TR/2001/WD-query-semantics-20010607/
http://www.w3.org/TR/2001/WD-query-algebra-20010215/
http://www.w3.org/TR/2000/WD-query-algebra-20001204/
Editors:
Denise Draper (XML Query WG), Nimble Technology <ddraper@nimble.com>
Peter Fankhauser (XML Query WG), Infonyte GmbH <fankhaus@infonyte.com>
Mary Fernández (XML Query WG), AT&T Labs - Research <mff@research.att.com>
Ashok Malhotra (XML Query and XSL WGs), Microsoft <ashokma@microsoft.com>
Kristoffer Rose (XSL WG), IBM Research <krisrose@us.ibm.com>
Michael Rys (XML Query WG), Microsoft <mrys@microsoft.com>
Jérôme Siméon (XML Query WG), Lucent Technologies <simeon@research.bell-labs.com>
Philip Wadler (XML Query WG), Avaya <wadler@avaya.com>

Abstract

This document defines the formal semantics of XQuery 1.0 [XQuery 1.0: A Query Language for XML] and XPath 2.0 [XML Path Language (XPath) 2.0].

Status of this Document

This is a public W3C Working Draft for review by W3C Members and other interested parties. This section describes the status of this document at the time of its publication. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress." A list of current public W3C technical reports can be found at http://www.w3.org/TR/.

This document is a work in progress. It contains many open issues, and should not be considered to be fully stable. Vendors who wish to create preview implementations based on this document do so at their own risk. While this document reflects the general consensus of the working groups, there are still controversial areas that may be subject to change.

Among the most important changes from the previous version of this document are:

A complete list of issues is maintained in [D Issues]. The following are identified as high priority issues.

Public comments on this document and its open issues are welcome. Comments should be sent to the W3C XPath/XQuery mailing list, public-qt-comments@w3.org (archived at http://lists.w3.org/Archives/Public/public-qt-comments/).

XQuery 1.0, XPath 2.0, and their formal semantics has been defined jointly by the XML Query Working Group (part of the XML Activity) and the XSL Working Group (part of the Style Activity).

Patent disclosures relevant to this specification may be found on the XML Query Working Group's patent disclosure page at http://www.w3.org/2002/08/xmlquery-IPR-statements and the XSL Working Group's patent disclosure page at http://www.w3.org/Style/XSL/Disclosures.

Table of Contents

1 Introduction
2 Preliminaries
    2.1 Processing model
    2.2 Namespaces
    2.3 Data Model
        2.3.1 Data model overview
        2.3.2 Node identity
        2.3.3 Document order and sequence order
        2.3.4 Type annotations and anonymous types
    2.4 Schemas and types
        2.4.1 The elements of a (statically) typed language
        2.4.2 XML Schema and the XQuery type system
        2.4.3 Structural and named typing
        2.4.4 Subtyping
            2.4.4.1 Type substitutability
            2.4.4.2 Subtyping and XML Schema derivation
    2.5 Functions
        2.5.1 Functions and operators
        2.5.2 Functions and static typing
        2.5.3 The error function
        2.5.4 Data Model Accessors
        2.5.5 Other Formal Semantics functions
    2.6 Notations
        2.6.1 Grammar productions
        2.6.2 Judgments
        2.6.3 Inference rules
        2.6.4 Environments
        2.6.5 Putting it together
    2.7 The Formal Semantics
        2.7.1 Normalization
        2.7.2 Static type inference
        2.7.3 Dynamic Evaluation
        2.7.4 Formal Semantics internal links
3 The XQuery type system
    3.1 Values
    3.2 Types
        3.2.1 Item types
        3.2.2 Content models
        3.2.3 Top level definitions
        3.2.4 Built-in type declarations
        3.2.5 Syntactic constraints on types
        3.2.6 Example
    3.3 Judgments for name hierarchies
        3.3.1 Derives
        3.3.2 Substitutes
    3.4 Judgments for type matching
        3.4.1 Element and attribute lookup
        3.4.2 Matches
        3.4.3 Subtype
    3.5 Judgments for sequences of item types
    3.6 Judgments for the typeswitch expression
    3.7 Judgments for type promotion
    3.8 Judgments for the validate expression
        3.8.1 Builtin attributes
        3.8.2 Extension
        3.8.3 Mixed content
        3.8.4 Adjusts
        3.8.5 Type resolution
        3.8.6 Element and attribute lookup
        3.8.7 Interleaving
        3.8.8 Filtering
        3.8.9 Structural matching
            3.8.9.1 Structural nil matching
            3.8.9.2 Structural matching
        3.8.10 Erasure
            3.8.10.1 Simply erases
            3.8.10.2 Erases
        3.8.11 Annotate
            3.8.11.1 Simply annotate
            3.8.11.2 Nil-annotate
            3.8.11.3 Annotate
4 Basics
    4.1 Expression Context
        4.1.1 Static Context
        4.1.2 Evaluation Context
    4.2 Input Functions
    4.3 Expression Processing
    4.4 Types
        4.4.1 Type Checking
        4.4.2 SequenceType
            4.4.2.1 SequenceType Matching
        4.4.3 Type Conversions
            4.4.3.1 Atomization
            4.4.3.2 Effective Boolean Value
    4.5 Variable Bindings
    4.6 Errors and Conformance
5 Expressions
    5.1 Primary Expressions
        5.1.1 Literals
        5.1.2 Variables
        5.1.3 Parenthesized Expressions
        5.1.4 Function Calls
        5.1.5 Comments
    5.2 Path Expressions
        5.2.1 Steps
            5.2.1.1 Axes
            5.2.1.2 Node Tests
        5.2.2 Predicates
        5.2.3 Unabbreviated Syntax
        5.2.4 Abbreviated Syntax
    5.3 Sequence Expressions
        5.3.1 Constructing Sequences
        5.3.2 Combining Sequences
    5.4 Arithmetic Expressions
    5.5 Comparison Expressions
        5.5.1 Value Comparisons
        5.5.2 General Comparisons
        5.5.3 Node Comparisons
        5.5.4 Order Comparisons
    5.6 Logical Expressions
    5.7 Constructors
        5.7.1 Element Constructors
        5.7.2 Computed Element and Attribute Constructors
            5.7.2.1 Computed Text Nodes
            5.7.2.2 Computed Document Nodes
            5.7.2.3 Computed Element Nodes
            5.7.2.4 Constructed Attribute Nodes
        5.7.3 Whitespace in Constructors
        5.7.4 Other Constructors and Comments
    5.8 [For/FLWOR] Expressions
        5.8.1 FLWR expressions
        5.8.2 For expression
        5.8.3 Let Expression
        5.8.4 Order By
    5.9 Unordered Expressions
    5.10 Conditional Expressions
    5.11 Quantified Expressions
    5.12 Expressions on SequenceTypes
        5.12.1 Instance Of
        5.12.2 Typeswitch
        5.12.3 Cast
        5.12.4 Castable
        5.12.5 Constructor Functions
        5.12.6 Treat as
    5.13 Validate Expressions
6 The Query Prolog
    6.1 Namespace Declarations
    6.2 Schema Imports
    6.3 Xmlspace Declaration
    6.4 Default Collation
    6.5 Function Definitions
7 Additional Semantics of Functions
    7.1 Formal Semantics Functions
        7.1.1 The fs:characters-to-string function
        7.1.2 The fs:distinct-doc-order function
        7.1.3 The fs:item-sequence-to-node-sequence function
        7.1.4 The fs:item-sequence-to-string function
    7.2 Functions with specific typing rules
        7.2.1 The fn:error function
        7.2.2 The fn:distinct-nodes, and fn:distinct-values functions
        7.2.3 The op:union, op:intersect and op:except operators
        7.2.4 The op:to operator
        7.2.5 The fn:data function
8 Importing Schemas
    8.1 Introduction
        8.1.1 Features
        8.1.2 Organization
        8.1.3 Main mapping rules
        8.1.4 Special attributes
            8.1.4.1 use
            8.1.4.2 minOccurs and maxOccurs
            8.1.4.3 mixed
            8.1.4.4 nillable
            8.1.4.5 substitutionGroup
        8.1.5 Anonymous type names
    8.2 Attribute Declarations
        8.2.1 Global attributes declarations
        8.2.2 Local attribute declarations
    8.3 Element Declarations
        8.3.1 Global element declarations
        8.3.2 Local element declarations
    8.4 Complex Type Definitions
        8.4.1 Global complex type
        8.4.2 Local complex type
        8.4.3 Complex type with simple content
        8.4.4 Complex type with complex content
    8.5 Attribute Uses
    8.6 Attribute Group Definitions
        8.6.1 Attribute group definitions
        8.6.2 Attribute group reference
    8.7 Model Group Definitions
    8.8 Model Groups
        8.8.1 All groups
        8.8.2 Choice groups
        8.8.3 Sequence groups
    8.9 Particles
        8.9.1 Element reference
        8.9.2 Group reference
    8.10 Wildcards
        8.10.1 Attribute wildcards
        8.10.2 Element wildcards
    8.11 Identity-constraint Definitions
    8.12 Notation Declarations
    8.13 Annotation
    8.14 Simple Type Definitions
        8.14.1 Global simple type definition
        8.14.2 Local simple type definition
        8.14.3 Simple type content
    8.15 Schemas as a whole
        8.15.1 Schema
        8.15.2 Include
        8.15.3 Redefine
        8.15.4 Import

Appendices

A Normalized core grammar
    A.1 Core lexical structure
        A.1.1 Syntactic Constructs
    A.2 Core BNF
B Mapping of Overloaded Internal Functions
C References
    C.1 Normative References
    C.2 Non-normative References
    C.3 Background References
D Issues
    D.1 Introduction
    D.2 Issues list
    D.3 Alphabetic list of issues
        D.3.1 Open Issues
        D.3.2 Resolved (or redundant) Issues
    D.4 Delegated Issues
        D.4.1 XPath 2.0
        D.4.2 XQuery
        D.4.3 Operators


1 Introduction

This document defines the formal semantics of XQuery 1.0 and XPath 2.0. The present document is part of a set of documents that together define the XQuery 1.0 and XPath 2.0 languages:

Ed. Note: [Kristoffer/XSL] The role of the formal semantics document could be clarified: does it supplement or subsume the language specification in case of conflict?

The scope and goals for the [XPath/XQuery] language are discussed in the charter of the W3C [XSL/XML Query] Working Group and in the [XPath/XQuery] requirements [XML Query 1.0 Requirements].

[XPath/XQuery] is a powerful language, capable of selecting and extracting complex patterns from XML documents , reformulating them into results in arbitrary ways, and computing expressions over the result. This document defines the semantics of [XPath/XQuery] by giving a precise formal meaning to each of the constructions of the [XPath/XQuery] specification in terms of the [XPath/XQuery] data model. This document assumes that the reader is already familiar with the [XPath/XQuery] language.

Two important design aspects of [XPath/XQuery] are that it is functional and that it is typed. These two aspects play an important role in the [XPath/XQuery] Formal Semantics.

[XPath/XQuery] is a functional language. It is built from expressions, rather than statements. Every construct in the language (except for the query prolog) is an expression and expressions can be composed arbitrarily. The result of one expression can be used as the input to any other expression, as long as the type of the result of the former expression is compatible with the input type of the latter expression with which it is composed. Another aspect of the functional approach is that variables are always passed by value and their value cannot be modified through side-effects.

[XPath/XQuery] is a typed language. Types can be imported from one or more XML Schemas describing the documents that will be processed, and the [XPath/XQuery] language can then perform operations based on these types (e.g., using a treat as operation). In addition, [XPath/XQuery] also supports a level of static type analysis. This means the system can infer the output type of an expression, based on the type of its inputs. Static typing allows early error detection, and can be used as the basis for certain forms of optimization. The type system of [XPath/XQuery] is based on [XML Schema]. The [XPath/XQuery] type system models most of the features of [XML Schema Part 1], including global and local element and attribute declarations, complex and simple type definitions, named and anonymous types, derivation by restriction, extension, list and union, substitution groups, and wildcard types. It does not model uniqueness constraints and facet constraints on simple types.

The [XPath/XQuery] formal semantics builds on long-standing traditions in the database and in the programming languages communities. In particular, the [XPath/XQuery] formal semantics has been inspired by works on SQL [SQL], OQL [ODMG], [BKD90], [CM93], and nested relational algebra (NRA) [BNTW95], [Col90], [LW97], [LMW96]. Other references include Quilt [Quilt], UnQL [BFS00], XDuce [HP2000], XML-QL [XMLQL99], XPath 1.0 [XML Path Language (XPath) : Version 1.0], XQL [XQL99], XSLT [XSLT 99], and YATL [YAT99]. Additional related work can be found in the bibliography [C References].

This document is organized as follows. [2 Preliminaries] introduces notations used to define the [XPath/XQuery] formal semantics. [3 The XQuery type system] describes the [XPath/XQuery] type system, operations on the [XPath/XQuery] type system, and explains the relationship between the [XPath/XQuery] type system and XML Schema. [4 Basics] describes semantics for basic [XPath/XQuery] concepts, and [5 Expressions], describes the dynamic and static semantics of [XPath/XQuery] expressions. [6 The Query Prolog] describes the semantics of the [XPath/XQuery] prolog. [7 Additional Semantics of Functions] describes any additional semantics required for [XPath/XQuery] functions. Finally, [8 Importing Schemas], specifies how XML Schemas are imported into the [XPath/XQuery] type system.

2 Preliminaries

This section provides the background necessary to understand the [XPath/XQuery] Formal Semantics and introduces the notations that are used.

2.1 Processing model

First, a processing model for [expression/query] evaluation is defined. This processing model is not intended to describe an actual implementation, although a naive implementation might be based upon it. It does not prescribe an implementation technique, but any implementation should produce the same results as obtained by following this processing model and applying the rest of the Formal Semantics specification.

The processing model consists of five phases; each phase consumes the result of the previous phase and generates output for the next phase. For each processing phase, we furthermore point to the relevant notations introduced later in the document.

  1. Parsing. The grammar for the [XPath/XQuery] syntax is defined in [XQuery 1.0: A Query Language for XML]. Parsing may generate syntax errors. If no error occurs, an internal representation of the parsed syntax tree corresponding to the query is created: this ensures that we can unambiguously specify the following phases by a case per syntax production.

  2. Context Processing. The semantics of [expression/query] depends on the input context. The input context needs to be generated before the [expression/query] can be processed. In XQuery, the input context is described in the Query Prolog (See [6 The Query Prolog]). In XPath, the input context is generated by the processing environment. The input context is split in a static and a dynamic part (denoted statEnv and dynEnv, respectively).

  3. Normalization. To simplify the semantics specification, some normalization is performed on the [expression/query]. The [XPath/XQuery] language provides many powerful features that make [expression/query]s simpler to write and use, but are also redundant. For instance, a complex for expression might be rewritten as a composition of several simple for expressions. The language composed of these simpler [expression/query] is called the [XPath/XQuery] Core language and is described by a grammar which is a subset of the XQuery grammar. The grammar of the [XPath/XQuery] Core language is given in [A Normalized core grammar].

    During the normalization phase, each [XPath/XQuery] [expression/query] is mapped into its equivalent [expression/query] in the core. (Note that this has nothing to do with Unicode Normalization, which works on character strings.) Normalization works by bottom-up application of normalization rules over expressions, starting with normalization of literals.

    Specifically the normalization phase is defined in terms of the static part of the context (statEnv) and a [expression/query] (Expr) abstract syntax tree. Formal notations for the normalization phase are introduced in [2.7.1 Normalization].

    After normalization, the full semantics can be obtained just by giving a semantics to the normalized Core [expression/query]. This is done during the last two phases.

  4. Static type analysis. Static type analysis checks whether each [expression/query] is type safe, and if so, determines its static type. Static type analysis is defined only for Core [expression/query]. Static type analysis works by bottom-up application of type inference rules over expressions, taking the type of literals and the known types of input documents into account.

    If the [expression/query] is not type-safe, static type analysis can result in a type error. For instance, a comparison between an integer value and a string value might be detected as an error during the static type analysis. If static type analysis succeeds, it results in a syntax tree where each sub-expression is "decorated" with its static type, and in an output type for the result of the [expression/query].

    More precisely the static analysis phase is defined in terms of the static context (statEnv) and a core [expression/query] (CoreExpr). Formal notations for the static analysis phase are introduced in [2.7.2 Static type inference].

    Static typing does not imply that the content of XML documents must be rigidly fixed or even known in advance. The [XPath/XQuery] type system accommodates "flexible" types, such as elements that can contain any content. Schema-less documents are handled in [XPath/XQuery] by associating a standard type with the document, such that it may include any legal XML content.

  5. Dynamic Evaluation. This is the phase in which the result of the [expression/query] is computed. The semantics of evaluation is defined only for Core [expression/query] terms. Evaluation works by bottom-up application of evaluation rules over expressions, starting with evaluation of literals. Evaluation may result in a value OR a dynamic error (or, once static typing is made optional, in a type error).

    The dynamic evaluation phase is defined in terms of the static context (statEnv) and evaluation context (), and a core [expression/query] (CoreExpr). Formal notations for the dynamic evaluation phase are introduced in [2.7.3 Dynamic Evaluation].

Ed. Note: Issue:[Kristoffer/XSL] The distinction between errors manifest by not being able to prove a judgment and special error values is not clear. Also the processing model does not allow for skipping static typing. See [Issue-0094:  Static type errors and warnings], [Issue-0171:  Raising errors], and [Issue-0169:  Conformance Levels].

The first four phases above are "analysis-time" (sometimes also called "compile-time") steps, which is to say that they can be performed on a [expression/query] before examining any input document. Indeed, analysis-time processing can be performed before such document even exists. Analysis-time processing can detect many errors early-on, e.g., syntax errors or type errors. If no error occurs, the result of analysis-time processing could be some compiled form of [expression/query], suitable for execution by a compiled-[expression/query] processor. The last phase is an "execution-time" (sometimes also called "run-time") step, which is to say that the query is evaluated on actual input document(s).

Static analysis catches only certain classes of errors. For instance, it can detect a comparison operation applied between incompatible types (e.g., xs:int and xs:date). Some other classes of errors cannot be detected by the static analysis and are only detected at execution time. For instance, whether an arithmetic expression on 32 bits integers (xs:int) yields an out-of-bound value can only be detected at run-time by looking at the data.

While implementations may be free to employ different processing models, the [XPath/XQuery] static semantics relies on the existence of a static type analysis phase that precedes any access to the input data. Statically typed implementations are required to find and report static type errors, as specified in this document. It is still an open issue (See [Issue-0098:  Implementation of and conformance levels for static type checking]) whether to make the static type analysis phase optional to allow implementations to return a result for a type-invalid [expression/query] in special cases where the [expression/query] happens to perform correctly on a particular instance of input data. (Note that this is not the same as merely permitting that the static type error is emitted at evaluation time as is done below.)

Notice that the separation of logical processing into phases is not meant to imply that implementations must separate analysis-time from evaluation-time processing: [expression/query] processors may choose to perform all phases simultaneously at evaluation-time and may even mix the phases in their internal implementations. The processing model defines only what the final result.

The above processing phases are all internal to the [expression/query] processor. They do not deal with how the [expression/query] processor interacts with the outside world, notably how it accesses actual documents and types. A typical [expression/query] engine would support at least three other important processing phases:

  1. XML Schema import phase. The [XPath/XQuery] type system is based on XML Schema. In order to perform static type analysis, the [XPath/XQuery] processor needs to build type descriptions that correspond to the schema(s) of the input documents. This phase is achieved by mapping all schemas required by the [expression/query] into the [XPath/XQuery] type system. The XML Schema import phase is described in [8 Importing Schemas].

  2. XML loading phase. Expressions are evaluated on values in the [XQuery 1.0 and XPath 2.0 Data Model]. XML documents must be loaded into the [XQuery 1.0 and XPath 2.0 Data Model] before the evaluation phase. This is described in the [XQuery 1.0 and XPath 2.0 Data Model] and is not discussed further here.

  3. Serialization phase. Once the [expression/query] is evaluated, processors might want to serialize the result of the [expression/query] as actual XML documents. Serialization of data model instances is still an open issue (See [Issue-0116:  Serialization]) and is not discussed further here.

The parsing phase is not specified formally; the formal semantics does not define a formal model for the syntax trees, but uses the [XPath/XQuery] concrete syntax directly. More details about parsing for XQuery 1.0 can be found in the [XQuery 1.0: A Query Language for XML] document and more details about parsing for XPath 2.0 can be found in the [XML Path Language (XPath) 2.0] document. No further discussion of parsing is included here.

For the other three phases (normalization, static type analysis and evaluation), instead or giving an algorithm in pseudo-code, the semantics is described formally by means of inference rules. Inference rules provide a notation to describe how the semantics of an expression derives from the semantics of its sub-expressions. Hence, they provide a concise and precise way for specifying bottom-up algorithms, as required by the normalization, static type analysis, and evaluation phases.

2.2 Namespaces

The Formal Semantics uses the following namespace prefixes.

The accessors and constructors defined by the data model are shown with the prefix dm. Some special functions, variables and types used in the Formal Semantics are shown with the prefix fs. Those prefixes are always shown in italics to emphasize that the corresponding functions, variables, and types are abstract: they are not and cannot be made accessible directly from the host language.

All these prefixes are assumed to be bound to the appropriate URIs.

2.3 Data Model

2.3.1 Data model overview

This section gives an overview of the main data model features that are relevant to the Formal Semantics. The reader is referred to the [XQuery 1.0 and XPath 2.0 Data Model] document for more details.

  • An atomic value is a value in the value space of one of the [XML Schema Part 2] atomic types (that is, a primitive simple type that is not derived by list or union).

  • A node is one of the following kinds: document, element, attribute, namespace, comment, processing-instruction and text. The [XPath/XQuery] Formal Semantics does not currently discuss namespace, comment, and processing-instruction nodes (See [Issue-0105:  Types for nodes in the data model.]). Components of nodes can be typed values, string values, names, type annotations, or other nodes.

  • XML documents and their contents are represented in the data model as trees composed of nodes and atomic values. [XQuery 1.0 and XPath 2.0 Data Model] defines a mapping from the PSVI (Post Schema Validation Information Set) to such trees.

  • [XQuery 1.0 and XPath 2.0 Data Model] defines constructors, which are functions that construct nodes and atomic values, and accessors, which are functions that access a value's properties or components.

    For instance, the children function returns all the child nodes of an element node. The typed value of a node is a sequence of zero or more atomic values. The string value of a node is an instance of the type xs:string. The name of a node is an instance of the type xs:QName.

  • An item is either an atomic value or a node.

  • Every value in the data model is a sequence. A sequence is an ordered collection of zero or more items. Sequences are represented by items separated by commas, enclosed in parentheses.

    A sequence containing exactly one item is called a singleton sequence. An item is identical to a singleton sequence containing that item. Sequences are never nested--for example, combining the values 1, (2, 3), and () into a single sequence results in the sequence (1, 2, 3). A sequence containing zero items is called an empty sequence.

  • The data model annotates element nodes, attribute nodes, and atomic values with type names. This type annotation is the type name given to the element, attribute, or value through the validation process.

The remainder of the section is devoted to several related features that merit special attention: node identity, order, and type annotations.

2.3.2 Node identity

In the [XQuery 1.0 and XPath 2.0 Data Model], nodes have identity. Node identity follows from the unique location of the node within exactly one XML document (or document fragment, in the case of XML being constructed within the [expression/query] itself). It is possible for two expressions to return not only the same value, but also the identical node. On the other hand, two expressions could return results with the same document structure, but different identities. [XPath/XQuery] supports comparison of nodes using either "node equality" (equality by identity) or "value equality" (equality by value), for instance by using the operators is or =.

Atomic values do not have an associated identity and are therefore always compared by value.

The difference between node equality and value equality is illustrated by the following example:

   let $a1 := <book><author>Suciu</author></book>,
       $a2 := <author>Suciu</author>,
       $a3 := $a1/author
   return
       ($a1/author is $a3), {-- true, they refer to the same node --}
       ($a1/author = $a2),  {-- true, they have the same value --}
       ($a1/author is $a2)  {-- false, they are not the same node --}

All [XPath/XQuery]'s operators preserve node identity with the exception of explicit copy operations, node (element, attribute, etc.) constructors and validate expressions. An element constructor always creates a deep copy of its attributes and child nodes. Other operators, such as sequence constructors or path expressions, do not result in copies of the corresponding nodes. So, for example, ($a3, $a2) creates a new sequence of nodes, the first of which has the same identity as $a1/author.

2.3.3 Document order and sequence order

There are two kinds of order in [XPath/XQuery]: sequence order and document order.

Sequence order. Sequences of items in the [XPath/XQuery] data model are ordered. Sequence order refers to the order of items within a given sequence.

Document order. Document order refers to the total order among all nodes in a given document. It is defined as the order of appearance of the nodes when performing a pre-order, depth-first traversal of a tree. For elements, this corresponds to the order of appearance of their opening tags in the corresponding XML serialization. Document order is equivalent to the definition used in [XML Path Language (XPath) : Version 1.0].

Depending on the context, it may be possible to talk about two different notions of order for the same (set of) nodes. For example:

   let $e := <list>
               <item id="1"/>
               <item id="2"/>
             </list>,
       $s := ($e/item[@id='2'], $e/item[@id='1'])
   ...

The item "1" is before item "2" in document order, but the same two nodes are in the opposite order in the sequence $s.

The order between nodes from distinct documents is implementation defined but must be stable. That is, it must not change for the duration of query evaluation.

2.3.4 Type annotations and anonymous types

A type annotation indicates the type name for each element and attribute node, which results from the validation process or is assigned by default (depending on the way the data was built).

For example, consider the document fragment

  <fact>The cat weighs <weight>12</weight> pounds.</fact>

and its associated schema

  <xs:element name="fact" type="FactType"/>

  <xs:complexType name="FactType" mixed="true">
    <xs:sequence>
      <xs:element name="weight" type="xs:integer">
    </xs:sequence>
  </xs:complexType>

Before validation, the two elements fact and weight in the document are annotated by xs:anyType.

After validation, the element fact is annotated with the type name FactType and the element weight is annotated with the type name xs:integer.

In the case a schema type does not have an explicit name, it is called of anonymous types. In that case, the data model annotates the corresponding element or attributes with an implementation-defined type name. Those special anonymous type names, are not visible to the user.

For example, the above schema could be written as an anonymous type as follows.

and its associated schema

  <xs:element name="fact">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="weight" type="xs:integer">
      </xs:sequence>
    </xs:complexType>
  </xs:element>

In that case, a type name is automatically generated by the system, this name is used to annotated the fact elements after schema validation.

Type annotations are have an impact on the semantics of [XPath/XQuery]. For example, consider the following function declaration

  define function convert_weight($x as element of type FactType)
    { ... }

This function only accepts as input an element annotated with the type FactType, or with one of the types derived from it. Therefore, the previous document fragment is only accepted by the function after it has been validated against the given schema.

[3 The XQuery type system] gives a formal description of data model values with type annotations and explain how type annotations are taken into account when matching a value against a given type.

2.4 Schemas and types

This section presents an introduction to (statically) typed languages in general, and a conceptual overview of the [XPath/XQuery] type system in particular. The [XPath/XQuery] type system is discussed formally in [3 The XQuery type system].

2.4.1 The elements of a (statically) typed language

A type system relates values, types, and expressions in the language.

The main constituent parts of a typed system are:

  • A universe of possible types. Type are composed of other types, starting with primitive types, and can be built using type constructors.

    In [XPath/XQuery], the primitive types are the primitive atomic types of [XML Schema Part 2]. Type constructors are also based on schema, and can be used to build types for attributes and elements, sequence and choice groups, etc. The [XPath/XQuery] type system is defined in [3.2 Types].

  • A notation for specifying types.

    In [XPath/XQuery], types are imported from XML Schema, and can be used in expressions using the SequenceType syntax. In order to model and reason about types in [XPath/XQuery], this document introduces its own notation for types. This notation is defined in [3.2 Types].

  • A relationship between values and types. Formally, the semantics of a type is defined to be the (usually infinite) set of instance values which match that type.

    For example, the type xs:integer consists of the set of all integer values, and the type element address { xs:string } consists of all elements named "address" whose contents are a single string value, etc.

    Matching a [XPath/XQuery] value against a type is defined in [3.4 Judgments for type matching].

  • A notion of subtyping. Subtyping is a relationship between types that plays an important role in a statically typed language. Notably, it is used to determine whether function calls are legal or not.

    Subtyping in [XPath/XQuery] is defined in [3.4.3 Subtype].

For a statically typed language one also needs to define:

  • Rules that relate expressions to types. Static type analysis infers a type for each expression in the language. The most important property for a static type system is that the type inferred statically must be such that for all valid inputs, evaluating the expression returns a result that matches the inferred type. If this property holds, then the static typing provides guarantees that certain kinds of errors can not occur at run-time.

    The static typing rules for [XPath/XQuery] are defined for all [XPath/XQuery] expressions in [5 Expressions], [6 The Query Prolog] and [7 Additional Semantics of Functions].

Finally, types can also be used during language evaluation if the language provides:

  • Expressions on types. A language can support operations that take types into account, such as casting, type matching, etc.

    [XPath/XQuery] provide several such expressions: "typeswitch", "cast as", "treat as", and "validate as". [XPath/XQuery] also support a "validate" expression, which performs XML Schema validation. Validation is related to matching in that if validation succeeds, it returns a value which matches the type used for validation. Expressions on types are defined in [5.12 Expressions on SequenceTypes]. The validate expression is defined in [3.8 Judgments for the validate expression].

The remainder of the section is devoted to several related features that merit special attention: the relationship between the [XPath/XQuery] type system and XML Schema, structural and named typing, and subtyping.

2.4.2 XML Schema and the XQuery type system

The [XPath/XQuery] type system is based on [XML Schema]. An introduction to XML Schema can be found in [XML Schema Part 0].

The [XPath/XQuery] type system captures a large subset of XML Schema. This section gives a summary of which features are captured and in some cases deviations from XML Schema.

The following components and features of XML Schema are captured exactly in the [XPath/XQuery] type system:

  • Element and attribute declarations;

  • Simple and complex type definitions;

  • Local attributes and elements;

  • Named and anonymous types;

  • the type name hierarchy;

  • Sequence, choice and all groups;

  • Wildcards;

  • Derivation by extension, list and union;

  • Substitution groups.

The following components or features of XML Schema are not represented in the [XPath/XQuery] type system.

  • Simple type facets;

  • Identity-constraint definitions;

  • Notations and annotations.

The following components or features of XML Schema are represented partially or differently in the [XPath/XQuery] type system.

  • The [XPath/XQuery] type system only captures an approximation of minOccurs and maxOccurs. Like DTDs, the [XPath/XQuery] type system only supports *, +, and ? which correspond to [minOccurs="0" maxOccurs="unbounded"], [minOccurs="1" maxOccurs="unbounded"], and [minOccurs="0" maxOccurs="1"] respectively.

  • The [XPath/XQuery] type system generalizes the notion of derivation by restriction for the needs of static type analysis. Derivation by restriction in XML Schema relies on a syntactic relationship between content models. The [XPath/XQuery] type system uses a generalization of this relationship based on the notion of inclusion between types -- also called structural subtyping.

At compile time, the [XPath/XQuery] environment imports XML Schema declarations, and loads them as declarations in the [XPath/XQuery] type system. This loading process is specified by a mapping from XML Schema to the [XPath/XQuery] type system, given in [8 Importing Schemas].

2.4.3 Structural and named typing

XML Schema is a powerful schema language for XML. XML Schema can describe both structural constraints as well as type annotations that apply to XML documents. These aspects are referred to as structural typing and named typing respectively.

  • Structural typing. An important notion underlying XML Schema is the concept of regular expressions, and their tree counterparts, regular tree grammars. An introduction to regular (tree) languages can be found in [Languages] or in [TATA].

    Tree grammars can be used to capture the structural constraints imposed by an XML Schema. Automata are the traditional way of implementing tree grammars and can be used to implement part of the XML Schema validation process.

    For instance, consider the following element declaration:

      <xs:element name="root">
        <xs:complexType>
          <xs:sequence maxOccurs="unbounded">
    	<xs:choice minOccurs="0" maxOccurs="unbounded">
    	  <xs:element ref="a"/>
    	  <xs:element ref="b" minOccurs="0"/>
    	</xs:choice>
    	<xs:element ref="c"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    

    The structural constraints imposed by the definition of the content of element root can be described by the following regular expression (written here in DTD notation): ((a|b?)*,c)+. Such a constraint can be checked using a finite state automaton applied on the children of the element root in the instance document.

  • Named typing. In addition, XML Schema provides the ability to associate names with type definitions and to declare relationships between type names.

    For instance, consider the following two element declarations:

      <xs:element name="person" type="Person"/>
    
      <xs:complexType name="Person">
        <xs:sequence>
          <xs:element ref="ssn">
          <xs:choice>
            <xs:sequence>
              <xs:element ref="major" minOccurs="0"/>
              <xs:element ref="minors" maxOccurs="unbounded"/>
            </xs:sequence>
            <xs:element ref="job_desc"/>
          </xs:choice>
        </xs:sequence>
      </xs:complexType>
    
      <xs:element name="student" type="Student"/>
    
      <xs:complexType name="Student">
        <xs:complexContent>
          <xs:restriction base="Person">
            <xs:sequence>
              <xs:element ref="ssn">
              <xs:element ref="major" minOccurs="0"/>
              <xs:element ref="minors" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:restriction>
        </xs:complexType>
      </xs:complexContent>
    

    The type Studentof the element student is derived by restriction from the type Person. Such a declaration not only defines a new type, but also creates a relationship between the type names Person and Student which needs to be taken into account when doing validation and matching against a given type. For instance, the following function

      define function salary($x as element of type Person)
        { .... }
    

    accepts as parameters elements of type Person as well as elements with types derived by restriction from Person, but does not accept elements with the same content model as the element person but a type which does not derive from the type Person.

2.4.4 Subtyping

A type is a subtype of another type if every value matching the first type also matches the second type. Since matching takes both type structure and type names into account, so is subtyping.

Here is a simple example of subtyping on two regular expressions:

  ( xs:double )*
  ( xs:integer | xs:double )*

The first type is a subtype of the second, because any sequence that contains only doubles is also an example of a sequence that contains either integers or doubles. This example only illustrates the structural subtyping.

Type1 <: Type2 indicates that type Type1 is a subtype of Type2. Note that the notation <: is used when defining the [XPath/XQuery] semantics, but is not part of the [XPath/XQuery] syntax.

2.4.4.1 Type substitutability

Subtyping plays a crucial role in the semantics of function application since all functions have an associated signature that declares the types of input parameters and the type of the result of the function.

Intuitively, one can call a function not only by passing an argument of the declared type, but also any argument whose type is a subtype of the declared type. This property is called type substitutability.

Ed. Note: [Kristoffer/XSL] Type substitutability should really be true for any subexpression, not just function arguments (it just happens that in functional languages, where type substitutability is usually defined, the situations are equivalent). Specifically type substitutability does not apply to any subexpression of [XPath/XQuery]: typeswitch violates it. In particular this means that while one may replace a function argument with one whose type is a subtype this does not guarantee that the resulting value is the same even if the value is the same (but cast to a subtype).

2.4.4.2 Subtyping and XML Schema derivation

Subtyping in [XPath/XQuery] is related to, but not identical to, "derivation by restriction" in XML Schema. A type derived by restriction is always a subtype of its base type, but the opposite is not true. For instance, consider the following types:

  ( xs:double )+
  xs:double, ( xs:integer | xs:double )*

The first type is a subtype of the second, because any sequence that contains one or more doubles is also an example of a sequence containing a double followed by a sequence of zero or more integers or doubles. But the first type cannot be derived from the second type using derivation by restriction as defined in XML Schema.

2.5 Functions

2.5.1 Functions and operators

The [XQuery 1.0 and XPath 2.0 Functions and Operators] document provides a number of useful basic functions over the components of the [XPath/XQuery] data model (atomic values, nodes, sequences, etc). A number of these functions are used in the course of describing the [XPath/XQuery] semantics. Here is the list of functions from the [XQuery 1.0 and XPath 2.0 Functions and Operators] document that are used in the [XPath/XQuery] Formal Semantics:

  • op:numeric-add
  • op:numeric-subtract
  • op:numeric-multiply
  • op:numeric-divide
  • op:numeric-mod
  • op:numeric-unary-plus
  • op:numeric-unary-minus
  • op:numeric-equal
  • op:numeric-less-than
  • op:numeric-greater-than
  • op:boolean-equal
  • op:boolean-less-than
  • op:boolean-greater-than
  • op:yearMonthDuration-equal
  • op:yearMonthDuration-less-than
  • op:yearMonthDuration-greater-than
  • op:dayTimeDuration-equal
  • op:dayTimeDuration-less-than
  • op:dayTimeDuration-greater-than
  • op:dateTime-equal
  • op:dateTime-less-than
  • op:dateTime-greater-than
  • op:add-yearMonthDurations
  • op:subtract-yearMonthDurations
  • op:multiply-yearMonthDuration
  • op:divide-yearMonthDuration
  • op:add-dayTimeDurations
  • op:subtract-dayTimeDurations
  • op:multiply-dayTimeDuration
  • op:divide-dayTimeDuration
  • op:add-yearMonthDuration-to-dateTime
  • op:add-dayTimeDuration-to-dateTime
  • op:subtract-yearMonthDuration-from-dateTime
  • op:subtract-dayTimeDuration-from-dateTime
  • op:add-yearMonthDuration-to-date
  • op:add-dayTimeDuration-to-date
  • op:subtract-yearMonthDuration-from-date
  • op:subtract-dayTimeDuration-from-date
  • op:add-dayTimeDuration-to-time
  • op:subtract-dayTimeDuration-from-time
  • op:QName-equal
  • op:anyURI-equal
  • op:hex-binary-equal
  • op:base64-binary-equal
  • op:NOTATION-equal
  • op:node-equal
  • op:node-before
  • op:node-after
  • op:node-precedes
  • op:node-follows
  • op:to
  • op:concatenate
  • op:item-at
  • op:union
  • op:intersect
  • op:except
  • op:operation
  • fn:false
  • fn:true
  • fn:count
  • fn:boolean
  • fn:get-namespace-uri
  • fn:get-local-name
  • fn:round
  • fn:compare
  • fn:not
  • fn:empty
  • fn:root
  • fn:error

2.5.2 Functions and static typing

Many functions in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document are generic: they perform operations on arbitrary components of the data model, e.g., any kind of node, or any sequence of items. For instance, the fn:distinct-nodes function removes duplicates in any sequence of nodes. As a result, the signature given in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document is also generic. For instance, the signature of the fn:distinct-nodes function is:

  fn:distinct-nodes(node*) as node*

If applied as is, this signature would result in poor static type information. In general, it would be preferable if some of these function signatures could take the type of input parameters into account. For instance, if the function fn:distinct-nodes is applied on a parameter of type element a*, element b, one can easily deduce that the resulting sequence is a collection of either a or b elements.

In order to provide better static typing, some specific typing rules are specified for some of the functions in [XQuery 1.0 and XPath 2.0 Functions and Operators]. These additional typing rules are given in [7 Additional Semantics of Functions]. Here is the list of functions that are given specific typing rules.

  • fn:error
  • fn:distinct-nodes
  • fn:distinct-values
  • op:union
  • op:intersect
  • op:except
  • op:to
  • fn:data

Ed. Note: There might be other functions that need specific typing rules. See [Issue-0135:  Semantics of special functions].

2.5.3 The error function

Some queries result in an error. Errors that are detected at compile-time, like parse errors or type errors, are reported to the user by the system. Dynamic errors are raised using the fn:error() function. General handling of errors in the Formal Semantics is still an open issue (See [Issue-0094:  Static type errors and warnings], [Issue-0098:  Implementation of and conformance levels for static type checking], [Issue-0171:  Raising errors], and [Issue-0169:  Conformance Levels]).

2.5.4 Data Model Accessors

Here is a list of data model constructors or accessors used for Formal Semantics specification.

  • dm:atomic-value
  • dm:children
  • dm:attributes
  • dm:parent
  • dm:name
  • dm:node-kind
  • dm:dereference
  • dm:empty-sequence
  • dm:namespaces
  • dm:type
  • dm:typed-value
  • dm:attribute-node-atomic
  • dm:element-node-atomic
  • dm:text-node

2.5.5 Other Formal Semantics functions

Here is a list of additional functions used for Formal Semantics specification.

  • fs:characters-to-string
  • fs:distinct-doc-order
  • fs:item-sequence-to-node-sequence
  • fs:item-sequence-to-string

2.6 Notations

In order to make the specification of the semantics both concise and precise, formal notations in the form of judgments and inference rules are used. This section, introduces the notations for judgments, inference rules and mapping rules, as well as the notion of an environment.

This section is intended for the reader who may not be familiar with these notations. The reader familiar with these notations can skip this section and go directly to [2.7 The Formal Semantics].

2.6.1 Grammar productions

Grammar productions are used in this document to describe "objects" (values, types, [XPath/XQuery] expressions, etc.) manipulated by the Formal Semantics. The Formal Semantics makes use of several kinds of grammar productions.

XQuery grammar productions describe the XQuery language and expressions. XQuery productions are identified by a number, which corresponds to the number in the [XQuery 1.0: A Query Language for XML] document, and are annotated with "(XQuery)". For instance, the following production describes FLWR expressions in XQuery.

[28 (XQuery)]   FLWRExpr   ::=   ((ForClause |  LetClause)+ WhereClause? OrderByClause? "return")* QuantifiedExpr

For the purpose of this document, the differences between the XQuery 1.0 and the XPath 2.0 grammars are mostly irrelevant. By default, this document uses XQuery 1.0 grammar productions. Whenever the grammar for XPath 2.0 differs from the one for XQuery 1.0, the corresponding XPath 2.0 productions are also given. XPath productions are identified by a number, which corresponds to the number in the document, and are annotated with "(XPath)". For instance, the following production describes for expressions in XPath.

[25 (XPath)]   ForExpr   ::=   (SimpleForClause "return")* QuantifiedExpr

XQuery Core grammar productions describe the XQuery Core. The complete XQuery Core grammar is given in [A Normalized core grammar]. XQuery Core productions are identified by a number, which corresponds to the number in [A Normalized core grammar], and are annotated by "(Core)". For instance, the following production describes the simpler form of "for" expressions present in the XQuery Core.

[23 (Core)]   ForExpr   ::=   (ForClause OrderByClause? "return")* TypeswitchExpr

The Formal Semantics sometimes needs to manipulate "objects" (values, types, expressions, etc.) for which there is no existing grammar production in the [XQuery 1.0: A Query Language for XML] document. In these cases, specific grammar productions are introduced. Notably, additional productions are used to describe the [XPath/XQuery] type system. XQuery Formal Semantics productions are identified by a number, and are annotated by "(Formal)". For instance, the following production describes global type definitions in the [XPath/XQuery] type system.

[37 (Formal)]   Definition   ::=   (<"define" "element"> ElementName Substitution? Nillable? TypeReference)
|  (<"define" "attribute"> AttributeName TypeReference)
|  (<"define" "type"> TypeName TypeDerivation)

Note that grammar productions which are specific to the Formal Semantics (i.e., with the "(Formal)" annotation) are not part of [XPath/XQuery]. They are not accessible to the user and are only used in the course of defining the language's semantics.

2.6.2 Judgments

The basic building block of the formal specification is called a judgment. A judgment expresses whether a property holds or not.

For example:

Notation

The judgment

Painting is beautiful

holds if the painting Painting is beautiful.

Or more relevant, here are two example of judgments that are used extensively in the rest of this document.

Notation

The judgment

Expr => Value

holds if the expression Expr yields (or evaluates to) the value Value.

Notation

The judgment

Expr : Type

holds when the expression Expr has type Type.

A judgment can contain symbols and patterns.

Symbols are purely syntactic and are used to write the judgment itself. In general, symbols in a judgment are chosen to reflect its meaning. For example, 'is beautiful', '=>' and ':' are symbols, the second and third of which should be read "yields", and "has type" respectively.

Patterns are written with italicized words. The name of a pattern is significant: each pattern name corresponds to an "object" (a value, a type, an expression, etc.) that can be substituted legally for the pattern. By convention, all patterns in the Formal Semantics correspond to grammar non-terminals, and are used to represent entities that can be constructed through application of the corresponding grammar production. For example, Expr can be used to represent any [XPath/XQuery] expression, Value can be used to represent any value in the [XPath/XQuery] data model.

When applying the judgment, each pattern must be instantiated to an appropriate sort of "object" (value, type, expression, etc). For example, '3 => 3' and '$x+0 => 3' are both instances of the judgment 'Expr => Value'. Note that in the first judgment, '3' corresponds to both the expression '3' (on the left-hand side of the => symbol) and to the the value '3' (on the right-hand side of the => symbol).

Patterns may appear with subscripts (e.g. Expr1, Expr2), which simply name different instances of the same sort of pattern. Each distinct pattern must be instantiated to a single "object" (value, type, expression, etc.). If the same pattern occurs twice in a judgment description then it should be instantiated with the same "object". For example, '3 => 3' is an instance of the judgment 'Expr1 => Expr1' but '$x+0 => 3' is not since the two expressions '$x+0' and '3' cannot be both instance of the pattern Expr1. On the contrary, '$x+0 => 3' is an instance of the judgment 'Expr1 => Expr2'.

In a few cases, patterns may have a name which is not exactly the name of a grammar production but is based on it. For instance, a BaseTypeName is a pattern which stands for a type name, as would TypeName, or TypeName2. This usage is limited, and only occurs to improve the readability of some of the inference rules.

2.6.3 Inference rules

Whether a judgments holds or not is specified by means of inference rules. Inference rules express the logical relation between judgments and describe how complex judgments can be concluded from simpler premise judgments. (As explained in [2.1 Processing model], this approach lends itself well to bottom-up algorithms.)

A logical inference rule is written as a collection of premises and a conclusion, respectively written above and below a dividing line:

premise1 ... premisen

conclusion

All premises and the conclusion are judgments. The interpretation of an inference rule is: if all the premise judgments above the line hold, then the conclusion judgment below the line must also hold.

Here is a simple example of inference rule, which uses the example judgment 'Expr => Value' from above:

$x => 0      3 => 3

$x + 3 => 3

This inference rule expresses the following property of the judgment 'Expr => Value': if the variable expression '$x' yields the value '0', and the literal expression '3' yields the value '3', then the expression '$x + 3' yields the value '3'.

It is also possible for an inference rule to have no premises above the line to have no judgments at all; this simply means that the expression below the line always holds:


3 => 3

This inference rule expresses the following property of the judgment 'Expr => Value': evaluating the literal expression '3' always yields the value '3'.

The two above rules are expressed in terms of specific variables and values, but usually rules are more abstract. That is, the judgments they relate contain patterns. For example, here is a rule that says that for any variable Variable that yields the integer value Integer, adding '0' yields the same integer value:

Variable => Integer

Variable + 0 => Integer

As in a judgment, each pattern in a particular inference rule must be instantiated to the same "object" within the entire rule. This means that one can talk about "the value of Variable" instead of the more precise "what Variable is instantiated to in (this particular instantiation of) the inference rule".

2.6.4 Environments

Logical inference rules use environments to record information computed during static type analysis or dynamic evaluation so that this information can be used by other logical inference rules. For example, the type signature of a user-defined function in a [expression/query] prolog can be recorded in an environment and used by subsequent rules. Similarly, the value assigned to a variable within a "let" statement can be captured in an environment and used for further evaluations.

An environment is a dictionary that maps a symbol (e.g., a function name or a variable name) to an "object" (e.g., a function body, a type, a value). One can either access existing information from an environment, or update the environment.

If "env" is an environment, then "env(symbol)" denotes the "object" to which symbol is mapped to. The notation is intentionally akin to function application as an environment can be seen as a "function" from the argument symbol to the "object" that the symbol is mapped to.

This specification uses environment groups that group related environments. If "env" is an environment group with the member "mem", then that environment is denoted "env.mem" and the value that it maps symbol to is denoted "env.mem(symbol)".

Updating is only defined on environment groups:

  • "env [ mem(symbol |-> object) ]" denotes the a new environment group which is identical to env except that the mem environment has been updated to map symbol to object. The notation symbol |-> object indicates that symbol is mapped to object in the new environment.

  • If the "object" is a type then the following conventional variant notation is used: "env [ mem(symbol : object) ]".

  • The following shorthand is also allowed: "env [ mem( symbol1 |-> object1 ; ... ; symboln |-> objectn ) ]" in which each symbol is mapped to a corresponding object in the new environment.

    This notation is equivalent to nested simple updates, as in " (...(env[mem(symbol1 |-> object1)]) ... [mem(symboln |-> objectn)])".

Note that updating the environment overrides any previous binding that might exist for the same name. Updating the environment is used to capture the scope of variables, namespaces, etc. Also, note that there are no operations to remove entries from environments: the need never arises because the environment group from "before" the update remains accessible concurrently with the updated version.

Environments are typically used as part of a judgment, to capture some of the context in which the judgment is computed. Indeed, most judgments are computed assuming that some environment is given. This assumption is denoted by prefixing the judgment with "env |-". The "|-" symbol is called a "turnstile" and is ubiquitous in inference rules.

For instance, the judgment

dynEnv |- Expr => Value

should be read: assuming the dynamic environment dynEnv, the expression Expr yields the value Value.

The two main environments used in the Formal Semantics are: a dynamic environment (dynEnv), which captures the [XPath/XQuery]'s dynamic context, and a static environment (statEnv), which captures the [XPath/XQuery]'s static context. Both are defined in [4.1 Expression Context].

2.6.5 Putting it together

Putting the above notations together, here is an example of an inference rule that occurs later in this document:

statEnv |- Expr1 : Type1      statEnv |- Expr2 : Type2

statEnv |- Expr1 , Expr2 : Type1, Type2

This rule is read as follows: if two expressions Expr1 and Expr2 have already been statically inferred to have types Type1 and Type2 (the two premises above the line), then it is the case that the expression below the line "Expr1 , Expr2" must have the type "Type1, Type2", which is the sequence of types Type1 and Type2.

The above inference rule, does not modify the (static) environment. The following rule defines the static semantics of a "let/return" expression. The binding of the new variable is captured by an update to the varType component of the original static environment.

statEnv |- Expr1 : Type1      statEnv [ varType(QName : Type1) ] |- Expr2 : Type2

statEnv |- let $QName := Expr1 return Expr2 : Type2

This rule should be read as follows. First, the type Type1 for the "let" input expression Expr1 is computed. Second the "let" variable is added into the varType member of the static environment group statEnv, with type Type1. Finally, the type Type2 of Expr2 is computed in that new environment.

Ed. Note: Jonathan suggests that we should explain 'chain' inference rules. I.e., how several inference rules are applied recursively.

2.7 The Formal Semantics

Finally, this section introduces the top-level notations defining the processing model phases described above. These notations are used to define the normalization, static type analysis, and dynamic evaluation processing phases.

2.7.1 Normalization

Normalization is specified using mapping rules which describe how a [XPath/XQuery] expression is rewritten into another expression in the [XPath/XQuery] Core. Note that mapping rules are also used in [8 Importing Schemas] to specify how XML Schemas are imported into the [XPath/XQuery] Type System.

Notation

Mapping rules are written using a square bracket notation, as follows:

 
[Object]Subscript
==
Mapped Object

The original "object" is written above the == sign. The rewritten "object" is written beneath the == sign. The subscript is used to indicate what kind of "object" is mapped, and sometimes to pass some information between mapping rules.

(Since normalization is always done in the context of the static context the above is really a shorthand for

statEnv |- [Object] Subscript == Mapped Object

but we shall stay with the shorthand because statEnv will always be implied.)

The static environment is used in certain cases (e.g. for normalization of function calls) during normalization. To keep the notation simpler, the static environment is not written in the normalization rules, but it is assumed to be available.

Ed. Note: [Kristoffer/XSL] We should decide whether to use a shorthand notation as suggested or modify the mapping rules throughout.

Specifically the normalization rule that is used to map "top-level" expressions in the [XPath/XQuery] syntax into expressions in the [XPath/XQuery] Core is

 
[Expr]Expr
==
CoreExpr

which indicates that the expression Expr is normalized to the expression CoreExpr in the [XPath/XQuery] core (with the implied statEnv).

Example

For instance, the following [expression/query]

    for $i in (1, 2),
        $j in (3, 4)
    return
      element pair { ($i,$j) }

is normalized to the core expression

    for $i in (1, 2) return
      for $j in (3, 4) return
        return
          element pair { ($i,$j) }

in which the complex "FWLR" expression is mapped into a composition of two simpler "for" expressions.

2.7.2 Static type inference

The static semantics is specified using type inference rules, which relate [XPath/XQuery] expressions to types and specify under what conditions an expression is well typed.

Notation

The judgment

statEnv |- Expr : Type

holds when, assuming the static environment statEnv is given, the expression Expr has type Type.

Example

The result of static type inference is to associate a static type with every [expression/query], such that any evaluation of that [expression/query] is guaranteed to yield a value that belongs to that type.

For instance, the following expression.

   let $v := 3
   return
       $v+5

has type xs:integer. This can be inferred as follows: the input literals '3' and '5' have type integer, so the variable $v also has type integer. Since the sum of two integers is an integer, the complete expression has type integer.

Note

The type of an expression is computed by inference. Static type inference rules are given to describe, for each kind of expression, how to compute the type of the expression given the types of its sub-expressions. Here is a simple example of such a rule:

statEnv |- Expr1 : xs:boolean      statEnv |- Expr2 : Type2      statEnv |- Expr3 : Type3

statEnv |- if Expr1 then Expr2 else Expr3 : ( Type2 | Type3 )

This rule states that if the conditional expression of an "if" expression has type boolean, then the type of the entire expression is one of the two types of its "then" and "else" clauses. Note that the resulting type is represented as a choice: '(Type2|Type3)'.

The "left half" of the expression below the line (the part before the :) corresponds to some phrase in a [expression/query], for which a type is computed. If the [expression/query] has been parsed into an internal parse tree, this usually corresponds to some node in that tree. The expression usually has patterns in it (here Expr1, Expr2, and Expr3) that need to be matched against the children of the node in the parse tree. The expressions above the line indicate things that need to be computed to use this rule; in this case, the types of the two sub-expressions of the "," operator. Once those types are computed (by further applying static inference rules recursively to the expressions on each side), then the type of the expression below the line can be computed. This illustrates a general feature of the type system: the type of an expression depends only on the type of its sub-expressions. The overall static type inference algorithm is recursive, following the parse structure of the [expression/query]. At each point in the recursion, an appropriate matching inference rule is sought; if at any point there is no applicable rule, then static type inference has failed and the [expression/query] is not type-safe.

2.7.3 Dynamic Evaluation

The dynamic, or operational, semantics is specified using value inference rules, which relate [XPath/XQuery] expressions to values, and in some cases specify the order in which an [XPath/XQuery] expression is evaluated.

Notation

The judgment

statEnv ; dynEnv |- Expr => Value

holds when, assuming the static environment statEnv and dynamic environment dynEnv are given, the expression Expr yields the value Value.

The static environment is used in certain cases (e.g. for type matching) during evaluation. To keep the notation simpler, the static environment is not written in the dynamic inference rules, but it is assumed to be available.

Example

For instance, the following expression.

   let $v := 3
   return
       $v+5

yields the integer value 8. This can be inferred as follows: the input literals '3' and '5' denote the values 3 and 5, respectively, so the variable $v has the value 3. Since the sum of 3 and 5 is 8, the complete expression has the value 8.

Note

As with static type inference, logical inference rules are used to determine the value of each expression, given the dynamic environment and the values of its sub-expressions.

The inference rules used for dynamic evaluation, like those for static type inference, follow a top-down recursive structure, computing the value of expressions from the values of their sub-expressions.

2.7.4 Formal Semantics internal links

Note that for convenience, the formal semantics features internal hyperlinks to be able to recall of the judgments and environments being used in inference rules. Clicking on a judgment or environment should point to the location where that judgment or environment has been defined.

3 The XQuery type system

The [XPath/XQuery] type system is used for the specification of both the dynamic and the static semantics of [XPath/XQuery]. It is used to describe the semantics of expressions on sequence types and in type conversion rules. It is also the basis for the static semantics of [XPath/XQuery].

3.1 Values

This section introduces formal notations for describing XQuery values. Those notations are used extensively to describe and manipulate values in the inference rules. The formal notations for values introduced here are not exposed to the XQuery user.

A value is a sequence of zero or more items. An item is either a node or an atomic value. A node is either an element, an attribute, a document node, or a text node. Text nodes always contain string values of type xs:anySimpleType. Elements, attributes, and atomic values have a type annotation.

A atomic value encapsulates an XML Schema atomic type (which is its type annotation) and a corresponding value of that type. An XML Schema atomic type [XML Schema Part 2] may be primitive or derived, or xs:anySimpleType. The corresponding atomic type value may be a value in the value space of one of the 19 primitive XML Schema types.

Type annotations are always present. A type annotation can be either the QName of a declared type, or an anonymous type. Anonymous types corresponds to XML Schema types for which the user did not write an explicit names. Those type names are not visible to the user, but are generated during schema validation and used as an annotation in the data model. In the Formal Semantics, we write those type names with a a special notation: [Anon0], [Anon1], etc.

Untyped elements (e.g., in from well-formed documents) are annotated with xs:anyType, untyped attributes are annotated with xs:anySimpleType, and untyped atomic values (i.e., text content or attribute content in well-formed documents) are annotated with xs:anySimpleType.

A value is called a simple value if it consists of a sequence of zero or more atomic values.

[7 (Formal)]   Value   ::=   Item
|  (Value "," Value)
|  ("(" ")")
[8 (Formal)]   SimpleValue   ::=   AtomicValue
|  (SimpleValue "," SimpleValue)
|  ("(" ")")
[9 (Formal)]   ElementValue   ::=   "element" ElementName TypeAnnotation "{" Value "}"
[10 (Formal)]   AttributeValue   ::=   "attribute" AttributeName TypeAnnotation "{" SimpleValue "}"
[11 (Formal)]   DocumentValue   ::=   "document" "{" Value "}"
[12 (Formal)]   TextValue   ::=   "text" "{" String "}"
[13 (Formal)]   NodeValue   ::=   ElementValue
|  AttributeValue
|  DocumentValue
|  TextValue
[14 (Formal)]   Item   ::=   NodeValue
|  AtomicValue
[15 (Formal)]   AtomicValue   ::=   AtomicTypeValue TypeAnnotation
[16 (Formal)]   AtomicTypeValue   ::=   String
|  Boolean
|  Decimal
|  Float
|  Double
|  Duration
|  DateTime
|  Time
|  Date
|  GYearMonth
|  GYear
|  GMonthDay
|  GDay
|  GMonth
|  HexBinary
|  Base64Binary
|  AnyURI
|  QName
|  NOTATION
[17 (Formal)]   TypeAnnotation   ::=   <"of" "type"> TypeName
[18 (Formal)]   ElementName   ::=   QName
[21 (Formal)]   AttributeName   ::=   QName
[25 (Formal)]   TypeName   ::=   QName |  AnonymousTypeName
[24 (Formal)]   AnonymousTypeName   ::=   [Anon1] |  [Anon2]

Notation

In the above grammar, the atomic type names are used to represents the value space of the corresponding XML Schema atomic type. For instance, "String" indicates the value space of xs:string, and "Decimal" indicates the value space of xs:decimal.

Note that the same rule about constructing sequences apply to the values described by that grammar. Notably sequences cannot be nested. For example, the sequence (10, (1, 2), (), (3, 4)) is equivalent to the sequence (10, 1, 2, 3, 4).

Ed. Note: Issue: The formal semantics does not represent comments and process-instructions (See [Issue-0143:  Support for PI, comment and namespace nodes]).

Ed. Note: Issue: This formal notation for values only represents either text nodes or values. (See [Issue-0144:  Representation of text nodes in formal values])

Examples

A well-formed document

  <fact>The cat weighs <weight units="lbs">12</weight> pounds.</fact>

In the absence of a Schema, this document is represented as

  element fact of type xs:anyType {
    text { "The cat weighs " },
    element weight of type xs:anyType {
      attribute units of type xs:anySimpleType {
        "lbs" of type xs:anySimpleType
      }
      text { "12" }
    },
    text { " pounds." }
  }

A document before and after validation.

  <weight xsi:type="xs:integer">42</weight>

The formal model can represent values before and after validation. Before validation, this element is represented as:

  element weight of type xs:anyType {
    attribute xsi:type of type xs:anySimpleType {
      "xs:integer" of type xs:anySimpleType
    },
    text { "42" }
  }

After validation, this element is represented as:

  element weight of type xs:integer {
    attribute xsi:type of type xs:QName {
      "xs:integer" of type xs:QName
    },
    42 of type xs:integer
  }

Note that XML Schema permits attributes with name xsi:type and xsi:nil to appear on any element.

An element with a list type

  <sizes>1 2 3</sizes>

Before validation, this element is represented as:

  element sizes of type xs:anyType {
    text { "1 2 3" }
  }

Assume the following Schema.

  <xs:element name="sizes" type="sizesType"/>
  <xs:simpleType name="sizesType">
    <xs:list itemType="sizeType"/>
  </xs:simpleType>
  <xs:simpleType name="sizeType">
    <xs:restriction base="xs:integer"/>
  </xs:simpleType>

After validation against this Schema, the element is represented as:

  element sizes of type sizesType {
    1 of type sizeType,
    2 of type sizeType,
    3 of type sizeType
  }

An element with an anonymous type

  <sizes>1 2 3</sizes>

Before validation, this element is represented as:

  element sizes of type xs:anyType {
    text { "1 2 3" }
  }

Assume the following Schema.

  <xs:element name="sizes">
    <xs:simpleType>
      <xs:list itemType="xs:integer"/>
    </xs:simpleType>
  </xs:element>

After validation, this element is represented as:

  element sizes of type [Anon1] {
    1 of type xs:integer,
    2 of type xs:integer,
    3 of type xs:integer
  }

where [Anon1] stands for the internal anonymous name generated by the system for the sizes element.

An element with a union type

  <sizes>1 two 3 four</sizes>

Before validation, this element is represented as:

  element sizes of type xs:anyType {
    text { "1 two 3 four" }
  }

Assume the following Schema:

  <xs:element name="sizes" type="sizesType"/>
  <xs:simpleType name="sizesType">
    <xs:list itemType="sizeType"/>
  </xs:simpleType>
  <xs:simpleType name="sizeType">
    <xs:union memberType="xs:integer xs:string"/>
  </xs:simpleType>

After validation against this Schema, the element is represented as:

  element sizes of type sizesType {
    1 of type xs:integer,
    "two" of type xs:string,
    3 of type xs:integer,
    "four" of type xs:string
  }

3.2 Types

This section introduces formal notations for describing XQuery types. Those notations are used extensively to describe and manipulate types in the inference rules. The formal notations for types introduced here are not exposed to the XQuery user.

3.2.1 Item types

An item type is either an atomic type, an element type, an attribute type, a document node type, a text node type, or the untyped type (i.e., which is the type of untyped text values).

[28 (Formal)]   ItemType   ::=   AtomicTypeName
|  ElementType
|  AttributeType
|  "document"
|  "text"
|  "untyped"
[29 (Formal)]   AtomicTypeName   ::=   QName
[30 (Formal)]   ElementType   ::=   "element" ElementName? Nillable? TypeReference?
[31 (Formal)]   AttributeType   ::=   "attribute" AttributeName? TypeReference?
[32 (Formal)]   Nillable   ::=   "nillable"
[34 (Formal)]   TypeReference   ::=   <"of" "type"> TypeName

An element or attribute type has an optional name and an optional type reference. A name alone corresponds to reference to a global element or attribute declaration. A name with a type reference corresponds to a local element or attribute declaration. The word "element" or "attribute" alone refers to the wildcard types any element or any attribute. In addition, an element type have an optional nillable flag which indicates whether the element can be nilled or not.

Examples

A text node type

  text

matches any text node, such as:

  text { "Text is beautiful" }

A wildcard element

  element

matches any element.

A wildcard element of type string

  element of type xs:integer 

matches any element of type string, such as:

  element name of type xs:string { "John Doe" }

A nillable element of type string

  element size nillable of type xs:integer 

matches any element with name size of type xs:integer, such as:

  element size of type xs:integer {
    2 of type xs:integer
  }

or it matches any element with name size which has no content and the xsi:nil attribute set to true, such as:

  element size of type xs:integer {
    attribute xsi:nil of type xs:anySimpleType { "true" }
  }

A reference to a globally declared attribute

  attribute sizes

refers to the global declaration for the attribute sizes.

An element with an anonymous type

  element sizes of type [Anon1]

matches any element with name sizes and the anonymous type [Anon1], such as:

  element sizes of type [Anon1] {
    1 of type xs:integer,
    2 of type xs:integer,
    3 of type xs:integer
  }

3.2.2 Content models

Following XML Schema, types in [XPath/XQuery] are composed from item types by optional, one or more, zero or more, all group, sequence, choice, empty sequence, or empty choice (written none).

The type none matches no values. It is called the empty choice because it is the identity for choice, that is (Type | none) = Type)). The type none appears in the definition of prime types in [3.6 Judgments for the typeswitch expression], and is the static type for [2.5.3 The error function].

[26 (Formal)]   Type   ::=   ItemType
|  (Type Occurrence)
|  (Type "&" Type)
|  (Type "," Type)
|  (Type "|" Type)
|  ("(" ")")
|  "none"
[27 (Formal)]   Occurrence   ::=   "*" |  "+" |  "?"

The type system includes three binary operators on types: ",", "|" and "&", corresponding respectively to sequence, choice and all groups in Schema. The type system includes three unary operators on types: "*", "+", and "?", corresponding respectively to zero or more instances of the type, one or more instances of the type, or an optional instance of the type.

The "&" operator builds the "interleaved product" of two types. The type Type1 & Type2 matches any sequence that is an interleaving of a sequence that matches Type1 and a sequence that matches Type2.

The interleaved product is used to represent all groups in XML Schema. All group in XML Schema are restricted to apply only on global or local element declarations with lower bound 0 or 1, and upper bound 1.

Examples

A sequence of elements

The "," operator builds the "sequence" of two types. For example,

  element title of type xs:string, element year of type xs:integer

is a sequence of an element title of type string followed by an element year of type integer.

A choice between two elements

The "|" operator builds a "choice" between two types. For example,

  element editor of type xs:string | element bib:author

means either an element editor of type string, or a reference to the gobal element bib:author.

An all group of two elements

The "&" operator builds the "interleaved product" of two types. For example,

  (element a & element b) =
    element a, element b
  | element b, element a

which specifies that the a and b elements can occur in any order.

An empty type

The following type matches the empty sequence.

  ()

A sequence of zero or more elements

The following type matches zero or more elements each of which can be a surgeon or a plumber.

  (element surgeon | element plumber)*

3.2.3 Top level definitions

At the top level, one can define elements, attributes, and types.

[37 (Formal)]   Definition   ::=   (<"define" "element"> ElementName Substitution? Nillable? TypeReference)
|  (<"define" "attribute"> AttributeName TypeReference)
|  (<"define" "type"> TypeName TypeDerivation)
[38 (Formal)]   Substitution   ::=   <"substitutes" "for"> ElementName
[33 (Formal)]   TypeDerivation   ::=   Derivation? Mixed? "{" Type? "}"
[35 (Formal)]   Derivation   ::=   ("restricts" TypeName)
|  ("extends" TypeName)
[36 (Formal)]   Mixed   ::=   "mixed"

A type definition has a name (possibly anonymous) and a type derivation. A type derivation indicates if the type is derived by extension or restriction from its based type, gives its content model, and an optional flag indicating if it has mixed content.

A type is derived either by extension or restriction from a base type. In case the type derivation is omitted, the type derives by restriction from xs:anyType, as in:

  define type Bib { xs:integer } =
  define type Bib restricts xs:anyType { xs:integer }

Empty content can be indicated with the explicit empty sequence, or omitted, as in:

  define type Bib { } =
  define type Bib { () }

Global element and attribute declarations have always a name, and a reference to a (possibly anonymous) type. In addition, a global element declaration may declare a substitution group for the element and whether the element is nillable.

Examples

A type declaration with complex content

  define type Address {
    element name of type xsd:string,
    element street of type xsd:string*
  }

A type declaration with complex content derived by extension

  define type USAddress extends Address {
    element zip name of type xsd:integer
  }

A type declaration with mixed content

  define type Section mixed {
    (element h1 of type xsd:string |
     element p of type xsd:string |
     element div of type Section)*
  }

A type declaration with simple content derived by restriction

  define type SKU restricts xsd:string { xsd:string }

An element declaration

  define element address of type Address

An element declaration with a substitution group

  define element usaddress substitutes for address of type USAddress

An element declaration which is nillable

  define element zip nillable of type xs:integer

3.2.4 Built-in type declarations

The following type definitions capture the primitive types from Schema.

  define type xs:string restricts xs:anySimpleType { xs:string }
  define type xs:decimal restricts xs:anySimpleType { xs:decimal }
  ...

In the above definitions, each name (such as xs:string) appears twice. The first appearance declares the relation between type names (xs:string is derived by restriction from xs:anySimpleType). The second declares the value space for the type (the value space of xs:string is described by xs:string, and corresponds to the value space as defined in the XML Schema specification).

The following type definition captures the Ur simple type from Schema.

  define type xs:anySimpleType restricts xs:anyType {
    ( xs:string | xs:decimal | ... )*
  }

The name of the Ur simple type is xs:anySimpleType. It is derived by restriction from xs:anyType, its content is given by the union of all primitive atomic types.

Note that there is no separate "value space" for the type untyped. It has the same value space as xs:string, but its type annotation is xs:anySimpleType. The following example shows the distinction between a value of type string containing "Database" and an untyped value containing "Database": both are using string values as content with different type annotations.

"Databases" of type xs:string
"Databases" of type xs:anySimpleType

The following type definition captures the Ur type from Schema.

  define type xs:anyType restricts xs:anyType {
    attribute*, ( xs:anySimpleType | ( element* & text* ) )
  }

The xs:anyType is defined by restriction from itself, and it can contain any number of attributes, and either a simple content or a complex content with any number of elements or text nodes.

3.2.5 Syntactic constraints on types

The type system given in [3.2 Types] gives a simple but overly general definition of types. For instance, the above type grammar allows types to describe sequences of attributes and elements in any order. For example, the following type, describing a sequence of zero or more elements or attributes with name "year", is allowed.

  (element year of type xs:string | attribute year of type xs:string)*

There are two reasons for this approach. First, it makes the grammar for types simpler. Second, it allows to describe types for heterogeneous sequences, that can result from [XPath/XQuery] expressions but cannot be represented as an XML Schema.

When constructing elements, or attributes, some additional constraints on type must to be checked, for which this grammar is too general. This section gives auxiliary grammar productions which impose additional constraints on the content of elements and attributes.

First, a grammar for simple types is defined. This grammar is a subset of the grammar for types. A simple type is composed from atomic types by optional, one or more, zero or more, choice, or empty choice.

[39 (Formal)]   SimpleType   ::=   AtomicTypeName
|  (SimpleType Occurrence)
|  (SimpleType "|" SimpleType)
|  ("(" ")")
|  "none"

An all group contains only attributes or only elements; attributes or elements in an all group may be optional but not repeated; attributes always precede other content of an element.

[40 (Formal)]   AttributeAll   ::=   AttributeType
|  (AttributeType "?")
|  (AttributeAll "&" AttributeAll)
|  ("(" ")")
[41 (Formal)]   ElementAll   ::=   ElementType
|  (ElementType "?")
|  (ElementAll "&" ElementAll)
|  ("(" ")")
|  "none"
[42 (Formal)]   ElementNoAll   ::=   ElementType
|  "text"
|  (ElementType Occurrence)
|  (ElementAll "," ElementAll)
|  (ElementAll "|" ElementAll)
|  ("(" ")")
[0 (Formal)]   ElementContent   ::=   "ElementContent"
[43 (Formal)]   ElementBody   ::=   AttributeAll "ElementContent"

The type in an element type should always match the grammar for ElementBody given above. (Note that in case the element type is omitted altogether, e.g., like in element a { }, the type is implicitely matching the empty sequence.)

Those auxiliary productions is be used when matching the type content of elements or attributes within inference rules.

3.2.6 Example

Here is a schema describing purchase orders, taken from the XML Schema Primer, followed by its mapping into the [XPath/XQuery] type system. The complete mapping from XML Schema into the [XPath/XQuery] type system is given in [8 Importing Schemas].

  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  
   <xsd:annotation>
    <xsd:documentation xml:lang="en">
     Purchase order schema for Example.com.
     Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
   </xsd:annotation>
  
   <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
  
   <xsd:element name="comment" type="xsd:string"/>
  
   <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
     <xsd:element name="shipTo" type="USAddress"/>
     <xsd:element name="billTo" type="USAddress"/>
     <xsd:element ref="comment" minOccurs="0"/>
     <xsd:element name="items"  type="Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
   </xsd:complexType>
  
   <xsd:complexType name="USAddress">
    <xsd:sequence>
     <xsd:element name="name"   type="xsd:string"/>
     <xsd:element name="street" type="xsd:string"/>
     <xsd:element name="city"   type="xsd:string"/>
     <xsd:element name="state"  type="xsd:string"/>
     <xsd:element name="zip"    type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:NMTOKEN"
  	fixed="US"/>
   </xsd:complexType>
  
   <xsd:complexType name="Items">
    <xsd:sequence>
     <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
      <xsd:complexType>
  	<xsd:sequence>
  	 <xsd:element name="productName" type="xsd:string"/>
  	 <xsd:element name="quantity">
  	  <xsd:simpleType>
  	   <xsd:restriction base="xsd:positiveInteger">
  	    <xsd:maxExclusive value="100"/>
  	   </xsd:restriction>
  	  </xsd:simpleType>
  	 </xsd:element>
  	 <xsd:element name="USPrice"  type="xsd:decimal"/>
  	 <xsd:element ref="comment"   minOccurs="0"/>
  	 <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
  	</xsd:sequence>
  	<xsd:attribute name="partNum" type="SKU" use="required"/>
      </xsd:complexType>
     </xsd:element>
    </xsd:sequence>
   </xsd:complexType>
  
   <!-- Stock Keeping Unit, a code for identifying products -->
   <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
     <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
   </xsd:simpleType>
  
  </xsd:schema>
  namespace xsd = "http://www.w3.org/2001/XMLSchema"

  define element purchaseOrder of type ipo:PurchaseOrderType
 
  define element comment of type xsd:string
  
  define type PurchaseOrderType {
    attribute orderDate of type xsd:date?,
    element shipTo of type USAddress,
    element billTo of type USAddress,
    element ipo:comment?,
    element items of type Items
  }

  define type USAddress {
    attribute country of type xsd:NMTOKEN?,
    element name of type xsd:string,
    element street of type xsd:string,
    element city of type xsd:string,
    element state of type xsd:string,
    element zip of type xsd:decimal
  }

  define type Items {
    attribute partNum of type SKU,
    element item of type [Anon1]*
  }

  define type [Anon1] {
    element productName of type xsd:string,
    element quantity of type [Anon2],
    element USPrice of type xsd:decimal,
    element comment?,
    element shipDate of type xsd:date?
  }

  define type [Anon2] restricts xsd:positiveInteger { xsd:positiveInteger }

  define type SKU restrict xsd:string { xsd:string }

Note the way the two anonymous types inside the item element are handled by giving them anonymous names [Anon1] and [Anon2].

The following additional definitions illustrate how more advanced XML Schema features (a complex type derived by extension, an anonymous simple type derived by restriction, and substitution groups) are represented in the XQuery type system.

  <complexType name="NYCAddress">
    <complexContent>
     <extension base="USAddress">
      <sequence>
       <element ref="appt"/>
      </sequence>
     </extension>
    </complexContent>
  </complexType>

  <element name="appt"/>
    <xsd:simpleType>
     <xsd:restriction base="xsd:positiveInteger">
      <xsd:maxExclusive value="10000"/>
     </xsd:restriction>
    </xsd:simpleType>
  <element>

  <element name="usaddress" substitutionGroup="address" type="USAddress"/>
  <element name="nycaddress" substitutionGroup="usaddress" type="NYCAddress"/>

Those definitions are written in the type system as follows.

  define type NYCAddress extends USAddress {
    element appt
  }

  define element appt of type [Anon3]

  define type [Anon3] restricts xsd:positiveInteger

  define element usaddress  substitutes for address of type USAddress
  define element nycaddress substitutes for usaddress of type NYCAddress

3.3 Judgments for name hierarchies

Introduction

Many static and dynamic inference rules need to operate on element, attributes and type names. This section defines several judgments on names used in the rest of the Formal Semantics.

The first two judgments are capturing the derivation hiearchy and the substitution group hierarchy. The we introduce judgments to compute the "most precise" name within those hierarchies.

3.3.1 Derives

Notation

The judgment

statEnv |- TypeName derives from TypeName

holds when the first type name derives from the second type name.

Example

For example, assuming the extended XML Schema given in section [3.2.6 Example], then the following judgments hold.

  USAddress            derives from  xs:anyType
  NYCAddress           derives from  USAddress
  NYCAddress           derives from  xs:anyType
  xsd:positiveInteger  derives from  xsd:integer
  xsd:integer          derives from  xs:anySimpleType
  [Anon3]              derives from  xsd:positiveInteger
  [Anon3]              derives from  xsd:integer
  [Anon3]              derives from  xs:anySimpleType
  [Anon3]              derives from  xs:anyType

Note

Derivation is a partial order. It is reflexive and transitive by the definition below. It is asymmetric because no cycles are allowed in derivation by restriction or extension.

Semantics

This judgment is specified by the following rules.

Some rules have hypotheses that simply list a type, element, or attribute declaration.

Every type name derives from itself.


statEnv |- TypeName derives from TypeName

Every type name derives from the type it is declared to derive from by restriction or extension.

statEnv.typeDefn(TypeName) => define type TypeName extends BaseTypeName Mixed? { Type? }

statEnv |- TypeName derives from BaseTypeName

statEnv.typeDefn(TypeName) => define type TypeName restricts BaseTypeName Mixed? { Type? }

statEnv |- TypeName derives from BaseTypeName

Derivation is transitive.

statEnv |- TypeName1 derives from TypeName2      statEnv |- TypeName2 derives from TypeName3

statEnv |- TypeName1 derives from TypeName3

3.3.2 Substitutes

The substitutes judgment is used to know whether an element name is in the substitution group of another element name.

Notation

The judgment

statEnv |- ElementName substitutes for ElementName

holds when the first element name substitutes for the second element name.

Example

For example, assuming the extended XML Schema given in section [3.2.6 Example], then the following judgments hold.

  usaddress  substitutes element for  address
  nyaddress  substitutes element for  usaddress
  nyaddress  substitutes element for  address

Note

Substitution is a partial order. It is reflexive and transitive by the definition below. It is asymmetric because no cycles are allowed in substitution groups.

Semantics

The substitutes judgment for element names is specified by the following rules.

Every element name substitutes for itself.


statEnv |- ElementName substitutes for ElementName

Every element name substitutes for the element it is declared to substitute for.

statEnv.elemDecl(ElementName) => define element ElementName substitutes for BaseElementName Nillable? TypeReference

statEnv |- ElementName substitutes for BaseElementName

Substitution is transitive.

statEnv |- ElementName1 substitutes for ElementName2      statEnv |- ElementName1 substitutes for ElementName3

statEnv |- ElementName1 substitutes for ElementName3

3.4 Judgments for type matching

Introduction

XQuery supports type declarations on variable bindings, and several operations on types (typeswitch, instance of, etc). This section describes judgments used for the specification of the semantics of those operations.

  • The "match" judgment specifies formally type matching. It takes as input a value and a type and either succeeds or fails. It is used in matching parameters against function signatures, type declarations, and matching values against cases in "typeswitch". An informal description of type matching is given in [2.4.2 SequenceType] in [XQuery 1.0: A Query Language for XML].

  • The "subtyping" judgment takes two types and succeeds if all values matching the first type also match the second. It is used to define the static semantics of operations using type matching.

Type matching and subtyping need to lookup the name and type annotations for any attribute and element types. First, an auxiliary judgment to that effect is defined.

3.4.1 Element and attribute lookup

Element and attribute types can be of various kinds: reference to a global element or attribute declaration, local element with or without a wildcard, an element can be nillable or not, etc. The lookup judgments are used to obtain the appropriate type annotations from any given attribute or element type.

Notation

The judgment

statEnv |- ElementName lookup ElementType yields Nillable? TypeReference

holds when matching an element with the given element name against the given element type requires that the element be nillable as indicated and matches the type reference.

Example

For example, assuming the extended XML Schema given in section [3.2.6 Example], then the following judgments hold.

  comment    lookup element comment                          yields of type xsd:string
  size       lookup element size nillable of type xs:integer yields nillable of type xsd:string
  appt       lookup element appt                             yields of type [Anon3]
  nycaddress lookup element address                          yields of type NYCAddress

Note that in the case the element name is in the substitution group, the lookup returns the type name of corresponding to the original element name (here the type NYCAddress for the element nycaddress, instead of Address for the element address).

Semantics

This judgment is specified by the following rules.

If the element type is a reference to a global element, then lookup yields the type reference in the element declaration for the given element name. The given element name must be in the substitution group of the global element.

statEnv |- ElementName1 substitutes for ElementName2
statEnv.elemDecl(ElementName1) => define element ElementName1 Substitution? Nillable? TypeReference

statEnv |- ElementName1 lookup element ElementName2 yields Nillable? TypeReference

If the given element name matches the element name in the element type, and the element type contains a type reference, then lookup yields that type reference.


statEnv |- ElementName lookup element ElementName Nillable? TypeReference yields Nillable? TypeReference

If the element type has no element name but contains a type reference, then lookup yields the type reference.


statEnv |- ElementName lookup element TypeReference yields TypeReference

If the element type has no element name and no type reference, then lookup yields xs:anyType.


statEnv |- ElementName lookup element yields of type xs:anyType

Notation

The judgment

statEnv |- AttributeName lookup AttributeType yields TypeReference

holds when matching an attribute with the given attribute name against the given attribute type matches the type reference.

Example

For example, assuming the extended XML Schema given in section [3.2.6 Example], then the following judgments hold.

  orderDate  lookup  attribute orderDate of type xsd:date  yields  of type xsd:date?
  orderDate  lookup  attribute of type xsd:date            yields  of type xsd:date?

Semantics

This judgment is specified by the following rules.

If the attribute type is a reference to a global attribute, then lookup yields the type reference in the attribute declaration for the given attribute name.

statEnv.attrDecl(AttributeName) => define attribute AttributeName TypeReference

statEnv |- AttributeName lookup attribute AttributeName yields TypeReference

If the given attribute name matches the attribute name in the attribute type, and the attribute type contains a type reference, then lookup yields that type reference.


statEnv |- AttributeName lookup attribute AttributeName TypeReference yields TypeReference

If the attribute type has no attribute name but contains a type reference, then lookup yields the type reference.


statEnv |- AttributeName lookup attribute TypeReference yields TypeReference

If the attribute type has no attribute name and no type reference, then lookup yields xs:anySimpleType.


statEnv |- AttributeName lookup attribute yields of type xs:anySimpleType

3.4.2 Matches

Notation

The judgment

statEnv |- Value matches Type

holds when the given value matches the given type.

Example

For example, assuming the extended XML Schema given in section [3.2.6 Example], then the following judgments hold.

  element comment of type xsd:string { "This is not important" }
    matches
  element comment of type xsd:string

  (element appt of type [Anon3] { 2510 },
   element appt of type [Anon3] { 2511 })
    matches
  element appt+

  ()
    matches
  element usaddress?

  element usaddress of type USAddress {
    element name of type xsd:string { "The Archive" },
    element street of type xsd:string { "Christopher Street" },
    element city of type xsd:string { "New York" },
    element state of type xsd:string { "NY" },
    element zip of type xsd:decimal { 10210 }
  }
    matches
  element usaddress?

Semantics

We start by giving the inference rules for matching an element against an element type.

An element matches an element type if the element type resolves to another element type, and the type annotation is derived from the type annotation of the resolved type. Note that there is no need to check structural constraints on the value since since those have been checked during XML Schema validation and the value is assumed to be consistent with its type annotation.

statEnv |- ElementName lookup ElementType yields Nillable? of type BaseTypeName
statEnv |- TypeName derives from BaseTypeName
Value filter @xsi:nil => () or false

statEnv |- element ElementName of type TypeName { Value } matches ElementType

In the case the element has been nilled, the following alternative rule applies, which checks that the type is nillable and that there exists and xsi:nil attribute set to true in the element.

statEnv |- ElementName lookup ElementType yields nillable of type BaseTypeName
statEnv |- TypeName derives from BaseTypeName
Value filter @xsi:nil => true

statEnv |- element ElementName of type TypeName { Value } matches ElementType

The rule for attributes is similar.

statEnv |- AttributeName lookup AttributeType yields of type BaseTypeName
statEnv |- TypeName derives from BaseTypeName

statEnv |- attribute AttributeName of type TypeName { Value } matches AttributeType

A document node matches the corresponding document type.


statEnv |- document { Value } matches document

A text node matches text.


statEnv |- text { String } matches text

An atomic value matches an atomic type if its type annotation derives from the atomic type. The value itself is ignored -- this is checked as part of validation.

statEnv |- AtomicTypeName1 derives from AtomicTypeName2

statEnv |- AtomicValue of type AtomicTypeName1 matches

An atomic value matches "untyped" only if it is a string an is annotated with xs:anySimpleType.


statEnv |- String of type xs:anySimpleType matches untyped

A type can also be a sequence of items, in that case the matching rules also need to check whether the constraints described by the type as a regular expression hold. This is specified by the following rules.

The empty sequence matches the empty sequence type.


statEnv |- () matches ()

If two values match two types, then their sequence matches the corresponding sequence type.

statEnv |- Value1 matches Type1
statEnv |- Value2 matches Type2

statEnv |- Value1,Value2 matches Type1,Type2

If a value matches a type, then it also matches a choice type where that type is one of the choices.

statEnv |- Value matches Type1

statEnv |- Value matches Type1|Type2

statEnv |- Value matches Type2

statEnv |- Value matches Type1|Type2

If two values match two types, then their interleaving matches the corresponding all group.

statEnv |- Value1 matches Type1
statEnv |- Value2 matches Type2
statEnv |- Value1 interleave Value2 yields Value

statEnv |- Value matches Type1 & Type2

An optional type matches a value of that type or the empty sequence.

statEnv |- Value matches (Type | ())

statEnv |- Value matches Type?

The following rules are used to match a value against a sequence of zero (or one) or more types.


statEnv |- () matches Type*

statEnv |- Value1 matches Type      statEnv |- Value2 matches Type*

statEnv |- Value1, Value2 matches Type*

statEnv |- Value1 matches Type      statEnv |- Value2 matches Type*

statEnv |- Value1, Value2 matches Type+

Note

The above definition of type matching, although complete and precise, does not give a simple means to compute type matching. Notably, some of the above rules can be non-deterministic (e.g., the rule for matching of choice or repetition).

The structural component of the [XPath/XQuery] type system can be modeled by regular expressions. Regular expressions can be implemented by means of finite state automata. Computing type matching then is equivalent to check if a given sequence of items is recognized by its corresponding finite state automata. Finite state automata and their relationships to regular expressions have been extensively studied and documented in the literature. The interested reader can consult the relevant literature, for instance [Languages], or [TATA].

3.4.3 Subtype

Introduction

This section defines the semantics of subtyping in [XPath/XQuery]. Subtyping is used during the static type analysis, in typeswitch expressions, treat and assert expressions, and to check the correctness of function applications.

Notation

The judgment

statEnv |- Type1 <: Type2

holds if the first type is a subtype of the second.

Semantics

This judgment is true if and only if, for every value Value, if Value matches Type1 holds, then Value matches Type2 also holds.

Note

It is easy to see that the subtype relation <: is a partial order, i.e. it is reflexive:

statEnv |- Type <: Type     

and it is transitive: if,

statEnv |- Type1 <: Type2     

and,

statEnv |- Type2 <: Type3     

then,

statEnv |- Type1 <: Type3

Note

The above definition although complete and precise, does not give a simple means to compute subtyping. Notably the definition above refers to values which are not available at static type checking type.

The structural component of the [XPath/XQuery] type system can be modeled by regular expressions. Regular expressions can be implemented by means of finite state automata. Computing subtyping between two types can then be done by computing if inclusion holds between their corresponding finite states automata.

Finite state automata and how to compute operations on those automata, such as inclusion, emptiness or intersection have been extensively studied and documented in the literature. The interested reader can consult the relevant literature on tree grammars, for instance [Languages], or [TATA].

3.5 Judgments for sequences of item types

Introduction

Some [XPath/XQuery] operations work on sequences of items. For instance, [For/FLWOR] expressions iterate over a sequence of items, the fn:distinct function removes duplicates items in a sequence, the fn:unordered function can return all items in a sequence in any order, etc.

Static typing for those operations need to infer a type acceptable for all the items in the sequence. This sometimes require to approximate the type known for each item individually.

Example

Assume the variable $shipTo is bound to the shipTo element

    <shipTo country="US">
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
        <city>Mill Valley</city>
        <state>CA</state>
        <zip>90952</zip>
    </shipTo>

and has type

   element shipTo of type USAddress

The following query orders all children of the shipTo element by alphabetical order of their content.

   for $x in $shipTo/*
   order by $x/text()
   return $x

resulting in the sequence

    (<street>123 Maple Street</street>,
     <zip>90952</zip>,
     <name>Alice Smith</name>,
     <state>CA</state>,
     <city>Mill Valley</city>)

This operation iterates over the elements in the input sequence returned by the expression $shipTo/*, whose type is the content of a type USAddress.

    (element name of type xsd:string,
     element street of type xsd:string,
     element city of type xsd:string,
     element state of type xsd:string,
     element zip of type xsd:decimal)

During static typing, one must give a type to the variable $x which will be bound to each element in the sequence. Since each item as a of a different type, one must find an item type which is valid for all cases in the sequence. This can be done by using a choice for the variable $x, as follows

    (element name of type xsd:string |
     element street of type xsd:string |
     element city of type xsd:string |
     element state of type xsd:string |
     element zip of type xsd:decimal)

This type indicates that the type of the variable can be of any of the item types in the input sequence.

The static inference also needs to approximate the number of occurrence of items in the sequence. In this example, there is at least one item and more than one, so the closest occurrence indicator is + for one or more items.

In Section [5.8 [For/FLWOR] Expressions], we will see how static inference for this example finally results in the type, which is a sequence of one or more items of one of the item types in the input sequence.

    (element name of type xsd:string |
     element street of type xsd:string |
     element city of type xsd:string |
     element state of type xsd:string |
     element zip of type xsd:decimal)+

This section defines the notion of prime type which is a choice of item types. It defines two functions on type which compute the prime type of an arbitrary type, and approximate the occurrence of items in an arbitrary type. Those judgments are used notably in the static semantics of "for" expressions, "fn:unordered", and "fn:distinct" functions.

Notation

A choice of item types is called a prime type, as described by the following grammar production.

[44 (Formal)]   PrimeType   ::=   ItemType
|  (PrimeType "|" PrimeType)

Notation

The type function prime(Type) extracts all item types from the type Type, and combines them into a choice.

The function quantifier(Type) approximates the possible number of items in Type with the occurrence indicators supported by the [XPath/XQuery] type system (?, +, *).

For interim results, the following auxiliary occurrence indicators are used: 1 for exactly one occurrence, and 0 for exactly zero occurrences, i.e., the occurrence indicator of the empty list.

Semantics

The prime function is defined by induction as follows.

prime(ItemType)  =  ItemType
prime(())  =  none
prime(none)  =  none
prime(Type1 , Type2)  =  prime(Type1) | prime(Type2)
prime(Type1 & Type2)  =  prime(Type1) | prime(Type2)
prime(Type1 | Type2)  =  prime(Type1) | prime(Type2)
prime(Type?)  =  prime(Type)
prime(Type*)  =  prime(Type)
prime(Type+)  =  prime(Type)

Semantics

The quantifier function is defined by induction as follows.

quantifier(ItemType)  =  1
quantifier(())  =  0
quantifier(none)  =  0
quantifier(Type1 , Type2)  =  quantifier(Type1) , quantifier(Type2)
quantifier(Type1 & Type2)  =  quantifier(Type1) , quantifier(Type2)
quantifier(Type1 | Type2)  =  quantifier(Type1) | quantifier(Type2)
quantifier(Type?)  =  quantifier(Type) · ?
quantifier(Type*)  =  quantifier(Type) · *
quantifier(Type+)  =  quantifier(Type) · +

This definition uses the sum (Occurrence1 , Occurrence2), the choice (Occurrence1 | Occurrence2), and the product (Occurrence1 · Occurrence2) of two occurrence indicators Occurrence1, Occurrence2, which are defined by the following tables.

 ,  0  1  ?  +  * 
 0  0  1  ?  +  * 
 1  1  +  +  +  + 
 ?  ?  +  *  +  * 
 +  +  +  +  +  + 
 *  *  +  *  +  * 
    
 |  0  1  ?  +  * 
 0  0  ?  ?  *  * 
 1  ?  1  ?  +  * 
 ?  ?  ?  ?  *  * 
 +  *  +  *  +  * 
 *  *  *  *  *  * 
    
 ·  0  1  ?  +  * 
 0  0  0  0  0  0 
 1  0  1  ?  +  * 
 ?  0  ?  ?  *  * 
 +  0  +  *  +  * 
 *  0  *  *  *  * 

Examples

For example, here are the result of applying prime and quantifier on a few simple types.

  prime(element a+)                         = element a
  prime(element a | ())                     = element a
  prime(element a?,element b?)              = element a | element b
  prime(element a | element b+, element c*) = element a | element b | element c

  quantifier(element a+)                         = +
  quantifier(element a | ())                     = ?
  quantifier(element a?,element b?)              = *
  quantifier(element a | element b+, element d*) = +

Note that the last occurrence indicator should be '+', since the regular expression is such that there must be at least one element in the sequence (this element being an 'a' element or a 'b' element).

Note

Note that prime(Type) · quantifier(Type) is always a super type of the original type Type I.e., prime(Type) · quantifier(Type) <: Type always holds. Therefore, it is appropriate to used it as an approximation for the type of an expression. This property is required for the soundness of the static type analysis.

Semantics

Finally, a type Type and an occurrence indicator can be combined back together to yield a new type with the · operation, as follows.

Type · 0  =  ()
Type · 1  =  Type
Type · ?  =  Type?
Type · +  =  Type+
Type · *  =  Type*

3.6 Judgments for the "typeswitch" expression

Introduction

Static type analysis for the "typeswitch" expression has to compute the choice of all the common items between two types, and the corresponding occurrence indicators.

Example

For example, consider the following typeswitch expression, applied on the result of the previous query.

    let $recipients := //shipTo return
      typeswitch ($recipients)
        case $r as element of type NYCAddress return
            <single-nyc-recipient> { $r/street, $r/appt } </single-nyc-recipient>
        case $r as element usaddress? return
            <single-us-recipient> { $r/city, $r/street } </single-us-recipient>
        default return
            <all-recipients> { $recipients } </all-recipients>

Static type analysis needs to compute the type of the variable $r precisely for the two first case clauses. This type depends on both the type for the input expression ($recipients) and the type of the case clause.

If the variable $recipients has type

  element nycaddress of type NYCAddress

then the first case branch is always evaluated and variable $r has type

  element nycaddress of type NYCAddress

Note that the type on the case clause was using a wildcard element name so the static typing for the typeswitch must take into account the element name from the type of the input expression.

If the variable $recipients has type

  element usaddress of type USAddress

then both the first and the second branch of the typeswitch can be evaluated. The type of the variable $r in the first branch must be an element type such that its element name smaller than both usaddress and the element wildcard name, and the type annotation is smaller than both USAddress and NYCAddress, which results in

  element usaddress of type NYCAddress

The type of variable $r in the second branch is the same as the type for the input expression, which is

  element usaddress of type USAddress

Note that since the input expression indicates that there is exactly one item, and the type in the case clause indicate an optional item, then the type of the variable $r is such that there is exactly one item, which is the conjunction of constraints from both occurrence indicators.

Notation

The two following auxiliary functions on types are defined.

The function common-prime(PrimeType1, PrimeType2) computes the type for all the items common between the prime types PrimeType1 and PrimeType2, and combines them into a new prime type.

The function common-occurrence(Occurrence1, Occurrence2) computes the occurrence indicator which approximates the possible number of items that can result from the constraints described by both Occurrence1, and Occurrence2.

Semantics

The common-prime function is defined by induction as follows.

The common-prime for two arbitrary prime types, is defined as the choice containing the results of the function common-prime applied between all possible pairs of items in each input prime types.

statEnv |- common-prime( ItemType1, ItemType1' ) = ItemType(1,1)
statEnv |- common-prime( ItemType1, ItemType2' ) = ItemType(1,2)
···
statEnv |- common-prime( ItemTypen, ItemTyper' ) = ItemType(n,r)

statEnv |- common-prime( ( ItemType1 | ··· | ItemTypen ), ( ItemType1' | ··· | ItemTyper' ) ) = ItemType(1,1) | ··· | ItemType(n,r)

The common-prime function between any pair of item types returns either a new item type or the empty choice (none);. Remember that since none is the identity for choice, all of those cases are eventually be eliminated through the rule above.

If one item type is a subtype of the other, the function returns the subtype.

statEnv |- ItemType1 <: ItemType2

statEnv |- common-prime(ItemType1, ItemType2) = ItemType1

statEnv |- ItemType2 <: ItemType1

statEnv |- common-prime(ItemType1, ItemType2) = ItemType2

This rule captures most of the cases, including cases with document, text nodes and atomic types.

In the case of attributes and elements, additional rules also apply to cover the cases where the element name and the type names are compatible but subtyping does not hold directly, which happens when one of the element or attribute type has a wildcard name.

If one element type has a wildcard name, then one applies the same rule with the wildcard replaced by the element name of the other element type.

statEnv |- common-prime(element ElementName2 of type TypeName1, element ElementName2 Nillable2? TypeReference2?) = ElementType3

statEnv |- common-prime(element of type TypeName1, element ElementName2 Nillable2? TypeReference2?) = ElementType3

statEnv |- common-prime(element ElementName1 Nillable1? TypeReference1?, element ElementName1 of type TypeName2) = ElementType3

statEnv |- common-prime(element ElementName1 Nillable1? TypeReference1?, element of type TypeName2) = ElementType3

If one attribute type has a wildcard name, then one applies the same rule with the wildcard replaced by the attribute name of the other attribute type.

statEnv |- common-prime(attribute AttributeName2 of type TypeName1, attribute AttributeName2 TypeReference2?) = AttributeType3

statEnv |- common-prime(attribute of type TypeName1, attribute AttributeName2 TypeReference2?) = AttributeType3

statEnv |- common-prime(attribute AttributeName1 TypeReference1?, attribute AttributeName1 of type TypeName2) = AttributeType3

statEnv |- common-prime(attribute TypeReference1?, attribute of type TypeName2) = AttributeType3

In all the other cases, common-prime returns none. I.e., otherwise:

statEnv |- common-prime(ItemType1, ItemType2) => none

Semantics

The common-occurrence function is defined by the following table.

 common-occurrence  0  1  ?  +  * 
 0  0  0  0  0  0 
 1  0  1  1  1  1 
 ?  0  1  ?  1  ? 
 +  0  1  1  +  + 
 *  0  1  ?  +  * 

Examples

For example, the following illustrates some applications of common-prime and common-occurrence.

  common-prime((element a),(element a))                                      = element a
  common-prime((element a),(element b))                                      = none
  common-prime((element a | element b | element c), (element c | element a)) = element a | element c

  common-occurrence(+,+)   = +
  common-occurrence(*,+)   = +
  common-occurrence(?,+)   = 1

Note that the last occurrence indicator should be '1', since the first input occurrence indicator '?' specifies that there should be at most one item, while the second '+' specifies that there should be at least one.

The definition of relies on subtyping, hence covering some non trivial cases with derivation, substitution groups, etc. For example, consider the following definitions.

  define element person of type Person

  define type Person { element name of type xs:string }

  define element student substitutes for person of type Student
  define type Student extends Person {
    element promotion of type Promotion,
    element grade of type Grade
  }

  define type Promotion restricts xs:integer { xs:integer }
  define type Grade restricts xs:integer { xs:integer }

The following illustrates some more advanced applications of common-prime and common-occurrence.

  common-prime(element of type Promotion, element of type xs:integer) = element of type Promotion
  common-prime(element of type Promotion, element of type Grade)      = element of type none
  common-prime(element student of type Person, element of type Student)       = element of type Student
  common-prime(element person, element student)                       = element student

3.7 Judgments for type promotion

Introduction

Function calls can perform type promotion between atomic types. This section introduces judgments which describe type promotion for the purpose of the dynamic and static semantics.

Notation

The judgment

Type1 can be promoted to Type2

holds if type Type1 can be promoted to type Type2.

Example

For example, the following judgments hold

  xs:integer  can be promoted to  xs:integer
  xs:decimal  can be promoted to  xs:float
  xs:integer  can be promoted to  xs:float
  xs:float    can be promoted to  xs:double

Semantics

This judgment is specified by the following rules.

xs:decimal can be promoted to xs:float


statEnv |- xs:decimal can be promoted to xs:float

xs:float can be promoted to xs:double


statEnv |- xs:float can be promoted to xs:double

A type can be promoted to itself


statEnv |- Type can be promoted to Type

type promotion is transitive

statEnv |- Type1 can be promoted to Type2      statEnv |- Type2 can be promoted to Type3

statEnv |- Type1 can be promoted to Type3

Notation

The judgment

Value1 against Type2 promotes to Value2

holds if value Value1 can be promoted to the value Value2 against the type Type2.

Example

For example, the following judgments hold

  1     of type xs:integer  against  xs:integer  is promoted to  1     of type xs:integer
  1     of type xs:integer  against  xs:decimal  is promoted to  1     of type xs:integer
  1     of type xs:integer  against  xs:float    is promoted to  1.0e0 of type xs:float
  1.0e0 of type xs:float    against  xs:double   is promoted to  1.0e0 of type xs:double

Note that type promotion changes the value, and only occurs if the input value does not matches the target type.

Semantics

This judgment is specified by the following rules.

If the value matches the target type, then it is promoted to itself

Value matches Type

Value against Type promotes to Value

If the value does not match the target type, but matches a type which can be promoted to the target type, then the value is cast to the target type.

statEnv |- Value1 matches Type1
statEnv |- Type1 can be promoted to Type2
statEnv |- Type1 != Type2
cast as Type2 (Value1) => Value2

statEnv |- Value1 against Type2 promotes to Value2

3.8 Judgments for the validate expression

XQuery supports XML Schema validation using the validate expression. This section describes the semantics of XML Schema validation, for the purpose of specifying its usage in XQuery.

Specifying XML Schema validation requires a fairly large number of auxiliary judgments. There are three main judgments used to describe the semantics of validation.

  • The "smatch" judgment stands for "structural" type matching. It takes as input a value and a type and either succeeds or fails. As opposed to type matching, structural type matching checks that the value given as input verifies both the name and structural constraints given by the type. This is natural since XML Schema must verify that both name and structural constraints are verified before accepting the document as valid or invalid. Once the document has been validated other XQuery operation only need to use type matching as described in [3.4.2 Matches].

  • The "erase" judgment takes a value and removes all type information from it. This operation is necessary since, in XQuery, validation can occur both on well-formed or already validated documents.

  • The "annotate" operation takes an untyped value and a type and either fails or succeeds by returning a new -validated- value.

Before defining those three judgments, we first introduce auxiliary judgments used to describe specific parts of the XML Schema's semantics.

3.8.1 Builtin attributes

Schema defines four built-in attributes that can appear on any element in the document without being explicitly declared in the schema. Those four attributes need to be added inside content models when doing matching. The four built-in attributes of Schema are declared as follows.

  define attribute xsi:type of type xs:QName
  define attribute xsi:nil of type xs:boolean
  define attribute xsi:schemaLocation of type fs:anyURIlist
  define type fs:anyURIlist { xs:anyURI* }
  define attribute xsi:noNamespaceSchemaLocation of type xs:anyURI

For convenience, a type that is an all group of the four built-in XML Schema attributes is defined.

  BuiltInAttributes =
      attribute xsi:type ?
    & attribute xsi:nil ?
    & attribute xsi:schemaLocation ?
    & attribute xsi:noNamespaceSchemaLocation ?

3.8.2 Extension

Notation

The judgment

statEnv |- Type1 extended by Type2 is Type

holds when the result of extending Type1 by Type2 is Type.

Semantics

This judgment is specified by the following rules.

statEnv |- Type1 = AttributeAll1 , ElementContent1      statEnv |- Type2 = AttributeAll2 , ElementContent2

statEnv |- Type1 extended by Type2 is (AttributeAll1 & AttributeAll2) , ElementContent1 , ElementContent2

3.8.3 Mixed content

Notation

The judgment

statEnv |- Type1 mixes to Type2

holds when the result of creating a mixed content from Type1 is Type2.

Semantics

This judgment is specified by the following rules.

statEnv |- Type = AttributeAll , ElementContent

statEnv |- Type mixes to AttributeAll , ( ElementContent & text* )

3.8.4 Adjusts

An element may optionally include the four built-in attributes xsi:type, xsi:nil, xsi:schemaLocation, or xsi:noNamespaceSchemaLocation.

Notation

The judgment

statEnv |- Mixed? Type1 adjusts to Type2

holds when the second type is the same as the first, with the four built-in attributes added, and with text nodes added if the type is mixed.

Semantics

This judgment is specified by the following rules.

If the type is flagged as mixed, then mix the type and extend it by the built-in attributes.

statEnv |- Type1 mixes to Type2
statEnv |- Type2 extended by BuiltInAttributes is Type3

statEnv |- mixed Type1 adjusts to Type3

Otherwise, just extend the type by the built-in attributes.

statEnv |- Type1 extended by BuiltInAttributes is Type2

statEnv |- Type1 adjusts to Type2

The definition of BuiltInAttributes appears in [3.2.4 Built-in type declarations].

3.8.5 Type resolution

Notation

The judgment

statEnv |- (TypeReference | TypeDerivation) resolves to TypeName { Type }

holds when a type reference or a type derivation resolves to the given type name and type content.

Semantics

This judgment is specified by the following rules.

If the type is omitted, it is resolved as the empty sequence type.

statEnv |- Derivation? Mixed? { () } resolves to TypeName { Type }

statEnv |- Derivation? Mixed? { } resolves to TypeName { Type }

In case of a type reference, then the type name is the name of that type, and the type is taken by resolving the type declaration of the global type.

statEnv.typeDefn(TypeName) => define type TypeName TypeDerivation
statEnv |- TypeDerivation resolves to BaseTypeName { Type }

statEnv |- of type TypeName resolves to TypeName { Type }

In the above inference rule, note that BaseTypeName is the base type of the type referred to. So this is indeed the original type name, TypeName, which must be returned, and eventually used to annotated the corresponding element or attribute. However, the type needs to be obtained through a second application of the judgment.

If the type derivation is a restriction, then the type name is the name of the base type, and the type is taken from the type derivation.

statEnv |- Mixed? Type adjusts to AdjustedType

statEnv |- restricts TypeName Mixed? { Type } resolves to TypeName { AdjustedType }

If the type derivation is an extension, then the type name is the name of the base type, and the type is the base type extended by the type in the type derivation.

statEnv.typeDefn(TypeName) => define type TypeName Derivation? BaseMixed? { BaseType? }
statEnv |- BaseType? extended by Type is ExtendedType
statEnv |- Mixed? ExtendedType adjusts to AdjustedType

statEnv |- extends TypeName Mixed? { Type } resolves to TypeName { AdjustedType }

3.8.6 Element and attribute lookup

Element and attribute lookup used in validation is identical to the one used for type matching and is defined in [3.4.1 Element and attribute lookup].

3.8.7 Interleaving

Notation

The judgment

statEnv |- Value1 interleave Value2 yields Value3

holds if some interleaving of Value1 and Value2 yields Value3. Interleaving is non-deterministic; it is used for processing all groups.

Semantics

This judgment is specified by the following rules.

Interleaving two empty sequences yields the empty sequence.


statEnv |- () interleave () yields ()

Otherwise, pick an item from the head of one of the sequences, and recursively interleave the remainder.

statEnv |- Value1 interleave Value2 yields Value3

statEnv |- Item,Value1 interleave Value2 yields Item,Value3

statEnv |- Value1 interleave Value2 yields Value3

statEnv |- Value1 interleave Item,Value2 yields Item,Value3

3.8.8 Filtering

Notation

The judgment

Value filter @QName => ()

holds if there are no occurrences of the attribute QName in Value. The judgment

Value filter @QName => SimpleValue

holds if there is one occurrence of the attribute QName in Value, and the value of that attribute is SimpleValue. The judgment

Value filter @QName => () or SimpleValue

holds if either of the previous two judgments hold.

Semantics

These judgments are defined using the auxiliary judgments

dynEnv |- Value1 on Axis => Value2

and

dynEnv |- Value1 on PrincipalNodeKind, NodeTest => Value2

which are defined in [5.2.1 Steps].

The filter judgments are defined as follows.

dynEnv |- Value1 on attribute:: => Value2
dynEnv |- Value2 on "attribute", QName => ()

Value1 filter @QName => ()

dynEnv |- Value1 on attribute:: => Value2
dynEnv |- Value2 on "attribute",QName => Value3
Value3 = attribute QName { SimpleValue }

Value1 filter @QName => SimpleValue

3.8.9 Structural matching

Introduction

We first extend the notion of type matching to handle both names and structures. The structural semantics of a type is given by the notion of structural matching.

A tree in the [XQuery 1.0 and XPath 2.0 Data Model] structurally matches a type in the [XPath/XQuery] type system, if and only if:

  • It verifies the structural constraints described by the type.

  • It verifies the name constraints described by the type.

Example

For example, the data model value

  element bib:pubdate of type [Anon3] { 1954, 1966, 1974, 1986 }

structurally matches the following type

  element of type [Anon3]
  define type [Anon3] { attribute country { xs:string }?, xs:integer* }

because the element name (bib:pubdate) matches the wildcard element in the type, the type annotation ([Anon3]) matches the type name, the attribute "country" is optional, and the content of that element is indeed a sequence of integers.

But the data model value

  element bib:pubdate of type [Anon3] { attribute pays { "France" }, 1954, 1966, 1974, 1986 }

does not structurally match the same type, since the element in the data model contains an attribute which is not described in the schema. This violates the structural requirement of the schema.

3.8.9.1 Structural nil matching

Notation

The judgment

statEnv |- Value nil-smatches Nillable? Type

holds when the given value structurally matches the given nillable type.

Semantics

This judgment is specified by the following rules.

If the type is not nillable, then the xsi:nil attribute must not appear in the value, and the value must structurally match the type.

Value filter @xsi:nil => ()
statEnv |- Value smatches Type

statEnv |- Value nil-smatches Type

If the type is nillable, and the xsi:nil attribute does not appear or is false, then the value must structurally match the type.

Value filter @xsi:nil => () or false
statEnv |- Value smatches Type

statEnv |- Value nil-smatches nillable Type

If the type is nillable, and the xsi:nil attribute is true, then the value must structurally match the attributes in the type only. The element content of the type is ignored.

Value filter @xsi:nil => true
statEnv |- Value smatches AttributeAll

statEnv |- Value nil-smatches nillable (AttributeAll, ElementContent)

3.8.9.2 Structural matching

Notation

The judgment

statEnv |- Value smatches Type

holds when the given value structurally matches the given type.

Semantics

This judgment is specified by the following rules.

The empty sequence structurally matches the empty sequence type.


statEnv |- () smatches ()

If two values structurally match two types, then their sequence structurally matches the corresponding sequence type.

statEnv |- Value1 smatches Type1
statEnv |- Value2 smatches Type2

statEnv |- Value1,Value2 smatches Type1,Type2

If a value structurally matches a type, then it also structurally matches a choice type where that type is one of the choices.

statEnv |- Value smatches Type1

statEnv |- Value smatches Type1|Type2

statEnv |- Value smatches Type2

statEnv |- Value smatches Type1|Type2

If two values structurally match two types, then their interleaving structurally matches the corresponding all group.

statEnv |- Value1 smatches Type1
statEnv |- Value2 smatches Type2
statEnv |- Value1 interleave Value2 yields Value

statEnv |- Value smatches Type1 & Type2

An optional type structurally matches a value of that type or the empty sequence.

statEnv |- Value smatches (Type | ())

statEnv |- Value smatches Type?

The following rules are used to structurally match a value against a sequence of zero (or one) or more types.


statEnv |- () smatches Type*

statEnv |- Value1 smatches Type      statEnv |- Value2 smatches Type*

statEnv |- Value1, Value2 smatches Type*

statEnv |- Value1 smatches Type      statEnv |- Value2 smatches Type*

statEnv |- Value1, Value2 smatches Type+

An element structurally matches an element type if the element type resolves to another element type, and the type annotation is derived from the type annotation of the resolved type, and the element value structurally matches the enclosed type of the resolved type.

Note that as opposed to the rule for type matching in [3.4.2 Matches], this rule also checks the content of the element.

statEnv |- ElementName lookup ElementType yields Nillable? TypeReference
statEnv |- TypeReference resolves to TypeName2 { Type }
statEnv |- TypeName1 derives from TypeName2
statEnv |- Value nil-smatches Nillable? Type

statEnv |- element ElementName of type TypeName1 { Value } smatches ElementType

The rule for attributes is similar.

statEnv |- AttributeName lookup AttributeType yields TypeReference
statEnv |- TypeReference resolves to TypeName2 { Type }
statEnv |- TypeName1 derives from TypeName2
statEnv |- Value nil-smatches Nillable? Type

statEnv |- attribute AttributeName of type TypeName1 { Value } smatches AttributeType

A document node structurally matches the corresponding document type.


statEnv |- document { Value } smatches document

A text node structurally matches the text node type.


statEnv |- text { String } smatches text

An atomic value structurally matches an atomic type if its type annotation derives from the atomic type.

statEnv |- AtomicTypeName1 derives from AtomicTypeName2

statEnv |- AtomicValue of type AtomicTypeName1 smatches AtomicTypeName2

An atomic value structurally matches "untyped" only if it is annotated with xs:anySimpleType.


statEnv |- AtomicValue of type xs:anySimpleType smatches untyped

Note

The above definition of structural matching, although complete and precise, does not give a simple means to compute structural matching. Notably, some of the above rules can be non-deterministic (e.g., the rule for matching of choice or repetition).

The structural component of the [XPath/XQuery] type system can be modeled by tree grammars. Computing structural matching can be done by computing if a given tree grammar recognizes the given data model value.

This document does not provide a complete algorithm to recognize a given tree by a tree grammar. The interested reader can consult the relevant literature, for instance [Languages], or [TATA].

3.8.10 Erasure

3.8.10.1 Simply erases

Notation

To define erasure, an auxiliary judgment is needed. The judgment

statEnv |- SimpleValue simply erases to String

holds when SimpleValue erases to the string String.

Semantics

This judgment is specified by the following rules.

The empty sequence erases to the empty string.

The concatenation of two non-empty sequences of values erases to the concatenation of their erasures with a separating space.

statEnv |- SimpleValue1 simply erases to String1 SimpleValue1 != ()
statEnv |- SimpleValue2 simply erases to String2 SimpleValue2 != ()

statEnv |- SimpleValue1,SimpleValue2 simply erases to fn:concat(String1," ",String2)

An atomic value erases to its string representation.


statEnv |- AtomicValue of type AtomicTypeName simply erases to dm:string-value(AtomicValue)

3.8.10.2 Erases

Notation

The judgment

statEnv |- Value1 erases to Value2

holds when the erasure of Value1 is Value2.

Semantics

This judgment is specified by the following rules.

The empty sequence erases to itself.


statEnv |- () erases to ()

The erasure of the concatenation of two values is the concatenation of their erasure, so long as neither of the two original values is simple.

statEnv |- Value1 erases to Value1'      statEnv |- Value1 not a simple value
statEnv |- Value2 erases to Value2'      statEnv |- Value2 not a simple value

statEnv |- Value1,Value2 erases to Value1',Value2'

The erasure of an element is an element that has the same name and the type xs:anyType and the erasure of the original content.

statEnv |- Value1 erases to Value2

statEnv |- element ElementName of type TypeName { Value1 } erases to element ElementName of type xs:anyType { Value2 }

The erasure of an attribute is an attribute that has the same name and the type xs:anySimpleType and the simple erasure of the original content labeled with xs:anySimpleType.

statEnv |- Value simply erases to String

statEnv |- attribute AttributeName of type TypeName { Value } erases to attribute AttributeName of type xs:anySimpleType { String of type xs:anySimpleType }

The erasure of a document is a document with the erasure of the original content.

statEnv |- Value1 erases to Value2

statEnv |- document { Value1 } erases to document { Value2 }

The erasure of a text node is itself.


statEnv |- text { String } erases to text { String }

The erasure of a simple value is the corresponding text node.

statEnv |- SimpleValue simply erases to String

statEnv |- SimpleValue erases to text { String }

3.8.11 Annotate

3.8.11.1 Simply annotate

Notation

The judgment

statEnv |- simply annotate as SimpleType1 ( SimpleValue ) => SimpleValue2

holds if the result of casting the SimpleValue1 to SimpleType is SimpleValue2.

Ed. Note: Issue: The simply annotate judgment is used to describe the behavior of validation of simple values. This operation is essentially similar to casting from string to an atomic value. It is not clear if this actually aligns to the behavior of casting as specified by the [XQuery 1.0 and XPath 2.0 Functions and Operators]. See [Issue-0156:  Casting and validation].

Semantics

This judgment is specified by the following rules.

Simply annotating a simple value to a union type yields the result of simply annotating the simple value to either the first or second type in the union. Note that simply annotating to the second type is attempted only if simply annotating to the first type fails.

statEnv |- simply annotate as SimpleType1 (SimpleValue1) => SimpleValue2

statEnv |- simply annotate as SimpleType1|SimpleType2 (SimpleValue) => SimpleValue2

statEnv |- (simply annotate as SimpleType1 (SimpleValue1) => SimpleValue2) fails
statEnv |- simply annotate as SimpleType2 (SimpleValue1) => SimpleValue2

statEnv |- simply annotate as SimpleType1|SimpleType2 (SimpleValue) => SimpleValue2

The simple annotation rules for ?, +, * are similar.


statEnv |- simply annotate as SimpleType1? ( () ) => ()

statEnv |- simply annotate as SimpleType (SimpleValue1) => SimpleValue2

statEnv |- simply annotate as SimpleType? (SimpleValue1) => SimpleValue2


statEnv |- simply annotate as SimpleType* ( () ) => ()

statEnv |- simply annotate as SimpleType (SimpleValue1) => SimpleValue1'      statEnv |- simply annotate as SimpleType* (SimpleValue2) => SimpleValue2'

statEnv |- simply annotate as SimpleType* (SimpleValue1,SimpleValue2) => SimpleValue1',SimpleValue2'

statEnv |- simply annotate as SimpleType (SimpleValue1) => SimpleValue1'      statEnv |- simply annotate as SimpleType* (SimpleValue2) => SimpleValue2'

statEnv |- simply annotate as SimpleType+ (SimpleValue1,SimpleValue2) => SimpleValue1',SimpleValue2'

Simply annotating an atomic value to xs:string yields its string representation.


statEnv |- simply annotate as xs:string (AtomicValue) => dm:string-value(AtomicValue)

Simply annotating an atomic value to xs:decimal yields the decimal that results from parsing its string representation.

Similar rules are assumed for the rest of the 19 XML Schema primitive types.

3.8.11.2 Nil-annotate

Notation

The judgment

statEnv |- nil-annotate as Nillable? Type ( Value1 ) => Value2

holds if it is possible to annotate value Value1 as if it had the nillable type Type and Value2 is the corresponding annotated value.

Semantics

This judgment is specified by the following rules.

If the type is not nillable, then the xsi:nil attribute must not appear in the value, and it must be possible to annotate value Value as if it had the type Type.

Value1 filter @xsi:nil => ()
statEnv |- annotate as Type ( Value ) => Value2

statEnv |- nil-annotate as Type ( Value1 ) => Value2

If the type is nillable, and the xsi:nil attribute does not appear or is false, then it must be possible to annotate value Value1 as if it had the type Type.

Value1 filter @xsi:nil => () or false
statEnv |- annotate as Type ( Value1 ) => Value2

statEnv |- nil-annotate as nillable Type ( Value1 ) => Value2

If the type is nillable, and the xsi:nil attribute is true, then it must be possible to annotate value Value1 as if it had a type where the attributes in the type are kept and the element content of the type is ignored.

Value1 filter @xsi:nil => true
statEnv |- annotate as AttributeAll ( Value1 ) => Value2

statEnv |- nil-annotate as nillable (AttributeAll, ElementContent) ( Value1 ) => Value2

3.8.11.3 Annotate

Notation

The judgment

statEnv |- annotate as Type ( Value1 ) => Value2

holds if it is possible to annotate value Value1 as if it had type Type and Value2 is the corresponding annotated value.

Note

Assume an XML Infoset instance X1 is validated against an XML Schema S, yielding PSVI instance X2. Then if X1 corresponds to Value1 and S corresponds to Type and X2 corresponds to Value2, the following should hold: annotate as Type ( Value1 ) => Value2.

Semantics

This judgment is specified by the following rules.

Annotating the empty sequence as the empty type yields the empty sequence.


statEnv |- annotate as () (()) => ()

Annotating a concatenation of values as a concatenation of types yields the concatenation of the annotated values.

statEnv |- annotate as Type1 (Value1) => Value1'
statEnv |- annotate as Type2 (Value2) => Value2'

statEnv |- annotate as Type1,Type2 (Value1,Value2) => Value1',Value2'

Annotating a value as a choice type yields the result of annotating the value as either the first or second type in the choice.

statEnv |- annotate as Type1 (Value1) => Value2

statEnv |- annotate as Type1|Type2 (Value1) => Value2

statEnv |- annotate as Type2 (Value1) => Value2

statEnv |- annotate as Type1|Type2 (Value1) => Value2

Annotating a value as an all group uses interleaving to decompose the original value and recompose the annotated value.

Ed. Note: Jerome and Phil: Note that this may reorder the original sequence. Perhaps we should disallow such reordering. Specifying that formally is not as easy as we would like.

statEnv |- annotate as Type1 ( Value1 ) => Value1'
statEnv |- annotate as Type2 ( Value2 ) => Value2'
statEnv |- Value1 interleave Value2 yields Value
statEnv |- Value1' interleave Value2' yields Value'

statEnv |- annotate as Type1 & Type2 ( Value ) => Value'

The annotation rules for ?, +, * are similar.

statEnv |- annotate as (Type | ())(Value1) => Value2

statEnv |- annotate as Type? (Value1) => Value2

statEnv |- annotate as Type (Value1) => Value1'      statEnv |- annotate as Type* (Value2) => Value2'

statEnv |- annotate as Type+ (Value1,Value2) => (Value1',Value2')


statEnv |- annotate as Type* ( () ) => ()

statEnv |- annotate as Type (Value1) => Value1'      statEnv |- annotate as Type* (Value2) => Value2'

statEnv |- annotate as Type* (Value1,Value2) => (Value1',Value2')

To annotate an element with no xsi:type attribute, first look up the the element type, next resolve the resulting type reference, then annotate the value against the resolved type, and finally return a new element with the name of the original element, the resolved type name, and the annotated value.

Value filter @xsi:type => ()
statEnv |- ElementName lookup ElementType yields Nillable? TypeReference
statEnv |- TypeReference resolves to TypeName { Type }
statEnv |- nil-annotate as Type Nillable? (Value) => Value'

statEnv |- annotate as ElementType ( element ElementName of type xs:anyType { Value } ) => element ElementName of type TypeName { Value' }

To annotate an element with an xsi:type attribute, define a type reference corresponding to the xsi:type. Look up the element type, yielding a type reference, and check that the xsi:type reference derives from this type reference. Resolve the xsi:type reference, then annotate the value against the resolved type, and finally return a new element with the name of the original element, the resolved type name, and the annotated value.

Value filter @xsi:type => TypeName
statEnv |- XsiTypeReference = of type TypeName
statEnv |- ElementName lookup ElementType yields Nillable? of type BaseTypeName
statEnv |- TypeName derives from BaseTypeName
statEnv |- XsiTypeReference resolves to TypeName { Type }
statEnv |- nil-annotate as Type Nillable? (Value) => Value'

statEnv |- annotate as ElementType ( element ElementName of type xs:anyType { Value } ) => element ElementName of type TypeName { Value' }

Ed. Note: Issue: the treatment of xsi:type in the [XQuery 1.0: A Query Language for XML] document and in the formal semantics document still differ. See [Issue-0142:  Treatment of xsi:type in validation].

The rule for attributes is similar to the first rule for elements.

statEnv |- AttributeName lookup AttributeType yields TypeReference
statEnv |- TypeReference resolves to TypeName { Type }
statEnv |- nil-annotate as Type Nillable? (Value) => Value'

statEnv |- annotate as AttributeType ( attribute AttributeName of type xs:anySimpleType { Value } ) => attribute AttributeName of type TypeName { Value' }

Annotating a document node yields a document with the annotation of its contents.

statEnv |- annotate as Type (Value) => Value'

statEnv |- annotate as document { Type } ( document { Value } ) => document { Value' }

Annotating a text node as text yields itself.


statEnv |- annotate as text (text { String }) => text { String }

Annotating a text nodes as a simple type is identical to casting.

statEnv |- simply annotate as SimpleType ( String as xs:anySimpleType ) => SimpleValue'

statEnv |- annotate as SimpleType ( text { String } ) => SimpleValue'

Annotating a simple value as a simple type is identical to casting.

statEnv |- simply annotate as SimpleType ( SimpleValue ) => SimpleValue'

statEnv |- annotate as SimpleType ( SimpleValue ) => SimpleValue'

4 Basics

The organization of this section parallels the organization of Section 2 of the [XPath/XQuery] document.

4.1 Expression Context

Introduction

The expression context for a given expression consists of all the information that can affect the result of the expression. This information is organized into the static context and the evaluation context. This section specifies the environments that represent the context information used by [XPath/XQuery] expressions.

4.1.1 Static Context

The [XPath/XQuery] static inference rules use the environment group statEnv containing the environments built during static type checking and used during both static type checking and dynamic evaluation.

The following environments are maintained in the static environment group:

statEnv.xpath1.0_compatibility

The XPath 1.0 compatibility flag specifies whether the semantic rules for backward compatibility with XPath 1.0 are in effect.

The initial value of statEnv.xpath1.0_compatibility is false.

statEnv.namespace

The namespace environment maps namespace prefixes (NCNames) onto namespace URIs (URIs).

The namespace environment captures in-scope namespaces in the [XPath/XQuery] static context.

The initial namespace environment maps the following prefixes to the following URIS:

xmlhttp://www.w3.org/XML/1998/namespace
xshttp://www.w3.org/2001/XMLSchema
xsd http://www.w3.org/2001/XMLSchema-datatypes
xsi http://www.w3.org/2001/XMLSchema-instance
statEnv.default_namespace

The default namespace environment maps "element", and "function" to a namespace URI (a URI), as determined by the appropriate namespace declaration.

The default namespace environment captures the default namespace for element and type names and Default namespace for function names in the [XPath/XQuery] static context.

The initial default element namespace is the empty namespace. The initial default function namespace is:

http://www.w3.org/2002/11/xquery-functions

statEnv.typeDefn

The type definition environment maps type names (TypeNames) onto their global type definitions (Definitions).

The type definition environment captures in-scope schema definitions in the [XPath/XQuery] static context.

statEnv.elemDecl

The element declaration environment maps element names (ElementNames) onto their global element declarations (Definitions).

The element declaration environment does not have a counterpart in the [XPath/XQuery] static context. See [Issue-0159:  Element and attribute declarations in the static context].

statEnv.attrDecl

The attribute declaration environment maps attribute names (AttributeNames) onto their global attribute declarations (Definitions).

The attribute declaration environment does not have a counterpart in the [XPath/XQuery] static context. See [Issue-0159:  Element and attribute declarations in the static context].

statEnv.varType

The variable static type environment maps variable names (Variables) to their static types (Types).

The variable static type environment captures in-scope variables in the [XPath/XQuery] static context.

statEnv.funcType

The function declaration environment stores the static type signatures of functions. Because [XPath/XQuery] allows multiple built-in functions with the same name differing only in the number and signature of the arguments, this environment maps a QName to the set of all function declaration signatures of the form "define function QName (Type1, ..., Typen) return Type" each corresponding to a declaration of the function. (For user-defined functions the set will contain exactly one such declaration.)

The initial function declaration environment contains the signatures of the internal functions defined in [B Mapping of Overloaded Internal Functions].

statEnv.base_uri

The base uri environment contains a unique namespace URI (a URI).

The base uri environment captures the base URI in the [XPath/XQuery] static context.

Ed. Note: Issue: The static environment does not represent collations. See [Issue-0160:  Collations in the static environment]

Environments have an initial state when [expression/query] processing begins, containing, for example, the function signatures of all built-in functions.

A common use of the static environment is to expand QNames by looking up namespace prefixes in the statEnv.namespace environment.

The helper function expand is used to expand QNames by looking up the namespace prefix in statEnv.namespace. This function is defined as follows:

   statEnv |- expand(QName) = qname(URI, ncname)

or

   statEnv |- expand(QName) = qname(ncname)

the right hand side is a QName value (not a QName expression).

The helper expand function is defined as follows:

  • If QName matches NCName1:NCName2, and if statEnv.namespace(NCName1) = URI, then the expression yields qname(URI,NCName2). If the namespace prefix NCName1 is not found in statEnv.namespace, then the expression does not apply (that is, the inference rule will not match).

  • If QName matches NCName, NCName is an element or type name, and statEnv.default_namespace("element") = URI, then the expression yields qname(URI,NCName), where URI is the default namespace in effect.

  • If QName matches NCName, NCName is a function name, and statEnv.default_namespace("function") = URI, then the expression yields qname(URI,NCName), where URI is the default namespace in effect.

Ed. Note: The above rules could be given as proper inference rules defining the expand judgment.

Here is an example that shows how the static environment is modified in response to a namespace definition.

statEnv [ namespace(NCName |-> URI) ] |- Expr*

statEnv |- namespace NCName = URI Expr*

This rule reads as follows: "the phrase on the bottom (a namespace declaration followed by a sequence of expressions) is well-typed (accepted by the static type inference rules) within an environment statEnv if the sequence of expressions above the line is well-typed in the environment obtained from statEnv by adding the namespace declaration".

This is the common idiom for passing new information in an environment using sub-expressions. In the case where the environment must be updated with a completely new component, the following notation is used:

statEnv [ namespace = (NewEnvironment) ]

4.1.2 Evaluation Context

dynEnv denotes the group of environments built and used only during dynamic evaluation.

The following environments are dynamic:

dynEnv.funcDefn

The dynamic function environment maps a function declaration name and parameter signature of the form "QName (Type1, ..., Typen)" to the remainder of the corresponding function definition, which is either the special constant #BUILT-IN, or the function's body, and the list of variables, which are the function's formal parameters, on the form "(Expr, Variable1, ,..., Variablen)".

The initial function environment maps the signatures of the internal functions defined in [B Mapping of Overloaded Internal Functions] and the signatures of the functions defined in [XQuery 1.0 and XPath 2.0 Functions and Operators] to #BUILT-IN.

dynEnv.varValue

The dynamic value environment maps a variable name (QName) onto the variable's current value (Value).

Ed. Note: Somewhere an exact definition of what is in the initial environments is needed. Notice that for XPath this is partially defined by the containing language. See [Issue-0115:  What is in the default context?].

The following Formal Semantics built-in variables represent the context item, context position , and context size properties of the evaluation context:

Built-in Variable  Represents:
$fs:dotcontext item
$fs:positioncontext position
$fs:lastcontext size
$fs:implicitTimezoneimplicit timezone

Variables with the "fs" namespace prefix are reserved for use in the definition of the Formal Semantics. It is a static error to define a variable in the "fs" namespace.

Values of $fs:dot, $fs:position and $fs:last can be obtained by invoking the fn:context-item(), fn:position() and fn:last() functions, respectively. The context document, can be obtained by application of the fn:root() function.

The variable $fs:implicitTimezone is used to represent the implicit timezone, and is used by the timezone related functions in [XQuery 1.0 and XPath 2.0 Functions and Operators].

Ed. Note: The Formal Semantics does not yet model the semantics of timezone related functions.

Ed. Note: The following dynamic contexts have no formal representation yet: current-date, current-time and current-dateTime. See [Issue-0114:  Dynamic context for current date and time]

4.2 Input Functions

[XPath/XQuery] has three functions that provide access to input data: input, collection, and document. The dynamic semantics of these three input functions are described in more detail in [XQuery 1.0 and XPath 2.0 Functions and Operators]. The static typing for these function is still an open issue (See [Issue-0137:  Typing of input functions]).

4.3 Expression Processing

This section gives background material about the [XPath/XQuery] data model. More information on the [XPath/XQuery] data model can be found in [2.3 Data Model].

4.4 Types

4.4.1 Type Checking

This section gives background material about [XPath/XQuery] type checking. More information on the [XPath/XQuery] type system can be found in [2.4 Schemas and types] and [3 The XQuery type system].

4.4.2 SequenceType

Introduction

SequenceTypes can be used in [XPath/XQuery] to refer to a type imported from a schema (see [6 The Query Prolog]). SequenceTypes are used to declare the types of function parameters and in several kinds of [XPath/XQuery] expressions.

The syntax of SequenceTypes is described by the following grammar productions.

[88 (XQuery)]   SequenceType   ::=   (ItemType OccurrenceIndicator) |  "empty"
[89 (XQuery)]   ItemType   ::=   (("element" |  "attribute") ElemOrAttrType?)
|  "node"
|  "processing-instruction"
|  "comment"
|  "text"
|  "document"
|  "item"
|  AtomicType
|  "untyped"
|  <"atomic" "value">
[90 (XQuery)]   ElemOrAttrType   ::=   (QName (SchemaType |  SchemaContext?)) |  SchemaType
[91 (XQuery)]   SchemaType   ::=   <"of" "type"> QName
[83 (XQuery)]   SchemaContext   ::=   "context" SchemaGlobalContext ("/" SchemaContextStep)*
[84 (XQuery)]   SchemaGlobalContext   ::=   QName |  <"type" QName>
[85 (XQuery)]   SchemaContextStep   ::=   QName
[92 (XQuery)]   AtomicType   ::=   QName
[93 (XQuery)]   OccurrenceIndicator   ::=   ( |  "*" |  "+" |  "?")?

The semantics of SequenceTypes is defined by means of normalization rules from SequenceTypes to the [XPath/XQuery] type system (see [3 The XQuery type system]).

Core Grammar

The core grammar productions for sequence types are:

[64 (Core)]   SequenceType   ::=   (ItemType OccurrenceIndicator) |  "empty"
[65 (Core)]   ItemType   ::=   (("element" |  "attribute") ElemOrAttrType?)
|  "node"
|  "processing-instruction"
|  "comment"
|  "text"
|  "document"
|  "item"
|  AtomicType
|  "untyped"
|  <"atomic" "value">
[66 (Core)]   ElemOrAttrType   ::=   (QName (SchemaType |  SchemaContext?)) |  SchemaType
[67 (Core)]   SchemaType   ::=   <"of" "type"> QName
[59 (Core)]   SchemaContext   ::=   "context" SchemaGlobalContext ("/" SchemaContextStep)*
[60 (Core)]   SchemaGlobalContext   ::=   QName |  <"type" QName>
[61 (Core)]   SchemaContextStep   ::=   QName
[68 (Core)]   AtomicType   ::=   QName
[69 (Core)]   OccurrenceIndicator   ::=   ( |  "*" |  "+" |  "?")?

Ed. Note: Note that normalization on SequenceTypes does not occur during the normalization phase but whenever a dynamic or static rule requires it. The reason for this deviation from the processing model is that the result of SequenceType normalization is not part of the [XPath/XQuery] syntax (See issue [Issue-0089:  Syntax for types in XQuery]). SequenceType normalization is the only occurrence of such a deviation in the formal semantics.

4.4.2.1 SequenceType Matching

Introduction

During processing of a query, it is sometimes necessary to determine whether a given value matches a type that was declared using the SequenceType syntax. This process is known as SequenceType matching, and is formally specified in [3.4 Judgments for type matching].

Sequence type matching is defined between a value and a type, rules to normalize a sequence type to the type system are necessary.

Notation

To define normalization of SequenceTypes to the type system, the following auxiliary mapping rule is used.

 
[SequenceType]sequencetype
==
Type

specifies that SequenceType is mapped to Type, in the XQuery type system.

Normalization

OccurenceIndicators are left unchanged when normalizing SequenceTypes into [XPath/XQuery] types. Each kind of SequenceType component is normalized separately into the [XPath/XQuery] type system.

 
[ItemType OccurrenceIndicator]sequencetype
==
[ItemType]sequencetype OccurrenceIndicator

The "empty" sequence type is mapped to the empty sequence type in the [XPath/XQuery] type system.

 
[empty]sequencetype
==
[()]sequencetype

The SequenceType component with type SchemaType is mapped directly into the [XPath/XQuery] type system.

 
[element SchemaType]sequencetype
==
element SchemaType
 
[attribute SchemaType]sequencetype
==
attribute SchemaType

The mapping still does not handle SequenceTypes using SchemaContext (See [Issue-0138:  Semantics of Schema Context]). The two following rules map references to a global element or attribute, without taking the schema context into account.

 
[element QName]sequencetype
==
element QName
 
[attribute QName]sequencetype
==
attribute QName

Document nodes, text nodes, untyped and atomic types are left unchanged.

 
[text]sequencetype
==
text
 
[document]sequencetype
==
document
 
[untyped]sequencetype
==
untyped
 
[AtomicType]sequencetype
==
AtomicType

Processing instruction and comment types are ignored. See [Issue-0105:  Types for nodes in the data model.].

 
[comment]sequencetype
==
 
[processing-instruction]sequencetype
==

The SequenceType components "node", "item", and "atomic value" correspond to wildcard types. node indicates that any node is allowed, "atomic value" indicates that any atomic (not list) value is allowed, and item indicates that any node or atomic value is allowed. The following mapping rules make use of the corresponding wildcard types.

 
[node]sequencetype
==
(element | attribute | text | document)
 
[item]sequencetype
==
(element | attribute | text | document | xs:string | xs:decimal | ... )
 
[atomic value]sequencetype
==
(xs:string | xs:decimal | ... )

Ed. Note: Jerome: The Formal Semantics makes use of fs:numeric which is not in XML Schema. This is necessary for the specification of some of XPath type conversion rules. See [Issue-0127:  SequenceType limitations].

4.4.3 Type Conversions

Introduction

Some expressions do not require their operands to exactly match the expected type. For example, function parameters and returns expect a value of a particular type, but automatically perform certain type conversions, such as extraction of atomic values from nodes, promotion of numeric values, and implicit casting of untyped values. The conversion rules for function parameters and returns are discussed in [5.1.4 Function Calls]. Other operators that provide special conversion rules include arithmetic operators, which are discussed in [5.4 Arithmetic Expressions], and value comparisons, which are discussed in [5.5.1 Value Comparisons].

Some special conversion rules: atomization, effective boolean value, and fallback conversions, are used in several places in the language.

4.4.3.1 Atomization

Notation

Atomization, denoted []Atomize, converts an item sequence into a sequence of atomic values.

Normalization

The rule below examines each item in the item sequence. If it is an atomic value, it is simply returned. If it is a node value, its typed content is returned.

 
[Expr]Atomize
==
for $item in (Expr) return
   typeswitch ($item)
    case $value as atomic value return $value
    default $node return fn:data($node)
4.4.3.2 Effective Boolean Value

Notation

The function []Effective_Boolean_Value takes an arbitrary value and converts it into a boolean value by applying the function fn:boolean().

Normalization

The effective boolean value is obtained through the following normalization rule.

 
[Expr]Effective_Boolean_Value
==
fn:boolean(Expr)

4.5 Variable Bindings

The semantics of variable bindings is handled by means of changing the static and dynamic environments. This is specified in the sections where variables are bound ([5.8 [For/FLWOR] Expressions], [5.11 Quantified Expressions], [5.12.2 Typeswitch], and [6.5 Function Definitions]) and where variables are used ([5.1.2 Variables]).

4.6 Errors and Conformance

This section describes error handling and conformance levels.

Ed. Note: Issue: Error handling is not formally specified. See [Issue-0094:  Static type errors and warnings]

Ed. Note: Issue: Conformance levels are not formally specified. See [Issue-0169:  Conformance Levels]

5 Expressions

This section gives the semantics of all the [XPath/XQuery] expressions. The organization of this section parallels the organization of Section 3 of the [XPath/XQuery] document.

[25 (XQuery)]   Expr   ::=   OrExpr
[13 (XPath)]   XPath   ::=   ExprSequence?

For each expression, a short description and the relevant grammar productions are given. The semantics of an expression includes the normalization, static analysis, and dynamic evaluation phases. Recall that normalization rules translate [XPath/XQuery] syntax into core [XPath/XQuery] syntax. In the sections that contain normalization rules, the Core grammar productions into which the expression is normalized are also provided. After normalization, sections on static type inference and dynamic evaluation define the static type and dynamic value for the core expression.

Core Grammar

The core grammar production for expressions is:

[22 (Core)]   Expr   ::=   ForExpr
|  ForExpr

5.1 Primary Expressions

Primary expressions are the basic primitives of the language.They include literals, variables, function calls, and the use of parentheses to control precedence of operators.

[62 (XQuery)]   PrimaryExpr   ::=   Literal |  FunctionCall |  ("$" VarName) |  ParenthesizedExpr
[13 (XQuery)]   VarName   ::=   QName

Core Grammar

The core grammar productions for primary expressions are:

[41 (Core)]   PrimaryExpr   ::=   Literal |  FunctionCall |  ("$" VarName) |  ParenthesizedExpr
[12 (Core)]   VarName   ::=   QName

5.1.1 Literals

Introduction

A literal is a direct syntactic representation of an atomic value. [XPath/XQuery] supports two kinds of literals: string literals and numeric literals.

[79 (XQuery)]   Literal   ::=   NumericLiteral |  StringLiteral
[78 (XQuery)]   NumericLiteral   ::=   IntegerLiteral |  DecimalLiteral |  DoubleLiteral
[1 (XQuery)]   IntegerLiteral   ::=   Digits
[2 (XQuery)]   DecimalLiteral   ::=   ("." Digits) |  (Digits "." [0-9]*)
[3 (XQuery)]   DoubleLiteral   ::=   (("." Digits) |  (Digits ("." [0-9]*)?)) ([e] | [E]) ([+] | [-])? Digits
[4 (XQuery)]   StringLiteral   ::=   (["] ((" ") |  [^"])* ["]) |  (['] ((' ') |  [^'])* ['])
[5 (XQuery)]   URLLiteral   ::=   (["] ((" ") |  [^"])* ["]) |  (['] ((' ') |  [^'])* ['])

Normalization

All literals are core expressions, therefore no normalization rules are required for literals.

Core Grammar

The core grammar productions for literals are:

[55 (Core)]   Literal   ::=   NumericLiteral |  StringLiteral
[54 (Core)]   NumericLiteral   ::=   IntegerLiteral |  DecimalLiteral |  DoubleLiteral
[1 (Core)]   IntegerLiteral   ::=   Digits
[2 (Core)]   DecimalLiteral   ::=   ("." Digits) |  (Digits "." [0-9]*)
[3 (Core)]   DoubleLiteral   ::=   (("." Digits) |  (Digits ("." [0-9]*)?)) ([e] | [E]) ([+] | [-])? Digits
[4 (Core)]   StringLiteral   ::=   (["] ((" ") |  [^"])* ["]) |  (['] ((' ') |  [^'])* ['])
[5 (Core)]   URLLiteral   ::=   (["] ((" ") |  [^"])* ["]) |  (['] ((' ') |  [^'])* ['])

Static Type Analysis

In the static semantics, the type of an integer literal is simply xs:integer:


statEnv |- IntegerLiteral : xs:integer

Dynamic Evaluation

In the dynamic semantics, an integer literal is evaluated by constructing an atomic value in the data model, which consists of the literal value and its type:


dynEnv |- IntegerLiteral => dm:atomic-value(IntegerLiteral, xs:integer)

The formal definitions of decimal, double, and string literals are analogous to those for integer.

Static Type Analysis


statEnv |- DecimalLiteral : xs:decimal

Dynamic Evaluation


dynEnv |- DecimalLiteral => dm:atomic-value(DecimalLiteral, xs:decimal)

Static Type Analysis


statEnv |- DoubleLiteral : xs:double

Dynamic Evaluation


dynEnv |- DoubleLiteral => dm:atomic-value(DoubleLiteral, xs:double)

Static Type Analysis


statEnv |- StringLiteral : xs:string

Dynamic Evaluation


dynEnv |- StringLiteral => dm:atomic-value(StringLiteral, xs:string)

Ed. Note: MFF: Phil has noted that the data model should support primitive literals in their lexical form, in which case no explicit dynamic semantic rule would be necessary. See [Issue-0118:  Data model syntax and literal values].

5.1.2 Variables

Introduction

A variable evaluates to the value to which the variable's QName is bound in the evaluation context.

Normalization

A variable is a core expression, therefore no normalization rule is required for a variable.

Static Type Analysis

In the static semantics, the type of a variable is simply its type in the static type environment statEnv.varType:

statEnv.varType(Variable) = Type

statEnv |- Variable : Type

If the variable is not bound in the environment, the system raises an error.

Dynamic Evaluation

In the dynamic semantics, a variable is evaluated by "looking up" its value in dynEnv.varValue:

dynEnv.varValue(Variable) = Value

dynEnv |- Variable => Value

If the variable is not bound in the environment, the system raises an error.

5.1.3 Parenthesized Expressions

[80 (XQuery)]   ParenthesizedExpr   ::=   "(" ExprSequence? ")"

The formal definition of parenthesized expressions is in [5.3 Sequence Expressions].

Core Grammar

The core grammar production for parenthesized expressions is:

[56 (Core)]   ParenthesizedExpr   ::=   "(" ExprSequence? ")"

5.1.4 Function Calls

Introduction

A function call consists of a QName followed by a parenthesized list of zero or more expressions.

[81 (XQuery)]   FunctionCall   ::=   <QName "("> (Expr ("," Expr)*)? ")"

Because [XPath/XQuery] performs implicit operations when doing function calls, a normalization step is required.

Notation

Normalization of function calls uses an auxiliary mapping rule, []FormalArgument, used to map the formal arguments of a function call.

There are two variants of this mapping rule. If the required type of the formal argument is a sequence of an atomic type, the following rule is applied:

 
[Expr]FormalArgument
==
[[Expr]Expr]Atomize

If the required type of the formal argument is not an optional atomic type or a sequence of atomic types, the expression is simply mapped to its corresponding core expression:

 
[Expr]FormalArgument
==
[Expr]Expr

Ed. Note: The above two rules should explicitly look up whether the signatures of the argument in question has atomic type or not.

Normalization

First, each argument in a function call is normalized to its corresponding core expression by applying []FormalArgument.

 
[ QName (Expr1, ..., Exprn) ]Expr
==
QName ( [Expr1]FormalArgument, ..., [Exprn]FormalArgument )

Note that this normalization rule depends on the static environment containing function signatures and is the only place where we exploit the implicit presence of statEnv.

Ed. Note: Moved the outermost normalization to the (dynamic!) evaluation rule as we have no idea at this point what type the context expects...and it does not seem to be necessary.

Core Grammar

The core grammar production for function calls is:

[57 (Core)]   FunctionCall   ::=   <QName "("> (Expr ("," Expr)*)? ")"

Static Type Analysis

Based on the function's name, the possible function signatures are retrieved from the static environment. If the function is not present in the environment with an appropriate signature, an error is raised. The type of each actual argument to the function must be a subtype of a type that is promotable to the corresponding formal argument type of the function.

statEnv |- QName => qname
statEnv.funcType(qname) = { FuncDecl1, ..., FuncDeclm }
FuncDecli = define function qname(Type1, ..., Typen) as Type
statEnv |- Expr1 : Type1'     Type1' <: Type1''     Type1'' can be promoted to Type1
...
statEnv |- Exprn : Typen'     Typen' <: Typen''     Typen'' can be promoted to Typen

statEnv |- QName (Expr1, ..., Exprn) : Type

Notice that the function body is not typed for each invocation: static typing of the declaration itself guarantees that the function will, indeed, always return a value of the declared return Type.

Dynamic Evaluation

Based on the function's name and argument types, the function body is retrieved from the dynamic environment. If the function is not present in the environment, an error is raised. The rule first evaluates each actual function argument expression. Next a match is searched for among the function's possible declaration signatures, retrieved from the statEnv.funcType static environment component. If the function is not present in the environment, or there is no matching declaration signature, a type error is raised. Otherwise, the function body and formal variables are obtained from dynEnv.funcDefn for the matching declaration signature. The rule then extends by binding each formal variable to its corresponding value converted as required for the expected type and backwards compatibility flag, and evaluates the body of the function in the new environment. The resulting value is the value of the function call.

Ed. Note: XPath 1.0 compatibility: Rules for converting function arguments not implemented yet. See [Issue-0178:  Semantics of XPath 1.0 compatibility].

statEnv |- QName => qname
statEnv.funcType(qname) = { FuncDecl1, ..., FuncDeclm }
FuncDecli = define function qname(Type1, ..., Typen) as Type
dynEnv |- Expr1 => Value1 ... dynEnv |- Exprn => Valuen
statEnv |- Value1 against Type1 promotes to Value1'
...
statEnv |- Valuen against Typen promotes to Valuen'
dynEnv.funcDefn(qname(Type1, ..., Typen)) = (Expr, Variable1, ... , Variablen)
dynEnv [ varValue = ( Variable1 |-> Value1'; ...; Variablen |-> Valuen') ] |- Expr => Value
statEnv |- Value against Type promotes to Value'

dynEnv |- QName ( Expr1, ..., Exprn ) => Value'

Note that the function body is evaluated in the default (top-level) environment extended with just the parameter bindings. Note also that input values and output values are matched against the types declared for the function. If static analysis was performed, all these checks are guaranteed to be true and may be omitted.

Built-in functions are invoked similarly except they are applied directly.

statEnv |- QName => qname
statEnv.funcType(qname) = { FuncDecl1, ..., FuncDeclm }
FuncDecli = define function qname(Type1, ..., Typen) as Type
dynEnv |- Expr1 => Value1 ... dynEnv |- Exprn => Valuen
statEnv |- Value1 against Type1 promotes to Value1'
...
statEnv |- Valuen against Typen promotes to Valuen'
dynEnv.funcDefn(qname(Type1, ..., Typen)) = #BUILT-IN
qname(Value1', ..., Valuen') => Value
statEnv |- Value against Type promotes to Value'

dynEnv |- QName ( Expr1, ..., Exprn ) => Value'

Primitive Function Application use the following additional auxiliary judgment to evaluate the primitive function application:

"The primitive function F (from data model, type constructor, or functions and operators) applied to the given parameter values yields the specified result value"

F(Value1, ..., Valuen) => Value

Ed. Note: Issue: Primitive function applications must be defined more precisely. See [Issue-0179:  Primitive function applications]

5.1.5 Comments

[6 (XQuery)]   ExprComment   ::=   "{--" [^}]* "--}"

Comments are lexical constructs only, and have no meaning within the query and therefore have no Formal Semantics.

5.2 Path Expressions

Introduction

Path expressions are used to locate nodes within a tree. There are two kinds of path expressions, absolute path expressions and relative path expressions. An absolute path expression is a rooted relative path expression. A relative path expression is composed of a sequence of steps.

[42 (XQuery)]   PathExpr   ::=   ("/" RelativePathExpr?) |  ("//" RelativePathExpr) |  RelativePathExpr
[43 (XQuery)]   RelativePathExpr   ::=   StepExpr (("/" |  "//") StepExpr)*

Core Grammar

PathExpr and RelativePathExpr are fully normalized. There for, there is no corresponding production in the core. The grammar of path expressions in the core starts with the StepExpr production.

Normalization

Absolute path expressions are path expressions starting with the / symbol, indicating that the expression must be applied on the root node in the current context. Remember that the root node in the current context is the topmost ancestor of the context node. The following two rules are used to normalize absolute path expressions to relative ones, and rely on the use of the fn:root() function, which computes the document node from the context node.

 
["/"]Expr
==
fn:root( $fs:dot )
 
["/" RelativePathExpr]Expr
==
[fn:root( $fs:dot ) "/" RelativePathExpr]Expr

Ed. Note: Some of the semantics of the root expression '/' are still undefined. For instance, what should the semantics of '/' be in case of a document fragment (e.g., created using XQuery element constructor). See [Issue-0123:  Semantics of /].

A composite relative path expression (using /) is normalized into a for expression by concatenating the sequences obtained by mapping each node of the left-hand side in document order to the sequence it generates on the right-hand side; the encapsulating call of the fs:distinct-doc-order function then ensures that the result is in document order without duplicates. The evaluation context is carefully set up (using notably the at $fs:position position binding in the for expression to bind the position.

Note that applying sorting by document order is enforces the restriction that input and output sequences contains only nodes.

 
[StepExpr1 "/" StepExpr2]Expr
==
fs:distinct-doc-order (
  let $fs:sequence := fs:distinct-doc-order( [StepExpr1]Expr ) return
  let $fs:last := fn:count($fs:sequence) return
  for $fs:dot at $fs:position in $fs:sequence return
    [StepExpr2]Expr
)

5.2.1 Steps

Introduction

[44 (XQuery)]   StepExpr   ::=   (ForwardStep |  ReverseStep |  PrimaryExpr) Predicates
[73 (XQuery)]   ForwardStep   ::=   (ForwardAxis NodeTest) |  AbbreviatedForwardStep
[74 (XQuery)]   ReverseStep   ::=   (ReverseAxis NodeTest) |  AbbreviatedReverseStep

Core Grammar

The core grammars productions for XPath steps are:

[29 (Core)]   StepExpr   ::=   ForwardStep |  ReverseStep |  PrimaryExpr
[52 (Core)]   ForwardStep   ::=   ForwardAxis NodeTest
[53 (Core)]   ReverseStep   ::=   ReverseAxis NodeTest

Notation

Step expressions can be followed by predicates. Normalization of predicates uses the following auxiliary mapping rule: []Predicates.

Normalization

Normalization of predicates need to distinguish between forward steps, reverse steps, and primary expressions.

As explained in the [XPath/XQuery] document, applying a step in XPath changes the focus (or context). The change of focus is made explicit by the normalization rule below, which binds the variable $fs:dot to the node currently being processed, and the variable $fs:position to the position (i.e., the position within the input sequence) of that node.

In the case predicates are applied on a forward step, the input sequence is first sorted in document order and duplicates are removed. The context is changed to bind the appropriate variable in document order.

Note that applying sorting by document order is enforces the restriction that input and output sequences contains only nodes.

 
[ForwardStep Predicates "[" Expr "]"]Expr
==
let $fs:sequence := fs:distinct-doc-order( [ForwardStep Predicates]Expr ) return
let $fs:last := fn:count($fs:sequence) return
for $fs:dot at $fs:position in $fs:sequence return
   if [Expr]Predicates then $fs:dot else ()

In the case predicates are applied on a reverse step, the input sequence is first sorted in document order and duplicates are removed. The context is changed such that the position variable is bound in reverse document order.

Note that applying sorting by document order is enforces the restriction that input and output sequences contains only nodes.

 
[ReverseStep Predicates "[" Expr "]"]Expr
==
let $fs:sequence := fs:distinct-doc-order( [ReverseStep Predicates]Expr ) return
let $fs:last := fn:count($fs:sequence) return
for $fs:dot at $fs:new in $fs:sequence return
let $fs:position := $fs:last - $fs:new + 1 return
  if [Expr]Predicates then $fs:dot else ()

In the case predicates are applied on a primary expression, the input sequence is processed in sequence order and the context is bound as in the case of forward axis. In that case, the sequence can contain both nodes and atomic values.

 
[PrimaryExpr Predicates "[" Expr "]"]Expr
==
let $fs:sequence := [PrimaryExpr Predicates]Expr return
let $fs:last := fn:count($fs:sequence) return
for $fs:dot at $fs:position in $fs:sequence return
   if [Expr]Predicates then $fs:dot else ()

The []Predicates function is specified in [5.2.2 Predicates].

5.2.1.1 Axes

Introduction

[63 (XQuery)]   ForwardAxis   ::=   <"child" "::">
|  <"descendant" "::">
|  <"attribute" "::">
|  <"self" "::">
|  <"descendant-or-self" "::">
[64 (XQuery)]   ReverseAxis   ::=   <"parent" "::">
[50 (XPath)]   ForwardAxis   ::=   <"child" "::">
|  <"descendant" "::">
|  <"attribute" "::">
|  <"self" "::">
|  <"descendant-or-self" "::">
|  <"following-sibling" "::">
|  <"following" "::">
|  <"namespace" "::">
[51 (XPath)]   ReverseAxis   ::=   <"parent" "::">
|  <"ancestor" "::">
|  <"preceding-sibling" "::">
|  <"preceding" "::">
|  <"ancestor-or-self" "::">

Core Grammar

The core grammars productions for XPath axis are:

[42 (Core)]   ForwardAxis   ::=   <"child" "::">
|  <"descendant" "::">
|  <"attribute" "::">
|  <"self" "::">
|  <"descendant-or-self" "::">
|  <"following-sibling" "::">
|  <"following" "::">
|  <"namespace" "::">
[43 (Core)]   ReverseAxis   ::=   <"parent" "::">
|  <"ancestor" "::">
|  <"preceding-sibling" "::">
|  <"preceding" "::">
|  <"ancestor-or-self" "::">

Notation

Some auxiliary grammar productions and judgments are introduced to structure the semantic rules for axes and node tests. The first is used to capture the principal node kind associated with each axis:

[61 (Formal)]   PrincipalNodeKind   ::=   "element" |  "attribute" |  "namespace"
[59 (Formal)]   Axis   ::=   ForwardAxis |  ReverseAxis

Notation

The principal node kind for each axis is specified using the following judgment.

Axis principal PrincipalNodeKind

Notation

The following auxiliary grammar production defines Filters, which denote either an axis, or a node test in the context of a principal node kind. An Axis merely combines the forward and reverse axes to allow judgments that range over both.

[59 (Formal)]   Axis   ::=   ForwardAxis |  ReverseAxis
[60 (Formal)]   Filter   ::=   Axis |  (PrincipalNodeKind NodeTest)

Notation

The semantics of filters is defined with three auxiliary judgments. The first judgment defines the effect of a filter on a value, and is used in the dynamic semantics. This judgment should be read as follows: in the current context (i.e., with respect to the dynamic environment dynEnv.varValue), applying the Filter on Value1 yields Value2:

dynEnv |- Filter on Value1 => Value2

The second judgment is defining the effect of a filter (either an axis or a nodetest) on a type and is used in the static semantics. This judgment should be read as follows: in the current context, (i.e., with respect to the the static environment statEnv.varType), applying the filter Filter on type Type1 yields the type Type2

statEnv |- Filter on Type1 : Type2

The last judgment allows an additional type environment, localTypeEnv. This "local" type environment maps variables to types (in the same way as statEnv.varType), and is used when computing the static type of the descendant axis.

statEnv ; localTypeEnv |- Filter on Type1 : Type2

These judgments will be defined separately for axes and node tests below.

Ed. Note: Explain the judgments in detail, maybe similarly to in section 3.2.

Static Type Analysis

The static semantics of an Axis NodeTest pair is obtained by retrieving the type of the context node, and applying the two filters (the Axis, and then the NodeTest with a PrincipalNodeKind) on the result.

statEnv.varType($fs:dot) = Type1
statEnv |- Axis on Type1 : Type2
Axis principal PrincipalNodeKind
statEnv |- PrincipalNodeKind, NodeTest on Type2 : Type3

statEnv |- Axis NodeTest : Type3

Here are the common set of inference rules defining the static semantics of filters. Essentially, these rules are used to process complex types through a filter, breaking the types down to the type of a node or of a value. The semantics of a filter applied to a node or value type is then specific to each Axis and NodeTest.


statEnv |- Filter on () : ()


statEnv |- Filter on none : none

statEnv |- Filter on Type1 : Type3
statEnv |- Filter on Type2 : Type4

statEnv |- Filter on Type1,Type2 : Type3,Type4

statEnv |- Filter on Type1 : Type3
statEnv |- Filter on Type2 : Type4

statEnv |- Filter on Type1|Type2 : Type3|Type4

statEnv |- Filter on Type1 : Type3
statEnv |- Filter on Type2 : Type4

statEnv |- Filter on Type1&Type2 : Type3&Type4

The cases for specific axis and node test filters are given in the axis and node test sections below.

Dynamic Evaluation

The dynamic semantics of an Axis NodeTest pair is obtained by retrieving the context node, and applying the two filters (Axis, then NodeTest) on the result. The application of each filter is expressed through the filter judgment as follows.

dynEnv.varValue($fs:dot) = Value1
dynEnv |- Axis on Value1 => Value2
Axis principal PrincipalNodeKind
dynEnv |- PrincipalNodeKind, NodeTest on Value2 => Value3

dynEnv |- Axis NodeTest => Value3

Here are the common set of inference rules defining the dynamic semantics of filters. These rules apply filters to a sequence, by breaking the sequence down to a node or value.


dynEnv |- Filter on () => ()

dynEnv |- Filter on Value1 => Value3
dynEnv |- Filter on Value2 => Value4

dynEnv |- Filter on Value1,Value2 => Value3,Value4

Ed. Note: Jerome: These rules fetaure a notational abuse! Value1,Value2 indicates the concatenation of Value1 and Value2, and is used for both constructing a new sequence and deconstructing it. When doing the construction, it is equivalent to applying the op:concatenate operator. The Formal Semantics specification should use more consistent notation for data model constructions and deconstruction. See also [Issue-0118:  Data model syntax and literal values].

The semantics of a filter applied to a specific axis or nodetest is given in the appropriate section below.

Normalization

The semantics of the following(-sibling) and preceding(-sibling) axes are expressed by mapping them to core expressions, all other axes are part of core [XPath/XQuery] and therefore are left unchanged through normalization.

 
[following-sibling:: NodeTest]Expr
==
typeswitch ($fs:dot)
  case $v as attribute return ()
  default $v return
    (for $fs:new in (for $fs:dot in (for $fs:dot in $fs:dot return parent::node() ) return child::NodeTest) return
      if dm:node-before($fs:dot,$fs:new) then $fs:new else ())

Ed. Note: Should all mappings be unfolded all the way to the core language like the above one?

 
[following:: NodeTest]Expr
==
[ancestor-or-self::node()/following-sibling::node()/descendant-or-self::NodeTest]Expr

Otherwise, the forward axis is part of the core [XPath/XQuery] and handled by the Axis/NodeTest semantics below:

 
[child:: NodeTest]Expr
==
child:: NodeTest
 
[attribute:: NodeTest]Expr
==
attribute:: NodeTest
 
[self:: NodeTest]Expr
==
self:: NodeTest
 
[descendant:: NodeTest]Expr
==
descendant:: NodeTest
 
[descendant-or-self:: NodeTest]Expr
==
descendant-or-self:: NodeTest
 
[namespace:: NodeTest]Expr
==
namespace:: NodeTest

Reverse axes:

 
[preceding-sibling:: NodeTest]Expr
==
for $fs:new in (for $fs:dot in (for $fs:dot in $fs:dot return parent::node()) return child::NodeTest) return
  if dm:node-before($fs:new,$fs:dot) then $fs:new else ()
 
[preceding:: NodeTest]Expr
==
[ancestor-or-self::node()/preceding-sibling::node()/descendant-or-self::NodeTest]Expr

Otherwise, the reverse axis is part of the core [XPath/XQuery] and is handled by the Axis/NodeTest semantics below.

 
[parent:: NodeTest]Expr
==
parent:: NodeTest
 
[ancestor:: NodeTest]Expr
==
ancestor:: NodeTest
 
[ancestor-or-self:: NodeTest]Expr
==
ancestor-or-self:: NodeTest

Finally, the principal node kind of each axis is enumerated by the following rules (given here because they are used both by the static and dynamic semantic rules).


attribute:: principal attribute


namespace:: principal namespace

Axis != attribute::     Axis != namespace::

Axis principal element

Static Type Analysis

The following rules define the static semantics of the filter judgment when applied to an Axis.

The type for the self axis is the same type as the type of the context node.


statEnv |- self:: on NodeType : NodeType

In case of elements, the type of the child axis is obtained by extracting the children types out of the content model describing the type of the context node.


statEnv |- child:: on element qname { AttrType, ChildType } : ChildType

Ed. Note: [Kristoffer/XSL] Need to add filters for all cases.

The type of the attribute axis is obtained by extracting the attribute types out of the content model describing the type of the context node.


statEnv |- attribute:: on element qname { AttrType, ChildType } : AttrType

Ed. Note: [Kristoffer/XSL] Need to add filters for all cases.

The type for the parent axis is either an element, a document (for document elements), or empty.


statEnv |- parent:: on element : (element|document)?

Ed. Note: Better typing for the parent axis is still an open issue. See [Issue-0080:  Typing of parent].

The type for the namespace axis is always empty.


statEnv |- namespace:: on NodeType : ()

Ed. Note: Jerome: The type of namespace nodes is still an open issue. See [Issue-0105:  Types for nodes in the data model.].

The types for the descendant, and descendant-or-self, ancestor, and ancestor-or-self axis are implemented through recursive application of the children and parent filters. The corresponding inference rules use the auxiliary filter judgment with a local type environment. This local type environment is used to keep track of the visited elements in order to deal with recursive types.

statEnv ; { } |- descendant:: on Type : Type'

statEnv |- descendant:: on Type : Type'

The following two rules are used to deal with global elements. Global elements need careful treatment here in order to deal with possible recursion. Notably, if the global element has been seen before, then the rule terminates and returns the union of all types already visited in the local environment.

localTypeEnv(qname) = Type
localTypeEnv = { Type1, Type2, ... }

statEnv |- descendant:: on element qname : (Type1|Type2|...)*

localTypeEnv(qname) is an error
statEnv.elemDecl(qname) = define element qname TypeReference
TypeReference resolves to TypeName { Type }
statEnv ; localTypeEnv[qname : Type] |- descendant:: on Type : Type'

statEnv localTypeEnv |- descendant:: on element qname Type'

Ed. Note: Peter: I need yet to figure out, whether this works, there still appear a few glitches. See [Issue-0132:  Typing for descendant].

In all other cases, the following rule applies

statEnv |- child:: on NodeType : Type1
statEnv |- descendant:: on Type1 : Type2

statEnv |- descendant:: on NodeType : Type1,Type2

statEnv |- self:: on NodeType : Type1
statEnv |- descendant:: on Type1 : Type2

statEnv |- descendant-or-self:: on NodeType : Type1,Type2

The type for the ancestor axis is the element*,document? type indicating that it includes elements and possibly a document element.


statEnv |- ancestor:: on NodeType : element*

statEnv |- self:: on NodeType : Type1
statEnv |- ancestor:: on Type1 : Type2

statEnv |- ancestor-or-self:: on NodeType : Type1, Type2

In all the other cases, the filter application results in an empty type.

statEnv |- AxisName:: on NodeType : () otherwise.

Ed. Note: KHR: "otherwise" rules should be avoided and specify the cases exhaustively.

Dynamic Evaluation

The inference rules below indicate for each axis whether a particular node should be considered further or whether the node should be left out. Therefore, each rule either returns the input node, or returns the empty sequence. The overall result is obtained by putting together all of the remaining nodes, following the generic semantics for filters.

The self axis just returns the context node.


dynEnv |- self:: on NodeValue => NodeValue

The child, parent, attribute and namespace axis are implemented through their corresponding accessors in the [XQuery 1.0 and XPath 2.0 Data Model].


dynEnv |- child:: on NodeValue => dm:children(NodeValue)


dynEnv |- attribute:: on NodeValue => dm:attributes(NodeValue)


dynEnv |- parent:: on NodeValue => dm:parent(NodeValue)

The descendant, descendant-or-self, ancestor, and ancestor-or-self axis are implemented through recursive application of the children and parent filters.

dynEnv |- child:: on NodeValue => Value1
dynEnv |- descendant:: on Value1 => Value2

dynEnv |- descendant:: on NodeValue => fs:distinct-doc-order(op:concatenate(Value1, Value2))

dynEnv |- self:: on NodeValue => Value1
dynEnv |- descendant:: on Value1 => Value2

dynEnv |- descendant-or-self:: on NodeValue => op:concatenate(Value1, Value2)

dynEnv |- parent:: on NodeValue => Value1
dynEnv |- ancestor:: on Value1 => Value2

dynEnv |- ancestor:: on NodeValue => op:concatenate(Value1, Value2)

dynEnv |- self:: on NodeValue => Value1
dynEnv |- ancestor:: on Value1 => Value2

dynEnv |- ancestor-or-self:: on NodeValue => op:concatenate(Value1, Value2)

In all the other cases, the filter application results in an empty sequence.

dynEnv |- AxisName:: on NodeValue => () otherwise.

Ed. Note: Peter: Generally steps may operate on nodes and values alike; the axis rules only can operate on nodes (NodeValue). Is it a dynamic error to apply an axis rule on a value? See [Issue-0125:  Operations on node only in XPath].

5.2.1.2 Node Tests

Introduction

A node test is a condition applied on the nodes selected by an axis step. Possible node tests are described by the following grammar productions.

[65 (XQuery)]   NodeTest   ::=   KindTest |  NameTest
[66 (XQuery)]   NameTest   ::=   QName |  Wildcard
[67 (XQuery)]   Wildcard   ::=   "*" |  <NCName ":" "*"> |  <"*" ":" NCName>
[68 (XQuery)]   KindTest   ::=   ProcessingInstructionTest
|  CommentTest
|  TextTest
|  AnyKindTest
[69 (XQuery)]   ProcessingInstructionTest   ::=   <"processing-instruction" "("> StringLiteral? ")"
[70 (XQuery)]   CommentTest   ::=   <"comment" "("> ")"
[71 (XQuery)]   TextTest   ::=   <"text" "("> ")"
[72 (XQuery)]   AnyKindTest   ::=   <"node" "("> ")"

Core Grammar

The core grammars productions for XPath node tests are:

[44 (Core)]   NodeTest   ::=   KindTest |  NameTest
[45 (Core)]   NameTest   ::=   QName |  Wildcard
[46 (Core)]   Wildcard   ::=   "*" |  <NCName ":" "*"> |  <"*" ":" NCName>
[47 (Core)]   KindTest   ::=   ProcessingInstructionTest
|  CommentTest
|  TextTest
|  AnyKindTest
[48 (Core)]   ProcessingInstructionTest   ::=   <"processing-instruction" "("> StringLiteral? ")"
[49 (Core)]   CommentTest   ::=   <"comment" "("> ")"
[50 (Core)]   TextTest   ::=   <"text" "("> ")"
[51 (Core)]   AnyKindTest   ::=   <"node" "("> ")"

Static Type Analysis

The following typing rules apply to filters that are node tests in the context of a principal node kind.

Node tests on elements and attributes always accomplish the most specific type possible. For example, if $v is bound to an element with a computed name, the type of $v is element { Type }. The static type computed for the expression $v/self::foo is element foo {Type}, which makes use of the nametest foo to arrive at a more specific type. Also note that each case of name matching restricts the principal node kind to the appropriate one.

Ed. Note: [Kristoffer/XSL] Perhaps use statEnv |- expand(QName) = qname(URI, ncname) instead of qname = prefix:local? Also The ElemOrAttrType doesn't allow wildcard in the ElementName or AttributeName, so either the rules can be removed or the definition should be updated.

QName2 = Prefix2:LocalPart2
fn:get-namespace-uri( QName1) = statEnv.namespace(Prefix2)
fn:get-local-name( QName1 ) = LocalPart2

statEnv |- element, QName2 on element QName1 {Type} : element QName1 {Type}

QName2 = Prefix2:LocalPart2
LocalPart1 = LocalPart2

statEnv |- element, QName2 on element *:LocalPart1 {Type} : element QName2 {Type}

QName2 = Prefix2:LocalPart2
statEnv.namespace(Prefix1) = statEnv.namespace(Prefix2)

statEnv |- element, QName2 on element Prefix1:* {Type} : element Prefix1:LocalPart2{Type}


statEnv |- element, QName2 on element {Type} : element QName2 {Type}

fn:get-local-name( QName1 ) = LocalPart2

statEnv |- element, *:LocalPart2 on element QName1 {Type} : element QName1 {Type}

LocalPart1 = LocalPart2

statEnv |- element, *:LocalPart2 on element *:LocalPart1 {Type} : element *:LocalPart2 {Type}


statEnv |- element, *:LocalPart2 on element Prefix1:* {Type} : element Prefix1:LocalPart2{Type}


statEnv |- element, *:LocalPart2 on element { Type } : element *:LocalPart2 {Type}

fn:get-namespace-uri( QName1) = statEnv.namespace(Prefix2)

statEnv |- element, Prefix2:* on element QName1 {Type} : element QName1 {Type}


statEnv |- element, Prefix2:* on element *:LocalPart1 {Type} : element Prefix2:LocalPart1 {Type}

statEnv.namespace(Prefix1) = statEnv.namespace(Prefix2)

statEnv |- element, Prefix2:* onelement Prefix1:* {Type} : element Prefix1:* {Type}


statEnv |- element, Prefix2:* on element {Type} : element Prefix2:* {Type}


statEnv |- element, * on element prefix:local {Type} : element prefix:local {Type}

Very similar typing rules apply to attributes:

QName2 = Prefix2:LocalPart2
fn:get-namespace-uri( QName1) = statEnv.namespace(Prefix2)
fn:get-local-name( QName1 ) = LocalPart2

statEnv |- attribute, QName2 on attribute QName1 {Type} : attribute QName1 {Type}

QName2 = Prefix2:LocalPart2
LocalPart1 = LocalPart2

statEnv |- attribute, QName2 on attribute *:LocalPart1 {Type} : attribute QName2 {Type}

QName2 = Prefix2:LocalPart2
statEnv.namespace(Prefix1) = statEnv.namespace(Prefix2)

statEnv |- attribute, QName2 on attribute Prefix1:* {Type} : attribute Prefix1:LocalPart2{Type}


statEnv |- attribute, QName2 on attribute {Type} : attribute QName2 {Type}

fn:get-local-name( QName1 ) = LocalPart2

statEnv |- attribute, *:LocalPart2 on attribute QName1 {Type} : attribute QName1 {Type}

LocalPart1 = LocalPart2

statEnv |- attribute, *:LocalPart2 on attribute *:LocalPart1 {Type} : attribute *:LocalPart2 {Type}


statEnv |- attribute, *:LocalPart2 on attribute Prefix1:* {Type} : attribute Prefix1:LocalPart2{Type}


statEnv |- attribute, *:LocalPart2 on attribute {Type} : attribute *:LocalPart2 {Type}

fn:get-namespace-uri( QName1) = statEnv.namespace(Prefix2)

statEnv |- attribute, Prefix2:* on attribute QName1 {Type} : attribute QName1 {Type}


statEnv |- attribute, Prefix2:* on attribute *:LocalPart1 {Type} : attribute Prefix2:LocalPart1 {Type}

statEnv.namespace(Prefix1) = statEnv.namespace(Prefix2)

statEnv |- attribute, Prefix2:* on attribute Prefix1:* {Type} : attribute Prefix1:* {Type}


statEnv |- attribute, Prefix2:* on attribute {Type} : attribute Prefix2:* {Type}


statEnv |- attribute, * on attribute prefix:local {Type} : attribute prefix:local {Type}

Comments, processing instructions, and text:


statEnv |- PrincipalNodeKind, processing-instruction() on processing-instruction : processing-instruction


statEnv |- PrincipalNodeKind, processing-instruction(String) on processing-instruction : processing-instruction


statEnv |- PrincipalNodeKind, comment() on comment : comment


statEnv |- PrincipalNodeKind, text() on text : text


statEnv |- PrincipalNodeKind, node() on NodeType : NodeType

If none of the above rules apply, then the node test returns the empty sequence and the following dynamic rule is applied:

statEnv |- PrincipalNodeKind, node() on NodeType : ()

Ed. Note: Peter: Except for self::, all axes guarantee that the NodeType is not the generic type node. However, when the type node is encountered, it has to be interpreted as element | attribute | text | comment | processing-instruction for these typing rules to work.

Dynamic Evaluation

The dynamic semantics indicates for each node test whether a node is retained.

dm:node-kind( NodeValue ) = PrincipalNodeKind
dm:name( NodeValue ) = QName1
QName2 = Prefix2:LocalPart2
fn:get-namespace-uri ( QName1 ) = statEnv.namespace(Prefix2)
fn:get-local-name ( QName1 ) = LocalPart2

dynEnv |- PrincipalNodeKind, QName2 on NodeValue => NodeValue

dm:node-kind( NodeValue ) = PrincipalNodeKind
dm:name ( NodeValue ) = qname

dynEnv |- PrincipalNodeKind, * on NodeValue => NodeValue

dm:node-kind( NodeValue ) = PrincipalNodeKind
dm:name ( NodeValue ) = qname
fn:get-namespace-uri ( qname ) = statEnv.namespace(prefix)

dynEnv |- PrincipalNodeKind, prefix:* on NodeValue => NodeValue

dm:node-kind( NodeValue ) = PrincipalNodeKind
dm:name ( NodeValue ) = qname
fn:get-local-name ( qname ) = local

dynEnv |- PrincipalNodeKind, *:local on NodeValue => NodeValue

dm:node-kind ( NodeValue ) = "processing-instruction"

dynEnv |- PrincipalNodeKind, processing-instruction() on NodeValue => NodeValue

dm:node-kind ( NodeValue ) = "processing-instruction"
dm:name ( NodeValue ) = qname
fn:get-local-name ( qname ) = String

dynEnv |- PrincipalNodeKind, processing-instruction( String ) on NodeValue => NodeValue

Ed. Note: Note the use of the fn:get-local-name function to extract the local name out of the name of the node.

dm:node-kind ( NodeValue ) = "comment"

dynEnv |- PrincipalNodeKind, comment() on NodeValue => NodeValue

dm:node-kind ( NodeValue ) = "text"

dynEnv |- PrincipalNodeKind, text() on NodeValue => NodeValue

The node() node test is true for all nodes. Therefore, the following rule does not have any precondition (remember that an empty upper part in the rule indicates that the rule is always true).


dynEnv |- PrincipalNodeKind, node() on NodeValue => NodeValue

If none of the above rules applies then the node test returns the empty sequence, and the following dynamic rule is applied:

dynEnv |- PrincipalNodeKind, node() on NodeValue => ()

5.2.2 Predicates

Introduction

Predicates are composed of zero or more expressions enclosed in square brackets.

[77 (XQuery)]   Predicates   ::=   ("[" Expr "]")*

Normalization

Predicates in path expressions are normalized with a special mapping rule:

 
[Expr]Predicates
==
typeswitch (Expr)
  case $v as numeric return op:numeric-equal($v, $fs:position)
  default $v return fn:boolean($v)

Note that the semantics of predicates whose parameter is a numeric value also works for other numeric than integer values, in which case the op:numeric-equal returns false when compared to a position. For example the expression //a[3.4] is allowed and always returns the empty sequence)

Ed. Note: Issue: The static semantics for the case where the parameter of a predicate is a numeric value is still an open issue. See [Issue-0011:  XPath tumbler syntax instead of index?].

5.2.3 Unabbreviated Syntax

The corresponding Section in the [XPath/XQuery] document just contains examples.

5.2.4 Abbreviated Syntax

[75 (XQuery)]   AbbreviatedForwardStep   ::=   "." |  ("@" NameTest) |  NodeTest
[76 (XQuery)]   AbbreviatedReverseStep   ::=   ".."

Normalization

Here are normalization rules for the abbreviated syntax.

 
[ // RelativePathExpr ]Expr
==
[ / descendant-or-self::node() / RelativePathExpr]Expr
 
[ StepExpr1 // StepExpr2 ]Expr
==
[StepExpr1 / descendant-or-self::node() / StepExpr2]Expr
 
[ . ]Expr
==
$fs:dot
 
[ .. ]Expr
==
parent::node()
 
[ @ NameTest ]Expr
==
attribute :: NameTest
 
[ NodeTest ]Expr
==
child :: NodeTest

5.3 Sequence Expressions

Introduction

[XPath/XQuery] supports operators to construct and combine sequences. A sequence is an ordered collection of zero or more items. An item is either an atomic value or a node.

5.3.1 Constructing Sequences

[24 (XQuery)]   ExprSequence   ::=   Expr ("," Expr)*
[35 (XQuery)]   RangeExpr   ::=   AdditiveExpr ( "to"  AdditiveExpr )?

Normalization

The sequence expression is normalized into a sequence of core expressions:

 
["(" ExprSequence ")"]Expr
==
"(" [ ExprSequence ]Expr ")"
 
[Expr1 , ... , Exprn]Expr
==
[Expr1]Expr, ..., [Exprn]Expr

Ed. Note: Mike Kay remarks that it would be cleaner to have binary rules and recursion to define the semantics rather than the ... notation.

The range operator is normalized to the op:to function.

 
[Expr1 to Expr2]Expr
==
op:to ([Expr1],[Expr2]Expr)

Core Grammar

The core grammar rule for sequence expressions is:

[21 (Core)]   ExprSequence   ::=   Expr ("," Expr)*

Static Type Analysis

The static semantics of the sequence expression follows. The type of the sequence expression is the sequence over the types of the individual expressions.

statEnv |- Expr1 : Type1      ...     statEnv |- Exprn : Typen

statEnv |- Expr1 , ... , Exprn : Type1, ..., Typen

Dynamic Evaluation

The dynamic semantics of the sequence expression follows. Each expression in the sequence is evaluated and the resulting values are concatenated into one sequence.

dynEnv |- Expr1 => Value1     dynEnv |- Expr2 => Value2     ...     dynEnv |- Exprn => Valuen

dynEnv |- Expr1, ..., Exprn => op:concatenate(Value1, op:concatenate(Value2, op:concatenate(... , Valuen) ) )

Normalization

The normalization of the infix "to" operator maps it into an application of op:to.

 
[Expr1 "op:to" Expr2]Expr
==
[op:to (Expr1, Expr2)]Expr

Static Type Analysis

The static semantics of the op:to function is defined in [7 Additional Semantics of Functions].

Dynamic Evaluation

The dynamic semantics rules for function calls given in [5.1.4 Function Calls] are applied to the function call op:to above.

Ed. Note: Should the "to" operator be defined formally? See [Issue-0133:  Should to also be described in the formal semantics?].

5.3.2 Combining Sequences

[XPath/XQuery] provides several operators for combining sequences.

[39 (XQuery)]   UnionExpr   ::=   IntersectExceptExpr ( ("union" |  "|")  IntersectExceptExpr )*
[40 (XQuery)]   IntersectExceptExpr   ::=   ValueExpr ( ("intersect" |  "except")  ValueExpr )*

Notation

First, a union, intersect, or except expression is normalized into a corresponding core expression. The mapping function []SequenceOp is defined by the following tables:

SequenceOp[SequenceOp]SequenceOp
"union"op:union
"|"op:union
"intersect"op:intersect
"except"op:except

Normalization

 
[Expr1 SequenceOp Expr2]Expr
==
[SequenceOp]SequenceOp ( [Expr1]Expr, [Expr2]Expr )

Static Type Analysis

The static semantics of the functions that operate on sequences are defined in [7 Additional Semantics of Functions].

Dynamic Evaluation

The dynamic semantics rules for function calls given in [5.1.4 Function Calls] are applied to the calls to functions on sequences above.

5.4 Arithmetic Expressions

[XPath/XQuery] provides arithmetic operators for addition, subtraction, multiplication, division, and modulus, in their usual binary and unary forms.

[36 (XQuery)]   AdditiveExpr   ::=   MultiplicativeExpr ( ("+" |  "-")  MultiplicativeExpr )*
[37 (XQuery)]   MultiplicativeExpr   ::=   UnaryExpr ( ("*" |  "div" |  "idiv" |  "mod")  UnaryExpr )*
[38 (XQuery)]   UnaryExpr   ::=   ("-" |  "+")* UnionExpr
[41 (XQuery)]   ValueExpr   ::=   ValidateExpr |  CastExpr |  TreatExpr |  Constructor |  PathExpr
[37 (XPath)]   ValueExpr   ::=   ValidateExpr |  CastExpr |  TreatExpr |  PathExpr

Notation

The mapping function []ArithOp is defined by the following table:

ArithOp[ArithOp]ArithOp
"+"fs:plus
"-"fs:minus
"*"fs:times
"div"fs:div
"idiv"fs:idiv
"mod"fs:mod

Core Grammar

All of those expressions are normalized to the following core grammar production:

[28 (Core)]   ValueExpr   ::=   ValidateExpr |  CastExpr |  Constructor |  StepExpr

Normalization

The normalization rules for the arithmetic operators first apply atomization to each argument and bind these values to variables. The type declarations on the let variables assert that the result of atomizing the arguments is the empty sequence or one atomic value. If either atomized argument is the empty sequence, the expression evaluates to the empty sequence. Otherwise, the overloaded internal function corresponding the the arithmetic operator is applied. The table above maps the operators to the appropriate internal function. The mapping from the overloaded internal functions to the corresponding monomorphic function is given in [B Mapping of Overloaded Internal Functions].

 
[Expr1 ArithOp Expr2]Expr
==
let atomic value? $e1 := [Expr1]Atomize return
let atomic value? $e2 := [Expr2]Atomize return
   typeswitch ($e1)
   case $v as empty return ()
   default $v1
     typeswitch ($e2)
     case $v as empty return ()
     default $v2 return [ArithOp]ArithOp ($v1, $v2))

Ed. Note: Mary: Casting from untyped to double not implemented. Current (function call) semantics will always attempt to promote an untyped value to the numeric type of the other argument, instead of directly to double. See [Issue-0177:  Coercion between untyped and atomic values].

The unary operators are mapped similarly.

 
[+ Expr]Expr
==
let atomic value? $e := [Expr]Atomize return
   typeswitch ($e)
   case $v as empty return ()
   default $v return fs:plus(0, $v))
 
[- Expr]Expr
==
let atomic value? $e := [Expr]Atomize return
   typeswitch ($e)
   case $v as empty return ()
   default $v return fs:minus (0, $v))

Ed. Note: XPath 1.0 compatibility: Rules for converting function arguments of arithmetic operators not implemented yet. See [Issue-0178:  Semantics of XPath 1.0 compatibility].

Ed. Note: XPath 1.0 backwards compatibility not yet incorporated. See [Issue-0178:  Semantics of XPath 1.0 compatibility].

Core Grammar

There are no core grammar rules for arithmetic expressions as they are normalized to function calls.

Static Type Analysis

There are no static typing rules for arithmetic expressions as they are normalized to core function calls.

Dynamic Evaluation

The normalization rules map all arithmetic operators into core expressions, whose dynamic semantics is defined in other sections. Therefore, there are no dynamic semantics rules for arithmetic operators. The dynamic semantics rules for function calls given in [5.1.4 Function Calls] apply to all the function calls.

5.5 Comparison Expressions

Introduction

Comparison expressions allow two values to be compared. [XPath/XQuery] provides four kinds of comparison expressions, called value comparisons, general comparisons, node comparisons, and order comparisons.

[34 (XQuery)]   ComparisonExpr   ::=   RangeExpr ( (ValueComp
|  GeneralComp
|  NodeComp
|  OrderComp)  RangeExpr )?
[54 (XQuery)]   ValueComp   ::=   "eq" |  "ne" |  "lt" |  "le" |  "gt" |  "ge"
[53 (XQuery)]   GeneralComp   ::=   "=" |  "!=" |  "<" |  "<=" |  ">" |  ">="
[55 (XQuery)]   NodeComp   ::=   "is" |  "isnot"
[56 (XQuery)]   OrderComp   ::=   "<<" |  ">>"

5.5.1 Value Comparisons

Notation

The mapping function []ValueOp is defined by the following table:

ValueOp[ValueOp]ValueOp
"eq"fs:eq
"ne"fs:ne
"lt"fs:lt
"le"fs:le
"gt"fs:gt
"ge"fs:ge

Normalization

The normalization rules for the value comparison operators first apply atomization to each argument and bind these values to variables. The type declarations on the let variables assert that the result of atomizing the arguments is the empty sequence or one atomic value. If either atomized argument is the empty sequence, the expression evaluates to the empty sequence. Otherwise, the overloaded internal function corresponding the the value comparison operator is applied. The table above maps the operators to the appropriate internal function. The mapping from the overloaded internal functions to the corresponding monomorphic function is given in [B Mapping of Overloaded Internal Functions].

 
[Expr1 ValueOp Expr2]Expr
==
let atomic value? $e1 := [Expr1]Atomize return
let atomic value? $e2 := [Expr2]Atomize return
   typeswitch ($e1)
   case $v as empty return ()
   default $v1
     typeswitch ($e2)
     case $v as empty return ()
     default $v2 return [ValueOp]ValueOp ($v1, $v2))

Core Grammar

There are no core grammar rules for value comparisons as they are normalized to function calls.

Static Type Analysis

There are no static type rules for the value comparison operators as they are normalized to function calls. As it were, they all have return type xs:boolean, as specified in [XQuery 1.0 and XPath 2.0 Functions and Operators].

Dynamic Evaluation

The normalization rules map all value comparison operators into core expressions, therefore, there are no dynamic semantics rules for value comparison operators. The dynamic semantics rules for function calls given in [5.1.4 Function Calls] apply to all the function calls.

5.5.2 General Comparisons

Introduction

General comparisons are defined by adding existential semantics to value comparisons. The operands of a general comparison may be sequences of any length. The result of a general comparison is always true or false.

Notation

For convenience, GeneralOp denotes the operators "=", "!=", "<", "<=", ">", or ">=".

The function []GeneralOp is defined by the following table:

GeneralOp[GeneralOp]GeneralOp
"=""eq"
"!=""ne"
"<""lt"
"<=""le"
">""gt"
">=""ge"

Normalization

A general comparison expression is normalized by mapping it into an existentially quantified, value-comparison expression, which is normalized recursively.

 
[Expr1 GeneralOp Expr2]Expr
==
[some $v1 in Expr1 satisfies (some $v2 in Expr2 satisfies $v1 [GeneralOp]GeneralOp $v2)]Expr

Ed. Note: XPath 1.0 backwards compatibility not yet incorporated.See [Issue-0178:  Semantics of XPath 1.0 compatibility].

Core Grammar

There are no core grammar rules for general comparisons as they are normalized to existentially quantified core expressions.

Static Type Analysis

There are no static type rules for the general comparison operators. The existentially quantified "some" expression always returns xs:boolean. Its static typing semantics is given in [5.11 Quantified Expressions].

Dynamic Evaluation

The normalization rules map all general comparison operators into core expressions, whose dynamic semantics is defined in other sections. Therefore, there are no dynamic semantics rules for general comparison operators.

5.5.3 Node Comparisons

Normalization

The normalization rules for node comparisons map each argument expression and bind these values to variables. The type declarations on the let variables assert that each value is the empty sequence or one node value. If either argument is the empty sequence, the expression evaluates to the empty sequence. Otherwise, the appropriate node comparison function is applied.

 
[Expr1 is Expr2]Expr
==
let node? $e1 := [Expr1]Expr return
let node? $e2 := [Expr2]Expr return
   typeswitch ($e1)
   case $v as empty return ()
   default $v1
     typeswitch ($e2)
     case $v as empty return ()
     default $v2 return op:node-equal($v1, $v2)
 
[Expr1 isnot Expr2]Expr
==
let node? $e1 := [Expr1]Expr return
let node? $e2 := [Expr2]Expr return
   typeswitch ($e1)
   case $v as empty return ()
   default $v1
     typeswitch ($e2)
     case $v as empty return ()
     default $v2 return fn:not(op:node-equal($v1, $v2))

Core Grammar

There are no core grammar rules for node comparisons as they are normalized to function calls.

Static Type Analysis

There are no static type rules for the node comparison operators.

Dynamic Evaluation

The normalization rules map the node comparison operators into core expressions, whose dynamic semantics is defined in other sections. Therefore, there are no dynamic semantics rules for node comparison operators.

5.5.4 Order Comparisons

Normalization

The normalization rules for order comparisons map each argument expression and bind these values to variables. The type declarations on the let variables assert that each value is the empty sequence or one node value. If either argument is the empty sequence, the expression evaluates to the empty sequence. Otherwise, the appropriate order comparison function is applied.

 
[Expr1 << Expr2]Expr
==
let node? $e1 := [Expr1]Expr return
let node? $e2 := [Expr2]Expr return
   typeswitch ($e1)
   case $v as empty return ()
   default $v1
     typeswitch ($e2)
     case $v as empty return ()
     default $v2 return fn:node-before($v1, $v2))
 
[Expr1 >> Expr2]Expr
==
let node? $e1 := [Expr1]Expr return
let node? $e2 := [Expr2]Expr return
   typeswitch ($e1)
   case $v as empty return ()
   default $v1
     typeswitch ($e2)
     case $v as empty return ()
     default $v2 return fn:node-after($v1, $v2))

Core Grammar

There are no core grammar rules for order comparisons as they are normalized to function calls.

Static Type Analysis

There are no static type rules for the order comparison operators.

Dynamic Evaluation

The normalization rules map the order comparison operators into core expressions, whose dynamic semantics is defined in other sections. Therefore, there are no dynamic semantics rules for order comparison operators.

5.6 Logical Expressions

Introduction

A logical expression is either an and-expression or an or-expression. The value of a logical expression is always one of the boolean values: true or false.

[26 (XQuery)]   OrExpr   ::=   AndExpr ( "or"  AndExpr )*
[27 (XQuery)]   AndExpr   ::=   FLWRExpr ( "and"  FLWRExpr )*

Normalization

The normalization rules for "and" and "or" first get the effective boolean value of each argument, then apply the appropriate operand.

 
[Expr1 and Expr2]Expr
==
fn:empty( for $fs:item in ( [[Expr1]Expr]Effective_Boolean_Value , [[Expr2]Expr]Effective_Boolean_Value ) return if ($fs:item) then () else 1)
 
[Expr1 or Expr2]Expr
==
fn:exists( for $fs:item in ( [[Expr1]Expr]Effective_Boolean_Value , [[Expr2]Expr]Effective_Boolean_Value ) return if ($fs:item) then 1 else ())

Ed. Note: Note that the non-determinism in the semantics of quantified expressions is not modeled here. See [Issue-0136:  Non-determinism in the semantics].

Core Grammar

There are no core grammar rules for logical expressions which normalized to other kinds of [XPath/XQuery] expressions.

Static Type Analysis

There are no static type rules for the logical comparison operators. They both have return type xs:boolean, as specified in [XQuery 1.0 and XPath 2.0 Functions and Operators].

Dynamic Evaluation

The normalization rules map the logical comparison operators into core expressions, whose dynamic semantics is defined in other sections. Therefore, there are no dynamic semantics rules for logical comparison operators.

Notice that the conditional operators are non-deterministic because for is.

5.7 Constructors

Ed. Note: Status:This section is still awaiting further revision related to the semantics of element construction and validation. See [Issue-0166:  Static typing for validate] and [Issue-0170:  Imprecise static type of constructed elements].

[XPath/XQuery] supports two forms of constructors: a "literal" form that follows the XML syntax, and element and attribute constructors that can be used to construct stand-alone elements and attributes, possibly with a computed name.

[52 (XQuery)]   Constructor   ::=   ElementConstructor
|  XmlComment
|  XmlProcessingInstruction
|  CdataSection
|  ComputedDocumentConstructor
|  ComputedElementConstructor
|  ComputedAttributeConstructor
|  ComputedTextConstructor
[94 (XQuery)]   ElementConstructor   ::=   ( |  "<") QName AttributeList ("/>" |  (">" ElementContent* "</" QName S? ">"))
[102 (XQuery)]   ElementContent   ::=   Char
|  "{{"
|  "}}"
|  ElementConstructor
|  EnclosedExpr
|  CdataSection
|  CharRef
|  PredefinedEntityRef
|  XmlComment
|  XmlProcessingInstruction
[103 (XQuery)]   AttributeList   ::=   (S (QName S? "=" S? AttributeValue)?)*
[104 (XQuery)]   AttributeValue   ::=   (["] (EscapeQuot |  AttributeValueContent)* ["])
|  (['] (EscapeApos |  AttributeValueContent)* ['])
[105 (XQuery)]   AttributeValueContent   ::=   Char
|  CharRef
|  "{{"
|  "}}"
|  EnclosedExpr
|  PredefinedEntityRef
[106 (XQuery)]   EnclosedExpr   ::=   ( |  "{") ExprSequence "}"

Core Grammar

The core grammar productions for constructors are:

[35 (Core)]   Constructor   ::=   XmlComment
|  XmlProcessingInstruction
|  ComputedDocumentConstructor
|  ComputedElementConstructor
|  ComputedAttributeConstructor
|  ComputedTextConstructor
[77 (Core)]   EnclosedExpr   ::=   ( |  "{") ExprSequence "}"

5.7.1 Element Constructors

Introduction

[102 (XQuery)]   ElementContent   ::=   Char
|  "{{"
|  "}}"
|  ElementConstructor
|  EnclosedExpr
|  CdataSection
|  CharRef
|  PredefinedEntityRef
|  XmlComment
|  XmlProcessingInstruction

The static and dynamic semantics of the literal forms of element constructors is obtained after normalization to computed element constructors.

Notation

The auxiliary mapping rules []ElementContent and []AttributeContent are used for the normalization of element and attribute content respectively.

Normalization

Literal characters, escaped curly braces, character references, and predefined entity references in element content are treated specially. This normalization rule assumes:

  1. that the significant whitespace characters in element constructors have been preserved, as described in [5.7.3 Whitespace in Constructors];

  2. that character references have been resolved to individual characters and predefined entity references have been resolved to sequences of characters, and

  3. that the rule is applied to the longest contiguous sequence of characters.

The following normalization rules take the longest consecutive sequence of individual characters that arise from literal characters, escaped curly braces, character references, and predefined entity references and normalizes the character sequence as a text node.

 
[Char*]ElementContent
==
dm:text-node(fs:characters_to_string(Char*))

We start with the rules for normalizing a literal XML element's content. We normalize each individual value and construct a sequence of the normalized values. Consecutive sequences of characters are normalized as a unit, applying the rule above.

 
[ ElementContent0 ..., ElementContentn ]ElementContent
==
([ ElementContent0 ]ElementContent , ..., [ ElementContentn]ElementContent)

We normalize an enclosed expression in element content by normalizing each individual expression in its expression sequence and then construct a sequence of the normalized values:

 
[ { Expr0, ..., Exprn } ]ElementContent
==
([ Expr0 ]Expr , ..., [ Exprn]Expr, dm:text-node(""))

We need to distinguish between multiple enclosed expressions, because the rule for converting sequences of atomic values into strings are applied to sequences within distinct enclosed expressions. To distinguish between multiple enclosed expressions, we add an empty text node to the end of the constructed sequence above -- this empty text node is eliminated in the dynamic evaluation rule when consecutive text nodes are coalesced into a single text node. The text node guarantees that a whitespace character will not be inserted between atomic values computed by distinct enclosed expressions.

Processing instructions and comments in element content are normalized by applying the standard normalization rules.

 
[XmlProcessingInstruction]ElementContent
==
[XmlProcessingInstruction]Expr
 
[XmlComment]ElementContent
==
[XmlComment]Expr

The following rules normalize the two forms of literal XML element constructors by normalizing their name, attribute list, and element content and translates the normalized values into the corresponding computed element constructor.

[94 (XQuery)]   ElementConstructor   ::=   ( |  "<") QName AttributeList ("/>" |  (">" ElementContent* "</" QName S? ">"))
 
[ < QName AttributeList > ElementContent* </ QName S? > ]Expr
==
element [QName]{ [ AttributeList ]AttributeContent , [ ElementContent* ]ElementContent }
 
[ < QName AttributeList /> ]Expr
==
element [QName]{ [ AttributeList ]AttributeContent }

Like literal XML element constructors, literal XML attribute constructors are normalized to computed attribute constructors.

[103 (XQuery)]   AttributeList   ::=   (S (QName S? "=" S? AttributeValue)?)*
[104 (XQuery)]   AttributeValue   ::=   (["] (EscapeQuot |  AttributeValueContent)* ["])
|  (['] (EscapeApos |  AttributeValueContent)* ['])
[105 (XQuery)]   AttributeValueContent   ::=   Char
|  CharRef
|  "{{"
|  "}}"
|  EnclosedExpr
|  PredefinedEntityRef

Literal characters, escaped curly braces, character references, and predefined entity references in attribute content are treated as in element content. In addition, the normalization rule for characters in attributes assumes:

  1. that an escaped single or double quote is converted to an individual single or double quote.

The following normalization rules take the longest consecutive sequence of individual characters that arise from literal characters, escaped curly braces, character references, predefined entity references, and escaped single and double quotes and normalizes the character sequence as a string.

 
[Char*]AttributeContent
==
fs:characters_to_string(Char*)

We normalize an enclosed expression in attribute content by normalizing each individual expression in its expression sequence and then construct a sequence of the normalized values:

 
[ { Expr0, ..., Exprn } ]AttributeContent
==
([ Expr0 ]Expr , ..., [ Exprn]Expr, dm:text-node(""))

As in literal XML elements, we need to distinguish between multiple enclosed expressions in attribute content, because the rule for converting sequences of atomic values into strings are applied to sequences within distinct enclosed expressions. As in element content, to distinguish between multiple enclosed expressions in attribute content, we add an empty text node to the end of the constructed sequence above -- this empty text node is eliminated in the dynamic evaluation rule.

We normalize an AttributeValue (a literal attribute's value) by normalizing each individual value in its content and then construct a sequence of the normalized values. Consecutive sequences of characters are normalized as a unit, applying the rules above.

 
[ AttributeValueContent0 ..., AttributeValueContentn ]AttributeContent
==
([ AttributeValueContent0 ]AttributeContent , ..., [ AttributeValueContentn]AttributeContent)

An AttributeList is normalized by the following rule, which maps each of the individual attribute-value expressions in the attribute list and constructs a sequence of the normalized values.

 
[
QName0 S? = S? (AttributeValue0 | EnclosedExpr0) ...
QNamen S? = S? (AttributeValuen | EnclosedExprn) ...
]
==
(attribute [QName0] { [ (AttributeValue0 | EnclosedExpr0) ]AttributeContent},
...,
attribute [QNamen] { [ (AttributeValuen | EnclosedExprn) ]AttributeContent})

Core Grammar

There are no core grammar rules for literal XML element or attribute constructors as they are normalized to computed constructors.

Static Type Analysis

There are no additional static type rules for literal XML element or attribute constructors.

Dynamic Evaluation

There are no additional dynamic evaluation rules for literal XML element or attribute constructors.

5.7.2 Computed Element and Attribute Constructors

[95 (XQuery)]   ComputedDocumentConstructor   ::=   <"document" "{"> ExprSequence "}"
[96 (XQuery)]   ComputedElementConstructor   ::=   (<"element" QName "{"> |  (<"element" "{"> Expr "}" "{")) ExprSequence? "}"
[97 (XQuery)]   ComputedAttributeConstructor   ::=   (<"attribute" QName "{"> |  (<"attribute" "{"> Expr "}" "{")) ExprSequence? "}"
[98 (XQuery)]   ComputedTextConstructor   ::=   <"text" "{"> ExprSequence? "}"
5.7.2.1 Computed Text Nodes

Normalization

A text node constructor contains an expression, which must evaluate to an xs:string value.

 
[text { Expr }]Expr
==
text { [Expr]Expr }

Static Type Analysis

The static semantics checks that the argument expression has type xs:string. The type of the entire expression is simply the text node type.

statEnv |- Expr : xs:string

statEnv |- text { Expr } : text

Dynamic Evaluation

The dynamic semantics checks that the argument expression evaluates a value that has type xs:string. The entire expression evaluates to a new text node by applying the text node constructor dm:text-node.

dynEnv |- Expr => Value      dynEnv |- Value matches xs:string

statEnv |- text { Expr } => dm:text-node ( Value )

5.7.2.2 Computed Document Nodes

Normalization

A document node constructor contains an expression, which must evaluate to a sequence of element nodes.

 
[document { Expr }]Expr
==
document { [Expr]Expr }

Static Type Analysis

The static semantics checks that the argument expression has type sequence of elements. The type of the entire expression is simply the document node type.

statEnv |- Expr : element*

statEnv |- document { Expr } : document

Dynamic Evaluation

The dynamic semantics checks that the argument expression evaluates a value that has type element*. The entire expression evaluates to a new document node by applying the text node constructor .

dynEnv |- Expr => Value      dynEnv |- Value matches element*

statEnv |- document { Expr } => ( Value )

5.7.2.3 Computed Element Nodes

An element constructor may contain an arbitrary sequence of items, e.g., atomic values, attributes, and child elements. Here's an element constructor that contains an integer, an element, a string, and an attribute:

element address { 123, element street {"Roosevelt Ave."},
"Flushing, NY", attribute zip { "11368" } }

Normalization

Computed element constructors are normalized by mapping their name and expression sequence. The normalization rule partitions attribute nodes from other nodes and atomic values in the element content.

 
[element QName { ExprSequence }]Expr
==
let $e := [ExprSequence]Expr return
let $attributes := for $fs:new in $e return
     typeswitch $fs:new
     case $a as attribute return $a
     default return ()
let $everythingelse := for $fs:new in $e return
     typeswitch $fs:new
     case $a as attribute return ()
     default $v return $v
return
   element [QName] { $attributes, $everythingelse }

When the name of an element is computed, the normalization rule also checks that the value of the element's name is a xs:QName.

 
[element { Expr } { ExprSequence }]Expr
==
let $name as xs:QName := [Expr]Expr return
let $e := [ExprSequence]Expr return
let $attributes := for $fs:new in $e return
     typeswitch $fs:new
     case $a as attribute return $a
     default return ()
let $everythingelse := for $fs:new in $e return
     typeswitch $fs:new
     case $a as attribute return ()
     default $v return $v
return
   element { $name } { $attributes, $everythingelse }
5.7.2.4 Constructed Attribute Nodes

Normalization

Computed attribute expressions are normalized by mapping their sub-expressions.

 
[attribute QName { ExprSequence }]Expr
==
attribute [QName]Expr { [ExprSequence]Expr }
 
[attribute { Expr } { ExprSequence }]Expr
==
let $name as xs:QName := [Expr]Expr return
attribute { $name } { [ExprSequence]Expr }

Core Grammar

The core grammar rules for computed constructors are:

[71 (Core)]   ComputedElementConstructor   ::=   (<"element" QName "{"> |  (<"element" "{"> Expr "}" "{")) ExprSequence? "}"
[72 (Core)]   ComputedAttributeConstructor   ::=   (<"attribute" QName "{"> |  (<"attribute" "{"> Expr "}" "{")) ExprSequence? "}"
[70 (Core)]   ComputedDocumentConstructor   ::=   <"document" "{"> ExprSequence "}"
[73 (Core)]   ComputedTextConstructor   ::=   <"text" "{"> ExprSequence? "}"

Static Type Analysis

The normalization rules leave us with only the operator form of the element or attribute constructor to handle. The element (attribute) operator still has two forms: one in which a QName is supplied as the element (attribute) name, and the other in which a computed expression is supplied. In the latter case, the name cannot be known until runtime, and the element (attribute) is given a wildcard type.

Note that the static type for constructed elements and attributes is very poor as the static type of their content is lost during construction. A possible solution is to implicitly validate when the element is constructed. See [Issue-0169:  Conformance Levels] for suggestions.

statEnv |- expand(QName) = qname

statEnv |- element QName { ExprSequence } : element qname { xs:anyType }

statEnv |- Expr : xs:QName

statEnv |- element { Expr } { ExprSequence } : element { xs:anyType }

statEnv |- Expr : xs:QName

statEnv |- attribute { Expr } { ExprSequence } : attribute { xs:anySimpleType }

statEnv |- expand(QName) = qname

statEnv |- attribute QName { ExprSequence } : attribute qname { xs:anySimpleType }

Ed. Note: DD: the above is a necessary consequence of our current evaluation model. One variation that I can think of is that the static rules could try looking up the element name in element environment. If the element name is found, then the associated, named type is used (and the contents are required to match). If the named element type is not found, then the constructed type is used, as before. Thus, in this scenario, validation would be done whenever an element with a "global" name was created. See also comments on the impact of (non-)validation on element construction, in the next section.

Dynamic Evaluation

The following rules construct an element from its name and content. The earlier normalization rule guarantees that all attributes precede other element content, which is a sequence of node and atomic values. Section [3.7.4 Data Model Representation] in [XQuery 1.0: A Query Language for XML] specifies the rules for converting a sequence of atomic values into a text node prior to element construction. As formal specification of these conversion rules is not instructive, the function [7.1.3 The fs:item-sequence-to-node-sequence function] implements this conversion.

dynEnv |- QName => qname
dynEnv |- ExprSequence => ( AttributeValues, Items )

dynEnv |- element QName { ExprSequence } =>
       dm:element-node(qname,
dm:empty-sequence(),
AttributeValues,
fs:item-sequence-to-node-sequence(Items)
xs:anyType)

dynEnv |- Expr => qname
dynEnv |- ExprSequence => ( AttributeValues, Items )

dynEnv |- element { Expr } { ExprSequence } =>
       dm:element-node(qname,
dm:empty-sequence(),
AttributeValues,
fs:item-sequence-to-node-sequence(Items)
xs:anyType)

Note that the element constructor dm:element-node makes copies of any nodes in its arguments; therefore the result of element construction is a completely "new" element.

The following rules construct an attribute from its name and values and are similar to those for elements. Section [3.7.2 Data Model Representation] in [XQuery 1.0: A Query Language for XML] specifies the rules for converting a sequence of atomic values into a string prior to attribute construction. Each node is replaced by its string value. For each adjacent sequence of one or more atomic values returned by an enclosed expression, a string is constructed, containing the canonical lexical representation of all the atomic values, with a single blank character inserted between adjacent values. As formal specification of these conversion rules is not instructive, the function [7.1.4 The fs:item-sequence-to-string function] implements this conversion.

dynEnv |- QName => qname
dynEnv |- ExprSequence => AtomicValues

dynEnv |- attribute QName { ExprSequence } =>
       dm:attribute-node(qname,
fs:item-sequence-to-string(AtomicValues),
xs:anySimpleType)

dynEnv |- Expr => qname
dynEnv |- ExprSequence => AtomicValues

dynEnv |- attribute { Expr } { ExprSequence } =>
       dm:attribute-node(qname,
fs:item-sequence-to-string(AtomicValues),
xs:anySimpleType)

Ed. Note: DD: There are several issues with element construction:

  • Namespaces and more precise type annotations in the constructor are not supported. This is due to the bottom-up nature of element construction: in general, one does not know either the namespaces in scope or the validation-associated schema type until this element has been "seated" in some containing element (and so on recursively). See [Issue-0165:  Namespaces in element constructors].

    There is a possible solution to this chicken-and-egg problem, however: because the element constructor makes copies of its children, it could be the responsibility of the element constructor to "fill in" the values for namespaces-in-scope and schema-component on each newly-copied child (recursively), based on information provided for the node. In this scenario, these fields would remain "blank" until some appropriate activity caused a schema component to become associated with a node, etc.

    One implication of such a scheme would be that the "value" of elements could change as they are copied into a new containing element. For example, defaulted attributes could be added. Possibly the interpretation of data values would change as well, e.g. a data value supplied as a string could be re-interpreted as a number.

  • Any special treatment of xmlns, etc. that would be needed to associate namespaces with elements is not modeled.

  • Even if/when schema components are available, it is not clear when or how defaulted attributes and/or elements are created.

5.7.3 Whitespace in Constructors

Section [3.7.3 Whitespace in Constructors] in [XQuery 1.0: A Query Language for XML] describes how whitespace in element and attribute constructors is processed depending on the value of the xmlspace declaration in the query prolog. The formal semantics assumes that the rules for handling whitespace are applied prior to normalization rules, for example, during parsing of a query. Therefore, there are no formal rules for handling whitespace.

5.7.4 Other Constructors and Comments

[99 (XQuery)]   CdataSection   ::=   "<![CDATA[" Char* "]]>"
[100 (XQuery)]   XmlProcessingInstruction   ::=   "<?" PITarget Char* "?>"
[101 (XQuery)]   XmlComment   ::=   "<!--" Char* "-->"

Core Grammar

The core grammar productions for other constructors and comments are:

[75 (Core)]   XmlProcessingInstruction   ::=   "<?" PITarget Char* "?>"
[76 (Core)]   XmlComment   ::=   "<!--" Char* "-->"

Normalization

A literal XML character data (CDATA) section is normalized into a text node constructor by applying the rule for converting characters to a text node in element content.

 
[<![CDATA[" Char* "]]>]Expr
==
[Char*]ElementContent

A literal XML processing instruction is normalized into a processing instructor constructor; its character content is converted to a string as in attribute content.

 
[<? NCName Char* ?>"]Expr
==
dm:processing-instruction-node(NCName [Char*]AttributeContent)

A literal XML comment is normalized into a comment constructor; its character content is converted to a string as in attribute content.

 
[<!-- Char* -->]Expr
==
dm:comment-node([Char*]AttributeContent)

Static Type Analysis

There are no additional static type rules for CDATA, processing instructions or comments.

Dynamic Evaluation

There are no additional dynamic evaluation rules for CDATA, processing instructions or comments.

5.8 [For/FLWOR] Expressions

Introduction

[XPath/XQuery] provides [For/FLWOR] expressions for iteration, for binding variables to intermediate results, and filtering bound variables according to a predicate.

A FLWRExpr in XQuery 1.0 consists of a sequence of ForClauses and LetClauses, followed by an optional WhereClause, followed by an expression to be "return"ed, as described by the following grammar productions. Each variable binding is preceded by an optional type declaration which specify the type expected for the variable.

[28 (XQuery)]   FLWRExpr   ::=   ((ForClause |  LetClause)+ WhereClause? OrderByClause? "return")* QuantifiedExpr
[45 (XQuery)]   ForClause   ::=   <"for" "$"> VarName TypeDeclaration? PositionalVar? "in" Expr ("," "$" VarName TypeDeclaration? PositionalVar? "in" Expr)*
[46 (XQuery)]   LetClause   ::=   <"let" "$"> VarName TypeDeclaration? ":=" Expr ("," "$" VarName TypeDeclaration? ":=" Expr)*
[86 (XQuery)]   TypeDeclaration   ::=   "as" SequenceType
[48 (XQuery)]   PositionalVar   ::=   "at" "$" VarName
[47 (XQuery)]   WhereClause   ::=   "where" Expr
[57 (XQuery)]   OrderByClause   ::=   (<"order" "by"> |  <"stable" "order" "by">) OrderSpecList
[58 (XQuery)]   OrderSpecList   ::=   OrderSpec ("," OrderSpec)*
[59 (XQuery)]   OrderSpec   ::=   Expr OrderModifier
[60 (XQuery)]   OrderModifier   ::=   ("ascending" |  "descending")? (<"empty" "greatest"> |  <"empty" "least">)? ("collation" StringLiteral)?
[25 (XPath)]   ForExpr   ::=   (SimpleForClause "return")* QuantifiedExpr
[41 (XPath)]   SimpleForClause   ::=   <"for" "$"> VarName "in" Expr ("," "$" VarName "in" Expr)*

5.8.1 FLWR expressions

Notation

Individual [For/FLWOR] clauses are normalized by means of the auxiliary normalization rules:

[FLWRClause]FLWR(Expr)

Where FLWRClause can be any either a ForClause, a LetClause, or a WhereClause:

[66 (Formal)]   FLWRClause   ::=   ForClause |  LetClause |  WhereClause

Note that, as is, this auxiliary rule normalizes a fragment of the [For/FLWOR] expression, while taking the remainder of the expression (in Expr) as an additional parameter.

Normalization

Full [For/FLWOR] expressions are normalized to nested core expressions using two sets of normalization rules. Note that some of the rules also accept ungrammatical FLWRExprs such as "where Expr1 return Expr2". This does not matter, as normalization is always applied on parsed [XPath/XQuery] expressions, and ungrammatical FLWRExprs would be rejected by the parser beforehand.

The first set of rules is applied on a full [For/FLWOR] expression, splitting it at the clause level, then applying further normalization on each separate clause.

 
[ (ForClause | LetClause | WhereClause) FLWRExpr ]Expr
==
[(ForClause | LetClause | WhereClause)]FLWR([FLWRExpr])
 
[ (ForClause | LetClause | WhereClause) return Expr ]Expr
==
[(ForClause | LetClause | WhereClause)]FLWR([Expr])

Then each [For/FLWOR] clause is normalized separately. A ForClause may bind more than one variable, whereas a for expression in the [XPath/XQuery] core binds and iterates over only one variable. Therefore, a ForClause is normalized to nested for expressions:

 
[ for Variable1 TypeDeclaration1? PositionalVar1? in Expr1,..., Variablen TypeDeclarationn? PositionalVarn? in Exprn ] FLWR(Expr)
==
for Variable1 TypeDeclaration1? PositionalVar1? in [Expr1]Expr return
  ···
     for Variablen TypeDeclarationn? PositionalVarn? in [ Exprn ]Expr return Expr

Each individual for clause, is then normalized to always have a type declaration.

Note that the additional Expr parameter of the auxiliary normalization rule is used as the final return expression.

Likewise, a LetClause clause is normalized to nested let expressions:

 
[ let Variable1 TypeDeclaration1? := Expr1, ···, Variablen TypeDeclarationn? := Exprn]FLWR(Expr)
==
let Variable1 TypeDeclaration1? := [Expr1 ]Expr return
  ···
    let Variablen TypeDeclarationn? := [Exprn]Expr return Expr

A WhereClause is normalized to an IfExpr, with the else-branch returning the empty list:

 
[ where Expr1]FLWR(Expr)
==
if ( [Expr1]Expr ) then Expr else ()

Example

The following simple example illustrates, how a FLWRExpr is normalized. The for expression in the example below is used to iterate over two collections, binding variables $i and $j to items in these collections. It uses a let clause to binds the local variable $k to the sum of both numbers, and a where clause to select only those numbers that have a sum equal to or greater than the integer 5.

  for $i as xs:integer in (1, 2),
      $j in (3, 4)
  let $k := $i + $j
  where $k >= 5
  return
    <tuple>
       <i> { $i } </i>
       <j> { $j } </j>
    </tuple>

By the first set of rules, this is normalized to (except for the operators and element constructor which are not treated here):

  for $i as xs:integer in (1, 2) return
    for $j in (3, 4) return
      let $k := $i + $j return
        if ($k >= 5) then 
          <tuple>
            <i> { $i } </i>
            <j> { $j } </j>
          </tuple>
        else
          ()

For each binding of $i to an item in the sequence (1 , 2) the inner for expression iterates over the sequence (3 , 4) to produce tuples ordered by the ordering of the outer sequence and then by the ordering of the inner sequence. This core expression eventually results in the following document fragment:

  (<tuple>
      <i>1</i>
      <j>4</j>
   </tuple>,
   <tuple>
      <i>2</i>
      <j>3</j>
   </tuple>,
   <tuple>
      <i>2</i>
      <j>4</j>
   </tuple>)

with the static type:

  element tuple {
    element i { xs:integer },
    element j { xs:integer }
  }*

5.8.2 For expression

Core Grammar

After normalization, single for expressions are described by the following core grammar production.

[23 (Core)]   ForExpr   ::=   (ForClause OrderByClause? "return")* TypeswitchExpr
[30 (Core)]   ForClause   ::=   <"for" "$"> VarName TypeDeclaration? PositionalVar? "in" Expr
[32 (Core)]   PositionalVar   ::=   "at" "$" VarName
[62 (Core)]   TypeDeclaration   ::=   "as" SequenceType

Static Type Analysis

A single for expression is typed as follows: First Type1 of the iteration expression Expr1 is inferred. Then the prime type of Type1 - prime(Type1) - is determined. This is a choice over all item types in Type1 (see also [3.5 Judgments for sequences of item types]). With the variable component of the static environment statEnv extended with Variable1 as type prime(Type1) (and Variablepos as type xs:integer, if it is present), the type Type2 of Expr2 is inferred. Because the for expression iterates over the result of Expr1, the final type of the iteration is Type2 multiplied with the possible number of items in Type1 (one, ?, *, or +). This number is determined by the auxiliary type-function quantifier(Type1).

statEnv |- Expr1 : Type1
statEnv [ varType(Variable1 : prime(Type1)) ] |- Expr2 : Type2

statEnv |- for Variable1 in Expr1 return Expr2 : Type2 · quantifier(Type1)

statEnv |- Expr1 : Type1
statEnv [ varType(Variable1 : prime(Type1), Variablepos : xs:integer) ] |- Expr2 : Type2

statEnv |- for Variable1 at Variablepos in Expr1 return Expr2 : Type2 · quantifier(Type1)

In the case a type declaration is present, the static semantics also checks that the type of the input expression is a subtype of the declared type. This semantics is specified by the following typing rule.

statEnv |- Expr1 : Type1
Type0 = [ SequenceType ]sequencetype
Type1 <: Type0
statEnv [ varType(Variable1 : prime(Type1)) ] |- Expr2 : Type2

statEnv |- for Variable1 as SequenceType in Expr1 return Expr2 : Type2 · quantifier(Type1)

statEnv |- Expr1 : Type1
Type0 = [ SequenceType ]sequencetype
Type1 <: Type0
statEnv [ varType(Variable1 : prime(Type1), Variablepos : xs:integer) ] |- Expr2 : Type2

statEnv |- for Variable1 as SequenceType at Variablepos in Expr1 return Expr2 : Type2 · quantifier(Type1)

Example

For example, if $example is bound to the sequence (<one/> , <two/> , <three/>) of type element one {}, element two {}, element three {}, then the query

  for $s in $example
  return <out> {$s} </out>

is typed as follows:

  (1) prime(element one {}, element two {}, element three {}) =
      element one {} | element two {} | element three {}
  (2) quantifier(element one {}, element two {}, element three {}) = +
  (3) $s : element one {} | element two {} | element three {}
  (4) <out> {$s} </out> : 
      element out {element one {} | element two {} | element three {}}
  (5) result-type :
      element out {element one {} | element two {} | element three {}}+

This result-type is not the most specific type possible. It does not take into account the order of elements in the input type, and it ignores the individual and overall number of elements in the input type. The most specific type possible is: element out {element one {}}, element out {element two {}}, element out {element three {}}. However, inferring such a specific type for arbitrary input types and arbitrary return clauses requires significantly more complex type inference rules. In addition, if put into the context of an element, the specific type violates the "unique particle attribution" restriction of XML schema, which requires that an element must have a unique content model within a particular context.

Dynamic Evaluation

The evaluation of a for expression distinguishes two cases: If the iteration expression Expr1 evaluates to the empty sequence, then the entire expression evaluates to the empty sequence:

dynEnv |- Expr1 => Value      empty(Value)

dynEnv |- for Variable1 TypeDeclaration? in Expr1 return Expr2 => dm:empty-sequence()

Otherwise, the iteration expression Expr1, is evaluated to produce the sequence Item1, ..., Itemn. For each item Itemi in this sequence, the body of the for expression Expr2 is evaluated in the environment dynEnv extended with Variable1 bound to Itemi. This produces values Valuei, ..., Valuen which are concatenated to produce the result sequence. If the optional positional variable Variablepos is present, it is bound to the position of the item in the input sequence, i.e., the value i.

dynEnv |- Expr1 => Item1 ,..., Itemn
dynEnv [ varValue(Variable |-> Item1) ] |- Expr2 => Value1
···
dynEnv [ varValue(Variable |-> Itemn) ] |- Expr2 => Valuen

dynEnv |- for Variable in Expr1 return Expr2 => op:concatenate(Value1 ,..., Valuen)

dynEnv |- Expr1 => Item1 ,..., Itemn
dynEnv [ varValue(Variable |-> Item1, Variablepos |-> 1) ] |- Expr2 => Value1
···
dynEnv [ varValue(Variable |-> Itemn, Variablepos |-> n) ] |- Expr2 => Valuen

dynEnv |- for Variable at Variablepos in Expr1 return Expr2 => op:concatenate(Value1 ,..., Valuen)

In the case a type declaration is present, the dynamic semantics also checks that the input value matches the declared type. This semantics is specified as the following dynamic rule.

dynEnv |- Expr1 => Item1 ,..., Itemn
Type0 = [ SequenceType ]sequencetype
Item1 matches Type0
dynEnv [ varValue(Variable |-> Item1) ] |- Expr2 => Value1
···
Itemn matches Type0
dynEnv [ varValue(Variable |-> Itemn) ] |- Expr2 => Valuen

dynEnv |- for Variable as SequenceType in Expr1 return Expr2 => op:concatenate(Value1 ,..., Valuen)

dynEnv |- Expr1 => Item1 ,..., Itemn
Type0 = [ SequenceType ]sequencetype
Item1 matches Type0
dynEnv [ varValue(Variable |-> Item1, Variablepos |-> 1) ] |- Expr2 => Value1
···
Itemn matches Type0
dynEnv [ varValue(Variable |-> Itemn, Variablepos |-> n) ] |- Expr2 => Valuen

dynEnv |- for Variable as SequenceType at Variablepos in Expr1 return Expr2 => op:concatenate(Value1 ,..., Valuen)

Ed. Note: The dynamic semantics of for could be defined without the use of ··· and using recursion. See [Issue-0134:  Should we define for with head and tail?]. The important thing is that the definition allows non-deterministic evaluation of the sequence.

Example

Note that if the expression in the return clause results in a sequence, sequences are never nested in the [XPath/XQuery] data model. For instance, in the following for expression:

  
  for $i in (1,2)
    return (<i> {$i} </i>, <negi> {-$i} </negi>)

each iteration in the for results in a sequence of two elements, which are then concatenated and flattened in the resulting sequence (using the op:concatenate function):

  
  (<i>1</i>,
   <negi>-1</negi>,
   <i>2</i>,
   <negi>-2</negi>)

5.8.3 Let Expression

Core Grammar

After normalization, single let expressions are described by the following core grammar production.

[24 (Core)]   LetExpr   ::=   (LetClause "return")* TypeswitchExpr
[31 (Core)]   LetClause   ::=   <"let" "$"> VarName TypeDeclaration? ":=" Expr

Static Type Analysis

A let expression extends the type environment statEnv with Variable1 of type Type1 inferred from Expr1, and infers the type of Expr2 in the extended environment to produce the result type Type2.

statEnv |- Expr1 : Type1      statEnv [ varType(Variable1 : Type1) ] |- Expr2 : Type2

statEnv |- let Variable1 := Expr1 return Expr2 : Type2

In the case a type declaration is present, the static semantics also checks that the type of the input expression is a subtype of the declared type. This semantics is specified as the following typing rule.

statEnv |- Expr1 : Type1
Type0 = [ SequenceType ]sequencetype
Type1 <: Type0
statEnv [ varType(Variable1 : Type1) ] |- Expr2 : Type2

statEnv |- let Variable1 as SequenceType := Expr1 return Expr2 : Type2

Dynamic Evaluation

A let expression extends the dynamic environment dynEnv with Variable bound to Value1 returned by Expr1, and evaluates Expr2 in the extended environment to produce Value2.

dynEnv |- Expr1 => Value1
dynEnv [ varValue(Variable1 |-> Value1) ] |- Expr2 => Value2

dynEnv |- let Variable1 := Expr1 return Expr2 => Value2

In the case a type declaration is present, the dynamic semantics also checks that the input value matches the declared type. This semantics is specified as the following dynamic rule.

dynEnv |- Expr1 => Value1
Type0 = [ SequenceType ]sequencetype
Value1 matches Type0
dynEnv [ varValue(Variable1 |-> Value1) ] |- Expr2 => Value2

dynEnv |- let Variable1 as SequenceType := Expr1 return Expr2 => Value2

Example

Note the use of the environment discipline to define the scope of each variable. For instance, in the following nested let expression:

  let $k := 5 return
    let $k := $k + 1 return
      $k+1

the outermost let expression binds variable $k to the integer 5 in the environment, then the expression $k+1 is computed, yielding value 6, to which the second variable $k is bound. The expression then results in the final integer 7.

5.8.4 Order By

[36 (Core)]   OrderByClause   ::=   (<"order" "by"> |  <"stable" "order" "by">) OrderSpecList
[37 (Core)]   OrderSpecList   ::=   OrderSpec ("," OrderSpec)*
[38 (Core)]   OrderSpec   ::=   Expr OrderModifier
[39 (Core)]   OrderModifier   ::=   ("ascending" |  "descending")? (<"empty" "greatest"> |  <"empty" "least">)? ("collation" StringLiteral)?

Ed. Note: Issue: The semantics of order by is not currently specified. Revision is pending agreement about the semantics of sorting.See [Issue-0109:  Semantics of order by].

5.9 Unordered Expressions

Introduction

The unordered function returns its input sequence in any order.

Notation

The dynamic semantics for unordered use an auxiliary judgments which disregards order between the items in a sequence.

The following judgment

dynEnv |- Value1 permutes to Value2

holds if the sequence of items in Value2 is a permutation of the sequence of items in Value1.

Static Type Analysis

The static semantics for unordered uses the auxiliary type functions prime(Type) and quantifier(Type); which are defined in [3.5 Judgments for sequences of item types]. The type of each argument is determined, and then prime(.) and quantifier(.) are applied to the sequence type (Type1, Type2).

statEnv |- Expr1 : Type1

statEnv |- unordered(Expr1) : prime(Type1) · quantifier(Type1)

Dynamic Evaluation

The dynamic semantics for unordered is specified using the auxiliary judgments permutes to as follows.

dynEnv |- Expr1 => Value1
dynEnv |- Value1 permutes to Value2

statEnv |- unordered(Expr1) => Value2)

It is important to remark that the permutes to judgments is non deterministic. There are many sequences which can be a permutation of a given sequence. Any of those permutations would satisfy the above semantics.

5.10 Conditional Expressions

Introduction

A conditional expression supports conditional evaluation of one of two expressions.

[31 (XQuery)]   IfExpr   ::=   (<"if" "("> Expr ")" "then" Expr "else")* InstanceofExpr

Normalization

 
[if (Expr1) then Expr2 else Expr3]Expr
==
  if ([ Expr1 ]Effective_Boolean_Value) then [Expr2]Expr else [Expr3]Expr

where $fs:new is a newly created variable that does not appear in the rest of the query.

Core Grammar

The core grammar rule for the conditional expression is:

[26 (Core)]   IfExpr   ::=   (<"if" "("> Expr ")" "then" Expr "else")* CastableExpr

Static Type Analysis

statEnv |- Expr1 : xs:boolean      statEnv |- Expr2 : Type2      statEnv |- Expr3 : Type3

statEnv |- if (Expr1) then Expr2 else Expr3 : (Type2 | Type3)

Dynamic Evaluation

If the conditional's boolean expression Expr1 evaluates to true, Expr2 is evaluated and its value is produced. If the conditional's boolean expression evaluates to false, Expr3 is evaluated and its value is produced. Note that the existence of two separate evaluation rules ensures that only one branch of the conditional is evaluated.

dynEnv |- Expr1 => true      dynEnv |- Expr2 => Value2

dynEnv |- if Expr1 then Expr2 else Expr3 => Value2

dynEnv |- Expr1 => false      dynEnv |- Expr3 => Value3

dynEnv |- if Expr1 thenExpr2 elseExpr3 => Value3

5.11 Quantified Expressions

Introduction

[XPath/XQuery] defines two quantification expressions:

[29 (XQuery)]   QuantifiedExpr   ::=   ((<"some" "$"> |  <"every" "$">) VarName TypeDeclaration? "in" Expr ("," "$" VarName TypeDeclaration? "in" Expr)* "satisfies")* TypeswitchExpr
[26 (XPath)]   QuantifiedExpr   ::=   ((<"some" "$"> |  <"every" "$">) VarName "in" Expr ("," "$" VarName "in" Expr)* "satisfies")* IfExpr

Normalization

The quantification expressions are entirely normalized into other core expressions using the following normalization rules.

First, multiple variable quantifiers are normalized to single variable quantifiers.

 
[some Variable1 in Expr1, ..., Variablen in Exprn satisfies Expr]Expr
==
[
some Variable1 in Expr1 satisfies
   some Variable2 in Expr2 satistfies
         ...
     some Variablen in Exprn satistfies
     Expr
]Expr
 
[every Variable1 in Expr1, ..., Variablen in Exprn satisfies Expr]Expr
==
[
every Variable1 in Expr1 satisfies
   every Variable2 in Expr2 satistfies
         ...
     every Variablen in Exprn satistfies
     Expr
]Expr
 
[some Variable in Expr1 satisfies Expr2]Expr
==
fn:not( fn:empty(
   for Variable in [Expr1]Expr return
     if ( [ Expr2 ]Effective_Boolean_Value ) then 1
     else ()
   ))
 
[every Variable in Expr1 satisfies Expr2]Expr
==
fn:empty(
   for Variable in [Expr1]Expr return
     if ( fn:not( [ Expr2 ]Effective_Boolean_Value) ) then 1
     else ()
   )

Ed. Note: Note that the non-determinism in the semantics of quantified expressions is not modeled here. See [Issue-0136:  Non-determinism in the semantics].

Core Grammar

There are no core grammar rules for quantified expressions as they are normalized to other core expressions.

Static Type Analysis

There are no additional static type rules for the quantified expressions.

Dynamic Evaluation

There are no additional dynamic evaluation rules for the quantified expressions.

5.12 Expressions on SequenceTypes

Introduction

Expressions on SequenceTypes are expressions whose semantics depends on the type of some of the sub-expressions to which they are applied. The syntax of SequenceType expressions is described in [4.4.2 SequenceType].

5.12.1 Instance Of

[32 (XQuery)]   InstanceofExpr   ::=   CastableExpr ( <"instance" "of"> SequenceType )

Introduction

The SequenceType expression "Expr instance of SequenceType" is true if and only if the result of evaluating expression Expr is an instance of the type referred to by SequenceType.

Normalization

An "instance of" expression is normalized into a "typeswitch" expression. Note that the following normalization rule uses a variable $fs:new, which is a newly created variable which must not conflict with any variables already in scope. This variable is necessary to comply with the syntax of typeswitch expressions in the core [XPath/XQuery], but is never used.

 
[Expr instance of SequenceType]Expr
==
typeswitch ([ Expr ]Expr)
  case $fs:new as SequenceType return fn:true()
  default $fs:new return fn:false()

5.12.2 Typeswitch

[30 (XQuery)]   TypeswitchExpr   ::=   (<"typeswitch" "("> Expr ")" CaseClause+ "default" ("$" VarName)? "return")* IfExpr
[61 (XQuery)]   CaseClause   ::=   "case" ("$" VarName "as")? SequenceType "return" Expr

Introduction

The typeswitch expression of XQuery allows users to perform different operations according to the type of an input expression.

Each branch of a typeswitch expression may have an optional "Variable" statement, used to bind a variable to the result of the input expression. This variable is optional in [XPath/XQuery] but mandatory in the [XPath/XQuery] core. One of the reasons for having this variable is that it can have a more specific type for the corresponding branch.

Normalization

Normalization of typeswitch expressions is applied to make sure an appropriate "Variable" is present in each branch.

The following general normalization rule merely adds a newly created variable which does not appear in the rest of the query. Note that $fs:new is a newly generated variable that must not conflict with any variables already in scope but is not used in any of the sub-expressions.

 
[
typeswitch ( Expr0 )
  case SequenceType1 return Expr1
    ···
  case SequenceTypen return Exprn
  default return Exprn+1
]Expr
==
typeswitch ( [ Expr0 ]Expr )
  case $fs:new1 as SequenceType1 return [ Expr1 ]Expr
    ···
  case $fs:newn as SequenceTypen return [ Exprn ]Expr
  default $fs:newn+1 return [ Exprn+1 ]Expr

Core Grammar

The core grammar productions for typeswitch are:

[25 (Core)]   TypeswitchExpr   ::=   (<"typeswitch" "("> Expr ")" CaseClause+ "default" "$" VarName "return")* IfExpr
[40 (Core)]   CaseClause   ::=   "case" "$" VarName "as" SequenceType "return" Expr

Notation

The following auxiliary grammar production is used to identify branches of the typeswitch.

[67 (Formal)]   CaseRules   ::=   ("case" SequenceType "$" VarName "return" Expr CaseRules) |  ("default" "return" "$" VarName Expr)

Notation

The following judgment

dynEnv |- Value1 against CaseRules => Value2

is used in the dynamic semantics of typeswitch. It indicate that under the environment dynEnv, and with the input value of the typeswitch being "Value1", the given case rules yields the value Value2.

Notation

The following judgment

statEnv |- Type knowing (Type1 | ... | Typen) against CaseRules : Type2

is used in the static semantics of typeswitch. It indicates that under the type environment statEnv, with the input type of the typeswitch being "Type", and with the previously visited types Type1 | ... | Typen, the given case rules (e.g., "CaseRules") has type Type2. This typing judgment keep track of all previously visited types in the typeswitch. This additional information is used later on in typing the default clause.

Static Type Analysis

The typeswitch expression possesses one of the more complex sets of static typing rules. The rules account for the fact that if the static type of the conditional expression is known, then it is possible to determine statically that some of the case clauses do not apply, and thus do not contribute to the static type of a typeswitch expression.

Like the dynamic typeswitch rule, the static typeswitch rule relies upon auxiliary rules to determine the type of each of the case clauses and of the default clause. These auxiliary rules are provided after the main rule. Note that the type of the input expression is always treated as a sequence of prime types, using the "prime" and "quantifier" operations on types. This is necessary because the further typing rules compute the common prime types for each case clause of the type switch.

Ed. Note: Jerome: the use of the common prime types replaces the previous use of type intersection. Common prime types simplifies significantly the complexity in implementing typeswitch, but is less precise in certain cases.

statEnv |- Expr0 : Type0
CaseType0 = prime(Type0) · quantifier(Type0)
CaseType1 = [ SequenceType1 ]sequencetype
    ···
CaseTypen = [ SequenceTypen ]sequencetype
statEnv |- CaseType0 knowing () against case Variable1 as SequenceType1 return Expr1 : Type1
    ···
statEnv |- CaseType0 knowing (CaseType1 | ... | CaseTypen-1) against case SequenceTypen Variablen return Exprn : Typen
statEnv |- CaseType0 knowing (CaseType1 | ... | CaseTypen) against default Variablen+1 return Exprn+1 : Typen+1

statEnv |-  
(typeswitch (Expr0)
  case Variable1 as SequenceType1 return Expr1
    ···
  case Variablen as SequenceTypen return Exprn
  default Variablen+1 return Exprn+1)
 : Type1 | ... | Typen+1

The rules that determine the static type of each case clause in the typeswitch are given next. In each rule, one needs to compute the "common prime types" between the input type and the case clause SequenceType.

The first rule is applied if the "common prime types" is none. In that case, it is known for sure the corresponding case clause is not evaluated and that the corresponding result type is none. Thanks to this rule, it is often possible to infer a precise type for the overall typeswitch by eliminating some of the cases.

CaseType = [ SequenceType ]sequencetype
common-prime(prime(CaseType0), prime(CaseType)) = none

statEnv |- CaseType0 knowing (CaseType1 | ... | CaseTyper)) against case Variable as SequenceType return Expr : none

The second rule is applied if the "common prime types" is anything other than none. In that case, the input variable is added into the type environment and type inference is applied on the expression on the right-hand side of the case clause. Note that the type of the input variable is set to the "common prime types", and not to the input type.

CaseType = [ SequenceType ]sequencetype
Type0 = common-prime(prime(CaseType0), prime(CaseType))
Occurrence0 = common-occurrence(quantifier(CaseType0), quantifier(CaseType))
not(Type0 = none)
statEnv [ varType(Variable : (Type0 · Occurrence0)) ] |- Expr : Type1

statEnv |- CaseType0 knowing (CaseType1 | ... | CaseTyper)) against case Variable as SequenceType return Expr : Type1

Note that these two rules do not take the visited datatypes into account. The "default" clause differs from the other clauses in that it does not specify a SequenceType. The typing rule for the "default" clause uses the visited type instead. Intuitively, the type corresponding to the "default" clause is any type but the ones in the other cases clauses.

Therefore, if the type of the input expression is a subtype of the choice of all visited types, then it is known for sure that the case clause is not evaluated and that the type of the default clause is none.

CaseType0 <: (CaseType1 | ... | CaseTypen)

statEnv |- CaseType0 knowing (CaseType1 | ... | CaseTyper)) against default Variable return Expr : none

Otherwise, the input variable is added into the type environment, and type inference is applied to the expression on the right-hand side of the default clause. Note that the type of the input variable is set to the input type of the expression.

not(CaseType0 <: (CaseType1 | ... | CaseTypen))
statEnv [ varType(Variable : CaseType0) ] |- Expr : Type1

statEnv |- CaseType0 knowing (CaseType1 | ... | CaseTyper)) against default Variable return Expr : Type1

Ed. Note: Jerome: There is an asymmetry here. It would be nicer to be able to have the type be more precise, like for the other case clauses. The technical problem is the need for some form of negation. I think one could define a "non-common-primes" function that would do the trick, but I leave that open for now until further review of the new typeswitch section is made. See [Issue-0112:  Typing for the typeswitch default clause].

Example

The typing rules for typeswitch provides reasonably precise type information in a number of useful cases. For example consider the following simple schema and query.

    <xs:element name="book">
       <xs:complexType>
          <xs:element name="title" type="xs:string"/>
          <xs:element name="author>
             <xs:complexType>
                <xs:element name="name" type="xs:string"/>
                <xs:element name="affiliation" type="xs:string"/>
             </xs:complexType>
          </xs:element>
       </xs:complexType>
    </xs:element>
    
    <xs:element name="bib">
       <xs:complexType>
          <xs:element ref="book" minOccurs="0" maxOccurs="unbounded"/>
       </xs:complexType>
    </xs:element>
    
    ...
    
    for $x in $bib/book/*
    return
       typeswitch $x
       case $a as element author return $a/name
       case $e as element editor return $e
       case $t as element title  return <wrote>{data($t)}</wrote>
       default return ()

The static rules for typeswitch can determine (1) that, with the given input schema, the second case matching against element editor never applies, and (2) that the data() function in the third case matching against element title, will not raise an error, because $e is guaranteed to contain data rather than more nodes. The resulting type corresponds to the following schema fragment.

    <xs:choice minOccurs="0" maxOccurs="unbounded">
       <xs:element name="name" type="xs:string"/>
       <xs:element name="wrote" type="xs:string"/>
    </xs:choice>

The type rules for typeswitch do not, however, account for the interdependence between successive case clauses. Thus if two case clauses had overlapping SequenceTypes, the static rules would behave as if both case clauses "fired", rather than just the first one.

Ed. Note: Jerome: It seems that the simpler version of typeswitch proposed here would actually allow us to take previous case clauses into account. This is something worth exploring as it would improve the static analysis in a way that might be helpful to users. See [Issue-0112:  Typing for the typeswitch default clause].

Ed. Note: Issue: There seem to be cases where the current semantics of typeswitch breaks type substitutability. See [Issue-0174:  Semantics of 'element foo of type T'].

Dynamic Evaluation

The evaluation of a typeswitch proceeds as follows. First, the input expression is evaluated, yielding an input value. Then the first case clause whose SequenceType type matches that value is selected and its corresponding expression is evaluated.

Note that the dynamic environmentdynEnv is only extended with this binding by the first auxiliary rule, which applies, if the input value matches the corresponding sequence type, or by the third auxiliary rule for the default case.

dynEnv |- Expr => Value0
dynEnv |- Value0 against CaseRules => Value1

dynEnv |- typeswitch (Expr) CaseRules => Value1

If the value matches the sequence type, the first auxiliary rule applies: It extends the environment by binding the variable Variable to Value0 and evaluates the body of the case rule.

CaseType = [ SequenceType ]sequencetype
Value0 matches CaseType
dynEnv [ varValue(Variable |-> Value0) ] |- Expr => Value1

dynEnv |- Value0 against case Variable SequenceType return Expr CaseRules => Value1

If the value does not match the sequence type, the second auxiliary rule evaluates on the remaining case rules, and the current case rule is not evaluated.

CaseType = [ SequenceType ]sequencetype      not (Value0 matches CaseType)      dynEnv |- Value0 against CaseRules => Value1

dynEnv |- Value0 against case SequenceType Variable return Expr CaseRules => Value1

Finally, the last rule states that the "default" branch of a typeswitch expression always evaluates to its given expression.

dynEnv [ varValue(Variable |-> Value0) ] Expr => Value1

dynEnv |- Value0 against default Variable return Expr => Value1

5.12.3 Cast

[50 (XQuery)]   CastExpr   ::=   <"cast" "as"> SingleType ParenthesizedExpr
[87 (XQuery)]   SingleType   ::=   AtomicType "?"?

Cast expressions change the type and value of an expression against a given type.

Core Grammar

The core grammar production for cast as is:

[34 (Core)]   CastExpr   ::=   <"cast" "as"> SingleType ParenthesizedExpr
[63 (Core)]   SingleType   ::=   AtomicType "?"?

Introduction

The expression "cast as SequenceType ( Expr )" can be used to explicitly convert the result of an expression from one type to another. It changes both the type and value of the result of an expression, and can only be applied on an atomic value.

The semantics of cast expressions follows the specification given in Section [14. Casting Functions] of the [XQuery 1.0 and XPath 2.0 Functions and Operators] document. The casting table in Section [14. Casting Functions] of the [XQuery 1.0 and XPath 2.0 Functions and Operators] document indicates whether a cast is allowed or not. In case it is allowed, a specific cast function is applied, based on the input and output XML Schema simple types. The semantics of the cast function follows casting rules which are described in the the remainder of Section [14. Casting Functions] of the [XQuery 1.0 and XPath 2.0 Functions and Operators] document and is not specified further here.

Normalization

The normalization of cast applies atomization to its argument. The type declaration asserts that the result is a single atomic value. Normalization differs whether the type is followed by ? or not.

 
[cast as AtomicType (Expr)]Expr
==
let $v as atomic value := [ Expr ]Atomize return
  cast as AtomicType ($v)
 
[cast as AtomicType? (Expr)]Expr
==
let $v as atomic value? := [ Expr ]Atomize return
  typeswitch ($v)
    case empty return ()
    default return cast as AtomicType ($v)

Notation

The following auxiliary judgments are used to represent access to the casting table and to the semantics of casting, as described in Section [14. Casting Functions] of the [XQuery 1.0 and XPath 2.0 Functions and Operators] document.

The judgment:

Type1 cast allowed Type2 => { Y, M, N }

holds if casting from type Type1 to Type2 is always possible (Y), may be possible (M), or is not allowed (N).

The notation:

cast as Type2 ( Value1 ) => Value2

indicates that applying the casting rules for Type2 on Value1 yields the value Value2.

Static Type Analysis

If the cast table indicates the cast is not allowed (N), the system raises a static type error. Otherwise, the following static typing rules give the static semantics of "cast as" expression.

statEnv |- Expr : Type1
Type2 = [ SequenceType2 ]sequencetype
Type1 cast allowed Type2 => Y or Type1 cast allowed Type2 => M

statEnv |- cast as SequenceType2 (Expr) : Type2

Dynamic Evaluation

If the cast is allowed (Y or M), the following evaluation rule applies the casting rules on the result of the input expression. The rule uses the data model function dm:type in order to obtain the dynamic type of the input value, SequenceType normalization to obtain the output type, and the above auxiliary judgments to check whether the cast is allowed and to apply the casting rules.

dynEnv |- Expr1 => Value1
Type2 = [ SequenceType2 ]sequencetype
cast as Type2 ( Value1 ) => Value2

dynEnv |- cast as SequenceType2 ( Expr1 ) => Value2

Note that in the case that the casting table indicates "M", the casting operation is allowed but might fail at run-time if the input value is inappropriate (e.g. attempting to cast the string "VRAI" into xs:boolean). In that case, the dynamic evaluation raises an error.

If the casting table returns "N", the cast is not allowed and the dynamic semantics always returns an error value.

dynEnv |- Expr1 => Value1
dm:type(Value1) = Type1
Type2 = [ SequenceType2 ]sequencetype
Type1 cast allowed Type2 => N

dynEnv |- cast as SequenceType2 ( Expr1 ) => fn:error()

5.12.4 Castable

[33 (XQuery)]   CastableExpr   ::=   ComparisonExpr ( <"castable" "as"> SingleType )

Castable expressions check whether a value can be cast to a given type.

Core Grammar

The core grammar production for cast as is:

[27 (Core)]   CastableExpr   ::=   ValueExpr ( <"castable" "as"> SingleType )

Normalization

The normalization of castable simply maps its expression argument.

 
[Expr castable as AtomicType]Expr
==
let $v as atomic value := [ Expr ]Atomize return
$v castable as AtomicType
 
[Expr castable as AtomicType]Expr
==
let $v as atomic value? := [ Expr ]Atomize return
$v castable as AtomicType?

Static Type Analysis

The static semantics of the CastableExpr requires that the type of its expression argument and the target type be atomic types.

statEnv |- Expr : Type1
Type2 = [ SequenceType2 ]sequencetype
Type1 <: fs:anyAtomicType
Type2 <: fs:anyAtomicType

statEnv |- Expr castable as SequenceType2 : xs:boolean

Ed. Note: fs:anyAtomicType is still under discussion; it denotes any singleton atomic type, i.e., any non-list type derived from xs:anySimpleType.

Dynamic Evaluation

If the casting table returns "Y" or "M" and the cast does not raise an error, then 'castable as' evaluates to true.

dynEnv |- Expr1 => Value1
Type2 = [ SequenceType2 ]sequencetype
cast as Type2 ( Value1 ) => Value2

dynEnv |- Expr1 castable as SequenceType2 => fn:true()

If the casting table returns "N", the cast is not allowed and then 'castable as' evaluates to false.

dynEnv |- Expr1 => Value1
dm:type(Value1) = Type1
Type2 = [ SequenceType2 ]sequencetype
Type1 cast allowed Type2 => N

dynEnv |- Expr1 castable as SequenceType2 => fn:false()

5.12.5 Constructor Functions

Constructor functions provide an alternative syntax for casting.

Normalization

Constructor functions are normalized to explicit cast as operations.

 
[AtomicType(Expr)]Expr
==
[cast as AtomicType (Expr)]Expr

5.12.6 Treat as

[51 (XQuery)]   TreatExpr   ::=   <"treat" "as"> SequenceType ParenthesizedExpr

Introduction

The expression "treat as SequenceType ( Expr )", can be used to change the static type of the result of an expression without changing its value. Treat as raises a static type error, if the static type of expression and the specified type are incompatible. Treat as raises a run-time error, if the dynamic type of the expression is not an instance of the specified type.

Normalization

Treat as expressions are normalized to typeswitch expressions. Note that the following normalization rule uses a variable $fs:new, which is a newly created variable which must not conflict with any variables already in scope.

 
[treat as SequenceType ( Expr )]Expr
==
typeswitch ([ Expr ]Expr) as $fs:new
  case SequenceType return $fs:new
  default return fn:error()

5.13 Validate Expressions

[49 (XQuery)]   ValidateExpr   ::=   (<"validate" "{"> |  (<"validate" "context"> SchemaGlobalContext ("/" SchemaContextStep)* "{")) Expr "}"

Core Grammar

The core grammar production for validate is:

[33 (Core)]   ValidateExpr   ::=   (<"validate" "{"> |  (<"validate" "context"> SchemaGlobalContext ("/" SchemaContextStep)* "{")) Expr "}"

A validate expression validates its argument with respect to the in-scope schema definitions, using the schema validation process described in [XML Schema]. The argument to a validate expression may be any sequence of elements. Validation replaces element and attribute nodes with new nodes that have their own identity and contain type annotations and defaults created by the validation process. If a node that has a parent is validated, the parent of the original node will not be the parent of the validated node.

Static Type Analysis

Ed. Note: Issue: The current form of validate cannot be typed statically. See [Issue-0166:  Static typing for validate].

Dynamic Evaluation

An important effect of validation is that it removes existing type annotations and values (erasure), and it revalidates the corresponding data model instance, possibly adding new type annotations and values (annotation). Both erasure and annotation are described formally in [3.8 Judgments for the validate expression]. Indeed, the conjunction of erasure and annotation provides a formal model for a large part of actual schema validation. The semantics of the "validate" operation is (partially) specified as follows.

dynEnv.varValue |- Expr => Value1
Value1 erases to element ElementName { Value2 }
annotate as element ElementName (element ElementName { Value2 }) => Value3

dynEnv |- validate element ElementName { Expr } => Value3

Ed. Note: Issue: This semantics only applies when validating against a top-level element. See [Issue-0138:  Semantics of Schema Context]

6 The Query Prolog

Introduction

The Query Prolog is a series of declarations and definitions that affect query processing. The Query Prolog can be used to define namespaces, import type definitions from XML Schemas, and define functions. Namespace declarations and schema imports always precede function definitions, as specified by the following grammar productions.

[21 (XQuery)]   Query   ::=   QueryProlog QueryBody
[22 (XQuery)]   QueryProlog   ::=   (NamespaceDecl
|  XMLSpaceDecl
|  DefaultNamespaceDecl
|  DefaultCollationDecl
|  SchemaImport)* FunctionDefn*
[23 (XQuery)]   QueryBody   ::=   ExprSequence?

The order in which functions are defined is immaterial. Notably, user-defined functions may invoke other user-defined functions in any order.

Namespace Declarations and Schema Import are not part of the query proper but are used to modify the input context for the rest of the query processing. Namespace Declarations and Schema Import are processed before the normalization phase.

The semantics of Schema Import is described in terms of the XQuery type system. The process of converting an XML Schema into a sequence of type declarations is described in Section [8 Importing Schemas]. This section describes how the resulting sequence of type declarations is added into the static environment when the prolog is processed.

Notation

Prolog declarations are either namespace declarations or type declarations.

[68 (Formal)]   PrologDeclList   ::=   PrologDecl*
[69 (Formal)]   PrologDecl   ::=   NamespaceDecl
|  XMLSpaceDecl
|  DefaultNamespaceDecl
|  DefaultCollationDecl
|  SchemaImport

Notation

The following auxiliary judgments are used when processing the [XPath/XQuery] prolog.

The judgment:

PrologDeclList => statEnv

holds if the sequence of prolog declarations PrologDeclList yields the static environment statEnv.

The judgment:

statEnv1 |- PrologDecl => statEnv2

holds if under the static environment statEnv1, the single prolog declarations PrologDeclList yields the new static environment statEnv2.

Context Processing

Prolog declarations are processed in the order they are encountered, as described by the following inference rules. The first rule specifies that for an empty sequence of prolog declarations, the static environment is composed of a default context.

Ed. Note: Jerome: What do the default namespace and type environments contain? I believe at least the default namespace environment should contain the "xs", "fn" and "op" prefixes, as well as the default namespaces bound to the empty namespace. Should the default type environment contain wildcard types? See [Issue-0115:  What is in the default context?].


() => statEnvDefault

PrologDeclList => statEnv1
statEnv1 |- PrologDecl => statEnv2

PrologDecl PrologDeclList => statEnv2

6.1 Namespace Declarations

[109 (XQuery)]   NamespaceDecl   ::=   <"declare" "namespace"> NCNameForPrefix "=" URLLiteral
[111 (XQuery)]   DefaultNamespaceDecl   ::=   (<"default" "element"> |  <"default" "function">) "namespace" "=" URLLiteral

Context Processing

A namespace declaration adds a new (prefix,uri) binding in the namespace component of the static environment.

statEnv1 = statEnv [ namespace(NCName |-> StringLiteral) ]

statEnv |- namespace NCName = StringLiteral => statEnv1

A default element namespace declaration changes the default element namespace prefix binding in the namespace component of the static environment.

statEnv1 = statEnv [ namespace(fsdefaultelem |-> StringLiteral) ]

statEnv |- default element namespace = StringLiteral => statEnv1

A default function namespace declaration changes the default function namespace prefix binding in the namespace component of the static environment.

statEnv1 = statEnv [ namespace(fsdefaultfunc |-> StringLiteral) ]

statEnv |- default function namespace = StringLiteral => statEnv1

Note that for namespaces, later declarations can override earlier declarations of the same prefix.

6.2 Schema Imports

[114 (XQuery)]   SchemaImport   ::=   <"import" "schema"> (StringLiteral |  SubNamespaceDecl |  DefaultNamespaceDecl) <"at" StringLiteral>?
[110 (XQuery)]   SubNamespaceDecl   ::=   "namespace" NCNameForPrefix "=" URLLiteral

Notation

The following auxiliary judgments are used when processing schema imports.

The judgment:

statEnv1 |- Definition* => statEnv2

holds if under the static environment statEnv1, the sequence of type definitions Definition* yields the new static environment statEnv2.

The judgment:

statEnv1 |- Definition => statEnv2

holds if under the static environment statEnv1, the single definition Definition yields the new static environment statEnv2.

Context Processing

Schema imports are first imported into the query type system and yield a sequence of type definitions. Then each type definitions is added to the static environment.

Definition* = [Schema]Schema
statEnv |- Definition* => statEnv1

statEnv |- import schema Schema => statEnv1

An empty sequence of definitions yields the original environment.


statEnv |- () => statEnv

Each definition is added into the static environment.

statEnv |- Definition* => statEnv1
statEnv1 |- Definition1 => statEnv2

Definition1 Definition* => statEnv2

Type, element and attribute declarations are added respectively to the type, element and attribute declarations components of the static environment.

statEnv1 = statEnv [ typeDefn(TypeName |-> define type TypeName TypeDerivation ) ]

statEnv |- define type TypeName TypeDerivation => statEnv1

statEnv1 = statEnv [ elemDecl(ElementName |-> define element ElementName Substitution? Nillable? TypeReference) ]

statEnv |- define element ElementName Substitution? Nillable? TypeReference => statEnv1

statEnv1 = statEnv [ attrDecl(AttributeName |-> define attribute AttributeName TypeReference) ]

statEnv |- define attribute AttributeName TypeReference => statEnv1

Note that for namespaces, later declarations can override earlier declarations of the same prefix. In the case of global elements, attributes and types, multiple declarations correspond to an error.

6.3 Xmlspace Declaration

[107 (XQuery)]   XMLSpaceDecl   ::=   <"declare" "xmlspace"> "=" ("preserve" |  "strip")

The impact of xmlspace declaration is not formally specified in this document.

6.4 Default Collation

[108 (XQuery)]   DefaultCollationDecl   ::=   <"default" "collation" "="> URLLiteral

The impact of default collation is not formally specified in this document. See [Issue-0160:  Collations in the static environment].

6.5 Function Definitions

Introduction

User defined functions specify the name of the function, the names and types of the parameters, and the type of the result. The function body defines how the result of the function is computed from its parameters.

[112 (XQuery)]   FunctionDefn   ::=   <"define" "function"> <QName "("> ParamList? (")" |  (<")" "as"> SequenceType)) EnclosedExpr
[113 (XQuery)]   ParamList   ::=   Param ("," Param)*
[82 (XQuery)]   Param   ::=   "$" VarName TypeDeclaration?

Notation

The following auxiliary mapping rule is used for the normalization of parameters in function definitions: []Param.

Normalization

The only form of normalization required for user defined functions is adding the type for its parameters or for the return clause if it is not provided.

 
[ define function QName ( ParamList? ) as SequenceType EnclosedExpr ]Expr
==
define function [QName] ( [ParamList?]Param ) as SequenceType [EnclosedExpr]Expr

If the return type of the function is not provided, it is given the item* sequence type.

 
[define function QName ( ParamList? ) EnclosedExpr ]Expr
==
define function [QName] ( [ParamList?]Param ) as item* [EnclosedExpr]Expr

Parameters without a declared typed are given the item* sequence type.

 
[Variable]Param
==
item* Variable
 
[SequenceType Variable]Param
==
SequenceType Variable

Context Processing

First, all the function signatures are added into the static environment, and all the function bodies are added into the dynamic environment. This process happens before static type analysis occurs.

FunctionDefnList => statEnv, dynEnv
statEnv' = statEnv [ funcType(QName |-> ( SequenceType1 , ··· SequenceTypen , SequenceTyper)) ]
dynEnv' = dynEnv [ funcDefn(QName |-> ( Expr , Variable1 , ··· Variablen)) ]

FunctionDefnList define function QName ( SequenceType1 Variable1, ··· SequenceTypen Variablen )
as SequenceTyper
{ Expr } => statEnv', dynEnv'

Static Type Analysis

The static typing rules for function definitions checks whether the type of the enclosed expression is consistent with the type of the input parameters, and the type of the return clause.

statEnv [ varType( Variable1 : SequenceType1 ;...; Variablen : SequenceTypen ) ] |- Expr : Type
Type <: SequenceTyper

statEnv |- define function QName ( SequenceType1 Variable1 , ··· SequenceTypen Variablen )
as SequenceTyper
{ Expr }

What this typing rule is checking is: if the input parameters are of the given type, then is it true that the result of the function is of the return type. If the type checking fails, the system raises an error. Otherwise, it does not have any other effect, as function signatures are already inside the static environment.

Dynamic Evaluation

There is no need to describe a dynamic semantics at this point, as functions are only evaluated when called. The actual semantics of function calls is described in [5.1.4 Function Calls].

7 Additional Semantics of Functions

Ed. Note: Status: This section is still incomplete. This section will be completed as soon as Sections 4 and 5 are consolidated. See [Issue-0135:  Semantics of special functions].

As was explained in section [2.5 Functions], a number of functions play a role in defining the formal semantics of [XPath/XQuery]. Some other functions from the [XQuery 1.0 and XPath 2.0 Functions and Operators] document need special static typing rules. This section gives the semantics of all "special" functions, used in the formal semantics.

7.1 Formal Semantics Functions

Introduction

This section gives the definition and semantics of functions which are used in the formal semantics but are not part of the [XQuery 1.0 and XPath 2.0 Functions and Operators] document.

7.1.1 The fs:characters-to-string function

The fs:characters-to-string function takes a sequence of character values and synthecizes a string.

7.1.2 The fs:distinct-doc-order function

The fs:distinct-doc-order function sorts by document order and removes duplicates. It is defined as the composition of the [XQuery 1.0 and XPath 2.0 Functions and Operators] functions fn:distinct-nodes and sorting by document order.

The dynamic semantics of fs:distinct-doc-order cannot be specified as a simple order by expression. See [Issue-0168:  Sorting by document order].

7.1.3 The fs:item-sequence-to-node-sequence function

The fs:item-sequence-to-node-sequence function converts a sequence of item values to nodes by applying the rules in Section [3.7.2 Data Model Representation] in .

7.1.4 The fs:item-sequence-to-string function

The fs:item-sequence-to-node-sequence function converts a sequence of item values to a string by applying the rules in Section [3.7.2 Data Model Representation] in .

In particular, each node is replaced by its string value. For each adjacent sequence of one or more atomic values returned by an enclosed expression, a string is constructed, containing the canonical lexical representation of all the atomic values, with a single blank character inserted between adjacent values.

7.2 Functions with specific typing rules

7.2.1 The fn:error function

Static Type Analysis

The fn:error function always returns the none type.


statEnv |- fn:error() : none

7.2.2 The fn:distinct-nodes, and fn:distinct-values functions

Static Type Analysis

The fn:distinct-nodes function takes a sequence of nodes as input and returns a sequence of prime types.

statEnv |- Expr : Type
statEnv |- Type <: node*

statEnv |- fn:distinct-nodes(Expr) : prime(Type) · quantifier(Type)

The fn:distinct-values function takes a sequence of atomic values as input and returns a sequence of prime types.

statEnv |- Expr : Type
statEnv |- Type <: atomic value*

statEnv |- fn:distinct-values(Expr) : prime(Type) · quantifier(Type)

7.2.3 The op:union, op:intersect and op:except operators

Static Type Analysis

The static semantics for op:union uses the auxiliary type functions prime(Type) and quantifier(Type); which are defined in [3.5 Judgments for sequences of item types]. The type of each argument is determined, and then prime(.) and quantifier(.) are applied to the sequence type (Type1, Type2).

statEnv |- Expr1 : Type1      statEnv |- Expr2 : Type2

statEnv |- op:union(Expr1, Expr2) : prime(Type1 , Type2) · quantifier(Type1 , Type2)

The static semantics of op:intersect is analogous to that for op:union, but uses the common-prime and common-occurrence operations. Because an intersection may always be empty, the result type needs to be made optional.

statEnv |- Expr1 : Type1
statEnv |- Expr2 : Type2
statEnv |- PrimeType3 = common-prime(prime(Type1), prime(Type2))
statEnv |- Occurrence3 = common-occurrence(quantifier(Type1), quantifier(Type2))

statEnv |- op:intersect(Expr1, Expr2) : PrimeType3 · Occurrence3 ?

The static semantics of op:except follows. The type of the second argument is ignored as it does not contribute to the result type. Like with op:intersect the result of op:except may be the empty list.

statEnv |- Expr1 : Type1

statEnv |- op:except(Expr1, Expr2) : prime(Type1) · quantifier(Type1) · ?

7.2.4 The op:to operator

The static semantics of the op:to function states that it always returns an integer sequence:

statEnv |- Expr1 : xs:integer      statEnv |- Expr2 : xs:integer

statEnv |- op:to(Expr1, Expr2) : xs:integer*

Ed. Note: MFF: the binary operator "to" is not defined on empty sequences. The [XQuery 1.0 and XPath 2.0 Functions and Operators] document says operands are decimals, while the XQuery document says they are integers. What happens when Expr1 > Expr2? See [Issue-0119:  Semantics of op:to].

7.2.5 The fn:data function

Introduction

The fn:data function is used to access the value content of an element or attribute. This function corresponds to the dm:typed-value accessor in the [XPath/XQuery] data model.

Ed. Note: Some aspects of the semantics of the fn:data() function are still an open issue. For instance, what should be the result of fn:data() over a text node. See [Issue-0107:  Semantics of data()].

Notation

Infering the type for the fn:data function is done by applying the fn:data function as a Filter, using the same approach as for the XPath steps.

The following notation, adapted from the Filter judgment in [5.2.1 Steps], is used.

dynEnv.varValue ; Type1 |- fn:data : Type2

Static Type Analysis

Static type analysis for the fn:data function checks that the function is applied on an element or attribute with a simple type content. If so, it returns the corresponding simple type through an application of the Filter judgment on the input type of the function.

statEnv |- Expr : Type1
statEnv |- Type1 <: (element { attribute*, xs:anySimpleType } | attribute })
dynEnv.varValue ; Type1 |- fn:data : Type2

statEnv |- fn:data(Expr) : Type2


statEnv ; element ElementName { AttrType, SimpleType } |- fn:data : SimpleType;


statEnv ; attribute AttributeName { SimpleType } |- fn:data : SimpleType;

If applied on any other kind of item, it returns the empty sequence.

statEnv |- Expr : Type
statEnv |- not(Type <: (element { attribute*, xs:anySimpleType } | attribute))
statEnv |- Type <: item

dynEnv |- fn:data(Expr) : ()

Otherwise (for empty sequences or sequences of more than one item value), it raises an error.

Example

Consider the following variables and its corresponding static type.

  
    $x : (element price { attribute currency { xs:string }, xs:decimal }
         | element price_code { xs:integer })

Applying the fn:data function on that variable results in the following type.

    fn:data($x) : (xs:decimal | xs:integer)

Remark that, as the input type is a choice, applying the Filter judgment results in a choice of simple types for the output of the fn:data function.

Dynamic Evaluation

Dynamically, the fn:data function is implemented as the dm:typed-value data model accessor.

dynEnv |- Expr => Value1      dm:typed-value(Value1) = Value2

dynEnv |- fn:data(Expr) => Value2

8 Importing Schemas

Ed. Note: Status:This section has been extensively revised to reflect the addition of named typing to the [XPath/XQuery] type system, and improve the editorial level. Feedback on the new mapping and on the new presentation is solicited.

This section describes how XML Schema declarations, as specified by [XML Schema] are imported into the [XPath/XQuery] type system.

8.1 Introduction

At compile time, the [XPath/XQuery] environment imports XML Schema declarations and loads them as declarations in the [XPath/XQuery] type system. The semantics of that loading process is defined by normalization rules that map XML Schema descriptions into the [XPath/XQuery] type system.

8.1.1 Features

Here is summarized the XML Schema features which are covered by the formal semantics, and handled by the import mapping described in this section. For each feature, the following indications are used.

  • Handled indicates features which are relevant for [XPath/XQuery], are modeled within the [XPath/XQuery] type system, and handled by the mapping.

  • Not yet indicates features which are relevant for [XPath/XQuery], but are not yet modeled within the [XPath/XQuery] type system or are not yet handled by the mapping. Those features correspond to open issues. In case the [XPath/XQuery] type system provides appropriate support for those features, but the mapping is incomplete, the additional annotation mapping only is used.

  • Not handled indicates features which are relevant for [XPath/XQuery], but are not modeled within the [XPath/XQuery] type system, and are not handled by the mapping. Such feature are typically only related to validation, for which the formal semantics will only give a partial model.

  • Ignored Indicates features which are not relevant for [XPath/XQuery], are not modeled within the [XPath/XQuery] type system, and are not relevant for the mapping. Such features might have to do with documentation of the schema, or might affect which Schemas are legal, but do not affect which documents match which Schemas.

Here is the exhaustive list of XML Schema features and their status in this document.

Feature:Supported
Primitive Simple typesHandled
Simple type derivation by restrictionHandled
Derivation by list and unionHandled
Facets on simple typesNot handled
ID and IDREF constraintsIgnored
Attribute Declarations
    default,fixed,useNot yet
Element Declarations
    default, fixed (value constraint)Not yet
    nillableHandled
    substitution group affiliationHandled
    substitution group exclusionsIgnored
    disallowed substitutionsIgnored
    abstractNot yet
Complex Type Definitions
    derivation by restrictionHandled
    derivation by extensionHandled
    finalIgnored
    abstractNot yet
AttributeUses
    requiredNot yet, mapping only
    default, fixed (value constraint)Not yet
Attribute Group DefinitionsNot yet, mapping only
Model Group DefinitionsNot yet, mapping only
Model GroupsHandled
ParticlesHandled
Wildcards
    process contents strict, skip, laxNot yet
Identity-constraint DefinitionsIgnored
Notation DeclarationsIgnored
AnnotationsIgnored

Note that the schema import feature specified here assumes it is given a legal schema as input. As a result, it is not necessary to check for 'block' or 'abstract' attributes.

8.1.2 Organization

The presentation of the schema mapping is done according to the following organization.

Schema component

First each schema component is summarized using the same notation used in the XML Representation Summary sections in [XML Schema]. For instance, here is the XML Representation Summary for complex types.

  <complexType
        [ ignored ]   abstract = boolean : false
        [ ignored ]   block = (#all | List of (extension | restriction))
        [ ignored ]   final = (#all | List of (extension | restriction))
        [ ignored ]   id = ID
        mixed = boolean : false
        name = NCName
        [ ignored ]   {any attributes with non-schema namespace . . .} >
        Content: (annotation?, (simpleContent | complexContent | ((group | all | choice | sequence)?, ((attribute | attributeGroup)*, anyAttribute?))))
  </complexType>

Attributes indicated as [ ignored ] are not mapped into the [XPath/XQuery] type system.

Attributes indicated as [ not handled ] are not currently handled by the mapping.

Note that in order to simplify the mapping, it is assumed that the default values for all attributes in the XML Representation of Schema are filled in. For instance in the above complex type, if the mixed attribute is not present, it will be treated as being present and having the value "false".

Schema mapping

XML Schema import is specified by means of mapping rules. All mapping rules have the structure below.

 
[SchemaComponent]Subscript
==
TypeComponent

The SchemaComponent above the horizontal rule denotes an XML Schema component before translation and the TypeComponent beneath the horizontal rule denotes an equivalent type component in the [XPath/XQuery] type system.

Notation

Whenever necessary for the mapping rules, specific grammar productions which describe fragments of XML Schema may be introduced. For instance, here are grammar productions used to describes fragments of the XML Representation Summary for the complexType Element Information Item.

[53 (Formal)]   ComplexTypeContent   ::=   "annotation"? ("simpleContent" |  "complexContent" |  (ChildrenContent AttributeContent))
[56 (Formal)]   AttributeContent   ::=   ("attribute" |  "attributeGroup")* "anyAttribute"?
[54 (Formal)]   ChildrenContent   ::=   ("group" |  "all" |  "choice" |  "sequence")?

As in the rest of this document, some mapping rules may use fragments of the XML Representation corresponding to the syntactic categories defined by those grammar productions. For instance, the following complex type fragment uses the syntactic categories: TypeName, ComplexTypeContent, and AttributeContent, ChildrenContent, and MixedAttribute.

  <complexType
        name = TypeName
       MixedAttribute>
       ChildrenContent AttributeContent
  </complexType>

8.1.3 Main mapping rules

Notation

The normalization rule

 
[Schema]Schema
==
Definition*

maps a complete schema into a set of Definition in the [XPath/XQuery] type system.

The normalization rule

 
[SchemaComponent]definition(targetNCName)
==
Definition

maps a toplevel schema component into a Definition in the [XPath/XQuery] type system, given the target namespace .

The normalization rule

 
[SchemaComponent]content(targetNCName)
==
TypeComponent

maps a schema component not directly under the schema element, into a TypeComponent in the [XPath/XQuery] type system, given the target namespace .

8.1.4 Special attributes

The XML Schema attributes: use, minOccurs, maxOccurs, mixed, nillable, and substitutionGroup, require specific mapping rules.

8.1.4.1 use

The "use" attribute is used to describe the occurrence and default behavior of a given attribute.

Notation

The following auxiliary grammar productions are used to describe the "use" attribute.

The normalization rule

 
[UseAttribute]