The presentation of this document has been augmented to identify changes from a previous version. Three kinds of changes are highlighted: new, added text, changed text, and deleted text.


W3C

XQuery 1.0 and XPath 2.0 Formal Semantics

W3C Proposed Recommendation 21 November 2006

This version:
http://www.w3.org/TR/2006/PR-xquery-semantics-20061121/
Latest version:
http://www.w3.org/TR/xquery-semantics/
Previous versions:
http://www.w3.org/TR/2005/CR-xquery-semantics-20051103/ http://www.w3.org/TR/2005/WD-xquery-semantics-20050915/
Editors:
Denise Draper, Microsoft <denised@microsoft.com>
Peter Fankhauser (XML Query WG), Infonyte GmbH <fankhaus@infonyte.com>
Mary Fernández (XML Query WG), AT&T Labs - Research <mff@research.att.com>
Ashok Malhotra (XML Query and XSL WGs), Oracle Corporation <ashok.malhotra@oracle.com>
Kristoffer Rose (XSL WG), IBM T.J. Watson Research Center <krisrose@us.ibm.com>
Michael Rys (XML Query WG), Microsoft <mrys@microsoft.com>
Jérôme Siméon (XML Query WG), IBM T.J. Watson Research Center <simeon@us.ibm.com>
Philip Wadler (XML Query WG), University of Edinburgh <wadler@inf.ed.ac.uk>

This document is also available in these non-normative formats: XML and Recent revisions.


Abstract

This document defines formally the semantics of XQuery 1.0 [XQuery 1.0: An XML Query Language] and XPath 2.0 [XML Path Language (XPath) 2.0].

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a Proposed Recommendation. On 3 November 2005, this specification was published as a Candidate Recommendation, and a Call for Implementations was announced. On 08 June 2006, a revisionwas published in order to give visibility to the technical decisions that had been made inresponseto commentsreceived since initial publicationof the CandidateRecommendation.

Publication as a Proposed Recommendation does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress. This specification will remain a Proposed Recommendation until at least 31 December 2006.

This document was produced jointly by the XML Query Working Group and the XSL Working Group, both of which are part of the XML Activity.

This draft includes corrections and changes based on public comments recorded in the W3C public Bugzilla repository (http://www.w3.org/Bugs/Public/) used for issue tracking on theCandidate Recommendation.A list of substantive changes since the publication of the Candidate Recommendation of 03 November 2005 can be found in [G Revision Log].

This specification is designed to be referred to normatively from other specifications defining a host language for it; it is not intended to be implemented outside a host language. The implementability ofthis specification has been tested in the context of its normative inclusion in host languages defined by the XQuery 1.0 and XSLT 2.0 specifications; see those specifications for implementation reports.

TheW3C Membership and other interested parties are invited to review thedocument and, through31 December 2006, submit comments in W3C's public Bugzilla system (instructions can be found at http://www.w3.org/XML/2005/04/qt-bugzilla). If access to that system is not feasible, you may send your comments to the W3C XSLT/XPath/XQuery public comments mailing list, public-qt-comments@w3.org. It will be very helpful if you include the string [FS] in the subject line of your comment, whether made in Bugzilla or in email. Each Bugzilla entry and email message should contain only one comment. Archives of the comments and responses are available at http://lists.w3.org/Archives/Public/public-qt-comments/ .

Membersof the W3C AdvisoryCommittee will find the appropriatereview formfor this documentby consulting their list of currentWBS questionnaires. Notethat substantivetechnical comments wereexpected during the Last Call review period that ended 15 February2004.

This document was produced by groups operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the XML Query Working Group and also maintains a public list of any patent disclosures made in connection with the deliverables of the XSL Working Group; those pages also include instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification should disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
    1.1 Normative and Informative Sections
2 Preliminaries
    2.1 Introduction to the Formal Semantics
        2.1.1 Notations from grammar productions
        2.1.2 Notations for judgments
        2.1.3 Notations for environments
        2.1.4 Notations for inference rules
        2.1.5 Putting it together
    2.2 URIs, Namespaces, and Prefixes
    2.3 XML Values
        2.3.1 Formal values
        2.3.2 Examples of values
    2.4 The [XPath/XQuery] Type System
        2.4.1 XML Schema and the [XPath/XQuery] Type System
        2.4.2 Item types
        2.4.3 Content models
        2.4.4 Top level definitions
        2.4.5 Example of a complete Schema
    2.5 Functions and operators
3 Basics
    3.1 Expression Context
        3.1.1 Static Context
            3.1.1.1 Resolving QNames to Expanded QNames
        3.1.2 Dynamic Context
    3.2 Processing Model
        3.2.1 Processing model
        3.2.2 Normalization mapping rules
        3.2.3 Static typing judgment
        3.2.4 Dynamic evaluation judgment
    3.3 Error Handling
    3.4 Concepts
        3.4.1 Document Order
        3.4.2 Atomization
        3.4.3 Effective Boolean Value
        3.4.4 Input Sources
        3.4.5 URI Literals
    3.5 Types
        3.5.1 Predefined Schema Types
        3.5.2 Typed Value and String Value
        3.5.3 SequenceType Syntax
        3.5.4 SequenceType Matching
    3.6 Comments
    3.7 XML-defined Terminals
4 Expressions
    4.1 Primary Expressions
        4.1.1 Literals
        4.1.2 Variable References
        4.1.3 Parenthesized Expressions
        4.1.4 Context Item Expression
        4.1.5 Function Calls
    4.2 Path Expressions
        4.2.1 Steps
            4.2.1.1 Axes
            4.2.1.2 Node Tests
        4.2.2 Predicates
        4.2.3 Unabbreviated Syntax
        4.2.4 Abbreviated Syntax
    4.3 Sequence Expressions
        4.3.1 Constructing Sequences
        4.3.2 Filter Expressions
        4.3.3 Combining Node Sequences
    4.4 Arithmetic Expressions
    4.5 Comparison Expressions
        4.5.1 Value Comparisons
        4.5.2 General Comparisons
        4.5.3 Node Comparisons
    4.6 Logical Expressions
    4.7 Constructors
        4.7.1 Direct Element Constructors
            4.7.1.1 Attributes
            4.7.1.2 Namespace Declaration Attributes
            4.7.1.3 Content
            4.7.1.4 Boundary Whitespace
        4.7.2 Other Direct Constructors
        4.7.3 Computed Constructors
            4.7.3.1 Computed Element Constructors
            4.7.3.2 Computed Attribute Constructors
            4.7.3.3 Document Node Constructors
            4.7.3.4 Text Node Constructors
            4.7.3.5 Computed Processing Instruction Constructors
            4.7.3.6 Computed Comment Constructors
        4.7.4 In-scope Namespaces of a Constructed Element
    4.8 [For/FLWOR] Expressions
        4.8.1 FLWOR expressions
        4.8.2 For expression
        4.8.3 Let Expression
        4.8.4 Order By and Return Clauses
    4.9 Ordered and Unordered Expressions
    4.10 Conditional Expressions
    4.11 Quantified Expressions
    4.12 Expressions on SequenceTypes
        4.12.1 Instance Of
        4.12.2 Typeswitch
        4.12.3 Cast
        4.12.4 Castable
        4.12.5 Constructor Functions
        4.12.6 Treat
    4.13 Validate Expressions
        4.13.1 Validating an Element Node
        4.13.2 Validating a Document Node
    4.14 Extension Expressions
5 Modules and Prologs
    5.1 Version Declaration
    5.2 Module Declaration
    5.3 Boundary-space Declaration
    5.4 Default Collation Declaration
    5.5 Base URI Declaration
    5.6 Construction Declaration
    5.7 Ordering Mode Declaration
    5.8 Empty Order Declaration
    5.9 Copy-Namespaces Declaration
    5.10 Schema Import
    5.11 Module Import
    5.12 Namespace Declaration
    5.13 Default Namespace Declaration
    5.14 Variable Declaration
    5.15 Function Declaration
    5.16 Option Declaration
6 Conformance
    6.1 Static Typing Feature
        6.1.1 Static Typing Extensions
7 Additional Semantics of Functions
    7.1 Formal Semantics Functions
        7.1.1 The fs:convert-operand function
        7.1.2 The fs:convert-simple-operand function
        7.1.3 The fs:distinct-doc-order function
        7.1.4 The fs:distinct-doc-order-or-atomic-sequence function
        7.1.5 The fs:item-sequence-to-node-sequence function
        7.1.6 The fs:item-sequence-to-node-sequence-doc function
        7.1.7 The fs:item-sequence-to-untypedAtomic function
        7.1.8 The fs:item-sequence-to-untypedAtomic-PI function
        7.1.9 The fs:item-sequence-to-untypedAtomic-text function
        7.1.10 The fs:item-sequence-to-untypedAtomic-comment function
        7.1.11 The fs:apply-ordering-mode function
        7.1.12 The fs:to function
    7.2 Standard functions with specific static typing rules
        7.2.1 The fn:last context function
        7.2.2 The fn:position context function
        7.2.3 The fn:abs, fn:ceiling, fn:floor, fn:round, and fn:round-half-to-even functions
        7.2.4 The fn:boolean function
        7.2.5 The fn:collection and fn:doc functions
        7.2.6 The fn:data function
        7.2.7 The fn:distinct-values function
        7.2.8 The fn:unordered function
        7.2.9 The fn:error function
        7.2.10 The fn:min, fn:max, fn:avg, and fn:sum functions
        7.2.11 The fn:remove function
        7.2.12 The fn:reverse function
        7.2.13 The fn:subsequence function
        7.2.14 The op:union, op:intersect, and op:except operators
        7.2.15 The fn:insert-before function
        7.2.16 The fn:zero-or-one, fn:one-or-more, and fn:exactly-one functions
8 Auxiliary Judgments
    8.1 Judgments for accessing types
        8.1.1 Derives from
        8.1.2 Substitutes for
        8.1.3 Element and attribute name lookup (Dynamic)
        8.1.4 Element and attribute type lookup (Static)
        8.1.5 Extension
        8.1.6 Mixed content
        8.1.7 Type adjustment
        8.1.8 Builtin attributes
        8.1.9 Type expansion
        8.1.10 Union interpretation of derived types
    8.2 Judgments for step expressions and filtering
        8.2.1 Principal Node Kind
        8.2.2 Auxiliary judgments for axes
            8.2.2.1 Static semantics of axes
                8.2.2.1.1 Inference rules for all axes
                8.2.2.1.2 Inference rules for the self axis
                8.2.2.1.3 Inference rules for the child axis
                8.2.2.1.4 Inference rules for the attribute axis
                8.2.2.1.5 Inference rules for the parent axis
                8.2.2.1.6 Inference rules for the namespace axis
                8.2.2.1.7 Inference rules for the descendant axis
                8.2.2.1.8 Inference rules for the descendant-or-self axis
                8.2.2.1.9 Inference rules for the ancestor axis
                8.2.2.1.10 Inference rules for the ancestor-or-self axis
            8.2.2.2 Dynamic semantics of axes
        8.2.3 Auxiliary judgments for node tests
            8.2.3.1 Static semantics of node tests
                8.2.3.1.1 Name Tests
                8.2.3.1.2 Kind Tests
            8.2.3.2 Dynamic semantics of node tests
                8.2.3.2.1 Name Tests
                8.2.3.2.2 Kind Tests
    8.3 Judgments for type matching
        8.3.1 Matches
        8.3.2 Subtyping (<:)
    8.4 Judgments for FLWOR and other expressions on sequences
    8.5 Judgments for function calls
        8.5.1 Type promotion
    8.6 Judgments for validation modes and contexts
        8.6.1 Elements in validation mode

Appendices

A Normalized core and formal grammars
    A.1 Core BNF
    A.2 Formal BNF
B Index of judgments
C Functions and Operators
    C.1 Functions and Operators used in the Formal Semantics
    C.2 Mapping of Overloaded Internal Functions
D Importing Schemas
    D.1 Introduction
        D.1.1 Features
        D.1.2 Organization
        D.1.3 Main mapping rules
        D.1.4 Special attributes
            D.1.4.1 use, default, and fixed
            D.1.4.2 minOccurs, maxOccurs, minLength, maxLength, and length
            D.1.4.3 mixed
            D.1.4.4 nillable
            D.1.4.5 substitutionGroup
        D.1.5 Anonymous type names
    D.2 Schemas as a whole
        D.2.1 Schema
        D.2.2 Include
        D.2.3 Redefine
        D.2.4 Import
    D.3 Attribute Declarations
        D.3.1 Global attributes declarations
        D.3.2 Local attribute declarations
    D.4 Element Declarations
        D.4.1 Global element declarations
        D.4.2 Local element declarations
    D.5 Complex Type Definitions
        D.5.1 Global complex type
        D.5.2 Local complex type
        D.5.3 Complex type with simple content
        D.5.4 Complex type with complex content
    D.6 Attribute Uses
    D.7 Attribute Group Definitions
        D.7.1 Attribute group definitions
        D.7.2 Attribute group reference
    D.8 Model Group Definitions
    D.9 Model Groups
        D.9.1 All groups
        D.9.2 Choice groups
        D.9.3 Sequence groups
    D.10 Particles
        D.10.1 Element reference
        D.10.2 Group reference
    D.11 Wildcards
        D.11.1 Attribute wildcards
        D.11.2 Element wildcards
    D.12 Identity-constraint Definitions
    D.13 Notation Declarations
    D.14 Annotation
    D.15 Simple Type Definitions
        D.15.1 Global simple type definition
        D.15.2 Local simple type definition
        D.15.3 Simple type content
E References
    E.1 Normative References
    E.2 Non-normative References
    E.3 Background References
F Auxiliary Judgments for Validation (Non-Normative)
    F.1 Judgments for the validate expression
        F.1.1 Type resolution
        F.1.2 Interleaving
        F.1.3 Attribute filtering
        F.1.4 Erasure
            F.1.4.1 Simply erases
            F.1.4.2 Erases
        F.1.5 Annotate
            F.1.5.1 Simply annotate
            F.1.5.2 Nil-annotate
            F.1.5.3 Annotate
G Revision Log (Non-Normative)
    G.1 15 September 2005
    G.2 03 November 2005 (CR Draft)
    G.3 09 June 2006
    G.4 16 June 2006


1 Introduction

This document defines the formal semantics of XQuery 1.0 and XPath 2.0. The present document is part of a set of documents that together define the XQuery 1.0 and XPath 2.0 languages:

The scope and goals for the [XPath/XQuery] language are discussed in the charter of the W3C [XSL/XML Query] Working Group and in the [XPath/XQuery] requirements [XML Query 1.0 Requirements].

This document defines the semantics of [XPath/XQuery] by giving a precise formal meaning to each of the expressions of the [XPath/XQuery] specification in terms of the [XPath/XQuery] data model. This document assumes that the reader is already familiar with the [XPath/XQuery] language. This document defines the formal semantics for XPath 2.0 only when the XPath 1.0 backward compatibility rules are not in effect.

Two important design aspects of [XPath/XQuery] are that it is functional and that it is typed. These two aspects play an important role in the [XPath/XQuery] Formal Semantics.

[XPath/XQuery] is a functional language. [XPath/XQuery] is built from expressions, rather than statements. Every construct in the language (except for the XQuery query prolog) is an expression and expressions can be composed arbitrarily. The result of one expression can be used as the input to any other expression, as long as the type of the result of the former expression is compatible with the input type of the latter expression with which it is composed. Another characteristic of a functional language is that variables are always passed by value, and a variable's value cannot be modified through side effects.

[XPath/XQuery] is a typed language. Types can be imported from one or more XML Schemas that describe the input documents and the output document, and the [XPath/XQuery] language can then perform operations based on these types. In addition, [XPath/XQuery] supports static type analysis. Static type analysis infers the output type of an expression based on the type of its input expressions. In addition to inferring the type of an expression for the user, static typing allows early detection of type errors, and can be used as the basis for certain classes of optimization. The [XPath/XQuery] type system captures most of the features of [Schema Part 1], including global and local element and attribute declarations, complex and simple type definitions, named and anonymous types, derivation by restriction, extension, list and union, substitution groups, and wildcard types. It does not model uniqueness constraints and facet constraints on simple types.

This document is organized as follows. [2 Preliminaries] introduces the notations used to define the [XPath/XQuery] Formal Semantics. These include the formal notations for values in the [XPath/XQuery] data model and for types in XML Schema. The next three sections: [3 Basics], [4 Expressions], and [5 Modules and Prologs] have the same structure as the corresponding sections in the [XQuery 1.0: An XML Query Language] and [XML Path Language (XPath) 2.0] documents. This allows the reader to quickly find the formal definition of a particular language construct. [3 Basics] defines the semantics for basic [XPath/XQuery] concepts, and [4 Expressions] defines the dynamic and static semantics of each [XPath/XQuery] expression. [5 Modules and Prologs] defines the semantics of the [XPath/XQuery] prolog. [7 Additional Semantics of Functions] defines the static semantics of several functions in [Functions and Operators] and gives the dynamic and static semantics of several supporting functions used in this document. The remaining sections, [8 Auxiliary Judgments] and [D Importing Schemas], contain material that supports the formal semantics of [XPath/XQuery]. [8 Auxiliary Judgments] defines formal judgments that relate data model values to types, that relate types to types, and that support the formal definition of validation. These judgments are used in the definition of expressions in [4 Expressions]. Lastly, [D Importing Schemas], specifies how XML Schema documents are imported into the [XPath/XQuery] type system and relates XML Schema types to the [XPath/XQuery] type system.

1.1 Normative and Informative Sections

Certain aspects of language processing are described in this specification as implementation-defined or implementation-dependent.

  • [Definition: Implementation-defined indicates an aspect that may differ between implementations, but must be specified by the implementor for each particular implementation.]

  • [Definition: Implementation-dependent indicates an aspect that may differ between implementations, is not specified by this or any W3C specification, and is not required to be specified by the implementor for any particular implementation.]

A language aspect described in this specification as implementation-defined or implementation dependent may be further constrained by the specifications of a host language in which XPath or XQuery is embedded.

This document contains the normative static semantics of [XPath/XQuery]. The static semantics rules in [3 Basics], [4 Expressions], [5 Modules and Prologs], and [7 Additional Semantics of Functions] are normative. [3.1.1 Static Context] is normative, because it defines the static context used in the static typing rules. [8 Auxiliary Judgments] is normative, because it contains all the judgments necessary for defining SequenceType Matching.

The dynamic semantics of [XPath/XQuery] are normatively defined in [XQuery 1.0: An XML Query Language] and [XML Path Language (XPath) 2.0]. In this document, the dynamic semantic rules in [3 Basics], [4 Expressions], and [5 Modules and Prologs], the examples, and the material labeled as "Note" are provided for explanatory purposes and are not normative.

The mapping rules from XML Schema to the XQuery type system provided in [D Importing Schemas], and the formal semantics of XML Schema validation in [F Auxiliary Judgments for Validation] are informative and do not handle every feature of XML Schema.

2 Preliminaries

This section provides the background necessary to understand the Formal Semantics, introduces the notations that are used, and explains its relationship to other documents.

2.1 Introduction to the Formal Semantics

Why a Formal Semantics? The goal of the formal semantics is to complement the [XPath/XQuery] specification ([XQuery 1.0: An XML Query Language] and [XML Path Language (XPath) 2.0]), by defining the meaning of [XPath/XQuery] expressions with mathematical rigor.

A rigorous formal semantics clarifies the intended meaning of the English specification, ensures that no corner cases are left out, and provides a reference for implementation.

Why use formal notations? Rigor is achieved by the use of formal notations to represent [XPath/XQuery] objects such as expressions, XML values, and XML Schema types, and by the systematic definition of the relationships between those objects to reflect the meaning of the language. In particular, the dynamic semantics relates [XPath/XQuery] expressions to the XML value to which they evaluate, and the static semantics relates [XPath/XQuery] expressions to the XML Schema type that is inferred for that expression.

The Formal Semantics uses several kinds of formal notations to define the relationships between [XPath/XQuery] expressions, XML values, and XML Schema types. This section introduces the notations for judgments, inference rules, and mapping rules as well as the notation for environments, which implement the dynamic and static contexts. The reader already familiar with these notations can skip this section and continue with [2.3 XML Values].

2.1.1 Notations from grammar productions

Grammar productions are used to describe "objects" (values, types, [XPath/XQuery] expressions, etc.) manipulated by the Formal Semantics. The Formal Semantics makes use of several kinds of grammar productions: productions from the [XPath/XQuery] grammar itself, productions for a subset of the [XPath/XQuery] language called the XQuery Core which is used throughout this document, and other productions used for formal specification only such as for the XQuery type system.

XQuery grammar productions describe the XQuery language and expressions. XQuery productions are identified by a number, which corresponds to their number in the [XQuery 1.0: An XML Query Language] document, and are marked with "(XQuery)". For instance, the following production describes FLWOR expressions in XQuery.

[For/FLWOR] Expressions
[33 (XQuery)]   FLWORExprXQ   ::=   (ForClause | LetClause)+ WhereClause? OrderByClause? "return" ExprSingle

For the purpose of this document, the differences between the XQuery 1.0 and the XPath 2.0 grammars are mostly irrelevant. By default, this document uses XQuery 1.0 grammar productions. Whenever the grammar for XPath 2.0 differs from the one for XQuery 1.0, the corresponding XPath 2.0 productions are also given. XPath productions are identified by a number, which corresponds to their number in [XML Path Language (XPath) 2.0], and are marked with "(XPath)". For instance, the following production describes for expressions in XPath.

[For/FLWOR] Expressions
[4 (XPath)]   ForExprXP   ::=   SimpleForClause "return" ExprSingle

XQuery Core grammar productions describe the XQuery Core. The Core grammar is given in [A Normalized core and formal grammars]. Core productions are identified by a number, which corresponds to their number in [A Normalized core and formal grammars], and are marked with "(Core)". For instance, the following production describes the simpler form of the "FLWOR" expression in the XQuery Core.

Core FLWOR Expressions
[24 (Core)]   FLWORExpr   ::=   (ForClause | LetClause) "return" ExprSingle

The Formal Semantics manipulates "objects" (values, types, expressions, etc.) for which there is no existing grammar production in the [XQuery 1.0: An XML Query Language] document. In these cases, specific grammar productions are introduced. Notably, additional productions are used to describe values in the [Data Model], and to describe the [XPath/XQuery] type system. Formal Semantics productions are identified by a number, and are marked by "(Formal)". For instance, the following production describes global type definitions in the [XPath/XQuery] type system.

Type Definitions
[39 (Formal)]   Definition   ::=   ("define" "element" ElementName OptSubstitution OptNillable TypeReference)
| ("define" "attribute" AttributeName TypeReference)
| ("define" "type" TypeName TypeDerivation)

Note that grammar productions that are specific to the Formal Semantics (i.e., marked with "(Formal)") are not part of [XPath/XQuery]. They are not accessible to the user and are only used in the course of defining the languages' semantics.

Grammar non-terminals are used extensively in this document to represent objects in judgments (see the next section). As a convenience, non-terminals used in judgments link to the appropriate grammar production.

2.1.2 Notations for judgments

The basic building block of the formal specification is called a judgment. A judgment expresses whether a property holds or not.

For example:

Notation

The judgment

Object is a positive integer

holds if the object Object is a positive integer.

A judgment may hold (if it is true) or not hold (if it is false). For instance '1 is a positive integer' holds and '-1 is a positive integer' does not hold.

Notation

Here are two other example judgments.

The judgment

Expr => Value

holds if the expression Expr yields (or evaluates to) the value Value.

The judgment

Expr : Type

holds if the expression Expr has the type Type.

Most other judgments used in this document are short English sentences intended to reflect their meaning. For instance, the judgment

Axis has principal PrincipalNodeKind

holds if PrincipalNodeKind is the principal node kind for the axis Axis.

A judgment can contain symbols and patterns.

Symbols are purely syntactic and are used to write the judgment itself. Symbols are chosen to reflect a judgment's meaning, and are written in bold fonts. For example, 'is a positive integer', '=>' and ':' are symbols, the second and third of which should be read "yields", and "has type" respectively.

Patterns are used to represent objects, constructed from a given grammar production. In patterns, italicized words usually correspond to non-terminals in the grammar. The name of those non-terminals is significant, and may be instantiated only to an "object" (a value, a type, an expression, etc.) that can be substituted legally for that non-terminal. For example, 'Expr' is a pattern that stands for every [XPath/XQuery] expressions, 'Expr1 + Expr2' is a pattern that stands for every addition expression, 'element a { Value }' is a pattern that stands for every value in the [XPath/XQuery] data model that is an 'a' element.

Non-terminals in a pattern may appear with subscripts (e.g. Expr1, Expr2) to distinguish different instances of the same sort of pattern. In some cases, non-terminals in a pattern may have a name that is not exactly the name of that non terminal, but is based on it. For instance, a BaseTypeName is a pattern that stands for a type name, as would TypeName, or TypeName2. This usage is limited, and only occurs to improve the readability of some of the inference rules.

When instantiating the judgment, each pattern must be instantiated to an appropriate sort of "object" (value, type, expression, etc). For example, '3 => 3' and '$x+0 => 3' are both instances of the judgment 'Expr => Value'. Note that in the first judgment, '3' corresponds to both the expression '3' (on the left-hand side of the => symbol) and to the value '3' (on the right-hand side of the => symbol).

In some cases, inference rules may need to use the fact that a certain judgment does not hold. not(Judgment) holds if andonly if Judgment does not hold.

In some cases, a pattern may be instantiated to a value within a finite set of pre-determined values. We may write that set of possible values using the in judgment. For instance, the judgment

Color in { blue, green }

holds if the pattern Color has either the value blue or the value green.

In some cases, a judgment may use the "=" sign to indicate that a given value is equal to another value, or that a pattern is equal to a given value. For instance, the judgment

Color = blue

holds if the pattern Color has the value blue.

An index to all the judgments used in this specification is provided in [B Index of judgments].

2.1.3 Notations for environments

An environment component is a dictionary that maps a symbol (e.g., a function name or a variable name) to an "object" (e.g., a function body, a type, a value). One can access information in an environment component or update it.

If "envComp" is an environment component, then "envComp(symbol)" denotes the "object" to which symbol is mapped. The notation is intentionally similar to function application, because an environment component can be considered a function from the argument symbol to the "object" to which the symbol is mapped.

This document uses environments that group related environment components. If "env" is an environment containing the environment component "envComp", that environment component is denoted "env.envComp". The value that symbol is mapped to in that environment component is denoted "env.envComp(symbol)".

The two main environments used in the Formal Semantics are: a dynamic environment (dynEnv), which models the [XPath/XQuery] dynamic context, and a static environment (statEnv), which models the [XPath/XQuery] static context. Both are defined in [3.1 Expression Context].

For example, dynEnv.varValue denotes the dynamic environment component that maps variables to values and dynEnv.varValue(Variable) denotes the value of the variable Variable in the dynamic context.

Environments are used in a judgment to capture some of the context in which the judgment is computed, and most judgments are computed assuming that some environment is given. This assumption is denoted by prefixing the judgment with "env |-". The "|-" symbol is called a "turnstile" and is used in almost all inference rules.

For instance, the judgment

dynEnv |-  Expr => Value

is read as: Assuming the dynamic environment dynEnv, the expression Expr yields the value Value.

Environments can be updated, using the following notation:

  • "env + envComp(symbol => object) " denotes the new environment that is identical to env except that the environment component envComp has been updated to map symbol to object. The notation symbol => object indicates that symbol is mapped to object in the new environment.

  • In case the environment component contains only a constant value (e.g., the ordering mode which can only be either ordered or unordered), the following notation is used to set its value. "env + envComp( object ) ".

  • The following shorthand is also allowed: "env + envComp( symbol1 => object1 ; ... ; symboln => objectn ) " in which each symbol is mapped to a corresponding object in the new environment.

    This notation is equivalent to nested updates, as in " (env + envComp( symbol1 => object1) + ... ) + env(symboln => objectn)".

Updating an environment creates a copy of the original environment and overrides any previous binding that might exist for the same name and the same component in that environment. Updating the environment is used to capture the scope of a symbol (e.g., for variables, namespace prefixes, etc). For instance, in the following expression

  let $x := 1 return
  let $x := $x + 2 return
  $x - 3

each let expression changes the dynamic context by binding a new variable to a new value. Each different context is represented by a different environment. The original environment, in which the expression 1 is evaluated, does not contain any binding for variable $x. This environment is updated a first time with a binding of variable $x to the value 1, and this environment is used for the evaluation of the expression $x + 2. Then it is updated a second time with a binding of variable $x to the value 3, and this environment is used for the evaluation of the expression $x - 3.

Also, note that there are no operations to remove entries from environments. This is never necessary as updating an environment effectively creates a new extended copy of the original environment, leaving the original environment accessible wherever it is in scope along with the updated copy.

2.1.4 Notations for inference rules

Inference rules are used to specify how to infer whether a given judgment holds or not. Inference rules express the logical relation between judgments and describe how complex judgments can be concluded from simpler premise judgments.

A logical inference rule is written as a collection of premises and a conclusion, written respectively above and below a dividing line, as follows:

premise1 ... premisen

conclusion

All premises and the conclusion are judgments. From a logical point of view, an inference rule is a deduction that if the premises hold, then the conclusion holds as well. In that sense, the previous inference rule has a similar meaning as the following logical statement.

IF premise1

AND ...

AND premisen

THEN conclusion

Here is a simple example of inference rule, which uses specific instances of the example judgment 'Expr => Value' from above:

$x => 0      3 => 3

$x + 3 => 3

This inference rule expresses the following property: if the variable expression '$x' yields the value '0', and the literal expression '3' yields the value '3', then the expression '$x + 3' yields the value '3'.

An inference rule may have no premises above the line, which means that the expression below the line always holds. For instance:


3 => 3

This inference rule expresses the following property: evaluating the literal expression '3' always yields the value '3'.

The two above rules are expressed in terms of specific expressions and values, but usually rules are more abstract. That is, the judgments are not fully instantiated. Here is a rule that says that for any variable $VarName that yields the integer value Integer, adding '0' yields the same integer value:

$VarName => Integer

$VarName + 0 => Integer

Each occurrence of a given pattern in a particular inference rule must be instantiated to the same "object" within the entire rule. This means that, in the context of a particular instantiation of a rule, one can talk about "the value of $VarName" instead of the value bound to the first (second, etc) occurrence of $VarName.

Here is an example of a rule occurring later in this document.

statEnv |-  Expr1 : Type1      statEnv |-  Expr2 : Type2

statEnv |-  Expr1 , Expr2 : Type1, Type2

This rule is read as follows: if two expressions Expr1 and Expr2 are known to have the static types Type1 and Type2 (the two premises above the line), then it is the case that the sequence expression "Expr1 , Expr2" has the static type "Type1, Type2", which is the sequence of types Type1 and Type2. Note that this inference rule does not modify the static environment.

The following rule defines the static semantics of a "let" expression. The binding of the new variable is captured by an update to the varType component of the original static environment.

statEnv |-  VarName of var expands to expanded-QName
statEnv |-  Expr1 : Type1      statEnv + varType(expanded-QName => Type1)  |-  Expr2 : Type2

statEnv |-  let $VarName := Expr1 return Expr2 : Type2

This rule is read as follows: First, because the variable is a QName, it is first expanded into an expanded QName. Second, the type Type1 for the "let" input expression Expr1 is computed. Then the "let" variable with expanded name, expanded-QName with type Type1 is added into the varType component of the static environment statEnv. Finally, the type Type2 of Expr2 is computed in that new environment.

In some cases, ellipses may be used in inference rules to handle an arbitrary number of judgments. In those cases, some of the patterns may have indices as subscript. If the same index is used several times within the same rule, the number of judgment in each case must be the same. For instance, the following rule holds for any number of expressions, from Expr1 to Exprn, with the same number of types Type1 to Typen.

statEnv |-  QName of func expands to expanded-QName
statEnv |-  Expr1 : Type1
...
statEnv |-  Exprn : Typen
statEnv  |-  expanded-QName(Type1,...,Typen) : Type

statEnv  |-  QName (Expr1,...,Exprn) : Type

This inference rule is equivalent to having an unbounded number of rules, the first of which has 1 judgment, the second of which has 2 judgments, etc. For instance, the above rule holds if and only if one of the following rules hold.

statEnv |-  QName of func expands to expanded-QName
statEnv |-  Expr1 : Type1
statEnv  |-  expanded-QName(Type1) : Type

statEnv  |-  QName (Expr1) : Type

or

statEnv |-  QName of func expands to expanded-QName
statEnv |-  Expr1 : Type1
statEnv |-  Expr2 : Type2
statEnv  |-  expanded-QName(Type1,Type2) : Type

statEnv  |-  QName (Expr1,Expr2) : Type

etc.

When ellipses are used, the value for the index always ranges from 1 to an arbitrary number n.

2.1.5 Putting it together

In isolation, each inference rule describes a fragment of the semantics for a given judgment. Put together, inference rules describe possible inferences that can be used to decide whether a particular judgment holds.

For a given judgment, if that judgment can be inferred to be true by applying any sequence of inferences based on premises which are known to be true, the inference succeeds. In most cases, the inference will proceed by proving intermediate judgments, following the consequences from one judgment to the next by applying successive inference rules.

Such inference is a mechanism which can be used to describe both static type analysis and dynamic evaluation. More specifically, performing static typing consists in proving that the following judgment holds for a given expression Expr.

statEnv |-  Expr : Type

If the judgment holds for a given type Type, this type is a possible static type for the expression. If there exists no type for which this judgment holds, then static typing fails and a static type error is returned to the user.

Consider the following expression.

  fn:count((1,2,3))

Using the static typing rules given for expressions in the rest of this document, one can deduce that the expression is of type xs:integer through the following inference.

  statEnv |- 1 : xs:integer  (from typing of literals)
  statEnv |- 2 : xs:integer  (from typing of literals)
  --------------------------------------------------- (sequence)
    statEnv |- 1,2 : xs:integer, xs:integer
    statEnv |- 3 : xs:integer
    ----------------------------------------------------- (sequence)
    statEnv |- 1,2,3 : xs:integer, xs:integer, xs:integer

    declare function fn:count($x as item()*) as xs:integer
    statEnv |- xs:integer,xs:integer,xs:integer <: item()*
    ---------------------------------------------------------- (function call)
    statEnv |- fn:count((1,2,3)) : xs:integer

Conversly, consider the following expression.

  fn:nilled((1,2,3))

Using the static typing rules given for expressions in the rest of this document, one can apply inference rules up to the following point.

    ....
    ----------------------------------------------------- (sequence)
    statEnv |- 1,2,3 : xs:integer, xs:integer, xs:integer

However, there is no rule that can infer the type of fn:nilled((1,2,3)), because the static typing rules for function calls will only hold if the type of the function parameters is a subtype of the expected type. However, here (xs:integer,xs:integer,xs:integer) is not a node type, which is the expected type for the function fn:nilled.

Note that in some cases, the inference can only proceed through the appropriate changes to the environment. For instance, consider the following expression.

  let $x := 1 return ($x,$x)

Using the static typing rules given for expressions in the rest of this document, one can deduce that the expression is of type (xs:integer,xs:integer) through the following inference.

statEnv0.varType = ()

  -------------------------- (literal)
  statEnv0 |- 1 : xs:integer

statEnv1 = statEnv0 + varType($x => xs:integer)

     statEnv1.varType($x) = xs:integer
     --------------------------------- (variable reference)
     statEnv1 |- $x : xs:integer

     statEnv1.varType($x) = xs:integer
     --------------------------------- (variable reference)
     statEnv1 |- $x : xs:integer

     ------------------------------------------- (sequence)
     statEnv1 |- ($x,$x) : xs:integer,xs:integer

  -------------------------------------------------------------- (let)
  statEnv0 |- let $x := 1 return ($x,$x) : xs:integer,xs:integer

This example illustrates how each rule is applied to individual sub-expressions, and how the environment is used to maintain the relevant context information.

2.2 URIs, Namespaces, and Prefixes

The Formal Semantics does not formally specify the adjustment of relative URIs according to a base URI. All URIs used in this document are assumed to be absolute URIs.

The Formal Semantics uses the following namespace prefixes.

These prefixes are assumed to be bound to the appropriate URIs.

In addition, the Formal Semantics uses the following special prefixes for specification purposes.

These prefixes are always italicized to emphasize that the corresponding functions, variables, and types are abstract: they are not and cannot be made accessible in [XPath/XQuery]. None of these special prefixes are given an explicit URI, but they behave as if they had one for the purposes of namespace resolution.

2.3 XML Values

The [XPath/XQuery] language is defined over values of the [XPath/XQuery] data model. The [XPath/XQuery] data model is defined normatively in [Data Model]. We define the formal notation that is used in this document to describe and manipulate values in inference rules. Formal values are used for specification purposes only and are not exposed to the [XPath/XQuery] user.

This section gives the grammar for formal values, along with a summary of the corresponding data model properties. In the context of this document, all constraints on values that are specified in [Data Model] are assumed to hold.

2.3.1 Formal values

A value is a sequence of zero or more items. An item is either an atomic value or a node.

An atomic value is a value in the value space of an atomic type, labeled with the name of that atomic type. An atomic type is either a primitive or derived atomic type according to XML Schema [Schema Part 2], xs:untypedAtomic, or xs:anyAtomicType.

A node is either an element, an attribute, a document, a text, a comment, or a processing-instruction node.

Element nodes have a type annotationXQ and contain a complex value or a simple value. Attribute nodes have a type annotationXQ and contain a simple value. Text nodes always contain one string value of type xs:untypedAtomic, therefore the corresponding type annotation is omitted in the formal notation of a text node. Document nodes do not have a type annotation and contain a sequence of element, text, comment, or processing-instruction nodes.

A simple value is a sequence of atomic values.

A complex value is a sequence of attribute nodes followed by a sequence of element, text, comment, or processing-instruction nodes.

A type annotationXQ can be either the QName of a declared type or an anonymous type. An anonymous type corresponds to an XML Schema type for which the schema writer did not provide a name. Anonymous type names are not visible to the user, but are generated during schema validation and used to annotate nodes in the data model. By convention, anonymous type names are written using the fs: Formal Semantics prefix: fs:anon0, fs:anon1, etc.

Formal values are defined by the following grammar.

Values
[7 (Formal)]   Value   ::=   Item
| (Value "," Value)
| ("(" ")")
[21 (Formal)]   Item   ::=   NodeValue
| AtomicValue
[22 (Formal)]   AtomicValue   ::=   AtomicValueContent TypeAnnotation?
[1 (Formal)]   AtomicValueContent   ::=   String
| Boolean
| Decimal
| Float
| Double
| Duration
| DateTime
| Time
| Date
| GYearMonth
| GYear
| GMonthDay
| GDay
| GMonth
| HexBinary
| Base64Binary
| AnyURI
| expanded-QName
| NOTATION
[2 (Formal)]   TypeAnnotation   ::=   "of" "type" TypeName
[9 (Formal)]   ElementValue   ::=   "element" ElementName "nilled"? TypeAnnotation? "{" Value "}" ("{" NamespaceBindings "}")?
[10 (Formal)]   AttributeValue   ::=   "attribute" AttributeName TypeAnnotation? "{" SimpleValue "}"
[8 (Formal)]   SimpleValue   ::=   AtomicValue
| (SimpleValue "," SimpleValue)
| ("(" ")")
[11 (Formal)]   DocumentValue   ::=   "document" "{" Value "}"
[13 (Formal)]   CommentValue   ::=   "comment" "{" String "}"
[14 (Formal)]   ProcessingInstructionValue   ::=   "processing-instruction" NCName "{" String "}"
[12 (Formal)]   TextValue   ::=   "text" "{" String "}"
[20 (Formal)]   NodeValue   ::=   ElementValue
| AttributeValue
| DocumentValue
| TextValue
| CommentValue
| ProcessingInstructionValue
[3 (Formal)]   ElementName   ::=   QName
[6 (Formal)]   AttributeName   ::=   QName
[23 (Formal)]   TypeName   ::=   QName
[15 (Formal)]   NamespaceBindings   ::=   NamespaceBinding ("," NamespaceBinding)*
[17 (Formal)]   NamespaceBinding   ::=   "namespace" NCName "{" AnyURI "}"

Notation

In the production for AtomicValueContent, each symbol in the right-hand side corresponds to one of the primitive datatypes. For example, String corresponds to xs:string, and Boolean corresponds to xs:boolean. (The mapping is obvious, except that "expanded-QName" corresponds to xs:QName) Although there are no explicit productions for these symbols, we assume that each is a non-terminal that derives a set of syntactic objects, each of which corresponds to a value in the value space of the corresponding datatype. For instance, the non-terminal String derives a set of syntactic objects, which appear in examples as "", "a", "John", etc.; each one corresponds to a string value in the xs:string value space. For familiarity, these objects have been given the same appearance as StringLiterals from the XQuery and Core grammars; however, these are formal objects, with a distinct role in the FS.

Element (resp. attributes) without type annotations, are assumed to have the type annotation xs:anyType (resp. xs:anySimpleType). Atomic values without type annotations, are assumed to have a type annotation which is the base type for the corresponding value. For instance, "Hello, World!" is equivalent to "Hello, World!" of type xs:string.

Untyped elements (e.g., from well-formed documents) have the type annotationXQ xs:untyped, untyped attributes have the type annotationXQ xs:untypedAtomic, and untyped atomic values have the type annotationXQ xs:untypedAtomic.

An element has an optional "nilled" marker. This marker is present only if the element has been validated against an element type in the schema which is "nillable", and the element has no content and an attribute xsi:nil set to "true".

An element also has a sequence of namespace bindings, which are the set of in-scope namespaces for that element. Each namespace binding is a prefix, URI pair. Elements without namespace bindings are assumed to have an empty set of in-scope namespaces.

Note:

In [XPath], the in-scope namespaces of an element node are represented by a collection of namespace nodes arranged on a namespace axis, which is optional and deprecated in [XML Path Language (XPath) 2.0]. XQuery does not support the namespace axis and does not represent namespace bindings in the form of nodes.

2.3.2 Examples of values

A well-formed document

In the absence of a Schema, this document is represented as

A document before and after validation.

The formal model for values can represent values before and after validation. Before validation, this element is represented as:

After validation, this element is represented as:

An element with a list type

Before validation, this element is represented as:

Assume the following Schema.

After validation against this Schema, the element is represented as:

An element with an anonymous type

Before validation, this element is represented as:

Assume the following Schema.

After validation, this element is represented as:

where fs:anon1 stands for the internal anonymous name generated by the system for the sizes element.

A nillable element with xsi:type set to true

Before validation, this element is represented as:

Assume the following Schema.

After validation against this Schema, the element is represented as:

An element with a union type

Before validation, this element is represented as:

Assume the following Schema:

After validation against this Schema, the element is represented as:

2.4 The [XPath/XQuery] Type System

The [XPath/XQuery] type system is used in the specification of the dynamic and of the static semantics of [XPath/XQuery]. This section introduces formal notations for describing types.

2.4.1 XML Schema and the [XPath/XQuery] Type System

The [XPath/XQuery] type system is based on [Schema Part 1] and [Schema Part 2]. [Schema Part 1] and [Schema Part 2] specify normatively the type information available in [XPath/XQuery]. We define the formal notation that is used in this document to describe and manipulate types in inference rules. Formal types are used for specification purposes only and are not exposed to the [XPath/XQuery] user.

Representation of content models. For the purpose of static typing, the [XPath/XQuery] type system only describes minOccurs, maxOccurs, and minLength, maxLength on list types for the occurrences that correspond to the DTD operators +, *, and ?. Choices are represented using the DTD operator |. All groups are represented using the interleaving operator (&).

Representation of anonymous types. To clarify the semantics, the [XPath/XQuery] type system makes all anonymous types explicit.

Representation of XML Schema simple type facets and identity constraints. For simplicity, XML Schema simple type facets and identity constraints are not formally represented in the [XPath/XQuery] type system. However, an [XPath/XQuery] implementation supporting XML Schema import and validation must take simple type facets and identity constraints into account.

This document describe types in the [XPath/XQuery] types system, as well as the operations and properties over those types which are used to define the [XPath/XQuery] static typing feature. The two most important properties are whether a data instances matches a type, and whether a type is a subtype of another. Those properties are described in [8.3 Judgments for type matching]. This document does not describe all other possible properties over those types.

The mapping from XML Schema into the [XPath/XQuery] type system is given in [D Importing Schemas]. The rest of this section is organized as follows. [2.4.2 Item types] describes item types, [2.4.3 Content models] describes content models, and [2.4.4 Top level definitions] describe top-level type declarations.

2.4.2 Item types

An item type is either an atomic type, an element type, an attribute type, a document node type, a text node type, a comment node type, or a processing instruction type. We distinguish between document nodes, attribute nodes, and nodes that can occur in element content (elements, comments, processing instructions, and text nodes), as we need to refer to element content types later in the formal semantics.

Item Types
[25 (Formal)]   FormalItemType   ::=   AtomicTypeName | NodeType
[28 (Formal)]   AtomicTypeName   ::=   TypeName
[26 (Formal)]   NodeType   ::=   DocumentType
| AttributeType
| ElementContentType
[27 (Formal)]   ElementContentType   ::=   ElementType
| "comment"
| "processing-instruction"
| "text"
[29 (Formal)]   ElementType   ::=   "element" ElementNameOrWildcard OptTypeSpecifier
[4 (Formal)]   ElementNameOrWildcard   ::=   QName | "*"
[5 (Formal)]   AttributeNameOrWildcard   ::=   QName | "*"
[77 (Formal)]   OptTypeSpecifier   ::=   TypeSpecifier?
[30 (Formal)]   TypeSpecifier   ::=   OptNillable TypeReference
[31 (Formal)]   AttributeType   ::=   "attribute" AttributeNameOrWildcard OptTypeReference
[75 (Formal)]   OptNillable   ::=   Nillable?
[32 (Formal)]   Nillable   ::=   "nillable"
[78 (Formal)]   OptTypeReference   ::=   TypeReference?
[36 (Formal)]   TypeReference   ::=   "of" "type" TypeName
[45 (Formal)]   DocumentType   ::=   "document" ("{" Type "}")?

An element or attribute type has a name or wildcard, and an optional type reference. A name alone corresponds to a reference to a global element or attribute declaration. A name with a type reference corresponds to a local element or attribute declaration. "element *" or "attribute *" alone refers to the wildcard types for any element or any attribute. In addition, an element type has an optional nillable flag that indicates whether the element can be nilled or not.

A document type has an optional content type. If no content type is given, then the type is treated as being the wildcard type for documents, i.e., a sequence of text and element nodes. For consistency with element nodes, PIs and comments are not indicated in that wildcard type, but may occur in instances.

Note

Generic node types (e.g., node()) such as used in the SequenceType production, are interpreted in the type system as a union of the corresponding node types (e.g., element,attribute,text,comment and processing-instruction nodes) and therefore do not appear in the grammar. The semantics of sequence types is described in [3.5.4 SequenceType Matching].

Examples

The following is a text node type

  text

The following is a type for all elements

  element * of type xs:anyType

The following is a type for all elements of type string

  element * of type xs:string

The following is a type for a nillable element of type string and with name size

  element size nillable of type xs:string

The following is a reference to a global attribute declaration

  attribute sizes

The following is a type for elements with anonymous type fs:anon1:

  element sizes of type fs:anon1