W3C

XQuery 1.0 and XPath 2.0 Formal Semantics

W3C Working Draft 2 May 2003

This version:
http://www.w3.org/TR/2003/WD-xquery-semantics-20030502/
Latest version:
http://www.w3.org/TR/xquery-semantics/
Previous versions:
http://www.w3.org/TR/2002/WD-query-semantics-20021115/
http://www.w3.org/TR/2002/WD-query-semantics-20020816/
http://www.w3.org/TR/2002/WD-query-semantics-20020326/
http://www.w3.org/TR/2001/WD-query-semantics-20010607/
http://www.w3.org/TR/2001/WD-query-algebra-20010215/
http://www.w3.org/TR/2000/WD-query-algebra-20001204/
Editors:
Denise Draper (XML Query WG), Nimble Technology <ddraper@nimble.com>
Peter Fankhauser (XML Query WG), Infonyte GmbH <fankhaus@infonyte.com>
Mary Fernández (XML Query WG), AT&T Labs - Research <mff@research.att.com>
Ashok Malhotra (XML Query and XSL WGs), Microsoft <ashokma@microsoft.com>
Kristoffer Rose (XSL WG), IBM Research <krisrose@us.ibm.com>
Michael Rys (XML Query WG), Microsoft <mrys@microsoft.com>
Jérôme Siméon (XML Query WG), Bell Labs, Lucent Technologies <simeon@research.bell-labs.com>
Philip Wadler (XML Query WG), Avaya <wadler@avaya.com>

Abstract

This document defines formally the semantics of XQuery 1.0 [XQuery 1.0: A Query Language for XML] and XPath 2.0 [XML Path Language (XPath) 2.0].

Status of this Document

This is a public W3C Working Draft for review by W3C Members and other interested parties. This section describes the status of this document at the time of its publication. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress." A list of current public W3C technical reports can be found at http://www.w3.org/TR/.

This document is a work in progress. It contains a number of open issues, and should not be considered to be fully stable. Vendors who wish to create preview implementations based on this document do so at their own risk. While this document reflects the general consensus of the working groups, there are still controversial areas that may be subject to change.

This version is a major revision from the previous version, and it closes more than 20 issues. Among the most important changes from the previous version of this document are:

The following are identified as high priority issues.

Public comments on this document and its open issues are welcome. Comments should be sent to the W3C XPath/XQuery mailing list, public-qt-comments@w3.org (archived at http://lists.w3.org/Archives/Public/public-qt-comments/).

XQuery 1.0, XPath 2.0, and their formal semantics has been defined jointly by the XML Query Working Group and the XSL Working Group (both part of the XML Activity).

Patent disclosures relevant to this specification may be found on the XML Query Working Group's patent disclosure page at http://www.w3.org/2002/08/xmlquery-IPR-statements and the XSL Working Group's patent disclosure page at http://www.w3.org/Style/XSL/Disclosures.

Table of Contents

1 Introduction
2 Preliminaries
    2.1 Introduction to the Formal Semantics
        2.1.1 Notations from grammar productions
        2.1.2 Notations for judgments
        2.1.3 Notations for inference rules
        2.1.4 Notations for environments
        2.1.5 Putting it together
    2.2 XML Values
        2.2.1 Formal values
        2.2.2 Examples of values
    2.3 The Type System
        2.3.1 XML Schema and the Type System
        2.3.2 Item types
        2.3.3 Content models
        2.3.4 Top level definitions
        2.3.5 Example of a complete Schema
    2.4 Processing model and main judgments
        2.4.1 Processing model
        2.4.2 Normalization judgment
        2.4.3 Static typing judgment
        2.4.4 Dynamic evaluation judgment
    2.5 Relationship with other documents
        2.5.1 Namespaces
        2.5.2 Functions and operators
3 Basics
    3.1 Expression Context
        3.1.1 Static Context
        3.1.2 Evaluation Context
    3.2 Input Functions
    3.3 Expression Processing
    3.4 Types
        3.4.1 Predefined Types
        3.4.2 Type Checking
        3.4.3 SequenceType
            3.4.3.1 SequenceType Matching
        3.4.4 Type Conversions
            3.4.4.1 Atomization
            3.4.4.2 Effective Boolean Value
            3.4.4.3 XPath 1.0 Backward Compatibility
    3.5 Errors Handling
    3.6 Optional Features
        3.6.1 Basic XQuery
        3.6.2 Schema Import Feature
        3.6.3 Static Typing Feature
        3.6.4 Extensions
            3.6.4.1 Pragmas
            3.6.4.2 Must-Understand Extensions
            3.6.4.3 XQuery Flagger
4 Expressions
    4.1 Primary Expressions
        4.1.1 Literals
        4.1.2 Variables
        4.1.3 Parenthesized Expressions
        4.1.4 Function Calls
        4.1.5 XQuery Comments
    4.2 Path Expressions
        4.2.1 Steps
            4.2.1.1 Axes
            4.2.1.2 Node Tests
        4.2.2 Predicates
        4.2.3 Unabbreviated Syntax
        4.2.4 Abbreviated Syntax
    4.3 Sequence Expressions
        4.3.1 Constructing Sequences
        4.3.2 Combining Sequences
    4.4 Arithmetic Expressions
    4.5 Comparison Expressions
        4.5.1 Value Comparisons
        4.5.2 General Comparisons
        4.5.3 Node Comparisons
        4.5.4 Order Comparisons
    4.6 Logical Expressions
    4.7 Constructors
        4.7.1 Direct Element Constructors
            4.7.1.1 Attributes
            4.7.1.2 Namespaces
            4.7.1.3 Content
            4.7.1.4 Whitespace in Element Content
            4.7.1.5 Type of a Constructed Element
        4.7.2 Computed Constructors
            4.7.2.1 Computed Element Constructors
            4.7.2.2 Computed Attribute Constructors
            4.7.2.3 Document Node Constructors
            4.7.2.4 Text Nodes Constructors
        4.7.3 Other Constructors and Comments
    4.8 [For/FLWR] Expressions
        4.8.1 FLWOR expressions
        4.8.2 For expression
        4.8.3 Let Expression
        4.8.4 Order By
    4.9 Unordered Expressions
    4.10 Conditional Expressions
    4.11 Quantified Expressions
    4.12 Expressions on SequenceTypes
        4.12.1 Instance Of
        4.12.2 Typeswitch
        4.12.3 Cast
        4.12.4 Castable
        4.12.5 Constructor Functions
        4.12.6 Treat
    4.13 Validate Expressions
5 Modules and Prologs
    5.1 Version Declaration
    5.2 Namespace Declarations
    5.3 Default Namespace Declarations
    5.4 Schema Imports
    5.5 Module Imports
    5.6 Variable Definitions
    5.7 Validation Declaration
    5.8 Xmlspace Declaration
    5.9 Default Collation
    5.10 Function Definitions
6 Additional Semantics of Functions
    6.1 Formal Semantics Functions
        6.1.1 The fs:distinct-doc-order function
        6.1.2 The fs:item-sequence-to-node-sequence function
        6.1.3 The fs:item-sequence-to-untypedAtomic function
        6.1.4 The fs:convert-simple-operand function
        6.1.5 The fs:convert-operand function
        6.1.6 The arithmetic operator pseudo-functions: fs:minus, fs:plus, fs:times, fs:idiv, fs:div, and fs:mod
        6.1.7 The comparison pseudo-functions: fs:eq, fs:ne, fs:lt, fs:le, fs:gt, and fs:ge
    6.2 Standard functions with specific typing rules
        6.2.1 The fn:error function
        6.2.2 The fn:distinct-nodes and fn:distinct-values functions
        6.2.3 The fn:collection and fn:doc functions
        6.2.4 The op:union, op:intersect, and op:except operators
        6.2.5 The fn:data function
        6.2.6 The fn:ceiling, fn:floor, fn:round, and fn:round-half-to-even functions
        6.2.7 The fn:subsequence, and fn:remove functions
        6.2.8 The fn:min fn:max, fn:avg, and fn:sum functions
7 Auxiliary Judgments
    7.1 Judgments for schema contexts
        7.1.1 Mode is
        7.1.2 Context is
    7.2 Judgments for accessing types
        7.2.1 Derives
        7.2.2 Substitutes
        7.2.3 Element and attribute type lookup
        7.2.4 Static element and attribute type lookup
        7.2.5 Extension
        7.2.6 Type adjustment
        7.2.7 Type expansion
    7.3 Judgments for step expressions and filtering
        7.3.1 Principal Node Kind
        7.3.2 Filters
            7.3.2.1 Type filters
            7.3.2.2 Value filters
        7.3.3 Attribute filtering
    7.4 Judgments for type matching
        7.4.1 Matches
        7.4.2 Subtype
    7.5 Judgments for sequences of item types
    7.6 Judgments for type promotion
    7.7 Judgments for the validate expression
        7.7.1 Builtin attributes
        7.7.2 Mixed content
        7.7.3 Type resolution
        7.7.4 Interleaving
        7.7.5 Erasure
            7.7.5.1 Simply erases
            7.7.5.2 Erases
        7.7.6 Annotate
            7.7.6.1 Simply annotate
            7.7.6.2 Nil-annotate
            7.7.6.3 Annotate
        7.7.7 Validates as
8 Importing Schemas
    8.1 Introduction
        8.1.1 Features
        8.1.2 Organization
        8.1.3 Main mapping rules
        8.1.4 Special attributes
            8.1.4.1 use
            8.1.4.2 minOccurs and maxOccurs
            8.1.4.3 mixed
            8.1.4.4 nillable
            8.1.4.5 substitutionGroup
        8.1.5 Anonymous type names
    8.2 Attribute Declarations
        8.2.1 Global attributes declarations
        8.2.2 Local attribute declarations
    8.3 Element Declarations
        8.3.1 Global element declarations
        8.3.2 Local element declarations
    8.4 Complex Type Definitions
        8.4.1 Global complex type
        8.4.2 Local complex type
        8.4.3 Complex type with simple content
        8.4.4 Complex type with complex content
    8.5 Attribute Uses
    8.6 Attribute Group Definitions
        8.6.1 Attribute group definitions
        8.6.2 Attribute group reference
    8.7 Model Group Definitions
    8.8 Model Groups
        8.8.1 All groups
        8.8.2 Choice groups
        8.8.3 Sequence groups
    8.9 Particles
        8.9.1 Element reference
        8.9.2 Group reference
    8.10 Wildcards
        8.10.1 Attribute wildcards
        8.10.2 Element wildcards
    8.11 Identity-constraint Definitions
    8.12 Notation Declarations
    8.13 Annotation
    8.14 Simple Type Definitions
        8.14.1 Global simple type definition
        8.14.2 Local simple type definition
        8.14.3 Simple type content
    8.15 Schemas as a whole
        8.15.1 Schema
        8.15.2 Include
        8.15.3 Redefine
        8.15.4 Import

Appendices

A Normalized core grammar
    A.1 Core lexical structure
        A.1.1 Syntactic Constructs
    A.2 Core BNF
B Functions and Operators
    B.1 Functions and Operators used in the Formal Semantics
    B.2 Mapping of Overloaded Internal Functions
C References
    C.1 Normative References
    C.2 Non-normative References
    C.3 Background References
D XQuery 1.0 and XPath 2.0 Issues


1 Introduction

This document defines the formal semantics of XQuery 1.0 and XPath 2.0. The present document is part of a set of documents that together define the XQuery 1.0 and XPath 2.0 languages:

The scope and goals for the [XPath/XQuery] language are discussed in the charter of the W3C [XSL/XML Query] Working Group and in the [XPath/XQuery] requirements [XML Query 1.0 Requirements].

This document defines the semantics of [XPath/XQuery] by giving a precise formal meaning to each of the expressions of the [XPath/XQuery] specification in terms of the [XPath/XQuery] data model. This document assumes that the reader is already familiar with the [XPath/XQuery] language.

Two important design aspects of [XPath/XQuery] are that it is functional and that it is typed. These two aspects play an important role in the [XPath/XQuery] Formal Semantics.

[XPath/XQuery] is a functional language. [XPath/XQuery] is built from expressions, rather than statements. Every construct in the language (except for the XQuery query prolog) is an expression and expressions can be composed arbitrarily. The result of one expression can be used as the input to any other expression, as long as the type of the result of the former expression is compatible with the input type of the latter expression with which it is composed. Another aspect of the functional approach is that variables are always passed by value and their value cannot be modified through side-effects.

[XPath/XQuery] is a typed language. Types can be imported from one or more XML Schemas that describe the input documents and the output document, and the [XPath/XQuery] language can then perform operations based on these types. In addition, [XPath/XQuery] supports static type analysis. This means a static analysis phase is defined on [XPath/XQuery] expressions that infers the output type of an expression, based on the type of its inputs. Static typing allows early detection of type errors, and can be used as the basis for certain forms of optimization. The [XPath/XQuery] type system captures most of the features of [XML Schema Part 1], including global and local element and attribute declarations, complex and simple type definitions, named and anonymous types, derivation by restriction, extension, list and union, substitution groups, and wildcard types. It does not model uniqueness constraints and facet constraints on simple types.

This document is organized as follows. [2 Preliminaries] introduces the notations used to define the [XPath/XQuery] formal semantics. These include the formal notations for values in the [XPath/XQuery] data model and for types in XML Schema. The next three sections: [3 Basics], [4 Expressions], and [5 Modules and Prologs] have the same structure as the corresponding sections in the [XQuery 1.0: A Query Language for XML] and [XML Path Language (XPath) 2.0] documents. This allows the reader to quickly find the formal definition of a particular language construct. [3 Basics] defines the semantics for basic [XPath/XQuery] concepts, and [4 Expressions] defines the dynamic and static semantics of each [XPath/XQuery] expression. [5 Modules and Prologs] defines the semantics of the [XPath/XQuery] prolog. [6 Additional Semantics of Functions] defines the static semantics of several functions in [XQuery 1.0 and XPath 2.0 Functions and Operators] and gives the dynamic and static semantics of several supporting functions used in this document. The remaining sections, [7 Auxiliary Judgments] and [8 Importing Schemas], contain material that supports the formal semantics of [XPath/XQuery]. [7 Auxiliary Judgments] defines formal judgments that relate data model values to types, that relate types to types, and that support the formal definition of validation. These judgements are used in the definition of expressions in [4 Expressions]. Lastly, [8 Importing Schemas], specifies how XML Schema documents are imported into the [XPath/XQuery] type system and relates XML Schema types to the [XPath/XQuery] type system.

2 Preliminaries

This section provides the background necessary to understand the [XPath/XQuery] Formal Semantics and introduces the notations that are used.

2.1 Introduction to the Formal Semantics

Why a Formal Semantics? The goal of the formal semantics is to complement the [XPath/XQuery] specification ([XQuery 1.0: A Query Language for XML] and [XML Path Language (XPath) 2.0]), by defining the meaning of [XPath/XQuery] expressions with mathematical rigor.

A rigorous formal semantics clarifies the intended meaning of the English specification, ensures that no corner cases are left out, and provides a reference for implementation.

Why use formal notations? Rigor is achieved by the use of formal notations to represent [XPath/XQuery] objects such as expressions, XML values, and XML Schema types, and by the systematic definition of the relationships between those objects to reflect the meaning of the language. In particular, the dynamic semantics relates [XPath/XQuery] expressions to the XML value to which they evaluate, and the static semantics relates [XPath/XQuery] expressions to the XML Schema type that is infered for that expression.

The Formal Semantics uses several kinds of formal notations to define the relationships between [XPath/XQuery] expressions, XML values, and XML Schema types. This section contains a small tutorial to introduce the notations for judgments, inference rules, and mapping rules as well as the notation for environments, which implement the dynamic and static contexts. The reader already familiar with these notations can skip this section and continue with [2.2 XML Values].

2.1.1 Notations from grammar productions

Grammar productions are used to describe "objects" (values, types, [XPath/XQuery] expressions, etc.) manipulated by the Formal Semantics. The Formal Semantics makes use of several kinds of grammar productions.

XQuery grammar productions describe the XQuery language and expressions. XQuery productions are identified by a number, which corresponds to the number in the [XQuery 1.0: A Query Language for XML] document, and are annotated with "(XQuery)". For instance, the following production describes FLWR expressions in XQuery.

[41 (XQuery)]    FLWORExpr    ::=    (ForClause |  LetClause)+ WhereClause? OrderByClause? "return" ExprSingle

For the purpose of this document, the differences between the XQuery 1.0 and the XPath 2.0 grammars are mostly irrelevant. By default, this document uses XQuery 1.0 grammar productions. Whenever the grammar for XPath 2.0 differs from the one for XQuery 1.0, the corresponding XPath 2.0 productions are also given. XPath productions are identified by a number, which corresponds to the number in [XML Path Language (XPath) 2.0], and are annotated with "(XPath)". For instance, the following production describes for expressions in XPath.

[25 (XPath)]    ForExpr    ::=    SimpleForClause "return" ExprSingle

XQuery Core grammar productions describe the XQuery Core. The complete XQuery Core grammar is given in [A Normalized core grammar]. XQuery Core productions are identified by a number, which corresponds to the number in [A Normalized core grammar], and are annotated by "(Core)". For instance, the following production describes the simpler form of "for" expressions present in the XQuery Core.

[35 (Core)]    ForExpr    ::=    SimpleForClause "return" ExprSingle

The Formal Semantics sometimes needs to manipulate "objects" (values, types, expressions, etc.) for which there is no existing grammar production in the [XQuery 1.0: A Query Language for XML] document. In these cases, specific grammar productions are introduced. Notably, additional productions are used to describe values in the [XQuery 1.0 and XPath 2.0 Data Model], and to describe the [XPath/XQuery] type system. Formal Semantics productions are identified by a number, and are annotated by "(Formal)". For instance, the following production describes global type definitions in the [XPath/XQuery] type system.

[43 (Formal)]    Definition    ::=    (<"define" "element"> ElementName Substitution? Nillable? TypeReference)
|  (<"define" "attribute"> AttributeName TypeReference)
|  (<"define" "type"> TypeName TypeDerivation)

Note that grammar productions that are specific to the Formal Semantics (i.e., with the "(Formal)" annotation) are not part of [XPath/XQuery]. They are not accessible to the user and are only used in the course of defining the language's semantics.

2.1.2 Notations for judgments

The basic building block of the formal specification is called a judgment. A judgment expresses whether a property holds or not.

For example:

Notation

The judgment

Painting is beautiful

holds if the object Painting is beautiful.

Notation

Here are three judgments that are used extensively in this document.

The judgment

Expr => Value

holds if the expression Expr yields (or evaluates to) the value Value.

The judgment

Expr : Type

holds when the expression Expr has type Type.

The judgment

Expr raises Error

holds when the expression Expr raises the error Error.

A judgment can contain symbols and patterns.

Symbols are purely syntactic and are used to write the judgment itself. In general, symbols in a judgment are chosen to reflect its meaning. For example, 'is beautiful', '=>' and ':' are symbols, the second and third of which should be read "yields", and "has type" respectively.

Patterns are written with italicized words. The name of a pattern is significant: each pattern name corresponds to an "object" (a value, a type, an expression, etc.) that can be substituted legally for the pattern. By convention, all patterns in the Formal Semantics correspond to grammar non-terminals, and are used to represent entities that can be constructed through application of the corresponding grammar production. For example, Expr represents any [XPath/XQuery] expression, and Value represents any value in the [XPath/XQuery] data model.

When applying the judgment, each pattern must be instantiated to an appropriate sort of "object" (value, type, expression, etc). For example, '3 => 3' and '$x+0 => 3' are both instances of the judgment 'Expr => Value'. Note that in the first judgment, '3' corresponds to both the expression '3' (on the left-hand side of the => symbol) and to the the value '3' (on the right-hand side of the => symbol).

Patterns may appear with subscripts (e.g. Expr1, Expr2) to distinguish different instances of the same sort of pattern. Each distinct pattern must be instantiated to a single "object" (value, type, expression, etc.). If the same pattern occurs twice in a judgment description then it should be instantiated with the same "object". For example, '3 => 3' is an instance of the judgment 'Expr1 => Expr1' but '$x+0 => 3' is not since the two expressions '$x+0' and '3' cannot be both instance of the pattern Expr1. The judgment'$x+0 => 3' is an instance of the judgment 'Expr1 => Expr2'.

In a few cases, patterns may have a name which is not exactly the name of a grammar production but is based on it. For instance, a BaseTypeName is a pattern which stands for a type name, as would TypeName, or TypeName2. This usage is limited, and only occurs to improve the readability of some of the inference rules.

2.1.3 Notations for inference rules

Inference rules are used to specify whether a judgment holds or not. Inference rules express the logical relation between judgments and describe how complex judgments can be concluded from simpler premise judgments.

A logical inference rule is written as a collection of premises and a conclusion, respectively written above and below a dividing line:

premise1 ... premisen

conclusion

All premises and the conclusion are judgments. The interpretation of an inference rule is: if all the premise judgments above the line hold, then the conclusion judgment below the line must also hold.

Here is a simple example of inference rule, which uses the example judgment 'Expr => Value' from above:

$x => 0      3 => 3

$x + 3 => 3

This inference rule expresses the following property of the judgment 'Expr => Value': if the variable expression '$x' yields the value '0', and the literal expression '3' yields the value '3', then the expression '$x + 3' yields the value '3'.

It is also possible for an inference rule to have no premises above the line to have no judgments at all; this simply means that the expression below the line always holds:


3 => 3

This inference rule expresses the following property of the judgment 'Expr => Value': evaluating the literal expression '3' always yields the value '3'.

The two above rules are expressed in terms of specific variables and values, but usually rules are more abstract. That is, the judgments they relate contain patterns. For example, here is a rule that says that for any variable Variable that yields the integer value Integer, adding '0' yields the same integer value:

Variable => Integer

Variable + 0 => Integer

As in a judgment, each pattern in a particular inference rule must be instantiated to the same "object" within the entire rule. This means that one can talk about "the value of Variable" instead of the more precise "what Variable is instantiated to in (this particular instantiation of) the inference rule".

Note

In effect, inference rules are just a notation that describes a bottom-up algorithm, for instance an evaluation algorithm where the result of an expression depends on the result for its sub-expressions.

2.1.4 Notations for environments

Logical inference rules use environments to record information computed during static type analysis or dynamic evaluation so that this information can be used by other logical inference rules. For example, the type signature of a user-defined function in a [expression/query] prolog can be recorded in an environment and used by subsequent rules. Similarly, the value assigned to a variable within a "let" expression can be captured in an environment and used for further evaluations.

An environment is a dictionary that maps a symbol (e.g., a function name or a variable name) to an "object" (e.g., a function body, a type, a value). One can either access existing information from an environment, or update the environment.

If "env" is an environment, then "env(symbol)" denotes the "object" to which symbol is mapped. The notation is intentionally akin to function application as an environment can be seen as a function from the argument symbol to the "object" to which the symbol is mapped.

This document uses environment groups that group related environments. If "env" is an environment group with the member "mem", then that environment is denoted "env.mem" and the value that it maps symbol to is denoted "env.mem(symbol)".

Updating is only defined on environment groups:

  • "env + mem(symbol => object) " denotes the new environment group that is identical to env except that the mem environment has been updated to map symbol to object. The notation symbol => object indicates that symbol is mapped to object in the new environment.

  • If the "object" is a type then the following notation relates a symbol to a type: "env + mem(symbol : object) ".

  • The following shorthand is also allowed: "env + mem( symbol1 => object1 ; ... ; symboln => objectn ) " in which each symbol is mapped to a corresponding object in the new environment.

    This notation is equivalent to nested updates, as in " (env + mem( symbol1 => object1) + ... ) + mem(symboln => objectn)".

Note that updating the environment overrides any previous binding that might exist for the same name. Updating the environment is used to capture the scope of a symbol (e.g., a variable, a namespace prefix, etc.) Also, note that there are no operations to remove entries from environments: this is never necessary because updating an the environment group effectively creates a new extended copy of the original environment group, and the original environme group remains accessible along with the updated copy.

Environments are typically used as part of a judgment, to capture some of the context in which the judgment is computed. Indeed, most judgments are computed assuming that some environment is given. This assumption is denoted by prefixing the judgment with "env |-". The "|-" symbol is called a "turnstile" and is used in almost all inference rules.

For instance, the judgment

dynEnv |- Expr => Value

is read as: Assuming the dynamic environment dynEnv, the expression Expr yields the value Value.

The two main environments used in the Formal Semantics are: a dynamic environment (dynEnv), which captures the [XPath/XQuery]'s dynamic context, and a static environment (statEnv), which captures the [XPath/XQuery]'s static context. Both are defined in [3.1 Expression Context].

2.1.5 Putting it together

Putting the above notations together, here is an example of an inference rule that occurs later in this document:

statEnv |- Expr1 : Type1      statEnv |- Expr2 : Type2

statEnv |- Expr1 , Expr2 : Type1, Type2

This rule is read as follows: if two expressions Expr1 and Expr2 are known to have the static types types Type1 and Type2 (the two premises above the line), then it is the case that the expression below the line "Expr1 , Expr2" must have the static type "Type1, Type2", which is the sequence of types Type1 and Type2.

The above inference rule, does not modify the (static) environment. The following rule defines the static semantics of a "let" expression. The binding of the new variable is captured by an update to the varType component of the original static environment.

statEnv |- Expr1 : Type1      statEnv + varType(QName : Type1) |- Expr2 : Type2

statEnv |- let $QName := Expr1 return Expr2 : Type2

This rule is as follows: First, the type Type1 for the "let" input expression Expr1 is computed. Second the "let" variable is added into the varType component of the static environment group statEnv, with type Type1. Finally, the type Type2 of Expr2 is computed in that new environment.

Ed. Note: Jonathan suggests that we should explain 'chain' inference rules. I.e., how several inference rules are applied recursively.

2.2 XML Values

[XPath/XQuery] manipulates XML values as defined in the [XQuery 1.0 and XPath 2.0 Data Model]. XML values are composed of nodes, atomic values and sequences. This section introduces formal notations for describing [XPath/XQuery] values from [XQuery 1.0 and XPath 2.0 Data Model]. These notations are used to describe and manipulate values in inference rules, but are not exposed to the [XPath/XQuery] user.

2.2.1 Formal values

A value is a sequence of zero or more items. An item is either an atomic value or a node.

An atomic value is a value in the value space of an atomic type and is labeled with the name of that atomic type. An XML Schema atomic type [XML Schema Part 2] may be primitive or derived, or xdt:untypedAtomic.

A node is either an element, an attribute, a document, a text, a comment, or a processing-instruction node. Elements have a type annotation and contain a value. Attributes have a type annotation and contain a simple value, which is a sequence of atomic values. Text nodes always contain one string value of type xdt:untypedAtomic, therefore the corresponding type annotation is omitted.

A type annotation can be either the QName of a declared type or an anonymous type. An anonymous type corresponds to an XML Schema type for which the schema writer did not provide a name. Anonymous type names are not visible to the user, but are generated during schema validation and used to annotate nodes in the data model. By convention, anonymous type names are written in the Formal Semantics as: [Anon0], [Anon1], etc.

Untyped elements (e.g., from well-formed documents) are annotated with xs:anyType, untyped attributes are annotated with xs:anySimpleType, and untyped atomic values (i.e., text content or attribute content in well-formed documents) are annotated with xdt:untypedAtomic.

Element have an optional "nilled" marker. This marker can only be present if the element has been validated against an element type in the schema which is "nillable", and they have no content and an attribute xsi:nil set to "true".

[7 (Formal)]    Value    ::=    Item
|  (Value "," Value)
|  ("(" ")")
[16 (Formal)]    Item    ::=    NodeValue
|  AtomicValue
[17 (Formal)]    AtomicValue    ::=    AtomicValueContent TypeAnnotation
[18 (Formal)]    AtomicValueContent    ::=    String
|  Boolean
|  Decimal
|  Float
|  Double
|  Duration
|  DateTime
|  Time
|  Date
|  GYearMonth
|  GYear
|  GMonthDay
|  GDay
|  GMonth
|  HexBinary
|  Base64Binary
|  AnyURI
|  QName
|  NOTATION
[19 (Formal)]    TypeAnnotation    ::=    <"of" "type"> TypeName
[9 (Formal)]    ElementValue    ::=    "element" ElementName "nilled"? TypeAnnotation "{" Value "}"
[10 (Formal)]    AttributeValue    ::=    "attribute" AttributeName TypeAnnotation "{" SimpleValue "}"
[8 (Formal)]    SimpleValue    ::=    AtomicValue
|  (SimpleValue "," SimpleValue)
|  ("(" ")")
[11 (Formal)]    DocumentValue    ::=    "document" "{" Value "}"
[13 (Formal)]    CommentValue    ::=    "comment" "{" String "}"
[14 (Formal)]    ProcessingInstructionValue    ::=    "processing-instruction" QName "{" String "}"
[12 (Formal)]    TextValue    ::=    "text" "{" String "}"
[15 (Formal)]    NodeValue    ::=    ElementValue
|  AttributeValue
|  DocumentValue
|  TextValue
|  CommentValue
|  ProcessingInstructionValue
[20 (Formal)]    ElementName    ::=    QName
[23 (Formal)]    AttributeName    ::=    QName
[27 (Formal)]    TypeName    ::=    QName |  AnonymousTypeName
[26 (Formal)]    AnonymousTypeName    ::=    [Anon1] |  [Anon2] |  ...

Notation

In the above grammar, "String" indicates the value space of xs:string, "Decimal" indicates the value space of xs:decimal, etc.

Note that the same rule about constructing sequences apply to the values described by that grammar. Notably sequences cannot be nested. For example, the sequence (10, (1, 2), (), (3, 4)) is equivalent to the sequence (10, 1, 2, 3, 4).

Ed. Note: Issue: The formal semantics does not represent namespace nodes (See Issue 486. (FS-Issue-0143)).

2.2.2 Examples of values

A well-formed document

  <fact>The cat weighs <weight units="lbs">12</weight> pounds.</fact>

In the absence of a Schema, this document is represented as

  element fact of type xs:anyType {
    text { "The cat weighs " },
    element weight of type xs:anyType {
      attribute units of type xs:anySimpleType {
        "lbs" of type xdt:untypedAtomic
      }
      text { "12" }
    },
    text { " pounds." }
  }

A document before and after validation.

  <weight xsi:type="xs:integer">42</weight>

The formal model for values can represent values before and after validation. Before validation, this element is represented as:

  element weight of type xs:anyType {
    attribute xsi:type of type xs:anySimpleType {
      "xs:integer" of type xdt:untypedAtomic
    },
    text { "42" }
  }

After validation, this element is represented as:

  element weight of type xs:integer {
    attribute xsi:type of type xs:QName {
      "xs:integer" of type xs:QName
    },
    42 of type xs:integer
  }

An element with a list type

  <sizes>1 2 3</sizes>

Before validation, this element is represented as:

  element sizes of type xs:anyType {
    text { "1 2 3" }
  }

Assume the following Schema.

  <xs:element name="sizes" type="sizesType"/>
  <xs:simpleType name="sizesType">
    <xs:list itemType="sizeType"/>
  </xs:simpleType>
  <xs:simpleType name="sizeType">
    <xs:restriction base="xs:integer"/>
  </xs:simpleType>

After validation against this Schema, the element is represented as:

  element sizes of type sizesType {
    1 of type sizeType,
    2 of type sizeType,
    3 of type sizeType
  }

An element with an anonymous type

  <sizes>1 2 3</sizes>

Before validation, this element is represented as:

  element sizes of type xs:anyType {
    text { "1 2 3" }
  }

Assume the following Schema.

  <xs:element name="sizes">
    <xs:simpleType>
      <xs:list itemType="xs:integer"/>
    </xs:simpleType>
  </xs:element>

After validation, this element is represented as:

  element sizes of type [Anon1] {
    1 of type xs:integer,
    2 of type xs:integer,
    3 of type xs:integer
  }

where [Anon1] stands for the internal anonymous name generated by the system for the sizes element.

An element with a nillable

  <sizes xsi:nil="true"/>

Before validation, this element is represented as:

  element sizes of type xs:anyType {
    attribute xsi:nil of type xs:anySimpleType { "true" }
  }

Assume the following Schema.

  <xs:element name="sizes" type="sizesType" nillable="true"/>

After validation against this Schema, the element is represented as:

  element sizes nilled of type sizesType {
    attribute xsi:nil of type xs:anySimpleType { "true" }
  }

An element with a union type

  <sizes>1 two 3 four</sizes>

Before validation, this element is represented as:

  element sizes of type xs:anyType {
    text { "1 two 3 four" }
  }

Assume the following Schema:

  <xs:element name="sizes" type="sizesType"/>
  <xs:simpleType name="sizesType">
    <xs:list itemType="sizeType"/>
  </xs:simpleType>
  <xs:simpleType name="sizeType">
    <xs:union memberType="xs:integer xs:string"/>
  </xs:simpleType>

After validation against this Schema, the element is represented as:

  element sizes of type sizesType {
    1 of type xs:integer,
    "two" of type xs:string,
    3 of type xs:integer,
    "four" of type xs:string
  }

2.3 The [XPath/XQuery] Type System

The [XPath/XQuery] type system is used in the specification of the dynamic and of the static semantics of [XPath/XQuery]. This section introduces formal notations for describing types.

2.3.1 XML Schema and the [XPath/XQuery] Type System

The [XPath/XQuery] type system is based on XML Schema. Formalizing the treatment of types in [XPath/XQuery], however, requires some adjustments.

Use of formal notations for types. The Formal Semantics uses formal notations for types instead of XML Schema syntax. Those notations are used extensively to describe and manipulate types in the inference rules. The formal notations for types introduced here are not exposed to the [XPath/XQuery] user.

Representation of content models. For the purpose of static typing, the [XPath/XQuery] type system only describes minOccurs, maxOccurs combinations which corresponds to the standard DTD constructs +, *, and ?. Choices are represented using the standard DTD construct |. All groups are represented using the & notation.

Representation of anonymous types. To clarify the semantics, the [XPath/XQuery] type system makes all anonymous types explicit.

Representation of XML Schema simple type facets and identity constraints. For simplicity, XML Schema simple type facets as well as indentity constraints are not formally represented in the [XPath/XQuery] type system. Still, [XPath/XQuery] implementation supporting XML Schema import and validation must follow the XML Schema specification and take simple type facets and indentity constraints into account.

The complete mapping from XML Schema into the [XPath/XQuery] type system is given in [8 Importing Schemas]. The rest of this section is organized as follows. [2.3.2 Item types] describes types items, [2.3.3 Content models] describes content models, and [2.3.4 Top level definitions] describe top-level type declarations.

2.3.2 Item types

An item type is either an atomic type, an element type, an attribute type, a document node type, a text node type, a comment node type, or a processing instruction type.

[30 (Formal)]    ItemType    ::=    AtomicTypeName |  NodeType
[32 (Formal)]    AtomicTypeName    ::=    QName
[31 (Formal)]    NodeType    ::=    ElementType
|  AttributeType
|  DocumentType
|  "comment"
|  "processing-instruction"
|  "text"
[33 (Formal)]    ElementType    ::=    "element" ElementName? TypeSpecifier?
[34 (Formal)]    TypeSpecifier    ::=    Nillable? TypeReference
[35 (Formal)]    AttributeType    ::=    "attribute" AttributeName? TypeReference?
[36 (Formal)]    Nillable    ::=    "nillable"
[40 (Formal)]    TypeReference    ::=    <"of" "type"> TypeName
[51 (Formal)]    DocumentType    ::=    "document" ("{" Type? "}")?

An element or attribute type has an optional name and an optional type reference. A name alone corresponds to a reference to a global element or attribute declaration. A name with a type reference corresponds to a local element or attribute declaration. The word "element" or "attribute" alone refers to the wildcard types any element or any attribute. In addition, an element type has an optional nillable flag which indicates whether the element can be nilled or not.

A document type has an optional content type. If no content type is given, then it refers to the wildcard type describing any document.

Note

Note that generic node types (e.g., node()), are interpreted in the type system as union types (e.g., element | attribute | text | comment | processing-instruction) and therefore do not appear here. The semantics of sequence types is described in [3.4.3.1 SequenceType Matching].

Examples

A text node type

  text

matches any text node, such as:

  text { "Text is beautiful" }

A wildcard element

  element

matches any element.

A wildcard element of type string

  element of type xs:integer 

matches any element of type string, such as:

  element name of type xs:string { "John Doe" }

A nillable element of type string

  element size nillable of type xs:integer 

matches any element with name size of type xs:integer, such as:

  element size of type xs:integer {
    2 of type xs:integer
  }

or it matches any element with name size which has no content and the xsi:nil attribute set to true, such as:

  element size of type xs:integer {
    attribute xsi:nil of type xs:anySimpleType { "true" }
  }

A reference to a globally declared attribute

  attribute sizes

refers to the global declaration for the attribute sizes.

An element with an anonymous type

  element sizes of type [Anon1]

matches any element with name sizes and the anonymous type [Anon1], such as:

  element sizes of type [Anon1] {
    1 of type xs:integer,
    2 of type xs:integer,
    3 of type xs:integer
  }

2.3.3 Content models

Following XML Schema, types in [XPath/XQuery] are composed from item types by optional, one or more, zero or more, all group, sequence, choice, empty sequence, or empty choice (written none).

The type empty matches the empty sequence. The type none matches no values. It is called the empty choice because it is the identity for choice, that is (Type | none) = Type)). The type none is the static type for [6.2.1 The fn:error function].

[28 (Formal)]    Type    ::=    ItemType
|  (Type Occurrence)
|  (Type "&" Type)
|  (Type "," Type)
|  (Type "|" Type)
|  "empty"
|  "none"
[29 (Formal)]    Occurrence    ::=    "*" |  "+" |  "?"

The [XPath/XQuery] type system includes three binary operators on types: ",", "|" and "&", corresponding respectively to sequence, choice and all groups in Schema. The [XPath/XQuery] type system includes three unary operators on types: "*", "+", and "?", corresponding respectively to zero or more instances of the type, one or more instances of the type, or an optional instance of the type.

The "&" operator builds the "interleaved product" of two types. The type Type1 & Type2 matches any sequence that is an interleaving of a sequence that matches Type1 and a sequence that matches Type2.

The interleaved product is used to represent all groups in XML Schema. All group in XML Schema are restricted to apply only on global or local element declarations with lower bound 0 or 1, and upper bound 1.

Examples

A sequence of elements

The "," operator builds the "sequence" of two types. For example,

  element title of type xs:string, element year of type xs:integer

is a sequence of an element title of type string followed by an element year of type integer.

The union of two element types

The "|" operator builds the "union" of two types. For example,

  element editor of type xs:string | element bib:author

means either an element editor of type string, or a reference to the gobal element bib:author.

An all group of two elements

The "&" operator builds the "interleaved product" of two types. For example,

  (element a & element b) =
    element a, element b
  | element b, element a

which specifies that the a and b elements can occur in any order.

An empty type

The following type matches the empty sequence.

  empty

A sequence of zero or more elements

The following type matches zero or more elements each of which can be a surgeon or a plumber.

  (element surgeon | element plumber)*

2.3.4 Top level definitions

Top level definitions correspond to global element declarations, global attribute declarations and type definitions in XML Schema.