Copyright © 2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document defines formally the semantics of XQuery 1.0 [XQuery 1.0: A Query Language for XML] XPath 2.0 [XML Path Language (XPath) 2.0].
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is a public W3C Working Draft for review by W3C Members and other interested parties. This section describes the status of this document at the time of its publication. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress." A list of current public W3C technical reports can be found at http://www.w3.org/TR/.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress. It contains a number of open issues, and should not be considered to be fully stable. Vendors who wish to create preview implementations based on this document do so at their own risk. While this document reflects the general consensus of the working groups, there are still controversial areas that may be subject to change.
This version reflects the most recent semantics of [XPath/XQuery]. Among the most important changes from the previous version of this document are:
Implementation of the semantics of namespaces in [XPath/XQuery]. This closes Issue 443 (FS-Issue-0100: Namespace resolution) and Issue 508 (FS-Issue-0165: Namespaces in element constructors).
A simplified static semantics for path expressions. This closes Issue 475 (FS-Issue-0132: Typing for descendant), Issue 527 (Static typing of XPath index expressions), and Issue 560 (Exactness of Type Inference).
Public feedback is requested on the remaining open XPath issues: Issue 481 (Semantics of Schema Context), Issue 496 (Support for lax and strict wildcards), Issue 555 (Formal Semantics of Module Import), Issue 556 (Formal Semantics of Variable Definitions), Issue 557 (Formal semantics of Validation Declaration), and Issue 559 (New Sequence Type needs to be fully implemented in Formal Semantics).
This Working Draft references the Last Call Working Drafts of [XQuery 1.0 and XPath 2.0 Functions and Operators] and [XQuery 1.0 and XPath 2.0 Data Model]. Since these Last Call Working Drafts are not being re-published along with this Working Draft, it is possible that some differences may exist between this Working Draft and the Last Call Working Drafts. The public is encouraged to provide feedback on any differences that they find. The Working Groups are planning to publish a set of synchronized documents as early as possible.
Public comments on this document and its open issues are welcome. Comments should be sent to the W3C XPath/XQuery mailing list, public-qt-comments@w3.org (archived at http://lists.w3.org/Archives/Public/public-qt-comments/).
XQuery 1.0, XPath 2.0, and their formal semantics has been defined jointly by the XML Query Working Group and the XSL Working Group (both part of the XML Activity).
Patent disclosures relevant to this specification may be found on the XML Query Working Group's patent disclosure page at http://www.w3.org/2002/08/xmlquery-IPR-statements and the XSL Working Group's patent disclosure page at http://www.w3.org/Style/XSL/Disclosures.
1 Introduction
2 Preliminaries
2.1 Introduction to the Formal Semantics
2.1.1 Notations from grammar productions
2.1.2 Notations for judgments
2.1.3 Notations for inference rules
2.1.4 Notations for environments
2.1.5 Putting it together
2.2 XML Values
2.2.1 Formal values
2.2.2 Examples of values
2.3 The [XPath/XQuery] Type System
2.3.1 XML Schema and the [XPath/XQuery] Type System
2.3.2 Item types
2.3.3 Content models
2.3.4 Top level definitions
2.3.5 Example of a complete Schema
2.4 Processing model and main judgments
2.4.1 Processing model
2.4.2 Normalization judgment
2.4.3 Static typing judgment
2.4.4 Dynamic evaluation judgment
2.5 Relationship with other documents
2.5.1 Namespaces
2.5.2 Functions and operators
3 Basics
3.1 Expression Context
3.1.1 Static Context
3.1.1.1 Predefined Types
3.1.2 Dynamic Context
3.2 Processing Model
3.3 Important Concepts
3.3.1 Document Order
3.3.2 Typed Value and String Value
3.3.3 Input Sources
3.4 Types
3.4.1 SequenceType
3.4.1.1 SequenceType Matching
3.4.2 Type Conversions
3.4.2.1 Atomization
3.4.2.2 Effective Boolean Value
3.4.2.3 XPath 1.0 Backward Compatibility
3.5 Errors Handling
3.6 Optional Features
3.6.1 Basic XQuery
3.6.2 Schema Import Feature
3.6.3 Static Typing Feature
3.6.4 Extensions
3.6.4.1 Pragmas
3.6.4.2 Must-Understand Extensions
3.6.4.3 XQuery Flagger
4 Expressions
4.1 Primary Expressions
4.1.1 Literals
4.1.2 Variables
4.1.3 Parenthesized Expressions
4.1.4 Context Item Expression
4.1.5 Function Calls
4.1.6 [XPath/XQuery] Comments
4.2 Path Expressions
4.2.1 Steps
4.2.1.1 Axes
4.2.1.2 Node Tests
4.2.2 Predicates
4.2.3 Unabbreviated Syntax
4.2.4 Abbreviated Syntax
4.3 Sequence Expressions
4.3.1 Constructing Sequences
4.3.2 Combining Sequences
4.4 Arithmetic Expressions
4.5 Comparison Expressions
4.5.1 Value Comparisons
4.5.2 General Comparisons
4.5.3 Node Comparisons
4.5.4 Order Comparisons
4.6 Logical Expressions
4.7 Constructors
4.7.1 Direct Element Constructors
4.7.1.1 Attributes
4.7.1.2 Namespaces
4.7.1.3 Content
4.7.1.4 Whitespace in Element Content
4.7.1.5 Type of a Constructed Element
4.7.2 Other Direct Constructors
4.7.3 Computed Constructors
4.7.3.1 Computed Element Constructors
4.7.3.2 Computed Attribute Constructors
4.7.3.3 Document Node Constructors
4.7.3.4 Text Nodes Constructors
4.7.3.5 Computed Processing Instruction Constructors
4.7.3.6 Computed Comment Constructors
4.7.3.7 Computed Namespace Constructors
4.8 [For/FLWR] Expressions
4.8.1 FLWOR expressions
4.8.2 For expression
4.8.3 Let Expression
4.8.4 Order By and Return Clauses
4.9 Unordered Expressions
4.10 Conditional Expressions
4.11 Quantified Expressions
4.12 Expressions on SequenceTypes
4.12.1 Instance Of
4.12.2 Typeswitch
4.12.3 Cast
4.12.4 Castable
4.12.5 Constructor Functions
4.12.6 Treat
4.13 Validate Expressions
5 Modules and Prologs
5.1 Version Declaration
5.2 Namespace Declarations
5.3 Default Namespace Declarations
5.4 Schema Imports
5.5 Module Imports
5.6 Variable Definitions
5.7 Validation Declaration
5.8 Xmlspace Declaration
5.9 Default Collation
5.10 Function Definitions
6 Additional Semantics of Functions
6.1 Formal Semantics Functions
6.1.1 The fs:distinct-doc-order function
6.1.2 The fs:item-sequence-to-node-sequence function
6.1.3 The fs:item-sequence-to-untypedAtomic function
6.1.4 The fs:convert-simple-operand function
6.1.5 The fs:convert-operand function
6.1.6
The arithmetic operator pseudo-functions:
fs:minus,
fs:plus,
fs:times,
fs:idiv,
fs:div, and
fs:mod
6.1.7
The comparison pseudo-functions:
fs:eq,
fs:ne,
fs:lt,
fs:le,
fs:gt, and
fs:ge
6.2 Standard functions with specific typing rules
6.2.1 The fn:error function
6.2.2 The fn:distinct-nodes and
fn:distinct-values functions
6.2.3 The fn:collection and fn:doc functions
6.2.4 The op:union, op:intersect, and
op:except operators
6.2.5 The fn:data function
6.2.6 The fn:ceiling, fn:floor,
fn:round, and fn:round-half-to-even functions
6.2.7 The fn:subsequence, and fn:remove functions
6.2.8 The fn:min
fn:max, fn:avg, and fn:sum functions
7 Auxiliary Judgments
7.1 Judgments for schema contexts
7.1.1 Mode is
7.1.2 Context is
7.2 Judgments for accessing types
7.2.1 Derives
7.2.2 Substitutes
7.2.3 Element and attribute type lookup
7.2.4 Static element and attribute type lookup
7.2.5 Extension
7.2.6 Type adjustment
7.2.7 Type expansion
7.3 Judgments for step expressions and filtering
7.3.1 Principal Node Kind
7.3.2 Auxiliary judgments for axis
7.3.2.1 Static semantics of axis
7.3.2.2 Dynamic semantics of axis
7.3.3 Auxiliary judgments for node tests
7.3.3.1 Static typing of node tests
7.3.3.2 Dynamic semantics of node tests
7.3.4 Attribute filtering
7.4 Judgments for type matching
7.4.1 Matches
7.4.2 Subtype
7.5 Judgments for sequences of item types
7.6 Judgments for type promotion
7.7 Judgments for the validate expression
7.7.1 Builtin attributes
7.7.2 Mixed content
7.7.3 Type resolution
7.7.4 Interleaving
7.7.5 Erasure
7.7.5.1 Simply erases
7.7.5.2 Erases
7.7.6 Annotate
7.7.6.1 Simply annotate
7.7.6.2 Nil-annotate
7.7.6.3 Annotate
7.7.7 Validates as
8 Importing Schemas
8.1 Introduction
8.1.1 Features
8.1.2 Organization
8.1.3 Main mapping rules
8.1.4 Special attributes
8.1.4.1 use
8.1.4.2 minOccurs and maxOccurs
8.1.4.3 mixed
8.1.4.4 nillable
8.1.4.5 substitutionGroup
8.1.5 Anonymous type names
8.2 Attribute Declarations
8.2.1 Global attributes declarations
8.2.2 Local attribute declarations
8.3 Element Declarations
8.3.1 Global element declarations
8.3.2 Local element declarations
8.4 Complex Type Definitions
8.4.1 Global complex type
8.4.2 Local complex type
8.4.3 Complex type with simple content
8.4.4 Complex type with complex content
8.5 Attribute Uses
8.6 Attribute Group Definitions
8.6.1 Attribute group definitions
8.6.2 Attribute group reference
8.7 Model Group Definitions
8.8 Model Groups
8.8.1 All groups
8.8.2 Choice groups
8.8.3 Sequence groups
8.9 Particles
8.9.1 Element reference
8.9.2 Group reference
8.10 Wildcards
8.10.1 Attribute wildcards
8.10.2 Element wildcards
8.11 Identity-constraint Definitions
8.12 Notation Declarations
8.13 Annotation
8.14 Simple Type Definitions
8.14.1 Global simple type definition
8.14.2 Local simple type definition
8.14.3 Simple type content
8.15 Schemas as a whole
8.15.1 Schema
8.15.2 Include
8.15.3 Redefine
8.15.4 Import
A Normalized core grammar
A.1 Core BNF
B Functions and Operators
B.1 Functions and Operators used in the Formal Semantics
B.2 Mapping of Overloaded Internal Functions
C References
C.1 Normative References
C.2 Non-normative References
C.3 Background References
D XQuery 1.0 and XPath 2.0 Issues
This document defines the formal semantics of XQuery 1.0 and XPath 2.0. The present document is part of a set of documents that together define the XQuery 1.0 and XPath 2.0 languages:
[XQuery 1.0: A Query Language for XML] introduces the XQuery 1.0 language, defines its capabilities from a user-centric view, and defines the language syntax.
[XML Path Language (XPath) 2.0] introduces the XPath 2.0 language, defines its capabilities from a user-centric view, and defines the language syntax.
[XQuery 1.0 and XPath 2.0 Functions and Operators] lists the functions and operators defined for the [XPath/XQuery] language and specifies to which arguments they can be applied and what the result should be.
[XQuery 1.0 and XPath 2.0 Data Model] formally specifies the data model used by [XPath/XQuery] to represent the content of XML documents. The [XPath/XQuery] language is formally defined by operations on this data model.
[XQuery 1.0 and XPath 2.0 Data Model Serialization] specifies how [XPath/XQuery] data model values are serialized back into XML.
The scope and goals for the [XPath/XQuery] language are discussed in the charter of the W3C [XSL/XML Query] Working Group and in the [XPath/XQuery] requirements [XML Query 1.0 Requirements].
This document defines the semantics of [XPath/XQuery] by giving a precise formal meaning to each of the expressions of the [XPath/XQuery] specification in terms of the [XPath/XQuery] data model. This document assumes that the reader is already familiar with the [XPath/XQuery] language.
Two important design aspects of [XPath/XQuery] are that it is functional and that it is typed. These two aspects play an important role in the [XPath/XQuery] Formal Semantics.
[XPath/XQuery] is a functional language. [XPath/XQuery] is built from expressions, rather than statements. Every construct in the language (except for the XQuery query prolog) is an expression and expressions can be composed arbitrarily. The result of one expression can be used as the input to any other expression, as long as the type of the result of the former expression is compatible with the input type of the latter expression with which it is composed. Another aspect of the functional approach is that variables are always passed by value and their value cannot be modified through side-effects.
[XPath/XQuery] is a typed language. Types can be imported from one or more XML Schemas that describe the input documents and the output document, and the [XPath/XQuery] language can then perform operations based on these types. In addition, [XPath/XQuery] supports static type analysis. This means a static analysis phase is defined on [XPath/XQuery] expressions that infers the output type of an expression, based on the type of its inputs. Static typing allows early detection of type errors, and can be used as the basis for certain forms of optimization. The [XPath/XQuery] type system captures most of the features of [XML Schema Part 1], including global and local element and attribute declarations, complex and simple type definitions, named and anonymous types, derivation by restriction, extension, list and union, substitution groups, and wildcard types. It does not model uniqueness constraints and facet constraints on simple types.
This document is organized as follows. [2 Preliminaries] introduces the notations used to define the [XPath/XQuery] formal semantics. These include the formal notations for values in the [XPath/XQuery] data model and for types in XML Schema. The next three sections: [3 Basics], [4 Expressions], and [5 Modules and Prologs] have the same structure as the corresponding sections in the [XQuery 1.0: A Query Language for XML] and [XML Path Language (XPath) 2.0] documents. This allows the reader to quickly find the formal definition of a particular language construct. [3 Basics] defines the semantics for basic [XPath/XQuery] concepts, and [4 Expressions] defines the dynamic and static semantics of each [XPath/XQuery] expression. [5 Modules and Prologs] defines the semantics of the [XPath/XQuery] prolog. [6 Additional Semantics of Functions] defines the static semantics of several functions in [XQuery 1.0 and XPath 2.0 Functions and Operators] and gives the dynamic and static semantics of several supporting functions used in this document. The remaining sections, [7 Auxiliary Judgments] and [8 Importing Schemas], contain material that supports the formal semantics of [XPath/XQuery]. [7 Auxiliary Judgments] defines formal judgments that relate data model values to types, that relate types to types, and that support the formal definition of validation. These judgements are used in the definition of expressions in [4 Expressions]. Lastly, [8 Importing Schemas], specifies how XML Schema documents are imported into the [XPath/XQuery] type system and relates XML Schema types to the [XPath/XQuery] type system.
This section provides the background necessary to understand the [XPath/XQuery] Formal Semantics and introduces the notations that are used.
Why a Formal Semantics? The goal of the formal semantics is to complement the [XPath/XQuery] specification ([XQuery 1.0: A Query Language for XML] and [XML Path Language (XPath) 2.0]), by defining the meaning of [XPath/XQuery] expressions with mathematical rigor.
A rigorous formal semantics clarifies the intended meaning of the English specification, ensures that no corner cases are left out, and provides a reference for implementation.
Why use formal notations? Rigor is achieved by the use of formal notations to represent [XPath/XQuery] objects such as expressions, XML values, and XML Schema types, and by the systematic definition of the relationships between those objects to reflect the meaning of the language. In particular, the dynamic semantics relates [XPath/XQuery] expressions to the XML value to which they evaluate, and the static semantics relates [XPath/XQuery] expressions to the XML Schema type that is infered for that expression.
The Formal Semantics uses several kinds of formal notations to define the relationships between [XPath/XQuery] expressions, XML values, and XML Schema types. This section contains a small tutorial to introduce the notations for judgments, inference rules, and mapping rules as well as the notation for environments, which implement the dynamic and static contexts. The reader already familiar with these notations can skip this section and continue with [2.2 XML Values].
Grammar productions are used to describe "objects" (values, types, [XPath/XQuery] expressions, etc.) manipulated by the Formal Semantics. The Formal Semantics makes use of several kinds of grammar productions.
XQuery grammar productions describe the XQuery language and expressions. XQuery productions are identified by a number, which corresponds to the number in the [XQuery 1.0: A Query Language for XML] document, and are annotated with "(XQuery)". For instance, the following production describes FLWR expressions in XQuery.
| [42 (XQuery)] | FLWORExpr | ::= | (ForClause | LetClause)+ WhereClause? OrderByClause? "return" ExprSingle |
For the purpose of this document, the differences between the XQuery 1.0 and the XPath 2.0 grammars are mostly irrelevant. By default, this document uses XQuery 1.0 grammar productions. Whenever the grammar for XPath 2.0 differs from the one for XQuery 1.0, the corresponding XPath 2.0 productions are also given. XPath productions are identified by a number, which corresponds to the number in [XML Path Language (XPath) 2.0], and are annotated with "(XPath)". For instance, the following production describes for expressions in XPath.
| [25 (XPath)] | ForExpr | ::= | SimpleForClause "return" ExprSingle |
XQuery Core grammar productions describe the XQuery Core. The complete XQuery Core grammar is given in [A Normalized core grammar]. XQuery Core productions are identified by a number, which corresponds to the number in [A Normalized core grammar], and are annotated by "(Core)". For instance, the following production describes the simpler form of "for" expressions present in the XQuery Core.
The Formal Semantics sometimes needs to manipulate "objects" (values, types, expressions, etc.) for which there is no existing grammar production in the [XQuery 1.0: A Query Language for XML] document. In these cases, specific grammar productions are introduced. Notably, additional productions are used to describe values in the [XQuery 1.0 and XPath 2.0 Data Model], and to describe the [XPath/XQuery] type system. Formal Semantics productions are identified by a number, and are annotated by "(Formal)". For instance, the following production describes global type definitions in the [XPath/XQuery] type system.
| [43 (Formal)] | Definition | ::= | ("define" "element" ElementName Substitution? Nillable? TypeReference) |
Note that grammar productions that are specific to the Formal Semantics (i.e., with the "(Formal)" annotation) are not part of [XPath/XQuery]. They are not accessible to the user and are only used in the course of defining the language's semantics.
The basic building block of the formal specification is called a judgment. A judgment expresses whether a property holds or not.
For example:
Notation
The judgment
holds if the object Painting is beautiful.
Notation
Here are three judgments that are used extensively in this document.
The judgment
holds if the expression Expr yields (or evaluates to) the value Value.
The judgment
holds when the expression Expr has type Type.
The judgment
holds when the expression Expr raises the error Error.
A judgment can contain symbols and patterns.
Symbols are purely syntactic and are used to write the judgment itself. In general, symbols in a judgment are chosen to reflect its meaning. For example, 'is beautiful', '=>' and ':' are symbols, the second and third of which should be read "yields", and "has type" respectively.
Patterns are written with italicized words. The name of a pattern is significant: each pattern name corresponds to an "object" (a value, a type, an expression, etc.) that can be substituted legally for the pattern. By convention, all patterns in the Formal Semantics correspond to grammar non-terminals, and are used to represent entities that can be constructed through application of the corresponding grammar production. For example, Expr represents any [XPath/XQuery] expression, and Value represents any value in the [XPath/XQuery] data model.
When applying the judgment, each pattern must be instantiated to an appropriate sort of "object" (value, type, expression, etc). For example, '3 => 3' and '$x+0 => 3' are both instances of the judgment 'Expr => Value'. Note that in the first judgment, '3' corresponds to both the expression '3' (on the left-hand side of the => symbol) and to the the value '3' (on the right-hand side of the => symbol).
Patterns may appear with subscripts (e.g. Expr1, Expr2) to distinguish different instances of the same sort of pattern. Each distinct pattern must be instantiated to a single "object" (value, type, expression, etc.). If the same pattern occurs twice in a judgment description then it should be instantiated with the same "object". For example, '3 => 3' is an instance of the judgment 'Expr1 => Expr1' but '$x+0 => 3' is not since the two expressions '$x+0' and '3' cannot be both instance of the pattern Expr1. The judgment'$x+0 => 3' is an instance of the judgment 'Expr1 => Expr2'.
In a few cases, patterns may have a name which is not exactly the name of a grammar production but is based on it. For instance, a BaseTypeName is a pattern which stands for a type name, as would TypeName, or TypeName2. This usage is limited, and only occurs to improve the readability of some of the inference rules.
Inference rules are used to specify whether a judgment holds or not. Inference rules express the logical relation between judgments and describe how complex judgments can be concluded from simpler premise judgments.
A logical inference rule is written as a collection of premises and a conclusion, respectively written above and below a dividing line:
| premise1 ... premisen |
| conclusion |
All premises and the conclusion are judgments. The interpretation of an inference rule is: if all the premise judgments above the line hold, then the conclusion judgment below the line must also hold.
Here is a simple example of inference rule, which uses the example judgment 'Expr => Value' from above:
| $x => 0 3 => 3 |
| $x + 3 => 3 |
This inference rule expresses the following property of the judgment 'Expr => Value': if the variable expression '$x' yields the value '0', and the literal expression '3' yields the value '3', then the expression '$x + 3' yields the value '3'.
It is also possible for an inference rule to have no premises above the line to have no judgments at all; this simply means that the expression below the line always holds:
| 3 => 3 |
This inference rule expresses the following property of the judgment 'Expr => Value': evaluating the literal expression '3' always yields the value '3'.
The two above rules are expressed in terms of specific variables and values, but usually rules are more abstract. That is, the judgments they relate contain patterns. For example, here is a rule that says that for any variable Variable that yields the integer value Integer, adding '0' yields the same integer value:
| Variable => Integer |
| Variable + 0 => Integer |
As in a judgment, each pattern in a particular inference rule must be instantiated to the same "object" within the entire rule. This means that one can talk about "the value of Variable" instead of the more precise "what Variable is instantiated to in (this particular instantiation of) the inference rule".
Note
In effect, inference rules are just a notation that describes a bottom-up algorithm, for instance an evaluation algorithm where the result of an expression depends on the result for its sub-expressions.
Logical inference rules use environments to record information computed during static type analysis or dynamic evaluation so that this information can be used by other logical inference rules. For example, the type signature of a user-defined function in a [expression/query] prolog can be recorded in an environment and used by subsequent rules. Similarly, the value assigned to a variable within a "let" expression can be captured in an environment and used for further evaluations.
An environment is a dictionary that maps a symbol (e.g., a function name or a variable name) to an "object" (e.g., a function body, a type, a value). One can either access existing information from an environment, or update the environment.
If "env" is an environment, then "env(symbol)" denotes the "object" to which symbol is mapped. The notation is intentionally akin to function application as an environment can be seen as a function from the argument symbol to the "object" to which the symbol is mapped.
This document uses environment groups that group related environments. If "env" is an environment group with the member "mem", then that environment is denoted "env.mem" and the value that it maps symbol to is denoted "env.mem(symbol)".
Updating is only defined on environment groups:
"env + mem(symbol => object) " denotes the new environment group that is identical to env except that the mem environment has been updated to map symbol to object. The notation symbol => object indicates that symbol is mapped to object in the new environment.
If the "object" is a type then the following notation relates a symbol to a type: "env + mem(symbol : object) ".
The following shorthand is also allowed: "env + mem( symbol1 => object1 ; ... ; symboln => objectn ) " in which each symbol is mapped to a corresponding object in the new environment.
This notation is equivalent to nested updates, as in " (env + mem( symbol1 => object1) + ... ) + mem(symboln => objectn)".
Note that updating the environment overrides any previous binding that might exist for the same name. Updating the environment is used to capture the scope of a symbol (e.g., a variable, a namespace prefix, etc.) Also, note that there are no operations to remove entries from environments: this is never necessary because updating an the environment group effectively creates a new extended copy of the original environment group, and the original environme group remains accessible along with the updated copy.
Environments are typically used as part of a judgment, to capture some of the context in which the judgment is computed. Indeed, most judgments are computed assuming that some environment is given. This assumption is denoted by prefixing the judgment with "env |-". The "|-" symbol is called a "turnstile" and is used in almost all inference rules.
For instance, the judgment
is read as: Assuming the dynamic environment dynEnv, the expression Expr yields the value Value.
The two main environments used in the Formal Semantics are: a dynamic environment (dynEnv), which captures the [XPath/XQuery]'s dynamic context, and a static environment (statEnv), which captures the [XPath/XQuery]'s static context. Both are defined in [3.1 Expression Context].
Putting the above notations together, here is an example of an inference rule that occurs later in this document:
This rule is read as follows: if two expressions Expr1 and Expr2 are known to have the static types types Type1 and Type2 (the two premises above the line), then it is the case that the expression below the line "Expr1 , Expr2" must have the static type "Type1, Type2", which is the sequence of types Type1 and Type2.
The above inference rule, does not modify the (static) environment. The following rule defines the static semantics of a "let" expression. The binding of the new variable is captured by an update to the varType component of the original static environment.
| statEnv |- Expr1 : Type1 statEnv + varType(QName : Type1) |- Expr2 : Type2 |
statEnv |-
let
$QName := Expr1
return
Expr2 : Type2
|
This rule is as follows: First, the type Type1 for the "let" input expression Expr1 is computed. Second the "let" variable is added into the varType component of the static environment group statEnv, with type Type1. Finally, the type Type2 of Expr2 is computed in that new environment.
| Editorial note | |
| Jonathan suggests that we should explain 'chain' inference rules. I.e., how several inference rules are applied recursively. | |
[XPath/XQuery] manipulates XML values as defined in the [XQuery 1.0 and XPath 2.0 Data Model]. XML values are composed of nodes, atomic values and sequences. This section introduces formal notations for describing [XPath/XQuery] values from [XQuery 1.0 and XPath 2.0 Data Model]. These notations are used to describe and manipulate values in inference rules, but are not exposed to the [XPath/XQuery] user.
A value is a sequence of zero or more items. An item is either an atomic value or a node.
An atomic value is a value in the value space of an atomic
type and is labeled with the name of that atomic type. An XML
Schema atomic type [XML Schema Part 2] may be primitive or derived,
or xdt:untypedAtomic.
A node is either an element, an attribute, a document, a
text, a comment, or a processing-instruction node. Elements have
a type annotation and contain a value. Attributes have a type
annotation and contain a simple value, which is a sequence of
atomic values. Text nodes always contain one string value of
type xdt:untypedAtomic, therefore the corresponding type
annotation is omitted.
A type annotation can be either the QName of a declared type or an anonymous type. An anonymous type corresponds to an XML Schema type for which the schema writer did not provide a name. Anonymous type names are not visible to the user, but are generated during schema validation and used to annotate nodes in the data model. By convention, anonymous type names are written in the Formal Semantics as: [Anon0], [Anon1], etc.
Untyped elements (e.g., from well-formed documents) are
annotated with xs:anyType, untyped attributes are annotated
with xs:anySimpleType, and untyped atomic values (i.e., text
content or attribute content in well-formed documents) are
annotated with xdt:untypedAtomic.
An element has an optional "nilled" marker. This
marker can only be present if the element has been validated
against an element type in the schema which is
"nillable", and they have no content and an
attribute xsi:nil set to "true".
An element also has a sequence of namespace annotations, which are the set of active namespace declarations that are in-scope for the element. Each namespace annotation is a prefix, URI pair. Namespace annotations are not values, i.e., they are never the result of evaluating an arbitrary expression, and only occur as annotations on elements. In examples, we omit the namespace annotations when they are empty. For example, the following two values are equivalent:
element weight of type xs:integer { text { "42" } } {}
element weight of type xs:integer { text { "42" } }
| [14 (Formal)] | Value | ::= | Item |
| [25 (Formal)] | Item | ::= | NodeValue |
| [26 (Formal)] | AtomicValue | ::= | AtomicValueContent TypeAnnotation |
| [7 (Formal)] | AtomicValueContent | ::= | String |
| [8 (Formal)] | TypeAnnotation | ::= | "of" "type" TypeName |
| [16 (Formal)] | ElementValue | ::= | "element" ElementName "nilled"? TypeAnnotation "{" Value "}" "{" NamespaceAnnotations "}" |
| [17 (Formal)] | AttributeValue | ::= | "attribute" AttributeName TypeAnnotation "{" SimpleValue "}" |
| [15 (Formal)] | SimpleValue | ::= | AtomicValue |
| [18 (Formal)] | DocumentValue | ::= | "document" "{" Value "}" |
| [20 (Formal)] | CommentValue | ::= | "comment" "{" String "}" |
| [21 (Formal)] | ProcessingInstructionValue | ::= | "processing-instruction" QName "{" String "}" |
| [19 (Formal)] | TextValue | ::= | "text" "{" String "}" |
| [24 (Formal)] | NodeValue | ::= | ElementValue |
| [9 (Formal)] | ElementName | ::= | QName |
| [12 (Formal)] | AttributeName | ::= | QName |
| [27 (Formal)] | TypeName | ::= | QName | AnonymousTypeName |
| [13 (Formal)] | AnonymousTypeName | ::= | [Anon1] | [Anon2] | ... |
| [22 (Formal)] | NamespaceAnnotations | ::= | NamespaceAnnotation ... NamespaceAnnotation |
| [23 (Formal)] | NamespaceAnnotation | ::= | "namespace" NCName "{" AnyURI "}" |
Notation
In the above grammar, "String" indicates the
value space of xs:string, "Decimal" indicates
the value space of xs:decimal, etc.
Note that the same rule about constructing sequences apply to
the values described by that grammar. Notably sequences cannot
be nested. For example, the sequence (10, (1, 2), (), (3,
4)) is equivalent to the sequence (10, 1, 2, 3,
4).
A well-formed document
<fact>The cat weighs <weight units="lbs">12</weight> pounds.</fact>
In the absence of a Schema, this document is represented as
element fact of type xs:anyType {
text { "The cat weighs " },
element weight of type xs:anyType {
attribute units of type xs:anySimpleType {
"lbs" of type xdt:untypedAtomic
}
text { "12" }
},
text { " pounds." }
}
A document before and after validation.
<weight xsi:type="xs:integer">42</weight>
The formal model for values can represent values before and after validation. Before validation, this element is represented as:
element weight of type xs:anyType {
attribute xsi:type of type xs:anySimpleType {
"xs:integer" of type xdt:untypedAtomic
},
text { "42" }
}
After validation, this element is represented as:
element weight of type xs:integer {
attribute xsi:type of type xs:QName {
"xs:integer" of type xs:QName
},
42 of type xs:integer
}
An element with a list type
<sizes>1 2 3</sizes>
Before validation, this element is represented as:
element sizes of type xs:anyType {
text { "1 2 3" }
}
Assume the following Schema.
<xs:element name="sizes" type="sizesType"/>
<xs:simpleType name="sizesType">
<xs:list itemType="sizeType"/>
</xs:simpleType>
<xs:simpleType name="sizeType">
<xs:restriction base="xs:integer"/>
</xs:simpleType>
After validation against this Schema, the element is represented as:
element sizes of type sizesType {
1 of type sizeType,
2 of type sizeType,
3 of type sizeType
}
An element with an anonymous type
<sizes>1 2 3</sizes>
Before validation, this element is represented as:
element sizes of type xs:anyType {
text { "1 2 3" }
}
Assume the following Schema.
<xs:element name="sizes">
<xs:simpleType>
<xs:list itemType="xs:integer"/>
</xs:simpleType>
</xs:element>
After validation, this element is represented as:
element sizes of type [Anon1] {
1 of type xs:integer,
2 of type xs:integer,
3 of type xs:integer
}
where [Anon1] stands for the internal anonymous
name generated by the system for the sizes
element.
An element with a nillable
<sizes xsi:nil="true"/>
Before validation, this element is represented as:
element sizes of type xs:anyType {
attribute xsi:nil of type xs:anySimpleType { "true" }
}
Assume the following Schema.
<xs:element name="sizes" type="sizesType" nillable="true"/>
After validation against this Schema, the element is represented as:
element sizes nilled of type sizesType {
attribute xsi:nil of type xs:anySimpleType { "true" }
}
An element with a union type
<sizes>1 two 3 four</sizes>
Before validation, this element is represented as:
element sizes of type xs:anyType {
text { "1 two 3 four" }
}
Assume the following Schema:
<xs:element name="sizes" type="sizesType"/>
<xs:simpleType name="sizesType">
<xs:list itemType="sizeType"/>
</xs:simpleType>
<xs:simpleType name="sizeType">
<xs:union memberType="xs:integer xs:string"/>
</xs:simpleType>
After validation against this Schema, the element is represented as:
element sizes of type sizesType {
1 of type xs:integer,
"two" of type xs:string,
3 of type xs:integer,
"four" of type xs:string
}
The [XPath/XQuery] type system is used in the specification of the dynamic and of the static semantics of [XPath/XQuery]. This section introduces formal notations for describing types.
The [XPath/XQuery] type system is based on XML Schema. Formalizing the treatment of types in [XPath/XQuery], however, requires some adjustments.
Use of formal notations for types. The Formal Semantics uses formal notations for types instead of XML Schema syntax. Those notations are used extensively to describe and manipulate types in the inference rules. The formal notations for types introduced here are not exposed to the [XPath/XQuery] user.
Representation of content models. For the purpose of
static typing, the [XPath/XQuery] type system only describes
minOccurs, maxOccurs combinations which corresponds to the
standard DTD constructs +, *, and
?. Choices are represented using the standard DTD
construct |. All groups are represented using the
& notation.
Representation of anonymous types. To clarify the semantics, the [XPath/XQuery] type system makes all anonymous types explicit.
Representation of XML Schema simple type facets and identity constraints. For simplicity, XML Schema simple type facets as well as indentity constraints are not formally represented in the [XPath/XQuery] type system. Still, [XPath/XQuery] implementation supporting XML Schema import and validation must follow the XML Schema specification and take simple type facets and indentity constraints into account.
The complete mapping from XML Schema into the [XPath/XQuery] type system is given in [8 Importing Schemas]. The rest of this section is organized as follows. [2.3.2 Item types] describes types items, [2.3.3 Content models] describes content models, and [2.3.4 Top level definitions] describe top-level type declarations.
An item type is either an atomic type, an element type, an attribute type, a document node type, a text node type, a comment node type, or a processing instruction type.
| [30 (Formal)] | ItemType | ::= | AtomicTypeName | NodeType |
| [32 (Formal)] | AtomicTypeName | ::= | QName |
| [31 (Formal)] | NodeType | ::= | ElementType |
| [33 (Formal)] | ElementType | ::= | "element" ElementName? TypeSpecifier? |
| [34 (Formal)] | TypeSpecifier | ::= | Nillable? TypeReference |
| [35 (Formal)] | AttributeType | ::= | "attribute" AttributeName? TypeReference? |
| [36 (Formal)] | Nillable | ::= | "nillable" |
| [40 (Formal)] | TypeReference | ::= | "of" "type" TypeName |
| [51 (Formal)] | DocumentType | ::= | "document" ("{" Type? "}")? |
An element or attribute type has an optional name and an optional type reference. A name alone corresponds to a reference to a global element or attribute declaration. A name with a type reference corresponds to a local element or attribute declaration. The word "element" or "attribute" alone refers to the wildcard types any element or any attribute. In addition, an element type has an optional nillable flag which indicates whether the element can be nilled or not.
A document type has an optional content type. If no content type is given, then it refers to the wildcard type describing any document.
Note
Note that generic node types (e.g., node()),
are interpreted in the type system as union types (e.g.,
element | attribute | text | comment |
processing-instruction) and therefore do not appear
here. The semantics of sequence types is described in [3.4.1.1 SequenceType Matching].
Examples
A text node type
text
matches any text node, such as:
text { "Text is beautiful" }
A wildcard element
element
matches any element.
A wildcard element of type string
element of type xs:integer
matches any element of type string, such as:
element name of type xs:string { "John Doe" }
A nillable element of type string
element size nillable of type xs:integer
matches any element with name size of type
xs:integer, such as:
element size of type xs:integer {
2 of type xs:integer
}
or it matches any element with name size which
has no content and the xsi:nil attribute set to
true, such as:
element size of type xs:integer {
attribute xsi:nil of type xs:anySimpleType { "true" }
}
A reference to a globally declared attribute
attribute sizes
refers to the global declaration for the attribute
sizes.
An element with an anonymous type
element sizes of type [Anon1]
matches any element with name sizes and the
anonymous type [Anon1], such as:
element sizes of type [Anon1] {
1 of type xs:integer,
2 of type xs:integer,
3 of type xs:integer
}
Following XML Schema, types in [XPath/XQuery] are composed from
item types by optional, one or more, zero or more, all group,
sequence, choice, empty sequence, or empty choice (written
none).
The type empty matches the empty sequence. The type
none matches no values. It is called the empty
choice because it is the identity for choice, that is
(Type | none) =
Type)). The type none is the static type
for [6.2.1 The fn:error function].
| [28 (Formal)] | Type | ::= | ItemType |
| [29 (Formal)] | Occurrence | ::= | "*" | "+" | "?" |
The [XPath/XQuery] type system includes three binary operators on types: ",", "|" and "&", corresponding respectively to sequence, choice and all groups in Schema. The [XPath/XQuery] type system includes three unary operators on types: "*", "+", and "?", corresponding respectively to zero or more instances of the type, one or more instances of the type, or an optional instance of the type.
The "&" operator builds the "interleaved
product" of two types. The type Type1
& Type2 matches any sequence that is an
interleaving of a sequence that matches Type1 and a
sequence that matches Type2.
The interleaved product is used to represent all groups in XML Schema. All group in XML Schema are restricted to apply only on global or local element declarations with lower bound 0 or 1, and upper bound 1.
Examples
A sequence of elements
The "," operator builds the "sequence" of two types. For example,
element title of type xs:string, element year of type xs:integer
is a sequence of an element title of type string followed by an element year of type integer.
The union of two element types
The "|" operator builds the "union" of two types. For example,
element editor of type xs:string | element bib:author
means either an element editor of type string, or a
reference to the gobal element bib:author.
An all group of two elements
The "&" operator builds the "interleaved product" of two types. For example,
(element a & element b) =
element a, element b
| element b, element a
which specifies that the a and b
elements can occur in any order.
An empty type
The following type matches the empty sequence.
empty
A sequence of zero or more elements
The following type matches zero or more elements each of
which can be a surgeon or a
plumber.
(element surgeon | element plumber)*
Top level definitions correspond to global element declarations, global attribute declarations and type definitions in XML Schema.
| [43 (Formal)] | Definition | ::= | ("define" "element" ElementName Substitution? Nillable? TypeReference) |
| [44 (Formal)] | Substitution | ::= | "substitutes" "for" ElementName |
| [37 (Formal)] | TypeDerivation | ::= | ComplexTypeDerivation | AtomicTypeDerivation |
| [38 (Formal)] | ComplexTypeDerivation | ::= | Derivation? Mixed? "{" Type? "}" |
| [39 (Formal)] | AtomicTypeDerivation | ::= | "restricts" AtomicTypeName |
| [41 (Formal)] | Derivation | ::= | ("restricts" TypeName) |
| [42 (Formal)] | Mixed | ::= | "mixed" |
A type definition has a name (possibly anonymous) and a type derivation. In the case of a complex type, or a simple type derived by list or uniont, derivation indicates if the type is derived by extension or restriction from its based type, gives its content model, and an optional flag indicating if it has mixed content. In the case of an atomic type, it just indicate from which type it is derived from.
A type is derived either by extension or restriction from a
base type. In case the type derivation is omitted, the type
derives by restriction from xs:anyType, as in:
define type Bib { xs:integer } =
define type Bib restricts xs:anyType { xs:integer }
Empty content can be indicated with the explicit empty sequence, or omitted, as in:
define type Bib { } =
define type Bib { empty }
Global element and attribute declarations have always a name, and a reference to a (possibly anonymous) type. In addition, a global element declaration may declare a substitution group for the element and whether the element is nillable.
Examples
A type declaration with complex content
define type Address {
element name of type xsd:string,
element street of type xsd:string*
}
A type declaration with complex content derived by extension
define type USAddress extends Address {
element zip name of type xsd:integer
}
A type declaration with mixed content
define type Section mixed {
(element h1 of type xsd:string |
element p of type xsd:string |
element div of type Section)*
}
A type declaration with simple content derived by restriction
define type SKU restricts xsd:string { xsd:string }
An element declaration
define element address of type Address
An element declaration with a substitution group
define element usaddress substitutes for address of type USAddress
An element declaration which is nillable
define element zip nillable of type xs:integer
Here is a schema describing purchase orders, taken from the XML Schema Primer, followed by its mapping into the [XPath/XQuery] type system. The complete mapping from XML Schema into the [XPath/XQuery] type system is given in [8 Importing Schemas].
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:annotation>
<xsd:documentation xml:lang="en">
Purchase order schema for Example.com.
Copyright 2000 Example.com. All rights reserved.
</xsd:documentation>
</xsd:annotation>
<xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
<xsd:element name="comment" type="xsd:string"/>
<xsd:complexType name="PurchaseOrderType">
<xsd:sequence>
<xsd:element name="shipTo" type="USAddress"/>
<xsd:element name="billTo" type="USAddress"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="items" type="Items"/>
</xsd:sequence>
<xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
<xsd:complexType name="USAddress">
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="street" type="xsd:string"/>
<xsd:element name="city" type="xsd:string"/>
<xsd:element name="state" type="xsd:string"/>
<xsd:element name="zip" type="xsd:decimal"/>
</xsd:sequence>
<xsd:attribute name="country" type="xsd:NMTOKEN"
fixed="US"/>
</xsd:complexType>
<xsd:complexType name="Items">
<xsd:sequence>
<xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="productName" type="xsd:string"/>
<xsd:element name="quantity">
<xsd:simpleType>
<xsd:restriction base="xsd:positiveInteger">
<xsd:maxExclusive value="100"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="USPrice" type="xsd:decimal"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="partNum" type="SKU" use="required"/>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
<!-- Stock Keeping Unit, a code for identifying products -->
<xsd:simpleType name="SKU">
<xsd:restriction base="xsd:string">
<xsd:pattern value="\d{3}-[A-Z]{2}"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:schema>
namespace xsd = "http://www.w3.org/2001/XMLSchema"
define element purchaseOrder of type ipo:PurchaseOrderType
define element comment of type xsd:string
define type PurchaseOrderType {
attribute orderDate of type xsd:date?,
element shipTo of type USAddress,
element billTo of type USAddress,
element ipo:comment?,
element items of type Items
}
define type USAddress {
attribute country of type xsd:NMTOKEN?,
element name of type xsd:string,
element street of type xsd:string,
element city of type xsd:string,
element state of type xsd:string,
element zip of type xsd:decimal
}
define type Items {
attribute partNum of type SKU,
element item of type [Anon1]*
}
define type [Anon1] {
element productName of type xsd:string,
element quantity of type [Anon2],
element USPrice of type xsd:decimal,
element comment?,
element shipDate of type xsd:date?
}
define type [Anon2] restricts xsd:positiveInteger { xsd:positiveInteger }
define type SKU restrict xsd:string { xsd:string }
Note the way the two anonymous types inside the
item element are handled by giving them anonymous
names [Anon1] and [Anon2].
The following additional definitions illustrate how more advanced XML Schema features (a complex type derived by extension, an anonymous simple type derived by restriction, and substitution groups) are represented in the [XPath/XQuery] type system.
<complexType name="NYCAddress">
<complexContent>
<extension base="USAddress">
<sequence>
<element ref="apt"/>
</sequence>
</extension>
</complexContent>
</complexType>
<element name="apt"/>
<xsd:simpleType>
<xsd:restriction base="xsd:positiveInteger">
<xsd:maxExclusive value="10000"/>
</xsd:restriction>
</xsd:simpleType>
<element>
<element name="usaddress" substitutionGroup="address" type="USAddress"/>
<element name="nycaddress" substitutionGroup="usaddress" type="NYCAddress"/>
Those definitions are written in the [XPath/XQuery] type system as follows.
define type NYCAddress extends USAddress {
element apt
}
define element apt of type [Anon3]
define type [Anon3] restricts xsd:positiveInteger
define element usaddress substitutes for address of type USAddress
define element nycaddress substitutes for usaddress of type NYCAddress
This section defines a processing model for [XPath/XQuery], then defines formal judgments for each key phase in that processing model (normalization, static type analysis and dynamic evaluation).
This processing model is not intended to describe an actual implementation, although a naive implementation might be based upon it. It does not prescribe an implementation technique, but any implementation should produce the same results as obtained by following this processing model and applying the rest of the Formal Semantics specification.
The processing model consists of five phases; each phase consumes the result of the previous phase and generates output for the next phase. For each processing phase, we point to the relevant notations introduced later in the document.
Parsing. The grammar for the [XPath/XQuery] syntax is defined in [XQuery 1.0: A Query Language for XML]. Parsing may generate syntax errors. If no error occurs, an internal abstract syntax tree of the parsed query is created.
Context Processing. The semantics of [expression/query] depends on the input context. The input context needs to be generated before the [expression/query] can be processed. In XQuery, the input context may be defined by the processing environment and by statements in the Query Prolog (See [5 Modules and Prologs]). In XPath, the input context is defined by the processing environment. The input context has a static and a dynamic part (denoted statEnv and dynEnv, respectively).
Normalization. To
simplify the semantics specification, some normalization is
performed on the [expression/query]. The [XPath/XQuery] language
provides many powerful features that make [expression/query]s
simpler to write and use, but are also redundant. For
instance, a complex for expression might be
rewritten as a composition of several simple
for expressions. The language composed of these
simpler [expression/query] is called the [XPath/XQuery] Core
language and is described by a grammar which is a
subset of the XQuery grammar. The grammar of the [XPath/XQuery]
Core language is given in [A Normalized core grammar].
During the normalization phase, each [XPath/XQuery] [expression/query] is mapped into its equivalent [expression/query] in the core. (Note that this has nothing to do with Unicode Normalization, which works on character strings.) Normalization works by bottom-up application of normalization rules over expressions, starting with normalization of literal expressions and variables.
Specifically the normalization phase is defined in terms of the static part of the context (statEnv) and a [expression/query] (Expr) abstract syntax tree. Formal notations for the normalization phase are introduced in [2.4.2 Normalization judgment].
After normalization, the full semantics is obtained by giving a semantics to the normalized Core [expression/query]. This is done during the last two phases.
Static type analysis. Static type analysis is optional. If this phase is not supported, then normalization is followed directly by dynamic evaluation.
Static type analysis checks whether each [expression/query] is type safe, and if so, determines its static type. Static type analysis is defined only for Core [expression/query]. Static type analysis works by bottom-up application of type inference rules over expressions, taking the type of literals and of input documents into account.
If the [expression/query] is not type-safe, static type analysis yields a type error. For instance, a comparison between an integer value and a string value might be detected as an type error during the static type analysis. If static type analysis succeeds, it yields an abstract syntax tree where each sub-expression is "annotated" with its static type
More precisely, the static analysis phase is defined in terms of the static context (statEnv) and a core [expression/query] (CoreExpr). Formal notations for the static analysis phase are introduced in [2.4.3 Static typing judgment].
Static typing does not imply that the content of XML documents must be rigidly fixed or even known in advance. The [XPath/XQuery] type system accommodates "flexible" types, such as elements that can contain any content. Schema-less documents are handled in [XPath/XQuery] by associating a standard type with the document, such that it may include any legal XML content.
Dynamic Evaluation. This phase computes the value of an [expression/query]. The semantics of evaluation is defined only for Core [expression/query] terms. Evaluation works by bottom-up application of evaluation rules over expressions, starting with evaluation of literals and variables. Evaluation may result in a value OR a dynamic error, which may be a non-type error or a type error. If static typing of an expression does not raise a type error, then dynamic evaluation of the same expression will not raise a type error, although dynamic evaluation may raise some non-type error.
The dynamic evaluation phase is defined in terms of the static context (statEnv) and evaluation context (dynEnv), and a core [expression/query] (CoreExpr). Formal notations for the dynamic evaluation phase are introduced in [2.4.4 Dynamic evaluation judgment].
The first four phases above are "analysis-time" (sometimes also called "compile-time") steps, meaning that they can be performed on a [expression/query] before examining any input document. Indeed, analysis-time processing can be performed before input documents even exist. Analysis-time processing can detect many errors early-on, e.g., syntax errors or type errors. If no error occurs, the result of analysis-time processing could be some compiled form of [expression/query], suitable for execution by a compiled-[expression/query] processor. The last phase is an "execution-time" (sometimes also called "run-time") step, meaning that the query is evaluated on actual input document(s).
Static analysis catches only certain classes of errors. For
instance, it can detect a comparison operation applied between
incompatible types (e.g., xs:int and
xs:date). Some other classes of errors cannot be
detected by the static analysis and are only detected at
execution time. For instance, whether an arithmetic expression
on 32 bits integers (xs:int) yields an out-of-bound
value can only be detected at run-time by looking at the
data.
While implementations are free to implement different processing models, the [XPath/XQuery] static semantics relies on the existence of a static type analysis phase that precedes any access to the input data. Statically typed implementations are required to find and report type errors during static analysis, as specified in this document. Dynamically typed implementations are required to find and report type errors during evaluation, but are permitted to report them during static analysis.
Notice that the separation of logical processing into phases is not meant to imply that implementations must separate analysis-time from evaluation-time processing: [XPath/XQuery] processors may choose to perform all phases simultaneously at evaluation-time and may even mix the phases in their internal implementations. The processing model simply defines the final result.
The above processing phases are all internal to the [XPath/XQuery] processor. They do not deal with how the [XPath/XQuery] processor interacts with the outside world, notably how it accesses actual documents and types. A typical [expression/query] engine would support at least three other important processing phases:
XML Schema import phase. The [XPath/XQuery] type system is based on XML Schema. In order to perform dynamic or static typing, the [XPath/XQuery] processor needs to build type descriptions that correspond to the schema(s) of the input documents. This phase is achieved by mapping all schemas required by the [expression/query] into the [XPath/XQuery] type system. The XML Schema import phase is described in [8 Importing Schemas].
XML loading phase. Expressions are evaluated on values in the [XQuery 1.0 and XPath 2.0 Data Model]. XML documents must be loaded into the [XQuery 1.0 and XPath 2.0 Data Model] before the evaluation phase. This is described in the [XQuery 1.0 and XPath 2.0 Data Model] and is not discussed further here.
Serialization phase. Once the [expression/query] is evaluated, processors might want to serialize the result of the [expression/query] as actual XML documents. Serialization of data model instances is described in [XQuery 1.0 and XPath 2.0 Data Model Serialization] and is not discussed further here.
The parsing phase is not specified formally; the formal semantics does not define a formal model for the syntax trees, but uses the [XPath/XQuery] concrete syntax directly. More details about parsing for XQuery 1.0 can be found in the [XQuery 1.0: A Query Language for XML] document and more details about parsing for XPath 2.0 can be found in the [XML Path Language (XPath) 2.0] document. No further discussion of parsing is included here.
Normalization is specified using mapping rules which describe how a [XPath/XQuery] expression is rewritten into an expression in the [XPath/XQuery] Core. Mapping rules are also used in [8 Importing Schemas] to specify how XML Schemas are imported into the [XPath/XQuery] type system.
Notation
Mapping rules are written using a square bracket notation, as follows:
| [Object]Subscript |
| == |
| Mapped Object |
The original "object" is written above the == sign. The rewritten "object" is written beneath the == sign. The subscript is used to indicate what kind of "object" is mapped, and sometimes to pass some information between mapping rules.
(Since normalization is always done in the context of the static context the above is really a shorthand for
We use the shorthand because statEnv is always implied.)
The static environment is used in certain cases (e.g. for normalization of function calls) during normalization. To keep the notation simpler, the static environment is not written in the normalization rules, but it is assumed to be available.
| Editorial note | |
| [Kristoffer/XSL] We should decide whether to use a shorthand notation as suggested or modify the mapping rules throughout. | |
The normalization rule that is used to map "top-level" expressions in the [XPath/XQuery] syntax into expressions in the [XPath/XQuery] Core is:
| [Expr]Expr |
| == |
| CoreExpr |
which indicates that the expression Expr is normalized to the expression CoreExpr in the [XPath/XQuery] core (with the implied statEnv).
Example
For instance, the following [expression/query]
for $i in (1, 2),
$j in (3, 4)
return
element pair { ($i,$j) }
is normalized to the core expression
for $i in (1, 2) return
for $j in (3, 4) return
return
element pair { ($i,$j) }
in which the complex "FWLR" expression is mapped into a composition of two simpler "for" expressions.
The static semantics is specified using type inference rules, which relate [XPath/XQuery] expressions to types and specify under what conditions an expression is well typed.
Notation
The judgment
holds when, in the static environment statEnv, the expression Expr has type Type.
Example
The result of static type inference is to associate a static type with every [expression/query], such that any evaluation of that [expression/query] is guaranteed to yield a value that belongs to that type.
For instance, the following expression.
let $v := 3 return $v+5
has type xs:integer. This can be inferred as follows: the
input literals '3' and '5' have type integer, so the variable
$v also has type integer. Since the sum of two integers is an
integer, the complete expression has type integer.
Note
The type of an expression is computed by inference. Static type inference rules define for each kind of expression how to compute the type of the expression given the types of its sub-expressions. Here is a simple example of such a static typing rule:
statEnv |-
Expr1 : xs:boolean
statEnv |-
Expr2 : Type2
statEnv |-
Expr3 : Type3
|
statEnv |-
if Expr1
then Expr2
else Expr3
: ( Type2 | Type3 )
|
This rule states that if the conditional expression of an "if" expression has type boolean, then the type of the entire expression is one of the two types of its "then" and "else" clauses. Note that the resulting type is represented as a union: '(Type2|Type3)'.
The "left half" (the part before the :) of the expression below the line corresponds to some [expression/query], for which a type is computed. If the [expression/query] has been parsed into an internal abstract syntax tree, this usually corresponds to some node in that tree. The expression usually has patterns in it (here Expr1, Expr2, and Expr3) that need to be matched against the children of the node in the abstract syntax tree. The expressions above the line indicate things that need to be computed to use this rule; in this case, the types of the condition expression and the two branches of the if-then-else expression. Once those types are computed (by further applying static inference rules recursively to the expressions on each side), then the type of the expression below the line can be computed. This illustrates a general feature of the [XPath/XQuery] type system: the type of an expression depends only on the type of its sub-expressions. The overall static type inference algorithm is recursive, following the abstract syntax of the [expression/query]. At each point in the recursion, an appropriate matching inference rule is sought; if at any point there is no applicable rule, then static type inference has failed and the [expression/query] is not type-safe.
The dynamic, or operational, semantics is specified using value inference rules, which relate [XPath/XQuery] expressions to values, and in some cases specify the order in which an [XPath/XQuery] expression is evaluated.
Notation
The judgment
holds when, in the static environment statEnv and dynamic environment dynEnv, the expression Expr yields the value Value.
The judgment
holds when, in the static environment statEnv and dynamic environment dynEnv, the expression Expr raises the error Error.
The static environment is used in certain cases (e.g. for type matching) during evaluation. To keep the notation simpler, the static environment is not written in the dynamic inference rules, but it is assumed to be available.
Example
For instance, the following expression.
let $v := 3 return $v+5
yields the integer value 8. This can be inferred as follows: the input literals '3' and '5' denote the values 3 and 5, respectively, so the variable $v has the value 3. Since the sum of 3 and 5 is 8, the complete expression has the value 8.
Note
As with static type inference, logical inference rules are used to determine the value of each expression, given the dynamic environment and the values of its sub-expressions.
The inference rules used for dynamic evaluation, like those for static type inference, follow a bottom-up recursive structure, computing the value of expressions from the values of their sub-expressions.
Whenever appropriate, the Formal Semantics uses the following namespace prefixes.
fn: for functions and operators from the
[XQu