Copyright © 2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document defines formally the semantics of XQuery 1.0 [XQuery 1.0: A Query Language for XML] XPath 2.0 [XML Path Language (XPath) 2.0].
This is a public W3C Working Draft for review by W3C Members and other interested parties. This section describes the status of this document at the time of its publication. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress." A list of current public W3C technical reports can be found at http://www.w3.org/TR/.
This document is a work in progress. It contains a number of open issues, and should not be considered to be fully stable. Vendors who wish to create preview implementations based on this document do so at their own risk. While this document reflects the general consensus of the working groups, there are still controversial areas that may be subject to change.
This version is a major revision from the previous version, and it closes more than 20 issues. Among the most important changes from the previous version of this document are:
The complete semantics of element constructors, as well as XML Schema validation. This resolves Issue 449 (FS-Issue-0110), Issue 453 (FS-Issue-0106), Issue 509 (FS-Issue-0166), and Issue 513 (FS-Issue-0170).
The complete description of the static semantics of [XQuery 1.0 and XPath 2.0 Functions and Operators]. This resolves Issue 478 (FS-Issue-0135), and Issue 450 (FS-Issue-0107).
A simplified semantics for the typeswitch expression. This resolves Issue 048, Issue 515 (FS-Issue-0173), and Issue 454 (FS-Issue-0112).
The formal specification of error handling in [XPath/XQuery]. This resolves Issue 437 (FS-Issue-0094), and Issue 514 (FS-Issue-0171).
The formal specification of non-deterministic semantics in [XPath/XQuery] This resolves Issue 479. (FS-Issue-0136).
The formal specification of the XPath 1.0 backward compatibility mode. This resolves Issue 521 (FS-Issue-0178).
Fixes to the static semantics of path expressions. This resolves Issue 488 (FS-Issue-0145)
Some fixes to the semantics of function calls. This resolves 472 (FS-Issue-0129), Issue 520 (FS-Issue-0177), Issue 553, Issue 554, Issue 539 and Issue 540.
A revised issue list has been integrated with other [XPath/XQuery] issues.
The following are identified as high priority issues.
This document is not aligned with the other [XPath/XQuery] documents on the treatment of conformance levels. See Issue 441 (FS-Issue-0098), and Issue 512 (FS-Issue-0169).
This document is not aligned with the other [XPath/XQuery] documents on the treatment of the modules. See Issues 555, and 556.
The formal semantics of order by are still under discussion. See Issue 452 (FS-Issue-0109).
The formal semantics of node identity are still under discussion. See Issue 529.
The formal semantics of the new sequence types and of type tests are still under discussion. See Issue 559.
Public comments on this document and its open issues are welcome. Comments should be sent to the W3C XPath/XQuery mailing list, public-qt-comments@w3.org (archived at http://lists.w3.org/Archives/Public/public-qt-comments/).
XQuery 1.0, XPath 2.0, and their formal semantics has been defined jointly by the XML Query Working Group and the XSL Working Group (both part of the XML Activity).
Patent disclosures relevant to this specification may be found on the XML Query Working Group's patent disclosure page at http://www.w3.org/2002/08/xmlquery-IPR-statements and the XSL Working Group's patent disclosure page at http://www.w3.org/Style/XSL/Disclosures.
1 Introduction
2 Preliminaries
2.1 Introduction to
the Formal Semantics
2.1.1 Notations from grammar productions
2.1.2 Notations for judgments
2.1.3 Notations for inference rules
2.1.4 Notations for environments
2.1.5 Putting it together
2.2 XML
Values
2.2.1 Formal values
2.2.2 Examples of values
2.3 The Type
System
2.3.1 XML Schema and the Type System
2.3.2 Item types
2.3.3 Content models
2.3.4 Top level
definitions
2.3.5 Example of a complete
Schema
2.4 Processing
model and main judgments
2.4.1 Processing model
2.4.2 Normalization judgment
2.4.3 Static typing judgment
2.4.4 Dynamic evaluation judgment
2.5 Relationship
with other documents
2.5.1 Namespaces
2.5.2 Functions and operators
3 Basics
3.1 Expression
Context
3.1.1 Static Context
3.1.2 Evaluation Context
3.2 Input
Functions
3.3 Expression
Processing
3.4 Types
3.4.1 Predefined Types
3.4.2 Type Checking
3.4.3 SequenceType
3.4.3.1
SequenceType
Matching
3.4.4 Type Conversions
3.4.4.1
Atomization
3.4.4.2
Effective Boolean Value
3.4.4.3
XPath 1.0 Backward Compatibility
3.5 Errors
Handling
3.6 Optional
Features
3.6.1 Basic XQuery
3.6.2 Schema Import Feature
3.6.3 Static Typing Feature
3.6.4 Extensions
3.6.4.1
Pragmas
3.6.4.2
Must-Understand Extensions
3.6.4.3
XQuery Flagger
4 Expressions
4.1 Primary
Expressions
4.1.1 Literals
4.1.2 Variables
4.1.3 Parenthesized Expressions
4.1.4 Function Calls
4.1.5 XQuery Comments
4.2 Path Expressions
4.2.1 Steps
4.2.1.1
Axes
4.2.1.2
Node Tests
4.2.2 Predicates
4.2.3 Unabbreviated Syntax
4.2.4 Abbreviated Syntax
4.3 Sequence
Expressions
4.3.1 Constructing
Sequences
4.3.2 Combining Sequences
4.4 Arithmetic Expressions
4.5 Comparison
Expressions
4.5.1 Value Comparisons
4.5.2 General Comparisons
4.5.3 Node Comparisons
4.5.4 Order Comparisons
4.6 Logical
Expressions
4.7 Constructors
4.7.1 Direct Element
Constructors
4.7.1.1
Attributes
4.7.1.2
Namespaces
4.7.1.3
Content
4.7.1.4
Whitespace in Element
Content
4.7.1.5
Type of a Constructed Element
4.7.2 Computed Constructors
4.7.2.1
Computed Element Constructors
4.7.2.2
Computed Attribute Constructors
4.7.2.3
Document Node Constructors
4.7.2.4
Text Nodes Constructors
4.7.3 Other Constructors and
Comments
4.8 [For/FLWR]
Expressions
4.8.1 FLWOR expressions
4.8.2 For expression
4.8.3 Let Expression
4.8.4 Order By
4.9 Unordered
Expressions
4.10 Conditional
Expressions
4.11 Quantified
Expressions
4.12 Expressions on
SequenceTypes
4.12.1 Instance Of
4.12.2 Typeswitch
4.12.3 Cast
4.12.4 Castable
4.12.5 Constructor Functions
4.12.6 Treat
4.13 Validate Expressions
5 Modules and Prologs
5.1 Version
Declaration
5.2 Namespace Declarations
5.3 Default Namespace
Declarations
5.4 Schema Imports
5.5 Module
Imports
5.6 Variable
Definitions
5.7 Validation Declaration
5.8 Xmlspace
Declaration
5.9 Default
Collation
5.10 Function
Definitions
6 Additional Semantics of
Functions
6.1 Formal Semantics
Functions
6.1.1 The fs:distinct-doc-order function
6.1.2 The
fs:item-sequence-to-node-sequence function
6.1.3 The
fs:item-sequence-to-untypedAtomic function
6.1.4 The fs:convert-simple-operand
function
6.1.5 The fs:convert-operand function
6.1.6 The arithmetic operator pseudo-functions:
fs:minus, fs:plus, fs:times, fs:idiv, fs:div, and
fs:mod
6.1.7 The comparison pseudo-functions: fs:eq, fs:ne,
fs:lt, fs:le, fs:gt, and fs:ge
6.2 Standard
functions with specific typing rules
6.2.1 The fn:error function
6.2.2 The fn:distinct-nodes and fn:distinct-values
functions
6.2.3 The fn:collection and fn:doc
functions
6.2.4 The op:union, op:intersect, and op:except
operators
6.2.5 The fn:data function
6.2.6 The fn:ceiling, fn:floor, fn:round, and
fn:round-half-to-even functions
6.2.7 The fn:subsequence, and fn:remove
functions
6.2.8 The fn:min fn:max, fn:avg, and fn:sum
functions
7 Auxiliary
Judgments
7.1 Judgments for
schema contexts
7.1.1 Mode is
7.1.2 Context is
7.2 Judgments for
accessing types
7.2.1 Derives
7.2.2 Substitutes
7.2.3 Element and attribute type
lookup
7.2.4 Static element and attribute type
lookup
7.2.5 Extension
7.2.6 Type adjustment
7.2.7 Type expansion
7.3 Judgments for step expressions
and filtering
7.3.1 Principal Node Kind
7.3.2 Filters
7.3.2.1
Type filters
7.3.2.2
Value filters
7.3.3 Attribute filtering
7.4 Judgments for type
matching
7.4.1 Matches
7.4.2 Subtype
7.5 Judgments
for sequences of item types
7.6 Judgments for type
promotion
7.7 Judgments
for the validate expression
7.7.1 Builtin attributes
7.7.2 Mixed content
7.7.3 Type resolution
7.7.4 Interleaving
7.7.5 Erasure
7.7.5.1
Simply erases
7.7.5.2
Erases
7.7.6 Annotate
7.7.6.1
Simply annotate
7.7.6.2
Nil-annotate
7.7.6.3
Annotate
7.7.7 Validates as
8 Importing Schemas
8.1 Introduction
8.1.1 Features
8.1.2 Organization
8.1.3 Main mapping rules
8.1.4 Special attributes
8.1.4.1
use
8.1.4.2
minOccurs and maxOccurs
8.1.4.3
mixed
8.1.4.4
nillable
8.1.4.5
substitutionGroup
8.1.5 Anonymous type names
8.2 Attribute
Declarations
8.2.1 Global attributes declarations
8.2.2 Local attribute declarations
8.3 Element
Declarations
8.3.1 Global element declarations
8.3.2 Local element declarations
8.4 Complex Type
Definitions
8.4.1 Global complex type
8.4.2 Local complex type
8.4.3 Complex type with simple content
8.4.4 Complex type with complex content
8.5 Attribute
Uses
8.6 Attribute Group
Definitions
8.6.1 Attribute group definitions
8.6.2 Attribute group reference
8.7 Model Group
Definitions
8.8 Model
Groups
8.8.1 All groups
8.8.2 Choice groups
8.8.3 Sequence groups
8.9 Particles
8.9.1 Element reference
8.9.2 Group reference
8.10 Wildcards
8.10.1 Attribute wildcards
8.10.2 Element wildcards
8.11 Identity-constraint Definitions
8.12 Notation
Declarations
8.13 Annotation
8.14 Simple Type
Definitions
8.14.1 Global simple type definition
8.14.2 Local simple type definition
8.14.3 Simple type content
8.15 Schemas as a
whole
8.15.1 Schema
8.15.2 Include
8.15.3 Redefine
8.15.4 Import
A Normalized core
grammar
A.1 Core lexical
structure
A.1.1 Syntactic Constructs
A.2 Core
BNF
B Functions and
Operators
B.1 Functions and Operators used in
the Formal Semantics
B.2 Mapping
of Overloaded Internal Functions
C References
C.1 Normative
References
C.2 Non-normative
References
C.3 Background
References
D XQuery 1.0 and XPath 2.0
Issues
This document defines the formal semantics of XQuery 1.0 and XPath 2.0. The present document is part of a set of documents that together define the XQuery 1.0 and XPath 2.0 languages:
[XQuery 1.0: A Query Language for XML] introduces the XQuery 1.0 language, defines its capabilities from a user-centric view, and defines the language syntax.
[XML Path Language (XPath) 2.0] introduces the XPath 2.0 language, defines its capabilities from a user-centric view, and defines the language syntax.
[XQuery 1.0 and XPath 2.0 Functions and Operators] lists the functions and operators defined for the [XPath/XQuery] language and specifies to which arguments they can be applied and what the result should be.
[XQuery 1.0 and XPath 2.0 Data Model] formally specifies the data model used by [XPath/XQuery] to represent the content of XML documents. The [XPath/XQuery] language is formally defined by operations on this data model.
[XQuery 1.0 and XPath 2.0 Data Model Serialization] specifies how [XPath/XQuery] data model values are serialized back into XML.
The scope and goals for the [XPath/XQuery] language are discussed in the charter of the W3C [XSL/XML Query] Working Group and in the [XPath/XQuery] requirements [XML Query 1.0 Requirements].
This document defines the semantics of [XPath/XQuery] by giving a precise formal meaning to each of the expressions of the [XPath/XQuery] specification in terms of the [XPath/XQuery] data model. This document assumes that the reader is already familiar with the [XPath/XQuery] language.
Two important design aspects of [XPath/XQuery] are that it is functional and that it is typed. These two aspects play an important role in the [XPath/XQuery] Formal Semantics.
[XPath/XQuery] is a functional language. [XPath/XQuery] is built from expressions, rather than statements. Every construct in the language (except for the XQuery query prolog) is an expression and expressions can be composed arbitrarily. The result of one expression can be used as the input to any other expression, as long as the type of the result of the former expression is compatible with the input type of the latter expression with which it is composed. Another aspect of the functional approach is that variables are always passed by value and their value cannot be modified through side-effects.
[XPath/XQuery] is a typed language. Types can be imported from one or more XML Schemas that describe the input documents and the output document, and the [XPath/XQuery] language can then perform operations based on these types. In addition, [XPath/XQuery] supports static type analysis. This means a static analysis phase is defined on [XPath/XQuery] expressions that infers the output type of an expression, based on the type of its inputs. Static typing allows early detection of type errors, and can be used as the basis for certain forms of optimization. The [XPath/XQuery] type system captures most of the features of [XML Schema Part 1], including global and local element and attribute declarations, complex and simple type definitions, named and anonymous types, derivation by restriction, extension, list and union, substitution groups, and wildcard types. It does not model uniqueness constraints and facet constraints on simple types.
This document is organized as follows. [2 Preliminaries] introduces the notations used to define the [XPath/XQuery] formal semantics. These include the formal notations for values in the [XPath/XQuery] data model and for types in XML Schema. The next three sections: [3 Basics], [4 Expressions], and [5 Modules and Prologs] have the same structure as the corresponding sections in the [XQuery 1.0: A Query Language for XML] and [XML Path Language (XPath) 2.0] documents. This allows the reader to quickly find the formal definition of a particular language construct. [3 Basics] defines the semantics for basic [XPath/XQuery] concepts, and [4 Expressions] defines the dynamic and static semantics of each [XPath/XQuery] expression. [5 Modules and Prologs] defines the semantics of the [XPath/XQuery] prolog. [6 Additional Semantics of Functions] defines the static semantics of several functions in [XQuery 1.0 and XPath 2.0 Functions and Operators] and gives the dynamic and static semantics of several supporting functions used in this document. The remaining sections, [7 Auxiliary Judgments] and [8 Importing Schemas], contain material that supports the formal semantics of [XPath/XQuery]. [7 Auxiliary Judgments] defines formal judgments that relate data model values to types, that relate types to types, and that support the formal definition of validation. These judgements are used in the definition of expressions in [4 Expressions]. Lastly, [8 Importing Schemas], specifies how XML Schema documents are imported into the [XPath/XQuery] type system and relates XML Schema types to the [XPath/XQuery] type system.
This section provides the background necessary to understand the [XPath/XQuery] Formal Semantics and introduces the notations that are used.
Why a Formal Semantics? The goal of the formal semantics is to complement the [XPath/XQuery] specification ([XQuery 1.0: A Query Language for XML] and [XML Path Language (XPath) 2.0]), by defining the meaning of [XPath/XQuery] expressions with mathematical rigor.
A rigorous formal semantics clarifies the intended meaning of the English specification, ensures that no corner cases are left out, and provides a reference for implementation.
Why use formal notations? Rigor is achieved by the use of formal notations to represent [XPath/XQuery] objects such as expressions, XML values, and XML Schema types, and by the systematic definition of the relationships between those objects to reflect the meaning of the language. In particular, the dynamic semantics relates [XPath/XQuery] expressions to the XML value to which they evaluate, and the static semantics relates [XPath/XQuery] expressions to the XML Schema type that is infered for that expression.
The Formal Semantics uses several kinds of formal notations to define the relationships between [XPath/XQuery] expressions, XML values, and XML Schema types. This section contains a small tutorial to introduce the notations for judgments, inference rules, and mapping rules as well as the notation for environments, which implement the dynamic and static contexts. The reader already familiar with these notations can skip this section and continue with [2.2 XML Values].
Grammar productions are used to describe "objects" (values, types, [XPath/XQuery] expressions, etc.) manipulated by the Formal Semantics. The Formal Semantics makes use of several kinds of grammar productions.
XQuery grammar productions describe the XQuery language and expressions. XQuery productions are identified by a number, which corresponds to the number in the [XQuery 1.0: A Query Language for XML] document, and are annotated with "(XQuery)". For instance, the following production describes FLWR expressions in XQuery.
| [41 (XQuery)] | FLWORExpr |
::= | |
For the purpose of this document, the differences between the XQuery 1.0 and the XPath 2.0 grammars are mostly irrelevant. By default, this document uses XQuery 1.0 grammar productions. Whenever the grammar for XPath 2.0 differs from the one for XQuery 1.0, the corresponding XPath 2.0 productions are also given. XPath productions are identified by a number, which corresponds to the number in [XML Path Language (XPath) 2.0], and are annotated with "(XPath)". For instance, the following production describes for expressions in XPath.
| [25 (XPath)] | ForExpr |
::= | |
XQuery Core grammar productions describe the XQuery Core. The complete XQuery Core grammar is given in [A Normalized core grammar]. XQuery Core productions are identified by a number, which corresponds to the number in [A Normalized core grammar], and are annotated by "(Core)". For instance, the following production describes the simpler form of "for" expressions present in the XQuery Core.
| [35 (Core)] | ForExpr |
::= | |
The Formal Semantics sometimes needs to manipulate "objects" (values, types, expressions, etc.) for which there is no existing grammar production in the [XQuery 1.0: A Query Language for XML] document. In these cases, specific grammar productions are introduced. Notably, additional productions are used to describe values in the [XQuery 1.0 and XPath 2.0 Data Model], and to describe the [XPath/XQuery] type system. Formal Semantics productions are identified by a number, and are annotated by "(Formal)". For instance, the following production describes global type definitions in the [XPath/XQuery] type system.
| [43 (Formal)] | Definition |
::= | |
Note that grammar productions that are specific to the Formal Semantics (i.e., with the "(Formal)" annotation) are not part of [XPath/XQuery]. They are not accessible to the user and are only used in the course of defining the language's semantics.
The basic building block of the formal specification is called a judgment. A judgment expresses whether a property holds or not.
For example:
Notation
The judgment
holds if the object Painting is beautiful.
Notation
Here are three judgments that are used extensively in this document.
The judgment
holds if the expression Expr yields (or evaluates to) the value Value.
The judgment
holds when the expression Expr has type Type.
The judgment
holds when the expression Expr raises the error Error.
A judgment can contain symbols and patterns.
Symbols are purely syntactic and are used to write the judgment itself. In general, symbols in a judgment are chosen to reflect its meaning. For example, 'is beautiful', '=>' and ':' are symbols, the second and third of which should be read "yields", and "has type" respectively.
Patterns are written with italicized words. The name of a pattern is significant: each pattern name corresponds to an "object" (a value, a type, an expression, etc.) that can be substituted legally for the pattern. By convention, all patterns in the Formal Semantics correspond to grammar non-terminals, and are used to represent entities that can be constructed through application of the corresponding grammar production. For example, Expr represents any [XPath/XQuery] expression, and Value represents any value in the [XPath/XQuery] data model.
When applying the judgment, each pattern must be instantiated to an appropriate sort of "object" (value, type, expression, etc). For example, '3 => 3' and '$x+0 => 3' are both instances of the judgment 'Expr => Value'. Note that in the first judgment, '3' corresponds to both the expression '3' (on the left-hand side of the => symbol) and to the the value '3' (on the right-hand side of the => symbol).
Patterns may appear with subscripts (e.g. Expr1, Expr2) to distinguish different instances of the same sort of pattern. Each distinct pattern must be instantiated to a single "object" (value, type, expression, etc.). If the same pattern occurs twice in a judgment description then it should be instantiated with the same "object". For example, '3 => 3' is an instance of the judgment 'Expr1 => Expr1' but '$x+0 => 3' is not since the two expressions '$x+0' and '3' cannot be both instance of the pattern Expr1. The judgment'$x+0 => 3' is an instance of the judgment 'Expr1 => Expr2'.
In a few cases, patterns may have a name which is not exactly the name of a grammar production but is based on it. For instance, a BaseTypeName is a pattern which stands for a type name, as would TypeName, or TypeName2. This usage is limited, and only occurs to improve the readability of some of the inference rules.
Inference rules are used to specify whether a judgment holds or not. Inference rules express the logical relation between judgments and describe how complex judgments can be concluded from simpler premise judgments.
A logical inference rule is written as a collection of premises and a conclusion, respectively written above and below a dividing line:
| premise1 ... premisen |
|
|
| conclusion |
All premises and the conclusion are judgments. The interpretation of an inference rule is: if all the premise judgments above the line hold, then the conclusion judgment below the line must also hold.
Here is a simple example of inference rule, which uses the example judgment 'Expr => Value' from above:
| $x => 0 3 => 3 |
|
|
| $x + 3 => 3 |
This inference rule expresses the following property of the judgment 'Expr => Value': if the variable expression '$x' yields the value '0', and the literal expression '3' yields the value '3', then the expression '$x + 3' yields the value '3'.
It is also possible for an inference rule to have no premises above the line to have no judgments at all; this simply means that the expression below the line always holds:
|
|
| 3 => 3 |
This inference rule expresses the following property of the judgment 'Expr => Value': evaluating the literal expression '3' always yields the value '3'.
The two above rules are expressed in terms of specific variables and values, but usually rules are more abstract. That is, the judgments they relate contain patterns. For example, here is a rule that says that for any variable Variable that yields the integer value Integer, adding '0' yields the same integer value:
| Variable => Integer |
|
|
| Variable + 0 => Integer |
As in a judgment, each pattern in a particular inference rule must be instantiated to the same "object" within the entire rule. This means that one can talk about "the value of Variable" instead of the more precise "what Variable is instantiated to in (this particular instantiation of) the inference rule".
Note
In effect, inference rules are just a notation that describes a bottom-up algorithm, for instance an evaluation algorithm where the result of an expression depends on the result for its sub-expressions.
Logical inference rules use environments to record information computed during static type analysis or dynamic evaluation so that this information can be used by other logical inference rules. For example, the type signature of a user-defined function in a [expression/query] prolog can be recorded in an environment and used by subsequent rules. Similarly, the value assigned to a variable within a "let" expression can be captured in an environment and used for further evaluations.
An environment is a dictionary that maps a symbol (e.g., a function name or a variable name) to an "object" (e.g., a function body, a type, a value). One can either access existing information from an environment, or update the environment.
If "env" is an environment, then "env(symbol)" denotes the "object" to which symbol is mapped. The notation is intentionally akin to function application as an environment can be seen as a function from the argument symbol to the "object" to which the symbol is mapped.
This document uses environment groups that group related environments. If "env" is an environment group with the member "mem", then that environment is denoted "env.mem" and the value that it maps symbol to is denoted "env.mem(symbol)".
Updating is only defined on environment groups:
"env + mem(symbol => object) " denotes the new environment group that is identical to env except that the mem environment has been updated to map symbol to object. The notation symbol => object indicates that symbol is mapped to object in the new environment.
If the "object" is a type then the following notation relates a symbol to a type: "env + mem(symbol : object) ".
The following shorthand is also allowed: "env + mem( symbol1 => object1 ; ... ; symboln => objectn ) " in which each symbol is mapped to a corresponding object in the new environment.
This notation is equivalent to nested updates, as in " (env + mem( symbol1 => object1) + ... ) + mem(symboln => objectn)".
Note that updating the environment overrides any previous binding that might exist for the same name. Updating the environment is used to capture the scope of a symbol (e.g., a variable, a namespace prefix, etc.) Also, note that there are no operations to remove entries from environments: this is never necessary because updating an the environment group effectively creates a new extended copy of the original environment group, and the original environme group remains accessible along with the updated copy.
Environments are typically used as part of a judgment, to capture some of the context in which the judgment is computed. Indeed, most judgments are computed assuming that some environment is given. This assumption is denoted by prefixing the judgment with "env |-". The "|-" symbol is called a "turnstile" and is used in almost all inference rules.
For instance, the judgment
is read as: Assuming the dynamic environment dynEnv, the expression Expr yields the value Value.
The two main environments used in the Formal Semantics are: a dynamic environment (dynEnv), which captures the [XPath/XQuery]'s dynamic context, and a static environment (statEnv), which captures the [XPath/XQuery]'s static context. Both are defined in [3.1 Expression Context].
Putting the above notations together, here is an example of an inference rule that occurs later in this document:
This rule is read as follows: if two expressions Expr1 and Expr2 are known to have the static types types Type1 and Type2 (the two premises above the line), then it is the case that the expression below the line "Expr1 , Expr2" must have the static type "Type1, Type2", which is the sequence of types Type1 and Type2.
The above inference rule, does not modify the (static) environment. The following rule defines the static semantics of a "let" expression. The binding of the new variable is captured by an update to the varType component of the original static environment.
| statEnv |- Expr1 : Type1 statEnv + varType(QName : Type1) |- Expr2 : Type2 |
|
|
statEnv |-
let $QName :=
Expr1
return Expr2 : Type2 |
This rule is as follows: First, the type Type1 for the "let" input expression Expr1 is computed. Second the "let" variable is added into the varType component of the static environment group statEnv, with type Type1. Finally, the type Type2 of Expr2 is computed in that new environment.
Ed. Note: Jonathan suggests that we should explain 'chain' inference rules. I.e., how several inference rules are applied recursively.
[XPath/XQuery] manipulates XML values as defined in the [XQuery 1.0 and XPath 2.0 Data Model]. XML values are composed of nodes, atomic values and sequences. This section introduces formal notations for describing [XPath/XQuery] values from [XQuery 1.0 and XPath 2.0 Data Model]. These notations are used to describe and manipulate values in inference rules, but are not exposed to the [XPath/XQuery] user.
A value is a sequence of zero or more items. An item is either an atomic value or a node.
An atomic value is a value in the value space of an
atomic type and is labeled with the name of that atomic
type. An XML Schema atomic type [XML Schema Part 2] may be
primitive or derived, or
xdt:untypedAtomic.
A node is either an element, an attribute, a
document, a text, a comment, or a
processing-instruction node. Elements have a type
annotation and contain a value. Attributes have a type
annotation and contain a simple value, which is a
sequence of atomic values. Text nodes always contain
one string value of type
xdt:untypedAtomic, therefore the
corresponding type annotation is omitted.
A type annotation can be either the QName of a declared type or an anonymous type. An anonymous type corresponds to an XML Schema type for which the schema writer did not provide a name. Anonymous type names are not visible to the user, but are generated during schema validation and used to annotate nodes in the data model. By convention, anonymous type names are written in the Formal Semantics as: [Anon0], [Anon1], etc.
Untyped elements (e.g., from well-formed documents)
are annotated with xs:anyType, untyped
attributes are annotated with
xs:anySimpleType, and untyped atomic
values (i.e., text content or attribute content in
well-formed documents) are annotated with
xdt:untypedAtomic.
Element have an optional "nilled" marker. This
marker can only be present if the element has been
validated against an element type in the schema which
is "nillable", and they have no content and an
attribute xsi:nil set to
"true".
Notation
In the above grammar, "String" indicates the value
space of xs:string, "Decimal" indicates
the value space of xs:decimal, etc.
Note that the same rule about constructing sequences
apply to the values described by that grammar. Notably
sequences cannot be nested. For example, the sequence
(10, (1, 2), (), (3, 4)) is equivalent to
the sequence (10, 1, 2, 3, 4).
Ed. Note: Issue: The formal semantics does not represent namespace nodes (See Issue 486. (FS-Issue-0143)).
A well-formed document
<fact>The cat weighs <weight units="lbs">12</weight> pounds.</fact>
In the absence of a Schema, this document is represented as
element fact of type xs:anyType {
text { "The cat weighs " },
element weight of type xs:anyType {
attribute units of type xs:anySimpleType {
"lbs" of type xdt:untypedAtomic
}
text { "12" }
},
text { " pounds." }
}
A document before and after validation.
<weight xsi:type="xs:integer">42</weight>
The formal model for values can represent values before and after validation. Before validation, this element is represented as:
element weight of type xs:anyType {
attribute xsi:type of type xs:anySimpleType {
"xs:integer" of type xdt:untypedAtomic
},
text { "42" }
}
After validation, this element is represented as:
element weight of type xs:integer {
attribute xsi:type of type xs:QName {
"xs:integer" of type xs:QName
},
42 of type xs:integer
}
An element with a list type
<sizes>1 2 3</sizes>
Before validation, this element is represented as:
element sizes of type xs:anyType {
text { "1 2 3" }
}
Assume the following Schema.
<xs:element name="sizes" type="sizesType"/>
<xs:simpleType name="sizesType">
<xs:list itemType="sizeType"/>
</xs:simpleType>
<xs:simpleType name="sizeType">
<xs:restriction base="xs:integer"/>
</xs:simpleType>
After validation against this Schema, the element is represented as:
element sizes of type sizesType {
1 of type sizeType,
2 of type sizeType,
3 of type sizeType
}
An element with an anonymous type
<sizes>1 2 3</sizes>
Before validation, this element is represented as:
element sizes of type xs:anyType {
text { "1 2 3" }
}
Assume the following Schema.
<xs:element name="sizes">
<xs:simpleType>
<xs:list itemType="xs:integer"/>
</xs:simpleType>
</xs:element>
After validation, this element is represented as:
element sizes of type [Anon1] {
1 of type xs:integer,
2 of type xs:integer,
3 of type xs:integer
}
where [Anon1] stands for the internal
anonymous name generated by the system for the
sizes element.
An element with a nillable
<sizes xsi:nil="true"/>
Before validation, this element is represented as:
element sizes of type xs:anyType {
attribute xsi:nil of type xs:anySimpleType { "true" }
}
Assume the following Schema.
<xs:element name="sizes" type="sizesType" nillable="true"/>
After validation against this Schema, the element is represented as:
element sizes nilled of type sizesType {
attribute xsi:nil of type xs:anySimpleType { "true" }
}
An element with a union type
<sizes>1 two 3 four</sizes>
Before validation, this element is represented as:
element sizes of type xs:anyType {
text { "1 two 3 four" }
}
Assume the following Schema:
<xs:element name="sizes" type="sizesType"/>
<xs:simpleType name="sizesType">
<xs:list itemType="sizeType"/>
</xs:simpleType>
<xs:simpleType name="sizeType">
<xs:union memberType="xs:integer xs:string"/>
</xs:simpleType>
After validation against this Schema, the element is represented as:
element sizes of type sizesType {
1 of type xs:integer,
"two" of type xs:string,
3 of type xs:integer,
"four" of type xs:string
}
The [XPath/XQuery] type system is used in the specification of the dynamic and of the static semantics of [XPath/XQuery]. This section introduces formal notations for describing types.
The [XPath/XQuery] type system is based on XML Schema. Formalizing the treatment of types in [XPath/XQuery], however, requires some adjustments.
Use of formal notations for types. The Formal Semantics uses formal notations for types instead of XML Schema syntax. Those notations are used extensively to describe and manipulate types in the inference rules. The formal notations for types introduced here are not exposed to the [XPath/XQuery] user.
Representation of content models. For the
purpose of static typing, the [XPath/XQuery] type
system only describes minOccurs, maxOccurs combinations
which corresponds to the standard DTD constructs
+, *, and ?.
Choices are represented using the standard DTD
construct |. All groups are represented
using the & notation.
Representation of anonymous types. To clarify the semantics, the [XPath/XQuery] type system makes all anonymous types explicit.
Representation of XML Schema simple type facets and identity constraints. For simplicity, XML Schema simple type facets as well as indentity constraints are not formally represented in the [XPath/XQuery] type system. Still, [XPath/XQuery] implementation supporting XML Schema import and validation must follow the XML Schema specification and take simple type facets and indentity constraints into account.
The complete mapping from XML Schema into the [XPath/XQuery] type system is given in [8 Importing Schemas]. The rest of this section is organized as follows. [2.3.2 Item types] describes types items, [2.3.3 Content models] describes content models, and [2.3.4 Top level definitions] describe top-level type declarations.
An item type is either an atomic type, an element type, an attribute type, a document node type, a text node type, a comment node type, or a processing instruction type.
An element or attribute type has an optional name and an optional type reference. A name alone corresponds to a reference to a global element or attribute declaration. A name with a type reference corresponds to a local element or attribute declaration. The word "element" or "attribute" alone refers to the wildcard types any element or any attribute. In addition, an element type has an optional nillable flag which indicates whether the element can be nilled or not.
A document type has an optional content type. If no content type is given, then it refers to the wildcard type describing any document.
Note
Note that generic node types (e.g.,
node()), are interpreted in the type
system as union types (e.g., element | attribute
| text | comment | processing-instruction) and
therefore do not appear here. The semantics of sequence
types is described in [3.4.3.1
SequenceType Matching].
Examples
A text node type
text
matches any text node, such as:
text { "Text is beautiful" }
A wildcard element
element
matches any element.
A wildcard element of type string
element of type xs:integer
matches any element of type string, such as:
element name of type xs:string { "John Doe" }
A nillable element of type string
element size nillable of type xs:integer
matches any element with name size of
type xs:integer, such as:
element size of type xs:integer {
2 of type xs:integer
}
or it matches any element with name
size which has no content and the
xsi:nil attribute set to true, such
as:
element size of type xs:integer {
attribute xsi:nil of type xs:anySimpleType { "true" }
}
A reference to a globally declared attribute
attribute sizes
refers to the global declaration for the attribute
sizes.
An element with an anonymous type
element sizes of type [Anon1]
matches any element with name sizes and
the anonymous type [Anon1], such as:
element sizes of type [Anon1] {
1 of type xs:integer,
2 of type xs:integer,
3 of type xs:integer
}
Following XML Schema, types in [XPath/XQuery] are
composed from item types by optional, one or more, zero
or more, all group, sequence, choice, empty sequence,
or empty choice (written none).
The type empty matches the empty
sequence. The type none matches no values.
It is called the empty choice because it is the
identity for choice, that is (Type |
none) = Type)). The type
none is the static type for [6.2.1 The fn:error
function].
| [28 (Formal)] | Type |
::= | |
| [29 (Formal)] | Occurrence |
::= | |
The [XPath/XQuery] type system includes three binary operators on types: ",", "|" and "&", corresponding respectively to sequence, choice and all groups in Schema. The [XPath/XQuery] type system includes three unary operators on types: "*", "+", and "?", corresponding respectively to zero or more instances of the type, one or more instances of the type, or an optional instance of the type.
The "&" operator builds the "interleaved
product" of two types. The type Type1 &
Type2 matches
any sequence that is an interleaving of a sequence that
matches Type1
and a sequence that matches Type2.
The interleaved product is used to represent all groups in XML Schema. All group in XML Schema are restricted to apply only on global or local element declarations with lower bound 0 or 1, and upper bound 1.
Examples
A sequence of elements
The "," operator builds the "sequence" of two types. For example,
element title of type xs:string, element year of type xs:integer
is a sequence of an element title of type string followed by an element year of type integer.
The union of two element types
The "|" operator builds the "union" of two types. For example,
element editor of type xs:string | element bib:author
means either an element editor of type string, or a
reference to the gobal element
bib:author.
An all group of two elements
The "&" operator builds the "interleaved product" of two types. For example,
(element a & element b) =
element a, element b
| element b, element a
which specifies that the a and
b elements can occur in any order.
An empty type
The following type matches the empty sequence.
empty
A sequence of zero or more elements
The following type matches zero or more elements
each of which can be a surgeon or a
plumber.
(element surgeon | element plumber)*
Top level definitions correspond to global element declarations, global attribute declarations and type definitions in XML Schema.
A type definition has a name (possibly anonymous) and a type derivation. In the case of a complex type, or a simple type derived by list or uniont, derivation indicates if the type is derived by extension or restriction from its based type, gives its content model, and an optional flag indicating if it has mixed content. In the case of an atomic type, it just indicate from which type it is derived from.
A type is derived either by extension or restriction
from a base type. In case the type derivation is
omitted, the type derives by restriction from
xs:anyType, as in:
define type Bib { xs:integer } =
define type Bib restricts xs:anyType { xs:integer }
Empty content can be indicated with the explicit empty sequence, or omitted, as in:
define type Bib { } =
define type Bib { empty }
Global element and attribute declarations have always a name, and a reference to a (possibly anonymous) type. In addition, a global element declaration may declare a substitution group for the element and whether the element is nillable.
Examples
A type declaration with complex content
define type Address {
element name of type xsd:string,
element street of type xsd:string*
}
A type declaration with complex content derived by extension
define type USAddress extends Address {
element zip name of type xsd:integer
}
A type declaration with mixed content
define type Section mixed {
(element h1 of type xsd:string |
element p of type xsd:string |
element div of type Section)*
}
A type declaration with simple content derived by restriction
define type SKU restricts xsd:string { xsd:string }
An element declaration
define element address of type Address
An element declaration with a substitution group
define element usaddress substitutes for address of type USAddress
An element declaration which is nillable
define element zip nillable of type xs:integer
Here is a schema describing purchase orders, taken from the XML Schema Primer, followed by its mapping into the [XPath/XQuery] type system. The complete mapping from XML Schema into the [XPath/XQuery] type system is given in [8 Importing Schemas].
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:annotation>
<xsd:documentation xml:lang="en">
Purchase order schema for Example.com.
Copyright 2000 Example.com. All rights reserved.
</xsd:documentation>
</xsd:annotation>
<xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
<xsd:element name="comment" type="xsd:string"/>
<xsd:complexType name="PurchaseOrderType">
<xsd:sequence>
<xsd:element name="shipTo" type="USAddress"/>
<xsd:element name="billTo" type="USAddress"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="items" type="Items"/>
</xsd:sequence>
<xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
<xsd:complexType name="USAddress">
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="street" type="xsd:string"/>
<xsd:element name="city" type="xsd:string"/>
<xsd:element name="state" type="xsd:string"/>
<xsd:element name="zip" type="xsd:decimal"/>
</xsd:sequence>
<xsd:attribute name="country" type="xsd:NMTOKEN"
fixed="US"/>
</xsd:complexType>
<xsd:complexType name="Items">
<xsd:sequence>
<xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="productName" type="xsd:string"/>
<xsd:element name="quantity">
<xsd:simpleType>
<xsd:restriction base="xsd:positiveInteger">
<xsd:maxExclusive value="100"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="USPrice" type="xsd:decimal"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="partNum" type="SKU" use="required"/>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
<!-- Stock Keeping Unit, a code for identifying products -->
<xsd:simpleType name="SKU">
<xsd:restriction base="xsd:string">
<xsd:pattern value="\d{3}-[A-Z]{2}"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:schema>
namespace xsd = "http://www.w3.org/2001/XMLSchema"
define element purchaseOrder of type ipo:PurchaseOrderType
define element comment of type xsd:string
define type PurchaseOrderType {
attribute orderDate of type xsd:date?,
element shipTo of type USAddress,
element billTo of type USAddress,
element ipo:comment?,
element items of type Items
}
define type USAddress {
attribute country of type xsd:NMTOKEN?,
element name of type xsd:string,
element street of type xsd:string,
element city of type xsd:string,
element state of type xsd:string,
element zip of type xsd:decimal
}
define type Items {
attribute partNum of type SKU,
element item of type [Anon1]*
}
define type [Anon1] {
element productName of type xsd:string,
element quantity of type [Anon2],
element USPrice of type xsd:decimal,
element comment?,
element shipDate of type xsd:date?
}
define type [Anon2] restricts xsd:positiveInteger { xsd:positiveInteger }
define type SKU restrict xsd:string { xsd:string }
Note the way the two anonymous types inside the
item element are handled by giving them
anonymous names [Anon1] and
[Anon2].
The following additional definitions illustrate how more advanced XML Schema features (a complex type derived by extension, an anonymous simple type derived by restriction, and substitution groups) are represented in the [XPath/XQuery] type system.
<complexType name="NYCAddress">
<complexContent>
<extension base="USAddress">
<sequence>
<element ref="apt"/>
</sequence>
</extension>
</complexContent>
</complexType>
<element name="apt"/>
<xsd:simpleType>
<xsd:restriction base="xsd:positiveInteger">
<xsd:maxExclusive value="10000"/>
</xsd:restriction>
</xsd:simpleType>
<element>
<element name="usaddress" substitutionGroup="address" type="USAddress"/>
<element name="nycaddress" substitutionGroup="usaddress" type="NYCAddress"/>
Those definitions are written in the [XPath/XQuery] type system as follows.
define type NYCAddress extends USAddress {
element apt
}
define element apt of type [Anon3]
define type [Anon3] restricts xsd:positiveInteger
define element usaddress substitutes for address of type USAddress
define element nycaddress substitutes for usaddress of type NYCAddress
This section defines a processing model for [XPath/XQuery], then defines formal judgments for each key phase in that processing model (normalization, static type analysis and dynamic evaluation).
This processing model is not intended to describe an actual implementation, although a naive implementation might be based upon it. It does not prescribe an implementation technique, but any implementation should produce the same results as obtained by following this processing model and applying the rest of the Formal Semantics specification.
The processing model consists of five phases; each phase consumes the result of the previous phase and generates output for the next phase. For each processing phase, we point to the relevant notations introduced later in the document.
Parsing. The grammar for the [XPath/XQuery] syntax is defined in [XQuery 1.0: A Query Language for XML]. Parsing may generate syntax errors. If no error occurs, an internal abstract syntax tree of the parsed query is created.
Context Processing. The semantics of [expression/query] depends on the input context. The input context needs to be generated before the [expression/query] can be processed. In XQuery, the input context may be defined by the processing environment and by statements in the Query Prolog (See [5 Modules and Prologs]). In XPath, the input context is defined by the processing environment. The input context has a static and a dynamic part (denoted statEnv and dynEnv, respectively).
Normalization. To simplify the semantics
specification, some normalization is performed on
the [expression/query]. The [XPath/XQuery] language
provides many powerful features that make
[expression/query]s simpler to write and use, but
are also redundant. For instance, a complex
for expression might be rewritten as a
composition of several simple for
expressions. The language composed of these simpler
[expression/query] is called the [XPath/XQuery]
Core language and is described by a
grammar which is a subset of the XQuery grammar.
The grammar of the [XPath/XQuery] Core language is
given in [A Normalized core
grammar].
During the normalization phase, each [XPath/XQuery] [expression/query] is mapped into its equivalent [expression/query] in the core. (Note that this has nothing to do with Unicode Normalization, which works on character strings.) Normalization works by bottom-up application of normalization rules over expressions, starting with normalization of literal expressions and variables.
Specifically the normalization phase is defined in terms of the static part of the context (statEnv) and a [expression/query] (Expr) abstract syntax tree. Formal notations for the normalization phase are introduced in [2.4.2 Normalization judgment].
After normalization, the full semantics is obtained by giving a semantics to the normalized Core [expression/query]. This is done during the last two phases.
Static type analysis. Static type analysis is optional. If this phase is not supported, then normalization is followed directly by dynamic evaluation.
Static type analysis checks whether each [expression/query] is type safe, and if so, determines its static type. Static type analysis is defined only for Core [expression/query]. Static type analysis works by bottom-up application of type inference rules over expressions, taking the type of literals and of input documents into account.
If the [expression/query] is not type-safe, static type analysis yields a type error. For instance, a comparison between an integer value and a string value might be detected as an type error during the static type analysis. If static type analysis succeeds, it yields an abstract syntax tree where each sub-expression is "annotated" with its static type
More precisely, the static analysis phase is defined in terms of the static context (statEnv) and a core [expression/query] (CoreExpr). Formal notations for the static analysis phase are introduced in [2.4.3 Static typing judgment].
Static typing does not imply that the content of XML documents must be rigidly fixed or even known in advance. The [XPath/XQuery] type system accommodates "flexible" types, such as elements that can contain any content. Schema-less documents are handled in [XPath/XQuery] by associating a standard type with the document, such that it may include any legal XML content.
Dynamic Evaluation. This phase computes the value of an [expression/query]. The semantics of evaluation is defined only for Core [expression/query] terms. Evaluation works by bottom-up application of evaluation rules over expressions, starting with evaluation of literals and variables. Evaluation may result in a value OR a dynamic error, which may be a non-type error or a type error. If static typing of an expression does not raise a type error, then dynamic evaluation of the same expression will not raise a type error, although dynamic evaluation may raise some non-type error.
The dynamic evaluation phase is defined in terms of the static context (statEnv) and evaluation context (dynEnv), and a core [expression/query] (CoreExpr). Formal notations for the dynamic evaluation phase are introduced in [2.4.4 Dynamic evaluation judgment].
The first four phases above are "analysis-time" (sometimes also called "compile-time") steps, meaning that they can be performed on a [expression/query] before examining any input document. Indeed, analysis-time processing can be performed before input documents even exist. Analysis-time processing can detect many errors early-on, e.g., syntax errors or type errors. If no error occurs, the result of analysis-time processing could be some compiled form of [expression/query], suitable for execution by a compiled-[expression/query] processor. The last phase is an "execution-time" (sometimes also called "run-time") step, meaning that the query is evaluated on actual input document(s).
Static analysis catches only certain classes of
errors. For instance, it can detect a comparison
operation applied between incompatible types (e.g.,
xs:int and xs:date). Some
other classes of errors cannot be detected by the
static analysis and are only detected at execution
time. For instance, whether an arithmetic expression on
32 bits integers (xs:int) yields an
out-of-bound value can only be detected at run-time by
looking at the data.
While implementations are free to implement different processing models, the [XPath/XQuery] static semantics relies on the existence of a static type analysis phase that precedes any access to the input data. Statically typed implementations are required to find and report type errors during static analysis, as specified in this document. Dynamically typed implementations are required to find and report type errors during evaluation, but are permitted to report them during static analysis.
Notice that the separation of logical processing into phases is not meant to imply that implementations must separate analysis-time from evaluation-time processing: [XPath/XQuery] processors may choose to perform all phases simultaneously at evaluation-time and may even mix the phases in their internal implementations. The processing model simply defines the final result.
The above processing phases are all internal to the [XPath/XQuery] processor. They do not deal with how the [XPath/XQuery] processor interacts with the outside world, notably how it accesses actual documents and types. A typical [expression/query] engine would support at least three other important processing phases:
XML Schema import phase. The [XPath/XQuery] type system is based on XML Schema. In order to perform dynamic or static typing, the [XPath/XQuery] processor needs to build type descriptions that correspond to the schema(s) of the input documents. This phase is achieved by mapping all schemas required by the [expression/query] into the [XPath/XQuery] type system. The XML Schema import phase is described in [8 Importing Schemas].
XML loading phase. Expressions are evaluated on values in the [XQuery 1.0 and XPath 2.0 Data Model]. XML documents must be loaded into the [XQuery 1.0 and XPath 2.0 Data Model] before the evaluation phase. This is described in the [XQuery 1.0 and XPath 2.0 Data Model] and is not discussed further here.
Serialization phase. Once the [expression/query] is evaluated, processors might want to serialize the result of the [expression/query] as actual XML documents. Serialization of data model instances is described in [XQuery 1.0 and XPath 2.0 Data Model Serialization] and is not discussed further here.
The parsing phase is not specified formally; the formal semantics does not define a formal model for the syntax trees, but uses the [XPath/XQuery] concrete syntax directly. More details about parsing for XQuery 1.0 can be found in the [XQuery 1.0: A Query Language for XML] document and more details about parsing for XPath 2.0 can be found in the [XML Path Language (XPath) 2.0] document. No further discussion of parsing is included here.
Normalization is specified using mapping rules which describe how a [XPath/XQuery] expression is rewritten into an expression in the [XPath/XQuery] Core. Mapping rules are also used in [8 Importing Schemas] to specify how XML Schemas are imported into the [XPath/XQuery] type system.
Notation
Mapping rules are written using a square bracket notation, as follows:
| [Object]Subscript |
| == |
| Mapped Object |
The original "object" is written above the == sign. The rewritten "object" is written beneath the == sign. The subscript is used to indicate what kind of "object" is mapped, and sometimes to pass some information between mapping rules.
(Since normalization is always done in the context of the static context the above is really a shorthand for
We use the shorthand because statEnv is always implied.)
The static environment is used in certain cases (e.g. for normalization of function calls) during normalization. To keep the notation simpler, the static environment is not written in the normalization rules, but it is assumed to be available.
Ed. Note: [Kristoffer/XSL] We should decide whether to use a shorthand notation as suggested or modify the mapping rules throughout.
The normalization rule that is used to map "top-level" expressions in the [XPath/XQuery] syntax into expressions in the [XPath/XQuery] Core is:
| [Expr]Expr |
| == |
| CoreExpr |
which indicates that the expression Expr is normalized to the expression CoreExpr in the [XPath/XQuery] core (with the implied statEnv).
Example
For instance, the following [expression/query]
for $i in (1, 2),
$j in (3, 4)
return
element pair { ($i,$j) }
is normalized to the core expression
for $i in (1, 2) return
for $j in (3, 4) return
return
element pair { ($i,$j) }
in which the complex "FWLR" expression is mapped into a composition of two simpler "for" expressions.
The static semantics is specified using type inference rules, which relate [XPath/XQuery] expressions to types and specify under what conditions an expression is well typed.
Notation
The judgment
holds when, in the static environment statEnv, the expression Expr has type Type.
Example
The result of static type inference is to associate a static type with every [expression/query], such that any evaluation of that [expression/query] is guaranteed to yield a value that belongs to that type.
For instance, the following expression.
let $v := 3 return $v+5
has type xs:integer. This can be
inferred as follows: the input literals '3' and '5'
have type integer, so the variable $v also has type
integer. Since the sum of two integers is an integer,
the complete expression has type integer.
Note
The type of an expression is computed by inference. Static type inference rules define for each kind of expression how to compute the type of the expression given the types of its sub-expressions. Here is a simple example of such a static typing rule:
statEnv |-
Expr1 :
xs:boolean
statEnv |-
Expr2 :
Type2
statEnv |-
Expr3 :
Type3 |
|
|
statEnv |-
if Expr1 then
Expr2
else Expr3 : (
Type2 |
Type3
) |
This rule states that if the conditional expression of an "if" expression has type boolean, then the type of the entire expression is one of the two types of its "then" and "else" clauses. Note that the resulting type is represented as a union: '(Type2|Type3)'.
The "left half" (the part before the :) of the expression below the line corresponds to some [expression/query], for which a type is computed. If the [expression/query] has been parsed into an internal abstract syntax tree, this usually corresponds to some node in that tree. The expression usually has patterns in it (here Expr1, Expr2, and Expr3) that need to be matched against the children of the node in the abstract syntax tree. The expressions above the line indicate things that need to be computed to use this rule; in this case, the types of the condition expression and the two branches of the if-then-else expression. Once those types are computed (by further applying static inference rules recursively to the expressions on each side), then the type of the expression below the line can be computed. This illustrates a general feature of the [XPath/XQuery] type system: the type of an expression depends only on the type of its sub-expressions. The overall static type inference algorithm is recursive, following the abstract syntax of the [expression/query]. At each point in the recursion, an appropriate matching inference rule is sought; if at any point there is no applicable rule, then static type inference has failed and the [expression/query] is not type-safe.
The dynamic, or operational, semantics is specified using value inference rules, which relate [XPath/XQuery] expressions to values, and in some cases specify the order in which an [XPath/XQuery] expression is evaluated.
Notation
The judgment
holds when, in the static environment statEnv and dynamic environment dynEnv, the expression Expr yields the value Value.
The judgment
holds when, in the static environment statEnv and dynamic environment dynEnv, the expression Expr raises the error Error.
The static environment is used in certain cases (e.g. for type matching) during evaluation. To keep the notation simpler, the static environment is not written in the dynamic inference rules, but it is assumed to be available.
Example
For instance, the following expression.
let $v := 3 return $v+5
yields the integer value 8. This can be inferred as follows: the input literals '3' and '5' denote the values 3 and 5, respectively, so the variable $v has the value 3. Since the sum of 3 and 5 is 8, the complete expression has the value 8.
Note
As with static type inference, logical inference rules are used to determine the value of each expression, given the dynamic environment and the values of its sub-expressions.
The inference rules used for dynamic evaluation, like those for static type inference, follow a bottom-up recursive structure, computing the value of expressions from the values of their sub-expressions.
Whenever appropriate, the Formal Semantics uses the following namespace prefixes.
fn: for functions and operators
from the [XQuery
1.0 and XPath 2.0 Functions and Operators]
document.
xs: for XML Schema components, and
built-in types.
xdt: for XML Schema components, and
built-in types.
All these prefixes are assumed to be bound to the appropriate URIs.
In addition, the Formal Semantics uses the following special prefixes for the means of specification.
dm: for accessors of the [XQuery 1.0 and XPath 2.0 Data Model].
op: for operators in [XQuery 1.0 and XPath 2.0 Functions and Operators].
fs: for functions and types defined in the formal semantics.
Those prefixes are always shown in italics to emphasize that the corresponding functions, variables, and types are abstract: they are not and cannot be made accessible directly from the host language. None of those special prefixes are given a URI.
The [XQuery 1.0 and XPath 2.0 Functions and Operators] document defines built-in functions available in [XPath/XQuery]. A number of these functions are used to define the [XPath/XQuery] semantics. The list of functions from the [XQuery 1.0 and XPath 2.0 Functions and Operators] document that are used in the [XPath/XQuery] Formal Semantics is given in [B.1 Functions and Operators used in the Formal Semantics].
Many functions in the [XQuery 1.0 and XPath 2.0
Functions and Operators] document are
generic: they perform operations on arbitrary
components of the data model, e.g., any kind of node,
or any sequence of items. For instance, the
fn:distinct-nodes function removes
duplicates in any sequence of nodes. As a result, the
signature given in the [XQuery 1.0 and XPath 2.0
Functions and Operators] document is also generic.
For instance, the signature of the
fn:distinct-nodes function is:
fn:distinct-nodes(node*) as node*
Applied directly, this signature results in little
type information. For such functions, better type
information can often be obtained by having the output
type depend on the type of input parameters. For
instance, if the function
fn:distinct-nodes is applied on a
parameter of type element a*, element b,
one can easily deduce that the resulting sequence is a
collection of either a or b
elements.
In order to provide better static typing for those functions, specific typing rules are given in [6 Additional Semantics of Functions].
The organization of this section parallels the organization of Section 2 of the [XPath/XQuery] document.
Introduction
The expression context for a given expression consists of all the information that can affect the result of the expression. This information is organized into the static context and the evaluation context. This section specifies the environments that represent the context information used by [XPath/XQuery] expressions.
The environment group statEnv denotes the environments that are available during static analysis. Static analysis may extend parts of the static environment. The static environment is also available during dynamic evaluation.
The following environments are part of the static environment group:
| statEnv.xpath1.0_compatibility |
|
||||||||||||||||
| statEnv.namespace |
|
||||||||||||||||
| statEnv.default_type_namespace |
|
||||||||||||||||
| statEnv.default_function_namespace |
|
||||||||||||||||
| statEnv.typeDefn |
|
||||||||||||||||
| statEnv.elemDecl |
|
||||||||||||||||
| statEnv.attrDecl |
|
||||||||||||||||
| statEnv.varType |
|
||||||||||||||||
| statEnv.localElemDecl |
|
||||||||||||||||
| statEnv.localAttrDecl |
|
||||||||||||||||
| statEnv.funcType |
|
||||||||||||||||
| statEnv.collations |
|
||||||||||||||||
| statEnv.defaultCollation |
|
||||||||||||||||
| statEnv.validationMode |
|
||||||||||||||||
| statEnv.validationContext |
|
||||||||||||||||
| statEnv.baseUri |
|
||||||||||||||||
| statEnv.collectionType |
|
||||||||||||||||
| statEnv.docType |
|
Environments have an initial state when [expression/query] processing begins, containing, for example, the function signatures of all built-in functions.
A common use of the static environment is to expand QNames by looking up namespace prefixes in the statEnv.namespace environment.
The helper function expand is
used to expand QNames by looking up the
namespace prefix in statEnv.namespace. This function
is defined as follows:
statEnv |- expand(QName)
= qname(URI, ncname)
or
statEnv |- expand(QName)
= qname(ncname)
the right hand side is a QName value (not a QName expression).
The helper expand
function is defined as follows:
If QName matches
NCName1:NCName
2, and if statEnv.namespace(NCName
1) = URI, then
the expression yields
qname(URI,NCName2). If the namespace prefix
NCName1
is not found in statEnv.namespace, then the
expression does not apply (that is, the inference
rule will not match).
If QName matches NCName, NCName is an element or type name, and statEnv.default_type_namespace = URI, then the expression yields qname(URI,NCName), where URI is the default namespace in effect.
If QName matches NCName, NCName is a function name, and statEnv.default_function_namespace = URI, then the expression yields qname(URI,NCName), where URI is the default namespace in effect.
Ed. Note: The above rules could be given as proper inference rules defining the
expandjudgment.
Here is an example that shows how the static environment is modified in response to a namespace definition.
This rule reads as follows: "the phrase on the bottom (a namespace declaration followed by a sequence of expressions) is well-typed (accepted by the static type inference rules) within an environment statEnv if the sequence of expressions above the line is well-typed in the environment obtained from statEnv by adding the namespace declaration".
This is the common idiom for passing new information in an environment using sub-expressions. In the case where the environment must be updated with a completely new component, the following notation is used:
statEnv [ namespace = (NewEnvironment) ]The environment group dynEnv denotes the group of environments built and used during dynamic evaluation.
The following environments are part of evaluation environment group:
| dynEnv.funcDefn |
|
|||||
| dynEnv.varValue |
|
|||||
| dynEnv.dateTime |
|
|||||
| dynEnv.timezone |
|
Ed. Note: Somewhere an exact definition of what is in the initial environments is needed. Notice that for XPath this is partially defined by the containing language. See Issue 458 (FS-Issue-0115).
The following Formal Semantics built-in variables represent the context item, context position, and context size properties of the evaluation context:
| Built-in Variable | Represents: |
$fs:dot |
context item |
$fs:position |
context position |
$fs:last |
context size |
Variables with the "fs" namespace prefix are reserved for use in the definition of the Formal Semantics. It is a static error to define a variable in the "fs" namespace.
Values of
$fs:dot,
$fs:position and
$fs:last can be
obtained by invoking the fn:context-item(),
fn:position() and fn:last()
functions, respectively.
[XPath/XQuery] has three functions that provide access
to input data: input,
collection, and document. The
dynamic semantics of these three input functions are
described in more detail in [XQuery 1.0 and XPath 2.0
Functions and Operators]. The static typing for these
function is still an open issue (See Issue 480
(FS-Issue-0137)).
This section of the [XPath/XQuery] document gives background material about the [XPath/XQuery] data model. Formal representation of data model values has been described in [2.2 XML Values].
All the built-in types of XML Schema are recognized
by [XPath/XQuery]. In addition, [XPath/XQuery]
recognizes the predefined types:
xdt:anyAtomicType,
xdt:untypedAtomic,
xdt:yearMonthDuration,
xdt:dayTimeDuration. The representation of
those types in the [XPath/XQuery] type system is given
below.
Definition of xs:anyType. The
following type definition reflects the semantics of the
Ur typefrom Schema in the [XPath/XQuery] type
system.
define type xs:anyType restricts xs:anyType {
attribute*,
( xdt:anyAtomicType* |
( element* & text* & comment* & processing-instruction* ) )
}
Definition of xs:anySimpleType.
The following type definition reflects the semantics of
the Ur simple type from Schema in the [XPath/XQuery]
type system.
define type xs:anySimpleType restricts xs:anyType {
( xs:string
| xs:boolean
| xs:decimal
| xs:float
| xs:double
| xs:duration
| xs:dateTime
| xs:time
| xs:date
| xs:gYearMonth
| xs:gYear
| xs:gMonthDay
| xs:gDay
| xs:gMonth
| xs:hexBinary
| xs:base64Binary
| xs:anyURI
| xs:QName
| xs:NOTATION )*
}
The name of the Ur simple type is
xs:anySimpleType. It is derived by
restriction from xs:anyType, its content
is given by the union of all primitive atomic
types.
Definition of xdt:anyAtomicType.
The following type definition reflects the semantics of
xdt:anyAtomicType in the [XPath/XQuery]
type system.
define type xdt:anyAtomicType restricts xs:anySimpleType {
( xs:string
| xs:boolean
| xs:decimal
| xs:float
| xs:double
| xs:duration
| xs:dateTime
| xs:time
| xs:date
| xs:gYearMonth
| xs:gYear
| xs:gMonthDay
| xs:gDay
| xs:gMonth
| xs:hexBinary
| xs:base64Binary
| xs:anyURI
| xs:QName
| xs:NOTATION )
}
Definition of the XML Schema primitive types. The following type definitions reflects the semantics of the primitive types from Schema in the [XPath/XQuery] type system.
define type xs:string restricts xdt:anyAtomicType define type xs:boolean restricts xdt:anyAtomicType define type xs:decimal restricts xdt:anyAtomicType define type xs:float restricts xdt:anyAtomicType define type xs:double restricts xdt:anyAtomicType define type xs:duration restricts xdt:anyAtomicType define type xs:dateTime restricts xdt:anyAtomicType define type xs:time restricts xdt:anyAtomicType define type xs:date restricts xdt:anyAtomicType define type xs:gYearMonth restricts xdt:anyAtomicType define type xs:gYear restricts xdt:anyAtomicType define type xs:gMonthDay restricts xdt:anyAtomicType define type xs:gDay restricts xdt:anyAtomicType define type xs:gMonth restricts xdt:anyAtomicType define type xs:hexBinary restricts xdt:anyAtomicType define type xs:base64Binary restricts xdt:anyAtomicType define type xs:anyURI restricts xdt:anyAtomicType define type xs:QName restricts xdt:anyAtomicType define type xs:NOTATION restricts xdt:anyAtomicType
All of those primitive types derive from
xdt:anyAtomicType. Note that the value
space of each atomic type (such as
xs:string) does not appear. The value
space for each type is built-in and is as defined in [XML Schema Part 2].
Definition of xdt:untypedAtomic.
In addition, the type xdt:untypedAtomic is
defined as follows.
define type xdt:untypedAtomic restricts xdt:anyAtomicType
Note that this rule does not indicate the value
space of xdt:untypedAtomic. By definition,
xdt:untypedAtomic has the same value space
as xs:string.
The following example shows two atomic values. The first one is a value of type string containing "Database". The second one is an untyped value containing "Database". Both are using a string as content, but they have different type annotations.
"Databases" of type xs:string "Databases" of type xdt:untypedAtomic
Definition of the XML Schema derived types. The following type definitions reflects the semantics of the XML Schema types derived types derived by restriction from another atomic type.
define type xs:normalizedString restricts xs:string define type xs:token restricts xs:normalizedString define type xs:language restricts xs:token define type xs:NMTOKEN restricts xs:token define type xs:Name restricts xs:token define type xs:NCName restricts xs:Name define type xs:ID restricts xs:Name define type xs:IDREF restricts xs:Name define type xs:ENTITY restricts xs:Name define type xs:integer restricts xs:decimal define type xs:nonPositiveInteger restricts xs:integer define type xs:negativeInteger restricts xs:nonPositiveInteger define type xs:long restricts xs:integer define type xs:int restricts xs:long define type xs:short restricts xs:int define type xs:byte restricts xs:short define type xs:nonNegativeInteger restricts xs:integer define type xs:unsignedLong restricts xs:nonNegativeInteger define type xs:unsignedInt restricts xs:unsignedLong define type xs:unsignedShort restricts xs:unsignedInt define type xs:unsignedByte restricts xs:unsignedShort define type xs:positiveInteger restricts xs:nonNegativeInteger
Three XML Schema built-in derived types are derived
by list, as follows. Note that those derive directly
from xs:anySimpleType, since they are
derived by list, and that their value space is define
using a "one or more" occurrence indicator.
define type xs:NMTOKENS restricts xs:anySimpleType { xs:NMTOKEN+ }
define type xs:IDREFS restricts xs:anySimpleType { xs:IDREF+ }
define type xs:ENTITIES restricts xs:anySimpleType { xs:ENTITY+ }
For example, here is an element whose content is of
type xs:IDREFS.
element a of type xs:IDREFS {
"id1" of type xs:IDREF,
"id2" of type xs:IDREF,
"id3" of type xs:IDREF
}
Note that the type name xs:IDREFS
derives from xs:anySimpleType, but not
from xs:IDREF. As a consequence, calling
the following three XQuery functions with that element
a as a parameter succeeds in the case of
f1 and f2, but raises a
typing error in the case of f3.
define function f1($x as element(*,xs:anySimpleType)) { $x }
define function f2($x as element(*,xs:IDREFS)) { $x }
define function f3($x as element(*,xs:IDREF)) { $x }
Definition of xdt:yearMonthDuration
and xdt:dayTimeDuration.
xdt:yearMonthDuration and
xdt:dayTimeDuration are derived by
restriction from xs:duration.
define type xdt:yearMonthDuration restricts xs:duration define type xdt:dayTimeDuration restricts xs:duration
This section gives background material about [XPath/XQuery] type checking. The [XPath/XQuery] type system has been described in [2.3 The Type System].
Introduction
SequenceTypes can be used in [XPath/XQuery] to refer to a type imported from a schema (see [5 Modules and Prologs]). SequenceTypes are used to declare the types of function parameters and in several kinds of [XPath/XQuery] expressions.
The syntax of SequenceTypes is described by the following grammar productions.
The semantics of SequenceTypes is defined by means of normalization rules from SequenceTypes to the [XPath/XQuery] type system (see [2.3 The Type System]).
Core Grammar
The core grammar productions for sequence types are:
Ed. Note: Note that normalization on SequenceTypes does not occur during the normalization phase but whenever a dynamic or static rule requires it. The reason for this deviation from the processing model is that the result of SequenceType normalization is not part of the [XPath/XQuery] syntax (See Issue 432 (FS-Issue-0089)). SequenceType normalization is the only occurrence of such a deviation in the formal semantics.
Introduction
During processing of a query, it is sometimes necessary to determine whether a given value matches a type that was declared using the SequenceType syntax. This process is known as SequenceType matching, and is formally specified in [7.4 Judgments for type matching].
Sequence type matching is defined between a value and a type, rules to normalize a sequence type to the [XPath/XQuery] type system are necessary.
Notation
To define normalization of SequenceTypes to the [XPath/XQuery] type system, the following auxiliary mapping rule is used.
| [SequenceType]sequencetype |
| == |
| Type |
specifies that SequenceType is mapped to Type, in the [XPath/XQuery] type system.
OccurenceIndicators are left unchanged when normalizing SequenceTypes into [XPath/XQuery] types. Each kind of SequenceType component is normalized separately into the [XPath/XQuery] type system.
| [ItemType OccurrenceIndicator]sequencetype |
| == |
| [ItemType]sequencetype OccurrenceIndicator |
The "empty" sequence type is mapped to the empty sequence type in the [XPath/XQuery] type system.
| [empty()]sequencetype |
| == |
| [empty]sequencetype |
The SequenceType component with type SchemaType is mapped directly into the [XPath/XQuery] type system.
| [element(ElementName)]sequencetype |
| == |
| element ElementName |
| [attribute(@AttributeName)]sequencetype |
| == |
| attribute AttributeName |
| [element(ElementName, TypeName)]sequencetype |
| == |
| element ElementName of type TypeName |
| [attribute(@AttributeName, TypeName)]sequencetype |
| == |
| attribute AttributeName of type TypeName |
| [element(ElementName, TypeName nillable)]sequencetype |
| == |
| element ElementName nillable of type TypeName |
The mapping still does not handle SequenceTypes using SchemaContext (See Issue 481 (FS-Issue-0138)). The two following rules map references to a global element or attribute, without taking the schema context into account.
Document nodes, text nodes, and atomic types are left unchanged.
| [text()]sequencetype |
| == |
| text |
| [document()]sequencetype |
| == |
| document |
| [AtomicType]sequencetype |
| == |
| AtomicType |
| [comment()]sequencetype |
| == |
| comment |
| [processing-instruction()]sequencetype |
| == |
| processing-instruction |
The SequenceType components "node()" and "item()"
correspond to wildcard types. node()
indicates that any node is allowed and
item() indicates that any node or atomic
value is allowed. The following mapping rules make
use of the corresponding wildcard types.
| [node()]sequencetype |
| == |
| (element | attribute | text | document | comment | processing-instruction) |
| [item()]sequencetype |
| == |
(element | attribute | text | document |
comment | processing-instruction |
xdt:anyAtomicType ) |
Ed. Note: Jerome: The Formal Semantics makes use of fs:
numericwhich is not in XML Schema. This is necessary for the specification of some of XPath type conversion rules.
Introduction
Some expressions do not require that their operands exactly match an expected type. For example, function parameters and returns expect a value of a particular type, but automatically perform certain type conversions, such as extraction of atomic values from nodes, promotion of numeric values, and implicit casting of untyped values. The conversion rules for function parameters and returns are discussed in [4.1.4 Function Calls]. Other operators that provide special conversion rules include arithmetic operators ([4.4 Arithmetic Expressions]), value comparisons ([4.5.1 Value Comparisons]), and general comparisons ([4.5.2 General Comparisons]).
Atomization converts an item sequence into a
sequence of atomic values and is implemented by the
fn:data function. Atomization is applied
in contexts where an arbitrary sequence of items is
used where a sequence of atomic values is
expected.
If a sequence of items is encountered where a
boolean value is expected, the item sequence's
effective boolean value is used. The
fn:boolean function returns the
effective boolean value of an item sequence.
The semantics of some functions (see [6 Additional Semantics of Functions]) depends on whether the XPath 1.0 backward compatibility flag is true or false.
Expressions can raise errors during static analysis or
dynamic evaluation. The [XQuery 1.0 and XPath 2.0
Functions and Operators] [XQuery
1.0: A Query Language for XML], and [XML Path Language (XPath) 2.0]
specify the conditions under which an expression or
operator raises an error. The user may raise an error
explicitly by calling the fn:error function,
which takes an optional item as an argument.
Notation
The symbol Error denotes any error. We distinguish between a static error, a type error, denoted by typeError, and a generic dynamic error, denoted by dynError, which represents all dynamic errors. A static error is raised during static analysis. A type error may be raised during static analysis or dynamic evaluation. A dynamic error is raised during dynamic evaluation. Non-type static errors are not formalized in this document.
Ed. Note: We use a generic dynamic error, because the definition of all errors raised by functions and operators in [XQuery 1.0 and XPath 2.0 Functions and Operators] is under discussion.
In general, when an error is raised during evaluation of some expression Expr, the error is propogated to the expression Expr1 in which Expr is evaluated. The expression Expr1, in turn, propogates the error to the expression in which Expr1 is evaluated, and so on, until the error is returned to the query environment.
Since most expressions propogate errors as described, we use one inference rule to specify this default behavior. The rule below states that if any sub-expression Expri of expression Expr raises an error dynError then Expr also raises dynError.
There are several expressions that do not propogate an error raised by some sub-expression. For each such expression, we give specific error inference rules.
Ed. Note: Issue: Optional features and conformance levels are not formally specified. See Issue 512 (FS-Issue-0169).
The semantics of XML Schema import is described in [8 Importing Schemas].
The semantics of the static typing feature is described using static inference rules in the formal semantics.
The formal semantics of extensions is not specified.
| [1 (XQuery)] | Pragma |
::= | |
| [5 (XQuery)] | PragmaContents |
::= | |
| [2 (XQuery)] | MustUnderstandExtension |
::= | |
| [6 (XQuery)] | ExtensionContents |
::= | |
This section gives the semantics of all the [XPath/XQuery] expressions. The organization of this section parallels the organization of Section 3 of the [XPath/XQuery] document.
| [39 (XQuery)] | Expr |
::= | |
| [40 (XQuery)] | ExprSingle |
::= | |
| [15 (XPath)] | XPath |
::= | |
For each expression, a short description and the relevant grammar productions are given. The semantics of an expression includes the normalization, static analysis, and dynamic evaluation phases. Recall that normalization rules translate [XPath/XQuery] syntax into core [XPath/XQuery] syntax. In the sections that contain normalization rules, the Core grammar productions into which the expression is normalized are also provided. After normalization, sections on static type inference and dynamic evaluation define the static type and dynamic value for the core expression.
Core Grammar
The core grammar production for expressions is:
| [32 (Core)] | Expr |
::= | |
Primary expressions are the basic primitives of the language.They include literals, variables, function calls, and the parenthesized expressions.
| [73 (XQuery)] | PrimaryExpr |
::= | |
| [20 (XQuery)] | VarName |
::= | |
Core Grammar
The core grammar productions for primary expressions are:
| [56 (Core)] | PrimaryExpr |
::= | |
| [15 (Core)] | VarName |
::= | |
Introduction
A literal is a direct syntactic representation of an atomic value. [XPath/XQuery] supports two kinds of literals: string literals and numeric literals.
| [91 (XQuery)] | Literal |
::= | |
| [92 (XQuery)] | NumericLiteral |
::= | |
| [7 (XQuery)] | IntegerLiteral |
::= | |
| [8 (XQuery)] | DecimalLiteral(ws: explicit) |
::= | |
| [9 (XQuery)] | DoubleLiteral(ws: explicit) |
::= | |
| [10 (XQuery)] | StringLiteral(ws: significant) |
::= | |
| [22 (XQuery)] | PredefinedEntityRef(ws: explicit) |
::= | |
| [24 (XQuery)] | CharRef(ws: explicit) |
::= | |
All literals are core expressions, therefore no normalization rules are required for literals.
Core Grammar
The core grammar productions for literals are:
| [67 (Core)] | Literal |
::= | |
| [68 (Core)] | NumericLiteral |
::= | |
| [3 (Core)] | IntegerLiteral |
::= | |
| [4 (Core)] | DecimalLiteral(ws: explicit) |
::= | |
| [5 (Core)] | DoubleLiteral(ws: explicit) |
::= | |
| [6 (Core)] | StringLiteral(ws: significant) |
::= | |
| [17 (Core)] | PredefinedEntityRef(ws: explicit) |
::= | |
| [18 (Core)] | CharRef(ws: explicit) |
::= | |
In the static semantics, the type of an integer literal is simply xs:integer:
In the dynamic semantics, an integer literal is evaluated by constructing an atomic value in the data model, which consists of the literal value and its type:
The formal definitions of decimal, double, and string literals are analogous to those for integer.
|
|
||
|
|
|
||
|
Dynamic Errors
Literal expressions never raise an error.
Introduction
A variable evaluates to the value to which the variable's QName is bound in the evaluation context.
A variable is a core expression, therefore no normalization rule is required for a variable.
In the static semantics, the type of a variable is simply its type in the static type environment statEnv.varType:
If the variable is not bound in the static environment, the system raises a static error.
In the dynamic semantics, a variable is evaluated by "looking up" its value in dynEnv.varValue:
Dynamic Errors
If the variable is not bound in the dynamic environment, the system raises a dynamic error.
Ed. Note: MFF: Given sufficient consistency constraints on the static and dynamic environments, this error rule is not necessary, because every variable must be defined.
| [93 (XQuery)] | ParenthesizedExpr |
::= | |
Core Grammar
The core grammar production for parenthesized expressions is:
| [69 (Core)] | ParenthesizedExpr |
::= | |
Dynamic Errors
The default rules for propogating errors, described in [3.5 Errors Handling] apply to parenthesized expressions.
Introduction
A function call consists of a QName followed by a parenthesized list of zero or more expressions.
| [94 (XQuery)] | FunctionCall |
::= | |
Because [XPath/XQuery] implicitly converts the values of function arguments, a normalization step is required.
Notation
Ed. Note: Proposal (Kris). The following rules have been changed to make normalization independent of the runtime types.
Normalization of function calls uses an auxiliary mapping []FormalArgument(SequenceType) used to insert conversions of function arguments that depend only on the expected SequenceType of the formal argument. It is defined as follows:
| [Expr]FormalArgument(SequenceType) |
| == |
| Convert( Extract( Atomize([Expr]Expr))) |
Atomize(Expr) denotes
fn:data(Expr) |
If SequenceType <: xs:anyAtomic* | |
| Expr | Otherwise |
Extract(Expr) denotes
| fn:subsequence(Expr,1,1) | If statEnv.xpath1.0_compatibility is true and SequenceType <: item? | |
| Expr | Otherwise |
Convert(Expr) denotes
fs:convert-simple-operand(
Expr,PrototypicalValue) |
If SequenceType <: xs:anySimpleType |
| Expr | Otherwise |
where PrototypicalValue has the following values for each possible SequenceType:
Ed. Note: Todo: insert dummy prototypical values for each of the 44+4 types...
Each argument in a function call is normalized to its corresponding core expression by applying []FormalArgument(SequenceType) for each argument with the expected SequenceType for the argument inserted.
| [ QName (Expr1, ..., Exprn) ]Expr |
| == |
| QName ( [Expr1]FormalArgument(SequenceType1) , ..., [Exprn]FormalArgument(SequenceTypen)) ) |
Note that this normalization rule depends on the static environment containing function signatures and is the only place where we exploit the implicit presence of statEnv. Furthermore notice that the normalization is only well-defined when it is guaranteed that overloading is restricted to atomic types with the same quantifier. This is presently the case.
Core Grammar
The core grammar production for function calls is:
| [70 (Core)] | FunctionCall |
::= | |
Ed. Note: Proposal (Kris). The following rules attempt to solve the "atomic overload protection" problem by factoring typing over atomic unions. Also it is made explicit that specific typing rules for functions (in [6 Additional Semantics of Functions]) take precedence.
To typecheck a core function call we first check in Section [6 Additional Semantics of Functions] if there is a specialized typing rule for the function, and, if so, use it. Otherwise, the function signatures matching the function name and arity are retrieved from the static environment. If the function is not present in the environment with an appropriate signature, an error is raised. The type of each actual argument to the function must be a subtype of a type that is promotable to the corresponding formal argument type of the function; if the expected type is a union of atomic types then this check is performed separately for each possibility.
The first rule to bootstrap type checking of a function call expands the function's QName and then applies the function call rule for the expanded function call:
|
||||
|
|
||||
|
For a function call in which the type of some argument is a union of atomic types, we recursively type check the function call once for each atomic type in the union. The type of the entire function call is the union of the types for each distinct function call:
|
|||||
|
|
|||||
|
Notice that the function body is not reanalyzed for each invocation: static typing of the function definition itself guarantees that the function body always returns a value of the declared return type.
Finally, the following auxilliary rule type checks a function call in which none of the actual arguments has a type that is a union of atomic types. The rule looks up the function in the static environment and checks that some signature for the function satisfies the following constraint: the type of each actual argument is a subtype of some type that can be promoted to the type of the correponding formal argument. In this case, the function call is well typed and the result type is the return type specified in the function's signature.
|
||||||||
|
|
||||||||
|
Based on a function's name and argument types, the function body is retrieved from the dynamic environment. If the function is not present in the environment, a static error is raised.
If the function is a user-defined function then it is evaluated as follows. First, the rule evaluates each actual function argument expression. Next, a match is searched for among the function's possible declaration signatures, retrieved from the statEnv.funcType static environment component. If the function is not present in the environment, or there is no matching declaration signature, a type error is raised. Otherwise, the function body and formal variables are obtained from dynEnv.funcDefn for the matching declaration signature. The rule then extends dynEnv.varValue by binding each formal variable to its corresponding value (converted by the normalization as required for the expected type and backwards compatibility flag), and evaluates the body of the function in the new environment. The resulting value is the value of the function call.
|
|||||||||||
|
|
|||||||||||
|
Note that the function body is evaluated in the default (top-level) environment extended with just the parameter bindings. Note also that input values and output values are matched against the types declared for the function. If static analysis was performed, all these checks are guaranteed to be true and may be omitted.
If the function is a built-in function then the rule is somewhat simpler:
|
|||||||||||
|
|
|||||||||||
|
Built-in function calls use the following auxiliary judgment to evaluate the built-in function:
| "The built-in function F (from [XQuery 1.0 and XPath 2.0 Data Model], [XQuery 1.0 and XPath 2.0 Functions and Operators], or [6 Additional Semantics of Functions]) applied to the given parameter values yields the specified result value" | ||
|
|
||
|
Ed. Note: Issue:Built-in function calls must be defined more precisely. See Issue 522 (FS-Issue-0179).
Dynamic Errors
If the evaluation of any actual argument raises an error, the function call can raise an error. This rule applies to both user-defined and built-in functions. Note that if more than one expression may raise an error, the function call may raise any one of the errors.
|
|||||
|
|
|||||
|
If, for all possible function signatures, the evaluation of some actual argument yields a value that cannot be promoted to the corresponding formal type of the argument, the function call raises a type error. This rule applies to both user-defined and built-in functions.
|
||||||
|
|
||||||
|
If the evaluation of the function call to a user-defined function yields a value that cannot be promoted to the corresponding return type of the function, the function call raises a type error.
|
|||||||||||
|
|
|||||||||||
|
If the evaluation of the function call to a built-in function yields a value that cannot be promoted to the corresponding return type of the function, the built-in function call raises a type error.
|
|||||||||||
|
|
|||||||||||
|
Built-in function calls use the following auxiliary judgment to evaluate the built-in function call. If the built-in function raises an error, the function call raises an error.
| "The built-in function F (from data model, type constructor, or functions and operators) applied to the given parameter raises an error" | ||
|
|
||
|
Ed. Note: Issue:Built-in function calls must be defined more precisely. See Issue 522 (FS-Issue-0179).
| [3 (XQuery)] | ExprComment |
::= | |
| [4 (XQuery)] | ExprCommentContent |
::= | |
Comments are lexical constructs only, and have no meaning within the query and therefore have no formal semantics.
Introduction
Path expressions are used to locate nodes within a tree. There are two kinds of path expressions, absolute path expressions and relative path expressions. An absolute path expression is a rooted relative path expression. A relative path expression is composed of a sequence of steps.
| [68 (XQuery)] | PathExpr |
::= | |
| [69 (XQuery)] | RelativePathExpr |
::= | |
Core Grammar
PathExpr and RelativePathExpr are fully normalized. There for, there is no corresponding production in the core. The grammar of path expressions in the core starts with the StepExpr production.
Absolute path expressions are path expressions
starting with the / or /
symbols, indicating that the expression must be applied
on the root node in the current context. The root node in
the current context is the greatest ancestor of the
context node. The following two rules normalize absolute
path expressions to relative ones. They use the
fn:root() function, which returns the
greatest ancestor of its argument node. The let
expressions guarantee that the value bound to the context
variable $fs:dot is a
node.
| ["/"]Expr |
| == |
| fn:root(self::node()) |
| ["//" RelativePathExpr]Expr |
| == |
[fn:root( self::node() ) "/"
descendant-or-self::node() "/"
RelativePathExpr]Expr |
A composite relative path expression (using
/) is normalized into a for
expression by concatenating the sequences obtained by
mapping each node of the left-hand side in document order
to the sequence it generates on the right-hand side. The
call to the fs:distinct-doc-order
function ensures that the result is in document order
without duplicates. The evaluation context is defined by
binding the $fs:dot,
$fs:sequence,
$fs:position and
$fs:last
variables.
Note that sorting by document order enforces the restriction that input and output sequences contains only nodes.
| [StepExpr1 "/" StepExpr2]Expr | ||||||
| == | ||||||
|
Note that for this section uses some auxiliary judgments which are defined in [7.3 Judgments for step expressions and filtering].
Introduction
| [70 (XQuery)] | StepExpr |
::= | |
| [71 (XQuery)] | AxisStep |
::= | |
| [72 (XQuery)] | FilterStep |
::= | |
| [82 (XQuery)] | ForwardStep |
::= | |
| [83 (XQuery)] | ReverseStep |
::= | |
Core Grammar
The core grammar productions for XPath steps are:
| [53 (Core)] | StepExpr |
::= | |
| [55 (Core)] | FilterStep |
::= | |
| [60 (Core)] | ForwardStep |
::= | |
| [61 (Core)] | ReverseStep |
::= | |
Note
Step expressions can be followed by predicates. Normalization of predicates uses the following auxiliary mapping rule: []Predicates, which is specified in [4.2.2 Predicates]. Normalization for step expressions also uses the following auxiliary mapping rule: []Axis, which is specified in [4.2.1.1 Axes].
Normalization of predicates need to distinguish between forward steps, reverse steps, and primary expressions.
As explained in the [XPath/XQuery] document,
applying a step in XPath changes the focus (or
context). The change of focus is made explicit by the
normalization rule below, which binds the variable
$fs:dot to the node
currently being processed, and the variable
$fs:position to the
position (i.e., the position within the input sequence)
of that node.
In the case predicates are applied on a forward step, the input sequence is first sorted in document order and duplicates are removed. The context is changed to bind the appropriate variable in document order.
Note that applying sorting by document order is enforces the restriction that input and output sequences contains only nodes.
| [ForwardStep Predicates "[" Expr "]"]Expr | ||||
| == | ||||
|
In the case predicates are applied on a reverse step, the input sequence is first sorted in document order and duplicates are removed. The context is changed such that the position variable is bound in reverse document order.
Note that applying sorting by document order is enforces the restriction that input and output sequences contains only nodes.
| [ReverseStep Predicates "[" Expr "]"]Expr | |||||
| == | |||||
|
In the case predicates are applied on a primary expression, the input sequence is processed in sequence order and the context is bound as in the case of forward axis. In that case, the sequence can contain both nodes and atomic values.
| [PrimaryExpr Predicates "[" Expr "]"]Expr | ||||
| == | ||||
|
Finally, a stand-alone forward or reverse step is normalized through the auxiliary normalization rule for Axis.
The static semantics of an Axis NodeTest pair is obtained by retrieving the type of the context node, and applying the two filters (the Axis, and then the NodeTest with a PrincipalNodeKind) on the result.
|
||||||
|
|
||||||
| statEnv |- Axis NodeTest : Type3 |
Note
Note that the second judgment in the inference rule imposes that the context item be a node type, hence making sure the dynamic error in case the context node is an atomic value does not occur.
The dynamic semantics of an Axis NodeTest pair is obtained by retrieving the context node, and applying the two filters (Axis, then NodeTest) on the result. The application of each filter is expressed through the filter judgment as follows.
|
||||||
|
|
||||||
| dynEnv |- Axis NodeTest => Value3 |
Note
Note that the second judgment in the inference rule makes sure the axis nodetest expression is not evaluated in case the context item is not bound to a node.
Dynamic Errors
If the context item is not a node, the evaluation of an axis node test expression raises a dynamic error.
Introduction
| [86 (XQuery)] | ForwardAxis |
::= | |
| [87 (XQuery)] | ReverseAxis |
::= | |
| [58 (XPath)] | ForwardAxis |
::= | |
| [59 (XPath)] | ReverseAxis |
::= | |
Core Grammar
The core grammar productions for XPath axis are:
| [62 (Core)] | ForwardAxis |
::= | |
| [63 (Core)] | ReverseAxis |
::= | |