Copyright © 2002 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use, and software licensing rules apply.
This document defines the formal semantics of XQuery 1.0 [XQuery 1.0: A Query Language for XML] and XPath 2.0 [XML Path Language (XPath) 2.0].
This is a public W3C Working Draft for review by W3C Members and other interested parties. This section describes the status of this document at the time of its publication. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress." A list of current public W3C technical reports can be found at http://www.w3.org/TR/.
This document is a work in progress. It contains many open issues, and should not be considered to be fully stable. Vendors who wish to create preview implementations based on this document do so at their own risk. While this document reflects the general consensus of the working groups, there are still controversial areas that may be subject to change.
Among the most important changes from the previous version of this document are:
The addition of the semantics of XPath 2.0 (notably changes in Section [5.2 Path Expressions]).
A new type system aligned with [XML Schema] (see Section [3 The XQuery Type System] and the new Section [8 Importing Schemas]).
The semantics of constructors (see Section [5.7 Constructors]).
An improved presentation (notably revisions to Section [2 Preliminaries]).
A complete list of issues is maintained in [C Issues]. The following are identified as high priority issues.
This document is not aligned with the other XQuery and XPath documents that include information about error handling and conformance levels. See [Issue-0094: Static type errors and warnings], [Issue-0098: Implementation of and conformance levels for static type checking], [Issue-0171: Raising errors], and [Issue-0169: Conformance Levels].
The semantics of sort by are still under discussion. See [Issue-0109: Semantics of sortby].
The current semantics does not completely cover all functions. Some functions used in the Formal Semantics, or some functions from the [XQuery 1.0 and XPath 2.0 Functions and Operators] document need additional semantics specification. See [Issue-0135: Semantics of special functions].
Public comments on this document and its open issues are welcome. Comments should be sent to the W3C XPath/XQuery mailing list, public-qt-comments@w3.org (archived at http://lists.w3.org/Archives/Public/public-qt-comments/).
XQuery 1.0, XPath 2.0, and their formal semantics has been defined jointly by the XML Query Working Group (part of the XML Activity) and the XSL Working Group (part of the Style Activity).
Patent disclosures relevant to this specification may be found on the XML Query Working Group's patent disclosure page at http://www.w3.org/2002/08/xmlquery-IPR-statements and the XSL Working Group's patent disclosure page at http://www.w3.org/Style/XSL/Disclosures.
1 Introduction
2 Preliminaries
2.1 Processing model
2.2 Namespaces
2.3 Data Model
2.3.1 Data model overview
2.3.2 Node identity
2.3.3 Document order and sequence order
2.3.4 Type annotations
2.4 Schemas and types
2.4.1 The elements of a (statically) typed language
2.4.2 XML Schema and the XQuery type system
2.4.3 Structural and named typing
2.4.4 Subtyping
2.4.4.1 Type substitutability
2.4.4.2 Subtyping and XML Schema derivation
2.5 Functions
2.5.1 Functions and operators
2.5.2 Functions and static typing
2.5.3 The error function
2.5.4 Data Model Accessors and XPath Axes
2.5.5 Other Formal Semantics functions
2.6 Notations
2.6.1 Grammar productions
2.6.2 Judgments
2.6.3 Inference rules
2.6.4 Environments
2.6.5 Putting it together
2.7 The Formal Semantics
2.7.1 Normalization
2.7.2 Static type inference
2.7.3 Dynamic Evaluation
3 The XQuery Type System
3.1 Values and Types
3.1.1 Values
3.1.2 Types
3.1.3 Top level definitions
3.1.4 Built-in type declarations
3.1.5 Syntactic constraints on types
3.1.6 Example
3.2 Auxiliary judgments
3.2.1 Derivation and substitution
3.2.1.1 Derives
3.2.1.2 Substitutes
3.2.2 Extension
3.2.3 Mixed content
3.2.4 Adjusts
3.2.5 Resolution
3.2.6 Derives
3.2.7 Lookup
3.2.8 Interleaving
3.2.9 Filtering
3.3 Matching
3.3.1 Nil-matches
3.3.2 Matches
3.3.3 Optimized matching
3.4 Erase and Annotate
3.4.1 Erasure
3.4.1.1 Simply erases
3.4.1.2 Erases
3.4.2 Annotate
3.4.2.1 Simply annotate
3.4.2.2 Nil-annotate
3.4.2.3 Annotate
3.5 Subtyping
3.5.1 Subtype
3.5.2 Type equivalence
3.6 Auxiliary typing judgments for for,
unordered, and sortby
expressions
3.6.1 Prime types
3.6.2 Computing Prime Types and Occurrence Indicators
3.7 Auxiliary typing judgments for typeswitch
expressions
3.7.1 Computing common types and occurrence in typeswitch
3.7.2 Computing Common Prime Types and Occurrence
Indicators
3.8 Major type issues
4 Basics
4.1 Expression Context
4.1.1 Static Context
4.1.2 Evaluation Context
4.2 Input Functions
4.3 Expression Processing
4.4 Types
4.4.1 Type Checking
4.4.2 SequenceType
4.4.3 Type Conversions
4.4.3.1 Atomization
4.4.3.2 Effective Boolean Value
4.4.3.3 Fallback Conversion
4.5 Errors and Conformance
5 Expressions
5.1 Primary Expressions
5.1.1 Literals
5.1.2 Variables
5.1.3 Parenthesized Expressions
5.1.4 Function Calls
5.1.5 Comments
5.2 Path Expressions
5.2.1 Steps
5.2.1.1 Axes
5.2.1.2 Node Tests
5.2.2 Predicates
5.2.3 Unabbreviated Syntax
5.2.4 Abbreviated Syntax
5.3 Sequence Expressions
5.3.1 Constructing Sequences
5.3.2 Combining Sequences
5.4 Arithmetic Expressions
5.5 Comparison Expressions
5.5.1 Value Comparisons
5.5.2 General Comparisons
5.5.3 Node Comparisons
5.5.4 Order Comparisons
5.6 Logical Expressions
5.7 Constructors
5.7.1 Element Constructors
5.7.2 Computed Element and Attribute Constructors
5.7.2.1 Computed Element Nodes
5.7.2.2 Constructed Attribute Nodes
5.7.3 Whitespace in Constructors
5.7.4 Other Constructors and Comments
5.8 [For/FLWR] Expressions
5.8.1 FLWR expressions
5.8.2 For expression
5.8.3 Let Expression
5.9 Order-Related Expressions
5.9.1 Sorting Expressions
5.9.2 Unordered Expressions
5.10 Conditional Expressions
5.11 Quantified Expressions
5.12 Expressions on SequenceTypes
5.12.1 Instance Of
5.12.2 Typeswitch
5.12.3 Cast and Treat
5.12.3.1 Cast as
5.12.3.2 Treat as
5.13 Validate Expressions
6 The Query Prolog
6.1 Namespace Declarations
6.2 Schema Imports
6.3 Xmlspace Declaration
6.4 Default Collation
6.5 Function Definitions
7 Additional Semantics of Functions
7.1 Formal Semantics Functions
7.1.1 The fs:characters-to-string function
7.1.2 The fs:distinct-doc-order function
7.1.3 The fs:item-sequence-to-node-sequence function
7.1.4 The fs:item-sequence-to-string function
7.2 Functions with specific typing rules
7.2.1 The xf:error function
7.2.2 The xf:distinct-nodes, and
xf:distinct-values functions
7.2.3 The op:union, op:intersect and
op:except operators
7.2.4 The op:to operator
7.2.5 The xf:data function
8 Importing Schemas
8.1 Introduction
8.1.1 Features
8.1.2 Organization
8.1.3 Main mapping rules
8.1.4 Special attributes
8.1.4.1 use
8.1.4.2 minOccurs and maxOccurs
8.1.4.3 mixed
8.1.4.4 nillable
8.1.4.5 substitutionGroup
8.2 Attribute Declarations
8.2.1 Global attributes declarations
8.2.2 Local attribute declarations
8.3 Element Declarations
8.3.1 Global element declarations
8.3.2 Local element declarations
8.4 Complex Type Definitions
8.4.1 Global complex type
8.4.2 Local complex type
8.4.3 Complex type with simple content
8.4.4 Complex type with complex content
8.5 Attribute Uses
8.6 Attribute Group Definitions
8.6.1 Attribute group definitions
8.6.2 Attribute group reference
8.7 Model Group Definitions
8.8 Model Groups
8.8.1 All groups
8.8.2 Choice groups
8.8.3 Sequence groups
8.9 Particles
8.9.1 Element reference
8.9.2 Group reference
8.10 Wildcards
8.10.1 Attribute wildcards
8.10.2 Element wildcards
8.11 Identity-constraint Definitions
8.12 Notation Declarations
8.13 Annotation
8.14 Simple Type Definitions
8.14.1 Global simple type definition
8.14.2 Local simple type definition
8.14.3 Simple type content
8.15 Schemas as a whole
8.15.1 Schema
8.15.2 Include
8.15.3 Redefine
8.15.4 Import
A Normalized core grammar
A.1 Core lexical structure
A.1.1 Syntactic Constructs
A.2 Core BNF
B References
B.1 Normative References
B.2 Non-normative References
B.3 Background References
C Issues
C.1 Introduction
C.2 Issues list
C.3 Alphabetic list of issues
C.3.1 Open Issues
C.3.2 Resolved (or redundant) Issues
C.4 Delegated Issues
C.4.1 XPath 2.0
C.4.2 XQuery
C.4.3 Operators
This document defines the formal semantics of XQuery 1.0 and XPath 2.0. The present document is part of a set of documents that together define the XQuery 1.0 and XPath 2.0 languages:
[XQuery 1.0: A Query Language for XML] introduces the XQuery 1.0 language, defines its capabilities from a user-centric view, and defines the language syntax.
[XML Path Language (XPath) 2.0] introduces the XPath 2.0 language, defines its capabilities from a user-centric view, and defines the language syntax.
[XQuery 1.0 and XPath 2.0 Data Model] formally specifies the data model used by [XPath/XQuery] to represent the content of XML documents. The [XPath/XQuery] language is formally defined by operations on this data model.
[XQuery 1.0 and XPath 2.0 Functions and Operators] lists the functions and operators defined for the [XPath/XQuery] language and specifies to which arguments they can be applied and what the result should be.
Ed. Note: [Kristoffer/XSL] The role of the formal semantics document could be clarified: does it supplement or subsume the language specification in case of conflict?
The scope and goals for the [XPath/XQuery] language are discussed in the charter of the W3C [XSL/XML Query] Working Group and in the [XPath/XQuery] requirements [XML Query 1.0 Requirements].
[XPath/XQuery] is a powerful language, capable of selecting and extracting complex patterns from XML documents , reformulating them into results in arbitrary ways, and computing expressions over the result. This document defines the semantics of [XPath/XQuery] by giving a precise formal meaning to each of the constructions of the [XPath/XQuery] specification in terms of the [XPath/XQuery] data model. This document assumes that the reader is already familiar with the [XPath/XQuery] language.
Two important design aspects of [XPath/XQuery] are that it is functional and that it is typed. These two aspects play an important role in the [XPath/XQuery] Formal Semantics.
[XPath/XQuery] is a functional language. It is built from expressions, rather than statements. Every construct in the language (except for the query prolog) is an expression and expressions can be composed arbitrarily. The result of one expression can be used as the input to any other expression, as long as the type of the result of the former expression is compatible with the input type of the latter expression with which it is composed. Another aspect of the functional approach is that variables are always passed by value and their value cannot be modified through side-effects.
[XPath/XQuery] is a typed language. Types can be imported from
one or more XML Schemas describing the documents that will be
processed, and the [XPath/XQuery] language can then perform operations
based on these types (e.g., using a treat as
operation). In addition, [XPath/XQuery] also supports a level of
static type analysis. This means that the system can
perform infer the type of a [expression/query], based on the type of its
inputs. Static typing allows early error detection, and can be used
as the basis for certain forms of optimization. The type system of
[XPath/XQuery] is based on [XML Schema]. The [XPath/XQuery] type system
models most of the features of [XML Schema Part 1], including global and
local element and attribute declarations, complex and simple type
definitions, named and anonymous types, derivation by restriction,
extension, list and union, substitution groups, and wildcard
types. It does not model uniqueness constraints and facet
constraints on simple types. The main issues with respect to the
[XPath/XQuery] type system are discussed in [3.8 Major type issues].
The [XPath/XQuery] formal semantics builds on long-standing traditions in the database and in the programming languages communities. In particular, the [XPath/XQuery] formal semantics has been inspired by works on SQL [SQL], OQL [ODMG], [BKD90], [CM93], and nested relational algebra (NRA) [BNTW95], [Col90], [LW97], [LMW96]. Other references include Quilt [Quilt], UnQL [BFS00], XDuce [HP2000], XML-QL [XMLQL99], XPath 1.0 [XML Path Language (XPath) : Version 1.0], XQL [XQL99], XSLT [XSLT 99], and YATL [YAT99]. Additional related work can be found in the bibliography [B References].
This document is organized as follows. [2 Preliminaries] introduces notations used to define the [XPath/XQuery] formal semantics. [3 The XQuery Type System] describes the [XPath/XQuery] type system, operations on the [XPath/XQuery] type system, and explains the relationship between the [XPath/XQuery] type system and XML Schema. [4 Basics] describes semantics for basic [XPath/XQuery] concepts, and [5 Expressions], describes the dynamic and static semantics of [XPath/XQuery] expressions. [6 The Query Prolog] describes the semantics of the [XPath/XQuery] prolog. [7 Additional Semantics of Functions] describes any additional semantics required for [XPath/XQuery] functions. Finally, [8 Importing Schemas], specifies how XML Schemas are imported into the [XPath/XQuery] type system.
This section provides the background necessary to understand the [XPath/XQuery] Formal Semantics and introduces the notations that are used.
First, a processing model for [expression/query] evaluation is defined. This processing model is not intended to describe an actual implementation, although a naive implementation might be based upon it. It does not prescribe an implementation technique, but any implementation should produce the same results as obtained by following this processing model and applying the rest of the Formal Semantics specification.
The processing model consists of five phases; each phase consumes the result of the previous phase and generates output for the next phase. For each processing phase, we furthermore point to the relevant notations introduced later in the document.
Parsing. The grammar for the [XPath/XQuery] syntax is defined in [XQuery 1.0: A Query Language for XML]. Parsing may generate syntax errors. If no error occurs, an internal representation of the parsed syntax tree corresponding to the query is created: this ensures that we can unambiguously specify the following phases by a case per syntax production.
Context Processing. The semantics of [expression/query] depends on the input context. The input context needs to be generated before the [expression/query] can be processed. In XQuery, the input context is described in the Query Prolog (See [6 The Query Prolog]). In XPath, the input context is generated by the processing environment. The input context is split in a static and a dynamic part (denoted statEnv and dynEnv, respectively).
Normalization. To
simplify the semantics specification, some normalization
must first be performed over the [expression/query]. The
[XPath/XQuery] language provides many powerful features that
make [expression/query]s simpler to write and use, but are also
redundant. For instance, a complex for
expression might be rewritten as a composition of several
simple for expressions. The language composed
of these simpler [expression/query] is called the [XPath/XQuery]
Core language and is a subset of the complete
[XPath/XQuery] language. The grammar of the [XPath/XQuery] Core
language is given in [A Normalized core grammar].
During the normalization phase, each [XPath/XQuery] [expression/query] is mapped into its equivalent [expression/query] in the core. (Note that this has nothing to do with Unicode Normalization, which works on character strings.) Normalization works by bottom-up application of normalization rules over expressions, starting with normalization of literals.
Specifically the normalization phase is defined in terms of the static part of the context (statEnv) and a [expression/query] (Expr) abstract syntax tree. Formal notations for the normalization phase are introduced in [2.7.1 Normalization].
After normalization, the full semantics can be obtained just by giving a semantics to the normalized Core [expression/query]. This is done during the last two phases.
Static type analysis. Static type analysis checks whether each [expression/query] is type safe, and if so, determines its static type. Static type analysis is defined only for Core [expression/query]. Static type analysis works by bottom-up application of type inference rules over expressions, taking the type of literals and the known types of input documents into account.
If the [expression/query] is not type-safe, static type analysis can result in a static error. For instance, a comparison between an integer value and a string value might be detected as an error during the static type analysis. If static type analysis succeeds, it results in a syntax tree where each sub-expression is "decorated" with its static type, and in an output type for the result of the [expression/query].
More precisely the static analysis phase is defined in terms of the static context (statEnv) and a core [expression/query] (CoreExpr). Formal notations for the static analysis phase are introduced in [2.7.2 Static type inference].
Static typing does not imply that the content of XML documents must be rigidly fixed or even known in advance. The [XPath/XQuery] type system accommodates "flexible" types, such as elements that can contain any content. Schema-less documents are handled in [XPath/XQuery] by associating a standard type with the document, such that it may include any legal XML content.
Dynamic Evaluation. This is the phase in which the result of the [expression/query] is computed. The semantics of evaluation is defined only for Core [expression/query] terms. Evaluation works by bottom-up application of evaluation rules over expressions, starting with evaluation of literals. This guarantees that every expression can be unambiguously reduced to a value, which is the final result of the [expression/query].
The dynamic evaluation phase is defined in terms of the static context (statEnv) and evaluation context (), and a core [expression/query] (CoreExpr). Formal notations for the dynamic evaluation phase are introduced in [2.7.3 Dynamic Evaluation].
Ed. Note: Issue:[Kristoffer/XSL] The distinction between errors manifest by not being able to prove a judgment and special error values is not clear. Also the processing model does not allow for skipping static typing. See [Issue-0094: Static type errors and warnings], [Issue-0171: Raising errors], and [Issue-0169: Conformance Levels].
The first four phases above are "analysis-time" (sometimes also called "compile-time") steps, which is to say that they can be performed on a [expression/query] before examining any input document. Indeed, analysis-time processing can be performed before such document even exists. Analysis-time processing can detect many errors early-on, e.g., syntax errors or type errors. If no error occurs, the result of analysis-time processing could be some compiled form of [expression/query], suitable for execution by a compiled-[expression/query] processor. The last phase is an "execution-time" (sometimes also called "run-time") step, which is to say that the query is evaluated on actual input document(s).
Static analysis catches only certain classes of errors. For
instance, it can detect a comparison operation applied between
incompatible types (e.g., xs:int and
xs:date). Some other classes of errors cannot be
detected by the static analysis and are only detected at execution
time. For instance, whether an arithmetic expression on 32 bits
integers (xs:int) yields an out-of-bound value can
only be detected at run-time by looking at the data.
While implementations may be free to employ different processing models, the [XPath/XQuery] static semantics relies on the existence of a static type analysis phase that precedes any access to the input data. Statically typed implementations are required to find and report static type errors, as specified in this document. It is still an open issue (See [Issue-0098: Implementation of and conformance levels for static type checking]) whether to make the static type analysis phase optional to allow implementations to return a result for a type-invalid [expression/query] in special cases where the [expression/query] happens to perform correctly on a particular instance of input data. (Note that this is not the same as merely permitting that the static type error is emitted at evaluation time as is done below.)
Notice that the separation of logical processing into phases is not meant to imply that implementations must separate analysis-time from evaluation-time processing: [expression/query] processors may choose to perform all phases simultaneously at evaluation-time and may even mix the phases in their internal implementations. The processing model defines only what the final result.
The above processing phases are all internal to the [expression/query] processor. They do not deal with how the [expression/query] processor interacts with the outside world, notably how it accesses actual documents and types. A typical [expression/query] engine would support at least three other important processing phases:
XML Schema import phase. The [XPath/XQuery] type system is based on XML Schema. In order to perform static type analysis, the [XPath/XQuery] processor needs to build type descriptions that correspond to the schema(s) of the input documents. This phase is achieved by mapping all schemas required by the [expression/query] into the [XPath/XQuery] type system. The XML Schema import phase is described in [8 Importing Schemas].
XML loading phase. Expressions are evaluated on values in the [XQuery 1.0 and XPath 2.0 Data Model]. XML documents must be loaded into the [XQuery 1.0 and XPath 2.0 Data Model] before the evaluation phase. This is described in the [XQuery 1.0 and XPath 2.0 Data Model] and is not discussed further here.
Serialization phase. Once the [expression/query] is evaluated, processors might want to serialize the result of the [expression/query] as actual XML documents. Serialization of data model instances is still an open issue (See [Issue-0116: Serialization]) and is not discussed further here.
The parsing phase is not specified formally; the formal semantics does not define a formal model for the syntax trees, but uses the [XPath/XQuery] concrete syntax directly. More details about parsing for XQuery 1.0 can be found in the [XQuery 1.0: A Query Language for XML] document and more details about parsing for XPath 2.0 can be found in the [XML Path Language (XPath) 2.0] document. No further discussion of parsing is included here.
For the other three phases (normalization, static type analysis and evaluation), instead or giving an algorithm in pseudo-code, the semantics is described formally by means of inference rules. Inference rules provide a notation to describe how the semantics of an expression derives from the semantics of its sub-expressions. Hence, they provide a concise and precise way for specifying bottom-up algorithms, as required by the normalization, static type analysis, and evaluation phases.
The Formal Semantics uses the following namespace prefixes.
xf: and op: for functions and
operators from the [XQuery 1.0 and XPath 2.0 Functions and Operators] document.dm: for constructors and accessors of the
[XQuery 1.0 and XPath 2.0 Data Model].xs: for XML Schema components, and built-in
types.fs: for Formal Semantics specific
components.All these prefixes are assumed to be bound to the appropriate URIs.
Ed. Note: [Kristoffer/XSL] The fs: prefix URI should be defined here.
This section gives an overview of the main data model features that are relevant to the Formal Semantics. The reader is referred to the [XQuery 1.0 and XPath 2.0 Data Model] document for more details.
An atomic value is a value in the value space of one of the [XML Schema Part 2] atomic types (that is, a primitive simple type that is not derived by list or union).
A node is one of the following kinds: document, element, attribute, namespace, comment, processing-instruction and text. The [XPath/XQuery] Formal Semantics does not currently discuss namespace, comment, and processing-instruction nodes (See [Issue-0105: Types for nodes in the data model.]). Components of nodes can be typed values, string values, names, type annotations, or other nodes.
XML documents and their contents are represented in the data model as trees composed of nodes and atomic values. [XQuery 1.0 and XPath 2.0 Data Model] defines a mapping from the PSVI (Post Schema Validation Information Set) to such trees.
[XQuery 1.0 and XPath 2.0 Data Model] defines constructors, which are functions that construct nodes and atomic values, and accessors, which are functions that access a value's properties or components.
For instance, the children function returns all the child nodes of an element node. The typed value of a node is a sequence of zero or more atomic values. The string value of a node is an instance of the type xs:string. The name of a node is an instance of the type xs:QName.
An item is either an atomic value or a node.
Every value in the data model is a sequence. A sequence is an ordered collection of zero or more items. Sequences are represented by items separated by commas, enclosed in parentheses.
A sequence containing exactly one item is called a singleton sequence. An item is identical to a singleton sequence containing that item. Sequences are never nested--for example, combining the values 1, (2, 3), and () into a single sequence results in the sequence (1, 2, 3). A sequence containing zero items is called an empty sequence.
The data model annotates element nodes, attribute nodes, and atomic values with type names. This type annotation is the type name given to the element, attribute, or value through the validation process.
The remainder of the section is devoted to several related features that merit special attention: node identity, order, and type annotations.
In the [XQuery 1.0 and XPath 2.0 Data Model], nodes have
identity. Node identity follows from the unique
location of the node within exactly one XML document (or document fragment, in the case of XML being
constructed within the [expression/query] itself). It is
possible for two expressions to return not only the same value,
but also the identical node. On the other hand, two expressions
could return results with the same document structure, but
different identities. [XPath/XQuery] supports comparison of
nodes using either "node equality" (equality by
identity) or "value equality" (equality by
value), for instance by using the operators
is or =.
Atomic values do not have an associated identity and are therefore always compared by value.
The difference between node equality and value equality is illustrated by the following example:
let $a1 := <book><author>Suciu</author></book>,
$a2 := <author>Suciu</author>,
$a3 := $a1/author
return
($a1/author is $a3), {-- true, they refer to the same node --}
($a1/author = $a2), {-- true, they have the same value --}
($a1/author is $a2) {-- false, they are not the same node --}
All [XPath/XQuery]'s operators preserve node identity with the
exception of explicit copy operations, node
(element, attribute, etc.) constructors and validate
expressions. An element constructor always creates a deep copy
of its attributes and child nodes. Other operators, such as
sequence constructors or path expressions, do not result in
copies of the corresponding nodes. So, for example, ($a3,
$a2) creates a new sequence of nodes, the first of which
has the same identity as $a1/author.
There are two kinds of order in [XPath/XQuery]: sequence order and document order.
Sequence order. Sequences of items in the [XPath/XQuery] data model are ordered. Sequence order refers to the order of items within a given sequence.
Document order. Document order refers to the total order among all nodes in a given document. It is defined as the order of appearance of the nodes when performing a pre-order, depth-first traversal of a tree. For elements, this corresponds to the order of appearance of their opening tags in the corresponding XML serialization. Document order is equivalent to the definition used in [XML Path Language (XPath) : Version 1.0].
Depending on the context, it may be possible to talk about two different notions of order for the same (set of) nodes. For example:
let $e := <list>
<item id="1"/>
<item id="2"/>
</list>,
$s := ($e/item[@id='2'], $e/item[@id='1'])
...
The item "1" is before item "2" in document order, but the
same two nodes are in the opposite order in the sequence
$s.
The order between nodes from distinct documents is implementation defined but must be stable. That is, it must not change for the duration of query evaluation.
A type annotation indicates the type name for each element and attribute node, which results from the validation process or is assigned by default (depending on the way the data was built). In the case of anonymous types, the data model annotates element and attributes with xs:anyType and xs:anySimpleType respectively.
For instance, consider the document fragment
<fact>The cat weighs <weight>12</weight> pounds.</fact>
and the associated schema
<xs:element name="fact" type="FactType"/>
<xs:complexType name="FactType" mixed="true">
<xs:sequence>
<xs:element name="weight" type="xs:integer">
</xs:sequence>
</xs:complexType>
Before validation, the two elements fact and
weight in the document are annotated by
xs:anyType.
After validation, the element fact is annotated
with the type name FactType and the element
weight is annotated with the type name
xs:integer.
Type annotations are taken into account during [expression/query] evaluation and have an impact on the semantics of [XPath/XQuery].
For instance, consider the following function declaration
define function convert_weight(element of type FactType $x)
{ ... }
This function only accepts as input an element annotated with
the type FactType, or with one of the types derived
from it. Therefore, the previous document fragment is only
accepted by the function after it has been validated against the
given schema.
[3 The XQuery Type System] gives a formal description of data model values with type annotations and explain how type annotations are taken into account when matching a value against a given type.
This section presents an introduction to (statically) typed languages in general, and a conceptual overview of the [XPath/XQuery] type system in particular. The [XPath/XQuery] type system is discussed formally in [3 The XQuery Type System].
A type system relates values, types, and expressions in the language.
The main constituent parts of a typed system are:
A universe of possible types. Type are composed of other types, starting with primitive types, and can be built using type constructors.
In [XPath/XQuery], the primitive types are the primitive atomic types of [XML Schema Part 2]. Type constructors are also based on schema, and can be used to build types for attributes and elements, sequence and choice groups, etc. The [XPath/XQuery] type system is defined in [3.1 Values and Types].
A notation for specifying types.
In [XPath/XQuery], types are imported from XML Schema, and can be used in expressions using the SequenceType syntax. In order to model and reason about types in [XPath/XQuery], this document introduces its own notation for types. This notation is defined in [3.1 Values and Types].
A relationship between values and types. Formally, the semantics of a type is defined to be the (usually infinite) set of instance values which match that type.
For example, the type xs:integer consists of the set of
all integer values, and the type element address {
xs:string } consists of all elements named
"address" whose contents are a single string value, etc.
Matching a [XPath/XQuery] value against a type is defined in [3.3 Matching].
A notion of subtyping. Subtyping is a relationship between types that plays an important role in a statically typed language. Notably, it is used to determine whether function calls are legal or not.
Subtyping in [XPath/XQuery] is defined in [3.5.1 Subtype].
For a statically typed language one also needs to define:
Rules that relate expressions to types. Static type analysis infers a type for each expression in the language. The most important property for a static type system is that the type inferred statically must be such that for all valid inputs, evaluating the expression returns a result that matches the inferred type. If this property holds, then the static typing provides guarantees that certain kinds of errors can not occur at run-time.
The static typing rules for [XPath/XQuery] are defined for all [XPath/XQuery] expressions in [5 Expressions], [6 The Query Prolog] and [7 Additional Semantics of Functions].
Finally, types can also be used during language evaluation if the language provides:
Expressions on types. A language can support operations that take types into account, such as casting, type matching, etc.
[XPath/XQuery] provide several such expressions: "typeswitch", "cast as", "treat as", and "validate as". [XPath/XQuery] also support a "validate" expression, which performs XML Schema validation. Validation is related to matching in that if validation succeeds, it returns a value which matches the type used for validation. Expressions on types are defined in [5.12 Expressions on SequenceTypes]. The validate expression is defined in [3.4 Erase and Annotate].
The remainder of the section is devoted to several related features that merit special attention: the relationship between the [XPath/XQuery] type system and XML Schema, structural and named typing, and subtyping.
The [XPath/XQuery] type system is based on [XML Schema]. An introduction to XML Schema can be found in [XML Schema Part 0].
The [XPath/XQuery] type system captures a large subset of XML Schema. This section gives a summary of which features are captured and in some cases deviations from XML Schema.
The following components and features of XML Schema are captured exactly in the [XPath/XQuery] type system:
Element and attribute declarations;
Simple and complex type definitions;
Local attributes and elements;
Named and anonymous types;
the type name hierarchy;
Sequence, choice and all groups;
Wildcards;
Derivation by extension, list and union;
Substitution groups.
The following components or features of XML Schema are not represented in the [XPath/XQuery] type system.
Simple type facets;
Identity-constraint definitions;
Notations and annotations.
The following components or features of XML Schema are represented partially or differently in the [XPath/XQuery] type system.
The [XPath/XQuery] type system only captures an approximation
of minOccurs and maxOccurs. Like DTDs, the [XPath/XQuery]
type system only supports *, +, and ? which correspond to
[minOccurs="0" maxOccurs="unbounded"],
[minOccurs="1" maxOccurs="unbounded"], and
[minOccurs="0" maxOccurs="1"] respectively.
The [XPath/XQuery] type system generalizes the notion of derivation by restriction for the needs of static type analysis. Derivation by restriction in XML Schema relies on a syntactic relationship between content models. The [XPath/XQuery] type system uses a generalization of this relationship based on the notion of inclusion between types -- also called structural subtyping.
At compile time, the [XPath/XQuery] environment imports XML Schema declarations, and loads them as declarations in the [XPath/XQuery] type system. This loading process is specified by a mapping from XML Schema to the [XPath/XQuery] type system, given in [8 Importing Schemas].
XML Schema is a powerful schema language for XML. XML Schema can describe both structural constraints as well as type annotations that apply to XML documents. These aspects are referred to as structural typing and named typing respectively.
Structural typing. An important notion underlying XML Schema is the concept of regular expressions, and their tree counterparts, regular tree grammars. An introduction to regular (tree) languages can be found in [Languages] or in [TATA].
Tree grammars can be used to capture the structural constraints imposed by an XML Schema. Automata are the traditional way of implementing tree grammars and can be used to implement part of the XML Schema validation process.
For instance, consider the following element declaration:
<xs:element name="root">
<xs:complexType>
<xs:sequence maxOccurs="unbounded">
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="a"/>
<xs:element ref="b" minOccurs="0"/>
</xs:choice>
<xs:element ref="c"/>
</xs:sequence>
</xs:complexType>
</xs:element>
The structural constraints imposed by the definition of
the content of element root can be described by
the following regular expression (written here in
DTD notation): ((a|b?)*,c)+. Such a constraint can be
checked using a finite state automaton applied on the
children of the element root in the instance
document.
Named typing. In addition, XML Schema provides the ability to associate names with type definitions and to declare relationships between type names.
For instance, consider the following two element declarations:
<xs:element name="person">
<xs:complexType name="Person">
<xs:sequence>
<xs:element ref="ssn">
<xs:choice>
<xs:sequence>
<xs:element ref="major" minOccurs="0"/>
<xs:element ref="minors" maxOccurs="unbounded"/>
</xs:sequence>
<xs:element ref="job_desc"/>
</xs:choice>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="student">
<xs:complexType name="Student">
<xs:complexContent>
<xs:restriction base="Person">
<xs:sequence>
<xs:element ref="ssn">
<xs:element ref="major" minOccurs="0"/>
<xs:element ref="minors" maxOccurs="unbounded"/>
</xs:sequence>
</xs:restriction>
</xs:complexType>
</xs:complexContent>
</xs:element>
The type Studentof the element
student is derived by restriction from the type
Person. Such a declaration not only defines a
new type, but also creates a relationship between the type
names Person and Student which
needs to be taken into account when doing validation and
matching against a given type. For instance, the following
function
define function salary(element of type Person $x)
{ .... }
accepts as parameters elements of type Person as well as
elements with types derived by restriction from Person, but does not
accept elements with the same content model as the element
person but a type which does not derive
from the type Person.
A type is a subtype of another type if every value matching the first type also matches the second type. Since matching takes both type structure and type names into account, so is subtyping.
Here is a simple example of subtyping on two regular expressions:
( xs:double )* ( xs:integer | xs:double )*
The first type is a subtype of the second, because any sequence that contains only doubles is also an example of a sequence that contains either integers or doubles. This example only illustrates the structural subtyping.
Type1 <: Type2 indicates that type Type1 is a subtype of Type2. Note that the notation <: is used when defining the [XPath/XQuery] semantics, but is not part of the [XPath/XQuery] syntax.
Subtyping plays a crucial role in the semantics of function application since all functions have an associated signature that declares the types of input parameters and the type of the result of the function.
Intuitively, one can call a function not only by passing an argument of the declared type, but also any argument whose type is a subtype of the declared type. This property is called type substitutability.
Ed. Note: [Kristoffer/XSL] Type substitutability should really be true for any subexpression, not just function arguments (it just happens that in functional languages, where type substitutability is usually defined, the situations are equivalent). Specifically type substitutability does not apply to any subexpression of [XPath/XQuery]:
typeswitchviolates it. In particular this means that while one may replace a function argument with one whose type is a subtype this does not guarantee that the resulting value is the same even if the value is the same (but cast to a subtype).
Subtyping in [XPath/XQuery] is related to, but not identical to, "derivation by restriction" in XML Schema. A type derived by restriction is always a subtype of its base type, but the opposite is not true. For instance, consider the following types:
( xs:double )+ xs:double, ( xs:integer | xs:double )*
The first type is a subtype of the second, because any sequence that contains one or more doubles is also an example of a sequence containing a double followed by a sequence of zero or more integers or doubles. But the first type cannot be derived from the second type using derivation by restriction as defined in XML Schema.
The [XQuery 1.0 and XPath 2.0 Functions and Operators] document provides a number of useful basic functions over the components of the [XPath/XQuery] data model (atomic values, nodes, sequences, etc). A number of these functions are used in the course of describing the [XPath/XQuery] semantics. Here is the list of functions from the [XQuery 1.0 and XPath 2.0 Functions and Operators] document that are used in the [XPath/XQuery] Formal Semantics:
op:numeric-add
op:numeric-subtract
op:numeric-multiply
op:numeric-divide
op:numeric-mod
op:numeric-unary-plus
op:numeric-unary-minus
op:numeric-equal
op:numeric-less-than
op:numeric-greater-than
op:boolean-equal
op:boolean-less-than
op:boolean-greater-than
op:yearMonthDuration-equal
op:yearMonthDuration-less-than
op:yearMonthDuration-greater-than
op:dayTimeDuration-equal
op:dayTimeDuration-less-than
op:dayTimeDuration-greater-than
op:dateTime-equal
op:dateTime-less-than
op:dateTime-greater-than
op:add-yearMonthDurations
op:subtract-yearMonthDurations
op:multiply-yearMonthDuration
op:divide-yearMonthDuration
op:add-dayTimeDurations
op:subtract-dayTimeDurations
op:multiply-dayTimeDuration
op:divide-dayTimeDuration
op:add-yearMonthDuration-to-dateTime
op:add-dayTimeDuration-to-dateTime
op:subtract-yearMonthDuration-from-dateTime
op:subtract-dayTimeDuration-from-dateTime
op:add-yearMonthDuration-to-date
op:add-dayTimeDuration-to-date
op:subtract-yearMonthDuration-from-date
op:subtract-dayTimeDuration-from-date
op:add-dayTimeDuration-to-time
op:subtract-dayTimeDuration-from-time
op:QName-equal
op:anyURI-equal
op:hex-binary-equal
op:base64-binary-equal
op:NOTATION-equal
op:node-equal
op:node-before
op:node-after
op:node-precedes
op:node-follows
op:to
op:concatenate
op:item-at
op:union
op:intersect
op:except
op:operation
xf:false
xf:true
xf:count
xf:boolean
xf:get-namespace-uri
xf:get-local-name
xf:round
xf:compare
xf:not
xf:empty
xf:root
xf:error
Many functions in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document are
generic: they perform operations on arbitrary
components of the data model, e.g., any kind of node, or any
sequence of items. For instance, the
xf:distinct-nodes function removes duplicates in
any sequence of nodes. As a result, the signature given in the
[XQuery 1.0 and XPath 2.0 Functions and Operators] document is also generic. For instance, the
signature of the xf:distinct-nodes function is:
define function xf:distinct-nodes(node*) returns node*
If applied as is, this signature would result in poor static
type information. In general, it would be preferable if some of
these function signatures could take the type of input
parameters into account. For instance, if the function
xf:distinct-nodes is applied on a parameter of type
element a*, element b, one can easily deduce that
the resulting sequence is a collection of either a
or b elements.
In order to provide better static typing, some specific typing rules are specified for some of the functions in [XQuery 1.0 and XPath 2.0 Functions and Operators]. These additional typing rules are given in [7 Additional Semantics of Functions]. Here is the list of functions that are given specific typing rules.
xf:error
xf:distinct-nodes
xf:distinct-values
op:union
op:intersect
op:except
op:to
xf:data
Ed. Note: There might be other functions that need specific typing rules. See [Issue-0135: Semantics of special functions].
Some queries result in an error. Errors that are detected at
compile-time, like parse errors or type errors, are reported to
the user by the system. Dynamic errors are raised using the
xf:error() function. General handling of errors in
the Formal Semantics is still an open issue (See [Issue-0094:
Static type errors and warnings], [Issue-0098:
Implementation of and conformance levels for static type
checking], [Issue-0171:
Raising errors], and [Issue-0169:
Conformance Levels]).
The [XPath/XQuery] Data Model provides operations to construct or
access components of the Data Model, some of which are used in
the course of defining the [XPath/XQuery] semantics. The Formal
Semantics uses the namespace prefix dm: for those
functions to distinguish them from other functions in the
[XQuery 1.0 and XPath 2.0 Functions and Operators] document.
Here is a list of data model constructors or accessors used for Formal Semantics specification.
dm:atomic-value
dm:children
dm:attributes
dm:parent
dm:name
dm:node-kind
dm:dereference
dm:empty-sequence
dm:namespaces
dm:type
dm:typed-value
dm:attribute-node-atomic
dm:element-node-atomic
dm:text-node
XPath axes are used to describe tree navigation in instances of the [XPath/XQuery] data model. The semantics of axis navigation is described in terms of data model functions. Some of the XPath axes are directly supported through existing data model accessors. Some axes (e.g., ancestor) are defined as a simple recursive application of a data model accessor and others (e.g., following) require more complex computation on top of existing data model accessors. The correspondence between XPath axis and data model accessors is specified in section [5.2.1.1 Axes]
In a few cases, the Formal Semantics makes use of
functions that are not currently in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document.
The namespace prefix fs:
is used for those functions, to distinguish them from functions
in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document.
Here is a list of additional functions used for Formal Semantics specification.
fs:characters-to-string
fs:distinct-doc-order
fs:item-sequence-to-node-sequence
fs:item-sequence-to-string
In order to make the specification of the semantics both concise and precise, formal notations in the form of judgments and inference rules are used. This section, introduces the notations for judgments, inference rules and mapping rules, as well as the notion of an environment.
This section is intended for the reader who may not be familiar with these notations. The reader familiar with these notations can skip this section and go directly to [2.7 The Formal Semantics].
Grammar productions are used in this document to describe "objects" (values, types, [XPath/XQuery] expressions, etc.) manipulated by the Formal Semantics. The Formal Semantics makes use of several kinds of grammar productions.
XQuery grammar productions describe the XQuery language and expressions. XQuery productions are identified by a number, which corresponds to the number in the [XQuery 1.0: A Query Language for XML] document, and are annotated with "(XQuery)". For instance, the following production describes FLWR expressions in XQuery.
| [10 (XQuery)] | FLWRExpr | ::= |
For the purpose of this document, the differences between the XQuery 1.0 and the XPath 2.0 grammars are mostly irrelevant. By default, this document uses XQuery 1.0 grammar productions. Whenever the grammar for XPath 2.0 differs from the one for XQuery 1.0, the corresponding XPath 2.0 productions are also given. XPath productions are identified by a number, which corresponds to the number in the document, and are annotated with "(Path)". For instance, the following production describes for expressions in XPath.
| [13 (XPath)] | ForExpr | ::= |
XQuery Core grammar productions describe the XQuery Core. The complete XQuery Core grammar is given in [A Normalized core grammar]. XQuery Core productions are identified by a number, which corresponds to the number in [A Normalized core grammar], and are annotated by "(Core)". For instance, the following production describes the simpler form of "for" expressions present in the XQuery Core.
| [8 (Core)] | ForExpr | ::= |
The Formal Semantics sometimes needs to manipulate "objects" (values, types, expressions, etc.) for which there is no existing grammar production in the [XQuery 1.0: A Query Language for XML] document. In these cases, specific grammar productions are introduced. Notably, additional productions are used to describe the [XPath/XQuery] type system. XQuery Formal Semantics productions are identified by a number, and are annotated by "(Formal)". For instance, the following production describes global type definitions in the [XPath/XQuery] type system.
| [27 (Formal)] | Definition | ::= |
Note that grammar productions which are specific to the Formal Semantics (i.e., with the "(Formal)" annotation) are not part of [XPath/XQuery]. They are not accessible to the user and are only used in the course of defining the language's semantics.
The basic building block of the formal specification is called a judgment. A judgment expresses whether a property holds or not.
For example:
Notation
The judgment
holds if the painting Painting is beautiful.
Or more relevant, here are two example of judgments that are used extensively in the rest of this document.
Notation
The judgment
holds if the expression Expr yields (or evaluates to) the value Value.
Notation
The judgment
holds when the expression Expr has type Type.
A judgment can contain symbols and patterns.
Symbols are purely syntactic and are used to write the judgment itself. In general, symbols in a judgment are chosen to reflect its meaning. For example, 'is beautiful', '=>' and ':' are symbols, the second and third of which should be read "yields", and "has type" respectively.
Patterns are written with italicized words. The name of a pattern is significant: each pattern name corresponds to an "object" (a value, a type, an expression, etc.) that can be substituted legally for the pattern. By convention, all patterns in the Formal Semantics correspond to grammar non-terminals, and are used to represent entities that can be constructed through application of the corresponding grammar production. For example, Expr can be used to represent any [XPath/XQuery] expression, Value can be used to represent any value in the [XPath/XQuery] data model.
When applying the judgment, each pattern must be instantiated to an appropriate sort of "object" (value, type, expression, etc). For example, '3 => 3' and '$x+0 => 3' are both instances of the judgment 'Expr => Value'. Note that in the first judgment, '3' corresponds to both the expression '3' (on the left-hand side of the => symbol) and to the the value '3' (on the right-hand side of the => symbol).
Patterns may appear with subscripts (e.g. Expr1, Expr2), which simply name different instances of the same sort of pattern. Each distinct pattern must be instantiated to a single "object" (value, type, expression, etc.). If the same pattern occurs twice in a judgment description then it should be instantiated with the same "object". For example, '3 => 3' is an instance of the judgment 'Expr1 => Expr1' but '$x+0 => 3' is not since the two expressions '$x+0' and '3' cannot be both instance of the pattern Expr1. On the contrary, '$x+0 => 3' is an instance of the judgment 'Expr1 => Expr2'.
In a few cases, patterns may have a name which is not exactly the name of a grammar production but is based on it. For instance, a BaseTypeName is a pattern which stands for a type name, as would TypeName, or TypeName2. This usage is limited, and only occurs to improve the readability of some of the inference rules.
Whether a judgments holds or not is specified by means of inference rules. Inference rules express the logical relation between judgments and describe how complex judgments can be concluded from simpler premise judgments. (As explained in [2.1 Processing model], this approach lends itself well to bottom-up algorithms.)
A logical inference rule is written as a collection of premises and a conclusion, respectively written above and below a dividing line:
| premise1 ... premisen |
|
|
| conclusion |
All premises and the conclusion are judgments. The interpretation of an inference rule is: if all the premise judgments above the line hold, then the conclusion judgment below the line must also hold.
Here is a simple example of inference rule, which uses the example judgment 'Expr => Value' from above:
| $x => 0 3 => 3 |
|
|
| $x + 3 => 3 |
This inference rule expresses the following property of the judgment 'Expr => Value': if the variable expression '$x' yields the value '0', and the literal expression '3' yields the value '3', then the expression '$x + 3' yields the value '3'.
It is also possible for an inference rule to have no premises above the line to have no judgments at all; this simply means that the expression below the line always holds:
|
|
| 3 => 3 |
This inference rule expresses the following property of the judgment 'Expr => Value': evaluating the literal expression '3' always yields the value '3'.
The two above rules are expressed in terms of specific variables and values, but usually rules are more abstract. That is, the judgments they relate contain patterns. For example, here is a rule that says that for any variable Variable that yields the integer value Integer, adding '0' yields the same integer value:
| Variable => Integer |
|
|
| Variable + 0 => Integer |
As in a judgment, each pattern in a particular inference rule must be instantiated to the same "object" within the entire rule. This means that one can talk about "the value of Variable" instead of the more precise "what Variable is instantiated to in (this particular instantiation of) the inference rule".
Logical inference rules use environments to record information computed during static type analysis or dynamic evaluation so that this information can be used by other logical inference rules. For example, the type signature of a user-defined function in a [expression/query] prolog can be recorded in an environment and used by subsequent rules. Similarly, the value assigned to a variable within a "let" statement can be captured in an environment and used for further evaluations.
An environment is a dictionary that maps a symbol (e.g., a function name or a variable name) to an "object" (e.g., a function body, a type, a value). One can either access existing information from an environment, or update the environment.
If "env" is an environment, then "env(symbol)" denotes the "object" to which symbol is mapped to. The notation is intentionally akin to function application as an environment can be seen as a "function" from the argument symbol to the "object" that the symbol is mapped to.
This specification uses environment groups that group related environments. If "env" is an environment group with the member "mem", then that environment is denoted "env.mem" and the value that it maps symbol to is denoted "env.mem(symbol)".
Updating is only defined on environment groups:
"env [ mem(symbol |-> object) ]" denotes the a new environment group which is identical to env except that the mem environment has been updated to map symbol to object. The notation symbol |-> object indicates that symbol is mapped to object in the new environment.
If the "object" is a type then the following conventional variant notation is used: "env [ mem(symbol : object) ]".
The following shorthand is also allowed: "env [ mem( symbol1 |-> object1 ; ... ; symboln |-> objectn ) ]" in which each symbol is mapped to a corresponding object in the new environment.
This notation is equivalent to nested simple updates, as in " (...(env[mem(symbol1 |-> object1)]) ... [mem(symboln |-> objectn)])".
Note that updating the environment overrides any previous binding that might exist for the same name. Updating the environment is used to capture the scope of variables, namespaces, etc. Also, note that there are no operations to remove entries from environments: the need never arises because the environment group from "before" the update remains accessible concurrently with the updated version.
Environments are typically used as part of a judgment, to capture some of the context in which the judgment is computed. Indeed, most judgments are computed assuming that some environment is given. This assumption is denoted by prefixing the judgment with "env |-". The "|-" symbol is called a "turnstile" and is ubiquitous in inference rules.
For instance, the judgment
should be read: assuming the dynamic environment dynEnv, the expression Expr yields the value Value.
The two main environments used in the Formal Semantics are: a dynamic environment (dynEnv), which captures the [XPath/XQuery]'s dynamic context, and a static environment (statEnv), which captures the [XPath/XQuery]'s static context. Both are defined in [4.1 Expression Context].
Putting the above notations together, here is an example of an inference rule that occurs later in this document:
This rule is read as follows: if two expressions Expr1 and Expr2 have already been statically inferred to have types Type1 and Type2 (the two premises above the line), then it is the case that the expression below the line "Expr1 , Expr2" must have the type "Type1, Type2", which is the sequence of types Type1 and Type2.
The above inference rule, does not modify the (static) environment. The following rule defines the static semantics of a "let/return" expression. The binding of the new variable is captured by an update to the varType component of the original static environment.
| statEnv |- Expr1 : Type1 statEnv [ varType(QName : Type1) ] |- Expr2 : Type2 |
|
|
statEnv |-
let
$QName := Expr1
return
Expr2 : Type2
|
This rule should be read as follows. First, the type Type1 for the "let" input expression Expr1 is computed. Second the "let" variable is added into the varType member of the static environment group statEnv, with type Type1. Finally, the type Type2 of Expr2 is computed in that new environment.
Ed. Note: Jonathan suggests that we should explain 'chain' inference rules. I.e., how several inference rules are applied recursively.
Finally, this section introduces the top-level notations defining the processing model phases described above. These notations are used to define the normalization, static type analysis, and dynamic evaluation processing phases.
Normalization is specified using mapping rules which describe how a [XPath/XQuery] expression is rewritten into another expression in the [XPath/XQuery] Core. Note that mapping rules are also used in [8 Importing Schemas] to specify how XML Schemas are imported into the [XPath/XQuery] Type System.
Notation
Mapping rules are written using a square bracket notation, as follows:
| [Object]Subscript |
| == |
| Mapped Object |
The original "object" is written above the == sign. The rewritten "object" is written beneath the == sign. The subscript is used to indicate what kind of "object" is mapped, and sometimes to pass some information between mapping rules.
(Since normalization is always done in the context of the static context the above is really a shorthand for
but we shall stay with the shorthand because statEnv will always be implied.)
The static environment is used in certain cases (e.g. for normalization of function calls) during normalization. To keep the notation simpler, the static environment is not written in the normalization rules, but it is assumed to be available.
Ed. Note: [Kristoffer/XSL] We should decide whether to use a shorthand notation as suggested or modify the mapping rules throughout.
Specifically the normalization rule that is used to map "top-level" expressions in the [XPath/XQuery] syntax into expressions in the [XPath/XQuery] Core is
| [Expr]Expr |
| == |
| CoreExpr |
which indicates that the expression Expr is normalized to the expression CoreExpr in the [XPath/XQuery] core (with the implied statEnv).
Example
For instance, the following [expression/query]
for $i in (1, 2),
$j in (3, 4)
return
element pair { ($i,$j) }
is normalized to the core expression
for $i in (1, 2) return
for $j in (3, 4) return
return
element pair { ($i,$j) }
in which the complex "FWLR" expression is mapped into a composition of two simpler "for" expressions.
The static semantics is specified using type inference rules, which relate [XPath/XQuery] expressions to types and specify under what conditions an expression is well typed.
Notation
The judgment
holds when, assuming the static environment statEnv is given, the expression Expr has type Type.
Example
The result of static type inference is to associate a static type with every [expression/query], such that any evaluation of that [expression/query] is guaranteed to yield a value that belongs to that type.
For instance, the following expression.
let $v := 3
return
$v+5
has type xs:integer. This can be inferred as follows: the input literals '3' and '5' have type integer, so the variable $v also has type integer. Since the sum of two integers is an integer, the complete expression has type integer.
Note
The type of an expression is computed by inference. Static type inference rules are given to describe, for each kind of expression, how to compute the type of the expression given the types of its sub-expressions. Here is a simple example of such a rule:
| statEnv |- Expr1 : xs:boolean statEnv |- Expr2 : Type2 statEnv |- Expr3 : Type3 |
|
|
statEnv |-
if Expr1
then Expr2
else Expr3
: ( Type2 | Type3 )
|
This rule states that if the conditional expression of an "if" expression has type boolean, then the type of the entire expression is one of the two types of its "then" and "else" clauses. Note that the resulting type is represented as a choice: '(Type2|Type3)'.
The "left half" of the expression below the line (the part before the :) corresponds to some phrase in a [expression/query], for which a type is computed. If the [expression/query] has been parsed into an internal parse tree, this usually corresponds to some node in that tree. The expression usually has patterns in it (here Expr1, Expr2, and Expr3) that need to be matched against the children of the node in the parse tree. The expressions above the line indicate things that need to be computed to use this rule; in this case, the types of the two sub-expressions of the "," operator. Once those types are computed (by further applying static inference rules recursively to the expressions on each side), then the type of the expression below the line can be computed. This illustrates a general feature of the type system: the type of an expression depends only on the type of its sub-expressions. The overall static type inference algorithm is recursive, following the parse structure of the [expression/query]. At each point in the recursion, an appropriate matching inference rule is sought; if at any point there is no applicable rule, then static type inference has failed and the [expression/query] is not type-safe.
The dynamic, or operational, semantics is specified using value inference rules, which relate [XPath/XQuery] expressions to values, and in some cases specify the order in which an [XPath/XQuery] expression is evaluated.
Notation
The judgment
holds when, assuming the static environment statEnv and dynamic environment dynEnv are given, the expression Expr yields the value Value.
The static environment is used in certain cases (e.g. for type matching) during evaluation. To keep the notation simpler, the static environment is not written in the dynamic inference rules, but it is assumed to be available.
Example
For instance, the following expression.
let $v := 3
return
$v+5
yields the integer value 8. This can be inferred as follows: the input literals '3' and '5' denote the values 3 and 5, respectively, so the variable $v has the value 3. Since the sum of 3 and 5 is 8, the complete expression has the value 8.
Note
As with static type inference, logical inference rules are used to determine the value of each expression, given the dynamic environment and the values of its sub-expressions. [XPath/XQuery]'s dynamic semantics is modeled on the dynamic semantics presented in [Milner].
The inference rules used for dynamic evaluation, like those for static type inference, follow a top-down recursive structure, computing the value of expressions from the values of their sub-expressions.
Ed. Note: Status:This section has been extensively revised to improve the alignment between the [XPath/XQuery] type system and XML Schema. It now incorporates "named typing". Feedback on the new design is solicited.
The [XPath/XQuery] type system is used for the specification of both the dynamic and the static semantics of [XPath/XQuery]. It is used to describe the semantics of expressions on sequence types and in type conversion rules. It is also the basis for the static semantics of [XPath/XQuery].
This section defines formal values and types, four main operations on types. Three operations (match, erase, and annotate) are used for the dynamic semantics. The last operation (subtyping) is used for the static semantics.
The "match" operation takes as input a value and a type and either succeeds or fails. Type matching checks that the value given as input verifies the name and structural constraints given by the type. It is used in matching parameters against function signatures, and matching values against cases in "typeswitch". An informal description of type matching is given in [2.4.2 SequenceType] in [XQuery 1.0: A Query Language for XML].
The match operation is one the most important operation on types and is the basis for the semantics of typeswitch, treat and assert as, and is used in function calls to check dynamically that the parameter(s) of a function verifies the function signature.
The "erase" operation takes a value and removes all type information from it.
The "annotate" operation takes an untyped value and a type and either succeeds or fails. Type annotation models part of the semantics of XML Schema validation. From the untyped value, it verifies the constraints described by the type, and annotates the value with type names, and adds atomic values. If it succeeds it returns a new value.
The "erase" and "annotate" operations model aspects of the validate as in [XPath/XQuery]. Note that the validate operation in [XPath/XQuery] needs to operate on possibly already typed data model instances, hence the need to erase type information before applying standard XML Schema validation.
The "subtyping" operation takes two types and succeeds if the first type is smaller than the second.
This section introduces formal notations for describing XQuery types and values. Those notations are used extensively throughout this document to represent values and types in inference rules. We use formal notations for values and types because they are more compact, and are easier to manipulate within the inference rules. The formal notations for values and types introduced here are not exposed to the XQuery user.
The following grammar introduces formal notations for values. This notation is used to formally represent values in the the [XQuery 1.0 and XPath 2.0 Data Model].
A value is a sequence of zero or more items. An item is
either a node or an atomic value. A node is either an element,
an attribute, a document node, or a text node. Text nodes always
contain string values of type xs:anySimpleType. Elements,
attributes, and atomic values have a type annotation, which is
the QName of a type.
Elements may be annotated with any type, and attributes may be annotated with any simple type.
A atomic value encapsulates an XML Schema atomic type (which
is its type annotation) and a corresponding value of that
type. An XML Schema atomic type [XML Schema Part 2] may be primitive
or derived, or xs:anySimpleType. The corresponding atomic type
value may be a value in the value space of one of the 19
primitive XML Schema types.
Type annotations are always present. Untyped elements or
attributes (e.g., from well-formed documents before validation
has been performed) are annotated with xs:anyType and attributes
and atomic values with no other type information are annotated
with xs:anySimpleType.
A value is called a simple value if it consists of a sequence of zero or more atomic values.
| [1 (Formal)] | Value | ::= | |
| [2 (Formal)] | SimpleValue | ::= | |
| [3 (Formal)] | ElementValue | ::= | |
| [4 (Formal)] | AttributeValue | ::= | |
| [5 (Formal)] | DocumentValue | ::= | |
| [6 (Formal)] | TextValue | ::= | |
| [7 (Formal)] | NodeValue | ::= | |
| [8 (Formal)] | Item | ::= | |
| [9 (Formal)] | AtomicValue | ::= | |
| [10 (Formal)] | AtomicTypeValue | ::= | |
| [11 (Formal)] | TypeAnnotation | ::= | |
| [12 (Formal)] | ElementName | ::= | |
| [13 (Formal)] | AttributeName | ::= | |
| [14 (Formal)] | TypeName | ::= |
Notation
In the above grammar, the atomic type names are used to
represents the value space of the corresponding XML Schema
atomic type. For instance, "String" indicates the
value space of xs:string, and
"Decimal" indicates the value space of
xs:decimal.
Note that the same rule about constructing sequences apply to
the values described by that grammar. Notably sequences cannot
be nested. For instance, the sequence (10, (1, 2), (), (3,
4)) is equivalent to the sequence (10, 1, 2, 3,
4).
Ed. Note: Issue: Although the XQuery data model describes an error value, this error value is not represented formally here. Formal treatment of errors is deferred to a future version of that document.
Ed. Note: Issue: The formal semantics does not represent comments and process-instructions (See [Issue-0143: Support for PI, comment and namespace nodes]).
Ed. Note: Issue: This formal notation for values only represents either text nodes or values. (See [Issue-0144: Representation of text nodes in formal values])
Example
Example (1):
<fact>The cat weighs <weight units="lbs">12</weight> pounds.</fact>
In the absence of a Schema, this document is represented as
element fact of type xs:anyType {
text { "The cat weighs " },
element weight of type xs:anyType {
attribute units of type xs:anySimpleType {
"lbs" of type xs:anySimpleType
}
text { "12" }
},
text { " pounds." }
}
Example (2):
<weight xsi:type="xs:integer">42</weight>
The formal model can represent values before and after validation. Before validation, this element is represented as:
element weight of type xs:anyType {
attribute xsi:type of type xs:anySimpleType {
"xs:integer" of type xs:anySimpleType
},
text { "42" }
}
After validation, this element is represented as:
element weight of type xs:integer {
attribute xsi:type of type xs:QName {
"xs:integer" of type xs:QName
},
42 of type xs:integer
}
Note that the typing rules must permit attributes with name xsi:type and xsi:nil to appear on any element.
Example (3):
<sizes>1 2 3</sizes>
Before validation, this element is represented as:
element sizes of type xs:anyType {
text { "1 2 3" }
}
Assume the following Schema.
<xs:element name="sizes" type="sizesType"/>
<xs:simpleType name="sizesType">
<xs:list itemType="sizeType"/>
</xs:simpleType>
<xs:simpleType name="sizeType">
<xs:restriction base="xs:integer"/>
</xs:simpleType>
After validation against this Schema, the element is represented as:
element sizes of type sizesType {
1 of type sizeType,
2 of type sizeType,
3 of type sizeType
}
Example (4): Example with an anonymous type.
<sizes>1 2 3</sizes>
Before validation, this element is represented as:
element sizes of type xs:anyType {
text { "1 2 3" }
}
Assume the following Schema.
<xs:element name="sizes">
<xs:simpleType>
<xs:list itemType="xs:integer"/>
</xs:simpleType>
</xs:element>
After validation against this Schema, the element is represented as:
element sizes of type xs:anySimpleType {
1 of type xs:integer,
2 of type xs:integer,
3 of type xs:integer
}
Example (5): Example with a union type.
<sizes>1 two 3 four</sizes>
Before validation, this element is represented as:
element sizes of type xs:anyType {
text { "1 two 3 four" }
}
Assume the following Schema:
<xs:element name="sizes" type="sizesType"/>
<xs:simpleType name="sizesType">
<xs:list itemType="sizeType"/>
</xs:simpleType>
<xs:simpleType name="sizeType">
<xs:union memberType="xs:integer xs:string"/>
</xs:simpleType>
After validation against this Schema, the element is represented as:
element sizes of type sizesType {
1 of type xs:integer,
"two" of type xs:string,
3 of type xs:integer,
"four" of type xs:string
}
As they are in DTDs, types in [XPath/XQuery] are based on regular expressions. A type is composed from item types by optional, one or more, zero or more, all group, sequence, choice, empty sequence, or empty choice (written none).
The type none matches no values. It is called the empty choice because it is the identity for choice, that is (Type | none) = Type)). The type none appears in the definition of [3.6.1 Prime types], and is the type for [2.5.3 The error function].
| [15 (Formal)] | Type | ::= | |
| [16 (Formal)] | Occurrence | ::= |
An item type is an element or attribute type, a document node type, a text node type, the untyped type (i.e., which is the type of untyped text values), or an atomic type.
| [17 (Formal)] | ItemType | ::= | |
| [18 (Formal)] | AtomicTypeName | ::= |
An element or attribute type gives an optional name and an optional type specifier. A name with a type specifier is a local declaration. A type specifier alone is a local declaration that matches any name. A name alone refers to a global declaration. The word "element" or "attribute" alone refers to any element or any attribute.
| [19 (Formal)] | ElementType | ::= | |
| [20 (Formal)] | AttributeType | ::= | |
| [21 (Formal)] | Nillable | ::= |
A type specifier either references a global type, or specifies an anonymous type. An anonymous type is described by naming the base type it derives from and giving the type, optionally with a flag indicating a mixed content.
| [22 (Formal)] | TypeSpecifier | ::= | |
| [23 (Formal)] | TypeDerivation | ::= | |
| [24 (Formal)] | TypeReference | ::= | |
| [25 (Formal)] | Derivation | ::= | |
| [26 (Formal)] | Mixed | ::= |
Example
Example (1).
element
matches any element.
Example (2).
element of type xs:integer
matches any element of type xs:integer, such as:
element size of type xs:integer {
2 of type xs:integer
}
Example (3).
element restricts xs:anyType { xs:integer* }
matches any element of any type that contains a sequence of integers, such as:
element numbers of type xs:integer {
1 of type xs:integer,
2 of type xs:integer,
3 of type xs:integer
}
Example (4).
element sizes
refers to the global declaration for element "sizes".
Example (5).
element trouble of type xs:anyType { ( xs:float | xs:string )* }
describes an element with a simple type derived by union. This type matches the following instance:
validate as trouble { <trouble>this is not 1 string</trouble> }
=>
element trouble of type xs:anyType {"this", "is", "not", 1, "string" }
Empty content can be indicated with the explicit empty sequence, or omitted, as in:
element bib { () }
element bib { }
If the content model is omitted, then the element, (resp. attribute or type) name must be a proper QName, and refers to the global element (resp. attribute or type) with that name, which must be defined (by importation from a schema or through a type definition expression).
The type system includes three operators on types: ",", "|" and "&", corresponding respectively to sequence, choice and all groups in Schema.
The "," operator builds the "sequence" of two types. For example,
element title of type xs:string, element year of type xs:integer
means a sequence of an element title of type string followed by an element year of type integer.
The "|" operator builds a "choice" between two types. For example,
element editor of type xs:string | element bib:author*
means either an element editor of type string, or a
sequence of bib:author elements.
The "&" operator builds the "interleaved
product" of two types. The type t1
& t2 matches any sequence that is
an interleaving of a sequence that matches t1 and a
sequence that matches t2. For example,
(element a , element b) & xs:string =
element a, element b, xs:string
| element a, xs:string, element b
| xs:string, element a, element b
In each case, the order of element a and
element b is unchanged, but the xs:string can
be interleaved in any position.
The interleaved product is used to represent all groups in XML Schema. The xs:all construct in XML Schema may only consist of global or local element declarations with lower bound 0 or 1, and upper bound 1.
At the top level, one can define elements, attributes, and types.
| [27 (Formal)] | Definition | ::= | |
| [28 (Formal)] | Substitution | ::= |
Global element and attribute declarations, like local element and attribute declarations, contain a type specifier. In addition, a global element declaration may declare the substitution group for the element and whether the element is nillable. A global type declaration specifies both the derivation and the declared type.
Ed. Note: Issue: The formal semantics provide support for substitution group. Support for substitution group is still an open issue in the [XPath/XQuery] document. (See [Issue-0144: Representation of text nodes in formal values])
The following type definitions capture the primitive types from Schema.
define type xs:string restricts xs:anySimpleType { xs:string }
define type xs:decimal restricts xs:anySimpleType { xs:decimal }
...
In the above definitions, each name (such as xs:string) appears twice. The first appearance is used for named typing (xs:string is derived by restriction from xs:anySimpleType), and the second for structural typing (the value space of xs:string is given by xs:string -- the apparent circularity here is broken in the rule for matching in [3.3 Matching], which defines matching against xs:string directly).
The following type definition captures the Ur simple type from Schema.
define type xs:anySimpleType restricts xs:anyType {
( xs:string | xs:decimal | ... )*
}
The name of the Ur simple type is
xs:anySimpleType, and its structural definition
is given by the union of all simple type.
Note that there is no separate "value space"
for the type xs:anySimpleType, it's value space
being the same as xs:string. The following
example shows the distinction between a value of type string
containing "Database" and an untyped value containing
"Database": both are using string values as content with
different type names.
"Databases" of type xs:string "Databases" of type xs:anySimpleType }
The following type definition captures the Ur type from Schema.
define type xs:anyType restricts xs:anyType {
attribute*,
( xs:anySimpleType |
( element* & text* ) )
}
Schema defines four built-in attributes that can appear on any element in the document without being explicitly declared in the schema. Those four attributes need to be added inside content models when doing matching. The four built-in attributes of Schema are declared as follows.
define attribute xsi:type of type xs:QName
define attribute xsi:nil of type xs:boolean
define attribute xsi:schemaLocation of type xs:anySimpleType {
xs:anyURI*
}
define attribute xsi:noNamespaceSchemaLocation of type xs:anyURI
For convenience, a type that is an all group of the four built-in XML Schema attributes is defined.
BuiltInAttributes =
attribute xsi:type ?
& attribute xsi:nil ?
& attribute xsi:schemaLocation ?
& attribute xsi:noNamespaceSchemaLocation ?
The type system given in [3.1.2 Types] gives a simple but overly general definition of types. For instance, the above type grammar allows types to describe sequences of attributes and elements in any order. For example, the following type, describing a sequence of zero or more elements or attributes with name "year", is allowed.
(element year of type string | attribute year of type string)*
There are two reasons for this approach. First, it makes the grammar for types simpler. Second, it allows to describe types for heterogeneous sequences, that can result from [XPath/XQuery] expressions but cannot be represented as an XML Schema. For example, the above type is inferred for the following query.
for $b in //book return
if ($b/publisher = "Springer") then
element year { $b/year/text() }
else
attribute year { $b/year/text() }
When constructing elements, or attributes, some additional constraints on type need to be imposed, for which this grammar is too general. This section gives auxiliary grammar productions which impose additional constraints on the content of elements and attributes.
First, a grammar for simple types is defined. This grammar is a subset of the grammar for types. A simple type is composed from atomic types by optional, one or more, zero or more, choice, or empty choice.
| [29 (Formal)] | SimpleType | ::= |
An all group contains only attributes or only elements; attributes or elements in an all group may be optional but not repeated; attributes always precede other content of an element.
| [30 (Formal)] | AttributeAll | ::= | |
| [31 (Formal)] | ElementAll | ::= | |
| [32 (Formal)] | ElementNoAll | ::= | |
| [130 (Formal)] | ElementContent | ::= | "ElementContent" |
| [33 (Formal)] | ElementBody | ::= |
The type in an element type should always match the grammar
for ElementBody given above. (Note that in case the element type
is omitted altogether, e.g., like in element a { },
the type is implicitely matching the empty sequence.)
Those auxiliary productions is be used when matching the type content of elements or attributes within inference rules.
Here is a schema describing purchase orders, taken from the XML Schema Primer, followed by its mapping into the [XPath/XQuery] type system. The complete mapping from XML Schema into the [XPath/XQuery] type system is given in [8 Importing Schemas].
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:annotation>
<xsd:documentation xml:lang="en">
Purchase order schema for Example.com.
Copyright 2000 Example.com. All rights reserved.
</xsd:documentation>
</xsd:annotation>
<xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
<xsd:element name="comment" type="xsd:string"/>
<xsd:complexType name="PurchaseOrderType">
<xsd:sequence>
<xsd:element name="shipTo" type="USAddress"/>
<xsd:element name="billTo" type="USAddress"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="items" type="Items"/>
</xsd:sequence>
<xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
<xsd:complexType name="USAddress">
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="street" type="xsd:string"/>
<xsd:element name="city" type="xsd:string"/>
<xsd:element name="state" type="xsd:string"/>
<xsd:element name="zip" type="xsd:decimal"/>
</xsd:sequence>
<xsd:attribute name="country" type="xsd:NMTOKEN"
fixed="US"/>
</xsd:complexType>
<xsd:complexType name="Items">
<xsd:sequence>
<xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="productName" type="xsd:string"/>
<xsd:element name="quantity">
<xsd:simpleType>
<xsd:restriction base="xsd:positiveInteger">
<xsd:maxExclusive value="100"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="USPrice" type="xsd:decimal"/>
<xsd:element ref="comment" minOccurs="0"/>
<xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="partNum" type="SKU" use="required"/>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
<!-- Stock Keeping Unit, a code for identifying products -->
<xsd:simpleType name="SKU">
<xsd:restriction base="xsd:string">
<xsd:pattern value="\d{3}-[A-Z]{2}"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:schema>
namespace xsd = "http://www.w3.org/2001/XMLSchema"
define element purchaseOrder of type ipo:PurchaseOrderType
define element comment of type xsd:string
define type PurchaseOrderType {
attribute orderDate of type xsd:date?,
element shipTo of type USAddress,
element billTo of type USAddress,
element ipo:comment?,
element items of type Items
}
define type USAddress {
attribute country of type xsd:NMTOKEN?,
element name of type xsd:string,
element street of type xsd:string,
element city of type xsd:string,
element state of type xsd:string,
element zip of type xsd:decimal
}
define type Items {
attribute partNum of type SKU,
element item {
element productName of type xsd:string,
element quantity restricts xsd:positiveInteger { xsd:positiveInteger },
element USPrice of type xsd:decimal,
element comment?,
element shipDate of type xsd:date?
}*
}
define type SKU restrict xsd:string { xsd:string }
A number of auxiliary judgment that specify part of the semantics of Schema are defined. Those judgments are used in the rest of the section, but are not used directly used from other parts of the formal semantics.
The following judgements capture the relationship between names, obtained from derivation and substitution groups in a given XML Schema.
Notation
The judgment
holds when the first type name derives from the second type name.
Note
Derivation is a partial order. It is reflexive and transitive by the definition below. It is asymmetric because no cycles are allowed in derivation by restriction or extension.
Semantics
This judgment is specified by the following rules.
Some rules have hypotheses that simply list a type, element, or attribute declaration.
Every type name derives from itself.
|
|
| statEnv |- TypeName derives from TypeName |
Every type name derives from the type it is declared to derive from by restriction or extension.
| statEnv.typeDefn(TypeName) => define type TypeName extends BaseTypeName Mixed? { Type? } |
|
|
| statEnv |- TypeName derives from BaseTypeName |
| statEnv.typeDefn(TypeName) => define type TypeName restricts BaseTypeName Mixed? { Type? } |
|
|
| statEnv |- TypeName derives from BaseTypeName |
Derivation is transitive.
| statEnv |- TypeName1 derives from TypeName2 statEnv |- TypeName2 derives from TypeName3 |
|
|
| statEnv |- TypeName1 derives from TypeName3 |
Notation
The judgment
holds when the first element name substitutes for the second element name.
Note
Substitution is a partial order. It is reflexive and transitive by the definition below. It is asymmetric because no cycles are allowed in substitution groups.
Semantics
This judgment is specified by the following rules.
Every element name substitutes for itself.
|
|
| statEnv |- ElementName substitutes for ElementName |
Every element name substitutes for the element it is declared to substitute for.
| statEnv.elemDecl(ElementName) => define element ElementName substitutes for BaseElementName Nillable? TypeSpecifier |
|
|
| statEnv |- ElementName substitutes for BaseElementName |
Substitution is transitive.
| statEnv |- ElementName1 substitutes for ElementName2 statEnv |- ElementName1 substitutes for ElementName3 |
|
|
| statEnv |- ElementName1 substitutes for ElementName3 |
Notation
The judgment
holds when the result of extending Type1 by Type2 is Type.
Semantics
This judgment is specified by the following rules.
| statEnv |- Type1 = AttributeAll1 , ElementContent1 statEnv |- Type2 = AttributeAll2 , ElementContent2 |
|
|
| statEnv |- Type1 extended by Type2 is (AttributeAll1 & AttributeAll2) , ElementContent1 , ElementContent2 |
Notation
The judgment
holds when the result of creating a mixed content from Type1 is Type2.
Semantics
This judgment is specified by the following rules.
An element may optionally include the four built-in attributes xsi:type, xsi:nil, xsi:schemaLocation, or xsi:noNamespaceSchemaLocation.
Notation
The judgment
holds when the second type is the same as the first, with the four built-in attributes added, and with text nodes added if the type is mixed.
Semantics
This judgment is specified by the following rules.
If the type is flagged as mixed, then mix the type and extend it by the built-in attributes.
|
|||
|
|
|||
| statEnv |- mixed Type adjusts to Type'' |
Otherwise, just extend the type by the built-in attributes.
| statEnv |- Type extended by BuiltInAttributes is Type' |
|
|
| statEnv |- Type adjusts to Type' |
The definition of BuiltInAttributes appears in [3.1.4 Built-in type declarations].
Notation
The judgment
holds when an element matches the type declaration if the element has a type annotation that derives from the type name, and a content that matches the type.
Semantics
This judgment is specified by the following rules.
If the type is omitted, it is resolved as the empty sequence type.
| statEnv |- Derivation? Mixed? { () } resolves to TypeName { Type } |
|
|
| statEnv |- Derivation? Mixed? { } resolves to TypeName { Type } |
If the type specifier names a global type, then the type name is the name of that type, and the type is taken by resolving the type specifier of the global type.
|
|||
|
|
|||
| statEnv |- of type TypeName resolves to TypeName { Type } |
In the above inference rule, note that BaseTypeName is the base type of the type referred to. So this is indeed the original type name, TypeName, which must be returned, and eventually used to annotated the corresponding element or attribute. However, the type specifier needs to be obtained through a second application of the judgment.
If the type specifier is a restriction, then the type name is the name of the base type, and the type is taken from the type specifier.
| statEnv |- Mixed? Type adjusts to AdjustedType |
|
|
| statEnv |- restricts TypeName Mixed? { Type } resolves to TypeName { AdjustedType } |
If the type specifier is an extension, then the type name is the name of the base type, and the type is the base type extended by the type in the type specifier.
|
||||
|
|
||||
| statEnv |- extends TypeName Mixed? { Type } resolves to TypeName { AdjustedType } |
Notation
The judgment
holds if the first type is derived from the second type. This corresponds to [Type Derivation OK] in [XML Schema Part 1].
Semantics
This judgment is specified by the following rules.
A type specifier validly derives from a type reference if the type specifier resolves to a type name that derives from the referenced type.
| statEnv |- TypeSpecifier resolves to TypeName { Type } statEnv |- TypeName derives from TypeName' |
|
|
| statEnv |- TypeSpecifier derives from of type TypeName' |
There is no rule if the type specifier not a type reference, because it is not possible for a local type to derive from another local type.
Notation
The judgment
holds when matching an element with the given element name against the given element type requires that the element be nillable as indicated and matches the type specifier.
Semantics
This judgment is specified by the following rules.
If the element type is a reference to a global element, then lookup yields the type specifier in the element declaration for the given element name. The given element name must be in the substitution group of the global element.
|
|||
|
|
|||
| statEnv |- ElementName lookup element ElementName' yields Nillable? TypeSpecifier |
If the given element name matches the element name in the element type, and the element type contains a type specifier, then lookup yields that type specifier.
|
|
| statEnv |- ElementName lookup element ElementName Nillable? TypeSpecifier yields Nillable? TypeSpecifier |
If the element type has no element name but contains a type specifier, then lookup yields the type specifier.
If the element type has no element name and no type
specifier, then lookup yields xs:anyType.
|
|
statEnv |-
ElementName lookup element
yields of type xs:anyType
|
Notation
The judgment
holds when matching an attribute with the given attribute name against the given attribute type matches the type specifier.
Semantics
This judgment is specified by the following rules.
If the attribute type is a reference to a global attribute, then lookup yields the type specifier in the attribute declaration for the given attribute name.
|
||
|
|
||
| statEnv |- AttributeName lookup attribute AttributeName yields TypeSpecifier |
If the given attribute name matches the attribute name in the attribute type, and the attribute type contains a type specifier, then lookup yields that type specifier.
If the attribute type has no attribute name but contains a type specifier, then lookup yields the type specifier.
If the attribute type has no attribute name and no type
specifier, then lookup yields xs:anySimpleType.
|
|
statEnv |-
AttributeName lookup
attribute yields of type
xs:anySimpleType
|
Notation
The judgment
holds if some interleaving of Value1 and Value2 yields Value. Interleaving is non-deterministic; it is used for processing all groups.
Semantics
This judgment is specified by the following rules.
Interleaving two empty sequences yields the empty sequence.
|
|
| statEnv |- () interleave () yields () |
Otherwise, pick an item from the head of one of the sequences, and recursively interleave the remainder.
| statEnv |- Value1 interleave Value2 yields Value |
|
|
| statEnv |- Item,Value1 interleave Value2 yields Item,Value |
| statEnv |- Value1 interleave Value2 yields Value |
|
|
| statEnv |- Value1 interleave Item,Value2 yields Item,Value |
Notation
The judgment
holds if there are no occurrences of the attribute QName in Value. The judgment
holds if there is one occurrence of the attribute QName in Value, and the value of that attribute is SimpleValue. The judgment
holds if either of the previous two judgments hold.
Semantics
These judgments are defined using the auxiliary judgments
and
which are defined in [5.2.1 Steps].
The filter judgments are defined as follows.
Introduction
The semantics of a type is given by the notion of matching, I.e., the set of all values which match that type. A tree in the [XQuery 1.0 and XPath 2.0 Data Model] matches a type in the [XPath/XQuery] type system, if and only if:
It verifies the structural constraints described by the type. For instance, the data model value:
element bib:pubdate { 1954, 1966, 1974, 1986 }
matches the following type:
element { attribute country { xs:string }?, xs:integer* }
because the element name "bib:pubdate" matches the wildcard element in the type (in which the name is omitted), the attribute "country" is optional, and the content of that element is indeed a sequence of integers.
It verifies the name constraints described by the type. For instance, the data model value:
element bib:pubdate of type xs:anyType { 1954, 1966, 1974, 1986 }
does not match the following type:
element of type PublicationInfo
define type PublicationInfo { attribute country { xs:string }?, xs:integer* }
because the element in the data model is annotated with
the type name xs:anyType while the schema expects an element
annotated with the type name PublicationInfo.
Notation
The judgment
holds when the given value matches the given nillable type.
Semantics
This judgment is specified by the following rules.
If the type is not nillable, then the xsi:nil attribute must not appear in the value, and the value must match the type.
|
|||
|
|
|||
| statEnv |- Value nil-matches Type |
If the type is nillable, and the xsi:nil attribute does not appear or is false, then the value must match the type.
|
|||
|
|
|||
| statEnv |- Value nil-matches nillable Type |
If the type is nillable, and the xsi:nil attribute is true, then the value must match the attributes in the type. The element content of the type is ignored.
Notation
The judgment
holds when the given value matches the given type.
Semantics
This judgment is specified by the following rules.
The empty sequence matches the empty sequence type.
If two values match two types, then their sequence matches the corresponding sequence type.
|
|||
|
|
|||
| statEnv |- Value1,Value2 matches Type1,Type2 |
If a value matches a type, then it also matches a choice type where that type is one of the choices.
If two values match two types, then their interleaving matches the corresponding all group.
|
||||
|
|
||||
| statEnv |- Value matches Type1 & Type2 |
An optional type matches a value of that type or the empty sequence.
The following rules are used to match a value against a sequence of zero (or one) or more types.
| statEnv |- Value1 matches Type statEnv |- Value2 matches Type* |
|
|
| statEnv |- Value1, Value2 matches Type* |
| statEnv |- Value1 matches Type statEnv |- Value2 matches Type* |
|
|
| statEnv |- Value1, Value2 matches Type+ |
An element matches an element type if the element type resolves to another element type, and the type annotation is derived from the type annotation of the resolved type, and the element value matches the enclosed type of the resolved type.
|
|||||
|
|
|||||
| statEnv |- element ElementName of type TypeName { Value } matches ElementType |
The rule for attributes is similar.
|
|||||
|
|
|||||
| statEnv |- attribute AttributeName of type TypeName { Value } matches AttributeType |
A document node matches the corresponding document type.
A text node matches text.
An atomic value matches an atomic type if its type annotation derives from the atomic type. The value itself is ignored -- this is checked as part of validation.
| statEnv |- AtomicType derives from AtomicType' |
|
|
| statEnv |- AtomicValue of type AtomicType matches AtomicType' |
An atomic value matches "untyped" only if it is annotated
with xs:anySimpleType.
|
|
statEnv |-
AtomicValue of type xs:anySimpleType
matches untyped
|
Note
The above definition of matching, although complete and precise, does not give a simple means to compute the structural matching. Notably, some of the above rules can be non-deterministic (e.g., the rule for matching of choice or repetition).
The structural component of the [XPath/XQuery] type system can be modeled by tree grammars. Computing structural matching can be done by computing if a given tree grammar recognizes the given data model value.
This document does not provide a complete algorithm to recognize a given tree by a tree grammar. The interested reader can consult the relevant literature, for instance [Languages], or [TATA].
Ed. Note: A future version of this document should include a more complete description of algorithms to compute type matching.
Recall the rule for matching an element against an element type.
|
|||||
|
|
|||||
| statEnv |- element ElementName of type TypeName { Value } matches ElementType |
This rule simplifies greatly in the case that the type specifier is a reference to a global type.
|
||||
|
|
||||
| statEnv |- element ElementName of type TypeName { Value } matches ElementType |
In this case, it is not necessary to do resolution or to check that the value matches the type, because this is guaranteed by the way in which elements are labeled with types. Note that the optimization applies only if xsi:nil is not true.
A similar optimization applies to attributes.
Erase and annotate define the core of XML Schema validation. They are used to define (partially) the semantics of the validate operation in [XPath/XQuery].
Notation
To define erasure, an auxiliary judgment is needed. The judgment
holds when SimpleValue erases to the string String.
Semantics
This judgment is specified by the following rules.
The empty sequence erases to the empty string.
|
|
| statEnv |- () simply erases to "" |
The concatenation of two non-empty sequences of values erases to the concatenation of their erasures with a separating space.
|
|||
|
|
|||
| statEnv |- SimpleValue1,SimpleValue2 simply erases to xf:concat(String1," ",String2) |
An atomic value erases to its string representation.
|
|
statEnv |-
AtomicValue of type AtomicType
simply erases to
dm:string-value(AtomicValue)
|
Notation
The judgment
holds when the erasure of Value is Value'.
Semantics
This judgment is specified by the following rules.
The empty sequence erases to itself.
The erasure of the concatenation of two values is the concatenation of their erasure, so long as neither of the two original values is simple.
|
|||
|
|
|||
| statEnv |- Value1,Value2 erases to Value1',Value2' |
The erasure of an element is an element that has the same
name and the type xs:anyType and the erasure of the original
content.
|
||
|
|
||
statEnv |-
element ElementName of type
TypeName { Value } erases to element
ElementName of type xs:anyType { Value'
}
|
The erasure of an attribute is an attribute that has the
same name and the type xs:anySimpleType and the simple erasure
of the original content labeled with xs:anySimpleType.
|
||
|
|
||
statEnv |-
attribute AttributeName of type
TypeName { Value } erases to attribute
AttributeName of type xs:anySimpleType { String
of type xs:anySimpleType }
|
The erasure of a document is a document with the erasure of the original content.
The erasure of a text node is itself.
The erasure of a simple value is the corresponding text node.
|
||
|
|
||
| statEnv |- SimpleValue erases to text { String } |
Notation
The judgment
holds if the result of casting the SimpleValue to SimpleType is SimpleValue'.
Ed. Note: Issue: The simply annotate judgment is used to describe the behavior of validation of simple values. This operation is essentially similar to casting from string to an atomic value. It is not clear if this actually aligns to the behavior of casting as specified by the [XQuery 1.0 and XPath 2.0 Functions and Operators]. See [Issue-0156: Casting and validation].
Semantics
This judgment is specified by the following rules.
Simply annotating a simple value to a union type yields the result of simply annotating the simple value to either the first or second type in the union. Note that simply annotating to the second type is attempted only if simply annotating to the first type fails.
|
||
|
|
||
| statEnv |- simply annotate as SimpleType1|SimpleType2 (SimpleValue) => SimpleValue' |
|
|||
|
|
|||
| statEnv |- simply annotate as SimpleType1|SimpleType2 (SimpleValue) => SimpleValue' |
The simple annotation rules for ?, +, *, and none are similar.
|
|
| statEnv |- simply annotate as SimpleType1? ( () ) => () |
|
||
|
|
||
| statEnv |- simply annotate as SimpleType? (SimpleValue) => SimpleValue' |
|
|
| statEnv |- simply annotate as SimpleType* ( () ) => () |
|
||
|
|
||
| statEnv |- simply annotate as SimpleType* (SimpleValue1,SimpleValue2) => SimpleValue1',SimpleValue2' |
|
||
|
|
||
| statEnv |- simply annotate as SimpleType+ (SimpleValue1,SimpleValue2) => SimpleValue1',SimpleValue2' |
Simply annotating an atomic value to xs:string yields its string representation.
|
|
| statEnv |- simply annotate as xs:string (AtomicValue) => dm:string-value(AtomicValue) |
Simply annotating an atomic value to xs:decimal yields the decimal that results from parsing its string representation.
|
|
statEnv |-
simply annotate as xs:decimal (AtomicValue)
=>
xs:decimal(dm:string-value(AtomicValue))
|
Similar rules are assumed for the rest of the 19 XML Schema primitive types.
Notation
The judgment
holds if it is possible to annotate value Value as if it had the nillable type Type and Value' is the corresponding annotated value.
Semantics
This judgment is specified by the following rules.
If the type is not nillable, then the xsi:nil attribute must not appear in the value, and it must be possible to annotate value Value as if it had the type Type.
|
|||
|
|
|||
| statEnv |- nil-annotate as Type ( Value ) => Value' |
If the type is nillable, and the xsi:nil attribute does not appear or is false, then it must be possible to annotate value Value as if it had the type Type.
|
|||
|
|
|||
| statEnv |- nil-annotate as nillable Type ( Value ) => Value' |
If the type is nillable, and the xsi:nil attribute is true, then it must be possible to annotate value Value as if it had a type where the attributes in the type are kept and the element content of the type is ignored.
|
|||
|
|
|||
| statEnv |- nil-annotate as nillable (AttributeAll, ElementContent) ( Value ) => Value' |
Notation
The judgment
holds if it is possible to annotate value Value as if it had type Type and Value' is the corresponding annotated value.
Note
Assume an XML Infoset instance X is validated against an XML Schema S, yielding PSVI instance X'. Then if X corresponds to Value and S corresponds to Type and X' corresponds to Value', the following should hold: annotate as Type ( Value ) => Value'.
Semantics
This judgment is specified by the following rules.
Annotating the empty sequence as the empty type yields the empty sequence.
|
|
| statEnv |- annotate as () (()) => () |
Annotating a concatenation of values as a concatenation of types yields the concatenation of the annotated values.
|
|||
|
|
|||
| statEnv |- annotate as Type1,Type2 (Value1,Value2) => Value1',Value2' |
Annotating a value as a choice type yields the result of annotating the value as either the first or second type in the choice.
| statEnv |- annotate as Type1 (Value) => Value' |
|
|
| statEnv |- annotate as Type1|Type2 (Value) => Value' |
| statEnv |- annotate as Type2 (Value) => Value' |
|
|
| statEnv |- annotate as Type1|Type2 (Value) => Value' |
Annotating a value as an all group uses interleaving to decompose the original value and recompose the annotated value.
Ed. Note: Jerome and Phil: Note that this may reorder the original sequence. Perhaps we should disallow such reordering. Specifying that formally is not as easy as we would like.
|
|||||
|
|
|||||
| statEnv |- annotate as Type1 & Type2 ( Value ) => Value' |
The annotation rules for ?, +, *, none are similar.
| statEnv |- annotate as (Type | ())(Value) => Value' |
|
|
| statEnv |- annotate as Type? (Value) => Value' |
| statEnv |- annotate as Type (Value1) => Value1' statEnv |- annotate as Type* (Value2) => Value2' |
|
|
| statEnv |- annotate as Type+ (Value1,Value2) => (Value1',Value2') |
|
|
| statEnv |- annotate as Type* ( () ) => () |
| statEnv |- annotate as Type (Value1) => Value1' statEnv |- annotate as Type* (Value2) => Value2' |
|
|
| statEnv |- annotate as Type* (Value1,Value2) => (Value1',Value2') |
To annotate an element with no xsi:type attribute, first look up the the element type, next resolve the resulting type specifier, then annotate the value against the resolved type, and finally return a new element with the name of the original element, the resolved type name, and the annotated value.
|
|||||
|
|
|||||
statEnv |-
annotate as ElementType ( element
ElementName of type xs:anyType { Value } )
=> element ElementName of type
TypeName { Value' }
|
To annotate an element with an xsi:type attribute, define a type specifier corresponding to the xsi:type. Look up the element type, yielding a type specifier, and check that the xsi:type specifier derives from this type specifier. Resolve the xsi:type specifier, then annotate the value against the resolved type, and finally return a new element with the name of the original element, the resolved type name, and the annotated value.
|
|||||||
|
|
|||||||
statEnv |-
annotate as ElementType ( element
ElementName of type xs:anyType { Value } )
=> element ElementName of type
TypeName { Value' }
|
Ed. Note: Issue: the treatment of xsi:type in the [XQuery 1.0: A Query Language for XML] document and in the formal semantics document still differ. See [Issue-0142: Treatment of xsi:type in validation].
The rule for attributes is similar to the first rule for elements.
|
||||
|
|
||||
statEnv |-
annotate as AttributeType (
attribute AttributeName of type xs:anySimpleType {
Value } ) => attribute
AttributeName of type TypeName { Value'
}
|
Annotating a document node yields a document with the annotation of its contents.
| statEnv |- annotate as Type (Value) => Value' |
|
|
| statEnv |- annotate as document { Type } ( document { Value } ) => document { Value' } |
Annotating a text node as text yields itself.
|
|
| statEnv |- annotate as text (text { String }) => text { String } |
Annotating a text nodes as a simple type is identical to casting.
statEnv |-
simply annotate as SimpleType ( String
as xs:anySimpleType ) =>
SimpleValue'
|
|
|
| statEnv |- annotate as SimpleType ( text { String } ) => SimpleValue' |
Annotating a simple value as a simple type is identical to casting.
| statEnv |- simply annotate as SimpleType ( SimpleValue ) => SimpleValue' |
|
|
| statEnv |- annotate as SimpleType ( SimpleValue ) => SimpleValue' |
Introduction
This section defines the semantics of subtyping in [XPath/XQuery]. Subtyping is used during the static type analysis, in typeswitch expressions, treat and assert expressions, and to check the correctness of function applications.
Notation
The judgment
holds if the first type is a subtype of the second.
Semantics
This judgment is true if and only if, for every value Value, then Value matches Type implies Value matches Type'.
Note
It is easy to see that the subtype relation <: is a partial order, i.e. it is reflexive:
statEnv |- Type <: Typeand it is transitive: if,
statEnv |- Type1 <: Type2and,
statEnv |- Type2 <: Type3then,
statEnv |- Type1 <: Type3The above definition although complete and precise, does not give a simple means to compute subtyping.
The structural component of the [XPath/XQuery] type system can be modeled by tree grammars. Computing subtyping between two types can be done by computing if inclusion holds between their corresponding tree grammars.
This document does not provide a complete algorithm to compute inclusion between tree grammar. The interested reader can consult the relevant literature on tree grammars, for instance [Languages], or [TATA].
Ed. Note: A future version of this document should include a more complete description of algorithms to compute subtyping.
In a few cases, equivalence between two types (i.e., that they define exactly the same domain over data model instances) is used.
Notation
The judgment
holds if the Type1 is equivalent to the Type2.
Semantics
By definition, Type1 = Type2 if and only if, Type1<:Type2 and Type2<: Type1.
This section defines the notion of Prime Type, and operations on prime types. Prime types play an important role in defining the static semantics of [XPath/XQuery], and are used notably in the semantics of "for", "unordered", and "sortby" expressions.
Some expressions must operate on sequences where all items have the same type -- this type possibly being a choice of item types. These include FLWR expressions, sortby, and distinct. For example, let's assume the document
<paper>
<author>Buneman</author>
<author>Davidson</author>
<author>Fernandez</author>
<author>Suciu</author>
<title>Adding structure to unstructured data</title>
<editor>Afrati</editor>
<editor>Kolaitis</editor>
</paper>
is of type
element paper {
element author {xs:string}+,
element title {xs:string},
element editor {xs:string}*
}
The following query extracts a sequence of authors followed by editors and sorts them by their content.
(/paper/author,/paper/editor) sortby (.)
This results in a sequence of authors and editors.
<editor>Afrati</editor>
<author>Buneman</author>
<author>Davidson</author>
<author>Fernandez</author>
<editor>Kolaitis</editor>
<author>Suciu</author>
Based on the content model for the paper
element, the type inferred for the expression
(/paper/author,/paper/editor) is simply
element author {xs:string}+, element
editor{xs:string}*. This type indicates that there should
be a sequence of one or more authors, followed by zero or more
editors. Due to the sortby expression, the original order of
elements within the type can then be changed in an arbitrary
way. As a result, the regular expression which describes the
type cannot preserve any information which relates to order. The
type infered for the complete expression reflects the arbitrary
order by repeating a choice of author and editor elements.
(element author {xs:string} | element editor {xs:string})+
Intuitively, sorting expressions, as well as iteration with for expressions, always operate on sequences of items of the same type, this type possibly being a choice. A choice of items is called a Prime type, and can be described by the following grammar production.
| [34 (Formal)] | PrimeType | ::= |
When inferring the type of a for or a sortby expression, the system needs to compute such a prime type.
It also needs to compute the appropriate occurrence indicator for the sequence type. For instance, note that the type in the above example ends in a plus rather than a star, since there must be at least one author element in the sequence.
The rest of this section defines operations related to prime types and occurrence indicators, which are used during static type analysis.
Notation
Two auxiliary functions on types are used.
The type function prime(Type) extracts all item types from the type Type, and combines them into a choice.
The function quantifier(Type)
approximates the possible number of items in Type with
the occurrence indicators supported by the [XPath/XQuery] type
system (?, +, *).
For interim results, the following auxiliary occurrence
indicators are used: 1 for exactly one
occurrence, and 0 for exactly zero occurrences,
i.e., the occurrence indicator of the empty list.
Types and occurrence can be combined with the · operation, as follows.
| Type · 0 | = | () |
| Type · 1 | = | Type |
| Type · ? | = | Type? |
| Type · + | = | Type+ |
| Type · * | = | Type* |
Example
For instance, here are the result of applying prime and quantifier on a few simple types.
prime(element a+) = element a prime(element a | ()) = element a prime(element a?,element b?) = element a | element b prime(element a | element b+, element c*) = element a | element b | element c quantifier(element a+) = + quantifier(element a | ()) = ? quantifier(element a?,element b?) = * quantifier(element a | element b+, element d*) = +
Note that the last occurrence indicator should be '+', since the regular expression is such that there must be at least one element in the sequence (this element being an 'a' element or a 'b' element).
Note
Note that both prime and quantifier functions are only used within type inference rules; they are not part of the [XPath/XQuery] syntax.
Semantics
The prime function is defined by induction as follows.
| prime(ItemType) | = | ItemType |
| prime(()) | = | none |
| prime(none) | = | none |
| prime(Type1 , Type2) | = | prime(Type1) | prime(Type2) |
| prime(Type1 & Type2) | = | prime(Type1) | prime(Type2) |
| prime(Type1 | Type2) | = | prime(Type1) | prime(Type2) |
| prime(Type?) | = | prime(Type) |
| prime(Type*) | = | prime(Type) |
| prime(Type+) | = | prime(Type) |
Semantics
The quantifier function is defined by induction as follows.
| quantifier(ItemType) | = | 1 |
| quantifier(()) | = | 0 |
| quantifier(none) | = | 0 |
| quantifier(Type1 , Type2) | = | quantifier(Type1) , quantifier(Type2) |
| quantifier(Type1 & Type2) | = | quantifier(Type1) , quantifier(Type2) |
| quantifier(Type1 | Type2) | = | quantifier(Type1) | quantifier(Type2) |
| quantifier(Type?) | = | quantifier(Type) · ? |
| quantifier(Type*) | = | quantifier(Type) · * |
| quantifier(Type+) | = | quantifier(Type) · + |
This definition uses the sum (Occurrence1 , Occurrence2), the choice (Occurrence1 | Occurrence2), and the product (Occurrence1 · Occurrence2) of two occurrence indicators Occurrence1, Occurrence2, which are defined by the following tables.
|
|
|
Using these two newly defined functions, the result type of a SortExpr can be computed as follows. If Expr has type Type then Expr sortby SortSpecList has type prime(Type) · quantifier(Type). For the formal type inference rule see [5.9.1 Sorting Expressions]. Similar rules are applied to type iteration over sequences using for, as well as other operations that destroy sequence order - union, intersect, except, distinct-node, distinct-value, and unordered.
Note
Note that prime(Type) · quantifier(Type) is always a super type of the original type Type and therefore, can be used as an approximation for it. More formally, the following property always holds: Type <: prime(Type) · quantifier(Type). This property is important for the soundness of the static type analysis.
Static type analysis for the "typeswitch" expression has to compute the choice of all the common items between two types, and the corresponding occurrence indicators.
For instance, consider the following typeswitch expression, applied on the result of the previous query.
let $persons := (/paper/author,/paper/editor) sortby (.)
return
typeswitch $person
case element author return <single-author> { $person } </single-author>
case element author* return <authors-only> { $person } </authors-only>
default return <authors-and-editors> { $person } </authors-and-editors>
Static type analysis needs to compute the type of the
variable $person within each case clause. In the
first case, the variable is of type element
author. In the second case, the variable is of type
element author+. The '+' occurrence indicator can
be infered since the input type guarantees the input sequence
has at least one author.
Notation
The two following auxiliary functions on types are used.
The function common-prime(PrimeType1, PrimeType2) computes the type for all the items common between the prime types PrimeType1 and PrimeType2, and combines them into a new prime type.
The function common-occurrence(Occurrence1, Occurrence2) computes the occurrence indicator which approximates the possible number of items that can result from the constraints described by both Occurrence1, and Occurrence2.
Example
For instance, here are the result of applying common-prime and common-occurrence on a few simple prime types.
common-prime((element a),(element a)) = element a common-prime((element a),(element b)) = none common-prime((element a | element b | element c), (element c | element a)) = element a | element c common-occurrence(+,+) = + common-occurrence(*,+) = + common-occurrence(?,+) = 1
Note that the last occurrence indicator should be '1', since the first input occurrence indicator '?' specifies that there should be at most one item, while the second '+' specifies that there should be at least one.
Semantics
The common-prime function is defined by induction as follows.
The common-prime for two arbitrary prime types, is defined as the choice containing the results of the function common-prime applied between all possible pairs of items in each input prime types.
|
|||||
|
|
|||||
| statEnv |- common-prime( ( ItemType1 | ··· | ItemTypen ), ( ItemType1' | ··· | ItemTyper' ) ) = ItemType(1,1) | ··· | ItemType(n,r) |
The common-prime function between any pair of item types return either a new item type or the empty choice (none); remember that none is the unit for choice.
If one of the item type is a subtype of the other, then the function returns the smaller item type.
| statEnv |- ItemType1 <: ItemType2 |
|
|
| statEnv |- common-prime(ItemType1, ItemType2) = ItemType1 |
| statEnv |- ItemType2 <: ItemType1 |
|
|
| statEnv |- common-prime(ItemType1, ItemType2) = ItemType2 |
Ed. Note: Issue: The above rule does not deal with cases where the two item types might have some intersection (i.e., there exist values which match both types), but subtyping does not hold between them. See [Issue-0155: Common primes for incomparable types].
In all the other cases, common-prime returns none. I.e., otherwise:
Semantics
The common-occurrence function is defined by the following table.
| common-occurrence | 0 | 1 | ? | + | * |
| 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | 1 | 1 |
| ? | 0 | 1 | ? | 1 | ? |
| + | 0 | 1 | 1 | + | + |
| * | 0 | 1 | ? | + | * |
Example
The above definition relies on subtyping which covers complex cases (involving derivation, substitution groups, etc.) appropriately. For instance, consider the following type declarations.
define element person of type Person define type Person
{ element name of type xs:string }
define element student substitutes for person of type Student
define type Student extends Person {
element promotion of type Promotion,
element grade of type Grade
}
define type Promotion restricts xs:integer { xs:integer }
define type Grade restricts xs:integer { xs:integer }
Here are some more complex examples of the application of the common-prime type function involving that schema.
common-prime(Promotion, xs:integer) = Promotion common-prime(Promotion, Grade) = none common-prime(element of type Person, element of type Student) = element of type Student common-prime(element person, element student) = element student
Ed. Note: Alignment between [XPath/XQuery], the [XQuery 1.0 and XPath 2.0 Data Model], and the [XPath/XQuery] formal semantics. There are some known discrepancies between the type models in the Data Model document, the [XPath/XQuery] document and this document. Currently the [XPath/XQuery] grammar does not contain comments, processing-instructions or schema-components. Currently this document does not deal with PI and comment nodes, which are present in the Data Model. See issues [Issue-0068: Document Collections], [Issue-0105: Types for nodes in the data model.].
Ed. Note: Complexity of type operations. There are concerns about the complexity of type inference, notably about type subsumption used during type inference, and validation used during evaluation of some of the type operations (e.g., typeswitch). Additionally, there are many language features for which it is possible to define special type inference rules that would give tighter bounds. The question is, which features, which rules, and how do they interact with the complexity of type inference and subtype computation. See [Issue-0080: Typing of parent], [Issue-0083: Expressive power and complexity of typeswitch expression], [Issue-0091: Attribute expression].
The organization of this section parallels the organization of Section 2 of the [XPath/XQuery] document.
Introduction
The expression context for a given expression consists of all the information that can affect the result of the expression. This information is organized into the static context and the evaluation context. This section specifies the environments that represent the context information used by [XPath/XQuery] expressions.
The [XPath/XQuery] static inference rules use the environment group statEnv containing the environments built during static type checking and used during both static type checking and dynamic evaluation.
The following environments are maintained in the static environment group:
| statEnv.namespace |
The namespace environment maps namespace prefixes (NCNames) onto namespace URIs (URIs). The namespace environment captures in-scope namespaces in the [XPath/XQuery] static context. | ||
| statEnv.default_namespace |
The default namespace environment maps "element", and "function" to a namespace URI (a URI), as determined by the appropriate namespace declaration. The default namespace environment captures the default namespace for element and type names and Default namespace for function names in the [XPath/XQuery] static context. | ||
| statEnv.typeDefn |
The type definition environment maps type names (TypeNames) onto their global type definitions (Definitions). The type definition environment captures in-scope schema definitions in the [XPath/XQuery] static context. | ||
| statEnv.elemDecl |
The element declaration environment maps element names (ElementNames) onto their global element declarations (Definitions). The element declaration environment does not have a counterpart in the [XPath/XQuery] static context. See [Issue-0159: Element and attribute declarations in the static context]. | ||
| statEnv.attrDecl |
The attribute declaration environment maps attribute names (AttributeNames) onto their global attribute declarations (Definitions). The attribute declaration environment does not have a counterpart in the [XPath/XQuery] static context. See [Issue-0159: Element and attribute declarations in the static context]. | ||
| statEnv.varType |
The variable static type environment maps variable names (Variables) to their static types (Types). The variable static type environment captures in-scope variables in the [XPath/XQuery] static context. | ||
| statEnv.funcType |
The function declaration environment
stores the static type signatures of functions. Because
[XPath/XQuery] allows multiple functions with the same name
differing only in the number of arguments, this
environment maps function name/arity (QName, integer)
pairs to function type signatures The function declaration environment captures in-scope functions in the [XPath/XQuery] static context. | ||
| statEnv.base_uri |
The base uri environment contains a unique namespace URI (a URI). The base uri environment captures the base URI in the [XPath/XQuery] static context. |
Ed. Note: Issue: The static environment does not represent collations. See [Issue-0160: Collations in the static environment]
Environments have an initial state when [expression/query] processing begins, containing, for example, the function signatures of all built-in functions.
A common use of the static environment is to expand QNames by looking up namespace prefixes in the statEnv.namespace environment.
The helper function expand is
used to expand QNames by looking up the namespace prefix in
statEnv.namespace. This function is defined as follows:
statEnv |-
expand(QName) = qname(URI,
ncname)
or
statEnv |-
expand(QName) = qname(ncname)
the right hand side is a QName value (not a QName expression).
The helper expand function is defined as
follows:
If QName matches
NCName1:NCName2, and if
statEnv.namespace(NCName1) = URI, then the expression
yields qname(URI,NCName2). If the namespace
prefix NCName1 is not found in statEnv.namespace, then the
expression does not apply (that is, the inference rule will
not match).
If QName matches NCName, NCName is an element or type name, and statEnv.default_namespace("element") = URI, then the expression yields qname(URI,NCName), where URI is the default namespace in effect.
If QName matches NCName, NCName is a function name, and statEnv.default_namespace("function") = URI, then the expression yields qname(URI,NCName), where URI is the default namespace in effect.
Ed. Note: The above rules could be given as proper inference rules defining the
expandjudgment.
Here is an example that shows how the static environment is modified in response to a namespace definition.
This rule reads as follows: "the phrase on the bottom (a namespace declaration followed by a sequence of expressions) is well-typed (accepted by the static type inference rules) within an environment statEnv if the sequence of expressions above the line is well-typed in the environment obtained from statEnv by adding the namespace declaration".
This is the common idiom for passing new information in an environment using sub-expressions. In the case where the environment must be updated with a completely new component, the following notation is used:
statEnv [ namespace = (NewEnvironment) ]dynEnv denotes the group of environments built and used only during dynamic evaluation.
The following environments are dynamic:
| dynEnv.funcDefn |
The dynamic function environment maps a function name to a function definition. The function definition consists of an expression, which is the function's body, and a list of variables, which are the function's formal parameters. As with statEnv.funcType, dynEnv.funcDefn requires a function name and a parameter count: it maps a (QName, integer) pair to a value list of the form (Expr , Variable1, ..., Variablen). | ||
| dynEnv.varValue |
The dynamic value environment maps a variable name (QName) onto the variable's current value (Value). |
Ed. Note: DD: this still does not account for those functions that are genuinely overloaded, as some of the built-in functions are. Probably this will be handled by special rules for those functions in the main semantics section of the document. See [Issue-0122: Overloaded functions]
Finally, dynamic environments have an initial state when [expression/query] processing begins, containing, for example, the function declarations of all built-in functions.
Ed. Note: Somewhere an exact definition of what is in the initial environments is needed. Notice that for XPath this is partially defined by the containing language. See [Issue-0115: What is in the default context?].
The following Formal Semantics built-in variables represent the context item, context position , and context size properties of the evaluation context:
| Built-in Variable | Represents: |
| $fs:dot | context item |
| $fs:position | context position |
| $fs:last | context size |
| $fs:implicitTimezone | implicit timezone |
Variables with the "fs" namespace prefix are reserved for use in the definition of the Formal Semantics. It is a static error to define a variable in the "fs" namespace.
Ed. Note: "fs" is actually a namespace prefix and should be replaced with a proper namespace URI unique to this spec. See [Issue-0100: Namespace resolution].
Values of $fs:dot, $fs:position and $fs:last can be
obtained by invoking the xf:context-item(),
xf:position() and xf:last() functions,
respectively. The context document, can be
obtained by application of the xf:root()
function.
The variable $fs:implicitTimezone is used to represent the implicit timezone, and is used by the timezone related functions in [XQuery 1.0 and XPath 2.0 Functions and Operators].
Ed. Note: The Formal Semantics does not yet model the semantics of semantics of timezone related functions.
Ed. Note: The following dynamic contexts have no formal representation yet: current-date, current-time and current-dateTime. See [Issue-0114: Dynamic context for current date and time]
[XPath/XQuery] has three functions that provide access to input
data: input, collection, and
document. The dynamic semantics of these three input
functions are described in more detail in [XQuery 1.0 and XPath 2.0 Functions and Operators]. The static typing for these
function is still an open issue (See [Issue-0137:
Typing of input functions]).
This section gives background material about the [XPath/XQuery] data model. More information on the [XPath/XQuery] data model can be found in [2.3 Data Model].
This section gives background material about [XPath/XQuery] type checking. More information on the [XPath/XQuery] type system can be found in [2.4 Schemas and types] and [3 The XQuery Type System].
Introduction
SequenceTypes can be used in [XPath/XQuery] to refer to a type imported from a schema (see [6 The Query Prolog]). SequenceTypes are used to declare the types of function parameters and in several kinds of [XPath/XQuery] expressions.
The syntax of SequenceTypes is described by the following grammar productions.
The semantics of SequenceTypes is defined by means of normalization rules from SequenceTypes to the [XPath/XQuery] type system (see [3 The XQuery Type System]).
Core Grammar
The core grammar productions for sequence types are:
Ed. Note: Note that normalization on SequenceTypes does not occur during the normalization phase but whenever a dynamic or static rule requires it. The reason for this deviation from the processing model is that the result of SequenceType normalization is not part of the [XPath/XQuery] syntax (See issue [Issue-0089: Syntax for types in XQuery]). SequenceType normalization is the only occurrence of such a deviation in the formal semantics.
Notation
To define the semantics of SequenceTypes, the following auxiliary mapping rule is used.
| [SequenceType]sequencetype |
| == |
| Type |
specifies that SequenceType is mapped to Type, in the XQuery type system.
OccurenceIndicators are left unchanged when normalizing SequenceTypes into [XPath/XQuery] types. Each kind of SequenceType component is normalized separately into the [XPath/XQuery] type system.
| [ItemType OccurrenceIndicator]sequencetype |
| == |
| [ItemType]sequencetype OccurrenceIndicator |
The "empty" sequence type is mapped to the empty sequence type in the [XPath/XQuery] type system.
| [empty]sequencetype |
| == |
| [()]sequencetype |
The SequenceType component with type SchemaType is mapped directly into the [XPath/XQuery] type system.
| [element SchemaType]sequencetype |
| == |
| element SchemaType |
| [attribute SchemaType]sequencetype |
| == |
| attribute SchemaType |
The mapping still does not handle SequenceTypes using SchemaContext (See [Issue-0138: Semantics of Schema Context]). The two following rules map references to a global element or attribute, without taking the schema context into account.
| [element QName]sequencetype |
| == |
| element QName |
| [attribute QName]sequencetype |
| == |
| attribute QName |
Document nodes, text nodes, untyped and atomic types are left unchanged.
| [text]sequencetype |
| == |
| text |
| [document]sequencetype |
| == |
| document |
| [untyped]sequencetype |
| == |
| untyped |
| [AtomicType]sequencetype |
| == |
| AtomicType |
Processing instruction and comment types are ignored. See [Issue-0105: Types for nodes in the data model.].
| [comment]sequencetype |
| == |
| [processing-instruction]sequencetype |
| == |
The SequenceType components "node",
"item", and "atomic value" correspond to
wildcard types. node indicates that any node is
allowed, "atomic value" indicates that any atomic (not list)
value is allowed, and item indicates that any node or atomic
value is allowed. The following mapping rules make use of the corresponding
wildcard types.
| [node]sequencetype |
| == |
| (element | attribute | text | document) |
| [item]sequencetype |
| == |
| (element | attribute | text | document | xs:string | xs:decimal | ... ) |
| [atomic value]sequencetype |
| == |
| (xs:string | xs:decimal | ... ) |
Ed. Note: Jerome: The Formal Semantics makes use of fs:numeric which is not in XML Schema. This is necessary for the specification of some of XPath type conversion rules. See [Issue-0127: Datatype limitations].
During processing of a query, it is sometimes necessary to determine whether a given value matches a type that was declared using the SequenceType syntax. This process is known as SequenceType matching, and is formally specified in [3.3 Matching].
Introduction
Some expressions do not require their operands to exactly match the expected type. For example, function parameters and returns expect a value of a particular type, but automatically perform certain type conversions, such as extraction of atomic values from nodes, promotion of numeric values, and implicit casting of untyped values. The conversion rules for function parameters and returns are discussed in [5.1.4 Function Calls]. Other operators that provide special conversion rules include arithmetic operators, which are discussed in [5.4 Arithmetic Expressions], and value comparisons, which are discussed in [5.5.1 Value Comparisons].
Some special conversion rules: atomization, effective boolean value, and fallback conversions, are used in several places in the language.
Notation
There are two variants of the normalization function []Atomize.
The first variant, []Atomize is not parameterized by the required type of the expression.
The second variant, []Atomize(Type), is parameterized by the required type.
Both functions convert an expression into an expression that returns an optional atomic value and is used in the normalization of expressions whose required type is an optional atomic value.
A node is converted to an optional atomic value by extracting its typed content.
| [Expr]Atomize | |||||||
| == | |||||||
|
In the parameterized variant, an atomic value with type untyped is always cast to the required type:
| [Expr]Atomize(Type) | |||||||||
| == | |||||||||
|
Ed. Note: Issue:Don Chamberlin indicates that this rule should do type promotion. See [Issue-0161: Type promotion in Atomization].
Notation
The following normalization rule implements type exceptions: []Type_Exception(Type).
There are two definitions for each of these functions.
The definition here implements the strict type
exception policy. The strict policy always raises an
error:
| [Expr]Type_Exception(Type) |
| == |
| xf:error() |
The flexible type exception policy is defined
in [4.4.3.3 Fallback Conversion].
Notation
The function []Effective_Boolean_Value takes an expression and normalizes it into an effective boolean value.
The effective boolean value is obtained through the following normalization rule. Note that the default clause returns true is at least one item in the sequence is a node, otherwise, it raises a type exception.
| [Expr]Effective_Boolean_Value | ||||
| == | ||||
|
The following rules define the "fallback conversions" that support the "flexible" type-exception policy. These rules depend on the required type for the expression that is being normalized.
In the first rule, the required type is xs:boolean. The conditional expression in the "default return" clause below guarantees that in an arbitrary sequence of items, if at least one value is a node, then the boolean expression evaluates to true.
| [Expr]Type_Exception(xs:boolean) | |||||
| == | |||||
|
Ed. Note: MFF: The rule above requires that $v/self::node() on an atomic value returns () rather than raise an error.
In the next rule, the required type is "node". The conditional expression in the "default return" clause below guarantees that, if the first item is a node, then it returns the node, otherwise it returns an error.
| [Expr]Type_Exception(node) | ||||||
| == | ||||||
|
Ed. Note: MFF: All the remaining rules in the fallback conversions table must be specified. See [Issue-0113: Incomplete specification of type conversions]
This section describes error handling and conformance levels.
Ed. Note: Issue: Error handling is not formally specified. See [Issue-0094: Static type errors and warnings]
Ed. Note: Issue: Conformance levels are not formally specified. See [Issue-0169: Conformance Levels]
This section gives the semantics of all the [XPath/XQuery] expressions. The organization of this section parallels the organization of Section 3 of the [XPath/XQuery] document.
| [5 (XQuery)] | Expr | ::= | |
| [10 (XPath)] | Expr | ::= |
For each expression, a short description and the relevant grammar productions are given. The semantics of an expression includes the normalization, static analysis, and dynamic evaluation phases. Recall that normalization rules translate [XPath/XQuery] syntax into core [XPath/XQuery] syntax. In the sections that contain normalization rules, the Core grammar productions into which the expression is normalized are also provided. After normalization, sections on static type inference and dynamic evaluation define the static type and dynamic value for the core expression.
Core Grammar
The core grammar production for expressions is:
| [5 (Core)] | Expr | ::= |
Primary expressions are the basic primitives of the language.They include literals, variables, function calls, and the use of parentheses to control precedence of operators.
| [39 (XQuery)] | PrimaryExpr | ::= | |
| [227 (XQuery)] | VarName | ::= |
Core Grammar
The core grammar productions for primary expressions are:
| [24 (Core)] | PrimaryExpr | ::= | |
| [169 (Core)] | VarName | ::= |
Introduction
A literal is a direct syntactic representation of an atomic value. [XPath/XQuery] supports two kinds of literals: string literals and numeric literals.
| [56 (XQuery)] | Literal | ::= | |
| [55 (XQuery)] | NumericLiteral | ::= | |
| [193 (XQuery)] | IntegerLiteral | ::= | |
| [194 (XQuery)] | DecimalLiteral | ::= | |
| [195 (XQuery)] | DoubleLiteral | ::= | |
| [214 (XQuery)] | StringLiteral | ::= |
All literals are core expressions, therefore no normalization rules are required for literals.
Core Grammar
The core grammar productions for literals are:
| [38 (Core)] | Literal | ::= | |
| [37 (Core)] | NumericLiteral | ::= | |
| [138 (Core)] | IntegerLiteral | ::= | |
| [139 (Core)] | DecimalLiteral | ::= | |
| [140 (Core)] | DoubleLiteral | ::= | |
| [158 (Core)] | StringLiteral | ::= |
In the static semantics, the type of an integer literal is simply xs:integer:
|
|
||
|
In the dynamic semantics, an integer literal is evaluated by constructing an atomic value in the data model, which consists of the literal value and its type:
The formal definitions of decimal, double, and string literals are analogous to those for integer.
|
|
||
|
|
|
||
|
|
|
||
|
|
|
||
|
|
|
||
|
Ed. Note: MFF: Phil has noted that the data model should support primitive literals in their lexical form, in which case no explicit dynamic semantic rule would be necessary. See [Issue-0118: Data model syntax and literal values].
Introduction
A variable evaluates to the value to which the variable's QName is bound in the evaluation context.
A variable is a core expression, therefore no normalization rule is required for a variable.
In the static semantics, the type of a variable is simply its type in the static type environment statEnv.varType:
If the variable is not bound in the environment, the system raises an error.
In the dynamic semantics, a variable is evaluated by "looking up" its value in dynEnv.varValue:
If the variable is not bound in the environment, the system raises an error.
| [57 (XQuery)] | ParenthesizedExpr | ::= |
The formal definition of parenthesized expressions is in [5.3 Sequence Expressions].
Core Grammar
The core grammar production for parenthesized expressions is:
| [39 (Core)] | ParenthesizedExpr | ::= |
Introduction
A function call consists of a QName followed by a parenthesized list of zero or more expressions.
| [58 (XQuery)] | FunctionCall | ::= |
Because [XPath/XQuery] perform implicit operations when doing function calls, a normalization step is required.
Notation
Normalization of function calls uses an auxiliary mapping rule, []FormalArgument, used to map the formal arguments of a function call.
There are three variants of this mapping rule. If the required type of the formal argument is an optional atomic type, the following rule is applied: (Type is the required atomic type.)
| [Expr]FormalArgument |
| == |
| [Expr]Atomize(Type) |
If the required type of the formal argument is a sequence of atomic types, the following rule is applied: (Type is the atomic type of the sequence.)
| [Expr]FormalArgument | ||||
| == | ||||
|
Ed. Note: Jerome: it is not clear whether the above rule is correct. At least type promotion seems to be implied by the corresponding description in the [XPath/XQuery] document. See .
If the required type of the formal argument is neither an optional atomic type or a sequence of atomic types, the expression is simply mapped to its corresponding core expression:
| [Expr]FormalArgument |
| == |
| [Expr]Expr |
First, each argument in a function call is normalized to its corresponding core expression by applying []FormalArgument.
| [ QName (Expr1, ..., Exprn) ]Expr |
| == |
| [[QName] ( [[Expr1 ]Expr]FormalArgument, ..., [[ Exprn ]Expr]FormalArgument)]FormalArgument |
Note that this normalization rule depends on the static environment containing function signatures and is the only place where we exploit the implicit presence of statEnv.
Core Grammar
The core grammar production for function calls is:
| [40 (Core)] | FunctionCall | ::= |
Based on the function's name and number of arguments (and in the case of built-in functions based also on the type of its arguments), the function signature is retrieved from the static environment. If the function is not present in the environment, an error is raised. The type of each actual argument to the function must be a subtype of the corresponding formal argument to the function, i.e., it is not necessary that the actual and formal types be the same.
Ed. Note: Issue: Michael Rys points out there is a dependency between normalization (which requires the function signature) and the fact that built-in functions may be overloaded for various types. See [Issue-0140: Dependency in normalization and function resolution]
|
||||||
|
|
||||||
|
Based on the function's name and the number of arguments (and in the case of built-in functions based also based on the type of its arguments), the function body is retrieved from the dynamic environment. If the function is not present in the environment, an error is raised. The rule first evaluates each function argument, then extends dynEnv.varValue by binding each formal variable to its corresponding value, and then evaluates the body of the function in the new environment. The resulting value is the value of the function call.
|
||||||||
|
|
||||||||
|
Note that the function body is evaluated in the default environment. Note also that input values and output values are matched against the types declared for the function. If static analysis was performed, all these checks are guaranteed to be true and may be omitted.
Ed. Note: This only describes the semantics of user defined functions. The semantics of built-in functions is still an open issue. See [Issue-0122: Overloaded functions] and [Issue-0162: How to describe the semantics of built-in functions].
| [94 (XQuery)] | ExprComment | ::= |
Comments are lexical constructs only, and have no meaning within the query and therefore have no Formal Semantics.
Introduction
Path expressions are used to locate nodes within a tree. There are two kinds of path expressions, absolute path expressions and relative path expressions. An absolute path expression is a rooted relative path expression. A relative path expression is composed of a sequence of steps.
| [23 (XQuery)] | PathExpr | ::= | |
| [24 (XQuery)] | RelativePathExpr | ::= |
Core Grammar
The core grammars productions for path expressions are:
| [13 (Core)] | PathExpr | ::= | |
| [14 (Core)] | RelativePathExpr | ::= |
Notation
To define the semantics of relative path expressions, the auxiliary mapping rule is used: [RelativePathExpr]Path.
Absolute path expressions are path expressions starting with
the / symbol, indicating that the expression must be
applied on the root node in the current context. Remember that the
root node in the current context is the topmost ancestor of the
context node. The following two rules are used to normalize
absolute path expressions to relative ones, and rely on the use of
the xf:root() function, which computes the document
node from the context node.
| ["/"]Expr |
| == |
| xf:root( $fs:dot ) |
| ["/" RelativePathExpr]Expr |
| == |
| fs:distinct-doc-order( [xf:root( $fs:dot ) "/" RelativePathExpr]Path ) |
Ed. Note: Some of the semantics of the root expression
'/'are still undefined. For instance, what should the semantics of'/'be in case of a document fragment (e.g., created using XQuery element constructor). See [Issue-0123: Semantics of /].
Ed. Note: [Kristoffer/XSL] Also the context node and document order are not defined when it is a context item).
In general, a path expression always returns a sequence in document order. The complete normalization of a Path expression is obtained by applying the Path normalization and then sorting the result in document order and removing duplicates.
Ed. Note: Jerome: the restriction that XPath expressions operate on nodes here seems too strict and is still an open issue. See [Issue-0125: Operations on node only in XPath].
A composite relative path expression (using /)
is normalized into a for expression by
concatenating the sequences obtained by mapping each node of the
left-hand side in document order to the sequence it generates on
the right-hand side; the encapsulating call of the
fs:distinct-doc-order function then ensures that the result is
in document order without duplicates. The evaluation context is
carefully set up (using two bindings of the
$fs:last semantic variable).
| [StepExpr "/" RelativePathExpr ]Path | |||||
| == | |||||
|
Ed. Note: [Kristoffer/XSL] We have defined normalization over a right recursive variant of the syntax production -- this should perhaps be explained more elaborately.
Introduction
| [25 (XQuery)] | StepExpr | ::= | |
| [50 (XQuery)] | ForwardStep | ::= | |
| [51 (XQuery)] | ReverseStep | ::= | |
| [54 (XQuery)] | Predicates | ::= |
Core Grammar
The core grammars productions for XPath steps are:
| [15 (Core)] | StepExpr | ::= | |
| [35 (Core)] | ForwardStep | ::= | |
| [36 (Core)] | ReverseStep | ::= |
Notation
Step expressions can be followed by predicates. Normalization of predicates uses the following auxiliary mapping rule: []Predicates.
Normalization of predicates need to distinguish between forward and reverse axes.
As explained in the [XPath/XQuery] document, applying a step in
XPath changes the focus (or context). The change of focus is
made explicit by the normalization rule below, which binds the
variable $fs:dot to the node currently being
processed, and the variable $fs:position to the
position (i.e., the position within the input sequence) of that
node. Notice that the expression does not reorder the sequences
obtained by evaluating the subexpressions: if they are
RelativePathExprs then the contained path normalization will
insert fs:distinct-doc-order appropriately.
| [ForwardStep Predicates "[" Expr "]"]Path | |||||
| == | |||||
|
| [ReverseStep Predicates "[" Expr "]"]Path | |||||
| == | |||||
|
The []Predicates function is specified in [5.2.2 Predicates].
Ed. Note: Issue:These rules use the function xf:index-of to bind the position of a node in a sequence. This works since path expressions only work on nodes. See [Issue-0125: Operations on node only in XPath]. Still, it might be preferable to be able to bind both the current node and its position through a single processing of the collection within the for expression. The FS editors have made several syntax proposals for such an operation. For instance:
for $v at $i in E1 return E2. See [Issue-0124: Binding position in FLWR expressions].
Ed. Note: Issue:The semantics of Predicates applied on primary expression is not specified. See [Issue-0163: Normalization of XPath predicates]
Introduction
| [40 (XQuery)] | ForwardAxis | ::= | |
| [41 (XQuery)] | ReverseAxis | ::= | |
| [36 (XPath)] | ForwardAxis | ::= | |
| [37 (XPath)] | ReverseAxis | ::= |
Core Grammar
The core grammars productions for XPath axis are:
| [25 (Core)] | ForwardAxis | ::= | |
| [26 (Core)] | ReverseAxis | ::= |
Notation
Some auxiliary grammar productions and judgments are introduced to structure the semantic rules for axes and node tests. The first is used to capture the principal node kind associated with each axis:
| [51 (Formal)] | PrincipalNodeKind | ::= | |
| [49 (Formal)] | Axis | ::= |
Notation
The principal node kind for each axis is specified using the following judgment.
Notation
The following auxiliary grammar production defines Filters, which denote either an axis, or a principal node kind in the context of a principal node kind. An Axis merely combines the forward and reverse axes to allow judgments that range over both.
| [49 (Formal)] | Axis | ::= | |
| [50 (Formal)] | Filter | ::= |
Notation
The semantics of filters is defined with three auxiliary judgments. The first judgment defines the effect of a filter on a value, and is used in the dynamic semantics. This judgment should be read as follows: in the current context (i.e., with respect to the dynamic environment dynEnv.varValue), applying the Filter on Value1 yields Value2:
The second judgment is defining the
effect of a filter (either an axis or a nodetest) on a type and
is used in the static semantics. This judgment should be read as
follows: in the current context, (i.e., with respect to the the
static environment statEnv.varType), applying the filter
Filter on type Type1 yields the
type Type2
The last judgment allows an additional type environment, localTypeEnv. This "local" type environment maps variables to types (in the same way as statEnv.varType), and is used when computing the static type of the descendant axis.
These judgments will be defined separately for axes and node tests below.
Ed. Note: Explain the judgments in detail, maybe similarly to in section 3.2.
The static semantics of an Axis NodeTest pair is obtained by retrieving the type of the context node, and applying the two filters (the Axis, and then the NodeTest with a PrincipalNodeKind) on the result.
|
|||||
|
|
|||||
| statEnv |- Axis NodeTest : Type3 |
Here are the common set of inference rules defining the static semantics of filters. Essentially, these rules are used to process complex types through a filter, breaking the types down to the type of a node or of a value. The semantics of a filter applied to a node or value type is then specific to each Axis and NodeTest.
|
|
| statEnv |- Filter on () : () |
|
|
| statEnv |- Filter on none : none |
|
|||
|
|
|||
| statEnv |- Filter on Type1,Type2 : Type3,Type4 |
|
|||
|
|
|||
| statEnv |- Filter on Type1|Type2 : Type3|Type4 |
|
|||
|
|
|||
| statEnv |- Filter on Type1&Type2 : Type3&Type4 |
The cases for specific axis and node test filters are given in the axis and node test sections below.
The dynamic semantics of an Axis NodeTest pair is obtained by retrieving the context node, and applying the two filters (Axis, then NodeTest) on the result. The application of each filter is expressed through the filter judgment as follows.
|
|||||
|
|
|||||
| dynEnv |- Axis NodeTest => Value3 |
Here are the common set of inference rules defining the dynamic semantics of filters. These rules apply filters to a sequence, by breaking the sequence down to a node or value.
|
|
| dynEnv |- Filter on () => () |
|
|||
|
|
|||
| dynEnv |- Filter on Value1,Value2 => Value3,Value4 |
Ed. Note: Jerome: These rules fetaure a notational abuse! Value1,Value2 indicates the concatenation of Value1 and Value2, and is used for both constructing a new sequence and deconstructing it. When doing the construction, it is equivalent to applying the
op:concatenateoperator. The Formal Semantics specification should use more consistent notation for data model constructions and deconstruction. See also [Issue-0118: Data model syntax and literal values].
The semantics of a filter applied to a specific axis or nodetest is given in the appropriate section below.
The semantics of the following(-sibling) and preceding(-sibling) axes are expressed by mapping them to core expressions, all other axes are part of core [XPath/XQuery] and therefore are left unchanged through normalization.
[following-sibling::
NodeTest]Path
|
|||||
| == | |||||
|
Ed. Note: Should all mappings be unfolded all the way to the core language like the above one?
[following:: NodeTest]Path
|
| == |
[ancestor-or-self::node()/following-sibling::node()/descendant-or-self::NodeTest]Path
|
Otherwise, the forward axis is part of the core [XPath/XQuery] and handled by the Axis/NodeTest semantics below:
[child:: NodeTest]Path
|
| == |
child:: NodeTest
|
[attribute:: NodeTest]Path
|
| == |
attribute:: NodeTest
|
[self:: NodeTest]Path
|
| == |
self:: NodeTest
|
[descendant:: NodeTest]Path
|
| == |
descendant:: NodeTest
|
[descendant-or-self:: NodeTest]Path
|
| == |
descendant-or-self:: NodeTest
|
[namespace:: NodeTest]Path
|
| == |
namespace:: NodeTest
|
Reverse axes:
[preceding-sibling:: NodeTest]Path
|
||
| == | ||
|
[preceding:: NodeTest]Path
|
| == |
[ancestor-or-self::node()/preceding-sibling::node()/descendant-or-self::NodeTest]Path
|
Otherwise, the reverse axis is part of the core [XPath/XQuery] and is handled by the Axis/NodeTest semantics below.
[parent:: NodeTest]Path
|
| == |
parent:: NodeTest
|
[ancestor:: NodeTest]Path
|
| == |
ancestor:: NodeTest
|
[ancestor-or-self:: NodeTest]Path
|
| == |
ancestor-or-self:: NodeTest
|
Finally, the principal node kind of each axis is enumerated by the following rules (given here because they are used both by the static and dynamic semantic rules).
|
|
attribute:: principal attribute
|
|
|
namespace:: principal namespace
|
Axis != attribute::
Axis != namespace::
|
|
|
| Axis principal element |
The following rules define the static semantics of the filter judgment when applied to an Axis.
The type for the self axis is the same type as the type of the context node.
In case of elements, the type of the child axis is obtained by extracting the children types out of the content model describing the type of the context node.
Ed. Note: [Kristoffer/XSL] Need to add filters for all cases.
The type of the attribute axis is obtained by extracting the attribute types out of the content model describing the type of the context node.
Ed. Note: [Kristoffer/XSL] Need to add filters for all cases.
The type for the parent axis is either an element, a document (for document elements), or empty.
Ed. Note: Better typing for the parent axis is still an open issue. See [Issue-0080: Typing of parent].
The type for the namespace axis is always empty.
Ed. Note: Jerome: The type of namespace nodes is still an open issue. See [Issue-0105: Types for nodes in the data model.].
The types for the descendant, and descendant-or-self, ancestor, and ancestor-or-self axis are implemented through recursive application of the children and parent filters. The corresponding inference rules use the auxiliary filter judgment with a local type environment. This local type environment is used to keep track of the visited elements in order to deal with recursive types.
The following two rules are used to deal with global elements. Global elements need careful treatment here in order to deal with possible recursion. Notably, if the global element has been seen before, then the rule terminates and returns the union of all types already visited in the local