XQuery 1.0 and XPath 2.0 Formal Semantics

1 Introduction

This document defines the formal semantics of XQuery 1.0 and XPath 2.0. The present document is part of a set of documents that together define the XQuery 1.0 and XPath 2.0 languages:

[XQuery 1.0: A Query Language for XML] introduces the XQuery 1.0 language, defines its capabilities from a user-centric view, and defines the language syntax.
[XML Path Language (XPath) 2.0] introduces the XPath 2.0 language, defines its capabilities from a user-centric view, and defines the language syntax.
[XQuery 1.0 and XPath 2.0 Data Model] formally specifies the data model used by [XPath/XQuery] to represent the content of XML documents. The [XPath/XQuery] language is formally defined by operations on this data model.
[XQuery 1.0 and XPath 2.0 Functions and Operators] lists the functions and operators defined for the [XPath/XQuery] language and specifies to which arguments they can be applied and what the result should be.

Ed. Note: [Kristoffer/XSL] The role of the formal semantics document could be clarified: does it supplement or subsume the language specification in case of conflict?

The scope and goals for the [XPath/XQuery] language are discussed in the charter of the W3C [XSL/XML Query] Working Group and in the [XPath/XQuery] requirements [XML Query 1.0 Requirements].

[XPath/XQuery] is a powerful language, capable of selecting and extracting complex patterns from XML documents, reformulating them into results in arbitrary ways, and computing expressions over the result. This document defines the semantics of [XPath/XQuery] by giving a precise formal meaning to each of the constructions of the [XPath/XQuery] specification in terms of the [XPath/XQuery] data model. This document assumes that the reader is already familiar with the [XPath/XQuery] language.

Two important design aspects of [XPath/XQuery] are that it is functional and that it is typed. These two aspects play an important role in the [XPath/XQuery] Formal Semantics.

[XPath/XQuery] is a functional language. It is built from expressions, rather than statements. Every construct in the language (except for the query prolog) is an expression and expressions can be composed arbitrarily. The result of one expression can be used as the input to any other expression, as long as the type of the result of the former expression is compatible with the input type of the latter expression with which it is composed. Another aspect of the functional approach is that variables are always passed by value and their value cannot be modified through side-effects.

[XPath/XQuery] is a typed language. Types can be imported from one or more XML Schemas describing the documents that will be processed, and the [XPath/XQuery] language can then perform operations based on these types (e.g., using a treat as operation). In addition, [XPath/XQuery] supports a level of static type analysis. This means that the system can infer the type of a [expression/query], based on the type of its inputs. Static typing allows early error detection, and can be used as the basis for certain forms of optimization. The type system of [XPath/XQuery] is based on [XML Schema]. The [XPath/XQuery] type system models most of the features of [XML Schema Part 1], including global and local element and attribute declarations, complex and simple type definitions, named and anonymous types, derivation by restriction, extension, list and union, substitution groups, and wildcard types. It does not model uniqueness constraints and facet constraints on simple types. The main issues with respect to the [XPath/XQuery] type system are discussed in [3.8 Major type issues].

The [XPath/XQuery] formal semantics builds on long-standing traditions in the database and in the programming languages communities. In particular, the [XPath/XQuery] formal semantics has been inspired by works on SQL [SQL], OQL [ODMG], [BKD90], [CM93], and nested relational algebra (NRA) [BNTW95], [Col90], [LW97], [LMW96]. Other references include Quilt [Quilt], UnQL [BFS00], XDuce [HP2000], XML-QL [XMLQL99], XPath 1.0 [XML Path Language (XPath) : Version 1.0], XQL [XQL99], XSLT [XSLT 99], and YATL [YAT99]. Additional related work can be found in the bibliography [B References].

This document is organized as follows. [2 Preliminaries] introduces notations used to define the [XPath/XQuery] formal semantics. [3 The XQuery Type System] describes the [XPath/XQuery] type system and the operations on it, and explains the relationship between the [XPath/XQuery] type system and XML Schema. [4 Basics] describes semantics for basic [XPath/XQuery] concepts, and [5 Expressions] describes the dynamic and static semantics of [XPath/XQuery] expressions. [6 The Query Prolog] describes the semantics of the [XPath/XQuery] prolog. [7 Additional Semantics of Functions] describes any additional semantics required for [XPath/XQuery] functions. Finally, [8 Importing Schemas] specifies how XML Schemas are imported into the [XPath/XQuery] type system.

2 Preliminaries

This section provides the background necessary to understand the [XPath/XQuery] Formal Semantics and introduces the notations that are used.

2.1 Processing model

First, a processing model for [expression/query] evaluation is defined. This processing model is not intended to describe an actual implementation, although a naive implementation might be based upon it. It does not prescribe an implementation technique, but any implementation should produce the same results as obtained by following this processing model and applying the rest of the Formal Semantics specification.

The processing model consists of five phases; each phase consumes the result of the previous phase and generates output for the next phase. For each processing phase, we furthermore point to the relevant notations introduced later in the document.

Parsing. The grammar for the [XPath/XQuery] syntax is defined in [XQuery 1.0: A Query Language for XML]. Parsing may generate syntax errors. If no error occurs, an internal representation of the parsed syntax tree corresponding to the query is created: this ensures that we can unambiguously specify the following phases by a case per syntax production.
Context Processing. The semantics of [expression/query] depends on the input context. The input context needs to be generated before the [expression/query] can be processed. In XQuery, the input context is described in the Query Prolog (See [6 The Query Prolog]). In XPath, the input context is generated by the processing environment. The input context is split in a static and a dynamic part (denoted statEnv and dynEnv, respectively).
Normalization. To simplify the semantics specification, some normalization must first be performed over the [expression/query]. The [XPath/XQuery] language provides many powerful features that make [expression/query]s simpler to write and use, but are also redundant. For instance, a complex for expression might be rewritten as a composition of several simple for expressions. The language composed of these simpler [expression/query] is called the [XPath/XQuery] Core language and is a subset of the complete [XPath/XQuery] language. The grammar of the [XPath/XQuery] Core language is given in [A Normalized core grammar].

During the normalization phase, each [XPath/XQuery] [expression/query] is mapped into its equivalent [expression/query] in the core. (Note that this has nothing to do with Unicode Normalization, which works on character strings.) Normalization works by bottom-up application of normalization rules over expressions, starting with normalization of literals.

Specifically, the normalization phase is defined in terms of the static part of the context (statEnv) and the abstract syntax tree of a [expression/query] (Expr). Formal notations for the normalization phase are introduced in [2.7.1 Normalization].

After normalization, the full semantics can be obtained just by giving a semantics to the normalized Core [expression/query]. This is done during the last two phases.
Static type analysis. Static type analysis checks whether each [expression/query] is type safe, and if so, determines its static type. Static type analysis is defined only for Core [expression/query]. Static type analysis works by bottom-up application of type inference rules over expressions, taking the type of literals and the known types of input documents into account.

If the [expression/query] is not type-safe, static type analysis can result in a static error. For instance, a comparison between an integer value and a string value might be detected as an error during the static type analysis. If static type analysis succeeds, it results in a syntax tree where each sub-expression is "decorated" with its static type, and in an output type for the result of the [expression/query].

More precisely, the static analysis phase is defined in terms of the static context (statEnv) and a core [expression/query] (CoreExpr). Formal notations for the static analysis phase are introduced in [2.7.2 Static type inference].

Static typing does not imply that the content of XML documents must be rigidly fixed or even known in advance. The [XPath/XQuery] type system accommodates "flexible" types, such as elements that can contain any content. Schema-less documents are handled in [XPath/XQuery] by associating a standard type with the document, such that it may include any legal XML content.
Dynamic Evaluation. This is the phase in which the result of the [expression/query] is computed. The semantics of evaluation is defined only for Core [expression/query] terms. Evaluation works by bottom-up application of evaluation rules over expressions, starting with evaluation of literals. This guarantees that every expression can be unambiguously reduced to a value, which is the final result of the [expression/query].

The dynamic evaluation phase is defined in terms of the static context (statEnv), the evaluation context (dynEnv), and a core [expression/query] (CoreExpr). Formal notations for the dynamic evaluation phase are introduced in [2.7.3 Dynamic Evaluation].
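The normalization phase described above can be illustrated with a minimal Python sketch. It assumes a toy syntax tree with only variables and for expressions; the class names Var and For and the function normalize are hypothetical and are not part of the Formal Semantics. The sketch rewrites a for expression with several bindings into nested for expressions with one binding each, working bottom-up.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    """A variable reference (hypothetical toy node)."""
    name: str


@dataclass(frozen=True)
class For:
    """A 'for' with one or more (variable name, source expression) bindings."""
    bindings: Tuple[Tuple[str, "Expr"], ...]
    body: "Expr"


Expr = Union[Var, For]


def normalize(expr: Expr) -> Expr:
    """Rewrite bottom-up so that every For carries exactly one binding."""
    if isinstance(expr, Var):
        return expr
    body = normalize(expr.body)
    # Wrap the body in single-binding Fors, starting from the last binding
    # so that the first binding ends up outermost.
    for name, source in reversed(expr.bindings):
        body = For(((name, normalize(source)),), body)
    return body


# Toy rendering of: for $x in $xs, $y in $ys return $x
surface = For((("x", Var("xs")), ("y", Var("ys"))), Var("x"))
core = normalize(surface)
```

Here core is a for over $x whose body is a for over $y. The actual normalization rules are the ones given later in this document; this sketch only shows the shape of such a rewriting.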

Ed. Note: Issue:[Kristoffer/XSL] The distinction between errors manifest by not being able to prove a judgment and special error values is not clear. Also the processing model does not allow for skipping static typing. See [Issue-0094: Static type errors and warnings], [Issue-0171: Raising errors], and [Issue-0169: Conformance Levels].

The first four phases above are "analysis-time" (sometimes also called "compile-time") steps, which is to say that they can be performed on a [expression/query] before examining any input document. Indeed, analysis-time processing can be performed before such a document even exists. Analysis-time processing can detect many errors early on, e.g., syntax errors or type errors. If no error occurs, the result of analysis-time processing could be some compiled form of [expression/query], suitable for execution by a compiled-[expression/query] processor. The last phase is an "execution-time" (sometimes also called "run-time") step, which is to say that the query is evaluated on actual input document(s).

Static analysis catches only certain classes of errors. For instance, it can detect a comparison operation applied between incompatible types (e.g., xs:int and xs:date). Some other classes of errors cannot be detected by the static analysis and are only detected at execution time. For instance, whether an arithmetic expression on 32-bit integers (xs:int) yields an out-of-bound value can only be detected at run-time by looking at the data.
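The difference between the two classes of errors can be sketched in Python. The following is an illustration under stated assumptions, not the inference rules of this document: the toy expression classes (Lit, Var, Add, Eq), the two exception classes, and the functions static_type and evaluate are all hypothetical, and only xs:int and xs:string values are modeled.

```python
from dataclasses import dataclass
from typing import Union

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


class StaticTypeError(Exception):
    """Raised at analysis time, before any input data is seen."""


class DynamicError(Exception):
    """Raised at execution time, while looking at the data."""


@dataclass(frozen=True)
class Lit:
    value: Union[int, str]


@dataclass(frozen=True)
class Var:
    name: str
    type: str  # declared static type: "xs:int" or "xs:string"


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Eq:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Var, Add, Eq]


def static_type(e: Expr) -> str:
    """Infer a type bottom-up from literals and declared variable types."""
    if isinstance(e, Lit):
        return "xs:int" if isinstance(e.value, int) else "xs:string"
    if isinstance(e, Var):
        return e.type
    left, right = static_type(e.left), static_type(e.right)
    if isinstance(e, Add):
        if left == right == "xs:int":
            return "xs:int"
        raise StaticTypeError(f"cannot add {left} and {right}")
    if left != right:  # e is an Eq
        raise StaticTypeError(f"cannot compare {left} with {right}")
    return "xs:boolean"


def evaluate(e: Expr, env: dict):
    """Compute a value bottom-up; env maps variable names to input values."""
    if isinstance(e, Lit):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    left, right = evaluate(e.left, env), evaluate(e.right, env)
    if isinstance(e, Add):
        result = left + right
        if not INT32_MIN <= result <= INT32_MAX:
            raise DynamicError("xs:int result out of bounds")
        return result
    return left == right
```

With these definitions, comparing an xs:int variable with a string literal is rejected by static_type without any input, whereas adding 1 to an xs:int variable passes static analysis and fails in evaluate only when the variable happens to hold the largest 32-bit value.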

While implementations may be free to employ different processing models, the [XPath/XQuery] static semantics relies on the existence of a static type analysis phase that precedes any access to the input data. Statically typed implementations are required to find and report static type errors, as specified in this document. It is still an open issue (See [Issue-0098: Implementation of and conformance levels for static type checking]) whether to make the static type analysis phase optional to allow implementations to return a result for a type-invalid [expression/query] in special cases where the [expression/query] happens to perform correctly on a particular instance of input data. (Note that this is not the same as merely permitting that the static type error is emitted at evaluation time as is done below.)

Notice that the separation of logical processing into phases is not meant to imply that implementations must separate analysis-time from evaluation-time processing: [expression/query] processors may choose to perform all phases simultaneously at evaluation-time and may even mix the phases in their internal implementations. The processing model defines only what the final result should be.

The above processing phases are all internal to the [expression/query] processor. They do not deal with how the [expression/query] processor interacts with the outside world, notably how it accesses actual documents and types. A typical [expression/query] engine would support at least three other important processing phases:

XML Schema import phase. The [XPath/XQuery] type system is based on XML Schema. In order to perform static type analysis, the [XPath/XQuery] processor needs to build type descriptions that correspond to the schema(s) of the input documents. This phase is achieved by mapping all schemas required by the [expression/query] into the [XPath/XQuery] type system. The XML Schema import phase is described in [8 Importing Schemas].
XML loading phase. Expressions are evaluated on values in the [XQuery 1.0 and XPath 2.0 Data Model]. XML documents must be loaded into the [XQuery 1.0 and XPath 2.0 Data Model] before the evaluation phase. This is described in the [XQuery 1.0 and XPath 2.0 Data Model] and is not discussed further here.
Serialization phase. Once the [expression/query] is evaluated, processors might want to serialize the result of the [expression/query] as actual XML documents. Serialization of data model instances is still an open issue (See [Issue-0116: Serialization]) and is not discussed further here.

The parsing phase is not specified formally; the formal semantics does not define a formal model for the syntax trees, but uses the [XPath/XQuery] concrete syntax directly. More details about parsing for XQuery 1.0 can be found in the [XQuery 1.0: A Query Language for XML] document and more details about parsing for XPath 2.0 can be found in the [XML Path Language (XPath) 2.0] document. No further discussion of parsing is included here.

For the other three phases (normalization, static type analysis and evaluation), instead of giving an algorithm in pseudo-code, the semantics is described formally by means of inference rules. Inference rules provide a notation to describe how the semantics of an expression derives from the semantics of its sub-expressions. Hence, they provide a concise and precise way for specifying bottom-up algorithms, as required by the normalization, static type analysis, and evaluation phases.

2.2 Namespaces

The Formal Semantics uses the following namespace prefixes.

xf: and op: for functions and operators from the [XQuery 1.0 and XPath 2.0 Functions and Operators] document.
dm: for constructors and accessors of the [XQuery 1.0 and XPath 2.0 Data Model].
xs: for XML Schema components, and built-in types.
fs: for Formal Semantics specific components.

All these prefixes are assumed to be bound to the appropriate URIs.

Ed. Note: [Kristoffer/XSL] The fs: prefix URI should be defined here.

2.3 Data Model

2.3.1 Data model overview

This section gives an overview of the main data model features that are relevant to the Formal Semantics. The reader is referred to the [XQuery 1.0 and XPath 2.0 Data Model] document for more details.

An atomic value is a value in the value space of one of the [XML Schema Part 2] atomic types (that is, a primitive simple type that is not derived by list or union).
A node is one of the following kinds: document, element, attribute, namespace, comment, processing-instruction and text. The [XPath/XQuery] Formal Semantics does not currently discuss namespace, comment, and processing-instruction nodes (See [Issue-0105: Types for nodes in the data model.]). Components of nodes can be typed values, string values, names, type annotations, or other nodes.
XML documents and their contents are represented in the data model as trees composed of nodes and atomic values. [XQuery 1.0 and XPath 2.0 Data Model] defines a mapping from the PSVI (Post Schema Validation Information Set) to such trees.
[XQuery 1.0 and XPath 2.0 Data Model] defines constructors, which are functions that construct nodes and atomic values, and accessors, which are functions that access a value's properties or components.

For instance, the children function returns all the child nodes of an element node. The typed value of a node is a sequence of zero or more atomic values. The string value of a node is an instance of the type xs:string. The name of a node is an instance of the type xs:QName.
An item is either an atomic value or a node.
Every value in the data model is a sequence. A sequence is an ordered collection of zero or more items. Sequences are represented by items separated by commas, enclosed in parentheses.

A sequence containing exactly one item is called a singleton sequence. An item is identical to a singleton sequence containing that item. Sequences are never nested--for example, combining the values 1, (2, 3), and () into a single sequence results in the sequence (1, 2, 3). A sequence containing zero items is called an empty sequence.
The data model annotates element nodes, attribute nodes, and atomic values with type names. This type annotation is the type name given to the element, attribute, or value through the validation process.
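The rules on sequences given above (no nesting, and an item being identical to its singleton sequence) can be sketched in Python. This is only an illustration: Python tuples stand in for data model sequences, and the helper seq is a hypothetical name.

```python
def seq(*values):
    """Build a flat sequence; nested tuples are spliced in, never nested."""
    items = []
    for value in values:
        if isinstance(value, tuple):
            # A sequence inside a sequence contributes its items directly.
            items.extend(seq(*value))
        else:
            items.append(value)
    return tuple(items)


combined = seq(1, (2, 3), ())   # combining 1, (2, 3), and ()
singleton = seq(1)              # a sequence containing exactly one item
empty = seq()                   # a sequence containing zero items
```

In this model combined is the flat sequence of 1, 2, 3, and wrapping an item in an extra sequence constructor, as in seq((1,)), gives the same value as seq(1).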

The remainder of the section is devoted to several related features that merit special attention: node identity, order, and type annotations.

2.3.2 Node identity

In the [XQuery 1.0 and XPath 2.0 Data Model], nodes have identity. Node identity follows from the unique location of the node within exactly one XML document (or document fragment, in the case of XML being constructed within the [expression/query] itself). It is possible for two expressions to return not only the same value, but also the identical node. On the other hand, two expressions could return results with the same document structure, but different identities. [XPath/XQuery] supports comparison of nodes using either "node equality" (equality by identity) or "value equality" (equality by value), for instance by using the operators is or =.

Atomic values do not have an associated identity and are therefore always compared by value.

The difference between node equality and value equality is illustrated by the following example:

   let $a1 := <book><author>Suciu</author></book>,
       $a2 := <author>Suciu</author>,
       $a3 := $a1/author
   return
       ($a1/author is $a3), {-- true, they refer to the same node --}
       ($a1/author = $a2),  {-- true, they have the same value --}
       ($a1/author is $a2)  {-- false, they are not the same node --}

All [XPath/XQuery]'s operators preserve node identity with the exception of explicit copy operations, node (element, attribute, etc.) constructors and validate expressions. An element constructor always creates a deep copy of its attributes and child nodes. Other operators, such as sequence constructors or path expressions, do not result in copies of the corresponding nodes. So, for example, ($a3, $a2) creates a new sequence of nodes, the first of which has the same identity as $a1/author.
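As a rough analogy, the behaviour above can be modeled in Python, where object identity plays the role of node identity. The Node class and the functions node_equal, value_equal, and construct are hypothetical names introduced for this sketch, and value equality is simplified to a comparison of string values.

```python
import copy


class Node:
    """Hypothetical stand-in for an element node of the data model."""

    def __init__(self, name, text="", children=()):
        self.name = name
        self.text = text
        self.children = list(children)

    def string_value(self):
        return self.text + "".join(c.string_value() for c in self.children)


def node_equal(x, y):
    """Analogue of 'is': true only for the very same node."""
    return x is y


def value_equal(x, y):
    """Analogue of '=': compares string values (a simplification)."""
    return x.string_value() == y.string_value()


def construct(name, children):
    """Element constructor: deep-copies its content, creating new identities."""
    return Node(name, children=[copy.deepcopy(c) for c in children])


a1 = construct("book", [Node("author", "Suciu")])
a2 = Node("author", "Suciu")
a3 = a1.children[0]              # plays the role of $a1/author
sequence = (a3, a2)              # a sequence constructor does not copy nodes
wrapped = construct("wrapper", [a2])
```

In this model a3 and a1.children[0] are the same node, a3 and a2 are equal only by value, the first item of sequence keeps the identity of a3, and the child of wrapped is a copy of a2 with a new identity.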

2.3.3 Document order and sequence order

There are two kinds of order in [XPath/XQuery]: sequence order and document order.

Sequence order. Sequences of items in the [XPath/XQuery] data model are ordered. Sequence order refers to the order of items within a given sequence.

Document order. Document order refers to the total order among all nodes in a given document. It is defined as the order of appearance of the nodes when performing a pre-order, depth-first traversal of a tree. For elements, this corresponds to the order of appearance of their opening tags in the corresponding XML serialization. Document order is equivalent to the definition used in [XML Path Language (XPath) : Version 1.0].

Depending on the context, it may be possible to talk about two different notions of order for the same (set of) nodes. For example:

   let $e := <list>
               <item id="1"/>
               <item id="2"/>
             </list>,
       $s := ($e/item[@id='2'], $e/item[@id='1'])
   ...

The item "1" is before item "2" in document order, but the same two nodes are in the opposite order in the sequence $s.

The order between nodes from distinct documents is implementation defined but must be stable. That is, it must not change for the duration of query evaluation.
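The two notions of order for a single document can be illustrated with Python's standard xml.etree.ElementTree module, whose Element.iter() method walks a tree in document (pre-order, depth-first) order. The variable names below are chosen for this sketch only, and a Python list stands in for the sequence $s.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<list><item id="1"/><item id="2"/></list>')

# Record each element's position in document order.
position = {id(node): i for i, node in enumerate(root.iter())}

item1 = root.find("item[@id='1']")
item2 = root.find("item[@id='2']")

# Sequence order: the order in which the items were placed in the sequence.
s = [item2, item1]

# Document order: the same two nodes sorted by position in the document.
in_document_order = sorted(s, key=lambda node: position[id(node)])
```

The list s keeps item "2" before item "1", while in_document_order restores the order in which the two elements appear in the document.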

2.3.4 Type annotations

A type annotation indicates the type name for each element and attribute node, which results from the validation process or is assigned by default (depending on the way the data was built). In the case of anonymous types, the data model annotates elements and attributes with xs:anyType and xs:anySimpleType respectively.

For instance, consider the document fragment

  <fact>The cat weighs <weight>12</weight> pounds.</fact>

and the associated schema

  <xs:element name="fact" type="FactType"/>

  <xs:complexType name="FactType" mixed="true">
    <xs:sequence>
      <xs:element name="weight" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>

Before validation, the two elements fact and weight in the document are annotated by xs:anyType.

After validation, the element fact is annotated with the type name FactType and the element weight is annotated with the type name xs:integer.

Type annotations are taken into account during [expression/query] evaluation and have an impact on the semantics of [XPath/XQuery].

For instance, consider the following function declaration

  define function convert_weight(element of type FactType $x)
    { ... }

This function only accepts as input an element annotated with the type FactType, or with one of the types derived from it. Therefore, the previous document fragment is only accepted by the function after it has been validated against the given schema.

[3 The XQuery Type System] gives a formal description of data model values with type annotations and explains how type annotations are taken into account when matching a value against a given type.

2.4 Schemas and types

This section presents an introduction to (statically) typed languages in general, and a conceptual overview of the [XPath/XQuery] type system in particular. The [XPath/XQuery] type system is discussed formally in [3 The XQuery Type System].

2.4.1 The elements of a (statically) typed language

A type system relates values, types, and expressions in the language.

The main constituent parts of a typed system are:

A universe of possible types. Types are composed of other types, starting with primitive types, and can be built using type constructors.

In [XPath/XQuery], the primitive types are the primitive atomic types of [XML Schema Part 2]. Type constructors are also based on schema, and can be used to build types for attributes and elements, sequence and choice groups, etc. The [XPath/XQuery] type system is defined in [3.1 Values and Types].
A notation for specifying types.

In [XPath/XQuery], types are imported from XML Schema, and can be used in expressions using the SequenceType syntax. In order to model and reason about types in [XPath/XQuery], this document introduces its own notation for types. This notation is defined in [3.1 Values and Types].
A relationship between values and types. Formally, the semantics of a type is defined to be the (usually infinite) set of instance values which match that type.

For example, the type xs:integer consists of the set of all integer values, and the type element address { xs:string } consists of all elements named "address" whose contents are a single string value, etc.

Matching a [XPath/XQuery] value against a type is defined in [3.3 Matching].
A notion of subtyping. Subtyping is a relationship between types that plays an important role in a statically typed language. Notably, it is used to determine whether function calls are legal or not.

Subtyping in [XPath/XQuery] is defined in [3.5.1 Subtype].

For a statically typed language one also needs to define:

Rules that relate expressions to types. Static type analysis infers a type for each expression in the language. The most important property for a static type system is that the type inferred statically must be such that for all valid inputs, evaluating the expression returns a result that matches the inferred type. If this property holds, then the static typing provides guarantees that certain kinds of errors can not occur at run-time.

The static typing rules for [XPath/XQuery] are defined for all [XPath/XQuery] expressions in [5 Expressions], [6 The Query Prolog] and [7 Additional Semantics of Functions].

Finally, types can also be used during language evaluation if the language provides:

Expressions on types. A language can support operations that take types into account, such as casting, type matching, etc.

[XPath/XQuery] provides several such expressions: "typeswitch", "cast as", "treat as", and "validate as". [XPath/XQuery] also supports a "validate" expression, which performs XML Schema validation. Validation is related to matching in that if validation succeeds, it returns a value which matches the type used for validation. Expressions on types are defined in [5.12 Expressions on SequenceTypes]. The validate expression is defined in [3.4 Erase and Annotate].

The remainder of the section is devoted to several related features that merit special attention: the relationship between the [XPath/XQuery] type system and XML Schema, structural and named typing, and subtyping.

2.4.2 XML Schema and the XQuery type system

The [XPath/XQuery] type system is based on [XML Schema]. An introduction to XML Schema can be found in [XML Schema Part 0].

The [XPath/XQuery] type system captures a large subset of XML Schema. This section summarizes which features are captured and where the type system deviates from XML Schema.

The following components and features of XML Schema are captured exactly in the [XPath/XQuery] type system:

Element and attribute declarations;
Simple and complex type definitions;
Local attributes and elements;
Named and anonymous types;
The type name hierarchy;
Sequence, choice and all groups;
Wildcards;
Derivation by extension, list and union;
Substitution groups.

The following components or features of XML Schema are not represented in the [XPath/XQuery] type system.

Simple type facets;
Identity-constraint definitions;
Notations and annotations.

The following components or features of XML Schema are represented partially or differently in the [XPath/XQuery] type system.

The [XPath/XQuery] type system only captures an approximation of minOccurs and maxOccurs. Like DTDs, the [XPath/XQuery] type system only supports *, +, and ? which correspond to [minOccurs="0" maxOccurs="unbounded"], [minOccurs="1" maxOccurs="unbounded"], and [minOccurs="0" maxOccurs="1"] respectively.
The [XPath/XQuery] type system generalizes the notion of derivation by restriction for the needs of static type analysis. Derivation by restriction in XML Schema relies on a syntactic relationship between content models. The [XPath/XQuery] type system uses a generalization of this relationship based on the notion of inclusion between types -- also called structural subtyping.

At compile time, the [XPath/XQuery] environment imports XML Schema declarations, and loads them as declarations in the [XPath/XQuery] type system. This loading process is specified by a mapping from XML Schema to the [XPath/XQuery] type system, given in [8 Importing Schemas].

2.4.3 Structural and named typing

XML Schema is a powerful schema language for XML. XML Schema can describe both structural constraints as well as type annotations that apply to XML documents. These aspects are referred to as structural typing and named typing respectively.

Structural typing. An important notion underlying XML Schema is the concept of regular expressions, and their tree counterparts, regular tree grammars. An introduction to regular (tree) languages can be found in [Languages] or in [TATA].

Tree grammars can be used to capture the structural constraints imposed by an XML Schema. Automata are the traditional way of implementing tree grammars and can be used to implement part of the XML Schema validation process.

For instance, consider the following element declaration:
```
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence maxOccurs="unbounded">
        <xs:choice minOccurs="0" maxOccurs="unbounded">
          <xs:element ref="a"/>
          <xs:element ref="b" minOccurs="0"/>
        </xs:choice>
        <xs:element ref="c"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
```
The structural constraints imposed by the definition of the content of element root can be described by the following regular expression (written here in DTD notation): ((a|b?)*,c)+. Such a constraint can be checked using a finite state automaton applied on the children of the element root in the instance document.
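The check described above can be sketched with Python's re module standing in for the finite state automaton (re is a backtracking matcher, so this shows the constraint, not the implementation technique). The sketch assumes that every child element name is a single letter, so that the names of the children can simply be concatenated; the names CONTENT_MODEL and children_match are hypothetical.

```python
import re

# Direct transcription of the DTD-style content model ((a|b?)*,c)+ where
# the ',' of sequence becomes plain concatenation.
CONTENT_MODEL = re.compile(r"(?:(?:a|b?)*c)+")


def children_match(names):
    """Return True if the child element names satisfy the content model."""
    return CONTENT_MODEL.fullmatch("".join(names)) is not None


valid = children_match(["a", "b", "c", "b", "c"])
invalid = children_match(["a", "b"])  # the required trailing 'c' is missing
```

A real validator would work on element nodes rather than on a string of names, and would have to handle names of arbitrary length.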
Named typing. In addition, XML Schema provides the ability to associate names with type definitions and to declare relationships between type names.

For instance, consider the following two element declarations:
```
  <xs:element name="person">
    <xs:complexType name="Person">
      <xs:sequence>
        <xs:element ref="ssn"/>
        <xs:choice>
          <xs:sequence>
            <xs:element ref="major" minOccurs="0"/>
            <xs:element ref="minors" maxOccurs="unbounded"/>
          </xs:sequence>
          <xs:element ref="job_desc"/>
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="student">
    <xs:complexType name="Student">
      <xs:complexContent>
        <xs:restriction base="Person">
          <xs:sequence>
            <xs:element ref="ssn"/>
            <xs:element ref="major" minOccurs="0"/>
            <xs:element ref="minors" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:restriction>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
```
The type Student of the element student is derived by restriction from the type Person. Such a declaration not only defines a new type, but also creates a relationship between the type names Person and Student which needs to be taken into account when doing validation and matching against a given type. For instance, the following function
```
  define function salary(element of type Person $x)
    { .... }
```
accepts as parameters elements of type Person as well as elements with types derived by restriction from Person, but does not accept elements with the same content model as the element person but a type which does not derive from the type Person.
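The role of type names in this check can be sketched in Python. The table BASE_TYPE, the Element class, and the functions derives_from and salary_accepts are hypothetical names for this illustration; the type Employee is an invented example of a type with the same content model as Person but no derivation relationship to it.

```python
from dataclasses import dataclass

# Declared derivation relationships between type names: derived -> base.
BASE_TYPE = {
    "Student": "Person",
    "Person": "xs:anyType",
    "Employee": "xs:anyType",  # hypothetical: same content, unrelated name
}


@dataclass
class Element:
    name: str
    type_annotation: str


def derives_from(type_name, expected):
    """Walk up the type name hierarchy looking for the expected type name."""
    while type_name is not None:
        if type_name == expected:
            return True
        type_name = BASE_TYPE.get(type_name)
    return False


def salary_accepts(x):
    """Parameter check for: element of type Person $x."""
    return derives_from(x.type_annotation, "Person")
```

An element annotated with Person or Student passes the check, while an element annotated with Employee is rejected even though its content would satisfy the structural constraints of Person.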

2.4.4 Subtyping

A type is a subtype of another type if every value matching the first type also matches the second type. Since matching takes both type structure and type names into account, so does subtyping.

Here is a simple example of subtyping on two regular expressions:

  ( xs:double )*
  ( xs:integer | xs:double )*

The first type is a subtype of the second, because any sequence that contains only doubles is also an example of a sequence that contains either integers or doubles. This example only illustrates the structural subtyping.

Type₁ <: Type₂ indicates that type Type₁ is a subtype of Type₂. Note that the notation <: is used when defining the [XPath/XQuery] semantics, but is not part of the [XPath/XQuery] syntax.

2.4.4.1 Type substitutability

Subtyping plays a crucial role in the semantics of function application since all functions have an associated signature that declares the types of input parameters and the type of the result of the function.

Intuitively, one can call a function not only by passing an argument of the declared type, but also any argument whose type is a subtype of the declared type. This property is called type substitutability.

Ed. Note: [Kristoffer/XSL] Type substitutability should really be true for any subexpression, not just function arguments (it just happens that in functional languages, where type substitutability is usually defined, the situations are equivalent). Specifically type substitutability does not apply to any subexpression of [XPath/XQuery]: typeswitch violates it. In particular this means that while one may replace a function argument with one whose type is a subtype this does not guarantee that the resulting value is the same even if the value is the same (but cast to a subtype).

2.4.4.2 Subtyping and XML Schema derivation

Subtyping in [XPath/XQuery] is related to, but not identical to, "derivation by restriction" in XML Schema. A type derived by restriction is always a subtype of its base type, but the converse does not hold: a subtype need not be derivable by restriction from its supertype. For instance, consider the following types:

  ( xs:double )+
  xs:double, ( xs:integer | xs:double )*

The first type is a subtype of the second, because any sequence that contains one or more doubles is also an example of a sequence containing a double followed by a sequence of zero or more integers or doubles. But the first type cannot be derived from the second type using derivation by restriction as defined in XML Schema.

2.5 Functions

2.5.1 Functions and operators

The [XQuery 1.0 and XPath 2.0 Functions and Operators] document provides a number of useful basic functions over the components of the [XPath/XQuery] data model (atomic values, nodes, sequences, etc.). A number of these functions are used in the course of describing the [XPath/XQuery] semantics. Here is the list of functions from the [XQuery 1.0 and XPath 2.0 Functions and Operators] document that are used in the [XPath/XQuery] Formal Semantics:

op:numeric-add
op:numeric-subtract
op:numeric-multiply
op:numeric-divide
op:numeric-mod
op:numeric-unary-plus
op:numeric-unary-minus
op:numeric-equal
op:numeric-less-than
op:numeric-greater-than
op:boolean-equal
op:boolean-less-than
op:boolean-greater-than
op:yearMonthDuration-equal
op:yearMonthDuration-less-than
op:yearMonthDuration-greater-than
op:dayTimeDuration-equal
op:dayTimeDuration-less-than
op:dayTimeDuration-greater-than
op:dateTime-equal
op:dateTime-less-than
op:dateTime-greater-than
op:add-yearMonthDurations
op:subtract-yearMonthDurations
op:multiply-yearMonthDuration
op:divide-yearMonthDuration
op:add-dayTimeDurations
op:subtract-dayTimeDurations
op:multiply-dayTimeDuration
op:divide-dayTimeDuration
op:add-yearMonthDuration-to-dateTime
op:add-dayTimeDuration-to-dateTime
op:subtract-yearMonthDuration-from-dateTime
op:subtract-dayTimeDuration-from-dateTime
op:add-yearMonthDuration-to-date
op:add-dayTimeDuration-to-date
op:subtract-yearMonthDuration-from-date
op:subtract-dayTimeDuration-from-date
op:add-dayTimeDuration-to-time
op:subtract-dayTimeDuration-from-time
op:QName-equal
op:anyURI-equal
op:hex-binary-equal
op:base64-binary-equal
op:NOTATION-equal
op:node-equal
op:node-before
op:node-after
op:node-precedes
op:node-follows
op:to
op:concatenate
op:item-at
op:union
op:intersect
op:except
op:operation
xf:false
xf:true
xf:count
xf:boolean
xf:get-namespace-uri
xf:get-local-name
xf:round
xf:compare
xf:not
xf:empty
xf:root
xf:error

2.5.2 Functions and static typing

Many functions in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document are generic: they perform operations on arbitrary components of the data model, e.g., any kind of node, or any sequence of items. For instance, the xf:distinct-nodes function removes duplicates in any sequence of nodes. As a result, the signature given in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document is also generic. For instance, the signature of the xf:distinct-nodes function is:

  define function xf:distinct-nodes(node*) returns node*

If applied as is, this signature would result in poor static type information. In general, it would be preferable if some of these function signatures could take the type of input parameters into account. For instance, if the function xf:distinct-nodes is applied on a parameter of type element a*, element b, one can easily deduce that the resulting sequence is a collection of either a or b elements.

In order to provide better static typing, some specific typing rules are specified for some of the functions in [XQuery 1.0 and XPath 2.0 Functions and Operators]. These additional typing rules are given in [7 Additional Semantics of Functions]. Here is the list of functions that are given specific typing rules.

xf:error
xf:distinct-nodes
xf:distinct-values
op:union
op:intersect
op:except
op:to
xf:data

Ed. Note: There might be other functions that need specific typing rules. See [Issue-0135: Semantics of special functions].

2.5.3 The error function

Some queries result in an error. Errors that are detected at compile-time, like parse errors or type errors, are reported to the user by the system. Dynamic errors are raised using the xf:error() function. General handling of errors in the Formal Semantics is still an open issue (See [Issue-0094: Static type errors and warnings], [Issue-0098: Implementation of and conformance levels for static type checking], [Issue-0171: Raising errors], and [Issue-0169: Conformance Levels]).

2.5.4 Data Model Accessors and XPath Axes

The [XPath/XQuery] Data Model provides operations to construct or access components of the Data Model, some of which are used in the course of defining the [XPath/XQuery] semantics. The Formal Semantics uses the namespace prefix dm: for those functions to distinguish them from other functions in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document.

Here is a list of the data model constructors and accessors used in the Formal Semantics specification.

dm:atomic-value
dm:children
dm:attributes
dm:parent
dm:name
dm:node-kind
dm:dereference
dm:empty-sequence
dm:namespaces
dm:type
dm:typed-value
dm:attribute-node-atomic
dm:element-node-atomic
dm:text-node

XPath axes are used to describe tree navigation in instances of the [XPath/XQuery] data model. The semantics of axis navigation is described in terms of data model functions. Some of the XPath axes are directly supported through existing data model accessors. Some axes (e.g., ancestor) are defined as a simple recursive application of a data model accessor and others (e.g., following) require more complex computation on top of existing data model accessors. The correspondence between XPath axes and data model accessors is specified in section [5.2.1.1 Axes].
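As an illustration of an axis defined by recursive application of an accessor, the following sketch derives the ancestor axis from a parent accessor. The `Node` class and the function names `dm_parent` and `ancestor` are hypothetical stand-ins introduced here, not definitions from the data model; returning the nearest ancestor first is a choice of this sketch.

```python
class Node:
    """Hypothetical stand-in for a data model node; only the parent link is modeled."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

def dm_parent(node):
    """Plays the role of the dm:parent accessor: the empty sequence for a root node."""
    return [] if node.parent is None else [node.parent]

def ancestor(node):
    """Ancestor axis as repeated application of dm_parent, nearest ancestor first."""
    result = []
    for p in dm_parent(node):
        result.append(p)
        result.extend(ancestor(p))
    return result

doc = Node("doc")
fact = Node("fact", doc)
weight = Node("weight", fact)
names = [n.name for n in ancestor(weight)]  # ["fact", "doc"]
```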

2.5.5 Other Formal Semantics functions

In a few cases, the Formal Semantics makes use of functions that are not currently in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document. The namespace prefix fs: is used for those functions, to distinguish them from functions in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document.

Here is a list of the additional functions used in the Formal Semantics specification.

fs:characters-to-string
fs:distinct-doc-order
fs:item-sequence-to-node-sequence
fs:item-sequence-to-string

2.6 Notations

In order to make the specification of the semantics both concise and precise, formal notations in the form of judgments and inference rules are used. This section introduces the notations for judgments, inference rules, and mapping rules, as well as the notion of an environment.

This section is intended for the reader who may not be familiar with these notations. The reader familiar with these notations can skip this section and go directly to [2.7 The Formal Semantics].

2.6.1 Grammar productions

Grammar productions are used in this document to describe "objects" (values, types, [XPath/XQuery] expressions, etc.) manipulated by the Formal Semantics. The Formal Semantics makes use of several kinds of grammar productions.

XQuery grammar productions describe the XQuery language and expressions. XQuery productions are identified by a number, which corresponds to the number in the [XQuery 1.0: A Query Language for XML] document, and are annotated with "(XQuery)". For instance, the following production describes FLWR expressions in XQuery.

[10 (XQuery)]	FLWRExpr	::=	((ForClause | LetClause)+ WhereClause? "return")* QuantifiedExpr

For the purpose of this document, the differences between the XQuery 1.0 and the XPath 2.0 grammars are mostly irrelevant. By default, this document uses XQuery 1.0 grammar productions. Whenever the grammar for XPath 2.0 differs from the one for XQuery 1.0, the corresponding XPath 2.0 productions are also given. XPath productions are identified by a number, which corresponds to the number in the [XML Path Language (XPath) 2.0] document, and are annotated with "(XPath)". For instance, the following production describes "for" expressions in XPath.

[13 (XPath)]	ForExpr	::=	(ForClause "return")* QuantifiedExpr

XQuery Core grammar productions describe the XQuery Core. The complete XQuery Core grammar is given in [A Normalized core grammar]. XQuery Core productions are identified by a number, which corresponds to the number in [A Normalized core grammar], and are annotated by "(Core)". For instance, the following production describes the simpler form of "for" expressions present in the XQuery Core.

[8 (Core)]	ForExpr	::=	(ForClause "return")* TypeswitchExpr

The Formal Semantics sometimes needs to manipulate "objects" (values, types, expressions, etc.) for which there is no existing grammar production in the [XQuery 1.0: A Query Language for XML] document. In these cases, specific grammar productions are introduced. Notably, additional productions are used to describe the [XPath/XQuery] type system. XQuery Formal Semantics productions are identified by a number, and are annotated by "(Formal)". For instance, the following production describes global type definitions in the [XPath/XQuery] type system.

[27 (Formal)]	Definition	::=	(<"define" "element"> ElementName Substitution? Nillable? TypeSpecifier)
			| (<"define" "attribute"> AttributeName TypeSpecifier)
			| (<"define" "type"> TypeName TypeDerivation)

Note that grammar productions which are specific to the Formal Semantics (i.e., with the "(Formal)" annotation) are not part of [XPath/XQuery]. They are not accessible to the user and are only used in the course of defining the language's semantics.

2.6.2 Judgments

The basic building block of the formal specification is called a judgment. A judgment expresses whether a property holds or not.

For example:

Notation

The judgment

Painting is beautiful

holds if the painting Painting is beautiful.

More relevantly, here are two examples of judgments that are used extensively in the rest of this document.

Notation

The judgment

Expr => Value

holds if the expression Expr yields (or evaluates to) the value Value.

Notation

The judgment

Expr : Type

holds when the expression Expr has type Type.

A judgment can contain symbols and patterns.

Symbols are purely syntactic and are used to write the judgment itself. In general, symbols in a judgment are chosen to reflect its meaning. For example, 'is beautiful', '=>' and ':' are symbols, the second and third of which should be read "yields", and "has type" respectively.

Patterns are written with italicized words. The name of a pattern is significant: each pattern name corresponds to an "object" (a value, a type, an expression, etc.) that can be substituted legally for the pattern. By convention, all patterns in the Formal Semantics correspond to grammar non-terminals, and are used to represent entities that can be constructed through application of the corresponding grammar production. For example, Expr can be used to represent any [XPath/XQuery] expression, Value can be used to represent any value in the [XPath/XQuery] data model.

When applying the judgment, each pattern must be instantiated to an appropriate sort of "object" (value, type, expression, etc.). For example, '3 => 3' and '$x+0 => 3' are both instances of the judgment 'Expr => Value'. Note that in the first judgment, '3' corresponds both to the expression '3' (on the left-hand side of the => symbol) and to the value '3' (on the right-hand side of the => symbol).

Patterns may appear with subscripts (e.g. Expr₁, Expr₂), which simply name different instances of the same sort of pattern. Each distinct pattern must be instantiated to a single "object" (value, type, expression, etc.). If the same pattern occurs twice in a judgment description, then both occurrences must be instantiated with the same "object". For example, '3 => 3' is an instance of the judgment 'Expr₁ => Expr₁' but '$x+0 => 3' is not, since the two expressions '$x+0' and '3' cannot both be instances of the pattern Expr₁. By contrast, '$x+0 => 3' is an instance of the judgment 'Expr₁ => Expr₂'.
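The consistency requirement on repeated patterns can be sketched as a small matcher. This is an illustration only: judgments are modeled as tuples of strings, the set `SYMBOLS` and the function `instantiate` are names assumed here, and the sketch does not check that an "object" is actually derivable from the grammar non-terminal its pattern names.

```python
# Purely syntactic symbols of the example judgments; everything else is a pattern.
SYMBOLS = {"=>", ":"}

def instantiate(pattern, instance):
    """Match a judgment pattern such as ("Expr1", "=>", "Expr1") against an
    instance such as ("3", "=>", "3"). Return the bindings from pattern names
    to objects, or None if the instance does not fit the pattern."""
    if len(pattern) != len(instance):
        return None
    bindings = {}
    for p, obj in zip(pattern, instance):
        if p in SYMBOLS:
            if p != obj:
                return None  # symbols must appear literally
        elif bindings.setdefault(p, obj) != obj:
            return None      # same pattern bound to two different objects
    return bindings

same = instantiate(("Expr1", "=>", "Expr1"), ("3", "=>", "3"))
clash = instantiate(("Expr1", "=>", "Expr1"), ("$x+0", "=>", "3"))
distinct = instantiate(("Expr1", "=>", "Expr2"), ("$x+0", "=>", "3"))
```

Only `clash` is rejected, because it would require Expr1 to stand for both '$x+0' and '3'.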

In a few cases, patterns may have a name which is not exactly the name of a grammar production but is based on it. For instance, a BaseTypeName is a pattern which stands for a type name, as would TypeName, or TypeName₂. This usage is limited, and only occurs to improve the readability of some of the inference rules.

2.6.3 Inference rules

Whether a judgment holds or not is specified by means of inference rules. Inference rules express the logical relation between judgments and describe how complex judgments can be concluded from simpler premise judgments. (As explained in [2.1 Processing model], this approach lends itself well to bottom-up algorithms.)

A logical inference rule is written as a collection of premises and a conclusion, respectively written above and below a dividing line:

    premise₁ ... premise_n
    ----------------------
    conclusion

All premises and the conclusion are judgments. The interpretation of an inference rule is: if all the premise judgments above the line hold, then the conclusion judgment below the line must also hold.

Here is a simple example of an inference rule, which uses the example judgment 'Expr => Value' from above:

    $x => 0    3 => 3
    -----------------
    $x + 3 => 3

This inference rule expresses the following property of the judgment 'Expr => Value': if the variable expression '$x' yields the value '0', and the literal expression '3' yields the value '3', then the expression '$x + 3' yields the value '3'.

It is also possible for an inference rule to have no premises above the line; this simply means that the judgment below the line always holds:

    ------
    3 => 3

This inference rule expresses the following property of the judgment 'Expr => Value': evaluating the literal expression '3' always yields the value '3'.

The two above rules are expressed in terms of specific variables and values, but usually rules are more abstract. That is, the judgments they relate contain patterns. For example, here is a rule that says that for any variable Variable that yields the integer value Integer, adding '0' yields the same integer value:

    Variable => Integer
    -----------------------
    Variable + 0 => Integer

As in a judgment, each pattern in a particular inference rule must be instantiated to the same "object" within the entire rule. This means that one can talk about "the value of Variable" instead of the more precise "what Variable is instantiated to in (this particular instantiation of) the inference rule".

2.6.4 Environments

Logical inference rules use environments to record information computed during static type analysis or dynamic evaluation so that this information can be used by other logical inference rules. For example, the type signature of a user-defined function in a [expression/query] prolog can be recorded in an environment and used by subsequent rules. Similarly, the value assigned to a variable within a "let" statement can be captured in an environment and used for further evaluations.

An environment is a dictionary that maps a symbol (e.g., a function name or a variable name) to an "object" (e.g., a function body, a type, a value). One can either access existing information from an environment, or update the environment.

If "env" is an environment, then "env(symbol)" denotes the "object" to which symbol is mapped. The notation is intentionally akin to function application, as an environment can be seen as a "function" from the argument symbol to the "object" that the symbol is mapped to.

This specification uses environment groups that group related environments. If "env" is an environment group with the member "mem", then that environment is denoted "env.mem" and the value that it maps symbol to is denoted "env.mem(symbol)".

Updating is only defined on environment groups:

"env [ mem(symbol |-> object) ]" denotes a new environment group which is identical to env except that the mem environment has been updated to map symbol to object. The notation symbol |-> object indicates that symbol is mapped to object in the new environment.
If the "object" is a type then the following conventional variant notation is used: "env [ mem(symbol : object) ]".
The following shorthand is also allowed: "env [ mem( symbol₁ |-> object₁ ; ... ; symbol_n |-> object_n ) ]" in which each symbol is mapped to a corresponding object in the new environment.

This notation is equivalent to nested simple updates, as in " (...(env[mem(symbol₁ |-> object₁)]) ... [mem(symbol_n |-> object_n)])".

Note that updating the environment overrides any previous binding that might exist for the same name. Updating the environment is used to capture the scope of variables, namespaces, etc. Also, note that there are no operations to remove entries from environments: the need never arises because the environment group from "before" the update remains accessible concurrently with the updated version.
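The update notation behaves like a non-destructive dictionary update. The sketch below models an environment group as a Python dictionary of dictionaries; the function name `update` and the sample bindings are assumptions made for illustration. Each update returns a new group, so the group from "before" the update stays available.

```python
def update(env, mem, symbol, obj):
    """Return a new environment group identical to env except that member
    mem maps symbol to obj. The original env is left untouched."""
    new_member = dict(env.get(mem, {}))
    new_member[symbol] = obj          # overrides any previous binding
    new_env = dict(env)
    new_env[mem] = new_member
    return new_env

stat_env = {"varType": {"x": "xs:integer"}}

# statEnv [ varType(x : xs:string) ] overrides the earlier binding of x.
stat_env2 = update(stat_env, "varType", "x", "xs:string")

# Nested simple updates correspond to the multi-binding shorthand.
stat_env3 = update(stat_env2, "varType", "y", "xs:boolean")
```

After these updates, `stat_env` still maps x to xs:integer, which is why no removal operation is needed to model scoping.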

Environments are typically used as part of a judgment, to capture some of the context in which the judgment is computed. Indeed, most judgments are computed assuming that some environment is given. This assumption is denoted by prefixing the judgment with "env |-". The "|-" symbol is called a "turnstile" and is ubiquitous in inference rules.

For instance, the judgment

dynEnv |- Expr => Value

should be read: assuming the dynamic environment dynEnv, the expression Expr yields the value Value.

The two main environments used in the Formal Semantics are: a dynamic environment (dynEnv), which captures the [XPath/XQuery]'s dynamic context, and a static environment (statEnv), which captures the [XPath/XQuery]'s static context. Both are defined in [4.1 Expression Context].

2.6.5 Putting it together

Putting the above notations together, here is an example of an inference rule that occurs later in this document:

    statEnv |- Expr₁ : Type₁    statEnv |- Expr₂ : Type₂
    ----------------------------------------------------
    statEnv |- Expr₁ , Expr₂ : Type₁, Type₂

This rule is read as follows: if two expressions Expr₁ and Expr₂ have already been statically inferred to have types Type₁ and Type₂ (the two premises above the line), then it is the case that the expression below the line "Expr₁ , Expr₂" must have the type "Type₁, Type₂", which is the sequence of types Type₁ and Type₂.

The above inference rule does not modify the (static) environment. The following rule defines the static semantics of a "let/return" expression. The binding of the new variable is captured by an update to the varType component of the original static environment.

    statEnv |- Expr₁ : Type₁    statEnv [ varType(QName : Type₁) ] |- Expr₂ : Type₂
    -------------------------------------------------------------------------------
    statEnv |- let $QName := Expr₁ return Expr₂ : Type₂

This rule should be read as follows. First, the type Type₁ for the "let" input expression Expr₁ is computed. Second, the "let" variable is added into the varType member of the static environment group statEnv, with type Type₁. Finally, the type Type₂ of Expr₂ is computed in that new environment.
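The sequence rule and the "let" rule can be read as clauses of a recursive function. The following is a minimal sketch under stated assumptions: expressions are modeled as Python tuples, types as strings, and the clauses for literals and variables are simplifications introduced here rather than rules quoted from this document.

```python
def type_of(stat_env, expr):
    """Compute a type for expr by choosing the clause that matches its shape."""
    kind = expr[0]
    if kind == "int":                 # assumed rule: integer literals
        return "xs:integer"
    if kind == "str":                 # assumed rule: string literals
        return "xs:string"
    if kind == "var":                 # look the variable up in varType
        return stat_env["varType"][expr[1]]
    if kind == "seq":                 # Expr1 , Expr2 : Type1, Type2
        _, e1, e2 = expr
        return type_of(stat_env, e1) + ", " + type_of(stat_env, e2)
    if kind == "let":                 # let $QName := Expr1 return Expr2
        _, qname, e1, e2 = expr
        t1 = type_of(stat_env, e1)
        # statEnv [ varType(QName : Type1) ], built without mutating stat_env
        extended = {**stat_env, "varType": {**stat_env["varType"], qname: t1}}
        return type_of(extended, e2)
    raise TypeError("no applicable rule for " + repr(expr))

stat_env = {"varType": {}}
# let $v := 3 return ($v, "a")
expr = ("let", "v", ("int", 3), ("seq", ("var", "v"), ("str", "a")))
result = type_of(stat_env, expr)
```

The original `stat_env` is unchanged after the call, which mirrors the fact that the binding of $v is visible only while typing the body of the "let".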

Ed. Note: Jonathan suggests that we should explain 'chain' inference rules. I.e., how several inference rules are applied recursively.

2.7 The Formal Semantics

Finally, this section introduces the top-level notations defining the processing model phases described above. These notations are used to define the normalization, static type analysis, and dynamic evaluation processing phases.

2.7.1 Normalization

Normalization is specified using mapping rules which describe how an [XPath/XQuery] expression is rewritten into another expression in the [XPath/XQuery] Core. Note that mapping rules are also used in [8 Importing Schemas] to specify how XML Schemas are imported into the [XPath/XQuery] Type System.

Notation

Mapping rules are written using a square bracket notation, as follows:

    [Object]_Subscript
    ==
    Mapped Object

The original "object" is written above the == sign. The rewritten "object" is written beneath the == sign. The subscript is used to indicate what kind of "object" is mapped, and sometimes to pass some information between mapping rules.

(Since normalization is always done in the static context, the above is really a shorthand for

statEnv |- [Object]_Subscript == Mapped Object

but we shall stay with the shorthand because statEnv will always be implied.)

The static environment is used in certain cases (e.g. for normalization of function calls) during normalization. To keep the notation simpler, the static environment is not written in the normalization rules, but it is assumed to be available.

Ed. Note: [Kristoffer/XSL] We should decide whether to use a shorthand notation as suggested or modify the mapping rules throughout.

Specifically, the normalization rule that is used to map "top-level" expressions in the [XPath/XQuery] syntax into expressions in the [XPath/XQuery] Core is

    [Expr]_Expr
    ==
    CoreExpr

which indicates that the expression Expr is normalized to the expression CoreExpr in the [XPath/XQuery] core (with the implied statEnv).

Example

For instance, the following [expression/query]

    for $i in (1, 2),
        $j in (3, 4)
    return
      element pair { ($i,$j) }

is normalized to the core expression

    for $i in (1, 2) return
      for $j in (3, 4) return
        element pair { ($i,$j) }

in which the complex "FLWR" expression is mapped into a composition of two simpler "for" expressions.
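The rewriting in this example can be sketched as a fold over the list of variable bindings. This is an illustration of the idea only, not the mapping rule given later in this document; the tuple representation and the helper names `normalize_for` and `show` are assumptions.

```python
def normalize_for(bindings, body):
    """Rewrite a 'for' with several (variable, input) bindings into nested
    single-binding 'for' expressions, the first binding outermost."""
    result = body
    for var, source in reversed(bindings):
        result = ("for", var, source, result)
    return result

def show(expr, indent=0):
    """Render the nested tuples in a query-like layout."""
    if isinstance(expr, tuple) and expr[0] == "for":
        _, var, source, body = expr
        head = " " * indent + f"for ${var} in {source} return\n"
        return head + show(body, indent + 2)
    return " " * indent + expr

core = normalize_for([("i", "(1, 2)"), ("j", "(3, 4)")],
                     "element pair { ($i,$j) }")
text = show(core)
```

Iterating over the bindings in reverse makes the last binding the innermost "for", so the nesting order matches the order in which the variables were declared.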

2.7.2 Static type inference

The static semantics is specified using type inference rules, which relate [XPath/XQuery] expressions to types and specify under what conditions an expression is well typed.

Notation

The judgment

statEnv |- Expr : Type

holds when, assuming the static environment statEnv is given, the expression Expr has type Type.

Example

The result of static type inference is to associate a static type with every [expression/query], such that any evaluation of that [expression/query] is guaranteed to yield a value that belongs to that type.

For instance, the following expression

   let $v := 3
   return
       $v+5

has type xs:integer. This can be inferred as follows: the input literals '3' and '5' have type integer, so the variable $v also has type integer. Since the sum of two integers is an integer, the complete expression has type integer.

Note

The type of an expression is computed by inference. Static type inference rules are given to describe, for each kind of expression, how to compute the type of the expression given the types of its sub-expressions. Here is a simple example of such a rule:

    statEnv |- Expr₁ : xs:boolean    statEnv |- Expr₂ : Type₂    statEnv |- Expr₃ : Type₃
    -------------------------------------------------------------------------------------
    statEnv |- if Expr₁ then Expr₂ else Expr₃ : ( Type₂ | Type₃ )

This rule states that if the conditional expression of an "if" expression has type boolean, then the type of the entire expression is one of the two types of its "then" and "else" clauses. Note that the resulting type is represented as a choice: '(Type₂|Type₃)'.

The "left half" of the expression below the line (the part before the :) corresponds to some phrase in a [expression/query], for which a type is computed. If the [expression/query] has been parsed into an internal parse tree, this usually corresponds to some node in that tree. The expression usually has patterns in it (here Expr₁, Expr₂, and Expr₃) that need to be matched against the children of the node in the parse tree. The expressions above the line indicate things that need to be computed to use this rule; in this case, the types of the three sub-expressions of the "if" expression. Once those types are computed (by further applying static inference rules recursively to each sub-expression), the type of the expression below the line can be computed.

This illustrates a general feature of the type system: the type of an expression depends only on the types of its sub-expressions. The overall static type inference algorithm is recursive, following the parse structure of the [expression/query]. At each point in the recursion, an appropriate matching inference rule is sought; if at any point there is no applicable rule, then static type inference has failed and the [expression/query] is not type-safe.
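The recursive reading of the "if" rule, including failure when no rule applies, can be sketched as follows. The tuple encoding of expressions and the literal clauses are assumptions made for this illustration; only the "if" clause mirrors the rule shown above.

```python
def infer(stat_env, expr):
    """Return the static type of expr, or raise TypeError if no rule applies."""
    kind = expr[0]
    if kind == "bool":                # assumed rule for boolean literals
        return "xs:boolean"
    if kind == "int":                 # assumed rule for integer literals
        return "xs:integer"
    if kind == "str":                 # assumed rule for string literals
        return "xs:string"
    if kind == "if":
        _, cond, then_expr, else_expr = expr
        # First premise: the condition must have type xs:boolean.
        if infer(stat_env, cond) != "xs:boolean":
            raise TypeError("no applicable rule: condition is not xs:boolean")
        type2 = infer(stat_env, then_expr)   # second premise
        type3 = infer(stat_env, else_expr)   # third premise
        return f"({type2} | {type3})"        # conclusion: a choice type
    raise TypeError("no applicable rule for " + repr(expr))

well_typed = infer({}, ("if", ("bool", True), ("int", 1), ("str", "a")))
```

An expression such as `("if", ("int", 1), ("int", 2), ("int", 3))` makes the first premise fail, so the sketch raises `TypeError`, which corresponds to static type inference failing.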

2.7.3 Dynamic Evaluation

The dynamic, or operational, semantics is specified using value inference rules, which relate [XPath/XQuery] expressions to values, and in some cases specify the order in which an [XPath/XQuery] expression is evaluated.

Notation

The judgment

statEnv ; dynEnv |- Expr => Value

holds when, assuming the static environment statEnv and dynamic environment dynEnv are given, the expression Expr yields the value Value.

The static environment is used in certain cases (e.g. for type matching) during evaluation. To keep the notation simpler, the static environment is not written in the dynamic inference rules, but it is assumed to be available.

Example

For instance, the following expression

   let $v := 3
   return
       $v+5

yields the integer value 8. This can be inferred as follows: the input literals '3' and '5' denote the values 3 and 5, respectively, so the variable $v has the value 3. Since the sum of 3 and 5 is 8, the complete expression has the value 8.

Note

As with static type inference, logical inference rules are used to determine the value of each expression, given the dynamic environment and the values of its sub-expressions. [XPath/XQuery]'s dynamic semantics is modeled on the dynamic semantics presented in [Milner].

The inference rules used for dynamic evaluation, like those for static type inference, follow a top-down recursive structure, computing the value of expressions from the values of their sub-expressions.
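The evaluation of the example expression can be traced with a small recursive evaluator. This is a sketch under assumptions: expressions are modeled as tuples, only integer literals, variables, addition, and "let" are covered, and the member name `varValue` for variable bindings in the dynamic environment is assumed here.

```python
def evaluate(dyn_env, expr):
    """Compute the value of expr from the values of its sub-expressions."""
    kind = expr[0]
    if kind == "int":
        return expr[1]
    if kind == "var":
        return dyn_env["varValue"][expr[1]]
    if kind == "add":
        _, e1, e2 = expr
        return evaluate(dyn_env, e1) + evaluate(dyn_env, e2)
    if kind == "let":
        _, name, e1, e2 = expr
        value1 = evaluate(dyn_env, e1)
        # Extend the dynamic environment for the body only.
        extended = {**dyn_env, "varValue": {**dyn_env["varValue"], name: value1}}
        return evaluate(extended, e2)
    raise ValueError("no applicable rule for " + repr(expr))

dyn_env = {"varValue": {}}
# let $v := 3 return $v + 5
value = evaluate(dyn_env, ("let", "v", ("int", 3), ("add", ("var", "v"), ("int", 5))))
```

With these definitions `value` is 8, matching the example above.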

3 The XQuery Type System

Ed. Note: Status: This section has been extensively revised to improve the alignment between the [XPath/XQuery] type system and XML Schema. It now incorporates "named typing". Feedback on the new design is solicited.

The [XPath/XQuery] type system is used for the specification of both the dynamic and the static semantics of [XPath/XQuery]. It is used to describe the semantics of expressions on sequence types and in type conversion rules. It is also the basis for the static semantics of [XPath/XQuery].

This section defines formal values and types, and four main operations on types. Three operations (match, erase, and annotate) are used for the dynamic semantics. The last operation (subtyping) is used for the static semantics.

The "match" operation takes as input a value and a type and either succeeds or fails. Type matching checks that the value given as input verifies the name and structural constraints given by the type. It is used in matching parameters against function signatures, and matching values against cases in "typeswitch". An informal description of type matching is given in [2.4.2 SequenceType] in [XQuery 1.0: A Query Language for XML].

The match operation is one of the most important operations on types. It is the basis for the semantics of typeswitch, treat and assert as, and is used in function calls to check dynamically that the parameter(s) of a function satisfy the function signature.
The "erase" operation takes a value and removes all type information from it.
The "annotate" operation takes an untyped value and a type and either succeeds or fails. Type annotation models part of the semantics of XML Schema validation. From the untyped value, it verifies the constraints described by the type, and annotates the value with type names, and adds atomic values. If it succeeds it returns a new value.

The "erase" and "annotate" operations model aspects of the validate as in [XPath/XQuery]. Note that the validate operation in [XPath/XQuery] needs to operate on possibly already typed data model instances, hence the need to erase type information before applying standard XML Schema validation.
The "subtyping" operation takes two types and succeeds if the first type is a subtype of the second, that is, if every value matching the first type also matches the second.

3.1 Values and Types

This section introduces formal notations for describing XQuery types and values. Those notations are used extensively throughout this document to represent values and types in inference rules. We use formal notations for values and types because they are more compact, and are easier to manipulate within the inference rules. The formal notations for values and types introduced here are not exposed to the XQuery user.

3.1.1 Values

The following grammar introduces formal notations for values. This notation is used to formally represent values in the [XQuery 1.0 and XPath 2.0 Data Model].

A value is a sequence of zero or more items. An item is either a node or an atomic value. A node is either an element, an attribute, a document node, or a text node. Text nodes always contain string values of type xs:anySimpleType. Elements, attributes, and atomic values have a type annotation, which is the QName of a type.

Elements may be annotated with any type, and attributes may be annotated with any simple type.

An atomic value encapsulates an XML Schema atomic type (which is its type annotation) and a corresponding value of that type. An XML Schema atomic type [XML Schema Part 2] may be primitive or derived, or xs:anySimpleType. The corresponding atomic type value may be a value in the value space of one of the 19 primitive XML Schema types.

Type annotations are always present. Untyped elements or attributes (e.g., from well-formed documents before validation has been performed) are annotated with xs:anyType and attributes and atomic values with no other type information are annotated with xs:anySimpleType.

A value is called a simple value if it consists of a sequence of zero or more atomic values.

[1 (Formal)]	Value	::=	Item | (Value "," Value) | ("(" ")")
[2 (Formal)]	SimpleValue	::=	AtomicValue | (SimpleValue "," SimpleValue) | ("(" ")")
[3 (Formal)]	ElementValue	::=	"element" ElementName TypeAnnotation "{" Value "}"
[4 (Formal)]	AttributeValue	::=	"attribute" AttributeName TypeAnnotation "{" SimpleValue "}"
[5 (Formal)]	DocumentValue	::=	"document" "{" Value "}"
[6 (Formal)]	TextValue	::=	"text" "{" String "}"
[7 (Formal)]	NodeValue	::=	ElementValue | AttributeValue | DocumentValue | TextValue
[8 (Formal)]	Item	::=	NodeValue | AtomicValue
[9 (Formal)]	AtomicValue	::=	AtomicTypeValue TypeAnnotation
[10 (Formal)]	AtomicTypeValue	::=	String | Boolean | Decimal | Float | Double | Duration | DateTime | Time | Date | GYearMonth | GYear | GMonthDay | GDay | GMonth | HexBinary | Base64Binary | AnyURI | QName | NOTATION
[11 (Formal)]	TypeAnnotation	::=	<"of" "type"> TypeName
[12 (Formal)]	ElementName	::=	QName
[13 (Formal)]	AttributeName	::=	QName
[14 (Formal)]	TypeName	::=	QName

Notation

In the above grammar, the atomic type names are used to represent the value spaces of the corresponding XML Schema atomic types. For instance, "String" indicates the value space of xs:string, and "Decimal" indicates the value space of xs:decimal.

Note that the same rules about constructing sequences apply to the values described by this grammar. Notably, sequences cannot be nested. For instance, the sequence (10, (1, 2), (), (3, 4)) is equivalent to the sequence (10, 1, 2, 3, 4).
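
The flattening rule can be illustrated with a short Python sketch. The representation is an assumption made only for this illustration: nested Python tuples stand in for parenthesized sequences, and `flatten` is a hypothetical helper, not part of the formal semantics.

```python
def flatten(value):
    """Flatten nested tuples into one flat list of items.

    Tuples stand in for parenthesized sequences (an assumption of this
    sketch); anything that is not a tuple is treated as a single item.
    """
    items = []
    for part in value:
        if isinstance(part, tuple):
            items.extend(flatten(part))  # nested sequence: splice its items in
        else:
            items.append(part)
    return items


print(flatten((10, (1, 2), (), (3, 4))))  # [10, 1, 2, 3, 4]
```

The empty sequence contributes no items, which is why `()` disappears from the result.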

Ed. Note: Issue: Although the XQuery data model describes an error value, this error value is not represented formally here. Formal treatment of errors is deferred to a future version of this document.

Ed. Note: Issue: The formal semantics does not represent comments and processing instructions (See [Issue-0143: Support for PI, comment and namespace nodes]).

Ed. Note: Issue: This formal notation for values only represents either text nodes or values. (See [Issue-0144: Representation of text nodes in formal values])

Example

Example (1):

  <fact>The cat weighs <weight units="lbs">12</weight> pounds.</fact>

In the absence of a Schema, this document is represented as

  element fact of type xs:anyType {
    text { "The cat weighs " },
    element weight of type xs:anyType {
      attribute units of type xs:anySimpleType {
        "lbs" of type xs:anySimpleType
      },
      text { "12" }
    },
    text { " pounds." }
  }

Example (2):

  <weight xsi:type="xs:integer">42</weight>

The formal model can represent values before and after validation. Before validation, this element is represented as:

  element weight of type xs:anyType {
    attribute xsi:type of type xs:anySimpleType {
      "xs:integer" of type xs:anySimpleType
    },
    text { "42" }
  }

After validation, this element is represented as:

  element weight of type xs:integer {
    attribute xsi:type of type xs:QName {
      "xs:integer" of type xs:QName
    },
    42 of type xs:integer
  }

Note that the typing rules must permit attributes with name xsi:type and xsi:nil to appear on any element.

Example (3):

  <sizes>1 2 3</sizes>

Before validation, this element is represented as:

  element sizes of type xs:anyType {
    text { "1 2 3" }
  }

Assume the following Schema.

  <xs:element name="sizes" type="sizesType"/>
  <xs:simpleType name="sizesType">
    <xs:list itemType="sizeType"/>
  </xs:simpleType>
  <xs:simpleType name="sizeType">
    <xs:restriction base="xs:integer"/>
  </xs:simpleType>

After validation against this Schema, the element is represented as:

  element sizes of type sizesType {
    1 of type sizeType,
    2 of type sizeType,
    3 of type sizeType
  }

Example (4): Example with an anonymous type.

  <sizes>1 2 3</sizes>

Before validation, this element is represented as:

  element sizes of type xs:anyType {
    text { "1 2 3" }
  }

Assume the following Schema.

  <xs:element name="sizes">
    <xs:simpleType>
      <xs:list itemType="xs:integer"/>
    </xs:simpleType>
  </xs:element>

After validation against this Schema, the element is represented as:

  element sizes of type xs:anySimpleType {
    1 of type xs:integer,
    2 of type xs:integer,
    3 of type xs:integer
  }

Example (5): Example with a union type.

  <sizes>1 two 3 four</sizes>

Before validation, this element is represented as:

  element sizes of type xs:anyType {
    text { "1 two 3 four" }
  }

Assume the following Schema:

  <xs:element name="sizes" type="sizesType"/>
  <xs:simpleType name="sizesType">
    <xs:list itemType="sizeType"/>
  </xs:simpleType>
  <xs:simpleType name="sizeType">
    <xs:union memberTypes="xs:integer xs:string"/>
  </xs:simpleType>

After validation against this Schema, the element is represented as:

  element sizes of type sizesType {
    1 of type xs:integer,
    "two" of type xs:string,
    3 of type xs:integer,
    "four" of type xs:string
  }

3.1.2 Types

As they are in DTDs, types in [XPath/XQuery] are based on regular expressions. A type is composed from item types by optional, one or more, zero or more, all group, sequence, choice, empty sequence, or empty choice (written none).

The type none matches no values. It is called the empty choice because it is the identity for choice, that is, (Type | none) = Type. The type none appears in the definition of [3.6.1 Prime types], and is the type for [2.5.3 The error function].

[15 (Formal)]	Type	::=	ItemType \| (Type Occurrence) \| (Type "&" Type) \| (Type "," Type) \| (Type "\|" Type) \| ("(" ")") \| "none"
[16 (Formal)]	Occurrence	::=	"*" \| "+" \| "?"

An item type is an element or attribute type, a document node type, a text node type, the untyped type (i.e., the type of untyped text values), or an atomic type.

[17 (Formal)]	ItemType	::=	ElementType \| AttributeType \| ("document" ("{" Type? "}")?) \| "text" \| "untyped" \| AtomicTypeName
[18 (Formal)]	AtomicTypeName	::=	QName

An element or attribute type gives an optional name and an optional type specifier. A name with a type specifier is a local declaration. A type specifier alone is a local declaration that matches any name. A name alone refers to a global declaration. The word "element" or "attribute" alone refers to any element or any attribute.

[19 (Formal)]	ElementType	::=	"element" ElementName? Nillable? TypeSpecifier?
[20 (Formal)]	AttributeType	::=	"attribute" AttributeName? TypeSpecifier?
[21 (Formal)]	Nillable	::=	"nillable"

A type specifier either references a global type or specifies an anonymous type. An anonymous type is described by naming the base type it derives from and giving the type, optionally with a flag indicating mixed content.

[22 (Formal)]	TypeSpecifier	::=	TypeDerivation \| TypeReference
[23 (Formal)]	TypeDerivation	::=	Derivation? Mixed? "{" Type? "}"
[24 (Formal)]	TypeReference	::=	<"of" "type"> TypeName
[25 (Formal)]	Derivation	::=	("restricts" TypeName) \| ("extends" TypeName)
[26 (Formal)]	Mixed	::=	"mixed"

Example

Example (1).

  element

matches any element.

Example (2).

  element of type xs:integer

matches any element of type xs:integer, such as:

  element size of type xs:integer {
    2 of type xs:integer
  }

Example (3).

  element restricts xs:anyType { xs:integer* }

matches any element of any type that contains a sequence of integers, such as:

  element numbers of type xs:integer {
    1 of type xs:integer,
    2 of type xs:integer,
    3 of type xs:integer
  }

Example (4).

  element sizes

refers to the global declaration for element "sizes".

Example (5).

  element trouble of type xs:anyType { ( xs:float | xs:string )* }

describes an element with a simple type derived by union. This type matches the following instance:

  validate as trouble { <trouble>this is not 1 string</trouble> }
=>
  element trouble of type xs:anyType {"this", "is", "not", 1, "string" }

Empty content can be indicated with the explicit empty sequence, or omitted, as in:

  element bib { () }
  element bib { }

If the content model is omitted, then the element (resp. attribute or type) name must be a proper QName, and refers to the global element (resp. attribute or type) with that name, which must be defined (by importation from a schema or through a type definition expression).

The type system includes three operators on types: ",", "|" and "&", corresponding respectively to sequence, choice and all groups in Schema.

The "," operator builds the "sequence" of two types. For example,

  element title of type xs:string, element year of type xs:integer

means a sequence of an element title of type string followed by an element year of type integer.

The "|" operator builds a "choice" between two types. For example,

  element editor of type xs:string | element bib:author*

means either an element editor of type string, or a sequence of bib:author elements.

The "&" operator builds the "interleaved product" of two types. The type t₁ & t₂ matches any sequence that is an interleaving of a sequence that matches t₁ and a sequence that matches t₂. For example,

  (element a , element b) & xs:string   =
    element a, element b, xs:string
  | element a, xs:string, element b
  | xs:string, element a, element b

In each case, the order of element a and element b is unchanged, but the xs:string can be interleaved in any position.

The interleaved product is used to represent all groups in XML Schema. The xs:all construct in XML Schema may only consist of global or local element declarations with lower bound 0 or 1, and upper bound 1.

3.1.3 Top level definitions

At the top level, one can define elements, attributes, and types.

[27 (Formal)]	Definition	::=	(<"define" "element"> ElementName Substitution? Nillable? TypeSpecifier) \| (<"define" "attribute"> AttributeName TypeSpecifier) \| (<"define" "type"> TypeName TypeDerivation)
[28 (Formal)]	Substitution	::=	<"substitutes" "for"> ElementName

Global element and attribute declarations, like local element and attribute declarations, contain a type specifier. In addition, a global element declaration may declare the substitution group for the element and whether the element is nillable. A global type declaration specifies both the derivation and the declared type.

Ed. Note: Issue: The formal semantics provides support for substitution groups. Support for substitution groups is still an open issue in the [XPath/XQuery] document. (See [Issue-0144: Representation of text nodes in formal values])

3.1.4 Built-in type declarations

The following type definitions capture the primitive types from Schema.

  define type xs:string restricts xs:anySimpleType { xs:string }
  define type xs:decimal restricts xs:anySimpleType { xs:decimal }
  ...

In the above definitions, each name (such as xs:string) appears twice. The first appearance is used for named typing (xs:string is derived by restriction from xs:anySimpleType), and the second for structural typing (the value space of xs:string is given by xs:string -- the apparent circularity here is broken in the rule for matching in [3.3 Matching], which defines matching against xs:string directly).

The following type definition captures the Ur simple type from Schema.

  define type xs:anySimpleType restricts xs:anyType {
    ( xs:string | xs:decimal | ... )*
  }

The name of the Ur simple type is xs:anySimpleType, and its structural definition is given by the union of all simple types.

Note that there is no separate "value space" for the type xs:anySimpleType; its value space is the same as that of xs:string. The following example shows the distinction between a value of type string containing "Databases" and an untyped value containing "Databases": both use string values as content, with different type names.

  "Databases" of type xs:string
  "Databases" of type xs:anySimpleType

The following type definition captures the Ur type from Schema.

  define type xs:anyType restricts xs:anyType {
    attribute*, 
      ( xs:anySimpleType |
        ( element* & text* ) )
  }

Schema defines four built-in attributes that can appear on any element in the document without being explicitly declared in the schema. Those four attributes need to be added inside content models when doing matching. The four built-in attributes of Schema are declared as follows.

  define attribute xsi:type of type xs:QName
  define attribute xsi:nil of type xs:boolean
  define attribute xsi:schemaLocation of type xs:anySimpleType {
    xs:anyURI*
  }
  define attribute xsi:noNamespaceSchemaLocation of type xs:anyURI

For convenience, a type that is an all group of the four built-in XML Schema attributes is defined.

  BuiltInAttributes =
      attribute xsi:type ?
    & attribute xsi:nil ?
    & attribute xsi:schemaLocation ?
    & attribute xsi:noNamespaceSchemaLocation ?

3.1.5 Syntactic constraints on types

The type system given in [3.1.2 Types] gives a simple but overly general definition of types. For instance, the above type grammar allows types to describe sequences of attributes and elements in any order. For example, the following type, describing a sequence of zero or more elements or attributes with name "year", is allowed.

  (element year of type string | attribute year of type string)*

There are two reasons for this approach. First, it makes the grammar for types simpler. Second, it allows describing types for heterogeneous sequences, which can result from [XPath/XQuery] expressions but cannot be represented in XML Schema. For example, the above type is inferred for the following query.

  for $b in //book return
    if ($b/publisher = "Springer") then
      element year { $b/year/text() }
    else
      attribute year { $b/year/text() }

When constructing elements or attributes, some additional constraints on types need to be imposed, for which this grammar is too general. This section gives auxiliary grammar productions which impose additional constraints on the content of elements and attributes.

First, a grammar for simple types is defined. This grammar is a subset of the grammar for types. A simple type is composed from atomic types by optional, one or more, zero or more, choice, or empty choice.

[29 (Formal)]	SimpleType	::=	AtomicTypeName \| (SimpleType Occurrence) \| (SimpleType "\|" SimpleType) \| "none"

An all group contains only attributes or only elements; attributes or elements in an all group may be optional but not repeated; attributes always precede other content of an element.

[30 (Formal)]	AttributeAll	::=	AttributeType \| (AttributeType "?") \| (AttributeAll "&" AttributeAll) \| ("(" ")")
[31 (Formal)]	ElementAll	::=	ElementType \| (ElementType "?") \| (ElementAll "&" ElementAll) \| ("(" ")") \| "none"
[32 (Formal)]	ElementNoAll	::=	ElementType \| "text" \| (ElementType Occurrence) \| (ElementAll "," ElementAll) \| (ElementAll "\|" ElementAll) \| ("(" ")")
[130 (Formal)]	ElementContent	::=	"ElementContent"
[33 (Formal)]	ElementBody	::=	AttributeAll "ElementContent"

The type in an element type should always match the grammar for ElementBody given above. (Note that if the type is omitted altogether, as in element a { }, the type implicitly matches the empty sequence.)

These auxiliary productions are used when matching the type content of elements or attributes within inference rules.

3.1.6 Example

Here is a schema describing purchase orders, taken from the XML Schema Primer, followed by its mapping into the [XPath/XQuery] type system. The complete mapping from XML Schema into the [XPath/XQuery] type system is given in [8 Importing Schemas].

  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  
   <xsd:annotation>
    <xsd:documentation xml:lang="en">
     Purchase order schema for Example.com.
     Copyright 2000 Example.com. All rights reserved.
    </xsd:documentation>
   </xsd:annotation>
  
   <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
  
   <xsd:element name="comment" type="xsd:string"/>
  
   <xsd:complexType name="PurchaseOrderType">
    <xsd:sequence>
     <xsd:element name="shipTo" type="USAddress"/>
     <xsd:element name="billTo" type="USAddress"/>
     <xsd:element ref="comment" minOccurs="0"/>
     <xsd:element name="items"  type="Items"/>
    </xsd:sequence>
    <xsd:attribute name="orderDate" type="xsd:date"/>
   </xsd:complexType>
  
   <xsd:complexType name="USAddress">
    <xsd:sequence>
     <xsd:element name="name"   type="xsd:string"/>
     <xsd:element name="street" type="xsd:string"/>
     <xsd:element name="city"   type="xsd:string"/>
     <xsd:element name="state"  type="xsd:string"/>
     <xsd:element name="zip"    type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:NMTOKEN"
  	fixed="US"/>
   </xsd:complexType>
  
   <xsd:complexType name="Items">
    <xsd:sequence>
     <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
      <xsd:complexType>
  	<xsd:sequence>
  	 <xsd:element name="productName" type="xsd:string"/>
  	 <xsd:element name="quantity">
  	  <xsd:simpleType>
  	   <xsd:restriction base="xsd:positiveInteger">
  	    <xsd:maxExclusive value="100"/>
  	   </xsd:restriction>
  	  </xsd:simpleType>
  	 </xsd:element>
  	 <xsd:element name="USPrice"  type="xsd:decimal"/>
  	 <xsd:element ref="comment"   minOccurs="0"/>
  	 <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
  	</xsd:sequence>
  	<xsd:attribute name="partNum" type="SKU" use="required"/>
      </xsd:complexType>
     </xsd:element>
    </xsd:sequence>
   </xsd:complexType>
  
   <!-- Stock Keeping Unit, a code for identifying products -->
   <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
     <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
   </xsd:simpleType>
  
  </xsd:schema>

  namespace xsd = "http://www.w3.org/2001/XMLSchema"

  define element purchaseOrder of type PurchaseOrderType
 
  define element comment of type xsd:string
  
  define type PurchaseOrderType {
    attribute orderDate of type xsd:date?,
    element shipTo of type USAddress,
    element billTo of type USAddress,
    element comment?,
    element items of type Items
  }

  define type USAddress {
    attribute country of type xsd:NMTOKEN?,
    element name of type xsd:string,
    element street of type xsd:string,
    element city of type xsd:string,
    element state of type xsd:string,
    element zip of type xsd:decimal
  }

  define type Items {
    element item {
      attribute partNum of type SKU,
      element productName of type xsd:string,
      element quantity restricts xsd:positiveInteger { xsd:positiveInteger },
      element USPrice of type xsd:decimal,
      element comment?,
      element shipDate of type xsd:date?
    }*
  }

  define type SKU restricts xsd:string { xsd:string }

3.2 Auxiliary judgments

This section defines a number of auxiliary judgments that specify part of the semantics of Schema. These judgments are used in the rest of the section, but are not used directly from other parts of the formal semantics.

3.2.1 Derivation and substitution

The following judgments capture the relationships between names that are obtained from derivation and substitution groups in a given XML Schema.

3.2.1.1 Derives

Notation

The judgment

TypeName derives from TypeName

holds when the first type name derives from the second type name.

Note

Derivation is a partial order. It is reflexive and transitive by the definition below. It is antisymmetric because no cycles are allowed in derivation by restriction or extension.

Semantics

This judgment is specified by the following rules.

Some rules have hypotheses that simply list a type, element, or attribute declaration.

Every type name derives from itself.

statEnv |- TypeName derives from TypeName

Every type name derives from the type it is declared to derive from by restriction or extension.

statEnv.typeDefn(TypeName) => define type TypeName extends BaseTypeName Mixed? { Type? }

statEnv |- TypeName derives from BaseTypeName

statEnv.typeDefn(TypeName) => define type TypeName restricts BaseTypeName Mixed? { Type? }

statEnv |- TypeName derives from BaseTypeName

Derivation is transitive.

statEnv |- TypeName₁ derives from TypeName₂

statEnv |- TypeName₂ derives from TypeName₃

statEnv |- TypeName₁ derives from TypeName₃
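
The three rules above (reflexivity, one declared step, transitivity) amount to walking the chain of declared base types. The following Python sketch illustrates this under stated assumptions: the `BASE_TYPE` table is a hypothetical stand-in for statEnv.typeDefn that records only the declared base type name, and `derives_from` is an illustrative helper rather than a definition taken from this document.

```python
# Hypothetical stand-in for statEnv.typeDefn: type name -> declared base type
# name (the name after "restricts" or "extends" in the type definition).
BASE_TYPE = {
    "sizeType": "xs:integer",
    "xs:integer": "xs:decimal",
    "xs:decimal": "xs:anySimpleType",
    "xs:anySimpleType": "xs:anyType",
}


def derives_from(type_name, ancestor, base_type=BASE_TYPE):
    """Return True if type_name derives from ancestor.

    Reflexive (a name derives from itself) and transitive (the chain of
    declared base types is followed until it ends).
    """
    current = type_name
    seen = set()
    while current is not None and current not in seen:
        if current == ancestor:
            return True
        seen.add(current)  # guard against a malformed, cyclic table
        current = base_type.get(current)
    return False


print(derives_from("sizeType", "xs:decimal"))    # True
print(derives_from("xs:decimal", "xs:integer"))  # False
```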

3.2.1.2 Substitutes

Notation

The judgment

ElementName substitutes for ElementName

holds when the first element name substitutes for the second element name.

Note

Substitution is a partial order. It is reflexive and transitive by the definition below. It is antisymmetric because no cycles are allowed in substitution groups.

Semantics

This judgment is specified by the following rules.

Every element name substitutes for itself.

statEnv |- ElementName substitutes for ElementName

Every element name substitutes for the element it is declared to substitute for.

statEnv.elemDecl(ElementName) => define element ElementName substitutes for BaseElementName Nillable? TypeSpecifier

statEnv |- ElementName substitutes for BaseElementName

Substitution is transitive.

statEnv |- ElementName₁ substitutes for ElementName₂

statEnv |- ElementName₂ substitutes for ElementName₃

statEnv |- ElementName₁ substitutes for ElementName₃

3.2.2 Extension

Notation

The judgment

Type₁ extended by Type₂ is Type

holds when the result of extending Type₁ by Type₂ is Type.

Semantics

This judgment is specified by the following rules.

statEnv |- Type₁ = AttributeAll₁ , ElementContent₁

statEnv |- Type₂ = AttributeAll₂ , ElementContent₂

statEnv |- Type₁ extended by Type₂ is (AttributeAll₁ & AttributeAll₂) , ElementContent₁ , ElementContent₂

3.2.3 Mixed content

Notation

The judgment

Type₁ mixes to Type₂

holds when the result of creating a mixed content from Type₁ is Type₂.

Semantics

This judgment is specified by the following rules.

statEnv |- Type = AttributeAll , ElementContent

statEnv |- Type mixes to AttributeAll , ( ElementContent & text* )

3.2.4 Adjusts

An element may optionally include the four built-in attributes xsi:type, xsi:nil, xsi:schemaLocation, or xsi:noNamespaceSchemaLocation.

Notation

The judgment

Mixed? Type₁ adjusts to Type₂

holds when the second type is the same as the first, with the four built-in attributes added, and with text nodes added if the type is mixed.

Semantics

This judgment is specified by the following rules.

If the type is flagged as mixed, then mix the type and extend it by the built-in attributes.

statEnv |- Type mixes to Type'

statEnv |- Type' extended by BuiltInAttributes is Type''

statEnv |- mixed Type adjusts to Type''

Otherwise, just extend the type by the built-in attributes.

statEnv |- Type extended by BuiltInAttributes is Type'

statEnv |- Type adjusts to Type'

The definition of BuiltInAttributes appears in [3.1.4 Built-in type declarations].
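
The way the extension, mixed content, and adjusts judgments compose can be sketched in Python. This is only an illustration under simplifying assumptions: a type is modelled as a pair of strings (its AttributeAll part and its ElementContent part), BuiltInAttributes is kept as an opaque name with empty element content, and the function names are hypothetical.

```python
# Opaque name for the all group of built-in attributes (3.1.4).
BUILT_IN_ATTRIBUTES = "BuiltInAttributes"


def extended_by(type1, type2):
    """Type1 extended by Type2: attributes are interleaved, content is sequenced."""
    attrs1, content1 = type1
    attrs2, content2 = type2
    return (f"({attrs1} & {attrs2})", f"{content1} , {content2}")


def mixes_to(type_):
    """Type mixes to: text nodes may be interleaved with the element content."""
    attrs, content = type_
    return (attrs, f"( {content} & text* )")


def adjusts_to(type_, mixed=False):
    """Mixed? Type adjusts to: mix first if flagged, then add built-in attributes."""
    if mixed:
        type_ = mixes_to(type_)
    # The built-in attributes contribute no element content, written () here.
    return extended_by(type_, (BUILT_IN_ATTRIBUTES, "()"))


print(adjusts_to(("attribute a?", "element b"), mixed=True))
```

The trailing `, ()` in the content part is harmless, since the empty sequence is the identity for sequence composition.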

3.2.5 Resolution

Notation

The judgment

TypeSpecifier resolves to TypeName { Type }

holds when the type specifier resolves to the given type name and type. An element matches the type specifier if the element has a type annotation that derives from the type name and content that matches the type.

Semantics

This judgment is specified by the following rules.

If the type is omitted, it is resolved as the empty sequence type.

statEnv |- Derivation? Mixed? { () } resolves to TypeName { Type }

statEnv |- Derivation? Mixed? { } resolves to TypeName { Type }

If the type specifier names a global type, then the type name is the name of that type, and the type is taken by resolving the type specifier of the global type.

statEnv.typeDefn(TypeName) => define type TypeName TypeDerivation

statEnv |- TypeDerivation resolves to BaseTypeName { Type }

statEnv |- of type TypeName resolves to TypeName { Type }

In the above inference rule, note that BaseTypeName is the base type of the type referred to. It is therefore the original type name, TypeName, which must be returned and eventually used to annotate the corresponding element or attribute. However, the type specifier needs to be obtained through a second application of the judgment.

If the type specifier is a restriction, then the type name is the name of the base type, and the type is taken from the type specifier.

statEnv |- Mixed? Type adjusts to AdjustedType

statEnv |- restricts TypeName Mixed? { Type } resolves to TypeName { AdjustedType }

If the type specifier is an extension, then the type name is the name of the base type, and the type is the base type extended by the type in the type specifier.

statEnv.typeDefn(TypeName) => define type TypeName Derivation? BaseMixed? { BaseType? }

statEnv |- BaseType? extended by Type is ExtendedType

statEnv |- Mixed? ExtendedType adjusts to AdjustedType

statEnv |- extends TypeName Mixed? { Type } resolves to TypeName { AdjustedType }

3.2.6 Derives

Notation

The judgment

TypeSpecifier derives from TypeSpecifier'

holds if the first type is derived from the second type. This corresponds to [Type Derivation OK] in [XML Schema Part 1].

Semantics

This judgment is specified by the following rules.

A type specifier validly derives from a type reference if the type specifier resolves to a type name that derives from the referenced type.

statEnv |- TypeSpecifier resolves to TypeName { Type }

statEnv |- TypeName derives from TypeName'

statEnv |- TypeSpecifier derives from of type TypeName'

There is no rule if the second type specifier is not a type reference, because it is not possible for a local type to derive from another local type.

3.2.7 Lookup

Notation

The judgment

ElementName lookup ElementType yields Nillable? TypeSpecifier

holds when matching an element with the given element name against the given element type requires that the element be nillable as indicated and matches the type specifier.

Semantics

This judgment is specified by the following rules.

If the element type is a reference to a global element, then lookup yields the type specifier in the element declaration for the given element name. The given element name must be in the substitution group of the global element.

statEnv |- ElementName substitutes for ElementName'

statEnv.elemDecl(ElementName) => define element ElementName Substitution? Nillable? TypeSpecifier

statEnv |- ElementName lookup element ElementName' yields Nillable? TypeSpecifier

If the given element name matches the element name in the element type, and the element type contains a type specifier, then lookup yields that type specifier.

statEnv |- ElementName lookup element ElementName Nillable? TypeSpecifier yields Nillable? TypeSpecifier

If the element type has no element name but contains a type specifier, then lookup yields the type specifier.

statEnv |- ElementName lookup element TypeSpecifier yields TypeSpecifier

If the element type has no element name and no type specifier, then lookup yields xs:anyType.

statEnv |- ElementName lookup element yields of type xs:anyType

Notation

The judgment

AttributeName lookup AttributeType yields TypeSpecifier

holds when matching an attribute with the given attribute name against the given attribute type matches the type specifier.

Semantics

This judgment is specified by the following rules.

If the attribute type is a reference to a global attribute, then lookup yields the type specifier in the attribute declaration for the given attribute name.

statEnv.attrDecl(AttributeName) => define attribute AttributeName TypeSpecifier

statEnv |- AttributeName lookup attribute AttributeName yields TypeSpecifier

If the given attribute name matches the attribute name in the attribute type, and the attribute type contains a type specifier, then lookup yields that type specifier.

statEnv |- AttributeName lookup attribute AttributeName TypeSpecifier yields TypeSpecifier

If the attribute type has no attribute name but contains a type specifier, then lookup yields the type specifier.

statEnv |- AttributeName lookup attribute TypeSpecifier yields TypeSpecifier

If the attribute type has no attribute name and no type specifier, then lookup yields xs:anySimpleType.

statEnv |- AttributeName lookup attribute yields of type xs:anySimpleType
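
The four element lookup rules can be read as a case analysis on which parts of the element type are present. The Python sketch below illustrates that reading. Everything in it is an assumption made for illustration: the `ELEM_DECL` table stands in for statEnv.elemDecl, type specifiers are kept as plain strings, and `None` signals that no lookup rule applies.

```python
# Hypothetical global element declarations:
# name -> (substitutes_for, nillable, type_specifier)
ELEM_DECL = {
    "comment": (None, False, "of type xsd:string"),
    "note": ("comment", False, "of type xsd:string"),
}


def substitutes_for(name, target, decls=ELEM_DECL):
    """Reflexive, transitive closure of the declared substitution links."""
    seen = set()
    while name is not None and name not in seen:
        if name == target:
            return True
        seen.add(name)
        name = decls.get(name, (None, False, None))[0]
    return False


def lookup_element(element_name, declared_name=None, nillable=False,
                   specifier=None, decls=ELEM_DECL):
    """Return (nillable, type_specifier), or None if no rule applies."""
    if declared_name is not None and specifier is None:
        # Reference to a global element: use the declaration of the given
        # element name, which must be in the substitution group.
        if element_name in decls and substitutes_for(element_name, declared_name, decls):
            _, decl_nillable, decl_specifier = decls[element_name]
            return (decl_nillable, decl_specifier)
        return None
    if declared_name is not None:
        # Local declaration: the names must agree.
        return (nillable, specifier) if element_name == declared_name else None
    if specifier is not None:
        return (False, specifier)          # no name, type specifier only
    return (False, "of type xs:anyType")   # the bare word "element"


print(lookup_element("note", declared_name="comment"))
print(lookup_element("size"))
```

The attribute rules follow the same shape, without the substitution check and with xs:anySimpleType as the default.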

3.2.8 Interleaving

Notation

The judgment

Value₁ interleave Value₂ yields Value

holds if some interleaving of Value₁ and Value₂ yields Value. Interleaving is non-deterministic; it is used for processing all groups.

Semantics

This judgment is specified by the following rules.

Interleaving two empty sequences yields the empty sequence.

statEnv |- () interleave () yields ()

Otherwise, pick an item from the head of one of the sequences, and recursively interleave the remainder.

statEnv |- Value₁ interleave Value₂ yields Value

statEnv |- Item,Value₁ interleave Value₂ yields Item,Value

statEnv |- Value₁ interleave Value₂ yields Value

statEnv |- Value₁ interleave Item,Value₂ yields Item,Value
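
Because the interleave judgment is non-deterministic, one way to picture it is to enumerate every value it can yield. The following Python sketch does this under the assumption that values are modelled as Python lists; the name `interleavings` is hypothetical.

```python
def interleavings(left, right):
    """Yield every interleaving of two lists, preserving each list's own order."""
    if not left and not right:
        yield []                       # () interleave () yields ()
        return
    if left:                           # pick the head of the first sequence
        for rest in interleavings(left[1:], right):
            yield [left[0]] + rest
    if right:                          # or pick the head of the second sequence
        for rest in interleavings(left, right[1:]):
            yield [right[0]] + rest


for value in interleavings(["a", "b"], ["s"]):
    print(value)
# ['a', 'b', 's']
# ['a', 's', 'b']
# ['s', 'a', 'b']
```

The three results correspond to the three alternatives in the interleaved product example of [3.1.2 Types]: the relative order of a and b is fixed, while s may appear in any position.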

3.2.9 Filtering

Notation

The judgment

Value filter @QName => ()

holds if there are no occurrences of the attribute QName in Value. The judgment

Value filter @QName => SimpleValue

holds if there is one occurrence of the attribute QName in Value, and the value of that attribute is SimpleValue. The judgment

Value filter @QName => () or SimpleValue

holds if either of the previous two judgments holds.

Semantics

These judgments are defined using the auxiliary judgments

dynEnv |- Value on Axis => Value'

and

dynEnv |- Value on PrincipalNodeKind, NodeTest => Value'

which are defined in [5.2.1 Steps].

The filter judgments are defined as follows.

dynEnv |- Value on attribute:: => Value'

dynEnv |- Value' on "attribute", QName => ()

dynEnv |- Value filter @QName => ()

dynEnv |- Value on attribute:: => Value'

dynEnv |- Value' on "attribute",QName => Value''

Value'' = attribute QName { SimpleValue }

dynEnv |- Value filter @QName => SimpleValue
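
The effect of the two filter rules can be sketched in Python. The representation is assumed for illustration only: a value is a list of items, and an attribute node is modelled as an `("attribute", name, simple_value)` tuple. The sketch looks only at attribute nodes that appear directly in the value and does not model the axis judgments of [5.2.1 Steps].

```python
def filter_attribute(value, qname):
    """Return () if no attribute named qname occurs in value, else its simple value.

    Raises ValueError when the attribute occurs more than once, since
    neither filter judgment holds in that case.
    """
    found = [item[2] for item in value
             if isinstance(item, tuple) and item[0] == "attribute" and item[1] == qname]
    if not found:
        return ()                 # Value filter @QName => ()
    if len(found) == 1:
        return found[0]           # Value filter @QName => SimpleValue
    raise ValueError(f"more than one attribute {qname}")


content = [("attribute", "xsi:nil", True), ("text", "12")]
print(filter_attribute(content, "xsi:nil"))   # True
print(filter_attribute(content, "xsi:type"))  # ()
```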

3.3 Matching

Introduction

The semantics of a type is given by the notion of matching, i.e., by the set of all values that match that type. A tree in the [XQuery 1.0 and XPath 2.0 Data Model] matches a type in the [XPath/XQuery] type system if and only if:

It satisfies the structural constraints described by the type. For instance, the data model value:
```
  element bib:pubdate { 1954, 1966, 1974, 1986 }
```
matches the following type:
```
  element { attribute country { xs:string }?, xs:integer* }
```
because the element name "bib:pubdate" matches the wildcard element in the type (in which the name is omitted), the attribute "country" is optional, and the content of that element is indeed a sequence of integers.
It satisfies the name constraints described by the type. For instance, the data model value:
```
  element bib:pubdate of type xs:anyType { 1954, 1966, 1974, 1986 }
```
does not match the following type:
```
  element of type PublicationInfo
  define type PublicationInfo { attribute country { xs:string }?, xs:integer* }
```
because the element in the data model is annotated with the type name xs:anyType while the schema expects an element annotated with the type name PublicationInfo.

3.3.1 Nil-matches

Notation

The judgment

Value nil-matches Nillable? Type

holds when the given value matches the given nillable type.

Semantics

This judgment is specified by the following rules.

If the type is not nillable, then the xsi:nil attribute must not appear in the value, and the value must match the type.

statEnv |- Value filter @xsi:nil => ()

statEnv |- Value matches Type

statEnv |- Value nil-matches Type

If the type is nillable, and the xsi:nil attribute does not appear or is false, then the value must match the type.

statEnv |- Value filter @xsi:nil => () or false

statEnv |- Value matches Type

statEnv |- Value nil-matches nillable Type

If the type is nillable, and the xsi:nil attribute is true, then the value must match the attributes in the type. The element content of the type is ignored.

statEnv |- Value filter @xsi:nil => true

statEnv |- Value matches AttributeAll

statEnv |- Value nil-matches nillable (AttributeAll, ElementContent)

3.3.2 Matches

Notation

The judgment

Value matches Type

holds when the given value matches the given type.

Semantics

This judgment is specified by the following rules.

The empty sequence matches the empty sequence type.

statEnv |- () matches ()

If two values match two types, then their sequence matches the corresponding sequence type.

statEnv |- Value₁ matches Type₁

statEnv |- Value₂ matches Type₂

statEnv |- Value₁,Value₂ matches Type₁,Type₂

If a value matches a type, then it also matches a choice type where that type is one of the choices.

statEnv |- Value matches Type₁

statEnv |- Value matches Type₁|Type₂

statEnv |- Value matches Type₂

statEnv |- Value matches Type₁|Type₂

If two values match two types, then their interleaving matches the corresponding all group.

statEnv |- Value₁ matches Type₁

statEnv |- Value₂ matches Type₂

statEnv |- Value₁ interleave Value₂ yields Value

statEnv |- Value matches Type₁ & Type₂

An optional type matches a value of that type or the empty sequence.

statEnv |- Value matches (Type | ())

statEnv |- Value matches Type?

The following rules are used to match a value against zero or more, or one or more, repetitions of a type.

statEnv |- () matches Type*

statEnv |- Value₁ matches Type

statEnv |- Value₂ matches Type*

statEnv |- Value₁, Value₂ matches Type*

statEnv |- Value₁ matches Type

statEnv |- Value₂ matches Type*

statEnv |- Value₁, Value₂ matches Type+

An element matches an element type if the element type resolves to another element type, and the type annotation is derived from the type annotation of the resolved type, and the element value matches the enclosed type of the resolved type.

statEnv |- ElementName lookup ElementType yields Nillable? TypeSpecifier

statEnv |- TypeSpecifier resolves to TypeName' { Type }

statEnv |- TypeName derives from TypeName'

statEnv |- Value nil-matches Nillable? Type

statEnv |- element ElementName of type TypeName { Value } matches ElementType

The rule for attributes is similar.

statEnv |- AttributeName lookup AttributeType yields TypeSpecifier

statEnv |- TypeSpecifier resolves to TypeName' { Type }

statEnv |- TypeName derives from TypeName'

statEnv |- Value nil-matches Nillable? Type

statEnv |- attribute AttributeName of type TypeName { Value } matches AttributeType

A document node matches the corresponding document type.

statEnv |- Value matches Type

statEnv |- document { Value } matches document { Type }

A text node matches text.

statEnv |- text { String } matches text

An atomic value matches an atomic type if its type annotation derives from the atomic type. The value itself is ignored -- this is checked as part of validation.

statEnv |- AtomicType derives from AtomicType'

statEnv |- AtomicValue of type AtomicType matches AtomicType'

An atomic value matches "untyped" only if it is annotated with xs:anySimpleType.

statEnv |- AtomicValue of type xs:anySimpleType matches untyped

Note

The above definition of matching, although complete and precise, does not give a simple means to compute the structural matching. Notably, some of the above rules can be non-deterministic (e.g., the rule for matching of choice or repetition).

The structural component of the [XPath/XQuery] type system can be modeled by tree grammars. Computing structural matching can be done by computing if a given tree grammar recognizes the given data model value.

This document does not provide a complete algorithm to recognize a given tree by a tree grammar. The interested reader can consult the relevant literature, for instance [Languages], or [TATA].
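As a rough illustration of the non-determinism mentioned in this note, the following Python sketch matches a flat sequence of items against a type by backtracking. It is a naive sketch under several assumptions, not an algorithm given by this document: item types are plain labels, element content and type annotations are ignored, all groups are omitted, and the tuple encoding of types and the names ends and matches are hypothetical.

```python
def ends(value, start, ty):
    """Yield every index `end` such that value[start:end] matches `ty`."""
    kind = ty[0]
    if kind == "empty":                      # ()
        yield start
    elif kind == "item":                     # a single item type
        if start < len(value) and value[start] == ty[1]:
            yield start + 1
    elif kind == "seq":                      # Type1 , Type2
        for mid in ends(value, start, ty[1]):
            yield from ends(value, mid, ty[2])
    elif kind == "choice":                   # Type1 | Type2: try both branches
        yield from ends(value, start, ty[1])
        yield from ends(value, start, ty[2])
    elif kind == "opt":                      # Type? is (Type | ())
        yield start
        yield from ends(value, start, ty[1])
    elif kind == "star":                     # Type*
        yield start
        for mid in ends(value, start, ty[1]):
            if mid > start:                  # skip empty iterations so recursion terminates
                yield from ends(value, mid, ty)
    elif kind == "plus":                     # Type+ is Type , Type*
        yield from ends(value, start, ("seq", ty[1], ("star", ty[1])))
    else:
        raise ValueError(f"unknown type constructor: {kind}")


def matches(value, ty):
    """Return True if the whole sequence `value` matches `ty`."""
    return any(end == len(value) for end in ends(value, 0, ty))


# element author+ , element editor*  (labels only, content ignored)
AUTHORS_THEN_EDITORS = (
    "seq",
    ("plus", ("item", "author")),
    ("star", ("item", "editor")),
)
```

The choice and repetition cases explore every alternative, which mirrors the non-deterministic rules but can take exponential time in the worst case; tree-automata techniques from the literature cited above avoid this.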

Ed. Note: A future version of this document should include a more complete description of algorithms to compute type matching.

3.3.3 Optimized matching

Recall the rule for matching an element against an element type.

statEnv |- ElementName lookup ElementType yields Nillable? TypeSpecifier

statEnv |- TypeSpecifier resolves to TypeName' { Type }

statEnv |- TypeName derives from TypeName'

statEnv |- Value nil-matches Nillable? Type

statEnv |- element ElementName of type TypeName { Value } matches ElementType

This rule simplifies greatly in the case that the type specifier is a reference to a global type.

statEnv |- ElementName lookup ElementType yields Nillable? of type TypeName'

statEnv |- TypeName derives from TypeName'

statEnv |- Value filter @xsi:nil => () or false

statEnv |- element ElementName of type TypeName { Value } matches ElementType

In this case, it is not necessary to do resolution or to check that the value matches the type, because this is guaranteed by the way in which elements are labeled with types. Note that the optimization applies only if xsi:nil is not true.

A similar optimization applies to attributes.

3.4 Erase and Annotate

Erase and annotate define the core of XML Schema validation. They are used to define (partially) the semantics of the validate operation in [XPath/XQuery].

3.4.1 Erasure

3.4.1.1 Simply erases

Notation

To define erasure, an auxiliary judgment is needed. The judgment

SimpleValue simply erases to String

holds when SimpleValue erases to the string String.

Semantics

This judgment is specified by the following rules.

The empty sequence erases to the empty string.

statEnv |- () simply erases to ""

The concatenation of two non-empty sequences of values erases to the concatenation of their erasures with a separating space.

statEnv |- SimpleValue₁ simply erases to String₁ SimpleValue₁ != ()

statEnv |- SimpleValue₂ simply erases to String₂ SimpleValue₂ != ()

statEnv |- SimpleValue₁,SimpleValue₂ simply erases to xf:concat(String₁," ",String₂)

An atomic value erases to its string representation.

statEnv |- AtomicValue of type AtomicType simply erases to dm:string-value(AtomicValue)

3.4.1.2 Erases

Notation

The judgment

Value erases to Value'

holds when the erasure of Value is Value'.

Semantics

This judgment is specified by the following rules.

The empty sequence erases to itself.

statEnv |- () erases to ()

The erasure of the concatenation of two values is the concatenation of their erasure, so long as neither of the two original values is simple.

statEnv |- Value₁ erases to Value₁' statEnv |- Value₁ not a simple value

statEnv |- Value₂ erases to Value₂' statEnv |- Value₂ not a simple value

statEnv |- Value₁,Value₂ erases to Value₁',Value₂'

The erasure of an element is an element that has the same name and the type xs:anyType and the erasure of the original content.

statEnv |- Value erases to Value'

statEnv |- element ElementName of type TypeName { Value } erases to element ElementName of type xs:anyType { Value' }

The erasure of an attribute is an attribute that has the same name and the type xs:anySimpleType and the simple erasure of the original content labeled with xs:anySimpleType.

statEnv |- Value simply erases to String

statEnv |- attribute AttributeName of type TypeName { Value } erases to attribute AttributeName of type xs:anySimpleType { String of type xs:anySimpleType }

The erasure of a document is a document with the erasure of the original content.

statEnv |- Value erases to Value'

statEnv |- document { Value } erases to document { Value' }

The erasure of a text node is itself.

statEnv |- text { String } erases to text { String }

The erasure of a simple value is the corresponding text node.

statEnv |- SimpleValue simply erases to String

statEnv |- SimpleValue erases to text { String }
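The erasure rules can be sketched in Python as follows. This is an illustrative sketch only. It assumes a hypothetical in-memory representation (the classes Atomic, Text, Attribute, and Element), it uses the stored lexical form in place of dm:string-value, it omits document nodes, and it resolves the choice of how to split a sequence by turning each maximal run of adjacent atomic values into a single text node.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Atomic:
    value: str  # lexical form; stands in for dm:string-value
    type: str


@dataclass
class Text:
    string: str


@dataclass
class Attribute:
    name: str
    type: str
    content: list  # list of Atomic


@dataclass
class Element:
    name: str
    type: str
    content: list  # list of Attribute, Element, Text, or Atomic


def simply_erase(simple_value):
    """Erase a sequence of atomic values to a space-separated string."""
    return " ".join(atomic.value for atomic in simple_value)


def erase_node(node):
    if isinstance(node, Element):
        return Element(node.name, "xs:anyType", erase(node.content))
    if isinstance(node, Attribute):
        string = simply_erase(node.content)
        return Attribute(
            node.name, "xs:anySimpleType", [Atomic(string, "xs:anySimpleType")]
        )
    return node  # a text node erases to itself


def erase(value):
    """Erase a sequence: runs of atomic values become one text node each."""
    result = []
    for is_atomic, run in groupby(value, key=lambda v: isinstance(v, Atomic)):
        run = list(run)
        if is_atomic:
            result.append(Text(simply_erase(run)))
        else:
            result.extend(erase_node(node) for node in run)
    return result
```

Under these assumptions, erasing an element of type SizeList whose content is the integers 1 and 2 yields an element of type xs:anyType whose content is the text node text { "1 2" }.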

3.4.2 Annotate

3.4.2.1 Simply annotate

Notation

The judgment

simply annotate as SimpleType ( SimpleValue ) => SimpleValue'

holds if the result of casting the SimpleValue to SimpleType is SimpleValue'.

Ed. Note: Issue: The simply annotate judgment is used to describe the behavior of validation of simple values. This operation is essentially similar to casting from string to an atomic value. It is not clear if this actually aligns to the behavior of casting as specified by the [XQuery 1.0 and XPath 2.0 Functions and Operators]. See [Issue-0156: Casting and validation].

Semantics

This judgment is specified by the following rules.

Simply annotating a simple value to a union type yields the result of simply annotating the simple value to either the first or second type in the union. Note that simply annotating to the second type is attempted only if simply annotating to the first type fails.

statEnv |- simply annotate as SimpleType₁ (SimpleValue) => SimpleValue'

statEnv |- simply annotate as SimpleType₁|SimpleType₂ (SimpleValue) => SimpleValue'

statEnv |- (simply annotate as SimpleType₁ (SimpleValue) => SimpleValue') fails

statEnv |- simply annotate as SimpleType₂ (SimpleValue) => SimpleValue'

statEnv |- simply annotate as SimpleType₁|SimpleType₂ (SimpleValue) => SimpleValue'

The simple annotation rules for ?, +, *, and none are similar.

statEnv |- simply annotate as SimpleType? ( () ) => ()

statEnv |- simply annotate as SimpleType (SimpleValue) => SimpleValue'

statEnv |- simply annotate as SimpleType? (SimpleValue) => SimpleValue'

statEnv |- simply annotate as SimpleType* ( () ) => ()

statEnv |- simply annotate as SimpleType (SimpleValue₁) => SimpleValue₁' statEnv |- simply annotate as SimpleType* (SimpleValue₂) => SimpleValue₂'

statEnv |- simply annotate as SimpleType* (SimpleValue₁,SimpleValue₂) => SimpleValue₁',SimpleValue₂'

statEnv |- simply annotate as SimpleType (SimpleValue₁) => SimpleValue₁' statEnv |- simply annotate as SimpleType* (SimpleValue₂) => SimpleValue₂'

statEnv |- simply annotate as SimpleType+ (SimpleValue₁,SimpleValue₂) => SimpleValue₁',SimpleValue₂'

Simply annotating an atomic value to xs:string yields its string representation.

statEnv |- simply annotate as xs:string (AtomicValue) => dm:string-value(AtomicValue)

Simply annotating an atomic value to xs:decimal yields the decimal that results from parsing its string representation.

statEnv |- simply annotate as xs:decimal (AtomicValue) => xs:decimal(dm:string-value(AtomicValue))

Similar rules are assumed for the rest of the 19 XML Schema primitive types.
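The ordered treatment of union types can be illustrated with the following Python sketch. It is a simplified sketch under stated assumptions rather than a definition of validation: the input is a list of strings (the string representations of the atomic values), only xs:string and xs:decimal are supported, the regular expression is an approximation of the xs:decimal lexical form, the repetition cases assume that the member type consumes exactly one item, and the tuple encoding of types and the names simply_annotate and AnnotationError are hypothetical.

```python
import re
from decimal import Decimal

# Approximation of the xs:decimal lexical form (no exponent, no NaN/INF).
DECIMAL_LEXICAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


class AnnotationError(ValueError):
    """Raised when a value cannot be annotated as the requested type."""


def annotate_atomic(type_name, items):
    if len(items) != 1:
        raise AnnotationError(f"{type_name} expects exactly one item")
    lexical = items[0]
    if type_name == "xs:string":
        return [("xs:string", lexical)]
    if type_name == "xs:decimal":
        if not DECIMAL_LEXICAL.fullmatch(lexical.strip()):
            raise AnnotationError(f"{lexical!r} is not a decimal")
        return [("xs:decimal", Decimal(lexical.strip()))]
    raise AnnotationError(f"unsupported type {type_name}")


def simply_annotate(simple_type, items):
    if isinstance(simple_type, str):
        return annotate_atomic(simple_type, items)
    kind = simple_type[0]
    if kind == "union":
        # The second member is attempted only if the first one fails.
        try:
            return simply_annotate(simple_type[1], items)
        except AnnotationError:
            return simply_annotate(simple_type[2], items)
    if kind == "opt":
        return [] if not items else simply_annotate(simple_type[1], items)
    if kind in ("star", "plus"):
        if kind == "plus" and not items:
            raise AnnotationError("+ requires at least one item")
        annotated = []
        for item in items:
            annotated += simply_annotate(simple_type[1], [item])
        return annotated
    raise AnnotationError(f"unknown type constructor {kind}")
```

With the union xs:decimal | xs:string, the string "12.5" is annotated as a decimal because the first member succeeds, while "abc" falls through to xs:string.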

3.4.2.2 Nil-annotate

Notation

The judgment

nil-annotate as Nillable? Type ( Value ) => Value'

holds if it is possible to annotate value Value as if it had the nillable type Type and Value' is the corresponding annotated value.

Semantics

This judgment is specified by the following rules.

If the type is not nillable, then the xsi:nil attribute must not appear in the value, and it must be possible to annotate value Value as if it had the type Type.

statEnv |- Value filter @xsi:nil => ()

statEnv |- annotate as Type ( Value ) => Value'

statEnv |- nil-annotate as Type ( Value ) => Value'

If the type is nillable, and the xsi:nil attribute does not appear or is false, then it must be possible to annotate value Value as if it had the type Type.

statEnv |- Value filter @xsi:nil => () or false

statEnv |- annotate as Type ( Value ) => Value'

statEnv |- nil-annotate as nillable Type ( Value ) => Value'

If the type is nillable, and the xsi:nil attribute is true, then it must be possible to annotate value Value as if it had a type where the attributes in the type are kept and the element content of the type is ignored.

statEnv |- Value filter @xsi:nil => true

statEnv |- annotate as AttributeAll ( Value ) => Value'

statEnv |- nil-annotate as nillable (AttributeAll, ElementContent) ( Value ) => Value'

3.4.2.3 Annotate

Notation

The judgment

annotate as Type ( Value ) => Value'

holds if it is possible to annotate value Value as if it had type Type and Value' is the corresponding annotated value.

Note

Assume an XML Infoset instance X is validated against an XML Schema S, yielding PSVI instance X'. Then if X corresponds to Value and S corresponds to Type and X' corresponds to Value', the following should hold: annotate as Type ( Value ) => Value'.

Semantics

This judgment is specified by the following rules.

Annotating the empty sequence as the empty type yields the empty sequence.

statEnv |- annotate as () (()) => ()

Annotating a concatenation of values as a concatenation of types yields the concatenation of the annotated values.

statEnv |- annotate as Type₁ (Value₁) => Value₁'

statEnv |- annotate as Type₂ (Value₂) => Value₂'

statEnv |- annotate as Type₁,Type₂ (Value₁,Value₂) => Value₁',Value₂'

Annotating a value as a choice type yields the result of annotating the value as either the first or second type in the choice.

statEnv |- annotate as Type₁ (Value) => Value'

statEnv |- annotate as Type₁|Type₂ (Value) => Value'

statEnv |- annotate as Type₂ (Value) => Value'

statEnv |- annotate as Type₁|Type₂ (Value) => Value'

Annotating a value as an all group uses interleaving to decompose the original value and recompose the annotated value.

Ed. Note: Jerome and Phil: Note that this may reorder the original sequence. Perhaps we should disallow such reordering. Specifying that formally is not as easy as we would like.

statEnv |- annotate as Type₁ ( Value₁ ) => Value₁'

statEnv |- annotate as Type₂ ( Value₂ ) => Value₂'

statEnv |- Value₁ interleave Value₂ yields Value

statEnv |- Value₁' interleave Value₂' yields Value'

statEnv |- annotate as Type₁ & Type₂ ( Value ) => Value'

The annotation rules for ?, +, *, none are similar.

statEnv |- annotate as (Type | ())(Value) => Value'

statEnv |- annotate as Type? (Value) => Value'

statEnv |- annotate as Type (Value₁) => Value₁' statEnv |- annotate as Type* (Value₂) => Value₂'

statEnv |- annotate as Type+ (Value₁,Value₂) => (Value₁',Value₂')

statEnv |- annotate as Type* ( () ) => ()

statEnv |- annotate as Type (Value₁) => Value₁' statEnv |- annotate as Type* (Value₂) => Value₂'

statEnv |- annotate as Type* (Value₁,Value₂) => (Value₁',Value₂')

To annotate an element with no xsi:type attribute, first look up the element type, next resolve the resulting type specifier, then annotate the value against the resolved type, and finally return a new element with the name of the original element, the resolved type name, and the annotated value.

statEnv |- Value filter @xsi:type => ()

statEnv |- ElementName lookup ElementType yields Nillable? TypeSpecifier

statEnv |- TypeSpecifier resolves to TypeName { Type }

statEnv |- nil-annotate as Nillable? Type (Value) => Value'

statEnv |- annotate as ElementType ( element ElementName of type xs:anyType { Value } ) => element ElementName of type TypeName { Value' }

To annotate an element with an xsi:type attribute, define a type specifier corresponding to the xsi:type. Look up the element type, yielding a type specifier, and check that the xsi:type specifier derives from this type specifier. Resolve the xsi:type specifier, then annotate the value against the resolved type, and finally return a new element with the name of the original element, the resolved type name, and the annotated value.

statEnv |- Value filter @xsi:type => TypeName

statEnv |- XsiTypeSpecifier = of type TypeName

statEnv |- ElementName lookup ElementType yields Nillable? TypeSpecifier

statEnv |- XsiTypeSpecifier derives from TypeSpecifier

statEnv |- XsiTypeSpecifier resolves to TypeName { Type }

statEnv |- nil-annotate as Nillable? Type (Value) => Value'

statEnv |- annotate as ElementType ( element ElementName of type xs:anyType { Value } ) => element ElementName of type TypeName { Value' }

Ed. Note: Issue: the treatment of xsi:type in the [XQuery 1.0: A Query Language for XML] document and in the formal semantics document still differ. See [Issue-0142: Treatment of xsi:type in validation].

The rule for attributes is similar to the first rule for elements.

statEnv |- AttributeName lookup AttributeType yields TypeSpecifier

statEnv |- TypeSpecifier resolves to TypeName { Type }

statEnv |- nil-annotate as Nillable? Type (Value) => Value'

statEnv |- annotate as AttributeType ( attribute AttributeName of type xs:anySimpleType { Value } ) => attribute AttributeName of type TypeName { Value' }

Annotating a document node yields a document with the annotation of its contents.

statEnv |- annotate as Type (Value) => Value'

statEnv |- annotate as document { Type } ( document { Value } ) => document { Value' }

Annotating a text node as text yields itself.

statEnv |- annotate as text (text { String }) => text { String }

Annotating a text node as a simple type is identical to casting.

statEnv |- simply annotate as SimpleType ( String as xs:anySimpleType ) => SimpleValue'

statEnv |- annotate as SimpleType ( text { String } ) => SimpleValue'

Annotating a simple value as a simple type is identical to casting.

statEnv |- simply annotate as SimpleType ( SimpleValue ) => SimpleValue'

statEnv |- annotate as SimpleType ( SimpleValue ) => SimpleValue'

3.5 Subtyping

Introduction

This section defines the semantics of subtyping in [XPath/XQuery]. Subtyping is used during the static type analysis, in typeswitch expressions, treat and assert expressions, and to check the correctness of function applications.

3.5.1 Subtype

Notation

The judgment

Type <: Type'

holds if the first type is a subtype of the second.

Semantics

This judgment holds if and only if, for every value Value, Value matches Type implies Value matches Type'.

Note

It is easy to see that the subtype relation <: is a preorder (a partial order up to type equivalence), i.e., it is reflexive:

statEnv |- Type <: Type

and it is transitive: if,

statEnv |- Type₁ <: Type₂

and,

statEnv |- Type₂ <: Type₃

then,

statEnv |- Type₁ <: Type₃

The above definition, although complete and precise, does not give a simple means to compute subtyping.

The structural component of the [XPath/XQuery] type system can be modeled by tree grammars. Computing subtyping between two types can be done by computing if inclusion holds between their corresponding tree grammars.

This document does not provide a complete algorithm to compute inclusion between tree grammars. The interested reader can consult the relevant literature on tree grammars, for instance [Languages], or [TATA].
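The definition of subtyping by implication over all values can be illustrated, though not decided, by a bounded search for counterexamples. The following Python sketch makes strong simplifying assumptions: each item type is encoded as a single letter, a type is encoded as an ordinary regular expression over those letters, and values are flat sequences. The function name counterexample is hypothetical. A result of None only means that no counterexample exists up to the given length; it is not a proof of subtyping.

```python
import re
from itertools import product


def counterexample(sub, sup, alphabet, max_len):
    """Return a value matching `sub` but not `sup`, or None if none is found.

    Only values of length at most `max_len` are examined.
    """
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            value = "".join(letters)
            if re.fullmatch(sub, value) and not re.fullmatch(sup, value):
                return value
    return None
```

With "a" standing for an author element and "e" for an editor element, counterexample("a+e*", "(a|e)+", "ae", 5) finds nothing, which is consistent with the first type being a subtype of the second, whereas the reverse direction is refuted by the one-item value "e".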

Ed. Note: A future version of this document should include a more complete description of algorithms to compute subtyping.

3.5.2 Type equivalence

In a few cases, equivalence between two types (i.e., that they define exactly the same domain over data model instances) is used.

Notation

The judgment

Type₁ = Type₂

holds if Type₁ is equivalent to Type₂.

Semantics

By definition, Type₁ = Type₂ if and only if Type₁ <: Type₂ and Type₂ <: Type₁.

3.6 Auxiliary typing judgments for "for", "unordered", and "sortby" expressions

This section defines the notion of Prime Type, and operations on prime types. Prime types play an important role in defining the static semantics of [XPath/XQuery], and are used notably in the semantics of "for", "unordered", and "sortby" expressions.

3.6.1 Prime types

Some expressions must operate on sequences where all items have the same type -- this type possibly being a choice of item types. These include FLWR expressions, sortby, and distinct. For example, let's assume the document

  <paper>
      <author>Buneman</author>
      <author>Davidson</author>
      <author>Fernandez</author>
      <author>Suciu</author>
      <title>Adding structure to unstructured data</title>
      <editor>Afrati</editor>
      <editor>Kolaitis</editor>
   </paper>

is of type

   element paper {
      element author {xs:string}+,
      element title {xs:string},
      element editor {xs:string}*
   }

The following query extracts a sequence of authors followed by editors and sorts them by their content.

   (/paper/author,/paper/editor) sortby (.)

This results in a sequence of authors and editors.

      <editor>Afrati</editor>
      <author>Buneman</author>
      <author>Davidson</author>
      <author>Fernandez</author>
      <editor>Kolaitis</editor>
      <author>Suciu</author>

Based on the content model for the paper element, the type inferred for the expression (/paper/author,/paper/editor) is simply element author {xs:string}+, element editor {xs:string}*. This type indicates that there should be a sequence of one or more authors, followed by zero or more editors. Due to the sortby expression, the original order of elements within the type can then be changed in an arbitrary way. As a result, the regular expression which describes the type cannot preserve any information which relates to order. The type inferred for the complete expression reflects the arbitrary order by repeating a choice of author and editor elements.

       (element author {xs:string} | element editor {xs:string})+

Intuitively, sorting expressions, as well as iteration with for expressions, always operate on sequences of items of the same type, this type possibly being a choice. A choice of item types is called a prime type, and can be described by the following grammar production.

[34 (Formal)]

PrimeType

::=

ItemType
| (PrimeType "|" PrimeType)

When inferring the type of a for or a sortby expression, the system needs to compute such a prime type.

It also needs to compute the appropriate occurrence indicator for the sequence type. For instance, note that the type in the above example ends in a plus rather than a star, since there must be at least one author element in the sequence.

The rest of this section defines operations related to prime types and occurrence indicators, which are used during static type analysis.

3.6.2 Computing Prime Types and Occurrence Indicators

Notation

Two auxiliary functions on types are used.

The type function prime(Type) extracts all item types from the type Type, and combines them into a choice.

The function quantifier(Type) approximates the possible number of items in Type with the occurrence indicators supported by the [XPath/XQuery] type system (?, +, *).

For interim results, the following auxiliary occurrence indicators are used: 1 for exactly one occurrence, and 0 for exactly zero occurrences, i.e., the occurrence indicator of the empty list.

Types and occurrence indicators can be combined with the · operation, as follows.

Type · 0	=	()
Type · 1	=	Type
Type · ?	=	Type?
Type · +	=	Type+
Type · *	=	Type*

Example

For instance, here are the results of applying prime and quantifier to a few simple types.

  prime(element a+)                         = element a
  prime(element a | ())                     = element a
  prime(element a?,element b?)              = element a | element b
  prime(element a | element b+, element c*) = element a | element b | element c

  quantifier(element a+)                         = +
  quantifier(element a | ())                     = ?
  quantifier(element a?,element b?)              = *
  quantifier(element a | element b+, element c*) = +

Note that the last occurrence indicator should be '+', since the regular expression is such that there must be at least one element in the sequence (this element being an 'a' element or a 'b' element).

Note

Note that both prime and quantifier functions are only used within type inference rules; they are not part of the [XPath/XQuery] syntax.

Semantics

The prime function is defined by induction as follows.

prime(ItemType)	=	ItemType
prime(())	=	none
prime(none)	=	none
prime(Type₁ , Type₂)	=	prime(Type₁) | prime(Type₂)
prime(Type₁ & Type₂)	=	prime(Type₁) | prime(Type₂)
prime(Type₁ | Type₂)	=	prime(Type₁) | prime(Type₂)
prime(Type?)	=	prime(Type)
prime(Type*)	=	prime(Type)
prime(Type+)	=	prime(Type)

Semantics

The quantifier function is defined by induction as follows.

quantifier(ItemType)	=	1
quantifier(())	=	0
quantifier(none)	=	0
quantifier(Type₁ , Type₂)	=	quantifier(Type₁) , quantifier(Type₂)
quantifier(Type₁ & Type₂)	=	quantifier(Type₁) , quantifier(Type₂)
quantifier(Type₁ | Type₂)	=	quantifier(Type₁) | quantifier(Type₂)
quantifier(Type?)	=	quantifier(Type) · ?
quantifier(Type*)	=	quantifier(Type) · *
quantifier(Type+)	=	quantifier(Type) · +

This definition uses the sum (Occurrence₁ , Occurrence₂), the choice (Occurrence₁ | Occurrence₂), and the product (Occurrence₁ · Occurrence₂) of two occurrence indicators Occurrence₁, Occurrence₂, which are defined by the following tables.

,	0	1	?	+	*
0	0	1	?	+	*
1	1	+	+	+	+
?	?	+	*	+	*
+	+	+	+	+	+
*	*	+	*	+	*

|	0	1	?	+	*
0	0	?	?	*	*
1	?	1	?	+	*
?	?	?	?	*	*
+	*	+	*	+	*
*	*	*	*	*	*

·	1	?	+	*
0	0	0	0	0
1	1	?	+	*
?	?	?	*	*
+	+	*	+	*
*	*	*	*	*

Using these two newly defined functions, the result type of a SortExpr can be computed as follows. If Expr has type Type then Expr sortby SortSpecList has type prime(Type) · quantifier(Type). For the formal type inference rule see [5.9.1 Sorting Expressions]. Similar rules are applied to type iteration over sequences using for, as well as other operations that destroy sequence order - union, intersect, except, distinct-node, distinct-value, and unordered.

Note

Note that prime(Type) · quantifier(Type) is always a supertype of the original type Type and can therefore be used as an approximation for it. More formally, the following property always holds: Type <: prime(Type) · quantifier(Type). This property is important for the soundness of the static type analysis.
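The inductive definitions of prime and quantifier, together with the three occurrence tables, can be transcribed into Python roughly as follows. This is a sketch for illustration: the tuple encoding of types, the representation of a prime type as a list of item-type labels (the empty list standing for none), the removal of duplicate choices, and the helper names table and approximate are assumptions made here, not part of the formal definition.

```python
INDICATORS = "01?+*"


def table(rows, columns=INDICATORS):
    """Build a lookup {(row, column): result} from rows written as strings."""
    return {
        (row, column): cell
        for row, cells in zip(INDICATORS, rows)
        for column, cell in zip(columns, cells)
    }


# Rows are indexed by 0, 1, ?, +, * in that order, as in the tables above.
SUM = table(["01?+*", "1++++", "?+*+*", "+++++", "*+*+*"])
CHOICE = table(["0??**", "?1?+*", "???**", "*+*+*", "*****"])
PRODUCT = table(["0000", "1?+*", "??**", "+*+*", "****"], columns="1?+*")

SUFFIX = {"opt": "?", "star": "*", "plus": "+"}


def prime(ty):
    """Collect the item types of `ty` as a list of labels (a choice)."""
    kind = ty[0]
    if kind == "item":
        return [ty[1]]
    if kind in ("empty", "none"):
        return []
    if kind in ("seq", "all", "choice"):
        left = prime(ty[1])
        return left + [item for item in prime(ty[2]) if item not in left]
    return prime(ty[1])  # opt, star, plus


def quantifier(ty):
    """Approximate the number of items in `ty` by an occurrence indicator."""
    kind = ty[0]
    if kind == "item":
        return "1"
    if kind in ("empty", "none"):
        return "0"
    if kind in ("seq", "all"):
        return SUM[quantifier(ty[1]), quantifier(ty[2])]
    if kind == "choice":
        return CHOICE[quantifier(ty[1]), quantifier(ty[2])]
    return PRODUCT[quantifier(ty[1]), SUFFIX[kind]]


def approximate(ty):
    """Render prime(ty) · quantifier(ty) as text."""
    occurrence = quantifier(ty)
    if occurrence == "0":
        return "()"
    items = " | ".join(prime(ty)) or "none"
    return items if occurrence == "1" else f"({items}){occurrence}"
```

Applied to the encoding of element author+, element editor*, the function approximate returns "(element author | element editor)+", which agrees with the type inferred for the sortby example earlier in this section.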

3.7 Auxiliary typing judgments for "typeswitch" expressions

3.7.1 Computing common types and occurrence in typeswitch

Static type analysis for the "typeswitch" expression has to compute the choice of all the common items between two types, and the corresponding occurrence indicators.

For instance, consider the following typeswitch expression, applied on the result of the previous query.

   let $person := (/paper/author,/paper/editor) sortby (.)
   return
     typeswitch $person
        case element author return <single-author> { $person } </single-author>
        case element author* return <authors-only> { $person } </authors-only>
        default return <authors-and-editors> { $person } </authors-and-editors>

Static type analysis needs to compute the type of the variable $person within each case clause. In the first case, the variable is of type element author. In the second case, the variable is of type element author+. The '+' occurrence indicator can be inferred since the input type guarantees the input sequence has at least one author.

3.7.2 Computing Common Prime Types and Occurrence Indicators

Notation

The two following auxiliary functions on types are used.

The function common-prime(PrimeType₁, PrimeType₂) computes the type for all the items common between the prime types PrimeType₁ and PrimeType₂, and combines them into a new prime type.

The function common-occurrence(Occurrence₁, Occurrence₂) computes the occurrence indicator which approximates the possible number of items that can result from the constraints described by both Occurrence₁ and Occurrence₂.

Example

For instance, here are the results of applying common-prime and common-occurrence to a few simple prime types.

  common-prime((element a),(element a))                                      = element a
  common-prime((element a),(element b))                                      = none
  common-prime((element a | element b | element c), (element c | element a)) = element a | element c

  common-occurrence(+,+)   = +
  common-occurrence(*,+)   = +
  common-occurrence(?,+)   = 1

Note that the last occurrence indicator should be '1', since the first input occurrence indicator '?' specifies that there should be at most one item, while the second '+' specifies that there should be at least one.

Semantics

The common-prime function is defined by induction as follows.

The common-prime of two arbitrary prime types is defined as the choice containing the results of applying the function common-prime to all possible pairs of item types, one taken from each input prime type.

statEnv |- common-prime( ItemType₁, ItemType₁' ) = ItemType_(1,1)

statEnv |- common-prime( ItemType₁, ItemType₂' ) = ItemType_(1,2)

···

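statEnv |- common-prime( ItemType_n, ItemType_r' ) = ItemType_(n,r)

statEnv |- common-prime( ItemType₁ | ... | ItemType_n, ItemType₁' | ... | ItemType_r' ) = ItemType_(1,1) | ... | ItemType_(n,r)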

The common-prime function applied to any pair of item types returns either a new item type or the empty choice (none); remember that none is the unit for choice.

If one of the item types is a subtype of the other, then the function returns the smaller item type.

statEnv |- ItemType₁ <: ItemType₂

statEnv |- common-prime(ItemType₁, ItemType₂) = ItemType₁

statEnv |- ItemType₂ <: ItemType₁

statEnv |- common-prime(ItemType₁, ItemType₂) = ItemType₂

Ed. Note: Issue: The above rule does not deal with cases where the two item types might have some intersection (i.e., there exist values which match both types), but subtyping does not hold between them. See [Issue-0155: Common primes for incomparable types].

In all the other cases, common-prime returns none. I.e., otherwise:

statEnv |- common-prime(ItemType₁, ItemType₂) = none

Semantics

The common-occurrence function is defined by the following table.

common-occurrence	1	?	+	*
0	0	0	0	0
1	1	1	1	1
?	1	?	1	?
+	1	1	+	+
*	1	?	+	*

Example

The above definition relies on subtyping which covers complex cases (involving derivation, substitution groups, etc.) appropriately. For instance, consider the following type declarations.

  define element person of type Person
  define type Person {
    element name of type xs:string
  }

  define element student substitutes for person of type Student
  define type Student extends Person {
    element promotion of type Promotion,
    element grade of type Grade
  }

  define type Promotion restricts xs:integer { xs:integer }
  define type Grade restricts xs:integer { xs:integer }

Here are some more complex examples of the application of the common-prime type function involving that schema.

  common-prime(Promotion, xs:integer)                           = Promotion
  common-prime(Promotion, Grade)                                = none
  common-prime(element of type Person, element of type Student) = element of type Student
  common-prime(element person, element student)                 = element student
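The pairwise definition of common-prime and the common-occurrence table can be sketched in Python as follows. The sketch is illustrative only. It represents a prime type as a list of item-type labels, with the empty list standing for none, and it approximates subtyping between item types by walking a hypothetical BASE table that records only the two restrictions of xs:integer declared above; a real implementation would use the full subtype judgment. The names BASE, derives_from, common_prime, and common_occurrence are assumptions of this sketch.

```python
# Hypothetical derivation table: each type name maps to its base type.
BASE = {"Promotion": "xs:integer", "Grade": "xs:integer"}

COLUMNS = "1?+*"
# Rows are indexed by the first argument (0, 1, ?, +, *), columns by the second.
COMMON_OCCURRENCE = {
    row: dict(zip(COLUMNS, cells))
    for row, cells in zip("01?+*", ["0000", "1111", "1?1?", "11++", "1?+*"])
}


def derives_from(item_type, ancestor):
    """Stand-in for ItemType <: ItemType' using the BASE table only."""
    while item_type is not None:
        if item_type == ancestor:
            return True
        item_type = BASE.get(item_type)
    return False


def common_prime(prime1, prime2, is_subtype=derives_from):
    """Apply common-prime to every pair of item types and collect the results."""
    result = []
    for left in prime1:
        for right in prime2:
            if is_subtype(left, right):
                common = left
            elif is_subtype(right, left):
                common = right
            else:
                continue  # incomparable item types contribute none
            if common not in result:
                result.append(common)
    return result


def common_occurrence(occurrence1, occurrence2):
    """Look up the table; the second argument must be one of 1, ?, +, *."""
    return COMMON_OCCURRENCE[occurrence1][occurrence2]
```

For example, common_prime(["Promotion"], ["xs:integer"]) yields ["Promotion"], while common_prime(["Promotion"], ["Grade"]) yields the empty list, because neither type derives from the other.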

3.8 Major type issues

Ed. Note: Alignment between [XPath/XQuery], the [XQuery 1.0 and XPath 2.0 Data Model], and the [XPath/XQuery] formal semantics. There are some known discrepancies between the type models in the Data Model document, the [XPath/XQuery] document and this document. Currently the [XPath/XQuery] grammar does not contain comments, processing-instructions or schema-components. Currently this document does not deal with PI and comment nodes, which are present in the Data Model. See issues [Issue-0068: Document Collections], [Issue-0105: Types for nodes in the data model.].

Ed. Note: Complexity of type operations. There are concerns about the complexity of type inference, notably about type subsumption used during type inference, and validation used during evaluation of some of the type operations (e.g., typeswitch). Additionally, there are many language features for which it is possible to define special type inference rules that would give tighter bounds. The question is, which features, which rules, and how do they interact with the complexity of type inference and subtype computation. See [Issue-0080: Typing of parent], [Issue-0083: Expressive power and complexity of typeswitch expression], [Issue-0091: Attribute expression].

4 Basics

The organization of this section parallels the organization of Section 2 of the [XPath/XQuery] document.

4.1 Expression Context

Introduction

The expression context for a given expression consists of all the information that can affect the result of the expression. This information is organized into the static context and the evaluation context. This section specifies the environments that represent the context information used by [XPath/XQuery] expressions.

4.1.1 Static Context

The [XPath/XQuery] static inference rules use the environment group statEnv containing the environments built during static type checking and used during both static type checking and dynamic evaluation.

The following environments are maintained in the static environment group:

	statEnv.namespace	The namespace environment maps namespace prefixes (NCNames) onto namespace URIs (URIs). The namespace environment captures in-scope namespaces in the [XPath/XQuery] static context.
	statEnv.default_namespace	The default namespace environment maps "element" and "function" to a namespace URI (a URI), as determined by the appropriate namespace declaration. The default namespace environment captures the default namespace for element and type names and the default namespace for function names in the [XPath/XQuery] static context.
	statEnv.typeDefn	The type definition environment maps type names (TypeNames) onto their global type definitions (Definitions). The type definition environment captures in-scope schema definitions in the [XPath/XQuery] static context.
	statEnv.elemDecl	The element declaration environment maps element names (ElementNames) onto their global element declarations (Definitions). The element declaration environment does not have a counterpart in the [XPath/XQuery] static context. See [Issue-0159: Element and attribute declarations in the static context].
	statEnv.attrDecl	The attribute declaration environment maps attribute names (AttributeNames) onto their global attribute declarations (Definitions). The attribute declaration environment does not have a counterpart in the [XPath/XQuery] static context. See [Issue-0159: Element and attribute declarations in the static context].
	statEnv.varType	The variable static type environment maps variable names (Variables) to their static types (Types). The variable static type environment captures in-scope variables in the [XPath/XQuery] static context.
	statEnv.funcType	The function declaration environment stores the static type signatures of functions. Because [XPath/XQuery] allows multiple functions with the same name differing only in the number of arguments, this environment maps function name/arity (QName, integer) pairs to function type signatures `define function` QName (Type₁, ..., Type_n) `returns` Type, where Type₁, ..., Type_n are the types of the parameters, in order, and Type is the return type. The function declaration environment captures in-scope functions in the [XPath/XQuery] static context.
	statEnv.base_uri	The base URI environment contains a single URI. The base URI environment captures the base URI in the [XPath/XQuery] static context.

Ed. Note: Issue: The static environment does not represent collations. See [Issue-0160: Collations in the static environment]

Environments have an initial state when [expression/query] processing begins, containing, for example, the function signatures of all built-in functions.

A common use of the static environment is to expand QNames by looking up namespace prefixes in the statEnv.namespace environment.

The helper function expand is used to expand QNames by looking up the namespace prefix in statEnv.namespace. It is written using one of the following two forms:

statEnv |- expand(QName) = qname(URI, ncname)

statEnv |- expand(QName) = qname(ncname)

In both forms, the right-hand side is a QName value (not a QName expression).

The helper expand function is defined as follows:

If QName matches NCName₁:NCName₂, and if statEnv.namespace(NCName₁) = URI, then the expression yields qname(URI,NCName₂). If the namespace prefix NCName₁ is not found in statEnv.namespace, then the expression does not apply (that is, the inference rule will not match).
If QName matches NCName, NCName is an element or type name, and statEnv.default_namespace("element") = URI, then the expression yields qname(URI,NCName), where URI is the default namespace in effect.
If QName matches NCName, NCName is a function name, and statEnv.default_namespace("function") = URI, then the expression yields qname(URI,NCName), where URI is the default namespace in effect.
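The three cases above can be sketched in Python as follows. This is only an illustration, not part of the formal definition: the dictionary layout of stat_env, the kind argument that distinguishes element or type names from function names, and the use of None to signal that no case applies are all assumptions made for the example.

```python
from typing import NamedTuple, Optional


class QNameValue(NamedTuple):
    """An expanded QName value: a namespace URI plus a local name."""
    uri: str
    local: str


def expand(stat_env: dict, qname: str, kind: str = "element") -> Optional[QNameValue]:
    """Expand a lexical QName against the static environment.

    kind is "element" (element or type names) or "function"; it selects
    which default namespace applies to an unprefixed name.
    Returns None when none of the three cases applies.
    """
    if ":" in qname:
        # Case 1: NCName1:NCName2, resolved through statEnv.namespace.
        prefix, local = qname.split(":", 1)
        uri = stat_env["namespace"].get(prefix)
    else:
        # Cases 2 and 3: unprefixed name, resolved through the default namespace.
        local = qname
        uri = stat_env["default_namespace"].get(kind)
    return None if uri is None else QNameValue(uri, local)
```

For instance, with a namespace environment that binds the prefix xs, expand(stat_env, "xs:integer") yields a value carrying the URI bound to xs and the local name integer, while an unbound prefix yields None.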

Ed. Note: The above rules could be given as proper inference rules defining the expand judgment.

Here is an example that shows how the static environment is modified in response to a namespace definition.

statEnv [ namespace(NCName |-> URI) ] |- Expr*

statEnv |- namespace NCName = URI Expr*

This rule reads as follows: "the phrase on the bottom (a namespace declaration followed by a sequence of expressions) is well-typed (accepted by the static type inference rules) within an environment statEnv if the sequence of expressions above the line is well-typed in the environment obtained from statEnv by adding the namespace declaration".

This is the common idiom for passing new information in an environment to sub-expressions. In the case where the environment must be updated with a completely new component, the following notation is used:

statEnv [ namespace = (NewEnvironment) ]
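The two update notations can be pictured with the following Python sketch. It assumes, purely for illustration, that an environment group is a dictionary of dictionaries; the function names extend_env and replace_env are hypothetical. Both functions return a new environment and leave the original untouched, which reflects that an update is visible only to the sub-expressions evaluated under it.

```python
def extend_env(env: dict, component: str, key, value) -> dict:
    """statEnv [ component(key |-> value) ]: copy env, adding one binding."""
    new_env = dict(env)
    new_env[component] = {**env.get(component, {}), key: value}
    return new_env


def replace_env(env: dict, component: str, new_component: dict) -> dict:
    """statEnv [ component = (NewEnvironment) ]: copy env with a fresh component."""
    new_env = dict(env)
    new_env[component] = dict(new_component)
    return new_env
```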

4.1.2 Evaluation Context

dynEnv denotes the group of environments built and used only during dynamic evaluation.

The following environments are dynamic:

	dynEnv.funcDefn	The dynamic function environment maps a function name to a function definition. The function definition consists of an expression, which is the function's body, and a list of variables, which are the function's formal parameters. As with statEnv.funcType, dynEnv.funcDefn requires a function name and a parameter count: it maps a (QName, integer) pair to a value list of the form (Expr, Variable₁, ..., Variable_n).
	dynEnv.varValue	The dynamic value environment maps a variable name (QName) onto the variable's current value (Value).

Ed. Note: DD: this still does not account for those functions that are genuinely overloaded, as some of the built-in functions are. Probably this will be handled by special rules for those functions in the main semantics section of the document. See [Issue-0122: Overloaded functions]

Finally, dynamic environments have an initial state when [expression/query] processing begins, containing, for example, the function declarations of all built-in functions.

Ed. Note: Somewhere an exact definition of what is in the initial environments is needed. Notice that for XPath this is partially defined by the containing language. See [Issue-0115: What is in the default context?].

The following Formal Semantics built-in variables represent the context item, context position, context size, and implicit timezone properties of the evaluation context:

Built-in Variable	Represents:
$fs:dot	context item
$fs:position	context position
$fs:last	context size
$fs:implicitTimezone	implicit timezone

Variables with the "fs" namespace prefix are reserved for use in the definition of the Formal Semantics. It is a static error to define a variable in the "fs" namespace.

Ed. Note: "fs" is actually a namespace prefix and should be replaced with a proper namespace URI unique to this spec. See [Issue-0100: Namespace resolution].

Values of $fs:dot, $fs:position and $fs:last can be obtained by invoking the xf:context-item(), xf:position() and xf:last() functions, respectively. The context document can be obtained by applying the xf:root() function.

The variable $fs:implicitTimezone is used to represent the implicit timezone, and is used by the timezone related functions in [XQuery 1.0 and XPath 2.0 Functions and Operators].

Ed. Note: The Formal Semantics does not yet model the semantics of timezone-related functions.

Ed. Note: The following dynamic contexts have no formal representation yet: current-date, current-time and current-dateTime. See [Issue-0114: Dynamic context for current date and time]

4.2 Input Functions

[XPath/XQuery] has three functions that provide access to input data: input, collection, and document. The dynamic semantics of these three input functions are described in more detail in [XQuery 1.0 and XPath 2.0 Functions and Operators]. The static typing for these functions is still an open issue (see [Issue-0137: Typing of input functions]).

4.3 Expression Processing

This section gives background material about the [XPath/XQuery] data model. More information on the [XPath/XQuery] data model can be found in [2.3 Data Model].

4.4 Types

4.4.1 Type Checking

This section gives background material about [XPath/XQuery] type checking. More information on the [XPath/XQuery] type system can be found in [2.4 Schemas and types] and [3 The XQuery Type System].

4.4.2 SequenceType

Introduction

SequenceTypes can be used in [XPath/XQuery] to refer to a type imported from a schema (see [6 The Query Prolog]). SequenceTypes are used to declare the types of function parameters and in several kinds of [XPath/XQuery] expressions.

The syntax of SequenceTypes is described by the following grammar productions.

[64 (XQuery)]	SequenceType	::=	(ItemType OccurrenceIndicator) | "empty"
[65 (XQuery)]	ItemType	::=	(("element" | "attribute") ElemOrAttrType?) | "node" | "processing-instruction" | "comment" | "text" | "document" | "item" | AtomicType | "untyped" | <"atomic" "value">
[66 (XQuery)]	ElemOrAttrType	::=	(QName (SchemaType | SchemaContext?)) | SchemaType
[67 (XQuery)]	SchemaType	::=	<"of" "type"> QName
[60 (XQuery)]	SchemaContext	::=	"in" SchemaGlobalContext ("/" SchemaContextStep)*
[61 (XQuery)]	SchemaGlobalContext	::=	QName | ("type" QName)
[62 (XQuery)]	SchemaContextStep	::=	QName
[68 (XQuery)]	AtomicType	::=	QName
[69 (XQuery)]	OccurrenceIndicator	::=	("*" | "+" | "?")?

The semantics of SequenceTypes is defined by means of normalization rules from SequenceTypes to the [XPath/XQuery] type system (see [3 The XQuery Type System]).

Core Grammar

The core grammar productions for sequence types are:

[46 (Core)]	SequenceType	::=	(ItemType OccurrenceIndicator) | "empty"
[47 (Core)]	ItemType	::=	(("element" | "attribute") ElemOrAttrType?) | "node" | "processing-instruction" | "comment" | "text" | "document" | "item" | AtomicType | "untyped" | <"atomic" "value">
[48 (Core)]	ElemOrAttrType	::=	(QName (SchemaType | SchemaContext?)) | SchemaType
[49 (Core)]	SchemaType	::=	<"of" "type"> QName
[42 (Core)]	SchemaContext	::=	"in" SchemaGlobalContext ("/" SchemaContextStep)*
[43 (Core)]	SchemaGlobalContext	::=	QName | ("type" QName)
[44 (Core)]	SchemaContextStep	::=	QName
[50 (Core)]	AtomicType	::=	QName
[51 (Core)]	OccurrenceIndicator	::=	("*" | "+" | "?")?

Ed. Note: Note that normalization on SequenceTypes does not occur during the normalization phase but whenever a dynamic or static rule requires it. The reason for this deviation from the processing model is that the result of SequenceType normalization is not part of the [XPath/XQuery] syntax (See issue [Issue-0089: Syntax for types in XQuery]). SequenceType normalization is the only occurrence of such a deviation in the formal semantics.

Notation

To define the semantics of SequenceTypes, the following auxiliary mapping rule is used.

[SequenceType]_sequencetype

Type

specifies that SequenceType is mapped to Type, in the XQuery type system.

Normalization

OccurrenceIndicators are left unchanged when normalizing SequenceTypes into [XPath/XQuery] types. Each kind of SequenceType component is normalized separately into the [XPath/XQuery] type system.

[ItemType OccurrenceIndicator]_sequencetype

[ItemType]_sequencetype OccurrenceIndicator

The "empty" sequence type is mapped to the empty sequence type in the [XPath/XQuery] type system.

[empty]_sequencetype

[()]_sequencetype

The SequenceType component with type SchemaType is mapped directly into the [XPath/XQuery] type system.

[element SchemaType]_sequencetype

element SchemaType

[attribute SchemaType]_sequencetype

attribute SchemaType

The mapping still does not handle SequenceTypes using SchemaContext (See [Issue-0138: Semantics of Schema Context]). The two following rules map references to a global element or attribute, without taking the schema context into account.

[element QName]_sequencetype

element QName

[attribute QName]_sequencetype

attribute QName

Document nodes, text nodes, untyped and atomic types are left unchanged.

[text]_sequencetype

text

[document]_sequencetype

document

[untyped]_sequencetype

untyped

[AtomicType]_sequencetype

AtomicType

Processing instruction and comment types are ignored. See [Issue-0105: Types for nodes in the data model.].

[comment]_sequencetype

[processing-instruction]_sequencetype

The SequenceType components "node", "item", and "atomic value" correspond to wildcard types. node indicates that any node is allowed, "atomic value" indicates that any atomic (not list) value is allowed, and item indicates that any node or atomic value is allowed. The following mapping rules make use of the corresponding wildcard types.

[node]_sequencetype

(element | attribute | text | document)

[item]_sequencetype

[atomic value]_sequencetype

(xs:string | xs:decimal | ... )

Ed. Note: Jerome: The Formal Semantics makes use of fs:numeric which is not in XML Schema. This is necessary for the specification of some of XPath type conversion rules. See [Issue-0127: Datatype limitations].

During processing of a query, it is sometimes necessary to determine whether a given value matches a type that was declared using the SequenceType syntax. This process is known as SequenceType matching, and is formally specified in [3.3 Matching].

4.4.3 Type Conversions

Introduction

Some expressions do not require their operands to exactly match the expected type. For example, function parameters and returns expect a value of a particular type, but automatically perform certain type conversions, such as extraction of atomic values from nodes, promotion of numeric values, and implicit casting of untyped values. The conversion rules for function parameters and returns are discussed in [5.1.4 Function Calls]. Other operators that provide special conversion rules include arithmetic operators, which are discussed in [5.4 Arithmetic Expressions], and value comparisons, which are discussed in [5.5.1 Value Comparisons].

Some special conversion rules (atomization, effective boolean value, and fallback conversions) are used in several places in the language.

4.4.3.1 Atomization

Notation

There are two variants of the normalization function []_Atomize.

The first variant, []_Atomize, is not parameterized by the required type of the expression.

The second variant, []_{Atomize(Type)}, is parameterized by the required type.

Both functions convert an expression into an expression that returns an optional atomic value, and are used in the normalization of expressions whose required type is an optional atomic value.

Normalization

A node is converted to an optional atomic value by extracting its typed content.

[Expr]_Atomize

typeswitch (Expr)

case atomic value? $v return $v

case node $v return

(typeswitch (xf:data($v))

case atomic value? $w return $w

default $w return xf:error())

default $v return xf:error()

In the parameterized variant, an atomic value with type untyped is always cast to the required type:

[Expr]_{Atomize(Type)}

typeswitch (Expr)

case untyped $v return cast as Type ($v)

case atomic value? $v return $v

case node $v return

(typeswitch (xf:data($v))

case untyped $w return cast as Type ($w)

case atomic value? $w return $w

default $w return [$w]_{Type_Exception(Type)})

default $v return [$v]_{Type_Exception(Type)}
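The behavior of the two atomization variants can be approximated by the following Python sketch. It is a simplified model under stated assumptions, not the normative rule: sequences are Python lists, the classes Atomic and Node and the CASTS table are hypothetical stand-ins for the data model and for "cast as", and the strict type exception policy (always raise an error) is assumed for the default branches.

```python
from dataclasses import dataclass, field


@dataclass
class Atomic:
    value: object
    type: str                      # e.g. "xs:integer" or "untyped"


@dataclass
class Node:
    # What xf:data() would return for this node (hypothetical stand-in).
    typed_value: list = field(default_factory=list)


class XQueryError(Exception):
    """Stand-in for xf:error()."""


# Hypothetical cast table standing in for "cast as Type".
CASTS = {"xs:integer": int, "xs:double": float, "xs:string": str}


def _optional_atomic(seq):
    """True if seq matches 'atomic value?': empty or a single atomic value."""
    return len(seq) == 0 or (len(seq) == 1 and isinstance(seq[0], Atomic))


def _cast_untyped(seq, required_type):
    """Parameterized variant only: cast an untyped atomic value to the required type."""
    if required_type is not None and seq and seq[0].type == "untyped":
        return [Atomic(CASTS[required_type](seq[0].value), required_type)]
    return seq


def atomize(seq, required_type=None):
    """[Expr]_Atomize when required_type is None, else [Expr]_{Atomize(Type)}."""
    if _optional_atomic(seq):
        return _cast_untyped(seq, required_type)
    if len(seq) == 1 and isinstance(seq[0], Node):
        data = seq[0].typed_value          # xf:data($v)
        if _optional_atomic(data):
            return _cast_untyped(data, required_type)
    raise XQueryError("type exception")    # strict policy assumed
```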

Ed. Note: Issue: Don Chamberlin indicates that this rule should do type promotion. See [Issue-0161: Type promotion in Atomization].

Notation

The following normalization rule implements type exceptions: []_{Type_Exception(Type)}.

Normalization

There are two definitions for each of these functions.

The definition here implements the strict type exception policy. The strict policy always raises an error:

[Expr]_{Type_Exception(Type)}

xf:error()

The flexible type exception policy is defined in [4.4.3.3 Fallback Conversion].

4.4.3.2 Effective Boolean Value

Notation

The function []_{Effective_Boolean_Value} takes an expression and normalizes it into an effective boolean value.

Normalization

The effective boolean value is obtained through the following normalization rule. Note that the default clause returns true if at least one item in the sequence is a node; otherwise, it raises a type exception.

[Expr]_{Effective_Boolean_Value}

typeswitch (Expr)

case empty $v return xf:false()

case xs:boolean $v return $v

default $v return if (length([$v/self::node()]_Expr) ge 1) then xf:true() else [$v]_{Type_Exception(xs:boolean)}
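As an informal reading of this rule, the Python sketch below models a sequence as a list, xs:boolean values as Python booleans, and nodes as instances of a hypothetical Node class. The type_exception parameter is an assumption of the sketch: it stands for []_{Type_Exception(xs:boolean)} and defaults to the strict policy, which always raises an error.

```python
class Node:
    """Stand-in for any data-model node."""


class XQueryError(Exception):
    """Stand-in for xf:error()."""


def strict_type_exception(seq):
    """Strict type exception policy: always raise an error."""
    raise XQueryError("type exception: xs:boolean required")


def effective_boolean_value(seq, type_exception=strict_type_exception):
    if len(seq) == 0:                                   # case empty
        return False
    if len(seq) == 1 and isinstance(seq[0], bool):      # case xs:boolean
        return seq[0]
    if any(isinstance(item, Node) for item in seq):     # $v/self::node() is non-empty
        return True
    return type_exception(seq)                          # no node in the sequence
```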

4.4.3.3 Fallback Conversion

Normalization

The following rules define the "fallback conversions" that support the "flexible" type-exception policy. These rules depend on the required type for the expression that is being normalized.

In the first rule, the required type is xs:boolean. The conditional expression in the "default return" clause below guarantees that in an arbitrary sequence of items, if at least one value is a node, then the boolean expression evaluates to true.

[Expr]_{Type_Exception(xs:boolean)}

typeswitch (Expr)

case empty $v return xf:false()

default $v return

if (xf:count([$v/self::node()]_Expr) >= 1) then xf:true()

else xf:boolean(op:item-at($v, 1))

Ed. Note: MFF: The rule above requires that $v/self::node() on an atomic value returns () rather than raise an error.

In the next rule, the required type is "node". The conditional expression in the "default return" clause below guarantees that, if the first item is a node, then it returns the node, otherwise it returns an error.

[Expr]_{Type_Exception(node)}

typeswitch (Expr)

case empty $v return xf:error()

default $v return

(typeswitch (op:item-at($v, 1))

case node $w return $w

default $w return xf:error())
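The two fallback rules given so far can be sketched in Python as follows. The sketch assumes that sequences are lists and that nodes are instances of a hypothetical Node class; Python's bool() is used only as a rough stand-in for xf:boolean, whose exact behavior on atomic values is defined in [XQuery 1.0 and XPath 2.0 Functions and Operators] and is not reproduced here.

```python
class Node:
    """Stand-in for any data-model node."""


class XQueryError(Exception):
    """Stand-in for xf:error()."""


def fallback_boolean(seq):
    """[Expr]_{Type_Exception(xs:boolean)} under the flexible policy."""
    if not seq:                                          # case empty
        return False
    if any(isinstance(item, Node) for item in seq):      # at least one node
        return True
    return bool(seq[0])          # approximates xf:boolean(op:item-at($v, 1))


def fallback_node(seq):
    """[Expr]_{Type_Exception(node)} under the flexible policy."""
    if not seq:                                          # case empty
        raise XQueryError("empty sequence where a node is required")
    first = seq[0]                                       # op:item-at($v, 1)
    if isinstance(first, Node):
        return first
    raise XQueryError("first item is not a node")
```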

Ed. Note: MFF: All the remaining rules in the fallback conversions table must be specified. See [Issue-0113: Incomplete specification of type conversions]

4.5 Errors and Conformance

This section describes error handling and conformance levels.

Ed. Note: Issue: Error handling is not formally specified. See [Issue-0094: Static type errors and warnings]

Ed. Note: Issue: Conformance levels are not formally specified. See [Issue-0169: Conformance Levels]

5 Expressions

This section gives the semantics of all the [XPath/XQuery] expressions. The organization of this section parallels the organization of Section 3 of the [XPath/XQuery] document.

[5 (XQuery)]	Expr	::=	SortExpr
[10 (XPath)]	Expr	::=	OrExpr

For each expression, a short description and the relevant grammar productions are given. The semantics of an expression includes the normalization, static analysis, and dynamic evaluation phases. Recall that normalization rules translate [XPath/XQuery] syntax into core [XPath/XQuery] syntax. In the sections that contain normalization rules, the Core grammar productions into which the expression is normalized are also provided. After normalization, sections on static type inference and dynamic evaluation define the static type and dynamic value for the core expression.

Core Grammar

The core grammar production for expressions is:

[5 (Core)]

Expr

::=

SortExpr

5.1 Primary Expressions

Primary expressions are the basic primitives of the language. They include literals, variables, function calls, and the use of parentheses to control precedence of operators.

[39 (XQuery)]	PrimaryExpr	::=	Literal | FunctionCall | ("$" VarName) | ParenthesizedExpr
[227 (XQuery)]	VarName	::=	QName

Core Grammar

The core grammar productions for primary expressions are:

[24 (Core)]	PrimaryExpr	::=	Literal | FunctionCall | ("$" VarName) | ParenthesizedExpr
[169 (Core)]	VarName	::=	QName

5.1.1 Literals

Introduction

A literal is a direct syntactic representation of an atomic value. [XPath/XQuery] supports two kinds of literals: string literals and numeric literals.

[56 (XQuery)]	Literal	::=	NumericLiteral | StringLiteral
[55 (XQuery)]	NumericLiteral	::=	IntegerLiteral | DecimalLiteral | DoubleLiteral
[193 (XQuery)]	IntegerLiteral	::=	Digits
[194 (XQuery)]	DecimalLiteral	::=	("." Digits) | (Digits "." [0-9]*)
[195 (XQuery)]	DoubleLiteral	::=	(("." Digits) | (Digits ("." [0-9]*)?)) ([e] | [E]) ([+] | [-])? Digits
[214 (XQuery)]	StringLiteral	::=	(["] ((" ") | [^"])* ["]) | (['] ((' ') | [^'])* ['])

Normalization

All literals are core expressions, therefore no normalization rules are required for literals.

Core Grammar

The core grammar productions for literals are:

[38 (Core)]	Literal	::=	NumericLiteral | StringLiteral
[37 (Core)]	NumericLiteral	::=	IntegerLiteral | DecimalLiteral | DoubleLiteral
[138 (Core)]	IntegerLiteral	::=	Digits
[139 (Core)]	DecimalLiteral	::=	("." Digits) | (Digits "." [0-9]*)
[140 (Core)]	DoubleLiteral	::=	(("." Digits) | (Digits ("." [0-9]*)?)) ([e] | [E]) ([+] | [-])? Digits
[158 (Core)]	StringLiteral	::=	(["] ((" ") | [^"])* ["]) | (['] ((' ') | [^'])* ['])

Static Type Analysis

In the static semantics, the type of an integer literal is simply xs:integer:

statEnv |- IntegerLiteral : xs:integer

Dynamic Evaluation

In the dynamic semantics, an integer literal is evaluated by constructing an atomic value in the data model, which consists of the literal value and its type:

dynEnv |- IntegerLiteral => dm:atomic-value(IntegerLiteral, xs:integer)

The formal definitions of decimal, double, and string literals are analogous to those for integer.

Static Type Analysis

statEnv |- DecimalLiteral : xs:decimal

Dynamic Evaluation

dynEnv |- DecimalLiteral => dm:atomic-value(DecimalLiteral, xs:decimal)

Static Type Analysis

statEnv |- DoubleLiteral : xs:double

Dynamic Evaluation

dynEnv |- DoubleLiteral => dm:atomic-value(DoubleLiteral, xs:double)

Static Type Analysis

statEnv |- StringLiteral : xs:string

Dynamic Evaluation

dynEnv |- StringLiteral => dm:atomic-value(StringLiteral, xs:string)
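The four pairs of rules can be read together as a small classifier over the lexical form of a literal. The Python sketch below is one possible illustration: the regular expressions transcribe the numeric literal productions, a (value, type) tuple stands in for dm:atomic-value, and string literals are handled by simply stripping the enclosing quotes (escaped quotes inside strings are not modeled). The function names are hypothetical.

```python
import re
from decimal import Decimal

# (pattern, static type, converter) for each kind of numeric literal.
_NUMERIC = [
    (re.compile(r"[0-9]+"), "xs:integer", int),
    (re.compile(r"\.[0-9]+|[0-9]+\.[0-9]*"), "xs:decimal", Decimal),
    (re.compile(r"(\.[0-9]+|[0-9]+(\.[0-9]*)?)[eE][+-]?[0-9]+"), "xs:double", float),
]


def classify(literal: str):
    """Return (static type, converter) for a literal, based on its lexical form."""
    for pattern, type_name, convert in _NUMERIC:
        if pattern.fullmatch(literal):
            return type_name, convert
    if len(literal) >= 2 and literal[0] == literal[-1] and literal[0] in "\"'":
        return "xs:string", lambda text: text[1:-1]
    raise ValueError(f"not a literal: {literal!r}")


def static_type(literal: str) -> str:
    """statEnv |- Literal : Type"""
    return classify(literal)[0]


def evaluate(literal: str):
    """dynEnv |- Literal => dm:atomic-value(Literal, Type), modeled as a tuple."""
    type_name, convert = classify(literal)
    return (convert(literal), type_name)
```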

Ed. Note: MFF: Phil has noted that the data model should support primitive literals in their lexical form, in which case no explicit dynamic semantic rule would be necessary. See [Issue-0118: Data model syntax and literal values].

5.1.2 Variables

Introduction

A variable evaluates to the value to which the variable's QName is bound in the evaluation context.

Normalization

A variable is a core expression, therefore no normalization rule is required for a variable.

Static Type Analysis

In the static semantics, the type of a variable is simply its type in the static type environment statEnv.varType:

statEnv.varType(Variable) = Type

statEnv |- Variable : Type

If the variable is not bound in the environment, the system raises an error.

Dynamic Evaluation

In the dynamic semantics, a variable is evaluated by "looking up" its value in dynEnv.varValue:

dynEnv.varValue(Variable) = Value

dynEnv |- Variable => Value

If the variable is not bound in the environment, the system raises an error.
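Both variable rules amount to a lookup that fails when the variable is unbound. A minimal Python sketch, assuming that each environment group is a dictionary whose varType and varValue components are themselves dictionaries keyed by variable name (the function names are hypothetical):

```python
class XQueryError(Exception):
    """Stand-in for the error raised when a variable is unbound."""


def static_type_of_variable(stat_env: dict, variable: str):
    """statEnv.varType(Variable) = Type  gives  statEnv |- Variable : Type"""
    try:
        return stat_env["varType"][variable]
    except KeyError:
        raise XQueryError(f"unbound variable: {variable}") from None


def evaluate_variable(dyn_env: dict, variable: str):
    """dynEnv.varValue(Variable) = Value  gives  dynEnv |- Variable => Value"""
    try:
        return dyn_env["varValue"][variable]
    except KeyError:
        raise XQueryError(f"unbound variable: {variable}") from None
```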

5.1.3 Parenthesized Expressions

[57 (XQuery)]

ParenthesizedExpr

::=

"(" ExprSequence? ")"

The formal definition of parenthesized expressions is in [5.3 Sequence Expressions].

Core Grammar

The core grammar production for parenthesized expressions is:

[39 (Core)]

ParenthesizedExpr

::=

"(" ExprSequence? ")"

5.1.4 Function Calls

Introduction

A function call consists of a QName followed by a parenthesized list of zero or more expressions.

[58 (XQuery)]

FunctionCall

::=

<QName "("> (Expr ("," Expr)*)? ")"

Because [XPath/XQuery] performs implicit operations when doing function calls, a normalization step is required.

Notation

Normalization of function calls uses an auxiliary mapping rule, []_{FormalArgument}, used to map the formal arguments of a function call.

There are three variants of this mapping rule. If the required type of the formal argument is an optional atomic type, the following rule is applied: (Type is the required atomic type.)

[Expr]_{FormalArgument}

[Expr]_{Atomize(Type)}

If the required type of the formal argument is a sequence of atomic types, the following rule is applied: (Type is the atomic type of the sequence.)

[Expr]_{FormalArgument}

for $x in (Expr) return

typeswitch ($x)

case node $v return xf:data($v)

default $v return $v

Ed. Note: Jerome: it is not clear whether the above rule is correct. At least type promotion seems to be implied by the corresponding description in the [XPath/XQuery] document. See .

If the required type of the formal argument is neither an optional atomic type nor a sequence of atomic types, the expression is simply mapped to its corresponding core expression:

[Expr]_{FormalArgument}

[Expr]_Expr

Normalization

First, each argument in a function call is normalized to its corresponding core expression by applying []_{FormalArgument}.

[ QName (Expr₁, ..., Expr_n) ]_Expr

[[QName] ( [[Expr₁ ]_Expr]_{FormalArgument}, ..., [[ Expr_n ]_Expr]_{FormalArgument})]_{FormalArgument}

Note that this normalization rule depends on the static environment containing function signatures and is the only place where we exploit the implicit presence of statEnv.

Core Grammar

The core grammar production for function calls is:

[40 (Core)]

FunctionCall

::=

<QName "("> (Expr ("," Expr)*)? ")"

Static Type Analysis

Based on the function's name and number of arguments (and in the case of built-in functions based also on the type of its arguments), the function signature is retrieved from the static environment. If the function is not present in the environment, an error is raised. The type of each actual argument to the function must be a subtype of the corresponding formal argument to the function, i.e., it is not necessary that the actual and formal types be the same.

Ed. Note: Issue: Michael Rys points out there is a dependency between normalization (which requires the function signature) and the fact that built-in functions may be overloaded for various types. See [Issue-0140: Dependency in normalization and function resolution]

statEnv |- QName => qname

statEnv.funcType(qname, n) = define function qname(Type₁, ..., Type_n) returns Type

statEnv |- Expr₁ : Type₁' Type₁' <: Type₁

...

statEnv |- Expr_n : Type_n' Type_n' <: Type_n

statEnv |- QName (Expr₁, ..., Expr_n) : Type

Dynamic Evaluation

Based on the function's name and the number of arguments (and, in the case of built-in functions, also on the type of its arguments), the function body is retrieved from the dynamic environment. If the function is not present in the environment, an error is raised. The rule first evaluates each function argument, then extends dynEnv.varValue by binding each formal variable to its corresponding value, and then evaluates the body of the function in the new environment. The resulting value is the value of the function call.

dynEnv |- Expr₁ => Value₁ ... dynEnv |- Expr_n => Value_n

dynEnv |- QName => qname

statEnv.funcType(qname, n) = define function qname(Type₁, ..., Type_n) returns Type

statEnv |- Value₁ matches Type₁ ..., Value_n matches Type_n

dynEnv.funcDefn(qname, n) = (Expr, Variable₁, ... , Variable_n)

dynEnv [ varValue = (Variable₁ |-> Value₁;...; Variable_n |-> Value_n) ] |- Expr => Value

statEnv |- Value matches Type

dynEnv |- QName ( Expr₁, ..., Expr_n ) => Value

Note that the function body is evaluated in the default environment. Note also that input values and output values are matched against the types declared for the function. If static analysis was performed, all these checks are guaranteed to be true and may be omitted.
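The dynamic rule for user-defined functions can be traced step by step in the following Python sketch. The modeling choices are assumptions made for illustration: expressions are Python callables that take a dynamic environment and return a value, types are predicates standing for the "matches" judgment, and environments are dictionaries keyed by (name, arity) pairs. As in the rule, the body is evaluated with varValue replaced by the parameter bindings, so variables of the caller are not visible inside the body.

```python
class XQueryError(Exception):
    """Stand-in for a dynamic error."""


def call_function(stat_env: dict, dyn_env: dict, qname: str, arg_exprs: list):
    # dynEnv |- Expr_i => Value_i
    values = [expr(dyn_env) for expr in arg_exprs]

    # Look up the signature and the definition by name and arity.
    key = (qname, len(values))
    if key not in stat_env["funcType"] or key not in dyn_env["funcDefn"]:
        raise XQueryError(f"unknown function {qname}#{len(values)}")
    param_types, return_type = stat_env["funcType"][key]
    body, params = dyn_env["funcDefn"][key]

    # statEnv |- Value_i matches Type_i
    for value, matches in zip(values, param_types):
        if not matches(value):
            raise XQueryError("argument does not match its declared type")

    # dynEnv [ varValue = (Variable_1 |-> Value_1; ...) ] |- Expr => Value
    body_env = {**dyn_env, "varValue": dict(zip(params, values))}
    result = body(body_env)

    # statEnv |- Value matches Type
    if not return_type(result):
        raise XQueryError("result does not match the declared return type")
    return result
```

When static analysis has already been performed, the two matching checks are redundant and an implementation could omit them, as noted above.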

Ed. Note: This only describes the semantics of user defined functions. The semantics of built-in functions is still an open issue. See [Issue-0122: Overloaded functions] and [Issue-0162: How to describe the semantics of built-in functions].

5.1.5 Comments

[94 (XQuery)]

ExprComment

::=

"{--" [^}]* "--}"

Comments are lexical constructs only, and have no meaning within the query and therefore have no Formal Semantics.

5.2 Path Expressions

Introduction

Path expressions are used to locate nodes within a tree. There are two kinds of path expressions, absolute path expressions and relative path expressions. An absolute path expression is a rooted relative path expression. A relative path expression is composed of a sequence of steps.

[23 (XQuery)]	PathExpr	::=	("/" RelativePathExpr?) | ("//" RelativePathExpr) | RelativePathExpr
[24 (XQuery)]	RelativePathExpr	::=	StepExpr (("/" | "//") StepExpr)*

Core Grammar

The core grammar productions for path expressions are:

[13 (Core)]	PathExpr	::=	RelativePathExpr
[14 (Core)]	RelativePathExpr	::=	StepExpr

Notation

To define the semantics of relative path expressions, the auxiliary mapping rule is used: [RelativePathExpr]_Path.

Normalization

Absolute path expressions are path expressions starting with the / symbol, indicating that the expression must be applied on the root node in the current context. Remember that the root node in the current context is the topmost ancestor of the context node. The following two rules are used to normalize absolute path expressions to relative ones, and rely on the use of the xf:root() function, which computes the document node from the context node.

["/"]_Expr

xf:root( $fs:dot )

["/" RelativePathExpr]_Expr

fs:distinct-doc-order( [xf:root( $fs:dot ) "/" RelativePathExpr]_Path )

Ed. Note: Some of the semantics of the root expression '/' are still undefined. For instance, what should the semantics of '/' be in case of a document fragment (e.g., created using XQuery element constructor). See [Issue-0123: Semantics of /].

Ed. Note: [Kristoffer/XSL] Also, the context node and document order are not defined when it is a context item.

In general, a path expression always returns a sequence in document order. The complete normalization of a Path expression is obtained by applying the Path normalization and then sorting the result in document order and removing duplicates.

[RelativePathExpr]_Expr

fs:distinct-doc-order( [RelativePathExpr]_Path )

Ed. Note: Jerome: the restriction that XPath expressions operate on nodes here seems too strict and is still an open issue. See [Issue-0125: Operations on node only in XPath].

A composite relative path expression (using /) is normalized into a for expression by concatenating the sequences obtained by mapping each node of the left-hand side in document order to the sequence it generates on the right-hand side; the encapsulating call of the fs:distinct-doc-order function then ensures that the result is in document order without duplicates. The evaluation context is carefully set up (using two bindings of the $fs:last semantic variable).

[StepExpr "/" RelativePathExpr ]_Path

let $fs:sequence := fs:distinct-doc-order( [StepExpr]_Path ) return

let $fs:last := xf:count($fs:sequence) return

for $fs:dot in $fs:sequence return

let $fs:position := op:index-of($fs:sequence, $fs:dot) return

[RelativePathExpr]_Path
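The shape of this normalization can be illustrated with a small Python model. Everything in it is an assumption of the sketch rather than part of the semantics: nodes carry an integer order field that stands for document order, a step is a function from a context dictionary (with keys dot, position and last) to a list of nodes, and a path is a non-empty list of such steps processed right-recursively.

```python
class Node:
    """Toy node; 'order' stands for the node's position in document order."""

    def __init__(self, order, parent=None):
        self.order = order
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def distinct_doc_order(nodes):
    """fs:distinct-doc-order: remove duplicates, then sort in document order."""
    unique = {id(node): node for node in nodes}
    return sorted(unique.values(), key=lambda node: node.order)


def path(steps, ctx):
    """[StepExpr "/" RelativePathExpr]_Path, applied right-recursively."""
    first, rest = steps[0], steps[1:]
    if not rest:
        return first(ctx)
    sequence = distinct_doc_order(first(ctx))        # $fs:sequence
    last = len(sequence)                             # $fs:last
    result = []
    for dot in sequence:                             # for $fs:dot in $fs:sequence
        position = sequence.index(dot) + 1           # op:index-of, 1-based
        result.extend(path(rest, {"dot": dot, "position": position, "last": last}))
    return result


def relative_path(steps, ctx):
    """[RelativePathExpr]_Expr: the enclosing fs:distinct-doc-order call."""
    return distinct_doc_order(path(steps, ctx))
```

In this model a child step followed by a parent step yields the context node once per child before the outer call removes the duplicates, which is why the enclosing fs:distinct-doc-order is needed.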

Ed. Note: [Kristoffer/XSL] We have defined normalization over a right recursive variant of the syntax production -- this should perhaps be explained more elaborately.

5.2.1 Steps

Introduction

[25 (XQuery)]	StepExpr	::=	(ForwardStep | ReverseStep | PrimaryExpr) Predicates
[50 (XQuery)]	ForwardStep	::=	(ForwardAxis NodeTest) | AbbreviatedForwardStep
[51 (XQuery)]	ReverseStep	::=	(ReverseAxis NodeTest) | AbbreviatedReverseStep
[54 (XQuery)]	Predicates	::=	("[" Expr "]")*

Core Grammar

The core grammar productions for XPath steps are:

[15 (Core)]	StepExpr	::=	ForwardStep | ReverseStep | PrimaryExpr
[35 (Core)]	ForwardStep	::=	ForwardAxis NodeTest
[36 (Core)]	ReverseStep	::=	ReverseAxis NodeTest

Notation

Step expressions can be followed by predicates. Normalization of predicates uses the following auxiliary mapping rule: []_Predicates.

Normalization

Normalization of predicates needs to distinguish between forward and reverse axes.

As explained in the [XPath/XQuery] document, applying a step in XPath changes the focus (or context). The change of focus is made explicit by the normalization rule below, which binds the variable $fs:dot to the node currently being processed, and the variable $fs:position to the position (i.e., the position within the input sequence) of that node. Notice that the expression does not reorder the sequences obtained by evaluating the subexpressions: if they are RelativePathExprs then the contained path normalization will insert fs:distinct-doc-order appropriately.

[ForwardStep Predicates "[" Expr "]"]_Path

let $fs:sequence := [ForwardStep Predicates]_Path return

let $fs:last := xf:count($fs:sequence) return

for $fs:position in op:to(1, $fs:last) return

let $fs:dot := op:item-at($fs:sequence, $fs:position) return

if [Expr]_Predicates then $fs:dot else ()

[ReverseStep Predicates "[" Expr "]"]_Path

let $fs:sequence := [ReverseStep Predicates]_Path return

let $fs:last := xf:count($fs:sequence) return

for $fs:position in op:to($fs:last,1) return

let $fs:dot := op:item-at($fs:sequence, $fs:position) return

if [Expr]_Predicates then $fs:dot else ()
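The two predicate rules differ only in the order in which positions are enumerated. The Python sketch below mirrors them literally, under two assumptions: a predicate is modeled as a function of (dot, position, last) returning a boolean, and op:to($fs:last, 1) is taken to enumerate positions in descending order, as the reverse rule appears to intend. The function name apply_predicate is hypothetical.

```python
def apply_predicate(sequence, predicate, reverse=False):
    """Filter 'sequence' with 'predicate', binding dot, position and last.

    reverse=False mirrors the ForwardStep rule, reverse=True the ReverseStep rule.
    """
    last = len(sequence)                                  # $fs:last
    if reverse:
        positions = range(last, 0, -1)                    # op:to($fs:last, 1)
    else:
        positions = range(1, last + 1)                    # op:to(1, $fs:last)
    result = []
    for position in positions:                            # $fs:position
        dot = sequence[position - 1]                      # op:item-at is 1-based
        if predicate(dot, position, last):
            result.append(dot)
    return result
```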

The []_Predicates function is specified in [5.2.2 Predicates].

Ed. Note: Issue: These rules use the function xf:index-of to bind the position of a node in a sequence. This works since path expressions only work on nodes. See [Issue-0125: Operations on node only in XPath]. Still, it might be preferable to be able to bind both the current node and its position through a single processing of the collection within the for expression. The FS editors have made several syntax proposals for such an operation. For instance: for $v at $i in E1 return E2. See [Issue-0124: Binding position in FLWR expressions].

Ed. Note: Issue: The semantics of Predicates applied to primary expressions is not specified. See [Issue-0163: Normalization of XPath predicates]

5.2.1.1 Axes

Introduction

[40 (XQuery)]	ForwardAxis	::=	<"child" "::"> | <"descendant" "::"> | <"attribute" "::"> | <"self" "::"> | <"descendant-or-self" "::">
[41 (XQuery)]	ReverseAxis	::=	<"parent" "::">
[36 (XPath)]	ForwardAxis	::=	<"child" "::"> | <"descendant" "::"> | <"attribute" "::"> | <"self" "::"> | <"descendant-or-self" "::"> | <"following-sibling" "::"> | <"following" "::"> | <"namespace" "::">
[37 (XPath)]	ReverseAxis	::=	<"parent" "::"> | <"ancestor" "::"> | <"preceding-sibling" "::"> | <"preceding" "::"> | <"ancestor-or-self" "::">

Core Grammar

The core grammar productions for XPath axes are:

[25 (Core)]	ForwardAxis	::=	<"child" "::"> \| <"descendant" "::"> \| <"attribute" "::"> \| <"self" "::"> \| <"descendant-or-self" "::"> \| <"following-sibling" "::"> \| <"following" "::"> \| <"namespace" "::">
[26 (Core)]	ReverseAxis	::=	<"parent" "::"> \| <"ancestor" "::"> \| <"preceding-sibling" "::"> \| <"preceding" "::"> \| <"ancestor-or-self" "::">

Notation

Some auxiliary grammar productions and judgments are introduced to structure the semantic rules for axes and node tests. The first is used to capture the principal node kind associated with each axis:

[51 (Formal)]	PrincipalNodeKind	::=	"element" \| "attribute" \| "namespace"

Notation

The principal node kind for each axis is specified using the following judgment.

Axis principal PrincipalNodeKind

Notation

The following auxiliary grammar productions define Filters, which denote either an axis, or a node test in the context of a principal node kind. An Axis merely combines the forward and reverse axes to allow judgments that range over both.

[49 (Formal)]	Axis	::=	ForwardAxis \| ReverseAxis
[50 (Formal)]	Filter	::=	Axis \| (PrincipalNodeKind NodeTest)

Notation

The semantics of filters is defined with three auxiliary judgments. The first judgment defines the effect of a filter on a value, and is used in the dynamic semantics. This judgment should be read as follows: in the current context (i.e., with respect to the dynamic environment dynEnv.varValue), applying the Filter on Value₁ yields Value₂:

dynEnv |- Filter on Value₁ => Value₂

The second judgment defines the effect of a filter (either an axis or a node test) on a type, and is used in the static semantics. This judgment should be read as follows: in the current context (i.e., with respect to the static environment statEnv.varType), applying the filter Filter on type Type₁ yields the type Type₂:

statEnv |- Filter on Type₁ : Type₂

The last judgment allows an additional type environment, localTypeEnv. This "local" type environment maps variables to types (in the same way as statEnv.varType), and is used when computing the static type of the descendant axis.

statEnv ; localTypeEnv |- Filter on Type₁ : Type₂

These judgments will be defined separately for axes and node tests below.

Ed. Note: Explain the judgments in detail, maybe similarly to in section 3.2.

Static Type Analysis

The static semantics of an Axis NodeTest pair is obtained by retrieving the type of the context node, and applying the two filters (the Axis, and then the NodeTest with a PrincipalNodeKind) on the result.

statEnv.varType($fs:dot) = Type₁

statEnv |- Axis on Type₁ : Type₂

Axis principal PrincipalNodeKind

statEnv |- PrincipalNodeKind, NodeTest on Type₂ : Type₃

statEnv |- Axis NodeTest : Type₃

Here is the common set of inference rules defining the static semantics of filters. Essentially, these rules process complex types through a filter by breaking each type down to the type of a node or of a value. The semantics of a filter applied to a node or value type is then specific to each Axis and NodeTest.

statEnv |- Filter on () : ()

statEnv |- Filter on none : none

statEnv |- Filter on Type₁ : Type₃

statEnv |- Filter on Type₂ : Type₄

statEnv |- Filter on Type₁,Type₂ : Type₃,Type₄

statEnv |- Filter on Type₁ : Type₃

statEnv |- Filter on Type₂ : Type₄

statEnv |- Filter on Type₁|Type₂ : Type₃|Type₄

statEnv |- Filter on Type₁ : Type₃

statEnv |- Filter on Type₂ : Type₄

statEnv |- Filter on Type₁&Type₂ : Type₃&Type₄

The cases for specific axis and node test filters are given in the axis and node test sections below.
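The common static rules above amount to a structural recursion over the type constructors. The following Python sketch illustrates that reading under stated assumptions: the classes Empty, NoneT, Binary and Leaf are a hypothetical encoding of types, and leaf_rule stands for whichever axis-specific or node-test-specific rule applies to a node or value type.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass(frozen=True)
class Empty:          # the type ()
    pass

@dataclass(frozen=True)
class NoneT:          # the type none
    pass

@dataclass(frozen=True)
class Leaf:           # a node or value type, identified by a label
    name: str

@dataclass(frozen=True)
class Binary:         # op is "," (sequence), "|" (choice) or "&" (all)
    op: str
    left: "Type"
    right: "Type"

Type = Union[Empty, NoneT, Leaf, Binary]

def filter_type(leaf_rule: Callable[[Leaf], Type], t: Type) -> Type:
    """Push a filter through a type, keeping the type constructor."""
    if isinstance(t, (Empty, NoneT)):
        return t                              # Filter on () : ()  and  none : none
    if isinstance(t, Binary):
        return Binary(t.op,
                      filter_type(leaf_rule, t.left),
                      filter_type(leaf_rule, t.right))
    return leaf_rule(t)                       # axis- or node-test-specific case

def keep_elements(leaf: Leaf) -> Type:
    # Hypothetical leaf rule: element types pass, everything else becomes ().
    return leaf if leaf.name.startswith("element") else Empty()

example = Binary(",", Leaf("element a"), Binary("|", Leaf("text"), Empty()))
filtered = filter_type(keep_elements, example)
```

The sketch does not simplify the resulting type; for instance a choice of two empty types is left as is.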

Dynamic Evaluation

The dynamic semantics of an Axis NodeTest pair is obtained by retrieving the context node, and applying the two filters (Axis, then NodeTest) on the result. The application of each filter is expressed through the filter judgment as follows.

dynEnv.varValue($fs:dot) = Value₁

dynEnv |- Axis on Value₁ => Value₂

Axis principal PrincipalNodeKind

dynEnv |- PrincipalNodeKind, NodeTest on Value₂ => Value₃

dynEnv |- Axis NodeTest => Value₃

Here is the common set of inference rules defining the dynamic semantics of filters. These rules apply a filter to a sequence by breaking the sequence down into individual nodes or values.

dynEnv |- Filter on () => ()

dynEnv |- Filter on Value₁ => Value₃

dynEnv |- Filter on Value₂ => Value₄

dynEnv |- Filter on Value₁,Value₂ => Value₃,Value₄

Ed. Note: Jerome: These rules feature a notational abuse! Value₁,Value₂ indicates the concatenation of Value₁ and Value₂, and is used both for constructing a new sequence and for deconstructing it. When doing the construction, it is equivalent to applying the op:concatenate operator. The Formal Semantics specification should use more consistent notation for data model construction and deconstruction. See also [Issue-0118: Data model syntax and literal values].

The semantics of a filter applied to a specific axis or nodetest is given in the appropriate section below.
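As an illustration of the generic dynamic rules, the following Python sketch applies a per-item rule to every item of a sequence and concatenates the results. It is a sketch under assumptions: sequences are modeled as Python lists, the sequence is always split into its first item and the remainder (the rules allow any split), and item_rule is a hypothetical stand-in for an axis or node test rule.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def filter_value(item_rule: Callable[[T], List[T]], value: List[T]) -> List[T]:
    """Apply `item_rule` to each item and concatenate the resulting sequences."""
    if not value:                      # Filter on () => ()
        return []
    head, tail = value[0], value[1:]   # deconstruct Value1,Value2
    return item_rule(head) + filter_value(item_rule, tail)

# A rule that keeps or drops an item, as a node test does.
evens = filter_value(lambda n: [n] if n % 2 == 0 else [], [1, 2, 3, 4])

# A rule that returns several items per input, as an axis does.
expanded = filter_value(lambda n: [n, n * 10], [1, 2])
```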

Normalization

The semantics of the following(-sibling) and preceding(-sibling) axes are expressed by mapping them to core expressions; all other axes are part of core [XPath/XQuery] and are therefore left unchanged by normalization.

[following-sibling:: NodeTest]_Path

typeswitch ($fs:dot)

case attribute $v return ()

default $v return

(for $fs:new in (for $fs:dot in (for $fs:dot in $fs:dot return parent::node() ) return child::NodeTest) return

if dm:node-before($fs:dot,$fs:new) then $fs:new else ())

Ed. Note: Should all mappings be unfolded all the way to the core language like the above one?

[following:: NodeTest]_Path

[ancestor-or-self::node()/following-sibling::node()/descendant-or-self::NodeTest]_Path

Otherwise, the forward axis is part of the core [XPath/XQuery] and is handled by the Axis/NodeTest semantics below:

[child:: NodeTest]_Path

child:: NodeTest

[attribute:: NodeTest]_Path

attribute:: NodeTest

[self:: NodeTest]_Path

self:: NodeTest

[descendant:: NodeTest]_Path

descendant:: NodeTest

[descendant-or-self:: NodeTest]_Path

descendant-or-self:: NodeTest

[namespace:: NodeTest]_Path

namespace:: NodeTest

Reverse axes:

[preceding-sibling:: NodeTest]_Path

for $fs:new in (for $fs:dot in (for $fs:dot in $fs:dot return parent::node()) return child::NodeTest) return

if dm:node-before($fs:new,$fs:dot) then $fs:new else ()

[preceding:: NodeTest]_Path

[ancestor-or-self::node()/preceding-sibling::node()/descendant-or-self::NodeTest]_Path
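The two sibling mappings can be illustrated with a small tree model. The Python sketch below is an assumption-laden illustration: the Node class is hypothetical, document order is taken to be pre-order over the children lists, node_before stands in for dm:node-before, and the node test is fixed to node().

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)  # nodes compare by identity
class Node:
    name: str
    kind: str = "element"
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children.append(child)
        return child

def document_order(node: Node) -> List[Node]:
    out = [node]
    for child in node.children:
        out.extend(document_order(child))
    return out

def node_before(a: Node, b: Node) -> bool:
    root = a
    while root.parent is not None:
        root = root.parent
    order = document_order(root)
    return order.index(a) < order.index(b)

def following_sibling(dot: Node) -> List[Node]:
    # typeswitch: attributes have no following siblings
    if dot.kind == "attribute" or dot.parent is None:
        return []
    return [new for new in dot.parent.children if node_before(dot, new)]

def preceding_sibling(dot: Node) -> List[Node]:
    if dot.parent is None:
        return []
    return [new for new in dot.parent.children if node_before(new, dot)]

root = Node("root")
a, b, c = root.add("a"), root.add("b"), root.add("c")
after_b = [n.name for n in following_sibling(b)]
before_b = [n.name for n in preceding_sibling(b)]
```

As in the normalization rules, both functions start from the children of the parent and differ only in the direction of the document-order comparison.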

Otherwise, the reverse axis is part of the core [XPath/XQuery] and is handled by the Axis/NodeTest semantics below.

[parent:: NodeTest]_Path

parent:: NodeTest

[ancestor:: NodeTest]_Path

ancestor:: NodeTest

[ancestor-or-self:: NodeTest]_Path

ancestor-or-self:: NodeTest

Finally, the principal node kind of each axis is enumerated by the following rules (given here because they are used both by the static and dynamic semantic rules).

attribute:: principal attribute

namespace:: principal namespace

Axis != attribute:: Axis != namespace::

Axis principal element
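Read as a function, the three rules above map an axis to its principal node kind. A minimal Python sketch follows; the function name is an assumption, and axis names are written without the trailing "::".

```python
def principal_node_kind(axis: str) -> str:
    """Return the principal node kind of an axis, following the rules above."""
    if axis == "attribute":
        return "attribute"
    if axis == "namespace":
        return "namespace"
    return "element"      # every other axis

kinds = {axis: principal_node_kind(axis)
         for axis in ("child", "attribute", "namespace", "ancestor")}
```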

Static Type Analysis

The following rules define the static semantics of the filter judgment when applied to an Axis.

The type for the self axis is the same type as the type of the context node.

statEnv |- self:: on NodeType : NodeType

In case of elements, the type of the child axis is obtained by extracting the children types out of the content model describing the type of the context node.

statEnv |- child:: on element qname { AttrType, ChildType } : ChildType

Ed. Note: [Kristoffer/XSL] Need to add filters for all cases.

The type of the attribute axis is obtained by extracting the attribute types out of the content model describing the type of the context node.

statEnv |- attribute:: on element qname { AttrType, ChildType } : AttrType

Ed. Note: [Kristoffer/XSL] Need to add filters for all cases.

The type for the parent axis is either an element, a document (for document elements), or empty.

statEnv |- parent:: on element : (element|document)?

Ed. Note: Better typing for the parent axis is still an open issue. See [Issue-0080: Typing of parent].

The type for the namespace axis is always empty.

statEnv |- namespace:: on NodeType : ()

Ed. Note: Jerome: The type of namespace nodes is still an open issue. See [Issue-0105: Types for nodes in the data model.].

The types for the descendant, descendant-or-self, ancestor, and ancestor-or-self axes are computed through recursive application of the child and parent filters. The corresponding inference rules use the auxiliary filter judgment with a local type environment. This local type environment is used to keep track of the visited elements in order to deal with recursive types.

statEnv ; { } |- descendant:: on Type : Type'

statEnv |- descendant:: on Type : Type'

The following two rules are used to deal with global elements. Global elements need careful treatment here in order to deal with possible recursion. Notably, if the global element has been seen before, then the rule terminates and returns the union of all types already visited in the local environment.

localTypeEnv(qname) = Type

localTypeEnv = { Type₁, Type₂, ... }

statEnv |- descendant:: on element qname : (Type₁|Type₂|...)*

localTypeEnv(qname) is an error

statEnv.elemDecl(qname) = define element qname TypeSpecifier

TypeSpecifier resolves to TypeName { Type }

statEnv ; localTypeEnv[qname : Type] |- descendant:: on Type : Type'

statEnv ; localTypeEnv |- descendant:: on element qname : Type'

Ed. Note: Peter: I need yet to figure out, whether this works, there still appear a few glitches. See [Issue-0132: Typing for descendant].

In all other cases, the following rules apply:

statEnv |- child:: on NodeType : Type₁

statEnv |- descendant:: on Type₁ : Type₂

statEnv |- descendant:: on NodeType : Type₁,Type₂

statEnv |- self:: on NodeType : Type₁

statEnv |- descendant:: on Type₁ : Type₂

statEnv |- descendant-or-self:: on NodeType : Type₁,Type₂
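The role of the local type environment in guarding against recursive element declarations can be illustrated as follows. This Python sketch is a loose approximation rather than a transcription of the rules: schema is a hypothetical dictionary from element names to the names of their child elements, the result is a set of element names instead of a type, and the visited set plays the part of localTypeEnv.

```python
from typing import Dict, FrozenSet, List, Set

def descendant_names(schema: Dict[str, List[str]],
                     qname: str,
                     visited: FrozenSet[str] = frozenset()) -> Set[str]:
    """Collect the names of all elements that may occur below `qname`."""
    if qname in visited:
        return set()                    # seen before: stop the recursion
    visited = visited | {qname}         # extend the local environment
    result: Set[str] = set()
    for child in schema.get(qname, []):
        result.add(child)               # contribution of the child axis
        result |= descendant_names(schema, child, visited)
    return result

# "section" is recursive: a section may contain sections.
schema = {
    "section": ["title", "section", "para"],
    "title": [],
    "para": ["em"],
    "em": [],
}
below_section = descendant_names(schema, "section")
```

Without the visited check, the recursive declaration of section would make the computation loop forever.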

The type for the ancestor axis is the element*,document? type indicating that it includes elements and possibly a document element.

statEnv |- ancestor:: on NodeType : element*,document?

statEnv |- self:: on NodeType : Type₁

statEnv |- ancestor:: on Type₁ : Type₂

statEnv |- ancestor-or-self:: on NodeType : Type₁, Type₂

In all the other cases, the filter application results in an empty type.

statEnv |- AxisName:: on NodeType : () otherwise.

Ed. Note: KHR: "otherwise" rules should be avoided and specify the cases exhaustively.

Dynamic Evaluation

The inference rules below indicate for each axis whether a particular node should be considered further or whether the node should be left out. Therefore, each rule either returns the input node, or returns the empty sequence. The overall result is obtained by putting together all of the remaining nodes, following the generic semantics for filters.

The self axis just returns the context node.

dynEnv |- self:: on NodeValue => NodeValue

The child, parent, attribute and namespace axes are implemented through their corresponding accessors in the [XQuery 1.0 and XPath 2.0 Data Model].

dynEnv |- child:: on NodeValue => dm:children(NodeValue)

dynEnv |- attribute:: on NodeValue => dm:attributes(NodeValue)

dynEnv |- parent:: on NodeValue => dm:parent(NodeValue)

The descendant, descendant-or-self, ancestor, and ancestor-or-self axes are implemented through recursive application of the child and parent filters.

dynEnv |- child:: on NodeValue => Value₁

dynEnv |- descendant:: on Value₁ => Value₂

dynEnv |- descendant:: on NodeValue => fs:distinct-doc-order(op:concatenate(Value₁, Value₂))

dynEnv |- self:: on NodeValue => Value₁

dynEnv |- descendant:: on Value₁ => Value₂

dynEnv |- descendant-or-self:: on NodeValue => op:concatenate(Value₁, Value₂)

dynEnv |- parent:: on NodeValue => Value₁

dynEnv |- ancestor:: on Value₁ => Value₂

dynEnv |- ancestor:: on NodeValue => op:concatenate(Value₁, Value₂)

dynEnv |- self:: on NodeValue => Value₁

dynEnv |- ancestor:: on Value₁ => Value₂

dynEnv |- ancestor-or-self:: on NodeValue => op:concatenate(Value₁, Value₂)

In all the other cases, the filter application results in an empty sequence.

dynEnv |- AxisName:: on NodeValue => () otherwise.
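The recursive axis rules can be illustrated on a small tree. The Python sketch below makes several assumptions: the Node class is hypothetical, child and parent stand in for dm:children and dm:parent, nodes are created in document order so that a creation counter can serve as the ordering, and distinct_doc_order is a stand-in for fs:distinct-doc-order.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional

_creation = itertools.count()

@dataclass(eq=False)  # nodes compare by identity
class Node:
    name: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    order: int = field(default_factory=lambda: next(_creation))

    def add(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children.append(child)
        return child

def distinct_doc_order(nodes: List[Node]) -> List[Node]:
    seen, out = set(), []
    for n in sorted(nodes, key=lambda n: n.order):
        if id(n) not in seen:
            seen.add(id(n))
            out.append(n)
    return out

def child(n: Node) -> List[Node]:
    return list(n.children)

def parent(n: Node) -> List[Node]:
    return [n.parent] if n.parent is not None else []

def descendant(n: Node) -> List[Node]:
    v1 = child(n)
    v2 = [d for c in v1 for d in descendant(c)]   # descendant:: on Value1
    return distinct_doc_order(v1 + v2)

def ancestor(n: Node) -> List[Node]:
    v1 = parent(n)
    v2 = [a for p in v1 for a in ancestor(p)]     # ancestor:: on Value1
    return v1 + v2

root = Node("root")
a = root.add("a")
a1 = a.add("a1")
b = root.add("b")
below_root = [n.name for n in descendant(root)]
above_a1 = [n.name for n in ancestor(a1)]
```

Plain concatenation would give a, b, a1 for the descendants of root; the reordering step restores document order.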

Ed. Note: Peter: Generally steps may operate on nodes and values alike; the axis rules only can operate on nodes (NodeValue). Is it a dynamic error to apply an axis rule on a value? See [Issue-0125: Operations on node only in XPath].

5.2.1.2 Node Tests

Introduction

A node test is a condition applied on the nodes selected by an axis step. Possible node tests are described by the following grammar productions.

[42 (XQuery)]	NodeTest	::=	KindTest \| NameTest
[43 (XQuery)]	NameTest	::=	QName \| Wildcard
[44 (XQuery)]	Wildcard	::=	"*" \| <NCName ":" "*"> \| <"*" ":" NCName>
[45 (XQuery)]	KindTest	::=	ProcessingInstructionTest \| CommentTest \| TextTest \| AnyKindTest
[46 (XQuery)]	ProcessingInstructionTest	::=	<"processing-instruction" "("> StringLiteral? ")"
[47 (XQuery)]	CommentTest	::=	<"comment" "("> ")"
[48 (XQuery)]	TextTest	::=	<"text" "("> ")"
[49 (XQuery)]	AnyKindTest	::=	<"node" "("> ")"

Core Grammar

The core grammar productions for XPath node tests are:

[27 (Core)]	NodeTest	::=	KindTest \| NameTest
[28 (Core)]	NameTest	::=	QName \| Wildcard
[29 (Core)]	Wildcard	::=	"*" \| <NCName ":" "*"> \| <"*" ":" NCName>
[30 (Core)]	KindTest	::=	ProcessingInstructionTest \| CommentTest \| TextTest \| AnyKindTest
[31 (Core)]	ProcessingInstructionTest	::=	<"processing-instruction" "("> StringLiteral? ")"
[32 (Core)]	CommentTest	::=	<"comment" "("> ")"
[33 (Core)]	TextTest	::=	<"text" "("> ")"
[34 (Core)]	AnyKindTest	::=	<"node" "("> ")"

Static Type Analysis

The following typing rules apply to filters that are node tests in the context of a principal node kind.

Node tests on elements and attributes always compute the most specific type possible. For example, if $v is bound to an element with a computed name, the type of $v is element { Type }. The static type computed for the expression $v/self::foo is element foo {Type}, which makes use of the name test foo to arrive at a more specific type. Also note that each case of name matching restricts the principal node kind to the appropriate one.

Ed. Note: [Kristoffer/XSL] Perhaps use statEnv |- expand(QName) = qname(URI, ncname) instead of qname = prefix:local? Also The ElemOrAttrType doesn't allow wildcard in the ElementName or AttributeName, so either the rules can be removed or the definition should be updated.

QName₂ = Prefix₂:LocalPart₂

xf:get-namespace-uri( QName₁) = statEnv.namespace(Prefix₂)

xf:get-local-name( QName₁ ) = LocalPart₂

statEnv |- element, QName₂ on element QName₁ {Type} : element QName₁ {Type}

QName₂ = Prefix₂:LocalPart₂

LocalPart₁ = LocalPart₂

statEnv |- element, QName₂ on element *:LocalPart₁ {Type} : element QName₂ {Type}

QName₂ = Prefix₂:LocalPart₂

statEnv.namespace(Prefix₁) = statEnv.namespace(Prefix₂)

statEnv |- element, QName₂ on element Prefix₁:* {Type} : element Prefix₁:LocalPart₂{Type}

statEnv |- element, QName₂ on element {Type} : element QName₂ {Type}

xf:get-local-name( QName₁ ) = LocalPart₂

statEnv |- element, *:LocalPart₂ on element QName₁ {Type} : element QName₁ {Type}

LocalPart₁ = LocalPart₂

statEnv |- element, *:LocalPart₂ on element *:LocalPart₁ {Type} : element *:LocalPart₂ {Type}

statEnv |- element, *:LocalPart₂ on element Prefix₁:* {Type} : element Prefix₁:LocalPart₂{Type}

statEnv |- element, *:LocalPart₂ on element { Type } : element *:LocalPart₂ {Type}

xf:get-namespace-uri( QName₁) = statEnv.namespace(Prefix₂)

statEnv |- element, Prefix₂:* on element QName₁ {Type} : element QName₁ {Type}

statEnv |- element, Prefix₂:* on element *:LocalPart₁ {Type} : element Prefix₂:LocalPart₁ {Type}

statEnv.namespace(Prefix₁) = statEnv.namespace(Prefix₂)

statEnv |- element, Prefix₂:* on element Prefix₁:* {Type} : element Prefix₁:* {Type}

statEnv |- element, Prefix₂:* on element {Type} : element Prefix₂:* {Type}

statEnv |- element, * on element prefix:local {Type} : element prefix:local {Type}

Very similar typing rules apply to attributes:

QName₂ = Prefix₂:LocalPart₂

xf:get-namespace-uri( QName₁) = statEnv.namespace(Prefix₂)

xf:get-local-name( QName₁ ) = LocalPart₂

statEnv |- attribute, QName₂ on attribute QName₁ {Type} : attribute QName₁ {Type}

QName₂ = Prefix₂:LocalPart₂

LocalPart₁ = LocalPart₂

statEnv |- attribute, QName₂ on attribute *:LocalPart₁ {Type} : attribute QName₂ {Type}

QName₂ = Prefix₂:LocalPart₂

statEnv.namespace(Prefix₁) = statEnv.namespace(Prefix₂)

statEnv |- attribute, QName₂ on attribute Prefix₁:* {Type} : attribute Prefix₁:LocalPart₂{Type}

statEnv |- attribute, QName₂ on attribute {Type} : attribute QName₂ {Type}

xf:get-local-name( QName₁ ) = LocalPart₂

statEnv |- attribute, *:LocalPart₂ on attribute QName₁ {Type} : attribute QName₁ {Type}

LocalPart₁ = LocalPart₂

statEnv |- attribute, *:LocalPart₂ on attribute *:LocalPart₁ {Type} : attribute *:LocalPart₂ {Type}

statEnv |- attribute, *:LocalPart₂ on attribute Prefix₁:* {Type} : attribute Prefix₁:LocalPart₂{Type}

statEnv |- attribute, *:LocalPart₂ on attribute {Type} : attribute *:LocalPart₂ {Type}

xf:get-namespace-uri( QName₁) = statEnv.namespace(Prefix₂)

statEnv |- attribute, Prefix₂:* on attribute QName₁ {Type} : attribute QName₁ {Type}

statEnv |- attribute, Prefix₂:* on attribute *:LocalPart₁ {Type} : attribute Prefix₂:LocalPart₁ {Type}

statEnv.namespace(Prefix₁) = statEnv.namespace(Prefix₂)

statEnv |- attribute, Prefix₂:* on attribute Prefix₁:* {Type} : attribute Prefix₁:* {Type}

statEnv |- attribute, Prefix₂:* on attribute {Type} : attribute Prefix₂:* {Type}

statEnv |- attribute, * on attribute prefix:local {Type} : attribute prefix:local {Type}
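One way to read the name-matching rules above is as a componentwise refinement of the namespace part and the local part of the name. The Python sketch below illustrates that reading and is not a transcription of the rules: names are hypothetical (namespace URI, local part) pairs with prefixes already resolved, and None stands for a wildcard or an absent name.

```python
from typing import Optional, Tuple

Name = Tuple[Optional[str], Optional[str]]   # (namespace URI, local part)

def refine_name(type_name: Name, test_name: Name) -> Optional[Name]:
    """Return the most specific name compatible with both, or None if they clash."""
    refined = []
    for from_type, from_test in zip(type_name, test_name):
        if from_type is None:
            refined.append(from_test)        # the type is open: take the test's part
        elif from_test is None or from_test == from_type:
            refined.append(from_type)        # the test is open or agrees
        else:
            return None                      # no rule applies
    return (refined[0], refined[1])

# element {Type} tested by a QName yields element QName {Type}
from_anonymous = refine_name((None, None), ("urn:x", "foo"))
# element Prefix1:* tested by *:LocalPart2 yields Prefix1:LocalPart2
from_wildcards = refine_name(("urn:x", None), (None, "foo"))
# mismatching local parts: none of the rules applies
clash = refine_name(("urn:x", "bar"), (None, "foo"))
```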

Comments, processing instructions, and text:

statEnv |- PrincipalNodeKind, processing-instruction() on processing-instruction : processing-instruction

statEnv |- PrincipalNodeKind, processing-instruction(String) on processing-instruction : processing-instruction

statEnv |- PrincipalNodeKind, comment() on comment : comment

statEnv |- PrincipalNodeKind, text() on text : text

statEnv |- PrincipalNodeKind, node() on NodeType : NodeType

If none of the above rules applies, then the node test yields the empty type, and the following static rule is applied:

statEnv |- PrincipalNodeKind, NodeTest on NodeType : ()

Ed. Note: Peter: Except for self::, all axes guarantee that the NodeType is not the generic type node. However, when the type node is encountered, it has to be interpreted as element | attribute | text | comment | processing-instruction for these typing rules to work.

Dynamic Evaluation

The dynamic semantics indicates for each node test whether a node is retained.

dm:node-kind( NodeValue ) = PrincipalNodeKind

dm:name( NodeValue ) = QName₁

QName₂ = Prefix₂:LocalPart₂

xf:get-namespace-uri ( QName₁ ) = statEnv.namespace(Prefix₂)

xf:get-local-name ( QName₁ ) = LocalPart₂

dynEnv |- PrincipalNodeKind, QName₂ on NodeValue => NodeValue

dm:node-kind( NodeValue ) = PrincipalNodeKind

dm:name ( NodeValue ) = qname

dynEnv |- PrincipalNodeKind, * on NodeValue => NodeValue

dm:node-kind( NodeValue ) = PrincipalNodeKind

dm:name ( NodeValue ) = qname

xf:get-namespace-uri ( qname ) = statEnv.namespace(prefix)

dynEnv |- PrincipalNodeKind, prefix:* on NodeValue => NodeValue

dm:node-kind( NodeValue ) = PrincipalNodeKind

dm:name ( NodeValue ) = qname

xf:get-local-name ( qname ) = local

dynEnv |- PrincipalNodeKind, *:local on NodeValue => NodeValue

dm:node-kind ( NodeValue ) = "processing-instruction"

dynEnv |- PrincipalNodeKind, processing-instruction() on NodeValue => NodeValue

dm:node-kind ( NodeValue ) = "processing-instruction"

dm:name ( NodeValue ) = qname

xf:get-local-name ( qname ) = String

dynEnv |- PrincipalNodeKind, processing-instruction( String ) on NodeValue => NodeValue

Ed. Note: Note the use of the xf:get-local-name function to extract the local name out of the name of the node.

dm:node-kind ( NodeValue ) = "comment"

dynEnv |- PrincipalNodeKind, comment() on NodeValue => NodeValue

dm:node-kind ( NodeValue ) = "text"

dynEnv |- PrincipalNodeKind, text() on NodeValue => NodeValue

The node() node test is true for all nodes. Therefore, the following rule does not have any precondition (remember that an empty upper part in the rule indicates that the rule is always true).

dynEnv |- PrincipalNodeKind, node() on NodeValue => NodeValue

If none of the above rules applies, then the node test returns the empty sequence, and the following dynamic rule is applied:

dynEnv |- PrincipalNodeKind, NodeTest on NodeValue => ()
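The dynamic node test rules can be summarized as a function that either keeps a node or drops it. The Python sketch below is illustrative: NodeValue is a hypothetical record, node tests are encoded as tuples, namespace prefixes are assumed to be resolved to URIs already, and None denotes a wildcard.

```python
from typing import List, NamedTuple, Optional, Tuple

class NodeValue(NamedTuple):
    kind: str
    namespace: Optional[str] = None
    local: Optional[str] = None

def node_test(principal_kind: str, test: Tuple, node: NodeValue) -> List[NodeValue]:
    """Return [node] if the node passes the test, and [] otherwise."""
    kind, *args = test
    if kind == "node":                               # node() holds for all nodes
        return [node]
    if kind in ("text", "comment"):
        return [node] if node.kind == kind else []
    if kind == "processing-instruction":
        target = args[0] if args else None
        ok = node.kind == kind and (target is None or node.local == target)
        return [node] if ok else []
    # name test, encoded as ("name", namespace-or-None, local-or-None)
    ns, local = args
    ok = (node.kind == principal_kind
          and (ns is None or node.namespace == ns)
          and (local is None or node.local == local))
    return [node] if ok else []

elem = NodeValue("element", "urn:x", "foo")
by_local = node_test("element", ("name", None, "foo"), elem)
wrong_kind = node_test("attribute", ("name", None, None), elem)
as_text = node_test("element", ("text",), elem)
```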

5.2.2 Predicates

Introduction

Predicates are composed of zero or more expressions enclosed in square brackets.

[54 (XQuery)]	Predicates	::=	("[" Expr "]")*

Normalization

Predicates in path expressions are normalized with a special mapping rule:

[Expr]_Predicates

typeswitch (Expr)

case empty $v return xf:false()

case numeric $v return op:numeric-equal(xf:round($v), $fs:position)

case xs:boolean $v return $v

default $v return

if (xf:length([$v/self::node()]_Path)>=1)

then xf:true()

else xf:error()
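The case analysis performed by the typeswitch above can be illustrated as follows. This Python sketch rests on assumptions: Python values stand in for XQuery values, the Node class is a hypothetical placeholder for data-model nodes, xf:round is taken to round halves upward, and a ValueError stands in for xf:error().

```python
import math

class Node:
    """Hypothetical stand-in for a data-model node."""

def predicate_truth(value, position: int) -> bool:
    """Decide whether the item at `position` is kept, given the predicate's value."""
    if value is None or value == []:
        return False                                  # case empty
    if isinstance(value, bool):                       # test bool before numbers
        return value                                  # case xs:boolean
    if isinstance(value, (int, float)):
        return math.floor(value + 0.5) == position    # case numeric
    items = value if isinstance(value, list) else [value]
    if any(isinstance(item, Node) for item in items):
        return True                                   # at least one node
    raise ValueError("predicate value is not empty, numeric, boolean, or nodes")

keeps_second = predicate_truth(2, 2)
keeps_rounded = predicate_truth(2.5, 3)
drops_empty = predicate_truth([], 1)
keeps_nodes = predicate_truth([Node()], 7)
```

The boolean case is tested before the numeric case because Python treats booleans as integers; the typeswitch itself has no such overlap.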

5.2.3 Unabbreviated Syntax

Ed. Note: XQuery Section 2.3.4 has no semantic content -- it just contains examples! (This suggests that perhaps the section structure is a little weird.)

5.2.4 Abbreviated Syntax

[52 (XQuery)]	AbbreviatedForwardStep	::=	"." \| ("@" NameTest) \| NodeTest
[53 (XQuery)]	AbbreviatedReverseStep	::=	".."

Normalization

Here are normalization rules for the abbreviated syntax.

[ // RelativePathExpr ]_Path

/ descendant-or-self::node() / [RelativePathExpr]_Path

[ Expr // RelativePathExpr ]_Path

[Expr]_Expr / descendant-or-self::node() / [RelativePathExpr]_Path

[ . ]_Path

$fs:dot

[ .. ]_Path

parent::node()

[ @ NameTest ]_Path

attribute :: NameTest

[ NodeTest ]_Path

child :: NodeTest
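The abbreviation rules can be illustrated by a purely textual expansion. The Python sketch below is a simplification under assumptions: it works on the path string rather than on the grammar, it assumes that steps contain no predicates or string literals with "/" characters, and the function names are hypothetical.

```python
def expand_step(step: str) -> str:
    """Expand one abbreviated step into its unabbreviated form."""
    step = step.strip()
    if step == ".":
        return "$fs:dot"
    if step == "..":
        return "parent::node()"
    if step.startswith("@"):
        return "attribute::" + step[1:].strip()
    if "::" in step:
        return step                      # already unabbreviated
    return "child::" + step

def expand_path(path: str) -> str:
    """Expand '//' and every abbreviated step of a simple path."""
    path = path.replace("//", "/descendant-or-self::node()/")
    parts = path.split("/")
    return "/".join(expand_step(p) if p else p for p in parts)

expanded_abs = expand_path("//section/@id")
expanded_rel = expand_path("../title")
```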

5.3 Sequence Expressions

Introduction

[XPath/XQuery] supports operators to construct and combine sequences. A sequence is an ordered collection of zero or more items. An item is either an atomic value or a node.

5.3.1 Constructing Sequences

[4 (XQuery)]	ExprSequence	::=	Expr ("," Expr)*
[16 (XQuery)]	RangeExpr	::=	AdditiveExpr ( "to" AdditiveExpr )*

Normalization

The sequence expression is normalized into a sequence of core expressions:

["(" ExprSequence ")"]_Expr

"(" [ ExprSequence ]_Expr ")"

[Expr₁ , ... , Expr_n]_Expr

[Expr₁]_Expr, ..., [Expr_n]_Expr

Ed. Note: Mike Kay remarks that it would be cleaner to have binary rules and recursion to define the semantics rather than the ... notation.

Core Grammar

The core grammar rule for sequence expressions is:

[4 (Core)]	ExprSequence	::=	Expr ("," Expr)*

Static Type Analysis

The static semantics of the sequence expression follows. The type of the sequence expression is the sequence over the types of the individual expressions.

statEnv |- Expr₁ : Type₁ ... statEnv |- Expr_n : Type_n

statEnv |- Expr₁ , ... , Expr_n : Type₁, ..., Type_n

Dynamic Evaluation

The dynamic semantics of the sequence expression follows. Each expression in the sequence is evaluated and the resulting values are concatenated into one sequence.

dynEnv |- Expr₁ => Value₁ dynEnv |- Expr₂ => Value₂ ... dynEnv |- Expr_n => Value_n

dynEnv |- Expr₁, ..., Expr_n => op:concatenate(Value₁, op:concatenate(Value₂, op:concatenate(... , Value_n) ) )
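The nested applications of op:concatenate in the rule above correspond to a right fold over the evaluated operands. A minimal Python sketch, assuming that sequences are modeled as flat lists and that concatenate stands in for op:concatenate:

```python
from typing import List

def concatenate(v1: List, v2: List) -> List:
    """Stand-in for op:concatenate on flat sequences."""
    return list(v1) + list(v2)

def eval_sequence(values: List[List]) -> List:
    """Combine already evaluated operands Value1 ... Valuen into one sequence."""
    if not values:
        return []
    return concatenate(values[0], eval_sequence(values[1:]))

combined = eval_sequence([[1, 2], [], [3]])
```

The empty operand contributes nothing, and the result is a flat sequence rather than a sequence of sequences.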

Normalization

The normalization of the infix "to" operator maps it into an application of op:to.

[Expr₁ "to" Expr₂]_Expr

[op:to (Expr₁, Expr₂)]_Expr

Static Type Analysis

The static semantics of the op:to function is defined in [7 Additional Semantics of Functions].

Dynamic Evaluation

The dynamic semantics rules for function calls given in [5.1.4 Function Calls] are applied to the function call op:to above.

Ed. Note: Should the "to" operator be defined formally? See [Issue-0133: Should to also be described in the formal semantics?].

5.3.2 Combining Sequences

[XPath/XQuery] provides several operators for combining sequences.

[19 (XQuery)]	UnionExpr	::=	IntersectExceptExpr ( ("union" \| "\|") IntersectExceptExpr )*
[20 (XQuery)]	IntersectExceptExpr	::=	UnaryExpr ( ("intersect" \| "except") UnaryExpr )*

Notation

First, a union, intersect, or except expression is normalized into a corresponding core expression. The mapping function []_SequenceOp is defined by the following table:

SequenceOp	[SequenceOp]_SequenceOp
"union"	op:union
"\|"	op:union
"intersect"	op:intersect
"except"	op:except

Normalization

[Expr₁ SequenceOp Expr₂]_Expr

[SequenceOp]_SequenceOp ( [Expr₁]_Expr, [Expr₂]_Expr )
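The mapping table and the normalization rule above can be illustrated with a lookup followed by a rewrite into a function call. In this Python sketch the operands are assumed to be already normalized and are represented as strings; the names SEQUENCE_OP and normalize_sequence_op are hypothetical.

```python
SEQUENCE_OP = {
    "union": "op:union",
    "|": "op:union",
    "intersect": "op:intersect",
    "except": "op:except",
}

def normalize_sequence_op(left: str, operator: str, right: str) -> str:
    """Rewrite 'left operator right' into a call to the corresponding function."""
    return f"{SEQUENCE_OP[operator]}({left}, {right})"

as_union = normalize_sequence_op("$a", "|", "$b")
as_except = normalize_sequence_op("$a", "except", "$b")
```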

Static Type Analysis

The static semantics of the functions that operate on sequences are defined in [7 Additional Semantics of Functions].

Dynamic Evaluation

The dynamic semantics rules for function calls given in [5.1.4 Function Calls] are applied to the calls to functions on sequences above.

5.4 Arithmetic Expressions

[XPath/XQuery] provides arithmetic operators for addition, subtraction, multiplication, division, and modulus, in their usual binary and unary forms.

[17 (XQuery)]	AdditiveExpr	::=	MultiplicativeExpr ( ("+" \| "-") MultiplicativeExpr )*
[18 (XQuery)]	MultiplicativeExpr	::=	UnionExpr ( ("*" \| "div" \| "idiv" \| "mod") UnionExpr )*
[21 (XQuery)]	UnaryExpr	::=	("-" \| "+")* ValueExpr
[22 (XQuery)]	ValueExpr	::=	ValidateExpr \| CastExpr \| Constructor \| PathExpr
[24 (XPath)]	ValueExpr	::=	ValidateExpr \| CastExpr \| PathExpr

Core Grammar

All of those expressions are normalized to the following core grammar production:

[12 (Core)]	ValueExpr	::=	ValidateExpr \| CastExpr \| Constructor \| PathExpr

Notation

Ed. Note: MFF: The operator table in this section was produced by Don Chamberlin. This table should be in one place, probably the Formal Semantics

Ed. Note: [Kristoffer/XSL] The present rules mix static and dynamic dispatch of which primitive function is used. We are investigating how to repair that as well as make the definitions work with both static and dynamic typing. See [Issue-0122: Overloaded functions].

The tables in this section list the combinations of datatypes for which the various operators of [XPath/XQuery] are defined. For each valid combination of datatypes, the table indicates the name of the function that implements the operator and the datatype of the result. Definitions of the functions can be found in [XQuery 1.0 and XPath 2.0 Functions and Operators].

In the following tables, the term fs:numeric refers to the types xs:integer, xs:decimal, xs:float, and xs:double. When the result type of an operator is listed as numeric, it means "same as the highest type of any input operand, in promotion order." For example, when invoked with operands of type xs:integer and xs:float, the binary + operator returns a result of type xs:float.
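The rule for a result type listed as numeric can be illustrated as taking the maximum of the operand types in promotion order. A minimal Python sketch, assuming that type names are plain strings and that the promotion order is the order in which the four types are listed above:

```python
PROMOTION_ORDER = ["xs:integer", "xs:decimal", "xs:float", "xs:double"]

def numeric_result_type(type_a: str, type_b: str) -> str:
    """Return the higher of two numeric types in promotion order."""
    return max(type_a, type_b, key=PROMOTION_ORDER.index)

mixed = numeric_result_type("xs:integer", "xs:float")
same = numeric_result_type("xs:decimal", "xs:decimal")
```

A type name outside the list raises a ValueError, which reflects that the row for numeric operands does not apply to it.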

In the following tables, the term Gregorian refers to the types xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gDay, and xs:gMonth. For binary operators that accept two Gregorian-type operands, both operands must have the same type (for example, if one operand is of type xs:gDay, the other operand must be of type xs:gDay.)

The functions []_BinaryOp and []_UnaryOp are defined by the following two tables. The function []_BinaryOp takes the left-hand expression (A) and its type, the operator, and the right-hand expression (B) and its type, and returns a new expression, which applies the type-appropriate operator to the two expressions. The function []_UnaryOp takes an operator and an expression and returns a new expression, which applies the appropriate operator to the expression.

Operator	Type(A)	Type(B)	[A; Type(A); Operator; B; Type(B)]_BinaryOp	Result type
A + B	numeric	numeric	op:numeric-add(A, B)	numeric
A + B	xs:date	xs:yearMonthDuration	op:add-yearMonthDuration-to-date(A, B)	xs:date
A + B	xs:yearMonthDuration	xs:date	op:add-yearMonthDuration-to-date(B, A)	xs:date
A + B	xs:date	xs:dayTimeDuration	op:add-dayTimeDuration-to-date(A, B)	xs:date
A + B	xs:dayTimeDuration	xs:date	op:add-dayTimeDuration-to-date(B, A)	xs:date
A + B	xs:time	xs:dayTimeDuration	op:add-dayTimeDuration-to-time(A, B)	xs:time
A + B	xs:dayTimeDuration	xs:time	op:add-dayTimeDuration-to-time(B, A)	xs:time
A + B	xs:dateTime	xs:yearMonthDuration	op:add-yearMonthDuration-to-dateTime(A, B)	xs:dateTime
A + B	xs:yearMonthDuration	xs:dateTime	op:add-yearMonthDuration-to-dateTime(B, A)	xs:dateTime
A + B	xs:dateTime	xs:dayTimeDuration	op:add-dayTimeDuration-to-dateTime(A, B)	xs:dateTime
A + B	xs:dayTimeDuration	xs:dateTime	op:add-dayTimeDuration-to-dateTime(B, A)	xs:dateTime
A + B	xs:yearMonthDuration	xs:yearMonthDuration	op:add-yearMonthDurations(A, B)	xs:yearMonthDuration
A + B	xs:dayTimeDuration	xs:dayTimeDuration	op:add-dayTimeDurations(A, B)	xs:dayTimeDuration
A - B	numeric	numeric	op:numeric-subtract(A, B)	numeric
A - B	xs:date	xs:date	xf:subtract-dates(A, B)	xs:dayTimeDuration
A - B	xs:date	xs:yearMonthDuration	op:subtract-yearMonthDuration-from-date(A, B)	xs:date
A - B	xs:date	xs:dayTimeDuration	op:subtract-dayTimeDuration-from-date(A, B)	xs:date
A - B	xs:time	xs:time	xf:subtract-times(A, B)	xs:dayTimeDuration
A - B	xs:time	xs:dayTimeDuration	op:subtract-dayTimeDuration-from-time(A, B)	xs:time
A - B	xs:dateTime	xs:dateTime	xf:get-yearMonthDuration-from-dateTimes(A, B)	xs:yearMonthDuration
A - B	xs:dateTime	xs:dateTime	xf:get-dayTimeDuration-from-dateTimes(A, B)	xs:dayTimeDuration
A - B	xs:dateTime	xs:yearMonthDuration	op:subtract-yearMonthDuration-from-dateTime(A, B)	xs:dateTime
A - B	xs:dateTime	xs:dayTimeDuration	op:subtract-dayTimeDuration-from-dateTime(A, B)	xs:dateTime
A - B	xs:yearMonthDuration	xs:yearMonthDuration	op:subtract-yearMonthDurations(A, B)	xs:yearMonthDuration
A - B	xs:dayTimeDuration	xs:dayTimeDuration	op:subtract-dayTimeDurations(A, B)	xs:dayTimeDuration
A * B	numeric	numeric	op:numeric-multiply(A, B)	numeric
A * B	xs:yearMonthDuration	xs:decimal	op:multiply-yearMonthDuration(A, B)	xs:yearMonthDuration
A * B	xs:decimal	xs:yearMonthDuration	op:multiply-yearMonthDuration(B, A)	xs:yearMonthDuration
A * B	xs:dayTimeDuration	xs:decimal	op:multiply-dayTimeDuration(A, B)	xs:dayTimeDuration
A * B	xs:decimal	xs:dayTimeDuration	op:multiply-dayTimeDuration(B, A)	xs:dayTimeDuration
A div B	numeric	numeric	op:numeric-divide(A, B)	numeric
A div B	xs:yearMonthDuration	xs:decimal	op:divide-yearMonthDuration(A, B)	xs:yearMonthDuration
A div B	xs:dayTimeDuration	xs:decimal	op:divide-dayTimeDuration(A, B)	xs:dayTimeDuration
A mod B	numeric	numeric	op:numeric-mod(A, B)	numeric
A eq B	numeric	numeric	op:numeric-equal(A, B)	xs:boolean
A eq B	xs:boolean	xs:boolean	op:boolean-equal(A, B)	xs:boolean
A eq B	xs:string	xs:string	op:numeric-equal(xf:compare(A, B), 0)	xs:boolean
A eq B	xs:date	xs:date	op:date-equal(A, B)	xs:boolean?
A eq B	xs:time	xs:time	op:time-equal(A, B)	xs:boolean?
A eq B	xs:dateTime	xs:dateTime	op:datetime-equal(A, B)	xs:boolean?
A eq B	xs:yearMonthDuration	xs:yearMonthDuration	op:yearMonthDuration-equal(A, B)	xs:boolean
A eq B	xs:dayTimeDuration	xs:dayTimeDuration	op:dayTimeDuration-equal(A, B)	xs:boolean
A eq B	Gregorian	Gregorian	op:Gregorian-equal(A, B)	xs:boolean
A eq B	xs:hexBinary	xs:hexBinary	op:hex-binary-equal(A, B)	xs:boolean
A eq B	xs:base64Binary	xs:base64Binary	op:base64-binary-equal(A, B)	xs:boolean
A eq B	xs:anyURI	xs:anyURI	op:anyURI-equal(A, B)	xs:boolean
A eq B	xs:QName	xs:QName	op:QName-equal(A, B)	xs:boolean
A eq B	xs:NOTATION	xs:NOTATION	op:NOTATION-equal(A, B)	xs:boolean
A ne B	numeric	numeric	xf:not(op:numeric-equal(A, B))	xs:boolean
A ne B	xs:boolean	xs:boolean	xf:not(op:boolean-equal(A, B))	xs:boolean
A ne B	xs:string	xs:string	xf:not(op:numeric-equal(xf:compare(A, B), 0))	xs:boolean
A ne B	xs:date	xs:date	xf:not(op:date-equal(A,B))	xs:boolean?
A ne B	xs:time	xs:time	xf:not(op:time-equal(A,B))	xs:boolean?
A ne B	xs:dateTime	xs:dateTime	xf:not(op:datetime-equal(A, B))	xs:boolean?
A ne B	xs:yearMonthDuration	xs:yearMonthDuration	xf:not(op:yearMonthDuration-equal(A, B))	xs:boolean
A ne B	xs:dayTimeDuration	xs:dayTimeDuration	xf:not(op:dayTimeDuration-equal(A, B))	xs:boolean
A ne B	Gregorian	Gregorian	xf:not(op:Gregorian-equal(A, B))	xs:boolean
A ne B	xs:hexBinary	xs:hexBinary	xf:not(op:hex-binary-equal(A, B))	xs:boolean
A ne B	xs:base64Binary	xs:base64Binary	xf:not(op:base64-binary-equal(A, B))	xs:boolean
A ne B	xs:anyURI	xs:anyURI	xf:not(op:anyURI-equal(A, B))	xs:boolean
A ne B	xs:QName	xs:QName	xf:not(op:QName-equal(A, B))	xs:boolean
A ne B	xs:NOTATION	xs:NOTATION	xf:not(op:NOTATION-equal(A, B))	xs:boolean
A gt B	numeric	numeric	op:numeric-greater-than(A, B)	xs:boolean
A gt B	xs:boolean	xs:boolean	op:boolean-greater-than(A, B)	xs:boolean
A gt B	xs:string	xs:string	op:numeric-greater-than(xf:compare(A, B), 0)	xs:boolean
A gt B	xs:date	xs:date	op:date-greater-than(A, B)	xs:boolean?
A gt B	xs:time	xs:time	op:time-greater-than(A, B)	xs:boolean?
A gt B	xs:dateTime	xs:dateTime	op:datetime-greater-than(A, B)	xs:boolean?
A gt B	xs:yearMonthDuration	xs:yearMonthDuration	op:yearMonthDuration-greater-than(A, B)	xs:boolean
A gt B	xs:dayTimeDuration	xs:dayTimeDuration	op:dayTimeDuration-greater-than(A, B)	xs:boolean
A lt B	numeric	numeric	op:numeric-less-than(A, B)	xs:boolean
A lt B	xs:boolean	xs:boolean	op:boolean-less-than(A, B)	xs:boolean
A lt B	xs:string	xs:string	op:numeric-less-than(xf:compare(A, B), 0)	xs:boolean
A lt B	xs:date	xs:date	op:date-less-than(A, B)	xs:boolean?
A lt B	xs:time	xs:time	op:time-less-than(A, B)	xs:boolean?
A lt B	xs:dateTime	xs:dateTime	op:datetime-less-than(A, B)	xs:boolean?
A lt B	xs:yearMonthDuration	xs:yearMonthDuration	op:yearMonthDuration-less-than(A, B)	xs:boolean
A lt B	xs:dayTimeDuration	xs:dayTimeDuration	op:dayTimeDuration-less-than(A, B)	xs:boolean
A ge B	numeric	numeric	op:numeric-less-than(B, A)	xs:boolean
A ge B	xs:string	xs:string	op:numeric-greater-than(xf:compare(A, B), -1)	xs:boolean
A ge B	xs:date	xs:date	op:date-less-than(B, A)	xs:boolean?
A ge B	xs:time	xs:time	op:time-less-than(B, A)	xs:boolean?
A ge B	xs:dateTime	xs:dateTime	op:datetime-less-than(B, A)	xs:boolean?
A ge B	xs:yearMonthDuration	xs:yearMonthDuration	op:yearMonthDuration-less-than(B, A)	xs:boolean
A ge B	xs:dayTimeDuration	xs:dayTimeDuration	op:dayTimeDuration-less-than(B, A)	xs:boolean
A le B	numeric	numeric	op:numeric-greater-than(B, A)	xs:boolean
A le B	xs:string	xs:string	op:numeric-less-than(xf:compare(A, B), 1)	xs:boolean
A le B	xs:date	xs:date	op:date-greater-than(B, A)	xs:boolean?
A le B	xs:time	xs:time	op:time-greater-than(B, A)	xs:boolean?
A le B	xs:dateTime	xs:dateTime	op:datetime-greater-than(B, A)	xs:boolean?
A le B	xs:yearMonthDuration	xs:yearMonthDuration	op:yearMonthDuration-greater-than(B, A)	xs:boolean
A le B	xs:dayTimeDuration	xs:dayTimeDuration	op:dayTimeDuration-greater-than(B, A)	xs:boolean
A is B	node	node	op:node-equal(A, B)	xs:boolean
A isnot B	node	node	xf:not(op:node-equal(A, B))	xs:boolean
A << B	node	node	op:node-before(A, B)	xs:boolean
A >> B	node	node	op:node-after(A, B)	xs:boolean
A precedes B	node	node	op:node-precedes(A, B)	xs:boolean
A follows B	node	node	op:node-follows(A, B)	xs:boolean
A union B	node*	node*	op:union(A, B)	node*
A \| B	node*	node*	op:union(A, B)	node*
A intersect B	node*	node*	op:intersect(A, B)	node*
A except B	node*	node*	op:except(A, B)	node*
A to B	xs:decimal	xs:decimal	op:to(A, B)	xs:integer+
A , B	item*	item*	op:concatenate(A, B)	item*

An analogous table exists for unary operators.

Operator	Operand type	[Operator; Expr]_UnaryOp	Result type
+ A	numeric	op:numeric-unary-plus(A)	numeric
- A	numeric	op:numeric-unary-minus(A)	numeric
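
As a rough illustration of how these tables are consulted, the following Python sketch models the binary and unary operator tables as dictionaries keyed by operator and operand types. Only a handful of rows are transcribed, the helper `lookup_binary_op` and the dictionary names are hypothetical, and the operator functions are represented by their names rather than implemented.

```python
# A few rows of the binary operator table, keyed by
# (operator, left operand type, right operand type).
# Each entry holds the function name, whether the arguments are
# swapped in the call, and the result type.
BINARY_OP = {
    ("*", "numeric", "numeric"):
        ("op:numeric-multiply", False, "numeric"),
    ("*", "xs:yearMonthDuration", "xs:decimal"):
        ("op:multiply-yearMonthDuration", False, "xs:yearMonthDuration"),
    ("*", "xs:decimal", "xs:yearMonthDuration"):
        ("op:multiply-yearMonthDuration", True, "xs:yearMonthDuration"),
    ("div", "numeric", "numeric"):
        ("op:numeric-divide", False, "numeric"),
    ("mod", "numeric", "numeric"):
        ("op:numeric-mod", False, "numeric"),
}

# The unary operator table, keyed by (operator, operand type).
UNARY_OP = {
    ("+", "numeric"): ("op:numeric-unary-plus", "numeric"),
    ("-", "numeric"): ("op:numeric-unary-minus", "numeric"),
}


def lookup_binary_op(op, left_type, right_type, a="A", b="B"):
    """Return (call text, result type) for a table row, or None if no row matches."""
    row = BINARY_OP.get((op, left_type, right_type))
    if row is None:
        return None
    name, swapped, result_type = row
    first, second = (b, a) if swapped else (a, b)
    return f"{name}({first}, {second})", result_type
```

Looking up `("*", "xs:decimal", "xs:yearMonthDuration")` yields the call with its arguments swapped, which mirrors the `(B, A)` entries in the table; a combination with no row, such as `mod` on two dates, yields `None`.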

Normalization

The normalization rules for the arithmetic operators "+" and "-" are similar, but not identical, because, as the table above illustrates, "-" is not commutative.

The following normalization rule for "+" first applies []_Atomize to each argument expression, binding the results of these expressions to two new variables, $e1 and $e2. It then applies a typeswitch on the left-hand operand $e1, and for each left-hand operand type, it applies a second typeswitch on the right-hand operand $e2. The function [Operator]_BinaryOp takes the operator, the left-hand type, and the right-hand type and returns the appropriate function, which is applied to the argument values.

[Expr₁ "+" Expr₂]_Expr

let $coree1 := ([ Expr₁ ]_Expr),

$coree2 := ([ Expr₂ ]_Expr),

$z1 := [ $coree1 ]_Atomize,

$e1 := typeswitch ($z1)

case untyped $v1 return cast ($v1) as xs:double

default $v1 return $v1,

$z2 := [ $coree2 ]_Atomize,

$e2 := typeswitch ($z2)

case untyped $v2 return cast ($v2) as xs:double

default $v2 return $v2

return

typeswitch ($e1)

case empty $v1 return ()

case fs:numeric $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case fs:numeric $v2 return [ $v1; numeric; "+"; $v2; numeric ]_BinaryOp

default $v2 return xf:error())

case xs:date $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:yearMonthDuration $v2 return [ $v1; xs:date; "+"; $v2; xs:yearMonthDuration ]_BinaryOp

case xs:dayTimeDuration $v2 return [ $v1; xs:date; "+"; $v2; xs:dayTimeDuration ]_BinaryOp

default $v2 return xf:error())

case xs:time $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:dayTimeDuration $v2 return [ $v1; xs:time; "+"; $v2; xs:dayTimeDuration ]_BinaryOp

default $v2 return xf:error())

case xs:dateTime $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:yearMonthDuration $v2 return [ $v1; xs:dateTime; "+"; $v2; xs:yearMonthDuration ]_BinaryOp

case xs:dayTimeDuration $v2 return [$v1; xs:dateTime; "+"; $v2; xs:dayTimeDuration ]_BinaryOp

default $v2 return xf:error())

case xs:yearMonthDuration $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:date $v2 return [ $v1; xs:yearMonthDuration; "+"; $v2; xs:date ]_BinaryOp

case xs:dateTime $v2 return [ $v1; xs:yearMonthDuration; "+"; $v2; xs:dateTime ]_BinaryOp

case xs:yearMonthDuration $v2 return [ $v1; xs:yearMonthDuration; "+"; $v2; xs:yearMonthDuration ]_BinaryOp

default $v2 return xf:error())

case xs:dayTimeDuration $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:date $v2 return [$v1; xs:dayTimeDuration; "+"; $v2; xs:date ]_BinaryOp

case xs:dateTime $v2 return [ $v1; xs:dayTimeDuration; "+"; $v2; xs:dateTime ]_BinaryOp

case xs:dayTimeDuration $v2 return [ $v1; xs:dayTimeDuration; "+"; $v2; xs:dayTimeDuration ]_BinaryOp

case xs:time $v2 return [ $v1; xs:dayTimeDuration; "+"; $v2; xs:time ]_BinaryOp

default $v2 return xf:error())

default $v1 return xf:error()
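
The control flow of the rule above can be sketched in Python as follows. This is an illustration only, under several assumptions: operands are already atomized to a single atom or to the empty sequence (modelled as `None`), type names are plain strings, and the names `Atom`, `PLUS_CASES`, and `normalize_plus` are hypothetical. The sketch returns the operator and case labels that would be handed to [ ]_BinaryOp rather than performing the addition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    type: str       # e.g. "xs:integer", "xs:date", "untyped"
    value: object


NUMERIC = {"xs:integer", "xs:decimal", "xs:float", "xs:double"}

# Right-hand operand cases accepted for each left-hand case of "+",
# transcribed from the nested typeswitch of the rule.
PLUS_CASES = {
    "numeric": ["numeric"],
    "xs:date": ["xs:yearMonthDuration", "xs:dayTimeDuration"],
    "xs:time": ["xs:dayTimeDuration"],
    "xs:dateTime": ["xs:yearMonthDuration", "xs:dayTimeDuration"],
    "xs:yearMonthDuration": ["xs:date", "xs:dateTime", "xs:yearMonthDuration"],
    "xs:dayTimeDuration": ["xs:date", "xs:dateTime", "xs:dayTimeDuration", "xs:time"],
}


def cast_untyped(atom):
    """Cast an untyped atom to xs:double; leave other atoms (and None) alone."""
    if atom is not None and atom.type == "untyped":
        return Atom("xs:double", float(atom.value))
    return atom


def classify(atom):
    """Collapse the concrete numeric types into the 'numeric' case label."""
    return "numeric" if atom.type in NUMERIC else atom.type


def normalize_plus(left, right):
    """Return (operator, left case, right case), None for the empty
    sequence, or raise TypeError where the rule calls xf:error()."""
    e1, e2 = cast_untyped(left), cast_untyped(right)
    if e1 is None:
        return None                      # case empty $v1 return ()
    t1 = classify(e1)
    if t1 not in PLUS_CASES:
        raise TypeError("xf:error()")    # default $v1
    if e2 is None:
        return None                      # case empty $v2 return ()
    t2 = classify(e2)
    if t2 not in PLUS_CASES[t1]:
        raise TypeError("xf:error()")    # default $v2
    return ("+", t1, t2)
```

For instance, an untyped left-hand operand combined with an xs:integer is first cast to xs:double and then falls into the numeric/numeric case, while an xs:time combined with an xs:yearMonthDuration reaches the inner default and raises an error.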

Ed. Note: MFF (01-May-2002) Not sure what to do about numeric promotion rules. Should they be expressed explicitly in normalization rules?

Ed. Note: MFF: The Datatype production does not permit choices of item types -- this is annoying. See [Issue-0127: Datatype limitations].

Ed. Note: Peter: the static semantics of operators is as strict as the one for functions, and is still under discussion. See [Issue-0129: Static typing of union].

The following normalization rule for "-" is analogous to that for "+".

[Expr₁ "-" Expr₂]_Expr

let $coree1 := ([ Expr₁ ]_Expr),

$coree2 := ([ Expr₂ ]_Expr),

$z1 := [ $coree1 ]_Atomize,

$e1 := typeswitch ($z1)

case untyped $v1 return cast ($v1) as xs:double

default $v1 return $v1,

$z2 := [ $coree2 ]_Atomize,

$e2 := typeswitch ($z2)

case untyped $v2 return cast ($v2) as xs:double

default $v2 return $v2

return

typeswitch ($e1)

case empty $v1 return ()

case fs:numeric $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case fs:numeric $v2 return [ $v1; numeric; "-"; $v2; numeric ]_BinaryOp

default $v2 return xf:error())

case xs:date $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:dayTimeDuration $v2 return [ $v1; xs:date; "-"; $v2; xs:dayTimeDuration ]_BinaryOp

case xs:yearMonthDuration $v2 return [ $v1; xs:date; "-"; $v2; xs:yearMonthDuration ]_BinaryOp

default $v2 return xf:error())

case xs:time $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:dayTimeDuration $v2 return [$v1; xs:time; "-"; $v2; xs:dayTimeDuration ]_BinaryOp

default $v2 return xf:error())

case xs:dateTime $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:dayTimeDuration $v2 return [$v1; xs:dateTime; "-"; $v2; xs:dayTimeDuration ]_BinaryOp

case xs:yearMonthDuration $v2 return [$v1; xs:dateTime; "-"; $v2; xs:yearMonthDuration ]_BinaryOp

default $v2 return xf:error())

case xs:yearMonthDuration $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:yearMonthDuration $v2 return [ $v1; xs:yearMonthDuration; "-"; $v2; xs:yearMonthDuration ]_BinaryOp

default $v2 return xf:error())

case xs:dayTimeDuration $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:dayTimeDuration $v2 return [ $v1; xs:dayTimeDuration; "-"; $v2; xs:dayTimeDuration ]_BinaryOp

default $v2 return xf:error())

default $v1 return xf:error()

The multiplicative operators "*" and "div" are defined on numeric types and on the duration types xs:yearMonthDuration and xs:dayTimeDuration. The multiplicative operator "mod" is defined only on numeric types, so its normalization rule is simpler.

[Expr₁ "*" Expr₂]_Expr

let $coree1 := ([ Expr₁ ]_Expr),

$coree2 := ([ Expr₂ ]_Expr),

$z1 := [ $coree1 ]_Atomize,

$e1 := typeswitch ($z1)

case untyped $v1 return cast ($v1) as xs:double

default $v1 return $v1,

$z2 := [ $coree2 ]_Atomize,

$e2 := typeswitch ($z2)

case untyped $v2 return cast ($v2) as xs:double

default $v2 return $v2

return

typeswitch ($e1)

case empty $v1 return ()

case xs:decimal $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:yearMonthDuration $v2 return [ $v1; xs:decimal; "*"; $v2; xs:yearMonthDuration ]_BinaryOp

case xs:dayTimeDuration $v2 return [ $v1; xs:decimal; "*"; $v2; xs:dayTimeDuration ]_BinaryOp

case fs:numeric $v2 return [ $v1; numeric; "*"; $v2; numeric ]_BinaryOp

default $v2 return xf:error())

case fs:numeric $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case fs:numeric $v2 return [ $v1; numeric; "*"; $v2; numeric ]_BinaryOp

default $v2 return xf:error())

case xs:dayTimeDuration $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:decimal $v2 return [ $v1; xs:dayTimeDuration; "*"; $v2; xs:decimal ]_BinaryOp

default $v2 return xf:error())

case xs:yearMonthDuration $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:decimal $v2 return [ $v1; xs:yearMonthDuration; "*"; $v2; xs:decimal ]_BinaryOp

default $v2 return xf:error())

default $v1 return xf:error()

[Expr₁ "div" Expr₂]_Expr

let $coree1 := ([ Expr₁ ]_Expr),

$coree2 := ([ Expr₂ ]_Expr),

$z1 := [ $coree1 ]_Atomize,

$e1 := typeswitch ($z1)

case untyped $v1 return cast ($v1) as xs:double

default $v1 return $v1,

$z2 := [ $coree2 ]_Atomize,

$e2 := typeswitch ($z2)

case untyped $v2 return cast ($v2) as xs:double

default $v2 return $v2

return

typeswitch ($e1)

case empty $v1 return ()

case fs:numeric $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case fs:numeric $v2 return [ $v1; numeric; "div"; $v2; numeric ]_BinaryOp

default $v2 return xf:error())

case xs:dayTimeDuration $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:decimal $v2 return [ $v1; xs:dayTimeDuration; "div"; $v2; xs:decimal ]_BinaryOp

default $v2 return xf:error())

case xs:yearMonthDuration $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:decimal $v2 return [ $v1; xs:yearMonthDuration; "div"; $v2; xs:decimal ]_BinaryOp

default $v2 return xf:error())

default $v1 return xf:error()

[Expr₁ "mod" Expr₂]_Expr

let $coree1 := ([ Expr₁ ]_Expr),

$coree2 := ([ Expr₂ ]_Expr),

$z1 := [ $coree1 ]_Atomize,

$e1 := typeswitch ($z1)

case untyped $v1 return cast ($v1) as xs:double

default $v1 return $v1,

$z2 := [ $coree2 ]_Atomize,

$e2 := typeswitch ($z2)

case untyped $v2 return cast ($v2) as xs:double

default $v2 return $v2

return

typeswitch ($e1)

case empty $v1 return ()

case fs:numeric $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case fs:numeric $v2 return [ $v1; numeric; "mod"; $v2; numeric ]_BinaryOp

default $v2 return xf:error())

default $v1 return xf:error()

For convenience, UnaryOp denotes the unary operators "+" and "-". The normalization rule for unary operators is straightforward:

[UnaryOp Expr]_Expr

let $coree1 := ([ Expr ]_Expr),

$z1 := [ $coree1 ]_Atomize,

$e1 := typeswitch ($z1)

case untyped $v1 return cast ($v1) as xs:double

default $v1 return $v1

return [ UnaryOp; $e1 ]_UnaryOp

Core Grammar

There are no core grammar rules for arithmetic expressions as they are normalized to function calls.

Static Type Analysis

In the [XQuery 1.0 and XPath 2.0 Functions and Operators], type promotion rules are given for all the arithmetic operators, denoted by op:operation, and the result types of these operations. The following static semantics rules specify the result types for all arithmetic operators when applied to specific numeric types.

statEnv |- Expr₁ : Type₁ Type₁ <: xs:decimal statEnv |- Expr₂ : Type₂ Type₂ <: xs:decimal

statEnv |- op:operation(Expr₁, Expr₂) : xs:decimal

statEnv |- Expr₁ : Type₁ Type₁ <: xs:float statEnv |- Expr₂ : Type₂ Type₂ <: xs:float

statEnv |- op:operation(Expr₁, Expr₂) : xs:float

statEnv |- Expr₁ : Type₁ Type₁ <: xs:double statEnv |- Expr₂ : Type₂ Type₂ <: xs:double

statEnv |- op:operation(Expr₁, Expr₂) : xs:double

statEnv |- Expr₁ : Type₁ Type₁ <: xs:integer statEnv |- Expr₂ : Type₂ Type₂ <: xs:integer

statEnv |- op:operation(Expr₁, Expr₂) : xs:integer

Ed. Note: MFF: Can the rules above be factored?

Analogous static type rules are given for the unary arithmetic operators.

statEnv |- Expr₁ : Type₁ Type₁ <: xs:decimal

statEnv |- op:operation(Expr₁) : xs:decimal

statEnv |- Expr₁ : Type₁ Type₁ <: xs:float

statEnv |- op:operation(Expr₁) : xs:float

statEnv |- Expr₁ : Type₁ Type₁ <: xs:double

statEnv |- op:operation(Expr₁) : xs:double

statEnv |- Expr₁ : Type₁ Type₁ <: xs:integer

statEnv |- op:operation(Expr₁) : xs:integer
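
The static rules above can be read as a small lookup procedure. The Python sketch below is one possible reading, not a definitive one: it assumes that only xs:integer <: xs:decimal holds among the four types (besides reflexivity), and it assumes that when several rules apply the most specific result type is chosen, which the rules themselves do not state. The names `SUPERTYPES`, `RESULT_TYPES`, and `arithmetic_result_type` are hypothetical.

```python
# Assumed fragment of the subtype relation (<:): each type maps to the
# types it is a subtype of, including itself.
SUPERTYPES = {
    "xs:integer": ["xs:integer", "xs:decimal"],
    "xs:decimal": ["xs:decimal"],
    "xs:float": ["xs:float"],
    "xs:double": ["xs:double"],
}

# Candidate result types, most specific first (an assumption about
# which rule takes priority when more than one applies).
RESULT_TYPES = ["xs:integer", "xs:decimal", "xs:float", "xs:double"]


def is_subtype(t, u):
    """True when t <: u under the assumed relation."""
    return u in SUPERTYPES.get(t, [])


def arithmetic_result_type(*operand_types):
    """Static result type of op:operation over one or two operands,
    or None when none of the rules applies."""
    for result in RESULT_TYPES:
        if all(is_subtype(t, result) for t in operand_types):
            return result
    return None
```

Under these assumptions, two xs:integer operands yield xs:integer and an xs:integer with an xs:decimal yields xs:decimal. A mixed pair such as xs:integer and xs:double matches none of the four rules, so the sketch returns `None`; such cases depend on the type promotion rules, which are not modelled here.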

Dynamic Evaluation

The normalization rules map all arithmetic operators into core expressions, whose dynamic semantics is defined in other sections. Therefore, there are no dynamic semantics rules for arithmetic operators. The dynamic semantics rules for function calls given in [5.1.4 Function Calls] apply to all the function calls: op:numeric-add, etc.

5.5 Comparison Expressions

Introduction

Comparison expressions allow two values to be compared. [XPath/XQuery] provides four kinds of comparison expressions, called value comparisons, general comparisons, node comparisons, and order comparisons.

[15 (XQuery)]	ComparisonExpr	::=	RangeExpr ( (ValueComp \| GeneralComp \| NodeComp \| OrderComp) RangeExpr )?
[33 (XQuery)]	ValueComp	::=	"eq" \| "ne" \| "lt" \| "le" \| "gt" \| "ge"
[32 (XQuery)]	GeneralComp	::=	"=" \| "!=" \| "<" \| "<=" \| ">" \| ">="
[34 (XQuery)]	NodeComp	::=	"is" \| "isnot"
[35 (XQuery)]	OrderComp	::=	"<<" \| ">>"

Ed. Note: [Kristoffer/XSL] The present rules mix static and dynamic dispatch of which primitive function is used. We are investigating how to repair that as well as make the definitions work with both static and dynamic typing. See [Issue-0122: Overloaded functions].

5.5.1 Value Comparisons

Normalization

The value comparison equality operators "eq" and "ne" are defined on a large set of types. For convenience, ValueEqOp denotes the operators "eq" or "ne".

[Expr₁ ValueEqOp Expr₂]_Expr

let $coree1 := ([ Expr₁ ]_Expr),

$coree2 := ([ Expr₂ ]_Expr),

$e1 := [ $coree1 ]_Atomize ,

$e2 := [ $coree2 ]_Atomize

return

typeswitch ($e1)

case empty $v1 return ()

case numeric $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case numeric $v2 return [ $v1; numeric; ValueEqOp; $v2; numeric ]_BinaryOp

case untyped $v2 return

[ $v1; numeric; ValueEqOp; (cast ($v2) as xs:double); numeric ]_BinaryOp

default $v2 return xf:error())

case xs:boolean $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:boolean $v2 return [ $v1; xs:boolean; ValueEqOp; $v2; xs:boolean ]_BinaryOp

case untyped $v2 return

[ $v1; xs:boolean; ValueEqOp; (cast ($v2) as xs:boolean); xs:boolean ]_BinaryOp

default $v2 return xf:error())

case xs:string $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:string $v2 return [ $v1; xs:string; ValueEqOp; $v2; xs:string ]_BinaryOp

case untyped $v2 return

[ $v1; xs:string; ValueEqOp; (cast ($v2) as xs:string); xs:string ]_BinaryOp

default $v2 return xf:error())

case xs:date $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:date $v2 return [ $v1; xs:date; ValueEqOp; $v2; xs:date ]_BinaryOp

case untyped $v2 return

[ $v1; xs:date; ValueEqOp; (cast ($v2) as xs:date); xs:date ]_BinaryOp

default $v2 return xf:error())

case xs:time $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:time $v2 return [ $v1; xs:time; ValueEqOp; $v2; xs:time ]_BinaryOp

case untyped $v2 return

[ $v1; xs:time; ValueEqOp; (cast ($v2) as xs:time); xs:time ]_BinaryOp

default $v2 return xf:error())

case xs:dateTime $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:dateTime $v2 return [ $v1; xs:dateTime; ValueEqOp; $v2; xs:dateTime ]_BinaryOp

case untyped $v2 return

[ $v1; xs:dateTime; ValueEqOp; (cast ($v2) as xs:dateTime); xs:dateTime ]_BinaryOp

default $v2 return xf:error())

case xs:dayTimeDuration $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:dayTimeDuration $v2 return [ $v1; xs:dayTimeDuration; ValueEqOp; $v2; xs:dayTimeDuration ]_BinaryOp

case untyped $v2 return

[ $v1; xs:dayTimeDuration; ValueEqOp; (cast ($v2) as xs:dayTimeDuration); xs:dayTimeDuration ]_BinaryOp

default $v2 return xf:error())

case xs:yearMonthDuration $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:yearMonthDuration $v2 return [ $v1; xs:yearMonthDuration; ValueEqOp; $v2; xs:yearMonthDuration ]_BinaryOp

case untyped $v2 return

[ $v1; xs:yearMonthDuration; ValueEqOp; (cast ($v2) as xs:yearMonthDuration); xs:yearMonthDuration ]_BinaryOp

default $v2 return xf:error())

case Gregorian $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case Gregorian $v2 return [ $v1; Gregorian; ValueEqOp; $v2; Gregorian ]_BinaryOp

case untyped $v2 return

[ $v1; Gregorian; ValueEqOp; (cast ($v2) as Gregorian); Gregorian ]_BinaryOp

default $v2 return xf:error())

case xs:hexBinary $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:hexBinary $v2 return [ $v1; xs:hexBinary; ValueEqOp; $v2; xs:hexBinary ]_BinaryOp

case untyped $v2 return

[ $v1; xs:hexBinary; ValueEqOp; (cast ($v2) as xs:hexBinary); xs:hexBinary ]_BinaryOp

default $v2 return xf:error())

case xs:base64Binary $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:base64Binary $v2 return [ $v1; xs:base64Binary; ValueEqOp; $v2; xs:base64Binary ]_BinaryOp

case untyped $v2 return

[ $v1; xs:base64Binary; ValueEqOp; (cast ($v2) as xs:base64Binary); xs:base64Binary ]_BinaryOp

default $v2 return xf:error())

case xs:anyURI $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:anyURI $v2 return [ $v1; xs:anyURI; ValueEqOp; $v2; xs:anyURI ]_BinaryOp

case untyped $v2 return

[ $v1; xs:anyURI; ValueEqOp; (cast ($v2) as xs:anyURI); xs:anyURI ]_BinaryOp

default $v2 return xf:error())

case xs:QName $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:QName $v2 return [ $v1; xs:QName; ValueEqOp; $v2; xs:QName ]_BinaryOp

case untyped $v2 return

[ $v1; xs:QName; ValueEqOp; (cast ($v2) as xs:QName); xs:QName ]_BinaryOp

default $v2 return xf:error())

case xs:NOTATION $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:NOTATION $v2 return [ $v1; xs:NOTATION; ValueEqOp; $v2; xs:NOTATION ]_BinaryOp

case untyped $v2 return

[ $v1; xs:NOTATION; ValueEqOp; (cast ($v2) as xs:NOTATION); xs:NOTATION ]_BinaryOp

default $v2 return xf:error())

case untyped $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case numeric $v2 return

[ (cast ($v1) as xs:double); xs:double; ValueEqOp; $v2; numeric ]_BinaryOp

case xs:string $v2 return

[ (cast ($v1) as xs:string); xs:string; ValueEqOp; $v2; xs:string]_BinaryOp

case xs:date $v2 return

[ (cast ($v1) as xs:date); xs:date; ValueEqOp; $v2; xs:date]_BinaryOp

case xs:time $v2 return

[ (cast ($v1) as xs:time); xs:time; ValueEqOp; $v2; xs:time]_BinaryOp

case xs:dateTime $v2 return

[ (cast ($v1) as xs:dateTime); xs:dateTime; ValueEqOp; $v2; xs:dateTime]_BinaryOp

case xs:dayTimeDuration $v2 return

[ (cast ($v1) as xs:dayTimeDuration); xs:dayTimeDuration; ValueEqOp; $v2; xs:dayTimeDuration ]_BinaryOp

case xs:yearMonthDuration $v2 return

[ (cast ($v1) as xs:yearMonthDuration); xs:yearMonthDuration; ValueEqOp; $v2; xs:yearMonthDuration ]_BinaryOp

case xs:hexBinary $v2 return

[ (cast ($v1) as xs:hexBinary); xs:hexBinary; ValueEqOp; $v2; xs:hexBinary ]_BinaryOp

case xs:base64Binary $v2 return

[ (cast ($v1) as xs:base64Binary); xs:base64Binary; ValueEqOp; $v2; xs:base64Binary ]_BinaryOp

case xs:anyURI $v2 return

[ (cast ($v1) as xs:anyURI); xs:anyURI; ValueEqOp; $v2; xs:anyURI ]_BinaryOp

case xs:QName $v2 return

[ (cast ($v1) as xs:QName); xs:QName; ValueEqOp; $v2; xs:QName ]_BinaryOp

case xs:NOTATION $v2 return

[ (cast ($v1) as xs:NOTATION); xs:NOTATION; ValueEqOp; $v2; xs:NOTATION ]_BinaryOp

case untyped $v2 return

[ (cast ($v1) as xs:string); xs:string; ValueEqOp; (cast ($v2) as xs:string); xs:string ]_BinaryOp

default $v2 return xf:error())

default $v1 return xf:error()
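
The treatment of untyped operands in the rule above follows a simple pattern, which the Python sketch below tries to capture. It is an illustration under stated assumptions: type names are plain strings, the helper names `cast_target` and `operand_types_after_cast` are hypothetical, and the error cases are not reproduced (for example, the rule has no case for an untyped left-hand operand with an xs:boolean right-hand operand).

```python
NUMERIC = {"xs:integer", "xs:decimal", "xs:float", "xs:double"}


def cast_target(other_type):
    """Type to which an untyped operand is cast, given the other operand's type."""
    if other_type == "untyped":
        return "xs:string"      # both operands untyped: compare as strings
    if other_type in NUMERIC:
        return "xs:double"      # untyped against numeric: cast to xs:double
    return other_type           # otherwise: cast to the other operand's type


def operand_types_after_cast(t1, t2):
    """Pair of operand types after the untyped casts have been applied."""
    new_t1 = cast_target(t2) if t1 == "untyped" else t1
    new_t2 = cast_target(t1) if t2 == "untyped" else t2
    return new_t1, new_t2
```

For example, comparing an untyped value with an xs:integer compares an xs:double with an xs:integer, comparing an xs:date with an untyped value compares two xs:date values, and comparing two untyped values compares them as strings.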

Ed. Note: MFF: The definition of equality operators could be factored by introducing another normalization function, which would be applied to the bodies of the cases. Is it clearer (albeit longer) to just enumerate all the cases? For now, they are enumerated.

Normalization

The value comparison inequality operators "lt", "le", "gt", and "ge" are defined on a smaller set of types than are the equality operators "eq" and "ne". For convenience, ValueInEqOp denotes the operators "lt", "le", "gt", or "ge".

[Expr₁ ValueInEqOp Expr₂]_Expr

let $coree1 := ([ Expr₁ ]_Expr),

$coree2 := ([ Expr₂ ]_Expr),

$e1 := [ $coree1 ]_Atomize ,

$e2 := [ $coree2 ]_Atomize

return

typeswitch ($e1)

case empty $v1 return ()

case numeric $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case numeric $v2 return [ $v1; numeric; ValueInEqOp; $v2; numeric ]_BinaryOp

case untyped $v2 return

[ $v1; numeric; ValueInEqOp; (cast ($v2) as xs:double); numeric ]_BinaryOp

default $v2 return xf:error())

case xs:boolean $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:boolean $v2 return [ $v1; xs:boolean; ValueInEqOp; $v2; xs:boolean ]_BinaryOp

case untyped $v2 return

[ $v1; xs:boolean; ValueInEqOp; (cast ($v2) as xs:boolean); xs:boolean ]_BinaryOp

default $v2 return xf:error())

case xs:string $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:string $v2 return [ $v1; xs:string; ValueInEqOp; $v2; xs:string ]_BinaryOp

case untyped $v2 return

[ $v1; xs:string; ValueInEqOp; (cast ($v2) as xs:string); xs:string ]_BinaryOp

default $v2 return xf:error())

case xs:date $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:date $v2 return [ $v1; xs:date; ValueInEqOp; $v2; xs:date ]_BinaryOp

case untyped $v2 return

[ $v1; xs:date; ValueInEqOp; (cast ($v2) as xs:date); xs:date ]_BinaryOp

default $v2 return xf:error())

case xs:time $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:time $v2 return [ $v1; xs:time; ValueInEqOp; $v2; xs:time ]_BinaryOp

case untyped $v2 return

[ $v1; xs:time; ValueInEqOp; (cast ($v2) as xs:time); xs:time ]_BinaryOp

default $v2 return xf:error())

case xs:dateTime $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:dateTime $v2 return [ $v1; xs:dateTime; ValueInEqOp; $v2; xs:dateTime ]_BinaryOp

case untyped $v2 return

[ $v1; xs:dateTime; ValueInEqOp; (cast ($v2) as xs:dateTime); xs:dateTime ]_BinaryOp

default $v2 return xf:error())

case xs:dayTimeDuration $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:dayTimeDuration $v2 return [ $v1; xs:dayTimeDuration; ValueInEqOp; $v2; xs:dayTimeDuration ]_BinaryOp

case untyped $v2 return

[ $v1; xs:dayTimeDuration; ValueInEqOp; (cast ($v2) as xs:dayTimeDuration); xs:dayTimeDuration ]_BinaryOp

default $v2 return xf:error())

case xs:yearMonthDuration $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case xs:yearMonthDuration $v2 return [ $v1; xs:yearMonthDuration; ValueInEqOp; $v2; xs:yearMonthDuration ]_BinaryOp

case untyped $v2 return

[ $v1; xs:yearMonthDuration; ValueInEqOp; (cast ($v2) as xs:yearMonthDuration); xs:yearMonthDuration ]_BinaryOp

default $v2 return xf:error())

case untyped $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case numeric $v2 return

[ (cast ($v1) as xs:double); xs:double; ValueInEqOp; $v2; numeric]_BinaryOp

case xs:string $v2 return

[ (cast ($v1) as xs:string); xs:string; ValueInEqOp; $v2; xs:string ]_BinaryOp

case xs:date $v2 return

[ (cast ($v1) as xs:date); xs:date; ValueInEqOp; $v2; xs:date ]_BinaryOp

case xs:time $v2 return

[ (cast ($v1) as xs:time); xs:time; ValueInEqOp; $v2; xs:time]_BinaryOp

case xs:dateTime $v2 return

[ (cast ($v1) as xs:dateTime); xs:dateTime; ValueInEqOp; $v2; xs:dateTime]_BinaryOp

case xs:dayTimeDuration $v2 return

[ (cast ($v1) as xs:dayTimeDuration); xs:dayTimeDuration; ValueInEqOp; $v2; xs:dayTimeDuration]_BinaryOp

case xs:yearMonthDuration $v2 return

[ (cast ($v1) as xs:yearMonthDuration); xs:yearMonthDuration; ValueInEqOp; $v2; xs:yearMonthDuration]_BinaryOp

case untyped $v2 return

[ (cast ($v1) as xs:string); xs:string; ValueInEqOp; (cast ($v2) as xs:string); xs:string ]_BinaryOp

default $v2 return xf:error())

default $v1 return xf:error()

Core Grammar

There are no core grammar rules for value comparisons as they are normalized to function calls.

Static Type Analysis

There are no static type rules for the value comparison operators. They all have return type xs:boolean, as specified in [XQuery 1.0 and XPath 2.0 Functions and Operators].

Dynamic Evaluation

The normalization rules map all value comparison operators into core expressions, whose dynamic semantics is defined in other sections. Therefore, there are no dynamic semantics rules for value comparison operators. The dynamic semantics rules for function calls given in [5.1.4 Function Calls] apply to all the function calls: op:numeric-less-than, etc.

5.5.2 General Comparisons

Introduction

General comparisons are defined by adding existential semantics to value comparisons. The operands of a general comparison may be sequences of any length. The result of a general comparison is always true or false.

Notation

For convenience, GeneralOp denotes the operators "=", "!=", "<", "<=", ">", or ">=".

The function []_ValueOp is defined by the following table:

GeneralOp	[GeneralOp]_ValueOp
`=`	`eq`
`!=`	`ne`
`<`	`lt`
`<=`	`le`
`>`	`gt`
`>=`	`ge`

Normalization

A general comparison expression is normalized by mapping it into an existentially quantified, value-comparison expression, which is normalized recursively.

[Expr₁ GeneralOp Expr₂]_Expr

[some $v1 in Expr₁ satisfies (some $v2 in Expr₂ satisfies $v1 [GeneralOp]_ValueOp $v2)]_Expr
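
The existential reading of this rule can be sketched in Python as a nested `any`. The sketch assumes the operands are already sequences of mutually comparable atomic values, and it lets Python's own comparison functions stand in for the value comparisons "eq", "ne", "lt", "le", "gt", and "ge"; the names `VALUE_OP` and `general_compare` are hypothetical.

```python
import operator

# Value comparison corresponding to each general comparison operator,
# following the [GeneralOp]_ValueOp table.
VALUE_OP = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def general_compare(op, seq1, seq2):
    """True if some pair (v1, v2) drawn from the two sequences
    satisfies the corresponding value comparison."""
    value_op = VALUE_OP[op]
    return any(value_op(v1, v2) for v1 in seq1 for v2 in seq2)
```

One consequence of the existential semantics is that `general_compare("=", [1, 2], [1, 2])` and `general_compare("!=", [1, 2], [1, 2])` both hold, since a matching pair exists for each operator. If either sequence is empty, no pair exists and the result is false.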

Core Grammar

There are no core grammar rules for general comparisons as they are normalized to existentially quantified core expressions.

Static Type Analysis

There are no static type rules for the general comparison operators. The existentially quantified "some" expression always returns xs:boolean. Its static typing semantics is given in [5.11 Quantified Expressions].

Dynamic Evaluation

The normalization rules map all general comparison operators into core expressions, whose dynamic semantics is defined in other sections. Therefore, there are no dynamic semantics rules for general comparison operators.

5.5.3 Node Comparisons

Notation

For convenience, NodeOp denotes the operators "is" and "isnot".

Normalization

The normalization rule for node comparison expressions checks that both operands are optional node values; otherwise, it generates an error. If both operands are nodes, it applies the operator specified by the []_BinaryOp function. If either operand is empty, the result is the empty sequence.

[Expr₁ NodeOp Expr₂]_Expr

let $e1 := ([ Expr₁ ]_Expr),

$e2 := ([ Expr₂ ]_Expr)

return

typeswitch ($e1)

case empty $v1 return

(typeswitch ($e2)

case node? $v2 return ()

default $v2 return xf:error())

case node $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case node $v2 return [ $v1; node; NodeOp; $v2; node ]_BinaryOp

default $v2 return xf:error())

default $v1 return xf:error()

Ed. Note: Don Chamberlin points out that the above rule is nearly identical to the one in the previous section. Merging the two rules should be considered.
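
The case analysis of the node comparison rule can be sketched in Python as follows. This is only an illustration: the empty sequence is modelled as `None`, the class `Node` is a hypothetical stand-in for a data model node, Python object identity stands in for op:node-equal, and operands holding more than one item are not modelled.

```python
class Node:
    """Stand-in for a data model node; identity is Python object identity."""


def node_is(e1, e2):
    """Sketch of `e1 is e2`: returns None for the empty sequence and
    raises TypeError where the rule calls xf:error()."""
    for operand in (e1, e2):
        if operand is not None and not isinstance(operand, Node):
            raise TypeError("xf:error()")   # operand is not an optional node
    if e1 is None or e2 is None:
        return None                         # an empty operand gives an empty result
    return e1 is e2                         # stands in for op:node-equal
```

The "isnot" operator would negate the final result in the same way that the operator table wraps op:node-equal in xf:not.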

Core Grammar

There are no core grammar rules for node comparisons as they are normalized to function calls.

Static Type Analysis

There are no static type rules for the node comparison operators.

Dynamic Evaluation

The normalization rules map the node comparison operators into core expressions, whose dynamic semantics is defined in other sections. Therefore, there are no dynamic semantics rules for node comparison operators.

5.5.4 Order Comparisons

Notation

For convenience, OrderOp denotes the operators "follows", "precedes", "<<", and ">>".

Normalization

The normalization rule for order comparison expressions checks that both operands are optional node values; otherwise, it generates an error. If both operands are nodes, it applies the operator specified by the []_BinaryOp function. If either operand is empty, the result is the empty sequence.

[Expr₁ OrderOp Expr₂]_Expr

let $e1 := ([ Expr₁ ]_Expr),

$e2 := ([ Expr₂ ]_Expr)

return

typeswitch ($e1)

case empty $v1 return

(typeswitch ($e2)

case node? $v2 return ()

default $v2 return xf:error())

case node $v1 return

(typeswitch ($e2)

case empty $v2 return ()

case node $v2 return [$v1; node; OrderOp; $v2; node ]_BinaryOp

default $v2 return xf:error())

default $v1 return xf:error()

Core Grammar

There are no core grammar rules for order comparisons as they are normalized to function calls.

Static Type Analysis

There are no static type rules for the order comparison operators.

Dynamic Evaluation

The normalization rules map the order comparison operators into core expressions, whose dynamic semantics is defined in other sections. Therefore, there are no dynamic semantics rules for order comparison operators.

5.6 Logical Expressions

Introduction

A logical expression is either an and-expression or an or-expression. The value of a logical expression is always one of the boolean values: true or false.

[7 (XQuery)]	OrExpr	::=	AndExpr ( "or" AndExpr )*
[8 (XQuery)]	AndExpr	::=	UnorderedExpr ( "and" UnorderedExpr )*
[12 (XPath)]	AndExpr	::=	ForExpr ( "and" ForExpr )*

Ed. Note: [Kristoffer/XSL] The present rules mix static and dynamic dispatch of which primitive function is used. We are investigating how to repair that as well as make the definitions work with both static and dynamic typing. See [Issue-0122: Overloaded functions].

Normalization

The normalization rules for "and" and "or" first compute the effective boolean value of each operand, then combine the two values with a conditional expression.

[Expr₁ and Expr₂]_Expr

let $e1 := [ Expr₁ ]_{Effective_Boolean_Value} return

let $e2 := [ Expr₂ ]_{Effective_Boolean_Value} return

if ($e1) then $e2

else if (xf:not($e1)) then xf:false()

else xf:error()

Ed. Note: Note that non-determinism in the semantics of logical expressions is not modeled here. See [Issue-0136: Non-determinism in the semantics].

[Expr₁ or Expr₂]_Expr

let $e1 := [ Expr₁ ]_{Effective_Boolean_Value} return

let $e2 := [ Expr₂ ]_{Effective_Boolean_Value} return

if ($e1) then xf:true()

else if (xf:not($e1)) then $e2

else xf:error()

Core Grammar

There are no core grammar rules for logical expressions, as they are normalized to other kinds of [XPath/XQuery] expressions.

Static Type Analysis

There are no static type rules for the logical operators. Both have return type xs:boolean, as specified in [XQuery 1.0 and XPath 2.0 Functions and Operators].

Dynamic Evaluation

The normalization rules map the logical operators into core expressions, whose dynamic semantics is defined in other sections. Therefore, there are no dynamic semantics rules for logical operators.

5.7 Constructors

[XPath/XQuery] supports two forms of constructors: a "literal" form that follows the XML syntax, and element and attribute constructors that can be used to construct stand-alone elements and attributes, possibly with a computed name.

[31 (XQuery)]	Constructor	::=	ElementConstructor \| XmlComment \| XmlProcessingInstruction \| CdataSection \| ComputedDocumentConstructor \| ComputedElementConstructor \| ComputedAttributeConstructor
[70 (XQuery)]	ElementConstructor	::=	( \| "<") QName AttributeList ("/>" \| (">" ElementContent* "</" QName S? ">"))
[77 (XQuery)]	ElementContent	::=	Char \| "{{" \| "}}" \| ElementConstructor \| EnclosedExpr \| CdataSection \| CharRef \| PredefinedEntityRef \| XmlComment \| XmlProcessingInstruction
[78 (XQuery)]	AttributeList	::=	(S (QName S? "=" S? AttributeValue)?)*
[79 (XQuery)]	AttributeValue	::=	(["] (EscapeQuot \| AttributeValueContent)* ["]) \| (['] (EscapeApos \| AttributeValueContent)* ['])
[80 (XQuery)]	AttributeValueContent	::=	Char \| CharRef \| "{{" \| "}}" \| EnclosedExpr \| PredefinedEntityRef
[81 (XQuery)]	EnclosedExpr	::=	( \| "{") ExprSequence "}"

Core Grammar

The core grammar productions for constructors are:

[20 (Core)]	Constructor	::=	XmlComment \| XmlProcessingInstruction \| ComputedDocumentConstructor \| ComputedElementConstructor \| ComputedAttributeConstructor
[57 (Core)]	EnclosedExpr	::=	( \| "{") ExprSequence "}"

5.7.1 Element Constructors

Introduction

[77 (XQuery)]	ElementContent	::=	Char \| "{{" \| "}}" \| ElementConstructor \| EnclosedExpr \| CdataSection \| CharRef \| PredefinedEntityRef \| XmlComment \| XmlProcessingInstruction

The static and dynamic semantics of the literal forms of element constructors are obtained after normalization to computed element constructors.

Notation

The auxiliary mapping rules []_{ElementContent} and []_{AttributeContent} are used for the normalization of element and attribute content respectively.

Normalization

Literal characters, escaped curly braces, character references, and predefined entity references in element content are treated specially. This normalization rule assumes:

that the significant whitespace characters in element constructors have been preserved, as described in [5.7.3 Whitespace in Constructors];
that character references have been resolved to individual characters and predefined entity references have been resolved to sequences of characters, and
that the rule is applied to the longest contiguous sequence of characters.

The following normalization rule takes the longest consecutive sequence of individual characters that arise from literal characters, escaped curly braces, character references, and predefined entity references, and normalizes the character sequence as a text node.

[Char*]_{ElementContent}

dm:text-node(fs:characters_to_string(Char*))

We start with the rules for normalizing a literal XML element's content. We normalize each individual value and construct a sequence of the normalized values. Consecutive sequences of characters are normalized as a unit, applying the rule above.

[ ElementContent₀ ..., ElementContent_n ]_{ElementContent}

([ ElementContent₀ ]_{ElementContent} , ..., [ ElementContent_n]_{ElementContent})

We normalize an enclosed expression in element content by normalizing each individual expression in its expression sequence and then construct a sequence of the normalized values:

[ { Expr₀, ..., Expr_n } ]_{ElementContent}

([ Expr₀ ]_Expr , ..., [ Expr_n]_Expr, dm:text-node(""))

We need to distinguish between multiple enclosed expressions, because the rules for converting sequences of atomic values into strings are applied to sequences within distinct enclosed expressions. To distinguish between multiple enclosed expressions, we add an empty text node to the end of the constructed sequence above -- this empty text node is eliminated in the dynamic evaluation rule when consecutive text nodes are coalesced into a single text node. The text node guarantees that a whitespace character will not be inserted between atomic values computed by distinct enclosed expressions.
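
The effect of the empty text node can be illustrated with a small Python model. This is only a sketch under simplifying assumptions: the class Text and the function to_text_nodes are hypothetical names introduced here, content is restricted to atomic values and text nodes, and atomic values are rendered with Python's str rather than their canonical lexical representation.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Text:
    """Hypothetical stand-in for a text node; not a data model API."""
    value: str


def to_text_nodes(items):
    """Collapse a content sequence of atomic values and Text nodes.

    Step 1: each run of adjacent atomic values becomes one string,
            with a single blank between the values.
    Step 2: all resulting text is coalesced; an empty result disappears.
    """
    pieces = []
    for is_text, run in groupby(items, key=lambda item: isinstance(item, Text)):
        run = list(run)
        if is_text:
            pieces.append("".join(node.value for node in run))
        else:
            pieces.append(" ".join(str(value) for value in run))
    joined = "".join(pieces)
    return [Text(joined)] if joined else []


# <a>{1, 2}{3}</a>: each enclosed expression contributes a trailing Text("").
with_separators = [1, 2, Text(""), 3, Text("")]
# The same atomic values without the separating empty text nodes.
without_separators = [1, 2, 3]

print(to_text_nodes(with_separators))     # [Text(value='1 23')]
print(to_text_nodes(without_separators))  # [Text(value='1 2 3')]
```

With the separators, no blank appears between 2 and 3, because they come from distinct enclosed expressions; without them, all three values would be treated as one run.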

Processing instructions and comments in element content are normalized by applying the standard normalization rules.

[XmlProcessingInstruction]_{ElementContent}

[XmlProcessingInstruction]_Expr

[XmlComment]_{ElementContent}

[XmlComment]_Expr

The following rules normalize the two forms of literal XML element constructors by normalizing their name, attribute list, and element content and translating the normalized values into the corresponding computed element constructor.

[70 (XQuery)]	ElementConstructor	::=	( \| "<") QName AttributeList ("/>" \| (">" ElementContent* "</" QName S? ">"))

[ < QName AttributeList > ElementContent* </ QName S? > ]_Expr

element [QName]{ [ AttributeList ]_{AttributeContent} , [ ElementContent* ]_{ElementContent} }

[ < QName AttributeList /> ]_Expr

element [QName]{ [ AttributeList ]_{AttributeContent} }

Like literal XML element constructors, literal XML attribute constructors are normalized to computed attribute constructors.

[78 (XQuery)]	AttributeList	::=	(S (QName S? "=" S? AttributeValue)?)*
[79 (XQuery)]	AttributeValue	::=	(["] (EscapeQuot \| AttributeValueContent)* ["]) \| (['] (EscapeApos \| AttributeValueContent)* ['])
[80 (XQuery)]	AttributeValueContent	::=	Char \| CharRef \| "{{" \| "}}" \| EnclosedExpr \| PredefinedEntityRef

Literal characters, escaped curly braces, character references, and predefined entity references in attribute content are treated as in element content. In addition, the normalization rule for characters in attributes assumes:

that an escaped single or double quote is converted to an individual single or double quote.

The following normalization rule takes the longest consecutive sequence of individual characters that arise from literal characters, escaped curly braces, character references, predefined entity references, and escaped single and double quotes, and normalizes the character sequence as a string.

[Char*]_{AttributeContent}

fs:characters_to_string(Char*)

We normalize an enclosed expression in attribute content by normalizing each individual expression in its expression sequence and then construct a sequence of the normalized values:

[ { Expr₀, ..., Expr_n } ]_{AttributeContent}

([ Expr₀ ]_Expr , ..., [ Expr_n]_Expr, dm:text-node(""))

As in literal XML elements, we need to distinguish between multiple enclosed expressions in attribute content, because the rules for converting sequences of atomic values into strings are applied to sequences within distinct enclosed expressions. As in element content, to distinguish between multiple enclosed expressions in attribute content, we add an empty text node to the end of the constructed sequence above -- this empty text node is eliminated in the dynamic evaluation rule.

We normalize an AttributeValue (a literal attribute's value) by normalizing each individual value in its content and then construct a sequence of the normalized values. Consecutive sequences of characters are normalized as a unit, applying the rules above.

[ AttributeValueContent₀ ..., AttributeValueContent_n ]_{AttributeContent}

([ AttributeValueContent₀ ]_{AttributeContent} , ..., [ AttributeValueContent_n]_{AttributeContent})

An AttributeList is normalized by the following rule, which maps each of the individual attribute-value expressions in the attribute list and constructs a sequence of the normalized values.

[

QName₀ S? = S? (AttributeValue₀ | EnclosedExpr₀) ...

QName_n S? = S? (AttributeValue_n | EnclosedExpr_n) ...

]

(attribute [QName₀] { [ (AttributeValue₀ | EnclosedExpr₀) ]_{AttributeContent}},

...,

attribute [QName_n] { [ (AttributeValue_n | EnclosedExpr_n) ]_{AttributeContent}})

Core Grammar

There are no core grammar rules for literal XML element or attribute constructors as they are normalized to computed constructors.

Static Type Analysis

There are no additional static type rules for literal XML element or attribute constructors.

Dynamic Evaluation

There are no additional dynamic evaluation rules for literal XML element or attribute constructors.

5.7.2 Computed Element and Attribute Constructors

[72 (XQuery)]	ComputedElementConstructor	::=	(<"element" QName "{"> \| (<"element" "{"> Expr "}" "{")) ExprSequence? "}"
[73 (XQuery)]	ComputedAttributeConstructor	::=	(<"attribute" QName "{"> \| (<"attribute" "{"> Expr "}" "{")) ExprSequence? "}"
[71 (XQuery)]	ComputedDocumentConstructor	::=	<"document" "{"> ExprSequence "}"

5.7.2.1 Computed Element Nodes

An element constructor may contain an arbitrary sequence of items, e.g., atomic values, attributes, and child elements. Here's an element constructor that contains an integer, an element, a string, and an attribute:

element address { 123, element street {"Roosevelt Ave."},
"Flushing, NY", attribute zip { "11368" } }

Normalization

Computed element constructors are normalized by mapping their name and expression sequence. The normalization rule partitions attribute nodes from other nodes and atomic values in the element content.

[element QName { ExprSequence }]_Expr

let $e := [ExprSequence]_Expr return

let $attributes := for $fs:new in $e return

typeswitch $fs:new

case attribute $a return $a

default return ()

let $everythingelse := for $fs:new in $e return

typeswitch $fs:new

case attribute $a return ()

default $v return $v

return

element [QName] { $attributes, $everythingelse }
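
The partitioning performed by this rule can be sketched in Python. The classes Attribute and Element and the function partition_content are hypothetical names used only for illustration; the sketch assumes that the relative order within each partition is preserved, as the two for expressions in the rule suggest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for an attribute node."""
    name: str
    value: str


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for an element node."""
    name: str
    content: tuple


def partition_content(items):
    """Return (attributes, everything_else), each in original order."""
    attributes = [item for item in items if isinstance(item, Attribute)]
    everything_else = [item for item in items if not isinstance(item, Attribute)]
    return attributes, everything_else


# Content of the address constructor shown earlier in this section.
content = [
    123,
    Element("street", ("Roosevelt Ave.",)),
    "Flushing, NY",
    Attribute("zip", "11368"),
]
attributes, everything_else = partition_content(content)

# The normalized constructor receives the attributes first.
print(attributes + everything_else)
```

In the normalized content the zip attribute precedes the integer, the street element, and the string, even though it was written last.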

When the name of an element is computed, the normalization rule also checks that the value of the element's name is an xs:QName.

[element { Expr } { ExprSequence }]_Expr

let xs:QName $name := [Expr]_Expr return

let $e := [ExprSequence]_Expr return

let $attributes := for $fs:new in $e return

typeswitch $fs:new

case attribute $a return $a

default return ()

let $everythingelse := for $fs:new in $e return

typeswitch $fs:new

case attribute $a return ()

default $v return $v

return

element { $name } { $attributes, $everythingelse }

5.7.2.2 Constructed Attribute Nodes

Normalization

Computed attribute expressions are normalized by mapping their sub-expressions.

[attribute QName { ExprSequence }]_Expr

attribute QName { [ExprSequence]_Expr }

[attribute { Expr } { ExprSequence }]_Expr

let xs:QName $name := [Expr]_Expr return

attribute { $name } { [ExprSequence]_Expr }

Core Grammar

The core grammar rules for computed constructors are:

[53 (Core)]	ComputedElementConstructor	::=	(<"element" QName "{"> \| (<"element" "{"> Expr "}" "{")) ExprSequence? "}"
[54 (Core)]	ComputedAttributeConstructor	::=	(<"attribute" QName "{"> \| (<"attribute" "{"> Expr "}" "{")) ExprSequence? "}"
[52 (Core)]	ComputedDocumentConstructor	::=	<"document" "{"> ExprSequence "}"

Static Type Analysis

The normalization rules leave us with only the operator form of the element or attribute constructor to handle. The element (attribute) operator still has two forms: one in which a QName is supplied as the element (attribute) name, and the other in which a computed expression is supplied. In the latter case, the name cannot be known until runtime, and the element (attribute) is given a wildcard type.

Note that the static type for constructed elements and attributes is very poor as the static type of their content is lost during construction. A possible solution is to implicitly validate when the element is constructed. See [Issue-0169: Conformance Levels] for suggestions.

statEnv |- expand(QName) = qname

statEnv |- element QName { ExprSequence } : element qname { xs:anyType }

statEnv |- Expr : xs:QName

statEnv |- element { Expr } { ExprSequence } : element { xs:anyType }

statEnv |- Expr : xs:QName

statEnv |- attribute { Expr } { ExprSequence } : attribute { xs:anySimpleType }

statEnv |- expand(QName) = qname

statEnv |- attribute QName { ExprSequence } : attribute qname { xs:anySimpleType }

Ed. Note: DD: the above is a necessary consequence of our current evaluation model. One variation that I can think of is that the static rules could try looking up the element name in element environment. If the element name is found, then the associated, named type is used (and the contents are required to match). If the named element type is not found, then the constructed type is used, as before. Thus, in this scenario, validation would be done whenever an element with a "global" name was created. See also comments on the impact of (non-)validation on element construction, in the next section.

Dynamic Evaluation

The following rules construct an element from its name and content. The earlier normalization rule guarantees that all attributes precede other element content, which is a sequence of nodes and atomic values. Section [3.7.4 Data Model Representation] in [XQuery 1.0: A Query Language for XML] specifies the rules for converting a sequence of atomic values into a text node prior to element construction. As formal specification of these conversion rules is not instructive, the function [7.1.3 The fs:item-sequence-to-node-sequence function] implements this conversion.

dynEnv |- QName => qname

dynEnv |- ExprSequence => ( AttributeValues, Items )

dynEnv |- element QName { ExprSequence } =>

`dm:element-node`(	qname,
	`dm:empty-sequence`(),
	AttributeValues,
	fs:item-sequence-to-node-sequence(Items),
	`xs:anyType`)

dynEnv |- Expr => qname

dynEnv |- ExprSequence => ( AttributeValues, Items )

dynEnv |- element { Expr } { ExprSequence } =>

`dm:element-node`(	qname,
	`dm:empty-sequence`(),
	AttributeValues,
	fs:item-sequence-to-node-sequence(Items),
	`xs:anyType`)

Note that the element constructor dm:element-node makes copies of any nodes in its arguments; therefore the result of element construction is a completely "new" element.

The following rules construct an attribute from its name and values and are similar to those for elements. Section [3.7.2 Data Model Representation] in [XQuery 1.0: A Query Language for XML] specifies the rules for converting a sequence of atomic values into a string prior to attribute construction. Each node is replaced by its string value. For each adjacent sequence of one or more atomic values returned by an enclosed expression, a string is constructed, containing the canonical lexical representation of all the atomic values, with a single blank character inserted between adjacent values. As formal specification of these conversion rules is not instructive, the function [7.1.4 The fs:item-sequence-to-string function] implements this conversion.

dynEnv |- QName => qname

dynEnv |- ExprSequence => AtomicValues

dynEnv |- attribute QName { ExprSequence } =>

`dm:attribute-node`(	qname,
	fs:item-sequence-to-string(AtomicValues),
	`xs:anySimpleType`)

dynEnv |- Expr => qname

dynEnv |- ExprSequence => AtomicValues

dynEnv |- attribute { Expr } { ExprSequence } =>

`dm:attribute-node`(	qname,
	fs:item-sequence-to-string(AtomicValues),
	`xs:anySimpleType`)

Ed. Note: DD: There are several issues with element construction:

Namespaces and more precise type annotations in the constructor are not supported. This is due to the bottom-up nature of element construction: in general, one does not know either the namespaces in scope or the validation-associated schema type until this element has been "seated" in some containing element (and so on recursively). See [Issue-0165: Namespaces in element constructors].

There is a possible solution to this chicken-and-egg problem, however: because the element constructor makes copies of its children, it could be the responsibility of the element constructor to "fill in" the values for namespaces-in-scope and schema-component on each newly-copied child (recursively), based on information provided for the node. In this scenario, these fields would remain "blank" until some appropriate activity caused a schema component to become associated with a node, etc.

One implication of such a scheme would be that the "value" of elements could change as they are copied into a new containing element. For example, defaulted attributes could be added. Possibly the interpretation of data values would change as well, e.g. a data value supplied as a string could be re-interpreted as a number.
Any special treatment of xmlns, etc. that would be needed to associate namespaces with elements is not modeled.
Even if/when schema components are available, it is not clear when or how defaulted attributes and/or elements are created.

5.7.3 Whitespace in Constructors

Section [3.7.3 Whitespace in Constructors] in [XQuery 1.0: A Query Language for XML] describes how whitespace in element and attribute constructors is processed depending on the value of the xmlspace declaration in the query prolog. The formal semantics assumes that the rules for handling whitespace are applied prior to normalization rules, for example, during parsing of a query. Therefore, there are no formal rules for handling whitespace.

5.7.4 Other Constructors and Comments

[74 (XQuery)]	CdataSection	::=	"<![CDATA[" Char* "]]>"
[75 (XQuery)]	XmlProcessingInstruction	::=	"<?" PITarget Char* "?>"
[76 (XQuery)]	XmlComment	::=	"<!--" Char* "-->"

Core Grammar

The core grammar productions for other constructors and comments are:

[55 (Core)]	XmlProcessingInstruction	::=	"<?" PITarget Char* "?>"
[56 (Core)]	XmlComment	::=	"<!--" Char* "-->"

Normalization

A literal XML character data (CDATA) section is normalized into a text node constructor by applying the rule for converting characters to a text node in element content.

[<![CDATA[ Char* ]]>]_Expr

[Char*]_{ElementContent}

A literal XML processing instruction is normalized into a processing instruction constructor; its character content is converted to a string as in attribute content.

[<? NCName Char* ?>]_Expr

dm:processing-instruction-node(NCName, [Char*]_{AttributeContent})

A literal XML comment is normalized into a comment constructor; its character content is converted to a string as in attribute content.

[<!-- Char* -->]_Expr

dm:comment-node([Char*]_{AttributeContent})

Static Type Analysis

There are no additional static type rules for CDATA, processing instructions or comments.

Dynamic Evaluation

There are no additional dynamic evaluation rules for CDATA, processing instructions or comments.

5.8 [For/FLWR] Expressions

Introduction

[XPath/XQuery] provides [For/FLWR] expressions for iteration, for binding variables to intermediate results, and for filtering bound variables according to a predicate.

A FLWRExpr in XQuery 1.0 consists of a sequence of ForClauses and LetClauses, followed by an optional WhereClause, followed by an expression to be "return"ed, as described by the following grammar productions. Each variable binding is preceded by an optional type assertion which specifies the type expected for the variable.

[10 (XQuery)]	FLWRExpr	::=	((ForClause \| LetClause)+ WhereClause? "return")* QuantifiedExpr
[26 (XQuery)]	ForClause	::=	"for" TypeDeclaration? "$" VarName "in" Expr ("," TypeDeclaration? "$" VarName "in" Expr)*
[27 (XQuery)]	LetClause	::=	"let" TypeDeclaration? "$" VarName ":=" Expr ("," TypeDeclaration? "$" VarName ":=" Expr)*
[28 (XQuery)]	WhereClause	::=	"where" Expr
[63 (XQuery)]	TypeDeclaration	::=	SequenceType
[13 (XPath)]	ForExpr	::=	(ForClause "return")* QuantifiedExpr

5.8.1 FLWR expressions

Notation

Individual [For/FLWR] clauses are normalized by means of the auxiliary normalization rules:

[FLWRClause]_FLWR(Expr)

where FLWRClause can be either a ForClause, a LetClause, or a WhereClause:

[56 (Formal)]	FLWRClause	::=	ForClause \| LetClause \| WhereClause

Note that, as is, this auxiliary rule normalizes a fragment of the [For/FLWR] expression, while taking the remainder of the expression (in Expr) as an additional parameter.

Normalization

Full [For/FLWR] expressions are normalized to nested core expressions using two sets of normalization rules. Note that some of the rules also accept ungrammatical FLWRExprs such as "where Expr₁ return Expr₂". This does not matter, as normalization is always applied on parsed [XPath/XQuery] expressions, and ungrammatical FLWRExprs would be rejected by the parser beforehand.

The first set of rules is applied on a full [For/FLWR] expression, splitting it at the clause level, then applying further normalization on each separate clause.

[ (ForClause | LetClause | WhereClause) FLWRExpr ]_Expr

[(ForClause | LetClause | WhereClause)]_FLWR([FLWRExpr])

[ (ForClause | LetClause | WhereClause) return Expr ]_Expr

[(ForClause | LetClause | WhereClause)]_FLWR([Expr])

Then each [For/FLWR] clause is normalized separately. A ForClause may bind more than one variable, whereas a for expression in the [XPath/XQuery] core binds and iterates over only one variable. Therefore, a ForClause is normalized to nested for expressions:

[ for TypeDeclaration₁? Variable₁ in Expr₁,..., TypeDeclaration_n? Variable_n in Expr_n ] _FLWR(Expr)

for TypeDeclaration₁? Variable₁ in [Expr₁]_Expr return

···

for TypeDeclaration_n? Variable_n in [ Expr_n ]_Expr return Expr

Each individual for clause is then normalized to always have a type assertion.

Note that the additional Expr parameter of the auxiliary normalization rule is used as the final return expression.

Likewise, a LetClause is normalized to nested let expressions:

[ let TypeDeclaration₁? Variable₁ := Expr₁, ···, TypeDeclaration_n? Variable_n := Expr_n]_FLWR(Expr)

let TypeDeclaration₁? Variable₁ := [Expr₁ ]_Expr return

···

let TypeDeclaration_n? Variable_n := [Expr_n]_Expr return Expr

A WhereClause is normalized to an IfExpr, with the else-branch returning the empty sequence:

[ where Expr₁]_FLWR(Expr)

if ( [Expr₁]_Expr ) then Expr else ()
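
The way the clause-level rules nest can be sketched with a small Python function that builds the core expression as text. This is an illustration only: normalize_flwr and its tuple encoding of clauses are hypothetical, an optional type declaration is simply kept as part of the variable text, and sub-expressions are copied as-is rather than normalized recursively.

```python
def normalize_flwr(clauses, return_expr):
    """Fold a list of FLWR clauses into one nested core expression (as text).

    Each clause is a tuple:
      ("for", [(var, expr), ...]), ("let", [(var, expr), ...]), or ("where", expr).
    Clauses are processed right to left, so the expression built so far plays
    the role of the extra Expr parameter of the auxiliary rule.
    """
    body = return_expr
    for kind, payload in reversed(clauses):
        if kind == "where":
            body = f"if ({payload}) then {body} else ()"
        elif kind == "for":
            # One nested for expression per bound variable.
            for var, expr in reversed(payload):
                body = f"for {var} in {expr} return {body}"
        elif kind == "let":
            # One nested let expression per bound variable.
            for var, expr in reversed(payload):
                body = f"let {var} := {expr} return {body}"
        else:
            raise ValueError(f"unknown clause kind: {kind}")
    return body


clauses = [
    ("for", [("xs:integer $i", "(1, 2)"), ("$j", "(3, 4)")]),
    ("let", [("$k", "$i + $j")]),
    ("where", "$k >= 5"),
]
print(normalize_flwr(clauses, "<tuple/>"))
```

The printed text is a single line in which the two for expressions, the let expression, and the conditional are nested in clause order, with the return expression innermost.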

Example

The following simple example illustrates how a FLWRExpr is normalized. The for expression in the example below is used to iterate over two collections, binding variables $i and $j to items in these collections. It uses a let clause to bind the local variable $k to the sum of both numbers, and a where clause to select only those numbers that have a sum equal to or greater than the integer 5.

  for xs:integer $i in (1, 2),
      $j in (3, 4)
  let $k := $i + $j
  where $k >= 5
  return
    <tuple>
       <i> { $i } </i>
       <j> { $j } </j>
    </tuple>

By the first set of rules, this is normalized to (except for the operators and element constructor which are not treated here):

  for xs:integer $i in (1, 2) return
    for $j in (3, 4) return
      let $k := $i + $j return
        if ($k >= 5) then 
          <tuple>
            <i> { $i } </i>
            <j> { $j } </j>
          </tuple>
        else
          ()

For each binding of $i to an item in the sequence (1 , 2) the inner for expression iterates over the sequence (3 , 4) to produce tuples ordered by the ordering of the outer sequence and then by the ordering of the inner sequence. This core expression eventually results in the following document fragment:

  (<tuple>
      <i>1</i>
      <j>4</j>
   </tuple>,
   <tuple>
      <i>2</i>
      <j>3</j>
   </tuple>,
   <tuple>
      <i>2</i>
      <j>4</j>
   </tuple>)

with the static type:

  element tuple {
    element i { xs:integer },
    element j { xs:integer }
  }*
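
The iteration order of the normalized expression can be checked with an equivalent pair of nested Python loops. This is only a sketch: the function name tuples is hypothetical, and each constructed tuple element is modeled as a Python pair (i, j).

```python
def tuples():
    """Nested iteration mirroring the normalized core expression."""
    result = []
    for i in (1, 2):            # for xs:integer $i in (1, 2) return
        for j in (3, 4):        #   for $j in (3, 4) return
            k = i + j           #     let $k := $i + $j return
            if k >= 5:          #       if ($k >= 5) then <tuple>...</tuple>
                result.append((i, j))
            # else (): the empty sequence contributes nothing
    return result


print(tuples())  # [(1, 4), (2, 3), (2, 4)]
```

The pair (1, 3) is filtered out because its sum is 4, which leaves the three tuples in the order of the outer sequence and then of the inner sequence.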

5.8.2 For expression

Core Grammar

After normalization, single for expressions are described by the following core grammar production.

[8 (Core)]	ForExpr	::=	(ForClause "return")* TypeswitchExpr
[16 (Core)]	ForClause	::=	"for" TypeDeclaration? "$" VarName "in" Expr
[45 (Core)]	TypeDeclaration	::=	SequenceType

Static Type Analysis

A single for expression is typed as follows: First Type₁ of the iteration expression Expr₁ is inferred. Then the prime type of Type₁ - prime(Type₁) - is determined. This is a choice over all item types in Type₁ (see also [3.6 Auxiliary typing judgments for for, unordered, and sortby expressions]). With the variable component of the static environment statEnv extended with Variable₁ of type prime(Type₁), the type Type₂ of Expr₂ is inferred. Because the for expression iterates over the result of Expr₁, the final type of the iteration is Type₂ multiplied with the possible number of items in Type₁ (one, ?, *, or +). This number is determined by the auxiliary type-function quantifier(Type₁).

statEnv |- Expr₁ : Type₁

statEnv [ varType(Variable₁ : prime(Type₁)) ] |- Expr₂ : Type₂

statEnv |- for Variable₁ in Expr₁ return Expr₂ : Type₂ · quantifier(Type₁)

In the case a type assertion is present, the static semantics also checks that the type of the input expression is a subtype of the asserted type. This semantics is specified as the following typing rule.

statEnv |- Expr₁ : Type₁

Type₀ = [ SequenceType ]_sequencetype

Type₁ <: Type₀

statEnv [ varType(Variable₁ : prime(Type₁)) ] |- Expr₂ : Type₂

statEnv |- for SequenceType Variable₁ in Expr₁ return Expr₂ : Type₂ · quantifier(Type₁)

Example

For example, if $example is bound to the sequence (<one/> , <two/> , <three/>) of type element one {}, element two {}, element three {}, then the query

  for $s in $example
  return <out> {$s} </out>

is typed as follows:

  (1) prime(element one {}, element two {}, element three {}) =
      element one {} | element two {} | element three {}
  (2) quantifier(element one {}, element two {}, element three {}) = +
  (3) $s : element one {} | element two {} | element three {}
  (4) <out> {$s} </out> : 
      element out {element one {} | element two {} | element three {}}
  (5) result-type :
      element out {element one {} | element two {} | element three {}}+

This result-type is not the most specific type possible. It does not take into account the order of elements in the input type, and it ignores the individual and overall number of elements in the input type. The most specific type possible is: element out {element one {}}, element out {element two {}}, element out {element three {}}. However, inferring such a specific type for arbitrary input types and arbitrary return clauses requires significantly more complex type inference rules. In addition, if put into the context of an element, the specific type violates the "unique particle attribution" restriction of XML schema, which requires that an element must have a unique content model within a particular context.
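
The typing steps of this example can be reproduced with a simplified Python model. The sketch rests on assumptions that are not taken from this section: a type is encoded as a non-empty list of (item type, occurrence indicator) pairs, occurrence indicators are treated as lower and upper bounds on the number of items, and the names prime, quantifier, and times are illustrative stand-ins for the auxiliary functions and the product written with "·" in the rule. The actual definitions are those of [3.6 Auxiliary typing judgments for for, unordered, and sortby expressions].

```python
# Occurrence indicators as (minimum, maximum) bounds; 2 stands for "more than one".
BOUNDS = {"1": (1, 1), "?": (0, 1), "+": (1, 2), "*": (0, 2)}
INDICATOR = {bounds: name for name, bounds in BOUNDS.items()}


def clamp(lo, hi):
    """Approximate exact bounds by one of the four indicators."""
    return INDICATOR[(min(lo, 1), min(hi, 2))]


def prime(seq_type):
    """Choice over the item types of a sequence type; order and counts are dropped."""
    seen = []
    for item_type, _ in seq_type:
        if item_type not in seen:
            seen.append(item_type)
    return " | ".join(seen)


def quantifier(seq_type):
    """Indicator for the total number of items in a non-empty sequence type."""
    lo = sum(BOUNDS[occ][0] for _, occ in seq_type)
    hi = sum(BOUNDS[occ][1] for _, occ in seq_type)
    return clamp(lo, hi)


def times(q1, q2):
    """Product of two indicators, as in Type2 multiplied by quantifier(Type1)."""
    (lo1, hi1), (lo2, hi2) = BOUNDS[q1], BOUNDS[q2]
    return clamp(lo1 * lo2, hi1 * hi2)


example = [("element one {}", "1"), ("element two {}", "1"), ("element three {}", "1")]

print(prime(example))                   # element one {} | element two {} | element three {}
print(quantifier(example))              # +
print(times("1", quantifier(example)))  # +
```

The body type element out {...} occurs exactly once per iteration, so multiplying its indicator by + gives the + of step (5).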

Dynamic Evaluation

The evaluation of a for expression distinguishes two cases: If the iteration expression Expr₁ evaluates to the empty sequence, then the entire expression evaluates to the empty sequence:

dynEnv |- Expr₁ => Value

empty(Value)

dynEnv |- for TypeDeclaration? Variable₁ in Expr₁ return Expr₂ => dm:empty-sequence()

Otherwise, the iteration expression Expr₁ is evaluated to produce the sequence Item₁, ..., Item_n. For each item Item_i in this sequence, the body of the for expression Expr₂ is evaluated in the environment dynEnv extended with Variable₁ bound to Item_i. This produces values Value₁, ..., Value_n, which are concatenated to produce the result sequence.

dynEnv |- Expr₁ => Item₁ ,..., Item_n

dynEnv [ varValue(Variable₁ |-> Item₁) ] |- Expr₂ => Value₁

···

dynEnv [ varValue(Variable₁ |-> Item_n) ] |- Expr₂ => Value_n

dynEnv |- for Variable₁ in Expr₁ return Expr₂ => op:concatenate(Value₁ ,..., Value_n)

In the case a type assertion is present, the dynamic semantics also checks that the input value matches the asserted type. This semantics is specified as the following dynamic rule.

dynEnv |- Expr₁ => Item₁ ,..., Item_n

Type₀ = [ SequenceType ]_sequencetype

Item₁ matches Type₀

dynEnv [ varValue(Variable₁ |-> Item₁) ] |- Expr₂ => Value₁

···

Item_n matches Type₀

dynEnv [ varValue(Variable₁ |-> Item_n) ] |- Expr₂ => Value_n

dynEnv |- for SequenceType Variable₁ in Expr₁ return Expr₂ => op:concatenate(Value₁ ,..., Value_n)

Ed. Note: The dynamic semantics of for could be better defined without the use of ··· and using recursion. See [Issue-0134: Should we define for with head and tail?].

Example

Sequences are never nested in the [XPath/XQuery] data model, so when the expression in the return clause itself returns a sequence, the results of the individual iterations are flattened into a single sequence. For instance, in the following for expression:

  
  for $i in (1,2)
    return (<i> {$i} </i>, <negi> {-$i} </negi>)

each iteration in the for results in a sequence of two elements, which are then concatenated and flattened in the resulting sequence (using the op:concatenate function):

  
  (<i>1</i>,
   <negi>-1</negi>,
   <i>2</i>,
   <negi>-2</negi>)
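
The dynamic rule for for can be sketched in Python as follows. This is a minimal model, not the specification's definition: eval_for is a hypothetical name, sequences are Python lists, the dynamic environment is a dict, the body is a function from an environment to a list, and elements are modeled as (name, value) pairs.

```python
def eval_for(variable, items, body, env):
    """Evaluate `for variable in items return body` over Python lists.

    `body` is a function from an environment (dict) to a list of items.
    """
    result = []
    for item in items:
        extended = {**env, variable: item}   # dynEnv extended with the binding
        result.extend(body(extended))        # concatenation flattens the values
    return result


# for $i in (1, 2) return (<i>{$i}</i>, <negi>{-$i}</negi>)
out = eval_for(
    "i",
    [1, 2],
    lambda env: [("i", env["i"]), ("negi", -env["i"])],
    {},
)
print(out)  # [('i', 1), ('negi', -1), ('i', 2), ('negi', -2)]
```

An empty input list yields an empty result, which matches the first case of the dynamic rules.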

5.8.3 Let Expression

Core Grammar

After normalization, single let expressions are described by the following core grammar production.

[9 (Core)]	LetExpr	::=	(LetClause "return")* TypeswitchExpr
[17 (Core)]	LetClause	::=	"let" TypeDeclaration? "$" VarName ":=" Expr

Static Type Analysis

A let expression extends the type environment statEnv with Variable₁ of type Type₁ inferred from Expr₁, and infers the type of Expr₂ in the extended environment to produce the result type Type₂.

statEnv |- Expr₁ : Type₁

statEnv [ varType(Variable₁ : Type₁) ] |- Expr₂ : Type₂

statEnv |- let Variable₁ := Expr₁ return Expr₂ : Type₂

In the case a type assertion is present, the static semantics also checks that the type of the input expression is a subtype of the asserted type. This semantics is specified as the following typing rule.

statEnv |- Expr₁ : Type₁

Type₀ = [ SequenceType ]_sequencetype

Type₁ <: Type₀

statEnv [ varType(Variable₁ : Type₁) ] |- Expr₂ : Type₂

statEnv |- let SequenceType Variable₁ := Expr₁ return Expr₂ : Type₂

Dynamic Evaluation

A let expression extends the dynamic environment dynEnv with Variable₁ bound to Value₁ returned by Expr₁, and evaluates Expr₂ in the extended environment to produce Value₂.

dynEnv |- Expr₁ => Value₁

dynEnv [ varValue(Variable₁ |-> Value₁) ] |- Expr₂ => Value₂

dynEnv |- let Variable₁ := Expr₁ return Expr₂ => Value₂

In the case a type assertion is present, the dynamic semantics also checks that the input value matches the asserted type. This semantics is specified as the following dynamic rule.

dynEnv |- Expr₁ => Value₁

Type₀ = [ SequenceType ]_sequencetype

Value₁ matches Type₀

dynEnv [ varValue(Variable₁ |-> Value₁) ] |- Expr₂ => Value₂

dynEnv |- let SequenceType Variable₁ := Expr₁ return Expr₂ => Value₂

Example

Note the use of the environment discipline to define the scope of each variable. For instance, in the following nested let expression:

  let $k := 5 return
    let $k := $k + 1 return
      $k+1

the outermost let expression binds variable $k to the integer 5 in the environment, then the expression $k+1 is computed, yielding value 6, to which the second variable $k is bound. The expression then results in the final integer 7.
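
The environment discipline of this example can be sketched in Python. The function eval_let is a hypothetical name; the dynamic environment is modeled as a dict and each expression as a function from an environment to a value.

```python
def eval_let(variable, value_expr, body, env):
    """Evaluate `let variable := value_expr return body`.

    Both expressions are functions from an environment (dict) to a value.
    """
    value = value_expr(env)                 # Expr1 sees the outer environment
    extended = {**env, variable: value}     # a new environment; env is unchanged
    return body(extended)


result = eval_let(
    "k",
    lambda env: 5,
    lambda env: eval_let(
        "k",
        lambda env: env["k"] + 1,           # the outer $k, which is 5
        lambda env: env["k"] + 1,           # the inner $k, which is 6
        env,
    ),
    {},
)
print(result)  # 7
```

Because each let builds a new environment instead of updating the old one, the inner binding shadows the outer one only within its own body.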

5.9 Order-Related Expressions

5.9.1 Sorting Expressions

[6 (XQuery)]	SortExpr	::=	OrExpr ( "stable"? <"sort" "by" "("> SortSpecList ")" )
[36 (XQuery)]	SortSpecList	::=	Expr SortModifier ("," SortSpecList)?
[37 (XQuery)]	SortModifier	::=	("ascending" \| "descending")? (<"empty" "greatest"> \| <"empty" "least">)?

Core Grammar

The core grammar productions for sortby are:

[6 (Core)]	SortExpr	::=	UnorderedExpr ( "stable"? <"sort" "by" "("> SortSpecList ")" )
[21 (Core)]	SortSpecList	::=	Expr SortModifier ("," SortSpecList)?
[22 (Core)]	SortModifier	::=	("ascending" \| "descending") (<"empty" "greatest"> \| <"empty" "least">)

Ed. Note: Status: The semantics of sortby is not currently specified. Revision is pending agreement about the semantics of sorting. See [Issue-0109: Semantics of sortby].

5.9.2 Unordered Expressions

[9 (XQuery)]	UnorderedExpr	::=	"unordered"? FLWRExpr

Introduction

The unordered keyword may be used as a prefix to any expression to indicate that its order is not significant.

Core Grammar

The core grammar production for unordered expressions is:

[7 (Core)]	UnorderedExpr	::=	"unordered"? ForExpr

Notation

The dynamic semantics for unordered uses an auxiliary judgment which disregards the order of the items in a sequence.

The following judgment

dynEnv |- Value₁ permutes to Value₂

holds if the sequence of items in Value₂ is a permutation of the sequence of items in Value₁.

Static Type Analysis

The static semantics for unordered uses the auxiliary type functions prime(Type) and quantifier(Type), which are defined in [3.6 Auxiliary typing judgments for for, unordered, and sortby expressions]. The type of the argument is determined, and then prime(.) and quantifier(.) are applied to that type, Type₁.

statEnv |- Expr₁ : Type₁

statEnv |- unordered Expr₁ : prime(Type₁) · quantifier(Type₁)

Dynamic Evaluation

The dynamic semantics for unordered is specified using the auxiliary judgment permutes to as follows.

dynEnv |- Expr₁ => Value₁

dynEnv |- Value₁ permutes to Value₂

dynEnv |- unordered Expr₁ => Value₂

It is important to remark that the permutes to judgment is non-deterministic. There are many sequences which can be a permutation of a given sequence. Any of those permutations would satisfy the above semantics.
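As an informal illustration of the permutes to judgment, the following sketch checks whether one sequence is a permutation of another. It assumes that items can be compared with Python equality, which glosses over node identity in the data model; the function name `permutes_to` is hypothetical.

```python
from itertools import permutations

def permutes_to(value1, value2):
    """Hold (return True) if value2 is a permutation of the items of value1."""
    remaining = list(value2)
    for item in value1:
        if item not in remaining:
            return False
        remaining.remove(item)   # consume one occurrence, so duplicates are counted
    return not remaining         # value2 must not contain extra items

# Every reordering of the input is an admissible result of "unordered".
value1 = ("a", "b", "c")
admissible = [p for p in permutations(value1) if permutes_to(value1, p)]
```

The list `admissible` contains every reordering of the three items, which is the non-determinism remarked on above: an implementation may return any one of them.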

5.10 Conditional Expressions

Introduction

A conditional expression supports conditional evaluation of one of two expressions.

[13 (XQuery)]	IfExpr	::=	(<"if" "("> Expr ")" "then" Expr "else")* InstanceofExpr

Normalization

[if (Expr₁) then Expr₂ else Expr₃]_Expr

let $fs:new := [ Expr₁ ]_{Effective_Boolean_Value} return

if ($fs:new) then [Expr₂]_Expr else [Expr₃]_Expr

where $fs:new is a newly created variable that does not appear in the rest of the query.

Core Grammar

The core grammar rule for the conditional expression is:

[11 (Core)]	IfExpr	::=	(<"if" "("> Expr ")" "then" Expr "else")* ValueExpr

Static Type Analysis

statEnv |- Expr₁ : xs:boolean

statEnv |- Expr₂ : Type₂

statEnv |- Expr₃ : Type₃

statEnv |- if (Expr₁) then Expr₂ else Expr₃ : (Type₂ | Type₃)

Dynamic Evaluation

If the conditional's boolean expression Expr₁ evaluates to true, Expr₂ is evaluated and its value is produced. If the conditional's boolean expression evaluates to false, Expr₃ is evaluated and its value is produced. Note that the existence of two separate evaluation rules ensures that only one branch of the conditional is evaluated.

Ed. Note: DD: actually, I would like that last sentence to be true, but I don't think it is: there is nothing that I know of in the semantics of evaluation rules that says that the clauses must be evaluated left-to-right, so currently nothing stops an implementation from evaluating the body expressions first. Even if this is a functional language generally, I am sure there will be non-functional side-effects in implementation-dependent extensions to the language, and I think there must be a way to formally state that bodies of conditionals do not get executed if their test fails.

dynEnv |- Expr₁ => true

dynEnv |- Expr₂ => Value₂

dynEnv |- if (Expr₁) then Expr₂ else Expr₃ => Value₂

dynEnv |- Expr₁ => false

dynEnv |- Expr₃ => Value₃

dynEnv |- if (Expr₁) then Expr₂ else Expr₃ => Value₃
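The intended reading of the two rules, in which only the selected branch is evaluated, can be sketched by passing the branches as unevaluated thunks. This models one evaluation strategy that satisfies the rules; it is an assumption of the sketch, not something the rules themselves enforce. The names `eval_if` and `branch` are hypothetical.

```python
def eval_if(cond, then_thunk, else_thunk):
    """cond is the already evaluated boolean; the branches are zero-argument callables."""
    if cond is True:
        return then_thunk()    # first rule: Expr1 => true, evaluate Expr2 only
    if cond is False:
        return else_thunk()    # second rule: Expr1 => false, evaluate Expr3 only
    raise TypeError("the condition must evaluate to a boolean")

evaluated = []                 # records which branches were actually run

def branch(name, value):
    def thunk():
        evaluated.append(name)
        return value
    return thunk

result = eval_if(True, branch("then", "a"), branch("else", "b"))
```

After this call, `evaluated` records only the "then" branch, so any effect attached to the other branch would not have occurred.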

5.11 Quantified Expressions

Introduction

[XPath/XQuery] defines two quantification expressions:

[11 (XQuery)]	QuantifiedExpr	::=	(("some" \| "every") TypeDeclaration? "$" VarName "in" Expr ("," TypeDeclaration? "$" VarName "in" Expr)* "satisfies")* TypeswitchExpr
[14 (XPath)]	QuantifiedExpr	::=	((<"some" "$"> \| <"every" "$">) VarName "in" Expr ("," "$" VarName "in" Expr)* "satisfies")* IfExpr

Normalization

The quantification expressions are entirely normalized into other core expressions using the following normalization rules.

First, multiple variable quantifiers are normalized to single variable quantifiers.

[some Variable₁ in Expr₁, ..., Variable_n in Expr_n satisfies Expr]_Expr

[

some Variable₁ in Expr₁ satisfies

some Variable₂ in Expr₂ satisfies

...

some Variable_n in Expr_n satisfies

Expr

]_Expr

[every Variable₁ in Expr₁, ..., Variable_n in Expr_n satisfies Expr]_Expr

[

every Variable₁ in Expr₁ satisfies

every Variable₂ in Expr₂ satisfies

...

every Variable_n in Expr_n satisfies

Expr

]_Expr

[some Variable in Expr₁ satisfies Expr₂]_Expr

xf:not( xf:empty(

for Variable in [Expr₁]_Expr return

if ( [ Expr₂ ]_{Effective_Boolean_Value} ) then 1

else ()

))

[every Variable in Expr₁ satisfies Expr₂]_Expr

xf:empty(

for Variable in [Expr₁]_Expr return

if ( xf:not( [ Expr₂ ]_{Effective_Boolean_Value}) ) then 1

else ()

)
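The two single-variable normalization rules can be mirrored directly in a short sketch. Sequences are modeled as Python lists and the satisfies clause as a predicate function; `xf_empty`, `xf_not`, `some`, and `every` are hypothetical stand-ins for the corresponding core expressions, not definitions taken from this document.

```python
def xf_empty(seq):
    return len(seq) == 0

def xf_not(b):
    return not b

def some(items, satisfies):
    # xf:not(xf:empty(for Variable in Expr1 return if (Expr2) then 1 else ()))
    return xf_not(xf_empty([1 for item in items if satisfies(item)]))

def every(items, satisfies):
    # xf:empty(for Variable in Expr1 return if (xf:not(Expr2)) then 1 else ())
    return xf_empty([1 for item in items if xf_not(satisfies(item))])

# A multi-variable quantifier becomes nested single-variable quantifiers.
pairs_sum_to_five = some([1, 2], lambda x: some([3, 4], lambda y: x + y == 5))
```

Note that under this normalization `every` over an empty sequence is true and `some` over an empty sequence is false.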

Ed. Note: Note that the non-determinism in the semantics of quantified expressions is not modeled here. See [Issue-0136: Non-determinism in the semantics].

Core Grammar

There are no core grammar rules for quantified expressions as they are normalized to other core expressions.

Static Type Analysis

There are no additional static type rules for the quantified expressions.

Dynamic Evaluation

There are no additional dynamic evaluation rules for the quantified expressions.

5.12 Expressions on SequenceTypes

Introduction

Expressions on SequenceTypes are expressions whose semantics depends on the type of some of the sub-expressions to which they are applied. The syntax of SequenceType expressions is described in [4.4.2 SequenceType].

5.12.1 Instance Of

[14 (XQuery)]	InstanceofExpr	::=	ComparisonExpr ( <"instance" "of"> SequenceType )

Introduction

The SequenceType expression "Expr instance of SequenceType" is true if and only if the result of evaluating expression Expr is an instance of the type referred to by SequenceType.

Normalization

An "instance of" expression is normalized into a "typeswitch" expression. Note that the following normalization rule uses a variable $fs:new, which is a newly created variable which must not conflict with any variables already in scope. This variable is necessary to comply with the syntax of typeswitch expressions in the core [XPath/XQuery], but is never used.

[Expr instance of SequenceType]_Expr

typeswitch ([ Expr ]_Expr)

case SequenceType $fs:new return xf:true()

default $fs:new return xf:false()

Ed. Note: MFF: The "instance of" expression allows an optional "only" modifier. The use case for such a modifier is based on named typing, while the [XPath/XQuery] semantics is currently based on structural typing. It is not clear what the semantics of the "only" modifier under structural typing should be and how it can be supported. See [Issue-0111: Semantics of instance of ... only].

5.12.2 Typeswitch

[12 (XQuery)]	TypeswitchExpr	::=	(<"typeswitch" "("> Expr ")" CaseClause+ "default" ("$" VarName)? "return")* IfExpr
[38 (XQuery)]	CaseClause	::=	"case" SequenceType ("$" VarName)? "return" Expr

Introduction

The typeswitch expression of XQuery allows users to perform different operations according to the type of an input expression.

Each branch of a typeswitch expression may have an optional "Variable" statement, used to bind a variable to the result of the input expression. This variable is optional in [XPath/XQuery] but mandatory in the [XPath/XQuery] core. One of the reasons for having this variable is that it can have a more specific type for the corresponding branch.

Normalization

Normalization of typeswitch expressions is applied to make sure an appropriate "Variable" is present in each branch.

The following general normalization rule merely adds a newly created variable which does not appear in the rest of the query. Note that $fs:new is a newly generated variable that must not conflict with any variables already in scope but is not used in any of the sub-expressions.

[

typeswitch ( Expr₀ )

case SequenceType₁ return Expr₁

···

case SequenceType_n return Expr_n

default return Expr_n+1

]_Expr

typeswitch ( [ Expr₀ ]_Expr )

case SequenceType₁ $fs:new₁ return [ Expr₁ ]_Expr

···

case SequenceType_n $fs:new_n return [ Expr_n ]_Expr

default $fs:new_n+1 return [ Expr_n+1 ]_Expr

Core Grammar

The core grammar productions for typeswitch are:

[10 (Core)]	TypeswitchExpr	::=	(<"typeswitch" "("> Expr ")" CaseClause+ "default" "$" VarName "return")* IfExpr
[23 (Core)]	CaseClause	::=	"case" SequenceType "$" VarName "return" Expr

Notation

The following auxiliary grammar production is used to identify branches of the typeswitch.

[57 (Formal)]	CaseRules	::=	("case" SequenceType "$" VarName "return" Expr CaseRules) \| ("default" "$" VarName "return" Expr)

Notation

The following judgment

dynEnv |- Value₁ against CaseRules => Value₂

is used in the dynamic semantics of typeswitch. It indicates that under the environment dynEnv, and with the input value of the typeswitch being "Value₁", the given case rules yield the value Value₂.

Notation

The following judgment

statEnv |- Type knowing (Type₁ | ... | Type_n) against CaseRules : Type₂

is used in the static semantics of typeswitch. It indicates that under the type environment statEnv, with the input type of the typeswitch being "Type", and with the previously visited types Type₁ | ... | Type_n, the given case rules (i.e., "CaseRules") have type Type₂. This typing judgment keeps track of all previously visited types in the typeswitch. This additional information is used later on in typing the default clause.

Static Type Analysis

The typeswitch expression possesses one of the more complex sets of static typing rules. The rules account for the fact that if the static type of the input expression is known, then it is possible to determine statically that some of the case clauses do not apply, and thus do not contribute to the static type of a typeswitch expression.

Like the dynamic typeswitch rule, the static typeswitch rule relies upon auxiliary rules to determine the type of each of the case clauses and of the default clause. These auxiliary rules are provided after the main rule. Note that the type of the input expression is always treated as a sequence of prime types, using the "prime" and "quantifier" operations on types. This is necessary because the further typing rules compute the common prime types for each case clause of the type switch.

Ed. Note: Jerome: the use of the common prime types replaces the previous use of type intersection. Common prime types simplifies significantly the complexity in implementing typeswitch, but is less precise in certain cases.

statEnv |- Expr₀ : Type₀

CaseType₀ = prime(Type₀) · quantifier(Type₀)

CaseType₁ = [ SequenceType₁ ]_sequencetype

···

CaseType_n = [ SequenceType_n ]_sequencetype

statEnv |- CaseType₀ knowing () against case SequenceType₁ Variable₁ return Expr₁ : Type₁

···

statEnv |- CaseType₀ knowing (CaseType₁ | ... | CaseType_n-1) against case SequenceType_n Variable_n return Expr_n : Type_n

statEnv |- CaseType₀ knowing (CaseType₁ | ... | CaseType_n) against default Variable_n+1 return Expr_n+1 : Type_n+1

statEnv |-

(typeswitch (Expr₀)

case SequenceType₁ Variable₁ return Expr₁

···

case SequenceType_n Variable_n return Expr_n

default Variable_n+1 return Expr_n+1)

: Type₁ | ... | Type_n+1

The rules that determine the static type of each case clause in the typeswitch are given next. In each rule, one needs to compute the "common prime types" between the input type and the case clause SequenceType.

The first rule is applied if the "common prime types" is none. In that case, it is known for sure the corresponding case clause is not evaluated and that the corresponding result type is none. Thanks to this rule, it is often possible to infer a precise type for the overall typeswitch by eliminating some of the cases.

CaseType = [ SequenceType ]_sequencetype

common-prime(prime(CaseType₀), prime(CaseType)) = none

statEnv |- CaseType₀ knowing (CaseType₁ | ... | CaseType_r) against case SequenceType Variable return Expr : none

The second rule is applied if the "common prime types" is anything other than none. In that case, the input variable is added into the type environment and type inference is applied on the expression on the right-hand side of the case clause. Note that the type of the input variable is set to the "common prime types", and not to the input type.

CaseType = [ SequenceType ]_sequencetype

Type₀ = common-prime(prime(CaseType₀), prime(CaseType))

Occurrence₀ = common-occurrence(quantifier(CaseType₀), quantifier(CaseType))

not(Type₀ = none)

statEnv [ varType(Variable : (Type₀ · Occurrence₀)) ] |- Expr : Type₁

statEnv |- CaseType₀ knowing (CaseType₁ | ... | CaseType_r) against case SequenceType Variable return Expr : Type₁

Note that these two rules do not take the visited types into account. The "default" clause differs from the other clauses in that it does not specify a SequenceType. The typing rule for the "default" clause uses the visited types instead. Intuitively, the type corresponding to the "default" clause is any type but the ones in the other case clauses.

Therefore, if the type of the input expression is a subtype of the choice of all visited types, then it is known for sure that the default clause is not evaluated and that the type of the default clause is none.

CaseType₀ <: (CaseType₁ | ... | CaseType_r)

statEnv |- CaseType₀ knowing (CaseType₁ | ... | CaseType_r) against default Variable return Expr : none

Otherwise, the input variable is added into the type environment, and type inference is applied to the expression on the right-hand side of the default clause. Note that the type of the input variable is set to the input type of the expression.

not(CaseType₀ <: (CaseType₁ | ... | CaseType_r))

statEnv [ varType(Variable : CaseType₀) ] |- Expr : Type₁

statEnv |- CaseType₀ knowing (CaseType₁ | ... | CaseType_r) against default Variable return Expr : Type₁

Ed. Note: Jerome: There is an asymmetry here. It would be nicer to be able to have the type be more precise, like for the other case clauses. The technical problem is the need for some form of negation. I think one could define a "non-common-primes" function that would do the trick, but I leave that open for now until further review of the new typeswitch section is made. See [Issue-0112: Typing for the typeswitch default clause].

Example

The typing rules for typeswitch provide reasonably precise type information in a number of useful cases. For example, consider the following simple schema and query.

    <xs:element name="book">
       <xs:complexType>
          <xs:element name="title" type="xs:string"/>
          <xs:element name="author">
             <xs:complexType>
                <xs:element name="name" type="xs:string"/>
                <xs:element name="affiliation" type="xs:string"/>
             </xs:complexType>
          </xs:element>
       </xs:complexType>
    </xs:element>
    
    <xs:element name="bib">
       <xs:complexType>
          <xs:element ref="book" minOccurs="0" maxOccurs="unbounded"/>
       </xs:complexType>
    </xs:element>
    
    ...
    
    for $x in $bib/book/*
    return
       typeswitch ($x)
       case element author $a return $a/name
       case element editor $e return $e
       case element title $t return <wrote>{data($t)}</wrote>
       default return ()

The static rules for typeswitch can determine (1) that, with the given input schema, the second case, matching against element editor, never applies, and (2) that the data() function in the third case, matching against element title, will not raise an error, because $t is guaranteed to contain data rather than more nodes. The resulting type corresponds to the following schema fragment.

    <xs:choice minOccurs="0" maxOccurs="unbounded">
       <xs:element name="name" type="xs:string"/>
       <xs:element name="wrote" type="xs:string"/>
    </xs:choice>
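The way the static rules prune clauses in this example can be sketched with a deliberately simplified type model. The sketch assumes that a type is just a set of prime type names, that common-prime is set intersection, that subtyping is set inclusion, and that quantifiers are ignored; none of these simplifications is the definition given in this document. All function and variable names are hypothetical.

```python
NONE = frozenset()   # models the type "none" (no prime types at all)

def common_prime(input_primes, case_primes):
    # Assumption: prime types are plain names, so "common" means intersection.
    return input_primes & case_primes

def type_case(input_primes, case_primes, type_body):
    common = common_prime(input_primes, case_primes)
    if common == NONE:
        return NONE               # first rule: the clause can never apply
    return type_body(common)      # second rule: variable typed with the common primes

def type_default(input_primes, visited, type_body):
    if input_primes <= visited:   # CaseType0 <: (CaseType1 | ... | CaseType_n)
        return NONE
    return type_body(input_primes)

def type_typeswitch(input_primes, cases, default_body):
    result, visited = NONE, NONE
    for case_primes, type_body in cases:
        result = result | type_case(input_primes, case_primes, type_body)
        visited = visited | case_primes
    return result | type_default(input_primes, visited, default_body)

# Static type of $bib/book/* under the schema above, prime types only.
children_of_book = frozenset({"element title", "element author"})
cases = [
    (frozenset({"element author"}), lambda var: frozenset({"element name"})),
    (frozenset({"element editor"}), lambda var: var),
    (frozenset({"element title"}), lambda var: frozenset({"element wrote"})),
]
inferred = type_typeswitch(children_of_book, cases, lambda var: NONE)
```

In this model the editor clause and the default clause both contribute none, so `inferred` contains only the name and wrote element types, in line with the schema fragment above.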

The type rules for typeswitch do not, however, account for the interdependence between successive case clauses. Thus if two case clauses had overlapping SequenceTypes, the static rules would behave as if both case clauses "fired", rather than just the first one.

Ed. Note: Jerome: It seems that the simpler version of typeswitch proposed here would actually allow us to take previous case clauses into account. This is something worth exploring as it would improve the static analysis in a way that might be helpful to users. See [Issue-0112: Typing for the typeswitch default clause].

Ed. Note: Issue: There seem to be cases where the current semantics of typeswitch breaks type substitutability. See [Issue-0174: Semantics of 'element foo of type T'].

Dynamic Evaluation

The evaluation of a typeswitch proceeds as follows. First, the input expression is evaluated, yielding an input value. Then the first case clause whose SequenceType matches that value is selected and its corresponding expression is evaluated. If no case clause matches, the expression of the default clause is evaluated.

Note that the dynamic environment dynEnv is extended with a binding of the clause variable to the input value only by the first auxiliary rule, which applies if the input value matches the corresponding sequence type, or by the third auxiliary rule for the default case.

dynEnv |- Expr => Value₀

dynEnv |- Value₀ against CaseRules => Value₁

dynEnv |- typeswitch (Expr) CaseRules => Value₁

If the value matches the sequence type, the first auxiliary rule applies: It extends the environment by binding the variable Variable to Value₀ and evaluates the body of the case rule.

CaseType = [ SequenceType ]_sequencetype

Value₀ matches CaseType

dynEnv [ varValue(Variable |-> Value₀) ] |- Expr => Value₁

dynEnv |- Value₀ against case SequenceType Variable return Expr CaseRules => Value₁

If the value does not match the sequence type, the second auxiliary rule evaluates on the remaining case rules, and the current case rule is not evaluated.

CaseType = [ SequenceType ]_sequencetype

not (Value₀ matches CaseType)

dynEnv |- Value₀ against CaseRules => Value₁

dynEnv |- Value₀ against case SequenceType Variable return Expr CaseRules => Value₁

Finally, the last rule states that the "default" branch of a typeswitch expression always evaluates to its given expression.

dynEnv [ varValue(Variable |-> Value₀) ] |- Expr => Value₁

dynEnv |- Value₀ against default Variable return Expr => Value₁
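The three auxiliary rules can be read as a recursion over the case rules. The sketch below makes several assumptions: the matches judgment is modeled by an arbitrary Python predicate, clause bodies are functions of the dynamic environment, and the default clause is marked by `None` in place of a predicate. The names `against` and `typeswitch` are hypothetical.

```python
def against(value0, case_rules, dyn_env):
    """case_rules: non-empty list of (matches, variable, body); the last entry
    is the default clause and carries None instead of a predicate."""
    (matches, variable, body), rest = case_rules[0], case_rules[1:]
    if matches is None or matches(value0):
        # first rule (case matches) or third rule (default): bind and evaluate
        return body({**dyn_env, variable: value0})
    # second rule: skip this clause without evaluating its body
    return against(value0, rest, dyn_env)

def typeswitch(value0, case_rules, dyn_env=None):
    return against(value0, case_rules, dyn_env or {})

rules = [
    (lambda v: isinstance(v, int), "i", lambda env: env["i"] + 1),
    (lambda v: isinstance(v, str), "s", lambda env: env["s"].upper()),
    (None, "d", lambda env: "other"),
]
```

With these rules, an integer input selects the first clause, a string input the second, and any other input falls through to the default clause.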

5.12.3 Cast and Treat

[30 (XQuery)]	CastExpr	::=	(<"cast" "as"> \| <"treat" "as">) SequenceType ParenthesizedExpr
[30 (XPath)]	CastExpr	::=	(<"cast" "as"> \| <"treat" "as">) SequenceType ParenthesizedExpr

Core Grammar

The core grammar production for cast as is:

[19 (Core)]	CastExpr	::=	<"cast" "as"> SequenceType ParenthesizedExpr

Cast expressions are expressions that check or change the type of an expression against a given type.

5.12.3.1 Cast as

Introduction

The expression "cast as SequenceType ( Expr )" can be used to explicitly convert the result of an expression from one type to another. It changes both the type and value of the result of an expression, and can only be applied on an atomic value.

The semantics of cast expressions follows the specification given in Section [14. Casting Functions] of the [XQuery 1.0 and XPath 2.0 Functions and Operators] document. The casting table in Section [14. Casting Functions] of the [XQuery 1.0 and XPath 2.0 Functions and Operators] document indicates whether a cast is allowed or not. In case it is allowed, a specific cast function is applied, based on the input and output XML Schema simple types. The semantics of the cast function follows casting rules which are described in the remainder of Section [14. Casting Functions] of the [XQuery 1.0 and XPath 2.0 Functions and Operators] document and is not specified further here.

Ed. Note: Jerome: The [XQuery 1.0 and XPath 2.0 Functions and Operators] document does not provide any function for casting, just a table and casting rules. It would be preferable to either have an explicit function to normalize to, or to put the semantics of casts in the Formal Semantics. This relates to Issue 17 in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document.

Normalization

Cast operations perform atomization. This is captured during normalization, as specified by the following rule.

[cast as SequenceType (Expr)]_Expr

let $v := [ Expr ]_Atomize return

cast as SequenceType ($v)

Notation

The following auxiliary judgments are used to represent access to the casting table and to the semantics of casting, as described in Section [14. Casting Functions] of the [XQuery 1.0 and XPath 2.0 Functions and Operators] document.

The judgment:

Type₁ cast allowed Type₂ => { Y, M, N }

holds if casting from type Type₁ to Type₂ is always possible (Y), may be possible (M), or is not allowed (N).

The notation:

cast as Type₂ ( Value₁ ) => Value₂

indicates that applying the casting rules for Type₂ on Value₁ yields the value Value₂.

Static Type Analysis

If the cast table indicates the cast is not allowed (N), the system raises a static type error. Otherwise, the following static typing rules give the static semantics of the "cast as" expression. Two cases are distinguished: if the static type of the expression Type₁ can be the empty sequence, then the static output type of the cast is the given type Type₂ made optional. Otherwise the static output type is the given type Type₂.

statEnv |- Expr : Type₁

quantifier(Type₁) = ? or quantifier(Type₁) = *

Type₂ = [ SequenceType₂ ]_sequencetype

Type₁ cast allowed Type₂ => Y or Type₁ cast allowed Type₂ => M

statEnv |- cast as SequenceType₂ (Expr) : Type₂ ?

statEnv |- Expr : Type₁

quantifier(Type₁) != ? and quantifier(Type₁) != *

Type₂ = [ SequenceType₂ ]_sequencetype

Type₁ cast allowed Type₂ => Y or Type₁ cast allowed Type₂ => M

statEnv |- cast as SequenceType₂ (Expr) : Type₂

Dynamic Evaluation

If the cast is allowed (Y or M), the following evaluation rule applies the casting rules on the result of the input expression. The rule uses the data model function dm:type in order to obtain the dynamic type of the input value, SequenceType normalization to obtain the output type, and the above auxiliary judgments to check whether the cast is allowed and to apply the casting rules.

dynEnv |- Expr₁ => Value₁

dm:type(Value₁) = Type₁

Type₂ = [ SequenceType₂ ]_sequencetype

Type₁ cast allowed Type₂ => Y or Type₁ cast allowed Type₂ => M

cast as Type₂ ( Value₁ ) => Value₂

dynEnv |- cast as SequenceType₂ ( Expr₁ ) => Value₂

Note that in the case that the casting table indicates "M", the casting operation is allowed but might fail at run-time if the input value is inappropriate (e.g. attempting to cast the string "VRAI" into xs:boolean). In that case, the dynamic evaluation returns an error value.

If the casting table returns "N", the cast is not allowed and the dynamic semantics always returns an error value.

dynEnv |- Expr₁ => Value₁

dm:type(Value₁) = Type₁

Type₂ = [ SequenceType₂ ]_sequencetype

Type₁ cast allowed Type₂ => N

dynEnv |- cast as SequenceType₂ ( Expr₁ ) => xf:error()
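The interplay between the casting table and the casting rules can be sketched as follows. Everything specific here is an assumption made for illustration: the table holds only a handful of example entries, missing entries are treated as N, dm:type is approximated from Python types, the error value xf:error() is modeled by an exception, and the lexical rules are far simpler than those of XML Schema. The authoritative table and rules are those of the Functions and Operators document.

```python
class CastError(Exception):
    """Stands in for the error value xf:error()."""

# Illustrative excerpt only (an assumption of this sketch, not the real table).
CAST_ALLOWED = {
    ("xs:string", "xs:boolean"): "M",
    ("xs:string", "xs:integer"): "M",
    ("xs:integer", "xs:string"): "Y",
    ("xs:boolean", "xs:date"): "N",
}

def dm_type(value):
    if isinstance(value, bool):   # test bool first: bool is a subclass of int
        return "xs:boolean"
    if isinstance(value, int):
        return "xs:integer"
    if isinstance(value, str):
        return "xs:string"
    raise CastError(f"no modeled type for {value!r}")

def cast_as(type2, value1):
    type1 = dm_type(value1)                          # dm:type(Value1) = Type1
    allowed = CAST_ALLOWED.get((type1, type2), "N")  # Type1 cast allowed Type2
    if allowed == "N":
        raise CastError(f"cast from {type1} to {type2} is not allowed")
    if type2 == "xs:string":
        return str(value1)
    if type2 == "xs:integer":
        try:
            return int(value1)
        except ValueError:
            raise CastError(f"{value1!r} is not an integer") from None
    if type2 == "xs:boolean":
        lexical = {"true": True, "1": True, "false": False, "0": False}
        if value1 in lexical:
            return lexical[value1]
        raise CastError(f"{value1!r} is not a boolean")
    raise CastError(f"no casting rule modeled for {type2}")
```

In this sketch, casting the string "VRAI" to xs:boolean passes the table check (M) and then fails in the casting rule, whereas casting a boolean to xs:date is rejected by the table itself (N).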

5.12.3.2 Treat as

Introduction

The expression "treat as SequenceType ( Expr )" can be used to change the static type of the result of an expression without changing its value. Treat as raises a static type error if the static type of the expression and the specified type are incompatible. Treat as raises a run-time error if the dynamic type of the expression is not an instance of the specified type.

Normalization

Treat as expressions are normalized to typeswitch expressions. Note that the following normalization rule uses a variable $fs:new, which is a newly created variable which must not conflict with any variables already in scope.

[treat as SequenceType ( Expr )]_Expr

typeswitch ([ Expr ]_Expr)

case SequenceType $fs:new return $fs:new

default $fs:new return xf:error()

5.13 Validate Expressions

[29 (XQuery)]	ValidateExpr	::=	"validate" SchemaContext? "{" Expr "}"

Core Grammar

The core grammar production for validate is:

[18 (Core)]	ValidateExpr	::=	"validate" SchemaContext? "{" Expr "}"

A validate expression validates its argument with respect to the in-scope schema definitions, using the schema validation process described in [XML Schema]. The argument to a validate expression may be any sequence of elements. Validation replaces element and attribute nodes with new nodes that have their own identity and contain type annotations and defaults created by the validation process. If a node that has a parent is validated, the parent of the original node will not be the parent of the validated node.

Static Type Analysis

Ed. Note: Issue: The current form of validate cannot be typed statically. See [Issue-0166: Static typing for validate].

Dynamic Evaluation

An important effect of validation is that it removes existing type annotations and values (erasure), and it revalidates the corresponding data model instance, possibly adding new type annotations and values (annotation). Both erasure and annotation are described formally in [3.4 Erase and Annotate]. Indeed, the conjunction of erasure and annotation provides a formal model for a large part of actual schema validation. The semantics of the "validate" operation is (partially) specified as follows.

dynEnv |- Expr => Value₁

Value₁ erases to element ElementName { Value₂ }

annotate as element ElementName (element ElementName { Value₂ }) => Value₃

dynEnv |- validate element ElementName { Expr } => Value₃

Ed. Note: Issue: This semantics only applies when validating against a top-level element. See [Issue-0138: Semantics of Schema Context].

6 The Query Prolog

Introduction

The Query Prolog is a series of declarations and definitions that affect query processing. The Query Prolog can be used to define namespaces, import type definitions from XML Schemas, and define functions. Namespace declarations and schema imports always precede function definitions, as specified by the following grammar productions.

[1 (XQuery)]	Query	::=	QueryProlog QueryBody
[2 (XQuery)]	QueryProlog	::=	(NamespaceDecl \| XMLSpaceDecl \| DefaultNamespaceDecl \| DefaultCollationDecl \| SchemaImport)* FunctionDefn*
[3 (XQuery)]	QueryBody	::=	ExprSequence?

The order in which functions are defined is immaterial. Notably, user-defined functions may invoke other user-defined functions in any order.

Namespace Declarations and Schema Import are not part of the query proper but are used to modify the input context for the rest of the query processing. Namespace Declarations and Schema Import are processed before the normalization phase.

The semantics of Schema Import is described in terms of the XQuery type system. The process of converting an XML Schema into a sequence of type declarations is described in Section [8 Importing Schemas]. This section describes how the resulting sequence of type declarations is added into the static environment when the prolog is processed.

Notation

Prolog declarations are either namespace declarations or type declarations.

[58 (Formal)]	PrologDeclList	::=	PrologDecl*
[59 (Formal)]	PrologDecl	::=	NamespaceDecl \| XMLSpaceDecl \| DefaultNamespaceDecl \| DefaultCollationDecl \| SchemaImport

Notation

The following auxiliary judgments are used when processing the [XPath/XQuery] prolog.

The judgment:

PrologDeclList => statEnv

holds if the sequence of prolog declarations PrologDeclList yields the static environment statEnv.

The judgment:

statEnv₁ |- PrologDecl => statEnv₂

holds if under the static environment statEnv₁, the single prolog declaration PrologDecl yields the new static environment statEnv₂.

Context Processing

Prolog declarations are processed in the order they are encountered, as described by the following inference rules. The first rule specifies that for an empty sequence of prolog declarations, the static environment is composed of a default context.

Ed. Note: Jerome: What do the default namespace and type environments contain? I believe at least the default namespace environment should contain the "xs", "xf" and "op" prefixes, as well as the default namespaces bound to the empty namespace. Should the default type environment contain wildcard types? See [Issue-0115: What is in the default context?].

() => statEnvDefault

PrologDeclList => statEnv₁

statEnv₁ |- PrologDecl => statEnv₂

PrologDeclList PrologDecl => statEnv₂

6.1 Namespace Declarations

[84 (XQuery)]	NamespaceDecl	::=	<"declare" "namespace"> NCNameForPrefix "=" URLLiteral
[86 (XQuery)]	DefaultNamespaceDecl	::=	(<"default" "element"> \| <"default" "function">) "namespace" "=" URLLiteral

Context Processing

A namespace declaration adds a new (prefix,uri) binding in the namespace component of the static environment.

statEnv₁ = statEnv [ namespace(NCName |-> StringLiteral) ]

statEnv |- namespace NCName = StringLiteral => statEnv₁

A default element namespace declaration changes the default element namespace prefix binding in the namespace component of the static environment.

statEnv₁ = statEnv [ namespace(fsdefaultelem |-> StringLiteral) ]

statEnv |- default element namespace = StringLiteral => statEnv₁

A default function namespace declaration changes the default function namespace prefix binding in the namespace component of the static environment.

statEnv₁ = statEnv [ namespace(fsdefaultfunc |-> StringLiteral) ]

statEnv |- default function namespace = StringLiteral => statEnv₁

Note that for namespaces, later declarations can override earlier declarations of the same prefix.
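Processing namespace declarations in order amounts to folding them into the namespace component of the static environment. The sketch below models that component as a dictionary and each declaration as a (kind, prefix, uri) triple; this encoding, the function name `process_prolog`, and the example URIs are all assumptions made for illustration.

```python
def process_prolog(decls, stat_env_default):
    """Fold declarations, in the order encountered, into a namespace mapping."""
    namespaces = dict(stat_env_default)        # () => statEnvDefault
    for kind, prefix, uri in decls:
        if kind == "namespace":
            key = prefix
        elif kind == "default element":
            key = "fsdefaultelem"
        elif kind == "default function":
            key = "fsdefaultfunc"
        else:
            raise ValueError(f"unknown declaration kind: {kind}")
        # statEnv1 = statEnv [ namespace(key |-> uri) ]; a later binding wins
        namespaces = {**namespaces, key: uri}
    return namespaces

env = process_prolog(
    [
        ("namespace", "a", "http://example.org/one"),
        ("default element", None, "http://example.org/elems"),
        ("namespace", "a", "http://example.org/two"),
    ],
    {},
)
```

Here the second declaration of prefix `a` overrides the first, so `env` maps `a` to the second URI.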

6.2 Schema Imports

[89 (XQuery)]	SchemaImport	::=	<"import" "schema"> (StringLiteral \| SubNamespaceDecl \| DefaultNamespaceDecl) <"at" StringLiteral>?
[85 (XQuery)]	SubNamespaceDecl	::=	"namespace" NCNameForPrefix "=" URLLiteral

Notation

The following auxiliary judgments are used when processing schema imports.

The judgment:

statEnv₁ |- Definition* => statEnv₂

holds if under the static environment statEnv₁, the sequence of type definitions Definition* yields the new static environment statEnv₂.

The judgment:

statEnv₁ |- Definition => statEnv₂

holds if under the static environment statEnv₁, the single definition Definition yields the new static environment statEnv₂.

Context Processing

Schema imports are first imported into the query type system and yield a sequence of type definitions. Then each type definition is added to the static environment.

Definition* = [Schema]_Schema

statEnv |- Definition* => statEnv₁

statEnv |- import schema Schema => statEnv₁

An empty sequence of definitions yields the original environment.

statEnv |- () => statEnv

Each definition is added into the static environment.

statEnv |- Definition* => statEnv₁

statEnv₁ |- Definition₁ => statEnv₂

statEnv |- Definition* Definition₁ => statEnv₂

Type, element and attribute declarations are added respectively to the type, element and attribute declarations components of the static environment.

statEnv₁ = statEnv [ typeDefn(TypeName |-> define type TypeName TypeDerivation ) ]

statEnv |- define type TypeName TypeDerivation => statEnv₁

statEnv₁ = statEnv [ elemDecl(ElementName |-> define element ElementName Substitution? Nillable? TypeSpecifier) ]

statEnv |- define element ElementName Substitution? Nillable? TypeSpecifier => statEnv₁

statEnv₁ = statEnv [ attrDecl(AttributeName |-> define attribute AttributeName TypeSpecifier) ]

statEnv |- define attribute AttributeName TypeSpecifier => statEnv₁

Note that, unlike namespace declarations, where later declarations can override earlier declarations of the same prefix, multiple declarations of the same global element, attribute or type are an error.
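Adding imported definitions to the static environment, with multiple declarations treated as an error, can be sketched as follows. The environment is modeled as a dictionary with one sub-dictionary per component, and each definition as a (kind, name, definition) triple; this encoding and the names `add_definitions` and `DuplicateDefinition` are hypothetical.

```python
class DuplicateDefinition(Exception):
    """Raised when a global type, element or attribute is declared twice."""

COMPONENT_OF = {"type": "typeDefn", "element": "elemDecl", "attribute": "attrDecl"}

def add_definitions(stat_env, definitions):
    # Copy each component so that the input environment is left unchanged.
    env = {component: dict(entries) for component, entries in stat_env.items()}
    for kind, name, definition in definitions:
        component = env.setdefault(COMPONENT_OF[kind], {})
        if name in component:
            raise DuplicateDefinition(f"{kind} {name} is already declared")
        component[name] = definition
    return env

stat_env1 = add_definitions(
    {},
    [
        ("element", "book", "define element book ..."),
        ("type", "Author", "define type Author ..."),
    ],
)
```

An empty list of definitions returns an environment equal to the original one, matching the rule for the empty sequence.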

6.3 Xmlspace Declaration

[82 (XQuery)]	XMLSpaceDecl	::=	<"declare" "xmlspace"> "=" ("preserve" \| "strip")

The impact of xmlspace declaration is not formally specified in this document.

6.4 Default Collation

[83 (XQuery)]	DefaultCollationDecl	::=	<"default" "collation" "="> URLLiteral

The impact of default collation is not formally specified in this document. See [Issue-0160: Collations in the static environment].

6.5 Function Definitions

Introduction

User defined functions specify the name of the function, the names and types of the parameters, and the type of the result. The function body defines how the result of the function is computed from its parameters.

[87 (XQuery)]	FunctionDefn	::=	<"define" "function"> FuncName "(" ParamList? (")" \| (<")" "returns"> SequenceType)) EnclosedExpr
[88 (XQuery)]	ParamList	::=	Param ("," Param)*
[59 (XQuery)]	Param	::=	SequenceType? "$" VarName

Notation

The following auxiliary mapping rule is used for the normalization of parameters in function definitions: []_Param.

Normalization

The only form of normalization required for user defined functions is adding the type for its parameters or for the return clause if it is not provided.

[ define function QName ( ParamList? ) returns SequenceType EnclosedExpr ]_Expr

define function [QName] ( [ParamList?]_Param ) returns SequenceType [EnclosedExpr]_Expr

If the return type of the function is not provided, it is given the item* sequence type.

[define function QName ( ParamList? ) EnclosedExpr ]_Expr

define function [QName] ( [ParamList?]_Param ) returns item* [EnclosedExpr]_Expr

Parameters without a declared type are given the item* sequence type.

[Variable]_Param

item* Variable

[SequenceType Variable]_Param

SequenceType Variable
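As an informal illustration of the normalization rules above, the following Python sketch fills in the item* default for missing parameter and return types. The tuple representation of parameters and the helper names `normalize_param` and `normalize_function` are assumptions of this example. The function body is passed through unchanged, whereas the rules also normalize it with []_Expr.

```python
from typing import List, Optional, Tuple

# A parameter is (declared sequence type or None, variable name).
Param = Tuple[Optional[str], str]


def normalize_param(param: Param) -> Tuple[str, str]:
    """[Variable]_Param and [SequenceType Variable]_Param."""
    declared_type, variable = param
    return (declared_type if declared_type is not None else "item*", variable)


def normalize_function(name: str, params: List[Param],
                       return_type: Optional[str], body: str) -> str:
    """Return the normalized definition as text, with every type made explicit."""
    normalized_params = [normalize_param(p) for p in params]
    normalized_return = return_type if return_type is not None else "item*"
    signature = ", ".join(f"{t} {v}" for t, v in normalized_params)
    return (f"define function {name} ({signature}) "
            f"returns {normalized_return} {{ {body} }}")
```

For instance, a hypothetical function `local:total` with parameters `xs:decimal* $prices` and an untyped `$tax`, and no return clause, normalizes to a definition whose second parameter and return type are both item*.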

Context Processing

First, all the function signatures are added into the static environment, and all the function bodies are added into the dynamic environment. This process happens before static type analysis occurs.

FunctionDefnList => statEnv, dynEnv

statEnv' = statEnv [ funcType(QName |-> ( SequenceType₁ , ··· SequenceType_n , SequenceType_r)) ]

dynEnv' = dynEnv [ funcDefn(QName |-> ( Expr , Variable₁ , ··· Variable_n)) ]

FunctionDefnList define function QName ( SequenceType₁ Variable₁, ··· SequenceType_n Variable_n )
returns SequenceType_r
{ Expr } => statEnv', dynEnv'
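The context processing rule can be sketched in Python as follows. This is a minimal illustration, assuming that normalized function definitions are available as dictionaries with the hypothetical keys `name`, `params`, `returns` and `body`; none of these names come from the formal semantics.

```python
def register_functions(function_defns):
    """Build the funcType and funcDefn components from normalized definitions."""
    stat_env = {"funcType": {}}
    dyn_env = {"funcDefn": {}}
    for defn in function_defns:
        param_types = [seq_type for seq_type, _ in defn["params"]]
        variables = [variable for _, variable in defn["params"]]
        # funcType(QName |-> (SequenceType_1, ..., SequenceType_n, SequenceType_r))
        stat_env["funcType"][defn["name"]] = (*param_types, defn["returns"])
        # funcDefn(QName |-> (Expr, Variable_1, ..., Variable_n))
        dyn_env["funcDefn"][defn["name"]] = (defn["body"], *variables)
    return stat_env, dyn_env


# Hypothetical example definition.
double_defn = {
    "name": "local:double",
    "params": [("xs:integer", "$n")],
    "returns": "xs:integer",
    "body": "$n * 2",
}
stat_env, dyn_env = register_functions([double_defn])
```

Registering every signature before any body is type checked is what allows function bodies to call functions that are defined later in the prolog.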

Static Type Analysis

The static typing rules for function definitions check whether the type of the enclosed expression is consistent with the types of the input parameters and the type of the return clause.

statEnv [ varType( Variable₁ : SequenceType₁ ;...; Variable_n : SequenceType_n ) ] |- Expr : Type

Type <: SequenceType_r

statEnv |- define function QName ( SequenceType₁ Variable₁ , ··· SequenceType_n Variable_n )
returns SequenceType_r
{ Expr }

This typing rule checks that, if the input parameters have the declared types, then the result of the function has the declared return type. If the check fails, the system raises an error. Otherwise, the rule has no further effect, as the function signatures are already in the static environment.

Dynamic Evaluation

There is no need to describe a dynamic semantics at this point, as functions are only evaluated when called. The actual semantics of function calls is described in [5.1.4 Function Calls].

7 Additional Semantics of Functions

Ed. Note: Status: This section is still incomplete. This section will be completed as soon as Sections 4 and 5 are consolidated. See [Issue-0135: Semantics of special functions].

As was explained in section [2.5 Functions], a number of functions play a role in defining the formal semantics of [XPath/XQuery]. Some other functions from the [XQuery 1.0 and XPath 2.0 Functions and Operators] document need special static typing rules. This section gives the semantics of all "special" functions, used in the formal semantics.

7.1 Formal Semantics Functions

Introduction

This section gives the definition and semantics of functions which are used in the formal semantics but are not part of the [XQuery 1.0 and XPath 2.0 Functions and Operators] document.

7.1.1 The `fs:characters-to-string` function

The fs:characters-to-string function takes a sequence of character values and synthesizes a string.

7.1.2 The `fs:distinct-doc-order` function

The fs:distinct-doc-order function sorts by document order and removes duplicates. It is defined as the composition of the [XQuery 1.0 and XPath 2.0 Functions and Operators] functions xf:distinct-nodes and sorting by document order.

The dynamic semantics of fs:distinct-doc-order cannot be specified as a simple sortby expression. See [Issue-0168: Sorting by document order].
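The intended behavior can nevertheless be illustrated outside the language. The following Python sketch assumes that each node carries an explicit identity and a position in document order. The `Node` class and its fields are hypothetical and stand in for the data model, which does not expose such values directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: int       # hypothetical node identity
    doc_position: int  # hypothetical position in document order
    name: str


def distinct_doc_order(nodes):
    """Remove duplicate nodes by identity, then sort by document order."""
    seen = set()
    distinct = []
    for node in nodes:
        if node.node_id not in seen:
            seen.add(node.node_id)
            distinct.append(node)
    return sorted(distinct, key=lambda node: node.doc_position)


a = Node(node_id=1, doc_position=10, name="a")
b = Node(node_id=2, doc_position=5, name="b")
result = distinct_doc_order([a, b, a])
```

Here `result` contains `b` followed by `a`, each exactly once.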

7.1.3 The `fs:item-sequence-to-node-sequence` function

The fs:item-sequence-to-node-sequence function converts a sequence of item values to nodes by applying the rules in Section [3.7.2 Data Model Representation].

7.1.4 The `fs:item-sequence-to-string` function

The fs:item-sequence-to-string function converts a sequence of item values to a string by applying the rules in Section [3.7.2 Data Model Representation].

In particular, each node is replaced by its string value. For each adjacent sequence of one or more atomic values returned by an enclosed expression, a string is constructed, containing the canonical lexical representation of all the atomic values, with a single blank character inserted between adjacent values.
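The following Python sketch illustrates this description under several assumptions: the `Node` class is hypothetical, Python's `str` stands in for the canonical lexical representation of an atomic value, and the resulting pieces are simply concatenated, which the text above does not state explicitly.

```python
from itertools import groupby


class Node:
    """Hypothetical stand-in for a data model node."""

    def __init__(self, string_value):
        self.string_value = string_value


def item_sequence_to_string(items):
    pieces = []
    # Group the sequence into maximal runs of nodes and runs of atomic values.
    for is_node, run in groupby(items, key=lambda item: isinstance(item, Node)):
        if is_node:
            # Each node is replaced by its string value.
            pieces.extend(node.string_value for node in run)
        else:
            # Adjacent atomic values are separated by a single blank.
            pieces.append(" ".join(str(value) for value in run))
    return "".join(pieces)
```

With these assumptions, the sequence `1, 2, <node with string value "abc">, 3` yields the string `1 2abc3`.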

7.2 Functions with specific typing rules

7.2.1 The `xf:error` function

Static Type Analysis

The xf:error function always returns the none type.

statEnv |- xf:error() : none

7.2.2 The `xf:distinct-nodes` and `xf:distinct-values` functions

Static Type Analysis

The xf:distinct-nodes function takes a sequence of nodes as input and returns a sequence of prime types.

statEnv |- Expr : Type

statEnv |- Type <: node*

statEnv |- xf:distinct-nodes(Expr) : prime(Type) · quantifier(Type)

The xf:distinct-values function takes a sequence of atomic values as input and returns a sequence of prime types.

statEnv |- Expr : Type

statEnv |- Type <: atomic value*

statEnv |- xf:distinct-values(Expr) : prime(Type) · quantifier(Type)

7.2.3 The `op:union`, `op:intersect` and `op:except` operators

Static Type Analysis

The static semantics for op:union uses the auxiliary type functions prime(Type) and quantifier(Type), which are defined in [3.6 Auxiliary typing judgments for for, unordered, and sortby expressions]. The type of each argument is determined, and then prime(.) and quantifier(.) are applied to the sequence type (Type₁, Type₂).

statEnv |- Expr₁ : Type₁ statEnv |- Expr₂ : Type₂

statEnv |- op:union(Expr₁, Expr₂) : prime(Type₁ , Type₂) · quantifier(Type₁ , Type₂)

The static semantics of op:intersect is analogous to that for op:union, but uses the common-prime and common-occurrence operations. Because an intersection may always be empty, the result type needs to be made optional.

statEnv |- Expr₁ : Type₁

statEnv |- Expr₂ : Type₂

statEnv |- PrimeType₃ = common-prime(prime(Type₁), prime(Type₂))

statEnv |- Occurrence₃ = common-occurrence(quantifier(Type₁), quantifier(Type₂))

statEnv |- op:intersect(Expr₁, Expr₂) : PrimeType₃ · Occurrence₃ · ?

The static semantics of op:except follows. The type of the second argument is ignored, as it does not contribute to the result type. As with op:intersect, the result of op:except may be the empty sequence.

statEnv |- Expr₁ : Type₁

statEnv |- op:except(Expr₁, Expr₂) : prime(Type₁) · quantifier(Type₁) · ?
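The effect of these rules on occurrence indicators can be explored with a small Python sketch. It is not the definition given in section 3.6: it assumes that each indicator can be modelled as a pair of bounds (minimum, maximum), with a maximum of 2 standing for "more than one", and that the "·" operation multiplies bounds while sequence composition adds them. The names `product` and `sequence` are chosen for this example only.

```python
# Occurrence indicators as (minimum, maximum); a maximum of 2 means "many".
OCCURRENCES = {"1": (1, 1), "?": (0, 1), "+": (1, 2), "*": (0, 2)}
INDICATORS = {bounds: symbol for symbol, bounds in OCCURRENCES.items()}


def product(q1, q2):
    """Assumed reading of q1 · q2: multiply the bounds, capping at "many"."""
    (min1, max1), (min2, max2) = OCCURRENCES[q1], OCCURRENCES[q2]
    return INDICATORS[(min1 * min2, min(2, max1 * max2))]


def sequence(q1, q2):
    """Assumed occurrence of the sequence type (Type1, Type2): add the bounds."""
    (min1, max1), (min2, max2) = OCCURRENCES[q1], OCCURRENCES[q2]
    return INDICATORS[(min(1, min1 + min2), min(2, max1 + max2))]


def union_occurrence(q1, q2):
    # quantifier(Type1, Type2) in the op:union rule
    return sequence(q1, q2)


def except_occurrence(q1):
    # quantifier(Type1) · ? in the op:except rule
    return product(q1, "?")
```

Under this model, applying op:except to an argument of occurrence `+` gives `*`, and an argument of occurrence `1` gives `?`, which reflects that the result may always be empty.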

7.2.4 The `op:to` operator

The static semantics of the op:to operator states that it always returns a sequence of integers:

statEnv |- Expr₁ : xs:integer statEnv |- Expr₂ : xs:integer

statEnv |- op:to(Expr₁, Expr₂) : xs:integer*

Ed. Note: MFF: the binary operator "to" is not defined on empty sequences. The [XQuery 1.0 and XPath 2.0 Functions and Operators] document says operands are decimals, while the XQuery document says they are integers. What happens when Expr1 > Expr2? See [Issue-0119: Semantics of op:to].

7.2.5 The `xf:data` function

Introduction

The xf:data function is used to access the value content of an element or attribute. This function corresponds to the dm:typed-value accessor in the [XPath/XQuery] data model.

Ed. Note: Some aspects of the semantics of the xf:data() function are still an open issue. For instance, what should be the result of xf:data() over a text node. See [Issue-0107: Semantics of data()].

Notation

Inferring the type for the xf:data function is done by applying the xf:data function as a Filter, using the same approach as for the XPath steps.

The following notation, adapted from the Filter judgment in [5.2.1 Steps], is used.

statEnv ; Type₁ |- xf:data : Type₂

Static Type Analysis

Static type analysis for the xf:data function checks that the function is applied on an element or attribute with a simple type content. If so, it returns the corresponding simple type through an application of the Filter judgment on the input type of the function.

statEnv |- Expr : Type₁

statEnv |- Type₁ <: (element { attribute*, xs:anySimpleType } | attribute)

statEnv ; Type₁ |- xf:data : Type₂

statEnv |- xf:data(Expr) : Type₂

statEnv ; element ElementName { AttrType, SimpleType } |- xf:data : SimpleType

statEnv ; attribute AttributeName { SimpleType } |- xf:data : SimpleType

If applied on any other kind of item, it returns the empty sequence.

statEnv |- Expr : Type

statEnv |- not(Type <: (element { attribute*, xs:anySimpleType } | attribute))

statEnv |- Type <: item

statEnv |- xf:data(Expr) : ()

Otherwise (for empty sequences or sequences of more than one item value), it raises an error.

Example

Consider the following variable and its corresponding static type.

  
    $x : (element price { attribute currency { xs:string }, xs:decimal }
         | element price_code { xs:integer })

Applying the xf:data function on that variable results in the following type.

    xf:data($x) : (xs:decimal | xs:integer)

Note that, as the input type is a choice, applying the Filter judgment results in a choice of simple types for the output of the xf:data function.
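The example can be replayed with a small Python sketch of the Filter-style typing. The classes `ElementType`, `AttributeType` and `Choice` and the function `data_type` are hypothetical modelling choices for this illustration. Only element types, attribute types and choices between them are covered.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ElementType:
    name: str
    attributes: Tuple[str, ...]
    simple_type: str


@dataclass(frozen=True)
class AttributeType:
    name: str
    simple_type: str


@dataclass(frozen=True)
class Choice:
    alternatives: tuple


def data_type(input_type):
    """Infer the type of xf:data applied to a value of the given type."""
    if isinstance(input_type, Choice):
        # The judgment is applied to each alternative of the choice.
        return Choice(tuple(data_type(alt) for alt in input_type.alternatives))
    if isinstance(input_type, (ElementType, AttributeType)):
        return input_type.simple_type
    raise TypeError("this sketch only covers elements and attributes")


x_type = Choice((
    ElementType("price", ("attribute currency { xs:string }",), "xs:decimal"),
    ElementType("price_code", (), "xs:integer"),
))
result_type = data_type(x_type)
```

Here `result_type` is the choice of `xs:decimal` and `xs:integer`, matching the type given for `xf:data($x)` above.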

Dynamic Evaluation

Dynamically, the xf:data function is implemented as the dm:typed-value data model accessor.

dynEnv |- Expr => Value₁ dm:typed-value(Value₁) = Value₂

dynEnv |- xf:data(Expr) => Value₂

8 Importing Schemas

Ed. Note: Status: This section has been extensively revised to reflect the addition of named typing to the [XPath/XQuery] type system and to improve its editorial quality. Feedback on the new mapping and on the new presentation is solicited.

This section describes how XML Schema declarations, as specified by [XML Schema], are imported into the [XPath/XQuery] type system.

8.1 Introduction

At compile time, the [XPath/XQuery] environment imports XML Schema declarations and loads them as declarations in the [XPath/XQuery] type system. The semantics of that loading process is defined by normalization rules that map XML Schema descriptions into the [XPath/XQuery] type system.

8.1.1 Features

This section summarizes the XML Schema features that are covered by the formal semantics and handled by the import mapping described in this section. For each feature, the following indications are used.

Handled indicates features which are relevant for [XPath/XQuery], are modeled within the [XPath/XQuery] type system, and handled by the mapping.
Not yet indicates features which are relevant for [XPath/XQuery], but are not yet modeled within the [XPath/XQuery] type system or are not yet handled by the mapping. Those features correspond to open issues. In case the [XPath/XQuery] type system provides appropriate support for those features, but the mapping is incomplete, the additional annotation mapping only is used.
Not handled indicates features which are relevant for [XPath/XQuery], but are not modeled within the [XPath/XQuery] type system, and are not handled by the mapping. Such features are typically only related to validation, for which the formal semantics will only give a partial model.
Ignored indicates features which are not relevant for [XPath/XQuery], are not modeled within the [XPath/XQuery] type system, and are not relevant for the mapping. Such features might have to do with documentation of the schema, or might affect which Schemas are legal, but do not affect which documents match which Schemas.

Here is the exhaustive list of XML Schema features and their status in this document.

Feature	Status
Primitive Simple types	Handled
Simple type derivation by restriction	Handled
Derivation by list and union	Handled
Facets on simple types	Not handled
ID and IDREF constraints	Ignored
Attribute Declarations
  default, fixed, use	Not yet
Element Declarations
  default, fixed (value constraint)	Not yet
  nillable	Handled
  substitution group affiliation	Handled
  substitution group exclusions	Ignored
  disallowed substitutions	Ignored
  abstract	Not yet
Complex Type Definitions
  derivation by restriction	Handled
  derivation by extension	Handled
  final	Ignored
  abstract	Not yet
AttributeUses
  required	Not yet, mapping only
  default, fixed (value constraint)	Not yet
Attribute Group Definitions	Not yet, mapping only
Model Group Definitions	Not yet, mapping only
Model Groups	Handled
Particles	Handled
Wildcards
  process contents strict, skip, lax	Not yet
Identity-constraint Definitions	Ignored
Notation Declarations	Ignored
Annotations	Ignored

Note that the schema import feature specified here assumes it is given a legal schema as input. As a result, it is not necessary to check for 'block' or 'abstract' attributes.

8.1.2 Organization

The presentation of the schema mapping is done according to the following organization.

Schema component

First, each schema component is summarized using the same notation used in the XML Representation Summary sections in [XML Schema]. For instance, here is the XML Representation Summary for complex types.

<complexType

[ ignored ] abstract = boolean : false

[ ignored ] block = (#all | List of (extension | restriction))

[ ignored ] final = (#all | List of (extension | restriction))

[ ignored ] id = ID

mixed = boolean : false

name = NCName

[ ignored ] {any attributes with non-schema namespace . . .} >

</complexType>

Attributes indicated as [ ignored ] are not mapped into the [XPath/XQuery] type system.

Attributes indicated as [ not handled ] are not currently handled by the mapping.

Note that in order to simplify the mapping, it is assumed that the default values for all attributes in the XML Representation of Schema are filled in. For instance in the above complex type, if the mixed attribute is not present, it will be treated as being present and having the value "false".

Schema mapping

XML Schema import is specified by means of mapping rules. All mapping rules have the structure below.

[SchemaComponent]_Subscript

TypeComponent

The SchemaComponent above the horizontal rule denotes an XML Schema component before translation and the TypeComponent beneath the horizontal rule denotes an equivalent type component in the [XPath/XQuery] type system.

Notation

Whenever necessary for the mapping rules, specific grammar productions which describe fragments of XML Schema may be introduced. For instance, here are grammar productions used to describe fragments of the XML Representation Summary for the complexType Element Information Item.

[43 (Formal)]	ComplexTypeContent	::=	"annotation"? ("simpleContent" | "complexContent" | (ChildrenContent AttributeContent))
[46 (Formal)]	AttributeContent	::=	("attribute" | "attributeGroup")* "anyAttribute"?
[44 (Formal)]	ChildrenContent	::=	("group" | "all" | "choice" | "sequence")?

As in the rest of this document, some mapping rules may use fragments of the XML Representation corresponding to the syntactic categories defined by those grammar productions. For instance, the following complex type fragment uses the syntactic categories TypeName, ComplexTypeContent, AttributeContent, ChildrenContent, and MixedAttribute.

<complexType

name = TypeName

MixedAttribute>

ChildrenContent AttributeContent

</complexType>

8.1.3 Main mapping rules

Notation

The normalization rule

[Schema]_Schema

Definition*

maps a complete schema into a set of Definitions in the [XPath/XQuery] type system.

The normalization rule

[SchemaComponent]_{definition(targetNCName)}

Definition

maps a top-level schema component into a Definition in the [XPath/XQuery] type system, given the target namespace targetNCName.

The normalization rule

[SchemaComponent]_{content(targetNCName)}

TypeComponent

maps a schema component not directly under the schema element into a TypeComponent in the [XPath/XQuery] type system, given the target namespace targetNCName.

8.1.4 Special attributes

The XML Schema attributes: use, minOccurs, maxOccurs, mixed, nillable, and substitutionGroup, require specific mapping rules.

8.1.4.1 use

The "use" attribute is used to describe the occurrence and default behavior of a given attribute.

Notation

The following auxiliary grammar productions are used to describe the "use" attribute.

The normalization rule

[UseAttribute]_use

Occurrence

maps the use attribute UseAttribute in Schema into the occurrence indicator Occurrence in the [XPath/XQuery] type system.

Schema mapping

Use attributes are mapped to the type system in the following way.

[ use = "optional" ]_use

[ use = "required" ]_use

Ed. Note: Issue: how derivation of attribute declaration and the "prohibited" use attributes are mapped in the [XPath/XQuery] type system is still an open issue.

8.1.4.2 minOccurs and maxOccurs

Notation

The following auxiliary grammar productions are used to describe occurrence attributes.

[42 (Formal)]	OccursAttributes	::=	"maxOccurs" "minOccurs"
[40 (Formal)]	maxOccurs	::=	"maxOccurs" "=" ("nonNegativeInteger" | "unbounded")
[41 (Formal)]	minOccurs	::=	"minOccurs" "=" "nonNegativeInteger"

The normalization rule

[OccursAttributes]_occurs

Occurrence

maps the occurrence attributes OccursAttributes in Schema into the occurrence indicator Occurrence in the [XPath/XQuery] type system.

Schema mapping

Occurrence attributes are mapped to the type system in the following way.

[minOccurs="0" maxOccurs="1"]_occurs

[minOccurs="1" maxOccurs="1"]_occurs

[minOccurs="0" maxOccurs="n"]_occurs

[minOccurs="1" maxOccurs="n"]_occurs

where n > 1.

[minOccurs="n" maxOccurs="m"]_occurs

where m >= n > 1
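The case analysis above can be sketched as a Python function. The occurrence indicator produced by each rule is not shown in the rules as printed here, so the values returned below ("?", no indicator, "*", "+") are assumptions based on the usual meaning of the occurrence indicators of the [XPath/XQuery] type system. Treating "unbounded" like any maximum greater than 1, and rejecting maxOccurs = 0, are further assumptions of this sketch.

```python
def occurs_to_occurrence(min_occurs, max_occurs):
    """Map minOccurs/maxOccurs to an occurrence indicator.

    max_occurs is a non-negative integer or the string "unbounded".
    An empty string means that no indicator is needed (exactly one).
    """
    unbounded = max_occurs == "unbounded"
    if not unbounded and max_occurs == 0:
        raise ValueError("maxOccurs = 0 is not covered by this sketch")
    if not unbounded and max_occurs < min_occurs:
        raise ValueError("maxOccurs must not be smaller than minOccurs")
    many = unbounded or max_occurs > 1
    if min_occurs == 0:
        return "*" if many else "?"
    # minOccurs >= 1: bounds above 1 are approximated by "+" (assumption).
    return "+" if many else ""
```

For example, `occurs_to_occurrence(0, 1)` returns `"?"` and `occurs_to_occurrence(2, 3)` returns `"+"`, since the type system sketched here has no way to express exact bounds other than zero, one, or many.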

8.1.4.3 mixed

Notation

The following auxiliary grammar productions are used to describe the "mixed" attribute.

[37 (Formal)]	MixedAttribute	::=	"mixed" "=" Boolean

The normalization rule

[MixedAttribute]_mixed

Mixed

maps the mixed attribute MixedAttribute in Schema into a Mixed notation in the [XPath/XQuery] type system.

Schema mapping

If the mixed attribute is true it is mapped to a mixed notation in the [XPath/XQuery] type system.

[ mixed = "true" ]_mixed

mixed

If the mixed attribute is false it is mapped to empty in the [XPath/XQuery] type system.

[ mixed = "false" ]_mixed

8.1.4.4 nillable

Notation

The following auxiliary grammar productions are used to describe the "nillable" attribute.

[38 (Formal)]	NillableAttribute	::=	"nillable" "=" Boolean

The normalization rule

[NillableAttribute]_nillable

Nillable

maps the nillable attribute NillableAttribute in Schema into a Nillable notation in the [XPath/XQuery] type system.

Schema mapping

If the nillable attribute is true it is mapped to a nillable notation in the [XPath/XQuery] type system.

[ nillable = "true" ]_nillable

nillable

If the nillable attribute is false it is mapped to empty in the [XPath/XQuery] type system.

[ nillable = "false" ]_nillable

8.1.4.5 substitutionGroup

Notation

The substitution group declaration indicates the element that a given element can be substituted for. The following auxiliary grammar productions are used to describe the "substitutionGroup" attribute.

[39 (Formal)]	substitutionGroupAttribute	::=	"substitutionGroup" "=" QName

The normalization rule

[substitutionGroupAttribute]_substitution

Substitution

maps the substitutionGroup attribute substitutionGroupAttribute in Schema into a Substitution notation in the [XPath/XQuery] type system.

Schema mapping

If the substitutionGroup attribute is present, it is mapped to a substitutionGroup notation in the [XPath/XQuery] type system.

[ substitutionGroup = QName ]_substitution

substitutes for QName

Otherwise, it is mapped to empty.

8.2 Attribute Declarations

Schema component

The following structure describes attribute declarations in XML Schema.

<attribute

[ not handled ] default = string

[ not handled ] fixed = string

[ not handled ] form = (qualified | unqualified)

[ ignored ] id = ID

name = NCName

ref = QName

type = QName

use = (optional | prohibited | required) : optional

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation?, (simpleType?))

</attribute>

8.2.1 Global attribute declarations

Schema import distinguishes between global attribute declarations and local attribute declarations.

Schema mapping

Global attribute declarations are mapped like local attribute declarations, but are prefixed by a "define" keyword in the [XPath/XQuery] type system.

[AttributeDecl]_{definition(targetNCName)}

define [AttributeDecl]_{content(targetNCName)}

8.2.2 Local attribute declarations

Schema mapping

Local attributes whose type is given by a reference to a global type name are mapped in the type system as follows.

[

<attribute

name = NCName

type = QName

UseAttribute />

]_{content(targetNCName)}

( attribute targetNCName:NCName { of type QName } )[UseAttribute]_use

References to a global attribute are mapped in the type system as follows.

[

<attribute

ref = QName

UseAttribute />

]_{content(targetNCName)}

( attribute QName )[UseAttribute]_use

A local attribute with a local content is mapped to the [XPath/XQuery] type system as follows.

[

<attribute

name = NCName

UseAttribute>

simpleType

</attribute>

]_{content(targetNCName)}

( attribute targetNCName:NCName { [simpleType]_{content(targetNCName)} } )[UseAttribute]_use
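The first two rules of this section can be illustrated with a Python sketch that reads an attribute declaration with the standard `xml.etree.ElementTree` module and prints the corresponding type notation. The function name `map_local_attribute`, the `target_prefix` parameter, and the `USE_OCCURRENCE` table are assumptions of this example. In particular, mapping use="optional" to "?" and use="required" to no indicator is assumed, since the result of the use mapping is not shown in [8.1.4.1 use] as printed here. Local simpleType content is not covered.

```python
import xml.etree.ElementTree as ET

# Assumed results of the [UseAttribute]_use mapping.
USE_OCCURRENCE = {"optional": "?", "required": ""}


def map_local_attribute(xml_text, target_prefix):
    """Map a local <attribute> declaration to a type notation string."""
    decl = ET.fromstring(xml_text)
    use = decl.get("use", "optional")  # "optional" is the schema default
    if use not in USE_OCCURRENCE:
        raise ValueError(f"use = {use!r} is not covered by this sketch")
    if decl.get("ref") is not None:
        # Reference to a global attribute.
        inner = f"attribute {decl.get('ref')}"
    elif decl.get("type") is not None:
        # Type given by a reference to a global type name.
        inner = (f"attribute {target_prefix}:{decl.get('name')} "
                 f"{{ of type {decl.get('type')} }}")
    else:
        raise ValueError("local simpleType content is not covered by this sketch")
    return f"( {inner} ){USE_OCCURRENCE[use]}"
```

Under these assumptions, the declaration `<attribute name="currency" type="xs:string"/>` with the hypothetical prefix `tns` maps to `( attribute tns:currency { of type xs:string } )?`.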

8.3 Element Declarations

Schema component

The following structure describes element declarations in XML Schema.

<element

[ ignored ] abstract = boolean : false

[ ignored ] block = (#all | List of (extension | restriction))

[ not handled ] default = string

[ ignored ] final = (#all | List of (extension | restriction))

[ not handled ] fixed = string

[ not handled ] form = (qualified | unqualified)

[ ignored ] id = ID

maxOccurs = (nonNegativeInteger | unbounded) : 1

minOccurs = nonNegativeInteger : 1

name = NCName

nillable = boolean : false

ref = QName

substitutionGroup = QName

type = QName

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation?, ((simpleType | complexType)?, (unique | key | keyref)*))

</element>

8.3.1 Global element declarations

Schema import distinguishes between global element declarations and local element declarations.

Schema mapping

Global element declarations are mapped like local element declarations, but are prefixed by a "define" keyword in the [XPath/XQuery] type system.

[

<element

name = NCName

NillableAttribute

substitutionGroupAttribute

type = QName />

]_{definition(targetNCName)}

define element targetNCName:NCName [substitutionGroupAttribute]_substitution [NillableAttribute]_nillable of type QName

[

<element

name = NCName

NillableAttribute

substitutionGroupAttribute>

ElementContent

</element>

]_{definition(targetNCName)}

define element targetNCName:NCName [substitutionGroupAttribute]_substitution [NillableAttribute]_nillable [ElementContent]_{content(targetNCName)}
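As an informal illustration of the first rule above (a global element whose type is given by the type attribute), the following Python sketch uses the standard `xml.etree.ElementTree` module. The function name `map_global_element` and the `target_prefix` parameter are assumptions of this example, and element declarations with local content are not covered.

```python
import xml.etree.ElementTree as ET


def map_global_element(xml_text, target_prefix):
    """Map a global <element name=... type=.../> to a 'define element' string."""
    decl = ET.fromstring(xml_text)
    if decl.get("type") is None:
        raise ValueError("elements with local content are not covered by this sketch")
    parts = ["define element", f"{target_prefix}:{decl.get('name')}"]
    if decl.get("substitutionGroup") is not None:
        # [substitutionGroupAttribute]_substitution
        parts.append(f"substitutes for {decl.get('substitutionGroup')}")
    if decl.get("nillable", "false") == "true":
        # [NillableAttribute]_nillable
        parts.append("nillable")
    parts.append(f"of type {decl.get('type')}")
    return " ".join(parts)
```

For example, a nillable element `price` of type `xs:decimal` that is in the substitution group of `item` maps, with the hypothetical prefix `tns`, to `define element tns:price substitutes for item nillable of type xs:decimal`.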

8.3.2 Local element declarations

Schema mapping

Local element declarations are mapped into corresponding notations in the [XPath/XQuery] type system. Note that a substitution group cannot be declared on local elements.

[

<element

OccursAttributes

name = NCName

NillableAttribute

type = QName />

]_{content(targetNCName)}

( element targetNCName:NCName [NillableAttribute]_nillable of type QName ) [OccursAttributes]_occurs

[

<element

OccursAttributes

ref = QName />

]_{content(targetNCName)}

( element QName ) [OccursAttributes]_occurs

[

<element

OccursAttributes

name = NCName

NillableAttribute>

ElementContent

</element>

]_{content(targetNCName)}

( element targetNCName:NCName [NillableAttribute]_nillable [ElementContent]_{content(targetNCName)} ) [OccursAttributes]_occurs

8.4 Complex Type Definitions

Schema component

A complex type definition is represented in XML by the following structure.

<complexType

[ ignored ] abstract = boolean : false

[ ignored ] block = (#all | List of (extension | restriction))

[ ignored ] final = (#all | List of (extension | restriction))

[ ignored ] id = ID

mixed = boolean : false

name = NCName

[ ignored ] {any attributes with non-schema namespace . . .} >

</complexType>

Notation

The following auxiliary grammar productions are used to describe the content of a complex type definition.

[43 (Formal)]	ComplexTypeContent	::=	"annotation"? ("simpleContent" | "complexContent" | (ChildrenContent AttributeContent))
[46 (Formal)]	AttributeContent	::=	("attribute" | "attributeGroup")* "anyAttribute"?
[44 (Formal)]	ChildrenContent	::=	("group" | "all" | "choice" | "sequence")?

8.4.1 Global complex type

Schema import distinguishes between global complex types (which are mapped to sort declarations) and local complex types (which are mapped to type definitions).

Schema mapping

In the case of global complex types, the mapping rule which applies is denoted by []_{definition(targetNCName)}.

[

<complexType

MixedAttribute

name = NCName>

ComplexTypeContent

</complexType>

]_{definition(targetNCName)}

define type targetNCName:NCName [MixedAttribute ComplexTypeContent]_{mixed_content(targetNCName)}

Note that the mixed attribute is passed along in the normalization rules, in order to map it later on to the appropriate indication in the [XPath/XQuery] type system.

8.4.2 Local complex type

Schema mapping

In the case of local complex types, there must not be a name attribute and the mapping rule which applies is denoted by []_{content(targetNCName)}.

[

<complexType

MixedAttribute>

ComplexTypeContent

</complexType>

]_{content(targetNCName)}

[MixedAttribute ComplexTypeContent]_{mixed_content(targetNCName)}

Note that the mixed attribute is passed along in the normalization rules, in order to map it later on to the appropriate indication in the [XPath/XQuery] type system.

8.4.3 Complex type with simple content

Schema component

A complex type can be of simple content. A simple content is represented in XML by the following structure.

<simpleContent

[ ignored ] id = ID

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation?, (restriction | extension))

</simpleContent>

Derivation by restriction inside a simple content is represented in XML by the following structure.

<restriction

base = QName

[ ignored ] id = ID

[ ignored ] {any attributes with non-schema namespace . . .} >

</restriction>

Derivation by extension inside a simple content is represented in XML by the following structure.

<extension

base = QName

[ ignored ] id = ID

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation?, ((attribute | attributeGroup)*, anyAttribute?))

</extension>

Notation

The normalization rule

[MixedAttribute ComplexTypeContent]_{mixed_content(targetNCName)}

TypeSpecifier

maps a pair of mixed attribute and complex type content to a type specifier.

Schema mapping

A complex type with simple content must not have a mixed attribute set to "true".

If the simple content is derived by restriction, it is mapped into a simple type restriction in the [XPath/XQuery] type system. Only the name of the base atomic type and attributes are mapped, while the actual simple type restriction is ignored. (Remember that facets are not captured in the [XPath/XQuery] type system.)

[

mixed = "false"

<simpleContent>

<restriction

base = QName>

simpleContentRestriction AttributeContent

</restriction>

</simpleContent>

]_{mixed_content(targetNCName)}

restricts QName { [AttributeContent]_{content(targetNCName)} QName }

If the simple content is derived by extension, it is mapped into an extended type specifier in the [XPath/XQuery] type system.

[

mixed = "false"

<simpleContent>

<extension

base = QName>

AttributeContent

</extension>

</simpleContent>

]_{mixed_content(targetNCName)}

extends QName { [AttributeContent]_{content(targetNCName)} }

8.4.4 Complex type with complex content

Schema component

A complex type can be of complex content. A complex content is represented in XML by the following structure.

<complexContent

[ ignored ] id = ID

mixed = boolean : false

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation?, (restriction | extension))

</complexContent>

Derivation by restriction inside a complex content is represented in XML by the following structure.

<restriction

base = QName

[ ignored ] id = ID

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation?, (group | all | choice | sequence)?, ((attribute | attributeGroup)*, anyAttribute?))

</restriction>

Derivation by extension inside a complex content is represented in XML by the following structure.

<extension

base = QName

[ ignored ] id = ID

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation?, ((group | all | choice | sequence)?, ((attribute | attributeGroup)*, anyAttribute?)))

</extension>

Schema mapping

If the complex content is derived by restriction, it is mapped into a type restriction in the [XPath/XQuery] type system.

[

MixedAttribute

<complexContent>

<restriction

base = QName>

annotation? ChildrenContent AttributeContent

</restriction>

</complexContent>

]_{mixed_content(targetNCName)}

restricts QName [MixedAttribute]_mixed { [AttributeContent]_{content(targetNCName)} [ChildrenContent]_{content(targetNCName)} }

If the complex content is derived by extension, it is mapped into an extended type specifier in the [XPath/XQuery] type system.

[

MixedAttribute

<complexContent>

<extension

base = QName>

annotation? ChildrenContent AttributeContent

</extension>

</complexContent>

]_{mixed_content(targetNCName)}

extends QName [MixedAttribute]_mixed { [AttributeContent]_{content(targetNCName)} [ChildrenContent]_{content(targetNCName)} }

8.5 Attribute Uses

Mapping for attribute uses is given in [8.1.4.1 use].

8.6 Attribute Group Definitions

8.6.1 Attribute group definitions

Schema component

Attribute group definitions are represented in XML by the following structure.

<attributeGroup

[ ignored ] id = ID

name = NCName

ref = QName

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation?, ((attribute | attributeGroup)*, anyAttribute?))

</attributeGroup>

Schema mapping

Attribute group definitions are not currently handled by the mapping. See [Issue-0158: Support for XML Schema groups].

8.6.2 Attribute group reference

Schema mapping

Attribute group references are not currently handled by the mapping. See [Issue-0158: Support for XML Schema groups].

8.7 Model Group Definitions

Schema component

Model group definitions are represented in XML by the following structure.

<group

name = NCName >

Content: (annotation?, (all | choice | sequence))

</group>

Schema mapping

Model group definitions are not currently handled by the mapping. See [Issue-0158: Support for XML Schema groups].

8.8 Model Groups

Model groups are either "all", "sequence" or "choice". One can also refer to a model group definition.

8.8.1 All groups

Schema component

All groups are represented in XML by the following structure.

<all

[ ignored ] id = ID

maxOccurs = 1 : 1

minOccurs = (0 | 1) : 1

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation?, element*)

</all>

Schema mapping

All groups are mapped into the "&" operation in the [XPath/XQuery] type system.

[

<all

OccursAttributes>

Element₁ ... Element_n

</all>

]_{content(targetNCName)}

([Element₁]_{content(targetNCName)} & ... & [Element_n]_{content(targetNCName)}) [OccursAttributes]_occurs

8.8.2 Choice groups

Schema component

Choice groups are represented in XML by the following structure.

<choice

[ ignored ] id = ID

maxOccurs = (nonNegativeInteger | unbounded) : 1

minOccurs = nonNegativeInteger : 1

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation?, (element | group | choice | sequence | any)*)

</choice>

Notation

The following auxiliary grammar productions are used to describe group components.

[45 (Formal)]	GroupComponent	::=	"element" | "group" | "choice" | "sequence" | "any"

Schema mapping

Choice groups are mapped into the "|" operation in the [XPath/XQuery] type system.

[

<choice

OccursAttributes>

GroupComponent₁ ... GroupComponent_n

</choice>

]_{content(targetNCName)}

([GroupComponent₁]_{content(targetNCName)} | ... | [GroupComponent_n]_{content(targetNCName)}) [OccursAttributes]_occurs

8.8.3 Sequence groups

Schema component

Sequence groups are represented in XML by the following structure.

<sequence

[ ignored ] id = ID

maxOccurs = (nonNegativeInteger | unbounded) : 1

minOccurs = nonNegativeInteger : 1

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation?, (element | group | choice | sequence | any)*)

</sequence>

Schema mapping

Sequence groups are mapped into the "," operation in the [XPath/XQuery] type system.

[

<sequence

OccursAttributes>

GroupComponent₁ ... GroupComponent_n

</sequence>

]_{content(targetNCName)}

([GroupComponent₁]_{content(targetNCName)} , ... , [GroupComponent_n]_{content(targetNCName)}) [OccursAttributes]_occurs
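The three model group rules share one shape: map each component, join the results with the operator for the group kind, then append the occurrence mapping. The following is a minimal sketch of that shape, not part of the formal mapping. The name map_group and the tuple encoding of nested groups are hypothetical, and the occurs argument is an opaque placeholder for the result of [OccursAttributes]_occurs, which is not defined in this section.

```python
# Type-system operator for each kind of model group (sections 8.8.1 to 8.8.3).
GROUP_OPERATORS = {"all": " & ", "choice": " | ", "sequence": ", "}


def map_group(kind, components, occurs=""):
    """Join already-mapped components with the operator for the group kind.

    components holds either mapped leaf types (strings) or nested groups,
    encoded as (kind, components) or (kind, components, occurs) tuples.
    occurs is a placeholder for [OccursAttributes]_occurs (an assumption).
    """
    if kind not in GROUP_OPERATORS:
        raise ValueError(f"unknown model group kind: {kind!r}")
    parts = [c if isinstance(c, str) else map_group(*c) for c in components]
    return "(" + GROUP_OPERATORS[kind].join(parts) + ")" + occurs


# A sequence whose second component is a nested choice group.
nested = map_group(
    "sequence",
    ["element title", ("choice", ["element author", "element editor"])],
)
print(nested)  # (element title, (element author | element editor))
```

Keeping the operator table separate from the traversal reflects that the three rules differ only in the connective they emit.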

8.9 Particles

Particles contribute to the definition of content models.

A particle can be either an element reference, a group reference, or a wildcard.

8.9.1 Element reference

Schema component

Element reference particles are represented in XML by the following structure.

<element

ref = QName

maxOccurs = (nonNegativeInteger | unbounded) : 1

minOccurs = nonNegativeInteger : 1

[ ignored ] {any attributes with non-schema namespace . . .} >

Schema mapping

Element references are mapped into element references in the [XPath/XQuery] type system.

[

<element

ref = QName

OccursAttributes />

]_{content(targetNCName)}

element QName [OccursAttributes]_occurs
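As an illustration of the element reference rule, the sketch below reads an element particle and emits "element QName" followed by the occurrence mapping. It is a sketch under stated assumptions: the function name map_element_reference is hypothetical, element names are written without the XML Schema namespace for brevity, and the occurrence mapping is passed in as a caller-supplied function because [OccursAttributes]_occurs is not defined in this section.

```python
import xml.etree.ElementTree as ET

OCCURS_ATTRIBUTES = ("minOccurs", "maxOccurs")


def map_element_reference(particle_xml, map_occurs):
    """Map <element ref="QName" .../> to "element QName" plus the occurs mapping.

    map_occurs stands in for [OccursAttributes]_occurs (an assumption): it
    receives the minOccurs/maxOccurs attributes and returns a type suffix.
    """
    node = ET.fromstring(particle_xml)
    if node.tag != "element" or "ref" not in node.attrib:
        raise ValueError("expected an <element ref=...> particle")
    occurs = {k: v for k, v in node.attrib.items() if k in OCCURS_ATTRIBUTES}
    return "element " + node.attrib["ref"] + map_occurs(occurs)


# With an occurs mapping that contributes nothing, only the reference remains.
print(map_element_reference('<element ref="po:item"/>', lambda occurs: ""))
```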

8.9.2 Group reference

Schema component

Group reference particles are represented in XML by the following structure.

<group

ref = QName

maxOccurs = (nonNegativeInteger | unbounded) : 1

minOccurs = nonNegativeInteger : 1

[ ignored ] {any attributes with non-schema namespace . . .} >

Schema mapping

Model group references are not currently handled by the mapping.

8.10 Wildcards

8.10.1 Attribute wildcards

Schema component

Attribute wildcards are represented in XML by the following structure.

<anyAttribute

[ ignored ] id = ID

[ not handled ] namespace = ((##any | ##other) | List of (anyURI | (##targetNamespace | ##local)) ) : ##any

processContents = (lax | skip | strict) : strict

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation?)

</anyAttribute>

Schema mapping

An attribute wildcard whose processContents value is "skip" is mapped to an attribute wildcard in the [XPath/XQuery] type system.

[

<anyAttribute

processContents = "skip">

annotation?

</anyAttribute>

]_{content(targetNCName)}

attribute*

Ed. Note: Issue: Attribute wildcards whose processContents value is "lax" or "strict" are not currently handled by the mapping. See [Issue-0153: Support for lax and strict wildcards].

Ed. Note: Issue: Namespace wildcards are not currently handled by the mapping. See [Issue-0157: Support for wildcard namespaces].

8.10.2 Element wildcards

Schema component

Element wildcards are represented in XML by the following structure.

<any

[ ignored ] id = ID

maxOccurs = (nonNegativeInteger | unbounded) : 1

minOccurs = nonNegativeInteger : 1

[ not handled ] namespace = ((##any | ##other) | List of (anyURI | (##targetNamespace | ##local)) ) : ##any

processContents = (lax | skip | strict) : strict

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation?)

</any>

Schema mapping

An element wildcard whose processContents value is "skip" is mapped to an element wildcard in the [XPath/XQuery] type system.

[

<any

OccursAttributes

processContents = "skip">

annotation?

</any>

]_{content(targetNCName)}

( element )[OccursAttributes]_occurs

Ed. Note: Issue: Element wildcards whose processContents value is "lax" or "strict" are not currently handled by the mapping.

Ed. Note: Issue: Namespace wildcards are not currently handled by the mapping. See [Issue-0157: Support for wildcard namespaces].
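The two wildcard rules can be summarized in a short sketch. It covers only what the mapping currently handles: "skip" wildcards map to attribute* or to ( element ), and "lax" or "strict" wildcards are rejected because they are not handled. The name map_wildcard is hypothetical, and occurs is an opaque placeholder for [OccursAttributes]_occurs.

```python
def map_wildcard(kind, process_contents="strict", occurs=""):
    """Map an <anyAttribute> or <any> wildcard to its type.

    process_contents defaults to "strict", as in the schema component.
    occurs is a placeholder for [OccursAttributes]_occurs (an assumption)
    and applies to element wildcards only.
    """
    if process_contents not in ("lax", "skip", "strict"):
        raise ValueError(f"invalid processContents: {process_contents!r}")
    if process_contents != "skip":
        # Lax and strict wildcards are not currently handled by the mapping.
        raise NotImplementedError(f"{process_contents} wildcards are not handled")
    if kind == "anyAttribute":
        return "attribute*"
    if kind == "any":
        return "( element )" + occurs
    raise ValueError(f"not a wildcard: {kind!r}")


print(map_wildcard("anyAttribute", "skip"))  # attribute*
print(map_wildcard("any", "skip"))           # ( element )
```

Because processContents defaults to "strict", a wildcard that omits the attribute falls outside the handled case.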

8.11 Identity-constraint Definitions

All identity-constraint definitions are ignored when mapping into the [XPath/XQuery] type system.

8.12 Notation Declarations

All notation declarations are ignored when mapping into the [XPath/XQuery] type system.

8.13 Annotation

All annotations are ignored when mapping into the [XPath/XQuery] type system.

8.14 Simple Type Definitions

Schema component

A simple type is represented in XML by the following structure.

<simpleType

[ ignored ] final = (#all | (list | union | restriction))

[ ignored ] id = ID

name = NCName

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation?, (restriction | list | union))

</simpleType>

Derivation by restriction inside a simple type is represented in XML by the following structure.

<restriction

base = QName

[ ignored ] id = ID

[ ignored ] {any attributes with non-schema namespace . . .} >

</restriction>

Derivation by list inside a simple type is represented in XML by the following structure.

<list

[ ignored ] id = ID

itemType = QName

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation?, (simpleType?))

</list>

Derivation by union inside a simple type is represented in XML by the following structure.

<union

[ ignored ] id = ID

memberTypes = List of QName

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation?, (simpleType*))

</union>

8.14.1 Global simple type definition

Schema import distinguishes between global simple types (which are mapped to sort declarations) and local simple types (which are mapped to type definitions).

Schema mapping

In the case of global simple types, the mapping rule which applies is denoted by []_{definition(targetNCName)}.

[

<simpleType

name = NCName>

SimpleTypeContent

</simpleType>

]_{definition(targetNCName)}

define type targetNCName:NCName [SimpleTypeContent]_{simple_content(targetNCName)}

8.14.2 Local simple type definition

Schema mapping

In the case of local simple types, the mapping rule which applies is denoted by []_{content(targetNCName)}.

[

<simpleType>

SimpleTypeContent

</simpleType>

]_{content(targetNCName)}

[SimpleTypeContent]_{simple_content(targetNCName)}

8.14.3 Simple type content

Notation

The normalization rule []_{simple_content(targetNCName)} maps a simple type content to a type specifier.

Schema mapping

If the simple type is derived by restriction, it is mapped into a simple type restriction in the [XPath/XQuery] type system. Only the name of the base atomic type and attributes are mapped, while the actual simple type restriction is ignored. (Remember that facets are not captured in the [XPath/XQuery] type system.)

[

<restriction

base = QName>

simpleContentRestriction

</restriction>

]_{simple_content(targetNCName)}

restricts QName { QName }

If the simple type is derived by list, it is mapped into a repetition type in the [XPath/XQuery] type system.

[

<list>

SimpleType

</list>

]_{simple_content(targetNCName)}

{ ([SimpleType]_{content(targetNCName)})* }

[

<list

itemType = QName />

]_{simple_content(targetNCName)}

{ QName* }

If the simple type is derived by union, it is mapped into a union type in the [XPath/XQuery] type system.

[

<union>

SimpleType₁ ... SimpleType_n

</union>

]_{simple_content(targetNCName)}

{ ([SimpleType₁]_{content(targetNCName)} | ... | [SimpleType_n]_{content(targetNCName)}) }

[

<union

memberTypes = QName₁ ... QName_n />

]_{simple_content(targetNCName)}

{ QName₁ | ... | QName_n }
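The simple type rules of sections 8.14.1 to 8.14.3 can be read as one recursive function over the derivation element. The following is a minimal sketch of that reading, not a definitive implementation. The function names are hypothetical, element names are written without the XML Schema namespace for brevity, and the output is built as plain strings in the notation used by the rules above.

```python
import xml.etree.ElementTree as ET

DERIVATIONS = ("restriction", "list", "union")


def map_simple_content(node):
    """[.]_simple_content: map a restriction, list or union to a type specifier."""
    if node.tag == "restriction":
        base = node.attrib["base"]  # only the base name is kept; facets are ignored
        return f"restricts {base} {{ {base} }}"
    if node.tag == "list":
        if "itemType" in node.attrib:
            return "{ " + node.attrib["itemType"] + "* }"
        return "{ (" + map_local_simple_type(node.find("simpleType")) + ")* }"
    if node.tag == "union":
        if "memberTypes" in node.attrib:
            return "{ " + " | ".join(node.attrib["memberTypes"].split()) + " }"
        members = [map_local_simple_type(c) for c in node.findall("simpleType")]
        return "{ (" + " | ".join(members) + ") }"
    raise ValueError(f"not a simple type derivation: <{node.tag}>")


def map_local_simple_type(simple_type):
    """[.]_content for a local simple type: map its derivation child."""
    if simple_type is None:
        raise ValueError("missing nested <simpleType>")
    for child in simple_type:  # annotation children are skipped
        if child.tag in DERIVATIONS:
            return map_simple_content(child)
    raise ValueError("simpleType has no restriction, list or union child")


def map_global_simple_type(simple_type, target):
    """[.]_definition for a global simple type: emit a named type definition."""
    name = simple_type.attrib["name"]
    return f"define type {target}:{name} " + map_local_simple_type(simple_type)


sizes = ET.fromstring('<simpleType name="sizes"><list itemType="xs:integer"/></simpleType>')
print(map_global_simple_type(sizes, "po"))  # define type po:sizes { xs:integer* }
```

The global and local cases differ only in the wrapper: the global rule adds the "define type" header, while both delegate to the same content mapping.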

8.15 Schemas as a whole

8.15.1 Schema

Schema component

A schema is represented in XML by the following structure.

<schema

[ not handled ] attributeFormDefault = (qualified | unqualified) : unqualified

[ ignored ] blockDefault = (#all | List of (extension | restriction | substitution)) : ' '

[ not handled ] elementFormDefault = (qualified | unqualified) : unqualified

[ ignored ] finalDefault = (#all | List of (extension | restriction)) : ' '

[ ignored ] id = ID

targetNamespace = anyURI

[ ignored ] version = token

[ ignored ] xml:lang = language

[ ignored ] {any attributes with non-schema namespace . . .} >

</schema>

Notation

The following auxiliary grammar productions are used.

[35 (Formal)]	Pragma	::=	("include" | "import" | "redefine" | "annotation")*
[36 (Formal)]	Content	::=	(("simpleType" | "complexType" | "element" | "attribute" | "attributeGroup" | "group" | "notation") "annotation")

The auxiliary normalization rule

[Pragma]_{pragma(targetNCName)}

Definition*

maps a schema pragma into a set of definitions in the [XPath/XQuery] type system.

Schema mapping

Schemas are imported by the "schema" declaration in the preamble of a query. To import a schema, the document referred to by the given URI is opened and the schema declarations contained in the document are translated into the corresponding in-line type definitions.

[schema URI]_Expr

[open-schema-document(URI)]_Schema

[

<schema

targetNamespace = targetURI>

Pragma Content

</schema>

]_Schema

[Pragma]_{pragma(targetNCName)} [Content]_{definition(targetNCName)}
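The schema rule first separates the children of the schema element into a Pragma part and a Content part, which are then handled by different mapping rules. The sketch below illustrates that separation only. It assumes that element names are written without the XML Schema namespace, that annotations appearing before the first definition belong to Pragma and later ones to Content, and that the name split_schema is hypothetical.

```python
import xml.etree.ElementTree as ET

PRAGMA_TAGS = {"include", "import", "redefine", "annotation"}
CONTENT_TAGS = {"simpleType", "complexType", "element", "attribute",
                "attributeGroup", "group", "notation"}


def split_schema(schema_xml):
    """Return (targetNamespace, pragma tags, content tags) for a schema document."""
    schema = ET.fromstring(schema_xml)
    if schema.tag != "schema":
        raise ValueError("expected a <schema> document element")
    pragma, content = [], []
    for child in schema:
        if child.tag in CONTENT_TAGS:
            content.append(child.tag)
        elif child.tag == "annotation":
            # Leading annotations go to Pragma, later ones to Content (assumption).
            (content if content else pragma).append(child.tag)
        elif child.tag in PRAGMA_TAGS:
            if content:
                raise ValueError(f"<{child.tag}> must precede the schema content")
            pragma.append(child.tag)
        else:
            raise ValueError(f"unexpected schema child <{child.tag}>")
    return schema.attrib.get("targetNamespace"), pragma, content


example = """
<schema targetNamespace="http://example.com/po">
  <import namespace="http://example.com/address"/>
  <simpleType name="sku"/>
  <element name="item"/>
</schema>
"""
print(split_schema(example))
```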

8.15.2 Include

Schema component

A schema include is represented in XML by the following structure.

<include

[ ignored ] id = ID

schemaLocation = anyURI

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation?)

</include>

Schema mapping

A schema include is not specified here, and is assumed to be handled by the XML Schema processor.

8.15.3 Redefine

Schema component

A schema redefinition is represented in XML by the following structure.

<redefine

[ ignored ] id = ID

schemaLocation = anyURI

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation | (simpleType | complexType | group | attributeGroup))*

</redefine>

Schema mapping

A schema redefine is not specified here, and is assumed to be handled by the XML Schema processor.

8.15.4 Import

Schema component

A schema import is represented in XML by the following structure.

<import

[ ignored ] id = ID

namespace = anyURI

schemaLocation = anyURI

[ ignored ] {any attributes with non-schema namespace . . .} >

Content: (annotation?)

</import>

Schema mapping

A schema import is not specified here, and is assumed to be handled by the XML Schema processor.

A Normalized core grammar

This section contains the grammar of [XPath/XQuery] after it has been normalized, sometimes referred to as the "core" syntax.

A.1 Core lexical structure

A.1.1 Syntactic Constructs

Character Classes

The following basic tokens are defined in [XML].
1. Letter
2. BaseChar
3. Ideographic
4. CombiningChar
5. Digit
6. Extender

Identifiers

The following identifier components are defined in [XML Names].
1. NCName
2. NCNameChar
3. QName
4. Prefix
5. LocalPart

String Literals and Numbers

[138 (Core)]	IntegerLiteral	::=	Digits
[139 (Core)]	DecimalLiteral	::=	("." Digits) | (Digits "." [0-9]*)
[140 (Core)]	DoubleLiteral	::=	(("." Digits) | (Digits ("." [0-9]*)?)) ([e] | [E]) ([+] | [-])? Digits
[158 (Core)]	StringLiteral	::=	(["] ((" ") | [^"])* ["]) | (['] ((' ') | [^'])* ['])

Defined Tokens

The following is a list of defined tokens for the [XPath/XQuery] grammar.

[71 (Core)]	S	::=	WhitespaceChar+
[119 (Core)]	Nmstart	::=	Letter | "_"
[120 (Core)]	Nmchar	::=	Letter | CombiningChar | Extender | Digit | "." | "-" | "_"
[137 (Core)]	Digits	::=	[0-9]+
[160 (Core)]	URLLiteral	::=	(["] ((" ") | [^"])* ["]) | (['] ((' ') | [^'])* ['])
[167 (Core)]	PITarget	::=	NCName
[169 (Core)]	VarName	::=	QName
[170 (Core)]	FuncName	::=	(Prefix ":")? LocalPart
[173 (Core)]	NCNameForPrefix	::=	Nmstart Nmchar*
[175 (Core)]	HexDigits	::=	([0-9] | [a-f] | [A-F])+
[176 (Core)]	CharRef	::=	"&#" (Digits | ("x" HexDigits)) ";"
[182 (Core)]	Char	::=	([#x0009] | [#x000D] | [#x000A] | [#x0020-#xFFFD])
[183 (Core)]	WhitespaceChar	::=	([#x0009] | [#x000D] | [#x000A] | [#x0020])
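The numeric literal productions [138] to [140], together with Digits [137], translate directly into regular expressions. The sketch below is one such transcription, offered for illustration only. The names DIGITS, INTEGER, DECIMAL, DOUBLE and classify_numeric_literal are hypothetical, and the classifier tests a complete string rather than tokenizing a query.

```python
import re

DIGITS = r"[0-9]+"                                                  # [137] Digits
INTEGER = re.compile(DIGITS)                                        # [138] IntegerLiteral
DECIMAL = re.compile(rf"(?:\.{DIGITS})|(?:{DIGITS}\.[0-9]*)")       # [139] DecimalLiteral
DOUBLE = re.compile(                                                # [140] DoubleLiteral
    rf"(?:(?:\.{DIGITS})|(?:{DIGITS}(?:\.[0-9]*)?))[eE][+-]?{DIGITS}"
)


def classify_numeric_literal(text):
    """Return the name of the production that matches all of text, or None."""
    if DOUBLE.fullmatch(text):
        return "DoubleLiteral"
    if DECIMAL.fullmatch(text):
        return "DecimalLiteral"
    if INTEGER.fullmatch(text):
        return "IntegerLiteral"
    return None


for sample in ("42", "3.", ".5", "1.5E-3", "."):
    print(repr(sample), classify_numeric_literal(sample))
```

The three productions are mutually exclusive on complete strings, since only DoubleLiteral contains an exponent and only DecimalLiteral requires a "." without one, so the order of the tests does not affect the result.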

A.2 Core BNF

The following grammar uses the same Basic EBNF notation as [XML], except that grammar symbols always have initial capital letters. The EBNF contains the lexemes embedded in the productions.

NON-TERMINALS

[1]	Query	::=	QueryProlog QueryBody
[2]	QueryProlog	::=	(NamespaceDecl | XMLSpaceDecl | DefaultNamespaceDecl | DefaultCollationDecl | SchemaImport)* FunctionDefn*
[3]	QueryBody	::=	ExprSequence?
[4]	ExprSequence	::=	Expr ("," Expr)*
[5]	Expr	::=	SortExpr
[6]	SortExpr	::=	UnorderedExpr ( "stable"? <"sort" "by" "("> SortSpecList ")" )
[7]	UnorderedExpr	::=	"unordered"? ForExpr
[8]	ForExpr	::=	(ForClause "return")* TypeswitchExpr
[9]	LetExpr	::=	(LetClause "return")* TypeswitchExpr
[10]	TypeswitchExpr	::=	(<"typeswitch" "("> Expr ")" CaseClause+ "default" "$" VarName "return")* IfExpr
[11]	IfExpr	::=	(<"if" "("> Expr ")" "then" Expr "else")* ValueExpr
[12]	ValueExpr	::=	ValidateExpr | CastExpr | Constructor | PathExpr
[13]	PathExpr	::=	RelativePathExpr
[14]	RelativePathExpr	::=	StepExpr
[15]	StepExpr	::=	ForwardStep | ReverseStep | PrimaryExpr
[16]	ForClause	::=	<"for" "$"> TypeDeclaration? "$" VarName "in" Expr
[17]	LetClause	::=	<"let" "$"> TypeDeclaration? "$" VarName ":=" Expr
[18]	ValidateExpr	::=	"validate" SchemaContext? "{" Expr "}"
[19]	CastExpr	::=	<"cast" "as"> SequenceType ParenthesizedExpr
[20]	Constructor	::=	XmlComment | XmlProcessingInstruction | ComputedDocumentConstructor | ComputedElementConstructor | ComputedAttributeConstructor
[21]	SortSpecList	::=	Expr SortModifier ("," SortSpecList)?
[22]	SortModifier	::=	("ascending" | "descending") (<"empty" "greatest"> | <"empty" "least">)
[23]	CaseClause	::=	"case" SequenceType "$" VarName "return" Expr
[24]	PrimaryExpr	::=	Literal | FunctionCall | ("$" VarName) | ParenthesizedExpr
[25]	ForwardAxis	::=	<"child" "::"> | <"descendant" "::"> | <"attribute" "::"> | <"self" "::"> | <"descendant-or-self" "::"> | <"following-sibling" "::"> | <"following" "::"> | <"namespace" "::">
[26]	ReverseAxis	::=	<"parent" "::"> | <"ancestor" "::"> | <"preceding-sibling" "::"> | <"preceding" "::"> | <"ancestor-or-self" "::">
[27]	NodeTest	::=	KindTest | NameTest
[28]	NameTest	::=	QName | Wildcard
[29]	Wildcard	::=	"*" | <NCName ":" "*"> | <"*" ":" NCName>
[30]	KindTest	::=	ProcessingInstructionTest | CommentTest | TextTest | AnyKindTest
[31]	ProcessingInstructionTest	::=	<"processing-instruction" "("> StringLiteral? ")"
[32]	CommentTest	::=	<"comment" "("> ")"
[33]	TextTest	::=	<"text" "("> ")"
[34]	AnyKindTest	::=	<"node" "("> ")"
[35]	ForwardStep	::=	ForwardAxis NodeTest
[36]	ReverseStep	::=	ReverseAxis NodeTest
[37]	NumericLiteral	::=	IntegerLiteral | DecimalLiteral | DoubleLiteral
[38]	Literal	::=	NumericLiteral | StringLiteral
[39]	ParenthesizedExpr	::=	"(" ExprSequence? ")"
[40]	FunctionCall	::=	<QName "("> (Expr ("," Expr)*)? ")"
[41]	Param	::=	SequenceType "$" VarName
[42]	SchemaContext	::=	"in" SchemaGlobalContext ("/" SchemaContextStep)*
[43]	SchemaGlobalContext	::=	QName | ("type" QName)
[44]	SchemaContextStep	::=	QName
[45]	TypeDeclaration	::=	SequenceType
[46]	SequenceType	::=	(ItemType OccurrenceIndicator) | "empty"
[47]	ItemType	::=	(("element" | "attribute") ElemOrAttrType?) | "node" | "processing-instruction" | "comment" | "text" | "document" | "item" | AtomicType | "untyped" | <"atomic" "value">
[48]	ElemOrAttrType	::=	(QName (SchemaType | SchemaContext?)) | SchemaType
[49]	SchemaType	::=	<"of" "type"> QName
[50]	AtomicType	::=	QName
[51]	OccurrenceIndicator	::=	("*" | "+" | "?")?
[52]	ComputedDocumentConstructor	::=	<"document" "{"> ExprSequence "}"
[53]	ComputedElementConstructor	::=	(<"element" QName "{"> | (<"element" "{"> Expr "}" "{")) ExprSequence? "}"
[54]	ComputedAttributeConstructor	::=	(<"attribute" QName "{"> | (<"attribute" "{"> Expr "}" "{")) ExprSequence? "}"
[55]	XmlProcessingInstruction	::=	"<?" PITarget Char* "?>"
[56]	XmlComment	::=	"<!--" Char* "-->"
[57]	EnclosedExpr	::=	( | "{") ExprSequence "}"
[58]	XMLSpaceDecl	::=	<"declare" "xmlspace"> "=" ("preserve" | "strip")
[59]	DefaultCollationDecl	::=	<"default" "collation" "="> URLLiteral
[60]	NamespaceDecl	::=	<"declare" "namespace"> NCNameForPrefix "=" URLLiteral
[61]	SubNamespaceDecl	::=	"namespace" NCNameForPrefix "=" URLLiteral
[62]	DefaultNamespaceDecl	::=	(<"default" "element"> | <"default" "function">) "namespace" "=" URLLiteral
[63]	FunctionDefn	::=	<"define" "function"> FuncName "(" ParamList? (")" | (<")" "returns"> SequenceType)) EnclosedExpr
[64]	ParamList	::=	Param ("," Param)*
[65]	SchemaImport	::=	<"import" "schema"> (StringLiteral | SubNamespaceDecl | DefaultNamespaceDecl) <"at" StringLiteral>?
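In the core grammar every step carries an explicit axis, so productions [25], [26], [35] and [36] are enough to decide whether a step is a ForwardStep or a ReverseStep. The sketch below illustrates that decision on a step written as a string. The name classify_step is hypothetical, and the node test is returned unparsed.

```python
# Axis names taken from productions [25] ForwardAxis and [26] ReverseAxis.
FORWARD_AXES = frozenset({
    "child", "descendant", "attribute", "self", "descendant-or-self",
    "following-sibling", "following", "namespace",
})
REVERSE_AXES = frozenset({
    "parent", "ancestor", "preceding-sibling", "preceding", "ancestor-or-self",
})


def classify_step(step):
    """Split 'axis::nodetest' and name the production ([35] or [36]) it matches."""
    axis, separator, node_test = step.partition("::")
    axis, node_test = axis.strip(), node_test.strip()
    if not separator or not node_test:
        raise ValueError(f"core steps need an explicit axis and node test: {step!r}")
    if axis in FORWARD_AXES:
        return ("ForwardStep", axis, node_test)
    if axis in REVERSE_AXES:
        return ("ReverseStep", axis, node_test)
    raise ValueError(f"unknown axis: {axis!r}")


print(classify_step("child::title"))              # ('ForwardStep', 'child', 'title')
print(classify_step("ancestor-or-self::node()"))  # ('ReverseStep', 'ancestor-or-self', 'node()')
```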

B References

B.1 Normative References

XML: World Wide Web Consortium. Extensible Markup Language (XML) 1.0 (Second edition), October 2000. See http://www.w3.org/TR/REC-xml.
XML Names: World Wide Web Consortium. Namespaces in XML. W3C Recommendation. See http://www.w3.org/TR/REC-xml-names/.
XML Path Language (XPath) 2.0: World Wide Web Consortium. XML Path Language (XPath) 2.0. W3C Working Draft, 16 August 2002. See http://www.w3.org/TR/xpath20/.
XML Schema Part 0: World Wide Web Consortium. XML Schema Part 0 : Primer. W3C Recommendation, May 2001. See http://www.w3.org/TR/xmlschema-0/
XML Schema Part 1: World Wide Web Consortium. XML Schema Part 1 : Structures. W3C Recommendation, May 2001. See http://www.w3.org/TR/xmlschema-1/
XML Schema Part 2: World Wide Web Consortium. XML Schema Part 2 : Datatypes. W3C Recommendation, May 2001. See http://www.w3.org/TR/xmlschema-2/.
XQuery 1.0: A Query Language for XML: World Wide Web Consortium. XQuery 1.0: A Query Language for XML. W3C Working Draft, 16 August 2002. See http://www.w3.org/TR/xquery/
XQuery 1.0 and XPath 2.0 Data Model: World Wide Web Consortium. XQuery 1.0 and XPath 2.0 Data Model. W3C Working Draft, 16 August 2002. See http://www.w3.org/TR/query-datamodel/.
XQuery 1.0 and XPath 2.0 Functions and Operators: World Wide Web Consortium. XQuery 1.0 and XPath 2.0 Functions and Operators. W3C Working Draft, 16 August 2002. See http://www.w3.org/TR/xquery-operators/

B.2 Non-normative References

XML Path Language (XPath) : Version 1.0: World Wide Web Consortium. XML Path Language (XPath) : Version 1.0. W3C Recommendation, November, 1999. See http://www.w3.org/TR/xpath.html.
XML Query 1.0 Requirements: World Wide Web Consortium. XML Query 1.0 Requirements. W3C Working Draft, 15 Feb 2001. See http://www.w3.org/TR/xmlquery-req.
XML Query Use Cases: World Wide Web Consortium. XML Query Use Cases. W3C Working Draft, 20 Dec 2001. See http://www.w3.org/TR/xmlquery-use-cases.
XML Schema : Formal Description: World Wide Web Consortium. XML Schema: Formal Description. W3C Working Draft, March 2001. See http://www.w3.org/TR/xmlschema-formal/
XSLT 99: World Wide Web Consortium. XSL Transformations (XSLT), Version 1.0. W3C Recommendation, November 1999. See http://www.w3.org/TR/xslt.

B.3 Background References

BFS00: P. Buneman, M. Fernandez, D. Suciu. UnQL: A query language and algebra for semistructured data based on structural recursion. VLDB Journal, April, 2000, Vol 9, Number 1.
BKD90: Francois Bancilhon, Paris Kanellakis, Claude Delobel. Building an Object-Oriented Database System. Morgan Kaufmann, 1990.
BNTW95: Peter Buneman, Shamim Naqvi, Val Tannen, Limsoon Wong. Principles of programming with complex object and collection types. Theoretical Computer Science 149(1):3--48, 1995.
CM93: S. Cluet and G. Moerkotte. Nested queries in object bases. Workshop on Database Programming Languages, pages 226--242, New York, August 1993.
Col90: L. S. Colby. A recursive algebra for nested relations. Information Systems 15(5):567-582, 1990.
Graefe93: Goetz Graefe, Query Evaluation Techniques for Large Databases. In ACM Computing Surveys, 25(2):73--170, 1993.
HP2000: Haruo Hosoya, Benjamin Pierce, XDuce : A Typed XML Processing Language (Preliminary Report) WebDB Workshop 2000.
Languages: Handbook of Formal Languages. G. Rozenberg and A. Salomaa, editors. Springer-Verlag. 1997.
LMW96: Leonid Libkin, Rona Machlin, and Limsoon Wong. A query language for multi-dimensional arrays: Design, implementation, and optimization techniques. SIGMOD 1996.
LW97: Leonid Libkin and Limsoon Wong. Query languages for bags and aggregate functions. Journal of Computer and Systems Sciences, 55(2):241--272, October 1997.
Milner: R. Milner, M. Tofte, R. Harper, D. MacQueen The Definition of Standard ML (Revised). MIT Press, 1997.
Mitchell: John C. Mitchell Foundations for Programming Languages. MIT Press, 1998.
Mog89: E. Moggi, Computational lambda-calculus and monads. In Symposium on Logic in Computer Science Asilomar, California, IEEE, June 1989.
Mog91: E. Moggi, Notions of computation and monads. Information and Computation, 93(1), 1991.
ODMG: Rick Cattell et al. The Object Database Standard: ODMG-93, Release 1.2. Morgan Kaufmann Publishers, San Francisco, 1996.
Quilt: Don Chamberlin, Jonathan Robie, and Daniela Florescu. Quilt: An XML Query Language for Heterogeneous Data Sources. International Workshop on the Web and Databases (WebDB'2000), Dallas, Texas, May 2000.
SQL: International Organization for Standardization (ISO). Information Technology-Database Language SQL. Standard No. ISO/IEC 9075:1999. (Available from American National Standards Institute, New York, NY 10036, (212) 642-4900.)
TATA: Tree Automata Techniques and Applications. H. Comon and M. Dauchet and R. Gilleron and F. Jacquemard and D. Lugiez and S. Tison and M. Tommasi. See http://www.grappa.univ-lille3.fr/tata/. 1997.
Wad93: P. Wadler, Monads for functional programming. In M. Broy, editor, Program Design Calculi, NATO ASI Series, Springer Verlag, 1993. Also in J. Jeuring and E. Meijer, editors, Advanced Functional Programming, LNCS 925, Springer Verlag, 1995.
Wad95: P. Wadler, How to declare an imperative. ACM Computing Surveys, 29(3):240--263, September 1997.
Won00: Limsoon Wong. An introduction to the Kleisli query system and a commentary on the influence of functional programming on its implementation. Journal of Functional Programming, to appear.
XMLQL99: A. Deutsch, M. Fernandez, D. Florescu, A. Levy, and D. Suciu. A query language for XML. In International World Wide Web Conference, 1999.
XQL99: J. Robie, editor. XQL '99 Proposal, 1999. See http://www.ibiblio.org/xql/xql-proposal.html.
YAT99: S. Cluet, and J. Siméon. YATL: A Functional and Declarative Language for XML. See http://db.bell-labs.com/user/simeon/icfp.ps

C Issues

C.1 Introduction

The issues in [C.2 Issues list] serve as a design history for this document. The ordering of issues is irrelevant. Each issue has a unique id of the form Issue-<dddd> (where d is a digit). This can be used for referring to the issue by <url-of-this-document>#Issue-<dddd>. Furthermore, each issue has a mnemonic header, a date, an optional description, and an optional resolution. For convenience, resolved issues are displayed in green. Some of the descriptions of the resolved issues are obsolete with respect to the current version of the document.

Ed. Note: Peter (Aug-05-2000): For the sake of archival, there are some duplicate issues raised in multiple instances. Duplicate issues are marked as "resolved" with reference to the representative issue.

C.2 Issues list

Issue-0001: Attributes

Date: Jul-26-2000
Raised by: Algebra Editors

Description: One example of the need for support of [Issue-0049: Unordered Collections], but also: Attributes need to be constrained to contain white space separated lists of simple types only.

Resolution: Attributes are represented by attribute attribute-name { content }.

Issue-0002: Namespaces

Date: Jul-26-2000
Raised by: Algebra Editors

Resolution: Namespaces are represented by {uri-of-namespace}localname.

Issue-0003: Document Order

Date: Jul-26-2000
Raised by: Algebra Editors

Description: The data model and algebra do not define a global order on documents. Querying global order is often required in document-oriented queries.

Resolution: Resolved by adding < operator defined on nodes in same document. See [Issue-0079: Global order between nodes in different documents] for order between nodes in different documents.

Issue-0004: References vs containment

Date: Jul-26-2000
Raised by: Algebra Editors

Description: The query-algebra data model currently does not explicitly model child elements by references (unlike the XML Query Data Model). This facilitates presentation, but may be an oversimplification with regard to [Issue-0005: Element identity].

Resolution: This issue is resolved by subsumption as follows: (1) All child-elements are (implicit) references to nodes. (2) Thus, having resolved [Issue-0005: Element identity] this issue is resolved too.

Issue-0005: Element identity

Date: Jul-26-2000
Raised by: Algebra Editors

Description: Do expressions preserve element identity or don't they? And do "=" and distinct use comparison by reference or comparison by value?

Resolution: The first part of the question has been resolved by resolution of [Issue-0010: Construct values by copy]. The second part raises a more specific issue [Issue-0066: Shallow or Deep Equality?].

Issue-0006: Source and join syntax instead of "for"

Date: Jul-26-2000
Raised by: Algebra Editors

Description: Another term for "source and join syntax" is "comprehension".

Resolution: This issue is resolved by subsumption under [Issue-0021: Syntax]. List comprehension is a syntactic alternative to "for v in e1 do e2", which has been favored by the WG in the resolution of [Issue-0021: Syntax].

Issue-0007: References: IDREFS, Keyrefs, Joins

Date: Jul-26-2000
Raised by: Algebra Editors

Description: Currently, the Algebra does not support reference values, such as IDREF or Keyref (not to be confused with "node-references", which are defined in the XML Query Data Model; see [Issue-0005: Element identity]). The Algebra's type system should be extended to support reference types, and the data model operators ref and deref should be supported (similar to id() in XPath).

Resolution: Delegated to XPath 2.0. Algebra should adopt solutions (e.g., id()/keyref() functions) provided in XPath 2.0. There may be an interaction between IDREFs and RefNodes, but we're not going to cover that now.

Issue-0008: Fixed point operator or recursive functions

Date: Jul-26-2000
Raised by: Algebra Editors

Description: It may be useful to add a fixed-point operator, which can be used in lieu of recursive functions to compute, for example, the transitive closure of a collection.

Currently, the Algebra does not guarantee termination of recursive expressions. In order to ensure termination, we might require that a recursive function take one argument that is a singleton element, and any recursive invocation should be on a descendant of that element; since any element has a finite number of descendants, this avoids infinite regress. (Ideally, we should have a simple syntactic rule that enforces this restriction, but we have not yet devised such a rule.)

Impacts optimization; hard to do static type inference; current algebra is first-order

Resolution: The functionality described is appropriately covered by the use of recursive functions. XQuery is a functional language and does not provide a fixed-point operator in its current version.

Issue-0009: Externally defined functions

Date: Jul-26-2000
Raised by: Algebra Editors

Description: There is no explicit support for externally defined functions.

The set of built-in functions may be extended to support other important operators.

Resolution: Algebra editors endorse a solution that uses XP for specifying signatures of external functions. Algebra will adopt solution provided by [XPath/XQuery].

Issue-0010: Construct values by copy

Date: Jul-26-2000
Raised by: Algebra Editors

Description: Need to be able to construct new types from bits of old types by reference and by copy. Related to [Issue-0005: Element identity].

Resolution: The WG wishes to support both: construction of values by copy, as well as references to original nodes. This needs some further investigation to sort out all technical difficulties (see [Issue-0062: Open questions for constructing elements by reference]) so the change has not yet been reflected in the Algebra document.

Issue-0011: XPath tumbler syntax instead of index?

Date: Jul-26-2000
Raised by: Algebra Editors

Description: XPath provides as a shorthand syntax [integer] to select child-elements by their position on the sibling axes, whereas the xml-query algebra uses a combination of a built-in function index() and iteration.

Addendum by JS (submitted by MF) Dec 19/2000: The typing of index is lossy : it produces a factored type. Jerome suggests the more precise range operator:

e : q min m  max n   n' - (m'-1) = r  m' >= m   n' <= n
-----------------------------------------------
            range(e;m';n') : q min r  max r

nth(e;n) == range(e;n;n)

The range operator takes a repetition of prime types and returns those values in the range m' to n'; if the repetition does not include that range, a run-time error is raised. The range and nth operators could also be defined in terms of head and tail and polymorphic recursive functions. In the absence of parametric polymorphism, it is not possible to define range and nth with precise types.

Here are Peter's rules:

e : p min m  max n     n!=* 
-------------------------------------------------
range(e;m';n') : p{n'-max(m,m')+1,min(n',n)-m'+1}

For example:
let v1 = a[] min 2  max 4

range(v1;3;3): a[] min 1  max 1
range(v1;1;3): a[] min 2  max 3
range(v1;3;5): a[] min 1  max 2
range(v1;1;5): a[] min 2  max 4

e : p min m  max * 
-----------------------------
range(e;m';n') : p min 0  max n'-m'+1

let v2 = a[] min 0  max *

range(v2;1;3): a[] min 0  max 2

this follows the typical semantics for head() and tail():
head(()) = tail(()) = ()
and the semantics behind
range(e;m',n') = tail o ...(m' times)   ... o tail o head,
                 tail o ...(m'+1 times) ... o tail o head,
                 ...
                 tail o ...(n' times)   ... o tail o head

I would have no trouble restricting ourselves to nth() instead of range() in the algebra (range can always be enumerated by nth()). Furthermore, we should consider whether m',n' can be computed numbers.

Issue-0012: GroupBy - needs second order functions?

Date: Jul-26-2000
Raised by: Algebra Editors

Description: The type system is currently first order: it does not support function types nor higher-order functions. Higher-order functions are useful for specifying, for example, sorting and grouping operators, which take other functions as arguments.

Resolution: The WG has decided to express groupBy by a combination of for and distinct. Thus, with respect to GroupBy, this issue is resolved. Because GroupBy is not the only use case for higher-order functions, a new issue [Issue-0063: Do we need (user defined) higher order functions?] is raised.

Issue-0013: Collations

Date: Jul-26-2000
Raised by: Algebra Editors

Description: Collations identify the ordering to be applied for sorting strings. Currently, the option under consideration is an optional collation "name" parameter, as follows: "SORT variable IN exp BY +(expression {ASCENDING|DESCENDING} {COLLATION name})". An alternative would be to model a collation as a simple type derived from string, and use type-level casting, i.e. expression :collationtype (which is already supported in the XML Query Algebra), for specifying the collation. That would make: "SORT variable IN exp BY +(expression:collationname {ASCENDING|DESCENDING})". But that requires some support from XML-Schema.

More generally, collations are important for any operator in the Algebra that involves string comparison, among them: sort, distinct, "=" and "<".

Resolution: Formal semantics will adopt solution provided by Operators.

Issue-0014: Polymorphic types

Date: Jul-26-2000
Raised by: Algebra Editors

Description: The type system is currently monomorphic: it does not permit the definition of a function over generalized types. Polymorphic functions are useful for factoring equivalent functions, each of which operates on a fixed type.

The current type system has already a built-in polymorphic type (lists) and is likely to have more (unordered collections). The question is, whether to allow for user-defined polymorphic types and user defined polymorphic functions.

Resolution: It has been decided this feature was not required for XQuery 1.0.

Issue-0015: 3-valued logic to support NULLs

Date: Jul-26-2000
Raised by: Algebra Editors

Resolution: The Formal Semantics supports the current semantics of NULL values, as described in the XQuery December working draft. The Formal Semantics will reflect further resolution of open issues on NULLs and 3 valued logic as decided by XQuery.

Issue-0016: Mixed content

Date: Jul-26-2000
Raised by: Algebra Editors

Description: The XML-Query Algebra allows generating elements with an arbitrary mixture of data (of simple type) and elements. XML-Schema only allows for a combination of strings interspersed with elements (aka mixed content). We need to figure out whether and how to constrain the XML-Query Algebra accordingly (e.g. by typing rules?)

Resolution: The type system has been extended to support the interleaving operator & - see [3 The XQuery Type System]. Mixed content is defined in terms of &.

Issue-0017: Unordered content

Date: Jul-26-2000
Raised by: Algebra Editors

Description: All-groups in XML-Schema, not to be mixed up with [Issue-0049: Unordered Collections]

Resolution: The type system has been extended with the support of all-groups - see [3 The XQuery Type System].

Issue-0018: Align algebra types with schema

Date: Jul-26-2000
Raised by: Algebra Editors

Description: The Algebra's internal type system is the type system of XDuce. A potentially significant problem is that the Algebra's types may lose information when converted into XML Schema types, for example, when a result is serialized into an XML document and XML Schema.

James Clark points out: "The definition of AnyComplexType doesn't match the concrete syntax for types since it applies unbounded repetition to AnyTree and one alternative for AnyTree is AnyAttribute." This is another example of an alignment issue.

This issue comprises also issues [Issue-0016: Mixed content], [Issue-0017: Unordered content], [Issue-0053: Global vs. local elements], [Issue-0054: Global vs. local complex types], [Issue-0019: Support derived types], substitution groups.

Resolution: Closed by the semantics of named typing. See Sections 3 and 7 of the Formal Semantics document.

Issue-0019: Support derived types

Date: Jul-26-2000
Raised by: Algebra Editors

Description: The current type system does not support user defined type hierarchies (by extension or by restriction).

Resolution: Closed by the semantics of named typing. Derived types are now supported in the type system. See Sections 3 and 7 of the Formal Semantics document.

Issue-0020: Structural vs. name equivalence

Date: Jul-26-2000
Raised by: Algebra Editors

Description: The subtyping rules in [3.5 Subtyping] only define structural subtyping. We need to extend this with support for subtyping via user defined type hierarchies - this is related to [Issue-0019: Support derived types].

Resolution: Closed by the semantics of named typing. The Formal Semantics now support both named and structural typing. See Sections 3 and 7 of the Formal Semantics document.

Issue-0021: Syntax

Date: Jul-26-2000
Raised by: Algebra Editors

Description: (e.g. for.<-.in vs for.in.do)

Resolution: The WG has voted for several syntax changes: "for v in e do e", "let v = e do", "sort v in e by e ...", "distinct", "match case v:t e ... else e".

Issue-0022: Indentation, Whitespaces

Date: Jul-26-2000
Raised by: Algebra Editors

Description: Is indentation significant?

Resolution: The WG has consensus that indentation is not significant, i.e., all documents are white space normalized.

Issue-0023: Catch exceptions and process in algebra?

Date: Jul-26-2000
Raised by: Algebra Editors

Description: Does the Algebra give explicit support for catching exceptions and processing them?

Resolution: Subsumed by new issue [Issue-0064: Error code handling in Query Algebra].

Issue-0024: Value for empty sequences

Date: Jul-26-2000
Raised by: Algebra Editors

Description: What does "value" do with empty sequences?

Resolution: The definition of value(e) has changed to:

value(e) = typeswitch children(e)
                     case v: AnyScalar do v
                     else()

Furthermore, the typing rules for "for v in e1 do e2" have been changed such that the variable v is type-checked separately for each unit-type occurring in expression e1.

Consequently the following example would be typed as follows:

query for b in b0/book do
      value(b/year): xs:integer min 0 max *

rather than leading to an error.

Issue-0025: Treatment of empty results at type level

Date: Jul-26-2000
Raised by: Algebra Editors

Description: This is related to [Issue-0024: Value for empty sequences].

Resolution: Resolved by resolution of [Issue-0024: Value for empty sequences].

Issue-0026: Project - one tag only

Date: Jul-26-2000
Raised by: Algebra Editors

Description: Project is only parameterized by one tag. How can we translate a0/(b | c)?

Resolution: With the new syntax (and type system) a0/(b | c) can be translated to "for v in a0 do typeswitch case v1:b[AnyType] do v1 case v2:c[AnyType] do v2 else ()".

Issue-0027: Case syntax

Date: Jul-26-2000
Raised by: Quilt Comments et al.

Description: N-ary case can be realized by nested binary cases.

Resolution: New (n-ary) case syntax is introduced.

Issue-0028: Fusion

Date: Jul-26-2000
Raised by: Michael Rys

Description: Does the Algebra support fusion as introduced by query languages such as LOREL? This is related to [Issue-0005: Element identity], because fusion only makes sense with support of element identity.

Resolution: Fusion is equivalent to 'natural full-outer join'. [XPath/XQuery] can reraise issue if desired. If added, the Algebra editors should review any solution w.r.t typing.

Issue-0029: Views

Date: Jul-26-2000
Raised by: Michael Rys

Description: One of the problems in views: Can we undeclare/hide things in environment? For example, if we support element-identity, can we explicitly discard a parent, and/or children from an element in the result-set? Related to [Issue-0005: Element identity].

Resolution: [XPath/XQuery] can reraise issue if desired. If added, the Algebra editors should review any solution w.r.t typing.

Issue-0030: Automatic type coercion

Date: Jul-26-2000
Raised by: Dana Florescu

Description: What do we do if a value does not have a type or a different type from what is required?

Suggested Resolution: We believe that the XML Query Language should specify default type coercions: mixed mode arithmetic should be performed according to a fixed precedence hierarchy of types, specifically integer to fixed decimal, fixed decimal to float, float to double. This policy has the advantage of simplicity, tradition, and static type inference. Programmers could explicitly specify alternative type coercions when desirable.

Resolution: Delegation to XPath 2.0, [XPath/XQuery], and/or Operators.
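
The fixed precedence hierarchy in the suggested resolution can be sketched as follows. This is only an illustration of that suggestion, not of the rules eventually adopted elsewhere; the type names are plain strings and the function promote is a hypothetical helper.

```python
# Precedence hierarchy from the suggested resolution, lowest to highest.
PRECEDENCE = ["integer", "decimal", "float", "double"]


def promote(left: str, right: str) -> str:
    # The result type of a mixed mode operation is the operand type that
    # appears later in the hierarchy; the other operand is coerced upward.
    return max(left, right, key=PRECEDENCE.index)
```

Because the result depends only on the operand types, it can be determined statically: promote("integer", "float") is "float" regardless of the operand values.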

Issue-0031: Recursive functions

Date: Jul-26-2000
Raised by: Dana Florescu

Resolution: subsumed by [Issue-0008: Fixed point operator or recursive functions]

Issue-0032: Full regular path expressions

Date: Jul-26-2000
Raised by: Dana Florescu

Description: Full regular path expressions allow constraining recursive navigation along paths by means of regular expressions, e.g. a/b*/c denotes all paths starting with an a, proceeding with arbitrarily many b's and ending in a c. Currently the XML-Query Algebra can express this by means of (structurally) recursive functions. An alternative may be the introduction of a fixpoint operator [Issue-0008: Fixed point operator or recursive functions].

Resolution: XPath 2.0 can raise issue if desired. The Algebra editors should review any solution w.r.t typing.
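
To make the description concrete, here is a minimal Python sketch of how a/b*/c might be expressed with a structurally recursive function. Nodes are modelled as (tag, children) pairs, and the names children, b_star and a_bstar_c are hypothetical; the sketch ignores typing entirely.

```python
def children(node, tag):
    # Child nodes of `node` carrying the given tag; a node is (tag, children).
    return [child for child in node[1] if child[0] == tag]


def b_star(node):
    # The node itself (zero b steps) plus everything reachable via b children.
    reached = [node]
    for b in children(node, "b"):
        reached.extend(b_star(b))  # structural recursion on the subtree
    return reached


def a_bstar_c(context):
    # a/b*/c: start at an a child, follow any number of b steps, end in a c.
    matches = []
    for a in children(context, "a"):
        for reached in b_star(a):
            matches.extend(children(reached, "c"))
    return matches
```

The recursion terminates because each recursive call descends into a strictly smaller subtree.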

Issue-0033: Metadata Queries

Date: Jul-26-2000
Raised by: Dana Florescu

Description: Metadata queries are queries that require runtime access to type information.

Resolution: Metadata queries are believed to be appropriately covered by XQuery 1.0 (e.g., using typeswitch).

Issue-0034: Fusion

Date: Jul-26-2000
Raised by: Dana Florescu

Resolution: Identical with [Issue-0028: Fusion]

Issue-0035: Exception handling

Date: Jul-26-2000
Raised by: Dana Florescu

Resolution: Subsumed by [Issue-0023: Catch exceptions and process in algebra?] and [Issue-0064: Error code handling in Query Algebra].

Issue-0036: Global-order based operators

Date: Jul-26-2000
Raised by: Dana Florescu

Resolution: Subsumed by [Issue-0003: Document Order]

Issue-0037: Copy vs identity semantics

Date: Jul-26-2000
Raised by: Dana Florescu

Resolution: subsumed by [Issue-0005: Element identity]

Issue-0038: Copy by reachability

Date: Jul-26-2000
Raised by: Dana Florescu

Description: Is it possible to copy children as well as IDREFs, Links, etc.? Related to [Issue-0005: Element identity] and [Issue-0008: Fixed point operator or recursive functions]

Resolution: Resolved by addition of "deep" copy operator in [XQuery 1.0 and XPath 2.0 Data Model].

Issue-0039: Dereferencing semantics

Date: Jul-26-2000
Raised by: Dana Florescu

Resolution: Subsumed by [Issue-0005: Element identity]

Issue-0040: Case Syntax

Date: Aug-01-2000
Raised by: Quilt

Description: We suggest that the syntax for "case" be made more regular. At present, it takes only two branches, the first labelled with a tag-name and the second labelled with a variable. A more traditional syntax for "case" would have multiple branches and label them in a uniform way. If the algebra is intended only for semantic specification, "case" may not even be necessary.

Resolution: subsumed by [Issue-0027: Case syntax]

Issue-0041: Sorting

Date: Aug-01-2000
Raised by: Quilt

Description: We are not happy about the three-step sorting process in the Algebra. We would prefer a one-step sorting operator such as the one illustrated below, which handles multiple sort keys and mixed sorting directions: SORT emp <- employees BY emp/deptno ASCENDING emp/salary DESCENDING

Resolution: The WG has decided to go for the above syntax, with an (optional) indication of COLLATION.
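
The one-step sort with multiple keys and mixed directions can be approximated in Python as below. The helper sort_by and the dictionary representation of employees are assumptions for illustration; the sketch relies on Python's list sort being stable and does not model COLLATION.

```python
def sort_by(items, keys):
    # keys: list of (key_function, descending) pairs, most significant first.
    # Sorting by the least significant key first and relying on stability
    # gives the same result as a single multi-key comparison.
    result = list(items)
    for key, descending in reversed(keys):
        result.sort(key=key, reverse=descending)
    return result


employees = [
    {"name": "a", "deptno": 2, "salary": 10},
    {"name": "b", "deptno": 1, "salary": 5},
    {"name": "c", "deptno": 1, "salary": 9},
    {"name": "d", "deptno": 2, "salary": 20},
]

# SORT emp <- employees BY emp/deptno ASCENDING emp/salary DESCENDING
ordered = sort_by(
    employees,
    [(lambda e: e["deptno"], False), (lambda e: e["salary"], True)],
)
```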

Issue-0042: GroupBy

Date: Aug-01-2000
Raised by: Quilt

Description: We do not think the algebra needs an explicit grouping operator. Quilt and other high-level languages perform grouping by nested iteration. The algebra can do the same.

Resolution: The WG has decided to skip groupBy for the time being.

Issue-0043: Recursive Descent for XPath

Date: Aug-01-2000
Raised by: Quilt

Description: The very important XPath operator "//" is supported in the Algebra only by writing a recursive function. This is adequate for a semantic specification, but if the Algebra is intended as an optimizable target language it will need better support for "//" (possibly in the form of a fix-point operator.)

Resolution: Resolved by subsumption under [Issue-0043: Recursive Descent for XPath]

Issue-0044: Keys and IDREF

Date: Aug-01-2000
Raised by: Quilt

Description: We think the algebra needs some facility for dereferencing keys and IDREFs (exploiting information in the schema.)

Resolution: Subsumed by [Issue-0007: References: IDREFS, Keyrefs, Joins]

Issue-0045: Global Order

Date: Aug-01-2000
Raised by: Quilt

Description: We are concerned about absence of support for operators based on global document ordering such as BEFORE and AFTER.

Resolution: Subsumed by [Issue-0003: Document Order]

Issue-0046: FOR Syntax

Date: Aug-01-2000
Raised by: Quilt

Description: We agree with comments made in the face-to-face meeting about the aesthetics of the Algebra's syntax for iteration. For example, the following syntax is relatively easy to understand: FOR x IN some_expr EVAL f(x) whereas we find the current algebra equivalent to be confusing and misleading: FOR x <- some_expr IN f(x) This syntax appears to assign the result of some_expr to variable x, and uses the word IN in a non-intuitive way.

Resolution: Subsumed by [Issue-0021: Syntax]

Issue-0047: Attributes

Date: Aug-01-2000
Raised by: Quilt

Description: See [Issue-0001: Attributes].

Resolution: Subsumed by [Issue-0001: Attributes]

Issue-0048: Explicit Type Declarations

Date: Jul-27-2000
Raised by: Group 1 at F2F, Redmond

Description: Type Declaration for the results of a query: The issue is whether to auto construct the result type from a query or to pre-declare the type of the result from a query and check for correct type on the return value. Suggestion: Support for a pre-declared result data type, as well as for coercing the output to a new type, is desirable. Whether type checking happens at runtime or at compile time is still to be resolved. Once you attach a name to a type, it is preserved during the query processing.

Resolution: W.r.t. compile time type casts this is already possible with e:t. For run-time casts an issue has been raised in [Issue-0062: Open questions for constructing elements by reference].

Issue-0049: Unordered Collections

Date: Jul-27-2000
Raised by: Algebra Editors, Group 1, F2F, Redmond

Description: Currently, all sequences in the data model are ordered. It may be useful to have unordered forests. The distinct-node function, for example, produces an inherently unordered forest. Unordered forests can benefit from many optimizations for the relational algebra, such as commutable joins.

Handling of collection of attributes is easy but the collection of elements is complex due to complex type support for the elements. It makes sense to allow casting from unordered to ordered collection and vice versa. It is not clear whether the new ordered or unordered collection is a new type or not. It affects function resolution, optimization.

Our request to Schema to represent insignificance of ordering at schema level has not been fulfilled. Thus we need to be aware that this information may get lost, when mapping to schema.

Resolution: Unordered collections are described by {t} (see [3 The XQuery Type System]), some operators (sort, distinct-node, for, and sequence) are overloaded, and some operators (difference, intersection) are added. A new issue [Issue-0076: Unordered types] is raised.

Issue-0050: Recursive Descent for XPath

Date: Jul-27-2000
Raised by: Group 1, F2F, Redmond

Description: Suggestion: The group would like to add support for a fixed-point operator in the query language that will allow us to express the semantics of the // operator in an XPath expression. A path expression of the form a//b may be represented by a fixed-point operator fp(a, "/.")/b.

Resolution: Subsumed by [Issue-0043: Recursive Descent for XPath]
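
One way to read fp(a, "/.")/b is as the closure of the starting nodes under the child step, followed by a final b step. The Python sketch below follows that reading; it is an assumption about the intended meaning, nodes are (tag, children) pairs, and the names child_step, fixed_point and descendant_named are hypothetical.

```python
def child_step(node):
    # All children of a node; a node is a (tag, children) pair.
    return list(node[1])


def fixed_point(start, step):
    # Apply `step` repeatedly until no new nodes appear (closure of `start`).
    reached = []
    pending = list(start)
    while pending:
        node = pending.pop(0)
        if any(node is seen for seen in reached):
            continue  # compare by identity so equal-looking nodes stay distinct
        reached.append(node)
        pending.extend(step(node))
    return reached


def descendant_named(start, tag):
    # a//tag: take the closure of the start nodes, then one child step to `tag`.
    return [
        child
        for node in fixed_point(start, child_step)
        for child in child_step(node)
        if child[0] == tag
    ]
```

The closure is built breadth-first, so the result of this sketch is not guaranteed to be in document order.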

Issue-0051: Project redundant?

Date: Aug-05-2000
Raised by: Peter Fankhauser

Description: It appears that project a e could be reduced to something like

for v <- e in case v of a[v1] =>
        a[v1] | v2 => ()

... or would that generate a less precise type?

Resolution: With the new type system and handling of the for operator, project is indeed redundant.

Issue-0052: Axes of XPath

Date: Aug-05-2000
Raised by: Peter Fankhauser

Description: The current algebra makes navigation to parents difficult to impossible. With support of Element Identity [Issue-0005: Element identity] and recursive functions [Issue-0008: Fixed point operator or recursive functions] one can express parent() by a recursive function via the document root. More direct support needs to be investigated w.r.t its effect on the type system.

The WG wishes to support a built-in operator parent().

Resolution: XPath 2.0 and [XPath/XQuery] can reraise issue if desired. Algebra should review any solution w.r.t typing. Question: whether namespace axis (i.e., access namespace nodes) will be included in [XPath/XQuery]. Algebra currently has issues related to typing of parent() and descendant(). If sibling axes are included in [XPath/XQuery], then Algebra should review w.r.t. typing.

Issue-0053: Global vs. local elements

Date: Aug-05-2000
Raised by: Peter Fankhauser

Description: The current type system cannot represent global element-declarations of XML-Schema. All element declarations are local.

Resolution: The type system now supports both local and global elements and attributes.

Issue-0054: Global vs. local complex types

Date: Aug-05-2000
Raised by: Peter Fankhauser

Description: The current type system does not distinguish between global and local types as XML-Schema does. All types appear to be fully nested (i.e. local types)

Resolution: The type system now supports both local and global types.

Issue-0055: Types with non-wellformed instances

Date: Aug-05-2000
Raised by: Peter Fankhauser

Description: The type system and algebra allow for sequences of simple types, which usually cannot be represented as a well-formed document. How shall we constrain this? Related to [Issue-0016: Mixed content].

Resolution: XQuery 1.0 supports non-well-formed heterogeneous sequences for intermediate results. The XQuery type system supports a way to type such intermediate results.

Issue-0056: Operators on Simple Types

Date: Jul-15-2000
Raised by: Fernandez et al.

Description: We intentionally did not define equality or relational operators on element and simple type. These operators should be defined by consensus.

Resolution: [XPath/XQuery] formal semantics adopts solution provided by Operators task force.

Issue-0057: More precise type system; choice in path

Date: Aug-07-2000
Raised by: LA-Team

Description: (This subsumes [Issue-0051: Project redundant?]). If the type system were more precise, then (project a e) could be replaced by:

for v <- e in
    case v of
      a[v1] => a[v1]
    | v2 => ()

One could also represent (e/(a|b)) directly in a similar style.

for v <- e in
    case v of
      a[v1] => a[v1]
    | v2 => case v2 of
              b[v3] => b[v3]
            | v4 => ()

Currently, there is no way to represent (e/(a|b)) without loss of precision, so if we do not change the type system, we may need to have some way to represent (e/(a|b)) and similar terms without losing precision. (The LA team has a design for this more precise type system, but it is too large to fit in the margin of this web page!)

Resolution: See resolution of [Issue-0051: Project redundant?]

Issue-0058: Downward Navigation only?

Date: Aug-07-2000
Raised by: LA-Team

Description: Related to [Issue-0052: Axes of XPath]. The current type system (and the more precise system alluded to in [Issue-0057: More precise type system; choice in path]) seems well suited for handling XPath children and descendant axes, but not parent, ancestor, sibling, preceding, or following axes. Is this limitation one we can live with?

Resolution: Subsumed by [Issue-0052: Axes of XPath]

Issue-0059: Testing Subtyping

Date: Aug-07-2000
Raised by: LA-Team

Description: One operation required in the Algebra is to test whether XML type t1 is a subtype of XML type t2, indicated by writing t1 <: t2. There is a well-known algorithm for this, based on tree automata, which is a straightforward variant of the well-known algorithm for testing whether the language generated by one regular-expression is a subset of the language generated by another. (The algorithm involves generating deterministic automata for both regular expressions or types.)

However, the naive implementation of the algorithm for comparing XML types can be slow in practice, whereas the naive algorithm for regular expressions is tolerably fast. The only acceptably fast implementation of a comparison for XML types that the LA team knows of has been implemented by Haruo Hosoya, Jerome Vouillon, and Benjamin Pierce at the University of Pennsylvania, for their implementation of XDuce. (Our implementation of the Algebra re-uses their code, with permission.)

So, should we adopt a simpler definition of subtyping which is easier to test? One possibility is to adopt the sibling restriction from Schema, which requires that any two elements which appear as siblings in the same content model must themselves have contents of the same type. Jerome Simeon and Philip Wadler discovered that adopting the sibling restriction reduces the problem of checking subtyping of XML types to that of checking regular languages for inclusion, so it may be worth adopting the restriction for that reason.
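
For the regular-language case mentioned above, inclusion can be checked by exploring pairs of states of two deterministic automata. The Python sketch below shows only that simpler case, not the tree-automata algorithm for full XML types; the dictionary encoding of automata and the function included are assumptions made for the example.

```python
def included(a, b, alphabet):
    # True if every word accepted by DFA `a` is also accepted by DFA `b`.
    # A DFA is {"start": s, "accept": set, "delta": {(state, symbol): state}};
    # a missing transition leads to an implicit dead state (None).
    start = (a["start"], b["start"])
    seen = {start}
    stack = [start]
    while stack:
        state_a, state_b = stack.pop()
        if state_a in a["accept"] and state_b not in b["accept"]:
            return False  # some word is accepted by a but rejected by b
        for symbol in alphabet:
            next_a = a["delta"].get((state_a, symbol))
            if next_a is None:
                continue  # a rejects every continuation, nothing to check
            next_b = b["delta"].get((state_b, symbol))  # None = dead state of b
            pair = (next_a, next_b)
            if pair not in seen:
                seen.add(pair)
                stack.append(pair)
    return True


# t1 = a, b        (exactly one a followed by one b)
t1 = {"start": 0, "accept": {2}, "delta": {(0, "a"): 1, (1, "b"): 2}}
# t2 = a, b*       (one a followed by any number of b's)
t2 = {"start": 0, "accept": {1}, "delta": {(0, "a"): 1, (1, "b"): 1}}
```

With these two content models, included(t1, t2, "ab") holds while included(t2, t1, "ab") does not, mirroring t1 <: t2 but not t2 <: t1.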

Issue-0060: Internationalization aspects for strings

Date: Jun-26-2000
Raised by: I18N

Description: These issues are taken from the comments on the Requirements Document by I18N

Further information can be found at http://www.w3.org/TR/WD-charreq.

It is a goal of i18n that queries involving string matching ("select x where x='some_constant'") treat canonically equivalent strings (in the Unicode sense) as matching. If the query and the target are both XML, early normalization (as per the Character Model) is assumed and binary comparison ensures that the equivalence requirement is satisfied. However, if the target is originally a legacy database which logically has a layer that exports the data as XML, that XML must be exported in normalized form. The XML Query spec must impose the normalization requirement upon such layers.

Similarly, the query may come from a user-interface layer that creates the XML query. The XML Query spec must impose the normalization requirement upon such layers.

Provided that the query and the target are in normalized form C, the output of the query must itself be in normalized form C.

Queries involving string matching should support various kinds of loose matching (such as case-insensitivity, katakana-hiragana equivalence, accent-accentless equivalence, etc.)

If such features as case-insensitivity are present in queries involving string matching, these features must be properly internationalized (e.g. case folding works for accented letters) and language-dependence must be taken into account (e.g. Turkish dotless-i).

Queries involving character counting and indexing must take into account the Character Model. Specifically, they should follow Layer 3 (locale-independent graphemes). Additional details can be found in The Unicode Standard 3.0 and UTR#18. Queries involving word counting and indexing should similarly follow the recommendations in these references.

Resolution: [XPath/XQuery] formal semantics adopts solution provided by Operators task force.

Issue-0061: Model for References

Date: Aug-16-2000
Raised by: Group 3, F2F, Redmond

Description: Related to a number of issues around [Issue-0005: Element identity].

Use Cases

Table of Contents

REF *could* do this well if it were restructured - it does not maintain unforeseen relationships or use them...

Bibliographies

Recursive parts

RDF assertions

Inversion of simple parent/child references (related to [Issue-0058: Downward Navigation only?]).
What can we leave out?

can we leave out transitive closure?

can we limit recursion?

can we leave out fixed point recursion?

related to [Issue-0008: Fixed point operator or recursive functions]
Do we need to be able to...

a. Find the person with the maximum number of descendants?

b. Airplane routes: how can I get from RDU to Raleigh? (fixed point: guaranteeing termination in reasonable time...)

c. Given children and their mothers, can I get mothers and their children? (without respect to the form of the original reference...)

related to [Issue-0008: Fixed point operator or recursive functions].
Should we abstract out the difference between different kinds of references? If so, should we be able to cast to a particular kind of reference in the output?

a. abstracting out the differences is cheaper, which is kewl...

b. the kind of reference gives me useful information about: locality (same document, same repository, big bad internet...) static vs. dynamic (xpointer *may* be resolved dynamically, or *may* be resolved at run time, ID/IDREF is static).

related to [Issue-0007: References: IDREFS, Keyrefs, Joins].
do we need to be able to generate ids, e.g. using skolem functions?

for a document in RAM, or in a persistent tree, identity may be present, implicit, system dependent, and cheap - it's nice to have an abstraction that requires no more than the implicit identity

persistable ID is more expensive, may want to be able to serialize with ID/IDREF to instantiate references in the data model

can use XPath instead of generating ID/IDREF, but these references are fragile, and one reason for queries is to create data that may be processed further

persistable ID unique within a repository context

persistable ID that is globally unique

related to [Issue-0005: Element identity].
copy vs. reference semantics

"MUST not preclude updates..."

in a pure query environment, sans update, we do not need to distinguish these

if we have update, we may need to distinguish, perhaps in a manner similar to "updatable cursors" in SQL

programs may do queries to get DOM nodes that can then be modified. It is essential to be able to distinguish copies of nodes from the nodes themselves.

copy semantics - what does it mean?

copy the descendant hierarchy?

copy the reachability tree? (to avoid dangling references)

related to [Issue-0038: Copy by reachability].

Resolution: Handled in current data model and algebra.

Issue-0062: Open questions for constructing elements by reference

Date: Sep-25-2000
Raised by: Mary Fernandez et al.

Description: (1) What is the value of parent() when constructing new elements with children referring to original nodes?

(2) Is an approach to either make copies for all children or provide references to all children, or should we allow for a more flexible combination of copies and references?

Resolution: Operational semantics specifies that element node constructor creates copies of all its children. Addition of RefNode in [XQuery 1.0 and XPath 2.0 Data Model] supports explicit reference value.

Issue-0063: Do we need (user defined) higher order functions?

Date: Oct-16-2000
Raised by: Peter Fankhauser

Description: The current XML-Query-Algebra does not allow functions to be parameters of another function - so called higher order functions. However, most of the Algebra operators are (built-in) higher order functions, taking expressions as an argument ("sort", "for", "case" to name a few). Even a fixpoint operator, "fun f(x)=e, fix f(x) in e" (see also [Issue-0008: Fixed point operator or recursive functions]), would be a built-in higher order function.

Resolution: The XML Query Algebra will not support user defined higher order functions. It does support a number of built-in higher order functions.

Issue-0064: Error code handling in Query Algebra

Date: Oct-04-2000
Raised by: Rezaur Rahman

Description: How do we return an error code from a function defined in the current Query Algebra? Do we need to create an array (or a structure) to merge the return value and error code to do this? If that is true, it may be inefficient to implement. For a cleaner and more efficient implementation, it may be necessary to allow a function declaration to take a parameter of type "output" and allow it to return an error code as part of the function definition.

Resolution: One does not need to create a structure to combine return values with error codes, provided each operator or function /either/ returns a value /or/ raises an error. The XML-Query Algebra supports means to raise errors, but does not define standard means to catch errors. Raising errors is accomplished by the expression "error" of type Ø (empty choice). Because Ø | t = t, such runtime errors do not influence static typing. The surface syntax and/or detailed specification of operators on simple types (see [Issue-0056: Operators on Simple Types]) may choose to differentiate errors into several error-codes.
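
The "either returns a value or raises an error" discipline has a close analogue in Python, sketched below. NoReturn plays the role of the empty type Ø, so the declared result type of safe_div stays float, just as Ø | t = t. The names QueryError, error and safe_div are hypothetical.

```python
from typing import NoReturn


class QueryError(Exception):
    """Raised instead of returning an error code alongside a value."""


def error(message: str) -> NoReturn:
    # An expression of the empty type: it never produces a value.
    raise QueryError(message)


def safe_div(x: float, y: float) -> float:
    # Either returns a value or raises; no (value, error_code) structure needed.
    return x / y if y != 0 else error("division by zero")
```

Unlike the Algebra as described in the resolution, Python does offer a standard way to catch the error; the sketch only illustrates the typing argument.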

Issue-0065: Built-In GroupBy?

Date: Oct-16-2000
Raised by: Peter Fankhauser

Description: We may revisit the resolution of [Issue-0042: GroupBy] and reintroduce GroupBy along the lines of sort: "group v in e1 by [e2 {collation}]". One reason for this may be that this allows using a collation for deciding about the equality of strings.

Resolution: The WG has decided to close this issue, and for the time being not consider GroupBy as a built-in operator. Furthermore, [Issue-0013: Collations] is amended to deal with collations for all operators involving a comparison of strings.

Issue-0066: Shallow or Deep Equality?

Date: Oct-16-2000
Raised by: Peter Fankhauser

Description: What is the meaning of "=" and "distinct"? Equality of references to nodes or deep equality of data?

Resolution: [XQuery 1.0 and XPath 2.0 Data Model] defines "=" (value equality) and "==" (identity equality) operators. Description of distinct states that it uses "==".
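
The difference between value equality and identity equality can be illustrated in Python, where == on dataclasses compares content recursively and "is" compares object identity. The Node class and the helper names are assumptions for the example and do not reflect the data model's actual node representation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    tag: str
    children: List["Node"] = field(default_factory=list)


def value_equal(x: Node, y: Node) -> bool:
    # Deep comparison of tag and children (the generated dataclass __eq__).
    return x == y


def identity_equal(x: Node, y: Node) -> bool:
    # True only if both names refer to the very same node.
    return x is y


def distinct_by_identity(nodes: List[Node]) -> List[Node]:
    # Drops repeated references to one node, keeps equal-valued copies.
    result: List[Node] = []
    for node in nodes:
        if not any(node is kept for kept in result):
            result.append(node)
    return result
```

Two separately constructed nodes with the same content are value-equal but not identical, so a distinct based on identity keeps both.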

Issue-0067: Runtime Casts

Date: Sep-21-2000
Raised by: ???

Description: In some contexts it may be desirable to cast values at runtime. Such runtime casts lead to an error if a value cannot be cast to a given type.

Resolution: cast e : t has been introduced as a reducible operator expressed in terms of typeswitch.
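
A runtime cast expressed through a type test might look like the following Python sketch, where isinstance stands in for typeswitch. The function cast is a hypothetical helper, and Python classes stand in for Algebra types.

```python
def cast(value, expected_type):
    # typeswitch value case v: expected_type do v else error
    if isinstance(value, expected_type):
        return value
    raise TypeError(
        f"cannot cast {value!r} to {expected_type.__name__}"
    )
```

The value is returned unchanged when the test succeeds; the cast only narrows what is known about its type.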

Issue-0068: Document Collections

Date: Oct-16-2000
Raised by: Peter Fankhauser

Description: Per our requirements document we are chartered to support document collections. The current XML-Query Algebra deals with single documents only. There are a number of subissues:

(a) Do we need a more elaborate notion of node-references? E.g. pair of (URI of root-node, local node-ref)

(b) Does the namespace mechanism suffice to type collections of nodes from different documents? Probably yes.

(c) Provided (a) and (b) can be settled, will the approach taken for [Issue-0049: Unordered Collections] do the rest?

Resolution: Document collections are now supported in XQuery 1.0. Input functions have been added, the SequenceType production supports types for document nodes. The XQuery Type system has been extended to support document nodes.

Issue-0069: Organization of Document

Date: Oct-16-2000
Raised by: Peter Fankhauser

Description: The current document belongs more to the genre (scientific) paper than to the genre specification. One may consider the following modifications: (a) reorganize intro to give a short overview and then state the purpose (strongly typed, neutral syntax with formal semantics as a basis for possibly multiple syntaxes, etc.) (compared to version Aug-23, this version has already gone a good deal in this direction). (b) Equip various definitions and type rules with id's. (c) Elaborate appendices on mapping XML-Query-Algebra Model vs. XML-Query-Datamodel, XML-Query-Type System vs. XML-Schema-Type System. (d) Maybe add an appendix on use-case-solutions. The problem is of course: Part of this is a lot of work, and we may not achieve all for the first release.

Resolution: The WG decided to dispose of this issue. The current overall organization of the document is quite adequate, but of course editorial decisions will have to be made all the time.

Issue-0070: Stable vs. Unstable Sort/Distinct

Date: Oct-02-2000
Raised by: Steve Tolkin

Description: Should sort (and distinct) be stable on ordered collections, i.e. lists, and unstable on unordered collections (see [Issue-0049: Unordered Collections])?

Resolution: sort and distinct are stable on ordered collections, and unstable on unordered collections.

Issue-0071: Alignment with the XML Query Datamodel

Date: Sep-26-2000
Raised by: Mary Fernandez

Description: Currently, the XML Query Algebra Datamodel does not model PI's and comments.

Resolution: Addition of operational semantics defines relationship of Algebra to Data Model.

Issue-0072: Facet value access in Query Algebra

Date: Oct-04-2000
Raised by: Rezaur Rahman

Description: Each of the date-time data types has facet values as defined by the schema data types draft spec. This problem is general enough to be applied to other simple data types.

The question is: should we provide access to these facet values on an instance of a particular data type? If so, what type of access? My take is that the facets are to be treated like read-only attributes of a data instance, and one should have read access to them.

Issue-0073: Facets for simple types and their role for typechecking

Date: Oct-16-2000
Raised by: Peter Fankhauser

Description: XML-Schema introduces a number of constraining facets http://www.w3.org/TR/xmlschema-2/ for simple types (among them: length, pattern, enumeration, ...). We need to figure out whether and how to use these constraining facets for type-checking.

Issue-0074: Operational semantics for expressions

Date: Nov-16-2000
Raised by: Mary Fernandez

Description: It is necessary to add an operational semantics that formally defines each operator in the Algebra.

Resolution: The new document contains a full specification of the dynamic semantics.

Issue-0075: Overloading user defined functions

Date: Nov-17-2000
Raised by: Don Chamberlin

Description: User defined functions can not be overloaded in the XML Query Algebra, i.e., a function is exclusively identified by its name, and not by its signature. Should this restriction be relaxed and if so - to which extent?

Resolution: No overloading in Query 1.0

Issue-0076: Unordered types

Date: Dec-11-2000
Raised by: Phil Wadler

Description: Currently unorderedness is represented at type level by {t}, and some (built-in) operators are overloaded such that they have different semantics (and potentially different return types) depending on their input type. An alternative is to not represent unorderedness at type level, but rather support unordered for, unordered (unstable) sort, unordered (unstable) distinct.

Resolution: Removed unordered types from type system. Added support for unordered operator.

Issue-0077: Interleaved repetition and closure

Date: Dec-12-2000
Raised by: Peter Fankhauser

Description: Regular languages are closed w.r.t. the interleaved product. However, they are not closed w.r.t. interleaved repetition, which can (e.g.) generate the 1 degree Dyck language D[1] = () | a D[1] b | D[1] D[1] = (a,b)^{0,*}, and more generally, any language that coordinates cardinalities of individual members from an alphabet: e.g., (a ^ b)^ min 0 max * = all strings with equally many a's and b's. These are beyond regular languages. Should we thus try to do without interleaved repetition?

Resolution: If we use interleaved repetition (which we will, because it is in MSL), it will be restricted to prime types.
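
The two languages named in the description can be checked by brute force. The Python sketch below is an illustration under stated assumptions: it reads interleaved repetition as "an interleaving of zero or more copies of the given words", and the function names `in_interleaved_repetition` and `is_dyck` are hypothetical. It treats `a` as an opening and `b` as a closing bracket.

```python
from itertools import combinations


def in_interleaved_repetition(s, words):
    """True if s is an interleaving of zero or more copies of the given non-empty words."""
    if s == "":
        return True
    for w in words:
        if len(w) > len(s) or w[0] != s[0]:
            continue
        # s[0] must begin some copy of w; choose positions for the rest of that copy.
        for rest in combinations(range(1, len(s)), len(w) - 1):
            if all(s[i] == c for i, c in zip(rest, w[1:])):
                used = {0, *rest}
                remainder = "".join(ch for i, ch in enumerate(s) if i not in used)
                if in_interleaved_repetition(remainder, words):
                    return True
    return False


def is_dyck(s):
    """True if s is balanced when 'a' opens and 'b' closes."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "a" else -1
        if depth < 0:
            return False
    return depth == 0
```

Under this reading, `in_interleaved_repetition(s, ["ab"])` agrees with `is_dyck(s)`, and `in_interleaved_repetition(s, ["ab", "ba"])` holds exactly when `s` has as many a's as b's. Both conditions require counting, which is why neither language is regular.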

Issue-0078: Generation of ambiguous types

Date: Dec-12-2000
Raised by: Jerome Simeon

Description: Unambiguous content-models in XML 1.0 and XML Schema are not closed w.r.t. union. It appears that the XML Query-Algebra can generate result types which can not be transformed to an unambiguous content-model.

Resolution: The XQuery type system supports ambiguous types only for intermediate types generated during type inference.

Issue-0079: Global order between nodes in different documents

Date: Dec-16-2000
Raised by: Algebra Editors

Description: The global order operator < is defined on nodes in the same document, but not between nodes in different documents.

Resolution: Resolution follows from the [XPath/XQuery] Data Model. Order between documents is implementation defined but stable.

Issue-0080: Typing of parent

Date: Dec-16-2000
Raised by: Algebra Editors

Description: Currently, the parent operator yields an imprecise type : AnyElement min 0 max 1. It might be possible to type parent more precisely, for example, by using the normalized names in MSL, which encode containment of types.

Issue-0081: Lexical representation of Schema simple types

Date: Jan-17-2001
Raised by: Algebra Editors

Description: Schema simple types must be defined for the Algebra and [XPath/XQuery].

Resolution: Algebra will adopt lexical reps supported by [XPath/XQuery].

Issue-0082: Type and expression operator precedence

Date: Jan-17-2001
Raised by: Algebra Editors

Description: The precedence of the type expressions is not defined.

Issue-0083: Expressive power and complexity of typeswitch expression

Date: Jan-17-2001
Raised by: Algebra Editors, Michael Brundage

Description: When processing an XML document without schema information, i.e., the type of the document is AnyComplexType, then match expressions may be very expensive to evaluate:

  typeswitch x
  case t1 : AnyTree do 1
  case t2 : AnyTree min 0 max 2 do 2
  case t3 : *[*[*[*[* ... [AnyAttribute] ]]]] do 3 
  else ERROR

typeswitch itself is not the issue. The real problem is having very liberal type patterns. We could restrict the kinds of type patterns that we permit.

Resolution: Typeswitch types are now restricted to datatypes. Named typing will further help in reducing the complexity by allowing type annotation contained in the data model as a means for optimization.

Issue-0084: Execution model

Date: Jan-17-2001
Raised by: Algebra Editors

Description: Need prose describing execution model scenarios: interpreter vs. compile/runtime vs. translation into another query language. Explain relationship between static and dynamic semantics.

Resolution: Section [2.1 Processing model] defines a processing model which serves as a framework for the Formal Semantics specification.

Issue-0085: Semantics of Wildcard type

Date: Jan-17-2001
Raised by: Algebra Editors, Michael Brundage

Description: Cite: wildcard types cannot be implemented. If x!y means any name in x except names in y, what does x!y!z mean? In general, how do ! and | operate (precedence, associativity)? Parentheses are required to force the desired grouping of these two operators. Also, what does x!* mean? (There's an infinite family of such examples.)

Resolution: The [XPath/XQuery] type system now uses only simple wildcard names based on XPath's NameTest production.

Issue-0086: Syntactic rules

Date: Jan-17-2001
Raised by: Algebra Editors

Description: Need rules for specifying syntactic correctness of query: symbol spaces; variable def'ns precede uses; list of keywords, etc.

Resolution: Syntactic rules should be dealt with in [XPath/XQuery] document

Issue-0087: More examples of Joins

Date: Jan-17-2001
Raised by: Algebra Editors, Michael Brundage

Description: Cite: no join operator; wants example of many-to-many joins, inner join, left and full outer joins.

Resolution: The XQuery document gives a number of such examples.

Issue-0088: Align types with XML Schema : Formal Description.

Date: 02-Apr-2001
Raised by: Mary Fernandez

Description: Sources of misalignment: [XPath/XQuery] types include comment and processing instruction; [XML Schema : Formal Description] does not. [XPath/XQuery] uses () for empty sequence; MSL uses the epsilon character. [XPath/XQuery] permits the names of attribute and element components to be wildcard expressions. MSL only permits literal names for attributes and elements, but permits stand-alone wildcard expressions. [XPath/XQuery] types call '&' interleaved repetition, but MSL says it means 'all g1 and g2 in either order'. Does MSL mean interleaved repetition?

Resolution: Closed by the semantics of named typing. The XQuery Type System aligns directly with XML Schema. See Sections 3 and 7 of the Formal Semantics document.

Issue-0089: Syntax for types in XQuery

Date: 30-Apr-2001
Raised by: Mary Fernandez

Description: Formalism document gives a particular syntax for type expressions that is not supported in the [XPath/XQuery] surface syntax.

Issue-0090: Static type-assertion expression

Date: 30-Apr-2001
Raised by: Mary Fernandez

Description: Formalism document uses a static type-assertion expression that is not supported in the [XPath/XQuery] surface syntax.

Resolution: Static type assertion is supported in XQuery with the new "assert as" expression.

Issue-0091: Attribute expression

Date: 30-Apr-2001
Raised by: Mary Fernandez

Description: [XPath/XQuery] formal semantics has stand-alone attribute constructor/expression ATTRIBUTE QName (Exp) that is not supported in [XPath/XQuery] surface syntax.

Resolution: Stand-alone attribute construction is supported in XQuery with the new syntax for element and attribute constructors.

Issue-0092: Error expression

Date: 11-May-2001
Raised by: Jerome Simeon

Description: [XPath/XQuery] formal semantics has an error expression Error that is not supported in [XPath/XQuery] surface syntax.

Resolution: Errors are raised by a function "dm:error()" instead of a separate expression.

Issue-0093: Representation of Text Nodes in type system

Date: 11-May-2001
Raised by: Mary Fernandez

Description: The data model distinguishes between text nodes and strings, which are atomic values. Text nodes have identity, parents, and siblings. Strings do not. Text nodes are accessed by the children() accessor; strings and other atomic values are accessed by the typed-value() accessor. The distinction between text nodes and atomic values should exist in the type system as well.

Resolution: Subsumed by new issue [Issue-0105: Types for nodes in the data model.].

Issue-0094: Static type errors and warnings

Date: 31-May-2001
Raised by: Don Chamberlin

Description: Static type errors and warnings are not specified. We need to enumerate in both the [XPath/XQuery] and formal semantics documents what kinds of static type errors and warnings are produced by the type system. See also [Issue-0090: Static type-assertion expression].

Issue-0095: Importing Schemas and DTDs into query

Date: 31-May-2001
Raised by: Don Chamberlin

Description: We do not specify how a Schema or DTD is 'imported' into a query so that its information is available during type checking. Schema and DTDs can either be named explicitly (e.g., by an 'IMPORT SCHEMA' clause in a query) or implicitly, by accessing documents that refer to a Schema or DTD. The mechanism for statically accessing a Schema or DTD is unspecified.

Resolution: Closed by the semantics of named typing. The new Formal Semantics document contains a complete mapping from XML Schema to the XQuery type system. See Section 7 of the Formal Semantics. Import of DTDs is supported, but should not be described in the Formal Semantics document.

Issue-0096: Support for schema-less and incompletely validated documents

Date: 31-May-2001
Raised by: Don Chamberlin/Mary Fernandez

Description: This is related to [Issue-0095: Importing Schemas and DTDs into query]. We do not specify what is the effect of type checking a query that is applied to a document without a DTD or Schema. In general, a schema-less document has type xs:AnyType and type checking can proceed under that assumption. A related issue is what is the effect of type checking a query that is applied to an incompletely validated document. As above, we can make *no* assumptions about the static type of an incompletely validated document and must assume its static type is xs:AnyType.

Resolution: XQuery supports schema-less documents and valid documents. It does not support invalid documents, i.e., documents with a schema for which validation fails. It can support those documents in a well-formed manner.

Issue-0097: Static type-checking vs. Schema validation

Date: 31-May-2001
Raised by: Mary Fernandez

Description: Static type checking and schema validation are not equivalent, but we might want to do both in a query. For example, we might want to assert statically that an expression has a particular type and also validate dynamically the value of an expression w.r.t a particular schema.

The differences between static type checking and schema validation must be enumerated clearly (the XSFD people should help us with this).

Resolution: Closed by the semantics of named typing. XQuery and the XQuery Formal Semantics support both a validate operation and static typing.

Issue-0098: Implementation of and conformance levels for static type checking

Date: 31-May-2001
Raised by: Don Chamberlin

Description: This issue is related to [Issue-0059: Testing Subtyping]. Static type checking may be difficult and/or expensive to implement. Some discussion of the algorithmic issues of type checking is needed. In addition, we may want to define "conformance levels" for [XPath/XQuery], in which some processors (or some processing modes) are more permissive about types. This would allow [XPath/XQuery] implementations that do not understand all of Schema, and it would allow customers some control over the cost/benefit tradeoff of type checking.

Issue-0099: Incomplete/inconsistent mapping from [XPath/XQuery] to core

Date: 06-June-2001
Raised by: Don Chamberlin

Description: This mapping is still preliminary and contains inconsistencies. These inconsistencies will be addressed in detail in the next draft of the document.

Resolution: The Formal Semantics now provides a complete mapping from [XPath/XQuery] to the Core [XPath/XQuery]. Remaining issues with respect to that mapping are indicated separately.

Issue-0100: Namespace resolution

Date: March-11-2002
Raised by: FS Editors

Description: The way (when? where?) namespace prefixes are resolved is still an open issue.

Issue-0101: Support for mixed content in the type system

Date: March-11-2002
Raised by: FS Editors

Description: Support for mixed content in the type system is an open issue. This reopens issue [Issue-0016: Mixed content]. Dealing with mixed content with interleaving raises complexity issues. See also [Issue-0103: Complexity of interleaving].

Issue-0102: Indentation, Whitespace

Date: March-11-2002
Raised by: FS Editors

Description: Whitespace normalization in [XPath/XQuery] is still an open issue. This reopens issue [Issue-0022: Indentation, Whitespaces].

Resolution: New version now describes the semantics of element constructors and whitespace handling.

Issue-0103: Complexity of interleaving

Date: March-11-2002
Raised by: FS Editors

Description: The current type system allows interleaving on arbitrary types. Interleaving is an expensive operation and it is not clear how to define subtyping for it. Should we restrict the use of interleaving to (optional) atomic types? Should this restriction reflect the one in XML Schema? Related to [Issue-0077: Interleaved repetition and closure].

Issue-0104: Support for named typing

Date: March-11-2002
Raised by: FS Editors

Description: XML Schema is based on named typing, while the [XPath/XQuery] type system is based on structural typing. Directly related issues are [Issue-0019: Support derived types] and [Issue-0018: Align algebra types with schema]. Other impacted issues are [Issue-0020: Structural vs. name equivalence], [Issue-0072: Facet value access in Query Algebra], [Issue-0073: Facets for simple types and their role for typechecking], [Issue-0088: Align types with XML Schema : Formal Description.], [Issue-0095: Importing Schemas and DTDs into query], [Issue-0097: Static type-checking vs. Schema validation], [Issue-0098: Implementation of and conformance levels for static type checking], [Issue-0111: Semantics of instance of ... only].

Resolution: Closed by the semantics of named typing. The Formal Semantics now supports both named and structural typing. See Sections 3 and 7 of the Formal Semantics document.

Issue-0105: Types for nodes in the data model.

Date: March-11-2002
Raised by: FS Editors

Description: The [XPath/XQuery] type system only supports element and attribute nodes. It needs to support other kinds of nodes from the [XPath/XQuery] datamodel, notably: text nodes and document nodes. Should it also include support for PI nodes, comment nodes and namespace nodes?

The data model distinguishes between text nodes and strings, which are atomic values. Text nodes have identity, parents, and siblings. Strings do not. Text nodes are accessed by the children() accessor; strings and other atomic values are accessed by the typed-value() accessor. The distinction between text nodes and atomic values should exist in type system as well.

Resolution: The new type system supports document and text nodes. Support for other kinds of nodes is discussed in [Issue-0143: Support for PI, comment and namespace nodes]

Issue-0106: Constraint on attribute and element content models

Date: March-11-2002
Raised by: Jerome

Description: The [XPath/XQuery] type system allows more content models than XML Schema allows. For instance, the current type grammar allows the following types:

   element d { (attribute a | element b, attribute c)* }
   attribute a { element b }

Section [3 The XQuery Type System] indicates corresponding constraints on the [XPath/XQuery] type system to avoid that problem. The status of these constraints is unclear. When are they enforced and checked?

Issue-0107: Semantics of data()

Date: March-11-2002
Raised by: FS Editors

Description: What is the semantics of data() applied to anything other than an element or attribute node?

Issue-0108: Principal node types in XPath

Date: March-11-2002
Raised by: Michael Kay

Description: There is a known bug in the formal semantics which does not deal properly with principal node types. This bug should be resolved based on some semantics previously proposed by Phil Wadler.

Resolution: The bug in the semantics of XPath has been fixed in the current version of the document. See [5.2.1 Steps].

Issue-0109: Semantics of sortby

Date: March-11-2002
Raised by: Jerome

Description: The precise semantics of sortby is still an open issue.

Issue-0110: Semantics of element and attribute constructors

Date: March-11-2002
Raised by: Jerome

Description: The precise semantics of element constructors is still an open issue.

Issue-0111: Semantics of instance of ... only

Date: March-11-2002
Raised by: Mary Fernandez

Description: The "instance of" expression allows an optional "only" modifier. The use case for such a modifier is based on named typing, while the XQuery semantics is currently based on structural typing. It is not clear what the semantics of the "only" modifier under structural typing should be and how it can be supported.

Resolution: XQuery does not support only anymore.

Issue-0112: Typing for the typeswitch default clause

Date: March-11-2002
Raised by: Jerome

Description: There is an asymmetry in the typing for the default clause in typeswitch vs. the other case clauses. This results in a less precise type when the default clause can be applied.

It would be nicer to be able to have the type be more precise, like for the other case clauses.

The technical problem is the need for some form of negation. I think one could define a "non-common-primes" function that would do the trick, but I leave that as open for now until further review of the new typeswitch section is made.

Issue-0113: Incomplete specification of type conversions

Date: March-11-2002
Raised by: Mary Fernandez

Description: Not all the fallback conversion rules are specified yet in section [4.4.3 Type Conversions]. All the remaining rules in the fallback conversions table must be specified.

Issue-0114: Dynamic context for current date and time

Date: March-11-2002
Raised by: Jerome

Description: The following components of the dynamic context have no formal representation yet: current date and time.

Related question: where are these context components used?

Issue-0115: What is in the default context?

Date: March-11-2002
Raised by: Jerome

Description: What do the default namespace and type environments contain? I believe at least the default namespace environment should contain the "xs", "xf" and "op" prefixes, as well as the default namespaces bound to the empty namespace. Should the default type environment contain wildcard types?

Issue-0116: Serialization

Date: March-11-2002
Raised by: Jerome

Description: Serialization of data model instances, and XQuery results is still an open issue.

Issue-0117: Data model constructor for error values

Date: March-11-2002
Raised by: Jerome

Description: The [XPath/XQuery] data model supports an error value, but there is no constructor for it. Currently the formal semantics is using the notation dm:error() to create an error value. Should there be some function(s) in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document to create error values?

Issue-0118: Data model syntax and literal values

Date: March-11-2002
Raised by: Phil Wadler

Description: Phil suggests the data model should support primitive literals in their lexical form, in which case no explicit dynamic semantic rule would be necessary.

More generally, should the data model support a constructor syntax?

Issue-0119: Semantics of op:to

Date: March-11-2002
Raised by: Mary Fernandez

Description: The binary operator "to" is not defined on empty sequences. The [XQuery 1.0 and XPath 2.0 Functions and Operators] document says operands are decimals, while the XQuery document says they are integers. What happens when Expr1 > Expr2?

Issue-0120: Sequence operations: value vs. node identity

Date: March-11-2002
Raised by: Mary Fernandez

Description: The [XQuery 1.0 and XPath 2.0 Functions and Operators] document provides only one function for union (intersect, etc.) of sequences of nodes and values. The semantics is very different for node only and value only sequences. The semantics is undefined for heterogeneous sequences. Should we have two union (intersect, etc.) functions, one for nodes, and one for values?
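
To make the difference concrete, here is a small Python sketch, offered only as an illustration. The `Node` class and both function names are hypothetical stand-ins, and results are returned in input order, which is an assumption of the sketch rather than anything stated in the issue.

```python
class Node:
    """Stand-in for a data model node: equal content does not imply the same node."""

    def __init__(self, value):
        self.value = value


def union_by_identity(xs, ys):
    """Union of node sequences: a node is a duplicate only if it is the same object."""
    seen, out = set(), []
    for node in xs + ys:
        if id(node) not in seen:
            seen.add(id(node))
            out.append(node)
    return out


def union_by_value(xs, ys):
    """Union of value sequences: a value is a duplicate if it compares equal."""
    seen, out = set(), []
    for value in xs + ys:
        if value not in seen:
            seen.add(value)
            out.append(value)
    return out
```

Two distinct nodes with the same content both survive `union_by_identity`, whereas two equal strings collapse under `union_by_value`. A heterogeneous sequence fits neither rule cleanly, which is the gap the issue points at.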

Issue-0121: Casting functions

Date: March-11-2002
Raised by: Jerome

Description: The [XQuery 1.0 and XPath 2.0 Functions and Operators] document does not provide any function for casting, just a table and casting rules. Wouldn't it be preferable to have an explicit function to normalize to? This relates to Issue 17 in the [XQuery 1.0 and XPath 2.0 Functions and Operators] document.

Issue-0122: Overloaded functions

Date: March-11-2002
Raised by: Denise Draper

Description: Some [XQuery 1.0 and XPath 2.0 Functions and Operators] functions are overloaded. How to deal with overloaded built-in functions in the Formal Semantics is still an open issue.

Issue-0123: Semantics of /

Date: March-11-2002
Raised by: FS Editors

Description: Some of the semantics of the root expression / is still an open issue. For instance, what should be the semantics of '/' in the case of a document fragment (i.e., one created using an XQuery element constructor)?

Issue-0124: Binding position in FLWR expressions

Date: March-11-2002
Raised by: FS Editors

Description: The only way to bind position() is through implicit operations. It would be useful and cleaner to also have a way to bind the position in a sequence explicitly. The FS editors proposed several syntaxes for such an operation; one of them looks like "for $v at $i in E1 return E2", which modifies the for expression to bind a variable "$i" to the position in sequence "E1" explicitly.
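
A rough Python analogue of the proposed "for $v at $i in E1 return E2" form is sketched below. The function name `for_at` is hypothetical, sequences are modelled as Python lists, and positions are assumed to start at 1, as position() does in XPath.

```python
def for_at(seq, body):
    """Evaluate body(v, i) for each item v at 1-based position i and concatenate the results."""
    result = []
    for i, v in enumerate(seq, start=1):
        result.extend(body(v, i))
    return result
```

For example, `for_at(["a", "b"], lambda v, i: [(i, v)])` pairs each item with its position without relying on any implicit context.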

Issue-0125: Operations on node only in XPath

Date: March-11-2002
Raised by: FS Editors

Description: Generally steps may operate on nodes and values alike; the axis rules only can operate on nodes (NodeValue). Is it a dynamic error to apply an axis rule on a value?

More generally, the XQuery document states that XPath operates on nodes only. Where that restriction should be applied is an open issue.

Issue-0126: Semantics of effective boolean value

Date: March-11-2002
Raised by: FS Editors

Description: Some of the semantics of effective boolean value is still an open issue.

Issue-0127: Datatype limitations

Date: March-11-2002
Raised by: FS Editors

Description: Should the Datatype production allow the following: fs:atomic, fs:numeric, fs:UnknownSimpleType ?

The formal semantics makes use of several built-in types which are not in XML Schema: fs:numeric, fs:atomic, and fs:UnknownSimpleType. These types are necessary for the specification of some of XPath type conversion rules, and will be accepted without raising an error.

Issue-0128: Casting based on the lexical form

Date: March-11-2002
Raised by: Mary Fernandez

Description: The XQuery/XPath spec says : "If an operand is an untyped simple value (...), it is cast to the type suggested by its lexical form." It is not clear how to define the semantics for this.

Issue-0129: Static typing of union

Date: March-11-2002
Raised by: Michael Rys

Description: What should be the semantics of arithmetic expressions over unions? Right now, such an expression would raise a dynamic error. Do we want to raise a static error?

Should operators and functions behave consistently with respect to typing?

With the current semantics in Section 4.5, expr1 + expr2 raises a static type error if (e.g.) expr1 has type string and expr2 has type integer. It raises only a dynamic error if expr1 has type (string | integer) and expr2 has type integer, and expr1 actually evaluates to a string. An alternative would be for this also to raise a static error, because it cannot be guaranteed to succeed on all instances.
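
The two policies can be contrasted in a toy checker. The Python sketch below is not the rule from Section 4.5; it assumes a union type can be modelled as a set of atomic type names, and `NUMERIC`, `check_plus`, and the `strict` flag are hypothetical.

```python
NUMERIC = {"integer", "decimal", "double"}


def check_plus(left, right, strict=False):
    """Statically check left + right, where each operand type is a set of atomic type names."""
    pairs = [(l, r) for l in left for r in right]
    ok = [p for p in pairs if p[0] in NUMERIC and p[1] in NUMERIC]
    if not ok:
        # No member of the union can ever succeed.
        return "static error"
    if len(ok) < len(pairs):
        # Some members succeed and some fail: the policies differ here.
        return "static error" if strict else "possible dynamic error"
    return "ok"
```

With `strict=False` the checker mirrors the current behaviour (string + integer is rejected, (string | integer) + integer is deferred to run time); with `strict=True` it mirrors the alternative, rejecting anything that is not guaranteed to succeed.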

Issue-0130: When to process the query prolog

Date: March-11-2002
Raised by: Jerome

Description: The query prolog needs to be processed before the normalization phase. This is not reflected yet in the processing model.

Issue-0131: Boolean node test and sequences

Date: March-11-2002
Raised by: Michael Rys

Description: The current semantics for boolean node tests makes it "true if the expression is a sequence that contains at least one node and error if the sequence contains no node". This is inefficient to implement. Some alternative semantics have been proposed.

Issue-0132: Typing for descendant

Date: March-11-2002
Raised by: Peter Fankhauser

Description: The current static typing for descendant is still under review, and the inference rules in that version probably contain bugs.

Issue-0133: Should to also be described in the formal semantics?

Date: March-11-2002
Raised by: Michael Kay

Description: The current semantics of the to operator uses the op:to function from the [XQuery 1.0 and XPath 2.0 Functions and Operators] document. Should it be defined formally?

Mike Kay suggests the following definition: [A to B] => if A=B then (A) else (A, A+1 to B)
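
Transcribed directly into Python (as a sketch, with sequences modelled as lists and both function names hypothetical), the suggested definition reads as follows. The second function adds a guard for the case A > B, which the suggested definition does not cover; returning an empty sequence there is an assumption of this sketch, not part of the suggestion.

```python
def to_literal(a, b):
    """Direct transcription: [A to B] => if A=B then (A) else (A, A+1 to B)."""
    if a == b:
        return [a]
    return [a] + to_literal(a + 1, b)


def to_guarded(a, b):
    """Variant with an assumed guard: an empty result when a > b."""
    if a > b:
        return []
    return [a] + to_guarded(a + 1, b)
```

As written, `to_literal` never reaches its base case when A > B, so the recursion does not terminate. This is the same question raised in [Issue-0119: Semantics of op:to].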

Issue-0134: Should we define for with head and tail?

Date: March-11-2002
Raised by: Michael Kay

Description: Mike Kay proposes to use the following recursion to define the dynamic semantics of for:

[for $x in () return E2] => ()
[for $x in SEQ return E2] =>
(let $x := head(SEQ) return E2, for $x in tail(SEQ) return E2)

Unfortunately head and tail are not defined in [XQuery 1.0 and XPath 2.0 Functions and Operators] right now.
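
The recursion can be sketched in Python as follows. This is an illustration only: sequences are modelled as lists, the body E2 as a function returning a list, and `head`, `tail`, and `for_expr` are hypothetical helpers rather than functions from any of the specifications.

```python
def head(seq):
    """First item of a non-empty sequence."""
    return seq[0]


def tail(seq):
    """Everything after the first item."""
    return seq[1:]


def for_expr(seq, body):
    """[for $x in SEQ return E2], defined by recursion on SEQ."""
    if not seq:
        return []
    # let $x := head(SEQ) return E2, followed by the loop over tail(SEQ)
    return body(head(seq)) + for_expr(tail(seq), body)
```

Because each body result is itself a sequence, concatenation flattens the results, matching the comma operator in the rule above.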

Issue-0135: Semantics of special functions

Date: March-11-2002
Raised by: Michael Kay

Description: The current semantics does not completely cover built-in functions. Some functions used in the Formal semantics, or some functions from the [XQuery 1.0 and XPath 2.0 Functions and Operators] document need additional semantics specification.

Issue-0136: Non-determinism in the semantics

Date: May-01-2002
Raised by: Mary Fernandez

Description: Some operations, such as logical operations and quantified operations, are not deterministic ("early-out" semantics allow an implementation not to evaluate all of the expressions). The formal semantics cannot capture the non-determinism in those operations.
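
A small Python sketch shows how early-out evaluation makes the outcome depend on evaluation order. The names `eval_and`, `is_false`, and `fails` are hypothetical, operands are modelled as zero-argument functions, and a Python exception stands in for a dynamic error.

```python
def eval_and(operands):
    """Evaluate operands in the given order, stopping at the first false one."""
    for thunk in operands:
        if not thunk():
            return False
    return True


def is_false():
    return False


def fails():
    raise ValueError("dynamic error")
```

Evaluating `eval_and([is_false, fails])` returns `False` without ever calling `fails`, while `eval_and([fails, is_false])` raises the error. If an implementation is free to choose either order, a single deterministic rule cannot describe both outcomes.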

Issue-0137: Typing of input functions

Date: May-01-2002
Raised by: Jerome Simeon

Description: Static typing for the input functions input, collection, and document is still undefined.

Issue-0138: Semantics of Schema Context

Date: May-01-2002
Raised by: Jerome Simeon

Description: The semantics of Schema Context types in the SequenceType production is still an open issue.

Issue-0139: Type equivalence rules

Date: May-08-2002
Raised by: Michael Rys

Description: Should we add back equivalence rules for types (e.g., T1** == T1* or (T1 | T1) == T1)? They are useful in practical implementations (e.g., to print an inferred type or reduce the complexity of inferred types), and could be added in an appendix.
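
As a sketch of how such rules might be used to print smaller inferred types, the following Python function applies just the two equivalences mentioned above. The tuple encoding of types and the name `simplify` are assumptions made for this illustration; a real implementation would need many more rules.

```python
def simplify(t):
    """Simplify a type encoded as ("name", n), ("star", t) or ("choice", t1, t2)."""
    kind = t[0]
    if kind == "star":
        inner = simplify(t[1])
        # T** == T*
        return inner if inner[0] == "star" else ("star", inner)
    if kind == "choice":
        left, right = simplify(t[1]), simplify(t[2])
        # (T | T) == T
        return left if left == right else ("choice", left, right)
    return t
```

Simplifying bottom-up lets one rule expose opportunities for the other: a choice between `T*` and `T**` first becomes a choice between two copies of `T*`, and then collapses to `T*`.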

Issue-0140: Dependency in normalization and function resolution

Date: May-08-2002
Raised by: Michael Rys

Description: The normalization of function calls depends on the function signature. But you cannot gather the function signature for built-in functions without knowing the type of the parameters.

Issue-0141: Treatment of nillability and xsi:nil

Date: May-08-2002
Raised by: Michael Rys

Description: Nillability on an element declaration indicates that the content of the corresponding element can be empty. The current data model preserves the xsi:nil attribute and this is used in the semantics at the type level. An alternative design would be to remove the xsi:nil attribute and add a 'nillable' marker in the schema component in the data model. This might help when we perform updates.

Issue-0142: Treatment of xsi:type in validation

Date: May-15-2002
Raised by: XQuery Editors

Description: The treatment of xsi:type in the formal semantics differs from the one specified in the XQuery and XPath documents. The XQuery and XPath documents indicate that validation is performed by serializing the data model value with newly created xsi:type attributes, then removing those xsi:type attributes after validation. The semantics of 'erases to' as described in the formal semantics document is not creating any xsi:type attributes. The semantics of 'annotate as' as described in the formal semantics document is not removing any xsi:type attributes.

Issue-0143: Support for PI, comment and namespace nodes

Date: May-15-2002
Raised by: FS Editors

Description: The [XPath/XQuery] type system does not currently support PI nodes, comment nodes and namespace nodes.

Issue-0144: Representation of text nodes in formal values

Date: May-15-2002
Raised by: Jonathan Robie

Description: Formal Values described in section [3.1 Values and Types] represent either text nodes for well-formed documents or values for validated documents. Do we need to support a dual representation with both text nodes and values in the formal semantics?

Issue-0145: Static typing of path expressions in the presence of derivation by extension

Date: May-15-2002
Raised by: Jerome Simeon

Description: The current static type analysis rules for Step expressions are broken in the presence of derivation by extension. This bug affects section [5.2.1 Steps].

Issue-0146: Support for substitution groups

Date: May-15-2002
Raised by: Jerome Simeon

Description: The formal semantics document gives a semantics of substitution groups. This is not aligned with the XQuery document. Support for substitution group in XQuery is still an open issue.

Note that this issue is related to the typing of substitution groups, and should be distinguished from the question about whether we need expressions in the language that operate on substitution groups.

Issue-0147: What should be the type annotation in the data model for anonymous types?

Date: May-15-2002
Raised by: Jerome Simeon

Description: It is not clear what should be the type annotation in the data model for anonymous types. The mapping from the PSVI to the data model and the formal semantics document need to be aligned.

Resolution: XQuery does not support only anymore.

Issue-0148: Validation of an empty string against a string list

Date: May-15-2002
Raised by: Jerome Simeon

Description: The formal semantics assumes that the result of validating an element with empty content or with an empty text node against a list of strings is the empty sequence, not the empty string. Is that consistent with XML Schema?

Issue-0149: Derivation by extension in XQuery

Date: May-15-2002
Raised by: Phil Wadler

Description: If type u is derived from type t by extension, then the formal semantics document specifies that type u may appear wherever type t is expected. It is not clear what the XQuery document says on this point.

Issue-0150: May the content of a text node be the empty string?

Date: May-15-2002
Raised by: Phil Wadler

Description: May the content of a text node be the empty string? None of the formal semantics, the data model, or the XQuery document addresses this point.

Issue-0151: Should type annotations be optional?

Date: May-15-2002
Raised by: Phil Wadler

Description: The formal semantics and the data model both state that the type annotation is required, while the XQuery document says that it is optional.

Issue-0152: What is the type of the input functions?

Date: May-30-2002
Raised by: FS Editors

Description: What are the (static) types for the input functions: document(), collection(), and input()?

Issue-0153: Support for lax and strict wildcards

Date: May-30-2002
Raised by: FS Editors

Description: The Formal Semantics does not currently model lax and strict wildcards. The mapping in Section 7 only describes how XML Schema wildcards with the 'skip validation' semantics are imported into the XQuery Type System.

Issue-0154: Semantics of =>

Date: May-30-2002
Raised by: Jerome Simeon

Description: The Formal Semantics of the => operator is not defined.

Resolution: The => operator has been removed from the language.

Issue-0155: Common primes for incomparable types

Date: Jun-06-2002
Raised by: Jerome Simeon

Description: The common-prime auxiliary type function does not deal with "incomparable" types (i.e., types for which there is a non-empty intersection, but subtyping does not hold, e.g., element a | element b vs. element b | element c). One way to resolve that issue would be to re-introduce a limited form of intersection (i.e., returning true if the intersection is not empty, instead of full-fledged intersection) and use it within the definition of common-prime.
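
The notion of incomparable types, and the limited intersection test suggested above, can be sketched in a deliberately simplified model. The sketch assumes that a union of element types is represented as a set of element names, so that subtyping reduces to set inclusion; all function names are hypothetical, and this is not the definition of common-prime.

```python
def is_subtype(left: frozenset, right: frozenset) -> bool:
    # Simplified model: a union type is a set of element names,
    # so subtyping is set inclusion.
    return left <= right


def intersects(left: frozenset, right: frozenset) -> bool:
    # Limited form of intersection: only reports whether it is non-empty.
    return not left.isdisjoint(right)


def incomparable(left: frozenset, right: frozenset) -> bool:
    # Non-empty intersection, but subtyping holds in neither direction.
    return (
        intersects(left, right)
        and not is_subtype(left, right)
        and not is_subtype(right, left)
    )


a_or_b = frozenset({"a", "b"})  # element a | element b
b_or_c = frozenset({"b", "c"})  # element b | element c
```

Here `incomparable(a_or_b, b_or_c)` holds because the two types share element b, yet neither set contains the other.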

Issue-0156: Casting and validation

Date: Jun-06-2002
Raised by: Jerome Simeon

Description: Validation from text to simple type performs an operation similar to casting. Should validation of simple type values and 'cast as' in [XPath/XQuery] be aligned?

Issue-0157: Support for wildcard namespaces

Date: Jun-06-2002
Raised by: Jerome Simeon

Description: The Formal Semantics does not currently model wildcard namespaces.

Issue-0158: Support for XML Schema groups

Date: Jun-06-2002
Raised by: Jerome Simeon

Description: How to support XML Schema groups during the schema import phase is not clear. If the mapping is based on the XML Schema syntax, then it should be handled during the mapping phase. Should we have support for XML Schema groups in the XQuery type system?

Issue-0159: Element and attribute declarations in the static context

Date: Jun-06-2002
Raised by: Jerome Simeon

Description: The [XPath/XQuery] static context only contains type definitions, but no global element and attribute declarations. Element and attribute declarations are necessary to give the semantics of SequenceTypes.

Issue-0160: Collations in the static environment

Date: Jun-06-2002
Raised by: Jerome Simeon

Description: The Formal Semantics does not represent collations in the static environment. Should it?

Issue-0161: Type promotion in Atomization

Date: Jun-06-2002
Raised by: Don Chamberlin

Description: Should the normalization rules for atomization also perform type promotion?

Issue-0162: How to describe the semantics of built-in functions

Date: Jun-06-2002
Raised by: Don Chamberlin

Description: The semantics of function calls assumes the function environment returns an [XPath/XQuery] expression which describes the function body. This only works for user-defined functions, but not for external functions or built-in functions.

Issue-0163: Normalization of XPath predicates

Date: Jun-06-2002
Raised by: Don Chamberlin

Description: The semantics of XPath predicates depends on whether they are applied to a forward or a reverse axis. The semantics of predicates applied to arbitrary expressions is not correctly specified.
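
The dependence on axis direction can be illustrated with a small sketch of positional predicates. It assumes the XPath convention that positions are counted in document order on a forward axis and in reverse document order on a reverse axis; the function name and the node labels are hypothetical.

```python
def select_by_position(nodes_in_document_order, position, reverse_axis):
    """Pick the node that a positional predicate [position] would select.

    Assumption: positions are 1-based and are counted in reverse
    document order when the step uses a reverse axis.
    """
    ordered = list(nodes_in_document_order)
    if reverse_axis:
        ordered.reverse()
    return ordered[position - 1]


# Ancestors of some node, listed in document order (outermost first).
ancestors = ["root", "section", "para"]

nearest = select_by_position(ancestors, 1, reverse_axis=True)
outermost = select_by_position(ancestors, 1, reverse_axis=False)
```

For a predicate applied to an arbitrary expression rather than to a step there is no axis to consult, which is the case this issue identifies as not correctly specified.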

Issue-0164: Coalescing text nodes in element constructors

Date: Jun-06-2002
Raised by: Don Chamberlin

Description: The semantics does not model how text nodes are coalesced in element or constructors. Whether it should be described or left to the data model is still an open issue. The impact of this operation on the static semantics is still an open issue.

Resolution: The data model element constructor coalesces text nodes.
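
As an illustration of what coalescing means, the following sketch merges runs of adjacent text items in a content sequence. It assumes a toy representation in which strings stand for text nodes and tuples stand for element nodes; the function name `coalesce_text` is hypothetical and is not a data model constructor.

```python
def coalesce_text(content: list) -> list:
    """Merge runs of adjacent text items; strings model text nodes."""
    result = []
    for item in content:
        if isinstance(item, str) and result and isinstance(result[-1], str):
            # The previous item is also text: extend it instead of
            # starting a new text node.
            result[-1] += item
        else:
            result.append(item)
    return result


merged = coalesce_text(["a", "b", ("element", "x"), "c"])
```

In this example the two leading text items become a single text item, while the text that follows the element stays separate.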

Issue-0165: Namespaces in element constructors

Date: Jun-06-2002
Raised by: Denise Draper

Description: We do not supply either namespaces or schema-components to the constructor. We cannot do these things because of the bottom-up nature of element construction: we do not, in general, know either the namespaces in scope or the validation-associated schema type until this element has been "seated" in some containing element (and so on recursively).

Issue-0166: Static typing for validate

Date: Jun-06-2002
Raised by: Jerome Simeon

Description: Although validate should always return a typed value, there is no way to do static analysis for it, since the type against which it is validated cannot be known statically (it depends on the result of evaluating the input expression).

Issue-0167: Is validate working on sequences?

Date: Jun-06-2002
Raised by: Jerome Simeon

Description: Should validate work on sequences of nodes, or only on a single node?

Issue-0168: Sorting by document order

Date: Jun-06-2002
Raised by: Jerome Simeon

Description: There is no way in [XPath/XQuery] to express sorting by document order as a core expression. Because sortby performs atomization, it cannot be used to express sorting by document order.

Issue-0169: Conformance Levels

Date: Jun-06-2002
Raised by: Jerome Simeon

Description: [XPath/XQuery] supports several conformance levels. Whether the formal semantics needs to distinguish those conformance levels is an open issue. If it does, how to distinguish them in the formal semantics is also an open issue.

Issue-0170: Imprecise static type of constructed elements

Date: Jul-26-2002
Raised by: Mary Fernandez

Description: Implementation of Alternative 1 means that the static type for constructed elements and attributes is very imprecise. E.g., the type of <a>{(1,2,3)}</a> is element a { xs:anyType }. See remark by Denise in section on element constructors for possible fix.

Issue-0171: Raising errors

Date: Jul-26-2002
Raised by: Mary Fernandez

Description: The semantics of raising errors in [XPath/XQuery] is not formally specified.

Issue-0172: Element constructors aligned with XSLT

Date: Jul-26-2002
Raised by: Jerome Simeon

Description: Should the semantics of element constructors be aligned with XSLT?

Issue-0173: Typeswitch and type substitutability

Date: Jul-26-2002
Raised by: XSL Working Group

Description: It seems that some examples of typeswitch with xs:anySimpleType might break type substitutability.

Issue-0174: Semantics of 'element foo of type T'

Date: Jul-30-2002
Raised by: Jerome Simeon

Description: When using the form 'element foo of type T', should 'foo' be a globally declared element in the in-scope schema or not? Should there be constraints on which type T is allowed? The language document assumes foo is globally declared and T is a restriction of the type of foo. The formal semantics does not make that assumption.

Issue-0175: Static typing for atomization, effective boolean values, and function arguments.

Date: Jul-30-2002
Raised by: Jerome Simeon

Description: Currently, normalization rules for atomization, effective boolean values, and function arguments are not type safe. They cannot raise static errors, only dynamic errors. This makes the static semantics much less usable.

C.3 Alphabetic list of issues

C.3.1 Open Issues

Number: 77

[Issue-0124: Binding position in FLWR expressions]
[Issue-0131: Boolean node test and sequences]
[Issue-0156: Casting and validation]
[Issue-0128: Casting based on the lexical form]
[Issue-0121: Casting functions]
[Issue-0160: Collations in the static environment]
[Issue-0155: Common primes for incomparable types]
[Issue-0103: Complexity of interleaving]
[Issue-0169: Conformance Levels]
[Issue-0106: Constraint on attribute and element content models]
[Issue-0117: Data model constructor for error values]
[Issue-0118: Data model syntax and literal values]
[Issue-0127: Datatype limitations]
[Issue-0140: Dependency in normalization and function resolution]
[Issue-0149: Derivation by extension in XQuery]
[Issue-0114: Dynamic context for current date and time]
[Issue-0159: Element and attribute declarations in the static context]
[Issue-0172: Element constructors aligned with XSLT]
[Issue-0073: Facets for simple types and their role for typechecking]
[Issue-0072: Facet value access in Query Algebra]
[Issue-0162: How to describe the semantics of built-in functions]
[Issue-0098: Implementation of and conformance levels for static type checking]
[Issue-0170: Imprecise static type of constructed elements]
[Issue-0113: Incomplete specification of type conversions]
[Issue-0167: Is validate working on sequences?]
[Issue-0150: May the content of a text node be the empty string?]
[Issue-0100: Namespace resolution]
[Issue-0165: Namespaces in element constructors]
[Issue-0136: Non-determinism in the semantics]
[Issue-0163: Normalization of XPath predicates]
[Issue-0125: Operations on node only in XPath]
[Issue-0122: Overloaded functions]
[Issue-0171: Raising errors]
[Issue-0144: Representation of text nodes in formal values]
[Issue-0123: Semantics of /]
[Issue-0174: Semantics of 'element foo of type T']
[Issue-0107: Semantics of data()]
[Issue-0126: Semantics of effective boolean value]
[Issue-0110: Semantics of element and attribute constructors]
[Issue-0119: Semantics of op:to]
[Issue-0138: Semantics of Schema Context]
[Issue-0109: Semantics of sortby]
[Issue-0135: Semantics of special functions]
[Issue-0120: Sequence operations: value vs. node identity]
[Issue-0116: Serialization]
[Issue-0133: Should to also be described in the formal semantics?]
[Issue-0151: Should type annotations be optional?]
[Issue-0134: Should we define for with head and tail?]
[Issue-0168: Sorting by document order]
[Issue-0094: Static type errors and warnings]
[Issue-0175: Static typing for atomization, effective boolean values, and function arguments.]
[Issue-0166: Static typing for validate]
[Issue-0145: Static typing of path expressions in the presence of derivation by extension]
[Issue-0129: Static typing of union]
[Issue-0153: Support for lax and strict wildcards]
[Issue-0101: Support for mixed content in the type system]
[Issue-0143: Support for PI, comment and namespace nodes]
[Issue-0146: Support for substitution groups]
[Issue-0157: Support for wildcard namespaces]
[Issue-0158: Support for XML Schema groups]
[Issue-0089: Syntax for types in XQuery]
[Issue-0059: Testing Subtyping]
[Issue-0141: Treatment of nillability and xsi:nil]
[Issue-0142: Treatment of xsi:type in validation]
[Issue-0082: Type and expression operator precedence]
[Issue-0139: Type equivalence rules]
[Issue-0161: Type promotion in Atomization]
[Issue-0173: Typeswitch and type substitutability]
[Issue-0132: Typing for descendant]
[Issue-0112: Typing for the typeswitch default clause]
[Issue-0137: Typing of input functions]
[Issue-0080: Typing of parent]
[Issue-0148: Validation of an empty string against a string list]
[Issue-0115: What is in the default context?]
[Issue-0152: What is the type of the input functions?]
[Issue-0130: When to process the query prolog]
[Issue-0011: XPath tumbler syntax instead of index?]

C.3.2 Resolved (or redundant) Issues

Number: 98

[Issue-0015: 3-valued logic to support NULLs]
[Issue-0018: Align algebra types with schema]
[Issue-0071: Alignment with the XML Query Datamodel]
[Issue-0088: Align types with XML Schema : Formal Description.]
[Issue-0091: Attribute expression]
[Issue-0001: Attributes]
[Issue-0047: Attributes]
[Issue-0030: Automatic type coercion]
[Issue-0052: Axes of XPath]
[Issue-0065: Built-In GroupBy?]
[Issue-0027: Case syntax]
[Issue-0040: Case Syntax]
[Issue-0023: Catch exceptions and process in algebra?]
[Issue-0164: Coalescing text nodes in element constructors]
[Issue-0013: Collations]
[Issue-0010: Construct values by copy]
[Issue-0038: Copy by reachability]
[Issue-0037: Copy vs identity semantics]
[Issue-0039: Dereferencing semantics]
[Issue-0068: Document Collections]
[Issue-0003: Document Order]
[Issue-0063: Do we need (user defined) higher order functions?]
[Issue-0058: Downward Navigation only?]
[Issue-0005: Element identity]
[Issue-0064: Error code handling in Query Algebra]
[Issue-0092: Error expression]
[Issue-0035: Exception handling]
[Issue-0084: Execution model]
[Issue-0048: Explicit Type Declarations]
[Issue-0083: Expressive power and complexity of typeswitch expression]
[Issue-0009: Externally defined functions]
[Issue-0008: Fixed point operator or recursive functions]
[Issue-0046: FOR Syntax]
[Issue-0032: Full regular path expressions]
[Issue-0028: Fusion]
[Issue-0034: Fusion]
[Issue-0078: Generation of ambiguous types]
[Issue-0045: Global Order]
[Issue-0036: Global-order based operators]
[Issue-0079: Global order between nodes in different documents]
[Issue-0054: Global vs. local complex types]
[Issue-0053: Global vs. local elements]
[Issue-0042: GroupBy]
[Issue-0012: GroupBy - needs second order functions?]
[Issue-0095: Importing Schemas and DTDs into query]
[Issue-0099: Incomplete/inconsistent mapping from to core]
[Issue-0102: Indentation, Whitespace]
[Issue-0022: Indentation, Whitespaces]
[Issue-0077: Interleaved repetition and closure]
[Issue-0060: Internationalization aspects for strings]
[Issue-0044: Keys and IDREF]
[Issue-0081: Lexical representation of Schema simple types]
[Issue-0033: Metadata Queries]
[Issue-0016: Mixed content]
[Issue-0061: Model for References]
[Issue-0087: More examples of Joins]
[Issue-0057: More precise type system; choice in path]
[Issue-0002: Namespaces]
[Issue-0062: Open questions for constructing elements by reference]
[Issue-0074: Operational semantics for expressions]
[Issue-0056: Operators on Simple Types]
[Issue-0069: Organization of Document]
[Issue-0075: Overloading user defined functions]
[Issue-0014: Polymorphic types]
[Issue-0108: Principal node types in XPath]
[Issue-0026: Project - one tag only]
[Issue-0051: Project redundant?]
[Issue-0050: Recursive Descent for XPath]
[Issue-0043: Recursive Descent for XPath]
[Issue-0031: Recursive functions]
[Issue-0007: References: IDREFS, Keyrefs, Joins]
[Issue-0004: References vs containment]
[Issue-0093: Representation of Text Nodes in type system]
[Issue-0067: Runtime Casts]
[Issue-0154: Semantics of =>]
[Issue-0111: Semantics of instance of ... only]
[Issue-0085: Semantics of Wildcard type]
[Issue-0066: Shallow or Deep Equality?]
[Issue-0041: Sorting]
[Issue-0006: Source and join syntax instead of "for"]
[Issue-0070: Stable vs. Unstable Sort/Distinct]
[Issue-0090: Static type-assertion expression]
[Issue-0097: Static type-checking vs. Schema validation]
[Issue-0020: Structural vs. name equivalence]
[Issue-0019: Support derived types]
[Issue-0104: Support for named typing]
[Issue-0096: Support for schema-less and incompletely validated documents]
[Issue-0086: Syntactic rules]
[Issue-0021: Syntax]
[Issue-0025: Treatment of empty results at type level]
[Issue-0105: Types for nodes in the data model.]
[Issue-0055: Types with non-wellformed instances]
[Issue-0049: Unordered Collections]
[Issue-0017: Unordered content]
[Issue-0076: Unordered types]
[Issue-0024: Value for empty sequences]
[Issue-0029: Views]
[Issue-0147: What should be the type annotation in the data model for anonymous types?]

C.4 Delegated Issues

XQuery 1.0 and XPath 2.0 Formal Semantics

W3C Working Draft 16 August 2002


7.1.1 The `fs:characters-to-string` function