Copyright ©2001 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document presents the formal semantics of [XQuery 1.0: A Query Language for XML], an XML query language. This document replaces the [XML Query Algebra].
This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.
This is a First Public Working Draft for review by W3C Members and other interested parties. This document replaces the [XML Query Algebra]. It is a draft document and may be updated, replaced or made obsolete by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". This is work in progress and does not imply endorsement by the W3C membership.
This document has been produced as part of the W3C XML Activity, following the procedures set out for the W3C Process. The document has been written by the XML Query Working Group.
The purpose of this document is to present the current state of the formal semantics of [XQuery 1.0: A Query Language for XML] and to elicit feedback on its current state. The XML Query Working Group feels that it has made good progress on this document but that it is subject to change in future versions. Comments on this document should be sent to the W3C mailing list www-xml-query-comments@w3.org (archived at http://lists.w3.org/Archives/Public/www-xml-query-comments/). Important issues remain open - see [B.3.1 Open Issues]. In particular, the reader should note the following issues related to compatibility of the XQuery formal semantics with related XML activities.
[Issue-0089: Syntax for types in XQuery]: The XQuery formal semantics is based on a subset of the XQuery surface syntax, but some misalignments exist. The XQuery formal semantics presents a syntax for type expressions that is not supported in the XQuery surface syntax. It also has a static type-assertion expression (see [Issue-0090: Static type-assertion expression]), an attribute constructor expression (see [Issue-0091: Attribute expression]), and an error expression (see [Issue-0092: Error expression]) that are not in the XQuery surface syntax.
[Issue-0088: Align XQuery types with XML Schema : Formal Description.]: The type system of the XQuery formal semantics is based on [XML Schema : Formal Description] (XSFD), but some misalignments exist. A related issue is [Issue-0018: Align algebra types with schema]. We assume that the XQuery formal semantics will be based on XSFD and leave alignment of XSFD and XML Schema for others to resolve.
[Issue-0056: Operators on Simple Types]: A joint XSLT/Schema/Query task force is chartered to define the operators on Schema simple types. XQuery will adopt the operators defined by that group.
A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.
This document defines the formal semantics of XQuery, an XML query language. The formal semantics of XQuery is defined with respect to a ``core syntax'' of XQuery. XQuery's core syntax is based on a subset of the complete syntax that is available to users, and every expression in the user-level syntax can be rewritten as an expression in the core syntax. Although the intent is for the core syntax to be a proper subset of the complete XQuery syntax, some misalignments exist. The XQuery formal semantics presents a syntax for type expressions that is not supported in the XQuery surface syntax. It also has a static type-assertion expression (see [Issue-0090: Static type-assertion expression]), an attribute constructor expression (see [Issue-0091: Attribute expression]), and an error expression (see [Issue-0092: Error expression]) that are not in the XQuery surface syntax.
A forthcoming
document defines operators and a library of built-in functions
for XQuery and XPath 2.0.
In this document, functions in this library have the namespace prefix
xfo.
In this document, ``query-analysis time'' refers to when an XQuery expression is parsed and type checked, that is, before the value of the expression is computed, and ``query-evaluation time'' refers to when an XQuery expression is evaluated, that is, when it is reduced to a value. We sometimes use ``query-analysis time'' and ``compile time'' interchangeably, and likewise ``query-evaluation time'' and ``run time''.
This work builds on long-standing traditions in the database community. In particular, we have been inspired by systems such as SQL, OQL, and nested relational algebra (NRA). We have also been inspired by systems such as Quilt, UnQL, XDuce, XML-QL, XPath, XQL, XSLT, and YaTL. We give citations for all these systems below.
In the database world, it is common to translate a query language into an algebra; this happens in SQL, OQL, and NRA, among others. The purpose of the algebra is twofold. First, an algebra is used to give a semantics for the query language, so the operations of an algebra should be well-defined. Second, an algebra is used to support query optimization, so it should possess a rich set of laws. The core syntax of XQuery serves as an algebra for XQuery. The laws we give include analogues of most of the laws of relational algebra.
It is also common for a query language to exploit schemas or types; this happens in SQL, OQL, and NRA, among others. The purpose of types is twofold. Types can be used to detect certain kinds of errors at query-analysis time and to support query optimization. Given the type of input values, a query applied to those values, and the expected type of the query's output value, the XQuery type system can detect at query-analysis time if the query's output value has the expected output type.
DTDs and XML Schema can be thought of as providing something like types for XML. The XQuery formal semantics uses a type system based on the formalism in [XML Schema : Formal Description]. On this basis, XQuery is statically typed. This allows an implementation of XQuery to determine and check at query-analysis time the output type of a query on documents conforming to an input type. Compare this to an untyped or dynamically typed query language, where each individual output has to be validated against a schema at query-evaluation time, and there is no guarantee that this check will always succeed.
To define XQuery completely, we present a static semantics and a dynamic semantics. The static semantics is presented as type inference rules, which relate XQuery expressions to types and specify under what conditions an expression is well typed. The semantics is static because ill-typed expressions are identified at query-analysis time, i.e., before the query is evaluated. The dynamic, or operational, semantics is presented as value inference rules, which relate XQuery expressions to values. XQuery's values are defined in the [XQuery 1.0 and XPath 2.0 Data Model]; for example, they include XML simple values, attributes, and elements, as well as other values. A dynamic semantics guarantees that every expression can be reduced to a value and may serve as the basis for a query interpreter or compiler.
The document is organized as follows. A tutorial introduction is presented in [2 XQuery Semantics by Example]. The primary purpose of this tutorial is to present various features of XQuery and to show how a type is computed for each XQuery expression. The reader is referred to [XQuery 1.0: A Query Language for XML] for a complete tutorial on XQuery's features. The grammar of XQuery's core syntax and the grammar for types are given in [3 XQuery Core Syntax]. We give the static typing rules for XQuery in [4 Static Semantics : Type-Inference Rules] and then the dynamic semantics for XQuery in [5 Dynamic Semantics : Value-Inference Rules]. These sections formalize the information presented informally in [2 XQuery Semantics by Example]. Although these two sections contain the most challenging material, we have tried to make the content as accessible as possible. Readers only interested in learning about XQuery's features need not read these sections; however, we expect that implementors of XQuery will read them. Finally, in [6 XQuery Mapping to Core], we present the mapping from the complete syntax of XQuery to the core syntax. We note here that this section is still preliminary and contains inconsistencies (see [Issue-0099: Incomplete/inconsistent mapping from XQuery to core ]).
In [B.2 Issues list], we discuss open issues and problems. We present some equivalence and optimization laws of XQuery in [A Equivalences].
Cited literature includes: monads [Mog89], [Mog91], [Wad92], [Wad93], [Wad95], NRA [BNTW95], [Col90], [LW97], [LMW96], OQL [BK93], [BKD90], [CM93], Quilt [Quilt], SQL [Date97], UnQL [BFS00], XDuce [HP2000], XML-QL [XMLQL99], XPath [XPath], XQL [XQL99], XSLT [XSLT 99], and YaTL [YAT99].
For a complete introduction to XQuery, see [XQuery 1.0: A Query Language for XML]. This document focuses on the static type and dynamic operational semantics of XQuery. This section introduces the static and dynamic semantics of XQuery, using examples based on accessing a database of books.
Consider the following sample data:
<bib>
<book year="1999" isbn="1-55860-622-X">
<title>Data on the Web</title>
<author>Abiteboul</author>
<author>Buneman</author>
<author>Suciu</author>
</book>
<book year="2001" isbn="1-XXXXX-YYY-Z">
<title>XML Query</title>
<author>Fernandez</author>
<author>Suciu</author>
</book>
</bib>
Here is a fragment of an XML Schema for such data:
<xs:group name="Bib">
<xs:sequence>
<xs:element name="bib">
<xs:complexType>
<xs:sequence>
<xs:group ref="Book"
minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:group>
<xs:group name="Book">
<xs:sequence>
<xs:element name="book">
<xs:complexType>
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="year" type="xs:integer"/>
<xs:attribute name="isbn" type="xs:string"/>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:group>
In the XQuery formal semantics, we present a syntax for type expressions that allows us to explain how the type of an XQuery expression is inferred. This type syntax is not in the XQuery surface syntax (see [Issue-0089: Syntax for types in XQuery]). In addition, the XQuery formal semantics includes an expression that asserts statically the type of an expression (see [Issue-0090: Static type-assertion expression]), an attribute constructor expression (see [Issue-0091: Attribute expression]), and an error expression (see [Issue-0092: Error expression]) that are not in the XQuery surface syntax. With the exception of type expressions and the static type-assertion expression, all other XQuery expressions in this document are in the XQuery surface syntax. The data and schema above are represented as follows:
TYPE Bib = ELEMENT bib (Book*)
TYPE Book =
ELEMENT book
( ATTRIBUTE year (xs:integer) &
ATTRIBUTE isbn (xs:string),
ELEMENT title (xs:string),
(ELEMENT author(xs:string))+
)
LET $bib0 :=
<bib>
<book year="1999" isbn="1-55860-622-X">
<title>Data on the Web</title>
<author>Abiteboul</author>
<author>Buneman</author>
<author>Suciu</author>
</book>
<book year="2001" isbn="1-XXXXX-YYY-Z">
<title>XML Query</title>
<author>Fernandez</author>
<author>Suciu</author>
</book>
</bib> : Bib
RETURN ...
The expression above defines two types, Bib
and Book, and defines one variable,
$bib0.
The Bib type corresponds to a single
bib element, which contains a
sequence of zero or more Book
elements.
Every attribute or element can be viewed as a sequence of length one.
The Book type corresponds to a single
book element,
which also contains a sequence of zero or more attributes and elements.
It contains one
year attribute
and one isbn attribute, followed by one
title element,
followed by one or more
author elements.
The & operator, called the interleave operator,
indicates that the year and
isbn attributes may occur in any order.
An isbn attribute, a title
element, and an
author element each contain a string value, and a
year attribute contains an integer.
The let expression above binds the
variable $bib0 to a literal XML
value.
The variable $bib0 is in scope for all expressions in
the body of the RETURN clause.
For convenience, the RETURN ...
indicates that the expressions in the rest of this document are
contained within the scope of this LET
expression.
The value of a variable is immutable, that is, once
a variable is defined, its value does not change.
The value of $bib0 is a
bib element that contains two book
elements.
XQuery is a strongly typed language; therefore, the value of
$bib0 must be an instance of its declared type, or the
expression is ill-typed. Here the value of $bib0 is an
instance of the Bib type, because it contains one
bib element, which contains two
book elements, each of which contains
an integer-valued year attribute, a string-valued
isbn attribute, a string-valued
title element,
and one or more string-valued author elements.
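As a rough illustration of what "an instance of its declared type" means for this particular value, the check can be written by hand in Python with the standard xml.etree.ElementTree module. This is only a sketch under stated assumptions: the checker functions is_bib and is_book are hypothetical, they hard-code the Bib and Book types above instead of interpreting TYPE declarations, and whitespace-only text between elements is ignored.

```python
import re
import xml.etree.ElementTree as ET

BIB0 = """<bib>
<book year="1999" isbn="1-55860-622-X"><title>Data on the Web</title>
<author>Abiteboul</author><author>Buneman</author><author>Suciu</author></book>
<book year="2001" isbn="1-XXXXX-YYY-Z"><title>XML Query</title>
<author>Fernandez</author><author>Suciu</author></book>
</bib>"""

def is_integer(text):
    """Lexical check for an xs:integer value such as "1999" (surrounding spaces allowed)."""
    return re.fullmatch(r"\s*[+-]?[0-9]+\s*", text) is not None

def is_book(e):
    """Hypothetical check for Book: year & isbn attributes, one title, then author+."""
    if e.tag != "book" or set(e.attrib) != {"year", "isbn"}:
        return False
    if not is_integer(e.get("year")):
        return False
    children = list(e)
    # one title element followed by at least one author element
    return (len(children) >= 2 and children[0].tag == "title"
            and all(c.tag == "author" for c in children[1:]))

def is_bib(e):
    """Hypothetical check for Bib = ELEMENT bib (Book*): every child must be a Book."""
    return e.tag == "bib" and all(is_book(b) for b in e)

bib0 = ET.fromstring(BIB0)
print(is_bib(bib0))  # True
```

A book without any author, or with a year such as "MCMXCIX", makes is_bib return False, which corresponds to the ill-typed case described above.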
For convenience, we define a second global variable
$book0, also bound to a literal value, which is
equal to the first book in $bib0.
LET $book0 :=
<book year="1999" isbn="1-55860-622-X">
<title>Data on the Web</title>
<author>Abiteboul</author>
<author>Buneman</author>
<author>Suciu</author>
</book> : Book
RETURN ...
One of XQuery's most basic operations is projection.
The following expression
returns all author elements contained in
book elements contained in
$bib0:
$bib0/book/author
==>(<author>Abiteboul</author>,
<author>Buneman</author>,
<author>Suciu</author>,
<author>Fernandez</author>,
<author>Suciu</author>)
: (ELEMENT author (xs:string))*
Note that in the result, the document order of
author elements is preserved.
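The projection and its ordering behaviour can be mimicked in Python with xml.etree.ElementTree (a sketch, not an XQuery processor): each step visits children in document order and the per-book results are concatenated, so the authors come out in the order in which they appear in $bib0.

```python
import xml.etree.ElementTree as ET

BIB0 = """<bib>
<book year="1999" isbn="1-55860-622-X"><title>Data on the Web</title>
<author>Abiteboul</author><author>Buneman</author><author>Suciu</author></book>
<book year="2001" isbn="1-XXXXX-YYY-Z"><title>XML Query</title>
<author>Fernandez</author><author>Suciu</author></book>
</bib>"""

bib0 = ET.fromstring(BIB0)

# $bib0/book/author: visit each book child in document order, take its author
# children in document order, and concatenate the per-book results
authors = [a for book in bib0.findall("book") for a in book.findall("author")]
print([a.text for a in authors])
```

ElementTree's own path support, bib0.findall("book/author"), returns the same nodes in the same order.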
The above example and the ones that follow have three parts. First is an
expression in XQuery. Second, following the
==> is the value of this expression. Third,
following the : is the type of the expression, which
is (of course) also a legal type for the value.
It may be unclear why the type of $bib0/book/author
contains zero or more authors, even though the type of
a book element contains one
or more authors. Let's look at the derivation of the result type by looking at
the type of each sub-expression:
$bib0 : Bib
$bib0/book : Book*
$bib0/book/author : (ELEMENT author (xs:string))*
Recall that Bib, the type of $bib0, may
contain zero or more Book
elements; therefore, the expression $bib0/book might
contain zero book elements, in which case
$bib0/book/author would contain no authors.
This illustrates an important feature of the type system: the type of an
expression depends only on the type of its sub-expressions. It also illustrates
the difference between an expression's value at query-evaluation time and its
type at query-analysis time.
Since the type of $bib0 is
Bib, the best type for
$bib0/book/author is one listing zero or more authors,
even though for the given value of $bib0, the expression will
always contain exactly five authors.
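The way occurrence indicators combine along the path can be sketched as arithmetic on (minimum, maximum) bounds. This Python model is a simplification assumed for illustration, not the inference rules of [4 Static Semantics : Type-Inference Rules]: it only tracks how many items a child step can yield when each parent item has the stated number of children, and compose and mul_max are hypothetical helper names.

```python
# Occurrence indicators as (min, max) counts; None stands for "unbounded".
OCC = {"": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}

def mul_max(a, b):
    """Product of two upper bounds, where None means unbounded."""
    if a == 0 or b == 0:
        return 0
    if a is None or b is None:
        return None
    return a * b

def compose(outer, inner):
    """Occurrence of a child step: `outer` parents, each having `inner` children."""
    (lo1, hi1), (lo2, hi2) = OCC[outer], OCC[inner]
    bounds = (lo1 * lo2, mul_max(hi1, hi2))
    for symbol, b in OCC.items():
        if b == bounds:
            return symbol
    raise ValueError(f"no single indicator for bounds {bounds}")

book_occ = compose("", "*")          # one bib holding Book*   -> Book*
author_occ = compose(book_occ, "+")  # each Book holds author+ -> author*
print(repr(book_occ), repr(author_occ))  # '*' '*'
```

The zero minimum of Book* propagates through the multiplication, which is exactly why the result type admits zero authors.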
It is also possible to project on attributes. This expression
produces the year attribute of $book0, whose
type is ATTRIBUTE year (xs:integer).
$book0/@year ==> ATTRIBUTE year 1999 : ATTRIBUTE year (xs:integer)
One may access simple data (strings, integers, or booleans) using
the keyword data(). For instance, if we wish to select all
author names in a book, rather than all author elements, we could
write the following.
$book0/author/data()
==> ("Abiteboul",
"Buneman",
"Suciu")
: xs:string+
Similarly, it is possible to project the simple values of attributes. The following returns the year the book was published.
$book0/@year/data() ==> 1999 : xs:integer
The data() operator has a similar purpose to
the text() node test
in XPath 1.0, in that both project the atomic values in a
document.
In XPath 1.0, text() selects the text node
children of an element node, whereas in XQuery, data()
returns the simple-typed value of the element node.
We chose the keyword data() because, as the second example
shows, not all data items are strings.
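A Python sketch of the two data() examples, using xml.etree.ElementTree. ElementTree keeps every attribute as a string, so the sketch converts year through a hand-written table that stands in for the schema; ATTR_TYPES is an assumption of this example, not part of XQuery.

```python
import xml.etree.ElementTree as ET

BOOK0 = """<book year="1999" isbn="1-55860-622-X"><title>Data on the Web</title>
<author>Abiteboul</author><author>Buneman</author><author>Suciu</author></book>"""

# Simple types of Book's attributes, copied by hand from the Book type;
# a real processor would take them from the schema (assumption of this sketch).
ATTR_TYPES = {"year": int, "isbn": str}

book0 = ET.fromstring(BOOK0)

# $book0/author/data(): the string content of each author element
author_names = [a.text for a in book0.findall("author")]

# $book0/@year/data(): the attribute's lexical form turned into its simple type
year = ATTR_TYPES["year"](book0.get("year"))

print(author_names, year)  # ['Abiteboul', 'Buneman', 'Suciu'] 1999
```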
Another common operation is to iterate over elements in a document so that their content can be transformed into new content. Here is an example of how to process each book to list the author before the title, and remove the year and isbn.
FOR $b IN $bib0/book RETURN
<book> { $b/author, $b/title } </book>
==> (<book>
<author>Abiteboul</author>
<author>Buneman</author>
<author>Suciu</author>
<title>Data on the Web</title>
</book>,
<book>
<author>Fernandez</author>
<author>Suciu</author>
<title>XML Query</title>
</book>)
: (ELEMENT book(
(ELEMENT author(xs:string))+,
ELEMENT title(xs:string))
)*
The for expression iterates over all
book elements in $bib0, and binds the
variable $b to each such element. For each element
bound to $b, the inner expression constructs a
new book element containing the book's authors followed
by its title. The transformed elements appear in the same order as they occur
in $bib0.
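The same transformation, sketched in Python with xml.etree.ElementTree. The helper name reorder is hypothetical; the sketch copies the selected children into each new element, which leaves $bib0 unchanged, and produces one new book per input book, in input order.

```python
import copy
import xml.etree.ElementTree as ET

BIB0 = """<bib>
<book year="1999" isbn="1-55860-622-X"><title>Data on the Web</title>
<author>Abiteboul</author><author>Buneman</author><author>Suciu</author></book>
<book year="2001" isbn="1-XXXXX-YYY-Z"><title>XML Query</title>
<author>Fernandez</author><author>Suciu</author></book>
</bib>"""

bib0 = ET.fromstring(BIB0)

def reorder(b):
    """<book> { $b/author, $b/title } </book>: authors first, then the title."""
    new = ET.Element("book")
    # copies of the selected children; year and isbn are simply not carried over
    new.extend([copy.deepcopy(c) for c in b.findall("author") + b.findall("title")])
    return new

# FOR $b IN $bib0/book RETURN ...: one new element per book, in document order
result = [reorder(b) for b in bib0.findall("book")]
for r in result:
    print([c.tag for c in r], r.find("title").text)
```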
In the result type, a book element is guaranteed
to contain one or more authors followed by one title. Let's look at the
derivation of the result type to see why:
$bib0/book : Book*
$b : Book
$b/author : (ELEMENT author(xs:string))+
$b/title : ELEMENT title (xs:string)
The type system can determine that $b is
always has type Book; therefore, the type
of $b/author is (ELEMENT author(xs:string))+,
and the type of $b/title is ELEMENT title (xs:string).
In general, the value of a for expression is a sequence
of zero or more data-model values as defined in [XQuery 1.0 and XPath 2.0 Data Model].
If the body of the for expression itself yields a sequence, then all of
the sequences are concatenated together. For instance, the
expression:
FOR $b IN $bib0/book RETURN
$b/author
is exactly equivalent to the expression $bib0/book/author.
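In a Python model where sequences are lists (an assumption of this sketch), the equivalence amounts to flattening: the per-book author lists are concatenated in iteration order, which yields the same nodes in the same order as the path expression.

```python
import xml.etree.ElementTree as ET

BIB0 = """<bib>
<book year="1999" isbn="1-55860-622-X"><title>Data on the Web</title>
<author>Abiteboul</author><author>Buneman</author><author>Suciu</author></book>
<book year="2001" isbn="1-XXXXX-YYY-Z"><title>XML Query</title>
<author>Fernandez</author><author>Suciu</author></book>
</bib>"""

bib0 = ET.fromstring(BIB0)

# FOR $b IN $bib0/book RETURN $b/author:
# the body yields one sequence per binding of $b ...
per_book = [b.findall("author") for b in bib0.findall("book")]
# ... and those sequences are concatenated in iteration order
for_result = [a for seq in per_book for a in seq]

path_result = bib0.findall("book/author")  # $bib0/book/author
print(for_result == path_result)  # True: same nodes, same order
```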
To select values that satisfy some predicate, we use
the where expression. For example, the following
expression selects all book elements in
$bib0 that were published in or before 2000.
FOR $b IN $bib0/book
WHERE $b/@year/data() <= 2000 RETURN
$b
==> <book year="1999" isbn="1-55860-622-X">
<title>Data on the Web</title>
<author>Abiteboul</author>
<author>Buneman</author>
<author>Suciu</author>
</book>
: Book*
In general, an expression of the form:
where e1 return e2
is converted to the form
if e1
then e2 else ()
where e1 and e2 are
expressions. Here () is an expression that stands for
the empty sequence, a sequence that contains no attributes or elements. We also write
() for the type of the empty sequence.
According to this rule, the expression above translates to
FOR $b IN $bib0/book RETURN IF $b/@year/data() <= 2000 THEN $b ELSE ()
and this has the same value and the same type as the preceding expression.
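The rewrite can be checked on the sample data with a small Python model in which sequences are lists and the empty sequence () is the empty list. The names for_where, for_if, early and just_b are hypothetical helpers for this sketch.

```python
import xml.etree.ElementTree as ET

BIB0 = """<bib>
<book year="1999" isbn="1-55860-622-X"><title>Data on the Web</title>
<author>Abiteboul</author><author>Buneman</author><author>Suciu</author></book>
<book year="2001" isbn="1-XXXXX-YYY-Z"><title>XML Query</title>
<author>Fernandez</author><author>Suciu</author></book>
</bib>"""

bib0 = ET.fromstring(BIB0)
books = bib0.findall("book")

def for_where(seq, pred, body):
    """FOR $b IN seq WHERE pred($b) RETURN body($b)"""
    return [x for b in seq if pred(b) for x in body(b)]

def for_if(seq, pred, body):
    """FOR $b IN seq RETURN IF pred($b) THEN body($b) ELSE ()"""
    out = []
    for b in seq:
        out.extend(body(b) if pred(b) else [])  # [] plays the role of ()
    return out

def early(b):
    return int(b.get("year")) <= 2000  # $b/@year/data() <= 2000

def just_b(b):
    return [b]  # RETURN $b, a one-item sequence

print(for_where(books, early, just_b) == for_if(books, early, just_b))  # True
```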
The following expression selects all book elements
in $bib0 that have some author
named "Buneman".
FOR $b IN $bib0/book
WHERE SOME $a IN $b/author SATISFIES $a/data() = "Buneman" RETURN
$b
==> <book year="1999" isbn="1-55860-622-X">
<title>Data on the Web</title>
<author>Abiteboul</author>
<author>Buneman</author>
<author>Suciu</author>
</book>
: Book*
We can use the every expression to find all
books where all the authors are Buneman:
FOR $b IN $bib0/book
WHERE EVERY $a IN $b/author SATISFIES $a/data() = "Buneman" RETURN
$b
==> ()
: Book*
There are no such books, so the result is the empty sequence.
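The two quantifiers behave like Python's built-in any() and all(), sketched here on the sample data. One boundary case worth noting: all() returns True for an empty sequence, the usual convention for universal quantification.

```python
import xml.etree.ElementTree as ET

BIB0 = """<bib>
<book year="1999" isbn="1-55860-622-X"><title>Data on the Web</title>
<author>Abiteboul</author><author>Buneman</author><author>Suciu</author></book>
<book year="2001" isbn="1-XXXXX-YYY-Z"><title>XML Query</title>
<author>Fernandez</author><author>Suciu</author></book>
</bib>"""

books = ET.fromstring(BIB0).findall("book")

def is_buneman(a):
    return a.text == "Buneman"  # $a/data() = "Buneman"

# WHERE SOME $a IN $b/author SATISFIES ...
some_buneman = [b for b in books if any(is_buneman(a) for a in b.findall("author"))]
# WHERE EVERY $a IN $b/author SATISFIES ...
every_buneman = [b for b in books if all(is_buneman(a) for a in b.findall("author"))]

print([b.find("title").text for b in some_buneman], every_buneman)
# ['Data on the Web'] []
```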
Another common operation is to join values from one or more documents. To illustrate joins, we give a second data source that defines book reviews:
TYPE Reviews =
ELEMENT reviews (
(ELEMENT book (
ELEMENT title (xs:string),
ELEMENT review (xs:string))
)*
)
LET $review0 :=
<reviews>
<book>
<title>XML Query</title>
<review>A darn fine book.</review>
</book>
<book>
<title>Data on the Web</title>
<review>This is great!</review>
</book>
</reviews> : Reviews
RETURN ...
The Reviews type contains one
reviews element, which contains zero or more
book elements;
each book contains a title and a review.
We can use nested for expressions to join the two
sources $review0 and $bib0 on
title values. The result combines the title, authors, and reviews for each
book.
FOR $b IN $bib0/book, $r IN $review0/book
WHERE $b/title/data() = $r/title/data() RETURN
<book>{ $b/title, $b/author, $r/review }</book>
==> (<book>
<title>Data on the Web</title>
<author>Abiteboul</author>
<author>Buneman</author>
<author>Suciu</author>
<review>This is great!</review>
</book>,
<book>
<title>XML Query</title>
<author>Fernandez</author>
<author>Suciu</author>
<review>A darn fine book.</review>
</book>)
: (ELEMENT book(
ELEMENT title (xs:string),
(ELEMENT author (xs:string))+,
ELEMENT review (xs:string))
)*
Note that the outermost for binding determines
the order of the result. Readers familiar with optimization of relational join
queries know that relational joins commute, i.e., they can be evaluated in any
order. This is not true for XQuery: swapping the two
bindings of the for expression would return the same books in a different order.
In [2.8 Unordered built-in function], we introduce support for
unordered sequences, which permits commutable joins.
It is beyond the scope of this document to describe algorithms for evaluating nested loop joins. See [Graefe93] for a survey.
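The simplest strategy is a nested-loop join, sketched below in Python with xml.etree.ElementTree; to keep it short, the sketch returns (title, review) pairs instead of constructing book elements. Running it with the loops in both orders shows the point made above: the same pairs come back, in an order fixed by the outer loop.

```python
import xml.etree.ElementTree as ET

BIB0 = """<bib>
<book year="1999" isbn="1-55860-622-X"><title>Data on the Web</title>
<author>Abiteboul</author><author>Buneman</author><author>Suciu</author></book>
<book year="2001" isbn="1-XXXXX-YYY-Z"><title>XML Query</title>
<author>Fernandez</author><author>Suciu</author></book>
</bib>"""
REVIEW0 = """<reviews>
<book><title>XML Query</title><review>A darn fine book.</review></book>
<book><title>Data on the Web</title><review>This is great!</review></book>
</reviews>"""

books = ET.fromstring(BIB0).findall("book")
reviews = ET.fromstring(REVIEW0).findall("book")

def title(e):
    return e.find("title").text

# FOR $b IN $bib0/book, $r IN $review0/book WHERE titles match RETURN ...
bib_outer = [(title(b), r.find("review").text)
             for b in books for r in reviews if title(b) == title(r)]
# the same join with the two bindings swapped
review_outer = [(title(b), r.find("review").text)
                for r in reviews for b in books if title(b) == title(r)]

print(bib_outer)
print(review_outer)
```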
As discussed in [2.7 Join], joins do not
commute on ordered forests. In databases, ordering often does not matter. To
permit commutable joins, and to allow for other query
optimization techniques, XQuery also allows the order of a sequence to be
explicitly disregarded. This is accomplished by the built-in function UNORDERED.
The expression UNORDERED(Expr) may return any permutation of the sequence
returned by Expr. For example, when applying UNORDERED to the
join-query [2.7 Join], the result may be either ordered as in
[2.7 Join] or as below:
UNORDERED(
FOR $b IN $bib0/book, $r IN $review0/book
WHERE $b/title/data() = $r/title/data() RETURN
<book> { $b/title, $b/author, $r/review } </book>
)
==> (<book>
<title>XML Query</title>
<author>Fernandez</author>
<author>Suciu</author>
<review>A darn fine book.</review>
</book>,
<book>
<title>Data on the Web</title>
<author>Abiteboul</author>
<author>Buneman</author>
<author>Suciu</author>
<review>This is great!</review>
</book>)
: (ELEMENT book (
ELEMENT title (xs:string),
(ELEMENT author (xs:string))+,
ELEMENT review (xs:string))
)*
The expression UNORDERED(Expr) satisfies some useful laws that can be used for
optimization. E.g., UNORDERED distributes over FOR,
and nested FOR expressions on UNORDERED
expressions are
commutative; see also Rules 12--18 in [A.2 Laws].
On this basis, joins can be commuted: switching the
inner FOR expression with the outer FOR expression
does not change the semantics of the above query:
UNORDERED(
FOR $r IN $review0/book, $b IN $bib0/book
WHERE $b/title/data() = $r/title/data() RETURN
<book> { $b/title, $b/author, $r/review } </book>
)
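Since UNORDERED(Expr) may return any permutation, two results are interchangeable under UNORDERED exactly when they are equal as multisets. The Python sketch below models this with collections.Counter; the data is reduced to plain tuples, and unordered (one legal implementation, a random shuffle) and same_under_unordered are hypothetical helpers.

```python
import random
from collections import Counter

# The two sources reduced to plain tuples (a simplification of this sketch)
BOOKS = [("Data on the Web", ("Abiteboul", "Buneman", "Suciu")),
         ("XML Query", ("Fernandez", "Suciu"))]
REVIEWS = [("XML Query", "A darn fine book."), ("Data on the Web", "This is great!")]

def unordered(seq):
    """One legal implementation of UNORDERED: return some permutation of seq."""
    out = list(seq)
    random.shuffle(out)
    return out

def same_under_unordered(xs, ys):
    """Interchangeable under UNORDERED iff equal as multisets."""
    return Counter(xs) == Counter(ys)

join_b_r = [(t, a, rv) for t, a in BOOKS for rt, rv in REVIEWS if t == rt]
join_r_b = [(t, a, rv) for rt, rv in REVIEWS for t, a in BOOKS if t == rt]

print(join_b_r == join_r_b)                                  # False: order differs
print(same_under_unordered(join_b_r, join_r_b))              # True: same multiset
print(same_under_unordered(unordered(join_b_r), join_r_b))   # True
```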
Note that unordered sequences are currently not distinguished from ordered
sequences at the type level. This is mainly for two reasons: (1) XML Schema does not distinguish
between unordered and ordered sequences, and (2) the distinction would require
overloading all built-in operators on sequences, such as FOR and DISTINCT,
as well as user-defined functions on sequences.
Many of the previous queries select values and return them or use them in the construction of new values. Once a value is selected, however, the previous queries do not access the original source of the selected value, i.e., the document or hierarchy of elements in which the selected value is contained. It is sometimes useful, however, to access or preserve the original context of selected nodes.
Consider the following example, which contains a new bibliography of
articles in $bib1:
TYPE Bib1 = ELEMENT bib (Article*)
TYPE Article =
ELEMENT article(
ATTRIBUTE year (xs:integer),
ELEMENT title (xs:string),
ELEMENT journal(xs:string),
(ELEMENT author (xs:string))+
)
LET $bib1 :=
<bib>
<article year="2000">
<title>Queries and computation on the web</title>
<journal>Theoretical Computer Science</journal>
<author>Abiteboul</author>
<author>Vianu</author>
</article>
</bib> : Bib1
RETURN ...
Assume there exists a full-text search function,
contains,
which given a set of documents,
selects elements that contain a particular keyword.
(This function is not defined in XQuery, but is used
here to illustrate a point.)
The details of function application and declaration are given in
[2.18 Functions].
The following expression returns those elements in $bib0
and $bib1 that contain the keyword "Abiteboul".
Note that the result type of the expression is
AnyTree*. This is because the contains
function cannot know a priori which elements, if any, contain a given keyword.
FOR $a IN contains(($bib0, $bib1), "Abiteboul") RETURN
$a
==> (<author>Abiteboul</author>, <author>Abiteboul</author>)
: AnyTree*
The result above does not provide the context in which
the two author elements occur. Even if the
contains function did return more context,
it might be useful to browse the context in
which they occurred, for example, by accessing their parent and/or
sibling elements.
The built-in function parent accesses the parent of an
attribute or element. For example, this expression returns more
useful information than the previous one:
FOR $a IN contains(($bib0, $bib1), "Abiteboul") RETURN
$a/..
==> (<book year="1999" isbn="1-55860-622-X">
<title>Data on the Web</title>
<author>Abiteboul</author>
<author>Buneman</author>
<author>Suciu</author>
</book>,
<article year="2000">
<title>Queries and computation on the web</title>
<journal>Theoretical Computer Science</journal>
<author>Abiteboul</author>
<author>Vianu</author>
</article>)
: AnyElement*
Note that the result type of the expression is
AnyElement*. When applied to
an attribute or element value, the parent function
always has return type AnyElement?, i.e., zero or one AnyElement.
This is because XQuery's type system only
preserves type information about an attribute or element's content,
not about its containing parent.
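ElementTree nodes do not store a parent pointer, so a Python sketch of parent has to build a child-to-parent map first. contains_keyword is a hypothetical stand-in for the full-text contains function (it matches an element whose own text contains the keyword), and a missing entry yields None, mirroring the zero-or-one result of parent.

```python
import xml.etree.ElementTree as ET

BIB0 = """<bib>
<book year="1999" isbn="1-55860-622-X"><title>Data on the Web</title>
<author>Abiteboul</author><author>Buneman</author><author>Suciu</author></book>
<book year="2001" isbn="1-XXXXX-YYY-Z"><title>XML Query</title>
<author>Fernandez</author><author>Suciu</author></book>
</bib>"""
BIB1 = """<bib>
<article year="2000"><title>Queries and computation on the web</title>
<journal>Theoretical Computer Science</journal>
<author>Abiteboul</author><author>Vianu</author></article>
</bib>"""

def parent_map(*roots):
    """Record child -> parent for every node below the given roots."""
    return {child: parent for root in roots for parent in root.iter() for child in parent}

def contains_keyword(roots, keyword):
    """Hypothetical stand-in for `contains`: elements whose own text mentions keyword."""
    return [e for root in roots for e in root.iter() if keyword in (e.text or "")]

bib0, bib1 = ET.fromstring(BIB0), ET.fromstring(BIB1)
parents = parent_map(bib0, bib1)

hits = contains_keyword([bib0, bib1], "Abiteboul")
# $a/..: look up each hit's parent; a root has no entry, i.e. zero parents
context = [parents.get(a) for a in hits]
print([(p.tag, p.get("year")) for p in context])
# [('book', '1999'), ('article', '2000')]
```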
It is possible to recover more precise type information
with the dynamic or run-time treat operator, which attempts to
cast at run time an expression to a given type.
If the expression does not have the given type,
a run-time error is raised.
Dynamic casts are necessary when it is not possible to determine at
query-analysis time the most precise type of a value; they are sometimes
called ``down casts''.
For example, the use of parent in the following expression
loses some useful type information, namely that
$p is a Book. We can recover more
precise information by casting $p to the Book type:
FOR $p IN $book0/title/.. RETURN
TREAT AS Book ($p)
==> <book year="1999" isbn="1-55860-622-X">
<title>Data on the Web</title>
<author>Abiteboul</author>
<author>Buneman</author>
<author>Suciu</author>
</book>
: Book?
The result type is Book? because the
parent function has type AnyElement?.
If we try erroneously to cast $p to an
Article, the error value is returned.
Its type is again Article?,
because AnyElement could be an Article:
FOR $p IN $book0/title/.. RETURN
TREAT AS Article ($p)
==> error
: Article?
However, if we try to cast $book0 to an
Article,
the result type becomes Ø,
the empty choice, because we can statically determine that a Book is not
an Article.
FOR $p IN $book0 RETURN
TREAT AS Article ($p)
==> error
: Ø
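A run-time cast can be sketched in Python as a check that either returns its argument unchanged or raises an error. The predicates is_book and is_article are deliberately shallow, hypothetical checks, CastError is a made-up exception name, and the static part of the rules (inferring Book?, Article? or Ø) is not modeled.

```python
import xml.etree.ElementTree as ET

BOOK0 = """<book year="1999" isbn="1-55860-622-X"><title>Data on the Web</title>
<author>Abiteboul</author><author>Buneman</author><author>Suciu</author></book>"""

class CastError(Exception):
    """A failed run-time cast (the `error` value in the text)."""

def treat_as(type_name, check, value):
    """TREAT AS type_name (value): return value unchanged if it passes check, else fail."""
    if not check(value):
        raise CastError(f"value is not an instance of {type_name}")
    return value

def is_book(e):  # shallow, hypothetical check for Book
    return e.tag == "book" and e.find("title") is not None and e.find("author") is not None

def is_article(e):  # shallow, hypothetical check for Article
    return e.tag == "article" and e.find("journal") is not None

book0 = ET.fromstring(BOOK0)
parents = {c: p for p in book0.iter() for c in p}
p = parents[book0.find("title")]  # $book0/title/..

book = treat_as("Book", is_book, p)  # succeeds: returns the same node
try:
    treat_as("Article", is_article, p)
    article_outcome = "ok"
except CastError:
    article_outcome = "error"
print(book.get("year"), article_outcome)  # 1999 error
```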
We have already seen many examples of static or
compile-time casting. A static cast permits
the type of an expression to be changed and checked at query-analysis time;
such casts are sometimes called ``up casts''.
For example, consider the type Book0, which permits a
book to have zero or more authors.
TYPE Book0 =
ELEMENT book(
ATTRIBUTE year (xs:integer) &
ATTRIBUTE isbn (xs:string),
ELEMENT title (xs:string),
(ELEMENT author (xs:string))*
)
The explicit-type expression e : t
statically casts a value to the given type.
For example, the expression below statically casts $book0 to
Book0; this is permissible
because the type of $book0 at query-analysis time is a sub-type of
Book0.
$book0 : Book0
==> <book year="1999" isbn="1-55860-622-X">
<title>Data on the Web</title>
<author>Abiteboul</author>
<author>Buneman</author>
<author>Suciu</author>
</book>
: Book0
If we try erroneously to cast $book0
to a more precise type (e.g., a book with 4 or more authors),
a type error will occur at query-analysis time.
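For these two types, whether the up cast is allowed reduces to comparing the occurrence of author: Book allows one or more, Book0 allows zero or more. A minimal Python sketch of that comparison as interval inclusion follows; it ignores the rest of the content model, which is identical in both types, and within and AT_LEAST_4 (the "4 or more authors" case) are hypothetical names.

```python
def within(sub, sup):
    """True if every count allowed by `sub` is also allowed by `sup`."""
    (lo1, hi1), (lo2, hi2) = sub, sup  # None as upper bound = unbounded
    upper_ok = hi2 is None or (hi1 is not None and hi1 <= hi2)
    return lo1 >= lo2 and upper_ok

BOOK_AUTHORS = (1, None)   # Book:  (ELEMENT author (xs:string))+
BOOK0_AUTHORS = (0, None)  # Book0: (ELEMENT author (xs:string))*
AT_LEAST_4 = (4, None)     # hypothetical type: 4 or more authors

print(within(BOOK_AUTHORS, BOOK0_AUTHORS))  # True: the up cast type-checks
print(within(BOOK_AUTHORS, AT_LEAST_4))     # False: rejected at query-analysis time
```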
The uses of the parent operator
in [2.9 Parent and treat operators] show
that it is possible to access the original context of nodes. This is
possible because the XML Query Data Model supports node
identity, that is, every instance of a node (e.g., element,
attribute, processing instruction, and comment) in the data model has
a unique identity. We can compare the identity
of two nodes for equality using the == operator.
For example, in the following expression, two distinct element nodes
are created and bound
to the variables $a1 and $a2.
Although the two nodes are structurally equal, their identities
are not equal:
LET $a1 := <author>Suciu</author>,
$a2 := <author>Suciu</author>
RETURN
$a1 == $a2
==> false
: xs:boolean
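In a Python model of nodes as xml.etree.ElementTree elements, node identity corresponds to object identity (is), while structural equality needs a recursive comparison. deep_equal is a hypothetical helper that compares tag, attributes, text and children but ignores tail text.

```python
import xml.etree.ElementTree as ET

def deep_equal(x, y):
    """Structural equality of tag, attributes, text and children (tails ignored)."""
    return (x.tag == y.tag and x.attrib == y.attrib
            and (x.text or "") == (y.text or "")
            and len(x) == len(y)
            and all(deep_equal(c, d) for c, d in zip(x, y)))

# two separately constructed <author>Suciu</author> nodes
a1 = ET.fromstring("<author>Suciu</author>")
a2 = ET.fromstring("<author>Suciu</author>")

print(a1 is a2)            # False: distinct identities, like $a1 == $a2
print(deep_equal(a1, a2))  # True: structurally equal
```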
In general, all of XQuery's operators preserve node identity.
There is one exception: the element constructor, which, given a tag name
and a sequence of child nodes, constructs a new element.
A new element's content does not refer directly to
the given children nodes, but to copies of these nodes.
For example, the following expression constructs
an element with name newbook and content
$book0/author, $book0/title:
LET $book1 :=
<newbook>
{ $book0/author, $book0/title }
</newbook>
RETURN $book1
==> <newbook>
<author>Abiteboul</author>
<author>Buneman</author>
<author>Suciu</author>
<title>Data on the Web</title>
</newbook>
: ELEMENT newbook (
(ELEMENT author (xs:string))+,
ELEMENT title (xs:string)
)
The newbook
element contains copies of the nodes in the sequence
$book0/author, $book0/title,
not the original nodes in $book0.
Copying guarantees that a node is always the
parent of its child nodes and a node is always
the child of its parent; these constraints are invariants of
[XQuery 1.0 and XPath 2.0 Data Model].
For example, we would expect that
the following expression is always true:
$book1/author/.. == $book1
If the element constructor did not copy its arguments, anomalies such as the following could occur:
$book1/author/.. == $book0
that is, the parent of
$book1's child node
is not $book1, and
this would violate the XML Query Data Model's parent-child invariant.
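A Python sketch of the copying constructor: construct (a hypothetical name) deep-copies its content into a fresh xml.etree.ElementTree element, so the parent of each new child is the new element, and the originals stay children of $book0.

```python
import copy
import xml.etree.ElementTree as ET

BOOK0 = """<book year="1999" isbn="1-55860-622-X"><title>Data on the Web</title>
<author>Abiteboul</author><author>Buneman</author><author>Suciu</author></book>"""

book0 = ET.fromstring(BOOK0)

def construct(tag, children):
    """Element constructor: the new element gets deep copies of its content."""
    new = ET.Element(tag)
    new.extend([copy.deepcopy(c) for c in children])
    return new

book1 = construct("newbook", book0.findall("author") + book0.findall("title"))
parent = {c: p for p in book1.iter() for c in p}

first = book1.find("author")
print(first is book0.find("author"))  # False: a copy, not the original node
print(parent[first] is book1)         # True: $book1/author/.. == $book1
print(len(book0))                     # 4: the originals still belong to $book0
```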
Sometimes it is useful to construct elements that do
preserve the identity of their child nodes, for example,
when constructing a view
of one or more XML documents.
In this case,
we want the new element to
contain references to, not copies of,
the original nodes.
The ref operator constructs a reference
to a node. For example, $book2 below
contains references to the nodes in $book0:
LET $book2 :=
<newbook>
{ (FOR $a IN $book0/author RETURN ref($a)),
ref($book0/title)
}
</newbook>
RETURN $book2
==> <newbook>
<q:ref><author>Abiteboul</author></q:ref>
<q:ref><author>Buneman</author></q:ref>
<q:ref><author>Suciu</author></q:ref>
<q:ref><title>Data on the Web</title></q:ref>
</newbook>
: ELEMENT newbook (
(REFERENCE (ELEMENT author (xs:string)))+,
REFERENCE (ELEMENT title (xs:string))
)
Ed. Note: MF : Issue - serialized representation of Data Model reference nodes.
Note that the type of the expression contains reference types.
The deref operator dereferences a reference value.
In the following, it returns the elements in $book0.
FOR $v IN $book2/* RETURN deref($v)
==> (<author>Abiteboul</author>,
<author>Buneman</author>,
<author>Suciu</author>,
<title>Data on the Web</title>)
: (ELEMENT author (xs:string))+,
ELEMENT title (xs:string)
For convenience, the expression above can also be written as
$book2/deref().
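References can be modeled in Python as small wrapper objects that hold the original node rather than a copy. Ref, ref and deref below are hypothetical names, and $book2 is simplified to a plain list of references, since the serialized form of reference nodes is still an open issue.

```python
import xml.etree.ElementTree as ET

BOOK0 = """<book year="1999" isbn="1-55860-622-X"><title>Data on the Web</title>
<author>Abiteboul</author><author>Buneman</author><author>Suciu</author></book>"""

book0 = ET.fromstring(BOOK0)

class Ref:
    """Hypothetical reference node: points at the original node, no copy is made."""
    def __init__(self, node):
        self.node = node

def ref(node):
    return Ref(node)

def deref(r):
    return r.node

# $book2's content, simplified to a plain list of references
book2 = [ref(a) for a in book0.findall("author")] + [ref(book0.find("title"))]

derefed = [deref(r) for r in book2]  # FOR $v IN $book2/* RETURN deref($v)
originals = book0.findall("author") + [book0.find("title")]
print(all(d is o for d, o in zip(derefed, originals)))  # True: the very same nodes
```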
Often it is useful to regroup elements in an XML document. For example,
each book element in $bib0
groups one title with multiple authors. This expression groups each author
with the titles of his/her publications.
FOR $a IN distinct-value($bib0/book/author/data()) RETURN
<biblio>
<author>{ $a }</author>
{ FOR $b IN $bib0/book, $a2 IN $b/author/data()
WHERE $a = $a2 RETURN
$b/title
}
</biblio>
==> (<biblio>
<author>Abiteboul</author>
<title>Data on the Web</title>
</biblio>,
<biblio>
<author>Buneman</author>
<title>Data on the Web</title>
</biblio>,
<biblio>
<author>Suciu</author>
<title>Data on the Web</title>
<title>XML Query</title>
</biblio>,
<biblio>
<author>Fernandez</author>
<title>XML Query</title>
</biblio>)
: (ELEMENT biblio (
ELEMENT author (xs:string),
(ELEMENT title (xs:string))*)
)*
Readers may recognize this expression as a self-join of books on authors.
The expression distinct-value($bib0/book/author/data()) produces a
sequence of author names whose values are all distinct.
The outer
for expression binds $a to the name of each
author element, and the inner for expression selects the
title of each book that has some author whose name equals $a.
Here distinct-value is an example of a built-in function.
It produces a sequence of nodes whose values are all distinct,
i.e., there are no duplicate values; the order of the
resulting sequence is not defined.
The built-in function distinct-node
produces a sequence of nodes whose identities are all distinct.
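The difference between removing duplicate values and removing duplicate nodes can be sketched in Python. The class `Elem` and the functions `distinct_value` and `distinct_node` are hypothetical stand-ins for the built-ins, and this sketch keeps the first occurrence of each item, whereas XQuery leaves the order of the result undefined.

```python
class Elem:
    """A hypothetical stand-in for an element node with a simple typed value."""

    def __init__(self, name: str, value: str) -> None:
        self.name = name
        self.value = value


def distinct_value(items):
    """Keep one item per distinct value (first occurrence kept in this sketch)."""
    seen, out = set(), []
    for x in items:
        if x.value not in seen:
            seen.add(x.value)
            out.append(x)
    return out


def distinct_node(items):
    """Keep one item per distinct node identity, whatever its value."""
    seen, out = set(), []
    for x in items:
        if id(x) not in seen:
            seen.add(id(x))
            out.append(x)
    return out


s1 = Elem("author", "Suciu")
s2 = Elem("author", "Suciu")  # equal value, different identity
seq = [s1, s2, s1]
print(len(distinct_value(seq)), len(distinct_node(seq)))  # prints "1 2"
```

Two distinct author nodes with the same name collapse to one under value comparison but remain separate under identity comparison.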
The type of the result expression may seem surprising:
each biblio element may contain
zero or more title elements,
even though in $bib0 every author
co-occurs with a title. Recognizing such a constraint is
outside the scope of the type system, so the resulting type is not as precise
as we would like.
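The self-join that regroups authors with titles can be mirrored in Python. This sketch assumes a flattened, hypothetical representation of $bib0 as (title, authors) pairs; the outer loop corresponds to the outer for over distinct author names, and the inner comprehension corresponds to the inner for with its where clause.

```python
# Hypothetical flattened form of $bib0: one (title, authors) pair per book.
bib0 = [
    ("Data on the Web", ["Abiteboul", "Buneman", "Suciu"]),
    ("XML Query", ["Fernandez", "Suciu"]),
]


def group_by_author(books):
    # distinct-value: each author name once (first-seen order; XQuery leaves it undefined)
    names = list(dict.fromkeys(a for _, authors in books for a in authors))
    result = []
    for a in names:  # outer FOR over distinct names
        # inner FOR over every (book, author) pair, kept WHERE the names match
        titles = [t for t, authors in books for a2 in authors if a == a2]
        result.append((a, titles))
    return result


for author, titles in group_by_author(bib0):
    print(author, titles)
```

As in the XQuery version, every titles list here happens to be non-empty, but nothing in the structure of the function guarantees it; that is the same imprecision reflected in the `*` of the inferred type.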
Ed. Note: MF: Clearly, the following example of index is not appropriate for a tutorial -- it is only used to define RANGE.
Often it is useful to query the order of elements in a sequence or a document. There are two kinds of order among elements: local order and document (or global) order. XQuery supports querying of both local and global order.
Local order refers to the order among sibling elements in a sequence.
To query local order, the index function pairs an integer
index with
each element in a sequence:
index($book0/author)
==> (<q:pair><q:fst>1</q:fst>
<q:snd><q:ref><author>Abiteboul</author></q:ref></q:snd></q:pair>,
<q:pair><q:fst>2</q:fst>
<q:snd><q:ref><author>Buneman</author></q:ref></q:snd></q:pair>,
<q:pair><q:fst>3</q:fst>
<q:snd><q:ref><author>Suciu</author></q:ref></q:snd></q:pair>)
: (ELEMENT q:pair(
ELEMENT q:fst (xs:integer),
ELEMENT q:snd (REFERENCE (ELEMENT author (xs:string))))
)+
The index function uses references in order to preserve node identity
when accessing local order. Note that the result type takes into
account that at least one pair exists in the result, as $book0/author
always contains one or more authors.
Once we have paired authors with an integer index, we can select the first two authors:
FOR $p IN index($book0/author)
WHERE ($p/q:fst/data() <= 2) RETURN
$p/q:snd/deref()
==> (<author>Abiteboul</author>,
<author>Buneman</author>)
: (ELEMENT author (xs:string))*
The for expression iterates over all pair elements produced by the
index expression. It selects elements whose index value in the q:fst
element is between one and two inclusive, and it returns the original
content by dereferencing the content of the q:snd element.
The result type may be surprising, because the Book type
guarantees that each book has at least one author. However, the type
system cannot determine that the conditional where
expression will always
succeed, so the inner expression may produce zero results. (A
sophisticated analysis might improve type precision, but is likely to
require much work for little benefit.)
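The behavior of index followed by selection on the position can be sketched in Python. The function `index` here is a hypothetical model: it pairs each item with its 1-based position and stores the item itself rather than a copy, which plays the role of the q:ref reference in the XQuery result.

```python
def index(seq):
    """Pair each item with its 1-based position; the item itself is stored, not a copy."""
    return [(i, item) for i, item in enumerate(seq, start=1)]


authors = ["Abiteboul", "Buneman", "Suciu"]
pairs = index(authors)

# FOR $p IN index(...) WHERE $p/q:fst/data() <= 2 RETURN $p/q:snd/deref()
first_two = [item for pos, item in pairs if pos <= 2]
print(first_two)  # prints ['Abiteboul', 'Buneman']
```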
Document (or global) order refers to the total order among all elements in a document. Global order is defined as the order of appearance of the element nodes when performing a pre-order, depth-first traversal of a tree. This corresponds to the order of appearance of their opening tags in the XML serialization. This is equivalent to the definition used in [XPath].
To query global order, the xfo:node-before function
can be applied to two nodes.
It returns true
if the first node is before (and
different from) the second node in document order. It returns false if
the first node is equal to or after the second node in document
order. It raises an error if the nodes are in different documents.
For example, the nodes $bib0 and $review0
are in different documents, so comparing their order raises an error:
xfo:node-before($bib0, $review0) ==> ERROR : 0
The xfo:node-after function is defined
similarly.
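Document order and a node-before test can be modeled in Python by numbering nodes in a pre-order, depth-first traversal. This is a sketch under stated assumptions: `Node`, `preorder`, and `node_before` are hypothetical names, `node_before` takes the document root as an extra argument, and "different documents" is modeled as "not reachable from that root".

```python
class Node:
    """Hypothetical minimal tree node: a name and an ordered list of children."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


def preorder(root):
    """Yield nodes in document order (pre-order, depth-first)."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(node.children))  # leftmost child is visited next


def node_before(a, b, root):
    """True if a precedes (and differs from) b in the document rooted at root."""
    position = {id(n): i for i, n in enumerate(preorder(root))}
    if id(a) not in position or id(b) not in position:
        raise ValueError("nodes are not in the same document")
    return position[id(a)] < position[id(b)]


# A toy document shaped like $bib0 (titles omitted for brevity).
book1999 = Node("book", Node("author"), Node("author"), Node("author"))
book2001 = Node("book", Node("author"), Node("author"))
bib = Node("bib", book1999, book2001)

assert node_before(bib, book2001, bib)                    # the root precedes everything
assert node_before(book2001, book2001.children[0], bib)   # an element precedes its children
assert not node_before(book2001, book2001, bib)           # a node is not before itself

# Authors appearing after the 2001 book, as in the query that follows.
after_2001 = [a for a in book1999.children + book2001.children
              if node_before(book2001, a, bib)]

review0 = Node("review")  # a node from another document
try:
    node_before(bib, review0, bib)
except ValueError as err:
    print("error:", err)  # mirrors xfo:node-before($bib0, $review0) ==> ERROR
```

Rebuilding the position table on every call keeps the sketch short; a real implementation would compute document order once per document.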
Using global order, the following expression returns all author nodes appearing after a book written in 2001:
FOR $b IN $bib0/book
WHERE $b/@year/data() = 2001 RETURN
(FOR $a IN $bib0/book/author
WHERE $b before $a RETURN $a)
==> (<author>Fernandez</author>,
<author>Suciu</author>)
: (ELEMENT author (xs:string))*
Note that the root element of a document is before any other
element. More generally, an element is before all of its children.
For example, the set of elements that are before $bib0 is
empty:
empty(FOR $b IN $bib0/book
WHERE $b before $bib0 RETURN
$b)
==> true
: xs:boolean
XQuery supports global order only for elements within the same document. Support for global order among elements in distinct documents is discussed in [Issue-0003: Global Order].
To sort a sequence, XQuery provides a sort
expression, whose form is:
Expr1 sortby Expr2.
The built-in variable '.', called dot, ranges over the items in the sequence Expr1
and sorts those items using the key value
defined by Expr2.
For example, this expression sorts book elements in
$review0 by their titles.
$review0/book SORTBY ./title/data()
==> (<book>
<title>Data on the Web</title>
<review>This is great!</review>
</book>,
<book>
<title>XML Query</title>
<review>This is pretty good too!</review>
</book>)
: (ELEMENT book (
ELEMENT title (xs:string),
ELEMENT review (xs:string))
)*
The sort expression is a restricted
form of higher-order
function, i.e., it takes a function as an argument.
In this case, sort takes a single function, which
extracts the key value from each element.
The sort expression requires that the less-than
inequality, <, be defined for
the type of Expr2.
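The sortby expression behaves like sorting with a key function, which can be sketched with Python's built-in `sorted`. The book data below is a hypothetical stand-in for $review0/book, and its input order is made up for illustration; the lambda plays the role of Expr2, with its parameter standing in for '.'.

```python
# Hypothetical contents of $review0/book; the input order here is made up.
books = [
    {"title": "XML Query", "review": "This is pretty good too!"},
    {"title": "Data on the Web", "review": "This is great!"},
]

# Expr1 sortby Expr2: the lambda is the key function (Expr2), and its parameter
# plays the role of '.'. sorted() compares keys with <, as sortby requires.
by_title = sorted(books, key=lambda dot: dot["title"])
print([b["title"] for b in by_title])  # prints ['Data on the Web', 'XML Query']
```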
We have already seen two built-in
functions: index and distinct-value.
In addition to these
functions, XQuery has five built-in aggregation
functions: avg, count,
max, min, and sum.
This expression selects books that have more than two authors:
FOR $b IN $bib0/book
WHERE count($b/author) > 2 RETURN
$b
==> <book year="1999" isbn="1-55860-622-X">
<title>Data on the Web</title>
<author>Abiteboul</author>
<author>Buneman</author>
<author>Suciu</author>
</book>
: Book*
All the aggregation functions take a sequence with repetition type and return
a single value; for example, count returns the number of elements
in the sequence as an integer.
So far, all our examples
of attributes and elements use unqualified local
names, i.e., names that do not include an explicit namespace
URI.
It is also possible to specify and match on the expanded
name of an attribute or element.
The expanded name Namespace:LocalName consists of a namespace URI
Namespace and a local name LocalName.
Consider an inventory of books that contains data from
both http://www.BooksRUs.com and
http://www.cheapBooks.com.
In this example, the first element contains values whose names are
defined in the BooksRUs.com namespace, and the second element
contains values whose names are defined in the
cheapBooks.com
namespace:
NAMESPACE booksRus = "http://www.BooksRUs.com/books.xsd"
NAMESPACE cheapBooks = "http://www.cheapBooks.com/ourschema.xsd"
TYPE Inventory = ELEMENT inv (InvBook*)
LET $inventory :=
<inv>
<booksRus:book year="1999" isbn="1-55860-622-X">
<booksRus:title>Data on the Web</booksRus:title>
<booksRus:author>Abiteboul</booksRus:author>
<booksRus:author>Buneman</booksRus:author>
<booksRus:author>Suciu</booksRus:author>
</booksRus:book>
<cheapBooks:book year="2001">
<cheapBooks:title>XML Query</cheapBooks:title>
<cheapBooks:author>Fernandez</cheapBooks:author>
<cheapBooks:author>Suciu</cheapBooks:author>
<cheapBooks:isbn>1-XXXXX-YYY-Z</cheapBooks:isbn>
</cheapBooks:book>
</inv> : Inventory
RETURN ...
In this example, elements imported from existing schemas each refer
to a single namespace, thus the definition of InvBook is:
TYPE BooksRUBook =
ELEMENT booksRus:book (
ATTRIBUTE year (xs:integer) &
ATTRIBUTE isbn (xs:string),
ELEMENT booksRus:title(xs:string),
(ELEMENT booksRus:author(xs:string))+
)
TYPE CheapBooksBook =
ELEMENT cheapBooks:book (
ATTRIBUTE year (xs:integer),
ELEMENT cheapBooks:title (xs:string),
(ELEMENT cheapBooks:author(xs:string))+,
ELEMENT cheapBooks:isbn (xs:string)
)
TYPE InvBook = BooksRUBook | CheapBooksBook
Here vertical bar (|) is used to indicate a
choice between types: each InvBook
is either a BooksRUBook or
a CheapBooksBook.
We have already seen how to project on the constant name of an attribute or
element.
It is also useful to project on wildcards, which
are used to match names with any namespace and/or any local name.
For example, this expression matches elements
with any local name and with
namespace URI
http://www.BooksRUs.com/books.xsd:
$inventory/booksRus:*
==> <booksRus:book year="1999" isbn="1-55860-622-X">
<booksRus:title>Data on the Web</booksRus:title>
<booksRus:author>Abiteboul</booksRus:author>
<booksRus:author>Buneman</booksRus:author>
<booksRus:author>Suciu</booksRus:author>
</booksRus:book>
: ELEMENT booksRus:book (
ATTRIBUTE year (xs:integer) &
ATTRIBUTE isbn (xs:string),
ELEMENT booksRus:title (xs:string),
(ELEMENT booksRus:author (xs:string))+
)
Similarly, this expression first projects elements
in any namespace whose local name is book
and then projects on their year attributes:
$inventory/*:book/@year
==> (ATTRIBUTE year (1999), ATTRIBUTE year (2001))
: (ATTRIBUTE year (xs:integer))*
Ed. Note: MF: Open issue whether *:localname will be supported.
The expression Expr/a is shorthand for
Expr/ns:a, where
ns is the default namespace.
Similarly, * is shorthand for
ns:*, i.e., any name in the default namespace.
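Name tests with wildcards can be modeled as predicates over expanded names. In this Python sketch, `ExpandedName` and `name_test` are hypothetical names, and `None` stands for `*` in either the namespace or the local-name position, so `name_test(uri, None)` models `prefix:*` and `name_test(None, "book")` models `*:book`.

```python
from typing import NamedTuple, Optional


class ExpandedName(NamedTuple):
    uri: str    # namespace URI
    local: str  # local name


def name_test(uri: Optional[str], local: Optional[str]):
    """Return a predicate for a name test; None stands for the wildcard '*'."""
    def matches(name: ExpandedName) -> bool:
        return (uri is None or name.uri == uri) and (local is None or name.local == local)
    return matches


BOOKS_R_US = "http://www.BooksRUs.com/books.xsd"
CHEAP_BOOKS = "http://www.cheapBooks.com/ourschema.xsd"
inv_children = [ExpandedName(BOOKS_R_US, "book"), ExpandedName(CHEAP_BOOKS, "book")]

booksrus_any = name_test(BOOKS_R_US, None)  # booksRus:*
any_book = name_test(None, "book")          # *:book
print([n for n in inv_children if booksrus_any(n)])
print(len([n for n in inv_children if any_book(n)]))  # prints 2
```

Under this model, an unprefixed name a would correspond to `name_test(default_uri, "a")`, where `default_uri` is whatever the default namespace declaration specifies.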
In an XML document, comments and processing instructions may appear anywhere outside other markup [XML]. Processing instructions permit documents to contain instructions for applications. Comments and processing instructions are not part of the document's character data. An XML processor may, but need not, make the text of comments available to an application, but it must pass processing instructions to the application. A processing instruction begins with a target used to identify the application to which the instruction is directed.
XQuery supports comments and processing instructions
in types and expressions.
The type expression PIC(t) denotes
a value in which zero or more processing instructions and comments may be interleaved
arbitrarily with the nodes in type t.
For example, the two element types BibPIC and
BookPIC permit PIs and comments to be interleaved
with their content:
TYPE BibPIC = ELEMENT bib (PIC(BookPIC*))
TYPE BookPIC =
ELEMENT book (
ATTRIBUTE year (xs:integer) &
ATTRIBUTE isbn (xs:string),
PIC (ELEMENT title (xs:string), (ELEMENT author(xs:string))+)
)
Note that in the book element, the PIC
operator is only applied to its element content, not its attribute content,
because comments and processing instructions may not occur in
attributes.
We can construct processing instruction and comment values using the
built-in constructors processing_instruction and comment:
LET $bibpc0 :=
<bib>
{ comment("Canonical XQuery example.") }
<book year="1999" isbn="1-55860-622-X">
{ comment("First book example"),
processing_instruction("Publisher.asp",
"publisher=http://www.mkp.com") }
<title>Data on the Web</title>
<author>Abiteboul</author>
<author>Buneman</author>
<author>Suciu</author>
</book>
<book year="2001" isbn="1-XXXXX-YYY-Z">
<title>XML Query</title>
{ comment("Second book example") }
<author>Fernandez</author>
<author>Suciu</author>
</book>
</bib> : BibPIC
RETURN ...
Finally, we can project on processing instructions and comments, in the same way we project on children, attributes, and simple content:
$bibpc0/book/comment()
==> (comment("First book example"), comment("Second book example"))
: Comment*
$bibpc0/book/processing_instruction()
==> processing_instruction("Publisher.asp", "publisher=http://www.mkp.com")
: ProcessingInstruction*
Comments and processing instructions may be ignored by an XML processor, in which case they would not even be accessible to a query processor. If they are not ignored, however, comments and processing instructions are typed values and are treated like any other value in an XQuery expression.
An element type has mixed content when elements of that type may
contain character data, optionally interspersed with child
elements [XML].
The type expression mixed(t) denotes
a value in which zero or more xs:string values may be interleaved
arbitrarily with the nodes in type t.
For example, the content of the review element
contains a reviewer element, which may be interleaved
with string values:
TYPE ReviewsMixed =
ELEMENT reviews (
(ELEMENT book (
ELEMENT title (xs:string),
ELEMENT review (MIXED (ELEMENT reviewer(xs:string)))))*
)
Here are two examples of mixed content, in which the text of the book review may be interleaved with the name of the reviewer:
LET $reviewmix0 :=
<reviews>
<book>
<title>XML Query</title>
<review>A darn fine book: <reviewer>XML On-line</reviewer></review>
</book>
<book>
<title>Data on the Web</title>
<review>The <reviewer>publisher</reviewer> says 'This is great!'</review>
</book>
</reviews> : ReviewsMixed
RETURN ...
It is often useful to
concatenate all the string values of a mixed-content element to
recover
its complete text value.
We use the builtin function string_value;
as defined in XPath [XPath],
the string value of a node is determined by its kind,
e.g., element, attribute, etc.
FOR $b IN $reviewmix0/book RETURN
string_value($b/review)
==> ("A darn fine book: XML On-line",
"The publisher says 'This is great!'")
: xs:string*
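The string value of a mixed-content element is the concatenation of all its character data in document order. The Python sketch below assumes a hypothetical encoding of nodes as either a `str` (character data) or a `(name, content)` pair, and `string_value` is a model of the built-in, not its implementation.

```python
def string_value(node):
    """Concatenate all character data below node, in document order.
    A node is either a str (character data) or a (name, content) pair."""
    if isinstance(node, str):
        return node
    _name, content = node
    return "".join(string_value(child) for child in content)


review1 = ("review", ["A darn fine book: ", ("reviewer", ["XML On-line"])])
review2 = ("review", ["The ", ("reviewer", ["publisher"]), " says 'This is great!'"])
print(string_value(review1))  # prints A darn fine book: XML On-line
print(string_value(review2))  # prints The publisher says 'This is great!'
```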
Functions can make queries more modular and concise. Recall that we used the following query to find all books that do not have "Buneman" as an author.
FOR $b IN $bib0/book
WHERE EVERY $a IN $b/author SATISFIES NOT($a/data() = "Buneman") RETURN
$b
==> <book year="2001" isbn="1-XXXXX-YYY-Z">
<title>XML Query</title>
<author>Fernandez</author>
<author>Suciu</author>
</book>
: Book*
A different way to formulate this query is to first define a function that
takes a string s and a book b
as arguments, and returns true if book b does not have
an author with name s.
DEFINE FUNCTION notauthor (xs:string $s, Book $b) RETURNS xs:boolean {
EVERY $a IN $b/author SATISFIES NOT($a/data() = $s)
}
The query can then be re-expressed as follows.
FOR $b IN $bib0/book
WHERE notauthor("Buneman", $b) RETURN
$b
==> <book year="2001" isbn="1-XXXXX-YYY-Z">
<title>XML Query</title>
<author>Fernandez</author>
<author>Suciu</author>
</book>
: Book*
Note that a function declaration includes the types of all its arguments and the type of its result. This is necessary for the type system to guarantee that applications of functions are type correct.
In general, any number of functions may be declared at the top-level. The order of function declarations does not matter, and each function may refer to any other function. Among other things, this allows functions to be recursive (or mutually recursive), which supports structural recursion, the subject of the next section.
Functions make XQuery extensible. We have seen examples of built-in
functions (index and
distinct-value) and examples of user-defined functions (notauthor).
In addition to built-in and user-defined
functions, XQuery could support externally defined functions, i.e.,
functions that are not defined in XQuery itself, but in some external
language. This would make special-purpose implementations of, for example,
full-text search functions available in XQuery. We discuss support for
externally defined functions in [Issue-0009:
Externally defined functions].
XML documents can be recursive in structure; for example, it is possible to
define a part element that directly or indirectly
contains other part elements. In XQuery, we use
recursive types to define documents with a recursive structure, and we use
recursive functions to process such documents. (We can also use mutually
recursive functions for more complex recursive structures.)
For instance, here is a recursive type defining a part hierarchy.
TYPE Part = Basic | Composite
TYPE Basic =
ELEMENT basic (
ELEMENT cost (xs:integer)
)
TYPE Composite =
ELEMENT composite (
ELEMENT assembly_cost(xs:integer),
ELEMENT subparts (Part+)
)
And here is some sample data.
LET $part0 :=
<composite>
<assembly_cost>12</assembly_cost>
<subparts>
<composite>
<assembly_cost>22</assembly_cost>
<subparts>
<basic><cost>33</cost></basic>
</subparts>
</composite>
<basic><cost>7</cost></basic>
</subparts>
</composite> : Part
RETURN ...
As before, the vertical bar (|) indicates a
choice between types: each part is either basic, with a cost and no subparts,
or composite, with an assembly cost and subparts.
We might want to translate parts into a second form, where every part has a total cost and a list of subparts (for a basic part, the list of subparts is empty).
TYPE Part2 =
ELEMENT part (
ELEMENT total_cost (xs:integer),
ELEMENT subparts (Part2*)
)
Here is a recursive function that performs the desired transformation. It
uses a new construct, the typeswitch expression.
DEFINE FUNCTION convert(Part $p) RETURNS Part2 {
TYPESWITCH ($p) AS $x
CASE Basic RETURN
<part>
<total_cost> { $x/cost/data() } </total_cost>
<subparts/>
</part>
CASE Composite RETURN
LET $s := (FOR $y IN $x/subparts/* RETURN convert($y))
RETURN
<part>
<total_cost>
{ $x/assembly_cost/data() +
sum($s/total_cost/data()) }
</total_cost>
<subparts> { $s } </subparts>
</part>
DEFAULT RETURN ERROR
}
Each branch of the typeswitch expression is labeled with a type,
Basic or Composite.
The evaluator checks the type of the value of $p at
query-evaluation time, i.e., run time,
and evaluates the corresponding branch.
If the first branch is taken then $x is bound to the
value of $p, and the branch returns a new part with total cost the
same as the cost of $x, and with no subparts. If the second
branch is taken, then $x is bound to the value of $p. The
function is recursively applied to each of the subparts of $x,
giving a sequence of new subparts $s. The branch returns a new part
with total cost computed by adding the assembly cost of $x to the
sum of the total cost of each subpart in $s, and with subparts
$s.
One might wonder why $x is required,
since it has the same value as $p. The reason is
that $p and $x have different
types.
$p : Part
$x : Basic -- in the first branch
$x : Composite -- in the second branch
The type of $x is more precise than
the type of $p, because which branch is taken depends upon
the type of value in $p.
Applying the query to the given data gives the following result.
convert($part0)
==> <part>
<total_cost>74</total_cost>
<subparts>
<part>
<total_cost>55</total_cost>
<subparts>
<part>
<total_cost>33</total_cost>
<subparts/>
</part>
</subparts>
</part>
<part>
<total_cost>7</total_cost>
<subparts/>
</part>
</subparts>
</part>
: Part2
Of course, a typeswitch expression may be used in any
query, not just in a recursive one.
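The recursive conversion can be sketched in Python, with one class per XQuery type and an `isinstance` test per typeswitch case. The classes `Basic`, `Composite`, and `Part2` are hypothetical models of the types above, not generated from them. A static checker such as mypy narrows the type of `p` inside each `isinstance` branch, which is loosely analogous to $x receiving a more precise type in each case.

```python
from dataclasses import dataclass, field


@dataclass
class Basic:
    cost: int


@dataclass
class Composite:
    assembly_cost: int
    subparts: list = field(default_factory=list)  # Basic or Composite items


@dataclass
class Part2:
    total_cost: int
    subparts: list = field(default_factory=list)  # Part2 items


def convert(p):
    # Each isinstance test plays the role of one typeswitch CASE.
    if isinstance(p, Basic):
        return Part2(p.cost, [])
    if isinstance(p, Composite):
        s = [convert(y) for y in p.subparts]  # recursive call on each subpart
        return Part2(p.assembly_cost + sum(x.total_cost for x in s), s)
    raise TypeError("not a Part")  # DEFAULT RETURN ERROR


part0 = Composite(12, [Composite(22, [Basic(33)]), Basic(7)])
result = convert(part0)
print(result.total_cost)  # prints 74
```

The totals agree with the XQuery result: the inner composite costs 22 + 33 = 55, and the outer one costs 12 + 55 + 7 = 74.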
[XML Schema : Formal Description] defines a "root", or most general
type, for the four kinds of schema components: elements, attributes,
simple types, and complex types.
They are named xs:AnyElement,
xs:AnyAttribute, xs:AnySimpleType,
and xs:AnyComplexType, respectively.
The type xs:AnySimpleType stands
for the most general simple type. All built-in primitive types
(like xs:integer or xs:string)
and lists of simple types are subtypes of it.
The built-in simple types are listed in
[3.4 Atomic simple types].
The remaining schema components are defined as follows:
TYPE xs:AnyTree = xs:AnySimpleType
| xs:AnyElement
| xs:AnyAttribute
TYPE xs:AnyAttribute = ATTRIBUTE *:* (xs:AnySimpleType)
TYPE xs:AnyElement = ELEMENT *:* (xs:AnyComplexType)
TYPE xs:AnyComplexType = xs:AnyAttribute*,
((xs:AnyElement | xs:string)* | xs:AnySimpleType)
TYPE xs:AnyType = (xs:AnyTree)*
The type xs:AnyTree
denotes any simple type, element, or attribute.
The type xs:AnyAttribute
stands for the most general attribute type, which
may have any name, and its content must have type
xs:AnySimpleType, i.e., it may contain simple values,
but no elements.
The type xs:AnyElement stands for the most general element type,
which may have any name, and its content must be a
complex type.
The type xs:AnyComplexType stands for the most
general complex
type,
which is any sequence of attributes followed by any sequence of
elements or strings or by any simple type.
Strings are permissible in a complex type because an element may
contain mixed content, i.e., character data interleaved
with other elements.
Finally, xs:AnyType is a sequence of
any tree.
In particular, our earlier data also has type xs:AnyElement.
$book0 : xs:AnyElement
==> <book year="1999" isbn="1-55860-622-X">
<title>Data on the Web</title>
<author>Abiteboul</author>
<author>Buneman</author>
<author>Suciu</author>
</book>
: xs:AnyElement
A specific type can be indicated for any expression in the query language, by writing a colon and the type after the expression.
As an example of a function that can be applied to all well-formed documents, we define a recursive function that converts any XML data into HTML. We first give a simplified definition of HTML.
TYPE HTML_Body =
( xs:AnySimpleType
| ELEMENT b(HTML_Body)
| ELEMENT ul ((ELEMENT li (HTML_Body))*)
) *
An HTML body consists of a sequence of zero or more items, each of which is
either a simple value, or a b element, where the content is
an HTML body, or a ul element, where the children
are li elements, each of which has as content an HTML
body.
Now, here is the function that performs the conversion.
DEFINE FUNCTION html_of_xml( xs:AnyTree $x ) RETURNS HTML_Body {
TYPESWITCH ($x) AS $z
CASE xs:AnySimpleType RETURN $z
CASE xs:AnyAttribute RETURN
<b> { name($z) } </b>,
<ul> { FOR $y IN $z/data() RETURN <li>{ html_of_xml($y) }</li> } </ul>
CASE xs:AnyElement RETURN
<b> { name($z) } </b>,
<ul>{ FOR $y IN $z/@* RETURN <li>{ html_of_xml($y) }</li> } </ul>,
<ul>{ FOR $y IN $z/* RETURN <li>{ html_of_xml($y) }</li> } </ul>
DEFAULT RETURN ERROR
}
The first branch of the typeswitch expression checks whether the value
of $x is a subtype of xs:AnySimpleType, and
if so then $z is bound to that value, so if this branch
is taken then $z is the same as $x, but
with a more precise type (it must be a simple type, not an element). This branch
returns the scalar.
The second branch checks whether the value
of $x is a subtype of xs:AnyAttribute. As before,
$z is the same as
$x but with a more precise type (it must be an attribute,
not a scalar). This branch returns a b element
containing the name of the attribute, and a ul element
containing one li element for each value of the
attribute. The function is recursively applied to get the content of the
li element.
The last branch is analogous to the second, but it matches an element
instead of an attribute, and it applies html_of_xml
to each of the element's attributes and children.
Applying the query to the book element above gives the following result.
html_of_xml($book0)
==> <b>book</b>
<ul>
<li><b>year</b><ul><li>1999</li></ul></li>
<li><b>isbn</b><ul><li>1-55860-622-X</li></ul></li>
<li><b>title</b><ul><li>Data on the Web</li></ul></li>
<li><b>author</b><ul><li>Abiteboul</li></ul></li>
<li><b>author</b><ul><li>Buneman</li></ul></li>
<li><b>author</b><ul><li>Suciu</li></ul></li>
</ul>
: HTML_Body
An XQuery module consists of a sequence of top-level declarations (namespace, function, or type declarations), followed by a query expression. The order of top-level declarations is immaterial; all namespace, function, and type declarations may be mutually recursive.
The query expression is evaluated in the environment specified by all of the declarations. We have already seen examples of type, function, and namespace declarations. An example of a top-level query is:
html_of_xml($book0)
The result of a top-level query can be serialized into an XML document by the application in which XQuery is used.
In this section, we present the grammar for XQuery's core expressions
and types.
Literals are in typewriter font.
Terminal classes of literals are italicized and have the suffix
'literal', e.g., StringLiteral.
Non-terminal symbols are italicized, e.g., Expr.
In the grammar, the '|' operator denotes an alternative of
two symbols; '*' denotes zero or more
repetition of a symbol; and '?' denotes an optional symbol.
[Figure 1] contains the grammar for XQuery's core expressions. We define XQuery's typing rules on these core expressions in [4 Static Semantics : Type-Inference Rules].
NCName
Variable            ::= $u, $v, $w, ...
StringLiteral       ::= "", "a", ...
NumericLiteral      ::= 0, 1, 2, ...
BooleanLiteral      ::= true | false
Literal             ::= StringLiteral | NumericLiteral | BooleanLiteral
QName               ::= NCName | NCName : NCName
Opeq                ::= eq | node-equal | ne | lt | lteq | gt | gteq
Oparith             ::= + | - | * | mod | div
Opcoll              ::= union | except | intersect
Opbool              ::= and | or
InfixOp             ::= Opeq | Oparith | Opcoll | Opbool
PrefixOp            ::= + | - | not
Expr                ::= Literal
                      | Variable
                      | QName ( ExprSequence? )
                      | Expr InfixOp Expr
                      | PrefixOp Expr
                      | attribute QName ( Expr )
                      | ElementConstructor
                      | ExprSequence
                      | if ( Expr ) then Expr else Expr
                      | let Variable := Expr return Expr
                      | Expr : Type
                      | error
                      | for Variable in Expr return Expr
                      | Expr sortby Expr (ascending | descending)
                      | cast as Type ( Expr )
                      | typeswitch ( Expr ) as Variable CaseRules
CaseRules           ::= case Type return Expr CaseRules
                      | default return Expr
ExprSequence        ::= Expr ( , ExprSequence )*
ElementConstructor  ::= <NameSpec/>
                      | <NameSpec> EnclosedExpression </NameSpec>
EnclosedExpression  ::= { ExprSequence }
NameSpec            ::= QName | { Expr }
TypeDecl            ::= type NCName = Type
ContextDecl         ::= namespace NCName = StringLiteral
                      | default namespace = StringLiteral
FunctionDefn        ::= define function QName ( ParamList? ) returns Type { Expr }
ParamList           ::= Type Variable ( , Type Variable )*
QueryModule         ::= ContextDecl* TypeDecl* FunctionDefn* ExprSequence?
QueryModuleList     ::= QueryModule ( ; QueryModule )*
Figure 1: XQuery Core Expression Syntax
Many of the expressions that appear in the examples, for example
Expr/*, do not
appear in [Figure 1], because they are
reducible to expressions in the core syntax.
[6 XQuery Mapping to Core] defines the mapping for every
XQuery expression into an equivalent expression in the core syntax.
In addition to the core syntax, XQuery has
a set of operators and built-in functions.
The binary and unary operators are enumerated in [Figure 1].
They include two equality operators, eq and
node-equal, defined in the [XQuery 1.0 and XPath 2.0 Data Model],
and five inequality operators, lteq, lt,
gteq, gt, and ne.
We have not defined the semantics of all the binary operators in XQuery.
In particular, it might be useful to define more than one type of
equality over scalar and element values, or to define implicit
coercions between values of related types.
A joint task force on operators with members from the [XSLT 99],
XML Schema, and XML Query working groups is chartered to define
operators.
XQuery will adopt the decisions of that group
(See [Issue-0056:
Operators
on Simple Types]).
XQuery's built-in functions are either defined
in [XQuery 1.0 and XPath 2.0 Data Model] or in a forthcoming document that
defines operators and a library of functions for XQuery and XPath 2.0.
In this document, the data model functions have no namespace prefix
and the library functions have the namespace prefix
xfo.
Some of these functions require special static type rules;
these are listed in [Figure 2] and [Figure 3].
[Figure 2] contains
the constructor and
accessor functions defined in the [XQuery 1.0 and XPath 2.0 Data Model].
The remaining built-in functions are listed in [Figure 3].
One benefit of having built-in functions is that more precise
types can be given to these functions than to user-defined functions.
The type rules for these functions appear in [4 Static Semantics : Type-Inference Rules].
attributes(Expr)                     Returns the attributes of an element.
children(Expr)                       Returns the children of an element.
comment(Expr)                        Constructs a comment.
dereference(Expr)                    Dereferences a node reference.
local-name(Expr)                     Extracts the local NCName of a node.
name(Expr)                           Returns an element's or attribute's tag name.
namespace-uri(Expr)                  Extracts the namespace URI from a node.
parent(Expr)                         Returns the parent of a node.
pcdata(Expr)                         Constructs parsed character data from a string argument.
processing-instruction(Expr, Expr)   Constructs a processing instruction.
ref(Expr)                            Constructs a node reference.
string-value(Expr)                   Returns the string value of the given node, as defined in [XPath].
typed-value(Expr)                    Returns the simple typed value of an element or attribute.
Figure 2: Data Model Constructor and Accessor Functions
agg(Expr)                     Aggregation functions, where agg is avg, count, min, max, or sum.
descendant-or-self(Expr)      Returns the given node and all its descendants in document order.
distinct-node(Expr)           Removes duplicate nodes from a sequence.
distinct-value(Expr)          Removes duplicate values from a sequence.
eop(Expr)                     Equality/inequality functions, where eop is one of eq, neq, lt, lteq, gt, gteq.
index(Expr)                   Pairs each element of a sequence with an integer index.
xfo:node-before(Expr, Expr)   True if the first argument is before the second in document order.
xfo:node-equal(Expr, Expr)    Returns true if both expressions denote the same node.
Figure 3: Built-In Functions
The XQuery type system takes as given the built-in simple datatypes from XML Schema Part 2 [XML Schema Part 2]. We let b range over all built-in datatypes.
The built-in simple datatype
AnySimpleType stands for the most
general simple type, and all other primitive
and simple types (like xs:integer or
xs:string) are subtypes of it.
In [XML Schema Part 2], a simple datatype is either a primitive datatype, or is derived from another simple datatype by specifying a set of facets. A type hierarchy is induced between simple types by containment of facets. Note that lists of simple datatypes are specified using repetition and unions are specified using alternation, as defined in [3.5 Types].
For simplicity, the type syntax in this document does not provide any way to define datatypes with facets. Such types can be imported from XML Schema and may be referenced by a qualified name QName. We let p range over all built-in datatypes, lists of built-in datatypes, and imported simple types.
[Figure 4] contains the abstract syntax for XQuery's type system. This type syntax appears in the typing rules in [4 Static Semantics : Type-Inference Rules]. An XQuery type corresponds to a content group as defined in [XML Schema : Formal Description]. See [Issue-0088: Align XQuery types with XML Schema : Formal Description.] for alignment issues between XQuery type syntax and [XML Schema : Formal Description].
type variable    y
unit type        u ::= p                        simple type
                     | ATTRIBUTE NameSet (t)
                     | ELEMENT NameSet (t)
                     | PROCESSING-INSTRUCTION
                     | COMMENT
                     | REFERENCE (t)
type             t ::= y                        type variable
                     | ()                       empty sequence
                     | Ø                        empty choice
                     | u                        unit type
                     | t1 , t2                  sequence, t1 followed by t2
                     | t1 & t2                  interleaved product
                     | t1 | t2                  choice, t1 or t2
                     | t min m max n            repetition of t m to n times
                     | t *                      repetition of t 0 to * times
                     | t +                      repetition of t one to * times
                     | t ?                      repetition of t 0 to 1 times
                     | PIC (t)
                     | MIXED (t)
bound            m, n ::= natural number or *
prime type       q ::= u | q | q
expanded QName   expQName ::= {anyURI}NCName
names            NameSet ::= QName | expQName | * | NCName:* | *:*
                     | NameSet OR NameSet
                     | NameSet DIFF NameSet
                     | (NameSet)
Figure 4: Abstract Syntax for Types
Figure 5: Precedence of Type Operators (highest to lowest)
A unit type is either a simple type, an attribute or element type with name in NameSet and content in t, a comment type, a processing-instruction type, or a node-reference type.
The empty sequence matches only the empty document; it is an identity for sequence (,) and for interleaved product (&). The empty choice matches no document; it is an identity for choice.
An interleaved product t1 & t2 consists of
nodes in t1 interleaved with nodes in t2.
The interleaved product (also known as the shuffle product) is a generalization of XML Schema's [XML Schema Part 1]
allgroups. t1 & t2
matches any sequence that is an interleaving of a sequence that matches t1
and a sequence that matches t2. For example,
(ELEMENT a(), ELEMENT b()) & ELEMENT c() =
ELEMENT a(), ELEMENT b(), ELEMENT c()
| ELEMENT a(), ELEMENT c(), ELEMENT b()
| ELEMENT c(), ELEMENT a(), ELEMENT b()
As another example, ELEMENT a()* & ELEMENT b()
matches any sequence of ELEMENT a()
and ELEMENT b() that has exactly one ELEMENT b().
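For finite sequences, the interleavings that t1 & t2 admits can be enumerated in Python. The function `interleavings` is a hypothetical helper operating on instance sequences, not on types: it returns every merge of two lists that preserves the internal order of each. A type such as ELEMENT a()* & ELEMENT b() denotes infinitely many sequences, so this approach can only enumerate instances of a fixed length.

```python
def interleavings(xs, ys):
    """All interleavings (shuffles) of xs and ys that keep each one's internal order."""
    if not xs:
        return [list(ys)]
    if not ys:
        return [list(xs)]
    take_x = [[xs[0]] + rest for rest in interleavings(xs[1:], ys)]
    take_y = [[ys[0]] + rest for rest in interleavings(xs, ys[1:])]
    return take_x + take_y


# (ELEMENT a(), ELEMENT b()) & ELEMENT c()
for seq in interleavings(["a", "b"], ["c"]):
    print(seq)
```

The three printed sequences correspond, in order, to the three alternatives of the choice shown above; for three a elements and one b element there are four interleavings, one for each position of b.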
Allgroups in XML Schema may only consist of global or local element declarations with lower bound 0 or 1, and upper bound 1. With these restrictions, an allgroup in XML Schema is equivalent to p1