XQuery: A Query Language for XML

W3C Working Draft 15 February 2001

This version:: http://www.w3.org/TR/2001/WD-xquery-20010215
Latest version:: http://www.w3.org/TR/xquery
Editors:: Don Chamberlin (IBM Almaden Research Center) <chamberlin@almaden.ibm.com>
Daniela Florescu (Propel) <DanaF@propel.com>
Jonathan Robie (Software AG) <Jonathan.Robie@SoftwareAG-USA.com>
Jérôme Siméon (Bell Labs, Lucent Technologies) <simeon@research.bell-labs.com>
Mugur Stefanescu (BroadVision) <mugur.stefanescu@broadvision.com>

Abstract

XML is an extremely versatile markup language, capable of labeling the information content of diverse data sources including structured and semi-structured documents, relational databases, and object repositories. A query language that uses the structure of XML intelligently can express queries across all these kinds of data, whether physically stored in XML or viewed as XML via middleware. Because query languages have traditionally been designed for specific kinds of data, most existing proposals for XML query languages are robust for particular types of data sources but weak for other types. This specification describes a new query language called XQuery, which is designed to be broadly applicable across all types of XML data sources.

Status of this document

This document is the first publicly available W3C Working Draft of XQuery, for review by W3C members and other interested parties. It is a draft document and may be updated, replaced, or made obsolete by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress".

Comments on this document should be sent to the W3C mailing list www-xml-query-comments@w3.org (archived at http://lists.w3.org/Archives/Public/www-xml-query-comments/).

This document was produced by the W3C XML Query Working Group, which is part of the W3C XML Activity. A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.

1 Introduction
2 The XQuery Language
    2.1 Path Expressions
    2.2 Element Constructors
    2.3 FLWR Expressions
    2.4 Operators in Expressions
    2.5 Conditional Expressions (IF THEN ELSE)
    2.6 Quantifiers
    2.7 Filtering
    2.8 Datatypes
    2.9 Functions
    2.10 User-Defined Datatypes
    2.11 Operations on Datatypes
    2.12 Structure of an XQuery Unit
3 Querying Relational Data
    3.1 Grouping
    3.2 Joins
4 Conclusion

Appendices

A References
B XQuery Grammar
C List of Operators
D Core Function Library
E XQuery Semantics
    E.1 Notations
    E.2 Algebra mapping for XQuery expressions
        E.2.1 Path expressions
        E.2.2 Node constructors
        E.2.3 FLWR expressions
        E.2.4 Operators
        E.2.5 Conditional Expressions
        E.2.6 Literal values
        E.2.7 Function Calls and Declarations
        E.2.8 Existential and Universal Quantifiers
        E.2.9 Variables
        E.2.10 Operations on Types
    E.3 Mapping of XQuery declarations to Algebra declarations
        E.3.1 Mapping of type declarations
        E.3.2 Mapping of namespace declarations
        E.3.3 Mapping of function declarations
        E.3.4 Predefined functions
F Java CUP Grammar for XQuery (Non-Normative)
G JavaCC Grammar (Non-Normative)
H XQuery Issues List (Non-Normative)
    H.1 Issue List
    H.2 XQuery Issues

1 Introduction

As increasing amounts of information are stored, exchanged, and presented using XML, the ability to intelligently query XML data sources becomes increasingly important. One of the great strengths of XML is its flexibility in representing many kinds of information, including information traditionally considered to be a database and information traditionally considered to be a document. To exploit this flexibility, an XML query language must include the features that are necessary for retrieving information from these diverse sources.

XQuery is designed to meet the requirements identified by the W3C XML Query Working Group [Requirements]. It is designed to be a small, easily implementable language in which queries are concise and easily understood. It is also flexible enough to query a broad spectrum of XML information sources, including both databases and documents. The Query Working Group has identified a requirement for both a human-readable query syntax and an XML-based query syntax. XQuery is designed to meet the first of these requirements. An alternative, XML-based syntax for the XQuery semantics will be defined separately.

XQuery is derived from an XML query language called Quilt, which in turn borrowed features from several other languages. From XPath [XPath] and XQL [XQL] it took a path expression syntax suitable for hierarchical documents. From XML-QL [XML-QL] it took the notion of binding variables and then using the bound variables to create new structures. From SQL [SQL] it took the idea of a series of clauses based on keywords that provide a pattern for restructuring data (the SELECT-FROM-WHERE pattern in SQL). From OQL [ODMG] it took the notion of a functional language composed of several different kinds of expressions that can be nested with full generality. Quilt was also influenced by reading about other XML query languages such as Lorel [Lorel] and YATL [YATL]. Quilt has most recently been described in [Quilt].

Important issues remain open in the design of XQuery. Some of these issues deal with relationships between XQuery and other XML activities, for example:

The semantics of XQuery are defined in terms of the operators of the XML Query Algebra (see Appendix E). The mapping of XQuery operators into Algebra operators is still being designed, and may result in some changes to XQuery and/or the Algebra.
The type system of XQuery is the type system of XML Schema. Work is in progress to ensure that the type systems of XQuery, the XML Query Algebra, and XML Schema are completely aligned.
The details of the operators supported on simple XML Schema datatypes will be defined by a joint XSLT/Schema/Query task force.
XQuery relies on path expressions for navigating in hierarchic documents. XQuery expects these path expressions to conform to the semantics of XPath 2.0, as defined by a joint XSLT/Query task force.

For more details on open issues, see Appendix H.

2 The XQuery Language

Like OQL, XQuery is a functional language in which a query is represented as an expression. XQuery supports several kinds of expression, and therefore its queries may take several different forms. The various forms of XQuery expressions can be nested with full generality, so the notion of a "subquery" is natural to XQuery. The input and output of a query are instances of a data model called the XML Query Data Model [QueryDataModel]. This data model is a refinement of the data model described in the XPath specification [XPath], in which a document is modeled as a tree of nodes. A fragment of a document, or a collection of documents, may lack a common root and may therefore be modeled as an ordered forest of nodes of various types, including element nodes, attribute nodes, and text nodes, as illustrated in Figure 1.

Figure 1: An instance of the XML query data model--an ordered forest

An XQuery syntax is given in Appendix B, and an initial approach to a formal definition of XQuery semantics is given in Appendix E.

The principal forms of XQuery expressions are as follows:

Path expressions
Element constructors
FLWR expressions
Expressions involving operators and functions
Conditional expressions
Quantified expressions
List constructors
Expressions that test or modify datatypes

Each of these expression-types is introduced and explained by a series of examples in the following sections. For syntactic details, please refer to the appendices.

In XQuery, keywords (such as FOR and LET) are case-insensitive, whereas identifiers (such as myBigBook) are case-sensitive.

A query may contain a comment, which is ignored during query processing. As in SQL, the beginning delimiter of a comment is a double hyphen and the ending delimiter is a newline character.

2.1 Path Expressions

XQuery path expressions use the abbreviated syntax of XPath, extended with a new "dereference" operator and a new type of predicate called a "range predicate" (described below). XPath syntax is used in several XML-related applications including XSLT [XSLT] and XPointer [XPointer].

In XQuery, the result of a path expression is an ordered list of nodes (of course, each node includes its descendant nodes, so the result can be thought of as an ordered forest.) The top-level nodes in the path expression result are ordered according to their position in the original hierarchy, in top-down, left-to-right order. The result of a path expression may contain duplicate values (i.e., multiple nodes with the same type and content), but it will not contain duplicate nodes (i.e., multiple nodes with the same node identity).

A path expression consists of a series of steps. Each step represents movement through a document in a particular direction, and each step can apply one or more predicates to eliminate nodes that fail to satisfy a given condition. The result of each step is a list of nodes that serves as a starting point for the next step.

A path expression can begin with an expression that identifies a specific node, such as the function document(string), which returns the root node of a named document. A query can also contain a path expression beginning with "/" or "//" which represents an implicit root node, determined by the environment in which the query is executed.

A complete discussion of XPath abbreviated syntax can be found in [XPath]. Briefly, the following symbols are used:

.	Denotes the current node.
..	Denotes the parent of the current node.
/	Denotes the root node, or a separator between steps in a path.
//	Denotes descendants of the current node.
@	Denotes attributes of the current node.
*	Denotes "any" (node with unrestricted name).
[ ]	Brackets enclose a Boolean expression that serves as a predicate for a given step.
[n]	When a predicate consists of an integer, it serves to select the element with the given ordinal number from a list of elements.

The following example uses a path expression consisting of three steps. The first step locates the root node of a document. The second step locates the second chapter element that is a child of the root element. The third step finds figure elements occurring anywhere within the chapter, but retains only those figure elements that have a caption with the value "Tree Frogs."

(Q1) In the second chapter of the document named "zoo.xml", find the figure(s) with caption "Tree Frogs".

document("zoo.xml")/chapter[2]//figure[caption = "Tree Frogs"]

It is sometimes desirable to isolate a list of elements whose ordinal numbers span a certain range. For this purpose, XQuery provides a RANGE predicate that is adapted from XQL [XQL], as illustrated in the following example. The first element in a list has ordinal number 1. The ordinal numbers of elements in a list are not affected by the presence of other types of nodes such as comments or processing instructions.

(Q2) Find all the figures in chapters 2 through 5 of the document named "zoo.xml."

document("zoo.xml")/chapter[RANGE 2 TO 5]//figure

In addition to the operators of the XPath abbreviated syntax, XQuery introduces an operator called the dereference operator ("->"). When a dereference operator follows an IDREF-type attribute, it returns the element(s) that are referenced by the attribute. A dereference operator is followed by a "name test" that specifies the target element. Following the usual XPath convention, a name test of "*" allows the target element to be of any type.

Dereference operators can be used in the steps of a path expression. For example, the following query uses a dereference operator to find the caption of the "fig" element referenced by the "refid" attribute of a "figref" element.

(Q3) Find captions of figures that are referenced by <figref> elements in the chapter of "zoo.xml" with title "Frogs".

document("zoo.xml")/chapter[title = "Frogs"]
   //figref/@refid->fig/caption

The XQuery dereference operator is similar in purpose to the id function of XPath. However, the right-arrow notation is designed to be easier to read, especially in path expressions that involve multiple dereferences. For example, suppose that a given document contains a set of <emp> elements, each of which contains a "mgr" attribute. The "mgr" attribute is of type IDREF, and it references another <emp> element that represents the manager of the given employee. The name of each employee is represented by a <name> element nested inside the <emp> element.

(Q4) List the names of the second-level managers of all employees whose rating is "Poor".

/emp[rating = "Poor"]/@mgr->emp/@mgr->emp/name

As in XPath, the identifiers used in XQuery expressions can be qualified by namespace prefixes [Namespaces]. XQuery provides a syntax for declaring the Universal Resource Identifier (URI) associated with each namespace prefix used in a query, as illustrated in the following example:

(Q5) In the document "zoo.xml", find <tiger> elements in the namespace defined by www.abc.com/names that contain any subelement in the namespace defined by www.xyz.com/names.

NAMESPACE abc = "www.abc.com/names"
NAMESPACE xyz = "www.xyz.com/names"
document("zoo.xml")//abc:tiger[xyz:*]

The XQuery NAMESPACE clause also provides a way to declare a default namespace that applies to all unqualified names used in a query, as shown in the following example which is equivalent to Q5:

(Q6) (Equivalent to Q5)

NAMESPACE DEFAULT = "www.abc.com/names"
NAMESPACE xyz = "www.xyz.com/names"
document("zoo.xml")//tiger[xyz:*]

If no default namespace is declared for a query, unqualified names used in the query are considered to match local names in any namespace.

2.2 Element Constructors

An XML element can be created by a kind of XQuery expression called an element constructor. An element constructor consists of a start tag and an end tag, enclosing an optional list of expressions that provide the content of the element. The start tag may also specify the values of one or more attributes. The name of the start tag may be specified either by a constant or a variable.

Although an element constructor is an expression in its own right, its typical use is nested inside another expression that binds one or more variables that are used in the element constructor. Both of the following examples are query fragments that refer to variables that are bound in some enclosing expression.

(Q7) Generate an <emp> element containing an "empid" attribute and nested <name> and <job> elements. The values of the attribute and nested elements are specified by variables that are bound in other parts of the query.

<emp empid = $id>
   <name> $n </name> ,
   <job> $j </job>
</emp>

In the following example, the name of the generated element is specified by a variable named $tagname. Note that, when a start-tag contains a variable name, the matching end-tag must contain the same variable name (see Query Q14 for a more interesting version of this example.)

(Q8) Generate an element with a computed name, containing nested elements named <description> and <price>.

<$tagname>
   <description> $d </description> ,
   <price> $p </price>
</$tagname>

In addition to elements, it is sometimes necessary to construct other kinds of XML nodes such as comments and processing instructions. Constructor functions are provided for this purpose, which may be invoked using the normal syntax for a function call. The following examples illustrate construction of a comment and a processing instruction:

comment("Houston, we have a problem.")
  
pi("MyFormatter", "Pagebreak")

Occasionally it is necessary to construct (or search for) an element such that the name of the element, or the name of one of its attributes, is the same as an XQuery keyword. This is made possible by the lexing rule that strings enclosed in single quotes are always considered to be identifiers (such as tagnames), even if they match a keyword of the language. The following table illustrates the use of quotes in XQuery, which is slightly different from that in XML:

Example	Meaning
FOR	XQuery keyword
'FOR'	Identifier (e.g., tagname)
"FOR"	Literal string value
\"	Literal double-quote (used in a string)
\'	Literal single-quote (used in a string)

The following example shows how an element constructor can be used to create an element whose name and/or attribute-name is the same as a XQuery keyword:

<'FOR' 'LET' = "WHERE"/>

The following example shows how a path expression might be used to search for the element created in the previous example:

//'FOR'[@'LET' = "WHERE"]

2.3 FLWR Expressions

A FLWR (pronounced "flower") expression is constructed from FOR, LET, WHERE, and RETURN clauses. As in an SQL query, these clauses must appear in a specific order. A FLWR expression binds values to one or more variables and then uses these variables to construct a result (in general, an ordered forest of nodes). The overall flow of data in a FLWR expression is illustrated in Figure 2.

The first part of a FLWR expression consists of FOR-clauses and/or LET-clauses which serve to bind values to one or more variables. The values to be bound to the variables are represented by expressions (for example, path expressions).

A FOR-clause is used whenever iteration is needed. The FOR-clause introduces one or more variables, associating each variable with an expression that returns a list of nodes. The result of the FOR-clause is a list of tuples, each of which contains a binding for each of the variables. The variables are bound to individual nodes returned by their respective expressions, in such a way that the binding-tuples represent the cross-product of the node-lists returned by all the expressions. Each variable in a FOR-clause can be thought of as iterating over the nodes returned by its respective expression.

A LET-clause is also used to bind one or more variables to one or more expressions. Unlike a FOR-clause, however, a LET-clause simply binds each variable to the value of its respective expression without iteration, resulting in a single binding for each variable. The difference between a FOR-clause and a LET-clause can be illustrated by a simple example. The clause FOR $x IN /library/book results in many bindings, each of which binds the variable $x to one book in the library. On the other hand, the clause LET $x := /library/book results in a single binding which binds the variable $x to a list containing all the books in the library.

A FLWR expression may contain several FOR and LET-clauses, and each of these clauses may contain references to variables bound in previous clauses. The result of the sequence of FOR and LET clauses is an ordered list of tuples of bound variables. The number of tuples generated by a FOR/LET sequence is the product of the cardinalities of the node-lists returned by the expressions in the FOR-clauses. The tuples generated by the FOR/LET sequence have an order that is determined by the order of their bound elements in the input document, with the first bound variable taking precedence, followed by the second bound variable, and so on. However, if some expression used in a FOR-clause is unordered (for example, because it contains a distinct function), the tuples generated by the FOR/LET sequence are unordered.

Fig. 2: Flow of data in a FLWR expression

Each of the binding-tuples generated by the FOR and LET clauses is subject to further filtering by an optional WHERE-clause. Only those tuples for which the condition in the WHERE-clause is true are used to invoke the RETURN clause. The WHERE-clause may contain several predicates, connected by AND, OR, and NOT. These predicates usually contain references to the bound variables. Variables bound by a FOR-clause represent a single node (with its descendants) and so they are typically used in scalar predicates such as $p/color = "Red". Variables bound by a LET-clause, on the other hand, may represent lists of nodes, and can be used in list-oriented predicates such as avg($p/price) > 100. The ordering of the binding-tuples generated by the FOR and LET clauses is preserved by the WHERE-clause.

The RETURN-clause generates the output of the FLWR expression, which may be a node, an ordered forest of nodes, or a primitive value. The RETURN-clause is executed once for each tuple of bindings that is generated by the FOR and LET-clauses and satisfies the condition in the WHERE-clause. If an ordering exists among these tuples, the RETURN-clause is executed on each tuple, in order, and the order of the results is preserved in the output document. The RETURN-clause contains an expression that often contains element constructors, references to bound variables, and nested subexpressions.

We will consider some examples of FLWR expressions based on a document named "bib.xml" that contains a list of <book> elements. Each <book> element, in turn, contains a <title> element, one or more <author> elements, a <publisher> element, a <year> element, and a <price> element. The first example is so simple that it could have been expressed using a path expression, but it is perhaps more readable when expressed as a FLWR expression.

(Q9) List the titles of books published by Morgan Kaufmann in 1998.

FOR $b IN document("bib.xml")//book
WHERE $b/publisher = "Morgan Kaufmann"
AND $b/year = "1998"
RETURN $b/title

Example Q10 uses a distinct function in the FOR-clause to eliminate duplicates from the list of publishers found in the input document. Two elements are considered to be duplicates if their values (including name, attributes, and normalized content) are equal. The result of the distinct function is an unordered set of elements. Example Q10 then uses a LET-clause to bind a variable to the average price of books published by each of the publishers bound in the FOR-clause.

(Q10) List each publisher and the average price of its books.

FOR $p IN distinct(document("bib.xml")//publisher)
LET $a := avg(document("bib.xml")
   /book[publisher = $p]/price)
RETURN 
   <publisher>
      <name> $p/text() </name> ,
      <avgprice> $a </avgprice>
   </publisher>

The next example uses a LET-clause to bind a variable $b to a set of books, and then uses a WHERE-clause to apply a condition to the set, retaining only bindings in which $b contains more than 100 elements. This query also illustrates the common practice of enclosing a FLWR expression inside an element constructor which provides an enclosing element for the query result.

(Q11) List the publishers who have published more than 100 books.

<big_publishers>
   FOR $p IN distinct(document("bib.xml")//publisher)
   LET $b := document("bib.xml")/book[publisher = $p]
   WHERE count($b) > 100
   RETURN $p
</big_publishers>

FLWR expressions are often useful for performing structural transformations on documents, as illustrated by the next query, which inverts a hierarchy. This example also illustrates how one FLWR expression can be nested inside another.

(Q12) Invert the structure of the input document so that, instead of each book element containing a list of authors, each distinct author element contains a list of book-titles.

<author_list>
   FOR $a IN distinct(document("bib.xml")//author)
   RETURN
      <author>
         <name> $a/text() </name>,
         FOR $b IN document("bib.xml")//book[author = $a]
         RETURN $b/title
      </author>
</author_list>

LET-clauses are useful for breaking up long expressions, making queries more readable. They can also be helpful in simplifying a query that makes multiple uses of the same expression (called a "common subexpression.") In the following example, the average price of books is a common subexpression that is bound to variable $a and then used repeatedly in the body of the query.

(Q13) For each book whose price is greater than the average price, return the title of the book and the amount by which the book's price exceeds the average price.

<result>
   LET $a := avg(//book/price)
   FOR $b IN /book
   WHERE $b/price > $a
   RETURN
      <expensive_book>
         $b/title ,
         <price_difference>
            $b/price - $a
         </price_difference>
      </expensive_book>
</result>

A LET-clause can be used in conjunction with an element constructor to replicate some parts of an existing element, as in the following example. This example uses the XPath functions name(element), which returns the tagname of an element, and number(element), which returns the content of an element expressed as a number. When an expression inside the body of an element constructor evaluates to one or more attributes, those attributes are considered to be attributes of the element that is being constructed.

(Q14) Variable $e is bound to some element with numeric content. Construct a new element having the same name and attributes as $e, and with numeric content equal to twice the content of $e.

LET $tagname := name($e)
RETURN
   <$tagname>
      $e/@*,   -- replicates the attributes of $e
      2 * number($e)
   </$tagname>

By default, a FLWR expression preserves the ordering of elements in the input document(s). However, it is often important to specify an ordering for the elements in a query result that supplements or supercedes the order derived from the input. If a query result contains several levels of nested elements, an ordering may be required among the elements at each level. XQuery provides a SORTBY clause that may be used after any expression to specify an ordering among the resulting elements. The arguments of the SORTBY clause are evaluated within the context of the individual nodes to be sorted, and may be followed by ASCENDING or DESCENDING to specify the direction of the sort (ASCENDING is the default.) The use of SORTBY is illustrated by the following example.

(Q15) Make an alphabetic list of publishers. Within each publisher, make a list of books, each containing a title and a price, in descending order by price.

<publisher_list>
   FOR $p IN distinct(document("bib.xml")//publisher)
   RETURN
      <publisher>
         <name> $p/text() </name> ,
         FOR $b IN document("bib.xml")//book[publisher = $p]
         RETURN
            <book>
               $b/title ,
               $b/price
            </book> SORTBY(price DESCENDING)
      </publisher> SORTBY(name)
</publisher_list>

2.4 Operators in Expressions

Like most languages, XQuery allows expressions to be constructed using infix and prefix operators, and allows nested expressions inside parentheses to serve as operands. XQuery supports the usual set of arithmetic and logical operators, and the collection operators UNION, INTERSECT, and EXCEPT. A complete list of these operators and a specification of their semantics as applied to various datatypes is provided in Appendix C.

From XQL, XQuery inherits the infix operators BEFORE and AFTER, which are useful in searching for information by its ordinal position. Each instance of the XML Query data model (regardless of whether it is a complete document, a fragment of a document, or a list of documents) is a forest that includes a total ordering, called "document order," among all its nodes. BEFORE operates on two lists of elements and returns those elements in the first list that occur before at least one element of the second list in document order (of course, this is possible only if the two lists are subsets of the same data model instance.) AFTER is defined in a similar way. Since BEFORE and AFTER are based on global document ordering, they can compare the positions of elements that do not have the same immediate parent. For the same reason, BEFORE and AFTER do not require their operands to have a local ordering. The next two examples illustrate the use of BEFORE and AFTER by retrieving excerpts from a surgical report that includes <procedure>, <incision>, and <anesthesia> elements.

(Q16) Prepare a "critical sequence" report consisting of all elements that occur between the first and second incision in the first procedure.

<critical_sequence>
   LET $p := //procedure[1]
   FOR $e IN //* AFTER ($p//incision)[1] 
          BEFORE ($p//incision)[2]
   RETURN shallow($e)
</critical_sequence>

The shallow function makes a shallow copy of a node, including attributes but not including subelements.

(Q17) Find procedures in which no anesthesia occurs before the first incision.

-- Finds potential lawsuits
FOR $p in //procedure
WHERE empty($p//anesthesia BEFORE ($p//incision)[1])
RETURN $p

The empty function returns True if and only if its argument is an empty list.

2.5 Conditional Expressions (IF THEN ELSE)

Conditional expressions are useful when the structure of the information to be returned depends on some condition. Of course, like all XQuery expressions, conditional expressions can be nested and can be used wherever a value is expected.

As an example of a conditional expression, consider a library that has many holdings, each described by a <holding> element with a "type" attribute that identifies its type: book, journal, etc. All holdings have a title and other nested elements that depend on the type of holding.

(Q18) Make a list of holdings, ordered by title. For journals, include the editor, and for all other holdings, include the author.

FOR $h IN //holding
RETURN
   <holding>
      $h/title,
      IF $h/@type = "Journal"
      THEN $h/editor
      ELSE $h/author
   </holding> SORTBY (title)

2.6 Quantifiers

Occasionally it is necessary to test for existence of some element that satisfies a condition, or to determine whether all elements in some collection satisfy a condition. For this purpose, XQuery provides existential and universal quantifiers. The existential quantifier is illustrated in Q19, and the universal quantifier is illustrated in Q20.

(Q19) Find titles of books in which both sailing and windsurfing are mentioned in the same paragraph.

FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES
   contains($p, "sailing") 
   AND contains($p, "windsurfing")
RETURN $b/title

(Q20) Find titles of books in which sailing is mentioned in every paragraph.

FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES
   contains($p, "sailing")
RETURN $b/title

2.7 Filtering

One of the functions in the XQuery core function library is called filter. This function takes two operands, each of which is an expression that, in general, evaluates to an ordered forest of nodes. filter returns copies of some of the nodes in the forest represented by the first operand, while preserving their hierarchic and sequential relationships. The nodes that are copied into the result are those nodes that are present at any level in the first operand and are also top-level nodes in the second operand. Thus the second operand is used as a "filter" that selects nodes from the forest represented by the first operand. The filtering process is based on node identity; that is, it requires both operands to contain the same node, not just two nodes with the same value. Obviously, if the two operands do not have a common root, the result of the filter function is the empty list.

The action of a filter function is illustrated by Figure 3, which shows an element hierarchy that might result from evaluating the path expression /C. Figure 3 also shows the result of the function call filter(/C, //A | //B). The result contains copies of all nodes of type A and B, and where a hierarchic or sequential relationship exists among these nodes, the relationship is preserved.

Fig 3: Action of FILTER on a hierarchy

The filter function is useful in "pruning" a document, eliminating undesired parts while retaining the document structure. The following example illustrates this process by computing a table of contents for a document that contains many levels of nested sections. The query filters the document, retaining only section elements, title elements nested directly inside section elements, and the text of those title elements. Other elements, such as paragraphs and figure titles, are eliminated, leaving only the "skeleton" of the document.

In this example, the first argument of filter is the root of a document, and the second argument is a path expression that identifies the nodes to be preserved from the original document.

(Q21) Prepare a table of contents for the document "cookbook.xml", containing nested sections and their titles.

LET $b := document("cookbook.xml")
RETURN
   <toc>
      filter($b, $b//section | $b//section/title | $b//section/title/text() )
   </toc>

2.8 Datatypes

The type system of XQuery is the type system of XML Schema. Each XQuery expression has a datatype that can be declared using the language of XML Schema.

In XQuery, datatype names appear in function declarations where they specify the datatypes of the function parameters and result. Datatype names are also used in CAST and TREAT expressions and as operands of the INSTANCEOF operator. A datatype name is a qualified name. By using the datatype names defined in the namespace http://www.w3.org/2000/10/XMLSchema-datatypes (hereafter abbreviated as xsd), all the primitive and derived datatypes of XML Schema can be used in queries. More complex datatypes can also be defined, as described in Section 2.10.

Certain XML Schema datatypes have literal forms that are recognized by the XQuery lexer, as illustrated by the following examples:

Type	Example of literal
xsd:string	"Hello"
xsd:boolean	TRUE, FALSE
xsd:integer	47, -369
xsd:decimal	-2.57
xsd:float	-3.805E-2

Literal values of XML Schema types other than string, boolean, integer, decimal, and float can be specified by means of constructor functions such as date("2000-06-25") or by cast expressions such as CAST AS xsd:positiveInteger(47).

NOTE: The set of constructor functions has not yet been fixed. A task force has been assigned to determine the set of operators on XML Schema types, and we expect this task force to design the set of constructor functions.

It is common to declare functions whose parameters and/or results can be XML elements of any type. XQuery provides a shorthand notation for this common case, which avoids the necessity to make a namespace declaration. The keyword ELEMENT is a shorthand for the type defined by <xsd:any>, which represents an element of any type. The xsd namespace also provides other forms of "wild cards"--for example, a type defined by <xsd:anyAttribute> represents any attribute.

Another datatype that is very commonly used is a repetition of zero or more instances of some type. In XML Schema, this is represented by the facets minOccurs="0" maxOccurs="unbounded". XQuery provides a shorthand notation for this common case: LIST( x ) represents zero or more occurrences of the type denoted by x. For example, LIST(Book) is a shorthand notation for a type defined by <xsd:element ref="Book" minOccurs="0" maxOccurs="unbounded">, and LIST(ELEMENT) is a shorthand notation for a type defined by <xsd:any minOccurs="0" maxOccurs="unbounded">.

In XML, lists of elements are always ordered. However, during processing of a query, some intermediate results may be collections of elements in which order is not significant. Various XQuery functions operate on lists of elements and change their properties: for example, the distinct function operates on a list to remove duplicate elements and to make its ordering insignificant. The SORTBY operator operates on a list to reorder it and to make its order significant. Properties of lists, such as length, order-significance, and presence or absence of duplicates, are not significant for purposes of function resolution.

A list may be constructed by enclosing zero or more expressions in square brackets, separated by commas. For example, [$x, $y, $z] denotes a list containing three members represented by variables, and [ ] denotes an empty list. The square-bracket list constructor always "flattens" any of its arguments that are lists themselves, so that the result of the list constructor is a one-level list. For example, the following two expressions are equivalent, and both result in a one-level list containing four integers:

[[1, 2], 3, [ ], [[4]]]

[1, 2, 3, 4]

The rule restricting data values to one-level lists avoids complications in function resolution and ensures that all XQuery values can be easily converted to XML format.

2.9 Functions

XQuery provides a core library of built-in functions for use in queries, as specified in Appendix D. We have already used some of these core functions, such as document, which returns the root node of a named document. The XQuery core function library contains all the functions of the XPath core function library, all the aggregation functions of SQL (such as avg, sum, count, max, and min), and a number of other useful functions. For example, the distinct function eliminates duplicates from a list, and the empty function returns TRUE if and only if its argument is an empty list.

In addition to the built-in functions, XQuery allows users to define functions of their own. Each function definition must declare the datatypes of its parameters and result, and must provide an expression (the "body" of the function) that defines how the result of the function is computed from its parameters. When a function is invoked, its arguments must be valid instances of the declared parameter types. The result of a function must also be a valid instance of its declared type. These rules are checked using the type-inference rules of the XML Query Algebra.

In future work, we expect to define an extensibility mechanism whereby function definitions with global scope, written in various programming languages, can be added to the XQuery function library.

XQuery Version 1 does not allow user-defined functions to be overloaded--that is, it does not allow multiple functions to be declared with the same name and the same number of parameters. We consider function overloading to be a useful and important feature that deserves further study in future versions of XQuery. Although XQuery does not allow overloading of user-defined functions, some of the built-in functions in the XQuery core library are overloaded--for example, the string function of XPath can convert an instance of almost any type into a string.

It is possible in XQuery to invoke a function on a list of arguments whose types do not exactly match the declared parameter types of the function. The process of finding the best available function for a given invocation is calledfunction resolution. The rules for function resolution in XQuery Version 1 are as follows:

A fixed "promotion hierarchy" is defined among the primitive and derived types of XML Schema (see Appendix C). For example, part of the promotion hierarchy might look like this:
```
integer -> decimal -> float -> double
```
This fragment of the promotion hierarchy would indicate, for example, that a function with a declared parameter of float could be called with an integer argument, causing the integer to be converted to a float. The "best" function for a given call is found by searching among the available functions for the one whose declared parameters are closest in the promotion hierarchy to the arguments of the function call, considering the arguments from left to right.
A declared parameter-type matches an argument that is an instance of the named type or one of its subtypes. This rule is called subtype substitutability.
A function whose parameter type is a list of a given type can be invoked with an argument that is a single instance of the given type. This rule follows directly from the definition of a list in XQuery, which is based on the facets minOccurs="0" maxOccurs="unbounded".
A function with a parameter of a given type can be invoked with an argument that is a list of the given type. The result is a list, in which each member is the result of invoking the function on one of the members of the argument list, in their order of occurrence. For example, suppose that the function price(Book) is declared to take a Book and return an integer. If the price function is invoked on a list of Books, the result is a list of integers representing the prices of the Books, in order. This rule recognizes the fact that most path expressions return lists, and it is desirable to allow a path expression to be used as the argument of a function without requiring the function body to explicitly iterate over the members of the list.

When this rule is applied, any individual function results that are lists are "flattened" before combining them into a single list that represents the final result of the function invocation. This rule prevents functions from returning lists of lists, which are not allowed in the query data model.

This rule generalizes to functions with multiple parameters in the following way: suppose that N arguments of a function-call are lists that match function-parameters where single elements are expected. Then the result of the function-call is a list whose individual members are the results of invoking the function on tuples of arguments taken from the Cartesian product of the N input lists.

A function may be defined recursively-that is, it may be referenced in its own definition. The next query contains an example of a recursive function that computes the depth of an element hierarchy. In its definition, the user-defined function depth calls the built-in functions emptyand max.

(Q22) Find the maximum depth of the document named "partlist.xml."

NAMESPACE xsd = "http://www.w3.org/2000/10/XMLSchema-datatypes"

FUNCTION depth(ELEMENT $e) RETURNS xsd:integer
{
   -- An empty element has depth 1
   -- Otherwise, add 1 to max depth of children
   IF empty($e/*) THEN 1
   ELSE max(depth($e/*)) + 1
}

depth(document("partlist.xml"))

To further illustrate the power of functions, we will write a function that returns all the elements that are "connected" to a given element by child or reference connections, and a recursive function that returns all the elements that are "reachable" from a given element by child or reference connections.

(Q23) In the document "company.xml", find all the elements that are reachable from the employee with serial number 12345 by child or reference connections.

FUNCTION connected(ELEMENT $e) RETURNS LIST(ELEMENT)
{ 
    $e/* UNION $e/@*->* 
}

FUNCTION reachable(ELEMENT $e) RETURNS LIST(ELEMENT)
{ 
    $e UNION reachable(connected($e)) 
}

reachable(document("company.xml")/emp[serial="12345"])

In the above example, the reachable function invokes itself recursively to find all the elements that are reachable from the elements that are directly connected to the parameter $e.

As another example, we might use the filter function together with the reachable function defined in Q23 to return all the elements that are reachable from a specific employee element, while preserving their hierarchic and sequential relationships.

(Q24) Return a fragment of the document "company.xml" consisting of all nodes reachable from employee no. 12345, preserving the original relationships among the reachable nodes.

LET $c := document("company.xml")
RETURN filter($c, reachable($c/emp[empno="12345"]))

Of course, it is possible to write a recursive function that fails to terminate for some set of arguments. In fact, the reachablefunction in the previous example will fail to terminate if called on an element that references one of its ancestors. It is the user's responsibility to avoid writing a nonterminating function call.

2.10 User-Defined Datatypes

In addition to the primitive and derived datatypes of XML Schema, any datatype that can be constructed using the definition facilities of XML Schema can be used as an XQuery datatype. XML Schema language can be used to define an element or datatype and to give it a qualified name, which can then be used in an XQuery function declaration. For example, a schema might define an element named PurchaseOrder by specifying a set of attributes and a content model based on sequences and alternations of various other elements. PurchaseOrder could then be used as the type of a function parameter in a query.

A query can refer to element-names and type-names that are defined in any of the following schemas:

Schemas that are referenced by documents used in the query; i.e., the implicit input document or any document referenced by the documentfunction.
Schemas that are associated with namespaces declared at the beginning of the query, such as NAMESPACE xsd = "http://www.w3.org/2000/10/XMLSchema-datatypes".
Under consideration: A query might include a "preamble" in which local datatypes could be defined using XML Schema notation. Since Schema uses XML notation, this feature would require an XQuery implementation to include an XML parser. (See Issues)

In the following example, a schema defines complex types named "emp_type" and "dept_type" in the target namespace "http://www.BigCompany.com/BigNames". A query then uses these datatypes to define and invoke a function.

(Q25) Define complex types named "emp_type" and "dept_type" in a target namespace.

<?xml version="1.0">
<schema xmlns="http://www.w3.org/2000/10/XMLSchema"
   targetNamespace="http://www.BigCompany.com/BigNames">

   <complexType name="emp_type">
      <sequence>
         <element name="name" type="string"/>
         <element name="deptno" type="string"/>
         <element name="salary" type="decimal"/>
         <element name="location" type="string"/>
      </sequence>
   </complexType>

   <complexType name="dept_type">
      <sequence>
         <element name="deptno" type="string"/>
         <element name="headcount" type="integer"/>
         <element name="payroll" type="decimal"/>
      </sequence>
   </complexType>

</schema>

(Q26) Using the datatypes defined in Q25, create a function that summarizes employees by department, and use this function to summarize all the employees of Acme Corp. that are located in Denver.

NAMESPACE DEFAULT = "http://www.BigCompany.com/BigNames"

FUNCTION summary(LIST(emp_type) $emps) RETURNS LIST(dept_type)
   {
   FOR $d IN distinct($emps/deptno)
   LET $e := $emps[deptno = $d]
   RETURN
      <dept>
         $d,
         <headcount> count($e) </headcount>,
         <payroll> sum($e/salary) </payroll>
      </dept>
   }

summary(document("acme_corp.xml")/emp[location = "Denver"] )

It is sometimes desirable to validate that the result of a query conforms to a specific datatype. This can be done by taking advantage of the fact that XQuery functions validate the datatypes of their parameters and results. For example, suppose that some query Q is intended to generate output that conforms to the datatype abc:PurchaseOrder (for some suitable binding of the namespace prefix abc). The query Q can be type-validated by "wrapping" it in a function that takes an instance of the desired type as a parameter and simply returns it, as in the following example:

(Q27) Validate that the result of a query Q conforms to the datatype abc:PurchaseOrder (Q may be arbitrarily complex).

FUNCTION check(abc:PurchaseOrder $po) RETURNS abc:PurchaseOrder
   { $po }
   
check(Q)

2.11 Operations on Datatypes

The Boolean operator INSTANCEOF returns True if its first operand is an instance of the type named in its second operand; otherwise it returns False. For example, $x INSTANCEOF zoonames:animal returns True if the dynamic type of $x is zoonames:animal or a subtype of zoonames:animal. The INSTANCEOF operator has the same syntax and behavior as the instanceof operator in Java.

Occasionally it is necessary to convert a value from one datatype to another. For the primitive and derived types of XML Schema, a CAST notation is supported that provides conversions between certain combinations of types. For example, the notation CAST AS integer (x DIV y) converts the result of x DIV y into the integer type. The set of type conversions that are supported by the CAST notation is specified in Appendix C. Conversions among user-defined datatypes are not supported by the CAST notation, but user-defined functions can be written for this purpose.

In addition to CAST, XQuery provides a notation called TREAT. Rather than converting an expression from one datatype to another, TREAT causes the query processor to treat an expression as though its datatype were a subtype of its static type. For example, TREAT AS Cat($mypet) tells the query processor to treat the variable $mypet as though it were an instance of the type Cat, even though the static type of $mypet is a supertype of Cat such as Animal. This notation allows functions that require an argument of type Cat to be invoked on the variable $mypet. At query execution time, if the dynamic type of $mypet is not Cat, an error results.

The following example shows how INSTANCEOF and TREAT can be combined to simulate a primitive form of subtype polymorphism. A more robust treatment of polymorphic functions is deferred to a later version of XQuery.

(Q28) Define a function named sound(animal) that returns different strings for various types of animals. Use the function in a query that returns the sounds made by all of Billy's pets.

-- First define some functions to set the stage
NAMESPACE xsd = "http://www.w3.org/2000/10/XMLSchema-datatypes"

FUNCTION quack(duck $d) RETURNS xsd:string
   { "String depends on properties of duck" }

FUNCTION woof(dog $d) RETURNS xsd:string
   { "String depends on properties of dog" }

--This function illustrates simulated subtype polymorphism

FUNCTION sound(animal $a) RETURNS xsd:string
   {
   IF $a INSTANCEOF duck THEN quack(TREAT AS duck($a))
   ELSE IF $a INSTANCEOF dog THEN woof(TREAT AS dog($a))
   ELSE "No sound"
   }

-- This query returns the sounds made by all of Billy's pets

FOR $p IN /kid[name="Billy"]/pet
RETURN sound($p)

2.12 Structure of an XQuery Unit

An XQuery unit is a string that can be recognized by an XQuery parser. At present, the following types of XQuery units are defined:

Queries: A query consists of zero or more namespace declarations, followed by zero or more function definitions, followed by an expression. The expression may invoke the functions declared in the query unit as well as functions declared in other function libraries. The result of the query is the value of the expression.
Function libraries: A function library consists of zero or more namespace declarations, followed by one or more function definitions. Function libraries are used to define functions for general use by multiple queries. The means by which a query gains access to a function library is not yet defined (see Issues).

Future versions of the language may define other types of XQuery units, such as updates and view definitions.

3 Querying Relational Data

Since much of the world's business data is stored in relational databases, access to relational data is a vitally important application for an XML query language. In this section, we will illustrate the use of XQuery to access relational data by a series of examples based on a schema that is often used in relational database tutorials, containing descriptions of suppliers and parts, as shown in Figure 4. In this schema, Table S contains supplier numbers and names; Table P contains part numbers and descriptions, and Table SP contains contains the relationships between suppliers and the parts they supply, including the price of each part from each supplier.

Fig. 4: One possible XML representation of relational data

Figure 4 also shows how the schema of parts and suppliers might be translated into an XML view in which each table appears as a document, each row of a table appears as an element inside the document, and each value inside a row appears as a nested element. Other, more richly structured views can be defined on top of this simple view by using XQuery syntax.

SQL [SQL] is the standard relational database language. In many cases, SQL queries can be converted to XQuery syntax in a straightforward way by mapping SQL query-blocks into FLWR-expressions. We illustrate this mapping by the following query:

(Q29) Find part numbers of gears, in numeric order.

SQL version:

SELECT pno
FROM p
WHERE descrip LIKE 'Gear'
ORDER BY pno;

XQuery version:

FOR $p IN document("p.xml")//p_tuple
WHERE contains($p/descrip, "Gear")
RETURN $p/pno SORTBY(.)

In XQuery, the operand of SORTBY is always interpreted within the context of the element to be sorted. Since the <pno> elements generated by Q29 have no internal structure, we use the notation "SORTBY(.)", which causes the <pno> elements to be sorted by their content.

3.1 Grouping

Many relational queries involve forming data into groups and applying some aggregation function such as count or avg to each group. In SQL, these queries are expressed using GROUP BY and HAVING clauses. The following example shows how such a query might be expressed in XQuery:

(Q30) Find the part number and average price for parts that have at least 3 suppliers.

SQL version:

SELECT pno, avg(price) AS avgprice
FROM sp
GROUP BY pno
HAVING count(*) >= 3
ORDER BY pno;

XQuery version:

FOR $pn IN distinct(document("sp.xml")//pno)
LET $sp := document("sp.xml")//sp_tuple[pno = $pn]
WHERE count($sp) >= 3
RETURN 
   <well_supplied_item>
      $pn,
      <avgprice> avg($sp/price) </avgprice>
   </well_supplied_item> SORTBY(pno)

The distinct function in this query eliminates duplicate part numbers from the set of all part numbers in the document. The result of distinct is a list in which order is not significant.

Note that $pn, bound by a FOR-clause, represents an individual part number, whereas $sp, bound by a LET-clause, represents a set of sp-tuples. The SQL HAVING clause, which applies a predicate to a set, is mapped into a XQuery WHERE-clause that operates on the set-valued variable $sp. The XQuery version of the query also uses an element constructor to enclose each part number and average price in a containing element called <well_supplied_item>.

3.2 Joins

Joins, which combine data from multiple sources into a single query result, are among the most important forms of relational queries. In this section we will illustrate how several types of joins can be expressed in XQuery.

A conventional ("inner") join returns information from two or more related tables, as illustrated by example Q31.

(Q31) Return a "flat" list of supplier names and their part descriptions, in alphabetic order.

FOR $sp IN document("sp.xml")//sp_tuple,
    $p IN document("p.xml")//p_tuple[pno = $sp/pno],
    $s IN document("s.xml")//s_tuple[sno = $sp/sno]
RETURN
   <sp_pair>
      $s/sname ,
      $p/descrip
   </sp_pair> SORTBY (sname, descrip)

Q31 returns information only about parts that have suppliers and suppliers that have parts. An "outer join" is a join that preserves information from one or more of the participating tables, including those rows that have no matching row in the joined table. For example, a "left outer join" between suppliers and parts might return information about suppliers that have no matching parts. In place of the missing parts data, relational systems usually return null values; but an XML query might represent the missing data by an empty element or the absence of an element. Q32 is an example of a query that corresponds to a left outer join.

(Q32) Return names of all the suppliers in alphabetic order, including those that supply no parts; inside each supplier element, list the descriptions of all the parts it supplies, in alphabetic order.

FOR $s IN document("s.xml")//s_tuple
RETURN
   <supplier>
      $s/sname,
      FOR $sp IN document("sp.xml")//sp_tuple
               [sno = $s/sno],
          $p IN document("p.xml")//p_tuple
               [pno = $sp/pno]
      RETURN $p/descrip SORTBY(.)
   </supplier> SORTBY(sname)

Another type of join that is sometimes used in relational systems is a "full outer join," which preserves information from both of the participating tables, including rows of each table that have no matching rows in the other table. In XML, the result of a full outer join can be structured in any of several ways. The example in Q33 uses a format of parts nested inside suppliers, followed by a list of parts that have no supplier. This might be thought of as a "supplier-centered" full outer join. A "part-centered" full outer join, on the other hand, might return a list of suppliers nested inside parts, followed by a list of suppliers that have no parts. Other forms of outer join queries are also possible.

(Q33) Return names of suppliers and descriptions and prices of their parts, including suppliers that supply no parts and parts that have no suppliers.

<master_list>
    (FOR $s IN document("s.xml")//s_tuple
     RETURN
        <supplier>
           $s/sname,
           FOR $sp IN document("sp.xml")//sp_tuple
                   [sno = $s/sno],
               $p IN document("p.xml")//p_tuple
                   [pno = $sp/pno]
           RETURN
              <part>
                 $p/descrip,
                 $sp/price
              </part> SORTBY (descrip)
        </supplier> SORTBY(sname) 
    )
 UNION
    -- parts that have no supplier
    <orphan_parts>
       FOR $p IN document("p.xml")//p_tuple
       WHERE empty(document("sp.xml")//sp_tuple
             [pno = $p/pno] )
       RETURN $p/descrip SORTBY(.)
    </orphan_parts>
</master_list>

Q33 uses an element constructor to enclose its output inside a <master_list> element. The UNION operator, when used as in Q33 to combine two ordered lists, returns the first list with the second list appended at the end (removing duplicate node identities but not duplicate node values). The result is a <master_list> element containing an ordered list of <supplier> elements followed by an <orphan_parts> element that contains descriptions of all the parts that have no supplier.

4 Conclusion

With the emergence of XML, the distinctions among various forms of information, such as documents and databases, are quickly disappearing. XQuery is designed to support queries against a broad spectrum of information sources by incorporating features from several languages that were originally designed for diverse purposes. The versatility of XQuery will help XML to realize its potential as a universal medium for data interchange.

This specification describes XQuery Version 1. Future versions of XQuery may include additional features such as the following:

Data definition facilities for persistent views.
Function overloading and polymorphic functions.
Facilities for updating XML data.

A References

XML10: World Wide Web Consortium. Extensible Markup Language (XML) 1.0. W3C Recommendation, Feb. 10, 1998. Seehttp://www.w3.org/TR/1998/REC-xml-19980210.
XMLSchema: World Wide Web Consortium. XML Schema, Parts 0, 1, and 2. W3C Working Draft, April 7, 2000. Seehttp://www.w3.org/TR/xmlschema-0/,http://www.w3.org/TR/xmlschema-1/, and http://www.w3.org/TR/xmlschema-2/.
Requirements: World Wide Web Consortium. XML Query Requirements. W3C Working Draft, Jan. 31, 2000. Seehttp://www.w3.org/TR/xmlquery-req.
: World Wide Web Consortium. XML Query Algebra. W3C Working Draft, 2 February, 2001. See http://www.w3.org/TR/query-algebra/.
XPath: World Wide Web Consortium. XML Path Language (XPath) Version 1.0. W3C Recommendation, Nov. 16, 1999. Seehttp://www.w3.org/TR/xpath.html
XQL: J. Robie, J. Lapp, D. Schach. XML Query Language (XQL). See http://www.w3.org/TandS/QL/QL98/pp/xql.html.
XML-QL: Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy, and Dan Suciu. A Query Language for XML. Seehttp://www.research.att.com/~mff/files/final.html
SQL: International Organization for Standardization (ISO). Information Technology-Database Language SQL. Standard No. ISO/IEC 9075:1999. (Available from American National Standards Institute, New York, NY 10036, (212) 642-4900.)
ODMG: Rick Cattell et al. The Object Database Standard: ODMG-93, Release 1.2. Morgan Kaufmann Publishers, San Francisco, 1996.
Lorel: Serge Abiteboul, Dallan Quass, Jason McHugh, Jennifer Widom, and Janet L. Wiener. The Lorel Query Language for Semistructured Data. International Journal on Digital Libraries, 1(1):68-88, April 1997. See "http://www-db.stanford.edu/~widom/pubs.html
YATL: S. Cluet, S. Jacqmin, and J. Simeon. The New YATL: Design and Specifications. Technical Report, INRIA, 1999.
Quilt: Don Chamberlin, Jonathan Robie, and Daniela Florescu. Quilt: an XML Query Language for Heterogeneous Data Sources. InLecture Notes in Computer Science, Springer-Verlag, Dec. 2000. Also available athttp://www.almaden.ibm.com/cs/people/chamberlin/quilt_lncs.pdf. See also http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html.
QueryDataModel: World Wide Web Consortium. XML Query Data Model. W3C Working Draft, May 11, 2000. Seehttp://www.w3.org/TR/query-datamodel/.
XSLT: World Wide Web Consortium. XSL Transformations (XSLT). W3C Recommendation, Nov. 16, 1999. Seehttp://www.w3.org/TR/xslt.
XPointer: World Wide Web Consortium. XML Pointer Language (XPointer). W3C Working Draft, Dec. 6, 1999. Seehttp://www.w3.org/TR/WD-xptr.
Namespaces: World Wide Web Consortium. Namespaces in XML. W3C Recommendation, Jan. 14, 1999. Seehttp://www.w3.org/TR/REC-xml-names/.

B XQuery Grammar

The following rather permissive grammar specifies the structure of the XQuery language. An actual XQuery implementation would also augment the grammar with a set of typing rules (for example, an expression used in a predicate must return a Boolean value or an integer.)

BNF Notations:

'aaa'  token

?      optional

*      zero or more

+      one or more

|      alternation

( )    grouping

------------------------------------------------------------------

XQueryUnit          ::= FunctionLibrary 
                      | Query

FunctionLibrary     ::= ContextDecl*  FunctionDefn+

Query               ::= ContextDecl*  FunctionDefn*  Expr

ContextDecl         ::= 'NAMESPACE' Identifier '=' Literal
                     | 'NAMESPACE' 'DEFAULT' '=' Literal

FunctionDefn	    ::= 'FUNCTION' QName '(' ParameterList? ')'
                            'RETURNS' Datatype '{' Expr '}'

ParameterList	    ::= Parameter ( ',' Parameter )*

Parameter	    ::= Datatype Variable

Datatype            ::= SimpleDatatype
                     |  'LIST' '(' SimpleDatatype ')'

SimpleDatatype	    ::= QName | '@' QName | 'ELEMENT'  

Expr		    ::= Expr 'SORTBY' '(' SortSpecList ')' 
                     |  UnaryOp Expr
                     |  Expr BinaryOp Expr
                     |  Variable
                     |  Literal
                     |  '.'                          
                     |  FunctionName '(' ExprList? ')' 
                     |  ElementConstructor
                     |  '(' Expr ')'
                     |  '[' ExprList? ']'          
                     |  PathExpr
                     |  Expr Predicate
                     |  FlwrExpr
                     | 'IF' Expr 'THEN' Expr 'ELSE' Expr
                     |  ('SOME' | 'EVERY') Variable 'IN' Expr 'SATISFIES' Expr
                     |  ('CAST' | 'TREAT') 'AS' Datatype '(' Expr ')'
                     |  Expr 'INSTANCEOF' Datatype

ExprList	    ::= Expr ( ',' Expr )*

FlwrExpr            ::=  (ForClause | LetClause)+  WhereClause? ReturnClause
ForClause	    ::= 'FOR' Variable 'IN' Expr ( ',' Variable 'IN' Expr )*
LetClause	    ::= 'LET' Variable ':=' Expr ( ',' Variable ':=' Expr )*
WhereClause	    ::= 'WHERE' Expr
ReturnClause	    ::= 'RETURN' Expr

SortSpecList	    ::= SortSpec (',' SortSpec)*
SortSpec	    ::= Expr ( 'ASCENDING' | 'DESCENDING' )?

ElementConstructor  ::= StartTag ExprList? EndTag
                     |  '<' TagName AttributeList? '/>'

PathExpr            ::= Path
                      | '/' Path
                      | '//' Path
                      | Expr '/' Path
                      | Expr '//' Path

Path                ::= Step ( ('/' | '//') Step )* 

Step                ::= NodeGenerator Predicate*

NodeGenerator       ::= NameTest
                     |  NodeType '(' ')'
                     |  '@' NameTest ( '->' NameTest )?
                     |  '..'

Predicate	    ::= '[' Expr ']'
                     |  '[' 'RANGE' Expr 'TO' Expr ']'

NameTest            ::= QName
                      | ( NamePrefix ':' )? '*'
                      | '*' ':' LocalName

NodeType            ::= 'NODE'
                      | 'TEXT'
                      | 'COMMENT'
                      | 'PROCESSING-INSTRUCTION'

StartTag	    ::= '<' TagName AttributeList? '>'

TagName		    ::= QName | Variable

AttributeList	    ::= (AttributeName '=' Expr)*

AttributeName	    ::= QName | Variable

EndTag		    ::= '</' TagName '>'

QName               ::= ( NamePrefix ':' )? LocalName

NamePrefix          ::= Identifier

LocalName           ::= Identifier

BinaryOp	    ::= LogicalOp | CompareOp | ArithOp

LogicalOp	    ::= 'AND' | 'OR' | 'UNION' | 'INTERSECT'
	             |  'EXCEPT' | 'BEFORE' | 'AFTER' | '|'

CompareOp	    ::= '=' | '<' | '<=' | '>' | '>=' | '!='

ArithOp		    ::= '+' | '-' | '*' | 'DIV' | 'MOD'

UnaryOp		    ::= 'NOT' | '+' | '-'


------------------------------------------------------------------

The terminal symbols of this grammar are as follows:

Variable       Example: $x

Literal        Examples: "x", 5, 5.72E-3, TRUE

Identifier     Examples: x, 'FOR'

------------------------------------------------------------------

The operator precedence of this grammar is as follows:

1. Path operators: /, //, ->, predicates

2. Unary +, -

3. *, DIV, MOD

4. Binary +, -

5. =, <, <=, >, >=, !=, INSTANCEOF

6. FLWR, IF..THEN..ELSE, Quantifiers

7. BEFORE, AFTER

8. NOT

9. INTERSECT, AND

10. UNION, |, EXCEPT, OR

11. SORTBY

C List of Operators

(To be completed)

(Include a list of valid casts)

(Include a promotion hierarchy for Schema types)

(Include specifications for literals where applicable)

(Include a specification of how existential quantifiers are implicitly generated by certain operations on lists)

D Core Function Library

(To be completed)

E XQuery Semantics

This section contains a mapping from the XQuery syntax to the XML Query Algebra. The algebra mapping is the support for the semantics of XQuery, both at run-time (evaluation) and at compile-time (typing).

This mapping is still preliminary and is subject to evolution in the future. This mapping might still contain inconsistencies and whenever appropriate, we point to corresponding issues in the issue list.

In XQuery, a query is composed of a preamble (containing schema, namespace and function declarations) and a body (containing a single XQuery expression). First, we give a mapping for XQuery expressions into Algebra expressions, then we give a mapping for XQuery declarations into Algebra declarations. This mapping is based on the XQuery grammar given in appendix B. Note that we assume that the precedence of XQuery and the XML Algebra are aligned (cf. [xquery-align-precedence]).

E.1 Notations

We will use the following notations.

E		  XQuery expression
e		  XML Algebra expression
l		  Local Name (NCName)
{n}l		  QName with namespace n and local name l
a		  Qualified name

$v		  XQuery variables
v		  XML Algebra variables
dot		  Distinguished algebra variable used to contain the
		  current node.

T		  XQuery type
t		  XML Algebra type

[[ E ]]_c ==> e	  Given the context c, the XQuery expression E is mapped
		  to the algebra expression e.

When the context is clear, we will write: [[ E ]] ==> e.

E.2 Algebra mapping for XQuery expressions

We give a mapping for each of the classes of expressions identified at the beginning of Section 2. Each of these classes corresponds to a subset of the Expr production in the XQuery Grammar of Appendix B.

E.2.1 Path expressions

XQuery path expressions use a fragment of XPath, plus two additional features: dereference and range predicates. XQuery and the XML Algebra do not currently support all the axis of XPath (Cf. [Algebra Issue-0052: Axes of Xpath.])

The mapping assumes that path steps are always given an explicit input expression. I.e., XPath abreviations are resolved, for instance, name is already in the form ./name, /person[name = "John"] is already in the form /person[./name = "John"], etc. Cf. Issue [xquery-mapping-input-context].

E.2.1.1 Current node

In XQuery, '.' denotes the current node. The current node is a special node, whose value depends on the context of the expression within which it is evaluated. Inside the Algebra, we will use a distinguished variable, called dot, to contain the value of the current node. Variable dot will always be bound explicitly in the algebra (see for instance, navigation and SORT expressions).

[[ . ]]   ==>   dot

E.2.1.2 Root node

In XQuery, '/' denotes the root of the document of the current node. This is returned by a special algebra function root() that accesses parents of the current node until it reaches the root. This function is given in the preamble section.

[[ / ]]   ==>   root(dot)

E.2.1.3 Simple Navigation

The following mapping is straightforward and works for a being: element names, attribute names, with and without wildcards.

[[ E/a ]]    ==>   [[E]]/a

[[ /a ]]     ==>   [[ / ]]/a

Note that in the XML Algebra, this returns elements with name a. This is not always the case in XQuer due to implicit type coercions. These type coercions are not yet taken into account in the mapping to the Algebra (Cf. [xquery-xpath-coercions]).

E.2.1.4 Recursive Navigation

Recursion is supported through recursive functions in the XML Algebra. Recursive navigation uses a special recursive function called descendant that returns the current node as well as all its descendants. This function is defined within the algebra and given in Section E.3.4.

[[ E//a ]]   ==>   descendant([[E]])/a

[[ //a ]]    ==>   descendant([[ / ]])/a

E.2.1.5 Access to COMMENT, NODES, PIs and TEXT nodes

This assumes the corresponding accessors are in the algebra.

[[ E/COMMENT() ]] ==> [[E]]/comment()

[[ E/PROCESSING_INSTRUCTION() ]] ==> [[E]]/pi()

[[ E/NODE() ]] ==> [[E]]/node()

[[ E/TEXT() ]] ==> [[E]]/value()

E.2.1.6 Dereference

This uses the support for dereference in the Algebra. This assumes an agreement on what is a reference and what is not (e.g., that ID/IDREF are converted to ref() in the data model).

[[E -> a]]   ==>   for v1 in [[E]] do
                     for v2 in deref(v1) do
                       match v2 with
                         case v3 : a[AnyType] do v3
                         else ()

E.2.1.7 Predicates

Local XPath predicates correspond to iteration over element in a collection, and requires the binding of the dot variable, as the predicate might use the current node.

[[ E1[E2] ]]   ==>   for v in [[E1]] do
                       let dot = v do
                       where [[E2]] do v

E.2.1.8 Parent

The parent navigation is mapped to the corresponding parent built-in function in the algebra.

[[ E/.. ]] ==> parent([[E]])

E.2.2 Node constructors

E.2.2.1 Local and qualified names

Element and attribute names are mapped to corresponding statements in the XML Algebra.

[[l]]	  ==>	l

[[n:l]]	  ==>	{n}l

Note that in XQuery, namespaces are declared in the preamble. We will see in section E.3.2 that namespace declarations are mapped to global variables containing the URI of the correponding namespace in the XML Algebra.

E.2.2.2 Element and attribute constructors

Element constructors are mapped to similar operations in the Algebra.

[[ <a a1=E1 ... an=En> E1', ..., Ek' </a> ]] ==>
   a [ @a1 [ [[E1]] ], ..., @an [ [[En]] ], [[E1']], ..., [[Ek']] ]

[[ <$v a1=E1 ... an=En> E1', ..., Ek' </$v> ]] ==>
   ~(v) [ @a1 [ [[E1]] ], ..., @an [ [[En]] ], [[E1']], ..., [[Ek']] ]

Note that if the Ei's do not return an attribute value, this will be detected by the XML Algebra type system at compile time. Note also, that the Ei's might return attributes, in which case the algebra element constructor should append then at the beginning with the explicitely declared attributes.

E.2.2.3 Comments and PI's constructors

Comments and PI's constructors are mapped to similar operations in the Algebra.

[[ comment(E) ]]  ==>  comment([[E]])
[[ pi(E) ]]       ==>  pi([[E]])

E.2.3 FLWR expressions

E.2.3.1 F to for, L to let, W to where, R to do

For, Let, Where and Return components of a FLWR expression are mapped to corresponding components in the Algebra.

[[ FOR $v1 IN E1, ..., $vn IN En
   LET $v1' := E1', ..., $vk' IN Ek'
   WHERE Ew
   RETURN Er ]]

==>

for v1 in [[E1]] do
...
for vn in [[En]] do
let v1' = [[E1']] do
...
for vk' in [[Ek']] do
where [[Ew]] do
[[Er]]

E.2.3.2 Sort by

Like for predicates, this mapping supposes that path expressions in the sort criterias are all given an explicit input. This also assumes a stable sort in the Algebra. Finally, this mapping is only possible for ASCENDING criteria, which is the only currently available in the XML Algebra (Cf. [Algebra Issue-0041: Sorting]).

[[ E SORTBY E1 ASCENDING, ..., En ASCENDING ]]

==>

sort dot in
  (sort dot in
    (...
      (sort dot in [[E]]
       by [[En]])
     ...)
   by [[E2]])
by [[E1]]

E.2.4 Operators

Operators in XQuery are mapped to corresponding operators in the Algebra.

E.2.4.1 Boolean operators

These are simply mapped to the corresponding Algebra operators.

[[ E1 AND E2 ]]       ==> [[E1]] and [[E2]]
[[ E1 OR E2 ]]        ==> [[E1]] or [[E2]]
[[ NOT E ]]           ==> not([[E]])

E.2.4.2 Predicate operators

Mapping predicates from XQuery to the Algebra is not completely straightforward, as XQuery uses implicit existential quantification (like XPath), while the XML Algebra does not.

As a consequence, existential quantification needs to be explicitely introduced by the mapping. It is still an open issue whether XQuery should use existential quantification or not Cf. [xquery-xpath-coercions].

For example, in the case of equality, the mapping is:

[[ E1 = E2 ]]   ==> not(empty(for v1 in [[E1]] do
                              for v2 in [[E2]] do
                              where v1 = v2 do v1))

Note that without existential quantification, the mapping would be more straightforward: [[ E1 = E2 ]] ==> [[E1]] = [[E2]].

[[ E1 < E2 ]]   ==> not(empty(for v1 in [[E1]] do
                              for v2 in [[E2]] do
                              where v1 < v2 do v1))
[[ E1 <= E2 ]]  ==> not(empty(for v1 in [[E1]] do
                              for v2 in [[E2]] do
                              where v1 <= v2))
[[ E1 >= E2 ]]  ==> not(empty(for v1 in [[E1]] do
                              for v2 in [[E2]] do
                              where v1 >= v2))
[[ E1 > E2 ]]   ==> not(empty(for v1 in [[E1]] do
                              for v2 in [[E2]] do
                              where v1 > v2))
[[ E1 != E2 ]]   ==> not(empty(for v1 in [[E1]] do
                              for v2 in [[E2]] do
                              where v1 != v2))

E.2.4.3 Arithmetic operators

Arithmetic operators are not yet integrated in XQuery and in the XML Algebra. See open issues [xquery-operators-and-functions] and [Algebra Issue-0056: Operators on Simple Types]. Note also that we do not perform automatic type coercions (Cf. [xquery-xpath-coercions]).

Here is a few examples of mappings for such expressions.

[[ E1 + E2 ]]   ==> [[E1]] + [[E2]]
[[ E1 - E2 ]]   ==> [[E1]] - [[E2]]
[[ E1 * E2 ]]   ==> [[E1]] MULT [[E2]]
[[ E1 DIV E2 ]] ==> [[E1]] DIV [[E2]]
[[ E1 MOD E2 ]] ==> [[E1]] MOD [[E2]]
...

[[ + E ]]	    ==> +[[E]]
[[ - E ]]	    ==> -[[E]]
...

E.2.4.4 Collection operators

All collection operations in XQuery are based on node identity. See [xquery-equality-identity].

List concatenation corresponds to sequencing in the Algebra.

[[ [E] ]]	      ==> [[E]]_l

[[ E1, E2 ]]_l	      ==> [[E1]], [[E2]]

Union in XQuery corresponds to the set Union operation.

[[ E1 UNION E2 ]]     ==> distinct([[E1]], [[E2]])

Intersection and exception can be written in the algebra using existential quantification. Note the use of '==', the node identity operator in the XML Algebra.

[[ E1 INTERSECT E2 ]]

==>

for v1 in [[E1]] do
  where not(empty(for v2 in [[E2]] do
                    where v2 == v1 do v2))
  do v1

[[ E1 EXCEPT E2 ]]

==>

for v1 in [[E1]] do
  where empty(for v2 in [[E2]] do
                where v2 == v1 do v2)
  do v1

BEFORE and AFTER operators returns the subset of the element in the first expression which are before or after one of the element in the second expression in document order. The mapping to the algebra uses both existential quantification and document order comparators '<' and '>'.

[[ E1 BEFORE E2 ]]

==>

for v1 in [[E1]] do
  where not(empty(for v2 in [[E2]] do
                    where v2 > v1 do v2))
  do v1

[[ E1 AFTER E2 ]]

==>

for v1 in [[E1]] do
  where not(empty(for v2 in [[E2]] do
                    where v1 > v2 do v2)
  do v1

E.2.5 Conditional Expressions

Conditional expressions are mapped directly in the algebra.

[[ IF E1 THEN E2 ELSE E3 ]]

==>

if [[E1]] then [[E2]] else [[E3]]

E.2.6 Literal values

Literal values are mapped to their corresponding values in the algebra.

[[ TRUE ]]    ==>  true
[[ FALSE ]]   ==>  false

[[ 1  ]] ==> 1
[[ "" ]] ==> ""
...

Note that the algebra does not support a syntax for Float values, while XQuery does. See Issue [xquery-literal-xml-constructor] and [xquery-embedding-xml]. Further work is required on arithmetic values and operations.

E.2.7 Function Calls and Declarations

Function calls are mapped to equivalent function calls. Mapping for function declarations is given in the next section.

[[ FunctionName(E1,..., En) ]]
     ==> FunctionName( [[E1]]; ...; [[E2]] )

Notably, the FILTER function will be treated in Section E.3.4.

E.2.8 Existential and Universal Quantifiers

Quantifiers are simply mapped to existential predicate and iteration in the Algebra.

[[ SOME $v IN E1 SATISFIES E2 ]]

==>

not(empty(for v in [[E1]] do
          where [[E2]] do v))

[[ EVERY $v IN E1 SATISFIES E2 ]]

==>

empty(for v in [[E1]] do
      where not([[E2]]) do v)

E.2.9 Variables

Variables in XQuery are mapped to variables in the algebra.

[[ $v ]]	   ==> v		(* variables *)

E.2.10 Operations on Types

XQuery supports operations on the types.

The INSTANCE OF operation allows to check whether a value is of a given type. This operation is mapped to a match expression in the algebra.

[[ E INSTANCE OF T ]]

==>

match [[ E ]]
  case [[ T ]] do true
  else false

The CAST operation in XQuery performs a conversion of the actual value from a type into another. This kind of value conversion is not currently supported in the XML Algebra (Cf. [xquery-mapping-cast]).

The TREAT operation allows to change the type of a value at run-time. It is mapped into the unsafe type casting in the XML Algebra.

[[ CAST AS T (E) ]]

==>

cast [[ E ]] : [[ T ]]

E.3 Mapping of XQuery declarations to Algebra declarations

E.3.1 Mapping of type declarations

XQuery type declarations are mapped to the corresponding type in the XML Algebra. Note that there is a number of issues related to how XQuery types are written (Cf. [xquery-type-syntax] and [xquery-inline-xml-schema]), how typing is done in XQuery (Cf. [xquery-type-correspondence]), and how XML Algebra types relate to XML Schema (Cf. [Algebra Issue-0018: Align algebra types with schema]).

[[ xsd:string ]]_type ==> String		      (* String type *)

[[ xsd:integer ]]_type    ==> Integer		      (* Integer type *)

[[ xsd:float ]]_type      ==> Float			      (* Float type *)

[[ 'ELEMENT' ]]_type  ==> AnyElement		      (* Any Element *)

[[ TagName ]]_type    ==> TagName[AnyType]	      (* An Element *)

[[ @AttributeName ]]  ==> @AttributeName[AnyType]     (* An Attribute *)

[[ SchemaName ]]      ==> X			      (* Where X is
							 the Algebra
							 type name
							 corresponding
							 to the XML
							 Schema name *)

E.3.2 Mapping of namespace declarations

Namespace declarations in XQuery are mapped to global variables in the Algebra. Note that we use here the uri function of the XML Query Data Model, which is not currently in the XML Algebra (See Algebra Issue [Algebra Issue-0071: Alignment with the XML Query Datamodel ]).

[[ NAMESPACE n = d ]]

==>

let n : URI = uri([[d]])

E.3.3 Mapping of function declarations

Each function declaration in XQuery is mapped to a corresponding function declaration in the Algebra.

[[ FUNCTION f(T1 v1, ..., Tn vn) RETURNS T' { E } ]]

==>

fun f(v1 : [[T1]]; ...; vn : [[Tn]]) : [[T']] = [[E]]

E.3.4 Predefined functions

In the above mapping, we used a number of Algebra functions, which are not currently part of the XML Algebra. These functions need to be declared accordingly as part of the preamble.

The following function computes the root of the input node.

fun root(x:AnyElement) : AnyElement =
  let p = parent(x) do
  if p = () then x else root(p)

The following function computes the descendants of the input nodes.

fun descendant(x:AnyForest) : AnyForest =
  for v in x do
    v,children(v)

The following algebra functions implement the corresponding XQuery filter function. Note that this definition lose all type information, see [xquery-filter-typing].

fun member(x : AnyNode; y : AnyForest) : Boolean =
  not(empty(for v in y do
              where v == y do v))

fun filter1(x : AnyNode; y : AnyForest) : AnyForest =
  if (member(x;y))
  then
    match x
      case x' : (Comment|PI|String|AnyAttribute) do x
      else let tag = name(x) do
             ~(tag) [ x/@*,
		      filter(x/node();y) ]
  else
    filter(x/node();y)

fun filter(x : AnyForest; y : AnyForest) : AnyForest =
  for x' in x do filter1(x';y)

F Java CUP Grammar for XQuery (Non-Normative)

As noted earlier, the simple grammar shown in Appendix B needs to be made more specific in order to be used in a real XQuery implementation. The way in which this is done depends on the parser generator tools being used. This Appendix contains a specification, based on the grammar in Appendix B, that can be used with the Java CUP tool (http://www.cs.princeton.edu/~appel/modern/java/CUP/manual.html) to generate a working XQuery parser.

The grammar shown below allows a series of XQuery units (queries and function libraries) to be separated by semicolons and parsed as a single test case. A parser generated from this grammar has been used to validate all the XQuery solutions to the Use Cases of the Query Working Group.

/**
 * An XQueryFile is one or more XQueryUnits separated by semicolons.
 * An XQueryUnit may be either a function library or a query.
 */
XQueryFile ::=   
      XQueryUnit
    | XQueryFile SEMICOLON XQueryUnit 
    ;

XQueryUnit ::=
      FunctionLibrary
    | Query
    ;

FunctionLibrary ::= 
      OptionalContextDeclarationList FunctionDefinitionList
    ;
     
/**
 * A Query is zero or more namespace declarations, 
 * followed by zero or more local function definitions,
 * followed by an expression
 */  
Query ::=    
      OptionalContextDeclarationList OptionalFunctionDefinitionList Expression
    ;
                
OptionalContextDeclarationList ::=    
      /* empty */
    | ContextDeclarationList
    ;
                            
ContextDeclarationList ::=    
      ContextDeclaration
    | ContextDeclarationList ContextDeclaration
    ;
                    
/**
 * A context declaration is used to declare a namespace prefix, 
 * or to declare the default namespace.
 */
ContextDeclaration ::=    
      NAMESPACE  IDENTIFIER  EQUALS  STRING_LITERAL
    | NAMESPACE  DEFAULT  EQUALS  STRING_LITERAL
    ;
                
OptionalFunctionDefinitionList ::=    
      /* empty */
    | FunctionDefinitionList
    ;
                            
FunctionDefinitionList ::=    
      FunctionDefinition
    | FunctionDefinitionList FunctionDefinition
    ;
                    
FunctionDefinition ::=    
      FUNCTION FunctionName
         L_PAREN OptionalParameterList R_PAREN
         RETURNS Datatype
         L_BRACE Expression R_BRACE
    ;
 
FunctionName ::=  
      QName
    ;

OptionalParameterList ::=   
      /* empty */
    | ParameterList
    ;
                            
ParameterList ::=   
      Parameter
    | ParameterList COMMA Parameter
    ;
    
/**
 * Each function parameter has a datatype and a name 
 */
Parameter ::=   
      Datatype VAR
    ;

/**
 * A datatype can be the name of an element or attribute, or the
 * keyword ELEMENT or ATTRIBUTE denoting any element or attribute, or
 * one of the predefined types of XML Schema, or a list of any simple type.
 */
Datatype ::= 
      SimpleDatatype
    | LIST L_PAREN SimpleDatatype R_PAREN
    ;

SimpleDatatype ::=
      QName
    | AT QName
    | ELEMENT
    ;

Expression ::=
      LogicalOrSetExpression
    | LogicalOrSetExpression SORTBY L_PAREN SortSpecList R_PAREN
    ;

SortSpecList ::=   
      SortSpec
    | SortSpecList COMMA SortSpec
    ;  

SortSpec ::=   
      Expression
    | Expression ASCENDING
    | Expression DESCENDING
    ;
                            
LogicalOrSetExpression ::=    
      LogicalOrSetTerm
    | LogicalOrSetExpression OR LogicalOrSetTerm
    | LogicalOrSetExpression UNION LogicalOrSetTerm
    | LogicalOrSetExpression BAR LogicalOrSetTerm
    | LogicalOrSetExpression EXCEPT LogicalOrSetTerm
    ;
                       
LogicalOrSetTerm ::=    
      LogicalOrSetPrimitive
    | LogicalOrSetTerm AND LogicalOrSetPrimitive
    | LogicalOrSetTerm INTERSECT LogicalOrSetPrimitive
    ;

LogicalOrSetPrimitive ::=
      SequencedValue
    | NOT SequencedValue
    ;
                   
SequencedValue ::=    
      ValueExpression
    | SequencedValue  BEFORE  ValueExpression
    | SequencedValue  AFTER  ValueExpression
    ;
                    
ValueExpression ::=    
      Comparison
    | SpecialExpression
    ;

Comparison ::=    
      ArithmeticExpression
    | Comparison CompareOperator ArithmeticExpression
    | Comparison INSTANCEOF Datatype
    ;
 
ArithmeticExpression ::=
      ArithmeticTerm
    | ArithmeticExpression PLUS ArithmeticTerm
    | ArithmeticExpression MINUS ArithmeticTerm
    ;

ArithmeticTerm ::=
      ArithmeticFactor
    | ArithmeticTerm STAR ArithmeticFactor
    | ArithmeticTerm DIV ArithmeticFactor
    | ArithmeticTerm MOD ArithmeticFactor
    ;

ArithmeticFactor ::=
      ArithmeticPrimitive
    | PLUS ArithmeticFactor
    | MINUS ArithmeticFactor
    ;

ArithmeticPrimitive ::=
      BasicExpression OptionalPredicateList
    | PathExpression
    ;
                      
PathExpression ::=    
      Path
    | SLASH Path
    | DOUBLE_SLASH Path
    | BasicExpression OptionalPredicateList SLASH Path
    | BasicExpression OptionalPredicateList DOUBLE_SLASH Path
    ;
                        
/**
 *  In the future, paths may include other forms of regular expressions.
 */
Path ::=    
      Step
    | Path SLASH Step
    | Path DOUBLE_SLASH Step
    ;

/**
 * As in XPath, a step represents movement in an XML document along an
 * axis (currently limited to the child, parent, and attribute axes).
 * If followed by DEREFERENCE, the step dereferences an attribute
 * of type IDREF(S) and returns the target element(s) (this requires
 * information from a schema or Document Type Definition.)
 */
Step ::=
      NodeGenerator OptionalPredicateList
    ;


NodeGenerator ::=
      NameTest
    | NodeType L_PAREN R_PAREN
    | AT NameTest
    | AT NameTest DEREFERENCE NameTest
    | DOTDOT
    ;
    
OptionalPredicateList ::=   
      /* empty */
    | OptionalPredicateList Predicate
    ;

/**
 * An expression in a predicate must evaluate to a Boolean or an ordinal number.
 */
Predicate ::=   
      L_BRACKET Expression R_BRACKET
    | L_BRACKET RANGE Expression TO Expression R_BRACKET
    ;
               
/**
 * When used in an expression, DOT denotes the "current node". 
 */
BasicExpression ::=   
      VAR
    | Literal
    | FunctionName L_PAREN OptionalExpressionList R_PAREN
    | L_PAREN Expression R_PAREN
    | ElementConstructor
    | ListConstructor
    | CastingVerb AS Datatype L_PAREN Expression R_PAREN
    | DOT
    ;
 
Literal ::=   
      STRING_LITERAL
    | INTEGER_LITERAL
    | FLOAT_LITERAL
    | BooleanLiteral
    ;

BooleanLiteral ::=
      TRUE
    | FALSE
    ;

CastingVerb ::= 
      CAST
    | TREAT
    ;                

OptionalExpressionList ::=    
      /* empty */
    | ExpressionList
    ;
                            
ExpressionList ::=    
      Expression
    | ExpressionList COMMA Expression
    ;
            
SpecialExpression ::=    
      FlwrExpression
    | IF Expression THEN ValueExpression ELSE ValueExpression
    | Quantifier VAR IN Expression SATISFIES ValueExpression
    ;
                    
FlwrExpression ::=    
      ForLetClause WhereReturnClause
    ;
                    
/**
 * A For-Let clause consists of one or more FOR-clauses and/or Let-clauses
 */
ForLetClause ::=
      ForOrLet
    | ForLetClause ForOrLet
    ;

ForOrLet ::=   
      ForClause
    | LetClause
    ;
                        
/**
 * A For clause has at least one variable that iterates over a collection
 */
ForClause ::=    
      FOR VAR IN Expression
    | ForClause COMMA VAR IN Expression
    ;
                    
/**
 * A Let clause has at least one variable assignment
 */
LetClause ::=    
      LET VAR SET_EQUAL_TO Expression
    | LetClause COMMA VAR SET_EQUAL_TO Expression
    ;
                    
WhereReturnClause ::=
      WhereClause ReturnClause
    | ReturnClause
    ;
                            
WhereClause ::=   
      WHERE Expression
    ;
                    
ReturnClause ::=    
      RETURN ValueExpression
    ;
                        
/**
 * An element constructor is a start tag, followed by an optional expression 
 * list, followed by an end-tag; or it might be an empty element constructor.
 *
 * For constructing other types of nodes, use constructor functions such as
 * comment("string") and pi("target", "string").
 */
ElementConstructor ::=   
      StartTag OptionalExpressionList EndTag
    | EmptyElementConstructor
    ;
                                          
StartTag ::=   
      LESS_THAN TagName OptionalAttributeList GREATER_THAN
    ;
                
TagName ::=   
      QName
    | VAR
    ;
            
OptionalAttributeList ::=    
      /* empty */
    | AttributeList
    ;                            
                    
AttributeList ::=   
      AttributeValuePair
    | AttributeList AttributeValuePair
    ;
                                    
AttributeValuePair ::=   
      AttributeName EQUALS ArithmeticExpression
    ;
                                
AttributeName ::=     
      QName
    | VAR
    ;
                    
/**
 * If the end tag contains a variable, it must match the start tag
 */
EndTag ::=    
      LESS_THAN_SLASH TagName GREATER_THAN
    ;

EmptyElementConstructor ::=    
      LESS_THAN TagName OptionalAttributeList SLASH_GREATER_THAN
    ;

ListConstructor ::= 
      L_BRACKET ListContent R_BRACKET
    | L_BRACKET R_BRACKET
    ;

ListContent ::=
      Expression
    | ListContent COMMA Expression
    ;

/**
 * A name test is a Qname where "*" serves as a wild card
 */
NameTest ::=    
      QName
    | NamePrefix COLON STAR
    | STAR COLON LocalPart
    | STAR
    ;

NodeType ::=      
      NODE
    | TEXT
    | COMMENT
    | PROCESSING_INSTRUCTION
    ;
                
Quantifier ::=    
      SOME
    | EVERY
    ;
 
/**
* A QName may consist of a single part or two parts
*/
QName ::=    
      LocalPart
    | NamePrefix COLON LocalPart
    ;
                                             
/**
* NamePrefix and LocalPart are both IDENTIFIERs
*/
NamePrefix ::=    
      IDENTIFIER
    ;
 
LocalPart ::=    
      IDENTIFIER
    ;
                
CompareOperator ::= 
      EQUALS
    | LESS_THAN
    | LESS_THAN_EQUALS
    | GREATER_THAN
    | GREATER_THAN_EQUALS
    | NOT_EQUALS
    ;

G JavaCC Grammar (Non-Normative)

Most commonly used grammars are either LALR or LL(1). The CUP grammar presented in the previous appendix is an example of an LALR grammar that corresponds to the abstract grammar in Appendix B. This appendix contains an LL(1) grammar that works with JavaCC.

/**
 *
   Xquery parser; 1.26.2001
 *
 */


PARSER_BEGIN(xquery)

import java.io.*;
import java.util.Vector;

class xquery {

  public static void main(String args[]) throws Exception {


    xquery ql = null;
    if (args.length == 0 ) {
            System.out.println("--Input: test.xq");
            ql = new xquery(new FileInputStream("test.xq"));
    } else {
            System.out.println("--Input: "+args[0]);
            ql = new xquery(new FileInputStream(args[0]));
    }
    try {
      ql.Start();
      System.out.println("--Done.");
    } catch (Exception e) {
      System.out.println("--Undone.");
      System.out.println(e.getMessage());
      e.printStackTrace();
    }
  }
}

PARSER_END(xquery)


void Start() throws Exception : {}
{
  XQueryFile()
}



void XQueryFile() throws Exception : {}
{
   XQueryUnit() [<SEMICOLON> XQueryFile ()]
}



void XQueryUnit()throws Exception: {}
{
          FunctionLibrary() [Query()]
}


void FunctionLibrary()  throws Exception:{}
{
        [ ContextDeclarationList()] [ FunctionDefinitionList()]
}


void Query() throws Exception:{}
{
         Expression()
}


void ContextDeclarationList()throws Exception: {}
{
   ContextDeclaration() [ ContextDeclarationList()]
}


void ContextDeclaration()throws Exception:{}
{
   <NAMESPACE> [<DEFAULTT>] <IDENTIFIER>  <EQUALS>  <STRING>
}



void FunctionDefinitionList()throws Exception:{}
{
       FunctionDefinition() [ FunctionDefinitionList()]
}


void FunctionDefinition()throws Exception:{}
{
        <FUNCTION> <IDENTIFIER>
         <L_PAREN> (ParameterList())? <R_PAREN>
         <RETURNS> DataType()
         <L_BRACE> Expression() <R_BRACE>
}



void ParameterList()throws Exception: {}
{
    Parameter()  [<COMMA> ParameterList()]
}



void Parameter()throws Exception: {}
{
         DataType() VAR() 
}



void DataType()throws Exception:{}
{
         SimpleDatatype()
    |    <LIST> <L_PAREN> SimpleDatatype() <R_PAREN>
}



void SimpleDatatype()throws Exception:{}
{
              QName()
        |    <ELEMENT>
        |    <ATTRIBUTE>
        |    <AT> QName()
}


void Expression() throws Exception:{}
{
        ExpressionLevel14()
}


void  ExpressionLevel14()throws Exception:{}
{
      ExpressionLevel13()
        
                [LOOKAHEAD(2)  <SORTBY> <L_PAREN> SortSpecList() <R_PAREN> ]
}


void  ExpressionLevel13()throws Exception:{}
{
      ExpressionLevel12() [LOOKAHEAD(2)  (<OR> | <BAR> | <UNION> | <EXCEPT> ) ExpressionLevel13()] 
}


void  ExpressionLevel12()throws Exception:{}
{
      ExpressionLevel11() [LOOKAHEAD(2) (<AND>| <INTERSECT>) ExpressionLevel12()] 
}


void  ExpressionLevel11()throws Exception:{}
{
      [<NOT>]  ExpressionLevel10()
}



void  ExpressionLevel10()throws Exception:{}
{
        ExpressionLevel9() [LOOKAHEAD(2) (<BEFORE> | <AFTER> ) ExpressionLevel10()]
}


void  ExpressionLevel9()throws Exception:{}
{
        ExpressionLevel7() | ExpressionLevel8()
}



void ExpressionLevel8() throws Exception:{}
{
        ExpressionLevel8a() /* FLWR Expression */
      | ExpressionLevel8b() /* If Then Else */
      | ExpressionLevel8c() /* Quantified expression */
}


void  ExpressionLevel7()throws Exception:{}
{
        ExpressionLevel6() [LOOKAHEAD(2)  ComparisonOperator() ExpressionLevel7() ]
}



void  ExpressionLevel6()throws Exception:{}
{
        ExpressionLevel5() [LOOKAHEAD(2)  ( <INSTANCEOF> DataType() )  ]
}



void  ExpressionLevel5()throws Exception:{}
{
        ExpressionLevel4() [LOOKAHEAD(2) ( <PLUS> | <MINUS> ) ExpressionLevel5()]
}




void  ExpressionLevel4()throws Exception:{}
{
        ExpressionLevel3() [LOOKAHEAD(2) ( <STAR> | <DIV> | <MOD> ) ExpressionLevel4()]
}


void  ExpressionLevel3()throws Exception:{}
{
        [ <PLUS> | <MINUS>]  ExpressionLevel2()
}




void  ExpressionLevel2()throws Exception:{}
{         
         LOOKAHEAD(4)
          
           ExpressionLevel1()  [  (<SLASH> | <SLASHSLASH>)  Path()]


        |  Path()
        | <SLASH> Path() 
        | <SLASHSLASH> Path()


}



void ExpressionLevel1() throws Exception:{}
{
        ExpressionLevel0() [LOOKAHEAD(2) PredicateList()]
}


void  ExpressionLevel0() throws Exception:{} /* Basic expression */
{
     VAR()
    | <DOT>
    | Literal()
    | QName()  <L_PAREN> [ExpressionList()] <R_PAREN>
    | <L_PAREN> Expression() <R_PAREN>
    | NodeConstructor()
    | ListConstructor()
    | CastingVerb() <AS> DataType() <L_PAREN> Expression() <R_PAREN>

}


void Path()throws Exception:{}
{
      AtomicStep() 
        (     LOOKAHEAD(2) 
               Predicate()
          |   (<SLASH> | <SLASHSLASH>)  AtomicStep()
          |   <DEREFERENCE> NameTest() 
        ) *
}




void AtomicStep()throws Exception:{}
{
      NameTest()  
    | NodeType() <L_PAREN> <R_PAREN> 
    | <AT> NameTest() 
    | <DOTDOT> 
}



void PredicateList()throws Exception:{}
{
        Predicate() [LOOKAHEAD(2) PredicateList()]
}


void Predicate()throws Exception:{}
{
   <L_BRACKET>
     (  Expression() | <RANGE> Expression() <TO> Expression())
   <R_BRACKET>
}




void ExpressionLevel8b() throws Exception:{}
{
         <IF>  Expression() <THEN> Expression() <ELSE> ExpressionLevel9()
}


void  ExpressionLevel8c()throws Exception:{}
{
        Quantifier()   VAR() <IN> Expression() <SATISFIES> ExpressionLevel9()
}


void ExpressionLevel8a()throws Exception:{}
{
        (ForClause() |  LetClause())+
        [WhereClause()]
        ReturnClause()
}



void ForClause()throws Exception:{}
{
      <FOR>  VAR() <IN> Expression() (<COMMA> VAR() <IN> Expression() )*
}



void LetClause()throws Exception:{}
{
      <LET> VAR() <SET_EQUAL_TO> Expression()  (<COMMA> VAR() <SET_EQUAL_TO> Expression())*
}



void WhereClause()throws Exception:{}
{
      <WHERE> Expression()
}


void ReturnClause()throws Exception:{}
{
      <RETURN> ExpressionLevel9()
}


void NodeConstructor()throws Exception:{}
{
      ElementConstructor()
}


void ListConstructor() throws Exception:{}
{
      <L_BRACKET>  [ExpressionList()] <R_BRACKET>
}


   
void  ElementConstructor()throws Exception:{}
{
    StartTag()  (<SLASH_GREATER_THAN> | (<GREATER_THAN>  [ExpressionList()] EndTag()))
}


void StartTag()throws Exception:{}
{
      <LESS_THAN> TagName() [AttributeList()]
}


void AttributeList()throws Exception:{}
{
      AttributeValuePair() [AttributeList()]
}


void AttributeValuePair()throws Exception:{}
{
     AttributeName() <EQUALS> ExpressionLevel5()
}



void AttributeName()throws Exception:{}
{
     QName() | VAR()
}



void TagName()throws Exception:{}
{
     QName() | VAR()
}


void EndTag()throws Exception:{}
{
   <LESS_THAN_SLASH>  TagName() <GREATER_THAN>
}


void NameTest()throws Exception:{}
{
     (<IDENTIFIER> [<COLON> (<IDENTIFIER> | <STAR>)])
    | (<STAR> [<COLON> <IDENTIFIER>])
}


void NodeType()throws Exception:{}
{
       <NODE>
    |  <TEXT>
    |  <COMMENT>
    |  <PROCESSING_INSTRUCTION>
}


void Quantifier()throws Exception:{boolean b;}
{
      <SOME> | <EVERY>
}


void QName()throws Exception:{}
{
      <IDENTIFIER> [ <COLON> <IDENTIFIER>]
}


void ComparisonOperator()throws Exception:{}
{
      <EQUALS>
    | <LESS_THAN>
    | <LESS_THAN_EQUALS>
    | <GREATER_THAN>
    | <GREATER_THAN_EQUALS>
    | <NOT_EQUALS>
}



void CastingVerb()throws Exception:{}
{
      <CAST>
    | <TREAT>
}




void SortSpecList()throws Exception:{}
{
   Expression() [<ASCENDING> |  <DESCENDING>]  [<COMMA> SortSpecList()]
}



void VAR()throws Exception : {}
{
  <DOLLAR> <IDENTIFIER> 
}


void Literal()throws Exception:{}
{
       <STRING>
    |  <INTEGER>
    |  <FLOAT>
    |  Boolean()
}


void Boolean() throws Exception:{}
{
        <TRUE> | <FALSE> 
}


void ExpressionList()throws Exception:{}
{
      Expression() [<COMMA> ExpressionList()]
}

H XQuery Issues List (Non-Normative)

This section contains the current issues for XQuery. The individual issues are shown in detail after an abbreviated issues list.

H.1 Issue List

Issue
1: Collection types (xquery-collection-types)
2: Definitions of Operators (xquery-definition-of-operators)
3: Operators and functions (xquery-operators-and-functions)
4: External Functions (xquery-external-functions)
5: Function Definition (xquery-function-definition)
6: Function Resolution (xquery-function-resolution)
7: CAST expression (xquery-cast-expression)
8: Type Guard (xquery-type-guard)
9: Separation of clauses in FLWR (xquery-separation-of-flowers)
10: Alignment of Syntax (xquery-alignment-of-syntax)
11: Alternative syntax for element construction (xquery-element-construction)
12: Fusion (xquery-fusion)
13: Filter as a Function (xquery-filter-function)
14: TRY/CATCH and error() (xquery-try-catch-error)
15: Updates (xquery-updates)
16: Algebra Mapping (xquery-algebra-mapping)
17: XPath Productions (xquery-xpath-productions)
18: Abstract Syntax (xquery-abstract-syntax)
19: Recursion (xquery-recursion)
20: Copy and Reference Semantics (xquery-copy-reference)
21: View Definition (xquery-persist-views-functions)
22: Human-Readable Syntax for Types (xquery-type-syntax)
23: What is a Query (xquery-query)
24: XPath Type Coercions (xquery-xpath-coercions)
25: Support for Unordered Collections (xquery-unordered-collections)
26: Identity-based equality operator (xquery-equality-identity)
27: Deep equality (xquery-deep-equality)
28: Reference Constructor (xquery-reference-constructor)
29: Precedence of Operators (xquery-precedence)
30: Queries with Invalid Content (xquery-invalid-content)
31: Function Libraries (xquery-function-library)
32: Correspondence of Types (xquery-type-correspondence)
33: Excluding Undesired Elements (xquery-exclude-undesireables)
34: Alignment of Precedence (xquery-align-precedence)
35: Escape to ABQL (xquery-escape-to-abql)
36: CAST and TREAT AS Syntax (xquery-cast-syntax)
37: XML Schema Datatypes Constructors (xquery-datatype-constructors)
38: Attribute Constructor Function (xquery-attribute-constructor-function)
39: Attribute Name, Attribute Content (xquery-attribute-name-content)
40: Dereference Operator and Links (xquery-dereference-links)
41: XML Constructor (xquery-literal-xml-constructor)
42: Eval (xquery-eval)
43: Inline XML Schema Declarations (xquery-inline-xml-schema)
44: Encoding (xquery-encoding)
45: Typing of Filter (xquery-filter-typing)
46: Typeswitch (xquery-typeswitch)
47: Mapping Input Context (xquery-mapping-input-context)
48: Accessing Element Data (xquery-data-function)
49: Embedding XML in XQuery (xquery-embedding-xml)
50: Embedding XQuery in XML (xquery-embedding-xquery-in-xml)
51: Naive Implementation Strategy (xquery-naive-implementation)
52: XML-based Syntax (xquery-abql)
53: Mapping CAST AS (xquery-mapping-cast)
54: Defining Behavior for Well Formed, DTD, and Schema Documents (xquery-define-schema-variants)

H.2 XQuery Issues

Issue 1 : Collection types (xquery-collection-types)

Originator:	XQuery Editors
Locus:	requirements

Description:

XQuery currently considers book and list(book) to be different types. List is an abbreviation of the facets minoccurs=0, maxoccurs = unbounded. We need to confirm that this accurately reflects the type system of XML Schema.

Issue 2 : Definitions of Operators (xquery-definition-of-operators)

Originator:	XQuery Editors
Locus:	xquery-algebra

Description:

XPath defines some operators on lists in ways that differ from most commonly used languages. For example, if $X is a list, $X+1 is defined as the result of adding 1 to the first element in $X (ignoring the other elements.) XQuery takes a more regular approach to operations on lists, as described in the section on functions. For example, if the function increment(integer) is defined to add 1 to its argument, then the function call increment($X) where $X is a list of integers will return a list in which every member has been incremented by 1.

This issue needs more study, which should probably occur on the joint task force between XML Query and XSL. Operations on lists may be one area in which work is needed to evolve the XPath specification toward Version 2.0.

It is very desirable that operators such as "+" occur only once in the grammar, not separately in the "XPath" and "non-XPath" parts of the language. This probably means that path expressions should be integrated into the XQuery grammar and not treated as a "terminal symbol." This is the approach that has been taken in the current CUP and JavaCC grammars.

Expressions involving operators must also be carefully defined. We expect this to be handled by the task force on operators and the XPath 2.0 task force.

Issue 3 : Operators and functions (xquery-operators-and-functions)

Originator:	XQuery Editorial Team
Locus:	xquery-algebra

Description:

A complete list of XQuery operators and core functions is needed, with their signatures (datatypes of operands and results). We have reserved an appendix for this, but the appendix is currently empty.

Sources of information: XPath specification, operator lists presented at recent F2F meetings by Frank Olken and others.

Note: The Query Working Group and the Schema Working Group have agreed to spawn a joint task force to investigate this issue. The XQuery language will integrate the results of that work.

Issue 4 : External Functions (xquery-external-functions)

Originator:	XQuery Editors
Locus:	xquery-algebra

Description:

An extensibility mechanism needs to be defined that permits XQuery to access a library of functions written in some other programming language such as Java.

Some sources of information: the definition of external functions in SQL, the implementation of external functions in Kweelt.

Issue 5 : Function Definition (xquery-function-definition)

Originator:	XQuery Editors
Locus:	xquery-algebra

Description:

We need more thought about what constitutes a valid parameter-type for a function. Attribute-types as well as element-types? Type-names vs. element-names? Should all the MSL symbol spaces be represented?

It is probably important to have NODE as a type, to allow functions to take any XML node as a parameter, or to return any XML node as a result.

Using univeral names similar to those found in the MSL document, but with a different syntax, would allow us to reference any schema type in XQuery function definitions.

Issue 6 : Function Resolution (xquery-function-resolution)

Originator:	XQuery Editors
Locus:	xquery-algebra

Description:

More detailed rules need to be developed for function resolution. What kinds of function overloading are allowed? A promotion hierarchy of basic types needs to be specified. The issue of polymorphic functions with dynamic dispatch needs to be studied. Can overloaded functions be defined such that the parameter-type of one function is a subtype of the parameter-type of another function? If so, what are the constraints on the return-types of these functions? Is function selection based on the static type of the argument or on the dynamic type of the argument (dynamic dispatch, performed at execution time)? If XQuery supports dynamic dispatch, is it based on all the arguments of a function or on only one distinguished argument?

Observation: This is a very complex area of language design. If it proves too difficult to solve in the available time, it may be wise to take a simple approach such as avoiding dynamic dispatch in Version 1 of XML Query.

Proposal:

The Query Algebra does not support overloading or dynamic dispatch. We will attempt to simplify XML Query Level 1 by omitting these, unless it becomes clear that they are needed. We realize that this might happen.

Issue 7 : CAST expression (xquery-cast-expression)

Originator:	XQuery Editors
Locus:	xquery-algebra

Description:

Does XQuery need a CAST expression for casting an instance of one type into another type?

Proposal:

A "CAST AS" operator has been added to this current Working Draft. We solicit feedback on this operator.

Issue 8 : Type Guard (xquery-type-guard)

Originator:	Michael Rys
Locus:	xquery-algebra

Description:

Does XQuery require a syntax for type guards?

Proposal:

We have added the "TREAT AS" operator to this current Working Draft. This is especially helpful for function arguments, since we do not have function polymorphism in the current specification. We solicit feedback on this operator.

Issue 9 : Separation of clauses in FLWR (xquery-separation-of-flowers)

Originator:	Algebra Editors
Locus:	xquery-algebra

Description:

A FOR clause can stand alone: FOR $var IN expr1 RETURN expr2. A LET clause can stand alone: LET $var := expr1 EVAL expr2. It has been suggested that a WHERE clause should also be able to stand alone: WHERE expr1 RETURN expr2.

Note: in the current BNF, at least one FOR or LET clause must occur, but it is not necessary to have a FOR.

Observation: This functionality is already provided by the conditional expression, IF expr1 THEN expr2 ELSE [ ]. There does not seem to be a compelling reason to provide a second way to write this expression. However, making the ELSE clause optional might be a slight improvement.

Proposal:

The following content model should be adopted for FLWR expressions:

FlwrExpr ::= (ForClause | LetClause)+ [WhereClause] ReturnClause

This allows us also to remove the separate LET/EVAL expression, which can now be expressed using LET/RETURN.

Issue 10 : Alignment of Syntax (xquery-alignment-of-syntax)

Originator:	Algebra Editors
Locus:	xquery-algebra

Description:

Should the syntaxes of XQuery and the XML Query Algebra be more closely aligned?

Proposal:

Since the Algebra is not a user language, it does not seem necessary for it to use the same keywords as XQuery, as long as it is straightforward to map to it.

Issue 11 : Alternative syntax for element construction (xquery-element-construction)

Originator:	Algebra Editors
Locus:	xquery-algebra

Description:

(From Algebra team): We think it would be helpful to have two syntaxes for construction, xquery-algebra a[. . .] as used in the Algebra and <a> . . . </a> as used in XQuery. Not least, the a[. . .] syntax matches the syntax used in the Algebra for types.

Proposal:

We do not believe it is helpful to have two syntaxes for the same thing, and we do not feel that this suggestion would make XQuery easier to read or use. In any case, square brackets are used in XQuery to enclose predicates and are not available for the suggested purpose.

Issue 12 : Fusion (xquery-fusion)

Originator:	Michael Rys
Locus:	xquery-algebra

Description:

Consider adding a fusion operator to XQuery. Michael has supplied us with a few queries in which fusion would be helpful, and fusion seems promising as a feature, but we have not yet done adequate study of this issue. We would like to explore a wider set of use cases to make sure that we take a general approach to the problems related to fusion.

This requires further study. Use cases will need to be developed to determine precisely what forms of a fusion operator are actually helpful for data integration. An extensive and informative thread on this topic can be found in the W3C archives, starting at lists.w3.org/Archives/Member/w3c-archive/2000Dec/0132.html (W3C members only)..

Issue 13 : Filter as a Function (xquery-filter-function)

Originator:	Dana
Locus:	xquery-algebra

Description:

Dana has observed that Filter can be a function rather than an operator, if nodes have identity.

Resolution:

Filter is now a function, where the first parameter is the expression to be filtered, and the second parameter is the filter expression. We find this more elegant.

Issue 14 : TRY/CATCH and error() (xquery-try-catch-error)

Originator:	XQuery Editors
Locus:	xquery-algebra

Description:

We believe the following approach to error handling would be very useful - (1) introduce TRY <expression> CATCH <expression>, similar to try/catch in OO languages. Instead of having "throw" to throw objects, use error(<expression>), bind the result of the expression to the variable $err, and allow $err to be used in the CATCH clause.

Proposal:

Dana Florescu has been assigned the task of writing a proposal for this.

Issue 15 : Updates (xquery-updates)

Originator:	XQuery Editors
Locus:	xquery-algebra

Description:

We believe that a syntax for update would be extremely useful, allowing inserts, updates, and deletion. This might best be added as a non-normative appendix to the syntax proposal, since the algebra is not designed for defining this portion of the language.

Proposal:

Jonathan is working on a proposal for this.

Issue 16 : Algebra Mapping (xquery-algebra-mapping)

Originator:	XQuery Editors
Locus:	xquery-algebra

Description:

The algebra mapping is incomplete and out of date.

Proposal:

Jerome has created a new version of the mapping, with help from Mary, Dana and Mugur.

Issue 17 : XPath Productions (xquery-xpath-productions)

Originator:	XQuery Editors
Locus:	xquery-algebra

Description:

XPath can't be treated as a terminal symbol in our grammar. We intend XQuery to be a superset of the abbreviated syntax of XPath. We do not use the grammar of XPath directly because it needs to be integrated into our other productions. For example, operators like the union operator, which occur in path expressions, also occur in other contexts in XQuery, and it makes little sense to define two different operators. This raises issues of coordination with XPath.

Issue 18 : Abstract Syntax (xquery-abstract-syntax)

Originator:	XQuery Editors
Locus:	xquery-algebra

Description:

Jerome and Mary have suggested that we abandon the separate abstract syntax for XQuery, in favor of a higher-level BNF.

Proposal:

This is the BNF that now appears in Appendix B.

Issue 19 : Recursion (xquery-recursion)

Originator:	XQuery Editors
Locus:	xquery-algebra

Description:

Should XQuery support general recursion, or should it be limited in some way? Status quo: XQuery currently supports general recursion.

Reference: XML Query Algebra Issues

[Issue-0008] Fixed point operator or recursive functions]

[Issue-0032] Full regular path expressions]

Issue 20 : Copy and Reference Semantics (xquery-copy-reference)

Originator:	XQuery Editors
Locus:	xquery-algebra

Description:

Copy and reference semantics must be defined properly for updates to work. This must be coordinated with the algebra team.

Issue 21 : View Definition (xquery-persist-views-functions)

Originator:	Mugur
Locus:	xquery-algebra

Description:

Do we need a way to define views?

Resolution:

While a mechanism for view definition is desirable, we do not currently intend to provide one in Level 1.

Issue 22 : Human-Readable Syntax for Types (xquery-type-syntax)

Originator:	Algebra Editors
Locus:	xquery-algebra

Description:

The Algebra has a syntax for declaring types. Up to now, XQuery uses XML Schema for declaring types. Is this sufficient? Some important questions:

Are type names sufficient, or does XQuery really need its own syntax for declaring types?
Would Normalized Universal Names (derived from MSL) be sufficient for type names?
How will type names be bound to definitions?

Issue 23 : What is a Query (xquery-query)

Originator:	Dana
Locus:	xquery-algebra

Description:

What is a query?

According the the algebra: any number of type declarations, function definitions, variable definitions and expressions.

According to XQuery: any number of namespace declarations, function definitions and a unique expression.

These definitions should be coordinated.

Issue 24 : XPath Type Coercions (xquery-xpath-coercions)

Originator:	XQuery Editors
Locus:	xquery-algebra

Description:

XPath has a number of implicit type coercions, and also has implied existential quantification in some places. In XML 1.0, which had a small and loose type system, this was less problematic than it is with XML Schema, which introduces many types and relationships among types. If XQuery is to be compatible with XPath, we need to study these rules carefully, and adapt them to be rational and intuitive when used with the XML Schema type system.

Also, there are interactions between quantification and type coercion in XPath, sometimes causing non-intuitive results.

Issue 25 : Support for Unordered Collections (xquery-unordered-collections)

Originator:	Algebra Editors
Locus:	xquery

Description:

Does XQuery need features to add support for unordered collections? If so, what features are required? In the current draft, "unordered" is a property of a list. The user can create an ordered list from an unordered list by using SORTBY. The distinct() function not only removes duplicates from a list, it also renders the list unordered.

Do we need a function that merely removes the ordered property of a list?

How does the ordered/unordered property of a list affect the semantics of operators applied to it?

Issue 26 : Identity-based equality operator (xquery-equality-identity)

Originator:	Algebra Editors
Locus:	xquery-algebra

Description:

Do we need an identity-based equality operator? Please justify your answer with sample queries. Note that XPath gets along without it.

Issue 27 : Deep equality (xquery-deep-equality)

Originator:	Jonathan
Locus:	xquery-algebra

Description:

In XPath, <book><title> Mark Twain </title></book> and <book><author> Mark Twain </author></book> are treated as equal in comparisons. Is this acceptable for us? Do we need another notion of deep equality? If so, what are the compatibility issues with XPath?

Issue 28 : Reference Constructor (xquery-reference-constructor)

Originator:	Jonathan
Locus:	xquery

Description:

Queries should be able to return references; for instance, a query that generates a table of contents should also be able to create references to the items in the content itself. How can we extend the syntax to support this?

Issue 29 : Precedence of Operators (xquery-precedence)

Originator:	Dana
Locus:	xquery

Description:

The XQuery editorial team is working through a number of cases to determine the best precedence of operators. Our current precedence rules are shown in Appendix B.

Issue 30 : Queries with Invalid Content (xquery-invalid-content)

Originator:	XQuery Editors
Locus:	xquery-algebra

Description:

Is it an error for a query to specify content that may not appear, according to the schema definition? Consider the following query:

invoice//nose

If the schema does not allow a nose to appear on an invoice, is this query an error, or will it simply return an empty list?

Issue 31 : Function Libraries (xquery-function-library)

Originator:	XQuery Editors
Locus:	xquery

Description:

XQuery needs a mechanism to allow function definitions to be shared by multiple queries. The XQuery grammar allows function definitions to occur without a query expression.

We must provide a way for queries to access functions in libraries. For instance, we might add an IMPORT statement to XQuery, with the URI of the functions to be imported. It must be possible to specify either that (1) local definitions replace the imported definitions, or (2) imported definitions replace the local ones.

Issue 32 : Correspondence of Types (xquery-type-correspondence)

Originator:	Jerome Simeon
Locus:	xquery-algebra

Description:

Section 2.9, on functions, portrays XQuery as a statically typed language, but the mechanisms by which static typing is established are still being developed by the XML Query Algebra editorial team. A complete accounting for type requires that the XML Query Algebra conform completely to the XML Schema type system, and that many open issues be resolved.

The semantics of XQuery are defined in terms of the operators of the XML Query Algebra (see Appendix E). The mapping of XQuery operators into Algebra operators is still being designed, and may result in some changes to XQuery and/or the Algebra. The type system of XQuery is the type system of XML Schema. Work is in progress to ensure that the type systems of XQuery, the XML Query Algebra, and XML Schema are completely aligned. The details of the operators supported on simple XML Schema datatypes will be defined by a joint XSLT/Schema/Query task force.

Issue 33 : Excluding Undesired Elements (xquery-exclude-undesireables)

Originator:	Don Chamberlin
Locus:	xquery

Description:

How do we exclude undesired elements from the results of joins?

This need came out of a thread exploring data integration scenarios, starting with http://lists.w3.org/Archives/Member/w3c-archive/2000Dec/0132.html.

Issue 34 : Alignment of Precedence (xquery-align-precedence)

Originator:	Jerome Simeon
Locus:	xquery-algebra

Description:

The precedence rules of XQuery and the algebra are not completely aligned. This needs to be fixed by the time both specifications are finished.

Issue 35 : Escape to ABQL (xquery-escape-to-abql)

Originator:	Jerome Simeon
Locus:	xquery

Description:

Is there a need to be able to escape to ABQL?

Issue 36 : CAST and TREAT AS Syntax (xquery-cast-syntax)

Originator:	Jonathan Robie
Locus:	xquery

Description:

Various approaches to the syntax of CAST AS and TREAT AS have been proposed. Java and SQL have both been sources used to discuss these approaches. We are still exploring other syntactic approaches.

Issue 37 : XML Schema Datatypes Constructors (xquery-datatype-constructors)

Originator:	Paul Cotton
Locus:	xquery

Description:

The set of constructor functions for XML Schema simple datatypes has not yet been established. A joint task force with XML Schema has been chartered to determine the operators on datatypes, and we expect them to also determine the set of constructors for datatypes.

Issue 38 : Attribute Constructor Function (xquery-attribute-constructor-function)

Originator:	Don Chamberlin
Locus:	xquery

Description:

We need a function for constructing attributes, eg:

<foo> 
   IF $f/temp > 200
	  THEN attribute("warning", "about to explode!!")
     ELSE []f
</foo>

In the above example, the first parameter of attribute is the name of the attribute, and the second is the content of the attribute. Attributes created within an element constructor are placed in the attribute list of the element that is constructed.

Issue 39 : Attribute Name, Attribute Content (xquery-attribute-name-content)

Originator:	Don Chamberlin
Locus:	xquery

Description:

We need functions to return the name of an attribute and the content of an attribute.

Issue 40 : Dereference Operator and Links (xquery-dereference-links)

Originator:	Jonathan Robie
Locus:	xquery

Description:

Does the dereference operator work on links, such as XLink or HTML href?

Issue 41 : XML Constructor (xquery-literal-xml-constructor)

Originator:	Jonathan Robie
Locus:	xquery-algebra

Description:

Is there a need for a constructor that creates an instance of the XML Query Data Model from a string that contains XML text?

Issue 42 : Eval (xquery-eval)

Originator:	Jonathan Robie
Locus:	xquery-algebra

Description:

Is there a need to be able to execute a string that contains the text of an XQuery query? (This is similar to the eval function in Lisp.)

Issue 43 : Inline XML Schema Declarations (xquery-inline-xml-schema)

Originator:	Don Chamberlin
Locus:	xquery

Description:

Do we need to allow inline XML schema declarations in the prolog of a query? The following example shows one potential syntax for this. It extends the namespace declaration to allow literal XML Schema text to occur instead of a URI in a namespace declaration. The implementation would then assign an internal URI to the namespace.

  NAMESPACE fid = "http://www.example.com/fiddlefaddle.xsd"


  NAMESPACE loc =  [[
    <xsd:schema xmlns:xsd = "http://www.w3.org/2000/10/XMLSchema">
      <xsd:simpleType name="myInteger">
        <xsd:restriction base="xsd:integer">
          <xsd:minInclusive value="10000"/>
          <xsd:maxInclusive value="99999"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:schema>
  ]]


  FUNCTION string-to-myInteger ($s STRING) RETURNS loc:myInteger
  {
  -- If the facets of loc:myInteger are not satisfied,
  -- this function raises an error.
  
    LET $t := round(number($s))
    RETURN TREAT AS loc:myInteger($t)
  }


  string-to-myInteger("1023")

Issue 44 : Encoding (xquery-encoding)

Originator:	Jonathan Robie
Locus:	xquery

Description:

Does XQuery need a way to specify the encoding of a query? For instance, should the prolog allow statements like the following?

ENCODING utf-16

Issue 45 : Typing of Filter (xquery-filter-typing)

Originator:	Jerome Simeon
Locus:	xquery-algebra

Description:

The current mapping of filter to the algebra does not preserve much useful type information. Can the mapping be improved? Is there another approach to filter that would yield better type information?

Issue 46 : Typeswitch (xquery-typeswitch)

Originator:	Mary Fernandez
Locus:	xquery

Description:

Do we need an expression similar to the MATCH expression in the algebra? One proposed syntax looks like this:

TYPESWITCH expr 
		CASE typename1: expr1
		CASE typename2: expr2

Since this syntax corresponds more closely to the syntax of the algebra, it can be mapped more easily. Alternatively, the match expression in the algebra could be made to resemble the INSTANCEOF expression of XQuery.

Issue 47 : Mapping Input Context (xquery-mapping-input-context)

Originator:	Jerome Simeon
Locus:	xquery-algebra

Description:

Do we need a way to specify the nodes in the input context? Many queries do not specifically state the input to which they will be applied. This allows the same query, for instance, to be applied to a number of databases. It may be helpful for the mapping to introduce a global variable, eg $input, to represent the input nodes for a query. This variable might even be useful in the syntax of XQuery itself.

Issue 48 : Accessing Element Data (xquery-data-function)

Originator:	Mary Fernandez
Locus:	xquery

Description:

There is no operator to access the typed constant content of an element. In the Algebra, the data() operator does this. Should XQuery do the same?

Issue 49 : Embedding XML in XQuery (xquery-embedding-xml)

Originator:	XQuery Editors
Locus:	xquery

Description:

Do we need a way to embed literal XML content in a query? For instance, should there be a way to construct an XML element using the native syntax of XML?

Issue 50 : Embedding XQuery in XML (xquery-embedding-xquery-in-xml)

Originator:	Steve Tolkin
Locus:	xquery-algebra

Description:

Do we need a way to escape from XML to XQuery syntax? This could be used to provide functionality similar to that of ASP or JSP.

Issue 51 : Naive Implementation Strategy (xquery-naive-implementation)

Originator:	Marton Nagy
Locus:	xquery

Description:

Marton Nagy has suggested that it would be helpful to describe a naive implementation strategy for XQuery.

A naive XQuery implementation might parse the query, map it to Algebra syntax, and pass it to an Algebra implementation to request type checking from the algebra, returning an error if there were static type errors. A naive implementation might then request query execution from the algebra, get the results from the algebra and return it to the user.

Alternatively, the implementation might have its own algebra for execution, or it might generate statements in a specific implementation language such as XPath or SQL.We expect a wide variety of implementation approaches to be used in practice.

Issue 52 : XML-based Syntax (xquery-abql)

Originator:	XML Query WG
Locus:	xquery

Description:

XQuery needs an XML representation that reflects the structure of an XQuery query. Drafts of such a representation have been prepared, but it is not yet ready for publication.

Issue 53 : Mapping CAST AS (xquery-mapping-cast)

Originator:	Jerome Simeon
Locus:	xquery-algebra

Description:

The algebra does not have coercion functions between values. The mapping requires this for CAST AS.

Issue 54 : Defining Behavior for Well Formed, DTD, and Schema Documents (xquery-define-schema-variants)

Originator:	Don Chamberlin
Locus:	xquery-algebra

Description:

We should specify the behavior of XQuery for well formed XML, XML validated by a schema, and XML validated by a DTD.

XQuery: A Query Language for XML

W3C Working Draft 15 February 2001

Abstract

Status of this document

Table of contents

Appendices

1 Introduction

2 The XQuery Language

2.1 Path Expressions

2.2 Element Constructors

2.3 FLWR Expressions

2.4 Operators in Expressions

2.5 Conditional Expressions (IF THEN ELSE)

2.6 Quantifiers

2.7 Filtering

2.8 Datatypes

2.9 Functions

2.10 User-Defined Datatypes

2.11 Operations on Datatypes

2.12 Structure of an XQuery Unit

3 Querying Relational Data

3.1 Grouping

3.2 Joins

4 Conclusion

A References

B XQuery Grammar

C List of Operators

D Core Function Library

E XQuery Semantics

E.1 Notations

E.2 Algebra mapping for XQuery expressions

E.2.1 Path expressions

E.2.1.1 Current node

E.2.1.2 Root node

E.2.1.3 Simple Navigation

E.2.1.4 Recursive Navigation

E.2.1.5 Access to COMMENT, NODES, PIs and TEXT nodes

E.2.1.6 Dereference

E.2.1.7 Predicates

E.2.1.8 Parent

E.2.2 Node constructors

E.2.2.1 Local and qualified names

E.2.2.2 Element and attribute constructors

E.2.2.3 Comments and PI's constructors

E.2.3 FLWR expressions

E.2.3.1 F to for, L to let, W to where, R to do

E.2.3.2 Sort by

E.2.4 Operators

E.2.4.1 Boolean operators

E.2.4.2 Predicate operators

E.2.4.3 Arithmetic operators

E.2.4.4 Collection operators

E.2.5 Conditional Expressions

E.2.6 Literal values

E.2.7 Function Calls and Declarations

E.2.8 Existential and Universal Quantifiers

E.2.9 Variables

E.2.10 Operations on Types

E.3 Mapping of XQuery declarations to Algebra declarations

E.3.1 Mapping of type declarations

E.3.2 Mapping of namespace declarations

E.3.3 Mapping of function declarations

E.3.4 Predefined functions

F Java CUP Grammar for XQuery (Non-Normative)

G JavaCC Grammar (Non-Normative)

H XQuery Issues List (Non-Normative)

H.1 Issue List

H.2 XQuery Issues

Issue 1 : Collection types (xquery-collection-types)

Description:

Issue 2 : Definitions of Operators (xquery-definition-of-operators)

Description:

Issue 3 : Operators and functions (xquery-operators-and-functions)

Description:

Issue 4 : External Functions (xquery-external-functions)

Description:

Issue 5 : Function Definition (xquery-function-definition)

Description:

Issue 6 : Function Resolution (xquery-function-resolution)

Description: