XML Query Language (XQL)

Authors: Jonathan Robie, Texcel, Inc.
Joe Lapp, webMethods, Inc.
David Schach, Microsoft Corporation

Contributors: Michael Hyman, Microsoft Corporation
Jonathan Marsh, Microsoft Corporation

Abstract:

The XML Query Language (XQL) is a notation for addressing and filtering the elements and text of XML documents. XQL is a natural extension to the XSL pattern syntax. It provides a concise, understandable notation for pointing to specific elements and for searching for nodes with particular characteristics. This proposal was provided in September 1998 to the XSL Working Group (http://www.w3.org/Style/XSL/Group/1998/09/XQL-proposal.html) as input when considering extensions to the XSL pattern syntax.

The companion document Querying and Transforming XML describes the benefits of basing query and transformation languages for XML on the XSL transformation language and the extensions to the pattern language proposed here.

1. Introduction

The XSL pattern language (http://www.w3.org/TR/WD-xsl, section 2.6) provides an extremely understandable way to describe a class of nodes to process. It is declarative rather than procedural. One simply describes the types of nodes to look for using a simple pattern modeled after directory notation. For example, book/author means find author elements contained in book elements.

XQL (XML Query Language) provides a natural extension to the XSL pattern language. It builds upon the capabilities XSL provides for identifying classes of nodes, by adding Boolean logic, filters, indexing into collections of nodes, and more.

XQL is designed specifically for XML documents. It is a general purpose query language, providing a single syntax that can be used for queries, addressing, and patterns. XQL is concise, simple, and powerful.

Note that the term XQL is a working term for the language described in this proposal. It is not our intent that this term be used permanently.

For an excellent paper describing the justification for XQL and its theoretical background, please see Jonathan Robie's paper The Design of XQL.

Here are the design goals for XQL:

XQL strings shall be compact.
XQL shall be easy to type and read.
XQL syntax shall be simple for the simple and common cases.
XQL shall be expressed in strings that can easily be embedded in programs, scripts, and XML or HTML attributes.
XQL shall be easily parsed.
XQL shall be expressed in strings that can fit naturally in URLs.
XQL shall be able to specify any path which may occur in an XML document and specify any set of conditions for the nodes in the path.
XQL shall be able to uniquely identify any node in an XML document.
XQL queries may return any number of results, including 0.
XQL queries are declarative, not procedural. They say what should be found, not how it should be found. This is important because a query optimizer must be free to use indexes or other structures in order to find results efficiently.
XQL query conditions may be evaluated at any level of a document, and are not expected to navigate from the root of a document.
XQL queries return results in document order with no repeats of nodes.

XQL is designed to be used in many contexts. Although it is a superset of XSL patterns, it is also applicable to providing links to nodes, for searching repositories, and for many other applications.

XQL is a notation for retrieving information from a document. The information could be a set of nodes, information about node relationships, or derived values. The specification does not indicate the output format. The result of a query could be a node, a list of nodes, an XML document, an array, or some other structure. That is, XQL does not dictate the binary format of the returns, but rather the logical returns.

In some implementations the result of a query will be an XML document or a tree that can be fed back into XQL. In other cases the result will be a different type of structure, such as a set of pointers to nodes. Thus, closure is not guaranteed in general. If the implementation returns XML, however, that XML must be well formed, and closure is then guaranteed.

2. XML Patterns

This section describes the core XQL notation. These features should be part of every XQL implementation, and serve as the base level of functionality for its use in different technologies.

The basic syntax for XQL mimics the URI directory navigation syntax, but instead of specifying navigation through a physical file structure, the navigation is through elements in the XML tree.

For example, the following URI means find the foo.jpg file within the bar directory:

bar/foo.jpg

Similarly, in XQL, the following means find the collection of fuz elements within baz elements:

baz/fuz

Throughout this document you will find numerous samples. They refer to the data shown in the Sample Data Appendix.

2.1. Context

A 'context' is the set of nodes against which a query operates. This term is defined formally in Appendix A. To understand the concept of context, consider a tree containing nodes. Asking for all nodes named 'X' from the root of the tree would return one set of results. Asking for the set of nodes named 'X' from a branch in the tree would return a different set of results. Thus, the results of a query depend upon the context against which it is executed. There are a variety of ways that applications might specify the input context of a query.

XQL allows a query to select between using the current context as the input context and using the 'root context' as the input context. The 'root context' is a context containing only the root-most element of the document. By default, a query uses the current context. A query prefixed with '/' (forward slash) uses the root context. A query may optionally explicitly state that it is using the current context by using the './' (dot, forward slash) prefix. Both of these notations are analogous to the notations used to navigate directories in a file system.

The './' prefix is only required in one situation. A query may use the '//' operator to indicate recursive descent. When this operator appears at the beginning of the query, the initial '/' causes the recursive descent to be performed relative to the root of the document or repository. The prefix './/' allows a query to perform a recursive descent relative to the current context.

Examples:

Find all author elements within the current context. Since the period is really not used alone, this example forward-references other features:

./author

Note that this is equivalent to:

author

Find the root element (bookstore) of this document:

/bookstore

Find all author elements anywhere within the current document:

//author

Find all books where the value of the style attribute on the book is equal to the value of the specialty attribute of the bookstore element at the root of the document:

book[/bookstore/@specialty = @style]
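The difference between the current context and the root context can be sketched in Python. This is only a rough analogue: Python's xml.etree.ElementTree implements a small XPath subset, not XQL, and the sample data below is hypothetical (it is not the data from the Sample Data Appendix).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
DOC = """
<bookstore specialty="novel">
  <book style="novel"><author>A</author></book>
  <book style="textbook"><author>B</author></book>
  <author>C</author>
</bookstore>
"""

root = ET.fromstring(DOC)        # the root-most element (bookstore)
first_book = root.find("book")   # a narrower context inside the tree

# 'author' and './author': author children of whatever the current context is.
from_root = [a.text for a in root.findall("./author")]
from_book = [a.text for a in first_book.findall("author")]

# '//author': recursive descent from the root. ElementTree spells a descent
# from a given element as './/', so the root element is used as the context.
anywhere = [a.text for a in root.findall(".//author")]

# book[/bookstore/@specialty = @style]: the filter is evaluated once per book,
# but its left-hand side is read from the root context.
matching = [b for b in root.findall("book")
            if b.get("style") == root.get("specialty")]
```

The same query text ('author') yields different results for root and first_book, which is the point of the context concept.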

2.2. Results

The collection returned by an XQL expression preserves document order, hierarchy, and identity, to the extent that these are defined. That is, a collection of elements will always be returned in document order without repeats. A collection of attributes will be returned without repeats, but there is no implicit order since attributes are by definition unordered.

2.3. Collections

The collection of all elements with a certain tag name is expressed using the tag name itself. This can be qualified by showing that the elements are selected from the current context './', but the current context is assumed and often need not be noted explicitly.

Examples:

Find all first-name elements. These examples are equivalent:

./first-name

first-name

Find all unqualified book elements:

book

Find all first.name elements:

first.name

2.4. Selecting children and descendants

The collection of elements of a certain type can be determined using the path operators ('/' or '//'). These operators take as their arguments a collection (left side) from which to query elements, and a collection indicating which elements to select (right side). The child operator ('/') selects from immediate children of the left-side collection, while the descendant operator ('//') selects from arbitrary descendants of the left-side collection. In effect, the '//' can be thought of as a substitute for one or more levels of hierarchy. Note that the path operators change the context as the query is performed. By stringing them together users can 'drill down' into the document.

Examples:

Find all first-name elements within an author element. Note that the author children of the current context are found, and then first-name children are found relative to the context of the author elements:

author/first-name

Find all title elements, one or more levels deep in the bookstore (arbitrary descendants):

bookstore//title

Note that this is different from the following query, which finds all title elements that are grandchildren of bookstore elements:

bookstore/*/title

Find emph elements anywhere inside book excerpts, anywhere inside the bookstore:

bookstore//book/excerpt//emph

Find all titles, one or more levels deep in the current context. Note that this situation is essentially the only one where the period notation is required:

.//title
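As a rough illustration of the child and descendant operators, the following Python sketch runs comparable paths through xml.etree.ElementTree. ElementTree's path syntax is an XPath subset rather than XQL, and the sample data is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
DOC = """
<bookstore>
  <book><title>T1</title><excerpt><p>x <emph>e1</emph></p></excerpt></book>
  <magazine><title>T2</title><article><title>T3</title></article></magazine>
</bookstore>
"""
bookstore = ET.fromstring(DOC)

# bookstore//title: titles at any depth below the bookstore.
all_titles = [t.text for t in bookstore.findall(".//title")]

# bookstore/*/title: only titles that are grandchildren of the bookstore.
grandchild_titles = [t.text for t in bookstore.findall("*/title")]

# bookstore//book/excerpt//emph: each step narrows the context further.
emphs = [e.text for e in bookstore.findall(".//book/excerpt//emph")]
```

Here all_titles contains three titles while grandchild_titles contains two, because T3 sits one level deeper than '*' can reach.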

2.5. Collecting element children

An element can be referenced without using its name by substituting the '*' collection. The '*' collection returns all elements that are children of the current context, regardless of their tag name.

Examples:

Find all element children of author elements:

author/*

Find all last-names that are grand-children of books:

book/*/last-name

Find the grandchildren elements of the current context:

*/*

Find all elements with specialty attributes. Note that this example uses subqueries, which are covered in Filters, and attributes, which are discussed in Finding an attribute:

*[@specialty]

2.6. Finding an attribute

Attribute names are preceded by the '@' symbol. XQL is designed to treat attributes and sub-elements impartially, and capabilities are equivalent between the two types wherever possible.

Note: attributes cannot contain subelements. Thus, attributes cannot have path operators applied to them in a query. Such expressions will result in a syntax error. Likewise, attributes are inherently unordered and indices cannot be applied to them.

Examples:

Find the style attribute of the current element context:

@style

Find the exchange attribute on price elements within the current context:

price/@exchange

The following example is not valid:

price/@exchange/total

Find all books with style attributes. Note that this example uses subqueries, which are covered in Filters:

book[@style]

Find the style attribute for all book elements:

book/@style

2.7. Grouping

Parentheses can be used to group collection operators for clarity or where the normal precedence is inadequate to express an operation.

2.8. Filters

Constraints and branching can be applied to any collection by adding a filter clause '[ ]' to the collection. The filter is analogous to the SQL WHERE clause with ANY semantics. The filter contains a query within it, called the subquery. The subquery evaluates to a Boolean, and is tested for each element in the collection. Any elements in the collection failing the subquery test are omitted from the result collection.

For convenience, if a collection is placed within the filter, a Boolean TRUE is generated if the collection contains any members, and a FALSE is generated if the collection is empty. In essence, an expression such as author[degree] implies a collection-to-Boolean conversion function like the following mythical 'there-exists-a' method.

author[.there-exists-a(degree)]

Note that any number of filters can appear at a given level of an expression. Empty filters are not allowed.

Examples:

Find all books that contain at least one excerpt element:

book[excerpt]

Find all titles of books that contain at least one excerpt element:

book[excerpt]/title

Find all authors of books where the book contains at least one excerpt, and the author has at least one degree:

book[excerpt]/author[degree]

Find all books that have authors with at least one degree:

book[author/degree]

Find all books that have an excerpt and a title:

book[excerpt][title]
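The collection-to-Boolean conversion performed by a filter can be sketched in Python as follows. The helper name exists is made up for this illustration, the data is hypothetical, and xml.etree.ElementTree stands in for an XQL engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
DOC = """
<bookstore>
  <book><title>A</title><excerpt/><author><degree/></author></book>
  <book><title>B</title><author/></book>
  <book><excerpt/></book>
</bookstore>
"""
books = ET.fromstring(DOC).findall("book")

def exists(context, path):
    """True if the subquery selects at least one node from this context."""
    return context.find(path) is not None

# book[excerpt]
with_excerpt = [b for b in books if exists(b, "excerpt")]

# book[excerpt]/title: filter first, then continue the path.
titles = [t.text for b in with_excerpt for t in b.findall("title")]

# book[author/degree]: the subquery may itself be a path.
with_degree = [b for b in books if exists(b, "author/degree")]

# book[excerpt][title]: successive filters must all hold.
both = [b for b in books if exists(b, "excerpt") and exists(b, "title")]
```

Each filter is tested once per element of the collection, and elements that fail are simply left out of the result.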

2.9. Boolean Expressions

Boolean expressions can be used within subqueries. For example, one could use Boolean expressions to find all nodes of a particular value, or all nodes with nodes in particular ranges. Boolean expressions are of the form ${op}$, where {op} may be any expression of the form {b|a} - that is, the operator takes lvalue and rvalue arguments and returns a Boolean result. Applications can provide additional operators as needed, although this practice is discouraged.

Note that the XQL Extensions section defines additional Boolean operations.

Operators are case sensitive.

2.9.1. Boolean AND and OR

$and$ and $or$ are used to perform Boolean ands and ors.

The Boolean operators, in conjunction with grouping parentheses, can be used to build very sophisticated logical expressions.

Note that spaces are not significant and can be omitted, or included for clarity as shown here.

Examples:

Find all author elements that contain at least one degree and one award.

author[degree $and$ award]

Find all author elements that contain at least one degree or award and at least one publication.

author[(degree $or$ award) $and$ publication]

2.9.2. Boolean NOT

$not$ is a Boolean operator that negates the value of an expression within a subquery.

Examples:

Find all author elements that contain at least one degree element and that contain no publication elements.

author[degree $and$ $not$ publication]

Find all author elements that contain publication elements but do not contain either degree elements or award elements.

author[$not$ (degree $or$ award) $and$ publication]
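The four Boolean examples above map directly onto Python's and, or, and not. The sketch below assumes hypothetical data and a made-up helper named has; it is meant only to show how the grouping is read.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
DOC = """
<bookstore>
  <author id="a1"><degree/><award/><publication/></author>
  <author id="a2"><degree/></author>
  <author id="a3"><publication/></author>
  <author id="a4"><award/><publication/></author>
</bookstore>
"""
authors = ET.fromstring(DOC).findall("author")

def has(elem, tag):
    """Collection-to-Boolean test: does elem have at least one such child?"""
    return elem.find(tag) is not None

def ids(selected):
    return [a.get("id") for a in selected]

# author[degree $and$ award]
degree_and_award = ids(a for a in authors
                       if has(a, "degree") and has(a, "award"))

# author[(degree $or$ award) $and$ publication]
either_and_pub = ids(a for a in authors
                     if (has(a, "degree") or has(a, "award")) and has(a, "publication"))

# author[degree $and$ $not$ publication]
degree_no_pub = ids(a for a in authors
                    if has(a, "degree") and not has(a, "publication"))

# author[$not$ (degree $or$ award) $and$ publication]
pub_only = ids(a for a in authors
               if not (has(a, "degree") or has(a, "award")) and has(a, "publication"))
```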

2.10. Equivalence

The '=' sign is used for equality; '!=' for inequality. Alternatively, $eq$ and $ne$ can be used for equality and inequality.

Single or double quotes can be used for string delimiters in expressions. This makes it easier to construct and pass XQL from within scripting languages.

For comparing values of elements, the value() method is implied. That is, last-name < 'foo' really means last-name!value() < 'foo'.

Note that filters are always with respect to a context. That is, the expression book[author] means for every book element that is found, see if it has an author subelement. Likewise, book[author = 'Bob'] means for every book element that is found, see if it has a subelement named author whose value is 'Bob'. One can examine the value of the context as well, by using the . (period). For example, book[. = 'Trenton'] means for every book that is found, see if its value is 'Trenton'.

Examples:

Find all author elements whose last name is Bob:

author[last-name = 'Bob']
author[last-name $eq$ 'Bob']

Find all degree elements where the from attribute is not equal to 'Harvard':

degree[@from != 'Harvard']
degree[@from $ne$ 'Harvard']

Find all authors where the last-name is the same as the /guest/last-name element:

author[last-name = /guest/last-name]

Find all authors whose text is 'Matthew Bob':

author[. = 'Matthew Bob']

2.10.1. Comparisons and vectors

The lvalue of a comparison can be a vector or a scalar. The rvalue of a comparison must be a scalar or a value that can be cast at runtime to a scalar.

If the lvalue of a comparison is a set, then any (exists) semantics are used for the comparison operators. That is, the result of a comparison is true if any item in the set meets the condition.

2.10.2. Comparisons and literals

The lvalue of an expression cannot be a literal. That is, '1' = a is not allowed.

2.10.3. Casting of literals during comparison

All elements and attributes are strings, but quite often one wants to do numeric comparisons.

If the rvalue is an attribute, the text(lvalue) is compared to the text(rvalue).

If the rvalue is a literal, then the following rules apply:

literal type	comparison	example
String	text(lvalue) op text(rvalue)	a < 'foo'
Integer	(long) lvalue op (long) rvalue	a < 3
Real	(double) lvalue op (double) rvalue	a < 3.1

Exponential notation is not supported.

See Data types for a discussion of casting when elements are typed.
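The casting rules in the table can be sketched as a small Python function. The name compare and the operator table are hypothetical. Returning False when the text cannot be converted is an assumption of this sketch; the section on Data types states only that lvalues that cannot be coerced are omitted from the result set.

```python
import operator

# Comparison operators keyed by their shortcut spelling.
OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "=": operator.eq, "!=": operator.ne,
}

def compare(lvalue_text, op, literal):
    """Compare node text with a literal, casting the text to the literal's type."""
    try:
        if isinstance(literal, int):
            left = int(lvalue_text)      # Integer literal: compare as integers
        elif isinstance(literal, float):
            left = float(lvalue_text)    # Real literal: compare as doubles
        else:
            left = lvalue_text           # String literal: compare as text
    except ValueError:
        return False                     # assumption: uncoercible text never matches
    return OPS[op](left, literal)
```

For example, compare("10", "<", 3) is a numeric comparison and fails, while compare("10", "<", "3") is a string comparison and succeeds because '1' sorts before '3'.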

2.11. Methods

XQL provides methods for advanced manipulation of collections. These methods provide specialized collections of nodes (see Collection methods), as well as information about sets and nodes.

Methods are of the form {method}(arglist)

Consider the query book[author]. It will find all books that have authors. Formally, we call the book corresponding to a particular author the reference node for that author. That is, every author element that is examined is an author for one of the book elements. (See the Annotated XQL BNF Appendix for a much more thorough definition of reference node and other terms.) Methods apply to the reference node.

For example, the text() method returns the text contained within a node, minus any structure. (That is, it is the concatenation of all text nodes contained within an element and its descendants.) The following expression will return all authors named 'Bob':

author[text() = 'Bob']

The following will return all authors containing a first-name child whose text is 'Bob':

author[first-name!text() = 'Bob']

The following will return all authors containing any child element whose text is 'Bob':

author[*!text() = 'Bob']

Method names are case sensitive.

2.11.1. Information methods

The following methods provide information about nodes in a collection. These methods return strings or numbers, and may be used in conjunction with comparison operators within subqueries.

text()	The text contained within the element. This concatenates the text of all of the element's descendants. It does not include the tag names or any attribute values, comment values, etc. It trims the string as discussed in text(). (Returns a string.)
value()	Returns a type cast version of the value of an element. (See Datatypes) If data typing is not supported or a data type is not provided, returns the same as text().
nodeType()	Returns a number to indicate the type of the node:
	element	1
	attribute	2
	text	3
	PI	7
	comment	8
	document	9
nodeName()	The tag name of the node, including the namespace prefix. (Returns a string.)

2.11.1.1. text()

The text() method concatenates the text of the descendants of a node, optionally normalizing white space along the way. White space will be preserved for a node if the node has the xml:space attribute set to 'preserve', or if the nearest ancestor with the xml:space attribute has the attribute set to 'preserve'. When white space is normalized, it is normalized across the entire string. Spaces are used to separate the text between nodes. When entity references are used in a document, spacing is not inserted around the entity refs when they are expanded.

Examples:

Find the authors whose last name is 'Bob':

author[last-name!text() = 'Bob']

Note this is equivalent to:

author[last-name = 'Bob']

Find the authors with value 'Matthew Bob':

author[text() = 'Matthew Bob']
author[. = 'Matthew Bob']
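A minimal Python sketch of the text() behavior described above is shown below. The function name xql_text is hypothetical. As a simplification, it checks xml:space only on the node itself, not on the nearest ancestor, because xml.etree.ElementTree elements carry no parent links.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def xql_text(elem):
    """Concatenate descendant text, normalizing white space unless preserved."""
    pieces = list(elem.itertext())
    if elem.get(XML_SPACE) == "preserve":
        return "".join(pieces)
    # Separate the text of adjacent nodes with a space, then collapse and trim
    # white space across the entire string.
    return " ".join(" ".join(pieces).split())

author = ET.fromstring(
    "<author>\n  <first-name>Matthew</first-name>\n"
    "  <last-name>Bob</last-name>\n</author>"
)
kept = ET.fromstring('<p xml:space="preserve"> a  <b>b</b> </p>')
```

With these inputs, xql_text(author) gives 'Matthew Bob', which is why author[. = 'Matthew Bob'] can match an author whose name is split across two child elements.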

2.11.2. Collection index functions

index()

Returns the index number of the node within the parent. Indexes are zero-based, so 0 is the first element. (Returns a number.)

Find the first 3 degrees:

degree[index() $lt$ 3]

Note that the index function is with respect to a parent. Consider the following data:

<x>
  <y/>
  <y/>
</x>
<x>
  <y/>
  <y/>
</x>

The following expression will return the first y from each x:

x/y[index() = 0]

2.11.3. Shortcuts

For the purposes of comparison, value() is implied if omitted. In other words, when two items are compared, the comparison is between the values of the two items. Remember that in the absence of type information, value() returns text.

The following examples are equivalent:

author[last-name!value() = 'Bob' $and$ first-name!value() = 'Joe']
author[last-name = 'Bob' $and$ first-name = 'Joe']

price[@intl!value() = 'canada']
price[@intl = 'canada']

2.12. Indexing into a collection

XQL makes it easy to find a specific node within a set of nodes. Simply enclose the index ordinal within square brackets. The ordinal is 0 based.

For example, the following finds the first author element:

author[0]

The following finds the third author element that has a first-name:

author[first-name][2]

Note that indices are relative to the parent. In other words, consider the following data:

<x>
  <y/>
  <y/>
</x>
<x>
  <y/>
  <y/>
</x>

The following expression will return the first y from each of the x's:

x/y[0]

The following will return the first y from the entire set of y's within x's:

(x/y)[0]

The following will return the first y from the first x:

x[0]/y[0]
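The three indexing forms above differ only in where the index is applied. The Python sketch below uses list indexing to show this; the wrapper element r and the id attributes are hypothetical additions that make the data well formed and the results visible.

```python
import xml.etree.ElementTree as ET

# The data from above, wrapped in a hypothetical root element 'r'.
root = ET.fromstring(
    "<r><x><y id='a'/><y id='b'/></x><x><y id='c'/><y id='d'/></x></r>"
)
xs = root.findall("x")

# x/y[0]: the index is applied within each parent separately.
first_per_x = [x.findall("y")[0].get("id") for x in xs if x.findall("y")]

# (x/y)[0]: the parentheses build one collection first, then index it.
all_y = [y for x in xs for y in x.findall("y")]
first_overall = all_y[0].get("id")

# x[0]/y[0]: index the x collection, then index within that single x.
first_of_first = xs[0].findall("y")[0].get("id")

# ElementTree's own positional predicate is one-based, unlike XQL,
# so XQL's x/y[0] corresponds to 'x/y[1]' there.
et_first_per_x = [y.get("id") for y in root.findall("x/y[1]")]
```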

2.12.1. Finding the last element in a collection

The end() method returns true for the last element in a collection. Note that end() is relative to the parent node.

Examples:

Find the last book:

book[end()]

Find the last author for each book:

book/author[end()]

Find the last author from the entire set of authors of books:

(book/author)[end()]

3. XQL Extensions

The core XQL notation provides the minimal functionality needed across XQL clients. This section describes extensions that broaden the capabilities of XQL.

3.1. Namespaces

There is the need to establish namespace information inside of a query. This is particularly important since namespaces are local in scope. Thus, XQL needs a mechanism by which to establish namespaces either for the entire scope of the query or for local scope within a query.

Although we have yet to establish a specific namespace syntax, the following are requirements:

The syntax must be able to set a default namespace for use within the query
The syntax must be able to set a namespace for use at any particular scope
The syntax must be able to set namespaces to use at any level in the query. In other words, one should be able to set all of the namespaces at the beginning of the query, even if they are not used until elements deep inside of the query
The syntax must be able to establish a prefix to longname match up for the namespace

If namespaces are not specified for a query, the prefixes are matched.

The XQL processor may perform more efficiently if namespaces are defined for a session rather than on a per query basis. This specification does not describe an API or XML format for specifying a set of namespaces to use throughout a session. That is application defined.

3.1.1. Namespace methods

The following methods can be applied to a node to return namespace information.

baseName()	Returns the name portion of the node, excluding the prefix.
namespace()	Returns the URI for the namespace of the node.
prefix()	Returns the prefix for the node.

Examples:

Find all unqualified book elements. Note that this does not return my:book elements:

book

Find all book elements with the prefix 'my'. Note that this query does not return unqualified book elements:

my:book

Find all book elements with a 'my' prefix that have an author subelement:

my:book[author]

Find all book elements with a 'my' prefix that have an author subelement with a my prefix:

my:book[my:author]

Find all elements with a prefix of 'my':

my:*

Find all book elements from any namespace:

*:book

Find any element from any namespace:

*:*

Find the style attribute with a 'my' prefix within a book element:

book/@my:style
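The namespace methods can be approximated in Python with xml.etree.ElementTree, which stores a qualified name as '{uri}local'. The function names, the namespace URI, and the data are hypothetical. Note two differences from the text above: ElementTree discards the prefix after parsing, so prefix() has no direct counterpart, and it matches on the namespace URI rather than on the prefix.

```python
import xml.etree.ElementTree as ET

MY = "urn:example:my"  # hypothetical namespace URI
DOC = (
    f'<bookstore xmlns:my="{MY}">'
    '<book/><my:book my:style="novel"/><my:magazine/>'
    "</bookstore>"
)
root = ET.fromstring(DOC)

def namespace(tag):
    """URI part of an ElementTree tag, or '' for an unqualified name."""
    return tag[1:].partition("}")[0] if tag.startswith("{") else ""

def base_name(tag):
    """Local part of an ElementTree tag, without any namespace."""
    return tag.partition("}")[2] if tag.startswith("{") else tag

unqualified = root.findall("book")                    # book
my_books = root.findall("my:book", {"my": MY})        # my:book
my_all = [e for e in root if namespace(e.tag) == MY]  # my:*
any_ns_books = [e for e in root if base_name(e.tag) == "book"]  # *:book
my_style = my_books[0].get(f"{{{MY}}}style")          # @my:style on my:book
```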

3.2. Finding a collection of attributes

All attributes of an element can be returned using @*. This is potentially useful for applications that treat attributes as fields in a record.

Examples:

Find all attributes of the current element context:

@*

Find style attributes from any namespace:

@*:style

Find all attributes from the 'my' namespace, including unqualified attributes on elements from the 'my' namespace:

@my:*

3.3. Comparisons

A set of binary comparison operators is available for comparing numbers and strings and returning Boolean results. $lt$, $le$, $gt$, and $ge$ are used for less than, less than or equal, greater than, and greater than or equal, respectively. The comparison operators are also available in a case-insensitive form: $ieq$, $ine$, $ilt$, $ile$, $igt$, $ige$.

Single or double quotes can be used for string delimiters in expressions. This makes it easier to construct and pass XQL from within scripting languages.

All elements and attributes are strings, but quite often one wants to do numeric comparisons. See Casting of literals during comparison and Data types for information on casting.

<, <=, > and >= are allowed short cuts for $lt$, $le$, $gt$ and $ge$.

Examples:

Find all author elements whose last name is 'Bob' and whose price is greater than 50:

author[last-name = 'Bob' $and$ price $gt$ 50]

Find all degree elements where the from attribute is not equal to 'Harvard':

degree[@from != 'Harvard']

Find all authors whose last name begins with 'M' or greater:

author[last-name $ge$ 'M']

Find all authors whose last name begins with 'M', 'm' or greater:

author[last-name $ige$ 'M']

Find the first three books (indexes are zero-based, so these are indexes 0 through 2):

book[index() $le$ 2]

Find all authors who have more than 10 publications:

author[publications!count() $gt$ 10]

3.3.1. Data types

If data types are provided, the value() function uses them to determine the type for an element.

For purposes of comparison, the lvalue is always cast to the type of the rvalue, thereby guaranteeing that the types are consistent throughout the comparison. Any lvalues that cannot be coerced are omitted from the result set.

3.3.2. Type casting functions

XQL provides functions to typecast values. The typecasting functions can cast literals or sets.

Currently, only the date function is provided. It casts a value to a date. The value must be in one of the XML data type date formats.

Examples:

Find all books where the publication date is before January 1, 1995:

book[pub_date < date('1995-01-01')]

Find all books where the publication date is before the date value stored in the attribute first:

book[pub_date < date(@first)]
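A date comparison of this kind might be sketched in Python as follows. The helper names and the data are hypothetical, and the sketch assumes the YYYY-MM-DD form used in the example above; other date formats are not handled. Values that cannot be cast are skipped, following the rule in Data types.

```python
import datetime
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
DOC = """
<bookstore>
  <book first="1996-06-30"><title>Old</title><pub_date>1994-11-05</pub_date></book>
  <book first="1996-06-30"><title>New</title><pub_date>1997-02-01</pub_date></book>
  <book><title>Undated</title><pub_date>unknown</pub_date></book>
</bookstore>
"""
books = ET.fromstring(DOC).findall("book")

def to_date(text):
    """Cast text to a date, or None if it is missing or not YYYY-MM-DD."""
    try:
        return datetime.date.fromisoformat((text or "").strip())
    except ValueError:
        return None

def published_before(book, limit):
    """True if any pub_date child casts to a date earlier than limit."""
    if limit is None:
        return False
    dates = (to_date(d.text) for d in book.findall("pub_date"))
    return any(d is not None and d < limit for d in dates)

# pub_date < date('1995-01-01')
before_1995 = [b.find("title").text for b in books
               if published_before(b, to_date("1995-01-01"))]

# pub_date < date(@first)
before_first = [b.find("title").text for b in books
                if published_before(b, to_date(b.get("first")))]
```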

3.4. Any and all semantics

Authors can explicitly indicate whether to use any or all semantics through the $any$ and $all$ keywords.

$any$ flags that a condition will hold true if any item in a set meets that condition. $all$ means that all elements in a set must meet the condition for the condition to hold true.

$any$ and $all$ are keywords that appear before an expression.

Examples:

Find all author elements where one of the last names is Bob:

author[last-name = 'Bob']
author[$any$ last-name = 'Bob']

Find all author elements where none of the last-name elements are Bob:

author[$all$ last-name != 'Bob']

Find all author elements where the first last name is Bob:

author[last-name[0] = 'Bob']
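The any and all semantics correspond closely to Python's built-in any() and all(). The data below is hypothetical. One behavior in this sketch is Python's rather than something the text above specifies: all() is true for an empty set, so an author with no last-name elements satisfies the $all$ example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
DOC = """
<authors>
  <author id="a1"><last-name>Bob</last-name><last-name>Smith</last-name></author>
  <author id="a2"><last-name>Jones</last-name></author>
  <author id="a3"/>
</authors>
"""
authors = ET.fromstring(DOC).findall("author")

def last_names(author):
    return [n.text for n in author.findall("last-name")]

# author[$any$ last-name = 'Bob'], the default semantics
any_bob = [a.get("id") for a in authors
           if any(n == "Bob" for n in last_names(a))]

# author[$all$ last-name != 'Bob']
none_bob = [a.get("id") for a in authors
            if all(n != "Bob" for n in last_names(a))]

# author[last-name[0] = 'Bob']: only the first last-name is compared
first_bob = [a.get("id") for a in authors
             if last_names(a)[:1] == ["Bob"]]
```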

3.5. Union and intersection

The $union$ operator returns the combined set of elements selected by its operands. Elements are not reordered, and an element selected by more than one operand appears only once in the result. All elements in the selection list must be descendants of the context against which the selection is applied. Note: because this is a union, the set returned may include 0 or more elements of each element type in the list. To restrict the returned set to nodes that contain at least one of each of the elements in the list, use a filter, as discussed in Filters.

| is a shortcut for $union$.

The $intersect$ operator returns the set of elements in common between two sets. No element reordering is implied.

Examples:

Find all first-names and last-names:

first-name $union$ last-name

Find all books and magazines from a bookstore:

bookstore/(book | magazine)

Find all books and all authors:

book $union$ book/author

Find the first-names, last-names, or degrees from authors within either books or magazines:

(book $union$ magazine)/author/(first-name $union$ last-name $union$ degree)

Find all books with author/first-name equal to 'Bob' and all magazines with price less than 10:

book[author/first-name = 'Bob'] $union$ magazine[price $lt$ 10]
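One way to picture union and intersection is as set operations on node identity whose results are put back into document order. The Python sketch below makes that concrete; the function names and data are hypothetical, and an actual XQL engine may compute these results differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
root = ET.fromstring(
    "<bookstore><book><author>A</author></book><magazine/><book/></bookstore>"
)

# Position of every element in document order, keyed by object identity.
order = {id(node): i for i, node in enumerate(root.iter())}

def union(*collections):
    """All nodes from every operand, without repeats, in document order."""
    seen = {}
    for collection in collections:
        for node in collection:
            seen[id(node)] = node
    return sorted(seen.values(), key=lambda n: order[id(n)])

def intersect(left, right):
    """Nodes present in both operands, keeping the left operand's order."""
    right_ids = {id(n) for n in right}
    return [n for n in left if id(n) in right_ids]

# book $union$ book/author
books_and_authors = [n.tag for n in
                     union(root.findall("book"), root.findall("book/author"))]

# magazine $union$ book: the operand order does not affect the result order.
mixed = [n.tag for n in union(root.findall("magazine"), root.findall("book"))]

common = [n.tag for n in intersect(root.findall("*"), root.findall("book"))]
```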

3.6. Collection methods

The collection methods provide access to the various types of nodes in a document. Any of these collections can be constrained and indexed. The collections return the set of children of the reference node meeting the particular restriction.

textNode()	The collection of text nodes.
comment()	The collection of comment nodes.
pi()	The collection of pi nodes.
element(['name'])	The collection of all element nodes. If the optional text parameter is provided, it only returns element children matching that particular name.
attribute(['name'])	The collection of all attribute nodes. If the optional text parameter is provided, it only returns attributes matching that particular name.
node()	The collection of all non-attribute nodes.

Examples:

Find the second text node in each p element in the current context:

p/textNode()[1]

Find the second comment anywhere in the document. See Context for details on setting the context to the document root:

//comment()[1]

3.6.1. DOM nodes at the document root

Note that in the DOM the document object contains comments, processing instructions, and declarations, as well as what is termed the 'root element'. XQL uses the term 'document entity' for the root of the DOM tree - the document object - instead of the 'root element'. This allows comments, etc. at the document entity level to be addressed.

Examples:

Find all the comments at the document entity level:

/comment()

3.7. Aggregate methods

The following methods create an aggregate result based upon a set.

count()

Returns the number of nodes inside the set.

3.8. Additional methods

The following method returns a string indicating the type of a node.

nodeTypeString()

Returns one of the following strings to indicate the type of the node:

'document'
'element'
'attribute'
'processing_instruction'
'comment'
'text'

3.9. Ancestor

Ancestor finds the nearest ancestor matching a query. It returns either a single element result or null.

ancestor(query)

The nearest ancestor matching the provided query.

Examples:

Find the nearest book ancestor of the current element:

ancestor(book)

Find the nearest ancestor author element that is contained in a book element:

ancestor(book/author)
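The ancestor method can be sketched in Python by walking up a parent map, since xml.etree.ElementTree elements have no parent pointer. The function below is a hypothetical simplification: it accepts only a chain of element names (such as book, author) rather than an arbitrary query, and the data is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
root = ET.fromstring(
    "<bookstore><book><author><name><first>J</first></name></author>"
    "</book></bookstore>"
)

# Map every element to its parent so the tree can be walked upward.
parent = {child: p for p in root.iter() for child in p}

def ancestor(node, *path):
    """Nearest ancestor named path[-1] whose own ancestors match path[:-1]."""
    current = parent.get(node)
    while current is not None:
        # Collect the tags of 'current' and as many of its ancestors as needed.
        chain, probe = [], current
        for _ in path:
            if probe is None:
                break
            chain.append(probe.tag)
            probe = parent.get(probe)
        if chain == list(reversed(path)):
            return current
        current = parent.get(current)
    return None  # no matching ancestor

first = root.find(".//first")
nearest_book = ancestor(first, "book")              # ancestor(book)
nearest_author = ancestor(first, "book", "author")  # ancestor(book/author)
missing = ancestor(first, "magazine")
```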

3.10. Subscript operator

A range of elements can be returned. To do so, specify an expression rather than a single value inside of the subscript operator (square brackets). Such expressions can be a comma separated list of any of the following:

n	Returns the nth element
-n	Returns the element that is n-1 units from the last element. E.g., -1 means the last element. -2 is the next to last element.
m $to$ n	Returns elements m through n, inclusive

Examples:

Find the first and fourth author elements:

author[0,3]

Find the first through fourth author elements:

author[0 $to$ 3]

Find the first, the third through fifth, and the last author elements:

author[0,2 $to$ 4, -1]

Find the last author element:

author[-1]
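The subscript list can be modeled in Python as a function over a list. The name subscript and the tuple encoding of 'm $to$ n' are hypothetical. The sketch assumes that results come back in document order without repeats, that out-of-range indexes are ignored, and that negative values may also appear in ranges; the text above does not address the last two points.

```python
def subscript(collection, *items):
    """Select by a list of items: n, -n, or (m, n) standing for 'm $to$ n'."""
    size = len(collection)
    chosen = set()
    for item in items:
        low, high = item if isinstance(item, tuple) else (item, item)
        # Negative values count back from the end: -1 is the last element.
        low = low + size if low < 0 else low
        high = high + size if high < 0 else high
        chosen.update(i for i in range(low, high + 1) if 0 <= i < size)
    # Return each selected element once, in document order.
    return [collection[i] for i in sorted(chosen)]

authors = ["a0", "a1", "a2", "a3", "a4", "a5", "a6"]  # stand-ins for author elements

first_and_fourth = subscript(authors, 0, 3)       # author[0,3]
first_to_fourth = subscript(authors, (0, 3))      # author[0 $to$ 3]
mixed = subscript(authors, 0, (2, 4), -1)         # author[0,2 $to$ 4, -1]
last = subscript(authors, -1)                     # author[-1]
```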

Appendix A. Annotated XQL BNF

This appendix provides the production rules for XQL. It expresses the productions in the BNF notation used to portray XML in the XML 1.0 specification. The productions define an LL(1) grammar for XQL. The appendix also formally defines the query semantics that correspond to instances of each production. The formal definitions are not intended to suggest an approach to implementing XQL.

A.1. Notes on Notation

The XQL BNF defines several productions in terms of productions that are already defined in other specifications. The BNF borrows the following productions from the W3C XML 1.0 Recommendation:

Char
Letter
Digit

The BNF also borrows the following productions from the March 27, 1998 draft of XML Namespaces:

NCNameChar
NCName

Numbers appearing in curly braces following a production identify constraints on the syntax. Such productions specify a superset of the valid instances of the production; the constraints specify the valid subset. This notation is necessary to keep the BNF clear and readable. The section Syntactic Constraints defines the constraint associated with each constraint number.

This appendix distinguishes between node queries and full queries. A node query is a query that is only capable of returning nodes. A full query is capable of returning node values and non-node values. Non-node values include element type names, namespace URIs, concatenated text nodes, and node type names. The distinction is significant because node queries may appear as XSL match and select patterns, while full queries have use in other applications. The difference between the two forms of queries is trivial and exists only as constraints on the syntax of node queries. Node queries may contain nested full queries, so the BNF is actually simpler as a consequence of unifying the two forms of queries.

Unlike the production rules shown in the XML 1.0 Recommendation, the XQL BNF does not depict the valid placement of whitespace. Whitespace consists of spaces, tabs, carriage returns, and line feeds. XQL generally allows but does not require arbitrary strings of whitespace to appear between any two tokens; however, constraints on the productions may specify otherwise. XQL has been designed to ensure that spaces are not necessary so that XQL queries may appear in URI fragments. Leaving whitespace out of the BNF simplifies the BNF and makes the BNF more readable.

A.2. Terminology

Several notions are fundamental to the semantics of XQL. This section names these notions and provides their definitions. The language of this appendix depends heavily on these definitions.

A.2.1. Value

A value is one of the following: a node, a Boolean, a number, a string, or a set of values. When a query engine evaluates an expression, the result of the evaluation is always a value. The semantics ensure that when a value is a set, the set never contains a value that is a set. A non-node value may have an associated source node from which the value was derived.

A.2.2. Node

XQL recognizes the following constructs as nodes:

Document Entity
Processing Instruction
Comment
Element
Element Attribute
Markup-delimited region of text

A CDATA section is considered to be a region of text delimited by '<![CDATA[' and ']]>'.

A.2.3. Source Node

A value that is not a node may have an associated source node. The source node of a value is the node from which the value was derived. XQL associates source nodes with values so that values may be ordered in a set according to the document order of their associated source nodes.

A.2.4. Document Order

The values in a set may be placed in document order. Each value is a node, a non-node value that has an associated source node, or a non-node value that does not have an associated source node. Except for attribute nodes, every node corresponds to a region of markup or a region of text in the XML representation of the document under query.

The region associated with an attribute node is the region of the element node to which the attribute node belongs. The region associated with a non-node value that has an associated source node is the region of the source node. A non-node value that does not have an associated source node also does not have an associated region.

When in document order, the values of a set are ordered by the relative positions of the first characters of their associated regions. XQL does not define the relative order of two or more values that are associated with the same region. It also does not define the order that values associated with regions have relative to values not associated with regions.

A.2.5. Index

Every value of a set has an index. The index of a value is given by the position that the value would have were the values within the set to appear in document order. The value that occurs first has an index of zero, and subsequent values have sequentially increasing integer indexes.
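
The following Python sketch is illustrative only. It models document order and indexes under stated assumptions: the class Value, its field region_start (the offset of the first character of the associated region), and the functions document_order and index_of are all hypothetical names. XQL leaves two orderings undefined; this sketch keeps the input order for values that share a region and places values without a region last, which is one permissible choice rather than a requirement.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Value:
    payload: Any
    # Hypothetical: offset of the first character of the associated region,
    # or None when the value has no associated region.
    region_start: Optional[int] = None


def document_order(values):
    """Order values by the position of the first character of their regions."""
    with_region = [v for v in values if v.region_start is not None]
    without_region = [v for v in values if v.region_start is None]
    # sorted() is stable, so values sharing a region keep their input order.
    return sorted(with_region, key=lambda v: v.region_start) + without_region


def index_of(value, values):
    """Zero-based index the value would have in document order."""
    return document_order(values).index(value)


title = Value("title", region_start=40)
author = Value("author", region_start=10)
print(index_of(title, [title, author]))
```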

A.2.6. Search Context

An expression in XQL is always evaluated with respect to a search context. A search context is a set of nodes through which an expression may search to yield the value of the expression. All nodes in the search context have the same parent node; the search context consists of all nodes that are immediate children of this parent node, including the attribute nodes of an element node parent.

A.2.7. Reference Node

The reference node for a search context is the node that is the immediate parent of all nodes in the search context. Every search context has an associated reference node.

A.2.8. Set of Reference Nodes

A set of reference nodes is a set of nodes that an operator defines in the process of evaluating an operand. The operator establishes a search context for each node in this set, where the node serves as a reference node for the search context. Each search context retains an association with the set of reference nodes from which its reference node was drawn.

A.2.9. Set of Search Contexts

The set of search contexts of an expression consists of those search contexts with respect to which an operator evaluates the expression. An operator that defines a set of reference nodes establishes one search context for each node in this set. The operator evaluates an operand once for each of these search contexts. It is useful to think of the operand as having a set of search contexts.

A.2.10. Pattern Matching

Pattern matching is the process of testing a node against a query to determine whether there exists a search context such that were the query to be evaluated with respect to the search context it would return the node. For a given node and a given query, the node is said to match the query if there is such a search context. XQL borrows the notion of pattern matching from XSL but defines it in terms of the XQL formalism.

A.3. Query Terms

This section lists the productions for all of the query terms of XQL, and it defines the values to which the terms evaluate. The definitions determine the behavior of the XQL operators, since the operators act on the values of the terms rather than on the terms themselves.

A.3.1. Term Production Rules

The following productions support the term productions but do not themselves correspond to terms:

Productions that Support Terms

WildNCName ::= NCName | '*'
 WildQName ::= (WildNCName? ':')? WildNCName                         {1,2}
   XQLName ::= (Letter | '_') (Letter | Digit | '_')*                  {1}
   Integer ::= '-'? Digit+
     Param ::= Disjunction | Number | Text

The following productions define all of the terms that are available to an XQL query:

Terms

       Number ::= '-'? Digit+ ('.' Digit+)?                            {3}
         Text ::= '"' (Char - '"')* '"' | "'" (Char - "'")* "'"        {4}
         Root ::= '/'
          Dot ::= '.'
  ElementName ::= WildQName
AttributeName ::= '@' WildQName                                        {1}
   Invocation ::= XQLName '(' (Param (',' Param)*)? ')'              {5,6}

Each term by itself constitutes a valid query, except for the Number and Text terms. Instances of Number and Text may only appear as function and method parameters and as the right-hand operands of comparison operators. The BNF enforces this. All other terms may appear as either the left-hand operands or the right-hand operands of any operator.

A.3.2. Number

The Number term evaluates to a numeric value, which may be either a positive or negative integer or a floating point value.

A.3.3. Text

The Text term evaluates to a string value, which may be the empty string.

A.3.4. Root

The Root term evaluates to a set that contains only the document entity node of the document under query. The document entity node is the root-most node of the document. The children of this node include the document type declaration (when present) and the root element of the document.

A.3.5. Dot

As with all other terms, the Dot term is evaluated with respect to a search context. The term evaluates to a set that contains only the reference node for this search context.

A.3.6. ElementName

The ElementName term evaluates to a subset of the element nodes that the search context contains. It evaluates to the element nodes whose element type names match the ElementName instance. An element type name matches an instance under the following conditions:

(1) When the instance is identical to '*', every name matches the instance.
(2) When the instance contains neither the namespace separator ':' nor the wildcard character '*', the name matches the instance if its local name is identical to the instance.
(3) When the instance contains the namespace separator ':' but does not contain a namespace prefix, the name matches the instance if both its local name is identical to the local name portion of the instance and the name belongs to the default namespace.
(4) When the instance contains both a namespace prefix and a local name, the namespace prefix portion of the instance must match the name's namespace, and the local name portion of the instance must match the name's local name. A '*' in the namespace prefix position matches any namespace; otherwise the prefix must correspond to the name's namespace. Local names match using rules (1) and (2), where the term 'instance' refers to the local name portion of the instance.

All matches are case sensitive. If the search context contains no element nodes having element type names that match the instance, the term evaluates to the empty set.
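
The name-matching rules above can be sketched in Python as follows. This is an illustration under assumptions, not a normative algorithm. The function names local_matches and name_matches are hypothetical, as are the parameters: prefixes maps the namespace prefixes known to the query to namespace URIs, and default_namespace identifies the default namespace (None here stands for "no namespace"). The sketch also lets a '*' local part act as a wildcard after an empty prefix, which the rules do not address explicitly.

```python
def local_matches(local_name, pattern):
    # '*' matches every local name; otherwise the names must be identical.
    return pattern == "*" or local_name == pattern


def name_matches(namespace_uri, local_name, instance, prefixes,
                 default_namespace=None):
    """Test an element type name against an ElementName instance."""
    if ":" not in instance:
        # No namespace separator: compare against the local name only.
        return local_matches(local_name, instance)
    prefix, _, local_pattern = instance.partition(":")
    if prefix == "":
        # Separator without a prefix: the name must be in the default namespace.
        return (namespace_uri == default_namespace
                and local_matches(local_name, local_pattern))
    if prefix != "*":
        # A named prefix must be known and must map to the name's namespace.
        if prefix not in prefixes or prefixes[prefix] != namespace_uri:
            return False
    return local_matches(local_name, local_pattern)


prefixes = {"bk": "urn:books"}
print(name_matches("urn:books", "title", "bk:title", prefixes))
```

Python string comparison is case sensitive, which agrees with the requirement that all matches are case sensitive.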

A.3.7. AttributeName

The AttributeName term evaluates to a subset of the attribute nodes that the search context contains. It evaluates to the attribute nodes whose names match the AttributeName instance. An attribute name matches an instance using the name-matching rules given for ElementName. If the search context contains no attribute nodes having names that match the instance, the term evaluates to the empty set.

A.3.8. Invocation

An instance of Invocation is a call to an XQL function or method. Each instance of Invocation names the function or method and provides a list of parameters. The function or method name is given by the instance of XQLName that begins the instance of Invocation. XQL defines a variety of functions and methods. Each constrains the form that the Invocation instance may take. This section uses production rules to define the valid syntax of each function and method.

A.3.8.1. Parameters

An Invocation instance may contain instances of Param. Each instance of Param is called a 'parameter.' Parameters are XQL expressions. This section uses the term query parameter to refer to a parameter that is an instance of Disjunction. A query engine does not evaluate a query parameter. Instead it parses the parameter to produce a query object, which it passes to the function or method. The function or method is responsible for selecting the search context for the parameter and for evaluating the query object with respect to this search context. The value of the instance of Invocation depends on the search context of the instance, on the function or method being called, and the values of the parameters.

A.3.8.2. Functions

This section defines the functions of XQL. XQL defines two kinds of functions: collection functions and pure functions. Collection functions use the search context of the Invocation instance, while pure functions ignore the search context, except to evaluate the function's parameters. A collection function evaluates to a subset of the search context, and a pure function evaluates to either a constant value or to a value that depends only on the function's parameters.

XQL defines the following functions:

Functions

'ancestor(' Disjunction ')'
'attribute(' Text? ')'
'comment(' ')'                                                         {8}
'date(' Param ')'                                                      {7}
'element(' Text? ')'
'id(' Text ')'
'node(' ')'                                                            {8}
'pi(' ')'                                                              {8}
'textNode(' ')'                                                        {8}
'true(' ')'                                                          {7,8}
'false(' ')'                                                         {7,8}

A.3.8.2.1. ancestor()

"ancestor" is a collection function that evaluates to either the empty set or to a set that contains a single node. The function evaluates to a set containing a single node when at least one ancestor of the reference node matches the function's query parameter. The value of the function is a set containing the ancestor of the reference node that is nearest to the reference node. The nearest node is given as the last node when all such nodes are listed in document order. Note that this node is never the reference node itself.

If the reference node has no ancestor nodes or if no ancestor of the reference node matches the query, the function evaluates to the empty set. See the definition of Pattern Matching for an explanation of what it means for a node to match a query.
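
As an illustration only, the following Python sketch walks from the reference node toward the root and returns the nearest matching ancestor. The class Node, the function nearest_ancestor, and the matches callback are hypothetical; the callback stands in for the pattern-matching test of the query parameter, which this sketch does not implement.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def nearest_ancestor(reference_node, matches):
    """Return a list holding the nearest matching ancestor, or an empty list.

    The list plays the role of the set to which ancestor() evaluates.
    """
    node = reference_node.parent  # the reference node itself is never tested
    while node is not None:
        if matches(node):
            return [node]
        node = node.parent
    return []


bib = Node("bib")
book = Node("book", bib)
chapter = Node("chapter", book)
para = Node("para", chapter)
result = nearest_ancestor(para, lambda n: n.name == "book")
print([n.name for n in result])
```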

A.3.8.2.2. attribute()

"attribute" is a collection function that has an optional quoted string parameter. When the parameter is absent, the function evaluates to the set of all attribute nodes found in the search context. When the parameter is present it identifies an attribute name, and the function evaluates to the set of those attribute nodes that appear in the search context and that have this attribute name. The set may be the empty set.

A.3.8.2.3. comment()

"comment" is a collection function that evaluates to the set of all comment nodes found in the search context. This set may be the empty set.

A.3.8.2.4. date()

"date" is a pure function that typecasts the value of its parameter to a set of dates. If the parameter matches the Number or Text productions, the value of the function is a set containing a single date. If the parameter matches the Disjunction production, the value of the function is a set of dates, where the set contains one date for each member of the set to which the parameter evaluates. XQL does not define the representation of the date value, nor does it define how the function translates parameter values into dates.

A.3.8.2.5. element()

"element" is a collection function that has an optional quoted string parameter. When the parameter is absent, the function evaluates to the set of all element nodes found in the search context. When the parameter is present it identifies an element type name, and the function evaluates to the set of those elements that appear in the search context and have this element type name. The set may be the empty set.

A.3.8.2.6. id()

"id" is a pure function that evaluates to a set. The set contains an element node that has an 'id' attribute whose value is identical to the string that the Text parameter quotes. The element node may appear anywhere within the document under query. If more than one element node meets these criteria, the function evaluates to a set that contains the first node appearing in a document ordering of the nodes.

A.3.8.2.7. node()

"node" is a collection function that evaluates to the set of all nodes found in the search context, excluding attribute nodes. This set may be the empty set.

A.3.8.2.8. pi()

"pi" is a collection function that evaluates to the set of all processing instruction nodes found in the search context. This set may be the empty set.

A.3.8.2.9. textNode()

"textNode" is a collection function that evaluates to the set of all text nodes found in the search context. This set may be the empty set.

A.3.8.2.10. true(), false()

"true" and "false" are pure functions that each evaluate to a Boolean. "true()" evaluates to 'true', and "false()" evaluates to 'false'. These functions are useful in expressions that are constructed using entity references or variable substitution, since they may replace an expression found in an instance of Subquery without violating the syntax required by the instance of Subquery.

A.3.8.3. Methods

This section defines the methods of XQL. Functions and methods use the search context differently. A call to a method evaluates to a property of the reference node for the search context rather than to a subset of the search context.

XQL defines the following methods:

Methods

'baseName(' ')'                                                      {7,8}
'count(' ')'                                                         {7,8}
'end(' ')'                                                           {7,8}
'index(' ')'                                                           {7}
'namespace(' ')'                                                     {7,8}
'nodeName(' ')'                                                      {7,8}
'nodeType(' ')'                                                      {7,8}
'nodeTypeString(' ')'                                                {7,8}
'prefix(' ')'                                                        {7,8}
'text(' ')'                                                          {7,8}
'rawText(' ')'                                                       {7,8}
'value(' ')'                                                         {7,8}

A.3.8.3.1. baseName()

"baseName" is a method that evaluates to the local name of the search context's reference node. The reference node serves as the source node of the local name.

Local names are defined only for element nodes and attribute nodes. The local name of an element node is the local portion of the node's element type name. The local name of an attribute node is the local portion of the node's attribute name. If a local name is not defined for the reference node, the method evaluates to the empty set.

A.3.8.3.2. count()

"count" is a method that evaluates to a number. As with any term, this method is evaluated with respect to a search context. Every search context has an associated reference node and an associated set of reference nodes. "count()" evaluates to the number of reference nodes that appear in the associated set of reference nodes. The reference node serves as the source node of the count.

A.3.8.3.3. end()

"end" is a method that evaluates to a Boolean. As with any term, this method is evaluated with respect to a search context. Every search context has an associated reference node and an associated set of reference nodes. "end()" evaluates to 'true' if the reference node is the last node of the set of reference nodes, and it evaluates to 'false' otherwise. The reference node serves as the source node of the Boolean.

A.3.8.3.4. index()

"index" is a method that evaluates to a single number. As with any term, this method is evaluated with respect to a search context. Every search context has an associated reference node and an associated set of reference nodes. The number to which the method evaluates is the index that the reference node has within the set of reference nodes. The reference node serves as the source node of the index.

A.3.8.3.5. namespace()

"namespace" is a method that evaluates to the namespace URI associated with the search context's reference node, where the namespace URI is represented as a string. The reference node serves as the source node of the namespace URI.

Namespace URIs are defined only for element nodes and attribute nodes. The namespace URI of an element node is the namespace URI associated with the node's element type name. The namespace URI of an attribute node is the namespace URI associated with the node's attribute name. If a namespace URI is not defined for the reference node, the method evaluates to the empty set.

A.3.8.3.6. nodeName()

"nodeName" is a method that evaluates to the node name of the search context's reference node. The reference node serves as the source node of the node name.

Node names are defined only for element nodes and attribute nodes. The node name of an element node is the node's element type name. The node name of an attribute node is the node's attribute name. If a node name is not defined for the reference node, the method evaluates to the empty set.

A.3.8.3.7. nodeType()

"nodeType" is a method that evaluates to the node type number of the search context's reference node. The reference node serves as the source node of the node type number. One may also use the "nodeTypeString" method to return strings rather than numbers. The node type numbers and node type strings are borrowed from the W3C DOM effort. Those that this draft of XQL supports appear in the following table:

Type of Reference Node	Node Type Number	Node Type String
Element	1	'element'
Element Attribute	2	'attribute'
Markup-Delimited Region of Text	3	'text'
Processing Instruction	7	'processing_instruction'
Comment	8	'comment'
Document Entity	9	'document'
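
For illustration, the table can be written as a pair of Python lookup tables. The names NODE_TYPES, NODE_TYPE_STRINGS, and node_type_string are hypothetical; only the numbers and strings come from the table above.

```python
# Node type strings mapped to node type numbers, as listed in the table.
NODE_TYPES = {
    "element": 1,
    "attribute": 2,
    "text": 3,
    "processing_instruction": 7,
    "comment": 8,
    "document": 9,
}

# Reverse mapping, from node type number to node type string.
NODE_TYPE_STRINGS = {number: name for name, number in NODE_TYPES.items()}


def node_type_string(node_type_number):
    """Return the node type string for a number, or None if unsupported."""
    return NODE_TYPE_STRINGS.get(node_type_number)


print(node_type_string(7))
```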

A.3.8.3.8. nodeTypeString()

"nodeTypeString" is a method that evaluates to the node type string of the search context's reference node. The reference node serves as the source node of the node type string. See the nodeType() method for a table of node type strings.

A.3.8.3.9. prefix()

"prefix" is a method that evaluates to the namespace prefix of the search context's reference node. The reference node serves as the source node of the namespace prefix.

Namespace prefixes are defined only for element nodes and attribute nodes. The namespace prefix of an element node is the shortname for the namespace of the node's element type name. The namespace prefix of an attribute node is the shortname for the namespace of the node's attribute name. A node's namespace prefix may be defined within the query expression, within the document under query, or within both the query expression and the document under query. If it is defined in both places the prefixes may not agree. In this case, the prefix assigned by the query expression takes precedence. If a namespace prefix is not defined for the reference node, the method evaluates to the empty set.

A.3.8.3.10. text()

"text" is a method that evaluates to a normalization of a node's raw text string. The "rawText()" method defines the raw text string of a node. The pertinent node is the reference node for the method's search context. This node also serves as the source node for the resulting normalized text string.

Normalized text strings are only defined for element nodes, attribute nodes, and text nodes. The normalized text string of a text node is the text that the node contains with all leading and trailing whitespace removed. The normalized text string of an attribute node is the node's attribute value after applying attribute value normalization as defined by the W3C XML Recommendation. The normalized text string of an element node is a space-delimited concatenation of the normalized text strings of all the node's child nodes, subject to the following rules:

The normalized text strings of the nodes are concatenated in document order.
The normalized text string of an element node does not contain the normalized text strings of any of the node's attribute nodes.
The XML 1.0 specification defines the xml:space attribute. The definition provides rules for determining whether white space is preserved in a particular element. The normalized text string of a text node includes leading and trailing whitespace when whitespace is preserved for the node's parent element.
A single space character is inserted into the concatenation between every two nodes for which a normalized text string is defined, except between adjacent text nodes. Text nodes may be adjacent when CDATA sections are present. The whitespace between adjacent text nodes is not stripped, and no space is inserted between these nodes.
The normalized text string of a child element node is defined recursively by these rules.

If a normalized text string is not defined for the reference node, the method evaluates to the empty set.
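
The following Python sketch illustrates the normalization rules for a simplified node model. It is a reading of the rules under assumptions, not a reference implementation. The classes TextNode and Element and the function normalized_text are hypothetical. The model contains only text nodes and element nodes, so attribute, comment, and processing instruction nodes are not represented. The flag preserve_space is a hypothetical stand-in for the effective xml:space setting of an element and is assumed to have been resolved for each element beforehand.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TextNode:
    text: str


@dataclass
class Element:
    children: List[Union["Element", TextNode]] = field(default_factory=list)
    # Hypothetical: True when whitespace is preserved for this element.
    preserve_space: bool = False


def normalized_text(node, preserve=False):
    """Normalized text string of a text node or element node."""
    if isinstance(node, TextNode):
        return node.text if preserve else node.text.strip()

    parts = []  # one entry per child element or per run of adjacent text nodes
    run = []    # raw text of the current run of adjacent text nodes

    def flush():
        if run:
            joined = "".join(run)  # no space is inserted inside a run
            parts.append(joined if node.preserve_space else joined.strip())
            run.clear()

    for child in node.children:  # children are visited in document order
        if isinstance(child, TextNode):
            run.append(child.text)
        else:
            flush()
            parts.append(normalized_text(child))  # recursive definition
    flush()
    return " ".join(parts)  # a single space between non-adjacent parts


doc = Element([TextNode("  XQL "), Element([TextNode(" is ")]), TextNode(" concise  ")])
print(normalized_text(doc))
```

Treating a run of adjacent text nodes as one unit keeps the whitespace between them intact, while whitespace at the outer ends of the run is still removed unless it is preserved.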

A.3.8.3.11. rawText()

"rawText" is a method that evaluates to the raw text string of the search context's reference node. The reference node serves as the source node of the raw text string.

Raw text strings are only defined for element nodes, attribute nodes, and text nodes. The raw text string of an element node is a document order concatenation of the raw text strings of all child nodes of the element node, excluding attribute nodes. The raw text string of a text node is the text that the node contains. The raw text string of an attribute node is the attribute's value after applying attribute value normalization, as defined by the XML specification. If a raw text string is not defined for the reference node, the method evaluates to the empty set.

A.3.8.3.12. value()

"value" is a method that evaluates to the typecasted value of the search context's reference node. The reference node serves as the source node of the typecasted value.

The typecasted value of a node depends on the node's data type. A query engine may recognize node data types so that comparison operations may properly compare analogous nodes. For example, an attribute of an element node may provide the node's data type, and the query engine could recognize this attribute and interpret the node as a value of this type. The attribute might identify the node as an address or a date or a floating point number of a particular precision.

The typecasted value of a node for which type information is available is a value having the type specified for the node. The typecasted value of a node for which type information is not available is the value of the node after applying the 'text' method to the node. In this latter case, if the 'text' method is not defined for the node, the method evaluates to the empty set.

A.4. Query Operators

XQL defines a set of operators that act on one or two operands to yield a value. XQL distinguishes between an operand and the value of the operand. An operand is an expression that the operator evaluates with respect to a search context, and the result of the evaluation is the value of the operand. This section enumerates the different operators, explains how the operations interpret their operands, and defines the values of the operations as a function of the values of their operands.

A.4.1. Operator Precedence

The productions for XQL reflect operator precedence. Higher precedence operators appear more deeply nested. This organization facilitates implementing XQL from the BNF, since it allows one to perform a query directly from a parse of the query. Each node of the parse corresponds to an expression, and by evaluating the expressions of child nodes before evaluating the expressions of parent nodes the engine evaluates the expressions in precedence order. Again in accordance with the parse, all operators of a given precedence are evaluated in left-to-right order.

The following table lists operators in precedence order, highest precedence first, where operators of a given row have the same precedence. The table also lists the associated productions.

Query Operators by Decreasing Precedence
Production	Operator(s)
Grouping	( )
Filter	[ ]
Subscript	[ ]
Bang	!
Path	/ //
Comparison	= != < <= > >= $eq$ $ne$ $lt$ $le$ $gt$ $ge$ $ieq$ $ine$ $ilt$ $ile$ $igt$ $ige$
Intersection	$intersect$
Union	$union$ |
Negation	$not$
Conjunction	$and$
Disjunction	$or$

Although it is not listed above, the Query production may also be thought of as an operator production. The operator is the query engine, and the instance of Query is the operand. The Queries section defines the Query production.

A.4.2. Operator Production Rules

XQL provides a rich variety of operators through the use of a dollar-dollar syntax. Future versions of XQL may support additional operators that conform to this syntax. Applications of XQL such as XQL editors need not be dependent on the meanings of the operators and hence might be designed to accommodate all potential operators. These applications may accomplish this by allowing operators to conform to the following production rule:

Operator

Operator ::= '$' XQLName '$'
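
As an illustration, an application that wants to recognize any operator of this form, without knowing its meaning, might use a regular expression. The Python sketch below is an approximation: the names OPERATOR and find_operators are hypothetical, and the character classes of Python's re module are assumed to be close enough to the XML 1.0 Letter and Digit productions for the purpose of the example. The sketch does not skip operators that appear inside quoted Text terms.

```python
import re

# '$', then a letter or underscore, then letters, digits, or underscores, then '$'.
# [^\W\d] matches a word character that is not a digit.
OPERATOR = re.compile(r"\$[^\W\d]\w*\$")


def find_operators(query):
    """Return every dollar-delimited operator token found in the query."""
    return OPERATOR.findall(query)


print(find_operators("author[0 $to$ 3] $union$ editor"))
```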

The operator productions rely on the following supporting productions:

Productions that Support Operators

      PathOp ::= '/' | '//'
     UnionOp ::= '$union$' | '|'
ComparisonOp ::= '=' | '!=' | '<' | '<=' | '>' | '>=' | '$eq$' | '$ne$' |
                 '$lt$' | '$le$' | '$gt$' | '$ge$' | '$ieq$' | '$ine$' |
                 '$ilt$' | '$ile$' | '$igt$' | '$ige$'
Multiplicity ::= '$any$' | '$all$'

The operator productions follow, ordered by increasing precedence of the associated operators:

Operators

 Disjunction ::= Conjunction | Conjunction '$or$' Disjunction
 Conjunction ::= Negation | Negation '$and$' Conjunction
    Negation ::= Union | '$not$' Negation
       Union ::= Intersection | Intersection UnionOp Union
Intersection ::= Comparison | Comparison '$intersect$' Intersection

  Comparison ::= Path | Multiplicity? LValue ComparisonOp RValue   {11,13}
      LValue ::= Path                                                 {11}
      RValue ::= Path | Number | Text                                 {12}

        Path ::= AbsolutePath | RelativePath
AbsolutePath ::= Root | PathOp RelativePath
RelativePath ::= Bang | Bang PathOp RelativePath

        Bang ::= Subscript | Subscript '!' Invocation                 {14}

   Subscript ::= Filter | Filter '[' IndexList? ']'
   IndexList ::= IndexArg (',' IndexArg)*
    IndexArg ::= Integer | Range
       Range ::= Integer '$to$' Integer                                {9}

      Filter ::= Grouping | Filter '[' Subquery ']'
    Subquery ::= Disjunction

    Grouping ::= RelativeTerm | '(' Disjunction ')'
RelativeTerm ::= Dot | Invocation | ElementName | AttributeName

A.4.3. Disjunction

The $or$ operator defines a disjunction, and the value of a disjunction is always a Boolean. The operator evaluates each of its operands with respect to the search context of the instance of Disjunction. When the value of an operand is a Boolean 'true', a set of Booleans containing at least one 'true' member, or a set containing at least one non-Boolean member, the operator interprets the value of the operand as 'true'. All other values are interpreted as 'false'. The value of the operation is 'true' if the operator interprets at least one operand value as 'true', and the value of the operation is 'false' otherwise.

A.4.4. Conjunction

The $and$ operator defines a conjunction, and the value of a conjunction is always a Boolean. The operator evaluates each of its operands with respect to the search context of the instance of Conjunction. When the value of an operand is a Boolean 'true', a set of Booleans containing at least one 'true' member, or a set containing at least one non-Boolean member, the operator interprets the value of the operand as 'true'. All other values are interpreted as 'false'. The value of the operation is 'true' if the operator interprets both operand values as 'true', and the value of the operation is 'false' otherwise.

A.4.5. Negation

The $not$ operator defines a negation, and the value of a negation is always a Boolean. The operator evaluates its operand with respect to the search context of the instance of Negation. When the value of the operand is a Boolean 'true', a set of Booleans containing at least one 'true' member, or a set containing at least one non-Boolean member, the operator interprets the value of the operand as 'true'. All other values are interpreted as 'false'. The value of the operation is 'true' if the operator interprets the operand value as 'false', and the value of the operation is 'false' otherwise.
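
The $or$, $and$, and $not$ operators share one rule for interpreting an operand value as a Boolean. The Python sketch below follows that rule literally and is illustrative only. The function names interpret_as_true, xql_or, xql_and, and xql_not are hypothetical, and Python lists, tuples, and sets stand in for XQL sets.

```python
def interpret_as_true(value):
    """Interpret an operand value as a Boolean."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (list, tuple, set, frozenset)):
        # True if any member is the Boolean 'true' or is not a Boolean at all.
        return any(member is True or not isinstance(member, bool)
                   for member in value)
    # Read literally, every other value is interpreted as 'false'.
    return False


def xql_or(left, right):
    return interpret_as_true(left) or interpret_as_true(right)


def xql_and(left, right):
    return interpret_as_true(left) and interpret_as_true(right)


def xql_not(operand):
    return not interpret_as_true(operand)


print(xql_and(["some node"], []))
```

An empty set is interpreted as 'false', so a path that finds no nodes makes a conjunction 'false'.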

A.4.6. Intersection

The $intersect$ operator defines an intersection, and the value of an intersection is always a set. The operator evaluates its operands with respect to the search context of the instance of Intersection. The intersection operation is defined on all operands. An operand that evaluates to a value that is not a set is first interpreted as a set that contains only this value.

The value of the operation is the intersection of the sets to which the operands evaluate. The intersection contains only those nodes that are found in both operand sets, and it contains only those non-node values that are found in both operand sets. Non-node values are compared without any form of normalization or typecasting. A non-node value that has an associated source node may only be identical to another non-node value that has the same associated source node.

A.4.7. Union

The $union$ and | operators are synonymous operators that define a union. The value of a union is always a set. The operator evaluates its operands with respect to the search context of the instance of Union. The union operation is defined on all operands. An operand that evaluates to a value that is not a set is first interpreted as a set that contains only this value.

The value of the operation is the union of the sets to which the operands evaluate. The union contains no duplicate values. Non-node values are compared without any form of normalization or typecasting. A non-node value that has an associated source node may only be identical to another non-node value that has the same associated source node.
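
The identity rules for union and intersection can be sketched as follows. The Node and Value classes are hypothetical stand-ins chosen for illustration: nodes are compared by identity, and a non-node value is paired with its optional source node. Sets are modeled as lists so that member order stays visible.

```python
from dataclasses import dataclass
from typing import Any, Optional


class Node:
    """Stand-in for a document node; nodes are compared by identity."""
    def __init__(self, name):
        self.name = name


@dataclass(frozen=True)
class Value:
    """A non-node value with its optional source node (hypothetical model)."""
    data: Any
    source: Optional[Node] = None


def as_set(value):
    """Interpret a non-set value as a set that contains only this value."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def same(a, b):
    """Decide whether two members count as the same value."""
    if isinstance(a, Node) or isinstance(b, Node):
        return a is b
    # Non-node values: no normalization, no typecasting, same source node.
    return (type(a.data) is type(b.data)
            and a.data == b.data
            and a.source is b.source)


def union(left, right):
    """All members of both operands, without duplicates."""
    result = []
    for item in as_set(left) + as_set(right):
        if not any(same(item, seen) for seen in result):
            result.append(item)
    return result


def intersect(left, right):
    """Only the members found in both operands."""
    right_set = as_set(right)
    result = []
    for item in as_set(left):
        in_right = any(same(item, r) for r in right_set)
        if in_right and not any(same(item, s) for s in result):
            result.append(item)
    return result
```

In this model the string "1" and the number 1 remain distinct members of a union, and two equal strings taken from different source nodes do not intersect.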

A.4.8. Comparison

The Comparison production defines the comparison operations. Unlike comparison operations in most languages, the value of a comparison is a subset of the set to which the left-hand operand evaluates. The subset consists of those values that have a specific relationship with the value of the right-hand operand. The comparison operation defines a variety of comparison operators. The operator that is present determines the relationship that must hold between the operands.

A.4.8.1. Operand Values - Left-hand Set and Right-hand Value

A comparison operator compares the value of its left-hand operand to the value of the right-hand operand. A comparison operator evaluates each of its operands with respect to the search context of the instance of Comparison.

The left-hand operand of a comparison operation is the instance of LValue that precedes the operator. This operand may evaluate to any value. The operator always interprets the value of a left-hand operand as a set by interpreting a value that is not a set as a set whose only member is this value. This definition refers to the set as the left-hand set. The comparison operation evaluates to a subset of the left-hand set.

The right-hand operand of a comparison operation is the instance of RValue that follows the operator. The operator always interprets the value of the right-hand operand as a single non-set value. If the right-hand operand evaluates to a set that contains a single value, the operator interprets the operand as if it evaluated to this single value. This definition refers to the value as the right-hand value.

A.4.8.2. Relationships

A comparison operator specifies a relationship that must hold between the right-hand value and either one member or all members of the left-hand set. If the appropriate relationship does not hold, the operation evaluates to the empty set.

The instance of Multiplicity that precedes the left-hand operand identifies the relationship's multiplicity. The multiplicity determines the number of members of the left-hand set for which the relationship must hold. If the multiplicity is $any$ the relationship must hold for at least one member of the left-hand set. If the multiplicity is $all$ the relationship must hold for every single member of the left-hand set. When no instance of Multiplicity is present, the comparison operation defaults to a multiplicity of $any$.

The instance of ComparisonOp is the comparison operator. The comparison operator identifies the relationship that a comparison operation examines. A table of the comparison operators and their associated relationships follows:

Comparison Operations
Operator(s)	Relationship	Case-Sensitive?
= $eq$	X equals Y	Yes
!= $ne$	X is not equal to Y	Yes
< $lt$	X is less than Y	Yes
<= $le$	X is less than or equal to Y	Yes
> $gt$	X is greater than Y	Yes
>= $ge$	X is greater than or equal to Y	Yes
$ieq$	X equals Y	No
$ine$	X is not equal to Y	No
$ilt$	X is less than Y	No
$ile$	X is less than or equal to Y	No
$igt$	X is greater than Y	No
$ige$	X is greater than or equal to Y	No

X represents a member of the left-hand set and Y represents the right-hand value. The last column indicates whether the comparison recognizes differences in letter case when comparing string values.

A.4.8.3. Typecasting

Comparisons are only defined when X and Y both have the same value type. XQL recognizes strings, numbers, and Booleans as primitive XQL types, but a query engine may also support other richer value types. The query engine takes responsibility for casting X and Y to the same type so that it may properly compare them. For example, an XML element may contain an attribute that indicates that the element represents a date. If this date is compared to a string that contains a human-readable date, the two values are comparable, but only after casting the string to a date type.

XQL uses the following rules for typecasting the operands of a comparison operation:

A comparison of X and Y is actually a typecasted comparison of value(X) and value(Y). Note that the 'value' method also normalizes text values via the 'text' method.
If value(X) and value(Y) are of different types, but one of the two values is a primitive XQL type, the primitive type value is cast to the type of the other value.
If value(X) and value(Y) are of different types, and if neither type is a primitive XQL type, the direction of the typecast depends on the types being compared.

All operators are defined for string and number values of X and Y, but only '=', '!=', '$eq$', and '$ne$' are defined when X and Y are Booleans.

A.4.8.4. Sort Order

The "less than" and "greater than" comparisons of number values are well-defined, but those of string values require definition. When X and Y are both strings, they are interpreted as integer sequences of the Unicode values of their characters. X is "less than" Y if it is not equal to Y and precedes Y in an increasing sort of X and Y. X is "greater than" Y if it is not equal to Y and follows Y in an increasing sort of X and Y. The sort is performed as if the shorter string were extended to the length of the longer string by appending zeroes to its associated Unicode integer sequence.
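
The string ordering described above can be sketched in Python, assuming that "Unicode values" means code points as returned by the built-in ord. The function names are hypothetical.

```python
def code_points(text, length):
    """Return the code points of text, padded with zeroes to the given length."""
    points = [ord(ch) for ch in text]
    return points + [0] * (length - len(points))


def compare_strings(x, y):
    """Return -1, 0, or 1 when x sorts before, equal to, or after y."""
    width = max(len(x), len(y))
    a, b = code_points(x, width), code_points(y, width)
    # Python compares lists of integers element by element.
    return (a > b) - (a < b)
```

For example, compare_strings("Bob", "Bobby") returns -1 because the padded sequence for "Bob" carries a zero where "Bobby" carries the code point of "b". Uppercase letters sort before lowercase ones, since their code points are smaller.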

A.4.8.5. Value of a Comparison Operation

The value of a comparison operation is a subset of the left-hand set. It is the set of those members of the left-hand set for which the relationship holds. A comparison operation will evaluate to the empty set under any of the following conditions:

The left-hand set is the empty set.
The relationship is not defined for any value in the left-hand set.
The right-hand value is the empty set.
The relationship is not defined for the right-hand value.
No value of X has the specified relationship with Y.
The multiplicity is $all$ and at least one value of X does not have the specified relationship with Y.
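
The conditions above can be sketched as a single Python function. This is an illustration under stated assumptions: sets are modeled as lists, the relationship is passed in as a Python callable, typecasting is left out entirely, and a TypeError raised by the callable stands in for "the relationship is not defined". The name compare and its parameters are hypothetical.

```python
import operator


def compare(left, right, relation, multiplicity="any"):
    """Return the members of the left-hand set for which the relation holds."""
    left_set = list(left) if isinstance(left, (list, tuple)) else [left]
    if isinstance(right, (list, tuple)):
        if len(right) != 1:
            return []              # no single right-hand value is available
        right = right[0]
    matches = []
    for x in left_set:
        try:
            holds = relation(x, right)
        except TypeError:
            holds = False          # relationship not defined for these values
        if holds:
            matches.append(x)
    if multiplicity == "all" and len(matches) != len(left_set):
        return []                  # $all$ requires every member to match
    return matches


prices = [12, 55, 6.5]
any_over_ten = compare(prices, 10, operator.gt)          # [12, 55]
all_over_ten = compare(prices, 10, operator.gt, "all")   # []
```

With the default multiplicity of $any$ the result is the matching subset. With $all$ a single failing member empties the result.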

A.4.9. Path

The Path production defines the child and descendant operators, and it defines both the unary and binary forms of these operators. The child operator is given by /, and the descendant operator is given by //. These operators differ only in the set of search contexts that they define for the right-hand operand. The value of a Path operation is the union of the values to which the right-hand operand evaluates with respect to each of these search contexts.

A.4.9.1. Unary Child Operator

The unary / operator appears as an instance of PathOp in the AbsolutePath production. Its right-hand operand is the instance of RelativePath that follows it.

This operator defines a set of reference nodes whose only member is the value of the Root term. The operator ignores the search context of the Path instance. Each node in the set of reference nodes serves as a reference node for a search context. The set of search contexts of the right-hand operand is the set of all these search contexts.

A.4.9.2. Unary Descendant Operator

The unary // operator appears as an instance of PathOp in the AbsolutePath production. Its right-hand operand is the instance of RelativePath that follows it.

This operator defines a set of reference nodes consisting of all nodes that descend from the node to which the Root term evaluates. The operator ignores the search context of the Path instance. Each node in the set of reference nodes serves as a reference node for a search context. The set of search contexts of the right-hand operand is the set of all these search contexts.

A.4.9.3. Binary Child Operator

The binary / operator appears as an instance of PathOp in the RelativePath production. Its left-hand operand is the instance of Bang that precedes it, and its right-hand operand is the instance of RelativePath that follows it.

The operator evaluates the left-hand operand with respect to the search context of the instance of Path that contains the operator. This search context has an associated reference node. If the value of the left-hand operand is a set that contains only this reference node, the operator defines the set of search contexts of the right-hand operand as a set that contains only this search context.

If the left-hand operand evaluates to any other value, the operator derives a set of reference nodes from this value. When the value is a set, the set of reference nodes contains all of the node members of the set. When the value is not a set, the set of reference nodes is the empty set. Each node in the set of reference nodes serves as a reference node for a search context, and the set of search contexts of the right-hand operand is the set of all these search contexts.

A.4.9.4. Binary Descendant Operator

The binary // operator appears as an instance of PathOp in the RelativePath production. Its left-hand operand is the instance of Bang that precedes it, and its right-hand operand is the instance of RelativePath that follows it.

The operator evaluates the left-hand operand with respect to the search context of the instance of Path that contains the operator, and it derives a set of reference nodes from the resulting value. When the value is a set, the set of reference nodes consists of all nodes that descend from the node members of this set. When the value is not a set, the set of reference nodes is the empty set. Each node in the set of reference nodes serves as a reference node for a search context, and the set of search contexts of the right-hand operand is the set of all these search contexts.

A.4.9.5. Value of a Path Operation

All of the Path operators evaluate the right-hand operand once for each search context in the set of search contexts. Evaluating the right-hand operand for a particular search context yields a subset of the value of the operation. If an evaluation yields a set, the subset is this set. If an evaluation yields a non-set value, the subset is a set that contains only this value. The value of the Path operation is the union of all the subsets.
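
The way a Path operation assembles its value can be sketched with Python's xml.etree.ElementTree module. The sketch is not an XQL implementation: the right-hand operand is modeled as a Python callable, the helper names are hypothetical, and the descendant helper follows the wording above literally by excluding the left-hand nodes themselves from the set of reference nodes.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<bookstore>"
    "<book><title>Seven Years in Trenton</title>"
    "<author><last-name>Bob</last-name></author></book>"
    "<magazine><title>Tracking Trenton</title></magazine>"
    "</bookstore>"
)


def child_contexts(nodes):
    """Binary '/': each node of the left-hand set is a reference node."""
    return list(nodes)


def descendant_contexts(nodes):
    """Binary '//': every node that descends from a left-hand node."""
    found = []
    for node in nodes:
        for desc in node.iter():          # iter() yields node, then descendants
            if desc is not node and not any(desc is f for f in found):
                found.append(desc)
    return found


def evaluate_path(contexts, right_operand):
    """Union of the right-hand operand's values over all search contexts."""
    result = []
    for ctx in contexts:
        value = right_operand(ctx)
        subset = value if isinstance(value, list) else [value]
        for item in subset:
            if not any(item is r for r in result):
                result.append(item)
    return result


def children_named(name):
    """Hypothetical right-hand operand: child elements with this name."""
    return lambda ctx: [child for child in ctx if child.tag == name]


books = evaluate_path(child_contexts([DOC]), children_named("book"))
titles = evaluate_path(descendant_contexts([DOC]), children_named("title"))
```

Here books holds the single book element, and titles holds both title elements, each found as a child of a node that descends from bookstore. An empty set of search contexts yields an empty value.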

A.4.10. Bang

The Bang production defines the bang operator, which is given by the symbol !. This operator is semantically identical to the binary child operator (/), which the Path production defines. The operators only differ syntactically: the bang operator requires that its right-hand operand be an instance of Invocation, while the binary child operator does not have this constraint.

The left-hand operand of the bang operator is the instance of Subscript that precedes the operator. The right-hand operand is the instance of Invocation that follows the operator. An instance of the Bang production evaluates to the value to which the binary child operator evaluates when operating on these left-hand and right-hand operands.

A.4.11. Subscript

The Subscript production defines the subscript operator, which is an operator that XQL denotes with a pair of square brackets ('[' and ']'). Square brackets also denote the filter operator, but a parser may distinguish between the two operators because the brackets contain different literals. The left-hand operand of a subscript operator is the instance of Filter that precedes the brackets, and the right-hand operand is the instance of IndexList that the brackets enclose.

The subscript operator evaluates its left-hand operand with respect to the search context of the Subscript instance. If the left-hand operand evaluates to a single value rather than to a set, the operator interprets the value as a set that contains this single value. The value is always interpreted as a set. This definition refers to this set as the left-hand set.

The right-hand operand is a list of zero or more comma-delimited index arguments. This list is called the "index list." Each argument of the index list is either an instance of Integer or an instance of Range. Each instance of Range contains two instances of Integer.

Regardless of where an instance of Integer appears, the subscript operator interprets it as an index. An instance of Integer that is zero or a positive number is interpreted as an index that equals this number. An instance of Integer that is a negative number is interpreted as an index that equals this negative number plus the size of the left-hand set, where the size of the left-hand set is given by the total number of values appearing in the left-hand set.

Together the arguments of the index list identify a set of indexes. For each argument that is an instance of Integer, the set contains the index that this instance of Integer represents. Each instance of Range specifies two indexes. For each argument that is an instance of Range, the set of indexes contains all integers that lie between the two indexes, including the two indexes themselves. If the argument list contains no arguments, the index list is assumed to be equivalent to 0 $to$ -1.

The value of the subscript operation is a subset of the values that occur in the left-hand set. This definition refers to this subset as the "output set." For each index in the set of indexes, the output set contains the value in the left-hand set that appears at this index, where the operator interprets the left-hand set as a set whose members appear in document order. If no value appears at this index, the index is ignored.
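
The index arithmetic of the subscript operator can be sketched in Python. The sketch assumes that the left-hand set is a list already in document order, that indexes are zero-based (as the default of 0 $to$ -1 suggests), and that a Range is written as a (first, last) tuple. The function names are hypothetical.

```python
def resolve(index, size):
    """Map an Integer to an index; negative values count from the end."""
    return index + size if index < 0 else index


def subscript(left_set, index_list=None):
    """Select members by index; indexes with no value are ignored."""
    size = len(left_set)
    if not index_list:
        index_list = [(0, -1)]            # empty list is equivalent to 0 $to$ -1
    indexes = set()
    for arg in index_list:
        if isinstance(arg, tuple):        # a Range, written here as (first, last)
            low, high = sorted((resolve(arg[0], size), resolve(arg[1], size)))
            indexes.update(range(low, high + 1))
        else:
            indexes.add(resolve(arg, size))
    # Return the selected members in document order.
    return [left_set[i] for i in sorted(indexes) if 0 <= i < size]


books = ["a", "b", "c", "d"]
last = subscript(books, [-1])             # ["d"]
tail = subscript(books, [(1, -1)])        # ["b", "c", "d"]
```

An index that points past the end of the left-hand set, such as 9 in a four-member set, contributes nothing to the output set.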

A.4.12. Filter

The Filter production defines the filter operator, which is an operator that XQL denotes with a pair of square brackets ('[' and ']'). Square brackets also denote the subscript operator, but a parser may distinguish between the two operators because the brackets contain different literals. The left-hand operand of a filter operator is the instance of Filter that precedes the brackets, and the right-hand operand is the instance of Subquery that the brackets enclose.

The filter operator defines a set of search contexts for the right-hand operand. The operator evaluates the left-hand operand with respect to the search context of the Filter instance. This search context has an associated reference node. If the value of the left-hand operand is a set that contains only this reference node, the operator defines the set of search contexts of the right-hand operand as a set that contains only this search context.

If the left-hand operand evaluates to any other value, the operator derives a set of reference nodes from the value. When the value is a set, the set of reference nodes consists of the node members of this set. When the value is not a set, the set of reference nodes is the empty set. Each node in the set of reference nodes serves as a reference node for a search context, and the set of search contexts of the right-hand operand is the set of all these search contexts.

The value of the filter operation is a set of nodes taken from the set to which the left-hand operand evaluates. If the set of search contexts for the right-hand operand is empty, the value of the filter operation is the empty set. Otherwise, the filter operation evaluates the right-hand operand with respect to each search context in the set of search contexts.

Upon evaluating the right-hand operand for a given search context, the filter operation interprets the value as a Boolean. When the value of the right-hand operand is a Boolean 'true', a set of Booleans containing at least one 'true' member, or a set containing at least one non-Boolean member, the operator interprets the value of the operand as 'true'. All other values of the right-hand operand are interpreted as 'false'.

Each node found in the set to which the left-hand operand evaluates is a reference node for one of the search contexts with respect to which the operation evaluates the right-hand operand. The value of the filter operation is the set of those reference nodes whose search contexts meet the following requirement: upon evaluating the right-hand operand with respect to the search context, the operation interprets the operand's value as a Boolean 'true'.
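
The filter operation can be sketched as follows. Nodes are modeled as Python dictionaries and the subquery as a callable that receives one reference node. Both choices, and all the names, are assumptions made for illustration. The Boolean interpretation reads the rule above literally.

```python
def interpret_as_boolean(value):
    """Interpret a subquery value as a Boolean, following the rule above."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (list, tuple)):  # assumption: a set is modeled as a list
        return any((m is True) or not isinstance(m, bool) for m in value)
    return False


def apply_filter(left_value, subquery):
    """Keep each reference node whose subquery value is interpreted as true."""
    if not isinstance(left_value, (list, tuple)):
        return []                         # a non-set value yields no reference nodes
    # Only node members (dictionaries in this model) become reference nodes.
    reference_nodes = [n for n in left_value if isinstance(n, dict)]
    return [n for n in reference_nodes if interpret_as_boolean(subquery(n))]


books = [
    {"title": "Seven Years in Trenton", "awards": ["Honorable Mention"]},
    {"title": "History of Trenton", "awards": []},
]
# Roughly the shape of a filter that keeps books having at least one award.
awarded = apply_filter(books, lambda book: book["awards"])
```

Here awarded contains only the first book, because the subquery yields an empty set for the second.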

A.4.13. Grouping

The Grouping production defines the grouping operator, which is an operator that XQL denotes with a pair of parentheses. It is a unary operator whose operand is the instance of Disjunction that appears between the parentheses. The operation evaluates the operand with respect to the search context of the instance of Grouping. The value of the operation is the value of its operand. The grouping operator allows one to explicitly declare the order of evaluation of expressions.

A.5. Queries

A 'query' is an instance of the Query production. It is an expression that an application inputs to a query engine. Unless the query engine is hard-coded to operate on a specific set of document nodes, the application also provides the query engine with a set of input nodes. The query engine applies the query to the set of input nodes and returns a set of values to the application. The values are those that the query specified for retrieval from the set of input nodes.

The Query production is defined as follows:

Query

Query ::= Disjunction                                                 {10}

A query may be thought of as an operand of the query engine. The query engine is an operator that evaluates the query once for each node in an input set that the application provides. Together these nodes form a set of reference nodes, where each node serves as a reference node for a search context.

For each reference node in the set of reference nodes, the query engine establishes a search context. Together these search contexts comprise the set of search contexts for the query. The query engine evaluates the query once for each context in this set. Each evaluation of a query yields a result subset. If an evaluation yields a set, the result subset is this set. If an evaluation yields a non-set value, the result subset is a set that contains only this value. The result set is the union of all the result subsets. The query engine returns the result set to the application that issued the query.

The result set of a query has the following two properties:

No node occurs more than once in the result set.
The members of the result set occur in document order.
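
The evaluation loop of a query engine, together with the two properties of the result set, can be sketched in Python. The Node class and its position attribute are hypothetical: position stands in for the rank of a node in document order, and the query is modeled as a callable that receives one reference node.

```python
class Node:
    """Stand-in for a document node with a document-order rank."""
    def __init__(self, name, position):
        self.name = name
        self.position = position          # hypothetical document-order rank


def run_query(input_nodes, query):
    """Evaluate the query once per input node and merge the result subsets."""
    result = []
    for reference_node in input_nodes:
        value = query(reference_node)
        subset = value if isinstance(value, list) else [value]
        for item in subset:
            if not any(item is seen for seen in result):
                result.append(item)       # no node occurs more than once
    return sorted(result, key=lambda node: node.position)


book1, title1 = Node("book", 1), Node("title", 2)
book2, title2 = Node("book", 4), Node("title", 5)
children = {book1: [title1], book2: [title2]}

# The input nodes are deliberately supplied out of document order.
titles = run_query([book2, book1], lambda ref: children.get(ref, []))
```

The result lists title1 before title2 regardless of the order of the input nodes, and a node returned by several evaluations appears only once.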

A.6. Syntactic Constraints

This section lists constraints on the syntax that the BNF does not itself express. The motivation for having such constraints appears in the section Notes on Notation. The numbers correspond to the numbers appearing in curly braces at the end of the line that contains the associated XQL production.

Some constraints express limitations on the values that an expression may take. In general, the parser cannot enforce these constraints, since it is possible for an expression to evaluate to an appropriate value in some circumstances and to an inappropriate value in others. The query engine would have to enforce the constraint at the time it evaluates the query. However, a parser with knowledge of the schema of the document under query may be able to partially enforce this rule prior to query evaluation.

Some productions having constraints are expressed in terms of other productions that have constraints. In certain cases the constraint on one production is more stringent than that of another. Since all constraints must hold simultaneously on constrained expressions, the most stringent ones apply.

See Notes on Notation for a discussion of node queries and full queries.

{1}	Whitespace is not allowed between any two terms of this production.
{2}	Instances of this production that contain a colon are only valid under the namespaces extension to XQL.
{3}	Whitespace is significant except between the optional '-' and the first instance of 'Digit+'.
{4}	Whitespace is significant between every two terms of this production.
{5}	XQL defines a set of valid Invocation instances. See the explanation of the Invocation term for a list of these instances.
{6}	Whitespace is not allowed between the period and the instance of XQLName. When '(' is present, whitespace is also not allowed between the instance of XQLName and the '('.
{7}	In a node query this function or method is only valid inside an instance of Subquery, unless it appears within an instance of Param. Functions and methods are valid anywhere in a full query.
{8}	This production consists of two literals in order to indicate that whitespace may appear between the literals.
{9}	If both integers are positive or if both integers are negative, the first integer must be less than or equal to the second integer. Although the literal 'TO' appears in capital letters, XQL does not require capital letters; the case of these letters is insignificant.
{10}	A node query may only return nodes. This property may be enforced as a constraint on the syntax. All operators are valid inside the outermost instance of Subquery, but the only operators that may appear outside of this instance are '/', '//', and '$union$', unless the operator appears within an instance of Param or an instance of RValue. A full query may return any type of value, so this constraint does not apply to full queries.
{11}	The instance of Path may only match AbsolutePath when it appears outside all instances of Subquery and Grouping, unless it appears within an instance of Param or an instance of RValue. When it appears within an instance of either Param or RValue it may only match AbsolutePath when it appears outside all instances of Subquery and Grouping that occur within the Param or RValue.
{12}	RValue must evaluate to a value that is not a set or to a value that is a set having exactly one member.
{13}	In node queries comparison operators may only appear within instances of Subquery. This constraint does not apply to full queries.
{14}	Only methods may match this instance of Invocation; no function may appear after the bang operator.

Appendix B. Sample Data

<?xml version='1.0'?>
<!-- This file represents a fragment of a book store inventory database -->
<bookstore specialty='novel'>
  <book style='autobiography'>
    <title>Seven Years in Trenton</title>
    <author>
      <first-name>Joe</first-name>
      <last-name>Bob</last-name>
      <award>Trenton Literary Review Honorable Mention</award>
    </author>
    <price>12</price>
  </book>
  <book style='textbook'>
    <title>History of Trenton</title>
    <author>
      <first-name>Mary</first-name>
      <last-name>Bob</last-name>
      <publication>
        Selected Short Stories of
        <first-name>Mary</first-name> <last-name>Bob</last-name>
      </publication>
    </author>
    <price>55</price>
  </book>
  <magazine style='glossy' frequency='monthly'>
    <title>Tracking Trenton</title>
    <price>2.50</price>
    <subscription price='24' per='year'/>
  </magazine>
  <book style='novel' id='myfave'>
    <title>Trenton Today, Trenton Tomorrow</title>
    <author>
      <first-name>Toni</first-name>
      <last-name>Bob</last-name>
      <degree from='Trenton U'>B.A.</degree>
      <degree from='Harvard'>Ph.D.</degree>
      <award>Pulizer</award>
      <publication>Still in Trenton</publication>
      <publication>Trenton Forever</publication>
    </author>
    <price intl='canada' exchange='0.7'>6.50</price>
    <excerpt>
      <p>It was a dark and stormy night.</p>
      <p>But then all nights in Trenton seem dark and
      stormy to someone who has gone through what
      <emph>I</emph> have.</p>
      <definition-list>
        <term>Trenton</term>
        <definition>misery</definition>
      </definition-list>
    </excerpt>
  </book>
  <my:book style='leather' price='29.50' xmlns:my='http://www.placeholder-name-here.com/schema/'>
    <my:title>Who's Who in Trenton</my:title>
    <my:author>Robert Bob</my:author>
  </my:book>
</bookstore>
