The use cases listed below were created by the XML
Query Working Group to illustrate important applications
for an XML query language. Each use case is focused on a
specific application area, and contains a Document Type
Definition (DTD) and example input data. Each use case
specifies a set of queries that might be applied to the
input data, and the expected results for each query. Since
the English description of each query is concise, the
expected results form an important part of the definition
of each query, specifying the expected output
format. These use cases were originally published as part
The input environment for each use case is stated in its Document Type Definition (DTD) section. All of these use cases assume that input is provided in the form of one or more documents with specific names. For instance, the authors in a document may be accessed with expressions like this:
Some implementations of XQuery bind input to external variables. If the environment has bound the external variable $b to the same document used in the above query, this expression would return the same set of authors:
Some implementations of XQuery predefine a single 'context item', which is available at the root level of a query, and which is used to resolve paths that begin with a leading slash. In such an implementation, if the context item is bound to the document node of the same well-formed document used in the previous examples, this expression would return the same set of authors:
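The three ways of reaching the input described above can be sketched side by side. This is only an illustration: it assumes the bibliography document of Use Case "XMP" at "http://bstore1.example.com/bib.xml", that the environment has bound both the external variable $b and the context item to that document, and that authors appear as "author" elements.

```xquery
(: $b is assumed to be bound by the environment to the bibliography document :)
declare variable $b external;

(
  (: 1. the document is named explicitly :)
  doc("http://bstore1.example.com/bib.xml")//author,

  (: 2. the same document is reached through the external variable :)
  $b//author,

  (: 3. the same document is reached through the context item :)
  //author
)
```

Each operand of the sequence is a complete expression on its own; they are combined here only so that the three forms can be compared in a single query.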
Previous versions of this document accessed implicit documents through the input() function, which no longer exists. The input() function had similar functionality to a predefined context item, except that it could be bound to a sequence of nodes, whereas the context item may only be bound to a single node. The use cases that used it have been rewritten to use explicit file names.
Several implementors have asked that we make the queries from
these use cases available in a separate file to make it easier
for them to test their parsers. These queries may be found in
To make output more readable, the output of queries has been formatted using whitespace which may not be returned by a query processor. This whitespace should not be considered normative for the correctness of results.
These queries were tested with a dynamic implementation of XQuery. Some queries may require additional type declarations to be used with an implementation that implements the Static Typing feature.
This use case contains several example queries that illustrate requirements gathered from the database and document communities.
Most of the example queries in this use case are based on a bibliography document named "http://bstore1.example.com/bib.xml" with the following DTD:
Here is the data found at "http://bstore1.example.com/bib.xml":
Q5 also uses information on book reviews and prices from a separate data source named "http://bstore2.example.com/reviews.xml" with the following DTD:
Here are the contents of "http://bstore2.example.com/reviews.xml":
Q9 uses an input document named "books.xml", with the following DTD:
Here are the contents of books.xml:
Q10 uses an input document named "prices.xml", with the following DTD:
Here are the contents of prices.xml:
List books published by Addison-Wesley after 1991, including their year and title.
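A minimal sketch of one way to express this query. It assumes that the bibliography has a "bib" root containing "book" elements, each with a "year" attribute and "publisher" and "title" children; these names are assumptions about the DTD rather than a definitive solution.

```xquery
<bib>
  {
    (: element and attribute names are assumed, see the note above :)
    for $b in doc("http://bstore1.example.com/bib.xml")/bib/book
    where $b/publisher = "Addison-Wesley" and $b/@year > 1991
    return
      <book year="{ $b/@year }">
        { $b/title }
      </book>
  }
</bib>
```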
Create a flat list of all the title-author pairs, with each pair enclosed in a "result" element.
For each book in the bibliography, list the title and authors, grouped inside a "result" element.
For each author in the bibliography, list the author's name and the titles of all books by that author, grouped inside a "result" element.
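One possible formulation uses distinct-values() to eliminate duplicate authors. The sketch below assumes that each "author" element has "last" and "first" children and that books are "book" elements under a "bib" root; those names are assumptions about the DTD.

```xquery
<results>
  {
    let $a := doc("http://bstore1.example.com/bib.xml")//author
    (: distinct-values() atomizes the name elements, so $last and $first are strings :)
    for $last in distinct-values($a/last),
        $first in distinct-values($a[last = $last]/first)
    order by $last, $first
    return
      <result>
        <author>
          <last>{ $last }</last>
          <first>{ $first }</first>
        </author>
        {
          (: titles of every book that lists this author :)
          for $b in doc("http://bstore1.example.com/bib.xml")/bib/book
          where some $ba in $b/author
                satisfies ($ba/last = $last and $ba/first = $first)
          return $b/title
        }
      </result>
  }
</results>
```

The explicit order by clause is used here so that the result does not depend on the order in which distinct-values() returns its values.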
The order in which values are returned by distinct-values() is undefined. The distinct-values() function returns atomic values, extracting the names from the elements.
For each book found at both bstore1.example.com and bstore2.example.com, list the title of the book and its price from each source.
For each book that has at least one author, list the title and first two authors, and an empty "et-al" element if the book has additional authors.
List the titles and years of all books published by Addison-Wesley after 1991, in alphabetic order.
Find books in which the name of some element ends with the string "or" and the same element contains the string "Suciu" somewhere in its content. For each such book, return the title and the qualifying element.
In the above solution, string(), local-name() and ends-with() are functions defined in the Functions and Operators document.
In the document "books.xml", find all section or chapter titles that contain the word "XML", regardless of the level of nesting.
In the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element with the book title as its title attribute.
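A minimal sketch of this query, assuming that "prices.xml" contains "book" elements with "title" and "price" children (the names are assumptions about the DTD). Grouping by title is approximated with distinct-values(), and min() picks the lowest price in each group.

```xquery
<results>
  {
    let $doc := doc("prices.xml")
    for $t in distinct-values($doc//book/title)
    (: all prices quoted for the book with this title :)
    let $p := $doc//book[title = $t]/price
    return
      <minprice title="{ $t }">
        <price>{ min($p) }</price>
      </minprice>
  }
</results>
```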
For each book with an author, return the book with its title and authors. For each book with an editor, return a reference with the book title and the editor's affiliation.
Find pairs of books that have different titles but the same set of authors (possibly in a different order).
The above solution uses a function, deep-equal(), which compares sequences. Two sequences are equal if all items in corresponding positions in the two sequences are equal - if the sequences are node sequences, the values of the nodes are used for comparison.
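To illustrate how deep-equal() can compare author sets regardless of order, here is a sketch that first sorts the authors of each book and then compares the two sorted sequences. It assumes "book" elements with "title" and "author" children, and "last" and "first" children inside each author; these names are assumptions.

```xquery
<bib>
  {
    for $book1 in doc("http://bstore1.example.com/bib.xml")//book,
        $book2 in doc("http://bstore1.example.com/bib.xml")//book
    (: sort both author lists so that ordering differences do not matter :)
    let $aut1 := for $a in $book1/author
                 order by $a/last, $a/first
                 return $a
    let $aut2 := for $a in $book2/author
                 order by $a/last, $a/first
                 return $a
    (: "<<" keeps each unordered pair once and excludes a book paired with itself :)
    where $book1 << $book2
      and not($book1/title = $book2/title)
      and deep-equal($aut1, $aut2)
    return
      <book-pair>
        { $book1/title }
        { $book2/title }
      </book-pair>
  }
</bib>
```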
Some XML document-types have a very flexible structure in which text is mixed with elements and many elements are optional. These document-types show a wide variation in structure from one document to another. In documents of these types, the ways in which elements are ordered and nested are usually quite important.
An XML query language should have the ability to extract elements from documents while preserving their original hierarchy. This Use Case illustrates this requirement by means of a flexible document type named Book.
This use case is based on an input document named "book.xml". The DTD for this schema is found in a file called "book.dtd":
The queries in this use case are based on the following sample data.
Prepare a (nested) table of contents for Book1, listing all the sections and their titles. Preserve the original attributes of each <section> element, if any.
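Because sections can nest to any depth, a recursive function is a natural fit. The following is a sketch only: the function name local:toc and the "toc" wrapper element are illustrative choices, and it assumes a "book" root whose "section" elements each carry a "title" child.

```xquery
declare function local:toc($book-or-section as element()) as element()*
{
  for $section in $book-or-section/section
  return
    <section>
      {
        (: keep the original attributes, then the title, then nested sections :)
        $section/@*, $section/title, local:toc($section)
      }
    </section>
};

<toc>
  {
    for $s in doc("book.xml")/book
    return local:toc($s)
  }
</toc>
```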
Prepare a (flat) figure list for Book1, listing all the figures and their titles. Preserve the original attributes of each <figure> element, if any.
How many sections are in Book1, and how many figures?
How many top-level sections are in Book1?
Make a flat list of the section elements in Book1. In place of its original attributes, each section element should have two attributes, containing the title of the section and the number of figures immediately contained in the section.
Make a nested list of the section elements in Book1, preserving their original attributes and hierarchy. Inside each section element, include the title of the section and an element that includes the number of figures immediately contained in the section.
This use case illustrates queries based on the sequence in which elements appear in a document.
Although sequence is not significant in most traditional database systems or object systems, it can be quite significant in structured documents. This use case presents a series of queries based on a medical report.
This use case is based on a medical report using the HL7 Patient Record Architecture. We simplify the DTD in this example, using only what is needed to understand the queries.
The queries in this use case are based on the following sample data.
In the Procedure section of Report1, what Instruments were used in the second Incision?
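A positional predicate applied to a parenthesized path selects the second incision in document order. The sketch assumes that the report is stored in a file named "report1.xml", that a section's heading is held in a "section.title" child, and that "incision" and "instrument" are the relevant element names; all of these are assumptions about the simplified DTD.

```xquery
for $s in doc("report1.xml")//section[section.title = "Procedure"]
(: the parentheses make [2] apply to all incisions of the section, in document order :)
return ($s//incision)[2]/instrument
```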
In the Procedure section of Report1, what are the first two Instruments to be used?
In Report1, what Instruments were used in the first two Actions after the second Incision?
In Report1, find "Procedure" sections where no Anesthesia element occurs before the first Incision.
(No sections satisfy Q4, thankfully.)
In Report1, what happened between the first Incision and the second Incision?
Here is another solution that is perhaps more efficient and less readable:
In the above output, the contents of the critical sequence element include a text node, an action element, and the text node containing the content of the action element. But the serialization we are using already shows all descendants of a given node. If $c is bound to a sequence of nodes, the following expression eliminates members of the sequence that are descendants of another node already found in the sequence:
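One way to write such an expression uses the except operator, sketched here as a helper function. The name local:outermost and the small inline document are hypothetical and serve only to show the effect.

```xquery
declare function local:outermost($c as node()*) as node()*
{
  (: drop every member of $c that is a descendant of another member of $c :)
  $c except $c/descendant::node()
};

let $doc := <a><b><c/></b><d/></a>
let $c := ($doc/b, $doc/b/c, $doc/d)
return local:outermost($c)
(: returns the b and d elements; c is removed because it is inside b :)
```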
In the following solution, the between() function takes a sequence of nodes, a starting node, and an ending node, and returns the nodes between them:
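A function with this behavior can be sketched as follows. The file name "report1.xml", the "section.title" and "incision" element names, and the "critical_sequence" wrapper are assumptions made for illustration.

```xquery
declare function local:between(
  $seq as node()*, $start as node(), $end as node()
) as node()*
{
  let $nodes :=
    (: descendants of $start follow it in document order, so exclude them first :)
    for $n in $seq except $start//node()
    where $n >> $start and $n << $end
    return $n
  (: keep only the outermost nodes; their descendants are serialized with them :)
  return $nodes except $nodes//node()
};

<critical_sequence>
  {
    let $proc := doc("report1.xml")//section[section.title = "Procedure"][1],
        $first := ($proc//incision)[1],
        $second := ($proc//incision)[2]
    return local:between($proc//node(), $first, $second)
  }
</critical_sequence>
```

The call assumes that the section contains at least two incisions; otherwise the $end argument would be empty and the function call would raise a type error.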
Here is the output from the above query:
One important use of an XML query language will be to access data stored in relational databases. This use case describes one possible way in which this access might be accomplished.
A relational database system might present a view in which each table (relation) takes the form of an XML document. One way to represent a database table as an XML document is to allow the document element to represent the table itself, and each row (tuple) inside the table to be represented by a nested element. Inside the tuple-elements, each column is in turn represented by a nested element. Columns that allow null values are represented by optional elements, and a missing element denotes a null value.
As an example, consider a relational database used by an online auction. The auction maintains a USERS table containing information on registered users, each identified by a unique userid, who can either offer items for sale or bid on items. An ITEMS table lists items currently or recently for sale, with the userid of the user who offered each item. A BIDS table contains all bids on record, keyed by the userid of the bidder and the item number of the item to which the bid applies.
The three tables used by the online auction are below, with their column-names indicated in parentheses.
This use case is based on three separate input documents named users.xml, items.xml, and bids.xml. Each of the documents represents one of the tables in the relational database described above, using the following DTDs:
Here is an abbreviated set of data showing the XML format of the instances:
The entire data set is represented by the following table:
|U06||Rip Van Winkle||B|
List the item number and description of all bicycles that currently have an auction in progress, ordered by item number.
This solution assumes that the current date is 1999-01-31.
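A sketch of one possible formulation under that assumption. It also assumes that each row of "items.xml" is an "item_tuple" element with "itemno", "description", "start_date" and "end_date" children, and that bicycles can be recognized by the word "Bicycle" in the description; these details are assumptions about the data.

```xquery
<result>
  {
    for $i in doc("items.xml")//item_tuple
    (: an auction is in progress if the assumed current date lies within its period :)
    where $i/start_date <= xs:date("1999-01-31")
      and $i/end_date >= xs:date("1999-01-31")
      and contains($i/description, "Bicycle")
    order by $i/itemno
    return
      <item_tuple>
        { $i/itemno }
        { $i/description }
      </item_tuple>
  }
</result>
```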
The above query returns an element named
item_tuple, but its definition does
not match the definition of item_tuple in the DTD.
For all bicycles, list the item number, description, and highest bid (if any), ordered by item number.
Find cases where a user with a rating worse (alphabetically, greater) than "C" is offering an item with a reserve price of more than 1000.
List item numbers and descriptions of items that have no bids.
For bicycle(s) offered by Tom Jones that have received a bid, list the item number, description, highest bid, and name of the highest bidder, ordered by item number.
The above query does several joins, and requires the
results in a particular order. If there were no
order by clause, results would be
reported in document order. If you do not care about
the order, you can use the
function to inform the query processor that the
order of the lists in the for clause is not
significant, which means that the tuples can be
generated in any order. This can enable better
For each item whose highest bid is more than twice its reserve price, list the item number, description, reserve price, and highest bid.
Find the highest bid ever made for a bicycle or tricycle.
How many items were auctioned (auction ended) in March 1999?
List the number of items auctioned each month in 1999 for which data is available, ordered by month.
For each item that has received a bid, list the item number, the highest bid, and the name of the highest bidder, ordered by item number.
List the item number and description of the item(s) that received the highest bid ever recorded, and the amount of that bid.
List the item number and description of the item(s) that received the largest number of bids, and the number of bids it (or they) received.
For each user who has placed a bid, give the userid, name, number of bids, and average bid, in order by userid.
List item numbers and average bids for items that have received three or more bids, in descending order by average bid.
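Grouping with a filter on the group size can be sketched with distinct-values(), count() and avg(). The sketch assumes that each row of "bids.xml" is a "bid_tuple" element with "itemno" and "bid" children; the "popular_item" and "avgbid" result element names are illustrative.

```xquery
<result>
  {
    for $i in distinct-values(doc("bids.xml")//itemno)
    (: all bids recorded for this item number :)
    let $b := doc("bids.xml")//bid_tuple[itemno = $i]
    let $avgbid := avg($b/bid)
    where count($b) >= 3
    order by $avgbid descending
    return
      <popular_item>
        <itemno>{ $i }</itemno>
        <avgbid>{ $avgbid }</avgbid>
      </popular_item>
  }
</result>
```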
List names of users who have placed multiple bids of at least $100 each.
List all registered users in order by userid; for each user, include the userid, name, and an indication of whether the user is active (has at least one bid on record) or inactive (has no bid on record).
List the names of users, if any, who have bid on every item.
(No users satisfy Q17.)
List all users in alphabetic order by name. For each user, include descriptions of all the items (if any) that were bid on by that user, in alphabetic order.
The example document and queries in this Use Case were first created for a 1992 conference on Standard Generalized Markup Language (SGML). For our use, the Document Type Definition (DTD) and example document have been translated from SGML to XML.
This use case is based on data conforming to the DTD shown below.
The queries in this use case are based on the following sample data, which is found in the file "sgml.xml". Line numbers have been added to the data to allow the results of queries to be conveniently specified.
Locate all paragraphs in the report (all "para" elements occurring anywhere within the "report" element).
Elements whose start-tags are on lines 6, 11, 20, 27, 34, 39, 46, 53, 56, 62, 67, 71, 76, 83, 90, 94
Locate all paragraph elements in an introduction (all "para" elements directly contained within an "intro" element).
Elements whose start-tags are on lines 6, 11, 20, 27, 53, 56, 62, 90, 94
Locate all paragraphs in the introduction of a section that is in a chapter that has no introduction (all "para" elements directly contained within an "intro" element directly contained in a "section" element directly contained in a "chapter" element. The "chapter" element must not directly contain an "intro" element).
Elements whose start-tags are on lines 90, 94
Locate the second paragraph in the third section in the second chapter (the second "para" element occurring in the third "section" element occurring in the second "chapter" element occurring in the "report").
Element whose start-tag is on line 67
Locate all classified paragraphs (all "para" elements whose "security" attribute has the value "c").
Element whose start-tag is on line 94
List the short titles of all sections (the values of the "shorttitle" attributes of all "section" elements, expressing each short title as the value of a new element.)
Attribute values in start-tags on lines 23, 50, 59
Locate the initial letter of the initial paragraph of all introductions (the first character in the content [character content as well as element content] of the first "para" element contained in an "intro" element).
Character after start-tag on lines 6, 20, 27, 53, 62, 90
Locate all sections with a title that has "is SGML" in it. The string may occur anywhere in the descendants of the title element, and markup boundaries are ignored.
Elements whose start-tags are on lines 50, 59
Same as (Q8a), but the string "is SGML" cannot be interrupted by sub-elements, and must appear in a single text node.
Element whose start-tag is on line 59
Locate all the topics referenced by a cross-reference anywhere in the report (all the "topic" elements whose "topicid" attribute value is the same as an "xrefid" attribute value of any "xref" element).
Element whose start-tag is on line 65
Locate the closest title preceding the cross-reference ("xref") element whose "xrefid" attribute is "top4" (the "title" element that would be touched last before this "xref" element when touching each element in document order).
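The node order comparison operator "<<" expresses "occurs earlier in document order". A minimal sketch, assuming that exactly one "xref" element has the "xrefid" value "top4":

```xquery
let $x := doc("sgml.xml")//xref[@xrefid = "top4"],
    (: every title that starts before the cross-reference, in document order :)
    $t := doc("sgml.xml")//title[. << $x]
return $t[last()]
```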
Given xref on line 79, element whose start-tag is on line 75
This use case is based on company profiles and a set of news documents which contain data for PR, mergers and acquisitions, etc. Given a company, the use case illustrates several different queries for searching text in news documents and different ways of providing query results by matching the information from the company profile and the content of the news items.
In this use case, the contains function is used to test whether a string occurs within a node or a string. Obviously, using full-text functions would provide more powerful searching, but the current Functions and Operators draft does not have full-text functions.
This use case uses data that corresponds to the following DTDs:
The queries in this use case are based on the following input data, which is found in the file "string.xml".
In addition, the following data, listing the partners and competitors of companies, is found in the file "company-data.xml".
Find the titles of all news items where the string "Foobar Corporation" appears in the title.
Find news items where the Foobar Corporation and one or more of its partners are mentioned in the same paragraph and/or title. List each news item by its title and date.
Query Q3 has been withdrawn from the use cases document.
Find news items where a company and one of its partners are mentioned in the same news item and the news item is not authored by the company itself.
Gorilla Corporation acquires YouNameItWeIntegrateIt.com
Foobar Corporation is suing Gorilla Corporation for patent infringement
For each news item that is relevant to the Gorilla Corporation, create an "item summary" element. The content of the item summary is the content of the title, date, and first paragraph of the news item, separated by periods. A news item is relevant if the name of the company is mentioned anywhere within the content of the news item.
This use case performs a variety of queries on namespace-qualified names.
This use case is based on a scenario in which a neutral mediator is acting with public auction servers on behalf of clients. The reason for a client to use this imaginary service may be anonymity, better insurance, or the possibility to cover more than one market at a time. The following aspects of namespaces are illustrated by this use case:
Syntactic disambiguation when combining XML data from different sources
Re-use of predefined modules, such as XLinks or XML Schema
Support for global classification schemas, such as the Dublin Core
The sample data consists of two records. The schema used for this data uses W3C XML Schema's schema composition to create a schema from predefined, namespace separated modules, and uses XLink to express references. Each record describes a running auction. It embeds data specific to an auctioneer (e.g. the company's credit rating system) and a taxonomy specific to a particular good (jazz records) in a framework that contains data common to all auctions (e.g. start and end time), using namespaces to distinguish the three vocabularies.
Note that namespace prefixes must be resolved to their Namespace URIs before matching namespace qualified names. It is not sufficient to use the literal prefixes to denote namespaces. Furthermore, there are several possible ways to represent namespace declarations. Therefore, processing must be done on the namespace processed XML Information Set, not on the XML text representation.
DTDs are not fully compatible with namespaces, as they cannot express the equality of nodes in the same namespace, but different namespace proxies. In a later version of this paper, an XML Schema should be added here.
The data for this use case is in the file "auction.xml".
List all unique namespaces used in the sample data.
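A sketch of one approach: collect the namespace URI of every element and attribute name and remove duplicates. The "ns" result element is an illustrative choice, and the sketch only considers namespaces that are actually used in element or attribute names.

```xquery
for $n in distinct-values(
            (: namespace-uri() resolves each name to its URI, independent of prefix :)
            for $i in (doc("auction.xml")//* | doc("auction.xml")//@*)
            return namespace-uri($i)
          )
return <ns>{ $n }</ns>
```

Names that are in no namespace, such as unprefixed attributes, contribute a zero-length URI, which appears as an empty "ns" element in the result.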
Select the title of each record that is for sale.
Select all elements that have an attribute whose name is in the XML Schema namespace.
List the target URIs of all XLinks in the document.
Select all records that have a remark in German.
Select the closing time elements of all AnyZone auctions currently monitored.
Select the homepage of all auctions where both seller and high bidder are registered at the same auctioneer.
Select all traders (either seller or high bidder) without negative comments.
This use case illustrates how a recursive query might be used to construct a hierarchic document of arbitrary depth from flat structures stored in a database.
This use case is based on a "parts explosion" database that contains information about how parts are used in other parts.
The input to the use case is a "flat" document in which each different part is represented by a <part> element with partid and name attributes. Each part may or may not be part of a larger part; if so, the partid of the larger part is contained in a partof attribute. This input document might be derived from a relational database in which each part is represented by a row of a table with partid as primary key and partof as a foreign key referencing partid.
The challenge of this use case is to write a query that converts the "flat" representation of the parts explosion, based on foreign keys, into a hierarchic representation in which part containment is represented by the structure of the document.
The input data set uses the following DTD:
Although the partid and partof attributes could have been of type ID and IDREF, respectively, in this schema they are treated as character data, possibly materialized in a straightforward way from a relational database. Each partof attribute matches exactly one partid. Parts having no partof attribute are not contained in any other part.
The output data conforms to the following DTD:
Convert the sample document from "partlist" format to "parttree" format (see DTD section for definitions). In the result document, part containment is represented by containment of one <part> element inside another. Each part that is not part of any other part should appear as a separate top-level element in the output document.
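The conversion can be sketched with a recursive function that builds one part and then recurses into the parts that name it in their partof attribute. The file name "partlist.xml", the function name local:one_level, and the "parttree" root element are assumptions made for illustration.

```xquery
declare function local:one_level($p as element()) as element()
{
  <part partid="{ $p/@partid }" name="{ $p/@name }">
    {
      (: nest every part whose partof refers to this part :)
      for $s in doc("partlist.xml")//part
      where $s/@partof = $p/@partid
      return local:one_level($s)
    }
  </part>
};

<parttree>
  {
    (: parts without a partof attribute become top-level elements :)
    for $p in doc("partlist.xml")//part[empty(@partof)]
    return local:one_level($p)
  }
</parttree>
```

The recursion terminates only if the containment relationship has no cycles, which holds for a well-formed parts explosion.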
Strongly typed and weakly typed data are both important
kinds of XML data. Most of the queries in this document
focus on weakly typed data that is governed by a DTD and
does not contain XML Schema simple datatypes or named
complex types. This use case explores XQuery's support
for types, using data that is governed by a strongly
typed XML Schema.
The schema for this example is the International Purchase Order schema taken from the XML Schema Primer, which imports a schema for addresses. The main schema is found in a schema document named "ipo.xsd":
The address constructs are found in a schema document named "address.xsd":
The sample data used for the query is found in a file named "ipo.xml":
Count the invoices shipped to the United Kingdom.
In this dataset, the data for an address does not contain the name of the country, and the name of the shipTo element is the same regardless of the country to which items are shipped. Only the types allow us to identify UK addresses - in the schema, there is one address type for UK addresses and another for US addresses, both derived from a common base class. In the above query, we use the UKAddress type to identify invoices shipped to the UK.
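A type-based test of this kind can be sketched with "instance of" and an element test that names the type. The sketch assumes a processor that supports schema import, input that has been validated against the schema, and the namespace URI "http://www.example.com/IPO" for the purchase order schema; the URI in particular is an assumption.

```xquery
(: the namespace URI is assumed; substitute the one declared in ipo.xsd :)
import schema namespace ipo = "http://www.example.com/IPO";

count(
  (: element(*, ipo:UKAddress) matches any element whose type is ipo:UKAddress :)
  doc("ipo.xml")//shipTo[. instance of element(*, ipo:UKAddress)]
)
```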
Write a function that tests an American address to check if it has the right zip code.
In our solution, we will assume zip code data is stored in a file named "zips.xml", which looks like this.
The corresponding schema document is named "zips.xsd":
This is not a complete query; it is a function that is meant to be called in a query. We will use this function in Q4.
An attempt to call this function with an element of the wrong address type raises an error. For instance, you cannot call z:zip-ok() with an element of type UKAddress.
Note that the parameter for this function specifies the type rather than the element name, since it is written to be used with any element that has the proper address type - for instance, in our sample schema, 'billTo' and 'shipTo' are two different elements which may both have the USAddress type.
Write a function that tests a UK address to see if it has the right postal code.
For England, we store the information needed to test postal codes in a file named "postals.xml", which looks like this:
Here is the schema for the above file.
This is not a complete query; it is a function that is meant to be called in a query. We will use this function in Q4.
Determine whether the postal code or zip code for a purchase order is right.
This query calls the functions defined in Q2 and Q3.
Note that the function local:address-ok() accepts any element whose type is ipo:Address, which is the base type for both ipo:UKAddress and ipo:USAddress. Note also that this function uses a typeswitch to select the appropriate function to test American or British addresses. This can be considered a primitive form of polymorphism.
Determine whether the shipping address matches the billing address.
In this function, note that the function specifies both the element names and the type names for its parameters.
Note also that the schema says both of these elements are local elements, defined only within a purchase order. The element test matches an element with a given name even if that element is locally declared.
Determine whether the invoice has a USAddress and gives all prices as USPrices.
Addresses are part of a type hierarchy, and the element name, ipo:shipTo, does not tell us whether an address is a US address or not, so we have to test the type.
This example is rather contrived, since the schema specifies that all prices are USPrice elements. Nevertheless, it illustrates the ability to easily combine information derived from type information with information derived from structure.
Write a function that returns the text of a comment. Call this function for each shipping comment found in an item shipped to Helen Zoe on the date 1999-12-01.
Our source schema models comments with the following substitution groups:
The following sample data contains instances of these substitution groups:
In this query, the function specifies ipo:comment as the name of the element, but any element in the substitution group of ipo:comment may also be passed to this function. That means that we can call the same function for ipo:shipComment elements or ipo:customerComment elements - for instance, the following query also succeeds:
Find all comments found in an item shipped to Helen Zoe on the date 1999-12-01, including all elements in the substitution group for ipo:comment.
any valid element in the substitution group of
Write a function that returns all comments found on an element, whether an item element or some other element that may have a comment.
In this schema, comments can occur on either a purchase order or on an item. In a more complete schema, they could presumably occur in other areas as well. This function returns all comments found on an element, regardless of the name of the element, illustrating the need to write functions that can accept any element as a parameter.
Write a function that determines whether the person listed in a billTo element is known to be a deadbeat, using a US database.
In American slang, a "deadbeat" is a person who fails to meet a financial obligation.
This query assumes that "deadbeats.xml" lists the names of deadbeats in the following format:
Note that this function specifies both the element name and the type. The element name is specified because we do not want to embarrass recipients of gifts by calling this function for the shipping address by mistake. The type is specified because we would need to use a different database to identify deadbeats in other countries.
Also note that the XML file in this example has no schema. We assume that the processor omits validation or does lax validation.
Write a function that computes the total price for a sequence of item elements.
Here is a query that calls the function we just defined to get the total for an invoice (before calculating taxes and shipping charges):
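A minimal sketch of such a function together with a calling query. It assumes schema-validated input in which each "item" has "quantity" and "USPrice" children with numeric types, the namespace URI "http://www.example.com/IPO", and the customer name and order date used in the earlier queries; the function name local:total-price is illustrative.

```xquery
(: the namespace URI is assumed; substitute the one declared in ipo.xsd :)
import schema namespace ipo = "http://www.example.com/IPO";

declare function local:total-price($i as element(item)*) as xs:decimal
{
  (: one subtotal per item, then the sum over the whole sequence :)
  let $subtotals := for $s in $i return $s/quantity * $s/USPrice
  return sum($subtotals)
};

for $p in doc("ipo.xml")//ipo:purchaseOrder
where $p/shipTo/name = "Helen Zoe"
  and $p/@orderDate = xs:date("1999-12-01")
return local:total-price($p//item)
```

The declared return type relies on the typed values supplied by validation; with untyped input the arithmetic would be carried out in xs:double and the declaration would need to be adjusted.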
This query illustrates the need to be able to pass a sequence as a parameter to a function.
If the input document contains more than one purchase order for the given date and person, a total will be computed for all purchase orders.
This is the schema given for the above report:
This report, which lists products sold by zip code, is based on the same international purchase order data used in previous queries.
Here is a query that generates the desired report from a collection that contains US purchase orders:
The editors thank the members of the XML Query Working Group, which produced the material in this document.
The use cases in this paper were contributed by the following individuals:
|Use Case "R"||Don Chamberlin|
|Use Case "XMP"||Mary Fernandez, Jerome Simeon, Phil Wadler|
|Use Case "TREE"||Jonathan Robie|
|Use Case "PARTS"||Michael Rys|
|Use Case "NS"||Ingo Macherius|
|Use Case "STRING"||Umit Yalcinalp|
|Use Case "SEQ"||Jonathan Robie|
|Use Case "SGML"||Paula Angerstein|
|Use Case "STRONG"||Jonathan Robie and Phil Wadler. Schemas and data
Use case "XMP" has been previously published in
The editors also wish to thank the members of the other W3C Working Groups who have commented on earlier drafts, and Michael Dyck for his critical reading and helpful suggestions. Michael Wenger found several bugs, and also found more elegant solutions to some of the use cases, which are now included in this document.
Updated status section.
Added note on static typing.
Comments should now be directed to Bugzilla.
Massimo and Dana have each changed their affiliation.
IPR statement has been changed to reflect that this will progress to a Note rather than a Recommendation.
Provided link to changes to Use Case Strong errors in Mozilla.
Minor editorial changes.
Aligned with current XQuery Working Draft.
Fixed the following bugs.
rdb-queries-results-q9 now uses month-from-date() and year-from-date(). The "get-" prefix, now obsolete, has been removed. Fixes
Removed whitespace from the price element in the source document for Use Case "XMP" - it was <price> 65.95</price>,
and is now <price>65.95</price>. Fixes
Removed trailing whitespace from the <remark/> elements used in "auction.xml". Fixes bug
Need to discuss
Fixed many errors in Use Case "Strong" as proposed
Alignment with 04 April 2005 Working Draft of XQuery. This actually did not change the results of any of the queries, so it was just a matter of tweaking the front matter.
Many changes have been made since the last release, particularly:
Moved to the 29 October XQuery Working Drafts.
Corrected numerous NS queries to reflect current handling of namespaces.
Changed use of locally declared elements and attributes in SequenceType to match current XQuery language.
Fixed reported bugs, ran most queries using two implementations.
The following references are some of the works considered by the WG in deriving its use cases.