XML Query Use Cases

WD-xquery-use-cases-20050915

W3C Working Draft

15 September 2005

This version: http://www.w3.org/TR/2005/WD-xquery-use-cases-20050915/
Latest version: http://www.w3.org/TR/xquery-use-cases/
Previous version: http://www.w3.org/TR/2005/WD-xquery-use-cases-20050404/

Editors:
Don Chamberlin, IBM Almaden Research Center, chamberlin@almaden.ibm.com
Peter Fankhauser, Infonyte GmbH, fankhauser@infonyte.com
Daniela Florescu, Oracle Corporation, dana.florescu@oracle.com
Massimo Marchiori, University of Venice, massimo@w3.org
Jonathan Robie, DataDirect Technologies, jonathan.robie@datadirect.com

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This version of the Use Cases document corresponds to the XQuery Working Draft released on 15 September 2005. The queries in this document have been parsed using a parser generated from the same grammar used to create the documentation for the XQuery Working Draft. The syntax of types has changed, and this is reflected in the queries. A number of errors have been corrected. See for more information on changes.

This is a W3C Working Draft for review by W3C Members and other interested parties. Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document has been produced as part of the W3C XML Activity, following the procedures set out for the W3C Process. The document has been written by the XML Query Working Group. The goals of the XML Query Working Group are discussed in the XML Query Working Group charter.

The XML Query Working Group feels that the contents of this Working Draft are relatively stable, and therefore encourages feedback on this version.

As of this publication, the Working Group expects to eventually publish this document as a Working Group Note. It is not expected to become a W3C Recommendation, and therefore it has no associated W3C Patent Policy licensing obligations.

Comments for this specification should be entered into the last call issue tracking system (instructions can be found at http://www.w3.org/XML/2005/04/qt-bugzilla). If access to that system is not feasible, you may send your comments to the W3C mailing list public-qt-comments@w3.org (archived at http://lists.w3.org/Archives/Public/public-qt-comments/) with "[use]" at the beginning of the subject field.

A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.

This document specifies usage scenarios for XQuery.


Robie, 31 July 2002: Minor changes to ensure that examples parse with the current syntax. All occurrences of date() changed to current-date(), and editorial notes on dates deleted since they now work properly in FandO. Deleted use case REF, since we no longer support the => operator, and people did not feel this use case was a good illustration of the problem domain. Replaced use case FNPARM with use case STRONG. Replaced filter() in the table of contents query with a recursive function call, since filter() no longer exists. Deleted queries Q3 and Q6, which jury-rigged some primitive full text search capabilities to get the right answer, but didn't solve the real underlying problem. Added the new W3C patent policy language to the status section. Replaced 'precedes' with '<<' and 'follows' with '>>'.

Robie, 23 April 2002: Updated to reflect current language. Many use cases have been corrected based on testing and analysis done by Dana and me.

Robie, 17 Dec 2001: Changed all examples to the current syntax. Changed functions to support the current Functions and Operators equivalents.

Robie, 8 Jun 2001: Corrected many examples, converted all queries to the new XQuery syntax, and added use case FNPARM.

Robie, 15 Feb 2001: First stand-alone Working Draft. This material previously appeared as a part of the W3C XML Query Requirements Working Draft, but was placed into a separate document to make it easier to incorporate solutions.

Use Cases for XML Queries

The use cases listed below were created by the XML Query Working Group to illustrate important applications for an XML query language. Each use case focuses on a specific application area and contains a Document Type Definition (DTD) and example input data. Each use case specifies a set of queries that might be applied to the input data, and the expected results for each query. Since the English description of each query is concise, the expected results form an important part of the definition of each query, specifying the expected output format. These use cases were originally published as part of the XML Query Requirements document, without solutions in concrete query languages. They are now being republished with solutions in XQuery. These use cases are also being used by the W3C XML Query Testing Task Force.

The input environment for each use case is stated in its Document Type Definition (DTD) section. All of these use cases assume that input is provided in the form of one or more documents with specific names. For instance, the authors in a document may be accessed with expressions like this:

doc("http://bstore1.example.com/bib.xml")//author

Some implementations of XQuery bind input to external variables. If the environment has bound the external variable $b to the same document used in the above query, this expression would return the same set of authors:

$b//author

Some implementations of XQuery predefine a single 'context item', which is available at the root level of a query, and which is used to resolve paths that begin with a leading slash. In such an implementation, if the context item is bound to the document node of the same well-formed document used in the previous examples, this expression would return the same set of authors:

//author

Previous versions of this document accessed implicit documents using the input() function, which no longer exists. The input() function had similar functionality to a predefined context item, except that it could be bound to a sequence of nodes, whereas the context item may only be bound to a single node. The use cases that used input() have been rewritten to use explicit file names.

Several implementors have asked that we make the queries from these use cases available in a separate file to make it easier for them to test their parsers. These queries may be found in . Also, the queries from the XQuery specification itself have been made available in .

To make output more readable, the output of queries has been formatted using whitespace which may not be returned by a query processor. This whitespace should not be considered normative for the correctness of results.
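Implementors who compare processor output with the expected results may want to ignore this formatting whitespace. The following Python sketch shows one possible way to do so. It is not part of XQuery or of the queries in this document. The helper names canonical and same_result are hypothetical. The sketch assumes that each result is a single well-formed element and that collapsing whitespace inside text is acceptable, which may not hold for mixed content such as the data in the SEQ use case.

```python
import xml.etree.ElementTree as ET


def _norm(text):
    """Collapse runs of whitespace and trim; None becomes the empty string."""
    return " ".join((text or "").split())


def canonical(elem):
    """Reduce an element to a nested tuple that ignores formatting whitespace.

    Attribute order is ignored; element order and text content are kept.
    """
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        _norm(elem.text),
        tuple((canonical(child), _norm(child.tail)) for child in elem),
    )


def same_result(expected_xml, actual_xml):
    """True if the two serialized results differ only in whitespace layout."""
    return canonical(ET.fromstring(expected_xml)) == canonical(ET.fromstring(actual_xml))


# An indented expected result and a compact processor output (illustrative).
pretty = """<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
</bib>"""
compact = '<bib><book year="1994"><title>TCP/IP Illustrated</title></book></bib>'

matches = same_result(pretty, compact)
```

A query that returns a sequence of several top-level items would need to be wrapped in a single element before being compared in this way.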

These use cases represent a snapshot of ongoing work. Some important application areas are not yet adequately covered by a use case. The XML Query Working Group reserves the right to add, delete, or modify individual queries or whole use cases as the work progresses. The presence of a query in this set of use cases does not necessarily indicate that the query will be expressible in the XML Query Language(s) to be created by the XML Query Working Group.

Use Case "XMP": Experiences and Exemplars

This use case contains several example queries that illustrate requirements gathered from the database and document communities.

Document Type Definitions (DTD)

Most of the example queries in this use case are based on a bibliography document named "http://bstore1.example.com/bib.xml" with the following DTD:

<!ELEMENT bib (book* )>
<!ELEMENT book (title, (author+ | editor+ ), publisher, price )>
<!ATTLIST book year CDATA #REQUIRED >
<!ELEMENT author (last, first )>
<!ELEMENT editor (last, first, affiliation )>
<!ELEMENT title (#PCDATA )>
<!ELEMENT last (#PCDATA )>
<!ELEMENT first (#PCDATA )>
<!ELEMENT affiliation (#PCDATA )>
<!ELEMENT publisher (#PCDATA )>
<!ELEMENT price (#PCDATA )>

Sample Data

Here is the data found at "http://bstore1.example.com/bib.xml":

<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
  <book year="1999">
    <title>The Economics of Technology and Content for Digital TV</title>
    <editor>
      <last>Gerbarg</last><first>Darcy</first>
      <affiliation>CITI</affiliation>
    </editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>
</bib>

DTD for Q5

Q5 also uses information on book reviews and prices from a separate data source named "http://bstore2.example.com/reviews.xml" with the following DTD:

<!ELEMENT reviews (entry*)>
<!ELEMENT entry (title, price, review)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT review (#PCDATA)>

Sample Data for Q5

Here are the contents of "http://bstore2.example.com/reviews.xml":

<reviews>
  <entry>
    <title>Data on the Web</title>
    <price>34.95</price>
    <review>
      A very good discussion of semi-structured database
      systems and XML.
    </review>
  </entry>
  <entry>
    <title>Advanced Programming in the Unix environment</title>
    <price>65.95</price>
    <review>
      A clear and detailed discussion of UNIX programming.
    </review>
  </entry>
  <entry>
    <title>TCP/IP Illustrated</title>
    <price>65.95</price>
    <review>
      One of the best books on TCP/IP.
    </review>
  </entry>
</reviews>

DTD for Q9

Q9 uses an input document named "books.xml", with the following DTD:

<!ELEMENT chapter (title, section*)>
<!ELEMENT section (title, section*)>
<!ELEMENT title (#PCDATA)>

Data for Q9

Here are the contents of books.xml:

<chapter>
  <title>Data Model</title>
  <section>
    <title>Syntax For Data Model</title>
  </section>
  <section>
    <title>XML</title>
    <section>
      <title>Basic Syntax</title>
    </section>
    <section>
      <title>XML and Semistructured Data</title>
    </section>
  </section>
</chapter>

DTD for Q10

Q10 uses an input document named "prices.xml", with the following DTD:

<!ELEMENT prices (book*)>
<!ELEMENT book (title, source, price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT source (#PCDATA)>
<!ELEMENT price (#PCDATA)>

Data for Q10

Here are the contents of prices.xml:

<prices>
  <book>
    <title>Advanced Programming in the Unix environment</title>
    <source>bstore2.example.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Advanced Programming in the Unix environment</title>
    <source>bstore1.example.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>bstore2.example.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>TCP/IP Illustrated</title>
    <source>bstore1.example.com</source>
    <price>65.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>bstore2.example.com</source>
    <price>34.95</price>
  </book>
  <book>
    <title>Data on the Web</title>
    <source>bstore1.example.com</source>
    <price>39.95</price>
  </book>
</prices>

Queries and Results

Q1

List books published by Addison-Wesley after 1991, including their year and title.

Solution in XQuery:

<bib>
  {
    for $b in doc("http://bstore1.example.com/bib.xml")/bib/book
    where $b/publisher = "Addison-Wesley" and $b/@year > 1991
    return
      <book year="{ $b/@year }">
        { $b/title }
      </book>
  }
</bib>

Expected Result:

<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
  </book>
</bib>

Q2

Create a flat list of all the title-author pairs, with each pair enclosed in a "result" element.

Solution in XQuery:

<results>
  {
    for $b in doc("http://bstore1.example.com/bib.xml")/bib/book,
        $t in $b/title,
        $a in $b/author
    return
      <result>
        { $t }
        { $a }
      </result>
  }
</results>

Expected Result:

<results>
  <result>
    <title>TCP/IP Illustrated</title>
    <author> <last>Stevens</last> <first>W.</first> </author>
  </result>
  <result>
    <title>Advanced Programming in the Unix environment</title>
    <author> <last>Stevens</last> <first>W.</first> </author>
  </result>
  <result>
    <title>Data on the Web</title>
    <author> <last>Abiteboul</last> <first>Serge</first> </author>
  </result>
  <result>
    <title>Data on the Web</title>
    <author> <last>Buneman</last> <first>Peter</first> </author>
  </result>
  <result>
    <title>Data on the Web</title>
    <author> <last>Suciu</last> <first>Dan</first> </author>
  </result>
</results>

Q3

For each book in the bibliography, list the title and authors, grouped inside a "result" element.

Solution in XQuery:

<results>
  {
    for $b in doc("http://bstore1.example.com/bib.xml")/bib/book
    return
      <result>
        { $b/title }
        { $b/author }
      </result>
  }
</results>

Expected Result:

<results>
  <result>
    <title>TCP/IP Illustrated</title>
    <author> <last>Stevens</last> <first>W.</first> </author>
  </result>
  <result>
    <title>Advanced Programming in the Unix environment</title>
    <author> <last>Stevens</last> <first>W.</first> </author>
  </result>
  <result>
    <title>Data on the Web</title>
    <author> <last>Abiteboul</last> <first>Serge</first> </author>
    <author> <last>Buneman</last> <first>Peter</first> </author>
    <author> <last>Suciu</last> <first>Dan</first> </author>
  </result>
  <result>
    <title>The Economics of Technology and Content for Digital TV</title>
  </result>
</results>

Q4

For each author in the bibliography, list the author's name and the titles of all books by that author, grouped inside a "result" element.

Solution in XQuery:

<results>
  {
    let $a := doc("http://bstore1.example.com/bib.xml")//author
    for $last in distinct-values($a/last),
        $first in distinct-values($a[last=$last]/first)
    order by $last, $first
    return
      <result>
        <author>
          <last>{ $last }</last>
          <first>{ $first }</first>
        </author>
        {
          for $b in doc("http://bstore1.example.com/bib.xml")/bib/book
          where some $ba in $b/author
                satisfies ($ba/last = $last and $ba/first=$first)
          return $b/title
        }
      </result>
  }
</results>

The order in which values are returned by distinct-values() is undefined. The distinct-values() function returns atomic values, extracting the names from the elements.
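For readers who want to check the grouping logic outside an XQuery processor, the following Python sketch mirrors the Q4 solution. It is an illustration under stated assumptions, not a replacement for the query: the function name titles_by_author is hypothetical, the embedded data is an abbreviated copy of bib.xml, a Python set stands in for distinct-values(), and Python's default string ordering stands in for the "order by" clause.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the bib.xml sample data (publisher and price omitted).
BIB = """
<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="2000"><title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author></book>
  <book year="1999"><title>The Economics of Technology and Content for Digital TV</title>
    <editor><last>Gerbarg</last><first>Darcy</first>
      <affiliation>CITI</affiliation></editor></book>
</bib>
""".strip()


def titles_by_author(bib):
    """Return [((last, first), [titles...]), ...] sorted by last, then first."""
    books = bib.findall("book")
    # The set removes duplicate names and, like distinct-values(),
    # makes no promise about order; sorting is done explicitly below.
    names = {
        (a.findtext("last"), a.findtext("first"))
        for b in books
        for a in b.findall("author")
    }
    result = []
    for last, first in sorted(names):
        # Titles of every book that has an author with this exact name.
        titles = [
            b.findtext("title")
            for b in books
            if any(
                a.findtext("last") == last and a.findtext("first") == first
                for a in b.findall("author")
            )
        ]
        result.append(((last, first), titles))
    return result


grouped = titles_by_author(ET.fromstring(BIB))
```

The names are plain strings here, just as distinct-values() yields atomic values rather than the original "last" and "first" elements, which is why the query rebuilds the "author" element in its return clause.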

Expected Result:

<results>
  <result>
    <author> <last>Abiteboul</last> <first>Serge</first> </author>
    <title>Data on the Web</title>
  </result>
  <result>
    <author> <last>Buneman</last> <first>Peter</first> </author>
    <title>Data on the Web</title>
  </result>
  <result>
    <author> <last>Stevens</last> <first>W.</first> </author>
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix environment</title>
  </result>
  <result>
    <author> <last>Suciu</last> <first>Dan</first> </author>
    <title>Data on the Web</title>
  </result>
</results>

Q5

For each book found at both bstore1.example.com and bstore2.example.com, list the title of the book and its price from each source.

Solution in XQuery:

<books-with-prices>
  {
    for $b in doc("http://bstore1.example.com/bib.xml")//book,
        $a in doc("http://bstore2.example.com/reviews.xml")//entry
    where $b/title = $a/title
    return
      <book-with-prices>
        { $b/title }
        <price-bstore2>{ $a/price/text() }</price-bstore2>
        <price-bstore1>{ $b/price/text() }</price-bstore1>
      </book-with-prices>
  }
</books-with-prices>

Expected Result:

<books-with-prices>
  <book-with-prices>
    <title>TCP/IP Illustrated</title>
    <price-bstore2>65.95</price-bstore2>
    <price-bstore1>65.95</price-bstore1>
  </book-with-prices>
  <book-with-prices>
    <title>Advanced Programming in the Unix environment</title>
    <price-bstore2>65.95</price-bstore2>
    <price-bstore1>65.95</price-bstore1>
  </book-with-prices>
  <book-with-prices>
    <title>Data on the Web</title>
    <price-bstore2>34.95</price-bstore2>
    <price-bstore1>39.95</price-bstore1>
  </book-with-prices>
</books-with-prices>

Q6

For each book that has at least one author, list the title and first two authors, and an empty "et-al" element if the book has additional authors.

Solution in XQuery:

<bib>
  {
    for $b in doc("http://bstore1.example.com/bib.xml")//book
    where count($b/author) > 0
    return
      <book>
        { $b/title }
        {
          for $a in $b/author[position()<=2]
          return $a
        }
        {
          if (count($b/author) > 2)
          then <et-al/>
          else ()
        }
      </book>
  }
</bib>

Expected Result:

<bib>
  <book>
    <title>TCP/IP Illustrated</title>
    <author> <last>Stevens</last> <first>W.</first> </author>
  </book>
  <book>
    <title>Advanced Programming in the Unix environment</title>
    <author> <last>Stevens</last> <first>W.</first> </author>
  </book>
  <book>
    <title>Data on the Web</title>
    <author> <last>Abiteboul</last> <first>Serge</first> </author>
    <author> <last>Buneman</last> <first>Peter</first> </author>
    <et-al/>
  </book>
</bib>

Q7

List the titles and years of all books published by Addison-Wesley after 1991, in alphabetic order.

Solution in XQuery:

<bib>
  {
    for $b in doc("http://bstore1.example.com/bib.xml")//book
    where $b/publisher = "Addison-Wesley" and $b/@year > 1991
    order by $b/title
    return
      <book>
        { $b/@year }
        { $b/title }
      </book>
  }
</bib>

Expected Result:

<bib>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
  </book>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
</bib>

Q8

Find books in which the name of some element ends with the string "or" and the same element contains the string "Suciu" somewhere in its content. For each such book, return the title and the qualifying element.

Solution in XQuery:

for $b in doc("http://bstore1.example.com/bib.xml")//book
let $e := $b/*[contains(string(.), "Suciu")
               and ends-with(local-name(.), "or")]
where exists($e)
return
  <book>
    { $b/title }
    { $e }
  </book>

In the above solution, string(), local-name() and ends-with() are functions defined in the Functions and Operators document.

Expected Result:

<book>
  <title>Data on the Web</title>
  <author> <last>Suciu</last> <first>Dan</first> </author>
</book>

Q9

In the document "books.xml", find all section or chapter titles that contain the word "XML", regardless of the level of nesting.

Solution in XQuery:

<results>
  {
    for $t in doc("books.xml")//(chapter | section)/title
    where contains($t/text(), "XML")
    return $t
  }
</results>

Expected Result:

<results>
  <title>XML</title>
  <title>XML and Semistructured Data</title>
</results>

Q10

In the document "prices.xml", find the minimum price for each book, in the form of a "minprice" element with the book title as its title attribute.

Solution in XQuery:

<results>
  {
    let $doc := doc("prices.xml")
    for $t in distinct-values($doc//book/title)
    let $p := $doc//book[title = $t]/price
    return
      <minprice title="{ $t }">
        <price>{ min($p) }</price>
      </minprice>
  }
</results>

Expected Result:

<results>
  <minprice title="Advanced Programming in the Unix environment">
    <price>65.95</price>
  </minprice>
  <minprice title="TCP/IP Illustrated">
    <price>65.95</price>
  </minprice>
  <minprice title="Data on the Web">
    <price>34.95</price>
  </minprice>
</results>

Q11

For each book with an author, return the book with its title and authors. For each book with an editor, return a reference with the book title and the editor's affiliation.

Solution in XQuery:

<bib>
  {
    for $b in doc("http://bstore1.example.com/bib.xml")//book[author]
    return
      <book>
        { $b/title }
        { $b/author }
      </book>
  }
  {
    for $b in doc("http://bstore1.example.com/bib.xml")//book[editor]
    return
      <reference>
        { $b/title }
        { $b/editor/affiliation }
      </reference>
  }
</bib>

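Expected Result:

<bib>
  <book>
    <title>TCP/IP Illustrated</title>
    <author> <last>Stevens</last> <first>W.</first> </author>
  </book>
  <book>
    <title>Advanced Programming in the Unix environment</title>
    <author> <last>Stevens</last> <first>W.</first> </author>
  </book>
  <book>
    <title>Data on the Web</title>
    <author> <last>Abiteboul</last> <first>Serge</first> </author>
    <author> <last>Buneman</last> <first>Peter</first> </author>
    <author> <last>Suciu</last> <first>Dan</first> </author>
  </book>
  <reference>
    <title>The Economics of Technology and Content for Digital TV</title>
    <affiliation>CITI</affiliation>
  </reference>
</bib>

Q12

Find pairs of books that have different titles but the same set of authors (possibly in a different order).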

Solution in XQuery:

<bib>
  {
    for $book1 in doc("http://bstore1.example.com/bib.xml")//book,
        $book2 in doc("http://bstore1.example.com/bib.xml")//book
    let $aut1 := for $a in $book1/author
                 order by $a/last, $a/first
                 return $a
    let $aut2 := for $a in $book2/author
                 order by $a/last, $a/first
                 return $a
    where $book1 << $book2
      and not($book1/title = $book2/title)
      and deep-equal($aut1, $aut2)
    return
      <book-pair>
        { $book1/title }
        { $book2/title }
      </book-pair>
  }
</bib>

Expected Result:

<bib>
  <book-pair>
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix environment</title>
  </book-pair>
</bib>

The above solution uses a function, deep-equal(), which compares sequences. Two sequences are deep-equal if they have the same length and all items in corresponding positions are deep-equal. If the sequences are node sequences, the nodes are compared by value (name and content), not by identity.
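The pairing logic of this query can be sketched in Python for readers who want to verify the expected result by hand. This is an approximation under stated assumptions: the function names sorted_authors and same_author_pairs are hypothetical, the embedded data is an abbreviated copy of bib.xml, and equality of lists of (last, first) tuples stands in for deep-equal() on sorted "author" elements.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Abbreviated copy of the bib.xml sample data (publisher and price omitted).
BIB = """
<bib>
  <book year="1994"><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book year="2000"><title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author></book>
  <book year="1999"><title>The Economics of Technology and Content for Digital TV</title>
    <editor><last>Gerbarg</last><first>Darcy</first>
      <affiliation>CITI</affiliation></editor></book>
</bib>
""".strip()


def sorted_authors(book):
    """Authors as (last, first) tuples, sorted like 'order by $a/last, $a/first'."""
    return sorted(
        (a.findtext("last"), a.findtext("first")) for a in book.findall("author")
    )


def same_author_pairs(bib):
    """Title pairs for books with different titles but equal sorted author lists."""
    pairs = []
    # combinations() yields each pair once, with the earlier book first,
    # which mirrors the '$book1 << $book2' condition in the query.
    for book1, book2 in combinations(bib.findall("book"), 2):
        t1, t2 = book1.findtext("title"), book2.findtext("title")
        if t1 != t2 and sorted_authors(book1) == sorted_authors(book2):
            pairs.append((t1, t2))
    return pairs


pairs = same_author_pairs(ET.fromstring(BIB))
```

Sorting both author lists before comparing them is what makes the comparison insensitive to author order, since a positional comparison such as deep-equal() would otherwise treat reordered authors as different.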

Use Case "TREE": Queries that preserve hierarchy

Some XML document-types have a very flexible structure in which text is mixed with elements and many elements are optional. These document-types show a wide variation in structure from one document to another. In documents of these types, the ways in which elements are ordered and nested are usually quite important.

Description

An XML query language should have the ability to extract elements from documents while preserving their original hierarchy. This Use Case illustrates this requirement by means of a flexible document type named Book.

Document Type Definition (DTD)

This use case is based on an input document named "book.xml". The DTD for this schema is found in a file called "book.dtd":

<!ELEMENT book (title, author+, section+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT section (title, (p | figure | section)* )>
<!ATTLIST section
    id ID #IMPLIED
    difficulty CDATA #IMPLIED>
<!ELEMENT p (#PCDATA)>
<!ELEMENT figure (title, image)>
<!ATTLIST figure
    width CDATA #REQUIRED
    height CDATA #REQUIRED >
<!ELEMENT image EMPTY>
<!ATTLIST image
    source CDATA #REQUIRED >

Sample Data

The queries in this use case are based on the following sample data.

<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd">
<book>
  <title>Data on the Web</title>
  <author>Serge Abiteboul</author>
  <author>Peter Buneman</author>
  <author>Dan Suciu</author>
  <section id="intro" difficulty="easy" >
    <title>Introduction</title>
    <p>Text ... </p>
    <section>
      <title>Audience</title>
      <p>Text ... </p>
    </section>
    <section>
      <title>Web Data and the Two Cultures</title>
      <p>Text ... </p>
      <figure height="400" width="400">
        <title>Traditional client/server architecture</title>
        <image source="csarch.gif"/>
      </figure>
      <p>Text ... </p>
    </section>
  </section>
  <section id="syntax" difficulty="medium" >
    <title>A Syntax For Data</title>
    <p>Text ... </p>
    <figure height="200" width="500">
      <title>Graph representations of structures</title>
      <image source="graphs.gif"/>
    </figure>
    <p>Text ... </p>
    <section>
      <title>Base Types</title>
      <p>Text ... </p>
    </section>
    <section>
      <title>Representing Relational Databases</title>
      <p>Text ... </p>
      <figure height="250" width="400">
        <title>Examples of Relations</title>
        <image source="relations.gif"/>
      </figure>
    </section>
    <section>
      <title>Representing Object Databases</title>
      <p>Text ... </p>
    </section>
  </section>
</book>

Queries and Results

Q1

Prepare a (nested) table of contents for Book1, listing all the sections and their titles. Preserve the original attributes of each <section> element, if any.

Solution in XQuery:

declare function local:toc($book-or-section as element()) as element()*
{
  for $section in $book-or-section/section
  return
    <section>
      { $section/@* , $section/title , local:toc($section) }
    </section>
};

<toc>
  {
    for $s in doc("book.xml")/book return local:toc($s)
  }
</toc>

Expected Result:

<toc>
  <section id="intro" difficulty="easy">
    <title>Introduction</title>
    <section>
      <title>Audience</title>
    </section>
    <section>
      <title>Web Data and the Two Cultures</title>
    </section>
  </section>
  <section id="syntax" difficulty="medium">
    <title>A Syntax For Data</title>
    <section>
      <title>Base Types</title>
    </section>
    <section>
      <title>Representing Relational Databases</title>
    </section>
    <section>
      <title>Representing Object Databases</title>
    </section>
  </section>
</toc>

Q2

Prepare a (flat) figure list for Book1, listing all the figures and their titles. Preserve the original attributes of each <figure> element, if any.

Solution in XQuery:

<figlist>
  {
    for $f in doc("book.xml")//figure
    return
      <figure>
        { $f/@* }
        { $f/title }
      </figure>
  }
</figlist>

Expected Result:

<figlist>
  <figure height="400" width="400">
    <title>Traditional client/server architecture</title>
  </figure>
  <figure height="200" width="500">
    <title>Graph representations of structures</title>
  </figure>
  <figure height="250" width="400">
    <title>Examples of Relations</title>
  </figure>
</figlist>

Q3

How many sections are in Book1, and how many figures?

Solution in XQuery:

<section_count>{ count(doc("book.xml")//section) }</section_count>,
<figure_count>{ count(doc("book.xml")//figure) }</figure_count>

Expected Result:

<section_count>7</section_count>
<figure_count>3</figure_count>

Q4

How many top-level sections are in Book1?

Solution in XQuery:

<top_section_count>
  { count(doc("book.xml")/book/section) }
</top_section_count>

Expected Result:

<top_section_count>2</top_section_count>

Q5

Make a flat list of the section elements in Book1. In place of its original attributes, each section element should have two attributes, containing the title of the section and the number of figures immediately contained in the section.

Solution in XQuery:

<section_list>
  {
    for $s in doc("book.xml")//section
    let $f := $s/figure
    return
      <section title="{ $s/title/text() }" figcount="{ count($f) }"/>
  }
</section_list>

Expected Result:

<section_list>
  <section title="Introduction" figcount="0"/>
  <section title="Audience" figcount="0"/>
  <section title="Web Data and the Two Cultures" figcount="1"/>
  <section title="A Syntax For Data" figcount="1"/>
  <section title="Base Types" figcount="0"/>
  <section title="Representing Relational Databases" figcount="1"/>
  <section title="Representing Object Databases" figcount="0"/>
</section_list>

Q6

Make a nested list of the section elements in Book1, preserving their original attributes and hierarchy. Inside each section element, include the title of the section and an element that includes the number of figures immediately contained in the section.

Solution in XQuery:

declare function local:section-summary($book-or-section as element()*)
  as element()*
{
  for $section in $book-or-section
  return
    <section>
      { $section/@* }
      { $section/title }
      <figcount>
        { count($section/figure) }
      </figcount>
      { local:section-summary($section/section) }
    </section>
};

<toc>
  {
    for $s in doc("book.xml")/book/section
    return local:section-summary($s)
  }
</toc>

This solution was provided by Michael Wenger, a student at the University of Würzburg.

Expected Result:

<toc>
  <section id="intro" difficulty="easy">
    <title>Introduction</title>
    <figcount>0</figcount>
    <section>
      <title>Audience</title>
      <figcount>0</figcount>
    </section>
    <section>
      <title>Web Data and the Two Cultures</title>
      <figcount>1</figcount>
    </section>
  </section>
  <section id="syntax" difficulty="medium">
    <title>A Syntax For Data</title>
    <figcount>1</figcount>
    <section>
      <title>Base Types</title>
      <figcount>0</figcount>
    </section>
    <section>
      <title>Representing Relational Databases</title>
      <figcount>1</figcount>
    </section>
    <section>
      <title>Representing Object Databases</title>
      <figcount>0</figcount>
    </section>
  </section>
</toc>

Use Case "SEQ" - Queries based on Sequence

This use case illustrates queries based on the sequence in which elements appear in a document.

Description

Although sequence is not significant in most traditional database systems or object systems, it can be quite significant in structured documents. This use case presents a series of queries based on a medical report.

Document Type Definition (DTD)

This use case is based on a medical report using the HL7 Patient Record Architecture. We simplify the DTD in this example, using only what is needed to understand the queries.

<!DOCTYPE report [
  <!ELEMENT report (section*)>
  <!ELEMENT section (section.title, section.content)>
  <!ELEMENT section.title (#PCDATA )>
  <!ELEMENT section.content (#PCDATA | anesthesia | prep
                            | incision | action | observation )*>
  <!ELEMENT anesthesia (#PCDATA)>
  <!ELEMENT prep ( (#PCDATA | action)* )>
  <!ELEMENT incision ( (#PCDATA | geography | instrument)* )>
  <!ELEMENT action ( (#PCDATA | instrument )* )>
  <!ELEMENT observation (#PCDATA)>
  <!ELEMENT geography (#PCDATA)>
  <!ELEMENT instrument (#PCDATA)>
]>

Sample Data

The queries in this use case are based on the following sample data.

<report>
  <section>
    <section.title>Procedure</section.title>
    <section.content>
      The patient was taken to the operating room where she was placed
      in supine position and
      <anesthesia>induced under general anesthesia.</anesthesia>
      <prep>
        <action>A Foley catheter was placed to decompress the bladder</action>
        and the abdomen was then prepped and draped in sterile fashion.
      </prep>
      <incision>
        A curvilinear incision was made
        <geography>in the midline immediately infraumbilical</geography>
        and the subcutaneous tissue was divided
        <instrument>using electrocautery.</instrument>
      </incision>
      The fascia was identified and
      <action>#2 0 Maxon stay sutures were placed on each side of the midline.
      </action>
      <incision>
        The fascia was divided using
        <instrument>electrocautery</instrument>
        and the peritoneum was entered.
      </incision>
      <observation>The small bowel was identified.</observation>
      and
      <action>
        the
        <instrument>Hasson trocar</instrument>
        was placed under direct visualization.
      </action>
      <action>
        The
        <instrument>trocar</instrument>
        was secured to the fascia using the stay sutures.
      </action>
    </section.content>
  </section>
</report>

Queries and Results

Q1

In the Procedure section of Report1, what Instruments were used in the second Incision?

Solution in XQuery:

for $s in doc("report1.xml")//section[section.title = "Procedure"]
return ($s//incision)[2]/instrument

Expected Result:

<instrument>electrocautery</instrument>

Q2

In the Procedure section of Report1, what are the first two Instruments to be used?

Solution in XQuery:

for $s in doc("report1.xml")//section[section.title = "Procedure"]
return ($s//instrument)[position()<=2]

Expected Result:

<instrument>using electrocautery.</instrument>
<instrument>electrocautery</instrument>

Q3

In Report1, what Instruments were used in the first two Actions after the second Incision?

Solution in XQuery:

let $i2 := (doc("report1.xml")//incision)[2]
for $a in (doc("report1.xml")//action)[. >> $i2][position()<=2]
return $a//instrument

Expected Result:

<instrument>Hasson trocar</instrument>
<instrument>trocar</instrument>

Q4

In Report1, find "Procedure" sections where no Anesthesia element occurs before the first Incision.

Solution in XQuery:

for $p in doc("report1.xml")//section[section.title = "Procedure"]
where not(some $a in $p//anesthesia
          satisfies $a << ($p//incision)[1])
return $p

Expected Result:

(No sections satisfy Q4, thankfully.)

Q5

In Report1, what happened between the first Incision and the second Incision?

Solution in XQuery:

declare function local:precedes($a as node(), $b as node())
  as xs:boolean
{
  $a << $b
    and
  empty($a//node() intersect $b)
};

declare function local:follows($a as node(), $b as node())
  as xs:boolean
{
  $a >> $b
    and
  empty($b//node() intersect $a)
};

<critical_sequence>
  {
    let $proc := doc("report1.xml")//section[section.title="Procedure"][1]
    for $n in $proc//node()
    where local:follows($n, ($proc//incision)[1])
      and local:precedes($n, ($proc//incision)[2])
    return $n
  }
</critical_sequence>

Here is another solution that is perhaps more efficient and less readable:

<critical_sequence>
  {
    let $proc := doc("report1.xml")//section[section.title="Procedure"][1],
        $i1 := ($proc//incision)[1],
        $i2 := ($proc//incision)[2]
    for $n in $proc//node() except $i1//node()
    where $n >> $i1 and $n << $i2
    return $n
  }
</critical_sequence>

Expected Result:

<critical_sequence>
  The fascia was identified and
  <action>#2 0 Maxon stay sutures were placed on each side of the midline.
  </action>#2 0 Maxon stay sutures were placed on each side of the midline.
</critical_sequence>

In the above output, the contents of the critical sequence element include a text node, an action element, and the text node containing the content of the action element. But the serialization we are using already shows all descendants of a given node. If $c is bound to a sequence of nodes, the following expression eliminates members of the sequence that are descendants of another node already found in the sequence:

$c except $c//node()
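As a small illustration of this idiom, the sketch below binds $c to three elements of a hypothetical inline fragment, one of which is a descendant of another. The element names are placeholders and do not come from the use case data.

```xquery
(: Hypothetical fragment: b is a descendant of a, c is unrelated. :)
let $doc := <r><a><b/></a><c/></r>
let $c := ($doc/a, $doc/a/b, $doc/c)
return $c except $c//node()
```

The expression $c//node() selects only the b element here, so the result is the a element (still containing b) followed by the c element, in document order.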

In the following solution, the local:between() function takes a sequence of nodes, a starting node, and an ending node, and returns the nodes between them:

declare function local:between($seq as node()*, $start as node(), $end as node())
  as item()*
{
  let $nodes :=
    for $n in $seq except $start//node()
    where $n >> $start and $n << $end
    return $n
  return $nodes except $nodes//node()
};

<critical_sequence>
  {
    let $proc := doc("report1.xml")//section[section.title="Procedure"][1],
        $first := ($proc//incision)[1],
        $second := ($proc//incision)[2]
    return local:between($proc//node(), $first, $second)
  }
</critical_sequence>

Here is the output from the above query:

<critical_sequence>
  The fascia was identified and
  <action>#2 0 Maxon stay sutures were placed on each side of the midline.
  </action>
</critical_sequence>

Use Case "R" - Access to Relational Data

One important use of an XML query language will be to access data stored in relational databases. This use case describes one possible way in which this access might be accomplished.

Description

A relational database system might present a view in which each table (relation) takes the form of an XML document. One way to represent a database table as an XML document is to allow the document element to represent the table itself, and each row (tuple) inside the table to be represented by a nested element. Inside the tuple-elements, each column is in turn represented by a nested element. Columns that allow null values are represented by optional elements, and a missing element denotes a null value.
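The convention that a missing element denotes a null value can be sketched with a small inline table. The PARTS table, its element names, and its values below are hypothetical and serve only to illustrate the mapping; COLOR plays the role of a nullable column.

```xquery
(: Hypothetical PARTS table; the second row has a null COLOR. :)
let $parts :=
  <parts>
    <part_tuple>
      <partno>P1</partno>
      <color>red</color>
    </part_tuple>
    <part_tuple>
      <partno>P2</partno>
    </part_tuple>
  </parts>
for $p in $parts/part_tuple
where empty($p/color)
return $p/partno
```

Under this representation, a test for null becomes a test for an empty sequence, so the query returns only the partno element of P2.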

As an example, consider a relational database used by an online auction. The auction maintains a USERS table containing information on registered users, each identified by a unique userid, who can either offer items for sale or bid on items. An ITEMS table lists items currently or recently for sale, with the userid of the user who offered each item. A BIDS table contains all bids on record, keyed by the userid of the bidder and the item number of the item to which the bid applies.

The three tables used by the online auction are listed below, with their column names indicated in parentheses.

USERS ( USERID, NAME, RATING )
ITEMS ( ITEMNO, DESCRIPTION, OFFERED_BY, START_DATE, END_DATE, RESERVE_PRICE )
BIDS ( USERID, ITEMNO, BID, BID_DATE )

Document Type Definition (DTD)

This use case is based on three separate input documents named users.xml, items.xml, and bids.xml. Each of the documents represents one of the tables in the relational database described above, using the following DTDs:

<!DOCTYPE users [
  <!ELEMENT users (user_tuple*)>
  <!ELEMENT user_tuple (userid, name, rating?)>
  <!ELEMENT userid (#PCDATA)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT rating (#PCDATA)>
]>

<!DOCTYPE items [
  <!ELEMENT items (item_tuple*)>
  <!ELEMENT item_tuple (itemno, description, offered_by,
                        start_date?, end_date?, reserve_price? )>
  <!ELEMENT itemno (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT offered_by (#PCDATA)>
  <!ELEMENT start_date (#PCDATA)>
  <!ELEMENT end_date (#PCDATA)>
  <!ELEMENT reserve_price (#PCDATA)>
]>

<!DOCTYPE bids [
  <!ELEMENT bids (bid_tuple*)>
  <!ELEMENT bid_tuple (userid, itemno, bid, bid_date)>
  <!ELEMENT userid (#PCDATA)>
  <!ELEMENT itemno (#PCDATA)>
  <!ELEMENT bid (#PCDATA)>
  <!ELEMENT bid_date (#PCDATA)>
]>

Sample Data

Here is an abbreviated set of data showing the XML format of the instances:

<items>
  <item_tuple>
    <itemno>1001</itemno>
    <description>Red Bicycle</description>
    <offered_by>U01</offered_by>
    <start_date>1999-01-05</start_date>
    <end_date>1999-01-20</end_date>
    <reserve_price>40</reserve_price>
  </item_tuple>

<users>
  <user_tuple>
    <userid>U01</userid>
    <name>Tom Jones</name>
    <rating>B</rating>
  </user_tuple>

<bids>
  <bid_tuple>
    <userid>U02</userid>
    <itemno>1001</itemno>
    <bid>35</bid>
    <bid_date>1999-01-07</bid_date>
  </bid_tuple>
  <bid_tuple>

The entire data set is represented by the following tables:

USERS
USERID	NAME	RATING
U01	Tom Jones	B
U02	Mary Doe	A
U03	Dee Linquent	D
U04	Roger Smith	C
U05	Jack Sprat	B
U06	Rip Van Winkle	B

ITEMS
ITEMNO	DESCRIPTION	OFFERED_BY	START_DATE	END_DATE	RESERVE_PRICE
1001	Red Bicycle	U01	1999-01-05	1999-01-20	40
1002	Motorcycle	U02	1999-02-11	1999-03-15	500
1003	Old Bicycle	U02	1999-01-10	1999-02-20	25
1004	Tricycle	U01	1999-02-25	1999-03-08	15
1005	Tennis Racket	U03	1999-03-19	1999-04-30	20
1006	Helicopter	U03	1999-05-05	1999-05-25	50000
1007	Racing Bicycle	U04	1999-01-20	1999-02-20	200
1008	Broken Bicycle	U01	1999-02-05	1999-03-06	25

BIDS
USERID	ITEMNO	BID	BID_DATE
U02	1001	35	1999-01-07
U04	1001	40	1999-01-08
U02	1001	45	1999-01-11
U04	1001	50	1999-01-13
U02	1001	55	1999-01-15
U01	1002	400	1999-02-14
U02	1002	600	1999-02-16
U03	1002	800	1999-02-17
U04	1002	1000	1999-02-25
U02	1002	1200	1999-03-02
U04	1003	15	1999-01-22
U05	1003	20	1999-02-03
U01	1004	40	1999-03-05
U03	1007	175	1999-01-25
U05	1007	200	1999-02-08
U04	1007	225	1999-02-12

Queries and Results

Q1

List the item number and description of all bicycles that currently have an auction in progress, ordered by item number.

Solution in XQuery:

<result>
  {
    for $i in doc("items.xml")//item_tuple
    where $i/start_date <= current-date()
      and $i/end_date >= current-date()
      and contains($i/description, "Bicycle")
    order by $i/itemno
    return
      <item_tuple>
        { $i/itemno }
        { $i/description }
      </item_tuple>
  }
</result>

This solution assumes that the current date is 1999-01-31.

Expected Result:

<result>
  <item_tuple>
    <itemno>1003</itemno>
    <description>Old Bicycle</description>
  </item_tuple>
  <item_tuple>
    <itemno>1007</itemno>
    <description>Racing Bicycle</description>
  </item_tuple>
</result>

The above query returns an element named item_tuple, but its definition does not match the definition of item_tuple in the DTD.
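Because the Q1 solution calls current-date(), its result depends on the day on which it is run. The variant below is a sketch that makes the stated assumption explicit by binding the date to a variable; the variable name $today is introduced here for illustration. It assumes, as in the Q1 solution, that the input is not schema-validated, so that start_date and end_date are untyped and are cast to xs:date by the comparison.

```xquery
(: $today pins the assumed current date from Q1. :)
let $today := xs:date("1999-01-31")
return
  <result>
    {
      for $i in doc("items.xml")//item_tuple
      where $i/start_date <= $today
        and $i/end_date >= $today
        and contains($i/description, "Bicycle")
      order by $i/itemno
      return
        <item_tuple>
          { $i/itemno }
          { $i/description }
        </item_tuple>
    }
  </result>
```

With the data set shown above, this selects items 1003 and 1007, matching the expected result for Q1.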

Q2

For all bicycles, list the item number, description, and highest bid (if any), ordered by item number.

Solution in XQuery:

<result>
  {
    for $i in doc("items.xml")//item_tuple
    let $b := doc("bids.xml")//bid_tuple[itemno = $i/itemno]
    where contains($i/description, "Bicycle")
    order by $i/itemno
    return
      <item_tuple>
        { $i/itemno }
        { $i/description }
        <high_bid>{ max($b/bid) }</high_bid>
      </item_tuple>
  }
</result>

Expected Result:

<result>
  <item_tuple>
    <itemno>1001</itemno>
    <description>Red Bicycle</description>
    <high_bid>55</high_bid>
  </item_tuple>
  <item_tuple>
    <itemno>1003</itemno>
    <description>Old Bicycle</description>
    <high_bid>20</high_bid>
  </item_tuple>
  <item_tuple>
    <itemno>1007</itemno>
    <description>Racing Bicycle</description>
    <high_bid>225</high_bid>
  </item_tuple>
  <item_tuple>
    <itemno>1008</itemno>
    <description>Broken Bicycle</description>
    <high_bid></high_bid>
  </item_tuple>
</result>

Q3

Find cases where a user with a rating worse (alphabetically, greater) than "C" is offering an item with a reserve price of more than 1000.

Solution in XQuery:

<result>
  {
    for $u in doc("users.xml")//user_tuple
    for $i in doc("items.xml")//item_tuple
    where $u/rating > "C"
      and $i/reserve_price > 1000
      and $i/offered_by = $u/userid
    return
      <warning>
        { $u/name }
        { $u/rating }
        { $i/description }
        { $i/reserve_price }
      </warning>
  }
</result>

Expected Result:

<result>
  <warning>
    <name>Dee Linquent</name>
    <rating>D</rating>
    <description>Helicopter</description>
    <reserve_price>50000</reserve_price>
  </warning>
</result>

Q4

List item numbers and descriptions of items that have no bids.

Solution in XQuery:

<result>
  {
    for $i in doc("items.xml")//item_tuple
    where empty(doc("bids.xml")//bid_tuple[itemno = $i/itemno])
    return
      <no_bid_item>
        { $i/itemno }
        { $i/description }
      </no_bid_item>
  }
</result>

Expected Result:

<result>
  <no_bid_item>
    <itemno>1005</itemno>
    <description>Tennis Racket</description>
  </no_bid_item>
  <no_bid_item>
    <itemno>1006</itemno>
    <description>Helicopter</description>
  </no_bid_item>
  <no_bid_item>
    <itemno>1008</itemno>
    <description>Broken Bicycle</description>
  </no_bid_item>
</result>

Q5

For bicycle(s) offered by Tom Jones that have received a bid, list the item number, description, highest bid, and name of the highest bidder, ordered by item number.

Solution in XQuery:

<result>
  {
    for $seller in doc("users.xml")//user_tuple,
        $buyer in doc("users.xml")//user_tuple,
        $item in doc("items.xml")//item_tuple,
        $highbid in doc("bids.xml")//bid_tuple
    where $seller/name = "Tom Jones"
      and $seller/userid = $item/offered_by
      and contains($item/description, "Bicycle")
      and $item/itemno = $highbid/itemno
      and $highbid/userid = $buyer/userid
      and $highbid/bid = max(
                           doc("bids.xml")//bid_tuple
                             [itemno = $item/itemno]/bid
                         )
    order by ($item/itemno)
    return
      <jones_bike>
        { $item/itemno }
        { $item/description }
        <high_bid>{ $highbid/bid }</high_bid>
        <high_bidder>{ $buyer/name }</high_bidder>
      </jones_bike>
  }
</result>

The above query does several joins, and requires the results in a particular order. If there were no order by clause, results would be reported in document order. If you do not care about the order, you can use the unordered function to inform the query processor that the order of the lists in the for clause is not significant, which means that the tuples can be generated in any order. This can enable better optimization.

Unordered Solution in XQuery:

<result>
  {
    unordered (
      for $seller in doc("users.xml")//user_tuple,
          $buyer in doc("users.xml")//user_tuple,
          $item in doc("items.xml")//item_tuple,
          $highbid in doc("bids.xml")//bid_tuple
      where $seller/name = "Tom Jones"
        and $seller/userid = $item/offered_by
        and contains($item/description, "Bicycle")
        and $item/itemno = $highbid/itemno
        and $highbid/userid = $buyer/userid
        and $highbid/bid = max(
                             doc("bids.xml")//bid_tuple
                               [itemno = $item/itemno]/bid
                           )
      return
        <jones_bike>
          { $item/itemno }
          { $item/description }
          <high_bid>{ $highbid/bid }</high_bid>
          <high_bidder>{ $buyer/name }</high_bidder>
        </jones_bike>
    )
  }
</result>

Expected Result:

<result>
  <jones_bike>
    <itemno>1001</itemno>
    <description>Red Bicycle</description>
    <high_bid>
      <bid>55</bid>
    </high_bid>
    <high_bidder>
      <name>Mary Doe</name>
    </high_bidder>
  </jones_bike>
</result>

Q6

For each item whose highest bid is more than twice its reserve price, list the item number, description, reserve price, and highest bid.

Solution in XQuery:

<result>
  {
    for $item in doc("items.xml")//item_tuple
    let $b := doc("bids.xml")//bid_tuple[itemno = $item/itemno]
    let $z := max($b/bid)
    where $item/reserve_price * 2 < $z
    return
      <successful_item>
        { $item/itemno }
        { $item/description }
        { $item/reserve_price }
        <high_bid>{ $z }</high_bid>
      </successful_item>
  }
</result>

Expected Result:

<result>
  <successful_item>
    <itemno>1002</itemno>
    <description>Motorcycle</description>
    <reserve_price>500</reserve_price>
    <high_bid>1200</high_bid>
  </successful_item>
  <successful_item>
    <itemno>1004</itemno>
    <description>Tricycle</description>
    <reserve_price>15</reserve_price>
    <high_bid>40</high_bid>
  </successful_item>
</result>

Q7

Find the highest bid ever made for a bicycle or tricycle.

Solution in XQuery:

let $allbikes := doc("items.xml")//item_tuple
                   [contains(description, "Bicycle")
                    or contains(description, "Tricycle")]
let $bikebids := doc("bids.xml")//bid_tuple[itemno = $allbikes/itemno]
return
  <high_bid>
    {
      max($bikebids/bid)
    }
  </high_bid>

Expected Result:

<high_bid>225</high_bid>

Q8

How many items were auctioned (auction ended) in March 1999?

Solution in XQuery:

let $item := doc("items.xml")//item_tuple
               [end_date >= xs:date("1999-03-01")
                and end_date <= xs:date("1999-03-31")]
return
  <item_count>
    {
      count($item)
    }
  </item_count>

Expected Result:

<item_count>3</item_count>

Q9

List the number of items auctioned each month in 1999 for which data is available, ordered by month.

Solution in XQuery:

<result>
  {
    let $end_dates := doc("items.xml")//item_tuple/end_date
    for $m in distinct-values(for $e in $end_dates
                              return month-from-date($e))
    let $item := doc("items.xml")
                   //item_tuple[year-from-date(end_date) = 1999
                                and month-from-date(end_date) = $m]
    order by $m
    return
      <monthly_result>
        <month>{ $m }</month>
        <item_count>{ count($item) }</item_count>
      </monthly_result>
  }
</result>

Expected Result:

<result>
  <monthly_result>
    <month>1</month>
    <item_count>1</item_count>
  </monthly_result>
  <monthly_result>
    <month>2</month>
    <item_count>2</item_count>
  </monthly_result>
  <monthly_result>
    <month>3</month>
    <item_count>3</item_count>
  </monthly_result>
  <monthly_result>
    <month>4</month>
    <item_count>1</item_count>
  </monthly_result>
  <monthly_result>
    <month>5</month>
    <item_count>1</item_count>
  </monthly_result>
</result>

Q10

For each item that has received a bid, list the item number, the highest bid, and the name of the highest bidder, ordered by item number.

Solution in XQuery:

<result>
  {
    for $highbid in doc("bids.xml")//bid_tuple,
        $user in doc("users.xml")//user_tuple
    where $user/userid = $highbid/userid
      and $highbid/bid = max(doc("bids.xml")//bid_tuple[itemno=$highbid/itemno]/bid)
    order by $highbid/itemno
    return
      <high_bid>
        { $highbid/itemno }
        { $highbid/bid }
        <bidder>{ $user/name/text() }</bidder>
      </high_bid>
  }
</result>

Expected Result:

<result>
  <high_bid>
    <itemno>1001</itemno>
    <bid>55</bid>
    <bidder>Mary Doe</bidder>
  </high_bid>
  <high_bid>
    <itemno>1002</itemno>
    <bid>1200</bid>
    <bidder>Mary Doe</bidder>
  </high_bid>
  <high_bid>
    <itemno>1003</itemno>
    <bid>20</bid>
    <bidder>Jack Sprat</bidder>
  </high_bid>
  <high_bid>
    <itemno>1004</itemno>
    <bid>40</bid>
    <bidder>Tom Jones</bidder>
  </high_bid>
  <high_bid>
    <itemno>1007</itemno>
    <bid>225</bid>
    <bidder>Roger Smith</bidder>
  </high_bid>
</result>

Q11

List the item number and description of the item(s) that received the highest bid ever recorded, and the amount of that bid.

Solution in XQuery:

let $highbid := max(doc("bids.xml")//bid_tuple/bid)
return
  <result>
    {
      for $item in doc("items.xml")//item_tuple,
          $b in doc("bids.xml")//bid_tuple[itemno = $item/itemno]
      where $b/bid = $highbid
      return
        <expensive_item>
          { $item/itemno }
          { $item/description }
          <high_bid>{ $highbid }</high_bid>
        </expensive_item>
    }
  </result>

Expected Result:

<result>
  <expensive_item>
    <itemno>1002</itemno>
    <description>Motorcycle</description>
    <high_bid>1200</high_bid>
  </expensive_item>
</result>

Q12

List the item number and description of the item(s) that received the largest number of bids, and the number of bids it (or they) received.

Solution in XQuery:

declare function local:bid_summary()
  as element()*
{
  for $i in distinct-values(doc("bids.xml")//itemno)
  let $b := doc("bids.xml")//bid_tuple[itemno = $i]
  return
    <bid_count>
      <itemno>{ $i }</itemno>
      <nbids>{ count($b) }</nbids>
    </bid_count>
};

<result>
  {
    let $bid_counts := local:bid_summary(),
        $maxbids := max($bid_counts/nbids),
        $maxitemnos := $bid_counts[nbids = $maxbids]
    for $item in doc("items.xml")//item_tuple,
        $bc in $bid_counts
    where $bc/nbids = $maxbids
      and $item/itemno = $bc/itemno
    return
      <popular_item>
        { $item/itemno }
        { $item/description }
        <bid_count>{ $bc/nbids/text() }</bid_count>
      </popular_item>
  }
</result>

Expected Result:

<result>
  <popular_item>
    <itemno>1001</itemno>
    <description>Red Bicycle</description>
    <bid_count>5</bid_count>
  </popular_item>
  <popular_item>
    <itemno>1002</itemno>
    <description>Motorcycle</description>
    <bid_count>5</bid_count>
  </popular_item>
</result>

Q13

For each user who has placed a bid, give the userid, name, number of bids, and average bid, in order by userid.

Solution in XQuery:

<result>
  {
    for $uid in distinct-values(doc("bids.xml")//userid),
        $u in doc("users.xml")//user_tuple[userid = $uid]
    let $b := doc("bids.xml")//bid_tuple[userid = $uid]
    order by $u/userid
    return
      <bidder>
        { $u/userid }
        { $u/name }
        <bidcount>{ count($b) }</bidcount>
        <avgbid>{ avg($b/bid) }</avgbid>
      </bidder>
  }
</result>

Expected Result:

<result>
  <bidder>
    <userid>U01</userid>
    <name>Tom Jones</name>
    <bidcount>2</bidcount>
    <avgbid>220</avgbid>
  </bidder>
  <bidder>
    <userid>U02</userid>
    <name>Mary Doe</name>
    <bidcount>5</bidcount>
    <avgbid>387</avgbid>
  </bidder>
  <bidder>
    <userid>U03</userid>
    <name>Dee Linquent</name>
    <bidcount>2</bidcount>
    <avgbid>487.5</avgbid>
  </bidder>
  <bidder>
    <userid>U04</userid>
    <name>Roger Smith</name>
    <bidcount>5</bidcount>
    <avgbid>266</avgbid>
  </bidder>
  <bidder>
    <userid>U05</userid>
    <name>Jack Sprat</name>
    <bidcount>2</bidcount>
    <avgbid>110</avgbid>
  </bidder>
</result>

Q14

List item numbers and average bids for items that have received three or more bids, in descending order by average bid.

Solution in XQuery:

<result>
  {
    for $i in distinct-values(doc("bids.xml")//itemno)
    let $b := doc("bids.xml")//bid_tuple[itemno = $i]
    let $avgbid := avg($b/bid)
    where count($b) >= 3
    order by $avgbid descending
    return
      <popular_item>
        <itemno>{ $i }</itemno>
        <avgbid>{ $avgbid }</avgbid>
      </popular_item>
  }
</result>

Expected Result:

<result>
  <popular_item>
    <itemno>1002</itemno>
    <avgbid>800</avgbid>
  </popular_item>
  <popular_item>
    <itemno>1007</itemno>
    <avgbid>200</avgbid>
  </popular_item>
  <popular_item>
    <itemno>1001</itemno>
    <avgbid>45</avgbid>
  </popular_item>
</result>

Q15

List names of users who have placed multiple bids of at least $100 each.

Solution in XQuery:

<result>
  {
    for $u in doc("users.xml")//user_tuple
    let $b := doc("bids.xml")//bid_tuple[userid=$u/userid and bid>=100]
    where count($b) > 1
    return
      <big_spender>{ $u/name/text() }</big_spender>
  }
</result>

Expected Result:

<result>
  <big_spender>Mary Doe</big_spender>
  <big_spender>Dee Linquent</big_spender>
  <big_spender>Roger Smith</big_spender>
</result>

Q16

List all registered users in order by userid; for each user, include the userid, name, and an indication of whether the user is active (has at least one bid on record) or inactive (has no bid on record).

Solution in XQuery:

<result>
  {
    for $u in doc("users.xml")//user_tuple
    let $b := doc("bids.xml")//bid_tuple[userid = $u/userid]
    order by $u/userid
    return
      <user>
        { $u/userid }
        { $u/name }
        {
          if (empty($b))
          then <status>inactive</status>
          else <status>active</status>
        }
      </user>
  }
</result>

Expected Result:

<result>
  <user>
    <userid>U01</userid>
    <name>Tom Jones</name>
    <status>active</status>
  </user>
  <user>
    <userid>U02</userid>
    <name>Mary Doe</name>
    <status>active</status>
  </user>
  <user>
    <userid>U03</userid>
    <name>Dee Linquent</name>
    <status>active</status>
  </user>
  <user>
    <userid>U04</userid>
    <name>Roger Smith</name>
    <status>active</status>
  </user>
  <user>
    <userid>U05</userid>
    <name>Jack Sprat</name>
    <status>active</status>
  </user>
  <user>
    <userid>U06</userid>
    <name>Rip Van Winkle</name>
    <status>inactive</status>
  </user>
</result>

Q17

List the names of users, if any, who have bid on every item.

Solution in XQuery:

<frequent_bidder>
  {
    for $u in doc("users.xml")//user_tuple
    where
      every $item in doc("items.xml")//item_tuple satisfies
        some $b in doc("bids.xml")//bid_tuple satisfies
          ($item/itemno = $b/itemno and $u/userid = $b/userid)
    return
      $u/name
  }
</frequent_bidder>

Expected Result:

<frequent_bidder />

(No users satisfy Q17.)
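The nested quantifiers in Q17 can also be expressed by counting. The following sketch is an alternative formulation, not part of the original solution; it assumes that item numbers are unique in items.xml and that every bid in bids.xml refers to an item listed there. The variable names $itemcount and $bid_items are introduced for illustration.

```xquery
<frequent_bidder>
  {
    (: Number of items on offer; assumes itemno is unique per item. :)
    let $itemcount := count(doc("items.xml")//item_tuple)
    for $u in doc("users.xml")//user_tuple
    (: Distinct items on which this user has placed at least one bid. :)
    let $bid_items := distinct-values(
                        doc("bids.xml")//bid_tuple[userid = $u/userid]/itemno)
    where count($bid_items) = $itemcount
    return
      $u/name
  }
</frequent_bidder>
```

Under those assumptions, a user has bid on every item exactly when the number of distinct items the user bid on equals the number of items. With the data set shown above there are eight items and no user has bid on more than four, so the result is again an empty frequent_bidder element.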

Q18

List all users in alphabetic order by name. For each user, include descriptions of all the items (if any) that were bid on by that user, in alphabetic order.

Solution in XQuery:

<result>
  {
    for $u in doc("users.xml")//user_tuple
    order by $u/name
    return
      <user>
        { $u/name }
        {
          for $b in distinct-values(doc("bids.xml")//bid_tuple
                                      [userid = $u/userid]/itemno)
          for $i in doc("items.xml")//item_tuple[itemno = $b]
          let $descr := $i/description/text()
          order by $descr
          return
            <bid_on_item>{ $descr }</bid_on_item>
        }
      </user>
  }
</result>

Expected Result:

<result>
  <user>
    <name>Dee Linquent</name>
    <bid_on_item>Motorcycle</bid_on_item>
    <bid_on_item>Racing Bicycle</bid_on_item>
  </user>
  <user>
    <name>Jack Sprat</name>
    <bid_on_item>Old Bicycle</bid_on_item>
    <bid_on_item>Racing Bicycle</bid_on_item>
  </user>
  <user>
    <name>Mary Doe</name>
    <bid_on_item>Motorcycle</bid_on_item>
    <bid_on_item>Red Bicycle</bid_on_item>
  </user>
  <user>
    <name>Rip Van Winkle</name>
  </user>
  <user>
    <name>Roger Smith</name>
    <bid_on_item>Motorcycle</bid_on_item>
    <bid_on_item>Old Bicycle</bid_on_item>
    <bid_on_item>Racing Bicycle</bid_on_item>
    <bid_on_item>Red Bicycle</bid_on_item>
  </user>
  <user>
    <name>Tom Jones</name>
    <bid_on_item>Motorcycle</bid_on_item>
    <bid_on_item>Tricycle</bid_on_item>
  </user>
</result>

Use Case "SGML": Standard Generalized Markup Language

Description

The example document and queries in this Use Case were first created for a 1992 conference on Standard Generalized Markup Language (SGML). For our use, the Document Type Definition (DTD) and example document have been translated from SGML to XML.

Document Type Definition (DTD)

This use case is based on data conforming to the DTD shown below.

<!NOTATION cgm PUBLIC "Computer Graphics Metafile">
<!NOTATION ccitt PUBLIC "CCITT group 4 raster">

<!ENTITY % text "(#PCDATA | emph)*">
<!ENTITY infoflow SYSTEM "infoflow.ccitt" NDATA ccitt>
<!ENTITY tagexamp SYSTEM "tagexamp.cgm" NDATA cgm>

<!ELEMENT report (title, chapter+)>
<!ELEMENT title %text;>
<!ELEMENT chapter (title, intro?, section*)>
<!ATTLIST chapter
          shorttitle CDATA #IMPLIED>
<!ELEMENT intro (para | graphic)+>
<!ELEMENT section (title, intro?, topic*)>
<!ATTLIST section
          shorttitle CDATA #IMPLIED
          sectid ID #IMPLIED>
<!ELEMENT topic (title, (para | graphic)+)>
<!ATTLIST topic
          shorttitle CDATA #IMPLIED
          topicid ID #IMPLIED>
<!ELEMENT para (#PCDATA | emph | xref)*>
<!ATTLIST para
          security (u | c | s | ts) "u">
<!ELEMENT emph %text;>
<!ELEMENT graphic EMPTY>
<!ATTLIST graphic
          graphname ENTITY #REQUIRED>
<!ELEMENT xref EMPTY>
<!ATTLIST xref
          xrefid IDREF #IMPLIED>

Sample Data

The queries in this use case are based on the following sample data, which is found in the file "sgml.xml". Line numbers have been added to the data to allow the results of queries to be conveniently specified.

0: <!DOCTYPE report SYSTEM "report.dtd">
1: <report>
2: <title>Getting started with SGML</title>
3: <chapter>
4: <title>The business challenge</title>
5: <intro>
6: <para>With the ever-changing and growing global market, companies and
7: large organizations are searching for ways to become more viable and
8: competitive. Downsizing and other cost-cutting measures demand more
9: efficient use of corporate resources. One very important resource is
10: an organization's information.</para>
11: <para>As part of the move toward integrated information management,
12: whole industries are developing and implementing standards for
13: exchanging technical information. This report describes how one such
14: standard, the Standard Generalized Markup Language (SGML), works as
15: part of an overall information management strategy.</para>
16: <graphic graphname="infoflow"/></intro></chapter>
17: <chapter>
18: <title>Getting to know SGML</title>
19: <intro>
20: <para>While SGML is a fairly recent technology, the use of
21: <emph>markup</emph> in computer-generated documents has existed for a
22: while.</para></intro>
23: <section shorttitle="What is markup?">
24: <title>What is markup, or everything you always wanted to know about
25: document preparation but were afraid to ask?</title>
26: <intro>
27: <para>Markup is everything in a document that is not content. The
28: traditional meaning of markup is the manual <emph>marking</emph> up
29: of typewritten text to give instructions for a typesetter or
30: compositor about how to fit the text on a page and what typefaces to
31: use. This kind of markup is known as <emph>procedural markup</emph>.</para></intro>
32: <topic topicid="top1">
33: <title>Procedural markup</title>
34: <para>Most electronic publishing systems today use some form of
35: procedural markup. Procedural markup codes are good for one
36: presentation of the information.</para></topic>
37: <topic topicid="top2">
38: <title>Generic markup</title>
39: <para>Generic markup (also known as descriptive markup) describes the
40: <emph>purpose</emph> of the text in a document. A basic concept of
41: generic markup is that the content of a document must be separate from
42: the style. Generic markup allows for multiple presentations of the
43: information.</para></topic>
44: <topic topicid="top3">
45: <title>Drawbacks of procedural markup</title>
46: <para>Industries involved in technical documentation increasingly
47: prefer generic over procedural markup schemes. When a company changes
48: software or hardware systems, enormous data translation tasks arise,
49: often resulting in errors.</para></topic></section>
50: <section shorttitle="What is SGML?">
51: <title>What <emph>is</emph> SGML in the grand scheme of the universe, anyway?</title>
52: <intro>
53: <para>SGML defines a strict markup scheme with a syntax for defining
54: document data elements and an overall framework for marking up
55: documents.</para>
56: <para>SGML can describe and create documents that are not dependent on
57: any hardware, software, formatter, or operating system. Since SGML documents
58: conform to an international standard, they are portable.</para></intro></section>
59: <section shorttitle="How does SGML work?">
60: <title>How is SGML and would you recommend it to your grandmother?</title>
61: <intro>
62: <para>You can break a typical document into three layers: structure,
63: content, and style. SGML works by separating these three aspects and
64: deals mainly with the relationship between structure and content.</para></intro>
65: <topic topicid="top4">
66: <title>Structure</title>
67: <para>At the heart of an SGML application is a file called the DTD, or
68: Document Type Definition. The DTD sets up the structure of a document,
69: much like a database schema describes the types of information it
70: handles.</para>
71: <para>A database schema also defines the relationships between the
72: various types of data. Similarly, a DTD specifies <emph>rules</emph>
73: to help ensure documents have a consistent, logical structure.</para></topic>
74: <topic topicid="top5">
75: <title>Content</title>
76: <para>Content is the information itself. The method for identifying
77: the information and its meaning within this framework is called
78: <emph>tagging</emph>. Tagging must
79: conform to the rules established in the DTD (see <xref xrefid="top4"/>).</para>
80: <graphic graphname="tagexamp"/></topic>
81: <topic topicid="top6">
82: <title>Style</title>
83: <para>SGML does not standardize style or other processing methods for
84: information stored in SGML.</para></topic></section></chapter>
85: <chapter>
86: <title>Resources</title>
87: <section>
88: <title>Conferences, tutorials, and training</title>
89: <intro>
90: <para>The Graphic Communications Association has been
91: instrumental in the development of SGML. GCA provides conferences,
92: tutorials, newsletters, and publication sales for both members and
93: non-members.</para>
94: <para security="c">Exiled members of the former Soviet Union's secret
95: police, the KGB, have infiltrated the upper ranks of the GCA and are
96: planning the Final Revolution as soon as DSSSL is completed.</para>
97: </intro>
98: </section>
99: </chapter>
100:</report>

Queries and Results

Q1

Locate all paragraphs in the report (all "para" elements occurring anywhere within the "report" element).

Solution in XQuery:

<result> { doc("sgml.xml")//report//para } </result>

Expected Result:

Elements whose start-tags are on lines 6, 11, 20, 27, 34, 39, 46, 53, 56, 62, 67, 71, 76, 83, 90, 94

Q2

Locate all paragraph elements in an introduction (all "para" elements directly contained within an "intro" element).

Solution in XQuery:

<result> { doc("sgml.xml")//intro/para } </result>

Expected Result:

Elements whose start-tags are on lines 6, 11, 20, 27, 53, 56, 62, 90, 94

Q3

Locate all paragraphs in the introduction of a section that is in a chapter that has no introduction (all "para" elements directly contained within an "intro" element directly contained in a "section" element directly contained in a "chapter" element, where the "chapter" element does not directly contain an "intro" element).

Solution in XQuery:

<result>
  {
    for $c in doc("sgml.xml")//chapter
    where empty($c/intro)
    return $c/section/intro/para
  }
</result>

Expected Result:

Elements whose start-tags are on lines 90, 94

Q4

Locate the second paragraph in the third section in the second chapter (the second "para" element occurring in the third "section" element occurring in the second "chapter" element occurring in the "report").

Solution in XQuery:

<result> { (((doc("sgml.xml")//chapter)[2]//section)[3]//para)[2] } </result>

Expected Result:

Element whose start-tag is on line 67

Q5

Locate all classified paragraphs (all "para" elements whose "security" attribute has the value "c").

Solution in XQuery:

<result> { doc("sgml.xml")//para[@security = "c"] } </result>

Expected Result:

Element whose start-tag is on line 94

Q6

List the short titles of all sections (the values of the "shorttitle" attributes of all "section" elements, expressing each short title as the value of a new element).

Solution in XQuery:

<result>
  {
    for $s in doc("sgml.xml")//section/@shorttitle
    return <stitle>{ $s }</stitle>
  }
</result>

Expected Result:

Attribute values in start-tags on lines 23, 50, 59

Q7

Locate the initial letter of the initial paragraph of all introductions (the first character in the content [character content as well as element content] of the first "para" element contained in an "intro" element).

Solution in XQuery:

<result>
  {
    for $i in doc("sgml.xml")//intro/para[1]
    return
      <first_letter>{ substring(string($i), 1, 1) }</first_letter>
  }
</result>

Expected Result:

Character after start-tag on lines 6, 20, 27, 53, 62, 90

Q8a

Locate all sections with a title that has "is SGML" in it. The string may occur anywhere in the descendants of the title element, and markup boundaries are ignored.

Solution in XQuery:

<result> { doc("sgml.xml")//section[.//title[contains(., "is SGML")]] } </result>

Expected Result:

Elements whose start-tags are on lines 50, 59

Q8b

Same as (Q8a), but the string "is SGML" cannot be interrupted by sub-elements, and must appear in a single text node.

Solution in XQuery:

<result> { doc("sgml.xml")//section[.//title/text()[contains(., "is SGML")]] } </result>

Expected Result:

Element whose start-tag is on line 59
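The difference between Q8a and Q8b can be seen on a single title. The sketch below uses an inline element modeled on the title at line 51 of the sample data (shortened here for illustration) and applies both tests to it.

```xquery
(: Shortened, inline copy of a title in which "is" is wrapped in emph. :)
let $title := <title>What <emph>is</emph> SGML?</title>
return (
  (: Q8a style: test the string value of the whole title. :)
  contains($title, "is SGML"),
  (: Q8b style: test each text node child separately. :)
  exists($title/text()[contains(., "is SGML")])
)
```

The string value of the title is "What is SGML?", so the first test is true. The text node children are "What " and " SGML?", neither of which contains the search string, so the second test is false.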

Q9

Locate all the topics referenced by a cross-reference anywhere in the report (all the "topic" elements whose "topicid" attribute value is the same as an "xrefid" attribute value of any "xref" element).

Solution in XQuery:

<result>
  {
    for $id in doc("sgml.xml")//xref/@xrefid
    return doc("sgml.xml")//topic[@topicid = $id]
  }
</result>

Expected Result:

Element whose start-tag is on line 65

Q10

Locate the closest title preceding the cross-reference ("xref") element whose "xrefid" attribute is "top4" (the "title" element that would be touched last before this "xref" element when touching each element in document order).

Solution in XQuery:

<result>
  {
    let $x := doc("sgml.xml")//xref[@xrefid = "top4"],
        $t := doc("sgml.xml")//title[. << $x]
    return $t[last()]
  }
</result>

Expected Result:

Given xref on line 79, element whose start-tag is on line 75
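For comparison, Q10 can also be sketched with the preceding axis. This variant is an illustration, not part of the original solution, and it assumes an implementation that supports the optional Full Axis Feature, since the preceding axis is not required of every XQuery processor.

```xquery
<result>
  {
    (: preceding is a reverse axis, so [1] selects the nearest title. :)
    doc("sgml.xml")//xref[@xrefid = "top4"]/preceding::title[1]
  }
</result>
```

The preceding axis excludes ancestors of the context node, which makes no difference here because a title element cannot contain an xref under the DTD above. For the sample data this again selects the title whose start-tag is on line 75.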

Use Case "STRING": String Search

Description

This use case is based on company profiles and a set of news documents which contain data for PR, mergers and acquisitions, etc. Given a company, the use case illustrates several different queries for searching text in news documents and different ways of providing query results by matching the information from the company profile and the content of the news items.

In this use case, the contains function is used to test whether a string occurs within a node or a string. Obviously, using full-text functions would provide more powerful searching, but the current Functions and Operators draft does not have full-text functions.
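One practical limit of contains is that it matches case-sensitively. The sketch below, which is an illustration and not one of the use case queries, compares it with two functions from the same function library, lower-case and matches with the "i" flag, on a title taken from the sample data.

```xquery
let $text := "Foobar Corporation releases its new line of Foo products today"
return (
  (: Exact, case-sensitive substring test. :)
  contains($text, "foobar"),
  (: Normalize case before the substring test. :)
  contains(lower-case($text), "foobar"),
  (: Regular-expression match with the case-insensitive flag. :)
  matches($text, "foobar", "i")
)
```

The three tests return false, true, and true respectively. Note that the second argument of matches is a regular expression, so characters such as "." in a search string would need to be escaped.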

Document Type Definition (DTD)

This use case uses data that corresponds to the following DTDs:

<!ELEMENT company (name, ticker_symbol?, description?, business_code, partners?, competitors?)> <!ELEMENT name (#PCDATA)> <!ELEMENT ticker_symbol (#PCDATA)> <!ELEMENT description (#PCDATA)> <!ELEMENT business_code (#PCDATA)> <!ELEMENT partners (partner+)> <!ELEMENT partner (#PCDATA)> <!ELEMENT competitors (competitor+)> <!ELEMENT competitor (#PCDATA)>

<!ELEMENT news (news_item*)> <!ELEMENT news_item (title, content, date, author?, news_agent)> <!ELEMENT title (#PCDATA)> <!ELEMENT content (par | figure)+ > <!ELEMENT date (#PCDATA)> <!ELEMENT author (#PCDATA)> <!ELEMENT news_agent (#PCDATA)> <!ELEMENT par (#PCDATA | quote | footnote)*> <!ELEMENT quote (#PCDATA)> <!ELEMENT footnote (#PCDATA)> <!ELEMENT figure (title, image)> <!ELEMENT image EMPTY> <!ATTLIST image source CDATA #REQUIRED >

Sample Data

The queries in this use case are based on the following input data, which is found in the file "string.xml".

<?xml version="1.0" encoding="ISO-8859-1"?> <news> <news_item> <title> Gorilla Corporation acquires YouNameItWeIntegrateIt.com </title> <content> <par> Today, Gorilla Corporation announced that it will purchase YouNameItWeIntegrateIt.com. The shares of YouNameItWeIntegrateIt.com dropped $3.00 as a result of this announcement. </par> <par> As a result of this acquisition, the CEO of YouNameItWeIntegrateIt.com Bill Smarts resigned. He did not announce what he will do next. Sources close to YouNameItWeIntegrateIt.com hint that Bill Smarts might be taking a position in Foobar Corporation. </par> <par>YouNameItWeIntegrateIt.com is a leading systems integrator that enables <quote>brick and mortar</quote> companies to have a presence on the web. </par> </content> <date>1-20-2000</date> <author>Mark Davis</author> <news_agent>News Online</news_agent> </news_item> <news_item> <title>Foobar Corporation releases its new line of Foo products today</title> <content> <par> Foobar Corporation releases the 20.9 version of its Foo products. The new version of Foo products solve known performance problems which existed in 20.8 line and increases the speed of Foo based products tenfold. It also allows wireless clients to be connected to the Foobar servers. </par> <par> The President of Foobar Corporation announced that they were proud to release 20.9 version of Foo products and they will upgrade existing customers <footnote>where service agreements exist</footnote> promptly. TheAppCompany Inc. immediately announced that it will release the new version of its products to utilize the 20.9 architecture within the next three months. </par> <figure> <title>Presidents of Foobar Corporation and TheAppCompany Inc. 
Shake Hands</title> <image source="handshake.jpg"/> </figure> </content> <date>1-20-2000</date> <news_agent>Foobar Corporation</news_agent> </news_item> <news_item> <title>Foobar Corporation is suing Gorilla Corporation for patent infringement </title> <content> <par> In surprising developments today, Foobar Corporation announced that it is suing Gorilla Corporation for patent infringement. The patents that were mentioned as part of the lawsuit are considered to be the basis of Foobar Corporation's <quote>Wireless Foo</quote> line of products. </par> <par>The tension between Foobar and Gorilla Corporations has been increasing ever since the Gorilla Corporation acquired more than 40 engineers who have left Foobar Corporation, TheAppCompany Inc. and YouNameItWeIntegrateIt.com over the past 3 months. The engineers who have left the Foobar corporation and its partners were rumored to be working on the next generation of server products and applications which will directly compete with Foobar's Foo 20.9 servers. Most of the engineers have relocated to Hawaii where the Gorilla Corporation's server development is located. </par> </content> <date>1-20-2000</date> <news_agent>Reliable News Corporation</news_agent> </news_item> </news>

In addition, the following data, listing the partners and competitors of companies, is found in the file "company-data.xml".

<?xml version="1.0" encoding="ISO-8859-1"?> <!DOCTYPE company SYSTEM "company.dtd"> <company> <name>Foobar Corporation</name> <ticker_symbol>FOO</ticker_symbol> <description>Foobar Corporation is a maker of Foo(TM) and Foobar(TM) products and a leading software company with a 300 Billion dollar revenue in 1999. It is located in Alaska. </description> <business_code>Software</business_code> <partners> <partner>YouNameItWeIntegrateIt.com</partner> <partner>TheAppCompany Inc.</partner> </partners> <competitors> <competitor>Gorilla Corporation</competitor> </competitors> </company>

Queries and Results

Q1

Find the titles of all news items where the string "Foobar Corporation" appears in the title.

Solution in XQuery:

doc("string.xml")//news_item/title[contains(., "Foobar Corporation")]

Expected Results:

<title>Foobar Corporation releases its new line of Foo products today</title> <title>Foobar Corporation is suing Gorilla Corporation for patent infringement </title>

Q2

Find news items where the Foobar Corporation and one or more of its partners are mentioned in the same paragraph and/or title. List each news item by its title and date.

Solution in XQuery:

declare function local:partners($company as xs:string) as element()* { let $c := doc("company-data.xml")//company[name = $company] return $c//partner }; let $foobar_partners := local:partners("Foobar Corporation") for $item in doc("string.xml")//news_item where some $t in $item//title satisfies (contains($t/text(), "Foobar Corporation") and (some $partner in $foobar_partners satisfies contains($t/text(), $partner/text()))) or (some $par in $item//par satisfies (contains(string($par), "Foobar Corporation") and (some $partner in $foobar_partners satisfies contains(string($par), $partner/text())))) return <news_item> { $item/title } { $item/date } </news_item>

Expected Result:

<news_item> <title> Gorilla Corporation acquires YouNameItWeIntegrateIt.com </title> <date>1-20-2000</date> </news_item> <news_item> <title>Foobar Corporation releases its new line of Foo products today</title> <date>1-20-2000</date> </news_item> <news_item> <title>Foobar Corporation is suing Gorilla Corporation for patent infringement </title> <date>1-20-2000</date> </news_item>

Q3

Query Q3 has been withdrawn from the use cases document.

Q4

Find news items where a company and one of its partners are mentioned in the same news item and the news item is not authored by the company itself.

Solution in XQuery:

declare function local:partners($company as xs:string) as element()* { let $c := doc("company-data.xml")//company[name = $company] return $c//partner }; for $item in doc("string.xml")//news_item, $c in doc("company-data.xml")//company let $partners := local:partners($c/name) where contains(string($item), $c/name) and (some $p in $partners satisfies contains(string($item), $p) and $item/news_agent != $c/name) return $item

Expected Results: The expected results are the news item elements with the following titles:

Gorilla Corporation acquires YouNameItWeIntegrateIt.com

Foobar Corporation is suing Gorilla Corporation for patent infringement
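
The three conditions of this query (company mentioned, some partner mentioned, news agent different from the company) can be cross-checked with a short Python sketch. The news items below are abbreviated versions of the sample data, shortened for illustration, and the function name q4 is an assumption.

```python
import xml.etree.ElementTree as ET

COMPANY = """<company><name>Foobar Corporation</name>
<partners><partner>YouNameItWeIntegrateIt.com</partner>
<partner>TheAppCompany Inc.</partner></partners></company>"""

# Abbreviated news items; the paragraphs are shortened from "string.xml".
NEWS = """<news>
<news_item><title>Gorilla Corporation acquires YouNameItWeIntegrateIt.com</title>
<content><par>Bill Smarts might be taking a position in Foobar Corporation.</par></content>
<news_agent>News Online</news_agent></news_item>
<news_item><title>Foobar Corporation releases its new line of Foo products today</title>
<content><par>TheAppCompany Inc. immediately announced a new version.</par></content>
<news_agent>Foobar Corporation</news_agent></news_item>
<news_item><title>Foobar Corporation is suing Gorilla Corporation for patent infringement</title>
<content><par>Engineers have left Foobar Corporation and TheAppCompany Inc.</par></content>
<news_agent>Reliable News Corporation</news_agent></news_item>
</news>"""

def q4(news_root, company):
    """Items mentioning the company and a partner, not authored by the company."""
    name = company.findtext("name")
    partners = [p.text for p in company.iter("partner")]
    hits = []
    for item in news_root.iter("news_item"):
        text = "".join(item.itertext())  # string value of the whole item
        if (name in text
                and any(p in text for p in partners)
                and item.findtext("news_agent") != name):
            hits.append(item)
    return hits

titles = [h.findtext("title")
          for h in q4(ET.fromstring(NEWS), ET.fromstring(COMPANY))]
```

The second item is excluded only because its news agent is the company itself.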

Q5

For each news item that is relevant to the Gorilla Corporation, create an "item summary" element. The content of the item summary is the content of the title, date, and first paragraph of the news item, separated by periods. A news item is relevant if the name of the company is mentioned anywhere within the content of the news item.

Solution in XQuery:

for $item in doc("string.xml")//news_item where contains(string($item/content), "Gorilla Corporation") return <item_summary> { concat($item/title,". ") } { concat($item/date,". ") } { string(($item//par)[1]) } </item_summary>

Expected Result: (with whitespace reformatted for readability)

<item_summary>Gorilla Corporation acquires YouNameItWeIntegrateIt.com. 1-20-2000. Today, Gorilla Corporation announced that it will purchase YouNameItWeIntegrateIt.com. The shares of YouNameItWeIntegrateIt.com dropped $3.00 as a result of this announcement.</item_summary> <item_summary>Foobar Corporation is suing Gorilla Corporation for patent infringement. 1-20-2000. In surprising developments today, Foobar Corporation announced that it is suing Gorilla Corporation for patent infringement. The patents that were mentioned as part of the lawsuit are considered to be the basis of Foobar Corporation's Wireless Foo line of products.</item_summary>

Use Case "NS" - Queries Using Namespaces

This use case performs a variety of queries on namespace-qualified names.

Description

This use case is based on a scenario in which a neutral mediator is acting with public auction servers on behalf of clients. The reason for a client to use this imaginary service may be anonymity, better insurance, or the possibility to cover more than one market at a time. The following aspects of namespaces are illustrated by this use case:

Syntactic disambiguation when combining XML data from different sources

Re-use of predefined modules, such as XLinks or XML Schema

Support for global classification schemas, such as the Dublin Core

The sample data consists of two records. The schema used for this data uses W3C XML Schema's schema composition to create a schema from predefined, namespace-separated modules, and uses XLink to express references. Each record describes a running auction. It embeds data specific to an auctioneer (e.g. the company's credit rating system) and a taxonomy specific to a particular good (jazz records) in a framework that contains data common to all auctions (e.g. start and end time), using namespaces to distinguish the three vocabularies.

Note that namespace prefixes must be resolved to their namespace URIs before matching namespace-qualified names. It is not sufficient to use the literal prefixes to denote namespaces. Furthermore, there are several possible ways to represent namespace declarations. Therefore, processing must be done on the namespace-processed XML Information Set, not on the XML text representation.
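
A small Python sketch can make the point about prefixes concrete. The fragment below is trimmed from the second auction record of the sample data, where the seller's ID uses the prefix "seller" and the high bidder's ID uses the prefix "eachbay", both bound to the same namespace URI. ElementTree is used here only as one example of a namespace-aware parser; it reports each name as {namespace URI}local-name.

```python
import xml.etree.ElementTree as ET

# Trimmed from the second auction record: two prefixes, one namespace URI.
FRAGMENT = """
<ma:Trading_Partners xmlns:ma="http://www.example.com/AuctionWatch"
                     xmlns:eachbay="http://www.example.com/auctioneers#eachbay">
  <ma:High_Bidder><eachbay:ID>VintageRecordFreak</eachbay:ID></ma:High_Bidder>
  <ma:Seller xmlns:seller="http://www.example.com/auctioneers#eachbay">
    <seller:ID>StarsOn45</seller:ID>
  </ma:Seller>
</ma:Trading_Partners>
"""

root = ET.fromstring(FRAGMENT)

# Collect every element whose local name is "ID", whatever its namespace.
id_elements = [e for e in root.iter() if e.tag.endswith("}ID")]

# After namespace processing the prefixes are gone; only the URIs remain.
expanded_names = [e.tag for e in id_elements]
same_name = expanded_names[0] == expanded_names[1]
```

Comparing the literal prefixes "eachbay" and "seller" would wrongly treat these two ID elements as different names.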

Document Type Definition (DTD)

DTDs are not fully compatible with namespaces, because they cannot express that two names with different prefixes bound to the same namespace URI are equal. In a later version of this paper, an XML Schema should be added here.

The data for this use case is in the file "auction.xml".

Sample Data

<?xml version="1.0" encoding="ISO-8859-1"?> <ma:AuctionWatchList xmlns:ma="http://www.example.com/AuctionWatch" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:anyzone="http://www.example.com/auctioneers#anyzone" xmlns:eachbay="http://www.example.com/auctioneers#eachbay" xmlns:yabadoo="http://www.example.com/auctioneers#yabadoo" > <ma:Auction anyzone:ID="0321K372910"> <ma:AuctionHomepage xlink:type="simple" xlink:href="http://www.example.com/item/0321K372910" /> <ma:Schedule> <ma:Open xmlns:dt="http://www.w3.org/2001/XMLSchema" dt:type="timeInstant">2000-03-21:07:41:34-05:00</ma:Open> <ma:Close xmlns:dt="http://www.w3.org/2001/XMLSchema" dt:type="timeInstant">2000-03-23:07:41:34-05:00</ma:Close> </ma:Schedule> <ma:Price> <ma:Start ma:currency="USD">3.00</ma:Start> <ma:Current ma:currency="USD">10.00</ma:Current> <ma:Number_of_Bids>5</ma:Number_of_Bids> </ma:Price> <ma:Trading_Partners> <ma:High_Bidder> <eachbay:ID>RecordsRUs</eachbay:ID> <eachbay:PositiveComments>231</eachbay:PositiveComments> <eachbay:NeutralComments>2</eachbay:NeutralComments> <eachbay:NegativeComments>5</eachbay:NegativeComments> <ma:MemberInfoPage xlink:type="simple" xlink:href="http://auction.eachbay.com/members?get=RecordsRUs" xlink:role="ma:MemberInfoPage" /> </ma:High_Bidder> <ma:Seller> <anyzone:ID>VintageRecordFreak</anyzone:ID> <anyzone:Member_Since>October 1999</anyzone:Member_Since> <anyzone:Rating>5</anyzone:Rating> <ma:MemberInfoPage xlink:type="simple" xlink:href="http://auction.anyzone.com/members/VintageRecordFreak" xlink:role="ma:MemberInfoPage" /> </ma:Seller> </ma:Trading_Partners> <ma:Details> <record xmlns="http://www.example.org/music/records"> <artist>Miles Davis</artist> <title>In a Silent Way</title> <recorded>1969</recorded> <label>Columbia Records</label> <remark>With Miles Davis (trumpet), Herbie Hancock (Electric Piano), Chick Corea (Electric Piano), Wayne Shorter (Tenor Sax), Josef Zawinul (Electric Piano &amp; Organ), John McLaughlin (Guitar), and Tony 
Williams (Drums). The liner notes were written by Frank Glenn, and the record is in fine condition.</remark> </record> </ma:Details> </ma:Auction>  <ma:Auction yabadoo:ID="13143816"> <ma:AuctionHomepage xlink:type="simple" xlink:href="http://auctions.yabadoo.com/auction/13143816" /> <ma:Schedule> <ma:Open xmlns:dt="http://www.w3.org/2001/XMLSchema" dt:type="timeInstant">2000-03-19:17:03:00-04:00</ma:Open> <ma:Close xmlns:dt="http://www.w3.org/2001/XMLSchema" dt:type="timeInstant">2000-03-29:17:03:00-04:00</ma:Close> </ma:Schedule> <ma:Price> <ma:Start ma:currency="USD">3.00</ma:Start> <ma:Current ma:currency="USD">3.00</ma:Current> <ma:Number_of_Bids>0</ma:Number_of_Bids> </ma:Price> <ma:Trading_Partners> <ma:High_Bidder> <eachbay:ID>VintageRecordFreak</eachbay:ID> <eachbay:PositiveComments>232</eachbay:PositiveComments> <eachbay:NeutralComments>0</eachbay:NeutralComments> <eachbay:NegativeComments>0</eachbay:NegativeComments> <ma:MemberInfoPage xlink:type="simple" xlink:href="http://auction.eachbay.com/showRating/user=VintageRecordFreak" xlink:role="ma:MemberInfoPage" /> </ma:High_Bidder> <ma:Seller xmlns:seller="http://www.example.com/auctioneers#eachbay"> <seller:ID>StarsOn45</seller:ID> <seller:PositiveComments>80</seller:PositiveComments> <seller:NeutralComments>1</seller:NeutralComments> <seller:NegativeComments>2</seller:NegativeComments> <ma:MemberInfoPage xlink:type="simple" xlink:href="http://auction.eachbay.com/showRating/user=StarsOn45" xlink:role="ma:MemberInfoPage" /> </ma:Seller> </ma:Trading_Partners> <ma:Details> <record xmlns="http://www.example.org/music/records"> <artist>Wynton Marsalis</artist> <title>Think of One ...</title> <recorded>1983</recorded> <label>Columbia Records</label> <remark xml:lang="en">Columbia Records 12" 33-1/3 rpm LP, #FC-38641, Stereo. The record is still clean and shiny and looks unplayed (looks like NM condition). 
The cover has very light surface and edge wear.</remark> <remark xml:lang="de">Columbia Records 12" 33-1/3 rpm LP, #FC-38641, Stereo. Die Platte ist noch immer sauber und glänzend und sieht ungespielt aus (NM Zustand). Das Cover hat leichte Abnutzungen an Oberfläche und Ecken.</remark> </record> </ma:Details> </ma:Auction> </ma:AuctionWatchList>

Queries and Results

Q1

List all unique namespaces used in the sample data.

Solution in XQuery:

<Q1> { for $n in distinct-values( for $i in (doc("auction.xml")//* | doc("auction.xml")//@*) return namespace-uri($i) ) return <ns>{$n}</ns> } </Q1>

Expected Result:

<Q1> <ns>http://www.example.com/AuctionWatch</ns> <ns>http://www.example.com/auctioneers#anyzone</ns> <ns>http://www.w3.org/1999/xlink</ns> <ns>http://www.w3.org/2001/XMLSchema</ns> <ns>http://www.example.com/auctioneers#eachbay</ns> <ns>http://www.example.org/music/records</ns> <ns>http://www.example.com/auctioneers#yabadoo</ns> <ns>http://www.w3.org/XML/1998/namespace</ns> </Q1>

Q2

Select the title of each record that is for sale.

Solution in XQuery:

declare namespace music = "http://www.example.org/music/records"; <Q2> { doc("auction.xml")//music:title } </Q2>

Expected Result:

<Q2> <title xmlns="http://www.example.org/music/records" xmlns:anyzone="http://www.example.com/auctioneers#anyzone" xmlns:eachbay="http://www.example.com/auctioneers#eachbay" xmlns:ma="http://www.example.com/AuctionWatch" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:yabadoo="http://www.example.com/auctioneers#yabadoo">In a Silent Way</title> <title xmlns="http://www.example.org/music/records" xmlns:anyzone="http://www.example.com/auctioneers#anyzone" xmlns:eachbay="http://www.example.com/auctioneers#eachbay" xmlns:ma="http://www.example.com/AuctionWatch" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:yabadoo="http://www.example.com/auctioneers#yabadoo">Think of One ...</title> </Q2>

Q3

Select all elements that have an attribute whose name is in the XML Schema namespace.

Solution in XQuery:

declare namespace dt = "http://www.w3.org/2001/XMLSchema"; <Q3> { doc("auction.xml")//*[@dt:*] } </Q3>

Expected Result:

<Q3> <ma:Open dt:type="timeInstant" xmlns:anyzone="http://www.example.com/auctioneers#anyzone" xmlns:dt="http://www.w3.org/2001/XMLSchema" xmlns:eachbay="http://www.example.com/auctioneers#eachbay" xmlns:ma="http://www.example.com/AuctionWatch" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:yabadoo="http://www.example.com/auctioneers#yabadoo">2000-03-21:07:41:34-05:00</ma:Open> <ma:Close dt:type="timeInstant" xmlns:anyzone="http://www.example.com/auctioneers#anyzone" xmlns:dt="http://www.w3.org/2001/XMLSchema" xmlns:eachbay="http://www.example.com/auctioneers#eachbay" xmlns:ma="http://www.example.com/AuctionWatch" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:yabadoo="http://www.example.com/auctioneers#yabadoo">2000-03-23:07:41:34-05:00</ma:Close> <ma:Open dt:type="timeInstant" xmlns:anyzone="http://www.example.com/auctioneers#anyzone" xmlns:dt="http://www.w3.org/2001/XMLSchema" xmlns:eachbay="http://www.example.com/auctioneers#eachbay" xmlns:ma="http://www.example.com/AuctionWatch" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:yabadoo="http://www.example.com/auctioneers#yabadoo">2000-03-19:17:03:00-04:00</ma:Open> <ma:Close dt:type="timeInstant" xmlns:anyzone="http://www.example.com/auctioneers#anyzone" xmlns:dt="http://www.w3.org/2001/XMLSchema" xmlns:eachbay="http://www.example.com/auctioneers#eachbay" xmlns:ma="http://www.example.com/AuctionWatch" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:yabadoo="http://www.example.com/auctioneers#yabadoo">2000-03-29:17:03:00-04:00</ma:Close> </Q3>

Q4

List the target URIs of all XLinks in the document.

Solution in XQuery:

declare namespace xlink = "http://www.w3.org/1999/xlink"; <Q4 xmlns:xlink="http://www.w3.org/1999/xlink"> { for $hr in doc("auction.xml")//@xlink:href return <ns>{ $hr }</ns> } </Q4>

Expected Result:

<Q4 xmlns:xlink="http://www.w3.org/1999/xlink"> <ns xlink:href="http://www.example.com/item/0321K372910"/> <ns xlink:href="http://auction.eachbay.com/members?get=RecordsRUs"/> <ns xlink:href="http://auction.anyzone.com/members/VintageRecordFreak"/> <ns xlink:href="http://auctions.yabadoo.com/auction/13143816"/> <ns xlink:href="http://auction.eachbay.com/showRating/user=VintageRecordFreak"/> <ns xlink:href="http://auction.eachbay.com/showRating/user=StarsOn45"/> </Q4>

Q5

Select all records that have a remark in German.

Solution in XQuery:

declare namespace music = "http://www.example.org/music/records"; <Q5 xmlns:music="http://www.example.org/music/records"> { doc("auction.xml")//music:record[music:remark/@xml:lang = "de"] } </Q5>

Expected Result:

<Q5 xmlns:music="http://www.example.org/music/records"> <record xmlns="http://www.example.org/music/records" xmlns:anyzone="http://www.example.com/auctioneers#anyzone" xmlns:eachbay="http://www.example.com/auctioneers#eachbay" xmlns:ma="http://www.example.com/AuctionWatch" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:yabadoo="http://www.example.com/auctioneers#yabadoo"> <artist>Wynton Marsalis</artist> <title>Think of One ...</title> <recorded>1983</recorded> <label>Columbia Records</label> <remark xml:lang="en">Columbia Records 12" 33-1/3 rpm LP, #FC-38641, Stereo. The record is still clean and shiny and looks unplayed (looks like NM condition). The cover has very light surface and edge wear.</remark> <remark xml:lang="de">Columbia Records 12" 33-1/3 rpm LP, #FC-38641, Stereo. Die Platte ist noch immer sauber und glänzend und sieht ungespielt aus (NM Zustand). Das Cover hat leichte Abnutzungen an Oberfläche und Ecken.</remark> </record> </Q5>

Q6

Select the closing time elements of all AnyZone auctions currently monitored.

Solution in XQuery:

declare namespace ma = "http://www.example.com/AuctionWatch"; declare namespace anyzone = "http://www.example.com/auctioneers#anyzone"; <Q6 xmlns:ma="http://www.example.com/AuctionWatch"> { doc("auction.xml")//ma:Auction[@anyzone:ID]/ma:Schedule/ma:Close } </Q6>

Expected Result:

<Q6 xmlns:ma="http://www.example.com/AuctionWatch"> <ma:Close xmlns:anyzone="http://www.example.com/auctioneers#anyzone" xmlns:dt="http://www.w3.org/2001/XMLSchema" xmlns:eachbay="http://www.example.com/auctioneers#eachbay" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:yabadoo="http://www.example.com/auctioneers#yabadoo" dt:type="timeInstant">2000-03-23:07:41:34-05:00</ma:Close> </Q6>

Q7

Select the homepage of all auctions where both seller and high bidder are registered at the same auctioneer.

Solution in XQuery:

declare namespace ma = "http://www.example.com/AuctionWatch"; <Q7 xmlns:xlink="http://www.w3.org/1999/xlink"> { for $a in doc("auction.xml")//ma:Auction let $seller_id := $a/ma:Trading_Partners/ma:Seller/*:ID, $buyer_id := $a/ma:Trading_Partners/ma:High_Bidder/*:ID where namespace-uri($seller_id) = namespace-uri($buyer_id) return $a/ma:AuctionHomepage } </Q7>

Expected Result:

<Q7 xmlns:xlink="http://www.w3.org/1999/xlink" > <ma:AuctionHomepage xmlns:ma="http://www.example.com/AuctionWatch" xlink:type="simple" xlink:href="http://auctions.yabadoo.com/auction/13143816" /> </Q7>

Q8

Select all traders (either seller or high bidder) without negative comments.

Solution in XQuery:

declare namespace ma = "http://www.example.com/AuctionWatch"; <Q8 xmlns:ma="http://www.example.com/AuctionWatch" xmlns:eachbay="http://www.example.com/auctioneers#eachbay" xmlns:xlink="http://www.w3.org/1999/xlink"> { for $s in doc("auction.xml")//ma:Trading_Partners/(ma:Seller | ma:High_Bidder) where $s/*:NegativeComments = 0 return $s } </Q8>

Expected Result:

<Q8 xmlns:eachbay="http://www.example.com/auctioneers#eachbay" xmlns:ma="http://www.example.com/AuctionWatch" xmlns:xlink="http://www.w3.org/1999/xlink"> <ma:High_Bidder xmlns:anyzone="http://www.example.com/auctioneers#anyzone" xmlns:yabadoo="http://www.example.com/auctioneers#yabadoo"> <eachbay:ID>VintageRecordFreak</eachbay:ID> <eachbay:PositiveComments>232</eachbay:PositiveComments> <eachbay:NeutralComments>0</eachbay:NeutralComments> <eachbay:NegativeComments>0</eachbay:NegativeComments> <ma:MemberInfoPage xlink:href="http://auction.eachbay.com/showRating/user=VintageRecordFreak" xlink:role="ma:MemberInfoPage" xlink:type="simple"/> </ma:High_Bidder> </Q8>

Use Case "PARTS" - Recursive Parts Explosion

This use case illustrates how a recursive query might be used to construct a hierarchic document of arbitrary depth from flat structures stored in a database.

Description

This use case is based on a "parts explosion" database that contains information about how parts are used in other parts.

The input to the use case is a "flat" document in which each different part is represented by a <part> element with partid and name attributes. Each part may or may not be part of a larger part; if so, the partid of the larger part is contained in a partof attribute. This input document might be derived from a relational database in which each part is represented by a row of a table with partid as primary key and partof as a foreign key referencing partid.

The challenge of this use case is to write a query that converts the "flat" representation of the parts explosion, based on foreign keys, into a hierarchic representation in which part containment is represented by the structure of the document.
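
To make the flat-to-hierarchic conversion concrete, the following Python sketch performs the same kind of recursion on a small subset of the sample data for this use case. It is an illustration only: the function names one_level and explode are assumptions, and the sketch rescans the whole part list at every level, which is simple but quadratic in the number of parts.

```python
import xml.etree.ElementTree as ET

# A subset of the sample data for this use case.
PARTLIST = """<partlist>
  <part partid="0" name="car"/>
  <part partid="1" partof="0" name="engine"/>
  <part partid="2" partof="0" name="door"/>
  <part partid="3" partof="1" name="piston"/>
  <part partid="20" name="canoe"/>
</partlist>"""

def one_level(part, all_parts):
    """Copy one part and, recursively, every part whose partof points at it."""
    node = ET.Element("part", partid=part.get("partid"), name=part.get("name"))
    for sub in all_parts:
        if sub.get("partof") == part.get("partid"):
            node.append(one_level(sub, all_parts))
    return node

def explode(partlist):
    """Build a parttree whose top-level parts are those without partof."""
    parts = list(partlist.iter("part"))
    tree = ET.Element("parttree")
    for p in parts:
        if p.get("partof") is None:
            tree.append(one_level(p, parts))
    return tree

tree = explode(ET.fromstring(PARTLIST))
top_level = [p.get("name") for p in tree]
car_children = [p.get("name") for p in tree[0]]
engine_children = [p.get("name") for p in tree[0][0]]
```

The foreign-key attribute partof disappears from the output; containment is carried by element nesting instead.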

Document Type Definitions (DTD)

The input data set uses the following DTD:

<!DOCTYPE partlist [ <!ELEMENT partlist (part*)> <!ELEMENT part EMPTY> <!ATTLIST part partid CDATA #REQUIRED partof CDATA #IMPLIED name CDATA #REQUIRED> ]>

Although the partid and partof attributes could have been of type ID and IDREF, respectively, in this schema they are treated as character data, possibly materialized in a straightforward way from a relational database. Each partof attribute matches exactly one partid. Parts having no partof attribute are not contained in any other part.

The output data conforms to the following DTD:

<!DOCTYPE parttree [ <!ELEMENT parttree (part*)> <!ELEMENT part (part*)> <!ATTLIST part partid CDATA #REQUIRED name CDATA #REQUIRED> ]>

Sample Data

<?xml version="1.0" encoding="ISO-8859-1"?> <partlist> <part partid="0" name="car"/> <part partid="1" partof="0" name="engine"/> <part partid="2" partof="0" name="door"/> <part partid="3" partof="1" name="piston"/> <part partid="4" partof="2" name="window"/> <part partid="5" partof="2" name="lock"/> <part partid="10" name="skateboard"/> <part partid="11" partof="10" name="board"/> <part partid="12" partof="10" name="wheel"/> <part partid="20" name="canoe"/> </partlist>

Queries and Results

Q1

Convert the sample document from "partlist" format to "parttree" format (see DTD section for definitions). In the result document, part containment is represented by containment of one <part> element inside another. Each part that is not part of any other part should appear as a separate top-level element in the output document.

Solution in XQuery:

declare function local:one_level($p as element()) as element() { <part partid="{ $p/@partid }" name="{ $p/@name }" > { for $s in doc("partlist.xml")//part where $s/@partof = $p/@partid return local:one_level($s) } </part> }; <parttree> { for $p in doc("partlist.xml")//part[empty(@partof)] return local:one_level($p) } </parttree>

Expected Result:

<parttree> <part partid="0" name="car"> <part partid="1" name="engine"> <part partid="3" name="piston"/> </part> <part partid="2" name="door"> <part partid="4" name="window"/> <part partid="5" name="lock"/> </part> </part> <part partid="10" name="skateboard"> <part partid="11" name="board"/> <part partid="12" name="wheel"/> </part> <part partid="20" name="canoe"/> </parttree>

Use Case "STRONG" - queries that exploit strongly typed data

Description

Strongly typed and weakly typed data are both important kinds of XML data. Most of the queries in this document focus on weakly typed data that is governed by a DTD and does not contain XML Schema simple datatypes or named complex types. This use case explores XQuery's support for types, using data that is governed by a strongly typed XML Schema.

Schema

The schema for this example is the International Purchase Order schema taken from the XML Schema Primer, which imports a schema for addresses. The main schema is found in a schema document named "ipo.xsd":

<?xml version="1.0"?> <schema targetNamespace="http://www.example.com/IPO" xmlns="http://www.w3.org/2001/XMLSchema" xmlns:ipo="http://www.example.com/IPO"> <annotation> <documentation xml:lang="en"> International Purchase order schema for Example.com Copyright 2000 Example.com. All rights reserved. </documentation> </annotation>  <include schemaLocation="address.xsd"/> <element name="purchaseOrder" type="ipo:PurchaseOrderType"/> <element name="comment" type="string"/> <element name="shipComment" type="string" substitutionGroup="ipo:comment"/> <element name="customerComment" type="string" substitutionGroup="ipo:comment"/> <complexType name="PurchaseOrderType"> <sequence> <element name="shipTo" type="ipo:Address"/> <element name="billTo" type="ipo:Address"/> <element ref="ipo:comment" minOccurs="0"/> <element name="items" type="ipo:Items"/> </sequence> <attribute name="orderDate" type="date"/> </complexType> <complexType name="Items"> <sequence> <element name="item" minOccurs="0" maxOccurs="unbounded"> <complexType> <sequence> <element name="productName" type="string"/> <element name="quantity"> <simpleType> <restriction base="positiveInteger"> <maxExclusive value="100"/> </restriction> </simpleType> </element> <element name="USPrice" type="decimal"/> <element ref="ipo:comment" minOccurs="0" maxOccurs="unbounded"/> <element name="shipDate" type="date" minOccurs="0" maxOccurs="unbounded"/> </sequence> <attribute name="partNum" type="ipo:SKU" use="required"/> </complexType> </element> </sequence> </complexType> <simpleType name="SKU"> <restriction base="string"> <pattern value="\d{3}-[A-Z]{2}"/> </restriction> </simpleType> </schema>

The address constructs are found in a schema document named "address.xsd":

<schema targetNamespace="http://www.example.com/IPO" xmlns="http://www.w3.org/2001/XMLSchema" xmlns:ipo="http://www.example.com/IPO"> <annotation> <documentation xml:lang="en"> Addresses for International Purchase order schema Copyright 2000 Example.com. All rights reserved. </documentation> </annotation> <complexType name="Address"> <sequence> <element name="name" type="string"/> <element name="street" type="string"/> <element name="city" type="string"/> </sequence> </complexType> <complexType name="USAddress"> <complexContent> <extension base="ipo:Address"> <sequence> <element name="state" type="ipo:USState"/> <element name="zip" type="positiveInteger"/> </sequence> </extension> </complexContent> </complexType> <complexType name="UKAddress"> <complexContent> <extension base="ipo:Address"> <sequence> <element name="postcode" type="ipo:UKPostcode"/> </sequence> <attribute name="exportCode" type="positiveInteger" fixed="1"/> </extension> </complexContent> </complexType> <simpleType name="USState"> <restriction base="string"> <enumeration value="AK"/> <enumeration value="AL"/> <enumeration value="AR"/> <enumeration value="PA"/> </restriction> </simpleType> <simpleType name="UKPostcode"> <restriction base="string"> <pattern value="[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}"/> </restriction> </simpleType> </schema>

Sample Data

The sample data used for the query is found in a file named "ipo.xml":

<?xml version="1.0"?> <ipo:purchaseOrder xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ipo="http://www.example.com/IPO" orderDate="1999-12-01"> <shipTo exportCode="1" xsi:type="ipo:UKAddress"> <name>Helen Zoe</name> <street>47 Eden Street</street> <city>Cambridge</city> <postcode>CB1 1JR</postcode> </shipTo> <billTo xsi:type="ipo:USAddress"> <name>Robert Smith</name> <street>8 Oak Avenue</street> <city>Old Town</city> <state>PA</state> <zip>95819</zip> </billTo> <items> <item partNum="833-AA"> <productName>Lapis necklace</productName> <quantity>1</quantity> <USPrice>99.95</USPrice> <ipo:shipComment> Use gold wrap if possible </ipo:shipComment> <ipo:customerComment> Want this for the holidays! </ipo:customerComment> <shipDate>1999-12-05</shipDate> </item> </items> </ipo:purchaseOrder>

Queries

Q1

Count the invoices shipped to the United Kingdom.

Solution in XQuery:

import schema namespace ipo = "http://www.example.com/IPO" at "ipo.xsd"; count( doc("ipo.xml")//shipTo[. instance of element(*, ipo:UKAddress)] )

Expected Result:

1

In this dataset, the data for an address does not contain the name of the country, and the name of the shipTo element is the same regardless of the country to which items are shipped. Only the types allow us to identify UK addresses: in the schema, there is one address type for UK addresses and another for US addresses, both derived from a common base type. In the above query, we use the UKAddress type to identify invoices shipped to the UK.
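
Python's standard library has no XML Schema validator, so the sketch below cannot reproduce a true "instance of" test. As a stand-in, and this is an assumption rather than what a schema-aware processor does, it reads the xsi:type attribute of each shipTo element and resolves its prefix to a namespace URI. It would not recognize types derived from UKAddress, nor elements typed by validation alone. The function name count_ship_to and the trimmed order document are invented for illustration.

```python
import io
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
IPO = "http://www.example.com/IPO"

# Trimmed from "ipo.xml": only the address elements are kept.
ORDER = """<ipo:purchaseOrder
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ipo="http://www.example.com/IPO" orderDate="1999-12-01">
  <shipTo exportCode="1" xsi:type="ipo:UKAddress"><city>Cambridge</city></shipTo>
  <billTo xsi:type="ipo:USAddress"><city>Old Town</city></billTo>
</ipo:purchaseOrder>"""

def count_ship_to(xml_text, type_local_name):
    """Count shipTo elements whose xsi:type names the given IPO type."""
    prefixes = {}  # prefix -> namespace URI; assumes prefixes are never rebound
    count = 0
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "end")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        elif payload.tag == "shipTo":
            # Split the QName in xsi:type into prefix and local name.
            prefix, _, local = payload.get(XSI_TYPE, "").rpartition(":")
            if prefixes.get(prefix) == IPO and local == type_local_name:
                count += 1
    return count

uk_count = count_ship_to(ORDER, "UKAddress")
us_count = count_ship_to(ORDER, "USAddress")
```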

Q2

Write a function that tests an American address to check if it has the right zip code.

In our solution, we will assume zip code data is stored in a file named "zips.xml", which looks like this:

<zips xmlns="http://www.example.com/zips"> <row> <city>Old Town</city> <state>PA</state> <zip>95819</zip> </row> <row> <city>Durham</city> <state>NC</state> <zip>27701</zip> </row> <row> <city>Durham</city> <state>NC</state> <zip>27703</zip> </row> </zips>

The corresponding schema document is named "zips.xsd":

<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" targetNamespace="http://www.example.com/zips" xmlns:zips="http://www.example.com/zips"> <xs:element name="zips"> <xs:complexType> <xs:sequence> <xs:element maxOccurs="unbounded" ref="zips:row"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="row"> <xs:complexType> <xs:sequence> <xs:element ref="zips:city"/> <xs:element ref="zips:state"/> <xs:element ref="zips:zip"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="city" type="xs:string"/> <xs:element name="state" type="xs:string"/> <xs:element name="zip" type="xs:integer"/> </xs:schema>

Solution in XQuery:

module namespace z="http://www.example.com/xq/zips";

import schema namespace ipo = "http://www.example.com/IPO" at "ipo.xsd";
import schema namespace zips = "http://www.example.com/zips" at "zips.xsd";

declare function z:zip-ok($a as element(*, ipo:USAddress))
  as xs:boolean
{
  some $i in doc("zips.xml")/zips:zips/element(zips:row)
  satisfies $i/zips:city = $a/city
        and $i/zips:state = $a/state
        and $i/zips:zip = $a/zip
};

This is not a complete query; it is a function that is meant to be called in a query. We will use this function in Q4.

An attempt to call this function with an element of the wrong address type raises an error. For instance, you cannot call z:zip-ok() with an element of type UKAddress.

Note that the parameter for this function specifies the type rather than the element name, since it is written to be used with any element that has the proper address type - for instance, in our sample schema, 'billTo' and 'shipTo' are two different elements which may both have the USAddress type.
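A minimal sketch of such a call, assuming the module above is available to the processor under its target namespace, applies z:zip-ok() to whichever of the shipTo and billTo elements has the USAddress type. The check element and its attributes are hypothetical names used only for this illustration.

```xquery
import schema namespace ipo = "http://www.example.com/IPO" at "ipo.xsd";
import module namespace z = "http://www.example.com/xq/zips";

(: Visit both address elements; the element name plays no part in the test. :)
for $a in doc("ipo.xml")/element(ipo:purchaseOrder)/(shipTo | billTo)
(: Skip addresses of any other type, for which z:zip-ok() would raise an error. :)
where $a instance of element(*, ipo:USAddress)
return
  (: "check" is an illustrative result element, not defined by any schema here. :)
  <check element="{ local-name($a) }"
         ok="{ z:zip-ok($a treat as element(*, ipo:USAddress)) }"/>
```

The "treat as" expression is included so that the call is also acceptable to a processor that performs static typing; it does not change the dynamic result.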

Q3

Write a function that tests a UK address to see if it has the right postal code.

For England, we store the information needed to test postal codes in a file named "postals.xml", which looks like this:

<postals xmlns="http://www.example.com/postals">
  <row>
    <city>Cambridge</city>
    <prefix>CB</prefix>
  </row>
  <row>
    <city>Oxford</city>
    <prefix>OX</prefix>
  </row>
</postals>

Here is the schema for the above file.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           targetNamespace="http://www.example.com/postals"
           xmlns:postals="http://www.example.com/postals">
  <xs:element name="postals">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" ref="postals:row"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="row">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="postals:city"/>
        <xs:element ref="postals:prefix"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="city" type="xs:string"/>
  <xs:element name="prefix" type="xs:string"/>
</xs:schema>

Solution in XQuery:

module namespace p="http://www.example.com/xq/postals";

import schema namespace ipo = "http://www.example.com/IPO" at "ipo.xsd";
import schema namespace pst = "http://www.example.com/postals" at "postals.xsd";

declare function p:postal-ok($a as element(*, ipo:UKAddress))
  as xs:boolean
{
  some $i in doc("postals.xml")/pst:postals/element(pst:row)
  satisfies $i/pst:city = $a/city
        and starts-with($a/postcode, $i/pst:prefix)
};

This is not a complete query; it is a function that is meant to be called in a query. We will use this function in Q4.

Q4

Determine whether the postal code or zip code for a purchase order is right.

Solution in XQuery:

import schema namespace ipo="http://www.example.com/IPO" at "ipo.xsd";
import schema namespace pst="http://www.example.com/postals" at "postals.xsd";
import schema namespace zips="http://www.example.com/zips" at "zips.xsd";
import module namespace zok="http://www.example.com/xq/zips";
import module namespace pok="http://www.example.com/xq/postals";

declare function local:address-ok($a as element(*, ipo:Address))
  as xs:boolean
{
  typeswitch ($a)
    case $zip as element(*, ipo:USAddress)
      return zok:zip-ok($zip)
    case $postal as element(*, ipo:UKAddress)
      return pok:postal-ok($postal)
    default
      return false()
};

let $shipTo := doc("ipo.xml")/element(ipo:purchaseOrder)/shipTo
let $billTo := doc("ipo.xml")/element(ipo:purchaseOrder)/billTo
return local:address-ok($shipTo) and local:address-ok($billTo)

Expected Result:

true

This query calls the functions defined in Q2 and Q3.

Note that the function local:address-ok() accepts any element whose type is ipo:Address, which is the base type for both ipo:UKAddress and ipo:USAddress. Note also that this function uses a typeswitch to select the appropriate function to test American or British addresses. This can be considered a primitive form of polymorphism.
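The same dispatch pattern can be sketched in isolation. The function local:country() below is a hypothetical helper, not part of the use case; it accepts any element of type ipo:Address and uses a typeswitch to choose a result according to the derived type. The sketch assumes "ipo.xml" has been validated against "ipo.xsd".

```xquery
import schema namespace ipo = "http://www.example.com/IPO" at "ipo.xsd";

(: Hypothetical helper: names the country implied by the address type. :)
declare function local:country($a as element(*, ipo:Address))
  as xs:string
{
  typeswitch ($a)
    case element(*, ipo:USAddress) return "US"
    case element(*, ipo:UKAddress) return "UK"
    (: Reached for ipo:Address itself or any other derived type. :)
    default return "unknown"
};

for $a in doc("ipo.xml")/element(ipo:purchaseOrder)/(shipTo | billTo)
return concat(local-name($a), ": ", local:country($a))
```

Adding support for another address type would mean adding one more case clause, which is why this is only a primitive form of polymorphism: the dispatch is written out by hand rather than resolved by the language.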

Q5

Determine whether the shipping address matches the billing address.

Solution in XQuery:

import schema namespace ipo="http://www.example.com/IPO" at "ipo.xsd";

declare function local:names-match(
    $s as element(shipTo, ipo:Address),
    $b as element(billTo, ipo:Address))
  as xs:boolean
{
  $s/name = $b/name
};

let $p := doc("ipo.xml")/element(ipo:purchaseOrder)
return local:names-match( $p/shipTo, $p/billTo )

Expected Result:

false

Note that this function specifies both the element names and the type names for its parameters.

Note also that the schema says both of these elements are local elements, defined only within a purchase order. The element test matches an element with a given name even if that element is locally declared.

Q6

Determine whether the invoice has a USAddress and gives all prices as USPrices.

Solution in XQuery:

import schema namespace ipo="http://www.example.com/IPO" at "ipo.xsd";

let $p := doc("ipo.xml")//element(ipo:purchaseOrder)
let $billTo := $p/billTo
let $shipTo := $p/shipTo
return
  if ($billTo instance of element(*, ipo:USAddress))
  then every $i in $p/items/item satisfies (exists($i/USPrice))
  else false()

Expected Result:

true

Addresses are part of a type hierarchy, and the element name, billTo, does not tell us whether an address is a US address or not, so we have to test the type.

This example is rather contrived, since the schema specifies that all prices are USPrice elements. Nevertheless, it illustrates the ability to easily combine information derived from type information with information derived from structure.

Q7

Write a function that returns the text of a comment. Call this function for each shipping comment found in an item shipped to Helen Zoe on the date 1999-12-01.

Our source schema models comments with the following substitution groups:

The following sample data contains instances of these substitution groups:

<items>
  <item partNum="833-AA">
    <productName>Lapis necklace</productName>
    <quantity>1</quantity>
    <USPrice>99.95</USPrice>
    <ipo:shipComment> Use gold wrap if possible </ipo:shipComment>
    <ipo:customerComment> Want this for the holidays! </ipo:customerComment>
    <shipDate>1999-12-05</shipDate>
  </item>
</items>

Solution in XQuery:

import schema namespace ipo="http://www.example.com/IPO" at "ipo.xsd";

declare function local:comment-text($c as schema-element(ipo:comment))
  as xs:string
{
  xs:string( $c )
};

for $p in doc("ipo.xml")//element(ipo:purchaseOrder),
    $t in local:comment-text( $p//ipo:shipComment )
where $p/shipTo/name="Helen Zoe"
  and $p/@orderDate = xs:date("1999-12-01")
return $t

In this query, the function specifies ipo:comment as the name of the element, but any element in the substitution group of ipo:comment may also be passed to this function. That means that we can call the same function for ipo:shipComment elements or ipo:customerComment elements - for instance, the following query also succeeds:

import schema namespace ipo="http://www.example.com/IPO" at "ipo.xsd";

declare function local:comment-text($c as schema-element(ipo:comment))
  as xs:string
{
  xs:string( $c )
};

for $p in doc("ipo.xml")/schema-element(ipo:purchaseOrder)
where $p/shipTo/name="Helen Zoe"
  and $p/@orderDate = xs:date("1999-12-01")
return local:comment-text( $p//ipo:customerComment )

Q8

Find all comments found in an item shipped to Helen Zoe on the date 1999-12-01, including all elements in the substitution group for ipo:comment.

Solution in XQuery:

import schema namespace ipo="http://www.example.com/IPO" at "ipo.xsd";

for $p in doc("ipo.xml")//element(ipo:purchaseOrder)
where $p/shipTo/name="Helen Zoe"
  and $p/@orderDate = xs:date("1999-12-01")
return $p//schema-element(ipo:comment)

Note that schema-element(ipo:comment) matches any valid element in the substitution group of ipo:comment.
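To make the difference from a plain element test concrete, the following sketch counts matches for both kinds of test over the same purchase orders. The comparison, by-name, and by-substitution-group element names are hypothetical, and the sketch assumes "ipo.xml" has been validated against "ipo.xsd".

```xquery
import schema namespace ipo = "http://www.example.com/IPO" at "ipo.xsd";

let $p := doc("ipo.xml")//element(ipo:purchaseOrder)
return
  <comparison>
    (: element(ipo:comment) matches only elements actually named ipo:comment. :)
    <by-name>{ count($p//element(ipo:comment)) }</by-name>
    (: schema-element(ipo:comment) also matches ipo:shipComment and
       ipo:customerComment, the members of the substitution group. :)
    <by-substitution-group>{ count($p//schema-element(ipo:comment)) }</by-substitution-group>
  </comparison>
```

Note that the XQuery comments sit inside an element constructor only for readability in this sketch; a processor would treat them as literal text there, so they should be moved outside the constructor before running the query.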

Q9

Write a function that returns all comments found on an element, whether an item element or some other element that may have a comment.

Solution in XQuery:

import schema namespace ipo="http://www.example.com/IPO" at "ipo.xsd";

declare function local:comments-for-element( $e as element() )
  as schema-element(ipo:comment)*
{
  $e/schema-element(ipo:comment)
};

for $p in doc("ipo.xml")//element(ipo:purchaseOrder)
where $p/shipTo/name="Helen Zoe"
  and $p/@orderDate = xs:date("1999-12-01")
return
  <comments name="{$p/shipTo/name}" date="{$p/@orderDate}">
    { local:comments-for-element( $p ) }
    { for $i in $p//item return local:comments-for-element( $i ) }
  </comments>

In this schema, comments can occur on either a purchase order or on an item. In a more complete schema, they could presumably occur in other areas as well. This function returns all comments found on an element, regardless of the name of the element, illustrating the need to write functions that can accept any element as a parameter.

Q10

Write a function that determines whether the person listed in a billTo element is known to be a deadbeat, using a US database.

In American slang, a "deadbeat" is a person who fails to meet a financial obligation.

This query assumes that "deadbeats.xml" lists the names of deadbeats in the following format:

<?xml version="1.0"?>
<deadbeats>
  <row>
    <name>Dick Dastardly</name>
    <creditrating>0</creditrating>
  </row>
  <row>
    <name>Peter Doubt</name>
    <creditrating>1</creditrating>
  </row>
  <row>
    <name>Robert Smith</name>
    <creditrating>0</creditrating>
  </row>
</deadbeats>

Solution in XQuery:

import schema namespace ipo="http://www.example.com/IPO" at "ipo.xsd";

declare function local:deadbeat( $b as element(billTo, ipo:USAddress) )
  as xs:boolean
{
  $b/name = doc("deadbeats.xml")/deadbeats/row/name
};

for $p in doc("ipo.xml")//element(ipo:purchaseOrder)
where local:deadbeat( $p/element(billTo) )
return <warning>{ string($p/billTo/name) } is a known deadbeat!</warning>

Expected Output:

<warning>Robert Smith is a known deadbeat!</warning>

Note that this function specifies both the element name and the type. The element name is specified because we do not want to embarrass recipients of gifts by calling this function for the shipping address by mistake. The type is specified because we would need to use a different database to identify deadbeats in other countries.

Also note that the XML file in this example has no schema. We assume that the processor omits validation or does lax validation.

Q11

Write a function that computes the total price for a sequence of item elements.

Solution in XQuery:

module namespace c="http://www.example.com/calc";

import schema namespace ipo="http://www.example.com/IPO" at "ipo.xsd";

declare function c:total-price( $i as element(item)* )
  as xs:decimal
{
  let $subtotals := for $s in $i return $s/quantity * $s/USPrice
  return sum( $subtotals )
};

Here is a query that calls the function we just defined to get the total for an invoice (before calculating taxes and shipping charges):

import schema namespace ipo="http://www.example.com/IPO" at "ipo.xsd";
import module namespace calc = "http://www.example.com/calc";

for $p in doc("ipo.xml")//element(ipo:purchaseOrder)
where $p/shipTo/name="Helen Zoe"
  and $p/@orderDate = xs:date("1999-12-01")
return calc:total-price($p//item)

This query illustrates the need to be able to pass a sequence as a parameter to a function.

If the input document contains more than one purchase order for the given date and person, the function is called once for each such purchase order, and a separate total is returned for each.
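Because the function accepts a sequence of items, the items of several purchase orders can also be passed in a single call. The following sketch, which is a variation and not part of the original use case, selects the matching purchase orders first and then computes one combined total; it assumes the calc module above is available to the processor.

```xquery
import schema namespace ipo = "http://www.example.com/IPO" at "ipo.xsd";
import module namespace calc = "http://www.example.com/calc";

(: Select every matching purchase order up front. :)
let $orders := doc("ipo.xml")//element(ipo:purchaseOrder)
                 [shipTo/name = "Helen Zoe"
                  and @orderDate = xs:date("1999-12-01")]
(: One call over the items of all selected orders yields one grand total. :)
return calc:total-price($orders//item)
```

If no purchase order matches, the function receives the empty sequence, and sum() over an empty sequence returns zero.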

Q12

In , a Quarterly Report is created, summarizing the types of products that have been billed on a per region basis. It creates the following sample report:

<purchaseReport xmlns="http://www.example.com/Report"
                period="P3M" periodEnding="1999-12-31">
  <regions>
    <zip code="95819">
      <part number="872-AA" quantity="1"/>
      <part number="926-AA" quantity="1"/>
      <part number="833-AA" quantity="1"/>
      <part number="455-BX" quantity="1"/>
    </zip>
    <zip code="63143">
      <part number="455-BX" quantity="4"/>
    </zip>
  </regions>
  <parts>
    <part number="872-AA">Lawnmower</part>
    <part number="926-AA">Baby Monitor</part>
    <part number="833-AA">Lapis Necklace</part>
    <part number="455-BX">Sturdy Shelves</part>
  </parts>
</purchaseReport>

This is the schema given for the above report:

<schema targetNamespace="http://www.example.com/Report"
        xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:r="http://www.example.com/Report"
        xmlns:xipo="http://www.example.com/IPO"
        elementFormDefault="qualified">

  <import namespace="http://www.example.com/IPO"/>

  <annotation>
    <documentation xml:lang="en">
      Report schema for Example.com
      Copyright 2000 Example.com. All rights reserved.
    </documentation>
  </annotation>

  <element name="purchaseReport">
    <complexType>
      <sequence>
        <element name="regions" type="r:RegionsType">
          <keyref name="dummy2" refer="r:pNumKey">
            <selector xpath="r:zip/r:part"/>
            <field xpath="@number"/>
          </keyref>
        </element>
        <element name="parts" type="r:PartsType"/>
      </sequence>
      <attribute name="period" type="duration"/>
      <attribute name="periodEnding" type="date"/>
    </complexType>
    <unique name="dummy1">
      <selector xpath="r:regions/r:zip"/>
      <field xpath="@code"/>
    </unique>
    <key name="pNumKey">
      <selector xpath="r:parts/r:part"/>
      <field xpath="@number"/>
    </key>
  </element>

  <complexType name="RegionsType">
    <sequence>
      <element name="zip" maxOccurs="unbounded">
        <complexType>
          <sequence>
            <element name="part" maxOccurs="unbounded">
              <complexType>
                <complexContent>
                  <restriction base="anyType">
                    <attribute name="number" type="xipo:SKU"/>
                    <attribute name="quantity" type="positiveInteger"/>
                  </restriction>
                </complexContent>
              </complexType>
            </element>
          </sequence>
          <attribute name="code" type="positiveInteger"/>
        </complexType>
      </element>
    </sequence>
  </complexType>

  <complexType name="PartsType">
    <sequence>
      <element name="part" maxOccurs="unbounded">
        <complexType>
          <simpleContent>
            <extension base="string">
              <attribute name="number" type="xipo:SKU"/>
            </extension>
          </simpleContent>
        </complexType>
      </element>
    </sequence>
  </complexType>

</schema>

This report, which lists products sold by zip code, is based on the same international purchase order used in previous queries.

Here is a query that generates the desired report from a collection that contains US purchase orders:

import schema namespace ipo="http://www.example.com/IPO" at "ipo.xsd";
declare namespace rpt="http://www.example.com/Report";

let $orders := doc('ipo.xml')/schema-element(ipo:purchaseOrder)
                 [@orderDate ge xs:date("1999-09-01")
                  and @orderDate le xs:date("1999-12-31")]
let $items := $orders/items/item
let $zips := distinct-values($orders/billTo/zip)
let $parts := distinct-values($items/@partNum)
return
  <rpt:purchaseReport>
    <rpt:regions>
      {
        for $zip in $zips
        order by $zip
        return
          <rpt:zip code="{$zip}">
            {
              for $part in $parts
              let $hits := $orders[ billTo/zip = $zip and items/item/@partNum = $part]
              let $quantity := sum($hits//item[@partNum=$part]/quantity)
              where count($hits) > 0
              order by $part
              return
                <rpt:part number="{$part}" quantity="{$quantity}"/>
            }
          </rpt:zip>
      }
    </rpt:regions>
    <rpt:parts>
      {
        for $part in $parts
        return
          <rpt:part number="{$part}">
            { string($items[@partNum = $part]/productName) }
          </rpt:part>
      }
    </rpt:parts>
  </rpt:purchaseReport>

Acknowledgements

The editors thank the members of the XML Query Working Group, which produced the material in this document.

The use cases in this paper were contributed by the following individuals:

Use Case "R"	Don Chamberlin
Use Case "XMP"	Mary Fernandez, Jerome Simeon, Phil Wadler
Use Case "TREE"	Jonathan Robie
Use Case "PARTS"	Michael Rys
Use Case "NS"	Ingo Macherius
Use Case "STRING"	Umit Yalcinalp
Use Case "SEQ"	Jonathan Robie
Use Case "SGML"	Paula Angerstein
Use Case "STRONG"	Jonathan Robie and Phil Wadler. Schemas and data taken from .

Use case "XMP" has been previously published in . Use cases "Tree" and "Seq" have been previously published in .

The editors also wish to thank the members of the other W3C Working Groups who have commented on earlier drafts, and Michael Dyck for his critical reading and helpful suggestions. Michael Wenger found several bugs, and also found more elegant solutions to some of the use cases, which are now included in this document.

Change Log

31 Aug 2005

Comments should now be directed to Bugzilla.

Massimo and Dana have each changed their affiliation.

IPR statement has been changed to reflect that this will progress to a Note rather than a Recommendation.

Provided link to changes to Use Case Strong errors in Mozilla.

Minor editorial changes.

11 July 2005

Aligned with current XQuery Working Draft.

Fixed the following bugs.

rdb-queries-results-q9 now uses month-from-date() and year-from-date(). The "get-" prefix, now obsolete, has been removed. Fixes Bug 119.

Removed whitespace from the price element in the source document for Use Case "XMP" - it was <price> 65.95</price>, and is now <price>65.95</price>. Fixes Bug 121.

Removed trailing whitespace from the <remark/> elements used in "auction.xml". Fixes Bug 124.

Need to discuss Bug 148. Did not see the problems with Q1 and Q2 in the document. Have some questions about the best treatment of Q4 and Q5.

Fixed many errors in Use Case "Strong" as proposed here.

04 April 2005

Alignment with 04 April 2005 Working Draft of XQuery. This actually did not change the results of any of the queries, so it was just a matter of tweaking the front matter.

30 Jan 2005

Many changes have been made since the last release, particularly:

Moved to the 29 October XQuery Working Drafts.

Corrected numerous NS queries to reflect current handling of namespaces.

Changed use of locally declared elements and attributes in SequenceType to match current XQuery language.

Fixed reported bugs, ran most queries using two implementations.

References

The following references are some of the works considered by the WG in deriving its use cases.

Candidate Requirements for XML Query, Paul Cotton and Ashok Malhotra, 1998. In Query Languages 98 (QL'98).

Document Object Model (DOM), Level 2 Specification. W3C Candidate Recommendation.

XML Query Languages: Experiences and Exemplars, Mary Fernandez, Jerome Simeon, Philip Wadler, 1999.

Database Desiderata for an XML Query Language, David Maier, 1998. In Query Languages 98 (QL'98).

The Tree Structure of XML Queries, Jonathan Robie.

Queries from this document, presented in a single file.

Queries from the XQuery document, presented in a single file.