W3C

XQuery 1.0 and XPath 2.0 Full-Text Use Cases

W3C Working Draft 4 April 2005

This version:
http://www.w3.org/TR/2005/WD-xmlquery-full-text-use-cases-20050404/
Latest version:
http://www.w3.org/TR/xmlquery-full-text-use-cases
Previous version:
http://www.w3.org/TR/2004/WD-xmlquery-full-text-use-cases-20040709/
Editors:
Sihem Amer-Yahia, AT&T Labs - Research <sihem@research.att.com>
Pat Case, Library of Congress <pcase@crs.loc.gov>

Abstract

This document specifies usage scenarios for full-text queries as part of XML Query [XQuery 1.0: An XML Query Language] and XPath [XML Path Language (XPath) 2.0].

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a public W3C Working Draft for review by W3C Members and other interested parties. Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This is the third version of this document. Some query statements, solutions, and results have been changed based on implementor feedback and Task Force decisions. See Appendix D (Change Log) for more information.

This document contains many open issues, and should not be considered to be fully stable. Vendors who wish to create preview implementations based on this document do so at their own risk. While this document reflects the general consensus of the working groups, there are still controversial areas that may be subject to change.

This document has been produced following the procedures set out for the W3C Process. This document was produced through the joint efforts of the W3C XML Query Working Group and the XSL Working Group (both parts of the XML Activity). It is designed to be read in conjunction with the following documents: W3C XQuery and XPath Full-Text Requirements [XQuery and XPath Full-Text Requirements] and the W3C XQuery 1.0 and XPath 2.0 Full-Text [XQuery 1.0 and XPath 2.0 Full-Text].

Public comments on this document and its open issues are invited. Comments should be entered into the last-call issue tracking system for this specification (instructions can be found at http://www.w3.org/XML/2005/04/qt-bugzilla). If access to that system is not feasible, you may send your comments to the W3C XPath/XQuery mailing list, public-qt-comments@w3.org (archived at http://lists.w3.org/Archives/Public/public-qt-comments/). Commenters are requested to put the string "[FTUseCases]" at the beginning of the subject line of email messages involving such comments.

The patent policy for this document is specified in the 5 February 2004 W3C Patent Policy. Patent disclosures relevant to this specification may be found on the XML Query Working Group's patent disclosure page and the XSL Working Group's patent disclosure page. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification should disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Full-Text Use Cases: Preliminaries
    1.1 Proper Display of this Unicode Document
    1.2 Introduction
    1.3 Presentation of Use Cases
    1.4 Schema for Sample Data
    1.5 Sample Data
2 Use Case "ELEMENT": Queries on XML Elements with Simple Content
    2.1 Description
    2.2 Queries and Results
        2.2.1 Q1 Word Query in an Element
        2.2.2 Q2 Phrase Query in an Element
        2.2.3 Q3 Phrase Query on Chinese Characters in an Element
        2.2.4 Q4 Query in Different Elements
        2.2.5 Q5 Query in an Element Returning Different Elements
        2.2.6 Q6 Starts-with Query
        2.2.7 Q7 Entire Element Content Query
3 Use Case "ACROSS": Queries Across XML Element Boundaries
    3.1 Description
    3.2 Queries and Results
        3.2.1 Q1 Query Across Descendant Elements (No Element Content)
        3.2.2 Q2 Query Across Descendant Elements (Highlighting Tags)
        3.2.3 Q3 Query Across Descendant Elements (Substantive Tags)
        3.2.4 Q4 Query Across Siblings
        3.2.5 Q5 Query in Different Sub-Trees
        3.2.6 Q6 Query on Entire Document
4 Use Case "OTHER": Queries on Attribute Values
    4.1 Description
    4.2 Queries and Results
        4.2.1 Q1 Query on Attribute
        4.2.2 Q2 Query on Element and Attribute
5 Use Case "WILDCARD": Character Wildcard Queries
    5.1 Description
    5.2 Queries and Results
        5.2.1 Q1 One Character Suffix Wildcard Query
        5.2.2 Q2 Zero or One Character Prefix Wildcard Query
        5.2.3 Q3 Zero or More Character Infix Wildcard Query
        5.2.4 Q4 One or More Character Suffix Wildcard Query on Part of a Word
        5.2.5 Q5 Specified Range of Characters Suffix Wildcard Query
6 Use Case "STEMMING": Word Stemming Queries
    6.1 Description
    6.2 Queries and Results
        6.2.1 Q1 Query Stemming on Word Root
        6.2.2 Q2 Query Stemming on Multiple Word Roots
7 Use Case "THESAURUS": Queries Which Use Thesauri, Dictionaries, and Taxonomies
    7.1 Description
    7.2 Queries and Results
        7.2.1 Q1 Query on Synonyms Identified by a Thesaurus
        7.2.2 Q2 Query on Narrower Terms Identified by a Thesaurus
        7.2.3 Q3 Query on Broader Terms Identified by a Thesaurus
        7.2.4 Q4 Query on Word Which Sounds Like Other Words
        7.2.5 Q5 Query on Word Spelled Similarly to Other Words
        7.2.6 Q6 Query on Subordinate Terms Identified by a Taxonomy
8 Use Case "STOP-WORD": Queries on Stop Words
    8.1 Description
    8.2 Queries and Results
        8.2.1 Q1 Query on Stop Word Treated as a Stop Word
        8.2.2 Q2 Query on Stop Word Not Treated as a Stop Word
9 Use Case "CHARACTER": Queries Specifying Normalized Forms of Characters and Tokenized Words
    9.1 Description
    9.2 Queries and Results
        9.2.1 Q1 Query on Word with Characters with Diacritics
        9.2.2 Q2 Query on Word with Characters with and Without Diacritics
        9.2.3 Q3 Query on Word with Upper Case Characters
        9.2.4 Q4 Query on Word with Upper Case and Lower Case Characters
10 Use Case "LOGICAL": Queries with Logical Expressions (Or, And, and Not Queries)
    10.1 Description
    10.2 Queries and Results
        10.2.1 Q1 Or Query
        10.2.2 Q2 And Query
        10.2.3 Q3 And Query Ordered
        10.2.4 Q4 Unary Not Query
        10.2.5 Q5 And Not Query
        10.2.6 Q6 And Not Query Where Second Operand Is a Subset of the First Operand
        10.2.7 Q7 Mild Not Query Where Second Operand Is a Subset of the First Operand
11 Use Case "CARDINALITY": Queries in Same, Any, Every Instance of an Element, and Occurrence Count Query
    11.1 Description
    11.2 Queries and Results
        11.2.1 Q1 Query in Same Instance of an Element
        11.2.2 Q2 Query in Any Instance of an Element (Existential Quantification)
        11.2.3 Q3 Query in Every Instance of an Element (Universal Quantification)
        11.2.4 Q4 Occurrence Count Query
12 Use Case "PROXIMITY": Queries on Proximity Relationships Including Distance, Window, Sentence, and Paragraph
    12.1 Description
    12.2 Queries and Results
        12.2.1 Q1 Unordered Distance Query
        12.2.2 Q2 Ordered Distance Query
        12.2.3 Q3 Ordered Window Query
        12.2.4 Q4 Unordered Within a Sentence Query
        12.2.5 Q5 Unordered Within a Paragraph Query
13 Use Case "AXES": Queries Using Relative XPath Axes
    13.1 Description
    13.2 Queries and Results
        13.2.1 Q1 Query on Element and Its Children
        13.2.2 Q2 Query on Element Returning Its First Two Children
        13.2.3 Q3 Query on Element and Its Ancestors
        13.2.4 Q4 Query on Element and Its Right Siblings
14 Use Case "IGNORE": Queries Ignoring Descendant Element Content
    14.1 Description
    14.2 Queries and Results
        14.2.1 Q1 Distance Query Ignoring Content of All Descendant Elements
        14.2.2 Q2 Phrase Query Ignoring Content of Descendant Element Specified by XPath Expression
        14.2.3 Q3 Phrase Query Ignoring Content of Descendant Element Specified by Full-Text Query
        14.2.4 Q4 Distance Query Ignoring Content of Descendant Elements Level By Level
15 Use Case "FULL-TEXT-COMPOSABILITY": Queries Illustrating Composability of Full-Text with Itself
    15.1 Description
    15.2 Queries and Results
        15.2.1 Q1 Query on Words and Phrases in Two Languages
        15.2.2 Q2 Phrase and Distance Query in an Instance of an Element with Stemming
        15.2.3 Q3 Nested Distance Query with Wildcards, Stemming, and Thesaurus Support
        15.2.4 Q4 Distance and Boolean Queries Ignoring Content of a Descendant Element with Wildcards and Stemming
        15.2.5 Q5 Query on Different Elements in Different Sub-Trees with Conditional Return
16 Use Case "XQUERY-XPATH-COMPOSABILITY": Queries Illustrating Composability of Full-Text with Other XQuery and XPath Functionalities
    16.1 Description
    16.2 Queries and Results
        16.2.1 Q1 Full-Text Query Constructing New Element
        16.2.2 Q2 Full-Text Query Returning Count of Descendant Element Occurrences
        16.2.3 Q3 Full-Text Query with Conditional Return
        16.2.4 Q4 Full-Text Query with Numeric Value Comparison
        16.2.5 Q5 Full-Text Query with Character String Query
        16.2.6 Q6 Full-Text Query with Conditional Return of Boolean Values
        16.2.7 Q7 Full-Text Query with Date Comparison and Element Occurrence Count
        16.2.8 Q8 Query with XQuery Expression Within Full-Text Expression
17 Use Case "SCORE": All Queries May Be Written with Score, Queries in this Section Must Be Written with Score
    17.1 Description
    17.2 Queries and Results
        17.2.1 Q1 Query Returning Scores
        17.2.2 Q2 Query Returning Results with Top Scores
        17.2.3 Q3 Query Filtering on Scores
        17.2.4 Q4 Query Combining Score and XML Structure with a Conditional Return
        17.2.5 Q5 Query Returning All Books Ordered by Score

Appendices

A Acknowledgements
B References
    B.1 References (Primary)
    B.2 References (Background)
C Issues
D Change Log


1 Full-Text Use Cases: Preliminaries

1.1 Proper Display of this Unicode Document

(1) Use a current operating system and browser.

(2) If necessary, set the character encoding in the browser manually to Unicode or UTF-8. Often this setting may be changed from the View menu.

(3) If after setting the character encoding to Unicode, the Chinese characters in the subject elements of the sample data still do not display, it is likely that the browser cannot locate a font that contains Chinese characters in Unicode encoding. It might be necessary to add a Unicode font, preferably Arial Unicode MS.

1.2 Introduction

The use cases listed below were created by the XML Query and XSL Working Groups to illustrate important applications of full-text querying within an XML query language. Each use case exercises a specific functionality relevant to full-text querying. An XML Schema and sample input data are provided. Each use case specifies a query applied to the input data, a solution in XQuery, a solution in XPath (when possible), and the expected results.

This document supplements the W3C XML Query Use Cases [XML Query Use Cases]. Use cases for character string querying are included in the XML Query Use Cases, not in this document.

The full-text queries in the following use cases are performed on text which has been tokenized, i.e., broken into a sequence of words, units of punctuation, and spaces.

A word is defined as any character, n-gram, or sequence of characters returned by a tokenizer as a basic unit to be queried. Each instance of a word consists of zero or more consecutive characters. Beyond that, words are implementation defined. Note that consecutive words need not be separated by either punctuation or space, and words may overlap. A phrase is an ordered list of words. A phrase may contain any number of words.

Tokenization enables functions and operators which work with the relative positioning of words (e.g., proximity operators). Tokenization also enables functions and operators which operate on a part or the root of the word (e.g., wildcards, stemming).
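
As an illustration only, the following Python sketch shows how a tokenized word list makes positional and word-part operations possible. It is not XQuery or XPath Full-Text syntax. The function names (tokenize, contains_phrase, within_distance, suffix_wildcard) are hypothetical, and the tokenizer assumes that a word is a maximal run of letters, digits, or underscores, which is only one of many possible implementation-defined choices.

```python
import re

def tokenize(text):
    """Return lowercase words; a word's list index serves as its position.

    Assumption: a word is a maximal run of \\w characters. Real tokenizers
    are implementation defined and may behave differently.
    """
    return [w.lower() for w in re.findall(r"\w+", text)]

def contains_phrase(tokens, phrase):
    """True if the words of `phrase` occur consecutively and in order."""
    words = tokenize(phrase)
    n = len(words)
    if n == 0:
        return False
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))

def within_distance(tokens, first, second, max_intervening):
    """True if `first` and `second` occur, in either order, with at most
    `max_intervening` words between them (an unordered distance check)."""
    first, second = first.lower(), second.lower()
    pos_first = [i for i, t in enumerate(tokens) if t == first]
    pos_second = [i for i, t in enumerate(tokens) if t == second]
    return any(0 < abs(a - b) <= max_intervening + 1
               for a in pos_first for b in pos_second)

def suffix_wildcard(tokens, prefix):
    """Return the words that start with `prefix` (a simple suffix wildcard)."""
    prefix = prefix.lower()
    return [t for t in tokens if t.startswith(prefix)]

# Text taken from the introduction of book 1 in the sample data.
text = ("Expert reviews and usability testing are methods of identifying "
        "problems in layout, terminology, and navigation before they "
        "frustrate users and drive them away from your site.")
tokens = tokenize(text)

print(contains_phrase(tokens, "usability testing"))
print(within_distance(tokens, "expert", "testing", 3))
print(suffix_wildcard(tokens, "us"))
```

Because every word carries a position, the phrase and distance checks reduce to comparisons on list indexes, and the wildcard check reduces to a test on each word in isolation.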

These use cases:

(1) Present some possible functions and features for tokenized text support in XQuery and XPath. None are yet available in XQuery or XPath. Please comment on these use cases and recommend others.

(2) Illustrate simple and complex queries. The more complex queries would normally only be constructed by programmers, librarians, and other expert users, or provided for novice users via saved queries and graphical user interfaces. Each query is intended to illustrate a single functionality, although queries might overlap in their functionalities (e.g., phrases and ordered distance queries allowing no intervening words). Overlapping and similar functionalities are noted in the comments on query behavior.

(3) Draw from sample data which are almost entirely in English. Use cases in other languages are solicited, especially where they illustrate language-specific implementations of functions and features. Among the most sought after are use cases for queries using prefix and infix wildcards, proximity queries, and operators and queries requiring functionality which may not have Western language equivalents.

(4) Include queries which in most instances can be written with pure Boolean full-text predicates or with scoring (e.g., scoring on the number of occurrences of a word or phrase, scoring on how close words are to one another within a distance query, scoring on how similar a word is to the one being stemmed) [BYR99] [HTK00]. A few, those in Section 17 (SCORE), cannot be written with Boolean full-text predicates. Scoring methodologies will not be defined in this recommendation. Scoring will be implementation defined. Results are provided in document order, except those in Section 17 (SCORE). Results could be returned ordered differently, such as by relevance (based on implementation defined scoring) or explicitly by element.

(5) Query element content. See Section 4 (OTHER) for explicit queries on attribute values.

(6) Include queries which are case-insensitive. When returning a paragraph, the text is returned as it occurs in the data model. This approach was chosen to keep the sample data short and the expected results meaningful. It would have been equally valid to return only the character queried. A case-sensitive query is found in Section 9 (CHARACTER).

(7) Include queries which when they target XML elements are understood, unless otherwise stated, to query text within any text node descendant of the element.

(8) Include queries which return only elements and attributes which meet all the conditions specified in the query. In particular, Boolean queries return results where the Boolean conditions in the query are satisfied, i.e., are used to select what is being returned to users.

Query results may be returned in different ways. From a query for books containing the word "usability", users might be interested in returning, for each book containing the word "usability", its number and its entire content. In another situation for the same query, users might be interested in returning, for each book containing the word "usability", its number and only the elements and attributes in the content which contain the word "usability". As in this second situation, the queries in these use cases return only elements and attributes which meet all the conditions specified in the query.

The Return clause may also include additional or different elements and attributes if specified, and may construct new elements.

(9) Include queries which provide some of the basic functionality of fuzzy match querying (e.g., wildcards, stemming, thesaurus support, proximity).

(10) Provide highlighting of found words and phrases in the expected results of queries as an aid to users. The presence of highlighting says nothing about whether highlighting will be a feature of XQuery or XPath full-text querying.

(11) Display query solutions in XQuery and, when possible, in XPath. Queries which may not be written in XPath include those which contain element constructors and those which cannot be written without let and order by clauses.
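
As an illustration of items (4) and (6) above, the following Python sketch contrasts a Boolean full-text predicate with a scored query over the same case-insensitive word matching. It is not XQuery or XPath Full-Text syntax. The function names and the scoring formula (occurrences divided by word count) are assumptions made for this sketch only, since scoring is implementation defined.

```python
import re

def words(text):
    """Lowercase words of `text` (assumed tokenizer: runs of \\w characters)."""
    return [w.lower() for w in re.findall(r"\w+", text)]

def contains_word(text, word):
    """Boolean full-text predicate: True if `word` occurs, ignoring case."""
    return word.lower() in words(text)

def occurrence_score(text, word):
    """Hypothetical score in [0, 1]: occurrences of `word` per word of text."""
    tokens = words(text)
    if not tokens:
        return 0.0
    return tokens.count(word.lower()) / len(tokens)

paragraphs = [
    "The usability of a Web site is how well the site supports the user.",
    "Expert reviews and usability testing are methods of identifying problems.",
    "Iterate, iterate, then iterate again.",
]

# Boolean predicate: matching paragraphs are kept in document order.
matches = [p for p in paragraphs if contains_word(p, "Usability")]

# Scored query: every paragraph is ranked by relevance, highest score first.
ranked = sorted(paragraphs,
                key=lambda p: occurrence_score(p, "usability"),
                reverse=True)
```

With this particular formula the second paragraph ranks above the first because it is shorter, which shows how relevance order may differ from document order.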

Examples of full-text querying functionalities for XML query languages can be found in [FGR01], [HTK00], [MJK98], [SCH01] and [TWE00].

To make the output more readable, the output of queries has been formatted using white space which may not be returned by a query processor. This white space should not be considered normative for the correctness of results.

These use cases represent a snapshot of ongoing work. Some important operators and features are not yet adequately covered by a use case. The XML Query and XSL Working Groups reserve the right to add, delete, or modify individual queries or whole use cases as the work progresses. The presence of a query in this set of use cases does not necessarily indicate that the query will be expressible in XQuery [XQuery 1.0: An XML Query Language] and/or XPath [XML Path Language (XPath) 2.0] to be created by the XML Query and XSL Working Groups.

1.3 Presentation of Use Cases

The queries in these use cases are presented in the following format:

Query number   Query title

User statement of query

Statement of functionality illustrated by query

  • Operands: Parts of words, words, phrases

  • Functionality: Operators, functions, collations, other functionality

  • Data context: One XPath expression locating the data being queried.

  • Query context: One or more XPath expressions locating the elements and attributes to be queried. The context of elements and attributes used in the Query context is relative to the Data context defined above.

  • Return: One or more XPath expressions which are returned only if the conditions specified in the query are met. Returned elements or attributes may differ from those specified in the Query context. Newly constructed elements might be returned. As in the Query context, the context of elements and attributes in Return statements is relative to the Data context defined above.

  • Comments: Comments on query behavior in general, and against the sample data in particular, plus the rationale for including this query in the use cases.

Solution in XQuery:

Solutions illustrating XQuery Full-Text syntax  
appear here. All queries may be written in XQuery.

Solutions are written to return Boolean full-text predicates 
and not to invoke scoring, except for those in Section 17 (SCORE). 
However, all the queries in the document may be written as scored 
queries with the addition of a score clause.

All queries are written assuming the default function namespace, 
without the fn: prefix.

See Issue 1 [staticErrorTesting]: The queries in the Full-Text Use Cases have not yet been tested to ensure they do not cause static errors, such as cardinality errors.

Solution in XPath:

Solutions illustrating XPath Full-Text syntax appear here 
(when the query may be written in XPath).

Solutions are written to return Boolean full-text predicates 
and not to invoke scoring, except for those in Section 17 (SCORE). 
However, all the queries in the document may be written as scored 
queries with the addition of a score clause.

All queries are written assuming the default function namespace, 
without the fn: prefix.

See Issue 1 [staticErrorTesting]: The queries in the Full-Text Use Cases have not yet been tested to ensure they do not cause static errors, such as cardinality errors.

Expected Result:

Results are provided here.
                                
Found words and phrases are highlighted. 
                                
For brevity, only the elements and attributes which meet 
the conditions specified in the query are displayed. Others are
replaced with ellipses (...).

Results are provided in document order, except those 
in Section 17 (SCORE).
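
As a rough illustration of how the Data context, Query context, and Return parts of this format relate to one another, the following Python sketch runs a word query over a shortened fragment of the sample data using the standard xml.etree.ElementTree module. It is not XQuery or XPath Full-Text syntax. The function run_query and its parameters are hypothetical, and the paths are written relative to the root books element because that is how ElementTree evaluates them.

```python
import re
import xml.etree.ElementTree as ET

# Shortened fragment of book 1 from the sample data.
SAMPLE = """
<books>
  <book number="1">
    <metadata>
      <title shortTitle="Improving Web Site Usability">Improving
      the Usability of a Web Site</title>
    </metadata>
    <content>
      <introduction>
        <p>Expert reviews and usability testing are methods of
        identifying problems.</p>
        <p>This book has been approved by the Web Site Users
        Association.</p>
      </introduction>
    </content>
  </book>
</books>
"""

def run_query(xml_text, data_context, query_context, word):
    """Return the query-context elements whose text contains `word`.

    data_context  locates the data being queried (relative to the root).
    query_context locates the elements to search (relative to data_context).
    In this sketch the Return is simply the matching elements themselves.
    """
    root = ET.fromstring(xml_text)
    results = []
    for item in root.findall(data_context):
        for target in item.findall(query_context):
            # Search the text of all text node descendants, ignoring case.
            tokens = re.findall(r"\w+", "".join(target.itertext()).lower())
            if word.lower() in tokens:
                results.append(target)
    return results

hits = run_query(SAMPLE, "book", "content/introduction/p", "usability")
for p in hits:
    print(" ".join("".join(p.itertext()).split()))
```

The title element also contains the word "Usability", but it lies outside the Query context and is therefore not returned.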

1.4 Schema for Sample Data

The example queries in these use cases are based on a collection with the following Schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
version="1.0">
<xs:import namespace="http://www.w3.org/XML/1998/namespace" 
schemaLocation = "http://www.w3.org/2001/xml.xsd"/>
   <xs:element name="books">
      <xs:annotation>
         <xs:documentation>A possible XML Schema for Sample Data 
         in XQuery and XPath Full-Text Use Cases
         </xs:documentation>
      </xs:annotation>
      <xs:complexType>
         <xs:sequence maxOccurs="unbounded">
            <xs:element name="book">
               <xs:complexType>
                  <xs:sequence>
                     <xs:element name="metadata" 
                     type="metadataType"/>
                     <xs:element name="content" 
                     type="contentType"/>
                  </xs:sequence>
                  <xs:attribute name="number" type="xs:integer"/>
               </xs:complexType>
            </xs:element>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
   <xs:complexType name="anyXMLTextType" mixed="true">
      <xs:annotation>
         <xs:documentation>free text, contains any well-formed 
         XML</xs:documentation>
      </xs:annotation>
      <xs:sequence>
         <xs:any processContents="skip" minOccurs="0" 
         maxOccurs="unbounded"/>
      </xs:sequence>
   </xs:complexType>
   <xs:complexType name="metadataType">
      <xs:sequence>
         <xs:element name="title">
            <xs:complexType>
               <xs:simpleContent>
                  <xs:extension base="xs:string">
                     <xs:attribute name="shortTitle" 
                     type="xs:string"/>
                  </xs:extension>
               </xs:simpleContent>
            </xs:complexType>
         </xs:element>
         <xs:element name="author" type="xs:string" 
         minOccurs="0" 
         maxOccurs="unbounded"/>
         <xs:element name="publicationInfo" 
         type="publicationInfoType"/>
         <xs:element name="price" minOccurs="0">
            <xs:simpleType>
               <xs:restriction base="xs:float">
                  <xs:minInclusive value="0"/>
                  <xs:maxInclusive value="10000"/>
               </xs:restriction>
            </xs:simpleType>
         </xs:element>
         <xs:element name="subjects" 
         maxOccurs="unbounded">
            <xs:complexType>
               <xs:sequence>
                  <xs:element name="subject" type="xs:string" 
                  maxOccurs="unbounded"/>
               </xs:sequence>
               <xs:attribute ref="xml:lang"/>
            </xs:complexType>
         </xs:element>
      </xs:sequence>
   </xs:complexType>
   <xs:complexType name="publicationInfoType">
      <xs:sequence>
         <xs:element name="place" type="xs:string" 
         minOccurs="0"/>
         <xs:element name="publisher" type="xs:string" 
         maxOccurs="unbounded"/>
         <xs:element name="dateIssued" type="xs:string"/>
         <xs:element name="dateRevised" type="xs:string"/>
      </xs:sequence>
   </xs:complexType>
   <xs:complexType name="contentType">
      <xs:sequence>
         <xs:element name="introduction" 
         type="introductionType" 
         minOccurs="0"/>
         <xs:element name="part" type="partType" 
         maxOccurs="unbounded"/>
      </xs:sequence>
   </xs:complexType>
   <xs:complexType name="introductionType">
      <xs:sequence>
         <xs:element name="author" type="xs:string" 
         minOccurs="0"/>
         <xs:element name="p" maxOccurs="unbounded">
            <xs:complexType mixed="true">
               <xs:choice minOccurs="0" 
               maxOccurs="unbounded">
                  <xs:element name="b"/>
                  <xs:element name="emph"/>
                  <xs:element name="i"/>
               </xs:choice>
            </xs:complexType>
         </xs:element>
      </xs:sequence>
   </xs:complexType>
   <xs:complexType name="partType">
      <xs:sequence>
         <xs:element name="container" 
         minOccurs="0">
             <xs:complexType>
               <xs:simpleContent>
                  <xs:extension base="xs:string">
                     <xs:attribute name="type" type="xs:string"/>
                  </xs:extension>
               </xs:simpleContent>
            </xs:complexType>
         </xs:element>
         <xs:element name="title" type="titleType" 
         minOccurs="0"/>
         <xs:element name="introduction" 
         type="introductionType" 
         minOccurs="0"/>
         <xs:element name="chapter" type="chapterType" 
         minOccurs="0" maxOccurs="unbounded"/>
         <xs:element name="component" 
         type="componentType" 
         minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="number" type="xs:string"/>
   </xs:complexType>
   <xs:complexType name="chapterType">
      <xs:sequence>
         <xs:element name="title" type="xs:string"/>
         <xs:element name="p" type="anyXMLTextType" 
         maxOccurs="unbounded"/>
      </xs:sequence>
   </xs:complexType>
   <xs:complexType name="footnoteType" mixed="true">
      <xs:sequence>
         <xs:element name="citation"
         minOccurs="0" maxOccurs="unbounded">
            <xs:complexType>
               <xs:simpleContent>
                  <xs:extension base="xs:string">
                     <xs:attribute name="url" type="xs:anyURI"/>
                  </xs:extension>
               </xs:simpleContent>
            </xs:complexType>
         </xs:element>
      </xs:sequence>
   </xs:complexType>
   <xs:complexType name="componentType">
      <xs:sequence>
         <xs:element name="container" 
         minOccurs="0">
            <xs:complexType>
               <xs:simpleContent>
                  <xs:extension base="xs:string">
                     <xs:attribute name="type" type="xs:string"/>
                  </xs:extension>
               </xs:simpleContent>
            </xs:complexType>
         </xs:element>
         <xs:element name="componentTitle" 
         type="componentTitleType"/>
         <xs:element name="subComponent" 
         type="subComponentType" 
         minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
   </xs:complexType>
   <xs:complexType name="titleType" mixed="true">
      <xs:all minOccurs="0">
         <xs:element name="date">
            <xs:complexType>
               <xs:simpleContent>
                  <xs:extension base="xs:string">
                     <xs:attribute name="normalize" 
                     type="xs:string"/>
                  </xs:extension>
               </xs:simpleContent>
            </xs:complexType>
         </xs:element>
      </xs:all>
   </xs:complexType>
   <xs:complexType name="componentTitleType" 
   mixed="true">
      <xs:sequence>
         <xs:element name="componentDate" 
         minOccurs="0">
            <xs:complexType>
               <xs:simpleContent>
                  <xs:extension base="xs:string">
                     <xs:attribute name="normalize" 
                     type="xs:string"
                      use="optional"/>
                  </xs:extension>
               </xs:simpleContent>
            </xs:complexType> 
         </xs:element>
      </xs:sequence>
   </xs:complexType>
   <xs:complexType name="subComponentType">
      <xs:sequence>
         <xs:element name="container" 
         minOccurs="0">
            <xs:complexType>
               <xs:simpleContent>
                  <xs:extension base="xs:string">
                     <xs:attribute name="type" type="xs:string"/>
                  </xs:extension>
               </xs:simpleContent>
            </xs:complexType> 
        </xs:element>
        <xs:element name="componentTitle" 
        type="componentTitleType"/>
        <xs:element name="subsubComponent" 
        type="subSubComponentType" 
        minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
   </xs:complexType>
   <xs:complexType name="subSubComponentType">
      <xs:sequence>
         <xs:element name="container" 
         minOccurs="0">
           <xs:complexType>
              <xs:simpleContent>
                 <xs:extension base="xs:string">
                    <xs:attribute name="type" type="xs:string"/>
                 </xs:extension>
              </xs:simpleContent>
           </xs:complexType> 
        </xs:element>
        <xs:element name="componentTitle" 
        type="componentTitleType"/>
      </xs:sequence>
   </xs:complexType>
</xs:schema>
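
The subjects element in the schema above carries an xml:lang attribute, which belongs to the XML namespace imported at the top of the schema. As an illustration only, the following Python sketch groups subject values by language with the standard xml.etree.ElementTree module. The function subjects_by_language is hypothetical, and the fragment is a shortened copy of the metadata found in the sample data.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the reserved "xml" prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Shortened metadata fragment modeled on book 1 of the sample data.
METADATA = """
<metadata>
  <subjects xml:lang="en">
    <subject>Usability testing</subject>
    <subject>Web site usability</subject>
  </subjects>
  <subjects xml:lang="fr">
    <subject>Tests d'ergonomie</subject>
    <subject>Ergonomie de site web</subject>
  </subjects>
</metadata>
"""

def subjects_by_language(xml_text):
    """Map each xml:lang value to the list of subject strings it labels."""
    root = ET.fromstring(xml_text)
    result = {}
    for group in root.findall("subjects"):
        # ElementTree exposes xml:lang under its namespace-qualified name.
        lang = group.get("{%s}lang" % XML_NS)
        result.setdefault(lang, []).extend(
            s.text for s in group.findall("subject"))
    return result

by_lang = subjects_by_language(METADATA)
print(sorted(by_lang))
```

A query restricted to one language could then search only the subject values listed under that language.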

1.5 Sample Data

The data consists of a collection of three books. Two are primarily instructive text. The third is a guide to a manuscript collection. All contain metadata and full text.

The sample data binds to this URL: "http://bstore1.example.com/full-text.html".

<books>
<book number="1">
   <metadata>
      <title shortTitle="Improving Web Site Usability">Improving 
      the Usability of a Web Site Through Expert Reviews and 
      Usability Testing</title>
      <author>Millicent Marigold</author>
      <author>Montana Marigold</author>
      <publicationInfo>
         <place>New York</place>
         <publisher>Ersatz Publications</publisher>
         <dateIssued>2001</dateIssued>
         <dateRevised>2002</dateRevised>
      </publicationInfo>
      <price>25.99</price>   
      <subjects xml:lang="en">
         <subject>Usability testing</subject>
         <subject>Web site development</subject>        
         <subject>Heuristic evaluation</subject>
         <subject>Cognitive walk-through</subject>
         <subject>Web site usability</subject>
      </subjects>
      <subjects xml:lang="fr">        
         <subject>Tests d'ergonomie</subject>
         <subject>Développement de site web</subject>        
         <subject>Évaluation heuristique</subject>
         <subject>Parcours cognitif</subject>
         <subject>Ergonomie de site web</subject>     
      </subjects>
      <subjects xml:lang="zh">    
         <subject>可用性测试</subject>
         <subject>网站建置</subject>        
         <subject>启发式评价</subject>
         <subject>认知推演</subject>
         <subject>网站可用性</subject>
      </subjects>
   </metadata>
   <content>
      <introduction>
         <author>Elina Rose</author>
         <p>The usability of a Web site is how well the 
         site supports the user in achieving specified 
         goals. A Web site should facilitate learning, 
         and enable efficient and effective task 
         completion, while propagating few errors. 
         Satisfaction with the site is also important. 
         The user must not only be well-served, but must 
         feel well-served.</p> 
         <p>Expert reviews and usability testing are 
         methods of identifying problems in layout, 
         terminology, and navigation before they frustrate
         users and drive them away from your site.</p>
         <p>The most successful projects employ multiple 
         methods in multiple iterations. As Millicent 
         Marigold remarked during a recent conference, 
         "Don't stop. Iterate, iterate, then iterate 
         again."</p>
         <p>This book has been approved by the Web Site 
         Users Association.</p>
      </introduction>
      <part number="1">
         <title>Expert Reviews</title>
         <introduction>
            <p>Expert reviewers identify problems 
            and recommend changes to web sites based 
            on research in human computer interaction 
            and their experience in the field.</p> 
            <p>Two expert review methods are discussed 
            here. They are heuristic evaluation and 
            cognitive walk-through.</p> 
            <p>Expert review methods should be 
            initiated early in the development process, 
            as soon as paper <b>p</b>rototypes 
            (hand-drawn pictures of Web pages) or 
            <b>w</b>ireframes (electronic mockups) are 
            available. They should be conducted using 
            the hardware and software similar to that 
            employed by users.</p>
         </introduction>
         <chapter>
            <title>Heuristic Evaluation</title>
            <p>Expert reviewers critique an interface to 
            determine conformance with recognized 
            usability principles. <footnote>One of the 
            best known lists of heuristics is <citation 
            url="http://www.useit.com/papers/heuristic
            /heuristic_list.html">Ten Usability 
            Heuristics by Jacob Nielson</citation>. Another 
            is <citation url="http://usability.gov
            /guidelines/index.html"> Research-Based Web 
            Design and Usability Guidelines</citation>
            </footnote></p> 
         </chapter>
         <chapter>
            <title>Cognitive Walk-Through</title>
            <p>Expert reviewers evaluate Web site 
            understandability and ease of learning while 
            performing specified tasks. They walk through 
            the site answering questions such as "Would a 
            user know by looking at the screen how to 
            complete the first step of the task?" and "If 
            the user completed the first step, would the 
            user know what to do next?," with the goal of 
            identifying any obstacles to completing the 
            task and assessing whether the user would 
            cognitively be aware that he was successful in 
            completing a step in the process.</p>
         </chapter>
      </part>
      <part number="2">
         <chapter>
            <title>Usability Testing</title>
            <p>Once the problems identified by expert 
            reviews have been corrected, it is time to 
            conduct some tests of the site with your unique 
            audience or audiences by conducting usability 
            testing.</p>
            <p>Users are asked to complete tasks which 
            measure the success of the information 
            architecture and navigational elements of the 
            site.</p>
            <p>Then changes are made to improve service to 
            users.</p>
         </chapter>
      </part> 
   </content>
</book>

<book number="2">
   <metadata>
      <title shortTitle="Usability Basics">Usability 
      Basics: How to Plan for and Conduct Usability Tests 
      on Web Site Thereby Improving the Usability of Your 
      Web Site</title>
      <publicationInfo>
         <place>New York</place>
         <publisher>Ersatz Publications</publisher>
         <publisher>Electronic BookWorks</publisher>         
         <dateIssued>2000</dateIssued>
         <dateRevised>2001</dateRevised>
      </publicationInfo>
      <price>174.00</price>   
      <subjects xml:lang="en">
         <subject>Usability testing</subject>
         <subject>Web site development</subject>
         <subject>Guides and finding aids</subject>
      </subjects>
      <subjects xml:lang="fr">
         <subject>Tests d'ergonomie</subject>
         <subject>Développement de site web</subject>
         <subject>Guides et outils de recherche</subject>
      </subjects>
      <subjects xml:lang="zh">
         <subject>可用性测试</subject>
         <subject>网站建置</subject>
         <subject>指南和检索工具</subject>
      </subjects>
   </metadata>
   <content>
      <introduction>
         <p>This is a basic handbook for planning and 
         conducting usability tests on Web sites. Usability 
         testing should be used in conjunction with other 
         expert review methods.</p>
         <p>This book has not been approved by the Web Site 
         Users Association.</p>
      </introduction>
      <part number="1">
         <chapter>
            <title>Planning then Conducting Usability 
            Tests</title> 
            <p>Take the following steps to plan usability 
            testing. <step number="1">Clarify and 
            articulate the goal of the usability testing.
            </step> <step number="2">Identify tasks which 
            are critical for users to be able to complete 
            successfully.</step> <step number="3">Compile 
            a script of questions or instructions which 
            will prompt the user to attempt those 
            tasks.</step> <step number="4">Identify your 
            users and begin recruiting them.</step> <step 
            number="5">Conduct a pretest on a few users.
            </step> <step number="6">Edit the script based 
            on insights gleaned from the pretest.</step> 
            <step number="7">Resume testing.</step></p>
         </chapter>
      </part>
      <part number="2">
         <chapter>
            <title>Conducting Usability Tests</title> 
            <p>Users can be tested at any computer 
            workstation <footnote>They may be most 
            comfortable at their own workstation.
            </footnote> or in a lab.</p>
            <p>Give the user the script, then assure them 
            that you are testing the Web site, not them. 
            Users are asked to verbalize their thoughts as 
            they complete the tasks. The event is recorded 
            or someone takes notes. It is often preferable 
            to have two testers, <footnote>Usability 
            testing can be done at great expense or on a 
            shoe string, using <testingProcedure>in-house 
            expertise</testingProcedure> or 
            <testingProcedure>contracting with human 
            computer interaction professionals
            </testingProcedure>.</footnote> one to ask the 
            questions, another to take notes. Testers should 
            offer no guidance or comments to the user. Mouse 
            movements, typing, expressions, and the user's 
            words should be recorded.</p>
         </chapter>
         <chapter>
            <title>Evaluating and Implementing Results</title> 
            <p>Compile the results and review collectively. 
            Make changes to the site to alleviate the problems 
            found in Web site components which were propagating 
            the largest number of or the most devastating errors. 
            Begin new iterations of testing and changes, until 
            users are successful in the accomplishing the 
            tasks.</p>
         </chapter>
      </part>
   </content>
</book>

<book number="3">
   <metadata>
      <title shortTitle="Usabilityguy Manuscript 
      Guide">John Wesley Usabilityguy: A Register of His 
      Papers</title>
      <author>Millicent Marigold</author>
      <author>Morty Marigold</author>
      <publicationInfo>  
         <place>Washington, D.C.</place>    
         <publisher>Ersatz Manuscript Library</publisher>
         <dateIssued>1998</dateIssued>
         <dateRevised>2002</dateRevised>
      </publicationInfo>
      <price>21.49</price>   
      <subjects xml:lang="en">
         <subject>Computers</subject>
         <subject>Software evaluation</subject>
         <subject>Usability testing</subject>
         <subject>Manuscript collections</subject>
      </subjects>
      <subjects xml:lang="fr">
         <subject>Ordinateurs</subject>
         <subject>Évaluation de logiciels</subject>
         <subject>Tests d'ergonomie</subject>
         <subject>Collections de manuscrits</subject>
      </subjects>
      <subjects xml:lang="zh">
         <subject>计算机</subject>
         <subject>软件评价</subject>
         <subject>可用性测试</subject>
         <subject>手稿专藏</subject>
      </subjects>
   </metadata>
   <content>
      <introduction>
         <p>The papers of John Wesley Usabilityguy span the 
         years 1946-2001, with the bulk of the items 
         concentrated in the period from 1985 to 2001. The 
         papers feature his career as a developer of software 
         applications and usability specialist. The collection 
         consists of correspondence, memoranda, journals, 
         speeches, article drafts, book drafts, notes, charts, 
         graphs, family papers, clippings, printed matter, 
         photographs, résumés and other materials.</p>
      </introduction>
      <part number="1"><container type="box">1-12</container>
         <title>Subject File, <date normalize="1930/1974">
         1930-1974</date></title>
         <introduction>
            <p>Correspondence, telegrams, memoranda, journals, 
            logs, testimony, approved travel orders, invitations, 
            charts, graphs, forms, biographical data, photographs, 
            book drafts, clippings and other printed matter, 
            résumés and miscellaneous material. Organized by 
            name of person or organization, topic, or type of 
            material.</p>
         </introduction>
         <component><container type="box">1</container>
           <componentTitle>Computers</componentTitle>
           <subComponent>
              <componentTitle>Software, 
              <componentDate normalize="1946/1947">1946-1947
              </componentDate>
              </componentTitle>
           </subComponent>
           <subComponent>
              <componentTitle>Human Computer Interaction 
              research, <componentDate normalize="1945/1952">
              1945-1952</componentDate>
              </componentTitle>
              <subsubComponent>
                 <componentTitle>Flow diagram, 
                 <componentDate normalize="1950">1950
                 </componentDate>
                 </componentTitle>
              </subsubComponent>
              <subsubComponent>
                 <componentTitle>General, 
                 <componentDate normalize="1947/1951">1947-1951
                 </componentDate>
                 </componentTitle>
              </subsubComponent>
              <subsubComponent><container type="box">2</container>
                 <componentTitle>Eye Movement research,
                 <componentDate normalize="1949/1950">1949-1950
                 </componentDate>
                 </componentTitle>
              </subsubComponent> 
              <subsubComponent>
                 <componentTitle>User profiling, 
                 <componentDate normalize="1950/1959">1950s
                 </componentDate>
                 </componentTitle>
              </subsubComponent>
            </subComponent>
         </component>
         <component>
           <componentTitle>Web User Appreciation Award, 
           <componentDate normalize="1956">1956</componentDate>
           </componentTitle>
         </component>
      </part>
      <part number="2"><container type="box">3-5</container>
         <title>Writings File, 
         <date normalize="1985/1999">1985-1999</date>
         </title>
         <introduction>
            <p>Correspondence, articles, book drafts, notes, 
            contracts, clippings, and printed matter. Arranged 
            alphabetically by type (articles, books, reports, 
            and miscellaneous) and therein alphabetically by 
            type of material, subject, or title.</p>
         </introduction>
         <component>
            <componentTitle>Writings by Usabilityguy
            </componentTitle>
            <subComponent>
               <componentTitle><componentDate normalize="1996">
               1996</componentDate>
               </componentTitle> 
               <subsubComponent>
                  <componentTitle>"How Many Users Are Enough 
                  for User Testing?"</componentTitle>
               </subsubComponent> 
               <subsubComponent>
                  <componentTitle>"How to Evaluate Results from 
                  User Tests."</componentTitle>
               </subsubComponent>
               <subsubComponent>
                  <container type="box">5</container>
                  <componentTitle>"When Are You Done Testing?"
                  </componentTitle>
               </subsubComponent>
               <subsubComponent>
                  <componentTitle>"Do-It-Yourself User Testing"
                  </componentTitle>
               </subsubComponent> 
            </subComponent>
         </component>
         <component>
            <componentTitle>Charitable Contributions
            </componentTitle> 
            <subComponent>
               <componentTitle>Diseases: AIDS, Hepatitis, 
               Tuberculosis <componentDate normalize=
               "1990/1999">1990-1999</componentDate>
               </componentTitle>
            </subComponent> 
            <subComponent>
               <componentTitle>Environmental Conservation: 
               Rivers <componentDate normalize="1995">1995
               </componentDate>
               </componentTitle>
            </subComponent>
         </component>
      </part>
   </content>
</book>
</books>

2 Use Case "ELEMENT": Queries on XML Elements with Simple Content

2.1 Description

These use cases query words and phrases in XML elements with simple content.

These use cases begin with the simplest queries possible: a word or phrase in an element with simple content and no descendants. One of these queries is on Chinese characters. Some queries return additional or different elements than were queried. One query returns the entire book in which the match is found. Others find a phrase only when it starts an element, or find an exact phrase when it is the entire content of an element, allowing full-text variations such as case, diacritics, and wildcards.

2.2 Queries and Results

2.2.1 Q1 Word Query in an Element

Find all book titles containing the word "usability".

This query finds a word in an element.

  • Operands: "usability"

  • Functionality: word query

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./metadata/title

  • Return: ./metadata/title

  • Comments: This is the simplest query possible, a query on a word in an element. This query does not employ wildcards, stemming, or thesaurus support. While this query finds useful results in the sample data, many queries such as one on the word "test" would not. A query on the word "test" would return no results, missing the word variants which exist in the sample data: "pretest", "tested", "testers", "testimony", "testing", and "tests".

Solution in XQuery:

doc("http://bstore1.example.com/full-text.xml")
   /books/book/metadata/title[. ftcontains "usability"]

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books/book/metadata/title[. ftcontains "usability"]

Expected Result:

<title shortTitle="Improving Web Site Usability">Improving 
the Usability of a Web Site Through Expert Reviews 
and Usability Testing</title>

<title shortTitle="Usability Basics">Usability 
Basics: How to Plan for and Conduct Usability Tests 
on Web Site Thereby Improving the Usability of Your 
Web Site</title>
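
The word-level matching described in the Comments for this query can be illustrated outside XQuery. The following is a minimal Python sketch, not part of the Full-Text language: the `tokenize` and `contains_word` helpers are hypothetical, and the tokenizer (lowercased runs of word characters) is an assumption chosen only for illustration. It shows why a query on "test" misses the variants listed above: a word query compares whole tokens, not substrings.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (an assumed, simplified tokenizer)."""
    return re.findall(r"\w+", text.lower())

def contains_word(text, word):
    """True if `word` equals one whole token of `text`, ignoring case."""
    return word.lower() in tokenize(text)

# Title text abridged from Book 2 in the sample data.
title = "Usability Basics: How to Plan for and Conduct Usability Tests"
# Word variants named in the Comments above.
variants = ["pretest", "tested", "testers", "testimony", "testing", "tests"]

# Whole-token match on "usability".
print(contains_word(title, "usability"))
# Variants that a plain word query on "test" would find.
print([v for v in variants if contains_word(v, "test")])
```

Finding the variants would require stemming or wildcards, which this query deliberately does not use.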

2.2.2 Q2 Phrase Query in an Element

Find all book subjects containing the phrase "usability testing".

This query finds a phrase in an element.

  • Operands: "usability testing"

  • Functionality: phrase query

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./metadata/subjects/subject

  • Return: ./metadata/subjects/subject

  • Comments: This is a simple query on a phrase in an element. Like an ordered distance query allowing no intervening words, the words in this phrase query must be adjacent to each other and must appear in the order specified. While this query finds useful results in the sample data, many queries such as one on "software developer" would not. A query on the phrase "software developer" would return no results, missing "developer of software" which exists in the sample data.

Solution in XQuery:

doc("http://bstore1.example.com/full-text.xml")
   /books/book/metadata/subjects/subject[. ftcontains 
   "usability testing"]

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books/book/metadata/subjects/subject[. ftcontains 
"usability testing"]

Expected Result:

<subject>Usability testing</subject>
                                                
<subject>Usability testing</subject>

<subject>Usability testing</subject>
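
The adjacency and ordering requirement described in the Comments for this query can be sketched in Python. This is an illustration under assumptions, not the Full-Text semantics themselves: `tokenize` and `contains_phrase` are hypothetical helpers, and tokens are taken to be lowercased runs of word characters.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (an assumed, simplified tokenizer)."""
    return re.findall(r"\w+", text.lower())

def contains_phrase(text, phrase):
    """True if the phrase tokens occur adjacently and in order within the text."""
    tokens, wanted = tokenize(text), tokenize(phrase)
    n = len(wanted)
    return n > 0 and any(
        tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)
    )

# A subject element from the sample data.
print(contains_phrase("Usability testing", "usability testing"))
# Text abridged from the Book 3 introduction: same words, different order.
print(contains_phrase("his career as a developer of software applications",
                      "software developer"))
```

The second call does not match because "developer" and "software" are neither adjacent nor in the queried order.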

2.2.3 Q3 Phrase Query on Chinese Characters in an Element

Find all book subjects containing the phrase (n-gram) "网站".

This query finds a phrase (n-gram) in an element.

  • Operands: "网站"

  • Functionality: phrase query, language qualifier

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./metadata/subjects/subject

  • Return: ./metadata/subjects/subject

  • Comments: This query finds a phrase (n-gram) consisting of two Chinese characters. It assumes a specific language dependent tokenization.

Solution in XQuery:

doc("http://bstore1.example.com/full-text.xml")
   /books/book/metadata/subjects/subject[. ftcontains 
   "网站" language "zh"]

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books/book/metadata/subjects/subject[. ftcontains 
"网站" language "zh"]

Expected Result:

<subject>网站建置</subject>  
                                            
<subject>网站可用性</subject>

<subject>网站建置</subject>
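
The Comments for this query note that it assumes a language-dependent tokenization. As a rough Python illustration, the sketch below assumes one token per Chinese character, which is only one possible tokenization for "zh"; the `char_tokens` and `contains_ngram` helpers are hypothetical. Under that assumption the two-character n-gram is matched as two adjacent tokens.

```python
def char_tokens(text):
    """One token per non-space character (an assumed tokenization for 'zh')."""
    return [ch for ch in text if not ch.isspace()]

def contains_ngram(text, ngram):
    """True if the n-gram's characters occur adjacently and in order."""
    tokens, wanted = char_tokens(text), char_tokens(ngram)
    n = len(wanted)
    return n > 0 and any(
        tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)
    )

# The Chinese subjects of Book 1 in the sample data.
subjects = ["可用性测试", "网站建置", "启发式评价", "认知推演", "网站可用性"]
matches = [s for s in subjects if contains_ngram(s, "网站")]
print(matches)
```

A tokenizer that segments Chinese text into multi-character words could give different results, which is why the query names the language explicitly.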

2.2.4 Q4 Query in Different Elements

Find all books with "usability tests" in book or chapter titles.

This query finds a phrase in different elements.

  • Operands: "usability tests"

  • Functionality: phrase query

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./metadata/title, ./content/part/chapter/title

  • Return: .

  • Comments: This query is an example of a query in two different elements.

Solution in XQuery:

for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
let $title := $book/metadata/title[. ftcontains "usability tests"] 
   | $book/content/part/chapter/title[. ftcontains "usability tests"] 
where count($title) > 0
return $book

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books/book[count(./metadata/title[. ftcontains "usability tests"] 
| ./content/part/chapter/title[. ftcontains "usability tests"])>0]

Expected Result:

<book number="2">
   <metadata>
      ...
      <title shortTitle="Usability Basics">Usability 
      Basics: How to Plan for and Conduct Usability Tests 
      on Web Site Thereby Improving the Usability of 
      Your Web Site</title> 
      ...
   </metadata>
   <content>
      ...
      <part number="1">
         <chapter>
            <title>Planning then Conducting Usability
            Tests</title> 
            ...
         </chapter>
      </part>
      <part number="2">
         <chapter>
            <title>Conducting Usability Tests</title>
            ...
         </chapter>
      </part>
      ...
   </content>
</book>

2.2.5 Q5 Query in an Element Returning Different Elements

Find all books with the phrase "usability testing" in some subject.

This query finds a phrase in an element and returns different elements from the same document.

  • Operands: "usability testing"

  • Functionality: phrase query

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./metadata/subjects/subject

  • Return: ./metadata/title, ./metadata/author

  • Comments: This query queries the subject element, but does not return it. It returns two different elements.

Solution in XQuery:

for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
where $book//subject ftcontains "usability testing"
return $book/metadata/(title|author)

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books/book[./metadata/subjects/subject 
ftcontains "usability testing"]/metadata/(title|author)

Expected Result:

<title shortTitle="Improving Web Site Usability">Improving 
the Usability of a Web Site Through Expert Reviews 
and Usability Testing</title> 
<author>Millicent Marigold</author>     
<author>Montana Marigold</author> 
   
<title shortTitle="Usability Basics">Usability 
Basics: How to Plan for and Conduct Usability Tests 
on Web Site Thereby Improving the Usability of Your 
Web Site</title>
   
<title shortTitle="Usabilityguy Manuscript 
Guide">John Wesley Usabilityguy: A Register of His 
Papers</title>
<author>Millicent Marigold</author>
<author>Morty Marigold</author>

2.2.6 Q6 Starts-with Query

Find all book titles which start with "improving" followed within 2 words by "usability".

This query finds an element which starts with specific words.

  • Operands: "improving" "usability"

  • Functionality: word queries, ordered distance (0 to 2 intervening words), starts-with functionality

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./metadata/title

  • Return: ./metadata/title

  • Comments: The starts-with functionality restricts the query to the first words or phrase in an element. It is especially useful in querying journal titles (e.g., Journal of Psychology) in large library collections. This query does not find Book 2 which contains the phrase "improving the usability" in the title element, because the title element does not start with "improving" followed within 2 words by "usability".

Solution in XQuery:

for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
let $title := $book/metadata/title[. ftcontains 
"improving" && "usability" distance at 
most 2 words ordered at start]
where count($title)>0
return $title

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books/book/metadata/title[. ftcontains 
"improving" && "usability" distance at 
most 2 words ordered at start]

Expected Result:

<title shortTitle="Improving Web Site Usability">Improving 
the Usability of a Web Site Through Expert Reviews and 
Usability Testing</title>
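
One simplified reading of the starts-with and ordered-distance conditions can be sketched in Python. The helper names below are hypothetical, the tokenizer is an assumption, and the sketch treats "at start" as "the first operand is the first token of the element". It is meant only to show why Book 1 matches and Book 2 does not.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (an assumed, simplified tokenizer)."""
    return re.findall(r"\w+", text.lower())

def starts_with_within(text, first, second, max_gap):
    """True if `text` starts with `first` and `second` follows it with at
    most `max_gap` intervening words."""
    tokens = tokenize(text)
    if not tokens or tokens[0] != first.lower():
        return False
    # A token at position i has i - 1 words between it and position 0.
    return any(
        tok == second.lower() and i - 1 <= max_gap
        for i, tok in enumerate(tokens)
        if i > 0
    )

# Title texts from Book 1 and Book 2 in the sample data.
title1 = ("Improving the Usability of a Web Site Through Expert Reviews "
          "and Usability Testing")
title2 = ("Usability Basics: How to Plan for and Conduct Usability Tests "
          "on Web Site Thereby Improving the Usability of Your Web Site")

print(starts_with_within(title1, "improving", "usability", 2))
print(starts_with_within(title2, "improving", "usability", 2))
```

Book 2 contains "Improving the Usability", but not at the start of the title, so the second call rejects it.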

2.2.7 Q7 Entire Element Content Query

Find all books with the entire title "improve the usability of a web site through expert reviews and usability testing", allowing any form of the word "improve".

This query finds the phrase when it is the entire content of an element.

  • Operands: "improve the usability of a web site through expert reviews and usability testing"

  • Functionality: phrase query, character wildcard (suffix) (0 or more), entire element content functionality

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./metadata/title

  • Return: ./metadata/title

  • Comments: This query insists that the element contains the entire phrase being queried, no more and no less. It allows full-text variations, such as case, diacritics, and wildcards.

Solution in XQuery:

for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
let $exactTitle := $book/metadata/title[. ftcontains 
   "improv.* the usability of a web site through expert 
   reviews and usability testing" with wildcards entire content]
where count($exactTitle)>0
return $exactTitle

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books/book/metadata/title[. ftcontains 
"improv.* the usability of a web site through expert 
reviews and usability testing" with wildcards entire content]

Expected Result:

<title shortTitle="Improving Web Site Usability">Improving 
the Usability of a Web Site Through Expert Reviews and 
Usability Testing</title>
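
The "no more and no less" requirement, combined with a suffix wildcard, can be approximated in Python. This sketch rests on assumptions: the helper `matches_entire_content` is hypothetical, tokens are lowercased runs of word characters, and the ".*" wildcard is mapped onto a regular expression applied to a single token.

```python
import re

def matches_entire_content(text, query):
    """True if every token of `text` is matched, in order, by the
    corresponding query token, with no tokens left over on either side."""
    tokens = re.findall(r"\w+", text.lower())
    # Escape each query token, then restore ".*" as a within-token wildcard.
    patterns = [
        re.escape(q).replace(r"\.\*", ".*") for q in query.lower().split()
    ]
    return len(tokens) == len(patterns) and all(
        re.fullmatch(p, t) for p, t in zip(patterns, tokens)
    )

query = ("improv.* the usability of a web site through expert "
         "reviews and usability testing")
# Title text from Book 1 in the sample data.
title1 = ("Improving the Usability of a Web Site Through Expert Reviews "
          "and Usability Testing")

print(matches_entire_content(title1, query))
print(matches_entire_content(title1 + " Today", query))
```

The second call fails because the element would contain one token more than the query, which is exactly what the entire-content condition rules out.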

3 Use Case "ACROSS": Queries Across XML Element Boundaries

3.1 Description

These use cases by default query across XML element boundaries.

Boundaries include XML tags: Start-Tags, End-Tags, and Empty-Element Tags. Before the query is evaluated, tokenization removes descendant XML tags and attribute values from the string to be queried. At the XQuery Data Model level, tags are a syntactic element.

For queries on an element which do not query some or all of its descendant elements, see Section 14 (IGNORE).
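
The effect of removing tags and attribute values before tokenization can be shown with a small Python sketch. It is an illustration under assumptions, not the tokenization defined by the Full-Text language: `string_value_tokens` is a hypothetical helper, the XML fragments are abridged from the sample data, and the tokenizer is a simple split on word characters.

```python
import re
import xml.etree.ElementTree as ET

def string_value_tokens(element):
    """Concatenate all descendant text (tags and attributes are dropped),
    then split the result into lowercase word tokens."""
    text = "".join(element.itertext())
    return re.findall(r"\w+", text.lower())

# Fragments abridged from Book 1 in the sample data.
intro = ET.fromstring(
    "<p>as soon as paper <b>p</b>rototypes (hand-drawn pictures of "
    "Web pages) or <b>w</b>ireframes (electronic mockups) are available.</p>"
)
footnote = ET.fromstring(
    "<footnote>One of the best known lists of heuristics is <citation "
    'url="http://www.useit.com/papers/heuristic/heuristic_list.html">Ten '
    "Usability Heuristics by Jacob Nielson</citation>.</footnote>"
)

intro_tokens = string_value_tokens(intro)
footnote_tokens = string_value_tokens(footnote)

# The <b> tags do not split the word they highlight.
print("prototypes" in intro_tokens)
# Attribute values are not part of the queried text.
print("useit" in footnote_tokens)
```

In this sketch "prototypes" is a single token even though its first letter sits inside a `b` element, and nothing from the `url` attribute reaches the token list.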

3.2 Queries and Results

3.2.1 Q1 Query Across Descendant Elements (No Element Content)

Find all book chapters containing the phrase "one of the best known lists of heuristics is Ten Usability Heuristics".

This query crosses element boundaries.

  • Operands: "one of the best known lists of heuristics is Ten Usability Heuristics"

  • Functionality: phrase query

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./content/part/chapter

  • Return: .

  • Comments: Querying across element boundaries is similar to an XQuery and XPath character string function converting the sub-tree under an element into a string by removing all markup. The citation element tags and its attribute have been removed by tokenization.

Solution in XQuery:

for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
let $chap := $book//chapter[. ftcontains 
   "one of the best known lists of heuristics is 
   Ten Usability Heuristics"]
where count($chap) > 0
return $book

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books/book[.//chapter ftcontains "one of 
the best known lists of heuristics is Ten Usability 
Heuristics"]

Expected Result:

<book number="1">
   <metadata>
      ...
   </metadata>
   <content>
      ...
      <part number="1">
         ...
         <chapter> 
            <title>Heuristic Evaluation</title> 
            <p>Expert reviewers critique an interface to
            determine conformance with recognized 
            usability principles. <footnote>One of the
            best known lists of heuristics is <citation
            url="http://www.useit.com/papers/heuristic
            /heuristic_list.html"> Ten Usability 
            Heuristics by Jacob Nielson</citation>. Another
            is <citation url="http://usability.gov
            /guidelines/index.html"> Research-Based Web
            Design and Usability Guidelines</citation>
            </footnote></p> 
         </chapter>
         ...
      </part>
      ...
   </content>
</book>

3.2.2 Q2 Query Across Descendant Elements (Highlighting Tags)

Find all part introductions containing the word "prototypes".

This query crosses element boundaries.

  • Operands: "prototypes"

  • Functionality: word query

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./content/part/introduction

  • Return: .

  • Comments: Querying across element boundaries is similar to an XQuery and XPath character string function converting the sub-tree under an element into a string by removing all markup. The bold element tags have been removed by tokenization.

Solution in XQuery:

for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
let $intro := $book/content/part/introduction[. ftcontains 
   "prototypes"]
where count($intro)>0
return $book

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books/book[./content/part/introduction ftcontains 
"prototypes"]

Expected Result:

<book number="1"> 
   <metadata>
      ...
   </metadata>
   <content>
      ...
      <part number="1">
         <introduction>
            <p>Expert review methods should be
            initiated early in the development process, 
            as soon as paper <b>p</b>rototypes
            (hand-drawn pictures of Web pages) or
            <b>w</b>ireframes (electronic mockups) are
            available. They should be conducted using
            the hardware and software similar to that 
            employed by users.</p>
         </introduction>
         ...
      </part>
      ...
   </content>  
</book>

3.2.3 Q3 Query Across Descendant Elements (Substantive Tags)

Find all book text with the word "tests".

This query finds a word in an element and its descendants.

  • Operands: "tests"

  • Functionality: word query

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./content

  • Return: .

  • Comments: Querying across element boundaries is similar to an XQuery and XPath character string function converting the sub-tree under an element into a string by removing all markup. Element tags and names have been removed by tokenization, including part, chapter, title, p, component, and componentTitle tags.

Solution in XQuery:

for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
let $cont := $book/content[. ftcontains "tests"]
where count($cont)>0
return $book

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books/book[./content ftcontains "tests"]

Expected Result:

<book number="1">
   <metadata>
      ...
   </metadata>
   <content>
      ...
      <part number="2">
         <chapter>
            <title>Usability Testing</title>
            <p>Once the problems identified by expert 
            reviews have been corrected, it is time to 
            conduct some tests of the site with your unique 
            audience or audiences by conducting usability 
            testing.</p>
            ...
         </chapter>
      </part> 
      ...
   </content>
</book>      
   
<book number="2">
   <metadata>
      ...
   </metadata>
   <content>
      <introduction>
         <p>This is a basic handbook for planning and 
         conducting usability tests on Web sites. Usability 
         testing should be used in conjunction with other 
         expert review methods.</p>
          ...
      </introduction>
      <part number="1">
         <chapter>
            <title>Planning then Conducting Usability 
            Tests</title>   
            ...
         </chapter>
      </part>   
      ...
   </content>
</book>   

<book number="3">
   <metadata>
      ...
   </metadata>     
   <content>
      ...
      <component>
         <componentTitle>Writings by Usabilityguy
         </componentTitle>
         <subComponent>
           <componentTitle><componentDate normalize="1996">
           1996</componentDate>
           </componentTitle> 
           ...
           <subsubComponent>
           <componentTitle>"How to Evaluate Results from 
           User Tests."</componentTitle>
           </subsubComponent>
         </subComponent>
         ...
      </component>
      ...
   </content>
</book>

3.2.4 Q4 Query Across Siblings

Find all book text with the phrase "usability testing once the problems".

This query finds a phrase which begins in one element and ends in a sibling.

  • Operands: "usability testing once the problems"

  • Functionality: phrase query

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./content

  • Return: .

  • Comments: Querying across element boundaries is similar to an XQuery and XPath character string function converting the sub-tree under an element into a string by removing all markup. Element tags and names have been removed by tokenization, including title and p tags.

Solution in XQuery:

for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
let $cont := $book//content[. ftcontains 
   "usability testing once the problems"]
where count($cont)>0
return $book

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books/book[.//content ftcontains "usability 
testing once the problems"]

Expected Result:

<book number="1">
   <metadata>
      ...
   </metadata> 
   <content>
      ...   
      <part number="2">
         <chapter>
            <title>Usability Testing</title>
            <p>Once the problems identified by expert 
            reviews have been corrected, it is time to 
            conduct some tests of the site with your unique 
            audience or audiences by conducting usability 
            testing.</p>
            ...
         </chapter>
      </part> 
   </content>
</book>

3.2.5 Q5 Query in Different Sub-Trees

Find all books with the word "identify" in book introductions and part introductions.

This query finds a word in an element in different sub-trees.

  • Operands: "identify"

  • Functionality: word query, character wildcard (suffix) (0 or more)

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./content/introduction, ./content/part/introduction

  • Return: .

  • Comments: This query looks for a word in multiple instances of the introduction element which appear as a child of the content or part elements.
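
As a rough illustration of testing two different sub-trees of the same book, the Python sketch below checks both introduction paths separately and requires a match in each. It is not one of the use case solutions: the book fragment is a simplified stand-in for the sample data, the helper `has_match` is hypothetical, and a regular expression stands in for the "identif.*" wildcard operand.

```python
import re
import xml.etree.ElementTree as ET

# Simplified stand-in for one book of the sample data (an assumption).
BOOK = ET.fromstring("""
<book number="1">
  <content>
    <introduction>
      <p>Expert reviews and usability testing are methods of identifying problems.</p>
    </introduction>
    <part number="1">
      <introduction>
        <p>Expert reviewers identify problems and recommend changes.</p>
      </introduction>
    </part>
  </content>
</book>
""")

# Same shape as the "identif.*" wildcard operand; each word must match in full.
PATTERN = re.compile(r"identif.*")

def has_match(elements):
    """True if any word under any of the elements matches PATTERN in full."""
    for element in elements:
        words = re.findall(r"\w+", " ".join(element.itertext()).lower())
        if any(PATTERN.fullmatch(word) for word in words):
            return True
    return False

book_intro = BOOK.findall("./content/introduction")
part_intro = BOOK.findall("./content/part/introduction")

# The book qualifies only if both sub-trees contain a matching word.
selected = has_match(book_intro) and has_match(part_intro)
```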

Solution in XQuery:

for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
let $bi := $book/content/introduction[./p ftcontains 
   "identif.*" with wildcards]
let $pi := $book/content/part/introduction[./p ftcontains 
   "identif.*" with wildcards]
where count($bi)>0 and count($pi)>0
return $book

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books/book[./content/introduction ftcontains 
"identif.*" with wildcards and ./content/part/introduction 
ftcontains "identif.*" with wildcards]

Expected Result:

<book number="1">  
   <metadata>
      ...
   </metadata>
   <content>                                          
      <introduction>
         ...
         <p>Expert reviews and usability testing are 
         methods of identifying problems in layout, 
         terminology, and navigation before they frustrate
         users and drive them away from your site.</p>
         ...
      </introduction>
      <part number="1">
         <title>Expert Reviews</title>    
         <introduction>
             <p>Expert reviewers identify problems 
             and recommend changes to web sites based 
             on research in human computer interaction 
             and their experience in the field.</p> 
             ...
         </introduction>
         ...
      </part> 
   </content>
</book> 

3.2.6 Q6 Query on Entire Document

Find all books if any one contains the word "mouse".

This query finds a word in a document (anywhere in the document), crossing all element boundaries.

  • Operands: "mouse"

  • Functionality: word query

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books

  • Query context: .

  • Return: .

  • Comments: It queries the root element and all its descendants. Querying across element boundaries is similar to an XQuery and XPath character string function converting the sub-tree under an element into a string by removing all markup. Element tags and their attributes have been removed by tokenization. This query looks for a word inside an entire document and returns the entire document if the word exists. It does not employ wildcards, stemming, or thesaurus support. It is similar to search engine queries that search a collection of documents and return a subset of the searched collection.

Solution in XQuery:

for $books in doc("http://bstore1.example.com/full-text.xml")
   /books
where $books ftcontains "mouse"
return $books

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books[. ftcontains "mouse"]

Expected Result:

<book number="1">
   <metadata>
      ...
   </metadata>
   <content>
      ...
   </content>
</book>

<book number="2">
   <metadata>
      ...
   </metadata>
   <content>
      ...
      <part number="2">
         <chapter>
            <title>Conducting Usability Tests</title> 
            ...
            <p>Give the user the script, then assure them 
            that you are testing the Web site, not them. 
            Users are asked to verbalize their thoughts as 
            they complete the tasks. The event is recorded 
            or someone takes notes. It is often preferable 
            to have two testers, <footnote>Usability 
            testing can be done at great expense or on a 
            shoe string, using <testingProcedure>in-house 
            expertise</testingProcedure> or 
            <testingProcedure>contracting with human 
            computer interaction professionals
            </testingProcedure>.</footnote> one to ask the 
            questions, another to take notes. Testers should 
            offer no guidance or comments to the user. Mouse 
            movements, typing, expressions, and the user's 
            words should be recorded.</p>
         </chapter>
         ...
      </part>
   </content>
</book>

<book number="3">
   <metadata>
      ...
   </metadata>
   <content>
      ...
   </content>
</book>

4 Use Case "OTHER": Queries on Attribute Values

4.1 Description

Unlike all the other use cases in this document, which query element content implicitly, these use cases query XML attribute values. Attribute values are not queried implicitly; they are queried explicitly.
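
The distinction can be seen in any XML API that separates an element's text content from its attributes. The following Python sketch uses the standard library's ElementTree as an analogy only; the title element is modelled on the sample data, and nothing here is part of the XQuery solutions.

```python
import xml.etree.ElementTree as ET

# Title element modelled on the sample data's shortTitle attribute.
title = ET.fromstring(
    '<title shortTitle="Improving Web Site Usability">'
    "Improving the Usability of a Web Site Through Expert Reviews "
    "and Usability Testing</title>"
)

# The text content of an element covers its text nodes only.
element_text = "".join(title.itertext())

# The attribute value has to be selected explicitly.
short_title = title.get("shortTitle")

# The short title does not occur in the element's text content.
short_title_in_text = short_title in element_text
```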

4.2 Queries and Results

4.2.1 Q1 Query on Attribute

Find all books with "improve" "web" "usability" in the short title.

This query finds multiple words in an attribute allowing word variants and allowing the words in any order with up to a specified number of intervening words.

  • Operands: "improve" "web" "usability"

  • Functionality: word queries, stemming, unordered distance (0 to 2 intervening words)

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./metadata/title/@shortTitle

  • Return: ./metadata/title

  • Comments: This query illustrates full-text querying in an attribute.
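
To make the combination of stemming and unordered distance concrete, the Python sketch below applies a simplified reading of it to the short title. The assumptions are significant: `toy_stem` is a deliberately crude, hypothetical stemmer (real stemming is implementation-defined), and distance is read as the number of intervening words between neighbouring matches. The second input sentence is invented for contrast.

```python
import re
from itertools import product

def toy_stem(word):
    """Very crude stemmer (an assumption); real stemming is implementation-defined."""
    for suffix in ("ing", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def within_distance(text, query_words, max_gap):
    """True if every query word occurs (after stemming), in any order, with at
    most max_gap intervening words between neighbouring matches."""
    stems = [toy_stem(w) for w in re.findall(r"\w+", text.lower())]
    positions = []
    for word in query_words:
        target = toy_stem(word.lower())
        hits = [i for i, s in enumerate(stems) if s == target]
        if not hits:
            return False
        positions.append(hits)
    # Try every choice of one position per query word.
    for combo in product(*positions):
        ordered = sorted(combo)
        if all(b - a - 1 <= max_gap for a, b in zip(ordered, ordered[1:])):
            return True
    return False

query = ["improve", "web", "usability"]

# shortTitle value from the sample data.
matches = within_distance("Improving Web Site Usability", query, 2)

# Invented counter-example: four words separate "Improving" and "web".
too_far = within_distance("Improving the overall quality of web usability", query, 2)
```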

Solution in XQuery:

for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
where $book/metadata/title/@shortTitle ftcontains "improve" 
   && "web" && "usability" with stemming distance at most 2 words    
return $book/metadata/title

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books/book[./metadata/title/@shortTitle ftcontains 
"improve" && "web" && "usability" with stemming  
distance at most 2 words]/metadata/title

Expected Result:

<title shortTitle="Improving Web 
Site Usability">Improving the Usability of a 
Web Site Through Expert Reviews and Usability Testing</title>

4.2.2 Q2 Query on Element and Attribute

Find all books with the phrase "manuscript guides" in the short title and the phrase "user profiling" in a component title.

This query finds a phrase in an attribute and a phrase in an element.

  • Operands: "manuscript guides" "user profiling"

  • Functionality: phrase queries, stemming, and query

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./metadata/title/@shortTitle, .//componentTitle

  • Return: data(./metadata/title/@shortTitle)

  • Comments: This query combines querying in an element with querying in an attribute.

Solution in XQuery:

for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
let $stitle := $book/metadata/title[./@shortTitle ftcontains 
   "manuscript guides" with stemming]
let $cont := $book//componentTitle[. ftcontains 
   "user profiling" with stemming]
where count($stitle)>0 and count($cont)>0
return data($book/metadata/title/@shortTitle)

Solution in XPath: None

Expected Result:

Usabilityguy Manuscript Guide

5 Use Case "WILDCARD": Character Wildcard Queries

5.1 Description

These use cases illustrate queries which use wildcards to append or insert a character or sequence of characters to a word or a part of a word. Character wildcards may be prefix (appended before the first character), infix (inserted into a word), or suffix (appended after the last character).

5.2 Queries and Results

5.2.1 Q1 One Character Suffix Wildcard Query

Find all books with the word "test" with a one character suffix in the text.

This query finds a word with a one character suffix (one character after the last character).

  • Operands: "test"

  • Functionality: word query, character wildcard (suffix) (1)

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./content

  • Return: .

  • Comments: This query finds "tests", but not "pretest", "tested", "testers", "testimony", and "testing", which also appear in the sample data. There is no "test" in the sample data, but if there were, this query would not have found it.
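
The wildcard operands in these use cases have the same shape as regular expressions, so the behaviour described in the comment can be checked with a short Python sketch. This is an analogy, not a Full-Text implementation: it assumes that a wildcard pattern must match a whole word, which Python's `re.fullmatch` enforces.

```python
import re

# Words the comment lists from the sample data, plus "test" itself.
words = ["tests", "pretest", "tested", "testers", "testimony", "testing", "test"]

# "." stands for exactly one character, so a match is "test" plus one character.
one_char_suffix = re.compile(r"test.")

matched = [w for w in words if one_char_suffix.fullmatch(w)]
```

Only "tests" survives: "test" has no suffix character, "pretest" adds a prefix rather than a suffix, and the remaining words have suffixes longer than one character.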

Solution in XQuery:

for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
let $cont := $book/content[. ftcontains "test." 
  with wildcards]
where count($cont)>0
return $book

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books/book[./content ftcontains "test." 
with wildcards]

Expected Result:

<book number="1">
   <metadata>
      ...
   </metadata> 
   <content>
      ...   
      <part number="2">
         <chapter>
            <title>Usability Testing</title>
            <p>Once the problems identified by expert 
            reviews have been corrected, it is time to 
            conduct some tests of the site with your unique 
            audience or audiences by conducting usability 
            testing.</p>
            ...
         </chapter>
      </part>   
   </content>
</book>      

<book number="2">
   <metadata>
      ...
   </metadata> 
   <content>
      <introduction>
         <p>This is a basic handbook for planning and 
         conducting usability tests on Web sites. Usability 
         testing should be used in conjunction with other 
         expert review methods.</p>
         <p>This book has not been approved by the Web Site 
         Users Association.</p>
      </introduction>
      <part number="1">
         <chapter>
            <title>Planning then Conducting Usability 
            Tests</title> 
             ...
         </chapter>
      </part>
      <part number="2">
         <chapter>
            <title>Conducting Usability Tests</title>  
            ...  
         </chapter>
         ...  
      </part>      
      ...                                         
   </content> 
</book>     
   
<book number="3">
   <metadata>
      ...
   </metadata> 
   <content>
      ...   
     <part number="2"><container type="box">3-5</container>
         <title>Writings File, 
         <date normalize="1985/1999">1985-1999</date>
         </title>
         ...
         <component>
            <componentTitle>Writings by Usabilityguy
            </componentTitle>
            <subComponent>
               <componentTitle><componentDate normalize="1996">
               1996</componentDate>
               </componentTitle> 
               ...
               <subsubComponent>
                  <componentTitle>"How to Evaluate Results from 
                  User Tests."</componentTitle>
               </subsubComponent>
               ...
            </subComponent>
            ...
         </component>
         ...
      </part>
      ...
   </content>
</book>

5.2.2 Q2 Zero or One Character Prefix Wildcard Query

Find all books with the word "way" with no prefix or a one character prefix in the text.

This query finds a word with no prefix or a one character prefix (zero or one character before the first character).

  • Operands: "way"

  • Functionality: word query, character wildcard (prefix) (0 or 1)

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./content

  • Return: .

  • Comments: There is no "way" in the sample data, but if there were, this query would have found it. The query does find "away", which appears in the sample data.

Solution in XQuery:

for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
let $cont := $book//content[. ftcontains ".?way" 
   with wildcards]
where count($cont)>0
return $book

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books/book[.//content ftcontains ".?way" 
with wildcards]

Expected Result:

<book number="1">
   <metadata>
      ...
   </metadata> 
   <content>
      <introduction>
         ... 
         <p>Expert reviews and usability testing are 
         methods of identifying problems in layout, 
         terminology, and navigation before they frustrate
         users and drive them away from your site.</p>
         ...                
      </introduction>
      ...
   </content>     
</book>

5.2.3 Q3 Zero or More Character Infix Wildcard Query

Find all books with the words "serve" or "service" in the text.

This query finds words with no infix character or any number of infix characters (zero or more characters inserted in the middle of a word).

  • Operands: "serv", "e"

  • Functionality: word query, character wildcard (infix) (0 or more)

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./content

  • Return: ./@number, ./metadata/title/text(), ./content

  • Comments: This query returns the word "service" and would return the word "serve" if it existed in the sample data. It does not return the word "served" which exists in the sample data.

Solution in XQuery:

for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
let $cont := $book//content[. ftcontains "serv.*e" 
   with wildcards]
where count($cont)>0
return ($book/@number, $book/metadata/title/text(), $cont)

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books/book[.//content ftcontains "serv.*e" 
with wildcards]/(@number|./metadata/title/text()
|./content)

Expected Result:

<book number="1"></book>
<title>Improving the Usability of a Web Site Through 
Expert Reviews and Usability Testing</title>
<content>
   ...      
   <part number="2">
      <chapter>
        <title>Usability Testing</title>
        ...
        <p>Then changes are made to improve service to 
        users.</p>
      </chapter>
   </part> 
</content>

5.2.4 Q4 One or More Character Suffix Wildcard Query on Part of a Word

Find all books with the phrases "usability testing" or "user testing" in the text.

This query finds a phrase allowing a suffix of one or more characters (one or more characters after the last character) on a part of one of the words.

  • Operands: "us testing"

  • Functionality: phrase query, character wildcard (suffix) (1 or more)

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./content

  • Return: ./@number, ./metadata/title/text(), ./content

  • Comments: This is a suffix query on a part of a word, "us", which is not one of the words or one of the roots of the words desired in the results. The query on "us" will find "usability" and "user". Whereas stemmed queries (Section 6 (STEMMING)) attempt to return linguistic variants of a word or the root of a word, wildcards may be applied to any part of a word and will return all character combinations found.
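
A phrase with a wildcard on one of its words can be sketched as a sequence of per-word patterns matched against consecutive words. The Python below is an illustration under assumptions, not a Full-Text implementation: the helper `phrase_positions` is hypothetical, tokenization is a naive split into word characters, and the sample sentence is invented to show both matches and a near miss.

```python
import re

def phrase_positions(text, phrase_pattern):
    """Return the start indexes where consecutive words match, in full,
    each space-separated word pattern of phrase_pattern."""
    words = re.findall(r"\w+", text.lower())
    patterns = [re.compile(p) for p in phrase_pattern.split()]
    n = len(patterns)
    return [
        i for i in range(len(words) - n + 1)
        if all(p.fullmatch(w) for p, w in zip(patterns, words[i:i + n]))
    ]

# Invented sample sentence (an assumption, not taken from the sample data).
sample = "Usability testing and user testing both help us. Testing is useful."

# "us.+" requires at least one character after "us".
hits = phrase_positions(sample, "us.+ testing")
```

Here the hits are the phrases starting at "Usability" and "user". The sequence "us. Testing" is not a hit because ".+" needs one or more characters after "us", and "useful" is not followed by "testing".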

Solution in XQuery:

for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
let $cont := $book//content[. ftcontains "us.+ testing" 
   with wildcards]
where count($cont)>0
return ($book/@number, $book/metadata/title/text(), $cont)

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books/book[.//content ftcontains "us.+ testing" 
with wildcards]/(@number|./metadata/title/text()|./content)

Expected Result:

<book number="1"></book>  
<title>Improving the Usability of a Web Site Through 
Expert Reviews and Usability Testing</title>
<content>
   <introduction>
      ...
      <p>Expert reviews and usability testing are 
      methods of identifying problems in layout, 
      terminology, and navigation before they frustrate
      users and drive them away from your site.</p>
      ...
   </introduction>
   ...
   <part number="2">
      ...
      <chapter>
         <title>Usability Testing</title>
         <p>Once the problems identified by expert 
         reviews have been corrected, it is time to 
         conduct some tests of the site with your unique 
         audience or audiences by conducting usability 
         testing.</p>
         ...
      </chapter>
   </part> 
</content>
   
<book number="2"></book>  
<title>Usability Basics: How to Plan for and Conduct 
Usability Tests on Web Site Thereby Improving the 
Usability of Your Web Site</title> 
<content>
   <introduction>
      <p>This is a basic handbook for planning and 
      conducting usability tests on Web sites. Usability
      testing should be used in conjunction with other 
      expert review methods.</p>
      ...
   </introduction>
   <part number="1">
      ...
      <chapter>
         <p>Take the following steps to plan usability
         testing. <step number="1">Clarify and 
         articulate the goal of the usability testing.
         </step> <step number="2">Identify tasks which 
         are critical for users to be able to complete 
         successfully.</step> <step number="3">Compile 
         a script of questions or instructions which 
         will prompt the user to attempt those 
         tasks.</step> <step number="4">Identify your 
         users and begin recruiting them.</step> <step 
         number="5">Conduct a pretest on a few users.
         </step> <step number="6">Edit the script based 
         on insights gleaned from the pretest.</step> 
         <step number="7">Resume testing.</step></p>
      </chapter>
   </part>
   <part number="2">
      <chapter>
         <title>Conducting Usability Tests</title> 
         ...
         <p>Give the user the script, then assure them 
         that you are testing the Web site, not them. 
         Users are asked to verbalize their thoughts as 
         they complete the tasks. The event is recorded 
         or someone takes notes. It is often preferable 
         to have two testers, <footnote>Usability
         testing can be done at great expense or on a 
         shoe string, using <testingProcedure>in-house 
         expertise</testingProcedure> or 
         <testingProcedure>contracting with human 
         computer interaction professionals
         </testingProcedure>.</footnote> one to ask the 
         questions, another to take notes. Testers should 
         offer no guidance or comments to the user. Mouse 
         movements, typing, expressions, and the user's 
         words should be recorded.</p>
      </chapter>
      ...
   </part>
</content>

<book number="3"></book>  
<title>John Wesley Usabilityguy: A Register of His 
Papers</title>   
<content>
   ...
   <part number="2"><container type="box">3-5</container>
      <title>Writings File, 
      <date normalize="1985/1999">1985-1999</date>
      </title>
      ...
      <component>
         <componentTitle>Writings by Usabilityguy
         </componentTitle>
         <subComponent>
            <componentTitle><componentDate normalize="1996">
            1996</componentDate>
            </componentTitle> 
            <subsubComponent>
               <componentTitle>"How Many Users Are Enough 
               for User Testing?"</componentTitle>
            </subsubComponent> 
            ...
            <subsubComponent>
               <componentTitle>"Do-It-Yourself User Testing"
               </componentTitle>
            </subsubComponent> 
         </subComponent>
      </component>
      ...
   </part>      
</content>

5.2.5 Q5 Specified Range of Characters Suffix Wildcard Query

Find all books with the word "test" with a three to four character suffix in the text.

This query finds a word with a number of characters within a specified range in a suffix (specified range of characters after the last character).

  • Operands: "test"

  • Functionality: word query, character wildcard (suffix) (3 to 4)

  • Data context: doc("http://bstore1.example.com/full-text.xml")/books/book

  • Query context: ./content

  • Return: ./@number, ./content

  • Comments: This query allows any three or four character suffix. It returns "testers" and "testing", but not "pretest", "tests", and "tested", which also appear in the sample data. There is no "test" in the sample data, but if there were, this query would not have found it.
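
The range qualifier can be read as a constraint on the length of the suffix. The Python sketch below compares the regular-expression form of the operand with an explicit length check over the words named in the comment. It is an analogy only, and it assumes a wildcard pattern must match the whole word.

```python
import re

# Words named in the comment, plus "test" itself.
words = ["testers", "testing", "pretest", "tests", "tested", "test"]

# Wildcard form: "test" followed by three or four arbitrary characters.
by_pattern = [w for w in words if re.fullmatch(r"test.{3,4}", w)]

# Equivalent explicit check on the length of the suffix after "test".
by_length = [w for w in words if w.startswith("test") and 3 <= len(w) - 4 <= 4]
```

Both lists contain "testers" and "testing" only, since each has a three character suffix.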

Solution in XQuery:

for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
let $cont := $book//content[. ftcontains "test.{3,4}" 
   with wildcards]
where count($cont)>0
return ($book/@number, $cont)

Solution in XPath:

doc("http://bstore1.example.com/full-text.xml")
/books/book[./content ftcontains "test.{3,4}" 
with wildcards]/(@number|./content)

Expected Result:

<book number="1"></book>
<content>
   <introduction>
      ...
      <p>Expert reviews and usability testing are 
      methods of identifying problems in layout, 
      terminology, and navigation before they frustrate
      users and drive them away from your site.</p>
      ...
   </introduction>
   ...
   <part number="2">
      <chapter>
         <title>Usability Testing</title>
         <p>Once the problems identified by expert 
         reviews have been corrected, it is time to 
         conduct some tests of the site with your unique 
         audience or audiences by conducting usability 
         testing.</p>
         ...
      </chapter>
   </part> 
</content>
                                                
<book number="2"></book>
<content>
   <introduction>
      <p>This is a basic handbook for planning and 
      conducting usability tests on Web sites. Usability
      testing should be used in conjunction with other 
      expert review methods.</p>
      ...
   </introduction>
   <part number="1">
      <chapter>
         <p>Take the following steps to plan usability
         testing. <step number="1">Clarify and 
         articulate the goal of the usability testing.
         </step> <step number="2">Identify tasks which 
         are critical for users to be able to complete 
         successfully.</step> <step number="3">Compile 
         a script of questions or instructions which 
         will prompt the user to attempt those 
         tasks.</step> <step number="4">Identify your 
         users and begin recruiting them.</step> <step 
         number="5">Conduct a pretest on a few users.
         </step> <step number="6">Edit the script based 
         on insights gleaned from the pretest.</step> 
         <step number="7">Resume testing.</step></p>
      </chapter>
   </part>
   <part number="2">
      <chapter>
         <title>Conducting Usability Tests</title> 
         ...
         <p>Give the user the script, then assure them 
         that you are testing the Web site, not them. 
         Users are asked