XQuery and XPath Full Text 3.0 Requirements and Use Cases

W3C Working Draft 27 March 2012

This version:
http://www.w3.org/TR/2012/WD-xpath-full-text-30-requirements-use-cases-20120327/
Latest version:
http://www.w3.org/TR/xpath-full-text-30-requirements-use-cases/
Editor:
Pat Case, Library of Congress

Abstract

This document specifies requirements and use cases for Full-Text Search for use in XQuery 3.0 [XQuery 3.0: An XML Query Language] and XPath 3.0 [XML Path Language (XPath) 3.0].

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a First Public Working Draft as described in the Process Document. It was jointly developed by the W3C XML Query Working Group and the W3C XSL Working Group, each of which is part of the XML Activity. This document will eventually be published as a Working Group Note to persistently record the Requirements that guided the development of XQuery and XPath Full Text 3.0 as a W3C Recommendation.

This document includes, for each requirement, a corresponding status indicating the current situation of the requirement in XQuery and XPath Full Text 3.0 at the time that the specification was most recently published, on 13 December 2011. Organizations and individuals should review this document to determine whether or not the requirements provided meet the needs of the full-text community. If additional requirements are identified, they may be added to these requirements in a future publication.

A future publication of this document will incorporate a number of Use Cases that help the Working Groups determine whether a candidate requirement is, in fact, a real requirement and that illustrate various problems XQuery and XPath Full Text 3.0 is intended to address.

Please report errors in this document using W3C's public Bugzilla system (instructions can be found at http://www.w3.org/XML/2005/04/qt-bugzilla). If access to that system is not feasible, you may send your comments to the W3C XSLT/XPath/XQuery public comments mailing list, public-qt-comments@w3.org. It will be very helpful if you include the string “[FT30req]” in the subject line of your report, whether made in Bugzilla or in email. Please use multiple Bugzilla entries (or, if necessary, multiple email messages) if you have more than one comment to make. Archives of the comments and responses are available at http://lists.w3.org/Archives/Public/public-qt-comments/.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by groups operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the XML Query Working Group and also maintains a public list of any patent disclosures made in connection with the deliverables of the XSL Working Group; those pages also include instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Goals
2 Requirements
    2.1 Terminology
    2.2 General Requirements
        2.2.1 Backward compatibility
        2.2.2 Extension compatibility
    2.3 Full Text Search Functionality
        2.3.1 Feature names for XQuery require-feature and prohibit-feature
        2.3.2 Language Identifiers
        2.3.3 Match highlighting
        2.3.4 Optional URI for stemming algorithms
        2.3.5 Refine language about levels in FTThesaurus Option
        2.3.6 Tokenize items
        2.3.7 Count occurrences of search terms
        2.3.8 Return score values
        2.3.9 Search on punctuation
        2.3.10 Snippets
    2.4 Editorial Changes
        2.4.1 Irrevocable Stop Words
        2.4.2 Rename TokenInfo

Appendix

A References
    A.1 Non-Normative


1 Goals

The goal of XQuery and XPath Full Text 3.0 is to extend XQuery and XPath Full Text 1.0 with additional functionality in response to requests from users and implementors.

2 Requirements

2.1 Terminology

The following key words are used throughout the document to specify the extent to which an item is a requirement for the work of the XML Query Working Group:

MUST

This word means that the item is an absolute requirement.

SHOULD

This word means that there may exist valid reasons not to treat this item as a requirement, but the full implications should be understood and the case carefully weighed before discarding this item.

MAY

This word means that an item deserves attention, but further study is needed to determine whether the item should be treated as a requirement.

When the words MUST, SHOULD, or MAY are used in this technical sense, they occur as a hyperlink to these definitions. These words will also be used with their conventional English meaning, in which case there is no hyperlink. For instance, the phrase "the full implications should be understood" uses the word "should" in its conventional English sense, and therefore occurs without the hyperlink.

Each requirement also includes a status section, indicating its current situation in the XML-Query family of specifications. Three status levels are available:

"Green" status

This indicates that the requirement, according to its original formulation, has been completely met. Optional clarificatory text may follow.

"Yellow" status

This indicates that the requirement has been partially met according to its original formulation. When this happens, explanatory text is provided to better clarify the current scope of the requirement.

"Red" status

This indicates that the requirement, according to its original formulation, has not been met. If this is the case, explanatory text is provided.

2.2 General Requirements

2.2.1 Backward compatibility

XQuery and XPath Full Text 3.0 MUST be backward compatible.

Every valid XQuery and XPath Full Text 1.0 expression MUST be valid in XQuery and XPath Full Text 3.0 and it MUST evaluate to the same result.

Status (green): this requirement has been met.

2.2.2 Extension compatibility

XQuery and XPath Full Text 3.0 MUST be compatible with XQuery and XPath 3.0 extensions developed by the XML Query Working Group and the XSL Working Group.

Status (green): this requirement has been met.

2.3 Full Text Search Functionality

2.3.1 Feature names for XQuery require-feature and prohibit-feature

XQuery and XPath Full Text 3.0 MUST add feature names for XQuery require-feature and prohibit-feature to include the names defined in the "http://www.w3.org/2011/xquery-features" namespace.

Status (green): this requirement has been met.

2.3.2 Language Identifiers

XQuery and XPath Full Text 3.0 MUST specify in the Language Option how to handle multiple language identifiers for the same language, including languages represented by both two- and three-letter identifiers.

Status (green): this requirement has been met.
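
As an illustration of the problem this requirement addresses, the following Python sketch treats two- and three-letter identifiers for the same language as equivalent. It is not XQuery and XPath Full Text syntax, and it does not reproduce the rule the specification adopts; the alias table and the names `LANGUAGE_ALIASES`, `canonical_language`, and `same_language` are hypothetical, and subtags such as region codes are ignored.

```python
# Hypothetical alias table: ISO 639-1 two-letter codes and ISO 639-2
# three-letter codes that identify the same language.
LANGUAGE_ALIASES = {
    "en": "en", "eng": "en",
    "de": "de", "deu": "de", "ger": "de",
    "fr": "fr", "fra": "fr", "fre": "fr",
}


def canonical_language(identifier):
    """Map an identifier to one canonical form; unknown identifiers pass through."""
    key = identifier.strip().lower()
    return LANGUAGE_ALIASES.get(key, key)


def same_language(a, b):
    """Return True when both identifiers resolve to the same canonical form."""
    return canonical_language(a) == canonical_language(b)
```

Under this assumed policy, a query written with "eng" and content tagged with "en" would be processed with the same language-specific options.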

2.3.3 Match highlighting

XQuery and XPath Full Text 3.0 MUST support adding marker elements around token matches. How the element will be marked MAY be specified by an EQName or a function item passed as an argument.

Status (red): this requirement has not been met. The Working Group has yet to decide whether it will be a requirement for this or any other version.
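
The kind of output this requirement has in mind can be sketched in Python as follows. This is only an illustration of wrapping matched tokens in a marker element, not proposed XQuery and XPath Full Text syntax; the function name `highlight`, the default element name `mark`, and the word-character tokenization are all assumptions.

```python
import re
from xml.sax.saxutils import escape


def highlight(text, terms, marker="mark"):
    """Wrap each token that matches a search term in a <marker> element."""
    wanted = {t.lower() for t in terms}
    pieces = []
    last = 0
    for m in re.finditer(r"\w+", text):
        # Copy the text between tokens, escaping XML-special characters.
        pieces.append(escape(text[last:m.start()]))
        token = m.group(0)
        if token.lower() in wanted:
            pieces.append(f"<{marker}>{escape(token)}</{marker}>")
        else:
            pieces.append(escape(token))
        last = m.end()
    pieces.append(escape(text[last:]))
    return "".join(pieces)
```

For example, `highlight("Full text & XML search", ["xml"])` produces `Full text &amp; <mark>XML</mark> search`.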

2.3.4 Optional URI for stemming algorithms

XQuery and XPath Full Text 3.0 MUST define an optional URI to identify stemming algorithms. This issue was raised in Bugzilla Bug 9680.

Status (red): this requirement has not been met. The Working Group has yet to decide whether it will be a requirement for this or any other version.

2.3.5 Refine language about levels in FTThesaurus Option

XQuery and XPath Full Text 3.0 MUST refine the language about levels in the FTThesaurus Option. This issue was raised in Bugzilla Bug 11444.

Status (red): this requirement has not been met. The Working Group has yet to decide whether it will be a requirement for this or any other version.

2.3.6 Tokenize items

XQuery and XPath Full Text 3.0 MUST support explicitly tokenizing an item and returning a sequence of strings.

Status (red): this requirement has not been met. The Working Group has yet to decide whether it will be a requirement for this or any other version.
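
A minimal Python sketch of explicit tokenization is shown below. Tokenization in XQuery and XPath Full Text is implementation-defined, so the rule used here (runs of word characters) and the function name `tokenize` are assumptions chosen only to show an item going in and a sequence of strings coming out.

```python
import re


def tokenize(item):
    """Return the tokens of an item's string value as a list of strings."""
    # Assumed rule: a token is a maximal run of word characters.
    return re.findall(r"\w+", str(item))
```

With this rule, `tokenize("Full-Text Search, 3.0")` yields `["Full", "Text", "Search", "3", "0"]`.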

2.3.7 Count occurrences of search terms

XQuery and XPath Full Text 3.0 MUST enable counting the number of occurrences of search terms specified in a full-text expression.

Status (red): this requirement has not been met. The Working Group has yet to decide whether it will be a requirement for this or any other version.
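
The following Python sketch illustrates what counting occurrences of search terms could return. It assumes case-insensitive matching on word-character tokens; the function name `count_occurrences` is hypothetical and the sketch does not reflect any syntax under consideration by the Working Groups.

```python
import re
from collections import Counter


def count_occurrences(text, terms):
    """Count how often each search term occurs as a token in the text."""
    counts = Counter(tok.lower() for tok in re.findall(r"\w+", text))
    # A Counter reports 0 for tokens that never occurred.
    return {term: counts[term.lower()] for term in terms}
```

For example, `count_occurrences("XML and xml, then XQuery", ["XML", "XPath"])` returns `{"XML": 2, "XPath": 0}`.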

2.3.8 Return score values

XQuery and XPath Full Text 3.0 MUST allow explicit access to score values that have been assigned to items by an FTContains expression, making the score values available for merging and other computations.

Status (red): this requirement has not been met. The Working Group has yet to decide whether it will be a requirement for this or any other version.
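
One computation that explicit access to score values would permit is merging the scores from two separate searches. The Python sketch below assumes scores are already available as numbers keyed by item and combines them with a weighted average; the function name `merge_scores`, the weighting scheme, and the treatment of a missing score as 0.0 are all assumptions.

```python
def merge_scores(scores_a, scores_b, weight_a=0.5):
    """Combine two score maps into one ranking, highest merged score first."""
    merged = {}
    for item in set(scores_a) | set(scores_b):
        # An item absent from one result set contributes 0.0 from that set.
        merged[item] = (weight_a * scores_a.get(item, 0.0)
                        + (1 - weight_a) * scores_b.get(item, 0.0))
    # Sort by descending score, breaking ties by item key.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, `merge_scores({"d1": 0.5, "d2": 0.5}, {"d2": 1.0})` ranks `d2` (0.75) ahead of `d1` (0.25).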

2.3.9 Search on punctuation

XQuery and XPath Full Text 3.0 MUST support searching on punctuation, for example, searching on tokens that contain punctuation, such as PB&J and document.xml.

Status (red): this requirement has not been met. The Working Group has yet to decide whether it will be a requirement for this or any other version.
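
The difficulty behind this requirement is that a tokenizer which splits on every punctuation character can never match PB&J or document.xml as a single token. The Python sketch below shows one assumed tokenization rule that keeps `&` and `.` when they occur between word characters; the rule and the names `tokenize_keep_punctuation` and `contains_token` are hypothetical.

```python
import re

# Assumed rule: word characters, optionally joined by an internal "&" or ".".
TOKEN_PATTERN = re.compile(r"\w+(?:[&.]\w+)*")


def tokenize_keep_punctuation(text):
    """Tokenize while preserving punctuation that sits inside a token."""
    return TOKEN_PATTERN.findall(text)


def contains_token(text, query):
    """Case-insensitive test for the query as a whole token of the text."""
    return query.lower() in (t.lower() for t in tokenize_keep_punctuation(text))
```

With this rule, "Eat a PB&J, then open document.xml." tokenizes to `Eat`, `a`, `PB&J`, `then`, `open`, `document.xml`, so a search for `pb&j` matches while a search for `PB` alone does not.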

2.3.10 Snippets

XQuery and XPath Full Text 3.0 MUST support displaying snippets (match tokens in context). Snippets are brief segments of text surrounding matches displayed in search results to enable the user to better judge the usefulness of a search result.

Status (red): this requirement has not been met. The Working Group has yet to decide whether it will be a requirement for this or any other version.
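
A snippet as described above can be sketched in Python as a window of tokens around the first match. The window size, the ellipsis convention, and the function name `snippet` are assumptions made for illustration; the sketch also discards the original punctuation, which a real implementation would likely preserve.

```python
import re


def snippet(text, term, context=3):
    """Return the first match with `context` tokens on each side, or None."""
    tokens = re.findall(r"\w+", text)
    lowered = [t.lower() for t in tokens]
    try:
        i = lowered.index(term.lower())
    except ValueError:
        return None  # the term does not occur in the text
    start = max(0, i - context)
    end = min(len(tokens), i + context + 1)
    # Mark truncation on either side with an ellipsis.
    prefix = "... " if start > 0 else ""
    suffix = " ..." if end < len(tokens) else ""
    return prefix + " ".join(tokens[start:end]) + suffix
```

For example, with the text "The quick brown fox jumps over the lazy dog near the river", `snippet(text, "fox", 2)` returns `... quick brown fox jumps over ...`.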

2.4 Editorial Changes

2.4.1 Irrevocable Stop Words

XQuery and XPath Full Text 3.0 MUST specify in the Stop Word Option that implementations may apply stop word lists during indexing and be unable to comply with query-time requests to not apply those stop words.

Status (green): this requirement has been met.
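
The situation this editorial change describes can be sketched in Python as follows: once stop words have been dropped while building an index, no query-time option can bring them back. The stop word list, the index structure, and the names `STOP_WORDS`, `build_index`, and `search` are hypothetical.

```python
import re

STOP_WORDS = {"the", "a", "of"}  # hypothetical list applied at indexing time


def build_index(docs):
    """Build an inverted index (token -> set of document ids), skipping stop words."""
    index = {}
    for doc_id, text in docs.items():
        for tok in re.findall(r"\w+", text.lower()):
            if tok in STOP_WORDS:
                continue  # the token is discarded and cannot be recovered later
            index.setdefault(tok, set()).add(doc_id)
    return index


def search(index, term):
    """Look up a term; stop words removed during indexing never match."""
    return index.get(term.lower(), set())
```

With documents `{1: "The art of war", 2: "A war"}`, a search for `war` finds both documents, but a search for `the` finds nothing even if the query asks that no stop words be applied.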

2.4.2 Rename TokenInfo

XQuery and XPath Full Text 3.0 MUST rename TokenInfo to something like TokenSpan or TokenRange. This issue was raised in Bugzilla Bug 9541.

Status (red): this requirement has not been met. The Working Group has yet to decide whether it will be a requirement for this or any other version.

A References

A.1 Non-Normative

XQuery and XPath Data Model (XDM) 3.0
XQuery and XPath Data Model (XDM) 3.0, Norman Walsh, John Snelson, Editors. World Wide Web Consortium, 13 December 2011. This version is http://www.w3.org/TR/2011/WD-xpath-datamodel-30-20111213/. The latest version is available at http://www.w3.org/TR/xpath-datamodel-30/.
XQuery 3.0: An XML Query Language
XQuery 3.0: An XML Query Language, Jonathan Robie, Don Chamberlin, Michael Dyck, John Snelson, Editors. World Wide Web Consortium, 13 December 2011. This version is http://www.w3.org/TR/2011/WD-xquery-30-20111213/. The latest version is available at http://www.w3.org/TR/xquery-30/.
XML Path Language (XPath) 3.0
XML Path Language (XPath) 3.0, Jonathan Robie, Don Chamberlin, Michael Dyck, John Snelson, Editors. World Wide Web Consortium, 13 December 2011. This version is http://www.w3.org/TR/2011/WD-xpath-30-20111213/. The latest version is available at http://www.w3.org/TR/xpath-30/.
XQuery 3.0 Requirements
XQuery 3.0 Requirements, Daniel Engovatov, Jonathan Robie, Editors. World Wide Web Consortium, 16 September 2010. This version is http://www.w3.org/TR/2010/WD-xquery-30-requirements-20100916/. The latest version is available at http://www.w3.org/TR/xquery-30-requirements/.
SQL/MM Full-Text
ISO/IEC 13249-2:2000, Information technology — Database languages — SQL Multimedia and Application Packages — Part 2: Full-Text, International Organization For Standardization, 2000, referenced in e.g. "SQL Multimedia and Application Packages (SQL/MM)" (See http://www.acm.org/sigmod/record/issues/0112/standards.pdf)