Copyright © 2007 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document specifies requirements for Full-Text search for use in XQuery [XQuery] and XPath [XPath].
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is a Working Draft for review by W3C Members and other interested parties. As of this publication, the Working Groups expect to eventually publish this document as a Working Group Note. This document was produced following the procedures set out for the W3C Process and was defined jointly by the XSL Working Group and the XML Query Working Group (both part of the XML Activity).
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
These requirements, according to their original formulation, have been met by the Last Call Working Draft for XQuery 1.0 and XPath 2.0 Full-Text 1.0. In a few instances, after the implications were understood and the cases carefully weighed, alternatives other than those specified in these requirements were chosen. In 3.4.3 Type, Range of Score, the requirement reads in part: "The SCORE SHOULD be a float, in the range 0-1." Float has been changed to double because double is the maximal promotion type. In 3.5.3 Score algorithm overridable, the requirement reads: "The algorithm to produce combined SCOREs SHOULD be overridable by users." Since SCORE is implementation-dependent, the recommendation is silent on this and all matters relating to implementation of scoring.
This document includes, for each requirement, a corresponding status, indicating the current situation of the requirement in XQuery 1.0 and XPath 2.0 Full-Text 1.0 at the time that it was issued as a Last Call Working Draft in May, 2007. Organizations and individuals should review this document to determine whether or not the requirements provided meet the needs of the full-text community.
Public Last Call comments on this document and its open issues are invited. Comments on this document are desired by 22 June 2007. Comments on this document should be made in W3C's public Bugzilla system for this specification (instructions can be found at http://www.w3.org/XML/2005/04/qt-bugzilla). When entering comments, select the Product named "XPath / XQuery / XSLT", the Component named "Full Text", and the Version named "Working drafts". This repository includes open issues recorded by the XML Query Working Group and the XSL Working Group, as well as by members of the public. If access to the Bugzilla system is not feasible, you may send your comments to the W3C XSLT/XPath/XQuery mailing list, public-qt-comments@w3.org. It will be very helpful if you include the string [FTReq] in the subject line of your comment, whether made in Bugzilla or in email. Each Bugzilla entry and email message should contain only one comment. Archives of the comments and responses are available at http://lists.w3.org/Archives/Public/public-qt-comments/.
This document was produced by groups operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the XML Query Working Group and also maintains a public list of any patent disclosures made in connection with the deliverables of the XSL Working Group; those pages also include instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
1 Introduction
2 Terminology
2.1 Terminology
2.2 SCORE
2.3 Full-Text Search
3 Language Design
3.1 The Data Model
3.2 Side-effects on the data
3.3 Score Function and Full-Text predicates
3.3.1 Predicate and Score Independence
3.3.2 Score language
3.4 Score algorithm
3.4.1 Return Score
3.4.2 Sort by Score
3.4.3 Type, Range of Score
3.4.4 Score Statistics
3.4.5 Semantics of Score
3.5 Combined score
3.5.1 Score Combination
3.5.2 Score algorithm vendor-provided
3.5.3 Score algorithm overridable
3.5.4 Score influence
3.6 Extensibility
3.6.1 Extensible by vendors
3.6.2 Extensible by users
3.7 First, Future Versions
3.8 End user language
3.9 Searchable query
3.10 Universality
4 Integration
4.1 XPath
4.2 Extensibility Mechanisms
4.2.1 Integration into XQuery/XPath
4.2.2 XQuery and XPath Full-Text Extensibility
4.3 Composability
4.4 Human-readable
4.5 XML syntax
5 Implementation
5.1 Declarativity
6 Functionality and Scope
6.1 Functionality
6.2 Search Scope
6.2.1 Search within arbitrary structure
6.2.2 Constructed Structures
6.2.3 Return Arbitrary Nodes
6.2.4 Parts of Search Tree
6.3 Attributes
6.3.1 Search within attributes
6.3.2 Search across attributes and content
6.4 Markup
6.5 Element Boundaries
6.5.1 Search across element boundaries
6.5.2 Element as a token boundary
6.6 Score
6.6.1 Score accessible
6.6.2 Implicit ordering
6.6.3 Score extendable
A References
A.1 Non-Normative
B Change Log
"Full-Text Search" (FTS) is a large field which covers a vast array of functionality. In addition, there are many different ways one could combine FTS capabilities with XQuery and XPath.
The requirements are written without reference to any particular solution.
The following key words are used throughout the document to specify the extent to which an item is a requirement for the work of the XML Query Working Group:
MUST: This word means that the item is an absolute requirement.
SHOULD: This word means that there may exist valid reasons not to treat this item as a requirement, but the full implications should be understood and the case carefully weighed before discarding this item.
MAY: This word means that an item deserves attention, but further study is needed to determine whether the item should be treated as a requirement.
When the words MUST, SHOULD, or MAY are used in this technical sense, they occur as a hyperlink to these definitions. These words will also be used with their conventional English meaning, in which case there is no hyperlink. For instance, the phrase "the full implications should be understood" uses the word "should" in its conventional English sense, and therefore occurs without the hyperlink.
Each requirement also includes a status section, indicating its current situation in the XML-Query family of specifications. Three status levels are available:
Met: This indicates that the requirement, according to its original formulation, has been completely met. Optional clarificatory text may follow.
Partially met: This indicates that the requirement has been partially met according to its original formulation. When this happens, explanatory text is provided to better clarify the current scope of the requirement.
Not met: This indicates that the requirement, according to its original formulation, has not been met. If this is the case, explanatory text is provided.
[Definition: SCORE reflects relevance of matched material.]
[Definition: Full-Text Search in this document is an extension to the XQuery and XPath language. It provides a way to query text which has been tokenized, i.e. broken into a sequence of words, units of punctuation, and spaces. Tokenization enables functions and operators which work with the relative positioning of words (e.g., proximity operators). Tokenization also enables functions and operators which operate on a part or the root of the word (e.g., wildcards, stemming).]
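The role of tokenization in this definition can be illustrated with a minimal sketch. It is written in Python rather than XQuery, and it is not part of any requirement: the function names `tokenize` and `within_distance` are hypothetical, and the tokenizer is a simplification that keeps only words and discards punctuation and spaces.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (punctuation and spaces are dropped)."""
    return re.findall(r"\w+", text.lower())


def within_distance(tokens, first, second, max_gap):
    """Return True if `first` and `second` occur with at most `max_gap` words between them."""
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    return any(
        abs(i - j) - 1 <= max_gap
        for i in first_positions
        for j in second_positions
        if i != j
    )


tokens = tokenize("Full-Text search extends the XQuery language.")
# "search" is at position 2 and "xquery" at position 5, so two words lie between them.
print(within_distance(tokens, "search", "xquery", 2))
```

Because positions are assigned to tokens rather than to characters, a proximity test of this kind is only meaningful once the text has been tokenized, which is the point the definition makes.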
This section covers requirements for XQuery and XPath Full-Text language design that are independent from, but related to, integration and scoping requirements.
XQuery and XPath Full-Text functions MUST operate on instances of the [XDM].
Status: this requirement has been met.
XQuery and XPath Full-Text MUST NOT introduce or rely on side-effects.
Status: this requirement has been met.
XQuery and XPath Full-Text MUST allow the user to return SCORE.
Status: this requirement has been met.
XQuery and XPath Full-Text MUST allow the user to sort by SCORE.
Status: this requirement has been met.
XQuery and XPath Full-Text MUST define the type and range of SCORE values. The SCORE SHOULD be a float, in the range 0-1.
Status: this requirement has been partially met. Float has been changed to double because double is the maximal promotion type.
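The requirements on returning SCORE, sorting by SCORE, and the type and range of SCORE can be pictured with a small Python sketch. The scoring formula here (the fraction of distinct query tokens found in a document) is purely an assumption made for illustration; the requirements do not prescribe any algorithm, and `toy_score` is a hypothetical name. Python's `float` is a double-precision value, which matches the change from float to double noted in the status.

```python
def toy_score(query_tokens, document_tokens):
    """Fraction of distinct query tokens present in the document, in the range 0.0-1.0."""
    wanted = set(query_tokens)
    if not wanted:
        return 0.0
    found = wanted & set(document_tokens)
    return len(found) / len(wanted)


# Hypothetical, already-tokenized documents.
documents = {
    "a": ["xml", "query", "language"],
    "b": ["full", "text", "search", "in", "xml"],
    "c": ["relational", "database"],
}
query = ["full", "text", "xml"]

# Return the score for each document, then sort by it in descending order.
scores = {name: toy_score(query, tokens) for name, tokens in documents.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```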
XQuery and XPath Full-Text MUST be able to generate a SCORE for a combination of Full-Text predicates.
Status: this requirement has been met.
The algorithm to produce combined SCOREs MUST be vendor-provided.
Status: this requirement has been met.
The algorithm to produce combined SCOREs SHOULD be overridable by users.
Status: this requirement has been partially met. Since SCORE is implementation-dependent, the recommendation is silent on this and all matters relating to implementation of scoring.
Users MUST be able to influence individual components of complex score expressions.
Status: this requirement has been met.
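One way to picture user influence over the components of a combined score is a weighted average, sketched below in Python. This is an assumption for illustration only: the combination algorithm is vendor-provided and implementation-dependent, and neither the function name `combine_scores` nor the weighting scheme comes from the requirements.

```python
def combine_scores(component_scores, weights):
    """Weighted average of component scores; stays within 0.0-1.0 when the inputs do."""
    if len(component_scores) != len(weights):
        raise ValueError("one weight is needed per component score")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return sum(s * w for s, w in zip(component_scores, weights)) / total


# The first predicate is weighted three times as heavily as the second.
print(combine_scores([1.0, 0.0], [3, 1]))
```

A weighted average is used in this sketch because it keeps the combined value inside the same range as its inputs, so the result can still be treated as a SCORE.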
XQuery and XPath Full-Text MUST be extensible by vendors.
Status: this requirement has been met.
XQuery and XPath Full-Text MAY be extensible by users.
Status: this requirement has been met.
The first version of XQuery and XPath Full-Text MUST provide a robust framework for future versions.
Status: this requirement has been met.
It is not a requirement that XQuery and XPath Full-Text be designed as an end-user UI language.
Status: this requirement has been met.
It SHOULD be possible to search XQuery and XPath Full-Text queries.
Status: this requirement has been met.
This section specifies requirements for the integration of XQuery and XPath Full-Text with XQuery and XPath.
Part, but not necessarily all, of XQuery and XPath Full-Text MUST be usable as part of an XPath expression.
Status: this requirement has been met.
XQuery and XPath Full-Text SHOULD use the extensibility mechanisms that exist in XQuery and XPath for integration into XQuery and XPath.
Status: this requirement has been met.
XQuery and XPath Full-Text MUST use the extensibility mechanisms that exist in XQuery and XPath for its own extensibility.
Status: this requirement has been met.
XQuery and XPath Full-Text MUST be composable with XQuery, and SHOULD be composable with itself.
Status: this requirement has been met.
XQuery and XPath Full-Text MAY have more than one syntax binding. One query language syntax MUST be convenient for humans to read and write. See XML Query Requirements.
Status: this requirement has been met.
XQuery and XPath Full-Text MAY have more than one syntax binding. One query language syntax MUST be expressed in XML in a way that reflects the underlying structure of the query. See XML Query Requirements.
Status: this requirement has been met.
This section defines requirements for the functionality in XQuery and XPath Full-Text, and the scope of XQuery and XPath Full-Text queries.
XQuery and XPath Full-Text MUST provide, in the first release, the minimum set of Full-Text functionality that is useful.
single-word search
phrase search
support for stop words
single character suffix
0 or more character suffix
0 or more character prefix
0 or more character infix
proximity searching (unit: words)
specification of order in proximity searching
combination using AND
combination using OR
combination using NOT
word normalization, diacritics
ranking, relevance
Status: this requirement has been met.
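Several items in the list above, namely the single-character suffix, the zero-or-more-character prefix, suffix and infix, and diacritics normalization, can be sketched in Python as follows. The wildcard notation is an assumption chosen for this sketch (`?` for exactly one character, `*` for zero or more) and is not taken from any XQuery and XPath Full-Text syntax; the function names are likewise hypothetical.

```python
import re
import unicodedata


def strip_diacritics(word):
    """Remove combining marks so that accented and unaccented forms compare equal."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def wildcard_to_regex(pattern):
    """Translate the assumed notation: '?' is one word character, '*' is zero or more."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(r"\w")
        elif ch == "*":
            parts.append(r"\w*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(pattern, word):
    """Return True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None


print(matches("run?", "runs"))      # single character suffix
print(matches("run*", "running"))   # 0 or more character suffix
print(matches("*able", "readable")) # 0 or more character prefix
print(matches("col*r", "colour"))   # 0 or more character infix
print(strip_diacritics("résumé"))
```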
Additional functionality represented in the [XQuery and XPath Full-Text Use Cases] MUST be considered, but may be left to a future release.
Status: this requirement has been met.
Additional functionality from other Full-Text search contexts such as [SQL/MM Full-Text] MUST be considered, but SHOULD be left to a future release.
Status: this requirement has been met.
XQuery and XPath Full-Text MUST allow search within an arbitrary structure (an arbitrary XPath expression).
Status: this requirement has been met.
XQuery and XPath Full-Text MUST NOT preclude Full-Text search within structures constructed during a query.
Status: this requirement has been met.
XQuery and XPath Full-Text MUST allow a query to return arbitrary nodes.
Status: this requirement has been met.
XQuery and XPath Full-Text MUST allow the combination of predicates on different parts of the searched document 'tree'.
Status: this requirement has been met.
XQuery and XPath Full-Text MUST support Full-Text search within attributes.
Status: this requirement has been met.
XQuery and XPath Full-Text MAY support Full-Text search within attributes in conjunction with Full-Text search within element content.
Status: this requirement has been met.
If XQuery and XPath Full-Text supports search within names of elements and attributes, then it MUST distinguish between (a) element content and attribute values and (b) names of elements and attributes in any search.
Status: this requirement has been met.
XQuery and XPath Full-Text MUST support search across element boundaries, at least for NEAR.
Status: this requirement has been met.
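A proximity search that crosses element boundaries can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is an illustrative sketch under stated assumptions, not a rendering of the NEAR operator: the function names are hypothetical, and the sketch tokenizes each text fragment separately, so every element boundary also acts as a token boundary.

```python
import re
import xml.etree.ElementTree as ET


def tokens_across_elements(xml_text):
    """Collect word tokens from all text in document order, ignoring the markup itself."""
    root = ET.fromstring(xml_text)
    words = []
    for fragment in root.itertext():
        words.extend(re.findall(r"\w+", fragment.lower()))
    return words


def near(tokens, first, second, max_gap):
    """Return True if the two words occur with at most `max_gap` words between them."""
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    return any(
        abs(i - j) - 1 <= max_gap
        for i in first_positions
        for j in second_positions
        if i != j
    )


document = "<p>The <b>quick</b> brown <i>fox</i> jumps</p>"
tokens = tokens_across_elements(document)
# "quick" and "fox" sit in different child elements, with one word between them.
print(near(tokens, "quick", "fox", 1))
```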
Author | Date | Action | Description |
Stephen Buxton | 2003-03-19 | Added a Change Log | |
Stephen Buxton | 2003-03-19 | Terminology definition changes | Switched the definitions of SHOULD and MAY, to be consistent with [XML Query Requirements]. The rest of the document does not need to change, since the earlier versions of this document, on which the text of the spec is based, referred to the definitions in [XML Query Requirements]. |
Stephen Buxton | 2003-04-18 | Change XML Query Requirements link to external URI | Changed links in the document body to point to external latest copy of XML Query Requirements. |
Pat Case | 2006-11-17 | Recorded that requirements were met | Recorded that the XML Query Requirements have been met by the Full-Text Last Call Working Drafts. |