Copyright © 2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
The document specifies requirements for Full-Text search for use in XQuery [XQuery] and XPath [XPath].
This is a public W3C Working Draft for review by W3C Members and other interested parties. This section describes the status of this document at the time of its publication. It is a draft document and may be updated, replaced, or made obsolete by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress." A list of current public W3C technical reports can be found at http://www.w3.org/TR/.
The Full-Text Requirements have been defined jointly by the XQuery Working Group and the XSL Working Group (both part of the XML Activity).
This is the first version of this document.
This document is a work in progress. It contains many open issues, and should not be considered to be fully stable. Vendors who wish to create preview implementations based on this document do so at their own risk. While this document reflects the general consensus of the working groups, there are still controversial areas that may be subject to change.
Public comments on this document and its open issues are welcome. Comments should be sent to the W3C XPath/XQuery mailing list, public-qt-comments@w3.org (archived at http://lists.w3.org/Archives/Public/public-qt-comments/).
Patent disclosures relevant to this specification may be found on the XML Query Working Group's patent disclosure page at http://www.w3.org/2002/08/xmlquery-IPR-statements and on the XSL Working Group's patent disclosure page at http://www.w3.org/Style/XSL/Disclosures.
1 Introduction
2 Terminology
2.1 MUST
2.2 MAY
2.3 SHOULD
2.4 SCORE
2.5 Full-Text Search
3 Language Design
3.1 The Data Model
3.2 Side-effects on the data
3.3 Score Function and Full-Text predicates
3.3.1 Predicate and Score Independence
3.3.2 Score language
3.4 Score algorithm
3.4.1 Return Score
3.4.2 Sort by Score
3.4.3 Type, Range of Score
3.4.4 Score Statistics
3.4.5 Semantics of Score
3.5 Combined score
3.5.1 Score Combination
3.5.2 Score algorithm vendor-provided
3.5.3 Score algorithm overridable
3.5.4 Score influence
3.6 Extensibility
3.6.1 Extensible by vendors
3.6.2 Extensible by users
3.7 First, Future Versions
3.8 End user language
3.9 Searchable query
3.10 Universality
4 Integration
4.1 XPath
4.2 Extensibility Mechanisms
4.2.1 Integration into XQuery/XPath
4.2.2 XQuery/XPath Full-Text Extensibility
4.3 Composability
4.4 Human-readable
4.5 XML syntax
5 Implementation
5.1 Declarativity
6 Functionality and Scope
6.1 Functionality
6.2 Search Scope
6.2.1 Search within arbitrary structure
6.2.2 Constructed Structures
6.2.3 Return Arbitrary Nodes
6.2.4 Parts of Search Tree
6.3 Attributes
6.3.1 Search within attributes
6.3.2 Search across attributes and content
6.4 Markup
6.5 Element Boundaries
6.5.1 Search across element boundaries
6.5.2 Element as a token boundary
6.6 Score
6.6.1 Score accessible
6.6.2 Implicit ordering
6.6.3 Score extendable
A References
A.1 Non-Normative
"Full-Text Search" (FTS) is a large field which covers a vast array of functionality. In addition, there are many different ways one could combine FTS capabilities with XQuery and XPath.
This paper describes a set of requirements for FTS in XQuery/XPath (XQuery/XPath Full-Text). At this stage in the life of the document, these requirements should be read as suggestions only: the issues associated with the requirements are to be discussed and resolved by the relevant Working Groups. This format provides a firm basis for the Working Groups to set the direction of the work on XQuery/XPath Full-Text, and to compare existing proposals. Once the issues are resolved and this Requirements document is finalized, it will be easier to define the functionality of XQuery/XPath Full-Text and its integration with XQuery and/or XPath.
Note that we will attempt to define requirements for the language without reference to any particular solution.
We use the terms MUST, SHOULD and MAY throughout the document to specify the extent to which an item is a requirement for the work of XQuery/XPath Full-Text. We use the same definitions of MUST, SHOULD and MAY as the XQuery Requirements [XQuery Requirements].
When the words MUST, SHOULD, or MAY are used in this technical sense, they occur as a hyperlink to these definitions. These words will also be used with their conventional English meaning, in which case there is no hyperlink. For instance, the phrase "the full implications should be understood" uses the word "should" in its conventional English sense, and therefore occurs without the hyperlink.
Other terminology used in this document:
This section covers requirements for XQuery/XPath Full-Text language design that are independent from, but related to, integration and scoping requirements.
XQuery/XPath Full-Text functions MUST operate on instances of the XQuery/XPath Data Model.
XQuery/XPath Full-Text MUST NOT introduce or rely on side-effects.
The first version of XQuery/XPath Full-Text MUST provide a robust framework for future versions.
It is not a requirement that XQuery/XPath Full-Text be designed as an end-user UI language.
It SHOULD be possible to search XQuery/XPath Full-Text queries.
This section specifies requirements for the integration of XQuery/XPath Full-Text with XQuery and XPath.
Part, but not necessarily all, of XQuery/XPath Full-Text MUST be usable as part of an XPath expression.
XQuery/XPath Full-Text MUST be composable with XQuery, and SHOULD be composable with itself.
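As an informal illustration of what composability means here, the following Python sketch treats full-text conditions as ordinary values that can be nested inside each other and used inside a larger host expression. It is not a proposal for XQuery/XPath Full-Text syntax; the names `has_word`, `all_of`, `any_of` and `negate`, and the sample titles, are hypothetical and exist only for this example.

```python
from typing import Callable

Predicate = Callable[[str], bool]

def has_word(word: str) -> Predicate:
    """Hypothetical full-text primitive: true if `word` occurs in the text."""
    target = word.lower()
    return lambda text: target in text.lower().split()

def all_of(*preds: Predicate) -> Predicate:
    """Combine conditions so that every one must hold."""
    return lambda text: all(p(text) for p in preds)

def any_of(*preds: Predicate) -> Predicate:
    """Combine conditions so that at least one must hold."""
    return lambda text: any(p(text) for p in preds)

def negate(pred: Predicate) -> Predicate:
    """Invert a condition."""
    return lambda text: not pred(text)

# Invented sample data standing in for values selected by a host query.
titles = [
    "XML query languages",
    "Full text search in XML",
    "Relational query processing",
]

# Composable with itself: conditions nest inside other conditions.
wanted = all_of(has_word("xml"), any_of(has_word("search"), negate(has_word("query"))))

# Composable with the host language: the condition sits next to an ordinary
# length test, and the matches feed a further expression (upper-casing).
selected = [t.upper() for t in titles if len(t) > 10 and wanted(t)]
```

The point of the sketch is only that a full-text condition can appear wherever an ordinary expression can, and that its result can be consumed by other expressions.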
XQuery/XPath Full-Text MAY have more than one syntax binding. One query language syntax MUST be convenient for humans to read and write. See [XQuery Requirements].
XQuery/XPath Full-Text MAY have more than one syntax binding. One query language syntax MUST be expressed in XML in a way that reflects the underlying structure of the query. See [XQuery Requirements].
This section defines requirements for the functionality in XQuery/XPath Full-Text, and the scope of XQuery/XPath Full-Text queries.
XQuery/XPath Full-Text MUST provide, in the first release, the minimum set of Full-Text functionality that is useful.
single-word search
phrase search
support for stopwords
single character suffix
0 or more character suffix
0 or more character prefix
0 or more character infix
proximity searching (unit: words)
specification of order in proximity searching
combination using AND
combination using OR
combination using NOT
word normalization, diacritics
ranking, relevance
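To make the items in this list concrete, the following Python sketch shows one simple reading of each of them over a plain string. It is an illustration under stated assumptions, not a proposed design: the stopword list, the wildcard characters (`?` for exactly one character, `*` for zero or more), the definition of distance as a difference in word positions, and the naive score are all choices made for this example only.

```python
import re
import unicodedata
from fnmatch import fnmatchcase

STOPWORDS = {"the", "a", "of", "and"}  # hypothetical stopword list

def normalize(text):
    """Word normalization: lowercase and strip diacritics ('Résumé' -> 'resume')."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def tokenize(text, drop_stopwords=False):
    """Split text into normalized words; list positions are in units of words."""
    words = re.findall(r"\w+", normalize(text))
    return [w for w in words if not (drop_stopwords and w in STOPWORDS)]

def contains_word(tokens, word):
    """Single-word search."""
    return normalize(word) in tokens

def contains_phrase(tokens, phrase, drop_stopwords=False):
    """Phrase search: the phrase's words must appear consecutively, in order."""
    wanted = tokenize(phrase, drop_stopwords)
    n = len(wanted)
    return n > 0 and any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))

def matches_wildcard(tokens, pattern):
    """Prefix, suffix and infix wildcards, matched against each whole word."""
    return any(fnmatchcase(t, normalize(pattern)) for t in tokens)

def within(tokens, first, second, max_distance, ordered=False):
    """Proximity: the two words occur at most max_distance positions apart."""
    first, second = normalize(first), normalize(second)
    for i, a in enumerate(tokens):
        for j, b in enumerate(tokens):
            if a == first and b == second and i != j:
                if abs(j - i) <= max_distance and (j > i or not ordered):
                    return True
    return False

def score(tokens, query):
    """Naive relevance: fraction of distinct query words found, 0.0 to 1.0."""
    words = set(tokenize(query))
    return sum(w in tokens for w in words) / len(words) if words else 0.0

text = "The quick brown fox reads the résumé of Zoë while running"
doc = tokenize(text)

found_word = contains_word(doc, "RÉSUMÉ")                 # case and diacritics ignored
found_phrase = contains_phrase(doc, "brown fox")
found_suffix = matches_wildcard(doc, "run*")              # 0 or more character suffix
near = within(doc, "quick", "fox", 2, ordered=True)       # ordered proximity
combined = contains_word(doc, "fox") and not contains_word(doc, "dog")  # AND, NOT
```

Stopword support is shown by tokenizing both the text and the phrase with `drop_stopwords=True`, so that "reads the résumé" matches even if the stopword differs or is absent. Combination with AND, OR and NOT is left to the host language's own Boolean operators.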
Additional functionality represented in the [XQuery and XPath Full-Text Use Cases] MUST be considered, but MAY be left to a future release.
Additional functionality from other Full-Text search contexts such as [SQL/MM Full-Text] MUST be considered, but SHOULD be left to a future release.
element content and attribute values
names of elements and attributes
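As a rough illustration of the difference between these two kinds of searchable text, the following Python sketch searches a small XML document in element content, in attribute values, and in element and attribute names. The sample document, the `search` function and its parameters are hypothetical, and matching on whole lowercase words is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
SAMPLE = (
    '<book lang="en">'
    '<title edition="second">Usability testing</title>'
    '<author>Ann <b>Marie</b> Smith</author>'
    '</book>'
)

def words(text):
    """Split a possibly missing string into lowercase words."""
    return (text or "").lower().split()

def search(root, word, content=True, attributes=True, names=False):
    """Return the tags of elements where `word` occurs in the chosen scopes."""
    word = word.lower()
    hits = []
    for elem in root.iter():  # document order
        found = False
        if content:
            # Text owned directly by this element: its leading text plus the
            # text that follows each of its child elements.
            own = words(elem.text)
            for child in elem:
                own += words(child.tail)
            found = found or word in own
        if attributes:
            found = found or any(word in words(v) for v in elem.attrib.values())
        if names:
            found = found or word == elem.tag.lower()
            found = found or word in (k.lower() for k in elem.attrib)
        if found:
            hits.append(elem.tag)
    return hits

root = ET.fromstring(SAMPLE)
in_content = search(root, "smith")                  # element content
in_attribute = search(root, "second")               # attribute value
in_names = search(root, "title", names=True)        # element name
```

In this sketch a search for "title" finds nothing unless names are included in the scope, which is the distinction the two items above draw.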