W3C

XQuery 1.0 and XPath 2.0 Full-Text

W3C Working Draft 1 May 2006

This version:
http://www.w3.org/TR/2006/WD-xquery-full-text-20060501/
Latest version:
http://www.w3.org/TR/xquery-full-text/
Previous versions:
http://www.w3.org/TR/2005/WD-xquery-full-text-20051103/ http://www.w3.org/TR/2005/WD-xquery-full-text-20050915/ http://www.w3.org/TR/2005/WD-xquery-full-text-20050404/ http://www.w3.org/TR/2004/WD-xquery-full-text-20040709/
Editors:
Sihem Amer-Yahia, AT&T Labs - Research <sihem@research.att.com>
Chavdar Botev, Invited Expert <cbotev@cs.cornell.edu>
Stephen Buxton, Mark Logic Corporation <stephen.buxton@marklogic.com>
Pat Case, Library of Congress <pcase@crs.loc.gov>
Jochen Doerre, IBM <doerre@de.ibm.com>
Mary Holstege, Mark Logic Corporation <mary.holstege@marklogic.com>
Darin McBeath, Elsevier <D.McBeath@elsevier.com>
Michael Rys, Microsoft <mrys@microsoft.com>
Jayavel Shanmugasundaram, Invited Expert <jai@cs.cornell.edu>

This document is also available in these non-normative formats: XML.


Abstract

This document defines the syntax and formal semantics of XQuery 1.0 and XPath 2.0 Full-Text which is a language that extends XQuery 1.0 [XQuery 1.0: An XML Query Language] and XPath 2.0 [XML Path Language (XPath) 2.0] with full-text search capabilities.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a public W3C Working Draft for review by W3C members and other interested parties. Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This is the fifth version of this document. Since the last version was published, several technical and editorial changes have been made to all the sections of the document. Among the most significant changes are: the addition of a section describing the processing model for full-text search and how it integrates with the XQuery Processing Model; the reformulation of the AllMatches model so that a primitive match (TokenInfo) now can represent an interval of token positions, and hence, a match of a phrase (in the former version phrases were modeled using distance constraints, which had certain unwanted implications when distance operators were explicitly applied to phrases); the restriction of the FTTimes operation to simple FTSelections; and several simplifications in the semantics functions that the latter two changes made possible, like the removal of the AllMatches normalization. The XQuery functions that are used to define the semantics of the full-text operations have been thoroughly revised and are now syntax- and type-checked.

This document has been produced following the procedures set out for the W3C Process. This document was produced through the efforts of XML Query Working Group and the XSL Working Group (both part of the XML Activity). It is designed to be read in conjunction with the following documents: W3C XQuery and XPath Full-Text Requirements [XQuery and XPath Full-Text Requirements] and the W3C XQuery Full-Text Use Cases [XQuery 1.0 and XPath 2.0 Full-Text Use Cases].

Public comments on this document and its open issues are invited. Comments should be entered into the issue tracking system for this specification (instructions can be found at http://www.w3.org/XML/2005/04/qt-bugzilla). If access to that system is not feasible, you may send your comments to the W3C mailing list, public-qt-comments@w3.org (http://lists.w3.org/Archives/Public/public-qt-comments/) with "[FT]" at the beginning of the subject field of email messages involving such comments.

This document was produced by groups operating under the 5 February 2004 W3C Patent Policy. W3C maintains public lists of any patent disclosures made in connection with the deliverables of the XML Query Working Group and the XSL Working Group; those pages also include instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
    1.1 Full-Text Search and XML
    1.2 Organization of this document
    1.3 A word about namespaces
2 Full-Text Extensions to XQuery and XPath
    2.1 Processing Model
    2.2 Expression FTContainsExpr
        2.2.1 FTContainsExpr Description
        2.2.2 FTContainsExpr Examples
    2.3 Score Variables
        2.3.1 Using Weights Within a Scored FTContainsExpr
    2.4 Extensions to the Static Context
3 FTSelections
    3.1 Full-Text Operators
        3.1.1 FTWords
        3.1.2 FTOr
        3.1.3 FTAnd
        3.1.4 FTMildNot
        3.1.5 FTUnaryNot
        3.1.6 FTOrder
        3.1.7 FTScope
        3.1.8 FTDistance
        3.1.9 FTWindow
        3.1.10 FTTimes
        3.1.11 FTContent
    3.2 FTMatchOptions
        3.2.1 FTCaseOption
        3.2.2 FTDiacriticsOption
        3.2.3 FTStemOption
        3.2.4 FTThesaurusOption
        3.2.5 FTStopwordOption
        3.2.6 FTLanguageOption
        3.2.7 FTWildCardOption
    3.3 FTIgnoreOption
4 Semantics
    4.1 Tokenization
        4.1.1 Examples
        4.1.2 Representations of Tokenized Text and Matching
    4.2 Evaluation of FTSelections
        4.2.1 AllMatches
            4.2.1.1 Formal Model
            4.2.1.2 Examples
            4.2.1.3 XML representation
        4.2.2 FTSelections
            4.2.2.1 XML Representation
            4.2.2.2 The evaluate function
            4.2.2.3 Formal semantics functions
            4.2.2.4 FTWords
            4.2.2.5 FTOr
            4.2.2.6 FTAnd
            4.2.2.7 FTUnaryNot
            4.2.2.8 FTMildNot
            4.2.2.9 FTOrder
            4.2.2.10 FTScope
            4.2.2.11 FTContent
            4.2.2.12 FTDistance
            4.2.2.13 FTWindow
            4.2.2.14 FTTimes
        4.2.3 Match Options Semantics
            4.2.3.1 Types
            4.2.3.2 High-Level Semantics
            4.2.3.3 Formal Semantics Functions
            4.2.3.4 FTCaseOption
            4.2.3.5 FTDiacriticsOption
            4.2.3.6 FTStemOption
            4.2.3.7 FTThesaurusOption
            4.2.3.8 FTStopWordOption
            4.2.3.9 FTLanguageOption
            4.2.3.10 FTWildCardOption
    4.3 XQuery 1.0 and XPath 2.0 Full-Text and Scoring Expressions
        4.3.1 FTContainsExpr
            4.3.1.1 Semantics of FTContainsExpr
        4.3.2 Scoring
        4.3.3 Example

Appendices

A EBNF for XQuery 1.0 Grammar with Full-Text extensions
    A.1 Terminal Symbols
B EBNF for XPath 2.0 Grammar with Full-Text extensions
    B.1 Terminal Symbols
C Static Context Components
D Error Conditions
E References
    E.1 Normative References
    E.2 Non-normative References
F Acknowledgements (Non-Normative)
G Glossary (Non-Normative)
H Checklist of Implementation-Defined Features (Non-Normative)
I Change Log (Non-Normative)


1 Introduction

This document defines the language and the formal semantics of XQuery 1.0 and XPath 2.0 Full-Text. This language is designed to meet the requirements identified in W3C XQuery and XPath Full-Text Requirements [XQuery and XPath Full-Text Requirements] and to support the queries in the W3C XQuery Full-Text Use Cases [XQuery 1.0 and XPath 2.0 Full-Text Use Cases].

XQuery 1.0 and XPath 2.0 Full-Text extends the syntax and semantics of XQuery 1.0 and XPath 2.0.

1.1 Full-Text Search and XML

As XML becomes mainstream, users expect to be able to search their XML documents. This requires a standard way to do full-text search, as well as structured searches, against XML documents. A similar requirement for full-text search led ISO to define the SQL/MM-FT [SQL/MM] standard. SQL/MM-FT [SQL/MM] defines extensions to SQL to express full-text searches, providing functionality similar to that of this full-text language extension to XQuery 1.0 and XPath 2.0.

XML documents may contain highly-structured data (numbers, dates), unstructured data (untagged free-flowing text), and semi-structured data (text with embedded tags). Where a document contains unstructured or semi-structured data, it is important to be able to search using Information Retrieval techniques such as scoring and weighting.

Full-text search is different from substring search in many ways:

  1. A full-text search searches for tokens and phrases rather than substrings. A substring search for news items that contain the string "lease" will return a news item that contains "Foobar Corporation releases the 20.9 version ...". A full-text search for the token "lease" will not.

  2. There is an expectation that a full-text search will support language-based searches, which substring search cannot. An example of a language-based search is "find me all the news items that contain a token with the same linguistic stem as 'mouse'" (finds "mouse" and "mice"). Another example, based on token proximity, is "find me all the news items that contain the tokens 'XML' and 'Query', allowing up to 3 intervening words".

  3. Full-text search must address the vagaries and nuances of language. Search results are often of varying usefulness. When you search a web site for cameras that cost less than $100, this is an exact search. There is a set of cameras that matches this search, and a set that does not. Similarly, when you do a string search across news items for "mouse", there is only 1 expected result set. When you do a full-text search for all the news items that contain the token "mouse", you probably expect to find news items containing the token "mice", and possibly "rodents", or possibly "computers". Not all results are equal. Some results are more "mousey" than others. Because full-text search may be inexact, we have the notion of score or relevance. We generally expect to see the most relevant results at the top of the results list.

    As XQuery and XPath evolve, they may apply the notion of score to querying structured data. For example, when making travel plans or shopping for cameras, it is sometimes useful to get an ordered list of near matches in addition to exact matches. If XQuery and XPath define a generalized inexact match, we expect XQuery and XPath to utilize the scoring framework provided by XQuery and XPath Full-Text.
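
The contrast drawn in item 1 can be illustrated with a minimal Python sketch. The tokenizer used here (lower-cased runs of letters and digits) is an assumption made for illustration; actual tokenization is implementation-defined, and the function names are hypothetical.

```python
import re

def tokenize(text):
    # Assumed toy tokenizer: lower-cased runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def substring_search(text, needle):
    # Plain substring test: matches anywhere, even inside a longer word.
    return needle.lower() in text.lower()

def token_search(text, needle):
    # Full-text style test: the needle must equal a whole token.
    return needle.lower() in tokenize(text)

news_item = "Foobar Corporation releases the 20.9 version"
print(substring_search(news_item, "lease"))  # True: "releases" contains "lease"
print(token_search(news_item, "lease"))      # False: no token equals "lease"
```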

The following definitions apply to full-text search:

  1. [Definition: Full-text queries are performed on tokens and phrases. Tokens and phrases are produced via tokenization.] Informally, tokenization breaks a character string into a sequence of words, units of punctuation, and spaces.

  2. [Definition: A token is defined as a character, n-gram, or sequence of characters returned by a tokenizer as a basic unit to be searched. Each instance of a token consists of one or more consecutive characters. Beyond that, tokens are implementation-defined.] Note that consecutive tokens need not be separated by either punctuation or space, and tokens may overlap. [Definition: A phrase is an ordered sequence of any number of tokens. Beyond that, phrases are implementation-defined.]

    Note:

    In some natural languages, tokens and words can be used interchangeably.

  3. Tokenization enables functions and operators that operate on a part or the root of the token (e.g., wildcards, stemming).

    Tokenization enables functions and operators which work with the relative positions of tokens (e.g., proximity operators).

    Tokenization also uniquely identifies sentences and paragraphs in which tokens appear. [Definition: A sentence is an ordered sequence of any number of tokens. Beyond that, sentences are implementation-defined. A tokenizer is not required to support sentences.] [Definition: A paragraph is an ordered sequence of any number of tokens. Beyond that, paragraphs are implementation-defined. A tokenizer is not required to support paragraphs.] Whatever a tokenizer for a particular language chooses to do, it must preserve the containment hierarchy: paragraphs contain sentences which contain tokens.

    The tokenizer has to evaluate two equal strings in the same way, i.e., it should identify the same tokens. Everything else is implementation-defined.

  4. This specification focuses on functionality that serves all languages. It also selectively includes functionalities useful within specific families of languages. For example, searching within sentences and paragraphs is useful to many western languages and to some non-western languages, so that functionality is incorporated into this specification.

  5. Some XML elements represent semantic markup, e.g., <title>. Others represent formatting markup, e.g., <b> to indicate bold. Semantic markup serves well as token boundaries, while formatting markup sometimes does not. Implementations are free to provide implementation-defined ways to differentiate the effects that different kinds of markup have on token boundaries during tokenization.

1.2 Organization of this document

This document is organized as follows. We first present a high level syntax for the XQuery 1.0 and XPath 2.0 Full-Text language along with some examples. Then, we present the syntax and examples of the basic primitives in the XQuery 1.0 and XPath 2.0 Full-Text language. This is followed by the semantics of the XQuery 1.0 and XPath 2.0 Full-Text language. The appendix contains a section that provides an EBNF for the XPath 2.0 Grammar with Full-Text extensions, an EBNF for XQuery 1.0 Grammar with Full-Text extensions, acknowledgements and a glossary.

1.3 A word about namespaces

Certain namespace prefixes are predeclared by XQuery 1.0 and, by implication, by this specification, and bound to fixed namespace URIs. These namespace prefixes are as follows:

  • xml = http://www.w3.org/XML/1998/namespace

  • xs = http://www.w3.org/2001/XMLSchema

  • xsi = http://www.w3.org/2001/XMLSchema-instance

  • fn = http://www.w3.org/2005/xpath-functions

  • xdt = http://www.w3.org/2005/xpath-datatypes

  • local = http://www.w3.org/2005/xquery-local-functions

In addition to the prefixes in the above list, this document uses the prefix err to represent the namespace URI http://www.w3.org/2005/xqt-errors. This namespace prefix is not predeclared and its use in this document is not normative. Error codes that are not defined in this document are defined in other XQuery 1.0 and XPath 2.0 specifications, particularly [XML Path Language (XPath) 2.0] and [XQuery 1.0 and XPath 2.0 Functions and Operators].

Finally, this document uses the prefix fts to represent a namespace containing a number of functions used in this document to describe the semantics of XQuery 1.0 and XPath 2.0 Full-Text functions. There is no requirement that these functions be implemented, therefore no URI is associated with that prefix.

2 Full-Text Extensions to XQuery and XPath

XQuery 1.0 and XPath 2.0 Full-Text extends the languages of XQuery 1.0 and XPath 2.0 in three ways. It:

  1. Adds a new expression called FTContainsExpr;

  2. Enhances the syntax of FLWOR expressions in XQuery 1.0 and for expressions in XPath 2.0 with optional score variables; and

  3. Adds static context declarations for full-text match options to the query prolog.

Additionally, it extends the data model and processing models in various ways.

2.1 Processing Model

As part of the External Processing that is described in the XQuery Processing Model, when an XML document is parsed into an Infoset/PSVI and ultimately into an XQuery Data Model instance, an implementation-defined full-text process called tokenization is usually executed.

Tokenization, in general terms, is the process of converting a text string into smaller units that are used in query processing. Those units, called tokens, are the most basic text units that a full-text search can refer to. Full-text operators typically work on sequences of token occurrences found in the target text (nodes) of a search. These token occurrences are characterized by unique identifiers that capture the relative position of the token inside the string, the relative position of the sentence containing the token, and the relative position of the paragraph containing the token.

The tokenization process is implementation-dependent. For example, tokenization may differ from domain to domain and from language to language. This specification imposes only a small number of constraints on the semantics of a correct tokenizer. As a consequence, all the examples in this document are given for explanatory purposes only; their results are not mandatory, i.e., the result of such full-text queries depends on the tokenizer being used.
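
As a rough illustration of the identifiers described above, the following Python sketch records, for each token occurrence, its relative position in the string and the relative positions of its sentence and paragraph. All names are hypothetical, and the boundary rules (blank lines separate paragraphs; ".", "!" and "?" end sentences) are assumptions, not requirements of this specification.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class TokenOccurrence:
    word: str
    position: int   # relative position of the token in the string
    sentence: int   # relative position of the containing sentence
    paragraph: int  # relative position of the containing paragraph

def tokenize(text):
    # Assumed rules: blank lines separate paragraphs, [.!?] end sentences,
    # and tokens are runs of word characters.
    occurrences = []
    position = sentence = 0
    paragraphs = re.split(r"\n\s*\n", text.strip())
    for para_no, para in enumerate(paragraphs, start=1):
        for sent in re.split(r"[.!?]+", para):
            words = re.findall(r"\w+", sent)
            if not words:
                continue  # skip empty fragments after the final delimiter
            sentence += 1
            for word in words:
                position += 1
                occurrences.append(
                    TokenOccurrence(word, position, sentence, para_no))
    return occurrences

sample = ("The usability of a Web site. A Web site should facilitate "
          "learning.\n\nThis book has been approved.")
tokens = tokenize(sample)
print(tokens[6])
# TokenOccurrence(word='A', position=7, sentence=2, paragraph=1)
```

Because the function is deterministic, two equal strings always yield the same tokens, and the containment hierarchy (paragraphs contain sentences, which contain tokens) is preserved by construction.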

A full-text expression or FTContainsExpr, evaluated within the normal Query Processing (XQuery Processing Model), is composed of several parts:

  1. An XPath 2.0 or XQuery 1.0 expression (RangeExpr) that specifies the sequence of items to be searched. Those items are called the search context.

  2. The full-text selection to be applied (FTSelections). FTSelections are, syntactically and semantically, fully composable and contain:

    • Required:

      • Words and phrases for which a search is performed (FTWords).

    • Optional:

      • Match options, such as indicators for case sensitivity and stop words (FTMatchOptions);

      • Boolean full-text operators that compose an FTSelection from simpler FTSelections;

      • Other full-text operators that are constraints on the positions of matches, such as indicators for distance between tokens and for the cardinality of matches; and

      • Weighting information. Each individual search term in an FTSelection may be annotated with optional weight information. This information may be used during the evaluation of the FTSelections to calculate scoring information, which quantifies the relevance of the result to the given search criteria.

  3. An optional XPath 2.0 or XQuery 1.0 expression (UnionExpr) that specifies the set of nodes, descendants of the nodes in RangeExpr, whose contents may be ignored for the purpose of determining a match during the search (FTIgnoreOption).

The results of the evaluation of the FTSelection operators are instances of the AllMatches model, which complements the XQuery Data Model (XDM) for processing full-text queries. An AllMatches instance describes all possible solutions to the full-text query for a given search context item. Each solution is described by a Match instance. A Match instance contains the tokens from the search context that must be included (described using StringInclude instances which model the positive terms) and the tokens from search context item that must be excluded (described using StringExclude instances which model the negative terms). Each negative or positive term is modeled as a tuple: the position of the query word or phrase in the FTSelection, and a TokenInfo structure that describes a consecutive sequence of token occurrences in the text string which match the query word or phrase.
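
The structure described in the preceding paragraph can be pictured with the following Python sketch. The class and field names are hypothetical and chosen only for readability; the normative definition of the AllMatches model is given in Section 4.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class TokenInfo:
    # A consecutive run of token positions in the searched text.
    start_pos: int
    end_pos: int

@dataclass(frozen=True)
class StringMatch:
    # Used for both StringInclude and StringExclude entries.
    query_pos: int         # position of the query word or phrase in the FTSelection
    token_info: TokenInfo  # where that word or phrase occurs in the text

@dataclass
class Match:
    string_includes: List[StringMatch] = field(default_factory=list)  # positive terms
    string_excludes: List[StringMatch] = field(default_factory=list)  # negative terms

@dataclass
class AllMatches:
    matches: List[Match] = field(default_factory=list)  # one Match per possible solution

# Searching "the usability of a web site" for the phrase "web site":
# the phrase is query item 1 and covers token positions 5 to 6.
example = AllMatches([Match(string_includes=[StringMatch(1, TokenInfo(5, 6))])])
print(len(example.matches))  # 1
```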

Processing Model Extensions

Figure 1 provides a schematic overview of the XQuery 1.0 and XPath 2.0 Full-Text processing steps that are discussed in detail below. Some of these steps are completely outside the domain of XQuery; in Figure 1, these are depicted outside the black line that represents the boundaries of the language. The diagram shows only the central pieces of the XQuery Processing Model (see Section 2.2 Processing Model of [XQuery 1.0: An XML Query Language]), but zooms in on the Execution Engine, where the processing of the Full-Text extensions takes place. The full-text processing steps are labeled as FTn within the diagram and are referenced within the text.

Like all XQuery expressions, an FTContainsExpr returns an XDM Instance (see Fig. 1). With the exception of FTWords, which consumes TokenInfos, all FTSelections are closed under the AllMatches data model, i.e., their input and output are AllMatches instances. Tokenization normally occurs when the original XML documents are parsed, for example, during the Data Model Generation process (see Figure 1). However, it may also occur "on-the-fly", transforming an XDM instance into TokenInfos, which are ultimately converted into AllMatches instances by the evaluation of FTSelections. Thus, the evaluation of nested full-text and XQuery expressions moves back and forth between these two models.

The AllMatches instance obtained by evaluating a full-text expression is converted into a Boolean value before being returned to the enclosing XPath or XQuery operation, as follows. An AllMatches instance is a disjunction of Match instances. If at least one member of the disjunction contains only positive terms, the value returned is true. If every member of the disjunction contains negative terms, the result is false.
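
A minimal Python sketch of this conversion follows. The dictionary layout, with "includes" for positive terms and "excludes" for negative terms, is an assumption made for illustration only.

```python
def all_matches_to_boolean(all_matches):
    # all_matches: a list of matches, each a dict holding a list of
    # positive terms ("includes") and negative terms ("excludes").
    # The result is true when some match has no negative terms.
    return any(not match["excludes"] for match in all_matches)

positive_only = {"includes": [(1, (5, 6))], "excludes": []}
with_negative = {"includes": [(1, (5, 6))], "excludes": [(2, (9, 9))]}

print(all_matches_to_boolean([with_negative, positive_only]))  # True
print(all_matches_to_boolean([with_negative]))                 # False
print(all_matches_to_boolean([]))                              # False
```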

Weighting information may be used, in an implementation-dependent fashion, when calculating the scoring information that FTContainsExpr computes and makes available to the optional score construct.

Section 3 describes the syntax and the informal semantics of Full Text operators. Their formal semantics is defined in Section 4. The AllMatches data model is formally defined in Section 4.

Given the components of a given Full Text expression, the evaluation algorithm will proceed according to the following steps, also referenced in the processing model diagram as steps FTn (see Fig. 1):

  1. Evaluate the search context expression, resulting in the set of search context items. (FT1 covers the evaluation of any XPath 2.0 or XQuery 1.0 expressions that generate or modify the search context, as well as the query string(s) in a partially evaluated FTSelection expression.)

  2. Evaluate the (optional) ignore expression, resulting in the set of ignored nodes, and virtually delete the ignored nodes from the tree of search context nodes. (Included in FT1.)

  3. Apply the tokenization algorithm to query string(s). (FT2.1 -- this is implementation-dependent)

  4. For each search context item:

    1. Apply the tokenization algorithm in order to extract potentially matching terms together with their positional information. This step results in a sequence of token occurrences. (FT2.2 -- this is implementation-dependent)

    2. Evaluate the simple FTWords operators in the FTSelection against the tokenized input. This results in a set of AllMatches instances. (FT3)

    3. Evaluate the rest of the FTSelection operator tree in a bottom-up fashion. At each step, the AllMatches instances produced by the previous steps are given as input, and a new AllMatches instance is obtained as output. Throughout, the FTMatchOptions control the semantics of the application of the FTWords operator. (FT4)

  5. Convert the AllMatches instance into a Boolean value. (FT5)

The evaluation of the full-text expression also produces additional scoring information (also part of FT5). This information is implementation-dependent, is not specified in this document, and is made available at the same time the Boolean value is returned.
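
The order of the steps above can be sketched in Python for the simplest case, a single phrase with an optional set of ignored elements. Everything here is a simplifying assumption: nodes are modeled as (name, children) tuples, the tokenizer is a toy, and the rest of the FTSelection operator tree (FT4) and scoring are omitted.

```python
import re

def tokenize(text):
    # Assumed toy tokenizer (FT2.1 / FT2.2): lower-cased runs of word characters.
    return re.findall(r"\w+", text.lower())

def visible_text(node, ignored):
    # Step 2: "virtually delete" elements whose name is in the ignored set.
    if isinstance(node, str):
        return node
    name, children = node
    if name in ignored:
        return ""
    return " ".join(visible_text(child, ignored) for child in children)

def ft_words_phrase(query_tokens, item_tokens):
    # Step 4.2 (FT3): one match per position at which the phrase occurs.
    n = len(query_tokens)
    if n == 0:
        return []
    return [
        {"includes": [(start + 1, start + n)], "excludes": []}
        for start in range(len(item_tokens) - n + 1)
        if item_tokens[start:start + n] == query_tokens
    ]

def ft_contains(context_items, query, ignored=frozenset()):
    query_tokens = tokenize(query)                        # step 3
    for item in context_items:                            # step 4
        item_tokens = tokenize(visible_text(item, ignored))
        all_matches = ft_words_phrase(query_tokens, item_tokens)
        if any(not m["excludes"] for m in all_matches):   # step 5 (FT5)
            return True
    return False

content = ("content", [
    ("p", ["The usability of a Web site"]),
    ("note", ["This book has been approved by the Web Site Users Association."]),
])
print(ft_contains([content], "users association"))            # True
print(ft_contains([content], "users association", {"note"}))  # False
```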

2.2 Expression FTContainsExpr

As a syntactic construct, an FTContainsExpr behaves similarly to a comparison expression (see Section 3.5.2 General Comparisons of [XQuery 1.0: An XML Query Language]). The following grammar rule introduces FTContainsExpr.

[50]    ComparisonExpr    ::=    FTContainsExpr ( (ValueComp | GeneralComp | NodeComp) FTContainsExpr )?

An FTContainsExpr may be used anywhere a ComparisonExpr may be used. FTContainsExprs have higher precedence than comparison operators, so the results of FTContainsExpr may be compared without enclosing them in parentheses.

2.2.1 FTContainsExpr Description

[51]    FTContainsExpr    ::=    RangeExpr ( "ftcontains" FTSelection FTIgnoreOption? )?

An FTContainsExpr returns a Boolean value. It returns true if there is some node in RangeExpr that, after tokenization, matches FTSelection. For the purpose of determining a match, certain descendants of nodes in RangeExpr may be ignored, as specified in FTIgnoreOption.

2.2.2 FTContainsExpr Examples

The following example in extended XQuery 1.0 returns the author of each book with a title containing a token with the same root as dog and the token cat.

for $b in /books/book
where $b/title ftcontains ("dog" with stemming) && "cat" 
return $b/author

The same example in extended XPath 2.0 is written as:


/books/book[title ftcontains ("dog" with stemming) && "cat"]/author
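
The intended behavior of this query can be approximated in Python as follows. The stemmer is a hypothetical toy that only strips a plural "s", the tokenizer is assumed, and the book data is invented for illustration; real stemming is implementation-defined.

```python
import re

def tokenize(text):
    # Assumed toy tokenizer: lower-cased runs of word characters.
    return re.findall(r"\w+", text.lower())

def toy_stem(token):
    # Hypothetical stemmer for illustration only: strips a plural "s".
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

def title_matches(title):
    tokens = tokenize(title)
    has_dog_stem = any(toy_stem(t) == toy_stem("dog") for t in tokens)  # "dog" with stemming
    has_cat = "cat" in tokens                                           # "cat", no stemming
    return has_dog_stem and has_cat                                     # &&

books = [
    {"title": "Dogs and the cat next door", "author": "A. Author"},
    {"title": "Dog training without cats", "author": "B. Writer"},
    {"title": "The cat book", "author": "C. Scribe"},
]
authors = [b["author"] for b in books if title_matches(b["title"])]
print(authors)  # ['A. Author']
```

The second title is rejected because stemming applies only to "dog"; the token "cats" does not match the unstemmed query token "cat".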

2.3 Score Variables

Besides specifying a match of a full-text search as a Boolean condition, full-text search applications typically also have the ability to associate scores with the results. [Definition: Scores express the relevance of those results to the full-text search conditions.]

XQuery 1.0 and XPath 2.0 Full-Text extends the languages of XQuery 1.0 and XPath 2.0 further by adding optional score variables to the for and let clauses of FLWOR expressions.

The production for the extended for clause follows.

[35]    ForClause    ::=    "for" "$" VarName TypeDeclaration? PositionalVar? FTScoreVar? "in" ExprSingle ("," "$" VarName TypeDeclaration? PositionalVar? FTScoreVar? "in" ExprSingle)*
[37]    FTScoreVar    ::=    "score" "$" VarName

When a score variable is present in a for clause, the evaluation of the expression following the in keyword must determine not only the result sequence of the expression, i.e., the sequence of items that are iteratively bound to the for variable, but also, in each iteration, the relevance "score" value of the current item, to which the score variable is bound.

The following example selects the book elements that satisfy the condition [content ftcontains "web site" && "usability" and .//chapter/title ftcontains "testing"] and returns the scores assigned to them.

for $b score $s 
    in /books/book[content ftcontains "web site" && "usability" 
                   and .//chapter/title ftcontains "testing"]
return $s

XPath 2.0 Full-Text extends the language of XPath 2.0 in the for expression in the same way: with optional score variables. The example above is also a legal example of the XPath 2.0 extension.

Scores are typically used to order results, as in the following, more complete example.

for $b score $s 
    in /books/book[content ftcontains "web site" && "usability"]
where $s > 0.5
order by $s descending
return <result>  
          <title> {$b//title} </title> 
          <score> {$s} </score> 
       </result>

The score variable is bound to a value which reflects the relevance of the match criteria in the FTSelections to the nodes in the respective RangeExprs. The calculation of relevance is implementation-dependent, but score evaluation must follow these rules:

  1. Score values are of type xs:double in the range [0, 1].

  2. For score values greater than 0, a higher score must imply a higher degree of relevance.
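
The two rules can be illustrated with a toy scoring function in Python. The formula (fraction of query terms found) is purely an assumption, since the calculation of relevance is implementation-dependent; the book data is invented, and the phrase "web site" is treated as two separate terms for simplicity.

```python
import re

def tokenize(text):
    # Assumed toy tokenizer: lower-cased runs of word characters.
    return re.findall(r"\w+", text.lower())

def toy_score(content, terms):
    # Assumed formula: fraction of query terms found in the content.
    # The result always lies in [0, 1], as rule 1 requires.
    tokens = set(tokenize(content))
    found = sum(1 for term in terms if term in tokens)
    return found / len(terms)

def ranked(books, terms, threshold=0.5):
    scored = [(toy_score(b["content"], terms), b["title"]) for b in books]
    kept = [(s, t) for s, t in scored if s > threshold]          # where $s > 0.5
    return sorted(kept, key=lambda pair: pair[0], reverse=True)  # order by $s descending

books = [
    {"title": "Web Design", "content": "Building a web site"},
    {"title": "Gardening", "content": "Roses and tulips"},
    {"title": "Usability Basics", "content": "Usability of a web site"},
]
for score, title in ranked(books, ["web", "site", "usability"]):
    print(title, round(score, 2))
# Usability Basics 1.0
# Web Design 0.67
```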

As in a for clause, score variables may be specified in a let clause. A score variable in a let clause is also bound to the score of the expression evaluation, but in the let clause one score is determined for the complete result. The let variable may be dropped from the let clause if the score variable is present.

The production for the extended let clause follows.

[38]    LetClause    ::=    (("let" "$" VarName TypeDeclaration? FTScoreVar?) | ("let" "score" "$" VarName)) ":=" ExprSingle ("," (("$" VarName TypeDeclaration? FTScoreVar?) | FTScoreVar) ":=" ExprSingle)*

When the score option is used in a for clause, the expression following the in keyword serves two purposes: filtering, i.e., driving the iteration, and determining the scores. It is possible to specify separate expressions for filtering and scoring by combining a simple for clause with a let clause that uses scoring, as in the following example.

for $b in /books/book[.//chapter/title ftcontains "testing"]
let score $s := $b/content ftcontains "web site" && "usability" 
order by $s descending
return <result score="{$s}">{$b}</result>

This example returns book elements with chapter titles that contain "testing", together with their scores. These scores, however, reflect whether the book content contains "web site" and "usability".

Note that the score of an FTContainsExpr is not required to be 0 if the expression evaluates to false, nor to be non-zero if the expression evaluates to true. Hence, in the example above it is not possible to infer the Boolean value of the FTContainsExpr in the let clause from the calculated score of a returned result element. For instance, an implementation may want to assign a non-zero score to a book that contains only "web site", but not "usability", as this may be considered more relevant than a book that contains neither.

The use of score variables introduces a second-order aspect to the evaluation of expressions which cannot be emulated by (first-order) XQuery functions. Consider the following replacement of the clause let score $s := FTContainsExpr:

let $s := score(FTContainsExpr)

where a function score is applied to some FTContainsExpr. If the function score were first-order, it would only be applied to the result of the evaluation of its argument, which is one of the Boolean constants true or false. Hence, there would be at most two possible values such a score function would be able to return and no further differentiation would be possible.
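
The limitation can be made concrete with a small Python sketch. Both functions are hypothetical, and the graded formula is only one possible choice: the first-order function sees only the Boolean outcome, whereas the second-order variant inspects the evaluation itself and can therefore grade partial matches.

```python
def first_order_score(matched):
    # Receives only the Boolean result of the FTContainsExpr.
    return 1.0 if matched else 0.0

def second_order_score(tokens, terms):
    # Has access to the evaluation itself (assumed formula: fraction of
    # query terms found), so partial matches can be graded.
    return sum(1 for t in terms if t in tokens) / len(terms)

terms = ["web", "usability"]
docs = [
    ["web", "usability", "testing"],
    ["web", "design"],
    ["gardening"],
]

first = [first_order_score(all(t in d for t in terms)) for d in docs]
second = [second_order_score(d, terms) for d in docs]
print(first)   # [1.0, 0.0, 0.0]
print(second)  # [1.0, 0.5, 0.0]
```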

2.3.1 Using Weights Within a Scored FTContainsExpr

[Definition: Scoring may be influenced by adding weight declarations to search tokens, phrases, and expressions.] Syntactically weight declarations are introduced in the FTSelection production, described in FTSelections.

for $b in /books/book
let score $s := $b/content ftcontains ("web site" weight 0.2)
                                  && ("usability" weight 0.8)
return <result score="{$s}">{$b}</result>

The effect of weights on the result score is implementation-dependent. However, weight declarations must follow these rules:

  1. Weights in an FTContainsExpr are significant only in relation to each other; and

  2. When no explicit weight is specified, the default weight is 0.5.

Weight declarations in an FTContainsExpr for which no scores are evaluated are ignored.
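
The following Python sketch shows one way an implementation might honor the two rules. The weighted-average formula is an assumption, since the effect of weights is implementation-dependent, and the phrase "web site" is replaced by the single term "web" for simplicity.

```python
DEFAULT_WEIGHT = 0.5  # default weight stated in rule 2

def weighted_score(tokens, weighted_terms):
    # weighted_terms: list of (term, weight) pairs; a weight of None
    # means that no explicit weight was given.
    total = hit = 0.0
    for term, weight in weighted_terms:
        w = DEFAULT_WEIGHT if weight is None else weight
        total += w
        if term in tokens:
            hit += w
    return hit / total if total else 0.0

tokens = {"usability", "testing"}
print(weighted_score(tokens, [("web", 0.2), ("usability", 0.8)]))    # 0.8
print(weighted_score(tokens, [("web", 2), ("usability", 8)]))        # 0.8
print(weighted_score(tokens, [("web", None), ("usability", None)]))  # 0.5
```

Multiplying every weight by the same factor leaves the result unchanged, which reflects rule 1: weights are significant only in relation to each other.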

2.4 Extensions to the Static Context

The XQuery Static Context is extended by a component for each of the full-text match options. Thus, the default of a match option in a query may be changed by providing a setting in the static context using the following declaration syntax.

[6]    Prolog    ::=    ((DefaultNamespaceDecl | Setter | NamespaceDecl | Import) Separator)* ((VarDecl | FunctionDecl | OptionDecl | FTOptionDecl) Separator)*
[14]    FTOptionDecl    ::=    "declare" "ft-option" FTMatchOption

Match options modify the match semantics of full-text expressions. They are described in detail in Section 3.2 FTMatchOptions. When a match option is specified explicitly in a query, that setting overrides the setting of the respective match option in the static context.
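
The override order can be pictured as a layered lookup, sketched here in Python. The option names and the default values are hypothetical placeholders, not the option names or defaults defined by this specification.

```python
# Assumed defaults, for illustration only.
IMPLEMENTATION_DEFAULTS = {"case": "insensitive", "stemming": False}

def effective_options(prolog_declarations, explicit_options):
    options = dict(IMPLEMENTATION_DEFAULTS)
    options.update(prolog_declarations)  # settings from "declare ft-option ..." in the prolog
    options.update(explicit_options)     # options written explicitly in the query win
    return options

prolog = {"stemming": True}
print(effective_options(prolog, {}))
# {'case': 'insensitive', 'stemming': True}
print(effective_options(prolog, {"stemming": False}))
# {'case': 'insensitive', 'stemming': False}
```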

3 FTSelections

This section describes FTSelections which contain the full-text operators in the FTContainsExpr, and the match options in FTMatchOptions which modify the matching semantics of the full-text selection expressions.

The FTSelection production specifies the possible full-text search conditions.

[144]    FTSelection    ::=    FTOr (FTMatchOption | FTProximity)* ("weight" RangeExpr)?

The "weight" value is the result of evaluating the RangeExpr that follows the "weight" keyword and can be any numeric value.

The syntax and semantics of the individual full-text selection operators follow.

The XML document fragment shown below is the source document for the examples in this section.

Tokenization is implementation-defined. A sample tokenization is used for the examples in this section. The results may be different for other tokenizations.

Unless stated otherwise, the results assume a case-insensitive match.

<book number="1">
  <title shortTitle="Improving Web Site Usability">Improving  
      the Usability of a Web Site Through Expert Reviews and
      Usability Testing</title>
   <author>Millicent Marigold</author>
   <author>Montana Marigold</author>
   <editor>Véra Tudor-Medina</editor>
   <content>
     <p>The usability of a Web site is how well the  
         site supports the users in achieving specified  
         goals. A Web site should facilitate learning,  
         and enable efficient and effective task  
         completion, while propagating few errors.
     </p>
     <note>This book has been approved by the Web Site  
         Users Association.
     </note>
   </content>
 </book>

3.1 Full-Text Operators

[Definition: Full-text operators perform operations on tokens, phrases, and expressions. Some require that the relative positions of tokens in the document be known (e.g., proximity operators).]

3.1.1 FTWords

FTWords specifies the tokens and phrases to be searched for in the left-hand side argument of FTContainsExpr.

[150]    FTWords    ::=    FTWordsValue FTAnyallOption?
[151]    FTWordsValue    ::=    Literal | ("{" Expr "}")
[166]    FTAnyallOption    ::=    ("any" "word"?) | ("all" "words"?) | "phrase"

An FTWords is an FTWordsValue followed by the optional modifier FTAnyallOption. The right-hand side of FTWordsValue is an XQuery expression which must evaluate to a sequence of string values or nodes of type "xs:string". The result is then atomized into a sequence of strings which is tokenized into a sequence of tokens and phrases. If the atomized sequence is not a subtype of "xs:string*", an error is raised: [err:XPTY0004]XP.

If the "any" option is specified, a match occurs, if and only if at least one token or phrase in the sequence has a match in the searched text.

If the "all" option is specified, a match occurs, if and only if all of the tokens and phrases in the sequence are matched in the searched text.

If the "phrase" option is specified, all words and phrases are used to create a sequence of ordered words representing a new phrase. A match occurs, if and only if the resulting phrase is matched in the searched text.

If the "any word" option is specified, a match occurs, if and only if at least one token in the sequence of tokens and phrases is matched in the searched text.

If the "all words" option is specified, a match occurs, if and only if all tokens in the sequence of tokens and phrases are matched in the searched text.

If no option is specified, "any" is the default.

If the result is a single string, "any", "all", and "phrase" are equivalent.
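The five FTAnyallOption variants can be illustrated with a small Python sketch. It is not part of this specification: the tokenizer (lowercased runs of word characters) and the function names are assumptions made for illustration only, and match options other than case-insensitive matching are ignored.

```python
import re

def tokenize(text):
    # Assumed sample tokenization: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())

def has_phrase(doc, phrase):
    # True if the token list `phrase` occurs contiguously in `doc`.
    n = len(phrase)
    return n > 0 and any(doc[i:i + n] == phrase
                         for i in range(len(doc) - n + 1))

def ft_words(text, strings, option="any"):
    doc = tokenize(text)
    phrases = [tokenize(s) for s in strings]     # one token list per string
    words = [[t] for p in phrases for t in p]    # every token on its own
    if option == "any":
        return any(has_phrase(doc, p) for p in phrases)
    if option == "all":
        return all(has_phrase(doc, p) for p in phrases)
    if option == "phrase":
        # All strings are concatenated into one new phrase.
        return has_phrase(doc, [t for p in phrases for t in p])
    if option == "any word":
        return any(has_phrase(doc, w) for w in words)
    if option == "all words":
        return all(has_phrase(doc, w) for w in words)
    raise ValueError("unknown option: " + option)

TITLE = ("Improving the Usability of a Web Site Through Expert Reviews "
         "and Usability Testing")
P_START = "The usability of a Web site is how well the site supports the users"
```

With these definitions, ft_words(P_START, ["Web Site Usability"]) is false because the phrase does not occur contiguously, while the same call with the "all words" option is true because each of the three tokens occurs somewhere in the text.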

/book[@number="1" and ./title ftcontains "Expert"]

returns the book element whose number is 1, because its title element contains the token "Expert".

/book[@number="1" and ./title ftcontains "Expert Reviews"]

returns the book element whose number is 1, because its title element contains the phrase "Expert Reviews".

/book[@number="1" and ./title ftcontains {"Expert",
"Reviews"} all]

returns the book element whose number is 1, because its title element contains two tokens "Expert" and "Reviews".

/book[@number="1"]//p ftcontains "Web Site Usability"

returns false, because the p element doesn't contain the phrase "Web Site Usability" although it contains all of the tokens in the phrase.

for $book in /book[.//author ftcontains "Marigold"] 
let score $score := $book/title ftcontains "Web Site Usability" 
where $score > 0.8 
order by $score descending
return $book/@number

returns book numbers of book elements by "Marigold" with a title about "Web Site Usability" sorting them in descending score order.

3.1.2 FTOr

[145]    FTOr    ::=    FTAnd ( "||" FTAnd )*

FTOr finds matches that satisfy at least one of the selection criteria.

A match must satisfy at least one of the FTSelection criteria.

 /book[.//author ftcontains "Millicent" ||
"Voltaire"] 

returns the book element written by "Millicent".

3.1.3 FTAnd

[146]    FTAnd    ::=    FTMildnot ( "&&" FTMildnot )*

FTAnd finds matches that satisfy both of the selection criteria.

A match must satisfy all of the FTSelection criteria which are specified by one or more FTMildNot expressions.

/book[@number="1"]/title ftcontains ("usability" && "testing")

returns true, since the book title contains "usability" and "testing".

/book/author ftcontains "Millicent" && "Montana"

returns false, because "Millicent" and "Montana" are not contained by the same author element in any book element.

3.1.4 FTMildNot

[147]    FTMildnot    ::=    FTUnaryNot ( "not" "in" FTUnaryNot )*

FTMildNot is a milder form of && ! (and not). 'a not in b' matches an expression that contains "a", but not when it is a part of "b". For example, a search for "Mexico" not in "New Mexico" returns, among others, a document which is all about "Mexico" but mentions at the end that "New Mexico was named after Mexico", which would not be returned by an "and not" search.

A match to FTMildNot must contain at least one token occurrence that satisfies the first condition and does not satisfy the second condition. If it contains a token occurrence that satisfies both the first and the second condition, the occurrence is not considered as a result.

/book ftcontains "usability" not in "usability testing"

returns true, because "usability" appears in the title and the p elements and the occurrence within the phrase "Usability Testing" in the title element is not considered.
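The following Python sketch illustrates the idea on the tokens of the sample title. It is a simplification under stated assumptions: occurrences are modeled as (start, end) token positions, and an occurrence of the first operand is discarded when it lies entirely inside an occurrence of the second. The function names are hypothetical.

```python
def phrase_spans(doc, phrase):
    # (start, end) positions of every contiguous occurrence of `phrase`.
    n = len(phrase)
    return [(i, i + n - 1)
            for i in range(len(doc) - n + 1)
            if doc[i:i + n] == phrase]

def mild_not(doc, a, b):
    # Keep occurrences of `a` that are not part of an occurrence of `b`.
    excluded = phrase_spans(doc, b)
    return [s for s in phrase_spans(doc, a)
            if not any(lo <= s[0] and s[1] <= hi for lo, hi in excluded)]

title = ("improving the usability of a web site through expert "
         "reviews and usability testing").split()

# "usability" occurs at positions 2 and 11; the one at 11 is part of
# "usability testing" (positions 11-12) and is therefore dropped.
remaining = mild_not(title, ["usability"], ["usability", "testing"])
```

A non-empty result corresponds to a successful match, so the first occurrence of "usability" alone is enough for the query above to return true.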

The right-hand side of an FTMildNot may not contain an FTSelection that evaluates to an AllMatches that contains a StringExclude. Such FTSelections are FTUnaryNot and FTTimes with "at most", "from-to", and "exactly" occurrence ranges.

3.1.5 FTUnaryNot

[148]    FTUnaryNot    ::=    ("!")? FTWordsSelection

FTUnaryNot finds matches that do not satisfy the selection criteria.

/book[. ftcontains ! "usability"]

returns the empty sequence, because all book elements contain "usability".

/book ftcontains "information" &&
"retrieval" && ! "information retrieval"

returns true, because book elements contain "information" and "retrieval" but not "information retrieval".

/book[. ftcontains "web site usability" && 
!"usability testing"]

returns book elements containing "web site usability" but not "usability testing".

3.1.6 FTOrder

[153]    FTOrderedIndicator    ::=    "ordered"

FTOrder controls the order of tokens and phrases to be the same as the order in which they are written in the query.

The default is unordered. Unordered is in effect when ordered is not specified in the query. Unordered cannot be written explicitly in the query.

FTOrder finds matches which must satisfy the nested selection condition and the match must contain the tokens in the order specified in the query.

/book/title ftcontains ("web site" && "usability")
ordered 

returns true, because titles of book elements contain "web site" and "usability" in the order in which they are written in the query, i.e., "web site" must precede "usability".

/book[@number="1"]/title ftcontains ("Montana" &&
"Millicent") ordered 

returns false, because although "Montana" and "Millicent" appear in the title element, they do not appear in the order they are written in the query.

3.1.7 FTScope

[171]    FTScope    ::=    ("same" | "different") FTBigUnit
[173]    FTBigUnit    ::=    "sentence" | "paragraph"

FTScope finds tokens and phrases contained in the same or a different scope.

Possible scopes are sentences and paragraphs.

By default, there are no restrictions on the scope of the matches.

If two tokens appear in the same sentence and in different sentences, then both same sentence and different sentence return true. The same is true for same paragraph and different paragraph.

/book ftcontains "usability"
&& "Marigold" same sentence

returns false, because the tokens "usability" and "Marigold" are not contained within the same sentence.

/book ftcontains "usability"
&& "Marigold" different sentence

returns true, because the tokens "usability" and "Marigold" are contained within different sentences.

/book[. ftcontains "usability" && "testing"
same paragraph] 

returns a book element, because it contains "usability" and "testing" in the same paragraph.

/book[. ftcontains "site" && "errors"
same sentence] 

returns a book element, because "site" and "errors" appear in the same sentence.

Some subtle relationships between FTScope and FTDistance will be discussed in Section 4.

3.1.8 FTDistance

[168]    FTDistance    ::=    "distance" FTRange FTUnit
[167]    FTRange    ::=    ("exactly" UnionExpr)
| ("at" "least" UnionExpr)
| ("at" "most" UnionExpr)
| ("from" UnionExpr "to" UnionExpr)
[172]    FTUnit    ::=    "words" | "sentences" | "paragraphs"

FTDistance finds matches by specifying the distance between tokens and phrases in FTUnits (tokens, sentences, and paragraphs). The number of intervening FTUnits is specified in the integer value of FTRange.

FTRange specifies a range of integer values, providing a minimum and maximum value. Each UnionExpr in an FTRange must evaluate (after atomization) to a singleton sequence with an atomic value of type "xs:integer". Otherwise, an error is raised [err:XPTY0004]XP.

Let the value of the first (or only) UnionExpr be M. If "from" is specified, let the value of the second UnionExpr be N. FTDistance may cross element boundaries when computing distance.

The following rule applies to FTDistance:

  • Zero words (sentences, paragraphs) means adjacent tokens (sentences, paragraphs).

If "exactly" is specified, then the range is the closed interval [M, M]. If "at least" is specified, then the range is the half-closed interval [M, unbounded). If "at most" is specified, then the range is the closed interval [0, M]. If "from-to" is specified, then the range is the closed interval [M, N].

Here are some examples of FTRanges:

  1. 'exactly 0' specifies the range [0, 0].

  2. 'at least 1' specifies the range [1, unbounded).

  3. 'at most 1' specifies the range [0, 1].

  4. 'from 5 to 10' specifies the range [5, 10].
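As an illustration only, the mapping from the four FTRange forms to intervals can be written as a short Python function. The function names and the use of an infinite upper bound to stand for "unbounded" are assumptions of this sketch, not part of the specification.

```python
import math

def ft_range(kind, m, n=None):
    # Returns (lower, upper); math.inf stands for "unbounded".
    if kind == "exactly":
        return (m, m)
    if kind == "at least":
        return (m, math.inf)
    if kind == "at most":
        return (0, m)
    if kind == "from":
        return (m, n)
    raise ValueError("unknown FTRange form: " + kind)

def in_range(value, bounds):
    lower, upper = bounds
    return lower <= value <= upper
```

For example, in_range(7, ft_range("from", 5, 10)) holds, while in_range(2, ft_range("at most", 1)) does not.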

The distances computed by FTDistance are not affected by the presence or absence of element boundaries in the text. Stop words are counted in those computations whether they are ignored or not.

/book ftcontains ("information" &&
"retrieval") not in ("information" && "retrieval" 
distance at least 11 words)

returns false, because "information" and "retrieval" are at least 11 tokens apart.

/book ftcontains "web" && "site" &&
"usability" distance at most 2 words

returns true, because "web", "site", and "usability" have at most 2 intervening tokens between them.

/book[. ftcontains "web site"
&& "usability" distance at most 1 words]/title 

returns the book title. A similar query for the p element would return false because "web site" and "usability" have two intervening tokens between them.
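The count of intervening tokens for the p element can be reproduced with a small Python sketch. It assumes that each match is represented by its (start, end) token positions and that the distance between two matches is the number of tokens strictly between them; the function name is hypothetical.

```python
def intervening(span_a, span_b):
    # Number of tokens strictly between two (start, end) spans.
    # Adjacent spans yield 0; overlapping spans are also treated as 0 here.
    first, second = sorted([span_a, span_b])
    return max(0, second[0] - first[1] - 1)

p_tokens = ("the usability of a web site is how well the site "
            "supports the users").split()

usability = (p_tokens.index("usability"),) * 2   # single token at position 1
web_site = (p_tokens.index("web"), p_tokens.index("web") + 1)  # positions 4-5

gap = intervening(usability, web_site)           # "of" and "a" lie between
satisfies_at_most_1 = 0 <= gap <= 1
```

Here gap is 2, so the range [0, 1] given by "at most 1 words" is not satisfied for this pair of occurrences.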

3.1.9 FTWindow

[169]    FTWindow    ::=    "window" UnionExpr FTUnit

FTWindow finds matches within a number of FTUnits (tokens, sentences, and paragraphs). The number of FTUnits is specified as an integer.

FTWindow may cross element boundaries. The size of the window is not affected by the presence or absence of element boundaries. Stop words are included in those computations whether they are ignored or not.

UnionExpr must evaluate to an atom of type "xs:integer".

A match of an FTSelection is considered a match within a window, if there exists a window of the given number of consecutive units (tokens, sentences, or paragraphs) in the document within which the match lies.
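For token windows, this condition amounts to checking that the first and last token positions of a match span no more than the given number of tokens. The Python sketch below illustrates this for a conjunction of single tokens on the sample title; the helper names are assumptions, and sentence or paragraph windows are not covered.

```python
from itertools import product

def positions(doc, token):
    return [i for i, t in enumerate(doc) if t == token]

def within_window(position_lists, size):
    # A match picks one position per search token; it fits the window
    # if all picked positions lie within `size` consecutive tokens.
    return any(max(choice) - min(choice) + 1 <= size
               for choice in product(*position_lists))

title = ("improving the usability of a web site through expert "
         "reviews and usability testing").split()

lists = [positions(title, t) for t in ("web", "site", "usability")]
fits_in_5 = within_window(lists, 5)   # "usability of a web site"
fits_in_4 = within_window(lists, 4)
```

The tokens "usability of a web site" occupy five consecutive positions, so a window of 5 words is satisfied and a window of 4 words is not.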

/book/title ftcontains "web" && "site"
&& "usability" window 5 words

returns true, because "web", "site", and "usability" are within a window of 5 tokens in the title element.

/book ftcontains ("web" && "site" ordered)
&& ("usability" || "testing") window 10 words

returns true, because "web" and "site" in the order they are written in the query and either "usability" or "testing" are within a window of at most 10 tokens.

/book//title ftcontains "web site" &&
"usability" window 3 words

returns true, because the title element contains "Web Site Usability". A similar query on the p element would not return true, because its occurrences of "web site" and "usability" are not within a window of 3.

/book[@number="1" and . ftcontains "efficient" 
&& ! "and" window 3 words]

returns the empty sequence, because in the selected book element, there is no occurrence of "efficient" within a window of 3 tokens which would not also contain an occurrence of "and".

3.1.10 FTTimes

[170]    FTTimes    ::=    "occurs" FTRange "times"

FTTimes finds matches in which an FTSelection occurs a specified number of times.

FTTimes limits the number of different occurrences of FTSelection, within the specified range.

In the document fragment "very very big":

  1. The FTSelection "very big" has 1 occurrence consisting of the second "very" and "big".

  2. The FTSelection "very" && "big" has 2 occurrences; one consisting of the first "very" and "big", and the other consisting of the second "very" and "big".

  3. The FTSelection "very" || "big" has 3 occurrences.

  4. The FTSelection ! "small" has 1 occurrence.
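The four counts above can be reproduced with a short Python sketch. It is a simplified model under stated assumptions: a conjunction contributes one occurrence per combination of operand occurrences, a disjunction contributes the occurrences of both operands, and a negation contributes a single occurrence when the negated token is absent. The helper names are hypothetical.

```python
def positions(doc, token):
    return [i for i, t in enumerate(doc) if t == token]

def phrase_count(doc, phrase):
    # Number of contiguous occurrences of the token list `phrase`.
    n = len(phrase)
    return sum(doc[i:i + n] == phrase for i in range(len(doc) - n + 1))

doc = "very very big".split()

n_phrase = phrase_count(doc, ["very", "big"])                    # "very big"
n_and = len(positions(doc, "very")) * len(positions(doc, "big")) # "very" && "big"
n_or = len(positions(doc, "very")) + len(positions(doc, "big"))  # "very" || "big"
n_not = 0 if positions(doc, "small") else 1                      # ! "small"
```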

/book[. ftcontains "usability" occurs at least 2 times]/@number

returns book numbers because book elements contain 2 or more occurrences of "usability".

/book[@number="1" and title ftcontains "usability" ||
"testing" occurs at most 3 times] 

returns the empty sequence, because there are 4 occurrences of "usability" || "testing" in the designated title.

/book ftcontains "usability" occurs at least 2 times

returns true, because the book element contains 3 occurrences of "usability" in its title element although its p element contains only 1 occurrence.

3.1.11 FTContent

[165]    FTContent    ::=    ("at" "start") | ("at" "end") | ("entire" "content")

FTContent finds matches in which the tokens and phrases are the first, last or all of the tokens and phrases in the tokenized form of the items being searched.

The "at" "start" option finds matches in which the tokens or phrases are the first tokens or phrases in the tokenized string value of the element being searched.

The "at" "end" option finds matches in which the tokens or phrases are the last tokens or phrases in the tokenized string value of the element being searched.

The "entire" "content" option finds matches in which the tokens or phrases are the entire content of the tokenized string value of the element being searched.

/books//title[. ftcontains
"improving the usability of a web site" at start]

returns each title element starting with the phrase "improving the usability of a web site".

/books//p[. ftcontains "propagat*" && "few errors"
distance at most 2 words at end]

returns each p element ending with the phrase "propagating few errors".

/books//note[. ftcontains
"this site has been approved by the web site users association" entire content]

returns each note element whose entire content is "this site has been approved by the web site users association".

3.2 FTMatchOptions

FTMatchOptions modify the operational semantics of the FTSelection on which they are applied.

[154]    FTMatchOption    ::=    FTCaseOption
| FTDiacriticsOption
| FTStemOption
| FTThesaurusOption
| FTStopwordOption
| FTLanguageOption
| FTWildCardOption

FTMatchOptions set environments for the matching options of FTSelection. [Definition: Match options modify the set of tokens and phrases in the query. Some of these options (e.g., stemming) have behaviors which depend on the language of the document, the language of the query, or both.] If a match option isn't specified explicitly in the query, its value is given by its static context component. Details about these context components, including their default values, are given in Appendix C Static Context Components.

If no match option declarations are present in the prolog and the implementation does not define any overriding of the static context components for the match options, the query:

/book/title ftcontains "usability" 

is equivalent to the query

/book/title ftcontains "usability" case insensitive 
    diacritics insensitive 
    without stemming without thesaurus  
    without stop words language "none" without wildcards

FTMatchOptions are applied in the order in which they are written in the query. More information on their semantics is given in 4.2.3 Match Options Semantics.

We describe each match option in more detail in the following sections.

3.2.1 FTCaseOption

[155]    FTCaseOption    ::=    "lowercase"
| "uppercase"
| ("case" "sensitive")
| ("case" "insensitive")

FTCaseOption modifies token and phrase matching by specifying how uppercase and lowercase characters are considered.

FTCaseOption influences the way FTWords is applied.

There are four possible character case options:

  1. The option "uppercase" matches tokens and phrases with uppercase characters, regardless of the case of characters of the tokens and phrases as they are written in the query.

  2. The option "lowercase" matches tokens and phrases with lowercase characters, regardless of the case of characters of the tokens and phrases as they are written in the query.

  3. The option "case" "insensitive" matches the uppercase and lowercase characters of tokens and phrases. The case of characters as they are written in the query is not considered.

  4. The option "case" "sensitive" matches the case of the characters in tokens and phrases as they are written in the query.

The default is "case insensitive".

The following table summarizes the interactions between the case match options and the use of the default collations.

Case Matrix

Default collation options/Case options | UCC (Unicode Codepoint Collation) | CCS (some generic case-sensitive collation) | CCI (some generic case-insensitive collation)
---------------------------------------+-----------------------------------+---------------------------------------------+----------------------------------------------
insensitive | compare as if both lower | case-insensitive variant of CCS if it exists, else error | CCI
sensitive | UCC | CCS | case-sensitive variant of CCI if it exists, else error
uppercase | uppercase(Expr) + UCC | uppercase(Expr) + CCS | CCI
lowercase | lowercase(Expr) + UCC | lowercase(Expr) + CCS | CCI

Note:

In this table, "else error" means "Otherwise, an error is raised: [err:FOCH0002]FO". The phrase "if it exists" is used, because the case-sensitive collation CCS does not always have a case-insensitive variant (and, even if one exists, it may not be possible to determine it algorithmically), and because the case-insensitive collation CCI does not always have a case-sensitive variant (and, even if one exists, it may not be possible to determine it algorithmically).

/book[@number="1"]/title ftcontains "Usability" lowercase 

returns false, because the title element doesn't contain "usability" in lower-case characters.

/book[@number="1"]/title ftcontains "usability" 
case insensitive

returns true, because the character case is not considered.

3.2.2 FTDiacriticsOption

[156]    FTDiacriticsOption    ::=    ("with" "diacritics")
| ("without" "diacritics")
| ("diacritics" "sensitive")
| ("diacritics" "insensitive")

FTDiacriticsOption modifies token and phrase matching by specifying how diacritics are considered.

There are four possible diacritics options:

  1. The option "with" "diacritics" matches tokens and phrases with diacritics, regardless of whether the diacritics are written in the query.

  2. The option "without" "diacritics" matches tokens and phrases without diacritics, regardless of whether the diacritics are written in the query.

  3. The option "diacritics" "insensitive" matches tokens and phrases with and without diacritics. Whether diacritics are written in the query or not is not considered.

  4. The option "diacritics" "sensitive" matches tokens and phrases only if they contain the diacritics as they are written in the query.

The default is "diacritics insensitive".

The following table summarizes the interactions between the diacritics match options and the use of the default collations.

Diacritics Matrix

Default collation options/Diacritics options | UCC (Unicode Codepoint Collation) | CDS (some generic diacritics-sensitive collation) | CDI (some generic diacritics-insensitive collation)
---------------------------------------------+-----------------------------------+---------------------------------------------------+----------------------------------------------------
insensitive | compare as if with and without | diacritics-insensitive variant of CDS if it exists, else error | CDI
sensitive | UCC | CDS | diacritics-sensitive variant of CDI if it exists, else error
with diacritics | "resume diacritic insensitive" not in "resume" | "resume diacritic insensitive" not in "resume" | CDI
without diacritics | "resume" not in "resume diacritic sensitive" | "resume" not in "resume diacritic sensitive" | CDI

Note:

In this table, "else error" means "Otherwise, an error is raised: [err:FOCH0002]FO". The phrase "if it exists" is used, because the diacritics-sensitive collation CDS does not always have a diacritics-insensitive variant (and, even if one exists, it may not be possible to determine it algorithmically), and because the diacritics-insensitive collation CDI does not always have a diacritics-sensitive variant (and, even if one exists, it may not be possible to determine it algorithmically).

/book[@number="1"]//editor ftcontains "Vera" with diacritics 

returns true, because the editor element contains the token "Vera" with an acute accent.

/book[@number="1"]/editor ftcontains "Véra" without diacritics

returns false, because the editor element does not contain the token "Vera" without an acute accent.

3.2.3 FTStemOption

[157]    FTStemOption    ::=    ("with" "stemming") | ("without" "stemming")

FTStemOption modifies token and phrase matching by specifying whether stemming is applied or not.

FTStemOption influences the way FTWords is applied. It produces a disjunction of the query tokens by expanding the tokens into the list of tokens that share the same stem. By definition, the query tokens are included in that disjunction.

The "with stemming" option specifies that matches may contain tokens that have the same stem as the tokens and phrases written in the query. It is implementation-defined what a stem of a token is.

The "without stemming" option specifies that the tokens and phrases are not stemmed.

It is implementation-defined whether the stemming is based on an algorithm, dictionary, or mixed approach.

The default is "without stemming".

/book[@number="1"]/title ftcontains "improve" with stemming 

returns true, because the title of the specified book contains "improving" which has the same stem as "improve".

3.2.4 FTThesaurusOption

[158]    FTThesaurusOption    ::=    ("with" "thesaurus" (FTThesaurusID | "default"))
| ("with" "thesaurus" "(" (FTThesaurusID | "default") ("," FTThesaurusID)* ")")
| ("without" "thesaurus")
[159]    FTThesaurusID    ::=    "at" StringLiteral ("relationship" StringLiteral)? (FTRange "levels")?

FTThesaurusOption modifies token and phrase matching by specifying whether a thesaurus is used or not. If thesauri are used, it locates the thesauri by default or URI reference. It also states the relationship to be applied and how many levels within the thesaurus to be traversed.

FTThesaurusOption influences the way FTWords is applied.

The StringLiteral following the keyword at in FTThesaurusID is of the form of a URI Reference.

Thesauri add related tokens and phrases to the search. Thus, the user may narrow, broaden, or otherwise modify the search using synonyms, hypernyms (more generic terms), etc. The search is performed as though the user has specified all related search tokens and phrases in a disjunction (FTOr).

Note:

A thesaurus may be standards-based or locally-defined. It may be a traditional thesaurus, or a taxonomy, soundex, ontology, or topic map. How the thesaurus is represented is implementation-dependent.

FTThesaurusID specifies the relationship sought between tokens and phrases written in the query and terms in the thesaurus and the number of levels to be queried in hierarchical relationships by including an FTRange "levels". If no levels are specified, the default is to query all levels in hierarchical relationships.

Relationships include, but are not limited to, the relationships and their abbreviations presented in [ISO 2788] and their equivalents in other languages:

  1. equivalence relationships (synonyms): PREFERRED TERM (USE), NONPREFERRED USED FOR TERM (UF);

  2. hierarchical relationships: BROADER TERM (BT), NARROWER TERM (NT), BROADER TERM GENERIC (BTG), NARROWER TERM GENERIC (NTG), BROADER TERM PARTITIVE (BTP), NARROWER TERM PARTITIVE (NTP), TOP TERM (TT); and

  3. associative relationships: RELATED TERM (RT).

The "with thesaurus" option specifies that string matches include tokens that can be found in one of the specified thesauri.

The "without thesaurus" option specifies that no thesaurus will be used.

The "with thesaurus default" option specifies that a system-defined default thesaurus with a system-defined relationship is used. The default thesaurus may be used in combination with other explicitly specified thesauri.

The default is "without thesaurus".

count(.//book/content ftcontains "duties" with
thesaurus at "http://bstore1.example.com/UsabilityThesaurus.xml"
relationship "synonyms")>0

returns true, because it finds a content element containing "tasks" which the thesaurus identified as a synonym for "duties".

doc("http://bstore1.example.com/full-text.xml")
/books/book[count(./content ftcontains "web site components" with
thesaurus at "http://bstore1.example.com/UsabilityThesaurus.xml"
relationship "narrower terms" at most 2 levels)>0]

returns book elements, because it finds a content element containing "web site components", and narrower terms "navigation" and "layout".

doc("http://bstore1.example.com/full-text.xml")
/books/book[count(. ftcontains "Merrygould" with thesaurus at
"http://bstore1.example.com/UsabilitySoundex.xml" relationship
"sounds like")>0]

returns a book element containing "Marigold", which sounds like "Merrygould".

3.2.5 FTStopwordOption

[160]    FTStopwordOption    ::=    ("with" "stop" "words" FTRefOrList FTInclExclStringLiteral*)
| ("without" "stop" "words")
| ("with" "default" "stop" "words" FTInclExclStringLiteral*)
[161]    FTRefOrList    ::=    ("at" StringLiteral)
| ("(" StringLiteral ("," StringLiteral)* ")")
[162]    FTInclExclStringLiteral    ::=    ("union" | "except") FTRefOrList

FTStopWordOption controls word matching by specifying whether stop words are used or not. It can be used to define a set of tokens that will be replaced with a search on any token if used as search tokens.

FTStopWordOption influences the way FTWords is applied.

FTRefOrList specifies the list of stop words either explicitly as a comma-separated list of string literals, or by a URI following the keyword at. If a URI is used, it must point to a sequence of string atoms or nodes of type "xs:string". In both cases, no tokenization is performed on the strings: they are used as they occur in the sequence.

The "with stop words" option specifies that if a token is within the specified collection of stop words, it is removed from the search and any token may be substituted for it. Stop words retain their position numbers and are counted in FTDistance and FTWindow searches.

Multiple stop word lists may be combined using "union" or "except". If "union" is specified, every string occurring in the lists specified by the left-hand side or the right-hand side is a stop word. If "except" is specified, only strings occurring in the list specified by the left-hand side but not in the list specified by the right-hand side are stop words.
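The combination rules correspond to ordinary set union and set difference. The Python sketch below illustrates this; applying the operations from left to right and the function name are assumptions of the sketch.

```python
def stop_word_set(base, operations=()):
    # `operations` is a sequence of ("union" | "except", list_of_strings),
    # applied from left to right to the initial list `base`.
    result = set(base)
    for op, words in operations:
        if op == "union":
            result |= set(words)
        elif op == "except":
            result -= set(words)
        else:
            raise ValueError("unknown operation: " + op)
    return result

# with stop words ("a", "the", "then") except ("the", "then")
remaining = stop_word_set(["a", "the", "then"],
                          [("except", ["the", "then"])])
```

In this example only "a" remains a stop word. Because no tokenization is performed on the strings, each list entry is compared as a whole string.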

The "with default stop words" option specifies that an implementation-defined collection of stop words is used.

The "without stop words" option specifies that no stop words are used. This is equivalent to specifying an empty list of stop words.

The default is "without stop words".

/book[@number="1"]//p ftcontains "propagation of errors"
with stemming with stop words ("a", "the", "of") 

returns true, because the document contains the phrase "propagating few errors".

Note the asymmetry in the stop word semantics: the property of being a stop word is only relevant to query terms, not to document terms. Hence, it is irrelevant for the above-mentioned match whether "few" is a stop word or not, and on the other hand we do not want the query above to match "propagation" followed by 2 stop words, or even a sequence of 3 stop words in the document.
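This asymmetry can be illustrated with a Python sketch of phrase matching in which a query token that is a stop word matches any document token at its position, while document tokens receive no special treatment. Stemming is left out, so the query below uses "propagating" directly; the function name and sample data are assumptions.

```python
def phrase_matches(doc, query, stop_words=frozenset()):
    # A stop word in the QUERY matches any document token at that position.
    # Stop words in the DOCUMENT are compared like any other token.
    n = len(query)

    def fits(window):
        return all(q in stop_words or q == d for q, d in zip(query, window))

    return any(fits(doc[i:i + n]) for i in range(len(doc) - n + 1))

stops = {"a", "the", "of"}
p_end = "while propagating few errors".split()

with_stops = phrase_matches(p_end, ["propagating", "of", "errors"], stops)
without_stops = phrase_matches(p_end, ["propagating", "of", "errors"])

# Reversed roles: "of" appears in the document, not in the query.
reverse = phrase_matches("propagating of errors".split(),
                         ["propagating", "few", "errors"], stops)
```

The first call matches because "of" in the query accepts "few" in the document. The last call does not match, because "few" in the query is not a stop word and the stop word "of" in the document is treated as an ordinary token.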

/book[@number="1"]//p ftcontains "propagation of errors" 
with stemming without stop words

returns false, because "of" is not in the p element between "propagating" and "errors".

doc("http://bstore1.example.com/full-text.xml")
/books/book[count(.//content ftcontains "planning then conducting"
with stop words at "http://bstore1.example.com/StopWordList.xml")>0]

uses the stop word list specified at the URL. Assuming that the specified stop word list contains the word "then", this query is reduced to a query on the phrase "planning X conducting", allowing any token as a substitute for X. It returns a book element, because its content element contains "planning then conducting". It would also have returned the book if the phrase "planning and conducting" or "planning before conducting" had been in its content.

doc("http://bstore1.example.com/full-text.xml")
/books/book[count(.//content ftcontains "planning then conducting"
with stop words at "http://bstore1.example.com/StopWordList.xml"
except ("the", "then"))>0]

returns books containing "planning then conducting", but does not return books containing "planning and conducting", since it exempts "then" from being a stop word.

3.2.6 FTLanguageOption

[163]    FTLanguageOption    ::=    "language" StringLiteral

FTLanguageOption modifies token matching by specifying the language of search tokens and phrases.

FTLanguageOption influences the way FTWords is applied.

The StringLiteral following the keyword language designates one language. It must either be castable to "xs:language", or be the value "none". Otherwise, an error is raised: [err:XPTY0004]XP.

The "language" option influences tokenization, stemming, and stop words.

If the language "none" option is specified, no language is selected.

The set of valid language identifiers is implementation-defined.

By default, there is no language selected.

/book[@number="1"]//editor ftcontains "salon de the"
with default stop words language "fr"

This is an example where the language option is used to select the appropriate stop word list.

3.2.7 FTWildCardOption

[164]    FTWildCardOption    ::=    ("with" "wildcards") | ("without" "wildcards")

FTWildCardOption modifies token and phrase matching by specifying whether wildcards are used or not.

FTWildCardOption influences the way FTWords is applied.

In addition to specifying the "with wildcards" option, indicators (represented by periods (.)) and qualifiers are appended to or inserted into tokens being searched. Zero or more characters replace each indicator and qualifier.

Indicators are mandatory. When the "with wildcards" option is present, one or more periods (.) must be appended at the beginning or end of tokens or inserted into tokens. If the period is at the beginning of a token, the wildcard is a prefix wildcard. If the period is at the end of a token, it is a suffix wildcard. If the period is inserted into a token, it is an infix wildcard.

When the "with wildcards" option and one or more periods (.) appended to or inserted into tokens are present, characters are appended or inserted at each of the periods. Any characters may be appended or inserted except newline characters (#xA), return characters (#xD), and tab characters (#x9). The number of characters depends on the qualifier. Qualifiers available are none, question mark, asterisk, plus sign, and two numbers separated by a comma, both enclosed by curly braces.

  1. If a period is present, but no qualifiers, one character is appended or inserted.

  2. If a period is followed by a question mark (.?), zero or one characters are appended or inserted.

  3. If a period is followed by an asterisk (.*), zero or more characters are appended or inserted.

  4. If a period is followed by a plus sign (.+), one or more characters are appended or inserted.

  5. If a period is followed by two numbers separated by a comma, both enclosed by curly braces (.{n,m}), a specified range of characters is appended or inserted.

The "without wildcards" option finds tokens without recognizing wildcard indicators and qualifiers. Periods, question marks, asterisks, plus signs, and two numbers separated by a comma, both enclosed by curly braces, are recognized as regular characters.

The default is "without wildcards".

/book[@number="1"]/title ftcontains "improv.*" with
wildcards

returns true, because the title element contains "improving".

/book[@number="1"]/title ftcontains ".?site" with
wildcards

returns true, because the title element contains "site".

/book[@number="1"]/p ftcontains "w.ll" with
wildcards

returns true, because the p element contains "well".
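
The indicator and qualifier rules above closely resemble regular-expression quantifiers, so one way to picture them is as a translation from a query token into a regular expression. The following Python fragment is a minimal, non-normative sketch under that assumption. The names wildcard_token_to_regex and token_matches are hypothetical, and the case-insensitive comparison is an assumption rather than something this section mandates.

```python
import re

# A wildcard may stand for any character except newline (#xA),
# return (#xD), and tab (#x9).
_WILDCARD_CLASS = r"[^\n\r\t]"

# An indicator is a period, optionally followed by one qualifier:
# ?, *, + or {n,m}.
_INDICATOR = re.compile(r"\.(\?|\*|\+|\{\d+,\d+\})?")


def wildcard_token_to_regex(token, with_wildcards=False):
    """Translate a query token into a compiled regular expression.

    Hypothetical helper: case-insensitive matching is an assumption.
    """
    if not with_wildcards:
        # "without wildcards": every character is a regular character.
        return re.compile(re.escape(token), re.IGNORECASE)
    parts = []
    pos = 0
    for m in _INDICATOR.finditer(token):
        parts.append(re.escape(token[pos:m.start()]))
        # No qualifier means exactly one character.
        parts.append(_WILDCARD_CLASS + (m.group(1) or ""))
        pos = m.end()
    parts.append(re.escape(token[pos:]))
    return re.compile("".join(parts), re.IGNORECASE)


def token_matches(query_token, document_token, with_wildcards=False):
    """True if the whole document token matches the query token."""
    pattern = wildcard_token_to_regex(query_token, with_wildcards)
    return pattern.fullmatch(document_token) is not None
```

Under these assumptions, token_matches("improv.*", "improving", with_wildcards=True) holds, whereas token_matches("w.ll", "well") does not, because without the option the period is an ordinary character.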

3.3 FTIgnoreOption

[174]    FTIgnoreOption    ::=    "without" "content" UnionExpr

FTIgnoreOption specifies a set of element nodes whose content is ignored. [Definition: Ignored nodes are the set of element nodes whose content is ignored.] Ignored nodes are identified by the XQuery expression UnionExpr. Let N1, N2, ..., Nk be the sequence of nodes of the search context. The expression UnionExpr is evaluated in the context of each node Ni being searched. That is, the search context expression of the ftcontains predicate creates a new focus for the evaluation of the UnionExpr given with FTIgnoreOption, similar to the creation of the dynamic context of a path expression E1/E2 or a filter expression E1[E2] (see Section 2.1.2 Dynamic ContextXQ).

Now, let I1, I2, ..., In be the sequence of items that UnionExpr evaluates to. For each Ni (i=1..k) a copy is made that omits each node Ij (j=1..n) that is not Ni. Those copies form the new search context. If UnionExpr evaluates to an empty sequence no nodes are omitted.

In the following fragment, if .//annotation is ignored, "Web Usability" will be found 2 times: once in the title element and once in the editor element. The 2 occurrences in the 2 annotation elements are ignored. On the other hand, "expert" will not be found, as it appears only in an annotation element.

<book>
   <title>Web Usability and Practice</title>
   <author>Montana <annotation> this author is an expert in Web Usability</annotation> 
           Marigold
   </author>
   <editor>Véra Tudor-Medina on Web <annotation> best editor on Web Usability</annotation>
           Usability
   </editor>
 </book>

By default, no element content is ignored.
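
The copy-and-omit step can be illustrated with a small Python sketch. It is non-normative and makes several assumptions: ignored nodes are selected by element name rather than by evaluating a UnionExpr, tokens are runs of word characters, and the helper names copy_without and count_phrase are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

BOOK = """<book>
   <title>Web Usability and Practice</title>
   <author>Montana <annotation> this author is an expert in Web Usability</annotation>
           Marigold
   </author>
   <editor>Véra Tudor-Medina on Web <annotation> best editor on Web Usability</annotation>
           Usability
   </editor>
</book>"""


def copy_without(node, ignored_tag):
    """Copy node, omitting descendant elements named ignored_tag.

    The text that follows an omitted element (its tail) is kept, since
    it belongs to the content of the parent, not of the omitted element.
    """
    clone = ET.Element(node.tag, dict(node.attrib))
    clone.text, clone.tail = node.text, node.tail
    last_kept = None
    for child in node:
        if child.tag == ignored_tag:
            if last_kept is None:
                clone.text = (clone.text or "") + (child.tail or "")
            else:
                last_kept.tail = (last_kept.tail or "") + (child.tail or "")
        else:
            last_kept = copy_without(child, ignored_tag)
            clone.append(last_kept)
    return clone


def count_phrase(node, phrase):
    """Count occurrences of a phrase using a sample tokenization (assumption)."""
    tokens = re.findall(r"\w+", "".join(node.itertext()).lower())
    wanted = phrase.lower().split()
    return sum(tokens[i:i + len(wanted)] == wanted
               for i in range(len(tokens) - len(wanted) + 1))


book = ET.fromstring(BOOK)
pruned = copy_without(book, "annotation")
```

With this sample tokenization, count_phrase(pruned, "Web Usability") yields 2 and count_phrase(pruned, "expert") yields 0, in line with the description of the fragment above.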

4 Semantics

This section describes the formal semantics of XQuery 1.0 and XPath 2.0 Full-Text.

The following diagram represents the interaction of XQuery 1.0 and XPath 2.0 Full-Text with the rest of the XQuery 1.0 and XPath 2.0 languages. It specifies how full-text expressions can be nested within XQuery 1.0 and XPath 2.0 expressions and vice versa.

Composability diagram

The functions and schemas defined in this section are considered to be within the fts: namespace. These functions and schemas are used only for describing the semantics. There is no requirement that these functions and schemas be implemented, so no URI is associated with the fts: prefix.

4.1 Tokenization

[Definition: Formally, tokenization is the process of converting the string value of a node to a sequence of token occurrences, taking the structural information of the node into account to identify token, sentence, and paragraph boundaries.]

Tokenization is subject to the following constraint:

  1. Attribute values are not tokenized.

4.1.1 Examples

The following document fragment is the source document for examples in this section. Tokenization is implementation-defined. A sample tokenization is used for the examples in this section. The results might be different for other tokenizations.

Unless stated otherwise, the results assume a case-insensitive match.

<offers>
    <offer id="1000" price="10000">
        Ford Mustang 2000, 65K, excellent condition, runs 
        great, AC, CC, power all
    </offer>
    <offer id="1001" price="8000">
        Honda Accord 1999, 78K, A/C, cruise control, runs 
        and looks great, excellent condition
    </offer>
    <offer id="1005" price="5500">
        Ford Mustang, 1995, 150K highway mileage, no rust, 
        excellent condition
    </offer>
</offers>
        

In this sample tokenization, tokens are delimited by punctuation and whitespace symbols.

  • The token "Ford" is at relative position 1.

  • The token "Mustang" is at relative position 2.

  • The token "2000" is at relative position 3.

  • Relative position numbers are assigned sequentially through the end of the document.

Hence each token occupies exactly one position, and no overlapping of tokens occurs. The relative positions of token occurrences are shown below in parentheses.

<offers>
    <offer id="1000" price="10000">
        Ford(1) Mustang(2) 2000(3), 65K(4), excellent(5)
        condition(6), runs(7) great(8), AC(9), CC(10), 
        power(11) all(12)
    </offer>
    <offer id="1001" price="8000">
        Honda(13) Accord(14) 1999(15), 78K(16), A(17)/C(18),
        cruise(19) control(20), runs(21) and(22) looks(23)
        great(24), excellent(25) condition(26)
    </offer>
    <offer id="1005" price="5500">
        Ford(27) Mustang(28), 1995(29), 150K(30) highway(31)
        mileage(32), no(33) rust(34), excellent(35) 
        condition(36)
    </offer>
</offers>
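
The position numbering shown above can be reproduced with a few lines of Python. This is only a sketch of the sample tokenization (tokens delimited by punctuation and whitespace), not a required algorithm. The names tokenize and positions_of are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

OFFERS = """<offers>
    <offer id="1000" price="10000">
        Ford Mustang 2000, 65K, excellent condition, runs
        great, AC, CC, power all
    </offer>
    <offer id="1001" price="8000">
        Honda Accord 1999, 78K, A/C, cruise control, runs
        and looks great, excellent condition
    </offer>
    <offer id="1005" price="5500">
        Ford Mustang, 1995, 150K highway mileage, no rust,
        excellent condition
    </offer>
</offers>"""


def tokenize(node):
    """Return (token, relative position) pairs for the string value of node.

    Only text content is tokenized; attribute values are skipped because
    itertext() never visits them.
    """
    text = "".join(node.itertext())
    return [(m.group(0), pos)
            for pos, m in enumerate(re.finditer(r"\w+", text), start=1)]


def positions_of(tokens, word):
    """Relative positions at which word occurs (case-insensitive)."""
    return [pos for tok, pos in tokens if tok.lower() == word.lower()]


TOKENS = tokenize(ET.fromstring(OFFERS))
```

Here positions_of(TOKENS, "Mustang") gives the positions 2 and 28, and the attribute value "1000" receives no position at all.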
        

The relative positions of paragraphs are determined similarly. In this sample tokenization, the paragraph delimiters are start tags, end tags, and end of line characters.

  • The tokens in the first element are assigned relative paragraph number 1.

  • The tokens from the next element are assigned relative paragraph number 2.

  • Relative paragraph numbers are assigned sequentially through the end of the document.

The relative positions of sentences are determined similarly using sentence delimiters.

Implementations may provide the means to ignore or side-step certain structural elements when performing tokenization. In the following example, the implementation ignores the markup for <bold> and prunes the entire subtree headed by <deleted>.

<para><deleted>This sentence was deleted.</deleted>
This <bold>entire paragraph</bold> is one sentence
as far as the tokenizer is concerned.
</para>

Using the same notation as before, this sample tokenization is shown below. All the token occurrences marked with a token position also have the same sentence and paragraph relative positions. Note that there are no tokens marked for the ignored subtree.

<para><deleted>This sentence was deleted.</deleted>
This(1) <bold>entire(2) paragraph(3)</bold> is(4) one(5) sentence(6)
as(7) far(8) as(9) the(10) tokenizer(11) is(12) concerned(13).
</para>

4.1.2 Representations of Tokenized Text and Matching

Two representations of tokenized text will be employed in the formal semantics functions, one for the search strings of a query and one for matched token occurrences of search context items.

A [Definition: SearchItem is a sequence of SearchTokenInfos representing the sequence of tokens derived from tokenizing one search string. ]

A [Definition: SearchTokenInfo is the identity of a token inside a search string. ] Each SearchTokenInfo is associated with a unique identifier that captures the relative position of the search string in the query in document order.

A [Definition: TokenInfo represents a sequence of consecutive token occurrences inside an XML document. ] Each TokenInfo is associated with:

  • a unique identifier that captures the relative position of the first token occurrence of the sequence in the document order: startPos

  • a unique identifier that captures the relative position of the last token occurrence of the sequence in the document order: endPos

  • the relative position of the sentence containing the first token occurrence or zero if the tokenizer does not report sentences: startSent

  • the relative position of the sentence containing the last token occurrence or zero if the tokenizer does not report sentences: endSent

  • the relative position of the paragraph containing the first token occurrence or zero if the tokenizer does not report paragraphs: startPara

  • the relative position of the paragraph containing the last token occurrence or zero if the tokenizer does not report paragraphs: endPara

The following matching function is the central implementation-defined primitive performing the full-text retrieval.

declare function fts:matchTokenInfos (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $stopWords as xs:string*,
      $searchTokens as element(fts:searchToken)* )
   as element(fts:tokenInfo)*  external;
            

The above function returns the TokenInfos in items in $searchContext that match the search string represented by the sequence $searchTokens, when using the match options in $matchOptions and stop words in $stopWords. If $searchTokens is a sequence of more than one search token, each returned TokenInfo must represent a phrase matching that sequence.

Note:

While this matching function assumes a tokenized representation of the search strings, it does not assume a tokenized representation of the input items in $searchContext, i.e. the texts in which the search happens. Hence, the tokenization of the search context is implicit in this function and coupled to the retrieval of matches. Of course, this does not imply that tokenization of the search context cannot be done a priori. Because tokenization is implementation-defined, the tokenization of each item in $searchContext does not necessarily take into account the match options in $matchOptions or the search tokens in $searchTokens. This allows implementations to tokenize and index input data without the knowledge of particular match options used in full-text queries.
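
To make the contract of fts:matchTokenInfos more tangible, the following Python sketch returns one TokenInfo per phrase match. It is a deliberately reduced illustration, not the normative primitive: match options and stop words are left out, the search context is plain text, sentence and paragraph positions are reported as zero, and the names TokenInfo (as a Python class) and match_token_infos are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenInfo:
    """Interval of consecutive token positions (field names follow the schema)."""
    startPos: int
    endPos: int
    startSent: int = 0  # zero: this sample tokenizer reports no sentences
    endSent: int = 0
    startPara: int = 0  # zero: this sample tokenizer reports no paragraphs
    endPara: int = 0


TEXT = (
    "Ford Mustang 2000, 65K, excellent condition, runs great, AC, CC, power all "
    "Honda Accord 1999, 78K, A/C, cruise control, runs and looks great, "
    "excellent condition "
    "Ford Mustang, 1995, 150K highway mileage, no rust, excellent condition"
)


def match_token_infos(search_context_text, search_tokens):
    """Return a TokenInfo for every occurrence of the search token sequence.

    Tokenization of the search context happens implicitly here, as the
    note above describes. Matching is case-insensitive (assumption).
    """
    doc = [t.lower() for t in re.findall(r"\w+", search_context_text)]
    wanted = [t.lower() for t in search_tokens]
    n = len(wanted)
    if n == 0:
        return []
    return [TokenInfo(startPos=i + 1, endPos=i + n)
            for i in range(len(doc) - n + 1)
            if doc[i:i + n] == wanted]
```

For the two search tokens "Ford" and "Mustang", each returned TokenInfo spans two positions (1 to 2 and 27 to 28), which is how a single TokenInfo represents a phrase.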

4.2 Evaluation of FTSelections

The sequence of nodes in the XQuery 1.0 and XPath 2.0 Data Model is inadequate to support fully composable FTSelections. Full-text operations, such as FTSelections, operate on linguistic units, such as positions of tokens, which are not captured in the XQuery 1.0 and XPath 2.0 Data Model (XDM).

XQuery 1.0 and XPath 2.0 Full-Text adds relative token, sentence, and paragraph position numbers via AllMatches. AllMatches make FTSelections fully composable.

4.2.1 AllMatches

4.2.1.1 Formal Model

[Definition: An AllMatches describes the possible results of an FTSelection.] The UML Static Class diagram of AllMatches is shown in the diagram below.

AllMatches class diagram

The AllMatches object contains zero or more Matches.

[Definition: Each Match describes one result to the FTSelection.] The result is described in terms of zero or more StringIncludes and zero or more StringExcludes.

[Definition: A StringMatch is a possible match of a sequence of search tokens with a corresponding sequence of consecutive token occurrences in a document. A StringMatch may be a StringInclude or StringExclude.] The queryPos attribute specifies the position of the search token in the query. This attribute is needed for FTOrders. The matched document token sequence is described in the TokenInfo associated with the StringMatch.

[Definition: A StringInclude is a StringMatch that describes a TokenInfo that must be contained in the document.]

[Definition: A StringExclude is a StringMatch that describes a TokenInfo that must not be contained in the document.]

Intuitively, AllMatches specifies the TokenInfos that a node contains and does not contain to satisfy an FTSelection.

The AllMatches structure resembles the Disjunctive Normal Form (DNF) in propositional and first-order logic. The AllMatches is a disjunction of Matches. Each Match is a conjunction of StringIncludes and StringExcludes.

4.2.1.2 Examples

Since in most of the examples below the tokens span only a single position, we characterize the TokenInfo instance by simply giving this position, written as "Pos:X". This should be read as the value of both the startPos and the endPos attributes. Furthermore, for expository reasons, we include in each StringMatch example an attribute "query string", set to the original query string, to make clear which query string each match came from.

The simplest example of an FTSelection is an FTWords such as "Mustang". The AllMatches corresponding to this FTWords is given below.

Sample AllMatches

As shown, the AllMatches consists of two Matches. Each Match represents one possible result of the FTWords "Mustang". The result described by the first Match, which consists of a single StringInclude, contains the token "Mustang" at position 2. The result described by the second Match contains the token "Mustang" at position 28.

A more complex example of an FTSelection is an FTWords such as "Ford Mustang". The AllMatches for this FTWords is given below.

Sample AllMatches

There are two possible results for this FTWords, and these are represented by the two Matches. Each of the Matches requires two tokens to be matched. The first Match is obtained by matching "Ford" at position 1 and matching "Mustang" at position 2. Similarly, the second Match is obtained by matching "Ford" at position 27 and "Mustang" at position 28.

An even more complex example of an FTSelection is an FTSelection such as "Mustang" && ! "rust" that searches for "Mustang" but not "rust". The AllMatches for this FTSelection is given below.

Sample AllMatches

This example introduces StringExclude. StringExclude corresponds to negation in DNF. It specifies that the result described by the corresponding Match must not match the token at the specified position. In this example, the first Match specifies that "Mustang" is matched at position 2, and that the token "rust" at position 34 is not matched.
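
The DNF reading of these examples can be spelled out with a small Python sketch. The classes and the describe function are hypothetical, the queryPos values are assumptions, and the second Match of the "Mustang" && ! "rust" example is filled in by analogy with the first; only the first Match is described in the text above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class StringMatch:
    query_string: str  # expository attribute, as in the examples
    query_pos: int     # assumed value of queryPos
    pos: int           # "Pos:X": startPos and endPos are both X


@dataclass
class Match:
    includes: List[StringMatch] = field(default_factory=list)
    excludes: List[StringMatch] = field(default_factory=list)


@dataclass
class AllMatches:
    matches: List[Match] = field(default_factory=list)


def describe(all_matches):
    """Render an AllMatches as a disjunction of conjunctions."""
    clauses = []
    for m in all_matches.matches:
        literals = [f"{s.query_string}@{s.pos}" for s in m.includes]
        literals += [f"NOT {s.query_string}@{s.pos}" for s in m.excludes]
        clauses.append("(" + " AND ".join(literals) + ")")
    return " OR ".join(clauses)


MUSTANG = AllMatches([
    Match(includes=[StringMatch("Mustang", 1, 2)]),
    Match(includes=[StringMatch("Mustang", 1, 28)]),
])

MUSTANG_NOT_RUST = AllMatches([
    Match(includes=[StringMatch("Mustang", 1, 2)],
          excludes=[StringMatch("rust", 2, 34)]),
    Match(includes=[StringMatch("Mustang", 1, 28)],
          excludes=[StringMatch("rust", 2, 34)]),
])
```

Calling describe(MUSTANG_NOT_RUST) produces "(Mustang@2 AND NOT rust@34) OR (Mustang@28 AND NOT rust@34)", which makes the correspondence between StringExclude and negation in DNF visible.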

4.2.1.3 XML representation

AllMatches has a well-defined hierarchical structure. Therefore, the AllMatches can be easily modeled in XML. This XML representation and those which follow formally describe the semantics of FTSelections. For example, the XML representation of AllMatches formally specifies how an FTSelection operates on zero or more AllMatches to produce a resulting AllMatches.

The XML schema for representing AllMatches is given below.

<xs:schema 
     xmlns:xs="http://www.w3.org/2001/XMLSchema" 
     xmlns:fts="http://www.w3.org/2006/xquery-full-text"
     targetNamespace="http://www.w3.org/2006/xquery-full-text"
     elementFormDefault="qualified" 
     attributeFormDefault="unqualified">

  <xs:complexType name="AllMatches">
    <xs:sequence>
      <xs:element ref="fts:match" 
                  minOccurs="0" 
                  maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="stokenNum" type="xs:integer" use="required" />
  </xs:complexType>

  <xs:element name="allMatches" type="fts:AllMatches"/>

  <xs:complexType name="Match">
    <xs:sequence>
      <xs:element ref="fts:stringInclude" 
                  minOccurs="0" 
                  maxOccurs="unbounded"/>
      <xs:element ref="fts:stringExclude" 
                  minOccurs="0" 
                  maxOccurs="unbounded"/>
   </xs:sequence>
  </xs:complexType>
  
  <xs:element name="stringInclude" 
              type="fts:StringMatch" />

  <xs:element name="stringExclude" 
              type="fts:StringMatch" />

  <xs:element name="match" type="fts:Match"/>

  <xs:complexType name="StringMatch">
    <xs:sequence>
      <xs:element ref="fts:tokenInfo"/>
    </xs:sequence>
    <xs:attribute name="queryPos" 
                  type="xs:integer" 
                  use="required"/>
  </xs:complexType>

  <xs:complexType name="TokenInfo">
    <xs:attribute name="startPos" 
                  type="xs:integer" 
                  use="required"/>
    <xs:attribute name="endPos" 
                  type="xs:integer" 
                  use="required"/>
    <xs:attribute name="startSent" 
                  type="xs:integer" 
                  use="required"/>
    <xs:attribute name="endSent" 
                  type="xs:integer" 
                  use="required"/>
    <xs:attribute name="startPara" 
                  type="xs:integer" 
                  use="required"/>
    <xs:attribute name="endPara" 
                  type="xs:integer" 
                  use="required"/>
  </xs:complexType>

  <xs:element name="tokenInfo" type="fts:TokenInfo"/>

  <xs:complexType name="SearchItem">
    <xs:sequence>
      <xs:element ref="fts:searchToken" 
                  minOccurs="0" 
                  maxOccurs="unbounded"/>
   </xs:sequence>
  </xs:complexType>

  <xs:complexType name="SearchTokenInfo">
    <xs:attribute name="word" 
                  type="xs:string" 
                  use="required"/>
    <xs:attribute name="queryPos" 
                  type="xs:integer" 
                  use="required"/>
  </xs:complexType>

  <xs:element name="searchToken" type="fts:SearchTokenInfo"/>
</xs:schema>
                

The stokenNum attribute in AllMatches is related to the representation of the semantics as XQuery functions. Therefore, it is not considered part of the AllMatches model. The stokenNum attribute stores the number of search tokens used when evaluating the AllMatches. This value is used to compute the correct value for the queryPos attribute in new StringMatches.
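
As an illustration of the schema above, the following Python sketch builds an instance document for the single-token "Mustang" example. The helper names are hypothetical, the sentence and paragraph attributes are set to zero on the assumption that the tokenizer reports neither, and the queryPos and stokenNum values of 1 are assumptions for a query with a single search token.

```python
import xml.etree.ElementTree as ET

FTS_NS = "http://www.w3.org/2006/xquery-full-text"
ET.register_namespace("fts", FTS_NS)


def q(local_name):
    """Qualified element name in the fts namespace."""
    return f"{{{FTS_NS}}}{local_name}"


def token_info(start_pos, end_pos):
    """Build an fts:tokenInfo element; attributes are unqualified per the schema."""
    return ET.Element(q("tokenInfo"), {
        "startPos": str(start_pos), "endPos": str(end_pos),
        "startSent": "0", "endSent": "0",
        "startPara": "0", "endPara": "0",
    })


def all_matches_for_single_token(positions, query_pos, stoken_num):
    """One fts:match with one fts:stringInclude per matched position."""
    root = ET.Element(q("allMatches"), {"stokenNum": str(stoken_num)})
    for pos in positions:
        match = ET.SubElement(root, q("match"))
        include = ET.SubElement(match, q("stringInclude"),
                                {"queryPos": str(query_pos)})
        include.append(token_info(pos, pos))
    return root


MUSTANG_XML = all_matches_for_single_token([2, 28], query_pos=1, stoken_num=1)
SERIALIZED = ET.tostring(MUSTANG_XML, encoding="unicode")
```

The serialized result contains two fts:match children, each holding a single fts:stringInclude whose fts:tokenInfo has equal startPos and endPos values.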

4.2.2 FTSelections

FTSelections are fully composable and may be nested arbitrarily under other FTSelections. Each FTSelection may be associated with match options (such as stemming and stop words) and score weights. Since score weights are solely interpreted by the formal semantics scoring function, they do not influence the semantics of FTSelections. Therefore, score weights are not considered in the formal semantics.

4.2.2.1 XML Representation

The XML representation of the FTSelections used in the fts:evaluate function closely follows the grammar of the language. It can be viewed as an XML representation of an abstract syntax tree (AST) of a parsed full-text query. Every FTSelection is represented as an XML element. Every nested FTSelection is represented as a nested descendant element. For binary FTSelections, e.g., FTAnd, the nested FTSelections are represented in <left> and <right> descendant elements. For unary FTSelections, a <selection> descendant element is used. Additional characteristics of FTSelections, e.g., the distance unit for FTDistance, are stored in attributes.

<xs:schema
     xmlns:xs="http://www.w3.org/2001/XMLSchema" 
     xmlns:fts="http://www.w3.org/2006/xquery-full-text"
     targetNamespace="http://www.w3.org/2006/xquery-full-text"
     elementFormDefault="qualified" 
     attributeFormDefault="unqualified">
           
  <xs:include schemaLocation="AllMatches.xsd" />
  <xs:include schemaLocation="MatchOptions.xsd" />

  <xs:complexType name="FTSelection">
    <xs:sequence>
      <xs:choice>
        <xs:element name="FTWords" type="fts:FTWords"/>
        <xs:element name="FTAnd" type="fts:FTAnd"/>
        <xs:element name="FTOr" type="fts:FTOr"/>
        <xs:element name="FTUnaryNot" type="fts:FTUnaryNot"/>
        <xs:element name="FTMildNot" type="fts:FTMildNot"/>
        <xs:element name="FTOrder" type="fts:FTOrder"/>
        <xs:element name="FTScope" type="fts:FTScope"/>
        <xs:element name="FTContent" type="fts:FTContent"/>
        <xs:element name="FTDistance" type="fts:FTDistance"/>
        <xs:element name="FTWindow" type="fts:FTWindow"/>
        <xs:element name="FTTimes" type="fts:FTTimes"/>
      </xs:choice>
      <xs:element ref="fts:matchOptions" 
                  minOccurs="0"/>
      <xs:element name="weight" 
                  type="xs:double" 
                  minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="selection" type="fts:FTSelection"/>

  <xs:complexType name="FTWords">
    <xs:sequence>
      <xs:element ref="fts:searchItem" 
                  minOccurs="0" 
                  maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="type" 
                  type="fts:FTWordsType" 
                  use="required"/>
  </xs:complexType>

  <xs:element name="searchItem" type="fts:SearchItem"/>
  
  <xs:complexType name="FTAnd">
    <xs:sequence>
      <xs:element name="left" type="fts:FTSelection"/>
      <xs:element name="right" type="fts:FTSelection"/>
    </xs:sequence>
  </xs:complexType>
  
  <xs:complexType name="FTOr">
    <xs:sequence>
      <xs:element name="left" type="fts:FTSelection"/>
      <xs:element name="right" type="fts:FTSelection"/>
    </xs:sequence>
  </xs:complexType>
  
  <xs:complexType name="FTUnaryNot">
    <xs:sequence>
      <xs:element name="selection" type="fts:FTSelection"/>
    </xs:sequence>
  </xs:complexType>
  
  <xs:complexType name="FTMildNot">
    <xs:sequence>
      <xs:element name="left" type="fts:FTSelection"/>
      <xs:element name="right" type="fts:FTSelection"/>
    </xs:sequence>
  </xs:complexType>
  
  <xs:complexType name="FTOrder">
    <xs:sequence>
      <xs:element name="selection" type="fts:FTSelection"/>
    </xs:sequence>
  </xs:complexType>
  
  <xs:complexType name="FTScope">
    <xs:sequence>
      <xs:element name="selection" type="fts:FTSelection"/>
    </xs:sequence>
    <xs:attribute name="type" 
                  type="fts:ScopeType" 
                  use="required"/>
    <xs:attribute name="scope" 
                  type="fts:ScopeSelector" 
                  use="required"/>
  </xs:complexType>
  
  <xs:complexType name="FTContent">
    <xs:sequence>
      <xs:element name="selection" type="fts:FTSelection"/>
    </xs:sequence>
    <xs:attribute name="type" 
                  type="fts:ContentMatchType" 
                  use="required"/>
  </xs:complexType>
  
  <xs:complexType name="FTDistance">
    <xs:sequence>
      <xs:element name="range" type="fts:FTRangeSpec"/>
      <xs:element name="selection" type="fts:FTSelection"/>
    </xs:sequence>
    <xs:attribute name="type" 
                  type="fts:DistanceType" 
                  use="required"/>
  </xs:complexType>
  
  <xs:complexType name="FTWindow">
    <xs:sequence>
      <xs:element name="selection" type="fts:FTSelection"/>
    </xs:sequence>
    <xs:attribute name="size" 
                  type="xs:integer" 
                  use="required"/>
    <xs:attribute name="type" 
                  type="fts:DistanceType" 
                  use="required"/>
  </xs:complexType>
  
  <xs:complexType name="FTTimes">
    <xs:sequence>
      <xs:element name="range" type="fts:FTRangeSpec"/>
      <xs:element name="selection" type="fts:FTWords"/>
    </xs:sequence>
  </xs:complexType>
    
  <xs:simpleType name="FTWordsType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="any"/>
      <xs:enumeration value="all"/>
      <xs:enumeration value="phrase"/>
      <xs:enumeration value="any word"/>
      <xs:enumeration value="all word"/>
    </xs:restriction>
  </xs:simpleType>
  
  <xs:simpleType name="ScopeType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="same"/>
      <xs:enumeration value="different"/>
    </xs:restriction>
  </xs:simpleType>
  
  <xs:simpleType name="ScopeSelector">
    <xs:restriction base="xs:string">
      <xs:enumeration value="paragraph"/>
      <xs:enumeration value="sentence"/>
    </xs:restriction>
  </xs:simpleType>
  
  <xs:simpleType name="DistanceType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="paragraph"/>
      <xs:enumeration value="sentence"/>
      <xs:enumeration value="word"/>
    </xs:restriction>
  </xs:simpleType>
  
  <xs:simpleType name="ContentMatchType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="at start"/>
      <xs:enumeration value="at end"/>
      <xs:enumeration value="entire content"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
            
4.2.2.2 The evaluate function

The denotational semantics for the evaluation of FTSelections is defined using the fts:evaluate function. The function takes four parameters: (1) an FTSelection, (2) a search context node, (3) the default set of match options that apply to the evaluation of the FTSelection, and (4) the number of search tokens processed so far.

The fts:evaluate function returns the AllMatches that is the result of evaluating the FTSelection. When fts:evaluate is applied to some FTSelection X, it calls the function fts:ApplyX to build the resulting AllMatches. If X is applied on nested FTSelections, the fts:evaluate function is recursively called on these nested FTSelections and the returned AllMatches are used in the evaluation of fts:ApplyX.

The semantics for the fts:evaluate function is given below.

declare function fts:evaluate (
      $ftSelect as element(*, fts:FTSelection), 
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $searchTokenNum as xs:integer )
   as element(fts:allMatches)
{
   if (fn:count($ftSelect/fts:matchOptions) > 0) then 
      (: First we deal with all match options that the    :)
      (: FTSelection might bear: we add the match options :)
      (: in front of the current match options sequence   :)
      (: and pass the new sequence to the recursive call  :)
      let $newFTSelection := 
         <fts:selection>{$ftSelect/*[fn:not(self::fts:matchOptions)]}</fts:selection>
      return fts:evaluate($newFTSelection, 
                          $searchContext, 
                          element fts:matchOptions {
                             $ftSelect/fts:matchOptions/*, 
                             $matchOptions/*
                          },
                          $searchTokenNum)
   else if (fn:count($ftSelect/fts:weight) > 0) then
      (: Weight has no bearing on semantics -- just :)
      (: call "evaluate" on nested FTSelection     :)
      let $newFTSelection := 
         <fts:selection>{$ftSelect/*[fn:not(self::fts:weight)]}</fts:selection>
      return fts:evaluate($newFTSelection, 
                          $searchContext, 
                          $matchOptions,
                          $searchTokenNum)
   else
      typeswitch ($ftSelect/*[1]) 
         case $nftSelection as element(fts:FTWords) return
            (: Apply the FTWords in the search context :)
            fts:ApplyFTWords($searchContext,
                             $matchOptions,
                             $nftSelection/@type,
                             $nftSelection/fts:searchItem,
                             $searchTokenNum + 1)
         case $nftSelection as element(fts:FTAnd) return
            let $left := fts:evaluate($nftSelection/fts:left,
                                     $searchContext,
                                     $matchOptions,
                                     $searchTokenNum)
            let $newSearchTokenNum := $left/@stokenNum
            let $right := fts:evaluate($nftSelection/fts:right,
                                      $searchContext,
                                      $matchOptions,
                                      $newSearchTokenNum)
            return fts:ApplyFTAnd($left, $right)
         case $nftSelection as element(fts:FTOr) return
            let $left := fts:evaluate($nftSelection/fts:left,
                                     $searchContext,
                                     $matchOptions,
                                     $searchTokenNum)
            let $newSearchTokenNum := $left/@stokenNum
            let $right := fts:evaluate($nftSelection/fts:right,
                                      $searchContext,
                                      $matchOptions,
                                      $newSearchTokenNum)
            return fts:ApplyFTOr($left, $right)
         case $nftSelection as element(fts:FTUnaryNot) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $searchTokenNum)
            return fts:ApplyFTUnaryNot($nested)
         case $nftSelection as element(fts:FTMildNot) return
            let $left := fts:evaluate($nftSelection/fts:left,
                                     $searchContext,
                                     $matchOptions,
                                     $searchTokenNum)
            let $newSearchTokenNum := $left/@stokenNum
            let $right := fts:evaluate($nftSelection/fts:right,
                                      $searchContext,
                                      $matchOptions,
                                      $newSearchTokenNum)
            return fts:ApplyFTMildNot($left, $right)
         case $nftSelection as element(fts:FTOrder) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $searchTokenNum)
            return fts:ApplyFTOrder($nested)
         case $nftSelection as element(fts:FTScope) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $searchTokenNum)
            return fts:ApplyFTScope($nftSelection/@type, 
                                    $nftSelection/@scope,
                                    $nested)
         case $nftSelection as element(fts:FTContent) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $searchTokenNum)
            return fts:ApplyFTContent($searchContext,
                                      $matchOptions,
                                      $nftSelection/@type, 
                                      $nested)
         case $nftSelection as element(fts:FTDistance) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $searchTokenNum)
            return fts:ApplyFTDistance($matchOptions,
                                       $nftSelection/@type,
                                       $nftSelection/fts:range,
                                       $nested)
         case $nftSelection as element(fts:FTWindow) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $searchTokenNum)
            return fts:ApplyFTWindow($matchOptions,
                                     $nftSelection/@type,
                                     $nftSelection/@size,
                                     $nested)
         case $nftSelection as element(fts:FTTimes) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $searchTokenNum)
            return fts:ApplyFTTimes($nftSelection/fts:range,
                                    $nested)
         default return ()
};
            

For concreteness, assume that the FTSelection was invoked inside an ftcontains expression such as searchContext ftcontains ftselection. In order to determine the AllMatches result of ftselection, the fts:evaluate function is invoked as follows: fts:evaluate($ftselection, $searchContext, $matchOptions, 0), where $ftselection is the XML representation of the ftselection and $searchContext is bound to the result of the evaluation of the XQuery expression searchContext.

Initially, $searchTokenNum is 0, i.e., no search tokens have been processed.

The variable $matchOptions is bound to the list of match options as defined in the static context (see Appendix C Static Context Components). Match options embedded in ftselection modify the match options collection as evaluation proceeds.

The match options that apply to an FTSelection are organized in a stack.

  • The top match option in the stack is applied first.

  • The second match option is applied next.

  • Match options are applied sequentially down to the bottom of the stack.

Ordering among match options is necessary because match options are not always commutative. For example, synonym(stem(word)) is not always the same as stem(synonym(word)). Naturally, match options may be reordered when they commute, but this is an optimization issue and is beyond the scope of this document.
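The effect of ordering can be illustrated with a minimal Python sketch. The stemmer, the thesaurus table, and the function names below are all hypothetical stand-ins invented for this illustration; they are not part of this specification.

```python
# Toy stand-ins for two match options; the table and rules are invented.
SYNONYMS = {"running": "jogging", "run": "sprint"}

def stem(word):
    # Hypothetical stemmer: strips a trailing "ning" or "ging" suffix.
    for suffix in ("ning", "ging"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

def synonym(word):
    # Hypothetical thesaurus lookup: unknown words map to themselves.
    return SYNONYMS.get(word, word)

def apply_stack(word, stack):
    # The top of the stack (index 0) is applied first, then the rest in order.
    for option in stack:
        word = option(word)
    return word

stem_then_synonym = apply_stack("running", [stem, synonym])   # synonym(stem(word))
synonym_then_stem = apply_stack("running", [synonym, stem])   # stem(synonym(word))
```

With these toy rules the two orders produce different tokens, which is why the stack order has to be preserved unless the options are known to commute.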

Given the invocation fts:evaluate($ftselection, $searchContext, $matchOptions, $searchTokenNum), evaluation proceeds as follows. First, $ftselection is checked to determine whether it is 1) a match option applied to a nested FTSelection, 2) a weight specification applied to a nested FTSelection, 3) an FTWords, or 4) some other FTSelection.

  1. If $ftselection contains a match option, then it modifies the context for the nested FTSelection. Consequently, a new match option element is created and pushed onto the top of the stack of match options. The createOptionElement function, which creates the stack element corresponding to the match option, builds a data structure that stores the type of match option (such as stemming or thesaurus) and the details relating to the match option (such as the name of the thesaurus or the stop words for which other tokens may be substituted). The new match option is added to the top of the stack because, in the FTSelection, it was applied before the other match options in the current match options stack. The evaluate function is then invoked on the nested FTSelection with the new match options stack. When the function returns, the match option is popped from the stack, and the result of the nested evaluate function is returned. The match option is popped because it does not apply to FTSelections outside its scope.

  2. If $ftselection contains a weight specification, then the specification is ignored because it does not alter the semantics. The evaluate function is recursively called on the nested FTSelection and the resulting AllMatches is returned.

  3. If $ftselection is an FTWords, then it does not have any nested FTSelections. Consequently, this is the base of the recursive call, and the AllMatches result of the FTWords is computed and returned. The AllMatches is computed by invoking the ApplyFTWords function with the current search context and other necessary information.

  4. If $ftselection contains neither a match option nor a weight specification and is not an FTWords, the FTSelection performs a full-text operation, such as &&, ||, or window. These operations are fully compositional and may be invoked on nested FTSelections. Consequently, evaluation proceeds as follows.

    • First, the evaluate function is recursively invoked on each nested FTSelection. The result of evaluating each nested FTSelection is an AllMatches.

    • The AllMatches are transformed into the resulting AllMatches by applying the full-text operation corresponding to the FTSelection, which is generically named ApplyX for an FTSelection of kind X in the code.

    For example, let FTSelection1 be FTSelection2 && FTSelection3 . Here FTSelection2 and FTSelection3 may themselves be arbitrarily nested FTSelections. Thus, evaluate is invoked on FTSelection2 and FTSelection3, and the resulting AllMatches are transformed to the final AllMatches using the ApplyFTAnd function corresponding to && .
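The four cases above can be sketched in Python as follows. This is an illustrative model only: FTSelections are represented as nested tuples, a Match is reduced to a set of token positions, and the option table and all names are assumptions made for the sketch rather than definitions from this document.

```python
OPTIONS = {"lowercase": str.lower}  # hypothetical match option table

def normalize(token, stack):
    # Apply the match options from the top of the stack (index 0) downwards.
    for name in stack:
        token = OPTIONS[name](token)
    return token

def evaluate(sel, tokens, stack):
    kind = sel[0]
    if kind == "option":                      # case 1: match option
        _, name, nested = sel
        stack.insert(0, name)                 # push onto the top of the stack
        try:
            return evaluate(nested, tokens, stack)
        finally:
            stack.pop(0)                      # pop: option is out of scope again
    if kind == "weight":                      # case 2: weight is ignored
        return evaluate(sel[2], tokens, stack)
    if kind == "words":                       # case 3: base of the recursion
        wanted = normalize(sel[1], stack)
        return [frozenset([pos])
                for pos, tok in enumerate(tokens, start=1)
                if normalize(tok, stack) == wanted]
    left = evaluate(sel[1], tokens, stack)    # case 4: evaluate nested selections
    right = evaluate(sel[2], tokens, stack)
    if kind == "or":
        return left + right
    if kind == "and":
        return [m1 | m2 for m1 in left for m2 in right]
    raise ValueError(kind)

tokens = ["Ford", "Mustang", "rust"]
query = ("and", ("option", "lowercase", ("words", "mustang")), ("words", "rust"))
stack = []
result = evaluate(query, tokens, stack)
```

The push in case 1 is paired with a pop in a finally clause, so the stack seen by sibling FTSelections is unchanged after the nested call returns.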

The semantics of the ApplyX function for each FTSelection kind X is given below.

4.2.2.3 Formal semantics functions

The formal semantics of the ApplyX functions for each FTSelection kind X is specified in terms of six functions. How these six functions are computed is implementation-dependent, but the functions must satisfy some well-defined properties.

The getTokenInfo function is described in Section 4.1 Tokenization.

The wordDistance function returns the number of tokens that occur between the positions of the TokenInfos $tokenInfo1 and $tokenInfo2. For example, two consecutive tokens have a distance of 0 tokens.

declare function fts:wordDistance (
             $tokenInfo1 as element(fts:tokenInfo),
             $tokenInfo2 as element(fts:tokenInfo),
             $matchOptions as element(fts:matchOptions) ) 
   as xs:integer external;
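Because wordDistance is external, its computation is left to the implementation. One possible reading, sketched in Python under the assumptions that token positions are consecutive integers, that a TokenInfo is reduced to a (start, end) position interval, and that overlapping intervals have distance 0 (none of which is fixed by the declaration above), is:

```python
def word_distance(start1, end1, start2, end2):
    # Order the two intervals so that "first" starts no later than "second".
    (_, first_end), (second_start, _) = sorted([(start1, end1), (start2, end2)])
    # Count the token positions strictly between the two intervals.
    return max(second_start - first_end - 1, 0)

adjacent = word_distance(3, 3, 4, 4)   # two consecutive tokens
gap = word_distance(1, 1, 5, 5)        # positions 2, 3 and 4 lie in between
phrases = word_distance(5, 6, 1, 2)    # argument order does not matter
```

The sketch ignores match options; an implementation might, for example, treat stop words differently when counting.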
            

The paraDistance function returns the number of paragraphs between the TokenInfos $tokenInfo1 and $tokenInfo2.

declare function fts:paraDistance (
             $tokenInfo1 as element(fts:tokenInfo),
             $tokenInfo2 as element(fts:tokenInfo),
             $matchOptions as element(fts:matchOptions) ) 
   as xs:integer external;
            

The sentenceDistance function returns the number of sentences between the TokenInfos $tokenInfo1 and $tokenInfo2.

declare function fts:sentenceDistance (
             $tokenInfo1 as element(fts:tokenInfo),
             $tokenInfo2 as element(fts:tokenInfo),
             $matchOptions as element(fts:matchOptions) ) 
   as xs:integer external;
            

The isStartToken function returns true if the TokenInfo $tokenInfo describes a token whose start position is the first position of the node $searchContext.

declare function fts:isStartToken (
             $searchContext as item(),
             $tokenInfo as element(fts:tokenInfo) ) 
   as xs:boolean external;
            

The isEndToken function returns true if the TokenInfo $tokenInfo describes a token whose end position is the last position of the node $searchContext.

declare function fts:isEndToken (
             $searchContext as item(),
             $tokenInfo as element(fts:tokenInfo) ) 
   as xs:boolean external;
            
4.2.2.4 FTWords

An FTWords that consists of a single search string consisting of a sequence of tokens to be matched as a phrase is evaluated by the applySearchTokensAsPhrase function. Its parameters are 1) the search context, 2) the list of match options, 3) the search string to be matched as a sequence of fts:searchToken items, and 4) the position where the latter search string occurs in the query.

(: simplified version not dealing with special match options :)
declare function fts:applySearchTokensAsPhrase (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $searchTokens as element(fts:searchToken)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$queryPos}"> 
   {
      for $tokenInfo in
         fts:matchTokenInfos( 
            $searchContext,
            $matchOptions,
            (),
            $searchTokens )
      return  
         <fts:match>  
            <fts:stringInclude queryPos="{$queryPos}"> 
            {$tokenInfo}
            </fts:stringInclude> 
         </fts:match>
   } 
   </fts:allMatches>
};

If, after the application of all the match options, the sequence of search tokens returned for an FTWords is empty, an empty AllMatches is returned.
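As a rough illustration of what matching a phrase produces, the following Python sketch scans a tokenized search context and returns one (startPos, endPos) interval per occurrence. It is a simplification that assumes exact token equality and 1-based positions, and the function name is hypothetical; the matchTokenInfos function used above is not defined by this sketch.

```python
def match_phrase(context_tokens, search_tokens):
    n = len(search_tokens)
    if n == 0:
        # No search tokens left after applying match options: no matches.
        return []
    return [(i + 1, i + n)                          # 1-based start and end positions
            for i in range(len(context_tokens) - n + 1)
            if context_tokens[i:i + n] == search_tokens]

tokens = "the ford mustang is a ford".split()
single = match_phrase(tokens, ["ford"])
phrase = match_phrase(tokens, ["ford", "mustang"])
```

A single-token match has equal start and end positions, while a phrase match yields an interval spanning all of its tokens.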

The AllMatches corresponding to an FTWords is a set of Matches. Each of the Matches is associated with a start and an end position indicating where the corresponding search tokens were found. For example, the AllMatches result for the FTWords "Mustang" is given below. To simplify the presentation in the figures we write Pos: N, if the attributes startPos and endPos are the same with N being that position.

FTWords example

There are five variations of FTWords depending on how the tokens and phrases in the nested XQuery 1.0 and XPath 2.0 expression are matched.

  • When any word is specified, at least one token in the tokenization of the nested expression must be matched.

  • When all word is specified, all tokens in the tokenization of the nested expression must be matched.

  • When phrase is specified, all tokens in the tokenization of the nested expression must be matched as a phrase.

  • When any is specified, at least one string atomic value in the nested expression must be matched as a phrase.

  • When all is specified, all string atomic values in the nested expression must be matched as a phrase.

The semantics for FTWords when any word is specified is given below. Since FTWords does not have nested FTSelections, the ApplyFTWords function does not take AllMatches parameters corresponding to nested FTSelection results.

declare function fts:MakeDisjunction (
      $curRes as element(fts:allMatches),
      $rest as element(fts:allMatches)* ) 
   as element(fts:allMatches) 
{
   if (fn:count($rest) = 0)
   then $curRes
   else 
      let $firstAllMatches := $rest[1]
      let $restAllMatches := fn:subsequence($rest, 2)
      let $newCurRes := fts:ApplyFTOr($curRes, 
                                      $firstAllMatches)
      return fts:MakeDisjunction($newCurRes, 
                                 $restAllMatches)
};

declare function fts:ApplyFTWordsAnyWord (
      $searchContext as item(), 
      $matchOptions as element(fts:matchOptions), 
      $searchItems as element(fts:searchItem)*,
      $queryPos as xs:integer ) 
   as element(fts:allMatches) 
{
   (: Tokenization of search string has already occurred. :)
   (: Get sequence of SearchTokens over all search items. :)
   let $searchTokens := $searchItems/fts:searchToken
   return
      if (fn:count($searchTokens) eq 0) 
      then <fts:allMatches stokenNum="0" />
      else
         let $allAllMatches := 
            for $searchToken at $pos in $searchTokens
            return fts:applySearchTokensAsPhrase($searchContext,
                                                 $matchOptions,
                                                 $searchToken,
                                                 $queryPos + $pos - 1)
         let $firstAllMatches := $allAllMatches[1]
         let $restAllMatches := fn:subsequence($allAllMatches, 2)
         return fts:MakeDisjunction($firstAllMatches, $restAllMatches)
};

The tokenized search strings are passed to ApplyFTWordsAnyWord as a sequence of fts:searchItem, each containing the tokens of a single search string. A single flattened sequence of all tokens (of type fts:searchToken) over all search items is constructed. For each of these, the result of FTWords is computed using applySearchTokensAsPhrase. Finally, the disjunction of all resulting AllMatches is computed.

The semantics for FTWords when all word is specified is similar to the above, but composes a conjunction instead of a disjunction. It is given below.

declare function fts:MakeConjunction ( 
      $curRes as element(fts:allMatches),
      $rest as element(fts:allMatches)* ) 
   as element(fts:allMatches)
{
   if (fn:count($rest) = 0)
   then $curRes
   else 
      let $firstAllMatches := $rest[1]
      let $restAllMatches := fn:subsequence($rest, 2)
      let $newCurRes := fts:ApplyFTAnd($curRes, 
                                       $firstAllMatches)
      return fts:MakeConjunction($newCurRes, 
                                 $restAllMatches)
};

declare function fts:ApplyFTWordsAllWord (
      $searchContext as item(), 
      $matchOptions as element(fts:matchOptions), 
      $searchItems as element(fts:searchItem)*,
      $queryPos as xs:integer ) 
   as element(fts:allMatches) 
{
   (: Tokenization of search strings has already occurred. :)
   (: Get sequence of SearchTokens over all search items :)
   let $searchTokens := $searchItems/fts:searchToken
   return
      if (fn:count($searchTokens) eq 0) 
      then <fts:allMatches stokenNum="0" />
      else
         let $allAllMatches := 
            for $searchToken at $pos in $searchTokens
            return fts:applySearchTokensAsPhrase($searchContext,
                                                 $matchOptions,
                                                 $searchToken,
                                                 $queryPos + $pos - 1)
         let $firstAllMatches := $allAllMatches[1]
         let $restAllMatches := fn:subsequence($allAllMatches, 2)
         return fts:MakeConjunction($firstAllMatches, $restAllMatches)
};

The semantics for FTWords if phrase is specified is given below.

declare function fts:ApplyFTWordsPhrase (
      $searchContext as item(), 
      $matchOptions as element(fts:matchOptions), 
      $searchItems as element(fts:searchItem)*,
      $queryPos as xs:integer ) 
   as element(fts:allMatches) 
{
   (: Get sequence of SearchTokens over all search items :)
   let $searchTokens := $searchItems/fts:searchToken
   return
      if (fn:count($searchTokens) eq 0) 
      then <fts:allMatches stokenNum="0" />
      else
         fts:applySearchTokensAsPhrase($searchContext,
                                       $matchOptions,
                                       $searchTokens,
                                       $queryPos)
};

The ApplyFTWordsPhrase function also flattens the sequence of search items to a sequence of search tokens, but then calls applySearchTokensAsPhrase on that entire sequence, instead of calling it on each search token individually. Hence, the sequence of all search tokens is matched as a single phrase and the computed TokenInfos are returned.

The semantics for FTWords when any is specified is given below.

declare function fts:ApplyFTWordsAny (
      $searchContext as item(), 
      $matchOptions as element(fts:matchOptions), 
      $searchItems as element(fts:searchItem)*,
      $queryPos as xs:integer ) 
   as element(fts:allMatches) 
{
   if (fn:count($searchItems) eq 0) 
   then <fts:allMatches stokenNum="0" />
   else 
      let $firstSearchItem := $searchItems[1]
      let $restSearchItem := fn:subsequence($searchItems, 2)
      let $firstAllMatches := 
         fts:ApplyFTWordsPhrase($searchContext,
                                $matchOptions,
                                $firstSearchItem,
                                $queryPos)
      let $newQueryPos := 
         if ($firstAllMatches//@queryPos) 
         then fn:max($firstAllMatches//@queryPos) + 1
         else $queryPos
      let $restAllMatches :=
         fts:ApplyFTWordsAny($searchContext,
                             $matchOptions,
                             $restSearchItem,
                             $newQueryPos)
      return fts:ApplyFTOr($firstAllMatches, $restAllMatches)
};

The FTWords with any specified forms the disjunction of the AllMatches that are the result of the matching of each search item as a phrase.

The semantics for FTWords when all is specified is given below.

declare function fts:ApplyFTWordsAll (
      $searchContext as item(), 
      $matchOptions as element(fts:matchOptions), 
      $searchItems as element(fts:searchItem)*,
      $queryPos as xs:integer ) 
   as element(fts:allMatches) 
{
   if (fn:count($searchItems) = 0) 
   then <fts:allMatches stokenNum="0" />
   else 
      let $firstSearchItem := $searchItems[1]
      let $restSearchItem := fn:subsequence($searchItems, 2)
      let $firstAllMatches := 
         fts:ApplyFTWordsPhrase($searchContext,
                                $matchOptions,
                                $firstSearchItem,
                                $queryPos)
      return
         if ($restSearchItem) then
            let $newQueryPos := 
               if ($firstAllMatches//@queryPos) 
               then fn:max($firstAllMatches//@queryPos) + 1
               else $queryPos
            let $restAllMatches :=
               fts:ApplyFTWordsAll($searchContext,
                                   $matchOptions,
                                   $restSearchItem,
                                   $newQueryPos)
            return 
               fts:ApplyFTAnd($firstAllMatches, $restAllMatches)
         else $firstAllMatches
};

The difference between all and any is the use of conjunction instead of disjunction.

The ApplyFTWords function combines all of these functions.

declare function fts:ApplyFTWords ( 
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $type as fts:FTWordsType,
      $searchItems as element(fts:searchItem)*, 
      $queryPos as xs:integer )
   as element(fts:allMatches) 
{
   if ($type eq "any word")
   then fts:ApplyFTWordsAnyWord($searchContext,
                                $matchOptions,
                                $searchItems,
                                $queryPos)
   else if ($type eq "all word")
   then fts:ApplyFTWordsAllWord($searchContext,
                                $matchOptions,
                                $searchItems,
                                $queryPos)
   else if ($type eq "phrase")
   then fts:ApplyFTWordsPhrase($searchContext,
                               $matchOptions,
                               $searchItems,
                               $queryPos)
   else if ($type eq "any")
   then fts:ApplyFTWordsAny($searchContext,
                            $matchOptions,
                            $searchItems,
                            $queryPos)
   else fts:ApplyFTWordsAll($searchContext,
                            $matchOptions,
                            $searchItems,
                            $queryPos)
};
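The five variants can be compared side by side with a small Python model. In this sketch, which rests on simplifying assumptions, a search item is a whitespace-tokenized string, a Match is a set of (startPos, endPos) intervals, and query positions and match options are omitted; all function names are hypothetical.

```python
from itertools import product

def phrase(tokens, words):
    # One Match per occurrence of the words as a contiguous phrase.
    n = len(words)
    return [frozenset([(i + 1, i + n)])
            for i in range(len(tokens) - n + 1)
            if n and tokens[i:i + n] == words]

def disjunction(results):
    # Union of the Matches of all operands (compare ApplyFTOr).
    return [m for r in results for m in r]

def conjunction(results):
    # One Match per combination of operand Matches (compare ApplyFTAnd).
    return [frozenset().union(*combo) for combo in product(*results)]

def ft_words(tokens, kind, search_strings):
    items = [s.split() for s in search_strings]
    words = [w for item in items for w in item]
    if kind == "any word":
        return disjunction([phrase(tokens, [w]) for w in words])
    if kind == "all word":
        return conjunction([phrase(tokens, [w]) for w in words]) if words else []
    if kind == "phrase":
        return phrase(tokens, words)
    if kind == "any":
        return disjunction([phrase(tokens, item) for item in items])
    if kind == "all":
        return conjunction([phrase(tokens, item) for item in items]) if items else []
    raise ValueError(kind)

tokens = "ford mustang with rust".split()
strings = ["ford mustang", "rust"]
results = {kind: ft_words(tokens, kind, strings)
           for kind in ("any word", "all word", "phrase", "any", "all")}
```

For this input, phrase finds nothing because "ford mustang rust" does not occur contiguously, whereas any and all treat each search string as its own phrase.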
                
4.2.2.5 FTOr

The parameters of the ApplyFTOr function are the two AllMatches parameters corresponding to the results of the two nested FTSelections. The search context and the match options stack are not used by this function. The semantics is given below.

declare function fts:ApplyFTOr (
      $allMatches1 as element(fts:allMatches),
      $allMatches2 as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{fn:max(($allMatches1/@stokenNum, 
                                       $allMatches2/@stokenNum))}">
   {$allMatches1/fts:match,$allMatches2/fts:match}
   </fts:allMatches>
};
            

The ApplyFTOr function creates a new AllMatches in which Matches are the union of those found in the input AllMatches. Each Match represents one possible result of the corresponding FTSelection. Thus, a Match from either of the AllMatches is a result.
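A minimal Python sketch of this union follows. It models an AllMatches as a dictionary with a stokenNum and a list of Matches; this representation and the token positions in the example are assumptions chosen for illustration.

```python
def apply_ft_or(all_matches1, all_matches2):
    return {
        # The larger search token number of the two operands is kept.
        "stokenNum": max(all_matches1["stokenNum"], all_matches2["stokenNum"]),
        # Every Match of either operand is a Match of the result.
        "matches": all_matches1["matches"] + all_matches2["matches"],
    }

mustang = {"stokenNum": 1, "matches": [{"include": [(1, 2)]}, {"include": [(1, 10)]}]}
honda = {"stokenNum": 2, "matches": [{"include": [(2, 7)]}]}
either = apply_ft_or(mustang, honda)
```

Each include entry here is a hypothetical (queryPos, position) pair; the result simply lists all three Matches.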

For example, consider the FTSelection "Mustang" || "Honda". The AllMatches corresponding to "Mustang" and "Honda" are given below.

FTOr input AllMatches 1

FTOr input AllMatches 2

The AllMatches produced by ApplyFTOr is given below.

FTOr result AllMatches
4.2.2.6 FTAnd

The parameters of the ApplyFTAnd function are the two AllMatches corresponding to the results of the two nested FTSelections. The search context and the match options are not used by this function. The semantics is given below.

declare function fts:ApplyFTAnd (
      $allMatches1 as element(fts:allMatches),
      $allMatches2 as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{fn:max(($allMatches1/@stokenNum, 
                                       $allMatches2/@stokenNum))}" >
   {
      for $sm1 in $allMatches1/fts:match
      for $sm2 in $allMatches2/fts:match
      return <fts:match>
             {$sm1/*, $sm2/*}
             </fts:match>
   }
   </fts:allMatches>
};
            

The result of the conjunction is a new AllMatches that contains the "Cartesian product" of the Matches of the participating FTSelections. Every resulting Match is formed by combining the StringInclude and StringExclude components of one Match from each of the nested FTSelections' AllMatches. Thus every resulting Match contains the positions needed to satisfy a Match from both original FTSelections and excludes the positions that violate the same Matches.
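The pairing of Matches can be sketched in Python as follows. The dictionary representation of AllMatches and Match, and the positions used in the example, are assumptions made for this illustration only.

```python
def apply_ft_and(all_matches1, all_matches2):
    return {
        "stokenNum": max(all_matches1["stokenNum"], all_matches2["stokenNum"]),
        # One resulting Match for every pair of operand Matches.
        "matches": [
            {"include": m1["include"] + m2["include"],
             "exclude": m1["exclude"] + m2["exclude"]}
            for m1 in all_matches1["matches"]
            for m2 in all_matches2["matches"]
        ],
    }

mustang = {"stokenNum": 1, "matches": [{"include": [(1, 2)], "exclude": []},
                                       {"include": [(1, 10)], "exclude": []}]}
rust = {"stokenNum": 2, "matches": [{"include": [(2, 5)], "exclude": []}]}
both = apply_ft_and(mustang, rust)
```

Two Matches on the left and one on the right give two combined Matches. If either operand has no Matches, the result has none.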

For example, consider the FTSelection "Mustang" && "rust". The source AllMatches are given below.

FTAnd input AllMatches 1

FTAnd input AllMatches 2

The AllMatches produced by ApplyFTAnd is given below.

FTAnd result AllMatches
4.2.2.7 FTUnaryNot

The ApplyFTUnaryNot function takes one parameter: the AllMatches corresponding to the result of the nested FTSelection to be negated. The search context and the match options are not used by this function. The semantics is given below.

declare function fts:InvertStringMatch ( $strm as element(*,fts:StringMatch) ) 
   as element(*,fts:StringMatch)
{
   if ($strm instance of element(fts:stringExclude)) then
      <fts:stringInclude queryPos="{$strm/@queryPos}">
      {$strm/fts:tokenInfo}
      </fts:stringInclude>
   else
      <fts:stringExclude queryPos="{$strm/@queryPos}">
      {$strm/fts:tokenInfo}
      </fts:stringExclude>
};

declare function fts:UnaryNotHelper ( $matches as element(fts:match)* )
   as element(fts:match)*
{
   if (fn:count($matches) = 0)
   then <fts:match/>
   else
      for $sm in $matches[1]/*
      for $rest in fts:UnaryNotHelper( fn:subsequence($matches, 2) )
      return 
         <fts:match>
         {
            fts:InvertStringMatch($sm),
            $rest/*
         }
         </fts:match>
};

declare function fts:ApplyFTUnaryNot (
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      fts:UnaryNotHelper($allMatches/fts:match)
   }
   </fts:allMatches>
};
            

The generation of the resulting AllMatches of an FTUnaryNot resembles the transformation of the negation of a propositional formula in DNF back to DNF. The negation of AllMatches requires the inversion of all the conditions on the nodes encoded by the AllMatches.

In the functions above, this inversion occurs as follows.

  1. The function fts:InvertStringMatch inverts a stringInclude into a stringExclude and vice versa.

  2. The function fts:UnaryNotHelper transforms the source Matches into the resulting Matches by forming the combinations of the inversions of a StringInclude or StringExclude component over the source Matches into new Matches.
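The two steps above can be mirrored in a short Python sketch. Here a Match is a list of string matches, each a (kind, queryPos, position) tuple; this encoding and the positions in the example are assumptions for illustration.

```python
def invert(string_match):
    # A stringInclude becomes a stringExclude and vice versa.
    kind, query_pos, pos = string_match
    return ("exclude" if kind == "include" else "include", query_pos, pos)

def unary_not_helper(matches):
    if not matches:
        # Negating "no Matches" yields a single Match without conditions.
        return [[]]
    # Pick one inverted component from the first Match and combine it with
    # every way of negating the remaining Matches.
    return [[invert(sm)] + rest
            for sm in matches[0]
            for rest in unary_not_helper(matches[1:])]

# Source Matches for ("Mustang" || "Honda"): two Matches, one include each.
source = [[("include", 1, 2)], [("include", 2, 7)]]
negated = unary_not_helper(source)
```

The negation of a disjunction of two single-condition Matches is one Match carrying both inverted conditions, as De Morgan's laws would suggest.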

For example, consider the FTSelection ! ("Mustang" || "Honda"). The source AllMatches is given below:

FTUnaryNot input AllMatches

The FTUnaryNot transforms the StringIncludes to StringExcludes as illustrated below.

FTUnaryNot result AllMatches
4.2.2.8 FTMildNot

The parameters of the ApplyFTMildNot function are the two AllMatches parameters corresponding to the results of the two nested FTSelections. The search context and the match options stack are not used by this function. The semantics is given below.

declare function fts:ApplyFTMildNot (
      $allMatches1 as element(fts:allMatches),
      $allMatches2 as element(fts:allMatches) ) 
   as element(fts:allMatches)
{
   if (fn:count($allMatches2//fts:stringExclude) gt 0) then
      fn:error("Invalid expression on the right-hand side of a not-in")
   else
      <fts:allMatches stokenNum="{$allMatches1/@stokenNum}">
      {
         let $posSet2 := $allMatches2/fts:match/fts:stringInclude/@queryPos
         return 
            $allMatches1/fts:match[
               every $pos1 in ./fts:stringInclude/@queryPos, 
                     $pos2 in $posSet2
               satisfies $pos1 ne $pos2]
      }
      </fts:allMatches>
};
            

The resulting AllMatches contains those Matches of the first operand whose StringInclude components do not mention any position that occurs in a StringInclude component of the AllMatches of the second operand.

For example, consider the FTSelection ("Ford" mildnot "Ford Mustang"). The source AllMatches for the left-hand side argument is given below.

FTMildnot input AllMatches 1

The source AllMatches for the right-hand side argument is given below.

FTMildnot input AllMatches 2

The FTMildNot will transform these to an empty AllMatches because both position 1 and position 27 from the first AllMatches contain only TokenInfos from stringInclude components of the second AllMatches.

4.2.2.9 FTOrder

The ApplyFTOrder function takes one parameter: the AllMatches corresponding to the result of the nested FTSelection. The search context and the match options are not used by this function. The semantics is given below.

declare function fts:ApplyFTOrder (
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude
            satisfies (($stringInclude1/fts:tokenInfo/@startPos <= 
                        $stringInclude2/fts:tokenInfo/@startPos)
                       and
                       ($stringInclude1/@queryPos <= 
                        $stringInclude2/@queryPos))
                      or
                       (($stringInclude1/fts:tokenInfo/@startPos >= 
                         $stringInclude2/fts:tokenInfo/@startPos)
                        and
                        ($stringInclude1/@queryPos >= 
                         $stringInclude2/@queryPos))
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where every $stringIncl in $match/fts:stringInclude
                  satisfies (($stringExcl/fts:tokenInfo/@startPos <= 
                              $stringIncl/fts:tokenInfo/@startPos)
                             and
                              ($stringExcl/@queryPos <= 
                               $stringIncl/@queryPos))
                            or
                             (($stringExcl/fts:tokenInfo/@startPos >= 
                               $stringIncl/fts:tokenInfo/@startPos)
                              and
                              ($stringExcl/@queryPos >= 
                               $stringIncl/@queryPos))
            return $stringExcl
         }
         </fts:match>
   }         
   </fts:allMatches>
};
            

The resulting AllMatches contains the Matches for which the start positions in the StringInclude elements are in the order of the query positions of their query strings. StringExcludes that preserve the order (with respect to their start positions) are also retained.
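The ordering test can be sketched in Python. Each string match is reduced to a (queryPos, startPos) pair, and the positions in the example are invented; both are assumptions of this sketch.

```python
def consistent(a, b):
    # a and b are (queryPos, startPos) pairs. They are consistent when their
    # order in the text agrees with their order in the query.
    return ((a[1] <= b[1] and a[0] <= b[0]) or
            (a[1] >= b[1] and a[0] >= b[0]))

def apply_ft_order(matches):
    kept = []
    for match in matches:
        includes = match["include"]
        if all(consistent(a, b) for a in includes for b in includes):
            kept.append({
                "include": includes,
                # Keep only the excludes that respect the order of every include.
                "exclude": [e for e in match["exclude"]
                            if all(consistent(e, i) for i in includes)],
            })
    return kept

# ("great" && "condition") ordered: "great" has queryPos 1, "condition" queryPos 2.
matches = [
    {"include": [(1, 5), (2, 6)], "exclude": []},    # great before condition
    {"include": [(1, 20), (2, 12)], "exclude": []},  # condition before great
]
ordered = apply_ft_order(matches)
```

Only the first Match survives, because in the second one "condition" occurs before "great" in the text.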

For example, consider the FTSelection ("great" && "condition") ordered. The source AllMatches is given below.

FTOrder input AllMatches (figure in three parts)

The AllMatches for FTOrder are given below.

FTOrder result AllMatches (figure in two parts)
4.2.2.10 FTScope

The parameters of the ApplyFTScope function are 1) the type of the scope (same or different), 2) the linguistic unit (sentence or paragraph), and 3) one AllMatches parameter corresponding to the result of the nested FTSelection. The search context and the match options are not used by this function. The function definitions depend on the type of the scope (same, different) and the linguistic unit (sentence, paragraph).

The semantics of same sentence is given below.

declare function fts:ApplyFTScopeSameSentence (
      $allMatches as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude 
            satisfies $stringInclude1/fts:tokenInfo/@startSent = 
                      $stringInclude2/fts:tokenInfo/@startSent
                  and $stringInclude1/fts:tokenInfo/@startSent = 
                      $stringInclude1/fts:tokenInfo/@endSent
                  and $stringInclude2/fts:tokenInfo/@startSent = 
                      $stringInclude2/fts:tokenInfo/@endSent
                  and $stringInclude1/fts:tokenInfo/@startSent > 0
                  and $stringInclude2/fts:tokenInfo/@startSent > 0
      return 
        <fts:match>
        {
           $match/fts:stringInclude,
           for $stringExcl in $match/fts:stringExclude
           where
              $stringExcl/fts:tokenInfo/@startSent = 0
              or
              ($stringExcl/fts:tokenInfo/@startSent = 
               $stringExcl/fts:tokenInfo/@endSent
               and 
                  (every $stringIncl in $match/fts:stringInclude
                   satisfies $stringIncl/fts:tokenInfo/@startSent = 
                             $stringExcl/fts:tokenInfo/@startSent) )
           return $stringExcl
        }
        </fts:match>
   }
   </fts:allMatches>
};

An AllMatches returned by the scope same sentence contains those Matches whose StringIncludes each span only a single sentence and all lie in the same sentence. In these Matches, only those StringExcludes are retained that also span only a single sentence which, if the Match contains StringIncludes, is the same sentence as the one spanned by the StringIncludes.
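The same sentence filter can be approximated in Python as shown below. Each string match is reduced to a (startSent, endSent) pair, with 0 standing for a token outside any sentence as in the function above; this encoding and the sample data are assumptions of the sketch.

```python
def same_sentence(matches):
    kept = []
    for match in matches:
        includes = match["include"]
        single = all(start == end and start > 0 for start, end in includes)
        shared = len({start for start, _ in includes}) <= 1
        if single and shared:
            kept.append({
                "include": includes,
                # An exclude survives if it lies outside any sentence, or in
                # the one sentence shared by all includes.
                "exclude": [(start, end) for start, end in match["exclude"]
                            if start == 0
                            or (start == end
                                and all(inc[0] == start for inc in includes))],
            })
    return kept

matches = [
    {"include": [(1, 1), (1, 1)], "exclude": [(1, 1), (2, 2)]},  # same sentence
    {"include": [(1, 1), (2, 2)], "exclude": []},                # two sentences
    {"include": [(1, 2)], "exclude": []},                        # spans a boundary
]
scoped = same_sentence(matches)
```

Only the first Match is kept, and its exclude in sentence 2 is dropped because it cannot affect a match confined to sentence 1.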

The semantics of different sentence is given below.

declare function fts:ApplyFTScopeDifferentSentence (
      $allMatches as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude  
            satisfies $stringInclude1 is $stringInclude2
                  or $stringInclude1/fts:tokenInfo/@endSent <
                     $stringInclude2/fts:tokenInfo/@startSent
                  or $stringInclude2/fts:tokenInfo/@endSent <
                     $stringInclude1/fts:tokenInfo/@startSent
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where every $stringIncl in $match/fts:stringInclude
                  satisfies $stringExcl/fts:tokenInfo/@endSent <  
                            $stringIncl/fts:tokenInfo/@startSent
                         or $stringIncl/fts:tokenInfo/@endSent <
                            $stringExcl/fts:tokenInfo/@startSent
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

An AllMatches returned by the scope different sentence contains those Matches that have no two StringIncludes covering the same sentence. In these Matches only those StringExcludes are retained that also do not cover a common sentence with one of the StringIncludes.
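A corresponding Python sketch for different sentence follows. As before, each string match is reduced to a hypothetical (startSent, endSent) pair, and two entries are compared by their index in the list so that an entry is never tested against itself.

```python
def apart(a, b):
    # True when the sentence ranges of a and b do not overlap.
    return a[1] < b[0] or b[1] < a[0]

def different_sentence(matches):
    kept = []
    for match in matches:
        includes = match["include"]
        n = len(includes)
        if all(i == j or apart(includes[i], includes[j])
               for i in range(n) for j in range(n)):
            kept.append({
                "include": includes,
                # Keep excludes that share no sentence with any include.
                "exclude": [e for e in match["exclude"]
                            if all(apart(e, inc) for inc in includes)],
            })
    return kept

matches = [
    {"include": [(1, 1), (2, 2)], "exclude": [(2, 2), (3, 3)]},  # distinct sentences
    {"include": [(1, 1), (1, 1)], "exclude": []},                # same sentence
]
scoped = different_sentence(matches)
```

The first Match is kept with only its exclude in sentence 3; the second is rejected because both includes lie in sentence 1.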

The semantics of same paragraph is analogous to same sentence and is given below.

declare function fts:ApplyFTScopeSameParagraph (
      $allMatches as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude  
            satisfies $stringInclude1/fts:tokenInfo/@startPara = 
                      $stringInclude2/fts:tokenInfo/@startPara
                  and $stringInclude1/fts:tokenInfo/@startPara = 
                      $stringInclude1/fts:tokenInfo/@endPara
                  and $stringInclude2/fts:tokenInfo/@startPara = 
                      $stringInclude2/fts:tokenInfo/@endPara
                  and $stringInclude1/fts:tokenInfo/@startPara > 0
                  and $stringInclude2/fts:tokenInfo/@endPara > 0
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where 
               $stringExcl/fts:tokenInfo/@startPara = 0
               or
               ($stringExcl/fts:tokenInfo/@startPara = 
                $stringExcl/fts:tokenInfo/@endPara
                and
                   (every $stringIncl in $match/fts:stringInclude
                    satisfies $stringIncl/fts:tokenInfo/@startPara = 
                              $stringExcl/fts:tokenInfo/@startPara) )
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of different paragraph is analogous to different sentence and is given below.

declare function fts:ApplyFTScopeDifferentParagraph (
      $allMatches as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude  
            satisfies $stringInclude1 = $stringInclude2
                   or $stringInclude1/fts:tokenInfo/@endPara <
                      $stringInclude2/fts:tokenInfo/@startPara
                   or $stringInclude2/fts:tokenInfo/@endPara <
                      $stringInclude1/fts:tokenInfo/@startPara
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where every $stringIncl in $match/fts:stringInclude
                  satisfies $stringExcl/fts:tokenInfo/@endPara <  
                            $stringIncl/fts:tokenInfo/@startPara
                         or $stringIncl/fts:tokenInfo/@endPara <
                            $stringExcl/fts:tokenInfo/@startPara
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics for the general case is given below.

declare function fts:ApplyFTScope (
      $type as fts:ScopeType,
      $selector as fts:ScopeSelector, 
      $allMatches as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   if ($type eq "same" and $selector eq "sentence")
   then fts:ApplyFTScopeSameSentence($allMatches)
   else if ($type eq "different" and $selector eq "sentence")
      then fts:ApplyFTScopeDifferentSentence($allMatches)
   else if ($type eq "same" and $selector eq "paragraph")
      then fts:ApplyFTScopeSameParagraph($allMatches)
   else fts:ApplyFTScopeDifferentParagraph($allMatches)
};

For example, consider the FTSelection ("Mustang" && "Honda") same paragraph. The source AllMatches is given below.

FTScope input AllMatches

The FTScope returns an empty AllMatches because neither Match contains TokenInfos from a single paragraph.
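The same paragraph condition can be checked mechanically. The following minimal Python sketch mirrors the quantified condition of fts:ApplyFTScopeSameParagraph; it is not the normative definition. The TokenInfo tuple, the function name, and the paragraph numbers are assumptions chosen for illustration (they are not the positions shown in the diagram), and StringExcludes are omitted.

```python
from typing import List, NamedTuple

class TokenInfo(NamedTuple):
    # Hypothetical stand-in for fts:tokenInfo; only paragraph fields are modelled.
    start_para: int
    end_para: int

def same_paragraph(matches: List[List[TokenInfo]]) -> List[List[TokenInfo]]:
    """Keep Matches whose StringIncludes all lie in one paragraph (number > 0)."""
    kept = []
    for includes in matches:
        ok = all(
            a.start_para == b.start_para
            and a.start_para == a.end_para
            and b.start_para == b.end_para
            and a.start_para > 0
            and b.end_para > 0
            for a in includes
            for b in includes
        )
        if ok:
            kept.append(includes)
    return kept

# Illustrative data: each Match in `split` pairs tokens from two paragraphs.
split = [[TokenInfo(1, 1), TokenInfo(2, 2)], [TokenInfo(1, 1), TokenInfo(3, 3)]]
together = [[TokenInfo(2, 2), TokenInfo(2, 2)]]
```

With this data, same_paragraph(split) is empty, while same_paragraph(together) keeps its single Match.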

4.2.2.11 FTContent

The parameters of the ApplyFTContent function are 1) the search context, 2) the match options, 3) the type of the content match (at the start of the current node, at its end, or its entire content), and 4) one AllMatches parameter corresponding to the result of the nested FTSelections. The semantics is given below.

declare function fts:ApplyFTContent (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $type as fts:ContentMatchType,
      $allMatches as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   if ($type eq "entire content") then
      let $temp1 := fts:ApplyFTWordDistanceExactly(
                        $matchOptions,
                        $allMatches,
                        1)
      let $temp2 := fts:ApplyFTContent(
                        $searchContext,
                        $matchOptions,
                        "at start",
                        $temp1)
      let $temp3 := fts:ApplyFTContent(
                        $searchContext,
                        $matchOptions,
                        "at end",
                        $temp2)
      return $temp3
   else
      <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
      {
         for $match in $allMatches/fts:match
         where if ($type eq "at start") then
                  some $si in $match/fts:stringInclude
                  satisfies fts:isStartToken($searchContext,
                                             $si/fts:tokenInfo)
               else (: $type eq "at end" :) 
                  some $si in $match/fts:stringInclude
                  satisfies fts:isEndToken($searchContext,
                                           $si/fts:tokenInfo)
         return $match
      }
      </fts:allMatches>
};

The evaluation of the FTContent function depends on the type of the content match.

  • entire content is evaluated as distance exactly 0 words at start at end, i.e., taken together, the StringIncludes must match every token in the content of the current search context node.

  • at start retains only Matches that contain a StringInclude that matches the first token. This is checked using the semantic function fts:isStartToken.

  • at end retains the Matches that contain a StringInclude that matches the last token. This is checked using the semantic function fts:isEndToken.
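The three cases above can be sketched as follows. This is an illustrative Python model under stated assumptions, not the normative semantics: a StringInclude is reduced to a (startPos, endPos) pair, the hypothetical parameters first and last stand in for what fts:isStartToken and fts:isEndToken test, and the contiguity check assumes that adjacent StringIncludes differ by exactly one position.

```python
from typing import List, Tuple

# Hypothetical model: (startPos, endPos) of one StringInclude.
Span = Tuple[int, int]

def at_start(matches: List[List[Span]], first: int) -> List[List[Span]]:
    # Keep Matches with some StringInclude beginning at the first token position.
    return [m for m in matches if any(s == first for s, _ in m)]

def at_end(matches: List[List[Span]], last: int) -> List[List[Span]]:
    # Keep Matches with some StringInclude ending at the last token position.
    return [m for m in matches if any(e == last for _, e in m)]

def contiguous(m: List[Span]) -> bool:
    # Assumed reading of the distance step: sorted StringIncludes are adjacent.
    ordered = sorted(m)
    return all(b[0] - a[1] == 1 for a, b in zip(ordered, ordered[1:]))

def entire_content(matches: List[List[Span]], first: int, last: int) -> List[List[Span]]:
    # Distance filter first, then "at start", then "at end".
    return at_end(at_start([m for m in matches if contiguous(m)], first), last)

# A node whose tokens occupy positions 1..4 (illustrative data).
matches = [[(1, 2), (3, 4)], [(1, 2)], [(3, 4)], [(1, 1), (4, 4)]]
```

Only the first Match survives entire_content(matches, 1, 4): the second does not reach the last token, the third does not reach the first, and the fourth leaves positions 2 and 3 uncovered.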

4.2.2.12 FTDistance

The parameters of the ApplyFTDistance function are 1) the search context, 2) the list of match options, 3) one AllMatches parameter corresponding to the result of the nested FTSelections, 4) the unit of the distance (tokens, sentences, paragraphs), and 5) the range specified. The search context is not used by this function. The function definitions depend on the distance units and the range specifications.

The semantics of word distance exactly N is given below.

declare function fts:ApplyFTWordDistanceExactly(
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                     order by $si/fts:tokenInfo/@startPos ascending
                     return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $idx in 1 to fn:count($sorted) - 1
            satisfies fts:wordDistance(
                         $sorted[$idx]/fts:tokenInfo,
                         $sorted[$idx+1]/fts:tokenInfo,
                         $matchOptions) = $n 
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:wordDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo,
                                $matchOptions) = $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of word distance at least N is given below.

declare function fts:ApplyFTWordDistanceAtLeast (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                     order by $si/fts:tokenInfo/@startPos ascending
                     return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:wordDistance(
                         $sorted[$index]/fts:tokenInfo,
                         $sorted[$index+1]/fts:tokenInfo,
                         $matchOptions) >= $n 
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:wordDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo,
                                $matchOptions) >= $n
            return $stringExcl
         }
         </fts:match>
   }           
   </fts:allMatches>
};

The semantics of word distance at most N is given below.

declare function fts:ApplyFTWordDistanceAtMost (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                     order by $si/fts:tokenInfo/@startPos ascending
                     return $si
      where
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:wordDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo,
                          $matchOptions) <= $n 
      return 
        <fts:match>
        {
           $match/fts:stringInclude,
           for $stringExcl in $match/fts:stringExclude
           where some $stringIncl in $match/fts:stringInclude
                 satisfies fts:wordDistance(
                               $stringIncl/fts:tokenInfo,
                               $stringExcl/fts:tokenInfo,
                               $matchOptions) <= $n
           return $stringExcl
        }
        </fts:match>
   }
   </fts:allMatches>
};

The semantics of word distance from M to N is given below.

declare function fts:ApplyFTWordDistanceFromTo (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $m as xs:integer,
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                     order by $si/fts:tokenInfo/@startPos ascending
                     return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:wordDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo,
                          $matchOptions) >= $m 
                      and
                      fts:wordDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo,
                          $matchOptions) <= $n 
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:wordDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo,
                                $matchOptions) >= $m
                            and
                            fts:wordDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo,
                                $matchOptions) <= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of sentence distance exactly N is given below.

declare function fts:ApplyFTSentenceDistanceExactly (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                    order by $si/fts:tokenInfo/@startPos ascending
                    return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:sentenceDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo,
                          $matchOptions) = $n 
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:sentenceDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo,
                                $matchOptions) = $n
            return $stringExcl
         }
         </fts:match>
   }           
   </fts:allMatches>
};

The semantics of sentence distance at least N is given below.

declare function fts:ApplyFTSentenceDistanceAtLeast (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                    order by $si/fts:tokenInfo/@startPos ascending
                    return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:sentenceDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo,
                          $matchOptions) >= $n 
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:sentenceDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo,
                                $matchOptions) >= $n
            return $stringExcl
         }
         </fts:match>
   }           
   </fts:allMatches>
};

The semantics of sentence distance at most N is given below.

declare function fts:ApplyFTSentenceDistanceAtMost (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                    order by $si/fts:tokenInfo/@startPos ascending
                    return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:sentenceDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo,
                          $matchOptions) <= $n 
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:sentenceDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo,
                                $matchOptions) <= $n
            return $stringExcl
         }
         </fts:match>
   }           
   </fts:allMatches>
};

The semantics of sentence distance from M to N is given below.

declare function fts:ApplyFTSentenceDistanceFromTo (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $m as xs:integer,
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                    order by $si/fts:tokenInfo/@startPos ascending
                    return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:sentenceDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo,
                          $matchOptions) >= $m 
                      and
                      fts:sentenceDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo,
                          $matchOptions) <= $n 
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:sentenceDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo,
                                $matchOptions) >= $m
                            and
                            fts:sentenceDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo,
                                $matchOptions) <= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of paragraph distance exactly N is given below.

declare function fts:ApplyFTParagraphDistanceExactly (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                    order by $si/fts:tokenInfo/@startPos ascending
                    return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:paraDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo,
                          $matchOptions) = $n 
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:paraDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo,
                                $matchOptions) = $n
            return $stringExcl
         }
         </fts:match>
   }           
   </fts:allMatches>
};

The semantics of paragraph distance at least N is given below.

declare function fts:ApplyFTParagraphDistanceAtLeast (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                    order by $si/fts:tokenInfo/@startPos ascending
                    return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:paraDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo,
                          $matchOptions) >= $n 
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:paraDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo,
                                $matchOptions) >= $n
            return $stringExcl
         }
         </fts:match>
   }           
   </fts:allMatches>
};

The semantics of paragraph distance at most N is given below.

declare function fts:ApplyFTParagraphDistanceAtMost (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                    order by $si/fts:tokenInfo/@startPos ascending
                    return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:paraDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo,
                          $matchOptions) <= $n 
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:paraDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo,
                                $matchOptions) <= $n
            return $stringExcl
         }
         </fts:match>
   }           
   </fts:allMatches>
};

The semantics of paragraph distance from M to N is given below.

declare function fts:ApplyFTParagraphDistanceFromTo (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $m as xs:integer,
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                    order by $si/fts:tokenInfo/@startPos ascending
                    return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:paraDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo,
                          $matchOptions) >= $m 
                      and
                      fts:paraDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo,
                          $matchOptions) <= $n 
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:paraDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo,
                                $matchOptions) >= $m
                            and
                            fts:paraDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo,
                                $matchOptions) <= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The resulting AllMatches contains Matches of the operand that satisfy the condition that the distance for every pair of consecutive StringIncludes is within the specified interval, where the distance is measured in tokens, sentences, or paragraphs from the end of the preceding StringInclude to the start of the next.
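All twelve distance functions share one shape: sort the StringIncludes, then test every consecutive pair against the range. The following Python sketch shows that shape under stated assumptions. The distance function here (start of the later item minus end of the earlier one) is a hypothetical stand-in for fts:wordDistance, fts:sentenceDistance, and fts:paraDistance, which are defined elsewhere; match options and StringExcludes are omitted.

```python
from typing import Callable, List, Tuple

# Hypothetical model: (start, end) in tokens, sentences, or paragraphs.
Span = Tuple[int, int]

def distance(a: Span, b: Span) -> int:
    # Assumed measure: start of the later item minus end of the earlier one.
    return b[0] - a[1]

def apply_distance(matches: List[List[Span]],
                   ok: Callable[[int], bool]) -> List[List[Span]]:
    """Keep Matches whose consecutive StringIncludes all satisfy the range test."""
    kept = []
    for includes in matches:
        ordered = sorted(includes, key=lambda s: s[0])
        # A Match with zero or one StringInclude passes trivially.
        if all(ok(distance(a, b)) for a, b in zip(ordered, ordered[1:])):
            kept.append(includes)
    return kept

# Illustrative data.
matches = [[(1, 2), (5, 5)], [(1, 2), (9, 9)], [(7, 7)]]
at_most_3 = apply_distance(matches, lambda d: d <= 3)
from_4_to_8 = apply_distance(matches, lambda d: 4 <= d <= 8)
```

Passing the range as a predicate is a design choice of this sketch; the XQuery definitions instead provide one function per unit and range type.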

In the general case, the semantics is given below.

declare function fts:ApplyFTDistance (
      $matchOptions as element(fts:matchOptions),
      $type as fts:DistanceType,
      $range as element(fts:range),
      $allMatches as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   if ($type eq "word") then
      if ($range/@type eq "exactly") then
         fts:ApplyFTWordDistanceExactly($matchOptions, 
                                        $allMatches, 
                                        $range/@n)
      else if ($range/@type eq "at least") then 
         fts:ApplyFTWordDistanceAtLeast($matchOptions, 
                                        $allMatches, 
                                        $range/@n)
      else if ($range/@type eq "at most") then
         fts:ApplyFTWordDistanceAtMost($matchOptions, 
                                       $allMatches,
                                       $range/@n)
      else fts:ApplyFTWordDistanceFromTo($matchOptions, 
                                         $allMatches, 
                                         $range/@m,
                                         $range/@n)
   else if ($type eq "sentence") then
      if ($range/@type eq "exactly") then
         fts:ApplyFTSentenceDistanceExactly($matchOptions, 
                                            $allMatches, 
                                            $range/@n)
      else if ($range/@type eq "at least") then
         fts:ApplyFTSentenceDistanceAtLeast($matchOptions, 
                                            $allMatches, 
                                            $range/@n)
      else if ($range/@type eq "at most") then
         fts:ApplyFTSentenceDistanceAtMost($matchOptions, 
                                           $allMatches, 
                                           $range/@n)
      else fts:ApplyFTSentenceDistanceFromTo($matchOptions, 
                                             $allMatches, 
                                             $range/@m,
                                             $range/@n)
   else if ($range/@type eq "exactly") then
      fts:ApplyFTParagraphDistanceExactly($matchOptions, 
                                          $allMatches, 
                                          $range/@n)
   else if ($range/@type eq "at least") then
      fts:ApplyFTParagraphDistanceAtLeast($matchOptions, 
                                          $allMatches, 
                                          $range/@n)
   else if ($range/@type eq "at most") then
      fts:ApplyFTParagraphDistanceAtMost($matchOptions, 
                                         $allMatches, 
                                         $range/@n)
   else fts:ApplyFTParagraphDistanceFromTo($matchOptions, 
                                           $allMatches, 
                                           $range/@m,
                                           $range/@n)
};

For example, consider the FTDistance selection ("Ford Mustang" && "excellent") distance at most 3 words. The Matches of the source AllMatches for ("Ford Mustang" && "excellent") are given below.

FTDistance input AllMatches (figure in six parts)

The result for the FTDistance selection consists of only the first Match (with positions 1, 2, and 5), because only for this Match the word distance between consecutive TokenInfos is always less than or equal to 3. It is 1 for the first pair and 3 for the second.

4.2.2.13 FTWindow

The parameters of the ApplyFTWindow function are 1) the search context, 2) the list of match options, 3) the unit of type fts:DistanceType, 4) a size, and 5) one AllMatches parameter corresponding to the result of the nested FTSelections. The search context is not used by this function. For each unit type a function is defined as follows.

The semantics of window N words is given below.

declare function fts:ApplyFTWordWindow (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $minpos := fn:min($match/fts:stringInclude/fts:tokenInfo/@startPos),
          $maxpos := fn:max($match/fts:stringInclude/fts:tokenInfo/@endPos)
      for $windowStartPos in ($maxpos - $n + 1 to $minpos)
      let $windowEndPos := $windowStartPos + $n - 1
      where fn:min($match/fts:stringInclude/fts:tokenInfo/@startPos) 
            >= $windowStartPos
            and fn:max($match/fts:stringInclude/fts:tokenInfo/@endPos) 
                <= $windowEndPos
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExclude in $match/fts:stringExclude
            where $stringExclude/fts:tokenInfo/@startPos >= $windowStartPos
                  and $stringExclude/fts:tokenInfo/@endPos <= $windowEndPos
            return $stringExclude
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of window N sentences is given below.

declare function fts:ApplyFTSentenceWindow (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $minpos := fn:min($match/fts:stringInclude/fts:tokenInfo/@startSent),
          $maxpos := fn:max($match/fts:stringInclude/fts:tokenInfo/@endSent)
      for $windowStartPos in ($maxpos - $n + 1 to $minpos)
      let $windowEndPos := $windowStartPos + $n - 1
      where fn:min($match/fts:stringInclude/fts:tokenInfo/@startSent) 
            >= $windowStartPos
            and fn:max($match/fts:stringInclude/fts:tokenInfo/@endSent) 
                <= $windowEndPos
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExclude in $match/fts:stringExclude
            where $stringExclude/fts:tokenInfo/@startSent >= $windowStartPos
                  and $stringExclude/fts:tokenInfo/@endSent <= $windowEndPos
            return $stringExclude
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of window N paragraphs is given below.

declare function fts:ApplyFTParagraphWindow (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $minpos := fn:min($match/fts:stringInclude/fts:tokenInfo/@startPara),
          $maxpos := fn:max($match/fts:stringInclude/fts:tokenInfo/@endPara)
      for $windowStartPos in ($maxpos - $n + 1 to $minpos)
      let $windowEndPos := $windowStartPos + $n - 1
      where fn:min($match/fts:stringInclude/fts:tokenInfo/@startPara) 
            >= $windowStartPos
            and fn:max($match/fts:stringInclude/fts:tokenInfo/@endPara) 
                <= $windowEndPos
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExclude in $match/fts:stringExclude
            where $stringExclude/fts:tokenInfo/@startPara >= $windowStartPos
                  and $stringExclude/fts:tokenInfo/@endPara <= $windowEndPos
            return $stringExclude
         }
         </fts:match>
   }
   </fts:allMatches>
};

The resulting AllMatches contains Matches of the operand that satisfy the condition that there exists a sequence of the specified number of consecutive (token, sentence, or paragraph) positions, such that all StringIncludes are within that window, and the StringExcludes retained are also within that window.
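The window condition reduces to a simple test on the extent covered by the StringIncludes. The following minimal Python sketch illustrates only that membership test, under stated assumptions: a StringInclude is a hypothetical (start, end) pair, StringExcludes are omitted, and, unlike the XQuery functions, the sketch does not emit one Match per candidate window position.

```python
from typing import List, Tuple

# Hypothetical model: (start, end) in tokens, sentences, or paragraphs.
Span = Tuple[int, int]

def fits_window(includes: List[Span], n: int) -> bool:
    """True when all StringIncludes fit in some run of n consecutive positions."""
    if not includes:
        return False  # assumption of this sketch: nothing to place in a window
    lo = min(s for s, _ in includes)
    hi = max(e for _, e in includes)
    # Such a run exists exactly when the covered extent is at most n long.
    return hi - lo + 1 <= n

def apply_window(matches: List[List[Span]], n: int) -> List[List[Span]]:
    return [m for m in matches if fits_window(m, n)]

# Illustrative data: the first Match spans 5 positions, the second spans 30.
matches = [[(1, 2), (5, 5)], [(1, 2), (30, 30)]]
within_10 = apply_window(matches, 10)
```

A Match spanning positions 1 to 5 therefore fits any window of 5 or more positions and fails a window of 4.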

The semantics for the general function is given below.

declare function fts:ApplyFTWindow (
      $matchOptions as element(fts:matchOptions),
      $type as fts:DistanceType,
      $size as xs:integer,
      $allMatches as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   if ($type eq "word") then
      fts:ApplyFTWordWindow($matchOptions, 
                            $allMatches, 
                            $size)
   else if ($type eq "sentence") then 
      fts:ApplyFTSentenceWindow($matchOptions, 
                                $allMatches, 
                                $size)
   else
      fts:ApplyFTParagraphWindow($matchOptions, 
                                 $allMatches, 
                                 $size)
};

For example, consider the FTWindow selection ("Ford Mustang" && "excellent") window 10 words. The Matches of the source AllMatches for ("Ford Mustang" && "excellent") are given below.

FTWindow AllMatches

The result for the FTWindow selection consists of only the first, the fifth, and the sixth Matches because their respective window sizes are 5, 4, and 9.

4.2.2.14 FTTimes

The parameters of the ApplyFTTimes function are 1) an FTRange specification, and 2) a parameter corresponding to the result of the nested FTWords.

The function definitions depend on the range specification FTRange to limit the number of occurrences.

The general semantics is given below.

declare function fts:FormCombinations (
      $sms as element(fts:match)*, 
      $times as xs:integer ) 
   as element(fts:match)*
{
   if ( $times eq 1 ) then $sms
   else if (fn:count($sms) lt $times) then ()
   else if (fn:count($sms) eq $times) then 
      <fts:match>{$sms/*}</fts:match> 
   else (
      fts:FormCombinations(fn:subsequence($sms, 2), $times),
      for $combination in 
         fts:FormCombinations(fn:subsequence($sms, 2), $times - 1)
      return 
      <fts:match>
      {
         $sms[1]/*,
         $combination/*
      }
      </fts:match>
   )
};

declare function fts:FormRange (
      $sms as element(fts:match)*, 
      $l as xs:integer, 
      $u as xs:integer, 
      $stokenNum as xs:integer ) 
   as element(fts:allMatches)
{
   if ($l > $u) then <fts:allMatches stokenNum="{$stokenNum}"/>
   else 
      let $am1 := <fts:allMatches stokenNum="{$stokenNum}">
                     {fts:FormCombinations($sms, $l)}
                  </fts:allMatches>
      let $am2 := <fts:allMatches stokenNum="{$stokenNum}">
                     {fts:FormCombinations($sms, $u+1)}
                  </fts:allMatches>
      return fts:ApplyFTAnd($am1,
                            fts:ApplyFTUnaryNot($am2))
};
            

The semantics of occurs exactly N times is given below.

declare function fts:ApplyFTTimesExactly (
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   fts:FormRange($allMatches/fts:match, $n, $n, $allMatches/@stokenNum)
};

The semantics of occurs at least N times is given below.

declare function fts:ApplyFTTimesAtLeast (
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}"> 
   {fts:FormCombinations($allMatches/fts:match, $n)} 
   </fts:allMatches>
};

The semantics of occurs at most N times is given below.

declare function fts:ApplyFTTimesAtMost (
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   fts:FormRange($allMatches/fts:match, 0, $n, $allMatches/@stokenNum)
};

The semantics of occurs from M to N times is given below.

declare function fts:ApplyFTTimesFromTo (
      $allMatches as element(fts:allMatches),
      $m as xs:integer,
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   fts:FormRange($allMatches/fts:match, $m, $n, $allMatches/@stokenNum)  
};

The way to ensure that there are at least N different matches of an FTSelection is to ensure that at least N of its Matches occur simultaneously. This is similar to forming their conjunction by combining N distinct Matches into one simple match. Therefore, the AllMatches for the selection condition specifying the range qualifier at least N contains the possible combinations of N simple matches of the operand and one Match for each combination negating the rest of the simple matches. This operation is performed in the function fts:FormCombinations.

The range [l, u] is represented by the condition at least l and not at least u+1. This transformation is performed in the function fts:FormRange.
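
The combination step can be illustrated with a short Python sketch. It is a simplification under stated assumptions: a Match is modelled as a plain list of StringIncludes, the names form_combinations and occurs_in_range are hypothetical, and the range check is reduced to a boolean instead of building an AllMatches with fts:ApplyFTAnd and fts:ApplyFTUnaryNot. The upper bound is tested with u + 1, as in fts:FormRange. The order of the combinations is not meant to mirror the recursive XQuery definition.

```python
from itertools import combinations
from typing import List

Match = List[str]  # hypothetical: a Match as a list of StringIncludes


def form_combinations(matches: List[Match], times: int) -> List[Match]:
    """Merge every choice of `times` distinct Matches into one Match."""
    return [
        [item for match in combo for item in match]
        for combo in combinations(matches, times)
    ]


def occurs_in_range(matches: List[Match], l: int, u: int) -> bool:
    """True if the operand has at least l and not at least u + 1 Matches.

    Assumes 0 <= l; itertools.combinations rejects negative sizes.
    """
    at_least_l = len(form_combinations(matches, l)) > 0
    at_least_u_plus_1 = len(form_combinations(matches, u + 1)) > 0
    return at_least_l and not at_least_u_plus_1


# Three occurrences of a search token yield three pairs for "at least 2".
occurrences = [["m1"], ["m2"], ["m3"]]
print(form_combinations(occurrences, 2))
```

Choosing zero Matches yields one empty combination, so a lower bound of 0 is always satisfied in this sketch.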

The semantics for the general case is given below.

declare function fts:ApplyFTTimes (
      $range as element(fts:range),
      $allMatches as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   if (fn:count($allMatches//fts:stringExclude) gt 0) then
      fn:error(fn:QName('http://www.w3.org/2005/xqt-errors', 'err:XPST0003'))
   else if ($range/@type eq "exactly") then
      fts:ApplyFTTimesExactly($allMatches, $range/@n)
   else if ($range/@type eq "at least") then 
      fts:ApplyFTTimesAtLeast($allMatches, $range/@n)
   else if ($range/@type eq "at most") then
      fts:ApplyFTTimesAtMost($allMatches, $range/@n)
   else fts:ApplyFTTimesFromTo($allMatches, 
                               $range/@m, 
                               $range/@n)
};

The above function performs a sanity check to ensure that the nested AllMatches is a result of the evaluation of FTWords as defined in the grammar.

[149]    FTWordsSelection    ::=    (FTWords FTTimes?) | ("(" FTSelection ")")

Otherwise, an error [err:XPST0003] is raised.

For example, consider the FTTimes selection "Mustang" occurs at least 2 times. The source AllMatches of the FTWords selection "Mustang" is given below.

FTTimes input AllMatches

The result consists of the pairs of the Matches.

FTTimes result AllMatches

4.2.3 Match Options Semantics

4.2.3.1 Types

XQuery 1.0 functions are used to define the semantics of FTMatchOptions. These functions operate on an XML representation of the FTMatchOptions. The representation closely follows the syntax. Each FTMatchOption is represented by an XML element. Additional characteristics of the match option are represented as attributes. The schema is given below.

<xs:schema 
    xmlns:xs="http://www.w3.org/2001/XMLSchema" 
    xmlns:fts="http://www.w3.org/2006/xquery-full-text"
    targetNamespace="http://www.w3.org/2006/xquery-full-text"
    elementFormDefault="qualified" 
    attributeFormDefault="unqualified">

  <xs:complexType name="FTMatchOptions">
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
       <xs:element ref="fts:matchOption"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="matchOptions" type="fts:FTMatchOptions"/>

  <xs:element name="matchOption" abstract="true" type="fts:FTMatchOption"/>

  <xs:complexType name="FTMatchOption">
  </xs:complexType>
 
  <xs:element name="case" substitutionGroup="fts:matchOption"
              type="fts:FTCaseOption" />
  <xs:element name="diacritics" substitutionGroup="fts:matchOption"
              type="fts:FTDiacriticsOption" />
  <xs:element name="thesaurus" substitutionGroup="fts:matchOption"
              type="fts:FTThesaurusOption" />
  <xs:element name="stem" substitutionGroup="fts:matchOption"
                type="fts:FTStemOption" />
  <xs:element name="wildcard" substitutionGroup="fts:matchOption"
              type="fts:FTWildCardOption" />
  <xs:element name="language" substitutionGroup="fts:matchOption"
              type="fts:FTLanguageOption" />
  <xs:element name="stopWord" substitutionGroup="fts:matchOption"
              type="fts:FTStopwordOption" /> 

 <xs:complexType name="FTCaseOption">
   <xs:complexContent>
     <xs:extension base="fts:FTMatchOption">
       <xs:attribute name="caseIndicator">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="insensitive"/>
              <xs:enumeration value="sensitive"/>
              <xs:enumeration value="lowercase"/>
              <xs:enumeration value="uppercase"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="language" type="xs:string"/>
     </xs:extension>
   </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="FTDiacriticsOption">
   <xs:complexContent>
     <xs:extension base="fts:FTMatchOption">
        <xs:attribute name="diacriticsIndicator">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="insensitive"/>
              <xs:enumeration value="sensitive"/>
              <xs:enumeration value="with"/>
              <xs:enumeration value="without"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="language" type="xs:string"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
           
  <xs:complexType name="FTThesaurusOption">
   <xs:complexContent>
     <xs:extension base="fts:FTMatchOption">
        <xs:sequence>
          <xs:element name="thesaurusName" type="xs:string" 
                      minOccurs="0" maxOccurs="1"/>
          <xs:element name="relationship" type="xs:string" 
                      minOccurs="0" maxOccurs="1"/>
          <xs:element name="range" type="fts:FTRangeSpec" 
                      minOccurs="0" maxOccurs="1"/>
        </xs:sequence>
        <xs:attribute name="thesaurusIndicator">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="with"/>
              <xs:enumeration value="without"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="language" type="xs:string"/>
     </xs:extension>
   </xs:complexContent>
  </xs:complexType>
 
  <xs:complexType name="FTRangeSpec">
    <xs:attribute name="type" 
                  type="fts:RangeSpecType" 
                  use="required"/>
    <xs:attribute name="m" 
                  type="xs:integer"/>
    <xs:attribute name="n" 
                  type="xs:integer" 
                  use="required"/>
  </xs:complexType>
  
  <xs:simpleType name="RangeSpecType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="exactly"/>
      <xs:enumeration value="at least"/>
      <xs:enumeration value="at most"/>
      <xs:enumeration value="from to"/>
    </xs:restriction>
  </xs:simpleType>
  
  <xs:complexType name="FTStemOption">
   <xs:complexContent>
     <xs:extension base="fts:FTMatchOption">
        <xs:attribute name="stemIndicator">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="with"/>
              <xs:enumeration value="without"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="language" type="xs:string"/>
     </xs:extension>
   </xs:complexContent>
  </xs:complexType>
 
  <xs:complexType name="FTWildCardOption">
   <xs:complexContent>
     <xs:extension base="fts:FTMatchOption">
        <xs:attribute name="wildcardIndicator">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="with"/>
              <xs:enumeration value="without"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
        <xs:attribute name="language" type="xs:string"/>
     </xs:extension>
   </xs:complexContent>
  </xs:complexType>
 
  <xs:complexType name="FTLanguageOption">
   <xs:complexContent>
     <xs:extension base="fts:FTMatchOption">
        <xs:attribute name="languageName" type="xs:string"/>
     </xs:extension>
   </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="FTStopwordOption">
   <xs:complexContent>
     <xs:extension base="fts:FTMatchOption">
        <xs:sequence>
          <xs:choice>
            <xs:element name="default-stopwords">
                <xs:complexType />
            </xs:element>
            <xs:element name="stop-word" type="xs:string" />
            <xs:element name="uri" type="xs:anyURI" />
          </xs:choice>
          <xs:element name="oper" minOccurs="0" maxOccurs="unbounded">
            <xs:complexType>
              <xs:choice>
                <xs:element name="stop-word" type="xs:string" />
                <xs:element name="uri" type="xs:anyURI" />
              </xs:choice>
              <xs:attribute name="type">
                <xs:simpleType>
                  <xs:restriction base="xs:string">
                    <xs:enumeration value="union"/>
                    <xs:enumeration value="except"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:attribute>
            </xs:complexType>
          </xs:element>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
 
</xs:schema>

4.2.3.2 High-Level Semantics

The previous section described FTSelections without giving any details about how FTMatchOptions need to be interpreted. All processing of FTMatchOptions was delegated to the function matchTokenInfos, which is implementation-defined. In this section, further details on the semantics of FTMatchOptions are given.

The extension is achieved by modifying an existing function and adding functions that are specific to the FTMatchOptions.

Modifications in the semantics of existing functions

The semantics of most of the FTSelections remains unmodified. The modifications are to the method for matching a sequence of search tokens.

declare function fts:applySearchTokensAsPhrase (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $searchTokens as element(fts:searchToken)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   let $thesaurusOption := $matchOptions/fts:thesaurus[1]
   return 
      if ($thesaurusOption and 
          $thesaurusOption/@thesaurusIndicator eq "with") then
         let $noThesaurusOptions := 
            <fts:matchOptions>{
               $matchOptions/*[fn:not(self::fts:thesaurus)]
            }</fts:matchOptions>
         let $lookupRes := fts:applyThesaurusOption($thesaurusOption,
                                                    $searchTokens)
         return fts:ApplyFTWordsAny($searchContext,
                                    $noThesaurusOptions,
                                    $lookupRes,
                                    $queryPos)
      else
         (: from here on we have a single sequence of search tokens :)
         (: which is to be matched as a phrase; no alternatives anymore :)
         <fts:allMatches stokenNum="{$queryPos}"> 
         {
            for $pos in
               fts:matchTokenInfos( 
                  $searchContext,
                  fts:reduceMatchOptions($matchOptions),
                  fts:applyStopwordOption($matchOptions/fts:stopWord),
                  $searchTokens )
            return  
               <fts:match>  
                  <fts:stringInclude queryPos="{$queryPos}"> 
                  {$pos}
                  </fts:stringInclude> 
               </fts:match>
         } 
         </fts:allMatches> 
};

Two FTMatchOptions need to be processed differently than the rest of the FTMatchOptions as shown in the function above.

  • Unlike all other FTMatchOptions, the semantics of the FTThesaurusOption cannot be formulated as an operation on individual search tokens, because a thesaurus lookup may return alternative search items for a whole phrase, i.e., a sequence of search tokens. Since the result of a thesaurus lookup is a sequence of alternatives, there must be a higher level of processing. The above call to applyThesaurusOption returns, for the given sequence of search tokens (representing a phrase), all thesaurus expansions for the selected thesaurus, relationship, and level range as a sequence of search items. The alternative expansions are evaluated as a disjunction using fts:ApplyFTWordsAny. The matching of the alternatives is performed with FTThesaurusOption turned off to avoid double expansions, i.e., expansion of an already expanded token.

  • For the semantics of the FTStopWordOption the list of stop words needs to be computed as demanded by the special syntax for stop word lists involving the operators "union" and "except".

Semantics of new FTMatchOptions functions

The extension also adds functions that are specific to the FTMatchOptions.

The function applySearchTokensAsPhrase above calls the function reduceMatchOptions, which is defined here.

declare function fts:reduceMatchOptions (
      $matchOptions as element(fts:matchOptions) )
   as element(fts:matchOptions)
{
   <fts:matchOptions>
   {
      let $new-match-options := 
         $matchOptions/(fts:stopWord|
                        fts:case|
                        fts:diacritics|
                        fts:stem|
                        fts:language|
                        fts:wildcard)
      for $match-option at $index in $new-match-options
      where 
         fn:not(
            some $other in $new-match-options[fn:position() lt $index]
            satisfies fn:node-name($other) = fn:node-name($match-option)
         )
      return $match-option
   }
   </fts:matchOptions>
};

This function determines how match options of the same kind overwrite each other, so that only one option of the same kind remains.
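
The overwrite rule can be sketched in Python as follows. The representation of a match option as a (kind, value) pair and the name reduce_match_options are assumptions of this sketch. As in the XQuery function, thesaurus options are dropped and, for every other kind, only the first option in document order is kept.

```python
from typing import List, Tuple

Option = Tuple[str, str]  # hypothetical: (element name, indicator value)

# The option kinds selected by fts:reduceMatchOptions; thesaurus is excluded.
KEPT_KINDS = {"stopWord", "case", "diacritics", "stem", "language", "wildcard"}


def reduce_match_options(options: List[Option]) -> List[Option]:
    """Keep the first option of each kept kind, preserving document order."""
    seen = set()
    result = []
    for kind, value in options:
        if kind in KEPT_KINDS and kind not in seen:
            seen.add(kind)
            result.append((kind, value))
    return result


options = [
    ("case", "lowercase"),
    ("thesaurus", "with"),
    ("stem", "with"),
    ("case", "sensitive"),  # dropped: an earlier "case" option exists
]
print(reduce_match_options(options))
```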

The details of the semantics of the remaining FTMatchOptions are determined by the implementation-defined function matchTokenInfos.

4.2.3.3 Formal Semantics Functions

FTMatchOption functions which are necessary to support match option processing are given below.

declare function fts:resolveStopwordsUri ( $uri as xs:string? ) 
   as xs:string* external;

declare function fts:lookupThesaurus (
      $tokens as element(fts:searchToken)*,
      $thesaurusName as xs:string?, 
      $thesaurusLanguage as xs:string?,
      $relationship as xs:string?,
      $range as element(fts:range)? ) 
   as element(fts:searchItem)* external;

The function resolveStopwordsUri is used to resolve any URI to a sequence of strings to be used as stop words.

The function lookupThesaurus finds all expansions related to $tokens in the thesaurus $thesaurusName for the language $thesaurusLanguage using the relationship $relationship within the optional number of levels $range. If $tokens consists of more than one search token, it is regarded as a phrase.

The thesaurus function returns a sequence of expansion alternatives. Each alternative is regarded as a new search phrase and is represented as a search item. Alternatives are treated as though they are connected with a disjunction (FTOr).
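
The following Python sketch illustrates how expansion alternatives behave like a disjunction. Everything in it is hypothetical: the token list, the toy lookup function (which here also returns the original phrase), and the names match_phrase and match_with_thesaurus. Each alternative is matched as a phrase, and the resulting matches are simply collected together.

```python
from typing import Callable, List, Tuple

Phrase = List[str]
Span = Tuple[int, int]  # 1-based (start, end) token positions, inclusive


def match_phrase(tokens: List[str], phrase: Phrase) -> List[Span]:
    """Find every occurrence of `phrase` as consecutive tokens."""
    n = len(phrase)
    if n == 0:
        return []
    return [
        (i + 1, i + n)
        for i in range(len(tokens) - n + 1)
        if tokens[i:i + n] == phrase
    ]


def match_with_thesaurus(
    tokens: List[str],
    phrase: Phrase,
    lookup: Callable[[Phrase], List[Phrase]],
) -> List[Span]:
    """Match each alternative returned by `lookup` and union the results."""
    matches: List[Span] = []
    for alternative in lookup(phrase):
        matches.extend(match_phrase(tokens, alternative))
    return matches


def toy_lookup(phrase: Phrase) -> List[Phrase]:
    # Hypothetical thesaurus: "car" expands to itself and "automobile".
    return [phrase, ["automobile"]] if phrase == ["car"] else [phrase]


tokens = "the car is a great automobile".split()
print(match_with_thesaurus(tokens, ["car"], toy_lookup))
```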

4.2.3.4 FTCaseOption

FTMatchOptions of type FTCaseOption are passed in the $matchOptions parameter to matchTokenInfos. If the FTCaseOption is "lowercase" ["uppercase"], the returned TokenInfos must span only tokens that are all lowercase [uppercase]. If the FTCaseOption is "case insensitive" ["case sensitive"], the function must return all TokenInfos matching the search tokens when disregarding character case [that also accord with the search tokens in character case].
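
One possible reading of these four cases is sketched below in Python for a single pair of tokens. The function case_matches is hypothetical; the actual behavior is determined by the implementation-defined function matchTokenInfos. The sketch assumes that "lowercase" and "uppercase" compare letters without regard to case and then require the document token to be entirely in the given case, and it uses Python's simple str.lower and str.upper conversions rather than full Unicode case folding.

```python
def case_matches(search_token: str, doc_token: str, option: str) -> bool:
    """Decide whether doc_token matches search_token under a case option."""
    same_letters = search_token.lower() == doc_token.lower()
    if option == "insensitive":
        return same_letters
    if option == "sensitive":
        return search_token == doc_token
    if option == "lowercase":
        return same_letters and doc_token == doc_token.lower()
    if option == "uppercase":
        return same_letters and doc_token == doc_token.upper()
    raise ValueError(f"unknown case option: {option}")


for option in ("insensitive", "sensitive", "lowercase", "uppercase"):
    print(option, case_matches("Mustang", "mustang", option))
```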

4.2.3.5 FTDiacriticsOption

FTMatchOptions of type FTDiacriticsOption are passed in the $matchOptions parameter to matchTokenInfos. If the FTDiacriticsOption is "with diacritics" ["without diacritics"] the returned TokenInfos must span only tokens that have at least one character containing some diacritical mark [don't have any character containing a diacritical mark] as defined by Unicode. If the FTDiacriticsOption is "diacritics insensitive" ["diacritics sensitive"] the function must return all TokenInfos matching the search tokens when disregarding diacritical marks [that also accord with the search tokens in diacritical marks].

4.2.3.6 FTStemOption

FTMatchOptions of type FTStemOption are passed in the $matchOptions parameter to matchTokenInfos. The effect of the option "with stemming" on matching tokens is implementation-defined; however, this option is expected to allow matching linguistic variants of the search tokens. If the FTStemOption is "without stemming", the returned TokenInfos must span exact matches (i.e., not including linguistic variations) of the search tokens.

4.2.3.7 FTThesaurusOption

The semantics for the FTThesaurusOption is given below.

declare function fts:applyThesaurusOption (
      $matchOption as element(fts:matchOption),
      $searchTokens as element(fts:searchToken)* )
   as element(fts:searchItem)*
{
   if ($matchOption/@thesaurusIndicator = "with") then
      fts:lookupThesaurus( $searchTokens,
                           $matchOption/fts:thesaurusName,
                           $matchOption/@language,
                           $matchOption/fts:relationship,
                           $matchOption/fts:range )
   else if ($matchOption/@thesaurusIndicator = "without") then
      <fts:searchItem>
      {$searchTokens}
      </fts:searchItem>
   else ()
};

4.2.3.8 FTStopWordOption

Stop words interact with FTDistance and FTWindow. The semantics for the FTStopWordOption is given below.

declare function fts:applyStopwordOption (
      $stopwordOption as element(fts:stopWord)? )
   as xs:string*
{
   if ($stopwordOption) then
      let $swords := 
         typeswitch ($stopwordOption/*[1])
            case $e as element(fts:stop-word) 
               return $e/text()
            case $e as element(fts:uri) 
               return fts:resolveStopwordsUri($e/text())
            case element(fts:default-stopwords)
               return fts:resolveStopwordsUri(())
            default return ()
      return fts:calcStopwords( $swords, $stopwordOption/fts:oper )
   else ()
};

declare function fts:calcStopwords ( 
      $stopWords as xs:string*,
      $opers as element(fts:oper)* )
   as xs:string*
{
   if ( fn:empty($opers) ) then $stopWords
   else
      let $swords := 
         typeswitch ($opers[1]/*[1])
            case $e as element(fts:stop-word) 
               return $e/text()
            case $e as element(fts:uri) 
               return fts:resolveStopwordsUri($e/text())
            default return ()
      return
         (: apply the first operator, then recurse on the remaining ones :)
         if ($opers[1]/@type eq "union") then
            fts:calcStopwords( ($stopWords, $swords), 
                               $opers[fn:position() gt 1] )
         else (: "except" :)
            fts:calcStopwords( $stopWords[fn:not(. = $swords)],
                               $opers[fn:position() gt 1] )
};
            

The stop words set is computed using the fts:calcStopwords function. The function uses the function fts:resolveStopwordsUri to resolve any URI to a sequence of strings. Then, the stop words are removed from the set of search tokens.
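
The computation of the stop word list can be sketched in Python as follows. The tuple representation of the operators, the resolve_uri callback (a stand-in for fts:resolveStopwordsUri), and the example URI are assumptions of this sketch. It also assumes that every "union" and "except" operator is applied in order, one after the other.

```python
from typing import Callable, List, Tuple

# Hypothetical operator: (type, kind, value), where type is "union" or
# "except" and kind is "stop-word" or "uri".
Oper = Tuple[str, str, str]


def calc_stopwords(
    stop_words: List[str],
    opers: List[Oper],
    resolve_uri: Callable[[str], List[str]] = lambda uri: [],
) -> List[str]:
    """Apply the union/except operators to the initial stop word list."""
    for op_type, kind, value in opers:
        words = [value] if kind == "stop-word" else resolve_uri(value)
        if op_type == "union":
            stop_words = stop_words + words
        else:  # "except"
            stop_words = [w for w in stop_words if w not in words]
    return stop_words


# Hypothetical stop word list published at an example URI.
lists = {"http://example.org/stopwords": ["in", "on"]}
result = calc_stopwords(
    ["a", "the", "of"],
    [("union", "uri", "http://example.org/stopwords"),
     ("except", "stop-word", "of")],
    lambda uri: lists.get(uri, []),
)
print(result)
```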

4.2.3.9 FTLanguageOption

The FTLanguageOption is not associated with a semantics function. It is just a parameter to other semantics functions.

4.2.3.10 FTWildCardOption

FTMatchOptions of type FTWildCardOption are passed in the $matchOptions parameter to matchTokenInfos. If the FTWildCardOption is "with wildcards" the function must return all TokenInfos in the search context that span token occurrences, such that those token occurrences are wildcard expansions of the corresponding search token. The wildcard expansions are described in Section 3.2.7 FTWildCardOption. If the FTWildCardOption is "without wildcards" all search tokens must be matched literally.

4.3 XQuery 1.0 and XPath 2.0 Full-Text and Scoring Expressions

4.3.1 FTContainsExpr

The FTContainsExpr function defines the semantics of FTContainsExpr. The function takes the following parameters: 1) a search context consisting of a sequence of nodes (which is the result of an XQuery 1.0 and XPath 2.0 expression) and 2) an AllMatches corresponding to an FTSelection. The function returns an xs:boolean atomic value. This value is true if and only if some node in the search context satisfies the full-text condition given by the FTSelection. Since FTContainsExpr returns results in XDM (a sequence of items), it may be treated like XQuery 1.0 expressions and may be fully composed with other XQuery 1.0 expressions. In addition, since the FTContainsExpr function maps AllMatches to a sequence of items, it provides semantics for mapping from AllMatches to XDM.

4.3.1.1 Semantics of FTContainsExpr

Consider an FTContainsExpr expression of the form EvaluationContext ftcontains FTSelection, where EvaluationContext is an XQuery 1.0 expression that returns a sequence of nodes and FTSelection is an FTSelection that returns AllMatches. The FTContainsExpr returns true if and only if some node in the result of EvaluationContext satisfies the AllMatches returned by FTSelection.

If the FTContainsExpr is of the form EvaluationContext ftcontains FTSelection without content IgnoreExpr for some XQuery 1.0 expression IgnoreExpr, then the following helper function is required.

declare function fts:reconstruct (
      $n as item(), 
      $ignore as node()* ) 
   as item()? 
{
   typeswitch ($n)
     case node() return
        if (some $i in $ignore satisfies $n is $i) then () 
        else if ($n instance of element()) then
           let $nodeName := fn:node-name($n)
           let $nodeContent := for $nn in $n/node()
                               return fts:reconstruct($nn,$ignore)
           return element {$nodeName} {$nodeContent}
        else $n
     default return $n
};
            

In the general case, the XQuery 1.0 and XPath 2.0 FTContainsExpr function takes four parameters.

  1. The sequence of items returned by EvaluationContext;

  2. The XML node representation of FTSelection;

  3. The sequence of nodes returned by IgnoreExpr, if that expression is present, or the empty sequence otherwise; and

  4. The XML representation of the set of default values for each of the FTMatchOptions as given by the static context.

The FTContainsExpr function returns true if and only if the corresponding FTContainsExpr returns true, and thus specifies the semantics of FTContainsExpr. Note that by using XQuery 1.0 and XPath 2.0 to specify the formal semantics, we avoid the need to introduce new formalism. We simply reuse the formal semantics of XQuery 1.0 and XPath 2.0.

declare function fts:FTContainsExpr (
      $searchContext as item()*,
      $ftSelection as element(*,fts:FTSelection),
      $ignoreNodes as node()*,
      $defOptions as element(fts:matchOptions) )
   as xs:boolean 
{ 
   some $node in $searchContext
   satisfies 
      let $newNode := fts:reconstruct( $node, $ignoreNodes )
      return
         if (fn:empty($newNode)) then fn:false()
         else
            let $allMatches := fts:evaluate($ftSelection,
                                            $newNode,
                                            $defOptions,
                                            0)
            return 
               some $match in $allMatches/fts:match
               satisfies 
                  fn:count($match/fts:stringExclude) eq 0
};
            

The FTContainsExpr function returns true if and only if the AllMatches that is the result of the application of the FTSelection for some node in the search context contains a Match with no StringExcludes. In other words, there is a set of TokenInfos in that node which satisfy the condition of the FTSelection. If an FTIgnoreOption has been specified in the FTContainsExpr, then each node in $ignoreNodes that is part of the tree of a node in the search context is pruned from that tree using the function reconstruct before that node is passed to fts:evaluate.
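
The pruning and the final test can be illustrated with the following Python sketch. It is not the formal definition: the Element class, the representation of a Match as a pair (StringIncludes, StringExcludes), and the toy_evaluate function standing in for fts:evaluate on the selection "great" && ! "rust" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(eq=False)  # eq=False: nodes are compared by identity, like "is"
class Element:
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


def reconstruct(node, ignore):
    """Copy a node, dropping every subtree whose root is in `ignore`."""
    if any(node is i for i in ignore):
        return None
    if isinstance(node, Element):
        kids = [reconstruct(c, ignore) for c in node.children]
        return Element(node.name, [k for k in kids if k is not None])
    return node  # text is returned unchanged


def ft_contains(search_context, evaluate, ignore=()):
    """True if some node yields a Match that has no StringExcludes."""
    for node in search_context:
        new_node = reconstruct(node, ignore)
        if new_node is None:
            continue
        if any(not excludes for _includes, excludes in evaluate(new_node)):
            return True
    return False


def text_of(node):
    if isinstance(node, Element):
        return "".join(text_of(c) for c in node.children)
    return node


def toy_evaluate(node):
    # Hypothetical evaluation of: "great" && ! "rust"
    text = text_of(node)
    if "great" not in text:
        return []
    return [(["great"], ["rust"] if "rust" in text else [])]


note = Element("note", ["some rust"])
offer = Element("offer", ["great car, ", note])
print(ft_contains([offer], toy_evaluate))                 # note is searched
print(ft_contains([offer], toy_evaluate, ignore=[note]))  # note is pruned
```

Like the reconstruct function above, the sketch rebuilds elements from their name and child nodes only.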

4.3.2 Scoring

This section addresses the semantics of scoring variables in XQuery 1.0 for and let clauses and XPath 2.0 for expressions.

Scoring variables associate a numeric score with the result of the evaluation of XQuery 1.0 and XPath 2.0 expressions. This numeric score tries to estimate the value of a result item to the user's information need expressed using the XQuery 1.0 and XPath 2.0 expression. The numeric score is computed using an implementation-provided scoring algorithm.

There are numerous scoring algorithms used in practice. Most of the scoring algorithms take as inputs a query and a set of results to the query. In computing the score, these algorithms rely on the structure of the query to estimate the relevance of the results.

In the context of defining the semantics of XQuery 1.0 and XPath 2.0 Full-Text, passing the structure of the query poses a problem. The query is an XQuery 1.0 and XPath 2.0 expression and an XQuery 1.0 and XPath 2.0 Full-Text expression in particular. The semantics of XQuery 1.0 and XPath 2.0 expressions is expressed using functions that take sequences of items as arguments and return sequences of items. They are not aware of what expression produced a particular sequence, i.e., they are not aware of the expression structure.

To define the semantics of scoring in XQuery 1.0 and XPath 2.0 Full-Text using XQuery 1.0, expressions that produce the query result (or the functions that implement the expressions) must be passed as arguments. In other words, second-order functions are necessary. Current XQuery 1.0 and XPath 2.0 do not provide such functions.

Nevertheless, in the interest of the exposition, assume that such second-order functions are present. In particular, assume that there are two semantic second-order functions, fts:score and fts:scoreSequence, that take one argument (an expression) and return, respectively, the score value of this expression and a sequence of score values, one for each item to which the expression evaluates. The scores must satisfy scoring properties.

A for clause containing a score variable

for $result score $score in Expr
...

is evaluated as though it is replaced by the following set of clauses.

let $scoreSeq := fts:scoreSequence(Expr)
for $result at $i in Expr
let $score := $scoreSeq[$i]
...

Here, $scoreSeq and $i are new variables, not appearing elsewhere, and fts:scoreSequence is the second-order function.

Similarly, a let clause containing a score variable

let $result score $score := Expr
...

is evaluated as though it is replaced by the following set of clauses.

let $result := Expr
let $score := fts:score(Expr)
...
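
Because Python functions can take other functions as arguments, the two rewrites can be sketched directly. This is only an illustration under stated assumptions: an expression is modelled as a zero-argument function returning a list of strings, and toy_score_sequence and toy_score are hypothetical scoring functions that are not prescribed by this document.

```python
from typing import Callable, List, Tuple

Expr = Callable[[], List[str]]  # hypothetical: an unevaluated expression


def toy_score_sequence(expr: Expr) -> List[float]:
    """Toy scores in [0, 1]: relative frequency of the token 'mustang'."""
    counts = [item.lower().split().count("mustang") for item in expr()]
    top = max(counts, default=0)
    return [c / top if top else 0.0 for c in counts]


def toy_score(expr: Expr) -> float:
    """Toy score of the whole expression: the best item score."""
    return max(toy_score_sequence(expr), default=0.0)


def for_with_score(expr: Expr, score_sequence) -> List[Tuple[str, float]]:
    """for $result score $score in Expr: pair each item with its score."""
    score_seq = score_sequence(expr)
    return [(result, score_seq[i]) for i, result in enumerate(expr())]


def let_with_score(expr: Expr, score) -> Tuple[List[str], float]:
    """let $result score $score := Expr: bind the value and one score."""
    return expr(), score(expr)


docs = ["Ford Mustang", "Mustang mustang rust", "Honda"]
print(for_with_score(lambda: docs, toy_score_sequence))
print(let_with_score(lambda: docs, toy_score))
```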

4.3.3 Example

This section presents a more complex example for the evaluation of FTContainsExpr. This example uses the same sample document fragment and assigns it to $doc. Consider the following FTContainsExpr.

    $doc ftcontains (
      (
       "mustang" && ({("great", "excellent")} any word
       occurs at least 2 times)
      ) window 30 words
      &&
      ! "rust"
    ) same paragraph

Begin by evaluating the FTSelection to AllMatches.

    (
      (
       "mustang" && ({("great", "excellent")} any word
       occurs at least 2 times)
      ) window 30 words
      &&
      ! "rust"
    ) same paragraph

Step 1: Evaluate the FTWords "mustang".

Example, step 1

Step 2: Evaluate the FTWords {("great", "excellent")} any word.

Step 2.1: Match the token "great"

Example, step 2

Step 2.2: Match the token "excellent"

Example, step 3

Step 2.3: Combine the above AllMatches as if FTOr is used, i.e., by forming a union of the Matches.

Example, step 4

Step 3: Apply the FTTimes {("great", "excellent")} any word occurs at least 2 times, forming two pairs of Matches.

Example, steps 5.1 to 5.2

Step 4: Apply the FTAnd "Mustang" && ({("great", "excellent")} any word occurs at least 2 times), forming all possible pairs of StringMatches.

Example, steps 6.1 to 6.5

Step 5: Apply the FTWindow ("mustang" && ({("great", "excellent")} any word occurs at least 2 times)) window 30 words, filtering out Matches whose window is larger than 30 tokens.

Example, steps 7.1 and 7.2
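
The window test can be sketched as a filter over Matches. The sketch below makes several assumptions: every StringMatch covers a single token, a window is measured from the smallest to the largest token position inclusive, and only StringIncludes are present (the handling of StringExcludes is left out). The function name and positions are hypothetical.

```python
def ft_window(all_matches, size):
    """Keep only Matches whose tokens fit in a window of `size` tokens."""
    kept = []
    for match in all_matches:
        positions = [pos for (_token, pos) in match]
        # Inclusive span from the first to the last matched token.
        if positions and max(positions) - min(positions) + 1 <= size:
            kept.append(match)
    return kept


candidates = [
    frozenset({("mustang", 1), ("great", 5), ("excellent", 12)}),   # span 12
    frozenset({("mustang", 1), ("great", 5), ("great", 45)}),       # span 45
    frozenset({("mustang", 40), ("excellent", 12), ("great", 20)}), # span 29
]

within_30 = ft_window(candidates, 30)
print(len(within_30))
```

Only the second candidate is dropped, because its tokens spread over 45 positions.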

Step 6: Evaluate the FTWords "rust".

Example, step 8

Step 7: Apply the FTUnaryNot ! "rust", transforming the StringInclude into a StringExclude.

Example, step 9
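
A possible model of the negation is sketched below in Python. It assumes a De Morgan style inversion: one StringMatch is picked from every Match of the operand and its include flag is flipped, and each such choice forms one Match of the result. This is an interpretation offered for illustration, not the normative definition. The `(token, position, include)` tuples and positions are hypothetical.

```python
from itertools import product


def ft_unary_not(all_matches):
    """Negate an AllMatches by flipping one StringMatch chosen from every Match."""
    return [
        frozenset((token, pos, not include) for (token, pos, include) in choice)
        for choice in product(*all_matches)
    ]


# A single occurrence of "rust": the StringInclude becomes a StringExclude.
rust_once = [frozenset({("rust", 33, True)})]
print(ft_unary_not(rust_once))

# Two occurrences: one Match that excludes both of them.
rust_twice = [frozenset({("rust", 33, True)}), frozenset({("rust", 60, True)})]
print(ft_unary_not(rust_twice))
```

Under this model, negating an AllMatches with no Matches yields a single empty Match, that is, a condition that is always satisfied.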

Step 8: Apply the FTAnd (("mustang" && ({("great", "excellent")} any word occurs at least 2 times)) window 30 words) && ! "rust", forming all possible combinations of three StringMatches from the first AllMatches and one StringMatch from the second AllMatches.

Example, steps 10.1 to 10.3

Step 9: Apply the FTScope, filtering out Matches whose TokenInfos are not within the same paragraph (assuming the <offer> elements determine paragraph boundaries).

Example, step 11

The resulting AllMatches contains a Match that does not contain a StringExclude. Therefore, the sample FTContainsExpr returns true.
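
The last two steps can be sketched together in Python. The sketch assumes that the scope filter keeps a Match only when all of its StringIncludes lie in one paragraph, and that a StringExclude located in a different paragraph is dropped from the kept Match. The second assumption is an interpretation of how a Match without a StringExclude can arise here; it is not stated in this section. The final test follows the text: the expression is true if some Match contains no StringExclude. All names, positions, and paragraph numbers are hypothetical.

```python
from typing import NamedTuple


class StringMatch(NamedTuple):
    token: str
    pos: int              # token position (made-up value)
    para: int             # paragraph number (made-up value)
    include: bool = True  # False marks a StringExclude


def same_paragraph(all_matches):
    """Keep Matches whose StringIncludes share one paragraph."""
    kept = []
    for match in all_matches:
        paras = {sm.para for sm in match if sm.include}
        if len(paras) != 1:
            continue  # includes span several paragraphs (or there are none)
        (para,) = paras
        # Assumption: excludes outside that paragraph no longer apply.
        kept.append(frozenset(sm for sm in match if sm.include or sm.para == para))
    return kept


def ft_contains(all_matches):
    """True if some Match carries no StringExclude."""
    return any(all(sm.include for sm in match) for match in all_matches)


rust = StringMatch("rust", 60, 2, False)
m_ok = frozenset({StringMatch("mustang", 1, 1), StringMatch("great", 5, 1),
                  StringMatch("excellent", 12, 1), rust})
m_split = frozenset({StringMatch("mustang", 1, 1), StringMatch("great", 5, 1),
                     StringMatch("great", 25, 2), rust})

scoped = same_paragraph([m_ok, m_split])
print(ft_contains(scoped))
```

In this invented data the occurrence of "rust" sits in another paragraph, so the surviving Match has no StringExclude and the overall result is true.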

A EBNF for XQuery 1.0 Grammar with Full-Text extensions

The EBNF in this document and in this section is aligned with the current XML Query 1.0 grammar (see http://www.w3.org/TR/2005/CR-xquery-20051103/).

[1]    Module    ::=    VersionDecl? (LibraryModule | MainModule)
[2]    VersionDecl    ::=    "xquery" "version" StringLiteral ("encoding" StringLiteral)? Separator
[3]    MainModule    ::=    Prolog QueryBody
[4]    LibraryModule    ::=    ModuleDecl Prolog
[5]    ModuleDecl    ::=    "module" "namespace" NCName "=" URILiteral Separator
[6]    Prolog    ::=    ((DefaultNamespaceDecl | Setter | NamespaceDecl | Import) Separator)* ((VarDecl | FunctionDecl | OptionDecl | FTOptionDecl) Separator)*
[7]    Setter    ::=    BoundarySpaceDecl | DefaultCollationDecl | BaseURIDecl | ConstructionDecl | OrderingModeDecl | EmptyOrderDecl | CopyNamespacesDecl
[8]    Import    ::=    SchemaImport | ModuleImport
[9]    Separator    ::=    ";"
[10]    NamespaceDecl    ::=    "declare" "namespace" NCName "=" URILiteral
[11]    BoundarySpaceDecl    ::=    "declare" "boundary-space" ("preserve" | "strip")
[12]    DefaultNamespaceDecl    ::=    "declare" "default" ("element" | "function") "namespace" URILiteral
[13]    OptionDecl    ::=    "declare" "option" QName StringLiteral
[14]    FTOptionDecl    ::=    "declare" "ft-option" FTMatchOption
[15]    OrderingModeDecl    ::=    "declare" "ordering" ("ordered" | "unordered")
[16]    EmptyOrderDecl    ::=    "declare" "default" "order" "empty" ("greatest" | "least")
[17]    CopyNamespacesDecl    ::=    "declare" "copy-namespaces" PreserveMode "," InheritMode
[18]    PreserveMode    ::=    "preserve" | "no-preserve"
[19]    InheritMode    ::=    "inherit" | "no-inherit"
[20]    DefaultCollationDecl    ::=    "declare" "default" "collation" URILiteral
[21]    BaseURIDecl    ::=    "declare" "base-uri" URILiteral
[22]    SchemaImport    ::=    "import" "schema" SchemaPrefix? URILiteral ("at" URILiteral ("," URILiteral)*)?
[23]    SchemaPrefix    ::=    ("namespace" NCName "=") | ("default" "element" "namespace")
[24]    ModuleImport    ::=    "import" "module" ("namespace" NCName "=")? URILiteral ("at" URILiteral ("," URILiteral)*)?
[25]    VarDecl    ::=    "declare" "variable" "$" QName TypeDeclaration? ((":=" ExprSingle) | "external")
[26]    ConstructionDecl    ::=    "declare" "construction" ("strip" | "preserve")
[27]    FunctionDecl    ::=    "declare" "function" QName "(" ParamList? ")" ("as" SequenceType)? (EnclosedExpr | "external")
[28]    ParamList    ::=    Param ("," Param)*
[29]    Param    ::=    "$" QName TypeDeclaration?
[30]    EnclosedExpr    ::=    "{" Expr "}"
[31]    QueryBody    ::=    Expr
[32]    Expr    ::=    ExprSingle ("," ExprSingle)*
[33]    ExprSingle    ::=    FLWORExpr
| QuantifiedExpr
| TypeswitchExpr
| IfExpr
| OrExpr
[34]    FLWORExpr    ::=    (ForClause | LetClause)+ WhereClause? OrderByClause? "return" ExprSingle
[35]    ForClause    ::=    "for" "$" VarName TypeDeclaration? PositionalVar? FTScoreVar? "in" ExprSingle ("," "$" VarName TypeDeclaration? PositionalVar? FTScoreVar? "in" ExprSingle)*
[36]    PositionalVar    ::=    "at" "$" VarName
[37]    FTScoreVar    ::=    "score" "$" VarName
[38]    LetClause    ::=    (("let" "$" VarName TypeDeclaration? FTScoreVar?) | ("let" "score" "$" VarName)) ":=" ExprSingle ("," (("$" VarName TypeDeclaration? FTScoreVar?) | FTScoreVar) ":=" ExprSingle)*
[39]    WhereClause    ::=    "where" ExprSingle
[40]    OrderByClause    ::=    (("order" "by") | ("stable" "order" "by")) OrderSpecList
[41]    OrderSpecList    ::=    OrderSpec ("," OrderSpec)*
[42]    OrderSpec    ::=    ExprSingle OrderModifier
[43]    OrderModifier    ::=    ("ascending" | "descending")? ("empty" ("greatest" | "least"))? ("collation" URILiteral)?
[44]    QuantifiedExpr    ::=    ("some" | "every") "$" VarName TypeDeclaration? "in" ExprSingle ("," "$" VarName TypeDeclaration? "in" ExprSingle)* "satisfies" ExprSingle
[45]    TypeswitchExpr    ::=    "typeswitch" "(" Expr ")" CaseClause+ "default" ("$" VarName)? "return" ExprSingle
[46]    CaseClause    ::=    "case" ("$" VarName "as")? SequenceType "return" ExprSingle
[47]    IfExpr    ::=    "if" "(" Expr ")" "then" ExprSingle "else" ExprSingle
[48]    OrExpr    ::=    AndExpr ( "or" AndExpr )*
[49]    AndExpr    ::=    ComparisonExpr ( "and" ComparisonExpr )*
[50]    ComparisonExpr    ::=    FTContainsExpr ( (ValueComp
| GeneralComp
| NodeComp) FTContainsExpr )?
[51]    FTContainsExpr    ::=    RangeExpr ( "ftcontains" FTSelection FTIgnoreOption? )?
[52]    RangeExpr    ::=    AdditiveExpr ( "to" AdditiveExpr )?
[53]    AdditiveExpr    ::=    MultiplicativeExpr ( ("+" | "-") MultiplicativeExpr )*
[54]    MultiplicativeExpr    ::=    UnionExpr ( ("*" | "div" | "idiv" | "mod") UnionExpr )*
[55]    UnionExpr    ::=    IntersectExceptExpr ( ("union" | "|") IntersectExceptExpr )*
[56]    IntersectExceptExpr    ::=    InstanceofExpr ( ("intersect" | "except") InstanceofExpr )*
[57]    InstanceofExpr    ::=    TreatExpr ( "instance" "of" SequenceType )?
[58]    TreatExpr    ::=    CastableExpr ( "treat" "as" SequenceType )?
[59]    CastableExpr    ::=    CastExpr ( "castable" "as" SingleType )?
[60]    CastExpr    ::=    UnaryExpr ( "cast" "as" SingleType )?
[61]    UnaryExpr    ::=    ("-" | "+")* ValueExpr
[62]    ValueExpr    ::=    ValidateExpr | PathExpr | ExtensionExpr
[63]    GeneralComp    ::=    "=" | "!=" | "<" | "<=" | ">" | ">="
[64]    ValueComp    ::=    "eq" | "ne" | "lt" | "le" | "gt" | "ge"
[65]    NodeComp    ::=    "is" | "<<" | ">>"
[66]    ValidateExpr    ::=    "validate" ValidationMode? "{" Expr "}"
[67]    ValidationMode    ::=    "lax" | "strict"
[68]    ExtensionExpr    ::=    Pragma+ "{" Expr? "}"
[69]    Pragma    ::=    "(#" S? QName (S PragmaContents)? "#)" /* ws: explicitXQ */
[70]    PragmaContents    ::=    (Char* - (Char* '#)' Char*))
[71]    PathExpr    ::=    ("/" RelativePathExpr?)
| ("//" RelativePathExpr)
| RelativePathExpr
/* gn: leading-lone-slashXQ */
[72]    RelativePathExpr    ::=    StepExpr (("/" | "//") StepExpr)*
[73]    StepExpr    ::=    FilterExpr | AxisStep
[74]    AxisStep    ::=    (ReverseStep | ForwardStep) PredicateList
[75]    ForwardStep    ::=    (ForwardAxis NodeTest) | AbbrevForwardStep
[76]    ForwardAxis    ::=    ("child" "::")
| ("descendant" "::")
| ("attribute" "::")
| ("self" "::")
| ("descendant-or-self" "::")
| ("following-sibling" "::")
| ("following" "::")
[77]    AbbrevForwardStep    ::=    "@"? NodeTest
[78]    ReverseStep    ::=    (ReverseAxis NodeTest) | AbbrevReverseStep
[79]    ReverseAxis    ::=    ("parent" "::")
| ("ancestor" "::")
| ("preceding-sibling" "::")
| ("preceding" "::")
| ("ancestor-or-self" "::")
[80]    AbbrevReverseStep    ::=    ".."
[81]    NodeTest    ::=    KindTest | NameTest
[82]    NameTest    ::=    QName | Wildcard
[83]    Wildcard    ::=    "*"
| (NCName ":" "*")
| ("*" ":" NCName)
/* ws: explicitXQ */
[84]    FilterExpr    ::=    PrimaryExpr PredicateList
[85]    PredicateList    ::=    Predicate*
[86]    Predicate    ::=    "[" Expr "]"
[87]    PrimaryExpr    ::=    Literal | VarRef | ParenthesizedExpr | ContextItemExpr | FunctionCall | OrderedExpr | UnorderedExpr | Constructor
[88]    Literal    ::=    NumericLiteral | StringLiteral
[89]    NumericLiteral    ::=    IntegerLiteral | DecimalLiteral | DoubleLiteral
[90]    VarRef    ::=    "$" VarName
[91]    VarName    ::=    QName
[92]    ParenthesizedExpr    ::=    "(" Expr? ")"
[93]    ContextItemExpr    ::=    "."
[94]    OrderedExpr    ::=    "ordered" "{" Expr "}"
[95]    UnorderedExpr    ::=    "unordered" "{" Expr "}"
[96]    FunctionCall    ::=    QName "(" (ExprSingle ("," ExprSingle)*)? ")" /* gn: reserved-function-namesXQ */
/* gn: parensXQ */
[97]    Constructor    ::=    DirectConstructor
| ComputedConstructor
[98]    DirectConstructor    ::=    DirElemConstructor
| DirCommentConstructor
| DirPIConstructor
[99]    DirElemConstructor    ::=    "<" QName DirAttributeList ("/>" | (">" DirElemContent* "</" QName S? ">")) /* ws: explicitXQ */
[100]    DirAttributeList    ::=    (S (QName S? "=" S? DirAttributeValue)?)* /* ws: explicitXQ */
[101]    DirAttributeValue    ::=    ('"' (EscapeQuot | QuotAttrValueContent)* '"')
| ("'" (EscapeApos | AposAttrValueContent)* "'")
/* ws: explicitXQ */
[102]    QuotAttrValueContent    ::=    QuotAttrContentChar
| CommonContent
[103]    AposAttrValueContent    ::=    AposAttrContentChar
| CommonContent
[104]    DirElemContent    ::=    DirectConstructor
| CDataSection
| CommonContent
| ElementContentChar
[105]    CommonContent    ::=    PredefinedEntityRef | CharRef | "{{" | "}}" | EnclosedExpr
[106]    DirCommentConstructor    ::=    "<!--" DirCommentContents "-->" /* ws: explicitXQ */
[107]    DirCommentContents    ::=    ((Char - '-') | ('-' (Char - '-')))* /* ws: explicitXQ */
[108]    DirPIConstructor    ::=    "<?" PITarget (S DirPIContents)? "?>" /* ws: explicitXQ */
[109]    DirPIContents    ::=    (Char* - (Char* '?>' Char*)) /* ws: explicitXQ */
[110]    CDataSection    ::=    "<![CDATA[" CDataSectionContents "]]>" /* ws: explicitXQ */
[111]    CDataSectionContents    ::=    (Char* - (Char* ']]>' Char*)) /* ws: explicitXQ */
[112]    ComputedConstructor    ::=    CompDocConstructor
| CompElemConstructor
| CompAttrConstructor
| CompTextConstructor
| CompCommentConstructor
| CompPIConstructor
[113]    CompDocConstructor    ::=    "document" "{" Expr "}"
[114]    CompElemConstructor    ::=    "element" (QName | ("{" Expr "}")) "{" ContentExpr? "}"
[115]    ContentExpr    ::=    Expr
[116]    CompAttrConstructor    ::=    "attribute" (QName | ("{" Expr "}")) "{" Expr? "}"
[117]    CompTextConstructor    ::=    "text" "{" Expr "}"
[118]    CompCommentConstructor    ::=    "comment" "{" Expr "}"
[119]    CompPIConstructor    ::=    "processing-instruction" (NCName | ("{" Expr "}")) "{" Expr? "}"
[120]    SingleType    ::=    AtomicType "?"?
[121]    TypeDeclaration    ::=    "as" SequenceType
[122]    SequenceType    ::=    ("empty-sequence" "(" ")")
| (ItemType OccurrenceIndicator?)
[123]    OccurrenceIndicator    ::=    "?" | "*" | "+" /* gn: occurrence-indicatorsXQ */
[124]    ItemType    ::=    KindTest | ("item" "(" ")") | AtomicType
[125]    AtomicType    ::=    QName
[126]    KindTest    ::=    DocumentTest
| ElementTest
| AttributeTest
| SchemaElementTest
| SchemaAttributeTest
| PITest
| CommentTest
| TextTest
| AnyKindTest
[127]    AnyKindTest    ::=    "node" "(" ")"
[128]    DocumentTest    ::=    "document-node" "(" (ElementTest | SchemaElementTest)? ")"
[129]    TextTest    ::=    "text" "(" ")"
[130]    CommentTest    ::=    "comment" "(" ")"
[131]    PITest    ::=    "processing-instruction" "(" (NCName | StringLiteral)? ")"
[132]    AttributeTest    ::=    "attribute" "(" (AttribNameOrWildcard ("," TypeName)?)? ")"
[133]    AttribNameOrWildcard    ::=    AttributeName | "*"
[134]    SchemaAttributeTest    ::=    "schema-attribute" "(" AttributeDeclaration ")"
[135]    AttributeDeclaration    ::=    AttributeName
[136]    ElementTest    ::=    "element" "(" (ElementNameOrWildcard ("," TypeName "?"?)?)? ")"
[137]    ElementNameOrWildcard    ::=    ElementName | "*"
[138]    SchemaElementTest    ::=    "schema-element" "(" ElementDeclaration ")"
[139]    ElementDeclaration    ::=    ElementName
[140]    AttributeName    ::=    QName
[141]    ElementName    ::=    QName
[142]    TypeName    ::=    QName
[143]    URILiteral    ::=    StringLiteral
[144]    FTSelection    ::=    FTOr (FTMatchOption | FTProximity)* ("weight" RangeExpr)?
[145]    FTOr    ::=    FTAnd ( "||" FTAnd )*
[146]    FTAnd    ::=    FTMildnot ( "&&" FTMildnot )*
[147]    FTMildnot    ::=    FTUnaryNot ( "not" "in" FTUnaryNot )*
[148]    FTUnaryNot    ::=    ("!")? FTWordsSelection
[149]    FTWordsSelection    ::=    (FTWords FTTimes?) | ("(" FTSelection ")")
[150]    FTWords    ::=    FTWordsValue FTAnyallOption?
[151]    FTWordsValue    ::=    Literal | ("{" Expr "}")
[152]    FTProximity    ::=    FTOrderedIndicator | FTWindow | FTDistance | FTScope | FTContent
[153]    FTOrderedIndicator    ::=    "ordered"
[154]    FTMatchOption    ::=    FTCaseOption
| FTDiacriticsOption
| FTStemOption
| FTThesaurusOption
| FTStopwordOption
| FTLanguageOption
| FTWildCardOption
[155]    FTCaseOption    ::=    "lowercase"
| "uppercase"
| ("case" "sensitive")
| ("case" "insensitive")
[156]    FTDiacriticsOption    ::=    ("with" "diacritics")
| ("without" "diacritics")
| ("diacritics" "sensitive")
| ("diacritics" "insensitive")
[157]    FTStemOption    ::=    ("with" "stemming") | ("without" "stemming")
[158]    FTThesaurusOption    ::=    ("with" "thesaurus" (FTThesaurusID | "default"))
| ("with" "thesaurus" "(" (FTThesaurusID | "default") ("," FTThesaurusID)* ")")
| ("without" "thesaurus")
[159]    FTThesaurusID    ::=    "at" StringLiteral ("relationship" StringLiteral)? (FTRange "levels")?
[160]    FTStopwordOption    ::=    ("with" "stop" "words" FTRefOrList FTInclExclStringLiteral*)
| ("without" "stop" "words")
| ("with" "default" "stop" "words" FTInclExclStringLiteral*)
[161]    FTRefOrList    ::=    ("at" StringLiteral)
| ("(" StringLiteral ("," StringLiteral)* ")")
[162]    FTInclExclStringLiteral    ::=    ("union" | "except") FTRefOrList
[163]    FTLanguageOption    ::=    "language" StringLiteral
[164]    FTWildCardOption    ::=    ("with" "wildcards") | ("without" "wildcards")
[165]    FTContent    ::=    ("at" "start") | ("at" "end") | ("entire" "content")
[166]    FTAnyallOption    ::=    ("any" "word"?) | ("all" "words"?) | "phrase"
[167]    FTRange    ::=    ("exactly" UnionExpr)
| ("at" "least" UnionExpr)
| ("at" "most" UnionExpr)
| ("from" UnionExpr "to" UnionExpr)
[168]    FTDistance    ::=    "distance" FTRange FTUnit
[169]    FTWindow    ::=    "window" UnionExpr FTUnit
[170]    FTTimes    ::=    "occurs" FTRange "times"
[171]    FTScope    ::=    ("same" | "different") FTBigUnit
[172]    FTUnit    ::=    "words" | "sentences" | "paragraphs"
[173]    FTBigUnit    ::=    "sentence" | "paragraph"
[174]    FTIgnoreOption    ::=    "without" "content" UnionExpr
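
As an informal illustration of the four alternatives of the FTRange production, the Python sketch below checks whether a count falls within a range. The function name and the string keys are hypothetical, and treating "from M to N" as inclusive at both ends is an assumption made for this example.

```python
def in_ft_range(value, kind, low, high=None):
    """Check `value` against one FTRange alternative (illustrative only)."""
    if kind == "exactly":
        return value == low
    if kind == "at least":
        return value >= low
    if kind == "at most":
        return value <= low
    if kind == "from":
        # Assumed to include both bounds.
        return low <= value <= high
    raise ValueError(f"unknown FTRange kind: {kind!r}")


# "occurs at least 2 times" applied to a token that occurs three times.
print(in_ft_range(3, "at least", 2))
# "distance from 2 to 4 words" applied to a distance of five tokens.
print(in_ft_range(5, "from", 2, 4))
```

The same check applies wherever FTRange appears in the grammar, namely in FTDistance, FTTimes, and the "levels" part of FTThesaurusID.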

A.1 Terminal Symbols

[175]    IntegerLiteral    ::=    Digits
[176]    DecimalLiteral    ::=    ("." Digits) | (Digits "." [0-9]*) /* ws: explicitXQ */
[177]    DoubleLiteral    ::=    (("." Digits) | (Digits ("." [0-9]*)?)) [eE] [+-]? Digits /* ws: explicitXQ */
[178]    StringLiteral    ::=    ('"' (PredefinedEntityRef | CharRef | EscapeQuot | [^"&])* '"') | ("'" (PredefinedEntityRef | CharRef | EscapeApos | [^'&])* "'") /* ws: explicitXQ */
[179]    PredefinedEntityRef    ::=    "&" ("lt" | "gt" | "amp" | "quot" | "apos") ";" /* ws: explicitXQ */
[180]    EscapeQuot    ::=    '""'
[181]    EscapeApos    ::=    "''"
[182]    ElementContentChar    ::=    Char - [{}<&]
[183]    QuotAttrContentChar    ::=    Char - ["{}<&]
[184]    AposAttrContentChar    ::=    Char - ['{}<&]
[185]    Comment    ::=    "(:" (CommentContents | Comment)* ":)" /* ws: explicitXQ */
/* gn: commentsXQ */
[186]    PITarget    ::=    [http://www.w3.org/TR/REC-xml#NT-PITarget]XML /* gn: xml-versionXQ */
[187]    CharRef    ::=    [http://www.w3.org/TR/REC-xml#NT-CharRef]XML /* gn: xml-versionXQ */
[188]    QName    ::=    [http://www.w3.org/TR/REC-xml-names/#NT-QName]Names /* gn: xml-versionXQ */
[189]    NCName    ::=    [http://www.w3.org/TR/REC-xml-names/#NT-NCName]Names /* gn: xml-versionXQ */
[190]    S    ::=    [http://www.w3.org/TR/REC-xml#NT-S]XML /* gn: xml-versionXQ */
[191]    Char    ::=    [http://www.w3.org/TR/REC-xml#NT-Char]XML /* gn: xml-versionXQ */

The following symbols are used only in the definition of terminal symbols; they are not terminal symbols in the grammar of Appendix A, EBNF for XQuery 1.0 Grammar with Full-Text extensions.

[192]    Digits    ::=    [0-9]+
[193]    CommentContents    ::=    (Char+ - (Char* ('(:' | ':)') Char*))
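
The numeric literal productions [175] to [177] and the helper production Digits can be transcribed into regular expressions. The Python sketch below is one such transcription for experimentation. The `classify` function is hypothetical, and it ignores the surrounding tokenization and whitespace rules of the full grammar.

```python
import re

DIGITS = r"[0-9]+"

# [175] IntegerLiteral ::= Digits
INTEGER = re.compile(DIGITS)
# [176] DecimalLiteral ::= ("." Digits) | (Digits "." [0-9]*)
DECIMAL = re.compile(rf"(\.{DIGITS})|({DIGITS}\.[0-9]*)")
# [177] DoubleLiteral ::= (("." Digits) | (Digits ("." [0-9]*)?)) [eE] [+-]? Digits
DOUBLE = re.compile(rf"((\.{DIGITS})|({DIGITS}(\.[0-9]*)?))[eE][+-]?{DIGITS}")


def classify(text):
    """Return the name of the numeric literal production matching `text`."""
    if DOUBLE.fullmatch(text):
        return "DoubleLiteral"
    if DECIMAL.fullmatch(text):
        return "DecimalLiteral"
    if INTEGER.fullmatch(text):
        return "IntegerLiteral"
    return None


for sample in ["30", "3.", ".5", "1.5E-3", "e5"]:
    print(sample, classify(sample))
```

Note that "3." is a DecimalLiteral under these productions, while a lone "." or an exponent without leading digits matches none of them.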

B EBNF for XPath 2.0 Grammar with Full-Text extensions

The EBNF in this document and in this section is aligned with the current XPath 2.0 grammar (see http://www.w3.org/TR/2005/CR-xpath20-20051103/).

[1]    XPath    ::=    Expr
[2]    Expr    ::=    ExprSingle ("," ExprSingle)*
[3]    ExprSingle    ::=    ForExpr
| QuantifiedExpr
| IfExpr
| OrExpr
[4]    ForExpr    ::=    SimpleForClause "return" ExprSingle
[5]    SimpleForClause    ::=    "for" "$" VarName FTScoreVar? "in" ExprSingle ("," "$" VarName FTScoreVar? "in" ExprSingle)*
[6]    FTScoreVar    ::=    "score" "$" VarName
[7]    QuantifiedExpr    ::=    ("some" | "every") "$" VarName "in" ExprSingle ("," "$" VarName "in" ExprSingle)* "satisfies" ExprSingle
[8]    IfExpr    ::=    "if" "(" Expr ")" "then" ExprSingle "else" ExprSingle
[9]    OrExpr    ::=    AndExpr ( "or" AndExpr )*
[10]    AndExpr    ::=    ComparisonExpr ( "and" ComparisonExpr )*
[11]    ComparisonExpr    ::=    FTContainsExpr ( (ValueComp
| GeneralComp
| NodeComp) FTContainsExpr )?
[12]    FTContainsExpr    ::=    RangeExpr ( "ftcontains" FTSelection FTIgnoreOption? )?
[13]    RangeExpr    ::=    AdditiveExpr ( "to" AdditiveExpr )?
[14]    AdditiveExpr    ::=    MultiplicativeExpr ( ("+" | "-") MultiplicativeExpr )*
[15]    MultiplicativeExpr    ::=    UnionExpr ( ("*" | "div" | "idiv" | "mod") UnionExpr )*
[16]    UnionExpr    ::=    IntersectExceptExpr ( ("union" | "|") IntersectExceptExpr )*
[17]    IntersectExceptExpr    ::=    InstanceofExpr ( ("intersect" | "except") InstanceofExpr )*
[18]    InstanceofExpr    ::=    TreatExpr ( "instance" "of" SequenceType )?
[19]    TreatExpr    ::=    CastableExpr ( "treat" "as" SequenceType )?
[20]    CastableExpr    ::=    CastExpr ( "castable" "as" SingleType )?
[21]    CastExpr    ::=    UnaryExpr ( "cast" "as" SingleType )?
[22]    UnaryExpr    ::=    ("-" | "+")* ValueExpr
[23]    ValueExpr    ::=    PathExpr
[24]    GeneralComp    ::=    "=" | "!=" | "<" | "<=" | ">" | ">="
[25]    ValueComp    ::=    "eq" | "ne" | "lt" | "le" | "gt" | "ge"
[26]    NodeComp    ::=    "is" | "<<" | ">>"
[27]    PathExpr    ::=    ("/" RelativePathExpr?)
| ("//" RelativePathExpr)
| RelativePathExpr
/* gn: leading-lone-slashXP */
[28]    RelativePathExpr    ::=    StepExpr (("/" | "//") StepExpr)*
[29]    StepExpr    ::=    FilterExpr | AxisStep
[30]    AxisStep    ::=    (ReverseStep | ForwardStep) PredicateList
[31]    ForwardStep    ::=    (ForwardAxis NodeTest) | AbbrevForwardStep
[32]    ForwardAxis    ::=    ("child" "::")
| ("descendant" "::")
| ("attribute" "::")
| ("self" "::")
| ("descendant-or-self" "::")
| ("following-sibling" "::")
| ("following" "::")
| ("namespace" "::")
[33]    AbbrevForwardStep    ::=    "@"? NodeTest
[34]    ReverseStep    ::=    (ReverseAxis NodeTest) | AbbrevReverseStep
[35]    ReverseAxis    ::=    ("parent" "::")
| ("ancestor" "::")
| ("preceding-sibling" "::")
| ("preceding" "::")
| ("ancestor-or-self" "::")
[36]    AbbrevReverseStep    ::=    ".."
[37]    NodeTest    ::=    KindTest | NameTest
[38]    NameTest    ::=    QName | Wildcard
[39]    Wildcard    ::=    "*"
| (NCName ":" "*")
| ("*" ":" NCName)
/* ws: explicitXP */
[40]    FilterExpr    ::=    PrimaryExpr PredicateList
[41]    PredicateList    ::=    Predicate*
[42]    Predicate    ::=    "[" Expr "]"
[43]    PrimaryExpr    ::=    Literal | VarRef | ParenthesizedExpr | ContextItemExpr | FunctionCall
[44]    Literal    ::=    NumericLiteral | StringLiteral
[45]    NumericLiteral    ::=    IntegerLiteral | DecimalLiteral | DoubleLiteral
[46]    VarRef    ::=    "$" VarName
[47]    VarName    ::=    QName
[48]    ParenthesizedExpr    ::=    "(" Expr? ")"
[49]    ContextItemExpr    ::=    "."
[50]    FunctionCall    ::=    QName "(" (ExprSingle ("," ExprSingle)*)? ")" /* gn: reserved-function-namesXP */
/* gn: parensXP */
[51]    SingleType    ::=    AtomicType "?"?
[52]    SequenceType    ::=    ("empty-sequence" "(" ")")
| (ItemType OccurrenceIndicator?)
[53]    OccurrenceIndicator    ::=    "?" | "*" | "+" /* gn: occurrence-indicatorsXP */
[54]    ItemType    ::=    KindTest | ("item" "(" ")") | AtomicType
[55]    AtomicType    ::=    QName
[56]    KindTest    ::=    DocumentTest
| ElementTest
| AttributeTest
| SchemaElementTest
| SchemaAttributeTest
| PITest
| CommentTest
| TextTest
| AnyKindTest
[57]    AnyKindTest    ::=    "node" "(" ")"
[58]    DocumentTest    ::=    "document-node" "(" (ElementTest | SchemaElementTest)? ")"
[59]    TextTest    ::=    "text" "(" ")"
[60]    CommentTest    ::=    "comment" "(" ")"
[61]    PITest    ::=    "processing-instruction" "(" (NCName | StringLiteral)? ")"
[62]    AttributeTest    ::=    "attribute" "(" (AttribNameOrWildcard ("," TypeName)?)? ")"
[63]    AttribNameOrWildcard    ::=    AttributeName | "*"
[64]    SchemaAttributeTest    ::=    "schema-attribute" "(" AttributeDeclaration ")"
[65]    AttributeDeclaration    ::=    AttributeName
[66]    ElementTest    ::=    "element" "(" (ElementNameOrWildcard ("," TypeName "?"?)?)? ")"
[67]    ElementNameOrWildcard    ::=    ElementName | "*"
[68]    SchemaElementTest    ::=    "schema-element" "(" ElementDeclaration ")"
[69]    ElementDeclaration    ::=    ElementName
[70]    AttributeName    ::=    QName
[71]    ElementName    ::=    QName
[72]    TypeName    ::=    QName
[73]    FTSelection    ::=    FTOr (FTMatchOption | FTProximity)* ("weight" RangeExpr)?
[74]    FTOr    ::=    FTAnd ( "||" FTAnd )*
[75]    FTAnd    ::=    FTMildnot ( "&&" FTMildnot )*
[76]    FTMildnot    ::=    FTUnaryNot ( "not" "in" FTUnaryNot )*
[77]    FTUnaryNot    ::=    ("!")? FTWordsSelection
[78]    FTWordsSelection    ::=    (FTWords FTTimes?) | ("(" FTSelection ")")
[79]    FTWords    ::=    FTWordsValue FTAnyallOption?
[80]    FTWordsValue    ::=    Literal | ("{" Expr "}")
[81]    FTProximity    ::=    FTOrderedIndicator | FTWindow | FTDistance | FTScope | FTContent
[82]    FTOrderedIndicator    ::=    "ordered"
[83]    FTMatchOption    ::=    FTCaseOption
| FTDiacriticsOption
| FTStemOption
| FTThesaurusOption
| FTStopwordOption
| FTLanguageOption
| FTWildCardOption
[84]    FTCaseOption    ::=    "lowercase"
| "uppercase"
| ("case" "sensitive")
| ("case" "insensitive")
[85]    FTDiacriticsOption    ::=    ("with" "diacritics")
| ("without" "diacritics")
| ("diacritics" "sensitive")
| ("diacritics" "insensitive")
[86]    FTStemOption    ::=    ("with" "stemming") | ("without" "stemming")
[87]    FTThesaurusOption    ::=    ("with" "thesaurus" (FTThesaurusID | "default"))
| ("with" "thesaurus" "(" (FTThesaurusID | "default") ("," FTThesaurusID)* ")")
| ("without" "thesaurus")
[88]    FTThesaurusID    ::=    "at" StringLiteral ("relationship" StringLiteral)? (FTRange "levels")?
[89]    FTStopwordOption    ::=    ("with" "stop" "words" FTRefOrList FTInclExclStringLiteral*)
| ("without" "stop" "words")
| ("with" "default" "stop" "words" FTInclExclStringLiteral*)
[90]    FTRefOrList    ::=    ("at" StringLiteral)
| ("(" StringLiteral ("," StringLiteral)* ")")
[91]    FTInclExclStringLiteral    ::=    ("union" | "except") FTRefOrList
[92]    FTLanguageOption    ::=    "language" StringLiteral
[93]    FTWildCardOption    ::=    ("with" "wildcards") | ("without" "wildcards")
[94]    FTContent    ::=    ("at" "start") | ("at" "end") | ("entire" "content")
[95]    FTAnyallOption    ::=    ("any" "word"?) | ("all" "words"?) | "phrase"
[96]    FTRange    ::=    ("exactly" UnionExpr)
| ("at" "least" UnionExpr)
| ("at" "most" UnionExpr)
| ("from" UnionExpr "to" UnionExpr)
[97]    FTDistance    ::=    "distance" FTRange FTUnit
[98]    FTWindow    ::=    "window" UnionExpr FTUnit
[99]    FTTimes    ::=    "occurs" FTRange "times"
[100]    FTScope    ::=    ("same" | "different") FTBigUnit
[101]    FTUnit    ::=    "words" | "sentences" | "paragraphs"
[102]    FTBigUnit    ::=    "sentence" | "paragraph"
[103]    FTIgnoreOption    ::=    "without" "content" UnionExpr

B.1 Terminal Symbols

[104]    IntegerLiteral    ::=    Digits
[105]    DecimalLiteral    ::=    ("." Digits) | (Digits "." [0-9]*) /* ws: explicitXP */
[106]    DoubleLiteral    ::=    (("." Digits) | (Digits ("." [0-9]*)?)) [eE] [+-]? Digits /* ws: explicitXP */
[107]    StringLiteral    ::=    ('"' (EscapeQuot | [^"])* '"') | ("'" (EscapeApos | [^'])* "'") /* ws: explicitXP */
[108]    EscapeQuot    ::=    '""'
[109]    EscapeApos    ::=    "''"
[110]    Comment    ::=    "(:" (CommentContents | Comment)* ":)" /* ws: explicitXP */
/* gn: commentsXP */
[111]    QName    ::=    [http://www.w3.org/TR/REC-xml-names/#NT-QName]Names /* gn: xml-versionXP */
[112]    NCName    ::=    [http://www.w3.org/TR/REC-xml-names/#NT-NCName]Names /* gn: xml-versionXP */
[113]    Char    ::=    [http://www.w3.org/TR/REC-xml#NT-Char]XML /* gn: xml-versionXP */

The following symbols are used only in the definition of terminal symbols; they are not terminal symbols in the grammar of Appendix B, EBNF for XPath 2.0 Grammar with Full-Text extensions.

[114]    Digits    ::=    [0-9]+
[115]    CommentContents    ::=    (Char+ - (Char* ('(:' | ':)') Char*))

C Static Context Components

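The following table describes the full-text components of the static context (see Section XQ). For each component, the table gives its default initial value, whether it can be overwritten or augmented by an implementation, whether it can be overwritten or augmented by a query, its scope, and its consistency rules.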

Static Context Components

Component | Default initial value | Can be overwritten or augmented by implementation? | Can be overwritten or augmented by a query? | Scope | Consistency rules
FTCaseOption | case insensitive | overwriteable | overwriteable by prolog | lexical | Value must be case insensitive or case sensitive.
FTDiacriticsOption | diacritics insensitive | overwriteable | overwriteable by prolog | lexical | Value must be diacritics insensitive or diacritics sensitive.
FTStemOption | without stemming | overwriteable | overwriteable by prolog | lexical | Value must be without stemming or with stemming.
FTThesaurusOption | without thesaurus | overwriteable | overwriteable by prolog (refer to default to augment) | lexical | Value must be part of the statically known thesauri.
Statically known thesauri | none | augmentable | cannot be augmented or overwritten by prolog | module | Each URI uniquely identifies a thesaurus list.
FTStopwordOption | without stop words | overwriteable | overwriteable by prolog (refer to default to augment) | lexical | Value must be part of the statically known stop word lists.
Statically known stop word lists | none | augmentable | cannot be augmented or overwritten by prolog | module | Each URI uniquely identifies a stop word list.
FTLanguageOption | no language is selected | overwriteable | overwriteable by prolog | lexical | Value must be castable to "xs:language" or "none".
Statically known languages | none | augmentable | cannot be augmented or overwritten by prolog | module | Each string uniquely identifies a language.
FTWildCardOption | without wildcards | no | overwriteable by prolog | lexical | Value must be with wildcards or without wildcards.
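
The layering described by the table can be sketched as a lookup with overrides: start from the default initial values, let the implementation replace those it is allowed to replace, then apply prolog declarations. The Python below is a simplified model under those assumptions. The dictionary keys follow the grammar production names, the value "none" stands in for "no language is selected", and the function name is hypothetical. The statically known thesauri, stop word lists, and languages are not modelled.

```python
DEFAULTS = {
    "FTCaseOption": "case insensitive",
    "FTDiacriticsOption": "diacritics insensitive",
    "FTStemOption": "without stemming",
    "FTThesaurusOption": "without thesaurus",
    "FTStopwordOption": "without stop words",
    "FTLanguageOption": "none",  # placeholder for "no language is selected"
    "FTWildCardOption": "without wildcards",
}

# Per the table, only FTWildCardOption cannot be overwritten by an implementation.
NOT_IMPLEMENTATION_OVERWRITEABLE = {"FTWildCardOption"}


def initial_context(implementation=None, prolog=None):
    """Build the match option part of the static context (illustrative)."""
    context = dict(DEFAULTS)
    for name, value in (implementation or {}).items():
        if name in NOT_IMPLEMENTATION_OVERWRITEABLE:
            raise ValueError(f"{name} cannot be overwritten by an implementation")
        context[name] = value
    # Prolog declarations (declare ft-option ...) take effect last.
    context.update(prolog or {})
    return context


context = initial_context(
    implementation={"FTCaseOption": "case sensitive"},
    prolog={"FTStemOption": "with stemming"},
)
print(context["FTCaseOption"], "/", context["FTStemOption"])
```

Applying the prolog last reflects the table's statement that every listed option is overwriteable by the prolog.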

D Error Conditions

err:XPTY0004

It is a type error if, during the static analysis phase, an expression is found to have a static type that is not appropriate for the context in which the expression occurs, or during the dynamic evaluation phase, the dynamic type of a value does not match a required type as specified by the matching rules in Section 2.5.4 SequenceType Matching of [XML Path Language (XPath) 2.0].

err:FOCH0002

It is a dynamic error if, in a function invocation, the argument corresponding to the specified function's collation parameter does not identify a supported collation.

E References

E.1 Normative References

XQuery 1.0: An XML Query Language
XQuery 1.0: An XML Query Language, Don Chamberlin, Anders Berglund, Scott Boag, et al., Editors. World Wide Web Consortium, 3 Nov 2005. This version is http://www.w3.org/TR/2005/CR-xquery-20051103/. The latest version is available at http://www.w3.org/TR/xquery/.
XML Path Language (XPath) 2.0
XML Path Language (XPath) 2.0, Don Chamberlin, Anders Berglund, Scott Boag, et al., Editors. World Wide Web Consortium, 3 Nov 2005. This version is http://www.w3.org/TR/2005/CR-xpath20-20051103/. The latest version is available at http://www.w3.org/TR/xpath20/.
XQuery 1.0 and XPath 2.0 Functions and Operators
XQuery 1.0 and XPath 2.0 Functions and Operators, Ashok Malhotra, Jim Melton, and Norman Walsh, Editors. World Wide Web Consortium, 3 Nov 2005. This version is http://www.w3.org/TR/2005/CR-xpath-functions-20051103/. The latest version is available at http://www.w3.org/TR/xpath-functions/.
XQuery and XPath Full-Text Requirements
XQuery and XPath Full-Text Requirements, Stephen Buxton, Michael Rys, Editors. World Wide Web Consortium, 02 May 2003. This version is http://www.w3.org/TR/2003/WD-xquery-full-text-requirements-20030502/. The latest version is available at http://www.w3.org/TR/xquery-full-text-requirements/.
XQuery 1.0 and XPath 2.0 Full-Text Use Cases
XQuery 1.0 and XPath 2.0 Full-Text Use Cases, Sihem Amer-Yahia and Pat Case, Editors. World Wide Web Consortium, 3 Nov 2005. This version is http://www.w3.org/TR/2005/WD-xmlquery-full-text-use-cases-20051103/. The latest version is available at http://www.w3.org/TR/xmlquery-full-text-use-cases/.

E.2 Non-normative References

ISO 2788
Documentation Guidelines for the Establishment and Development of Monolingual Thesauri, Geneva: International Organization for Standardization, 2nd edition, 1986.
SQL/MM
ISO/IEC 13249-2 Information technology --- Database languages --- SQL Multimedia and Application Packages --- Part 2: Full-Text. Geneva: International Organization for Standardization, 2nd edition, 2003.

F Acknowledgements (Non-Normative)

We would like to thank the members of the XQuery and XPath Full-Text group for their fruitful discussions.

We would like to thank the following people for their contributions to earlier drafts of this document.

G Glossary (Non-Normative)

AllMatches

An AllMatches describes the possible results of an FTSelection.

Full-TextOperators

Full-text operators perform operations on tokens, phrases, and expressions. Some require that the relative positions of tokens in the document be known (e.g., proximity operators).

Full-TextQueries

Full-text queries are performed on tokens and phrases. Tokens and phrases are produced via tokenization.

IgnoredNodes

Ignored nodes are the set of element nodes whose content is ignored.

Match

Each Match describes one result to the FTSelection.

MatchOptions

Match options modify the set of tokens and phrases in the query. Some of these options (e.g., stemming) have behaviors which depend on the language of the document, the language of the query, or both.

Paragraph

A paragraph is an ordered sequence of any number of tokens. Beyond that, paragraphs are implementation-defined. A tokenizer is not required to support paragraphs.

Phrase

A phrase is an ordered sequence of any number of tokens. Beyond that, phrases are implementation-defined.

Scores

Scores express the relevance of query results to the full-text search conditions.

SearchItem

SearchItem is a sequence of SearchTokenInfos representing the sequence of tokens derived from tokenizing one search string.

SearchTokenInfo

SearchTokenInfo is the identity of a token inside a search string.

Sentence

A sentence is an ordered sequence of any number of tokens. Beyond that, sentences are implementation-defined. A tokenizer is not required to support sentences.

StringExclude

A StringExclude is a StringMatch that describes a TokenInfo that must not be contained in the document.

StringInclude

A StringInclude is a StringMatch that describes a TokenInfo that must be contained in the document.

StringMatch

A StringMatch is a possible match of a sequence of search tokens with a corresponding sequence of consecutive token occurrences in a document. A StringMatch may be a StringInclude or StringExclude.

Token

A token is defined as a character, n-gram, or sequence of characters returned by a tokenizer as a basic unit to be searched. Each instance of a token consists of one or more consecutive characters. Beyond that, tokens are implementation-defined.

TokenInfo

TokenInfo represents a sequence of consecutive token occurrences inside an XML document.

Tokenization

Formally, tokenization is the process of converting the string value of a node to a sequence of token occurrences, taking the structural information of the node into account to identify token, sentence, and paragraph boundaries.

WeightDeclarations

Scoring may be influenced by adding weight declarations to search tokens, phrases, and expressions.
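
As a non-normative illustration, the relationships among the glossary terms AllMatches, Match, StringMatch, StringInclude, StringExclude, and TokenInfo can be sketched in Python. The class and field names (start_pos, end_pos, include), the inclusive interval representation, and the two helper functions are assumptions made for this sketch only; the specification defines these structures through its own schemata and semantics functions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TokenInfo:
    """A run of consecutive token occurrences, modeled here (as an assumption)
    by an inclusive interval of token positions."""
    start_pos: int
    end_pos: int


@dataclass(frozen=True)
class StringMatch:
    """A possible match of search tokens against a TokenInfo.
    include=True stands for a StringInclude, include=False for a StringExclude."""
    token_info: TokenInfo
    include: bool


@dataclass(frozen=True)
class Match:
    """One result of an FTSelection: a group of StringMatches."""
    string_matches: Tuple[StringMatch, ...]


@dataclass(frozen=True)
class AllMatches:
    """The possible results of an FTSelection: a collection of Matches."""
    matches: Tuple[Match, ...]


def is_contained(info: TokenInfo, first: int, last: int) -> bool:
    """True if the interval lies within token positions first..last (inclusive)."""
    return first <= info.start_pos and info.end_pos <= last


def match_holds(match: Match, first: int, last: int) -> bool:
    """Every StringInclude must be contained in first..last,
    and no StringExclude may be contained in it."""
    return all(
        is_contained(sm.token_info, first, last) == sm.include
        for sm in match.string_matches
    )


def any_match_holds(all_matches: AllMatches, first: int, last: int) -> bool:
    """The searched text satisfies the selection if at least one Match holds."""
    return any(match_holds(m, first, last) for m in all_matches.matches)
```

For example, a Match that includes a two-token phrase at positions 3 to 4 and excludes a token at position 9 holds for a text spanning positions 1 to 6, but not for one spanning positions 1 to 10.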

H Checklist of Implementation-Defined Features (Non-Normative)

This appendix provides a summary of features defined in this specification whose effect is explicitly implementation-defined. The conformance rules require vendors to provide documentation that explains how these choices have been exercised.

  1. Everything about tokenization, including the definition of the term "words", is implementation-defined, except that

    1. each word consists of one or more consecutive characters;

    2. the tokenizer must preserve the containment hierarchy (paragraphs contain sentences contain words); and

    3. the tokenizer must, when tokenizing two equal strings, identify the same tokens in each.

  2. Implementations are free to provide implementation-defined ways to differentiate how different kinds of markup affect token boundaries during tokenization.

  3. It is implementation-defined what a stem of a word is and whether stemming will be based on an algorithm, dictionary, or mixed approach.

  4. When the option "with default stop words" is used, an implementation-defined collection of stop words is used.

  5. The set of valid language identifiers is implementation-defined.

  6. Certain values in the static context (see C Static Context Components) that can be overwritten or augmented by implementations are implementation-defined.
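
The three tokenization constraints in item 1 can be illustrated with a minimal Python sketch. Everything specific in it is an assumption chosen for the example, not something this specification prescribes: the TokenOccurrence record, the rule that blank lines separate paragraphs, the rule that ".", "!" or "?" followed by whitespace ends a sentence, and the use of the regular expression \w+ to define a word. A conforming tokenizer is free to make entirely different choices.

```python
import re
from typing import List, NamedTuple


class TokenOccurrence(NamedTuple):
    """Hypothetical record for one word occurrence."""
    text: str
    position: int   # 1-based position of the word in the whole string
    sentence: int   # 1-based sentence number, counted across the whole string
    paragraph: int  # 1-based paragraph number


def tokenize(text: str) -> List[TokenOccurrence]:
    """Split text into words, recording the sentence and paragraph of each.

    Assumed rules: blank lines separate paragraphs; '.', '!' or '?' followed
    by whitespace ends a sentence; a word is a maximal run of \\w characters.
    """
    occurrences: List[TokenOccurrence] = []
    position = 0
    sentence = 0
    for para_no, para in enumerate(re.split(r"\n\s*\n", text), start=1):
        # Sentences are split inside a single paragraph, so a sentence can
        # never span two paragraphs (containment hierarchy).
        for sent in re.split(r"(?<=[.!?])\s+", para):
            words = re.findall(r"\w+", sent)  # each word has >= 1 character
            if not words:
                continue
            sentence += 1
            for word in words:
                position += 1
                occurrences.append(
                    TokenOccurrence(word, position, sentence, para_no)
                )
    return occurrences
```

Because the function depends only on its input string, tokenizing two equal strings always identifies the same tokens, which is the third constraint.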

I Change Log (Non-Normative)

Sihem Amer-Yahia 2005-04-08 Updated case matrix Updated case matrix row "sensitive", column "CCI" from "case-insensitive variant of CCI if it exists, else error" to "case-sensitive variant of CCI if it exists, else error".
Sihem Amer-Yahia 2005-05-02 Closed issues with no changes Closed Cluster B, Issue 28 IGNORE Syntax with no change to the document. Closed Cluster B, Issue 50 IGNORE Queries with no change to the document.
Sihem Amer-Yahia 2005-05-02 Updated FTTimes syntax Closed Cluster G, Issue 14 FTTimesSelection and added a related bullet item in Section 3.
Sihem Amer-Yahia 2005-05-02 Updated FTWildCard syntax Updated FTWildCardOption in Section 3.
Sihem Amer-Yahia 2005-05-03 Updated introduction Replaced "semantic element" with "semantic markup" and "tag" with "element" in the introduction.
Sihem Amer-Yahia 2005-05-03 Added issue on error codes Added Cluster J, Issue 59 Error Codes.
Sihem Amer-Yahia 2005-05-03 Closed issues with no change Closed Cluster A, Issue 54 Weight Granularity in Scoring with same resolution as for Cluster A, Issue 5 Score Weighting, no further change to document. Closed Cluster H, Issue 9 Window with no change to the document. Closed Cluster H, Issue 19 FTScopeSelection on structure with no change to the document. Closed Cluster E, Issue 25 MatchOption Syntax with no change to the document. Closed Cluster H, Issue 44 FTContains Semantics with no change to the document.
Sihem Amer-Yahia 2005-05-03 Updated FTContent syntax Updated FTContent adding "entire content", Closed Cluster C, Issue 39 Exact Element Content.
Sihem Amer-Yahia 2005-05-03 Closed issue on Boolean Naming Closed Cluster F, Issue 38 Boolean Naming. Changes to the document are pending a decision on whether it is OK to use "and", "or", "not" for full-text. If so, change existing symbols to "and", "or", "not". If not, change existing symbols to "ftand", "ftor", "ftnot".
Chavdar Botev 2005-05-03 Updated FTDistance semantics Updated the semantics for distance.
Sihem Amer-Yahia 2005-05-03 Updated FTRange syntax Made "exactly" required before an exact number in FTRange. Closed Cluster F, Issue 43 Exactly in FTRangeSpec.
Sihem Amer-Yahia 2005-05-04 Closed issue on collations Closed Cluster D, Issue 57 Collations Match Option.
Jochen Doerre 2005-05-19 Added issue on scoring Added Cluster A, Issue 60 Extended Scoring.
Chavdar Botev 2005-06-29 Added issue on FTNegation Added Cluster G, Issue 62 Precise semantics of double negation.
Chavdar Botev 2005-06-29 Added issue on FTTimes Added Cluster G, Issue 61 Desired semantics of FTTimes.
Sihem Amer-Yahia 2005-07-11 Updated FTMildNegation syntax Updated the mild not syntax from "mild not" to "not in". Closed Cluster I, Issue 10 MildNot and Cluster F, Issue 41 Mildnot Naming.
Chavdar Botev 2005-07-12 Updated FTIgnore semantics Changed semantics of FTIgnoreOption.
Sihem Amer-Yahia 2005-07-18 Corrected error codes Corrected and added error codes, closing and implementing the resolution for Cluster J Issue 59 Error Codes.
Sihem Amer-Yahia 2005-07-18 Closed issues with no changes Closed Cluster I, Issue 13 "loose-grammar" leaving the grammar as it is. Closed Cluster D, Issue 53 "matchoptions-default" with no change to the document. Closed Cluster H, Issue 58 "ft-about-operator" with no change to the document.
Sihem Amer-Yahia 2005-07-21 Updated score syntax Closed Cluster A, Issue 60 "new-scoring-proposal" and Issue 2 "scoring-values" and updated Section 2.2 Score Clause to reflect new score syntaxes. There are now syntaxes for scored queries 1) returning the same results as queries with Boolean predicates and 2) for returning more or fewer results.
Sihem Amer-Yahia 2005-07-21 Added appendix for defaults Added appendix for defaults in the query prolog analogous to C.1 in the XQuery language document.
Sihem Amer-Yahia 2005-07-21 Updated FTThesaurus section Aligned description in Section 3.2.4 FTThesaurusOption with current grammar.
Sihem Amer-Yahia 2005-07-21 Opened and closed issue on nested FTNegation Opened and closed Cluster I, Issue 65 Nested FTNegations on the right side of an FTMildNegation.
Chavdar Botev 2005-07-25 Updated FTMildNegation semantics Changed the semantics of MildNot.
Sihem Amer-Yahia 2005-08-10 Added Change Log Added Change Log harvesting back entries from CVS change log.
Jochen Doerre 2005-08-17 Grammar changes Changed XQuery/XPath grammar for new scoring syntax (resolution of Issue 60), for match option defaults in query prolog (resolution of Issue 45), for simplified window operator (resolution to Issue 51), renamed "mild not" to "not in" (resolution of Issue 41), modified FTThesaurusOption, FTStopwordOption and FTLanguageOption to require StringLiterals as decided in May 05 F2F.
Jochen Doerre 2005-08-17 Changes to Section 2 New scoring syntax introduced; rewrote most of 2.2. Corrected use of weights in 2.2.1 (wrong default, wrong use of 1.5).
Jochen Doerre 2005-08-17 Changes to Section 3 Adapted the explanations to the changed syntax for FTWindow, FTThesaurusOption, FTStopwordOption and FTLanguageOption. Also corrected a couple of example explanations. Removed FTIgnoreOption from the list of match option defaults in 3.2. Corrected explanation and example of FTLanguageOption (neither diacritics nor case is language-specific!). Commented out last two examples of FTDistance, because distance 15 does not work for phrases.
Jochen Doerre 2005-08-17 Appendices A+B Adapted introductory comment about which version of the XQuery/XPath grammars we are aligned to.
Jochen Doerre 2005-08-17 Dates in Header Adapted current date and previous date and links in full-text-query-language-semantics.xml and in tqheader.xml.
Jochen Doerre 2005-08-19 Added Section 2.3, Changes in 3+4 Added Section 2.3 Extension to Static Context. Changed Sections 3.2 and 4.4.1.1 to refer to match option settings in the static context.
Jochen Doerre 2005-08-19 Added Issue 63 Added Cluster G Issue 63: Distance constraints do not work on phrases.
Jochen Doerre 2005-08-19 Changes in Section 4 Adapted semantics to new scoring feature (resolution of Issue 60), changed FTWindow semantics according to resolution of Issue 51, and cleaned examples.
Jochen Doerre 2005-08-19 Appendix G Added lines for statically known thesauri and stop lists.
Jochen Doerre 2005-08-25 Added Issue 64 Added Cluster E Issue 64: System Relative Operator Defaults (using wording proposed by Pat Case).
Jochen Doerre 2005-10-10 Changes in Section 3 Rephrased Section 3.2.7 FTIgnoreOption. Explanation and example adapted to simple (non-recursive) use of "ignore".
Jochen Doerre 2005-10-10 Changes in Section 4 Incorporated Section 4.3.1.4 Match and AllMatches Normal Form.
Sihem Amer-Yahia 2005-10-12 Incorporated comments Incorporated Pat's comments at http://lists.w3.org/Archives/Member/member-query-fttf/2005Sep/0068.html
Jim Melton 2005-10-20 Changes in Sections 3 and 4 Properly marked up errors and inserted error summary appendix. Re-ordered appendices so normative appendices precede non-normative appendices.
Jochen Doerre 2005-10-24 Final edits Included corrections to examples in Section 3. Changed meaning of distance 0 for sentences (paragraphs) to mean adjacent. Reworked Appendix H Checklist of Implementation-Defined Features. Added resolution texts to issues 45, 59, and 62.
Jochen Doerre 2005-11-28 Restrict FTTimes to FTWords Modified EBNF syntax to allow the FTTimes operation to be applicable only to simple FTWords.
Jochen Doerre 2005-11-28 Re: Bug 2299: Changes to Section 4 The AllMatches model has been changed to allow the TokenInfo of a StringMatch to represent an interval of token positions, instead of a single position. Thus, a phrase is now modeled using a single StringMatch, and consequently distance constraints (which always apply to the individual StringMatches) can be used to constrain the entire phrase. In addition, this change allows overlapping tokens to be modeled. The semantics functions for FTOrder (order now constrains the start positions of tokens), for FTScope, for FTDistance (a distance constraint requires a certain number of positions between the end of one token and the start of the next) and for FTWindow have been adapted.
Jochen Doerre 2006-01-09 Issues List removed Dropped Appendix I "Issues List", as issues are tracked in Bugzilla now.
Mary Holstege 2006-02-01 Static context Added known languages to static context.
Jochen Doerre 2006-03-06 Bug 2776 Changed EBNF grammar to allow weights to be specified using RangeExpr.
Mary Holstege 2006-03-30 Updated Tokenization 4.2.7 Expanded and clarified definition. Added examples.
Pat Case 2006-04-13 Replaced glossary Removed glossary copied from the XQuery language document and inserted coding to produce a full-text glossary.
Jochen Doerre 2006-04-24 Section 2 Added new Processing Model section.
Jochen Doerre 2006-04-25 Section 4 Included the completely revised semantics schemata and functions, which now (i) correctly handle interval-based TokenInfos, (ii) separate the representation of TokenInfos and SearchTokenInfos and SearchItems, (iii) have been simplified regarding the semantics of match options by no longer separating the implementation-defined matching function from (most of) the implementation-defined application of match options, and (iv) have been type- and syntax-checked.