XQuery 1.0 and XPath 2.0 Full-Text 1.0

WD-xpath-full-text-10

W3C Working Draft

18 May 2007

This version: http://www.w3.org/TR/2007/WD-xpath-full-text-10-20070518/

Latest version: http://www.w3.org/TR/xpath-full-text-10/

Editors:

Sihem Amer-Yahia, AT&T Labs - Research

Chavdar Botev, Invited Expert

Stephen Buxton, Mark Logic Corporation

Pat Case, Library of Congress

Jochen Doerre, IBM

Mary Holstege, Mark Logic Corporation

Jim Melton, Oracle

Michael Rys, Microsoft

Jayavel Shanmugasundaram, Invited Expert

This document defines the syntax and formal semantics of XQuery 1.0 and XPath 2.0 Full-Text 1.0 which is a language that extends XQuery 1.0 and XPath 2.0 with full-text search capabilities.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a Last Call Working Draft for review by W3C Members and other interested parties. This document was produced following the procedures set out for the W3C Process and was defined jointly by the XSL Working Group and the XML Query Working Group (both part of the XML Activity). It is designed to be read in conjunction with the XQuery 1.0 and XPath 2.0 specifications that it extends.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document defines a language for expressing full-text queries on XML documents; the language is specified in the form of extensions to both XPath 2.0 and XQuery 1.0. Organizations and individuals should review this document to determine the degree to which the language specified meets the needs of the full-text community. The Working Groups believe that this work is essentially complete and intend to advance it as soon as possible.

This is the sixth version of this document. Since the last version was published, several technical and editorial changes have been made. Among the most significant changes are:

The formal semantics diagrams have been redrawn.

A conformance statement has been added.

XML Schemas that together define the XML representation of XQuery 1.0 and XPath 2.0 Full-Text have been added, along with a stylesheet to transform that XML representation to the ordinary XQuery syntax.

Section 3 has been significantly restructured for clarity and readability.

The semantics of nesting FTDistance selections have been made more useful.

The semantics for FTMildNot now properly handle phrases.

See Appendix for more information on these and other changes.

Of the XQuery 1.0 and XPath 2.0 Full Text documents, only this document, XQuery 1.0 and XPath 2.0 Full-Text 1.0, is a Last Call document. The XQuery and XPath Full-Text Requirements, although not on the Recommendation track, is being republished concurrently with this document in order to demonstrate the degree to which this document satisfies those Requirements. The XQuery Full-Text Use Cases document, although not on the Recommendation track, is being republished concurrently with this document in order to illustrate various use cases that guided the design of the XQuery 1.0 and XPath 2.0 Full Text specification.

Public Last Call comments on this document and its open issues are invited. Comments on this document are due by 22 June 2007. Comments on this document should be made in W3C's public Bugzilla system for this specification (instructions can be found at http://www.w3.org/XML/2005/04/qt-bugzilla). When entering comments, select the Product named "XPath / XQuery / XSLT", the Component named "Full Text", and the Version named "Last Call drafts". This repository includes open issues recorded by the Query Working Group as well as by members of the public. If access to the Bugzilla system is not feasible, you may send your comments to the W3C XSLT/XPath/XQuery mailing list, public-qt-comments@w3.org. It will be very helpful if you include the string [FT] in the subject line of your comment, whether made in Bugzilla or in email. Each Bugzilla entry and email message should contain only one comment. Archives of the comments and responses are available at http://lists.w3.org/Archives/Public/public-qt-comments/.

This document was produced by groups operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the XML Query Working Group and also maintains a public list of any patent disclosures made in connection with the deliverables of the XSL Working Group; those pages also include instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.


SA January 2004: First version of document before Feb F2F

SA 26 February 2004: Second version of document before Feb F2F meetings.

JM 18 May 2007: Last Call Working Draft

Introduction

This document defines the language and the formal semantics of XQuery 1.0 and XPath 2.0 Full-Text 1.0. This language is designed to meet the requirements identified in W3C XQuery and XPath Full-Text Requirements and to support the queries in the W3C XQuery Full-Text Use Cases.

XQuery 1.0 and XPath 2.0 Full-Text 1.0 extends the syntax and semantics of XQuery 1.0 and XPath 2.0.

Full-Text Search and XML

As XML becomes mainstream, users expect to be able to search their XML documents. This requires a standard way to do full-text search, as well as structured searches, against XML documents. A similar requirement for full-text search led ISO to define the SQL/MM-FT standard. SQL/MM-FT defines extensions to SQL to express full-text searches providing functionality similar to that defined in this full-text language extension to XQuery 1.0 and XPath 2.0.

XML documents may contain highly structured data (fixed schemas, known types such as numbers, dates), semi-structured data (flexible schemas and types), markup data (text with embedded tags), and unstructured data (untagged free-flowing text). Where a document contains unstructured or semi-structured data, it is important to be able to search using Information Retrieval techniques such as scoring and weighting.

Full-text search is different from substring search in many ways:

A full-text search searches for tokens and phrases rather than substrings. A substring search for news items that contain the string "lease" will return a news item that contains "Foobar Corporation releases the 20.9 version ...". A full-text search for the token "lease" will not.

There is an expectation that a full-text search will support language-based searches, which substring search cannot. An example of a language-based search is "find me all the news items that contain a token with the same linguistic stem as 'mouse'" (which finds "mouse" and "mice"). Another example, based on token proximity, is "find me all the news items that contain the tokens 'XML' and 'Query', allowing up to 3 intervening words".

Full-text search must address the vagaries and nuances of language. Search results are often of varying usefulness. When you search a web site for cameras that cost less than $100, this is an exact search. There is a set of cameras that matches this search, and a set that does not. Similarly, when you do a string search across news items for "mouse", there is only one expected result set. When you do a full-text search for all the news items that contain the token "mouse", you probably expect to find news items containing the token "mice", and possibly "rodents", or possibly "computers". Not all results are equal. Some results are more "mousey" than others. Because full-text search may be inexact, we have the notion of score or relevance. We generally expect to see the most relevant results at the top of the results list.
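The first difference listed above, tokens versus substrings, can be illustrated with a minimal Python sketch. The tokenizer here is an assumption made for illustration only (runs of letters and digits, compared case-insensitively); the specification leaves tokenization implementation-defined, and the function names are hypothetical.

```python
import re

def tokenize(text):
    """Hypothetical sample tokenizer: runs of word characters, lowercased."""
    return [t.lower() for t in re.findall(r"\w+", text)]

def substring_search(text, needle):
    """Substring search: true if the characters appear anywhere in the text."""
    return needle.lower() in text.lower()

def token_search(text, token):
    """Token search: true only if a whole token equals the search token."""
    return token.lower() in tokenize(text)

news_item = "Foobar Corporation releases the 20.9 version"

# "lease" is a substring of "releases" but is not a token of the news item.
print(substring_search(news_item, "lease"))
print(token_search(news_item, "lease"))
```

Under these assumptions the substring search finds the news item, while the token search does not, because "releases" is a single token.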

As XQuery and XPath evolve, they may apply the notion of score to querying structured data. For example, when making travel plans or shopping for cameras, it is sometimes useful to get an ordered list of near matches in addition to exact matches. If XQuery and XPath define a generalized inexact match, we expect XQuery and XPath to utilize the scoring framework provided by XQuery and XPath Full-Text.

The following definitions apply to full-text search:

Full-text queries are performed on tokens and phrases. Tokens and phrases are produced via tokenization. Informally, tokenization breaks a character string into a sequence of words, units of punctuation, and spaces.

A token is defined as a character, n-gram, or sequence of characters returned by a tokenizer as a basic unit to be searched. Each instance of a token consists of one or more consecutive characters. Beyond that, tokens are implementation-defined. Note that consecutive tokens need not be separated by either punctuation or space, and tokens may overlap. A phrase is an ordered sequence of any number of tokens. Beyond that, phrases are implementation-defined.

In some natural languages, tokens and words can be used interchangeably.

Tokenization enables functions and operators that operate on a part or the root of the token (e.g., wildcards, stemming).

Tokenization enables functions and operators which work with the relative positions of tokens (e.g., proximity operators).

Tokenization also uniquely identifies sentences and paragraphs in which tokens appear. A sentence is an ordered sequence of any number of tokens. Beyond that, sentences are implementation-defined. A tokenizer is not required to support sentences. A paragraph is an ordered sequence of any number of tokens. Beyond that, paragraphs are implementation-defined. A tokenizer is not required to support paragraphs. Whatever a tokenizer for a particular language chooses to do, it must preserve the containment hierarchy: paragraphs contain sentences, which contain tokens.

The tokenizer has to process two codepoint-equal strings in the same way, i.e., it must identify the same tokens in each. Everything else about the behavior of the tokenizer is implementation-defined.

This specification focuses on functionality that serves all languages. It also selectively includes functionalities useful within specific families of languages. For example, searching within sentences and paragraphs is useful to many western languages and to some non-western languages, so that functionality is incorporated into this specification.

Some XML elements represent semantic markup, e.g., <title>. Others represent formatting markup, e.g., <b> to indicate bold. Semantic markup serves well as token boundaries. Some formatting markup serves well as token boundaries, for example, paragraphs are most commonly delimited by formatting markup. Other formatting markup may not serve well as token boundaries. Implementations are free to provide implementation-defined ways to differentiate between the markup's effect on token boundaries during tokenization.

Certain aspects of language processing are described in this specification as implementation-defined or implementation-dependent.

Implementation-defined indicates an aspect that may differ between implementations, but must be specified by the implementor for each particular implementation.

Implementation-dependent indicates an aspect that may differ between implementations, is not specified by this or any W3C specification, and is not required to be specified by the implementor for any particular implementation.

Organization of this document

This document is organized as follows. We first present a high level syntax for the XQuery 1.0 and XPath 2.0 Full-Text 1.0 language along with some examples. Then, we present the syntax and examples of the basic primitives in the XQuery 1.0 and XPath 2.0 Full-Text 1.0 language. This is followed by the semantics of the XQuery 1.0 and XPath 2.0 Full-Text 1.0 language. The appendix contains a section that provides an EBNF for the XPath 2.0 Grammar with Full-Text extensions, an EBNF for XQuery 1.0 Grammar with Full-Text extensions, acknowledgements and a glossary.

A word about namespaces

Certain namespace prefixes are predeclared by XQuery 1.0 and, by implication, by this specification, and bound to fixed namespace URIs. These namespace prefixes are as follows:

xml = http://www.w3.org/XML/1998/namespace

xs = http://www.w3.org/2001/XMLSchema

xsi = http://www.w3.org/2001/XMLSchema-instance

fn = http://www.w3.org/2005/xpath-functions

local = http://www.w3.org/2005/xquery-local-functions

In addition to the prefixes in the above list, this document uses the prefix err to represent the namespace URI http://www.w3.org/2005/xqt-errors. This namespace prefix is not predeclared and its use in this document is not normative. Error codes that are not defined in this document are defined in other XQuery 1.0 and XPath 2.0 specifications.

Finally, this document uses the prefix fts to represent a namespace containing a number of functions used in this document to describe the semantics of XQuery 1.0 and XPath 2.0 Full-Text functions. There is no requirement that these functions be implemented, therefore no URI is associated with that prefix.

Full-Text Extensions to XQuery and XPath

XQuery 1.0 and XPath 2.0 Full-Text extends the languages of XQuery 1.0 and XPath 2.0 in three ways. It:

Adds a new expression called FTContainsExpr;

Enhances the syntax of FLWOR expressions in XQuery 1.0 and for expressions in XPath 2.0 with optional score variables; and

Adds static context declarations for full-text match options to the query prolog.

Additionally, it extends the data model and processing models in various ways.

Processing Model

As part of the External Processing that is described in the XQuery Processing Model, when an XML document is parsed into an Infoset/PSVI and ultimately into an XQuery Data Model instance, a full-text process called tokenization is usually executed.

Tokenization, in general terms, is the process of converting a text string into smaller units that are used in query processing. Those units, called tokens, are the most basic text units that a full-text search can refer to. Full-text operators typically work on sequences of token occurrences found in the target text (nodes) of a search. These token occurrences are characterized by unique identifiers that capture the relative position of the token inside the string, the relative position of the sentence containing the token, and the relative position of the paragraph containing the token.

Tokenization, including the definition of the term "words", SHOULD be implementation-defined. Implementations SHOULD expose the rules and sample results of tokenization as much as possible to enable users to predict and interpret the results of tokenization. Tokenization MUST only conform to these constraints:

Each word MUST consist of one or more consecutive characters;

The tokenizer MUST preserve the containment hierarchy (e.g., paragraphs contain sentences, which contain words); and

The tokenizer MUST, when tokenizing two equal strings, identify the same tokens in each.

A sample tokenization is used for the examples in this document. The results might be different for other tokenizations.
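As a rough illustration of the token occurrences described above, the following Python sketch records, for each token, its relative position together with the relative positions of its sentence and paragraph. Everything about it is an assumption made for illustration: the name TokenOccurrence, the use of blank lines as paragraph breaks, and the use of '.', '!' and '?' as sentence terminators are not taken from this specification.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class TokenOccurrence:
    word: str      # the token itself (one or more consecutive characters)
    pos: int       # relative position of the token in the string
    sentence: int  # relative position of the containing sentence
    para: int      # relative position of the containing paragraph

def tokenize(text):
    """Hypothetical tokenizer: paragraphs contain sentences, which contain tokens."""
    occurrences = []
    pos = 0
    sentence = 0
    paragraphs = re.split(r"\n\s*\n", text.strip())
    for para_index, para in enumerate(paragraphs, start=1):
        for sent in re.split(r"[.!?]+", para):
            words = re.findall(r"\w+", sent)
            if not words:
                continue  # skip empty fragments left over by the split
            sentence += 1
            for word in words:
                pos += 1
                occurrences.append(TokenOccurrence(word, pos, sentence, para_index))
    return occurrences

sample = "Web sites matter. Test them!\n\nUsability counts."
for occ in tokenize(sample):
    print(occ)
```

Because the function is deterministic, tokenizing two equal strings yields the same tokens, and the nested loops keep the containment hierarchy intact by construction.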

A full-text contains expression, evaluated within the normal Query Processing (XQuery Processing Model), is composed of several parts:

An XPath 2.0 or XQuery 1.0 expression (RangeExpr) that specifies the sequence of items to be searched. Those items are called the search context.

The full-text selection to be applied. Full-text selections are, syntactically and semantically, fully composable and contain:

Required:

Words and phrases for which a search is performed.

Optional:

Match options, such as indicators for case sensitivity and stop words;

Boolean full-text operators, which compose a full-text selection from simpler full-text selections;

Other full-text operators that are constraints on the positions of matches, such as indicators for distance between tokens and for the cardinality of matches; and

The weighting information. Each individual search term in a full-text selection may be annotated with optional weight information. This information may be used during the evaluation of the full-text selections to calculate scoring information, which quantifies the relevance of the result to the given search criteria.

An optional XPath 2.0 or XQuery 1.0 expression (UnionExpr) that specifies the set of nodes, descendants of the RangeExpr, whose contents may be ignored for the purpose of determining a match during the search.

The results of the evaluation of the full-text selection operators are instances of the AllMatches model, which complements the XQuery Data Model (XDM) for processing full-text queries. An AllMatches instance describes all possible solutions to the full-text query for a given search context item. Each solution is described by a Match instance. A Match instance contains the tokens from the search context that must be included (described using StringInclude instances which model the positive terms) and the tokens from search context item that must be excluded (described using StringExclude instances which model the negative terms). Each negative or positive term is modeled as a tuple: the position of the query word or phrase in the full-text selection, and a TokenInfo structure that describes a consecutive sequence of token occurrences in the text string which match the query word or phrase.

Figure 1 provides a schematic overview of the XQuery 1.0 and XPath 2.0 Full-Text processing steps that are discussed in detail below. Some of these steps are completely outside the domain of XQuery; in Figure 1, these are depicted outside the black line that represents the boundaries of the language. The diagram shows only the central pieces of the XQuery Processing Model, but zooms in on the Execution Engine, where the processing of the Full-Text extensions takes place. The full-text processing steps are labeled as FTn within the diagram and are referenced within the text.

Like all XQuery expressions, an FTContainsExpr returns an XDM Instance (see Fig. 1). With the exception of FTWords, which consumes TokenInfos, all full-text selections are closed under the AllMatches data model, i.e., their input and output are AllMatches instances. Tokenization normally occurs at the time of parsing of the original XML documents, for example, during the Data Model Generation process (see Figure 1). But it may also occur "on-the-fly", transforming an XDM instance into TokenInfos, which ultimately get converted into AllMatches instances by the evaluation of full-text selections. Thus, the evaluation of nested full-text and XQuery expressions moves back and forth between these two models.

The AllMatches instance obtained by the evaluation of a Full Text expression is converted into a Boolean value before being returned to the enclosing XPath or XQuery operation, as follows. If at least one member of the disjunction contains only positive terms, then the value returned is true. If all members of the disjunction contain negative terms, the result is false.
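The conversion rule above can be sketched in Python as follows. The classes are a deliberately simplified stand-in for the AllMatches model (positive and negative terms are plain strings rather than StringInclude and StringExclude instances with TokenInfo structures), so this is an illustration of the rule only, not of the formal data model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Match:
    includes: List[str] = field(default_factory=list)  # positive terms
    excludes: List[str] = field(default_factory=list)  # negative terms

@dataclass
class AllMatches:
    matches: List[Match] = field(default_factory=list)  # the disjunction of solutions

def to_boolean(all_matches):
    """True if at least one Match has no negative terms, otherwise false."""
    return any(not m.excludes for m in all_matches.matches)

mixed = AllMatches([
    Match(includes=["dog"], excludes=["train"]),
    Match(includes=["dog", "cat"]),
])
only_negative = AllMatches([Match(includes=["dog"], excludes=["train"])])

print(to_boolean(mixed))
print(to_boolean(only_negative))
```

In this sketch an AllMatches instance with no members converts to false, since no member contains only positive terms.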

Weighting information, in an implementation-dependent fashion, may be used when calculating the scoring information computed and made available by FTContainsExpr to the optional score construct.

Given the components of a Full Text expression, the evaluation algorithm proceeds according to the following steps, also referenced in the processing model diagram as steps FTn (see Fig. 1):

Evaluate the search context expression, resulting in the set of search context items; (FT1 provides the evaluation of any XPath 2.0 or XQuery 1.0 expressions that generates or modifies the search context, as well as the query string(s) in a partially evaluated full-text selection)

Evaluate the (optional) ignore expression, resulting in the set of ignored nodes, and virtually delete the ignored nodes from the tree of search context nodes. (Included in FT1)

Apply the tokenization algorithm to query string(s).

For each search context item:

Apply the tokenization algorithm in order to extract potentially matching terms together with their positional information. This step results in a sequence of token occurrences.

Evaluate the simple "FTWord" operators in the full-text selection against the tokenized input. This results in a set of AllMatches instances. (FT3)

Evaluate the rest of the full-text selection operator tree in a bottom-up fashion. At each step, the AllMatches instances produced by the previous steps are given as input, and a new AllMatches instance is obtained as output. At each step, the FTMatchOptions control the semantics of the application of the FTWords operator. (FT4)

Convert the AllMatches instance into a Boolean value. (FT5)

The additional scoring information (also part of FT5) that is produced by the evaluation of the Full Text expression is implementation-dependent and is not specified in this document. The scoring information is made available at the same time the Boolean value is returned.
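The steps above can be sketched end to end in Python for the restricted case of phrases combined with ftand. Several simplifications are assumptions made for illustration: the search context is a list of strings, tokenization is a case-insensitive split into runs of word characters, a Match is a list of (start, end) token positions with no negative terms, and ftand is assumed to pair every match of one operand with every match of the other. The formal semantics of the specification remain authoritative.

```python
import re
from itertools import product

def tokenize(text):
    """Hypothetical sample tokenization: lowercased runs of word characters."""
    return [t.lower() for t in re.findall(r"\w+", text)]

def ft_words(query, tokens):
    """Evaluate one phrase: one Match per occurrence, as [(start, end)] positions."""
    q = tokenize(query)  # tokenize the query string
    n = len(q)
    if n == 0:
        return []
    return [[(i, i + n - 1)]
            for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == q]

def ft_and(left, right):
    """Assumed ftand: combine every Match on the left with every Match on the right."""
    return [l + r for l, r in product(left, right)]

def ft_contains(items, phrases):
    """True if some search context item satisfies all phrases (joined by ftand)."""
    for item in items:
        tokens = tokenize(item)                       # tokenize the item
        all_matches = ft_words(phrases[0], tokens)    # evaluate the words
        for phrase in phrases[1:]:                    # evaluate operators bottom-up
            all_matches = ft_and(all_matches, ft_words(phrase, tokens))
        if all_matches:  # no negative terms here, so any Match converts to true
            return True
    return False

titles = ["Improving the Usability of a Web Site Through "
          "Expert Reviews and Usability Testing"]
print(ft_contains(titles, ["web site", "usability"]))
print(ft_contains(titles, ["web site", "dog"]))
```

Scoring is omitted from the sketch because the scoring information is implementation-dependent.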

The syntax and the informal semantics of the Full Text operators are described later in this document, as are their formal semantics and the formal definition of the AllMatches data model.

Full-text Contains Expression

A full-text contains expression is an expression that evaluates a sequence of nodes against a full-text selection.

As a syntactic construct, a full-text contains expression (grammar symbol: FTContainsExpr) behaves like a comparison expression. This grammar rule introduces FTContainsExpr.

ComparisonExpr ::= FTContainsExpr ( (ValueComp
| GeneralComp
| NodeComp) FTContainsExpr )?

A full-text contains expression may be used anywhere a ComparisonExpr may be used. The ftcontains operator has higher precedence than other comparison operators, so the results of ftcontains expressions may be compared without enclosing them in parentheses.

Description

FTContainsExpr ::= RangeExpr ( "ftcontains" FTSelection FTIgnoreOption? )?

A full-text contains expression returns a Boolean value. It returns true if there is some node in the RangeExpr that, after tokenization, matches the full-text selection FTSelection. For the purpose of determining a match, certain descendants of nodes (identified by FTIgnoreOption) in the RangeExpr may be ignored, as specified later in this document.

An XQuery 1.0 and XPath 2.0 Full-Text processor SHOULD try to use the information available in xml:lang for processing of collations, as well as the various match options defined later in this document.

Examples

The following example in XQuery 1.0 Full-Text returns the author of each book with a title containing a token with the same root as dog and the token cat.

for $b in /books/book
where $b/title ftcontains ("dog" with stemming) ftand "cat"
return $b/author

The same example in XPath 2.0 Full-Text is written as:

/books/book[title ftcontains ("dog" with stemming) ftand "cat"]/author

This example selects books where either the title contains the token dog and the token cat and the content does not contain a token with the same root as train, or where the title fails to have one of the matching tokens but the content does:

/books/book[title ftcontains "dog" ftand "cat" ne content ftcontains ("train" with stemming)]

Score Variables

Besides specifying a match of a full-text search as a Boolean condition, full-text search applications typically also have the ability to associate scores with the results. Scores express the relevance of those results to the full-text search conditions.

XQuery 1.0 and XPath 2.0 Full-Text extends the languages of XQuery 1.0 and XPath 2.0 further by adding optional score variables to the for and let clauses of FLWOR expressions.

The production for the extended for clause follows.

ForClause ::= "for" "$" VarName TypeDeclaration? PositionalVar? FTScoreVar? "in" ExprSingle ("," "$" VarName TypeDeclaration? PositionalVar? FTScoreVar? "in" ExprSingle)*
FTScoreVar ::= "score" "$" VarName

When a score variable is present in a for clause, the evaluation of the expression following the in keyword must determine not only the result sequence of the expression, i.e., the sequence of items that are iteratively bound to the for variable. It must also determine, in each iteration, the relevance "score" value of the current item and bind the score variable to that value.

The semantics of scoring, and how it relates to second-order functions, is discussed later in this document.

In the following example, book elements are determined that satisfy the condition [content ftcontains "web site" ftand "usability" and .//chapter/title ftcontains "testing"]. The scores assigned to the book elements are returned.

for $b score $s in /books/book[content ftcontains "web site" ftand "usability" and .//chapter/title ftcontains "testing"]
return $s

XPath 2.0 Full-Text extends the language of XPath 2.0 in the for expression in the same way: with optional score variables. The example above is also a legal example of the XPath 2.0 extension.

Scores are typically used to order results, as in the following, more complete example.

for $b score $s in /books/book[content ftcontains "web site" ftand "usability"]
where $s > 0.5
order by $s descending
return <result> <title> {$b//title} </title> <score> {$s} </score> </result>

Note that the score applies to the entire for expression. In the following example, two separate full-text contains expressions are used to select the matching paragraphs. There is still just one score for each para returned. The highest scoring paragraphs will be returned first:

for $p score $s in //book[title ftcontains "software"]/para[. ftcontains "usability"] order by $s descending return $p

The following more elaborate example uses multiple score variables to return the matching paragraphs ordered so that those from the highest scoring books precede those from the lowest scoring books, where the highest scoring paragraphs of each book are returned before the lower scoring paragraphs of that book:

for $b score $score1 in //book[title ftcontains "software"]
order by $score1 descending
return
for $p score $score2 in $b/para[. ftcontains "usability"]
order by $score2 descending
return $p

The score variable is bound to a value which reflects the relevance of the match criteria in the full-text selections to the nodes in the respective RangeExprs. The calculation of relevance is implementation-dependent, but score evaluation must follow these rules:

Score values are of type xs:double in the range [0, 1].

For score values greater than 0, a higher score must imply a higher degree of relevance.

Similarly to their use in a for clause, score variables may be specified in a let clause. A score variable in a let clause is also bound to the score of the expression evaluation, but in the let clause one score is determined for the complete result. The let variable may be dropped from the let clause, if the score variable is present.

The production for the extended let clause follows.

LetClause ::= (("let" "$" VarName TypeDeclaration?) | ("let" "score" "$" VarName)) ":=" ExprSingle ("," (("$" VarName TypeDeclaration?) | FTScoreVar) ":=" ExprSingle)*

When the score option is used in a for clause, the expression following the in keyword has the dual purpose of filtering, i.e., driving the iteration, and determining the scores. It is possible to specify the expressions for filtering and scoring separately by combining a simple for clause with a let clause that uses scoring. The following is an example of this.

for $b in /books/book[.//chapter/title ftcontains "testing"]
let score $s := $b/content ftcontains "web site" ftand "usability"
order by $s descending
return <result score="{$s}">{$b}</result>

This example returns book elements with chapter titles that contain "testing". Scores are returned along with the book elements. These scores, however, reflect whether the book content contains "web site" and "usability".

Note that the score of an FTContainsExpr is not required to be 0 if the expression evaluates to false, nor to be non-zero if the expression evaluates to true. Hence, in the example above it is not possible to infer the Boolean value of the FTContainsExpr in the let clause from the calculated score of a returned result element. For instance, an implementation may want to assign a non-zero score to a book that contains only "web site", but not "usability", as this may be considered more relevant than a book that contains neither.

The expression ExprSingle assigned to the score variable is passed to the scoring algorithm and is not evaluated directly. The scoring algorithm calculates the score value based on the passed expression (not on the value returned by evaluating the expression). The set of supported expressions is implementation-defined.

The use of score variables introduces a second-order aspect to the evaluation of expressions which cannot be emulated by (first-order) XQuery functions. Consider replacing the clause

let score $s := FTContainsExpr

with the clause

let $s := score(FTContainsExpr)

where a function score is applied to some FTContainsExpr. If the function score were first-order, it would only be applied to the result of the evaluation of its argument, which is one of the Boolean constants true or false. Hence, there would be at most two possible values such a score function would be able to return and no further differentiation would be possible.
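The limitation can be illustrated with a small Python sketch. Both scoring functions are hypothetical: the first receives only the evaluated Boolean, while the second receives the unevaluated search terms together with the text and scores by the fraction of terms found, a formula invented here purely for illustration, since the specification leaves score calculation implementation-dependent.

```python
def ft_and(terms, text):
    """Boolean result: true only if every term occurs as a token of the text."""
    tokens = text.lower().split()
    return all(t in tokens for t in terms)

def first_order_score(boolean_result):
    """Sees only true or false, so it can return at most two distinct values."""
    return 1.0 if boolean_result else 0.0

def second_order_score(terms, text):
    """Sees the expression itself; hypothetical formula: fraction of terms found."""
    tokens = text.lower().split()
    found = sum(1 for t in terms if t in tokens)
    return found / len(terms)

terms = ["web", "usability"]
doc_a = "usability testing for beginners"  # contains one of the two terms
doc_b = "gardening for beginners"          # contains neither term

# Both documents fail the Boolean test, so a first-order score cannot separate them.
print(first_order_score(ft_and(terms, doc_a)), first_order_score(ft_and(terms, doc_b)))
# A scorer that is passed the expression can rank doc_a above doc_b.
print(second_order_score(terms, doc_a), second_order_score(terms, doc_b))
```

The sketch also shows why a score need not be 0 when the expression evaluates to false: doc_a fails the Boolean test yet receives a non-zero score from the second function.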

Using Weights Within a Scored FTContainsExpr

Scoring may be influenced by adding weight declarations to search tokens, phrases, and expressions. Syntactically, weight declarations are introduced in the FTSelection production, described in the section on full-text selections.

The effect of weights on the result score is implementation-dependent. However, weight declarations must follow these rules:

Weights in an FTContainsExpr are significant only in relation to each other.

When no explicit weight is specified, the default weight is 1.0.

The weight must be between 0.0 and 1000.0 inclusive.

Weight declarations in an FTContainsExpr for which no scores are evaluated are ignored.
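The weight rules above can be sketched in Python as follows. The function names are hypothetical, ValueError stands in for whatever error an implementation would raise, and normalizing the weights so that they sum to 1 is only one possible reading of "significant only in relation to each other"; the actual effect of weights on scores is implementation-dependent.

```python
DEFAULT_WEIGHT = 1.0
MIN_WEIGHT, MAX_WEIGHT = 0.0, 1000.0

def resolve_weight(weight=None):
    """Apply the default weight and enforce the permitted range."""
    if weight is None:
        return DEFAULT_WEIGHT
    value = float(weight)
    if not MIN_WEIGHT <= value <= MAX_WEIGHT:
        # ValueError is a placeholder for the implementation's own error.
        raise ValueError(f"weight {value} is outside [0.0, 1000.0]")
    return value

def relative_weights(weights):
    """Hypothetical normalization: express each weight as a share of the total."""
    resolved = [resolve_weight(w) for w in weights]
    total = sum(resolved)
    if total == 0:
        return [0.0 for _ in resolved]
    return [w / total for w in resolved]

# "web site" weight 0.5 and "usability" weight 2
print(relative_weights([0.5, 2]))
# An unweighted term next to a weighted one takes the default weight 1.0.
print(relative_weights([None, 3]))
```

Under this reading, multiplying every weight in an FTContainsExpr by the same factor leaves the relative shares unchanged.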

The following example illustrates how different weights can be used for different search terms.

for $b in /books/book
let score $s := $b/content ftcontains ("web site" weight 0.5) ftand ("usability" weight 2)
return <result score="{$s}">{$b}</result>

Extensions to the Static Context

The XQuery Static Context is extended by a component for each of the full-text match options. Thus, the default of a match option in a query may be changed by providing a setting in the static context using the following declaration syntax.

Prolog ::= ((DefaultNamespaceDecl | Setter | NamespaceDecl | Import) Separator)* ((VarDecl | FunctionDecl | OptionDecl | FTOptionDecl) Separator)*
FTOptionDecl ::= "declare" "ft-option" FTMatchOptions

Match options modify the match semantics of full-text expressions. They are described in detail later in this document. When a match option is specified explicitly in a query, that setting overrides the setting of the respective match option in the static context.

Full-Text Selections

This section describes the full-text selections which contain the full-text operators in a full-text contains expression (FTContainsExpr), as well as the match options which modify the matching semantics of the full-text selections. In the following the syntax for each type of full-text selection is given together with an informal statement of its meaning.

A full-text selection specifies the possible full-text search conditions.

FTSelection ::= FTOr FTPosFilter* ("weight" RangeExpr)?

As shown in the grammar, a full-text selection consists of search conditions possibly involving logical operators (FTOr) followed by an arbitrary number of positional filters (FTPosFilter) optionally followed by a "weight" value which is specified using a range expression. The RangeExpr is evaluated, as if it were an argument to a function with an expected type "xs:double"; it must be between 0.0 and 1000.0 inclusive.

The syntax and semantics of the individual full-text selection operators follow.

This XML document fragment is the source document for examples in this section.

<book number="1"> <title shortTitle="Improving Web Site Usability">Improving the Usability of a Web Site Through Expert Reviews and Usability Testing</title> <author>Millicent Marigold</author> <author>Montana Marigold</author> <editor>Véra Tudor-Medina</editor> <content> <p>The usability of a Web site is how well the site supports the users in achieving specified goals. A Web site should facilitate learning, and enable efficient and effective task completion, while propagating few errors. </p> <note>This book has been approved by the Web Site Users Association. </note> </content> </book>

Tokenization is implementation-defined. A sample tokenization is used for the examples in this section. This sample tokenization uses white space, punctuation and XML tags as word-breakers and <p> for paragraph boundaries. The results may be different for other tokenizations.

The first five tokens in this example using the sample tokenization would be "Improving", "the", "Usability", "of", and "a".
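The sample tokenization can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the function name `tokenize` is hypothetical, a run of word characters is taken to be one token, and paragraph boundaries are not modelled. An actual implementation is free to tokenize differently.

```python
import re


def tokenize(text):
    """Split text into tokens, using white space and punctuation as word-breakers.

    Assumption: a token is a maximal run of letters, digits, or underscores.
    """
    return re.findall(r"\w+", text)


# String value of the title element of the sample document.
title = ("Improving the Usability of a Web Site Through Expert Reviews "
         "and Usability Testing")

print(tokenize(title)[:5])
# ['Improving', 'the', 'Usability', 'of', 'a']
```

Under this sketch a hyphen is a word-breaker, so "Tudor-Medina" yields the two tokens "Tudor" and "Medina".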

Unless stated otherwise, the results assume a case-insensitive match.

Primary Full-Text Selections

FTPrimary ::= (FTWords FTTimes?) | ("(" FTSelection ")") | FTExtensionSelection

A primary full-text selection is the basic form of a full-text selection. It specifies words and phrases as search conditions (FTWords), optionally followed by a cardinality constraint (FTTimes). An FTSelection in parentheses is also a primary full-text selection.

Search Tokens and Phrases

FTWords ::= FTWordsValue FTAnyallOption?

FTWordsValue ::= Literal | ("{" Expr "}")

FTAnyallOption ::= ("any" "word"?) | ("all" "words"?) | "phrase"

FTWords finds matches that contain the specified tokens and phrases.

FTWords consists of two parts: a mandatory FTWordsValue part and an optional FTAnyallOption part. FTWordsValue specifies the tokens and phrases that must be contained in the matches. FTAnyallOption specifies how containment is checked.

The FTWordsValue is converted as though it were an argument to a function with the expected type of "xs:string*".

In general, the tokens and phrases in FTWordsValue are specified using a nested XQuery expression. To simplify notation, the enclosing braces may be omitted if FTWordsValue consists of a single literal.

The following rules specify how the containment of the strings from the FTWordsValue sequence is checked. First, every string is tokenized into a sequence of tokens as described in Section 4.1 Tokenization. Then, FTAnyallOption is checked.

If FTAnyallOption is "any", the sequence of tokens for every string is considered as a phrase, i.e. the tokens must occur consecutively in the text in the specified order. If the sequence contains more than one string, the different strings are considered to be alternatives, i.e. the resulting matches must contain at least one of the generated phrases.

If FTAnyallOption is "all", the sequence of tokens for every string is considered as a phrase. The resulting matches must contain all of the generated phrases.

If FTAnyallOption is "phrase", the tokens from all the strings are concatenated in a single sequence, which is considered as a phrase. The resulting matches must contain the generated phrase.

If FTAnyallOption is "any word", the tokens from all the strings are combined into a single set. The resulting matches must contain at least one of the tokens in the set.

If FTAnyallOption is "all words", the tokens from all the strings are combined into a single set. The resulting matches must contain all of the tokens in the set.

If the FTWordsValue evaluates to a single string, the use of "any", "all", and "phrase" in FTAnyallOption produces the same results.

If FTAnyallOption is omitted, "any" is the default.
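The five FTAnyallOption settings can be compared side by side with a small Python model. This is a sketch under stated assumptions, not the formal semantics: the names `tokenize`, `contains_phrase`, and `ft_words` are hypothetical, matching is case-insensitive, and the function only reports whether a match exists rather than constructing the matches themselves.

```python
import re


def tokenize(text):
    """Sample tokenization (assumption): lower-cased runs of word characters."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def contains_phrase(text_tokens, phrase_tokens):
    """True if phrase_tokens occur consecutively, in order, in text_tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(text_tokens[i:i + n] == phrase_tokens
               for i in range(len(text_tokens) - n + 1))


def ft_words(text, strings, option="any"):
    """Check containment of the FTWordsValue strings under an FTAnyallOption."""
    text_tokens = tokenize(text)
    phrases = [tokenize(s) for s in strings]        # one phrase per string
    if option == "any":
        return any(contains_phrase(text_tokens, p) for p in phrases)
    if option == "all":
        return all(contains_phrase(text_tokens, p) for p in phrases)
    if option == "phrase":
        combined = [t for p in phrases for t in p]  # one long phrase
        return contains_phrase(text_tokens, combined)
    words = {t for p in phrases for t in p}         # a single set of tokens
    if option == "any word":
        return any(w in text_tokens for w in words)
    if option == "all words":
        return all(w in text_tokens for w in words)
    raise ValueError("unknown FTAnyallOption: " + option)


title = ("Improving the Usability of a Web Site Through Expert Reviews "
         "and Usability Testing")
p = ("The usability of a Web site is how well the site supports the users "
     "in achieving specified goals.")

print(ft_words(title, ["Expert", "Reviews"], "all"))      # True
print(ft_words(p, ["Web Site Usability"], "any"))         # False
print(ft_words(p, ["Web Site Usability"], "all words"))   # True
```

The last two calls show the difference between phrase matching and token-set matching: the p element contains all three tokens, but not consecutively in the given order.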

The following expression returns the book element whose number is 1, because its title element contains the token "Expert":

/book[@number="1" and ./title ftcontains "Expert"]

The following expression returns the book element whose number is 1, because its title element contains the phrase "Expert Reviews":

/book[@number="1" and ./title ftcontains "Expert Reviews"]

The following expression returns the book element whose number is 1, because its title element contains two tokens "Expert" and "Reviews":

/book[@number="1" and ./title ftcontains {"Expert", "Reviews"} all]

The following expression returns false, because the p element doesn't contain the phrase "Web Site Usability" although it contains all of the tokens in the phrase:

/book[@number="1"]//p ftcontains "Web Site Usability"

The following expression returns book numbers of book elements by "Marigold" with a title about "Web Site Usability", sorting them in descending score order:

for $book in /book[.//author ftcontains "Marigold"]
let score $score := $book/title ftcontains "Web Site Usability"
where $score > 0.8
order by $score descending
return $book/@number

Match Options

Full-text match options modify the matching behaviour of the primary full-text selection to which they are applied.

Match options modify the set of tokens in the query, or how they are matched against tokens in the text.

Each of the seven alternatives of production FTMatchOption corresponds to one match option group. The match options from any given group are mutually exclusive, i.e., only one of these settings can be in effect, whereas match options of different groups can be combined freely.

Note that, along with the syntax rules above, there is an extra-grammatical constraint, multiple-match-options, which must be observed when multiple match options are specified. It states that within a single FTMatchOptions at most one match option of any given match option group may be specified. For example, if the FTCaseOption "lowercase" is specified, then "uppercase" cannot also be specified as part of the same FTMatchOptions.

Although match options only take effect in the application of FTWords, the syntax also allows match options to be specified on the non-primitive full-text selection "(" FTSelection ")". Such a higher-level match option provides a default for the respective match option group for any embedded FTPrimary, just as the static context components corresponding to the match option groups provide default match options for the whole query. Details about these context components, including their default values, are given in Appendix .

In other words, there is a tuple of seven effective match options, one from each group, which are propagated from top to bottom in the query syntax tree. For the top-level query the seven values are given by the static context, and at each FTPrimary the match options specified locally (like postfix operators) may override these propagated values. Thus, any occurrence of an FTWords in a query is associated with seven effective match options, one from each group, that influence its matching.
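The top-down propagation of effective match options can be pictured as a chain of dictionary overrides. The following Python sketch is illustrative only: the names `STATIC_CONTEXT` and `effective_options` are hypothetical, the group keys are invented labels, and the default language "de" is just an assumed implementation default.

```python
# One entry per match option group; the values play the role of the static context.
STATIC_CONTEXT = {
    "case": "case insensitive",
    "diacritics": "diacritics insensitive",
    "stemming": "without stemming",
    "thesaurus": "without thesaurus",
    "stop words": "without stop words",
    "language": 'language "de"',   # assumed implementation-defined default
    "wildcards": "without wildcards",
}


def effective_options(*levels):
    """Merge locally specified options, outermost level first.

    Each level maps a match option group to the option specified at that
    level of the query syntax tree; inner levels override outer ones.
    """
    merged = dict(STATIC_CONTEXT)
    for level in levels:
        unknown = set(level) - set(STATIC_CONTEXT)
        if unknown:
            raise KeyError("unknown match option group: " + ", ".join(sorted(unknown)))
        merged.update(level)
    return merged


# ("usability" case sensitive ftor "testing") with stemming
outer = {"stemming": "with stemming"}    # on the parenthesized selection
inner = {"case": "case sensitive"}       # on the FTWords "usability"

for group, option in effective_options(outer, inner).items():
    print(group, "->", option)
```

In this model the FTWords "usability" ends up with stemming enabled and case-sensitive matching, while the remaining five groups keep their static context values.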

The order in which effective match options for an FTWords are applied is subject to some constraints:

The Language Option must be applied first

The Stemming Option must be applied before the Case Option and the Diacritics Option

Aside from these constraints, the full order of the application of match options is implementation-defined. This order is called the match option application order.

More information on their semantics is given in .

If no match option declarations are present in the prolog and the implementation does not define any overriding of the static context components for the match options, the query:

/book/title ftcontains "usability"

is, assuming "de" is the implementation-defined default language, equivalent to the query:

/book/title ftcontains "usability" case insensitive diacritics insensitive without stemming without thesaurus without stop words language "de" without wildcards

We describe each match option group in more detail in the following sections.

Case Option

FTCaseOption ::= ("case" "insensitive")
| ("case" "sensitive")
| "lowercase"
| "uppercase"

A case option modifies the matching of tokens and phrases by specifying how uppercase and lowercase characters are considered.

There are four possible character case options:

Using the option "case insensitive" tokens and phrases are matched, regardless of the case of characters of the query tokens and phrases.

Using the option "case sensitive" tokens and phrases are matched, if and only if the case of their characters is the same as written in the query.

Using the option "lowercase" tokens and phrases are matched, if and only if they match the query without regard to character case, but contain only lowercase characters.

Using the option "uppercase" tokens and phrases are matched, if and only if they match the query without regard to character case, but contain only uppercase characters.

The default is "case insensitive".

The following table summarizes the interactions between the case match options and the use of the default collation.

Case Matrix
Default collation options/Case options	UCC (Unicode Codepoint Collation)	CCS (some generic case-sensitive collation)	CCI (some generic case-insensitive collation)
insensitive	compare as if both lower	case-insensitive variant of CCS if it exists, else error	CCI
sensitive	UCC	CCS	case-sensitive variant of CCI if it exists, else error
lowercase	lowercase(Expr) + UCC	lowercase(Expr) + CCS	CCI
uppercase	uppercase(Expr) + UCC	uppercase(Expr) + CCS	CCI

In this table, "else error" means "Otherwise, an error is raised: ". The phrase "if it exists" is used, because the case-sensitive collation CCS does not always have a case-insensitive variant (and, even if one exists, it may not be possible to determine it algorithmically), and because the case-insensitive collation CCI does not always have a case-sensitive variant (and, even if one exists, it may not be possible to determine it algorithmically).

Using the "lowercase" (respectively "uppercase") option is equivalent to using the option "case sensitive", while converting the query strings to their lowercase (respectively uppercase) form before matching.
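The four case options can be modelled for the Unicode Codepoint Collation column of the table as follows. This is a minimal sketch: the function name `case_match` is hypothetical, it compares single tokens only, and it does not attempt to model the other collation columns or the error cases.

```python
def case_match(query_token, text_token, option="case insensitive"):
    """Compare one query token with one text token under a case option."""
    if option == "case insensitive":
        # "compare as if both lower"
        return query_token.lower() == text_token.lower()
    if option == "case sensitive":
        return query_token == text_token
    if option == "lowercase":
        # Same as "case sensitive" after lower-casing the query string.
        return query_token.lower() == text_token
    if option == "uppercase":
        # Same as "case sensitive" after upper-casing the query string.
        return query_token.upper() == text_token
    raise ValueError("unknown case option: " + option)


# The title of the sample document contains the token "Usability".
print(case_match("Usability", "Usability", "lowercase"))   # False
print(case_match("usability", "Usability"))                # True
```

The first call is false because the text token is not written entirely in lowercase characters, even though it equals the query token as typed.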

The following expression returns false, because the title element doesn't contain "usability" in lower-case characters:

/book[@number="1"]/title ftcontains "Usability" lowercase

The following expression returns true, because the character case is not considered:

/book[@number="1"]/title ftcontains "usability" case insensitive

Diacritics Option

FTDiacriticsOption ::= ("diacritics" "insensitive")
| ("diacritics" "sensitive")

A diacritics option modifies token and phrase matching by specifying how diacritics are considered.

There are two possible diacritics options:

The option "diacritics insensitive" matches tokens and phrases with and without diacritics. Whether diacritics are written in the query or not is not considered.

The option "diacritics sensitive" matches tokens and phrases only if they contain the diacritics as they are written in the query.

The default is "diacritics insensitive".

The following table summarizes the interactions between the diacritics match options and the use of the default collations.

Diacritics Matrix
Default collation options/Diacritics options	UCC (Unicode Codepoint Collation)	CDS (some generic diacritics-sensitive collation)	CDI (some generic diacritics-insensitive collation)
insensitive	UCC comparison, but without considering diacritics	diacritics-insensitive variant of CDS if it exists, else error	CDI
sensitive	UCC	CDS	diacritics-sensitive variant of CDI if it exists, else error

In this table, "else error" means "Otherwise, an error is raised: ". The phrase "if it exists" is used, because the diacritics-sensitive collation CDS does not always have a diacritics-insensitive variant (and, even if one exists, it may not be possible to determine it algorithmically), and because the diacritics-insensitive collation CDI does not always have a diacritics-sensitive variant (and, even if one exists, it may not be possible to determine it algorithmically).
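For the Unicode Codepoint Collation column, one possible way to ignore diacritics is to decompose each token and drop the combining marks. The Python sketch below takes that approach. The names `strip_diacritics` and `diacritics_match` are hypothetical, and using Unicode normalization for this purpose is an assumption rather than something this document prescribes.

```python
import unicodedata


def strip_diacritics(token):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def diacritics_match(query_token, text_token, option="diacritics insensitive"):
    """Compare one query token with one text token under a diacritics option."""
    if option == "diacritics insensitive":
        return strip_diacritics(query_token) == strip_diacritics(text_token)
    if option == "diacritics sensitive":
        # Normalize to NFC so that composed and decomposed spellings compare equal.
        return (unicodedata.normalize("NFC", query_token)
                == unicodedata.normalize("NFC", text_token))
    raise ValueError("unknown diacritics option: " + option)


print(diacritics_match("Vera", "V\u00e9ra"))                          # True
print(diacritics_match("Vera", "V\u00e9ra", "diacritics sensitive"))  # False
```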

The following expression returns true, because the token "Véra" in the editor element is matched, as the acute accent is not considered in the comparison:

/book[@number="1"]//editor ftcontains "Vera" diacritics insensitive

This returns false, because the editor element does not contain the token "Vera" in this exact form, i.e. without any diacritics:

/book[@number="1"]/editor ftcontains "Vera" diacritics sensitive

Stemming Option

FTStemOption ::= ("with" "stemming") | ("without" "stemming")

A stemming option modifies token and phrase matching by specifying whether stemming is applied or not.

The "with stemming" option specifies that matches may contain tokens that have the same stem as the tokens and phrases written in the query. It is implementation-defined what a stem of a token is.

The "without stemming" option specifies that the tokens and phrases are not stemmed.

It is implementation-defined whether the stemming is based on an algorithm, dictionary, or mixed approach.

The default is "without stemming".

The following expression returns true, because the title of the specified book contains "improving" which has the same stem as "improve":

/book[@number="1"]/title ftcontains "improve" with stemming

Thesaurus Option

FTThesaurusOption ::= ("with" "thesaurus" (FTThesaurusID | "default"))
| ("with" "thesaurus" "(" (FTThesaurusID | "default") ("," FTThesaurusID)* ")")
| ("without" "thesaurus")

FTThesaurusID ::= "at" URILiteral ("relationship" StringLiteral)? (FTRange "levels")?

URILiteral ::= StringLiteral

A thesaurus option modifies token and phrase matching by specifying whether a thesaurus is used or not. If thesauri are used, the thesaurus option specifies information to locate the thesauri either by default or through a URI reference. It also states the relationship to be applied and how many levels within the thesaurus to be traversed.

The value of the FTThesaurusID must be a URILiteral.

Thesauri add related tokens and phrases to the search. Thus, the user may narrow, broaden, or otherwise modify the search using synonyms, hypernyms (more generic terms), etc. The search is performed as though the user has specified all related search tokens and phrases in a disjunction (FTOr).

A thesaurus may be standards-based or locally-defined. It may be a traditional thesaurus, or a taxonomy, soundex, ontology, or topic map. How the thesaurus is represented is implementation-dependent.

FTThesaurusID specifies the relationship sought between tokens and phrases written in the query and terms in the thesaurus and the number of levels to be queried in hierarchical relationships by including an FTRange "levels". If no levels are specified, the default is to query all levels in hierarchical relationships.

Relationships include, but are not limited to, the relationships and their abbreviations presented in and their equivalents in other languages. The set of relationships supported by an implementation is implementation-defined, but implementations SHOULD support the relationships defined in . The following list of terms have the meanings defined in . If a query specifies thesaurus relationships or levels not supported by the thesaurus, the behavior is implementation-defined.

equivalence relationships (synonyms): PREFERRED TERM (USE), NONPREFERRED USED FOR TERM (UF);

hierarchical relationships: BROADER TERM (BT), NARROWER TERM (NT), BROADER TERM GENERIC (BTG), NARROWER TERM GENERIC (NTG), BROADER TERM PARTITIVE (BTP), NARROWER TERM PARTITIVE (NTP), TOP Terms (TT); and

associative relationships: RELATED TERM (RT).

The "with thesaurus" option specifies that string matches include tokens that can be found in one of the specified thesauri.

The "without thesaurus" option specifies that no thesaurus will be used.

The "with thesaurus default" option specifies that a system-defined default thesaurus with a system-defined relationship is used. The default thesaurus may be used in combination with other explicitly specified thesauri.

The default is "without thesaurus".
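Thesaurus expansion into a disjunction of related terms, limited to a number of levels, can be sketched in Python as a breadth-first traversal. Everything here is an assumption made for illustration: the function name `expand`, the dictionary representation of the thesaurus, and the sample entries are all hypothetical, and only an "at most N levels" range is modelled.

```python
def expand(term, thesaurus, relationship, max_levels=None):
    """Return the term followed by all terms related to it.

    thesaurus maps a relationship name to a dict of term -> related terms.
    max_levels limits how many levels are traversed; None means all levels.
    """
    related = thesaurus.get(relationship, {})
    found = [term]
    frontier = [term]
    level = 0
    while frontier and (max_levels is None or level < max_levels):
        next_frontier = []
        for current in frontier:
            for candidate in related.get(current, []):
                if candidate not in found:
                    found.append(candidate)
                    next_frontier.append(candidate)
        frontier = next_frontier
        level += 1
    return found


# Hypothetical thesaurus content, keyed by relationship abbreviation.
sample = {
    "NT": {
        "web site components": ["navigation", "layout"],
        "navigation": ["breadcrumbs"],
        "breadcrumbs": ["trail"],
    },
}

# Terms searched as alternatives for: "web site components" ... relationship "NT" at most 2 levels
print(expand("web site components", sample, "NT", max_levels=2))
# ['web site components', 'navigation', 'layout', 'breadcrumbs']
```

The returned list corresponds to the operands of the implied FTOr. With `max_levels=None` the traversal continues through all levels, which mirrors the default when no levels are specified.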

The following expression returns true, because it finds a content element containing "tasks" which the thesaurus identified as a synonym for "duties":

count(.//book/content ftcontains "duties" with thesaurus at "http://bstore1.example.com/UsabilityThesaurus.xml" relationship "UF")>0

The following expression returns book elements, because it finds a content element containing "web site components", and narrower terms "navigation" and "layout":

doc("http://bstore1.example.com/full-text.xml") /books/book[count(./content ftcontains "web site components" with thesaurus at "http://bstore1.example.com/UsabilityThesaurus.xml" relationship "NT" at most 2 levels)>0]

Assuming that there is a locally defined thesaurus that contains soundex capabilities, the following query returns a book element containing "Marigold", which sounds like "Merrygould":

doc("http://bstore1.example.com/full-text.xml") /books/book[count(. ftcontains "Merrygould" with thesaurus at "http://bstore1.example.com/UsabilitySoundex.xml" relationship "sounds like")>0]

Stop Word Option

FTStopwordOption ::= ("with" "stop" "words" FTRefOrList FTInclExclStringLiteral*)
| ("without" "stop" "words")
| ("with" "default" "stop" "words" FTInclExclStringLiteral*)

FTRefOrList ::= ("at" URILiteral)
| ("(" StringLiteral ("," StringLiteral)* ")")

FTInclExclStringLiteral ::= ("union" | "except") FTRefOrList

A stop word option controls word matching by specifying whether stop words are used or not. Stop words are tokens in the query that match any token in the text. Normally a stop word matches exactly one token, but there may be implementation-defined conditions, under which a stop word may match a different number of tokens.

FTRefOrList specifies the list of stop words either explicitly as a comma-separated list of string literals, or by the keyword at followed by a literal URI. If the URI specifies a list of stop words that is not found in the statically known stop word lists, an error is raised . Whether the stop word list is resolved from the statically known stop word lists or given explicitly, no tokenization is performed on the stop words: they are used as they occur in the sequence.

The "with stop words" option specifies that if a token is within the specified collection of stop words, it is removed from the search and any token may be substituted for it. Stop words retain their position numbers and are counted in FTDistance and FTWindow searches.

Multiple stop word lists may be combined using "union" or "except". The keywords "union" and "except" are applied from left to right. If "union" is specified, every string occurring in the lists specified by the left-hand side or the right-hand side is a stop word. If "except" is specified, only strings occurring in the list specified by the left-hand side but not in the list specified by the right-hand side are stop words.

The "with default stop words" option specifies that an implementation-defined collection of stop words is used.

The "without stop words" option specifies that no stop words are used. This is equivalent to specifying an empty list of stop words.

The default is "without stop words".

Stop word lists may be applied during indexing. If a stop word list was applied during indexing, requesting that stop words not be used in a query has no effect.
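The combination of stop word lists and the effect of a stop word on phrase matching can be sketched in Python. This is a simplified model under stated assumptions: the names `tokenize`, `combine`, and `contains_phrase` are hypothetical, stop word lists are assumed to be lower case, a stop word is assumed to match exactly one token, and stemming is not modelled.

```python
import re


def tokenize(text):
    """Sample tokenization (assumption): lower-cased runs of word characters."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def combine(base, *operations):
    """Apply ("union" | "except", list) pairs to a stop word list, left to right."""
    result = set(base)
    for operator, words in operations:
        if operator == "union":
            result |= set(words)
        elif operator == "except":
            result -= set(words)
        else:
            raise ValueError("unknown operator: " + operator)
    return result


def contains_phrase(text, phrase, stop_words=()):
    """Phrase match in which each stop word in the query matches any one token."""
    text_tokens = tokenize(text)
    # A stop word in the query keeps its position but becomes a placeholder (None).
    query = [None if t in stop_words else t for t in tokenize(phrase)]
    if not query:
        return False
    n = len(query)
    for i in range(len(text_tokens) - n + 1):
        window = text_tokens[i:i + n]
        if all(q is None or q == w for q, w in zip(query, window)):
            return True
    return False


stop_list = ["the", "then"]      # hypothetical list contents
query = "planning then conducting"

print(contains_phrase("planning and conducting", query, combine(stop_list)))   # True
print(contains_phrase("planning and conducting", query,
                      combine(stop_list, ("except", ["then"]))))               # False
```

Only query tokens are checked against the stop word list. Tokens in the text are never treated as stop words, so the placeholder can be matched by any token at that position.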

The following expression returns true, because the document contains the phrase "propagating few errors":

/book[@number="1"]//p ftcontains "propagation of errors" with stemming with stop words ("a", "the", "of")

Note the asymmetry in the stop word semantics: the property of being a stop word is only relevant to query terms, not to document terms. Hence, it is irrelevant for the above-mentioned match whether "few" is a stop word or not, and on the other hand we do not want the query above to match "propagation" followed by 2 stop words, or even a sequence of 3 stop words in the document.

The following expression returns false, because "of" is not in the p element between "propagating" and "errors":

/book[@number="1"]//p ftcontains "propagation of errors" with stemming without stop words

The following expression uses the stop word list specified at the URL. Assuming that the specified stop word list contains the word "then", this query is reduced to a query on the phrase "planning X conducting", allowing any token as a substitute for X. It returns a book element, because its content element contains "planning then conducting". It would also return the book if the phrase "planning and conducting" or "planning before conducting" had been in its content:

doc("http://bstore1.example.com/full-text.xml") /books/book[count(.//content ftcontains "planning then conducting" with stop words at "http://bstore1.example.com/StopWordList.xml")>0]

The following expression returns books containing "planning then conducting", but does not return books containing "planning and conducting", since it exempts "then" from being a stop word:

doc("http://bstore1.example.com/full-text.xml") /books/book[count(.//content ftcontains "planning then conducting" with stop words at "http://bstore1.example.com/StopWordList.xml" except ("the", "then"))>0]

Language Option

FTLanguageOption ::= "language" StringLiteral

A language option modifies token matching by specifying the language of search tokens and phrases.

The StringLiteral following the keyword language designates one language. It must be castable to "xs:language"; otherwise, an error is raised: .

The "language" option influences tokenization, stemming, and stop words in an implementation-defined way. The "language" option MAY influence the behavior of other match options in an implementation-defined way.

The set of standardized language identifiers are defined in . The set of valid language identifiers among the standardized set is implementation-defined. An implementation MAY choose to use private extensions introduced by a singleton 'x' for additional language identifiers, or other singletons for registered extensions as described in sec. 2.2.6 of . It is implementation-defined what additional language identifiers, if any, are valid. If an invalid language identifier is specified, then the behavior is implementation-defined. If the implementation chooses to raise an error in that case, it must raise .

The default language is specified in the static context.

When an XQuery 1.0 and XPath 2.0 Full-Text processor evaluates text in a document that is governed by an xml:lang attribute and the portion of the full-text query doing that evaluation contains an FTLanguageOption that specifies a different language than the language specified by the governing xml:lang attribute, the language-related behavior of that full-text query is implementation-defined.

This is an example where the language option is used to select the appropriate stop word list:

/book[@number="1"]//editor ftcontains "salon de the" with default stop words language "fr"

Wildcard Option

FTWildCardOption ::= ("with" "wildcards") | ("without" "wildcards")

A wildcard option modifies token and phrase matching by specifying whether wildcards are used or not.

When the "with wildcards" option is used, wildcard indicators (represented by periods (.)) and qualifiers may be appended to or inserted into the query tokens. If the period is at the beginning of a query token, the wildcard is a prefix wildcard. If the period is at the end of a query token, it is a suffix wildcard. If the period is inserted into a query token, it is an infix wildcard.

Each indicator and qualifier in a query token will match zero or more characters within a token in the text, as described below. The number of characters matched depends on the qualifier. Qualifiers available are none, question mark, asterisk, plus sign, and two numbers separated by a comma, both enclosed by curly braces.

If a period is present, but there are no qualifiers, one character in the text will match.

If a period is followed by a question mark (.?), zero or one characters in the text will match.

If a period is followed by an asterisk (.*), zero or more characters will match.

If a period is followed by a plus sign (.+), one or more characters will match.

If a period is followed by two numbers separated by a comma, both enclosed by curly braces (.{n,m}), a specified range of characters (at least n characters and no more than m characters) will match.

When "with wildcards" is present and an indicator or qualifier character is intended to be taken literally (as itself), that character must be preceded by ("escaped by") a backslash (\). For example, a period (.) that is intended to be a sentence terminator or a decimal point must be preceded by a backslash so that it is not interpreted to be an indicator. Similarly a question mark (?), asterisk (*), or plus sign (+) that is intended to be interpreted as an ordinary text character must be preceded by a backslash so that it is not interpreted to be an indicator.

The "without wildcards" option finds tokens without recognizing wildcard indicators and qualifiers. Periods, question marks, asterisks, plus signs, and two numbers separated by a comma, both enclosed by curly braces, are always recognized as ordinary text characters.
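Because the indicator and qualifier syntax resembles regular expression syntax, a query token can be translated into a Python regular expression. The sketch below is one possible reading, not a normative algorithm. The names `wildcard_to_regex` and `wildcard_match` are hypothetical, matching is case-insensitive, and a qualifier character that does not follow a period is treated as an ordinary character, which is an assumption.

```python
import re

# Qualifiers that may follow the period indicator: ?  *  +  {n,m}
_QUALIFIER = re.compile(r"\?|\*|\+|\{\d+,\d+\}")


def wildcard_to_regex(query_token):
    """Translate a query token with wildcard syntax into a compiled regex."""
    parts = []
    i = 0
    while i < len(query_token):
        ch = query_token[i]
        if ch == "\\" and i + 1 < len(query_token):
            # Escaped character: taken literally.
            parts.append(re.escape(query_token[i + 1]))
            i += 2
        elif ch == ".":
            # Indicator, optionally followed by a qualifier.
            m = _QUALIFIER.match(query_token, i + 1)
            qualifier = m.group(0) if m else ""
            parts.append("." + qualifier)
            i += 1 + len(qualifier)
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_match(query_token, text_token, with_wildcards=True):
    """Match one query token against one text token."""
    if not with_wildcards:
        # Indicators and qualifiers are ordinary text characters.
        return query_token.lower() == text_token.lower()
    return wildcard_to_regex(query_token).fullmatch(text_token) is not None


print(wildcard_match("improv.*", "Improving"))                # True
print(wildcard_match(".?site", "Site"))                       # True
print(wildcard_match("w.ll", "well"))                         # True
print(wildcard_match("w.ll", "well", with_wildcards=False))   # False
print(wildcard_match("3\\.14", "3x14"))                       # False
```

Using `fullmatch` ensures that the wildcard pattern has to cover the whole text token, so "w.ll" does not match "wellness".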

The default is "without wildcards".

Note: Wildcard indicators and qualifiers may be token boundaries. How text with wildcard indicators and qualifiers is tokenized is implementation-defined.

The following expression returns true, because the title element contains "improving":

/book[@number="1"]/title ftcontains "improv.*" with wildcards

The following expression returns true, because the title element contains "site":

/book[@number="1"]/title ftcontains ".?site" with wildcards

The following expression returns true, because the p element contains "well":

/book[@number="1"]//p ftcontains "w.ll" with wildcards

The following expression returns false, because the p element does not contain "w.ll":

/book[@number="1"]//p ftcontains "w.ll" without wildcards

Extension Option

An extension option is a match option that acts in an implementation-defined way.

FTExtensionOption"option" QName StringLiteral

An extension option consists of an identifying QName and a StringLiteral. Typically, a particular option will be recognized by some implementations and not by others. The syntax is designed so that option declarations can be successfully parsed by all implementations.

The QName of an option must resolve to a namespace URI and local name, using the statically known namespaces.

There is no default namespace for options.

Each implementation recognizes an implementation-defined set of namespace URIs used to denote extension options.

If the namespace part of the QName is not a namespace recognized by the implementation as one used to denote extension options, then the extension option is ignored.

Otherwise, the effect of the extension option, including its error behavior, is implementation-defined. For example, if the local part of the QName is not recognized, or if the StringLiteral does not conform to the rules defined by the implementation for the particular extension option, the implementation may choose whether to report an error, ignore the extension option, or take some other action.

Implementations may impose rules on where particular extension options may appear relative to other match options, and the interpretation of an option declaration may depend on its position.

An extension option must not be used to change the syntax accepted by the processor, or to suppress the detection of static errors. However, it may be used without restriction to modify the set of tokens in the query or how they are matched against tokens in the text. An extension option has the same scope as other match options.

The following examples illustrate several possible uses for extension options:

This extension option is set as part of the static context of all full-text expressions in the module and might be used to ensure that queries are insensitive to Arabic short-vowels.

declare namespace exq = "http://example.org/XQueryImplementation";
declare ft-option option exq:diacritics "short-vowel insensitive";

This extension option applies only to the matching in the full-text selection in which it is found and might be used to specify how compound words should be matched.

declare namespace exq = "http://example.org/XQueryImplementation";
//para[. ftcontains "Kinder" ftand "Platz" distance 1 words with stemming option exq:compounds "distance=1"]

Logical Full-Text Operators

Full-text selections can be combined with the logical connectives ftor (full-text or), ftand (full-text and), not in (mild not), and ftnot (unary full-text not).

FTOr ::= FTAnd ( "ftor" FTAnd )*

FTAnd ::= FTMildNot ( "ftand" FTMildNot )*

FTMildNot ::= FTUnaryNot ( "not" "in" FTUnaryNot )*

FTUnaryNot ::= ("ftnot")? FTPrimaryWithOptions

Or-Selection

An or-selection combines two full-text selections using the ftor operator.

An or-selection finds all matches that satisfy at least one of the operand full-text selections.

The following expression returns the book element written by "Millicent":

/book[.//author ftcontains "Millicent" ftor "Voltaire"]

And-Selection

An and-selection combines two full-text selections using the ftand operator.

An and-selection finds matches that satisfy all of the operand full-text selections simultaneously. A match of an and-selection is formed by combining matches for each of the operand full-text selections as described in .

For example, "usability" ftand "testing" will find two matches in /book[@number="1"]/title: each of the two matches for the FTWords selection "usability" (the two occurrences of the token "usability" in the string value of the title element) is combined with the single match for the FTWords "testing" (only one occurrence of the token "testing" in the title). Since the above and-selection has at least one match, the following expression will return "true".

/book[@number="1"]/title ftcontains ("usability" ftand "testing")

The following expression returns false, because "Millicent" and "Montana" are not contained by the same author element in any book element:

/book/author ftcontains "Millicent" ftand "Montana"

No author element in any book element contains both "Millicent" and "Montana". Therefore, for any such author element, there is either one match for the FTWords "Millicent" and zero matches for the FTWords "Montana", or vice versa, or no match for either of them. In any of these cases, the and-selection will have zero matches.
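The way matches of an and-selection are formed by combining operand matches can be illustrated with a Cartesian product over token positions. This Python sketch is a simplification under stated assumptions: the names `tokenize`, `positions`, and `and_selection` are hypothetical, each operand is a single token, and a match is represented only by the positions of its tokens.

```python
import re
from itertools import product


def tokenize(text):
    """Sample tokenization (assumption): lower-cased runs of word characters."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def positions(text_tokens, word):
    """Positions at which a single-token FTWords matches."""
    return [i for i, t in enumerate(text_tokens) if t == word.lower()]


def and_selection(text, *words):
    """Combine every match of each operand with every match of the others."""
    tokens = tokenize(text)
    per_operand = [positions(tokens, w) for w in words]
    # The product is empty as soon as one operand has no match.
    return list(product(*per_operand))


title = ("Improving the Usability of a Web Site Through Expert Reviews "
         "and Usability Testing")

print(and_selection(title, "usability", "testing"))
# [(2, 12), (11, 12)]
print(and_selection("Millicent Marigold", "Millicent", "Montana"))
# []
```

The first call yields two combined matches, one for each occurrence of "usability". The second yields none, because the operand "Montana" contributes no match.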

Mild-Not Selection

A mild-not selection combines two full-text selections using the not in operator.

The not in operator is a milder form of the operator combination ftand ftnot. The selection A not in B matches a token sequence that matches A, but not when it is a part of a match of B. In contrast, A ftand ftnot B only finds matches, when the token sequence contains A and does not contain B.

As an example, consider a search for "Mexico" not in "New Mexico". This may return, among others, a document which is all about "Mexico" but mentions at the end that "New Mexico was named after Mexico". The occurrence of "Mexico" in "New Mexico" is not considered, but other occurrences of "Mexico" are matched. Note that this document would not be matched by the full-text selection "Mexico" ftand ftnot "New Mexico".

A match to a mild-not selection must contain at least one token occurrence that satisfies the first condition and does not satisfy the second condition. A token occurrence that satisfies both the first and the second condition is not considered a match.
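As a non-normative illustration, the "Mexico" not in "New Mexico" example can be sketched in Python. The sketch assumes that matches are modelled as (start, end) token spans and that an occurrence of the first operand is discarded when it lies inside a span of the second operand; this is a simplified reading, and the function name mild_not is hypothetical.

```python
def mild_not(spans_a, spans_b):
    """Keep spans of A that are not covered by any span of B."""
    return [
        (start, end)
        for (start, end) in spans_a
        if not any(b_start <= start and end <= b_end for (b_start, b_end) in spans_b)
    ]

# Tokens: new(1) mexico(2) was(3) named(4) after(5) mexico(6)
mexico = [(2, 2), (6, 6)]
new_mexico = [(1, 2)]

remaining = mild_not(mexico, new_mexico)
print(remaining)  # only the occurrence at position 6 is kept
```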

The following expression returns true, because "usability" appears in the title and the p elements and the occurrence within the phrase "Usability Testing" in the title element is not considered:

/book ftcontains "usability" not in "usability testing"

Operands of a mild-not selection may not contain a full-text selection that evaluates to an AllMatches that contains a StringExclude. Such full-text selections are not-selections and FTWords with a cardinality constraint that uses an "at most", "from ... to", or "exactly" occurrence range.

Not-Selection

A not-selection is a full-text selection starting with the prefix operator ftnot.

A not-selection selects matches that do not satisfy the operand full-text selection. Details about how such matches are constructed are given in .

The following expression returns the empty sequence, because all book elements contain "usability":

/book[. ftcontains ftnot "usability"]

The following expression returns true, because book elements contain "information" and "retrieval" but not "information retrieval":

/book ftcontains "information" ftand "retrieval" ftand ftnot "information retrieval"

The following expression returns book elements containing "web site usability" but not "usability testing":

/book[. ftcontains "web site usability" ftand ftnot "usability testing"]

Positional Filters

FTPosFilter ::= FTOrder | FTWindow | FTDistance | FTScope | FTContent

Positional filters are postfix operators that serve to filter matches based on various constraints on their positional information.

Recall that the grammar rule for FTSelection allows an arbitrary number of positional filters to follow an FTOr. Multiple adjacent positional filters are applied from left to right, i.e., the first filter is applied to the result of the FTOr, the second is applied to the result of that first application, and so on.

Ordered Selection

FTOrder ::= "ordered"

An ordered selection consists of a full-text selection followed by the postfix operator "ordered". An ordered selection requires the order of tokens and phrases in the text to be the same as the order in which they are written in the operand selection.

The default is unordered. Unordered is in effect when ordered is not specified in the query. Unordered cannot be written explicitly in the query.

An ordered selection selects matches that satisfy the operand full-text selection and in which the matching tokens appear in the text in the same order as the corresponding query tokens appear in the operand selection.
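The ordering condition can be sketched as follows. This non-normative Python fragment assumes that each matched token is modelled as a pair (query position, token position); the positions used are hypothetical and the function name is_ordered is introduced for illustration.

```python
def is_ordered(match):
    """match: list of (query_pos, token_pos) pairs for one match."""
    by_query = sorted(match)                       # order of the query tokens
    positions = [token_pos for _, token_pos in by_query]
    return positions == sorted(positions)          # same order in the text?

# Query tokens appear in the text in the order they were written.
print(is_ordered([(1, 6), (2, 7), (3, 12)]))
# The second query token appears before the first one in the text.
print(is_ordered([(1, 20), (2, 15)]))
```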

The following expression returns true, because titles of book elements contain "web site" and "usability" in the order in which they are written in the query, i.e., "web site" must precede "usability":

/book/title ftcontains ("web site" ftand "usability") ordered

The following expression returns false, because although "Montana" and "Millicent" both appear in the book element, they do not appear in the order they are written in the query:

/book[@number="1"] ftcontains ("Montana" ftand "Millicent") ordered

Window Selection

FTWindow ::= "window" AdditiveExpr FTUnit
FTUnit ::= "words" | "sentences" | "paragraphs"

A window selection consists of a full-text selection followed by one of the (complex) postfix operators derived from FTWindow. A window selection selects matches which satisfy the operand full-text selection and for which the matched tokens and phrases, more precisely the individual StringIncludes of that match, are found within a number of FTUnits (words, sentences, or paragraphs). The number of FTUnits is specified by an AdditiveExpr that is converted as though it were an argument to a function with the expected type of "xs:integer".

A window selection may cross element boundaries. The size of the window is not affected by the presence or absence of element boundaries. Stop words are included in the computation of the window size whether they are ignored by the query or not.

A match of an FTSelection is considered a match within a window, if there exists a window of at most the given number of consecutive units (tokens, sentences, or paragraphs) in the document within which all StringIncludes of the match lie.
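The window condition for word units can be sketched in Python. This is a non-normative illustration that assumes each StringInclude is reduced to a single token position; the positions are hypothetical and within_window is an illustrative name.

```python
def within_window(positions, size):
    """True if all positions fit in a run of at most `size` consecutive tokens."""
    return max(positions) - min(positions) + 1 <= size

# Hypothetical positions: usability(3), web(6), site(7).
match = [6, 7, 3]
print(within_window(match, 5))  # positions 3..7 span five tokens
print(within_window(match, 4))  # a window of four tokens is too small
```

Because only the smallest and largest positions matter, tokens lying between them (including stop words) count toward the window size.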

The following expression returns true, because "web", "site", and "usability" are within a window of 5 tokens in the title element:

/book/title ftcontains "web" ftand "site" ftand "usability" window 5 words

The following expression returns true, because "web" and "site" in the order they are written in the query and either "usability" or "testing" are within a window of at most 10 tokens:

/book ftcontains ("web" ftand "site" ordered) ftand ("usability" ftor "testing") window 10 words

The following expression returns true, because the title element contains "Web Site Usability". A similar query on the p element would not return true, because its occurrences of "web site" and "usability" are not within a window of 3:

/book//title ftcontains "web site" ftand "usability" window 3 words

The following expression returns the empty sequence, because in the selected book element, there is no occurrence of "efficient" within a window of 3 tokens which would not also contain an occurrence of "and":

/book[@number="1" and . ftcontains "efficient" ftand ftnot "and" window 3 words]

In order to allow meaningful results for nested positional filters, e.g., a window selection embedded inside a distance selection, the resulting matches for window selections are formed from the input matches that satisfy the window constraint as follows. All StringIncludes of such a match are coerced into a single StringInclude that spans all token positions from the smallest to the largest position of any input StringIncludes. This is explained in more detail in Section .

Distance Selection

FTDistance ::= "distance" FTRange FTUnit
FTRange ::= ("exactly" AdditiveExpr)
| ("at" "least" AdditiveExpr)
| ("at" "most" AdditiveExpr)
| ("from" AdditiveExpr "to" AdditiveExpr)

A distance selection consists of a full-text selection followed by one of the (complex) postfix operators derived from FTDistance.

A distance selection selects matches which satisfy the operand full-text selection and for which the matched tokens and phrases satisfy the specified distance conditions. Distance is measured in FTUnits (words, sentences, or paragraphs). The permitted number of intervening FTUnits is given by the integer range specified in FTRange.

FTRange specifies a range of integer values, providing a minimum and maximum value. Each one of the AdditiveExpr specified in an FTRange is converted as though it were an argument to a function with the expected parameter type of "xs:integer".

Let the value of the first (or only) operand be M. If "from" is specified, let the value of the second operand be N. A distance selection may cross element boundaries when computing distance.

The following rule applies to the computation of distance:

Zero words (sentences, paragraphs) means adjacent tokens (sentences, paragraphs).

If "exactly" is specified, then the range is the closed interval [M, M]. If "at least" is specified, then the range is the half-closed interval [M, unbounded). If "at most" is specified, then the range is the closed interval [0, M]. If "from-to" is specified, then the range is the closed interval [M, N]. Note: If M is greater than N, the range is empty.

Here are some examples of FTRanges:

'exactly 0' specifies the range [0, 0].

'at least 1' specifies the range [1, unbounded).

'at most 1' specifies the range [0, 1].

'from 5 to 10' specifies the range [5, 10].
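The mapping from an FTRange to an interval can be sketched as follows. This non-normative Python fragment represents the unbounded upper limit with math.inf; the function names ft_range and in_range are introduced here for illustration only.

```python
import math

def ft_range(kind, m, n=None):
    """Return the (low, high) interval denoted by an FTRange."""
    if kind == "exactly":
        return (m, m)
    if kind == "at least":
        return (m, math.inf)
    if kind == "at most":
        return (0, m)
    if kind == "from":
        return (m, n)
    raise ValueError("unknown FTRange keyword: " + kind)

def in_range(value, interval):
    low, high = interval
    return low <= value <= high

print(ft_range("exactly", 0))
print(ft_range("from", 5, 10))
print(in_range(7, ft_range("from", 10, 5)))  # M greater than N: empty range
```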

The distances computed by a distance selection are not affected by the presence or absence of element boundaries in the text. Stop words are counted in those computations whether they are ignored or not.
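A non-normative sketch of the distance computation for word units is given below. It assumes that matched tokens are modelled as (start, end) position spans, that they are sorted by position, and that the condition is checked between each pair of neighbours; the positions are hypothetical and the function names are illustrative.

```python
def word_distances(spans):
    """Number of intervening tokens between consecutive spans, in text order."""
    ordered = sorted(spans)
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(ordered, ordered[1:])]

def satisfies_distance(spans, low, high):
    return all(low <= d <= high for d in word_distances(spans))

# Hypothetical positions: usability(3), web(6), site(7).
spans = [(6, 6), (7, 7), (3, 3)]
print(word_distances(spans))            # adjacent tokens have distance 0
print(satisfies_distance(spans, 0, 2))  # "distance at most 2 words"
print(satisfies_distance(spans, 0, 1))  # "distance at most 1 words"
```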

The following expression returns false, because "completion" and "errors" are less than 11 tokens apart:

/book ftcontains ("completion" ftand "errors" distance at least 11 words)

The following expression returns true, because the book element contains tokens "web", "site", and "usability" that have at most 2 intervening tokens between them:

/book ftcontains "web" ftand "site" ftand "usability" distance at most 2 words

The following expression returns the empty sequence, because between any token "usability" and the nearest token of any occurrence of the phrase "web site" there is always more than one intervening token:

/book[.//p ftcontains "web site" ftand "usability" distance at most 1 words]

The following expression returns the book title, because for the occurrences of the tokens "web" and "users" in the note element only one intervening token appears:

/book[. ftcontains "web" ftand "users" distance at most 1 words]/title

In order to allow meaningful results for nested positional filters, e.g., a distance selection embedded inside another distance selection, the resulting matches for distance selections are formed from the input matches that satisfy the distance constraint as follows. All StringIncludes of such a match are coerced into a single StringInclude that spans all token positions from the smallest to the largest position of any input StringIncludes. Thus, a distance selection that embeds a window or a distance selection takes the result of the embedded selection as a single unit.

The following gives an example of nested distance selections:

/books ftcontains ((("richard" ftand "nixon") distance at most 2 words) ftand (("george" ftand "bush") distance at most 2 words) distance at least 20 words)

This expression finds book elements that contain, for instance, "Richard M. Nixon" and "George W. Bush" at least 20 words apart. The matches for the inner distance selections are treated as single units (represented by StringIncludes) by the outer distance selection. If such phrases are present in the search context, the outer distance selection enforces a constraint on the number of intervening tokens ("at least 20") between the last token of "Richard M. Nixon" and the first token of "George W. Bush".
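The coercion of inner matches into single units can be sketched in Python. This non-normative fragment assumes (start, end) position spans and hypothetical token positions; coerce and gap are illustrative names.

```python
def coerce(spans):
    """Merge all spans of a match into one span from smallest to largest position."""
    return (min(s for s, _ in spans), max(e for _, e in spans))

def gap(first, second):
    """Intervening tokens between two spans, where `first` precedes `second`."""
    return second[0] - first[1] - 1

# Hypothetical positions: richard(10) m(11) nixon(12) ... george(40) w(41) bush(42)
nixon = [(10, 10), (12, 12)]
bush = [(40, 40), (42, 42)]

inner_ok = gap(*nixon) <= 2 and gap(*bush) <= 2   # "distance at most 2 words"
nixon_unit, bush_unit = coerce(nixon), coerce(bush)
outer_gap = gap(nixon_unit, bush_unit)            # counted from "nixon" to "george"
print(inner_ok, nixon_unit, bush_unit, outer_gap)
```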

Scope Selection

FTScope ::= ("same" | "different") FTBigUnit
FTBigUnit ::= "sentence" | "paragraph"

A scope selection consists of a full-text selection followed by one of the (complex) postfix operators derived from FTScope.

A scope selection selects matches which satisfy the operand full-text selection and for which the matched tokens and phrases are contained in the same scope or in different scopes.

Possible scopes are sentences and paragraphs.

By default, there are no restrictions on the scope of the matches.

The following expression returns false, because the tokens "usability" and "Marigold" are not contained within the same sentence:

/book ftcontains "usability" ftand "Marigold" same sentence

The following expression returns true, because the tokens "usability" and "Marigold" are contained within different sentences:

/book ftcontains "usability" ftand "Marigold" different sentence

The following expression returns a book element, because it contains "usability" and "testing" in the same paragraph:

/book[. ftcontains "usability" ftand "testing" same paragraph]

The following expression returns a book element, because "site" and "errors" appear in the same sentence:

/book[. ftcontains "site" ftand "errors" same sentence]

It is possible that both "same sentence" and "different sentence" conditions are simultaneously satisfied for several tokens and/or phrases within the same document fragment. This can be observed if there are occurrences of the tokens and/or phrases both within the same sentence and within different sentences. For example, consider the following document fragment.

<introduction> ... The usability of a Web site is how well the site supports the user in achieving specified goals. ... Expert reviews and usability testing are methods of identifying problems in layout, terminology, and navigation. ... </introduction>

This sample will satisfy both conditions ("usability" ftand "reviews") different sentence and ("usability" ftand "reviews") same sentence. The tokens "usability" and "reviews" occur both in different sentences (the first and second shown sentences) and in the same sentence (the second shown sentence).

The above observation also holds for the "same paragraph" and "different paragraph" conditions.
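The observation can be illustrated with a non-normative Python sketch. It assumes that each token occurrence is reduced to the relative number of the sentence containing it, and that "different" is read as "pairwise distinct"; both are simplifying assumptions, and the function names are illustrative.

```python
from itertools import product

def same_scope(units):
    return len(set(units)) == 1

def different_scope(units):
    return len(set(units)) == len(units)

# Sentence numbers in the fragment above: "usability" occurs in the first and
# second shown sentences, "reviews" only in the second.
usability = [1, 2]
reviews = [2]

matches = list(product(usability, reviews))   # matches of the and-selection
has_same = any(same_scope(m) for m in matches)
has_different = any(different_scope(m) for m in matches)
print(has_same, has_different)
```

Both conditions hold because they are satisfied by different matches of the same and-selection.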

Anchoring Selection

FTContent ::= ("at" "start") | ("at" "end") | ("entire" "content")

An anchoring selection consists of a full-text selection followed by one of the postfix operators "at start", "at end", or "entire content".

An anchoring selection selects matches which satisfy the operand full-text selection and for which the matched tokens and phrases are the first, last, or all tokens in the tokenized form of the items being searched.

The "at start" operator matches tokens or phrases that are the first tokens or phrases in the tokenized string value of the item being searched.

The "at end" operator matches tokens or phrases that are the last tokens or phrases in the tokenized string value of the item being searched.

The "entire content" operator matches tokens or phrases that are the entire content of the tokenized string value of the item being searched.
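The three anchoring conditions can be sketched for a single phrase match. This non-normative Python fragment assumes a whitespace tokenization and an assumed title text; find_phrase, at_start, at_end, and entire_content are illustrative names.

```python
def find_phrase(tokens, phrase):
    """Return 1-based (start, end) spans where `phrase` occurs in `tokens`."""
    n = len(phrase)
    return [(i + 1, i + n) for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == phrase]

def at_start(span, total):
    return span[0] == 1

def at_end(span, total):
    return span[1] == total

def entire_content(span, total):
    return span == (1, total)

# Assumed tokenized string value of a title element.
title = ("improving the usability of a web site through "
         "expert reviews and usability testing").split()
span = find_phrase(title, "improving the usability of a web site".split())[0]
print(span, at_start(span, len(title)), at_end(span, len(title)))
```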

The following expression returns each title element starting with the phrase "improving the usability of a web site":

/books//title[. ftcontains "improving the usability of a web site" at start]

The following expression returns each p element ending with the phrase "propagating few errors":

/books//p[. ftcontains "propagat.*" with wildcards ftand "few errors" distance at most 2 words at end]

The following expression returns each note element whose entire content is "this site has been approved by the web site users association":

/books//note[. ftcontains "this site has been approved by the web site users association" entire content]

Cardinality Selection

FTTimes ::= "occurs" FTRange "times"

A cardinality selection consists of an FTWords followed by the FTTimes postfix operator. A cardinality selection selects matches for which the operand FTWords is matched a specified number of times.

A cardinality selection limits the number of different matches of FTWords to the specified range. The semantics of FTRange are described in the Distance Selection section.

In the document fragment "very very big":

The FTWords "very big" has 1 match consisting of the second "very" and "big".

The FTWords {"very", "big"} all has 2 matches; one consisting of the first "very" and "big", and the other containing the second "very" and "big".

The FTWords {"very", "big"} any has 3 matches.
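The three counts for the fragment "very very big" can be reproduced with a non-normative Python sketch. It assumes that a phrase match is a run of consecutive tokens, that "all" pairs one occurrence of each word, and that "any" counts every occurrence of any listed word; the function names are illustrative.

```python
from itertools import product

TOKENS = ["very", "very", "big"]

def positions(word):
    return [i for i, token in enumerate(TOKENS, start=1) if token == word]

def phrase_matches(words):
    n = len(words)
    return [(i + 1, i + n) for i in range(len(TOKENS) - n + 1)
            if TOKENS[i:i + n] == words]

def all_matches(words):
    return list(product(*(positions(w) for w in words)))

def any_matches(words):
    return [p for w in words for p in positions(w)]

print(len(phrase_matches(["very", "big"])))  # the phrase "very big"
print(len(all_matches(["very", "big"])))     # {"very", "big"} all
print(len(any_matches(["very", "big"])))     # {"very", "big"} any
```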

The following expression returns the example book element's number, because the book element contains 2 or more occurrences of "usability":

/book[. ftcontains "usability" occurs at least 2 times]/@number

The following expression returns the empty sequence, because there are 4 occurrences of {"usability", "testing"} any in the designated title:

/book[@number="1" and title ftcontains {"usability", "testing"} any occurs at most 3 times]

/book ftcontains "usability" occurs at least 2 times

Ignore Option

FTIgnoreOption ::= "without" "content" UnionExpr

The ignore option specifies a set of nodes whose content is ignored. It is applicable only to a top-level FTSelection (see FTContainsExpr). Ignored nodes are identified by the XQuery expression UnionExpr. Let N1, N2, ..., Nk be the sequence of nodes of the search context. The expression UnionExpr is evaluated in the context of each node Ni being searched. That is, the search context expression of the ftcontains predicate creates a new focus for the evaluation of the UnionExpr given with FTIgnoreOption, similar to the creation of the dynamic context of a path expression E1/E2 or a filter expression E1[E2] (see ).

Now, let I1, I2, ..., In be the sequence of items that UnionExpr evaluates to. For each Ni (i=1..k) a copy is made that omits each node Ij (j=1..n) that is not Ni. Those copies form the new search context. If UnionExpr evaluates to an empty sequence no nodes are omitted.

In the following fragment, if .//annotation is ignored, "Web Usability" will be found 2 times: once in the title element and once in the editor element. The 2 occurrences in the 2 annotation elements are ignored. On the other hand, "expert" will not be found, as it appears only in an annotation element.

<book> <title>Web Usability and Practice</title> <author>Montana <annotation> this author is an expert in Web Usability</annotation> Marigold </author> <editor>Véra Tudor-Medina on Web <annotation> best editor on Web Usability</annotation> Usability </editor> </book>
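The effect of ignoring .//annotation on this fragment can be checked with a non-normative Python sketch. It assumes a simple tokenizer that splits on non-word characters and a case-insensitive phrase match; text_without and count_phrase are illustrative names, not functions defined by this specification.

```python
import re
import xml.etree.ElementTree as ET

FRAGMENT = (
    "<book><title>Web Usability and Practice</title>"
    "<author>Montana <annotation> this author is an expert in Web Usability"
    "</annotation> Marigold </author>"
    "<editor>V\u00e9ra Tudor-Medina on Web <annotation> best editor on Web Usability"
    "</annotation> Usability </editor></book>"
)

def text_without(elem, ignored_tag):
    """String value of elem, omitting the content of ignored descendants."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != ignored_tag:
            parts.append(text_without(child, ignored_tag))
        parts.append(child.tail or "")   # text after the child is always kept
    return "".join(parts)

def count_phrase(text, phrase):
    tokens = re.findall(r"\w+", text.lower())
    words = phrase.lower().split()
    n = len(words)
    return sum(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))

book = ET.fromstring(FRAGMENT)
pruned = text_without(book, "annotation")
print(count_phrase(pruned, "Web Usability"))
print(count_phrase(pruned, "expert"))
```

In the editor element, "Web" and "Usability" become adjacent only once the annotation between them is omitted, which is why that occurrence is found.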

By default, no element content is ignored.

Nodes MAY be ignored during indexing and during query processing. The ignore option applies only to query processing. Whether and how indexing ignores nodes is out of scope for this specification.

Extension Selections

An extension selection is a full-text selection whose semantics are implementation-defined. Typically, a particular extension will be recognized by some implementations and not by others. The syntax is designed so that extension selections can be successfully parsed by all implementations, and so that fallback behavior can be defined for implementations that do not recognize a particular extension.

FTExtensionSelection ::= Pragma+ "{" FTSelection? "}"
Pragma ::= "(#" S? QName (S PragmaContents)? "#)"
PragmaContents ::= (Char* - (Char* '#)' Char*))

An extension selection consists of one or more pragmas followed by a full-text selection enclosed in curly braces. See for information on pragmas in general. A pragma is denoted by the delimiters (# and #), and consists of an identifying QName followed by implementation-defined content. The content of a pragma may consist of any string of characters that does not contain the ending delimiter #). The QName of a pragma must resolve to a namespace URI and local name, using the statically known namespaces.

Since there is no default namespace for pragmas, a pragma QName must include a namespace prefix.

Each implementation recognizes an implementation-defined set of namespace URIs used to denote pragmas.

If the namespace part of a pragma QName is not recognized by the implementation as a pragma namespace, then the pragma is ignored. If all the pragmas in an FTExtensionSelection are ignored, then the full-text extension selection is just the full-text selection enclosed in curly braces; if this full-text selection is absent, then a static error is raised.

If an implementation recognizes the namespace of one or more pragmas in an FTExtensionSelection, then the value of the FTExtensionSelection, including its error behavior, is implementation-defined. For example, an implementation that recognizes the namespace of a pragma QName, but does not recognize the local part of the QName, might choose either to raise an error or to ignore the pragma.

It is a static error if an implementation recognizes a pragma but determines that its content is invalid.

If an implementation recognizes a pragma, it must report any static errors in the following full-text selection even if it will not apply that selection.
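The fallback rules can be summarized in a non-normative Python sketch. The set of recognized namespaces, the StaticError class, and the function resolve_extension are all hypothetical and serve only to illustrate the decision; actual behavior for recognized pragmas is implementation-defined.

```python
class StaticError(Exception):
    """Stand-in for a static error raised during query analysis."""

# Hypothetical set of pragma namespaces recognized by an implementation.
RECOGNIZED_NAMESPACES = {"http://example.org/XQueryImplementation"}

def resolve_extension(pragma_namespaces, fallback_selection):
    """Decide what an extension selection evaluates to."""
    if any(ns in RECOGNIZED_NAMESPACES for ns in pragma_namespaces):
        return "implementation-defined"      # at least one pragma is recognized
    if fallback_selection is None:
        raise StaticError("all pragmas ignored and no full-text selection given")
    return fallback_selection                # all pragmas ignored: use the braces

print(resolve_extension(["http://example.com/other"], "'Berners-Lee'"))
print(resolve_extension(["http://example.org/XQueryImplementation"], None))
```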

The following examples illustrate three ways in which extension selections might be used.

A pragma can be used to furnish a hint for how to evaluate the following full-text selection, without actually changing the result. For example:

declare namespace exq = "http://example.org/XQueryImplementation";
/book/author[name ftcontains (# exq:use-index #) {'Berners-Lee'}]

An implementation that recognizes the exq:use-index pragma might use an index to evaluate the full-text selection that follows. An implementation that does not recognize this pragma would evaluate the full-text selection in its normal way.

A pragma might be used to modify the semantics of the following full-text selection in ways that would not (in the absence of the pragma) be conformant with this specification. For example, a pragma might be used to change distance counting so that adjacent words are at a distance of 1 (otherwise they would be at a distance of 0):

declare namespace exq = "http://example.org/XQueryImplementation";
/book[.//p ftcontains (# exq:distance #) { "web site" ftand "usability" distance at most 1 words }]

Such changes to the language semantics must be scoped to the expression contained within the curly braces following the pragma.

A pragma might contain syntactic constructs that are evaluated in place of the following full-text selection. In this case, the following selection itself (if it is present) provides a fallback for use by implementations that do not recognize the pragma. For example:

declare namespace exq = "http://example.org/XQueryImplementation";
//city[. ftcontains (# exq:classifier with class 'Animals' #) {"animal" with thesaurus at "http://example.org/thesaurus.xml" relationship "RT"}]

Here an implementation that recognizes the pragma will return the result of evaluating the proprietary syntax with class 'Animals', while an implementation that does not recognize the pragma will instead return the result of the thesaurus option. If no fallback expression is required, or if none is feasible, then the expression between the curly braces may be omitted, in which case implementations that do not recognize the pragma will raise a static error.

Semantics

This section describes the formal semantics of XQuery 1.0 and XPath 2.0 Full-Text 1.0. The figure below shows how XQuery 1.0 and XPath 2.0 Full-Text 1.0 integrates with XQuery 1.0 and XPath 2.0.

The following diagram represents the interaction of XQuery 1.0 and XPath 2.0 Full-Text with the rest of the XQuery 1.0 and XPath 2.0 languages. It specifies how full-text expressions can be nested within XQuery 1.0 and XPath 2.0 expressions and vice versa.

Arrow 1 represents the composability of the XQuery 1.0 and XPath 2.0 expressions. This is outside the scope of this document and will not be discussed further.

Arrow 2 shows how XQuery 1.0 and XPath 2.0 expressions can be nested inside FTSelections by evaluating them to a sequence of items. If the XQuery 1.0 and XPath 2.0 expression is nested on the left-hand side of an FTContains expression or within FTWords, the items in the sequence are converted to their tokenized form. The process is described in Tokenization. If the XQuery 1.0 and XPath 2.0 expression is nested within another type of FTSelection, the items in its result sequence are converted to atomic values as discussed in FTSelections.

Arrow 3 represents the composability of FTSelections. The composability is achieved by evaluating the FTSelections to AllMatches. Each FTSelection operates on zero or more AllMatches and returns AllMatches. The process is described in the Evaluation of FTSelections section.

Arrow 4 shows how the results of the evaluation of XQuery 1.0 and XPath 2.0 Full-Text 1.0 and scoring expressions are integrated into the XQuery 1.0 and XPath 2.0 model. The section XQuery 1.0 and XPath 2.0 Full-Text 1.0 and scoring expressions describes how this is achieved.

In the list above and throughout the rest of this section, bold typeface has been used to distinguish the concepts that are part of the AllMatches model.

The functions and schemas defined in this section are considered to be within the fts: namespace. These functions and schemas are used only for describing the semantics. There is no requirement that these functions and schemas be implemented, so no URI is associated with the fts: prefix.

Tokenization

Formally, tokenization is the process of converting the string value of a node to a sequence of token occurrences, taking the structural information of the node into account to identify token, sentence, and paragraph boundaries.

Each word MUST consist of one or more consecutive characters;

The tokenizer MUST preserve the containment hierarchy (paragraphs contain sentences contain words); and

The tokenizer MUST, when tokenizing two equal strings, identify the same tokens in each.

For some languages, some tokenizers may identify overlapping tokens. For example, the German word "Donaudampfschifffahrtskapitaensmuetze" might be tokenized into the following tokens: "Donaudampfschifffahrtskapitaensmuetze", "Donau", "dampf", "schiff", "dampfschiff", "kapitaen", "muetze", "kapitaensmuetze", "schifffahrt", "dampfschifffahrt", and perhaps others.

Examples

The following document fragment is the source document for examples in this section. A sample tokenization is used for the examples in this section. The results might be different for other tokenizations.

Unless stated otherwise, the results assume a case-insensitive match.

<offers>
  <offer id="1000" price="10000">
    Ford Mustang 2000, 65K, excellent condition, runs great, AC, CC, power all
  </offer>
  <offer id="1001" price="8000">
    Honda Accord 1999, 78K, A/C, cruise control, runs and looks great, excellent condition
  </offer>
  <offer id="1005" price="5500">
    Ford Mustang, 1995, 150K highway mileage, no rust, excellent condition
  </offer>
</offers>

In this sample tokenization, tokens are delimited by punctuation and whitespace symbols.

The token "Ford" is at relative position 1.

The token "Mustang" is at relative position 2.

The token "2000" is at relative position 3.

Relative position numbers are assigned sequentially through the end of the document.

Hence each token occupies exactly one position, and no overlapping of tokens occurs. The relative positions of token occurrences are shown below in parentheses.

<offers>
  <offer id="1000" price="10000">
    Ford(1) Mustang(2) 2000(3), 65K(4), excellent(5) condition(6), runs(7) great(8), AC(9), CC(10), power(11) all(12)
  </offer>
  <offer id="1001" price="8000">
    Honda(13) Accord(14) 1999(15), 78K(16), A(17)/C(18), cruise(19) control(20), runs(21) and(22) looks(23) great(24), excellent(25) condition(26)
  </offer>
  <offer id="1005" price="5500">
    Ford(27) Mustang(28), 1995(29), 150K(30) highway(31) mileage(32), no(33) rust(34), excellent(35) condition(36)
  </offer>
</offers>

The relative positions of paragraphs are determined similarly. In this sample tokenization, the paragraph delimiters are start tags, end tags, and end of line characters.

The tokens in the first element are assigned relative paragraph number 1.

The tokens from the next element are assigned relative paragraph number 2.

Relative paragraph numbers are assigned sequentially through the end of the document.

The relative positions of sentences are determined similarly using sentence delimiters.
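The sample tokenization can be reproduced with a non-normative Python sketch. It assumes that tokens are runs of ASCII letters and digits and that each offer element forms one paragraph; tokenize is an illustrative name, and other tokenizers may produce different results.

```python
import re

# Text content of the three offer elements of the source document.
OFFERS = [
    "Ford Mustang 2000, 65K, excellent condition, runs great, AC, CC, power all",
    "Honda Accord 1999, 78K, A/C, cruise control, runs and looks great, "
    "excellent condition",
    "Ford Mustang, 1995, 150K highway mileage, no rust, excellent condition",
]

def tokenize(paragraphs):
    """Return (token, relative position, relative paragraph number) triples."""
    result = []
    position = 0
    for paragraph_no, text in enumerate(paragraphs, start=1):
        for word in re.findall(r"[A-Za-z0-9]+", text):
            position += 1
            result.append((word, position, paragraph_no))
    return result

tokens = tokenize(OFFERS)
print(tokens[0], tokens[16], tokens[17], tokens[-1])
```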

Implementations may provide for the means to ignore or side-step certain structural elements when performing tokenization. In the following example, the implementation has decided to ignore the markup for <bold> and prune out the entire subtree headed by <deleted>.

<para><deleted>This sentence was deleted.</deleted> This <bold>entire paragraph</bold> is one sentence as far as the tokenizer is concerned. </para>

Using the same notation as before, this sample tokenization is shown below. All the token occurrences marked with a token position also have the same sentence and paragraph relative positions. Note that there are no tokens marked for the ignored subtree.

<para><deleted>This sentence was deleted.</deleted> This(1) <bold>entire(2) paragraph(3)</bold> is(4) one(5) sentence(6) as(7) far(8) as(9) the(10) tokenizer(11) is(12) concerned(13). </para>

Representations of Tokenized Text and Matching

Two representations of tokenized text will be employed in the formal semantics functions, one for the search strings of a query and one for matched token occurrences of search context items.

A SearchItem is a sequence of SearchTokenInfos representing the sequence of tokens derived from tokenizing one search string.

A SearchTokenInfo is the identity of a token inside a search string. Each SearchTokenInfo is associated with a unique identifier that captures the relative position of the search string in the query in document order.

A TokenInfo represents a sequence of consecutive token occurrences inside an XML document. Each TokenInfo is associated with:

a unique identifier that captures the relative position of the first token occurrence of the sequence in the document order: startPos

a unique identifier that captures the relative position of the last token occurrence of the sequence in the document order: endPos

the relative position of the sentence containing the first token occurrence or zero if the tokenizer does not report sentences: startSent

the relative position of the sentence containing the last token occurrence or zero if the tokenizer does not report sentences: endSent

the relative position of the paragraph containing the first token occurrence or zero if the tokenizer does not report paragraphs: startPara

the relative position of the paragraph containing the last token occurrence or zero if the tokenizer does not report paragraphs: endPara
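The properties listed above can be pictured as a simple record. The following non-normative Python sketch uses a dataclass whose field names mirror startPos, endPos, startSent, endSent, startPara, and endPara; the class itself and the helper token_count are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TokenInfo:
    start_pos: int        # relative position of the first token occurrence
    end_pos: int          # relative position of the last token occurrence
    start_sent: int = 0   # 0 if the tokenizer does not report sentences
    end_sent: int = 0
    start_para: int = 0   # 0 if the tokenizer does not report paragraphs
    end_para: int = 0

    def token_count(self):
        """Number of consecutive token occurrences covered."""
        return self.end_pos - self.start_pos + 1

# "Ford Mustang" at positions 1 and 2 of the first paragraph; no sentence data.
ford_mustang = TokenInfo(start_pos=1, end_pos=2, start_para=1, end_para=1)
print(ford_mustang, ford_mustang.token_count())
```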

The following matching function is the central implementation-defined primitive performing the full-text retrieval.

declare function fts:matchTokenInfos (
   $searchContext as item(),
   $matchOptions as element(fts:matchOptions),
   $stopWords as xs:string*,
   $searchTokens as element(fts:searchToken)* )
   as element(fts:tokenInfo)* external;

The above function returns the TokenInfos in items in $searchContext that match the search string represented by the sequence $searchTokens, when using the match options in $matchOptions and stop words in $stopWords. If $searchTokens is a sequence of more than one search token, each returned TokenInfo must represent a phrase matching that sequence.

While this matching function assumes a tokenized representation of the search strings, it does not assume a tokenized representation of the input items in $searchContext, i.e. the texts in which the search happens. Hence, the tokenization of the search context is implicit in this function and coupled to the retrieval of matches. Of course, this does not imply that tokenization of the search context cannot be done a priori. The tokenization of each item in $searchContext does not necessarily take into account the match options in $matchOptions or the search tokens in $searchTokens. This allows implementations to tokenize and index input data without the knowledge of particular match options used in full-text queries.
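Because the matching function is implementation-defined, the following Python fragment is only one possible, non-normative reading of it. It assumes the sample tokenization of the offers document, a case-insensitive comparison, no stop words, and results reduced to (startPos, endPos) pairs; match_token_infos is an illustrative name.

```python
import re

# Concatenated text content of the three offer elements.
TEXT = (
    "Ford Mustang 2000, 65K, excellent condition, runs great, AC, CC, power all "
    "Honda Accord 1999, 78K, A/C, cruise control, runs and looks great, "
    "excellent condition "
    "Ford Mustang, 1995, 150K highway mileage, no rust, excellent condition"
)

def match_token_infos(text, search_tokens):
    """Return (startPos, endPos) for each phrase occurrence of search_tokens."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]
    wanted = [t.lower() for t in search_tokens]
    n = len(wanted)
    return [(i + 1, i + n) for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == wanted]

print(match_token_infos(TEXT, ["Mustang"]))
print(match_token_infos(TEXT, ["Ford", "Mustang"]))
```

Note that the search context is tokenized inside the function, independently of the search tokens, which mirrors the statement that tokenization of the search context is implicit.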

Evaluation of FTSelections

The sequence of nodes in the XQuery 1.0 and XPath 2.0 Data Model is inadequate to support fully composable FTSelections. Full-text operations, such as FTSelections, operate on linguistic units, such as positions of tokens, which are not captured in the XQuery 1.0 and XPath 2.0 Data Model (XDM).

XQuery 1.0 and XPath 2.0 Full-Text adds relative token, sentence, and paragraph position numbers via AllMatches. AllMatches make FTSelections fully composable.

AllMatches Formal Model

An AllMatches describes the possible results of an FTSelection. The UML Static Class diagram of AllMatches is shown on the diagram given below.

The AllMatches object contains zero or more Matches.

Each Match describes one result to the FTSelection. The result is described in terms of zero or more StringIncludes and zero or more StringExcludes.

A StringMatch is a possible match of a sequence of search tokens with a corresponding sequence of consecutive token occurrences in a document. A StringMatch may be a StringInclude or StringExclude. The queryPos attribute specifies the position of the search token in the query. This attribute is needed for FTOrders. The matched document token sequence is described in the TokenInfo associated with the StringMatch.

A StringInclude is a StringMatch that describes a TokenInfo that must be contained in the document.

A StringExclude is a StringMatch that describes a TokenInfo that must not be contained in the document.

Intuitively, AllMatches specifies the TokenInfos that a node contains and does not contain to satisfy an FTSelection.

The AllMatches structure resembles the Disjunctive Normal Form (DNF) in propositional and first-order logic. The AllMatches is a disjunction of Matches. Each Match is a conjunction of StringIncludes and StringExcludes.
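The DNF reading above can be sketched in Python. This is an illustration only, not part of the formal model: the class names mirror the terms used in this section, while the `occupied` set (the token spans assumed to be present in a node) and the two `*_holds` helpers are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class StringMatch:
    query_pos: int   # position of the search token in the query
    start_pos: int   # first document position covered by the TokenInfo
    end_pos: int     # last document position covered by the TokenInfo

@dataclass
class Match:
    includes: list = field(default_factory=list)  # StringIncludes
    excludes: list = field(default_factory=list)  # StringExcludes

def match_holds(match, occupied):
    """A Match is a conjunction: every include present, every exclude absent."""
    def present(sm):
        return (sm.start_pos, sm.end_pos) in occupied
    return (all(present(s) for s in match.includes)
            and not any(present(s) for s in match.excludes))

def all_matches_holds(all_matches, occupied):
    """An AllMatches is a disjunction over its Matches."""
    return any(match_holds(m, occupied) for m in all_matches)

# "Mustang" at position 2, and "rust" must not occur at position 34.
example = [Match(includes=[StringMatch(1, 2, 2)],
                 excludes=[StringMatch(2, 34, 34)])]
```

An AllMatches with no Matches is an empty disjunction, so under this reading it can never be satisfied.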

Examples

Since in most of the examples below the tokens span only a single position, we characterize the TokenInfo instance by simply giving this position, written as "Pos:X". This should be read as the value of both the startPos and the endPos attributes. Furthermore, for expository reasons, we include in each StringMatch example an attribute "query string", set to the original query string, to make it clear which query string each match came from.

The simplest example of an FTSelection is an FTWords such as "Mustang". The AllMatches corresponding to this FTWords is given below.

As shown, the AllMatches consists of two Matches. Each Match represents one possible result of the FTWords "Mustang". The result represented by the first Match, represented as a StringInclude, contains the token "Mustang" at position 2. The result described by the second Match contains the token "Mustang" at position 28.

A more complex example of an FTSelection is an FTWords such as "Ford Mustang". The AllMatches for this FTWords is given below.

There are two possible results for this FTWords, and these are represented by the two Matches. Each of the Matches requires two tokens to be matched. The first Match is obtained by matching "Ford" at position 1 and matching "Mustang" at position 2. Similarly, the second Match is obtained by matching "Ford" at position 27 and "Mustang" at position 28.

An even more complex example of an FTSelection is an FTSelection such as "Mustang" ftand ftnot "rust" that searches for "Mustang" but not "rust". The AllMatches for this FTSelection is given below.

This example introduces StringExclude. StringExclude corresponds to negation in DNF (Disjunctive Normal Form). It specifies that the result described by the corresponding Match must not match the token at the specified position. In this example, the first Match specifies that "Mustang" is matched at position 2, and that the token "rust" at position 34 is not matched.

XML representation

AllMatches has a well-defined hierarchical structure. Therefore, the AllMatches can be easily modeled in XML. This XML representation and those which follow formally describe the semantics of FTSelections. For example, the XML representation of AllMatches formally specifies how an FTSelection operates on zero or more AllMatches to produce a resulting AllMatches.

The XML schema for representing AllMatches is given below.

The stokenNum attribute in AllMatches is related to the representation of the semantics as XQuery functions. Therefore, it is not considered part of the AllMatches model. The stokenNum attribute stores the number of search tokens used when evaluating the AllMatches. This value is used to compute the correct value for the queryPos attribute in new StringMatches.

XML Representation

FTSelections are fully composable and may be nested arbitrarily under other FTSelections. Each FTSelection may be associated with match options (such as stemming and stop words) and score weights. Since score weights are solely interpreted by the formal semantics scoring function, they do not influence the semantics of FTSelections. Therefore, score weights are not considered in the formal semantics.

The XML representation of the FTSelections used in the fts:evaluate function closely follows the grammar of the language. It can be viewed as an XML representation of an abstract syntax tree (AST) of a parsed full-text query. Every FTSelection is represented as an XML element. Every nested FTSelection is represented as a nested descendant element. For binary FTSelections, e.g., FTAnd, the nested FTSelections are represented in <left> and <right> descendant elements. For unary FTSelections, a <selection> descendant element is used. Additional characteristics of FTSelections, e.g., the distance unit for FTDistance, are stored in attributes.
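As an illustration of this AST-like representation, the following Python sketch builds the element tree for an FTAnd over two FTWords using the standard library. The element and attribute names follow the schema in this section; the content model of `searchToken` is not shown here, so storing the token as text content is an assumption, and the helper names `fts` and `ft_words` are hypothetical.

```python
import xml.etree.ElementTree as ET

FTS_NS = "http://www.w3.org/2007/xpath-full-text-10"

def fts(parent, name, **attrs):
    """Create an element in the fts namespace, optionally under a parent."""
    tag = "{%s}%s" % (FTS_NS, name)
    if parent is None:
        return ET.Element(tag, attrs)
    return ET.SubElement(parent, tag, attrs)

def ft_words(parent, words_type, *tokens):
    """Append an ftWords element holding one searchItem with the given tokens."""
    words = fts(parent, "ftWords", type=words_type)
    item = fts(words, "searchItem")
    for token in tokens:
        # Assumption: the token text is stored as element content.
        fts(item, "searchToken").text = token
    return words

# Binary FTSelection: nested selections go under <left> and <right>.
selection = fts(None, "selection")
ft_and = fts(selection, "ftAnd")
ft_words(fts(ft_and, "left"), "phrase", "Ford", "Mustang")
ft_words(fts(ft_and, "right"), "any", "rust")
```

Calling `ET.tostring(selection)` serializes the tree, which makes the nesting of `left` and `right` under `ftAnd` easy to inspect.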

<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fts="http://www.w3.org/2007/xpath-full-text-10"
    targetNamespace="http://www.w3.org/2007/xpath-full-text-10"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified">

  <xs:include schemaLocation="AllMatches.xsd" />
  <xs:include schemaLocation="MatchOptions.xsd" />

  <xs:complexType name="ftSelection">
    <xs:sequence>
      <xs:choice>
        <xs:element name="ftWords" type="fts:ftWords"/>
        <xs:element name="ftAnd" type="fts:ftAnd"/>
        <xs:element name="ftOr" type="fts:ftOr"/>
        <xs:element name="ftUnaryNot" type="fts:ftUnaryNot"/>
        <xs:element name="ftMildNot" type="fts:ftMildNot"/>
        <xs:element name="ftOrder" type="fts:ftOrder"/>
        <xs:element name="ftScope" type="fts:ftScope"/>
        <xs:element name="ftContent" type="fts:ftContent"/>
        <xs:element name="ftDistance" type="fts:ftDistance"/>
        <xs:element name="ftWindow" type="fts:ftWindow"/>
        <xs:element name="ftTimes" type="fts:ftTimes"/>
      </xs:choice>
      <xs:element ref="fts:matchOptions" minOccurs="0"/>
      <xs:element name="weight" type="xs:double" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="selection" type="fts:ftSelection"/>

  <xs:complexType name="ftWords">
    <xs:sequence>
      <xs:element ref="fts:searchItem" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="type" type="fts:ftWordsType" use="required"/>
  </xs:complexType>

  <xs:element name="searchItem" type="fts:searchItem"/>

  <xs:complexType name="ftAnd">
    <xs:sequence>
      <xs:element name="left" type="fts:ftSelection"/>
      <xs:element name="right" type="fts:ftSelection"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="ftOr">
    <xs:sequence>
      <xs:element name="left" type="fts:ftSelection"/>
      <xs:element name="right" type="fts:ftSelection"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="ftUnaryNot">
    <xs:sequence>
      <xs:element name="selection" type="fts:ftSelection"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="ftMildNot">
    <xs:sequence>
      <xs:element name="left" type="fts:ftSelection"/>
      <xs:element name="right" type="fts:ftSelection"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="ftOrder">
    <xs:sequence>
      <xs:element name="selection" type="fts:ftSelection"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="ftScope">
    <xs:sequence>
      <xs:element name="selection" type="fts:ftSelection"/>
    </xs:sequence>
    <xs:attribute name="type" type="fts:scopeType" use="required"/>
    <xs:attribute name="scope" type="fts:scopeSelector" use="required"/>
  </xs:complexType>

  <xs:complexType name="ftContent">
    <xs:sequence>
      <xs:element name="selection" type="fts:ftSelection"/>
    </xs:sequence>
    <xs:attribute name="type" type="fts:contentMatchType" use="required"/>
  </xs:complexType>

  <xs:complexType name="ftDistance">
    <xs:sequence>
      <xs:element name="range" type="fts:ftRangeSpec"/>
      <xs:element name="selection" type="fts:ftSelection"/>
    </xs:sequence>
    <xs:attribute name="type" type="fts:distanceType" use="required"/>
  </xs:complexType>

  <xs:complexType name="ftWindow">
    <xs:sequence>
      <xs:element name="selection" type="fts:ftSelection"/>
    </xs:sequence>
    <xs:attribute name="size" type="xs:integer" use="required"/>
    <xs:attribute name="type" type="fts:distanceType" use="required"/>
  </xs:complexType>

  <xs:complexType name="ftTimes">
    <xs:sequence>
      <xs:element name="range" type="fts:ftRangeSpec"/>
      <xs:element name="selection" type="fts:ftWords"/>
    </xs:sequence>
  </xs:complexType>

  <xs:simpleType name="ftWordsType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="any"/>
      <xs:enumeration value="all"/>
      <xs:enumeration value="phrase"/>
      <xs:enumeration value="any word"/>
      <xs:enumeration value="all word"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="scopeType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="same"/>
      <xs:enumeration value="different"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="scopeSelector">
    <xs:restriction base="xs:string">
      <xs:enumeration value="paragraph"/>
      <xs:enumeration value="sentence"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="distanceType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="paragraph"/>
      <xs:enumeration value="sentence"/>
      <xs:enumeration value="word"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="contentMatchType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="at start"/>
      <xs:enumeration value="at end"/>
      <xs:enumeration value="entire content"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>

The evaluate function

The denotational semantics for the evaluation of FTSelections is defined using the fts:evaluate function. The function takes four parameters: 1) an FTSelection, 2) a search context item, 3) the default set of match options that apply to the evaluation of the FTSelection, and 4) the number of search tokens processed so far.

The fts:evaluate function returns the AllMatches that is the result of evaluating the FTSelection. When fts:evaluate is applied to some FTSelection X, it calls the function fts:ApplyX to build the resulting AllMatches. If X is applied on nested FTSelections, the fts:evaluate function is recursively called on these nested FTSelections and the returned AllMatches are used in the evaluation of fts:ApplyX.

The semantics for the fts:evaluate function is given below.

declare function fts:evaluate (
    $ftSelect as element(*, fts:ftSelection),
    $searchContext as item(),
    $matchOptions as element(fts:matchOptions),
    $searchTokenNum as xs:integer )
  as element(fts:allMatches)
{
  if (fn:count($ftSelect/fts:matchOptions) > 0) then
    (: First we deal with all match options that the  :)
    (: FTSelection might bear: we add the match options :)
    (: to the current match options structure, and    :)
    (: pass the new structure to the recursive call.  :)
    let $newFTSelection :=
      <fts:selection>{$ftSelect/*
        [fn:not(self::fts:matchOptions)]}</fts:selection>
    return fts:evaluate($newFTSelection,
                        $searchContext,
                        fts:replaceMatchOptions($matchOptions,
                                                $ftSelect/fts:matchOptions),
                        $searchTokenNum)
  else if (fn:count($ftSelect/fts:weight) > 0) then
    (: Weight has no bearing on semantics -- just :)
    (: call "evaluate" on nested FTSelection      :)
    let $newFTSelection := $ftSelect/*[fn:not(self::fts:weight)]
    return fts:evaluate($newFTSelection,
                        $searchContext,
                        $matchOptions,
                        $searchTokenNum)
  else
    typeswitch ($ftSelect/*[1])
      case $nftSelection as element(fts:ftWords) return
        (: Apply the FTWords in the search context :)
        fts:ApplyFTWords($searchContext,
                         $matchOptions,
                         $nftSelection/@type,
                         $nftSelection/fts:searchItem,
                         $searchTokenNum + 1)
      case $nftSelection as element(fts:ftAnd) return
        let $left := fts:evaluate($nftSelection/fts:left,
                                  $searchContext,
                                  $matchOptions,
                                  $searchTokenNum)
        let $newSearchTokenNum := $left/@stokenNum
        let $right := fts:evaluate($nftSelection/fts:right,
                                   $searchContext,
                                   $matchOptions,
                                   $newSearchTokenNum)
        return fts:ApplyFTAnd($left, $right)
      case $nftSelection as element(fts:ftOr) return
        let $left := fts:evaluate($nftSelection/fts:left,
                                  $searchContext,
                                  $matchOptions,
                                  $searchTokenNum)
        let $newSearchTokenNum := $left/@stokenNum
        let $right := fts:evaluate($nftSelection/fts:right,
                                   $searchContext,
                                   $matchOptions,
                                   $newSearchTokenNum)
        return fts:ApplyFTOr($left, $right)
      case $nftSelection as element(fts:ftUnaryNot) return
        let $nested := fts:evaluate($nftSelection/fts:selection,
                                    $searchContext,
                                    $matchOptions,
                                    $searchTokenNum)
        return fts:ApplyFTUnaryNot($nested)
      case $nftSelection as element(fts:ftMildNot) return
        let $left := fts:evaluate($nftSelection/fts:left,
                                  $searchContext,
                                  $matchOptions,
                                  $searchTokenNum)
        let $newSearchTokenNum := $left/@stokenNum
        let $right := fts:evaluate($nftSelection/fts:right,
                                   $searchContext,
                                   $matchOptions,
                                   $newSearchTokenNum)
        return fts:ApplyFTMildNot($left, $right)
      case $nftSelection as element(fts:ftOrder) return
        let $nested := fts:evaluate($nftSelection/fts:selection,
                                    $searchContext,
                                    $matchOptions,
                                    $searchTokenNum)
        return fts:ApplyFTOrder($nested)
      case $nftSelection as element(fts:ftScope) return
        let $nested := fts:evaluate($nftSelection/fts:selection,
                                    $searchContext,
                                    $matchOptions,
                                    $searchTokenNum)
        return fts:ApplyFTScope($nftSelection/@type,
                                $nftSelection/@scope,
                                $nested)
      case $nftSelection as element(fts:ftContent) return
        let $nested := fts:evaluate($nftSelection/fts:selection,
                                    $searchContext,
                                    $matchOptions,
                                    $searchTokenNum)
        return fts:ApplyFTContent($searchContext,
                                  $matchOptions,
                                  $nftSelection/@type,
                                  $nested)
      case $nftSelection as element(fts:ftDistance) return
        let $nested := fts:evaluate($nftSelection/fts:selection,
                                    $searchContext,
                                    $matchOptions,
                                    $searchTokenNum)
        return fts:ApplyFTDistance($matchOptions,
                                   $nftSelection/@type,
                                   $nftSelection/fts:range,
                                   $nested)
      case $nftSelection as element(fts:ftWindow) return
        let $nested := fts:evaluate($nftSelection/fts:selection,
                                    $searchContext,
                                    $matchOptions,
                                    $searchTokenNum)
        return fts:ApplyFTWindow($matchOptions,
                                 $nftSelection/@type,
                                 $nftSelection/@size,
                                 $nested)
      case $nftSelection as element(fts:ftTimes) return
        let $nested := fts:evaluate($nftSelection/fts:selection,
                                    $searchContext,
                                    $matchOptions,
                                    $searchTokenNum)
        return fts:ApplyFTTimes($nftSelection/fts:range, $nested)
      default return ()
};

For concreteness, assume that the FTSelection was invoked inside an ftcontains expression such as searchContext ftcontains ftselection. In order to determine the AllMatches result of ftselection, the fts:evaluate function is invoked as follows: fts:evaluate($ftselection, $searchContext, $matchOptions, 0), where $ftselection is the XML representation of the ftselection and $searchContext is bound to the result of the evaluation of the XQuery expression searchContext.

Initially, $searchTokenNum is 0, i.e., no search tokens have been processed.

The variable $matchOptions is bound to the list of match options as defined in the static context (see Appendix ). Match options embedded in ftselection modify the match options collection as evaluation proceeds.

Match options are applied to an FTSelection, organized in a stack.

The top match option in the stack is applied first.

The second match option is applied next.

Match options are applied sequentially down to the bottom of the stack.

Ordering among match options is necessary because match options are not always commutative. For example, synonym(stem(word)) is not always the same as stem(synonym(word)). Naturally, match options may be reordered when they commute, but this is an optimization issue and is beyond the scope of this document.
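A small Python sketch illustrates why the order matters. The stem and synonym tables below are invented for this example only; they do not come from any real stemmer or thesaurus.

```python
# Toy lookup tables, invented purely for illustration.
STEMS = {"running": "run", "runs": "run"}
SYNONYMS = {"run": "sprint", "running": "operating"}

def stem(word):
    """Return the stem of a word, or the word itself if none is known."""
    return STEMS.get(word, word)

def synonym(word):
    """Return a synonym of a word, or the word itself if none is known."""
    return SYNONYMS.get(word, word)

stem_then_synonym = synonym(stem("running"))   # "running" -> "run" -> "sprint"
synonym_then_stem = stem(synonym("running"))   # "running" -> "operating"
```

With these tables the two orders produce different tokens, so an implementation may only swap the two options for words on which they happen to agree.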

Given the invocation fts:evaluate($ftselection, $searchContext, $matchOptions, $searchTokenNum), evaluation proceeds as follows. First, $ftselection is examined to determine which of four cases applies: 1) it contains a match option that applies to a nested FTSelection, 2) it contains a weight specification, 3) it is an FTWords, or 4) it is some other FTSelection.

If $ftselection contains a match option, then it modifies the context for the nested FTSelection. Consequently, a new match option element is created and pushed onto the top of the stack of match options. The createOptionElement function creates the stack element corresponding to the match option: a data structure that stores the type of the match option (such as stemming or thesaurus) and the details relating to it (such as the name of the thesaurus, or the stop words for which other tokens may be substituted). The match option created is added to the top of the stack because, in the FTSelection, it was applied before the other match options in the current match options stack. The evaluate function is then invoked on the nested FTSelection with the new match options stack. When the function returns, the match option is popped from the stack, and the result of the nested evaluate function is returned. The match option is popped because it does not apply to FTSelections outside its scope.

If $ftselection contains a weight specification, then the specification is ignored because it does not alter the semantics. The evaluate function is recursively called on the nested FTSelection and the resulting AllMatches is returned.

If $ftselection is an FTWords, then it does not have any nested FTSelections. Consequently, this is the base of the recursive call, and the AllMatches result of the FTWords is computed and returned. The AllMatches is computed by invoking the ApplyFTWords function with the current search context and other necessary information.

If $ftselection contains neither a match option nor a weight specification and is not an FTWords, the FTSelection performs a full-text operation, such as ftand, ftor, or window. These operations are fully compositional and may be invoked on nested FTSelections. Consequently, evaluation proceeds as follows.

First, the evaluate function is recursively invoked on each nested FTSelection. The result of evaluating each nested FTSelection is an AllMatches.

The AllMatches are then transformed into the resulting AllMatches by applying the full-text operation corresponding to the FTSelection. In the code, this function is generically named ApplyX for an FTSelection of kind X.

For example, let FTSelection1 be FTSelection2 ftand FTSelection3. Here FTSelection2 and FTSelection3 may themselves be arbitrarily nested FTSelections. Thus, evaluate is invoked on FTSelection2 and FTSelection3, and the resulting AllMatches are transformed to the final AllMatches using the ApplyFTAnd function corresponding to ftand.
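The recursion, including the way the search token number is threaded from the left operand to the right one, can be sketched in Python. This is a simplified model and not the normative definition: selections are tuples, the search context is a plain list of tokens, and the bodies of `apply_ft_and` and `apply_ft_or` (pairwise combination and concatenation of Matches, with the larger token number kept) are assumptions about the ApplyX functions, which are not defined in this part of the document.

```python
def apply_ft_words(token, doc_tokens, query_pos):
    """Base case: one Match per occurrence of the token (positions are 1-based)."""
    matches = [[(query_pos, pos)]
               for pos, t in enumerate(doc_tokens, start=1) if t == token]
    return {"stoken_num": query_pos, "matches": matches}

def apply_ft_and(left, right):
    """Assumed conjunction: pair every left Match with every right Match."""
    return {"stoken_num": max(left["stoken_num"], right["stoken_num"]),
            "matches": [l + r for l in left["matches"] for r in right["matches"]]}

def apply_ft_or(left, right):
    """Assumed disjunction: keep the Matches of both operands."""
    return {"stoken_num": max(left["stoken_num"], right["stoken_num"]),
            "matches": left["matches"] + right["matches"]}

def evaluate(selection, doc_tokens, search_token_num=0):
    """selection is ("words", token) or ("and" | "or", left, right)."""
    kind = selection[0]
    if kind == "words":
        return apply_ft_words(selection[1], doc_tokens, search_token_num + 1)
    left = evaluate(selection[1], doc_tokens, search_token_num)
    # The right operand continues numbering where the left one stopped.
    right = evaluate(selection[2], doc_tokens, left["stoken_num"])
    if kind == "and":
        return apply_ft_and(left, right)
    if kind == "or":
        return apply_ft_or(left, right)
    raise ValueError("unsupported selection kind: %r" % kind)

doc = ["ford", "mustang", "is", "a", "mustang"]
result = evaluate(("and", ("words", "ford"), ("words", "mustang")), doc)
```

Each Match in `result["matches"]` is a list of (query position, document position) pairs, so the two occurrences of "mustang" yield two Matches that both include "ford" at position 1.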

The semantics of the ApplyX function for each FTSelection kind X is given below.

Formal semantics functions

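The formal semantics of the ApplyX functions for each FTSelection kind X relies on five functions: wordDistance, paraDistance, sentenceDistance, isStartToken, and isEndToken. How two of these functions (isStartToken and isEndToken, which are declared external) are computed is implementation-dependent, but all five functions must satisfy some well-defined properties.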

The wordDistance function returns the number of tokens that occur between the positions of the TokenInfos $tokenInfo1 and $tokenInfo2. For example, two consecutive tokens have a distance of 0 tokens.

declare function fts:wordDistance (
    $tokenInfo1 as element(fts:tokenInfo),
    $tokenInfo2 as element(fts:tokenInfo),
    $matchOptions as element(fts:matchOptions) )
  as xs:integer
{
  (: -1 because we count starting at 0 :)
  $tokenInfo2/@startPos - $tokenInfo1/@endPos - 1
};

The paraDistance function returns the number of paragraphs between the TokenInfos $tokenInfo1 and $tokenInfo2.

declare function fts:paraDistance (
    $tokenInfo1 as element(fts:tokenInfo),
    $tokenInfo2 as element(fts:tokenInfo),
    $matchOptions as element(fts:matchOptions) )
  as xs:integer
{
  (: -1 because we count starting at 0 :)
  $tokenInfo2/@startPara - $tokenInfo1/@endPara - 1
};

The sentenceDistance function returns the number of sentences between the TokenInfos $tokenInfo1 and $tokenInfo2.

declare function fts:sentenceDistance (
    $tokenInfo1 as element(fts:tokenInfo),
    $tokenInfo2 as element(fts:tokenInfo),
    $matchOptions as element(fts:matchOptions) )
  as xs:integer
{
  (: -1 because we count starting at 0 :)
  $tokenInfo2/@startSent - $tokenInfo1/@endSent - 1
};
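The arithmetic shared by the three distance functions can be checked with a short Python sketch. TokenInfos are modeled here as plain dictionaries with `start_pos` and `end_pos` keys, which is an assumption made for the example; the paragraph and sentence variants would apply the same formula to the corresponding paragraph and sentence numbers.

```python
def word_distance(token_info1, token_info2):
    """Number of tokens strictly between the end of the first and the start of the second."""
    return token_info2["start_pos"] - token_info1["end_pos"] - 1

ford = {"start_pos": 1, "end_pos": 1}
mustang = {"start_pos": 2, "end_pos": 2}
ford_mustang = {"start_pos": 1, "end_pos": 2}   # a two-token phrase
rust = {"start_pos": 5, "end_pos": 5}

adjacent = word_distance(ford, mustang)         # 2 - 1 - 1
after_phrase = word_distance(ford_mustang, rust)  # 5 - 2 - 1
```

The formula is not symmetric: if the second TokenInfo precedes the first, the result is negative, so callers that need a distance presumably have to pass the arguments in document order.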

The isStartToken function returns true if the TokenInfo $tokenInfo describes a token whose start position is the first position of the node $searchContext.

declare function fts:isStartToken (
    $searchContext as item(),
    $tokenInfo as element(fts:tokenInfo) )
  as xs:boolean external;

The isEndToken function returns true if the TokenInfo $tokenInfo describes a token whose end position is the last position of the node $searchContext.

declare function fts:isEndToken (
    $searchContext as item(),
    $tokenInfo as element(fts:tokenInfo) )
  as xs:boolean external;

FTWords

An FTWords that consists of a single search string consisting of a sequence of tokens to be matched as a phrase is evaluated by the applySearchTokensAsPhrase function. Its parameters are 1) the search context, 2) the list of match options, 3) the search string to be matched, as a sequence of fts:searchToken items, and 4) the position where that search string occurs in the query.

(: simplified version not dealing with special match options :)
declare function fts:applySearchTokensAsPhrase (
    $searchContext as item(),
    $matchOptions as element(fts:matchOptions),
    $searchTokens as element(fts:searchToken)*,
    $queryPos as xs:integer )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{$queryPos}">
  {
    for $tokenInfo in
      fts:matchTokenInfos(
        $searchContext,
        $matchOptions,
        (),
        $searchTokens )
    return
      <fts:match>
        <fts:stringInclude queryPos="{$queryPos}" isContiguous="true">
          {$tokenInfo}
        </fts:stringInclude>
      </fts:match>
  }
  </fts:allMatches>
};

If, after the application of all the match options, the sequence of search tokens returned for an FTWords is empty, an empty AllMatches is returned.

The AllMatches corresponding to an FTWords is a set of Matches. Each of the Matches is associated with a start and an end position indicating where the corresponding search tokens were found. For example, the AllMatches result for the FTWords "Mustang" is given below. To simplify the presentation in the figures we write Pos: N, if the attributes startPos and endPos are the same with N being that position.

There are five variations of FTWords depending on how the tokens and phrases in the nested XQuery 1.0 and XPath 2.0 expression are matched.

When any word is specified, at least one token in the tokenization of the nested expression must be matched.

When all word is specified, all tokens in the tokenization of the nested expression must be matched.

When phrase is specified, all tokens in the tokenization of the nested expression must be matched as a phrase.

When any is specified, at least one string atomic value in the nested expression must be matched as a phrase.

When all is specified, all string atomic values in the nested expression must be matched as a phrase.

The semantics for FTWords when any word is specified is given below. Since FTWords does not have nested FTSelections, the ApplyFTWords function does not take AllMatches parameters corresponding to nested FTSelection results.

declare function fts:MakeDisjunction (
    $curRes as element(fts:allMatches),
    $rest as element(fts:allMatches)* )
  as element(fts:allMatches)
{
  if (fn:count($rest) = 0)
  then $curRes
  else
    let $firstAllMatches := $rest[1]
    let $restAllMatches := fn:subsequence($rest, 2)
    let $newCurRes := fts:ApplyFTOr($curRes, $firstAllMatches)
    return fts:MakeDisjunction($newCurRes, $restAllMatches)
};

declare function fts:ApplyFTWordsAnyWord (
    $searchContext as item(),
    $matchOptions as element(fts:matchOptions),
    $searchItems as element(fts:searchItem)*,
    $queryPos as xs:integer )
  as element(fts:allMatches)
{
  (: Tokenization of search strings has already occurred. :)
  (: Get sequence of SearchTokens over all search items.  :)
  let $searchTokens := $searchItems/fts:searchToken
  return
    (: Test the flattened tokens, as in ApplyFTWordsAllWord, so :)
    (: that search items without tokens yield an empty result.  :)
    if (fn:count($searchTokens) eq 0)
    then <fts:allMatches stokenNum="0" />
    else
      let $allAllMatches :=
        for $searchToken at $pos in $searchTokens
        return fts:applySearchTokensAsPhrase($searchContext,
                                             $matchOptions,
                                             $searchToken,
                                             $queryPos + $pos - 1)
      let $firstAllMatches := $allAllMatches[1]
      let $restAllMatches := fn:subsequence($allAllMatches, 2)
      return fts:MakeDisjunction($firstAllMatches, $restAllMatches)
};

The tokenized search strings are passed to ApplyFTWordsAnyWord as a sequence of fts:searchItem, each containing the tokens of a single search string. A single flattened sequence of all tokens (of type fts:searchToken) over all search items is constructed. For each of these, the result of FTWords is computed using applySearchTokensAsPhrase. Finally, the disjunction of all resulting AllMatches is computed.

The semantics for FTWords when all word is specified is similar to the above, except that it composes a conjunction instead of a disjunction. It is given below.

declare function fts:MakeConjunction (
    $curRes as element(fts:allMatches),
    $rest as element(fts:allMatches)* )
  as element(fts:allMatches)
{
  if (fn:count($rest) = 0)
  then $curRes
  else
    let $firstAllMatches := $rest[1]
    let $restAllMatches := fn:subsequence($rest, 2)
    let $newCurRes := fts:ApplyFTAnd($curRes, $firstAllMatches)
    return fts:MakeConjunction($newCurRes, $restAllMatches)
};

declare function fts:ApplyFTWordsAllWord (
    $searchContext as item(),
    $matchOptions as element(fts:matchOptions),
    $searchItems as element(fts:searchItem)*,
    $queryPos as xs:integer )
  as element(fts:allMatches)
{
  (: Tokenization of search strings has already occurred. :)
  (: Get sequence of SearchTokens over all search items   :)
  let $searchTokens := $searchItems/fts:searchToken
  return
    if (fn:count($searchTokens) eq 0)
    then <fts:allMatches stokenNum="0" />
    else
      let $allAllMatches :=
        for $searchToken at $pos in $searchTokens
        return fts:applySearchTokensAsPhrase($searchContext,
                                             $matchOptions,
                                             $searchToken,
                                             $queryPos + $pos - 1)
      let $firstAllMatches := $allAllMatches[1]
      let $restAllMatches := fn:subsequence($allAllMatches, 2)
      return fts:MakeConjunction($firstAllMatches, $restAllMatches)
};

The semantics for FTWords if phrase is specified is given below.

declare function fts:ApplyFTWordsPhrase (
    $searchContext as item(),
    $matchOptions as element(fts:matchOptions),
    $searchItems as element(fts:searchItem)*,
    $queryPos as xs:integer )
  as element(fts:allMatches)
{
  (: Get sequence of SearchTokenInfos over all search items :)
  let $searchTokens := $searchItems/fts:searchToken
  return
    if (fn:count($searchTokens) eq 0)
    then <fts:allMatches stokenNum="0" />
    else
      fts:applySearchTokensAsPhrase($searchContext,
                                    $matchOptions,
                                    $searchTokens,
                                    $queryPos)
};

The ApplyFTWordsPhrase function also flattens the sequence of search items to a sequence of search tokens, but then calls applySearchTokensAsPhrase on that entire sequence, instead of calling it on each search token individually. Hence, the sequence of all search tokens is matched as a single phrase and the computed TokenInfos are returned.
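The difference between matching each flattened search token individually (any word) and matching the whole flattened sequence as one phrase can be sketched in Python. The sketch ignores match options and query positions and uses exact string equality on a plain token list; the function names are hypothetical.

```python
def find_phrase(doc_tokens, phrase):
    """Return (startPos, endPos) for each contiguous occurrence (1-based positions)."""
    n = len(phrase)
    return [(i + 1, i + n)
            for i in range(len(doc_tokens) - n + 1)
            if doc_tokens[i:i + n] == phrase]

def flatten(search_items):
    """Flatten a sequence of search items into one sequence of search tokens."""
    return [token for item in search_items for token in item]

def match_any_word(doc_tokens, search_items):
    """Each search token is matched on its own; all occurrences are collected."""
    spans = []
    for token in flatten(search_items):
        spans.extend(find_phrase(doc_tokens, [token]))
    return spans

def match_phrase(doc_tokens, search_items):
    """The flattened token sequence is matched as a single phrase."""
    tokens = flatten(search_items)
    return find_phrase(doc_tokens, tokens) if tokens else []

doc = ["ford", "mustang", "red", "mustang"]
items = [["ford", "mustang"]]
any_word_spans = match_any_word(doc, items)
phrase_spans = match_phrase(doc, items)
```

For the same search item, the any word variant reports three single-token spans while the phrase variant reports one span covering positions 1 to 2.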

The semantics for FTWords when any is specified is given below.

declare function fts:ApplyFTWordsAny (
    $searchContext as item(),
    $matchOptions as element(fts:matchOptions),
    $searchItems as element(fts:searchItem)*,
    $queryPos as xs:integer )
  as element(fts:allMatches)
{
  if (fn:count($searchItems) eq 0)
  then <fts:allMatches stokenNum="0" />
  else
    let $firstSearchItem := $searchItems[1]
    let $restSearchItem := fn:subsequence($searchItems, 2)
    let $firstAllMatches :=
      fts:ApplyFTWordsPhrase($searchContext,
                             $matchOptions,
                             $firstSearchItem,
                             $queryPos)
    let $newQueryPos :=
      if ($firstAllMatches//@queryPos)
      then fn:max($firstAllMatches//@queryPos) + 1
      else $queryPos
    let $restAllMatches :=
      fts:ApplyFTWordsAny($searchContext,
                          $matchOptions,
                          $restSearchItem,
                          $newQueryPos)
    return fts:ApplyFTOr($firstAllMatches, $restAllMatches)
};

The FTWords with any specified forms the disjunction of the AllMatches that are the result of the matching of each search item as a phrase.

The semantics for FTWords when all is specified is given below.

declare function fts:ApplyFTWordsAll (
    $searchContext as item(),
    $matchOptions as element(fts:matchOptions),
    $searchItems as element(fts:searchItem)*,
    $queryPos as xs:integer )
  as element(fts:allMatches)
{
  if (fn:count($searchItems) = 0)
  then <fts:allMatches stokenNum="0" />
  else
    let $firstSearchItem := $searchItems[1]
    let $restSearchItem := fn:subsequence($searchItems, 2)
    let $firstAllMatches :=
      fts:ApplyFTWordsPhrase($searchContext,
                             $matchOptions,
                             $firstSearchItem,
                             $queryPos)
    return
      if ($restSearchItem) then
        let $newQueryPos :=
          if ($firstAllMatches//@queryPos)
          then fn:max($firstAllMatches//@queryPos) + 1
          else $queryPos
        let $restAllMatches :=
          fts:ApplyFTWordsAll($searchContext,
                              $matchOptions,
                              $restSearchItem,
                              $newQueryPos)
        return fts:ApplyFTAnd($firstAllMatches, $restAllMatches)
      else $firstAllMatches
};

The difference between all and any is the use of conjunction instead of disjunction.
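The contrast between the two combinations can be sketched in Python. Here each per-item AllMatches is a list of Matches and each Match is a list of includes; modeling disjunction as concatenation and conjunction as pairwise combination is an assumption about ApplyFTOr and ApplyFTAnd, and the sketch folds from the left, whereas the functions above recurse on the rest of the search items.

```python
def combine(per_item_all_matches, mode):
    """Combine the per-search-item results of a non-empty list; mode is "any" or "all"."""
    result = per_item_all_matches[0]
    for nxt in per_item_all_matches[1:]:
        if mode == "any":
            # Disjunction: a Match from either operand is enough.
            result = result + nxt
        else:
            # Conjunction: every resulting Match needs one Match from each operand.
            result = [a + b for a in result for b in nxt]
    return result

ford = [["ford@1"]]                          # one occurrence of the first item
mustang = [["mustang@2"], ["mustang@5"]]     # two occurrences of the second item

any_result = combine([ford, mustang], "any")
all_result = combine([ford, mustang], "all")
```

With these inputs `any` yields three alternative Matches of one include each, while `all` yields two Matches that each carry both includes.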

The ApplyFTWords function combines all of these functions.

declare function fts:ApplyFTWords (
    $searchContext as item(),
    $matchOptions as element(fts:matchOptions),
    $type as fts:ftWordsType,
    $searchItems as element(fts:searchItem)*,
    $queryPos as xs:integer )
  as element(fts:allMatches)
{
  if ($type eq "any word") then
    fts:ApplyFTWordsAnyWord($searchContext,
                            $matchOptions,
                            $searchItems,
                            $queryPos)
  else if ($type eq "all word") then
    fts:ApplyFTWordsAllWord($searchContext,
                            $matchOptions,
                            $searchItems,
                            $queryPos)
  else if ($type eq "phrase") then
    fts:ApplyFTWordsPhrase($searchContext,
                           $matchOptions,
                           $searchItems,
                           $queryPos)
  else if ($type eq "any") then
    fts:ApplyFTWordsAny($searchContext,
                        $matchOptions,
                        $searchItems,
                        $queryPos)
  else
    fts:ApplyFTWordsAll($searchContext,
                        $matchOptions,
                        $searchItems,
                        $queryPos)
};

Match Options Semantics

Types

XQuery 1.0 functions are used to define the semantics of FTMatchOptions. These functions operate on an XML representation of the FTMatchOptions. The representation closely follows the syntax. Each FTMatchOption is represented by an XML element. Additional characteristics of the match option are represented as attributes. The schema is given below.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fts="http://www.w3.org/2007/xpath-full-text-10" targetNamespace="http://www.w3.org/2007/xpath-full-text-10" elementFormDefault="qualified" attributeFormDefault="unqualified"> <xs:complexType name="ftMatchOptions"> <xs:sequence> <xs:element ref="thesaurus" minOccurs="0" maxOccurs="1"/> <xs:element ref="stopwords" minOccurs="0" maxOccurs="1"/> <xs:element ref="case" minOccurs="0" maxOccurs="1"/> <xs:element ref="diacritics" minOccurs="0" maxOccurs="1"/> <xs:element ref="stem" minOccurs="0" maxOccurs="1"/> <xs:element ref="wildcard" minOccurs="0" maxOccurs="1"/> <xs:element ref="language" minOccurs="0" maxOccurs="1"/> </xs:sequence> </xs:complexType> <xs:element name="matchOptions" type="fts:ftMatchOptions"/> <xs:element name="case" type="fts:ftCaseOption" /> <xs:element name="diacritics" type="fts:ftDiacriticsOption" /> <xs:element name="thesaurus" type="fts:ftThesaurusOption" /> <xs:element name="stem" type="fts:ftStemOption" /> <xs:element name="wildcard" type="fts:ftWildCardOption" /> <xs:element name="language" type="fts:ftLanguageOption" /> <xs:element name="stopwords" type="fts:ftStopwordOption" /> <xs:complexType name="ftCaseOption"> <xs:sequence> <xs:element name="value"> <xsd:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="case insensitive"/> <xs:enumeration value="case sensitive"/> <xs:enumeration value="lowercase"/> <xs:enumeration value="uppercase"/> </xs:restriction> </xs:simpleType> </xs:element> </xs:sequence> </xs:complexType> <xs:complexType name="ftDiacriticsOption"> <xs:sequence> <xs:element name="value"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="case insensitive"/> <xs:enumeration value="case sensitive"/> </xs:restriction> </xs:simpleType> </xs:element> </xs:sequence> </xs:complexType> <xs:complexType name="ftThesaurusOption"> <xs:sequence> <xs:element name="thesaurusName" type="xs:string" minOccurs="0" maxOccurs="1"/> <xs:element 
name="relationship" type="xs:string" minOccurs="0" maxOccurs="1"/> <xs:element name="range" type="fts:FTRangeSpec" minOccurs="0" maxOccurs="1"/> </xs:sequence> <xs:attribute name="thesaurusIndicator"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="with"/> <xs:enumeration value="without"/> </xs:restriction> </xs:simpleType> </xs:attribute> <xs:attribute name="language" type="xs:string"/> </xs:complexType> <xs:complexType name="ftRangeSpec"> <xs:attribute name="type" type="fts:rangeSpecType" use="required"/> <xs:attribute name="m" type="xs:integer"/> <xs:attribute name="n" type="xs:integer" use="required"/> </xs:complexType> <xs:simpleType name="rangeSpecType"> <xs:restriction base="xs:string"> <xs:enumeration value="exactly"/> <xs:enumeration value="at least"/> <xs:enumeration value="at most"/> <xs:enumeration value="from to"/> </xs:restriction> </xs:simpleType> <xs:complexType name="ftStemOption"> <xs:sequence> <xs:element name="value"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="with stemming"/> <xs:enumeration value="without stemming"/> </xs:restriction> </xs:simpleType> </xs:element> </xs:sequence> </xs:complexType> <xs:complexType name="ftWildCardOption"> <xs:sequence> <xs:element name="value"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="with wildcards"/> <xs:enumeration value="without wildcards"/> </xs:restriction> </xs:simpleType> </xs:element> </xs:sequence> </xs:complexType> <xs:complexType name="ftLanguageOption"> <xs:sequence> <xs:element name="value" type="xs:string"/> </xs:sequence> </xs:complexType> <xs:complexType name="ftStopwordOption"> <xs:sequence> <xs:choice> <xs:element name="default-stopwords"> <xs:complexType /> </xs:element> <xs:element name="stopword" type="xs:string" /> <xs:element name="uri" type="xs:anyURI" /> </xs:choice> <xs:element name="oper" minOccurs="0" maxOccurs="unbounded"> <xs:complexType> <xs:choice> <xs:element name="stopword" type="xs:string" 
/> <xs:element name="uri" type="xs:anyURI" /> </xs:choice> <xs:attribute name="type"> <xs:simpleType> <xs:restriction base="xs:string"> <xs:enumeration value="union"/> <xs:enumeration value="except"/> </xs:restriction> </xs:simpleType> </xs:attribute> </xs:complexType> </xs:element> </xs:sequence> </xs:complexType> </xs:schema>

High-Level Semantics

The previous section described FTSelections without giving any details about how FTMatchOptions need to be interpreted. All processing of FTMatchOptions was delegated to the function matchTokenInfos, which is implementation-defined. In this section, further details on the semantics of FTMatchOptions are given.

The extension is achieved by modifying an existing function and adding functions that are specific to the FTMatchOptions.

Modifications in the semantics of existing functions

The semantics of most of the FTSelections remains unmodified. The modifications are to the method for matching a sequence of search tokens.

declare function fts:applySearchTokensAsPhrase (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $searchTokens as element(fts:searchToken)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   let $thesaurusOption := $matchOptions/fts:thesaurus[1]
   return
      if ($thesaurusOption and
          $thesaurusOption/@thesaurusIndicator eq "with") then
         let $noThesaurusOptions :=
            <fts:matchOptions>{
               $matchOptions/*[fn:not(self::fts:thesaurus)]
            }</fts:matchOptions>
         let $lookupRes := fts:applyThesaurusOption($thesaurusOption,
                                                    $searchTokens)
         return fts:ApplyFTWordsAny($searchContext,
                                    $noThesaurusOptions,
                                    $lookupRes,
                                    $queryPos)
      else
         (: from here on we have a single sequence of search tokens :)
         (: which is to be matched as a phrase; no alternatives anymore :)
         <fts:allMatches stokenNum="{$queryPos}">
         {
            for $pos in
               fts:matchTokenInfos(
                  $searchContext,
                  $matchOptions,
                  fts:applyStopwordOption($matchOptions/fts:stopwords),
                  $searchTokens )
            return
               <fts:match>
                  <fts:stringInclude queryPos="{$queryPos}" isContiguous="true">
                     {$pos}
                  </fts:stringInclude>
               </fts:match>
         }
         </fts:allMatches>
};

Two FTMatchOptions, the FTThesaurusOption and the FTStopWordOption, need to be processed differently from the rest of the FTMatchOptions, as shown in the function above.

Unlike all other FTMatchOptions, the semantics of the FTThesaurusOption cannot be formulated as an operation on individual search tokens, because a thesaurus lookup may return alternative search items for a whole phrase, i.e., a sequence of search tokens. Since the result of a thesaurus lookup is a sequence of alternatives, there must be a higher level of processing. The above call to applyThesaurusOption returns, for the given sequence of search tokens (representing a phrase), all thesaurus expansions for the selected thesaurus, relationship, and level range as a sequence of search items. The alternative expansions are evaluated as a disjunction using the function fts:ApplyFTWordsAny. The matching of the alternatives is performed with the FTThesaurusOption turned off to avoid double expansions, i.e., expansion of an already expanded token.

For the semantics of the FTStopWordOption the list of stop words needs to be computed as demanded by the special syntax for stop word lists involving the operators "union" and "except".

Semantics of new FTMatchOptions functions

The extension of the FTSelection semantics also includes adding functions that are specific to the FTMatchOptions.

The evaluate function above handled match options occurring in the query structure by calling the function replaceMatchOptions, which is defined here. This function is used to overwrite a match option that occurs closer to the root of the query structure tree with match options of the same group that occur further below.

declare function fts:replaceMatchOptions (
      $matchOptions as element(fts:matchOptions),
      $newMatchOptions as element(fts:matchOptions) )
   as element(fts:matchOptions)
{
   <fts:matchOptions>
   {
      (if ($newMatchOptions/fts:thesaurus) then $newMatchOptions/fts:thesaurus
       else $matchOptions/fts:thesaurus),
      (if ($newMatchOptions/fts:stopwords) then $newMatchOptions/fts:stopwords
       else $matchOptions/fts:stopwords),
      (if ($newMatchOptions/fts:case) then $newMatchOptions/fts:case
       else $matchOptions/fts:case),
      (if ($newMatchOptions/fts:diacritics) then $newMatchOptions/fts:diacritics
       else $matchOptions/fts:diacritics),
      (if ($newMatchOptions/fts:stem) then $newMatchOptions/fts:stem
       else $matchOptions/fts:stem),
      (if ($newMatchOptions/fts:wildcard) then $newMatchOptions/fts:wildcard
       else $matchOptions/fts:wildcard),
      (if ($newMatchOptions/fts:language) then $newMatchOptions/fts:language
       else $matchOptions/fts:language)
   }
   </fts:matchOptions>
};

This function determines how match options of the same kind overwrite each other, so that only one option of the same kind remains.
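The override rule can be illustrated outside XQuery. The following Python sketch is not part of the formal semantics; it assumes a hypothetical model in which a set of match options is a dictionary keyed by option group, and the names OPTION_GROUPS and replace_match_options are invented for the illustration.

```python
# Hypothetical model: match options as a dict keyed by option group.
OPTION_GROUPS = ("thesaurus", "stopwords", "case", "diacritics",
                 "stem", "wildcard", "language")


def replace_match_options(outer, inner):
    """Return the effective options: for each group, the option given
    further down the query tree (inner) replaces the one given closer
    to the root (outer)."""
    merged = {}
    for group in OPTION_GROUPS:
        if group in inner:
            merged[group] = inner[group]
        elif group in outer:
            merged[group] = outer[group]
    return merged


outer = {"case": "case insensitive", "stem": "with stemming"}
inner = {"case": "case sensitive"}
effective = replace_match_options(outer, inner)
```

In this sketch the inner "case" option wins, while the outer "stem" option survives because nothing of the same group replaces it.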

The details of the semantics of the remaining FTMatchOptions are determined by the implementation-defined function matchTokenInfos.

Formal Semantics Functions

FTMatchOption functions which are necessary to support match option processing are given below.

declare function fts:resolveStopwordsUri (
      $uri as xs:string? )
   as xs:string* external;

declare function fts:lookupThesaurus (
      $tokens as element(fts:searchToken)*,
      $thesaurusName as xs:string?,
      $thesaurusLanguage as xs:string?,
      $relationship as xs:string?,
      $range as element(fts:range)? )
   as element(fts:searchItem)* external;

The function resolveStopwordsUri is used to resolve any URI to a sequence of strings to be used as stop words.

The function lookupThesaurus finds all expansions related to $tokens in the thesaurus $thesaurusName for the language $thesaurusLanguage using the relationship $relationship within the optional number of levels $range. If $tokens consists of more than one search token, it is regarded as a phrase.

The thesaurus function returns a sequence of expansion alternatives. Each alternative is regarded as a new search phrase and is represented as a search item. Alternatives are treated as though they are connected with a disjunction (FTOr).

FTCaseOption

FTMatchOptions of type FTCaseOption are passed in the $matchOptions parameter to matchTokenInfos. If the FTCaseOption is "lowercase" the returned TokenInfos must span only tokens that are all lowercase. If the FTCaseOption is "uppercase" the returned TokenInfos must span only tokens that are all uppercase. If the FTCaseOption is "case insensitive" the function must return all TokenInfos matching the search tokens when disregarding character case. If the FTCaseOption is "case sensitive" the function must return all TokenInfos that also accord with the search tokens in character case.

FTDiacriticsOption

FTMatchOptions of type FTDiacriticsOption are passed in the $matchOptions parameter to matchTokenInfos. If the FTDiacriticsOption is "diacritics insensitive" the function must return all TokenInfos matching the search tokens when disregarding diacritical marks. If the FTDiacriticsOption is "diacritics sensitive" the function must return all TokenInfos that also accord with the search tokens in diacritical marks.

FTStemOption

FTMatchOptions of type FTStemOption are passed in the $matchOptions parameter to matchTokenInfos. The effect of the option "with stemming" on matching tokens is implementation-defined; however, it is expected that this option allows linguistic variants of the search tokens to be matched. If the FTStemOption is "without stemming", the returned TokenInfos must span exact matches (i.e., not including linguistic variations) of the search tokens.

FTThesaurusOption

The semantics for the FTThesaurusOption is given below.

declare function fts:applyThesaurusOption (
      $matchOption as element(fts:thesaurus),
      $searchTokens as element(fts:searchToken)* )
   as element(fts:searchItem)*
{
   if ($matchOption/@thesaurusIndicator = "with") then
      fts:lookupThesaurus( $searchTokens,
                           $matchOption/fts:thesaurusName,
                           $matchOption/@language,
                           $matchOption/fts:relationship,
                           $matchOption/fts:range )
   else if ($matchOption/@thesaurusIndicator = "without") then
      <fts:searchItem>
         {$searchTokens}
      </fts:searchItem>
   else ()
};

FTStopWordOption

Stop words interact with FTDistance and FTWindow. The semantics for the FTStopWordOption is given below.

declare function fts:applyStopwordOption (
      $stopwordOption as element(fts:stopwords)? )
   as xs:string*
{
   if ($stopwordOption) then
      let $swords :=
         typeswitch ($stopwordOption/*[1])
            case $e as element(fts:stopword)
               return $e/text()
            case $e as element(fts:uri)
               return fts:resolveStopwordsUri($e/text())
            case element(fts:default-stopwords)
               return fts:resolveStopwordsUri(())
            default return ()
      return fts:calcStopwords( $swords, $stopwordOption/fts:oper )
   else ()
};

declare function fts:calcStopwords (
      $stopWords as xs:string*,
      $opers as element(fts:oper)* )
   as xs:string*
{
   if ( fn:empty($opers) ) then $stopWords
   else
      let $swords :=
         typeswitch ($opers[1]/*[1])
            case $e as element(fts:stopword)
               return $e/text()
            case $e as element(fts:uri)
               return fts:resolveStopwordsUri($e/text())
            default return ()
      return
         (: the first operator has been consumed; recurse on the rest :)
         if ($opers[1]/@type eq "union") then
            fts:calcStopwords( ($stopWords, $swords),
                               $opers[fn:position() gt 1] )
         else (: "except" :)
            fts:calcStopwords( $stopWords[fn:not(. = $swords)],
                               $opers[fn:position() gt 1] )
};

The stop words set is computed using the fts:calcStopwords function. The function uses the function fts:resolveStopwordsUri to resolve any URI to a sequence of strings. Then, the stop words are removed from the set of search tokens.
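As an informal illustration of how "union" and "except" are applied from left to right, the following Python sketch computes a stop word list. It is a simplified model only: URI resolution is left out, the word lists are passed in directly, and the name calc_stopwords and the sample words are assumptions made for the example.

```python
def calc_stopwords(initial, opers):
    """Apply a sequence of (kind, words) operators to an initial stop
    word list, where kind is "union" or "except"."""
    result = list(initial)
    for kind, words in opers:
        if kind == "union":
            # Append the additional stop words.
            result = result + list(words)
        elif kind == "except":
            # Drop every stop word that appears in the excepted list.
            result = [w for w in result if w not in words]
        else:
            raise ValueError("unknown operator: " + kind)
    return result


stop_words = calc_stopwords(
    ["a", "the", "of"],
    [("union", ["and"]), ("except", ["of"])],
)
```

Here the list starts as "a", "the", "of", gains "and" through the union, and loses "of" through the except.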

FTLanguageOption

The FTLanguageOption is not associated with a semantics function. It is just a parameter to other semantics functions.

FTWildCardOption

FTMatchOptions of type FTWildCardOption are passed in the $matchOptions parameter to matchTokenInfos. If the FTWildCardOption is "with wildcards" the function must return all TokenInfos in the search context that span token occurrences, such that those token occurrences are wildcard expansions of the corresponding search token. The wildcard expansions are described in Section 3.2.7 FTWildCardOption. If the FTWildCardOption is "without wildcards" all search tokens must be matched literally.

Full-Text Operators Semantics

FTOr

The parameters of the ApplyFTOr function are the two AllMatches parameters corresponding to the results of the two nested FTSelections. The search context and the match options stack are not used by this function. The semantics is given below.

declare function fts:ApplyFTOr (
      $allMatches1 as element(fts:allMatches),
      $allMatches2 as element(fts:allMatches) )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{fn:max(($allMatches1/@stokenNum,
                                       $allMatches2/@stokenNum))}">
      {$allMatches1/fts:match, $allMatches2/fts:match}
   </fts:allMatches>
};

The ApplyFTOr function creates a new AllMatches in which Matches are the union of those found in the input AllMatches. Each Match represents one possible result of the corresponding FTSelection. Thus, a Match from either of the AllMatches is a result.
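The union can be sketched in Python under a deliberately simplified model, in which an AllMatches is a dictionary with a token count and a list of Matches, and each StringInclude is a (query position, token position) pair. The function name apply_ft_or and the token positions are hypothetical.

```python
def apply_ft_or(all_matches1, all_matches2):
    """Union of the Matches of both operands; a Match from either
    operand is a result of the disjunction."""
    return {
        "stoken_num": max(all_matches1["stoken_num"],
                          all_matches2["stoken_num"]),
        "matches": all_matches1["matches"] + all_matches2["matches"],
    }


# Hypothetical positions: "Mustang" at tokens 1 and 5, "Honda" at token 9.
mustang = {"stoken_num": 1, "matches": [
    {"includes": [(1, 1)], "excludes": []},
    {"includes": [(1, 5)], "excludes": []},
]}
honda = {"stoken_num": 2, "matches": [
    {"includes": [(2, 9)], "excludes": []},
]}

result = apply_ft_or(mustang, honda)
```

The result holds three Matches, each of which is satisfied on its own.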

For example, consider the FTSelection "Mustang" ftor "Honda". The AllMatches corresponding to "Mustang" and "Honda" are given below.

The AllMatches produced by ApplyFTOr is given below.

FTAnd

The parameters of the ApplyFTAnd function are the two AllMatches corresponding to the results of the two nested FTSelections. The search context and the match options are not used by this function. The semantics is given below.

declare function fts:ApplyFTAnd (
      $allMatches1 as element(fts:allMatches),
      $allMatches2 as element(fts:allMatches) )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{fn:max(($allMatches1/@stokenNum,
                                       $allMatches2/@stokenNum))}">
   {
      for $sm1 in $allMatches1/fts:match
      for $sm2 in $allMatches2/fts:match
      return <fts:match>
                {$sm1/*, $sm2/*}
             </fts:match>
   }
   </fts:allMatches>
};

The result of the conjunction is a new AllMatches that contains the "Cartesian product" of the Matches of the participating FTSelections. Every resulting Match is formed by combining the StringInclude and StringExclude components of one Match from each of the AllMatches of the nested FTSelections. Thus every resulting Match contains the positions needed to satisfy a Match from both original FTSelections and excludes the positions that violate the same Matches.
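A rough Python sketch of this pairing follows. It assumes the same simplified model as before (Matches as dictionaries of include and exclude lists, string matches as (query position, token position) pairs); apply_ft_and and the positions are illustrative names and values, not taken from this document.

```python
from itertools import product


def apply_ft_and(all_matches1, all_matches2):
    """Pair every Match of the first operand with every Match of the
    second and merge their includes and excludes."""
    return {
        "stoken_num": max(all_matches1["stoken_num"],
                          all_matches2["stoken_num"]),
        "matches": [
            {"includes": m1["includes"] + m2["includes"],
             "excludes": m1["excludes"] + m2["excludes"]}
            for m1, m2 in product(all_matches1["matches"],
                                  all_matches2["matches"])
        ],
    }


# Hypothetical positions: "Mustang" at tokens 1 and 5, "rust" at token 7.
mustang = {"stoken_num": 1, "matches": [
    {"includes": [(1, 1)], "excludes": []},
    {"includes": [(1, 5)], "excludes": []},
]}
rust = {"stoken_num": 2, "matches": [
    {"includes": [(2, 7)], "excludes": []},
]}

result = apply_ft_and(mustang, rust)
```

Two Matches on the left and one on the right give two combined Matches, each carrying one position for "Mustang" and one for "rust".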

For example, consider the FTSelection "Mustang" ftand "rust". The source AllMatches are given below.

The AllMatches produced by ApplyFTAnd is given below.

FTUnaryNot

The parameters of the ApplyFTUnaryNot function are 1) the search context, 2) the list of match options, and 3) one AllMatches parameter corresponding to the result of the nested FTSelection to be negated. The search context and the match options are not used by this function. The semantics is given below.

declare function fts:InvertStringMatch (
      $strm as element(*,fts:stringMatch) )
   as element(*,fts:stringMatch)
{
   if ($strm instance of element(fts:stringExclude)) then
      <fts:stringInclude queryPos="{$strm/@queryPos}"
                         isContiguous="{$strm/@isContiguous}">
         {$strm/fts:tokenInfo}
      </fts:stringInclude>
   else
      <fts:stringExclude queryPos="{$strm/@queryPos}"
                         isContiguous="{$strm/@isContiguous}">
         {$strm/fts:tokenInfo}
      </fts:stringExclude>
};

declare function fts:UnaryNotHelper (
      $matches as element(fts:match)* )
   as element(fts:match)*
{
   if (fn:empty($matches)) then
      (: no source Matches left: a single empty Match :)
      <fts:match/>
   else
      for $sm in $matches[1]/*
      for $rest in fts:UnaryNotHelper( fn:subsequence($matches, 2) )
      return
         <fts:match>
         {
            fts:InvertStringMatch($sm),
            $rest/*
         }
         </fts:match>
};

declare function fts:ApplyFTUnaryNot (
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      fts:UnaryNotHelper($allMatches/fts:match)
   }
   </fts:allMatches>
};

The generation of the resulting AllMatches of an FTUnaryNot resembles the transformation of the negation of a propositional formula in DNF back into DNF. The negation of an AllMatches requires the inversion of all the conditions on the nodes encoded by the AllMatches.

In the InvertStringMatch function above, this inversion occurs as follows.

The function fts:InvertStringMatch inverts a StringInclude into a StringExclude and vice versa.

The function fts:UnaryNotHelper transforms the source Matches into the resulting Matches by forming the combinations of the inversions of a StringInclude or StringExclude component over the source Matches into new Matches.
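The combination step may be easier to follow in a small Python sketch. It assumes a simplified model in which a Match is a list of (kind, query position, token position) tuples; the names invert and unary_not_helper and the positions are hypothetical.

```python
def invert(string_match):
    """Turn an include into an exclude and vice versa."""
    kind, query_pos, pos = string_match
    flipped = "exclude" if kind == "include" else "include"
    return (flipped, query_pos, pos)


def unary_not_helper(matches):
    """Pick one string match from every source Match, invert it, and
    combine the picks into a new Match (DNF negation)."""
    if not matches:
        return [[]]  # a single empty Match
    result = []
    for string_match in matches[0]:
        for rest in unary_not_helper(matches[1:]):
            result.append([invert(string_match)] + rest)
    return result


# Hypothetical source: "Mustang" at tokens 1 and 5, "Honda" at token 9.
source = [
    [("include", 1, 1)],
    [("include", 1, 5)],
    [("include", 2, 9)],
]
negated = unary_not_helper(source)
```

Because every source Match here has exactly one StringInclude, the negation is a single Match whose three StringExcludes rule out all of the original positions.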

For example, consider the FTSelection ftnot ("Mustang" ftor "Honda"). The source AllMatches is given below:

The FTUnaryNot transforms the StringIncludes to StringExcludes as illustrated below.

FTMildNot

The parameters of the ApplyFTMildNot function are the two AllMatches parameters corresponding to the results of the two nested FTSelections. The search context and the match options stack are not used by this function. The semantics is given below.

declare function fts:CoveredIncludePositions (
      $match as element(fts:match) )
   as xs:integer*
{
   for $strInclude in $match/fts:stringInclude
   return $strInclude/fts:tokenInfo/@startPos
             to $strInclude/fts:tokenInfo/@endPos
};

declare function fts:ApplyFTMildNot (
      $allMatches1 as element(fts:allMatches),
      $allMatches2 as element(fts:allMatches) )
   as element(fts:allMatches)
{
   if (fn:count($allMatches1//fts:stringExclude) gt 0) then
      fn:error("Invalid expression on the left-hand side of a not-in")
   else if (fn:count($allMatches2//fts:stringExclude) gt 0) then
      fn:error("Invalid expression on the right-hand side of a not-in")
   else if (fn:count($allMatches2//fts:stringInclude) eq 0) then
      $allMatches1
   else
      <fts:allMatches stokenNum="{$allMatches1/@stokenNum}">
      {
         $allMatches1/fts:match[
            every $matches2 in $allMatches2/fts:match
            satisfies
               let $posSet1 := fts:CoveredIncludePositions(.)
               let $posSet2 := fts:CoveredIncludePositions($matches2)
               return some $pos in $posSet1
                      satisfies fn:not($pos = $posSet2)
         ]
      }
      </fts:allMatches>
};

The resulting AllMatches contains those Matches of the first operand that, for every Match of the second operand, cover in their StringInclude components at least one position that is not covered by the StringInclude components of that Match.
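The position test can be sketched in Python. The model is simplified: each StringInclude is a (start position, end position) pair, and the names covered and apply_ft_mild_not, as well as the third "Ford" position at token 40, are assumptions added for illustration.

```python
def covered(match):
    """All token positions covered by the includes of a Match."""
    return {p for start, end in match["includes"]
            for p in range(start, end + 1)}


def apply_ft_mild_not(matches1, matches2):
    """Keep a Match of the first operand only if, for every Match of
    the second operand, it covers a position that Match does not."""
    if any(m["excludes"] for m in matches1 + matches2):
        raise ValueError("StringExcludes are not allowed in a not-in")
    if not any(m["includes"] for m in matches2):
        return matches1
    return [m for m in matches1
            if all(covered(m) - covered(m2) for m2 in matches2)]


# "Ford" at tokens 1, 27 and (hypothetically) 40.
ford = [{"includes": [(p, p)], "excludes": []} for p in (1, 27, 40)]
# "Ford Mustang" at tokens 1-2 and 27-28.
ford_mustang = [{"includes": [(1, 2)], "excludes": []},
                {"includes": [(27, 28)], "excludes": []}]

kept = apply_ft_mild_not(ford, ford_mustang)
```

Only the occurrence at token 40 is retained in this sketch, because the occurrences at tokens 1 and 27 lie entirely inside an occurrence of "Ford Mustang".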

For example, consider the FTSelection ("Ford" not in "Ford Mustang"). The source AllMatches for the left-hand side argument is given below.

The source AllMatches for the right-hand side argument is given below.

The FTMildNot will transform these to an empty AllMatches because both position 1 and position 27 from the first AllMatches contain only TokenInfos from StringInclude components of the second AllMatches.

FTOrder

The parameters of the ApplyFTOrder function are 1) the search context, 2) the list of match options, and 3) one AllMatches parameter corresponding to the result of the nested FTSelections. The search context and the match options are not used by this function. The semantics is given below.

declare function fts:ApplyFTOrder (
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude
            satisfies
               (($stringInclude1/fts:tokenInfo/@startPos <=
                 $stringInclude2/fts:tokenInfo/@startPos)
                and
                ($stringInclude1/@queryPos <= $stringInclude2/@queryPos))
               or
               (($stringInclude1/fts:tokenInfo/@startPos >=
                 $stringInclude2/fts:tokenInfo/@startPos)
                and
                ($stringInclude1/@queryPos >= $stringInclude2/@queryPos))
      return
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where every $stringIncl in $match/fts:stringInclude
                  satisfies
                     (($stringExcl/fts:tokenInfo/@startPos <=
                       $stringIncl/fts:tokenInfo/@startPos)
                      and
                      ($stringExcl/@queryPos <= $stringIncl/@queryPos))
                     or
                     (($stringExcl/fts:tokenInfo/@startPos >=
                       $stringIncl/fts:tokenInfo/@startPos)
                      and
                      ($stringExcl/@queryPos >= $stringIncl/@queryPos))
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The resulting AllMatches contains the Matches for which the start positions in the StringInclude elements are in the order of the query positions of their query strings. StringExcludes that preserve the order (with respect to their start positions) are also retained.
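The pairwise order test can be sketched in Python, assuming a simplified model in which each string match is a (query position, start position) pair. The names in_order and apply_ft_order and the token positions are hypothetical.

```python
def in_order(a, b):
    """True if two string matches appear in the text in the same
    relative order as in the query."""
    (a_query, a_start), (b_query, b_start) = a, b
    return ((a_start <= b_start and a_query <= b_query) or
            (a_start >= b_start and a_query >= b_query))


def apply_ft_order(matches):
    """Keep Matches whose includes are pairwise ordered; keep only the
    excludes that are ordered with respect to every include."""
    result = []
    for match in matches:
        includes = match["includes"]
        if all(in_order(a, b) for a in includes for b in includes):
            kept = [e for e in match["excludes"]
                    if all(in_order(e, i) for i in includes)]
            result.append({"includes": includes, "excludes": kept})
    return result


# Hypothetical positions for ("great" ftand "condition") ordered.
source = [
    {"includes": [(1, 10), (2, 11)], "excludes": []},  # "great condition"
    {"includes": [(1, 20), (2, 11)], "excludes": []},  # "condition ... great"
]
ordered = apply_ft_order(source)
```

Only the first Match survives, since in the second one "great" occurs after "condition".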

For example, consider the FTSelection ("great" ftand "condition") ordered. The source AllMatches is given below.

The AllMatches for FTOrder are given below.

FTScope

The parameters of the ApplyFTScope function are 1) the search context, 2) the list of match options, 3) the type of the scope (same or different), 4) the linguistic unit (sentence or paragraph), and 5) one AllMatches parameter corresponding to the result of the nested FTSelections. The search context and the match options are not used by this function. The function definitions depend on the type of the scope (paragraph, sentence) and the scope predicate (same, different).

The semantics of same sentence is given below.

declare function fts:ApplyFTScopeSameSentence (
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude
            satisfies
               $stringInclude1/fts:tokenInfo/@startSent =
                  $stringInclude2/fts:tokenInfo/@startSent
               and $stringInclude1/fts:tokenInfo/@startSent =
                      $stringInclude1/fts:tokenInfo/@endSent
               and $stringInclude2/fts:tokenInfo/@startSent =
                      $stringInclude2/fts:tokenInfo/@endSent
               and $stringInclude1/fts:tokenInfo/@startSent > 0
               and $stringInclude2/fts:tokenInfo/@startSent > 0
      return
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where $stringExcl/fts:tokenInfo/@startSent = 0
               or ($stringExcl/fts:tokenInfo/@startSent =
                      $stringExcl/fts:tokenInfo/@endSent
                   and (every $stringIncl in $match/fts:stringInclude
                        satisfies
                           $stringIncl/fts:tokenInfo/@startSent =
                              $stringExcl/fts:tokenInfo/@startSent) )
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

An AllMatches returned by the scope same sentence contains those Matches whose StringIncludes span only a single sentence and all span the same sentence. In these Matches only those StringExcludes are retained that also only span a single sentence, which is, in case there are StringIncludes in that Match, the same as the one spanned by the StringIncludes.
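A compact Python sketch of the same-sentence filter is shown here under a simplified model: each string match is a dictionary with start_sent and end_sent sentence numbers, where 0 stands for "no sentence information". The function name apply_same_sentence and the sample data are hypothetical.

```python
def apply_same_sentence(matches):
    """Keep Matches whose includes each lie in a single sentence and
    all lie in the same one; filter excludes accordingly."""
    result = []
    for match in matches:
        includes = match["includes"]
        same = all(
            a["start_sent"] == b["start_sent"]
            and a["start_sent"] == a["end_sent"]
            and b["start_sent"] == b["end_sent"]
            and a["start_sent"] > 0 and b["start_sent"] > 0
            for a in includes for b in includes
        )
        if same:
            kept = [
                e for e in match["excludes"]
                if e["start_sent"] == 0
                or (e["start_sent"] == e["end_sent"]
                    and all(i["start_sent"] == e["start_sent"]
                            for i in includes))
            ]
            result.append({"includes": includes, "excludes": kept})
    return result


def sent(n):
    """A string match lying entirely in sentence n (illustrative)."""
    return {"start_sent": n, "end_sent": n}


source = [
    {"includes": [sent(1), sent(1)], "excludes": [sent(2)]},
    {"includes": [sent(1), sent(2)], "excludes": []},
]
scoped = apply_same_sentence(source)
```

The first Match is kept but loses its StringExclude from sentence 2; the second Match is dropped because its StringIncludes lie in different sentences.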

The semantics of different sentence is given below.

declare function fts:ApplyFTScopeDifferentSentence (
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude
            satisfies
               (: node identity: a StringInclude is not compared
                  against itself :)
               $stringInclude1 is $stringInclude2
               or $stringInclude1/fts:tokenInfo/@endSent <
                     $stringInclude2/fts:tokenInfo/@startSent
               or $stringInclude2/fts:tokenInfo/@endSent <
                     $stringInclude1/fts:tokenInfo/@startSent
      return
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where every $stringIncl in $match/fts:stringInclude
                  satisfies
                     $stringExcl/fts:tokenInfo/@endSent <
                        $stringIncl/fts:tokenInfo/@startSent
                     or $stringIncl/fts:tokenInfo/@endSent <
                           $stringExcl/fts:tokenInfo/@startSent
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

An AllMatches returned by the scope different sentence contains those Matches that have no two StringIncludes covering the same sentence. In these Matches only those StringExcludes are retained that also do not cover a common sentence with one of the StringIncludes.

The semantics of same paragraph is analogous to same sentence and is given below.

declare function fts:ApplyFTScopeSameParagraph (
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude
            satisfies
               $stringInclude1/fts:tokenInfo/@startPara =
                  $stringInclude2/fts:tokenInfo/@startPara
               and $stringInclude1/fts:tokenInfo/@startPara =
                      $stringInclude1/fts:tokenInfo/@endPara
               and $stringInclude2/fts:tokenInfo/@startPara =
                      $stringInclude2/fts:tokenInfo/@endPara
               and $stringInclude1/fts:tokenInfo/@startPara > 0
               and $stringInclude2/fts:tokenInfo/@endPara > 0
      return
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where $stringExcl/fts:tokenInfo/@startPara = 0
               or ($stringExcl/fts:tokenInfo/@startPara =
                      $stringExcl/fts:tokenInfo/@endPara
                   and (every $stringIncl in $match/fts:stringInclude
                        satisfies
                           $stringIncl/fts:tokenInfo/@startPara =
                              $stringExcl/fts:tokenInfo/@startPara) )
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of different paragraph is analogous to different sentence and is given below.

declare function fts:ApplyFTScopeDifferentParagraph (
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude
            satisfies
               (: node identity: a StringInclude is not compared
                  against itself :)
               $stringInclude1 is $stringInclude2
               or $stringInclude1/fts:tokenInfo/@endPara <
                     $stringInclude2/fts:tokenInfo/@startPara
               or $stringInclude2/fts:tokenInfo/@endPara <
                     $stringInclude1/fts:tokenInfo/@startPara
      return
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where every $stringIncl in $match/fts:stringInclude
                  satisfies
                     $stringExcl/fts:tokenInfo/@endPara <
                        $stringIncl/fts:tokenInfo/@startPara
                     or $stringIncl/fts:tokenInfo/@endPara <
                           $stringExcl/fts:tokenInfo/@startPara
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics for the general case is given below.

declare function fts:ApplyFTScope (
      $type as fts:ScopeType,
      $selector as fts:ScopeSelector,
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   if ($type eq "same" and $selector eq "sentence") then
      fts:ApplyFTScopeSameSentence($allMatches)
   else if ($type eq "different" and $selector eq "sentence") then
      fts:ApplyFTScopeDifferentSentence($allMatches)
   else if ($type eq "same" and $selector eq "paragraph") then
      fts:ApplyFTScopeSameParagraph($allMatches)
   else
      fts:ApplyFTScopeDifferentParagraph($allMatches)
};

For example, consider the FTSelection ("Mustang" ftand "Honda") same paragraph. The source AllMatches is given below.

The FTScope returns an empty AllMatches because neither Match contains TokenInfos from a single paragraph.

FTContent

The parameters of the ApplyFTContent function are 1) the search context, 2) the match options, 3) the type of the content match (at the start of the current node, at the end of it, or its entire content), and 4) one AllMatches parameter corresponding to the result of the nested FTSelections. The semantics is given below.

declare function fts:ApplyFTContent (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $type as fts:ContentMatchType,
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   if ($type eq "entire content") then
      let $temp1 := fts:ApplyFTWordDistanceExactly(
                       $matchOptions, $allMatches, 1)
      let $temp2 := fts:ApplyFTContent(
                       $searchContext, $matchOptions, "at start", $temp1)
      let $temp3 := fts:ApplyFTContent(
                       $searchContext, $matchOptions, "at end", $temp2)
      return
         <fts:allMatches stokenNum="{$temp3/@stokenNum}">
         {
            for $match in $temp3/fts:match
            return
               <fts:match>
               {
                  (: Note: due to ApplyFTWordDistanceExactly above there
                     must be either one or no stringInclude in $match :)
                  $match/fts:stringInclude[@isContiguous],
                  $match/fts:stringExclude[@isContiguous]
               }
               </fts:match>
         }
         </fts:allMatches>
   else
      <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
      {
         for $match in $allMatches/fts:match
         where
            if ($type eq "at start") then
               some $si in $match/fts:stringInclude
               satisfies fts:isStartToken($searchContext, $si/fts:tokenInfo)
            else (: $type eq "at end" :)
               some $si in $match/fts:stringInclude
               satisfies fts:isEndToken($searchContext, $si/fts:tokenInfo)
         return $match
      }
      </fts:allMatches>
};

The evaluation of the ApplyFTContent function depends on the type of the content match.

entire content is evaluated as distance exactly 0 words at start at end, i.e., all the StringIncludes must match every token in the content of the current search context node.

at start retains only Matches that contain a StringInclude that matches the first token. This is checked using the semantic function fts:isStartToken.

at end retains the Matches that contain a StringInclude that matches the last token. This is checked using the semantic function fts:isEndToken.

FTWindow

Before we define the semantics functions of the FTWindow and FTDistance operations, we introduce the auxiliary function joinIncludes that will be used in their definitions. joinIncludes takes a sequence of StringIncludes of a Match and transforms it into either the empty sequence, in case the input sequence was empty, or otherwise a single StringInclude representing the span from the first position of the match to the last. For the purpose of being able to evaluate an "entire content" operator further up in the tree, we pre-evaluate whether all possible positions between first and last are covered in the input StringIncludes and store that boolean in the attribute "isContiguous".

declare function fts:joinIncludes(
      $strIncls as element(fts:stringInclude)* )
   as element(fts:stringInclude)*
{
   if (fn:empty($strIncls)) then $strIncls
   else
      let $posSet := fts:CoveredIncludePositions(
                        <fts:match>{$strIncls}</fts:match>),
          $minPos := fn:min($strIncls/fts:tokenInfo/@startPos),
          $maxPos := fn:max($strIncls/fts:tokenInfo/@endPos),
          $isContiguous :=
             ( every $pos in $minPos to $maxPos
               satisfies ($pos = $posSet) )
             and
             ( every $strIncl in $strIncls
               satisfies $strIncl/@isContiguous )
      return
         <fts:stringInclude queryPos="{$strIncls[1]/@queryPos}"
                            isContiguous="{$isContiguous}">
            <fts:tokenInfo
               startPos="{$minPos}"
               endPos="{$maxPos}"
               startSent="{fn:min($strIncls/fts:tokenInfo/@startSent)}"
               endSent="{fn:max($strIncls/fts:tokenInfo/@endSent)}"
               startPara="{fn:min($strIncls/fts:tokenInfo/@startPara)}"
               endPara="{fn:max($strIncls/fts:tokenInfo/@endPara)}"/>
         </fts:stringInclude>
};
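The contiguity check can be sketched in Python, assuming a simplified model in which each StringInclude is a dictionary with query_pos, start, end, and a boolean contiguous flag. The names join_includes and inc are hypothetical, and sentence and paragraph numbers are left out for brevity.

```python
def join_includes(includes):
    """Merge the includes of a Match into one span and record whether
    every position of the span is covered."""
    if not includes:
        return []
    covered = {p for i in includes
               for p in range(i["start"], i["end"] + 1)}
    lo = min(i["start"] for i in includes)
    hi = max(i["end"] for i in includes)
    contiguous = (all(p in covered for p in range(lo, hi + 1))
                  and all(i["contiguous"] for i in includes))
    return [{"query_pos": includes[0]["query_pos"],
             "start": lo, "end": hi, "contiguous": contiguous}]


def inc(query_pos, start, end):
    """Illustrative constructor for a contiguous include."""
    return {"query_pos": query_pos, "start": start, "end": end,
            "contiguous": True}


adjacent = join_includes([inc(1, 1, 2), inc(2, 3, 3)])
gapped = join_includes([inc(1, 1, 2), inc(2, 4, 4)])
```

Both calls yield a single span, but only the first is marked contiguous; in the second, position 3 is not covered by any input StringInclude.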

The parameters of the ApplyFTWindow function are 1) the search context, 2) the list of match options, 3) the unit of type fts:DistanceType, 4) a size, and 5) one AllMatches parameter corresponding to the result of the nested FTSelections. The search context is not used by this function. For each unit type a function is defined as follows.

The semantics of window N words is given below.

declare function fts:ApplyFTWordWindow (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $minpos := fn:min($match/fts:stringInclude/fts:tokenInfo/@startPos),
          $maxpos := fn:max($match/fts:stringInclude/fts:tokenInfo/@endPos)
      (: every window of $n positions that covers all StringIncludes :)
      for $windowStartPos in ($maxpos - $n + 1 to $minpos)
      let $windowEndPos := $windowStartPos + $n - 1
      where fn:min($match/fts:stringInclude/fts:tokenInfo/@startPos)
               >= $windowStartPos
        and fn:max($match/fts:stringInclude/fts:tokenInfo/@endPos)
               <= $windowEndPos
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExclude in $match/fts:stringExclude
            where $stringExclude/fts:tokenInfo/@startPos >= $windowStartPos
              and $stringExclude/fts:tokenInfo/@endPos <= $windowEndPos
            return $stringExclude
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of window N sentences is given below.

declare function fts:ApplyFTSentenceWindow (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $minpos := fn:min($match/fts:stringInclude/fts:tokenInfo/@startSent),
          $maxpos := fn:max($match/fts:stringInclude/fts:tokenInfo/@endSent)
      (: every window of $n sentences that covers all StringIncludes :)
      for $windowStartPos in ($maxpos - $n + 1 to $minpos)
      let $windowEndPos := $windowStartPos + $n - 1
      where fn:min($match/fts:stringInclude/fts:tokenInfo/@startSent)
               >= $windowStartPos
        and fn:max($match/fts:stringInclude/fts:tokenInfo/@endSent)
               <= $windowEndPos
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExclude in $match/fts:stringExclude
            where $stringExclude/fts:tokenInfo/@startSent >= $windowStartPos
              and $stringExclude/fts:tokenInfo/@endSent <= $windowEndPos
            return $stringExclude
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of window N paragraphs is given below.

declare function fts:ApplyFTParagraphWindow (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $minpos := fn:min($match/fts:stringInclude/fts:tokenInfo/@startPara),
          $maxpos := fn:max($match/fts:stringInclude/fts:tokenInfo/@endPara)
      (: every window of $n paragraphs that covers all StringIncludes :)
      for $windowStartPos in ($maxpos - $n + 1 to $minpos)
      let $windowEndPos := $windowStartPos + $n - 1
      where fn:min($match/fts:stringInclude/fts:tokenInfo/@startPara)
               >= $windowStartPos
        and fn:max($match/fts:stringInclude/fts:tokenInfo/@endPara)
               <= $windowEndPos
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExclude in $match/fts:stringExclude
            where $stringExclude/fts:tokenInfo/@startPara >= $windowStartPos
              and $stringExclude/fts:tokenInfo/@endPara <= $windowEndPos
            return $stringExclude
         }
         </fts:match>
   }
   </fts:allMatches>
};

The resulting AllMatches contains Matches of the operand that satisfy the condition that there exists a sequence of the specified number of consecutive (token, sentence, or paragraph) positions, such that all StringIncludes are within that window, and the StringExcludes retained are also within that window. For each Match that satisfies the window condition, the StringIncludes are joined into a single StringInclude. This enables further window or distance operations to be applied to the result, treating that result as a single entity.
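The window condition described in the preceding paragraph can be sketched in Python. This follows the prose reading of the condition rather than transcribing the XQuery functions; string matches are modelled as (start position, end position) pairs, and the name apply_word_window and the positions are hypothetical.

```python
def apply_word_window(matches, n):
    """For each Match, try every window of n consecutive positions
    that covers all includes; emit one joined Match per window."""
    result = []
    for match in matches:
        includes = match["includes"]
        if not includes:
            continue
        lo = min(start for start, _ in includes)
        hi = max(end for _, end in includes)
        # Window starts range from (hi - n + 1) up to lo inclusive;
        # the range is empty when the includes span more than n.
        for w_start in range(hi - n + 1, lo + 1):
            w_end = w_start + n - 1
            kept = [(s, e) for s, e in match["excludes"]
                    if s >= w_start and e <= w_end]
            result.append({"includes": [(lo, hi)], "excludes": kept})
    return result


source = [
    {"includes": [(1, 2), (3, 3)], "excludes": []},    # spans 3 positions
    {"includes": [(1, 2), (10, 10)], "excludes": []},  # spans 10 positions
]
windowed = apply_word_window(source, 3)
```

With a window of 3 words, only the first Match fits, and its two StringIncludes are joined into the single span from position 1 to position 3.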

The semantics for the general function is given below.

declare function fts:ApplyFTWindow (
      $matchOptions as element(fts:matchOptions),
      $type as fts:DistanceType,
      $size as xs:integer,
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   if ($type eq "word") then
      fts:ApplyFTWordWindow($matchOptions, $allMatches, $size)
   else if ($type eq "sentence") then
      fts:ApplyFTSentenceWindow($matchOptions, $allMatches, $size)
   else
      fts:ApplyFTParagraphWindow($matchOptions, $allMatches, $size)
};

For example, consider the FTWindow selection ("Ford Mustang" ftand "excellent") window 10 words. The Matches of the source AllMatches for ("Ford Mustang" ftand "excellent") are given below.

The result for the FTWindow selection consists of only the first, the fifth, and the sixth Matches because their respective window sizes are 5, 4, and 9.

FTDistance

The parameters of the ApplyFTDistance function are 1) the search context, 2) the list of match options, 3) one AllMatches parameter corresponding to the result of the nested FTSelections, 4) the unit of the distance (tokens, sentences, paragraphs), and 5) the range specified. The search context is not used by this function. The function definitions depend on the distance units and the range specifications.

The semantics of word distance exactly N is given below.

declare function fts:ApplyFTWordDistanceExactly (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted :=
         for $si in $match/fts:stringInclude
         order by $si/fts:tokenInfo/@startPos ascending
         return $si
      where
         if (fn:count($sorted) le 1) then fn:true()
         else
            every $idx in 1 to fn:count($sorted) - 1
            satisfies
               fts:wordDistance(
                  $sorted[$idx]/fts:tokenInfo,
                  $sorted[$idx+1]/fts:tokenInfo,
                  $matchOptions) = $n
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where
               some $stringIncl in $match/fts:stringInclude
               satisfies
                  fts:wordDistance(
                     $stringIncl/fts:tokenInfo,
                     $stringExcl/fts:tokenInfo,
                     $matchOptions) = $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of word distance at least N is given below.

declare function fts:ApplyFTWordDistanceAtLeast (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted :=
         for $si in $match/fts:stringInclude
         order by $si/fts:tokenInfo/@startPos ascending
         return $si
      where
         if (fn:count($sorted) le 1) then fn:true()
         else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies
               fts:wordDistance(
                  $sorted[$index]/fts:tokenInfo,
                  $sorted[$index+1]/fts:tokenInfo,
                  $matchOptions) >= $n
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where
               some $stringIncl in $match/fts:stringInclude
               satisfies
                  fts:wordDistance(
                     $stringIncl/fts:tokenInfo,
                     $stringExcl/fts:tokenInfo,
                     $matchOptions) >= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of word distance at most N is given below.

declare function fts:ApplyFTWordDistanceAtMost (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted :=
         for $si in $match/fts:stringInclude
         order by $si/fts:tokenInfo/@startPos ascending
         return $si
      where
         if (fn:count($sorted) le 1) then fn:true()
         else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies
               fts:wordDistance(
                  $sorted[$index]/fts:tokenInfo,
                  $sorted[$index+1]/fts:tokenInfo,
                  $matchOptions) <= $n
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where
               some $stringIncl in $match/fts:stringInclude
               satisfies
                  fts:wordDistance(
                     $stringIncl/fts:tokenInfo,
                     $stringExcl/fts:tokenInfo,
                     $matchOptions) <= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of word distance from M to N is given below.

declare function fts:ApplyFTWordDistanceFromTo (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $m as xs:integer,
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted :=
         for $si in $match/fts:stringInclude
         order by $si/fts:tokenInfo/@startPos ascending
         return $si
      where
         if (fn:count($sorted) le 1) then fn:true()
         else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies
               fts:wordDistance(
                  $sorted[$index]/fts:tokenInfo,
                  $sorted[$index+1]/fts:tokenInfo,
                  $matchOptions) >= $m
               and
               fts:wordDistance(
                  $sorted[$index]/fts:tokenInfo,
                  $sorted[$index+1]/fts:tokenInfo,
                  $matchOptions) <= $n
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where
               some $stringIncl in $match/fts:stringInclude
               satisfies
                  fts:wordDistance(
                     $stringIncl/fts:tokenInfo,
                     $stringExcl/fts:tokenInfo,
                     $matchOptions) >= $m
                  and
                  fts:wordDistance(
                     $stringIncl/fts:tokenInfo,
                     $stringExcl/fts:tokenInfo,
                     $matchOptions) <= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of sentence distance exactly N is given below.

declare function fts:ApplyFTSentenceDistanceExactly (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted :=
         for $si in $match/fts:stringInclude
         order by $si/fts:tokenInfo/@startPos ascending
         return $si
      where
         if (fn:count($sorted) le 1) then fn:true()
         else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies
               fts:sentenceDistance(
                  $sorted[$index]/fts:tokenInfo,
                  $sorted[$index+1]/fts:tokenInfo,
                  $matchOptions) = $n
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where
               some $stringIncl in $match/fts:stringInclude
               satisfies
                  fts:sentenceDistance(
                     $stringIncl/fts:tokenInfo,
                     $stringExcl/fts:tokenInfo,
                     $matchOptions) = $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of sentence distance at least N is given below.

declare function fts:ApplyFTSentenceDistanceAtLeast (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted :=
         for $si in $match/fts:stringInclude
         order by $si/fts:tokenInfo/@startPos ascending
         return $si
      where
         if (fn:count($sorted) le 1) then fn:true()
         else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies
               fts:sentenceDistance(
                  $sorted[$index]/fts:tokenInfo,
                  $sorted[$index+1]/fts:tokenInfo,
                  $matchOptions) >= $n
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where
               some $stringIncl in $match/fts:stringInclude
               satisfies
                  fts:sentenceDistance(
                     $stringIncl/fts:tokenInfo,
                     $stringExcl/fts:tokenInfo,
                     $matchOptions) >= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of sentence distance at most N is given below.

declare function fts:ApplyFTSentenceDistanceAtMost (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted :=
         for $si in $match/fts:stringInclude
         order by $si/fts:tokenInfo/@startPos ascending
         return $si
      where
         if (fn:count($sorted) le 1) then fn:true()
         else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies
               fts:sentenceDistance(
                  $sorted[$index]/fts:tokenInfo,
                  $sorted[$index+1]/fts:tokenInfo,
                  $matchOptions) <= $n
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where
               some $stringIncl in $match/fts:stringInclude
               satisfies
                  fts:sentenceDistance(
                     $stringIncl/fts:tokenInfo,
                     $stringExcl/fts:tokenInfo,
                     $matchOptions) <= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of sentence distance from M to N is given below.

declare function fts:ApplyFTSentenceDistanceFromTo (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $m as xs:integer,
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted :=
         for $si in $match/fts:stringInclude
         order by $si/fts:tokenInfo/@startPos ascending
         return $si
      where
         if (fn:count($sorted) le 1) then fn:true()
         else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies
               fts:sentenceDistance(
                  $sorted[$index]/fts:tokenInfo,
                  $sorted[$index+1]/fts:tokenInfo,
                  $matchOptions) >= $m
               and
               fts:sentenceDistance(
                  $sorted[$index]/fts:tokenInfo,
                  $sorted[$index+1]/fts:tokenInfo,
                  $matchOptions) <= $n
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where
               some $stringIncl in $match/fts:stringInclude
               satisfies
                  fts:sentenceDistance(
                     $stringIncl/fts:tokenInfo,
                     $stringExcl/fts:tokenInfo,
                     $matchOptions) >= $m
                  and
                  fts:sentenceDistance(
                     $stringIncl/fts:tokenInfo,
                     $stringExcl/fts:tokenInfo,
                     $matchOptions) <= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of paragraph distance exactly N is given below.

declare function fts:ApplyFTParagraphDistanceExactly (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted :=
         for $si in $match/fts:stringInclude
         order by $si/fts:tokenInfo/@startPos ascending
         return $si
      where
         if (fn:count($sorted) le 1) then fn:true()
         else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies
               fts:paraDistance(
                  $sorted[$index]/fts:tokenInfo,
                  $sorted[$index+1]/fts:tokenInfo,
                  $matchOptions) = $n
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where
               some $stringIncl in $match/fts:stringInclude
               satisfies
                  fts:paraDistance(
                     $stringIncl/fts:tokenInfo,
                     $stringExcl/fts:tokenInfo,
                     $matchOptions) = $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of paragraph distance at least N is given below.

declare function fts:ApplyFTParagraphDistanceAtLeast (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted :=
         for $si in $match/fts:stringInclude
         order by $si/fts:tokenInfo/@startPos ascending
         return $si
      where
         if (fn:count($sorted) le 1) then fn:true()
         else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies
               fts:paraDistance(
                  $sorted[$index]/fts:tokenInfo,
                  $sorted[$index+1]/fts:tokenInfo,
                  $matchOptions) >= $n
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where
               some $stringIncl in $match/fts:stringInclude
               satisfies
                  fts:paraDistance(
                     $stringIncl/fts:tokenInfo,
                     $stringExcl/fts:tokenInfo,
                     $matchOptions) >= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of paragraph distance at most N is given below.

declare function fts:ApplyFTParagraphDistanceAtMost (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted :=
         for $si in $match/fts:stringInclude
         order by $si/fts:tokenInfo/@startPos ascending
         return $si
      where
         if (fn:count($sorted) le 1) then fn:true()
         else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies
               fts:paraDistance(
                  $sorted[$index]/fts:tokenInfo,
                  $sorted[$index+1]/fts:tokenInfo,
                  $matchOptions) <= $n
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where
               some $stringIncl in $match/fts:stringInclude
               satisfies
                  fts:paraDistance(
                     $stringIncl/fts:tokenInfo,
                     $stringExcl/fts:tokenInfo,
                     $matchOptions) <= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of paragraph distance from M to N is given below.

declare function fts:ApplyFTParagraphDistanceFromTo (
      $matchOptions as element(fts:matchOptions),
      $allMatches as element(fts:allMatches),
      $m as xs:integer,
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted :=
         for $si in $match/fts:stringInclude
         order by $si/fts:tokenInfo/@startPos ascending
         return $si
      where
         if (fn:count($sorted) le 1) then fn:true()
         else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies
               fts:paraDistance(
                  $sorted[$index]/fts:tokenInfo,
                  $sorted[$index+1]/fts:tokenInfo,
                  $matchOptions) >= $m
               and
               fts:paraDistance(
                  $sorted[$index]/fts:tokenInfo,
                  $sorted[$index+1]/fts:tokenInfo,
                  $matchOptions) <= $n
      return
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where
               some $stringIncl in $match/fts:stringInclude
               satisfies
                  fts:paraDistance(
                     $stringIncl/fts:tokenInfo,
                     $stringExcl/fts:tokenInfo,
                     $matchOptions) >= $m
                  and
                  fts:paraDistance(
                     $stringIncl/fts:tokenInfo,
                     $stringExcl/fts:tokenInfo,
                     $matchOptions) <= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The resulting AllMatches contains Matches of the operand that satisfy the condition that the distance for every pair of consecutive StringIncludes is within the specified interval, where the distance is measured in tokens, sentences, or paragraphs from the end of the preceding StringInclude to the start of the next.

In the general case, the semantics is given below.

declare function fts:ApplyFTDistance (
      $matchOptions as element(fts:matchOptions),
      $type as fts:DistanceType,
      $range as element(fts:range),
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   if ($type eq "word") then
      if ($range/@type eq "exactly") then
         fts:ApplyFTWordDistanceExactly($matchOptions, $allMatches, $range/@n)
      else if ($range/@type eq "at least") then
         fts:ApplyFTWordDistanceAtLeast($matchOptions, $allMatches, $range/@n)
      else if ($range/@type eq "at most") then
         fts:ApplyFTWordDistanceAtMost($matchOptions, $allMatches, $range/@n)
      else
         fts:ApplyFTWordDistanceFromTo($matchOptions, $allMatches,
                                       $range/@m, $range/@n)
   else if ($type eq "sentence") then
      if ($range/@type eq "exactly") then
         fts:ApplyFTSentenceDistanceExactly($matchOptions, $allMatches, $range/@n)
      else if ($range/@type eq "at least") then
         fts:ApplyFTSentenceDistanceAtLeast($matchOptions, $allMatches, $range/@n)
      else if ($range/@type eq "at most") then
         fts:ApplyFTSentenceDistanceAtMost($matchOptions, $allMatches, $range/@n)
      else
         fts:ApplyFTSentenceDistanceFromTo($matchOptions, $allMatches,
                                           $range/@m, $range/@n)
   else
      if ($range/@type eq "exactly") then
         fts:ApplyFTParagraphDistanceExactly($matchOptions, $allMatches, $range/@n)
      else if ($range/@type eq "at least") then
         fts:ApplyFTParagraphDistanceAtLeast($matchOptions, $allMatches, $range/@n)
      else if ($range/@type eq "at most") then
         fts:ApplyFTParagraphDistanceAtMost($matchOptions, $allMatches, $range/@n)
      else
         fts:ApplyFTParagraphDistanceFromTo($matchOptions, $allMatches,
                                            $range/@m, $range/@n)
};

For example, consider the FTDistance selection ("Ford Mustang" ftand "excellent") distance at most 3 words. The Matches of the source AllMatches for ("Ford Mustang" ftand "excellent") are given below.

The result for the FTDistance selection consists of only the first Match (with positions 1, 2, and 5), because only for this Match the word distance between consecutive TokenInfos is always less than or equal to 3. It is 1 for the first pair and 3 for the second.
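The check on consecutive StringIncludes can be sketched in Python as follows. This is an illustration under stated assumptions, not the normative definition: fts:wordDistance is defined elsewhere in this specification, and the hypothetical gap function below is a stand-in chosen only because it reproduces the values 1 and 3 quoted above for positions 1, 2, and 5. The names Span and within_distance are likewise hypothetical.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start position, end position) of one StringInclude


def gap(prev: Span, nxt: Span) -> int:
    """Assumed stand-in for fts:wordDistance: start of next minus end of previous."""
    return nxt[0] - prev[1]


def within_distance(includes: List[Span], ok: Callable[[int], bool]) -> bool:
    """True if every pair of consecutive StringIncludes passes the range test."""
    ordered = sorted(includes)  # ascending start position, as in $sorted
    # With zero or one StringInclude there are no pairs, so the test holds.
    return all(ok(gap(a, b)) for a, b in zip(ordered, ordered[1:]))


first_match = [(1, 1), (2, 2), (5, 5)]
print(within_distance(first_match, lambda d: d <= 3))  # True
```

The range test is passed in as a predicate, so "exactly", "at least", "at most", and "from M to N" differ only in the lambda supplied.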

FTTimes

The parameters of the ApplyFTTimes function are 1) an FTRange specification, and 2) one AllMatches parameter corresponding to the result of the nested FTWords.

The function definitions depend on the range specification FTRange to limit the number of occurrences.

The general semantics is given below.

declare function fts:FormCombinations (
      $sms as element(fts:match)*,
      $times as xs:integer )
   as element(fts:match)*
{
   if ($times eq 1) then $sms
   else if (fn:count($sms) lt $times) then ()
   else if (fn:count($sms) eq $times) then
      <fts:match>{$sms/*}</fts:match>
   else (
      fts:FormCombinations(fn:subsequence($sms, 2), $times),
      for $combination in
         fts:FormCombinations(fn:subsequence($sms, 2), $times - 1)
      return
         <fts:match>
         {
            $sms[1]/*,
            $combination/*
         }
         </fts:match>
   )
};

declare function fts:FormRange (
      $sms as element(fts:match)*,
      $l as xs:integer,
      $u as xs:integer,
      $stokenNum as xs:integer )
   as element(fts:allMatches)
{
   (: An empty range admits no Matches. An empty AllMatches is returned
      so that the result conforms to the declared return type. :)
   if ($l > $u) then <fts:allMatches stokenNum="{$stokenNum}"/>
   else
      let $am1 := <fts:allMatches stokenNum="{$stokenNum}">
                     {fts:FormCombinations($sms, $l)}
                  </fts:allMatches>
      let $am2 := <fts:allMatches stokenNum="{$stokenNum}">
                     {fts:FormCombinations($sms, $u+1)}
                  </fts:allMatches>
      return fts:ApplyFTAnd($am1, fts:ApplyFTUnaryNot($am2))
};

The semantics of occurs exactly N times is given below.

declare function fts:ApplyFTTimesExactly (
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   fts:FormRange($allMatches/fts:match, $n, $n, $allMatches/@stokenNum)
};

The semantics of occurs at least N times is given below.

declare function fts:ApplyFTTimesAtLeast (
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
      {fts:FormCombinations($allMatches/fts:match, $n)}
   </fts:allMatches>
};

The semantics of occurs at most N times is given below.

declare function fts:ApplyFTTimesAtMost (
      $allMatches as element(fts:allMatches),
      $n as xs:integer )
   as element(fts:allMatches)
{
   fts:FormRange($allMatches/fts:match, 0, $n, $allMatches/@stokenNum)
};

The semantics of occurs from M to N times is given below.

declare function fts:ApplyFTTimesFromTo (
      $allMatches as element(fts:allMatches),
      $m as xs:integer,
      $n as xs:integer )
   as element(fts:allMatches)
{
   fts:FormRange($allMatches/fts:match, $m, $n, $allMatches/@stokenNum)
};

The way to ensure that there are at least N different matches of an FTSelection is to ensure that at least N of its Matches occur simultaneously. This is similar to forming their conjunction by combining N distinct Matches into one simple match. Therefore, the AllMatches for the selection condition specifying the range qualifier at least N contains the possible combinations of N simple matches of the operand and one Match for each combination negating the rest of the simple matches. This operation is performed in the function fts:FormCombinations.

The range [l, u] is represented by the condition at least l and not at least u+1. This transformation is performed in the function fts:FormRange.
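The combination step and the range transformation can be illustrated with a short Python sketch. It is a simplification under stated assumptions: a simple match is reduced to a set of token positions, and the range test is evaluated as a plain boolean for one node instead of building an AllMatches with negated Matches as fts:FormRange does through fts:ApplyFTAnd and fts:ApplyFTUnaryNot. The names form_combinations and occurs_in_range are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List

Match = FrozenSet[int]  # token positions of one simple match (assumed model)


def form_combinations(matches: List[Match], times: int) -> List[Match]:
    """Every way of picking `times` distinct matches, merged into one match."""
    return [frozenset().union(*combo) for combo in combinations(matches, times)]


def occurs_in_range(matches: List[Match], lower: int, upper: int) -> bool:
    """At least `lower` and not at least `upper + 1` occurrences."""
    if lower > upper:
        return False  # an empty range admits nothing
    at_least_lower = bool(form_combinations(matches, lower))
    at_least_upper_plus_1 = bool(form_combinations(matches, upper + 1))
    return at_least_lower and not at_least_upper_plus_1


# Three hypothetical occurrences of one token, at positions 2, 10 and 20.
occurrences = [frozenset({2}), frozenset({10}), frozenset({20})]
print(len(form_combinations(occurrences, 2)))  # 3 pairs
print(occurs_in_range(occurrences, 2, 3))      # True
print(occurs_in_range(occurrences, 1, 2))      # False: there are three
```

Picking zero matches yields a single empty combination, so "at least 0" always holds in this model, which is what makes "at most N" expressible as the range [0, N].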

The semantics for the general case is given below.

declare function fts:ApplyFTTimes (
      $range as element(fts:range),
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   if (fn:count($allMatches//fts:stringExclude) gt 0) then
      fn:error(fn:QName('http://www.w3.org/2005/xqt-errors',
                        'err:XPST0003'))
   else if ($range/@type eq "exactly") then
      fts:ApplyFTTimesExactly($allMatches, $range/@n)
   else if ($range/@type eq "at least") then
      fts:ApplyFTTimesAtLeast($allMatches, $range/@n)
   else if ($range/@type eq "at most") then
      fts:ApplyFTTimesAtMost($allMatches, $range/@n)
   else
      fts:ApplyFTTimesFromTo($allMatches, $range/@m, $range/@n)
};

The above function performs a sanity check to ensure that the nested AllMatches is a result of the evaluation of FTWords as defined in the grammar rule for FTPrimary. Otherwise, an error is raised.

For example, consider the FTTimes selection "Mustang" occurs at least 2 times. The source AllMatches of the FTWords selection "Mustang" is given below.

The result consists of the pairs of the Matches.

XQuery 1.0 and XPath 2.0 Full-Text 1.0 and Scoring Expressions

FTContainsExpr

The FTContainsExpr function defines the semantics of FTContainsExpr. The function takes the following parameters: 1) a search context consisting of a sequence of nodes (which is the result of an XQuery 1.0 and XPath 2.0 expression) and 2) an AllMatches corresponding to an FTSelection. The function returns an xs:boolean atomic value. This value is true if and only if some node in the search context satisfies the full-text condition given by the FTSelection. Since FTContainsExpr returns results in XDM (a sequence of items), it may be treated like XQuery 1.0 expressions and may be fully composed with other XQuery 1.0 expressions. In addition, since the FTContainsExpr function maps AllMatches to a sequence of items, it provides semantics for mapping from AllMatches to XDM.

Semantics of FTContainsExpr

Consider an FTContainsExpr expression of the form EvaluationContext ftcontains FTSelection, where EvaluationContext is an XQuery 1.0 expression that returns a sequence of nodes and FTSelection is an FTSelection that returns AllMatches. The FTContainsExpr returns true if and only if some node in the result of EvaluationContext satisfies the AllMatches returned by FTSelection.

If the FTContainsExpr is of the form EvaluationContext ftcontains FTSelection without content IgnoreExpr for some XQuery 1.0 expression IgnoreExpr, then the following helper function is required.

declare function fts:reconstruct (
      $n as item(),
      $ignore as node()* )
   as item()?
{
   typeswitch ($n)
      case node() return
         if (some $i in $ignore satisfies $n is $i) then ()
         else if ($n instance of element()) then
            let $nodeName := fn:node-name($n)
            let $nodeContent :=
               for $nn in $n/node()
               return fts:reconstruct($nn, $ignore)
            return element {$nodeName} {$nodeContent}
         else if ($n instance of document-node()) then
            document {
               for $nn in $n/node()
               return fts:reconstruct($nn, $ignore)
            }
         else $n
      default return $n
};
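The pruning performed by fts:reconstruct can be approximated in Python with the standard xml.etree.ElementTree module. This is a sketch under stated assumptions, not an equivalent implementation: nodes are compared by object identity (mirroring the "is" comparison), only element nodes can be ignored, and, as in the function above, the copy is built from child nodes only, so attributes are not carried over. The helper name _copy is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Set


def reconstruct(node: ET.Element, ignore: Iterable[ET.Element]) -> Optional[ET.Element]:
    """Return a copy of `node` without the ignored elements, or None if
    `node` itself is ignored."""
    ignored = {id(e) for e in ignore}
    if id(node) in ignored:
        return None
    return _copy(node, ignored)


def _copy(node: ET.Element, ignored: Set[int]) -> ET.Element:
    copy = ET.Element(node.tag)
    copy.text = node.text
    last = None  # most recently kept child of the copy
    for child in node:
        if id(child) in ignored:
            # ElementTree stores the text that follows an element in its
            # `tail`; that text is a sibling in XDM terms, so it is kept.
            if child.tail:
                if last is None:
                    copy.text = (copy.text or "") + child.tail
                else:
                    last.tail = (last.tail or "") + child.tail
            continue
        kept = _copy(child, ignored)
        kept.tail = child.tail
        copy.append(kept)
        last = kept
    return copy


offer = ET.fromstring("<offer>great <note>rust free</note> mustang <b>deal</b></offer>")
pruned = reconstruct(offer, [offer.find("note")])
print("".join(pruned.itertext()))
```

The original tree is left untouched; only the copy lacks the ignored element, which matches the way the pruned node, not the source node, is passed on for evaluation.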

In the general case, the XQuery 1.0 and XPath 2.0 FTContainsExpr function takes four parameters.

The sequence of items returned by EvaluationContext;

The XML node representation of FTSelection;

The sequence of nodes returned by IgnoreExpr, if that expression is present, or the empty sequence otherwise; and

The XML representation of the set of default values for each of the FTMatchOptions as given by the static context.

The FTContainsExpr function returns true if and only if the corresponding FTContainsExpr returns true, and thus specifies the semantics of FTContainsExpr. Note that by using XQuery 1.0 and XPath 2.0 to specify the formal semantics, we avoid the need to introduce a new formalism. We simply reuse the formal semantics of XQuery 1.0 and XPath 2.0.

declare function fts:FTContainsExpr (
      $searchContext as item()*,
      $ftSelection as element(*,fts:ftSelection),
      $ignoreNodes as node()*,
      $defOptions as element(fts:matchOptions) )
   as xs:boolean
{
   some $node in $searchContext
   satisfies
      let $newNode := fts:reconstruct( $node, $ignoreNodes )
      return
         if (fn:empty($newNode)) then fn:false()
         else
            let $allMatches :=
               fts:evaluate($ftSelection, $newNode, $defOptions, 0)
            return
               some $match in $allMatches/fts:match
               satisfies fn:count($match/fts:stringExclude) eq 0
};

The FTContainsExpr function returns true if and only if the AllMatches that is the result of the application of the FTSelection for some node in the search context contains a Match with no StringExcludes. In other words, there is a set of TokenInfos in that node which satisfy the condition of the FTSelection. If an FTIgnoreOption has been specified in the FTContainsExpr, then each node in $ignoreNodes that is part of the tree of a node in the search context is pruned from that tree using the function reconstruct before that node is passed to fts:evaluate.
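The final test, "some node yields a Match with no StringExcludes", can be sketched in Python. The model is deliberately small and its names are hypothetical: a node is a list of tokens, a Match is a pair of position sets, and the evaluate and prune callables stand in for fts:evaluate and fts:reconstruct. The toy evaluator covers only the single selection "mustang" ftand ftnot "rust".

```python
from typing import Callable, FrozenSet, Iterable, List, Optional, Sequence, Tuple

Tokens = Sequence[str]
# (StringInclude positions, StringExclude positions) of one Match
Match = Tuple[FrozenSet[int], FrozenSet[int]]


def ft_contains(search_context: Iterable[Tokens],
                evaluate: Callable[[Tokens], List[Match]],
                prune: Callable[[Tokens], Optional[Tokens]]) -> bool:
    """True if some node has a Match without StringExcludes."""
    for node in search_context:
        pruned = prune(node)
        if pruned is None:
            continue  # the node itself was ignored
        if any(not excludes for _includes, excludes in evaluate(pruned)):
            return True
    return False


def mustang_without_rust(tokens: Tokens) -> List[Match]:
    """Toy evaluator (assumption) for: "mustang" ftand ftnot "rust"."""
    rust = frozenset(i for i, t in enumerate(tokens, 1) if t == "rust")
    return [(frozenset({i}), rust)
            for i, t in enumerate(tokens, 1) if t == "mustang"]


offers = [["mustang", "with", "rust"], ["great", "mustang"]]
print(ft_contains(offers, mustang_without_rust, lambda node: node))  # True
```

The first offer produces only a Match that carries a StringExclude, so the result comes from the second offer alone.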

Scoring

This section addresses the semantics of scoring variables in XQuery 1.0 for and let clauses and XPath 2.0 for expressions.

Scoring variables associate a numeric score with the result of the evaluation of XQuery 1.0 and XPath 2.0 expressions. This numeric score estimates how well a result item meets the user's information need, as expressed by the XQuery 1.0 and XPath 2.0 expression. The numeric score is computed using an implementation-provided scoring algorithm.

There are numerous scoring algorithms used in practice. Most of the scoring algorithms take as inputs a query and a set of results to the query. In computing the score, these algorithms rely on the structure of the query to estimate the relevance of the results.

In the context of defining the semantics of XQuery 1.0 and XPath 2.0 Full-Text, passing the structure of the query poses a problem. The query is an XQuery 1.0 and XPath 2.0 expression and an XQuery 1.0 and XPath 2.0 Full-Text expression in particular. The semantics of XQuery 1.0 and XPath 2.0 expressions is expressed using functions that take sequences of items as arguments and return sequences of items. They are not aware of what expression produced a particular sequence, i.e., they are not aware of the expression structure.

To define the semantics of scoring in XQuery 1.0 and XPath 2.0 Full-Text using XQuery 1.0, expressions that produce the query result (or the functions that implement the expressions) must be passed as arguments. In other words, second-order functions are necessary. Current XQuery 1.0 and XPath 2.0 do not provide such functions.

Nevertheless, in the interest of the exposition, assume that such second-order functions are present. In particular, assume that there are two second-order functions, fts:score and fts:scoreSequence, each taking one argument (an expression). fts:score returns the score value of that expression, and fts:scoreSequence returns a sequence of score values, one for each item to which the expression evaluates. The scores must satisfy scoring properties.

A for clause containing a score variable

for $result score $score in Expr ...

is evaluated as though it is replaced by the following set of clauses.

let $scoreSeq := fts:scoreSequence(Expr)
for $result at $i in Expr
let $score := $scoreSeq[$i]
...

Here, $scoreSeq and $i are new variables, not appearing elsewhere, and fts:scoreSequence is the second-order function.

Similarly, a let clause containing a score variable

let $result score $score := Expr ...

is evaluated as though it is replaced by the following set of clauses.

let $result := Expr
let $score := fts:score(Expr)
...
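The two rewritings can be mimicked in Python, where passing an unevaluated expression is modelled by passing a zero-argument callable. Everything named here is an assumption made for illustration: toy_score_sequence and toy_score are hypothetical stand-ins for fts:scoreSequence and fts:score, and the toy scoring rule (the share of tokens equal to "mustang", with non-empty items assumed) is not a scoring algorithm prescribed by this specification.

```python
from typing import Callable, List, Sequence, Tuple

# An unevaluated expression, modelled as a callable that yields its items.
Expr = Callable[[], Sequence[str]]


def toy_score_sequence(expr: Expr) -> List[float]:
    """Hypothetical fts:scoreSequence: one score per item of the expression."""
    return [item.lower().split().count("mustang") / len(item.split())
            for item in expr()]


def toy_score(expr: Expr) -> float:
    """Hypothetical fts:score: a single score for the whole expression."""
    return max(toy_score_sequence(expr), default=0.0)


def for_with_score(expr: Expr) -> List[Tuple[str, float]]:
    """for $result score $score in Expr: pair each item with its score."""
    score_seq = toy_score_sequence(expr)
    return [(result, score_seq[i]) for i, result in enumerate(expr())]


def let_with_score(expr: Expr) -> Tuple[List[str], float]:
    """let $result score $score := Expr: the whole sequence and one score."""
    return list(expr()), toy_score(expr)


offers = lambda: ["Ford Mustang", "rust free Mustang deal", "Honda Civic"]
for result, score in for_with_score(offers):
    print(result, score)
```

The for form binds one score per item by position, while the let form binds the whole sequence once together with a single score.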

Example

This section presents a more complex example for the evaluation of FTContainsExpr. This example uses the same sample document fragment and assigns it to $doc. Consider the following FTContainsExpr.

$doc ftcontains ( ( "mustang" ftand ({("great", "excellent")} any word occurs at least 2 times) ) window 11 words ftand ftnot "rust" ) same paragraph

Begin by evaluating the FTSelection to AllMatches.

( ( "mustang" ftand ({("great", "excellent")} any word occurs at least 2 times) ) window 11 words ftand ftnot "rust" ) same paragraph

Step 1: Evaluate the FTWords "mustang".

Step 2: Evaluate the FTWords {("great", "excellent")} any word.

Step 2.1: Match the token "great".

Step 2.2: Match the token "excellent".

Step 2.3: Combine the above AllMatches as if FTOr is used, i.e., by forming a union of the Matches.

Step 3: Apply the FTTimes {("great", "excellent")} any word occurs at least 2 times, forming two pairs of Matches.

Step 4: Apply the FTAnd "mustang" ftand ({("great", "excellent")} any word occurs at least 2 times), forming all possible pairs of StringMatches.

Step 5: Apply the FTWindow ("mustang" ftand ({("great", "excellent")} any word occurs at least 2 times)) window 11 words, filtering out Matches whose window exceeds 11 tokens.

Step 6: Evaluate the FTWords "rust".

Step 7: Apply the FTUnaryNot ftnot "rust", transforming the StringInclude into a StringExclude.

Step 8: Apply the FTAnd (("mustang" ftand ({("great", "excellent")} any word occurs at least 2 times)) window 11 words) ftand ftnot "rust", forming all possible combinations of three StringMatches from the first AllMatches and one StringMatch from the second AllMatches.

Step 9: Apply the FTScope, filtering out Matches whose TokenInfos are not within the same paragraph (assuming the <offer> elements determine paragraph boundaries).

The resulting AllMatches contains a Match that does not contain a StringExclude. Therefore, the sample FTContainsExpr returns true.

Conformance

This section defines the conformance criteria for an XQuery 1.0 and XPath 2.0 Full-Text 1.0 processor.

In this section, the following terms are used to indicate the requirement levels defined in . MUST means that the item is an absolute requirement of the specification. MAY means that an item is truly optional. SHOULD means that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

An XQuery 1.0 and XPath 2.0 Full-Text 1.0 processor that claims to conform to this specification MUST include a claim of Minimal Conformance as defined in . In addition to a claim of Minimal Conformance, it MAY claim conformance to one or more optional features defined in

Minimal Conformance

Minimal Conformance to this specification MUST include all of the following items:

Minimal support for XQuery 1.0 or XPath 2.0 . The optional features of XQuery 1.0 or XPath 2.0 MAY be supported.

Support for everything specified in this document except those operators and match options specified in to be optional. If an implementation does not provide a given optional operator or match option, it MUST implement any requirements specified in for implementations that do not provide that operator or match option.

A definition of every item specified to be implementation-defined unless that item is an optional operator or match option that is not supported by the implementation. A list of implementation-defined items can be found in .

Implementations are not required to define items specified to be implementation-dependent.

Optional Operators and Match Options

FTMildNot Operator

It is optional whether the implementation supports the FTMildNot. If it does not support FTMildNot and encounters one in a full-text query, then it MUST raise an error .

FTUnaryNot Operator

The unrestricted form of negation in FTUnaryNot, that can negate every kind of FTSelection, is optional. Implementations may choose to support the negation operation in a restricted form, enforcing one or both of the following restrictions.

Negation Restriction 1. An FTUnaryNot expression may only appear as a direct right operand of an "ftand" (FTAnd) operation.

Negation Restriction 2. An FTUnaryNot expression may not appear as a descendant of an FTOr that is modified by an FTPosFilter. (An FTOr is modified by an FTPosFilter, if it is derived using the production for FTSelection together with that FTPosFilter.)

Consider the following example FTSelections.

1. ftnot "web"
2. "web" ftand ( ftnot "information" ftor "retrieval" )
3. "web" ftand ftnot("information" ftand "retrieval")
4. "web" ftand ftnot("information" ftand "retrieval" window 5 words)
5. "web" ftand ("information" ftand ftnot "retrieval" window 5 words)

The first two FTSelections both violate restriction 1, while the third and the fourth conform to both restrictions. The fifth one violates restriction 2, while obeying restriction 1. Note that in the last example the FTSelection to which the window operation is applied is "information" ftand ftnot "retrieval", which contains an FTUnaryNot expression.
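The two negation restrictions can be checked mechanically on a parsed selection. The Python sketch below assumes a hypothetical tuple-based syntax tree (the constructors words, ftnot, ftand, ftor, and window are not part of this specification) and treats window as the only FTPosFilter. It encodes the five example FTSelections and reproduces the classification given above.

```python
def words(s): return ("words", s)
def ftnot(x): return ("not", x)
def ftand(left, right): return ("and", left, right)
def ftor(left, right): return ("or", left, right)
def window(x, n): return ("window", x, n)  # stands for any FTPosFilter


def violates_restriction_1(node, right_of_and=False):
    """True if some ftnot is not the direct right operand of an ftand."""
    kind = node[0]
    if kind == "words":
        return False
    if kind == "not":
        return (not right_of_and) or violates_restriction_1(node[1])
    if kind == "and":
        return (violates_restriction_1(node[1])
                or violates_restriction_1(node[2], right_of_and=True))
    if kind == "or":
        return violates_restriction_1(node[1]) or violates_restriction_1(node[2])
    if kind == "window":
        return violates_restriction_1(node[1])
    raise ValueError(kind)


def violates_restriction_2(node, filtered=False):
    """True if some ftnot lies below a selection modified by a position filter."""
    kind = node[0]
    if kind == "words":
        return False
    if kind == "not":
        return filtered or violates_restriction_2(node[1], filtered)
    if kind == "window":
        return violates_restriction_2(node[1], True)
    if kind in ("and", "or"):
        return (violates_restriction_2(node[1], filtered)
                or violates_restriction_2(node[2], filtered))
    raise ValueError(kind)


examples = [
    ftnot(words("web")),
    ftand(words("web"), ftor(ftnot(words("information")), words("retrieval"))),
    ftand(words("web"), ftnot(ftand(words("information"), words("retrieval")))),
    ftand(words("web"),
          ftnot(window(ftand(words("information"), words("retrieval")), 5))),
    ftand(words("web"),
          window(ftand(words("information"), ftnot(words("retrieval"))), 5)),
]
for number, selection in enumerate(examples, 1):
    print(number, violates_restriction_1(selection), violates_restriction_2(selection))
```

In the fourth example the window sits inside the negated operand, so the ftnot is not a descendant of the filtered selection. In the fifth it is, which is why only the fifth violates restriction 2.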

If the implementation does enforce one or both of these restrictions on FTUnaryNot and encounters a full-text query that does not obey the restriction then it MUST raise an error .

FTUnit and FTBigUnit

It is optional whether the implementation supports all the choices of FTUnit and FTBigUnit. If it does not support one or more choices of FTUnit or FTBigUnit and encounters an unsupported FTUnit or FTBigUnit in a full-text query, then it MUST raise an error .

FTOrder Operator

The unrestricted form of the FTOrder postfix operator, that can be applied to any kind of FTSelection, is optional. Implementations may choose to enforce the following restriction on the use of FTOrder.

Order Operator Restriction. FTOrder may only appear directly succeeding an FTWindow or an FTDistance operator.

If the implementation enforces this restriction and encounters a full-text query that does not obey it, then it MUST raise an error.
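As an illustration only, the Order Operator Restriction can be checked over the sequence of FTPosFilters attached to one FTSelection. The Python sketch below assumes a hypothetical representation in which each filter is reduced to a short name ("ordered", "window", "distance", "scope", "content"); neither the representation nor the function name comes from this specification.

```python
# Filters after which "ordered" is permitted under the restriction.
PROXIMITY_FILTERS = {"window", "distance"}


def obeys_order_restriction(pos_filters):
    """True if every "ordered" directly succeeds a "window" or "distance"."""
    for index, name in enumerate(pos_filters):
        if name == "ordered":
            if index == 0 or pos_filters[index - 1] not in PROXIMITY_FILTERS:
                return False
    return True


# "... window 5 words ordered" obeys the restriction; a bare "... ordered" does not.
print(obeys_order_restriction(["window", "ordered"]))
print(obeys_order_restriction(["ordered"]))
```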

FTScope Operator

It is optional whether the implementation supports the FTScope operator. If it does not support FTScope and encounters one in a full-text query, then it MUST raise an error.

FTWindow Operator

The unrestricted form of the FTWindow postfix operator, which can be applied to any kind of FTSelection, is optional. Implementations may choose to enforce the following restriction on the use of FTWindow.

Window Operator Restriction. FTWindow can only be applied to an FTOr that is either a single FTWords or a combination of FTWords involving only the operators ftand and ftor.

If the implementation enforces this restriction and encounters a full-text query that does not obey it, then it MUST raise an error.

FTDistance Operator

The unrestricted form of the FTDistance postfix operator, which can be applied to any kind of FTSelection, is optional. Implementations may choose to enforce the following restriction on the use of FTDistance.

Distance Operator Restriction. FTDistance can only be applied to an FTOr that is either a single FTWords or a combination of FTWords involving only the operators ftand and ftor.

If the implementation enforces this restriction and encounters a full-text query that does not obey it, then it MUST raise an error.
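The Window Operator Restriction and the Distance Operator Restriction place the same condition on the operand. The following Python sketch shows one way such a condition might be tested; the tuple-based tree (with node kinds "words", "and", "or", "not", and "mildnot") and the function name are assumptions made for illustration and are not defined by this specification.

```python
def is_simple_words_combination(node):
    """True if the operand is a single FTWords or FTWords combined only with ftand/ftor."""
    kind = node[0]
    if kind == "words":
        return True
    if kind in ("and", "or"):
        # Every operand of ftand/ftor must itself satisfy the condition.
        return all(is_simple_words_combination(child) for child in node[1:])
    # Any other construct (ftnot, "not in", nested filters, ...) is rejected.
    return False


allowed = ("and", ("words", "information"),
           ("or", ("words", "retrieval"), ("words", "search")))
rejected = ("and", ("words", "information"), ("not", ("words", "retrieval")))

print(is_simple_words_combination(allowed))
print(is_simple_words_combination(rejected))
```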

FTTimes Operator

It is optional whether the implementation supports the FTTimes operator. If it does not support FTTimes and encounters one in a full-text query, then it MUST raise an error.

FTContent Operator

It is optional whether the implementation supports the FTContent operator. If it does not support FTContent and encounters one in a full-text query, then it MUST raise an error.

FTCaseOption

It is optional whether the implementation supports the "lowercase" and "uppercase" choices for the FTCaseOption. If it does not support these choices for the FTCaseOption and encounters an unsupported choice in a full-text query, then it MUST raise an error.

FTStopwordOption

It is optional whether the implementation supports the FTStopwordOption. If it does not support FTStopwordOption and encounters one in a full-text query, then it MUST raise an error.

It is optional whether the implementation supports the FTStopwordOption in the body of the query. If it supports FTStopwordOption in the prolog but not in the body of a query, and encounters one in the body of a query, then it MUST raise an error.

It is optional whether the implementation supports the StringLiteral alternative of FTRefOrList in the FTStopwordOption. If it does not support the StringLiteral alternative of FTRefOrList and encounters such an alternative in a full-text query, then it MUST raise an error.

FTLanguageOption

It is optional whether the implementation supports the unrestricted form of FTLanguageOption. Implementations may choose to enforce the following restriction on the use of FTLanguageOption.

Single Language Restriction. If a full-text query contains more than one FTLanguageOption in its body and the prolog, then the languages specified must be the same.

If the implementation enforces this restriction and encounters a full-text query that does not obey it, then it MUST raise an error.
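A minimal Python sketch of the Single Language Restriction follows. It assumes, purely for illustration, that the language strings have already been collected from the prolog and from the query body, and that "the same" means exact string equality; whether an implementation compares language identifiers case-insensitively is not stated here and is left open.

```python
def obeys_single_language_restriction(prolog_language, body_languages):
    """True if all FTLanguageOptions in the prolog and body name one language.

    prolog_language is the language string from the prolog, or None if absent.
    body_languages is the list of language strings found in the query body.
    """
    languages = set(body_languages)
    if prolog_language is not None:
        languages.add(prolog_language)
    # Zero or one distinct language satisfies the restriction.
    return len(languages) <= 1


print(obeys_single_language_restriction("en", ["en", "en"]))
print(obeys_single_language_restriction("en", ["de"]))
```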

FTIgnoreOption

The implementation may constrain the set of ignored nodes. If the operand of FTIgnoreOption violates the implementation-defined restriction on that operand, the implementation MUST raise an error.

Scoring

The implementation may constrain the expression that is used to compute scores as follows.

Scoring Restriction. If a score variable is used in a ForClause, a LetClause, or a SimpleForClause, then the corresponding ExprSingle may only be composed of a single FTContainsExpr, or of a combination of FTContainsExpr formed with the XQuery Boolean operators "and" and "or".

If the implementation enforces this restriction and encounters a full-text query that does not obey it, then it MUST raise an error.

EBNF for XQuery 1.0 Grammar with Full-Text extensions

The EBNF in this document and in this section is aligned with the current XML Query 1.0 grammar (see http://www.w3.org/TR/2005/CR-xquery-20051103/).

Module ::= VersionDecl? (LibraryModule | MainModule)
VersionDecl ::= "xquery" "version" StringLiteral ("encoding" StringLiteral)? Separator
MainModule ::= Prolog QueryBody
LibraryModule ::= ModuleDecl Prolog
ModuleDecl ::= "module" "namespace" NCName "=" URILiteral Separator
Prolog ::= ((DefaultNamespaceDecl | Setter | NamespaceDecl | Import) Separator)* ((VarDecl | FunctionDecl | OptionDecl | FTOptionDecl) Separator)*
Setter ::= BoundarySpaceDecl | DefaultCollationDecl | BaseURIDecl | ConstructionDecl | OrderingModeDecl | EmptyOrderDecl | CopyNamespacesDecl
Import ::= SchemaImport | ModuleImport
Separator ::= ";"
NamespaceDecl ::= "declare" "namespace" NCName "=" URILiteral
BoundarySpaceDecl ::= "declare" "boundary-space" ("preserve" | "strip")
DefaultNamespaceDecl ::= "declare" "default" ("element" | "function") "namespace" URILiteral
OptionDecl ::= "declare" "option" QName StringLiteral
FTOptionDecl ::= "declare" "ft-option" FTMatchOptions
OrderingModeDecl ::= "declare" "ordering" ("ordered" | "unordered")
EmptyOrderDecl ::= "declare" "default" "order" "empty" ("greatest" | "least")
CopyNamespacesDecl ::= "declare" "copy-namespaces" PreserveMode "," InheritMode
PreserveMode ::= "preserve" | "no-preserve"
InheritMode ::= "inherit" | "no-inherit"
DefaultCollationDecl ::= "declare" "default" "collation" URILiteral
BaseURIDecl ::= "declare" "base-uri" URILiteral
SchemaImport ::= "import" "schema" SchemaPrefix? URILiteral ("at" URILiteral ("," URILiteral)*)?
SchemaPrefix ::= ("namespace" NCName "=") | ("default" "element" "namespace")
ModuleImport ::= "import" "module" ("namespace" NCName "=")? URILiteral ("at" URILiteral ("," URILiteral)*)?
VarDecl ::= "declare" "variable" "$" QName TypeDeclaration? ((":=" ExprSingle) | "external")
ConstructionDecl ::= "declare" "construction" ("strip" | "preserve")
FunctionDecl ::= "declare" "function" QName "(" ParamList? ")" ("as" SequenceType)? (EnclosedExpr | "external")
ParamList ::= Param ("," Param)*
Param ::= "$" QName TypeDeclaration?
EnclosedExpr ::= "{" Expr "}"
QueryBody ::= Expr
Expr ::= ExprSingle ("," ExprSingle)*
ExprSingle ::= FLWORExpr
    | QuantifiedExpr
    | TypeswitchExpr
    | IfExpr
    | OrExpr
FLWORExpr ::= (ForClause | LetClause)+ WhereClause? OrderByClause? "return" ExprSingle
ForClause ::= "for" "$" VarName TypeDeclaration? PositionalVar? FTScoreVar? "in" ExprSingle ("," "$" VarName TypeDeclaration? PositionalVar? FTScoreVar? "in" ExprSingle)*
PositionalVar ::= "at" "$" VarName
FTScoreVar ::= "score" "$" VarName
LetClause ::= (("let" "$" VarName TypeDeclaration?) | ("let" "score" "$" VarName)) ":=" ExprSingle ("," (("$" VarName TypeDeclaration?) | FTScoreVar) ":=" ExprSingle)*
WhereClause ::= "where" ExprSingle
OrderByClause ::= (("order" "by") | ("stable" "order" "by")) OrderSpecList
OrderSpecList ::= OrderSpec ("," OrderSpec)*
OrderSpec ::= ExprSingle OrderModifier
OrderModifier ::= ("ascending" | "descending")? ("empty" ("greatest" | "least"))? ("collation" URILiteral)?
QuantifiedExpr ::= ("some" | "every") "$" VarName TypeDeclaration? "in" ExprSingle ("," "$" VarName TypeDeclaration? "in" ExprSingle)* "satisfies" ExprSingle
TypeswitchExpr ::= "typeswitch" "(" Expr ")" CaseClause+ "default" ("$" VarName)? "return" ExprSingle
CaseClause ::= "case" ("$" VarName "as")? SequenceType "return" ExprSingle
IfExpr ::= "if" "(" Expr ")" "then" ExprSingle "else" ExprSingle
OrExpr ::= AndExpr ( "or" AndExpr )*
AndExpr ::= ComparisonExpr ( "and" ComparisonExpr )*
ComparisonExpr ::= FTContainsExpr ( (ValueComp
    | GeneralComp
    | NodeComp) FTContainsExpr )?
FTContainsExpr ::= RangeExpr ( "ftcontains" FTSelection FTIgnoreOption? )?
RangeExpr ::= AdditiveExpr ( "to" AdditiveExpr )?
AdditiveExpr ::= MultiplicativeExpr ( ("+" | "-") MultiplicativeExpr )*
MultiplicativeExpr ::= UnionExpr ( ("*" | "div" | "idiv" | "mod") UnionExpr )*
UnionExpr ::= IntersectExceptExpr ( ("union" | "|") IntersectExceptExpr )*
IntersectExceptExpr ::= InstanceofExpr ( ("intersect" | "except") InstanceofExpr )*
InstanceofExpr ::= TreatExpr ( "instance" "of" SequenceType )?
TreatExpr ::= CastableExpr ( "treat" "as" SequenceType )?
CastableExpr ::= CastExpr ( "castable" "as" SingleType )?
CastExpr ::= UnaryExpr ( "cast" "as" SingleType )?
UnaryExpr ::= ("-" | "+")* ValueExpr
ValueExpr ::= ValidateExpr | PathExpr | ExtensionExpr
GeneralComp ::= "=" | "!=" | "<" | "<=" | ">" | ">="
ValueComp ::= "eq" | "ne" | "lt" | "le" | "gt" | "ge"
NodeComp ::= "is" | "<<" | ">>"
ValidateExpr ::= "validate" ValidationMode? "{" Expr "}"
ValidationMode ::= "lax" | "strict"
ExtensionExpr ::= Pragma+ "{" Expr? "}"
Pragma ::= "(#" S? QName (S PragmaContents)? "#)"    /* ws: explicit */
PragmaContents ::= (Char* - (Char* '#)' Char*))
PathExpr ::= ("/" RelativePathExpr?)
    | ("//" RelativePathExpr)
    | RelativePathExpr    /* xgc: leading-lone-slash */
RelativePathExpr ::= StepExpr (("/" | "//") StepExpr)*
StepExpr ::= FilterExpr | AxisStep
AxisStep ::= (ReverseStep | ForwardStep) PredicateList
ForwardStep ::= (ForwardAxis NodeTest) | AbbrevForwardStep
ForwardAxis ::= ("child" "::")
    | ("descendant" "::")
    | ("attribute" "::")
    | ("self" "::")
    | ("descendant-or-self" "::")
    | ("following-sibling" "::")
    | ("following" "::")
AbbrevForwardStep ::= "@"? NodeTest
ReverseStep ::= (ReverseAxis NodeTest) | AbbrevReverseStep
ReverseAxis ::= ("parent" "::")
    | ("ancestor" "::")
    | ("preceding-sibling" "::")
    | ("preceding" "::")
    | ("ancestor-or-self" "::")
AbbrevReverseStep ::= ".."
NodeTest ::= KindTest | NameTest
NameTest ::= QName | Wildcard
Wildcard ::= "*"
    | (NCName ":" "*")
    | ("*" ":" NCName)    /* ws: explicit */
FilterExpr ::= PrimaryExpr PredicateList
PredicateList ::= Predicate*
Predicate ::= "[" Expr "]"
PrimaryExpr ::= Literal | VarRef | ParenthesizedExpr | ContextItemExpr | FunctionCall | OrderedExpr | UnorderedExpr | Constructor
Literal ::= NumericLiteral | StringLiteral
NumericLiteral ::= IntegerLiteral | DecimalLiteral | DoubleLiteral
VarRef ::= "$" VarName
VarName ::= QName
ParenthesizedExpr ::= "(" Expr? ")"
ContextItemExpr ::= "."
OrderedExpr ::= "ordered" "{" Expr "}"
UnorderedExpr ::= "unordered" "{" Expr "}"
FunctionCall ::= QName "(" (ExprSingle ("," ExprSingle)*)? ")"    /* xgc: reserved-function-names */ /* gn: parens */
Constructor ::= DirectConstructor
    | ComputedConstructor
DirectConstructor ::= DirElemConstructor
    | DirCommentConstructor
    | DirPIConstructor
DirElemConstructor ::= "<" QName DirAttributeList ("/>" | (">" DirElemContent* "</" QName S? ">"))    /* ws: explicit */
DirAttributeList ::= (S (QName S? "=" S? DirAttributeValue)?)*    /* ws: explicit */
DirAttributeValue ::= ('"' (EscapeQuot | QuotAttrValueContent)* '"')
    | ("'" (EscapeApos | AposAttrValueContent)* "'")    /* ws: explicit */
QuotAttrValueContent ::= QuotAttrContentChar
    | CommonContent
AposAttrValueContent ::= AposAttrContentChar
    | CommonContent
DirElemContent ::= DirectConstructor
    | CDataSection
    | CommonContent
    | ElementContentChar
CommonContent ::= PredefinedEntityRef | CharRef | "{{" | "}}" | EnclosedExpr
DirCommentConstructor ::= "<!--" DirCommentContents "-->"    /* ws: explicit */
DirCommentContents ::= ((Char - '-') | ('-' (Char - '-')))*    /* ws: explicit */
DirPIConstructor ::= "<?" PITarget (S DirPIContents)? "?>"    /* ws: explicit */
DirPIContents ::= (Char* - (Char* '?>' Char*))    /* ws: explicit */
CDataSection ::= "<![CDATA[" CDataSectionContents "]]>"    /* ws: explicit */
CDataSectionContents ::= (Char* - (Char* ']]>' Char*))    /* ws: explicit */
ComputedConstructor ::= CompDocConstructor
    | CompElemConstructor
    | CompAttrConstructor
    | CompTextConstructor
    | CompCommentConstructor
    | CompPIConstructor
CompDocConstructor ::= "document" "{" Expr "}"
CompElemConstructor ::= "element" (QName | ("{" Expr "}")) "{" ContentExpr? "}"
ContentExpr ::= Expr
CompAttrConstructor ::= "attribute" (QName | ("{" Expr "}")) "{" Expr? "}"
CompTextConstructor ::= "text" "{" Expr "}"
CompCommentConstructor ::= "comment" "{" Expr "}"
CompPIConstructor ::= "processing-instruction" (NCName | ("{" Expr "}")) "{" Expr? "}"
SingleType ::= AtomicType "?"?
TypeDeclaration ::= "as" SequenceType
SequenceType ::= ("empty-sequence" "(" ")")
    | (ItemType OccurrenceIndicator?)
OccurrenceIndicator ::= "?" | "*" | "+"    /* xgc: occurrence-indicators */
ItemType ::= KindTest | ("item" "(" ")") | AtomicType
AtomicType ::= QName
KindTest ::= DocumentTest
    | ElementTest
    | AttributeTest
    | SchemaElementTest
    | SchemaAttributeTest
    | PITest
    | CommentTest
    | TextTest
    | AnyKindTest
AnyKindTest ::= "node" "(" ")"
DocumentTest ::= "document-node" "(" (ElementTest | SchemaElementTest)? ")"
TextTest ::= "text" "(" ")"
CommentTest ::= "comment" "(" ")"
PITest ::= "processing-instruction" "(" (NCName | StringLiteral)? ")"
AttributeTest ::= "attribute" "(" (AttribNameOrWildcard ("," TypeName)?)? ")"
AttribNameOrWildcard ::= AttributeName | "*"
SchemaAttributeTest ::= "schema-attribute" "(" AttributeDeclaration ")"
AttributeDeclaration ::= AttributeName
ElementTest ::= "element" "(" (ElementNameOrWildcard ("," TypeName "?"?)?)? ")"
ElementNameOrWildcard ::= ElementName | "*"
SchemaElementTest ::= "schema-element" "(" ElementDeclaration ")"
ElementDeclaration ::= ElementName
AttributeName ::= QName
ElementName ::= QName
TypeName ::= QName
URILiteral ::= StringLiteral
FTSelection ::= FTOr FTPosFilter* ("weight" RangeExpr)?
FTOr ::= FTAnd ( "ftor" FTAnd )*
FTAnd ::= FTMildNot ( "ftand" FTMildNot )*
FTMildNot ::= FTUnaryNot ( "not" "in" FTUnaryNot )*
FTUnaryNot ::= ("ftnot")? FTPrimaryWithOptions
FTPrimaryWithOptions ::= FTPrimary FTMatchOptions?
FTPrimary ::= (FTWords FTTimes?) | ("(" FTSelection ")") | FTExtensionSelection
FTWords ::= FTWordsValue FTAnyallOption?
FTWordsValue ::= Literal | ("{" Expr "}")
FTExtensionSelection ::= Pragma+ "{" FTSelection? "}"
FTAnyallOption ::= ("any" "word"?) | ("all" "words"?) | "phrase"
FTTimes ::= "occurs" FTRange "times"
FTRange ::= ("exactly" AdditiveExpr)
    | ("at" "least" AdditiveExpr)
    | ("at" "most" AdditiveExpr)
    | ("from" AdditiveExpr "to" AdditiveExpr)
FTPosFilter ::= FTOrder | FTWindow | FTDistance | FTScope | FTContent
FTOrder ::= "ordered"
FTWindow ::= "window" AdditiveExpr FTUnit
FTDistance ::= "distance" FTRange FTUnit
FTUnit ::= "words" | "sentences" | "paragraphs"
FTScope ::= ("same" | "different") FTBigUnit
FTBigUnit ::= "sentence" | "paragraph"
FTContent ::= ("at" "start") | ("at" "end") | ("entire" "content")
FTMatchOptions ::= FTMatchOption+    /* xgc: multiple-match-options */
FTMatchOption ::= FTLanguageOption
    | FTWildCardOption
    | FTThesaurusOption
    | FTStemOption
    | FTCaseOption
    | FTDiacriticsOption
    | FTStopwordOption
    | FTExtensionOption
FTCaseOption ::= ("case" "insensitive")
    | ("case" "sensitive")
    | "lowercase"
    | "uppercase"
FTDiacriticsOption ::= ("diacritics" "insensitive")
    | ("diacritics" "sensitive")
FTStemOption ::= ("with" "stemming") | ("without" "stemming")
FTThesaurusOption ::= ("with" "thesaurus" (FTThesaurusID | "default"))
    | ("with" "thesaurus" "(" (FTThesaurusID | "default") ("," FTThesaurusID)* ")")
    | ("without" "thesaurus")
FTThesaurusID ::= "at" URILiteral ("relationship" StringLiteral)? (FTRange "levels")?
FTStopwordOption ::= ("with" "stop" "words" FTRefOrList FTInclExclStringLiteral*)
    | ("without" "stop" "words")
    | ("with" "default" "stop" "words" FTInclExclStringLiteral*)
FTRefOrList ::= ("at" URILiteral)
    | ("(" StringLiteral ("," StringLiteral)* ")")
FTInclExclStringLiteral ::= ("union" | "except") FTRefOrList
FTLanguageOption ::= "language" StringLiteral
FTWildCardOption ::= ("with" "wildcards") | ("without" "wildcards")
FTExtensionOption ::= "option" QName StringLiteral
FTIgnoreOption ::= "without" "content" UnionExpr

Terminal Symbols

IntegerLiteral ::= Digits
DecimalLiteral ::= ("." Digits) | (Digits "." [0-9]*)    /* ws: explicit */
DoubleLiteral ::= (("." Digits) | (Digits ("." [0-9]*)?)) [eE] [+-]? Digits    /* ws: explicit */
StringLiteral ::= ('"' (PredefinedEntityRef | CharRef | EscapeQuot | [^"&])* '"') | ("'" (PredefinedEntityRef | CharRef | EscapeApos | [^'&])* "'")    /* ws: explicit */
PredefinedEntityRef ::= "&" ("lt" | "gt" | "amp" | "quot" | "apos") ";"    /* ws: explicit */
EscapeQuot ::= '""'
EscapeApos ::= "''"
ElementContentChar ::= Char - [{}<&]
QuotAttrContentChar ::= Char - ["{}<&]
AposAttrContentChar ::= Char - ['{}<&]
Comment ::= "(:" (CommentContents | Comment)* ":)"    /* ws: explicit */ /* gn: comments */
PITarget ::= [http://www.w3.org/TR/REC-xml#NT-PITarget]    /* xgc: xml-version */
CharRef ::= [http://www.w3.org/TR/REC-xml#NT-CharRef]    /* xgc: xml-version */
QName ::= [http://www.w3.org/TR/REC-xml-names/#NT-QName]    /* xgc: xml-version */
NCName ::= [http://www.w3.org/TR/REC-xml-names/#NT-NCName]    /* xgc: xml-version */
S ::= [http://www.w3.org/TR/REC-xml#NT-S]    /* xgc: xml-version */
Char ::= [http://www.w3.org/TR/REC-xml#NT-Char]    /* xgc: xml-version */

The following symbols are used only in the definition of terminal symbols; they are not terminal symbols in the grammar of .

Digits ::= [0-9]+
CommentContents ::= (Char+ - (Char* ('(:' | ':)') Char*))

Extra-grammatical Constraints

This section contains constraints on the EBNF productions, which are required to parse legal sentences. The note below is referenced from the right side of the production, with the notation: /* xgc: <id> */.

multiple-match-options

No single alternative for FTMatchOption can be specified more than once as part of the same FTMatchOptions. For example, if the FTCaseOption "lowercase" is specified, then "uppercase" cannot also be specified as part of the same FTMatchOptions.
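The multiple-match-options constraint can be sketched as a duplicate check over the alternatives of FTMatchOption. The Python fragment below is illustrative only: the lookup table covers just the keyword-only options, each option is assumed to be supplied as a normalized string, and the table and function names are hypothetical.

```python
from collections import Counter

# Maps a concrete option (as a normalized string) to its FTMatchOption alternative.
# Only keyword-only options are listed; options that carry arguments are omitted.
OPTION_KIND = {
    "case insensitive": "FTCaseOption",
    "case sensitive": "FTCaseOption",
    "lowercase": "FTCaseOption",
    "uppercase": "FTCaseOption",
    "diacritics insensitive": "FTDiacriticsOption",
    "diacritics sensitive": "FTDiacriticsOption",
    "with stemming": "FTStemOption",
    "without stemming": "FTStemOption",
    "with wildcards": "FTWildCardOption",
    "without wildcards": "FTWildCardOption",
}


def repeated_option_kinds(options):
    """Return the alternatives that occur more than once in one FTMatchOptions."""
    counts = Counter(OPTION_KIND[option] for option in options)
    return sorted(kind for kind, count in counts.items() if count > 1)


# "lowercase" and "uppercase" are both FTCaseOption, so the pair is reported.
print(repeated_option_kinds(["lowercase", "uppercase"]))
print(repeated_option_kinds(["lowercase", "with stemming"]))
```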

EBNF for XPath 2.0 Grammar with Full-Text extensions

The EBNF in this document and in this section is aligned with the current XPath 2.0 grammar (see http://www.w3.org/TR/2005/CR-xpath20-20051103/).

XPath ::= Expr
Expr ::= ExprSingle ("," ExprSingle)*
ExprSingle ::= ForExpr
    | QuantifiedExpr
    | IfExpr
    | OrExpr
ForExpr ::= SimpleForClause "return" ExprSingle
SimpleForClause ::= "for" "$" VarName FTScoreVar? "in" ExprSingle ("," "$" VarName FTScoreVar? "in" ExprSingle)*
FTScoreVar ::= "score" "$" VarName
QuantifiedExpr ::= ("some" | "every") "$" VarName "in" ExprSingle ("," "$" VarName "in" ExprSingle)* "satisfies" ExprSingle
IfExpr ::= "if" "(" Expr ")" "then" ExprSingle "else" ExprSingle
OrExpr ::= AndExpr ( "or" AndExpr )*
AndExpr ::= ComparisonExpr ( "and" ComparisonExpr )*
ComparisonExpr ::= FTContainsExpr ( (ValueComp
    | GeneralComp
    | NodeComp) FTContainsExpr )?
FTContainsExpr ::= RangeExpr ( "ftcontains" FTSelection FTIgnoreOption? )?
RangeExpr ::= AdditiveExpr ( "to" AdditiveExpr )?
AdditiveExpr ::= MultiplicativeExpr ( ("+" | "-") MultiplicativeExpr )*
MultiplicativeExpr ::= UnionExpr ( ("*" | "div" | "idiv" | "mod") UnionExpr )*
UnionExpr ::= IntersectExceptExpr ( ("union" | "|") IntersectExceptExpr )*
IntersectExceptExpr ::= InstanceofExpr ( ("intersect" | "except") InstanceofExpr )*
InstanceofExpr ::= TreatExpr ( "instance" "of" SequenceType )?
TreatExpr ::= CastableExpr ( "treat" "as" SequenceType )?
CastableExpr ::= CastExpr ( "castable" "as" SingleType )?
CastExpr ::= UnaryExpr ( "cast" "as" SingleType )?
UnaryExpr ::= ("-" | "+")* ValueExpr
ValueExpr ::= PathExpr
GeneralComp ::= "=" | "!=" | "<" | "<=" | ">" | ">="
ValueComp ::= "eq" | "ne" | "lt" | "le" | "gt" | "ge"
NodeComp ::= "is" | "<<" | ">>"
Pragma ::= "(#" S? QName (S PragmaContents)? "#)"    /* ws: explicit */
PragmaContents ::= (Char* - (Char* '#)' Char*))
PathExpr ::= ("/" RelativePathExpr?)
    | ("//" RelativePathExpr)
    | RelativePathExpr    /* xgc: leading-lone-slash */
RelativePathExpr ::= StepExpr (("/" | "//") StepExpr)*
StepExpr ::= FilterExpr | AxisStep
AxisStep ::= (ReverseStep | ForwardStep) PredicateList
ForwardStep ::= (ForwardAxis NodeTest) | AbbrevForwardStep
ForwardAxis ::= ("child" "::")
    | ("descendant" "::")
    | ("attribute" "::")
    | ("self" "::")
    | ("descendant-or-self" "::")
    | ("following-sibling" "::")
    | ("following" "::")
    | ("namespace" "::")
AbbrevForwardStep ::= "@"? NodeTest
ReverseStep ::= (ReverseAxis NodeTest) | AbbrevReverseStep
ReverseAxis ::= ("parent" "::")
    | ("ancestor" "::")
    | ("preceding-sibling" "::")
    | ("preceding" "::")
    | ("ancestor-or-self" "::")
AbbrevReverseStep ::= ".."
NodeTest ::= KindTest | NameTest
NameTest ::= QName | Wildcard
Wildcard ::= "*"
    | (NCName ":" "*")
    | ("*" ":" NCName)    /* ws: explicit */
FilterExpr ::= PrimaryExpr PredicateList
PredicateList ::= Predicate*
Predicate ::= "[" Expr "]"
PrimaryExpr ::= Literal | VarRef | ParenthesizedExpr | ContextItemExpr | FunctionCall
Literal ::= NumericLiteral | StringLiteral
NumericLiteral ::= IntegerLiteral | DecimalLiteral | DoubleLiteral
VarRef ::= "$" VarName
VarName ::= QName
ParenthesizedExpr ::= "(" Expr? ")"
ContextItemExpr ::= "."
FunctionCall ::= QName "(" (ExprSingle ("," ExprSingle)*)? ")"    /* xgc: reserved-function-names */ /* gn: parens */
SingleType ::= AtomicType "?"?
SequenceType ::= ("empty-sequence" "(" ")")
    | (ItemType OccurrenceIndicator?)
OccurrenceIndicator ::= "?" | "*" | "+"    /* xgc: occurrence-indicators */
ItemType ::= KindTest | ("item" "(" ")") | AtomicType
AtomicType ::= QName
KindTest ::= DocumentTest
    | ElementTest
    | AttributeTest
    | SchemaElementTest
    | SchemaAttributeTest
    | PITest
    | CommentTest
    | TextTest
    | AnyKindTest
AnyKindTest ::= "node" "(" ")"
DocumentTest ::= "document-node" "(" (ElementTest | SchemaElementTest)? ")"
TextTest ::= "text" "(" ")"
CommentTest ::= "comment" "(" ")"
PITest ::= "processing-instruction" "(" (NCName | StringLiteral)? ")"
AttributeTest ::= "attribute" "(" (AttribNameOrWildcard ("," TypeName)?)? ")"
AttribNameOrWildcard ::= AttributeName | "*"
SchemaAttributeTest ::= "schema-attribute" "(" AttributeDeclaration ")"
AttributeDeclaration ::= AttributeName
ElementTest ::= "element" "(" (ElementNameOrWildcard ("," TypeName "?"?)?)? ")"
ElementNameOrWildcard ::= ElementName | "*"
SchemaElementTest ::= "schema-element" "(" ElementDeclaration ")"
ElementDeclaration ::= ElementName
AttributeName ::= QName
ElementName ::= QName
TypeName ::= QName
URILiteral ::= StringLiteral
FTSelection ::= FTOr FTPosFilter* ("weight" RangeExpr)?
FTOr ::= FTAnd ( "ftor" FTAnd )*
FTAnd ::= FTMildNot ( "ftand" FTMildNot )*
FTMildNot ::= FTUnaryNot ( "not" "in" FTUnaryNot )*
FTUnaryNot ::= ("ftnot")? FTPrimaryWithOptions
FTPrimaryWithOptions ::= FTPrimary FTMatchOptions?
FTPrimary ::= (FTWords FTTimes?) | ("(" FTSelection ")") | FTExtensionSelection
FTWords ::= FTWordsValue FTAnyallOption?
FTWordsValue ::= Literal | ("{" Expr "}")
FTExtensionSelection ::= Pragma+ "{" FTSelection? "}"
FTAnyallOption ::= ("any" "word"?) | ("all" "words"?) | "phrase"
FTTimes ::= "occurs" FTRange "times"
FTRange ::= ("exactly" AdditiveExpr)
    | ("at" "least" AdditiveExpr)
    | ("at" "most" AdditiveExpr)
    | ("from" AdditiveExpr "to" AdditiveExpr)
FTPosFilter ::= FTOrder | FTWindow | FTDistance | FTScope | FTContent
FTOrder ::= "ordered"
FTWindow ::= "window" AdditiveExpr FTUnit
FTDistance ::= "distance" FTRange FTUnit
FTUnit ::= "words" | "sentences" | "paragraphs"
FTScope ::= ("same" | "different") FTBigUnit
FTBigUnit ::= "sentence" | "paragraph"
FTContent ::= ("at" "start") | ("at" "end") | ("entire" "content")
FTMatchOptions ::= FTMatchOption+    /* xgc: multiple-match-options */
FTMatchOption ::= FTLanguageOption
    | FTWildCardOption
    | FTThesaurusOption
    | FTStemOption
    | FTCaseOption
    | FTDiacriticsOption
    | FTStopwordOption
    | FTExtensionOption
FTCaseOption ::= ("case" "insensitive")
    | ("case" "sensitive")
    | "lowercase"
    | "uppercase"
FTDiacriticsOption ::= ("diacritics" "insensitive")
    | ("diacritics" "sensitive")
FTStemOption ::= ("with" "stemming") | ("without" "stemming")
FTThesaurusOption ::= ("with" "thesaurus" (FTThesaurusID | "default"))
    | ("with" "thesaurus" "(" (FTThesaurusID | "default") ("," FTThesaurusID)* ")")
    | ("without" "thesaurus")
FTThesaurusID ::= "at" URILiteral ("relationship" StringLiteral)? (FTRange "levels")?
FTStopwordOption ::= ("with" "stop" "words" FTRefOrList FTInclExclStringLiteral*)
    | ("without" "stop" "words")
    | ("with" "default" "stop" "words" FTInclExclStringLiteral*)
FTRefOrList ::= ("at" URILiteral)
    | ("(" StringLiteral ("," StringLiteral)* ")")
FTInclExclStringLiteral ::= ("union" | "except") FTRefOrList
FTLanguageOption ::= "language" StringLiteral
FTWildCardOption ::= ("with" "wildcards") | ("without" "wildcards")
FTExtensionOption ::= "option" QName StringLiteral
FTIgnoreOption ::= "without" "content" UnionExpr

Terminal Symbols

IntegerLiteral ::= Digits
DecimalLiteral ::= ("." Digits) | (Digits "." [0-9]*)    /* ws: explicit */
DoubleLiteral ::= (("." Digits) | (Digits ("." [0-9]*)?)) [eE] [+-]? Digits    /* ws: explicit */
StringLiteral ::= ('"' (EscapeQuot | [^"])* '"') | ("'" (EscapeApos | [^'])* "'")    /* ws: explicit */
EscapeQuot ::= '""'
EscapeApos ::= "''"
Comment ::= "(:" (CommentContents | Comment)* ":)"    /* ws: explicit */ /* gn: comments */
QName ::= [http://www.w3.org/TR/REC-xml-names/#NT-QName]    /* xgc: xml-version */
NCName ::= [http://www.w3.org/TR/REC-xml-names/#NT-NCName]    /* xgc: xml-version */
S ::= [http://www.w3.org/TR/REC-xml#NT-S]    /* xgc: xml-version */
Char ::= [http://www.w3.org/TR/REC-xml#NT-Char]    /* xgc: xml-version */

The following symbols are used only in the definition of terminal symbols; they are not terminal symbols in the grammar of .

Digits ::= [0-9]+
CommentContents ::= (Char+ - (Char* ('(:' | ':)') Char*))

Static Context Components

The following table describes the full-text components of the static context (as defined in ). The following aspects of each component are described:

Default initial value: This is the initial value of the component if it is not overridden or augmented by the implementation or by a query.

Can be overwritten or augmented by implementation: Indicates whether an XQuery implementation is allowed to replace the default initial value of the component by a different, implementation-defined value and/or to augment the default initial value by additional implementation-defined values.

Can be overwritten or augmented by a query: Indicates whether a query is allowed to replace and/or augment the initial value provided by default or by the implementation. If so, indicates how this is accomplished (for example, by a declaration in the prolog; as defined in ).

Scope: Indicates where the component is applicable. "Global" indicates that the component applies globally, throughout all the modules used in a query. "Module" indicates that the component applies throughout a module (as defined in ). "Lexical" indicates that the component applies within the expression in which it is defined (equivalent to "module" if the component is declared in a prolog).

Consistency Rules: Indicates rules that must be observed in assigning values to the component.

Static Context Components
Component	Default initial value	Can be overwritten or augmented by implementation?	Can be overwritten or augmented by a query?	Scope	Consistency rules
FTCaseOption	`case insensitive`	overwriteable	overwriteable by prolog	lexical	Value must be `case insensitive`, `case sensitive`, `lowercase`, or `uppercase`.
FTDiacriticsOption	`diacritics insensitive`	overwriteable	overwriteable by prolog	lexical	Value must be `diacritics insensitive` or `diacritics sensitive`.
FTStemOption	`without stemming`	overwriteable	overwriteable by prolog	lexical	Value must be `without stemming` or `with stemming`.
FTThesaurusOption	`without thesaurus`	overwriteable	overwriteable by prolog (refer to default to augment)	lexical	Value must be part of the statically known thesauri.
Statically known thesauri	none	augmentable	cannot be augmented or overwritten by prolog	module	Each URI uniquely identifies a thesaurus list.
FTStopwordOption	`without stop words`	overwriteable	overwriteable by prolog (refer to default to augment)	lexical	Value must be part of the statically known stop word lists.
Statically known stop word lists	none	augmentable	cannot be augmented or overwritten by prolog	module	Each URI uniquely identifies a stop word list.
FTLanguageOption	implementation-defined	overwriteable	overwriteable by prolog	lexical	Value must be castable to "xs:language".
Statically known languages	none	augmentable	cannot be augmented or overwritten by prolog	module	Each string uniquely identifies a language.
FTWildCardOption	`without wildcards`	no	overwriteable by prolog	lexical	Value must be `with wildcards` or `without wildcards`.
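As a rough illustration of how the initial values in the table combine, the following Python sketch starts from the default initial values, applies implementation-supplied values where the table permits overwriting, and then applies values declared in a prolog. The dictionary layout and the function name are assumptions made for this sketch; the statically known thesauri, stop word lists, and languages are omitted, and FTLanguageOption has no entry in the defaults because its default is implementation-defined.

```python
# Default initial values taken from the table above.
SPEC_DEFAULTS = {
    "FTCaseOption": "case insensitive",
    "FTDiacriticsOption": "diacritics insensitive",
    "FTStemOption": "without stemming",
    "FTThesaurusOption": "without thesaurus",
    "FTStopwordOption": "without stop words",
    "FTWildCardOption": "without wildcards",
}

# Components the table marks as overwriteable by the implementation.
IMPLEMENTATION_MAY_OVERWRITE = {
    "FTCaseOption",
    "FTDiacriticsOption",
    "FTStemOption",
    "FTThesaurusOption",
    "FTStopwordOption",
    "FTLanguageOption",
}


def initial_match_options(implementation_values=None, prolog_values=None):
    """Combine defaults, implementation values, and prolog declarations."""
    options = dict(SPEC_DEFAULTS)
    for name, value in (implementation_values or {}).items():
        if name not in IMPLEMENTATION_MAY_OVERWRITE:
            raise ValueError(f"{name} cannot be overwritten by the implementation")
        options[name] = value
    # Every component listed here may be overwritten by a prolog declaration.
    options.update(prolog_values or {})
    return options


options = initial_match_options(
    implementation_values={"FTLanguageOption": "en"},
    prolog_values={"FTStemOption": "with stemming"},
)
print(options["FTStemOption"], "/", options["FTCaseOption"])
```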

Error Conditions

An implementation that does not support the FTMildNot operator must raise a static error if a full-text query contains a mild not.

An implementation that enforces one of the restrictions on FTUnaryNot must raise a static error if a full-text query does not obey the restriction.

An implementation that does not support one or more of the choices on FTUnit and FTBigUnit must raise a static error if a full-text query contains one of those choices.

An implementation that does not support the FTScope operator must raise a static error if a full-text query contains a scope.

An implementation that does not support the FTTimes operator must raise a static error if a full-text query contains a times.

An implementation that restricts the use of FTStopwordOption must raise a static error if a full-text query contains a stop word option that does not meet the restriction.

An implementation that restricts the use of FTIgnoreOption must raise a static error if a full-text query contains an ignore option that does not meet the restriction.

It is a static error if, during the static analysis phase, the query is found to contain a stopwords option that refers to a stop word list that is not found in the statically known stop word lists.

It may be a static error if, during the static analysis phase, the query is found to contain a language identifier in a language option that the implementation does not support. The implementation may choose not to raise this error and instead provide some other implementation-defined behavior.

It is a static error if, during the static analysis phase, an expression is found to use an FTOrder operator that does not appear directly succeeding an FTWindow or an FTDistance operator and the implementation enforces this restriction.

An implementation may restrict the use of FTWindow and FTDistance to an FTOr that is either a single FTWords or a combination of FTWords involving only the operators ftand and ftor. It is a static error if, during the static analysis phase, an expression is found that violates this restriction and the implementation enforces this restriction.

An implementation that does not support the FTContent operator must raise a static error if a full-text query contains one.

It is a static error if, during the static analysis phase, an implementation that restricts the use of FTLanguageOption to a single language encounters more than one distinct language option.

An implementation may constrain the form of the expression used to compute scores. It is a static error if, during the static analysis phase, such an implementation encounters a scoring expression that does not meet the restriction.

It is a static error if, during the static analysis phase, an implementation that restricts the choices of FTCaseOption encounters the "lowercase" or "uppercase" option.

It is a type error if, during the static analysis phase, an expression is found to have a static type that is not appropriate for the context in which the expression occurs, or during the dynamic evaluation phase, the dynamic type of a value does not match a required type as specified by the matching rules in .

It is a dynamic error if, in a function invocation, the argument corresponding to the specified function's collation parameter does not identify a supported collation.

XML Syntax (XQueryX) for XQuery 1.0 and XPath 2.0 Full-Text 1.0

defines an XML representation of . , section 5.4, XML Syntax, states "XQuery/XPath Full-Text MAY have more than one syntax binding. One query language syntax MUST be expressed in XML in a way that reflects the underlying structure of the query. See XML Query Requirements." This appendix specifies XML Schemas that together define the XML representation of XQuery 1.0 and XPath 2.0 Full-Text 1.0 by representing the abstract syntax found in . Because XQuery 1.0 and XPath 2.0 Full-Text 1.0 integrates seamlessly with XQuery 1.0 (, section 4.3, Composability, states that "XQuery/XPath Full-Text MUST be composable with XQuery, and SHOULD be composable with itself."), it follows that the XML Syntax for XQuery 1.0 and XPath 2.0 Full-Text 1.0 must integrate seamlessly with the XML Syntax for XQuery 1.0.

The XML Schema specified in this appendix accomplishes seamless integration by importing the XML Schema defined for XQueryX in , incorporating all of its type and element definitions. It then extends that schema by adding definitions of new types and elements in a namespace belonging to the Full-Text specification.

XQueryX representation of XQuery 1.0 and XPath 2.0 Full-Text 1.0

The XML Schema that defines the complex types and elements for XQueryX in support of XQuery 1.0 and XPath 2.0 Full-Text 1.0, including the ftcontainsExpr, incorporates a second XML Schema that defines types and elements to support the ftmatchOption. Both XML Schemas are defined in this section.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xqx="http://www.w3.org/2005/XQueryX"
            xmlns:xqxft="http://www.w3.org/2007/xpath-full-text-10"
            targetNamespace="http://www.w3.org/2007/xpath-full-text-10"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">

  <xsd:import namespace="http://www.w3.org/2005/XQueryX"
              schemaLocation="http://www.w3.org/2005/XQueryX/xqueryx.xsd"/>
  <xsd:include schemaLocation="./XQueryX-Full-Text-ftmatchOption-extensions.xsd"/>

  <xsd:element name="ftOptionDecl" substitutionGroup="xqx:prologPartTwoItem">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="xqxft:ftMatchOption"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:complexType name="ftExpr">
    <xsd:complexContent>
      <xsd:extension base="xqx:expr"/>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="ftExpr" type="xqxft:ftExpr" abstract="true" substitutionGroup="xqx:expr"/>

  <xsd:element name="ftScoreVariableBinding" type="xqx:QName" substitutionGroup="xqx:forLetClauseItemExtensions"/>

  <xsd:complexType name="ftContainsExpr">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftExpr">
        <xsd:sequence>
          <xsd:element name="ftRangeExpr" type="xqx:exprWrapper" />
          <xsd:sequence minOccurs="0" maxOccurs="1">
            <xsd:element name="ftSelectionExpr" type="ftSelectionWrapper" />
            <xsd:element name="ftIgnoreOption" type="ftIgnoreOption" minOccurs="0" maxOccurs="1" />
          </xsd:sequence>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="ftContainsExpr" type="xqxft:ftContainsExpr" substitutionGroup="xqxft:ftExpr" />

  <xsd:complexType name="ftProximity" />
  <xsd:element name="ftProximity" type="xqxft:ftProximity" abstract="true"/>

  <xsd:simpleType name="ftUnit">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="paragraph"/>
      <xsd:enumeration value="sentence"/>
      <xsd:enumeration value="word"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:simpleType name="ftBigUnit">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="paragraph"/>
      <xsd:enumeration value="sentence"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:simpleType name="contentLocation">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="at start"/>
      <xsd:enumeration value="at end"/>
      <xsd:enumeration value="entire content"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:simpleType name="ftScopeType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="same"/>
      <xsd:enumeration value="different"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:complexType name="unaryRange">
    <xsd:sequence>
      <xsd:element name="value" type="xqx:exprWrapper" />
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="binaryRange">
    <xsd:sequence>
      <xsd:element name="lower" type="xqx:exprWrapper" />
      <xsd:element name="upper" type="xqx:exprWrapper" />
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="ftRange">
    <xsd:choice>
      <xsd:element name="atLeastRange" type="xqxft:unaryRange" />
      <xsd:element name="atMostRange" type="xqxft:unaryRange" />
      <xsd:element name="exactlyRange" type="xqxft:unaryRange" />
      <xsd:element name="fromToRange" type="xqxft:binaryRange" />
    </xsd:choice>
  </xsd:complexType>

  <xsd:complexType name="ftOrderIndicator">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftProximity">
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="ftOrderIndicator" type="ftOrderIndicator" substitutionGroup="xqxft:ftProximity"/>

  <xsd:complexType name="ftWindow">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftProximity">
        <xsd:sequence>
          <xsd:element name="value" type="xqx:exprWrapper" />
          <xsd:element name="unit" type="xqxft:ftUnit" />
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="ftWindow" type="xqxft:ftWindow" substitutionGroup="xqxft:ftProximity"/>

  <xsd:complexType name="ftDistance">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftProximity">
        <xsd:sequence>
          <xsd:element name="ftRange" type="xqxft:ftRange" />
          <xsd:element name="unit" type="xqxft:ftUnit" />
</xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="ftDistance" type="ftDistance" substitutionGroup="xqxft:ftProximity"/>  <xsd:complexType name="ftScope"> <xsd:complexContent> <xsd:extension base="xqxft:ftProximity"> <xsd:sequence> <xsd:element name="type" type="xqxft:ftScopeType" /> <xsd:element name="unit" type="xqxft:ftBigUnit" /> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="ftScope" type="xqxft:ftScope" substitutionGroup="xqxft:ftProximity"/>  <xsd:complexType name="ftContent"> <xsd:complexContent> <xsd:extension base="xqxft:ftProximity"> <xsd:sequence> <xsd:element name="location" type="xqxft:contentLocation" /> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="ftContent" type="xqxft:ftContent" substitutionGroup="xqxft:ftProximity"/>  <xsd:complexType name="ftMatchOptionOrFTProximity"> <xsd:complexContent> <xsd:extension base="xqxft:ftExpr"> <xsd:sequence minOccurs="0" maxOccurs="unbounded"> <xsd:choice> <xsd:element ref="xqxft:ftMatchOption" /> <xsd:element ref="xqxft:ftProximity" /> </xsd:choice> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType>  <xsd:complexType name="ftSelection" > <xsd:complexContent> <xsd:extension base="xqxft:ftExpr"> <xsd:sequence> <xsd:element name="ftSelectionSource" type="xqx:exprWrapper"/> <xsd:element name="optionsOrProximity" type="xqxft:ftMatchOptionOrFTProximity" minOccurs="0" maxOccurs="1" /> <xsd:element name="weight" type="xqx:exprWrapper" minOccurs="0" maxOccurs="1" /> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="ftSelection" type="xqxft:ftSelection" substitutionGroup="xqxft:ftExpr" /> <xsd:complexType name="ftSelectionWrapper"> <xsd:sequence> <xsd:element ref="xqxft:ftSelection"/> </xsd:sequence> </xsd:complexType> <xsd:complexType name="ftIgnoreOption"> <xsd:sequence> <xsd:element ref="xqx:expr"/> </xsd:sequence> 
</xsd:complexType>  <xsd:element name="ftLogicalOp" type="xqx:binaryOperatorExpr" abstract="true" substitutionGroup="xqx:operatorExpr"/> <xsd:element name="ftOr" type="xqx:binaryOperatorExpr" substitutionGroup="xqxft:ftLogicalOp"/> <xsd:element name="ftAnd" type="xqx:binaryOperatorExpr" substitutionGroup="xqxft:ftLogicalOp"/> <xsd:element name="ftMildNot" type="xqx:binaryOperatorExpr" substitutionGroup="xqxft:ftLogicalOp"/> <xsd:element name="ftLogicalNot" type="xqx:unaryOperatorExpr" abstract="true" substitutionGroup="xqx:operatorExpr"/> <xsd:element name="ftUnaryNot" type="xqx:unaryOperatorExpr" substitutionGroup="xqxft:ftLogicalNot"/>  <xsd:complexType name="ftTimes"> <xsd:sequence> <xsd:element name="ftRange" type="xqxft:ftRange"/> </xsd:sequence> </xsd:complexType> <xsd:simpleType name="ftAnyAllOption"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="any"/> <xsd:enumeration value="all"/> <xsd:enumeration value="any word"/> <xsd:enumeration value="all words"/> <xsd:enumeration value="phrase"/> </xsd:restriction> </xsd:simpleType> <xsd:complexType name="ftWordsAlternatives"> <xsd:choice> <xsd:element name="ftWordsLiteral" type="xqx:exprWrapper"/> <xsd:element name="ftWordsExpression" type="xqx:exprWrapper"/> </xsd:choice> </xsd:complexType> <xsd:complexType name="ftWords"> <xsd:sequence> <xsd:element name="ftWordsValue" type="xqxft:ftWordsAlternatives" /> <xsd:element name="ftAnyAllOption" type="xqxft:ftAnyAllOption" minOccurs="0" maxOccurs="1" /> </xsd:sequence> </xsd:complexType> <xsd:group name="ftWordsWithTimes"> <xsd:sequence> <xsd:element name="ftWords" type="xqxft:ftWords" /> <xsd:element name="ftTimes" type="xqxft:ftTimes" minOccurs="0" /> </xsd:sequence> </xsd:group> <xsd:complexType name="ftWordsSelection"> <xsd:complexContent> <xsd:extension base="xqxft:ftExpr" > <xsd:choice> <xsd:element name="parenthesized" type="xqx:exprWrapper"/> <xsd:group ref="xqxft:ftWordsWithTimes" /> </xsd:choice> </xsd:extension> </xsd:complexContent> 
</xsd:complexType> <xsd:element name="ftWordsSelection" type="xqxft:ftWordsSelection" substitutionGroup="xqxft:ftExpr"/> </xsd:schema> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xqx="http://www.w3.org/2005/XQueryX" xmlns:xqxft="http://www.w3.org/2007/xpath-full-text-10" targetNamespace="http://www.w3.org/2007/xpath-full-text-10" elementFormDefault="qualified" attributeFormDefault="unqualified">    <xsd:import namespace="http://www.w3.org/2005/XQueryX" schemaLocation="http://www.w3.org/2005/XQueryX/xqueryx.xsd"/>  <xsd:complexType name="ftMatchOption" /> <xsd:element name="ftMatchOption" type="xqxft:ftMatchOption" abstract="true" /> <xsd:complexType name="ftMatchOptions"> <xsd:sequence minOccurs="0" maxOccurs="unbounded"> <xsd:element ref="xqxft:ftMatchOption"/> </xsd:sequence> </xsd:complexType>  <xsd:complexType name="ftCaseOption"> <xsd:complexContent> <xsd:extension base="xqxft:ftMatchOption" > <xsd:sequence> <xsd:element name="value"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:enumeration value="lowercase"/> <xsd:enumeration value="uppercase"/> <xsd:enumeration value="case sensitive"/> <xsd:enumeration value="case insensitive"/> </xsd:restriction> </xsd:simpleType> </xsd:element> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="case" type="xqxft:ftCaseOption" substitutionGroup="xqxft:ftMatchOption" />  <xsd:complexType name="ftDiacriticsOption"> <xsd:complexContent> <xsd:extension base="xqxft:ftMatchOption" > <xsd:sequence> <xsd:element name="value"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:enumeration value="diacritics sensitive"/> <xsd:enumeration value="diacritics insensitive"/> </xsd:restriction> </xsd:simpleType> </xsd:element> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="diacritics" type="xqxft:ftDiacriticsOption" substitutionGroup="xqxft:ftMatchOption" />  <xsd:complexType name="ftStemOption"> 
<xsd:complexContent> <xsd:extension base="xqxft:ftMatchOption" > <xsd:sequence> <xsd:element name="value"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:enumeration value="with stemming" /> <xsd:enumeration value="without stemming" /> </xsd:restriction> </xsd:simpleType> </xsd:element> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="stem" type="xqxft:ftStemOption" substitutionGroup="xqxft:ftMatchOption" />  <xsd:complexType name="ftThesaurusID"> <xsd:sequence> <xsd:element name="at" type="xsd:string" /> <xsd:element name="relationship" type="xsd:string" minOccurs="0" /> <xsd:element name="levels" type="xqxft:ftRange" minOccurs="0" /> </xsd:sequence> </xsd:complexType> <xsd:complexType name="thesaurusSpecSequence"> <xsd:sequence> <xsd:choice> <xsd:element name="default" /> <xsd:element name="thesaurusID" type="xqxft:ftThesaurusID" /> </xsd:choice> <xsd:element name="thesaurusID" type="xqxft:ftThesaurusID" minOccurs="0" maxOccurs="unbounded" /> </xsd:sequence> </xsd:complexType> <xsd:complexType name="ftThesaurusOption"> <xsd:complexContent> <xsd:extension base="xqxft:ftMatchOption" > <xsd:choice> <xsd:element name="without" /> <xsd:element name="thesauri" type="xqxft:thesaurusSpecSequence" /> </xsd:choice> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="thesaurus" type="xqxft:ftThesaurusOption" substitutionGroup="xqxft:ftMatchOption" />  <xsd:complexType name="ftRefOrList"> <xsd:choice> <xsd:element name="ref" type="xsd:string" /> <xsd:element name="list"> <xsd:complexType> <xsd:sequence> <xsd:element ref="xqx:stringConstantExpr" minOccurs="1" maxOccurs="unbounded" /> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:choice> </xsd:complexType> <xsd:element name="ftRefOrList" type="xqxft:ftRefOrList" /> <xsd:group name="baseStopwords"> <xsd:choice> <xsd:element name="default" /> <xsd:element ref="xqxft:ftRefOrList" /> </xsd:choice> </xsd:group> <xsd:complexType 
name="ftInclExclStringLiteral"> <xsd:choice> <xsd:element name="union" type="xqxft:ftRefOrList" /> <xsd:element name="except" type="xqxft:ftRefOrList" /> </xsd:choice> </xsd:complexType> <xsd:complexType name="stopwordsSpecSequence"> <xsd:sequence> <xsd:group ref="xqxft:baseStopwords" /> <xsd:element name="ftInclExclStringLiteral" type="xqxft:ftInclExclStringLiteral" minOccurs="0" maxOccurs="unbounded" /> </xsd:sequence> </xsd:complexType> <xsd:complexType name="ftStopwordOption"> <xsd:complexContent> <xsd:extension base="xqxft:ftMatchOption" > <xsd:choice> <xsd:element name="without" /> <xsd:element name="stopwords" type="xqxft:stopwordsSpecSequence" /> </xsd:choice> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="stopword" type="xqxft:ftStopwordOption" substitutionGroup="xqxft:ftMatchOption" />  <xsd:complexType name="ftLanguageOption"> <xsd:complexContent> <xsd:extension base="xqxft:ftMatchOption" > <xsd:sequence> <xsd:element name="value" type="xsd:string" /> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="language" type="xqxft:ftLanguageOption" substitutionGroup="xqxft:ftMatchOption" />  <xsd:complexType name="ftWildCardOption"> <xsd:complexContent> <xsd:extension base="xqxft:ftMatchOption"> <xsd:sequence> <xsd:element name="value"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:enumeration value="with wildcards" /> <xsd:enumeration value="without wildcards" /> </xsd:restriction> </xsd:simpleType> </xsd:element> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <xsd:element name="wildcard" type="xqxft:ftWildCardOption" substitutionGroup="xqxft:ftMatchOption" /> </xsd:schema>

XQueryX stylesheet for XQuery 1.0 and XPath 2.0 Full-Text 1.0

The XSLT stylesheet that defines the semantics of XQueryX in support of XQuery 1.0 and XPath 2.0 Full-Text 1.0 builds on the XQueryX XSLT stylesheet by importing it. It provides additional templates that define the semantics of the XQueryX representation of XQuery 1.0 and XPath 2.0 Full-Text 1.0 by transforming that XQueryX representation into the human-readable syntax of XQuery 1.0 and XPath 2.0 Full-Text 1.0.

<?xml version='1.0'?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xqxft="http://www.w3.org/2007/xpath-full-text-10" xmlns:xqx="http://www.w3.org/2005/XQueryX">     <xsl:import href="http://www.w3.org/2005/XQueryX/xqueryx.xsl"/>  <xsl:template match="xqxft:ftOptionDecl"> <xsl:text>declare ft-option </xsl:text> <xsl:apply-templates/> </xsl:template>  <xsl:template match="xqxft:ftScoreVariableBinding"> <xsl:text> score </xsl:text> <xsl:value-of select="$DOLLAR"/> <xsl:if test="@xqx:prefix"> <xsl:value-of select="@xqx:prefix"/> <xsl:value-of select="$COLON"/> </xsl:if> <xsl:value-of select="."/> </xsl:template>  <xsl:template match="xqxft:ftContainsExpr"> <xsl:apply-templates select="xqxft:ftRangeExpr"/> <xsl:text> ftcontains </xsl:text> <xsl:apply-templates select="xqxft:ftSelectionExpr"/> <xsl:apply-templates select="xqxft:ftIgnoreOption"/> </xsl:template> <xsl:template match="xqxft:value"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:ftRangeExpr"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:ftSelectionExpr"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:ftIgnoreOption"> <xsl:text>without content </xsl:text> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:ftSelection"> <xsl:apply-templates select="xqxft:ftSelectionSource"/> <xsl:value-of select="$NEWLINE"/> <xsl:text> </xsl:text> <xsl:apply-templates select="xqxft:optionsOrProximity"/> <xsl:text> </xsl:text> <xsl:apply-templates select="xqxft:weight"/> <xsl:text> </xsl:text> </xsl:template> <xsl:template match="xqxft:ftSelectionSource"> <xsl:apply-templates/> <xsl:text> </xsl:text> </xsl:template> <xsl:template match="xqxft:optionsOrProximity"> <xsl:apply-templates/> <xsl:value-of select="$NEWLINE"/> <xsl:text> </xsl:text> </xsl:template>  <xsl:template match="xqxft:ftOrderIndicator"> <xsl:text>ordered</xsl:text> <xsl:value-of select="$NEWLINE"/> </xsl:template>  <xsl:template 
match="xqxft:ftWindow"> <xsl:text>window </xsl:text> <xsl:apply-templates select="xqxft:value"/> <xsl:text> </xsl:text> <xsl:value-of select="xqxft:unit"/> <xsl:value-of select="$NEWLINE"/> </xsl:template>  <xsl:template match="xqxft:ftDistance"> <xsl:text>distance </xsl:text> <xsl:apply-templates select="xqxft:ftRange"/> <xsl:text> </xsl:text> <xsl:value-of select="xqxft:unit"/> <xsl:value-of select="$NEWLINE"/> </xsl:template>  <xsl:template match="xqxft:ftScope"> <xsl:value-of select="xqxft:type"/> <xsl:text> </xsl:text> <xsl:value-of select="xqxft:unit"/> <xsl:value-of select="$NEWLINE"/> </xsl:template>  <xsl:template match="xqxft:ftContent"> <xsl:value-of select="xqxft:location"/> <xsl:value-of select="$NEWLINE"/> </xsl:template> <xsl:template match="xqxft:exactlyRange"> <xsl:text>exactly </xsl:text> <xsl:apply-templates select="xqxft:value"/> </xsl:template> <xsl:template match="xqxft:atLeastRange"> <xsl:text>at least </xsl:text> <xsl:apply-templates select="xqxft:value"/> </xsl:template> <xsl:template match="xqxft:atMostRange"> <xsl:text>at most </xsl:text> <xsl:apply-templates select="xqxft:value"/> </xsl:template> <xsl:template match="xqxft:fromToRange"> <xsl:text>from </xsl:text> <xsl:apply-templates select="xqxft:lower"/> <xsl:text> to </xsl:text> <xsl:apply-templates select="xqxft:upper"/> <xsl:text> </xsl:text> </xsl:template> <xsl:template match="xqxft:lower"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:upper"> <xsl:apply-templates/> </xsl:template>  <xsl:template match="xqxft:case"> <xsl:value-of select="xqxft:value"/> <xsl:value-of select="$NEWLINE"/> </xsl:template>  <xsl:template match="xqxft:diacritics"> <xsl:value-of select="xqxft:value"/> <xsl:value-of select="$NEWLINE"/> </xsl:template>  <xsl:template match="xqxft:stem"> <xsl:value-of select="xqxft:value"/> <xsl:value-of select="$NEWLINE"/> </xsl:template>  <xsl:template match="xqxft:thesaurus"> <xsl:choose> <xsl:when test="without"> <xsl:text>without thesaurus 
</xsl:text> </xsl:when> <xsl:otherwise> <xsl:apply-templates/> </xsl:otherwise> </xsl:choose> <xsl:value-of select="$NEWLINE"/> </xsl:template> <xsl:template match="xqxft:thesauri"> <xsl:text>with thesaurus </xsl:text> <xsl:choose> <xsl:when test="child::*[2]"> <xsl:call-template name="parenthesizedList"/> </xsl:when> <xsl:otherwise> <xsl:apply-templates/> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template match="xqxft:default"> <xsl:text>default </xsl:text> </xsl:template> <xsl:template match="xqxft:thesaurusID"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:at"> <xsl:text>at "</xsl:text> <xsl:value-of select="."/> <xsl:text>" </xsl:text> </xsl:template> <xsl:template match="xqxft:relationship"> <xsl:text>relationship "</xsl:text> <xsl:value-of select="."/> <xsl:text>" </xsl:text> </xsl:template> <xsl:template match="xqxft:levels"> <xsl:apply-templates/> <xsl:text> levels </xsl:text> </xsl:template>  <xsl:template match="xqxft:stopword"> <xsl:choose> <xsl:when test="without"> <xsl:text>without stop words </xsl:text> </xsl:when> <xsl:otherwise> <xsl:apply-templates/> </xsl:otherwise> </xsl:choose> <xsl:value-of select="$NEWLINE"/> </xsl:template> <xsl:template match="xqxft:stopwords"> <xsl:text>with </xsl:text> <xsl:choose> <xsl:when test="default"> <xsl:text>default stop words </xsl:text> </xsl:when> <xsl:otherwise> <xsl:text>stop words </xsl:text> </xsl:otherwise> </xsl:choose> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:refOrList"> <xsl:choose> <xsl:when test="ref"> <xsl:text>at "</xsl:text> <xsl:value-of select="ref"/> <xsl:text>" </xsl:text> </xsl:when> <xsl:otherwise> <xsl:apply-templates/> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template match="xqxft:list"> <xsl:call-template name="parenthesizedList"/> <xsl:text> </xsl:text> </xsl:template> <xsl:template match="xqxft:ftInclExclStringLiteral"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:union"> <xsl:text>union </xsl:text> 
<xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:except"> <xsl:text>except </xsl:text> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:language"> <xsl:apply-templates/> <xsl:value-of select="$NEWLINE"/> </xsl:template> <xsl:template match="xqxft:wildcard"> <xsl:apply-templates/> <xsl:value-of select="$NEWLINE"/> </xsl:template> <xsl:template match="xqxft:ftAnd"> <xsl:apply-templates select="xqx:firstOperand"/> <xsl:text>ftand </xsl:text> <xsl:apply-templates select="xqx:secondOperand"/> <xsl:text> </xsl:text> </xsl:template> <xsl:template match="xqxft:ftOr"> <xsl:apply-templates select="xqx:firstOperand"/> <xsl:text>ftor </xsl:text> <xsl:apply-templates select="xqx:secondOperand"/> <xsl:text> </xsl:text> </xsl:template> <xsl:template match="xqxft:ftMildNot"> <xsl:apply-templates select="xqx:firstOperand"/> <xsl:text>not in </xsl:text> <xsl:apply-templates select="xqx:secondOperand"/> <xsl:text> </xsl:text> </xsl:template> <xsl:template match="xqxft:ftUnaryNot"> <xsl:text>ftnot </xsl:text> <xsl:apply-templates select="xqx:operand"/> <xsl:text> </xsl:text> </xsl:template> <xsl:template match="xqxft:ftWordsSelection"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:parenthesized"> <xsl:text>( </xsl:text> <xsl:apply-templates/> <xsl:text> ) </xsl:text> </xsl:template> <xsl:template match="xqxft:ftWords"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:ftWordsValue"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:ftWordsLiteral"> <xsl:apply-templates/> </xsl:template> <xsl:template match="xqxft:ftWordsExpression"> <xsl:text> { </xsl:text> <xsl:apply-templates/> <xsl:text> } </xsl:text> </xsl:template> <xsl:template match="xqxft:ftAnyAllOption"> <xsl:value-of select="."/> </xsl:template> <xsl:template match="xqxft:ftTimes"> <xsl:text>occurs </xsl:text> <xsl:apply-templates/> <xsl:text>times </xsl:text> </xsl:template> </xsl:stylesheet>

XQueryX for XQuery 1.0 and XPath 2.0 Full-Text 1.0 example

The following examples are based on the data and queries in the XQuery 1.0 and XPath 2.0 Full-Text 1.0 Use Cases. For each example, we show the English description of the query, the XQuery Full-Text solution given in the Use Cases, an XQueryX solution, and the XQuery Full-Text expression that results from applying the Full-Text XQueryX-to-XQuery Full-Text transformation, defined by the stylesheet given in this document, to the Full-Text XQueryX solution. That produced XQuery Full-Text expression is presented only as a sanity check; the intent of the stylesheet is not to create the identical XQuery Full-Text expression given in the Use Cases, but to produce a valid XQuery Full-Text expression with the same semantics. The semantics of the Full-Text XQueryX solution are determined by the semantics of the XQuery Full-Text expression that results from that transformation. The "correctness" of that transformation is determined by asking the following question: Can some Full-Text XQueryX processor QX process some Full-Text XQueryX document D1 to produce results R1, after which the stylesheet is used to translate D1 into an XQuery Full-Text expression E1 that, when processed by some XQuery Full-Text processor Q, produces results R2 that are equivalent (under some meaningful definition of "equivalent") to results R1?
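The correctness question can be read as a small procedure. The following Python sketch is illustrative only: the processors QX and Q and the stylesheet are passed in as callables, the toy stand-ins at the end are hypothetical, and the order-insensitive comparison is just one assumed reading of "equivalent", which this document leaves open.

```python
from typing import Callable, List


def order_insensitive_equal(r1: List[str], r2: List[str]) -> bool:
    # Assumption: "equivalent" means the same items regardless of order.
    return sorted(map(str, r1)) == sorted(map(str, r2))


def transformation_is_correct(
    d1: str,
    process_xqueryx: Callable[[str], List[str]],   # stands in for processor QX
    apply_stylesheet: Callable[[str], str],        # stands in for the stylesheet
    process_xquery: Callable[[str], List[str]],    # stands in for processor Q
    equivalent: Callable[[List[str], List[str]], bool] = order_insensitive_equal,
) -> bool:
    r1 = process_xqueryx(d1)      # QX processes D1 to produce R1
    e1 = apply_stylesheet(d1)     # the stylesheet translates D1 into E1
    r2 = process_xquery(e1)       # Q processes E1 to produce R2
    return equivalent(r1, r2)     # are R1 and R2 equivalent?


# Hypothetical toy stand-ins, used only to exercise the check.
def toy_qx(d1: str) -> List[str]:
    return ["part-2", "part-1"]


def toy_stylesheet(d1: str) -> str:
    return "E1 derived from " + d1


def toy_q(e1: str) -> List[str]:
    return ["part-1", "part-2"]


ok = transformation_is_correct("D1", toy_qx, toy_stylesheet, toy_q)
print(ok)
```

A stricter or looser `equivalent` function can be supplied as the last argument without changing the rest of the check.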

Comparison of the results of the Full-Text XQueryX-to-XQuery Full-Text transformation given in this document with the XQuery Full-Text solutions in the Use Cases may be helpful in evaluating the correctness of the Full-Text XQueryX solution in each example.

The XQuery Full-Text Use Cases solution given for each example is provided only to assist readers of this document in understanding the Full-Text XQueryX solution. There is no intent to imply that this document specifies a "compilation" or "transformation" of XQuery Full-Text syntax into Full-Text XQueryX syntax.

In the following examples, note that path expressions are expanded to show their structure. Also, note that the prefix syntax for binary operators like "and" makes the precedence explicit. In general, humans find it easier to read an XML representation that does not expand path expressions, but it is less convenient for programmatic representation and manipulation. XQueryX is designed as a language that is convenient for production and modification by software, and not as a convenient syntax for humans to read and write.
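As a rough illustration of how the prefix form makes precedence explicit, the Python sketch below walks a small operator tree and prints it as fully parenthesized infix text. It is not part of the stylesheet. The `firstOperand`, `secondOperand`, `varRef`, and `name` elements appear in the example in this document; the `andOp` and `orOp` element names and the variables `a`, `b`, and `c` are assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

XQX = "http://www.w3.org/2005/XQueryX"

# A tiny prefix-form tree for: $a and ($b or $c)
DOC = f"""
<xqx:andOp xmlns:xqx="{XQX}">
  <xqx:firstOperand>
    <xqx:varRef><xqx:name>a</xqx:name></xqx:varRef>
  </xqx:firstOperand>
  <xqx:secondOperand>
    <xqx:orOp>
      <xqx:firstOperand>
        <xqx:varRef><xqx:name>b</xqx:name></xqx:varRef>
      </xqx:firstOperand>
      <xqx:secondOperand>
        <xqx:varRef><xqx:name>c</xqx:name></xqx:varRef>
      </xqx:secondOperand>
    </xqx:orOp>
  </xqx:secondOperand>
</xqx:andOp>
"""

# Assumed operator element names mapped to their infix keywords.
OPERATORS = {"andOp": "and", "orOp": "or"}


def local_name(elem: ET.Element) -> str:
    # ElementTree stores tags as "{namespace}local"; keep the local part.
    return elem.tag.split("}", 1)[1]


def to_infix(elem: ET.Element) -> str:
    name = local_name(elem)
    if name == "varRef":
        return "$" + elem.find(f"{{{XQX}}}name").text
    if name in OPERATORS:
        # Each operand wrapper holds exactly one expression element.
        first = elem.find(f"{{{XQX}}}firstOperand")[0]
        second = elem.find(f"{{{XQX}}}secondOperand")[0]
        return f"({to_infix(first)} {OPERATORS[name]} {to_infix(second)})"
    raise ValueError(f"unsupported element: {name}")


infix = to_infix(ET.fromstring(DOC))
print(infix)  # ($a and ($b or $c))
```

Because the nesting of elements already fixes the grouping, the serializer never has to consult a precedence table; it only adds parentheses around every operator it visits.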

Finally, please note that white space, including new lines, has been added to some of the Full-Text XQueryX documents and XQuery Full-Text expressions for readability. That additional white space is not produced by the Full-Text XQueryX-to-XQuery Full-Text transformation.

Example

Here is Q4 from the Use Cases, use case SCORE: "All Queries May Be Written with Score, Queries in this Section Must Be Written with Score"

XQuery solution in XQuery 1.0 and XPath 2.0 Full-Text 1.0 Use Cases: declare function local:filter ( $nodes as node()*, $exclude as element()* ) as node()* { for $node in $nodes except $exclude return typeswitch ($node) case $e as element() return element {node-name($e)} { $e/@*, filter( $e/node() except $exclude, $exclude ) } default return $node }; for $book in doc("http://bstore1.example.com/full-text.xml") /books/book let $irrelevantParts := for $part in $book//part let score $score := $part ftcontains "usability test.*" with wildcards where $score < 0.5 return $part where count($irrelevantParts) < count($book//part) return filter($book, $irrelevantParts) A Solution in Full-Text XQueryX: <?xml version="1.0"?> <xqx:module xmlns:xqxft="http://www.w3.org/2007/xpath-full-text-10" xmlns:xqx="http://www.w3.org/2005/XQueryX" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2007/xpath-full-text-10 http://www.w3.org/2007/xpath-full-text-10/XQueryX-Full-Text-extensions.xsd http://www.w3.org/2005/XQueryX http://www.w3.org/2005/XQueryX/xqueryx.xsd"> <xqx:mainModule> <xqx:prolog> <xqx:functionDecl> <xqx:functionName xqx:prefix="local">filter</xqx:functionName> <xqx:paramList> <xqx:param> <xqx:varName>nodes</xqx:varName> <xqx:typeDeclaration> <xqx:anyKindTest/><xqx:occurrenceIndicator>*</xqx:occurrenceIndicator> </xqx:typeDeclaration> </xqx:param> <xqx:param> <xqx:varName>exclude</xqx:varName> <xqx:typeDeclaration> <xqx:elementTest/><xqx:occurrenceIndicator>*</xqx:occurrenceIndicator> </xqx:typeDeclaration> </xqx:param> </xqx:paramList> <xqx:typeDeclaration> <xqx:anyKindTest/> </xqx:typeDeclaration> <xqx:functionBody> <xqx:flworExpr> <xqx:forClause> <xqx:forClauseItem> <xqx:typedVariableBinding> <xqx:varName>node</xqx:varName> </xqx:typedVariableBinding> <xqx:forExpr> <xqx:exceptOp> <xqx:firstOperand> <xqx:varRef> <xqx:name>nodes</xqx:name> </xqx:varRef> </xqx:firstOperand> <xqx:secondOperand> <xqx:varRef> 
<xqx:name>exclude</xqx:name> </xqx:varRef> </xqx:secondOperand> </xqx:exceptOp> </xqx:forExpr> </xqx:forClauseItem> </xqx:forClause> <xqx:returnClause> <xqx:typeswitchExpr> <xqx:argExpr> <xqx:varRef> <xqx:name>node</xqx:name> </xqx:varRef> </xqx:argExpr> <xqx:typeswitchExprCaseClause> <xqx:variableBinding>e</xqx:variableBinding> <xqx:sequenceType> <xqx:elementTest/> </xqx:sequenceType> <xqx:resultExpr> <xqx:computedElementConstructor> <xqx:tagNameExpr> <xqx:functionCallExpr> <xqx:functionName xqx:prefix="fn">node-name</xqx:functionName> <xqx:arguments> <xqx:varRef> <xqx:name>e</xqx:name> </xqx:varRef> </xqx:arguments> </xqx:functionCallExpr> </xqx:tagNameExpr> <xqx:contentExpr> <xqx:sequenceExpr> <xqx:pathExpr> <xqx:stepExpr> <xqx:filterExpr> <xqx:varRef> <xqx:name>e</xqx:name> </xqx:varRef> </xqx:filterExpr> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:attributeTest> <xqx:attributeName> <xqx:star/> </xqx:attributeName> </xqx:attributeTest> </xqx:stepExpr> </xqx:pathExpr> <xqx:functionCallExpr> <xqx:functionName xqx:prefix="fn">filter</xqx:functionName> <xqx:arguments> <xqx:exceptOp> <xqx:firstOperand> <xqx:pathExpr> <xqx:stepExpr> <xqx:filterExpr> <xqx:varRef> <xqx:name>e</xqx:name> </xqx:varRef> </xqx:filterExpr> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:anyKindTest/> </xqx:stepExpr> </xqx:pathExpr> </xqx:firstOperand> <xqx:secondOperand> <xqx:varRef> <xqx:name>exclude</xqx:name> </xqx:varRef> </xqx:secondOperand> </xqx:exceptOp> <xqx:varRef> <xqx:name>exclude</xqx:name> </xqx:varRef> </xqx:arguments> </xqx:functionCallExpr> </xqx:sequenceExpr> </xqx:contentExpr> </xqx:computedElementConstructor> </xqx:resultExpr> </xqx:typeswitchExprCaseClause> <xqx:typeswitchExprDefaultClause> <xqx:resultExpr> <xqx:varRef> <xqx:name>node</xqx:name> </xqx:varRef> </xqx:resultExpr> </xqx:typeswitchExprDefaultClause> </xqx:typeswitchExpr> </xqx:returnClause> </xqx:flworExpr> </xqx:functionBody> </xqx:functionDecl> 
</xqx:prolog> <xqx:queryBody> <xqx:flworExpr> <xqx:forClause> <xqx:forClauseItem> <xqx:typedVariableBinding> <xqx:varName>book</xqx:varName> </xqx:typedVariableBinding> <xqx:forExpr> <xqx:pathExpr> <xqx:stepExpr> <xqx:filterExpr> <xqx:functionCallExpr> <xqx:functionName xqx:prefix="fn">doc</xqx:functionName> <xqx:arguments> <xqx:stringConstantExpr> <xqx:value>http://bstore1.example.com/full-text.xml</xqx:value> </xqx:stringConstantExpr> </xqx:arguments> </xqx:functionCallExpr> </xqx:filterExpr> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:nameTest>books</xqx:nameTest> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:nameTest>book</xqx:nameTest> </xqx:stepExpr> </xqx:pathExpr> </xqx:forExpr> </xqx:forClauseItem> </xqx:forClause> <xqx:letClause> <xqx:letClauseItem> <xqx:typedVariableBinding> <xqx:varName>irrelevantParts</xqx:varName> </xqx:typedVariableBinding> <xqx:letExpr> <xqx:flworExpr> <xqx:forClause> <xqx:forClauseItem> <xqx:typedVariableBinding> <xqx:varName>part</xqx:varName> </xqx:typedVariableBinding> <xqx:forExpr> <xqx:pathExpr> <xqx:stepExpr> <xqx:filterExpr> <xqx:varRef> <xqx:name>book</xqx:name> </xqx:varRef> </xqx:filterExpr> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>descendant-or-self</xqx:xpathAxis> <xqx:nameTest>part</xqx:nameTest> </xqx:stepExpr> </xqx:pathExpr> </xqx:forExpr> </xqx:forClauseItem> </xqx:forClause> <xqx:letClause> <xqx:letClauseItem> <xqxft:ftScoreVariableBinding>score</xqxft:ftScoreVariableBinding> <xqx:letExpr> <xqxft:ftContainsExpr> <xqxft:ftRangeExpr> <xqx:varRef> <xqx:name>part</xqx:name> </xqx:varRef> </xqxft:ftRangeExpr> <xqxft:ftSelectionExpr> <xqxft:ftSelection> <xqxft:ftSelectionSource> <xqx:stringConstantExpr> <xqx:value>usability test.*</xqx:value> </xqx:stringConstantExpr> </xqxft:ftSelectionSource> <xqxft:optionsOrProximity> <xqxft:wildcard> <xqxft:value>with wildcards</xqxft:value> </xqxft:wildcard> </xqxft:optionsOrProximity> </xqxft:ftSelection> 
</xqxft:ftSelectionExpr> </xqxft:ftContainsExpr> </xqx:letExpr> </xqx:letClauseItem> </xqx:letClause> <xqx:whereClause> <xqx:lessThanOp> <xqx:firstOperand> <xqx:varRef> <xqx:name>score</xqx:name> </xqx:varRef> </xqx:firstOperand> <xqx:secondOperand> <xqx:decimalConstantExpr> <xqx:value>0.5</xqx:value> </xqx:decimalConstantExpr> </xqx:secondOperand> </xqx:lessThanOp> </xqx:whereClause> <xqx:returnClause> <xqx:varRef> <xqx:name>part</xqx:name> </xqx:varRef> </xqx:returnClause> </xqx:flworExpr> </xqx:letExpr> </xqx:letClauseItem> </xqx:letClause> <xqx:whereClause> <xqx:lessThanOp> <xqx:firstOperand> <xqx:functionCallExpr> <xqx:functionName xqx:prefix="fn">count</xqx:functionName> <xqx:arguments> <xqx:varRef> <xqx:name>irrelevantParts</xqx:name> </xqx:varRef> </xqx:arguments> </xqx:functionCallExpr> </xqx:firstOperand> <xqx:secondOperand> <xqx:functionCallExpr> <xqx:functionName xqx:prefix="fn">count</xqx:functionName> <xqx:arguments> <xqx:pathExpr> <xqx:stepExpr> <xqx:filterExpr> <xqx:varRef> <xqx:name>book</xqx:name> </xqx:varRef> </xqx:filterExpr> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>descendant-or-self</xqx:xpathAxis> <xqx:nameTest>part</xqx:nameTest> </xqx:stepExpr> </xqx:pathExpr> </xqx:arguments> </xqx:functionCallExpr> </xqx:secondOperand> </xqx:lessThanOp> </xqx:whereClause> <xqx:returnClause> <xqx:functionCallExpr> <xqx:functionName xqx:prefix="local">filter</xqx:functionName> <xqx:arguments> <xqx:varRef> <xqx:name>book</xqx:name> </xqx:varRef> <xqx:varRef> <xqx:name>irrelevantParts</xqx:name> </xqx:varRef> </xqx:arguments> </xqx:functionCallExpr> </xqx:returnClause> </xqx:flworExpr> </xqx:queryBody> </xqx:mainModule> </xqx:module> Transformation of Full-Text XQueryX Solution into XQuery Full-Text

Applying the stylesheet given in this document to the Full-Text XQueryX solution results in:

declare function local:filter($nodes as node()*, $exclude as element()*) as node()
{
  ( for $node in ($nodes except $exclude)
    return ( typeswitch($node)
             case $e as element() return element {fn:node-name($e)} {( $e/child::attribute(*), fn:filter( ($e/child::node() except $exclude), $exclude ) )}
             default return $node ) )
};
( for $book in fn:doc("http://bstore1.example.com/full-text.xml")/child::books/child::book
  let $irrelevantParts:= ( for $part in $book/descendant-or-self::part
                           let score $score:=$part ftcontains "usability test.*" with wildcards
                           where ($score < 0.5)
                           return $part )
  where (fn:count($irrelevantParts) < fn:count($book/descendant-or-self::part))
  return local:filter($book, $irrelevantParts) )

References

Normative References

XQuery 1.0 and XPath 2.0 Full-Text Requirements, Stephen Buxton, Michael Rys, Editors. World Wide Web Consortium, 18 May 2007. This version is http://www.w3.org/TR/2007/WD-xpath-full-text-10-requirements-20070518/. The latest version is available at http://www.w3.org/TR/xpath-full-text-10-requirements/.

A. Phillips and M. Davis. Tags for Identifying Languages. IETF BCP 47. See http://tools.ietf.org/html/bcp47. This reference leads to and and replaces .

S. Bradner. Key Words for use in RFCs to Indicate Requirement Levels. IETF RFC 2119. See http://www.ietf.org/rfc/rfc2119.txt.

H. Alvestrand. Tags for the Identification of Languages. IETF RFC 3066. See http://www.ietf.org/rfc/rfc3066.txt.

A. Phillips and M. Davis. Tags for Identifying Languages. IETF RFC 4646. See http://www.ietf.org/rfc/rfc4646.txt.

A. Phillips and M. Davis. Matching of Language Tags. IETF RFC 4647. See http://www.ietf.org/rfc/rfc4647.txt.

Non-normative References

Documentation Guidelines for the Establishment and Development of Monolingual Thesauri, Geneva: International Organization for Standardization, 2nd edition, 1986.

ISO/IEC 13249-2 Information technology --- Database languages --- SQL Multimedia and Application Packages --- Part 2: Full-Text. Geneva: International Organization for Standardization, 2nd edition, 2003.

Acknowledgements

We would like to thank the members of the XQuery and XPath Full-Text group for their fruitful discussions.

We would like to thank the following people for their contributions to earlier drafts of this document.

"Andrew Eisenberg" - IBM - andrew.eisenberg@us.ibm.com

"Roland Seiffert" - IBM - seiffert@de.ibm.com

"Andrew Cencini" - Microsoft - acencini@microsoft.com

"Nimish Khanolkar" - Microsoft - nimishk@exchange.microsoft.com

"Ashok Malhotra" - Oracle - ashok.malhotra@oracle.com

"Tapas Nayak" - Microsoft - tapasnay@exchange.microsoft.com

Glossary

Checklist of Implementation-Defined Features

This appendix provides a summary of features defined in this specification whose effect is explicitly implementation-defined. The conformance rules require vendors to provide documentation that explains how these choices have been exercised.

Each word MUST consist of one or more consecutive characters;

The tokenizer MUST preserve the containment hierarchy (paragraphs contain sentences contain words); and

The tokenizer MUST, when tokenizing two equal strings, identify the same tokens in each.

Implementations are free to provide implementation-defined ways to differentiate between markup's effect on token boundaries during tokenization.

The set of expressions (of form ExprSingle) that can be assigned to a score variable in a let-clause is implementation-defined.

The match option application order, subject to the stated constraints, is implementation-defined.

It is implementation-defined what a stem of a word is and whether stemming will be based on an algorithm, a dictionary, or a mixed approach.

It is implementation-defined which thesaurus relationships an implementation supports.

The behavior of the implementation when it encounters a combination of thesauri, levels, and relationships that it does not support is implementation-defined.

When the option "with default stop words" is used, an implementation-defined collection of stop words is used.

When a stop word is specified in a query, then the number of tokens in the text that are matched by that stop word is implementation-defined.

The set of valid language identifiers is implementation-defined.

The behavior of the implementation when it encounters a language identifier it does not support is implementation-defined.

Certain values in the static context (see ) that can be overwritten or augmented by implementations are implementation-defined.

Which namespace URIs will be recognized for denoting extension selection pragmas is implementation-defined, as is the syntax and behavior of recognized pragmas.

Which namespace URIs will be recognized for denoting extension options is implementation-defined, as is the syntax and behavior of recognized options.
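
The three tokenizer requirements listed in this checklist (each word has one or more consecutive characters, the paragraph/sentence/word containment hierarchy is preserved, and equal strings yield the same tokens) can be illustrated with a minimal sketch. The boundary rules used here (blank lines separate paragraphs, the characters ".", "!" and "?" end sentences, words are runs of word characters) are assumptions chosen for illustration only; this specification leaves them implementation-defined. The names Token and tokenize are hypothetical.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str       # token text: one or more consecutive characters
    position: int   # 1-based word position within the whole string
    sentence: int   # 1-based number of the sentence containing the word
    paragraph: int  # 1-based number of the paragraph containing the sentence


def tokenize(text: str) -> List[Token]:
    """Illustrative tokenizer; the boundary rules below are assumptions."""
    tokens: List[Token] = []
    position = 0
    sentence_no = 0
    paragraph_no = 0
    # Assumption: a blank line separates paragraphs.
    for paragraph in re.split(r"\n\s*\n", text):
        if not paragraph.strip():
            continue
        paragraph_no += 1
        # Assumption: whitespace after ".", "!" or "?" separates sentences.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            # Assumption: a word is a maximal run of word characters,
            # so every word has at least one character.
            words = re.findall(r"\w+", sentence)
            if not words:
                continue
            sentence_no += 1
            for word in words:
                position += 1
                tokens.append(Token(word, position, sentence_no, paragraph_no))
    return tokens


if __name__ == "__main__":
    sample = "Usability testing matters. Test early!\n\nExperts agree."
    for token in tokenize(sample):
        print(token)
```

Because the function is pure and depends only on its input, tokenizing two equal strings identifies the same tokens in each. Sentence numbers are assigned inside the paragraph loop, so a sentence never spans two paragraphs.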

Change Log

Sihem Amer-Yahia	2005-04-08	Updated case matrix	Updated case matrix row "sensitive", column "CCI" from "case-insensitive variant of CCI if it exists, else error" to "case-sensitive variant of CCI if it exists, else error".
Sihem Amer-Yahia	2005-05-02	Closed issues with no changes	Closed Cluster B, Issue 28 IGNORE Syntax with no change to the document. Closed Cluster B, Issue 50 IGNORE Queries with no change to the document.
Sihem Amer-Yahia	2005-05-02	Updated FTTimes syntax	Closed Cluster G, Issue 14 FTTimesSelection and added a related bullet item in Section 3.
Sihem Amer-Yahia	2005-05-02	Updated FTWildCard syntax	Updated FTWildCardOption in Section 3.
Sihem Amer-Yahia	2005-05-03	Updated introduction	Replaced "semantic element" with "semantic markup" and "tag" with "element" in the introduction.
Sihem Amer-Yahia	2005-05-03	Added issue on error codes	Added Cluster J, Issue 59 Error Codes.
Sihem Amer-Yahia	2005-05-03	Closed issues with no change	Closed Cluster A, Issue 54 Weight Granularity in Scoring with same resolution as for Cluster A, Issue 5 Score Weighting, no further change to document. Closed Cluster H, Issue 9 Window with no change to the document. Closed Cluster H, Issue 19 FTScopeSelection on structure with no change to the document. Closed Cluster E, Issue 25 MatchOption Syntax with no change to the document. Closed Cluster H, Issue 44 FTContains Semantics with no change to the document.
Sihem Amer-Yahia	2005-05-03	Updated FTContent syntax	Updated FTContent adding "entire content", Closed Cluster C, Issue 39 Exact Element Content.
Sihem Amer-Yahia	2005-05-03	Closed issue on Boolean Naming	Closed Cluster F, Issue 38 Boolean Naming. Changes to the document are pending awaiting a decision on whether it is OK to use "and", "or", "not" for full-text. If so change existing symbols to "and", "or", "not". If not change existing symbols to "ftand", "ftor", "ftnot".
Chavdar Botev	2005-05-03	Updated FTDistance semantics	Updated the semantics for distance.
Sihem Amer-Yahia	2005-05-03	Updated FTRange syntax	Made "exactly" required before an exact number in FTRange. Closed Cluster F, Issue 43 Exactly in FTRangeSpec.
Sihem Amer-Yahia	2005-05-04	Closed issue on collations	Closed Cluster D, Issue 57 Collations Match Option.
Jochen Doerre	2005-05-19	Added issue on scoring	Added Cluster A, Issue 60 Extended Scoring.
Chavdar Botev	2005-06-29	Added issue on FTNegation	Added Cluster G, Issue 62 Precise semantics of double negation.
Chavdar Botev	2005-06-29	Added issue on FTTimes	Added Cluster G, Issue 61 Desired semantics of FTTimes.
Sihem Amer-Yahia	2005-07-11	Updated FTMildNegation syntax	Updated the mild not syntax from "mild not" to "not in". Closed Cluster I, Issue 10 MildNot and Cluster F, Issue 41 Mildnot Naming.
Chavdar Botev	2005-07-12	Updated FTIgnore semantics	Changed semantics of FTIgnoreOption.
Sihem Amer-Yahia	2005-07-18	Corrected error codes	Corrected and added error codes, closing and implementing the resolution for Cluster J Issue 59 Error Codes.
Sihem Amer-Yahia	2005-07-18	Closed issues with no changes	Closed Cluster I, Issue 13 "loose-grammar" leaving the grammar as it is. Closed Cluster D, Issue 53 "matchoptions-default" with no change to the document. Closed Cluster H, Issue 58 "ft-about-operator" with no change to the document.
Sihem Amer-Yahia	2005-07-21	Updated score syntax	Closed Cluster A, Issue 60 "new-scoring-proposal" and Issue 2 "scoring-values" and updated Section 2.2 Score Clause to reflect new score syntaxes. There are now syntaxes for scored queries 1) returning the same results as queries with Boolean predicates and 2) for returning more or fewer results.
Sihem Amer-Yahia	2005-07-21	Added appendix for defaults	Added appendix for defaults in the query prolog analogous to C.1 in the XQuery language document.
Sihem Amer-Yahia	2005-07-21	Updated FTThesaurus section	Aligned description in Section 3.2.4 FTThesaurusOption with current grammar.
Sihem Amer-Yahia	2005-07-21	Opened and closed issue on nested FTNegation	Opened and closed Cluster I, Issue 65 Nested FTNegations on the right side of an FTMildNegation.
Chavdar Botev	2005-07-25	Updated FTMildNegation semantics	Changed the semantics of MildNot.
Sihem Amer-Yahia	2005-08-10	Added Change Log	Added Change Log harvesting back entries from CVS change log.
Jochen Doerre	2005-08-17	Grammar changes	Changed XQuery/XPath grammar for new scoring syntax (resolution of Issue 60), for match option defaults in query prolog (resolution of Issue 45), for simplified window operator (resolution to Issue 51), renamed "mild not" to "not in" (resolution of Issue 41), modified FTThesaurusOption, FTStopwordOption and FTLanguageOption to require StringLiterals as decided in May 05 F2F.
Jochen Doerre	2005-08-17	Changes to Section 2	New scoring syntax introduced; most of 2.2 rewritten. Corrected use of weights in 2.2.1 (wrong default, wrong use of 1.5).
Jochen Doerre	2005-08-17	Changes to Section 3	Adapted the explanations to the changed syntax for FTWindow, FTThesaurusOption, FTStopwordOption and FTLanguageOption. Also corrected a couple of example explanations. Removed FTIgnoreOption from the list of match option defaults in 3.2. Corrected explanation and example of FTLanguageOption (neither diacritics nor case is language-specific!). Commented out last two examples of FTDistance, because distance 15 does not work for phrases.
Jochen Doerre	2005-08-17	Appendices A+B	Adapted introductory comment about which version of the XQuery/XPath grammars we are aligned to.
Jochen Doerre	2005-08-17	Dates in Header	Adapted current date and previous date and links in full-text-query-language-semantics.xml and in tqheader.xml.
Jochen Doerre	2005-08-19	Added Section 2.3, Changes in 3+4	Added Section 2.3 Extension to Static Context. Changed Sections 3.2 and 4.4.1.1 to refer to match option settings in the static context.
Jochen Doerre	2005-08-19	Added Issue 63	Added Cluster G Issue 63: Distance constraints do not work on phrases.
Jochen Doerre	2005-08-19	Changes in Section 4	Adapted semantics to new scoring feature (resolution of Issue 60), changed FTWindow semantics according to resolution of Issue 51, and cleaned examples.
Jochen Doerre	2005-08-19	Appendix G	Added lines for statically known thesauri and stop lists.
Jochen Doerre	2005-08-25	Added Issue 64	Added Cluster E Issue 64: System Relative Operator Defaults (using wording proposed by Pat Case).
Jochen Doerre	2005-10-10	Changes in Section 3	Rephrased Section 3.2.7 FTIgnoreOption. Explanation and example adapted to simple (non-recursive) use of "ignore".
Jochen Doerre	2005-10-10	Changes in Section 4	Incorporated Section 4.3.1.4 Match and AllMatches Normal Form.
Sihem Amer-Yahia	2005-10-12	Incorporated comments	Incorporated Pat's comments at http://lists.w3.org/Archives/Member/member-query-fttf/2005Sep/0068.html
Jim Melton	2005-10-20	Changes in Sections 3 and 4	Properly marked up errors and inserted error summary appendix. Re-ordered appendices so normative appendices precede non-normative appendices.
Jochen Doerre	2005-10-24	Final edits	Included corrections to examples in Section 3. Changed meaning of distance 0 for sentences (paragraphs) to mean adjacent. Rework of Appendix H Checklist of Implementation-Defined Features. Resolution texts to issues 45, 59, and 62.
Jochen Doerre	2005-11-28	Restrict FTTimes to FTWords	Modified EBNF syntax to allow the FTTimes operation to be applicable only to simple FTWords.
Jochen Doerre	2005-11-28	Re: Bug 2299: Changes to Section 4	The AllMatches model has been changed to allow the TokenInfo of a StringMatch to represent an interval of token positions, instead of single positions. Thus, a phrase is now modeled using a single StringMatch, and consequently distance constraints (which always apply to the individual StringMatches) can be used to constrain the entire phrase. In addition, this change allows overlapping tokens to be modeled. The semantics functions for FTOrder (order now constrains the start positions of tokens), for FTScope, for FTDistance (a distance constraint requires a certain number of positions between the end of one token and the start of the next) and for FTWindow have been adapted.
Jochen Doerre	2006-01-09	Issues List removed	Dropped Appendix I "Issues List", as issues are tracked in Bugzilla now.
Mary Holstege	2006-02-01	Static context	Added known languages to static context.
Jochen Doerre	2006-03-06	Bug 2776	Changed EBNF grammar to allow weights to be specified using RangeExpr.
Mary Holstege	2006-03-30	Updated Tokenization 4.2.7	Expanded and clarified definition. Added examples.
Pat Case	2006-04-13	Replaced glossary	Removed glossary copied from the XQuery language document and inserted coding to produce a full-text glossary.
Jochen Doerre	2006-04-24	Section 2	Added new Processing Model section.
Jochen Doerre	2006-04-25	Section 4	Included the completely revised semantics schemata and functions, which now (i) correctly handle interval-based TokenInfos, (ii) separate the representation of TokenInfos and SearchTokenInfos and SearchItems, (iii) have been simplified regarding the semantics of match options by no longer separating the implementation-defined matching function from (most of) the implementation-defined application of match options, and (iv) have been type- and syntax-checked.
Mary Holstege	2006-05-31	Bug 2483	Clarified type constraints on full-text operator parameters in Section 3. Revised EBNF to be more specific in some cases.
Jochen Doerre	2006-08-04	Bug 3374	Revised complete example in Section 4.3.3.
Jim Melton	2006-08-17	Added XQueryX support	Added new normative appendix defining the XML schemas and XSLT stylesheet necessary for XQuery 1.0 and XPath 2.0 Full-Text 1.0 to integrate into XQueryX.
Jochen Doerre	2006-08-21	Bug 3439	Fixed FTMildNot semantics.
Mary Holstege	2006-08-22	Conformance	Added new conformance section as section 5. Added error code definitions to appendix D.
Mary Holstege	2006-08-22	FTWords	Fixed wording of FTWords with respect to type constraints.
Mary Holstege	2006-10-05	Score Variables	Added more complex scoring examples as clarification for bug #3596.
Mary Holstege	2006-10-05	FTSelection	Improved reading flow for examples. Made linkage of non-terminals consistent.
Mary Holstege	2006-11-01	Overall	Reorganized structure of document to improve reading flow.
Jim Melton	2006-12-26	FTLanguageOption	Revised text dealing with FTLanguageOption values that do not identify a known, defined language in RFC 3066. Added reference to RFC 4646.
Jim Melton	2006-12-26	FTLanguageOption and FTContainsExpr	Added text saying that a Full-Text processor SHOULD use xml:lang information when choosing collations and when processing FTMatchOptions. Also added text saying that an xml:lang specification SHOULD take precedence over an FTLanguageOption specification.
Jim Melton	2006-12-26	Tokenization	Made changes clarifying that tokenization SHOULD be implementation-defined (implicitly permitting it to be implementation-dependent).
Jochen Doerre	2007-01-22	Definitions for implementation-defined/ -dependent.	Added definitions for implementation-defined/dependent to Introduction as in XQuery document. Added links throughout the paper.
Jochen Doerre	2007-02-17	Bug 3698	Removed options "with diacritics", "without diacritics".
Jochen Doerre	2007-02-17	Bug 3914	Changed syntax of Booleans to "ftand", "ftor", "ftnot".
Jochen Doerre	2007-02-17	Bug 3920	Changed 3rd example in 3.3.7 FTDistance and added a 4th.
Jim Melton	2007-02-25	Bug 3935	Added text to define how wildcard characters can be escaped so they can be used in a search.
Pat Case	2007-02-26	Itemized sample tokens in 3 FTSelections	To resolve Bug 3913, added a sentence itemizing the first 5 tokens in the sample tokenization.
Pat Case	2007-02-26	Corrected example in 3.3.7 FTDistance	To resolve Bug 3920, corrected the first example and preceding text in 3.3.7 FTDistance to remove the "not in" operator and to use terms from the sample data.
Pat Case	2007-02-26	Inserted sentence into 3.2.6 FTLanguageOption	To resolve Bug 3926, inserted sentence into 3.2.6 FTLanguageOption saying that the "language" option MAY influence the behavior of other match options.
Pat Case	2007-02-26	Inserted a sentence into 3.2.5 FTStopWordOption	To resolve Bug 3930, inserted a sentence into 3.2.5 FTStopWordOption saying that "union" and "except" are applied from left to right.
Pat Case	2007-02-26	Added a note to 3.2.5 FTStopWordOption	To resolve Bug 3932, added a note to 3.2.5 FTStopWordOption saying that stop word lists MAY be applied during indexing. If they are applied during indexing, asking for stop words not to be used during a query will have no effect.
Pat Case	2007-02-26	Added a note to 3.4 FTIgnore	To resolve Bug 3936, added a note to 3.4 FTIgnore saying that nodes MAY be ignored during indexing and during query processing. The ignore option applies only to query processing. Whether and how indexing ignores nodes is out of scope for this specification.
Jochen Doerre	2007-02-26	Bug 3924	Changed grammar for match options: now precedence of match options is higher than Booleans. Included restriction to have at most one option of a group at a level.
Jochen Doerre	2007-02-27	Bug 3910, 3924, 3928	Reformulated what the case options mean. Added lower/uppercase as possible values for the case option to table in Appendix C (Static Context Components) and put rules and alternatives in the grammar into a more logical order. Also ordered tables and lists in the text the same.
Jochen Doerre	2007-03-02	Bug 3737	Reformulated and restructured most of section 3. Added explanation of the application structure of positional filters (formerly: FTProximities) and how match options take effect. Renamed the following grammar symbols: FTWordsSelection to FTPrimary, FTWordsMatches to FTPrimaryWithOptions, FTProximity to FTPosFilter.
Mary Holstege	2007-04-02	Bugs 4345, 4355, 4358, 4445	Reworked description of the wildcard option and added a new example. Added note on the effect when the lower bound of a range is greater than the upper bound. Fixed FTContent example to be "with wildcards".
Jochen Doerre	2007-04-09	Bug 3939	Added example for overlapping tokens in 4.1.
Jochen Doerre	2007-04-09	Bug 3931	Added match option application order, as agreed in FTTF-136.
Mary Holstege	2007-04-19	Conformance	Made support for uppercase and lowercase FTCaseOptions optional.
Mary Holstege	2007-04-19	Extensions	Added text to describe extension options and selections.
Jochen Doerre	2007-04-19	Bug 4386	And-selection description fixed in Sec. 3.
Jochen Doerre	2007-04-20	Bugs 3898, 4388	Finalized the additions needed to allow for nested FTDistance/FTWindow.
Jochen Doerre	2007-04-23	Section 4	Simplifications to the match option schemata and processing.
Mary Holstege	2007-04-25	Schemas	Misc. editorial improvements to schemas.