XQuery and XPath Full Text 1.0

CR-xpath-full-text-10

W3C Candidate Recommendation

16 May 2008

This version: http://www.w3.org/TR/2008/CR-xpath-full-text-10-20080516/

Latest version: http://www.w3.org/TR/xpath-full-text-10/

Editors:

Sihem Amer-Yahia, AT&T Labs - Research

Chavdar Botev, Invited Expert

Stephen Buxton, Mark Logic Corporation

Pat Case, Library of Congress

Jochen Doerre, IBM

Mary Holstege, Mark Logic Corporation

Jim Melton, Oracle

Michael Rys, Microsoft

Jayavel Shanmugasundaram, Invited Expert

This document defines the syntax and formal semantics of XQuery and XPath Full Text 1.0, a language that extends XQuery 1.0 and XPath 2.0 with full-text search capabilities.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

W3C publishes a Candidate Recommendation, as described in the Process Document, to indicate that the document is believed to be stable and to encourage implementation by the developer community. The publication of this document constitutes a call for implementations of this specification.

This document has been jointly developed by the W3C XML Query Working Group and the W3C XSL Working Group, each of which is part of the XML Activity. It will remain a Candidate Recommendation until at least 15 September 2008. The Working Groups expect to advance this specification to Recommendation Status.

The XML Query Working Group and XSL Working Group intend to submit this document for consideration as a W3C Proposed Recommendation as soon as the following conditions are all met:

A test suite is available that tests each identified XQuery and XPath Full Text 1.0 feature, both required and optional.

Minimal Conformance to this specification, as defined in its conformance section, has been demonstrated by at least two distinct implementations, at least one of which uses the XQuery human-readable syntax defined in this specification.

An XPath Full Text parsing applet that generates XQueryX is available.

The Working Groups have responded formally to all issues raised during the CR period against this document.

Once the entrance criteria for Proposed Recommendation have been achieved, the Director will be requested to advance this document to Proposed Recommendation status. Working closely with the developer community, we expect to show evidence of implementations by approximately 15 September 2008.

The 15 optional features are each individually at risk. Optional features for which there are not at least two implementations at the end of the Candidate Recommendation period may be removed from this specification.

The WG believes that this document, published on 16 May 2008, is sufficiently mature and stable for the development community to begin developing implementation experience and reporting on that experience.

The WGs particularly solicit feedback regarding how thesauri are to be used in combination.

No implementation report currently exists. However, a Test Suite for this document is under development. Implementors are encouraged to run this test suite and report their results. The Test Suite can be found at http://dev.w3.org:/cvsweb/2007/xpath-full-text-10-test-suite/.

This document incorporates changes made against the Last Call Working Draft of 18 May 2007. Changes to this document since the Last Call Working Draft are detailed elsewhere in this document.

Please report errors in this document using W3C's public Bugzilla system (instructions can be found at http://www.w3.org/XML/2005/04/qt-bugzilla). If access to that system is not feasible, you may send your comments to the W3C XSLT/XPath/XQuery public comments mailing list, public-qt-comments@w3.org. It will be very helpful if you include the string “[FT]” in the subject line of your report, whether made in Bugzilla or in email. Please use multiple Bugzilla entries (or, if necessary, multiple email messages) if you have more than one comment to make. Archives of the comments and responses are available at http://lists.w3.org/Archives/Public/public-qt-comments/.

Publication as a Candidate Recommendation does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by groups operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the XML Query Working Group and also maintains a public list of any patent disclosures made in connection with the deliverables of the XSL Working Group; those pages also include instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.


SA January 2004: First version of document before Feb F2F

SA 26 February 2004: Second version of document before Feb F2F meetings.

JM 18 May 2007: Last Call Working Draft

Introduction

This document defines the language and the formal semantics of XQuery and XPath Full Text 1.0. This language is designed to meet the requirements identified in W3C XQuery and XPath Full Text Requirements and to support the queries in the W3C XQuery and XPath Full Text Use Cases.

XQuery and XPath Full Text 1.0 extends the syntax and semantics of XQuery 1.0 and XPath 2.0.

Additionally, this document defines an XML syntax for XQuery and XPath Full Text 1.0. The most recent versions of the two XQueryX XML Schemas and the XQueryX XSLT stylesheet for XQuery and XPath Full Text 1.0 are available at http://www.w3.org/2007/xpath-full-text/xpath-full-text-10-xqueryx.xsd, http://www.w3.org/2007/xpath-full-text/xpath-full-text-10-xqueryx-ftmatchoption-extensions.xsd, and http://www.w3.org/2007/xpath-full-text/xpath-full-text-10-xqueryx.xsl, respectively.

Full-Text Search and XML

As XML becomes mainstream, users expect to be able to search their XML documents. This requires a standard way to do full-text search, as well as structured searches, against XML documents. A similar requirement for full-text search led ISO to define the SQL/MM-FT standard. SQL/MM-FT defines extensions to SQL to express full-text searches providing functionality similar to that defined in this full-text language extension to XQuery 1.0 and XPath 2.0.

XML documents may contain highly structured data (fixed schemas, known types such as numbers, dates), semi-structured data (flexible schemas and types), markup data (text with embedded tags), and unstructured data (untagged free-flowing text). Where a document contains unstructured or semi-structured data, it is important to be able to search using Information Retrieval techniques such as scoring and weighting.

Full-text search is different from substring search in many ways:

A full-text search searches for tokens and phrases rather than substrings. A substring search for news items that contain the string "lease" will return a news item that contains "Foobar Corporation releases the 20.9 version ...". A full-text search for the token "lease" will not.

There is an expectation that a full-text search will support language-based searches which substring search cannot. An example of a language-based search is "find me all the news items that contain a token with the same linguistic stem as 'mouse'" (finds "mouse" and "mice"). Another example based on token proximity is "find me all the news items that contain the tokens 'XML' and 'Query' allowing up to 3 intervening tokens".

Full-text search must address the vagaries and nuances of language. Search results are often of varying usefulness. When you search a web site for cameras that cost less than $100, this is an exact search. There is a set of cameras that matches this search, and a set that does not. Similarly, when you do a string search across news items for "mouse", there is only one expected result set. When you do a full-text search for all the news items that contain the token "mouse", you probably expect to find news items containing the token "mice", and possibly "rodents", or possibly "computers". Not all results are equal. Some results are more "mousey" than others. Because full-text search may be inexact, we have the notion of score or relevance. We generally expect to see the most relevant results at the top of the results list.
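The difference between substring search and token search described above can be sketched in a few lines of Python. This is only an illustration: the `tokenize` function is a hypothetical stand-in for an implementation-defined tokenizer, and the function names are not part of this specification.

```python
import re

def tokenize(text):
    # Hypothetical tokenizer: maximal runs of ASCII letters and digits, lowercased.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def substring_search(text, needle):
    # Plain substring containment, ignoring case.
    return needle.lower() in text.lower()

def token_search(text, needle):
    # Containment of a whole token, ignoring case.
    return needle.lower() in tokenize(text)

news_item = "Foobar Corporation releases the 20.9 version"
print(substring_search(news_item, "lease"))  # True: "releases" contains "lease"
print(token_search(news_item, "lease"))      # False: no token equals "lease"
print(token_search(news_item, "releases"))   # True
```

A real full-text processor would additionally apply language-based processing such as stemming, which this sketch deliberately leaves out.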

As XQuery and XPath evolve, they may apply the notion of score to querying structured data. For example, when making travel plans or shopping for cameras, it is sometimes useful to get an ordered list of near matches in addition to exact matches. If XQuery and XPath define a generalized inexact match, we expect XQuery and XPath to utilize the scoring framework provided by XQuery and XPath Full Text.

Full-text queries are performed on tokens and phrases. Tokens and phrases are produced via tokenization. Informally, tokenization breaks a character string into a sequence of tokens, units of punctuation, and spaces.

Tokenization, in general terms, is the process of converting a text string into smaller units that are used in query processing. Those units, called tokens, are the most basic text units that a full-text search can refer to. Full-text operators typically work on sequences of tokens found in the target text of a search. These tokens are characterized by integers that capture the relative position(s) of the token inside the string, the relative position(s) of the sentence containing the token, and the relative position(s) of the paragraph containing the token. The positions typically comprise a start and an end position.
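As an illustration of tokens that carry positional information, the following Python sketch assigns each token a token position, a sentence position, and a paragraph position. It rests on several assumptions that this specification does not make: paragraphs are separated by blank lines, sentences end with ".", "!" or "?", and each position is a single ordinal rather than a start and end pair. The `Token` class and both functions are hypothetical names.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    text: str
    position: int   # ordinal of the token within the whole string
    sentence: int   # ordinal of the sentence containing the token
    paragraph: int  # ordinal of the paragraph containing the token

def tokenize(text):
    tokens, position, sentence = [], 0, 0
    # Assumption: a blank line separates paragraphs.
    for para_no, para in enumerate(re.split(r"\n\s*\n", text.strip()), start=1):
        # Assumption: a sentence ends with '.', '!' or '?' followed by white space.
        for sent in re.split(r"(?<=[.!?])\s+", para.strip()):
            words = re.findall(r"\w+", sent)
            if not words:
                continue
            sentence += 1
            for word in words:
                position += 1
                tokens.append(Token(word.lower(), position, sentence, para_no))
    return tokens

def within_distance(tokens, first, second, max_intervening):
    # True if some occurrences of the two tokens have at most
    # max_intervening tokens between them.
    pos_a = [t.position for t in tokens if t.text == first.lower()]
    pos_b = [t.position for t in tokens if t.text == second.lower()]
    return any(a != b and abs(a - b) - 1 <= max_intervening
               for a in pos_a for b in pos_b)

sample = "XML is everywhere. Query it.\n\nFull text helps."
toks = tokenize(sample)
print(toks[3])                                  # the token "query"
print(within_distance(toks, "XML", "Query", 3))
```

Position-based operators such as the proximity search mentioned earlier ("XML" and "Query" with up to 3 intervening tokens) only need these integers, not the original string.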

Tokenization, including the definition of the term "tokens", SHOULD be implementation-defined. Implementations SHOULD expose the rules and sample results of tokenization as much as possible to enable users to predict and interpret the results of tokenization. A more formal definition of tokenization is given elsewhere in this specification.

A token is a non-empty sequence of characters returned by a tokenizer as a basic unit to be searched. Beyond that, tokens are implementation-defined. A phrase is an ordered sequence of any number of tokens. Beyond that, phrases are implementation-defined.

Consecutive tokens need not be separated by either punctuation or space, and tokens may overlap.

In some natural languages, tokens and words can be used interchangeably.

A sentence is an ordered sequence of any number of tokens. Beyond that, sentences are implementation-defined. A tokenizer is not required to support sentences.

A paragraph is an ordered sequence of any number of tokens. Beyond that, paragraphs are implementation-defined. A tokenizer is not required to support paragraphs.

Some XML elements represent semantic markup, e.g., <title>. Others represent formatting markup, e.g., <b> to indicate bold. Semantic markup serves well as token boundaries. Some formatting markup serves well as token boundaries, for example, paragraphs are most commonly delimited by formatting markup. Other formatting markup may not serve well as token boundaries. Implementations are free to provide implementation-defined ways to differentiate between the markup's effect on token boundaries during tokenization. In the absence of an implementation-defined way to differentiate, element markup (start tags, end tags, and empty-element tags) creates token boundaries.

A sample tokenization is used for the examples in this document. The results might be different for other tokenizations.

Tokenization enables functions and operators that operate on a part or the root of the token (e.g., wildcards, stemming).

Tokenization enables functions and operators which work with the relative positions of tokens (e.g., proximity operators).

This specification focuses on functionality that serves all languages. It also selectively includes functionalities useful within specific families of languages. For example, searching within sentences and paragraphs is useful to many western languages and to some non-western languages, so that functionality is incorporated into this specification.

Certain aspects of language processing are described in this specification as implementation-defined or implementation-dependent.

Implementation-defined indicates an aspect that may differ between implementations, but must be specified by the implementor for each particular implementation.

Implementation-dependent indicates an aspect that may differ between implementations, is not specified by this or any W3C specification, and is not required to be specified by the implementor for any particular implementation.

Organization of this document

This document is organized as follows. We first present a high level syntax for the XQuery and XPath Full Text 1.0 language along with some examples. Then, we present the syntax and examples of the basic primitives in the XQuery and XPath Full Text 1.0 language. This is followed by the semantics of the XQuery and XPath Full Text 1.0 language. The appendix contains a section that provides an EBNF for the XPath 2.0 Grammar with Full-Text Extensions, an EBNF for XQuery 1.0 Grammar with Full-Text Extensions, acknowledgements and a glossary.

A word about namespaces

Certain namespace prefixes are predeclared by XQuery 1.0 and, by implication, by this specification, and bound to fixed namespace URIs. These namespace prefixes are as follows:

xml = http://www.w3.org/XML/1998/namespace

xs = http://www.w3.org/2001/XMLSchema

xsi = http://www.w3.org/2001/XMLSchema-instance

fn = http://www.w3.org/2005/xpath-functions

local = http://www.w3.org/2005/xquery-local-functions

In addition to the prefixes in the above list, this document uses the prefix err to represent the namespace URI http://www.w3.org/2005/xqt-errors. This namespace prefix is not predeclared and its use in this document is not normative. Error codes that are not defined in this document are defined in other XQuery 1.0 and XPath 2.0 specifications.

Finally, this document uses the prefix fts to represent a namespace containing a number of functions used in this document to describe the semantics of XQuery and XPath Full Text functions. There is no requirement that these functions be implemented, therefore no URI is associated with that prefix.

Full-Text Extensions to XQuery and XPath

XQuery and XPath Full Text extends the languages of XQuery 1.0 and XPath 2.0 in three ways. It:

Adds a new expression called FTContainsExpr;

Enhances the syntax of FLWOR expressions in XQuery 1.0 and for expressions in XPath 2.0 with optional score variables; and

Adds static context declarations for full-text match options to the query prolog.

Additionally, it extends the data model and processing models in various ways.

Processing Model

A full-text contains expression (FTContainsExpr) is composed of several parts:

An XPath 2.0 or XQuery 1.0 expression (RangeExpr) that specifies the sequence of items to be searched. Those items are called the search context.

The full-text selection to be applied (FTSelection). Full-text selections are, syntactically and semantically, fully composable and contain:

Required:

Tokens and phrases for which a search is performed (FTWords).

Optional:

Match options, such as indicators for case sensitivity and stop words (FTMatchOptions);

Boolean full-text operators, which compose a full-text selection from simpler full-text selections (FTOr);

Other full-text operators that are constraints on the positions of matches, such as indicators for distance between tokens and for the cardinality of matches (FTPosFilter and FTTimes); and

The weighting information. Each individual search term in a full-text selection may be annotated with optional weight information. This information may be used during the evaluation of the full-text selections to calculate scoring, information that quantifies the relevance of the result to the given search criteria.

An optional XPath 2.0 or XQuery 1.0 expression (UnionExpr) that specifies the set of nodes, descendants of the RangeExpr, whose contents must be ignored for the purpose of determining a match during the search (FTIgnoreOption).

The results of the evaluation of the full-text selection operators are instances of the AllMatches model, which complements the XQuery Data Model (XDM) for processing full-text queries. An AllMatches instance describes all possible solutions to the full-text query for a given search context item. Each solution is described by a Match instance. A Match instance contains the tokens from the search context that must be included (described using StringInclude instances which model the positive terms) and the tokens from search context item that must be excluded (described using StringExclude instances which model the negative terms). Each negative or positive term is modeled as a tuple: the position of the query token or phrase in the full-text selection, and a TokenInfo structure that describes a set of tokens in the text string which match the query token or phrase.

Figure 1 provides a schematic overview of the XQuery and XPath Full Text processing steps that are discussed in detail below. Some of these steps are completely outside the domain of XQuery; in Figure 1, these are depicted outside the black line that represents the boundaries of the language. The diagram shows only the central pieces of the XQuery Processing Model, but zooms in on the Execution Engine, where the processing of the full-text extensions takes place. The full-text processing steps are labeled as FTn within the diagram and are referenced within the text.

Like all XQuery expressions, an FTContainsExpr returns an XDM Instance (see Fig. 1). With the exception of FTWords, which consumes TokenInfos, all full-text selections are closed under the AllMatches data model, i.e., their input and output are AllMatches instances. Tokenization transforms an XDM instance into TokenInfos, which ultimately get converted into AllMatches instances by the evaluation of full-text selections. Thus, the evaluation of nested full-text and XQuery expressions moves back and forth between these two models.

The resulting AllMatches instance obtained by the evaluation of an FTContainsExpr is converted into a Boolean value before being returned to the enclosing XPath or XQuery operation as follows. If at least one member of the disjunction (that is, at least one Match instance) contains only positive terms, then the value returned is true. If all members of the disjunction contain negative terms, the result is false.
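The conversion of an AllMatches instance into a Boolean value can be modeled with a small Python sketch. The class and field names below mirror the terms used in this section (Match, StringInclude, StringExclude), but their concrete shape is an assumption made for illustration only; the formal data model is defined in the semantics part of this specification.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed shape of a term: (position of the query token or phrase,
# positions of the matching tokens in the searched text).
Term = Tuple[int, Tuple[int, ...]]

@dataclass
class Match:
    string_includes: List[Term] = field(default_factory=list)  # positive terms
    string_excludes: List[Term] = field(default_factory=list)  # negative terms

@dataclass
class AllMatches:
    matches: List[Match] = field(default_factory=list)  # a disjunction of solutions

def to_boolean(all_matches):
    # True if at least one Match contains only positive terms.
    return any(not m.string_excludes for m in all_matches.matches)

positive_only = AllMatches([Match(string_includes=[(1, (3,))])])
with_negative = AllMatches([Match([(1, (3,))], [(2, (5,))])])
print(to_boolean(positive_only), to_boolean(with_negative), to_boolean(AllMatches()))
```

In this sketch an AllMatches instance with no Match instances at all yields false, because there is no solution that could satisfy the rule.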

Weighting information, in an implementation-dependent fashion, may be used when calculating the scoring information computed and made available by FTContainsExpr to the optional score construct.

Given the components of a given full-text contains expression, the evaluation algorithm will proceed according to the following steps, also referenced in the processing model diagram as steps FTn (see Fig. 1):

Evaluate the search context expression (resulting in the sequence of search context items), the ignore option, if any (resulting in the set of ignored nodes), and any other XQuery/XPath expressions nested within the full-text contains expression. (FT1)

Tokenize the query string(s). (FT2.1)

For each search context item:

Delete the ignored nodes from the search context item.

Tokenize the result of the previous step. This produces a sequence of tokens. (FT2.2) Note that implementations may (as an optimization) perform tokenization as part of the External Processing that is described in the XQuery Processing Model, when an XML document is parsed into an Infoset/PSVI and ultimately into an XQuery Data Model instance.

Evaluate the FTSelection against the tokens of the search context. (FT3, FT4)

Convert the topmost AllMatches instances into a Boolean value. (FT5)
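The steps above can be sketched end to end in Python. This is a heavily simplified model, not the normative algorithm: search context items are `xml.etree.ElementTree` elements, the ignore option is reduced to a single element name, and the full-text selection is reduced to "every query token occurs in the item" (a stand-in for steps FT3 to FT5). All function names are hypothetical.

```python
import copy
import re
import xml.etree.ElementTree as ET

def tokenize(text):
    # Hypothetical tokenizer: runs of word characters, lowercased.
    return [t.lower() for t in re.findall(r"\w+", text)]

def drop_ignored(elem, ignore_tag):
    # Remove descendants named ignore_tag, keeping the text that follows them.
    for parent in list(elem.iter()):
        prev = None
        for child in list(parent):
            if child.tag == ignore_tag:
                tail = child.tail or ""
                if prev is None:
                    parent.text = (parent.text or "") + " " + tail
                else:
                    prev.tail = (prev.tail or "") + " " + tail
                parent.remove(child)
            else:
                prev = child

def ft_contains(items, query_strings, ignore_tag=None):
    # FT2.1: tokenize the query strings.
    query_tokens = [tok for q in query_strings for tok in tokenize(q)]
    for item in items:
        work = copy.deepcopy(item)  # leave the caller's document untouched
        if ignore_tag is not None:
            drop_ignored(work, ignore_tag)  # delete the ignored nodes
        item_tokens = set(tokenize(" ".join(work.itertext())))  # FT2.2
        # Stand-in for FT3 to FT5: all query tokens must occur in the item.
        if all(tok in item_tokens for tok in query_tokens):
            return True
    return False

doc = ET.fromstring(
    "<book><content>The usability of a Web site. "
    "<note>Approved by the Users Association.</note></content></book>"
)
items = doc.findall("content")  # FT1: the search context items
print(ft_contains(items, ["approved"]))
print(ft_contains(items, ["approved"], ignore_tag="note"))
```

The second call shows the effect of the ignore option: once the note element is removed from the search context item, its tokens can no longer contribute to a match.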

The additional scoring information (also part of FT5) that is produced by the evaluation of the full-text contains expression is implementation-dependent and is not specified in this document. The scoring information is made available at the same time the Boolean value is returned.

(A more detailed version of the above procedure appears in the semantics part of this specification.)

The Full-Text Selections section describes the syntax and the informal semantics of full-text operators. Their formal semantics, as well as the formal definition of the AllMatches data model, are given in the semantics part of this specification.

Full-Text Contains Expression

A full-text contains expression is an expression that evaluates a sequence of items against a full-text selection.

As a syntactic construct, a full-text contains expression (grammar symbol: FTContainsExpr) behaves like a comparison expression. The following grammar rule introduces FTContainsExpr.

ComparisonExpr ::= FTContainsExpr ( (ValueComp
| GeneralComp
| NodeComp) FTContainsExpr )?

A full-text contains expression may be used anywhere a ComparisonExpr may be used. The ftcontains operator has higher precedence than other comparison operators, so the results of ftcontains expressions may be compared without enclosing them in parentheses.

Description

FTContainsExpr ::= RangeExpr ( "ftcontains" FTSelection FTIgnoreOption? )?

A full-text contains expression returns a Boolean value. It returns true if there is some item returned by the RangeExpr that, after tokenization, matches the full-text selection FTSelection. For the purpose of determining a match, certain descendants of nodes in the RangeExpr (identified by FTIgnoreOption) may be ignored.

An XQuery and XPath Full Text processor SHOULD try to use the information available in xml:lang for processing of collations, as well as the various match options (FTMatchOptions).

Examples

The following example in XQuery Full Text returns the author of each book with a title containing a token with the same root as dog and the token cat.

for $b in /books/book
where $b/title ftcontains ("dog" with stemming) ftand "cat"
return $b/author

The same example in XPath Full Text is written as:

/books/book[title ftcontains ("dog" with stemming) ftand "cat"]/author

In the next example a ComparisonExpr is combined with an FTContainsExpr using the logical XQuery operator and. The query selects books that have a price of less than 50 and a title which contains a token with the same root as train:

/books/book[price < 50 and title ftcontains ("train" with stemming)]

The following example shows the combination of two ftcontains expressions the results of which are compared using the not-equals operator. The query selects books where either the title contains the token dog and the token cat and the content does not contain a token with the same root as train, or where the title fails to have one of the matching tokens but the content does:

/books/book[title ftcontains "dog" ftand "cat" ne content ftcontains ("train" with stemming)]

Score Variables

Besides specifying a match of a full-text query as a Boolean condition, full-text query applications typically also have the ability to associate scores with the results. The score of a full-text query result expresses its relevance to the search conditions.

XQuery and XPath Full Text extends the languages of XQuery 1.0 and XPath 2.0 further by adding optional score variables to the for and let clauses of FLWOR expressions.

The production for the extended for clause in XQuery 1.0 follows.

ForClause ::= "for" "$" VarName TypeDeclaration? PositionalVar? FTScoreVar? "in" ExprSingle ("," "$" VarName TypeDeclaration? PositionalVar? FTScoreVar? "in" ExprSingle)*
FTScoreVar ::= "score" "$" VarName

In XPath 2.0, the SimpleForClause is extended similarly.

When a score variable is present in a for clause the evaluation of the expression following the in keyword not only needs to determine the result sequence of the expression, i.e., the sequence of items which are iteratively bound to the for variable. It must also determine in each iteration the relevance "score" value of the current item and bind the score variable to that value.

The semantics of scoring and how it relates to second-order functions are discussed below.

In the following example book elements are determined that satisfy the condition [content ftcontains "web site" ftand "usability" and .//chapter/title ftcontains "testing"]. The scores assigned to the book elements are returned.

for $b score $s in /books/book[content ftcontains "web site" ftand "usability" and .//chapter/title ftcontains "testing"]
return $s

The example above is also a legal example of the XPath 2.0 extension.

Scores are typically used to order results, as in the following, more complete example.

for $b score $s in /books/book[content ftcontains "web site" ftand "usability"]
where $s > 0.5
order by $s descending
return
  <result>
    <title> {$b//title} </title>
    <score> {$s} </score>
  </result>

Note that the score variable gets one score value for each item in the value of the expression after the in keyword, regardless of the number of FTContainsExprs in that expression. In the following example, two separate full-text contains expressions are used to select the matching paragraphs. There is still just one score for each para returned. The highest scoring paragraphs will be returned first:

for $p score $s in //book[title ftcontains "software"]/para[. ftcontains "usability"] order by $s descending return $p

The following more elaborate example uses multiple score variables to return the matching paragraphs ordered so that those from the highest scoring books precede those from the lowest scoring books, where the highest scoring paragraphs of each book are returned before the lower scoring paragraphs of that book:

for $b score $score1 in //book[title ftcontains "software"]
order by $score1 descending
return
  for $p score $score2 in $b/para[. ftcontains "usability"]
  order by $score2 descending
  return $p

The score variable is bound to a value which reflects the relevance of the match criteria in the full-text selections to the items returned by the respective RangeExprs. The calculation of relevance is implementation-dependent, but score evaluation must follow these rules:

Score values are of type xs:double in the range [0, 1].

For score values greater than 0, a higher score must imply a higher degree of relevance.
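The two rules above leave the relevance calculation itself open. As a purely hypothetical illustration, the following Python sketch uses the fraction of query tokens found in an item as its score, which keeps every value in the range [0, 1], and then filters and orders results in the same way as the earlier example with "where $s > 0.5 order by $s descending". Neither `toy_score` nor `rank` is defined by this specification.

```python
def toy_score(item_tokens, query_tokens):
    # Hypothetical relevance measure: share of query tokens present in the item.
    # The result is always a float between 0.0 and 1.0 inclusive.
    if not query_tokens:
        return 0.0
    present = set(item_tokens)
    hits = sum(1 for q in query_tokens if q in present)
    return hits / len(query_tokens)

def rank(items, query_tokens, threshold=0.5):
    # Keep items scoring above the threshold, highest score first.
    scored = [(toy_score(tokens, query_tokens), name) for name, tokens in items.items()]
    kept = [(score, name) for score, name in scored if score > threshold]
    return [name for score, name in sorted(kept, key=lambda p: p[0], reverse=True)]

items = {
    "a": ["web", "site", "usability"],
    "b": ["web", "site"],
    "c": ["cooking"],
}
query = ["web", "site", "usability"]
print(rank(items, query))
```

Real implementations typically use far richer relevance models; the only properties this sketch is meant to show are the bounded range and the "higher means more relevant" ordering.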

Similarly to their use in a for clause, score variables may be specified in a let clause. A score variable in a let clause is also bound to the score of the expression evaluation, but in the let clause one score is determined for the complete result.

The production for the extended let clause follows.

LetClause ::= (("let" "$" VarName TypeDeclaration?) | ("let" "score" "$" VarName)) ":=" ExprSingle ("," (("$" VarName TypeDeclaration?) | FTScoreVar) ":=" ExprSingle)*

When using the score option in a for clause the expression following the in keyword has the dual purpose of filtering, i.e., driving the iteration, and determining the scores. It is possible to separately specify expressions for filtering and scoring by combining a simple for clause with a let clause that uses scoring. The following is an example of this.

for $b in /books/book[.//chapter/title ftcontains "testing"]
let score $s := $b/content ftcontains "web site" ftand "usability"
order by $s descending
return <result score="{$s}">{$b}</result>

This example returns book elements with chapter titles that contain "testing". Along with the book elements scores are returned. These scores, however, reflect whether the book content contains "web site" and "usability".

Note that it is not a requirement of the score of an FTContainsExpr to be 0, if the expression evaluates to false, nor to be non-zero, if the expression evaluates to true. Hence, in the example above it is not possible to infer the Boolean value of the FTContainsExpr in the let clause from the calculated score of a returned result element. For instance, an implementation may want to assign a non-zero score to a book that contained "web site", but not "usability", as this may be considered more relevant than a book that does not contain "web site" or "usability".

The expression ExprSingle associated with the score variable is passed to the scoring algorithm. The scoring algorithm calculates the score value based on the passed expression (not on the value returned by evaluating the expression). The set of expressions supported by the scoring algorithm is implementation-defined. If an expression not supported by the scoring algorithm is passed to the scoring algorithm, the result is implementation-defined.

The use of score variables introduces a second-order aspect to the evaluation of expressions which cannot be emulated by (first-order) XQuery functions. Consider replacing the clause

let score $s := FTContainsExpr

with the clause

let $s := score(FTContainsExpr)

where a function score is applied to some FTContainsExpr. If the function score were first-order, it would only be applied to the result of the evaluation of its argument, which is one of the Boolean constants true or false. Hence, there would be at most two possible values such a score function would be able to return and no further differentiation would be possible.
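The limitation of a first-order score function can be illustrated with a Python sketch. Here the "expression" is modeled, as an assumption, by a plain list of required tokens, and the scoring formula is invented for the example; only the contrast between the two functions matters.

```python
def evaluate(expression, item_tokens):
    # Boolean evaluation: true only if every required token is present.
    present = set(item_tokens)
    return all(t in present for t in expression)

def first_order_score(boolean_result):
    # Sees only the evaluated Boolean, so it can return at most two values.
    return 1.0 if boolean_result else 0.0

def second_order_score(expression, item_tokens):
    # Sees the expression itself, so it can reward partial matches.
    present = set(item_tokens)
    return sum(1 for t in expression if t in present) / len(expression)

expression = ["web", "site", "usability"]
books = {
    "book1": ["web", "site", "usability", "testing"],
    "book2": ["web", "site", "design"],
    "book3": ["cooking", "basics"],
}
first = {name: first_order_score(evaluate(expression, toks)) for name, toks in books.items()}
second = {name: second_order_score(expression, toks) for name, toks in books.items()}
print(first)
print(second)
```

The first-order variant cannot tell book2 from book3, whereas the variant that receives the expression ranks book2 above book3, which is the kind of differentiation described earlier for a book containing "web site" but not "usability".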

Using Weights Within a Scored FTContainsExpr

Scoring may be influenced by adding weight declarations to search tokens, phrases, and expressions. Weight declarations are introduced syntactically in the FTSelection production.

The weight MUST have an absolute value between 0.0 and 1000.0 inclusive.

The weights assigned are not related to any absolute standard, but typically have a relationship to other weights within the same FTContains expression.

The effect of weights on the resulting score is implementation-dependent. However, scoring algorithms MUST conform to these constraints:

When no explicit weight is specified, the default weight is 1.0; and

Weight declarations in an FTContainsExpr for which no scores are evaluated are ignored.

The following example illustrates how different weights can be used for different search terms.

for $b in /books/book
let score $s := $b/content ftcontains ("web site" weight 0.5) ftand ("usability" weight 2)
return <result score="{$s}">{$b}</result>
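Because the effect of weights on the score is implementation-dependent, any concrete formula is an assumption. The Python sketch below shows one hypothetical choice: it validates each weight against the 0.0 to 1000.0 bound on its absolute value, applies the default weight of 1.0 when none is given, and returns the weighted share of matched search terms. The function names and the formula are illustrative only.

```python
def check_weight(weight=None):
    # Default weight is 1.0; the absolute value must not exceed 1000.0.
    w = 1.0 if weight is None else float(weight)
    if not (abs(w) <= 1000.0):
        raise ValueError("weight must have an absolute value between 0.0 and 1000.0")
    return w

def contains_phrase(tokens, phrase):
    # True if the phrase tokens occur consecutively in tokens.
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def weighted_score(item_tokens, weighted_terms):
    # Hypothetical formula: weight of matched terms divided by total weight.
    # Using absolute values keeps the result within [0, 1].
    total = matched = 0.0
    for phrase, weight in weighted_terms:
        w = abs(check_weight(weight))
        total += w
        if contains_phrase(item_tokens, phrase.lower().split()):
            matched += w
    return matched / total if total > 0 else 0.0

terms = [("web site", 0.5), ("usability", 2)]
print(weighted_score(["the", "usability", "of", "a", "web", "site"], terms))
print(weighted_score(["a", "web", "site"], terms))
print(weighted_score(["usability", "testing"], terms))
```

With these weights an item that contains only "usability" scores higher than one that contains only "web site", which reflects the intent of the weight declarations in the query above.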

Extensions to the Static Context

The XQuery Static Context is extended with a component for each full-text match option group. The settings of these components can be changed by using the following declaration syntax in the Prolog.

Prolog ::= ((DefaultNamespaceDecl | Setter | NamespaceDecl | Import | FTOptionDecl) Separator)* ((VarDecl | FunctionDecl | OptionDecl) Separator)*
FTOptionDecl ::= "declare" "ft-option" FTMatchOptions

Match options modify the match semantics of full-text expressions. They are described in detail later in this specification. When a match option is specified explicitly in a full-text expression, it overrides the setting of the respective component in the static context.

Full-Text Selections

This section describes the full-text selections which contain the full-text operators in a full-text contains expression (FTContainsExpr), as well as the match options which modify the matching semantics of the full-text selections. In the following, the syntax for each type of full-text selection is given together with an informal statement of its meaning.

A full-text selection specifies the conditions of a full-text search.

FTSelection ::= FTOr FTPosFilter* ("weight" RangeExpr)?

As shown in the grammar, a full-text selection consists of search conditions possibly involving logical operators (FTOr) followed by an arbitrary number of positional filters (FTPosFilter) optionally followed by a "weight" value which is specified using a range expression. The RangeExpr is evaluated, as if it were an argument to a function with an expected type xs:double; it must be between 0.0 and 1000.0 inclusive.

The syntax and semantics of the individual full-text selection operators follow.

This XML document is the source document for examples in this section.

<books>
  <book number="1">
    <title shortTitle="Improving Web Site Usability">Improving the Usability of a Web Site Through Expert Reviews and Usability Testing</title>
    <author>Millicent Marigold</author>
    <author>Montana Marigold</author>
    <editor>Véra Tudor-Medina</editor>
    <content>
      The usability of a Web site is how well the site supports the users in achieving specified goals. A Web site should facilitate learning, and enable efficient and effective task completion, while propagating few errors.
      <note>This book has been approved by the Web Site Users Association.</note>
    </content>
  </book>
</books>

Tokenization is implementation-defined. A sample tokenization is used for the examples in this section. This sample tokenization uses white space, punctuation and XML tags as word-breakers and <p> for paragraph boundaries. The results may be different for other tokenizations.

The first five tokens in this example using the sample tokenization would be "Improving", "the", "Usability", "of", and "a".
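The sample tokenization can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the function name `sample_tokenize` and both regular expressions are illustrative choices, not part of this specification, and paragraph boundaries are not modeled. Tokens keep the letter case of the source text.

```python
import re

def sample_tokenize(xml_text):
    """Toy tokenizer: XML tags, white space, and punctuation all break words."""
    # Replace every tag (together with its attributes) by a space,
    # so that the tag acts as a word-breaker and attribute values are skipped.
    text = re.sub(r"<[^>]*>", " ", xml_text)
    # A token is a maximal run of word characters; anything else separates tokens.
    return re.findall(r"\w+", text)

title = ('<title shortTitle="Improving Web Site Usability">Improving the Usability '
         'of a Web Site Through Expert Reviews and Usability Testing</title>')
print(sample_tokenize(title)[:5])  # ['Improving', 'the', 'Usability', 'of', 'a']
```

A real implementation is free to tokenize differently, for example by treating hyphenated names as single tokens.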

Unless stated otherwise, the results assume a case-insensitive match.

Primary Full-Text Selections

FTPrimary ::= (FTWords FTTimes?) | ("(" FTSelection ")") | FTExtensionSelection

A primary full-text selection is the basic form of a full-text selection. It specifies tokens and phrases as search conditions (FTWords), optionally followed by a cardinality constraint (FTTimes). An FTSelection in parentheses and the FTExtensionSelection are also primary full-text selections.

Search Tokens and Phrases

FTWords ::= FTWordsValue FTAnyallOption?
FTWordsValue ::= Literal | ("{" Expr "}")
FTAnyallOption ::= ("any" "word"?) | ("all" "words"?) | "phrase"

FTWords finds matches that contain the specified tokens and phrases.

FTWords consists of two parts: a mandatory FTWordsValue part and an optional FTAnyallOption part. FTWordsValue specifies the tokens and phrases that must be contained in the matches. FTAnyallOption specifies how containment is checked.

In general, the tokens and phrases in FTWordsValue are specified using a nested XQuery expression. To simplify notation, the enclosing braces may be omitted if FTWordsValue consists of a single literal.

The following rules specify how an FTWordsValue matches tokens and phrases. First, the FTWordsValue is converted to a sequence of strings as though it were an argument to a function with the expected type of xs:string*. Then, each of those strings is tokenized into a sequence of tokens as described in Section 4.1 Tokenization. Then, FTAnyallOption is checked.

If FTAnyallOption is "any", the sequence of tokens for each string is considered as a phrase, i.e. a match is found in the tokenized form of the text being searched, whenever that form contains a subsequence of tokens that corresponds to the sequence of query tokens in an implementation-defined way and that subsequence of tokens covers consecutive token positions in the tokenized text. If the value of the FTWordsValue contains more than one string, the different strings are considered to be alternatives, i.e. the resulting matches must contain at least one of the generated phrases.

If FTAnyallOption is "all", the sequence of tokens for each string is considered as a phrase. The resulting matches must contain all of the generated phrases.

If FTAnyallOption is "phrase", the tokens from all the strings are concatenated in a single sequence, which is considered as a phrase. The resulting matches must contain the generated phrase.

If FTAnyallOption is "any word", the tokens from all the strings are combined into a single set. The resulting matches must contain at least one of the tokens in the set.

If FTAnyallOption is "all words", the tokens from all the strings are combined into a single set. The resulting matches must contain all of the tokens in the set.

If the FTWordsValue evaluates to a single string, the use of "any", "all", and "phrase" in FTAnyallOption produces the same results.

If FTAnyallOption is omitted, "any" is the default.
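The five FTAnyallOption rules above can be sketched as a boolean containment test. The code below is an illustration only: `tokenize`, `contains_phrase`, and `ft_words` are hypothetical names, the tokenizer is a simplified case-insensitive stand-in for the sample tokenization, and the "implementation-defined way" in which query tokens correspond to text tokens is assumed to be plain equality.

```python
import re

def tokenize(s):
    # Assumed tokenization: runs of word characters, compared case-insensitively.
    return [t.lower() for t in re.findall(r"\w+", s)]

def contains_phrase(text_tokens, phrase_tokens):
    """True if phrase_tokens occupy consecutive positions in text_tokens."""
    n = len(phrase_tokens)
    return n > 0 and any(text_tokens[i:i + n] == phrase_tokens
                         for i in range(len(text_tokens) - n + 1))

def ft_words(text, strings, option="any"):
    text_tokens = tokenize(text)
    phrases = [tokenize(s) for s in strings]          # one phrase per string
    all_tokens = [t for p in phrases for t in p]      # tokens of all strings
    if option == "any":        # at least one of the phrases
        return any(contains_phrase(text_tokens, p) for p in phrases)
    if option == "all":        # every phrase
        return all(contains_phrase(text_tokens, p) for p in phrases)
    if option == "phrase":     # all tokens concatenated into one phrase
        return contains_phrase(text_tokens, all_tokens)
    if option == "any word":   # at least one token of the combined set
        return any(t in text_tokens for t in all_tokens)
    if option == "all words":  # every token of the combined set
        return all(t in text_tokens for t in all_tokens)
    raise ValueError(f"unknown FTAnyallOption: {option}")

title = ("Improving the Usability of a Web Site Through Expert Reviews "
         "and Usability Testing")
print(ft_words(title, ["Expert Reviews"]))                   # True
print(ft_words(title, ["Web Site Usability"]))               # False
print(ft_words(title, ["Web Site Usability"], "all words"))  # True
```

The sketch only answers whether a match exists; it does not enumerate individual matches or their token positions.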

The following expression returns the sample book element, because its title element contains the token "Expert":

//book[./title ftcontains "Expert"]

The following expression returns the sample book element, because its title element contains the phrase "Expert Reviews":

//book[./title ftcontains "Expert Reviews"]

The following expression returns the sample book element, because its title element contains the two tokens "Expert" and "Reviews":

//book[./title ftcontains {"Expert", "Reviews"} all]

The following expression returns false for our sample document, because the p element doesn't contain the phrase "Web Site Usability" although it contains all of the tokens in the phrase:

//book//p ftcontains "Web Site Usability"

The following expression returns book numbers of book elements by "Marigold" with a title about "Web Site Usability", sorting them in descending score order:

for $book in /books/book[.//author ftcontains "Marigold"]
let score $score := $book/title ftcontains "Web Site Usability"
where $score > 0.8
order by $score descending
return $book/@number

Cardinality Selection

FTTimes ::= "occurs" FTRange "times"

A cardinality selection consists of an FTWords followed by the FTTimes postfix operator. A cardinality selection selects matches for which the operand FTWords is matched a specified number of times.

A cardinality selection restricts the number of different matches of FTWords to the specified range. The semantics of FTRange are described in .

In the document fragment "very very big":

The FTWords "very big" has 1 match consisting of the second "very" and "big".

The FTWords {"very", "big"} all has 2 matches; one consisting of the first "very" and "big", and the other containing the second "very" and "big".

The FTWords {"very", "big"} any has 3 matches.
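The three counts above can be reproduced with a small counting sketch. It assumes, as one reading of these examples, that "any" yields one match per occurrence of any alternative (a sum) and that "all" yields one match per combination of one occurrence of each phrase (a product). The function names are hypothetical and tokenization is reduced to splitting on white space.

```python
from math import prod

def phrase_positions(text_tokens, phrase_tokens):
    """Start positions at which phrase_tokens occur consecutively."""
    n = len(phrase_tokens)
    return [i for i in range(len(text_tokens) - n + 1)
            if text_tokens[i:i + n] == phrase_tokens]

def count_matches(text, strings, option="any"):
    text_tokens = text.lower().split()
    counts = [len(phrase_positions(text_tokens, s.lower().split()))
              for s in strings]
    if option == "any":
        return sum(counts)    # every occurrence of any alternative is a match
    if option == "all":
        return prod(counts)   # one match per combination of occurrences
    raise ValueError(f"unsupported option in this sketch: {option}")

print(count_matches("very very big", ["very big"]))            # 1
print(count_matches("very very big", ["very", "big"], "all"))  # 2
print(count_matches("very very big", ["very", "big"], "any"))  # 3
```

An FTTimes range such as "at least 2" would then be checked against the returned count.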

The following expression returns the example book element's number, because the book element contains 2 or more occurrences of "usability":

//book[. ftcontains "usability" occurs at least 2 times]/@number

The following expression returns the empty sequence, because there are 3 occurrences of {"usability", "testing"} any in the designated title:

//book[@number="1" and title ftcontains {"usability", "testing"} any occurs at most 2 times]

Match Options

Full-text match options modify the matching behaviour of the primary full-text selection to which they are applied.

Match options modify the set of tokens in the query, or how they are matched against tokens in the text.

Each of the seven alternatives of production FTMatchOption corresponds to one match option group. The match options from any given group are mutually exclusive, i.e., only one of these settings can be in effect, whereas match options of different groups can be combined freely.

Note that, along with the syntax rules above, there is an extra-grammatical constraint, multiple-match-options, which needs to be considered if multiple match options are specified. It states that within a single FTMatchOptions at most one match option of any given match option group may be specified. For example, if the FTCaseOption "lowercase" is specified, then "uppercase" cannot also be specified as part of the same FTMatchOptions.

Although match options only take effect in the application of FTWords, the syntax also allows match options to be specified that modify the non-primitive full-text selection "(" FTSelection ")". Such a higher-level match option provides a default for the respective match option group for any embedded FTPrimary, just as match option declarations in the Prolog provide default match options for the whole query.

Match options are propagated through the query via the static context. For each of the seven match option groups, the static context has a component that contains one option from that group. The seven settings are initialized by the implementation in accordance with the table in Appendix , and are modified by any FTOptionDecls in the Prolog. The resulting settings are then propagated unchanged to every FTContainsExpr in the module (including those in VarDecls and FunctionDecls, and including any that happen to be nested within another FTContainsExpr). At any given FTContainsExpr, the settings from the static context are copied to the FTContainsExpr's inner settings, which are then propagated down the syntax tree. At each FTPrimaryWithOptions, the locally specified match options (if any) overwrite the corresponding inner setting(s). At each FTWords, the inner settings are used as the effective match options for tokenizing the query strings and matching them against the tokens in the text. (These inner settings could be seen as a parallel set of components in the static context, but Section models them as structures that get passed as parameters to various semantic functions.)

Thus, when a match option appears in an FTSelection, it applies to the associated FTPrimary, but not to any FTContainsExprs that happen to be embedded within that FTPrimary. Instead, for a nested FTContainsExpr, the default match options are those declared in the Prolog or, if not declared in the Prolog, then supplied by the implementation's initial values.
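The propagation just described amounts to layering dictionaries. The following sketch is an assumption-laden illustration: the dictionary keys, the function names, and the initial values (taken from the equivalence example later in this section, including the assumed default language "de") are not defined by this specification.

```python
# Initial settings an implementation might supply (assumed values).
IMPLEMENTATION_DEFAULTS = {
    "language": "de",
    "wildcards": "without wildcards",
    "thesaurus": "without thesaurus",
    "stemming": "without stemming",
    "case": "case insensitive",
    "diacritics": "diacritics insensitive",
    "stop words": "without stop words",
}

def static_context(prolog_decls):
    """Implementation defaults, modified by FTOptionDecls in the Prolog."""
    return {**IMPLEMENTATION_DEFAULTS, **prolog_decls}

def effective_options(static_ctx, *local_option_layers):
    """Copy the static settings into the inner settings, then let each
    enclosing FTPrimaryWithOptions (outermost first) overwrite its groups."""
    inner = dict(static_ctx)
    for layer in local_option_layers:
        inner.update(layer)
    return inner

ctx = static_context({"stemming": "with stemming"})
# An FTWords inside two nested FTPrimaryWithOptions:
opts = effective_options(ctx, {"case": "case sensitive"},
                         {"stemming": "without stemming"})
print(opts["case"], "/", opts["stemming"])  # case sensitive / without stemming
# A nested FTContainsExpr starts again from the static context:
print(effective_options(ctx)["stemming"])   # with stemming
```

Restarting from `ctx` for a nested FTContainsExpr mirrors the rule that local match options do not leak into embedded full-text contains expressions.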

The order in which effective match options for an FTWords are applied is called the match option application order. This order is significant because match options are not always commutative. For example, synonym(stem(word)) is not always the same as stem(synonym(word)).

The match option application order is subject to some constraints:

The Language Option must be applied first

The Stemming Option must be applied before the Case Option and the Diacritics Option

Aside from these constraints, the full order of the application of match options is implementation-defined.
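As an illustration, the two constraints can be checked mechanically for a candidate application order. The group names used as list entries and the function name `order_is_allowed` are hypothetical.

```python
def order_is_allowed(order):
    """Check a candidate application order (a list of option-group names)
    against the two constraints stated above."""
    if not order or order[0] != "language":
        return False  # the Language Option must be applied first
    if "stemming" in order:
        stem_pos = order.index("stemming")
        for later in ("case", "diacritics"):
            if later in order and order.index(later) < stem_pos:
                return False  # stemming must precede case and diacritics
    return True

print(order_is_allowed(["language", "thesaurus", "stemming", "case", "diacritics"]))  # True
print(order_is_allowed(["language", "case", "stemming"]))                             # False
```

Any order that passes this check is permitted; which one is used remains implementation-defined.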

More information on their semantics is given in .

If no match option declarations are present in the prolog and the implementation does not define any overwriting of the static context components for the match options, the query:

/books/book/title ftcontains "usability"

is, assuming "de" is the implementation-defined default language, equivalent to the query:

/books/book/title ftcontains "usability" language "de" without wildcards without thesaurus without stemming case insensitive diacritics insensitive without stop words

We describe each match option group in more detail in the following sections.

Language Option

FTLanguageOption ::= "language" StringLiteral

A language option modifies token matching by specifying the language of search tokens and phrases.

The StringLiteral following the keyword language designates one language. It must be castable to xs:language; otherwise, an error is raised: .

The "language" option influences tokenization, stemming, and stop words in an implementation-defined way. The "language" option MAY influence the behavior of other match options in an implementation-defined way.

The set of standardized language identifiers is defined in . The set of valid language identifiers among the standardized set is implementation-defined. An implementation MAY choose to use private extensions introduced by a singleton 'x' for additional language identifiers, or other singletons for registered extensions as described in sec. 2.2.6 of . It is implementation-defined what additional language identifiers, if any, are valid. If an invalid language identifier is specified, then the behavior is implementation-defined. If the implementation chooses to raise an error in that case, it must raise .

The default language is specified in the static context.

When an XQuery and XPath Full Text processor evaluates text in a document that is governed by an xml:lang attribute and the portion of the full-text query doing that evaluation contains an FTLanguageOption that specifies a different language from the language specified by the governing xml:lang attribute, the language-related behavior of that full-text query is implementation-defined.

This is an example where the language option is used to select the appropriate stop word list:

//book[@number="1"]//editor ftcontains "salon de the" with default stop words language "fr"

Wildcard Option

FTWildCardOption ::= ("with" "wildcards") | ("without" "wildcards")

A wildcard option modifies token and phrase matching by specifying whether wildcards are used or not.

When the "with wildcards" option is used, wildcard indicators (represented by periods (.)) and qualifiers may be appended to or inserted into the query tokens. If the period is at the beginning of a query token, the wildcard is a prefix wildcard. If the period is at the end of a query token, it is a suffix wildcard. If the period is inserted into a query token, it is an infix wildcard.

Each indicator and qualifier in a query token will match zero or more characters within a token in the text being searched, as described below. The number of characters matched depends on the qualifier. Qualifiers available are none, question mark, asterisk, plus sign, and two numbers separated by a comma, both enclosed by curly braces.

If a period is present, but there are no qualifiers, one character in the text will match.

If a period is followed by a question mark (.?), zero or one characters in the text being searched will match.

If a period is followed by an asterisk (.*), zero or more characters will match.

If a period is followed by a plus sign (.+), one or more characters will match.

If a period is followed by two numbers separated by a comma, both enclosed by curly braces (.{n,m}), a specified range of characters (at least n characters and no more than m characters) will match.

When "with wildcards" is present and an indicator or qualifier character is intended to be taken literally (as itself), that character must be preceded by ("escaped by") a backslash (\). For example, a period (.) that is intended to be a sentence terminator or a decimal point must be preceded by a backslash so that it is not interpreted to be an indicator. Similarly a question mark (?), asterisk (*), or plus sign (+) that is intended to be interpreted as an ordinary text character must be preceded by a backslash so that it is not interpreted to be an indicator.

The "without wildcards" option finds tokens without recognizing wildcard indicators and qualifiers. Periods, question marks, asterisks, plus signs, and two numbers separated by a comma, both enclosed by curly braces, are always recognized as ordinary text characters.

The default is "without wildcards".

Note: Wildcard indicators and qualifiers may be token boundaries. How text with wildcard indicators and qualifiers is tokenized is implementation-defined.
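One way to picture the "with wildcards" rules is as a translation of a query token into a regular expression. The sketch below is not a prescribed algorithm: `wildcard_to_regex` and `token_matches` are hypothetical names, matching is assumed to be case-insensitive, and the question of how wildcard characters affect tokenization is ignored.

```python
import re

def wildcard_to_regex(query_token):
    """Translate a 'with wildcards' query token into a compiled regex."""
    out = []
    i = 0
    while i < len(query_token):
        ch = query_token[i]
        if ch == "\\" and i + 1 < len(query_token):
            # A backslash-escaped character is taken literally.
            out.append(re.escape(query_token[i + 1]))
            i += 2
        elif ch == ".":
            i += 1
            # Optional qualifier: ?, *, + or {n,m}
            qualifier = re.match(r"[?*+]|\{\d+,\d+\}", query_token[i:])
            if qualifier:
                out.append("." + qualifier.group())
                i += qualifier.end()
            else:
                out.append(".")  # bare period: exactly one character
        else:
            out.append(re.escape(ch))  # ordinary character
            i += 1
    return re.compile("".join(out), re.IGNORECASE)

def token_matches(query_token, text_token):
    """True if the whole text token is matched by the query token."""
    return wildcard_to_regex(query_token).fullmatch(text_token) is not None

print(token_matches("improv.*", "Improving"))  # True
print(token_matches(".?site", "site"))         # True
print(token_matches("w.ll", "well"))           # True
print(token_matches(r"3\.14", "3x14"))         # False
```

Using `fullmatch` reflects that a query token is compared against an entire token in the text, not against a substring of it.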

The following expression returns true, because the title element contains "improving":

//book[@number="1"]/title ftcontains "improv.*" with wildcards

The following expression returns true, because the title element contains "site":

//book[@number="1"]/title ftcontains ".?site" with wildcards

The following expression returns true, because the p element contains "well":

//book[@number="1"]/p ftcontains "w.ll" with wildcards

The following expression returns false, because the p element does not contain the phrase "w ll":

//book[@number="1"]/p ftcontains "w.ll" without wildcards

(Note that, without wildcards, the sample tokenization will treat the period in "w.ll" as punctuation, thus producing "w" and "ll" as separate tokens.)

Thesaurus Option

FTThesaurusOption ::= ("with" "thesaurus" (FTThesaurusID | "default"))
| ("with" "thesaurus" "(" (FTThesaurusID | "default") ("," FTThesaurusID)* ")")
| ("without" "thesaurus")
FTThesaurusID ::= "at" URILiteral ("relationship" StringLiteral)? (FTRange "levels")?
URILiteral ::= StringLiteral

A thesaurus option modifies token and phrase matching by specifying whether a thesaurus is used or not. If thesauri are used, the thesaurus option specifies information to locate the thesauri either by default or through a URI reference. It also states the relationship to be applied and how many levels within the thesaurus to be traversed.

Thesauri add related tokens and phrases to the query or change query tokens. Thus, the user may narrow, broaden, or otherwise modify the query using synonyms, hypernyms (more generic terms), etc. The search is performed as though the user has specified all related query tokens and phrases in a disjunction (FTOr).

A thesaurus may be standards-based or locally-defined. It may be a traditional thesaurus, or a taxonomy, soundex, ontology, or topic map. How the thesaurus is represented is implementation-dependent.

FTThesaurusID specifies the relationship sought between tokens and phrases written in the query and terms in the thesaurus and the number of levels to be queried in hierarchical relationships by including an FTRange "levels". If no levels are specified, the default is to query all levels in hierarchical relationships.

Relationships include, but are not limited to, the relationships and their abbreviations presented in and their equivalents in other languages. The set of relationships supported by an implementation is implementation-defined, but implementations SHOULD support the relationships defined in . The following list of terms have the meanings defined in . If a query specifies thesaurus relationships or levels not supported by the thesaurus, or does not specify a relationship, the behavior is implementation-defined.

equivalence relationships (synonyms): PREFERRED TERM (USE), NONPREFERRED USED FOR TERM (UF);

hierarchical relationships: BROADER TERM (BT), NARROWER TERM (NT), BROADER TERM GENERIC (BTG), NARROWER TERM GENERIC (NTG), BROADER TERM PARTITIVE (BTP), NARROWER TERM PARTITIVE (NTP), TOP Terms (TT); and

associative relationships: RELATED TERM (RT).

The "with thesaurus" option specifies that string matches include tokens that can be found in one of the specified thesauri. When "default" is used in place of an FTThesaurusID, the thesauri specified in the static context are used. These are either given by the prolog declaration for the thesaurus option or, if no such declaration exists, consist of a system-defined default thesaurus with a system-defined relationship. The default thesaurus may be used in combination with other explicitly specified thesauri.

The "without thesaurus" option specifies that no thesaurus will be used.

The default is "without thesaurus".
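Thesaurus expansion can be pictured as collecting related terms level by level and then searching for the disjunction of all collected terms. The sketch below uses an entirely hypothetical in-memory thesaurus (`THESAURUS`) and function name (`expand`); how a real thesaurus is represented and traversed is implementation-dependent.

```python
# Hypothetical thesaurus: relationship -> term -> directly related terms.
THESAURUS = {
    "NT": {
        "web site components": ["navigation", "layout"],
        "navigation": ["menus"],
        "menus": ["drop-down menus"],
    },
    "UF": {"duties": ["tasks"]},
}

def expand(term, relationship, max_levels=None):
    """Return the term plus related terms, walking at most max_levels
    levels of the relationship (all levels when max_levels is None)."""
    table = THESAURUS.get(relationship, {})
    found = [term]
    frontier = [term]
    level = 0
    while frontier and (max_levels is None or level < max_levels):
        next_frontier = []
        for t in frontier:
            for related in table.get(t, []):
                if related not in found:   # also guards against cycles
                    found.append(related)
                    next_frontier.append(related)
        frontier = next_frontier
        level += 1
    return found

print(expand("duties", "UF"))
# ['duties', 'tasks']
print(expand("web site components", "NT", max_levels=2))
# ['web site components', 'navigation', 'layout', 'menus']
```

The returned list corresponds to the terms that would be combined as though the user had written them in a disjunction (FTOr).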

The following expression returns true, because it finds a content element containing "tasks" which the thesaurus identified as a synonym for "duties":

count(.//book/content ftcontains "duties" with thesaurus at "http://bstore1.example.com/UsabilityThesaurus.xml" relationship "UF")>0

The following expression returns book elements, because it finds a content element containing "web site components", and narrower terms "navigation" and "layout":

doc("http://bstore1.example.com/full-text.xml") /books/book[count(./content ftcontains "web site components" with thesaurus at "http://bstore1.example.com/UsabilityThesaurus.xml" relationship "NT" at most 2 levels)>0]

Assuming the thesaurus available at URL "http://bstore1.example.com/UsabilitySoundex.xml" contains soundex capabilities, the following query returns a book element containing "Marigold" which sounds like "Merrygould":

doc("http://bstore1.example.com/full-text.xml") /books/book[count(. ftcontains "Merrygould" with thesaurus at "http://bstore1.example.com/UsabilitySoundex.xml" relationship "sounds like")>0]

Stemming Option

FTStemOption ::= ("with" "stemming") | ("without" "stemming")

A stemming option modifies token and phrase matching by specifying whether stemming is applied or not.

The "with stemming" option specifies that matches may contain tokens that have the same stem as the tokens and phrases written in the query. It is implementation-defined what a stem of a token is.

The "without stemming" option specifies that the tokens and phrases are not stemmed.

It is implementation-defined whether the stemming is based on an algorithm, dictionary, or mixed approach.

The default is "without stemming".

The following expression returns true, because the title of the specified book contains "improving" which has the same stem as "improve":

/books/book[@number="1"]/title ftcontains "improve" with stemming

Case Option

FTCaseOption ::= ("case" "insensitive")
| ("case" "sensitive")
| "lowercase"
| "uppercase"

A case option modifies the matching of tokens and phrases by specifying how uppercase and lowercase characters are considered.

There are four possible character case options:

Using the option "case insensitive", tokens and phrases are matched, regardless of the case of characters of the query tokens and phrases.

Using the option "case sensitive", tokens and phrases are matched, if and only if the case of their characters is the same as written in the query.

Using the option "lowercase", tokens and phrases are matched, if and only if they match the query without regard to character case, but contain only lowercase characters.

Using the option "uppercase", tokens and phrases are matched, if and only if they match the query without regard to character case, but contain only uppercase characters.

The default is "case insensitive".

The effect of the case options is also influenced by the query's default collation (see and ). The following table summarizes how these interact.

Case Matrix
Case option \ Default collation	UCC (Unicode Codepoint Collation)	CCS (some generic case-sensitive collation)	CCI (some generic case-insensitive collation)
case insensitive	compare as if both lower	case-insensitive variant of CCS if it exists, else error	CCI
case sensitive	UCC	CCS	case-sensitive variant of CCI if it exists, else error
lowercase	compare using UCC after applying fn:lower-case() to the query string	compare using CCS after applying fn:lower-case() to the query string	CCI
uppercase	compare using UCC after applying fn:upper-case() to the query string	compare using CCS after applying fn:upper-case() to the query string	CCI

In this table, "else error" means "Otherwise, an error is raised: ". The phrase "if it exists" is used, because the case-sensitive collation CCS does not always have a case-insensitive variant (and, even if one exists, it may not be possible to determine it algorithmically), and because the case-insensitive collation CCI does not always have a case-sensitive variant (and, even if one exists, it may not be possible to determine it algorithmically).
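For the Unicode Codepoint Collation column of the Case Matrix, the four options can be sketched as plain string comparisons. This is an approximation for illustration: `case_match` is a hypothetical name, and Python's `str.lower()` and `str.upper()` stand in for fn:lower-case() and fn:upper-case().

```python
def case_match(query_token, text_token, option="case insensitive"):
    """Token comparison following the UCC column of the Case Matrix."""
    if option == "case insensitive":
        # Compare as if both were lower case.
        return query_token.lower() == text_token.lower()
    if option == "case sensitive":
        # Plain codepoint comparison.
        return query_token == text_token
    if option == "lowercase":
        # Lower-case the query string only, then compare by codepoint.
        return query_token.lower() == text_token
    if option == "uppercase":
        # Upper-case the query string only, then compare by codepoint.
        return query_token.upper() == text_token
    raise ValueError(f"unknown case option: {option}")

print(case_match("Usability", "Usability", "lowercase"))  # False
print(case_match("usability", "Usability"))               # True
```

The first call is False because the text token contains an uppercase character, which the "lowercase" option rules out.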

The following expression returns false, because the title element doesn't contain "usability" in lower-case characters:

//book[@number="1"]/title ftcontains "Usability" lowercase

The following expression returns true, because the character case is not considered:

//book[@number="1"]/title ftcontains "usability" case insensitive

Diacritics Option

FTDiacriticsOption ::= ("diacritics" "insensitive")
| ("diacritics" "sensitive")

A diacritics option modifies token and phrase matching by specifying how diacritics are considered.

There are two possible diacritics options:

The option "diacritics" "insensitive" matches tokens and phrases with and without diacritics. Whether diacritics are written in the query or not is not considered.

The option "diacritics" "sensitive" matches tokens and phrases only if they contain the diacritics as they are written in the query.

The default is "diacritics insensitive".

The effect of the diacritics options is also influenced by the query's default collation (see and ). The following table summarizes how these interact.

Diacritics Matrix
Diacritics option \ Default collation	UCC (Unicode Codepoint Collation)	CDS (some generic diacritics-sensitive collation)	CDI (some generic diacritics-insensitive collation)
diacritics insensitive	UCC comparison, but without considering diacritics	diacritics-insensitive variant of CDS if it exists, else error	CDI
diacritics sensitive	UCC	CDS	diacritics-sensitive variant of CDI if it exists, else error

In this table, "else error" means "Otherwise, an error is raised: ". The phrase "if it exists" is used, because the diacritics-sensitive collation CDS does not always have a diacritics-insensitive variant (and, even if one exists, it may not be possible to determine it algorithmically), and because the diacritics-insensitive collation CDI does not always have a diacritics-sensitive variant (and, even if one exists, it may not be possible to determine it algorithmically).
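For the Unicode Codepoint Collation column, "without considering diacritics" can be approximated by removing combining marks after canonical decomposition. This is one possible interpretation, not a definition; `strip_diacritics` and `diacritics_match` are hypothetical names.

```python
import unicodedata

def strip_diacritics(token):
    """Decompose the token (NFD) and drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def diacritics_match(query_token, text_token, option="diacritics insensitive"):
    if option == "diacritics insensitive":
        return strip_diacritics(query_token) == strip_diacritics(text_token)
    if option == "diacritics sensitive":
        return query_token == text_token
    raise ValueError(f"unknown diacritics option: {option}")

print(diacritics_match("Vera", "V\u00e9ra"))                          # True
print(diacritics_match("Vera", "V\u00e9ra", "diacritics sensitive"))  # False
```

Characters that are not canonically decomposable into a base letter plus a combining mark (for example "ø") are left unchanged by this approach.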

The following expression returns true, because the token "Véra" in the editor element is matched, as the acute accent is not considered in the comparison:

//book[@number="1"]//editor ftcontains "Vera" diacritics insensitive

This returns false, because the editor element does not contain the token "Vera" in this exact form, i.e. without any diacritics:

//book[@number="1"]//editor ftcontains "Vera" diacritics sensitive

Stop Word Option

FTStopWordOption ::= ("with" "stop" "words" FTStopWords FTStopWordsInclExcl*)
| ("without" "stop" "words")
| ("with" "default" "stop" "words" FTStopWordsInclExcl*)
FTStopWords ::= ("at" URILiteral)
| ("(" StringLiteral ("," StringLiteral)* ")")
FTStopWordsInclExcl ::= ("union" | "except") FTStopWords

A stop word option controls matching of FTWords by specifying whether stop words are used or not. Stop words are tokens in the query that match any token in the text being searched. Normally a stop word matches exactly one token, but there may be implementation-defined conditions, under which a stop word may match a different number of tokens.

FTStopWords specifies the list of stop words either explicitly as a comma-separated list of string literals, or by the keyword at followed by a literal URI. If the URI specifies a list of stop words that is not found in the statically known stop word lists, an error is raised . Whether the stop word list is resolved from the statically known stop word lists or given explicitly, no tokenization is performed on the stop words: they are used as they occur in the list.

The "with stop words" option specifies that if a token is within the specified collection of stop words, it is removed from the search and any token may be substituted for it. Stop words retain their position numbers and are counted in FTDistance and FTWindow searches.

Multiple stop word lists may be combined using "union" or "except". The keywords "union" and "except" are applied from left to right. If "union" is specified, every string occurring in the lists specified by the left-hand side or the right-hand side is a stop word. If "except" is specified, only strings occurring in the list specified by the left-hand side but not in the list specified by the right-hand side are stop words.
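The left-to-right combination of stop word lists maps directly onto set operations. In this sketch the function name `combine_stop_words` and the representation of each FTStopWordsInclExcl as a (keyword, list) pair are assumptions made for illustration.

```python
def combine_stop_words(first, *operations):
    """Apply ("union" | "except", word_list) pairs from left to right."""
    result = set(first)
    for keyword, words in operations:
        if keyword == "union":
            result |= set(words)   # words from either side are stop words
        elif keyword == "except":
            result -= set(words)   # words on the right-hand side are removed
        else:
            raise ValueError(f"unknown keyword: {keyword}")
    return result

base = ["a", "the", "then"]
print(sorted(combine_stop_words(base, ("except", ["the", "then"]))))
# ['a']
print(sorted(combine_stop_words(["a"], ("except", ["the"]), ("union", ["the"]))))
# ['a', 'the']
```

The second call shows why the evaluation order matters: applying the same two operations in the opposite order would leave only "a".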

The "with default stop words" option specifies that an implementation-defined collection of stop words is used.

The "without stop words" option specifies that no stop words are used. This is equivalent to specifying an empty list of stop words.

The default is "without stop words".

Some implementations may apply stop word lists during indexing and be unable to comply with query-time requests to not apply those stop words. An implementation may still support stop-word options (and therefore not raise ) by applying any additional stop words specified in the query. Pre-application of irrevocable stop word lists falls under implementation-defined tokenization behavior in this case, and a query that specifies "without stop words" may still have some words ignored.

The following expression returns true, because the document contains the phrase "propagating few errors":

/books/book[@number="1"]//p ftcontains "propagation of errors" with stemming with stop words ("a", "the", "of")

Note the asymmetry in the stop word semantics: the property of being a stop word is only relevant to query terms, not to document terms. Hence, it is irrelevant for the above-mentioned match whether "few" is a stop word or not, and on the other hand we do not want the query above to match "propagation" followed by 2 stop words, or even a sequence of 3 stop words in the document.
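The asymmetry can be made concrete with a phrase matcher in which only query tokens are tested for stop-word status. The sketch assumes that a stop word matches exactly one token, ignores stemming (so the query uses "propagating" rather than "propagation"), and uses a hypothetical function name.

```python
def phrase_match_with_stop_words(text_tokens, query_tokens, stop_words):
    """True if the query phrase occurs in the text, where each query token
    that is a stop word accepts any single text token at its position."""
    stop = {w.lower() for w in stop_words}
    text = [t.lower() for t in text_tokens]
    query = [q.lower() for q in query_tokens]
    n = len(query)
    if n == 0:
        return False
    for start in range(len(text) - n + 1):
        window = text[start:start + n]
        # Only the query side is checked against the stop word list.
        if all(q in stop or q == t for q, t in zip(query, window)):
            return True
    return False

text = "while propagating few errors".split()
print(phrase_match_with_stop_words(
    text, ["propagating", "of", "errors"], ["a", "the", "of"]))  # True
print(phrase_match_with_stop_words(
    text, ["propagating", "errors"], ["few"]))                   # False
```

In the second call "few" is a stop word, but it occurs only in the text, so it still occupies a position between "propagating" and "errors" and the two-token phrase is not found.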

The following expression returns false. In this case specifying "few" as a stop word has no effect, since "few" does not appear in the query. Although the words "propagating" and "errors" appear in the text being searched, the phrase "propagating errors" cannot be matched, since that phrase does not occur.

/books/book[@number="1"]//p ftcontains "propagating errors" with stop words ("few")

The following expression returns false, because "of" is not in the p element between "propagating" and "errors":

/books/book[@number="1"]//p ftcontains "propagation of errors" with stemming without stop words

The following expression uses the stop words list specified at the URL. Assuming that the specified stop word list contains the word "then", this query is reduced to a query on the phrase "planning X conducting", allowing any token as a substitute for X. It returns a book element, because its content element contains "planning then conducting". It would also return the book if the phrases "planning and conducting" and "planning before conducting" had been in its content:

doc("http://bstore1.example.com/full-text.xml") /books/book[count(.//content ftcontains "planning then conducting" with stop words at "http://bstore1.example.com/StopWordList.xml")>0]

The following expression returns books containing "planning then conducting", but does not return books containing "planning and conducting", since it is exempting "then" from being a stop word:

doc("http://bstore1.example.com/full-text.xml") /books/book[count(.//content ftcontains "planning then conducting" with stop words at "http://bstore1.example.com/StopWordList.xml" except ("the", "then"))>0]

Extension Option

An extension option is a match option that acts in an implementation-defined way.

FTExtensionOption ::= "option" QName StringLiteral

An extension option consists of an identifying QName and a StringLiteral. Typically, a particular option will be recognized by some implementations and not by others. The syntax is designed so that option declarations can be successfully parsed by all implementations.

The QName of an extension option must resolve to a namespace URI and local name, using the statically known namespaces.

There is no default namespace for options.

Each implementation recognizes an implementation-defined set of namespace URIs used to denote extension options.

If the namespace part of the QName is not a namespace recognized by the implementation as one used to denote extension options, then the extension option is ignored.

Otherwise, the effect of the extension option, including its error behavior, is implementation-defined. For example, if the local part of the QName is not recognized, or if the StringLiteral does not conform to the rules defined by the implementation for the particular extension option, the implementation may choose whether to report an error, ignore the extension option, or take some other action.

Implementations may impose rules on where particular extension options may appear relative to other match options, and the interpretation of an option declaration may depend on its position.

An extension option must not be used to change the syntax accepted by the processor, or to suppress the detection of static errors. However, it may be used without restriction to modify the set of tokens in the query or how they are matched against tokens in the text being searched. An extension option has the same scope as other match options.

The following examples illustrate several possible uses for extension options:

This extension option is set as part of the static context of all full-text expressions in the module and might be used to ensure that queries are insensitive to Arabic short-vowels.

declare namespace exq = "http://example.org/XQueryImplementation"; declare ft-option option exq:diacritics "short-vowel insensitive"

This extension option applies only to the matching in the full-text selection in which it is found and might be used to specify how compound words should be matched.

declare namespace exq = "http://example.org/XQueryImplementation"; //para[. ftcontains ("Kinder" ftand "Platz" distance exactly 1 words) with stemming option exq:compounds "distance=1" ]

Logical Full-Text Operators

Full-text selections can be combined with the logical connectives ftor (full-text or), ftand (full-text and), not in (mild not), and ftnot (unary full-text not).

FTOr ::= FTAnd ( "ftor" FTAnd )*
FTAnd ::= FTMildNot ( "ftand" FTMildNot )*
FTMildNot ::= FTUnaryNot ( "not" "in" FTUnaryNot )*
FTUnaryNot ::= ("ftnot")? FTPrimaryWithOptions

Or-Selection

An or-selection combines two full-text selections using the ftor operator.

An or-selection finds all matches that satisfy at least one of the operand full-text selections.

The following expression returns the book element written by "Millicent":

//book[.//author ftcontains "Millicent" ftor "Voltaire"]

And-Selection

An and-selection combines two full-text selections using the ftand operator.

An and-selection finds matches that satisfy all of the operand full-text selections simultaneously. A match of an and-selection is formed by combining matches for each of the operand full-text selections as described in .

For example, "usability" ftand "testing" will find two matches in //book[@number="1"]/title: each of the two matches for the FTWords selection "usability" (the two occurrences of "usability" in the string value of the title element) is combined with the single match for the FTWords "testing" (only one occurrence of "testing" in the title). Since the above and-selection has at least one match, the following expression will return "true".

//book[@number="1"]/title ftcontains ("usability" ftand "testing")
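The pairing of operand matches described above can be sketched in Python as a Cartesian product. This is a simplified illustration, not the formal AllMatches construction: each match is reduced to a tuple of token positions, and the positions used for "usability" and "testing" are hypothetical.

```python
from itertools import product

def and_selection(matches_a, matches_b):
    """Combine every match of the first operand with every match of the second."""
    return [a + b for a, b in product(matches_a, matches_b)]

usability = [(3,), (12,)]  # two occurrences (hypothetical positions)
testing = [(13,)]          # one occurrence (hypothetical position)

print(and_selection(usability, testing))
# [(3, 13), (12, 13)]
```

If either operand has no matches, the product is empty, so the and-selection has no matches either.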

The following expression returns false, because "Millicent" and "Montana" are not contained by the same author element in any book element:

//book/author ftcontains "Millicent" ftand "Montana"

No author element in any book element contains both "Millicent" and "Montana". Therefore, for any such author element, there is either one match for the FTWords "Millicent" and zero matches for the FTWords "Montana", or vice versa, or no match for either of them. In each of these cases, the and-selection has zero matches.

Mild-Not Selection

A mild-not selection combines two full-text selections using the not in operator.

The not in operator is a milder form of the operator combination ftand ftnot. The selection A not in B matches a token sequence that matches A, but not when it is a part of a match of B. In contrast, A ftand ftnot B only finds matches when the token sequence contains A and does not contain B.

As an example, consider a search for "Mexico" not in "New Mexico". This may return, among others, a document which is all about "Mexico" but mentions at the end that "New Mexico was named after Mexico". The occurrence of "Mexico" in "New Mexico" is not considered, but other occurrences of "Mexico" are matched. Note that this document would not be matched by the full-text selection "Mexico" ftand ftnot "New Mexico".

A match to a mild-not selection must contain at least one token that satisfies the first condition and does not satisfy the second condition. If it contains a token that satisfies both the first and the second condition, the token is not considered as a match.
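The "Mexico" not in "New Mexico" example can be traced with a short Python sketch. It is a simplification under stated assumptions: matches are modelled as (start, end) token positions, "part of a match of B" is read as positional containment, and the helper names are hypothetical rather than taken from this specification.

```python
def find_phrase(tokens, phrase):
    """Return 1-based (start, end) positions of each occurrence of phrase."""
    n = len(phrase)
    return [(i + 1, i + n) for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == phrase]

def mild_not(matches_a, matches_b):
    """Keep the matches of A that do not lie inside any match of B."""
    return [(s, e) for (s, e) in matches_a
            if not any(bs <= s and e <= be for (bs, be) in matches_b)]

tokens = "new mexico was named after mexico".split()
a = find_phrase(tokens, ["mexico"])         # [(2, 2), (6, 6)]
b = find_phrase(tokens, ["new", "mexico"])  # [(1, 2)]
print(mild_not(a, b))
# [(6, 6)]
```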

The following expression returns true, because "usability" appears in the title and the p elements and the token within the phrase "Usability Testing" in the title element is not considered:

/books/book ftcontains "usability" not in "usability testing"

Operands of a mild-not selection may not contain a full-text selection that evaluates to an AllMatches that contains a StringExclude. Such full-text selections are not-selections and FTWords with a cardinality constraint that uses an at most, from ... to, or exactly occurrences range. If such an expression is encountered, an error is raised.

Not-Selection

A not-selection is a full-text selection starting with the prefix operator ftnot.

A not-selection selects matches that do not satisfy the operand full-text selection. Details about how such matches are constructed are given in .

The following expression returns the empty sequence, because all book elements contain "usability":

//book[. ftcontains ftnot "usability"]

The following expression returns true, because book elements contain "information" and "retrieval" but not "information retrieval":

//book ftcontains "information" ftand "retrieval" ftand ftnot "information retrieval"

The following expression returns book elements containing "web site usability" but not "usability testing":

//book[. ftcontains "web site usability" ftand ftnot "usability testing"]

Positional Filters

FTPosFilter ::= FTOrder | FTWindow | FTDistance | FTScope | FTContent

Positional filters are postfix operators that serve to filter matches based on various constraints on their positional information.

Recall that the grammar rule for FTSelection allows an arbitrary number of positional filters to follow an FTOr. Multiple adjacent positional filters are applied from left to right, i.e., the first filter is applied to the result of the FTOr, the second is applied to the result of that first application, and so on.

Ordered Selection

FTOrder ::= "ordered"

An ordered selection consists of a full-text selection followed by the postfix operator "ordered". An ordered selection constrains the order of tokens and phrases to be the same as the order in which they are written in the operand selection.

The default is unordered. Unordered is in effect when ordered is not specified in the query. Unordered cannot be written explicitly in the query.

An ordered selection selects matches which satisfy the operand full-text selection and which also satisfy the following constraint: the order that the matching tokens or phrases have in the text being searched is the same order that the corresponding query tokens or phrases have in the operand selection. In both cases, the ordering is determined from the minimum start positions of the constituent tokens.
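The ordering constraint can be sketched in Python as a pairwise comparison. This is a hedged illustration only: each matched token or phrase is reduced to a hypothetical pair (query position, minimum start position in the text), and the function name `is_ordered` is not defined by this specification.

```python
def is_ordered(includes):
    """includes: list of (query_pos, start_pos) pairs for one match.
    True if, for every pair, text order agrees with query order."""
    return all(s1 <= s2
               for q1, s1 in includes
               for q2, s2 in includes
               if q1 <= q2)

# First query item found at position 6, second at position 12: ordered.
print(is_ordered([(1, 6), (2, 12)]))   # True
# First query item found after the second one: not ordered.
print(is_ordered([(1, 20), (2, 5)]))   # False
```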

The following expression returns true, because titles of book elements contain "web site" and "usability" in the order in which they are written in the query, i.e., "web site" must precede "usability":

//book/title ftcontains ("web site" ftand "usability") ordered

The following expression returns false, because although "Montana" and "Millicent" both appear in the book element, they do not appear in the order they are written in the query:

//book[@number="1"] ftcontains ("Montana" ftand "Millicent") ordered

Window Selection

FTWindow ::= "window" AdditiveExpr FTUnit
FTUnit ::= "words" | "sentences" | "paragraphs"

A window selection consists of a full-text selection followed by one of the (complex) postfix operators derived from FTWindow. A window selection selects matches which satisfy the operand full-text selection and for which the matched tokens and phrases, more precisely the individual StringIncludes of that match, are found within a number of FTUnits (words, sentences, and paragraphs). The number of FTUnits is specified by an AdditiveExpr that is converted as though it were an argument to a function with the expected type of xs:integer.

A window selection may cross element boundaries. The size of the window is not affected by the presence or absence of element boundaries. Stop words are included in the computation of the window size whether they are ignored by the query or not.

A window selection examines the matches generated by the preceding portion of the FTSelection, and selects those for which the matched tokens and phrases (more precisely, the individual StringIncludes of that match) are all found within a window whose size is a specified number of FTUnits (words, sentences, or paragraphs); for each such window, the window selection then generates a match containing the merge of those StringIncludes, plus any StringExcludes that fall within the window.

The following expression returns true, because "web", "site", and "usability" are within a window of 5 tokens in the title element:

/books/book/title ftcontains "web" ftand "site" ftand "usability" window 5 words

The following expression returns true, because "web" and "site" in the order they are written in the query and either "usability" or "testing" are within a window of at most 10 tokens:

/books/book ftcontains ("web" ftand "site" ordered) ftand ("usability" ftor "testing") window 10 words

The following expression returns true, because the title element contains "Web Site Usability". A similar query on the p element would not return true, because its occurrences of "web site" and "usability" are not within a window of 3:

/books/book//title ftcontains "web site" ftand "usability" window 3 words

The following expression returns the sample book element, because its number attribute is 1 and it contains a window of 2 words which contains an occurrence of "efficient" but not an occurrence of "and". There is just one such matching window in the sample text and it contains "enable efficient".

/books/book[@number="1" and . ftcontains "efficient" ftand ftnot "and" window 2 words]

The following expression returns the empty sequence, because in the selected book element, there is no occurrence of "efficient" within a window of 3 tokens which would not also contain an occurrence of "and":

/books/book[@number="1" and . ftcontains "efficient" ftand ftnot "and" window 3 words]

In order to allow meaningful results for nested positional filters, e.g., a window selection embedded inside a distance selection, the resulting matches for window selections are formed from the input matches that satisfy the window constraint as follows. All StringIncludes of such a match are coerced into a single StringInclude that spans all token positions from the smallest to the largest position of any input StringIncludes. This is explained in more detail in Section .
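For the unit words, the window test and the resulting single spanning StringInclude can be sketched in Python as follows. This is a simplified illustration: StringExcludes and the sentence and paragraph units are left out, the function name `window_match` is hypothetical, and the positions assume the token sequence "The usability of a Web site" numbered from 1.

```python
def window_match(includes, size):
    """includes: (start, end) token positions of the StringIncludes of one match.
    Return one spanning (start, end) if they all fit in a window of
    `size` words, otherwise None."""
    lo = min(start for start, _ in includes)
    hi = max(end for _, end in includes)
    return (lo, hi) if hi - lo + 1 <= size else None

# "usability" at 2, "Web" at 5, "site" at 6
match = [(2, 2), (5, 5), (6, 6)]
print(window_match(match, 5))  # (2, 6)
print(window_match(match, 4))  # None
```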

Distance Selection

FTDistance ::= "distance" FTRange FTUnit
FTRange ::= ("exactly" AdditiveExpr)
| ("at" "least" AdditiveExpr)
| ("at" "most" AdditiveExpr)
| ("from" AdditiveExpr "to" AdditiveExpr)

A distance selection consists of a full-text selection followed by one of the (complex) postfix operators derived from FTDistance.

A distance selection selects matches which satisfy the operand full-text selection and for which the matched tokens and phrases satisfy the specified distance conditions.

Distances in the search context are measured in units of tokens, sentences, or paragraphs. Roughly speaking, the distance between two matches is the number of intervening units, so a distance of zero tokens (sentences, paragraphs) means no intervening tokens (sentences, paragraphs). More precisely, given two matches, we first determine their order by sorting on starting position and if necessary on ending position. Let M1 be the "earlier" and M2 be the "later". (If there are overlapping tokens involved, the designations "earlier" and "later" may not be intuitively obvious.) Then the distance between the two is M2's starting position minus M1's ending position, minus 1.
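The distance computation for the unit words can be written out as a short Python sketch. The function name `word_distance` and the (start, end) tuple representation are assumptions made for illustration; only the arithmetic comes from the paragraph above.

```python
def word_distance(m1, m2):
    """m1, m2: (start, end) token positions of two matches.
    Order them by start position, then by end position, and return
    the later start minus the earlier end, minus 1."""
    earlier, later = sorted([m1, m2])
    return later[0] - earlier[1] - 1

print(word_distance((5, 5), (6, 6)))  # 0  (adjacent tokens)
print(word_distance((5, 5), (2, 2)))  # 2  (two intervening tokens)
print(word_distance((3, 3), (2, 4)))  # -2 (overlapping tokens)
```

The last call shows that overlapping tokens can yield a negative distance.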

When computing distances in the search context, a distance selection may cross element boundaries; they affect the distance computed only to the extent that they affect the tokenization of the search context. Stop words are counted in those computations whether they are ignored or not.

When a distance selection applies a distance condition to more than two matches, the distance condition is required to hold on each successive pair of matches.

An FTDistance expresses a distance condition in terms of an FTUnit and an FTRange. An FTUnit can be words, sentences, or paragraphs, where words refers to a distance measured in tokens.

An FTRange specifies a range of integer values by providing a minimum and/or maximum value for some integer quantity. (Here, where the FTRange appears in an FTDistance, that quantity is a distance. When it appears in an FTTimes, the quantity is a number of occurrences.) Each one of the AdditiveExpr specified in an FTRange is converted as though it were an argument to a function with the expected parameter type of xs:integer.

Let the value of the first (or only) operand be M. If "from" is specified, let the value of the second operand be N.

If "exactly" is specified, then the range is the closed interval [M, M]. If "at least" is specified, then the range is the half-closed interval [M, unbounded). If "at most" is specified, then the range is the half-closed interval (unbounded, M]. If "from-to" is specified, then the range is the closed interval [M, N]. Note: If M is greater than N, the range is empty.

Here are some examples of FTRanges:

'exactly 0' specifies the range [0, 0].

'at least 1' specifies the range [1,unbounded).

'at most 1' specifies the range (unbounded, 1].

'from 5 to 10' specifies the range [5, 10].
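The four forms of FTRange map directly onto interval bounds. The Python sketch below is illustrative: the names `ft_range` and `in_range` are hypothetical, and infinities stand in for "unbounded".

```python
import math

def ft_range(kind, m, n=None):
    """Return (low, high) bounds for an FTRange."""
    if kind == "exactly":
        return (m, m)
    if kind == "at least":
        return (m, math.inf)
    if kind == "at most":
        return (-math.inf, m)
    if kind == "from":
        return (m, n)
    raise ValueError(f"unknown range kind: {kind}")

def in_range(value, bounds):
    """True if value lies in the closed interval given by bounds."""
    low, high = bounds
    return low <= value <= high

print(ft_range("from", 5, 10))                # (5, 10)
print(in_range(7, ft_range("from", 5, 10)))   # True
print(in_range(7, ft_range("from", 10, 5)))   # False (empty range)
```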

The following expression returns false, because "completion" and "errors" are less than 11 tokens apart:

/books/book ftcontains ("completion" ftand "errors" distance at least 11 words)

The following expression returns false:

/books/book ftcontains "web" ftand "site" ftand "usability" distance at most 2 words

The search context does contain the phrase "The usability of a Web site", in which the tokens "usability" and "Web" have a distance of 2 words, and the tokens "Web" and "site" have a distance of 0 words, both of which satisfy the constraint distance at most 2 words. However, the problem is that "usability" and "site" have a distance of 3 words, which does not satisfy the constraint, and so the distance selection yields no matches, and the expression as a whole yields false. (The phrase "Improving Web Site Usability" would satisfy the given full-text selection, but it occurs in an attribute value, and so is not subject to tokenization.)

The following expression returns the empty sequence, because between any token "usability" and the token in any occurrence of the phrase "web site" that is the nearest to the token "usability" there is always more than one intervening token:

/books/book[.//p ftcontains "web site" ftand "usability" distance at most 1 words]

The following expression returns the book title, because for the occurrences of the tokens "web" and "users" in the note element only one intervening token appears:

/books/book[. ftcontains "web" ftand "users" distance at most 1 words]/title

In order to allow meaningful results for nested positional filters, e.g., a distance selection embedded inside another distance selection, the resulting matches for distance selections are formed from the input matches that satisfy the distance constraint as follows. All StringIncludes of such a match are coerced into a single StringInclude that spans all token positions from the smallest to the largest position of any input StringIncludes. Thus, a distance selection that embeds a window or a distance selection takes the result of the embedded selection as a single unit.

The following gives an example of nested distance selections:

/books/book ftcontains ((("richard" ftand "nixon") distance at most 2 words) ftand (("george" ftand "bush") distance at most 2 words) distance at least 20 words)

This expression finds book elements that contain, for instance, "Richard M. Nixon" and "George W. Bush" at least 20 words apart. The matches for the inner distance selections are treated as single units (represented by StringIncludes) by the outer distance selection. If such phrases are present in the search context, the outer distance selection enforces a constraint on the number of intervening tokens ("at least 20") between the last token of "Richard M. Nixon" and the first token of "George W. Bush".

Scope Selection

FTScope ::= ("same" | "different") FTBigUnit
FTBigUnit ::= "sentence" | "paragraph"

A scope selection consists of a full-text selection followed by one of the (complex) postfix operators derived from FTScope.

A scope selection selects matches which satisfy the operand full-text selection and for which the matched tokens and phrases are contained in the same scope or in different scopes.

Possible scopes are sentences and paragraphs.

By default, there are no restrictions on the scope of the matches.

The following expression returns false, because the tokens "usability" and "Marigold" are not contained within the same sentence:

//book ftcontains "usability" ftand "Marigold" same sentence

The following expression returns true, because the tokens "usability" and "Marigold" are contained within different sentences:

//book ftcontains "usability" ftand "Marigold" different sentence

The following expression returns a book element, because it contains "usability" and "testing" in the same paragraph:

//book[. ftcontains "usability" ftand "testing" same paragraph]

The following expression returns a book element, because "site" and "errors" appear in the same sentence:

//book[. ftcontains "site" ftand "errors" same sentence]

It is possible that both "same sentence" and "different sentence" conditions are simultaneously satisfied for several tokens and/or phrases within the same document fragment. This can be observed if there are occurrences of the tokens and/or phrases both within the same sentence and within different sentences. For example, consider the following document fragment.

<introduction> ... The usability of a Web site is how well the site supports the user in achieving specified goals. ... Expert reviews and usability testing are methods of identifying problems in layout, terminology, and navigation. ... </introduction>

This sample will satisfy both conditions ("usability" ftand "reviews") different sentence and ("usability" ftand "reviews") same sentence. The tokens "usability" and "reviews" occur both in different sentences (the first and second sentences shown) and in the same sentence (the second sentence shown).

The above observation also holds for the "same paragraph" and "different paragraph" conditions.
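The observation that both conditions can hold at once follows from the fact that each condition only needs one qualifying combination of occurrences. A minimal Python sketch for two operands, assuming each occurrence has already been assigned a sentence number (the function name and the numbering are hypothetical):

```python
from itertools import product

def scope_holds(sentences_a, sentences_b, mode):
    """sentences_a, sentences_b: sentence numbers in which each operand occurs.
    mode is "same" or "different"."""
    pairs = list(product(sentences_a, sentences_b))
    if mode == "same":
        return any(a == b for a, b in pairs)
    return any(a != b for a, b in pairs)

usability = [1, 2]  # occurs in both sentences shown
reviews = [2]       # occurs only in the second sentence shown

print(scope_holds(usability, reviews, "same"))       # True
print(scope_holds(usability, reviews, "different"))  # True
```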

Anchoring Selection

FTContent ::= ("at" "start") | ("at" "end") | ("entire" "content")

An anchoring selection consists of a full-text selection followed by one of the postfix operators "at start", "at end", or "entire content".

An anchoring selection selects matches which satisfy the operand full-text selection and for which the matched tokens and phrases are the first, last, or all tokens in the tokenized form of the items being searched.

Using the "at start" operator, tokens or phrases are matched, if they cover the first token position in the tokenized string value of the item being searched.

Using the "at end" operator, tokens or phrases are matched, if they cover the last token position in the tokenized string value of the item being searched.

Using the "entire content" operator, tokens or phrases are matched, if they cover all token positions of the tokenized string value of the item being searched.
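The three anchoring tests reduce to simple checks on the token positions covered by a match. The Python sketch below is illustrative: the function name `anchored`, the 1-based positions, and the 11-token item are assumptions, not definitions from this specification.

```python
def anchored(covered, token_count, anchor):
    """covered: set of 1-based token positions covered by a match.
    token_count: number of tokens in the item being searched."""
    if anchor == "at start":
        return 1 in covered
    if anchor == "at end":
        return token_count in covered
    if anchor == "entire content":
        return covered == set(range(1, token_count + 1))
    raise ValueError(f"unknown anchor: {anchor}")

# An item with 11 tokens; a single-token match on the last position.
print(anchored({11}, 11, "at end"))                        # True
print(anchored({11}, 11, "at start"))                      # False
print(anchored(set(range(1, 12)), 11, "entire content"))   # True
```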

The following expression returns each title element starting with the phrase "improving the usability of a web site":

/books//title[. ftcontains "improving the usability of a web site" at start]

The following expression returns the p element of the sample, because it ends with the phrase "propagating few errors":

/books//p[. ftcontains "propagat.*" with wildcards ftand "few errors" distance at most 2 words at end]

Since the distance operator doesn't imply an ordering, the last example would also yield a match if the p element ended with, say, "few errors are propagated".

The following expression returns each note element whose entire content is "this book has been approved by the web site users association":

/books//note[. ftcontains "this book has been approved by the web site users association" entire content]

The following example returns true because both the content and the note elements match:

/books//* ftcontains "Association" at end

Ignore Option

FTIgnoreOption ::= "without" "content" UnionExpr

The ignore option specifies a set of nodes whose contents are ignored. It is applicable only to a top-level FTSelection (see FTContainsExpr). The ignored nodes are identified by the XQuery expression UnionExpr. The value of the UnionExpr must be a sequence of zero or more nodes; otherwise a type error is raised.

Let I1, I2, ..., In be the sequence of items of the search context and let N1, N2, ..., Nk be the sequence of nodes that UnionExpr evaluates to. For each Ij (j=1..n) a copy is made that omits each node Ni (i=1..k). Those copies form the new search context. If UnionExpr evaluates to an empty sequence no nodes are omitted.

In the following fragment, if $x//annotation is ignored, "Web Usability" will be found 2 times: once in the title element and once in the editor element. The 2 occurrences in the 2 annotation elements are ignored. On the other hand, "expert" will not be found, as it appears only in an annotation element.

let $x := <book> <title>Web Usability and Practice</title> <author>Montana <annotation> this author is an expert in Web Usability</annotation> Marigold </author> <editor>Véra Tudor-Medina on Web <annotation> best editor on Web Usability</annotation> Usability </editor> </book>
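The counts stated for this fragment can be checked with a Python sketch that builds a copy of the book element without its annotation elements and then searches the tokenized string value. This is an illustration under assumptions: the helper names are hypothetical, tokens are taken to be runs of word characters, and matching is case-insensitive.

```python
import copy
import re
import xml.etree.ElementTree as ET

BOOK = (
    "<book> <title>Web Usability and Practice</title> "
    "<author>Montana <annotation> this author is an expert in Web Usability"
    "</annotation> Marigold </author> "
    "<editor>Véra Tudor-Medina on Web <annotation> best editor on Web Usability"
    "</annotation> Usability </editor> </book>"
)

def without_content(root, tag):
    """Return a copy of root that omits every descendant element named tag."""
    clone = copy.deepcopy(root)
    for parent in list(clone.iter()):
        previous = None
        for child in list(parent):
            if child.tag == tag:
                # Keep the text that follows the ignored element (its tail).
                tail = child.tail or ""
                if previous is None:
                    parent.text = (parent.text or "") + tail
                else:
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
            else:
                previous = child
    return clone

def tokens(elem):
    """Lower-cased word tokens of the element's string value."""
    return re.findall(r"\w+", "".join(elem.itertext()).lower())

def count_phrase(toks, phrase):
    """Number of positions at which the phrase occurs in toks."""
    n = len(phrase)
    return sum(toks[i:i + n] == phrase for i in range(len(toks) - n + 1))

book = ET.fromstring(BOOK)
pruned = without_content(book, "annotation")
print(count_phrase(tokens(pruned), ["web", "usability"]))  # 2
print("expert" in tokens(pruned))                          # False
```

With the annotations removed, "Web" and "Usability" in the editor element become adjacent tokens, which is why the phrase is found there.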

By default, no element content is ignored.

Nodes MAY be ignored during indexing and during query processing. The ignore option applies only to query processing. Whether and how indexing ignores nodes is out of scope for this specification.

Extension Selections

An extension selection is a full-text selection whose semantics are implementation-defined. Typically, a particular extension will be recognized by some implementations and not by others. The syntax is designed so that extension selections can be successfully parsed by all implementations, and so that fallback behavior can be defined for implementations that do not recognize a particular extension.

FTExtensionSelection ::= Pragma+ "{" FTSelection? "}"
Pragma ::= "(#" S? QName (S PragmaContents)? "#)"
PragmaContents ::= (Char* - (Char* '#)' Char*))

An extension selection consists of one or more pragmas followed by a full-text selection enclosed in curly braces. See for information on pragmas in general. A pragma is denoted by the delimiters (# and #), and consists of an identifying QName followed by implementation-defined content. The content of a pragma may consist of any string of characters that does not contain the ending delimiter #). The QName of a pragma must resolve to a namespace URI and local name, using the statically known namespaces.

Since there is no default namespace for pragmas, a pragma QName must include a namespace prefix.

Each implementation recognizes an implementation-defined set of namespace URIs used to denote pragmas.

If the namespace part of a pragma QName is not recognized by the implementation as a pragma namespace, then the pragma is ignored. If all the pragmas in an FTExtensionSelection are ignored, then the full-text extension selection is just the full-text selection enclosed in curly braces; if this full-text selection is absent, then a static error is raised.

If an implementation recognizes the namespace of one or more pragmas in an FTExtensionSelection, then the value of the FTExtensionSelection, including its error behavior, is implementation-defined. For example, an implementation that recognizes the namespace of a pragma QName, but does not recognize the local part of the QName, might choose either to raise an error or to ignore the pragma.

It is a static error if an implementation recognizes a pragma but determines that its content is invalid.

If an implementation recognizes a pragma, it must report any static errors in the following full-text selection even if it will not apply that selection.
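The fallback rules above can be summarized as a small decision procedure. The Python sketch below is a hedged illustration only: the recognized namespace set, the `StaticError` class, and the string returned for the implementation-defined case are all placeholders.

```python
class StaticError(Exception):
    """Stands in for the static error raised when no fallback exists."""

# Assumed set of pragma namespaces that this hypothetical implementation recognizes.
RECOGNIZED = {"http://example.org/XQueryImplementation"}

def resolve_extension(pragma_namespaces, fallback, recognized=RECOGNIZED):
    """Decide what an extension selection evaluates to.
    pragma_namespaces: namespace URI of each pragma QName.
    fallback: the selection in curly braces, or None if it is absent."""
    if any(ns in recognized for ns in pragma_namespaces):
        return "implementation-defined"
    if fallback is None:
        raise StaticError("all pragmas ignored and no fallback selection")
    return fallback

print(resolve_extension(["http://example.com/other"], "'Berners-Lee'"))
# 'Berners-Lee'
```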

The following examples illustrate three ways in which extension selections might be used.

A pragma can be used to furnish a hint for how to evaluate the following full-text selection, without actually changing the result. For example:

declare namespace exq = "http://example.org/XQueryImplementation"; /books/book/author[name ftcontains (# exq:use-index #) {'Berners-Lee'}]

An implementation that recognizes the exq:use-index pragma might use an index to evaluate the full-text selection that follows. An implementation that does not recognize this pragma would evaluate the full-text selection in its normal way.

A pragma might be used to modify the semantics of the following full-text selection in ways that would not (in the absence of the pragma) be conformant with this specification. For example, a pragma might be used to change distance counting so that adjacent words are at a distance of 1 (otherwise they would be at a distance of 0):

declare namespace exq = "http://example.org/XQueryImplementation"; /books/book[.//p ftcontains (# exq:distance #) { "web site" ftand "usability" distance at most 1 words }]

Such changes to the language semantics must be scoped to the expression contained within the curly braces following the pragma.

A pragma might contain syntactic constructs that are evaluated in place of the following full-text selection. In this case, the following selection itself (if it is present) provides a fallback for use by implementations that do not recognize the pragma. For example:

declare namespace exq = "http://example.org/XQueryImplementation"; //city[. ftcontains (# exq:classifier with class 'Animals' #) {"animal" with thesaurus at "http://example.org/thesaurus.xml" relationship "RT"}]

Here an implementation that recognizes the pragma will return the result of evaluating the proprietary syntax with class 'Animals', while an implementation that does not recognize the pragma will instead return the result of the thesaurus option. If no fallback expression is required, or if none is feasible, then the expression between the curly braces may be omitted, in which case implementations that do not recognize the pragma will raise a static error.

Semantics

This section describes the formal semantics of XQuery and XPath Full Text 1.0. The figure below shows how XQuery and XPath Full Text 1.0 integrates with XQuery 1.0 and XPath 2.0.

The following diagram represents the interaction of XQuery and XPath Full Text with the rest of XQuery 1.0 and XPath 2.0. It illustrates how full-text expressions can be nested within XQuery 1.0 and XPath 2.0 expressions and vice versa.

Step 1 represents the composability of XQuery 1.0 and XPath 2.0 expressions and the fact that such expressions evaluate to a sequence of XDM items. This process is outside the scope of this document and will not be discussed further.

Step 2 shows how XQuery 1.0 and XPath 2.0 expressions can be nested within full-text expressions. If an XQuery 1.0 and XPath 2.0 expression is nested on the left-hand side of an FTContains expression or within FTWords, the sequence of XDM items that result from evaluation of that XQuery 1.0 or XPath 2.0 expression are converted to their tokenized form, as described in Tokenization. If the XQuery 1.0 and XPath 2.0 expression is nested within another type of FTSelection, the items in its result sequence are converted to atomic values, as discussed in FTSelections.

Step 3 represents the composability of FTSelections. Each FTSelection operates on zero or more AllMatches and returns an AllMatches. The process is described in the Evaluation of FTSelections section.

Step 4 shows how XQuery and XPath Full Text 1.0 and scoring expressions can be nested into XQuery 1.0 and XPath 2.0 expressions. The sections and describe how this is achieved.

In the list above and throughout the rest of this section, bold typeface has been used to distinguish the concepts that are part of the AllMatches model.

The functions and schemas defined in this section are considered to be within the fts: namespace (as discussed in section ). These functions and schemas are used only for describing the semantics. There is no requirement that an implementation of this specification must use the functions, schemas, or algorithms described in this section of this specification. The only requirement is that implementations must achieve the same results that an implementation that does use these functions, schemas, and algorithms would achieve.

Note that by using XQuery 1.0 and XPath 2.0 to specify the formal semantics, we avoid the need to introduce new formalism. We simply reuse the formal semantics of XQuery 1.0 and XPath 2.0.

Tokenization

Formally, tokenization is the process of converting an XDM item to a collection of tokens, taking any structural information of the item into account to identify token, sentence, and paragraph boundaries. Each token is assigned a starting and ending position.

Tokenization, including the definition of the term "token", SHOULD be implementation-defined. Implementations SHOULD expose the rules and sample results of tokenization as much as possible to enable users to predict and interpret the results of tokenization. Tokenization MUST conform to these constraints:

Each token MUST consist of one or more characters.

Tokenization of an item MUST include only tokens derived from the string value of that item.

The tokenizer SHOULD, when tokenizing two equal items, identify the same tokens in each. The cases where it does not are implementation-defined.

The starting and ending position of a token MUST be integers, and the starting position MUST be less than or equal to the ending position.

In the tokenization of an item, consider the range of token positions from the smallest starting position to the largest ending position; every token position in that range must be covered by some token in the tokenization. That is, for every token position P, there must exist some token T such that T's starting position <= P <= T's ending position.

The tokenizer MUST preserve the containment hierarchy (paragraphs contain sentences contain tokens) by adhering to the following constraints:

Each token is contained in at most one sentence and at most one paragraph. (In particular, this means that no tokens of any sentence are contained in any other sentence, and no tokens of any paragraph are contained in any other paragraph.)

All tokens of a sentence are contained in at most one paragraph.

The range of token positions from the smallest starting position to the largest ending position in a sentence does not overlap with the token position range from any other sentence.

The range of token positions from the smallest starting position to the largest ending position in a paragraph does not overlap with the token position range from any other paragraph.
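Several of these constraints are mechanical enough to check in code. The Python sketch below validates a candidate tokenization; it is not a tokenizer and not part of this specification. The `Token` record, the function names, and the assumptions that every token carries exactly one sentence identifier and one paragraph identifier (with sentence identifiers unique across paragraphs) are all illustrative.

```python
from collections import namedtuple

Token = namedtuple("Token", "text start end sentence paragraph")

def span(tokens):
    """Smallest starting position and largest ending position of tokens."""
    return (min(t.start for t in tokens), max(t.end for t in tokens))

def overlaps(a, b):
    """True if the closed position ranges a and b share a position."""
    return a[0] <= b[1] and b[0] <= a[1]

def check_tokenization(tokens):
    """Return a list of violated constraints (empty if none were found)."""
    problems = []
    for t in tokens:
        if len(t.text) < 1:
            problems.append("empty token")
        if not (isinstance(t.start, int) and isinstance(t.end, int)
                and t.start <= t.end):
            problems.append(f"bad positions for {t.text!r}")
    if problems or not tokens:
        return problems
    lo, hi = span(tokens)
    for p in range(lo, hi + 1):
        if not any(t.start <= p <= t.end for t in tokens):
            problems.append(f"position {p} not covered")
    for unit in ("sentence", "paragraph"):
        groups = {}
        for t in tokens:
            groups.setdefault(getattr(t, unit), []).append(t)
        spans = [span(g) for g in groups.values()]
        for i in range(len(spans)):
            for j in range(i + 1, len(spans)):
                if overlaps(spans[i], spans[j]):
                    problems.append(f"overlapping {unit} ranges")
    paragraphs_of = {}
    for t in tokens:
        paragraphs_of.setdefault(t.sentence, set()).add(t.paragraph)
    if any(len(p) > 1 for p in paragraphs_of.values()):
        problems.append("sentence spans more than one paragraph")
    return problems

good = [Token("a", 1, 1, 1, 1), Token("b", 2, 2, 1, 1), Token("c", 3, 3, 2, 1)]
print(check_tokenization(good))  # []
```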

Useful information for tokenizer implementors may be found in .

Usually, the starting and ending positions of a token are the same. For some languages, some tokenizers may identify overlapping tokens. For example, the German word "Donaudampfschifffahrtskapitaensmuetze" might be tokenized into the following tokens: "Donaudampfschifffahrtskapitaensmuetze", "Donau", "dampf", "schiff", "dampfschiff", "kapitaen", "muetze", "kapitaensmuetze", "schifffahrt", "dampfschifffahrt", and perhaps others. In the face of overlapping tokens, it is implementation-dependent what positions a tokenizer assigns to each such token. For example, a tokenizer might assign the same position value to each of the tokens "Donaudampfschifffahrtskapitaensmuetze", "Donau", "dampf", "schiff", "dampfschiff", etc. In that case, the distance between each (overlapping) token assigned the same position is -1. Tokenizers might retain additional information about those overlapping tokens that allows the full-text implementation to distinguish among them.

Consider the sentence "Ich sehe den Dampfschifffahrtskapitän auf dem Fluß." If an implementation tokenizes "Dampfschifffahrtskapitän" as overlapping tokens at the same position, then the implementation could still determine that the query "'Schifffahrt Dampf' window 0 words ordered" fails to match the sentence because phrase matching is implementation-defined and may make use of additional implementation-dependent token information.

Even more complex situations can arise. Consider, for example, the German sentence "Er stellte sie vor." A sophisticated tokenizer might construct the token "vorstellen" covering positions 2 through 4, which overlaps the token "sie" at position 3. For the purposes of distance calculations, tokens are considered in the order of their starting positions, so the distance between "vorstellen" and "sie" would be 3-4-1=-2. (See fts:wordDistance, below.)

Examples

The following expression must return false, because 'secret' occurs only within an attribute and a comment, neither of which contributes characters to the string value of the 'p' element node:

Sensitive material  ftcontains 'secret'

The following document may lead to overlapping tokens to account for the ambiguity caused by the hyphen:

I will re- sign tomorrow.

The following document fragment is the source document for examples in this section. A sample tokenization is used for the examples in this section. The results might be different for other tokenizations.

Unless stated otherwise, the results assume a case-insensitive match.

<offers>
  <offer id="1000" price="10000">
    Ford Mustang 2000, 65K, excellent condition, runs great, AC, CC, power all
  </offer>
  <offer id="1001" price="8000">
    Honda Accord 1999, 78K, A/C, cruise control, runs and looks great, excellent condition
  </offer>
  <offer id="1005" price="5500">
    Ford Mustang, 1995, 150K highway mileage, no rust, excellent condition
  </offer>
</offers>

In this sample tokenization, tokens are delimited by punctuation and whitespace symbols.

The token "Ford" is at relative position 1.

The token "Mustang" is at relative position 2.

The token "2000" is at relative position 3.

Relative position numbers are assigned sequentially through the end of the document.

Hence in this example each token occupies exactly one position, and no overlapping of tokens occurs. The relative positions of tokens are shown below in parentheses.

<offers>
  <offer id="1000" price="10000">
    Ford(1) Mustang(2) 2000(3), 65K(4), excellent(5) condition(6), runs(7) great(8), AC(9), CC(10), power(11) all(12)
  </offer>
  <offer id="1001" price="8000">
    Honda(13) Accord(14) 1999(15), 78K(16), A(17)/C(18), cruise(19) control(20), runs(21) and(22) looks(23) great(24), excellent(25) condition(26)
  </offer>
  <offer id="1005" price="5500">
    Ford(27) Mustang(28), 1995(29), 150K(30) highway(31) mileage(32), no(33) rust(34), excellent(35) condition(36)
  </offer>
</offers>

The relative positions of paragraphs are determined similarly. In this sample tokenization, the paragraph delimiters are start tags and end tags.

The tokens in the first 'offer' element are assigned relative paragraph number 1.

The tokens from the next 'offer' element are assigned relative paragraph number 2.

Relative paragraph numbers are assigned sequentially through the end of the document.

The relative positions of sentences are determined similarly using sentence delimiters.
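The sample tokenization above can be reproduced with a few lines of code. This is a minimal sketch in Python, assuming that a token is any run of ASCII letters and digits and that each 'offer' element forms one paragraph; real tokenizers are implementation-defined and may behave differently. The `tokenize` function and the `SAMPLE` constant are names introduced here for illustration.

```python
import re
import xml.etree.ElementTree as ET

# The source document from this section (whitespace normalized).
SAMPLE = """<offers>
  <offer id="1000" price="10000">Ford Mustang 2000, 65K, excellent condition, runs great, AC, CC, power all</offer>
  <offer id="1001" price="8000">Honda Accord 1999, 78K, A/C, cruise control, runs and looks great, excellent condition</offer>
  <offer id="1005" price="5500">Ford Mustang, 1995, 150K highway mileage, no rust, excellent condition</offer>
</offers>"""


def tokenize(xml_text):
    """Return (token, position, paragraph) triples for the sample tokenization."""
    root = ET.fromstring(xml_text)
    result = []
    position = 0
    for paragraph, offer in enumerate(root.iter("offer"), start=1):
        # Only text nodes are tokenized; attribute values are not part of
        # the string value of the element.
        text = "".join(offer.itertext())
        # Assumption: punctuation and whitespace delimit tokens.
        for word in re.findall(r"[A-Za-z0-9]+", text):
            position += 1
            result.append((word, position, paragraph))
    return result


tokens = tokenize(SAMPLE)
mustang = [pos for word, pos, _ in tokens if word.lower() == "mustang"]
```

Under these assumptions `mustang` holds the two positions of the token "Mustang", and "A/C" is split into two tokens because the slash counts as punctuation.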

Implementations may provide the means to ignore or side-step certain structural elements when performing tokenization. In the following example, the implementation has decided to ignore the markup for <bold> and prune out the entire subtree headed by <deleted>.

<para><deleted>This sentence was deleted.</deleted> This <bold>entire paragraph</bold> is one sentence as far as the tokenizer is concerned. </para>

Using the same notation as before, this sample tokenization is shown below. All the tokens marked with a token position also have the same sentence and paragraph relative positions. Note that there are no tokens marked for the ignored subtree.

<para><deleted>This sentence was deleted.</deleted> This(1) <bold>entire(2) paragraph(3)</bold> is(4) one(5) sentence(6) as(7) far(8) as(9) the(10) tokenizer(11) is(12) concerned(13). </para>

Representations of Tokenized Text and Matching

A QueryItem is a sequence of QueryTokenInfos representing the collection of tokens derived from tokenizing one query string.

A QueryTokenInfo is the identity of a token inside a query string. Each QueryTokenInfo is associated with a starting and ending position that captures the relative position of the query string in the query.

A TokenInfo represents a contiguous collection of tokens from an XML document. Each TokenInfo is associated with:

startPos: the smallest starting position of a token in the sequence

endPos: the largest ending position of any token of the sequence

startSent: the relative position of the sentence containing the token with the smallest starting position or zero if the tokenizer does not report sentences

endSent: the relative position of the sentence containing the token with the largest ending position or zero if the tokenizer does not report sentences

startPara: the relative position of the paragraph containing the token with the smallest starting position or zero if the tokenizer does not report paragraphs

endPara: the relative position of the paragraph containing the token with the largest ending position or zero if the tokenizer does not report paragraphs
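The six properties listed above can be derived from a contiguous run of tokens as follows. This is a minimal sketch in Python; the `Token` and `TokenInfo` records and the `make_token_info` helper are assumptions introduced for illustration and are not defined by this specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    """Hypothetical tokenizer output record."""
    start: int
    end: int
    sentence: int = 0   # 0 when the tokenizer does not report sentences
    paragraph: int = 0  # 0 when the tokenizer does not report paragraphs


@dataclass(frozen=True)
class TokenInfo:
    start_pos: int
    end_pos: int
    start_sent: int
    end_sent: int
    start_para: int
    end_para: int


def make_token_info(tokens: List[Token]) -> TokenInfo:
    """Summarize a non-empty contiguous token sequence as a TokenInfo."""
    first = min(tokens, key=lambda t: t.start)  # smallest starting position
    last = max(tokens, key=lambda t: t.end)     # largest ending position
    return TokenInfo(first.start, last.end,
                     first.sentence, last.sentence,
                     first.paragraph, last.paragraph)
```

For the phrase "Ford Mustang" at positions 27 and 28 of the sample document, the resulting TokenInfo has startPos 27 and endPos 28.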

The following matching function is the central implementation-defined primitive performing the full-text retrieval.

declare function fts:matchTokenInfos (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $stopWords as xs:string*,
      $queryTokens as element(fts:queryToken)* )
   as element(fts:tokenInfo)* external;

The above function returns the TokenInfos in items in $searchContext that match the query string represented by the sequence $queryTokens, when using the match options in $matchOptions and stop words in $stopWords. If $queryTokens is a sequence of more than one query token, each returned TokenInfo must represent a phrase matching that sequence.

While this matching function assumes a tokenized representation of the query strings, it does not assume a tokenized representation of the input items in $searchContext, i.e. the texts being searched. Hence, the tokenization of the search context is implicit in this function and coupled to the retrieval of matches. Of course, this does not imply that tokenization of the search context cannot be done a priori. The tokenization of each item in $searchContext does not necessarily take into account the match options in $matchOptions or the query tokens in $queryTokens. This allows implementations to tokenize and index input data without the knowledge of particular match options used in full-text queries.
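Because fts:matchTokenInfos is external and implementation-defined, any concrete version is only one possibility. The following minimal sketch in Python assumes a search context that has already been tokenized with one position per token, case-insensitive matching, and no match options or stop words; the function name `match_token_infos` and the `DOC` constant are introduced here for illustration. It returns one (startPos, endPos) pair for each place where the query tokens occur as a contiguous phrase.

```python
from typing import List, Tuple

# Tokens of the sample document in position order (position 1 is "Ford").
DOC = ("Ford Mustang 2000 65K excellent condition runs great AC CC power all "
       "Honda Accord 1999 78K A C cruise control runs and looks great "
       "excellent condition "
       "Ford Mustang 1995 150K highway mileage no rust excellent condition").split()


def match_token_infos(context_tokens: List[str],
                      query_tokens: List[str]) -> List[Tuple[int, int]]:
    """Return (startPos, endPos) for each phrase occurrence of the query tokens."""
    n = len(query_tokens)
    if n == 0:
        return []
    wanted = [q.lower() for q in query_tokens]
    words = [w.lower() for w in context_tokens]
    # Positions are 1-based, so the slice starting at index i begins at i + 1.
    return [(i + 1, i + n)
            for i in range(len(words) - n + 1)
            if words[i:i + n] == wanted]
```

With more than one query token, each returned pair spans the whole phrase, which mirrors the requirement that each returned TokenInfo represent a phrase matching the sequence.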

Evaluation of FTSelections

The XQuery 1.0 and XPath 2.0 Data Model (XDM) is inadequate to support fully composable FTSelections. Full-text operations, such as FTSelections, operate on linguistic units, such as positions of tokens, which are not captured in XDM.

XQuery and XPath Full Text adds relative token, sentence, and paragraph position numbers via AllMatches. AllMatches make FTSelections fully composable.

AllMatches Formal Model

An AllMatches describes the possible results of an FTSelection. The UML static class diagram of AllMatches is given below.

The AllMatches object contains zero or more Matches.

Each Match describes one result of the FTSelection. The result is described in terms of zero or more StringIncludes and zero or more StringExcludes.

A StringMatch is a possible match of a sequence of query tokens with a corresponding sequence of tokens in a document. A StringMatch may be a StringInclude or StringExclude. The queryPos attribute specifies the position of the query token in the query. This attribute is needed for FTOrders. The matched document token sequence is described in the TokenInfo associated with the StringMatch.

A StringInclude is a StringMatch that describes a TokenInfo that must be contained in the document.

A StringExclude is a StringMatch that describes a TokenInfo that must not be contained in the document.

Intuitively, AllMatches specifies the TokenInfos that a search context item contains and does not contain to satisfy an FTSelection.

The AllMatches structure resembles the Disjunctive Normal Form (DNF) in propositional and first-order logic. The AllMatches is a disjunction of Matches. Each Match is a conjunction of StringIncludes and StringExcludes.

Examples

Since in most of the examples below the tokens span only a single position, we characterize the TokenInfo instance by simply giving this position, written as "Pos:X". This should be read as the value of both the startPos and the endPos attributes. Furthermore, for expository reasons, each StringMatch example includes an attribute "query string", set to the original query string, to show which query string the match came from.

The simplest example of an FTSelection is an FTWords such as "Mustang". The AllMatches corresponding to this FTWords is given below.

As shown, the AllMatches consists of two Matches. Each Match represents one possible result of the FTWords "Mustang". The result represented by the first Match, represented as a StringInclude, contains the token "Mustang" at position 2. The result described by the second Match contains the token "Mustang" at position 28.

A more complex example of an FTSelection is an FTWords such as "Ford Mustang". The AllMatches for this FTWords is given below.

There are two possible results for this FTWords, and these are represented by the two Matches. Each of the Matches requires two tokens to be matched. The first Match is obtained by matching "Ford" at position 1 and matching "Mustang" at position 2. Similarly, the second Match is obtained by matching "Ford" at position 27 and "Mustang" at position 28.

An even more complex example of an FTSelection is an FTSelection such as "Mustang" ftand ftnot "rust" that searches for "Mustang" but not "rust". The AllMatches for this FTSelection is given below.

This example introduces StringExclude. StringExclude corresponds to negation in DNF (Disjunctive Normal Form). It specifies that the result described by the corresponding Match must not match the token at the specified position. In this example, the first Match specifies that "Mustang" is matched at position 2, and that the token "rust" at position 34 is not matched.
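The three examples above can also be written down as plain data. The following minimal sketch in Python models an AllMatches as a list of Matches, each holding StringIncludes and StringExcludes. The class names follow the model, but the `conjoin` helper and the queryPos values are assumptions made for illustration; `conjoin` simply pairs every Match on the left with every Match on the right, which is how a conjunction of two disjunctions is brought back into DNF.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List


@dataclass(frozen=True)
class StringMatch:
    query_string: str
    query_pos: int  # illustrative value, assumed for this sketch
    pos: int        # shorthand "Pos:X": startPos and endPos are both X


@dataclass
class Match:
    includes: List[StringMatch] = field(default_factory=list)
    excludes: List[StringMatch] = field(default_factory=list)


def conjoin(left: List[Match], right: List[Match]) -> List[Match]:
    """AND of two disjunctions: combine every left Match with every right Match."""
    return [Match(l.includes + r.includes, l.excludes + r.excludes)
            for l, r in product(left, right)]


# "Mustang": two Matches, one StringInclude each.
mustang = [Match(includes=[StringMatch("Mustang", 1, 2)]),
           Match(includes=[StringMatch("Mustang", 1, 28)])]

# "Ford Mustang": two Matches, two StringIncludes each.
ford_mustang = [
    Match(includes=[StringMatch("Ford", 1, 1), StringMatch("Mustang", 2, 2)]),
    Match(includes=[StringMatch("Ford", 1, 27), StringMatch("Mustang", 2, 28)]),
]

# "Mustang" ftand ftnot "rust": the negated token becomes a StringExclude.
not_rust = [Match(excludes=[StringMatch("rust", 2, 34)])]
mustang_and_not_rust = conjoin(mustang, not_rust)
```

The last value contains two Matches; each includes "Mustang" at one of its two positions and excludes "rust" at position 34.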

XML representation

AllMatches has a well-defined hierarchical structure. Therefore, the AllMatches can be easily modeled in XML. This XML representation and those which follow formally describe the semantics of FTSelections. For example, the XML representation of AllMatches formally specifies how an FTSelection operates on zero or more AllMatches to produce a resulting AllMatches.

The XML schema for representing AllMatches is given below.

The stokenNum attribute in AllMatches is related to the representation of the semantics as XQuery functions. Therefore, it is not considered part of the AllMatches model. The stokenNum attribute stores the number of query tokens used when evaluating the AllMatches. This value is used to compute the correct value for the queryPos attribute in new StringMatches.

XML Representation

FTSelections are fully composable and may be nested arbitrarily under other FTSelections. Each FTSelection may be associated with match options (such as stemming and stop words) and score weights. Since score weights are solely interpreted by the formal semantics scoring function, they do not influence the semantics of FTSelections. Therefore, score weights are not considered in the formal semantics.

The XML structures defined by the following schema represent FTSelections within the semantic functions of section . This representation is used for definitional purposes only and should not be confused with the XML representation for queries in Appendix . Every FTSelection is represented as an XML element. Every nested FTSelection is represented as a nested descendant element. For binary FTSelections, e.g., FTAnd, the nested FTSelections are represented in <left> and <right> descendant elements. For unary FTSelections, a <selection> descendant element is used. Additional characteristics of FTSelections, e.g., the distance unit for FTDistance, are stored in attributes.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:fts="http://www.w3.org/2007/xpath-full-text"
           targetNamespace="http://www.w3.org/2007/xpath-full-text"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <xs:include schemaLocation="AllMatches.xsd" />
  <xs:include schemaLocation="MatchOptions.xsd" />
  <xs:complexType name="ftSelection">
    <xs:sequence>
      <xs:choice>
        <xs:element name="ftWords" type="fts:ftWords"/>
        <xs:element name="ftAnd" type="fts:ftAnd"/>
        <xs:element name="ftOr" type="fts:ftOr"/>
        <xs:element name="ftUnaryNot" type="fts:ftUnaryNot"/>
        <xs:element name="ftMildNot" type="fts:ftMildNot"/>
        <xs:element name="ftOrder" type="fts:ftOrder"/>
        <xs:element name="ftScope" type="fts:ftScope"/>
        <xs:element name="ftContent" type="fts:ftContent"/>
        <xs:element name="ftDistance" type="fts:ftDistance"/>
        <xs:element name="ftWindow" type="fts:ftWindow"/>
        <xs:element name="ftTimes" type="fts:ftTimes"/>
      </xs:choice>
      <xs:element ref="fts:matchOptions" minOccurs="0"/>
      <xs:element name="weight" type="xs:double" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="selection" type="fts:ftSelection"/>
  <xs:complexType name="ftWords">
    <xs:sequence>
      <xs:element ref="fts:queryItem" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="type" type="fts:ftWordsType" use="required"/>
  </xs:complexType>
  <xs:element name="queryItem" type="fts:queryItem"/>
  <xs:complexType name="ftAnd">
    <xs:sequence>
      <xs:element name="left" type="fts:ftSelection"/>
      <xs:element name="right" type="fts:ftSelection"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ftOr">
    <xs:sequence>
      <xs:element name="left" type="fts:ftSelection"/>
      <xs:element name="right" type="fts:ftSelection"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ftUnaryNot">
    <xs:sequence>
      <xs:element name="selection" type="fts:ftSelection"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ftMildNot">
    <xs:sequence>
      <xs:element name="left" type="fts:ftSelection"/>
      <xs:element name="right" type="fts:ftSelection"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ftOrder">
    <xs:sequence>
      <xs:element name="selection" type="fts:ftSelection"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ftScope">
    <xs:sequence>
      <xs:element name="selection" type="fts:ftSelection"/>
    </xs:sequence>
    <xs:attribute name="type" type="fts:scopeType" use="required"/>
    <xs:attribute name="scope" type="fts:scopeSelector" use="required"/>
  </xs:complexType>
  <xs:complexType name="ftContent">
    <xs:sequence>
      <xs:element name="selection" type="fts:ftSelection"/>
    </xs:sequence>
    <xs:attribute name="type" type="fts:contentMatchType" use="required"/>
  </xs:complexType>
  <xs:complexType name="ftDistance">
    <xs:sequence>
      <xs:element name="range" type="fts:ftRangeSpec"/>
      <xs:element name="selection" type="fts:ftSelection"/>
    </xs:sequence>
    <xs:attribute name="type" type="fts:distanceType" use="required"/>
  </xs:complexType>
  <xs:complexType name="ftWindow">
    <xs:sequence>
      <xs:element name="selection" type="fts:ftSelection"/>
    </xs:sequence>
    <xs:attribute name="size" type="xs:integer" use="required"/>
    <xs:attribute name="type" type="fts:distanceType" use="required"/>
  </xs:complexType>
  <xs:complexType name="ftTimes">
    <xs:sequence>
      <xs:element name="range" type="fts:ftRangeSpec"/>
      <xs:element name="selection" type="fts:ftWords"/>
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="ftWordsType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="any"/>
      <xs:enumeration value="all"/>
      <xs:enumeration value="phrase"/>
      <xs:enumeration value="any word"/>
      <xs:enumeration value="all word"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="scopeType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="same"/>
      <xs:enumeration value="different"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="scopeSelector">
    <xs:restriction base="xs:string">
      <xs:enumeration value="paragraph"/>
      <xs:enumeration value="sentence"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="distanceType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="paragraph"/>
      <xs:enumeration value="sentence"/>
      <xs:enumeration value="word"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="contentMatchType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="at start"/>
      <xs:enumeration value="at end"/>
      <xs:enumeration value="entire content"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

The evaluate function

The semantics for the evaluation of FTSelections is defined using the fts:evaluate function. The function takes four parameters: 1) an FTSelection, 2) a search context item, 3) the default set of match options that apply to the evaluation of the FTSelection, and 4) the number of query tokens processed so far.

The fts:evaluate function returns the AllMatches that is the result of evaluating the FTSelection. When fts:evaluate is applied to some FTSelection X, it calls the function fts:ApplyX to build the resulting AllMatches. If X is applied on nested FTSelections, the fts:evaluate function is recursively called on these nested FTSelections and the returned AllMatches are used in the evaluation of fts:ApplyX.

The semantics for the fts:evaluate function is given below.

declare function fts:evaluate (
      $ftSelection as element(*, fts:ftSelection),
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryTokenNum as xs:integer )
   as element(fts:allMatches)
{
   if (fn:count($ftSelection/fts:matchOptions) > 0) then
      (: First we deal with all match options that the    :)
      (: FTSelection might bear: we add the match options :)
      (: to the current match options structure, and      :)
      (: pass the new structure to the recursive call.    :)
      let $newFTSelection :=
         <fts:selection>{$ftSelection/*
            [fn:not(self::fts:matchOptions)]}</fts:selection>
      return fts:evaluate($newFTSelection,
                          $searchContext,
                          fts:replaceMatchOptions($matchOptions,
                                                  $ftSelection/fts:matchOptions),
                          $queryTokenNum)
   else if (fn:count($ftSelection/fts:weight) > 0) then
      (: Weight has no bearing on semantics -- just        :)
      (: call "evaluate" on nested FTSelection.            :)
      (: The remaining child is re-wrapped, as in the      :)
      (: branch above, so that the typeswitch below sees   :)
      (: the nested FTSelection as the first child.        :)
      let $newFTSelection :=
         <fts:selection>{$ftSelection/*
            [fn:not(self::fts:weight)]}</fts:selection>
      return fts:evaluate($newFTSelection,
                          $searchContext,
                          $matchOptions,
                          $queryTokenNum)
   else
      typeswitch ($ftSelection/*[1])
      case $nftSelection as element(fts:ftWords) return
         (: Apply the FTWords in the search context :)
         fts:ApplyFTWords($searchContext,
                          $matchOptions,
                          $nftSelection/@type,
                          $nftSelection/fts:queryItem,
                          $queryTokenNum + 1)
      case $nftSelection as element(fts:ftAnd) return
         let $left := fts:evaluate($nftSelection/fts:left,
                                   $searchContext,
                                   $matchOptions,
                                   $queryTokenNum)
         let $newQueryTokenNum := $left/@stokenNum
         let $right := fts:evaluate($nftSelection/fts:right,
                                    $searchContext,
                                    $matchOptions,
                                    $newQueryTokenNum)
         return fts:ApplyFTAnd($left, $right)
      case $nftSelection as element(fts:ftOr) return
         let $left := fts:evaluate($nftSelection/fts:left,
                                   $searchContext,
                                   $matchOptions,
                                   $queryTokenNum)
         let $newQueryTokenNum := $left/@stokenNum
         let $right := fts:evaluate($nftSelection/fts:right,
                                    $searchContext,
                                    $matchOptions,
                                    $newQueryTokenNum)
         return fts:ApplyFTOr($left, $right)
      case $nftSelection as element(fts:ftUnaryNot) return
         let $nested := fts:evaluate($nftSelection/fts:selection,
                                     $searchContext,
                                     $matchOptions,
                                     $queryTokenNum)
         return fts:ApplyFTUnaryNot($nested)
      case $nftSelection as element(fts:ftMildNot) return
         let $left := fts:evaluate($nftSelection/fts:left,
                                   $searchContext,
                                   $matchOptions,
                                   $queryTokenNum)
         let $newQueryTokenNum := $left/@stokenNum
         let $right := fts:evaluate($nftSelection/fts:right,
                                    $searchContext,
                                    $matchOptions,
                                    $newQueryTokenNum)
         return fts:ApplyFTMildNot($left, $right)
      case $nftSelection as element(fts:ftOrder) return
         let $nested := fts:evaluate($nftSelection/fts:selection,
                                     $searchContext,
                                     $matchOptions,
                                     $queryTokenNum)
         return fts:ApplyFTOrder($nested)
      case $nftSelection as element(fts:ftScope) return
         let $nested := fts:evaluate($nftSelection/fts:selection,
                                     $searchContext,
                                     $matchOptions,
                                     $queryTokenNum)
         return fts:ApplyFTScope($nftSelection/@type,
                                 $nftSelection/@scope,
                                 $nested)
      case $nftSelection as element(fts:ftContent) return
         let $nested := fts:evaluate($nftSelection/fts:selection,
                                     $searchContext,
                                     $matchOptions,
                                     $queryTokenNum)
         return fts:ApplyFTContent($searchContext,
                                   $nftSelection/@type,
                                   $nested)
      case $nftSelection as element(fts:ftDistance) return
         let $nested := fts:evaluate($nftSelection/fts:selection,
                                     $searchContext,
                                     $matchOptions,
                                     $queryTokenNum)
         return fts:ApplyFTDistance($nftSelection/@type,
                                    $nftSelection/fts:range,
                                    $nested)
      case $nftSelection as element(fts:ftWindow) return
         let $nested := fts:evaluate($nftSelection/fts:selection,
                                     $searchContext,
                                     $matchOptions,
                                     $queryTokenNum)
         return fts:ApplyFTWindow($nftSelection/@type,
                                  $nftSelection/@size,
                                  $nested)
      case $nftSelection as element(fts:ftTimes) return
         let $nested := fts:evaluate($nftSelection/fts:selection,
                                     $searchContext,
                                     $matchOptions,
                                     $queryTokenNum)
         return fts:ApplyFTTimes($nftSelection/fts:range, $nested)
      default return ()
};

For concreteness, assume that the FTSelection was invoked inside an ftcontains expression such as searchContext ftcontains ftSelection. In order to determine the AllMatches result of ftSelection, the fts:evaluate function is invoked as follows: fts:evaluate($ftSelection, $searchContext, $matchOptions, 0), where $ftSelection is the XML representation of the ftSelection and $searchContext is bound to the result of the evaluation of the XQuery expression searchContext.

Initially, $queryTokenNum is 0, i.e., no query tokens have been processed.

The variable $matchOptions is bound to the list of match options as defined in the static context (see Appendix ). Match options embedded in $ftSelection modify the match options collection as evaluation proceeds.

Given the invocation fts:evaluate($ftSelection, $searchContext, $matchOptions, $queryTokenNum), evaluation proceeds as follows. First, $ftSelection is checked to see whether 1) it contains a match option, 2) it contains a weight specification, 3) it is an FTWords, or 4) none of the above hold.

If $ftSelection contains one or more match options, these are combined with the inherited match options via a call to fts:replaceMatchOptions (see ). The evaluate function is then invoked on the nested FTSelection with the new set of match options, and the result of that call is returned.

If $ftSelection contains a weight specification, then the specification is ignored because it does not alter the semantics. The evaluate function is recursively called on the nested FTSelection and the resulting AllMatches is returned.

If $ftSelection is an FTWords, then it does not have any nested FTSelections. Consequently, this is the base of the recursive call, and the AllMatches result of the FTWords is computed and returned. The AllMatches is computed by invoking the ApplyFTWords function with the current search context and other necessary information.

If $ftSelection contains neither a match option nor a weight specification and is not an FTWords, the FTSelection performs a full-text operation, such as ftand, ftor, or window. These operations are fully compositional and may be invoked on nested FTSelections. Consequently, evaluation proceeds as follows.

First, the evaluate function is recursively invoked on each nested FTSelection. The result of evaluating each nested FTSelection is an AllMatches.

The AllMatches are transformed into the resulting AllMatches by applying the full-text operation corresponding to the FTSelection, which is generically named ApplyX for some type of FTSelection X in the code.

For example, let FTSelection1 be FTSelection2 ftand FTSelection3. Here FTSelection2 and FTSelection3 may themselves be arbitrarily nested FTSelections. Thus, evaluate is invoked on FTSelection2 and FTSelection3, and the resulting AllMatches are transformed to the final AllMatches using the ApplyFTAnd function corresponding to ftand.
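The recursion described above can be illustrated with a small model. The following is a minimal sketch in Python, not the normative XQuery definition: FTSelections are nested dictionaries, an AllMatches is a dictionary holding "stokenNum" and a list of Matches, each Match is a list of (queryPos, position) pairs, and only single-token FTWords, ftand, and ftor are supported. All of these representations, and the simplified and/or combinators, are assumptions made for illustration.

```python
from itertools import product


def apply_words(context, word, query_pos):
    """Base case: one Match per occurrence of a single-token FTWords."""
    matches = [[(query_pos, i)]
               for i, w in enumerate(context, start=1)
               if w.lower() == word.lower()]
    return {"stokenNum": query_pos, "matches": matches}


def evaluate(sel, context, options, query_token_num):
    if "matchOptions" in sel:
        # Combine embedded options with the inherited ones, then recurse.
        # The options are carried along but not interpreted by this sketch.
        rest = {k: v for k, v in sel.items() if k != "matchOptions"}
        merged = {**options, **sel["matchOptions"]}
        return evaluate(rest, context, merged, query_token_num)
    if "weight" in sel:
        # Weights do not alter the semantics: drop them and recurse.
        rest = {k: v for k, v in sel.items() if k != "weight"}
        return evaluate(rest, context, options, query_token_num)
    kind = sel["kind"]
    if kind == "ftWords":
        return apply_words(context, sel["word"], query_token_num + 1)
    if kind not in ("ftAnd", "ftOr"):
        raise ValueError(f"unsupported FTSelection kind: {kind}")
    # Evaluate the operands, threading the query token counter left to right.
    left = evaluate(sel["left"], context, options, query_token_num)
    right = evaluate(sel["right"], context, options, left["stokenNum"])
    if kind == "ftAnd":
        matches = [l + r for l, r in product(left["matches"], right["matches"])]
    else:
        matches = left["matches"] + right["matches"]
    return {"stokenNum": right["stokenNum"], "matches": matches}
```

Calling `evaluate` with a starting counter of 0 numbers the query tokens 1, 2, and so on in left-to-right order, which is the role $queryTokenNum and the stokenNum attribute play in the XQuery definition.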

The semantics of the ApplyX function for each FTSelection kind X is given below.

Formal semantics functions

The formal semantics of the ApplyX functions for each FTSelection kind X is specified using five functions. How two of these functions are computed is implementation-dependent, but all the functions must satisfy some well-defined properties.

The wordDistance function returns the number of tokens that occur between the positions of the TokenInfos $tokenInfo1 and $tokenInfo2. For example, two tokens with consecutive positions have a distance of 0 tokens, and two overlapping tokens have a distance of -1 tokens.

declare function fts:wordDistance (
      $tokenInfo1 as element(fts:tokenInfo),
      $tokenInfo2 as element(fts:tokenInfo) )
   as xs:integer
{
   (: Ensure tokens are in order :)
   let $sorted :=
      for $ti in ($tokenInfo1, $tokenInfo2)
      order by $ti/@startPos ascending, $ti/@endPos ascending
      return $ti
   return
      (: -1 because we count starting at 0 :)
      $sorted[2]/@startPos - $sorted[1]/@endPos - 1
};

The sentenceDistance function returns the number of sentences between the TokenInfos $tokenInfo1 and $tokenInfo2.

declare function fts:sentenceDistance (
      $tokenInfo1 as element(fts:tokenInfo),
      $tokenInfo2 as element(fts:tokenInfo) )
   as xs:integer
{
   (: Ensure tokens are in order :)
   let $sorted :=
      for $ti in ($tokenInfo1, $tokenInfo2)
      order by $ti/@startPos ascending, $ti/@endPos ascending
      return $ti
   return
      (: -1 because we count starting at 0 :)
      $sorted[2]/@startSent - $sorted[1]/@endSent - 1
};

The paraDistance function returns the number of paragraphs between the TokenInfos $tokenInfo1 and $tokenInfo2.

declare function fts:paraDistance (
      $tokenInfo1 as element(fts:tokenInfo),
      $tokenInfo2 as element(fts:tokenInfo) )
   as xs:integer
{
   (: Ensure tokens are in order :)
   let $sorted :=
      for $ti in ($tokenInfo1, $tokenInfo2)
      order by $ti/@startPos ascending, $ti/@endPos ascending
      return $ti
   return
      (: -1 because we count starting at 0 :)
      $sorted[2]/@startPara - $sorted[1]/@endPara - 1
};
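The three distance functions share one pattern: order the two TokenInfos by starting and then ending position, and subtract. The following Python transcription is a sketch for experimenting with the arithmetic; the `TokenInfo` record and the snake_case names are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenInfo:
    start_pos: int
    end_pos: int
    start_sent: int = 0
    end_sent: int = 0
    start_para: int = 0
    end_para: int = 0


def _ordered(a: TokenInfo, b: TokenInfo):
    """Order by starting position, then by ending position."""
    return sorted((a, b), key=lambda t: (t.start_pos, t.end_pos))


def word_distance(a: TokenInfo, b: TokenInfo) -> int:
    first, second = _ordered(a, b)
    # -1 because adjacent tokens have distance 0
    return second.start_pos - first.end_pos - 1


def sentence_distance(a: TokenInfo, b: TokenInfo) -> int:
    first, second = _ordered(a, b)
    return second.start_sent - first.end_sent - 1


def para_distance(a: TokenInfo, b: TokenInfo) -> int:
    first, second = _ordered(a, b)
    return second.start_para - first.end_para - 1
```

With these definitions, two tokens at consecutive positions are 0 words apart, two tokens sharing a position are -1 words apart, and the token "vorstellen" covering positions 2 through 4 is -2 words from "sie" at position 3, matching the calculation 3-4-1 given earlier for the sentence "Er stellte sie vor."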

The isStartToken function returns true if the TokenInfo $tokenInfo describes a token whose starting position is the first position of the item $searchContext.

declare function fts:isStartToken (
      $searchContext as item(),
      $tokenInfo as element(fts:tokenInfo) )
   as xs:boolean external;

The isEndToken function returns true if the TokenInfo $tokenInfo describes a token whose ending position is the last position of the item $searchContext.

declare function fts:isEndToken (
      $searchContext as item(),
      $tokenInfo as element(fts:tokenInfo) )
   as xs:boolean external;

FTWords

An FTWords that consists of a single query string, that is, a sequence of tokens to be matched as a phrase, is evaluated by the applyQueryTokensAsPhrase function. Its parameters are 1) the search context, 2) the list of match options, 3) the query string to be matched as a sequence of fts:queryToken items, and 4) the position where that query string occurs in the query.

(: simplified version not dealing with special match options :)
declare function fts:applyQueryTokensAsPhrase (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryTokens as element(fts:queryToken)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$queryPos}">
   {
      for $tokenInfo in
         fts:matchTokenInfos(
            $searchContext,
            $matchOptions,
            (),
            $queryTokens )
      return
         <fts:match>
            <fts:stringInclude queryPos="{$queryPos}" isContiguous="true">
               {$tokenInfo}
            </fts:stringInclude>
         </fts:match>
   }
   </fts:allMatches>
};

If after the application of all the match options, the sequence of query tokens returned for an FTWords is empty, an empty AllMatches is returned.

The AllMatches corresponding to an FTWords is a set of Matches. Each of the Matches is associated with a starting and an ending position indicating where the corresponding query tokens were found. For example, the AllMatches result for the FTWords "Mustang" is given below. To simplify the presentation in the figures we write Pos: N, if the attributes startPos and endPos are the same with N being that position.

There are five variations of FTWords depending on how the tokens and phrases in the nested XQuery 1.0 and XPath 2.0 expression are matched.

When any word is specified, at least one token in the tokenization of the nested expression must be matched.

When all word is specified, all tokens in the tokenization of the nested expression must be matched.

When phrase is specified, all tokens in the tokenization of the nested expression must be matched as a phrase.

When any is specified, at least one string atomic value in the nested expression must be matched as a phrase.

When all is specified, all string atomic values in the nested expression must be matched as a phrase.
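The five variations differ only in how the query tokens are grouped into phrases and in whether the per-phrase results are combined by disjunction or conjunction. The following minimal sketch in Python illustrates this under simplifying assumptions: the search context is a list of tokens with one position each, matching is case-insensitive, query positions are not tracked, and an AllMatches is a list of Matches, each a list of (startPos, endPos) spans. The function names are introduced here for illustration.

```python
from itertools import product
from typing import List, Tuple

Span = Tuple[int, int]    # (startPos, endPos)
Match = List[Span]        # conjunction of spans
AllMatches = List[Match]  # disjunction of Matches


def phrase(context: List[str], tokens: List[str]) -> AllMatches:
    """One Match per contiguous occurrence of the tokens."""
    n = len(tokens)
    if n == 0:
        return []
    words = [w.lower() for w in context]
    wanted = [t.lower() for t in tokens]
    return [[(i + 1, i + n)]
            for i in range(len(words) - n + 1)
            if words[i:i + n] == wanted]


def disjunction(parts: List[AllMatches]) -> AllMatches:
    return [m for part in parts for m in part]


def conjunction(parts: List[AllMatches]) -> AllMatches:
    if not parts:
        return []
    # Pick one Match from every part and merge their spans.
    return [sum(combo, []) for combo in product(*parts)]


def ft_words(context: List[str], query_items: List[List[str]],
             mode: str) -> AllMatches:
    all_tokens = [t for item in query_items for t in item]
    if mode == "any word":
        return disjunction([phrase(context, [t]) for t in all_tokens])
    if mode == "all word":
        return conjunction([phrase(context, [t]) for t in all_tokens])
    if mode == "phrase":
        return phrase(context, all_tokens)
    if mode == "any":
        return disjunction([phrase(context, item) for item in query_items])
    if mode == "all":
        return conjunction([phrase(context, item) for item in query_items])
    raise ValueError(f"unknown FTWords type: {mode}")
```

For the query items ("Ford Mustang", "excellent condition") evaluated against the first offer of the sample document, "any" yields two Matches, "all" yields a single Match containing both phrases, and "phrase" yields no Match because the four tokens do not occur contiguously.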

The semantics for FTWords when any word is specified is given below. Since FTWords does not have nested FTSelections, the ApplyFTWords function does not take AllMatches parameters corresponding to nested FTSelection results.

declare function fts:MakeDisjunction (
      $curRes as element(fts:allMatches),
      $rest as element(fts:allMatches)* )
   as element(fts:allMatches)
{
   if (fn:count($rest) = 0)
   then $curRes
   else
      let $firstAllMatches := $rest[1]
      let $restAllMatches := fn:subsequence($rest, 2)
      let $newCurRes := fts:ApplyFTOr($curRes, $firstAllMatches)
      return fts:MakeDisjunction($newCurRes, $restAllMatches)
};
declare function fts:ApplyFTWordsAnyWord (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryItems as element(fts:queryItem)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   (: Tokenization of query string has already occurred. :)
   (: Get sequence of QueryTokens over all query items.  :)
   let $queryTokens := $queryItems/fts:queryToken
   return
      (: Test the flattened token sequence, not the query    :)
      (: items: items without tokens would otherwise pass an :)
      (: empty sequence as $curRes to MakeDisjunction.       :)
      if (fn:count($queryTokens) eq 0)
      then <fts:allMatches stokenNum="0" />
      else
         let $allAllMatches :=
            for $queryToken at $pos in $queryTokens
            return fts:applyQueryTokensAsPhrase($searchContext,
                                                $matchOptions,
                                                $queryToken,
                                                $queryPos + $pos - 1)
         let $firstAllMatches := $allAllMatches[1]
         let $restAllMatches := fn:subsequence($allAllMatches, 2)
         return fts:MakeDisjunction($firstAllMatches, $restAllMatches)
};

The tokenized query strings are passed to ApplyFTWordsAnyWord as a sequence of fts:queryItem, each containing the tokens of a single query string. A single flattened sequence of all tokens (of type fts:queryToken) over all query items is constructed. For each of these, the result of FTWords is computed using applyQueryTokensAsPhrase. Finally, the disjunction of all resulting AllMatches is computed.

The semantics for FTWords when all word is specified is similar to the above, but composes a conjunction instead of a disjunction. It is given below.

declare function fts:MakeConjunction (
      $curRes as element(fts:allMatches),
      $rest as element(fts:allMatches)* )
   as element(fts:allMatches)
{
   if (fn:count($rest) = 0)
   then $curRes
   else
      let $firstAllMatches := $rest[1]
      let $restAllMatches := fn:subsequence($rest, 2)
      let $newCurRes := fts:ApplyFTAnd($curRes, $firstAllMatches)
      return fts:MakeConjunction($newCurRes, $restAllMatches)
};
declare function fts:ApplyFTWordsAllWord (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryItems as element(fts:queryItem)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   (: Tokenization of query strings has already occurred. :)
   (: Get sequence of QueryTokens over all query items    :)
   let $queryTokens := $queryItems/fts:queryToken
   return
      if (fn:count($queryTokens) eq 0)
      then <fts:allMatches stokenNum="0" />
      else
         let $allAllMatches :=
            for $queryToken at $pos in $queryTokens
            return fts:applyQueryTokensAsPhrase($searchContext,
                                                $matchOptions,
                                                $queryToken,
                                                $queryPos + $pos - 1)
         let $firstAllMatches := $allAllMatches[1]
         let $restAllMatches := fn:subsequence($allAllMatches, 2)
         return fts:MakeConjunction($firstAllMatches, $restAllMatches)
};

The semantics for FTWords if phrase is specified is given below.

declare function fts:ApplyFTWordsPhrase (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryItems as element(fts:queryItem)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   (: Get sequence of QueryTokenInfos over all query items :)
   let $queryTokens := $queryItems/fts:queryToken
   return
      if (fn:count($queryTokens) eq 0)
      then <fts:allMatches stokenNum="0" />
      else
         fts:applyQueryTokensAsPhrase($searchContext,
                                      $matchOptions,
                                      $queryTokens,
                                      $queryPos)
};

The ApplyFTWordsPhrase function also flattens the sequence of query items to a sequence of query tokens, but then calls applyQueryTokensAsPhrase on that entire sequence, instead of calling it on each query token individually. Hence, the sequence of all query tokens is matched as a single phrase and the computed TokenInfos are returned.
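The difference between the any word, all words, and phrase modes can be illustrated with a non-normative Python sketch. It is a simplification under stated assumptions: an AllMatches is modelled as a list of Matches, each Match as a list of (queryPos, startPos, endPos) tuples, stokenNum and match options are ignored, and match_phrase is a hypothetical stand-in for applyQueryTokensAsPhrase that uses plain string equality.

```python
from functools import reduce

def match_phrase(doc_tokens, phrase, query_pos):
    """Hypothetical stand-in for applyQueryTokensAsPhrase (exact equality only)."""
    n = len(phrase)
    return [[(query_pos, i + 1, i + n)]          # token positions are 1-based
            for i in range(len(doc_tokens) - n + 1)
            if doc_tokens[i:i + n] == phrase]

def ft_or(a, b):
    """Disjunction: the Matches of both operands."""
    return a + b

def ft_and(a, b):
    """Conjunction: every pairing of a Match from each operand."""
    return [m1 + m2 for m1 in a for m2 in b]

def flatten(query_items):
    """Flatten query items (lists of tokens) into one token sequence."""
    return [tok for item in query_items for tok in item]

def per_token(doc_tokens, query_items, query_pos):
    """Match every query token on its own, numbering query positions."""
    return [match_phrase(doc_tokens, [tok], query_pos + k)
            for k, tok in enumerate(flatten(query_items))]

def any_word(doc_tokens, query_items, query_pos=1):
    results = per_token(doc_tokens, query_items, query_pos)
    return reduce(ft_or, results) if results else []

def all_words(doc_tokens, query_items, query_pos=1):
    results = per_token(doc_tokens, query_items, query_pos)
    return reduce(ft_and, results) if results else []

def phrase(doc_tokens, query_items, query_pos=1):
    tokens = flatten(query_items)
    return match_phrase(doc_tokens, tokens, query_pos) if tokens else []

# Hypothetical sample data, not taken from the specification.
doc = "the red mustang and the red honda".split()
items = [["red", "honda"]]
```

With this sample, any_word yields three single-token Matches, all_words yields two Matches that each pair an occurrence of "red" with the occurrence of "honda", and phrase yields a single Match spanning positions 6 to 7.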

The semantics for FTWords when any is specified is given below.

declare function fts:ApplyFTWordsAny (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryItems as element(fts:queryItem)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   if (fn:count($queryItems) eq 0)
   then <fts:allMatches stokenNum="0" />
   else
      let $firstQueryItem := $queryItems[1]
      let $restQueryItem := fn:subsequence($queryItems, 2)
      let $firstAllMatches :=
         fts:ApplyFTWordsPhrase($searchContext,
                                $matchOptions,
                                $firstQueryItem,
                                $queryPos)
      let $newQueryPos :=
         if ($firstAllMatches//@queryPos)
         then fn:max($firstAllMatches//@queryPos) + 1
         else $queryPos
      let $restAllMatches :=
         fts:ApplyFTWordsAny($searchContext,
                             $matchOptions,
                             $restQueryItem,
                             $newQueryPos)
      return fts:ApplyFTOr($firstAllMatches, $restAllMatches)
};

FTWords with any specified forms the disjunction of the AllMatches that result from matching each query item as a phrase.

The semantics for FTWords when all is specified is given below.

declare function fts:ApplyFTWordsAll (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryItems as element(fts:queryItem)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   if (fn:count($queryItems) = 0)
   then <fts:allMatches stokenNum="0" />
   else
      let $firstQueryItem := $queryItems[1]
      let $restQueryItem := fn:subsequence($queryItems, 2)
      let $firstAllMatches :=
         fts:ApplyFTWordsPhrase($searchContext,
                                $matchOptions,
                                $firstQueryItem,
                                $queryPos)
      return
         if ($restQueryItem) then
            let $newQueryPos :=
               if ($firstAllMatches//@queryPos)
               then fn:max($firstAllMatches//@queryPos) + 1
               else $queryPos
            let $restAllMatches :=
               fts:ApplyFTWordsAll($searchContext,
                                   $matchOptions,
                                   $restQueryItem,
                                   $newQueryPos)
            return fts:ApplyFTAnd($firstAllMatches, $restAllMatches)
         else $firstAllMatches
};

The difference between all and any is the use of conjunction instead of disjunction.

The ApplyFTWords function combines all of these functions.

declare function fts:ApplyFTWords (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $type as fts:ftWordsType,
      $queryItems as element(fts:queryItem)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   if ($type eq "any word") then
      fts:ApplyFTWordsAnyWord($searchContext, $matchOptions,
                              $queryItems, $queryPos)
   else if ($type eq "all word") then
      fts:ApplyFTWordsAllWord($searchContext, $matchOptions,
                              $queryItems, $queryPos)
   else if ($type eq "phrase") then
      fts:ApplyFTWordsPhrase($searchContext, $matchOptions,
                             $queryItems, $queryPos)
   else if ($type eq "any") then
      fts:ApplyFTWordsAny($searchContext, $matchOptions,
                          $queryItems, $queryPos)
   else
      fts:ApplyFTWordsAll($searchContext, $matchOptions,
                          $queryItems, $queryPos)
};

Match Options Semantics

Types

XQuery 1.0 functions are used to define the semantics of FTMatchOptions. These functions operate on an XML representation of the FTMatchOptions. The representation closely follows the syntax. Each FTMatchOption is represented by an XML element. Additional characteristics of the match option are represented as attributes. The schema is given below.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:fts="http://www.w3.org/2007/xpath-full-text"
           targetNamespace="http://www.w3.org/2007/xpath-full-text"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">

  <xs:complexType name="ftMatchOptions">
    <xs:sequence>
      <xs:element ref="fts:thesaurus" minOccurs="0" maxOccurs="1"/>
      <xs:element ref="fts:stopwords" minOccurs="0" maxOccurs="1"/>
      <xs:element ref="fts:case" minOccurs="0" maxOccurs="1"/>
      <xs:element ref="fts:diacritics" minOccurs="0" maxOccurs="1"/>
      <xs:element ref="fts:stem" minOccurs="0" maxOccurs="1"/>
      <xs:element ref="fts:wildcard" minOccurs="0" maxOccurs="1"/>
      <xs:element ref="fts:language" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="matchOptions" type="fts:ftMatchOptions"/>
  <xs:element name="case" type="fts:ftCaseOption" />
  <xs:element name="diacritics" type="fts:ftDiacriticsOption" />
  <xs:element name="thesaurus" type="fts:ftThesaurusOption" />
  <xs:element name="stem" type="fts:ftStemOption" />
  <xs:element name="wildcard" type="fts:ftWildCardOption" />
  <xs:element name="language" type="fts:ftLanguageOption" />
  <xs:element name="stopwords" type="fts:ftStopWordOption" />

  <xs:complexType name="ftCaseOption">
    <xs:sequence>
      <xs:element name="value">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="case insensitive"/>
            <xs:enumeration value="case sensitive"/>
            <xs:enumeration value="lowercase"/>
            <xs:enumeration value="uppercase"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="ftDiacriticsOption">
    <xs:sequence>
      <xs:element name="value">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="diacritics insensitive"/>
            <xs:enumeration value="diacritics sensitive"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="ftThesaurusOption">
    <xs:sequence>
      <xs:element name="thesaurusName" type="xs:string"
                  minOccurs="0" maxOccurs="1"/>
      <xs:element name="relationship" type="xs:string"
                  minOccurs="0" maxOccurs="1"/>
      <xs:element name="range" type="fts:ftRangeSpec"
                  minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
    <xs:attribute name="thesaurusIndicator">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="with"/>
          <xs:enumeration value="without"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
    <xs:attribute name="language" type="xs:string"/>
  </xs:complexType>

  <xs:complexType name="ftRangeSpec">
    <xs:attribute name="type" type="fts:rangeSpecType" use="required"/>
    <xs:attribute name="m" type="xs:integer"/>
    <xs:attribute name="n" type="xs:integer" use="required"/>
  </xs:complexType>

  <xs:simpleType name="rangeSpecType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="exactly"/>
      <xs:enumeration value="at least"/>
      <xs:enumeration value="at most"/>
      <xs:enumeration value="from to"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:complexType name="ftStemOption">
    <xs:sequence>
      <xs:element name="value">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="with stemming"/>
            <xs:enumeration value="without stemming"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="ftWildCardOption">
    <xs:sequence>
      <xs:element name="value">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="with wildcards"/>
            <xs:enumeration value="without wildcards"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="ftLanguageOption">
    <xs:sequence>
      <xs:element name="value" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="ftStopWordOption">
    <xs:sequence>
      <xs:choice>
        <xs:element name="default-stopwords">
          <xs:complexType />
        </xs:element>
        <xs:element name="stopword" type="xs:string" />
        <xs:element name="uri" type="xs:anyURI" />
      </xs:choice>
      <xs:element name="oper" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>
          <xs:choice>
            <xs:element name="stopword" type="xs:string" />
            <xs:element name="uri" type="xs:anyURI" />
          </xs:choice>
          <xs:attribute name="type">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:enumeration value="union"/>
                <xs:enumeration value="except"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>

</xs:schema>

High-Level Semantics

The previous section described FTSelections without giving any details about how FTMatchOptions need to be interpreted. All processing of FTMatchOptions was delegated to the function matchTokenInfos, which is implementation-defined. In this section, further details on the semantics of FTMatchOptions are given.

The extension is achieved by modifying an existing function and adding functions that are specific to the FTMatchOptions.

Modifications in the semantics of existing functions

The semantics of most of the FTSelections remains unmodified. The modifications are to the method for matching a sequence of query tokens.

declare function fts:applyQueryTokensAsPhrase (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryTokens as element(fts:queryToken)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   let $thesaurusOption := $matchOptions/fts:thesaurus[1]
   return
      if ($thesaurusOption and
          $thesaurusOption/@thesaurusIndicator eq "with") then
         let $noThesaurusOptions :=
            <fts:matchOptions>{
               $matchOptions/*[fn:not(self::fts:thesaurus)]
            }</fts:matchOptions>
         let $lookupRes := fts:applyThesaurusOption($thesaurusOption,
                                                    $queryTokens)
         return fts:ApplyFTWordsAny($searchContext,
                                    $noThesaurusOptions,
                                    $lookupRes,
                                    $queryPos)
      else
         (: from here on we have a single sequence of query tokens :)
         (: which is to be matched as a phrase; no alternatives anymore :)
         <fts:allMatches stokenNum="{$queryPos}">
         {
            for $pos in
               fts:matchTokenInfos(
                  $searchContext,
                  $matchOptions,
                  fts:applyStopWordOption($matchOptions/fts:stopwords),
                  $queryTokens )
            return
               <fts:match>
                  <fts:stringInclude queryPos="{$queryPos}" isContiguous="true">
                  {$pos}
                  </fts:stringInclude>
               </fts:match>
         }
         </fts:allMatches>
};

Two FTMatchOptions need to be processed differently than the rest of the FTMatchOptions as shown in the function above.

Unlike all other FTMatchOptions, the semantics of the FTThesaurusOption cannot be formulated as an operation on individual query tokens, because a thesaurus lookup may return alternative query items for a whole phrase, i.e., a sequence of query tokens. Since the result of a thesaurus lookup is a sequence of alternatives, there must be a higher level of processing. The above call to applyThesaurusOption returns, for the given sequence of query tokens (representing a phrase), all thesaurus expansions for the selected thesaurus, relationship, and level range as a sequence of query items. The alternative expansions are evaluated as a disjunction using fts:ApplyFTWordsAny. The matching of the alternatives is performed with the FTThesaurusOption turned off to avoid double expansions, i.e., expansion of an already expanded token.

For the semantics of the FTStopWordOption the list of stop words needs to be computed as demanded by the special syntax for stop word lists involving the operators "union" and "except".

Semantics of new FTMatchOptions functions

The extension of the FTSelection semantics also adds functions that are specific to the FTMatchOptions.

The evaluate function above handles match options occurring in the query structure by calling the function replaceMatchOptions, which is defined below. This function replaces match options from the list given by the first argument with match options of the same group from the list given by the second argument, if any. If an option is present in the second list but not in the first list, it is included in the resulting list too. Intuitively, replaceMatchOptions computes the effective match options for a given FTSelection. The function uses the options specified for the current FTSelection ($ftSelection/fts:matchOptions) to override any options of the same group declared further up the query tree ($matchOptions).

declare function fts:replaceMatchOptions (
      $matchOptions as element(fts:matchOptions),
      $newMatchOptions as element(fts:matchOptions) )
   as element(fts:matchOptions)
{
   <fts:matchOptions>
   {
      (if ($newMatchOptions/fts:thesaurus)
       then $newMatchOptions/fts:thesaurus
       else $matchOptions/fts:thesaurus),
      (if ($newMatchOptions/fts:stopwords)
       then $newMatchOptions/fts:stopwords
       else $matchOptions/fts:stopwords),
      (if ($newMatchOptions/fts:case)
       then $newMatchOptions/fts:case
       else $matchOptions/fts:case),
      (if ($newMatchOptions/fts:diacritics)
       then $newMatchOptions/fts:diacritics
       else $matchOptions/fts:diacritics),
      (if ($newMatchOptions/fts:stem)
       then $newMatchOptions/fts:stem
       else $matchOptions/fts:stem),
      (if ($newMatchOptions/fts:wildcard)
       then $newMatchOptions/fts:wildcard
       else $matchOptions/fts:wildcard),
      (if ($newMatchOptions/fts:language)
       then $newMatchOptions/fts:language
       else $matchOptions/fts:language)
   }
   </fts:matchOptions>
};

This function determines how match options of the same group overwrite each other, so that only one option of the same group remains.
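The override rule can be sketched non-normatively in Python. The sketch assumes that a set of match options is modelled as a dictionary keyed by option group; the dictionary representation and the sample values are illustrative assumptions, not part of this specification.

```python
# Option groups in the order used by the fts:matchOptions schema.
GROUPS = ("thesaurus", "stopwords", "case", "diacritics",
          "stem", "wildcard", "language")

def replace_match_options(inherited, local):
    """Local options win per group; groups absent locally are inherited."""
    merged = {}
    for group in GROUPS:
        if group in local:
            merged[group] = local[group]
        elif group in inherited:
            merged[group] = inherited[group]
    return merged

# Hypothetical options declared up the query tree and on the current FTSelection.
inherited = {"case": "case insensitive", "stem": "without stemming"}
local = {"case": "case sensitive", "wildcard": "with wildcards"}
effective = replace_match_options(inherited, local)
```

In this sample the locally declared case option replaces the inherited one, the stem option is inherited unchanged, and the wildcard option is added because it appears only locally.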

The details of the semantics of the remaining FTMatchOptions are determined by the implementation-defined function matchTokenInfos.

Formal Semantics Functions

FTMatchOption functions which are necessary to support match option processing are given below.

declare function fts:resolveStopWordsUri (
      $uri as xs:string? )
   as xs:string* external;

declare function fts:lookupThesaurus (
      $tokens as element(fts:queryToken)*,
      $thesaurusName as xs:string?,
      $thesaurusLanguage as xs:string?,
      $relationship as xs:string?,
      $range as element(fts:range)? )
   as element(fts:queryItem)* external;

The function resolveStopWordsUri is used to resolve any URI to a sequence of strings to be used as stop words.

The function lookupThesaurus finds all expansions related to $tokens in the thesaurus $thesaurusName for the language $thesaurusLanguage using the relationship $relationship within the optional number of levels $range. If $tokens consists of more than one query token, it is regarded as a phrase.

The thesaurus function returns a sequence of expansion alternatives. Each alternative is regarded as a new search phrase and is represented as a query item. Alternatives are treated as though they are connected with a disjunction (FTOr).

FTCaseOption

FTMatchOptions of type FTCaseOption are passed in the $matchOptions parameter to matchTokenInfos. If the FTCaseOption is "lowercase" the returned TokenInfos must span only tokens that are all lowercase. If the FTCaseOption is "uppercase" the returned TokenInfos must span only tokens that are all uppercase. If the FTCaseOption is "case insensitive" the function must return all TokenInfos matching the query tokens when disregarding character case. If the FTCaseOption is "case sensitive" the function must return all TokenInfos that also accord with the query tokens in character case.
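One possible reading of these four cases is sketched below in Python. The sketch is non-normative: the function name token_matches is hypothetical, and the use of Python's str.lower and str.upper in place of a collation is a simplifying assumption, since the actual comparison is left to the implementation-defined matchTokenInfos.

```python
def token_matches(doc_token, query_token, case_option):
    """Decide whether a document token matches a query token under an FTCaseOption."""
    if case_option == "case sensitive":
        return doc_token == query_token
    if case_option == "case insensitive":
        return doc_token.lower() == query_token.lower()
    if case_option == "lowercase":
        # Only all-lowercase document tokens may match.
        return doc_token == query_token.lower()
    if case_option == "uppercase":
        # Only all-uppercase document tokens may match.
        return doc_token == query_token.upper()
    raise ValueError("unknown case option: " + case_option)
```

For instance, under this reading the document token "Mustang" matches the query token "mustang" with "case insensitive" but not with "case sensitive" or "lowercase".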

FTDiacriticsOption

FTMatchOptions of type FTDiacriticsOption are passed in the $matchOptions parameter to matchTokenInfos. If the FTDiacriticsOption is "diacritics insensitive" the function must return all TokenInfos matching the query tokens when disregarding diacritical marks. If the FTDiacriticsOption is "diacritics sensitive" the function must return all TokenInfos that also accord with the query tokens in diacritical marks.

FTStemOption

FTMatchOptions of type FTStemOption are passed in the $matchOptions parameter to matchTokenInfos. It is implementation-defined what effect the option "with stemming" has on matching tokens; however, it is expected that this option allows linguistic variants of the query tokens to be matched. If the FTStemOption is "without stemming", the returned TokenInfos must span exact matches (i.e., not including linguistic variations) of the query tokens.

FTThesaurusOption

The semantics for the FTThesaurusOption is given below.

declare function fts:applyThesaurusOption (
      $matchOption as element(fts:thesaurus),
      $queryTokens as element(fts:queryToken)* )
   as element(fts:queryItem)*
{
   if ($matchOption/@thesaurusIndicator = "with") then
      fts:lookupThesaurus( $queryTokens,
                           $matchOption/fts:thesaurusName,
                           $matchOption/@language,
                           $matchOption/fts:relationship,
                           $matchOption/fts:range )
   else if ($matchOption/@thesaurusIndicator = "without") then
      <fts:queryItem>
      {$queryTokens}
      </fts:queryItem>
   else ()
};

FTStopWordOption

Stop words interact with FTDistance and FTWindow. The semantics for the FTStopWordOption is given below.

declare function fts:applyStopWordOption (
      $stopWordOption as element(fts:stopwords)? )
   as xs:string*
{
   if ($stopWordOption) then
      let $swords :=
         typeswitch ($stopWordOption/*[1])
            case $e as element(fts:stopword)
               return $e/text()
            case $e as element(fts:uri)
               return fts:resolveStopWordsUri($e/text())
            case element(fts:default-stopwords)
               return fts:resolveStopWordsUri(())
            default return ()
      return fts:calcStopWords( $swords, $stopWordOption/fts:oper )
   else ()
};

declare function fts:calcStopWords (
      $stopWords as xs:string*,
      $opers as element(fts:oper)* )
   as xs:string*
{
   if ( fn:empty($opers) ) then $stopWords
   else
      let $swords :=
         typeswitch ($opers[1]/*[1])
            case $e as element(fts:stopword)
               return $e/text()
            case $e as element(fts:uri)
               return fts:resolveStopWordsUri($e/text())
            default return ()
      return
         if ($opers[1]/@type eq "union") then
            (: add the operand's words, then process the remaining operators :)
            fts:calcStopWords( ($stopWords, $swords),
                               $opers[fn:position() gt 1] )
         else (: "except" :)
            (: drop the operand's words, then process the remaining operators :)
            fts:calcStopWords( $stopWords[fn:not(. = $swords)],
                               $opers[fn:position() gt 1] )
};

The stop words set is computed using the fts:calcStopWords function. The function uses the function fts:resolveStopWordsUri to resolve any URI to a sequence of strings. Then, the stop words are removed from the set of query tokens.
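The computation of the stop word list can be sketched non-normatively in Python, assuming that the "union" and "except" operators are applied one at a time from left to right. The tuple representation of the option, the sample word lists, and the URI http://example.org/sw are hypothetical; resolve_stop_words_uri is a stand-in for the external function fts:resolveStopWordsUri.

```python
def resolve_stop_words_uri(uri):
    """Hypothetical stand-in for fts:resolveStopWordsUri; None selects the default list."""
    lists = {None: ["a", "the", "of"],
             "http://example.org/sw": ["and", "or"]}
    return lists.get(uri, [])

def operand_words(kind, value):
    """Resolve one operand (a literal stop word, a URI, or the default list)."""
    if kind == "stopword":
        return [value]
    if kind == "uri":
        return resolve_stop_words_uri(value)
    if kind == "default-stopwords":
        return resolve_stop_words_uri(None)
    return []

def calc_stop_words(stop_words, opers):
    """Apply the first operator, then recurse on the remaining operators."""
    if not opers:
        return stop_words
    (op_type, kind, value), rest = opers[0], opers[1:]
    words = operand_words(kind, value)
    if op_type == "union":
        return calc_stop_words(stop_words + words, rest)
    # "except": remove the operand's words from the current list
    return calc_stop_words([w for w in stop_words if w not in words], rest)

def apply_stop_word_option(option):
    """option is None or a (kind, value, opers) tuple."""
    if option is None:
        return []
    kind, value, opers = option
    return calc_stop_words(operand_words(kind, value), opers)

option = ("default-stopwords", None,
          [("union", "uri", "http://example.org/sw"),
           ("except", "stopword", "of")])
```

With these sample lists, the default list is first extended with the words resolved from the URI and the literal stop word "of" is then removed.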

FTLanguageOption

The FTLanguageOption is not associated with a semantics function. It is just a parameter to other semantics functions.

FTWildCardOption

FTMatchOptions of type FTWildCardOption are passed in the $matchOptions parameter to matchTokenInfos. If the FTWildCardOption is "with wildcards" the function must return all TokenInfos in the search context that span tokens, such that those tokens are wildcard expansions of the corresponding query token. The wildcard expansions are described in Section 3.2.7 FTWildCardOption. If the FTWildCardOption is "without wildcards" all query tokens must be matched literally.

Full-Text Operators Semantics

FTOr

The parameters of the ApplyFTOr function are the two AllMatches parameters corresponding to the results of the two nested FTSelections. The semantics is given below.

declare function fts:ApplyFTOr (
      $allMatches1 as element(fts:allMatches),
      $allMatches2 as element(fts:allMatches) )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{fn:max(($allMatches1/@stokenNum,
                                       $allMatches2/@stokenNum))}">
   {$allMatches1/fts:match, $allMatches2/fts:match}
   </fts:allMatches>
};

The ApplyFTOr function creates a new AllMatches in which Matches are the union of those found in the input AllMatches. Each Match represents one possible result of the corresponding FTSelection. Thus, a Match from either of the AllMatches is a result.
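As a non-normative illustration, the union can be modelled in Python. The sketch assumes that an AllMatches is a dictionary with a stokenNum and a list of Matches, and that each Match is a list of (queryPos, position) pairs; the token positions used here are hypothetical.

```python
def apply_ft_or(am1, am2):
    """Union of the Matches of both operands; stokenNum is the larger of the two."""
    return {"stokenNum": max(am1["stokenNum"], am2["stokenNum"]),
            "matches": am1["matches"] + am2["matches"]}

# Hypothetical operands: two occurrences of one query token, one of another.
am_first = {"stokenNum": 1, "matches": [[(1, 1)], [(1, 27)]]}
am_second = {"stokenNum": 2, "matches": [[(2, 5)]]}
am_or = apply_ft_or(am_first, am_second)
```

The result contains three Matches, any one of which is sufficient to satisfy the disjunction.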

For example, consider the FTSelection "Mustang" ftor "Honda". The AllMatches corresponding to "Mustang" and "Honda" are given below.

The AllMatches produced by ApplyFTOr is given below.

FTAnd

The parameters of the ApplyFTAnd function are the two AllMatches corresponding to the results of the two nested FTSelections. The semantics is given below.

declare function fts:ApplyFTAnd (
      $allMatches1 as element(fts:allMatches),
      $allMatches2 as element(fts:allMatches) )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{fn:max(($allMatches1/@stokenNum,
                                       $allMatches2/@stokenNum))}">
   {
      for $sm1 in $allMatches1/fts:match
      for $sm2 in $allMatches2/fts:match
      return <fts:match>
                {$sm1/*, $sm2/*}
             </fts:match>
   }
   </fts:allMatches>
};

The result of the conjunction is a new AllMatches that contains the "Cartesian product" of the Matches of the participating FTSelections. Every resulting Match is formed by combining the StringInclude and StringExclude components of one Match from each of the AllMatches of the nested FTSelections. Thus every Match contains the positions needed to satisfy a Match from both original FTSelections and excludes the positions that violate the same Matches.
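The pairing can be sketched non-normatively in Python. The sketch assumes that each Match is a dictionary holding its StringIncludes and StringExcludes as lists of (queryPos, position) pairs; this representation and the sample positions are hypothetical.

```python
from itertools import product

def apply_ft_and(am1, am2):
    """Pair every Match of am1 with every Match of am2 and merge their components."""
    return {
        "stokenNum": max(am1["stokenNum"], am2["stokenNum"]),
        "matches": [{"include": m1["include"] + m2["include"],
                     "exclude": m1["exclude"] + m2["exclude"]}
                    for m1, m2 in product(am1["matches"], am2["matches"])],
    }

# Hypothetical operands with two Matches each.
am_a = {"stokenNum": 1, "matches": [{"include": [(1, 2)], "exclude": []},
                                    {"include": [(1, 28)], "exclude": []}]}
am_b = {"stokenNum": 2, "matches": [{"include": [(2, 9)], "exclude": []},
                                    {"include": [(2, 30)], "exclude": []}]}
am_and = apply_ft_and(am_a, am_b)
```

Two Matches on each side give four combined Matches. If either operand has no Matches, the product is empty, which reflects that the conjunction cannot be satisfied.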

For example, consider the FTSelection "Mustang" ftand "rust". The source AllMatches are given below.

The AllMatches produced by ApplyFTAnd is given below.

FTUnaryNot

The ApplyFTUnaryNot function has one AllMatches parameter corresponding to the result of the nested FTSelection to be negated. The semantics is given below.

declare function fts:InvertStringMatch (
      $strm as element(*,fts:stringMatch) )
   as element(*,fts:stringMatch)
{
   if ($strm instance of element(fts:stringExclude)) then
      <fts:stringInclude queryPos="{$strm/@queryPos}"
                         isContiguous="{$strm/@isContiguous}">
      {$strm/fts:tokenInfo}
      </fts:stringInclude>
   else
      <fts:stringExclude queryPos="{$strm/@queryPos}"
                         isContiguous="{$strm/@isContiguous}">
      {$strm/fts:tokenInfo}
      </fts:stringExclude>
};

declare function fts:UnaryNotHelper (
      $matches as element(fts:match)* )
   as element(fts:match)*
{
   if (fn:empty($matches))
   then <fts:match/>
   else
      for $sm in $matches[1]/*
      for $rest in fts:UnaryNotHelper( fn:subsequence($matches, 2) )
      return
         <fts:match>
         {
            fts:InvertStringMatch($sm),
            $rest/*
         }
         </fts:match>
};

declare function fts:ApplyFTUnaryNot (
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      fts:UnaryNotHelper($allMatches/fts:match)
   }
   </fts:allMatches>
};

The generation of the resulting AllMatches of an FTUnaryNot resembles the transformation of the negation of a propositional formula in DNF back to DNF. The negation of an AllMatches requires the inversion of all the StringMatches within the AllMatches.

In the InvertStringMatch function above, this inversion occurs as follows.

The function fts:InvertStringMatch inverts a StringInclude into a StringExclude and vice versa.

The function fts:UnaryNotHelper transforms the source Matches into the resulting Matches by forming the combinations of the inversions of a StringInclude or StringExclude component over the source Matches into new Matches.
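The combination step can be sketched non-normatively in Python. The sketch assumes that a Match is a list of (kind, queryPos, position) tuples, where kind is "include" or "exclude"; the representation and the sample positions are hypothetical.

```python
def invert(string_match):
    """Turn a StringInclude into a StringExclude and vice versa."""
    kind, query_pos, position = string_match
    flipped = "exclude" if kind == "include" else "include"
    return (flipped, query_pos, position)

def unary_not(matches):
    """Pick one component from every source Match, invert it, and combine the picks."""
    if not matches:
        return [[]]  # a single empty Match, which is trivially satisfied
    rest = unary_not(matches[1:])
    return [[invert(sm)] + tail for sm in matches[0] for tail in rest]

# Three single-component Matches, as a disjunction of two words might produce.
single = [[("include", 1, 1)], [("include", 1, 27)], [("include", 2, 5)]]

# One Match with two components followed by one Match with a single component.
mixed = [[("include", 1, 3), ("include", 2, 4)], [("include", 3, 9)]]
```

Negating the first sample gives one Match that excludes all three positions. Negating the second gives two Matches, one for each way of choosing a component from the two-component Match.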

For example, consider the FTSelection ftnot ("Mustang" ftor "Honda"). The source AllMatches is given below:

The FTUnaryNot transforms the StringIncludes to StringExcludes as illustrated below.

FTMildNot

The parameters of the ApplyFTMildNot function are the two AllMatches parameters corresponding to the results of the two nested FTSelections. The semantics is given below.

declare function fts:CoveredIncludePositions (
      $match as element(fts:match) )
   as xs:integer*
{
   for $strInclude in $match/fts:stringInclude
   return $strInclude/fts:tokenInfo/@startPos
          to $strInclude/fts:tokenInfo/@endPos
};

declare function fts:ApplyFTMildNot (
      $allMatches1 as element(fts:allMatches),
      $allMatches2 as element(fts:allMatches) )
   as element(fts:allMatches)
{
   if (fn:count($allMatches1//fts:stringExclude) gt 0) then
      fn:error(fn:QName('http://www.w3.org/2005/xqt-errors',
                        'FTDY0017'),
               "Invalid expression on the left-hand side of a not-in")
   else if (fn:count($allMatches2//fts:stringExclude) gt 0) then
      fn:error(fn:QName('http://www.w3.org/2005/xqt-errors',
                        'FTDY0017'),
               "Invalid expression on the right-hand side of a not-in")
   else if (fn:count($allMatches2//fts:stringInclude) eq 0) then
      $allMatches1
   else
      <fts:allMatches stokenNum="{$allMatches1/@stokenNum}">
      {
         $allMatches1/fts:match[
            every $matches2 in $allMatches2/fts:match
            satisfies
               let $posSet1 := fts:CoveredIncludePositions(.)
               let $posSet2 := fts:CoveredIncludePositions($matches2)
               return some $pos in $posSet1
                      satisfies fn:not($pos = $posSet2)
         ]
      }
      </fts:allMatches>
};

The resulting AllMatches retains each Match of the first operand unless the positions covered by its StringInclude components are entirely covered by the StringInclude components of some Match of the second operand.
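The position test can be sketched non-normatively in Python. The sketch assumes that each Match is a list of (startPos, endPos) spans of its StringIncludes and omits the FTDY0017 checks for StringExcludes. Positions 1 and 27 for "Ford" follow the example in this section; the spans 1 to 2 and 27 to 28 for "Ford Mustang" and the extra occurrence at position 40 are hypothetical.

```python
def covered(match):
    """Set of token positions covered by the StringIncludes of a Match."""
    return {p for start, end in match for p in range(start, end + 1)}

def mild_not(matches1, matches2):
    """Keep a Match of the first operand unless some Match of the second covers it fully."""
    if not any(matches2):          # the second operand has no StringIncludes
        return matches1
    return [m1 for m1 in matches1
            if all(covered(m1) - covered(m2) for m2 in matches2)]

ford = [[(1, 1)], [(27, 27)]]
ford_mustang = [[(1, 2)], [(27, 28)]]
ford_with_extra = ford + [[(40, 40)]]   # hypothetical extra occurrence of "Ford"
```

Both occurrences of "Ford" lie inside an occurrence of "Ford Mustang", so none of them is retained; an occurrence outside any such span would be.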

For example, consider the FTSelection ("Ford" not in "Ford Mustang"). The source AllMatches for the left-hand side argument is given below.

The source AllMatches for the right-hand side argument is given below.

The FTMildNot will transform these to an empty AllMatches because both position 1 and position 27 from the first AllMatches contain only TokenInfos from StringInclude components of the second AllMatches.

FTOrder

The ApplyFTOrder function has one AllMatches parameter corresponding to the result of the nested FTSelections. The semantics is given below.

declare function fts:ApplyFTOrder (
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude
            satisfies
               (($stringInclude1/fts:tokenInfo/@startPos <=
                 $stringInclude2/fts:tokenInfo/@startPos)
                and
                ($stringInclude1/@queryPos <=
                 $stringInclude2/@queryPos))
               or
               (($stringInclude1/fts:tokenInfo/@startPos >=
                 $stringInclude2/fts:tokenInfo/@startPos)
                and
                ($stringInclude1/@queryPos >=
                 $stringInclude2/@queryPos))
      return
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where every $stringIncl in $match/fts:stringInclude
                  satisfies
                     (($stringExcl/fts:tokenInfo/@startPos <=
                       $stringIncl/fts:tokenInfo/@startPos)
                      and
                      ($stringExcl/@queryPos <=
                       $stringIncl/@queryPos))
                     or
                     (($stringExcl/fts:tokenInfo/@startPos >=
                       $stringIncl/fts:tokenInfo/@startPos)
                      and
                      ($stringExcl/@queryPos >=
                       $stringIncl/@queryPos))
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The resulting AllMatches contains the Matches for which the starting positions in the StringInclude elements are in the order of the query positions of their query strings. StringExcludes that preserve the order (with respect to their starting positions) are also retained.
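The ordering test can be sketched non-normatively in Python. The sketch assumes that each StringInclude or StringExclude is a (queryPos, startPos) pair and each Match is a dictionary of includes and excludes; the sample positions are hypothetical.

```python
def in_order(a, b):
    """True if two string matches appear in the same relative order as in the query."""
    (qa, sa), (qb, sb) = a, b
    return (sa <= sb and qa <= qb) or (sa >= sb and qa >= qb)

def apply_ft_order(matches):
    """Keep ordered Matches; within them keep only order-preserving StringExcludes."""
    result = []
    for m in matches:
        inc = m["include"]
        if all(in_order(a, b) for a in inc for b in inc):
            exc = [e for e in m["exclude"]
                   if all(in_order(e, i) for i in inc)]
            result.append({"include": inc, "exclude": exc})
    return result

# Hypothetical Matches for a two-token query (queryPos 1 and 2).
ordered = {"include": [(1, 10), (2, 11)], "exclude": [(3, 5), (3, 12)]}
reversed_match = {"include": [(1, 20), (2, 15)], "exclude": []}
kept = apply_ft_order([ordered, reversed_match])
```

The second Match is dropped because its tokens occur in the opposite order to the query. In the first Match, the StringExclude at position 5 is dropped because it precedes the StringIncludes although its query position follows theirs.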

For example, consider the FTSelection ("great" ftand "condition") ordered. The source AllMatches is given below.

The AllMatches for FTOrder are given below.

FTScope

The parameters of the ApplyFTScope function are 1) the type of the scope (same or different), 2) the linguistic unit (sentence or paragraph), and 3) one AllMatches parameter corresponding to the result of the nested FTSelections. The function definitions depend on the linguistic unit (sentence, paragraph) and the type of the scope (same, different).

The semantics of same sentence is given below.

declare function fts:ApplyFTScopeSameSentence (
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude
            satisfies $stringInclude1/fts:tokenInfo/@startSent =
                      $stringInclude2/fts:tokenInfo/@startSent
                  and $stringInclude1/fts:tokenInfo/@startSent =
                      $stringInclude1/fts:tokenInfo/@endSent
                  and $stringInclude2/fts:tokenInfo/@startSent =
                      $stringInclude2/fts:tokenInfo/@endSent
                  and $stringInclude1/fts:tokenInfo/@startSent > 0
                  and $stringInclude2/fts:tokenInfo/@startSent > 0
      return
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where $stringExcl/fts:tokenInfo/@startSent = 0
               or ($stringExcl/fts:tokenInfo/@startSent =
                   $stringExcl/fts:tokenInfo/@endSent
                   and (every $stringIncl in $match/fts:stringInclude
                        satisfies $stringIncl/fts:tokenInfo/@startSent =
                                  $stringExcl/fts:tokenInfo/@startSent) )
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

An AllMatches returned by the scope same sentence contains those Matches whose StringIncludes span only a single sentence and all span the same sentence. In these Matches only those StringExcludes are retained that also only span a single sentence, which is, in case there are StringIncludes in that Match, the same as the one spanned by the StringIncludes.

The semantics of different sentence is given below.

declare function fts:ApplyFTScopeDifferentSentence (
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude
            satisfies $stringInclude1 is $stringInclude2
                   or $stringInclude1/fts:tokenInfo/@endSent <
                      $stringInclude2/fts:tokenInfo/@startSent
                   or $stringInclude2/fts:tokenInfo/@endSent <
                      $stringInclude1/fts:tokenInfo/@startSent
      return
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where every $stringIncl in $match/fts:stringInclude
                  satisfies $stringExcl/fts:tokenInfo/@endSent <
                            $stringIncl/fts:tokenInfo/@startSent
                         or $stringIncl/fts:tokenInfo/@endSent <
                            $stringExcl/fts:tokenInfo/@startSent
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

An AllMatches returned by the scope different sentence contains those Matches that have no two StringIncludes covering the same sentence. In these Matches only those StringExcludes are retained that also do not cover a common sentence with one of the StringIncludes.
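The two sentence predicates can be sketched non-normatively in Python. The sketch assumes that each StringInclude of a Match is reduced to a (startSent, endSent) pair, with 0 denoting that no sentence is known, and it leaves out the filtering of StringExcludes; the sample values are hypothetical.

```python
def same_sentence(match):
    """Every StringInclude lies within one sentence, and it is the same sentence for all."""
    return all(s1 == s2 and s1 == e1 and s2 == e2 and s1 > 0 and s2 > 0
               for s1, e1 in match
               for s2, e2 in match)

def different_sentence(match):
    """No two distinct StringIncludes share a sentence."""
    return all(i == j or e1 < s2 or e2 < s1
               for i, (s1, e1) in enumerate(match)
               for j, (s2, e2) in enumerate(match))

def apply_scope(matches, predicate):
    """Retain only the Matches that satisfy the scope predicate."""
    return [m for m in matches if predicate(m)]

# Hypothetical Matches: both includes in sentence 1, and includes in sentences 1 and 2.
together = [(1, 1), (1, 1)]
apart = [(1, 1), (2, 2)]
```

Applying same_sentence keeps only the first sample Match, and applying different_sentence keeps only the second. A StringInclude that itself spans two sentences fails the same sentence test.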

The semantics of same paragraph is analogous to same sentence and is given below.

declare function fts:ApplyFTScopeSameParagraph (
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude
            satisfies $stringInclude1/fts:tokenInfo/@startPara =
                      $stringInclude2/fts:tokenInfo/@startPara
                  and $stringInclude1/fts:tokenInfo/@startPara =
                      $stringInclude1/fts:tokenInfo/@endPara
                  and $stringInclude2/fts:tokenInfo/@startPara =
                      $stringInclude2/fts:tokenInfo/@endPara
                  and $stringInclude1/fts:tokenInfo/@startPara > 0
                  and $stringInclude2/fts:tokenInfo/@endPara > 0
      return
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where $stringExcl/fts:tokenInfo/@startPara = 0
               or ($stringExcl/fts:tokenInfo/@startPara =
                   $stringExcl/fts:tokenInfo/@endPara
                   and (every $stringIncl in $match/fts:stringInclude
                        satisfies $stringIncl/fts:tokenInfo/@startPara =
                                  $stringExcl/fts:tokenInfo/@startPara) )
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of different paragraph is analogous to different sentence and is given below.

declare function fts:ApplyFTScopeDifferentParagraph (
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude
            satisfies $stringInclude1 is $stringInclude2
                   or $stringInclude1/fts:tokenInfo/@endPara <
                      $stringInclude2/fts:tokenInfo/@startPara
                   or $stringInclude2/fts:tokenInfo/@endPara <
                      $stringInclude1/fts:tokenInfo/@startPara
      return
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where every $stringIncl in $match/fts:stringInclude
                  satisfies $stringExcl/fts:tokenInfo/@endPara <
                            $stringIncl/fts:tokenInfo/@startPara
                         or $stringIncl/fts:tokenInfo/@endPara <
                            $stringExcl/fts:tokenInfo/@startPara
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics for the general case is given below.

declare function fts:ApplyFTScope (
      $type as fts:scopeType,
      $selector as fts:scopeSelector,
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches)
{
   if ($type eq "same" and $selector eq "sentence") then
      fts:ApplyFTScopeSameSentence($allMatches)
   else if ($type eq "different" and $selector eq "sentence") then
      fts:ApplyFTScopeDifferentSentence($allMatches)
   else if ($type eq "same" and $selector eq "paragraph") then
      fts:ApplyFTScopeSameParagraph($allMatches)
   else
      fts:ApplyFTScopeDifferentParagraph($allMatches)
};

For example, consider the FTSelection ("Mustang" ftand "Honda") same paragraph. The source AllMatches is given below.

The FTScope returns an empty AllMatches because neither Match contains TokenInfos from a single paragraph.
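The two paragraph-scope filters can also be read as plain predicates over the StringIncludes of one Match. The following Python fragment is only an illustrative sketch, not part of the formal semantics: the class ParaSpan and both function names are hypothetical, each StringInclude is reduced to its startPara and endPara values, and the handling of StringExcludes is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ParaSpan:
    """Hypothetical stand-in for the paragraph attributes of an fts:tokenInfo."""
    start_para: int
    end_para: int


def same_paragraph(includes: List[ParaSpan]) -> bool:
    """Mirror the 'where' clause of fts:ApplyFTScopeSameParagraph."""
    for span in includes:
        # Each StringInclude must start and end in one known (> 0) paragraph.
        if span.start_para != span.end_para or span.start_para <= 0:
            return False
    # All StringIncludes must share that paragraph.
    return len({span.start_para for span in includes}) <= 1


def different_paragraph(includes: List[ParaSpan]) -> bool:
    """Mirror the 'where' clause of fts:ApplyFTScopeDifferentParagraph."""
    for i, first in enumerate(includes):
        for second in includes[i + 1:]:
            # Two distinct StringIncludes must not overlap in paragraph range.
            disjoint = (first.end_para < second.start_para
                        or second.end_para < first.start_para)
            if not disjoint:
                return False
    return True
```

Under these assumptions, a Match whose StringIncludes lie in paragraphs 1 and 2 is dropped by same_paragraph and kept by different_paragraph.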

FTContent

The parameters of the ApplyFTContent function are 1) the search context, 2) the type of the content match (at start, at end, or entire content), and 3) one AllMatches parameter corresponding to the result of the nested FTSelections. The semantics is given below.

declare function fts:ApplyFTContent (
    $searchContext as item(),
    $type as fts:contentMatchType,
    $allMatches as element(fts:allMatches) )
  as element(fts:allMatches)
{
  if ($type eq "entire content") then
    let $temp1 := fts:ApplyFTWordDistanceExactly($allMatches, 1)
    let $temp2 := fts:ApplyFTContent(
                    $searchContext,
                    fts:contentMatchType("at start"),
                    $temp1)
    let $temp3 := fts:ApplyFTContent(
                    $searchContext,
                    fts:contentMatchType("at end"),
                    $temp2)
    return
      <fts:allMatches stokenNum="{$temp3/@stokenNum}">
      {
        for $match in $temp3/fts:match
        return
          <fts:match>
          {
            (: Note: due to ApplyFTWordDistanceExactly above there must be
               either one or no stringInclude in $match :)
            $match/fts:stringInclude[@isContiguous],
            $match/fts:stringExclude[@isContiguous]
          }
          </fts:match>
      }
      </fts:allMatches>
  else
    <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
    {
      for $match in $allMatches/fts:match
      where
        if ($type eq "at start") then
          some $si in $match/fts:stringInclude
          satisfies fts:isStartToken($searchContext, $si/fts:tokenInfo)
        else (: $type eq "at end" :)
          some $si in $match/fts:stringInclude
          satisfies fts:isEndToken($searchContext, $si/fts:tokenInfo)
      return $match
    }
    </fts:allMatches>
};

The evaluation of ApplyFTContent depends on the type of the content match.

entire content is evaluated as distance exactly 0 words at start at end, i.e., all the StringIncludes must match every token in the search context item.

at start retains only Matches that contain a StringInclude that matches the first token. This is checked using the semantic function fts:isStartToken.

at end retains the Matches that contain a StringInclude that matches the last token. This is checked using the semantic function fts:isEndToken.
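The at start and at end cases amount to a simple filter on each Match. The Python fragment below is a minimal sketch under stated assumptions: a StringInclude is reduced to a (startPos, endPos) pair, and the semantic functions fts:isStartToken and fts:isEndToken (defined elsewhere in this specification) are approximated by comparing against the first and last token positions of the search context item. The function name keeps_match and its parameters are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (startPos, endPos) of one StringInclude, inclusive


def keeps_match(includes: List[Span], first_pos: int, last_pos: int,
                kind: str) -> bool:
    """Decide whether a Match survives an 'at start' or 'at end' filter.

    first_pos and last_pos are the assumed positions of the first and last
    token of the search context item.
    """
    if kind == "at start":
        # Some StringInclude must begin at the first token.
        return any(start == first_pos for start, _ in includes)
    if kind == "at end":
        # Some StringInclude must finish at the last token.
        return any(end == last_pos for _, end in includes)
    raise ValueError("only 'at start' and 'at end' are sketched here")
```

The entire content case is deliberately omitted, because it also depends on the distance and contiguity checks described in the following sections.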

FTWindow

Before we define the semantic functions of the FTWindow and FTDistance operations, we introduce the auxiliary function joinIncludes, which is used in their definitions. joinIncludes takes the sequence of StringIncludes of a Match and transforms it into either the empty sequence, if the input sequence is empty, or a single StringInclude representing the span from the first position of the match to the last. So that an "entire content" operator further up in the tree can be evaluated, we pre-evaluate whether all positions between the first and the last are covered by the input StringIncludes and store that boolean in the attribute "isContiguous".

declare function fts:joinIncludes (
    $strIncls as element(fts:stringInclude)* )
  as element(fts:stringInclude)?
{
  if (fn:empty($strIncls)) then
    $strIncls
  else
    let $posSet := fts:CoveredIncludePositions(
                     <fts:match>{$strIncls}</fts:match>),
        $minPos := fn:min($strIncls/fts:tokenInfo/@startPos),
        $maxPos := fn:max($strIncls/fts:tokenInfo/@endPos),
        $isContiguous :=
          ( every $pos in $minPos to $maxPos
            satisfies ($pos = $posSet) )
          and
          ( every $strIncl in $strIncls
            satisfies $strIncl/@isContiguous )
    return
      <fts:stringInclude queryPos="{$strIncls[1]/@queryPos}"
                         isContiguous="{$isContiguous}">
        <fts:tokenInfo
            startPos="{$minPos}"
            endPos="{$maxPos}"
            startSent="{fn:min($strIncls/fts:tokenInfo/@startSent)}"
            endSent="{fn:max($strIncls/fts:tokenInfo/@endSent)}"
            startPara="{fn:min($strIncls/fts:tokenInfo/@startPara)}"
            endPara="{fn:max($strIncls/fts:tokenInfo/@endPara)}"/>
      </fts:stringInclude>
};
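The effect of joinIncludes on token positions can be illustrated with a small Python sketch. It is not normative and makes several assumptions: each StringInclude is reduced to a (startPos, endPos) pair, fts:CoveredIncludePositions (defined elsewhere) is taken to return every position covered by some StringInclude, and the isContiguous flags already present on the inputs are ignored. The name join_includes is hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (startPos, endPos) of one StringInclude, inclusive


def join_includes(spans: List[Span]) -> Optional[Tuple[int, int, bool]]:
    """Collapse the spans of a Match into (minPos, maxPos, isContiguous)."""
    if not spans:
        # An empty input yields an empty result.
        return None
    covered = set()
    for start, end in spans:
        covered.update(range(start, end + 1))
    min_pos = min(start for start, _ in spans)
    max_pos = max(end for _, end in spans)
    # Contiguous only if no position between the extremes is left uncovered.
    is_contiguous = all(p in covered for p in range(min_pos, max_pos + 1))
    return (min_pos, max_pos, is_contiguous)
```

For instance, the spans (1, 2) and (3, 3) join into a contiguous span from 1 to 3, while (1, 2) and (5, 5) join into a span from 1 to 5 that is flagged as not contiguous.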

The parameters of the ApplyFTWindow function are 1) the unit of type fts:distanceType, 2) a size, and 3) one AllMatches parameter corresponding to the result of the nested FTSelections. For each unit type a function is defined as follows.

The semantics of window N words is given below.

declare function fts:ApplyFTWordWindow (
    $allMatches as element(fts:allMatches),
    $n as xs:integer )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
  {
    for $match in $allMatches/fts:match
    let $minpos := fn:min($match/fts:stringInclude/fts:tokenInfo/@startPos),
        $maxpos := fn:max($match/fts:stringInclude/fts:tokenInfo/@endPos)
    for $windowStartPos in ($maxpos - $n + 1 to $minpos)
    let $windowEndPos := $windowStartPos + $n - 1
    return
      <fts:match>
      {
        fts:joinIncludes($match/fts:stringInclude),
        for $stringExclude in $match/fts:stringExclude
        where $stringExclude/fts:tokenInfo/@startPos >= $windowStartPos
          and $stringExclude/fts:tokenInfo/@endPos <= $windowEndPos
        return $stringExclude
      }
      </fts:match>
  }
  </fts:allMatches>
};

The semantics of window N sentences is given below.

declare function fts:ApplyFTSentenceWindow (
    $allMatches as element(fts:allMatches),
    $n as xs:integer )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
  {
    for $match in $allMatches/fts:match
    let $minpos := fn:min($match/fts:stringInclude/fts:tokenInfo/@startSent),
        $maxpos := fn:max($match/fts:stringInclude/fts:tokenInfo/@endSent)
    for $windowStartPos in ($maxpos - $n + 1 to $minpos)
    let $windowEndPos := $windowStartPos + $n - 1
    return
      <fts:match>
      {
        fts:joinIncludes($match/fts:stringInclude),
        for $stringExclude in $match/fts:stringExclude
        where $stringExclude/fts:tokenInfo/@startSent >= $windowStartPos
          and $stringExclude/fts:tokenInfo/@endSent <= $windowEndPos
        return $stringExclude
      }
      </fts:match>
  }
  </fts:allMatches>
};

The semantics of window N paragraphs is given below.

declare function fts:ApplyFTParagraphWindow (
    $allMatches as element(fts:allMatches),
    $n as xs:integer )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
  {
    for $match in $allMatches/fts:match
    let $minpos := fn:min($match/fts:stringInclude/fts:tokenInfo/@startPara),
        $maxpos := fn:max($match/fts:stringInclude/fts:tokenInfo/@endPara)
    for $windowStartPos in ($maxpos - $n + 1 to $minpos)
    let $windowEndPos := $windowStartPos + $n - 1
    return
      <fts:match>
      {
        fts:joinIncludes($match/fts:stringInclude),
        for $stringExclude in $match/fts:stringExclude
        where $stringExclude/fts:tokenInfo/@startPara >= $windowStartPos
          and $stringExclude/fts:tokenInfo/@endPara <= $windowEndPos
        return $stringExclude
      }
      </fts:match>
  }
  </fts:allMatches>
};

The resulting AllMatches contains Matches of the operand that satisfy the condition that there exists a sequence of the specified number of consecutive (token, sentence, or paragraph) positions, such that all StringIncludes are within that window, and the StringExcludes retained are also within that window. For each Match that satisfies the window condition the StringIncludes are joined into a single StringInclude. This allows further window or distance operations to treat the result as a single entity.

The semantics for the general function is given below.

declare function fts:ApplyFTWindow (
    $type as fts:distanceType,
    $size as xs:integer,
    $allMatches as element(fts:allMatches) )
  as element(fts:allMatches)
{
  if ($type eq "word") then
    fts:ApplyFTWordWindow($allMatches, $size)
  else if ($type eq "sentence") then
    fts:ApplyFTSentenceWindow($allMatches, $size)
  else
    fts:ApplyFTParagraphWindow($allMatches, $size)
};

For example, consider the FTWindow selection ("Ford Mustang" ftand "excellent") window 10 words. The Matches of the source AllMatches for ("Ford Mustang" ftand "excellent") are given below.

The result for the FTWindow selection consists of only the first, the fifth, and the sixth Matches because their respective window sizes are 5, 4, and 9.
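The window condition can be checked by enumerating the candidate windows, as the word-window function does. The following Python fragment is a hedged sketch rather than the normative definition: StringIncludes are reduced to (startPos, endPos) pairs, StringExcludes are ignored, and the name word_windows is hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (startPos, endPos) of one StringInclude, inclusive


def word_windows(includes: List[Span], n: int) -> List[Tuple[int, int]]:
    """List every window of n consecutive positions covering all includes."""
    if not includes:
        return []
    min_pos = min(start for start, _ in includes)
    max_pos = max(end for _, end in includes)
    # A window may start anywhere from (max_pos - n + 1) up to min_pos;
    # the range is empty when the includes span more than n positions.
    return [(start, start + n - 1)
            for start in range(max_pos - n + 1, min_pos + 1)]
```

With illustrative positions 1, 2, and 5 (a span of 5 tokens) and a window of 10 words, the sketch yields six candidate windows, so the Match is retained. A Match spanning more than 10 positions yields none and is dropped.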

FTDistance

The parameters of the ApplyFTDistance function are 1) the unit of the distance (tokens, sentences, paragraphs), 2) the range specified, and 3) one AllMatches parameter corresponding to the result of the nested FTSelections. The function definitions depend on the distance units and the range specifications.

The semantics of word distance exactly N is given below.

declare function fts:ApplyFTWordDistanceExactly (
    $allMatches as element(fts:allMatches),
    $n as xs:integer )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
  {
    for $match in $allMatches/fts:match
    let $sorted :=
      for $si in $match/fts:stringInclude
      order by $si/fts:tokenInfo/@startPos ascending,
               $si/fts:tokenInfo/@endPos ascending
      return $si
    where
      if (fn:count($sorted) le 1) then fn:true()
      else
        every $idx in 1 to fn:count($sorted) - 1
        satisfies fts:wordDistance(
                    $sorted[$idx]/fts:tokenInfo,
                    $sorted[$idx+1]/fts:tokenInfo ) = $n
    return
      <fts:match>
      {
        fts:joinIncludes($match/fts:stringInclude),
        for $stringExcl in $match/fts:stringExclude
        where some $stringIncl in $match/fts:stringInclude
              satisfies fts:wordDistance(
                          $stringIncl/fts:tokenInfo,
                          $stringExcl/fts:tokenInfo ) = $n
        return $stringExcl
      }
      </fts:match>
  }
  </fts:allMatches>
};

The semantics of word distance at least N is given below.

declare function fts:ApplyFTWordDistanceAtLeast (
    $allMatches as element(fts:allMatches),
    $n as xs:integer )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
  {
    for $match in $allMatches/fts:match
    let $sorted :=
      for $si in $match/fts:stringInclude
      order by $si/fts:tokenInfo/@startPos ascending,
               $si/fts:tokenInfo/@endPos ascending
      return $si
    where
      if (fn:count($sorted) le 1) then fn:true()
      else
        every $index in (1 to fn:count($sorted) - 1)
        satisfies fts:wordDistance(
                    $sorted[$index]/fts:tokenInfo,
                    $sorted[$index+1]/fts:tokenInfo ) >= $n
    return
      <fts:match>
      {
        fts:joinIncludes($match/fts:stringInclude),
        for $stringExcl in $match/fts:stringExclude
        where some $stringIncl in $match/fts:stringInclude
              satisfies fts:wordDistance(
                          $stringIncl/fts:tokenInfo,
                          $stringExcl/fts:tokenInfo ) >= $n
        return $stringExcl
      }
      </fts:match>
  }
  </fts:allMatches>
};

The semantics of word distance at most N is given below.

declare function fts:ApplyFTWordDistanceAtMost (
    $allMatches as element(fts:allMatches),
    $n as xs:integer )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
  {
    for $match in $allMatches/fts:match
    let $sorted :=
      for $si in $match/fts:stringInclude
      order by $si/fts:tokenInfo/@startPos ascending,
               $si/fts:tokenInfo/@endPos ascending
      return $si
    where
      if (fn:count($sorted) le 1) then fn:true()
      else
        every $index in (1 to fn:count($sorted) - 1)
        satisfies fts:wordDistance(
                    $sorted[$index]/fts:tokenInfo,
                    $sorted[$index+1]/fts:tokenInfo ) <= $n
    return
      <fts:match>
      {
        fts:joinIncludes($match/fts:stringInclude),
        for $stringExcl in $match/fts:stringExclude
        where some $stringIncl in $match/fts:stringInclude
              satisfies fts:wordDistance(
                          $stringIncl/fts:tokenInfo,
                          $stringExcl/fts:tokenInfo ) <= $n
        return $stringExcl
      }
      </fts:match>
  }
  </fts:allMatches>
};

The semantics of word distance from M to N is given below.

declare function fts:ApplyFTWordDistanceFromTo (
    $allMatches as element(fts:allMatches),
    $m as xs:integer,
    $n as xs:integer )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
  {
    for $match in $allMatches/fts:match
    let $sorted :=
      for $si in $match/fts:stringInclude
      order by $si/fts:tokenInfo/@startPos ascending,
               $si/fts:tokenInfo/@endPos ascending
      return $si
    where
      if (fn:count($sorted) le 1) then fn:true()
      else
        every $index in (1 to fn:count($sorted) - 1)
        satisfies fts:wordDistance(
                    $sorted[$index]/fts:tokenInfo,
                    $sorted[$index+1]/fts:tokenInfo ) >= $m
              and fts:wordDistance(
                    $sorted[$index]/fts:tokenInfo,
                    $sorted[$index+1]/fts:tokenInfo ) <= $n
    return
      <fts:match>
      {
        fts:joinIncludes($match/fts:stringInclude),
        for $stringExcl in $match/fts:stringExclude
        where some $stringIncl in $match/fts:stringInclude
              satisfies fts:wordDistance(
                          $stringIncl/fts:tokenInfo,
                          $stringExcl/fts:tokenInfo ) >= $m
                    and fts:wordDistance(
                          $stringIncl/fts:tokenInfo,
                          $stringExcl/fts:tokenInfo ) <= $n
        return $stringExcl
      }
      </fts:match>
  }
  </fts:allMatches>
};

The semantics of sentence distance exactly N is given below.

declare function fts:ApplyFTSentenceDistanceExactly (
    $allMatches as element(fts:allMatches),
    $n as xs:integer )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
  {
    for $match in $allMatches/fts:match
    let $sorted :=
      for $si in $match/fts:stringInclude
      order by $si/fts:tokenInfo/@startSent ascending,
               $si/fts:tokenInfo/@endSent ascending
      return $si
    where
      if (fn:count($sorted) le 1) then fn:true()
      else
        every $index in (1 to fn:count($sorted) - 1)
        satisfies fts:sentenceDistance(
                    $sorted[$index]/fts:tokenInfo,
                    $sorted[$index+1]/fts:tokenInfo ) = $n
    return
      <fts:match>
      {
        fts:joinIncludes($match/fts:stringInclude),
        for $stringExcl in $match/fts:stringExclude
        where some $stringIncl in $match/fts:stringInclude
              satisfies fts:sentenceDistance(
                          $stringIncl/fts:tokenInfo,
                          $stringExcl/fts:tokenInfo ) = $n
        return $stringExcl
      }
      </fts:match>
  }
  </fts:allMatches>
};

The semantics of sentence distance at least N is given below.

declare function fts:ApplyFTSentenceDistanceAtLeast (
    $allMatches as element(fts:allMatches),
    $n as xs:integer )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
  {
    for $match in $allMatches/fts:match
    let $sorted :=
      for $si in $match/fts:stringInclude
      order by $si/fts:tokenInfo/@startSent ascending,
               $si/fts:tokenInfo/@endSent ascending
      return $si
    where
      if (fn:count($sorted) le 1) then fn:true()
      else
        every $index in (1 to fn:count($sorted) - 1)
        satisfies fts:sentenceDistance(
                    $sorted[$index]/fts:tokenInfo,
                    $sorted[$index+1]/fts:tokenInfo ) >= $n
    return
      <fts:match>
      {
        fts:joinIncludes($match/fts:stringInclude),
        for $stringExcl in $match/fts:stringExclude
        where some $stringIncl in $match/fts:stringInclude
              satisfies fts:sentenceDistance(
                          $stringIncl/fts:tokenInfo,
                          $stringExcl/fts:tokenInfo ) >= $n
        return $stringExcl
      }
      </fts:match>
  }
  </fts:allMatches>
};

The semantics of sentence distance at most N is given below.

declare function fts:ApplyFTSentenceDistanceAtMost (
    $allMatches as element(fts:allMatches),
    $n as xs:integer )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
  {
    for $match in $allMatches/fts:match
    let $sorted :=
      for $si in $match/fts:stringInclude
      order by $si/fts:tokenInfo/@startSent ascending,
               $si/fts:tokenInfo/@endSent ascending
      return $si
    where
      if (fn:count($sorted) le 1) then fn:true()
      else
        every $index in (1 to fn:count($sorted) - 1)
        satisfies fts:sentenceDistance(
                    $sorted[$index]/fts:tokenInfo,
                    $sorted[$index+1]/fts:tokenInfo ) <= $n
    return
      <fts:match>
      {
        fts:joinIncludes($match/fts:stringInclude),
        for $stringExcl in $match/fts:stringExclude
        where some $stringIncl in $match/fts:stringInclude
              satisfies fts:sentenceDistance(
                          $stringIncl/fts:tokenInfo,
                          $stringExcl/fts:tokenInfo ) <= $n
        return $stringExcl
      }
      </fts:match>
  }
  </fts:allMatches>
};

The semantics of sentence distance from M to N is given below.

declare function fts:ApplyFTSentenceDistanceFromTo (
    $allMatches as element(fts:allMatches),
    $m as xs:integer,
    $n as xs:integer )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
  {
    for $match in $allMatches/fts:match
    let $sorted :=
      for $si in $match/fts:stringInclude
      order by $si/fts:tokenInfo/@startSent ascending,
               $si/fts:tokenInfo/@endSent ascending
      return $si
    where
      if (fn:count($sorted) le 1) then fn:true()
      else
        every $index in (1 to fn:count($sorted) - 1)
        satisfies fts:sentenceDistance(
                    $sorted[$index]/fts:tokenInfo,
                    $sorted[$index+1]/fts:tokenInfo ) >= $m
              and fts:sentenceDistance(
                    $sorted[$index]/fts:tokenInfo,
                    $sorted[$index+1]/fts:tokenInfo ) <= $n
    return
      <fts:match>
      {
        fts:joinIncludes($match/fts:stringInclude),
        for $stringExcl in $match/fts:stringExclude
        where some $stringIncl in $match/fts:stringInclude
              satisfies fts:sentenceDistance(
                          $stringIncl/fts:tokenInfo,
                          $stringExcl/fts:tokenInfo ) >= $m
                    and fts:sentenceDistance(
                          $stringIncl/fts:tokenInfo,
                          $stringExcl/fts:tokenInfo ) <= $n
        return $stringExcl
      }
      </fts:match>
  }
  </fts:allMatches>
};

The semantics of paragraph distance exactly N is given below.

declare function fts:ApplyFTParagraphDistanceExactly (
    $allMatches as element(fts:allMatches),
    $n as xs:integer )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
  {
    for $match in $allMatches/fts:match
    let $sorted :=
      for $si in $match/fts:stringInclude
      order by $si/fts:tokenInfo/@startPara ascending,
               $si/fts:tokenInfo/@endPara ascending
      return $si
    where
      if (fn:count($sorted) le 1) then fn:true()
      else
        every $index in (1 to fn:count($sorted) - 1)
        satisfies fts:paraDistance(
                    $sorted[$index]/fts:tokenInfo,
                    $sorted[$index+1]/fts:tokenInfo ) = $n
    return
      <fts:match>
      {
        fts:joinIncludes($match/fts:stringInclude),
        for $stringExcl in $match/fts:stringExclude
        where some $stringIncl in $match/fts:stringInclude
              satisfies fts:paraDistance(
                          $stringIncl/fts:tokenInfo,
                          $stringExcl/fts:tokenInfo ) = $n
        return $stringExcl
      }
      </fts:match>
  }
  </fts:allMatches>
};

The semantics of paragraph distance at least N is given below.

declare function fts:ApplyFTParagraphDistanceAtLeast (
    $allMatches as element(fts:allMatches),
    $n as xs:integer )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
  {
    for $match in $allMatches/fts:match
    let $sorted :=
      for $si in $match/fts:stringInclude
      order by $si/fts:tokenInfo/@startPara ascending,
               $si/fts:tokenInfo/@endPara ascending
      return $si
    where
      if (fn:count($sorted) le 1) then fn:true()
      else
        every $index in (1 to fn:count($sorted) - 1)
        satisfies fts:paraDistance(
                    $sorted[$index]/fts:tokenInfo,
                    $sorted[$index+1]/fts:tokenInfo ) >= $n
    return
      <fts:match>
      {
        fts:joinIncludes($match/fts:stringInclude),
        for $stringExcl in $match/fts:stringExclude
        where some $stringIncl in $match/fts:stringInclude
              satisfies fts:paraDistance(
                          $stringIncl/fts:tokenInfo,
                          $stringExcl/fts:tokenInfo ) >= $n
        return $stringExcl
      }
      </fts:match>
  }
  </fts:allMatches>
};

The semantics of paragraph distance at most N is given below.

declare function fts:ApplyFTParagraphDistanceAtMost (
    $allMatches as element(fts:allMatches),
    $n as xs:integer )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
  {
    for $match in $allMatches/fts:match
    let $sorted :=
      for $si in $match/fts:stringInclude
      order by $si/fts:tokenInfo/@startPara ascending,
               $si/fts:tokenInfo/@endPara ascending
      return $si
    where
      if (fn:count($sorted) le 1) then fn:true()
      else
        every $index in (1 to fn:count($sorted) - 1)
        satisfies fts:paraDistance(
                    $sorted[$index]/fts:tokenInfo,
                    $sorted[$index+1]/fts:tokenInfo ) <= $n
    return
      <fts:match>
      {
        fts:joinIncludes($match/fts:stringInclude),
        for $stringExcl in $match/fts:stringExclude
        where some $stringIncl in $match/fts:stringInclude
              satisfies fts:paraDistance(
                          $stringIncl/fts:tokenInfo,
                          $stringExcl/fts:tokenInfo ) <= $n
        return $stringExcl
      }
      </fts:match>
  }
  </fts:allMatches>
};

The semantics of paragraph distance from M to N is given below.

declare function fts:ApplyFTParagraphDistanceFromTo (
    $allMatches as element(fts:allMatches),
    $m as xs:integer,
    $n as xs:integer )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
  {
    for $match in $allMatches/fts:match
    let $sorted :=
      for $si in $match/fts:stringInclude
      order by $si/fts:tokenInfo/@startPara ascending,
               $si/fts:tokenInfo/@endPara ascending
      return $si
    where
      if (fn:count($sorted) le 1) then fn:true()
      else
        every $index in (1 to fn:count($sorted) - 1)
        satisfies fts:paraDistance(
                    $sorted[$index]/fts:tokenInfo,
                    $sorted[$index+1]/fts:tokenInfo ) >= $m
              and fts:paraDistance(
                    $sorted[$index]/fts:tokenInfo,
                    $sorted[$index+1]/fts:tokenInfo ) <= $n
    return
      <fts:match>
      {
        fts:joinIncludes($match/fts:stringInclude),
        for $stringExcl in $match/fts:stringExclude
        where some $stringIncl in $match/fts:stringInclude
              satisfies fts:paraDistance(
                          $stringIncl/fts:tokenInfo,
                          $stringExcl/fts:tokenInfo ) >= $m
                    and fts:paraDistance(
                          $stringIncl/fts:tokenInfo,
                          $stringExcl/fts:tokenInfo ) <= $n
        return $stringExcl
      }
      </fts:match>
  }
  </fts:allMatches>
};

The resulting AllMatches contains Matches of the operand that satisfy the condition that the distance for every pair of consecutive StringIncludes is within the specified interval, where the distance is measured in tokens, sentences, or paragraphs from the end of the preceding StringInclude to the start of the next.

In the general case, the semantics is given below.

declare function fts:ApplyFTDistance (
    $type as fts:distanceType,
    $range as element(fts:range),
    $allMatches as element(fts:allMatches) )
  as element(fts:allMatches)
{
  if ($type eq "word") then
    if ($range/@type eq "exactly") then
      fts:ApplyFTWordDistanceExactly($allMatches, $range/@n)
    else if ($range/@type eq "at least") then
      fts:ApplyFTWordDistanceAtLeast($allMatches, $range/@n)
    else if ($range/@type eq "at most") then
      fts:ApplyFTWordDistanceAtMost($allMatches, $range/@n)
    else
      fts:ApplyFTWordDistanceFromTo($allMatches, $range/@m, $range/@n)
  else if ($type eq "sentence") then
    if ($range/@type eq "exactly") then
      fts:ApplyFTSentenceDistanceExactly($allMatches, $range/@n)
    else if ($range/@type eq "at least") then
      fts:ApplyFTSentenceDistanceAtLeast($allMatches, $range/@n)
    else if ($range/@type eq "at most") then
      fts:ApplyFTSentenceDistanceAtMost($allMatches, $range/@n)
    else
      fts:ApplyFTSentenceDistanceFromTo($allMatches, $range/@m, $range/@n)
  else
    if ($range/@type eq "exactly") then
      fts:ApplyFTParagraphDistanceExactly($allMatches, $range/@n)
    else if ($range/@type eq "at least") then
      fts:ApplyFTParagraphDistanceAtLeast($allMatches, $range/@n)
    else if ($range/@type eq "at most") then
      fts:ApplyFTParagraphDistanceAtMost($allMatches, $range/@n)
    else
      fts:ApplyFTParagraphDistanceFromTo($allMatches, $range/@m, $range/@n)
};

For example, consider the FTDistance selection ("Ford Mustang" ftand "excellent") distance at most 3 words. The Matches of the source AllMatches for ("Ford Mustang" ftand "excellent") are given below.

The result for the FTDistance selection consists of only the first Match (with positions 1, 2, and 5) and the fifth Match (with positions 25, 27, and 28), because only for these Matches the word distance between consecutive TokenInfos is always less than or equal to 3. It is 1 for the first pair and 3 for the second in the first case, and 2 and 1 in the second.
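The distance values quoted in this example can be reproduced with a short Python sketch. It is illustrative only. The function fts:wordDistance is defined elsewhere in this specification; here it is assumed, purely to match the numbers quoted above, to be the start position of the later token minus the end position of the earlier one. The names word_distance and distance_at_most are hypothetical, and StringExcludes are ignored.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (startPos, endPos) of one StringInclude, inclusive


def word_distance(a: Span, b: Span) -> int:
    """Assumed distance: start of the later span minus end of the earlier."""
    first, second = sorted([a, b])
    return second[0] - first[1]


def consecutive_distances(includes: List[Span]) -> List[int]:
    """Distances between neighbours after ordering by startPos, then endPos."""
    ordered = sorted(includes)
    return [word_distance(x, y) for x, y in zip(ordered, ordered[1:])]


def distance_at_most(includes: List[Span], n: int) -> bool:
    """True if every pair of consecutive StringIncludes is at most n apart."""
    # With zero or one StringInclude there are no pairs, so the test passes.
    return all(d <= n for d in consecutive_distances(includes))
```

For the positions 1, 2, and 5 the sketch computes the distances 1 and 3, and for 25, 27, and 28 it computes 2 and 1, so both Matches pass an "at most 3 words" test under the stated assumption.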

FTTimes

The parameters of the ApplyFTTimes function are 1) an FTRange specification, and 2) a parameter corresponding to the result of the nested FTWords.

The function definitions depend on the range specification FTRange to limit the number of occurrences.

The general semantics is given below.

declare function fts:FormCombinations (
    $sms as element(fts:match)*,
    $k as xs:integer )
  as element(fts:match)*
(:
   Find all combinations of exactly $k elements from $sms, and
   for each such combination, construct a match whose children are
   copies of all the children of all the elements in the combination.
   Return the sequence of all such matches.
:)
{
  if ($k eq 0) then <fts:match/>
  else if (fn:count($sms) lt $k) then ()
  else if (fn:count($sms) eq $k) then <fts:match>{$sms/*}</fts:match>
  else
    let $first := $sms[1],
        $rest := fn:subsequence($sms, 2)
    return (
      (: all the combinations that don't involve $first :)
      fts:FormCombinations($rest, $k),

      (: and all the combinations that do involve $first :)
      for $combination in fts:FormCombinations($rest, $k - 1)
      return
        <fts:match>
        {
          $first/*,
          $combination/*
        }
        </fts:match>
    )
};

declare function fts:FormCombinationsAtLeast (
    $sms as element(fts:match)*,
    $times as xs:integer )
  as element(fts:match)*
(:
   Find all combinations of $times or more elements from $sms, and
   for each such combination, construct a match whose children are
   copies of all the children of all the elements in the combination.
   Return the sequence of all such matches.
:)
{
  for $k in $times to fn:count($sms)
  return fts:FormCombinations($sms, $k)
};

declare function fts:FormRange (
    $sms as element(fts:match)*,
    $l as xs:integer,
    $u as xs:integer,
    $stokenNum as xs:integer )
  as element(fts:allMatches)
{
  (: An empty range yields an AllMatches without Matches, so that the
     declared return type is respected. :)
  if ($l > $u) then <fts:allMatches stokenNum="{$stokenNum}"/>
  else
    let $am1 := <fts:allMatches stokenNum="{$stokenNum}">
                  {fts:FormCombinationsAtLeast($sms, $l)}
                </fts:allMatches>
    let $am2 := <fts:allMatches stokenNum="{$stokenNum}">
                  {fts:FormCombinationsAtLeast($sms, $u+1)}
                </fts:allMatches>
    return fts:ApplyFTAnd($am1, fts:ApplyFTUnaryNot($am2))
};

The semantics of occurs exactly N times is given below.

declare function fts:ApplyFTTimesExactly (
    $allMatches as element(fts:allMatches),
    $n as xs:integer )
  as element(fts:allMatches)
{
  fts:FormRange($allMatches/fts:match, $n, $n, $allMatches/@stokenNum)
};

The semantics of occurs at least N times is given below.

declare function fts:ApplyFTTimesAtLeast (
    $allMatches as element(fts:allMatches),
    $n as xs:integer )
  as element(fts:allMatches)
{
  <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
    {fts:FormCombinationsAtLeast($allMatches/fts:match, $n)}
  </fts:allMatches>
};

The semantics of occurs at most N times is given below.

declare function fts:ApplyFTTimesAtMost (
    $allMatches as element(fts:allMatches),
    $n as xs:integer )
  as element(fts:allMatches)
{
  fts:FormRange($allMatches/fts:match, 0, $n, $allMatches/@stokenNum)
};

The semantics of occurs from M to N times is given below.

declare function fts:ApplyFTTimesFromTo (
    $allMatches as element(fts:allMatches),
    $m as xs:integer,
    $n as xs:integer )
  as element(fts:allMatches)
{
  fts:FormRange($allMatches/fts:match, $m, $n, $allMatches/@stokenNum)
};

The way to ensure that there are at least N different matches of an FTSelection is to ensure that at least N of its Matches occur simultaneously. This is similar to forming their conjunction by combining N or more distinct Matches into one simple match. Therefore, the AllMatches for the selection condition specifying the range qualifier at least N contains the possible combinations of N or more simple matches of the operand. This operation is performed in the function fts:FormCombinationsAtLeast.

The range [L, U] is represented by the condition at least L and not at least U+1. This transformation is performed in the function fts:FormRange.
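The combination step behind at least N can be sketched in Python as follows. This is an informal illustration, not the normative definition: a Match is reduced to a list of token positions, the name form_combinations_at_least is hypothetical, and the order in which combinations are produced may differ from the recursive XQuery formulation. The "and not" step of fts:FormRange relies on fts:ApplyFTAnd and fts:ApplyFTUnaryNot and is not reproduced here.

```python
from itertools import combinations
from typing import List

Match = List[int]  # token positions of the StringIncludes of one Match


def form_combinations_at_least(matches: List[Match],
                               times: int) -> List[Match]:
    """Merge every combination of `times` or more Matches into one Match."""
    result = []
    for k in range(times, len(matches) + 1):
        for combo in combinations(matches, k):
            merged = []
            for match in combo:
                # The merged Match carries the StringIncludes of all members.
                merged.extend(match)
            result.append(merged)
    return result
```

Given three simple Matches, a request for at least 2 occurrences yields the three pairs plus the single triple, that is, four combined Matches.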

The semantics for the general case is given below.

declare function fts:ApplyFTTimes (
    $range as element(fts:range),
    $allMatches as element(fts:allMatches) )
  as element(fts:allMatches)
{
  if (fn:count($allMatches//fts:stringExclude) gt 0) then
    fn:error(fn:QName('http://www.w3.org/2005/xqt-errors', 'XPST0003'))
  else if ($range/@type eq "exactly") then
    fts:ApplyFTTimesExactly($allMatches, $range/@n)
  else if ($range/@type eq "at least") then
    fts:ApplyFTTimesAtLeast($allMatches, $range/@n)
  else if ($range/@type eq "at most") then
    fts:ApplyFTTimesAtMost($allMatches, $range/@n)
  else
    fts:ApplyFTTimesFromTo($allMatches, $range/@m, $range/@n)
};

The above function performs a sanity check to ensure that the nested AllMatches is a result of the evaluation of FTWords as defined in the grammar rule for FTPrimary. Otherwise, an error is raised.

For example, consider the FTTimes selection "Mustang" occurs at least 2 times. The source AllMatches of the FTWords selection "Mustang" is given below.

The result consists of the pairs of the Matches.

FTContainsExpr

Consider an FTContainsExpr expression of the form SearchContext ftcontains FTSelection, where SearchContext is an XQuery 1.0 expression that returns a sequence of items. The FTContainsExpr returns true if and only if one of those items satisfies the FTSelection.

If the FTContainsExpr is of the form SearchContext ftcontains FTSelection without content IgnoreExpr for some XQuery 1.0 expression IgnoreExpr, then any nodes returned by IgnoreExpr are (notionally) pruned from each search context item before attempting to satisfy the FTSelection.

More formally, evaluation of an FTContainsExpr proceeds according to the following steps. Where appropriate, the explanation includes references to arcs labelled "FTn" in the processing model diagram (Figure 1).

1. For each XQuery/XPath expression nested within the FTContainsExpr, evaluate it with respect to the same dynamic context as the FTContainsExpr (FT1). Specifically:

   a. Evaluate the search context expression (SearchContext), resulting in the sequence of search context items.

   b. Evaluate the ignore option (IgnoreExpr) if any, resulting in the set of ignored nodes.

   c. At each FTWordsValue, evaluate the literal/expression and convert the result to xs:string*.

   d. At each weight specification, evaluate the expression and convert the result to xs:double.

   e. At each FTWindow and FTRange, evaluate the AdditiveExpr(s) and convert each to xs:integer.

2. Using the settings of the match option components in the FTContainsExpr's static context, construct an element(fts:matchOptions) structure.

3. Based on the parse-tree of the FTContainsExpr's FTSelection and the results of steps 1c-1e, construct an element(*,fts:ftSelection) structure. We refer to this as the "operator tree" below. In this process:

   a. Construct the operator tree from the top down, propagating FTMatchOptions down to FTWordsValues.

   b. Tokenize the query string(s) obtained at 1c. (FT2.1)

4. Call the function fts:FTContainsExpr (see declaration below), passing the following arguments to its parameters:

   $searchContextItems: The sequence of items returned by SearchContext, calculated in step 1a.

   $ignoreNodes: The sequence of items returned by IgnoreExpr (in 1b), if that expression is present, or the empty sequence otherwise.

   $ftSelection: The XML node representation of FTSelection (constructed in step 3).

   $defOptions: The XML representation of the match options in the FTContainsExpr's static context (constructed in step 2).

   Within the function, for each search context item:

   a. Delete the ignored nodes from the search context item. [fts:FTContainsExpr calls fts:reconstruct.]

   b. Traverse the operator tree from the top down, propagating FTMatchOptions down to FTWordsValues. [fts:evaluate calls itself and fts:replaceMatchOptions.]

   c. At each FTWordsValue, using the prevailing FTMatchOptions:

      i. Tokenize the search context obtained at 4a. (FT2.2) (Whether this pays any attention to FTMatchOptions is up to the implementation.) [This happens within fts:matchTokenInfos.]

      ii. Match the search context tokens and the query tokens, yielding an element(fts:tokenInfo)* structure. [This happens within fts:matchTokenInfos.]

      iii. Convert that into an element(fts:allMatches). (FT3) [This happens in fts:applyQueryTokensAsPhrase.]

   d. Traverse the operator tree from the bottom up. At each point, the AllMatches instances produced by subtrees are taken as input, and a new AllMatches instance is obtained as output. (FT4) [This is most of the section 4 code.]

   e. If the topmost AllMatches instance contains a Match with no StringExcludes, then the search context item satisfies the full-text condition given by the FTSelection, and the call to fts:FTContainsExpr returns true. [This is handled by the QuantifiedExpr in fts:FTContainsExpr.]

   [Note that the section 4 code doesn't implement 4b-4d as three sequential steps. Instead, they are different aspects of a single traversal of the operator tree.]

   If none of the topmost AllMatches provides a successful match, then fts:FTContainsExpr returns false.

5. The boolean value returned by the call to fts:FTContainsExpr is the value of the FTContainsExpr. (FT5)
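Two of the steps above, pruning the ignored nodes and testing the topmost AllMatches, can be sketched in Python. The fragment is only an approximation under stated assumptions: it uses the standard xml.etree.ElementTree model instead of the XQuery data model, it handles element nodes only, and it represents a Match as a dictionary with "includes" and "excludes" lists. The names reconstruct and satisfies, and the dictionary layout, are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def reconstruct(node: ET.Element,
                ignore: List[ET.Element]) -> Optional[ET.Element]:
    """Copy `node`, dropping every element that is one of the ignored nodes."""
    if any(node is ignored for ignored in ignore):
        return None
    copy = ET.Element(node.tag)  # attributes are not copied in this sketch
    copy.text = node.text
    copy.tail = node.tail
    last_kept = None
    for child in node:
        kept = reconstruct(child, ignore)
        if kept is not None:
            copy.append(kept)
            last_kept = kept
        elif child.tail:
            # Text following a pruned element belongs to the parent: keep it.
            if last_kept is None:
                copy.text = (copy.text or "") + child.tail
            else:
                last_kept.tail = (last_kept.tail or "") + child.tail
    return copy


def satisfies(all_matches: List[Dict[str, list]]) -> bool:
    """A search context item satisfies the selection if some Match has no
    StringExcludes."""
    return any(len(match["excludes"]) == 0 for match in all_matches)
```

For example, pruning the note element from `<p>Ford <note>not a</note> Mustang</p>` leaves only the words Ford and Mustang available for tokenization.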

declare function fts:FTContainsExpr (
    $searchContextItems as item()*,
    $ignoreNodes as node()*,
    $ftSelection as element(*,fts:ftSelection),
    $defOptions as element(fts:matchOptions) )
  as xs:boolean
{
  some $searchContext in $searchContextItems
  satisfies
    let $newSearchContext := fts:reconstruct( $searchContext, $ignoreNodes )
    return
      if (fn:empty($newSearchContext)) then fn:false()
      else
        let $allMatches := fts:evaluate($ftSelection,
                                        $newSearchContext,
                                        $defOptions,
                                        0)
        return
          some $match in $allMatches/fts:match
          satisfies fn:count($match/fts:stringExclude) eq 0
};

declare function fts:reconstruct (
    $n as item(),
    $ignore as node()* )
  as item()?
{
  typeswitch ($n)
    case node() return
      if (some $i in $ignore satisfies $n is $i) then ()
      else if ($n instance of element()) then
        let $nodeName := fn:node-name($n)
        let $nodeContent := for $nn in $n/node()
                            return fts:reconstruct($nn,$ignore)
        return element {$nodeName} {$nodeContent}
      else if ($n instance of document-node()) then
        document {
          for $nn in $n/node()
          return fts:reconstruct($nn, $ignore)
        }
      else $n
    default return $n
};

Scoring

This section addresses the semantics of scoring variables in XQuery 1.0 for and let clauses and XPath 2.0 for expressions.

Scoring variables associate a numeric score with the result of the evaluation of XQuery 1.0 and XPath 2.0 expressions. This numeric score estimates how well a result item satisfies the user's information need, as expressed by the XQuery 1.0 and XPath 2.0 expression. The numeric score is computed using an implementation-dependent scoring algorithm.

There are numerous scoring algorithms used in practice. Most of the scoring algorithms take as inputs a query and a set of results to the query. In computing the score, these algorithms rely on the structure of the query to estimate the relevance of the results.

In the context of defining the semantics of XQuery and XPath Full Text, passing the structure of the query poses a problem. The query may contain XQuery 1.0 and XPath 2.0 expressions and XQuery and XPath Full Text expressions in particular. The semantics of XQuery 1.0 and XPath 2.0 expressions is defined using (among other things) functions that take as arguments sequences of items and return sequences of items. They are not aware of what expression produced a particular sequence, i.e., they are not aware of the expression structure.

To define the semantics of scoring in XQuery and XPath Full Text using XQuery 1.0, expressions that produce the query result (or the functions that implement the expressions) must be passed as arguments. In other words, second-order functions are necessary. Currently XQuery 1.0 and XPath 2.0 do not provide such functions.

Nevertheless, in the interest of the exposition, assume that such second-order functions are present. In particular, assume that there are two semantic second-order functions, fts:score and fts:scoreSequence, that each take one argument (an expression). fts:score returns the score value of this expression, and fts:scoreSequence returns a sequence of score values, one for each item to which the expression evaluates. The scores must satisfy scoring properties.

A for clause containing a score variable

   for $result score $score in Expr ...

is evaluated as though it were replaced by the following set of clauses.

   let $scoreSeq := fts:scoreSequence(Expr)
   for $result at $i in Expr
   let $score := $scoreSeq[$i]
   ...

Here, $scoreSeq and $i are new variables, not appearing elsewhere, and fts:scoreSequence is the second-order function.

Similarly, a let clause containing a score variable

   let score $score := Expr ...

is evaluated as though it were replaced by the following clause.

   let $score := fts:score(Expr)
   ...
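The rewriting of the for clause can be sketched in Python, where functions can be passed as arguments. This is only an illustration under stated assumptions: the expression is modelled as a zero-argument callable, `score_sequence` stands in for fts:scoreSequence, and its length-based scoring is a placeholder, since the actual algorithm is implementation-dependent.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# An unevaluated expression: calling it produces the result items.
Expr = Callable[[], Sequence[str]]


def score_sequence(expr: Expr) -> List[float]:
    # Placeholder scoring (an assumption of this sketch): longer items
    # score higher, scaled by the longest item.
    items = expr()
    longest = max((len(item) for item in items), default=1) or 1
    return [len(item) / longest for item in items]


def for_with_score(expr: Expr) -> Iterator[Tuple[str, float]]:
    score_seq = score_sequence(expr)         # let $scoreSeq := fts:scoreSequence(Expr)
    for i, result in enumerate(expr()):      # for $result at $i in Expr
        yield result, score_seq[i]           # let $score := $scoreSeq[$i]
```

Note that `score_sequence` receives the expression itself rather than its value, which is the second-order aspect discussed above. Python indexes from 0 where XQuery positions start at 1.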

Example

This section presents a more complex example of the evaluation of an FTContainsExpr. This example uses the same sample document fragment and assigns it to $doc. Consider the following FTContainsExpr.

$doc ftcontains ( ( "mustang" ftand ({("great", "excellent")} any word occurs at least 2 times) ) window 11 words ftand ftnot "rust" ) same paragraph

Begin by evaluating the FTSelection to AllMatches.

( ( "mustang" ftand ({("great", "excellent")} any word occurs at least 2 times) ) window 11 words ftand ftnot "rust" ) same paragraph

Step 1 - Evaluate the FTWords "mustang".

Step 2 - Evaluate the FTWords {("great", "excellent")} any word.

Step 2.1 - Match the token "great".

Step 2.2 - Match the token "excellent".

Step 2.3 - Combine the above AllMatches as if FTOr is used, i.e., by forming a union of the Matches.

Step 3 - Apply the FTTimes {("great", "excellent")} any word occurs at least 2 times, forming two pairs of Matches.

Step 4 - Apply the FTAnd "mustang" ftand ({("great", "excellent")} any word occurs at least 2 times), forming all possible pairs of StringMatches.

Step 5 - Apply the FTWindow ("mustang" ftand ({("great", "excellent")} any word occurs at least 2 times)) window 11 words, filtering out Matches whose window is larger than 11 tokens.

Step 6 - Evaluate the FTWords "rust".

Step 7 - Apply the FTUnaryNot ftnot "rust", transforming the StringInclude into a StringExclude.

Step 8 - Apply the FTAnd (("mustang" ftand ({("great", "excellent")} any word occurs at least 2 times)) window 11 words) ftand ftnot "rust", forming all possible combinations of three StringMatches from the first AllMatches and one StringMatch from the second AllMatches.

Step 9 - Apply the FTScope, filtering out Matches whose TokenInfos are not within the same paragraph (assuming the <offer> elements determine paragraph boundaries).

The resulting AllMatches contains a Match that does not contain a StringExclude. Therefore, the sample FTContainsExpr returns true.
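The nine steps can be traced with a simplified Python model. It is a sketch, not the normative semantics: a Match is reduced to two sets of token positions, the token list is a hypothetical input rather than the sample document, all function names are illustrative, and the handling of StringExcludes in the window and scope filters is a simplifying assumption.

```python
from itertools import combinations, product
from typing import FrozenSet, List, NamedTuple, Tuple


class Match(NamedTuple):
    includes: FrozenSet[int]  # token positions that must match
    excludes: FrozenSet[int]  # token positions that must not match


Tokens = List[Tuple[str, int]]  # (word, paragraph number)


def tokenize(paragraphs: List[str]) -> Tokens:
    return [(w, n) for n, p in enumerate(paragraphs, 1) for w in p.split()]


def merge(matches: Tuple[Match, ...]) -> Match:
    return Match(frozenset().union(*(m.includes for m in matches)),
                 frozenset().union(*(m.excludes for m in matches)))


def ft_words(tokens: Tokens, word: str) -> List[Match]:
    return [Match(frozenset({pos}), frozenset())
            for pos, (w, _) in enumerate(tokens, start=1) if w == word]


def ft_times_at_least(am: List[Match], n: int) -> List[Match]:
    return [merge(combo) for combo in combinations(am, n)]


def ft_and(a: List[Match], b: List[Match]) -> List[Match]:
    return [merge((x, y)) for x in a for y in b]


def ft_window(am: List[Match], size: int) -> List[Match]:
    # Assumption: only StringIncludes are checked against the window.
    return [m for m in am
            if m.includes and max(m.includes) - min(m.includes) + 1 <= size]


def ft_unary_not(am: List[Match]) -> List[Match]:
    # Pick one StringMatch from every Match and invert it.
    choices = [[(True, p) for p in m.includes] + [(False, p) for p in m.excludes]
               for m in am]
    return [Match(frozenset(p for inc, p in pick if not inc),
                  frozenset(p for inc, p in pick if inc))
            for pick in product(*choices)]


def ft_same_paragraph(tokens: Tokens, am: List[Match]) -> List[Match]:
    result = []
    for m in am:
        paragraphs = {tokens[p - 1][1] for p in m.includes}
        if len(paragraphs) == 1:
            (only,) = paragraphs
            # Assumption: StringExcludes outside that paragraph are dropped.
            kept = frozenset(p for p in m.excludes if tokens[p - 1][1] == only)
            result.append(Match(m.includes, kept))
    return result


def evaluate(tokens: Tokens) -> bool:
    mustang = ft_words(tokens, "mustang")                               # step 1
    praise = ft_words(tokens, "great") + ft_words(tokens, "excellent")  # steps 2.1-2.3
    twice = ft_times_at_least(praise, 2)                                # step 3
    near = ft_window(ft_and(mustang, twice), 11)                        # steps 4-5
    no_rust = ft_unary_not(ft_words(tokens, "rust"))                    # steps 6-7
    scoped = ft_same_paragraph(tokens, ft_and(near, no_rust))           # steps 8-9
    return any(not m.excludes for m in scoped)
```

Under this model, a token "rust" in a different paragraph from the three included tokens does not prevent a successful Match, whereas a "rust" in the same paragraph leaves a StringExclude in every Match.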

Conformance

This section defines the conformance criteria for an XQuery and XPath Full Text 1.0 processor.

In this section, the following terms are used to indicate the requirement levels defined in RFC 2119. MUST means that the item is an absolute requirement of the specification. MAY means that an item is truly optional. SHOULD means that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

An XQuery and XPath Full Text 1.0 processor that claims to conform to this specification MUST include a claim of Minimal Conformance as defined in the Minimal Conformance section below. In addition to a claim of Minimal Conformance, it MAY claim conformance to one or more optional features defined in the Optional Features section.

Minimal Conformance

Minimal Conformance to this specification MUST include all of the following items:

Minimal support for XQuery 1.0 or XPath 2.0. The optional features of XQuery 1.0 or XPath 2.0 MAY be supported.

Support for everything specified in this document except those operators and match options specified in the Optional Features section to be optional. If an implementation does not provide a given optional operator or match option, it MUST implement any requirements specified in that section for implementations that do not provide that operator or match option.

A definition of every item specified to be implementation-defined in this specification.

Implementations are not required to define items specified to be implementation-dependent.

Optional Features

FTMildNot Operator

It is optional whether the implementation supports the FTMildNot operator. If it does not support FTMildNot and encounters one in a full-text query, then it MUST raise an error.

FTUnaryNot Operator

The unrestricted form of negation in FTUnaryNot, which can negate every kind of FTSelection, is optional. Implementations may choose to support the negation operation in a restricted form, enforcing one or both of the following restrictions.

Negation Restriction 1. An FTUnaryNot expression may only appear as a direct right operand of an "ftand" (FTAnd) operation.

Negation Restriction 2. An FTUnaryNot expression may not appear as a descendant of an FTOr that is modified by an FTPosFilter. (An FTOr is modified by an FTPosFilter, if it is derived using the production for FTSelection together with that FTPosFilter.)

Consider the following example FTSelections.

1. ftnot "web"
2. "web" ftand ( ftnot "information" ftor "retrieval" )
3. "web" ftand ftnot("information" ftand "retrieval")
4. "web" ftand ftnot("information" ftand "retrieval" window 5 words)
5. "web" ftand ("information" ftand ftnot "retrieval" window 5 words)

The first two FTSelections both violate restriction 1, while the third and the fourth conform to both restrictions. The fifth one violates restriction 2, while obeying restriction 1. Note that in the last example the FTSelection to which the window operation is applied is "information" ftand ftnot "retrieval", which contains an FTUnaryNot expression.

If the implementation does enforce one or both of these restrictions on FTUnaryNot and encounters a full-text query that does not obey the restriction, then it MUST raise an error.
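The two negation restrictions can be checked mechanically on an operator tree. The Python sketch below is illustrative only and assumes a hypothetical tuple encoding of FTSelections: ("words", text), ("not", child), ("and", left, right), ("or", left, right), and ("filter", child) for an FTOr modified by an FTPosFilter. Parentheses are assumed to leave no node of their own in the tree.

```python
from typing import Tuple

Node = Tuple  # hypothetical encoding, see the lead-in above


def violates_restriction_1(node: Node, right_of_ftand: bool = False) -> bool:
    """An FTUnaryNot must be a direct right operand of ftand."""
    kind = node[0]
    if kind == "words":
        return False
    if kind == "not":
        return not right_of_ftand or violates_restriction_1(node[1])
    if kind == "and":
        return (violates_restriction_1(node[1])
                or violates_restriction_1(node[2], True))
    # "or" and "filter": no child is a right operand of ftand.
    return any(violates_restriction_1(child) for child in node[1:])


def violates_restriction_2(node: Node, under_filter: bool = False) -> bool:
    """An FTUnaryNot must not occur below an FTOr modified by an FTPosFilter."""
    kind = node[0]
    if kind == "words":
        return False
    if kind == "not":
        return under_filter or violates_restriction_2(node[1], under_filter)
    if kind == "filter":
        return violates_restriction_2(node[1], True)
    return any(violates_restriction_2(child, under_filter) for child in node[1:])


def w(text: str) -> Node:
    return ("words", text)


# The five example FTSelections, in the same order as above.
EXAMPLES = [
    ("not", w("web")),
    ("and", w("web"), ("or", ("not", w("information")), w("retrieval"))),
    ("and", w("web"), ("not", ("and", w("information"), w("retrieval")))),
    ("and", w("web"), ("not", ("filter", ("and", w("information"), w("retrieval"))))),
    ("and", w("web"), ("filter", ("and", w("information"), ("not", w("retrieval"))))),
]
```

Applying both checks to `EXAMPLES` reproduces the classification given in the text: the first two trees fail restriction 1, and only the fifth fails restriction 2.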

FTUnit and FTBigUnit

It is optional whether the implementation supports all the choices of FTUnit and FTBigUnit. If it does not support one or more choices of FTUnit or FTBigUnit and encounters an unsupported FTUnit or FTBigUnit in a full-text query, then it MUST raise an error.

FTOrder Operator

The unrestricted form of the FTOrder postfix operator, which can be applied to any kind of FTSelection, is optional. Implementations may choose to enforce the following restriction on the use of FTOrder.

Order Operator Restriction. FTOrder may only appear directly succeeding an FTWindow or an FTDistance operator.

If the implementation does enforce this restriction and encounters a full-text query that does not obey the restriction, then it MUST raise an error.

FTScope Operator

It is optional whether the implementation supports the FTScope operator. If it does not support FTScope and encounters one in a full-text query, then it MUST raise an error.

FTWindow Operator

The unrestricted form of the FTWindow postfix operator, which can be applied to any kind of FTSelection, is optional. Implementations may choose to enforce the following restriction on the use of FTWindow.

Window Operator Restriction. FTWindow can only be applied to an FTOr that is either a single FTWords or a combination of FTWords involving only the operators ftand and ftor.

If the implementation does enforce this restriction and encounters a full-text query that does not obey the restriction, then it MUST raise an error.
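A check for the Window Operator Restriction amounts to a walk over the operand of the window. The following Python sketch assumes a hypothetical tuple encoding of operator trees, with ("words", text), ("and", left, right), ("or", left, right), and any other tag (for example "not") standing for an operator the restriction does not allow. The same check would apply to the analogous restriction on FTDistance.

```python
from typing import Tuple

Node = Tuple  # hypothetical encoding, see the lead-in above


def window_operand_allowed(node: Node) -> bool:
    """True if the operand is built only from FTWords, ftand and ftor."""
    kind = node[0]
    if kind == "words":
        return True
    if kind in ("and", "or"):
        return all(window_operand_allowed(child) for child in node[1:])
    return False  # any other operator violates the restriction
```

For instance, the tree for "information" ftand ftnot "retrieval" is rejected because it contains an FTUnaryNot.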

FTDistance Operator

The unrestricted form of the FTDistance postfix operator, which can be applied to any kind of FTSelection, is optional. Implementations may choose to enforce the following restriction on the use of FTDistance.

Distance Operator Restriction. FTDistance can only be applied to an FTOr that is either a single FTWords or a combination of FTWords involving only the operators ftand and ftor.

If the implementation does enforce this restriction and encounters a full-text query that does not obey the restriction, then it MUST raise an error.

FTTimes Operator

It is optional whether the implementation supports the FTTimes operator. If it does not support FTTimes and encounters one in a full-text query, then it MUST raise an error.

FTContent Operator

It is optional whether the implementation supports the FTContent operator. If it does not support FTContent and encounters one in a full-text query, then it MUST raise an error.

FTCaseOption

It is optional whether the implementation supports the "lowercase" and "uppercase" choices for the FTCaseOption. If it does not support these choices for the FTCaseOption and encounters an unsupported choice in a full-text query, then it MUST raise an error.

FTStopWordOption

It is optional whether the implementation supports the FTStopWordOption. If it does not support FTStopWordOption and encounters one in a full-text query, then it MUST raise an error.

It is optional whether the implementation supports the FTStopWordOption in the body of the query. If it supports FTStopWordOption in the prolog, but not in the body of a query, and encounters one in the body of a query, then it MUST raise an error.

It is optional whether the implementation supports the StringLiteral alternative of FTStopWords in the FTStopWordOption. If it does not support the StringLiteral alternative of FTStopWords and encounters such an alternative in a full-text query, then it MUST raise an error.

FTLanguageOption

It is optional whether the implementation supports the unrestricted form of FTLanguageOption. Implementations may choose to enforce the following restriction on the use of FTLanguageOption.

Single Language Restriction. If a full-text query contains more than one FTLanguageOption in its body and the prolog, then the languages specified must be the same.

If the implementation does enforce this restriction and encounters a full-text query that does not obey the restriction, then it MUST raise an error.

FTIgnoreOption

The implementation may constrain the set of ignored nodes. If the operand of FTIgnoreOption violates the implementation-defined restriction on that operand, the implementation MUST raise an error.

Scoring

The implementation may restrict the allowable expressions used to compute scores. The restrictions are implementation-defined.

If the implementation does enforce such restrictions and encounters a full-text query that does not obey them, then it MUST raise an error.

Weights

An implementation may constrain the range of valid weights to non-negative values. If an implementation does enforce this restriction and encounters a full-text query that uses a negative weight, it MUST raise an error.

EBNF for XQuery 1.0 Grammar with Full-Text extensions

The EBNF in this document and in this section is aligned with the current XML Query 1.0 grammar (see http://www.w3.org/TR/2005/CR-xquery-20051103/).

Module ::= VersionDecl? (LibraryModule | MainModule)
VersionDecl ::= "xquery" "version" StringLiteral ("encoding" StringLiteral)? Separator
MainModule ::= Prolog QueryBody
LibraryModule ::= ModuleDecl Prolog
ModuleDecl ::= "module" "namespace" NCName "=" URILiteral Separator
Prolog ::= ((DefaultNamespaceDecl | Setter | NamespaceDecl | Import | FTOptionDecl) Separator)* ((VarDecl | FunctionDecl | OptionDecl) Separator)*
Setter ::= BoundarySpaceDecl | DefaultCollationDecl | BaseURIDecl | ConstructionDecl | OrderingModeDecl | EmptyOrderDecl | CopyNamespacesDecl
Import ::= SchemaImport | ModuleImport
Separator ::= ";"
NamespaceDecl ::= "declare" "namespace" NCName "=" URILiteral
BoundarySpaceDecl ::= "declare" "boundary-space" ("preserve" | "strip")
DefaultNamespaceDecl ::= "declare" "default" ("element" | "function") "namespace" URILiteral
OptionDecl ::= "declare" "option" QName StringLiteral
FTOptionDecl ::= "declare" "ft-option" FTMatchOptions
OrderingModeDecl ::= "declare" "ordering" ("ordered" | "unordered")
EmptyOrderDecl ::= "declare" "default" "order" "empty" ("greatest" | "least")
CopyNamespacesDecl ::= "declare" "copy-namespaces" PreserveMode "," InheritMode
PreserveMode ::= "preserve" | "no-preserve"
InheritMode ::= "inherit" | "no-inherit"
DefaultCollationDecl ::= "declare" "default" "collation" URILiteral
BaseURIDecl ::= "declare" "base-uri" URILiteral
SchemaImport ::= "import" "schema" SchemaPrefix? URILiteral ("at" URILiteral ("," URILiteral)*)?
SchemaPrefix ::= ("namespace" NCName "=") | ("default" "element" "namespace")
ModuleImport ::= "import" "module" ("namespace" NCName "=")? URILiteral ("at" URILiteral ("," URILiteral)*)?
VarDecl ::= "declare" "variable" "$" QName TypeDeclaration? ((":=" ExprSingle) | "external")
ConstructionDecl ::= "declare" "construction" ("strip" | "preserve")
FunctionDecl ::= "declare" "function" QName "(" ParamList? ")" ("as" SequenceType)? (EnclosedExpr | "external")
ParamList ::= Param ("," Param)*
Param ::= "$" QName TypeDeclaration?
EnclosedExpr ::= "{" Expr "}"
QueryBody ::= Expr
Expr ::= ExprSingle ("," ExprSingle)*
ExprSingle ::= FLWORExpr | QuantifiedExpr | TypeswitchExpr | IfExpr | OrExpr
FLWORExpr ::= (ForClause | LetClause)+ WhereClause? OrderByClause? "return" ExprSingle
ForClause ::= "for" "$" VarName TypeDeclaration? PositionalVar? FTScoreVar? "in" ExprSingle ("," "$" VarName TypeDeclaration? PositionalVar? FTScoreVar? "in" ExprSingle)*
PositionalVar ::= "at" "$" VarName
FTScoreVar ::= "score" "$" VarName
LetClause ::= (("let" "$" VarName TypeDeclaration?) | ("let" "score" "$" VarName)) ":=" ExprSingle ("," (("$" VarName TypeDeclaration?) | FTScoreVar) ":=" ExprSingle)*
WhereClause ::= "where" ExprSingle
OrderByClause ::= (("order" "by") | ("stable" "order" "by")) OrderSpecList
OrderSpecList ::= OrderSpec ("," OrderSpec)*
OrderSpec ::= ExprSingle OrderModifier
OrderModifier ::= ("ascending" | "descending")? ("empty" ("greatest" | "least"))? ("collation" URILiteral)?
QuantifiedExpr ::= ("some" | "every") "$" VarName TypeDeclaration? "in" ExprSingle ("," "$" VarName TypeDeclaration? "in" ExprSingle)* "satisfies" ExprSingle
TypeswitchExpr ::= "typeswitch" "(" Expr ")" CaseClause+ "default" ("$" VarName)? "return" ExprSingle
CaseClause ::= "case" ("$" VarName "as")? SequenceType "return" ExprSingle
IfExpr ::= "if" "(" Expr ")" "then" ExprSingle "else" ExprSingle
OrExpr ::= AndExpr ( "or" AndExpr )*
AndExpr ::= ComparisonExpr ( "and" ComparisonExpr )*
ComparisonExpr ::= FTContainsExpr ( (ValueComp | GeneralComp | NodeComp) FTContainsExpr )?
FTContainsExpr ::= RangeExpr ( "ftcontains" FTSelection FTIgnoreOption? )?
RangeExpr ::= AdditiveExpr ( "to" AdditiveExpr )?
AdditiveExpr ::= MultiplicativeExpr ( ("+" | "-") MultiplicativeExpr )*
MultiplicativeExpr ::= UnionExpr ( ("*" | "div" | "idiv" | "mod") UnionExpr )*
UnionExpr ::= IntersectExceptExpr ( ("union" | "|") IntersectExceptExpr )*
IntersectExceptExpr ::= InstanceofExpr ( ("intersect" | "except") InstanceofExpr )*
InstanceofExpr ::= TreatExpr ( "instance" "of" SequenceType )?
TreatExpr ::= CastableExpr ( "treat" "as" SequenceType )?
CastableExpr ::= CastExpr ( "castable" "as" SingleType )?
CastExpr ::= UnaryExpr ( "cast" "as" SingleType )?
UnaryExpr ::= ("-" | "+")* ValueExpr
ValueExpr ::= ValidateExpr | PathExpr | ExtensionExpr
GeneralComp ::= "=" | "!=" | "<" | "<=" | ">" | ">="
ValueComp ::= "eq" | "ne" | "lt" | "le" | "gt" | "ge"
NodeComp ::= "is" | "<<" | ">>"
ValidateExpr ::= "validate" ValidationMode? "{" Expr "}"
ValidationMode ::= "lax" | "strict"
ExtensionExpr ::= Pragma+ "{" Expr? "}"
Pragma ::= "(#" S? QName (S PragmaContents)? "#)"   /* ws: explicit */
PragmaContents ::= (Char* - (Char* '#)' Char*))
PathExpr ::= ("/" RelativePathExpr?) | ("//" RelativePathExpr) | RelativePathExpr   /* xgc: leading-lone-slash */
RelativePathExpr ::= StepExpr (("/" | "//") StepExpr)*
StepExpr ::= FilterExpr | AxisStep
AxisStep ::= (ReverseStep | ForwardStep) PredicateList
ForwardStep ::= (ForwardAxis NodeTest) | AbbrevForwardStep
ForwardAxis ::= ("child" "::") | ("descendant" "::") | ("attribute" "::") | ("self" "::") | ("descendant-or-self" "::") | ("following-sibling" "::") | ("following" "::")
AbbrevForwardStep ::= "@"? NodeTest
ReverseStep ::= (ReverseAxis NodeTest) | AbbrevReverseStep
ReverseAxis ::= ("parent" "::") | ("ancestor" "::") | ("preceding-sibling" "::") | ("preceding" "::") | ("ancestor-or-self" "::")
AbbrevReverseStep ::= ".."
NodeTest ::= KindTest | NameTest
NameTest ::= QName | Wildcard
Wildcard ::= "*" | (NCName ":" "*") | ("*" ":" NCName)   /* ws: explicit */
FilterExpr ::= PrimaryExpr PredicateList
PredicateList ::= Predicate*
Predicate ::= "[" Expr "]"
PrimaryExpr ::= Literal | VarRef | ParenthesizedExpr | ContextItemExpr | FunctionCall | OrderedExpr | UnorderedExpr | Constructor
Literal ::= NumericLiteral | StringLiteral
NumericLiteral ::= IntegerLiteral | DecimalLiteral | DoubleLiteral
VarRef ::= "$" VarName
VarName ::= QName
ParenthesizedExpr ::= "(" Expr? ")"
ContextItemExpr ::= "."
OrderedExpr ::= "ordered" "{" Expr "}"
UnorderedExpr ::= "unordered" "{" Expr "}"
FunctionCall ::= QName "(" (ExprSingle ("," ExprSingle)*)? ")"   /* xgc: reserved-function-names */ /* gn: parens */
Constructor ::= DirectConstructor | ComputedConstructor
DirectConstructor ::= DirElemConstructor | DirCommentConstructor | DirPIConstructor
DirElemConstructor ::= "<" QName DirAttributeList ("/>" | (">" DirElemContent* "</" QName S? ">"))   /* ws: explicit */
DirAttributeList ::= (S (QName S? "=" S? DirAttributeValue)?)*   /* ws: explicit */
DirAttributeValue ::= ('"' (EscapeQuot | QuotAttrValueContent)* '"') | ("'" (EscapeApos | AposAttrValueContent)* "'")   /* ws: explicit */
QuotAttrValueContent ::= QuotAttrContentChar | CommonContent
AposAttrValueContent ::= AposAttrContentChar | CommonContent
DirElemContent ::= DirectConstructor | CDataSection | CommonContent | ElementContentChar
CommonContent ::= PredefinedEntityRef | CharRef | "{{" | "}}" | EnclosedExpr
DirCommentConstructor ::= "<!--" DirCommentContents "-->"   /* ws: explicit */
DirCommentContents ::= ((Char - '-') | ('-' (Char - '-')))*   /* ws: explicit */
DirPIConstructor ::= "<?" PITarget (S DirPIContents)? "?>"   /* ws: explicit */
DirPIContents ::= (Char* - (Char* '?>' Char*))   /* ws: explicit */
CDataSection ::= "<![CDATA[" CDataSectionContents "]]>"   /* ws: explicit */
CDataSectionContents ::= (Char* - (Char* ']]>' Char*))   /* ws: explicit */
ComputedConstructor ::= CompDocConstructor | CompElemConstructor | CompAttrConstructor | CompTextConstructor | CompCommentConstructor | CompPIConstructor
CompDocConstructor ::= "document" "{" Expr "}"
CompElemConstructor ::= "element" (QName | ("{" Expr "}")) "{" ContentExpr? "}"
ContentExpr ::= Expr
CompAttrConstructor ::= "attribute" (QName | ("{" Expr "}")) "{" Expr? "}"
CompTextConstructor ::= "text" "{" Expr "}"
CompCommentConstructor ::= "comment" "{" Expr "}"
CompPIConstructor ::= "processing-instruction" (NCName | ("{" Expr "}")) "{" Expr? "}"
SingleType ::= AtomicType "?"?
TypeDeclaration ::= "as" SequenceType
SequenceType ::= ("empty-sequence" "(" ")") | (ItemType OccurrenceIndicator?)
OccurrenceIndicator ::= "?" | "*" | "+"   /* xgc: occurrence-indicators */
ItemType ::= KindTest | ("item" "(" ")") | AtomicType
AtomicType ::= QName
KindTest ::= DocumentTest | ElementTest | AttributeTest | SchemaElementTest | SchemaAttributeTest | PITest | CommentTest | TextTest | AnyKindTest
AnyKindTest ::= "node" "(" ")"
DocumentTest ::= "document-node" "(" (ElementTest | SchemaElementTest)? ")"
TextTest ::= "text" "(" ")"
CommentTest ::= "comment" "(" ")"
PITest ::= "processing-instruction" "(" (NCName | StringLiteral)? ")"
AttributeTest ::= "attribute" "(" (AttribNameOrWildcard ("," TypeName)?)? ")"
AttribNameOrWildcard ::= AttributeName | "*"
SchemaAttributeTest ::= "schema-attribute" "(" AttributeDeclaration ")"
AttributeDeclaration ::= AttributeName
ElementTest ::= "element" "(" (ElementNameOrWildcard ("," TypeName "?"?)?)? ")"
ElementNameOrWildcard ::= ElementName | "*"
SchemaElementTest ::= "schema-element" "(" ElementDeclaration ")"
ElementDeclaration ::= ElementName
AttributeName ::= QName
ElementName ::= QName
TypeName ::= QName
URILiteral ::= StringLiteral
FTSelection ::= FTOr FTPosFilter* ("weight" RangeExpr)?
FTOr ::= FTAnd ( "ftor" FTAnd )*
FTAnd ::= FTMildNot ( "ftand" FTMildNot )*
FTMildNot ::= FTUnaryNot ( "not" "in" FTUnaryNot )*
FTUnaryNot ::= ("ftnot")? FTPrimaryWithOptions
FTPrimaryWithOptions ::= FTPrimary FTMatchOptions?
FTPrimary ::= (FTWords FTTimes?) | ("(" FTSelection ")") | FTExtensionSelection
FTWords ::= FTWordsValue FTAnyallOption?
FTWordsValue ::= Literal | ("{" Expr "}")
FTExtensionSelection ::= Pragma+ "{" FTSelection? "}"
FTAnyallOption ::= ("any" "word"?) | ("all" "words"?) | "phrase"
FTTimes ::= "occurs" FTRange "times"
FTRange ::= ("exactly" AdditiveExpr) | ("at" "least" AdditiveExpr) | ("at" "most" AdditiveExpr) | ("from" AdditiveExpr "to" AdditiveExpr)
FTPosFilter ::= FTOrder | FTWindow | FTDistance | FTScope | FTContent
FTOrder ::= "ordered"
FTWindow ::= "window" AdditiveExpr FTUnit
FTDistance ::= "distance" FTRange FTUnit
FTUnit ::= "words" | "sentences" | "paragraphs"
FTScope ::= ("same" | "different") FTBigUnit
FTBigUnit ::= "sentence" | "paragraph"
FTContent ::= ("at" "start") | ("at" "end") | ("entire" "content")
FTMatchOptions ::= FTMatchOption+   /* xgc: multiple-match-options */
FTMatchOption ::= FTLanguageOption | FTWildCardOption | FTThesaurusOption | FTStemOption | FTCaseOption | FTDiacriticsOption | FTStopWordOption | FTExtensionOption
FTCaseOption ::= ("case" "insensitive") | ("case" "sensitive") | "lowercase" | "uppercase"
FTDiacriticsOption ::= ("diacritics" "insensitive") | ("diacritics" "sensitive")
FTStemOption ::= ("with" "stemming") | ("without" "stemming")
FTThesaurusOption ::= ("with" "thesaurus" (FTThesaurusID | "default")) | ("with" "thesaurus" "(" (FTThesaurusID | "default") ("," FTThesaurusID)* ")") | ("without" "thesaurus")
FTThesaurusID ::= "at" URILiteral ("relationship" StringLiteral)? (FTRange "levels")?
FTStopWordOption ::= ("with" "stop" "words" FTStopWords FTStopWordsInclExcl*) | ("without" "stop" "words") | ("with" "default" "stop" "words" FTStopWordsInclExcl*)
FTStopWords ::= ("at" URILiteral) | ("(" StringLiteral ("," StringLiteral)* ")")
FTStopWordsInclExcl ::= ("union" | "except") FTStopWords
FTLanguageOption ::= "language" StringLiteral
FTWildCardOption ::= ("with" "wildcards") | ("without" "wildcards")
FTExtensionOption ::= "option" QName StringLiteral
FTIgnoreOption ::= "without" "content" UnionExpr

Terminal Symbols

IntegerLiteral ::= Digits
DecimalLiteral ::= ("." Digits) | (Digits "." [0-9]*)   /* ws: explicit */
DoubleLiteral ::= (("." Digits) | (Digits ("." [0-9]*)?)) [eE] [+-]? Digits   /* ws: explicit */
StringLiteral ::= ('"' (PredefinedEntityRef | CharRef | EscapeQuot | [^"&])* '"') | ("'" (PredefinedEntityRef | CharRef | EscapeApos | [^'&])* "'")   /* ws: explicit */
PredefinedEntityRef ::= "&" ("lt" | "gt" | "amp" | "quot" | "apos") ";"   /* ws: explicit */
EscapeQuot ::= '""'
EscapeApos ::= "''"
ElementContentChar ::= Char - [{}<&]
QuotAttrContentChar ::= Char - ["{}<&]
AposAttrContentChar ::= Char - ['{}<&]
Comment ::= "(:" (CommentContents | Comment)* ":)"   /* ws: explicit */ /* gn: comments */
PITarget ::= [http://www.w3.org/TR/REC-xml#NT-PITarget]   /* xgc: xml-version */
CharRef ::= [http://www.w3.org/TR/REC-xml#NT-CharRef]   /* xgc: xml-version */
QName ::= [http://www.w3.org/TR/REC-xml-names/#NT-QName]   /* xgc: xml-version */
NCName ::= [http://www.w3.org/TR/REC-xml-names/#NT-NCName]   /* xgc: xml-version */
S ::= [http://www.w3.org/TR/REC-xml#NT-S]   /* xgc: xml-version */
Char ::= [http://www.w3.org/TR/REC-xml#NT-Char]   /* xgc: xml-version */

The following symbols are used only in the definition of terminal symbols; they are not terminal symbols in the grammar above.

Digits ::= [0-9]+
CommentContents ::= (Char+ - (Char* ('(:' | ':)') Char*))

Extra-grammatical Constraints

This section contains constraints on the EBNF productions, which are required to parse legal sentences. The note below is referenced from the right side of the production, with the notation: /* xgc: <id> */.

multiple-match-options

No single alternative for FTMatchOption can be specified more than once as part of the same FTMatchOptions. For example, if the FTCaseOption "lowercase" is specified, then "uppercase" cannot also be specified as part of the same FTMatchOptions.
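A processor could enforce this constraint by grouping the options of one FTMatchOptions by their FTMatchOption alternative. The Python sketch below is illustrative: the `OPTION_KIND` table covers only the keyword-only options, the function name is hypothetical, and options are assumed to arrive as already-normalized keyword strings.

```python
from collections import Counter
from typing import List

# Maps each supported option spelling to its FTMatchOption alternative.
OPTION_KIND = {
    "case insensitive": "FTCaseOption",
    "case sensitive": "FTCaseOption",
    "lowercase": "FTCaseOption",
    "uppercase": "FTCaseOption",
    "diacritics insensitive": "FTDiacriticsOption",
    "diacritics sensitive": "FTDiacriticsOption",
    "with stemming": "FTStemOption",
    "without stemming": "FTStemOption",
    "with wildcards": "FTWildCardOption",
    "without wildcards": "FTWildCardOption",
}


def repeated_kinds(options: List[str]) -> List[str]:
    """Return the alternatives that occur more than once, sorted by name."""
    counts = Counter(OPTION_KIND[option] for option in options)
    return sorted(kind for kind, n in counts.items() if n > 1)
```

A non-empty result means the FTMatchOptions violates the constraint; for the example in the text, "lowercase" together with "uppercase" reports FTCaseOption.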

EBNF for XPath 2.0 Grammar with Full-Text extensions

The EBNF in this document and in this section is aligned with the current XPath 2.0 grammar (see http://www.w3.org/TR/2005/CR-xpath20-20051103/).

XPath ::= Expr
Expr ::= ExprSingle ("," ExprSingle)*
ExprSingle ::= ForExpr | QuantifiedExpr | IfExpr | OrExpr
ForExpr ::= SimpleForClause "return" ExprSingle
SimpleForClause ::= "for" "$" VarName FTScoreVar? "in" ExprSingle ("," "$" VarName FTScoreVar? "in" ExprSingle)*
FTScoreVar ::= "score" "$" VarName
QuantifiedExpr ::= ("some" | "every") "$" VarName "in" ExprSingle ("," "$" VarName "in" ExprSingle)* "satisfies" ExprSingle
IfExpr ::= "if" "(" Expr ")" "then" ExprSingle "else" ExprSingle
OrExpr ::= AndExpr ( "or" AndExpr )*
AndExpr ::= ComparisonExpr ( "and" ComparisonExpr )*
ComparisonExpr ::= FTContainsExpr ( (ValueComp | GeneralComp | NodeComp) FTContainsExpr )?
FTContainsExpr ::= RangeExpr ( "ftcontains" FTSelection FTIgnoreOption? )?
RangeExpr ::= AdditiveExpr ( "to" AdditiveExpr )?
AdditiveExpr ::= MultiplicativeExpr ( ("+" | "-") MultiplicativeExpr )*
MultiplicativeExpr ::= UnionExpr ( ("*" | "div" | "idiv" | "mod") UnionExpr )*
UnionExpr ::= IntersectExceptExpr ( ("union" | "|") IntersectExceptExpr )*
IntersectExceptExpr ::= InstanceofExpr ( ("intersect" | "except") InstanceofExpr )*
InstanceofExpr ::= TreatExpr ( "instance" "of" SequenceType )?
TreatExpr ::= CastableExpr ( "treat" "as" SequenceType )?
CastableExpr ::= CastExpr ( "castable" "as" SingleType )?
CastExpr ::= UnaryExpr ( "cast" "as" SingleType )?
UnaryExpr ::= ("-" | "+")* ValueExpr
ValueExpr ::= PathExpr
GeneralComp ::= "=" | "!=" | "<" | "<=" | ">" | ">="
ValueComp ::= "eq" | "ne" | "lt" | "le" | "gt" | "ge"
NodeComp ::= "is" | "<<" | ">>"
Pragma ::= "(#" S? QName (S PragmaContents)? "#)"   /* ws: explicit */
PragmaContents ::= (Char* - (Char* '#)' Char*))
PathExpr ::= ("/" RelativePathExpr?) | ("//" RelativePathExpr) | RelativePathExpr   /* xgc: leading-lone-slash */
RelativePathExpr ::= StepExpr (("/" | "//") StepExpr)*
StepExpr ::= FilterExpr | AxisStep
AxisStep ::= (ReverseStep | ForwardStep) PredicateList
ForwardStep ::= (ForwardAxis NodeTest) | AbbrevForwardStep
ForwardAxis ::= ("child" "::") | ("descendant" "::") | ("attribute" "::") | ("self" "::") | ("descendant-or-self" "::") | ("following-sibling" "::") | ("following" "::") | ("namespace" "::")
AbbrevForwardStep ::= "@"? NodeTest
ReverseStep ::= (ReverseAxis NodeTest) | AbbrevReverseStep
ReverseAxis ::= ("parent" "::") | ("ancestor" "::") | ("preceding-sibling" "::") | ("preceding" "::") | ("ancestor-or-self" "::")
AbbrevReverseStep ::= ".."
NodeTest ::= KindTest | NameTest
NameTest ::= QName | Wildcard
Wildcard ::= "*" | (NCName ":" "*") | ("*" ":" NCName)   /* ws: explicit */
FilterExpr ::= PrimaryExpr PredicateList
PredicateList ::= Predicate*
Predicate ::= "[" Expr "]"
PrimaryExpr ::= Literal | VarRef | ParenthesizedExpr | ContextItemExpr | FunctionCall
Literal ::= NumericLiteral | StringLiteral
NumericLiteral ::= IntegerLiteral | DecimalLiteral | DoubleLiteral
VarRef ::= "$" VarName
VarName ::= QName
ParenthesizedExpr ::= "(" Expr? ")"
ContextItemExpr ::= "."
FunctionCall ::= QName "(" (ExprSingle ("," ExprSingle)*)? ")"   /* xgc: reserved-function-names */ /* gn: parens */
SingleType ::= AtomicType "?"?
SequenceType ::= ("empty-sequence" "(" ")") | (ItemType OccurrenceIndicator?)
OccurrenceIndicator ::= "?" | "*" | "+"   /* xgc: occurrence-indicators */
ItemType ::= KindTest | ("item" "(" ")") | AtomicType
AtomicType ::= QName
KindTest ::= DocumentTest | ElementTest | AttributeTest | SchemaElementTest | SchemaAttributeTest | PITest | CommentTest | TextTest | AnyKindTest
AnyKindTest ::= "node" "(" ")"
DocumentTest ::= "document-node" "(" (ElementTest | SchemaElementTest)? ")"
TextTest ::= "text" "(" ")"
CommentTest ::= "comment" "(" ")"
PITest ::= "processing-instruction" "(" (NCName | StringLiteral)? ")"
AttributeTest ::= "attribute" "(" (AttribNameOrWildcard ("," TypeName)?)? ")"
AttribNameOrWildcard ::= AttributeName | "*"
SchemaAttributeTest ::= "schema-attribute" "(" AttributeDeclaration ")"
AttributeDeclaration ::= AttributeName
ElementTest ::= "element" "(" (ElementNameOrWildcard ("," TypeName "?"?)?)? ")"
ElementNameOrWildcard ::= ElementName | "*"
SchemaElementTest ::= "schema-element" "(" ElementDeclaration ")"
ElementDeclaration ::= ElementName
AttributeName ::= QName
ElementName ::= QName
TypeName ::= QName
URILiteral ::= StringLiteral
FTSelection ::= FTOr FTPosFilter* ("weight" RangeExpr)?
FTOr ::= FTAnd ( "ftor" FTAnd )*
FTAnd ::= FTMildNot ( "ftand" FTMildNot )*
FTMildNot ::= FTUnaryNot ( "not" "in" FTUnaryNot )*
FTUnaryNot ::= ("ftnot")? FTPrimaryWithOptions
FTPrimaryWithOptions ::= FTPrimary FTMatchOptions?
FTPrimary ::= (FTWords FTTimes?) | ("(" FTSelection ")") | FTExtensionSelection
FTWords ::= FTWordsValue FTAnyallOption?
FTWordsValue ::= Literal | ("{" Expr "}")
FTExtensionSelection ::= Pragma+ "{" FTSelection? "}"
FTAnyallOption ::= ("any" "word"?) | ("all" "words"?) | "phrase"
FTTimes ::= "occurs" FTRange "times"
FTRange ::= ("exactly" AdditiveExpr) | ("at" "least" AdditiveExpr) | ("at" "most" AdditiveExpr) | ("from" AdditiveExpr "to" AdditiveExpr)
FTPosFilter ::= FTOrder | FTWindow | FTDistance | FTScope | FTContent
FTOrder ::= "ordered"
FTWindow ::= "window" AdditiveExpr FTUnit
FTDistance ::= "distance" FTRange FTUnit
FTUnit ::= "words" | "sentences" | "paragraphs"
FTScope ::= ("same" | "different") FTBigUnit
FTBigUnit ::= "sentence" | "paragraph"
FTContent ::= ("at" "start") | ("at" "end") | ("entire" "content")
FTMatchOptions ::= FTMatchOption+   /* xgc: multiple-match-options */
FTMatchOption ::= FTLanguageOption | FTWildCardOption | FTThesaurusOption | FTStemOption | FTCaseOption | FTDiacriticsOption | FTStopWordOption | FTExtensionOption
FTCaseOption ::= ("case" "insensitive") | ("case" "sensitive") | "lowercase" | "uppercase"
FTDiacriticsOption ::= ("diacritics" "insensitive") | ("diacritics" "sensitive")
FTStemOption ::= ("with" "stemming") | ("without" "stemming")
FTThesaurusOption ::= ("with" "thesaurus" (FTThesaurusID | "default")) | ("with" "thesaurus" "(" (FTThesaurusID | "default") ("," FTThesaurusID)* ")") | ("without" "thesaurus")
FTThesaurusID ::= "at" URILiteral ("relationship" StringLiteral)? (FTRange "levels")?
FTStopWordOption ::= ("with" "stop" "words" FTStopWords FTStopWordsInclExcl*) | ("without" "stop" "words") | ("with" "default" "stop" "words" FTStopWordsInclExcl*)
FTStopWords ::= ("at" URILiteral) | ("(" StringLiteral ("," StringLiteral)* ")")
FTStopWordsInclExcl ::= ("union" | "except") FTStopWords
FTLanguageOption ::= "language" StringLiteral
FTWildCardOption ::= ("with" "wildcards") | ("without" "wildcards")
FTExtensionOption ::= "option" QName StringLiteral
FTIgnoreOption ::= "without" "content" UnionExpr

Terminal Symbols

IntegerLiteral ::= Digits
DecimalLiteral ::= ("." Digits) | (Digits "." [0-9]*)   /* ws: explicit */
DoubleLiteral ::= (("." Digits) | (Digits ("." [0-9]*)?)) [eE] [+-]? Digits   /* ws: explicit */
StringLiteral ::= ('"' (EscapeQuot | [^"])* '"') | ("'" (EscapeApos | [^'])* "'")   /* ws: explicit */
EscapeQuot ::= '""'
EscapeApos ::= "''"
Comment ::= "(:" (CommentContents | Comment)* ":)"   /* ws: explicit */ /* gn: comments */
QName ::= [http://www.w3.org/TR/REC-xml-names/#NT-QName]   /* xgc: xml-version */
NCName ::= [http://www.w3.org/TR/REC-xml-names/#NT-NCName]   /* xgc: xml-version */
S ::= [http://www.w3.org/TR/REC-xml#NT-S]   /* xgc: xml-version */
Char ::= [http://www.w3.org/TR/REC-xml#NT-Char]   /* xgc: xml-version */

The following symbols are used only in the definition of terminal symbols; they are not terminal symbols in the grammar above.

Digits ::= [0-9]+
CommentContents ::= (Char+ - (Char* ('(:' | ':)') Char*))

Static Context Components

The following table describes the full-text components of the static context (as defined in XQuery 1.0). The following aspects of each component are described:

Default initial value: This is the initial value of the component if it is not overridden or augmented by the implementation or by a query.

Can be overwritten or augmented by implementation: Indicates whether an XQuery implementation is allowed to replace the default initial value of the component by a different, implementation-defined value and/or to augment the default initial value by additional implementation-defined values.

Can be overwritten or augmented by a query: Indicates whether a query is allowed to replace and/or augment the initial value provided by default or by the implementation. If so, indicates how this is accomplished (for example, by a declaration in the prolog, as defined in XQuery 1.0).

Scope: Indicates where the component is applicable. "Global" indicates that the component applies globally, throughout all the modules used in a query. "Module" indicates that the component applies throughout a module (as defined in XQuery 1.0). "Lexical" indicates that the component applies within the expression in which it is defined (equivalent to "module" if the component is declared in a prolog).

Consistency Rules: Indicates rules that must be observed in assigning values to the component.

Static Context Components
Component	Default initial value	Can be overwritten or augmented by implementation?	Can be overwritten or augmented by a query?	Scope	Consistency rules
FTCaseOption	`case insensitive`	overwriteable	overwriteable by prolog	lexical	Value must be `case insensitive`, `case sensitive`, `lowercase`, or `uppercase`.
FTDiacriticsOption	`diacritics insensitive`	overwriteable	overwriteable by prolog	lexical	Value must be `diacritics insensitive` or `diacritics sensitive`.
FTStemOption	`without stemming`	overwriteable	overwriteable by prolog	lexical	Value must be `without stemming` or `with stemming`.
FTThesaurusOption	`without thesaurus`	overwriteable	overwriteable by prolog (refer to default to augment)	lexical	Value must be part of the statically known thesauri.
Statically known thesauri	none	augmentable	cannot be augmented or overwritten by prolog	module	Each URI uniquely identifies a thesaurus list.
FTStopWordOption	`without stop words`	overwriteable	overwriteable by prolog (refer to default to augment)	lexical	Value must be part of the statically known stop word lists.
Statically known stop word lists	none	augmentable	cannot be augmented or overwritten by prolog	module	Each URI uniquely identifies a stop word list.
FTLanguageOption	implementation-defined	overwriteable	overwriteable by prolog	lexical	Value must be castable to `xs:language`.
Statically known languages	none	augmentable	cannot be augmented or overwritten by prolog	module	Each string uniquely identifies a language.
FTWildCardOption	`without wildcards`	no	overwriteable by prolog	lexical	Value must be `with wildcards` or `without wildcards`.
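The default initial values and consistency rules in the table can be pictured as a small data structure. The Python sketch below is illustrative only: `FullTextStaticContext`, `declare`, and `use_stop_word_list` are hypothetical names, the Python exceptions stand in for the static errors an implementation would raise, and FTLanguageOption is omitted because its default is implementation-defined.

```python
from dataclasses import dataclass, field

# Permitted values for the enumerated components, taken from the
# "Consistency rules" column and the grammar.
ALLOWED = {
    "FTCaseOption": {"case insensitive", "case sensitive", "lowercase", "uppercase"},
    "FTDiacriticsOption": {"diacritics insensitive", "diacritics sensitive"},
    "FTStemOption": {"without stemming", "with stemming"},
    "FTWildCardOption": {"without wildcards", "with wildcards"},
}

# "Default initial value" column of the table.
DEFAULTS = {
    "FTCaseOption": "case insensitive",
    "FTDiacriticsOption": "diacritics insensitive",
    "FTStemOption": "without stemming",
    "FTThesaurusOption": "without thesaurus",
    "FTStopWordOption": "without stop words",
    "FTWildCardOption": "without wildcards",
}


@dataclass
class FullTextStaticContext:
    """Hypothetical holder for the full-text static context components."""
    options: dict = field(default_factory=lambda: dict(DEFAULTS))
    # Statically known stop word lists: "none" by default, augmentable
    # by the implementation but not by the prolog.
    known_stop_word_lists: set = field(default_factory=set)

    def declare(self, component: str, value: str) -> None:
        """Overwrite an enumerated component, as a prolog declaration would."""
        if component not in ALLOWED:
            raise KeyError(f"not an enumerated component: {component}")
        if value not in ALLOWED[component]:
            raise ValueError(f"{value!r} is not permitted for {component}")
        self.options[component] = value

    def use_stop_word_list(self, uri: str) -> None:
        """Select a stop word list; it must be statically known."""
        if uri not in self.known_stop_word_lists:
            raise LookupError(f"stop word list not statically known: {uri}")
        self.options["FTStopWordOption"] = f'with stop words at "{uri}"'
```

A query that names a stop word list outside `known_stop_word_lists` fails in this sketch in the same spirit as the static error described for unknown stop word lists under Error Conditions.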

Error Conditions

An implementation that does not support the FTMildNot operator must raise a static error if a full-text query contains a mild not.

An implementation that enforces one of the restrictions on FTUnaryNot must raise a static error if a full-text query does not obey the restriction.

An implementation that does not support one or more of the choices on FTUnit and FTBigUnit must raise a static error if a full-text query contains one of those choices.

An implementation that does not support the FTScope operator must raise a static error if a full-text query contains a scope.

An implementation that does not support the FTTimes operator must raise a static error if a full-text query contains a times.

An implementation that restricts the use of FTStopWordOption must raise a static error if a full-text query contains a stop word option that does not meet the restriction.

An implementation that restricts the use of FTIgnoreOption must raise a static error if a full-text query contains an ignore option that does not meet the restriction.

It is a static error if, during the static analysis phase, the query is found to contain a stop word option that refers to a stop word list that is not found in the statically known stop word lists.

It may be a static error if, during the static analysis phase, the query is found to contain a language identifier in a language option that the implementation does not support. The implementation may choose not to raise this error and instead provide some other implementation-defined behavior.

It is a static error if, during the static analysis phase, an expression is found to use an FTOrder operator that does not appear directly succeeding an FTWindow or an FTDistance operator and the implementation enforces this restriction.

An implementation may restrict the use of FTWindow and FTDistance to an FTOr that is either a single FTWords or a combination of FTWords involving only the operators ftand and ftor. It is a static error if, during the static analysis phase, an expression is found that violates this restriction and the implementation enforces this restriction.

An implementation that does not support the FTContent operator must raise a static error if a full-text query contains one.

It is a static error if, during the static analysis phase, an implementation that restricts the use of FTLanguageOption to a single language encounters more than one distinct language option.

An implementation may constrain the form of the expression used to compute scores. It is a static error if, during the static analysis phase, such an implementation encounters a scoring expression that does not meet the restriction.

It is a static error if, during the static analysis phase, an implementation that restricts the choices of FTCaseOption encounters the "lowercase" or "uppercase" option.

It is a dynamic error if an implementation that does not support negative weights encounters a weight expression that does not meet the restriction.

It is a dynamic error if an implementation encounters a mild not selection, one of whose operands evaluates to an AllMatches that contains a StringExclude.

It is a type error if, during the static analysis phase, an expression is found to have a static type that is not appropriate for the context in which the expression occurs, or if, during the dynamic evaluation phase, the dynamic type of a value does not match a required type as specified by the matching rules in XQuery 1.0.

It is a dynamic error if, in a function invocation, the argument corresponding to the specified function's collation parameter does not identify a supported collation.

XML Syntax (XQueryX) for XQuery and XPath Full Text 1.0

XQueryX 1.0 defines an XML representation of XQuery 1.0. The XQuery and XPath Full Text Requirements, section 5.4, XML Syntax, state: "XQuery and XPath Full Text MAY have more than one syntax binding. One query language syntax MUST be expressed in XML in a way that reflects the underlying structure of the query. See XML Query Requirements." This appendix specifies XML Schemas that together define the XML representation of XQuery and XPath Full Text 1.0 by representing the abstract syntax found in this specification. Because XQuery and XPath Full Text 1.0 integrates seamlessly with XQuery 1.0, it follows that the XML Syntax for XQuery and XPath Full Text 1.0 must integrate well with the XML Syntax for XQuery 1.0.

The XML Schema specified in this appendix accomplishes integration by importing the XML Schema defined for XQueryX in XQueryX 1.0, incorporating all of its type and element definitions. It then extends that schema by adding definitions of new types and elements in a namespace belonging to the full-text specification.

The semantics of a Full Text XQueryX document are determined by the semantics of the XQuery Full Text expression that results from transforming the XQueryX document into XQuery Full Text syntax using the XSLT stylesheet that appears in the section "XQueryX stylesheet for XQuery and XPath Full Text 1.0". The "correctness" of that transformation is determined by asking the following question: Can some Full Text XQueryX processor QX process some Full Text XQueryX document D1 to produce results R1, after which the stylesheet is used to translate D1 into an XQuery Full Text expression E1 that, when processed by some XQuery Full Text processor Q, produces results R2 that are equivalent (under some meaningful definition of "equivalent") to results R1?
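The correctness question can be read as a simple round-trip check. The Python sketch below merely restates it in code; the function and parameter names are hypothetical, the two processors and the stylesheet are passed in as plain callables, and the notion of "equivalent" is left to the caller, just as the text leaves it open.

```python
from typing import Any, Callable


def transformation_is_correct(
    d1: Any,
    process_xqueryx: Callable[[Any], Any],    # processor QX
    apply_stylesheet: Callable[[Any], Any],   # XQueryX-to-XQuery translation
    process_xquery: Callable[[Any], Any],     # processor Q
    equivalent: Callable[[Any, Any], bool],   # caller-supplied equivalence
) -> bool:
    """Check one document D1 against the correctness question."""
    r1 = process_xqueryx(d1)      # QX processes D1 to produce R1
    e1 = apply_stylesheet(d1)     # the stylesheet translates D1 into E1
    r2 = process_xquery(e1)       # Q processes E1 to produce R2
    return equivalent(r1, r2)     # are R1 and R2 equivalent?
```

The check covers a single document and a single pair of processors; it says nothing about other inputs.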

XQueryX representation of XQuery and XPath Full Text 1.0

The XML Schema that defines the complex types and elements for XQueryX in support of XQuery and XPath Full Text 1.0, including the ftContainsExpr, incorporates a second XML Schema that defines types and elements to support the ftMatchOption. Both XML Schemas are defined in this section.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xqx="http://www.w3.org/2005/XQueryX"
            xmlns:xqxft="http://www.w3.org/2007/xpath-full-text"
            targetNamespace="http://www.w3.org/2007/xpath-full-text"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">

  <xsd:import namespace="http://www.w3.org/2005/XQueryX"
              schemaLocation="http://www.w3.org/2005/XQueryX/xqueryx.xsd"/>
  <xsd:include schemaLocation="./xpath-full-text-10-ftmatchoption-extensions.xsd"/>

  <xsd:element name="ftOptionDecl" substitutionGroup="xqx:prologPartOneItem">
    <xsd:complexType>
      <xsd:sequence minOccurs="1" maxOccurs="unbounded">
        <xsd:element ref="xqxft:ftMatchOption"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:complexType name="ftExpr">
    <xsd:complexContent>
      <xsd:extension base="xqx:expr"/>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="ftExpr" type="xqxft:ftExpr" abstract="true" substitutionGroup="xqx:expr"/>

  <xsd:element name="ftScoreVariableBinding" type="xqx:QName" substitutionGroup="xqx:forLetClauseItemExtensions"/>

  <xsd:complexType name="ftContainsExpr">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftExpr">
        <xsd:sequence>
          <xsd:element name="ftRangeExpr" type="xqx:exprWrapper"/>
          <xsd:sequence minOccurs="0" maxOccurs="1">
            <xsd:element name="ftSelectionExpr" type="xqxft:ftSelectionWrapper"/>
            <xsd:element name="ftIgnoreOption" type="xqxft:ftIgnoreOption" minOccurs="0" maxOccurs="1"/>
          </xsd:sequence>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="ftContainsExpr" type="xqxft:ftContainsExpr" substitutionGroup="xqxft:ftExpr"/>

  <xsd:complexType name="ftProximity"/>
  <xsd:element name="ftProximity" type="xqxft:ftProximity" abstract="true"/>

  <xsd:simpleType name="ftUnit">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="paragraph"/>
      <xsd:enumeration value="sentence"/>
      <xsd:enumeration value="word"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:simpleType name="ftBigUnit">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="paragraph"/>
      <xsd:enumeration value="sentence"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:simpleType name="contentLocation">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="at start"/>
      <xsd:enumeration value="at end"/>
      <xsd:enumeration value="entire content"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:simpleType name="ftScopeType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="same"/>
      <xsd:enumeration value="different"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:complexType name="unaryRange">
    <xsd:sequence>
      <xsd:element name="value" type="xqx:exprWrapper"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="binaryRange">
    <xsd:sequence>
      <xsd:element name="lower" type="xqx:exprWrapper"/>
      <xsd:element name="upper" type="xqx:exprWrapper"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="ftRange">
    <xsd:choice>
      <xsd:element name="atLeastRange" type="xqxft:unaryRange"/>
      <xsd:element name="atMostRange" type="xqxft:unaryRange"/>
      <xsd:element name="exactlyRange" type="xqxft:unaryRange"/>
      <xsd:element name="fromToRange" type="xqxft:binaryRange"/>
    </xsd:choice>
  </xsd:complexType>

  <xsd:complexType name="ftOrdered">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftProximity">
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="ftOrdered" type="xqxft:ftOrdered" substitutionGroup="xqxft:ftProximity"/>

  <xsd:complexType name="ftWindow">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftProximity">
        <xsd:sequence>
          <xsd:element name="value" type="xqx:exprWrapper"/>
          <xsd:element name="unit" type="xqxft:ftUnit"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="ftWindow" type="xqxft:ftWindow" substitutionGroup="xqxft:ftProximity"/>

  <xsd:complexType name="ftDistance">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftProximity">
        <xsd:sequence>
          <xsd:element name="ftRange" type="xqxft:ftRange"/>
          <xsd:element name="unit" type="xqxft:ftUnit"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="ftDistance" type="xqxft:ftDistance" substitutionGroup="xqxft:ftProximity"/>

  <xsd:complexType name="ftScope">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftProximity">
        <xsd:sequence>
          <xsd:element name="type" type="xqxft:ftScopeType"/>
          <xsd:element name="unit" type="xqxft:ftBigUnit"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="ftScope" type="xqxft:ftScope" substitutionGroup="xqxft:ftProximity"/>

  <xsd:complexType name="ftContent">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftProximity">
        <xsd:sequence>
          <xsd:element name="location" type="xqxft:contentLocation"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="ftContent" type="xqxft:ftContent" substitutionGroup="xqxft:ftProximity"/>

  <xsd:complexType name="ftPosFilter">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftExpr">
        <xsd:sequence minOccurs="0" maxOccurs="unbounded">
          <xsd:choice>
            <xsd:element ref="xqxft:ftMatchOption"/>
            <xsd:element ref="xqxft:ftProximity"/>
          </xsd:choice>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:complexType name="ftSelection">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftExpr">
        <xsd:sequence>
          <xsd:element name="ftSelectionSource" type="xqx:exprWrapper"/>
          <xsd:element name="ftPosFilter" type="xqxft:ftPosFilter" minOccurs="0" maxOccurs="1"/>
          <xsd:element name="weight" type="xqx:exprWrapper" minOccurs="0" maxOccurs="1"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="ftSelection" type="xqxft:ftSelection" substitutionGroup="xqxft:ftExpr"/>
  <xsd:complexType name="ftSelectionWrapper">
    <xsd:sequence>
      <xsd:element ref="xqxft:ftSelection"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="ftIgnoreOption">
    <xsd:sequence>
      <xsd:element ref="xqx:expr"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:element name="ftLogicalOp" type="xqx:binaryOperatorExpr" abstract="true" substitutionGroup="xqx:operatorExpr"/>
  <xsd:element name="ftOr" type="xqx:binaryOperatorExpr" substitutionGroup="xqxft:ftLogicalOp"/>
  <xsd:element name="ftAnd" type="xqx:binaryOperatorExpr" substitutionGroup="xqxft:ftLogicalOp"/>
  <xsd:element name="ftMildNot" type="xqx:binaryOperatorExpr" substitutionGroup="xqxft:ftLogicalOp"/>
  <xsd:element name="ftLogicalNot" type="xqx:unaryOperatorExpr" abstract="true" substitutionGroup="xqx:operatorExpr"/>
  <xsd:element name="ftUnaryNot" type="xqx:unaryOperatorExpr" substitutionGroup="xqxft:ftLogicalNot"/>

  <xsd:complexType name="ftTimes">
    <xsd:sequence>
      <xsd:element name="ftRange" type="xqxft:ftRange"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:simpleType name="ftAnyAllOption">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="any"/>
      <xsd:enumeration value="all"/>
      <xsd:enumeration value="any word"/>
      <xsd:enumeration value="all words"/>
      <xsd:enumeration value="phrase"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:complexType name="ftWordsAlternatives">
    <xsd:choice>
      <xsd:element name="ftWordsLiteral" type="xqx:exprWrapper"/>
      <xsd:element name="ftWordsExpression" type="xqx:exprWrapper"/>
    </xsd:choice>
  </xsd:complexType>
  <xsd:complexType name="ftWords">
    <xsd:sequence>
      <xsd:element name="ftWordsValue" type="xqxft:ftWordsAlternatives"/>
      <xsd:element name="ftAnyAllOption" type="xqxft:ftAnyAllOption" minOccurs="0" maxOccurs="1"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:group name="ftWordsWithTimes">
    <xsd:sequence>
      <xsd:element name="ftWords" type="xqxft:ftWords"/>
      <xsd:element name="ftTimes" type="xqxft:ftTimes" minOccurs="0"/>
    </xsd:sequence>
  </xsd:group>
  <xsd:complexType name="ftExtensionSelection">
    <xsd:sequence>
      <xsd:element name="pragma" type="xqx:pragma" minOccurs="1" maxOccurs="unbounded"/>
      <xsd:element name="ftSelection" type="xqxft:ftSelection" minOccurs="0" maxOccurs="1"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="ftPrimary">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftExpr">
        <xsd:choice>
          <xsd:element name="parenthesized" type="xqx:exprWrapper"/>
          <xsd:group ref="xqxft:ftWordsWithTimes"/>
          <xsd:element name="ftExtensionSelection" type="xqxft:ftExtensionSelection"/>
        </xsd:choice>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:complexType name="ftPrimaryWithOptions">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftExpr">
        <xsd:sequence>
          <xsd:element name="ftPrimary" type="xqxft:ftPrimary"/>
          <xsd:element ref="xqxft:ftMatchOption" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="ftPrimaryWithOptions" type="xqxft:ftPrimaryWithOptions" substitutionGroup="xqxft:ftExpr"/>

</xsd:schema>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:xqx="http://www.w3.org/2005/XQueryX"
            xmlns:xqxft="http://www.w3.org/2007/xpath-full-text"
            targetNamespace="http://www.w3.org/2007/xpath-full-text"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">

  <xsd:import namespace="http://www.w3.org/2005/XQueryX"
              schemaLocation="http://www.w3.org/2005/XQueryX/xqueryx.xsd"/>

  <xsd:complexType name="ftMatchOption"/>
  <xsd:element name="ftMatchOption" type="xqxft:ftMatchOption" abstract="true"/>
  <xsd:complexType name="ftMatchOptions">
    <xsd:sequence minOccurs="1" maxOccurs="unbounded">
      <xsd:element ref="xqxft:ftMatchOption"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="ftCaseOption">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftMatchOption">
        <xsd:sequence>
          <xsd:element name="value">
            <xsd:simpleType>
              <xsd:restriction base="xsd:string">
                <xsd:enumeration value="lowercase"/>
                <xsd:enumeration value="uppercase"/>
                <xsd:enumeration value="case sensitive"/>
                <xsd:enumeration value="case insensitive"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="case" type="xqxft:ftCaseOption" substitutionGroup="xqxft:ftMatchOption"/>

  <xsd:complexType name="ftDiacriticsOption">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftMatchOption">
        <xsd:sequence>
          <xsd:element name="value">
            <xsd:simpleType>
              <xsd:restriction base="xsd:string">
                <xsd:enumeration value="diacritics sensitive"/>
                <xsd:enumeration value="diacritics insensitive"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="diacritics" type="xqxft:ftDiacriticsOption" substitutionGroup="xqxft:ftMatchOption"/>

  <xsd:complexType name="ftStemOption">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftMatchOption">
        <xsd:sequence>
          <xsd:element name="value">
            <xsd:simpleType>
              <xsd:restriction base="xsd:string">
                <xsd:enumeration value="with stemming"/>
                <xsd:enumeration value="without stemming"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="stem" type="xqxft:ftStemOption" substitutionGroup="xqxft:ftMatchOption"/>

  <xsd:complexType name="ftThesaurusID">
    <xsd:sequence>
      <xsd:element name="at" type="xsd:anyURI"/>
      <xsd:element name="relationship" type="xsd:string" minOccurs="0"/>
      <xsd:element name="levels" type="xqxft:ftRange" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="thesaurusSpecSequence">
    <xsd:sequence>
      <xsd:choice>
        <xsd:element name="default"/>
        <xsd:element name="thesaurusID" type="xqxft:ftThesaurusID"/>
      </xsd:choice>
      <xsd:element name="thesaurusID" type="xqxft:ftThesaurusID" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="ftThesaurusOption">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftMatchOption">
        <xsd:choice>
          <xsd:element name="without"/>
          <xsd:element name="thesauri" type="xqxft:thesaurusSpecSequence"/>
        </xsd:choice>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="thesaurus" type="xqxft:ftThesaurusOption" substitutionGroup="xqxft:ftMatchOption"/>

  <xsd:complexType name="ftStopWords">
    <xsd:choice>
      <xsd:element name="ref" type="xsd:anyURI"/>
      <xsd:element name="list">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element ref="xqx:stringConstantExpr" minOccurs="1" maxOccurs="unbounded"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:choice>
  </xsd:complexType>
  <xsd:element name="ftStopWords" type="xqxft:ftStopWords"/>
  <xsd:group name="baseStopWords">
    <xsd:choice>
      <xsd:element name="default"/>
      <xsd:element ref="xqxft:ftStopWords"/>
    </xsd:choice>
  </xsd:group>
  <xsd:complexType name="ftStopWordsInclExcl">
    <xsd:choice>
      <xsd:element name="union" type="xqxft:ftStopWords"/>
      <xsd:element name="except" type="xqxft:ftStopWords"/>
    </xsd:choice>
  </xsd:complexType>
  <xsd:complexType name="stopWordsSpecSequence">
    <xsd:sequence>
      <xsd:group ref="xqxft:baseStopWords"/>
      <xsd:element name="ftStopWordsInclExcl" type="xqxft:ftStopWordsInclExcl" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="ftStopWordOption">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftMatchOption">
        <xsd:choice>
          <xsd:element name="without"/>
          <xsd:element name="stopwords" type="xqxft:stopWordsSpecSequence"/>
        </xsd:choice>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="stopword" type="xqxft:ftStopWordOption" substitutionGroup="xqxft:ftMatchOption"/>

  <xsd:complexType name="ftLanguageOption">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftMatchOption">
        <xsd:sequence>
          <xsd:element name="value" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="language" type="xqxft:ftLanguageOption" substitutionGroup="xqxft:ftMatchOption"/>

  <xsd:complexType name="ftWildCardOption">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftMatchOption">
        <xsd:sequence>
          <xsd:element name="value">
            <xsd:simpleType>
              <xsd:restriction base="xsd:string">
                <xsd:enumeration value="with wildcards"/>
                <xsd:enumeration value="without wildcards"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="wildcard" type="xqxft:ftWildCardOption" substitutionGroup="xqxft:ftMatchOption"/>

  <xsd:complexType name="ftExtensionOption">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftMatchOption">
        <xsd:sequence>
          <xsd:element name="ftExtensionName" type="xqx:QName"/>
          <xsd:element name="ftExtensionValue" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <xsd:element name="ftExtensionOption" type="xqxft:ftExtensionOption" substitutionGroup="xqxft:ftMatchOption"/>

</xsd:schema>

XQueryX stylesheet for XQuery and XPath Full Text 1.0

The XSLT stylesheet that defines the semantics of XQueryX in support of XQuery and XPath Full Text 1.0 integrates seamlessly with the XQueryX XSLT stylesheet defined in XQueryX 1.0 by importing that stylesheet. It provides additional templates that define the semantics of the XQueryX representation of XQuery and XPath Full Text 1.0 by transforming that XQueryX representation into the human-readable syntax of XQuery and XPath Full Text 1.0.

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xqxft="http://www.w3.org/2007/xpath-full-text"
                xmlns:xqx="http://www.w3.org/2005/XQueryX">

  <xsl:import href="http://www.w3.org/2005/XQueryX/xqueryx.xsl"/>

  <xsl:template match="xqxft:ftOptionDecl">
    <xsl:text>declare ft-option </xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="xqxft:ftScoreVariableBinding">
    <xsl:text> score </xsl:text>
    <xsl:value-of select="$DOLLAR"/>
    <xsl:if test="@xqx:prefix">
      <xsl:value-of select="@xqx:prefix"/>
      <xsl:value-of select="$COLON"/>
    </xsl:if>
    <xsl:value-of select="."/>
  </xsl:template>

  <xsl:template match="xqxft:ftContainsExpr">
    <xsl:apply-templates select="xqxft:ftRangeExpr"/>
    <xsl:text> ftcontains </xsl:text>
    <xsl:apply-templates select="xqxft:ftSelectionExpr"/>
    <xsl:apply-templates select="xqxft:ftIgnoreOption"/>
  </xsl:template>
  <xsl:template match="xqxft:value">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="xqxft:ftRangeExpr">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="xqxft:ftSelectionExpr">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="xqxft:ftIgnoreOption">
    <xsl:text>without content </xsl:text>
    <xsl:apply-templates/>
  </xsl:template>
  <!-- ftSelection: source, then the ftPosFilter child declared in the schema, then weight -->
  <xsl:template match="xqxft:ftSelection">
    <xsl:apply-templates select="xqxft:ftSelectionSource"/>
    <xsl:value-of select="$NEWLINE"/>
    <xsl:text> </xsl:text>
    <xsl:apply-templates select="xqxft:ftPosFilter"/>
    <xsl:text> </xsl:text>
    <xsl:apply-templates select="xqxft:weight"/>
    <xsl:text> </xsl:text>
  </xsl:template>
  <xsl:template match="xqxft:ftSelectionSource">
    <xsl:apply-templates/>
    <xsl:text> </xsl:text>
  </xsl:template>
  <xsl:template match="xqxft:ftPosFilter">
    <xsl:apply-templates/>
    <xsl:value-of select="$NEWLINE"/>
    <xsl:text> </xsl:text>
  </xsl:template>

  <!-- The schema declares this element as ftOrdered -->
  <xsl:template match="xqxft:ftOrdered">
    <xsl:text>ordered</xsl:text>
    <xsl:value-of select="$NEWLINE"/>
  </xsl:template>

  <xsl:template match="xqxft:ftWindow">
    <xsl:text>window </xsl:text>
    <xsl:apply-templates select="xqxft:value"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="xqxft:unit"/>
    <xsl:value-of select="$NEWLINE"/>
  </xsl:template>

  <xsl:template match="xqxft:ftDistance">
    <xsl:text>distance </xsl:text>
    <xsl:apply-templates select="xqxft:ftRange"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="xqxft:unit"/>
    <xsl:value-of select="$NEWLINE"/>
  </xsl:template>

  <xsl:template match="xqxft:ftScope">
    <xsl:value-of select="xqxft:type"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="xqxft:unit"/>
    <xsl:value-of select="$NEWLINE"/>
  </xsl:template>

  <xsl:template match="xqxft:ftContent">
    <xsl:value-of select="xqxft:location"/>
    <xsl:value-of select="$NEWLINE"/>
  </xsl:template>
  <xsl:template match="xqxft:exactlyRange">
    <xsl:text>exactly </xsl:text>
    <xsl:apply-templates select="xqxft:value"/>
  </xsl:template>
  <xsl:template match="xqxft:atLeastRange">
    <xsl:text>at least </xsl:text>
    <xsl:apply-templates select="xqxft:value"/>
  </xsl:template>
  <xsl:template match="xqxft:atMostRange">
    <xsl:text>at most </xsl:text>
    <xsl:apply-templates select="xqxft:value"/>
  </xsl:template>
  <xsl:template match="xqxft:fromToRange">
    <xsl:text>from </xsl:text>
    <xsl:apply-templates select="xqxft:lower"/>
    <xsl:text> to </xsl:text>
    <xsl:apply-templates select="xqxft:upper"/>
    <xsl:text> </xsl:text>
  </xsl:template>
  <xsl:template match="xqxft:lower">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="xqxft:upper">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="xqxft:case">
    <xsl:value-of select="xqxft:value"/>
    <xsl:value-of select="$NEWLINE"/>
  </xsl:template>

  <xsl:template match="xqxft:diacritics">
    <xsl:value-of select="xqxft:value"/>
    <xsl:value-of select="$NEWLINE"/>
  </xsl:template>

  <xsl:template match="xqxft:stem">
    <xsl:value-of select="xqxft:value"/>
    <xsl:value-of select="$NEWLINE"/>
  </xsl:template>

  <!-- Child elements are namespace-qualified in the schema, so the tests below use the xqxft prefix -->
  <xsl:template match="xqxft:thesaurus">
    <xsl:choose>
      <xsl:when test="xqxft:without">
        <xsl:text>without thesaurus </xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates/>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:value-of select="$NEWLINE"/>
  </xsl:template>
  <xsl:template match="xqxft:thesauri">
    <xsl:text>with thesaurus </xsl:text>
    <xsl:choose>
      <xsl:when test="child::*[2]">
        <xsl:call-template name="parenthesizedList"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <xsl:template match="xqxft:default">
    <xsl:text>default </xsl:text>
  </xsl:template>
  <xsl:template match="xqxft:thesaurusID">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="xqxft:at">
    <xsl:text>at "</xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>" </xsl:text>
  </xsl:template>
  <xsl:template match="xqxft:relationship">
    <xsl:text>relationship "</xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>" </xsl:text>
  </xsl:template>
  <xsl:template match="xqxft:levels">
    <xsl:apply-templates/>
    <xsl:text> levels </xsl:text>
  </xsl:template>

  <xsl:template match="xqxft:stopword">
    <xsl:choose>
      <xsl:when test="xqxft:without">
        <xsl:text>without stop words </xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates/>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:value-of select="$NEWLINE"/>
  </xsl:template>
  <xsl:template match="xqxft:stopwords">
    <xsl:text>with </xsl:text>
    <xsl:choose>
      <xsl:when test="xqxft:default">
        <xsl:text>default stop words </xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>stop words </xsl:text>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="xqxft:ftStopWords">
    <xsl:choose>
      <xsl:when test="xqxft:ref">
        <xsl:text>at "</xsl:text>
        <xsl:value-of select="xqxft:ref"/>
        <xsl:text>" </xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <xsl:template match="xqxft:list">
    <xsl:call-template name="parenthesizedList"/>
    <xsl:text> </xsl:text>
  </xsl:template>
  <xsl:template match="xqxft:ftStopWordsInclExcl">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="xqxft:union">
    <xsl:text>union </xsl:text>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="xqxft:except">
    <xsl:text>except </xsl:text>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="xqxft:language">
    <xsl:apply-templates/>
    <xsl:value-of select="$NEWLINE"/>
  </xsl:template>
  <xsl:template match="xqxft:wildcard">
    <xsl:apply-templates/>
    <xsl:value-of select="$NEWLINE"/>
  </xsl:template>
  <xsl:template match="xqxft:ftAnd">
    <xsl:apply-templates select="xqx:firstOperand"/>
    <xsl:text>ftand </xsl:text>
    <xsl:apply-templates select="xqx:secondOperand"/>
    <xsl:text> </xsl:text>
  </xsl:template>
  <xsl:template match="xqxft:ftOr">
    <xsl:apply-templates select="xqx:firstOperand"/>
    <xsl:text>ftor </xsl:text>
    <xsl:apply-templates select="xqx:secondOperand"/>
    <xsl:text> </xsl:text>
  </xsl:template>
  <xsl:template match="xqxft:ftMildNot">
    <xsl:apply-templates select="xqx:firstOperand"/>
    <xsl:text>not in </xsl:text>
    <xsl:apply-templates select="xqx:secondOperand"/>
    <xsl:text> </xsl:text>
  </xsl:template>
  <xsl:template match="xqxft:ftUnaryNot">
    <xsl:text>ftnot </xsl:text>
    <xsl:apply-templates select="xqx:operand"/>
    <xsl:text> </xsl:text>
  </xsl:template>
  <xsl:template match="xqxft:ftPrimaryWithOptions">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="xqxft:ftPrimary">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="xqxft:parenthesized">
    <xsl:text>( </xsl:text>
    <xsl:apply-templates/>
    <xsl:text> ) </xsl:text>
  </xsl:template>
  <xsl:template match="xqxft:ftWords">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="xqxft:ftWordsValue">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="xqxft:ftWordsLiteral">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="xqxft:ftWordsExpression">
    <xsl:text> { </xsl:text>
    <xsl:apply-templates/>
    <xsl:text> } </xsl:text>
  </xsl:template>
  <xsl:template match="xqxft:ftAnyAllOption">
    <xsl:value-of select="."/>
  </xsl:template>
  <xsl:template match="xqxft:ftTimes">
    <xsl:text>occurs </xsl:text>
    <xsl:apply-templates/>
    <xsl:text>times </xsl:text>
  </xsl:template>
  <xsl:template match="xqxft:ftExtensionSelection">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="xqxft:ftExtensionOption">
    <xsl:text>option </xsl:text>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="xqxft:ftExtensionName">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="xqxft:ftExtensionValue">
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>

XQueryX for XQuery and XPath Full Text 1.0 example

The following example is based on the data and queries of one of the use cases in the XQuery and XPath Full Text 1.0 Use Cases. In this example, we show the English description of the query, the XQuery Full Text solution given in the Use Cases, a Full Text XQueryX solution, and the XQuery Full Text query that results from applying the Full Text XQueryX-to-XQuery Full Text transformation defined by the stylesheet in this appendix to the Full Text XQueryX solution. The latter XQuery Full Text expression is presented only as a sanity check: the intent of the stylesheet is not to create the identical XQuery Full Text expression given in the Use Cases, but to produce a valid XQuery Full Text expression with the same semantics.

Comparing the results of the Full Text XQueryX-to-XQuery Full Text transformation given in this document with the XQuery Full Text solutions in the XQuery and XPath Full Text 1.0 Use Cases may be helpful in evaluating the correctness of the Full Text XQueryX solution in the example.

The XQuery Full Text Use Cases solution given for the example is provided only to assist readers of this document in understanding the Full Text XQueryX solution. There is no intent to imply that this document specifies a "compilation" or "transformation" of XQuery Full Text syntax into Full Text XQueryX syntax.

In the following example, note that path expressions are expanded to show their structure. Also, note that the prefix syntax for binary operators like "and" makes the precedence explicit. In general, humans find it easier to read an XML representation that does not expand path expressions, but it is less convenient for programmatic representation and manipulation. XQueryX is designed as a language that is convenient for production and modification by software, and not as a convenient syntax for humans to read and write.
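As a small illustration of what "expanded" means here, the sketch below pairs the abbreviated path from the use case query with the fully expanded form that the stylesheet emits for the corresponding Full Text XQueryX. Both path expressions are taken from the example in this appendix; the variable names $abbreviated and $expanded are introduced here only for illustration.

```xquery
(: Abbreviated path, as a human would typically write it. :)
let $abbreviated :=
  doc("http://bstore1.example.com/full-text.xml")/books/book

(: The same path with the axis of every step spelled out, which is the
   shape that a sequence of XQueryX stepExpr elements maps to. :)
let $expanded :=
  fn:doc("http://bstore1.example.com/full-text.xml")/child::books/child::book

(: Both bindings select the same nodes in the same order. :)
return deep-equal($abbreviated, $expanded)
```

The expanded form is longer, but each step carries its axis explicitly, so software that generates or rewrites the query does not need to know the abbreviation rules.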

Finally, please note that white space, including new lines, has been added to some of the Full Text XQueryX documents and XQuery Full Text expressions for readability. That additional white space is not necessarily produced by the Full Text XQueryX-to-XQuery Full Text transformation.

Example

Here is Q4 from the XQuery and XPath Full Text 1.0 Use Cases, use case SCORE: Find all books with parts about "usability testing".

XQuery solution in XQuery and XPath Full Text 1.0 Use Cases:

declare function local:filter ( $nodes as node()*, $exclude as element()* ) as node()*
{
  for $node in $nodes except $exclude
  return
    typeswitch ($node)
      case $e as element()
        return element {node-name($e)}
                 { $e/@*, filter( $e/node() except $exclude, $exclude ) }
      default
        return $node
};

for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
let $irrelevantParts :=
   for $part in $book//part
   let score $score := $part ftcontains "usability test.*" with wildcards
   where $score < 0.5
   return $part
where count($irrelevantParts) < count($book//part)
return filter($book, $irrelevantParts)

A Solution in Full Text XQueryX:

<?xml version="1.0"?> <xqx:module xmlns:xqxft="http://www.w3.org/2007/xpath-full-text" xmlns:xqx="http://www.w3.org/2005/XQueryX" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2007/xpath-full-text http://www.w3.org/2007/xpath-full-text/XQueryX-Full-Text-extensions.xsd http://www.w3.org/2005/XQueryX http://www.w3.org/2005/XQueryX/xqueryx.xsd"> <xqx:mainModule> <xqx:prolog> <xqx:functionDecl> <xqx:functionName xqx:prefix="local">filter</xqx:functionName> <xqx:paramList> <xqx:param> <xqx:varName>nodes</xqx:varName> <xqx:typeDeclaration> <xqx:anyKindTest/><xqx:occurrenceIndicator>*</xqx:occurrenceIndicator> </xqx:typeDeclaration> </xqx:param> <xqx:param> <xqx:varName>exclude</xqx:varName> <xqx:typeDeclaration> <xqx:elementTest/><xqx:occurrenceIndicator>*</xqx:occurrenceIndicator> </xqx:typeDeclaration> </xqx:param> </xqx:paramList> <xqx:typeDeclaration> <xqx:anyKindTest/> </xqx:typeDeclaration> <xqx:functionBody> <xqx:flworExpr> <xqx:forClause> <xqx:forClauseItem> <xqx:typedVariableBinding> <xqx:varName>node</xqx:varName> </xqx:typedVariableBinding> <xqx:forExpr> <xqx:exceptOp> <xqx:firstOperand> <xqx:varRef> <xqx:name>nodes</xqx:name> </xqx:varRef> </xqx:firstOperand> <xqx:secondOperand> <xqx:varRef> <xqx:name>exclude</xqx:name> </xqx:varRef> 
</xqx:secondOperand> </xqx:exceptOp> </xqx:forExpr> </xqx:forClauseItem> </xqx:forClause> <xqx:returnClause> <xqx:typeswitchExpr> <xqx:argExpr> <xqx:varRef> <xqx:name>node</xqx:name> </xqx:varRef> </xqx:argExpr> <xqx:typeswitchExprCaseClause> <xqx:variableBinding>e</xqx:variableBinding> <xqx:sequenceType> <xqx:elementTest/> </xqx:sequenceType> <xqx:resultExpr> <xqx:computedElementConstructor> <xqx:tagNameExpr> <xqx:functionCallExpr> <xqx:functionName xqx:prefix="fn">node-name</xqx:functionName> <xqx:arguments> <xqx:varRef> <xqx:name>e</xqx:name> </xqx:varRef> </xqx:arguments> </xqx:functionCallExpr> </xqx:tagNameExpr> <xqx:contentExpr> <xqx:sequenceExpr> <xqx:pathExpr> <xqx:stepExpr> <xqx:filterExpr> <xqx:varRef> <xqx:name>e</xqx:name> </xqx:varRef> </xqx:filterExpr> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:attributeTest> <xqx:attributeName> <xqx:star/> </xqx:attributeName> </xqx:attributeTest> </xqx:stepExpr> </xqx:pathExpr> <xqx:functionCallExpr> <xqx:functionName xqx:prefix="fn">filter</xqx:functionName> <xqx:arguments> <xqx:exceptOp> <xqx:firstOperand> <xqx:pathExpr> <xqx:stepExpr> <xqx:filterExpr> <xqx:varRef> <xqx:name>e</xqx:name> </xqx:varRef> </xqx:filterExpr> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:anyKindTest/> </xqx:stepExpr> </xqx:pathExpr> </xqx:firstOperand> <xqx:secondOperand> <xqx:varRef> <xqx:name>exclude</xqx:name> </xqx:varRef> </xqx:secondOperand> </xqx:exceptOp> <xqx:varRef> <xqx:name>exclude</xqx:name> </xqx:varRef> </xqx:arguments> </xqx:functionCallExpr> </xqx:sequenceExpr> </xqx:contentExpr> </xqx:computedElementConstructor> </xqx:resultExpr> </xqx:typeswitchExprCaseClause> <xqx:typeswitchExprDefaultClause> <xqx:resultExpr> <xqx:varRef> <xqx:name>node</xqx:name> </xqx:varRef> </xqx:resultExpr> </xqx:typeswitchExprDefaultClause> </xqx:typeswitchExpr> </xqx:returnClause> </xqx:flworExpr> </xqx:functionBody> </xqx:functionDecl> </xqx:prolog> <xqx:queryBody> <xqx:flworExpr> 
<xqx:forClause> <xqx:forClauseItem> <xqx:typedVariableBinding> <xqx:varName>book</xqx:varName> </xqx:typedVariableBinding> <xqx:forExpr> <xqx:pathExpr> <xqx:stepExpr> <xqx:filterExpr> <xqx:functionCallExpr> <xqx:functionName xqx:prefix="fn">doc</xqx:functionName> <xqx:arguments> <xqx:stringConstantExpr> <xqx:value>http://bstore1.example.com/full-text.xml</xqx:value> </xqx:stringConstantExpr> </xqx:arguments> </xqx:functionCallExpr> </xqx:filterExpr> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:nameTest>books</xqx:nameTest> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>child</xqx:xpathAxis> <xqx:nameTest>book</xqx:nameTest> </xqx:stepExpr> </xqx:pathExpr> </xqx:forExpr> </xqx:forClauseItem> </xqx:forClause> <xqx:letClause> <xqx:letClauseItem> <xqx:typedVariableBinding> <xqx:varName>irrelevantParts</xqx:varName> </xqx:typedVariableBinding> <xqx:letExpr> <xqx:flworExpr> <xqx:forClause> <xqx:forClauseItem> <xqx:typedVariableBinding> <xqx:varName>part</xqx:varName> </xqx:typedVariableBinding> <xqx:forExpr> <xqx:pathExpr> <xqx:stepExpr> <xqx:filterExpr> <xqx:varRef> <xqx:name>book</xqx:name> </xqx:varRef> </xqx:filterExpr> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>descendant-or-self</xqx:xpathAxis> <xqx:nameTest>part</xqx:nameTest> </xqx:stepExpr> </xqx:pathExpr> </xqx:forExpr> </xqx:forClauseItem> </xqx:forClause> <xqx:letClause> <xqx:letClauseItem> <xqxft:ftScoreVariableBinding>score</xqxft:ftScoreVariableBinding> <xqx:letExpr> <xqxft:ftContainsExpr> <xqxft:ftRangeExpr> <xqx:varRef> <xqx:name>part</xqx:name> </xqx:varRef> </xqxft:ftRangeExpr> <xqxft:ftSelectionExpr> <xqxft:ftSelection> <xqxft:ftSelectionSource> <xqx:stringConstantExpr> <xqx:value>usability test.*</xqx:value> </xqx:stringConstantExpr> </xqxft:ftSelectionSource> <xqxft:optionsOrProximity> <xqxft:wildcard> <xqxft:value>with wildcards</xqxft:value> </xqxft:wildcard> </xqxft:optionsOrProximity> </xqxft:ftSelection> </xqxft:ftSelectionExpr> </xqxft:ftContainsExpr> 
</xqx:letExpr> </xqx:letClauseItem> </xqx:letClause> <xqx:whereClause> <xqx:lessThanOp> <xqx:firstOperand> <xqx:varRef> <xqx:name>score</xqx:name> </xqx:varRef> </xqx:firstOperand> <xqx:secondOperand> <xqx:decimalConstantExpr> <xqx:value>0.5</xqx:value> </xqx:decimalConstantExpr> </xqx:secondOperand> </xqx:lessThanOp> </xqx:whereClause> <xqx:returnClause> <xqx:varRef> <xqx:name>part</xqx:name> </xqx:varRef> </xqx:returnClause> </xqx:flworExpr> </xqx:letExpr> </xqx:letClauseItem> </xqx:letClause> <xqx:whereClause> <xqx:lessThanOp> <xqx:firstOperand> <xqx:functionCallExpr> <xqx:functionName xqx:prefix="fn">count</xqx:functionName> <xqx:arguments> <xqx:varRef> <xqx:name>irrelevantParts</xqx:name> </xqx:varRef> </xqx:arguments> </xqx:functionCallExpr> </xqx:firstOperand> <xqx:secondOperand> <xqx:functionCallExpr> <xqx:functionName xqx:prefix="fn">count</xqx:functionName> <xqx:arguments> <xqx:pathExpr> <xqx:stepExpr> <xqx:filterExpr> <xqx:varRef> <xqx:name>book</xqx:name> </xqx:varRef> </xqx:filterExpr> </xqx:stepExpr> <xqx:stepExpr> <xqx:xpathAxis>descendant-or-self</xqx:xpathAxis> <xqx:nameTest>part</xqx:nameTest> </xqx:stepExpr> </xqx:pathExpr> </xqx:arguments> </xqx:functionCallExpr> </xqx:secondOperand> </xqx:lessThanOp> </xqx:whereClause> <xqx:returnClause> <xqx:functionCallExpr> <xqx:functionName xqx:prefix="local">filter</xqx:functionName> <xqx:arguments> <xqx:varRef> <xqx:name>book</xqx:name> </xqx:varRef> <xqx:varRef> <xqx:name>irrelevantParts</xqx:name> </xqx:varRef> </xqx:arguments> </xqx:functionCallExpr> </xqx:returnClause> </xqx:flworExpr> </xqx:queryBody> </xqx:mainModule> </xqx:module>

Transformation of Full Text XQueryX Solution into XQuery Full Text

Application of the stylesheet above to the Full Text XQueryX solution results in:

declare function local:filter($nodes as node()*, $exclude as element()*) as node()
{
( for $node in ($nodes except $exclude)
  return
  ( typeswitch($node)
    case $e as element() return
      element {fn:node-name($e)}
              {( $e/child::attribute(*),
                 fn:filter( ($e/child::node() except $exclude), $exclude ) )}
    default return $node
  )
)
};

( for $book in fn:doc("http://bstore1.example.com/full-text.xml")/child::books/child::book
  let $irrelevantParts:=
  ( for $part in $book/descendant-or-self::part
    let score $score:=$part ftcontains "usability test.*" with wildcards
    where ($score < 0.5)
    return $part
  )
  where (fn:count($irrelevantParts) < fn:count($book/descendant-or-self::part))
  return local:filter($book, $irrelevantParts)
)

References

Normative References

XQuery and XPath Full Text Requirements, Stephen Buxton, Michael Rys, Editors. World Wide Web Consortium, 18 May 2007. This version is http://www.w3.org/TR/2007/WD-xpath-full-text-10-requirements-20070518/. The latest version is available at http://www.w3.org/TR/xpath-full-text-10-requirements/.

A. Phillips and M. Davis. Tags for Identifying Languages. IETF BCP 47. See http://tools.ietf.org/html/bcp47. This reference leads to RFC 4646 and RFC 4647 and replaces RFC 3066.

S. Bradner. Key Words for use in RFCs to Indicate Requirement Levels. IETF RFC 2119. See http://www.ietf.org/rfc/rfc2119.txt.

H. Alvestrand. Tags for the Identification of Languages. IETF RFC 3066. See http://www.ietf.org/rfc/rfc3066.txt.

A. Phillips and M. Davis. Tags for Identifying Languages. IETF RFC 4646. See http://www.ietf.org/rfc/rfc4646.txt.

A. Phillips and M. Davis. Matching of Language Tags. IETF RFC 4647. See http://www.ietf.org/rfc/rfc4647.txt.

Non-normative References

Documentation Guidelines for the Establishment and Development of Monolingual Thesauri, Geneva: International Organization for Standardization, 2nd edition, 1986.

ISO/IEC 13249-2 Information technology - Database languages - SQL Multimedia and Application Packages - Part 2: Full-Text. Geneva: International Organization for Standardization, 2nd edition, 2003.

M. Davis. Unicode Standard Annex #29 Text Boundaries, revision 11, 2006. See http://www.unicode.org/reports/tr29/

Acknowledgements

We would like to thank the members of the XQuery and XPath Full-Text group for their fruitful discussions.

We would like to thank the following people for their contributions to earlier drafts of this document.

Andrew Cencini, Microsoft - acencini@microsoft.com

Andrew Eisenberg, IBM - andrew.eisenberg@us.ibm.com

Nimish Khanolkar, Microsoft - nimishk@exchange.microsoft.com

Ashok Malhotra, Oracle - ashok.malhotra@oracle.com

Tapas Nayak, Microsoft - tapasnay@exchange.microsoft.com

Roland Seiffert, IBM - seiffert@de.ibm.com

Glossary

Checklist of Implementation-Defined Features

This appendix provides a summary of features defined in this specification whose effect is explicitly implementation-defined. The conformance rules require vendors to provide documentation that explains how these choices have been exercised.

A phrase is an ordered sequence of any number of tokens. Beyond that, phrases are implementation-defined.

A sentence is an ordered sequence of any number of tokens. Beyond that, sentences are implementation-defined. An implementation is not required to support sentences.

A paragraph is an ordered sequence of any number of tokens. Beyond that, paragraphs are implementation-defined. An implementation is not required to support paragraphs.

Implementations are free to provide implementation-defined ways to differentiate between markup's effect on token boundaries during tokenization.

How text with wildcard indicators and qualifiers is tokenized is implementation-defined.

The set of expressions (of form ExprSingle) that can be assigned to a score variable in a let-clause is implementation-defined. The result of passing an expression to the scoring algorithm that it does not support is implementation-defined.

The match option application order, subject to the stated constraints, is implementation-defined.

It is implementation-defined what a stem of a token is and whether stemming will be based on an algorithm, dictionary, or mixed approach.

It is implementation-defined which thesaurus relationships an implementation supports.

The behavior of the implementation when it encounters a combination of thesauri, levels, and relationships that it does not support is implementation-defined.

When the option "with default stop words" is used, an implementation-defined collection of stop words is used.

When a stop word is specified in a query, then the number of tokens in the text that are matched by that stop word is implementation-defined.

The "language" option influences tokenization, stemming, and stop words in an implementation-defined way. It MAY influence the behavior of other match options in an implementation-defined way.

The set of valid language identifiers is implementation-defined.

The behavior of the implementation when it encounters a language identifier it does not support is implementation-defined.

Certain values in the static context (see ) that can be overwritten or augmented by implementations are implementation-defined.

Which namespace URIs will be recognized for denoting extension selection pragmas is implementation-defined, as is the syntax and behavior of recognized pragmas.

Which namespace URIs will be recognized for denoting extension options is implementation-defined, as is the syntax and behavior of recognized options.

The conditions under which tokenization of two equal items produces different tokens are implementation-defined.

The restrictions on allowable expressions used to compute scores are implementation-defined.
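Several of the items above surface directly in query syntax. The following sketch reuses the sample document URL and the wildcard query string from the XQueryX example and combines three match options whose effect this checklist marks as implementation-defined. The language identifier "en" is an assumption made for illustration: whether it is recognized depends on the implementation.

```xquery
for $part in doc("http://bstore1.example.com/full-text.xml")/books/book//part
where $part ftcontains "usability test.*"
        with wildcards            (: tokenization of wildcard text is implementation-defined :)
        with default stop words   (: the stop word collection is implementation-defined :)
        language "en"             (: "en" is assumed to be a supported identifier :)
return $part
```

Because each of these choices is left to the implementation, the same query may return different parts on different conforming processors; the vendor documentation required by the conformance rules is the place to look for the actual behavior.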

Change Log

Sihem Amer-Yahia	2005-04-08	Updated case matrix	Updated case matrix row "sensitive", column "CCI" from "case-insensitive variant of CCI if it exists, else error" to "case-sensitive variant of CCI if it exists, else error".
Sihem Amer-Yahia	2005-05-02	Closed issues with no changes	Closed Cluster B, Issue 28 IGNORE Syntax with no change to the document. Closed Cluster B, Issue 50 IGNORE Queries with no change to the document.
Sihem Amer-Yahia	2005-05-02	Updated FTTimes syntax	Closed Cluster G, Issue 14 FTTimesSelection and added a related bullet item in Section 3.
Sihem Amer-Yahia	2005-05-02	Updated FTWildCard syntax	Updated FTWildCardOption in Section 3.
Sihem Amer-Yahia	2005-05-03	Updated introduction	Replaced "semantic element" with "semantic markup" and "tag" with "element" in the introduction.
Sihem Amer-Yahia	2005-05-03	Added issue on error codes	Added Cluster J, Issue 59 Error Codes.
Sihem Amer-Yahia	2005-05-03	Closed issues with no change	Closed Cluster A, Issue 54 Weight Granularity in Scoring with same resolution as for Cluster A, Issue 5 Score Weighting, no further change to document. Closed Cluster H, Issue 9 Window with no change to the document. Closed Cluster H, Issue 19 FTScopeSelection on structure with no change to the document. Closed Cluster E, Issue 25 MatchOption Syntax with no change to the document. Closed Cluster H, Issue 44 FTContains Semantics with no change to the document.
Sihem Amer-Yahia	2005-05-03	Updated FTContent syntax	Updated FTContent adding "entire content", Closed Cluster C, Issue 39 Exact Element Content.
Sihem Amer-Yahia	2005-05-03	Closed issue on Boolean Naming	Closed Cluster F, Issue 38 Boolean Naming. Changes to the document are pending awaiting a decision on whether it is OK to use "and", "or", "not" for full text. If so change existing symbols to "and", "or", "not". If not change existing symbols to "ftand", "ftor", "ftnot".
Chavdar Botev	2005-05-03	Updated FTDistance semantics	Updated the semantics for distance.
Sihem Amer-Yahia	2005-05-03	Updated FTRange syntax	Made "exactly" required before an exact number in FTRange. Closed Cluster F, Issue 43 Exactly in FTRangeSpec.
Sihem Amer-Yahia	2005-05-04	Closed issue on collations	Closed Cluster D, Issue 57 Collations Match Option.
Jochen Doerre	2005-05-19	Added issue on scoring	Added Cluster A, Issue 60 Extended Scoring.
Chavdar Botev	2005-06-29	Added issue on FTNegation	Added Cluster G, Issue 62 Precise semantics of double negation.
Chavdar Botev	2005-06-29	Added issue on FTTimes	Added Cluster G, Issue 61 Desired semantics of FTTimes.
Sihem Amer-Yahia	2005-07-11	Updated FTMildNegation syntax	Updated the mild not syntax from "mild not" to "not in". Closed Cluster I, Issue 10 MildNot and Cluster F, Issue 41 Mildnot Naming.
Chavdar Botev	2005-07-12	Updated FTIgnore semantics	Changed semantics of FTIgnoreOption.
Sihem Amer-Yahia	2005-07-18	Corrected error codes	Corrected and added error codes, closing and implementing the resolution for Cluster J Issue 59 Error Codes.
Sihem Amer-Yahia	2005-07-18	Closed issues with no changes	Closed Cluster I, Issue 13 "loose-grammar" leaving the grammar as it is. Closed Cluster D, Issue 53 "matchoptions-default" with no change to the document. Closed Cluster H, Issue 58 "ft-about-operator" with no change to the document.
Sihem Amer-Yahia	2005-07-21	Updated score syntax	Closed Cluster A, Issue 60 "new-scoring-proposal" and Issue 2 "scoring-values" and updated Section 2.2 Score Clause to reflect new score syntaxes. There are now syntaxes for scored queries 1) returning the same results as queries with Boolean predicates and 2) for returning more or fewer results.
Sihem Amer-Yahia	2005-07-21	Added appendix for defaults	Added appendix for defaults in the query prolog analogous to C.1 in the XQuery language document.
Sihem Amer-Yahia	2005-07-21	Updated FTThesaurus section	Aligned description in Section 3.2.4 FTThesaurusOption with current grammar.
Sihem Amer-Yahia	2005-07-21	Opened and closed issue on nested FTNegation	Opened and closed Cluster I, Issue 65 Nested FTNegations on the right side of an FTMildNegation.
Chavdar Botev	2005-07-25	Updated FTMildNegation semantics	Changed the semantics of MildNot.
Sihem Amer-Yahia	2005-08-10	Added Change Log	Added Change Log harvesting back entries from CVS change log.
Jochen Doerre	2005-08-17	Grammar changes	Changed XQuery/XPath grammar for new scoring syntax (resolution of Issue 60), for match option defaults in query prolog (resolution of Issue 45), for simplified window operator (resolution to Issue 51), renamed "mild not" to "not in" (resolution of Issue 41), modified FTThesaurusOption, FTStopwordOption and FTLanguageOption to require StringLiterals as decided in May 05 F2F.
Jochen Doerre	2005-08-17	Changes to Section 2	Introduced new scoring syntax; rewrote most of 2.2. Corrected use of weights in 2.2.1 (wrong default, wrong use of 1.5).
Jochen Doerre	2005-08-17	Changes to Section 3	Adapted the explanations to the changed syntax for FTWindow, FTThesaurusOption, FTStopwordOption and FTLanguageOption. Also corrected a couple of example explanations. Removed FTIgnoreOption from the list of match option defaults in 3.2. Corrected explanation and example of FTLanguageOption (neither diacritics nor case is language-specific!). Commented out last two examples of FTDistance, because distance 15 does not work for phrases.
Jochen Doerre	2005-08-17	Appendices A+B	Adapted introductory comment about which version of the XQuery/XPath grammars we are aligned to.
Jochen Doerre	2005-08-17	Dates in Header	Adapted current date and previous date and links in full-text-query-language-semantics.xml and in tqheader.xml.
Jochen Doerre	2005-08-19	Added Section 2.3, Changes in 3+4	Added Section 2.3 Extension to Static Context. Changed Sections 3.2 and 4.4.1.1 to refer to match option settings in the static context.
Jochen Doerre	2005-08-19	Added Issue 63	Added Cluster G Issue 63: Distance constraints do not work on phrases.
Jochen Doerre	2005-08-19	Changes in Section 4	Adapted semantics to new scoring feature (resolution of Issue 60), changed FTWindow semantics according to resolution of Issue 51, and cleaned examples.
Jochen Doerre	2005-08-19	Appendix G	Added lines for statically known thesauri and stop lists.
Jochen Doerre	2005-08-25	Added Issue 64	Added Cluster E Issue 64: System Relative Operator Defaults (using wording proposed by Pat Case).
Jochen Doerre	2005-10-10	Changes in Section 3	Rephrased Section 3.2.7 FTIgnoreOption. Explanation and example adapted to simple (non-recursive) use of "ignore".
Jochen Doerre	2005-10-10	Changes in Section 4	Incorporated Section 4.3.1.4 Match and AllMatches Normal Form.
Sihem Amer-Yahia	2005-10-12	Incorporated comments	Incorporated Pat's comments at http://lists.w3.org/Archives/Member/member-query-fttf/2005Sep/0068.html
Jim Melton	2005-10-20	Changes in Sections 3 and 4	Properly marked up errors and inserted error summary appendix. Re-ordered appendices so normative appendices precede non-normative appendices.
Jochen Doerre	2005-10-24	Final editings	Included corrections to examples in Section 3. Changed meaning of distance 0 for sentences (paragraphs) to mean adjacent. Rework of Appendix H Checklist of Implementation-Defined Features. Resolution texts to issues 45, 59, and 62.
Jochen Doerre	2005-11-28	Restrict FTTimes to FTWords	Modified EBNF syntax to allow the FTTimes operation to be applicable only to simple FTWords.
Jochen Doerre	2005-11-28	Re: Bug 2299: Changes to Section 4	The AllMatches model has been changed to allow the TokenInfo of a StringMatch to represent an interval of token positions, instead of single positions. Thus, a phrase is now modeled using a single StringMatch, and consequently distance constraints (which always apply to the individual StringMatches) can be used to constrain the entire phrase. In addition, this change allows overlapping tokens to be modeled. The semantics functions for FTOrder (order now constrains the start positions of tokens), for FTScope, for FTDistance (a distance constraint requires a certain number of positions between the end of one token and the start of the next) and for FTWindows have been adapted.
Jochen Doerre	2006-01-09	Issues List removed	Dropped Appendix I "Issues List", as issues are tracked in Bugzilla now.
Mary Holstege	2006-02-01	Static context	Added known languages to static context.
Jochen Doerre	2006-03-06	Bug 2776	Changed EBNF grammar to allow weights to be specified using RangeExpr.
Mary Holstege	2006-03-30	Updated Tokenization 4.2.7	Expanded and clarified definition. Added examples.
Pat Case	2006-04-13	Replaced glossary	Removed glossary copied from the XQuery language document and inserted coding to produce a full-text glossary.
Jochen Doerre	2006-04-24	Section 2	Added new Processing Model section.
Jochen Doerre	2006-04-25	Section 4	Included the completely revised semantics schemata and functions, which now (i) correctly handle interval-based TokenInfos, (ii) separate the representation of TokenInfos and SearchTokenInfos and SearchItems, (iii) have been simplified regarding the semantics of match options by no longer separating the implementation-defined matching function from (most of) the implementation-defined application of match options, and (iv) have been type- and syntax-checked.
Mary Holstege	2006-05-31	Bug 2483	Clarified type constraints on full-text operator parameters in Section 3. Revised EBNF to be more specific in some cases.
Jochen Doerre	2006-08-04	Bug 3374	Revised complete example in Section 4.3.3.
Jim Melton	2006-08-17	Added XQueryX support	Added new normative appendix defining the XML schemas and XSLT stylesheet necessary for XQuery and XPath Full Text 1.0 to integrate into XQueryX.
Jochen Doerre	2006-08-21	Bug 3439	Fixed FTMildNot semantics.
Mary Holstege	2006-08-22	Conformance	Added new conformance section as section 5. Add error code definitions to appendix D.
Mary Holstege	2006-08-22	FTWords	Fixed wording of FTWords with respect to type constraints.
Mary Holstege	2006-10-05	Score Variables	Added more complex scoring examples as clarification for bug #3596.
Mary Holstege	2006-10-05	FTSelection	Improved reading flow for examples. Make linkage of non-terminals consistent.
Mary Holstege	2006-11-01	Overall	Reorganized structure of document to improve reading flow.
Jim Melton	2006-12-26	FTLanguageOption	Revised text dealing with FTLanguageOption values that do not identify a known, defined language in RFC 3066. Added reference to RFC 4646.
Jim Melton	2006-12-26	FTLanguageOption and FTContainsExpr	Added text saying that a full-text processor SHOULD use xml:lang information when choosing collations and when processing FTMatchOptions. Also added text saying that an xml:lang specification SHOULD take precedence over an FTLanguageOption specification.
Jim Melton	2006-12-26	Tokenization	Made changes clarifying that tokenization SHOULD be implementation-defined (implicitly permitting it to be implementation-dependent).
Jochen Doerre	2007-01-22	Definitions for implementation-defined/ -dependent.	Added definitions for implementation-defined/dependent to Introduction as in XQuery document. Added links throughout the paper.
Jochen Doerre	2007-02-17	Bug 3698	Removed options "with diacritics", "without diacritics".
Jochen Doerre	2007-02-17	Bug 3914	Changed syntax of Booleans to "ftand", "ftor", "ftnot".
Jochen Doerre	2007-02-17	Bug 3920	Changed 3rd example in 3.3.7 FTDistance and added a 4th.
Jim Melton	2007-02-25	Bug 3935	Added text to define how wildcard characters can be escaped so they can be used in a search.
Pat Case	2007-02-26	Itemized sample tokens in 3 FTSelections	To resolve Bug 3913, added a sentence itemizing the first 5 tokens in the sample tokenization.
Pat Case	2007-02-26	Corrected example in 3.3.7 FTDistance	To resolve Bug 3920, corrected the first example and preceding text in 3.3.7 FTDistance to remove the "not in" operator and to use terms from the sample data.
Pat Case	2007-02-26	Inserted sentence into 3.2.6 FTLanguageOption	To resolve Bug 3926, inserted sentence into 3.2.6 FTLanguageOption saying that the "language" option MAY influence the behavior of other match options.
Pat Case	2007-02-26	Inserted a sentence into 3.2.5 FTStopWordOption	To resolve Bug 3930, inserted a sentence into 3.2.5 FTStopWordOption saying that "union" and "except" are applied from left to right.
Pat Case	2007-02-26	Added a note to 3.2.5 FTStopWordOption	To resolve Bug 3932, added a note to 3.2.5 FTStopWordOption saying that stop word lists MAY be applied during indexing; if they are, asking for stop words not to be used during a query will have no effect.
Pat Case	2007-02-26	Added a note to 3.4 FTIgnore	To resolve Bug 3936, added a note to 3.4 FTIgnore saying that nodes MAY be ignored during indexing and during query processing. The ignore option applies only to query processing. Whether and how indexing ignores nodes is out of scope for this specification.
Jochen Doerre	2007-02-26	Bug 3924	Changed grammar for match options: now precedence of match options is higher than Booleans. Included restriction to have at most one option of a group at a level.
Jochen Doerre	2007-02-27	Bug 3910, 3924, 3928	Reformulated what the case options mean. Added lower/uppercase as possible values for the case option to table in Appendix C (Static Context Components) and put rules and alternatives in the grammar into a more logical order. Also ordered tables and lists in the text the same.
Jochen Doerre	2007-03-02	Bug 3737	Reformulated and restructured most of section 3. Added explanation of the application structure of positional filters (formerly: FTProximities) and how match options take effect. Renamed the following grammar symbols: FTWordsSelection to FTPrimary, FTWordsMatches to FTPrimaryWithOptions, FTProximity to FTPosFilter.
Mary Holstege	2007-04-02	Bugs 4345, 4355, 4358, 4445	Reworked description of the wildcard option and added a new example. Added note on the effect when the lower bound of a range is greater than the upper bound. Fixed FTContent example to be "with wildcards".
Jochen Doerre	2007-04-09	Bug 3939	Added example for overlapping tokens in 4.1.
Jochen Doerre	2007-04-09	Bug 3931	Added match option application order, as agreed in FTTF-136.
Mary Holstege	2007-04-19	Conformance	Made support for uppercase and lowercase FTCaseOptions optional.
Mary Holstege	2007-04-19	Extensions	Added text to describe extension options and selections.
Jochen Doerre	2007-04-19	Bug 4386	And-selection description fixed in Sec. 3.
Jochen Doerre	2007-04-20	Bugs 3898, 4388	Finalized the additions needed to allow for nested FTDistance/FTWindow.
Jochen Doerre	2007-04-23	Section 4	Simplifications to the match option schemata and processing.
Mary Holstege	2007-04-25	Schemas	Misc. editorial improvements to schemas.
Pat Case	2007-09-13	Definition of a token	Refined the definition of a token.
Pat Case	2007-09-13	Sections 1-2	Made editorial changes throughout Sections 1-2.
Mary Holstege	2007-10-11	Semantics	Clarified definition of tokenization; fix-ups wrt overlapping tokens.
Mary Holstege	2007-10-11	Conformance	Reinstated lost conformance item on negative weights; fixed up constraints on scoring expressions.
Pat Case	2007-10-12	Reorganized Section 1.1	Reorganized Section 1.1, taking paragraphs out of the second ordered list, removing 2 sentences, reordering some of the paragraphs.
Pat Case	2007-10-12	Tokenization	Consolidated the early, informal introduction to tokenization into Section 1.1, moving what was in 2.1 Processing Model to Section 1.1. Removed some text and added a forward reference to the formal definition and constraints in 4.1.
Pat Case	2007-10-13	Using Weights	In 2.3.1. Using Weights, relabelled and reorganized the constraints pertaining to weights and scoring algorithms.
Pat Case	2007-10-15	Processing Model	In 2.1 Processing Model, made step 2, the new step 4a.
Jochen Doerre	2007-11-09	FTStopWords grammar and description	Renamed nonterminals: FTRefOrList to FTStopWords, FTInclExclStringLiteral to FTStopWordsInclExcl. Added negative stop words example: .../p ftcontains "propagating errors" with stop words ("few").
Jochen Doerre	2007-11-12	Chapter 3	Adapted text where it assumed that tokens have unique positions. Now refers explicitly to covered token positions (in FTWords, FTContent).
Jochen Doerre	2007-11-13	Chapter 3	More explanation for 2nd example for anchoring selection "at end" (3.6.5). Bug 4717.
Pat Case	2007-12-04	Title	Removed 1.0, 2.0, and hyphen from title and title references.
Mary Holstege	2008-01-24	Misc.	Bug fixes: 4714, 4715/2, 4717, 4728, 5415. Replaced incorrect text in definition of FTWindow. Eliminated notion of "adjacent" and "consecutive" tokens; replaced with description in terms of token positions. Made definition of Ignore option consistent with formal semantics: no new context focus is generated. Added additional examples. Added informative reference to UAX29. Consistent usage of the term "query string" etc.
Mary Holstege	2008-01-24	Grammar.	Move ft-option to first part of prolog.
Mary Holstege	2008-02-28	Semantics.	Clarify handling of overlapping tokens with respect to distance.
Mary Holstege	2008-03-17	Semantics.	Minor fixes to function definitions to resolve issues: 5572, 5573, 5574, and 5575.
Mary Holstege	2008-04-22	Naming conventions.	Use the case "StopWord" and "MildNot" consistently.