W3C

XQuery and XPath Full Text 1.0

W3C Recommendation 17 March 2011

This version:
http://www.w3.org/TR/2011/REC-xpath-full-text-10-20110317/
Latest version:
http://www.w3.org/TR/xpath-full-text-10/
Previous version:
http://www.w3.org/TR/2011/PR-xpath-full-text-10-20110125/
Editors:
Pat Case, Library of Congress
Michael Dyck, Invited Expert
Mary Holstege, Mark Logic Corporation
Sihem Amer-Yahia, AT&T Labs - Research
Chavdar Botev, Invited Expert
Stephen Buxton, Mark Logic Corporation
Jochen Doerre, IBM
Jim Melton, Oracle
Michael Rys, Microsoft
Jayavel Shanmugasundaram, Invited Expert

Please refer to the errata for this document, which may include some normative corrections.

See also translations.

This document is also available in these non-normative formats: XML and Changes since Candidate Recommendation.


Abstract

This document defines the syntax and formal semantics of XQuery and XPath Full Text 1.0, which is a language that extends XQuery 1.0 [XQuery 1.0: An XML Query Language (Second Edition)] and XPath 2.0 [XML Path Language (XPath) 2.0 (Second Edition)] with full-text search capabilities.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a Recommendation of the W3C. It was jointly developed by the W3C XML Query Working Group and the W3C XSL Working Group, each of which is part of the XML Activity.

This document incorporates minor changes made against the Proposed Recommendation of 25 January 2011. Changes to this document since the Proposed Recommendation are detailed in J Change Log. A Java applet that parses XQuery and XPath Full Text 1.0 expressions is available at http://www.w3.org/2010/02/qt-applets/xquery10-fulltext/.

A Test Suite has been created for this document. Implementors are encouraged to run this test suite and report their results. The Test Suite can be found at http://dev.w3.org/cvsweb/2007/xpath-full-text-10-test-suite/. An implementation report is available at http://dev.w3.org/2007/xpath-full-text-10-test-suite/PublicPagesStagingArea/ReportedResults/XQFTTSReport.html.

No substantive changes have been made to this specification since its publication as a Proposed Recommendation.

Please report errors in this document using W3C's public Bugzilla system (instructions can be found at http://www.w3.org/XML/2005/04/qt-bugzilla). If access to that system is not feasible, you may send your comments to the W3C XSLT/XPath/XQuery public comments mailing list, public-qt-comments@w3.org. It will be very helpful if you include the string “[FT]” in the subject line of your report, whether made in Bugzilla or in email. Please use multiple Bugzilla entries (or, if necessary, multiple email messages) if you have more than one comment to make. Archives of the comments and responses are available at http://lists.w3.org/Archives/Public/public-qt-comments/.

This document has been reviewed by W3C Members, by software developers, and by other W3C groups and interested parties, and is endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited from another document. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.

This document was produced by groups operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the XML Query Working Group and also maintains a public list of any patent disclosures made in connection with the deliverables of the XSL Working Group; those pages also include instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
    1.1 Full-Text Search and XML
    1.2 Organization of this document
    1.3 A word about namespaces
2 Full-Text Extensions to XQuery and XPath
    2.1 Processing Model
    2.2 Full-Text Contains Expression
        2.2.1 Description
        2.2.2 Examples
    2.3 Score Variables
        2.3.1 Using Weights Within a Scored FTContainsExpr
    2.4 Extensions to the Static Context
3 Full-Text Selections
    3.1 Primary Full-Text Selections
        3.1.1 Weights
    3.2 Search Tokens and Phrases
    3.3 Cardinality Selection
    3.4 Match Options
        3.4.1 Language Option
        3.4.2 Wildcard Option
        3.4.3 Thesaurus Option
        3.4.4 Stemming Option
        3.4.5 Case Option
        3.4.6 Diacritics Option
        3.4.7 Stop Word Option
        3.4.8 Extension Option
    3.5 Logical Full-Text Operators
        3.5.1 Or-Selection
        3.5.2 And-Selection
        3.5.3 Mild-Not Selection
        3.5.4 Not-Selection
    3.6 Positional Filters
        3.6.1 Ordered Selection
        3.6.2 Window Selection
        3.6.3 Distance Selection
        3.6.4 Scope Selection
        3.6.5 Anchoring Selection
    3.7 Ignore Option
    3.8 Extension Selections
4 Semantics
    4.1 Tokenization
        4.1.1 Examples
        4.1.2 Representations of Tokenized Text and Matching
    4.2 Evaluation of FTSelections
        4.2.1 AllMatches
            4.2.1.1 Formal Model
            4.2.1.2 Examples
            4.2.1.3 XML representation
        4.2.2 XML Representation
        4.2.3 The evaluate function
        4.2.4 FTWords
        4.2.5 Match Options Semantics
            4.2.5.1 Types
            4.2.5.2 High-Level Semantics
            4.2.5.3 Formal Semantics Functions
            4.2.5.4 FTCaseOption
            4.2.5.5 FTDiacriticsOption
            4.2.5.6 FTStemOption
            4.2.5.7 FTThesaurusOption
            4.2.5.8 FTStopWordOption
            4.2.5.9 FTLanguageOption
            4.2.5.10 FTWildCardOption
        4.2.6 Full-Text Operators Semantics
            4.2.6.1 FTOr
            4.2.6.2 FTAnd
            4.2.6.3 FTUnaryNot
            4.2.6.4 FTMildNot
            4.2.6.5 FTOrder
            4.2.6.6 FTScope
            4.2.6.7 FTContent
            4.2.6.8 FTWindow
            4.2.6.9 FTDistance
            4.2.6.10 FTTimes
    4.3 FTContainsExpr
    4.4 Scoring
    4.5 Example
5 Conformance
    5.1 Minimal Conformance
    5.2 Optional Features
        5.2.1 FTMildNot Operator
        5.2.2 FTUnaryNot Operator
        5.2.3 FTUnit and FTBigUnit
        5.2.4 FTOrder Operator
        5.2.5 FTScope Operator
        5.2.6 FTWindow Operator
        5.2.7 FTDistance Operator
        5.2.8 FTTimes Operator
        5.2.9 FTContent Operator
        5.2.10 FTCaseOption
        5.2.11 FTStopWordOption
        5.2.12 FTLanguageOption
        5.2.13 FTIgnoreOption
        5.2.14 Scoring
        5.2.15 Weights
6 XQueryX Conformance

Appendices

A EBNF for XQuery 1.0 Grammar with Full Text extensions
    A.1 Terminal Symbols
B EBNF for XPath 2.0 Grammar with Full-Text extensions
    B.1 Terminal Symbols
C Static Context Components
D Error Conditions
E XML Syntax (XQueryX) for XQuery and XPath Full Text 1.0
    E.1 XQueryX representation of XQuery and XPath Full Text 1.0
    E.2 XQueryX stylesheet for XQuery and XPath Full Text 1.0
    E.3 XQueryX for XQuery and XPath Full Text 1.0 example
        E.3.1 Example
            E.3.1.1 XQuery solution in XQuery and XPath Full Text 1.0 Use Cases:
            E.3.1.2 A Solution in Full Text XQueryX:
            E.3.1.3 Transformation of Full Text XQueryX Solution into XQuery Full Text
F References
    F.1 Normative References
    F.2 Non-normative References
G Acknowledgements (Non-Normative)
H Glossary (Non-Normative)
I Checklist of Implementation-Defined Features (Non-Normative)
J Change Log (Non-Normative)


1 Introduction

This document defines the language and the formal semantics of XQuery and XPath Full Text 1.0. This language is designed to meet the requirements identified in W3C XQuery and XPath Full Text Requirements [XQuery and XPath Full Text 1.0 Requirements] and to support the queries in the W3C XQuery and XPath Full Text Use Cases [XQuery and XPath Full Text 1.0 Use Cases].

In this document, examples and material labeled as "Note" are provided for explanatory purposes and are not normative.

XQuery and XPath Full Text 1.0 extends the syntax and semantics of XQuery 1.0 and XPath 2.0.

Additionally, this document defines an XML syntax for XQuery and XPath Full Text 1.0. The most recent versions of the two XQueryX XML Schemas and the XQueryX XSLT stylesheet for XQuery and XPath Full Text 1.0 are available at http://www.w3.org/2007/xpath-full-text/xpath-full-text-10-xqueryx.xsd, http://www.w3.org/2007/xpath-full-text/xpath-full-text-10-xqueryx-ftmatchoption-extensions.xsd, and http://www.w3.org/2007/xpath-full-text/xpath-full-text-10-xqueryx.xsl, respectively.

1.1 Full-Text Search and XML

As XML becomes mainstream, users expect to be able to search their XML documents. This requires a standard way to do full-text search, as well as structured searches, against XML documents. A similar requirement for full-text search led ISO to define the SQL/MM-FT [SQL/MM] standard. SQL/MM-FT defines extensions to SQL to express full-text searches providing functionality similar to that defined in this full-text language extension to XQuery 1.0 and XPath 2.0.

XML documents may contain highly structured data (fixed schemas, known types such as numbers, dates), semi-structured data (flexible schemas and types), markup data (text with embedded tags), and unstructured data (untagged free-flowing text). Where a document contains unstructured or semi-structured data, it is important to be able to search using Information Retrieval techniques such as scoring and weighting.

Full-text search is different from substring search in many ways:

  1. A full-text search searches for tokens and phrases rather than substrings. A substring search for news items that contain the string "lease" will return a news item that contains "Foobar Corporation releases version 20.9 ...". A full-text search for the token "lease" will not.

  2. There is an expectation that a full-text search will support language-based searches which substring search cannot. An example of a language-based search is "find me all the news items that contain a token with the same linguistic stem as 'mouse'" (finds "mouse" and "mice"). Another example based on token proximity is "find me all the news items that contain the tokens 'XML' and 'Query' allowing up to 3 intervening tokens".

  3. Full-text search must address the vagaries and nuances of language. Search results are often of varying usefulness. When you search a web site for cameras that cost less than $100, this is an exact search. There is a set of cameras that matches this search, and a set that does not. Similarly, when you do a string search across news items for "mouse", there is only 1 expected result set. When you do a full-text search for all the news items that contain the token "mouse", you probably expect to find news items containing the token "mice", and possibly "rodents", or possibly "computers". Not all results are equal. Some results are more "mousey" than others. Because full-text search may be inexact, we have the notion of score or relevance. We generally expect to see the most relevant results at the top of the results list.
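The difference described in item 1 above can be illustrated with a short, non-normative Python sketch. The `tokenize` function is a hypothetical stand-in for an implementation-defined tokenizer; it simply treats lowercased runs of letters and digits as tokens.

```python
import re

def tokenize(text):
    """Hypothetical tokenizer: lowercased runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

news_item = "Foobar Corporation releases version 20.9"

# Substring search: "lease" occurs inside "releases", so this matches.
substring_hit = "lease" in news_item.lower()

# Token search: "lease" must be equal to a whole token, so this does not match.
token_hit = "lease" in tokenize(news_item)

print(substring_hit, token_hit)  # True False
```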

Note:

As XQuery and XPath evolve, they may apply the notion of score to querying structured data. For example, when making travel plans or shopping for cameras, it is sometimes useful to get an ordered list of near matches in addition to exact matches. If XQuery and XPath define a generalized inexact match, we expect XQuery and XPath to utilize the scoring framework provided by XQuery and XPath Full Text 1.0.

[Definition: Full-text queries are performed on tokens and phrases. Tokens and phrases are produced via tokenization.] Informally, tokenization breaks a character string into a sequence of tokens, units of punctuation, and spaces.

Tokenization, in general terms, is the process of converting a text string into smaller units that are used in query processing. Those units, called tokens, are the most basic text units that a full-text search can refer to. Full-text operators typically work on sequences of tokens found in the target text of a search. These tokens are characterized by integers that capture the relative position(s) of the token inside the string, the relative position(s) of the sentence containing the token, and the relative position(s) of the paragraph containing the token. The positions typically comprise a start and an end position.
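As a non-normative illustration, the following Python sketch attaches token, sentence, and paragraph positions to each token. The `TokenInfo` field names and the tokenization rules (a blank line ends a paragraph; `.`, `!`, or `?` ends a sentence) are assumptions made for this example only, and each position is simplified to a single integer rather than a start and end pair.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class TokenInfo:
    word: str
    pos: int        # relative position of the token in the string
    sentence: int   # relative position of the containing sentence
    paragraph: int  # relative position of the containing paragraph

def tokenize(text):
    """Hypothetical rules: a blank line ends a paragraph; '.', '!' or '?' ends a sentence."""
    tokens = []
    pos = 0
    sentence = 0
    paragraphs = re.split(r"\n\s*\n", text.strip())
    for para_no, para in enumerate(paragraphs, start=1):
        for sent in re.split(r"[.!?]+", para):
            words = re.findall(r"\w+", sent)
            if not words:
                continue  # skip empty fragments left over from the split
            sentence += 1
            for w in words:
                pos += 1
                tokens.append(TokenInfo(w.lower(), pos, sentence, para_no))
    return tokens

sample = "Cats purr. Dogs bark.\n\nMice squeak."
for info in tokenize(sample):
    print(info)
```

With these assumed rules, "dogs" is the third token, in the second sentence of the first paragraph, and "mice" is the fifth token, in the third sentence and the second paragraph.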

Tokenization, including the definition of the term "tokens", SHOULD be implementation-defined. Implementations SHOULD expose the rules and sample results of tokenization as much as possible to enable users to predict and interpret the results of tokenization. Tokenization operates on the string value of an item; for element nodes this does not include the content of attribute nodes, but for attribute nodes it does. Tokenization is defined more formally in 4.1 Tokenization.

[Definition: A token is a non-empty sequence of characters returned by a tokenizer as a basic unit to be searched. Beyond that, tokens are implementation-defined.] [Definition: A phrase is an ordered sequence of any number of tokens. Beyond that, phrases are implementation-defined.]

Note:

Consecutive tokens need not be separated by either punctuation or space, and tokens may overlap.

Note:

In some natural languages, tokens and words can be used interchangeably.

[Definition: A sentence is an ordered sequence of any number of tokens. Beyond that, sentences are implementation-defined. A tokenizer is not required to support sentences.]

[Definition: A paragraph is an ordered sequence of any number of tokens. Beyond that, paragraphs are implementation-defined. A tokenizer is not required to support paragraphs.]

Some XML elements represent semantic markup, e.g., <title>. Others represent formatting markup, e.g., <b> to indicate bold. Semantic markup serves well as token boundaries. Some formatting markup serves well as token boundaries; for example, paragraphs are most commonly delimited by formatting markup. Other formatting markup may not serve well as token boundaries. Implementations are free to provide implementation-defined ways to differentiate between the markup's effect on token boundaries during tokenization. In the absence of an implementation-defined way to differentiate, element markup (start tags, end tags, and empty-element tags) creates token boundaries.
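The effect of markup on token boundaries can be sketched in Python as follows. This is a non-normative illustration: the two function names are hypothetical, and the word-character tokenization rule is an assumption. The first function follows the default rule, in which element markup creates token boundaries; the second shows what an implementation might do if it chose to treat `<b>` as not creating a boundary.

```python
import re
import xml.etree.ElementTree as ET

def tokens_with_markup_boundaries(elem):
    """Default rule: tokenize each text chunk separately, so tags split tokens."""
    tokens = []
    for chunk in elem.itertext():
        tokens.extend(re.findall(r"\w+", chunk.lower()))
    return tokens

def tokens_ignoring_markup(elem):
    """Alternative: concatenate the text first, so tags do not split tokens."""
    return re.findall(r"\w+", "".join(elem.itertext()).lower())

para = ET.fromstring("<p>It is <b>un</b>believable</p>")

print(tokens_with_markup_boundaries(para))  # ['it', 'is', 'un', 'believable']
print(tokens_ignoring_markup(para))         # ['it', 'is', 'unbelievable']
```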

A sample tokenization is used for the examples in this document. The results might be different for other tokenizations.

Tokenization enables functions and operators that operate on a part or the root of the token (e.g., wildcards, stemming).

Tokenization enables functions and operators which work with the relative positions of tokens (e.g., proximity operators).
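A proximity check over token positions might look like the following non-normative Python sketch, which mirrors the earlier example of finding the tokens "XML" and "Query" with up to 3 intervening tokens. The function names and the simple word-character tokenizer are assumptions for illustration.

```python
import re

def tokenize(text):
    """Hypothetical tokenizer: lowercased runs of word characters."""
    return re.findall(r"\w+", text.lower())

def within_distance(tokens, first, second, max_intervening):
    """True if some occurrence of `first` and some occurrence of `second`
    have at most `max_intervening` tokens between them, in either order."""
    positions_first = [i for i, t in enumerate(tokens) if t == first]
    positions_second = [i for i, t in enumerate(tokens) if t == second]
    return any(
        a != b and abs(a - b) - 1 <= max_intervening
        for a in positions_first
        for b in positions_second
    )

near = tokenize("XML is a popular Query target")
far = tokenize("XML has become a very common Query target")

print(within_distance(near, "xml", "query", 3))  # True: 3 intervening tokens
print(within_distance(far, "xml", "query", 3))   # False: 5 intervening tokens
```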

This specification focuses on functionality that serves all languages. It also selectively includes functionalities useful within specific families of languages. For example, searching within sentences and paragraphs is useful to many western languages and to some non-western languages, so that functionality is incorporated into this specification.

Certain aspects of language processing are described in this specification as implementation-defined or implementation-dependent.

  • [Definition: Implementation-defined indicates an aspect that may differ between implementations, but must be specified by the implementor for each particular implementation.]

  • [Definition: Implementation-dependent indicates an aspect that may differ between implementations, is not specified by this or any W3C specification, and is not required to be specified by the implementor for any particular implementation.]

1.2 Organization of this document

This document is organized as follows. We first present a high-level syntax for the XQuery and XPath Full Text 1.0 language along with some examples. Then, we present the syntax and examples of the basic primitives in the XQuery and XPath Full Text 1.0 language. This is followed by the semantics of the XQuery and XPath Full Text 1.0 language. The appendices provide an EBNF for the XPath 2.0 Grammar with Full-Text Extensions, an EBNF for the XQuery 1.0 Grammar with Full-Text Extensions, acknowledgements, and a glossary.

1.3 A word about namespaces

Certain namespace prefixes are predeclared by XQuery 1.0 and, by implication, by this specification, and bound to fixed namespace URIs. These namespace prefixes are as follows:

  • xml = http://www.w3.org/XML/1998/namespace

  • xs = http://www.w3.org/2001/XMLSchema

  • xsi = http://www.w3.org/2001/XMLSchema-instance

  • fn = http://www.w3.org/2005/xpath-functions

  • local = http://www.w3.org/2005/xquery-local-functions

In addition to the prefixes in the above list, this document uses the prefix err to represent the namespace URI http://www.w3.org/2005/xqt-errors. This namespace prefix is not predeclared and its use in this document is not normative. Error codes that are not defined in this document are defined in other XQuery 1.0 and XPath 2.0 specifications, particularly [XML Path Language (XPath) 2.0 (Second Edition)] and [XQuery 1.0 and XPath 2.0 Functions and Operators (Second Edition)].

Finally, this document uses the prefix fts to represent a namespace containing a number of functions used in this document to describe the semantics of XQuery and XPath Full Text functions. There is no requirement that these functions be implemented, therefore no URI is associated with that prefix.

2 Full-Text Extensions to XQuery and XPath

XQuery and XPath Full Text 1.0 extends the languages of XQuery 1.0 and XPath 2.0 in three ways. It:

  1. Adds a new expression called FTContainsExpr;

  2. Enhances the syntax of FLWOR expressions in XQuery 1.0 and for expressions in XPath 2.0 with optional score variables; and

  3. Adds static context declarations for full-text match options to the query prolog.

Additionally, it extends the data model and processing models in various ways.

2.1 Processing Model

A full-text contains expression (2.2 Full-Text Contains Expression) is composed of several parts:

  1. An XPath 2.0 or XQuery 1.0 expression (RangeExpr) that specifies the sequence of items to be searched. [Definition: Those items are called the search context.]

  2. The full-text selection to be applied (3 Full-Text Selections). Full-text selections are, syntactically and semantically, fully composable and contain:

    • Required:

      • Tokens and phrases for which a search is performed (3.2 Search Tokens and Phrases);

    • Optional:

      • Match options, such as indicators for case sensitivity and stop words (3.4 Match Options);

      • Boolean full-text operators, that compose a full-text selection from simpler full-text selections (3.5 Logical Full-Text Operators);

      • Other full-text operators that are constraints on the positions of matches, such as indicators for distance between tokens and for the cardinality of matches (3.6 Positional Filters and 3.3 Cardinality Selection); and

      • The weighting information. Each individual search term in a full-text selection may be annotated with optional weight information. This information may be used during the evaluation of the full-text selections to calculate scoring, information that quantifies the relevance of the result to the given search criteria.

  3. An optional XPath 2.0 or XQuery 1.0 expression (UnionExpr) that specifies the set of nodes, descendants of the RangeExpr, whose contents must be ignored for the purpose of determining a match during the search (3.7 Ignore Option).

The results of the evaluation of the full-text selection operators are instances of the AllMatches model, which complements the XQuery Data Model (XDM) for processing full-text queries. An AllMatches instance describes all possible solutions to the full-text query for a given search context item. Each solution is described by a Match instance. A Match instance contains the tokens from the search context item that must be included (described using StringInclude instances, which model the positive terms) and the tokens from the search context item that must be excluded (described using StringExclude instances, which model the negative terms). Each negative or positive term is modeled as a tuple: the position of the query token or phrase in the full-text selection, and a TokenInfo structure that describes a set of tokens in the text string which match the query token or phrase.

Figure 1: Processing Model Extensions

Figure 1 provides a schematic overview of the XQuery and XPath Full Text 1.0 processing steps that are discussed in detail below. Some of these steps are completely outside the domain of XQuery; in Figure 1, these are depicted outside the black line that represents the boundaries of the language. The diagram shows only the central pieces of the XQuery Processing Model (see Section 2.2 Processing Model of [XQuery 1.0: An XML Query Language (Second Edition)]), but zooms in on the Execution Engine, where the processing of the full-text extensions takes place. The full-text processing steps are labeled as FTn within the diagram and are referenced within the text.

Like all XQuery expressions, an FTContainsExpr returns an XDM Instance (see Fig. 1). With the exception of FTWords, which consumes TokenInfos, all full-text selections are closed under the AllMatches data model, i.e., their input and output are AllMatches instances. Tokenization transforms an XDM instance into TokenInfos, which ultimately get converted into AllMatches instances by the evaluation of full-text selections. Thus, the evaluation of nested full-text and XQuery expressions moves back and forth between these two models.

The resulting AllMatches instance obtained by the evaluation of an FTContainsExpr is converted into a Boolean value before being returned to the enclosing XPath or XQuery operation, as follows. The AllMatches instance is treated as a disjunction of its Match instances. If at least one member of the disjunction contains only positive terms, then the value returned is true. If all members of the disjunction contain negative terms, the result is false.
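The conversion to a Boolean value can be sketched in Python as follows. This is a non-normative illustration: the `Match` class and its field names are hypothetical simplifications of the AllMatches model defined in Section 4 Semantics, with positive terms held as StringInclude entries and negative terms as StringExclude entries.

```python
from dataclasses import dataclass, field

@dataclass
class Match:
    # Positive terms: tokens that must be present (StringInclude).
    string_includes: list = field(default_factory=list)
    # Negative terms: tokens that must be absent (StringExclude).
    string_excludes: list = field(default_factory=list)

def all_matches_to_boolean(all_matches):
    """True if at least one Match contains only positive terms."""
    return any(not m.string_excludes for m in all_matches)

positive_only = Match(string_includes=["dog", "cat"])
with_negative = Match(string_includes=["dog"], string_excludes=["train"])

print(all_matches_to_boolean([with_negative, positive_only]))  # True
print(all_matches_to_boolean([with_negative]))                 # False
print(all_matches_to_boolean([]))                              # False
```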

Weighting information, in an implementation-dependent fashion, may be used when calculating the scoring information computed and made available by FTContainsExpr to the optional score construct.

Given the components of a given full-text contains expression, the evaluation algorithm will proceed according to the following steps, also referenced in the processing model diagram as steps FTn (see Fig. 1):

  1. Evaluate the search context expression (resulting in the sequence of search context items), the ignore option, if any (resulting in the set of ignored nodes), and any other XQuery/XPath expressions nested within the full-text contains expression. (FT1)

  2. Tokenize the query string(s). (FT2.1)

  3. For each search context item:

    1. Delete the ignored nodes from the search context item.

    2. Tokenize the result of the previous step. This produces a sequence of tokens. (FT2.2) Note that implementations may (as an optimization) perform tokenization as part of the External Processing that is described in the XQuery Processing Model, when an XML document is parsed into an Infoset/PSVI and ultimately into an XQuery Data Model instance.

    3. Evaluate the FTSelection against the tokens of the search context. (FT3, FT4)

  4. Convert the topmost AllMatches instances into a Boolean value. (FT5)

    The additional scoring information (also part of FT5) that is produced by the evaluation of the full-text contains expression is implementation-dependent and is not specified in this document. The scoring information is made available at the same time the Boolean value is returned.

(A more detailed version of the above procedure appears in Section 4.3 FTContainsExpr.)
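The steps above can be sketched, non-normatively, in Python for the simplest case of a full-text selection that consists of a single token or phrase. Everything here is an assumption made for illustration: the function names, the word-character tokenizer, the use of `xml.etree.ElementTree` elements as search context items, and the representation of the evaluation result as a list of match positions rather than a full AllMatches instance.

```python
import re
import xml.etree.ElementTree as ET

def tokenize(text):
    """Hypothetical tokenizer: lowercased runs of word characters."""
    return re.findall(r"\w+", text.lower())

def tokens_of(item, ignored):
    """Steps 3.1 and 3.2: tokenize an item's text, skipping ignored descendants.
    Element markup is treated as a token boundary."""
    tokens = tokenize(item.text or "")
    for child in item:
        if child not in ignored:
            tokens += tokens_of(child, ignored)
        # Text after a child's end tag belongs to the parent, so it is kept.
        tokens += tokenize(child.tail or "")
    return tokens

def contains_text(search_context, query_string, ignored=()):
    # FT1: the caller has already evaluated the search context and ignored nodes.
    query_tokens = tokenize(query_string)                    # FT2.1
    n = len(query_tokens)
    results = []
    for item in search_context:
        item_tokens = tokens_of(item, list(ignored))         # FT2.2
        # FT3, FT4: record each position where the query phrase occurs.
        matches = [
            i for i in range(len(item_tokens) - n + 1)
            if n > 0 and item_tokens[i:i + n] == query_tokens
        ]
        results.append(matches)
    return any(results)                                      # FT5

doc = ET.fromstring(
    "<books><book><title>Web <note>usability</note> basics</title></book></books>"
)
books = doc.findall("book")
notes = doc.findall(".//note")

print(contains_text(books, "usability"))                 # True
print(contains_text(books, "usability", ignored=notes))  # False
print(contains_text(books, "Web basics", ignored=notes)) # True
```

In this sketch, ignoring the note element removes "usability" from the searched tokens and makes "web" and "basics" adjacent, so the phrase "Web basics" matches only when the note is ignored.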

Section 3 Full-Text Selections describes the syntax and the informal semantics of full-text operators. Their formal semantics as well as the formal definition of the AllMatches data model are given in Section 4 Semantics.

2.2 Full-Text Contains Expression

[Definition: A full-text contains expression is an expression that evaluates a sequence of items against a full-text selection. ]

As a syntactic construct, a full-text contains expression (grammar symbol: FTContainsExpr) behaves like a comparison expression (see Section 3.5.2 General Comparisons of [XQuery 1.0: An XML Query Language (Second Edition)]). This grammar rule introduces FTContainsExpr.

[50]    ComparisonExpr    ::=    FTContainsExpr ( (ValueComp | GeneralComp | NodeComp) FTContainsExpr )?

A full-text contains expression may be used anywhere a ComparisonExpr may be used. The contains text operator has higher precedence than other comparison operators, so the results of contains text expressions may be compared without enclosing them in parentheses.

2.2.1 Description

[51]    FTContainsExpr    ::=    RangeExpr ( "contains" "text" FTSelection FTIgnoreOption? )?

A full-text contains expression returns a Boolean value. It returns true if there is some item returned by the RangeExpr that, after tokenization, matches the full-text selection FTSelection. Since tokenization includes tokens derived only from the string values of items, a full-text contains expression searches the text of element nodes and of their descendant elements. The string value of other kinds of nodes, such as attributes and comments, will not be included unless the attribute or comment node itself is the target (RangeExpr) of the full-text contains expression. See Section 3 Full-Text Selections for more details. For the purpose of determining a match, certain descendants of nodes (identified by FTIgnoreOption) in the RangeExpr may be ignored, as specified in Section 3.7 Ignore Option.
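The following non-normative Python sketch illustrates which text is searched. It assumes a simple word-character tokenizer and uses `xml.etree.ElementTree`; the sample document is hypothetical. When the element is the target, only the text of the element and of its descendant elements is tokenized; the attribute value is tokenized only when the attribute itself is the target.

```python
import re
import xml.etree.ElementTree as ET

def tokenize(text):
    """Hypothetical tokenizer: lowercased runs of word characters."""
    return re.findall(r"\w+", text.lower())

book = ET.fromstring(
    '<book genre="mystery"><title>The Hound</title><author>Doyle</author></book>'
)

# Target is the element: tokens come from its text and its descendants' text.
element_tokens = [t for chunk in book.itertext() for t in tokenize(chunk)]

# Target is the attribute: tokens come from the attribute's own string value.
attribute_tokens = tokenize(book.get("genre"))

print(element_tokens)                # ['the', 'hound', 'doyle']
print("mystery" in element_tokens)   # False
print(attribute_tokens)              # ['mystery']
```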

An XQuery and XPath Full Text 1.0 processor SHOULD try to use the information available in xml:lang for processing of collations, as well as the various match options defined in Section 3.4 Match Options.

2.2.2 Examples

The following example in XQuery 1.0 Full Text returns the author of each book with a title containing a token with the same root as dog and the token cat.

for $b in /books/book
where $b/title contains text ("dog" using stemming) ftand "cat" 
return $b/author

The same example in XPath 2.0 Full Text is written as:


/books/book[title contains text ("dog" using stemming) ftand "cat"]/author

In the next example a ComparisonExpr is combined with an FTContainsExpr using the logical XQuery operator and. The query selects books that have a price of less than 50 and a title which contains a token with the same root as train:

/books/book[price < 50 and title contains text ("train" using stemming)]

The following example shows the combination of two contains text expressions the results of which are compared using the not-equals operator. The query selects books where either the title contains the token dog and the token cat and the content does not contain a token with the same root as train, or where the title fails to have one of the matching tokens but the content does:

/books/book[title contains text "dog" ftand "cat" ne
            content contains text ("train" using stemming)]

2.3 Score Variables

Besides specifying a match of a full-text query as a Boolean condition, full-text query applications typically also have the ability to associate scores with the results. [Definition: The score of a full-text query result expresses its relevance to the search conditions.]

XQuery and XPath Full Text 1.0 extends the languages of XQuery 1.0 and XPath 2.0 further by adding optional score variables to the for and let clauses of FLWOR expressions.

The production for the extended for clause in XQuery 1.0 follows.

[35]    ForClause    ::=    "for" "$" VarName TypeDeclaration? PositionalVar? FTScoreVar? "in" ExprSingle ("," "$" VarName TypeDeclaration? PositionalVar? FTScoreVar? "in" ExprSingle)*
[37]    FTScoreVar    ::=    "score" "$" VarName

In XPath 2.0, the SimpleForClause is extended similarly.

When a score variable is present in a for clause the evaluation of the expression following the in keyword not only needs to determine the result sequence of the expression, i.e., the sequence of items which are iteratively bound to the for variable. It must also determine in each iteration the relevance "score" value of the current item and bind the score variable to that value.

The scope of a score variable bound in a for or let clause comprises all subexpressions of the containing FLWOR expression that appear after the variable binding. The scope does not include the expression to which the variable is bound. The for and let clauses of a given FLWOR expression may bind the same score variable name more than once. In this case, each new binding occludes the previous one, which becomes inaccessible in the remainder of the FLWOR expression.

The expanded QName of a score variable bound in a for clause must be distinct from both the expanded QName of the variable with which it is associated and the expanded QName of any positional variable with which it is associated [err:XQST0089] (defined in [XQuery 1.0: An XML Query Language (Second Edition)]).

The semantics of scoring and how it relates to second-order functions is discussed in Section 4.4 Scoring.

In the following example book elements are determined that satisfy the condition [content contains text "web site" ftand "usability" and .//chapter/title contains text "testing"]. The scores assigned to the book elements are returned.

for $b score $s 
    in /books/book[content contains text "web site" ftand "usability" 
                   and .//chapter/title contains text "testing"]
return $s

The example above is also a valid example of the XPath 2.0 extension.

Scores are typically used to order results, as in the following, more complete example.

for $b score $s 
    in /books/book[content contains text "web site" ftand "usability"]
where $s > 0.5
order by $s descending
return <result>  
          <title> {$b//title} </title> 
          <score> {$s} </score> 
       </result>

Note that the score variable gets one score value for each item in the value of the expression after the in keyword, regardless of the number of FTContainsExprs in that expression. In the following example, two separate full-text contains expressions are used to select the matching paragraphs. There is still just one score for each para returned. The highest scoring paragraphs will be returned first:

for $p score $s in 
  //book[title contains text "software"]/para[. contains text "usability"]
     order by $s descending
  return $p

The following more elaborate example uses multiple score variables to return the matching paragraphs ordered so that those from the highest scoring books precede those from the lowest scoring books, where the highest scoring paragraphs of each book are returned before the lower scoring paragraphs of that book:

for $b score $score1 in //book[title contains text "software"]
    order by $score1 descending
return
    for $p score $score2 in $b/para[. contains text "usability"]
       order by $score2 descending
    return $p

The score variable is bound to a value which reflects the relevance of the match criteria in the full-text selections to the items returned by the respective RangeExprs. The calculation of relevance is implementation-dependent, but score evaluation must follow these rules:

  1. Score values are of type xs:double in the range [0, 1].

  2. For score values greater than 0, a higher score must imply a higher degree of relevance.
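Because the calculation of relevance is implementation-dependent, the following Python sketch is only one hypothetical scoring function that happens to respect the two rules above. It scores an item by the share of its tokens that are query tokens, which always yields a floating-point value between 0 and 1, and then orders the results with the highest score first. The function names and sample data are assumptions.

```python
import re

def tokenize(text):
    """Hypothetical tokenizer: lowercased runs of word characters."""
    return re.findall(r"\w+", text.lower())

def score(text, query_tokens):
    """Hypothetical relevance: share of the item's tokens that are query tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in query_tokens)
    return hits / len(tokens)

books = {
    "A": "usability usability testing",
    "B": "usability of web sites",
    "C": "cooking",
}

# Highest scoring items first, as with "order by $s descending".
ranked = sorted(
    ((score(content, {"usability"}), title) for title, content in books.items()),
    reverse=True,
)

for s, title in ranked:
    print(title, round(s, 2))
```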

Similarly to their use in a for clause, score variables may be specified in a let clause. A score variable in a let clause is also bound to the score of the expression evaluation, but in the let clause one score is determined for the complete result.

The production for the extended let clause follows.

[38]    LetClause    ::=    "let" (("$" VarName TypeDeclaration?) | FTScoreVar) ":=" ExprSingle ("," (("$" VarName TypeDeclaration?) | FTScoreVar) ":=" ExprSingle)*

When using the score option in a for clause the expression following the in keyword has the dual purpose of filtering, i.e., driving the iteration, and determining the scores. It is possible to separately specify expressions for filtering and scoring by combining a simple for clause with a let clause that uses scoring. The following is an example of this.

for $b in /books/book[.//chapter/title contains text "testing"]
let score $s := $b/content contains text "web site" ftand "usability" 
order by $s descending
return <result score="{$s}">{$b}</result>

This example returns book elements with chapter titles that contain "testing". Scores are returned along with the book elements. These scores, however, reflect whether the book content contains "web site" and "usability".

Note that the score of an FTContainsExpr is not required to be 0 if the expression evaluates to false, nor to be non-zero if the expression evaluates to true. Hence, in the example above it is not possible to infer the Boolean value of the FTContainsExpr in the let clause from the calculated score of a returned result element. For instance, an implementation may want to assign a non-zero score to a book that contains "web site" but not "usability", as this may be considered more relevant than a book that contains neither "web site" nor "usability".

The expression ExprSingle associated with the score variable is passed to the scoring algorithm. The scoring algorithm calculates the score value based on the passed expression (not on the value returned by evaluating the expression). The set of expressions supported by the scoring algorithm is implementation-defined. If an expression not supported by the scoring algorithm is passed to the scoring algorithm, the result is implementation-defined.

The use of score variables introduces a second-order aspect to the evaluation of expressions which cannot be emulated by (first-order) XQuery functions. Consider replacing the clause let score $s := FTContainsExpr with the following:

let $s := score(FTContainsExpr)

where a function score is applied to some FTContainsExpr. If the function score were first-order, it would only be applied to the result of the evaluation of its argument, which is one of the Boolean constants true or false. Hence, there would be at most two possible values such a score function would be able to return and no further differentiation would be possible.
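The difference can be illustrated outside XQuery with a non-normative Python sketch. The class ContainsText and both scoring functions are hypothetical names introduced only for this illustration, and the "fraction of terms found" formula is an assumption, since the calculation of relevance is implementation-dependent.

```python
from dataclasses import dataclass
from typing import List


def first_order_score(result: bool) -> float:
    """Receives only the Boolean result, so it can return at most two values."""
    return 1.0 if result else 0.0


@dataclass
class ContainsText:
    """Hypothetical stand-in for an unevaluated FTContainsExpr using ftand."""
    tokens: List[str]  # tokens of the search context
    terms: List[str]   # query terms combined with ftand

    def evaluate(self) -> bool:
        return all(term in self.tokens for term in self.terms)


def second_order_score(expr: ContainsText) -> float:
    """Receives the expression itself; assumed formula: fraction of terms found."""
    if not expr.terms:
        return 0.0
    found = sum(1 for term in expr.terms if term in expr.tokens)
    return found / len(expr.terms)


partial = ContainsText(tokens=["web", "site", "design"], terms=["web", "usability"])
unrelated = ContainsText(tokens=["cooking", "recipes"], terms=["web", "usability"])
```

Both expressions evaluate to false, so the first-order function cannot tell them apart, while the second-order function assigns the partially matching text a higher (non-zero) score.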

2.3.1 Using Weights Within a Scored FTContainsExpr

[Definition: Scoring may be influenced by adding weight declarations to search tokens, phrases, and expressions.] Weight declarations are introduced syntactically in the FTPrimaryWithOptions production, described in Section 3.1.1 Weights.

The weights assigned are not related to any absolute standard, but typically have a relationship to other weights within the same FTContains expression.

The effect of weights on the resulting score is implementation-dependent. However, scoring algorithms MUST conform to the constraint that when no explicit weight is specified, the default weight is 1.0.

The following example illustrates how different weights can be used for different search terms.

for $b in /books/book
let score $s := $b/content contains text ("web site" weight {0.5})
                                ftand ("usability" weight {2})
return <result score="{$s}">{$b}</result>

2.4 Extensions to the Static Context

The XQuery Static Context is extended with a component for each full-text match option group. The settings of these components can be changed by using the following declaration syntax in the Prolog.

[6]    Prolog    ::=    ((DefaultNamespaceDecl | Setter | NamespaceDecl | Import | FTOptionDecl) Separator)* ((VarDecl | FunctionDecl | OptionDecl) Separator)*
[24]    FTOptionDecl    ::=    "declare" "ft-option" FTMatchOptions

Match options modify the match semantics of full-text expressions. They are described in detail in Section 3.4 Match Options. When a match option is specified explicitly in a full-text expression, it overrides the setting of the respective component in the static context.

3 Full-Text Selections

This section describes the full-text selections which contain the full-text operators in a full-text contains expression (FTContainsExpr), as well as the match options which modify the matching semantics of the full-text selections. In the following, the syntax for each type of full-text selection is given together with an informal statement of its meaning.

[Definition: A full-text selection specifies the conditions of a full-text search. ]

[144]    FTSelection    ::=    FTOr FTPosFilter*

As shown in the grammar, a full-text selection consists of search conditions possibly involving logical operators (FTOr), followed by an arbitrary number of positional filters (FTPosFilter).

The syntax and semantics of the individual full-text selection operators follow.

This XML document is the source document for examples in this section.

<books>
  <book number="1">
    <title shortTitle="Improving Web Site Usability">Improving  
        the Usability of a Web Site Through Expert Reviews and
        Usability Testing</title>
    <author>Millicent Marigold</author>
    <author>Montana Marigold</author>
    <editor>Véra Tudor-Medina</editor>
    <content>
      <p>The usability of a Web site is how well the  
          site supports the users in achieving specified  
          goals. A Web site should facilitate learning,  
          and enable efficient and effective task  
          completion, while propagating few errors.
      </p>
      <note>This book has been approved by the Web Site  
          Users Association.
      </note>
    </content>
  </book>
</books>

Tokenization is implementation-defined. A sample tokenization is used for the examples in this section. This sample tokenization uses white space, punctuation and XML tags as word-breakers, periods followed by a space as sentence boundaries, and <p> for paragraph boundaries. The first sentence and paragraph start at the beginning of the document, and the last sentence and paragraph end at the end of the document. The results may be different for other tokenizations.

The first five tokens in this example using the sample tokenization would be "Improving", "the", "Usability", "of", and "a".
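A rough, non-normative approximation of the word-breaking part of the sample tokenization can be written in Python as follows. The function name tokenize is hypothetical, sentence and paragraph boundaries are not modelled, and the regular expression \w+ is an assumption (for instance, it treats an underscore as part of a token).

```python
import re

TITLE = ("<title>Improving the Usability of a Web Site Through Expert "
         "Reviews and Usability Testing</title>")


def tokenize(text: str) -> list:
    """Split text into tokens, treating tags, white space and punctuation as breakers."""
    # Replace anything that looks like an XML tag with a space.
    without_tags = re.sub(r"<[^>]*>", " ", text)
    # Keep runs of word characters; everything else separates tokens.
    return re.findall(r"\w+", without_tags)


first_five = tokenize(TITLE)[:5]
```

The tokens keep the character case of the source text; case handling is left to the case option.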

Unless stated otherwise, the results assume a case-insensitive match.

3.1 Primary Full-Text Selections

[151]    FTPrimary    ::=    (FTWords FTTimes?) | ("(" FTSelection ")") | FTExtensionSelection

[Definition: A primary full-text selection is the basic form of a full-text selection. It specifies tokens and phrases as search conditions (FTWords), optionally followed by a cardinality constraint (FTTimes). An FTSelection in parentheses and the FTExtensionSelection are also primary full-text selections.]

3.1.1 Weights

[150]    FTPrimaryWithOptions    ::=    FTPrimary FTMatchOptions? FTWeight?
[145]    FTWeight    ::=    "weight" "{" Expr "}"

As shown in the grammar, a full-text primary selection may be optionally followed by match options (which are discussed in 3.4 Match Options) and by a "weight" value that is specified using an expression enclosed in braces. The Expr is evaluated as if it were an argument to a function with an expected type xs:double. The weight MUST have an absolute value between 0.0 and 1000.0 inclusive. If the absolute value of the weight is greater than 1000.0, an error is raised: [err:FTDY0016].
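The range check on weights can be sketched in Python as follows (non-normative). The exception class and the function check_weight are hypothetical names, and the conversion with float() only approximates the function-conversion rules for an expected type of xs:double.

```python
class FTDY0016(ValueError):
    """Stands in for the dynamic error err:FTDY0016."""


def check_weight(value) -> float:
    """Convert a weight to a double and enforce the 1000.0 bound on its absolute value."""
    weight = float(value)  # approximation of conversion to xs:double
    if abs(weight) > 1000.0:
        raise FTDY0016("absolute value of weight exceeds 1000.0")
    return weight
```

For example, check_weight(2) yields 2.0, whereas check_weight(1000.5) raises the error.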

Note:

As a consequence of the flexibility given to implementations under Section 2.3.4 Errors and Optimization of [XQuery 1.0: An XML Query Language (Second Edition)], it is possible that evaluation of weight declarations in an FTContainsExpr for which no scores are evaluated may be skipped by the implementation, and errors in them may go unreported.

3.2 Search Tokens and Phrases

[152]    FTWords    ::=    FTWordsValue FTAnyallOption?
[153]    FTWordsValue    ::=    StringLiteral | ("{" Expr "}")
[155]    FTAnyallOption    ::=    ("any" "word"?) | ("all" "words"?) | "phrase"

FTWords finds matches that contain the specified tokens and phrases.

FTWords consists of two parts: a mandatory FTWordsValue part and an optional FTAnyallOption part. FTWordsValue specifies the tokens and phrases that must be contained in the matches. FTAnyallOption specifies how containment is checked.

In general, the tokens and phrases in FTWordsValue are specified using a nested XQuery expression. To simplify notation, the enclosing braces may be omitted if FTWordsValue consists of a single string literal.

The following rules specify how an FTWordsValue matches tokens and phrases. First, the FTWordsValue is converted to a sequence of strings as though it were an argument to a function with the expected type of xs:string*. If the sequence is empty, the FTWords yields no matches. Otherwise, each of those strings is tokenized into a sequence of tokens as described in Section 4.1 Tokenization. Then, FTAnyallOption is checked.

If FTAnyallOption is "any", the sequence of tokens for each string is considered as a phrase. If the sequence of tokens is empty, then the phrase contributes nothing to the set of matches for the FTWords. Otherwise, a match is found in the tokenized form of the text being searched, whenever that form contains a subsequence of tokens that corresponds to the sequence of query tokens in an implementation-defined way and that subsequence of tokens covers consecutive token positions in the tokenized text. If the value of the FTWordsValue contains more than one string, the different strings are considered to be alternatives, i.e., the search context must contain at least one of the generated phrases. Each resulting match will contain exactly one such phrase.

If FTAnyallOption is "all", the sequence of tokens for each string is considered as a phrase. If any such sequence of tokens is empty, the FTWords yields no matches. The resulting matches must contain all of the generated phrases.

If FTAnyallOption is "phrase", the tokens from all the strings are concatenated in a single sequence, which is considered as a phrase. If the sequence of tokens is empty, the FTWords yields no matches. The resulting matches must contain the generated phrase.

If FTAnyallOption is "any word", the tokens from all the strings are combined into a single set. If the set is empty, the FTWords yields no matches. The search context must contain at least one of the tokens in the set. Each resulting match will contain exactly one such token.

If FTAnyallOption is "all words", the tokens from all the strings are combined into a single set. If the set is empty, the FTWords yields no matches. The resulting matches must contain all of the tokens in the set.

If the FTWordsValue evaluates to a single string, the use of "any", "all", and "phrase" in FTAnyallOption produces the same results.

If FTAnyallOption is omitted, "any" is the default.
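The five FTAnyallOption settings can be compared with the following non-normative Python sketch. It only answers whether at least one match exists. The function names are hypothetical, and the sketch assumes a white-space tokenization and case-insensitive matching by exact token equality, where a real implementation may differ.

```python
def tokenize(s: str) -> list:
    """Assumed tokenization: lower-case, split on white space."""
    return s.lower().split()


def has_phrase(text: list, phrase: list) -> bool:
    """True if phrase occurs in text at consecutive token positions."""
    n = len(phrase)
    return any(text[i:i + n] == phrase for i in range(len(text) - n + 1))


def ft_words(text_tokens: list, strings: list, option: str = "any") -> bool:
    text = [t.lower() for t in text_tokens]
    phrases = [tokenize(s) for s in strings]
    if not phrases:
        return False  # an empty sequence of strings yields no matches
    if option == "any":
        # Empty phrases contribute nothing; one matching phrase is enough.
        return any(bool(p) and has_phrase(text, p) for p in phrases)
    if option == "all":
        # Any empty phrase means no matches; every phrase must occur.
        return all(bool(p) and has_phrase(text, p) for p in phrases)
    if option == "phrase":
        joined = [t for p in phrases for t in p]
        return bool(joined) and has_phrase(text, joined)
    words = {t for p in phrases for t in p}
    if option == "any word":
        return any(w in text for w in words)
    if option == "all words":
        return bool(words) and all(w in text for w in words)
    raise ValueError("unknown FTAnyallOption: " + option)


TITLE = ("Improving the Usability of a Web Site Through Expert Reviews "
         "and Usability Testing").split()
PARA = "The usability of a Web site is how well the site supports the users".split()
```

With these inputs, ft_words(PARA, ["Web Site Usability"]) is false because the three tokens are not consecutive, while the same string with "all words" is satisfied.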

The following expression returns the sample book element, because its title element contains the token "Expert":

//book[./title contains text "Expert"]

The following expression returns the sample book element, because its title element contains the phrase "Expert Reviews":

//book[./title contains text "Expert Reviews"]

The following expression returns the sample book element, because its title element contains the two tokens "Expert" and "Reviews":

//book[./title contains text {"Expert", "Reviews"} all]

The following expression returns false for our sample document, because the p element doesn't contain the phrase "Web Site Usability" although it contains all of the tokens in the phrase:

//book//p contains text "Web Site Usability"

The following expression returns book numbers of book elements by "Marigold" with a title about "Web Site Usability", sorting them in descending score order:

for $book in /books/book[.//author contains text "Marigold"] 
let score $score := $book/title/@shortTitle contains text "Web Site Usability" 
where $score > 0.8 
order by $score descending
return $book/@number

3.3 Cardinality Selection

[156]    FTTimes    ::=    "occurs" FTRange "times"

[Definition: A cardinality selection consists of an FTWords followed by the FTTimes postfix operator.] A cardinality selection selects matches for which the operand FTWords is matched a specified number of times.

A cardinality selection limits the number of different matches of FTWords within the specified range. The semantics of FTRange are described in 3.6.3 Distance Selection.

In the document fragment "very very big":

  1. The FTWords "very big" has 1 match consisting of the second "very" and "big".

  2. The FTWords {"very", "big"} all has 2 matches; one consisting of the first "very" and "big", and the other containing the second "very" and "big".

  3. The FTWords {"very", "big"} any has 3 matches.
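The three counts above can be reproduced with a small, non-normative Python sketch. The function names are hypothetical. The sketch assumes white-space tokenization and exact token equality, counts one match per phrase occurrence for "any", and one match per combination of occurrences (one for each phrase) for "all".

```python
from itertools import product


def occurrences(text: list, phrase: list) -> list:
    """Start positions at which phrase occurs at consecutive token positions."""
    n = len(phrase)
    return [i for i in range(len(text) - n + 1) if text[i:i + n] == phrase]


def count_matches(text: list, strings: list, option: str = "any") -> int:
    phrases = [s.split() for s in strings]
    if not phrases:
        return 0
    per_phrase = [occurrences(text, p) for p in phrases]
    if option == "any":
        # Each occurrence of each alternative phrase is a separate match.
        return sum(len(occ) for occ in per_phrase)
    if option == "all":
        # A match combines one occurrence of every phrase.
        return len(list(product(*per_phrase)))
    raise ValueError("only 'any' and 'all' are modelled here")


def in_range(count: int, at_least: int = 0, at_most=None) -> bool:
    """Check a match count against an 'at least' / 'at most' style range."""
    return count >= at_least and (at_most is None or count <= at_most)


FRAGMENT = "very very big".split()
TITLE = ("improving the usability of a web site through expert reviews "
         "and usability testing").split()
```

Applied to the lower-cased sample title, count_matches(TITLE, ["usability", "testing"], "any") is 3, so a constraint of at most 2 times is not satisfied.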

The following expression returns the example book element's number, because the book element contains 2 or more occurrences of "usability":

//book[. contains text "usability" occurs at least 2 times]/@number

The following expression returns the empty sequence, because there are 3 occurrences of {"usability", "testing"} any in the designated title:

//book[@number="1" and title contains text {"usability", 
"testing"} any occurs at most 2 times] 

3.4 Match Options

Full-text match options modify the matching behaviour of the primary full-text selection to which they are applied.

[150]    FTPrimaryWithOptions    ::=    FTPrimary FTMatchOptions? FTWeight?
[166]    FTMatchOptions    ::=    ("using" FTMatchOption)+
[167]    FTMatchOption    ::=    FTLanguageOption
| FTWildCardOption
| FTThesaurusOption
| FTStemOption
| FTCaseOption
| FTDiacriticsOption
| FTStopWordOption
| FTExtensionOption

[Definition: Match options modify the set of tokens in the query, or how they are matched against tokens in the text.]

[Definition: Each of the alternatives of production FTMatchOption other than FTExtensionOption corresponds to one match option group. ] The match options from any given group are mutually exclusive, i.e., only one of these settings can be in effect, whereas match options of different groups can be combined freely.

It is a static error [err:FTST0019] if, within a single FTMatchOptions, there is more than one match option of any given match option group. For example, if the FTCaseOption "lowercase" is specified, then "uppercase" cannot also be specified as part of the same FTMatchOptions.
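A check of this kind might look like the following non-normative Python sketch. Match options are represented as plain strings, and the names FTST0019, group_of and check_match_options are hypothetical. FTExtensionOption is not modelled.

```python
class FTST0019(Exception):
    """Stands in for the static error err:FTST0019."""


def group_of(option: str) -> str:
    """Map a match option (written as a string) to its match option group."""
    if option.startswith("language"):
        return "language"
    if option.endswith("wildcards"):
        return "wildcard"
    if "thesaurus" in option:
        return "thesaurus"
    if option.endswith("stemming"):
        return "stemming"
    if option.startswith("case") or option in ("lowercase", "uppercase"):
        return "case"
    if option.startswith("diacritics"):
        return "diacritics"
    if "stop words" in option:
        return "stop word"
    raise ValueError("unrecognized match option: " + option)


def check_match_options(options: list) -> dict:
    """Return group -> option, raising FTST0019 if a group occurs twice."""
    chosen = {}
    for option in options:
        group = group_of(option)
        if group in chosen:
            raise FTST0019("more than one option of group: " + group)
        chosen[group] = option
    return chosen
```

Options from different groups, such as "lowercase" and "stemming", pass the check together.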

Although match options only take effect in the application of FTWords, the syntax also allows match options to be specified on the non-primitive full-text selection "(" FTSelection ")". Such a higher-level match option provides a default for the respective match option group for any embedded FTPrimary, just as match option declarations in the Prolog provide default match options for the whole query.

Match options are propagated through the query via the static context. For each of the seven match option groups, the static context has a component that contains one option from that group. The seven settings are initialized by the implementation in accordance with the table in Appendix C Static Context Components, and are modified by any FTOptionDecls in the Prolog. The resulting settings are then propagated unchanged to every FTContainsExpr in the module (including those in VarDecls and FunctionDecls, and including any that happen to be nested within another FTContainsExpr). At any given FTContainsExpr, the settings from the static context are copied to the FTContainsExpr's inner settings, which are then propagated down the syntax tree. At each FTPrimaryWithOptions, the locally specified match options (if any) overwrite the corresponding inner setting(s). At each FTWords, the inner settings are used as the effective match options for tokenizing the query strings and matching them against the tokens in the text. (These inner settings could be seen as a parallel set of components in the static context, but Section 4 Semantics models them as structures that get passed as parameters to various semantic functions.)

Thus, when a match option appears in an FTSelection, it applies to the associated FTPrimary, but not to any FTContainsExprs that happen to be embedded within that FTPrimary. Instead, for a nested FTContainsExpr, the default match options are those declared in the Prolog or, if not declared in the Prolog, then supplied by the implementation's initial values.
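The propagation just described can be modelled, non-normatively, as successive dictionary updates. In the Python sketch below, the function effective_options is a hypothetical name, and the initial values (including the language "de") are assumptions standing in for an implementation's initial settings.

```python
# Assumed initial settings, one per match option group.
INITIAL = {
    "language": "de",
    "wildcard": "no wildcards",
    "thesaurus": "no thesaurus",
    "stemming": "no stemming",
    "case": "case insensitive",
    "diacritics": "diacritics insensitive",
    "stop word": "no stop words",
}


def effective_options(prolog: dict, *local_levels: dict) -> dict:
    """Compute the options in effect at an FTWords.

    prolog: settings changed by FTOptionDecls in the Prolog.
    local_levels: options given at each enclosing FTPrimaryWithOptions,
    ordered from outermost to innermost within ONE FTContainsExpr.
    """
    settings = dict(INITIAL)
    settings.update(prolog)        # static context after the Prolog
    for local in local_levels:     # inner settings, overwritten on the way down
        settings.update(local)
    return settings


prolog = {"stemming": "stemming"}
outer = effective_options(prolog, {"case": "case sensitive"}, {"case": "lowercase"})
# A nested FTContainsExpr starts again from the static context and does not
# inherit the options of the enclosing FTPrimary.
nested = effective_options(prolog)
```

Here outer["case"] is "lowercase" because the innermost local option wins, whereas nested["case"] falls back to "case insensitive".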

An FTMatchOption applies to the FTPrimary that immediately precedes it. That FTPrimary is either an FTWords (possibly qualified by an FTTimes), an FTExtensionSelection, or a parenthesized FTSelection.

[Definition: The order in which effective match options for an FTWords are applied is called the match option application order.] This order is significant because match options are not always commutative. For example, synonym(stem(word)) is not always the same as stem(synonym(word)).

The match option application order is subject to some constraints:

  1. The Language Option must be applied first

  2. The Stemming Option must be applied before the Case Option and the Diacritics Option

Aside from these constraints, the full order of the application of match options is implementation-defined.
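That the application order can matter is shown by the following non-normative Python toy. The suffix-stripping stem function and the one-entry synonym table are deliberately crude, hypothetical stand-ins, since both stemming and thesaurus lookup are implementation-defined.

```python
SYNONYMS = {"glasses": "spectacles"}  # hypothetical thesaurus entry


def stem(word: str) -> str:
    """Toy stemmer: strip a trailing 'es' or 's'."""
    for suffix in ("es", "s"):
        if word.endswith(suffix):
            return word[:-len(suffix)]
    return word


def synonym(word: str) -> str:
    """Toy thesaurus lookup: replace the word if it has an entry."""
    return SYNONYMS.get(word, word)


word = "glasses"
stem_then_synonym = synonym(stem(word))   # "glass": the stem has no entry
synonym_then_stem = stem(synonym(word))   # "spectacl": the synonym is stemmed
```

The two orders yield different query tokens, which is why the match option application order is significant.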

More information on their semantics is given in 4.2.5 Match Options Semantics.

If no match option declarations are present in the prolog and the implementation does not define any overwriting of the static context components for the match options, the query:

/books/book/title contains text "usability" 

is, assuming "de" is the implementation-defined default language, equivalent to the query:

/books/book/title contains text "usability" 
    using language "de"
    using no wildcards
    using no thesaurus
    using no stemming
    using case insensitive 
    using diacritics insensitive 
    using no stop words

We describe each match option group in more detail in the following sections.

3.4.1 Language Option

[177]    FTLanguageOption    ::=    "language" StringLiteral

[Definition: A language option modifies token matching by specifying the language of search tokens and phrases.]

The StringLiteral following the keyword language designates one language. It must be castable to xs:language; otherwise, an error is raised: [err:XPTY0004] (defined in [XML Path Language (XPath) 2.0 (Second Edition)]).

The "language" option influences tokenization, stemming, and stop words in an implementation-defined way. The "language" option MAY influence the behavior of other match options in an implementation-defined way.

The set of standardized language identifiers is defined in [BCP 47]. The set of valid language identifiers among the standardized set is implementation-defined. An implementation MAY choose to use private extensions introduced by a singleton 'x' for additional language identifiers, or other singletons for registered extensions as described in sec. 2.2.6 of [BCP 47]. It is implementation-defined what additional language identifiers, if any, are valid. If an invalid language identifier is specified, then the behavior is implementation-defined. If the implementation chooses to raise an error in that case, it must raise [err:FTST0009].

The default language is specified in the static context.

When an XQuery and XPath Full Text processor evaluates text in a document that is governed by an xml:lang attribute and the portion of the full-text query doing that evaluation contains an FTLanguageOption that specifies a different language from the language specified by the governing xml:lang attribute, the language-related behavior of that full-text query is implementation-defined.

This is an example where the language option is used to select the appropriate stop word list:

//book[@number="1"]/content//p contains text "salon de thé"
using stop words default using language "fr"

3.4.2 Wildcard Option

[178]    FTWildCardOption    ::=    "wildcards" | ("no" "wildcards")

[Definition: A wildcard option modifies token and phrase matching by specifying whether or not wildcards are recognized in query strings.]

When the "wildcards" option is used, wildcard syntax may be included within query strings. A wildcard consists of an indicator (a period or full stop, "."), optionally followed by a qualifier. Each wildcard in a query token will match zero or more characters within a token in the text being searched, as described below. The number of characters that can be matched depends on the qualifier. The forms of wildcard syntax specified by this document are:

  1. A single period, without any qualifiers: Matches a single arbitrary character.

  2. A period immediately followed by a single question mark, "?": Matches either no characters or one character.

  3. A period immediately followed by a single asterisk, "*": Matches zero or more characters.

  4. A period immediately followed by a single plus sign, "+": Matches one or more characters.

  5. A period immediately followed by a sequence of characters that matches the regular expression {[0-9]+,[0-9]+}: Matches a number of characters, where the number is no less than the number represented by the series of digits before the comma, and no greater than the number represented by the series of digits following the comma.

    If a period in the query string is immediately followed by a left curly brace, but the subsequent characters do not conform to the given regular expression, then an error is raised: [err:FTDY0020].

A question mark, asterisk, plus sign, or left curly brace that is not immediately preceded by a period is not treated as a qualifier. For example, using the sample tokenization and "wildcards", the query string "wil+" does not match the search text "will" or "willlllll", but only matches the search text "wil". (The sample tokenization treats the plus sign as punctuation.)

When "wildcards" is used, any character in a query string can be "escaped" by immediately preceding it with a backslash, "\". That is, a backslash immediately followed by any character represents that character literally, preventing any special interpretation that the "wildcards" option might otherwise attach to it. In particular:

  1. Escaping a period prevents its interpretation as a wildcard.

  2. Escaping a question mark, asterisk, plus sign, or left curly brace ensures that it is not interpreted as a qualifier.

  3. An escaped backslash ("\\") represents a literal backslash.

  4. If a query string is terminated by an unescaped backslash, an error is raised: [err:FTDY0020].

Note:

A query string of the form "abc\"xyz" does not represent the three characters "abc" followed by a literal double-quote followed by the three characters "xyz". Instead, this is a malformed StringLiteral, and the processor will report a syntax error [err:XPST0003] (defined in [XML Path Language (XPath) 2.0 (Second Edition)]).

When the "no wildcards" option is used, no wildcards are recognized in query strings. Periods, question marks, asterisks, plus signs, left curly braces, and backslashes are always recognized as ordinary text characters.

The default is "no wildcards".
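The wildcard and escaping rules can be illustrated by translating a single query token into a Python regular expression. This is a non-normative sketch: the names FTDY0020, wildcard_to_regex and matches are hypothetical, matching is assumed to be case-insensitive, and the interaction between tokenization and escaped punctuation is not modelled.

```python
import re


class FTDY0020(ValueError):
    """Stands in for the dynamic error err:FTDY0020."""


_RANGE = re.compile(r"\{([0-9]+),([0-9]+)\}")


def wildcard_to_regex(token: str) -> str:
    """Translate one query token with wildcard syntax into a regex pattern."""
    out = []
    i = 0
    while i < len(token):
        ch = token[i]
        if ch == "\\":
            if i + 1 >= len(token):
                raise FTDY0020("query string ends with an unescaped backslash")
            out.append(re.escape(token[i + 1]))  # escaped character is literal
            i += 2
        elif ch == ".":
            nxt = token[i + 1] if i + 1 < len(token) else ""
            if nxt in ("?", "*", "+"):
                out.append("." + nxt)
                i += 2
            elif nxt == "{":
                m = _RANGE.match(token, i + 1)
                if m is None:
                    raise FTDY0020("malformed range qualifier")
                low, high = int(m.group(1)), int(m.group(2))
                # An empty range (low > high) can never match.
                out.append("(?!)" if low > high else ".{%d,%d}" % (low, high))
                i = m.end()
            else:
                out.append(".")  # exactly one arbitrary character
                i += 1
        else:
            out.append(re.escape(ch))  # qualifiers without a period are literal
            i += 1
    return "".join(out)


def matches(query_token: str, text_token: str) -> bool:
    pattern = wildcard_to_regex(query_token)
    flags = re.IGNORECASE | re.DOTALL
    return re.fullmatch(pattern, text_token, flags) is not None
```

For instance, matches("w.ll", "well") and matches("improv.*", "Improving") hold, while wildcard_to_regex("wi.{5,7]") raises the error because the range qualifier is malformed.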

The following expression returns true, because the p element contains "well":

//book[@number="1"]//p contains text "w.ll" using wildcards

The following expression returns true, because the title element contains "site":

//book[@number="1"]/title contains text ".?site" using wildcards

The following expression returns true, because the title element contains "improving":

//book[@number="1"]/title contains text "improv.*" using wildcards

The following expression raises error [err:FTDY0020], because the query string uses incorrect syntax:

//book[@number="1"]//p contains text "wi.{5,7]" using wildcards

The following expression returns true, because the title contains "site":

//book[@number="1"]/title contains text "\s\i\t\e" using wildcards

The following expression returns true, because the title contains "Usability":

//book[@number="1"]/title contains text "Usab.+\\" using wildcards

(Note that "\\" represents a literal backslash, which the sample tokenization treats as punctuation.)

The following expression raises error [err:FTDY0020], because the query string ends with an unescaped backslash:

//book[@number="1"]//p contains text "will\" using wildcards

The following expression returns false, because the p element does not contain the phrase "w ll":

//book[@number="1"]//p contains text "w.ll" using no wildcards

(Note that, without wildcards, the sample tokenization will treat the period in "w.ll" as punctuation, thus producing "w" and "ll" as separate tokens.)

3.4.3 Thesaurus Option

[171]    FTThesaurusOption    ::=    ("thesaurus" (FTThesaurusID | "default"))
| ("thesaurus" "(" (FTThesaurusID | "default") ("," FTThesaurusID)* ")")
| ("no" "thesaurus")
[172]    FTThesaurusID    ::=    "at" URILiteral ("relationship" StringLiteral)? (FTLiteralRange "levels")?
[143]    URILiteral    ::=    StringLiteral
[173]    FTLiteralRange    ::=    ("exactly" IntegerLiteral)
| ("at" "least" IntegerLiteral)
| ("at" "most" IntegerLiteral)
| ("from" IntegerLiteral "to" IntegerLiteral)

[Definition: A thesaurus option modifies token and phrase matching by specifying whether a thesaurus is used or not.] If thesauri are used, the thesaurus option specifies information to locate the thesauri either by default or through a URI reference. It also states the relationship to be applied and how many levels within the thesaurus to be traversed.

If the thesaurus option specifies a thesaurus with a relative URI, that relative URI is resolved to an absolute URI using the base URI in the static context and that absolute URI is used to identify the thesaurus.

If the URI specifies a thesaurus that is not found in the statically known thesauri, an error is raised [err:FTST0018].

Thesauri add related tokens and phrases to the query or change query tokens. Thus, the user may narrow, broaden, or otherwise modify the query using synonyms, hypernyms (more generic terms), etc. The search is performed as though the user has specified all related query tokens and phrases in a disjunction (FTOr).

Note:

A thesaurus may be standards-based or locally-defined. It may be a traditional thesaurus, or a taxonomy, soundex, ontology, or topic map. How the thesaurus is represented is implementation-dependent.

An FTThesaurusID may optionally contain a StringLiteral to specify the relationship sought between tokens and phrases written in the query and terms in the thesaurus. Relationships include, but are not limited to, the relationships and their abbreviations presented in [ISO 2788] and their equivalents in other languages. The set of relationships supported by an implementation is implementation-defined, but implementations SHOULD support the relationships defined in [ISO 2788]. The terms in the following list have the meanings defined in [ISO 2788]. If a query specifies thesaurus relationships not supported by the thesaurus, or does not specify a relationship, the behavior is implementation-defined.

  1. equivalence relationships (synonyms): PREFERRED TERM (USE), NONPREFERRED USED FOR TERM (UF);

  2. hierarchical relationships: BROADER TERM (BT), NARROWER TERM (NT), BROADER TERM GENERIC (BTG), NARROWER TERM GENERIC (NTG), BROADER TERM PARTITIVE (BTP), NARROWER TERM PARTITIVE (NTP), TOP Terms (TT); and

  3. associative relationships: RELATED TERM (RT).

An FTThesaurusID may also optionally include an FTLiteralRange to specify the number of levels to be queried in hierarchical relationships. An FTLiteralRange is a constrained form of FTRange, and specifies a (possibly empty) range of integer values according to the same rules.

Note:

For historical reasons, an implementation MAY allow an FTLiteralRange to have subexpressions more general than IntegerLiterals, and MAY even allow its subexpressions to be dynamically evaluated.

The effect of specifying a particular range of levels in an FTThesaurusID is implementation-defined. This includes cases involving empty ranges, negative levels, or levels not supported by the thesaurus.

If no levels are specified, the default is to query all levels in hierarchical relationships or to query an implementation-defined number of levels in hierarchical relationships.

The "thesaurus" option specifies that string matches include tokens that can be found in one of the specified thesauri. When "default" is used in place of an FTThesaurusID, the thesauri specified in the static context are used. These are either given by the prolog declaration for the thesaurus option or, if no such declaration exists, consist of a system-defined default thesaurus with a system-defined relationship. The default thesaurus may be used in combination with other explicitly specified thesauri.

The "no thesaurus" option specifies that no thesaurus will be used.

The default is "no thesaurus".
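How a thesaurus lookup turns one query token into a disjunction of tokens can be sketched, non-normatively, in Python. The in-memory table THESAURUS, its entries, and the function names are hypothetical. Treating "at most N levels" as N lookup steps is an assumption, since the effect of levels is implementation-defined.

```python
# Hypothetical thesaurus: (term, relationship) -> directly related terms.
THESAURUS = {
    ("duty", "UF"): ["task"],
    ("people", "NT"): ["visitors"],
    ("visitors", "NT"): ["users"],
}


def expand(term: str, relationship: str, max_levels: int = 1) -> set:
    """Collect the term plus all terms reachable within max_levels steps."""
    found = {term}
    frontier = [term]
    for _ in range(max_levels):
        next_frontier = []
        for current in frontier:
            for related in THESAURUS.get((current, relationship), []):
                if related not in found:
                    found.add(related)
                    next_frontier.append(related)
        frontier = next_frontier
    return found


def contains_with_thesaurus(text_tokens, term, relationship, max_levels=1) -> bool:
    """The search behaves as a disjunction (FTOr) over the expanded terms."""
    text = {t.lower() for t in text_tokens}
    return any(t in text for t in expand(term, relationship, max_levels))
```

With this table, "duty" expands to {"duty", "task"} under "UF", and "people" reaches "users" only when two levels are allowed.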

The following expression returns true, because it finds a content element containing "task" which the thesaurus identified as a synonym for "duty":

.//book/content contains text "duty" using
thesaurus at "http://bstore1.example.com/UsabilityThesaurus.xml"
relationship "UF"

The following expression returns a book element, because it finds a content element containing "users", which is a narrower term of "people":

doc("http://bstore1.example.com/full-text.xml")
/books/book[./content contains text "people" using
thesaurus at "http://bstore1.example.com/UsabilityThesaurus.xml"
relationship "NT" at most 2 levels]

Assuming the thesaurus available at URL "http://bstore1.example.com/UsabilitySoundex.xml" contains soundex capabilities, the following query returns a book element containing "Marigold" which sounds like "Merrygould":

doc("http://bstore1.example.com/full-text.xml")
/books/book[. contains text "Merrygould" using thesaurus at
"http://bstore1.example.com/UsabilitySoundex.xml" relationship
"sounds like"]

3.4.4 Stemming Option

[170]    FTStemOption    ::=    "stemming" | ("no" "stemming")

[Definition: A stemming option modifies token and phrase matching by specifying whether stemming is applied or not. ]

The "stemming" option specifies that matches may contain tokens that have the same stem as the tokens and phrases written in the query. It is implementation-defined what a stem of a token is.

The "no stemming" option specifies that the tokens and phrases are not stemmed.

It is implementation-defined whether the stemming is based on an algorithm, dictionary, or mixed approach.

The default is "no stemming".

The following expression returns true, because the title of the specified book contains "improving" which has the same stem as "improve":

/books/book[@number="1"]/title contains text "improve" using stemming 

3.4.5 Case Option

[168]    FTCaseOption    ::=    ("case" "insensitive")
| ("case" "sensitive")
| "lowercase"
| "uppercase"

[Definition: A case option modifies the matching of tokens and phrases by specifying how uppercase and lowercase characters are considered.]

There are four possible character case options:

  1. Using the option "case insensitive", tokens and phrases are matched, regardless of the case of characters of the query tokens and phrases.

  2. Using the option "case sensitive", tokens and phrases are matched, if and only if the case of their characters is the same as written in the query.

  3. Using the option "lowercase", tokens and phrases are matched, if and only if they match the query without regard to character case, but contain only lowercase characters.

  4. Using the option "uppercase", tokens and phrases are matched, if and only if they match the query without regard to character case, but contain only uppercase characters.

The default is "case insensitive".

The effect of the case options is also influenced by the query's default collation (see Section 2.1.1 Static Context and Section 4.4 Default Collation Declaration of [XQuery 1.0: An XML Query Language (Second Edition)]). The following table summarizes how these interact.

Case Matrix
Rows: case option. Columns: default collation, where UCC = Unicode Codepoint Collation, CCS = some generic case-sensitive collation, CCI = some generic case-insensitive collation.

Case option | UCC | CCS | CCI
case insensitive | compare as if both lower | case-insensitive variant of CCS if it exists, else error | CCI
case sensitive | UCC | CCS | case-sensitive variant of CCI if it exists, else error
lowercase | compare using UCC after applying fn:lower-case() to the query string | compare using CCS after applying fn:lower-case() to the query string | CCI
uppercase | compare using UCC after applying fn:upper-case() to the query string | compare using CCS after applying fn:upper-case() to the query string | CCI

Note:

In this table, "else error" means "Otherwise, an error is raised: [err:FOCH0002]FO". The phrase "if it exists" is used, because the case-sensitive collation CCS does not always have a case-insensitive variant (and, even if one exists, it may not be possible to determine it algorithmically), and because the case-insensitive collation CCI does not always have a case-sensitive variant (and, even if one exists, it may not be possible to determine it algorithmically).

The following expression returns false, because the title element doesn't contain "usability" in lower-case characters:

//book[@number="1"]/title contains text "Usability" using lowercase 

The following expression returns true, because the character case is not considered:

//book[@number="1"]/title contains text "usability" using case insensitive
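
The four case options can be sketched for a single token pair as follows. This is a minimal Python model of the UCC column of the case matrix only. The function name case_match is hypothetical, and Python's str.lower() and str.upper() stand in for fn:lower-case() and fn:upper-case() as an approximation.

```python
def case_match(query_token: str, text_token: str,
               option: str = "case insensitive") -> bool:
    """Model the case options under a codepoint-style comparison.

    ASSUMPTION: plain string equality stands in for the Unicode Codepoint
    Collation; other default collations are not modeled here.
    """
    if option == "case insensitive":
        # Compare as if both were lower case.
        return query_token.lower() == text_token.lower()
    if option == "case sensitive":
        return query_token == text_token
    if option == "lowercase":
        # Lower-case the query string, then compare exactly.
        return query_token.lower() == text_token
    if option == "uppercase":
        # Upper-case the query string, then compare exactly.
        return query_token.upper() == text_token
    raise ValueError(f"unknown case option: {option}")
```

With a text token "Usability", the query "Usability" using lowercase does not match, whereas "usability" using case insensitive does.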

3.4.6 Diacritics Option

[169]    FTDiacriticsOption    ::=    ("diacritics" "insensitive")
| ("diacritics" "sensitive")

[Definition: A diacritics option modifies token and phrase matching by specifying how diacritics are considered. ]

There are two possible diacritics options:

  1. The option "diacritics" "insensitive" matches tokens and phrases with and without diacritics. Whether diacritics are written in the query or not is not considered.

  2. The option "diacritics" "sensitive" matches tokens and phrases only if they contain the diacritics as they are written in the query.

The default is "diacritics insensitive".

The effect of the diacritics options is also influenced by the query's default collation (see Section 2.1.1 Static ContextXQ and Section 4.4 Default Collation DeclarationXQ). The following table summarizes how these interact.

Diacritics Matrix
Rows: diacritics option. Columns: default collation, where UCC = Unicode Codepoint Collation, CDS = some generic diacritics-sensitive collation, CDI = some generic diacritics-insensitive collation.

Diacritics option | UCC | CDS | CDI
diacritics insensitive | UCC comparison, but without considering diacritics | diacritics-insensitive variant of CDS if it exists, else error | CDI
diacritics sensitive | UCC | CDS | diacritics-sensitive variant of CDI if it exists, else error

Note:

In this table, "else error" means "Otherwise, an error is raised: [err:FOCH0002]FO". The phrase "if it exists" is used, because the diacritics-sensitive collation CDS does not always have a diacritics-insensitive variant (and, even if one exists, it may not be possible to determine it algorithmically), and because the diacritics-insensitive collation CDI does not always have a diacritics-sensitive variant (and, even if one exists, it may not be possible to determine it algorithmically).

The following expression returns true, because the token "Véra" in the editor element is matched, as the acute accent is not considered in the comparison:

//book[@number="1"]//editor contains text "Vera" using diacritics insensitive

This returns false, because the editor element does not contain the token "Vera" in this exact form, i.e. without any diacritics:

//book[@number="1"]/editors contains text "Vera" using diacritics sensitive
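
One possible way to model the two diacritics options is sketched below in Python. The approach of decomposing to NFD and dropping combining marks is an assumption chosen for illustration, not something this specification prescribes. The names strip_diacritics and diacritics_match are hypothetical, and case handling is deliberately left out.

```python
import unicodedata


def strip_diacritics(token: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def diacritics_match(query_token: str, text_token: str,
                     option: str = "diacritics insensitive") -> bool:
    """Compare two tokens with or without regard to diacritics.

    ASSUMPTION: exact string equality models the sensitive case; the
    interaction with the default collation is not modeled.
    """
    if option == "diacritics insensitive":
        return strip_diacritics(query_token) == strip_diacritics(text_token)
    if option == "diacritics sensitive":
        return query_token == text_token
    raise ValueError(f"unknown diacritics option: {option}")
```

In this model the query token "Vera" matches the text token "Véra" only under "diacritics insensitive".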

3.4.7 Stop Word Option

[174]    FTStopWordOption    ::=    ("stop" "words" FTStopWords FTStopWordsInclExcl*)
| ("stop" "words" "default" FTStopWordsInclExcl*)
| ("no" "stop" "words")
[175]    FTStopWords    ::=    ("at" URILiteral)
| ("(" StringLiteral ("," StringLiteral)* ")")
[176]    FTStopWordsInclExcl    ::=    ("union" | "except") FTStopWords

[Definition: A stop word option controls matching of tokens by specifying whether stop words are used or not. Stop words are tokens in the query that match any token in the text being searched. ] More precisely, a stop word option defines a collection of stop words according to the rules below. Then, in every FTWords to which the stop word option applies, each query token is checked: if it appears (using an implementation-defined comparison) in the specified collection of stop words, it is considered a stop word.

Normally a stop word matches exactly one token, but there may be implementation-defined conditions under which a stop word matches a different number of tokens.

Tokens matched by stop words retain their position numbers and are counted by FTDistance and FTWindow filters.

FTStopWords specifies the list of stop words either explicitly as a comma-separated list of string literals, or by the keyword at followed by a literal URI. If the URI specifies a list of stop words that is not found in the statically known stop word lists, an error is raised [err:FTST0008]. Whether the stop word list is resolved from the statically known stop word lists or given explicitly, no tokenization is performed on the stop words: they are used as they occur in the list.

If the stop words option specifies a stop word list with a relative URI, that relative URI is resolved to an absolute URI using the base URI in the static context and that absolute URI is used to identify the stop word list.

Multiple stop word lists may be combined using "union" or "except". The keywords "union" and "except" are applied from left to right. If "union" is specified, every string occurring in the lists specified by the left-hand side or the right-hand side is a stop word. If "except" is specified, only strings occurring in the list specified by the left-hand side but not in the list specified by the right-hand side are stop words.
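
The left-to-right combination of stop word lists can be sketched with Python sets. The function name combine_stop_words and the representation of each FTStopWordsInclExcl as an (operator, words) pair are assumptions made for this illustration.

```python
def combine_stop_words(first, *operations):
    """Apply "union" and "except" from left to right.

    first       -- iterable of strings, the initial stop word list
    operations  -- (operator, words) pairs, in the order written in the query
    """
    result = set(first)
    for operator, words in operations:
        if operator == "union":
            result |= set(words)   # strings in either side are stop words
        elif operator == "except":
            result -= set(words)   # strings on the right are removed
        else:
            raise ValueError(f"unknown operator: {operator}")
    return result
```

For instance, combining the hypothetical list ("a", "the", "then") with except ("the", "then") leaves only "a" as a stop word.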

The "stop words default" option specifies that an implementation-defined collection of stop words is used.

The "no stop words" option specifies that no stop words are used. This is equivalent to specifying an empty list of stop words.

The default is "no stop words".

Note:

Some implementations may apply stop word lists during indexing and be unable to comply with query-time requests to not apply those stop words. An implementation may still support stop-word options (and therefore not raise [err:FTST0006]) by applying any additional stop words specified in the query. Pre-application of irrevocable stop word lists falls under implementation-defined tokenization behavior in this case, and a query that specifies "no stop words" may still have some words ignored.

The following expression returns true, because the document contains the phrase "propagating few errors":

/books/book[@number="1"]//p contains text "propagating of errors"
using stop words ("a", "the", "of") 

Note the asymmetry in the stop word semantics: the property of being a stop word is only relevant to query terms, not to document terms. Hence, it is irrelevant for the above-mentioned match whether "few" is a stop word or not, and on the other hand we do not want the query above to match "propagating" followed by 2 stop words, or even a sequence of 3 stop words in the document.

The following expression returns false. In this case specifying "few" as a stop word has no effect, since "few" does not appear in the query. Although the words "propagating" and "errors" appear in the text being searched, the phrase "propagating errors" cannot be matched, since that phrase does not occur.

/books/book[@number="1"]//p contains text "propagating errors" 
using stop words ("few")

The following expression returns false, because "of" is not in the p element between "propagating" and "errors":

/books/book[@number="1"]//p contains text "propagating of errors" 
using no stop words
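
The three examples above can be reproduced with a small Python sketch of phrase matching in which a query token that is a stop word matches any single text token. The function name phrase_matches is hypothetical. The sketch assumes that each stop word matches exactly one token and that tokens are compared by simple string equality.

```python
def phrase_matches(query_tokens, text_tokens, stop_words=()):
    """Return True if the query phrase occurs in the text.

    Only query tokens are checked against the stop word collection;
    whether a text token is a stop word is irrelevant.
    """
    stops = set(stop_words)
    length = len(query_tokens)
    for start in range(len(text_tokens) - length + 1):
        candidate = text_tokens[start:start + length]
        if all(q in stops or q == t for q, t in zip(query_tokens, candidate)):
            return True
    return False


text = "propagating few errors".split()
print(phrase_matches("propagating of errors".split(), text, {"a", "the", "of"}))
print(phrase_matches("propagating errors".split(), text, {"few"}))
print(phrase_matches("propagating of errors".split(), text))
```

The calls print True, False, and False respectively, mirroring the three expressions above.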

The following expression uses the stop word list specified at the URL. Assuming that the specified stop word list contains the word "then", this query is reduced to a query on the phrase "planning X conducting", allowing any token as a substitute for X. It returns a book element, because its content element contains "planning then conducting". It would also return the book if the phrase "planning and conducting" or "planning before conducting" had been in its content:

doc("http://bstore1.example.com/full-text.xml")
/books/book[.//content contains text "planning then 
conducting" using stop words at 
"http://bstore1.example.com/StopWordList.xml"]

The following expression returns books containing "planning then conducting", but does not return books containing "planning and conducting", since it exempts "then" from being a stop word:

doc("http://bstore1.example.com/full-text.xml")
/books/book[.//content contains text "planning then conducting"
using stop words at "http://bstore1.example.com/StopWordList.xml"
except ("the", "then")]

3.4.8 Extension Option

[Definition: An extension option is a match option that acts in an implementation-defined way. ]

[179]    FTExtensionOption    ::=    "option" QName StringLiteral

An extension option consists of an identifying QName and a StringLiteral. Typically, a particular option will be recognized by some implementations and not by others. The syntax is designed so that option declarations can be successfully parsed by all implementations.

The QName of an extension option must resolve to a namespace URI and local name, using the statically known namespaces.

Note:

There is no default namespace for options.

Each implementation recognizes an implementation-defined set of namespace URIs used to denote extension options.

If the namespace part of the QName is not a namespace recognized by the implementation as one used to denote extension options, then the extension option is ignored.

Otherwise, the effect of the extension option, including its error behavior, is implementation-defined. For example, if the local part of the QName is not recognized, or if the StringLiteral does not conform to the rules defined by the implementation for the particular extension option, the implementation may choose whether to report an error, ignore the extension option, or take some other action.

Implementations may impose rules on where particular extension options may appear relative to other match options, and the interpretation of an option declaration may depend on its position.

An extension option must not be used to change the syntax accepted by the processor, or to suppress the detection of static errors. However, it may be used without restriction to modify the set of tokens in the query or how they are matched against tokens in the text being searched. An extension option has the same scope as other match options.

The following examples illustrate several possible uses for extension options:

This extension option is set as part of the static context of all full-text expressions in the module and might be used to ensure that queries are insensitive to Arabic short-vowels.

declare namespace exq = "http://example.org/XQueryImplementation";

declare ft-option using option exq:diacritics "short-vowel insensitive";

This extension option applies only to the matching in the full-text selection in which it is found and might be used to specify how compound words should be matched.

declare namespace exq = "http://example.org/XQueryImplementation";

//para[. contains text
         ("Kinder" ftand "Platz" distance exactly 1 words)
         using stemming
         using option exq:compounds "distance=1" ]

3.5 Logical Full-Text Operators

Full-text selections can be combined with the logical connectives ftor (full-text or), ftand (full-text and), not in (mild not), and ftnot (unary full-text not).

[146]    FTOr    ::=    FTAnd ( "ftor" FTAnd )*
[147]    FTAnd    ::=    FTMildNot ( "ftand" FTMildNot )*
[148]    FTMildNot    ::=    FTUnaryNot ( "not" "in" FTUnaryNot )*
[149]    FTUnaryNot    ::=    ("ftnot")? FTPrimaryWithOptions

3.5.1 Or-Selection

[Definition: An or-selection combines two full-text selections using the ftor operator.]

An or-selection finds all matches that satisfy at least one of the operand full-text selections.

The following expression returns the book element written by "Millicent":

//book[.//author contains text "Millicent" ftor "Voltaire"]

3.5.2 And-Selection

[Definition: An and-selection combines two full-text selections using the ftand operator.]

An and-selection finds matches that satisfy all of the operand full-text selections simultaneously. A match of an and-selection is formed by combining matches for each of the operand full-text selections as described in 4.2.6.2 FTAnd.

For example, "usability" ftand "testing" will find two matches in //book[@number="1"]/title: each of the two matches for the FTWords selection "usability" (the two occurrences of "usability" in the string value of the title element) is combined with the single match for the FTWords "testing" (only one occurrence of "testing" in the title). Since the above and-selection has at least one match, the following expression will return "true".

//book[@number="1"]/title contains text ("usability" ftand "testing")

The following expression returns false, because "Millicent" and "Montana" are not contained by the same author element in any book element:

//book/author contains text "Millicent" ftand "Montana"

No author element in any book element contains both "Millicent" and "Montana". Therefore, for any author element, there is either one match for the FTWords "Millicent" and zero matches for the FTWords "Montana", or vice versa, or no match for either of them. In any of these cases, the and-selection will have zero matches.

3.5.3 Mild-Not Selection

[Definition: A mild-not selection combines two full-text selections using the not in operator.]

The not in operator is a milder form of the operator combination ftand ftnot. The selection A not in B matches a token sequence that matches A, but not when it is a part of a match of B. In contrast, A ftand ftnot B only finds matches when the token sequence contains A and does not contain B.

As an example, consider a search for "Mexico" not in "New Mexico". This may return, among others, a document which is all about "Mexico" but mentions at the end that "New Mexico was named after Mexico". The occurrence of "Mexico" in "New Mexico" is not considered, but other occurrences of "Mexico" are matched. Note that this document would not be matched by the full-text selection "Mexico" ftand ftnot "New Mexico".

A match to a mild-not selection must contain at least one token that satisfies the first condition and does not satisfy the second condition. If it contains a token that satisfies both the first and the second condition, the token is not considered as a match.
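
The "Mexico" example can be traced with the following Python sketch. It is a simplified reading, assuming that a match of the first operand is dropped whenever it shares a token position with some match of the second operand. The helper names find_phrase and mild_not are hypothetical, and positions are zero-based list indexes rather than the token positions used elsewhere in this document.

```python
def find_phrase(phrase, tokens):
    """Return (start, end) index pairs for each occurrence of the phrase."""
    length = len(phrase)
    return [(i, i + length - 1)
            for i in range(len(tokens) - length + 1)
            if tokens[i:i + length] == phrase]


def mild_not(a_matches, b_matches):
    """Keep matches of A that do not overlap any match of B."""
    def overlaps(a, b):
        return a[0] <= b[1] and b[0] <= a[1]
    return [a for a in a_matches
            if not any(overlaps(a, b) for b in b_matches)]


tokens = "new mexico was named after mexico".split()
mexico = find_phrase(["mexico"], tokens)             # two occurrences
new_mexico = find_phrase(["new", "mexico"], tokens)  # one occurrence
print(mild_not(mexico, new_mexico))
```

Only the final occurrence of "mexico" survives, so the mild-not selection still has a match for this text.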

The following expression returns true, because "usability" appears in the title and the p elements and the token within the phrase "Usability Testing" in the title element is not considered:

/books/book contains text "usability" not in "usability testing"

If either operand of a mild-not selection yields an AllMatches that contains a Match that contains a StringExclude, then a dynamic error [err:FTDY0017] is raised.

Note:

This situation can arise if the operand contains a not-selection or a cardinality constraint (FTTimes) involving exactly, at most, or from ... to.

3.5.4 Not-Selection

[Definition: A not-selection is a full-text selection starting with the prefix operator ftnot.]

A not-selection selects matches that do not satisfy the operand full-text selection. Details about how such matches are constructed are given in 4.2.6.3 FTUnaryNot.

The following expression returns the empty sequence, because all book elements contain "usability":

//book[. contains text ftnot "usability"]

The following expression returns true, because book elements contain "improving" and "usability" but not "improving usability":

//book contains text "improving" ftand
"usability" ftand ftnot "improving usability"

The following expression returns book elements containing "web site usability" but not "usability testing":

//book[title/@shortTitle contains text "web site usability" ftand 
ftnot "usability testing"]

3.6 Positional Filters

[158]    FTPosFilter    ::=    FTOrder | FTWindow | FTDistance | FTScope | FTContent

[Definition: Positional filters are postfix operators that serve to filter matches based on various constraints on their positional information.]

Recall that the grammar rule for FTSelection allows an arbitrary number of positional filters to follow an FTOr. In a group of multiple adjacent positional filters, FTOrder filters are applied first, and then the other positional filters are applied from left to right, skipping the FTOrder filters. That is, the first filter is applied to the result of the FTOr, the second is applied to the result of that first application, and so on.

An FTOr consists of one or more FTAnds (separated by ftor), each of which could be an FTPosFilter applied to an embedded FTOr, enclosed in parentheses.

3.6.1 Ordered Selection

[159]    FTOrder    ::=    "ordered"

[Definition: An ordered selection consists of a full-text selection followed by the postfix operator "ordered".] An ordered selection constrains the order of tokens and phrases to be the same as the order in which they are written in the operand selection.

The default is unordered. Unordered is in effect when ordered is not specified in the query. Unordered cannot be written explicitly in the query.

An ordered selection selects matches which satisfy the operand full-text selection and which also satisfy the following constraint: the order that the matching tokens or phrases have in the text being searched is the same order that the corresponding query tokens or phrases have in the operand selection. In both cases, the ordering is determined from the minimum start positions of the constituent tokens.
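
The ordering constraint can be sketched in Python as a check on minimum start positions. The function name is_ordered and the input representation (one minimum start position per query token or phrase, listed in query order) are assumptions for illustration, and the positions in the usage note are hypothetical.

```python
def is_ordered(min_starts_in_query_order):
    """Return True if the text order agrees with the query order.

    Each entry is the minimum start position, in the text being searched,
    of the tokens matched for the corresponding query token or phrase.
    """
    starts = list(min_starts_in_query_order)
    return all(earlier <= later
               for earlier, later in zip(starts, starts[1:]))
```

If "web site" were matched starting at position 6 and "usability" at position 12, is_ordered([6, 12]) would hold; with the positions reversed it would not.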

The following expression returns true, because titles of book elements contain "web site" and "usability" in the order in which they are written in the query, i.e., "web site" must precede "usability":

//book/title contains text ("web site" ftand "usability") ordered

The following expression returns false, because although "Montana" and "Millicent" both appear in the book element, they do not appear in the order they are written in the query:

//book[@number="1"] contains text ("Montana" ftand "Millicent") ordered

3.6.2 Window Selection

[160]    FTWindow    ::=    "window" AdditiveExpr FTUnit
[162]    FTUnit    ::=    "words" | "sentences" | "paragraphs"

[Definition: A window selection consists of a full-text selection followed by one of the (complex) postfix operators derived from FTWindow.] A window selection selects matches which satisfy the operand full-text selection and for which the matched tokens and phrases, more precisely the individual StringIncludes of that match, are found within a number of FTUnits (words, sentences, and paragraphs). The number of FTUnits is specified by an AdditiveExpr that is converted as though it were an argument to a function with the expected type of xs:integer.

A window selection may cross element boundaries. The size of the window is not affected by the presence or absence of element boundaries. Stop words are included in the computation of the window size whether they are ignored by the query or not.

A window selection examines the matches generated by the preceding portion of the FTSelection, and selects those for which the matched tokens and phrases (more precisely, the individual StringIncludes of that match) are all found within a window whose size is a specified number of FTUnits (words, sentences, or paragraphs); for each such window, the window selection then generates a match containing the merge of those StringIncludes, plus any StringExcludes that fall within the window.
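
For the unit words, the window test amounts to comparing the span of the matched positions with the window size. The Python sketch below assumes that each StringInclude is represented as a (start, end) pair of token positions. The function name within_window is hypothetical, and the sentences and paragraphs units are not modeled.

```python
def within_window(string_includes, size):
    """Return True if all (start, end) spans fit in a window of `size` tokens."""
    first = min(start for start, _ in string_includes)
    last = max(end for _, end in string_includes)
    # Every token from first to last counts, including stop words.
    return last - first + 1 <= size
```

For example, a token at position 2 and a two-token phrase at positions 5 to 6 span five tokens, so they fit in a window of 5 words but not in a window of 3 words.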

The following expression returns true, because "web", "site", and "usability" are within a window of 5 tokens in the title element:

/books/book/title contains text "web" ftand "site"
ftand "usability" window 5 words

The following expression returns true, because "web" and "site" in the order they are written in the query and either "usability" or "testing" are within a window of at most 10 tokens:

/books/book contains text ("web" ftand "site" ordered)
ftand ("usability" ftor "testing") window 10 words

The following expression returns false, because the instances of "web site" and "usability" in the title element are not within a window of 3. The phrase "Web Site Usability" in the attribute does not apply because the attribute is not part of the string value of the node. A similar query with a window of 5 would return true.

/books/book//title contains text "web site" ftand
"usability" window 3 words

The following expression returns the sample book element, because its number attribute is 1 and it contains a window of 2 words which contains an occurrence of "efficient" but not an occurrence of "and". There is just one such matching window in the sample text and it contains "enable efficient".

/books/book[@number="1" and . contains text "efficient" 
ftand ftnot "and" window 2 words]

The following expression returns the empty sequence, because in the selected book element, there is no occurrence of "efficient" within a window of 3 tokens which would not also contain an occurrence of "and":

/books/book[@number="1" and . contains text "efficient" 
ftand ftnot "and" window 3 words]

In order to allow meaningful results for nested positional filters, e.g., a window selection embedded inside a distance selection, the resulting matches for window selections are formed from the input matches that satisfy the window constraint as follows. All StringIncludes of such a match are coerced into a single StringInclude that spans all token positions from the smallest to the largest position of any input StringIncludes. This is explained in more detail in Section 3.6.3 Distance Selection.

3.6.3 Distance Selection

[161]    FTDistance    ::=    "distance" FTRange FTUnit
[157]    FTRange    ::=    ("exactly" AdditiveExpr)
| ("at" "least" AdditiveExpr)
| ("at" "most" AdditiveExpr)
| ("from" AdditiveExpr "to" AdditiveExpr)

[Definition: A distance selection consists of a full-text selection followed by one of the (complex) postfix operators derived from FTDistance.]

A distance selection selects matches which satisfy the operand full-text selection and for which the matched tokens and phrases satisfy the specified distance conditions.

Distances in the search context are measured in units of tokens, sentences, or paragraphs. Roughly speaking, the distance between two matches is the number of intervening units, so a distance of zero tokens (sentences, paragraphs) means no intervening tokens (sentences, paragraphs). More precisely, given two matches, we first determine their order by sorting on starting position and if necessary on ending position. Let M1 be the "earlier" and M2 be the "later". (If there are overlapping tokens involved, the designations "earlier" and "later" may not be intuitively obvious.) Then the distance between the two is M2's starting position minus M1's ending position, minus 1.

When computing distances in the search context, a distance selection may cross element boundaries; they affect the distance computed only to the extent that they affect the tokenization of the search context. Stop words are counted in those computations whether they are ignored or not.

When a distance selection applies a distance condition to more than two matches, the distance condition is required to hold on each successive pair of matches.

An FTDistance expresses a distance condition in terms of an FTUnit and an FTRange. An FTUnit can be words, sentences, or paragraphs, where words refers to a distance measured in tokens.

An FTRange specifies a range of integer values by providing a minimum and/or maximum value for some integer quantity. (Here, where the FTRange appears in an FTDistance, that quantity is a distance. When it appears in an FTTimes, the quantity is a number of occurrences.) Each one of the AdditiveExpr specified in an FTRange is converted as though it were an argument to a function with the expected parameter type of xs:integer.

Let the value of the first (or only) operand be M. If "from" is specified, let the value of the second operand be N.

If "exactly" is specified, then the range is the closed interval [M, M]. If "at least" is specified, then the range is the half-closed interval [M, unbounded). If "at most" is specified, then the range is the half-closed interval (unbounded, M]. If "from-to" is specified, then the range is the closed interval [M, N]. Note: If M is greater than N, the range is empty.

Here are some examples of FTRanges:

  1. 'exactly 0' specifies the range [0, 0].

  2. 'at least 1' specifies the range [1, unbounded).

  3. 'at most 1' specifies the range (unbounded, 1].

  4. 'from 5 to 10' specifies the range [5, 10].
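
The interval rules for FTRange can be sketched in Python as follows. The function names ft_range and in_range are hypothetical, and math.inf is used here as a stand-in for "unbounded".

```python
import math


def ft_range(kind, m, n=None):
    """Return the (low, high) bounds of an FTRange as a closed interval."""
    if kind == "exactly":
        return (m, m)
    if kind == "at least":
        return (m, math.inf)
    if kind == "at most":
        return (-math.inf, m)
    if kind == "from":
        if n is None:
            raise ValueError('"from" requires a second operand')
        return (m, n)
    raise ValueError(f"unknown range kind: {kind}")


def in_range(value, bounds):
    """Return True if value lies within the bounds (empty if low > high)."""
    low, high = bounds
    return low <= value <= high
```

With these definitions, in_range(7, ft_range("from", 5, 10)) holds, while no value lies in ft_range("from", 10, 5) because that range is empty.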

The following expression returns false, because "completion" and "errors" are less than 11 tokens apart:

/books/book contains text ("completion" ftand "errors" 
distance at least 11 words)

The following expression returns true:

/books/book contains text "web" ftand "site" ftand
"usability" distance at most 2 words

The search context contains two occurrences of the phrase "the usability of a web site" (once in the <title> and once in the <content>). In this phrase, the tokens "usability" and "web" have a distance of 2 words, and the tokens "web" and "site" have a distance of 0 words, both of which satisfy the constraint distance at most 2 words. (The tokens "usability" and "site" have a distance of 3 words, but this does not cause the distance filter to fail, because these are not successive matches.) Thus, the full-text selection yields two matches, and the whole expression yields true. (The phrase "Improving Web Site Usability" would also satisfy the given full-text selection, but in the sample document it occurs in an attribute value, and so does not contribute to the string value or the tokenization of the book element.)
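
The distance arithmetic in this explanation can be checked with a short Python sketch. Matches are represented as (start, end) pairs of token positions; the function names are hypothetical, and the positions used in the usage note assume that the phrase "the usability of a web site" is numbered 1 to 6 from its first token.

```python
def distance(m1, m2):
    """Number of intervening tokens between two (start, end) matches."""
    # Sorting tuples orders by start position, then by end position.
    earlier, later = sorted([m1, m2])
    return later[0] - earlier[1] - 1


def satisfies_distance(matches, low=None, high=None):
    """Check the distance condition on each successive pair of matches."""
    ordered = sorted(matches)
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later[0] - earlier[1] - 1
        if low is not None and gap < low:
            return False
        if high is not None and gap > high:
            return False
    return True
```

With "usability" at position 2, "web" at 5, and "site" at 6, the successive distances are 2 and 0, so satisfies_distance([(2, 2), (5, 5), (6, 6)], high=2) holds even though distance((2, 2), (6, 6)) is 3.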

The following expression returns the empty sequence, because between any token "usability" and the token in any occurrence of the phrase "web site" that is the nearest to the token "usability" there is always more than one intervening token:

/books/book[.//p contains text "web site"
ftand "usability" distance at most 1 words] 

The following expression returns the book title, because for the occurrences of the tokens "web" and "users" in the note element only one intervening token appears:

/books/book[. contains text "web"
ftand "users" distance at most 1 words]/title 

In order to allow meaningful results for nested positional filters, e.g., a distance selection embedded inside another distance selection, the resulting matches for distance selections are formed from the input matches that satisfy the distance constraint as follows. All StringIncludes of such a match are coerced into a single StringInclude that spans all token positions from the smallest to the largest position of any input StringIncludes. Thus, a distance selection that embeds a window or a distance selection takes the result of the embedded selection as a single unit.

The following gives an example of nested distance selections:

/books/book contains text ((("richard" ftand "nixon") distance at most 2 words) 
                   ftand 
                   (("george" ftand "bush") distance at most 2 words) 
                  distance at least 20 words)

This expression finds book elements that contain, for instance, "Richard M. Nixon" and "George W. Bush" at least 20 words apart. The matches for the inner distance selections are treated as single units (represented by StringIncludes) by the outer distance selection. If such phrases are present in the search context, the outer distance selection enforces a constraint on the number of intervening tokens ("at least 20") between the last token of "Richard M. Nixon" and the first token of "George W. Bush".

3.6.4 Scope Selection

[163]    FTScope    ::=    ("same" | "different") FTBigUnit
[164]    FTBigUnit    ::=    "sentence" | "paragraph"

[Definition: A scope selection consists of a full-text selection followed by one of the (complex) postfix operators derived from FTScope.]

A scope selection selects matches which satisfy the operand full-text selection and for which the matched tokens and phrases are contained in the same scope or in different scopes.

Possible scopes are sentences and paragraphs.

By default, there are no restrictions on the scope of the matches.

The following expression returns false, because the tokens "usability" and "Marigold" are not contained within the same sentence:

//book contains text "usability" ftand "Marigold" same sentence

The following expression returns true, because the tokens "usability" and "Marigold" are contained within different sentences:

//book contains text "usability" ftand "Marigold" different sentence

The following expression returns a book element, because it contains "usability" and "testing" in the same paragraph:

//book[. contains text "usability" ftand "testing" same paragraph] 

The following expression returns a book element, because "site" and "errors" appear in the same sentence:

//book[. contains text "site" ftand "errors" same sentence] 

It is possible that both "same sentence" and "different sentence" conditions are simultaneously satisfied for several tokens and/or phrases within the same document fragment. This can be observed if there are occurrences of the tokens and/or phrases both within the same sentence and within different sentences. For example, consider the following document fragment.

<introduction>
... The usability of a Web site is how well the site supports the user in
achieving specified goals. ... Expert reviews and usability testing are methods of
identifying problems in layout, terminology, and navigation. ...
</introduction>

This sample will satisfy both conditions ("usability" ftand "reviews") different sentence and ("usability" ftand "reviews") same sentence. The tokens "usability" and "reviews" occur both in different sentences (the first and second shown sentences) and in the same sentence (the second shown sentence).

The above observation also holds for the "same paragraph" and "different paragraph" conditions.
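
The observation that both conditions can hold at once can be illustrated with the Python sketch below. It assumes that each query token or phrase is described by the list of sentence numbers in which it occurs, and that "different" means the chosen occurrences lie in pairwise distinct sentences. The function name scope_holds is hypothetical.

```python
from itertools import product


def scope_holds(sentences_per_term, kind):
    """Return True if some choice of occurrences satisfies the scope.

    sentences_per_term -- one list of sentence numbers per query term
    kind               -- "same" or "different"
    """
    for combo in product(*sentences_per_term):
        distinct = len(set(combo))
        if kind == "same" and distinct == 1:
            return True
        if kind == "different" and distinct == len(combo):
            return True
    return False


# "usability" occurs in the first and second shown sentences,
# "reviews" only in the second.
occurrences = [[1, 2], [2]]
print(scope_holds(occurrences, "same"), scope_holds(occurrences, "different"))
```

Both calls yield True for the fragment above, since one pairing of occurrences shares a sentence and another does not.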

3.6.5 Anchoring Selection

[165]    FTContent    ::=    ("at" "start") | ("at" "end") | ("entire" "content")

[Definition: An anchoring selection consists of a full-text selection followed by one of the postfix operators "at start", "at end", or "entire content".]

An anchoring selection selects matches which satisfy the operand full-text selection and for which the matched tokens and phrases are the first, last, or all tokens in the tokenized form of the items being searched.

  • Using the "at start" operator, tokens or phrases are matched if they cover the first token position in the tokenized string value of the item being searched.

  • Using the "at end" operator, tokens or phrases are matched if they cover the last token position in the tokenized string value of the item being searched.

  • Using the "entire content" operator, tokens or phrases are matched if they cover all token positions of the tokenized string value of the item being searched.
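
The three anchoring conditions can be sketched in Python as tests on the set of token positions covered by a match. The sketch assumes token positions are numbered from 1 to token_count within the item being searched; the function name anchoring_holds is hypothetical.

```python
def anchoring_holds(match_positions, token_count, kind):
    """Check an anchoring operator against the positions covered by a match."""
    covered = set(match_positions)
    if kind == "at start":
        return 1 in covered                      # covers the first position
    if kind == "at end":
        return token_count in covered            # covers the last position
    if kind == "entire content":
        return covered == set(range(1, token_count + 1))
    raise ValueError(f"unknown anchoring operator: {kind}")
```

For an item of 10 tokens, a match covering positions 8 to 10 satisfies "at end" but neither "at start" nor "entire content".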

The following expression returns each title element starting with the phrase "improving the usability of a web site":

/books//title[. contains text "improving the usability
of a web site" at start]

The following expression returns the p element of the sample, because it ends with the phrase "propagating few errors":

/books//p[. contains text "propagat.*" using wildcards ftand "few
errors" distance at most 2 words at end]

Since the distance operator doesn't imply an ordering, the last example would also yield a match if the p element ended with, say, "few errors are propagated".

The following expression returns each note element whose entire content is "this book has been approved by the web site users association":

/books//note[. contains text "this book has been
approved by the web site users association" entire content]

The following example returns true because both the content and the note elements match:

/books//* contains text "Association" at end

3.7 Ignore Option

[180]    FTIgnoreOption    ::=    "without" "content" UnionExpr

The ignore option specifies a set of nodes whose contents are ignored. It is applicable only to a top-level FTSelection (see FTContainsExpr). [Definition: Ignored nodes are the set of nodes whose content is ignored.] Ignored nodes are identified by the XQuery expression UnionExpr. The value of the UnionExpr must be a sequence of zero or more nodes; otherwise a type error is raised [err:XPTY0004]XP.

Let I1, I2, ..., In be the sequence of items of the search context and let N1, N2, ..., Nk be the sequence of nodes that UnionExpr evaluates to. For each Ij (j=1..n) a copy is made that omits each node Ni (i=1..k). Those copies form the new search context. If UnionExpr evaluates to an empty sequence no nodes are omitted.

In the following fragment, if $x//annotation is ignored, "Web Usability" will be found 2 times: once in the title element and once in the editor element. The 2 occurrences in the 2 annotation elements are ignored. On the other hand, "expert" will not be found, as it appears only in an annotation element.

let $x := <book>
   <title>Web Usability and Practice</title>
   <author>Montana <annotation> this author is
       an expert in Web Usability</annotation> Marigold
   </author>
   <editor>Véra Tudor-Medina on Web <annotation> best
       editor on Web Usability</annotation> Usability
   </editor>
 </book>
 

By default, no element content is ignored.
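The copy-and-omit step described above can be imitated in Python with the standard xml.etree.ElementTree module. This is a rough sketch under stated assumptions: ignored nodes are selected by element name rather than by evaluating a UnionExpr, tokens are taken to be runs of word characters, and the helper names without_content and count_phrase are hypothetical.

```python
import copy
import re
import xml.etree.ElementTree as ET

BOOK = """<book>
   <title>Web Usability and Practice</title>
   <author>Montana <annotation> this author is
       an expert in Web Usability</annotation> Marigold
   </author>
   <editor>Véra Tudor-Medina on Web <annotation> best
       editor on Web Usability</annotation> Usability
   </editor>
 </book>"""


def without_content(root, ignored_tag):
    """Return a copy of root that omits every element named ignored_tag."""
    pruned = copy.deepcopy(root)
    for parent in list(pruned.iter()):
        previous = None
        for child in list(parent):
            if child.tag != ignored_tag:
                previous = child
                continue
            # ElementTree keeps the text that follows an element in its tail,
            # so pass the tail to the preceding sibling (or to the parent).
            tail = child.tail or ""
            if previous is None:
                parent.text = (parent.text or "") + tail
            else:
                previous.tail = (previous.tail or "") + tail
            parent.remove(child)
    return pruned


def count_phrase(root, phrase):
    """Count case-insensitive occurrences of phrase in the string value of root."""
    tokens = re.findall(r"\w+", "".join(root.itertext()).lower())
    words = phrase.lower().split()
    last_start = len(tokens) - len(words)
    return sum(tokens[i:i + len(words)] == words for i in range(last_start + 1))


book = ET.fromstring(BOOK)
pruned = without_content(book, "annotation")
usability_hits = count_phrase(pruned, "Web Usability")
expert_hits = count_phrase(pruned, "expert")
```

Under these assumptions the pruned copy contains "Web Usability" in the title and editor elements only, and "expert" does not occur at all. The original tree is left untouched because the omission is applied to a copy.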

Note:

Nodes MAY be ignored during indexing and during query processing. The ignore option applies only to query processing. Whether and how indexing ignores nodes is out of scope for this specification.

3.8 Extension Selections

[Definition: An extension selection is a full-text selection whose semantics are implementation-defined.] Typically, a particular extension will be recognized by some implementations and not by others. The syntax is designed so that extension selections can be successfully parsed by all implementations, and so that fallback behavior can be defined for implementations that do not recognize a particular extension.

[154]    FTExtensionSelection    ::=    Pragma+ "{" FTSelection? "}"
[69]    Pragma    ::=    "(#" S? QName (S PragmaContents)? "#)"
[70]    PragmaContents    ::=    (Char* - (Char* '#)' Char*))

An extension selection consists of one or more pragmas followed by a full-text selection enclosed in curly braces. See Section 3.14 Extension ExpressionsXQ for information on pragmas in general. A pragma is denoted by the delimiters (# and #), and consists of an identifying QName followed by implementation-defined content. The content of a pragma may consist of any string of characters that does not contain the ending delimiter #). The QName of a pragma must resolve to a namespace URI and local name, using the statically known namespaces.

Note:

Since there is no default namespace for pragmas, a pragma QName must include a namespace prefix.

Each implementation recognizes an implementation-defined set of namespace URIs used to denote pragmas.

If the namespace part of a pragma QName is not recognized by the implementation as a pragma namespace, then the pragma is ignored. If all the pragmas in an FTExtensionSelection are ignored, then the full-text extension selection is just the full-text selection enclosed in curly braces; if this full-text selection is absent, then a static error is raised [err:XQST0079]XQ.

If an implementation recognizes the namespace of one or more pragmas in an FTExtensionSelection, then the value of the FTExtensionSelection, including its error behavior, is implementation-defined. For example, an implementation that recognizes the namespace of a pragma QName, but does not recognize the local part of the QName, might choose either to raise an error or to ignore the pragma.

It is a static error [err:XQST0013]XQ if an implementation recognizes a pragma but determines that its content is invalid.

If an implementation recognizes a pragma, it must report any static errors in the following full-text selection even if it will not apply that selection.
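The fallback rules above can be summarized as a small decision function. The following Python sketch is illustrative only: StaticError and resolve_extension are hypothetical names, pragmas are reduced to their namespace URIs, and the check for invalid pragma content (err:XQST0013) is not modelled.

```python
from typing import List, Optional, Set, Tuple


class StaticError(Exception):
    """Stands in for a static error such as err:XQST0079 (hypothetical class)."""


def resolve_extension(
    pragma_namespaces: List[str],
    recognized: Set[str],
    fallback: Optional[str],
) -> Tuple[str, Optional[str]]:
    """Decide how an FTExtensionSelection is handled.

    pragma_namespaces: namespace URI of each pragma QName, in order.
    recognized: namespace URIs this implementation treats as pragma namespaces.
    fallback: the full-text selection between the curly braces, or None if absent.
    """
    if any(ns in recognized for ns in pragma_namespaces):
        # At least one pragma is recognized: the result is implementation-defined.
        return ("implementation-defined", fallback)
    if fallback is None:
        # Every pragma was ignored and there is nothing to fall back to.
        raise StaticError("err:XQST0079")
    # Every pragma was ignored: evaluate the enclosed full-text selection.
    return ("fallback", fallback)


EXQ = "http://example.org/XQueryImplementation"
recognizing = resolve_extension([EXQ], {EXQ}, "'Berners-Lee'")
ignoring = resolve_extension([EXQ], set(), "'Berners-Lee'")
```

An implementation that does not recognize the namespace and is given no enclosed selection would raise the static error instead of returning a result.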

The following examples illustrate three ways in which extension selections might be used.

A pragma can be used to furnish a hint for how to evaluate the following full-text selection, without actually changing the result. For example:

declare namespace exq = "http://example.org/XQueryImplementation";

/books/book/author[name contains text (# exq:use-index #) {'Berners-Lee'}]

An implementation that recognizes the exq:use-index pragma might use an index to evaluate the full-text selection that follows. An implementation that does not recognize this pragma would evaluate the full-text selection in its normal way.

A pragma might be used to modify the semantics of the following full-text selection in ways that would not (in the absence of the pragma) be conformant with this specification. For example, a pragma might be used to change distance counting so that adjacent words are at a distance of 1 (otherwise they would be at a distance of 0):

declare namespace exq = "http://example.org/XQueryImplementation";

/books/book[.//p contains text (# exq:distance #) { "web site"
ftand "usability" distance at most 1 words }]

Such changes to the language semantics must be scoped to the expression contained within the curly braces following the pragma.

A pragma might contain syntactic constructs that are evaluated in place of the following full-text selection. In this case, the following selection itself (if it is present) provides a fallback for use by implementations that do not recognize the pragma. For example:

declare namespace exq = "http://example.org/XQueryImplementation";

//city[. contains text (# exq:classifier with class 'Animals' #) 
       {"animal" using thesaurus at "http://example.org/thesaurus.xml" 
        relationship "RT"}]

Here an implementation that recognizes the pragma will return the result of evaluating the proprietary syntax with class 'Animals', while an implementation that does not recognize the pragma will instead return the result of the thesaurus option. If no fallback expression is required, or if none is feasible, then the expression between the curly braces may be omitted, in which case implementations that do not recognize the pragma will raise a static error.

4 Semantics

This section describes the formal semantics of XQuery and XPath Full Text 1.0. The diagram below shows how XQuery and XPath Full Text 1.0 integrates with the rest of XQuery 1.0 and XPath 2.0. It illustrates how full-text expressions can be nested within XQuery 1.0 and XPath 2.0 expressions and vice versa.

XQuery and Full Text Interaction diagram

Note:

In the list above and throughout the rest of this section, bold typeface has been used to distinguish the concepts that are part of the AllMatches model.

The functions and schemas defined in this section are considered to be within the fts: namespace (as discussed in section 1.3 A word about namespaces). These functions and schemas are used only for describing the semantics. There is no requirement that an implementation of this specification must use the functions, schemas, or algorithms described in this section of this specification. The only requirement is that implementations must achieve the same results that an implementation that does use these functions, schemas, and algorithms would achieve.

Note that by using XQuery 1.0 and XPath 2.0 to specify the formal semantics, we avoid the need to introduce new formalism. We simply reuse the formal semantics of XQuery 1.0 and XPath 2.0.

4.1 Tokenization

[Definition: Formally, tokenization is the process of converting an XDM item to a collection of tokens, taking any structural information of the item into account to identify token, sentence, and paragraph boundaries. Each token is assigned a starting and ending position.]

Tokenization, including the definition of the term "token", SHOULD be implementation-defined. Implementations SHOULD expose the rules and sample results of tokenization as much as possible to enable users to predict and interpret the results of tokenization. Tokenization MUST conform to these constraints:

  1. Each token MUST consist of one or more characters.

  2. Tokenization of an item MUST include only tokens derived from the string value of that item. The string value is defined in [XQuery 1.0 and XPath 2.0 Data Model (XDM) (Second Edition)] in Section 2.6.5 String ValuesDM; for element nodes it does not include the contents of attributes, but for attribute nodes it does.

  3. The tokenizer SHOULD, when tokenizing two equal items, identify the same tokens in each. The cases where it does not are implementation-defined.

  4. The starting and ending position of a token MUST be integers, and the starting position MUST be less than or equal to the ending position.

  5. In the tokenization of an item, consider the range of token positions from the smallest starting position to the largest ending position; every token position in that range must be covered by some token in the tokenization. That is, for every token position P, there must exist some token T such that T's starting position <= P <= T's ending position.

  6. The tokenizer MUST preserve the containment hierarchy (paragraphs contain sentences contain tokens) by adhering to the following constraints:

    1. Each token is contained in at most one sentence and at most one paragraph. (In particular, this means that no tokens of any sentence are contained in any other sentence, and no tokens of any paragraph are contained in any other paragraph.)

    2. All tokens of a sentence are contained in at most one paragraph.

    3. The range of token positions from the smallest starting position to the largest ending position in a sentence does not overlap with the token position range from any other sentence.

    4. The range of token positions from the smallest starting position to the largest ending position in a paragraph does not overlap with the token position range from any other paragraph.
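Several of the constraints above are mechanical enough to be checked by a small validator. The Python sketch below is only an illustration: the Token record and check_tokenization function are hypothetical, each token is assumed to carry the relative numbers of its sentence and paragraph, and only constraints 1, 4, 5, and 6 are examined.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Token:
    text: str
    start: int      # starting position
    end: int        # ending position
    sentence: int   # relative sentence number
    paragraph: int  # relative paragraph number


def check_tokenization(tokens: List[Token]) -> List[str]:
    """Return a list of violated constraints (empty if none were found)."""
    problems: List[str] = []
    if not tokens:
        return problems
    for t in tokens:
        if len(t.text) < 1:
            problems.append("constraint 1: empty token")
        if t.start > t.end:
            problems.append("constraint 4: start greater than end")
    # Constraint 5: every position in the overall range is covered by a token.
    covered = set()
    for t in tokens:
        covered.update(range(t.start, t.end + 1))
    lowest = min(t.start for t in tokens)
    highest = max(t.end for t in tokens)
    if any(pos not in covered for pos in range(lowest, highest + 1)):
        problems.append("constraint 5: uncovered token position")
    # Constraint 6.2: all tokens of a sentence lie in a single paragraph.
    paragraph_of: Dict[int, int] = {}
    for t in tokens:
        if paragraph_of.setdefault(t.sentence, t.paragraph) != t.paragraph:
            problems.append("constraint 6.2: sentence spans paragraphs")
            break
    # Constraints 6.3 and 6.4: position ranges of distinct units do not overlap.
    for label, unit_of in (("6.3", lambda t: t.sentence), ("6.4", lambda t: t.paragraph)):
        ranges: Dict[int, Tuple[int, int]] = {}
        for t in tokens:
            low, high = ranges.get(unit_of(t), (t.start, t.end))
            ranges[unit_of(t)] = (min(low, t.start), max(high, t.end))
        ordered = sorted(ranges.values())
        if any(nxt[0] <= cur[1] for cur, nxt in zip(ordered, ordered[1:])):
            problems.append(f"constraint {label}: overlapping ranges")
    return problems


good = [Token("Ford", 1, 1, 1, 1), Token("Mustang", 2, 2, 1, 1), Token("Honda", 3, 3, 2, 2)]
report = check_tokenization(good)
```

Sorting the ranges by starting position is sufficient to detect whether any overlap exists, because an overlap between two ranges implies an overlap between some adjacent pair in that order.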

Useful information for tokenizer implementors may be found in [UAX29].

Note:

Usually, the starting and ending positions of a token are the same. For some languages, some tokenizers may identify overlapping tokens. For example, the German word "Donaudampfschifffahrtskapitaensmuetze" might be tokenized into the following tokens: "Donaudampfschifffahrtskapitaensmuetze", "Donau", "dampf", "schiff", "dampfschiff", "kapitaen", "muetze", "kapitaensmuetze", "schifffahrt", "dampfschifffahrt", and perhaps others. In the face of overlapping tokens, it is implementation-dependent what positions a tokenizer assigns to each such token. For example, a tokenizer might assign the same position value to each of the tokens "Donaudampfschifffahrtskapitaensmuetze", "Donau", "dampf", "schiff", "dampfschiff", etc. In that case, the distance between each (overlapping) token assigned the same position is -1. Tokenizers might retain additional information about those overlapping tokens that allows the full-text implementation to distinguish among them.

Consider the sentence "Ich sehe den Dampfschifffahrtskapitän auf dem Fluß." If an implementation tokenizes "Dampfschifffahrtskapitän" as overlapping tokens at the same position, then the implementation could still determine that the query "'Schifffahrt Dampf' window 0 words ordered" fails to match the sentence because phrase matching is implementation-defined and may make use of additional implementation-dependent token information.

Even more complex situations can arise. Consider, for example, the German sentence "Er stellte sie vor." A sophisticated tokenizer might construct the token "vorstellen" covering positions 2 through 4, which overlaps the token "sie" at position 3. For the purposes of distance calculations, tokens are considered in the order of their starting positions, so the distance between "vorstellen" and "sie" would be 3-4-1=-2. (See fts:wordDistance, below.)
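The arithmetic in the two preceding examples can be reproduced with a short Python sketch. It assumes, based on the worked calculation above, that the word distance is the starting position of the later token minus the ending position of the earlier token minus one; the function name word_distance and the (startPos, endPos) tuples are illustrative and are not the fts:wordDistance function itself.

```python
from typing import Tuple

# Hypothetical representation of a token: (startPos, endPos).
Positions = Tuple[int, int]


def word_distance(a: Positions, b: Positions) -> int:
    """Distance in words between two tokens, ordered by starting position."""
    first, second = sorted([a, b])  # tuples sort by start, then by end
    return second[0] - first[1] - 1


# "Er(1) stellte(2) sie(3) vor(4)": "vorstellen" spans 2..4, "sie" is at 3.
vorstellen = (2, 4)
sie = (3, 3)
overlap_distance = word_distance(vorstellen, sie)

# Two overlapping tokens that were assigned the same single position.
same_position_distance = word_distance((5, 5), (5, 5))

# Two adjacent, non-overlapping tokens.
adjacent_distance = word_distance((1, 1), (2, 2))
```

The argument order does not matter in this sketch, because the two tokens are sorted before the subtraction.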

4.1.1 Examples

The following expression must return false, because 'secret' occurs only within an attribute and a comment, neither of which contributes characters to the string value of the 'p' element node:

<p kind='secret'>Sensitive material <!-- secret --></p> contains text 'secret'

The following document may lead to overlapping tokens to account for the ambiguity caused by the hyphen:

<p>I will re-
sign tomorrow.</p>

The following document fragment is the source document for examples in this section. A sample tokenization is used for the examples in this section. The results might be different for other tokenizations.

Unless stated otherwise, the results assume a case-insensitive match.

<offers>
    <offer id="1000" price="10000">
        Ford Mustang 2000, 65K, excellent condition, runs 
        great, AC, CC, power all
    </offer>
    <offer id="1001" price="8000">
        Honda Accord 1999, 78K, A/C, cruise control, runs 
        and looks great, excellent condition
    </offer>
    <offer id="1005" price="5500">
        Ford Mustang, 1995, 150K highway mileage, no rust, 
        excellent condition
    </offer>
</offers>
        

In this sample tokenization, tokens are delimited by punctuation and whitespace symbols.

  • The token "Ford" is at relative position 1.

  • The token "Mustang" is at relative position 2.

  • The token "2000" is at relative position 3.

  • Relative position numbers are assigned sequentially through the end of the document.

Hence in this example each token occupies exactly one position, and no overlapping of tokens occurs. The relative positions of tokens are shown below in parentheses.

<offers>
    <offer id="1000" price="10000">
        Ford(1) Mustang(2) 2000(3), 65K(4), excellent(5)
        condition(6), runs(7) great(8), AC(9), CC(10), 
        power(11) all(12)
    </offer>
    <offer id="1001" price="8000">
        Honda(13) Accord(14) 1999(15), 78K(16), A(17)/C(18),
        cruise(19) control(20), runs(21) and(22) looks(23)
        great(24), excellent(25) condition(26)
    </offer>
    <offer id="1005" price="5500">
        Ford(27) Mustang(28), 1995(29), 150K(30) highway(31)
        mileage(32), no(33) rust(34), excellent(35)
        condition(36)
    </offer>
</offers>
        

The relative positions of paragraphs are determined similarly. In this sample tokenization, the paragraph delimiters are start tags and end tags.

  • The tokens in the first 'offer' element are assigned relative paragraph number 1.

  • The tokens from the next 'offer' element are assigned relative paragraph number 2.

  • Relative paragraph numbers are assigned sequentially through the end of the document.

The relative positions of sentences are determined similarly using sentence delimiters.
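The sample tokenization can be approximated in Python to see how the relative token and paragraph numbers arise. This is a sketch under stated assumptions: tokens are taken to be runs of ASCII letters and digits, each 'offer' element is treated as one paragraph, sentence numbers are not computed, and the function name tokenize is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

OFFERS = """<offers>
    <offer id="1000" price="10000">
        Ford Mustang 2000, 65K, excellent condition, runs
        great, AC, CC, power all
    </offer>
    <offer id="1001" price="8000">
        Honda Accord 1999, 78K, A/C, cruise control, runs
        and looks great, excellent condition
    </offer>
    <offer id="1005" price="5500">
        Ford Mustang, 1995, 150K highway mileage, no rust,
        excellent condition
    </offer>
</offers>"""


def tokenize(xml_text):
    """Return (token, position, paragraph) triples for the sample document."""
    root = ET.fromstring(xml_text)
    result = []
    position = 0
    for paragraph, offer in enumerate(root.iter("offer"), start=1):
        # Only the string value is tokenized, so the id and price
        # attributes contribute no tokens.
        string_value = "".join(offer.itertext())
        for token in re.findall(r"[A-Za-z0-9]+", string_value):
            position += 1
            result.append((token, position, paragraph))
    return result


tokens = tokenize(OFFERS)
first = tokens[0]
last = tokens[-1]
```

With these delimiters "A/C" yields the two tokens "A" and "C", which is why the second offer occupies positions 13 through 26.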

Implementations may provide for the means to ignore or side-step certain structural elements when performing tokenization. In the following example, the implementation has decided to ignore the markup for <bold> and prune out the entire subtree headed by <deleted>.

<para><deleted>This sentence was deleted.</deleted>
This <bold>entire paragraph</bold> is one sentence
as far as the tokenizer is concerned.
</para>

Using the same notation as before, this sample tokenization is shown below. All the tokens marked with a token position also have the same sentence and paragraph relative positions. Note that there are no tokens marked for the ignored subtree.

<para><deleted>This sentence was deleted.</deleted>
This(1) <bold>entire(2) paragraph(3)</bold> is(4) one(5) sentence(6)
as(7) far(8) as(9) the(10) tokenizer(11) is(12) concerned(13).
</para>

4.1.2 Representations of Tokenized Text and Matching

[Definition: A QueryItem is a sequence of QueryTokenInfos representing the collection of tokens derived from tokenizing one query string. ]

[Definition: A QueryTokenInfo is the identity of a token inside a query string. ] Each QueryTokenInfo is associated with a position that captures the relative position of the query string in the query.

[Definition: A TokenInfo represents a contiguous collection of tokens from an XML document. ] Each TokenInfo is associated with:

  • startPos: the smallest starting position of a token in the sequence

  • endPos: the largest ending position of any token of the sequence

  • startSent: the relative position of the sentence containing the token with the smallest starting position or zero if the tokenizer does not report sentences

  • endSent: the relative position of the sentence containing the token with the largest ending position or zero if the tokenizer does not report sentences

  • startPara: the relative position of the paragraph containing the token with the smallest starting position or zero if the tokenizer does not report paragraphs

  • endPara: the relative position of the paragraph containing the token with the largest ending position or zero if the tokenizer does not report paragraphs

The following matching function is the central implementation-defined primitive performing the full-text retrieval.

declare function fts:matchTokenInfos (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $stopWords as xs:string*,
      $queryTokens as element(fts:queryToken)* )
   as element(fts:tokenInfo)*  external;
            

The above function returns the TokenInfos in items in $searchContext that match the query string represented by the sequence $queryTokens, when using the match options in $matchOptions and stop words in $stopWords. If $queryTokens is a sequence of more than one query token, each returned TokenInfo must represent a phrase matching that sequence.

Note:

While this matching function assumes a tokenized representation of the query strings, it does not assume a tokenized representation of the input items in $searchContext, i.e. the texts being searched. Hence, the tokenization of the search context is implicit in this function and coupled to the retrieval of matches. Of course, this does not imply that tokenization of the search context cannot be done a priori. The tokenization of each item in $searchContext does not necessarily take into account the match options in $matchOptions or the query tokens in $queryTokens. This allows implementations to tokenize and index input data without the knowledge of particular match options used in full-text queries.
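Because fts:matchTokenInfos is implementation-defined, no single implementation can be given, but a naive stand-in helps to read the later examples. The Python sketch below assumes case-insensitive matching on runs of ASCII letters and digits, ignores match options and stop words, and records only startPos and endPos; the names TokenInfo and match_token_infos are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TokenInfo:
    # Sentence and paragraph numbers are omitted in this sketch.
    startPos: int
    endPos: int


def match_token_infos(search_text: str, query_tokens: List[str]) -> List[TokenInfo]:
    """Return one TokenInfo per occurrence of the query tokens as a phrase."""
    # Tokenization of the search context happens inside the matcher.
    document = [t.lower() for t in re.findall(r"[A-Za-z0-9]+", search_text)]
    query = [q.lower() for q in query_tokens]
    width = len(query)
    if width == 0:
        return []
    return [
        TokenInfo(i + 1, i + width)
        for i in range(len(document) - width + 1)
        if document[i:i + width] == query
    ]


TEXT = (
    "Ford Mustang 2000, 65K, excellent condition, runs great, AC, CC, power all "
    "Honda Accord 1999, 78K, A/C, cruise control, runs and looks great, "
    "excellent condition "
    "Ford Mustang, 1995, 150K highway mileage, no rust, excellent condition"
)

single = match_token_infos(TEXT, ["Mustang"])
phrase = match_token_infos(TEXT, ["Ford", "Mustang"])
```

The query arrives already tokenized, while the search text is tokenized inside the function, which mirrors the asymmetry described in the note above.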

4.2 Evaluation of FTSelections

The XQuery 1.0 and XPath 2.0 Data Model is inadequate to support fully composable FTSelections. Full-text operations, such as FTSelections, operate on linguistic units, such as positions of tokens, which are not captured in the XQuery 1.0 and XPath 2.0 Data Model (XDM).

XQuery and XPath Full Text adds relative token, sentence, and paragraph position numbers via AllMatches. AllMatches make FTSelections fully composable.

4.2.1 AllMatches

4.2.1.1 Formal Model

[Definition: An AllMatches describes the possible results of an FTSelection.] The UML static class diagram of AllMatches is given below.

AllMatches class diagram

The AllMatches object contains zero or more Matches.

[Definition: Each Match describes one result to the FTSelection.] The result is described in terms of zero or more StringIncludes and zero or more StringExcludes.

[Definition: A StringMatch is a possible match of a sequence of query tokens with a corresponding sequence of tokens in a document. A StringMatch may be a StringInclude or StringExclude.] The queryPos attribute specifies the position of the query token in the query. This attribute is needed for FTOrders. The matched document token sequence is described in the TokenInfo associated with the StringMatch.

[Definition: A StringInclude is a StringMatch that describes a TokenInfo that must be contained in the document.]

[Definition: A StringExclude is a StringMatch that describes a TokenInfo that must not be contained in the document.]

Intuitively, AllMatches specifies the TokenInfos that a search context item contains and does not contain to satisfy an FTSelection.

The AllMatches structure resembles the Disjunctive Normal Form (DNF) in propositional and first-order logic. The AllMatches is a disjunction of Matches. Each Match is a conjunction of StringIncludes and StringExcludes.

4.2.1.2 Examples

Since in most of the examples below the tokens span only a single position, we characterize the TokenInfo instance by simply giving this position, written as "Pos:X". This should be read as the value of both the startPos and the endPos attributes. Furthermore, for expository reasons, we include in each StringMatch example an attribute "query string", set to the original query string, to show which query string each match came from.

The simplest example of an FTSelection is an FTWords such as "Mustang". The AllMatches corresponding to this FTWords is given below.

Sample AllMatches

As shown, the AllMatches consists of two Matches. Each Match represents one possible result of the FTWords "Mustang". The result represented by the first Match, represented as a StringInclude, contains the token "Mustang" at position 2. The result described by the second Match contains the token "Mustang" at position 28.

A more complex example of an FTSelection is an FTWords such as "Ford Mustang". The AllMatches for this FTWords is given below.

Sample AllMatches

There are two possible results for this FTWords, and these are represented by the two Matches. Each of the Matches requires two tokens to be matched. The first Match is obtained by matching "Ford" at position 1 and matching "Mustang" at position 2. Similarly, the second Match is obtained by matching "Ford" at position 27 and "Mustang" at position 28.

An even more complex example of an FTSelection is an FTSelection such as "Mustang" ftand ftnot "rust" that searches for "Mustang" but not "rust". The AllMatches for this FTSelection is given below.

Sample AllMatches

This example introduces StringExclude. StringExclude corresponds to negation in DNF (Disjunctive Normal Form). It specifies that the result described by the corresponding Match must not match the token at the specified position. In this example, the first Match specifies that "Mustang" is matched at position 2, and that the token "rust" at position 34 is not matched.
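The DNF reading of the three examples above can be sketched in Python. The model is deliberately reduced and is not the formal semantics: tokens span a single position, negation is handled only for a single-token FTWords operand, and the names StringMatch, Match, ft_words, ft_and, and ft_not_words are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StringMatch:
    query_string: str
    pos: int  # stands for both startPos and endPos


@dataclass(frozen=True)
class Match:
    includes: Tuple[StringMatch, ...] = ()
    excludes: Tuple[StringMatch, ...] = ()


# An AllMatches is a disjunction (list) of Matches.
AllMatches = List[Match]


def ft_words(token: str, positions: List[int]) -> AllMatches:
    """One Match per occurrence, each holding a single StringInclude."""
    return [Match(includes=(StringMatch(token, p),)) for p in positions]


def ft_and(left: AllMatches, right: AllMatches) -> AllMatches:
    """Conjunction of two DNFs: combine every left Match with every right Match."""
    return [
        Match(l.includes + r.includes, l.excludes + r.excludes)
        for l in left
        for r in right
    ]


def ft_not_words(token: str, positions: List[int]) -> AllMatches:
    """Negation of a single-token FTWords: one Match excluding every occurrence."""
    return [Match(excludes=tuple(StringMatch(token, p) for p in positions))]


# "Mustang" occurs at positions 2 and 28, "rust" at position 34.
result = ft_and(ft_words("Mustang", [2, 28]), ft_not_words("rust", [34]))
```

In this sketch the result holds two Matches, one per occurrence of "Mustang", and each carries a StringExclude for "rust" at position 34.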

4.2.1.3 XML representation

AllMatches has a well-defined hierarchical structure. Therefore, the AllMatches can be easily modeled in XML. This XML representation and those which follow formally describe the semantics of FTSelections. For example, the XML representation of AllMatches formally specifies how an FTSelection operates on zero or more AllMatches to produce a resulting AllMatches.

The XML schema for representing AllMatches is given below.

<xs:schema 
     xmlns:xs="http://www.w3.org/2001/XMLSchema" 
     xmlns:fts="http://www.w3.org/2007/xpath-full-text"
     targetNamespace="http://www.w3.org/2007/xpath-full-text"
     elementFormDefault="qualified" 
     attributeFormDefault="unqualified">

  <xs:complexType name="allMatches">
    <xs:sequence>
      <xs:element ref="fts:match" 
                  minOccurs="0" 
                  maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="stokenNum" type="xs:integer" use="required" />
  </xs:complexType>

  <xs:element name="allMatches" type="fts:allMatches"/>

  <xs:complexType name="match">
    <xs:sequence>
      <xs:element ref="fts:stringInclude" 
                  minOccurs="0" 
                  maxOccurs="unbounded"/>
      <xs:element ref="fts:stringExclude" 
                  minOccurs="0" 
                  maxOccurs="unbounded"/>
   </xs:sequence>
  </xs:complexType>
  
  <xs:element name="stringInclude" 
              type="fts:stringMatch" />

  <xs:element name="stringExclude" 
              type="fts:stringMatch" />

  <xs:element name="match" type="fts:match"/>

  <xs:complexType name="stringMatch">
    <xs:sequence>
      <xs:element ref="fts:tokenInfo"/>
    </xs:sequence>
    <xs:attribute name="queryPos" 
                  type="xs:integer" 
                  use="required"/>
    <xs:attribute name="isContiguous" 
                  type="xs:boolean" 
                  use="required"/>  
  </xs:complexType>

  <xs:complexType name="tokenInfo">
    <xs:attribute name="startPos" 
                  type="xs:integer" 
                  use="required"/>
    <xs:attribute name="endPos" 
                  type="xs:integer" 
                  use="required"/>
    <xs:attribute name="startSent" 
                  type="xs:integer" 
                  use="required"/>
    <xs:attribute name="endSent" 
                  type="xs:integer" 
                  use="required"/>
    <xs:attribute name="startPara" 
                  type="xs:integer" 
                  use="required"/>
    <xs:attribute name="endPara" 
                  type="xs:integer" 
                  use="required"/>
  </xs:complexType>

  <xs:element name="tokenInfo" type="fts:tokenInfo"/>

  <xs:complexType name="queryItem">
    <xs:sequence>
      <xs:element ref="fts:queryToken" 
                  minOccurs="0" 
                  maxOccurs="unbounded"/>
   </xs:sequence>
  </xs:complexType>

  <xs:complexType name="queryTokenInfo">
    <xs:attribute name="word" 
                  type="xs:string" 
                  use="required"/>
    <xs:attribute name="queryPos" 
                  type="xs:integer" 
                  use="required"/>
  </xs:complexType>

  <xs:element name="queryToken" type="fts:queryTokenInfo"/>
</xs:schema>
                

The stokenNum attribute in AllMatches is related to the representation of the semantics as XQuery functions. Therefore, it is not considered part of the AllMatches model. The stokenNum attribute stores the number of query tokens used when evaluating the AllMatches. This value is used to compute the correct value for the queryPos attribute in new StringMatches.
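As an illustration of the schema, the AllMatches for the FTWords "Mustang" could be serialized as follows. The Python sketch uses the standard xml.etree.ElementTree module; the helper names are hypothetical, and the sentence numbers are an assumption (one sentence per 'offer' element), since the sample tokenization does not list them.

```python
import xml.etree.ElementTree as ET

FTS_NS = "http://www.w3.org/2007/xpath-full-text"
ET.register_namespace("fts", FTS_NS)


def q(local_name):
    """Qualify a local name with the fts namespace (ElementTree notation)."""
    return "{%s}%s" % (FTS_NS, local_name)


def all_matches_for_token(occurrences):
    """Build an fts:allMatches element for a single-token FTWords.

    occurrences: (position, sentence, paragraph) triples, one per matched token.
    """
    root = ET.Element(q("allMatches"), {"stokenNum": "1"})
    for position, sentence, paragraph in occurrences:
        match = ET.SubElement(root, q("match"))
        include = ET.SubElement(
            match, q("stringInclude"), {"queryPos": "1", "isContiguous": "true"}
        )
        ET.SubElement(
            include,
            q("tokenInfo"),
            {
                "startPos": str(position),
                "endPos": str(position),
                "startSent": str(sentence),  # assumed value
                "endSent": str(sentence),    # assumed value
                "startPara": str(paragraph),
                "endPara": str(paragraph),
            },
        )
    return root


# "Mustang" at position 2 (first offer) and position 28 (third offer).
mustang = all_matches_for_token([(2, 1, 1), (28, 3, 3)])
xml_text = ET.tostring(mustang, encoding="unicode")
```

Every attribute that the schema marks as required is written explicitly, so the serialized element has the shape the schema above describes for a single query token.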

4.2.2 XML Representation

FTSelections are fully composable and may be nested arbitrarily under other FTSelections. Each FTSelection may be associated with match options (such as stemming and stop words) and score weights. Since score weights are solely interpreted by the formal semantics scoring function, they do not influence the semantics of FTSelections. Therefore, score weights are not considered in the formal semantics.

The XML structures defined by the following schema represent FTSelections within the semantic functions of section 4 Semantics. This representation is used for definitional purposes only and should not be confused with the XML representation for queries in Appendix E XML Syntax (XQueryX) for XQuery and XPath Full Text 1.0. Every FTSelection is represented as an XML element. Every nested FTSelection is represented as a nested descendant element. For binary FTSelections, e.g., FTAnd, the nested FTSelections are represented in <left> and <right> descendant elements. For unary FTSelections, a <selection> descendant element is used. Additional characteristics of FTSelections, e.g., the distance unit for FTDistance, are stored in attributes.

<xs:schema
     xmlns:xs="http://www.w3.org/2001/XMLSchema" 
     xmlns:fts="http://www.w3.org/2007/xpath-full-text"
     targetNamespace="http://www.w3.org/2007/xpath-full-text"
     elementFormDefault="qualified" 
     attributeFormDefault="unqualified">
           
  <xs:include schemaLocation="AllMatches.xsd" />
  <xs:include schemaLocation="MatchOptions.xsd" />

  <xs:complexType name="ftSelection">
    <xs:sequence>
      <xs:choice>
        <xs:element name="ftWords" type="fts:ftWords"/>
        <xs:element name="ftAnd" type="fts:ftAnd"/>
        <xs:element name="ftOr" type="fts:ftOr"/>
        <xs:element name="ftUnaryNot" type="fts:ftUnaryNot"/>
        <xs:element name="ftMildNot" type="fts:ftMildNot"/>
        <xs:element name="ftOrder" type="fts:ftOrder"/>
        <xs:element name="ftScope" type="fts:ftScope"/>
        <xs:element name="ftContent" type="fts:ftContent"/>
        <xs:element name="ftDistance" type="fts:ftDistance"/>
        <xs:element name="ftWindow" type="fts:ftWindow"/>
        <xs:element name="ftTimes" type="fts:ftTimes"/>
      </xs:choice>
      <xs:element ref="fts:matchOptions" 
                  minOccurs="0"/>
      <xs:element name="weight" 
                  type="xs:double" 
                  minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="selection" type="fts:ftSelection"/>

  <xs:complexType name="ftWords">
    <xs:sequence>
      <xs:element ref="fts:queryItem" 
                  minOccurs="0" 
                  maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="type" 
                  type="fts:ftWordsType" 
                  use="required"/>
  </xs:complexType>

  <xs:element name="queryItem" type="fts:queryItem"/>
  
  <xs:complexType name="ftAnd">
    <xs:sequence>
      <xs:element name="left" type="fts:ftSelection"/>
      <xs:element name="right" type="fts:ftSelection"/>
    </xs:sequence>
  </xs:complexType>
  
  <xs:complexType name="ftOr">
    <xs:sequence>
      <xs:element name="left" type="fts:ftSelection"/>
      <xs:element name="right" type="fts:ftSelection"/>
    </xs:sequence>
  </xs:complexType>
  
  <xs:complexType name="ftUnaryNot">
    <xs:sequence>
      <xs:element name="selection" type="fts:ftSelection"/>
    </xs:sequence>
  </xs:complexType>
  
  <xs:complexType name="ftMildNot">
    <xs:sequence>
      <xs:element name="left" type="fts:ftSelection"/>
      <xs:element name="right" type="fts:ftSelection"/>
    </xs:sequence>
  </xs:complexType>
  
  <xs:complexType name="ftOrder">
    <xs:sequence>
      <xs:element name="selection" type="fts:ftSelection"/>
    </xs:sequence>
  </xs:complexType>
  
  <xs:complexType name="ftScope">
    <xs:sequence>
      <xs:element name="selection" type="fts:ftSelection"/>
    </xs:sequence>
    <xs:attribute name="type" 
                  type="fts:scopeType" 
                  use="required"/>
    <xs:attribute name="scope" 
                  type="fts:scopeSelector" 
                  use="required"/>
  </xs:complexType>
  
  <xs:complexType name="ftContent">
    <xs:sequence>
      <xs:element name="selection" type="fts:ftSelection"/>
    </xs:sequence>
    <xs:attribute name="type" 
                  type="fts:contentMatchType" 
                  use="required"/>
  </xs:complexType>
  
  <xs:complexType name="ftDistance">
    <xs:sequence>
      <xs:element name="range" type="fts:ftRangeSpec"/>
      <xs:element name="selection" type="fts:ftSelection"/>
    </xs:sequence>
    <xs:attribute name="type" 
                  type="fts:distanceType" 
                  use="required"/>
  </xs:complexType>
  
  <xs:complexType name="ftWindow">
    <xs:sequence>
      <xs:element name="selection" type="fts:ftSelection"/>
    </xs:sequence>
    <xs:attribute name="size" 
                  type="xs:integer" 
                  use="required"/>
    <xs:attribute name="type" 
                  type="fts:distanceType" 
                  use="required"/>
  </xs:complexType>
  
  <xs:complexType name="ftTimes">
    <xs:sequence>
      <xs:element name="range" type="fts:ftRangeSpec"/>
      <xs:element name="selection" type="fts:ftWords"/>
    </xs:sequence>
  </xs:complexType>
    
  <xs:simpleType name="ftWordsType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="any"/>
      <xs:enumeration value="all"/>
      <xs:enumeration value="phrase"/>
      <xs:enumeration value="any word"/>
      <xs:enumeration value="all word"/>
    </xs:restriction>
  </xs:simpleType>
  
  <xs:simpleType name="scopeType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="same"/>
      <xs:enumeration value="different"/>
    </xs:restriction>
  </xs:simpleType>
  
  <xs:simpleType name="scopeSelector">
    <xs:restriction base="xs:string">
      <xs:enumeration value="paragraph"/>
      <xs:enumeration value="sentence"/>
    </xs:restriction>
  </xs:simpleType>
  
  <xs:simpleType name="distanceType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="paragraph"/>
      <xs:enumeration value="sentence"/>
      <xs:enumeration value="word"/>
    </xs:restriction>
  </xs:simpleType>
  
  <xs:simpleType name="contentMatchType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="at start"/>
      <xs:enumeration value="at end"/>
      <xs:enumeration value="entire content"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
            

4.2.3 The evaluate function

The semantics for the evaluation of FTSelections is defined using the fts:evaluate function. The function takes four parameters: 1) an FTSelection, 2) a search context item, 3) the default set of match options that apply to the evaluation of the FTSelection, and 4) the number of query tokens that have already been processed.

The fts:evaluate function returns the AllMatches that is the result of evaluating the FTSelection. When fts:evaluate is applied to some FTSelection X, it calls the function fts:ApplyX to build the resulting AllMatches. If X is applied on nested FTSelections, the fts:evaluate function is recursively called on these nested FTSelections and the returned AllMatches are used in the evaluation of fts:ApplyX.

The semantics for the fts:evaluate function is given below.

declare function fts:evaluate (
      $ftSelection as element(*, fts:ftSelection), 
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryTokenNum as xs:integer )
   as element(fts:allMatches)
{
   if (fn:count($ftSelection/fts:matchOptions) > 0) then 
      (: First we deal with all match options that the    :)
      (: FTSelection might bear: we add the match options :)
      (: to the current match options structure, and      :)
      (: pass the new structure to the recursive call.    :)
      let $newFTSelection := 
         <fts:selection>{$ftSelection/*
                           [fn:not(self::fts:matchOptions)]}</fts:selection>
      return fts:evaluate($newFTSelection, 
                          $searchContext, 
                          fts:replaceMatchOptions($matchOptions, 
                                              $ftSelection/fts:matchOptions),
                          $queryTokenNum)
   else if (fn:count($ftSelection/fts:weight) > 0) then
      (: Weight has no bearing on semantics -- just :)
      (: call "evaluate" on nested FTSelection     :)
      let $newFTSelection := 
         <fts:selection>{$ftSelection/*
                           [fn:not(self::fts:weight)]}</fts:selection>
      return fts:evaluate($newFTSelection, 
                          $searchContext, 
                          $matchOptions,
                          $queryTokenNum)
   else
      typeswitch ($ftSelection/*[1]) 
         case $nftSelection as element(fts:ftWords) return
            (: Apply the FTWords in the search context :)
            fts:ApplyFTWords($searchContext,
                             $matchOptions,
                             $nftSelection/@type,
                             $nftSelection/fts:queryItem,
                             $queryTokenNum + 1)
         case $nftSelection as element(fts:ftAnd) return
            let $left := fts:evaluate($nftSelection/fts:left,
                                     $searchContext,
                                     $matchOptions,
                                     $queryTokenNum)
            let $newQueryTokenNum := $left/@stokenNum
            let $right := fts:evaluate($nftSelection/fts:right,
                                      $searchContext,
                                      $matchOptions,
                                      $newQueryTokenNum)
            return fts:ApplyFTAnd($left, $right)
         case $nftSelection as element(fts:ftOr) return
            let $left := fts:evaluate($nftSelection/fts:left,
                                     $searchContext,
                                     $matchOptions,
                                     $queryTokenNum)
            let $newQueryTokenNum := $left/@stokenNum
            let $right := fts:evaluate($nftSelection/fts:right,
                                      $searchContext,
                                      $matchOptions,
                                      $newQueryTokenNum)
            return fts:ApplyFTOr($left, $right)
         case $nftSelection as element(fts:ftUnaryNot) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $queryTokenNum)
            return fts:ApplyFTUnaryNot($nested)
         case $nftSelection as element(fts:ftMildNot) return
            let $left := fts:evaluate($nftSelection/fts:left,
                                     $searchContext,
                                     $matchOptions,
                                     $queryTokenNum)
            let $newQueryTokenNum := $left/@stokenNum
            let $right := fts:evaluate($nftSelection/fts:right,
                                      $searchContext,
                                      $matchOptions,
                                      $newQueryTokenNum)
            return fts:ApplyFTMildNot($left, $right)
         case $nftSelection as element(fts:ftOrder) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $queryTokenNum)
            return fts:ApplyFTOrder($nested)
         case $nftSelection as element(fts:ftScope) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $queryTokenNum)
            return fts:ApplyFTScope($nftSelection/@type, 
                                    $nftSelection/@scope,
                                    $nested)
         case $nftSelection as element(fts:ftContent) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $queryTokenNum)
            return fts:ApplyFTContent($searchContext,
                                      $nftSelection/@type, 
                                      $nested)
         case $nftSelection as element(fts:ftDistance) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $queryTokenNum)
            return fts:ApplyFTDistance($nftSelection/@type,
                                       $nftSelection/fts:range,
                                       $nested)
         case $nftSelection as element(fts:ftWindow) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $queryTokenNum)
            return fts:ApplyFTWindow($nftSelection/@type,
                                     $nftSelection/@size,
                                     $nested)
         case $nftSelection as element(fts:ftTimes) return
            let $nested := fts:evaluate($nftSelection/fts:selection,
                                        $searchContext,
                                        $matchOptions,
                                        $queryTokenNum)
            return fts:ApplyFTTimes($nftSelection/fts:range,
                                    $nested)
         default return <fts:allMatches stokenNum="0" />
};
            

For concreteness, assume that the FTSelection was invoked inside a contains text expression such as searchContext contains text ftSelection. In order to determine the AllMatches result of ftSelection, the fts:evaluate function is invoked as follows: fts:evaluate($ftSelection, $searchContext, $matchOptions, 0), where $ftSelection is the XML representation of the ftSelection and $searchContext is bound to the result of the evaluation of the XQuery expression searchContext.

Initially, $queryTokenNum is 0, i.e., no query tokens have been processed.
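
The way the token counter is threaded through the recursion can be sketched in isolation. The following is a minimal illustration, not part of the normative semantics: it assumes a toy selection tree (the hypothetical elements conj, left, right, and words) in which every leaf holds exactly one query token, and a hypothetical function local:assign that only records the position assigned to each leaf.

```xquery
xquery version "1.0";

(: Toy selection tree, not the fts: schema: <conj> has <left> and   :)
(: <right> children, and <words> is a leaf holding one query token. :)
declare function local:assign(
      $selection as element(),
      $queryTokenNum as xs:integer )
   as element(numbered)
{
   typeswitch ($selection)
      case element(words) return
         (: A leaf consumes the next query token number. :)
         <numbered last="{$queryTokenNum + 1}">
            <leaf queryPos="{$queryTokenNum + 1}">{fn:string($selection)}</leaf>
         </numbered>
      case element(conj) return
         let $left := local:assign(fn:exactly-one($selection/left/*),
                                   $queryTokenNum)
         (: The right operand continues where the left one stopped. :)
         let $right := local:assign(fn:exactly-one($selection/right/*),
                                    xs:integer($left/@last))
         return
            <numbered last="{$right/@last}">
               {$left/leaf, $right/leaf}
            </numbered>
      default return <numbered last="{$queryTokenNum}"/>
};

local:assign(
   <conj>
      <left><words>Mustang</words></left>
      <right><words>Honda</words></right>
   </conj>,
   0)
```

Starting from 0, the left leaf receives position 1 and the right leaf position 2, which mirrors how fts:evaluate passes $left/@stokenNum to the evaluation of the right operand.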

The variable $matchOptions is bound to the list of match options as defined in the static context (see Appendix C Static Context Components). Match options embedded in $ftSelection modify the match options collection as evaluation proceeds.

Given the invocation fts:evaluate($ftSelection, $searchContext, $matchOptions, 0), evaluation proceeds as follows. First, $ftSelection is checked to see whether 1) it contains a match option, 2) it contains a weight specification, 3) it is an FTWords, or 4) none of the above hold.

  1. If $ftSelection contains one or more match options, these are combined with the inherited match options via a call to fts:replaceMatchOptions (see 4.2.5 Match Options Semantics). The evaluate function is then invoked on the nested FTSelection with the new set of match options, and the result of that call is returned.

  2. If $ftSelection contains a weight specification, then the specification is ignored because it does not alter the semantics. The evaluate function is recursively called on the nested FTSelection and the resulting AllMatches is returned.

  3. If $ftSelection is an FTWords, then it does not have any nested FTSelections. Consequently, this is the base of the recursive call, and the AllMatches result of the FTWords is computed and returned. The AllMatches is computed by invoking the ApplyFTWords function with the current search context and other necessary information.

  4. If $ftSelection contains neither a match option nor a weight specification and is not an FTWords, the FTSelection performs a full-text operation, such as ftand, ftor, or window. These operations are fully compositional and may be invoked on nested FTSelections. Consequently, evaluation proceeds as follows.

    • First, the evaluate function is recursively invoked on each nested FTSelection. The result of evaluating each nested FTSelection is an AllMatches.

    • The AllMatches are transformed into the resulting AllMatches by applying the full-text operation corresponding to the FTSelection being evaluated. In the code, this operation is generically named ApplyX for an FTSelection of kind X.

    For example, let FTSelection1 be FTSelection2 ftand FTSelection3. Here FTSelection2 and FTSelection3 may themselves be arbitrarily nested FTSelections. Thus, evaluate is invoked on FTSelection2 and FTSelection3, and the resulting AllMatches are transformed to the final AllMatches using the ApplyFTAnd function corresponding to ftand.

The semantics of the ApplyX function for each FTSelection kind X is given below.

4.2.4 FTWords

An FTWords that consists of a single query string, that is, a sequence of tokens to be matched as a phrase, is evaluated by the applyQueryTokensAsPhrase function. Its parameters are 1) the search context, 2) the list of match options, 3) the query string to be matched, given as a sequence of fts:queryToken items, and 4) the position where that query string occurs in the query.

(: simplified version not dealing with special match options :)
declare function fts:applyQueryTokensAsPhrase (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryTokens as element(fts:queryToken)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$queryPos}"> 
   {
      for $tokenInfo in
         fts:matchTokenInfos( 
            $searchContext,
            $matchOptions,
            (),
            $queryTokens )
      return  
         <fts:match>  
            <fts:stringInclude queryPos="{$queryPos}" isContiguous="true"> 
            {$tokenInfo}
            </fts:stringInclude> 
         </fts:match>
   } 
   </fts:allMatches>
};

If, after the application of all the match options, the sequence of query tokens returned for an FTWords is empty, an empty AllMatches is returned.

The AllMatches corresponding to an FTWords is a set of Matches. Each of the Matches is associated with a starting and an ending position indicating where the corresponding query tokens were found. For example, the AllMatches result for the FTWords "Mustang" is given below. To simplify the presentation in the figures we write Pos: N, if the attributes startPos and endPos are the same with N being that position.

FTWords example
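
As a textual complement to the figure, the shape of such an AllMatches can be sketched as follows. This is an illustration under stated assumptions: the token positions 1, 27, and 141 are invented, the element name fts:tokenInfo is assumed for the items returned by matchTokenInfos, and local:allMatchesForPositions is a hypothetical helper that mimics the element construction in applyQueryTokensAsPhrase.

```xquery
xquery version "1.0";
declare namespace fts = "http://www.w3.org/2007/xpath-full-text";

(: Builds one Match per token position, each with a single StringInclude. :)
declare function local:allMatchesForPositions(
      $queryPos as xs:integer,
      $positions as xs:integer* )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$queryPos}">
   {
      for $p in $positions
      return
         <fts:match>
            <fts:stringInclude queryPos="{$queryPos}" isContiguous="true">
               (: startPos equals endPos for a single-token match :)
               <fts:tokenInfo startPos="{$p}" endPos="{$p}"/>
            </fts:stringInclude>
         </fts:match>
   }
   </fts:allMatches>
};

(: Hypothetical positions at which "Mustang" was found. :)
local:allMatchesForPositions(1, (1, 27, 141))
```

Each fts:match corresponds to one occurrence of the query token, and an empty position list yields an AllMatches with no Matches.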

There are five variations of FTWords depending on how the tokens and phrases in the nested XQuery 1.0 and XPath 2.0 expression are matched.

  • When any word is specified, at least one token in the tokenization of the nested expression must be matched.

  • When all word is specified, all tokens in the tokenization of the nested expression must be matched.

  • When phrase is specified, all tokens in the tokenization of the nested expression must be matched as a phrase.

  • When any is specified, at least one string atomic value in the nested expression must be matched as a phrase.

  • When all is specified, all string atomic values in the nested expression must be matched as a phrase.
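
The five variations can be contrasted with a deliberately simplified sketch. It is not the semantics defined in this section: it returns a Boolean instead of an AllMatches, tokenizes on single spaces, ignores case, and ignores all match options. The functions local:tokens, local:hasPhrase, and local:matches are hypothetical names introduced for illustration.

```xquery
xquery version "1.0";

(: Simplified tokenizer: lower-case, split on single spaces. :)
declare function local:tokens($s as xs:string) as xs:string*
{
   fn:tokenize(fn:lower-case(fn:normalize-space($s)), " ")
};

(: True if $phrase occurs as a contiguous run inside $text. :)
declare function local:hasPhrase(
      $text as xs:string*,
      $phrase as xs:string* )
   as xs:boolean
{
   some $i in 1 to fn:count($text) - fn:count($phrase) + 1
   satisfies fn:deep-equal(
                fn:subsequence($text, $i, fn:count($phrase)), $phrase)
};

declare function local:matches(
      $text as xs:string,
      $type as xs:string,
      $strings as xs:string* )
   as xs:boolean
{
   let $textTokens := local:tokens($text)
   (: Flattened tokens over all query strings. :)
   let $allTokens := for $s in $strings return local:tokens($s)
   return
      if ($type eq "any word") then
         (some $t in $allTokens satisfies $t = $textTokens)
      else if ($type eq "all word") then
         (every $t in $allTokens satisfies $t = $textTokens)
      else if ($type eq "phrase") then
         local:hasPhrase($textTokens, $allTokens)
      else if ($type eq "any") then
         (some $s in $strings
          satisfies local:hasPhrase($textTokens, local:tokens($s)))
      else
         (every $s in $strings
          satisfies local:hasPhrase($textTokens, local:tokens($s)))
};

for $type in ("any word", "all word", "phrase", "any", "all")
return local:matches("Ford Mustang convertible in red",
                     $type,
                     ("red mustang", "ford"))
```

For this input the query yields true, true, false, true, false: every individual token occurs in the text, but "red mustang" never occurs as a phrase, so only the token-level variations and any succeed.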

The semantics for FTWords when any word is specified is given below. Since FTWords does not have nested FTSelections, the ApplyFTWords function does not take AllMatches parameters corresponding to nested FTSelection results.

declare function fts:MakeDisjunction (
      $curRes as element(fts:allMatches),
      $rest as element(fts:allMatches)* ) 
   as element(fts:allMatches) 
{
   if (fn:count($rest) = 0)
   then $curRes
   else 
      let $firstAllMatches := $rest[1]
      let $restAllMatches := fn:subsequence($rest, 2)
      let $newCurRes := fts:ApplyFTOr($curRes, 
                                      $firstAllMatches)
      return fts:MakeDisjunction($newCurRes, 
                                 $restAllMatches)
};

declare function fts:ApplyFTWordsAnyWord (
      $searchContext as item(), 
      $matchOptions as element(fts:matchOptions), 
      $queryItems as element(fts:queryItem)*,
      $queryPos as xs:integer ) 
   as element(fts:allMatches) 
{
   (: Tokenization of query string has already occurred. :)
   (: Get sequence of QueryTokens over all query items. :)
   let $queryTokens := $queryItems/fts:queryToken
   return
      if (fn:count($queryTokens) eq 0) 
      then <fts:allMatches stokenNum="0" />
      else
         let $allAllMatches := 
            for $queryToken at $pos in $queryTokens
            return fts:applyQueryTokensAsPhrase($searchContext,
                                                 $matchOptions,
                                                 $queryToken,
                                                 $queryPos + $pos - 1)
         let $firstAllMatches := $allAllMatches[1]
         let $restAllMatches := fn:subsequence($allAllMatches, 2)
         return fts:MakeDisjunction($firstAllMatches, $restAllMatches)
};

The tokenized query strings are passed to ApplyFTWordsAnyWord as a sequence of fts:queryItem, each containing the tokens of a single query string. A single flattened sequence of all tokens (of type fts:queryToken) over all query items is constructed. For each of these, the result of FTWords is computed using applyQueryTokensAsPhrase. Finally, the disjunction of all resulting AllMatches is computed.
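
The folding step performed by MakeDisjunction can be tried out on its own with the sketch below. It rests on an assumption: local:applyOr is a stand-in for fts:ApplyFTOr, taken here to keep the Matches of both operands and the larger stokenNum. The three input AllMatches are invented placeholders, one per query token.

```xquery
xquery version "1.0";
declare namespace fts = "http://www.w3.org/2007/xpath-full-text";

(: Stand-in for fts:ApplyFTOr (an assumption): keep the Matches of :)
(: both operands and the larger of the two stokenNum values.       :)
declare function local:applyOr(
      $a as element(fts:allMatches),
      $b as element(fts:allMatches) )
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{fn:max((xs:integer($a/@stokenNum),
                                       xs:integer($b/@stokenNum)))}">
      {$a/fts:match, $b/fts:match}
   </fts:allMatches>
};

(: Left fold of local:applyOr over the remaining AllMatches. :)
declare function local:makeDisjunction(
      $curRes as element(fts:allMatches),
      $rest as element(fts:allMatches)* )
   as element(fts:allMatches)
{
   if (fn:empty($rest))
   then $curRes
   else local:makeDisjunction(local:applyOr($curRes, $rest[1]),
                              fn:subsequence($rest, 2))
};

(: One hypothetical AllMatches per query token. :)
let $perToken :=
   for $pos in 1 to 3
   return
      <fts:allMatches stokenNum="{$pos}">
         <fts:match>
            <fts:stringInclude queryPos="{$pos}" isContiguous="true"/>
         </fts:match>
      </fts:allMatches>
return local:makeDisjunction($perToken[1], fn:subsequence($perToken, 2))
```

Under these assumptions the result is a single AllMatches with stokenNum 3 that contains all three Matches, one per alternative token.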

The semantics for FTWords when all word is specified is similar to the above, but composes a conjunction instead of a disjunction. It is given below.

declare function fts:MakeConjunction ( 
      $curRes as element(fts:allMatches),
      $rest as element(fts:allMatches)* ) 
   as element(fts:allMatches)
{
   if (fn:count($rest) = 0)
   then $curRes
   else 
      let $firstAllMatches := $rest[1]
      let $restAllMatches := fn:subsequence($rest, 2)
      let $newCurRes := fts:ApplyFTAnd($curRes, 
                                       $firstAllMatches)
      return fts:MakeConjunction($newCurRes, 
                                 $restAllMatches)
};

declare function fts:ApplyFTWordsAllWord (
      $searchContext as item(), 
      $matchOptions as element(fts:matchOptions), 
      $queryItems as element(fts:queryItem)*,
      $queryPos as xs:integer ) 
   as element(fts:allMatches) 
{
   (: Tokenization of query strings has already occurred. :)
   (: Get sequence of QueryTokens over all query items :)
   let $queryTokens := $queryItems/fts:queryToken
   return
      if (fn:count($queryTokens) eq 0) 
      then <fts:allMatches stokenNum="0" />
      else
         let $allAllMatches := 
            for $queryToken at $pos in $queryTokens
            return fts:applyQueryTokensAsPhrase($searchContext,
                                                 $matchOptions,
                                                 $queryToken,
                                                 $queryPos + $pos - 1)
         let $firstAllMatches := $allAllMatches[1]
         let $restAllMatches := fn:subsequence($allAllMatches, 2)
         return fts:MakeConjunction($firstAllMatches, $restAllMatches)
};

The semantics for FTWords if phrase is specified is given below.

declare function fts:ApplyFTWordsPhrase (
      $searchContext as item(), 
      $matchOptions as element(fts:matchOptions), 
      $queryItems as element(fts:queryItem)*,
      $queryPos as xs:integer ) 
   as element(fts:allMatches) 
{
   (: Get sequence of QueryTokens over all query items :)
   let $queryTokens := $queryItems/fts:queryToken
   return
      if (fn:count($queryTokens) eq 0) 
      then <fts:allMatches stokenNum="0" />
      else
         fts:applyQueryTokensAsPhrase($searchContext,
                                       $matchOptions,
                                       $queryTokens,
                                       $queryPos)
};

The ApplyFTWordsPhrase function also flattens the sequence of query items to a sequence of query tokens, but then calls applyQueryTokensAsPhrase on that entire sequence, instead of calling it on each query token individually. Hence, the sequence of all query tokens is matched as a single phrase and the computed TokenInfos are returned.

The semantics for FTWords when any is specified is given below.

declare function fts:ApplyFTWordsAny (
      $searchContext as item(), 
      $matchOptions as element(fts:matchOptions), 
      $queryItems as element(fts:queryItem)*,
      $queryPos as xs:integer ) 
   as element(fts:allMatches) 
{
   if (fn:count($queryItems) eq 0) 
   then <fts:allMatches stokenNum="0" />
   else 
      let $firstQueryItem := $queryItems[1]
      let $restQueryItem := fn:subsequence($queryItems, 2)
      let $firstAllMatches := 
         fts:ApplyFTWordsPhrase($searchContext,
                                $matchOptions,
                                $firstQueryItem,
                                $queryPos)
      let $newQueryPos := 
         if ($firstAllMatches//@queryPos) 
         then fn:max($firstAllMatches//@queryPos) + 1
         else $queryPos
      let $restAllMatches :=
         fts:ApplyFTWordsAny($searchContext,
                             $matchOptions,
                             $restQueryItem,
                             $newQueryPos)
      return fts:ApplyFTOr($firstAllMatches, $restAllMatches)
};

When any is specified, FTWords forms the disjunction of the AllMatches that result from matching each query item as a phrase.

The semantics for FTWords when all is specified is given below.

declare function fts:ApplyFTWordsAll (
      $searchContext as item(), 
      $matchOptions as element(fts:matchOptions), 
      $queryItems as element(fts:queryItem)*,
      $queryPos as xs:integer ) 
   as element(fts:allMatches) 
{
   if (fn:count($queryItems) = 0) 
   then <fts:allMatches stokenNum="0" />
   else 
      let $firstQueryItem := $queryItems[1]
      let $restQueryItem := fn:subsequence($queryItems, 2)
      let $firstAllMatches := 
         fts:ApplyFTWordsPhrase($searchContext,
                                $matchOptions,
                                $firstQueryItem,
                                $queryPos)
      return
         if ($restQueryItem) then
            let $newQueryPos := 
               if ($firstAllMatches//@queryPos) 
               then fn:max($firstAllMatches//@queryPos) + 1
               else $queryPos
            let $restAllMatches :=
               fts:ApplyFTWordsAll($searchContext,
                                   $matchOptions,
                                   $restQueryItem,
                                   $newQueryPos)
            return 
               fts:ApplyFTAnd($firstAllMatches, $restAllMatches)
         else $firstAllMatches
};

The difference between all and any is the use of conjunction instead of disjunction.

The ApplyFTWords function combines all of these functions.

declare function fts:ApplyFTWords ( 
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $type as fts:ftWordsType,
      $queryItems as element(fts:queryItem)*, 
      $queryPos as xs:integer )
   as element(fts:allMatches) 
{
   if ($type eq "any word")
   then fts:ApplyFTWordsAnyWord($searchContext,
                                $matchOptions,
                                $queryItems,
                                $queryPos)
   else if ($type eq "all word")
   then fts:ApplyFTWordsAllWord($searchContext,
                                $matchOptions,
                                $queryItems,
                                $queryPos)
   else if ($type eq "phrase")
   then fts:ApplyFTWordsPhrase($searchContext,
                               $matchOptions,
                               $queryItems,
                               $queryPos)
   else if ($type eq "any")
   then fts:ApplyFTWordsAny($searchContext,
                            $matchOptions,
                            $queryItems,
                            $queryPos)
   else fts:ApplyFTWordsAll($searchContext,
                            $matchOptions,
                            $queryItems,
                            $queryPos)
};
                

4.2.5 Match Options Semantics

4.2.5.1 Types

XQuery 1.0 functions are used to define the semantics of FTMatchOptions. These functions operate on an XML representation of the FTMatchOptions. The representation closely follows the syntax. Each FTMatchOption is represented by an XML element. Additional characteristics of the match option are represented as attributes. The schema is given below.

<xs:schema 
    xmlns:xs="http://www.w3.org/2001/XMLSchema" 
    xmlns:fts="http://www.w3.org/2007/xpath-full-text"
    targetNamespace="http://www.w3.org/2007/xpath-full-text"
    elementFormDefault="qualified" 
    attributeFormDefault="unqualified">

  <xs:complexType name="ftMatchOptions">
    <xs:sequence>
       <xs:element ref="fts:thesaurus" minOccurs="0" maxOccurs="1"/>
       <xs:element ref="fts:stopwords" minOccurs="0" maxOccurs="1"/>
       <xs:element ref="fts:case" minOccurs="0" maxOccurs="1"/>
       <xs:element ref="fts:diacritics" minOccurs="0" maxOccurs="1"/>
       <xs:element ref="fts:stem" minOccurs="0" maxOccurs="1"/>
       <xs:element ref="fts:wildcard" minOccurs="0" maxOccurs="1"/>
       <xs:element ref="fts:language" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="matchOptions" type="fts:ftMatchOptions"/>

  <xs:element name="case" type="fts:ftCaseOption" />
  <xs:element name="diacritics" type="fts:ftDiacriticsOption" />
  <xs:element name="thesaurus" type="fts:ftThesaurusOption" />
  <xs:element name="stem" type="fts:ftStemOption" />
  <xs:element name="wildcard" type="fts:ftWildCardOption" />
  <xs:element name="language" type="fts:ftLanguageOption" />
  <xs:element name="stopwords" type="fts:ftStopWordOption" /> 

  <xs:complexType name="ftCaseOption">
    <xs:sequence>
      <xs:element name="value">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="case insensitive"/>
            <xs:enumeration value="case sensitive"/>
            <xs:enumeration value="lowercase"/>
            <xs:enumeration value="uppercase"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="ftDiacriticsOption">
    <xs:sequence>
      <xs:element name="value">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="diacritics insensitive"/>
            <xs:enumeration value="diacritics sensitive"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
       
  <xs:complexType name="ftThesaurusOption">
    <xs:sequence>
      <xs:element name="thesaurusName" type="xs:string" 
                  minOccurs="0" maxOccurs="1"/>
      <xs:element name="relationship" type="xs:string" 
                  minOccurs="0" maxOccurs="1"/>
      <xs:element name="range" type="fts:ftRangeSpec" 
                  minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
    <xs:attribute name="thesaurusIndicator">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="using"/>
          <xs:enumeration value="no"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
 
  <xs:complexType name="ftRangeSpec">
    <xs:attribute name="type" 
                  type="fts:rangeSpecType" 
                  use="required"/>
    <xs:attribute name="m" 
                  type="xs:integer"/>
    <xs:attribute name="n" 
                  type="xs:integer" 
                  use="required"/>
  </xs:complexType>
  
  <xs:simpleType name="rangeSpecType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="exactly"/>
      <xs:enumeration value="at least"/>
      <xs:enumeration value="at most"/>
      <xs:enumeration value="from to"/>
    </xs:restriction>
  </xs:simpleType>
  
  <xs:complexType name="ftStemOption">
    <xs:sequence>
      <xs:element name="value">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="stemming"/>
            <xs:enumeration value="no stemming"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
 
  <xs:complexType name="ftWildCardOption">
    <xs:sequence>
      <xs:element name="value">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="wildcards"/>
            <xs:enumeration value="no wildcards"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
 
  <xs:complexType name="ftLanguageOption">
    <xs:sequence>
      <xs:element name="value" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="ftStopWordOption">
    <xs:sequence>
      <xs:choice>
        <xs:element name="default-stopwords">
            <xs:complexType />
        </xs:element>
        <xs:element name="stopword" type="xs:string" />
        <xs:element name="uri" type="xs:anyURI" />
      </xs:choice>
      <xs:element name="oper" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>
          <xs:choice>
            <xs:element name="stopword" type="xs:string" />
            <xs:element name="uri" type="xs:anyURI" />
          </xs:choice>
          <xs:attribute name="type">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:enumeration value="union"/>
                <xs:enumeration value="except"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
 
</xs:schema>            
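
To make the representation concrete, the following sketch constructs one possible instance and lists the option groups it contains. It is an unvalidated illustration: the option values, the stop words, and the language code "en" are invented, and local elements are placed in the fts namespace because the schema declares elementFormDefault="qualified".

```xquery
xquery version "1.0";
declare namespace fts = "http://www.w3.org/2007/xpath-full-text";

(: A hypothetical set of match options, in the order the schema requires. :)
let $matchOptions :=
   <fts:matchOptions>
      <fts:thesaurus thesaurusIndicator="no"/>
      <fts:stopwords>
         <fts:stopword>the</fts:stopword>
         <fts:oper type="union">
            <fts:stopword>a</fts:stopword>
         </fts:oper>
      </fts:stopwords>
      <fts:case><fts:value>case insensitive</fts:value></fts:case>
      <fts:stem><fts:value>no stemming</fts:value></fts:stem>
      <fts:language><fts:value>en</fts:value></fts:language>
   </fts:matchOptions>
return
   (: List the option groups that are present, in document order. :)
   for $option in $matchOptions/fts:*
   return fn:local-name($option)
```

The query returns the names thesaurus, stopwords, case, stem, and language; groups that are omitted, such as diacritics and wildcard here, are simply absent from the element.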

4.2.5.2 High-Level Semantics

The previous section described FTSelections without giving any details about how FTMatchOptions need to be interpreted. All processing of FTMatchOptions was delegated to the function matchTokenInfos, which is implementation-defined. In this section, further details on the semantics of FTMatchOptions are given.

The extension is achieved by modifying an existing function and adding functions that are specific to the FTMatchOptions.

Modifications in the semantics of existing functions

The semantics of most of the FTSelections remains unmodified. The modifications are to the method for matching a sequence of query tokens.

declare function fts:applyQueryTokensAsPhrase (
      $searchContext as item(),
      $matchOptions as element(fts:matchOptions),
      $queryTokens as element(fts:queryToken)*,
      $queryPos as xs:integer )
   as element(fts:allMatches)
{
   let $thesaurusOption := $matchOptions/fts:thesaurus[1]
   return 
      if ($thesaurusOption and 
          $thesaurusOption/@thesaurusIndicator eq "using") then
         let $noThesaurusOptions := 
            <fts:matchOptions>{
               $matchOptions/*[fn:not(self::fts:thesaurus)]
            }</fts:matchOptions>
         let $lookupRes := fts:applyThesaurusOption($thesaurusOption,
                                                    $noThesaurusOptions,
                                                    $queryTokens)            
         return fts:ApplyFTWordsAny($searchContext,
                                    $noThesaurusOptions,
                                    $lookupRes,
                                    $queryPos)
      else
         (: from here on we have a single sequence of query tokens :)
         (: which is to be matched as a phrase; no alternatives anymore :)
         <fts:allMatches stokenNum="{$queryPos}"> 
         {
            for $pos in
               fts:matchTokenInfos( 
                  $searchContext,
                  $matchOptions,
                  fts:applyStopWordOption($matchOptions/fts:stopwords),
                  $queryTokens )
            return  
               <fts:match>  
                  <fts:stringInclude queryPos="{$queryPos}" isContiguous="true"> 
                  {$pos}
                  </fts:stringInclude> 
               </fts:match>
         } 
         </fts:allMatches> 
};

Two FTMatchOptions need to be processed differently from the rest, as shown in the function above.

  • Unlike all other FTMatchOptions, the semantics of the FTThesaurusOption cannot be formulated as an operation on individual query tokens, because a thesaurus lookup may return alternative query items for a whole phrase, i.e., a sequence of query tokens. Since the result of a thesaurus lookup is a sequence of alternatives, there must be a higher level of processing. For the given sequence of query tokens (representing a phrase), the above call to applyThesaurusOption returns all thesaurus expansions for the selected thesaurus, relationship, and level range as a sequence of query items. The alternative expansions are evaluated as a disjunction using fts:ApplyFTWordsAny. The matching of the alternatives is performed with the FTThesaurusOption turned off to avoid double expansions, i.e., expansion of an already expanded token.

  • For the semantics of the FTStopWordOption the list of stop words needs to be computed as demanded by the special syntax for stop word lists involving the operators "union" and "except".

Semantics of new FTMatchOptions functions

The expansion of FTSelections also includes adding additional functions that are specific to the FTMatchOptions.

The evaluate function above handles match options occurring in the query structure by calling the function replaceMatchOptions, which is defined below. The latter function replaces match options from the list given by the first argument with match options of the same group in the list given by the second argument, if any. If an option is present in the second list but not in the first list, the option is included in the resulting list too. Intuitively, replaceMatchOptions computes the effective match options for a given FTSelection. The function uses the options specified specifically for the current FTSelection ($ftSelection/fts:matchOptions) to override any options of the same group declared up the query tree ($matchOptions).

declare function fts:replaceMatchOptions (
      $matchOptions as element(fts:matchOptions),
      $newMatchOptions as element(fts:matchOptions) )
   as element(fts:matchOptions)
{
   <fts:matchOptions>
   {
      (if ($newMatchOptions/fts:thesaurus) then $newMatchOptions/fts:thesaurus
       else $matchOptions/fts:thesaurus),
      (if ($newMatchOptions/fts:stopwords) then $newMatchOptions/fts:stopwords
       else $matchOptions/fts:stopwords),
      (if ($newMatchOptions/fts:case) then $newMatchOptions/fts:case
       else $matchOptions/fts:case),
      (if ($newMatchOptions/fts:diacritics) then $newMatchOptions/fts:diacritics
       else $matchOptions/fts:diacritics),
      (if ($newMatchOptions/fts:stem) then $newMatchOptions/fts:stem
       else $matchOptions/fts:stem),
      (if ($newMatchOptions/fts:wildcard) then $newMatchOptions/fts:wildcard
       else $matchOptions/fts:wildcard),
      (if ($newMatchOptions/fts:language) then $newMatchOptions/fts:language
       else $matchOptions/fts:language)
   }
   </fts:matchOptions>
};

This function determines how match options of the same group overwrite each other, so that only one option of the same group remains.
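
The override rule can be illustrated outside XQuery. The following is a minimal Python sketch, not part of the formal semantics: it assumes match options are modelled as a dictionary keyed by option group, and the names GROUPS, replace_match_options, inherited and local are hypothetical.

```python
# Option groups, in the order in which fts:replaceMatchOptions emits them.
GROUPS = ("thesaurus", "stopwords", "case", "diacritics",
          "stem", "wildcard", "language")

def replace_match_options(inherited, local):
    """Return the effective options: a local setting wins over an inherited one."""
    effective = {}
    for group in GROUPS:
        if group in local:
            effective[group] = local[group]
        elif group in inherited:
            effective[group] = inherited[group]
    return effective

# Options declared up the query tree, and options on the current FTSelection.
inherited = {"case": "case insensitive", "stem": "no stemming"}
local = {"case": "case sensitive", "wildcard": "wildcards"}
effective = replace_match_options(inherited, local)
```

In this sketch the local case option replaces the inherited one, the inherited stem option survives because the current FTSelection says nothing about stemming, and the wildcard option is added because only the current FTSelection declares it.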

The details of the semantics of the remaining FTMatchOptions are determined by the implementation-defined function matchTokenInfos.

4.2.5.3 Formal Semantics Functions

FTMatchOption functions which are necessary to support match option processing are given below.

declare function fts:resolveStopWordsUri ( $uri as xs:string? ) 
   as xs:string* external;

declare function fts:lookupThesaurus (
      $tokens as element(fts:queryToken)*,
      $thesaurusName as xs:string?, 
      $relationship as xs:string?,
      $range as element(fts:range)?,
      $noThesaurusOptions as element(fts:matchOptions) ) 
   as element(fts:queryItem)* external;

The function resolveStopWordsUri is used to resolve any URI to a sequence of strings to be used as stop words.

The function lookupThesaurus finds all expansions related to $tokens in the thesaurus $thesaurusName using the relationship $relationship within the optional number of levels $range. If $tokens consists of more than one query token, it is regarded as a phrase. The current match options other than the thesaurus option are also passed to the function, via $noThesaurusOptions, allowing the implementation to apply any of those match options (whichever it deems relevant) to the input or output of the actual thesaurus lookup.

The thesaurus function returns a sequence of expansion alternatives. Each alternative is regarded as a new search phrase and is represented as a query item. Alternatives are treated as though they are connected with a disjunction (FTOr).

4.2.5.4 FTCaseOption

FTMatchOptions of type FTCaseOption are passed in the $matchOptions parameter to matchTokenInfos. If the FTCaseOption is "lowercase" the returned TokenInfos must span only tokens that are all lowercase. If the FTCaseOption is "uppercase" the returned TokenInfos must span only tokens that are all uppercase. If the FTCaseOption is "case insensitive" the function must return all TokenInfos matching the query tokens when disregarding character case. If the FTCaseOption is "case sensitive" the function must return all TokenInfos that also accord with the query tokens in character case.

4.2.5.5 FTDiacriticsOption

FTMatchOptions of type FTDiacriticsOption are passed in the $matchOptions parameter to matchTokenInfos. If the FTDiacriticsOption is "diacritics insensitive" the function must return all TokenInfos matching the query tokens when disregarding diacritical marks. If the FTDiacriticsOption is "diacritics sensitive" the function must return all TokenInfos that also accord with the query tokens in diacritical marks.

4.2.5.6 FTStemOption

FTMatchOptions of type FTStemOption are passed in the $matchOptions parameter to matchTokenInfos. The effect of the option "stemming" on matching tokens is implementation-defined; however, it is expected that this option allows linguistic variants of the query tokens to be matched. If the FTStemOption is "no stemming" the returned TokenInfos must span exact matches (i.e. not including linguistic variations) of the query tokens.

4.2.5.7 FTThesaurusOption

The semantics for the FTThesaurusOption is given below.

declare function fts:applyThesaurusOption (
      $matchOption as element(fts:thesaurus),
      $noThesaurusOptions as element(fts:matchOptions),
      $queryTokens as element(fts:queryToken)* )
   as element(fts:queryItem)*
{
   if ($matchOption/@thesaurusIndicator = "using") then
      fts:lookupThesaurus( $queryTokens,
                           $matchOption/fts:thesaurusName,
                           $matchOption/fts:relationship,
                           $matchOption/fts:range,
                           $noThesaurusOptions )
   else if ($matchOption/@thesaurusIndicator = "no") then
      <fts:queryItem>
      {$queryTokens}
      </fts:queryItem>
   else ()
};

4.2.5.8 FTStopWordOption

Stop words interact with FTDistance and FTWindow. The semantics for the FTStopWordOption is given below.

declare function fts:applyStopWordOption (
      $stopWordOption as element(fts:stopwords)? )
   as xs:string*
{
   if ($stopWordOption) then
      let $swords := 
         typeswitch ($stopWordOption/*[1])
            case $e as element(fts:stopword) 
               return $e/text()
            case $e as element(fts:uri) 
               return fts:resolveStopWordsUri($e/text())
            case element(fts:default-stopwords)
               return fts:resolveStopWordsUri(())
            default return ()
      return fts:calcStopWords( $swords, $stopWordOption/fts:oper )
   else ()
};

declare function fts:calcStopWords ( 
      $stopWords as xs:string*,
      $opers as element(fts:oper)* )
   as xs:string*
{
   if ( fn:empty($opers) ) then $stopWords
   else
      let $swords := 
         typeswitch ($opers[1]/*[1])
            case $e as element(fts:stopword) 
               return $e/text()
            case $e as element(fts:uri) 
               return fts:resolveStopWordsUri($e/text())
            default return ()
      return
         if ($opers[1]/@type eq "union") then
            fts:calcStopWords( ($stopWords, $swords), 
                               $opers[fn:position() gt 1] )
         else (: "except" :)
            fts:calcStopWords( $stopWords[fn:not(. = $swords)],
                               $opers[fn:position() gt 1] )
};
            

Given the applicable setting of the Stop Word Option, the function fts:applyStopWordOption calls fts:calcStopWords to compute the set of stop words, and returns that set as an instance of xs:string*. This set is then passed to fts:matchTokenInfos, which uses it to affect the matching of tokens. The fts:calcStopWords function uses the function fts:resolveStopWordsUri to resolve any URI to a sequence of strings.
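
The left-to-right application of "union" and "except" can be illustrated as follows. This is a minimal Python sketch, not part of the formal semantics: it assumes each operator has already been resolved to a pair of its type and its list of words, and it uses a loop where fts:calcStopWords recurses. The names calc_stop_words, base and opers are hypothetical.

```python
def calc_stop_words(stop_words, opers):
    """Apply ("union" | "except", words) operators, left to right, to a base list."""
    result = list(stop_words)
    for kind, words in opers:
        if kind == "union":
            result = result + list(words)
        else:  # "except"
            result = [w for w in result if w not in words]
    return result

base = ["a", "the", "of"]
opers = [("union", ["and"]), ("except", ["the", "of"]), ("union", ["or"])]
result = calc_stop_words(base, opers)
```

Because the operators are applied in sequence, their order matters: removing a word and then adding it again leaves it in the list, whereas the reverse order removes it.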

4.2.5.9 FTLanguageOption

The FTLanguageOption is not associated with a semantics function. It is just a parameter to other semantics functions.

4.2.5.10 FTWildCardOption

FTMatchOptions of type FTWildCardOption are passed in the $matchOptions parameter to matchTokenInfos. If the FTWildCardOption is "wildcards" the function must return all TokenInfos in the search context that span tokens, such that those tokens are wildcard expansions of the corresponding query token. The wildcard expansions are described in Section 3.2.7 FTWildCardOption. If the FTWildCardOption is "no wildcards" all query tokens must be matched literally.

4.2.6 Full-Text Operators Semantics

4.2.6.1 FTOr

The parameters of the ApplyFTOr function are the two AllMatches parameters corresponding to the results of the two nested FTSelections. The semantics is given below.

declare function fts:ApplyFTOr (
      $allMatches1 as element(fts:allMatches),
      $allMatches2 as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{fn:max(($allMatches1/@stokenNum, 
                                       $allMatches2/@stokenNum))}">
   {$allMatches1/fts:match,$allMatches2/fts:match}
   </fts:allMatches>
};
            

The ApplyFTOr function creates a new AllMatches in which Matches are the union of those found in the input AllMatches. Each Match represents one possible result of the corresponding FTSelection. Thus, a Match from either of the AllMatches is a result.
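
The union can be illustrated with a minimal Python sketch, which is not part of the formal semantics. It assumes an AllMatches is modelled as a dictionary holding stokenNum and a list of Matches, and that each StringInclude is written as a (queryPos, startPos, endPos) tuple. The token positions are hypothetical.

```python
def apply_ft_or(all_matches1, all_matches2):
    """Model of fts:ApplyFTOr: keep every Match from both operands."""
    return {
        "stokenNum": max(all_matches1["stokenNum"], all_matches2["stokenNum"]),
        "matches": all_matches1["matches"] + all_matches2["matches"],
    }

# Hypothetical positions: "Mustang" (query position 1) at tokens 1 and 5,
# "Honda" (query position 2) at token 3. Each Match holds one StringInclude.
mustang = {"stokenNum": 1, "matches": [[(1, 1, 1)], [(1, 5, 5)]]}
honda = {"stokenNum": 2, "matches": [[(2, 3, 3)]]}
result = apply_ft_or(mustang, honda)
```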

For example, consider the FTSelection "Mustang" ftor "Honda". The AllMatches corresponding to "Mustang" and "Honda" are given below.

FTOr input AllMatches 1

FTOr input AllMatches 2

The AllMatches produced by ApplyFTOr is given below.

FTOr result AllMatches

4.2.6.2 FTAnd

The parameters of the ApplyFTAnd function are the two AllMatches corresponding to the results of the two nested FTSelections. The semantics is given below.

declare function fts:ApplyFTAnd (
      $allMatches1 as element(fts:allMatches),
      $allMatches2 as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{fn:max(($allMatches1/@stokenNum, 
                                       $allMatches2/@stokenNum))}" >
   {
      for $sm1 in $allMatches1/fts:match
      for $sm2 in $allMatches2/fts:match
      return <fts:match>
             {$sm1/*, $sm2/*}
             </fts:match>
   }
   </fts:allMatches>
};
            

The result of the conjunction is a new AllMatches that contains the "Cartesian product" of the Matches of the participating FTSelections. Every resulting Match is formed by combining the StringInclude and StringExclude components of one Match from each of the AllMatches of the nested FTSelections. Thus every resulting Match contains the positions needed to satisfy a Match from both original FTSelections and excludes the positions that violate the same Matches.
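
The pairing can be illustrated with a minimal Python sketch, which is not part of the formal semantics. It assumes the same simplified model as before: an AllMatches is a dictionary holding stokenNum and a list of Matches, and each StringInclude is a (queryPos, startPos, endPos) tuple. The token positions are hypothetical.

```python
from itertools import product

def apply_ft_and(all_matches1, all_matches2):
    """Model of fts:ApplyFTAnd: pair every Match of one operand with every Match of the other."""
    return {
        "stokenNum": max(all_matches1["stokenNum"], all_matches2["stokenNum"]),
        "matches": [m1 + m2 for m1, m2 in
                    product(all_matches1["matches"], all_matches2["matches"])],
    }

# Hypothetical positions: "Mustang" (query position 1) at tokens 1 and 5,
# "rust" (query position 2) at tokens 3 and 9.
mustang = {"stokenNum": 1, "matches": [[(1, 1, 1)], [(1, 5, 5)]]}
rust = {"stokenNum": 2, "matches": [[(2, 3, 3)], [(2, 9, 9)]]}
result = apply_ft_and(mustang, rust)
nothing = apply_ft_and(mustang, {"stokenNum": 2, "matches": []})
```

Two Matches on each side yield four combined Matches. If either operand has no Matches, the product is empty, so the conjunction has no Matches either.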

For example, consider the FTSelection "Mustang" ftand "rust". The source AllMatches are given below.

FTAnd input AllMatches 1

FTAnd input AllMatches 2

The AllMatches produced by ApplyFTAnd is given below.

FTAnd result AllMatches

4.2.6.3 FTUnaryNot

The ApplyFTUnaryNot function has one AllMatches parameter corresponding to the result of the nested FTSelection to be negated. The semantics is given below.

declare function fts:InvertStringMatch ( $strm as element(*,fts:stringMatch) ) 
   as element(*,fts:stringMatch)
{
   if ($strm instance of element(fts:stringExclude)) then
      <fts:stringInclude queryPos="{$strm/@queryPos}" isContiguous="{$strm/@isContiguous}">
      {$strm/fts:tokenInfo}
      </fts:stringInclude>
   else
      <fts:stringExclude queryPos="{$strm/@queryPos}" isContiguous="{$strm/@isContiguous}">
      {$strm/fts:tokenInfo}
      </fts:stringExclude>
};

declare function fts:UnaryNotHelper ( $matches as element(fts:match)* )
   as element(fts:match)*
{
   if (fn:empty($matches))
   then <fts:match/>
   else
      for $sm in $matches[1]/*
      for $rest in fts:UnaryNotHelper( fn:subsequence($matches, 2) )
      return 
         <fts:match>
         {
            fts:InvertStringMatch($sm),
            $rest/*
         }
         </fts:match>
};

declare function fts:ApplyFTUnaryNot (
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      fts:UnaryNotHelper($allMatches/fts:match)
   }
   </fts:allMatches>
};
            

The generation of the resulting AllMatches of an FTUnaryNot resembles the transformation of the negation of a propositional formula in DNF back to DNF. The negation of an AllMatches requires the inversion of all the StringMatches within the AllMatches.

In the functions above, this inversion occurs as follows.

  1. The function fts:InvertStringMatch inverts a StringInclude into a StringExclude and vice versa.

  2. The function fts:UnaryNotHelper transforms the source Matches into the resulting Matches by forming the combinations of the inversions of a StringInclude or StringExclude component over the source Matches into new Matches.
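
The two steps can be illustrated with a minimal Python sketch, which is not part of the formal semantics. It assumes a StringMatch is modelled as a (kind, queryPos, startPos, endPos) tuple and a Match as a list of such tuples. The names invert and unary_not_helper and the token positions are hypothetical.

```python
def invert(string_match):
    """Turn a StringInclude into a StringExclude and vice versa."""
    kind, query_pos, start, end = string_match
    flipped = "exclude" if kind == "include" else "include"
    return (flipped, query_pos, start, end)

def unary_not_helper(matches):
    """Pick one StringMatch from every source Match and invert it, in all possible ways."""
    if not matches:
        return [[]]  # a single empty Match
    first, rest = matches[0], matches[1:]
    return [[invert(sm)] + tail
            for sm in first
            for tail in unary_not_helper(rest)]

# Hypothetical source Matches for "Mustang" ftor "Honda":
# "Mustang" at tokens 1 and 5, "Honda" at token 3.
source = [[("include", 1, 1, 1)], [("include", 1, 5, 5)], [("include", 2, 3, 3)]]
negated = unary_not_helper(source)
```

Each source Match here has a single StringInclude, so the result is one Match that excludes all three positions. A source Match with two StringIncludes would instead produce two result Matches, one per inverted component.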

For example, consider the FTSelection ftnot ("Mustang" ftor "Honda"). The source AllMatches is given below:

FTUnaryNot input AllMatches

The FTUnaryNot transforms the StringIncludes to StringExcludes as illustrated below.

FTUnaryNot result AllMatches

4.2.6.4 FTMildNot

The parameters of the ApplyFTMildNot function are the two AllMatches parameters corresponding to the results of the two nested FTSelections. The semantics is given below.

declare function fts:CoveredIncludePositions (
       $match as element(fts:match) )
    as xs:integer*
{
    for $strInclude in $match/fts:stringInclude
    return $strInclude/fts:tokenInfo/@startPos
           to $strInclude/fts:tokenInfo/@endPos
};

declare function fts:ApplyFTMildNot (
       $allMatches1 as element(fts:allMatches),
       $allMatches2 as element(fts:allMatches) )
    as element(fts:allMatches)
{
    if (fn:count($allMatches1//fts:stringExclude) gt 0) then
       fn:error(fn:QName('http://www.w3.org/2005/xqt-errors', 'FTDY0017'), 
                "Invalid expression on the left-hand side of a not-in")
    else if (fn:count($allMatches2//fts:stringExclude) gt 0) then
       fn:error(fn:QName('http://www.w3.org/2005/xqt-errors', 'FTDY0017'), 
                "Invalid expression on the right-hand side of a not-in")
    else if (fn:count($allMatches2//fts:stringInclude) eq 0) then
       $allMatches1
    else
       <fts:allMatches stokenNum="{$allMatches1/@stokenNum}">
       {
          $allMatches1/fts:match[
             every $matches2 in $allMatches2/fts:match
             satisfies
                let $posSet1 := fts:CoveredIncludePositions(.)
                let $posSet2 := fts:CoveredIncludePositions($matches2)
                   return some $pos in $posSet1 satisfies fn:not($pos = $posSet2)
          ]
       }
       </fts:allMatches>
};
            

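The resulting AllMatches contains those Matches of the first operand that, compared with each Match of the second operand in turn, cover in their StringInclude components at least one token position that the StringInclude components of that Match do not cover. A Match of the first operand whose included positions all occur within a single Match of the second operand is dropped.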
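
The filtering can be illustrated with a minimal Python sketch, which is not part of the formal semantics. It assumes a StringMatch is modelled as a (kind, queryPos, startPos, endPos) tuple and a Match as a list of such tuples; the error is raised as a Python ValueError rather than through fn:error. The token positions are hypothetical.

```python
def covered_positions(match):
    """Token positions covered by the StringIncludes of one Match."""
    positions = set()
    for kind, _query_pos, start, end in match:
        if kind == "include":
            positions.update(range(start, end + 1))
    return positions

def apply_ft_mild_not(matches1, matches2):
    """Keep a Match of operand 1 unless some Match of operand 2 covers all its positions."""
    if any(sm[0] == "exclude" for m in matches1 + matches2 for sm in m):
        raise ValueError("FTDY0017: StringExclude in an operand of not-in")
    if not any(len(m) > 0 for m in matches2):
        return matches1
    return [m1 for m1 in matches1
            if all(covered_positions(m1) - covered_positions(m2)
                   for m2 in matches2)]

# Hypothetical positions: "Ford" at tokens 1, 27 and 40,
# "Ford Mustang" at tokens 1-2 and 27-28.
ford = [[("include", 1, 1, 1)], [("include", 1, 27, 27)], [("include", 1, 40, 40)]]
ford_mustang = [[("include", 2, 1, 2)], [("include", 2, 27, 28)]]
kept = apply_ft_mild_not(ford, ford_mustang)
```

Only the occurrence of "Ford" at position 40 survives, because it is the only one that is not part of an occurrence of "Ford Mustang".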

For example, consider the FTSelection ("Ford" not in "Ford Mustang"). The source AllMatches for the left-hand side argument is given below.

FTMildNot input AllMatches 1

The source AllMatches for the right-hand side argument is given below.

FTMildNot input AllMatches 2

The FTMildNot will transform these to an empty AllMatches because both position 1 and position 27 from the first AllMatches contain only TokenInfos from StringInclude components of the second AllMatches.

4.2.6.5 FTOrder

The ApplyFTOrder function has one AllMatches parameter corresponding to the result of the nested FTSelections. The semantics is given below.

declare function fts:ApplyFTOrder (
      $allMatches as element(fts:allMatches) )
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude
            satisfies (($stringInclude1/fts:tokenInfo/@startPos <= 
                        $stringInclude2/fts:tokenInfo/@startPos)
                       and
                       ($stringInclude1/@queryPos <= 
                        $stringInclude2/@queryPos))
                      or
                       (($stringInclude1/fts:tokenInfo/@startPos >= 
                         $stringInclude2/fts:tokenInfo/@startPos)
                        and
                        ($stringInclude1/@queryPos >= 
                         $stringInclude2/@queryPos))
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where every $stringIncl in $match/fts:stringInclude
                  satisfies (($stringExcl/fts:tokenInfo/@startPos <= 
                              $stringIncl/fts:tokenInfo/@startPos)
                             and
                              ($stringExcl/@queryPos <= 
                               $stringIncl/@queryPos))
                            or
                             (($stringExcl/fts:tokenInfo/@startPos >= 
                               $stringIncl/fts:tokenInfo/@startPos)
                              and
                              ($stringExcl/@queryPos >= 
                               $stringIncl/@queryPos))
            return $stringExcl
         }
         </fts:match>
   }         
   </fts:allMatches>
};
            

The resulting AllMatches contains the Matches for which the starting positions in the StringInclude elements are in the order of the query positions of their query strings. StringExcludes that preserve the order (with respect to their starting positions) are also retained.
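
The ordering test on StringIncludes can be illustrated with a minimal Python sketch, which is not part of the formal semantics. It assumes each StringInclude has a single TokenInfo and is modelled as a (queryPos, startPos) pair, and it leaves out the filtering of StringExcludes. The name is_ordered and the token positions are hypothetical.

```python
def is_ordered(includes):
    """True if, for every pair, start positions and query positions agree in order."""
    return all(
        (p1 <= p2 and q1 <= q2) or (p1 >= p2 and q1 >= q2)
        for q1, p1 in includes
        for q2, p2 in includes
    )

# Hypothetical Matches for ("great" ftand "condition") ordered:
# "great" has query position 1, "condition" has query position 2.
in_order = [(1, 10), (2, 11)]      # "great" precedes "condition"
out_of_order = [(1, 30), (2, 25)]  # "condition" precedes "great"
kept = [m for m in (in_order, out_of_order) if is_ordered(m)]
```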

For example, consider the FTSelection ("great" ftand "condition") ordered. The source AllMatches is given below.

FTOrder input AllMatches

The AllMatches for FTOrder are given below.

FTOrder result AllMatches

4.2.6.6 FTScope

The parameters of the ApplyFTScope function are 1) the type of the scope (same or different), 2) the linguistic unit (sentence or paragraph), and 3) one AllMatches parameter corresponding to the result of the nested FTSelections. The function definitions depend on the type of the scope (same, different) and the linguistic unit (sentence, paragraph).

The semantics of same sentence is given below.

declare function fts:ApplyFTScopeSameSentence (
      $allMatches as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude 
            satisfies $stringInclude1/fts:tokenInfo/@startSent = 
                      $stringInclude2/fts:tokenInfo/@startSent
                  and $stringInclude1/fts:tokenInfo/@startSent = 
                      $stringInclude1/fts:tokenInfo/@endSent
                  and $stringInclude2/fts:tokenInfo/@startSent = 
                      $stringInclude2/fts:tokenInfo/@endSent
                  and $stringInclude1/fts:tokenInfo/@startSent > 0
                  and $stringInclude2/fts:tokenInfo/@startSent > 0
      return 
        <fts:match>
        {
           $match/fts:stringInclude,
           for $stringExcl in $match/fts:stringExclude
           where
              $stringExcl/fts:tokenInfo/@startSent = 0
              or
              ($stringExcl/fts:tokenInfo/@startSent = 
               $stringExcl/fts:tokenInfo/@endSent
               and 
                  (every $stringIncl in $match/fts:stringInclude
                   satisfies $stringIncl/fts:tokenInfo/@startSent = 
                             $stringExcl/fts:tokenInfo/@startSent) )
           return $stringExcl
        }
        </fts:match>
   }
   </fts:allMatches>
};

An AllMatches returned by the scope same sentence contains those Matches whose StringIncludes span only a single sentence and all span the same sentence. In these Matches only those StringExcludes are retained that also only span a single sentence, which is, in case there are StringIncludes in that Match, the same as the one spanned by the StringIncludes.

The semantics of different sentence is given below.

declare function fts:ApplyFTScopeDifferentSentence (
      $allMatches as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where
         count($match/fts:stringInclude) > 1
         and
         (
            every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude  
            satisfies
               $stringInclude1 is $stringInclude2
               or
               (
                     ( $stringInclude1/fts:tokenInfo/@startSent !=  
                       $stringInclude2/fts:tokenInfo/@startSent 
                    or $stringInclude1/fts:tokenInfo/@startSent !=  
                       $stringInclude1/fts:tokenInfo/@endSent 
                    or $stringInclude2/fts:tokenInfo/@startSent !=  
                       $stringInclude2/fts:tokenInfo/@endSent ) 
                   and $stringInclude1/fts:tokenInfo/@startSent > 0 
                   and $stringInclude2/fts:tokenInfo/@endSent > 0
               )
         )
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where every $stringIncl in $match/fts:stringInclude
                  satisfies ($stringIncl/fts:tokenInfo/@startSent !=  
                             $stringExcl/fts:tokenInfo/@startSent 
                          or $stringIncl/fts:tokenInfo/@startSent !=  
                             $stringIncl/fts:tokenInfo/@endSent 
                          or $stringExcl/fts:tokenInfo/@startSent !=  
                             $stringExcl/fts:tokenInfo/@endSent ) 
                         and $stringIncl/fts:tokenInfo/@startSent > 0 
                         and $stringExcl/fts:tokenInfo/@endSent > 0
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

An AllMatches returned by the scope different sentence contains those Matches that have at least two StringIncludes, no two of which begin and end all in the same sentence. In these Matches only those StringExcludes are retained that do not conflict with any of the StringIncludes.
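
The two sentence predicates can be illustrated with a minimal Python sketch, which is not part of the formal semantics. It assumes each StringInclude has a single TokenInfo and is modelled as a (startSent, endSent) pair, with 0 standing for a token outside any sentence, and it leaves out the filtering of StringExcludes. The node identity test ("is") is modelled by comparing list indexes.

```python
def same_sentence(includes):
    """Every StringInclude lies within one sentence, and it is the same sentence for all."""
    return all(
        s1 == s2 and s1 == e1 and s2 == e2 and s1 > 0 and s2 > 0
        for s1, e1 in includes
        for s2, e2 in includes
    )

def different_sentence(includes):
    """At least two StringIncludes, and no two distinct ones lie within one common sentence."""
    return len(includes) > 1 and all(
        i == j or ((s1 != s2 or s1 != e1 or s2 != e2) and s1 > 0 and e2 > 0)
        for i, (s1, e1) in enumerate(includes)
        for j, (s2, e2) in enumerate(includes)
    )
```

For instance, two StringIncludes in sentence 1 satisfy same_sentence only, two in sentences 1 and 2 satisfy different_sentence only, and a single StringInclude never satisfies different_sentence.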

The semantics of same paragraph is analogous to same sentence and is given below.

declare function fts:ApplyFTScopeSameParagraph (
      $allMatches as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude  
            satisfies $stringInclude1/fts:tokenInfo/@startPara = 
                      $stringInclude2/fts:tokenInfo/@startPara
                  and $stringInclude1/fts:tokenInfo/@startPara = 
                      $stringInclude1/fts:tokenInfo/@endPara
                  and $stringInclude2/fts:tokenInfo/@startPara = 
                      $stringInclude2/fts:tokenInfo/@endPara
                  and $stringInclude1/fts:tokenInfo/@startPara > 0
                  and $stringInclude2/fts:tokenInfo/@endPara > 0
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where 
               $stringExcl/fts:tokenInfo/@startPara = 0
               or
               ($stringExcl/fts:tokenInfo/@startPara = 
                $stringExcl/fts:tokenInfo/@endPara
                and
                   (every $stringIncl in $match/fts:stringInclude
                    satisfies $stringIncl/fts:tokenInfo/@startPara = 
                              $stringExcl/fts:tokenInfo/@startPara) )
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of different paragraph is analogous to different sentence and is given below.

declare function fts:ApplyFTScopeDifferentParagraph (
      $allMatches as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      where
         count($match/fts:stringInclude) > 1
         and
         (
            every $stringInclude1 in $match/fts:stringInclude,
                  $stringInclude2 in $match/fts:stringInclude  
            satisfies
               $stringInclude1 is $stringInclude2
               or
               (
                     ( $stringInclude1/fts:tokenInfo/@startPara !=  
                       $stringInclude2/fts:tokenInfo/@startPara 
                    or $stringInclude1/fts:tokenInfo/@startPara !=  
                       $stringInclude1/fts:tokenInfo/@endPara 
                    or $stringInclude2/fts:tokenInfo/@startPara !=  
                       $stringInclude2/fts:tokenInfo/@endPara ) 
                   and $stringInclude1/fts:tokenInfo/@startPara > 0 
                   and $stringInclude2/fts:tokenInfo/@endPara > 0
               )
         )
      return 
         <fts:match>
         {
            $match/fts:stringInclude,
            for $stringExcl in $match/fts:stringExclude
            where every $stringIncl in $match/fts:stringInclude
                  satisfies ($stringIncl/fts:tokenInfo/@startPara !=  
                             $stringExcl/fts:tokenInfo/@startPara 
                          or $stringIncl/fts:tokenInfo/@startPara !=  
                             $stringIncl/fts:tokenInfo/@endPara 
                          or $stringExcl/fts:tokenInfo/@startPara !=  
                             $stringExcl/fts:tokenInfo/@endPara ) 
                         and $stringIncl/fts:tokenInfo/@startPara > 0 
                         and $stringExcl/fts:tokenInfo/@endPara > 0
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics for the general case is given below.

declare function fts:ApplyFTScope (
      $type as fts:scopeType,
      $selector as fts:scopeSelector, 
      $allMatches as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   if ($type eq "same" and $selector eq "sentence")
   then fts:ApplyFTScopeSameSentence($allMatches)
   else if ($type eq "different" and $selector eq "sentence")
      then fts:ApplyFTScopeDifferentSentence($allMatches)
   else if ($type eq "same" and $selector eq "paragraph")
      then fts:ApplyFTScopeSameParagraph($allMatches)
   else fts:ApplyFTScopeDifferentParagraph($allMatches)
};

For example, consider the FTSelection ("Mustang" ftand "Honda") same paragraph. The source AllMatches is given below.

FTScope input AllMatches

The FTScope returns an empty AllMatches because neither Match contains TokenInfos from a single paragraph.

4.2.6.7 FTContent

The parameters of the ApplyFTContent function are 1) the search context, 2) the type of the content match (at start, at end, or entire content), and 3) one AllMatches parameter corresponding to the result of the nested FTSelections.

The evaluation of ApplyFTContent depends on the type of the content match:

  • entire content retains those Matches such that for every token position in the search context, some StringInclude in the Match covers that token position.

  • at start retains those Matches that contain a StringInclude that covers the lowest token position in the search context.

  • at end retains those Matches that contain a StringInclude that covers the highest token position in the search context.

The semantics is given below.

declare function fts:ApplyFTContent (
      $searchContext as item(),
      $type as fts:contentMatchType,
      $allMatches as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      $allMatches/fts:match[
         let $start_pos := fts:getLowestTokenPosition($searchContext),
             $end_pos   := fts:getHighestTokenPosition($searchContext),
             $match     := .
         return
            if ($type eq "entire content") then
               every $pos in $start_pos to $end_pos
               satisfies
                  some $si in $match/fts:stringInclude[data(@isContiguous)]
                  satisfies
                     fts:TokenInfoCoversTokenPosition($si/fts:tokenInfo, $pos)
            else
               let $pos :=
                  if ($type eq "at start") then
                     $start_pos
                  else (: $type eq "at end" :)
                     $end_pos
               return
                  some $ti in $match/fts:stringInclude/fts:tokenInfo
                  satisfies
                     fts:TokenInfoCoversTokenPosition($ti, $pos)
      ]
   }
   </fts:allMatches>
};
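
The three content tests can be illustrated with a minimal Python sketch, which is not part of the formal semantics. It assumes each StringInclude has a single TokenInfo and is modelled as a (startPos, endPos, isContiguous) tuple, and that the lowest and highest token positions of the search context are passed in directly. The names covers and matches_content are hypothetical.

```python
def covers(include, pos):
    """True if the StringInclude spans the given token position."""
    start, end, _contiguous = include
    return start <= pos <= end

def matches_content(includes, kind, lowest, highest):
    """Decide whether one Match passes "entire content", "at start" or "at end"."""
    if kind == "entire content":
        # Only contiguous StringIncludes count towards covering the context.
        return all(
            any(inc[2] and covers(inc, pos) for inc in includes)
            for pos in range(lowest, highest + 1)
        )
    pos = lowest if kind == "at start" else highest
    return any(covers(inc, pos) for inc in includes)

# A search context with token positions 1 to 4.
contiguous = [(1, 2, True), (3, 4, True)]
non_contiguous = [(1, 3, False), (2, 4, False)]
```

The non-contiguous StringIncludes fail "entire content" even though their spans overlap the whole context, yet they still satisfy "at start" and "at end", which do not consult isContiguous.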

ApplyFTContent depends on the helper function fts:TokenInfoCoversTokenPosition, which ascertains whether the given $tokenInfo covers a particular $tokenPosition.

declare function fts:TokenInfoCoversTokenPosition(
      $tokenInfo as element(fts:tokenInfo),
      $tokenPosition as xs:integer )
   as xs:boolean
{
   ($tokenPosition >= $tokenInfo/@startPos)
   and
   ($tokenPosition <= $tokenInfo/@endPos)
};

ApplyFTContent also depends on two functions whose definitions are implementation-dependent: getLowestTokenPosition and getHighestTokenPosition return (respectively) the first and last token positions of the item $searchContext.

declare function fts:getLowestTokenPosition(
      $searchContext as item() )
   as xs:integer
   external;

declare function fts:getHighestTokenPosition(
      $searchContext as item() )
   as xs:integer
   external;

Note that the way @isContiguous is calculated in joinIncludes and used in ApplyFTContent can lead to counter-intuitive results. For example, consider the following query:

"one two three four"
contains text
   ("one" ftand "three" window 3 words)
   ftand
   ("two" ftand "four" window 3 words)
   entire content

Even though the four query tokens do cover all of the search context's token positions, the query yields false, because the Match that ApplyFTContent receives as input has two StringIncludes, each of which is non-contiguous.

4.2.6.8 FTWindow

Before we define the semantics functions of the FTWindow and FTDistance operations, we introduce the auxiliary function joinIncludes that will be used in their definitions. joinIncludes takes a sequence of StringIncludes of a Match and transforms it into either the empty sequence, in case the input sequence was empty, or otherwise a single StringInclude representing the span from the first position of the match to the last. For the purpose of being able to evaluate an "entire content" operator further up in the tree, we pre-evaluate whether all possible positions between first and last are covered in the input StringIncludes and store that boolean in the attribute "isContiguous".

declare function fts:joinIncludes(
      $strIncls as element(fts:stringInclude)* )
   as element(fts:stringInclude)?
{
   if (fn:empty($strIncls))
   then 
      $strIncls
   else
      let $posSet := fts:CoveredIncludePositions(<fts:match>{$strIncls}</fts:match>),
         $minPos := fn:min($strIncls/fts:tokenInfo/@startPos),
         $maxPos := fn:max($strIncls/fts:tokenInfo/@endPos),
         $isContiguous := 
            ( every $pos in $minPos to $maxPos
              satisfies ($pos = $posSet) )
            and
            ( every $strIncl in $strIncls
              satisfies $strIncl/@isContiguous )
      return
         <fts:stringInclude 
            queryPos="{$strIncls[1]/@queryPos}"
            isContiguous="{$isContiguous}">
            <fts:tokenInfo
               startPos ="{$minPos}"
               endPos   ="{$maxPos}"
               startSent="{fn:min($strIncls/fts:tokenInfo/@startSent)}"
               endSent  ="{fn:max($strIncls/fts:tokenInfo/@endSent)}"
               startPara="{fn:min($strIncls/fts:tokenInfo/@startPara)}"
               endPara  ="{fn:max($strIncls/fts:tokenInfo/@endPara)}"/>
         </fts:stringInclude>
};
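
As an informal illustration (not part of the normative semantics), the following sketch applies a simplified version of joinIncludes to the two operands of the query "one two three four" contains text ("one" ftand "three" window 3 words) ftand ("two" ftand "four" window 3 words) entire content. It assumes a processor without the fts schema, so the names local:include and local:joinIncludes, the elements in no namespace, the explicit casts, and the queryPos values are all assumptions made for the example. Only startPos and endPos are modelled, and the covered positions are computed inline instead of calling fts:CoveredIncludePositions.

```xquery
(: Hypothetical helper: a single-token StringInclude at position $pos. :)
declare function local:include($queryPos as xs:integer, $pos as xs:integer)
   as element(stringInclude)
{
   <stringInclude queryPos="{$queryPos}" isContiguous="true">
      <tokenInfo startPos="{$pos}" endPos="{$pos}"/>
   </stringInclude>
};

(: Simplified, schema-free model of fts:joinIncludes. :)
declare function local:joinIncludes(
      $strIncls as element(stringInclude)* )
   as element(stringInclude)?
{
   if (fn:empty($strIncls)) then ()
   else
      let $posSet := ( for $ti in $strIncls/tokenInfo
                       return xs:integer($ti/@startPos)
                              to xs:integer($ti/@endPos) ),
          $minPos := xs:integer(fn:min($posSet)),
          $maxPos := xs:integer(fn:max($posSet)),
          $isContiguous :=
             ( every $pos in $minPos to $maxPos
               satisfies ($pos = $posSet) )
             and
             ( every $strIncl in $strIncls
               satisfies xs:boolean($strIncl/@isContiguous) )
      return
         <stringInclude queryPos="{$strIncls[1]/@queryPos}"
                        isContiguous="{$isContiguous}">
            <tokenInfo startPos="{$minPos}" endPos="{$maxPos}"/>
         </stringInclude>
};

(: "one" at position 1, "three" at 3, "two" at 2, "four" at 4 :)
(
   local:joinIncludes((local:include(1, 1), local:include(2, 3))),
   local:joinIncludes((local:include(3, 2), local:include(4, 4)))
)
```

Under these assumptions, the two joined StringIncludes span positions 1 to 3 and 2 to 4, and both carry isContiguous="false", because position 2 and position 3, respectively, are not covered by the StringIncludes being joined.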

The parameters of the ApplyFTWindow function are 1) the unit of type fts:distanceType, 2) a size, and 3) one AllMatches parameter corresponding to the result of the nested FTSelections. For each unit type a function is defined as follows.

The semantics of window N words is given below.

declare function fts:ApplyFTWordWindow (
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $minpos := fn:min($match/fts:stringInclude/fts:tokenInfo/@startPos),
          $maxpos := fn:max($match/fts:stringInclude/fts:tokenInfo/@endPos)
      for $windowStartPos in ($maxpos - $n + 1 to $minpos)
      let $windowEndPos := $windowStartPos + $n - 1
      return 
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExclude in $match/fts:stringExclude
            where $stringExclude/fts:tokenInfo/@startPos >=
                  $windowStartPos
              and $stringExclude/fts:tokenInfo/@endPos <=
                  $windowEndPos
            return $stringExclude
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of window N sentences is given below.

declare function fts:ApplyFTSentenceWindow (
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $minpos := fn:min($match/fts:stringInclude/fts:tokenInfo/@startSent),
          $maxpos := fn:max($match/fts:stringInclude/fts:tokenInfo/@endSent)
      for $windowStartPos in ($maxpos - $n + 1 to $minpos)
      let $windowEndPos := $windowStartPos + $n - 1
      return 
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExclude in $match/fts:stringExclude
            where $stringExclude/fts:tokenInfo/@startSent >=
                  $windowStartPos
              and $stringExclude/fts:tokenInfo/@endSent <=
                  $windowEndPos
            return $stringExclude
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of window N paragraphs is given below.

declare function fts:ApplyFTParagraphWindow (
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches)
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $minpos := fn:min($match/fts:stringInclude/fts:tokenInfo/@startPara),
          $maxpos := fn:max($match/fts:stringInclude/fts:tokenInfo/@endPara)
      for $windowStartPos in ($maxpos - $n + 1 to $minpos)
      let $windowEndPos := $windowStartPos + $n - 1
      return 
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExclude in $match/fts:stringExclude
            where $stringExclude/fts:tokenInfo/@startPara >=
                  $windowStartPos
              and $stringExclude/fts:tokenInfo/@endPara <=
                  $windowEndPos
            return $stringExclude
         }
         </fts:match>
   }
   </fts:allMatches>
};

The resulting AllMatches contains those Matches of the operand for which there exists a sequence of the specified number of consecutive (token, sentence, or paragraph) positions such that all StringIncludes lie within that window; only the StringExcludes that also lie within that window are retained. For each Match that satisfies the window condition, the StringIncludes are joined into a single StringInclude. This allows further window or distance operations to treat the result as a single entity.
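
The interaction between window placement and StringExcludes can be sketched as follows. This is an illustrative fragment only: the positions, the window size, and the element name window are assumptions chosen for the example, and plain integers stand in for the values that the semantic functions read from fts:tokenInfo.

```xquery
(: Hypothetical Match: the StringIncludes span token positions 3 to 5,
   single-token StringExcludes sit at positions 2 and 6, and the
   selection is "window 4 words". :)
let $n        := 4,
    $minpos   := 3,
    $maxpos   := 5,
    $excludes := (2, 6)
(: Same enumeration of window placements as in fts:ApplyFTWordWindow. :)
for $windowStartPos in ($maxpos - $n + 1 to $minpos)
let $windowEndPos := $windowStartPos + $n - 1
return
   <window start="{$windowStartPos}" end="{$windowEndPos}"
           retainedExcludes="{ $excludes[. ge $windowStartPos
                                         and . le $windowEndPos] }"/>
```

Each placement yields its own Match in the result. Here there are two: the window covering positions 2 to 5 retains only the StringExclude at position 2, and the window covering positions 3 to 6 retains only the one at position 6.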

The semantics for the general function is given below.

declare function fts:ApplyFTWindow (
      $type as fts:distanceType,
      $size as xs:integer,
      $allMatches as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   if ($type eq "word") then
      fts:ApplyFTWordWindow($allMatches, $size)
   else if ($type eq "sentence") then 
      fts:ApplyFTSentenceWindow($allMatches, $size)
   else
      fts:ApplyFTParagraphWindow($allMatches, $size)
};

For example, consider the FTWindow selection ("Ford Mustang" ftand "excellent") window 10 words. The Matches of the source AllMatches for ("Ford Mustang" ftand "excellent") are given below.

[Figure: FTWindow AllMatches, the six Matches of the source AllMatches]

The result for the FTWindow selection consists of only the first, the fifth, and the sixth Matches because their respective window sizes are 5, 4, and 9.
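
The window sizes quoted above follow from maxpos - minpos + 1. The sketch below recomputes them for the first and fifth Matches, using the token positions that the FTDistance example in the next section gives for those Matches (1, 2, 5 and 25, 27, 28). It also adds one hypothetical Match spanning positions 1 to 27 to show the rejection case. The element and attribute names are illustrative only.

```xquery
for $m in (
      <span label="first Match"        min="1"  max="5"/>,
      <span label="fifth Match"        min="25" max="28"/>,
      <span label="hypothetical Match" min="1"  max="27"/> )
let $minpos := xs:integer($m/@min),
    $maxpos := xs:integer($m/@max),
    $n      := 10,
    (: window placements, enumerated as in fts:ApplyFTWordWindow :)
    $starts := ($maxpos - $n + 1 to $minpos)
return
   <result match="{$m/@label}"
           windowSize="{$maxpos - $minpos + 1}"
           placements="{fn:count($starts)}"/>
```

With window 10 words, the first and fifth Matches have window sizes 5 and 4 and admit 6 and 7 placements respectively, while the hypothetical Match has window size 27 and admits none. A Match survives exactly when at least one placement exists, that is, when its window size does not exceed N.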

4.2.6.9 FTDistance

The parameters of the ApplyFTDistance function are 1) the unit of the distance (tokens, sentences, paragraphs), of type fts:distanceType, 2) the range specified, and 3) one AllMatches parameter corresponding to the result of the nested FTSelections. The resulting AllMatches contains Matches of the operand that satisfy the condition that the distance for every pair of consecutive StringIncludes is within the specified interval, where the distance is measured (in tokens, sentences, or paragraphs) from the end of the preceding StringInclude to the start of the next.

An invocation of the ApplyFTDistance function will call one of twelve helper functions, each of which handles a particular unit of distance and type of range.

declare function fts:ApplyFTDistance (
      $type as fts:distanceType,
      $range as element(fts:range),
      $allMatches as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   if ($type eq "word") then
      if ($range/@type eq "exactly") then
         fts:ApplyFTWordDistanceExactly($allMatches, $range/@n)
      else if ($range/@type eq "at least") then 
         fts:ApplyFTWordDistanceAtLeast($allMatches, $range/@n)
      else if ($range/@type eq "at most") then
         fts:ApplyFTWordDistanceAtMost( $allMatches, $range/@n)
      else
         fts:ApplyFTWordDistanceFromTo( $allMatches, $range/@m, $range/@n)
   else if ($type eq "sentence") then
      if ($range/@type eq "exactly") then
         fts:ApplyFTSentenceDistanceExactly($allMatches, $range/@n)
      else if ($range/@type eq "at least") then
         fts:ApplyFTSentenceDistanceAtLeast($allMatches, $range/@n)
      else if ($range/@type eq "at most") then
         fts:ApplyFTSentenceDistanceAtMost( $allMatches, $range/@n)
      else
         fts:ApplyFTSentenceDistanceFromTo( $allMatches, $range/@m, $range/@n)
   else
      if ($range/@type eq "exactly") then
         fts:ApplyFTParagraphDistanceExactly($allMatches, $range/@n)
      else if ($range/@type eq "at least") then
         fts:ApplyFTParagraphDistanceAtLeast($allMatches, $range/@n)
      else if ($range/@type eq "at most") then
         fts:ApplyFTParagraphDistanceAtMost( $allMatches, $range/@n)
      else
         fts:ApplyFTParagraphDistanceFromTo( $allMatches, $range/@m, $range/@n)
};

Word Distance

The semantics of word distance exactly N is given below.

declare function fts:ApplyFTWordDistanceExactly(
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                     order by $si/fts:tokenInfo/@startPos ascending,
                              $si/fts:tokenInfo/@endPos ascending
                     return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $idx in 1 to fn:count($sorted) - 1
            satisfies fts:wordDistance(
                         $sorted[$idx]/fts:tokenInfo,
                         $sorted[$idx+1]/fts:tokenInfo
                      ) = $n 
      return 
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:wordDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo
                            ) = $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of word distance at least N is given below.

declare function fts:ApplyFTWordDistanceAtLeast (
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                     order by $si/fts:tokenInfo/@startPos ascending,
                              $si/fts:tokenInfo/@endPos ascending
                     return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:wordDistance(
                         $sorted[$index]/fts:tokenInfo,
                         $sorted[$index+1]/fts:tokenInfo
                      ) >= $n 
      return 
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:wordDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo
                            ) >= $n
            return $stringExcl
         }
         </fts:match>
   }           
   </fts:allMatches>
};

The semantics of word distance at most N is given below.

declare function fts:ApplyFTWordDistanceAtMost (
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                     order by $si/fts:tokenInfo/@startPos ascending,
                              $si/fts:tokenInfo/@endPos ascending
                     return $si
      where
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:wordDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo
                      ) <= $n 
      return 
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:wordDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo
                            ) <= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The semantics of word distance from M to N is given below.

declare function fts:ApplyFTWordDistanceFromTo (
      $allMatches as element(fts:allMatches),
      $m as xs:integer,
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                     order by $si/fts:tokenInfo/@startPos ascending,
                              $si/fts:tokenInfo/@endPos ascending
                     return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:wordDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo
                      ) >= $m 
                      and
                      fts:wordDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo
                      ) <= $n 
      return 
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:wordDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo
                            ) >= $m
                            and
                            fts:wordDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo
                            ) <= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The preceding four helper functions all rely on fts:wordDistance, which returns the number of token positions that occur between two TokenInfos. For example, two tokens with consecutive positions have a distance of 0 tokens, and two overlapping tokens have a distance of -1 tokens.

declare function fts:wordDistance (
      $tokenInfo1 as element(fts:tokenInfo),
      $tokenInfo2 as element(fts:tokenInfo) )
   as xs:integer
{
   (: Ensure tokens are in order :)
   let $sorted :=
      for $ti in ($tokenInfo1, $tokenInfo2)
      order by $ti/@startPos ascending, $ti/@endPos ascending
      return $ti
   return
      (: -1 because we count starting at 0 :)
      $sorted[2]/@startPos - $sorted[1]/@endPos - 1
};
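
The distances mentioned above can be reproduced with a schema-free variant of fts:wordDistance. This is a sketch under stated assumptions: the name local:wordDistance and the tokenInfo elements in no namespace are chosen for the example, and the xs:integer casts are needed only because the attributes are untyped here (without them, the order by clause would compare the positions as strings).

```xquery
declare function local:wordDistance (
      $tokenInfo1 as element(tokenInfo),
      $tokenInfo2 as element(tokenInfo) )
   as xs:integer
{
   (: Ensure tokens are in order :)
   let $sorted :=
      for $ti in ($tokenInfo1, $tokenInfo2)
      order by xs:integer($ti/@startPos) ascending,
               xs:integer($ti/@endPos) ascending
      return $ti
   return
      xs:integer($sorted[2]/@startPos) - xs:integer($sorted[1]/@endPos) - 1
};

(
   (: two positions lie between a phrase at 1-2 and a token at 5 :)
   local:wordDistance(<tokenInfo startPos="1" endPos="2"/>,
                      <tokenInfo startPos="5" endPos="5"/>),
   (: argument order does not matter: the tokens are sorted first :)
   local:wordDistance(<tokenInfo startPos="27" endPos="28"/>,
                      <tokenInfo startPos="25" endPos="25"/>),
   (: consecutive positions :)
   local:wordDistance(<tokenInfo startPos="3" endPos="3"/>,
                      <tokenInfo startPos="4" endPos="4"/>),
   (: overlapping tokens :)
   local:wordDistance(<tokenInfo startPos="7" endPos="7"/>,
                      <tokenInfo startPos="7" endPos="7"/>)
)
```

The four calls evaluate to 2, 1, 0, and -1, matching the cases described in the text.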
            

Sentence Distance

The semantics of sentence distance exactly N is given below.

declare function fts:ApplyFTSentenceDistanceExactly (
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                     order by $si/fts:tokenInfo/@startSent ascending,
                              $si/fts:tokenInfo/@endSent ascending
                     return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:sentenceDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo
                      ) = $n 
      return 
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:sentenceDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo
                            ) = $n
            return $stringExcl
         }
         </fts:match>
   }           
   </fts:allMatches>
};

The semantics of sentence distance at least N is given below.

declare function fts:ApplyFTSentenceDistanceAtLeast (
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                     order by $si/fts:tokenInfo/@startSent ascending,
                              $si/fts:tokenInfo/@endSent ascending
                     return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:sentenceDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo
                      ) >= $n 
      return 
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:sentenceDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo
                            ) >= $n
            return $stringExcl
         }
         </fts:match>
   }           
   </fts:allMatches>
};

The semantics of sentence distance at most N is given below.

declare function fts:ApplyFTSentenceDistanceAtMost (
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                     order by $si/fts:tokenInfo/@startSent ascending,
                              $si/fts:tokenInfo/@endSent ascending
                     return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:sentenceDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo
                      ) <= $n 
      return 
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:sentenceDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo
                            ) <= $n
            return $stringExcl
         }
         </fts:match>
   }           
   </fts:allMatches>
};

The semantics of sentence distance from M to N is given below.

declare function fts:ApplyFTSentenceDistanceFromTo (
      $allMatches as element(fts:allMatches),
      $m as xs:integer,
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                     order by $si/fts:tokenInfo/@startSent ascending,
                              $si/fts:tokenInfo/@endSent ascending
                     return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:sentenceDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo
                      ) >= $m 
                      and
                      fts:sentenceDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo
                      ) <= $n 
      return 
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:sentenceDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo
                            ) >= $m
                            and
                            fts:sentenceDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo
                            ) <= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The preceding four helper functions all rely on fts:sentenceDistance, which returns the number of sentences between two TokenInfos.

declare function fts:sentenceDistance (
      $tokenInfo1 as element(fts:tokenInfo),
      $tokenInfo2 as element(fts:tokenInfo) )
   as xs:integer
{
   (: Ensure tokens are in order :)
   let $sorted :=
      for $ti in ($tokenInfo1, $tokenInfo2)
      order by $ti/@startPos ascending, $ti/@endPos ascending
      return $ti
   return
      (: -1 because we count starting at 0 :)
      $sorted[2]/@startSent - $sorted[1]/@endSent - 1
};
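
As with word distance, the formula counts the sentences strictly between the two TokenInfos. The following sketch illustrates this with a schema-free variant of the function. The name local:sentenceDistance, the unnamespaced tokenInfo elements, the explicit casts, and all positions are assumptions made for the example.

```xquery
declare function local:sentenceDistance (
      $tokenInfo1 as element(tokenInfo),
      $tokenInfo2 as element(tokenInfo) )
   as xs:integer
{
   (: Ensure tokens are in order :)
   let $sorted :=
      for $ti in ($tokenInfo1, $tokenInfo2)
      order by xs:integer($ti/@startPos) ascending,
               xs:integer($ti/@endPos) ascending
      return $ti
   return
      xs:integer($sorted[2]/@startSent) - xs:integer($sorted[1]/@endSent) - 1
};

let $a := <tokenInfo startPos="3"  endPos="3"  startSent="1" endSent="1"/>,
    $b := <tokenInfo startPos="8"  endPos="8"  startSent="1" endSent="1"/>,
    $c := <tokenInfo startPos="15" endPos="15" startSent="2" endSent="2"/>,
    $d := <tokenInfo startPos="40" endPos="40" startSent="4" endSent="4"/>
return (
   local:sentenceDistance($a, $b),   (: same sentence :)
   local:sentenceDistance($a, $c),   (: adjacent sentences :)
   local:sentenceDistance($d, $a)    (: sentences 2 and 3 lie between :)
)
```

The three calls evaluate to -1, 0, and 2. Under this formula, two tokens in the same sentence therefore have a sentence distance of -1, and tokens in adjacent sentences have a distance of 0.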
            

Paragraph Distance

The semantics of paragraph distance exactly N is given below.

declare function fts:ApplyFTParagraphDistanceExactly (
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                     order by $si/fts:tokenInfo/@startPara ascending,
                              $si/fts:tokenInfo/@endPara ascending
                     return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:paraDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo
                      ) = $n 
      return 
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:paraDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo
                            ) = $n
            return $stringExcl
         }
         </fts:match>
   }           
   </fts:allMatches>
};

The semantics of paragraph distance at least N is given below.

declare function fts:ApplyFTParagraphDistanceAtLeast (
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                     order by $si/fts:tokenInfo/@startPara ascending,
                              $si/fts:tokenInfo/@endPara ascending
                     return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:paraDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo
                      ) >= $n 
      return 
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:paraDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo
                            ) >= $n
            return $stringExcl
         }
         </fts:match>
   }           
   </fts:allMatches>
};

The semantics of paragraph distance at most N is given below.

declare function fts:ApplyFTParagraphDistanceAtMost (
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                     order by $si/fts:tokenInfo/@startPara ascending,
                              $si/fts:tokenInfo/@endPara ascending
                     return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:paraDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo
                      ) <= $n 
      return 
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:paraDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo
                            ) <= $n
            return $stringExcl
         }
         </fts:match>
   }           
   </fts:allMatches>
};

The semantics of paragraph distance from M to N is given below.

declare function fts:ApplyFTParagraphDistanceFromTo (
      $allMatches as element(fts:allMatches),
      $m as xs:integer,
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}">
   {
      for $match in $allMatches/fts:match
      let $sorted := for $si in $match/fts:stringInclude          
                     order by $si/fts:tokenInfo/@startPara ascending,
                              $si/fts:tokenInfo/@endPara ascending
                     return $si
      where 
         if (fn:count($sorted) le 1) then fn:true() else
            every $index in (1 to fn:count($sorted) - 1)
            satisfies fts:paraDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo
                      ) >= $m 
                      and
                      fts:paraDistance(
                          $sorted[$index]/fts:tokenInfo,
                          $sorted[$index+1]/fts:tokenInfo
                      ) <= $n 
      return 
         <fts:match>
         {
            fts:joinIncludes($match/fts:stringInclude),
            for $stringExcl in $match/fts:stringExclude
            where some $stringIncl in $match/fts:stringInclude
                  satisfies fts:paraDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo
                            ) >= $m
                            and
                            fts:paraDistance(
                                $stringIncl/fts:tokenInfo,
                                $stringExcl/fts:tokenInfo
                            ) <= $n
            return $stringExcl
         }
         </fts:match>
   }
   </fts:allMatches>
};

The preceding four helper functions all rely on fts:paraDistance, which returns the number of paragraphs between two TokenInfos.

declare function fts:paraDistance (
      $tokenInfo1 as element(fts:tokenInfo),
      $tokenInfo2 as element(fts:tokenInfo) )
   as xs:integer
{
   (: Ensure tokens are in order :)
   let $sorted :=
      for $ti in ($tokenInfo1, $tokenInfo2)
      order by $ti/@startPos ascending, $ti/@endPos ascending
      return $ti
   return
      (: -1 because we count starting at 0 :)
      $sorted[2]/@startPara - $sorted[1]/@endPara - 1
};
            

For example, consider the FTDistance selection ("Ford Mustang" ftand "excellent") distance at most 3 words. The Matches of the source AllMatches for ("Ford Mustang" ftand "excellent") are given below.

[Figure: FTDistance input AllMatches, the six Matches of the source AllMatches]

The result for the FTDistance selection consists of only the first Match (with positions 1, 2, and 5) and the fifth Match (with positions 25, 27, and 28), because these are the only Matches in which the word distance between consecutive TokenInfos is always less than or equal to 3. For the first Match, the word distance between the two TokenInfos is 2 (startPos 5 - endPos 2 - 1), and for the fifth Match, it is 1 (startPos 27 - endPos 25 - 1).
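
The filtering step of this example can be sketched as follows. The fragment is illustrative and not part of the formal semantics: local:dist, local:withinAtMost, the id attribute, and the unnamespaced elements are assumptions made for the example, and the third Match is hypothetical. The positions of the first and fifth Matches are those given in the text.

```xquery
(: Word distance between two TokenInfos, assuming $a precedes $b. :)
declare function local:dist(
      $a as element(tokenInfo),
      $b as element(tokenInfo) )
   as xs:integer
{
   xs:integer($b/@startPos) - xs:integer($a/@endPos) - 1
};

(: True if every pair of consecutive StringIncludes is at most $n apart. :)
declare function local:withinAtMost(
      $match as element(match),
      $n as xs:integer )
   as xs:boolean
{
   let $sorted :=
      for $ti in $match/stringInclude/tokenInfo
      order by xs:integer($ti/@startPos) ascending,
               xs:integer($ti/@endPos) ascending
      return $ti
   return
      every $i in 1 to fn:count($sorted) - 1
      satisfies local:dist($sorted[$i], $sorted[$i + 1]) le $n
};

let $matches := (
   <match id="first">
      <stringInclude><tokenInfo startPos="1" endPos="2"/></stringInclude>
      <stringInclude><tokenInfo startPos="5" endPos="5"/></stringInclude>
   </match>,
   <match id="fifth">
      <stringInclude><tokenInfo startPos="25" endPos="25"/></stringInclude>
      <stringInclude><tokenInfo startPos="27" endPos="28"/></stringInclude>
   </match>,
   <match id="hypothetical">
      <stringInclude><tokenInfo startPos="1" endPos="2"/></stringInclude>
      <stringInclude><tokenInfo startPos="25" endPos="25"/></stringInclude>
   </match> )
for $m in $matches
where local:withinAtMost($m, 3)
return fn:string($m/@id)
```

The query returns "first" and "fifth". The hypothetical Match is dropped because its two TokenInfos are 22 positions apart. For a Match with at most one StringInclude, the range 1 to fn:count($sorted) - 1 is empty, so the quantified expression is true and the Match is kept.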

4.2.6.10 FTTimes

The parameters of the ApplyFTTimes function are 1) an FTRange specification, and 2) a parameter corresponding to the result of the nested FTWords.

The function definitions depend on the range specification FTRange to limit the number of occurrences.

The general semantics is given below.

declare function fts:FormCombinations (
      $sms as element(fts:match)*, 
      $k as xs:integer ) 
   as element(fts:match)*
(:
   Find all combinations of exactly $k elements from $sms, and
   for each such combination, construct a match whose children are
   copies of all the children of all the elements in the combination.
   Return the sequence of all such matches.
:)
{
   if ($k eq 0) then <fts:match/>
   else if (fn:count($sms) lt $k) then ()
   else if (fn:count($sms) eq $k) then <fts:match>{$sms/*}</fts:match>
   else
      let $first := $sms[1],
          $rest  := fn:subsequence($sms, 2)
      return (
         (: all the combinations that don't involve $first :)
         fts:FormCombinations($rest, $k),

         (: and all the combinations that do involve $first :)
         for $combination in fts:FormCombinations($rest, $k - 1)
         return
            <fts:match>
            {
               $first/*,
               $combination/*
            }
            </fts:match>
      )
};

declare function fts:FormCombinationsAtLeast (
      $sms as element(fts:match)*,
      $times as xs:integer)
   as element(fts:match)*
(:
   Find all combinations of $times or more elements from $sms, and
   for each such combination, construct a match whose children are
   copies of all the children of all the elements in the combination.
   Return the sequence of all such matches.
:)
{
   for $k in $times to fn:count($sms)
   return fts:FormCombinations($sms, $k)
};

declare function fts:FormRange (
      $sms as element(fts:match)*, 
      $l as xs:integer, 
      $u as xs:integer, 
      $stokenNum as xs:integer ) 
   as element(fts:allMatches)
{
   if ($l > $u) then <fts:allMatches stokenNum="0" />
   else 
      let $am1 := <fts:allMatches stokenNum="{$stokenNum}">
                     {fts:FormCombinationsAtLeast($sms, $l)}
                  </fts:allMatches>
      let $am2 := <fts:allMatches stokenNum="{$stokenNum}">
                     {fts:FormCombinationsAtLeast($sms, $u+1)}
                  </fts:allMatches>
      return fts:ApplyFTAnd($am1,
                            fts:ApplyFTUnaryNot($am2))
};
            

The semantics of occurs exactly N times is given below.

declare function fts:ApplyFTTimesExactly (
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   fts:FormRange($allMatches/fts:match, $n, $n, $allMatches/@stokenNum)      
};

The semantics of occurs at least N times is given below.

declare function fts:ApplyFTTimesAtLeast (
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   <fts:allMatches stokenNum="{$allMatches/@stokenNum}"> 
   {fts:FormCombinationsAtLeast($allMatches/fts:match, $n)} 
   </fts:allMatches>
};

The semantics of occurs at most N times is given below.

declare function fts:ApplyFTTimesAtMost (
      $allMatches as element(fts:allMatches),
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   fts:FormRange($allMatches/fts:match, 0, $n, $allMatches/@stokenNum)
};

The semantics of occurs from M to N times is given below.

declare function fts:ApplyFTTimesFromTo (
      $allMatches as element(fts:allMatches),
      $m as xs:integer,
      $n as xs:integer ) 
   as element(fts:allMatches) 
{
   fts:FormRange($allMatches/fts:match, $m, $n, $allMatches/@stokenNum)  
};

The way to ensure that there are at least N different matches of an FTSelection is to ensure that at least N of its Matches occur simultaneously. This is similar to forming their conjunction by combining N or more distinct Matches into one simple match. Therefore, the AllMatches for the selection condition specifying the range qualifier at least N contains the possible combinations of N or more simple matches of the operand. This operation is performed in the function fts:FormCombinationsAtLeast.

The range [L, U] is represented by the condition at least L and not at least U+1. This transformation is performed in the function fts:FormRange.
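
The combination and range logic described above can also be modelled outside XQuery. The following Python fragment is only an illustrative sketch, not part of the formal semantics. It assumes that each simple match is a tuple of token positions, and the names form_combinations_at_least and occurs_in_range are hypothetical. It further assumes that all occurrences are already known, so the negation applied in fts:FormRange reduces to a plain boolean test.

```python
from itertools import combinations


def form_combinations_at_least(matches, n):
    """Return every combination of n or more distinct matches.

    Each combination is merged into one simple match, here a sorted
    tuple of token positions (an assumed, simplified representation).
    """
    result = []
    for k in range(n, len(matches) + 1):
        for combo in combinations(matches, k):
            merged = tuple(sorted(pos for m in combo for pos in m))
            result.append(merged)
    return result


def occurs_in_range(matches, lower, upper):
    """Model the range [lower, upper] as: at least lower and not at least upper+1."""
    if lower > upper:
        return False
    at_least_lower = bool(form_combinations_at_least(matches, lower))
    at_least_upper_plus_1 = bool(form_combinations_at_least(matches, upper + 1))
    return at_least_lower and not at_least_upper_plus_1


# Three occurrences of a token, at positions 1, 5 and 9.
matches = [(1,), (5,), (9,)]
print(form_combinations_at_least(matches, 2))
print(occurs_in_range(matches, 2, 3))
```

With three single-token matches, asking for at least 2 yields the three pairs plus the one triple, and the range [2, 3] is satisfied because no combination of four matches exists.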

The semantics for the general case is given below.

declare function fts:ApplyFTTimes (
      $range as element(fts:range),
      $allMatches as element(fts:allMatches) ) 
   as element(fts:allMatches) 
{
   if (fn:count($allMatches//fts:stringExclude) gt 0) then
      fn:error(fn:QName('http://www.w3.org/2005/xqt-errors',
                        'XPST0003'))
   else if ($range/@type eq "exactly") then
      fts:ApplyFTTimesExactly($allMatches, $range/@n)
   else if ($range/@type eq "at least") then 
      fts:ApplyFTTimesAtLeast($allMatches, $range/@n)
   else if ($range/@type eq "at most") then
      fts:ApplyFTTimesAtMost($allMatches, $range/@n)
   else fts:ApplyFTTimesFromTo($allMatches, 
                               $range/@m, 
                               $range/@n)
};

The above function performs a sanity check to ensure that the nested AllMatches is a result of the evaluation of FTWords as defined in the grammar rule for FTPrimary. Otherwise, an error [err:XPST0003] is raised.
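
The dispatch performed by fts:ApplyFTTimes can be pictured as mapping each range qualifier onto inclusive bounds on the number of occurrences. The Python sketch below is a simplified model under stated assumptions: it works on a plain occurrence count rather than on an AllMatches structure, the names range_bounds and apply_ft_times are hypothetical, and a ValueError stands in for raising [err:XPST0003].

```python
import math


def range_bounds(range_type, n, m=None):
    """Map a range qualifier onto inclusive (lower, upper) occurrence bounds."""
    if range_type == "exactly":
        return (n, n)
    if range_type == "at least":
        return (n, math.inf)
    if range_type == "at most":
        return (0, n)
    # Remaining case, as in the final else branch: from M to N.
    return (m, n)


def apply_ft_times(match_count, has_string_excludes, range_type, n, m=None):
    """Return True if match_count lies within the requested range."""
    if has_string_excludes:
        # Mirrors the sanity check: the operand must not contain StringExcludes.
        raise ValueError("err:XPST0003")
    lower, upper = range_bounds(range_type, n, m)
    return lower <= match_count <= upper


print(apply_ft_times(3, False, "at least", 2))
print(apply_ft_times(3, False, "exactly", 2))
```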

For example, consider the FTTimes selection "Mustang" occurs at least 2 times. The source AllMatches of the FTWords selection "Mustang" is given below.

FTTimes input AllMatches

The result consists of the pairs of the Matches.

FTTimes result AllMatches

4.3 FTContainsExpr

Consider an FTContainsExpr expression of the form SearchContext contains text FTSelection, where SearchContext is an XQuery 1.0 expression that returns a sequence of items. The FTContainsExpr returns true if and only if at least one of those items satisfies the FTSelection.

If the FTContainsExpr is of the form SearchContext contains text FTSelection without content IgnoreExpr for some XQuery 1.0 expression IgnoreExpr, then any nodes returned by IgnoreExpr are (notionally) pruned from each search context item before attempting to satisfy the FTSelection.

More formally, evaluation of an FTContainsExpr proceeds according to the following steps. Where appropriate, the explanation includes references to arcs labelled "FTn" in the processing model diagram (Figure 1) in 2.1 Processing Model.

  1. For each XQuery/XPath expression nested within the FTContainsExpr, evaluate it with respect to the same dynamic context as the FTContainsExpr (FT1). Specifically:

    1. Evaluate the search context expression (SearchContext), resulting in the sequence of search context items.

    2. Evaluate the ignore option (IgnoreExpr) if any, resulting in the set of ignored nodes.

    3. At each FTWordsValue, evaluate the literal/expression and convert the result to xs:string*.

    4. At each weight specification, evaluate the expression and convert the result to xs:double.

    5. At each FTWindow and FTRange, evaluate the AdditiveExpr(s) and convert each to xs:integer.

  2. Using the settings of the match option components in the FTContainsExpr's static context, construct an element(fts:matchOptions) structure.

  3. Based on the parse-tree of the FTContainsExpr's FTSelection and the results of steps 1c-1e, construct an element(*,fts:ftSelection) structure. We refer to this as the "operator tree" below. In this process:

    1. Construct the operator tree from the top down, propagating FTMatchOptions down to FTWordsValues.

    2. Tokenize the query string(s) obtained at 1c. (FT2.1)

  4. Call the function fts:FTContainsExpr (see declaration below), passing the following arguments to its parameters:

    • $searchContextItems: The sequence of items returned by SearchContext, calculated in step 1a.

    • $ignoreNodes: The sequence of items returned by IgnoreExpr (in 1b), if that expression is present, or the empty sequence otherwise.

    • $ftSelection: The XML node representation of FTSelection (constructed in step 3).

    • $defOptions: The XML representation of the match options in the FTContainsExpr's static context (constructed in step 2).

    Within the function, for each search context item:

    1. Delete the ignored nodes from the search context item. [fts:FTContainsExpr calls fts:reconstruct.]

    2. Traverse the operator tree from the top down, propagating FTMatchOptions down to FTWordsValues. [fts:evaluate calls itself and fts:replaceMatchOptions.]

    3. At each FTWordsValue, using the prevailing FTMatchOptions:

      1. Tokenize the search context obtained at 4a. (FT2.2) (Whether this pays any attention to FTMatchOptions is up to the implementation.) [This happens within fts:matchTokenInfos.]

      2. Match the search context tokens and the query tokens, yielding an element(fts:tokenInfo)* structure. [This happens within fts:matchTokenInfos.]

      3. Convert that into an element(fts:allMatches). (FT3) [This happens in fts:applyQueryTokensAsPhrase.]

    4. Traverse the operator tree from the bottom up. At each point, the AllMatches instances produced by subtrees are taken as input, and a new AllMatches instance is obtained as output. (FT4) [This is most of the section 4 code.]

    5. If the topmost AllMatches instance contains a Match with no StringExcludes, then the search context item satisfies the full-text condition given by the FTSelection, and the call to fts:FTContainsExpr returns true. [This is handled by the QuantifiedExpr in fts:FTContainsExpr.]

    [Note that the section 4 code doesn't implement 4b-4d as three sequential steps. Instead, they are different aspects of a single traversal of the operator tree.]

    If none of the topmost AllMatches provides a successful match, then fts:FTContainsExpr returns false.

  5. The boolean value returned by the call to fts:FTContainsExpr is the value of the FTContainsExpr. (FT5)

declare function fts:FTContainsExpr (
      $searchContextItems as item()*,
      $ignoreNodes as node()*,
      $ftSelection as element(*,fts:ftSelection),
      $defOptions as element(fts:matchOptions) )
   as xs:boolean 
{ 
   some $searchContext in $searchContextItems
   satisfies 
      let $newSearchContext := fts:reconstruct( $searchContext, $ignoreNodes )
      return
         if (fn:empty($newSearchContext)) then fn:false()
         else
            let $allMatches := fts:evaluate($ftSelection,
                                            $newSearchContext,
                                            $defOptions,
                                            0)
            return 
               some $match in $allMatches/fts:match
               satisfies 
                  fn:count($match/fts:stringExclude) eq 0
};

declare function fts:reconstruct (
      $n as item(),
      $ignore as node()* )
   as item()?
{
   typeswitch ($n)
     case node() return
        if (some $i in $ignore satisfies $n is $i) then ()
        else if ($n instance of element()) then
           let $nodeName := fn:node-name($n)
           let $nodeContent := for $nn in $n/node()
                               return fts:reconstruct($nn,$ignore)
           return element {$nodeName} {$nodeContent}
        else if ($n instance of document-node()) then
           document {
              for $nn in $n/node()
              return fts:reconstruct($nn, $ignore)
           }
        else $n
     default return $n
};
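
The pruning performed by fts:reconstruct, and the way fts:FTContainsExpr quantifies over the search context items, can be illustrated with a small Python model. This is a sketch under several assumptions: the Node class is a hypothetical stand-in for XML nodes, ignored nodes are compared by identity (as with the "is" operator above), and a single-word test replaces the full evaluation of an FTSelection.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical, minimal stand-in for an element node."""
    name: str
    text: str = ""
    children: list = field(default_factory=list)


def reconstruct(node, ignored):
    """Copy node, dropping every subtree whose root is one of the ignored nodes."""
    if any(node is i for i in ignored):
        return None
    kept = [c for c in (reconstruct(ch, ignored) for ch in node.children)
            if c is not None]
    return Node(node.name, node.text, kept)


def text_of(node):
    """Concatenate the text of a node and its descendants."""
    parts = [node.text] + [text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


def contains_text(items, ignored, word):
    """True if some search context item, after pruning, contains the word."""
    for item in items:
        pruned = reconstruct(item, ignored)
        if pruned is None:
            continue  # the whole item was ignored
        if word.lower() in text_of(pruned).lower().split():
            return True
    return False


note = Node("note", "rust")
offer = Node("offer", "great Mustang", [note])
print(contains_text([offer], [], "rust"))
print(contains_text([offer], [note], "rust"))
```

Because reconstruct builds a copy, the original tree is left untouched; only the copy lacks the ignored subtrees.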
            

4.4 Scoring

This section addresses the semantics of scoring variables in XQuery 1.0 for and let clauses and XPath 2.0 for expressions.

Scoring variables associate a numeric score with the result of the evaluation of XQuery 1.0 and XPath 2.0 expressions. This numeric score estimates how relevant a result item is to the user's information need, as expressed by the XQuery 1.0 or XPath 2.0 expression. The numeric score is computed using an implementation-dependent scoring algorithm.

There are numerous scoring algorithms used in practice. Most of the scoring algorithms take as inputs a query and a set of results to the query. In computing the score, these algorithms rely on the structure of the query to estimate the relevance of the results.

In the context of defining the semantics of XQuery and XPath Full Text, passing the structure of the query poses a problem. The query may contain XQuery 1.0 and XPath 2.0 expressions and XQuery and XPath Full Text 1.0 expressions in particular. The semantics of XQuery 1.0 and XPath 2.0 expressions is defined using (among other things) functions that take as arguments sequences of items and return sequences of items. They are not aware of what expression produced a particular sequence, i.e., they are not aware of the expression structure.

To define the semantics of scoring in XQuery and XPath Full Text 1.0 using XQuery 1.0, expressions that produce the query result (or the functions that implement the expressions) must be passed as arguments. In other words, second-order functions are necessary. Currently XQuery 1.0 and XPath 2.0 do not provide such functions.

Nevertheless, in the interest of the exposition, assume that such second-order functions are present. In particular, assume that there are two semantic second-order functions, fts:score and fts:scoreSequence, each of which takes one argument (an expression). The function fts:score returns the score value of that expression, and fts:scoreSequence returns a sequence of score values, one for each item to which the expression evaluates. The scores must satisfy the scoring properties.

A for clause containing a score variable

for $result score $score in Expr
...

is evaluated as though it is replaced by the following set of clauses.

let $scoreSeq := fts:scoreSequence(Expr)
for $result at $i in Expr
let $score := $scoreSeq[$i]
...

Here, $scoreSeq and $i are new variables, not appearing elsewhere, and fts:scoreSequence is the second-order function.

Similarly, a let clause containing a score variable

let score $score := Expr
...

is evaluated as though it is replaced by the following clause.

let $score := fts:score(Expr)
...
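
The rewriting of a for clause with a score variable can be mimicked in a language that does have second-order functions. The Python sketch below is purely illustrative. The class ContainsQuery is a hypothetical expression object that keeps its query structure (a single search term) available to the scorer, and the term-frequency formula is a toy choice made for this example, since the actual scoring algorithm is implementation-dependent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContainsQuery:
    """Hypothetical expression object: the items to filter plus the search term."""
    items: List[str]
    term: str

    def evaluate(self) -> List[str]:
        """Return the items that contain the term as a whole token."""
        return [s for s in self.items if self.term in s.lower().split()]


def score_sequence(expr: ContainsQuery) -> List[float]:
    """Toy stand-in for fts:scoreSequence: one score per result item."""
    scores = []
    for item in expr.evaluate():
        tokens = item.lower().split()
        scores.append(tokens.count(expr.term) / len(tokens))
    return scores


def for_with_score(expr: ContainsQuery):
    """Yield (result, score) pairs, following the rewritten clauses."""
    score_seq = score_sequence(expr)               # let $scoreSeq := ...
    for i, result in enumerate(expr.evaluate()):   # for $result at $i in Expr
        yield result, score_seq[i]                 # let $score := $scoreSeq[$i]


query = ContainsQuery(["great great car", "old car", "a great deal"], "great")
for result, score in for_with_score(query):
    print(result, score)
```

The key point is that score_sequence receives the expression itself rather than its value, which is what allows it to inspect the search term.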

4.5 Example

This section presents a more complex example for the evaluation of FTContainsExpr. This example uses the same sample document fragment and assigns it to $doc. Consider the following FTContainsExpr.

    $doc contains text (
        (
            "Mustang"
            ftand
            ({("great", "excellent")} any word occurs at least 2 times)
            window 11 words
        )
        ftand
        ftnot "rust"
    ) same paragraph

Begin by evaluating the FTSelection to AllMatches.

    (
        (
            "Mustang"
            ftand
            ({("great", "excellent")} any word occurs at least 2 times)
            window 11 words
        )
        ftand
        ftnot "rust"
    ) same paragraph

Step 1: Evaluate the FTWords "Mustang".

Example, step 1

Step 2: Evaluate the FTWords {("great", "excellent")} any word.

Step 2.1: Match the token "great"

Example, step 2

Step 2.2: Match the token "excellent"

Example, step 3

Step 2.3: Combine the above AllMatches as if FTOr is used, i.e., by forming a union of the Matches.

Example, step 4

Step 3: Apply the FTTimes {("great", "excellent")} any word occurs at least 2 times forming two pairs of Matches.

Example, step 5.1

Example, step 5.2

Step 4: Apply the FTAnd "Mustang" ftand ({("great", "excellent")} any word occurs at least 2 times) forming all possible pairs of StringMatches.

Example, step 6.1

Example, step 6.2

Example, step 6.3

Example, step 6.4

Example, step 6.5

Step 5: Apply the FTWindow ( "Mustang" ftand ({("great", "excellent")} any word occurs at least 2 times) window 11 words ), keeping only the Matches whose window spans at most 11 tokens.

Example, step 7.1

Example, step 7.2

Step 6: Evaluate FTWords "rust".

Example, step 8

Step 7: Apply the FTUnaryNot ftnot "rust", transforming the StringInclude into a StringExclude.

Example, step 9

Step 8: Apply the FTAnd ( ( "Mustang" ftand ({("great", "excellent")} any word occurs at least 2 times) window 11 words ) ftand ftnot "rust" ), forming all possible combinations of three StringMatches from the first AllMatches and one StringMatch from the second AllMatches.

Example, step 10.1

Example, step 10.2

Example, step 10.3

Step 9: Apply the FTScope, filtering out Matches whose TokenInfos are not within the same paragraph (assuming the <offer> elements determine paragraph boundaries).

Example, step 11

The resulting AllMatches contains a Match that does not contain a StringExclude. Therefore, the sample FTContainsExpr returns true.
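
The flow of this example can be replayed on a flat list of tokens with a small Python model. This is a simplified sketch rather than the formal semantics. It assumes that a Match is just a pair of position sets (includes and excludes), that tokenization is whitespace splitting, and that the same paragraph scope can be left out. All function names are hypothetical, and the window filter ignores StringExcludes, which is sufficient here because the window is applied before the negation.

```python
from itertools import combinations, product
from typing import FrozenSet, List, NamedTuple


class Match(NamedTuple):
    includes: FrozenSet[int]  # positions of tokens that must be present
    excludes: FrozenSet[int]  # positions of tokens that must be absent


def ft_words(tokens: List[str], word: str) -> List[Match]:
    """One Match per occurrence of word (positions are 1-based)."""
    return [Match(frozenset([pos]), frozenset())
            for pos, tok in enumerate(tokens, start=1)
            if tok.lower() == word.lower()]


def merge(parts) -> Match:
    """Union the includes and the excludes of several Matches."""
    parts = list(parts)
    return Match(frozenset().union(*(m.includes for m in parts)),
                 frozenset().union(*(m.excludes for m in parts)))


def ft_or(a, b):
    return a + b


def ft_and(a, b):
    return [merge((x, y)) for x in a for y in b]


def ft_times_at_least(matches, n):
    return [merge(c) for k in range(n, len(matches) + 1)
            for c in combinations(matches, k)]


def ft_window(matches, size):
    return [m for m in matches
            if not m.includes or max(m.includes) - min(m.includes) + 1 <= size]


def ft_unary_not(matches):
    """Pick one position from every Match and invert it (include <-> exclude)."""
    options = [[("inc", p) for p in m.includes] + [("exc", p) for p in m.excludes]
               for m in matches]
    return [Match(frozenset(p for kind, p in choice if kind == "exc"),
                  frozenset(p for kind, p in choice if kind == "inc"))
            for choice in product(*options)]


def satisfied(matches):
    """True if some Match carries no excludes."""
    return any(not m.excludes for m in matches)


def evaluate(text: str) -> bool:
    tokens = text.split()
    praise = ft_or(ft_words(tokens, "great"), ft_words(tokens, "excellent"))
    windowed = ft_window(
        ft_and(ft_words(tokens, "Mustang"), ft_times_at_least(praise, 2)), 11)
    return satisfied(ft_and(windowed, ft_unary_not(ft_words(tokens, "rust"))))


print(evaluate("The Mustang is a great car with excellent handling"))
print(evaluate("The Mustang is a great car with excellent handling but some rust"))
```

In this model, negating an AllMatches with no Matches yields a single empty Match, so a text without "rust" passes the final test, while any occurrence of "rust" leaves an exclude in every resulting Match.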

5 Conformance

This section defines the conformance criteria for an XQuery and XPath Full Text 1.0 processor.

In this section, the following terms are used to indicate the requirement levels defined in [RFC 2119]. [Definition: MUST means that the item is an absolute requirement of the specification.] [Definition: MAY means that an item is truly optional.] [Definition: SHOULD means that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.]

An XQuery and XPath Full Text 1.0 processor that claims to conform to this specification MUST include a claim of Minimal Conformance as defined in 5.1 Minimal Conformance. In addition to a claim of Minimal Conformance, it MAY claim conformance to one or more optional features defined in 5.2 Optional Features.

5.1 Minimal Conformance

Minimal Conformance to this specification MUST include all of the following items:

  1. Minimal support for XQuery 1.0 [XQuery 1.0: An XML Query Language (Second Edition)] or XPath 2.0 [XML Path Language (XPath) 2.0 (Second Edition)]. The optional features of XQuery 1.0 [XQuery 1.0: An XML Query Language (Second Edition)] or XPath 2.0 [XML Path Language (XPath) 2.0 (Second Edition)] MAY be supported.

  2. Support for everything specified in this document except those operators and match options specified in 5.2 Optional Features to be optional. If an implementation does not provide a given optional operator or match option, it MUST implement any requirements specified in 5.2 Optional Features for implementations that do not provide that operator or match option.

  3. A definition of every item specified to be implementation-defined in I Checklist of Implementation-Defined Features.

    Note:

    Implementations are not required to define items specified to be implementation-dependent.

5.2 Optional Features

5.2.1 FTMildNot Operator

It is optional whether the implementation supports the FTMildNot. If it does not support FTMildNot and encounters one in a full-text query, then it MUST raise an error [err:FTST0001].

5.2.2 FTUnaryNot Operator

The unrestricted form of negation in FTUnaryNot, which can negate every kind of FTSelection, is optional. Implementations may choose to support the negation operation in a restricted form, enforcing one or both of the following restrictions.

  • [Definition: Negation Restriction 1. An FTUnaryNot expression may only appear as a direct right operand of an "ftand" (FTAnd) operation.]

  • [Definition: Negation Restriction 2. An FTUnaryNot expression may not appear as a descendant of an FTOr that is modified by an FTPosFilter. (An FTOr is modified by an FTPosFilter, if it is derived using the production for FTSelection together with that FTPosFilter.)]

Consider the following example FTSelections.

1. ftnot "web"

2. "web" ftand ( ftnot "information" ftor "retrieval" )

3. "web" ftand ftnot("information" ftand "retrieval")

4. "web" ftand ftnot("information" ftand "retrieval" window 5 words)

5. "web" ftand ("information" ftand ftnot "retrieval" window 5 words)

The first two FTSelections both violate restriction 1, while the third and the fourth conform to both restrictions. The fifth one violates restriction 2, while obeying restriction 1. Note that in the last example the FTSelection to which the window operation is applied is "information" ftand ftnot "retrieval", which contains an FTUnaryNot expression.

If the implementation does enforce one or both of these restrictions on FTUnaryNot and encounters a full-text query that does not obey the restriction then it MUST raise an error [err:FTST0002].
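
The two negation restrictions can be checked mechanically on a parsed FTSelection. The Python sketch below shows one possible way to do so. It assumes a hypothetical operator tree made of nested tuples, with the node kinds "words", "not", "and", "or" and "window" (the last standing in for any FTPosFilter), and it encodes the five example FTSelections by hand.

```python
def contains_not(node):
    """True if the subtree rooted at node contains an FTUnaryNot."""
    return node[0] == "not" or any(
        contains_not(c) for c in node[1:] if isinstance(c, tuple))


def violates_restriction_1(node, right_of_and=False):
    """FTUnaryNot may only appear as a direct right operand of ftand."""
    kind = node[0]
    if kind == "words":
        return False
    if kind == "not":
        return not right_of_and or violates_restriction_1(node[1])
    if kind == "and":
        return (violates_restriction_1(node[1])
                or violates_restriction_1(node[2], right_of_and=True))
    # "or" and "window": no child is a direct right operand of ftand.
    return any(violates_restriction_1(c) for c in node[1:] if isinstance(c, tuple))


def violates_restriction_2(node):
    """FTUnaryNot may not appear below an FTOr modified by an FTPosFilter."""
    if node[0] == "window" and contains_not(node[1]):
        return True
    return any(violates_restriction_2(c) for c in node[1:] if isinstance(c, tuple))


def w(s):
    return ("words", s)


examples = [
    ("not", w("web")),
    ("and", w("web"), ("or", ("not", w("information")), w("retrieval"))),
    ("and", w("web"), ("not", ("and", w("information"), w("retrieval")))),
    ("and", w("web"),
     ("not", ("window", ("and", w("information"), w("retrieval")), 5))),
    ("and", w("web"),
     ("window", ("and", w("information"), ("not", w("retrieval"))), 5)),
]

for number, tree in enumerate(examples, start=1):
    print(number, violates_restriction_1(tree), violates_restriction_2(tree))
```

A processor that enforces a restriction would raise [err:FTST0002] wherever the corresponding check reports a violation.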

5.2.3 FTUnit and FTBigUnit

Support for the "sentences" alternative of FTUnit and the "sentence" alternative of FTBigUnit is optional. Similarly, support for the "paragraphs" alternative of FTUnit and the "paragraph" alternative of FTBigUnit is optional. If an implementation does not support one or more choices of FTUnit or FTBigUnit and encounters an unsupported FTUnit or FTBigUnit in a full-text query, then it MUST raise an error [err:FTST0003].

5.2.4 FTOrder Operator

The unrestricted form of the FTOrder postfix operator, which can be applied to any kind of FTSelection, is optional. Implementations may choose to enforce the following restriction on the use of FTOrder.

[Definition: Order Operator Restriction. FTOrder may only appear directly succeeding an FTWindow or an FTDistance operator.]

If the implementation does enforce this restriction and encounters a full-text query that does not obey the restriction then it MUST raise an error [err:FTST0010].

5.2.5 FTScope Operator

It is optional whether the implementation supports the FTScope operator. If it does not support FTScope and encounters one in a full-text query, then it MUST raise an error [err:FTST0004].

5.2.6 FTWindow Operator

The unrestricted form of the FTWindow postfix operator, which can be applied to any kind of FTSelection, is optional. Implementations may choose to enforce the following restriction on the use of FTWindow.

[Definition: Window Operator Restriction. FTWindow can only be applied to an FTOr that is either a single FTWords or a combination of FTWords involving only the operators ftand and ftor.]

If the implementation does enforce this restriction and encounters a full-text query that does not obey the restriction then it MUST raise an error [err:FTST0011].

5.2.7 FTDistance Operator

The unrestricted form of the FTDistance postfix operator, which can be applied to any kind of FTSelection, is optional. Implementations may choose to enforce the following restriction on the use of FTDistance.

[Definition: Distance Operator Restriction. FTDistance can only be applied to an FTOr that is either a single FTWords or a combination of FTWords involving only the operators ftand and ftor.]

If the implementation does enforce this restriction and encounters a full-text query that does not obey the restriction then it MUST raise an error [err:FTST0011].

5.2.8 FTTimes Operator

It is optional whether the implementation supports the FTTimes operator. If it does not support FTTimes and encounters one in a full-text query, then it MUST raise an error [err:FTST0005].

5.2.9 FTContent Operator

It is optional whether the implementation supports the FTContent operator. If it does not support FTContent and encounters one in a full-text query, then it MUST raise an error [err:FTST0012].

5.2.10 FTCaseOption

It is optional whether the implementation supports the "lowercase" and "uppercase" choices for the FTCaseOption. If it does not support these choices for the FTCaseOption and encounters an unsupported choice in a full-text query, then it MUST raise an error [err:FTST0015].

5.2.11 FTStopWordOption

It is optional whether the implementation supports the FTStopWordOption. If it does not support FTStopWordOption and encounters one in a full-text query, then it MUST raise an error [err:FTST0006].

It is optional whether the implementation supports the FTStopWordOption in the body of the query. If it supports FTStopWordOption in the prolog, but not in the body of a query, and encounters one in the body of a query, then it MUST raise an error [err:FTST0006].

It is optional whether the implementation supports the StringLiteral alternative of FTStopWords in the FTStopWordOption. If it does not support the StringLiteral alternative of FTStopWords and encounters such an alternative in a full-text query, then it MUST raise an error [err:FTST0006].

5.2.12 FTLanguageOption

It is optional whether the implementation supports the unrestricted form of FTLanguageOption. Implementations may choose to enforce the following restriction on the use of FTLanguageOption.

[Definition: Single Language Restriction. If a full-text query contains more than one FTLanguageOption in its body and the prolog, then the languages specified must be the same.]

If the implementation does enforce this restriction and encounters a full-text query that does not obey the restriction then it MUST raise an error [err:FTST0013].

5.2.13 FTIgnoreOption

The implementation may constrain the set of ignored nodes. If the operand of FTIgnoreOption violates the implementation-defined restriction on that operand, the implementation MUST raise an error [err:FTST0007].

5.2.14 Scoring

The implementation may restrict the allowable expressions used to compute scores. The restrictions are implementation-defined.

If the implementation does enforce such restrictions and encounters a full-text query that does not obey the restriction then it MUST raise an error [err:FTST0014].

5.2.15 Weights

An implementation may constrain the range of valid weights to non-negative values. If an implementation does enforce this restriction and encounters a full-text query that uses a negative weight, it MUST raise an error [err:FTDY0016].

6 XQueryX Conformance

This section defines the conformance criteria for an XQueryX processor that includes the Full Text capability.

In this section, the terms MUST, MAY, and SHOULD are used as defined in 5 Conformance.

An XQueryX processor that claims to conform to this specification MUST implement the XQueryX syntax as defined in E XML Syntax (XQueryX) for XQuery and XPath Full Text 1.0 and include a claim of Minimal Conformance as defined in 5.1 Minimal Conformance. In addition to a claim of Minimal Conformance, it MAY claim conformance to one or more optional features defined in 5.2 Optional Features.

A EBNF for XQuery 1.0 Grammar with Full Text extensions

The EBNF in this document and in this section is aligned with the current XML Query 1.0 grammar (see http://www.w3.org/TR/2010/REC-xquery-20101214/).

[1]    Module    ::=    VersionDecl? (LibraryModule | MainModule)
[2]    VersionDecl    ::=    "xquery" "version" StringLiteral ("encoding" StringLiteral)? Separator
[3]    MainModule    ::=    Prolog QueryBody
[4]    LibraryModule    ::=    ModuleDecl Prolog
[5]    ModuleDecl    ::=    "module" "namespace" NCName "=" URILiteral Separator
[6]    Prolog    ::=    ((DefaultNamespaceDecl | Setter | NamespaceDecl | Import | FTOptionDecl) Separator)* ((VarDecl | FunctionDecl | OptionDecl) Separator)*
[7]    Separator    ::=    ";"
[8]    Setter    ::=    BoundarySpaceDecl | DefaultCollationDecl | BaseURIDecl | ConstructionDecl | OrderingModeDecl | EmptyOrderDecl | CopyNamespacesDecl
[9]    BoundarySpaceDecl    ::=    "declare" "boundary-space" ("preserve" | "strip")
[10]    DefaultCollationDecl    ::=    "declare" "default" "collation" URILiteral
[11]    BaseURIDecl    ::=    "declare" "base-uri" URILiteral
[12]    ConstructionDecl    ::=    "declare" "construction" ("strip" | "preserve")
[13]    OrderingModeDecl    ::=    "declare" "ordering" ("ordered" | "unordered")
[14]    EmptyOrderDecl    ::=    "declare" "default" "order" "empty" ("greatest" | "least")
[15]    CopyNamespacesDecl    ::=    "declare" "copy-namespaces" PreserveMode "," InheritMode
[16]    PreserveMode    ::=    "preserve" | "no-preserve"
[17]    InheritMode    ::=    "inherit" | "no-inherit"
[18]    Import    ::=    SchemaImport | ModuleImport
[19]    SchemaImport    ::=    "import" "schema" SchemaPrefix? URILiteral ("at" URILiteral ("," URILiteral)*)?
[20]    SchemaPrefix    ::=    ("namespace" NCName "=") | ("default" "element" "namespace")
[21]    ModuleImport    ::=    "import" "module" ("namespace" NCName "=")? URILiteral ("at" URILiteral ("," URILiteral)*)?
[22]    NamespaceDecl    ::=    "declare" "namespace" NCName "=" URILiteral
[23]    DefaultNamespaceDecl    ::=    "declare" "default" ("element" | "function") "namespace" URILiteral
[24]    FTOptionDecl    ::=    "declare" "ft-option" FTMatchOptions
[25]    VarDecl    ::=    "declare" "variable" "$" QName TypeDeclaration? ((":=" ExprSingle) | "external")
[26]    FunctionDecl    ::=    "declare" "function" QName "(" ParamList? ")" ("as" SequenceType)? (EnclosedExpr | "external") /* xgc: reserved-function-namesXQ */
[27]    ParamList    ::=    Param ("," Param)*
[28]    Param    ::=    "$" QName TypeDeclaration?
[29]    EnclosedExpr    ::=    "{" Expr "}"
[30]    OptionDecl    ::=    "declare" "option" QName StringLiteral
[31]    QueryBody    ::=    Expr
[32]    Expr    ::=    ExprSingle ("," ExprSingle)*
[33]    ExprSingle    ::=    FLWORExpr
| QuantifiedExpr
| TypeswitchExpr
| IfExpr
| OrExpr
[34]    FLWORExpr    ::=    (ForClause | LetClause)+ WhereClause? OrderByClause? "return" ExprSingle
[35]    ForClause    ::=    "for" "$" VarName TypeDeclaration? PositionalVar? FTScoreVar? "in" ExprSingle ("," "$" VarName TypeDeclaration? PositionalVar? FTScoreVar? "in" ExprSingle)*
[36]    PositionalVar    ::=    "at" "$" VarName
[37]    FTScoreVar    ::=    "score" "$" VarName
[38]    LetClause    ::=    "let" (("$" VarName TypeDeclaration?) | FTScoreVar) ":=" ExprSingle ("," (("$" VarName TypeDeclaration?) | FTScoreVar) ":=" ExprSingle)*
[39]    WhereClause    ::=    "where" ExprSingle
[40]    OrderByClause    ::=    (("order" "by") | ("stable" "order" "by")) OrderSpecList
[41]    OrderSpecList    ::=    OrderSpec ("," OrderSpec)*
[42]    OrderSpec    ::=    ExprSingle OrderModifier
[43]    OrderModifier    ::=    ("ascending" | "descending")? ("empty" ("greatest" | "least"))? ("collation" URILiteral)?
[44]    QuantifiedExpr    ::=    ("some" | "every") "$" VarName TypeDeclaration? "in" ExprSingle ("," "$" VarName TypeDeclaration? "in" ExprSingle)* "satisfies" ExprSingle
[45]    TypeswitchExpr    ::=    "typeswitch" "(" Expr ")" CaseClause+ "default" ("$" VarName)? "return" ExprSingle
[46]    CaseClause    ::=    "case" ("$" VarName "as")? SequenceType "return" ExprSingle
[47]    IfExpr    ::=    "if" "(" Expr ")" "then" ExprSingle "else" ExprSingle
[48]    OrExpr    ::=    AndExpr ( "or" AndExpr )*
[49]    AndExpr    ::=    ComparisonExpr ( "and" ComparisonExpr )*
[50]    ComparisonExpr    ::=    FTContainsExpr ( (ValueComp
| GeneralComp
| NodeComp) FTContainsExpr )?
[51]    FTContainsExpr    ::=    RangeExpr ( "contains" "text" FTSelection FTIgnoreOption? )?
[52]    RangeExpr    ::=    AdditiveExpr ( "to" AdditiveExpr )?
[53]    AdditiveExpr    ::=    MultiplicativeExpr ( ("+" | "-") MultiplicativeExpr )*
[54]    MultiplicativeExpr    ::=    UnionExpr ( ("*" | "div" | "idiv" | "mod") UnionExpr )*
[55]    UnionExpr    ::=    IntersectExceptExpr ( ("union" | "|") IntersectExceptExpr )*
[56]    IntersectExceptExpr    ::=    InstanceofExpr ( ("intersect" | "except") InstanceofExpr )*
[57]    InstanceofExpr    ::=    TreatExpr ( "instance" "of" SequenceType )?
[58]    TreatExpr    ::=    CastableExpr ( "treat" "as" SequenceType )?
[59]    CastableExpr    ::=    CastExpr ( "castable" "as" SingleType )?
[60]    CastExpr    ::=    UnaryExpr ( "cast" "as" SingleType )?
[61]    UnaryExpr    ::=    ("-" | "+")* ValueExpr
[62]    ValueExpr    ::=    ValidateExpr | PathExpr | ExtensionExpr
[63]    GeneralComp    ::=    "=" | "!=" | "<" | "<=" | ">" | ">="
[64]    ValueComp    ::=    "eq" | "ne" | "lt" | "le" | "gt" | "ge"
[65]    NodeComp    ::=    "is" | "<<" | ">>"
[66]    ValidateExpr    ::=    "validate" ValidationMode? "{" Expr "}"
[67]    ValidationMode    ::=    "lax" | "strict"
[68]    ExtensionExpr    ::=    Pragma+ "{" Expr? "}"
[69]    Pragma    ::=    "(#" S? QName (S PragmaContents)? "#)" /* ws: explicitXQ */
[70]    PragmaContents    ::=    (Char* - (Char* '#)' Char*))
[71]    PathExpr    ::=    ("/" RelativePathExpr?)
| ("//" RelativePathExpr)
| RelativePathExpr
/* xgc: leading-lone-slashXQ */
[72]    RelativePathExpr    ::=    StepExpr (("/" | "//") StepExpr)*
[73]    StepExpr    ::=    FilterExpr | AxisStep
[74]    AxisStep    ::=    (ReverseStep | ForwardStep) PredicateList
[75]    ForwardStep    ::=    (ForwardAxis NodeTest) | AbbrevForwardStep
[76]    ForwardAxis    ::=    ("child" "::")
| ("descendant" "::")
| ("attribute" "::")
| ("self" "::")
| ("descendant-or-self" "::")
| ("following-sibling" "::")
| ("following" "::")
[77]    AbbrevForwardStep    ::=    "@"? NodeTest
[78]    ReverseStep    ::=    (ReverseAxis NodeTest) | AbbrevReverseStep
[79]    ReverseAxis    ::=    ("parent" "::")
| ("ancestor" "::")
| ("preceding-sibling" "::")
| ("preceding" "::")
| ("ancestor-or-self" "::")
[80]    AbbrevReverseStep    ::=    ".."
[81]    NodeTest    ::=    KindTest | NameTest
[82]    NameTest    ::=    QName | Wildcard
[83]    Wildcard    ::=    "*"
| (NCName ":" "*")
| ("*" ":" NCName)
/* ws: explicitXQ */
[84]    FilterExpr    ::=    PrimaryExpr PredicateList
[85]    PredicateList    ::=    Predicate*
[86]    Predicate    ::=    "[" Expr "]"
[87]    PrimaryExpr    ::=    Literal
| VarRef
| ParenthesizedExpr
| ContextItemExpr
| FunctionCall
| OrderedExpr
| UnorderedExpr
| Constructor
[88]    Literal    ::=    NumericLiteral | StringLiteral
[89]    NumericLiteral    ::=    IntegerLiteral | DecimalLiteral | DoubleLiteral
[90]    VarRef    ::=    "$" VarName
[91]    VarName    ::=    QName
[92]    ParenthesizedExpr    ::=    "(" Expr? ")"
[93]    ContextItemExpr    ::=    "."
[94]    OrderedExpr    ::=    "ordered" "{" Expr "}"
[95]    UnorderedExpr    ::=    "unordered" "{" Expr "}"
[96]    FunctionCall    ::=    QName "(" (ExprSingle ("," ExprSingle)*)? ")" /* xgc: reserved-function-namesXQ */
/* gn: parensXQ */
[97]    Constructor    ::=    DirectConstructor
| ComputedConstructor
[98]    DirectConstructor    ::=    DirElemConstructor
| DirCommentConstructor
| DirPIConstructor
[99]    DirElemConstructor    ::=    "<" QName DirAttributeList ("/>" | (">" DirElemContent* "</" QName S? ">")) /* ws: explicitXQ */
[100]    DirAttributeList    ::=    (S (QName S? "=" S? DirAttributeValue)?)* /* ws: explicitXQ */
[101]    DirAttributeValue    ::=    ('"' (EscapeQuot | QuotAttrValueContent)* '"')
| ("'" (EscapeApos | AposAttrValueContent)* "'")
/* ws: explicitXQ */
[102]    QuotAttrValueContent    ::=    QuotAttrContentChar
| CommonContent
[103]    AposAttrValueContent    ::=    AposAttrContentChar
| CommonContent
[104]    DirElemContent    ::=    DirectConstructor
| CDataSection
| CommonContent
| ElementContentChar
[105]    CommonContent    ::=    PredefinedEntityRef | CharRef | "{{" | "}}" | EnclosedExpr
[106]    DirCommentConstructor    ::=    "<!--" DirCommentContents "-->" /* ws: explicitXQ */
[107]    DirCommentContents    ::=    ((Char - '-') | ('-' (Char - '-')))* /* ws: explicitXQ */
[108]    DirPIConstructor    ::=    "<?" PITarget (S DirPIContents)? "?>" /* ws: explicitXQ */
[109]    DirPIContents    ::=    (Char* - (Char* '?>' Char*)) /* ws: explicitXQ */
[110]    CDataSection    ::=    "<![CDATA[" CDataSectionContents "]]>" /* ws: explicitXQ */
[111]    CDataSectionContents    ::=    (Char* - (Char* ']]>' Char*)) /* ws: explicitXQ */
[112]    ComputedConstructor    ::=    CompDocConstructor
| CompElemConstructor
| CompAttrConstructor
| CompTextConstructor
| CompCommentConstructor
| CompPIConstructor
[113]    CompDocConstructor    ::=    "document" "{" Expr "}"
[114]    CompElemConstructor    ::=    "element" (QName | ("{" Expr "}")) "{" ContentExpr? "}"
[115]    ContentExpr    ::=    Expr
[116]    CompAttrConstructor    ::=    "attribute" (QName | ("{" Expr "}")) "{" Expr? "}"
[117]    CompTextConstructor    ::=    "text" "{" Expr "}"
[118]    CompCommentConstructor    ::=    "comment" "{" Expr "}"
[119]    CompPIConstructor    ::=    "processing-instruction" (NCName | ("{" Expr "}")) "{" Expr? "}"
[120]    SingleType    ::=    AtomicType "?"?
[121]    TypeDeclaration    ::=    "as" SequenceType
[122]    SequenceType    ::=    ("empty-sequence" "(" ")")
| (ItemType OccurrenceIndicator?)
[123]    OccurrenceIndicator    ::=    "?" | "*" | "+" /* xgc: occurrence-indicatorsXQ */
[124]    ItemType    ::=    KindTest | ("item" "(" ")") | AtomicType
[125]    AtomicType    ::=    QName
[126]    KindTest    ::=    DocumentTest
| ElementTest
| AttributeTest
| SchemaElementTest
| SchemaAttributeTest
| PITest
| CommentTest
| TextTest
| AnyKindTest
[127]    AnyKindTest    ::=    "node" "(" ")"
[128]    DocumentTest    ::=    "document-node" "(" (ElementTest | SchemaElementTest)? ")"
[129]    TextTest    ::=    "text" "(" ")"
[130]    CommentTest    ::=    "comment" "(" ")"
[131]    PITest    ::=    "processing-instruction" "(" (NCName | StringLiteral)? ")"
[132]    AttributeTest    ::=    "attribute" "(" (AttribNameOrWildcard ("," TypeName)?)? ")"
[133]    AttribNameOrWildcard    ::=    AttributeName | "*"
[134]    SchemaAttributeTest    ::=    "schema-attribute" "(" AttributeDeclaration ")"
[135]    AttributeDeclaration    ::=    AttributeName
[136]    ElementTest    ::=    "element" "(" (ElementNameOrWildcard ("," TypeName "?"?)?)? ")"
[137]    ElementNameOrWildcard    ::=    ElementName | "*"
[138]    SchemaElementTest    ::=    "schema-element" "(" ElementDeclaration ")"
[139]    ElementDeclaration    ::=    ElementName
[140]    AttributeName    ::=    QName
[141]    ElementName    ::=    QName
[142]    TypeName    ::=    QName
[143]    URILiteral    ::=    StringLiteral
[144]    FTSelection    ::=    FTOr FTPosFilter*
[145]    FTWeight    ::=    "weight" "{" Expr "}"
[146]    FTOr    ::=    FTAnd ( "ftor" FTAnd )*
[147]    FTAnd    ::=    FTMildNot ( "ftand" FTMildNot )*
[148]    FTMildNot    ::=    FTUnaryNot ( "not" "in" FTUnaryNot )*
[149]    FTUnaryNot    ::=    ("ftnot")? FTPrimaryWithOptions
[150]    FTPrimaryWithOptions    ::=    FTPrimary FTMatchOptions? FTWeight?
[151]    FTPrimary    ::=    (FTWords FTTimes?) | ("(" FTSelection ")") | FTExtensionSelection
[152]    FTWords    ::=    FTWordsValue FTAnyallOption?
[153]    FTWordsValue    ::=    StringLiteral | ("{" Expr "}")
[154]    FTExtensionSelection    ::=    Pragma+ "{" FTSelection? "}"
[155]    FTAnyallOption    ::=    ("any" "word"?) | ("all" "words"?) | "phrase"
[156]    FTTimes    ::=    "occurs" FTRange "times"
[157]    FTRange    ::=    ("exactly" AdditiveExpr)
| ("at" "least" AdditiveExpr)
| ("at" "most" AdditiveExpr)
| ("from" AdditiveExpr "to" AdditiveExpr)
[158]    FTPosFilter    ::=    FTOrder | FTWindow | FTDistance | FTScope | FTContent
[159]    FTOrder    ::=    "ordered"
[160]    FTWindow    ::=    "window" AdditiveExpr FTUnit
[161]    FTDistance    ::=    "distance" FTRange FTUnit
[162]    FTUnit    ::=    "words" | "sentences" | "paragraphs"
[163]    FTScope    ::=    ("same" | "different") FTBigUnit
[164]    FTBigUnit    ::=    "sentence" | "paragraph"
[165]    FTContent    ::=    ("at" "start") | ("at" "end") | ("entire" "content")
[166]    FTMatchOptions    ::=    ("using" FTMatchOption)+
[167]    FTMatchOption    ::=    FTLanguageOption
| FTWildCardOption
| FTThesaurusOption
| FTStemOption
| FTCaseOption
| FTDiacriticsOption
| FTStopWordOption
| FTExtensionOption
[168]    FTCaseOption    ::=    ("case" "insensitive")
| ("case" "sensitive")
| "lowercase"
| "uppercase"
[169]    FTDiacriticsOption    ::=    ("diacritics" "insensitive")
| ("diacritics" "sensitive")
[170]    FTStemOption    ::=    "stemming" | ("no" "stemming")
[171]    FTThesaurusOption    ::=    ("thesaurus" (FTThesaurusID | "default"))
| ("thesaurus" "(" (FTThesaurusID | "default") ("," FTThesaurusID)* ")")
| ("no" "thesaurus")
[172]    FTThesaurusID    ::=    "at" URILiteral ("relationship" StringLiteral)? (FTLiteralRange "levels")?
[173]    FTLiteralRange    ::=    ("exactly" IntegerLiteral)
| ("at" "least" IntegerLiteral)
| ("at" "most" IntegerLiteral)
| ("from" IntegerLiteral "to" IntegerLiteral)
[174]    FTStopWordOption    ::=    ("stop" "words" FTStopWords FTStopWordsInclExcl*)
| ("stop" "words" "default" FTStopWordsInclExcl*)
| ("no" "stop" "words")
[175]    FTStopWords    ::=    ("at" URILiteral)
| ("(" StringLiteral ("," StringLiteral)* ")")
[176]    FTStopWordsInclExcl    ::=    ("union" | "except") FTStopWords
[177]    FTLanguageOption    ::=    "language" StringLiteral
[178]    FTWildCardOption    ::=    "wildcards" | ("no" "wildcards")
[179]    FTExtensionOption    ::=    "option" QName StringLiteral
[180]    FTIgnoreOption    ::=    "without" "content" UnionExpr
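
As a non-normative illustration of how the full-text productions above combine, the following sketch shows a single FTContainsExpr. The document URI books.xml and the element names book, title, and footnote are assumptions made only for this example.

```xquery
(: Non-normative sketch; books.xml, book, title and footnote are hypothetical names. :)
for $book in doc("books.xml")//book
where $book/title contains text
        ("usability" ftand "testing")   (: parenthesized FTSelection used as an FTPrimary :)
          using stemming                (: FTMatchOptions attached to that FTPrimary :)
          weight { 0.5 }                (: FTWeight follows the match options :)
        window 5 words                  (: FTWindow, one of the FTPosFilter alternatives :)
        without content $book//footnote (: FTIgnoreOption, outside the FTSelection :)
return $book/title
```

Read against the grammar, the match options and weight bind to the parenthesized FTPrimary, the window filter applies to the whole FTOr, and the ignore option belongs to the FTContainsExpr rather than to the FTSelection.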

A.1 Terminal Symbols

[181]    IntegerLiteral    ::=    Digits
[182]    DecimalLiteral    ::=    ("." Digits) | (Digits "." [0-9]*) /* ws: explicitXQ */
[183]    DoubleLiteral    ::=    (("." Digits) | (Digits ("." [0-9]*)?)) [eE] [+-]? Digits /* ws: explicitXQ */
[184]    StringLiteral    ::=    ('"' (PredefinedEntityRef | CharRef | EscapeQuot | [^"&])* '"') | ("'" (PredefinedEntityRef | CharRef | EscapeApos | [^'&])* "'") /* ws: explicitXQ */
[185]    PredefinedEntityRef    ::=    "&" ("lt" | "gt" | "amp" | "quot" | "apos") ";" /* ws: explicitXQ */
[186]    EscapeQuot    ::=    '""'
[187]    EscapeApos    ::=    "''"
[188]    ElementContentChar    ::=    (Char - [{}<&])
[189]    QuotAttrContentChar    ::=    (Char - ["{}<&])
[190]    AposAttrContentChar    ::=    (Char - ['{}<&])
[191]    Comment    ::=    "(:" (CommentContents | Comment)* ":)" /* ws: explicitXQ */
/* gn: commentsXQ */
[192]    PITarget    ::=    [http://www.w3.org/TR/REC-xml#NT-PITarget]XML /* xgc: xml-versionXQ */
[193]    CharRef    ::=    [http://www.w3.org/TR/REC-xml#NT-CharRef]XML /* xgc: xml-versionXQ */
[194]    QName    ::=    [http://www.w3.org/TR/REC-xml-names/#NT-QName]Names /* xgc: xml-versionXQ */
[195]    NCName    ::=    [http://www.w3.org/TR/REC-xml-names/#NT-NCName]Names /* xgc: xml-versionXQ */
[196]    S    ::=    [http://www.w3.org/TR/REC-xml#NT-S]XML /* xgc: xml-versionXQ */
[197]    Char    ::=    [http://www.w3.org/TR/REC-xml#NT-Char]XML /* xgc: xml-versionXQ */

The following symbols are used only in the definition of terminal symbols; they are not terminal symbols in the grammar of A EBNF for XQuery 1.0 Grammar with Full Text extensions.

[198]    Digits    ::=    [0-9]+
[199]    CommentContents    ::=    (Char+ - (Char* ('(:' | ':)') Char*))

B EBNF for XPath 2.0 Grammar with Full-Text extensions

The EBNF in this document and in this section is aligned with the current XPath 2.0 grammar (see http://www.w3.org/TR/2010/REC-xpath20-20101214/).

[1]    XPath    ::=    Expr
[2]    Expr    ::=    ExprSingle ("," ExprSingle)*
[3]    ExprSingle    ::=    ForExpr
| QuantifiedExpr
| IfExpr
| OrExpr
[4]    ForExpr    ::=    SimpleForClause "return" ExprSingle
[5]    SimpleForClause    ::=    "for" "$" VarName FTScoreVar? "in" ExprSingle ("," "$" VarName FTScoreVar? "in" ExprSingle)*
[6]    FTScoreVar    ::=    "score" "$" VarName
[7]    QuantifiedExpr    ::=    ("some" | "every") "$" VarName "in" ExprSingle ("," "$" VarName "in" ExprSingle)* "satisfies" ExprSingle
[8]    IfExpr    ::=    "if" "(" Expr ")" "then" ExprSingle "else" ExprSingle
[9]    OrExpr    ::=    AndExpr ( "or" AndExpr )*
[10]    AndExpr    ::=    ComparisonExpr ( "and" ComparisonExpr )*
[11]    ComparisonExpr    ::=    FTContainsExpr ( (ValueComp
| GeneralComp
| NodeComp) FTContainsExpr )?
[12]    FTContainsExpr    ::=    RangeExpr ( "contains" "text" FTSelection FTIgnoreOption? )?
[13]    RangeExpr    ::=    AdditiveExpr ( "to" AdditiveExpr )?
[14]    AdditiveExpr    ::=    MultiplicativeExpr ( ("+" | "-") MultiplicativeExpr )*
[15]    MultiplicativeExpr    ::=    UnionExpr ( ("*" | "div" | "idiv" | "mod") UnionExpr )*
[16]    UnionExpr    ::=    IntersectExceptExpr ( ("union" | "|") IntersectExceptExpr )*
[17]    IntersectExceptExpr    ::=    InstanceofExpr ( ("intersect" | "except") InstanceofExpr )*
[18]    InstanceofExpr    ::=    TreatExpr ( "instance" "of" SequenceType )?
[19]    TreatExpr    ::=    CastableExpr ( "treat" "as" SequenceType )?
[20]    CastableExpr    ::=    CastExpr ( "castable" "as" SingleType )?
[21]    CastExpr    ::=    UnaryExpr ( "cast" "as" SingleType )?
[22]    UnaryExpr    ::=    ("-" | "+")* ValueExpr
[23]    ValueExpr    ::=    PathExpr
[24]    GeneralComp    ::=    "=" | "!=" | "<" | "<=" | ">" | ">="
[25]    ValueComp    ::=    "eq" | "ne" | "lt" | "le" | "gt" | "ge"
[26]    NodeComp    ::=    "is" | "<<" | ">>"
[27]    Pragma    ::=    "(#" S? QName (S PragmaContents)? "#)" /* ws: explicitXP */
[28]    PragmaContents    ::=    (Char* - (Char* '#)' Char*))
[29]    PathExpr    ::=    ("/" RelativePathExpr?)
| ("//" RelativePathExpr)
| RelativePathExpr
/* xgc: leading-lone-slashXP */
[30]    RelativePathExpr    ::=    StepExpr (("/" | "//") StepExpr)*
[31]    StepExpr    ::=    FilterExpr | AxisStep
[32]    AxisStep    ::=    (ReverseStep | ForwardStep) PredicateList
[33]    ForwardStep    ::=    (ForwardAxis NodeTest) | AbbrevForwardStep
[34]    ForwardAxis    ::=    ("child" "::")
| ("descendant" "::")
| ("attribute" "::")
| ("self" "::")
| ("descendant-or-self" "::")
| ("following-sibling" "::")
| ("following" "::")
| ("namespace" "::")
[35]    AbbrevForwardStep    ::=    "@"? NodeTest
[36]    ReverseStep    ::=    (ReverseAxis NodeTest) | AbbrevReverseStep
[37]    ReverseAxis    ::=    ("parent" "::")
| ("ancestor" "::")
| ("preceding-sibling" "::")
| ("preceding" "::")
| ("ancestor-or-self" "::")
[38]    AbbrevReverseStep    ::=    ".."
[39]    NodeTest    ::=    KindTest | NameTest
[40]    NameTest    ::=    QName | Wildcard
[41]    Wildcard    ::=    "*"
| (NCName ":" "*")
| ("*" ":" NCName)
/* ws: explicitXP */
[42]    FilterExpr    ::=    PrimaryExpr PredicateList
[43]    PredicateList    ::=    Predicate*
[44]    Predicate    ::=    "[" Expr "]"
[45]    PrimaryExpr    ::=    Literal
| VarRef
| ParenthesizedExpr
| ContextItemExpr
| FunctionCall
[46]    Literal    ::=    NumericLiteral | StringLiteral
[47]    NumericLiteral    ::=    IntegerLiteral | DecimalLiteral | DoubleLiteral
[48]    VarRef    ::=    "$" VarName
[49]    VarName    ::=    QName
[50]    ParenthesizedExpr    ::=    "(" Expr? ")"
[51]    ContextItemExpr    ::=    "."
[52]    FunctionCall    ::=    QName "(" (ExprSingle ("," ExprSingle)*)? ")" /* xgc: reserved-function-namesXP */
/* gn: parensXP */
[53]    SingleType    ::=    AtomicType "?"?
[54]    SequenceType    ::=    ("empty-sequence" "(" ")")
| (ItemType OccurrenceIndicator?)
[55]    OccurrenceIndicator    ::=    "?" | "*" | "+" /* xgc: occurrence-indicatorsXP */
[56]    ItemType    ::=    KindTest | ("item" "(" ")") | AtomicType
[57]    AtomicType    ::=    QName
[58]    KindTest    ::=    DocumentTest
| ElementTest
| AttributeTest
| SchemaElementTest
| SchemaAttributeTest
| PITest
| CommentTest
| TextTest
| AnyKindTest
[59]    AnyKindTest    ::=    "node" "(" ")"
[60]    DocumentTest    ::=    "document-node" "(" (ElementTest | SchemaElementTest)? ")"
[61]    TextTest    ::=    "text" "(" ")"
[62]    CommentTest    ::=    "comment" "(" ")"
[63]    PITest    ::=    "processing-instruction" "(" (NCName | StringLiteral)? ")"
[64]    AttributeTest    ::=    "attribute" "(" (AttribNameOrWildcard ("," TypeName)?)? ")"
[65]    AttribNameOrWildcard    ::=    AttributeName | "*"
[66]    SchemaAttributeTest    ::=    "schema-attribute" "(" AttributeDeclaration ")"
[67]    AttributeDeclaration    ::=    AttributeName
[68]    ElementTest    ::=    "element" "(" (ElementNameOrWildcard ("," TypeName "?"?)?)? ")"
[69]    ElementNameOrWildcard    ::=    ElementName | "*"
[70]    SchemaElementTest    ::=    "schema-element" "(" ElementDeclaration ")"
[71]    ElementDeclaration    ::=    ElementName
[72]    AttributeName    ::=    QName
[73]    ElementName    ::=    QName
[74]    TypeName    ::=    QName
[75]    URILiteral    ::=    StringLiteral
[76]    FTSelection    ::=    FTOr FTPosFilter*
[77]    FTWeight    ::=    "weight" "{" Expr "}"
[78]    FTOr    ::=    FTAnd ( "ftor" FTAnd )*
[79]    FTAnd    ::=    FTMildNot ( "ftand" FTMildNot )*
[80]    FTMildNot    ::=    FTUnaryNot ( "not" "in" FTUnaryNot )*
[81]    FTUnaryNot    ::=    ("ftnot")? FTPrimaryWithOptions
[82]    FTPrimaryWithOptions    ::=    FTPrimary FTMatchOptions? FTWeight?
[83]    FTPrimary    ::=    (FTWords FTTimes?) | ("(" FTSelection ")") | FTExtensionSelection
[84]    FTWords    ::=    FTWordsValue FTAnyallOption?
[85]    FTWordsValue    ::=    StringLiteral | ("{" Expr "}")
[86]    FTExtensionSelection    ::=    Pragma+ "{" FTSelection? "}"
[87]    FTAnyallOption    ::=    ("any" "word"?) | ("all" "words"?) | "phrase"
[88]    FTTimes    ::=    "occurs" FTRange "times"
[89]    FTRange    ::=    ("exactly" AdditiveExpr)
| ("at" "least" AdditiveExpr)
| ("at" "most" AdditiveExpr)
| ("from" AdditiveExpr "to" AdditiveExpr)
[90]    FTPosFilter    ::=    FTOrder | FTWindow | FTDistance | FTScope | FTContent
[91]    FTOrder    ::=    "ordered"
[92]    FTWindow    ::=    "window" AdditiveExpr FTUnit
[93]    FTDistance    ::=    "distance" FTRange FTUnit
[94]    FTUnit    ::=    "words" | "sentences" | "paragraphs"
[95]    FTScope    ::=    ("same" | "different") FTBigUnit
[96]    FTBigUnit    ::=    "sentence" | "paragraph"
[97]    FTContent    ::=    ("at" "start") | ("at" "end") | ("entire" "content")
[98]    FTMatchOptions    ::=    ("using" FTMatchOption)+
[99]    FTMatchOption    ::=    FTLanguageOption
| FTWildCardOption
| FTThesaurusOption
| FTStemOption
| FTCaseOption
| FTDiacriticsOption
| FTStopWordOption
| FTExtensionOption
[100]    FTCaseOption    ::=    ("case" "insensitive")
| ("case" "sensitive")
| "lowercase"
| "uppercase"
[101]    FTDiacriticsOption    ::=    ("diacritics" "insensitive")
| ("diacritics" "sensitive")
[102]    FTStemOption    ::=    "stemming" | ("no" "stemming")
[103]    FTThesaurusOption    ::=    ("thesaurus" (FTThesaurusID | "default"))
| ("thesaurus" "(" (FTThesaurusID | "default") ("," FTThesaurusID)* ")")
| ("no" "thesaurus")
[104]    FTThesaurusID    ::=    "at" URILiteral ("relationship" StringLiteral)? (FTLiteralRange "levels")?
[105]    FTLiteralRange    ::=    ("exactly" IntegerLiteral)
| ("at" "least" IntegerLiteral)
| ("at" "most" IntegerLiteral)
| ("from" IntegerLiteral "to" IntegerLiteral)
[106]    FTStopWordOption    ::=    ("stop" "words" FTStopWords FTStopWordsInclExcl*)
| ("stop" "words" "default" FTStopWordsInclExcl*)
| ("no" "stop" "words")
[107]    FTStopWords    ::=    ("at" URILiteral)
| ("(" StringLiteral ("," StringLiteral)* ")")
[108]    FTStopWordsInclExcl    ::=    ("union" | "except") FTStopWords
[109]    FTLanguageOption    ::=    "language" StringLiteral
[110]    FTWildCardOption    ::=    "wildcards" | ("no" "wildcards")
[111]    FTExtensionOption    ::=    "option" QName StringLiteral
[112]    FTIgnoreOption    ::=    "without" "content" UnionExpr
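
The XPath grammar above admits the same full-text selections, together with a score variable in a for expression. The following non-normative sketch assumes a context document containing hypothetical book and title elements.

```xquery
(: Non-normative XPath 2.0 sketch; book and title are hypothetical element names. :)
for $b score $s                          (: FTScoreVar in a SimpleForClause :)
    in //book[title contains text
                "usability" ftand "testing"   (: FTAnd of two FTWords :)
                distance at most 3 words]     (: FTDistance with an FTRange and FTUnit :)
return $s
```

Because XPath 2.0 has no let clause, the for expression with a score variable is the only place in this grammar where a score can be bound.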

B.1 Terminal Symbols

[113]    IntegerLiteral    ::=    Digits
[114]    DecimalLiteral    ::=    ("." Digits) | (Digits "." [0-9]*) /* ws: explicitXP */
[115]    DoubleLiteral    ::=    (("." Digits) | (Digits ("." [0-9]*)?)) [eE] [+-]? Digits /* ws: explicitXP */
[116]    StringLiteral    ::=    ('"' (EscapeQuot | [^"])* '"') | ("'" (EscapeApos | [^'])* "'") /* ws: explicitXP */
[117]    EscapeQuot    ::=    '""'
[118]    EscapeApos    ::=    "''"
[119]    Comment    ::=    "(:" (CommentContents | Comment)* ":)" /* ws: explicitXP */
/* gn: commentsXP */
[120]    QName    ::=    [http://www.w3.org/TR/REC-xml-names/#NT-QName]Names /* xgc: xml-versionXP */
[121]    NCName    ::=    [http://www.w3.org/TR/REC-xml-names/#NT-NCName]Names /* xgc: xml-versionXP */
[122]    S    ::=    [http://www.w3.org/TR/REC-xml#NT-S]XML /* xgc: xml-versionXP */
[123]    Char    ::=    [http://www.w3.org/TR/REC-xml#NT-Char]XML /* xgc: xml-versionXP */

The following symbols are used only in the definition of terminal symbols; they are not terminal symbols in the grammar of B EBNF for XPath 2.0 Grammar with Full-Text extensions.

[124]    Digits    ::=    [0-9]+
[125]    CommentContents    ::=    (Char+ - (Char* ('(:' | ':)') Char*))

C Static Context Components

The following table describes the full-text components of the static context (as defined in Section 2.1.1 Static ContextXQ). The following aspects of each component are described:

Static Context Components
Component | Default initial value | Can be overwritten or augmented by implementation? | Can be overwritten or augmented by a query? | Scope | Consistency rules
FTCaseOption | case insensitive | overwriteable | overwriteable by prolog | lexical | Value must be case insensitive, case sensitive, lowercase, or uppercase.
FTDiacriticsOption | diacritics insensitive | overwriteable | overwriteable by prolog | lexical | Value must be diacritics insensitive or diacritics sensitive.
FTStemOption | no stemming | overwriteable | overwriteable by prolog | lexical | Value must be stemming or no stemming.
FTThesaurusOption | no thesaurus | overwriteable | overwriteable by prolog (refer to default to augment) | lexical | Each URI in the value must be found in the statically known thesauri.
Statically known thesauri | none | augmentable | cannot be augmented or overwritten by prolog | module | Each URI uniquely identifies a thesaurus list.
FTStopWordOption | no stop words | overwriteable | overwriteable by prolog (refer to default to augment) | lexical | Each URI in the value must be found in the statically known stop word lists.
Statically known stop word lists | none | augmentable | cannot be augmented or overwritten by prolog | module | Each URI uniquely identifies a stop word list.
FTLanguageOption | implementation-defined | overwriteable | overwriteable by prolog | lexical | Value must be castable to xs:language.
Statically known languages | none | augmentable | cannot be augmented or overwritten by prolog | module | Each string uniquely identifies a language.
FTWildCardOption | no wildcards | no | overwriteable by prolog | lexical | Value must be wildcards or no wildcards.
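
As a non-normative illustration of a prolog overwriting these defaults, the sketch below uses an ft-option declaration and then overrides one option locally. The document URI books.xml and the element names are assumptions for this example only.

```xquery
(: Non-normative sketch; books.xml, book and title are hypothetical names. :)

(: Overwrite two static context components for the whole module. :)
declare ft-option using case sensitive using stemming;

(: Uses the prolog settings: case sensitive, with stemming. :)
doc("books.xml")//book[title contains text "Usability"],

(: A match option on the FTPrimary overrides the prolog setting for that selection only. :)
doc("books.xml")//book[title contains text "Usability" using case insensitive]
```

The two match options in the declaration belong to different match option groups, so they may appear together in one FTMatchOptions.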

D Error Conditions

err:FTST0001

An implementation that does not support the FTMildNot operator must raise a static error if a full-text query contains a mild not.

err:FTST0002

An implementation that enforces one of the restrictions on FTUnaryNot must raise a static error if a full-text query does not obey the restriction.

err:FTST0003

An implementation that does not support one or more of the choices on FTUnit and FTBigUnit must raise a static error if a full-text query contains one of those choices.

err:FTST0004

An implementation that does not support the FTScope operator must raise a static error if a full-text query contains a scope.

err:FTST0005

An implementation that does not support the FTTimes operator must raise a static error if a full-text query contains a times.

err:FTST0006

An implementation that restricts the use of FTStopWordOption must raise a static error if a full-text query contains a stop word option that does not meet the restriction.

err:FTST0007

An implementation that restricts the use of FTIgnoreOption must raise a static error if a full-text query contains an ignore option that does not meet the restriction.

err:FTST0008

It is a static error if, during the static analysis phase, the query is found to contain a stop word option that refers to a stop word list that is not found in the statically known stop word lists.

err:FTST0009

It may be a static error if, during the static analysis phase, the query is found to contain a language identifier in a language option that the implementation does not support. The implementation may choose not to raise this error and instead provide some other implementation-defined behavior.

err:FTST0010

It is a static error if, during the static analysis phase, an expression is found to use an FTOrder operator that does not appear directly succeeding an FTWindow or an FTDistance operator and the implementation enforces this restriction.

err:FTST0011

An implementation may restrict the use of FTWindow and FTDistance to an FTOr that is either a single FTWords or a combination of FTWords involving only the operators ftand and ftor. It is a static error if, during the static analysis phase, an expression is found that violates this restriction and the implementation enforces this restriction.

err:FTST0012

An implementation that does not support the FTContent operator must raise a static error if a full-text query contains one.

err:FTST0013

It is a static error if, during the static analysis phase, an implementation that restricts the use of FTLanguageOption to a single language encounters more than one distinct language option.

err:FTST0014

An implementation may constrain the form of the expression used to compute scores. It is a static error if, during the static analysis phase, such an implementation encounters a scoring expression that does not meet the restriction.

err:FTST0015

It is a static error if, during the static analysis phase, an implementation that restricts the choices of FTCaseOption encounters the "lowercase" or "uppercase" option.

err:FTDY0016

It is a dynamic error if a weight value is not within the required range of values; it is also a dynamic error if an implementation that does not support negative weights encounters a negative weight value.

err:FTDY0017

It is a dynamic error if an implementation encounters a mild not selection, one of whose operands evaluates to an AllMatches that contains a StringExclude.
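
The following non-normative sketch contrasts a mild not whose operands contain only positive matches with one that would be expected to raise this error, on the assumption that an ftnot operand yields an AllMatches containing a StringExclude. The variable $doc and the element name p are hypothetical.

```xquery
(: Non-normative sketch; $doc and p are hypothetical. :)
declare variable $doc external;

(: Acceptable: both operands of "not in" contain only positive matches. :)
$doc//p[. contains text "Mexico" not in "New Mexico"],

(: Expected to raise err:FTDY0017: the right operand is an FTUnaryNot
   introduced by ftnot, which contributes a StringExclude. :)
$doc//p[. contains text "Mexico" not in (ftnot "New Mexico")]
```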

err:FTST0018

It is a static error if, during the static analysis phase, the query is found to contain a thesaurus option that refers to a thesaurus that is not found in the statically known thesauri.

err:FTST0019

It is a static error if, within a single FTMatchOptions, there is more than one match option of any given match option group.
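
As a non-normative illustration, the first selection below combines options from different match option groups, while the second repeats the case group within a single FTMatchOptions and so would be rejected with this error. The variable $doc and the element name title are hypothetical.

```xquery
(: Non-normative sketch; $doc and title are hypothetical. :)
declare variable $doc external;

(: Acceptable: one case option and one stemming option. :)
$doc//title[. contains text "usability" using case sensitive using stemming],

(: err:FTST0019: two options from the FTCaseOption group in one FTMatchOptions. :)
$doc//title[. contains text "usability" using case sensitive using lowercase]
```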

err:FTDY0020

It is a dynamic error if, when "wildcards" is in effect, a query string violates wildcard syntax.

err:FOCH0002

It is a dynamic error if, in a function invocation, the argument corresponding to the specified function's collation parameter does not identify a supported collation.

err:XPST0003

It is a static error if an expression is not a valid instance of the grammar defined in A EBNF for XQuery 1.0 Grammar with Full Text extensions or of the grammar defined in B EBNF for XPath 2.0 Grammar with Full-Text extensions.

err:XPTY0004

It is a type error if, during the static analysis phase, an expression is found to have a static type that is not appropriate for the context in which the expression occurs, or during the dynamic evaluation phase, the dynamic type of a value does not match a required type as specified by the matching rules in Section 2.5.4 SequenceType MatchingXP.

err:XQST0013

It is a static error if an implementation recognizes a pragma but determines that its content is invalid.

err:XQST0079

It is a static error if an extension expression contains neither a pragma that is recognized by the implementation nor an expression enclosed in curly braces.

E XML Syntax (XQueryX) for XQuery and XPath Full Text 1.0

[XML Syntax for XQuery 1.0 (XQueryX) (Second Edition)] defines an XML representation of [XQuery 1.0: An XML Query Language (Second Edition)]. [XQuery and XPath Full Text 1.0 Requirements], section 5.4, XML Syntax, states "XQuery and XPath Full Text MAY have more than one syntax binding. One query language syntax MUST be expressed in XML in a way that reflects the underlying structure of the query. See XML Query Requirements." This appendix specifies XML Schemas that together define the XML representation of XQuery and XPath Full Text 1.0 by representing the abstract syntax found in A EBNF for XQuery 1.0 Grammar with Full Text extensions. Because XQuery and XPath Full Text 1.0 integrates seamlessly with XQuery 1.0, it follows that the XML Syntax for XQuery and XPath Full Text 1.0 must integrate well with the XML Syntax for XQuery 1.0.

The XML Schema specified in this appendix accomplishes integration by importing the XML Schema defined for XQueryX in Section 4 An XML Schema for the XQuery XML SyntaxXQX, incorporating all of its type and element definitions. It then extends that schema by adding definitions of new types and elements in a namespace belonging to the full-text specification.

The semantics of a Full Text XQueryX document are determined by the semantics of the XQuery Full Text expression that results from transforming the XQueryX document into XQuery Full Text syntax using the XSLT stylesheet that appears in section E.2 XQueryX stylesheet for XQuery and XPath Full Text 1.0. The "correctness" of that transformation is determined by asking the following question: Can some Full Text XQueryX processor QX process some Full Text XQueryX document D1 to produce results R1, after which the stylesheet is used to translate D1 into an XQuery Full Text expression E1 that, when processed by some XQuery Full Text processor Q, produces results R2 that are equivalent (under some meaningful definition of "equivalent") to results R1?

E.1 XQueryX representation of XQuery and XPath Full Text 1.0

The XML Schema that defines the complex types and elements for XQueryX in support of XQuery and XPath Full Text 1.0, including the ftContainsExpr, incorporates a second XML Schema that defines types and elements to support the ftMatchOption. Both XML Schemas are defined in this section.


<xsd:schema
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns:xqx="http://www.w3.org/2005/XQueryX"
     xmlns:xqxft="http://www.w3.org/2007/xpath-full-text"
     targetNamespace="http://www.w3.org/2007/xpath-full-text"
     elementFormDefault="qualified"
     attributeFormDefault="unqualified">

<!-- Initial creation                            2006-08-17: Jim Melton    -->
<!-- Added ftOptionDecl, ftScoreVariableBinding  2006-08-21: Jim Melton    -->
<!-- First version believed complete             2006-08-29: Jim Melton    -->
<!-- Cleaned up naming                           2007-04-27: Mary Holstege -->
<!-- Revised to align with updated syntax        2008-01-14: Jim Melton    -->
<!-- Moved ftOptionDecl: prolog part two to one  2008-01-24: Jim Melton    -->
<!-- Revised position of "weight" in grammar     2008-11-12: Jim Melton    -->

  <xsd:import namespace="http://www.w3.org/2005/XQueryX"
              schemaLocation="http://www.w3.org/2005/XQueryX/xqueryx.xsd"/>

  <xsd:include schemaLocation="./xpath-full-text-10-xqueryx-ftmatchoption-extensions.xsd"/>

  <xsd:element name="ftOptionDecl" substitutionGroup="xqx:prologPartOneItem">
    <xsd:complexType>
      <xsd:sequence minOccurs="1" maxOccurs="unbounded">
        <xsd:element ref="xqxft:ftMatchOption"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>


  <!-- Create a new substitution group for full-text expressions           -->
  <xsd:complexType name="ftExpr">
    <xsd:complexContent>
      <xsd:extension base="xqx:expr"/>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="ftExpr" type="xqxft:ftExpr" abstract="true" substitutionGroup="xqx:expr"/>


  <!-- Represents an untyped variable for the "score" clause               -->
  <xsd:element name="ftScoreVariableBinding" type="xqx:QName"
               substitutionGroup="xqx:forLetClauseItemExtensions"/>



  <!-- FTContains ("contains text")                                        -->
  <!-- Represents the following grammar productions:                       -->
  <!--   FTContainsExpr ::=                                                -->
  <!--     RangeExpr ( "contains" "text" FTSelection FTIgnoreOption? )?    -->
  <xsd:complexType name="ftContainsExpr">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftExpr">
        <xsd:sequence>
          <xsd:element name="ftRangeExpr"
                       type="xqx:exprWrapper" />
          <xsd:sequence minOccurs="0" maxOccurs="1">
            <xsd:element name="ftSelectionExpr"
                         type="xqxft:ftSelectionWrapper" />
            <xsd:element name="ftIgnoreOption"
                         type="xqxft:ftIgnoreOption"
                         minOccurs="0" maxOccurs="1" />
          </xsd:sequence>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="ftContainsExpr" type="xqxft:ftContainsExpr" substitutionGroup="xqxft:ftExpr" />


  <!-- FTProximity                                                         -->
  <!-- Represents the following grammar productions:                       -->
  <!--   FTPosFilter ::=                                                   -->
  <!--     FTOrder | FTWindow | FTDistance | FTScope | FTContent           -->
  <xsd:complexType name="ftProximity" />

  <xsd:element name="ftProximity" type="xqxft:ftProximity" abstract="true"/>


  <!-- some simple type definitions                                        -->

  <!-- Represents the following grammar productions:                       -->
  <!--   FTUnit ::= "words" | "sentences" | "paragraphs"                   -->
  <xsd:simpleType name="ftUnit">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="paragraph"/>
      <xsd:enumeration value="sentence"/>
      <xsd:enumeration value="word"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Represents the following grammar productions:                       -->
  <!--   FTBigUnit ::= "sentence" | "paragraph"                            -->
  <xsd:simpleType name="ftBigUnit">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="paragraph"/>
      <xsd:enumeration value="sentence"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Represents the following grammar productions:                       -->
  <!--   FTContent ::= ("at" "start") | ("at" "end") | ("entire" "content")-->
  <xsd:simpleType name="contentLocation">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="at start"/>
      <xsd:enumeration value="at end"/>
      <xsd:enumeration value="entire content"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Represents the following grammar productions:                       -->
  <!--   FTScope ::= ("same" | "different") FTBigUnit                      -->
  <xsd:simpleType name="ftScopeType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="same"/>
      <xsd:enumeration value="different"/>
    </xsd:restriction>
  </xsd:simpleType>


  <!-- range-related definitions                                           -->
  <xsd:complexType name="unaryRange">
    <xsd:sequence>
      <xsd:element name="value" type="xqx:exprWrapper" />
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="binaryRange">
    <xsd:sequence>
      <xsd:element name="lower" type="xqx:exprWrapper" />
      <xsd:element name="upper" type="xqx:exprWrapper" />
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="unaryLiteralRange">
    <xsd:sequence>
      <xsd:element name="value" type="xsd:integer" />
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="binaryLiteralRange">
    <xsd:sequence>
      <xsd:element name="lower" type="xsd:integer" />
      <xsd:element name="upper" type="xsd:integer" />
    </xsd:sequence>
  </xsd:complexType>

  <!-- Represents the following grammar productions:                       -->
  <!--   FTRange ::= ("exactly" AdditiveExpr)                              -->
  <!--             | ("at" "least" AdditiveExpr)                           -->
  <!--             | ("at" "most" AdditiveExpr)                            -->
  <!--             | ("from" AdditiveExpr "to" AdditiveExpr)               -->
  <xsd:complexType name="ftRange">
    <xsd:choice>
      <xsd:element name="atLeastRange" type="xqxft:unaryRange" />
      <xsd:element name="atMostRange" type="xqxft:unaryRange" />
      <xsd:element name="exactlyRange" type="xqxft:unaryRange" />
      <xsd:element name="fromToRange" type="xqxft:binaryRange" />
    </xsd:choice>
  </xsd:complexType>


  <!-- Represents the following grammar productions:                       -->
  <!--   FTLiteralRange ::= ("exactly" IntegerLiteral)                     -->
  <!--                    | ("at" "least" IntegerLiteral)                  -->
  <!--                    | ("at" "most" IntegerLiteral)                   -->
  <!--                    | ("from" IntegerLiteral "to" IntegerLiteral)    -->
  <xsd:complexType name="ftLiteralRange">
    <xsd:choice>
      <xsd:element name="atLeastLiteralRange" type="xqxft:unaryLiteralRange" />
      <xsd:element name="atMostLiteralRange" type="xqxft:unaryLiteralRange" />
      <xsd:element name="exactlyLiteralRange" type="xqxft:unaryLiteralRange" />
      <xsd:element name="fromToLiteralRange" type="xqxft:binaryLiteralRange" />
    </xsd:choice>
  </xsd:complexType>


  <!-- ftPosFilter alternative: ordered                                    -->
  <!-- Represents the following grammar productions:                       -->
  <!--   FTOrder ::= "ordered"                                             -->
  <xsd:complexType name="ftOrdered">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftProximity">
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="ftOrdered" type="xqxft:ftOrdered" substitutionGroup="xqxft:ftProximity"/>


  <!-- ftPosFilter alternative: window                                     -->
  <!-- Represents the following grammar productions:                       -->
  <!--   FTWindow ::= "window" AdditiveExpr FTUnit                         -->
  <xsd:complexType name="ftWindow">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftProximity">
        <xsd:sequence>
          <xsd:element name="value" type="xqx:exprWrapper" />
          <xsd:element name="unit" type="xqxft:ftUnit" />
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="ftWindow" type="xqxft:ftWindow" substitutionGroup="xqxft:ftProximity"/>


  <!-- ftPosFilter alternative: distance                                   -->
  <!-- Represents the following grammar productions:                       -->
  <!--   FTDistance ::= "distance" FTRange FTUnit                          -->
  <xsd:complexType name="ftDistance">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftProximity">
        <xsd:sequence>
          <xsd:element name="ftRange" type="xqxft:ftRange" />
          <xsd:element name="unit" type="xqxft:ftUnit" />
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="ftDistance" type="xqxft:ftDistance" substitutionGroup="xqxft:ftProximity"/>

  <!-- ftPosFilter alternative: scope                                      -->
  <!-- Represents the following grammar productions:                       -->
  <!--   FTScope ::= ("same" | "different") FTBigUnit                      -->
  <xsd:complexType name="ftScope">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftProximity">
        <xsd:sequence>
          <xsd:element name="type" type="xqxft:ftScopeType" />
          <xsd:element name="unit" type="xqxft:ftBigUnit" />
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="ftScope" type="xqxft:ftScope" substitutionGroup="xqxft:ftProximity"/>

  <!-- ftPosFilter alternative: FTContent                                  -->
  <!-- Represents the following grammar productions:                       -->
  <!--   FTContent ::= ("at" "start") | ("at" "end") | ("entire" "content")-->
  <xsd:complexType name="ftContent">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftProximity">
        <xsd:sequence>
          <xsd:element name="location" type="xqxft:contentLocation" />
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="ftContent" type="xqxft:ftContent" substitutionGroup="xqxft:ftProximity"/>


  <!-- ftPosFilter                                                         -->
  <!-- Represents the following grammar productions:                       -->
  <!--   FTPosFilter ::=                                                   -->
  <!--     FTOrder | FTWindow | FTDistance | FTScope | FTContent           -->
  <xsd:complexType name="ftPosFilter">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftExpr">
        <xsd:sequence minOccurs="0" maxOccurs="unbounded">
          <xsd:element ref="xqxft:ftProximity" />
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>


  <!-- FTSelection                                                         -->
  <!-- Represents the following grammar productions:                       -->
  <!--   FTSelection ::= FTOr FTPosFilter*                                 -->
  <xsd:complexType name="ftSelection" >
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftExpr">
        <xsd:sequence>
          <xsd:element name="ftSelectionSource" type="xqx:exprWrapper"/>
          <xsd:element name="ftPosFilter"
                       type="xqxft:ftPosFilter"
                       minOccurs="0" maxOccurs="1" />
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="ftSelection" type="xqxft:ftSelection" substitutionGroup="xqxft:ftExpr" />


  <xsd:complexType name="ftSelectionWrapper">
    <xsd:sequence>
      <xsd:element ref="xqxft:ftSelection"/>
    </xsd:sequence>
  </xsd:complexType>


  <!-- Represents the following grammar productions:                       -->
  <!--   FTIgnoreOption ::= "without" "content" UnionExpr                  -->
  <xsd:complexType name="ftIgnoreOption">
    <xsd:sequence>
      <xsd:element ref="xqx:expr"/>
    </xsd:sequence>
  </xsd:complexType>


  <!-- Full-Text logical operators                                         -->
  <xsd:element name="ftLogicalOp" type="xqx:binaryOperatorExpr" abstract="true"
               substitutionGroup="xqx:operatorExpr"/>

  <!-- Represents the following grammar productions:                       -->
  <!--   FTOr ::= FTAnd ( "ftor" FTAnd )*                                  -->
  <xsd:element name="ftOr" type="xqx:binaryOperatorExpr"
               substitutionGroup="xqxft:ftLogicalOp"/>

  <!-- Represents the following grammar productions:                       -->
  <!--   FTAnd ::= FTMildNot ( "ftand" FTMildNot )*                        -->
  <xsd:element name="ftAnd" type="xqx:binaryOperatorExpr"
               substitutionGroup="xqxft:ftLogicalOp"/>

  <!-- Represents the following grammar productions:                       -->
  <!--   FTMildNot ::= FTUnaryNot ( "not" "in" FTUnaryNot )*               -->
  <xsd:element name="ftMildNot" type="xqx:binaryOperatorExpr"
               substitutionGroup="xqxft:ftLogicalOp"/>

  <!-- Full-Text unary logical operators                                   -->
  <xsd:element name="ftLogicalNot" type="xqx:unaryOperatorExpr" abstract="true"
               substitutionGroup="xqx:operatorExpr"/>

  <!-- Represents the following grammar productions:                       -->
  <!--   FTUnaryNot ::= ("ftnot")? FTPrimaryWithOptions                    -->
  <xsd:element name="ftUnaryNot" type="xqx:unaryOperatorExpr"
               substitutionGroup="xqxft:ftLogicalNot"/>


  <!-- Definitions associated with FTWords                                 -->
  <!-- Represents the following grammar productions:                       -->
  <!--   FTTimes ::= "occurs" FTRange "times"                              -->
  <xsd:complexType name="ftTimes">
    <xsd:sequence>
      <xsd:element name="ftRange" type="xqxft:ftRange"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Represents the following grammar productions:                       -->
  <!--  FTAnyallOption ::= ("any" "word"?) | ("all" "words"?) | "phrase"   -->
  <xsd:simpleType name="ftAnyAllOption">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="any"/>
      <xsd:enumeration value="all"/>
      <xsd:enumeration value="any word"/>
      <xsd:enumeration value="all words"/>
      <xsd:enumeration value="phrase"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Represents the following grammar productions:                       -->
  <!--   FTWordsValue ::= StringLiteral | ("{" Expr "}")                   -->
  <xsd:complexType name="ftWordsAlternatives">
    <xsd:choice>
      <xsd:element name="ftWordsLiteral" type="xqx:exprWrapper"/>
      <xsd:element name="ftWordsExpression" type="xqx:exprWrapper"/>
    </xsd:choice>
  </xsd:complexType>

  <!-- Represents the following grammar productions:                       -->
  <!--   FTWords ::= FTWordsValue FTAnyallOption?                          -->
  <xsd:complexType name="ftWords">
    <xsd:sequence>
      <xsd:element name="ftWordsValue" type="xqxft:ftWordsAlternatives" />
      <xsd:element name="ftAnyAllOption" type="xqxft:ftAnyAllOption"
                   minOccurs="0" maxOccurs="1" />
    </xsd:sequence>
  </xsd:complexType>

  <!-- Represents the following grammar productions:                       -->
  <!--   ... (FTWords FTTimes?) ...                                        -->
  <xsd:group name="ftWordsWithTimes">
    <xsd:sequence>
      <xsd:element name="ftWords" type="xqxft:ftWords" />
      <xsd:element name="ftTimes" type="xqxft:ftTimes" minOccurs="0" />
    </xsd:sequence>
  </xsd:group>


  <!-- Represents the following grammar productions:                       -->
  <!--   FTExtensionSelection ::= Pragma+ "{" FTSelection? "}"             -->
  <xsd:complexType name="ftExtensionSelection">
    <xsd:sequence>
      <xsd:element name="pragma" type="xqx:pragma"
                   minOccurs="1" maxOccurs="unbounded"/>
      <xsd:element name="ftSelection" type="xqxft:ftSelection"
                   minOccurs="0" maxOccurs="1"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Represents the following grammar productions:                       -->
  <!--   FTPrimary ::= (FTWords FTTimes?)                                  -->
  <!--               | ("(" FTSelection ")")                               -->
  <!--               | FTExtensionSelection                                -->
  <xsd:complexType name="ftPrimary">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftExpr" >
        <xsd:choice>
          <xsd:element name="parenthesized" type="xqx:exprWrapper"/>
          <xsd:group ref="xqxft:ftWordsWithTimes" />
          <xsd:element name="ftExtensionSelection" type="xqxft:ftExtensionSelection"/>
        </xsd:choice>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <!-- Represents the following grammar productions:                       -->
  <!--   FTPrimaryWithOptions ::= FTPrimary FTMatchOptions? FTWeight?      -->
  <xsd:complexType name="ftPrimaryWithOptions">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftExpr">
        <xsd:sequence>
          <xsd:element name="ftPrimary" type="xqxft:ftPrimary"/>
          <xsd:element ref="xqxft:ftMatchOptions"
                       minOccurs="0" maxOccurs="1"/>
          <xsd:element name="weight"
                       type="xqx:exprWrapper"
                       minOccurs="0" maxOccurs="1" />
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="ftPrimaryWithOptions" type="xqxft:ftPrimaryWithOptions"
               substitutionGroup="xqxft:ftExpr"/>

</xsd:schema>
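
As a non-normative illustration of the positional-filter types defined above, the fragment below is a minimal sketch of an ftPosFilter element holding three ftProximity alternatives. It assumes that the ftExpr and ftProximity base types, which are defined outside this listing, contribute no required content of their own. The enumeration values are taken from ftScopeType, ftBigUnit and contentLocation.

```xml
<!-- Illustrative sketch only; not part of the normative schema. -->
<xqxft:ftPosFilter xmlns:xqxft="http://www.w3.org/2007/xpath-full-text">
  <!-- FTOrder: the ftOrdered type adds no content to its base type -->
  <xqxft:ftOrdered/>
  <!-- FTScope: a value from ftScopeType followed by a value from ftBigUnit -->
  <xqxft:ftScope>
    <xqxft:type>same</xqxft:type>
    <xqxft:unit>sentence</xqxft:unit>
  </xqxft:ftScope>
  <!-- FTContent: a single value from contentLocation -->
  <xqxft:ftContent>
    <xqxft:location>at start</xqxft:location>
  </xqxft:ftContent>
</xqxft:ftPosFilter>
```

Read in document order, these three filters are intended to correspond to the Full Text syntax: ordered same sentence at start.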



<xsd:schema
     xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
     xmlns:xqx="http://www.w3.org/2005/XQueryX"
     xmlns:xqxft="http://www.w3.org/2007/xpath-full-text"
     targetNamespace="http://www.w3.org/2007/xpath-full-text"
     elementFormDefault="qualified" 
     attributeFormDefault="unqualified">

<!-- Initial creation                         2006-08-17: Jim Melton       -->
<!-- First version believed complete          2006-08-29: Jim Melton       -->
<!-- Cleaned up naming                        2007-04-27: Mary Holstege    -->     
<!-- Revised to align with updated syntax     2008-01-14: Jim Melton       -->
<!-- Comments added to clarify each element   2008-11-12: Jim Melton       -->
<!-- Add element decl for ftMatchOptions      2009-07-06: Michael Dyck     -->
<!-- Fixed FTThesaurus for ftLiteralRange     2011-03-08: Jim Melton       -->

  <xsd:import namespace="http://www.w3.org/2005/XQueryX"
              schemaLocation="http://www.w3.org/2005/XQueryX/xqueryx.xsd"/>

  <!-- FTMatchOption                                                       -->
  <!-- Represents the following grammar productions:                       -->
  <!--   FTMatchOption ::= FTLanguageOption                                -->
  <!--                   | FTWildCardOption                                -->
  <!--                   | FTThesaurusOption                               -->
  <!--                   | FTStemOption                                    -->
  <!--                   | FTCaseOption                                    -->
  <!--                   | FTDiacriticsOption                              -->
  <!--                   | FTStopWordOption                                -->
  <!--                   | FTExtensionOption                               -->
  <xsd:complexType name="ftMatchOption" />

  <xsd:element name="ftMatchOption" type="xqxft:ftMatchOption"
               abstract="true" />
  

  <!-- Represents the following grammar productions:                       -->
  <!--   FTMatchOptions ::= ( "using" FTMatchOption )+                     -->
  <xsd:complexType name="ftMatchOptions">
    <xsd:sequence minOccurs="1" maxOccurs="unbounded">
      <xsd:element ref="xqxft:ftMatchOption"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:element name="ftMatchOptions" type="xqxft:ftMatchOptions"/>
  

  <!-- ftMatchOption alternative: case                                     -->
  <!-- Represents the following grammar productions:                       -->
  <!--   FTCaseOption ::= ("case" "insensitive")                           -->
  <!--                  | ("case" "sensitive")                             -->
  <!--                  | "lowercase"                                      -->
  <!--                  | "uppercase"                                      -->
  <xsd:complexType name="ftCaseOption">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftMatchOption" >
        <xsd:sequence>
          <xsd:element name="value">
            <xsd:simpleType>
              <xsd:restriction base="xsd:string">
                <xsd:enumeration value="lowercase"/>
                <xsd:enumeration value="uppercase"/>
                <xsd:enumeration value="case sensitive"/>
                <xsd:enumeration value="case insensitive"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="case" type="xqxft:ftCaseOption"
               substitutionGroup="xqxft:ftMatchOption" />


  <!-- ftMatchOption alternative: diacritics                               -->
  <!-- Represents the following grammar productions:                       -->
  <!--   FTDiacriticsOption ::= ("diacritics" "insensitive")               -->
  <!--                        | ("diacritics" "sensitive")                 -->
  <xsd:complexType name="ftDiacriticsOption">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftMatchOption" >
        <xsd:sequence>
          <xsd:element name="value">
            <xsd:simpleType>
              <xsd:restriction base="xsd:string">
                <xsd:enumeration value="diacritics sensitive"/>
                <xsd:enumeration value="diacritics insensitive"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  
  <xsd:element name="diacritics" type="xqxft:ftDiacriticsOption"
               substitutionGroup="xqxft:ftMatchOption" />


  <!-- ftMatchOption alternative: stemming                                 -->
  <!-- Represents the following grammar productions:                       -->
  <!--   FTStemOption ::= ("stemming") | ("no" "stemming")                 -->
  <xsd:complexType name="ftStemOption">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftMatchOption" >
        <xsd:sequence>
          <xsd:element name="value">
            <xsd:simpleType>
              <xsd:restriction base="xsd:string">
                <xsd:enumeration value="stemming" /> 
                <xsd:enumeration value="no stemming" /> 
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  
  <xsd:element name="stem" type="xqxft:ftStemOption"
               substitutionGroup="xqxft:ftMatchOption" />


  <!-- ftMatchOption alternative: thesaurus                                -->
  <!-- Represents the following grammar productions:                       -->
  <!--   FTThesaurusID ::= "at" URILiteral ("relationship" StringLiteral)? -->
  <!--                     (FTLiteralRange "levels")?                      -->
  <xsd:complexType name="ftThesaurusID">
    <xsd:sequence>
      <xsd:element name="at" type="xsd:anyURI" />
      <xsd:element name="relationship" type="xsd:string" minOccurs="0" />
      <xsd:element name="levels" type="xqxft:ftLiteralRange" minOccurs="0" />
    </xsd:sequence>
  </xsd:complexType>

  <!-- Represents the following grammar productions:                       -->
  <!--   ... (FTThesaurusID | "default")                                   -->
  <!--   ... "(" (FTThesaurusID | "default") ("," FTThesaurusID)* ")")     -->
  <xsd:complexType name="thesaurusSpecSequence">
    <xsd:sequence>
      <xsd:choice>
        <xsd:element name="default" />
        <xsd:element name="thesaurusID"
                     type="xqxft:ftThesaurusID" />
      </xsd:choice>
      <xsd:element name="thesaurusID" type="xqxft:ftThesaurusID"
                   minOccurs="0" maxOccurs="unbounded" />
    </xsd:sequence>
  </xsd:complexType>

  <!-- Represents the following grammar productions:                       -->
  <!--   FTThesaurusOption ::=                                             -->
  <!--       ("thesaurus" (FTThesaurusID | "default"))                     -->
  <!--     | ("thesaurus"                                                  -->
  <!--          "(" (FTThesaurusID | "default") ("," FTThesaurusID)* ")")  -->
  <!--     | ("no" "thesaurus")                                            -->
  <xsd:complexType name="ftThesaurusOption">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftMatchOption" >
        <xsd:choice>
          <xsd:element name="noThesauri" />
          <xsd:element name="thesauri" type="xqxft:thesaurusSpecSequence" />
        </xsd:choice>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="thesaurus" type="xqxft:ftThesaurusOption"
               substitutionGroup="xqxft:ftMatchOption" />


  <!-- ftMatchOption alternative: stopwords                                -->
  <!-- Represents the following grammar productions:                       -->
  <!--   FTStopWords ::= ("at" URILiteral)                                 -->
  <!--                 | ("(" StringLiteral ("," StringLiteral)* ")")      -->
  <xsd:complexType name="ftStopWords">
    <xsd:choice>
      <xsd:element name="ref" type="xsd:anyURI" />
      <xsd:element name="list">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element ref="xqx:stringConstantExpr"
                         minOccurs="1" maxOccurs="unbounded" />
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:choice>
  </xsd:complexType>

  <xsd:element name="ftStopWords" type="xqxft:ftStopWords" />


  <!-- Represents the following grammar productions:                       -->
  <!--   ... "stop" "words" FTStopWords ...                                -->
  <!--   ... "stop" "words" "default" ...                                  -->
  <xsd:group name="baseStopWords">
    <xsd:choice>
      <xsd:element name="default" />
      <xsd:element ref="xqxft:ftStopWords" />
    </xsd:choice>
  </xsd:group>

  <!-- Represents the following grammar productions:                       -->
  <!--   FTStopWordsInclExcl ::= ("union" | "except") FTStopWords          -->
  <xsd:complexType name="ftStopWordsInclExcl">
    <xsd:choice>
      <xsd:element name="union" type="xqxft:ftStopWords" />
      <xsd:element name="except" type="xqxft:ftStopWords" />
    </xsd:choice>
  </xsd:complexType>

  <!-- Represents the following grammar productions:                       -->
  <!--   ... ("using" "stop" "words" FTStopWords FTStopWordsInclExcl*) ... -->
  <!--   ... ("using" "stop" "words" "default" FTStopWordsInclExcl*) ...   -->
  <xsd:complexType name="stopWordsSpecSequence">
    <xsd:sequence>
      <xsd:group ref="xqxft:baseStopWords" />
      <xsd:element name="ftStopWordsInclExcl"
                   type="xqxft:ftStopWordsInclExcl"
                   minOccurs="0" maxOccurs="unbounded" />
    </xsd:sequence>
  </xsd:complexType>

  <!-- Represents the following grammar productions:                       -->
  <!--   FTStopWordOption ::=                                              -->
  <!--       ("stop" "words" FTStopWords FTStopWordsInclExcl*)             -->
  <!--     | ("stop" "words" "default" FTStopWordsInclExcl*)               -->
  <!--     | ("no" "stop" "words")                                         -->
  <xsd:complexType name="ftStopWordOption">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftMatchOption" >
        <xsd:choice>
          <xsd:element name="noStopwords" />
          <xsd:element name="stopwords" type="xqxft:stopWordsSpecSequence" />
        </xsd:choice>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="stopword" type="xqxft:ftStopWordOption"
               substitutionGroup="xqxft:ftMatchOption" />


  <!-- ftMatchOption alternative: language                                 -->
  <!-- Represents the following grammar productions:                       -->
  <!--   FTLanguageOption ::= "language" StringLiteral                     -->
  <xsd:complexType name="ftLanguageOption">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftMatchOption" >
        <xsd:sequence>
          <xsd:element name="value" type="xsd:string" />
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="language" type="xqxft:ftLanguageOption"
               substitutionGroup="xqxft:ftMatchOption" />


  <!-- ftMatchOption alternative: wildcards                                -->
  <!-- Represents the following grammar productions:                       -->
  <!--   FTWildCardOption ::= ("wildcards")                                -->
  <!--                      | ("no" "wildcards")                           -->
  <xsd:complexType name="ftWildCardOption">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftMatchOption">
        <xsd:sequence>
          <xsd:element name="value">
            <xsd:simpleType>
              <xsd:restriction base="xsd:string">
                <xsd:enumeration value="wildcards" /> 
                <xsd:enumeration value="no wildcards" />
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="wildcard" type="xqxft:ftWildCardOption"
               substitutionGroup="xqxft:ftMatchOption" />


  <!-- Represents the following grammar productions:                       -->
  <!--   FTExtensionOption ::= "option" QName StringLiteral                -->
  <xsd:complexType name="ftExtensionOption">
    <xsd:complexContent>
      <xsd:extension base="xqxft:ftMatchOption">
        <xsd:sequence>
          <xsd:element name="ftExtensionName" type="xqx:QName"/>
          <xsd:element name="ftExtensionValue" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="ftExtensionOption" type="xqxft:ftExtensionOption"
               substitutionGroup="xqxft:ftMatchOption" />

</xsd:schema>
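
As a non-normative illustration of the match-option types defined above, the fragment below is a minimal sketch of an ftMatchOptions element carrying a case option, a thesaurus option and a stop-word option. The thesaurus URI, the relationship name and the stop word are hypothetical values chosen for the example. The sketch also assumes the usual XQueryX form of xqx:stringConstantExpr, with a single xqx:value child, which is defined in the imported XQueryX schema rather than here.

```xml
<!-- Illustrative sketch only; not part of the normative schema. -->
<xqxft:ftMatchOptions xmlns:xqxft="http://www.w3.org/2007/xpath-full-text"
                      xmlns:xqx="http://www.w3.org/2005/XQueryX">
  <!-- FTCaseOption: one of the four enumerated strings -->
  <xqxft:case>
    <xqxft:value>case insensitive</xqxft:value>
  </xqxft:case>
  <!-- FTThesaurusOption with one FTThesaurusID (hypothetical URI) -->
  <xqxft:thesaurus>
    <xqxft:thesauri>
      <xqxft:thesaurusID>
        <xqxft:at>http://example.org/thesaurus.xml</xqxft:at>
        <xqxft:relationship>synonyms</xqxft:relationship>
        <xqxft:levels>
          <xqxft:atMostLiteralRange>
            <xqxft:value>2</xqxft:value>
          </xqxft:atMostLiteralRange>
        </xqxft:levels>
      </xqxft:thesaurusID>
    </xqxft:thesauri>
  </xqxft:thesaurus>
  <!-- FTStopWordOption: the default list, minus one listed word -->
  <xqxft:stopword>
    <xqxft:stopwords>
      <xqxft:default/>
      <xqxft:ftStopWordsInclExcl>
        <xqxft:except>
          <xqxft:list>
            <xqx:stringConstantExpr>
              <xqx:value>the</xqx:value>
            </xqx:stringConstantExpr>
          </xqxft:list>
        </xqxft:except>
      </xqxft:ftStopWordsInclExcl>
    </xqxft:stopwords>
  </xqxft:stopword>
</xqxft:ftMatchOptions>
```

Under those assumptions, the fragment is intended to correspond to the Full Text syntax: using case insensitive using thesaurus at "http://example.org/thesaurus.xml" relationship "synonyms" at most 2 levels using stop words default except ("the").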


E.2 XQueryX stylesheet for XQuery and XPath Full Text 1.0

The XSLT stylesheet that defines the semantics of XQueryX in support of XQuery and XPath Full Text 1.0 integrates seamlessly with the XQueryX XSLT stylesheet defined in Section B, Transforming XQueryX to XQuery, of the XQueryX specification [XQX], by importing that stylesheet. It provides additional templates that define the semantics of the XQueryX representation of XQuery and XPath Full Text 1.0 by transforming that representation into the human-readable syntax of XQuery and XPath Full Text 1.0.


<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xqxft="http://www.w3.org/2007/xpath-full-text"
                xmlns:xqx="http://www.w3.org/2005/XQueryX">

<!-- Initial creation                            2006-08-17: Jim Melton   -->
<!-- Added ftOptionDecl, ftScoreVariableBinding  2006-08-21: Jim Melton   -->
<!-- First version believed complete             2006-08-29: Jim Melton   -->
<!-- Revised to align with 2008-01-24 draft      2008-02-08: Jim Melton   -->
<!-- Revised position of "weight" in grammar     2008-11-12: Jim Melton   -->
<!-- Various bug fixes                           2009-07-14: Michael Dyck -->
<!-- ftcontains => "contains text", Bug 7247     2009-09-17: Jim Melton   -->
<!-- with => using, stop words default, Bug 7271 2009-09-17: Jim Melton   -->
<!-- {} around weight values, around empty
     selection after pragmas                     2010-09-07: Jim Melton   -->

<xsl:import href="http://www.w3.org/2005/XQueryX/xqueryx.xsl"/>


<!-- ftOptionDecl -->
<xsl:template match="xqxft:ftOptionDecl">
  <xsl:text>declare ft-option </xsl:text>
  <xsl:apply-templates/>
</xsl:template>


<!-- ftScoreVariableBinding -->
<xsl:template match="xqxft:ftScoreVariableBinding">
  <xsl:text> score </xsl:text>
  <xsl:value-of select="$DOLLAR"/>
  <xsl:if test="@xqx:prefix">
    <xsl:value-of select="@xqx:prefix"/>
    <xsl:value-of select="$COLON"/>
  </xsl:if>
  <xsl:value-of select="."/>
</xsl:template>


<!-- ftcontains -->
<xsl:template match="xqxft:ftContainsExpr">
  <xsl:apply-templates select="xqxft:ftRangeExpr"/>
  <xsl:text> contains text </xsl:text>
  <xsl:apply-templates select="xqxft:ftSelectionExpr"/>
  <xsl:apply-templates select="xqxft:ftIgnoreOption"/>
</xsl:template>


<xsl:template match="xqxft:value">
  <xsl:apply-templates/>
</xsl:template>


<xsl:template match="xqxft:ftRangeExpr">
  <xsl:apply-templates/>
</xsl:template>


<xsl:template match="xqxft:ftLiteralRangeExpr">
  <xsl:apply-templates/>
</xsl:template>


<xsl:template match="xqxft:ftSelectionExpr">
  <xsl:apply-templates/>
</xsl:template>


<xsl:template match="xqxft:ftIgnoreOption">
  <xsl:text>without content </xsl:text>
  <xsl:apply-templates/>
</xsl:template>


<xsl:template match="xqxft:ftSelection">
  <xsl:apply-templates select="xqxft:ftSelectionSource"/>
  <xsl:value-of select="$NEWLINE"/>
  <xsl:text>    </xsl:text>
  <xsl:apply-templates select="xqxft:ftPosFilter"/>
</xsl:template>


<xsl:template match="xqxft:ftSelectionSource">
  <xsl:apply-templates/>
  <xsl:text> </xsl:text>
</xsl:template>


<xsl:template match="xqxft:ftPosFilter">
  <xsl:apply-templates/>
  <xsl:value-of select="$NEWLINE"/>
  <xsl:text>      </xsl:text>
</xsl:template>


<!-- FTProximity alternative: ordered -->
<xsl:template match="xqxft:ftOrdered">
  <xsl:text>ordered </xsl:text>
  <xsl:value-of select="$NEWLINE"/>
</xsl:template>

<!-- FTProximity alternative: window -->
<xsl:template match="xqxft:ftWindow">
  <xsl:text>window </xsl:text>
  <xsl:apply-templates select="xqxft:value"/>
  <xsl:text> </xsl:text>
  <xsl:value-of select="xqxft:unit"/>
  <xsl:text>s</xsl:text>
  <xsl:value-of select="$NEWLINE"/>
</xsl:template>

<!-- FTProximity alternative: distance -->
<xsl:template match="xqxft:ftDistance">
  <xsl:text>distance </xsl:text>
  <xsl:apply-templates select="xqxft:ftRange"/>
  <xsl:text> </xsl:text>
  <xsl:value-of select="xqxft:unit"/>
  <xsl:text>s</xsl:text>
  <xsl:value-of select="$NEWLINE"/>
</xsl:template>

<!-- FTProximity alternative: scope -->
<xsl:template match="xqxft:ftScope">
  <xsl:value-of select="xqxft:type"/>
  <xsl:text> </xsl:text>
  <xsl:value-of select="xqxft:unit"/>
  <xsl:value-of select="$NEWLINE"/>
</xsl:template>

<!-- FTProximity alternative: content -->
<xsl:template match="xqxft:ftContent">
  <xsl:value-of select="xqxft:location"/>
  <xsl:value-of select="$NEWLINE"/>
</xsl:template>

<xsl:template match="xqxft:exactlyRange | xqxft:exactlyLiteralRange">
  <xsl:text>exactly </xsl:text>
  <xsl:apply-templates select="xqxft:value"/>
</xsl:template>

<xsl:template match="xqxft:atLeastRange | xqxft:atLeastLiteralRange">
  <xsl:text>at least </xsl:text>
  <xsl:apply-templates select="xqxft:value"/>
</xsl:template>

<xsl:template match="xqxft:atMostRange | xqxft:atMostLiteralRange">
  <xsl:text>at most </xsl:text>
  <xsl:apply-templates select="xqxft:value"/>
</xsl:template>

<xsl:template match="xqxft:fromToRange | xqxft:fromToLiteralRange">
  <xsl:text>from </xsl:text>
  <xsl:apply-templates select="xqxft:lower"/>
  <xsl:text> to </xsl:text>
  <xsl:apply-templates select="xqxft:upper"/>
  <xsl:text> </xsl:text>
</xsl:template>

<xsl:template match="xqxft:lower">
  <xsl:apply-templates/>
</xsl:template>

<xsl:template match="xqxft:upper">
  <xsl:apply-templates/>
</xsl:template>


<!-- ftMatchOption alternative: case -->
<xsl:template match="xqxft:case">
  <xsl:text> using </xsl:text>
  <xsl:value-of select="xqxft:value"/>
  <xsl:value-of select="$NEWLINE"/>
</xsl:template>


<!-- ftMatchOption alternative: diacritics -->
<xsl:template match="xqxft:diacritics">
  <xsl:text> using </xsl:text>
  <xsl:value-of select="xqxft:value"/>
  <xsl:value-of select="$NEWLINE"/>
</xsl:template>


<!-- ftMatchOption alternative: stemming -->
<xsl:template match="xqxft:stem">
  <xsl:text> using </xsl:text>
  <xsl:value-of select="xqxft:value"/>
  <xsl:value-of select="$NEWLINE"/>
</xsl:template>


<!-- ftMatchOption alternative: thesaurus -->
<xsl:template match="xqxft:thesaurus">
  <xsl:text> using </xsl:text>
  <xsl:choose>
    <xsl:when test="xqxft:noThesauri">
      <xsl:text>no thesaurus </xsl:text>
    </xsl:when>
    <xsl:otherwise>
      <xsl:apply-templates/>
    </xsl:otherwise>
  </xsl:choose>
  <xsl:value-of select="$NEWLINE"/>
</xsl:template>

<xsl:template match="xqxft:thesauri">
  <xsl:text> </xsl:text>
  <xsl:text>thesaurus </xsl:text>
  <xsl:choose>
    <xsl:when test="child::*[2]">
      <xsl:call-template name="parenthesizedList"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:apply-templates/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<xsl:template match="xqxft:default">
  <xsl:text>default </xsl:text>
</xsl:template>

<xsl:template match="xqxft:thesaurusID">
  <xsl:apply-templates/>
</xsl:template>

<xsl:template match="xqxft:at">
  <xsl:text>at "</xsl:text>
  <xsl:value-of select="."/>
  <xsl:text>" </xsl:text>
</xsl:template>

<xsl:template match="xqxft:relationship">
  <xsl:text>relationship "</xsl:text>
  <xsl:value-of select="."/>
  <xsl:text>" </xsl:text>
</xsl:template>

<xsl:template match="xqxft:levels">
  <xsl:apply-templates/>
  <xsl:text> levels </xsl:text>
</xsl:template>


<!-- ftMatchOption alternative: stopword -->
<xsl:template match="xqxft:stopword">
  <xsl:text>using </xsl:text>
  <xsl:choose>
    <xsl:when test="xqxft:noStopwords">
      <xsl:text>no stop words </xsl:text>
    </xsl:when>
    <xsl:otherwise>
      <xsl:apply-templates/>
    </xsl:otherwise>
  </xsl:choose> 
  <xsl:value-of select="$NEWLINE"/>
</xsl:template>

<xsl:template match="xqxft:stopwords">
  <xsl:text> </xsl:text>
  <xsl:choose>
    <xsl:when test="xqxft:default">
      <xsl:text>stop words default </xsl:text>
    </xsl:when>
    <xsl:otherwise>
      <xsl:text>stop words </xsl:text>
      <xsl:apply-templates select="xqxft:ftStopWords"/>
    </xsl:otherwise>
  </xsl:choose>
  <xsl:apply-templates select="xqxft:ftStopWordsInclExcl"/>
</xsl:template>

<xsl:template match="xqxft:ftStopWords">
  <xsl:call-template name="ftStopWords_type"/>
</xsl:template>

<xsl:template name="ftStopWords_type">
  <xsl:choose>
    <xsl:when test="xqxft:ref">
      <xsl:text>at "</xsl:text>
      <xsl:value-of select="xqxft:ref"/>
      <xsl:text>" </xsl:text>
    </xsl:when>
    <xsl:otherwise>
      <xsl:apply-templates/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<xsl:template match="xqxft:list">
  <xsl:call-template name="parenthesizedList"/>
  <xsl:text> </xsl:text>
</xsl:template>

<xsl:template match="xqxft:ftStopWordsInclExcl">
  <xsl:apply-templates/>
</xsl:template>

<xsl:template match="xqxft:union">
  <xsl:text>union </xsl:text>
  <xsl:call-template name="ftStopWords_type"/>
</xsl:template>

<xsl:template match="xqxft:except">
  <xsl:text>except </xsl:text>
  <xsl:call-template name="ftStopWords_type"/>
</xsl:template>


<xsl:template match="xqxft:language">
  <xsl:text>using language "</xsl:text>
  <xsl:apply-templates/>
  <xsl:text>"</xsl:text>
  <xsl:value-of select="$NEWLINE"/>
</xsl:template>


<xsl:template match="xqxft:wildcard">
  <xsl:text>using </xsl:text>
  <xsl:apply-templates/>
  <xsl:value-of select="$NEWLINE"/>
</xsl:template>


<xsl:template match="xqxft:ftAnd">
  <xsl:apply-templates select="xqx:firstOperand"/>
  <xsl:text> ftand </xsl:text>
  <xsl:apply-templates select="xqx:secondOperand"/>
  <xsl:text> </xsl:text>
</xsl:template>


<xsl:template match="xqxft:ftOr">
  <xsl:apply-templates select="xqx:firstOperand"/>
  <xsl:text> ftor </xsl:text>
  <xsl:apply-templates select="xqx:secondOperand"/>
  <xsl:text> </xsl:text>
</xsl:template>


<xsl:template match="xqxft:ftMildNot">
  <xsl:apply-templates select="xqx:firstOperand"/>
  <xsl:text> not in </xsl:text>
  <xsl:apply-templates select="xqx:secondOperand"/>
  <xsl:text> </xsl:text>
</xsl:template>


<xsl:template match="xqxft:ftUnaryNot">
  <xsl:text>ftnot </xsl:text>
  <xsl:apply-templates select="xqx:operand"/>
  <xsl:text> </xsl:text>
</xsl:template>


<xsl:template match="xqxft:ftPrimaryWithOptions">
  <xsl:apply-templates/>
</xsl:template>


<xsl:template match="xqxft:ftPrimary">
  <xsl:apply-templates/>
</xsl:template>


<xsl:template match="xqxft:parenthesized">
  <xsl:text>( </xsl:text>
  <xsl:apply-templates/>
  <xsl:text> ) </xsl:text>
</xsl:template>


<xsl:template match="xqxft:ftWords">
  <xsl:apply-templates/>
</xsl:template>


<xsl:template match="xqxft:ftWordsValue">
  <xsl:apply-templates/>
</xsl:template>


<xsl:template match="xqxft:ftWordsLiteral">
  <xsl:apply-templates/>
</xsl:template>


<xsl:template match="xqxft:ftWordsExpression">
  <xsl:text> { </xsl:text>
  <xsl:apply-templates/>
  <xsl:text> } </xsl:text>
</xsl:template>


<xsl:template match="xqxft:ftAnyAllOption">
  <xsl:value-of select="."/>
  <xsl:text> </xsl:text>
</xsl:template>


<xsl:template match="xqxft:ftTimes">
  <xsl:text>occurs </xsl:text>
  <xsl:apply-templates/>
  <xsl:text> times </xsl:text>
</xsl:template>


<xsl:template match="xqxft:ftExtensionSelection">
  <xsl:apply-templates select="xqxft:pragma"/>
  <xsl:text> { </xsl:text>
  <xsl:apply-templates select="xqxft:ftSelection"/>
  <xsl:text> } </xsl:text>
</xsl:template>


<xsl:template match="xqxft:pragma">
  <xsl:value-of select="$PRAGMA_BEGIN"/>
  <xsl:apply-templates select="xqx:pragmaName"/>
  <xsl:value-of select="$SPACE"/>
  <xsl:value-of select="xqx:pragmaContents"/>
  <xsl:value-of select="$PRAGMA_END"/>
</xsl:template>


<xsl:template match="xqxft:ftExtensionOption">
  <xsl:text>using option </xsl:text>
  <xsl:apply-templates/>
</xsl:template>


<xsl:template match="xqxft:ftExtensionName">
  <xsl:if test="@xqx:prefix">
    <xsl:value-of select="@xqx:prefix"/>
    <xsl:value-of select="$COLON"/>
  </xsl:if>
  <xsl:apply-templates/>
</xsl:template>


<xsl:template match="xqxft:ftExtensionValue">
  <xsl:text> "</xsl:text>
  <xsl:apply-templates/>
  <xsl:text>"</xsl:text>
</xsl:template>


<xsl:template match="xqxft:weight">
  <xsl:text> weight { </xsl:text>
  <xsl:apply-templates/>
  <xsl:text> } </xsl:text>
</xsl:template>


</xsl:stylesheet>


E.3 XQueryX for XQuery and XPath Full Text 1.0 example

The following example is based on the data and queries of one of the use cases in [XQuery and XPath Full Text 1.0 Use Cases]. The example shows the English description of the query, the XQuery Full Text solution given in [XQuery and XPath Full Text 1.0 Use Cases], a Full Text XQueryX solution, and the XQuery Full Text query that results from applying the Full Text XQueryX-to-XQuery Full Text transformation defined by the stylesheet in E.2 XQueryX stylesheet for XQuery and XPath Full Text 1.0 to the Full Text XQueryX solution. The latter XQuery Full Text expression is presented only as a sanity check: the intent of the stylesheet is not to reproduce the identical XQuery Full Text expression given in [XQuery and XPath Full Text 1.0 Use Cases], but to produce a valid XQuery Full Text expression with the same semantics.

Comparison of the results of the Full Text XQueryX-to-XQuery Full Text transformation given in this document with the XQuery Full Text solutions in the [XQuery and XPath Full Text 1.0 Use Cases] may be helpful in evaluating the correctness of the Full Text XQueryX solution in the example.

The XQuery Full Text Use Cases solution given for the example is provided only to assist readers of this document in understanding the Full Text XQueryX solution. There is no intent to imply that this document specifies a "compilation" or "transformation" of XQuery Full Text syntax into Full Text XQueryX syntax.

In the following example, note that path expressions are expanded to show their structure. Also, note that the prefix syntax for binary operators like "and" makes the precedence explicit. In general, humans find it easier to read an XML representation that does not expand path expressions, but such a representation is less convenient for programmatic representation and manipulation. XQueryX is designed as a language that is convenient for production and modification by software, and not as a convenient syntax for humans to read and write.
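As a rough illustration of what "expanded" means for path expressions, the following sketch compares abbreviated XQuery path syntax with the explicit-axis forms that XQueryX spells out step by step. The inline sample data is hypothetical and is not taken from the use case; each of the three comparisons is expected to evaluate to true.

```xquery
(: Hypothetical sample data; only the path forms matter here. :)
let $books :=
  <books>
    <book id="b1"><part/><part/></book>
  </books>
return (
  (: child steps: abbreviated form vs. explicit child axis :)
  deep-equal($books/book/part,
             $books/child::book/child::part),
  (: "//" abbreviates /descendant-or-self::node()/ :)
  deep-equal($books//part,
             $books/descendant-or-self::node()/child::part),
  (: "@" abbreviates the attribute axis :)
  deep-equal($books/book/@id,
             $books/child::book/attribute::id)
)
```

In XQueryX each step becomes its own xqx:stepExpr element with an explicit xqx:xpathAxis child, which is why the XML form is longer but easier for software to inspect and rewrite.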

Finally, please note that white space, including new lines, has been added to some of the Full Text XQueryX documents and XQuery Full Text expressions for readability. That additional white space is not necessarily produced by the Full Text XQueryX-to-XQuery Full Text transformation.

E.3.1 Example

Here is Q4 from the [XQuery and XPath Full Text 1.0 Use Cases], use case SCORE: Find all books with parts about "usability testing".

E.3.1.1 XQuery solution in XQuery and XPath Full Text 1.0 Use Cases:

declare function local:filter ( $nodes 
   as node()*, $exclude as element()* ) as node()*
   {
      for $node in $nodes except $exclude
      return
         typeswitch ($node)
            case $e as element()
               return 
                 element {node-name($e)}
                   {
                       $e/@*,
                      filter( $e/node() except $exclude, 
                      $exclude )
                   }
            default 
               return $node
   };

for $book in doc("http://bstore1.example.com/full-text.xml")
   /books/book
let $irrelevantParts := 
   for $part in $book//part
   let score $score := $part contains text "usability test.*" 
      using wildcards
   where $score < 0.5
   return $part
where count($irrelevantParts) < count($book//part)
return filter($book, $irrelevantParts)

E.3.1.2 A Solution in Full Text XQueryX:

<?xml version="1.0"?>
<xqx:module xmlns:xqxft="http://www.w3.org/2007/xpath-full-text"
            xmlns:xqx="http://www.w3.org/2005/XQueryX"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.w3.org/2007/xpath-full-text
                                http://www.w3.org/2007/xpath-full-text/xpath-full-text-10-xqueryx.xsd
                                http://www.w3.org/2005/XQueryX
                                http://www.w3.org/2005/XQueryX/xqueryx.xsd">

  <xqx:mainModule>
    <xqx:prolog>
      <xqx:functionDecl>
        <xqx:functionName xqx:prefix="local">filter</xqx:functionName>
        <xqx:paramList>
          <xqx:param>
            <xqx:varName>nodes</xqx:varName>
            <xqx:typeDeclaration>
              <xqx:anyKindTest/><xqx:occurrenceIndicator>*</xqx:occurrenceIndicator>
            </xqx:typeDeclaration>
          </xqx:param>
          <xqx:param>
            <xqx:varName>exclude</xqx:varName>
            <xqx:typeDeclaration>
              <xqx:elementTest/><xqx:occurrenceIndicator>*</xqx:occurrenceIndicator>
            </xqx:typeDeclaration>
          </xqx:param>
        </xqx:paramList>
        <xqx:typeDeclaration>
          <xqx:anyKindTest/>
        </xqx:typeDeclaration>
        <xqx:functionBody>
          <xqx:flworExpr>
            <xqx:forClause>
              <xqx:forClauseItem>
                <xqx:typedVariableBinding>
                  <xqx:varName>node</xqx:varName>
                </xqx:typedVariableBinding>
                <xqx:forExpr>
                  <xqx:exceptOp>
                    <xqx:firstOperand>
                      <xqx:varRef>
                        <xqx:name>nodes</xqx:name>
                      </xqx:varRef>
                    </xqx:firstOperand>
                    <xqx:secondOperand>
                      <xqx:varRef>
                        <xqx:name>exclude</xqx:name>
                      </xqx:varRef>
                    </xqx:secondOperand>
                  </xqx:exceptOp>
                </xqx:forExpr>
              </xqx:forClauseItem>
            </xqx:forClause>
            <xqx:returnClause>
              <xqx:typeswitchExpr>
                <xqx:argExpr>
                  <xqx:varRef>
                    <xqx:name>node</xqx:name>
                  </xqx:varRef>
                </xqx:argExpr>
                <xqx:typeswitchExprCaseClause>
                  <xqx:variableBinding>e</xqx:variableBinding>
                  <xqx:sequenceType>
                    <xqx:elementTest/>
                  </xqx:sequenceType>
                  <xqx:resultExpr>
                    <xqx:computedElementConstructor>
                      <xqx:tagNameExpr>
                        <xqx:functionCallExpr>
                          <xqx:functionName xqx:prefix="fn">node-name</xqx:functionName>
                          <xqx:arguments>
                            <xqx:varRef>
                              <xqx:name>e</xqx:name>
                            </xqx:varRef>
                          </xqx:arguments>
                        </xqx:functionCallExpr>
                      </xqx:tagNameExpr>
                      <xqx:contentExpr>
                        <xqx:sequenceExpr>
                          <xqx:pathExpr>
                            <xqx:stepExpr>
                              <xqx:filterExpr>
                                <xqx:varRef>
                                  <xqx:name>e</xqx:name>
                                </xqx:varRef>
                              </xqx:filterExpr>
                            </xqx:stepExpr>
                            <xqx:stepExpr>
                              <xqx:xpathAxis>child</xqx:xpathAxis>
                              <xqx:attributeTest>
                                <xqx:attributeName>
                                  <xqx:star/>
                                </xqx:attributeName>
                              </xqx:attributeTest>
                            </xqx:stepExpr>
                          </xqx:pathExpr>
                          <xqx:functionCallExpr>
                            <xqx:functionName xqx:prefix="fn">filter</xqx:functionName>
                            <xqx:arguments>
                              <xqx:exceptOp>
                                <xqx:firstOperand>
                                  <xqx:pathExpr>
                                    <xqx:stepExpr>
                                      <xqx:filterExpr>
                                        <xqx:varRef>
                                          <xqx:name>e</xqx:name>
                                        </xqx:varRef>
                                      </xqx:filterExpr>
                                    </xqx:stepExpr>
                                    <xqx:stepExpr>
                                      <xqx:xpathAxis>child</xqx:xpathAxis>
                                      <xqx:anyKindTest/>
                                    </xqx:stepExpr>
                                  </xqx:pathExpr>
                                </xqx:firstOperand>
                                <xqx:secondOperand>
                                  <xqx:varRef>
                                    <xqx:name>exclude</xqx:name>
                                  </xqx:varRef>
                                </xqx:secondOperand>
                              </xqx:exceptOp>
                              <xqx:varRef>
                                <xqx:name>exclude</xqx:name>
                              </xqx:varRef>
                            </xqx:arguments>
                          </xqx:functionCallExpr>
                        </xqx:sequenceExpr>
                      </xqx:contentExpr>
                    </xqx:computedElementConstructor>
                  </xqx:resultExpr>
                </xqx:typeswitchExprCaseClause>
                <xqx:typeswitchExprDefaultClause>
                  <xqx:resultExpr>
                    <xqx:varRef>
                      <xqx:name>node</xqx:name>
                    </xqx:varRef>
                  </xqx:resultExpr>
                </xqx:typeswitchExprDefaultClause>
              </xqx:typeswitchExpr>
            </xqx:returnClause>
          </xqx:flworExpr>
        </xqx:functionBody>
      </xqx:functionDecl>
    </xqx:prolog>
    <xqx:queryBody>
      <xqx:flworExpr>
        <xqx:forClause>
          <xqx:forClauseItem>
            <xqx:typedVariableBinding>
              <xqx:varName>book</xqx:varName>
            </xqx:typedVariableBinding>
            <xqx:forExpr>
              <xqx:pathExpr>
                <xqx:stepExpr>
                  <xqx:filterExpr>
                    <xqx:functionCallExpr>
                      <xqx:functionName xqx:prefix="fn">doc</xqx:functionName>
                      <xqx:arguments>
                        <xqx:stringConstantExpr>
                          <xqx:value>http://bstore1.example.com/full-text.xml</xqx:value>
                        </xqx:stringConstantExpr>
                      </xqx:arguments>
                    </xqx:functionCallExpr>
                  </xqx:filterExpr>
                </xqx:stepExpr>
                <xqx:stepExpr>
                  <xqx:xpathAxis>child</xqx:xpathAxis>
                  <xqx:nameTest>books</xqx:nameTest>
                </xqx:stepExpr>
                <xqx:stepExpr>
                  <xqx:xpathAxis>child</xqx:xpathAxis>
                  <xqx:nameTest>book</xqx:nameTest>
                </xqx:stepExpr>
              </xqx:pathExpr>
            </xqx:forExpr>
          </xqx:forClauseItem>
        </xqx:forClause>
        <xqx:letClause>
          <xqx:letClauseItem>
            <xqx:typedVariableBinding>
              <xqx:varName>irrelevantParts</xqx:varName>
            </xqx:typedVariableBinding>
            <xqx:letExpr>
              <xqx:flworExpr>
                <xqx:forClause>
                  <xqx:forClauseItem>
                    <xqx:typedVariableBinding>
                      <xqx:varName>part</xqx:varName>
                    </xqx:typedVariableBinding>
                    <xqx:forExpr>
                      <xqx:pathExpr>
                        <xqx:stepExpr>
                          <xqx:filterExpr>
                            <xqx:varRef>
                              <xqx:name>book</xqx:name>
                            </xqx:varRef>
                          </xqx:filterExpr>
                        </xqx:stepExpr>
                        <xqx:stepExpr>
                          <xqx:xpathAxis>descendant-or-self</xqx:xpathAxis>
                          <xqx:nameTest>part</xqx:nameTest>
                        </xqx:stepExpr>
                      </xqx:pathExpr>
                    </xqx:forExpr>
                  </xqx:forClauseItem>
                </xqx:forClause>
                <xqx:letClause>
                  <xqx:letClauseItem>
                    <xqxft:ftScoreVariableBinding>score</xqxft:ftScoreVariableBinding>
                    <xqx:letExpr>
                      <xqxft:ftContainsExpr>
                        <xqxft:ftRangeExpr>
                          <xqx:varRef>
                            <xqx:name>part</xqx:name>
                          </xqx:varRef>
                        </xqxft:ftRangeExpr>
                        <xqxft:ftSelectionExpr>
                          <xqxft:ftSelection>
                            <xqxft:ftSelectionSource>
                              <xqxft:ftPrimaryWithOptions>
                                <xqxft:ftPrimary>
                                  <xqxft:ftWords>
                                    <xqxft:ftWordsValue>
                                      <xqxft:ftWordsLiteral>
                                        <xqx:stringConstantExpr>
                                          <xqx:value>usability test.*</xqx:value>
                                        </xqx:stringConstantExpr>
                                     </xqxft:ftWordsLiteral>
                                    </xqxft:ftWordsValue>
                                  </xqxft:ftWords>
                                </xqxft:ftPrimary>
                                <xqxft:wildcard>
                                  <xqxft:value>wildcards</xqxft:value>
                                </xqxft:wildcard>
                              </xqxft:ftPrimaryWithOptions>
                            </xqxft:ftSelectionSource>
                          </xqxft:ftSelection>
                        </xqxft:ftSelectionExpr>
                      </xqxft:ftContainsExpr>
                    </xqx:letExpr>
                  </xqx:letClauseItem>
                </xqx:letClause>
                <xqx:whereClause>
                  <xqx:lessThanOp>
                    <xqx:firstOperand>
                      <xqx:varRef>
                        <xqx:name>score</xqx:name>
                      </xqx:varRef>
                    </xqx:firstOperand>
                    <xqx:secondOperand>
                      <xqx:decimalConstantExpr>
                        <xqx:value>0.5</xqx:value>
                      </xqx:decimalConstantExpr>
                    </xqx:secondOperand>
                  </xqx:lessThanOp>
                </xqx:whereClause>
                <xqx:returnClause>
                  <xqx:varRef>
                    <xqx:name>part</xqx:name>
                  </xqx:varRef>
                </xqx:returnClause>
              </xqx:flworExpr>
            </xqx:letExpr>
          </xqx:letClauseItem>
        </xqx:letClause>
        <xqx:whereClause>
          <xqx:lessThanOp>
          <xqx:firstOperand>
            <xqx:functionCallExpr>
              <xqx:functionName xqx:prefix="fn">count</xqx:functionName>
              <xqx:arguments>
                <xqx:varRef>
                  <xqx:name>irrelevantParts</xqx:name>
                </xqx:varRef>
              </xqx:arguments>
            </xqx:functionCallExpr>
          </xqx:firstOperand>
          <xqx:secondOperand>
            <xqx:functionCallExpr>
              <xqx:functionName xqx:prefix="fn">count</xqx:functionName>
              <xqx:arguments>
                <xqx:pathExpr>
                  <xqx:stepExpr>
                    <xqx:filterExpr>
                      <xqx:varRef>
                        <xqx:name>book</xqx:name>
                      </xqx:varRef>
                    </xqx:filterExpr>
                  </xqx:stepExpr>
                  <xqx:stepExpr>
                    <xqx:xpathAxis>descendant-or-self</xqx:xpathAxis>
                    <xqx:nameTest>part</xqx:nameTest>
                  </xqx:stepExpr>
                </xqx:pathExpr>
              </xqx:arguments>
            </xqx:functionCallExpr>
          </xqx:secondOperand>
          </xqx:lessThanOp>
        </xqx:whereClause>
        <xqx:returnClause>
          <xqx:functionCallExpr>
            <xqx:functionName xqx:prefix="local">filter</xqx:functionName>
            <xqx:arguments>
              <xqx:varRef>
                <xqx:name>book</xqx:name>
              </xqx:varRef>
              <xqx:varRef>
                <xqx:name>irrelevantParts</xqx:name>
              </xqx:varRef>
            </xqx:arguments>
          </xqx:functionCallExpr>
        </xqx:returnClause>
      </xqx:flworExpr>
    </xqx:queryBody>
  </xqx:mainModule>
</xqx:module>

E.3.1.3 Transformation of Full Text XQueryX Solution into XQuery Full Text

Application of the stylesheet in E.2 XQueryX stylesheet for XQuery and XPath Full Text 1.0 to the Full Text XQueryX solution results in:

declare function local:filter($nodes as node()*, $exclude as element()*) as node()
{
( for $node in ($nodes except $exclude)
  return ( typeswitch($node)
             case $e as element()
               return element {fn:node-name($e)}
                  {( $e/child::attribute(*),
                     fn:filter( ($e/child::node() except $exclude), $exclude ) )}
             default return $node )
)
};

( for $book
    in fn:doc("http://bstore1.example.com/full-text.xml")/child::books/child::book
  let $irrelevantParts:=
  ( for $part in $book/descendant-or-self::part
    let score $score := $part contains text "usability test.*"
        using wildcards
    where ($score < 0.5)
    return $part
)
  where (fn:count($irrelevantParts) < fn:count($book/descendant-or-self::part))
  return local:filter($book, $irrelevantParts)
)
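
The scored full-text condition at the core of this example can also be tried in isolation. The following is a minimal sketch, assuming a processor that supports XQuery and XPath Full Text 1.0; the inline sample data, the number attribute, and the hit element are hypothetical and are not part of the use case. Score values are implementation-dependent, so no particular output is shown.

```xquery
(: Hypothetical inline data standing in for full-text.xml. :)
let $book :=
  <book>
    <part number="1">Usability testing of web sites</part>
    <part number="2">Conducting interviews</part>
  </book>
for $part in $book//part
(: Same full-text selection as in the example: a phrase whose second
   token uses the wildcard ".*" (zero or more characters). :)
let score $score := $part contains text "usability test.*" using wildcards
order by $score descending
return <hit part="{$part/@number}" score="{$score}"/>
```

Ordering by $score descending lists the more relevant parts first. The example above instead compares $score against the threshold 0.5 to collect the parts it treats as irrelevant.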

F References

F.1 Normative References

XQuery 1.0: An XML Query Language (Second Edition)
XQuery 1.0: An XML Query Language (Second Edition), Don Chamberlin, Anders Berglund, Scott Boag, et al., Editors. World Wide Web Consortium, 14 December 2010. This version is http://www.w3.org/TR/2010/REC-xquery-20101214/. The latest version is available at http://www.w3.org/TR/xquery/.
XML Path Language (XPath) 2.0 (Second Edition)
XML Path Language (XPath) 2.0 (Second Edition), Don Chamberlin, Anders Berglund, Scott Boag, et al., Editors. World Wide Web Consortium, 14 December 2010. This version is http://www.w3.org/TR/2010/REC-xpath20-20101214/. The latest version is available at http://www.w3.org/TR/xpath20/.
XQuery 1.0 and XPath 2.0 Functions and Operators (Second Edition)
XQuery 1.0 and XPath 2.0 Functions and Operators (Second Edition), Ashok Malhotra, Jim Melton, and Norman Walsh, Editors. World Wide Web Consortium, 14 December 2010. This version is http://www.w3.org/TR/2010/REC-xpath-functions-20101214/. The latest version is available at http://www.w3.org/TR/xpath-functions/.
XQuery 1.0 and XPath 2.0 Data Model (XDM) (Second Edition)
XQuery 1.0 and XPath 2.0 Data Model (XDM) (Second Edition), Norman Walsh, Mary Fernández, Ashok Malhotra, et al., Editors. World Wide Web Consortium, 14 December 2010. This version is http://www.w3.org/TR/2010/REC-xpath-datamodel-20101214/. The latest version is available at http://www.w3.org/TR/xpath-datamodel/.
XML Syntax for XQuery 1.0 (XQueryX) (Second Edition)
XML Syntax for XQuery 1.0 (XQueryX) (Second Edition), Jim Melton and Subramanian Muralidhar, Editors. World Wide Web Consortium, 14 December 2010. This version is http://www.w3.org/TR/2010/REC-xqueryx-20101214/. The latest version is available at http://www.w3.org/TR/xqueryx/.
XQuery and XPath Full Text 1.0 Requirements
XQuery and XPath Full Text 1.0 Requirements, Stephen Buxton, Pat Case, and Michael Rys, Editors. World Wide Web Consortium, 25 January 2011. This version is http://www.w3.org/TR/2011/NOTE-xpath-full-text-10-requirements-20110125/. The latest version is available at http://www.w3.org/TR/xpath-full-text-10-requirements/.
XQuery and XPath Full Text 1.0 Use Cases
XQuery and XPath Full Text 1.0 Use Cases, Sihem Amer-Yahia and Pat Case, Editors. World Wide Web Consortium, 25 January 2011. This version is http://www.w3.org/TR/2011/NOTE-xpath-full-text-10-use-cases-20110125/. The latest version is available at http://www.w3.org/TR/xpath-full-text-10-use-cases/.
BCP 47
A. Phillips and M. Davis. Tags for Identifying Languages. IETF BCP 47. See http://tools.ietf.org/html/bcp47. This reference leads to [RFC 4646] and [RFC 4647] and replaces [RFC 3066].
RFC 2119
S. Bradner. Key Words for use in RFCs to Indicate Requirement Levels. IETF RFC 2119. See http://www.ietf.org/rfc/rfc2119.txt.
RFC 3066
H. Alvestrand. Tags for the Identification of Languages. IETF RFC 3066. See http://www.ietf.org/rfc/rfc3066.txt.
RFC 4646
A. Phillips and M. Davis. Tags for Identifying Languages. IETF RFC 4646. See http://www.ietf.org/rfc/rfc4646.txt.
RFC 4647
A. Phillips and M. Davis. Matching of Language Tags. IETF RFC 4647. See http://www.ietf.org/rfc/rfc4647.txt.

F.2 Non-normative References

ISO 2788
Documentation Guidelines for the Establishment and Development of Monolingual Thesauri, Geneva: International Organization for Standardization, 2nd edition, 1986.
SQL/MM
ISO/IEC 13249-2 Information technology --- Database languages --- SQL Multimedia and Application Packages --- Part 2: Full-Text. Geneva: International Organization for Standardization, 2nd edition, 2003.
UAX29
M. Davis. Unicode Standard Annex #29 Text Boundaries, revision 11, 2006. See http://www.unicode.org/reports/tr29/.

G Acknowledgements (Non-Normative)

We would like to thank the members of the XQuery and XPath Full-Text group for their fruitful discussions.

We would like to thank the following people for their contributions on earlier drafts of this document.

H Glossary (Non-Normative)

AllMatches

An AllMatches describes the possible results of an FTSelection.

Distance Operator Restriction

Distance Operator Restriction. FTDistance can only be applied to an FTOr that is either a single FTWords or a combination of FTWords involving only the operators ftand and ftor.

Full-TextQueries

Full-text queries are performed on tokens and phrases. Tokens and phrases are produced via tokenization.

IgnoredNodes

Ignored nodes are the set of nodes whose content is ignored.

Match

Each Match describes one result to the FTSelection.

Negation Restriction 1

Negation Restriction 1. An FTUnaryNot expression may only appear as a direct right operand of an "ftand" (FTAnd) operation.

Negation Restriction 2

Negation Restriction 2. An FTUnaryNot expression may not appear as a descendant of an FTOr that is modified by an FTPosFilter. (An FTOr is modified by an FTPosFilter, if it is derived using the production for FTSelection together with that FTPosFilter.)

Order Operator Restriction

Order Operator Restriction. FTOrder may only appear directly succeeding an FTWindow or an FTDistance operator.

Paragraph

A paragraph is an ordered sequence of any number of tokens. Beyond that, paragraphs are implementation-defined. A tokenizer is not required to support paragraphs.

Phrase

A phrase is an ordered sequence of any number of tokens. Beyond that, phrases are implementation-defined.

QueryItem

A QueryItem is a sequence of QueryTokenInfos representing the collection of tokens derived from tokenizing one query string.

QueryTokenInfo

A QueryTokenInfo is the identity of a token inside a query string.

Score

The score of a full-text query result expresses its relevance to the search conditions.

Sentence

A sentence is an ordered sequence of any number of tokens. Beyond that, sentences are implementation-defined. A tokenizer is not required to support sentences.

Single Language Restriction

Single Language Restriction. If a full-text query contains more than one FTLanguageOption in its body and the prolog, then the languages specified must be the same.

StringExclude

A StringExclude is a StringMatch that describes a TokenInfo that must not be contained in the document.

StringInclude

A StringInclude is a StringMatch that describes a TokenInfo that must be contained in the document.

StringMatch

A StringMatch is a possible match of a sequence of query tokens with a corresponding sequence of tokens in a document. A StringMatch may be a StringInclude or StringExclude.

Token

A token is a non-empty sequence of characters returned by a tokenizer as a basic unit to be searched. Beyond that, tokens are implementation-defined.

TokenInfo

A TokenInfo represents a contiguous collection of tokens from an XML document.

Tokenization

Formally, tokenization is the process of converting an XDM item to a collection of tokens, taking any structural information of the item into account to identify token, sentence, and paragraph boundaries. Each token is assigned a starting and ending position.

WeightDeclarations

Scoring may be influenced by adding weight declarations to search tokens, phrases, and expressions.

Window Operator Restriction

Window Operator Restriction. FTWindow can only be applied to an FTOr that is either a single FTWords or a combination of FTWords involving only the operators ftand and ftor.

anchoring selection

An anchoring selection consists of a full-text selection followed by one of the postfix operators "at start", "at end", or "entire content".

and-selection

An and-selection combines two full-text selections using the ftand operator.

cardinality selection

A cardinality selection consists of an FTWords followed by the FTTimes postfix operator.

case option

A case option modifies the matching of tokens and phrases by specifying how uppercase and lowercase characters are considered.

diacritics option

A diacritics option modifies token and phrase matching by specifying how diacritics are considered.

distance selection

A distance selection consists of a full-text selection followed by one of the (complex) postfix operators derived from FTDistance.

extension option

An extension option is a match option that acts in an implementation-defined way.

extension selection

An extension selection is a full-text selection whose semantics are implementation-defined.

full-text contains expression

A full-text contains expression is an expression that evaluates a sequence of items against a full-text selection.

full-text selection

A full-text selection specifies the conditions of a full-text search.

implementation dependent

Implementation-dependent indicates an aspect that may differ between implementations, is not specified by this or any W3C specification, and is not required to be specified by the implementor for any particular implementation.

implementation defined

Implementation-defined indicates an aspect that may differ between implementations, but must be specified by the implementor for each particular implementation.

language option

A language option modifies token matching by specifying the language of search tokens and phrases.

match option

Match options modify the set of tokens in the query, or how they are matched against tokens in the text.

match option application order

The order in which effective match options for an FTWords are applied is called the match option application order.

match option group

Each of the alternatives of production FTMatchOption other than FTExtensionOption corresponds to one match option group.

may

MAY means that an item is truly optional.

mild-not selection

A mild-not selection combines two full-text selections using the not in operator.

must

MUST means that the item is an absolute requirement of the specification.

not-selection

A not-selection is a full-text selection starting with the prefix operator ftnot.

or-selection

An or-selection combines two full-text selections using the ftor operator.

ordered selection

An ordered selection consists of a full-text selection followed by the postfix operator "ordered".

positional filter

Positional filters are postfix operators that serve to filter matches based on various constraints on their positional information.

primary full-text selection

A primary full-text selection is the basic form of a full-text selection. It specifies tokens and phrases as search conditions (FTWords), optionally followed by a cardinality constraint (FTTimes). An FTSelection in parentheses and the FTExtensionSelection are also primary full-text selections.

scope selection

A scope selection consists of a full-text selection followed by one of the (complex) postfix operators derived from FTScope.

search context

The search context is the sequence of items that a full-text contains expression evaluates against its full-text selection.

should

SHOULD means that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

stemming option

A stemming option modifies token and phrase matching by specifying whether stemming is applied or not.

stop word option

A stop word option controls matching of tokens by specifying whether stop words are used or not. Stop words are tokens in the query that match any token in the text being searched.

thesaurus option

A thesaurus option modifies token and phrase matching by specifying whether a thesaurus is used or not.

wildcard option

A wildcard option modifies token and phrase matching by specifying whether or not wildcards are recognized in query strings.

window selection

A window selection consists of a full-text selection followed by one of the (complex) postfix operators derived from FTWindow.
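The selection kinds defined in this glossary can be sketched with a few short queries. The following is only an illustration: the sample document, its element names, and its text are assumptions, and because tokenization and sentence boundaries are implementation-defined, no results are shown.

```xquery
(: Hypothetical sample document; element names and text are assumptions. :)
declare variable $doc :=
  <book>
    <title>Improving Web Site Usability</title>
    <p>Usability testing of a web site shows how users really work. Expert
       reviews do not replace usability testing.</p>
  </book>;

(
  (: and-selection: two full-text selections combined with ftand :)
  $doc/title contains text "web" ftand "usability",

  (: or-selection in parentheses (a primary full-text selection),
     combined with a not-selection :)
  $doc/p contains text ("expert" ftor "novice") ftand ftnot "marketing",

  (: mild-not selection: two selections combined with "not in" :)
  $doc/title contains text "usability" not in "usability testing",

  (: cardinality selection: FTWords followed by FTTimes :)
  $doc/p contains text "usability" occurs at least 2 times,

  (: ordered selection followed by a window selection; FTWindow is applied
     to FTWords combined only with ftand :)
  $doc/p contains text "web" ftand "users" ordered window 8 words,

  (: distance selection: postfix operator derived from FTDistance :)
  $doc/p contains text "usability" ftand "testing" distance at most 1 words,

  (: scope selection: postfix operator derived from FTScope; a tokenizer
     is not required to support sentences :)
  $doc/p contains text "expert" ftand "replace" same sentence,

  (: anchoring selection: postfix operator "at start" :)
  $doc/title contains text "improving" at start
)
```

Each item in the sequence is a full-text contains expression and therefore yields a Boolean value; the path expression on the left of "contains text" supplies the search context.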

I Checklist of Implementation-Defined Features (Non-Normative)

This appendix provides a summary of features defined in this specification whose effect is explicitly implementation-defined. The conformance rules require vendors to provide documentation that explains how these choices have been exercised.

  1. Tokenization, including the definition of the term "tokens", SHOULD be implementation-defined. Implementations SHOULD expose the rules and sample results of tokenization as much as possible to enable users to predict and interpret the results of tokenization.

  2. A phrase is an ordered sequence of any number of tokens. Beyond that, phrases are implementation-defined.

  3. A sentence is an ordered sequence of any number of tokens. Beyond that, sentences are implementation-defined. A tokenizer is not required to support sentences.

  4. A paragraph is an ordered sequence of any number of tokens. Beyond that, paragraphs are implementation-defined. A tokenizer is not required to support paragraphs.

  5. Implementations are free to provide implementation-defined ways to differentiate between markup's effect on token boundaries during tokenization.

  6. The set of expressions (of form ExprSingle) that can be assigned to a score variable in a let-clause is implementation-defined. If an expression not supported by the scoring algorithm is passed to the scoring algorithm, the result is implementation-defined.

  7. When a sequence of query tokens is considered as a phrase, it matches a sequence of tokens in the tokenized form of the text being searched only if the two sequences correspond in an implementation-defined way.

  8. The match option application order, subject to the stated constraints, is implementation-defined.

  9. The "language" option influences tokenization, stemming, and stop words in an implementation-defined way. It MAY influence the behavior of other match options in an implementation-defined way.

  10. The set of valid language identifiers is implementation-defined.

  11. If an invalid language identifier is specified, then the behavior is implementation-defined.

  12. When a processor evaluates text in a document that is governed by an xml:lang attribute and the portion of the full-text query doing that evaluation contains an FTLanguageOption that specifies a different language from the language specified by the governing xml:lang attribute, the language-related behavior of that full-text query is implementation-defined.

  13. It is implementation-defined which thesaurus relationships an implementation supports.

  14. If a query specifies thesaurus relationships not supported by the thesaurus, or does not specify a relationship, the behavior is implementation-defined.

  15. The effect of specifying a particular range of levels in an FTThesaurusID is implementation-defined.

  16. If a query does not specify the number of levels, and the implementation does not follow the default of querying all levels of hierarchical relationships, then the number of levels of hierarchical relationships queried is implementation-defined.

  17. It is implementation-defined what a stem of a token is, and whether stemming is based on an algorithm, dictionary, or mixed approach.

  18. An implementation-defined comparison is used to determine whether a query token appears in the collection of stop words defined by the applicable stop word option.

  19. Normally a stop word matches exactly one token, but there may be implementation-defined conditions under which a stop word may match a different number of tokens.

  20. The "stop words default" option specifies that an implementation-defined collection of stop words is used.

  21. An implementation recognizes an implementation-defined set of namespace URIs used to denote extension options. The effect of each, including its error behavior, is implementation-defined.

  22. An implementation recognizes an implementation-defined set of namespace URIs used to denote extension selection pragmas. The effect of each, including its error behavior, is implementation-defined.

  23. The conditions under which tokenization of two equal items produces different tokens are implementation-defined.

  24. An implementation may impose an implementation-defined restriction on the operand of FTIgnoreOption.

  25. For certain full-text components of the static context (see C Static Context Components), the default initial value of the component can be overwritten or augmented with an implementation-defined value or values.
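Several of the checklist items above surface directly in query syntax. The query below is a minimal sketch showing where they appear, not a portable example: the sample data is hypothetical, the language identifier "en" is assumed to be valid for the processor in use, and a processor may handle the score expression, the default stop word collection, or the FTIgnoreOption operand differently.

```xquery
(: Hypothetical sample data; element names and text are assumptions. :)
declare variable $books :=
  <books>
    <book xml:lang="en">
      <title>Improving the Usability of a Web Site</title>
      <note>Draft notes on usability.</note>
    </book>
    <book xml:lang="en">
      <title>Web Site Design Patterns</title>
    </book>
  </books>;

for $b in $books/book
(: Item 6: which expressions may be assigned to a score variable in a
   let-clause is implementation-defined. The weights are illustrative. :)
let score $s :=
  $b/title contains text "usability" weight {0.8} ftor "design" weight {0.2}
(: Items 9 to 11: the effect of the language option and the set of valid
   language identifiers are implementation-defined.
   Item 17: what counts as a stem of "improve" is implementation-defined.
   Item 20: "stop words default" selects an implementation-defined collection.
   Item 24: the operand of "without content" may be restricted. :)
where $b contains text
        "improve" using language "en" using stemming
        ftand "usability of the web" using stop words default
      without content $b/note
order by $s descending
return <hit score="{$s}">{ string($b/title) }</hit>
```

Because each commented feature is implementation-defined, the same query can legitimately return different hits and scores on different conforming processors; the vendor documentation required by the conformance rules is where those choices are recorded.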

J Change Log (Non-Normative)

Michael Dyck 2008-08-19 3.6.3 Distance Selection Change the prose around example 2 to agree with the meeting 170 decision re distance filter applied to n>2 matches.
Jim Melton 2008-08-26 Sections 2.3.1 Using Weights Within a Scored FTContainsExpr, 3 Full-Text Selections, and D Error Conditions Ensure that all description of weights, valid values, and errors related to invalid values are captured in one place, and generalize the description of the error raised for invalid values. Resolves bug 5812.
Jim Melton 2008-08-26 Section 3.8 Extension Selections Correct syntax of the second and third examples. Resolves bug 5879.
Jim Melton 2008-08-26 Section 3.5.4 Not-Selection Rewrite the second example to correspond to the search document content. Resolves bug 5884.
Jim Melton 2008-08-26 Section 3.5.4 Not-Selection Rewrite the third example to correspond to the search document structure. Resolves bug 5885.
Mary Holstege 2008-10-30 Miscellaneous Add text to clarify status of attribute searches (bug 5975) and of scope and constraints on score variables (bug 6094).
Mary Holstege 2008-10-30 Full-Text Selections Change scope of weight variables to FTPrimary rather than FTSelection to resolve bug 6178.
Michael Dyck 2008-11-07 2.3.1 Using Weights Within a Scored FTContainsExpr Collateral changes due to Bug 6178's relocation of FTWeight within grammar.
Michael Dyck 2008-11-07 3.4 Match Options, 3.6 Positional Filters Add text intended to clarify scope of options/filters. [Bug 5977]
Jim Melton 2008-11-12 Appendix E XML Syntax (XQueryX) for XQuery and XPath Full Text 1.0 Revise Schema and stylesheet to reflect change in position of "weight" in the grammar.
Pat Case 2008-11-23 Removed count > 0 from examples Removed count > 0 from examples in 3.4.3 Thesaurus Option and 3.4.7 Stop Word Option.
Pat Case 2008-11-23 Corrected FTLanguage example Corrected the FTLanguage example in Section 3.4.1 to search in content//p and to search for "salon de thé".
Michael Dyck 2009-01-08 4.2.7.9 FTDistance Rewrite the very last sentence. [Bug 6303]
Michael Dyck 2009-01-08 Grammar, Full-Text Selections Get rid of "multiple-match-options" as an extra-grammatical constraint, and instead make it a conventional static error (FTST0019).
Jim Melton 2009-01-28 Section 1 Introduction Inserted paragraph stating that Notes are not normative.
Jim Melton 2009-01-28 Appendix D Error Conditions Added entries and descriptions for errors XPST0003, XQST0013, and XQST0079.
Mary Holstege 2009-02-19 Window Selection 3.6.2 Window Selection Provided corrected commentary on windowing example; clarified non-applicability of contents of attribute in this example.
Mary Holstege 2009-02-19 FTUnit and FTBigUnit 5.2.3 FTUnit and FTBigUnit Clarified that we did not mean for 'word' and 'words' to be optional units.
Mary Holstege 2009-02-19 Thesaurus Option 3.4.3 Thesaurus Option and Error Conditions D Error Conditions Added error FTST0018 for missing thesauri.
Michael Dyck 2009-02-26 4.2.7.7 FTContent Rewrite fts:ApplyFTContent() to fix 'entire content'. In the process, introduce fts:TokenInfoCoversTokenPosition(), fts:getLowestTokenPosition(), & fts:getHighestTokenPosition() and drop fts:isStartToken() & fts:isEndToken().
Michael Dyck 2009-02-26 4.2.7.7 FTContent Add example of counter-intuitive behaviour due to @isContiguous=false.
Michael Dyck 2009-03-16 4.2.7.9 FTDistance Rearrange the section to bring together the parts about fts:ApplyFTDistance.
Michael Dyck 2009-03-16 4.2.4 Formal semantics functions & 4.2.7.9 FTDistance Dissolve 4.2.4 and integrate its content into 4.2.7.9 (now 4.2.6.9).
Michael Dyck 2009-03-17 4.2.6.8 FTWindow Fix two typos in fts:joinIncludes(). (Bug 6386)
Michael Dyck 2009-03-17 fts:evaluate, fts:FormRange, fts:UnaryNotHelper, fts:calcStopWords Fix some typos that would give static errors or type errors if you tried to treat these functions as actual XQuery code.
Mary Holstege 2009-03-19 fts:ApplyFTScopeDifferentSentence, fts:ApplyFTScopeDifferentParagraph Fix code to properly handle the case where there is a single string include.
Pat Case 2009-03-28 Changed FTStopword examples 1 and 3 Removed the stemming match option and changed the operand propagation to propagating in FTStopword examples 1 and 3.
Michael Dyck 2009-04-14 3.4.1 Language Option Fix typo in example query.
Jim Melton 2009-04-30 E.3.1.2 A Solution in Full Text XQueryX Fix element names in example. (Bug 6840)
Michael Dyck 2009-06-06 3.2 Search Tokens and Phrases Add wording to handle cases involving empty sequences of items or tokens. (Bug 6813)
Michael Dyck 2009-06-08 4.2.4 FTWords Fix error in fts:ApplyFTWordsAnyWord().
Michael Dyck 2009-07-05 3.4.2 Wildcard Option, Appendix D Errors Wording changes, add more examples. Add error FTDY0020.
Michael Dyck 2009-07-06 3.4.2 Wildcard Option Delete Note re wildcards and token boundaries.
Michael Dyck 2009-07-06 E.1 XQueryX representation of XQuery and XPath Full Text 1.0 Add element declaration for ftMatchOptions.
Michael Dyck 2009-07-14 E.2 XQueryX stylesheet for XQuery and XPath Full Text 1.0 Bug fixes.
Mary Holstege 2009-09-03 Various Modify match option syntax to avoid conflicts with XQuery and XQuery Update Facility: change "with" to "using" and "without" to "using no", plus adding "using" before other options for consistency. Change 'ftcontains' to 'contains text'. Bugzilla bugs 7247 and 7271.
Jim Melton 2009-09-18 E.1 XQueryX representation of XQuery and XPath Full Text 1.0 Fixed grammar within a comment re: Bugzilla Bug 7247. Fixed grammar within comments re: Bugzilla Bug 7271; also fixed enumeration values and element content names for complex types ftStemOption, ftThesaurusOption, ftStopwordOption, and ftWildCardOption re: Bugzilla Bug 7271.
Jim Melton 2009-09-18 E.2 XQueryX stylesheet for XQuery and XPath Full Text 1.0 Fixed templates' generated output re: Bugzilla Bug 7247 and Bugzilla Bug 7271.
Jim Melton 2009-09-18 E.3 XQueryX for XQuery and XPath Full Text 1.0 example Fixed "original" XQuery, the XQueryX code, and the XQuery resulting from transforming the XQueryX code re: Bugzilla Bug 7247 and Bugzilla Bug 7271.
Michael Dyck 2009-10-10 Sections 3.4.* and Appendix A Change match option syntax to resolve bug 7271.
Michael Dyck 2009-10-10 Section 2.2.1 and Appendix A Change FTContainsExpr syntax to resolve bug 7247.
Pat Case 2009-11-10 Section 4.2.5.1 Types Corrected an error in the FTMatchOptions schema for FTDiacriticsOption, changing "case insensitive" to "diacritics insensitive" and "case sensitive" to "diacritics sensitive".
Michael Dyck 2009-11-25 Section 2.3.1 and 3.1.1, Appendix A and B Change syntax of FTWeight: replace RangeExpr with "{" Expr "}". Change examples and prose accordingly.
Mary Holstege 2009-12-10 Appendix D, Section 4.2.5 Change semantics schema and functions to replace "with/without".
Michael Dyck 2010-01-25 Section 3.5.3 Mild-Not Selection Clarify that FTDY0017 is a dynamic error, and not statically detectable.
Mary Holstege 2010-05-25 Sections 3.4.3 Thesaurus Option and 3.4.7 Stop Word Option Resolve [9677] by making it clear the relative URIs for thesauri and stop word lists should be resolved against the base URI in the static context.
Mary Holstege 2010-05-25 Appendix D Error Conditions Resolve [9681] by replacing the obsolete operators && and || by ftand and ftor, respectively.
Michael Dyck 2010-07-11 Sections 3.4.7 Stop Word Option and 4.2.5.8 FTStopWordOption Replace wording that said that stop words are "removed from the search" or "removed from the set of query tokens".
Michael Dyck 2010-07-11 Sections 3.4.7 Stop Word Option and I Checklist of Implementation-Defined Features (Non-Normative) Change two occurrences of "default stop words" to "stop words default". (Leftovers from 2009-10-10.)
Michael Dyck 2010-07-13 Sections 3.4.7 Stop Word Option and I Checklist of Implementation-Defined Features (Non-Normative) Add wording to state that an implementation-defined comparison is used to determine whether a query token appears in a collection of stop words.
Michael Dyck 2010-09-06 4.2.6.6 FTScope Fix the 'where' conditions in fts:ApplyFTScopeDifferentSentence() and fts:ApplyFTScopeDifferentParagraph(). And tweak the prose following the former. (See Bugzilla Bug 9448.)
Jim Melton 2010-09-07 E.2 XQueryX stylesheet for XQuery and XPath Full Text 1.0 Fixed two templates (weights and extensions).
Michael Dyck 2010-09-13 Section 3.2 (Search Tokens and Phrases), Appendix A (EBNF for XQuery 1.0 Grammar with Full-Text extensions), and Appendix B (EBNF for XPath 2.0 Grammar with Full-Text extensions) In the production for FTWordsValue, change "Literal" to "StringLiteral"
Michael Dyck 2010-09-16 Appendix I (Checklist of Implementation-Defined Features) Make the list more consistent with the statements of implementation-defined features in the body of the document. (Delete 2 items, add 3, reword several, reorder several.)
Michael Dyck 2010-11-29 Section 4.2.5.{2,3,7} Pass $noThesaurusOptions down to fts:lookupThesaurus(). (See Bugzilla bug 11209.)
Michael Dyck 2010-11-29 Section 4.2.5.{1,3,7} Eliminate the now-redundant $thesaurusLanguage parameter from fts:lookupThesaurus(), and the now-unnecessary (and not syntactically justified) "language" attribute from complexType ftThesaurusOption. (Clean-up after previous change.)
Mary Holstege 2010-12-06 Section 3.6 Require FTOrder positional filters to be applied before other positional filters.
Mary Holstege 2011-01-04 Sections 2.3.1 and 3.1.1. In resolution of bug 11582, clarify that the second "constraint" on scoring algorithms is actually just a specific consequence of the rules for errors and optimization.
Mary Holstege 2011-01-04 Section 3.4.3 In resolution of bug 11444, change the default from "all levels" to "all levels or to an implementation-defined number of levels."
Michael Dyck 2011-02-28 Section 3.4.3 and Appendix I. Introduce FTLiteralRange for FTThesaurusID. Clarify implementation-definedness of levels. (See bugs 11821 and 12036.)
Mary Holstege 2011-03-01 Section 3 Resolve bug 12057 by clarifying what the sample tokenization is for the examples in the specification.
Jim Melton 2011-03-08 Appendix E Updated Full Text XQueryX schemas and stylesheet for bugs 11821 and 12036