8408 – Clarification on StringExcludes and Windows

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Summary: Clarification on StringExcludes and Windows

Status:	CLOSED INVALID

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Full Text 1.0
Version:	Candidate Recommendation
Hardware:	All (OS: All)

Importance:	P2 normal
Target Milestone:	---
Assignee:	Jim Melton
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2009-12-01 13:10 UTC by Peter M. Fischer
Modified:	2009-12-21 07:26 UTC
CC List:	1 user

Description Peter M. Fischer 2009-12-01 13:10:13 UTC

In the CR, the following query (examples-362-5 in XQFTTS) is said to return an empty result when applied to the sample document (FT-3-examples-source-document.xml):

/books/book[@number="1" and . contains text "efficient" 
ftand ftnot "and" window 3 words]

The reasoning is that there is no occurrence of "efficient" within a window of 3 tokens which would not also contain an occurrence of "and".

The formal definition of FTWindow seems to indicate a different result:

- There are 3 (not 2) occurrences of "and" in the book: two within the <p> element and a third in the <title> element. Applying FTNot and then FTAnd would yield three matches, each with the StringInclude of the single occurrence of "efficient" (within <p>) and a StringExclude corresponding to one of the occurrences of "and".

- The StringInclude (trivially) fulfills the window condition in all three cases.

- According to 4.2.6.8, fts:ApplyFTWordWindow, only those StringExcludes that fall within the window limits are retained. Clearly, the two StringExcludes within <p> fulfill this criterion and are retained in the result. The StringExclude stemming from <title> does not fulfill the window condition and is therefore dropped.

- As a consequence, there is now a Match without a StringExclude, causing the book to become part of the result.

Is my understanding of the semantics correct?

If so, a possible solution would be to change the search context so that the query searches inside <p> rather than inside <book>.

Comment 1 Michael Dyck 2009-12-07 03:10:10 UTC

> Applying FTNot and then FTAnd would yield three matches, each with
> the StringInclude of the single occurence of "efficient" (within <p>),
> and a StringExclude that corresponds to the occurences of "and"

Actually, I think you'll find it yields a single match, containing the one StringInclude and all three StringExcludes. See the example in 4.2.6.3 FTUnaryNot, in which FTUnaryNot transforms:
  an AllMatches containing 3 Matches each with 1 StringInclude
into:
  an AllMatches containing 1 Match with 3 StringExcludes.
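The transformation in 4.2.6.3 follows from De Morgan's laws: an AllMatches is a disjunction of Matches, each Match is a conjunction of includes and excludes, so negating it yields every combination formed by picking one inverted entry from each input Match. Below is a minimal Python sketch of that reasoning under a simplified model (a Match as a tuple of (kind, token) pairs). It is not the specification's fts:UnaryNotHelper, and the names `unary_not`, `FLIP`, "A1", "A2" and "A3" are illustrative only.

```python
from itertools import product

# Simplified model: a Match is a tuple of (kind, token) pairs, where kind is
# "include" (StringInclude) or "exclude" (StringExclude); an AllMatches is a
# list of Matches, read as a disjunction of conjunctions.
Match = tuple[tuple[str, str], ...]

FLIP = {"include": "exclude", "exclude": "include"}


def unary_not(all_matches: list[Match]) -> list[Match]:
    """Negate an AllMatches via De Morgan's laws.

    not (M1 or ... or Mk) == (not M1) and ... and (not Mk), and each
    (not Mi) is the disjunction of Mi's inverted entries. Distributing the
    'and' over those disjunctions gives one output Match per way of picking
    one inverted entry from every input Match.
    """
    inverted = [[(FLIP[kind], tok) for kind, tok in m] for m in all_matches]
    return [tuple(choice) for choice in product(*inverted)]


# Three Matches, each with one StringInclude (the three occurrences of "and").
ands = [(("include", "A1"),), (("include", "A2"),), (("include", "A3"),)]
print(unary_not(ands))
# [(('exclude', 'A1'), ('exclude', 'A2'), ('exclude', 'A3'))]
```

Because each input Match holds a single entry, there is exactly one combination, so all three excludes end up in one Match rather than in three separate Matches as in the description's reading. As an edge case, negating an empty AllMatches yields one empty Match, i.e. "not (no match)" always matches.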

Comment 2 Michael Dyck 2009-12-07 03:12:25 UTC

For completeness, here is my analysis of that example.

In the example document, the 'book' node has one occurrence of the word
"efficient" (call it E) and 3 occurrences of the word "and" (call them
A1, A2, A3). I'll use an ad hoc notation for matches.

     "efficient"
         generates 1 match:
             [include E]

     "and"
         generates 3 matches:
             [include A1],
             [include A2],
             [include A3]

     ftnot "and"
         fts:ApplyFTUnaryNot calls fts:UnaryNotHelper and generates a
         single match:
             [exclude A1, exclude A2, exclude A3]

     "efficient" ftand ftnot "and"
         fts:ApplyFTAnd generates a single match:
             [include E, exclude A1, exclude A2, exclude A3]

     "efficient" ftand ftnot "and" window 3 words
         fts:ApplyFTWindow calls fts:ApplyFTWordWindow, which takes the
         single match (from ApplyFTAnd) and finds all windows of width 3
         that contain all the stringIncludes of the match (i.e., just E),
         thus:

             and enable efficient

                 enable efficient and

                        efficient and effective

         For each such window, it generates a match consisting of:
         -- the "join" of the stringIncludes (i.e., just E again), and
         -- all of the stringExcludes (from the input match) that fall
            within the window. For the first window, this is the
            stringExclude for A2. For the second and third windows, this
            is the stringExclude for A3.

         That is, it generates 3 matches:
            [include E, exclude A2],
            [include E, exclude A3],
            [include E, exclude A3]

     . contains text "efficient" ftand ftnot "and" window 3 words
         fts:FTContainsExpr receives the above 3 matches, looks for one
         that has zero stringExcludes, finds no such match, and so returns
         false (which agrees with the prose accompanying the example).
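The last steps of this analysis (FTAnd, the word window, and the final contains check) can be replayed with a small Python model. This is a simplified sketch, not the specification's XQuery functions. Each token occurrence is reduced to a single position, the "join" of stringIncludes is skipped because there is only one include here, and the positions are hypothetical, chosen only to reproduce the neighborhood "and enable efficient and effective" with A1 far away in <title>. The helper names `ft_and`, `word_window` and `contains_text` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    includes: tuple[str, ...]  # tokens matched as StringIncludes
    excludes: tuple[str, ...]  # tokens matched as StringExcludes


# HYPOTHETICAL token positions (not the real offsets in the sample document):
# A1 sits in <title>; "and(A2) enable efficient(E) and(A3) effective" in <p>.
POS = {"A1": 2, "A2": 20, "E": 22, "A3": 23}


def ft_and(left: list[Match], right: list[Match]) -> list[Match]:
    """Pair every left Match with every right Match, concatenating entries."""
    return [Match(l.includes + r.includes, l.excludes + r.excludes)
            for l in left for r in right]


def word_window(matches: list[Match], n: int) -> list[Match]:
    """For each Match, enumerate every n-word window covering all of its
    includes (assumed non-empty) and keep only the excludes inside it."""
    result = []
    for m in matches:
        lo = min(POS[t] for t in m.includes)
        hi = max(POS[t] for t in m.includes)
        for start in range(hi - n + 1, lo + 1):
            end = start + n - 1
            kept = tuple(t for t in m.excludes if start <= POS[t] <= end)
            result.append(Match(m.includes, kept))
    return result


def contains_text(matches: list[Match]) -> bool:
    """True iff some Match has no StringExclude at all."""
    return any(not m.excludes for m in matches)


efficient = [Match(("E",), ())]
not_and = [Match((), ("A1", "A2", "A3"))]  # single Match from ftnot "and"
windowed = word_window(ft_and(efficient, not_and), 3)
print([m.excludes for m in windowed])  # [('A2',), ('A3',), ('A3',)]
print(contains_text(windowed))         # False

# The reading from the original description: three Matches, one exclude each.
split = [Match((), (a,)) for a in ("A1", "A2", "A3")]
print(contains_text(word_window(ft_and(efficient, split), 3)))  # True
```

Keeping all three excludes in one Match means every window around E still carries an exclude (A2 or A3), so the result is false. Splitting them per occurrence, as in the original description, lets the <title> occurrence fall outside every window and yields a Match with no exclude.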

If there's a flaw in this reasoning, please let us know.

Comment 3 Michael Dyck 2009-12-14 22:49:49 UTC

At its meeting on December 7, the Task Force endorsed my responses above. Consequently, I'm marking this issue Resolved-Invalid. If you agree, please mark it Closed.