12144 – [FT] ApplyFT*Window semantics wrong

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Full Text 1.0
Version:	Candidate Recommendation
Hardware:	All All

Importance:	P2 major
Target Milestone:	---
Assignee:	Jim Melton
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2011-02-20 16:01 UTC by Paul J. Lucas
Modified:	2011-03-30 01:36 UTC

Description Paul J. Lucas 2011-02-20 16:01:54 UTC

After having sent e-mail on this to the public-qt-comments mailing list and receiving no response, I'm now filing it as a bug so that it will have to be dealt with eventually.

Unless I'm missing something, I think the semantics for the ApplyFTWordWindow may be wrong. The test FTNot-q6.xq in the XQuery Full-Text Test Suite is:

> declare variable $input-context external;
> 
> $input-context/books/book[
>   para contains text "software" ftand ("coder" ftand ftnot "ninja" window 5 words)
> ]/title

That test has the expected results of "nothing" yet our engine returns:

<title>Ninja Coder</title>

To simplify matters, I've whittled down that test into an equivalent test that exhibits the same failure:

> let $x := <msg>ninja coder</msg>
> return $x contains text "coder" ftand ftnot "ninja" window 5 words

Running this test incorrectly returns true; if you remove the "window 5 words" from the test, it correctly returns false.

What constitutes correctness of course depends on what I take a window filter used with "ftand ftnot" to mean. If the query were instead:

> let $x := <msg>ninja coder</msg>
> return $x contains text "coder" ftand "ninja" window 5 words

i.e., the "ftnot" were removed, then that means that, in order for the query to return true, the words "coder" and "ninja" must not only both occur at least once in the same document, but must also occur at least once within 5 words of each other.

If the "ftnot" is put back, then I assume that means that, in order for the query to return true, the word "coder" must occur at least once in the document and the word "ninja", if it occurs at all, must never occur within 5 words of any "coder".  

To check our engine's results, I copied the XQuery algorithms and the schema from the spec and ran them on our engine, hand-coding the allMatches data, i.e.:

> let $am :=
>  <fts:allMatches stokenNum="1">
>    <fts:match>
>      <fts:stringInclude queryPos="1" isContiguous="T">
>        <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
>          startPara="1" endPara="1"/>
>      </fts:stringInclude>
>      <fts:stringExclude queryPos="2" isContiguous="T">
>        <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1"
>          startPara="1" endPara="1"/>
>      </fts:stringExclude>
>    </fts:match>
>  </fts:allMatches>
> return
>  fts:ApplyFTWordWindow( $am, 5 )

The results of that are:

<fts:allMatches xmlns:fts="http://www.w3.org/2007/xpath-full-text" stokenNum="1">
 <fts:match>
   <fts:stringInclude queryPos="1" isContiguous="false">
     <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
       startPara="1" endPara="1"/>
   </fts:stringInclude>
   <fts:stringExclude queryPos="2" isContiguous="T">
     <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1"
       startPara="1" endPara="1"/>
   </fts:stringExclude>
 </fts:match>
 <fts:match>
   <fts:stringInclude queryPos="1" isContiguous="false">
     <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
       startPara="1" endPara="1"/>
   </fts:stringInclude>
   <fts:stringExclude queryPos="2" isContiguous="T">
     <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1"
       startPara="1" endPara="1"/>
   </fts:stringExclude>
 </fts:match>
 <fts:match>
   <fts:stringInclude queryPos="1" isContiguous="false">
     <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
       startPara="1" endPara="1"/>
   </fts:stringInclude>
   <fts:stringExclude queryPos="2" isContiguous="T">
     <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1"
       startPara="1" endPara="1"/>
   </fts:stringExclude>
 </fts:match>
 <fts:match>
   <fts:stringInclude queryPos="1" isContiguous="false">
     <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
       startPara="1" endPara="1"/>
   </fts:stringInclude>
   <fts:stringExclude queryPos="2" isContiguous="T">
     <fts:tokenInfo startPos="1" endPos="1" startSent="1" endSent="1"
       startPara="1" endPara="1"/>
   </fts:stringExclude>
 </fts:match>
 <fts:match>
   <fts:stringInclude queryPos="1" isContiguous="false">
     <fts:tokenInfo startPos="2" endPos="2" startSent="1" endSent="1"
       startPara="1" endPara="1"/>
   </fts:stringInclude>
 </fts:match>
</fts:allMatches>

The last match has a stringInclude but no stringExclude.  According to the semantics for the FTContainsExpr in section 4.3:

>           return 
>              some $match in $allMatches/fts:match
>              satisfies 
>                 fn:count($match/fts:stringExclude) eq 0

it says to return "true" if there is at least one match that has no stringExclude.  Well, as I've pointed out above, the last match has no stringExclude.  Therefore, the query (according to the spec's own semantics) returns true whereas the expected return should be false.

So, as far as I can tell, there is a bug in the semantics of ApplyFTWordWindow, ApplyFTSentenceWindow, and ApplyFTParagraphWindow.
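
The section 4.3 check quoted above can also be reproduced outside XQuery. The following is a minimal Python sketch, not the spec's code. It assumes a trimmed-down allMatches document (one match that keeps its stringExclude and one that has only a stringInclude, standing in for the five matches above), and the names `ALL_MATCHES` and `contains_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

FTS = "http://www.w3.org/2007/xpath-full-text"

# Assumption: a shortened stand-in for the ApplyFTWordWindow output above.
ALL_MATCHES = f"""
<fts:allMatches xmlns:fts="{FTS}" stokenNum="1">
  <fts:match>
    <fts:stringInclude queryPos="1" isContiguous="false">
      <fts:tokenInfo startPos="2" endPos="2"/>
    </fts:stringInclude>
    <fts:stringExclude queryPos="2" isContiguous="T">
      <fts:tokenInfo startPos="1" endPos="1"/>
    </fts:stringExclude>
  </fts:match>
  <fts:match>
    <fts:stringInclude queryPos="1" isContiguous="false">
      <fts:tokenInfo startPos="2" endPos="2"/>
    </fts:stringInclude>
  </fts:match>
</fts:allMatches>
"""


def contains_text(all_matches_xml: str) -> bool:
    """Mirror of the quoted check: true if some match has no stringExclude."""
    root = ET.fromstring(all_matches_xml.strip())
    ns = {"fts": FTS}
    return any(
        len(match.findall("fts:stringExclude", ns)) == 0
        for match in root.findall("fts:match", ns)
    )


print(contains_text(ALL_MATCHES))
```

Because the second match carries no stringExclude, the check succeeds for this input, which is the behaviour described in the paragraph above.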

Comment 1 Paul J. Lucas 2011-02-20 16:02:58 UTC

Importance increased to "major" since, if I'm right, it's a major bug in the specification to have incorrect semantics.

Comment 2 Michael Dyck 2011-02-21 08:01:59 UTC

(personal response:)
This is mostly a duplicate of Bug 12009, so I will mark it as such, but I'll also answer some particulars.

As indicated in 12009, I believe your engine is behaving conformantly on this test-case; the test-case's expected output is incorrect.

Re the whittled-down test:
> let $x := <msg>ninja coder</msg>
> return $x contains text "coder" ftand ftnot "ninja" window 5 words

I believe your engine, by returning true, is again correct, and your interpretation of a window filter used with an "ftand ftnot" is incorrect.

> If the "ftnot" is put back, then I assume that means that, in order for the
> query to return true, the word "coder" must occur at least once in the
> document and the word "ninja", if it occurs at all, must never occur within
> 5 words of any "coder".

As you say, this interpretation would cause you to expect a result of false from the whittled-down test (because "ninja" *does* occur within 5 words of "coder").
However, the correct interpretation is more along the lines of:
    there must be at least one window of 5 words
        containing an occurrence of "coder" and
        not containing any occurrence of "ninja"
(Note that the window is allowed to 'extend beyond' the bounds of the search context, otherwise a two-word element couldn't support a 5-word window.)
This interpretation implies a result of true for the test, since the 5-word window that starts at "coder" does not contain an occurrence of "ninja".

So I don't think this issue indicates a bug in the semantics of fts:ApplyFT*Window.
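
To make the difference between the two readings concrete, here is a rough Python sketch that models them on a plain token list. It is not the spec's fts:ApplyFTWordWindow. The function names are hypothetical, and "within n words" in the reporter's reading is assumed to mean that both tokens fit inside one n-word window.

```python
def positions(tokens, word):
    """1-based positions of `word` in the token list."""
    return [i for i, t in enumerate(tokens, start=1) if t == word]


def window_reading(tokens, include, exclude, n):
    """Reading in this comment: some n-word window contains an
    occurrence of `include` and no occurrence of `exclude`."""
    excluded = set(positions(tokens, exclude))
    for p in positions(tokens, include):
        # Try every window [s, s+n-1] that covers position p. The window
        # may extend beyond either end of the token list.
        for s in range(p - n + 1, p + 1):
            if not excluded.intersection(range(s, s + n)):
                return True
    return False


def proximity_reading(tokens, include, exclude, n):
    """Reporter's reading: `include` occurs, and no `exclude` lies
    within n words of any occurrence of `include`."""
    inc = positions(tokens, include)
    exc = positions(tokens, exclude)
    return bool(inc) and all(abs(p - q) >= n for p in inc for q in exc)


tokens = ["ninja", "coder"]
print(window_reading(tokens, "coder", "ninja", 5))     # True
print(proximity_reading(tokens, "coder", "ninja", 5))  # False
```

For the whittled-down test the window that starts at "coder" excludes "ninja", so the first reading succeeds, while the second fails because the two words are adjacent.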

Comment 3 Michael Dyck 2011-02-21 08:03:58 UTC

(On second thought, maybe I won't mark it as a duplicate.)

Comment 4 Paul J. Lucas 2011-02-21 16:23:34 UTC

> This interpretation implies a result of true for the test, since the 5-word
> window that starts at "coder" does not contain an occurrence of "ninja".

To me, that's counter-intuitive.  How would you express my interpretation in XQuery?

Comment 5 Michael Dyck 2011-02-21 17:47:01 UTC

(personal response:)

To achieve this:
> the word "coder" must occur at least once in the document
> and the word "ninja", if it occurs at all,
> must never occur within 5 words of any "coder".

try this:
  ... contains text "coder" ftand ftnot ( "coder" ftand "ninja" window 5 words )
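
As an illustration of how that query decomposes, the following Python sketch models it on a plain token list: "coder" must occur, and there must be no pair of "coder" and "ninja" that fits inside one 5-word window. This is a simplified model under those assumptions, not the spec's algorithm, and the function names are hypothetical.

```python
def positions(tokens, word):
    """1-based positions of `word` in the token list."""
    return [i for i, t in enumerate(tokens, start=1) if t == word]


def both_in_window(tokens, a, b, n):
    """Models `a ftand b window n words`: some occurrence of `a` and some
    occurrence of `b` span at most n token positions."""
    return any(
        abs(p - q) + 1 <= n
        for p in positions(tokens, a)
        for q in positions(tokens, b)
    )


def workaround(tokens, n=5):
    """Models: "coder" ftand ftnot ("coder" ftand "ninja" window n words)."""
    has_coder = bool(positions(tokens, "coder"))
    return has_coder and not both_in_window(tokens, "coder", "ninja", n)


print(workaround(["ninja", "coder"]))                     # False
print(workaround(["coder", "a", "b", "c", "d", "ninja"]))  # True
```

In the second call the two words span six positions, so no 5-word window holds both and the modelled query succeeds.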

Comment 6 Paul J. Lucas 2011-02-21 20:36:03 UTC

Your solution isn't very intuitive.

The fact that the test author got the expected results wrong supports my assertion that the present semantics aren't intuitive.

Comment 7 Jim Melton 2011-02-24 02:43:58 UTC

Paul, there was some sympathy in the WGs for your concern about non-intuitiveness. However, given that a change of that magnitude would cause Full Text 1.0 to go back to Working Draft, we concluded that delaying publication for yet another 18 to 24 months would do a greater disservice to the community at large. We've agreed to change FTNot-q6 so that it doesn't provide such a non-intuitive query, thus reducing the visibility of the problem (see, I'm being honest here!), but to defer consideration of a change to the syntax and/or semantics until v.next of Full Text. We hope that this is acceptable to you, even though it does not directly make the changes that you might want in this version of the spec.

As soon as FTNot-q6 has been modified, we'll mark the bug RESOLVED, either with a "FIXED" resolution or a "LATER" resolution.  We hope you will be willing to mark the bug CLOSED at that time.

Comment 8 Michael Dyck 2011-03-15 19:41:27 UTC

As reported in Bug 12009 comment 4, the WGs agreed that for test-case FTNot-q6, the expected results were incorrect for the query, and I have fixed it by changing the expected result to: 
    <title>Ninja Coder</title>
as returned by your engine.

(In comment 7, Jim Melton indicated that we would change the query to be more "intuitive", but that was a mistaken reconstruction of a truncated action item.)

Consequently, I'm marking this bug resolved-fixed. Please mark it closed if you accept this resolution.