This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 6665 - [FT] Test Suite Bugs
Summary: [FT] Test Suite Bugs
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Full Text 1.0
Version: Candidate Recommendation
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: Jim Melton
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL: http://basex.org
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-03-08 22:39 UTC by Christian Gruen
Modified: 2009-05-02 12:00 UTC

Description Christian Gruen 2009-03-08 22:39:11 UTC
Dear all,

thanks again for your efforts. I've once more debugged the current test suite:

[1] FTContent-complex1.xqy ...

File suffixes should be changed to .xq


[2] ft-3.2-examples-q5.xq:

Still an open issue (see bug 6469); I'd propose a small scoring value (to cover more implementations) and the 'inspect' attribute.


[3] ft-3.3-examples-q1.xq:

Still open (6469)


[4] ft-3.4.4-examples-q2.xq

Expected Result: boolean value (true)


[5] examples-362-4.xq

I wouldn't expect any results for [. ftcontains "efficient" ftand ftnot "and" window 2 words] as "and" directly follows the term "efficient" - but I might be wrong


[6] examples-363-4.xq

The result contains the <book> element instead of the <title>


[7] examples-363-5.xq

Result is empty; I'd expect a boolean result.


[8] examples-364-2.xq

I would have expected "false" as result as no typical sentence delimiter is found here (such as a period).


[9] ftstaticcontext-q6.xq

Book number 3 is missing in the result.


[10] /ftstaticcontext-q7.xq

No error expected here as FT-Option is in first part of prolog. (In case of an error, I wouldn't expect FTST0001, but XPST0003).


[11] FTPrimary-FTWords-q1.xq

Wrong results (expected: <p>It is a static ...</p> and
<p>The order in ...</p>)


[12] FTPrimary-FTExtensionSelection-q1.xq

Wrong result (expected: <p>It is a static ...</p>)


[13] FTPrimary-FTWords-anyword-q4a.xq

9 (instead of 1) <p> results expected


[14] FTPrimary-FTWords-anyword-q4b.xq

9 (instead of 0) <p> results expected


[15] still open (see bug 6469..):

 "t is checked.<" -> "t is checked. <"
 "ue</nt>is"        ->  "ue</nt> is"
 "on</nt>produces"  ->  "on</nt> produces"


[16] FTSelection-FTTimes-q4d.xq

Second <p> element should be removed (contains "cardinality" only once)


[17] FTNot-q1.xq

Only 4 <title> elements expected (the other ones don't have a <para> element)


[18] ftcaseunconstrained-q1.xq

I would expect no results; if I got it right, the character conversion is only applied to the query string (I'm just interested: can you explain why the lowercase/uppercase options were introduced to the standard?).


[19] FTPosFilter-q2.xq

I would expect 'false' as terms are not ordered.


[20] FTPosFilter-q4.xq

Maybe I misinterpret this one, but I would expect 'false' as distance isn't 0 words.


[21] FTPosFilter-q5.xq / FTPosFilter-q6.xq

'false' expected (words are in same sentence)


[22] FTWindow-words3.xq

$x should at least be 4


[23] FTWindow-words4.xq

..is the same as FTWindow-words3.xq


[24] FTWindow-complexwords3.xq

"window 2 words" expected


[25] FTWindow-complexwords4.xq

"window 31 words" expected


[26] FTWindow-sentences1.xq / FTWindow-paragraphs1.xq

"window 2 ..." expected


[27] FTWindow-sentences3.xq / FTWindow-paragraphs3.xq

$x := 2 expected


[28] FTDistance-complexwords2.xq

<title>Ninja Coder</title> expected as result


[29] ftstaticcontext-q5.xq

I would expect an error here as both "case sensitive" and "case insensitive" are specified (but, as usual, I can easily be wrong).


[30] ft-3.4.1-examples-q1.xq

FTST0009 expected instead of FTST0013.


[31] FTContent-and1.xq

Wrong syntax: "content.]"


[32] FTIgnore-q...

"ignore" queries should support FTST0007 as alternative result.


Best,

Christian, BaseX Team 
http://www.basex.org
Comment 1 Michael Dyck 2009-03-09 20:00:24 UTC
> [5] examples-362-4.xq
> 
> I wouldn't expect any results for
>     [. ftcontains "efficient" ftand ftnot "and" window 2 words]
> as "and" directly follows the term "efficient" - but I might be wrong

So you're thinking it means something like:
    the book contains an occurrence of 'efficient' that doesn't have
    an occurrence of 'and' within any 2-word window of it
But in fact it means something like:
    the book has a 2-word window that contains an occurrence of
    'efficient' and no occurrence of 'and'
which succeeds because of the words "enable efficient".

In terms of the section 4 semantics, the FTAnd
    "efficient" ftand ftnot "and"
generates a single Match, containing a StringInclude for the occurrence of
'efficient' and three StringExcludes, one for each occurrence of 'and' in
the book's string-value. This Match is passed to the fts:ApplyFTWordWindow
function to apply the
    window 2 words
filter. It constructs every 2-word window that contains all the
StringIncludes in the Match (there are two, one for "enable efficient"
and one for "efficient and") and for each, generates a Match containing
the StringInclude for 'efficient', and all the StringExcludes that fall
within the window. (For the "enable efficient" window, there are no
StringExcludes; for the "efficient and" window, there is one.) At the top
level, fts:FTContainsExpr looks for a Match containing no StringExcludes,
finds one, and so yields true.
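
The window reasoning above can be sketched in Python. This is only a toy model of the explanation, not the spec's fts: functions: the token list is made up (the thread does not quote the book text), and the model handles a single include term and a single exclude term.

```python
from typing import Dict, List


def window_matches(tokens: List[str], include: str, exclude: str, size: int) -> List[Dict]:
    """Build one match per `size`-token window that contains `include`."""
    matches = []
    for start in range(len(tokens) - size + 1):
        window = tokens[start:start + size]
        if include in window:
            # Keep only the exclude positions that fall inside this window.
            excludes = [start + i for i, tok in enumerate(window) if tok == exclude]
            matches.append({"window": window, "excludes": excludes})
    return matches


def ft_contains(tokens: List[str], include: str, exclude: str, size: int) -> bool:
    """Succeed if at least one match carries no excludes."""
    return any(not m["excludes"] for m in window_matches(tokens, include, exclude, size))


# Hypothetical fragment; only "enable efficient" and "efficient and"
# are taken from the comment above.
tokens = ["enable", "efficient", "and", "effective", "search"]
matches = window_matches(tokens, "efficient", "and", 2)
result = ft_contains(tokens, "efficient", "and", 2)
```

Here the "enable efficient" window has no excludes, so the overall test succeeds even though "and" directly follows "efficient".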


> [29] ftstaticcontext-q5.xq
> 
> I would expect an error here as both "case sensitive" and "case
> insensitive" are specified (but, as usual, I can easily be wrong).

The test was constructed to counter that expectation. You're presumably
expecting FTST0019, defined in Section 3.4:
    It is a static error [err:FTST0019] if, within a single FTMatchOptions,
    there is more than one match option of any given match option group.
    For example, if the FTCaseOption "lowercase" is specified, then
    "uppercase" cannot also be specified as part of the same
    FTMatchOptions.
The thing to note is that the constraint only applies within a single
FTMatchOptions. In the test case:
    declare ft-option case sensitive;
    declare ft-option with stemming case insensitive;
there are two FTOptionDecls, each containing its own FTMatchOptions. Within
each FTMatchOptions, there are no match option conflicts, so no error.
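
As a rough illustration of the scoping, here is a Python sketch that checks each declaration on its own. The grouping table, the StaticError class and the function names are assumptions made for this example; only the option names and the error code come from the text above.

```python
# Hypothetical grouping of match options; only the options named in this
# thread are modelled.
OPTION_GROUPS = {
    "case sensitive": "case",
    "case insensitive": "case",
    "lowercase": "case",
    "uppercase": "case",
    "with stemming": "stemming",
}


class StaticError(Exception):
    """Stand-in for a static error such as FTST0019."""


def check_match_options(options):
    """Raise if a single FTMatchOptions names two options of the same group."""
    seen = {}
    for opt in options:
        group = OPTION_GROUPS[opt]
        if group in seen:
            raise StaticError(f"FTST0019: '{seen[group]}' conflicts with '{opt}'")
        seen[group] = opt
    return seen


def check_prolog(declarations):
    """Check every declaration separately; nothing is compared across them."""
    return [check_match_options(decl) for decl in declarations]


# The two declarations from the test case, one list per FTOptionDecl.
prolog = [["case sensitive"], ["with stemming", "case insensitive"]]
checked = check_prolog(prolog)
```

With this model the test case passes, while a single declaration containing both "lowercase" and "uppercase" would raise.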
Comment 2 Christian Gruen 2009-03-09 20:19:39 UTC
Michael,

thanks for your quick reply.

Conc. issue 29: do you have an unpublished version of the XQFT specs/could you send me your version? I cannot find any reference to the error FTST0019 at http://www.w3.org/TR/xpath-full-text-10.

I am a little bit confused why the constraint only applies to single match options; do you know why this decision has been taken?

Thanks,
Christian

Comment 3 Michael Dyck 2009-03-09 23:29:24 UTC
(In reply to comment #2)
> 
> Conc. issue 29: do you have an unpublished version of the XQFT specs/
> could you send me your version?

Sorry, that would be against the rules.

> I cannot find any reference to the error FTST0019 at
> http://www.w3.org/TR/xpath-full-text-10.

Ah, that's right. In the current CR, the restriction is expressed via an
extra-grammatical constraint, so you'd have been expecting a parse error.
(We realized it wasn't an appropriate use of an EGC, so changed it to be
an ordinary static error.)

> I am a little bit confused why the constraint only applies to single
> match options; do you know why this decision has been taken?

That depends on what you have in mind as the alternative.
Comment 4 Christian Gruen 2009-03-10 00:15:48 UTC
> That depends on what you have in mind as the alternative.

Well, I would expect that a single match option cannot be redefined if it has been defined before. Did you have a special reason to allow redefinitions here?
Comment 5 Michael Dyck 2009-03-10 00:31:34 UTC
(In reply to comment #4)
> 
> Well, I would expect that a single match option cannot be redefined if it has
> been defined before.

That would prevent later/inner FTMatchOptions from overriding earlier/outer FTMatchOptions, which is explicitly intended.

> Did you have a special reason to allow redefinitions here?

If by "here", you mean "in FTOptionDecls in the Prolog", then I'm not sure we
have a special reason to allow them there -- it might just be a particular
example of the general intent.
Comment 6 Christian Gruen 2009-03-10 01:03:13 UTC
> That would prevent later/inner FTMatchOptions from overriding earlier/outer
> FTMatchOptions, which is explicitly intended.

Thanks again; we probably have different implementations in mind as overriding should be no problem, even if default options can only be defined once in the prolog.

Comment 7 Pat Case 2009-03-13 13:24:39 UTC
Hi Christian,

I am getting back to you on items 4, 9-10, 18, and 30.

[4] ft-3.4.4-examples-q2.xq
Expected Result: boolean value (true)

--Yes. I have fixed the result.

[9] ftstaticcontext-q6.xq
Book number 3 is missing in the result.

--I corrected the query to actually look only in the Book with attribute number="1" as it was intended to do. Changing this line from:
let $cont := $book[$x]/content
to
let $cont := $book[@number=$x]/content
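
The difference between the two predicates can be shown with a small Python sketch using the XPath subset of xml.etree.ElementTree. The sample document is invented for this example, with document order deliberately different from the number attribute; it is not the test suite's data.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the first book in document order is number 3.
doc = ET.fromstring(
    "<books>"
    '<book number="3"><content>third</content></book>'
    '<book number="1"><content>first</content></book>'
    '<book number="2"><content>second</content></book>'
    "</books>"
)

x = 1

# Positional predicate: the x-th book in document order.
by_position = doc.findall(f"book[{x}]/content")

# Attribute predicate: the book whose number attribute equals x.
# (ElementTree compares attribute values as strings, hence the quotes.)
by_number = doc.findall(f"book[@number='{x}']/content")

positional_text = [c.text for c in by_position]
attribute_text = [c.text for c in by_number]
```

The positional form selects whichever book happens to come first, whereas the attribute form selects the book labelled 1 regardless of order.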

[10] /ftstaticcontext-q7.xq
No error expected here as FT-Option is in first part of prolog. (In case of an
error, I wouldn't expect FTST0001, but XPST0003).

--I think this one inadvertently got fixed after we ran a syntax check, so I have gone back in and broken it again, moving the FT-Option later in the prolog.

--Well I couldn't have been farther off on the error. Can't explain that. I have fixed it to XPST0003.

[18] ftcaseunconstrained-q1.xq
I would expect no results; if I got it right, the character conversion is only
applied to the query string (I'm just interested: can you explain why the
lowercase/uppercase options were introduced to the standard?).

--Yes. Ersatz is only in the document with an initial upper case. I have replaced the result with a blank document. 

--Lowercase and uppercase are in the spec for end user convenience. They allow users to specify lower or upper case regardless of how the characters are typed or copied into a search box. They will be especially helpful to those with little manual dexterity and other disabilities. 
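
A minimal Python sketch of the reading agreed on here, where the case conversion is applied to the query string only and the result is then compared case-sensitively with the document token. The function name and the option strings as Python values are assumptions for illustration.

```python
def matches(query: str, token: str, case_option: str) -> bool:
    """Compare one query term with one document token under a case option."""
    if case_option == "case insensitive":
        return query.lower() == token.lower()
    if case_option == "lowercase":
        query = query.lower()   # only the query is converted
    elif case_option == "uppercase":
        query = query.upper()   # only the query is converted
    elif case_option != "case sensitive":
        raise ValueError(f"unknown case option: {case_option}")
    return query == token


# "Ersatz" appears in the document only with an initial upper case.
lowercase_hit = matches("ERSATZ", "Ersatz", "lowercase")
insensitive_hit = matches("ERSATZ", "Ersatz", "case insensitive")
```

Under this reading the lowercase query "ersatz" does not match the document token "Ersatz", which is consistent with the empty result.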

[30] ft-3.4.1-examples-q1.xq
FTST0009 expected instead of FTST0013.

--No change. This is another recent error addition. We will be publishing a new version of the spec soon so you will be able to see the new errors. This error is added to 3.4.1 as:
If an invalid language identifier is specified, then the behavior is implementation-defined. If the implementation chooses to raise an error in that case, it must raise [err:FTST0009].

As always, thanks so much for your assistance.

Pat
Comment 8 Jim Melton 2009-03-13 21:40:37 UTC
Christian, I reviewed your items [2] and [11] through [16] and agreed with each of them.  I have made the corrections that you suggested, and invite you to review the changes that I made.  Unless you object to any of my changes, I will consider items [2] and [11] through [16] as resolved. 

Thanks again for your diligence!
Comment 9 Christian Gruen 2009-03-14 00:57:50 UTC
Thanks Jim, the only thing I noticed was a missing newline before the closing </p> tags in [11] and [12]:

OLD: FTMatchOptions</nt>.</p>
NEW: FTMatchOptions</nt>.
</p>

OLD: stem(synonym(word)).</p>
NEW: stem(synonym(word)).
</p>

Christian


Comment 10 Mary Holstege 2009-03-19 16:40:41 UTC
Christian, thanks again for your detailed comments.
With respect to items 6, 7, 17, 22-28, 31, and 32: done.

With respect to item #8: the test is correct because the element p is a paragraph break, and sentences are not allowed to span paragraphs. (Section 4.1 bullet 6b).
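
To illustrate, here is a Python sketch of a tokenizer that numbers sentences and closes any open sentence at a paragraph boundary. The function, the delimiter set and the sample input are assumptions; real tokenization is up to the implementation.

```python
SENTENCE_END = {".", "!", "?"}


def sentence_ids(paragraphs):
    """Return (word, sentence number) pairs for a list of token lists,
    one token list per paragraph."""
    ids = []
    sentence = 0
    for para in paragraphs:
        open_sentence = False
        for tok in para:
            if tok in SENTENCE_END:
                if open_sentence:
                    sentence += 1
                    open_sentence = False
            else:
                ids.append((tok, sentence))
                open_sentence = True
        if open_sentence:
            # A paragraph break ends the sentence even without a period.
            sentence += 1
    return ids


# Two hypothetical <p> elements, neither containing a sentence delimiter.
ids = sentence_ids([["first", "words"], ["more", "words"]])
```

The words of the second paragraph land in a different sentence from those of the first, although no period occurs anywhere.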

With respect to items 19-21 the tests correctly reflect the subtle influence of the rule that the modifiers are applied in order. (Although you are correct that FTPosFilter-q5.xq should return false; this was a mistake.)

Consider FTPosFilter-q2:
"one two three" ftcontains "three" ftand "one" window 3 words ordered

First the ftand is applied, creating a single string match spanning positions 1 to 3.  Then the window is applied, and it also succeeds, and we still have the single string match.  Is this single string match ordered with respect to itself? Yes it is.  Similar rules apply to the other positional filter cases.
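
This reasoning can be mimicked with a toy Python model. It follows the account given in this comment rather than the spec's formal functions, and the tuple layout (query index, start position, end position) is an assumption made for the example.

```python
def is_ordered(items):
    """True if the items' start positions follow the order of the query terms."""
    by_query = sorted(items, key=lambda item: item[0])
    starts = [item[1] for item in by_query]
    return starts == sorted(starts)


# "one two three" ftcontains "three" ftand "one" window 3 words ordered
# "three" is query term 0 at position 3, "one" is query term 1 at position 1.
separate = [(0, 3, 3), (1, 1, 1)]

# After ftand and window, the account above leaves one match spanning 1 to 3.
merged = [(0, 1, 3)]

separate_ordered = is_ordered(separate)
merged_ordered = is_ordered(merged)
```

Checked as two separate items, the terms are out of order; checked as the single merged span, there is nothing left to be out of order with, so "ordered" succeeds.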

I believe this resolves the outstanding items in this bug.  If you agree, please close the bug.
Comment 11 Jim Melton 2009-03-20 01:01:37 UTC
Christian, I have made the suggested changes vis-à-vis items [3], [11], and [12].  I believe that every item is now completed and I have marked the bug RESOLVED/FIXED.  If you agree, please mark it CLOSED. 

Thanks again for your persistence and help with these issues!
Comment 12 Christian Gruen 2009-03-20 17:08:49 UTC
Dear Mary, 

thanks for your clarifying answer. Yes, indeed I completely ignored that positional filters are to be evaluated from left to right, as is clearly stated in 3.6.

A last comment concerning number [8]: as the existing tests assume <p> to be a paragraph delimiter, and as this definition is implementation-defined, it would make sense to add alternative results or an "inspect" attribute to these queries. As implementors probably prefer the number of "inspect" attributes to be minimized, it might be advisable to propose default settings for the test suite - but this is just a vague idea.

Christian