This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 5838 - [FTUC] Possible Inconsistencies
Summary: [FTUC] Possible Inconsistencies
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Full Text 1.0 Use Cases
Version: Candidate Recommendation
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: Pat Case
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
 
Reported: 2008-07-07 13:59 UTC by Christian Gruen
Modified: 2008-09-09 16:55 UTC

Description Christian Gruen 2008-07-07 13:59:19 UTC
Dear all,

I had a closer look at the test suite and the XQFT Use Cases and
tried to answer them with our BaseX full-text implementation. That's
when I came across some possible inconsistencies.

=== XQFT Use Cases, 16 May 2008 ===

- 2.2.7: ....[. ftcontains "improv.* the ....  testing" entire content]

The query contains the token "improv.*", but the wildcard option is
not specified here. If I add "with wildcards", I get the expected
results

- 4.2.1: ....ftcontains "improve" ftand "web" ftand "usability" with
stemming....

If I get the grammar right, the stemming option is only applied to the
last token in this example; I get the correct results if I
parenthesize all the search tokens:

 ....ftcontains ("improve" ftand "web" ftand "usability") with stemming....

- 5.2.1: Getting pedantic.. The "s" in the "solution in XQuery" header
is written in lower-case (I wrote a simple XQuery script to extract
the queries, and this one was left out)

- 16.2.9: ...for $cont := $book/content...

"for" should probably be replaced with "let" (or ":=" with "in").

- 17.2.4:    ...filter( $e/node.... / ....return filter($book....

I can parse this one if I precede the function call with the "local:" prefix.

=== XQFT Test Suite ===

Here I mainly stumbled across some minor serialization issues:

- As far as I know, new lines inside attribute values are replaced by
spaces while parsing XML documents, so I expected the attribute..

 ....url="http://www.useit.com/papers/heuristic
 /heuristic_list.html">Ten Usability....

...to yield..

 ....url="http://www.useit.com/papers/heuristic
/heuristic_list.html">....

The same applies to two other attributes:

 ....url="http://usability.gov .... /guidelines/index.html"....
 ....shortTitle="Usabilityguy Manuscript .... Guide">....

Next - another bagatelle - the attribute

 ....normalize=
 "1990/1999"....

spans two lines, whereas Saxon, Qizx, and BaseX keep it on one line:

 ....<componentDate normalize="1990/1999">1990-1999....

Last but not least, the two test-cases

 element-queries-results-q7.xq  and
 element-queries-results-q7b.xq

use the wildcard and "entire content" options from the above-mentioned
use-case query (2.2.7).


I've noticed another possible inconsistency in the Test Suite queries
and Use Cases: many examples, esp. the XPath examples, use the count()
function to check if an ftcontains operator yields results. As
ftcontains returns a boolean value, I assume that the count function
will always return 1..

 count( 'abc' ftcontains 'def' ) > 0  -> true
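
The reasoning above can be modelled outside XQuery. The following Python sketch is only an analogy and does not reproduce XQuery semantics in full: it assumes a sequence is represented as a Python list, `ftcontains_model` and `count` are hypothetical helpers, and plain substring search stands in for full-text matching.

```python
def ftcontains_model(text, phrase):
    """Hypothetical stand-in for ftcontains: returns a one-item
    sequence (a list) holding a single boolean."""
    return [phrase in text]


def count(sequence):
    """Model of a count function: the number of items in the sequence."""
    return len(sequence)


result = ftcontains_model("abc", "def")
print(result)             # the match itself is false
print(count(result))      # but the sequence still holds one item
print(count(result) > 0)  # so the comparison is true regardless
```

Under these assumptions, the comparison with 0 says nothing about whether the phrase was found, which is why the boolean itself is the more useful filter.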


That's all I found for now - thanks for listening.

Regards,
Christian
Comment 1 Pat Case 2008-07-07 14:05:58 UTC
Thanks for entering this into Bugzilla, Christian. We will take a close look and get back to you as soon as possible. Pat
Comment 2 Pat Case 2008-07-10 15:10:45 UTC
Christian,

The following changes are being made to the Full Text Use Cases. I am treating them as editorial changes, so I am not waiting for Full Text Task Force approval, but I will ask for the Task Force's review once the changes are published internally.

Corrected 2.2.7 Q7 Entire Element Content Query
Added "with wildcards" to the XQuery and XPath solutions.

Corrected 4.2.1 Q1 Query on Attribute
Moved "with stemming" to after "improve" and added parentheses around ("improve" with stemming ftand "web" ftand "usability") to make the distance operator applicable to all 3 operands in the XQuery and XPath solutions.

Corrected 5.2.1 Q1 One Character Suffix Wildcard Query
Capitalized the "S" in "Solution in XQuery".

Corrected 16.2.9 Q9 Query Using an XQuery Expression to Determine the Number of Words Allowed in a Window
Changed the 2nd "for" to "let" in the XQuery solution.

Corrected 17.2.4 Q4 Query Combining Score and XML Structure with a Conditional Return
Added the "local:" prefix to the 2 filter function calls in the XQuery solution.

The changes will appear internally to W3C Members after the next build of the Use Cases.

The changes will appear to the public in the next public release.

Thank you so much for pointing out these errors.

We will address the other issues (count > 0 and the test case issues) separately.

Pat Case, Member XQuery Full Text Task Force
Comment 3 Pat Case 2008-07-16 17:30:42 UTC
Christian,

We have eliminated unwanted whitespace in the start tags in the expected results files for the test cases in the FT Test Suite (ELEMENT through WILDCARD).

Thanks for pointing this out as well.

The count > 0 issue remains.

Pat Case, Member XQuery Full Text Task Force

Comment 4 Christian Gruen 2008-07-16 18:46:00 UTC
Pat,

thank you for the quick response. I've had another look at the test cases, and that's what I noticed:

[1] Concerning the attribute serialization, I still get different results. My apologies: some whitespace seems to have been lost in my bug report.

This is what is found in several XQFTTS results..
  <title shortTitle="Usabilityguy Manuscript Guide">John 

..and this is what I expect/get:
  <title shortTitle="Usabilityguy Manuscript        Guide">John ...

The same observation applies to the attributes containing "heuristic_list.html" and "/guidelines":

 <citation url="http://www.useit.com/papers/heuristic             /heuristic_list.html">Ten Usability
 <citation url="http://usability.gov             /guidelines/index.html"> Research-Based

This special case is due to the shredding of attribute nodes. The following
document/XQuery..

<a b='c
    d'/>

is supposed to return

<a b="c     d"/>

The best approach to fix this trivial one might be to modify the source file and remove the newline and indentation from the attributes.
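
The behaviour described in [1] can be checked with any conforming XML parser. Here is a minimal sketch in Python, assuming the standard-library `xml.etree.ElementTree` parser as a stand-in for the XQuery processors mentioned in this report; the two-line document is the one quoted above (a newline followed by four spaces inside the attribute value).

```python
import xml.etree.ElementTree as ET

# The document from the comment: the attribute value contains a newline
# followed by four spaces of indentation.
doc = "<a b='c\n    d'/>"

# Attribute-value normalization replaces the newline with a space, so the
# parsed value contains five consecutive spaces and no line break.
value = ET.fromstring(doc).get("b")
print(repr(value))
```

The newline is not dropped but turned into a space, which is why the indentation survives as a run of spaces in the serialized result.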


[2] In the XQFTTSCatalog.xml file, "Textsource" occurrences should be replaced with "Testsource"


[3] /UseCase-OTHER/other-queries-results-q1.xq still contains the old XQFT Use Case example


Feel free to ask for more,
thank you,

Christian, BaseX Team
http://www.basex.org

Comment 5 Christian Gruen 2008-07-16 18:51:09 UTC
...a last one for today..

[4] The files in "ExpectedTestResults/Examples" are to be expected in "ExpectedTestResults/Examples/2.2.2" (or, alternatively, XQFTTSCatalog.xml should be fixed the other way round)

Regards,
Christian, BaseX Team
http://www.basex.org
Comment 6 Pat Case 2008-07-17 13:14:07 UTC
Christian,

New items 1 & 3 done.

1. Eliminated whitespace in the attributes in the use cases source file, as we had done previously for the expected results files.

3. Updated /UseCase-OTHER/other-queries-results-q1.xq to the corrected XQFT Use
Case query.

Count > 0 and New items 2 & 4 are still awaiting action.

Pat Case, Member XQuery Full Text Task Force
Comment 7 Pat Case 2008-08-11 20:09:04 UTC
Christian,

New items 2 & 4 are done.

[2] In the XQFTTSCatalog.xml file, "Textsource" occurrences should be replaced
with "Testsource"

--We have made the correction to "Testsource".

[4] The files in "ExpectedTestResults/Examples" are to be expected in
"ExpectedTestResults/Examples/2.2.2" (or, alternatively, XQFTTSCatalog.xml
should be fixed the other way round)

--We have inserted a 2.2.2 sub-directory under "ExpectedTestResults/Examples" and moved the 3 existing files into it.

count > 0
--We have decided to remove count > 0 from the vast majority of XQuery and XPath solutions. We had added it as a filter, but it is no longer needed. 
--Please track our progress in Bug 5829.

Old item 4.2.1
--We have recorrected 4.2.1 Q1 Query on Attribute by removing the added parentheses, so that
("improve" with stemming ftand "web" ftand "usability")
is now
"improve" with stemming ftand "web" ftand "usability"
--By default the distance operator is applicable to any number of FTANDs, so the parentheses are superfluous and we try to only use parentheses in the use cases where they are significant.

We think we have addressed all your concerns.

If and when you agree, please close this bug. (It will take about 2 weeks to get the new solutions into the test cases, so you might want to wait to see them.)

Pat Case, Library of Congress and Member FTTF
Comment 8 Christian Gruen 2008-09-09 16:55:16 UTC
Pat,

I've finally closed this bug. A last thing I noticed (which is actually based on a little typo by myself): "Testsources" (in XQFTTSCatalog.xml) must be changed to "TestSources" to support tests on Linux systems.

Thanks,
Christian, BaseX Team
http://www.basex.org
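
The case-sensitivity point in the last comment can be illustrated with a short Python sketch. It assumes the directory is spelled "TestSources" on disk, as stated above; the helper `exists_with_exact_case` is hypothetical and not part of the test suite. Comparing against the directory listing gives the same answer on case-insensitive file systems, where a plain existence check would hide the mismatch.

```python
import os
import tempfile


def exists_with_exact_case(parent, name):
    """Return True only if `parent` has an entry spelled exactly `name`."""
    return name in os.listdir(parent)


with tempfile.TemporaryDirectory() as root:
    # Create the directory with the spelling used on disk.
    os.mkdir(os.path.join(root, "TestSources"))
    found_exact = exists_with_exact_case(root, "TestSources")
    found_wrong_case = exists_with_exact_case(root, "Testsources")

print(found_exact, found_wrong_case)
```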