[Bug 6809] New: [FT] Test Suite - Thesaurus Queries

http://www.w3.org/Bugs/Public/show_bug.cgi?id=6809

           Summary: [FT] Test Suite - Thesaurus Queries
           Product: XPath / XQuery / XSLT
           Version: Candidate Recommendation
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Full Text 1.0
        AssignedTo: jim.melton@acm.org
        ReportedBy: christian.gruen@gmail.com
         QAContact: public-qt-comments@w3.org


Dear task force,

I decided to add a basic Thesaurus implementation to BaseX to support and test
the remaining queries. I frankly admit that I'm no Thesaurus expert at all, so
I mainly focused on the hints in the specification and the existing tests. As
I'm not sure if I completely understood what's going on in the test examples,
here are some more questions/bug indications:


[1] ft-3.4.3-examples-q1

The usability.xml thesaurus file returns the synonym "tasks" for the query
input "duties", but the queried document node contains only the singular form
("task" instead of "tasks"). Is this intended?


[2] ft-3.4.3-examples-q2

The thesaurus offers the terms "navigation", "layout" and "terminology" for the
query phrase "web site components", but none of these terms occurs in the
tested document node.


[3] ft-3.4.3-examples-q3.xq

In this query, words similar to "Merrygould" are to be found. As "case
insensitive" is the default option, the term is converted to "merrygould" in
my tests, so the thesaurus lookup doesn't return any result.
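
To make the effect concrete, here is a minimal Python sketch (not BaseX code).
The thesaurus entry and the names THESAURUS, lookup_exact and lookup_folded are
hypothetical; the actual thesaurus file may be structured differently.

```python
# Hypothetical thesaurus: keys keep their original capitalization.
THESAURUS = {"Merrygould": ["Marigold"]}


def lookup_exact(term):
    """Look the term up exactly as given."""
    return THESAURUS.get(term, [])


def lookup_folded(term):
    """Compare the term and the thesaurus keys after case folding."""
    folded = term.casefold()
    return [syn
            for key, syns in THESAURUS.items()
            if key.casefold() == folded
            for syn in syns]


# Case-insensitive matching lowercases the query term before the lookup.
query = "Merrygould".lower()
print(lookup_exact(query))   # []
print(lookup_folded(query))  # ['Marigold']
```

If the thesaurus lookup is meant to honor the case option, folding both sides
as in lookup_folded would be one way to get a result here.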


[4] Probably a naïve question: do all thesaurus entries work in a
"bidirectional" way? I.e., if "A" is a synonym for "B", do I get "A" if I look
for "B", and "B" if I look for "A"? Beyond that, are all synonyms
bidirectional? One could argue that "Marigold" sounds like "Merrygould", but
"Merrygould" doesn't sound like "Marigold". In the latter case, query [3]
above would only return results in the direction opposite to the current one.
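
The two readings can be sketched in a few lines of Python. This is only an
illustration: the entry, the relationship name "sounds like" and the function
expand are assumptions, not taken from the test suite.

```python
# Hypothetical one-way entry: "Marigold" lists "Merrygould" as sounding alike.
ENTRIES = [("Marigold", "sounds like", "Merrygould")]


def expand(term, relationship, bidirectional):
    """Return the terms related to `term` under `relationship`."""
    result = []
    for source, rel, target in ENTRIES:
        if rel != relationship:
            continue
        if source == term:
            # Forward direction: the entry is stored under the query term.
            result.append(target)
        elif bidirectional and target == term:
            # Reverse direction: only followed in the bidirectional reading.
            result.append(source)
    return result


print(expand("Merrygould", "sounds like", bidirectional=False))  # []
print(expand("Merrygould", "sounds like", bidirectional=True))   # ['Marigold']
```

With a one-way reading, the lookup for "Merrygould" stays empty unless the
entry is stored in that direction.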


[5] ft-3.4.3-expressions-q3

The thesaurus returns "software" for the term "program"; this term seems to be
included in two books (number 1 and 3), but the current result contains only
book 1.


[6] ft-3.4.3-expressions-q5

This query references the missing file "TechnicalThesaurus.xml".


[7] ft-3.4.3-expressions-q6

Parentheses are missing before "default" and after "NT". I guess that the
thesaurus should also accept the original query terms and not only synonyms; is
this correct? If so, book number 3 should be added to the result, as it
contains the term "Computers".
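
The difference between the two interpretations, sketched in Python. The
narrower-term entries and the sample book text are made up for illustration
and do not come from the test suite files.

```python
# Hypothetical narrower-term entries for the query term.
NARROWER = {"computers": ["software", "hardware"]}


def search_terms(term, include_original):
    """Build the list of terms to search for after thesaurus expansion."""
    expanded = list(NARROWER.get(term, []))
    return ([term] if include_original else []) + expanded


def matches(text, terms):
    """True if any term occurs as a (lowercased) word in the text."""
    words = text.lower().split()
    return any(term in words for term in terms)


book3 = "An introduction to Computers"  # placeholder text, not the real book

print(matches(book3, search_terms("computers", include_original=False)))  # False
print(matches(book3, search_terms("computers", include_original=True)))   # True
```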


[8] thesaurus-queries-results-q2 / q2b

As the relationship used here is "narrower terms" (instead of "NT" or "narrower
term"): do you expect implementations to recognize all spelling variants of a
relationship?


[9] thesaurus-queries-results-q5 / q5b / q6 / q6b

"spellcheck.xml" and "OurTaxonomy.xml" don't exist yet.


[10] full-text-composability-queries-results-q2b

Parsing issue: "]" missing after "stemming"


[11] full-text-composability-queries-results-q3 / q3b

Parsing issue: some opening and closing parentheses are missing.



I'm currently running the Thesaurus as the last match option, as I saw that the
execution order of match options seems to be implementation defined. It may
well be that different orders produce different results, but I haven't really
thought this through.
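
A minimal Python sketch of how the order might matter, assuming stemming is
active alongside the thesaurus. The toy stemmer, the single thesaurus entry
and all function names are hypothetical and only serve as an illustration.

```python
# Hypothetical entry: the thesaurus maps "duties" to the plural "tasks".
THESAURUS = {"duties": ["tasks"]}


def stem(word):
    """Toy stemmer: strips one trailing "s". Real stemmers are more elaborate."""
    return word[:-1] if word.endswith("s") else word


def apply_thesaurus(words):
    """Keep each word and append its thesaurus expansions, if any."""
    out = []
    for word in words:
        out.append(word)
        out.extend(THESAURUS.get(word, []))
    return out


def apply_stemming(words):
    return [stem(word) for word in words]


document = {"task"}  # the document contains only the singular form

thesaurus_last = apply_thesaurus(apply_stemming(["duties"]))
thesaurus_first = apply_stemming(apply_thesaurus(["duties"]))

print(thesaurus_last)                         # ['dutie']
print(thesaurus_first)                        # ['dutie', 'task']
print(bool(document & set(thesaurus_last)))   # False
print(bool(document & set(thesaurus_first)))  # True
```

In this toy setup, running the thesaurus last misses the document, because the
stemmed query term no longer matches the thesaurus key.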

In conclusion, as I indicated in the beginning, my knowledge of thesauri is
very limited. So it might be helpful to talk directly to one of you in the near
future to get more insight into some of the open issues.

Christian



Received on Tuesday, 14 April 2009 01:43:41 UTC