<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>6809</bug_id>
          
          <creation_ts>2009-04-14 01:43:32 +0000</creation_ts>
          <short_desc>[FT] Test Suite - Thesaurus Queries</short_desc>
          <delta_ts>2009-05-04 12:43:44 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XPath / XQuery / XSLT</product>
          <component>Full Text 1.0</component>
          <version>Candidate Recommendation</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Christian Gruen">christian.gruen</reporter>
          <assigned_to name="Jim Melton">jim.melton</assigned_to>
          <cc>jmdyck</cc>
          <cc>pcase</cc>
          
          <qa_contact name="Mailing list for public feedback on specs from XSL and XML Query WGs">public-qt-comments</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>24710</commentid>
    <comment_count>0</comment_count>
    <who name="Christian Gruen">christian.gruen</who>
    <bug_when>2009-04-14 01:43:32 +0000</bug_when>
    <thetext>Dear task force,

I decided to add a basic Thesaurus implementation to BaseX to support and test the remaining queries. I frankly admit that I&apos;m no Thesaurus expert at all, so I mainly focused on the hints in the specification and the existing tests. As I&apos;m not sure if I completely understood what&apos;s going on in the test examples, here are some more questions/bug indications:


[1] ft-3.4.3-examples-q1

The usability.xml thesaurus file returns the synonym &quot;tasks&quot; for the query input &quot;duties&quot; - but the queried document node includes only the word in singular (&quot;task&quot; instead of &quot;tasks&quot;). Is this intended?


[2] ft-3.4.3-examples-q2

The thesaurus offers the terms &quot;navigation&quot;, &quot;layout&quot; and &quot;terminology&quot; for the query phrase &quot;web site components&quot;, but none of these terms is included in the tested document node.


[3] ft-3.4.3-examples-q3.xq

In this query, words similar to &quot;Merrygould&quot; are to be found. As &quot;case insensitive&quot; is the default option, the term is converted to &quot;merrygould&quot; in my tests - so the thesaurus doesn&apos;t return any result.


[4] Probably a naïve question: do all thesaurus entries work in a &quot;bidirectional&quot; way? I.e., if &quot;A&quot; is a synonym for &quot;B&quot;, do I get &quot;A&quot; if I look for &quot;B&quot;, and &quot;B&quot; if I look for &quot;A&quot;? Beyond that, are all synonyms bidirectional? One could argue that &quot;Marigold&quot; sounds like &quot;Merrygould&quot;, but &quot;Merrygould&quot; doesn&apos;t sound like &quot;Marigold&quot;. In the latter case, query [3] above would only return results in the direction opposite to the current one.


[5] ft-3.4.3-expressions-q3

The thesaurus returns &quot;software&quot; for the term &quot;program&quot;; this term seems to be included in two books (number 1 and 3), but the current result contains only book 1.


[6] ft-3.4.3-expressions-q5

..references the missing file &quot;TechnicalThesaurus.xml&quot;.


[7] ft-3.4.3-expressions-q6
	
parentheses missing before &quot;default&quot; and after &quot;NT&quot;. I guess that the Thesaurus should also accept the original query terms and not only synonyms; is this correct? If &quot;yes&quot;, then book number 3 should be added as result, as it contains the term &quot;Computers&quot;.


[8] thesaurus-queries-results-q2 / q2b

As the used relationship is &quot;narrower terms&quot; here (instead of &quot;NT&quot; or &quot;narrower term&quot;) - do you expect implementations to recognize all kinds of writings, or ?


[9] thesaurus-queries-results-q5 / q5b / q6 / q6b

&quot;spellcheck.xml&quot; and &quot;OurTaxonomy.xml&quot; don&apos;t exist yet.


[10] full-text-composability-queries-results-q2b

Parsing issue: &quot;]&quot; missing after &quot;stemming&quot;


[11] full-text-composability-queries-results-q3 / q3b

Parsing issue: some opening and closing parentheses are missing.



I&apos;m currently running the Thesaurus as the last match option, as I saw that the execution order of match options seems to be implementation defined. It may well be that different orders could result in different results - but I haven&apos;t really thought this through.
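
To make the ordering concern concrete, here is a minimal sketch in Python (hypothetical helper names, not BaseX code). It assumes a thesaurus whose keys are matched by exact spelling, and shows that applying case folding before the thesaurus lookup loses the entry for &quot;Merrygould&quot;, while the reverse order keeps it:

```python
# Hypothetical sketch: a one-entry thesaurus and two possible orders
# of applying the "case insensitive" and "thesaurus" match options.
THESAURUS = {"Merrygould": ["Marigold"]}


def expand(term):
    """Return the term plus any thesaurus terms stored under its exact spelling."""
    return [term] + THESAURUS.get(term, [])


def case_then_thesaurus(term):
    # Case folding first: the lower-cased key no longer matches the entry.
    return expand(term.lower())


def thesaurus_then_case(term):
    # Thesaurus first: expand the original spelling, then fold all results.
    return [t.lower() for t in expand(term)]


print(case_then_thesaurus("Merrygould"))
print(thesaurus_then_case("Merrygould"))
```

Whether the thesaurus keys themselves should be case-folded is a separate design choice that this sketch deliberately leaves out.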

Concluding, as I indicated in the beginning, my knowledge of thesauri is very limited. So maybe it will be helpful to talk directly to one of you in the near future to get more insight into some of the open issues..

Christian</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>24732</commentid>
    <comment_count>1</comment_count>
    <who name="Christian Gruen">christian.gruen</who>
    <bug_when>2009-04-15 04:15:49 +0000</bug_when>
    <thetext>A little update: please forget my question [4] concerning synonyms. All recommended relationships from the specification are now implemented in a bidirectional way (except &quot;TT&quot;); if unknown relationships such as &quot;sounds like&quot; are encountered, they are stored in a unidirectionally. This is why I guess that most implementations would probably benefit if the thesaurus file &quot;soundex.xml&quot; was rewritten from..

  &lt;entry&gt;
    &lt;term&gt;Marigold&lt;/term&gt;
    &lt;synonym&gt;
      &lt;term&gt;Merrygould&lt;/term&gt;
      &lt;relationship&gt;sounds like&lt;/relationship&gt;
    &lt;/synonym&gt;
  &lt;/entry&gt;

..to..

  &lt;entry&gt;
    &lt;term&gt;Merrygould&lt;/term&gt;
    &lt;synonym&gt;
      &lt;term&gt;Marigold&lt;/term&gt;
      &lt;relationship&gt;sounds like&lt;/relationship&gt;
    &lt;/synonym&gt;
  &lt;/entry&gt;
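
The storage rule described above can be sketched as follows. This is a minimal Python illustration, not BaseX code: the function names, the tuple layout and the table of inverse relationships are assumptions made for this example only.

```python
# Hypothetical sketch: known relationships are indexed in both
# directions, unknown ones only in the direction given in the file.
from collections import defaultdict

# Assumed inverse pairs for some of the recommended relationships.
INVERSE = {"NT": "BT", "BT": "NT", "RT": "RT", "UF": "USE", "USE": "UF"}


def build_index(entries):
    """Map (term, relationship) to the set of related terms."""
    index = defaultdict(set)
    for term, related, rel in entries:
        index[(term, rel)].add(related)
        # Known relationships are also stored in the inverse direction.
        if rel in INVERSE:
            index[(related, INVERSE[rel])].add(term)
    return index


def lookup(index, term, rel):
    """Return the sorted terms linked to `term` by relationship `rel`."""
    return sorted(index.get((term, rel), set()))


# Each tuple is (entry term, synonym term, relationship).
entries = [
    ("people", "users", "NT"),
    ("Marigold", "Merrygould", "sounds like"),
]
index = build_index(entries)
print(lookup(index, "Marigold", "sounds like"))
print(lookup(index, "Merrygould", "sounds like"))
```

With the entry as it currently stands, only the lookup for &quot;Marigold&quot; finds a related term; swapping the two terms in the entry would make the lookup for &quot;Merrygould&quot; succeed instead.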

I&apos;m still interested to hear your opinion about the remaining topics!
Christian
</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>24749</commentid>
    <comment_count>2</comment_count>
    <who name="Michael Dyck">jmdyck</who>
    <bug_when>2009-04-15 18:51:03 +0000</bug_when>
    <thetext>(In reply to comment #0)
&gt;
&gt; [10] full-text-composability-queries-results-q2b
&gt; 
&gt; Parsing issue: &quot;]&quot; missing after &quot;stemming&quot;

Fixed!</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>24750</commentid>
    <comment_count>3</comment_count>
    <who name="Michael Dyck">jmdyck</who>
    <bug_when>2009-04-15 20:09:38 +0000</bug_when>
    <thetext>(In reply to comment #0)
&gt; 
&gt; [11] full-text-composability-queries-results-q3 / q3b
&gt; 
&gt; Parsing issue: some opening and closing parentheses are missing.

Fixed!</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>24779</commentid>
    <comment_count>4</comment_count>
    <who name="Pat Case">pcase</who>
    <bug_when>2009-04-16 17:03:31 +0000</bug_when>
    <thetext>Hi Christian.

Some responses follow:

[1] ft-3.4.3-examples-q1

The usability.xml thesaurus file returns the synonym &quot;tasks&quot; for the query
input &quot;duties&quot; - but the queried document node includes only the word in
singular (&quot;task&quot; instead of &quot;tasks&quot;). Is this intended?

--Yes. I didn&apos;t notice that the example in the language document was wrong when I copied it into the test suite. I have changed duties to duty in the query, added duty and task as synonyms in the thesaurus, and updated the description in the catalog to fix it in the test suite. I corrected the query and the description in the language document. I did not build the language document.


[2] ft-3.4.3-examples-q2

The thesaurus offers the terms &quot;navigation&quot;, &quot;layout&quot; and &quot;terminology&quot; for the
query phrase &quot;web site components&quot;, but none of these terms is included in
the tested document node.

--Sigh. This example does not work against the sample document in the language document. It works against the sample document in the use cases. So I reworked it to work against the sample document in the language document. I searched on people and set users up as an NT for people in the usability thesaurus. Fixed in both the test suite and the language document.


[3] ft-3.4.3-examples-q3.xq

In this query, words similar to &quot;Merrygould&quot; are to be found. As &quot;case
insensitive&quot; is the default option, the term is converted to &quot;merrygould&quot; in
my tests - so the thesaurus doesn&apos;t return any result.

--Realizing now that case insensitive does not mean lower case, we search on &quot;Merrygould&quot;. &quot;Merrygould&quot; and &quot;Marigold&quot; are in the thesaurus. &quot;Marigold&quot; is found in the sample document, so I am at a loss as to why we are talking about case at all in this query. I don&apos;t understand how Mary&apos;s comments apply to this one. I have made no changes.


[5] ft-3.4.3-expressions-q3

The thesaurus returns &quot;software&quot; for the term &quot;program&quot;; this term seems to be
included in two books (number 1 and 3), but the current result contains only
book 1.

--So true. I added Bk 3 to the result.


[6] ft-3.4.3-expressions-q5

..references the missing file &quot;TechnicalThesaurus.xml&quot;.

--My bad again. I corrected the thesaurus name to UsabilityThesaurus.xml.


[7] ft-3.4.3-expressions-q6

parentheses missing before &quot;default&quot; and after &quot;NT&quot;. I guess that the Thesaurus
should also accept the original query terms and not only synonyms; is this
correct? If &quot;yes&quot;, then book number 3 should be added as result, as it contains
the term &quot;Computers&quot;.

--Yes. Added the parentheses to the query. Added Bk 3 to the results. I also changed the operator in the query from ftor to ftand, otherwise since program is nowhere in the sample document, there would be no result at all.


[8] thesaurus-queries-results-q2 / q2b

As the used relationship is &quot;narrower terms&quot; here (instead of &quot;NT&quot; or &quot;narrower
term&quot;) - do you expect implementations to recognize all kinds of writings, or ?

--Ouch. That probably was a bit rude of me. I have duplicated the entry in the thesaurus and made the relationships in the second copy &quot;narrower terms&quot;, so that no translation from NT to narrower terms is required. 

[9] thesaurus-queries-results-q5 / q5b / q6 / q6b

&quot;spellcheck.xml&quot; and &quot;OurTaxonomy.xml&quot; don&apos;t exist yet.

--I added the 2 thesauri.

Again, many thanks Christian for pointing these out so I could correct them.

Please let me know what you think and if you think these responses combined with Michael D&apos;s are adequate, please close the bug.

Pat Case
</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>24781</commentid>
    <comment_count>5</comment_count>
    <who name="Christian Gruen">christian.gruen</who>
    <bug_when>2009-04-16 18:33:04 +0000</bug_when>
    <thetext>Hi Pat,

you are welcome. I checked the queries once more. Before closing this bug, it would be great if you could have another look at the following issues:


[3] ft-3.4.3-examples-q3.xq

It&apos;s good to hear your opinion on this query, as I surely had quite an implementation-centered approach in my mind here. As I feel that this issue is more complicated than I first thought, I&apos;ll add an extra &quot;bug&quot; to discuss the relationship between Thesaurus and match options.

Considering the relationship between &quot;Merrygould&quot; and &quot;Marigold&quot;, I would indeed expect the &quot;soundex.xml&quot; file to be modified. This was my suggestion..

OLD:
  &lt;term&gt;Marigold&lt;/term&gt;
  &lt;synonym&gt;
    &lt;term&gt;Merrygould&lt;/term&gt;
    &lt;relationship&gt;sounds like&lt;/relationship&gt;
  &lt;/synonym&gt;

NEW:
  &lt;term&gt;Merrygould&lt;/term&gt;
  &lt;synonym&gt;
    &lt;term&gt;Marigold&lt;/term&gt;
    &lt;relationship&gt;sounds like&lt;/relationship&gt;
  &lt;/synonym&gt;

If I process a thesaurus request, I look up the input word (Merrygould) and return all words that are linked with the &quot;sounds like&quot; relationship to this term. I have no access to the complete ISO 2788 standard, but, as far as I know, the &quot;sounds like&quot; relationship is not defined there. So an XQuery implementation has to &quot;guess&quot; how an &quot;unknown&quot; relationship like this one works. I treat all undefined relationships as unidirectional, i.e. I will currently return &quot;Merrygould&quot; for the input term &quot;Marigold&quot; - but not the other way round. If the XML file is modified as proposed above, the relationship can be answered consistently with the other thesaurus examples.

If you have a different opinion or think I&apos;m wrong, don&apos;t hesitate to tell me.


[5] ft-3.4.3-expressions-q3

Now, result should be defined as &quot;Fragment&quot; in XQFTCatalog.xml..


[9] thesaurus-queries-results-q5 / q5b / q6 / q6b

Different spellings: &quot;misspelling-of&quot; vs &quot;misspelling of&quot;..


Christian

</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>24784</commentid>
    <comment_count>6</comment_count>
    <who name="Pat Case">pcase</who>
    <bug_when>2009-04-16 19:27:28 +0000</bug_when>
    <thetext>Christian,

[3] ft-3.4.3-examples-q3.xq

It&apos;s good to hear your opinion on this query, as I surely had quite an
implementation-centered approach in my mind here. As I feel that this issue is
more complicated than I first thought, I&apos;ll add an extra &quot;bug&quot; to discuss the
relationship between Thesaurus and match options.

Considering the relationship between &quot;Merrygould&quot; and &quot;Marigold&quot;, I would
indeed expect the &quot;soundex.xml&quot; file to be modified. This was my suggestion..

OLD:
  &lt;term&gt;Marigold&lt;/term&gt;
  &lt;synonym&gt;
    &lt;term&gt;Merrygould&lt;/term&gt;
    &lt;relationship&gt;sounds like&lt;/relationship&gt;
  &lt;/synonym&gt;

NEW:
  &lt;term&gt;Merrygould&lt;/term&gt;
  &lt;synonym&gt;
    &lt;term&gt;Marigold&lt;/term&gt;
    &lt;relationship&gt;sounds like&lt;/relationship&gt;
  &lt;/synonym&gt;

If I process a thesaurus request, I look up the input word (Merrygould) and
return all words that are linked with the &quot;sounds like&quot; relationship to this
term. I have no access to the complete ISO 2788 standard, but, as far as I
know, the &quot;sounds like&quot; relationship is not defined there. So an XQuery
implementation has to &quot;guess&quot; how an &quot;unknown&quot; relationship like this one works.
I treat all undefined relationships as unidirectional, i.e. I will currently
return &quot;Merrygould&quot; for the input term &quot;Marigold&quot; - but not the other way
round. If the XML file is modified as proposed above, the relationship can
be consistently answered like the other thesaurus examples.

If you have a different opinion or think I&apos;m wrong, don&apos;t hesitate to tell me.

--I see &quot;sounds like&quot; as a two-way equivalency similar to synonym, but I don&apos;t claim to know how the thesaurus should be structured either. So to get this solved, I have put both entries in the thesaurus. Hope that is OK.


[5] ft-3.4.3-expressions-q3

Now, result should be defined as &quot;Fragment&quot; in XQFTCatalog.xml..

--Done. 

[9] thesaurus-queries-results-q5 / q5b / q6 / q6b

Different spellings: &quot;misspelling-of&quot; vs &quot;misspelling of&quot;..

--I looked in the queries, the spellcheck thesaurus, and the use cases and don&apos;t see any hyphenated versions. Where are you looking?

Pat</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>24786</commentid>
    <comment_count>7</comment_count>
    <who name="Christian Gruen">christian.gruen</who>
    <bug_when>2009-04-16 20:29:29 +0000</bug_when>
    <thetext>..continued..


[3] ft-3.4.3-examples-q3.xq

--I see &quot;sounds like&quot; as a two-way equivalency similar to synonym, but I don&apos;t
claim to know how the thesaurus should be structured either. So to get this
solved, I have put both entries in the thesaurus. Hope that is OK.

Yes, this is fine as well. A minor issue: it should be rewritten from 

    &lt;term&gt;Merrygould&lt;/term&gt;
    &lt;synonym&gt;
      &lt;term&gt;&lt;/term&gt;
      &lt;relationship&gt;Marigold&lt;/relationship&gt;
    &lt;/synonym&gt;

...to...

    &lt;term&gt;Merrygould&lt;/term&gt;
    &lt;synonym&gt;
      &lt;term&gt;Marigold&lt;/term&gt;
      &lt;relationship&gt;sounds like&lt;/relationship&gt;
    &lt;/synonym&gt;


[9] thesaurus-queries-results-q5 / q5b / q6 / q6b

--I looked in the queries, the spellcheck thesaurus, and the use cases and
don&apos;t see any hyphenated versions. Where are you looking?

Sorry, I mixed this one up. The hyphenated version appears in &quot;usability.xml&quot;, but no query uses it.

Instead, I would suggest extending the &quot;spellcheck.xml&quot; file in the same way as the &quot;soundex.xml&quot; file; otherwise the logic of this thesaurus is the opposite of the other ones. An example..


a) &quot;users&quot; ftcontains &quot;people&quot; with thesaurus at &quot;usability.xml&quot;
     relationship &quot;NT&quot;

  &lt;term&gt;people&lt;/term&gt;
  &lt;synonym&gt;
    &lt;term&gt;users&lt;/term&gt;
    &lt;relationship&gt;NT&lt;/relationship&gt;
  &lt;/synonym&gt;

  -&gt; true


b) &quot;succesful&quot; ftcontains &quot;sucessfull&quot; with thesaurus at &quot;spellcheck.xml&quot;
     relationship &quot;misspelling of&quot;

  &lt;term&gt;successful&lt;/term&gt;
  &lt;synonym&gt;
    &lt;term&gt;sucessfull&lt;/term&gt;
    &lt;relationship&gt;misspelling of&lt;/relationship&gt;
  &lt;/synonym&gt;

  -&gt; false...


Christian

</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>25001</commentid>
    <comment_count>8</comment_count>
    <who name="Pat Case">pcase</who>
    <bug_when>2009-05-04 12:16:50 +0000</bug_when>
    <thetext>Christian,

[3] ft-3.4.3-examples-q3.xq

Yes, this is fine as well. A minor issue: it should be rewritten from 

    &lt;term&gt;Merrygould&lt;/term&gt;
    &lt;synonym&gt;
      &lt;term&gt;&lt;/term&gt;
      &lt;relationship&gt;Marigold&lt;/relationship&gt;
    &lt;/synonym&gt;

...to...

    &lt;term&gt;Merrygould&lt;/term&gt;
    &lt;synonym&gt;
      &lt;term&gt;Marigold&lt;/term&gt;
      &lt;relationship&gt;sounds like&lt;/relationship&gt;
    &lt;/synonym&gt;

--Done.

[9] thesaurus-queries-results-q5 / q5b / q6 / q6b

Instead, I would suggest extending the &quot;spellcheck.xml&quot; file in the same way as the
&quot;soundex.xml&quot; file; otherwise the logic of this thesaurus is the opposite of the
other ones. An example..

a) &quot;users&quot; ftcontains &quot;people&quot; with thesaurus at &quot;usability.xml&quot;
     relationship &quot;NT&quot;

  &lt;term&gt;people&lt;/term&gt;
  &lt;synonym&gt;
    &lt;term&gt;users&lt;/term&gt;
    &lt;relationship&gt;NT&lt;/relationship&gt;
  &lt;/synonym&gt;

  -&gt; true


b) &quot;succesful&quot; ftcontains &quot;sucessfull&quot; with thesaurus at &quot;spellcheck.xml&quot;
     relationship &quot;misspelling of&quot;

  &lt;term&gt;successful&lt;/term&gt;
  &lt;synonym&gt;
    &lt;term&gt;sucessfull&lt;/term&gt;
    &lt;relationship&gt;misspelling of&lt;/relationship&gt;
  &lt;/synonym&gt;

  -&gt; false...


--Done.

Pat Case</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>25002</commentid>
    <comment_count>9</comment_count>
    <who name="Christian Gruen">christian.gruen</who>
    <bug_when>2009-05-04 12:43:44 +0000</bug_when>
    <thetext>Thanks; as far as I can see, all todos are fixed, so I closed this one.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>