[Bug 11444] New: [FT] FTThesaurusOption "levels" default should be implementation-defined

http://www.w3.org/Bugs/Public/show_bug.cgi?id=11444

           Summary: [FT] FTThesaurusOption "levels" default should be
                    implementation-defined
           Product: XPath / XQuery / XSLT
           Version: Candidate Recommendation
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Full Text 1.0
        AssignedTo: jim.melton@acm.org
        ReportedBy: paul@lucasmail.org
         QAContact: public-qt-comments@w3.org


The spec section 3.4.3 says in part:

> FTThesaurusID specifies the relationship sought between tokens and phrases written in the query and terms in the thesaurus and the number of levels to be queried in hierarchical relationships by including an FTRange "levels". If no levels are specified, the default is to query all levels in hierarchical relationships.

The problem with defaulting to "all levels" is that it makes queries too broad.
For example, if using the WordNet data for the word "canary":

$ wn canary -n1 -hypen

Synonyms/Hypernyms (Ordered by Estimated Frequency) of noun canary

Sense 1
fink, snitch, snitcher, stoolpigeon, stool pigeon, stoolie, sneak, sneaker,
canary
       => informer, betrayer, rat, squealer, blabber
           => informant, source
               => communicator
                   => person, individual, someone, somebody, mortal, soul
                       => organism, being
                           => living thing, animate thing
                               => whole, unit
                                   => object, physical object
                                       => physical entity
                                           => entity
                       => causal agent, cause, causal agency
                           => physical entity
                               => entity

then every query, e.g.:

    .//book/content contains text "canary" using
    thesaurus at "http://wordnet.princeton.edu"

would return true if the content contains any of the words "whole", "object",
"entity", etc. That is, IMHO, not a useful result and most likely not what the
user wants, because those words are so far removed from "canary".
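To make the difference concrete, here is a rough Python sketch. It is purely illustrative and is not how any particular XQuery Full Text implementation works. It models the hypernym chain above as a hypothetical `BROADER` map, hand-transcribed from the `wn` output and keeping only the first word of each synset. A hypothetical `expand()` function follows broader-term links either a bounded number of times or without limit ("all levels").

```python
# Hypothetical model of the WordNet hypernym chain for sense 1 of "canary",
# transcribed by hand from the wn output above (first word of each synset only).
# Each key maps a term to its immediate broader terms (hypernyms).
BROADER = {
    "canary": ["informer"],
    "informer": ["informant"],
    "informant": ["communicator"],
    "communicator": ["person"],
    "person": ["organism", "causal agent"],
    "organism": ["living thing"],
    "living thing": ["whole"],
    "whole": ["object"],
    "object": ["physical entity"],
    "causal agent": ["physical entity"],
    "physical entity": ["entity"],
    "entity": [],
}


def expand(term, max_levels=None):
    """Return every term reachable from `term` by following broader-term links
    at most `max_levels` times; None means no limit (i.e. "all levels")."""
    result = {term}
    frontier = [term]
    level = 0
    # Breadth-first walk, one hierarchy level per iteration.
    while frontier and (max_levels is None or level < max_levels):
        next_frontier = []
        for t in frontier:
            for broader in BROADER.get(t, []):
                if broader not in result:
                    result.add(broader)
                    next_frontier.append(broader)
        frontier = next_frontier
        level += 1
    return result


print(sorted(expand("canary", 1)))  # ['canary', 'informer']
print("entity" in expand("canary"))  # True
```

With a limit of one level only "informer" is added, whereas the unbounded expansion pulls in every term in the chain; in this model "entity" first appears at level 7.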

My suggestion is to change the last sentence of the cited paragraph from the
spec to read:

> If no levels are specified, the default number of levels to query in hierarchical relationships is implementation-defined.

- Paul


Received on Wednesday, 1 December 2010 04:03:41 UTC