This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 11738 - [FT] thesaurus.xsd wrong (again) or usability.xml wrong
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Full Text 1.0
Version: Candidate Recommendation
Hardware: All All
Importance: P2 blocker
Target Milestone: ---
Assignee: Jim Melton
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2011-01-11 21:16 UTC by Paul J. Lucas
Modified: 2011-01-24 19:24 UTC

Description Paul J. Lucas 2011-01-11 21:16:38 UTC
The thesaurus.xsd file that is part of the Full Text Test Suite specifies the schema for a "testing" thesaurus. It says that a synonym has exactly 1 term and one or more relationships.  It does *not* say that a synonym can have more synonyms.

However, the file usability.xml has:

  <entry>
    <term>infrastructure</term>
    <synonym>
      <term>networks</term>
      <synonym>
        <term>Web</term>
        <relationship>NT</relationship>
      </synonym>
      <relationship>NT</relationship>
    </synonym>
  </entry>

i.e., the synonym "networks" itself has a synonym, "Web".  I assume the intention was to codify both a hierarchical relationship and "levels", which implies that the schema is wrong.
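
A minimal sketch of how the nesting described above could be detected mechanically, assuming the thesaurus file uses un-namespaced elements exactly as in the quoted fragment (the real usability.xml may differ). The names SAMPLE and nested_synonyms are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# Fragment quoted in this report (assumed to have no XML namespace).
SAMPLE = """
<entry>
  <term>infrastructure</term>
  <synonym>
    <term>networks</term>
    <synonym>
      <term>Web</term>
      <relationship>NT</relationship>
    </synonym>
    <relationship>NT</relationship>
  </synonym>
</entry>
"""

def nested_synonyms(xml_text):
    """Return (outer term, inner term) for each synonym nested directly in another."""
    root = ET.fromstring(xml_text)
    pairs = []
    for outer in root.iter("synonym"):
        # findall with a bare tag name matches direct children only.
        for inner in outer.findall("synonym"):
            pairs.append((outer.findtext("term"), inner.findtext("term")))
    return pairs

print(nested_synonyms(SAMPLE))
```

An empty result would mean the data contains no nested synonyms, which is the shape the schema as described here appears to require.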

Another oddity in the schema is that an entry can have *zero* terms.  You'd think an entry would have exactly one term, but that's not what the schema says.
Comment 1 Paul J. Lucas 2011-01-21 17:41:00 UTC
Importance changed to "blocker" since it is not possible either to write code for, or to run, any of the full-text thesaurus tests that do not use the default thesaurus.  The test-suite-provided thesaurus usability.xml file doesn't validate against the schema, so all thesaurus tests that use usability.xml fail.

Incidentally, I would also like to change the "Priority" of this bug to give it a "higher" priority, but the explanation of what "Priority" means:

> This field describes the importance and order in which a bug should be fixed compared to other bugs.
> This field is utilized by the programmers/engineers to prioritize their work to be done.

doesn't specify which way is a higher priority: is a P1 bug the highest priority, or would that be a P5 bug?
Comment 2 Mary Holstege 2011-01-24 18:37:06 UTC
The file usability.xml is indeed in error. It should not have nested synonyms.
I have fixed this error.  If you are satisfied with this resolution, please mark the bug as CLOSED.
Comment 3 Paul J. Lucas 2011-01-24 18:59:27 UTC
So, just to clarify, if usability.xml has no nested synonyms, then as far as I can tell, none of the thesaurus XML data files provided in the test suite actually have "levels" of synonyms in them.  There are several tests in the test suite that have "levels" clauses in them.  If there are no levels in the thesauri data, then what, exactly, are the "levels" in the queries testing?

What I would have expected would be the existence of thesauri data that has levels like:

kitten
=> cat
==> feline

and for a query to limit the levels to, say, "exactly 1 level" while looking for "feline" -- the "contains text" should then return "false", since "feline" is too many levels away to be considered.
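
The expectation above can be sketched as follows. This is only an illustration of the behaviour the comment describes, not the Full Text specification's normative semantics; the THESAURUS mapping and the related_within helper are hypothetical, and the level limit is modelled simply as a maximum number of steps.

```python
# Hypothetical thesaurus: each term maps to its directly related (level-1) terms.
THESAURUS = {
    "kitten": ["cat"],
    "cat": ["feline"],
}

def related_within(thesaurus, term, max_levels):
    """Collect the terms reachable from `term` in at most `max_levels` steps."""
    found = set()
    frontier = {term}
    for _ in range(max_levels):
        # Advance one level, skipping the start term and anything already seen.
        frontier = {nxt for t in frontier for nxt in thesaurus.get(t, [])} - found - {term}
        found |= frontier
    return found

# With a limit of one level, "feline" is out of reach from "kitten".
print("feline" in related_within(THESAURUS, "kitten", 1))
# With two levels it becomes reachable.
print("feline" in related_within(THESAURUS, "kitten", 2))
```

A test of a "levels" clause is only meaningful if the thesaurus data contains at least one chain longer than the limit, as in this kitten/cat/feline example.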

Aside: again, the reason this bug is a "blocker" is because, although the W3C may consider these data "informational," in order to pass the tests, an implementation should use the exact thesaurus data specified.  In order to use the exact data specified, an implementation has to read in the thesaurus XML data into an internal data-structure according to a schema.  A schema that allows nested synonyms results in a different internal data structure (and algorithm) from one that doesn't.  If either the XML file or the schema is broken, this task is impossible.
Comment 4 Mary Holstege 2011-01-24 19:24:49 UTC
If "infrastructure" has a synonym of "networks" and "networks" has a synonym of "Web", then "networks" is linked at level 1 to "infrastructure" and "Web" is linked at level 2 to "infrastructure".  It is really all a question about how deep a graph search you do along the various labelled (relationship) arcs.
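
A rough sketch of the graph-search reading described in this comment, assuming the thesaurus has been loaded into a flat list of labelled arcs with no nested synonyms. The ARCS data is hypothetical (the "RT" arc and the term "connectivity" are invented for illustration), as is the levels_from helper.

```python
from collections import deque

# Hypothetical flat arcs: (term, relationship, related term).
ARCS = [
    ("infrastructure", "NT", "networks"),
    ("networks", "NT", "Web"),
    ("networks", "RT", "connectivity"),  # invented arc with a different label
]

def levels_from(arcs, start, relationship):
    """Breadth-first search along arcs carrying `relationship`; returns {term: level}."""
    levels = {start: 0}
    queue = deque([start])
    while queue:
        term = queue.popleft()
        for src, rel, dst in arcs:
            if src == term and rel == relationship and dst not in levels:
                levels[dst] = levels[term] + 1
                queue.append(dst)
    del levels[start]  # report only the related terms, not the start term
    return levels

print(levels_from(ARCS, "infrastructure", "NT"))
```

Under this reading, "levels" come from chaining arcs across entries, so they do not require nesting in the XML itself; a levels limit just bounds how deep the search goes.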