This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 4454 - [FT] match options order should be implementation-defined
Summary: [FT] match options order should be implementation-defined
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Full Text 1.0
Version: Working drafts
Hardware: All (OS: All)
Importance: P2 normal
Target Milestone: ---
Assignee: Jim Melton
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-04-09 12:02 UTC by Jochen Doerre
Modified: 2007-04-19 15:52 UTC
CC List: 0 users

See Also:



Description Jochen Doerre 2007-04-09 12:02:23 UTC
From an implementation point of view there are two kinds of match options, and the kind determines how an option can practically be applied: (i) match options that control a simple query-rewrite step (without regard to what the index actually contains), and (ii) match options that affect the lexical lookup of tokens in the index. Given this distinction, some orders of match-option application are more natural than others, because a kind (i) match option is less complex to implement when it is applied in a query-rewrite step before lexical lookup.
The thesaurus option is the typical kind (i) match option. Stop words (when treated as query expansions, as our spec does) are kind (i) as well.
Stemming falls between the two kinds: it typically involves a query-rewrite step but also affects lexical lookup.
Wildcard, case, and diacritics options, on the other hand, are typically of kind (ii).
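A minimal Python sketch of the two kinds, assuming a toy in-memory index and thesaurus (THESAURUS, STOP_WORDS, INDEX and all function names are hypothetical, not taken from the spec): kind (i) options rewrite the query token list without touching the index, while kind (ii) options only change how a token is compared against index entries.

```python
import unicodedata

# Hypothetical toy data; none of these names come from the Full Text spec.
THESAURUS = {"car": ["automobile"]}
STOP_WORDS = {"the", "a"}
INDEX = {"automobile": [3], "car": [1, 7], "cafe": [2]}


def rewrite_query(tokens):
    """Kind (i): rewrite the query token list without consulting the index."""
    out = []
    for tok in tokens:
        if tok in STOP_WORDS:
            continue  # stop-word filtering treated as a query rewrite
        out.append(tok)
        out.extend(THESAURUS.get(tok, []))  # thesaurus expansion
    return out


def normalize(tok):
    """Kind (ii): case- and diacritics-insensitive lexical form of a token."""
    decomposed = unicodedata.normalize("NFD", tok.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Index keys get the same normalization so query and index forms compare equal.
NORMALIZED_INDEX = {}
for key, postings in INDEX.items():
    NORMALIZED_INDEX.setdefault(normalize(key), []).extend(postings)


def lookup(tokens):
    """Kind (ii): probe the index with normalized tokens."""
    return {tok: NORMALIZED_INDEX.get(normalize(tok), []) for tok in tokens}


if __name__ == "__main__":
    query = rewrite_query(["the", "car", "caf\u00e9"])
    print(query)          # ['car', 'automobile', 'café']
    print(lookup(query))  # {'car': [1, 7], 'automobile': [3], 'café': [2]}
```

Because rewrite_query never reads the index, it can run once before any lookup; that independence is what makes applying kind (i) options first the simpler choice.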

We defined the match option application order as:
1. ftlanguage
2. ftwildcard
3. ftthesaurus
4. ftstem
5. ftcase
6. ftdiacritics
7. ftstopword

This order conflicts with the semantics of FTStopword and FTThesaurus as defined in 4.6.2, where stop-word filtering and thesaurus expansion are performed as query-rewrite steps and therefore precede all other options except language.
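The conflict can be made concrete with a small Python check, assuming only what the paragraph above states: ftthesaurus and ftstopword are query-rewrite steps that may follow ftlanguage but must precede every other option. The names REWRITE_OPTIONS, DRAFT_ORDER and rewrite_conflicts are hypothetical.

```python
# Thesaurus expansion and stop-word filtering are query-rewrite steps;
# every other option except ftlanguage is treated here as affecting
# lexical lookup. This classification is a simplification.
REWRITE_OPTIONS = {"ftthesaurus", "ftstopword"}

DRAFT_ORDER = ["ftlanguage", "ftwildcard", "ftthesaurus", "ftstem",
               "ftcase", "ftdiacritics", "ftstopword"]


def rewrite_conflicts(order):
    """Return (rewrite_option, earlier_option) pairs in which an option other
    than ftlanguage or another rewrite option is applied before a rewrite option."""
    conflicts = []
    for i, opt in enumerate(order):
        if opt not in REWRITE_OPTIONS:
            continue
        for earlier in order[:i]:
            if earlier != "ftlanguage" and earlier not in REWRITE_OPTIONS:
                conflicts.append((opt, earlier))
    return conflicts


if __name__ == "__main__":
    for rewrite_opt, earlier in rewrite_conflicts(DRAFT_ORDER):
        print(f"{earlier} is applied before {rewrite_opt}")
```

For the draft order this reports ftwildcard ahead of ftthesaurus, and ftwildcard, ftstem, ftcase and ftdiacritics ahead of ftstopword; an order that places both rewrite options directly after ftlanguage reports nothing.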
The current semantics assumes the following order:
1. ftlanguage
2. ftthesaurus
3. ftstopword
4. ftstem, ftcase, ftdiacritics, ftwildcard

The order among the last four would be implementation-defined. (Actually, I would assume that ftcase, ftdiacritics and ftwildcard are commutative, so there is no need to define an order among them.)
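The commutativity assumption for ftcase and ftdiacritics can be spot-checked on sample tokens. This Python sketch assumes that case insensitivity means Unicode case folding and that diacritics insensitivity means dropping combining marks after NFD decomposition; neither model is taken from the spec, the helper names are hypothetical, and ftwildcard is not covered. A passing check on samples is evidence, not a proof.

```python
import unicodedata


def fold_case(tok):
    """Assumed model of case insensitivity: Unicode case folding."""
    return tok.casefold()


def strip_diacritics(tok):
    """Assumed model of diacritics insensitivity: decompose (NFD) and drop
    combining marks."""
    decomposed = unicodedata.normalize("NFD", tok)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def commute_on(tokens):
    """True if applying the two normalizations in either order gives the
    same result for every token in the sample."""
    return all(
        fold_case(strip_diacritics(t)) == strip_diacritics(fold_case(t))
        for t in tokens
    )


SAMPLES = ["Caf\u00e9", "NA\u00cfVE", "\u00c5ngstr\u00f6m", "r\u00e9sum\u00e9"]

if __name__ == "__main__":
    print(commute_on(SAMPLES))  # True
```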

I can accept a partial order like the one above, but would opt for even more flexibility: implementations should also be able to choose their own order of ftthesaurus vs. ftstopword and of ftstem vs. ftwildcard.
Best,
/Jochen
Comment 1 Mary Holstege 2007-04-19 15:52:30 UTC
The WG agreed at the F2F meeting to make the order implementation-defined, subject to the following constraints:
(1) ftlanguage must come first
(2) ftstem must come before ftcase and ftdiacritics
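A minimal Python sketch of the agreed rule, assuming an order is given as a list naming each of the seven match options exactly once (is_allowed_order, OPTIONS and MUST_PRECEDE are hypothetical names): any order passes as long as ftlanguage is first and ftstem precedes both ftcase and ftdiacritics.

```python
# The seven match options named in the bug report.
OPTIONS = {"ftlanguage", "ftwildcard", "ftthesaurus", "ftstem",
           "ftcase", "ftdiacritics", "ftstopword"}

# Constraint (2) as (earlier, later) pairs.
MUST_PRECEDE = [("ftstem", "ftcase"), ("ftstem", "ftdiacritics")]


def is_allowed_order(order):
    """Check a proposed application order against the WG constraints.
    Assumes the order lists every match option exactly once."""
    if len(order) != len(OPTIONS) or set(order) != OPTIONS:
        raise ValueError("order must list each match option exactly once")
    if order[0] != "ftlanguage":  # constraint (1)
        return False
    pos = {opt: i for i, opt in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in MUST_PRECEDE)  # constraint (2)


if __name__ == "__main__":
    flexible = ["ftlanguage", "ftstopword", "ftthesaurus", "ftwildcard",
                "ftstem", "ftcase", "ftdiacritics"]
    print(is_allowed_order(flexible))  # True
```

Under these two constraints both the original draft order and the more flexible variant requested in the description (ftstopword before ftthesaurus, ftwildcard before ftstem) are permitted, while any order that applies ftcase or ftdiacritics before ftstem is rejected.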