<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>4454</bug_id>
          
          <creation_ts>2007-04-09 12:02:23 +0000</creation_ts>
          <short_desc>[FT] match options order should be implementation-defined</short_desc>
          <delta_ts>2007-04-19 15:52:50 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XPath / XQuery / XSLT</product>
          <component>Full Text 1.0</component>
          <version>Working drafts</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Jochen Doerre">doerre</reporter>
          <assigned_to name="Jim Melton">jim.melton</assigned_to>
          
          
          <qa_contact name="Mailing list for public feedback on specs from XSL and XML Query WGs">public-qt-comments</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>14681</commentid>
    <comment_count>0</comment_count>
    <who name="Jochen Doerre">doerre</who>
    <bug_when>2007-04-09 12:02:23 +0000</bug_when>
    <thetext>From an implementation point of view, there are two types of match options, and the type influences how an option can practically be applied: i) match options that control a simple query rewrite step (without regard to what is actually contained in an index) and ii) match options that affect the lexical lookup of tokens in the index. Given this distinction, certain orders of match option application are more natural, because a kind i) match option is less complex to implement when it is applied in a query rewrite step prior to lexical lookup.
The thesaurus option is the typical kind i) match option. Stop words (when treated as query expansions, as our spec does) are kind i) as well.
Stemming is in-between the two kinds, as it typically involves a query-rewrite step, but also affects lexical lookup.
On the other hand, wildcard, case, and diacritics are typically of kind ii).

We defined the match option application order as:
1. ftlanguage
2. ftwildcard
3. ftthesaurus
4. ftstem
5. ftcase
6. ftdiacritics
7. ftstopword

This order conflicts with the semantics of FTStopword and FTThesaurus as we have defined them in 4.6.2, where stop word filtering and thesaurus expansion are done as query rewrite steps and hence precede all other options except language.
The current semantics assumes the order:
1. ftlanguage
2. ftthesaurus
3. ftstopword
4. ftstem, ftcase, ftdiacritics, ftwildcard

The order among the last four would be implementation-defined. (Actually, I would assume that ftcase, ftdiacritics and ftwildcard are commutative, hence there&apos;s no need to define an order among them.)

I can accept a partial order like the one above, but would opt for even more flexibility: implementations should also be able to choose the order they implement for ftthesaurus vs. ftstopword and for ftstem vs. ftwildcard.
Best,
/Jochen</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>14776</commentid>
    <comment_count>1</comment_count>
    <who name="Mary Holstege">holstege</who>
    <bug_when>2007-04-19 15:52:30 +0000</bug_when>
    <thetext>WG agreed at F2F to make order implementation-defined subject to the following
constraints:
(1) ftlanguage must come first
(2) ftstem must come before ftcase and ftdiacritics
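
As an illustration only (not part of the WG decision), the two constraints can be sketched as a check on a candidate application order. The names OPTIONS and is_allowed_order are hypothetical; only the option names and the two constraints come from this bug.

```python
# Hypothetical helper; only the two constraints are taken from the comment above.
OPTIONS = [
    "ftlanguage", "ftwildcard", "ftthesaurus", "ftstem",
    "ftcase", "ftdiacritics", "ftstopword",
]


def is_allowed_order(order):
    """Return True if order is a permutation of OPTIONS meeting both constraints."""
    # Every match option must appear exactly once.
    if sorted(order) != sorted(OPTIONS):
        return False
    # (1) ftlanguage must come first.
    if order[0] != "ftlanguage":
        return False
    # (2) ftstem must come before ftcase and ftdiacritics.
    stem = order.index("ftstem")
    return order.index("ftcase") > stem and order.index("ftdiacritics") > stem
```

Under these assumptions, both orders listed in the original report pass the check, while any order that applies ftcase before ftstem does not.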

</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>