<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>2485</bug_id>
          
          <creation_ts>2005-11-08 18:08:14 +0000</creation_ts>
          <short_desc>tokenization parameters: for data, for query, or for both?</short_desc>
          <delta_ts>2006-02-20 19:29:19 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XPath / XQuery / XSLT</product>
          <component>Full Text 1.0</component>
          <version>Working drafts</version>
          <rep_platform>Macintosh</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Daniela Florescu">dflorescu</reporter>
          <assigned_to name="Sihem Amer-Yahia">sihem</assigned_to>
          
          
          <qa_contact name="Mailing list for public feedback on specs from XSL and XML Query WGs">public-qt-comments</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>7096</commentid>
    <comment_count>0</comment_count>
    <who name="Daniela Florescu">dflorescu</who>
    <bug_when>2005-11-08 18:08:14 +0000</bug_when>
    <thetext>This issue concerns the options that affect the tokenization process.

The question is: do they influence only the tokenization of the
query string, or of both the query string and the data string?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>7097</commentid>
    <comment_count>1</comment_count>
    <who name="Michael Rys">mrys</who>
    <bug_when>2005-11-08 18:16:52 +0000</bug_when>
    <thetext>Note that, to support systems that tokenize the search corpus a priori
using FT indices (like all the ones we have), the tokenization parameters of
ftcontains should apply only to the search strings and not to the input
data.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>8043</commentid>
    <comment_count>2</comment_count>
    <who name="Sihem Amer-Yahia">sihem</who>
    <bug_when>2006-01-30 16:54:29 +0000</bug_when>
    <thetext>Added a note in Section 4:

Because tokenization is implementation-defined, the
tokenization of each item in $searchContext does not necessarily take
into account the match options in $matchOptions. This allows
implementations to tokenize and index input data without the knowledge
of particular match options used in full-text queries.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>