<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>6946</bug_id>
          
          <creation_ts>2009-05-23 17:06:06 +0000</creation_ts>
          <short_desc>[FT] Test Suite - Wildcards</short_desc>
          <delta_ts>2009-06-09 09:55:09 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XPath / XQuery / XSLT</product>
          <component>Full Text 1.0</component>
          <version>Candidate Recommendation</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Christian Gruen">christian.gruen</reporter>
          <assigned_to name="Jim Melton">jim.melton</assigned_to>
          <cc>jmdyck</cc>
          
          <qa_contact name="Mailing list for public feedback on specs from XSL and XML Query WGs">public-qt-comments</qa_contact>

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>25272</commentid>
    <comment_count>0</comment_count>
    <who name="Christian Gruen">christian.gruen</who>
    <bug_when>2009-05-23 17:06:06 +0000</bug_when>
    <thetext>Dear Pat, dear all,

I eventually decided to implement my own wildcard evaluator instead of delegating to a regular expression matcher, which would also accept more sophisticated wildcard expressions than the wildcard syntax allows.

As a consequence, I would now expect empty results for the following three queries:

[1] ftwildcard-q4.xq
    .//content ftcontains &quot;task?&quot; with wildcards

As the question mark is not preceded by a period, the literal term &quot;task?&quot; will be matched against the text.


[2] ftwildcard-q13.xq
    $cont ftcontains &quot;specialist\.&quot;

As the period is preceded by a backslash, the resulting literal term is &quot;specialist.&quot;


[3] ftwildcard-q14.xq
    $cont ftcontains &quot;nex.\?&quot;

The period will be treated as a placeholder for one arbitrary character, and the escaped question mark will be matched literally.
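
A minimal Python sketch of these three readings, assuming the following wildcard rules: a period optionally followed by ?, *, + or {n,m} is a wildcard, a backslash escapes the next character, and every other character (including a bare question mark) is literal. The names wildcard_to_regex and wildcard_match are hypothetical, and translating into Python&apos;s re module is only one illustrative way to evaluate the pattern:

```python
import re

# Hypothetical sketch of the wildcard rules assumed above, not the
# implementation discussed in this report. Each match is one of:
#   group 1: a backslash-escaped character (always literal),
#   groups 2-3: a period plus an optional ?, *, + or {n,m} qualifier,
#   group 4: any other character (literal, including a bare "?").
_TOKEN = re.compile(r"\\(.)|(\.)(\{\d+,\d+\}|[?*+])?|(.)", re.DOTALL)


def wildcard_to_regex(pattern):
    """Translate a full-text wildcard pattern into an equivalent Python regex."""
    parts = []
    for m in _TOKEN.finditer(pattern):
        escaped, period, qualifier, literal = m.groups()
        if escaped is not None:
            parts.append(re.escape(escaped))
        elif period is not None:
            # ".", ".?", ".*", ".+" and ".{n,m}" mean the same in Python regex.
            parts.append("." + (qualifier or ""))
        else:
            parts.append(re.escape(literal))
    return "".join(parts)


def wildcard_match(pattern, token):
    """True if the whole token matches the wildcard pattern."""
    return re.fullmatch(wildcard_to_regex(pattern), token, re.DOTALL) is not None


print(wildcard_match("task?", "task?"))                # True: "?" is literal
print(wildcard_match("task?", "task"))                 # False
print(wildcard_match("specialist\\.", "specialist."))  # True: escaped period
print(wildcard_match("nex.\\?", "next?"))              # True: "." matches "t"
```

All three patterns only succeed if the token still carries its trailing punctuation, which is exactly what the tokenizer question below is about.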


Indeed, the &quot;ftusecases.xml&quot; document contains the substrings &quot;task?&quot;, &quot;specialist.&quot;, and &quot;next?&quot;. However, if the full-text tokenizer interprets periods (and other punctuation) as sentence delimiters rather than as parts of tokens, these characters will not be passed on to the wildcard evaluator. This is why both of the following queries

  &quot;task?&quot; ftcontains &quot;task?&quot;,
  &quot;task?&quot; ftcontains &quot;task?&quot; with wildcards

will lead to the internal comparison &quot;task&quot; &lt;-&gt; &quot;task?&quot;, and will both return false.

The tokenizer could instead keep periods and other special characters in the returned tokens. This, however, would lead to surprising results for ordinary ftcontains operations:

  &quot;is this a task?&quot; ftcontains &quot;task&quot;  -&gt;  false
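
As a sketch of that alternative, assuming a hypothetical tokenizer that splits on whitespace only (so trailing punctuation stays attached) and a simplified single-word ftcontains; tokenize_keep_punctuation and contains_token are illustrative names:

```python
def tokenize_keep_punctuation(text):
    # Hypothetical tokenizer that splits on whitespace only, so trailing
    # punctuation such as "?" or "." stays part of the token.
    return text.split()


def contains_token(text, term):
    """Simplified single-word ftcontains: does term equal some token?"""
    return term in tokenize_keep_punctuation(text)


print(tokenize_keep_punctuation("is this a task?"))  # ['is', 'this', 'a', 'task?']
print(contains_token("is this a task?", "task"))     # False
print(contains_token("is this a task?", "task?"))    # True
```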


Looking forward to your reply - thanks,
Christian</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>25502</commentid>
    <comment_count>1</comment_count>
    <who name="Michael Dyck">jmdyck</who>
    <bug_when>2009-06-08 16:34:59 +0000</bug_when>
    <thetext>&gt; Indeed the &quot;ftusecases.xml&quot; documents contains the substring &quot;task?&quot;,
&gt; &quot;specialist.&quot;, and &quot;next?&quot;. However, if the full-text tokenizer interprets
&gt; periods as sentence delimiters, and not as parts of tokens, the periods will
&gt; not be passed on to the wildcard evaluator. This is why both of the following
&gt; queries
&gt; 
&gt;   &quot;task?&quot; ftcontains &quot;task?&quot;,
&gt;   &quot;task?&quot; ftcontains &quot;task?&quot; with wildcards
&gt; 
&gt; will lead to the internal comparison &quot;task&quot; &lt;-&gt; &quot;task?&quot;, and will both return
&gt; false.

But note that tokenization is applied not just to the search item but also to the query strings. If the latter&apos;s tokenization also interprets punctuation as token delimiters, then the internal comparison will be
    &quot;task&quot; &lt;-&gt; &quot;task&quot;
and the queries will return true.
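
To illustrate, a Python sketch assuming a hypothetical tokenizer that treats every non-word character as a delimiter and is applied to both the search text and the query string. The names tokenize and ftcontains are illustrative, and phrase matching is reduced to a contiguous token-sequence check:

```python
import re


def tokenize(text):
    # Hypothetical tokenizer: a token is a run of word characters; spaces and
    # punctuation such as "?" and "." act as delimiters and are dropped.
    return re.findall(r"\w+", text)


def ftcontains(search_text, query):
    """Simplified ftcontains: the query's token sequence must occur
    contiguously in the search text's token sequence."""
    text_tokens = tokenize(search_text)
    query_tokens = tokenize(query)
    k = len(query_tokens)
    return any(text_tokens[i:i + k] == query_tokens
               for i in range(len(text_tokens) - k + 1))


print(tokenize("task?"))                      # ['task']
print(ftcontains("task?", "task?"))           # True: "task" vs "task"
print(ftcontains("is this a task?", "task"))  # True
```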

By the way, did you intend &quot;with wildcards&quot; for cases [2] and [3]?
</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>25504</commentid>
    <comment_count>2</comment_count>
    <who name="Christian Gruen">christian.gruen</who>
    <bug_when>2009-06-08 16:51:07 +0000</bug_when>
    <thetext>&gt; But note that tokenization is applied, not just to the search item, but also 
&gt; to the query strings. If the latter&apos;s tokenization also interprets 
&gt; punctuation as token-delimiters, then the internal comparison will be
&gt; 
&gt;    &quot;task&quot; &lt;-&gt; &quot;task&quot;
&gt;
&gt; and the queries will return true.

Thanks, Michael. So I think I now understand that the full-text tokenizer itself must be fully aware of the wildcard syntax to return the correct tokens, which seems logical.
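
One way to picture this, as a sketch only: a hypothetical query-side tokenizer for &quot;with wildcards&quot; that keeps escaped characters and period-based wildcards inside a token. Treating a bare question mark as a delimiter is an assumption here, not something settled in this thread, and tokenize_wildcard_query is an illustrative name:

```python
import re

# Hypothetical query-side tokenizer for "with wildcards". A token is a run of
# word characters, backslash-escaped characters, and periods with an optional
# ?, *, + or {n,m} qualifier, so wildcard syntax survives tokenization.
# Assumption: a bare "?" (not preceded by a period) is still a delimiter.
_QUERY_TOKEN = re.compile(r"(?:\w|\\.|\.(?:\{\d+,\d+\}|[?*+])?)+", re.DOTALL)


def tokenize_wildcard_query(query):
    return _QUERY_TOKEN.findall(query)


print(tokenize_wildcard_query("nex.\\?"))         # ['nex.\\?']
print(tokenize_wildcard_query("specialist\\."))   # ['specialist\\.']
print(tokenize_wildcard_query("task?"))           # ['task']
```

The text side is tokenized separately, so whether &quot;nex.\?&quot; then matches &quot;next?&quot; still depends on whether the text tokenizer keeps the trailing question mark.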


&gt; By the way, did you intend &quot;with wildcards&quot; for cases [2] and [3]?

Yes, exactly - I was referring here to the two test suite files &quot;ftwildcard-q13.xq&quot; and &quot;ftwildcard-q14.xq&quot;.


I&apos;ll update this bug report as soon as possible.
Christian
</thetext>
  </long_desc>
    </bug>

</bugzilla>