This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 6946 - [FT] Test Suite - Wildcards
Summary: [FT] Test Suite - Wildcards
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Full Text 1.0
Version: Candidate Recommendation
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: Jim Melton
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-05-23 17:06 UTC by Christian Gruen
Modified: 2009-06-09 09:55 UTC
CC List: 1 user

See Also:


Description Christian Gruen 2009-05-23 17:06:06 UTC
Dear Pat, dear all,

I eventually decided to implement my own wildcard evaluator instead of relying on a regular expression matcher, so that more sophisticated (regex-style) wildcard expressions are no longer supported.

As a consequence, I would now expect empty results for the following three queries:

[1] ftwildcard-q4.xq
    .//content ftcontains "task?" with wildcards

As the question mark is not preceded by a period, the literal term "task?" will be matched against the text.


[2] ftwildcard-q13.xq
    $cont ftcontains "specialist\."

As the period is preceded by a backslash, the resulting literal term is "specialist."


[3] ftwildcard-q14.xq
    $cont ftcontains "nex.\?"

The period will be treated as a placeholder for one arbitrary character, and the question mark will be searched as literal.
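For illustration, here is a minimal Python sketch of such a hand-written evaluator. It does not use regular expressions. It assumes the wildcard rules as described above:

- A period matches one arbitrary character.
- A period may be followed by one of the qualifiers ?, *, + or {n,m}.
- A backslash makes the following character literal.
- Every other character is literal, including a question mark that does not follow a period.

The function names are hypothetical. Tokenization is deliberately left out, so each pattern is matched against a single token.

```python
def parse_wildcard(pattern):
    """Split a wildcard pattern into (char, lo, hi) items.

    char is None for the '.' placeholder; hi is None for "unbounded".
    Literal characters always have lo == hi == 1.
    """
    items = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            # Backslash escape: the next character is taken literally.
            if i + 1 >= len(pattern):
                raise ValueError("dangling backslash in wildcard pattern")
            items.append((pattern[i + 1], 1, 1))
            i += 2
        elif c == ".":
            # Placeholder, optionally followed by a qualifier.
            i += 1
            lo, hi = 1, 1
            if i < len(pattern):
                q = pattern[i]
                if q == "?":
                    lo, hi = 0, 1
                    i += 1
                elif q == "*":
                    lo, hi = 0, None
                    i += 1
                elif q == "+":
                    lo, hi = 1, None
                    i += 1
                elif q == "{":
                    end = pattern.index("}", i)  # ValueError if unclosed
                    lo_s, hi_s = pattern[i + 1:end].split(",")
                    lo, hi = int(lo_s), int(hi_s)
                    i = end + 1
            items.append((None, lo, hi))
        else:
            # Any other character (including a bare '?') is literal.
            items.append((c, 1, 1))
            i += 1
    return items


def wildcard_match(pattern, token):
    """Return True if the whole token matches the wildcard pattern."""
    items = parse_wildcard(pattern)

    def match(k, pos):
        if k == len(items):
            return pos == len(token)
        ch, lo, hi = items[k]
        if ch is not None:
            return pos < len(token) and token[pos] == ch and match(k + 1, pos + 1)
        # Try every allowed number of placeholder characters (backtracking).
        rest = len(token) - pos
        max_n = rest if hi is None else min(hi, rest)
        return any(match(k + 1, pos + n) for n in range(lo, max_n + 1))

    return match(0, 0)


print(wildcard_match("task?", "task?"))          # True: '?' is literal here
print(wildcard_match("task?", "task"))           # False
print(wildcard_match("specialist\\.", "specialist."))  # True
print(wildcard_match("nex.\\?", "next?"))        # True
```

Under these rules, all three patterns only match tokens that still carry their trailing punctuation. Whether that ever happens depends on the tokenizer, which is the issue discussed next.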


Indeed, the "ftusecases.xml" document contains the substrings "task?", "specialist.", and "next?". However, if the full-text tokenizer interprets periods and question marks as sentence delimiters rather than as parts of tokens, these characters will not be passed on to the wildcard evaluator. This is why both of the following queries

  "task?" ftcontains "task?",
  "task?" ftcontains "task?" with wildcards

will both lead to the internal comparison "task" <-> "task?" and return false.
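The following is a minimal sketch of that situation. It assumes a hypothetical tokenizer that treats every non-word character as a delimiter. For the sake of the argument, it also assumes the query string is compared verbatim, without being tokenized:

```python
import re


def tokenize(text):
    # Hypothetical tokenizer: a token is a maximal run of word characters,
    # so periods, question marks and other punctuation act as delimiters.
    return re.findall(r"\w+", text)


def ftcontains_untokenized_query(text, query):
    # Assumption for illustration only: the query string is NOT tokenized
    # and is compared as-is against each token of the searched text.
    return query in tokenize(text)


print(tokenize("task?"))                               # ['task']
print(ftcontains_untokenized_query("task?", "task?"))  # False
```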

The tokenizer could instead keep periods and other special characters in the returned tokens. This, however, would lead to surprising results with normal ftcontains operations:

  "is this a task?" ftcontains "task"  ->  false
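Here is the same example with a hypothetical tokenizer that splits on whitespace only, so punctuation stays attached to the tokens. The single-term check is a simplification of ftcontains:

```python
def tokenize_keep_punct(text):
    # Hypothetical tokenizer: split on whitespace only, so "task?" stays one token.
    return text.split()


def ftcontains_keep_punct(text, query):
    # Simplified single-term check: the query must equal one of the text tokens.
    return query in tokenize_keep_punct(text)


print(tokenize_keep_punct("is this a task?"))            # ['is', 'this', 'a', 'task?']
print(ftcontains_keep_punct("is this a task?", "task"))  # False
```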


Looking forward to your reply - thanks,
Christian
Comment 1 Michael Dyck 2009-06-08 16:34:59 UTC
> Indeed, the "ftusecases.xml" document contains the substrings "task?",
> "specialist.", and "next?". However, if the full-text tokenizer interprets
> periods as sentence delimiters, and not as parts of tokens, the periods will
> not be passed on to the wildcard evaluator. This is why both of the following
> queries
> 
>   "task?" ftcontains "task?",
>   "task?" ftcontains "task?" with wildcards
> 
> will both lead to the internal comparison "task" <-> "task?" and return
> false.

But note that tokenization is applied, not just to the search item, but also to the query strings. If the latter's tokenization also interprets punctuation as token-delimiters, then the internal comparison will be
    "task" <-> "task"
and the queries will return true.
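Michael's point can be sketched by running the same hypothetical punctuation-delimiting tokenizer over both the searched text and the query string. The contiguous-phrase check below is a simplification of ftcontains, not its full semantics:

```python
import re


def tokenize(text):
    # Hypothetical tokenizer: punctuation acts as a delimiter on BOTH sides.
    return re.findall(r"\w+", text)


def ftcontains(text, query):
    # Simplified phrase match: the query tokens must occur contiguously
    # in the text tokens.
    t, q = tokenize(text), tokenize(query)
    return len(q) > 0 and any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


print(ftcontains("task?", "task?"))           # True: "task" <-> "task"
print(ftcontains("is this a task?", "task"))  # True
```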

By the way, did you intend "with wildcards" for cases [2] and [3]?
Comment 2 Christian Gruen 2009-06-08 16:51:07 UTC
> But note that tokenization is applied, not just to the search item, but also 
> to the query strings. If the latter's tokenization also interprets 
> punctuation as token-delimiters, then the internal comparison will be
> 
>    "task" <-> "task"
>
> and the queries will return true.

Thanks, Michael. So I now understand that the full-text tokenizer itself must be fully aware of the wildcard syntax to return the correct tokens, which seems logical.
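Below is a rough sketch of what a wildcard-aware query tokenizer could look like. It assumes the wildcard syntax from the description: a period with an optional qualifier, plus backslash escapes. The function name and the token pattern are hypothetical. With wildcards enabled, the wildcard characters stay inside the query token instead of acting as delimiters:

```python
import re

# Hypothetical token patterns. With wildcards, a token may also contain
# backslash escapes (\x) and a period with an optional ?, *, + or {n,m} qualifier.
PLAIN_TOKEN = re.compile(r"\w+")
WILDCARD_TOKEN = re.compile(r"(?:\w|\\.|\.(?:[?*+]|\{\d+,\d+\})?)+")


def tokenize_query(query, wildcards):
    pattern = WILDCARD_TOKEN if wildcards else PLAIN_TOKEN
    return pattern.findall(query)


print(tokenize_query("nex.\\?", wildcards=False))  # ['nex']
print(tokenize_query("nex.\\?", wildcards=True))   # ['nex.\\?']
```

Under this sketch, a question mark that is neither escaped nor preceded by a period still acts as a delimiter, so the query "task?" is tokenized as "task" even with wildcards enabled.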


> By the way, did you intend "with wildcards" for cases [2] and [3]?

Yes, exactly - I was referring here to the two test suite files "ftwildcard-q13.xq" and "ftwildcard-q14.xq".


I'll update this bug report as soon as possible.
Christian