This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 3596 - second order aspect of scoring expressions
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Full Text 1.0
Version: Working drafts
Hardware: Macintosh All
Importance: P2 normal
Target Milestone: ---
Assignee: Jim Melton
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2006-08-13 08:37 UTC by Martin Probst
Modified: 2006-10-05 19:42 UTC

Description Martin Probst 2006-08-13 08:37:31 UTC
I've posted this to the list but found out later that it might be better to submit a bug report.

The Full Text specification extends the XQuery processing model with a second-order aspect of functions, and it appears to me that score values are somewhat cheating their way around the normal flow of XDM instances in XQuery by using this mechanism. This seems a bit strange, as it does not fit well with the XQuery spec. There also seem to be some holes; for example, what is the score here:
> for $x score $score in //book[title ftcontains "hello"]/para[. ftcontains "world"] return $score
The score of the title, or the score of the para? I think this problem occurs because of the score values sneaking around normal XQuery evaluation order.

Now I wonder if this couldn't be greatly simplified by providing just two full-text keywords, e.g. "ftmatches" returning an xs:boolean and "ftscore" returning an xs:double in [0, 1]. "ftmatches" could be used for boolean conditions:
> //book[. ftmatches "hello" && "world"]
And "ftscore" if the user needs more control over relevance:
> for $b in //book
> let $score := $b ftscore "hello" && "world"
> where $score > 0.5
> order by $score descending
> return $b
The score that counts as a "match" could be set by an option, e.g.
> declare option fts:match-score := 0.5;
Or completely arbitrary and application defined (as in the current spec, I think).
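To make the idea concrete, here is a rough Python sketch of how "ftscore" and "ftmatches" could relate if the score were an ordinary value. The names ft_score, ft_matches and MATCH_SCORE, and the toy scoring formula (fraction of query terms present), are assumptions made purely for illustration; they do not come from the specification or from any implementation, and the formula does not model "&&" faithfully.

```python
# Toy model of the proposed "ftscore" / "ftmatches" pair (hypothetical names).
# Assumption: the score is simply the fraction of query terms found in the text.

MATCH_SCORE = 0.5  # stands in for: declare option fts:match-score := 0.5;


def ft_score(text: str, terms: list[str]) -> float:
    """Return a relevance score in [0, 1] as an ordinary value."""
    if not terms:
        return 0.0
    words = set(text.lower().split())
    hits = sum(1 for term in terms if term.lower() in words)
    return hits / len(terms)


def ft_matches(text: str, terms: list[str], threshold: float = MATCH_SCORE) -> bool:
    """Boolean match defined purely in terms of the score."""
    return ft_score(text, terms) >= threshold


books = ["hello world", "hello there", "goodbye"]
terms = ["hello", "world"]

# Rough equivalent of the FLWOR above: bind the score, filter, order descending.
scored = [(ft_score(book, terms), book) for book in books]
ranked = [
    book
    for score, book in sorted(scored, key=lambda pair: pair[0], reverse=True)
    if score > 0.5
]
```

In this sketch the score flows through the query like any other value, which is the point of the proposal: nothing outside the normal evaluation order is needed.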

As this only adds completely normal XQuery expressions returning XDM instances, I think this would greatly simplify the processing model, the usage for the user, and the implementation for vendors (which is of course why I write this, I'm lazy :-)).

I can't quite come up with a limitation of this concept compared with the one using the special score keywords, functions etc. Am I missing something?
Comment 1 Mary Holstege 2006-10-05 17:23:08 UTC
The WG had considered scoring and the various alternatives at length.
The problem with separating scoring as you suggest is that it makes it both harder to optimize queries, and harder to create them (for complex queries).
In answer to your specific question:

The score applies to the entire expression. In theory the expression doesn't
even need to incorporate ftcontains.

Some examples that we intend to incorporate into the draft to illustrate this:

Return matching paragraphs in order of overall score:

for $p score $score in //book[title ftcontains "hello"]/para[. ftcontains "world"]
order by $score descending
return $p

Return the matching paragraphs ordered so that those from the highest scoring
books precede those from the lowest scoring books, where the highest scoring
paragraphs of each book are returned before the lower scoring paragraphs of
that book:

for $b score $score1 in //book[title ftcontains "hello"]
order by $score1 descending
return
    for $p score $score2 in $b/para[. ftcontains "world"]
    order by $score2 descending
    return $p
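
The ordering produced by the nested query can be sketched in Python as two nested sorts. The data layout and all score values below are made up for illustration (a real engine assigns the scores); the sketch only shows the resulting order, not how scoring itself works.

```python
# Hypothetical data: (book_score, [(para_score, para_text), ...]).
# All scores are invented; only the resulting order matters here.
books = [
    (0.4, [(0.9, "b1-p1"), (0.2, "b1-p2")]),
    (0.8, [(0.3, "b2-p1"), (0.7, "b2-p2")]),
]

result = []
# Outer loop: books by book score, highest first.
for book_score, paras in sorted(books, key=lambda b: b[0], reverse=True):
    # Inner loop: paragraphs of this book by paragraph score, highest first.
    for para_score, text in sorted(paras, key=lambda p: p[0], reverse=True):
        result.append(text)

# result == ["b2-p2", "b2-p1", "b1-p1", "b1-p2"]
```

Note that "b1-p1" has the highest paragraph score overall but still comes after both paragraphs of the higher-scoring book, which is where this query differs from ordering by a single overall score.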

We intend to close this bug with the addition of this clarification
to the specification. If you wish to object to the closure, please re-open the
bug.

Mary Holstege
for Full-Text Task Force of the XQuery Working Group
Comment 2 Martin Probst 2006-10-05 19:42:36 UTC
I'm not sure it's going to be easier to optimize queries if you add a second-order aspect of evaluation. Actually, I think side effects of all sorts need special-casing in the generic approach used to optimize functional languages (just as XML constructors do).

However, if you add that clarification to the specification, the spec as is will be consistent, and it's simply a design decision whether to do it this way or not, so I'll close this bug.