This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
I posted this to the list but found out later that it might be better to submit a bug report.

The Full Text specification extends the XQuery processing model to allow for a second-order aspect of functions, and it appears to me that score values are somewhat cheating around the normal flow of XDM instances in XQuery using this mechanism. This seems a bit strange, as it does not fit well with the XQuery spec. Also, there seem to be some holes, e.g. what is the score here:

> for $x score $score in //book[title ftcontains "hello"]/para[. ftcontains "world"]
> return $score

The score of the title, or the score of the para? I think this problem occurs because of the score values sneaking around the normal XQuery evaluation order.

Now I wonder if this couldn't be greatly simplified by providing just two full-text keywords, e.g. "ftmatches" returning an xs:boolean and "ftscore" returning an xs:double in [0, 1]. "ftmatches" could be used for boolean conditions:

> //book[. ftmatches "hello" && "world"]

And "ftscore" if the user needs more control over relevance:

> for $b in //book
> let $score := $b ftscore "hello" && "world"
> where $score > 0.5
> order by $score descending
> return $b

The definition of what score counts as a "match" could be an option, e.g.

> declare option fts:match-score := 0.5;

Or it could be completely arbitrary and application-defined (as in the current spec, I think).

As this only adds completely normal XQuery expressions returning XDM instances, I think this would greatly simplify the processing model, the application for the user, and the implementation for vendors (which is of course why I write this, I'm lazy :-)). I can't quite come up with a limitation of this concept compared to the one with the special score keywords, functions etc. Am I missing something?
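To make the proposal above concrete, here is a minimal sketch in Python rather than XQuery, since "ftscore" and "ftmatches" are only suggested keywords. The scoring formula (fraction of query words present) and the `MATCH_SCORE` constant are illustrative assumptions, not anything defined by the Full Text specification. The point is only that both operations are ordinary expressions returning ordinary values.

```python
import re

# Hypothetical analogue of `declare option fts:match-score := 0.5;`
MATCH_SCORE = 0.5


def ftscore(text, words):
    """Toy relevance score in [0, 1]: fraction of query words found in text.

    This formula is an assumption for illustration only; real engines
    use application-defined scoring.
    """
    if not words:
        return 0.0
    tokens = set(re.findall(r"\w+", text.lower()))
    hits = sum(1 for w in words if w.lower() in tokens)
    return hits / len(words)


def ftmatches(text, words, threshold=MATCH_SCORE):
    """Boolean match: true when the score exceeds the threshold."""
    return ftscore(text, words) > threshold


# Rough analogue of the FLWOR query in the comment above:
# keep books scoring above 0.5, highest score first.
books = ["hello there", "Hello, world!", "nothing relevant"]
query = ["hello", "world"]
ranked = sorted(
    (b for b in books if ftscore(b, query) > 0.5),
    key=lambda b: ftscore(b, query),
    reverse=True,
)
```

In this model the score is just a value bound by a normal `let`-style assignment, so no second-order evaluation step is needed.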
The WG considered scoring and the various alternatives at length. The problem with separating scoring as you suggest is that it makes queries both harder to optimize and harder to create (for complex queries).

In answer to your specific question: the score applies to the entire expression. In theory the expression doesn't even need to incorporate ftcontains. Some examples that we intend to incorporate into the draft to illustrate:

Return matching paragraphs in order of overall score:

    for $p score $score in //book[title ftcontains "hello"]/para[. ftcontains "world"]
    order by $score descending
    return $p

Return the matching paragraphs ordered so that those from the highest-scoring books precede those from the lowest-scoring books, where the highest-scoring paragraphs of each book are returned before the lower-scoring paragraphs of that book:

    for $b score $score1 in //book[title ftcontains "hello"]
    order by $score1 descending
    return
      for $p score $score2 in $b/para[. ftcontains "world"]
      order by $score2 descending
      return $p

We intend to close this bug with the addition of this clarification to the specification. If you wish to object to the closure, please re-open the bug.

Mary Holstege for Full-Text Task Force of the XQuery Working Group
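The nested ordering in the second example can be modelled outside XQuery as follows. This is a sketch in Python under stated assumptions: the `score` function (term frequency divided by token count) and the sample data are hypothetical, chosen only to show how ordering by book score first and paragraph score second differs from a single flat ordering.

```python
import re


def score(text, word):
    """Toy score in [0, 1]: share of tokens equal to `word` (an assumption)."""
    tokens = re.findall(r"\w+", text.lower())
    return tokens.count(word.lower()) / len(tokens) if tokens else 0.0


def nested_ranking(books):
    """Paragraphs grouped by book score, then ordered by paragraph score."""
    result = []
    # Outer loop: books whose title matches "hello", best title score first.
    matching_books = [b for b in books if score(b["title"], "hello") > 0]
    matching_books.sort(key=lambda b: score(b["title"], "hello"), reverse=True)
    for book in matching_books:
        # Inner loop: paragraphs matching "world", best paragraph score first.
        paras = [p for p in book["paras"] if score(p, "world") > 0]
        paras.sort(key=lambda p: score(p, "world"), reverse=True)
        result.extend(paras)
    return result


# Hypothetical sample data
books = [
    {"title": "Say hello again", "paras": ["world world peace"]},
    {"title": "Hello", "paras": ["big world here", "world", "none"]},
    {"title": "Other", "paras": ["world"]},
]
ordered = nested_ranking(books)
```

Here "world world peace" has a higher paragraph score than "big world here" but is still returned after it, because its book's title scores lower.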
I'm not sure it's going to be easier to optimize queries if you add a second-order aspect of evaluation. Actually, I think all sorts of side effects need special-casing in the generic way you optimize functional languages (just like with XML constructors). However, if you add that clarification, the spec as it stands will be consistent, and it's simply a design decision whether to do it this way or not, so I'll close this bug.