<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>3596</bug_id>
          
          <creation_ts>2006-08-13 08:37:31 +0000</creation_ts>
          <short_desc>second order aspect of scoring expressions</short_desc>
          <delta_ts>2006-10-05 19:42:36 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XPath / XQuery / XSLT</product>
          <component>Full Text 1.0</component>
          <version>Working drafts</version>
          <rep_platform>Macintosh</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Martin Probst">martin</reporter>
          <assigned_to name="Jim Melton">jim.melton</assigned_to>
          
          
          <qa_contact name="Mailing list for public feedback on specs from XSL and XML Query WGs">public-qt-comments</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>11102</commentid>
    <comment_count>0</comment_count>
    <who name="Martin Probst">martin</who>
    <bug_when>2006-08-13 08:37:31 +0000</bug_when>
    <thetext>I&apos;ve posted this to the list but found out later that it might be better to submit a bug report.

The Full Text specification extends the XQuery processing model to allow for a second-order aspect of functions, and it appears to me that score values use this mechanism to bypass the normal flow of XDM instances in XQuery. This seems a bit strange, as it does not fit well with the XQuery spec. Also, there seem to be some holes, e.g. what is the score here:
&gt; for $x score $score in //book[title ftcontains &quot;hello&quot;]/para[. ftcontains &quot;world&quot;] return $score
The score of the title, or the score of the para? I think this problem occurs because of the score values sneaking around normal XQuery evaluation order.

Now I wonder if this couldn&apos;t be greatly simplified by providing just two full-text keywords, e.g. &quot;ftmatches&quot; returning an xs:boolean and &quot;ftscore&quot; returning an xs:double in [0, 1]. &quot;ftmatches&quot; could be used for boolean conditions:
&gt; //book[. ftmatches &quot;hello&quot; &amp;&amp; &quot;world&quot;]
And &quot;ftscore&quot; if the user needs more control over relevance:
&gt; for $b in //book
&gt; let $score := $b ftscore &quot;hello&quot; &amp;&amp; &quot;world&quot;
&gt; where $score &gt; 0.5
&gt; order by $score descending
&gt; return $b
The score threshold that counts as a &quot;match&quot; could be an option, e.g.
&gt; declare option fts:match-score := 0.5;
Or completely arbitrary and application defined (as in the current spec, I think).

As this only adds completely normal XQuery expressions returning XDM instances, I think this would greatly simplify the processing model, the application for the user, and the implementation for vendors (which is of course why I write this, I&apos;m lazy :-)).

I can&apos;t quite come up with a limitation of this concept compared to the one with the special score keywords, functions, etc. Am I missing something?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>12312</commentid>
    <comment_count>1</comment_count>
    <who name="Mary Holstege">holstege</who>
    <bug_when>2006-10-05 17:23:08 +0000</bug_when>
    <thetext>The WG considered scoring and the various alternatives at length.
The problem with separating scoring as you suggest is that it makes complex queries both harder to optimize and harder to write.
In answer to your specific question:

The score applies to the entire expression. In theory the expression doesn&apos;t
even need to incorporate ftcontains.

Some examples that we intend to incorporate into the draft to illustrate:

Return matching paragraphs in order of overall score:

for $p score $score in //book[title ftcontains &quot;hello&quot;]/para[. ftcontains &quot;world&quot;]
    order by $score descending
return $p

Return the matching paragraphs ordered so that those from the highest scoring
books precede those from the lowest scoring books, where the highest scoring
paragraphs of each book are returned before the lower scoring paragraphs of
that book:

for $b score $score1 in //book[title ftcontains &quot;hello&quot;]
    order by $score1 descending
return
    for $p score $score2 in $b/para[. ftcontains &quot;world&quot;]
        order by $score2 descending
    return $p

We intend to close this bug with the addition of this clarification
to the specification. If you wish to object to the closure, please re-open the
bug.

Mary Holstege
for Full-Text Task Force of the XQuery Working Group</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>12321</commentid>
    <comment_count>2</comment_count>
    <who name="Martin Probst">martin</who>
    <bug_when>2006-10-05 19:42:36 +0000</bug_when>
    <thetext>I&apos;m not sure it&apos;s going to be easier to optimize queries if you add a second-order aspect of evaluation. Actually, I think all sorts of side effects need special casing in the generic ways you can optimize functional languages (just like with XML constructors).

However, if you add that clarification to the specification, the spec as is will be consistent, and it&apos;s simply a design decision whether to do it this way or not, so I&apos;ll close this bug.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>