<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>5633</bug_id>
          
          <creation_ts>2008-04-07 22:53:42 +0000</creation_ts>
          <short_desc>[FT] Incorrect distance computation in FTDistance</short_desc>
          <delta_ts>2008-04-15 02:46:51 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XPath / XQuery / XSLT</product>
          <component>Full Text 1.0</component>
          <version>Working drafts</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows XP</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Thomas Baby">thomas.baby</reporter>
          <assigned_to name="Jim Melton">jim.melton</assigned_to>
          
          
          <qa_contact name="Mailing list for public feedback on specs from XSL and XML Query WGs">public-qt-comments</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>19752</commentid>
    <comment_count>0</comment_count>
    <who name="Thomas Baby">thomas.baby</who>
    <bug_when>2008-04-07 22:53:42 +0000</bug_when>
    <thetext>The FTDistance functions rely on computing word distance, sentence distance, or paragraph distance, which are implemented in functions wordDistance, sentenceDistance, or paraDistance respectively. These functions do not return the absolute value of the distance, and this leads to some &quot;funny&quot; semantics in the presence of exclusions. 

For example, in function fts:ApplyFTWordDistanceAtMost, we say that for each stringExclude, there has to be at least one stringInclude from which it is not more than a certain word distance apart. 

for $stringExcl in $match/fts:stringExclude
where some $stringIncl in $match/fts:stringInclude
      satisfies fts:wordDistance(
                    $stringIncl/fts:tokenInfo,
                    $stringExcl/fts:tokenInfo
                ) &lt;= $n
return $stringExcl

But, since distance returned by wordDistance is not absolute, the result can be different depending on whether the stringExclude occurs &quot;before&quot; and &quot;after&quot; a stringInclude. Intuitively, this does not make sense.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>19753</commentid>
    <comment_count>1</comment_count>
    <who name="Thomas Baby">thomas.baby</who>
    <bug_when>2008-04-07 23:09:14 +0000</bug_when>
    <thetext>Minor error in the last paragraph. Here is the corrected paragraph:

But, since distance returned by wordDistance is not absolute, the result can be
different depending on whether the stringExclude occurs &quot;before&quot; or &quot;after&quot; a
stringInclude. Intuitively, this does not make sense.
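
To make the order dependence concrete, here is a minimal Python sketch. It assumes that wordDistance subtracts the first argument's endPos from the second argument's startPos (minus 1) without reordering its arguments; that formula, and the names Token and word_distance_unsorted, are illustrative assumptions rather than text quoted from the draft.

```python
from collections import namedtuple

# Hypothetical stand-in for an fts:tokenInfo element (positions only).
Token = namedtuple("Token", ["start_pos", "end_pos"])

def word_distance_unsorted(t1, t2):
    # Assumed formula: arguments are used in the order given.
    return t2.start_pos - t1.end_pos - 1

include = Token(start_pos=5, end_pos=5)
exclude_after = Token(start_pos=8, end_pos=8)   # two words after the include
exclude_before = Token(start_pos=2, end_pos=2)  # two words before the include

print(word_distance_unsorted(include, exclude_after))   # prints 2
print(word_distance_unsorted(include, exclude_before))  # prints -4
```

Both excludes are two words away from the include, yet the second call returns -4, which would pass an "at most n" test for any non-negative n.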
</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>19764</commentid>
    <comment_count>2</comment_count>
    <who name="Mary Holstege">holstege</who>
    <bug_when>2008-04-10 15:09:03 +0000</bug_when>
    <thetext>Intuitive or not, this is a deliberate decision. In the face of overlapping tokens, the absolute value is not particularly more intuitive, and the absolute value gives the wrong answer. We order by token positions to produce determinate results, so I propose we close this bug with no action.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>19771</commentid>
    <comment_count>3</comment_count>
    <who name="Michael Dyck">jmdyck</who>
    <bug_when>2008-04-10 17:43:58 +0000</bug_when>
    <thetext>Avoiding the term &quot;absolute value&quot;, the problem is that, depending on the order in which you pass two args to fts:wordDistance(), it will (in general) return two different results, only one of which is correct. The onus is on the caller to pass the args in the order that delivers the correct result. But it does not always do so, as pointed out in the original comment.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>19778</commentid>
    <comment_count>4</comment_count>
    <who name="Thomas Baby">thomas.baby</who>
    <bug_when>2008-04-10 21:39:12 +0000</bug_when>
    <thetext>Thanks for your comments, Mary!

As I mentioned in the bug description, and as elaborated on by Michael Dyck, we seem to have an issue when handling a mix of stringIncludes and stringExcludes. So, until there is a resolution, I don&apos;t think the bug can be closed. </thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>19799</commentid>
    <comment_count>5</comment_count>
    <who name="Thomas Baby">thomas.baby</who>
    <bug_when>2008-04-15 02:40:04 +0000</bug_when>
    <thetext>The resolution is to modify functions xxDistance (xx=word, para, or sentence) to sort their inputs:

declare function fts:wordDistance (
             $tokenInfo1 as element(fts:tokenInfo),
             $tokenInfo2 as element(fts:tokenInfo) )
   as xs:integer
{
   (: Ensure tokens are in order :)
   let $sorted := 
     for $ti in ($tokenInfo1, $tokenInfo2) 
     order by $ti/@startPos ascending, $ti/@endPos ascending
     return $ti
   return
     (: -1 because we count starting at 0 :)
     $sorted[2]/@startPos - $sorted[1]/@endPos - 1
};
            

declare function fts:paraDistance (
             $tokenInfo1 as element(fts:tokenInfo),
             $tokenInfo2 as element(fts:tokenInfo) )
   as xs:integer 
{
   (: Ensure tokens are in order :)
   let $sorted := 
     for $ti in ($tokenInfo1, $tokenInfo2) 
     order by $ti/@startPos ascending, $ti/@endPos ascending
     return $ti
   return
     (: -1 because we count starting at 0 :)
     $sorted[2]/@startPara - $sorted[1]/@endPara - 1
};
            

declare function fts:sentenceDistance (
             $tokenInfo1 as element(fts:tokenInfo),
             $tokenInfo2 as element(fts:tokenInfo) )
   as xs:integer
{
   (: Ensure tokens are in order :)
   let $sorted := 
     for $ti in ($tokenInfo1, $tokenInfo2) 
     order by $ti/@startPos ascending, $ti/@endPos ascending
     return $ti
   return
     (: -1 because we count starting at 0 :)
     $sorted[2]/@startSent - $sorted[1]/@endSent - 1
};
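
As a quick sanity check, the sorting step can be mirrored in a small Python sketch. The Token tuple and the word_distance name are hypothetical stand-ins for fts:tokenInfo and fts:wordDistance; only the sort key and the subtraction follow the XQuery above.

```python
from collections import namedtuple

# Hypothetical stand-in for an fts:tokenInfo element (positions only).
Token = namedtuple("Token", ["start_pos", "end_pos"])

def word_distance(t1, t2):
    # Mirrors "order by @startPos ascending, @endPos ascending".
    first, second = sorted([t1, t2], key=lambda t: (t.start_pos, t.end_pos))
    # -1 because we count starting at 0
    return second.start_pos - first.end_pos - 1

a = Token(start_pos=2, end_pos=2)
b = Token(start_pos=5, end_pos=5)

# The result no longer depends on argument order.
print(word_distance(a, b), word_distance(b, a))  # prints 2 2
```

With the inputs sorted first, the caller no longer has to pass the tokens in document order. Overlapping tokens can still give a negative value, but it is the same value for either argument order.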
</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>19800</commentid>
    <comment_count>6</comment_count>
    <who name="Thomas Baby">thomas.baby</who>
    <bug_when>2008-04-15 02:43:57 +0000</bug_when>
    <thetext>The changes to the functions resolve the issue. So, closing the bug.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>