<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>5509</bug_id>
          
          <creation_ts>2008-02-28 19:02:20 +0000</creation_ts>
          <short_desc>[FT] Semantics of distance wrt overlapping tokens at same starting position is unclear</short_desc>
          <delta_ts>2008-04-04 15:53:30 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XPath / XQuery / XSLT</product>
          <component>Full Text 1.0</component>
          <version>Working drafts</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows NT</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Mary Holstege">holstege</reporter>
          <assigned_to name="Mary Holstege">holstege</assigned_to>
          
          
          <qa_contact name="Mailing list for public feedback on specs from XSL and XML Query WGs">public-qt-comments</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>19217</commentid>
    <comment_count>0</comment_count>
    <who name="Mary Holstege">holstege</who>
    <bug_when>2008-02-28 19:02:20 +0000</bug_when>
<thetext>FT semantics defines distance by ordering tokens by their starting positions.
However, if tokens overlap and have the same starting position but different
ending positions, their relative order, and hence the computed distance, is
indeterminate.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>19218</commentid>
    <comment_count>1</comment_count>
    <who name="Mary Holstege">holstege</who>
    <bug_when>2008-02-28 19:14:35 +0000</bug_when>
    <thetext>Proposal:
(1) Change the semantics functions to order by startPos, endPos
This still leads to some non-determinism when the startPos and endPos are 
identical.  However, for the purposes of distance calculations, this is 
irrelevant.

(2) In addition, and this relates to bug #4715 more than to this bug, we could
say something like the following in section 2:
&quot;Tokens are ordered by their starting positions and, if necessary, their ending positions.&quot;
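
For illustration only, the ordering in (1) could be sketched as below. This is
a minimal Python sketch, not the actual semantics functions: the Token record
and the order_tokens helper are hypothetical names, and only startPos and
endPos come from the proposal.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical token record; field names follow the proposal."""
    text: str
    startPos: int
    endPos: int


def order_tokens(tokens):
    # Order by starting position, breaking ties by ending position.
    # sorted() is stable, so tokens with identical startPos and endPos
    # keep whatever relative order they had in the input.
    return sorted(tokens, key=lambda t: (t.startPos, t.endPos))


# Two overlapping tokens share starting position 1.
tokens = [
    Token("ab", 1, 2),
    Token("a", 1, 1),
    Token("b", 2, 2),
]
ordered = order_tokens(tokens)
print([t.text for t in ordered])
```

Tokens with identical startPos and endPos simply keep their input order in
this sketch, which is the residual non-determinism mentioned in (1).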

I will go ahead with (1), as we agree this is the right thing to do. Comments on (2)?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>19219</commentid>
    <comment_count>2</comment_count>
    <who name="Jim Melton">jim.melton</who>
    <bug_when>2008-02-28 19:28:55 +0000</bug_when>
<thetext>Thanks, Mary.  Your change to (1) is OK with me, and I agree that it is basically what we agreed to on the call.  I&apos;m OK with (2) as well.

On (1), you said &quot;This still leads to some non-determinism when the startPos and endPos are identical&quot;.  True, that&apos;s irrelevant for distance calculations.  Can you identify a place in the language or document where it is relevant? </thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>19220</commentid>
    <comment_count>3</comment_count>
    <who name="Mary Holstege">holstege</who>
    <bug_when>2008-02-28 19:34:21 +0000</bug_when>
<thetext>It can come into play when we talk about phrases, I think:
a phrase is an ordered sequence of tokens. However, I believe that this
non-determinism doesn&apos;t matter in practice, because phrase matching
is implementation-dependent anyhow, and we want to allow implementations
to come to their own conclusions about matching overlapping tokens in
such cases (cf. the Dampfschiffmumble example).</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>19507</commentid>
    <comment_count>4</comment_count>
    <who name="Mary Holstege">holstege</who>
    <bug_when>2008-03-17 19:40:06 +0000</bug_when>
    <thetext>
Bug was fixed as part of other work.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>