This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 5509 - [FT] Semantics of distance wrt overlapping tokens at same starting position is unclear
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Full Text 1.0
Version: Working drafts
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Mary Holstege
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2008-02-28 19:02 UTC by Mary Holstege
Modified: 2008-04-04 15:53 UTC
Description Mary Holstege 2008-02-28 19:02:20 UTC
FT semantics defines distance by ordering tokens by their starting positions.
However, if tokens overlap and have the same starting position but different
ending positions, their relative order, and therefore the computed distance,
is indeterminate.
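
A minimal sketch of the problem described above, in Python. The Token type, the gaps helper, and the sample positions are hypothetical and not taken from the spec; the gap formula (next start minus previous end minus 1) is an assumed simplification of the word-distance calculation. It shows that when tokens are ordered by starting position only, two tokens that tie on start can appear in either order, and the resulting gaps differ.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical token span; positions are inclusive token offsets."""
    start: int
    end: int


def gaps(tokens):
    """Assumed distance: next.start - previous.end - 1 for each adjacent pair."""
    return [b.start - a.end - 1 for a, b in zip(tokens, tokens[1:])]


short = Token(5, 5)   # e.g. one part of a compound word
long_ = Token(5, 6)   # overlapping token with the same start, later end
later = Token(9, 9)


def by_start(token):
    return token.start


# Python's sort is stable, so tokens that tie on start keep their input
# order. Two input orders therefore model two implementations that break
# the tie differently.
order_a = sorted([short, long_, later], key=by_start)
order_b = sorted([long_, short, later], key=by_start)

print(gaps(order_a))  # [-1, 2]
print(gaps(order_b))  # [-2, 3]
```

Under these assumptions, a constraint such as "distance at most 2" would hold for one tie order and fail for the other.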
Comment 1 Mary Holstege 2008-02-28 19:14:35 UTC
Proposal:
(1) Change the semantics functions to order by startPos, then endPos.
This still leaves the order non-deterministic when both startPos and endPos
are identical. However, for the purposes of distance calculations, this is
irrelevant.

(2) In addition, and this relates to #4715 more than this bug, we could in
section 2 say something like:
"Tokens are ordered by their starting positions and, if necessary, their ending positions." 

I will go ahead with (1) as we agree this is the right thing to do. Comments on (2)?
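
Proposal (1) can be sketched as follows, in Python. This is an illustration under stated assumptions, not the spec's semantics functions: the Token type, the ordered_gaps helper, and the sample positions are hypothetical, and the gap formula (next start minus previous end minus 1) is an assumed simplification of the distance calculation. Ordering by (startPos, endPos) gives the same gaps for every input order, and the remaining tie between tokens with identical start and end does not change the result.

```python
from itertools import permutations
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical token span; positions are inclusive token offsets."""
    start: int
    end: int


def ordered_gaps(tokens):
    """Order by (start, end), then apply the assumed gap formula."""
    ordered = sorted(tokens, key=lambda t: (t.start, t.end))
    return [b.start - a.end - 1 for a, b in zip(ordered, ordered[1:])]


# Two identical spans (5, 5) stand in for the case where startPos and
# endPos both tie, which proposal (1) leaves unordered.
tokens = [Token(5, 6), Token(5, 5), Token(9, 9), Token(5, 5)]

# Collect the gaps produced for every possible input order.
results = {tuple(ordered_gaps(p)) for p in permutations(tokens)}

print(results)  # {(-1, -1, 2)}
```

Swapping the two identical spans cannot affect the gaps because both contribute the same start and end values to the formula.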
Comment 2 Jim Melton 2008-02-28 19:28:55 UTC
Thanks, Mary.  Your change to (1) is OK with me, and I agree that it is basically what we agreed on the call.  I'm OK with (2) as well. 

On (1), you said "This still leads to some non-determinism when the startPos and endPos are identical".  True, that's irrelevant for distance calculations.  Can you identify a place in the language or document where it is relevant? 
Comment 3 Mary Holstege 2008-02-28 19:34:21 UTC
It can come into play when we talk about phrases, I think,
a phrase being an ordered sequence of tokens. However, I believe that
this non-determinism doesn't matter in practice, because phrase matching
is implementation-dependent anyhow, and we want to allow implementations
to come to their own conclusions about matching overlapping tokens in
such cases (cf. the Dampfschiffmumble example).
Comment 4 Mary Holstege 2008-03-17 19:40:06 UTC
Bug was fixed as part of other work.