This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 4715 - [FT] editorial: 3.5.3 Distance Selection
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Full Text 1.0
Version: Last Call drafts
Hardware: All All
Importance: P2 minor
Target Milestone: ---
Assignee: Pat Case
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2007-06-23 10:06 UTC by Michael Dyck
Modified: 2008-03-15 00:36 UTC

Description Michael Dyck 2007-06-23 10:06:52 UTC
3.5.3 Distance Selection

[1]
para 5
"The following rule applies to the computation of distance:
* Zero words (sentences, paragraphs) means adjacent tokens (sentences,
paragraphs)."
    This is out of place. It's plunked in the middle of an explanation
    of how an FTRange specifies a range of integer values.

[2]
"adjacent tokens (sentences, paragraphs)"
    [2a]
    What does it mean for two tokens (sentences, paragraphs) to be
    adjacent (or "consecutive", as appears elsewhere)?

    [2b]
    More generally, how does one compute the distance of a set of matches
    (in tokens or sentences or paragraphs)?

[3]
para 6
"Note: If M is greater then N"
    s/then/than/

[4]
example 5
"distance at most 2"
"distance at most 2"
"distance at least 20"
    These are missing the FTUnit ("words").
Comment 1 Jim Melton 2007-06-26 10:34:00 UTC
The FTTF considered your item [2] at its F2F in June, 2007.  We agreed that you have identified a question for which we do not have answers at present: in the face of overlapping tokens, for which we do not believe the specification can prescribe relative positions for all language possibilities, our formulae for computing distance in (for example) section 4.2.4 of the spec may not always be reasonable or correct.  This will require more thought and we will respond further to this bug report when we have something to report. 
Comment 2 Jochen Doerre 2007-11-13 13:57:07 UTC
[3] and [4] fixed
Comment 3 Pat Case 2008-01-24 15:46:57 UTC
[2] Resolved by removing "adjacent" and "consecutive" where relevant in the document.

[1] No change. Decided not to move the bullet because this is the first place that the concept of distance arises and it is appropriate to place this sentence here.

Michael, if you agree with the changes, please close the bug.
Comment 4 Michael Dyck 2008-02-16 09:25:17 UTC
(In reply to comment #3)
> [2] Resolved by removing "adjacent" and "consecutive" where relevant in the
> document.

Changing "adjacent tokens [etc]" to "no intervening tokens [etc]" doesn't resolve the problem, because "intervening" is no more defined than "adjacent" was.
   [2a] Given 2 matches, what does it mean for there to be no intervening
        tokens/sentences/paragraphs?
   [2b] Given n>2 matches, what does it mean?
 
> [1] No change. Decided not to move the bullet because this is the first place
> that the concept of distance arises and it is appropriate to place this
> sentence here.

I disagree on both counts. How one computes distances in the search context is one thing, and how one expresses conditions on such distances is another, and there's no need to jumble them up.

More constructively, I suggest the following.

(1) Take:

    A distance selection may cross element boundaries when computing distance.

and merge it with the later sentence:

    The distances computed by a distance selection are not affected by the
    presence or absence of element boundaries in the text being searched.

The two are basically the same. The latter is perhaps slightly more informative, so you could just drop the former.

(2) Take:

    The following rule applies to the computation of distance:
    o Zero words (sentences, paragraphs) means no intervening tokens
      (sentences, paragraphs).

Reword it to something like:

    A distance of zero words (...) means ...

And add it to the "distances computed" para.

(3) Add sentences to answer [2a+b] above.

(4) If you like, move the whole para on computing distances earlier. It would fit roughly where the "Distance is specified" sentence is. E.g.:

    ... matched tokens and phrases satisfy the specified distance conditions.

    Distances in the search context are measured in units of tokens,
    sentences, or paragraphs. Roughly speaking, the distance between
    two matches is the number of intervening units, so a distance of
    zero tokens (...) means no intervening tokens (...)
    More precisely, ...
    {sentence re element boundaries}
    {sentence re stop words}

    An FTDistance expresses a distance condition in terms of an FTUnit
    and an FTRange. An FTUnit can be <code>words</code>, <code>sentences</code>,
    or <code>paragraphs</code>, where <code>words</code> refers
    to token distances.  An FTRange specifies a range of integer values,
    providing a minimum and maximum value for the distance in question.
    Each one of ...
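
Editorial illustration, not part of the original comment or of the specification: a minimal Python sketch of the "roughly speaking" reading proposed above, in which the distance between two matches is the number of intervening tokens and an FTRange supplies an optional minimum and maximum. The names `token_distance`, `in_range`, and `satisfies_distance` are hypothetical. The sketch assumes matches are non-overlapping spans of 1-based token positions, and it assumes one possible answer to [2b], namely that for n>2 matches the condition is checked on each successive pair. It does not address the overlapping-token case raised in comment 1.

```python
from typing import Optional, Sequence, Tuple

# A match is modelled as (first_token_pos, last_token_pos), 1-based, inclusive.
# This representation is an assumption made for illustration only.
Match = Tuple[int, int]


def token_distance(a: Match, b: Match) -> int:
    """Count the tokens strictly between two non-overlapping matches."""
    first, second = sorted([a, b])
    # Overlapping matches would yield a negative value; not handled here.
    return second[0] - first[1] - 1


def in_range(value: int, minimum: Optional[int], maximum: Optional[int]) -> bool:
    """Check a value against an FTRange-like pair of optional bounds."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


def satisfies_distance(matches: Sequence[Match],
                       minimum: Optional[int],
                       maximum: Optional[int]) -> bool:
    """Assumed reading of [2b]: every successive pair must be in range."""
    ordered = sorted(matches)
    return all(in_range(token_distance(x, y), minimum, maximum)
               for x, y in zip(ordered, ordered[1:]))
```

Under these assumptions, two matches at token positions 1 and 2 have distance zero (no intervening tokens), so they would satisfy a condition such as "distance at most 2 words".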
Comment 5 Michael Dyck 2008-03-15 00:36:03 UTC
The suggested changes of Comment #4, with some amendments, were approved at FTTF meeting 167, and I have committed these to the document. This resolves the original points [1] and [2], and thus resolves the bug.