4698 – [FT] editorial: 2.1 Processing Model

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Summary: [FT] editorial: 2.1 Processing Model

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Full Text 1.0
Version:	Last Call drafts
Hardware:	All All

Importance:	P2 minor
Target Milestone:	---
Assignee:	Pat Case
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2007-06-23 09:55 UTC by Michael Dyck
Modified:	2008-03-03 01:32 UTC
CC List:	0 users

See Also:

Description Michael Dyck 2007-06-23 09:55:36 UTC

2.1 Processing Model

[1]
section
    I think the spec might be better off with the contents of this section
    put elsewhere. E.g., the stuff on tokenization can be merged into 4.1;
    pretty much everything else is specific to full-text contains
    expressions, so it can be merged into 2.2.1.

[2]
para 1
"As part of the External Processing that is described in the XQuery
Processing Model, when an XML document is parsed into an Infoset/PSVI and
ultimately into a XQuery Data Model instance, a full-text process called
tokenization is usually executed."
    With respect to the Processing Model, tokenization is *not* part of
    external processing, because:
    (a) there's no allowance for tokens in the Data Model, and
    (b) the only place/time where the thing-to-be-tokenized and the
        options-by-which-to-tokenize-it are guaranteed to come together
        is within the query at evaluation time.
    (Implementations may be able to statically determine [or guess] some
    combinations, and so do pre-tokenization, but that's not something
    that is [or should be] captured in the Processing Model.) Replace the
    para with something like:
        "At various points in full-text processing, the processor is
        called upon to 'tokenize' a string."

[3]
para 3
'including the definition of the term "words"'
    Delete. (Avoid using the term "words".)

[4]
"interprete"
    Change to "interpret".

[5]
list 1
"2. ... the containment hierarchy (e.g., paragraphs contain sentences,
which contain words)"
    I think you mean "i.e.", not "e.g.". (If that's just an *example* of a
    containment hierarchy, then who gets to define the actual hierarchy
    that the tokenizer must preserve?)

[6]
para 5
"evaluated within the normal Query Processing (XQuery Processing Model),"
    Odd. Delete "the"? De-capitalize "Query Processing"?
    Is the parenthesized text supposed to be a link?
    Could just delete the whole quoted phrase; it doesn't seem relevant.

[7]
list 2
"3. ... which contents may be ignored"
    [7a]
    s/which contents/whose contents/

    [7b]
    s/may/must/

[8]
para 8 (2nd after diagram)
"Tokenization normally occurs at the time of parsing of the original XML
documents, for example, during the Data Model Generation process"
    That may be true in the real world, but not in the Processing Model.
    See my comment for para 1 above.

[9]
para 9, 11, ...
"Full Text expression"
    When this section refers to a "Full Text expression", it specifically
    means a full-text contains expression. Might as well be specific.

[10]
list 3
"1. ... the set of search context items"
    s/set/sequence/

[11]
"2. Evaluate the (optional) ignore expression, resulting in the set of
ignored nodes and virtually delete the ignore nodes from the search
context nodes tree."
    [11a]
    The ignore option must be evaluated for each search context item, so
    2 should be the new 4a.

    [11b]
    s/ignore expression/ignore option/

    [11c]
    s/nodes and virtually/nodes, and virtually/ (or "nodes. Virtually")

    [11d]
    s/ignore nodes/ignored nodes/

    [11e]
    s/the search context nodes tree/the search context item/

[12]
"4a. Apply the tokenization algorithm"
    In terms of the processing model, you can't do tokenization at this
    level. Each different FTPrimaryWithOptions within the FTSelection
    is allowed to have different FTMatchOptions, some of which affect
    tokenization. So theoretically, each FTWords causes its own
    tokenization of the search context item.

[13]
'4b. Evaluate the simple "FTWord" operators'
    s/FTWord/FTWords/

[14]
'against the tokenized input'
    s/input/context item/
    ("input" suggests an external document)

[15]
"4c. ... in a bottom up fashion"
    s/bottom up/bottom-up/

[16]
"At each step the AllMatches instance produced by the previous steps"
    s/instance/instances/

[17]
"and a new instance of the AllMatches"
    s/instance of the AllMatches/AllMatches instance/

[18]
"the FTMatchOptions are controlling the semantics"
    s/are controlling/control/

[19]
"5. Convert the AllMatches instance"
    s/the AllMatches instance/the topmost AllMatches instances/
    (since each search context item results in one topmost AllMatches
    instance)
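The evaluation order that items [11a], [12], and [19] argue for can be sketched in Python. This is a minimal, non-normative sketch under clearly hypothetical assumptions: `MatchOptions`, its `split_hyphens` flag, the toy `tokenize`, the flattened `(element_name, text)` item model, and the rule "a non-empty AllMatches counts as a match" are all made up for illustration and are not taken from the spec. It shows the ignore option applied once per search context item, each FTWords tokenizing the item with its own options, one topmost AllMatches per item, and an existential result over the search context (assumed here to mirror FTContainsExpr).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MatchOptions:
    # Hypothetical stand-in for FTMatchOptions: one made-up flag representing
    # "some match option that affects tokenization".
    split_hyphens: bool = True


# Hypothetical flattened search context item: (element_name, text) pairs.
Item = list[tuple[str, str]]
# Simplified AllMatches: token positions where the phrase starts.
AllMatches = list[int]
Selection = Callable[[str], AllMatches]


def tokenize(text: str, options: MatchOptions) -> list[str]:
    # Toy tokenizer, not the spec's: optionally treat hyphens as separators.
    if options.split_hyphens:
        text = text.replace("-", " ")
    return text.split()


def apply_ignore(item: Item, ignored: set[str]) -> Item:
    # As in [11a]: evaluated for each search context item; content of ignored
    # elements is "virtually deleted" from that item only.
    return [(name, text) for name, text in item if name not in ignored]


def ftwords(phrase: str, options: MatchOptions) -> Selection:
    # As in [12]: each FTWords tokenizes the context item with its own options,
    # so there is no single up-front tokenization shared by all operands.
    def evaluate(text: str) -> AllMatches:
        item_tokens = tokenize(text, options)
        query_tokens = tokenize(phrase, options)
        n = len(query_tokens)
        return [i for i in range(len(item_tokens) - n + 1)
                if item_tokens[i:i + n] == query_tokens]
    return evaluate


def ft_contains(items: list[Item], selection: Selection, ignored: set[str]) -> bool:
    # Existential over the search context: true if some item yields a match.
    for item in items:
        kept = apply_ignore(item, ignored)
        text = " ".join(t for _, t in kept)
        top = selection(text)  # one topmost AllMatches per item, as in [19]
        if top:  # simplified conversion: non-empty AllMatches means a match
            return True
    return False


items = [
    [("title", "full-text search"), ("note", "draft")],
    [("p", "plain retrieval")],
]
print(ft_contains(items, ftwords("text search", MatchOptions(True)), set()))   # True
print(ft_contains(items, ftwords("text search", MatchOptions(False)), set()))  # False
print(ft_contains(items, ftwords("draft", MatchOptions()), {"note"}))          # False
print(ft_contains(items, ftwords("draft", MatchOptions()), set()))             # True
```

The two "text search" calls show why tokenization cannot be hoisted above the FTWords level: the same item yields different tokens under different options. The two "draft" calls show the ignore option acting on each item separately.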

Comment 1 Jim Melton 2007-08-17 21:14:05 UTC

The Task Force discussed your item [1] and determined that the spec is not actually incorrect as written (with respect to this topic) and that there were much more significant tasks awaiting attention from the editors.  The TF therefore resolved to close this item [1] with no changes to the document.  Since you were a participant in the discussions in which this decision was made, we presume that you are satisfied with the result.

Comment 2 Jim Melton 2007-09-13 22:53:35 UTC

As decided in meeting #152 (the minutes of which are at the member-only URI
http://lists.w3.org/Archives/Member/member-query-fttf/2007Sep/0005.html), items
[4], [5], [7a], [7b] (that is, all of item [7]), [11b], [11c], [11d], [13], [14], [15], [16], [17], [18], and [19] have been resolved. 

That leaves items [2a], [2b], [3], [8], [9], [10], [11a], [11e], and [12] to be resolved.

Comment 3 Pat Case 2007-10-01 19:21:28 UTC

Item [11e] changes made as recommended. Approved by FTTF on October 1, 2007. That leaves items [2a], [2b], [3], [8], [9], [10], [11a], and [12] to be
resolved.

Comment 4 Pat Case 2007-10-16 12:37:40 UTC

[3] Done previously.
[6] The FTTF agreed. Deleted (XQuery Processing Model).
[8] The FTTF agreed. Changed the 2nd para after the Processing Model diagram to:
Like all XQuery expressions, an FTContainsExpr returns an XDM Instance (see Fig. 1). With the exception of FTWords, which consumes TokenInfos, all full-text selections are closed under the AllMatches data model, i.e., their input and output are AllMatches instances. Tokenization transforms an XDM instance into TokenInfos, which ultimately get converted into AllMatches instances by the evaluation of full-text selections. Thus, the evaluation of nested full-text and XQuery expressions instances moves back and forth between these two models. 
[9] The FTTF agreed. Changed Full Text expression to FTContainsExpr.
[10] The FTTF agreed. Changed set to sequence.
[11a] The FTTF agreed. In the Processing Model list moved 2 (ignore option) to be a new 4a. 

Items [2a,b] and [12] remain to be resolved.

Comment 5 Pat Case 2008-01-24 22:04:10 UTC

[12] Accepted proposed rewrites for Section 4 with minor changes.
Then added after numbered list in 2.1:
(Note that a more detailed version of the above procedure appears in Section 4.3 FTContainsExpr.)

[2a-b] Delete 2.1 first sentence. Place corresponding information in 3b, something like:
Note that implementations may (for reasons of optimization) perform tokenization [as part of external etc]
 
The completion of the 2 items finishes the resolution of the bug.

MichaelD, once the changes are made in the document, please mark the bug closed.

Comment 6 Michael Dyck 2008-02-11 09:57:29 UTC

Reviewing the items of this issue...

[1]
Given all the surgery we've done on section 2.1, it seems to me there's now even less reason to separate sections 2.1 and 2.2. However, I realize that the previous determination (that having them separate is not actually incorrect) is still as valid.

[6]
> The FTTF agreed. Deleted (XQuery Processing Model).
But the remaining phrase "evaluated within the normal Query Processing" is still pretty odd. (E.g., it doesn't seem to be justified by any phrasing in the XQuery spec.)

[9]
> The FTTF agreed. Changed Full Text expression to FTContainsExpr.
There are still a few occurrences of "full-text expression" (meaning "full-text contains expression") in the section. Should these be changed?

I think all the other items have been put to rest.

Comment 7 Michael Dyck 2008-03-03 01:30:38 UTC

The points in comment #6 were discussed at FTTF meeting 164.

> [1]
> Given all the surgery we've done on section 2.1, it seems to me there's now
> even less reason to separate sections 2.1 and 2.2. However, I realize that
> the previous determination (that having them separate is not actually
> incorrect) is still as valid.

The Task Force is okay with it as is.

> [6]
> > The FTTF agreed. Deleted (XQuery Processing Model).
> But the remaining phrase "evaluated within the normal Query Processing" is
> still pretty odd. (E.g., it doesn't seem to be justified by any phrasing in
> the XQuery spec.)

The Task Force decided to delete:
    ", evaluated within the normal Query Processing,"

> [9]
> > The FTTF agreed. Changed Full Text expression to FTContainsExpr.
> There are still a few occurrences of "full-text expression" (meaning
> "full-text contains expression") in the section. Should these be changed?

The Task Force decided yes.

Comment 8 Michael Dyck 2008-03-03 01:32:40 UTC

I have applied the changes for [6] and [9] to the document, and so am marking this bug resolved-fixed. I will also close it.