This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.
2.1 Processing Model

[1] section
I think the spec might be better off with the contents of this section put elsewhere. E.g., the stuff on tokenization can be merged into 4.1; pretty much everything else is specific to full-text contains expressions, so can be merged into 2.2.1.

[2] para 1
"As part of the External Processing that is described in the XQuery Processing Model, when an XML document is parsed into an Infoset/PSVI and ultimately into a XQuery Data Model instance, a full-text process called tokenization is usually executed."
With respect to the Processing Model, tokenization is *not* part of external processing, because: (a) there's no allowance for tokens in the Data Model, and (b) the only place/time where the thing-to-be-tokenized and the options-by-which-to-tokenize-it are guaranteed to come together is within the query at evaluation time. (Implementations may be able to statically determine [or guess] some combinations, and so do pre-tokenization, but that's not something that is [or should be] captured in the Processing Model.) Replace the para with something like: "At various points in full-text processing, the processor is called upon to 'tokenize' a string."

[3] para 3
'including the definition of the term "words"'
Delete. (Avoid using the term "words".)

[4] "interprete"
Change to "interpret".

[5] list 1
"2. ... the containment hierarchy (e.g., paragraphs contain sentences, which contain words)"
I think you mean "i.e.", not "e.g.". (If that's just an *example* of a containment hierarchy, then who gets to define the actual hierarchy that the tokenizer must preserve?)

[6] para 5
"evaluated within the normal Query Processing (XQuery Processing Model),"
Odd. Delete "the"? De-capitalize "Query Processing"? Is the parenthesized text supposed to be a link? Could just delete the whole quoted phrase; it doesn't seem relevant.

[7] list 2
"3. ... which contents may be ignored"
[7a] s/which contents/whose contents/
[7b] s/may/must/

[8] para 8 (2nd after diagram)
"Tokenization normally occurs at the time of parsing of the original XML documents, for example, during the Data Model Generation process"
That may be true in the real world, but not in the Processing Model. See my comment for para 1 above.

[9] para 9, 11, ...
"Full Text expression"
When this section refers to a "Full Text expression", it specifically means a full-text contains expression. Might as well be specific.

[10] list 3
"1. ... the set of search context items"
s/set/sequence/

[11] "2. Evaluate the (optional) ignore expression, resulting in the set of ignored nodes and virtually delete the ignore nodes from the search context nodes tree."
[11a] The ignore option must be evaluated for each search context item, so 2 should be the new 4a.
[11b] s/ignore expression/ignore option/
[11c] s/nodes and virtually/nodes, and virtually/ (or "nodes. Virtually")
[11d] s/ignore nodes/ignored nodes/
[11e] s/the search context nodes tree/the search context item/

[12] "4a. Apply the tokenization algorithm"
In terms of the processing model, you can't do tokenization at this level. Each different FTPrimaryWithOptions within the FTSelection is allowed to have different FTMatchOptions, some of which affect tokenization. So theoretically, each FTWords causes its own tokenization of the search context item.

[13] '4b. Evaluate the simple "FTWord" operators'
s/FTWord/FTWords/

[14] 'against the tokenized input'
s/input/context item/ ("input" suggests an external document)

[15] "4c. ... in a bottom up fashion"
s/bottom up/bottom-up/

[16] "At each step the AllMatches instance produced by the previous steps"
s/instance/instances/

[17] "and a new instance of the AllMatches"
s/instance of the AllMatches/AllMatches instance/

[18] "the FTMatchOptions are controlling the semantics"
s/are controlling/control/

[19] "5. Convert the AllMatches instance"
s/the AllMatches instance/the topmost AllMatches instances/ (since each search context item results in one topmost AllMatches instance)
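To make the reordering requested in items [10], [11a], [12], and [19] concrete, here is a minimal Python sketch of the per-item procedure. Everything in it is a hypothetical simplification, not the spec's formal semantics: a search context item is modeled as a list of (element name, text) children, the ignore option as a set of element names, an AllMatches instance as a set of matches (each a frozenset of token positions), a single `case_sensitive` flag stands in for the FTMatchOptions that affect tokenization, FTAnd is reduced to pairwise union of matches, and step 5 is reduced to "the topmost AllMatches instance is non-empty". The names `Item`, `FTWords`, `FTAnd`, `tokenize`, `apply_ignore`, `evaluate`, and `ft_contains` are invented for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple, Union

# Hypothetical toy model of the reordered processing model; not the spec's semantics.
Match = FrozenSet[int]   # token positions forming one match
AllMatches = Set[Match]  # simplified AllMatches instance: a set of matches


@dataclass
class Item:
    """A search context item, modeled as (element name, text) children."""
    children: List[Tuple[str, str]]


@dataclass
class FTWords:
    phrase: str
    case_sensitive: bool = False  # stand-in for tokenization-affecting FTMatchOptions


@dataclass
class FTAnd:
    left: "Selection"
    right: "Selection"


Selection = Union[FTWords, FTAnd]


def tokenize(text: str, case_sensitive: bool) -> List[str]:
    """Toy tokenizer: whitespace split, with optional case folding."""
    tokens = text.split()
    return tokens if case_sensitive else [t.lower() for t in tokens]


def apply_ignore(item: Item, ignored: Set[str]) -> str:
    """Step 4a: virtually delete the ignored nodes from this one item."""
    return " ".join(text for name, text in item.children if name not in ignored)


def evaluate(sel: Selection, text: str) -> AllMatches:
    if isinstance(sel, FTWords):
        # Step 4b: each FTWords tokenizes the item with its *own* options ([12]).
        tokens = tokenize(text, sel.case_sensitive)
        query = tokenize(sel.phrase, sel.case_sensitive)
        n = len(query)
        return {frozenset(range(i, i + n))
                for i in range(len(tokens) - n + 1)
                if tokens[i:i + n] == query}
    # Step 4c: combine child AllMatches instances bottom-up; the result is
    # again an AllMatches instance (toy FTAnd = pairwise union of matches).
    left, right = evaluate(sel.left, text), evaluate(sel.right, text)
    return {a | b for a in left for b in right}


def ft_contains(items: List[Item], sel: Selection, ignored: Set[str]) -> bool:
    # items is the search context: a sequence, not a set ([10]).
    for item in items:
        top = evaluate(sel, apply_ignore(item, ignored))  # one topmost AllMatches per item
        if top:  # step 5, simplified: a non-empty AllMatches counts as true
            return True
    return False


doc = Item([("title", "Full Text"), ("note", "ignore me"), ("body", "the Processing Model")])
sel = FTAnd(FTWords("processing model"), FTWords("Full Text", case_sensitive=True))
print(ft_contains([doc], sel, ignored={"note"}))                             # True
print(ft_contains([doc], FTWords("full text", case_sensitive=True), set()))  # False
print(ft_contains([doc], FTWords("ignore me"), ignored={"note"}))            # False
```

Because each `FTWords` calls `tokenize` with its own options, one selection can see the same item as two different token sequences, which is why a single up-front tokenization step (the old 4a) cannot be right. Likewise, the ignore option has to be applied inside the per-item loop, since what gets virtually deleted depends on each search context item.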
The Task Force discussed your item [1] and determined that the spec is not actually incorrect as written (with respect to this topic) and that there were much more significant tasks awaiting attention from the editors. The TF therefore resolved to close this item [1] with no changes to the document. Since you were a participant in the discussions in which this decision was made, we presume that you are satisfied with the result.
As decided in meeting #152 (the minutes of which are at the member-only URI http://lists.w3.org/Archives/Member/member-query-fttf/2007Sep/0005.html), items [4], [5], [7a], [7b] (that is, all of item [7]), [11b], [11c], [11d], [13], [14], [15], [16], [17], [18], and [19] have been resolved. That leaves items [2a], [2b], [3], [8], [9], [10], [11a], [11e], and [12] to be resolved.
Item [11e] changes made as recommended. Approved by FTTF on October 1, 2007. That leaves items [2a], [2b], [3], [8], [9], [10], [11a], and [12] to be resolved.
[3] Done previously.
[6] The FTTF agreed. Deleted (XQuery Processing Model).
[8] The FTTF agreed. Changed the 2nd para after the Processing Model diagram to:
"Like all XQuery expressions, an FTContainsExpr returns an XDM Instance (see Fig. 1). With the exception of FTWords, which consumes TokenInfos, all full-text selections are closed under the AllMatches data model, i.e., their input and output are AllMatches instances. Tokenization transforms an XDM instance into TokenInfos, which ultimately get converted into AllMatches instances by the evaluation of full-text selections. Thus, the evaluation of nested full-text and XQuery expressions instances moves back and forth between these two models."
[9] The FTTF agreed. Changed Full Text expression to FTContainsExpr.
[10] The FTTF agreed. Changed set to sequence.
[11a] The FTTF agreed. In the Processing Model list, moved 2 (ignore option) to be a new 4a.
Items [2a,b] and [12] remain to be resolved.
[12] Accepted proposed rewrites for Section 4 with minor changes. Then added after the numbered list in 2.1: "(Note that a more detailed version of the above procedure appears in Section 4.3 FTContainsExpr.)"
[2a-b] Delete the first sentence of 2.1. Place corresponding information in 3b, something like: "Note that implementations may (for reasons of optimization) perform tokenization [as part of external etc]".
Completing these two items finishes the resolution of the bug. MichaelD, once the changes are made in the document, please mark the bug closed.
Reviewing the items of this issue...

[1] Given all the surgery we've done on section 2.1, it seems to me there's now even less reason to separate sections 2.1 and 2.2. However, I realize that the previous determination (that having them separate is not actually incorrect) is still as valid.

[6]
> The FTTF agreed. Deleted (XQuery Processing Model).
But the remaining phrase "evaluated within the normal Query Processing" is still pretty odd. (E.g., it doesn't seem to be justified by any phrasing in the XQuery spec.)

[9]
> The FTTF agreed. Changed Full Text expression to FTContainsExpr.
There are still a few occurrences of "full-text expression" (meaning "full-text contains expression") in the section. Should these be changed?

I think all the other items have been put to rest.
The points in comment #6 were discussed at FTTF meeting 164.

> [1]
> Given all the surgery we've done on section 2.1, it seems to me there's now
> even less reason to separate sections 2.1 and 2.2. However, I realize that
> the previous determination (that having them separate is not actually
> incorrect) is still as valid.

The Task Force is okay with it as is.

> [6]
> > The FTTF agreed. Deleted (XQuery Processing Model).
> But the remaining phrase "evaluated within the normal Query Processing" is
> still pretty odd. (E.g., it doesn't seem to be justified by any phrasing in
> the XQuery spec.)

The Task Force decided to delete: ", evaluated within the normal Query Processing,"

> [9]
> > The FTTF agreed. Changed Full Text expression to FTContainsExpr.
> There are still a few occurrences of "full-text expression" (meaning
> "full-text contains expression") in the section. Should these be changed?

The Task Force decided yes.
I have applied the changes for [6] and [9] to the document, and so am marking this bug resolved-fixed. I will also close it.