Bug 3739: [FT] Description of tokenization (Editorial/Technical)

Opened: 2006-09-18 19:22:58 +0000
Last modified: 2007-01-08 20:00:31 +0000
Classification: Unclassified
Product: XPath / XQuery / XSLT
Component: Full Text 1.0
Version: Working drafts
Platform: PC, Windows XP
Status: CLOSED FIXED
Priority: P2
Severity: normal
Reporter: holstege
Assigned to: doerre
QA contact: public-qt-comments

Comment 0 (11826), holstege, 2006-09-18 19:22:58 +0000

== Section 1.1 (Full-Text Search and XML)

Bullet 3, final paragraph:
"The tokenizer has to evaluate two equal strings..."

(1) Suggest replacing "evaluate" with some other word that doesn't carry the same implications in the XQuery context, perhaps "process".

(2) "equal" is troubling as well: equal as in XQuery equals in the face of a collation? Or codepoint-by-codepoint equal? I believe we mean the latter.

Bullets 4 and 5:
Should mention the relationship of markup to tokenization, particularly paragraph identification. I expect for most XML markup that it will be the markup, not white space, that identifies paragraph boundaries.

Comment 1 (12238), holstege, 2006-10-02 19:03:40 +0000

WG agreed with this comment on 2006-10-02.

Change "evaluate" to "process".
Change "equal" to "codepoint equal".
Modify bullet 5 with the sentences: "Semantic markup serves well as token boundaries. Some formatting markup serves well as token boundaries, for example, paragraphs are most commonly delimited by formatting markup. Other formatting markup may not serve well as token boundaries."

Comment 2 (12417), doerre, 2006-10-13 10:12:22 +0000

DONE as agreed.
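
For readers unfamiliar with the distinction raised in point (2), the following is a minimal Python sketch of the difference between codepoint-by-codepoint equality and a looser, collation-style equality. It is illustrative only and is not taken from the Full Text specification. The function names `codepoint_equal` and `loosely_equal` are hypothetical, and Unicode NFC normalization plus case folding is used merely as a stand-in for whatever a real collation might do.

```python
import unicodedata


def codepoint_equal(a: str, b: str) -> bool:
    """True only when both strings are the same sequence of code points."""
    return [ord(c) for c in a] == [ord(c) for c in b]


def loosely_equal(a: str, b: str) -> bool:
    """Hypothetical stand-in for collation-based equality.

    Assumption: NFC normalization followed by case folding approximates
    a collation that ignores case and composed/decomposed differences.
    A real collation may behave differently.
    """
    def fold(s: str) -> str:
        return unicodedata.normalize("NFC", s).casefold()

    return fold(a) == fold(b)


# The same visible word, spelled with different code point sequences.
composed = "caf\u00e9"     # 'e with acute' as one code point (U+00E9)
decomposed = "cafe\u0301"  # 'e' followed by a combining acute (U+0301)

if __name__ == "__main__":
    print(codepoint_equal(composed, decomposed))
    print(loosely_equal(composed, decomposed))
    print(codepoint_equal("Token", "token"))
    print(loosely_equal("Token", "token"))
```

Under the agreed wording "codepoint equal", only inputs that pass the first check are covered by the statement about the tokenizer. Strings that merely compare equal under some collation are not.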