This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 5122 - [FT] Section 4 Tokenization constraint
Summary: [FT] Section 4 Tokenization constraint
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Full Text 1.0
Version: Last Call drafts
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Jim Melton
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2007-10-01 19:40 UTC by Mary Holstege
Modified: 2007-10-11 21:40 UTC

Description Mary Holstege 2007-10-01 19:40:35 UTC
The definition of tokenization in section 4 includes the rule:

"The tokenizer MUST, when tokenizing two equal items, identify the same tokens in each."

This is too strong. The context in which the items arise may affect the
tokenization of those items. As a simple example: the parent elements of
two equal items may carry different xml:lang attributes. Other
implementation-specific configuration information may apply to ancestors
of the item and affect how the item itself is tokenized.
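
To make the xml:lang case concrete, here is a minimal sketch in Python. The tokenize function and both of its language rules are hypothetical, invented for illustration only; they are not taken from the Full Text draft or from any implementation. The point is just that two equal strings can reasonably yield different tokens once the inherited language differs.

```python
import re


def tokenize(text, lang):
    """Hypothetical language-sensitive tokenizer (illustration only)."""
    if lang.startswith("fr"):
        # Assumed French rule: break at apostrophes, so elided
        # articles become tokens of their own.
        pattern = r"[^\W_]+"
    else:
        # Assumed default rule: an apostrophe between letters stays
        # inside the token.
        pattern = r"[^\W_]+(?:'[^\W_]+)*"
    return re.findall(pattern, text.lower())


# Two equal items whose parents carry different xml:lang values.
item = "l'avion"
tokens_fr = tokenize(item, "fr")  # parent has xml:lang="fr"
tokens_en = tokenize(item, "en")  # parent has xml:lang="en"

print(tokens_fr)  # ['l', 'avion']
print(tokens_en)  # ["l'avion"]
```

Under the MUST wording, a tokenizer behaving like this would be non-conformant even though the two results are arguably each correct for their language.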
Comment 1 Mary Holstege 2007-10-11 21:40:03 UTC
Agreed to at f2f meeting 11 Oct 2007:
Change text to:
"The tokenizer SHOULD, when
tokenizing two equal items, identify the same tokens in each.
The cases where tokenization of two equal items does not
identify the same tokens in each are implementation-defined."