This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.
The definition of tokenization in section 4 includes the rule: "The tokenizer MUST, when tokenizing two equal items, identify the same tokens in each." This is too strong: the context in which an item occurs may affect how it is tokenized. As a simple example, the parent elements of two equal items may specify different xml:lang values. Other implementation-specific configuration that applies to an item's ancestors may also affect how the item itself is tokenized.
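The xml:lang case can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tokenize function, its lang parameter, and the two splitting rules are hypothetical and are not taken from the specification or from any implementation.

```python
import re

# Hypothetical language-sensitive tokenizer (illustration only, not from the spec).
def tokenize(text: str, lang: str) -> list[str]:
    """Split text into tokens using a rule chosen by the xml:lang in scope."""
    if lang.startswith("fr"):
        # Assumed French rule: an apostrophe separates an elided article from the noun.
        return re.findall(r"[^\W\d_]+", text)
    # Assumed default rule: an apostrophe inside a word does not end the token.
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text)

# Two equal items whose parent elements differ only in xml:lang.
item = "l'avion"
tokens_fr = tokenize(item, "fr")   # parent element has xml:lang="fr"
tokens_en = tokenize(item, "en")   # parent element has xml:lang="en"

print(tokens_fr)
print(tokens_en)
```

Under these assumed rules the same string yields two tokens in the French context and one token in the English context, which is the situation the MUST wording would forbid.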
Agreed to at f2f meeting 11 Oct 2007: Change text to: "The tokenizer SHOULD, when tokenizing two equal items, identify the same tokens in each. The cases where tokenization of two equal items does not identify the same tokens in each are implementation-defined."