<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>5122</bug_id>
          
          <creation_ts>2007-10-01 19:40:35 +0000</creation_ts>
          <short_desc>[FT] Section 4 Tokenization constraint</short_desc>
          <delta_ts>2007-10-11 21:40:17 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XPath / XQuery / XSLT</product>
          <component>Full Text 1.0</component>
          <version>Last Call drafts</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Linux</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Mary Holstege">holstege</reporter>
          <assigned_to name="Jim Melton">jim.melton</assigned_to>
          
          
          <qa_contact name="Mailing list for public feedback on specs from XSL and XML Query WGs">public-qt-comments</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>16956</commentid>
    <comment_count>0</comment_count>
    <who name="Mary Holstege">holstege</who>
    <bug_when>2007-10-01 19:40:35 +0000</bug_when>
    <thetext>The definition of tokenization in section 4 includes the rule:

&quot;The tokenizer MUST, when tokenizing two equal items, identify the same tokens in each.&quot;

This is too strong. The context in which the items arise may affect the
tokenization of those items. As a simple example, the parent elements of two
equal items may carry different xml:lang attributes. Other implementation-specific
configuration information may apply to ancestors of an item and affect
how the item itself is tokenized.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>17152</commentid>
    <comment_count>1</comment_count>
    <who name="Mary Holstege">holstege</who>
    <bug_when>2007-10-11 21:40:03 +0000</bug_when>
    <thetext>Agreed to at f2f meeting 11 Oct 2007:
Change text to:
The tokenizer SHOULD, when tokenizing two equal items, identify the
same tokens in each. The cases where tokenization of two equal items
does not identify the same tokens in each are implementation-defined.
</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>