<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>3739</bug_id>
          
          <creation_ts>2006-09-18 19:22:58 +0000</creation_ts>
          <short_desc>[FT] Description of tokenization (Editorial/Technical)</short_desc>
          <delta_ts>2007-01-08 20:00:31 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XPath / XQuery / XSLT</product>
          <component>Full Text 1.0</component>
          <version>Working drafts</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows XP</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Mary Holstege">holstege</reporter>
          <assigned_to name="Jochen Doerre">doerre</assigned_to>
          
          
          <qa_contact name="Mailing list for public feedback on specs from XSL and XML Query WGs">public-qt-comments</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>11826</commentid>
    <comment_count>0</comment_count>
    <who name="Mary Holstege">holstege</who>
    <bug_when>2006-09-18 19:22:58 +0000</bug_when>
    <thetext>== Section 1.1 (Full-Text Search and XML)
Bullet 3, final paragraph: 
&quot;The tokenizer has to evaluate two equal strings...&quot;
(1) Suggest replacing &quot;evaluate&quot; with a word that doesn&apos;t carry the
same implications in the XQuery context, perhaps &quot;process&quot;.
(2) &quot;equal&quot; is troubling as well: does it mean equal under XQuery equality
with a collation in effect, or codepoint-by-codepoint equal?  I believe we
mean the latter.

Bullets 4 and 5:
These should mention the relationship of markup to tokenization, particularly
paragraph identification.  I expect that in most XML documents it will be the
markup, not white space, that identifies paragraph boundaries.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>12238</commentid>
    <comment_count>1</comment_count>
    <who name="Mary Holstege">holstege</who>
    <bug_when>2006-10-02 19:03:40 +0000</bug_when>
    <thetext>WG agreed with this comment on 2006-10-02.

Change &quot;evaluate&quot; to &quot;process&quot;
Change &quot;equal&quot; to &quot;codepoint equal&quot;

Modify bullet 5 with the sentences:

&quot;Semantic markup serves well as token boundaries. Some formatting markup serves
well as token boundaries, for example, paragraphs are most commonly delimited
by formatting markup. Other formatting markup may not serve well as token
boundaries.&quot;
 
</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>12417</commentid>
    <comment_count>2</comment_count>
    <who name="Jochen Doerre">doerre</who>
    <bug_when>2006-10-13 10:12:22 +0000</bug_when>
    <thetext>DONE as agreed.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>