<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>1394</bug_id>
          
          <creation_ts>2005-05-12 17:47:23 +0000</creation_ts>
          <short_desc>Improvement to fn:tokenize function</short_desc>
          <delta_ts>2005-09-29 11:02:21 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>XPath / XQuery / XSLT</product>
          <component>Functions and Operators 1.0</component>
          <version>Last Call drafts</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows 2000</op_sys>
          <bug_status>CLOSED</bug_status>
          <resolution>WONTFIX</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>enhancement</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Mukul Gandhi">mukul_gandhi</reporter>
          <assigned_to name="Ashok Malhotra">ashok.malhotra</assigned_to>
          
          
          <qa_contact name="Mailing list for public feedback on specs from XSL and XML Query WGs">public-qt-comments</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>3307</commentid>
    <comment_count>0</comment_count>
    <who name="Mukul Gandhi">mukul_gandhi</who>
    <bug_when>2005-05-12 17:47:23 +0000</bug_when>
<thetext>I was reading the latest working draft of &quot;XQuery 1.0 and XPath 2.0 Functions 
and Operators&quot; at http://www.w3.org/TR/xpath-functions/.

I see room for improvement in the fn:tokenize function (described in section 
7.6.4).

At present, the tokenize function only breaks the input string into a sequence 
of strings; the delimiters themselves are discarded.

I&apos;ll illustrate the problem I am facing with an example (tested with 
Saxon 8.4).

I want to tokenize a string on any capital letter, so A, B, C, ... Z are the 
possible delimiters. I can solve this problem as below with the tokenize 
function (using a regular expression):

&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;xsl:stylesheet version=&quot;2.0&quot; xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&gt;

&lt;xsl:output method=&quot;text&quot; /&gt;

&lt;xsl:variable name=&quot;tempstr&quot; select=&quot;&apos;HelloThere&apos;&quot; /&gt;  
  
&lt;xsl:template match=&quot;/&quot;&gt;
  &lt;xsl:for-each select=&quot;tokenize($tempstr, &apos;[A-Z]&apos;)&quot;&gt;
    &lt;xsl:value-of select=&quot;.&quot; /&gt;&lt;xsl:text&gt; &lt;/xsl:text&gt;
  &lt;/xsl:for-each&gt;
&lt;/xsl:template&gt;
  
&lt;/xsl:stylesheet&gt;

This gives the output: ello here

That is fine, but I have no access to the current delimiter (it varies with 
each iteration).

I propose a function like &quot;fn:delim() as xs:string&quot; which would return the 
delimiter in context (conceptually similar to the position() function).

For example, I would be able to modify the above example as follows:
&lt;xsl:for-each select=&quot;tokenize($tempstr, &apos;[A-Z]&apos;)&quot;&gt;
  &lt;xsl:value-of select=&quot;delim()&quot; /&gt;&lt;xsl:value-of select=&quot;.&quot; /&gt;&lt;xsl:text&gt; &lt;/xsl:text&gt;
&lt;/xsl:for-each&gt;

This would return the output: Hello There

I think this would be useful.

Regards,
Mukul</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>3310</commentid>
    <comment_count>1</comment_count>
    <who name="Michael Kay">mike</who>
    <bug_when>2005-05-12 19:49:45 +0000</bug_when>
    <thetext>Thanks for the comment, Mukul. We did try to design a function that provided
this capability but found that it was too difficult to do as a pure function
because of the complexity of the result. Providing access to a secondary result
using an ancillary function delim() might seem natural in an XSLT context, but
it doesn&apos;t fit the stricter functional style of XPath and XQuery. XQuery avoids
such context-dependent functions because they make the job of the optimizer much
harder.

So instead we provided this functionality in XSLT through the xsl:analyze-string
instruction, which has two sub-instructions, matching-substring and
non-matching-substring. This is similar to tokenize() except that both the
tokens and the separators are returned (and you can also get access to the
matched subgroups within the matched pattern using the ancillary regex-group()
function).
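
As a rough, untested sketch of how that instruction could be applied to the
example in the original report (the variable name tempstr and the sample string
are taken from that report; the layout of the output is only an assumption
about what is wanted), something along these lines might do:

&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;xsl:stylesheet version=&quot;2.0&quot; xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&gt;

&lt;xsl:output method=&quot;text&quot; /&gt;

&lt;!-- Assumed input: the same sample string as in the original report --&gt;
&lt;xsl:variable name=&quot;tempstr&quot; select=&quot;&apos;HelloThere&apos;&quot; /&gt;

&lt;xsl:template match=&quot;/&quot;&gt;
  &lt;xsl:analyze-string select=&quot;$tempstr&quot; regex=&quot;[A-Z]&quot;&gt;
    &lt;!-- Each capital letter (the delimiter) is the context item here --&gt;
    &lt;xsl:matching-substring&gt;
      &lt;xsl:value-of select=&quot;.&quot; /&gt;
    &lt;/xsl:matching-substring&gt;
    &lt;!-- The text between delimiters, followed by a space --&gt;
    &lt;xsl:non-matching-substring&gt;
      &lt;xsl:value-of select=&quot;.&quot; /&gt;&lt;xsl:text&gt; &lt;/xsl:text&gt;
    &lt;/xsl:non-matching-substring&gt;
  &lt;/xsl:analyze-string&gt;
&lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;

For &apos;HelloThere&apos; this should write Hello There (with a trailing space), because
each delimiter is written out just before the text that follows it.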

You might also be interested to know that in Saxon I have provided the functionality of
xsl:analyze-string as an extension function saxon:analyze-string so that it is
available to XQuery users. This exploits Saxon&apos;s support for higher-order
functions: once XQuery supports higher-order functions in some future release it
will be much easier to design functions that do this kind of job.

This is a personal response; you will get an official response from the WGs in
due course.

Michael Kay</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>3267</commentid>
    <comment_count>2</comment_count>
    <who name="Mukul Gandhi">mukul_gandhi</who>
    <bug_when>2005-05-17 05:08:53 +0000</bug_when>
    <thetext>Hi Mike,
  Has there been any progress on this suggestion? I would like to know how 
the XSL WG felt about it. I am also keen to understand the implications, i.e. 
how this may or may not fit the functional programming style of XSLT 2.0 
(and what benefit the user would get). I am new to the functional programming 
style, so I&apos;d be grateful if you could elaborate on this topic.

Regards,
Mukul</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>3125</commentid>
    <comment_count>3</comment_count>
    <who name="Ashok Malhotra">ashok.malhotra</who>
    <bug_when>2005-05-19 22:21:53 +0000</bug_when>
    <thetext>There was not enough support for this enhancement during the discussion by the
joint QT WGs on 5/19/2005. It was suggested that the requested functionality
could be achieved via the xsl:analyze-string instruction in XSLT, as discussed
in the post by Michael Kay.

Ashok Malhotra</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>