1394 – Improvement to fn:tokenize function

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	CLOSED WONTFIX

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Functions and Operators 1.0
Version:	Last Call drafts
Hardware:	PC Windows 2000

Importance:	P2 enhancement
Target Milestone:	---
Assignee:	Ashok Malhotra
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2005-05-12 17:47 UTC by Mukul Gandhi
Modified:	2005-09-29 11:02 UTC
CC List:	0 users

Description Mukul Gandhi 2005-05-12 17:47:23 UTC

I was reading the latest working draft of "XQuery 1.0 and XPath 2.0 Functions 
and Operators" at http://www.w3.org/TR/xpath-functions/.

I feel that the fn:tokenize function (described in section 7.6.4) could be 
improved.

At present, the tokenize function breaks the input string into a sequence of 
strings and discards the separators that were matched.

I'll illustrate the problem I am facing with an example (tested with 
Saxon 8.4).

I want to tokenize a string by "any capital letter", so A, B, C, ..., Z are the 
possible delimiters. I can solve this problem as below with the tokenize 
function (using a regular expression):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text" />

<xsl:variable name="tempstr" select="'HelloThere'" />  
  
<xsl:template match="/">
  <xsl:for-each select="tokenize($tempstr, '[A-Z]')">
    <xsl:value-of select="." /><xsl:text> </xsl:text>
  </xsl:for-each>
</xsl:template>
  
</xsl:stylesheet>

This gives the output: ello here

That's fine, but I have no access to the current delimiter (it varies with 
each iteration).

I propose a function like "fn:delim() as xs:string" which would return the 
delimiter in context (it would be conceptually similar to the position() 
function).

For example, I would be able to modify the above example as follows:
<xsl:for-each select="tokenize($tempstr, '[A-Z]')">
  <xsl:value-of select="delim()" /><xsl:value-of select="." /><xsl:text> </xsl:text>
</xsl:for-each>

This would return the output: Hello There

I think this would be useful.

Regards,
Mukul

Comment 1 Michael Kay 2005-05-12 19:49:45 UTC

Thanks for the comment, Mukul. We did try to design a function that provided
this capability but found that it was too difficult to do as a pure function
because of the complexity of the result. Providing access to a secondary result
using an ancillary function delim() might seem natural in an XSLT context, but
it doesn't fit the stricter functional style of XPath and XQuery. XQuery avoids
such context-dependent functions because they make the job of the optimizer much
harder.

So instead we provided this functionality in XSLT through the xsl:analyze-string
instruction, which has two sub-instructions, matching-substring and
non-matching-substring. This is similar to tokenize() except that both the
tokens and the separators are returned (and you can also get access to the
matched subgroups within the matched pattern using the ancillary regex-group()
function).
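
A minimal sketch of how xsl:analyze-string might be applied to the example in
the description, assuming an XSLT 2.0 processor. The variable name $tempstr and
the pattern [A-Z] are taken from the description; the way the two branches are
combined is only one possible arrangement, not an official recommendation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text" />

<xsl:variable name="tempstr" select="'HelloThere'" />

<xsl:template match="/">
  <xsl:analyze-string select="$tempstr" regex="[A-Z]">
    <!-- Runs once per delimiter; "." is the capital letter that matched -->
    <xsl:matching-substring>
      <xsl:value-of select="." />
    </xsl:matching-substring>
    <!-- Runs once per token between delimiters; "." is the token text -->
    <xsl:non-matching-substring>
      <xsl:value-of select="." /><xsl:text> </xsl:text>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

</xsl:stylesheet>
```

For the input 'HelloThere' this should emit each delimiter immediately followed
by its token and a space, i.e. Hello There, which is the output the proposed
delim() function was meant to produce.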

You might also be interested that in Saxon I have provided the functionality of
xsl:analyze-string as an extension function saxon:analyze-string so that it is
available to XQuery users. This exploits Saxon's support for higher-order
functions: once XQuery supports higher-order functions in some future release it
will be much easier to design functions that do this kind of job.

This is a personal response; you will get an official response from the WGs in
due course.

Michael Kay

Comment 2 Mukul Gandhi 2005-05-17 05:08:53 UTC

Hi Mike,
  I am curious to know whether any progress has been made on my suggestion. I 
would like to know how the XSL WG felt about it. I am also keen to understand 
the implications, i.e. how this may or may not fit the functional programming 
style of XSLT (2.0), and what benefit the user would get. I am a newbie to the 
functional programming style, so I would be grateful if you could elaborate on 
this topic.

Regards,
Mukul

Comment 3 Ashok Malhotra 2005-05-19 22:21:53 UTC

There was not enough support for this enhancement during the discussion by the
joint QT WGs on 5/19/2005.  It was suggested that the requested functionality
could be achieved via the XSLT xsl:analyze-string instruction, as discussed in
the post by Michael Kay.

Ashok Malhotra