This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 10261 - [F&O] return value of fn:replace and fn:tokenize if $pattern does not match
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 3.0
Version: Working drafts
Hardware: PC Windows XP
Importance: P2 minor
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2010-07-29 14:29 UTC by Herbert Oppmann
Modified: 2010-10-20 07:18 UTC

Description Herbert Oppmann 2010-07-29 14:29:50 UTC
Regarding: Recommendation 23 January 2007, chapter 7.6.3 fn:replace and 7.6.4 fn:tokenize

The document does not explicitly describe what fn:replace and fn:tokenize shall return if the $pattern does not match the $input.

I tried it with Altova XMLSpy, the only implementation of XSLT 2.0 I have, and it returns $input unchanged.

This makes sense, and should be documented.
Comment 1 Michael Kay 2010-07-29 15:35:16 UTC
Thanks for the comment.

Concerning fn:replace, the summary says: "The function returns the xs:string that is obtained by replacing each non-overlapping substring of $input that matches the given $pattern with an occurrence of the $replacement string."

Admittedly this relies on an understanding of the English phrase "replacing a substring of a string", and in particular the implication that the rest of the string (outside that substring) is left unchanged; but given that expansion, I don't see how one can interpret this in any way other than meaning if there are no matches, no replacements are made, and therefore the original string is returned unchanged. (OK, perhaps it's also relying on the mathematician's understanding of "each" as opposed to the everyday understanding.)
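The "no matches, no replacements" reading can be illustrated with a small sketch. It uses Python's re module purely as an analogy; it is not XPath, its regular expression dialect differs from the one fn:replace uses, and the helper name replace_all is hypothetical.

```python
import re


def replace_all(text: str, pattern: str, replacement: str) -> str:
    """Replace each non-overlapping match of pattern in text.

    Hypothetical helper standing in for fn:replace; text outside the
    matched substrings is copied through untouched.
    """
    return re.sub(pattern, replacement, text)


# Two matches: each one is replaced, the rest of the string is kept.
print(replace_all("abracadabra", "bra", "*"))  # a*cada*

# No matches: nothing is replaced, so the input comes back unchanged.
print(replace_all("abracadabra", "xyz", "*"))  # abracadabra
```

With zero matches, "replacing each match" is vacuously satisfied, which is the mathematician's sense of "each" mentioned above.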

Similarly the summary of fn:tokenize is "This function breaks the $input string into a sequence of strings, treating any substring that matches $pattern as a separator. The separators themselves are not returned." 

Again this is informal language, but I think the only possible interpretation is that if there are no matches then there are no separators and therefore the result sequence contains a single string which is the same as the original.
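The same reading for fn:tokenize can be sketched in Python, again only as an analogy. The helper name tokenize is hypothetical, the pattern is assumed to contain no capturing groups (re.split would otherwise include them in the result), and it is assumed not to match a zero-length string.

```python
import re
from typing import List


def tokenize(text: str, pattern: str) -> List[str]:
    """Split text on every match of pattern; separators are dropped.

    Hypothetical stand-in for fn:tokenize. A zero-length input yields
    an empty result, mirroring the rule that fn:tokenize returns the
    empty sequence for a zero-length string.
    """
    if text == "":
        return []
    return re.split(pattern, text)


# Separators present: the pieces between them are returned.
print(tokenize("a, b, c", r",\s*"))  # ['a', 'b', 'c']

# No separators: the result is a single string equal to the input.
print(tokenize("abc", r",\s*"))  # ['abc']
```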

My recommendation to the WG is to treat this as editorial, and to try and improve the text for the next ("1.1") version by making the language a bit more formal. In particular, I've been trying in 1.1 to ensure that the function summaries contain no information that does not appear explicitly in the rules, and these two functions don't satisfy that rule (in the 1.0/2.0 spec, very few did).
Comment 2 Herbert Oppmann 2010-07-30 09:41:33 UTC
Dear Michael,

thanks for your quick response on that. Your recommendation to the WG is perfectly OK for me.

I was just not sure if I can rely on these functions returning the input string unchanged if the pattern does not match, because this was a bit "between the lines" while all the other aspects of the behaviour of these functions were so clear and formal.

Thanks again.
Comment 3 Michael Kay 2010-07-30 10:18:06 UTC
An observation: in 1.1/2.1 we could define fn:tokenize(S, P, F) to be the value of fn:analyze-string(S, P, F)/*/fn:match/string().

Defining fn:replace(S, P, R, F) in terms of fn:analyze-string(S, P, F) is rather more challenging because of the complexities of captured groups, but it ought to be possible to do it.
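The general idea of deriving tokens from a list of matches can be sketched as follows. This is an assumption-laden illustration, not the definition proposed above: Python's re.finditer stands in for fn:analyze-string, the helper name tokenize_via_matches is hypothetical, and the pattern is assumed never to match a zero-length string. Because the matches are the separators, the sketch collects the text lying between successive matches, including the empty strings between adjacent separators.

```python
import re
from typing import List


def tokenize_via_matches(text: str, pattern: str) -> List[str]:
    """Build tokens from the gaps between successive matches of pattern."""
    if text == "":
        return []
    tokens = []
    position = 0  # end of the previous separator
    for match in re.finditer(pattern, text):
        # Text between the previous separator and this one is a token.
        tokens.append(text[position:match.start()])
        position = match.end()
    # Whatever follows the last separator is the final token. With no
    # matches at all, this is the whole input.
    tokens.append(text[position:])
    return tokens


print(tokenize_via_matches("a,b,,c", ","))  # ['a', 'b', '', 'c']
print(tokenize_via_matches("abc", ","))     # ['abc']
```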
Comment 4 Michael Kay 2010-10-19 16:00:14 UTC
Changed status to editorial, and version to 1.1; closing as FIXED on the basis that the editor has agreed to clarify the text (e.g. by adding examples) without changing the technical nature of the specification.

Originator: please mark as CLOSED to indicate your acceptance of this resolution.