24385 – [FO30] Unclarity in last-line resolution for fn:unparsed-text-lines()

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Functions and Operators 3.0
Version:	Proposed Recommendation
Hardware:	PC Windows NT

Importance:	P2 normal
Target Milestone:	---
Assignee:	Michael Kay
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2014-01-24 16:33 UTC by Abel Braaksma
Modified:	2014-02-19 09:17 UTC
CC List:	1 user

Description Abel Braaksma 2014-01-24 16:33:18 UTC

The last sentence under fn:unparsed-text-lines() [1] reads:

"but if the external resource ends with a newline sequence, no zero-length string will be returned as the last item in the result."

and the example given is:

fn:tokenize(fn:unparsed-text($href), '\r\n|\r|\n')[not(position()=last() and .='')]

Together, the explanation and the example are slightly ambiguous about input that ends with multiple empty lines. The text ("no zero-length string will be returned as the last item in the result") seems to imply that all trailing empty lines are removed, but the normative code snippet strips only the last empty line.

We have currently implemented this to strip at most one line from the end, and only if that line is empty. I think this is correct, but under another interpretation of the above, all empty lines at the end should be removed.

This also raises the question of the corner case where the input consists solely of empty lines. From the example code, zero or one empty line will return the empty sequence; more empty lines will return a sequence of empty strings, one less than the number of empty lines in the input.
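
To make the two readings concrete, here is a minimal Python sketch that emulates the quoted expression. It is not an XPath processor: the helper names tokenize, strip_one and strip_all are made up for this illustration, and the only fn:tokenize behaviour modelled is that a zero-length input yields an empty sequence.

```python
import re

# Same newline pattern as in the expression quoted above.
NEWLINE = re.compile(r"\r\n|\r|\n")

def tokenize(text):
    """Rough stand-in for fn:tokenize: zero-length input yields no tokens."""
    return NEWLINE.split(text) if text else []

def strip_one(text):
    """Reading 1 (the quoted expression): drop the last token only if it is empty."""
    tokens = tokenize(text)
    if tokens and tokens[-1] == "":
        tokens = tokens[:-1]
    return tokens

def strip_all(text):
    """Reading 2 (hypothetical): drop every trailing empty token."""
    tokens = tokenize(text)
    while tokens and tokens[-1] == "":
        tokens.pop()
    return tokens

# Print each sample followed by the result under each reading.
for sample in ["a\nb", "a\nb\n", "a\nb\n\n\n", "", "\n", "\n\n\n"]:
    print(repr(sample), strip_one(sample), strip_all(sample))
```

The two readings agree when the input ends with at most one newline sequence and diverge as soon as there are several trailing newlines or the input consists only of newlines.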

Comment 1 Michael Kay 2014-02-19 09:17:44 UTC

The WG examined this and agreed that the text as written was capable of being misinterpreted.

The intended meaning (in line with the usual behaviour of regular expressions in other languages) is that at most one trailing newline should be ignored. To clarify, the text in both 3.0 and 3.1 has been changed to read:

If there are two adjacent newline sequences, a zero-length string will be returned to represent the empty line; but if the external resource ends with the sequence x0A, x0D, or x0Dx0A, the result will be as if this final line ending were not present.
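
The clarified wording can be sketched in Python as "remove at most one final line ending, then split". This is only an illustration: the function name unparsed_text_lines is hypothetical, and it takes the text directly rather than an $href. One assumption is labelled in the code: a resource consisting of exactly one line ending is treated as a single empty line, which is what the expression quoted in the description yields; the clarified sentence does not spell out that case.

```python
import re

def unparsed_text_lines(text):
    """Split text into lines, ignoring at most one final line ending (sketch)."""
    if text == "":
        return []  # zero-length resource: no lines at all
    # Check x0D x0A first so that a final CRLF is removed as a single ending.
    for ending in ("\r\n", "\n", "\r"):
        if text.endswith(ending):
            text = text[:-len(ending)]
            break  # at most one ending is ignored
    # Assumption: if nothing is left, the resource was one empty line,
    # so the split below returns a single zero-length string.
    return re.split(r"\r\n|\r|\n", text)

print(unparsed_text_lines("a\n\nb\n"))  # adjacent newlines keep the empty line
print(unparsed_text_lines("a\nb\n\n"))  # only the last ending is ignored
```

Removing the ending before splitting avoids inspecting the final token afterwards, and it keeps any further trailing empty lines intact, as the WG's "at most one trailing newline" intent requires.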