This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 28598 - [FO31] (editorial) unclarity in description of negative starting position with fn:subsequence
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 3.1
Version: Candidate Recommendation
Hardware: PC Windows NT
Importance: P2 minor
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
 
Reported: 2015-05-04 07:31 UTC by Abel Braaksma
Modified: 2015-05-14 21:36 UTC

Description Abel Braaksma 2015-05-04 07:31:06 UTC
We had a lengthy discussion this morning about this seemingly trivial point. We concluded that the spec is correct, but that it is open to some ambiguity, hence this bug report / editorial enhancement request.

The FO30 and FO31 specs say, about a negative starting position:

   * If $startingLoc is zero or negative, the subsequence includes items from 
     the beginning of the $sourceSeq.

This is easily misread, as it was in our case, as meaning that if $startingLoc is zero or negative, it is taken as 1.

But after looking at the existing behavior of implementations and the expected test outcomes, the actual sequence slice taken is from position 1 to $length + $startingLoc (in other words, from the start to the difference between these two numbers).

If $length = 1, $start = 0, the first item is returned
If $length = 1, $start = -1, the empty seq is returned
If $length = 5, $start = -3, the first two items are returned

After looking again at the written-out function, we see that it does formally work that way. But we found the text vague:

- items are not always returned from the beginning of $sourceSeq
- the number of items is not necessarily equal to the $length argument

I'm not sure whether or how this can/should be improved, but I am rooting for something along these lines: 

   * If $startingLoc is zero or negative, the subsequence includes items from
     the start of the sequence to position = $length + $startingLoc, or the 
     empty sequence if the result is zero or negative.

In addition to this, it might also prove helpful, or add clarity, to have a statement like: if $length is zero or negative, the empty sequence is returned.
Comment 1 Michael Kay 2015-05-05 14:40:43 UTC
The text you are complaining about is in a Note. Notes are intended to clarify the rules. If it's not aiding your understanding, then I guess we should take it out. However, I'm reluctant - the way the rules are written doesn't make it easy for the reader to see the consequences of all combinations of arguments.
Comment 2 Abel Braaksma 2015-05-06 01:16:03 UTC
You are correct in that I read the Notes as mandatory additional rules. If I hadn't interpreted them as mandatory, I would probably have given less wait to the precise semantic interpretation.

I do feel as though the mentioned Note is confusing.

Also, I noticed that the 6th para under Notes starts with "As an exception to the previous two notes", of which the first is "If $length is not specified then...". 

These notes seem to contradict each other. The 4th para implies (because of "as an exception to...") that $length, if absent, is taken as +INF. But then the 6th note would invalidate that and return nothing in the edge-case call:

   fn:subsequence($seq, xs:double('-INF'))

But, unless I am very mistaken, I believe the above call is supposed to return all items. In fact, if only $start is given, and it is negative, all items will be returned (which, as such, is not among the notes, unless it is covered by the 3rd para, which is again the confusing one).

SUMMARY (of improvement suggestions / thoughts)
- para 3 "if $startingLoc..." seems incomplete (comment#0)
- para 6 "as an exception" seems to contradict, in part, para 4
- edge case -INF without length argument is not in the notes (or it is para 6)
- edge case negative or zero length is not in the notes
- edge case if $start or $length is NaN, nothing is returned
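
The infinity and NaN edge cases listed above come down to ordinary IEEE 754 double arithmetic. The sketch below is Python rather than XPath, and it assumes (this is not quoted in the thread) that the function tests each position against a lower bound of round($startingLoc) and, when $length is given, an upper bound of round($startingLoc) + round($length). The variable names are illustrative only.

```python
import math

neg_inf = float("-inf")
pos_inf = float("inf")
positions = range(1, 6)  # 1-based positions of a five-item sequence

# With both arguments -INF and +INF, the assumed upper bound is -INF + INF, which is NaN.
upper_bound = neg_inf + pos_inf

# Lower-bound test "start <= position" holds for every position when start is -INF.
lower_holds = all(neg_inf <= p for p in positions)

# Upper-bound test "position < NaN" is false for every position.
upper_holds = any(p < upper_bound for p in positions)

# NaN as a starting location fails the lower-bound test for every position.
nan_start_holds = any(float("nan") <= p for p in positions)
```

Under that assumption, a call with only a start of -INF has no upper-bound test and would keep every item, while adding a length of +INF would select nothing.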
Comment 3 Abel Braaksma 2015-05-06 01:17:01 UTC
> less wait
oops, s/wait/weight/
Comment 4 Michael Kay 2015-05-06 18:16:47 UTC
I suggest we delete notes 3, 4, and 5, and replace them with the following sentence modelled on what appears for "substring":

The function returns a sequence comprising those items of $sourceSeq whose index position (counting from one) is greater than or equal to the value of $startingLoc (rounded to an integer), and (if $length is specified) less than the sum of $startingLoc and $length (both rounded to integers). No error occurs if $startingLoc is zero or negative, or if $startingLoc plus $length exceeds the number of items in the sequence, or if $length is negative.
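
As a rough check of the sentence proposed above, here is a small Python model of it (not an XPath implementation). The names `xpath_round` and `subsequence` are hypothetical helpers, and the rounding is assumed to go half towards positive infinity. Read literally, the rule selects nothing for a start of 0 with a length of 1, and only the first item for a start of -3 with a length of 5, since the last selected position is start + length - 1. That is one item fewer than the counts quoted in the description, so the results are worth confirming against a real processor.

```python
import math

def xpath_round(x: float) -> float:
    """Round to the nearest integer, ties towards +infinity (assumed fn:round behaviour).
    NaN and infinities are returned unchanged."""
    if math.isnan(x) or math.isinf(x):
        return x
    f = math.floor(x)
    return float(f + 1 if x - f >= 0.5 else f)

def subsequence(source, starting_loc, length=None):
    """Keep items whose 1-based position p satisfies round(start) <= p and,
    when length is given, p < round(start) + round(length)."""
    start = xpath_round(float(starting_loc))
    if length is None:
        return [item for p, item in enumerate(source, start=1) if start <= p]
    end = start + xpath_round(float(length))
    return [item for p, item in enumerate(source, start=1) if start <= p < end]

seq = ["a", "b", "c", "d", "e"]
print(subsequence(seq, 0, 1))    # start 0, length 1
print(subsequence(seq, -1, 1))   # start -1, length 1
print(subsequence(seq, -3, 5))   # start -3, length 5
print(subsequence(seq, 4, 10))   # start + length beyond the end of the sequence
print(subsequence(seq, 2, -1))   # negative length
print(subsequence(seq, float("-inf")))  # only a start, and it is -INF
```

None of these calls raises an error, which matches the "No error occurs" clause of the proposed wording.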
Comment 5 Michael Kay 2015-05-13 08:35:19 UTC
I have rewritten the notes along the lines suggested.
Comment 6 Abel Braaksma 2015-05-14 21:36:24 UTC
Thanks, straightforward and clear, closing.