This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 29120 - [xslt 3.0] last() in a streamable xsl:merge
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XSLT 3.0
Version: Last Call drafts
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2015-09-10 21:26 UTC by Michael Kay
Modified: 2015-10-30 15:38 UTC

Description Michael Kay 2015-09-10 21:26:25 UTC
We appear to have no rule preventing the use of last() in a streamable xsl:merge, either when computing a merge key, or within xsl:merge-action. In the first case the value of last() is the number of items in the input stream; in the second case it is the number of groups (distinct merge key values). Neither can be evaluated without lookahead.

I think the problem can be solved by adding a fifth condition to the list in 15.4 Streamable Merging:

5. Neither the select attribute of any xsl:merge-key child of the xsl:merge-source element, nor the sequence constructor within the sibling xsl:merge-action element, contains any function call or named function reference to either of the functions fn:last or fn:function-lookup.
Comment 1 Michael Kay 2015-09-10 22:12:02 UTC
An observation: I discovered rather to my surprise that calling last() within xsl:merge-action in Saxon actually works today. It has the effect that the input file is processed twice.
Comment 2 Abel Braaksma 2015-09-23 13:04:15 UTC
We could say that the context size is unavailable and that any attempt to get the context size is an error while streaming. This would prevent extension functions or instructions from somehow making a call to fn:last().

Another observation is that it is therefore not possible to create a specific action on the last item of a merge. I.e., it is not possible to get a sum, a total or a summary at the end. I am not sure if accumulators can help circumvent this, because after the xsl:merge instruction they will be out of scope. We could consider a child <xsl:last-merge-action> (similar to xsl:on-completion), but at this stage we stopped adding new features ;). Or is there another workaround?

I don't think it is necessary to ban fn:function-lookup. Since the context is a grounded snapshot, one can circumvent such a ban by assigning the result of fn:function-lookup to a variable and calling that variable from within the xsl:merge-action. Hence I think prohibiting access to the context size may be an easier way to prevent this.
Comment 3 Abel Braaksma 2015-09-23 15:15:25 UTC
We also say in the same section 15.4:

> 3. The expression in the select attribute of that xsl:merge-source element
> has striding posture;

Which seems to already prohibit the use of fn:last(), which, if applied to anything but a grounded posture, makes the expression roaming and free-ranging.

Hence I don't think we need an extra rule for the use of fn:last().

I noticed some other things that may need addressing w.r.t. rule #3, I have reported them separately here: bug#29142.
Comment 4 Michael Kay 2015-09-28 10:42:48 UTC
Our streamability rules for xsl:merge currently impose no constraints on what you can write in xsl:merge-action; this is on the theory that the context item for xsl:merge-action is grounded (it's a sequence delivered by snapshot()). Although there's a good case for allowing test="position() eq last()" so one can detect the last group in the sequence of groups, permitting last() without restrictions would allow

<xsl:merge-action>
  <xsl:if test="last() = 27186">...</xsl:if>
</xsl:merge-action>

which requires knowledge of the number of groups even when processing the first group. As I mentioned, Saxon is actually handling this, by reading all the input twice, but I don't think this comes within our usual definition of streaming.
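For context, a complete stylesheet around the fragment above might look like the following sketch. It is illustrative only: the file names a.xml and b.xml, the event elements and the @id merge key are hypothetical, and the attribute name for-each-source is taken from the later XSLT 3.0 Recommendation (the draft discussed in this bug writes for-each-stream). The inputs are assumed to be pre-sorted on @id.

```xml
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical inputs: each file holds <event id="..."/> children of
       the root element, already sorted by @id. -->
  <xsl:template name="xsl:initial-template">
    <merged>
      <xsl:merge>
        <xsl:merge-source for-each-source="('a.xml', 'b.xml')"
                          select="/*/event"
                          streamable="yes">
          <xsl:merge-key select="@id"/>
        </xsl:merge-source>
        <xsl:merge-action>
          <!-- position() is the number of groups reached so far,
               so it can be maintained without lookahead. -->
          <group nr="{position()}"
                 key="{current-merge-key()}"
                 size="{count(current-merge-group())}"/>
          <!-- last() would be the total number of groups, which is only
               known once every input has been read to the end:
               <xsl:if test="last() = 27186">...</xsl:if> -->
        </xsl:merge-action>
      </xsl:merge>
    </merged>
  </xsl:template>

</xsl:stylesheet>
```

The commented-out xsl:if is the construct at issue: everything else in the merge action depends only on the current group and on the groups already seen.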

The problems are slightly different when it comes to computing a merge key. Here we are currently allowing

<xsl:merge-source for-each-stream="doc.xml" select="/*/*">
    <xsl:merge-key select="last() - 825"/>
</xsl:merge-source>

where we can't compute the merge key without knowing the number of items in the input.

The problems with last() actually go beyond xsl:merge. Consider:

<xsl:for-each select="copy-of(/*/transaction)">
  <xsl:value-of select="last()"/>
</xsl:for-each>

Here last() is allowed because the context posture is grounded. But we have the same problem, that in effect we have to materialize the entire result of copy-of(), rather than pipelining it one item at a time. I think this is consistent with our general approach - we also allow reverse(copy-of(/*/*)) - so in effect we're saying if you use copy-of on a streamed input sequence, the system may need to hold the entire result of copy-of in memory. But with xsl:merge, where we do an implicit snapshot(), holding the entire sequence of snapshots in memory would destroy the whole point.

I'm not sure whether comment #2 is suggesting that we make use of last() in these situations a dynamic error. That would be very different from our usual approach where streamability violations are detected statically.
Comment 5 Abel Braaksma 2015-09-28 13:40:43 UTC
> <xsl:merge-source for-each-stream="doc.xml" select="/*/*">
>    <xsl:merge-key select="last() - 825"/>

This is interesting. My instinct says this should be forbidden. Checking out the rules once more, I see that for streamability we do not prohibit anything in xsl:merge-key. That means that:

<xsl:merge-key select="../foo" />

is allowed. Considering that we take a snapshot, one could argue that this is essentially a grounded expression that will always select the empty sequence, but that hardly follows the principle of least surprise, and it isn't helping any useful case I can think of.

My proposal would be to add an extra rule to guaranteed-streamable rules for xsl:merge-source, something like:

    5. Every select expression of child xsl:merge-keys has a striding or 
       grounded posture and a motionless or consuming sweep when assessed 
       with the context item type and posture of the select expression of 
       the parent xsl:merge-source.

This rule will also take care of erroneous use of the fn:last() function, which is not allowed in a striding posture.

As presently written, it does not cover a context item type other than nodes (i.e. atomic types or function items). If we keep these disallowed because of fn:snapshot, it is better to prohibit them statically, so that the streaming rules apply to all allowed cases. Otherwise the fn:last() function would be allowed when xsl:merge-source selects non-nodes, only to hit a dynamic error later with fn:snapshot.

-----

> <xsl:for-each select="copy-of(/*/transaction)">
>   <xsl:value-of select="last()"/>
> </xsl:for-each>

While this exact construct is not problematic, it becomes problematic with /*/transaction/copy-of(). I have reported that separately: bug 29153, comment 0.

> I'm not sure whether comment #2 is suggesting that we make use of last() 
> in these situations a dynamic error. That would be very different from our 
> usual approach where streamability violations are detected statically.

No, I think we can statically determine whether or not the context-size is requested. In the case of binding a variable to the last() function, don't we already have a restriction on this (in the sense that binding to focus-dependent functions is disallowed)? But since this then becomes a *dynamic* function call, a dynamic error when invoking such a dynamic function call seems appropriate.

The alternative is to:
- ban fn:last()
- ban function-lookup
- ban all and any dynamic function calls
- ban inline functions (they can contain fn:last)

This seems a bit too much like killing a mosquito with a bazooka ;).
Comment 6 Michael Kay 2015-10-08 17:33:57 UTC
In discussion today we were inclined to a solution where if streamable="yes", xsl:merge-action uses a singleton focus. Note that the equivalent of position() can still be achieved if necessary using accumulators. However, no decision was made.
Comment 7 Michael Kay 2015-10-22 11:46:30 UTC
I don't think the effect of position() can be achieved using accumulators, because a call to position() within xsl:merge-action represents the number of groups that have been processed, which can only be determined in relation to the number of distinct merge keys across all the documents; accumulators can only depend on a single source document.

For evaluation of the merge key, we already decided in the resolution of bug #28762 to use a singleton focus in the case where it is streamable, and I propose we stick with that decision. (However, the resolution of 28762 currently appears in section 15.1 and it probably deserves a mention in 15.5).

For the focus within xsl:merge-action I propose to change bullet 3 of 15.7 from

"The context size is the number of groups, that is, the number of distinct sets of merge key values."

to:

"The context size is as follows:

* If any of the xsl:merge-source elements within the xsl:merge instruction specifies streamable="yes" (explicitly or implicitly), then *absent*.
   Note: this means that within the xsl:merge-action of a streamable xsl:merge, calling last() throws error XPDY0002

* Otherwise, the number of groups, that is, the number of distinct sets of merge key values."

This currently contradicts the statement in XPath 2.1.2: "If any component in the focus is defined, all components of the focus are defined." I propose to raise a separate issue on that.
Comment 8 Michael Kay 2015-10-22 22:21:48 UTC
The change (that is, setting context size to absent) was accepted and has been applied.
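Under the accepted wording, a stylesheet that really needs the number of groups would presumably have to give up streaming for that merge. The sketch below shows one way this might look; it is not taken from the spec. The file names, the event elements and the @id key are hypothetical, and for-each-source is the attribute name from the later Recommendation (for-each-stream in the draft discussed here). With streamable="no" on every merge source, the second bullet of the new text applies and the context size is the number of groups.

```xml
<xsl:stylesheet version="3.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical inputs: a.xml and b.xml, each pre-sorted by @id. -->
  <xsl:template name="xsl:initial-template">
    <merged>
      <xsl:merge>
        <!-- Not streamed, so the context size is defined in xsl:merge-action. -->
        <xsl:merge-source for-each-source="('a.xml', 'b.xml')"
                          select="/*/event"
                          streamable="no">
          <xsl:merge-key select="@id"/>
        </xsl:merge-source>
        <xsl:merge-action>
          <group key="{current-merge-key()}"/>
          <!-- Emit a summary once, after the final group. -->
          <xsl:if test="position() eq last()">
            <summary groups="{last()}"/>
          </xsl:if>
        </xsl:merge-action>
      </xsl:merge>
    </merged>
  </xsl:template>

</xsl:stylesheet>
```

Changing streamable to "yes" on the merge source would, per the note in comment 7, make both calls to last() raise XPDY0002.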