29153 – [XSLT30] have we created a loop-hole with windowed streaming and copy-of or snapshot?

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	XSLT 3.0
Version:	Last Call drafts
Hardware:	PC Windows NT

Importance:	P2 normal
Target Milestone:	---
Assignee:	Michael Kay
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Duplicates (1):	29161

Reported:	2015-09-28 13:35 UTC by Abel Braaksma
Modified:	2015-10-30 15:38 UTC
CC List:	0 users

Description Abel Braaksma 2015-09-28 13:35:13 UTC

This follows from the discussion in bug 29120, comment 4, where Michael wrote:

> <xsl:for-each select="copy-of(/*/transaction)">
>   <xsl:value-of select="last()"/>
> </xsl:for-each>

> Here last() is allowed because the context posture is grounded. But we have 
> the same problem, that in effect we have to materialize the entire result of 
> copy-of(), rather than pipelining it one item at a time. 

This does not pose a problem, because the whole argument will be consumed by copy-of(). But what if you rewrite it as follows, where the consumption should happen one element at a time?

<xsl:for-each select="/*/transaction/copy-of(.)">
  <xsl:value-of select="last()"/>
</xsl:for-each>

This is the "suggested" way of writing windowed streaming. Since we take a copy each time, the xsl:for-each gets a grounded posture. But the problem is not just with copy-of:

<xsl:for-each select="for $x in */* return string($x)">
  <xsl:value-of select="last()"/>
</xsl:for-each>

This, imo, also suggests that the user intends to stream it, but the only way to know the value of fn:last() is to evaluate the whole tree.
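
To illustrate the concern in isolation, here is a minimal Python sketch. It is not XSLT and makes no claim about how any processor works; the names transactions, for_each_with_position and for_each_with_last are hypothetical. It models a pipelined input as a generator and shows that a body using only position() can emit its first result after reading one item, whereas a body using last() has to read the whole input first.

```python
def transactions(n, read_log):
    """Hypothetical stand-in for a streamed input: yields one item at a time
    and records in read_log how far the input has been read."""
    for i in range(1, n + 1):
        read_log.append(i)
        yield f"transaction-{i}"


def for_each_with_position(items):
    """Body that only needs position(): can run fully pipelined."""
    for position, _item in enumerate(items, start=1):
        yield position


def for_each_with_last(items):
    """Body that needs last(): in this model the whole sequence must be
    buffered before the first iteration can produce a result."""
    buffered = list(items)
    last = len(buffered)
    for _item in buffered:
        yield last


log_a = []
first_position = next(for_each_with_position(transactions(5, log_a)))

log_b = []
first_last = next(for_each_with_last(transactions(5, log_b)))
```

After the first result is produced, log_a holds a single entry while log_b holds all five, which is the loss of pipelining described above.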

I checked the general streamability rules, and while we have exception rules for higher-order operands like xsl:for-each, they have no effect when the select expression selects a grounded result.

I think the only problem is with fn:last(). I therefore suggest we add a rule along these lines to section 19.8.9.14, Streamability of the last function (probably with a Note explaining why):

    x) if the fn:last function appears at any depth inside a higher-order 
       operand, then roaming and free-ranging.

This has a few nasty side-effects, so perhaps we can do better. For example, consider:

a) a/b[last()] (should be prohibited)
b) (1 to 10)[last()] (should be allowed)
c) xsl:for-each/@select="copy-of(x)", with last() in the sequence constructor (should be allowed?)
d) same with xsl:iterate
e) same with xsl:for-each-group
f) (for $e in a/b return copy-of($e))[last()]  (maybe allowed?)
g) a/b/copy-of(x)[last()] (should be prohibited, though can be made streamable)

Perhaps there are other scenarios to consider? In the event that we prohibit too much, someone can always create a copy of a node-set as a workaround and use count($x).

Comment 1 Michael Kay 2015-09-28 13:53:57 UTC

Perhaps the rule should be that last() is disallowed if its focus-setting container is consuming (i.e., even if the posture is grounded).

And perhaps the above rule should have an exception that a value comparison or general comparison is allowed if one operand is position() and the other is last(), e.g. 

if (position() = last()) ...

because with some limited lookahead that can be streamed easily enough.

Comment 2 Abel Braaksma 2015-09-28 15:39:56 UTC

> if (position() = last()) ...
>
> because with some limited lookahead that can be streamed easily enough.

In general, I agree, but I find it a tricky rule to get right. What about if(position() + 10 = last()) or if(@x = last())?

If the suggestion is to *only* allow the exact expression "position() = last()", then still, that can be written in a variety of similar ways.

If we want to allow that, we should perhaps instead consider one of the following:

a) introduce something like fn:is-last, in line with fn:has-children
b) allow xsl:on-completion inside xsl:for-each (in tail position)

I would opt for (a), as it will then be simpler to ban fn:last completely (you can use "if(is-last()) then position() else ()" as an alternative), or at least to ban it inside any higher-order context.

For a moment I thought this would also allow the now forbidden a/b[last()] as a/b[is-last()], but that won't work when combining "a/b[is-last()] | a/c[is-last()]". So the rule could be "is-last() is motionless and grounded in grounded and climbing postures, and roaming and free-ranging otherwise".

(I know we aren't adding any new features, but this is not meant as a feature request; the suggestion is meant to fix the bug.)

Comment 3 Michael Kay 2015-09-28 15:55:11 UTC

I think the rule "a value comparison or general comparison is allowed if one operand is position() and the other is last()" is easy enough to state, and easy enough to test for, and flexible enough to handle all the common ways of asking "is this the last item?".
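
As a rough illustration of how such a test might look, here is a small Python sketch. The tuple-based expression representation (operator, left, right) and the function name is_position_last_comparison are assumptions made purely for this example; they do not come from the specification or from any processor.

```python
# Operators of XPath value comparisons and general comparisons.
VALUE_COMPARISONS = {"eq", "ne", "lt", "le", "gt", "ge"}
GENERAL_COMPARISONS = {"=", "!=", "<", "<=", ">", ">="}


def is_position_last_comparison(expr):
    """Return True only if expr is a comparison whose two operands are
    exactly position() and last(), in either order.

    expr is a hypothetical (operator, left, right) tuple; operands are
    either strings such as "position()" or nested tuples."""
    if not (isinstance(expr, tuple) and len(expr) == 3):
        return False
    op, left, right = expr
    if op not in VALUE_COMPARISONS | GENERAL_COMPARISONS:
        return False
    return (left, right) in (
        ("position()", "last()"),
        ("last()", "position()"),
    )


allowed = is_position_last_comparison(("=", "position()", "last()"))
swapped = is_position_last_comparison(("ne", "last()", "position()"))
offset = is_position_last_comparison(("=", ("+", "position()", 10), "last()"))
attribute = is_position_last_comparison(("=", "@x", "last()"))
```

Under this reading of the rule, the two expressions questioned in comment 2, position() + 10 = last() and @x = last(), fall outside the exception, because one operand is not exactly position().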

Comment 4 Abel Braaksma 2015-09-30 01:47:06 UTC

Hmm... This is harder than I thought: what about applying templates with windowed streaming? I have raised this separately here: bug 29161.

Comment 5 Abel Braaksma 2015-09-30 02:11:47 UTC

> "a value comparison or general comparison is allowed if one operand is 
> position() and the other is last()" is easy enough to state,

Yes, I think you are right. Unfortunately, while pondering over it again, I think it cannot be done (I hope to be wrong though) because of accumulators. Consider:

<xsl:accumulator name="count" initial-value="0">
    <xsl:accumulator-rule match="*" select="$value + 1" />
</xsl:accumulator>

<xsl:template match="/">
    <xsl:for-each select="*/special/copy-of()">
        <xsl:if test="position() = last()">
            <last>{accumulator-after('count')}</last>
        </xsl:if>
        <xsl:if test="position() != last()">
            <elem>{accumulator-after('count')}</elem>
        </xsl:if>
    </xsl:for-each>
</xsl:template>

If the input stream is something like:

<root>
  <special />
  <foo />
  <special />
  <foo />
  <foo />
  ... 1000's more w/o special ...
  <foo />
  <special />
</root>

Then the output should be something like:

<elem>2</elem>
<elem>4</elem>
<last>2038543</last>

However, upon visiting each <special>, the input stream must proceed to the next <special> to peek whether or not the element is the last in the selection. By doing so, the accumulator function is called, leading to a different outcome, something like:

<elem>4</elem>
<elem>2038543</elem>
<last>2038543</last>

Without streaming this is not a problem. With streaming, however, we lose the accumulator value once we have moved past the node; to prevent that, we would have to keep track of each and every accumulated value along the way (iirc).
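
The two outcomes can be reproduced on a scaled-down input with a simplified Python model. This is only a sketch: the event list, the function names naive_peek and snapshot_on_visit, and the reduction of the accumulator to a plain counter over element names are all assumptions for illustration, and empty elements are assumed so that the value after the start tag equals accumulator-after.

```python
# Scaled-down version of the input: <root> followed by its children, in order.
EVENTS = ["root", "special", "foo", "special", "foo", "foo", "special"]


def naive_peek(events):
    """Reads the accumulator only after peeking ahead to the next <special>
    (or to the end of the input), so the reported value has moved on."""
    count = 0          # accumulator rule: match="*", select="$value + 1"
    pending = False    # is there a <special> waiting for its is-last answer?
    out = []
    for name in events:
        count += 1
        if name == "special":
            if pending:
                out.append(("elem", count))
            pending = True
    if pending:
        out.append(("last", count))
    return out


def snapshot_on_visit(events):
    """Remembers the accumulator value at the moment each <special> is
    visited, then reports it once it is known whether another follows."""
    count = 0
    pending = None     # accumulator value captured for the waiting <special>
    out = []
    for name in events:
        count += 1
        if name == "special":
            if pending is not None:
                out.append(("elem", pending))
            pending = count
    if pending is not None:
        out.append(("last", pending))
    return out


wrong = naive_peek(EVENTS)
expected = snapshot_on_visit(EVENTS)
```

In this model naive_peek yields 4, 7, 7, mirroring the shifted output above, while snapshot_on_visit yields 2, 4, 7. Note that the second variant has to hold on to one captured value per item still awaiting its answer.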

Comment 6 Michael Kay 2015-09-30 09:21:21 UTC

I don't see a problem with lookahead and accumulators.

Process A does the lookahead, that is:
repeat {
  read node N
  write node N-1
}

Process B reads the output of Process A

Process A knows whether the node it has just passed to Process B is the last in the sequence

Process B evaluates the accumulators.
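
A minimal Python sketch of this two-process arrangement follows. It is an interpretation of the outline above, not anyone's actual implementation: the names with_lookahead and process_b are hypothetical, the accumulator is reduced to a simple counter, and plain strings stand in for nodes.

```python
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def with_lookahead(items: Iterable[T]) -> Iterator[Tuple[T, bool]]:
    """Process A: read item N, then write item N-1 together with a flag
    saying whether it is the last item in the sequence."""
    iterator = iter(items)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # empty input: nothing to write
    for current in iterator:
        yield previous, False
        previous = current
    yield previous, True


def process_b(items: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Process B: reads the output of Process A and evaluates the
    accumulator (here a counter) on each item as it arrives."""
    count = 0
    results = []
    for item, is_last in with_lookahead(items):
        count += 1
        results.append(("last" if is_last else "elem", item, count))
    return results


output = process_b(["a", "b", "c"])
```

Because the counter lives entirely in Process B, it advances only when an item is handed over, regardless of how far Process A has read ahead.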

Comment 7 Michael Kay 2015-10-16 10:54:59 UTC

The issues raised here (and in related bugs) have been discussed by the WG in email and at telcons. I proposed that we add text explaining the general principles as drafted here:

https://lists.w3.org/Archives/Public/public-xsl-wg/2015Oct/0013.html

and the WG accepted this approach, with an action to take into account Abel's comments at

https://lists.w3.org/Archives/Public/public-xsl-wg/2015Oct/0016.html 

I have added this text as a new section 19.8.

I believe the bug can now be closed, but I leave it open for the moment to allow WG review.

Comment 8 Michael Kay 2015-10-16 11:03:36 UTC

*** Bug 29161 has been marked as a duplicate of this bug. ***

Comment 9 Michael Kay 2015-10-30 11:13:39 UTC

The WG reviewed this bug and determined that the actions already taken were adequate to mark it as resolved.