26328 – [XSLT30] Streamable and non-streamable accumulators in grounded postures

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	XSLT 3.0
Version:	Last Call drafts
Hardware:	PC Windows NT

Importance:	P2 normal
Target Milestone:	---
Assignee:	Michael Kay
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2014-07-14 12:53 UTC by Abel Braaksma
Modified:	2014-09-11 21:44 UTC
CC List:	0 users

Description Abel Braaksma 2014-07-14 12:53:52 UTC

We briefly discussed this during the latest telcon of the WG. The issue is the following. Suppose you have the following in your stylesheet:

<xsl:mode streamable="yes" />
<xsl:mode name="non-streamable" />

<!-- counting nodes -->
<xsl:accumulator name="acc" 
   initial-value="0"
   streamable="yes">

   <xsl:accumulator-rule 
      match="node()"
      phase="start"
      new-value="$value + 1" />
</xsl:accumulator>

<!-- counting nodes non-streamable -->
<xsl:accumulator name="acc-ns" 
   initial-value="0"
   streamable="no">

   <xsl:accumulator-rule 
      match="node()"
      phase="start"
      new-value="$value + 1" />
</xsl:accumulator>

<xsl:template match="*">
   <xsl:value-of select="accumulator-before('acc')" />
   <xsl:choose>
      <xsl:when test=". instance of text()">
         <xsl:apply-templates
            select="copy-of(.)"
            mode="non-streamable" />
      </xsl:when>
      <xsl:otherwise>
         <xsl:for-each select="copy-of(comment())">
            <!-- streamable accumulator? -->
            <xsl:value-of select="accumulator-before('acc')" />
            <!-- or non-streamable accumulator? -->
            <xsl:value-of select="accumulator-before('acc-ns')" />
         </xsl:for-each>
      </xsl:otherwise>
   </xsl:choose>
</xsl:template>

<xsl:template match="text()" mode="non-streamable">
   <!-- streamable accumulator? -->
   <xsl:value-of select="accumulator-before('acc')" />
   <!-- or non-streamable accumulator? -->
   <xsl:value-of select="accumulator-before('acc-ns')" />
</xsl:template>

Streamable accumulators are defined such that they can only be used in streamable contexts. The example above demonstrates the questions that arise. I am unsure whether the current rules prevent or allow the use of accumulators in a grounded context, or in non-streamable modes called with copies of nodes from a streamable document.

I recognize the following situations:
1) calling a streamable accumulator when a streamed node, not copied, is the context node (first call above).

2) calling a streamable accumulator when a copy of a streamed node is the context node (inside the for-each above).

3) calling a streamable accumulator when a copy of a streamed node is the context node and the construct is inside a declaration that is outside the scope of streamability analysis (the last matching template above).

4) calling a non-streamable accumulator when a copy of a streamed node is the context node (inside the for-each above).

5) calling a non-streamable accumulator when a copy of a streamed node is the context node and the construct is inside a declaration that is outside the scope of streamability analysis (the last matching template above).

In the specification (internal draft) we currently say under 18.2.1: 

"An accumulator is applicable to a particular document if the pattern supplied in the applies-to attribute matches the first element node in the document [...]"

We don't say anything, I think, about temporary documents. If we create a copy of a node, are the accumulated values reset? Do both streaming and non-streaming accumulators apply to it? Or should they turn up empty? Or does the copy of a node include the accumulated values that have been calculated for the in-scope nodes?

Comment 1 Abel Braaksma 2014-07-31 13:09:01 UTC

The WG discussed this at F2F Hursley 30 July 2014.

The WG concluded that there were some issues that needed resolving, but not necessarily as mentioned in this bug entry.

> 1) calling a streamable accumulator when a streamed node, not 
> copied, is the context node (first call above).

This is allowed, but only inside the focus-setting constructs xsl:stream and xsl:template. The WG agreed that it would be beneficial to extend this to xsl:for-each, xsl:iterate and xsl:for-each-group.

> 2) calling a streamable accumulator when a copy of a streamed 
> node is the context node (inside the for-each above).
> 3) calling a streamable accumulator when a copy of a streamed 
> node is the context node and the construct is inside a declaration 
> that is outside the scope of streamability analysis (the last matching 
> template above).

Neither is allowed: the rules stipulate that the context node must be the same node as a streamed node of the focus-setting container (which is currently xsl:template or xsl:stream only). A copy of a node is never the same node as the original.

> 4) calling a non-streamable accumulator when a copy of a streamed 
> node is the context node (inside the for-each above)

Uncertain (question: is it possible to let a non-streamable accumulator operate on a temporary tree?)

> 5) calling a non-streamable accumulator when a copy of a 
> streamed node is the context node and the construct is inside 
> a declaration that is outside the scope of streamability 
> analysis (the last matching template above).

Same as above: unclear at the moment.

Comment 2 Abel Braaksma 2014-07-31 14:55:10 UTC

On 4) 5)
This is actually allowed and will return the calculated accumulator value. Accumulators apply to copies of nodes. There is an ambiguity in the text, which says "documents" when it means "any tree". However, this is being tackled elsewhere and will be covered with a disclaimer on the word "document".

New case: merging
The merge-group is grounded, even in streamed mode. This means that no streamable accumulator can be used and any non-streaming accumulator will only be applicable to the snapshot of the merge-group.

The WG has considered this and decided not to pursue allowing streamable accumulators with merging.

The WG has considered allowing accumulator-before/after inside xsl:iterate and xsl:for-each, but dismissed xsl:for-each-group and xsl:function for reasons of being hard to define pre/post-descent.

Comment 3 Michael Kay 2014-07-31 14:57:20 UTC

Resolved on 31 July 2014, subject to detailed textual proposal, to extend the definition of "controlling sequence constructor" for accumulator-before/after to cover the sequence constructors within xsl:for-each and xsl:iterate, so that streamable accumulator functions can be called within these instructions.
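
A minimal sketch of the kind of call this resolution is meant to permit. It is written in the syntax of the final XSLT 3.0 Recommendation (xsl:source-document, select and use-accumulators) rather than the draft syntax discussed in this thread (xsl:stream, new-value), and the file name input.xml, the element name item and the accumulator name item-count are assumptions made purely for illustration:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical accumulator: adds 1 each time the start of an item element is seen. -->
  <xsl:accumulator name="item-count" as="xs:integer"
                   initial-value="0" streamable="yes">
    <xsl:accumulator-rule match="item" phase="start" select="$value + 1"/>
  </xsl:accumulator>

  <xsl:template name="xsl:initial-template">
    <positions>
      <!-- input.xml is an assumed file name; it is read in streaming mode. -->
      <xsl:source-document href="input.xml" streamable="yes"
                           use-accumulators="item-count">
        <!-- The context item inside xsl:for-each is a streamed item element. -->
        <xsl:for-each select="*/item">
          <position>
            <xsl:value-of select="accumulator-before('item-count')"/>
          </position>
        </xsl:for-each>
      </xsl:source-document>
    </positions>
  </xsl:template>
</xsl:stylesheet>
```

The point of the sketch is only that the accumulator function is called with a streamed node as context item inside the sequence constructor of xsl:for-each, which is the case the extended definition of "controlling sequence constructor" is intended to cover.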

Comment 4 Michael Kay 2014-08-04 09:52:10 UTC

I'm applying changes to give effect to this decision but I think it needs a little more work.

Abel asks whether a streamable accumulator should apply to a non-streamed document. I think that it should, in the same way that a streamable template can be applied to a non-streamed document. "Streamable" means that it can be streamed, not that it must be streamed. Using a non-streamable accumulator on a streamed document should be an error, but using a streamable accumulator on a non-streamed document should be fine; it should behave just as if it were a non-streamable accumulator.
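
As a sketch of the "can be streamed, not must be streamed" reading, the stylesheet below declares an accumulator with streamable="yes" but processes the source in an ordinary, non-streamable mode. It uses the syntax of the final XSLT 3.0 Recommendation, not the draft syntax of this thread, and the names item and item-count are hypothetical:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- The default mode is not declared streamable, so the source is an ordinary tree.
       use-accumulators makes the accumulator applicable to that tree. -->
  <xsl:mode use-accumulators="item-count"/>

  <!-- Declared streamable: a statement of capability, not an obligation to stream. -->
  <xsl:accumulator name="item-count" as="xs:integer"
                   initial-value="0" streamable="yes">
    <xsl:accumulator-rule match="item" select="$value + 1"/>
  </xsl:accumulator>

  <xsl:template match="/">
    <counts>
      <!-- Free navigation with // is fine here because nothing is streamed. -->
      <xsl:apply-templates select="//item"/>
    </counts>
  </xsl:template>

  <xsl:template match="item">
    <n><xsl:value-of select="accumulator-before('item-count')"/></n>
  </xsl:template>
</xsl:stylesheet>
```

Under the reading proposed above, this behaves exactly as if the accumulator had been declared with streamable="no".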

I am trying to rationalize the rules that control when you can call accumulator functions on streamed nodes. If we ignore for the moment the problem of templates (etc.) that choose not to consume the streamed input, then the rule appears to be simpler than we are making it: in the case of accumulator-before, you must be "in the course of" evaluating an instruction whose context item is the same as the context item of the accumulator-before call, and that has a following-sibling instruction that is consuming. For accumulator-after, replace following-sibling by preceding-sibling.

This leaves the question of sequence constructors that process a streamed node without consuming it, for example those that process text nodes. In this case we classify all the instructions as both pre-descent and post-descent. This sounds reasonable, but it causes trouble if you call accumulator-after before calling accumulator-before, especially in the case where you are processing an element that actually has children (i.e. where the children are skipped) in which case you need to consume the input stream implicitly before doing the accumulator-after() call. I'm looking for some way to make accumulator-after "conditionally consuming" to handle such cases. One crude way would be an xsl:skip instruction that consumes the input and returns nothing; we could then require accumulator-before to come before this instruction and accumulator-after to come after it.

Comment 5 Michael Kay 2014-08-14 16:55:01 UTC

In discussion:

(a) if the template isn't consuming, we should be able to use either acc-before() or acc-after() at any point - this means the before-value has to be remembered.

(b) we could give effect to this by saying (i) acc-before must not have a preceding sibling that is consuming; (ii) acc-after must not have a following-sibling that is consuming.

Comment 6 Michael Kay 2014-08-21 15:03:18 UTC

I note also that the definitions of pre-descent and post-descent allow evaluation of accumulators within xsl:if or xsl:choose, whereas the rules for accumulator-before and accumulator-after do not.

To implement comment #5, I think we can change the rules for accumulator before as follows:

If the context item is a node in a streamed document, then a number of restrictions (called the dynamic pre-conditions) apply:

* The accumulator must be declared with streamable="yes"

* The call on accumulator-before must be made in the course of evaluating some sequence constructor SC that has the following properties:

** The element E that immediately contains SC must be one of xsl:stream, xsl:template, xsl:for-each, or xsl:iterate.

** The sweep of SC must be consuming or motionless, and its posture must be striding or crawling.

** The context item for evaluation of SC must be the same as the context item for evaluation of the accumulator-before function call.

* Every instruction J that has the properties that (a) the call on accumulator-before is made in the course of evaluating J, and (b) the context item for evaluating J is the same as the context item for evaluation of accumulator-before, must be one of the following:

** an instruction that is not consuming and that has no preceding-sibling instruction that is consuming, or

** a consuming xsl:if instruction, provided that the test expression is not consuming

** a consuming xsl:choose instruction, provided that:

*** if the call is in the course of evaluating a test expression, then neither that test expression nor any preceding test expression is consuming;

*** if the call is in the course of evaluating an xsl:when sequence constructor, then neither the test expression on that xsl:when element nor any preceding test expression is consuming;

*** if the call is in the course of evaluating an xsl:otherwise sequence constructor, then no test expression on any xsl:when element is consuming

The rules for accumulator-after become:

If the context item is a node in a streamed document, then a number of restrictions (called the dynamic pre-conditions) apply:

* The accumulator must be declared with streamable="yes"

* The call on accumulator-after must be made in the course of evaluating some sequence constructor SC that has the following properties:

** The element E that immediately contains SC must be one of xsl:stream, xsl:template, xsl:for-each, or xsl:iterate.

** The sweep of SC must be consuming or motionless, and its posture must be striding or crawling.

** The context item for evaluation of SC must be the same as the context item for evaluation of the accumulator-after function call.

* Every instruction J that has the properties that (a) the call on accumulator-after is made in the course of evaluating J, and (b) the context item for evaluating J is the same as the context item for evaluation of accumulator-after, must be one of the following:

** an instruction that is not consuming and that has no following-sibling instruction that is consuming, or

** a consuming xsl:if instruction, provided that the evaluation occurs in the course of evaluating the sequence constructor contained in the xsl:if instruction

** a consuming xsl:choose instruction, provided that one of the following applies:

*** the call is in the course of evaluating a test expression, provided that neither that test expression, nor any following test expression, nor the sequence constructor in any subsequent branch of the xsl:choose is consuming;

*** the call is in the course of evaluating an xsl:when sequence constructor, provided that no subsequent branch of the xsl:choose has a test expression or contained sequence constructor that is consuming;

*** the call is in the course of evaluating an xsl:otherwise sequence constructor.

Comment 7 Michael Kay 2014-08-21 17:24:10 UTC

Discussed extensively during today's telcon, but without coming to a conclusion. The rules proposed in comment #6 appear excessively complicated. They could be simplified by not allowing use in conditional instructions, but this would be rather restrictive. Could we have a simpler rule that is decidable statically, even if it is more restrictive? It was felt that a simpler rule would be desirable so long as users have a workaround for how to handle all reasonable cases, e.g. by binding the accumulator value to a variable.

MK pointed out that what we are essentially doing is saying that acc-before must come "before" the descent, and acc-after must come "after", and we are defining some rules that amount to defining a partial order of execution; at least to the extent that we say "an implementation must deliver the same results as if the order of execution satisfied the following constraints". (The main constraint being that sequence constructors are evaluated left-to-right, but also some constraints on conditional instructions.)

ABr suggested we could reduce the complexity of the rules by half if we don't constrain accumulator-before (which requires implementations to "remember" the accumulator-before value on a stack, in case it's needed during the post-descent phase).

Comment 8 Michael Kay 2014-09-11 19:27:50 UTC

After considerable email discussion the WG today agreed a solution to the remaining problems. An outline of the solution is at https://lists.w3.org/Archives/Member/w3c-xsl-wg/2014Sep/0016.html (member-only) with some refinements recorded in the minutes of today's telcon. The essence of the solution is:

* accumulator-before is unrestricted; it's a property of a node (like name()) that can be accessed at any time. The implication is that the system (conceptually at least) calculates the acc-before value as soon as it encounters the start tag of the node, and keeps the value until it hits the end tag.

* accumulator-after is restricted by streamability rules, which have the effect of ensuring that it is not streamable if it appears in a sequence constructor prior to a consuming instruction.

* accumulators are allowed to invoke each other. A cycle is an error. Implementations are allowed to report the error statically if they can. If it isn't detected statically (e.g. because the accumulator names passed to the functions are not string literals) then in the event of a cycle the processor is allowed to fail catastrophically, analogous to the kind of failure permitted for infinite function or template recursion (e.g. stack overflow or non-termination).
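
The first two points of this solution can be illustrated by a streamable template that reads accumulator-before up front and accumulator-after only once the descent is complete. This is a sketch under stated assumptions: it uses the syntax of the final XSLT 3.0 Recommendation rather than the draft syntax of this thread, and the element names section and para and the accumulator name para-count are hypothetical:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:mode streamable="yes" use-accumulators="para-count"
            on-no-match="shallow-skip"/>

  <!-- Hypothetical accumulator: counts para elements seen so far. -->
  <xsl:accumulator name="para-count" as="xs:integer"
                   initial-value="0" streamable="yes">
    <xsl:accumulator-rule match="para" select="$value + 1"/>
  </xsl:accumulator>

  <xsl:template match="section">
    <!-- accumulator-before is unrestricted, so it can be bound to a variable first. -->
    <xsl:variable name="before" as="xs:integer"
                  select="accumulator-before('para-count')"/>
    <!-- The single consuming instruction: this is the descent into the section. -->
    <xsl:apply-templates/>
    <!-- accumulator-after follows the consuming instruction, as the rules require.
         Moving this call above xsl:apply-templates is the case the rules exclude. -->
    <xsl:value-of select="accumulator-after('para-count') - $before"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For each section the sketch writes the number of para elements it contains, computed as the difference between the post-descent and pre-descent values. Binding the pre-descent value to a variable is the kind of workaround mentioned in comment 7.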

Comment 9 Michael Kay 2014-09-11 21:44:49 UTC

The changes have been applied to the spec.