This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 29507 - [xslt30] A problem case for streamed grouping
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XSLT 3.0
Version: Candidate Recommendation
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
 
Reported: 2016-02-26 11:52 UTC by Michael Kay
Modified: 2016-10-06 18:42 UTC

Description Michael Kay 2016-02-26 11:52:05 UTC
I'm having difficulty with this test case:

  <xsl:template name="g-008" use-when="true() or $RUN">
    <out>
      <xsl:stream href="../docs/books.xml">
        <xsl:fork>
          <xsl:for-each-group select="($extra, /BOOKLIST/BOOKS/ITEM)" group-by="@CAT">
            <CAT ID="{current-grouping-key()}">
              <xsl:copy-of select="current-group()/PRICE"/>
            </CAT>
          </xsl:for-each-group>
        </xsl:fork>
      </xsl:stream>
    </out>
  </xsl:template>

It seems to be guaranteed-streamable according to the spec, but in practice streaming it is very difficult.

The problem is that current-group()/PRICE needs sorting into document order, because current-group() is not guaranteed to be in document order already: it is not known whether the nodes in $extra come before or after the nodes in /BOOKLIST/BOOKS/ITEM in document order.

Now arguably, we know that the subset of nodes in current-group()/PRICE which are streamed nodes will be in document order relative to each other, so one could devise a strategy that takes this into account. But this is pretty hard to achieve.

I'd prefer to make this one non-streamable, but I'm not sure of the best way of changing the rules to make it so.
Comment 1 Michael Kay 2016-02-26 12:07:02 UTC
I think the same problem might apply to the much simpler expression

($extra, /BOOKLIST/BOOKS/ITEM)/PRICE

Under the General Streamability Rules (GSR), the posture of the LHS is striding, and I think we tend to assume that when an expression is striding, its nodes are delivered in document order, which is not the case here.
Comment 2 Michael Kay 2016-02-26 14:35:47 UTC
I have written a test case to demonstrate the problem: sx-commaExpr-201, which does

<xsl:copy-of select="($extraItem, /BOOKLIST/BOOKS/ITEM) / PRICE"/>

Note our definition of striding:

[Definition: Striding: indicates that the result of a construct is a sequence of nodes, in document order, that are peers in the sense that none of them is an ancestor or descendant of any other.]

I think we're best off sticking with this definition, which means that an expression shouldn't be classified as striding if the results are not in document order.
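
As an informal illustration of the definition quoted above, its two conditions can be checked mechanically. The sketch below is not taken from the spec: it assumes a hypothetical encoding in which each node of a single tree is the tuple of child positions leading to it from the root, so that ordinary tuple comparison coincides with document order. The names `Node`, `is_ancestor` and `is_striding` are invented for this example.

```python
from typing import Sequence, Tuple

# Hypothetical encoding: a node is the tuple of 1-based child positions
# leading to it from the root, e.g. (1, 2) is the 2nd child of the 1st child.
Node = Tuple[int, ...]


def is_ancestor(a: Node, b: Node) -> bool:
    """True if a is a proper ancestor of b (a is a strict prefix of b)."""
    return len(a) < len(b) and b[:len(a)] == a


def is_striding(nodes: Sequence[Node]) -> bool:
    """Check both conditions of the quoted definition within one tree."""
    # Condition 1: strictly increasing document order (no duplicates).
    in_order = all(x < y for x, y in zip(nodes, nodes[1:]))
    # Condition 2: peers, i.e. no node is an ancestor of any other.
    peers = not any(is_ancestor(a, b) for a in nodes for b in nodes if a != b)
    return in_order and peers
```

Under this model `is_striding([(1, 1), (1, 2)])` holds, while `is_striding([(1, 2), (1, 1)])` does not, because the second sequence is out of document order even though its nodes are peers.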

It's probably the comma operator that is the main offender here - but it relies on the GSR, so perhaps the GSR is wrong.

Currently (A, B) is striding if A is striding and B is grounded, or vice versa: under GSR 2(d)(iv), if one operand is grounded and motionless and the other is striding and consuming, then the posture and sweep (P&S) of the comma expression are those of the consuming operand.

Note that if we change this so that ($extraItem, /BOOKLIST/BOOKS/ITEM) is no longer striding, then writing the query as

<xsl:copy-of select="($extraItem, /BOOKLIST/BOOKS/ITEM) ! PRICE"/>

also fails, even though document order shouldn't affect this one. (I've made this one into test sx-comma-019).
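
The difference between the two operators can be modelled outside XSLT. The sketch below is an illustration only: the `Node` class, the `slash` and `bang` functions, and the `(doc, pos)` ordering key are all hypothetical, with `doc` standing in for the implementation-dependent ordering of separate trees. It shows that "/" deduplicates and sorts its result into document order, whereas "!" simply concatenates results in the order of the left-hand sequence.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Node:
    doc: int                        # hypothetical rank of the containing tree
    pos: int                        # position of the node within its tree
    name: str
    children: Tuple["Node", ...] = ()


def order_key(n: Node) -> Tuple[int, int]:
    """Document order in this model: by tree rank, then by position."""
    return (n.doc, n.pos)


def slash(seq: Sequence[Node], name: str) -> List[Node]:
    """Model of seq/name: select children, dedupe, sort into document order."""
    hits = {c for n in seq for c in n.children if c.name == name}
    return sorted(hits, key=order_key)


def bang(seq: Sequence[Node], name: str) -> List[Node]:
    """Model of seq!name: select children, keeping the order of seq."""
    return [c for n in seq for c in n.children if c.name == name]


# An extra item whose tree happens to sort after the streamed document.
extra = Node(2, 1, "ITEM", (Node(2, 2, "PRICE"),))
item = Node(1, 1, "ITEM", (Node(1, 2, "PRICE"),))
```

With `[extra, item]` as the left-hand sequence, `bang` returns the extra item's PRICE first, while `slash` returns the streamed PRICE first; only the latter depends on how the two trees are ordered.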
Comment 3 Michael Kay 2016-02-28 17:38:00 UTC
After debating this with myself on the XSL mailing list, and producing an initial implementation, I believe we can close this as follows.

(A) At the point where we define striding posture, remove the claim that the result of an expression in striding posture is always in document order. Replace this claim with a note explaining that the result of a striding expression may contain a mixture of streamed nodes and grounded items, and that the streamed nodes will always be in document order. As a result, some expressions that would normally require sorting into document order, such as (/book/book | $extrabook)/price, are deemed streamable because the sort can be achieved without buffering streamed nodes in memory.

(B) Under "Streamability of path expressions" add a similar note about mixed posture expressions.
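
The claim in (A) that the sort needs no buffering of streamed nodes can be illustrated with a small model. This is a sketch under stated assumptions, not a description of how any particular processor works: nodes are hypothetical `(tree rank, position)` pairs, the grounded items are already held in memory, and the streamed nodes arrive one at a time in document order. The function name `merge_in_document_order` is invented for this example.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical encoding: (rank of the containing tree, position in that tree).
Node = Tuple[int, int]


def merge_in_document_order(
    grounded: List[Node], streamed: Iterable[Node]
) -> Iterator[Node]:
    """Merge grounded nodes with an already-ordered stream of nodes."""
    # Grounded nodes are in memory anyway, so sorting and deduplicating
    # them does not require buffering anything from the stream.
    in_memory = sorted(set(grounded))
    # heapq.merge is lazy: it keeps one pending item per input, so the
    # streamed nodes pass through without being accumulated.
    return heapq.merge(in_memory, streamed)
```

The design relies on the streamed input already being in document order, which is exactly the guarantee that the proposed note retains for streamed nodes.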

Note that under the current rules, similar expressions that involve crawling sub-expressions are not streamable: for example (//book | $extrabook) / PRICE is not guaranteed streamable. This is because the LHS of the "/" operator is not a "scannable expression". We could easily extend the rules for scannable expressions to cover this case, but I don't propose to do so.

The test cases available are now sx-comma-040 to -043 and -140 to -143, and sx-union-040 to -043 and -140 to -143.
Comment 4 Abel Braaksma 2016-03-10 11:31:16 UTC
For reference, the discussion can be found at https://lists.w3.org/Archives/Public/public-xsl-wg/2016Mar/0000.html and in the previous messages in that thread.
Comment 5 Michael Kay 2016-04-14 15:52:25 UTC
On 2016-04-07, the WG

RESOLVED: to resolve bug 29507 by accepting the proposal in comment 3.
Comment 6 Michael Kay 2016-04-28 16:45:02 UTC
Abel asked for the bug not to be closed until he has had a chance to review the changes in the updated spec.
Comment 7 Michael Kay 2016-05-27 11:26:10 UTC
The changes have been applied.

Under item (B) of the proposal I put the additional note under the General Streamability Rules, not under "Path Expressions" as suggested, because the rules apply more generally than to path expressions.