This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 25160 - [XSLT 3.0] a//c with striding CP is roaming by the rules, though we strove to make it streamable
Summary: [XSLT 3.0] a//c with striding CP is roaming by the rules, though we strove to...
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XSLT 3.0 (show other bugs)
Version: Last Call drafts
Hardware: PC Windows NT
: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
: 24490 (view as bug list)
Depends on: 24490
Blocks:
  Show dependency treegraph
 
Reported: 2014-03-26 17:27 UTC by Abel Braaksma
Modified: 2014-07-14 10:56 UTC (History)
2 users (show)

See Also:


Attachments

Description Abel Braaksma 2014-03-26 17:27:55 UTC
Under [1] we write that // must be rewritten as /descendant-or-self::node()/ and an omitted axis x to child::x. Following the rules we get, given a context posture of striding:

1. rewrite a//c as child::a/descendant-or-self::node()/child::c
2. rewrite for associativity as (child::a/descendant-or-self::node())/child::c
3. determine LH posture and sweep
  a. assess child::a/descendant-or-self::node()
  b. determine LH posture and sweep
     i. assess child::a with CP striding
     ii. table under 19.8.7.7 gives result posture striding, sweep consuming
  c. determine RH posture and sweep
     i. assess descendant-or-self::node() with CP striding
     ii. table under 19.8.7.7 gives result posture crawling, sweep consuming
4. determine RH posture and sweep
  a. assess child::c with CP crawling
  b. table under 19.8.7.7 gives result posture roaming, sweep free-ranging
5. result of a//c is roaming and free-ranging

The problem in the rules is with the rewriting of // into /descendant-or-self::node()/, which introduces an extra step. Were we to rewrite a//c as a/descendant-or-self::c, the problem would not arise.

I propose to augment the rule for rewriting // as follows:

//x and //child::x into /descendant::x
//otheraxis::x into /descendant-or-self::node()/otheraxis::x
//otherexpr into /descendant-or-self::node()/otherexpr

Note also that we say very explicitly that a//c should be streamable (well, crawling, so streamable in certain contexts, like xsl:for-each) in the second bullet point under Examples of 19.8.7.6: 

"If the context posture is striding, then the posture of the expression a//c is crawling, because a descendant axis step evaluated with striding posture creates a new crawling posture."
Comment 1 Abel Braaksma 2014-03-26 17:29:22 UTC
Forgot reference:

[1] http://www.w3.org/TR/xslt-30/#streamability-of-path-expressions
Comment 2 Abel Braaksma 2014-03-26 17:30:53 UTC
This bug was originally discovered by Eugene Fotin.
Comment 3 Michael Kay 2014-03-27 20:22:02 UTC
As discussed on the telcon today. section 19.1 of the spec is designed to address this problem. However, we felt at Prague that 19.1 lacks precision and should be made more prescriptive.

I think this can be done as follows.

(a) For every *selection pattern* P, there is an implicit *matcher function* M<P> [angle brackets indicate a subscript]. The matcher function has the signature M<P>($n as node()) as xs:boolean, and the semantics of the function are such that M<P>(N) returns true if and only if N is a node that matches the pattern P.

(b) Definition: A *scanning expression* is an expression E that is the *equivalent expression* of some *motionless* *selection pattern* P, provided that E is not a subexpression of some containing expression that is itself a *scanning expression*, and that the static type of E does not include attribute or namespace nodes.

Before analyzing a construct C for streamability, but after expansion of abbreviations such as "//", any *scanning expression* E contained (at any depth) within C is rewritten as

descendant::node()[M<P>(.)]

where P is the motionless pattern whose *equivalent expression* is E.

For example, descendant::section/child::head is replaced by descendant::node()[F(.)] where F(N) returns true for nodes that match the pattern descendant::section/head.

(c) A matcher function taking the context item as its argument is grounded and motionless.

Note: any expression that meets these criteria is thus evaluated by scanning all descendant nodes in document order and testing each one to see whether it matches the corresponding pattern. This execution strategy is streamable, whereas a nested-loop evaluation strategy that followed the XPath semantics more closely might not be streamable. Of course, processors are free to adopt a different algorithm that achieves the same effect and preserves guaranteed streamability.

Note: the exclusion of patterns that match attributes or namespace nodes is a simplification that does not adversely affect streamability. It means that given an expression such as //section/head/@status, which is rewritten as descendant-or-self::node()/child::section/child::head/attribute::status, the expression descendant-or-self::node()/child::section/child::head acts as the *selection expression* to be replaced by a filter expression using a matcher function; this filter expression is streamable, with crawling posture, and the addition of an axis step using the attribute or namespace axis then creates an expression with climbing posture.
Comment 4 Michael Kay 2014-03-27 20:39:01 UTC
I failed to capture in the forgoing rules the constraint that the matching operation must occur within the subtree of the context node. I now recall this constraint causing us sime difficulty in Prague.

One approach to this might be to take the definition of the semantics of selection patterns:

<quote>
Specifically, an item N matches a pattern P if the following applies, where EE is the equivalent expression to P:

N is a node, and the result of evaluating the expression root(.)//(EE) with a singleton focus based on N is a sequence that includes the node N
</quote>

and modify it to say:

<new>
an item N matches a pattern P within a (sub)tree rooted at $A if the following applies, where EE is the equivalent expression to P:

N is a node, and the result of evaluating the expression $A//(EE) is a sequence that includes the node N

When $A is not otherwise specified, it is taken as root(N).
</new>

The matcher function M<P> then becomes M<P>(N, A), returning true if the pattern matches a node N within a subtree rooted at A, and the rewrite of a scanning expression  becomes 

let $A := . return $A/descendant::node[M<P>(., $A)]
Comment 5 Michael Kay 2014-03-28 08:59:28 UTC
Unfortunately the above proposal has a rather serious bug. The expression child::x gets rewritten to something like //*[self::x] which not only gives the wrong answer, it also has the wrong posture.

There's also (at least one) further bug. The expression //x is not streamable unless the context item happens to be the document node, but it is equivalent to a motionless pattern and gets rewritten under the proposed rules.

I'm therefore wondering about a different formulation. Delete section 19.1, and instead of preprocessing the expression tree, we do the streamability analysis as described and then have a second attempt if it fails. More specifically, in section 19.8.7.6 Streamability of Path Expressions, we recast the rules as follows:

1. Compute the provisional posture and sweep, as follows:

  (insert the existing rules)

2. If the provisional posture is roaming or the provisional sweep is free-ranging (or both), and if the path expression is a *scanning expression* (defined below), then the posture is crawling and the sweep is consuming. Otherwise, the posture and sweep of the path expression are the provisional posture and sweep.

Definition: A *scanning expression* is a relative path expression in the form L/R that satisfies the following conditions:

(a) each of the operands L and R is either (i) a scanning expression, or (ii) an AxisStep whose axis is child, descendant, descendant-or-self, or self.

(b) the expression is the *equivalent expression* of some *motionless pattern*. (That is, when the expression is considered as a pattern, the rules for classifying patterns (19.8.9) deem the pattern to be motionless.)

Rationale: note that we are no longer doing anything to change the semantics of the expression, we are only defining its streamability properties. The responsibility for devising a correct streaming implementation of such expressions is thus left entirely to the implementor, as it should be. In addition, we only invoke this rule as a fallback if the expression would otherwise be non-streamable, which reduces the risk of introducing bugs.

As an alternative to (b) we could inline the rules for motionless patterns: basically, the top-level predicates must be motionless and non-positional.

We will obviously need to add explanations for the reader; much of the required material already exists.

The new rules no longer apply to union expressions. But I think we've relaxed the rules for unions in 19.8.7.4 sufficiently that they no longer need special treatment. Under the above rules, section//(author|publisher) is not streamable because the expression is not a "scanning expression". I think we can live with this.
Comment 6 Innovimax 2014-03-28 11:50:26 UTC
Just to add my support and a careful monitoring of this bug because it goes in the direction to fully support "XML Signature Streaming Profile of XPath 1.0" subset [1]


[1] http://www.w3.org/TR/xmldsig-xpath/
Comment 7 Michael Kay 2014-04-16 08:42:03 UTC
*** Bug 24490 has been marked as a duplicate of this bug. ***
Comment 8 Michael Kay 2014-07-14 10:56:31 UTC
A detailed proposal that develops the idea in comment #5 was produced here:

https://lists.w3.org/Archives/Member/w3c-xsl-wg/2014Jul/0007.html

(member only) and was accepted by the WG with minor changes (Minutes 10 July 2014). The changes have been applied.