This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 29983 - [XSLT30] Scanning expressions and function calls
Summary: [XSLT30] Scanning expressions and function calls
Status: RESOLVED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XSLT 3.0 (show other bugs)
Version: Candidate Recommendation
Hardware: PC Windows NT
: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-11-06 09:45 UTC by Abel Braaksma
Modified: 2017-01-12 22:57 UTC (History)
0 users

See Also:


Attachments

Description Abel Braaksma 2016-11-06 09:45:27 UTC
This issue is a reality check. I found myself writing:

f:someDeepDescent(foo)/bar

I believed it was allowed and guaranteed streamable, because it is essentially a scanning expression. The lh-side of "/" returns a crawling expression, the rh-side does a name-test for each child. 

It is a scanning expression in my mental model, but now I think this is wrong. A scanning expression is only a scanning expression if it can be rewritten, i.e.:

a//b/c

can be rewritten as (not entirely correctly so, but to get the gist):

descendent-or-self::c[parent::b[ancestor-or-self::a]]

Since everything inside [...] is motionless, this works.

My mental model was: take any crawling expression and add a striding or crawling expression, result is a crawling expression.

The function above is assessed as crawling expression. However, it cannot be rewritten trivially, so it does not qualify as a scanning expression. Conclusion: *not* guaranteed streamable.

I think the rules catch this correctly, I just spent a lot of time finding this out and wondering if my assessment is correct.
Comment 1 Michael Kay 2016-11-06 14:33:53 UTC
My mental model for scanning expressions is that they are expressions that can be evaluated by considering each descendant node individually and testing it against a pattern. Anything that involves evaluating a user-defined function that selects which nodes should match the pattern doesn't fit into that model at all.

Remember that all this started with the //section/footer problem, with a a source tree like

<section id="A">
  <section id="B">
    <footer of="B"/>
  </section>
  <footer of="A"/>
</section>

Here the nested-loop evaluation strategy for the path expression yields nodes that are not in document order (the child of the second section PRECEDES the child of the first section). The only streamable strategy for evaluating this is to consider the footer nodes in document order and test each one against the pattern //section/footer. If we replaced "//section" here by a function that selects the section elements then I would have no idea how to devise a streamable execution strategy.
Comment 2 Abel Braaksma 2016-11-06 17:25:21 UTC
Thanks, that helps and I remember now (the discussion, the whiteboard). Do you think it makes sense to enlighten the section by explaining how we got to those rules, in other words, add something along the following lines?

Note:

These rules grab those situations where an expression can be rewritten as a pattern and therefor can assess upon visiting a node whether or not it fits this pattern, which effectively makes it streamable....

(no wordsmith here, doesn't sound too clear to me, maybe something better? ;) )
Comment 3 Michael Kay 2016-11-11 00:22:48 UTC
I was asked to improve the explanation and rationale for scanning expressions. I have therefore rewritten the relevant Note as follows:

<p>The special rules for scanning expressions are designed to ensure
that expressions such as <code>//section/head</code> are streamable. The problem
with such an expression is that it is
possible to have two nested sections <var>A</var> and <var>B</var>, where <var>A</var> is the parent of <var>B</var> and thus precedes <var>B</var> in document order, but where there are children of <var>A</var> that come <emph>after</emph> children of <var>B</var> in document order. This means that a nested-loop strategy for the evaluation of <code>/descendant::section/child::head</code> is not guaranteed to deliver nodes in document order without a sort, and is therefore not a viable strategy for streaming.</p>

<p>However, there is a different strategy for evaluating such an expression,
which is in effect to rewrite the expression as <code>/descendant::head[parent::section]</code>; specifically, it is possible to scan all descendants in document order, looking for a <code>head</code>
element that has a <code>section</code> parent. Hence the term <term>scanning expressions</term>.</p>

<p>The expressions that qualify as scanning expressions are paths that can be evaluated by scanning all descendants and testing each one (independently) to see whether the elements on its ancestor axis match the specified path. The subset of expressions that qualify  as scanning expressions is therefore very similar to the subset that qualify as motionless patterns.</p>

                     <p>Scanning expressions cannot use positional predicates: for example <code>//section/head[1]</code> is not recognized as a scanning expression because this would require information about a streamed node (specifically, about its preceding siblings) that is not retained during streaming.
                        </p>
Comment 4 Abel Braaksma 2016-11-11 10:25:21 UTC
This is a very clear explanation and I believe helps a lot in understanding scanning expressions, thanks.

I'd be interested in your thoughts to add one more line, something like:

"One might be tempted to extend scanning expressions to ordinary expressions, and certain processors may be able to do so by rewriting ordinary expressions into scanning expressions, but these particular rules are deliberately limited to path expressions where each step is itself a step, filter or union expression."

I know we normally don't show counter-examples, but I think it wouldn't hurt to add one or two scanning expressions that are either not valid scanning expressions or that are not streamable (we presently have one such example). I also think it wouldn't hurt to add a union and filter expression:

* f:deep-descent(book)/author, if f:deep-descent is a streamable function with streamability of deep-descent, its posture is crawling. The step expression /child:author is striding. This is not a scanning expression and according to the rules in *Axis Steps* this expression is roaming and free-ranging.

* .//account | .//ACCOUNT is a union expression that is a scanning expression with resulting posture crawling.

* (list/person except list/person[@age > 12])[@state='NY'] is a filter expression with an embedded except expression that is a scanning expression and the expression as a whole is a scanning expression with crawling result posture.

* catalog//author[name = 'Tolkien'] is roaming and free-ranging, because the predicate is not motionless. Here, the expression catalog//author would be a scanning expression, but with the predicate it becomes free-ranging.
Comment 5 Michael Kay 2016-11-16 17:25:19 UTC
I suggest changing 

The subset of expressions that qualify as scanning expressions is very similar to the subset that qualify as motionless patterns, and a possible evaluation strategy for scanning expressions is to examine each descendant of the context node and test whether it matches the corresponding pattern.

to:

Expressions that qualify as scanning expressions would also qualify as motionless patterns, and a possible evaluation strategy for scanning expressions is to examine each descendant of the context node and test whether it matches the corresponding pattern.

In the interests of simplicity, however, the set of expressions that are treated as scanning expressions has been kept to a minimum. It excludes, for example, expressions that use operators such as union, except, and intersect, and expressions that contain motionless predicates, even though these might be amenable to a similar evaluation strategy.
Comment 6 Abel Braaksma 2016-11-16 19:31:09 UTC
I welcome the extension to the note, but it isn't entirely accurate, as union, intersect and except expressions *are* allowed as scanning expressions, we say explicitly (point 2.a.iv)

a. A scanning expression is any of the following:
   [i..iii]
   iv. A union, intersect, or except expression in which both operands
       are scanning expressions.
Comment 7 Michael Kay 2016-11-17 17:58:56 UTC
Also noted that comment #5 is incorrect to suggest that 

catalog//author[name = 'Tolkien']

is not a scanning expression.
Comment 8 Abel Braaksma 2016-11-17 19:02:48 UTC
(In reply to Michael Kay from comment #7)
> Also noted that comment #5 is incorrect to suggest that 
> 
> catalog//author[name = 'Tolkien']
> 
> is not a scanning expression.

More precisely:

The first part, catalog//author, is a path expression with crawling posture and consuming sweep. This part may be considered a scanning expression, but it fails the following tests:

The inner expression of the predicate is itself consuming (and grounded).

Any expression with a consuming predicate is roaming and free-ranging.

Perhaps you meant an expression like catalog//author[@name = "Tolkien"], which I think is indeed still not a scanning expression. My example in comment#4 deliberately used name, not @name.

A better example is perhaps

a//b//c[@d = "x"]

which is a scanning expression and is crawling. And the counterpart, for instance:

a//b//c[d = "x"]

which is a scanning expression + non-motionless predicate, thus fails the test for scanning expressions as a whole.
Comment 9 Michael Kay 2016-11-17 22:23:53 UTC
I was asked to produce a revised proposal trying to fix the errors in my previous version.

So I started thinking: are there any differences between scanning expressions and motionless patterns, and if so, what are they, and can we justify them?

I came to the conclusion that there are no significant differences.

There are differences of exposition: we say for motionless patterns that for predicates:  "The expression immediately contained in the predicate is motionless, when assessed with a context posture of striding, and a context item type set to the static type of the expression to which the predicate applies, determined using the rules in 19.1 Determining the Static Type of a Construct." - while for scanning expressions we simply say that the predicate must be motionless.

So I propose that we align the concepts, and redefine "scanning expressions" as follows  (remembering that we are talking here about relative path expressions only):

A relative path expression is a *scanning expression* if and only if it is the *equivalent expression* of some *motionless pattern* P.

With that change to the definition, I see little need to change the Note introduced by comment #3, other than to change the sentence "The subset of expressions that qualify as scanning expressions is therefore very similar to the subset that qualify as motionless patterns." - change "very similar to" to "the same as".
Comment 10 Abel Braaksma 2016-11-17 23:35:26 UTC
> Re comment#9:
We used to have the term *equivalent expression* but decided to drop it because we couldn't define it specifically enough. I believe the problem was that we define clearly to go from a pattern to an expression, but the inverse is not so simple and would lead to processor-dependent behavior on what expression also is a motionless pattern (I think it was in the discussion of bug 25160).

So, if there are no differences, that is good, then the definition is as it should be. But I'm uncertain we should go back to a situation we dismissed previously as too vague.

That said, if we can convince ourselves it is not vague and that for all E (expression) in EE (all RelativePath expressions) we can define unambiguously whether it can be rewritten as a motionless pattern or not, it would be a good way forward into simplifying this. But I have a strong feeling that precisely that discussion led to the introduction of scanning expressions in the first place.

Note that, with the present rules, an expression like //foo and the pattern //foo have distinct streamability semantics, the patterns are streamable, and the expression is only streamable in rare cases (if the context type is document-node), meaning that the sets of motionless patterns and streamable RelativePath expressions is not (presently) equal.
Comment 11 Michael Kay 2016-11-18 09:40:55 UTC
We define the term "equivalent expression" in 5.5.3.

It's not quite perfect because it's defined in terms of the lexical form of the pattern/expression, while the streamability analysis is really operating on the expression tree, but at this level it's good enough. The formulation I used "is the equivalent expression of some motionless pattern" is designed to accommodate lexical variations.

We don't need to worry about expressions/patterns like "//x" in this section because we're only concerned here with analyzing relative path expressions.

Note also that the semantics of the "scanning" operation are not precisely equivalent to pattern-matching of the descendant nodes, because the pattern has to match within the subtree rooted at the context item. For example descendant::section/child::head is a scanning expression, but it doesn't necessarily select every descendant element that matches the pattern section//head - there is an additional condition, viz that the section in question has to be within the subtree. But I still think it is the case that there we can define scanning expressions by reference to motionless patterns without any change to the intended meaning of the term (and that if we do so, we may avoid unintended differences).
Comment 12 Abel Braaksma 2016-11-19 01:12:33 UTC
Sorry, I meant with "used to have the term *equivalent expression*" to be w.r.t. the previous section 19.1, which led to a lot of debate.

Bug 24490 contains the summary of the Prague WG F2F where we discussed the problems of rewriting expressions, which was then a part of the streamability rules in a more general sense.

Since this only applies to a (subset of) RelativePathExpr, it is very well possible that the concerns there do not apply anymore, even more so if we are now talking about the situation where a RelativePathExpr is lexically equivalent with a Pattern30 production (as opposed to the result of a rewrite).

This will exclude some expressions that are potentially streamable, but I think they aren't streamable by our rules now either (like x/.//y, current()//x/y).
Comment 13 Michael Kay 2017-01-12 22:57:17 UTC
The proposal in comment #9 was accepted.

In implementing the proposal, I realized that the definition of "equivalent expression" is not quite right, because the equivalent expression of the pattern "section//head" is not "section//head", but "child-or-top::section//head", and "section//head" is therefore not the equivalent expression of any pattern.

So I reformulated the rule as:

<p><termdef id="dt-scanning-expression" term="scanning expression">A  <code>RelativePathExpr</code> is a <term>scanning expression</term> if and only if it is syntactically equivalent to some <termref def="dt-motionless"/> <termref def="dt-pattern"/>.</termdef></p>

followed by a note explaining what "syntactic equivalence" means.