25316 – [XSLT 3.0] Streamability and numeric predicates

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 25316 - [XSLT 3.0] Streamability and numeric predicates

Summary: [XSLT 3.0] Streamability and numeric predicates

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	XSLT 3.0 (show other bugs)
Version:	Last Call drafts
Hardware:	PC All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Michael Kay
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-04-10 21:07 UTC by Michael Kay
Modified:	2014-08-02 18:06 UTC (History)
CC List:	2 users (show)

See Also:

Attachments

Description Michael Kay 2014-04-10 21:07:10 UTC

In the streamability rules for filter expressions and axis steps we give special treatment to numeric predicates, which is useful because it establishes that the result is a singleton and this can change the posture from crawling to striding.

However, the only cases we detect are where the value is a numeric literal or a variable reference whose declared type is numeric.

We can extend this a little bit easily by saying "a variable reference whose static type is a subtype of U{decimal, double, float}".

It would be nice to allow ANY expression whose static type is a subtype of U{decimal, double, float}. The only reason we can't do this is that we need to rule out context-dependent expressions such as X[position()], which do not necessarily evaluate to a singleton. But I think we now have the machinery to do this. Specifically, we should be able to use a formulation such as:

Given E as B[P]; the result is a singleton if the static type of P is a subtype of U{decimal, double, float} and neither P, nor any subexpression of P at any depth (a) has E as its focus-setting container and (b) is a context item expression, an axis expression, or a call on a focus-dependent function other than last().

Furthermore, the current type analysis infers U{A} (rather than a numeric type) for expressions such as $X + 1. For operators "+" and "-", the result is numeric if either operand is known to be numeric. For multiplicative operators, the result is numeric if both operands are known to be numeric. In fact one can go further; it doesn't matter if the result is a duration, because a duration has no effective boolean value and is therefore a type error if it appears in a predicate; therefore any arithmetic expression used as a predicate will either deliver a numeric result, or cause a type error, which means we can safely assume that a filter expression whose predicate is an arithmetic expression will either deliver a result with cardinality 0:1 or will fail.

Comment 1 Michael Kay 2014-04-10 21:14:30 UTC

The formulation "numeric literal or variable declared as numeric" occurs when talking of predicates in filter expressions and axis steps. But when it comes to classifying patterns, we say:

The expression immediately contained in the predicate is a non-numeric expression. An expression is non-numeric if its static item type....

which fails to allow for the pathological cases like X[position()] where the static type is numeric but the evaluation is focus-sensitive.

So we need to fix this.

Comment 2 Michael Kay 2014-04-10 21:21:46 UTC

Disregard what I said about patterns. For patterns, our aim is to ensure that the predicate is NOT numeric. For this purpose, disallowing pathological cases like P[count(@*)] is fine.

Comment 3 Michael Kay 2014-07-14 15:49:13 UTC

I was asked to produce a concrete wording proposal.

Proposal 

(A) in 19.8.7.8 change "For a filter expression of the form B[P]" to "For a filter expression F of the form B[P]" 


(B) change rule 1 of 19.8.7.8 (Streamability of Filter Expressions) from

If B is crawling and consuming, and P is either a numeric literal, or a variable reference whose static type is a subtype of U{xs:decimal, xs:double, xs:float}, then crawling and consuming

to

If all the following conditions are satisifed:

* B is crawling and consuming;

* The static type of P is a subtype of U{xs:decimal, xs:double, xs:float}, or

* Neither P, nor any operand of P at any depth that has F as its focus-setting container, is a context item expression, an axis expression, or a call on a focus-dependent function other than last();

then crawling and consuming

(C) in section 19.2 (Determining Static Type), in the table entries for AdditiveExpr and MultiplicativeExpr, qualify the current entries to say: "But if the expression is a predicate (that is, if it appears between square brackets in a filter expression or axis step), then U{xs:decimal, xs:double, xs:float}. This type inference is possible because any other result of an addition or subtraction [or multiplicative operation] within a predicate, for example an xs:duration or xs:dateTime, would cause a type error, and static type inference only needs to consider the type of non-error results.

Comment 4 Abel Braaksma 2014-07-15 01:18:18 UTC

Some observations:

1) I think this new rule, and the existing rule, has an error. We say "If B is crawling ....<set of rules>... then crawling and consuming". As written, this rule has no effect. Because the result is a single node, the resulting posture should be striding.

2) "... and consuming". I can't think of a crawling expression that is motionless, it could happen in xsl:for-each, but there a crawling expression cannot be the focus setting expression.

3) We do allow position() in a predicate, except that than the result of B[position()] is crawling (this follows from the next rule in that section).

4) "that has F as its focus-setting container" >> I think you mean "that has B as its focus-setting container. F already includes P, so I don't think the whole expression can be its own focus-setting container.

5) "other than last();" >> the last() function is already prohibited in a filter expression, unless in a nested filter expression with climbing posture, but these rules rule other path expressions out.

It appears to me that the rules are meant to allow such expressions as:
(foo//bar)[12 + $count]
(foo//bar)[(1, 2, 3)[last()]]
(foo//bar)[(3, 4, 5)[position() = $pos]]
(foo//bar)[count($param)]

And to disallow (in fact, leave result as crawling):
(foo//bar)[position()]
(foo//bar)[@graph = 10]
(foo//bar)[position() = 12]
(foo//bar)[number(parent::*[2]/@age) + 10]

Handy: the new (and existing) rules can turn a crawling expr into striding, if you expect zero-or-one results:
(foo//bar)[@graph = 10][1]
(foo//bar)[number(parent::*[2]/@age) + 10][1]
(foo//bar)[position()][ancestor::name[@age]][1]

6) We have a rule under 19.8.7.7 (axis steps) that is almost the same:

"4. If the context posture is striding, and the axis is descendant or descendant-or-self, and there is a predicate in the PredicateList that is either a numeric literal or a variable reference whose static type is a subtype of U{xs:decimal, xs:double, xs:float} (for example, descendant::title[1]), then striding and consuming;"

Should we apply these new rules there as well (or reference it to have them in one place)?

Comment 5 Michael Kay 2014-07-15 22:44:37 UTC

Some observations:

>1) I think this new rule, and the existing rule, has an error.... Because the result is a single node, the resulting posture should be striding.

Yes, you are right.

2) "... and consuming". I can't think of a crawling expression that is motionless, it could happen in xsl:for-each, but there a crawling expression cannot be the focus setting expression.

Yes. There's an underlying problem that posture and sweep are not orthogonal. It would be a good idea to have a chart somewhere that shows which combinations are possible.

3) We do allow position() in a predicate, except that than the result of B[position()] is crawling (this follows from the next rule in that section).

Not sure of your point?

4) "that has F as its focus-setting container" >> I think you mean "that has B as its focus-setting container. F already includes P, so I don't think the whole expression can be its own focus-setting container.

B isn't a container of the predicate, so it can't be the focus-setting container. The terminology might not be ideal, but I think we are using the term the way it is defined.

5) "other than last();" >> the last() function is already prohibited in a filter expression, unless in a nested filter expression with climbing posture, but these rules rule other path expressions out.

Not sure of your point...

6) We have a rule under 19.8.7.7 (axis steps) that is almost the same:

"4. If the context posture is striding, and the axis is descendant or descendant-or-self, and there is a predicate in the PredicateList that is either a numeric literal or a variable reference whose static type is a subtype of U{xs:decimal, xs:double, xs:float} (for example, descendant::title[1]), then striding and consuming;"

Should we apply these new rules there as well (or reference it to have them in one place)?

Yes, we should apply the same thinking to that rule. Sorry I missed that one.

Comment 6 Abel Braaksma 2014-07-16 13:09:38 UTC

3) ... <about fn:position() >
> Not sure of your point?

As it was originally written, both rules under 19.8.7.8 resulted in crawling. Now that the first ends up in striding, the issue is moot, because there is a difference now.

5) ... <about fn:last() >
> Not sure of your point...

In the rule, you say (my words) "except if it contains focus-dependent functions... except last". The double exception is redundant, you disallow focus-dependent functions, fn:last() is prohibited anyway (by its own rules), so why making an exception to the exception (allowing it), only to find out during analysis that using fn:last will yield free-ranging anyway? Or does an expression exist that uses last() on a streaming node-set that is crawling and is allowed?

On (4), the focus-setting container, I see now that the "container" is the whole construct, under the definition in the spec we write: " if an expression appears within the predicate of a filter expression, its focus-setting container is the filter expression.". I was confused with what we define is the "controlling operand". A suggestion -- I think this says the same, and it might be clearer:

* Neither P, nor any operand of P at any depth that has B as its [controlling operand], is a context item expression, an axis expression, or a call on a focus-dependent function.


[...]
7) I just noticed the "or" at the end of the second bullet point, I think this should be "and".

Comment 7 Sharon Adler 2014-07-23 20:27:11 UTC

The WG looked at this bug last week (July 17, 2014)and determined that this needed more thought and work.  To be considered at the Hursley F2F.

Comment 8 Michael Kay 2014-07-31 13:40:21 UTC

The following proposal was accepted:

(A) in 19.8.7.8 change "For a filter expression of the form B[P]" to "For a filter expression F of the form B[P]" 


(B) change rule 1 of 19.8.7.8 (Streamability of Filter Expressions) from

If B is crawling and consuming, and P is either a numeric literal, or a variable reference whose 
static type is a subtype of U{xs:decimal, xs:double, xs:float}, then crawling and consuming

to

If all the following conditions are satisfied:

* B is crawling;

* The static type of P is a subtype of U{xs:decimal, xs:double, xs:float}, and

* Neither P, nor any operand of P, at any depth provided it has F as its focus-setting container, 
is a context item expression, an axis expression, or a call on a focus-dependent function; [[it would be useful to explain the implications of this rule]]

then striding and the sweep of B

(C) in section 19.2 (Determining Static Type), in the table entries for AdditiveExpr and MultiplicativeExpr,
 qualify the current entries to say: "But if the expression is a predicate (that is, if it appears between
  square brackets in a filter expression or axis step), then U{xs:decimal, xs:double, xs:float}. 
  This type inference is possible because any other result of an addition or subtraction [or multiplicative operation] within a predicate, for example an xs:duration or xs:dateTime, would cause a type error, and static type inference only needs to consider the type of non-error results.


Notes/examples:

(a) we don't recognize .//a[position() = 12] as striding - but this can be rewritten .//a[12].

(b) we classify .//a[(1 to 5)] as striding, it actually throws a dynamic error, but this doesn't affect the streamability analysis

(c) generally, give examples of where this analysis is helpful

In Streamability of Axis Steps, apply the same rules.

Comment 9 Michael Kay 2014-08-02 18:06:52 UTC

The changes have been applied to the spec (both for filter expressions and axis steps)