29984 – [XSLT30] Lessen the restraint on required raising of XTSE3430 for constructs not guaranteed streamable per our rules

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.


Status:	CLOSED WORKSFORME

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	XSLT 3.0
Version:	Candidate Recommendation
Hardware:	PC Windows NT

Importance:	P2 normal
Target Milestone:	---
Assignee:	Michael Kay
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2016-11-06 10:09 UTC by Abel Braaksma
Modified:	2017-01-12 12:11 UTC
CC List:	1 user

Description Abel Braaksma 2016-11-06 10:09:50 UTC

We require a processor to offer a user option to raise the error XTSE3430 when a construct is not guaranteed streamable per the rules in the specification.

I know this option has had considerable debate in the past, but I would like to (slightly) lessen this requirement.

Specifically, processors may very well pre-optimize expressions and constructs if they evaluate to an empty sequence or a constant value. Example:

every $a in foo[false()] satisfies $a[@x = 1]

This is not allowed with streaming (binding a node to a variable). But the set of $a is empty (replace "foo[false()]" with any expression that is statically empty).

I wouldn't be surprised if most processors detect this at the parsing phase and therefore never notice that the construct is not guaranteed streamable (though it is streamable nonetheless). I know we miss it.
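To illustrate the kind of pre-optimization meant here, the following is a minimal sketch in Python. It is a toy model only: the tuple-based expression tree and the names `is_statically_empty` and `fold` are hypothetical and do not reflect any processor's internals. It shows how a quantified expression over a statically empty sequence can be folded to a constant before any streamability analysis sees it.

```python
# Toy expression tree: tuples of (kind, ...). All names here are hypothetical.
EMPTY = ("empty",)

def is_statically_empty(expr):
    """Return True if the toy expression can be proven empty without any input."""
    kind = expr[0]
    if kind == "empty":
        return True
    if kind == "filter":              # ("filter", base, predicate)
        _, base, predicate = expr
        # foo[false()] selects nothing, whatever foo is
        return predicate == ("bool", False) or is_statically_empty(base)
    return False

def fold(expr):
    """Fold 'every $v in E satisfies C' to true() when E is statically empty."""
    if expr[0] == "every":            # ("every", var, domain, condition)
        _, var, domain, condition = expr
        if is_statically_empty(domain):
            return ("bool", True)     # quantifying over nothing is always true
    return expr

# every $a in foo[false()] satisfies $a[@x = 1]
query = ("every", "a",
         ("filter", ("step", "foo"), ("bool", False)),
         ("compare", ("attr", "x"), 1))
folded = fold(query)
```

After folding, nothing remains of the variable binding that the streamability rules would have objected to, which is why a processor that folds early may never see the violation.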

While I am not certain what exactly we can do here, I am thinking of something along these lines:

Replace:

[Definition: A guaranteed-streamable construct is a construct that is declared to be streamable and that follows the particular rules for that construct to make streaming possible, as defined by the analysis in this specification.]

with

[Definition: A guaranteed-streamable construct is a construct that is declared to be streamable and that follows the particular rules for that construct to make streaming possible, as defined by the analysis in this specification, or any construct that can statically be determined to never require access to a streamed node but that would otherwise not be guaranteed streamable according to the rules in this specification.] Whether or not a processor can determine statically that a construct does not require access to a streamed node is implementation-dependent.

Comment 1 Michael Kay 2016-11-06 14:46:52 UTC

I've been arguing for relaxation of this rule for a long while. To implement the rule in its current form, in principle you have to do streamability analysis on the expression tree before you do any other rewriting of the tree. Since streamability analysis depends on things like binding variable references, binding function references, and doing a certain amount of type analysis, this is a pretty tough demand.

Note also that this is explicitly a requirement that is "at risk" ("The requirement for every processor to report any departure from guaranteed streamability, even in cases where the processor is able to use streaming anyway.")

I would suggest adding the rule:

"The requirement to report a static error where a construct A is not guaranteed-streamable does not apply in cases where the processor is able to determine in the course of static analysis that the construct A is semantically equivalent to a construct B where B is guaranteed-streamable."

Note that this does not license relaxation of the rules as to what constructs are guaranteed streamable, it only permits static rewrites prior to streamability analysis. 

However, it's quite a wide exemption: for example (PRICE - DISCOUNT) is not GS, but it can be rewritten as (let $M := map{1:data(PRICE), 2:data(DISCOUNT)} return ($M?1 - $M?2)) which is GS.
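A minimal sketch of how the suggested exemption might be applied, written in Python as a toy model. Everything in it is a hypothetical stand-in: constructs are plain strings, `GS` plays the role of the specification's streamability analysis, and `REWRITES` is whatever set of static rewrites an imaginary processor happens to know. Real analysis works on an expression tree, not on strings.

```python
# Hypothetical stand-ins: constructs are plain strings in this toy model.
MAP_FORM = "let $M := map{1:data(PRICE), 2:data(DISCOUNT)} return ($M?1 - $M?2)"

# Constructs that the streamability analysis accepts as guaranteed-streamable.
GS = {MAP_FORM}

# Static rewrites this imaginary processor knows: construct A -> equivalent B.
REWRITES = {"PRICE - DISCOUNT": MAP_FORM}

def must_report_xtse3430(construct, exemption_applies):
    """Decide whether the toy processor is required to report XTSE3430."""
    if construct in GS:
        return False                  # construct A is itself guaranteed-streamable
    if exemption_applies:
        rewritten = REWRITES.get(construct)
        if rewritten in GS:
            return False              # A is equivalent to a GS construct B
    return True                       # no GS form known: the report is required
```

Without the exemption, `must_report_xtse3430("PRICE - DISCOUNT", False)` is true; with it, the same call with `True` is false, while a construct with no known rewrite is still reported. The breadth of the exemption depends entirely on what ends up in the rewrite table, which is the point made above.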

Comment 2 Abel Braaksma 2016-11-06 17:37:07 UTC

I think the aim we should have is to widen the rules to those cases that can be statically assessed. Option (3), in comparison, is to "attempt streaming if you can", which we take as "optimistic streaming" (i.e., a user knows more of the data model than the processor does, so you can just try and see how it goes).

The underlying proposal here would instead widen what I call "pessimistic streaming" (option 1), which means that only if you are dead certain you can stream, and you are certain of it during static analysis, are you *not required* to raise XTSE3430.

Your proposal on "static rewrites" is wider than my proposal, which only assesses cases where access to nodes is considered. I look forward to a discussion on this; perhaps we can finally arrive at something that is both strict *and* allows processors a certain freedom in parsing and optimizing.

This proposal may also make XSLT 3.0 more future proof. By cementing the rules we kinda prohibit advances in streamability and big data analysis with XSLT. By allowing leniency, new research, real-world scenarios and processors playing catch-up with one another can advance the applicability of streaming XSLT and (big) data mining in general.

Comment 3 Michael Kay 2016-11-21 14:29:58 UTC

Here's an example where the relaxation in the rules would be useful. Test case accumulator-009s has 

   <xsl:template match="/">
     <result min="{accumulator-after('min')}" max="{accumulator-after('max')}" 
        sum="{accumulator-after('sum')}" count="{accumulator-after('count')}" 
        avg="{round(accumulator-after('sum') div accumulator-after('count'), 2)}"/> 
   </xsl:template>

Now, despite what I wrote in bug #30018, this is not guaranteed streamable. Under §19.8.9.1, accumulator-after() is consuming if there is no preceding INSTRUCTION that is consuming; therefore the literal result element has two consuming operands. But if it is rewritten as

<xsl:element name="result">
  <xsl:attribute name="min" select="accumulator-after('min')"/>
  <xsl:attribute name="max" select="accumulator-after('max')"/>
  ...

then it becomes guaranteed streamable. And Saxon does this rewrite at a very early stage of static analysis, long before streamability analysis.

Changing the rules to make the expression as originally written guaranteed-streamable is not easy. The reason the rules for accumulator-after() treat instructions and sequence constructors specially is that there is a natural order of execution. The rules would also have to handle:

   <xsl:template match="/">
     <result min="{accumulator-after('min')}" size="{count(*)}"/> 
   </xsl:template>

where it is very difficult to define rules that ensure that accumulator-after can be evaluated after the consuming count(*) expression.

Comment 4 Michael Kay 2016-12-01 18:17:28 UTC

MSMcQ asked (in IRC): But if we are going to remove the interoperability rule so thoroughly, why are we defining a concept of guaranteed-streamable in the first place?  If that definition has become pointless, as well as far more complex than I wish it were, then why not forget it entirely?

My response to that is that the definition is very far from pointless. We are essentially partitioning stylesheets into three classes:

(A) Those that are guaranteed streamable

(B) Stylesheets that are statically equivalent to stylesheets that are guaranteed streamable

(C) The rest

and the proposal is to remove the requirement for implementations to distinguish categories (A) and (B).

From a usability point of view, I think most users would prefer stylesheets in category B to be streamed rather than to be rejected: otherwise they will want to know "why is it streamable if I write it this way, and not if I write it that way, when the two are obviously equivalent".

From a purely practical point of view, I don't think it's realistic (for example) for an implementor to run two sets of static type inference rules over the expression tree, one to do the best possible optimization of the code, and one to evaluate exactly the types decreed as input to the streamability rules. It's a threat I don't like to use, but I do believe that if we keep this requirement in the spec then we will end up with products that decide not to conform to this requirement.

Comment 5 C. M. Sperberg-McQueen 2016-12-08 02:39:29 UTC

Most of the users I talk to are quite ready to accept that an equivalence obvious to a human is not necessarily obvious to a machine, and that attempting to extend a simple set of rules to cover even one or two such equivalences will quickly render it no longer simple.  But perhaps the users I speak to are, like myself, not typical.

Speaking for myself, I as a user would prefer stylesheets in category B to be streamed, with messages informing me that this and that construct are not guaranteed streamable, although for this implementation (hurrah) they are streamable in fact.  If however I am forced to choose between giving up the messages or giving up the streaming, my preference is to give up the streaming.  Other users may have less wariness of being locked in to specific implementations.  (MAY?  Looking at user behavior, it's quite clear most users have no objection at all to lock-in, until it's too late.)

I would not like to push implementors too far, because I'm acutely conscious of the risk MK identifies, of not getting any conforming implementations.  But given a set of implementations which don't do what I think is the right thing (and which has been a fundamental principle of our design since 2007), and given the choice between  defining conformance to include or to exclude those implementations, I think we would do better to make them non-conforming.  Conformance rules can't force an implementation to do something the implementor doesn't want to do, but in some ways the definition of conformance is the only thing a WG has -- either we make it a useful concept for doing the kinds of thinking and talking users and others need to do, including talking and thinking about interoperability, or we make it useless. It's a threat I don't like to use, but I do think there is a risk that if conformance offers interoperability guarantees that are too weak, there might be WG members who will object to progressing the specification.

Comment 6 C. M. Sperberg-McQueen 2016-12-08 02:41:53 UTC

I'd like to extend MK's analysis in comment 4 a bit.  It seems to me that from the point of view of a user concerned with interoperability and working with a given processor P there are several distinct subclasses of (B) and (C):

(A) Stylesheets that are guaranteed streamable

(B) Stylesheets that are statically equivalent to stylesheets that are guaranteed streamable

(B1) Stylesheets recognized by P to be statically equivalent to a stylesheet in (A)

(B2) Stylesheets which are in reality statically equivalent to a stylesheet in (A), but which P does not recognize as such

(C) The rest

(C1) Not guaranteed streamable, not statically equivalent to guaranteed-streamable stylesheets, but nevertheless streamable in fact by P

(C2) Not guaranteed streamable, not statically equivalent to guaranteed-streamable stylesheets, and not in fact streamable by P

(C3) Guaranteed streamable, or statically equivalent to same, but not in fact streamable

Class C3 represents errors in our rules for guaranteed streamability; I hope it's empty, and it plays no further part here.

I believe (oversimplifying slightly, in what I hope are harmless ways) that under the status quo, if a user requests a conforming processor to process a stylesheet S in streaming fashion, then 

  - If S is in A, P processes S in streaming fashion.
  - If S is in B1, P processes S in streaming fashion and warns the user that S is not in A. 
  - If S is in B2, P processes S and warns the user that S is not in A.  (Whether the processing is done in streaming fashion or not depends on whether P thinks S is in C1 or C2; no processor will identify any stylesheet as being in B2.)
  - If S is in C1, P processes S in streaming fashion and warns the user that S is not in A. 
  - If S is in C2, P processes S in non-streaming fashion and warns the user that S is not in A. 

In all cases, the user will know whether S is in A or not in A.  Depending on how communicative the processor is, the user may also know whether S is in (B1 + C1) or in (C2).

If I understand the proposal here, it varies very slightly in exactly one case.  P is not required to distinguish class B1 from class A, so B1 is handled differently:

  - If S is in B1, P processes S in streaming fashion and does not warn the user that S is not in A.  

The user now always knows whether S is in (A + B1) or not in (A + B1).  The user may also be able to distinguish C1 from C2 (depends on whether the processor issues an informative message like "you call this a streaming stylesheet?  I can't stream this!" or "streaming anyway, lucky you").
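The case analysis above can be sketched as a small Python table. This is only a reading aid under stated assumptions: the `Cls` enum, the `behaviour` function, and the `p_can_stream_b2` flag are hypothetical names, and class C3 is left out because it plays no further part here.

```python
from enum import Enum

class Cls(Enum):
    A = "guaranteed streamable"
    B1 = "statically equivalent to A, recognized by P"
    B2 = "statically equivalent to A, not recognized by P"
    C1 = "not in A or B, but streamable in fact by P"
    C2 = "not in A or B, and not streamable by P"

def behaviour(cls, proposal=False, p_can_stream_b2=False):
    """Return (streams, warns) for processor P handling a stylesheet in cls."""
    if cls is Cls.A:
        return (True, False)
    if cls is Cls.B1:
        return (True, not proposal)       # the only case the proposal changes
    if cls is Cls.B2:
        # P cannot tell B2 from C, so it behaves as it would for C1 or C2
        return (p_can_stream_b2, True)
    if cls is Cls.C1:
        return (True, True)
    return (False, True)                  # C2
```

Comparing `behaviour(c)` with `behaviour(c, proposal=True)` for every class shows a difference only for B1, where the warning disappears.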

Note that under the status quo, as I understand it, two conforming processors P and Q may disagree on some classifications but not all.

  - P knows that S is in A iff Q knows that S is in A.
  - If P places S in any of B1, C1, or C2, Q may place S in any of B1, C1, or C2.
  - No processor can reliably distinguish B2 from C (C1 or C2); I assume that any S not in A or B1 will go into C1 or C2, not B2.

Since if S is in A for one processor, S will be in A for any other conforming streaming processor, under the status quo a user will always know, from using processor P, whether processor Q is guaranteed to stream S or not.

Under the proposal here, however,  the user cannot know from using processor P whether S is guaranteed to be streamed by processor Q or not.  If P places S in (A + B1), S may in reality be in A, so Q is guaranteed to stream S.  But S may in reality be in B1, in which case Q is not guaranteed to stream S.
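A minimal Python sketch of this interoperability argument, assuming two hypothetical processors P and Q that differ only in which static rewrites they recognize. The function `accepts_silently` and the example stylesheet S are illustrative inventions, not part of any specification text.

```python
def accepts_silently(in_a, recognized_rewrite, proposal):
    """True if a processor streams the stylesheet without reporting XTSE3430."""
    return in_a or (proposal and recognized_rewrite)

# Hypothetical stylesheet S: not in A. P recognizes an equivalent
# guaranteed-streamable form (so S is in B1 for P); Q does not.
s_in_a = False

p_silent_status_quo = accepts_silently(s_in_a, True, proposal=False)
q_silent_status_quo = accepts_silently(s_in_a, False, proposal=False)

p_silent_proposal = accepts_silently(s_in_a, True, proposal=True)
q_silent_proposal = accepts_silently(s_in_a, False, proposal=True)
```

Under the status quo both processors report that S is not in A, so P's verdict carries over to Q. Under the proposal P is silent while Q is not, so silence from P no longer tells the user anything about Q.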

MK says the concept of guaranteed streamability is far from pointless, but I am not sure what work it does.  If the stylesheet is in (A + B1), the message to the user is essentially "well, your stylesheet is either guaranteed streamable, or I rewrote it without noticing into one that is guaranteed streamable.  If you run into problems with a different processor, you too can rewrite it into a guaranteed-streamable form.  Unfortunately, I can't tell you what changes you'll need to make in order to do that; you're on your own there.  But section 19 of the spec may help."  Compared with the goals I think the status quo achieves, this counts in my book as pretty pointless.

Comment 7 Abel Braaksma 2017-01-12 12:10:37 UTC

As per the discussion during the telcon of Jan 05, 2017, I was given the action (ACTION 2017-01-05-002) to withdraw this bug. I am withdrawing it with status WORKSFORME.