29472 – [XSLT30] Add attribute "streamable=yes|no" to xsl:stream

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Summary: [XSLT30] Add attribute "streamable=yes|no" to xsl:stream

Status:	CLOSED INVALID

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	XSLT 3.0
Version:	Candidate Recommendation
Hardware:	PC Windows NT

Importance:	P2 normal
Target Milestone:	---
Assignee:	Michael Kay
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2016-02-16 11:17 UTC by Abel Braaksma
Modified:	2016-08-25 16:22 UTC
CC List:	2 users

Description Abel Braaksma 2016-02-16 11:17:48 UTC

From the workshop at XMLPrague 2016 and the discussion during the F2F day 2 on Bug 29467 we assessed that it was very desirable to have the possibility to switch OFF streaming for xsl:stream. The current means to do so are cumbersome to use in an implementation-independent way, or exist only at the API level.

Comment 1 Michael Kay 2016-02-17 14:28:42 UTC

Leaving open for now. More input on use cases is needed. There are some complex interactions:

* What should a streaming/non-streaming processor do?

* What should they do if the code is/is-not guaranteed-streamable?

* What should they do if the code is streamable but not guaranteed streamable?

Comment 2 Abel Braaksma 2016-02-18 11:59:13 UTC

One of the main problems here is that it is (probably) not possible for a processor to create an extension attribute for this, because it is not allowed to violate default behavior of xsl:stream. If we decide that we should leave it to implementers, we should open up the conformance section to allow such an extension.

Additionally, part of the discussion was on the topic of such an option for xsl:mode and xsl:merge-source as well. I think the main use-case is for xsl:stream, as for xsl:mode and xsl:merge-source a switch can be used by the programmer through shadow attributes with _streamable="{$use-streaming}" and then streaming on/off can be set globally.

For xsl:stream such an option does not exist. In hindsight, something like <xsl:process streamable="yes" source="streameddoc.xml"> might have been a more configurable syntax.

Suggestion:

We might consider <xsl:stream streamable="yes|no"> where the default is "yes". I think this would serve most use-cases. It looks a bit odd, but is in line with other instructions that support streaming, hence I think it fits in our orthogonality principles. It should *not* be an AVT, otherwise static analysis becomes problematic, but as a shadow attribute it is then statically configurable.

Having this in place, a user can set one global static parameter, say $use-streaming, and use that as a shadow attribute in all xsl:mode, xsl:merge-source, xsl:accumulator, xsl:stream and xsl:global-context-item instructions and declarations. It solves the main use-case from the workshop: to test the whole stylesheet without streaming and streamability analysis, and then, once it works, with streaming by switching the static parameter.
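
For illustration only (this sketch is not part of the original comment), the static-parameter switch described above might look roughly as follows. The parameter name $use-streaming comes from the comment; the mode name "s" and the file name input.xml are placeholders, and the streamable attribute on xsl:stream is the syntax proposed in this bug, not something the Candidate Recommendation defines.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- One global static switch; override it at compile time with 'no' to test without streaming -->
  <xsl:param name="use-streaming" static="yes" as="xs:string" select="'yes'"/>

  <!-- Shadow attribute: expanded statically to streamable="yes" or streamable="no" -->
  <xsl:mode name="s" _streamable="{$use-streaming}"/>

  <xsl:template name="xsl:initial-template">
    <!-- ASSUMPTION: xsl:stream/@streamable is the attribute proposed in this bug report -->
    <xsl:stream href="input.xml" _streamable="{$use-streaming}">
      <xsl:apply-templates select="." mode="s"/>
    </xsl:stream>
  </xsl:template>

</xsl:stylesheet>
```

Because the shadow attributes are resolved during static processing, the streamability analysis would still see a fixed yes or no value, which is why the proposal rules out an AVT.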

Comment 3 Michael Kay 2016-02-18 12:19:30 UTC

I agree, xsl:stream/@streamable = yes|no seems the best option available, even though it looks odd.

Presumably the effect of @streamable="no" is to switch off both the check for guaranteed streamability, and streamed execution.

We discussed the relationship to the rule in 19.10:

<quote>
If a construct is declared as streamable but is not guaranteed-streamable (that is, if it fails to satisfy the conditions for streamability defined in this specification), then the processor must be prepared to do any one of the following at user option:

1. Signal a static error [see ERR XTSE3430]

2. Process the stylesheet as if it were a non-streaming processor (see below)

3. Process the stylesheet with streaming if it is able to do so, or signal a static error [see ERR XTSE3430] if it is not able to do so.
</quote>

Noting also that option (1) is a feature-at-risk.

If we want XSLT syntax to control the choice between these three options, rather than relying on unspecified implementation-defined syntax as we do now, then we might consider these as additional values of "streamable":

streamable="yes-strict"
streamable="yes-fallback"
streamable="yes-optimistic"

Recall also that we suggested in discussion that in option (3), we might want to change the word "static" to "static or dynamic" - that is, to allow processors to adopt an optimistic strategy where streaming might fail dynamically under some input conditions.

Comment 4 Michael Kay 2016-02-19 00:09:47 UTC

A point about extension attributes here: what the spec actually says is:

The presence of an extension attribute must not cause the principal result or any secondary result of the transformation to be different from the results that a conformant XSLT 3.0 processor might produce.

It's therefore entirely reasonable to use extension attributes to switch streaming on or off, or diagnostics on or off, even in cases where the spec mandates streaming or mandates diagnostics. If the output of the transformation is not affected, you're within the rules.

(And in practice, I don't feel a need to take the rule too seriously anyway. You can always put a sentence in your documentation "Using the my:xyz option causes the processor to behave in a non-conformant way", and no-one can complain that your software behaves in a non-conformant way when the user specifically asks for this.)

Comment 5 Abel Braaksma 2016-02-20 17:22:54 UTC

> then we might consider these as additional values of "streamable":

> streamable="yes-strict"
> streamable="yes-fallback"
> streamable="yes-optimistic"

In principle I am in favor of such an option (esp. since we already offer optimistic streaming, though the option is not documented yet), but I'm also hesitant, as I'd rather have the stylesheet as a whole be analyzed/processed as strict/fallback/optimistic.

I'd also favor any library package to have to be strict (for compatibility reasons), but that may be both an unnecessary restriction and/or hard to define in spec prose.

For consideration then, I would like to suggest (xsl:package|xsl:stylesheet)/streamability-level = "strict | fallback | optimistic", or something of that kind, which is easier to implement at this stage than trying to define interactions of the several constructs if one is strict and the other is optimistic. If library packages must be strict and import precedence is not an issue (only the principal stylesheet module counts) then I think this could work without being too much of a change, and it relieves the stress of Rule 1: Raising a static error.

Since fallback and optimistic are implementation-defined, we could define this such that only "strict" is required.

Comment 6 Michael Kay 2016-03-10 13:10:31 UTC

I would be inclined to go with the following.

(a) A new attribute xsl:stream/@streamable="yes|no". The default is yes.

(b) In 19.10, change:

The xsl:stream instruction implicitly declares that its contained sequence constructor is streamable;

to:

The xsl:stream instruction implicitly declares that its contained sequence constructor is streamable except when the attribute streamable=no is specified.

(Note: a non-streaming processor already executes the xsl:stream instruction without actually streaming. Setting streamable=no causes a streaming processor to behave like a non-streaming processor; in particular, it does not check that the construct is guaranteed streamable.)
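
As a hedged illustration (not part of the original comment), the effect of (a) and (b) could be pictured with a body that is not guaranteed-streamable, because it makes two downward selections over the same streamed document. The file name input.xml is a placeholder, and streamable="no" on xsl:stream is the proposed attribute, not existing syntax.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template name="xsl:initial-template">
    <!-- ASSUMPTION: streamable="no" is the attribute proposed in (a) -->
    <xsl:stream href="input.xml" streamable="no">
      <!-- Two consuming operands in one expression: not guaranteed-streamable -->
      <xsl:value-of select="count(//a) + count(//b)"/>
    </xsl:stream>
  </xsl:template>

</xsl:stylesheet>
```

Under the proposed wording, a streaming processor would skip the guaranteed-streamability check for this instruction and evaluate it as a non-streaming processor would. With the default streamable="yes", the same body would fall under the rules quoted from 19.10.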

(c) I looked at the following idea:

@streamability = strict | fallback | extended

The attribute defines the action of a streaming processor when the stylesheet contains a construct that is declared streamable but not guaranteed streamable. The values are

1. strict: Signal a static error [see ERR XTSE3430]

2. fallback: Process the stylesheet as if the processor were a non-streaming processor (see below). Note that non-streaming processors are allowed to use streaming if they choose, so this does not necessarily imply a non-streamed evaluation.

3. extended: Process the stylesheet with streaming if it is able to do so, or signal a static error [see ERR XTSE3430] if it is able to determine statically that it cannot do so.

(d) relate these three options to the (a,b,c) options in 19.10.

But on balance, I'm reluctant. I think it's hard to pin down the semantics here. We don't define "streaming" precisely, there are many hybrid strategies, and it's ultimately a matter of judgement whether a particular strategy counts as streaming or not. It also interferes with the general rule that implementations are always allowed to attempt streaming if they choose, and they are always allowed to fail if they run out of resources.

My own preference (as anticipated in the "features at risk") is to change the sentence

If a construct is declared as streamable but is not guaranteed-streamable (that is, if it fails to satisfy the conditions for streamability defined in this specification), then the processor must be prepared to do any one of the following at user option:

to:

If a construct is declared as streamable but is not guaranteed-streamable (that is, if it fails to satisfy the conditions for streamability defined in this specification), then the processor may do any one of the following (and may give users control over which option is chosen):

As I've said before, Saxon's streamability rules will give the same answer as the W3C streamability rules in 99% of cases, but I don't think it's possible to achieve 100%; there will always be cases that aren't streamable according to the W3C rules but where Saxon doesn't detect the non-streamability, and that's because we do a lot of expression rewriting before we assess streamability. So I think defining attributes that purport to give even finer control over streaming behaviour is the wrong direction to go in; the semantics of such attributes are almost impossible to define in a way that is objectively testable.

Comment 7 Abel Braaksma 2016-03-10 13:51:57 UTC

On (a) and (b) I enthusiastically agree with the proposed changes; these are good options and fix the XML Prague dilemma we faced during the workshop.

On (c)
I understand the reluctance, but it also offers a way out for situations where otherwise differences between processors would yield a streamable stylesheet in one processor and a non-streamable one in another.

In the case of "strict", processors can always choose to do (some) analysis prior to expression-rewriting (we carry the streamability properties of a construct along when doing the rewriting, for instance).

I would prefer to have this option, but be lenient about the details.

"strict", I think, should be the only one that is required to be supported.

"fallback" would be nice to have, and is a big opportunity for easier transition from non-streaming to streaming stylesheets (envision processors doing an implicit copy-of for non-streamable constructs, for instance).

"extended" or "optimistic": here I would like to allow processors to raise a dynamic error. That is, this fixes the situation where a user knows that the XML is streamable because they know the document: the processor can use a strategy where it tries to stream it, but if it finds it must "look back", it raises a dynamic error. (It may still raise a static error if the processor knows it cannot possibly stream it, e.g. with an xsl:sort on a streamed node.)

We have had these options for a while now, I think it is good to make them part of the language, not just API-dependent options (either with or without this attribute, it won't be very easily testable, so this doesn't change that status-quo).

Comment 8 Michael Kay 2016-07-01 17:32:15 UTC

We dug a very deep hole for ourselves when discussing this, and I'm really not sure why.

I really think there is a very strong case for having an instruction that does exactly the same thing as xsl:stream except for the streaming.

In the particular example I've been addressing today, I want to compare the performance characteristics of a streaming implementation with a non-streaming implementation (and at the same time check that the results are the same), before deciding which to use. In due course, I might decide that I want the stylesheet to be able to operate either way.

With other constructs in the language, I can switch between streaming and non-streaming versions of my code by setting streamable=yes|no, for example on xsl:mode and xsl:merge-source. I really can't see why there is resistance to allowing the same thing here. Sure, it's difficult to define the precise semantics, but that applies to the streaming machinery in its entirety.

Sure, I could have a command-line flag that says "implement xsl:stream without streaming". But setting options from the outside environment can be very inconvenient, and can be too coarse-grained (it applies to all xsl:stream instructions, not just one).

For the time being I've implemented an extension attribute to achieve this. But I really think it should be a standard option, as it is elsewhere.

If the xsl:stream instruction had been named xsl:process-document, and had been in the language before we started work on streaming, then I don't think we would have had any hesitation in adding a streamable="yes|no" attribute to the instruction.

Comment 9 C. M. Sperberg-McQueen 2016-07-07 16:38:59 UTC

A simple proposal (but not necessarily a non-controversial one):

1 We seem to be in agreement that allowing the xsl:source instruction
to carry a streamable attribute would improve the parallelism between
it and other constructs and thus improve the design of the spec.

2 We seem to be in agreement that this does not, strictly speaking,
require conforming processors not to stream the data.

Bug 29472 is a request for a way to turn off streaming processing on
xsl:stream instructions.  If we really do agree on point 2, then bug
29472 must be closed as INVALID (or WONTFIX, if we think it would be
desirable to be able to turn streaming processing on or off).

If we want to make the change described in 1 (and in the bug report),
even though it will not have the effect of reliably turning streaming
off, then we need a new bug, which we can close by adding the
streamable attribute to xsl:stream (and, if we like, renaming
xsl:stream).

Comment 10 Abel Braaksma 2016-07-08 07:53:35 UTC

(In reply to C. M. Sperberg-McQueen from comment #9)
> A simple proposal (but not necessarily a non-controversial one):
> 
> 1 We seem to be in agreement that allowing the xsl:source instruction
> to carry a streamable attribute would improve the parallelism between
> it and other constructs and thus improve the design of the spec.
I am in favor of renaming xsl:stream to xsl:source or xsl:doc or xsl:source-document or something similar that removes the perceived semantics that xsl:stream starts streaming. That way I think much of the controversy goes away and it brings the instruction in line with other instructions that have the @streamable attribute.

> 2 We seem to be in agreement that this does not, strictly speaking,
> require conforming processors not to stream the data.
I agree. This bug report was inaptly worded. It should be noted that when I wrote "switch OFF streaming", it meant: "remove the requirement to do guaranteed-streamability checking (if you claim conformance with 19.10) and stream the input as per the rules in section 19.10".

Any processor is always allowed to stream the input (fn:doc, fn:unparsed-text etc) if it can and wants to, regardless of whether it claims conformance with section 19.10, and regardless of whether the @streamable attribute is present.

In fact, we even say that "A non-streaming processor is not required to assess whether constructs are guaranteed-streamable". Which I think means that they can check for guaranteed-streamability if they want to.

> Bug 29472 is a request for a way to turn off streaming processing on
> xsl:stream instructions.  If we really do agree on point 2, then bug
> 29472 must be closed as INVALID (or WONTFIX, if we think it would be
> desirable to be able to turn streaming processing on or off).
If that helps I have no problem with that. But I think that in spirit this bug report was raised with the static streamability checking (per 19.10) in mind. In comment 8, Michael explains that other use cases can be considered also.

> then we need a new bug, which we can close by adding the
> streamable attribute to xsl:stream (and, if we like, renaming 
> xsl:stream).
I can go either way (new bug, or this one). I have renamed this bug report, perhaps that helps already. I would vote for renaming xsl:stream.

Comment 11 C. M. Sperberg-McQueen 2016-07-08 13:44:18 UTC

In comment 10, ABr suggests that this bug can be reinterpreted as having been not about switching streaming processing off and on but about the availability of a declarative statement that a document read by an xsl:stream instruction is or is not warranted streamable by the stylesheet author.

It seems to me that one of the important things learned in the process of digging the deep hole MK mentions in comment 8 is that when we are being careful, everyone in the WG agrees on the utility of a careful distinction between (a) a description like "a declarative statement that a document is or is not warranted streamable" on the one hand and (b) descriptions like "the possibility to switch OFF streaming" (description) or "the effect ... is to switch off ... streamed execution" (comment 3) on the other.  It seems very clear from the description, comment 1, and comment 2 that at the workshop in Prague the felt need was for something best described along the lines of (b).  

Closing this bug by doing (a) amounts to a claim that there really is no important distinction between (a) and (b) after all; that is a claim I am not ready to tolerate, let alone endorse.  

I recognize that different members of the WG assign different degrees of importance to the difference between (a) and (b), but just as I have accepted the unwelcome fact that not everyone is willing to make any effort to distinguish clearly between declarative and imperative semantics at all times, so I ask other members of the WG to accept the unwelcome fact that some members of the WG (such as me) regard the distinction as fundamental.

Comment 12 Abel Braaksma 2016-07-11 10:32:07 UTC

Re comment #11: let's create a new bug then and close this as INVALID (do we need a WG decision for that or can we just close and move on?). And apologies for the fuzziness of my language, I still have trouble understanding the subtle differences between the different wordings mentioned here and in offline discussions.

Comment 13 Abel Braaksma 2016-07-20 14:19:28 UTC

Closing this as invalid per the suggestion in comment 9. A new bug has been created per the same suggestion; see bug 29747.