This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 25174 - [XSLT30] Buffering with xsl:try wrapped around xsl:stream or xsl:result-document
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XSLT 3.0
Version: Last Call drafts
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on: 25173
Blocks:
Reported: 2014-03-27 13:12 UTC by Abel Braaksma
Modified: 2014-08-22 11:37 UTC

Description Abel Braaksma 2014-03-27 13:12:21 UTC
This bug report is a result of [1] and the response from the working group in [2]. The relevant quotes from the latter (minutes of 13 March 2014) are:

Key example (edited to be valid):
<xsl:try>
    <xsl:stream href="foo.xml">
        <xsl:apply-templates mode="streaming"/>
    </xsl:stream>
    <xsl:catch />
</xsl:try>


Key issue: 
xsl:try requires stable roll-back in case an error occurs that is caught by xsl:catch, even if that error is right at the beginning, i.e. when trying to read the source document.

Quotes from the minutes:
"ABr: but if you only want to catch a failure in opening the document, you've incurred the buffering cost for no benefit. And there's nowhere else to put the xsl:try in this case."

"MK: yes, it would be nice to recover from a failure opening the input, without having to buffer all the output."

"MK: this would suggest solution (3), an on-error output, but not sure how we would define the semantics. Basically, catching errors that occur before any output is written."

Suggested solutions (note #3)
(1) disallow non-motionless expressions in xsl:try, which forces the programmer to do a copy prior to the xsl:try
(2) define an extra attribute on xsl:output to make rollback behavior on xsl:try optional
(3) make errors on xsl:stream uncatchable, or catchable only by a special attribute "on-error"
(4) disallow xsl:stream inside xsl:try


-------------------------------------------------------

Wrapping xsl:try around xsl:stream effectively prevents (output) streaming, because the whole output must be buffered. This is not a problem if the result is small, but if it is large, the buffering defeats the purpose of streaming.

The same issue occurs when xsl:try is wrapped around xsl:result-document, with one difference: the processor is not required to leave the result document in a stable state, i.e., it is possible to start writing output to a result document and _not_ roll back if an error is raised. But this is not ideal either, as the user will be left with an uncertain state.

Suggestion #3
The suggested resolution #3 above introduces a new attribute on xsl:stream, on-error, which could take an expression. The special variables defined under [3], such as err:code and err:description, are available inside this expression. Example:

<xsl:stream href="foo.xml" on-error="my:report($err:code)">
    <xsl:apply-templates mode="streaming"/>
</xsl:stream>

There is one additional drawback here, however. The special variables in the err: namespace are currently lexically scoped (see [3]), which means you will have to pass each error variable. A solution to this is to (also) allow the errors to be available as a map (as in $err:bag or $err:map), which gives advantages in this scenario and related scenarios, or to allow the special variables to be dynamically scoped, comparable to current-group. In the latter case you can write:

<xsl:template match="/">
    <xsl:stream href="foo.xml" on-error="my:report-errors()">
        <xsl:apply-templates mode="streaming"/>
    </xsl:stream>
</xsl:template>

<xsl:function name="my:report-errors">
   <xsl:message select="$err:description" />
</xsl:function>

If we decide, however, to keep the current scope and variables for err:xxx, but we adopt static AVTs from [4], there is another way out for programmers to write this more effectively:

<xsl:variable name="errorhandler"
    select="'my:report-errors($err:code, $err:description, $err:value)'"
    static="yes" />

<xsl:template match="/">
    <xsl:stream href="foo.xml" on-error="{$errorhandler}">
        <xsl:apply-templates mode="streaming"/>
    </xsl:stream>
</xsl:template>

<xsl:function name="my:report-errors">
   <xsl:param name="errcode" />
   ....
   <xsl:message select="$errcode" />
   ....
</xsl:function>

IMO it is hard not to get enthusiastic about static AVTs. They seem to open up a whole new level of abstraction through preprocessing macros, which can greatly simplify many typical use cases (but that's another subject; again, see [4]).

Semantics for on-error: it will only catch errors that occur before the processor starts reading the document, perhaps up until the root node, which is in line with current rules (somewhere we say that buffering of the DTD, opening comments, etc. is required in streaming). Other errors ought to be caught the normal way, using more fine-grained xsl:try/xsl:catch.

I propose to adopt this for xsl:stream and xsl:result-document, in the latter case only to catch errors that occur on the first attempt to write the result document.

Recovery actions: when on-error is defined and called, we might introduce a return value true/false that determines whether further processing should take place or not, or add one more attribute: on-error-terminate="yes|no". We may also decide on whether the result of on-error becomes part of the current result tree or not.

See also: bug 25173.


[1] https://lists.w3.org/Archives/Member/w3c-xsl-wg/2014Mar/0012.html
[2] https://lists.w3.org/Archives/Member/w3c-xsl-wg/2014Mar/0014.html
[3] https://www.w3.org/TR/xslt-30/html/Overview.html#element-try
[4] https://www.w3.org/Bugs/Public/show_bug.cgi?id=24619
Comment 1 Abel Braaksma 2014-05-16 01:49:49 UTC
Following discussion during the telcon of April 10, 2014, the WG asked me to come up with a proposal. At that telcon, Michael Kay suggested allowing xsl:catch inside an xsl:stream instruction.

I think that proposal has merit, because it uses existing syntax and has fewer side-effect issues than the proposal in this bug report (which adds an on-error attribute to xsl:stream).

Here's an outline of the xsl:catch proposal:

Allow the xsl:catch instruction to appear as a child of xsl:stream. It must be the last child element (similar to xsl:try), except for xsl:fallback.

It will have an absent focus for both the sequence constructor and the select attribute (reasoning: we try to catch an error on reading the input stream, so there will be no node to process).

If this element is present, then, prior to processing the sequence constructor in xsl:stream, the processor must attempt to read and buffer the streamed input document up until the start of the root element (i.e., it must perform the same action that would be required for has-children(root()), as described in bug 25173). If it fails, the appropriate error is raised, which can then be caught by the xsl:catch element.

If the error is not caught, it will bubble up and can be caught by an enclosing xsl:try/xsl:catch construct, and the xsl:stream instruction is not evaluated further.

If the error is caught, the sequence constructor of xsl:catch is processed and the rest of the sequence constructor of xsl:stream will be ignored. No rollback is necessary, because the body of xsl:stream has not yet been processed.

If no error is raised, the xsl:catch instruction is ignored and processing continues as normal.

Notes:
Note 1: this differs from using fn:streaming-document-available, which does not raise an error but simply returns true or false. That function has the drawback that a subsequent read in an xsl:stream instruction using the same URI may still raise an error.

Note 2: this instruction cannot be used to catch errors raised by the body of the xsl:stream instruction. It will only catch errors resulting from the initial reading of the streamed input document, i.e. when it fails to construct a streamed document node.

Note 3: this instruction is defined to allow graceful degradation without having to buffer the full result of streamed processing in case of a failure to read the input document. If you do want to catch errors during streamed processing, you can wrap the body of an xsl:stream element inside a regular xsl:try/catch, but this will incur the penalty that the processor will be required to buffer all output, which may be detrimental in certain streaming scenarios.

Note 4: because xsl:catch has absent focus, its sweep and posture are motionless and grounded.

Example:

<xsl:stream href="http://example.org/{$docname}.xml">
    <xsl:value-of select="count(//news)" />
    <xsl:catch errors="err:FODC0005">
       <xsl:text>Invalid docname specified.</xsl:text>
    </xsl:catch>
</xsl:stream>

As an aside: it came to my attention that we do not currently specify the error conditions for xsl:stream with regard to the input document. I assume they are the same as for fn:doc?
Comment 2 Michael Kay 2014-05-16 07:10:46 UTC
I wonder if this can't be done simply by defining an error code that is guaranteed to be used exclusively for errors encountered at the start of the streaming operation, and then catching this specific error with a conventional try/catch instruction around the xsl:stream? I.e. no new syntax, just new semantics for a specific error code?
Comment 3 Abel Braaksma 2014-05-22 11:12:39 UTC
Re comment 2:

I thought of it, but I think we run into even more trouble. Consider:

<xsl:try>
   <xsl:stream href="good.uri">
      <xsl:apply-templates select="x" />
   </xsl:stream>
   <xsl:catch errors="err:Special">
      <xsl:message select="'Rolled back'" />
   </xsl:catch>
</xsl:try>

<xsl:template match="x">
   <xsl:try>
      <xsl:value-of select="@y mod @z" />
      <xsl:stream href="{@baduri}">
         <xsl:apply-templates />
      </xsl:stream>
      <xsl:catch errors="*">
         <xsl:message select="'What is rolled back?'" />
      </xsl:catch>
   </xsl:try>
</xsl:template>

The second try/catch must somehow differentiate between
1) buffering and rollback current context in case of div by zero
2) poking @baduri for I/O and rolling back without applying templates
3) buffering apply templates and rolling back
4) if halfway streaming getting I/O error, rolling back apply templates

Even though implementations might be able to do this, I don't think it is good to have one construct, with one and the same syntax, follow different semantics and different rollback behavior for potentially the same (I/O) error.

Which is why I presently prefer the (slightly) different syntax (xsl:catch as child of xsl:stream) where the position and focus of xsl:catch makes it unambiguous what is going on and what is going to be caught and/or rolled back.
Comment 4 Michael Kay 2014-07-31 10:29:29 UTC
In discussion today we came up with the following approach.

We add an attribute recoverable=yes|no to xsl:result-document (and to xsl:output for the principal result document). The semantics of this attribute are that if recoverable=yes is specified, then any output written to the result document during the course of an xsl:try must be "undoable" if an error occurs during the xsl:try and is caught. (The implementation for undoing the bad writes might either use a rollback/checkpoint mechanism, or a buffering/delayed-write mechanism). 

But if recoverable="no" is specified, then the following happens: the processor is allowed to write output to the result document optimistically, and if a failure occurs at a point where it has written output that needs to be undone, then despite the fact that the error was caught, the xsl:result-document instruction itself fails saying in effect that the contents of the result-document are incorrect and unrecoverable. A try/catch around the xsl:result-document instruction can catch this error, and determine that the transformation should continue despite one of its result documents being unusable.
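A minimal sketch of how the recoverable="no" case described above might look in a stylesheet. The attribute on xsl:result-document is the proposal under discussion at this point in the thread, not settled syntax; the file names report.xml and foo.xml and the element names are illustrative assumptions.

```xml
<!-- Sketch only: "recoverable" on xsl:result-document is the proposed
     attribute from this comment; file and element names are made up. -->
<xsl:try>
    <xsl:result-document href="report.xml" recoverable="no">
        <report>
            <!-- Inner try: output may be written optimistically. If an error
                 is caught here after output has already been written, the
                 enclosing xsl:result-document itself fails. -->
            <xsl:try>
                <xsl:stream href="foo.xml">
                    <xsl:apply-templates mode="streaming"/>
                </xsl:stream>
                <xsl:catch>
                    <failed/>
                </xsl:catch>
            </xsl:try>
        </report>
    </xsl:result-document>
    <!-- Outer catch: decide to carry on even though report.xml is unusable. -->
    <xsl:catch>
        <xsl:message select="'report.xml is unusable; continuing'"/>
    </xsl:catch>
</xsl:try>
```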

If recoverable="no" is specified, then the user can still manually prevent problems by doing local buffering of output using an explicit xsl:variable to hold temporary results; the potential for try/catch to cause result document corruptions occurs only when in final output state.
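The local buffering mentioned above could be sketched as follows. The template match pattern, the attribute names @y and @z, and the value element are illustrative assumptions.

```xml
<!-- Sketch only: element and attribute names are illustrative. -->
<xsl:template match="record">
    <!-- The xsl:try is evaluated while constructing a variable, so its
         result is held in memory and nothing reaches the final result
         until the try has either succeeded or been caught. -->
    <xsl:variable name="buffered" as="item()*">
        <xsl:try>
            <!-- Raises a dynamic error when @z is zero. -->
            <value><xsl:value-of select="@y idiv @z"/></value>
            <xsl:catch>
                <value status="failed"/>
            </xsl:catch>
        </xsl:try>
    </xsl:variable>
    <!-- Only now is the (already safe) content written to the result. -->
    <xsl:sequence select="$buffered"/>
</xsl:template>
```

The cost is that the buffered section is fully materialized, so this only suits sections whose output is known to be small.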

Users can also reduce the risk of xsl:stream causing problems by calling stream-available() to test whether a stream is available before starting to process it. There's still a risk of an I/O error on the stream later, and the choice of whether to buffer output to make this I/O error recoverable is now made at the level of the xsl:result-document instruction.
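A sketch of the availability test described above, assuming the function is spelled stream-available() as in this comment and takes the URI as a string. The template name and the file name foo.xml are illustrative.

```xml
<!-- Sketch only: function name as used in this comment; the template name
     and URI are illustrative. -->
<xsl:template name="main">
    <xsl:choose>
        <xsl:when test="stream-available('foo.xml')">
            <!-- An I/O error can still occur later, while streaming. -->
            <xsl:stream href="foo.xml">
                <xsl:apply-templates mode="streaming"/>
            </xsl:stream>
        </xsl:when>
        <xsl:otherwise>
            <xsl:message select="'foo.xml cannot be opened for streaming'"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>
```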
Comment 5 Michael Kay 2014-07-31 10:53:15 UTC
We could in addition allow recoverable="yes" on xsl:try in the case where the result document is non-recoverable to indicate that a section of code is recoverable (i.e. buffering is required) even though the document as a whole is not. (This could also be achieved by using an xsl:variable around that section of code, but there was a feeling that this approach was too non-obvious).
Comment 6 Michael Kay 2014-08-04 14:42:09 UTC
I have written the solution of comment #4 into the spec for the working group to review.
Comment 7 Michael Kay 2014-08-21 14:10:37 UTC
At the telcon on 14 August we studied the new text, and the question was raised again of whether to put the new attribute on xsl:result-document, xsl:try, or both.

On reflection I'm inclined to put it on xsl:try. This has the advantage, originally pointed out in comment 5, that it becomes more visible what the recovery units are, and where buffering might be required to enable recovery.

If we were to add the attribute on xsl:result-document as well then I would propose that this merely acts as a default for the value on xsl:try. However, since the association of an xsl:result-document instruction to an xsl:try instruction is dynamic, this adds another piece of dynamic context information which I think we could well do without. Also, putting the attribute on xsl:result-document makes it messy to define an equivalent for the principal result tree: xsl:output is not really the right place as it's all about serialization. So my proposal is to have it on xsl:try only.

The text of section 24.3 is largely still applicable, though it now becomes logical to move it to 8.3.2.
Comment 8 Michael Kay 2014-08-21 17:27:08 UTC
The WG accepted the proposal in comment #7.

There was some feeling that the attribute name "recoverable" could be improved. People would prefer something that linked it to the recoverability of the final result tree output, for example recover-output or rollback-output were suggested. It was left to the editor to ponder upon.
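A sketch of the accepted design applied to the key example from the description, with the attribute on xsl:try. The attribute is written here as "recoverable", the working name in comment #7; as noted above, the final name was left to the editor. The err prefix is assumed to be bound to the standard error namespace.

```xml
<!-- Sketch only: "recoverable" is the working attribute name from
     comment #7. Assumes xmlns:err="http://www.w3.org/2005/xqt-errors". -->
<xsl:try recoverable="no">
    <!-- No output buffering is required for this recovery unit. A failure
         to open foo.xml happens before any output is written, so it can
         be caught cleanly. -->
    <xsl:stream href="foo.xml">
        <xsl:apply-templates mode="streaming"/>
    </xsl:stream>
    <xsl:catch>
        <xsl:message select="'Streaming foo.xml failed:', $err:description"/>
    </xsl:catch>
</xsl:try>
```

If an error occurs after output has already been written, the semantics carried over from comment #4 mean the failure cannot be recovered at this level, which is the trade-off accepted in exchange for not buffering.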
Comment 9 Michael Kay 2014-08-22 11:37:49 UTC
The changes have been applied.