24606 – [xslt 3.0] Multi-pass streaming

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.


Summary: [xslt 3.0] Multi-pass streaming

Status:	CLOSED LATER

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	XSLT 3.0
Version:	Last Call drafts
Hardware:	PC All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Michael Kay
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-02-10 19:40 UTC by Michael Kay
Modified:	2014-05-15 14:00 UTC
CC List:	2 users

See Also:

Attachments

Description Michael Kay 2014-02-10 19:40:34 UTC

This thread in the December archives remains unresolved.

See https://lists.w3.org/Archives/Member/w3c-xsl-wg/2013Dec/0028.html (member only) and subsequent thread.

We do not really have a clean way of doing multi-pass streaming in the current spec (other than relying on external frameworks like XProc to combine multiple stylesheets, of course).

The classic way of defining a multi-pass transformation in XSLT is to use variables:

<xsl:variable name="temp1">
  <xsl:stream href="input.xml">
    <some processing/>
  </xsl:stream>
</xsl:variable>
<xsl:apply-templates select="$temp1" mode="streamable"/>

But there is nothing here to say that $temp1 should be processed as a stream; indeed we encourage the idea that streamed nodes are never bound to variables.

I would be tempted to suggest something like:

<xsl:pipeline>
  <xsl:stream href="input.xml">
    ...
  </xsl:stream>
  <xsl:stream>
    ...
  </xsl:stream>
</xsl:pipeline>

where each xsl:stream in the pipeline after the first takes its input from the result of the previous xsl:stream.

I'm trying to look around for a solution that doesn't involve new syntax, but it's hard to find one. 

* Adding streamable="yes" to xsl:variable is a possibility, but it's still new syntax, and the rules for what it means and how it can be used are potentially quite tricky.

* Some kind of coupling of xsl:result-document and xsl:stream might be possible:

  <xsl:result-document href="temp.xml" method="pipe">
    ...
  </xsl:result-document>
  <xsl:stream href="temp.xml">
    ...
  </xsl:stream>

* Or something like Saxon's next-in-chain:

  <xsl:result-document next="mode2">
    <xsl:stream href="input.xml">
      ...
    </xsl:stream>
  </xsl:result-document>

where the "next" attribute is a mode that is then used to process the result document.

Comment 1 C. M. Sperberg-McQueen 2014-02-13 11:39:36 UTC

We discussed this at the WG meeting in Prague.

Some WG members noted that this has been a desideratum for some time. Some felt that this was too big a change to make after going to Last Call.

The option of using XProc means that this really is a desideratum, not a pressing requirement.

Perhaps the WG should comment on the current XProc requirements document and suggest that support for streaming processing should be a requirement.

Comment 2 dnovatchev 2014-02-14 21:15:15 UTC

Dear WG,

Please kindly understand that leaving useful functionality unspecified, because it *may* be implemented, is a user's hell.

At present XProc doesn't mandate any streaming, and leaves this entirely to each individual implementation.

From the XProc spec (http://www.w3.org/TR/xproc/#external-docs):

 "Whether (and when and how) or not the intermediate results that pass
 between steps are ever written to a filesystem is
 implementation-dependent."

Another recent bug resolution, 24648, again leaves an important feature to the whim of a particular implementor.

So, we end up with: Implementation1(Feature1) and Implementation2(Feature2).

I feel sad for the user who needs both Feature1 and Feature2.

Hope to have an official reply that this user's hell is what the WG really plans for us.

Regards,
Dimitre Novatchev