XML Processing Model WG -- 04 Nov 2010

Accept this agenda?

-> http://www.w3.org/XML/XProc/2010/11/04-05-agenda

Henry can call in between 16:15 and 17:00, so we'll move review of processor profiles to the end of the day

Accepted.

Accept minutes from the previous meeting?

-> http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2010Oct/0017.html

Accepted.

Next meeting: telcon, 18 Nov 2010?

No regrets heard.

Review of proposed XProc errata

-> http://www.w3.org/XML/XProc/2010/05/wd-comments/

(We've got things mixed together on the issues list; Norm will fix that later)

Allow p:xslt to produce an empty sequence?

Vojtech: It would require all implementations to change.

Alex: It is annoying.

More discussion...

Norm: I don't hear consensus to make the change as an erratum.

Mohamed: I think it's an uncommon problem, and the folks who encounter it, the ones using xsl:result-document, are probably able to work around it.
... It might be more confusing for users with simpler stylesheets to understand why it's a sequence.

Proposal: No change to the spec, the test suite has already been updated by Vojtech.

Accepted.

xml:id processing in XProc

Mohamed: We only say "may" in the spec, so I don't think we can say that xml:id processing is mandatory.

Vojtech: But the revised profiles document makes it explicit.

Mohamed: We don't have xml:id in the implementation-defined features list.

Norm: I think we need to do that as an erratum.
... I just don't think we can change "may" to "must" in an erratum.

<scribe> ACTION: Norm to draft an erratum to add xml:id to the implementation-defined features list. [recorded in http://www.w3.org/2010/11/04-xproc-minutes.html#action01]

Alex: If you were going to make xml:id required, you'd have to say it was performed on all the inputs wherever they came from, on p:document, on p:inline, and on the outputs of all steps.
... Should we say that in the spec as part of the erratum, explaining why xml:id was left as "may"?

Norm: Yes, I'll try to do that when I add the text to make xml:id implementation-defined.

Proposal: No technical changes, just clarify that xml:id is an implementation-defined feature.

Accepted.

Shouldn't choose report err:XD0026 too?

-> http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2010Sep/0003.html

Norm: this looks like a straight-up erratum to me

Sounds of general agreement

Proposal: Fix the prose for p:xpath-context to make it clear that err:XD0026 should be raised there too.

Accepted.

<scribe> ACTION: Norm to propose an erratum to fix p:xpath-context [recorded in http://www.w3.org/2010/11/04-xproc-minutes.html#action02]
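
For illustration, a minimal sketch of the kind of pipeline fragment this erratum concerns. The test expression and the p:identity branches are assumptions made for the example, and the p: prefix is assumed to be bound to the XProc namespace. With p:empty as the context, an XPath 1.0 implementation evaluates the test against an empty document node, while an XPath 2.0 implementation has no context item, so the reference to "/" would raise err:XD0026 under the proposed wording.

```xml
<p:choose>
  <!-- No document is supplied as the context for the test expressions. -->
  <p:xpath-context>
    <p:empty/>
  </p:xpath-context>
  <!-- "/doc/@status" refers to the context node, which is undefined
       in an XPath 2.0 implementation. -->
  <p:when test="/doc/@status = 'draft'">
    <p:identity/>
  </p:when>
  <p:otherwise>
    <p:identity/>
  </p:otherwise>
</p:choose>
```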

New and upcoming XProc implementations

(Topic suggested by Mohamed)

General discussion: Tubular submitted test suite results recently. There's a .NET implementation in the works from Oliver H. Vojtech knows of another Java implementation that's coming.

Mohamed: We should update the public XProc page too.

Norm: Yes. Want to take a stab at it?

<scribe> ACTION: Mohamed to propose new text for the public XProc page. [recorded in http://www.w3.org/2010/11/04-xproc-minutes.html#action03]

Simplified template step

Mohamed: You can do it with XSLT

Some exploration of how XSLT Simplified Stylesheets work

Much discussion...

Alex: Are some of these things really just syntactic sugar that you could implement by translating to some equivalent 1.0 pipeline?

The scribe provided the following summary of this issue after lunch:

The WG discussed the problem of a "simplified template" step, something that would make it easier to construct documents with dynamically generated content.

There seem to be two paths: a clean-slate design targeted at XProc V.next where we can change the semantics of pipelines in any way we want, or a design that would be valid in XProc 1.0. Although the former might produce the best results, for the short and medium term, it seemed wise to consider what we could accomplish within the constraints of XProc 1.0.

After much discussion, the WG concluded that we could achieve significant simplification with two new XProc atomic steps: p:in-scope-names and p:document-template.

<p:in-scope-names>
  <p:output port="result" primary="false"/>
</p:in-scope-names>

The p:in-scope-names step is roughly analogous to the p:parameters step. It creates a c:param-set document containing one c:param element for each in-scope variable and option. The order of the c:param elements is implementation-dependent.
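
As a rough illustration, the result port might carry a document like the one below. This assumes a pipeline with four in-scope variables or options named method, uri, user, and password; the names and values are chosen for the example only, and the c:param elements could appear in any order.

```xml
<c:param-set xmlns:c="http://www.w3.org/ns/xproc-step">
  <!-- One c:param per in-scope variable or option (illustrative values). -->
  <c:param name="method" value="POST"/>
  <c:param name="uri" value="http://example.com/post"/>
  <c:param name="user" value="user"/>
  <c:param name="password" value="password"/>
</c:param-set>
```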

<p:document-template>
  <p:input port="template"/>
  <p:input port="source" sequence="true" primary="true"/>
  <p:input port="parameters" kind="parameters"/>
  <p:output port="result"/>
</p:document-template>

The p:document-template step makes a verbatim copy of its template input with one exception: within attribute values and element content, every expression delimited by curly braces is treated as an XPath expression and replaced by the result of evaluating that expression.

The context item used during evaluation of the expression is the document that appears on the source port. It is a dynamic error err:XDxxxx if more than one document appears on the source port.
In an XPath 1.0 implementation, if p:empty is given or implied as the context, an empty document node is used as the context node. In an XPath 2.0 implementation, the context item is undefined. It is a dynamic error (err:XD0026) if the expression makes reference to the context node, size, or position when the context item is undefined.
The names of all the parameters passed to the step are available as variable names during expression evaluation.
If an expression selects one or more nodes from the context, a copy of those nodes is inserted in the result document, unless the expression occurs in an attribute value in which case their string value is used.
The sequence "{{" is replaced by a single, literal "{".
The sequence "}}" is replaced by a single, literal "}".
The version of XPath supported by the step is implementation-defined. The XPath context is the step XPath context; this means, for example, that the XProc extension functions cannot be called.
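
A small sketch of how these rules would play out, assuming a parameter named id with the value 42 is passed to the step (both the name and the value are hypothetical). The first element is the template; the second is the result implied by the rules above.

```xml
<!-- Template: single braces delimit XPath expressions;
     doubled braces are escapes for literal braces. -->
<note title="Order {$id}">Use {{$id}} to refer to order {$id}.</note>

<!-- Result, assuming the parameter id has the value 42. -->
<note title="Order 42">Use {$id} to refer to order 42.</note>
```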

Suppose you wish to construct the following document:

  <c:request method="POST" href="http://example.com/post"
             username="user" password="password">
    <c:body>
      <h:div>...</h:div>
    </c:body>
  </c:request>

Where the method, href, username, and password are computed by the pipeline as either options or variables and the body of the post is selected from one of the pipeline's input documents (say /h:html/h:body/h:div[3]).

Using only XProc 1.0 standard steps, this can be accomplished with several successive p:add-attribute steps and a p:insert step. (There may be other ways as well, though they are all a bit tedious.)
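
For comparison, a sketch of one such XProc 1.0 pipeline fragment. It assumes the p:, c:, and h: prefixes are bound, that $method, $uri, $user, and $password are in scope as options or variables, and that a step named "main" supplies the XHTML document on its "result" port. It is one possible arrangement, not the only one.

```xml
<!-- Start from a skeleton request with an empty body. -->
<p:identity>
  <p:input port="source">
    <p:inline><c:request><c:body/></c:request></p:inline>
  </p:input>
</p:identity>

<!-- One p:add-attribute step per computed attribute. -->
<p:add-attribute match="/c:request" attribute-name="method">
  <p:with-option name="attribute-value" select="$method"/>
</p:add-attribute>
<p:add-attribute match="/c:request" attribute-name="href">
  <p:with-option name="attribute-value" select="$uri"/>
</p:add-attribute>
<p:add-attribute match="/c:request" attribute-name="username">
  <p:with-option name="attribute-value" select="$user"/>
</p:add-attribute>
<p:add-attribute match="/c:request" attribute-name="password">
  <p:with-option name="attribute-value" select="$password"/>
</p:add-attribute>

<!-- Copy the selected div into the body of the request. -->
<p:insert match="/c:request/c:body" position="first-child">
  <p:input port="insertion" select="/h:html/h:body/h:div[3]">
    <p:pipe step="main" port="result"/>
  </p:input>
</p:insert>
```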

Using these new steps, it's much simpler:

  <p:in-scope-names name="vars"/>

  <p:document-template>
    <p:input port="template">
      <p:inline>
        <c:request method="{$method}" href="{$uri}"
                   username="{$user}" password="{$password}">
          <c:body>
            { /h:html/h:body/h:div[3] }
          </c:body>
        </c:request>
      </p:inline>
    </p:input>
    <p:input port="source">
      <p:pipe step="main" port="result"/>
    </p:input>
    <p:input port="parameters">
      <p:pipe step="vars" port="result"/>
    </p:input>
  </p:document-template>

This is in many ways like a simplified stylesheet in XSLT. In fact, aside from the new semantics associated with curly-braces in element content, it could be implemented in XSLT. Conversely, it could be implemented in XQuery, if you were willing or able to escape all of the markup and pass it to the p:xquery step in a c:query element.

(There was some subsequent editing of the examples and fine tuning of semantics, but the WG agreed to publish something along these lines as a WG Note. Norm agreed to write it up.)

Charter for XProc

Norm: What should we do next? Fold up our tents and go home or do more work?

Liam: The XML Activity has a charter, as do the individual working groups. They all expire in January. This is normal, it's a chance for the membership to review activities.

Liam outlines the process.

Some discussion of 1 or 2 year charters; a 2 year charter implying XProc 2.0 work.

Norm: We have two implementations and reports of as many as four or five more in the works.
... I think I'd like a 1 year charter for maintenance and possible requirements gathering, then after a year see where we are.

Liam: I'd like to be able to consider pipelines, with synchronization points, as a possible solution for more complex processing requirements
... For example, as an alternative to XQuery Scripting Extensions.

Norm: I'd be happy with a charter that broadly spoke of maintenance and possible requirements gathering with some explicit discussion of interaction with other working groups to consider possible cooperative activities.

XML processor profiles

Henry joins by phone.

Norm: I think we're in good shape. I like the document. I discussed it informally with the TAG over lunch.

Norm summarizes his informal discussion with the TAG over lunch. He has agreed to write another document, one that builds on the profiles document to say what a processor that receives an application/xml document should do.

Norm: There's still a desire to have a document that says more along the lines of "XML Functions", but it doesn't have to be this document and it doesn't have to be a normative product of this WG.
... I agreed that I'd work on such a document.

<ht> I will help if we can actually figure out a ToC for the proposed additional doc't

<ht> I remain unconvinced that there is a coherent topic short of a PhD thesis in scope

Norm: Henry, did you get a chance to review the DoC?

<ht> I did look

<ht> I believe all are closed

Norm: Next steps: clean up the typo, republish as a Last Call with an explicit note that we plan to go directly from LC to PR. Explicitly ask David and Bjorn if they're content with the resolutions.

<scribe> ACTION: Henry to produce such a Last Call draft. [recorded in http://www.w3.org/2010/11/04-xproc-minutes.html#action04]

<scribe> ACTION: Henry to close the issues on the DoC that we believe are resolved. [recorded in http://www.w3.org/2010/11/04-xproc-minutes.html#action05]

Iteration

Some discussion of the XML Calabash "iterate-to-fixed-point" step.

Alex points out that his use case, combining the entries of a paginated Atom feed into a single feed, isn't well-served by this step.

Mohamed suggested that the p:iterate step will provide a way to do the pagination use case easily.

Florent observes that if you have p:iterate you can implement fixed-point iteration with it.

Mohamed: The p:iterate step will iterate over a sequence, but if the fixed-point case is a useful case, then we can probably make that work.

<scribe> ACTION: Mohamed to write up a proposal for p:iterate (along the lines of xsl:iterate from XSL 2.1) [recorded in http://www.w3.org/2010/11/04-xproc-minutes.html#action06]

More future step possibilities

Norm: We can't add new compound steps until V.next, but there are some we could write up as possibilities

Some discussion of the restriction on where p:variables can appear. We successfully convinced ourselves that we needed the restriction :-)

Some discussion of dependencies

It might be nice to have a partition element that simply ensures that all of the steps in one partition run before/after all the ones in another partition

Adjourned for the day.

Time passes. Night falls.

Reconvened on 5 November 2010.

Investigating methods for making pipelines easier to build

Vojtech: We're reimplementing the DITA toolkit in XProc (instead of Ant+Java extensions)
... It's about 9000-10000 lines of XProc
... Top-level pipeline consists of about 20 steps
... If you didn't want to do, for example, conref processing (one of the steps in the middle)
... how would you turn that off or replace that with something else?
... Suppose you want to do it just slightly differently
... Currently the only option is to cut-and-paste the entire pipeline and change the one part I want to change.
... It would be nice if you could pass a step in dynamically or do some sort of replacement

Alex: Or pass in a subpipeline

Norm: The problem is that there's no obvious, single step that you want to override, you need to access them all

Vojtech: Even if they copy-and-paste to change the definition of a pipeline, the original pipeline may be called from somewhere else where they have no control.
... It would also be nice to have library-level variables or constants.

Norm: I can imagine we might do that...

Mohamed: But it's not really variables, you want to override them, so it's more like parameters or options.

Norm: That could get messy.

More discussion

Vojtech: I really want to replace all steps of a given type, not just a single instance.

Florent: It seems that there are two aspects here: the ability to override an entire step type (like XSLT import precedence) and the ability to modify the internals of one type.

Norm: Maybe you could make it work with just a mechanism to override whole step types. If you want to override only some, you can conditionalize the internals.

Mohamed: But how would you know what instance it was?

Norm: Yes, I think we'd have to provide a function to get the step name.

Alex: Sometimes the pipeline author knows where the extension points are, where the author is making choices, and that's maybe slightly different.

Norm: That's where p:eval would be useful. You could let users pass in a pipeline.
... You can kind of do that today by putting in calls to step types that are undefined, then the user has to import both your pipeline and declarations for those types.
... (I wonder if that's actually true)...

Alex: It might be nice if we could make that easier; some mechanism for declaring steps "abstract" and then some mechanism for defining them when you import the pipeline.
... I wonder if users will be able to find ways to define where their extension points need to be or if we'll need a "redefine" mechanism.

Mohamed: I don't think I want to see the ability to change a single instance of a type, but being able to change a whole type seems like something I might want to see.

Norm: Isn't this just like what we were talking about in XSLT WG about overriding templates?

Mohamed: Yes.

Norm: Ok. I'm not dreaming.

"Streaming" XProc

Mohamed: The idea is the same as it was at the beginning of XProc. Being able to sort out, whatever we call it, saying how you can make your pipeline more streamable.
... We already say that if you use last(), you've probably impacted streaming in the pipeline.
... It would be nice if we had a document that described a profile of XProc that would improve streamability.
... Innovimax is working on an implementation of XProc using multi-threading and streaming.
... We are planning to issue the project at the end of March. It is based on XML Calabash. We already have some interesting results.
... We're working at the same time on a streamable subset of XPath with a different research agency.
... We have customers with large volumes of data that can't use DOMs and they can't rewrite all their tools.

Norm: That could be interesting. If you had a defined streamable subset of XPath, you could have a switch to analyze a pipeline for streamability.

Mohamed: Like the work we're doing in XSLT 3.0 now for streaming, we should be looking at XProc with streaming in mind.

Alex: It always depends on what you mean by streaming. What's your bound?

Some discussion of the value (or not) in defining a streamable subset of XPath (whatever that means)

Florent recounts the example in XSLT of two XPath expressions interacting in ways that prevent streaming even though the expressions are simple.

Discussion leads to general agreement that a study of use cases is necessary and would be valuable.

Mohamed: Having a spec is only the first step. The second step is to have a document that describes usage patterns that will improve performance.

Mohamed suggests that we could have a workshop on pipeline performance.

Norm: So do we want to ask Liam to put something in the charter about investigating streaming or performance?

General nods of agreement.

More discussion of possible extension steps

<scribe> ACTION: Alex to review common concurrency patterns to see what might work best for us (e.g., countdown latches) [recorded in http://www.w3.org/2010/11/05-xproc-minutes.html#action01]

- DRAFT -

XML Processing Model WG

Meeting 183, 04-05 Nov 2010
