XML Processing Model WG -- 22 Dec 2005

<scribe> Scribe: Norman Walsh

<scribe> ScribeNick: Norm

Date: 22 Dec 2005

<AndrewF> ??P7 is AndrewF

1. Administrivia

1.1. Accept this agenda

Accepted

1.2. Accept minutes from the previous teleconference

Accepted

1.3. Next meeting: 5 Jan 2006. (No meeting 29 Dec 2005.)

Regrets for 5 Jan: None

1.4. Tech Plenary registration is now open

URI for registration: http://www.w3.org/2002/09/wbs/35125/TP2006/

Rui: Not sure if I can be present. I'll let you know when I am sure.

2. Technical

2.1. Use Cases

2.1.1. From Alex

Alex: Use cases: http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2005Dec/0011.html
... I sent another version this morning. [Scribe notes it isn't in the archives yet]
... I'm using these pipelines for math, but they aren't specific to math.
... I'm using them for web services; TagSoup is injected to turn random HTML back into proper XHTML

<PGrosso> I see Alex's email at http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2005Dec/0021.html

Alex: The link I sent this morning includes pointers to online examples

Thank you, PGrosso

Alex: One of the components of my pipeline is the ability to pick out a subtree
... An important technical feature is the ability to have one transform create a piece of markup that points to another transform.
... pipelines are sometimes embedded in the source documents

Norm: Did I understand correctly: you have a pipeline where one step generates the pipeline that's used for subsequent steps?

Alex: No, they generate data.
... Your pipeline is what it is; in the apply-xslt example, it's the XSLT that decides what needs to be done next
... what you really want to do is decide based on data what needs to be done (including possibly making up what needs to be done on the fly)
... In Cocoon you do this by redirecting to a new pipeline
... Each step can generate data that might cause subsequent stages to do something
... The interface to the tides example is a screen-scraping example; the component (using tag soup? --scribe) extracts data from a web page to find the tide data
... That's two pipelines that work together

Michael: We shouldn't rathole here, but it's not clear to me which of the facilities that Alex is talking about are and are not feasible in Cocoon.

Hopefully this can be clarified in email

Alex: You can do a lot of these in Cocoon, but they have a simple one-level sequence path and I've got a more hierarchical model. Processing subtrees is like having embedded pipelines. That's hard to do in Cocoon because of their syntax.

Alex explains that the pipeline steps aren't dynamic but that, for example, the selection of a particular stylesheet in a transformation step might be data driven

2.1.2. From Andrew

Andrew: I put two simple cases, but I'm trying to make a couple of points.
... First, conditionality is required. Second, we'd like to have each component be independent, but sometimes we need to pass parameters from one stage on to the next.

Norm: When I spoke about not needing to pass parameters, it was just an observation. I'm not surprised that sometimes it's needed

Alex: Can a particular step set a parameter for a later step?

Andrew: There are user-set parameters when you invoke the pipeline, but there's no other kind of input.

Alex: I have an example where stages can bind parameters for later stages.
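
The idea of a stage binding parameters for later stages could be sketched roughly as follows. This is only an illustration, not any participant's design: the names `run_pipeline`, `detect_language`, and `label` are hypothetical, and documents are plain strings here purely for brevity.

```python
def run_pipeline(doc, stages, params=None):
    """Run stages in order; each stage may bind parameters for later stages."""
    params = dict(params or {})      # user-set parameters given at invocation
    for stage in stages:
        doc, bound = stage(doc, params)
        params.update(bound)         # make new bindings visible downstream
    return doc, params


def detect_language(doc, params):
    # Hypothetical stage: leaves the document alone, binds a "lang" parameter.
    return doc, {"lang": "fr" if "bonjour" in doc else "en"}


def label(doc, params):
    # Hypothetical later stage: reads the parameter bound by an earlier stage.
    return f"[{params['lang']}] {doc}", {}


result, final_params = run_pipeline(
    "bonjour", [detect_language, label], {"user": "norm"}
)
print(result)
print(final_params)
```

In this sketch each component stays independent of the others; the only coupling is the shared parameter name, which the pipeline, not the component, carries forward.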

2.1.3. From Jeni

Jeni isn't here, alas

Alex: Jeni makes the point that some steps are made up of sub-pipelines

Michael: One thing that becomes very clear is that quite frequently you seem to have a choice of where to put certain kinds of functionality
... Conditional processing, for example, can be handled by choosing whether to invoke stylesheet a or stylesheet b or by writing a stylesheet that checks a condition and then operates in mode a or mode b.

Michael: We seem to get to choose whether to put the complexity in the pipeline or in the individual stages.
... in her use case, she imagined parsing the ... scribe was distracted

<PGrosso> flanneling around?

Michael: it's not always absolutely clear what the implication of moving the complexity around is

Alex: putting all the complexity in a stylesheet makes the stylesheet hard to maintain

Michael: That's one reason to let some of the complexity percolate up. But it's not clear how to balance those tradeoffs

Alex: you can write extensions to XSLT and one of those could evaluate a pipeline

Norm: I think it's a mistake to focus so exclusively on XSLT as there are other kinds of components

<ebruchez> True

<Zakim> ht, you wanted to point to step (1c)

<MSM> alexmilowski: one way to think about these questions is to look at the kinds of extension functions people have written for XSLT 1 -- sometimes those functions are there only because of deficiencies in the environment, and represent functionality that 'really' belongs in a pipelining language

ht: Jeni's step 1c is clear about the fact that it merges the output from 1a and 1b. Maybe Jeni's case is a little simpler than Alex's.
... It's clearly a requirement at some stage, though not clear that it has to be in V1

2.1.4. From Norm

<ht> Norm: Most of my examples are straightforward

<ht> ... Two interesting wrt V1 or not

<ht> ... First a sub-pipelining example similar to Jeni's 1c

<ht> ... Second where a step produces an indeterminate number of e.g. chapter.html files, each of which has to go through further processing

<ht> ... When you know exactly how many files will be output, it's clear how to do this in a fairly simple pipeline language, but when this isn't known in advance, not clear what to do

<ht> AM: XQuery and XSLT2 are clear examples of this, which I've thought of in terms of thinking of the output as a sequence of documents

<ht> ... This is parallel to the XPath-based viewport abstraction which I and others have been using

<ht> EB: Problem with sequence is that the items aren't named

<ht> ... Also, loss of symmetry, wrt names and/or cardinality, wrt inputs and outputs of steps

<ht> NW: In my example they'd be named

<ht> EB: In XSLT2 case, yes, but not in others. . .

2.1.5. From Rui

<ebruchez> ht, comment about sequences was from Erik

Rui: mine are similar to previous examples.
... In the first use case, if you have an XSLT pagination, you'll create a huge set of documents. But the main document will not be used further
... should we allow the pipeline author to express this?
... in the next scenario the question is one of reuse and composition of pipelines
... should we use XInclude, or would we like to have another sort of language to express the composition
... If you have one pipeline with a component that outputs more than one document, and those documents are needed by the next component, how can you be sure that the right documents were generated?
... Do we need a way to specify that a certain number or kind of documents will be produced at runtime
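
The case of a step whose number of outputs is not known in advance, with each output needing further processing, might be sketched like this. It is a minimal illustration under assumptions: `split_chapters`, `render`, and the `<book>`/`<chapter>` vocabulary are hypothetical, and the outputs are modelled as a mapping from name to document so that a later check on the expected names is possible.

```python
import xml.etree.ElementTree as ET


def split_chapters(book):
    """Hypothetical step: emit one named output document per <chapter>."""
    return {f"{ch.get('id')}.html": ch for ch in book.findall("./chapter")}


def render(chapter):
    """Hypothetical follow-on step applied to every output of the split."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    body.text = chapter.text
    return ET.tostring(html, encoding="unicode")


book = ET.fromstring(
    "<book><chapter id='c1'>Intro</chapter><chapter id='c2'>Usage</chapter></book>"
)

# The pipeline does not need to know how many chapters there are.
outputs = {name: render(doc) for name, doc in split_chapters(book).items()}

# A runtime check of the kind Rui asks about: were the right documents produced?
expected = {"c1.html", "c2.html"}
missing = expected - outputs.keys()
print(sorted(outputs), "missing:", sorted(missing))
```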

Erik: it looks like I did not get Rui's use cases. I got a blank email.

Several other people had problems. The MIME seems to have been garbled.

This is probably a consequence of a MIME message being forwarded by another client that does MIME

Use cases from Erik: http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2005Dec/0020.html

Erik: I did not really connect those use cases to XPL which is the language that we have designed and a variation of which we submitted to W3C
... These are just use cases from actual clients
... I thought it would be useful to categorize the environments where these are processed.
... I found three broad categories: command-line/batch environments, web environments, and service environments
... I'm just trying to see if there are requirements that haven't been mentioned, I don't think so, except perhaps the question of validation
... is validation a custom-component pipeline step or is it something that is part of the pipeline language
... Very often our use cases start by saying "we need an XML document"; often they begin with a URI, but in some cases you can just consider passing a document to the pipeline itself, implying that the pipeline itself can receive and produce XML documents
... Looking at it this way, you can imagine that a pipeline interpreter might be a component in its own right.
... the second set of use cases involve conditionals. Our current thinking is that we do have many use cases that require conditionals.
... One of them is a conditional database access logic. Query a document, look at the content, if there's something then do an update otherwise do an insert.
... Another scenario is content-dependent transformation; the transformation selected is determined by the output of a previous stage.
... Another example is where you want to generate a particular document for desktop or mobile browsers. The configuration from the outside determines which stylesheet (or pipeline? or pipeline stage? --scribe) is used.
... Another common use case is the selection of Atom or RSS1 or RSS2, etc., when generating feeds
... The next use case is a little different. Here an XML pipeline is used to implement an XML-RPC service. In the request you have method calls with method names. The sub-pipeline that's executed is determined by the method selected in the request.

Norm: I wasn't sure at what level you needed to make conditional selection

Erik: Based on whether an XPath expression returns true, you will execute a particular branch of the pipeline. Otherwise, you test another condition, etc. It's completely external to the components.
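
The conditional Erik describes, applied to the update-or-insert case mentioned earlier, could look roughly like the sketch below. The names `choose`, `update`, and `insert` are hypothetical, and the test uses the limited XPath subset supported by Python's `xml.etree.ElementTree` rather than full XPath.

```python
import xml.etree.ElementTree as ET


def choose(doc, branches, otherwise):
    """Run the first branch whose path matches something in doc, else the fallback."""
    for test, branch in branches:
        if doc.find(test) is not None:   # condition is evaluated outside the components
            return branch(doc)
    return otherwise(doc)


def update(doc):
    # Hypothetical branch: the query found an existing row.
    return "update"


def insert(doc):
    # Hypothetical branch: nothing was found.
    return "insert"


found = ET.fromstring("<result><row id='1'/></result>")
empty = ET.fromstring("<result/>")

print(choose(found, [("./row", update)], insert))
print(choose(empty, [("./row", update)], insert))
```

The branches themselves know nothing about the condition; the decision lives entirely in the pipeline-level `choose`.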

2.2. Requirements

Erik: there are a few more use cases, that involve iteration.
... Consider a collection of files on disk. You'd read a list of documents, either from a file or with a component that can scan the filesystem. You want to iterate over that list of documents and, for each iteration, perform a sequence of steps. Alternatively, you may want to combine all the results together.

Norm: That sounds like a collection

Erik: That's almost an implementation question. If your language supports multiple outputs in a dynamic way then maybe you can do that. But here the idea is that you want to perform a certain number of tasks, perhaps once for each element that matches an XPath expression.

Alex: That sounds a lot like the concept of identifying subtrees in an infoset using an XPath
... I've been experimenting with another kind, where you have a document that contains 10 entries and points to the next document with the next 10 entries, etc. That seems completely different.

Erik: We've identified two types of iteration: one is a for-each, the other is a while.
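
The two kinds of iteration can be contrasted in a short sketch. Everything named here is an assumption made for illustration: `for_each`, `while_next`, the in-memory `PAGES` table standing in for retrievable documents, and the `next` attribute used to point at the following page (modelled on Alex's ten-entries-per-document example).

```python
import xml.etree.ElementTree as ET


def for_each(doc, select, step):
    """For-each: apply step once per element matched by the path expression."""
    return [step(node) for node in doc.findall(select)]


def while_next(start, fetch, step):
    """While: keep fetching documents for as long as one points to a successor."""
    results = []
    uri = start
    while uri is not None:
        doc = fetch(uri)
        results.extend(for_each(doc, "./entry", step))
        uri = doc.get("next")            # None when there is no further page
    return results


# Hypothetical stand-in for documents retrieved by URI.
PAGES = {
    "page1": "<feed next='page2'><entry>a</entry><entry>b</entry></feed>",
    "page2": "<feed><entry>c</entry></feed>",
}


def fetch(uri):
    return ET.fromstring(PAGES[uri])


def upper(entry):
    return entry.text.upper()


print(while_next("page1", fetch, upper))
```

The for-each knows its full input before it starts, whereas the while only discovers whether to continue after processing each document.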

Any other business

Norm: I propose we continue with iteration next week. And begin looking at requirements.

Proposal: for 5 Jan, everyone submit a list of possible requirements so that we can begin to select the ones upon which we have consensus

Accepted.

Norm wishes the group happy holidays

Adjourned
