XML Processing Model WG

12 Jan 2006


See also: IRC log


Henry, Richard, Rui, Erik, Alessandro, Norm, Jeni, Paul, Andrew, Alex
Robin, Michael_(partial)
Norman Walsh




<scribe> Scribe: Norman Walsh

<scribe> ScribeNick: Norm

Date: 12 Jan 2006

<richard> http://www.w3.org/XML/XProc/2006/01/12-agenda.html


Accept this agenda?


Accept last week's minutes: http://www.w3.org/XML/XProc/2006/01/05-minutes.html


Norm reminds the group about the plenary and the hotel arrangements

Technical: email followup

Kinds of iteration?

Alex: we're getting more technical; if we start doing that now, are we going to lose the requirements/use-cases thread?
... The question of what passes between processes is an important one at this stage.
... Core WG said "infosets" but now we need to support XDM and other augmented forms

Norm: the ability to pass around infosets and augmented infosets are both requirements in my mind

Jeni: I think there's a community that just wants to pass serialized XML around
... We ought to have a "should" or "maybe" requirement around those ideas.

Richard: How is that different from an infoset?

Jeni: I think some folks care about whether things are represented by an entity or a Unicode character

Richard: So you're assuming the components aren't normal XML processors?

Jeni: The kind of pipeline I have in mind is one where someone takes a non-well-formed XML document, smartens it up into XML, and then can report that as parsed XML to the next stage. Then later on, create some XML that is just a stream of characters (e.g., change particular characters into images).

Richard: So you'd be able to pass things around that aren't really XML?

Jeni: From my use cases, I think processes should be able to consume and produce things which aren't XML (especially HTML)
... Taking non-XML and turning it into XML is important.

Alex: Maybe we'll have to look at serialization more closely.
... Maybe some of the other things should be doable on the end of a pipeline.

Norm: I imagined non-XML only at the ends but there's nothing that would prevent someone from glueing several together I suppose.

Erik: Talking about non-XML stuff is a little scary because it's more like Unix pipes and is a little more complex. We need to be careful.
... Certainly it's important to some people to have some things, like entities, preserved, but if none of the existing data models do that, we should investigate why.
... In XPL, we only deal with XML infosets. If a component is trying to read data which is not XML, then either the component accesses the information externally (not through a connection in the pipeline) or you can encapsulate the information in some XML format (e.g., base64 encoded)

Alex: We need to be very careful not to try to take on more than we can handle

Norm: Jeni only said "could" or "should". Let's see if we can get a better handle on the issues when we have more information (later in the process)

Richard: If we're dealing with both plain infosets and augmented infosets, then we could have an "uninterpreted text" mode as well, though we wouldn't have any standard components that work on it.

Rui: We can look at the way Cocoon handles this issue.

Erik: Cocoon handles this by using generators or serializers following a model similar to what I said about XPL above.
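To make the encapsulation approach concrete: a minimal sketch in Python, assuming a hypothetical `<data>` wrapper element with `content-type` and `encoding` attributes (illustrative only; neither XPL nor this WG defines such a format). Non-XML bytes, such as HTML that is not well-formed, travel through the pipeline as base64 text inside an ordinary XML element, and the consuming component decodes them again.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_non_xml(payload: bytes, content_type: str) -> str:
    """Encapsulate arbitrary bytes as base64 text inside a hypothetical <data> element."""
    elem = ET.Element("data", {"content-type": content_type, "encoding": "base64"})
    # base64 output is plain ASCII, so it is always legal XML character content.
    elem.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def unwrap_non_xml(xml_text: str) -> bytes:
    """Recover the original bytes from a <data encoding="base64"> wrapper."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "data" or elem.get("encoding") != "base64":
        raise ValueError("not a base64 <data> wrapper")
    return base64.b64decode(elem.text or "")


if __name__ == "__main__":
    html = b"<p>Unclosed paragraph<br>"  # not well-formed XML on its own
    wrapped = wrap_non_xml(html, "text/html")
    print(wrapped)
    assert unwrap_non_xml(wrapped) == html
```

The wrapper keeps every connection in the pipeline an XML infoset, which is the property Erik describes; the cost is that intermediate components see only opaque base64 text.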


Alex: I got as far as getting myself set up with XML Spec. I haven't done any new content, but I have a proposal about how it should be laid out.

-> http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2006Jan/0041.html

Alex: I'd like feedback on that layout before I proceed.
... Do we need a terminology section?
... There's a lot of terminology out there, we should define what we mean by things so that we don't confuse readers.
... The previous document had a section on "design principles" but those sound like "requirements" to me.
... I think we could introduce the idea that "design principles" are just very broad requirements.

MSM: Design principles are not simply broad requirements in the following way: there have been some people active in W3C WGs who have said a "requirement" is (a) a crisp, verifiable statement and (b) a do-or-die thing; if you don't meet the requirements you don't ship.
... For people who take that view, "keep it short and simple" isn't crisp enough. Short you could manage, but "simple" would be untestable.
... But equally, it's not exactly a do-or-die situation. If you set a target of 20 pages and the normative prose turns out to be 21 pages, you typically call that a success.
... If no one in our readership is going to interpret "requirement" as above, then Alex's proposal is fine. But there are those people in the world.

Alex: In that case, I would put "we process infosets" as a hard requirement.

<MSM> +1

Norm: Does that all sound ok to folks then?


Discussion of requirements in Alex's document:


1. The language must be rich enough to address practical interoperability concerns.

Design principle

2. The language should be as small and simple as possible.

Design principle

3. The language must allow the inputs, outputs, and other parameters of a component to be specified.


4. The language must define the basic minimal set of mandatory input processing options and associated error reporting options required to achieve interoperability.

There's some confusion about this one

Editor will refactor.

5. Given a set of components and a set of documents, the language must allow the order of processing to be specified.


6. It should be relatively easy to implement a conformant implementation of the language, but it should also be possible to build a sophisticated implementation that can perform parallel operations, lazy or greedy processing, and other optimizations.

Confusion? Design principles or requirements?

Editor will refactor.

7. The model should be extensible enough so that applications can define new processes and make them a component in a pipeline.


Richard: I think we should be careful not to use "extensibility" and "interoperability" without being fairly precise about what we mean.

8. The model must provide mechanisms for addressing error handling and fallback behaviors.


MSM: Are we talking about candidate requirements or requirements we've accepted?

<richard> these are all "candidate" requirements at this stage, surely

Norm: I think we get to start over, and we get to decide whether to accept these as requirements once we believe we have a common understanding of what they mean.

9. The model could allow conditional processing so that different components are selected depending on run-time evaluation.

MSM: Is "run-time evaluation" clear enough to count as crisp?

Alex: No, I think these will all get longer.


10. The model should not prohibit the existence of streaming pipelines.


Richard: we should be clear that you should be able to write pipelines that can be streamed rather than that every pipeline must be streamable.
... Some things that you might want to do with pipelines cannot be streamed.

MSM: Can we imagine an option where I ask if this pipeline is streamable and fail if it isn't?

Erik: I'm not sure I understand the question. Should you have an option to ask the pipeline engine if a pipeline is streamable?

MSM: I would like the option of having the processor tell me if I've failed to write a streaming pipeline.

Erik: This sounds like something specific to a particular implementation.

MSM: It may be infeasible in general.

Erik: The idea is to leave the door open to allow some processors to optimize something to be streaming.

MSM: If it's that difficult to tell, then I'm concerned about it being a requirement as opposed to a design goal.

Richard: Something like a general XSLT transformation cannot possibly be guaranteed to be streamable. There are some cases where the streamability is determined by the components.
... But if there are conditionals in the language, then it may also not be possible to stream on that basis (e.g., a condition that cannot be determined until some stage has finished).
... As we proceed through, we shouldn't put anything in that prevents a streaming pipeline.

Alex: We can mark this as a possible new requirement and debate it as we proceed.

<ebruchez> I think that's too specific a requirement

<MSM> I am having trouble imagining a language construct that would not only be non-streamable but would successfully prohibit the writing of streamable pipelines. Is the req as formulated by Core a nop?

11. The model should allow multiple inputs and multiple outputs for a component.


12. The model should allow any data set conforming to one of the W3C standards, such as XML 1.1, XSLT 1.0, XML Query 1.0, etc., to be specified as an input or output of a component.

I'd be inclined to state it broadly as a design principle.

<richard> Michael - a rule that downstream components must not start unless it is guaranteed that no upstream component will abort would be an example of such a construct

Alex: That boils down to specific ones for known languages.

Norm: I think we may be able to answer the question more generally, but I'm ok with that.

13. Information should be passed between components in a standard way, for example, as one of the data sets conforming to an industry standard.

Richard: I think that means it should use things like SAX and DOM

MSM: Except that neither SAX nor DOM is a data set.

Alex: we could refactor that to say that we don't want to preclude ... some list of known ways to pass infosets.

Richard: The Core WG may have been trying to express that it didn't want us to invent a *new* way
... The fact that the Core WG included it doesn't mean everyone there agreed with it.

Editor will refactor.

14. The language should be expressed in XML. It should be possible to author and manipulate documents expressed in the pipeline language using standard XML tools.


15. The pipeline language should be declarative, not based on APIs.

Erik: I would argue that XPL is declarative
... You really are declaring linking of components together and leaving it to the implementation to do the work

Richard: The idea here is that the language for expressing the connections between components should be declarative.

16. The model should be neutral with respect to implementation language.


Norm: Do you have enough to make a first pass?

Alex: Yes. If you look at this list and see things missing, we should add them.
... I'll take a first stab at it from the minutes of the preceding meetings.

<scribe> ACTION: Alex to produce document by c.o.b. 17 Jan 2006 [recorded in http://www.w3.org/2006/01/12-xproc-minutes.html#action01]


Summary of Action Items

[NEW] ACTION: Alex to produce document by c.o.b. 17 Jan 2006 [recorded in http://www.w3.org/2006/01/12-xproc-minutes.html#action01]
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.127 (CVS log)
$Date: 2006/01/26 17:28:48 $