XML Processing Model WG -- 19 Jan 2006

<scribe> Scribe: Norman Walsh

<scribe> ScribeNick: Norm

Date: 19 Jan 2006

Administrivia

accept this agenda?

-> http://www.w3.org/XML/XProc/2006/01/19-agenda.html

Accepted.

accept minutes from the previous teleconference?

-> http://www.w3.org/XML/XProc/2006/01/12-minutes.html

Accepted.

next meeting: 26 Jan 2006.

Any regrets?

Possible regrets from Norm

Henry to chair in Norm's absence

Tech Plenary

Registration is now open; discounted rates at the Sofitel end tomorrow

Requirements and Use Cases

Thank you, Alex

Alex: Some consolidation needed on the use cases; they aren't tightly coupled with the requirements yet
... Many use cases contain the same components.
... It might be a good idea to factor out the common bits

Norm: Was lack of discussion about PSVI in "Infoset Processing" intentional?

Alex: It does say augment...

Norm suggests "Infoset, augmented infoset, or other data models"

Erik: Is there agreement already that the infoset is our minimum?

Alex: From an XML perspective, that seems right

Norm: Erik, did you have something else in mind?

Erik: No, that's fine

Alex: Augmenting or annotating infosets is something I've always imagined a pipeline language could support

Norm points out some ways that XDM is different from Infoset.

Erik: can you consider the XDM as a superset of the Infoset?

Richard: It's not a subset or superset but it has a correspondence

Some discussion of XDM follows.

Richard: I'm worried that we're trying to include too many things. We should concentrate first and foremost on infosets and anything else should be an extension

Erik: I'm inclined to agree, but we have also had mail that suggests we need to support some things that aren't infosets (documents containing only text nodes, etc.)
... The best place to look if we need to do those things may be XDM.
... Rather than specifying our own thing, we should point to XDM.
... This might also make it easier to deal with parameters as they could be the XDM instances from some previous process

Alex: I can see the advantage of XDM, but there are lots of simple pipelines that don't require that

Richard: I assume at this stage we aren't planning to standardize the representation of XDM. One way to do this is to say that you pass infosets or extended infosets and anything you want to pass like XDMs, you represent them as extended infosets

Norm: that appeals to me

<alexmilowski> How about: "At minimum, an XML document is represented and manipulated as an information set. Use of a super-set, augmented or extended information sets, or data models that map to information sets should be allowed by implementations."

Erik: this is a minimal goal; we may come back to this later

Richard: I think that saying things like XDM are represented as extended infosets at least gives us a handle on what kinds of things can be in components
... I'd like to say that what passes between components are infosets, possibly augmented

Erik: I think one thing we are trying to say is that this is a minimum requirement. The model must support passing infosets between components. If an implementation wants to say that it supports passing things from XDM, maybe we should try to allow this.

Richard: I did not intend to allow you to pass a sequence of integers between components
... I want extensions such as that to be outside the scope of what we are standardizing.

Erik: XDM is an XML Data Model and it does allow more things

<Zakim> MSM, you wanted to express uneasiness about the phrase "represented as" followed by the phrase "an infoset". Infosets are way too abstract to serve as representations in any

MSM: Richard made me very nervous by speaking about data models "represented as" infosets. It suggests that infosets are APIs or data structures and I think they are neither. They are more abstract.

Richard: I understand your reservation and I agree essentially.
... I certainly wasn't implying anything about an API; I simply wanted to constrain things sufficiently such that we could speak of passing information items around rather than talking about integers and sequences.

MSM: I think that can be paraphrased as "data models conceptualized as infosets" and I'm happy with that.

<richard> i am happy with "conceptualized as" rather than "represented as"

Norm: Any other concerns about design principles?

Moving on to terminology...

Alex: I tried to factor out the distinction between straight-through processing as pipelines and process models which may be more complex.
... Those are both really the subject of our spec.

Henry: I don't find this helpful. It's useful to get started on iterating over this.
... I don't think restricting pipeline to "straight through" is very clear or likely to work for this group.
... I agree the distinction is important, but I'd rather not do it this way.

Richard: I partly agree with Henry because I think "process model" is probably interpreted as a more general term and includes descriptions of how they are processed that doesn't include things that we're going to describe.

Norm: I concur

Alex: Maybe we could define one and then I could take a pass through to use that term consistently.

Norm: I think of the pipeline as the whole thing

<ht> HST likes 'XML Pipeline' for the whole space, 'pipeline language' for AM's 'specification language' and 'pipeline document' for an XML document in a pipeline language

Richard: I see pipeline conceptually as the flow of a document through a series of components. They aren't linear.

Erik: If we do use the term "pipeline", I'd like that to mean the whole thing.

<ht> +1 -- A pipeline is a configuration of steps, steps involve components and connectivity and parameters

Henry: I like "pipeline" for the whole thing that documents pass through, I like "pipeline document" to describe a document in a "pipeline language". Pipelines have "steps" which consist of "a component" plus its parameters and connectivity.

Alex: I don't like the term "process model". So I'm happy to consolidate these things into "pipeline"

Richard: A step can be used in isolation, but when you specify what a unix program does, you specify what its standard input/output/error and parameters are and that seems to be consistent with "component"

Erik: A component could be XSLT or XQuery, but a step is an instance of one of those things.

Richard: I agree that a step is an instance of a component with various things associated with it.

Erik: "Step" has a strongly linear connotation, but we might have things in parallel or conditional.

Richard: Yes, but it is used in descriptions of programming languages and that covers the parallel case for me.

Alex: Maybe we can take a stab at defining "pipeline" and "step" to replace process model.

Alex proposed something the scribe failed to capture

<alexmilowski> A pipeline is a configuration of steps that defines, but is not limited to, order, dependencies, or iteration, along with their configuration.
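
The terminology discussed above (a component such as XSLT or XQuery, a step as an instance of a component with its parameters and connectivity, and a pipeline as a configuration of steps that need not be linear) might be sketched roughly as follows. This is only an illustrative model under stated assumptions, not anything the WG agreed on: the class names, the `inputs` field, the `order` method, and the stylesheet names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """A reusable kind of processing, e.g. XSLT or XQuery (hypothetical model)."""
    name: str


@dataclass
class Step:
    """An instance of a component plus its parameters and connectivity."""
    id: str
    component: Component
    parameters: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)  # ids of the steps feeding this one


@dataclass
class Pipeline:
    """A configuration of steps; the connectivity need not be linear."""
    steps: list

    def order(self):
        """Return step ids so that each step comes after the steps it depends on."""
        by_id = {s.id: s for s in self.steps}
        done, result = set(), []

        def visit(step_id, path=()):
            if step_id in done:
                return
            if step_id in path:
                raise ValueError(f"cycle through step {step_id!r}")
            for dep in by_id[step_id].inputs:
                visit(dep, path + (step_id,))
            done.add(step_id)
            result.append(step_id)

        for s in self.steps:
            visit(s.id)
        return result


# Hypothetical example: two independent steps feed a third, so the
# pipeline is a small graph rather than a straight line.
xslt = Component("XSLT")
validate = Component("Validate")
pipeline = Pipeline([
    Step("merge", xslt, {"stylesheet": "merge.xsl"}, inputs=["style", "check"]),
    Step("style", xslt, {"stylesheet": "style.xsl"}),
    Step("check", validate),
])
```

Calling `pipeline.order()` on this example yields an order in which "style" and "check" both precede "merge". The same component (XSLT) appears in two different steps, which matches the point that a step is an instance of a component rather than the component itself.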

Norm asks about the term "component vocabulary"

Some discussion follows

<ebruchez> We've used the term "interface" to describe how a component communicates with the rest of the pipeline

Alex: when I say vocabulary, I actually mean an XML language (a set of XML elements)

Richard: I think what Erik described as an interface is what I described as a "component specification": the thing that a generic pipeline editor would need to have
... to allow you to join components together

Alex: maybe we could consolidate "use environment" and "binding"?
... there's a whole context in which a pipeline runs.

Richard: I think "environment" is quite widely used. Binding seems more specific.

Alex: Maybe we could use "pipeline environment"

Norm: Yes.

Alex: I suggest that people send feedback by email. The two most critical bits are: which things should be combined or refactored and the connections between requirements and use cases.

<richard> The great thing about the term "pipeline" is the associated plumbing metaphor - sources, sinks, the ability to insert Ts and so on

Alex: I'll shoot for another draft on Tuesday.

Any other business

If you're going to be at the f2f, please describe your conflicts for Monday/Tuesday by email.

If you're not going to the f2f, please indicate if you'd like to dial in for all or part of our meetings, taking into account the time of day in your part of the world :-)

ADJOURNED

Attendees