XML Processing Model WG

27 Feb 2006 P.M.



Norm, Alex, Rui, Michael, Erik, Henry, Murray, Richard, Andrew (by phone)

XPL Presentation (See presentation)

 * Michael: (Clarification)
         The 'infosetref' attribute represents the binding; the
         referenced infosets are internal to the component.
         The 'name' attribute is the formal parameter name.

 * Erik: the p:input and p:output declare the name of the inputs and
   outputs that are used to invoke the process and handle the results

 * Norm: (Clarification)
   Is it the pipeline processor that looks at the inputs and outputs?

 * Erik: the inputs and outputs are evaluated in a lazy fashion and it
   back-chains through the steps, which eventually leads to the input of
   the pipeline.
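A minimal Python sketch of this demand-driven (back-chained) evaluation, with invented step and infoset names: each step declares the infosets it consumes and the one it produces, and requesting the pipeline's output pulls each step's inputs recursively until the chain reaches the pipeline input.

```python
# Hypothetical sketch of back-chained evaluation: each named infoset
# is produced by at most one step; asking for the pipeline output
# resolves each step's inputs recursively back to the pipeline input.
def make_evaluator(steps, pipeline_inputs):
    cache = {}
    producers = {out: (func, ins) for func, ins, out in steps}

    def resolve(name):
        if name in pipeline_inputs:
            return pipeline_inputs[name]
        if name not in cache:
            func, ins = producers[name]
            cache[name] = func(*(resolve(i) for i in ins))
        return cache[name]

    return resolve

# Invented two-step pipeline: uppercase the source, then append "!".
steps = [
    (str.upper, ["source"], "upped"),
    (lambda s: s + "!", ["upped"], "result"),
]
resolve = make_evaluator(steps, {"source": "doc"})
```

Steps never named on the path from the requested output are simply never run, which is the lazy behaviour described above.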

 * Alex: (Clarification)
   How does back chaining work with conditionals?

 * Erik: The output of conditionals needs to have the same infoset name.

 * Erik: XHTML example (use case 5.15: Content-Dependent …)
    - one of the use cases.
    - one of the steps rewrites the QNames for presentation in IE
    - one of the steps deals with HTML serialization
    - the output for serialization uses an internal root element node
      for representation of text and binary (character-encoded) content

 * Erik: Iteration example:
    - lets you iterate over a document via an XPath expression
    - the current() function gives you the current item being processed
    - gives you the ability to process large XML documents
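The iteration idea can be sketched in Python with the standard library's ElementTree (the document and path here are invented for illustration): each XPath match becomes the current item for one pass of the iterated body.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<orders><order id='1'/><order id='2'/><order id='3'/></orders>")

# Each XPath match becomes the current item, analogous to a current()
# function being in scope for the iterated sub-pipeline.
ids = []
for current in doc.iterfind("./order"):  # lazily iterates over matches
    ids.append(current.get("id"))
```

Because matches are visited one at a time rather than materialized as a list, the same pattern scales to large documents.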
 * Murray: Does each of the steps have its own XML vocabulary (e.g. HTTP)?
 * Erik: Yes.
 * Richard: Do they require their own namespaces?
 * Erik: No, it isn't required as the vocabulary is contextual to the
   component.  Having another namespace adds declarations to the document.

GUI Tool Sub-thread:

 * Richard: Do you have a GUI tool?
 * Erik: No.
 * Richard: we should define the tool in terms of a graph
 * Norm & Michael expressed concern with this, as they wouldn't
   want to require a GUI tool; starting with a graph could
   ignore the XML representation.

Norm's SXPipe:

 * http://norman.walsh.name/2004/06/20/sxpipe

 * Stages are executed in order.  Each stage is handed a DOM and
   returns a DOM.
 * In example, skip attribute allows steps to be skipped.  If statically
   evaluated to true, the step isn't executed.
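A Python sketch of such a statically evaluated skip attribute (the stage encoding is invented): the skip condition is checked once against static configuration, and skipped stages are never executed.

```python
# Each stage is a dict with a 'run' callable and an optional 'skip'
# key naming a boolean flag in the static configuration (invented encoding).
def run_stages(stages, doc, config):
    for stage in stages:
        if config.get(stage.get("skip"), False):  # statically evaluated
            continue
        doc = stage["run"](doc)
    return doc

stages = [
    {"run": str.strip},
    {"run": str.upper, "skip": "debug"},  # skipped when config["debug"] is true
]
result = run_stages(stages, "  hello  ", {"debug": True})
```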

 * Impl: two methods: init & run.  Init is passed the element that
   represents the stages.  1700 lines of Java.

(Alex's presentation here)

Richard's presentation:

 * I want to replace what we do today without a pipeline with an XML
   pipeline language.

 * lxgrep - produces a tree fragment (multiple root elements possible)
   via an XPath

 * lxprintf - formats XPath matches as plain text

     -e element   For each element

 * lxreplace - replaces elements/attributes
     -n   Renames an element

 * lxsort - sorts elements by values identified by an XPath

 * lxviewport - runs a unix command on everything that matches an
   element (like subtree in smallx, viewport in MT pipelines)

 * lxtransduce - ??

 * want to make these pipelines more declarative so people can use them
   without writing code.

 * XSLT is also available
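A rough Python analogue of the lxgrep behaviour described above (the document and path are invented): select every match of an XPath and emit the matches as a fragment, which may have multiple root elements.

```python
import xml.etree.ElementTree as ET

def grep_fragment(xml_text, path):
    # Serialize each match and concatenate, so the result can be a
    # fragment with more than one root element.
    root = ET.fromstring(xml_text)
    return "".join(ET.tostring(m, encoding="unicode") for m in root.iterfind(path))

frag = grep_fragment("<d><a>1</a><b/><a>2</a></d>", "./a")
```

The output is deliberately not a well-formed single document, matching the "multiple root elements possible" note above.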

Rui Lopes: (see presentation)

 * APP: Architecture for XML Processing

 * Complex processing support for digital libraries - both developers and

 * Always a need for some manual purposes.

 * Tiers: a set of pipelines working on disjoint inputs

 * Pipeline: acyclic digraph of processors

 * Processor: defined by a URI that differentiates an interface vs
   implementation vs usage.

 * Processing language:

     Project: an RDF document

     Pipeline: mapped to a linear sequence of components

     Registry: An RDF document that registers components & their inputs
               and outputs

 * Pros:
    * Separation of concerns lets you interchange components without
      touching the pipelines.
    * It's an implementation-neutral language
    * and others

 * Cons:
    * No iteration/test
    * RDF based
    * Doesn't support generation of XSLT stylesheets
    * Doesn't support chunking

 * Thoughts:
    * Good to have multiple levels of composition (not just xinclude)
    * Indirection is good for batch processing

 Alex: The model is that you define a particular step in the registry
       that is a binding, for example, of an XSLT transform to its
       inputs and its output.  A pipeline then points to that step and
       the step can be re-used in other pipelines.

 * If the registry changes, the pipeline doesn't have to change.
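The indirection can be sketched in Python (the URIs and implementations are invented): pipelines name steps by URI, and the registry binds each URI to an implementation, so rebinding the registry changes behaviour without editing any pipeline.

```python
# Registry: URI -> implementation.  Pipeline: ordered list of step URIs.
registry = {"urn:example:normalize": str.lower}
pipeline = ["urn:example:normalize"]

def run(pipeline, registry, doc):
    for uri in pipeline:
        doc = registry[uri](doc)
    return doc

before = run(pipeline, registry, "HeLLo")
registry["urn:example:normalize"] = str.upper  # swap the implementation only
after = run(pipeline, registry, "HeLLo")
```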


    * stdin & stdout
    * then there are parameters
    * then there is the notion of input & output
    * then there is the notion of an infoset on the side
    * then there is the notion of artifacts
    * e.g. on a server you might want to store things in a cache

    * storing on a filesystem can be abstracted to the idea that outputs
      have a URI and a processor can decide to write them out to disk
      if it wants.  Whether that happens isn't a relevant problem.

    * It is quite likely an implementation will need to buffer things
      if you have a pipeline that isn't just a straight line.

    * In XPL everything is in scope

    * there is no guarantee that you read things at the same rate, so
      you have to buffer

    There's still an output being buffered & cached.  If as an output
    you produce foo.infoset and later you consume foo.infoset, then you
    need to store that.

  Erik: you could have an implementation that buffers things to memory or
   alternatively to a disk cache if it is too big
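This memory-or-disk strategy can be sketched with Python's standard library (the 1 MB threshold is an invented tuning value): a spooled temporary file keeps the buffered output in memory and rolls over to disk only if it grows too large.

```python
import tempfile

# Buffer an intermediate output in memory, spilling to a disk-backed
# temporary file automatically if it exceeds max_size.
buf = tempfile.SpooledTemporaryFile(max_size=1024 * 1024, mode="w+")
buf.write("<doc>intermediate result</doc>")
buf.seek(0)
consumed = buf.read()  # a later step consumes the buffered output
```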

    Before today, I was thinking this was like a unix pipe.
    They could be bringing in separate things, but there is still just a
    single pipe.  Most things talked about today don't seem like pipelines.

    My stuff is a unix pipeline.... but that's "just an implementation
    hack" that uses shell programming.

  Erik: The reason you want to serialize is?

  Richard: Because I have a bunch of programs that run on files.  I want
   a language that I can still compile to scripts that serialize to
   files.

   There are other things that processes like schema validation might do
   that may not be able to be serialized.

  MSM: It is possible to define a non-standard PSVI serialization

  Erik: You can always do this by wrapping components that always
   serialize.
    * there are simple components where one document comes in and one
      goes out
    * there are other ways to think about things like XSLT:
        - there is one input and an ancillary input (the stylesheet) and
          one output
        - but this isn't always fixed

  Alex: Having a primary input is necessary for streaming

  Murray: In what case is the stylesheet the input?

  Norm: I have a report that is coming out and the report is always the
        same (the input document), but the XSLT is what is generated by
        the pipeline.

  MSM: Why is there emphasis on backward chaining?

  Erik: (diagram on chart w/ parallel steps that start from the same
        start and are aggregated at the end)

     Back chaining is because a step can optionally decide not to get an
     input.  It isn't that easy to understand from a user's perspective.

     Specifying order is natural but is a problem.  Users do have
     problems with [controlling] order.  You have this problem with XSLT

  Richard: what drives things in XSLT is apply-templates--and that is
           not backward chaining.

           parallel paths are the 1% case

  Alex: There is a whole body of knowledge that deals with network flows
        and we should be in compliance with those known concepts and

  All: [to alex] You're going to have to prove that you need stdin for