W3C

XML Processing Model WG

27 Feb 2006 P.M.

Agenda

Attendees

Present
Norm, Alex, Rui, Michael, Erik, Henry, Murray, Richard, Andrew (by phone)
Regrets
Chair
Norm
Scribe
Alex

XPL Presentation (See presentation)

 * Michael: (Clarification)
         The 'infosetref' attribute represents the binding, and the
         names are internal to the component.
         The 'name' attribute is the formal parameter name.

 * Erik: the p:input and p:output declare the names of the inputs and
   outputs that are used to invoke the process and handle the results.

 * Norm: (Clarification)
  Is it the pipeline processor that looks at the inputs and outputs?

 * Erik: the inputs and outputs are evaluated lazily: evaluation
   back-chains through the steps, eventually reaching the input of the
   pipeline.

 * Alex: (Clarification)
   How does back chaining work with conditionals?

 * Erik: The output of conditionals needs to have the same infoset name.

 * Erik: XHTML example (use case 5.15: Content-Dependent
         Transformations)
    - one of the use cases.
    - one of the steps rewrites the QNames for presentation in IE
    - one of the steps deals with HTML serialization
    - the serialization output uses an internal root element node to
      represent text and binary (character-encoded) content

 * Erik: Iteration example:
    - lets you iterate over a document via an XPath expression
    - the current() function gives you the item currently being
      iterated over
    - gives you the ability to process large XML documents
 * Murray: Does each of the steps have its own XML vocabulary (e.g. HTTP
           serializer)?
 * Erik: Yes.
 * Richard: Do they require their own namespaces?
 * Erik: No. A separate namespace isn't required because the vocabulary
   is contextual to the component, and having another namespace adds
   declarations to the document.
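Erik's iteration example selects items with an XPath expression and exposes each one through current(), which is what lets a large document be processed one piece at a time. A minimal Python sketch of that idea, assuming a plain tag-name match in place of the XPath expression (ElementTree's iterparse does not evaluate XPath); `for_each` is a hypothetical name, not XPL syntax:

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Callable, Iterator


def for_each(source: BinaryIO, tag: str,
             body: Callable[[ET.Element], str]) -> Iterator[str]:
    """Call `body` on each `tag` element as soon as it has been parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield body(elem)   # `elem` plays the role of current()
            elem.clear()       # release its children, text and attributes
                               # (the emptied element stays in its parent)


doc = io.BytesIO(b"<orders><order id='1'>pen</order>"
                 b"<order id='2'>ink</order></orders>")
print(list(for_each(doc, "order", lambda cur: f"{cur.get('id')}:{cur.text}")))
# ['1:pen', '2:ink']
```

Because items are handed to `body` while parsing is still in progress, the whole document never has to be materialized at once.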

GUI Tool Sub-thread:

 * Richard: Do you have a GUI tool?
 * Erik: No.
 * Richard: We should define the tool in terms of a graph.
 * Norm & Michael expressed concern with this, as they wouldn't
   want to require a GUI tool, and starting with a graph could
   lead to ignoring the XML representation.

Norm's SXPipe:

 * http://norman.walsh.name/2004/06/20/sxpipe

 * Stages are executed in order.  Each stage is handed a DOM and returns
   a DOM.

 * In the example, the skip attribute allows steps to be skipped.  If it
   statically evaluates to true, the step isn't executed.

 * Implementation: two methods, init & run.  init is passed the element
   that represents the stages.  1700 lines of Java.
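The SXPipe model above (stages run in order, each takes a DOM and returns a DOM, init then run, statically skipped steps never executed) can be sketched in Python with minidom. This is not SXPipe's Java code: the stage classes, the `STAGES` table, the element names, and treating `skip="true"` as the static evaluation are all assumptions for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type
from xml.dom import minidom


class Stage(ABC):
    """Hypothetical analogue of an SXPipe stage: init once, then run on a DOM."""

    def init(self, config: minidom.Element) -> None:
        # Keep the pipeline element describing this stage; in this sketch its
        # attributes carry the stage's settings.
        self.config = config

    @abstractmethod
    def run(self, doc: minidom.Document) -> minidom.Document:
        """Take a DOM and return a DOM."""


class SetAttribute(Stage):
    def run(self, doc: minidom.Document) -> minidom.Document:
        doc.documentElement.setAttribute(self.config.getAttribute("name"),
                                         self.config.getAttribute("value"))
        return doc


class AppendElement(Stage):
    def run(self, doc: minidom.Document) -> minidom.Document:
        doc.documentElement.appendChild(
            doc.createElement(self.config.getAttribute("name")))
        return doc


# Hypothetical mapping from pipeline element names to stage classes.
STAGES: Dict[str, Type[Stage]] = {
    "set-attribute": SetAttribute,
    "append-element": AppendElement,
}


def run_pipeline(pipeline_xml: str, doc: minidom.Document) -> minidom.Document:
    pipeline = minidom.parseString(pipeline_xml).documentElement
    for el in pipeline.childNodes:
        if el.nodeType != el.ELEMENT_NODE:
            continue                    # ignore whitespace between stages
        if el.getAttribute("skip") == "true":
            continue                    # statically skipped: never run
        stage = STAGES[el.tagName]()
        stage.init(el)
        doc = stage.run(doc)            # stages run strictly in document order
    return doc


pipeline = """<pipeline>
  <set-attribute name="status" value="draft"/>
  <append-element name="note" skip="true"/>
  <append-element name="footer"/>
</pipeline>"""

out = run_pipeline(pipeline, minidom.parseString("<doc/>"))
print(out.documentElement.toxml())  # <doc status="draft"><footer/></doc>
```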

(Alex's presentation here)

Richard's presentation:

 * I want to replace what we do today without a pipeline with an XML
   pipeline.

 * lxgrep - produces a tree fragment (multiple root elements possible)
   via an XPath

 * lxprintf - formats XPath matches as plain text

     -e element   For each element

 * lxreplace - replaces elements/attributes
     -n   Renames an element

 * lxsort - sorts elements by values identified by an XPath

 * lxviewport - runs a Unix command on everything that matches an
   element (like subtree in smallx, viewport in MT pipelines)

 * lxtransduce - ??

 * want to make these pipelines more declarative so people can use them
   without writing code.

 * XSLT is also available
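The viewport idea described for lxviewport (run a command on each matching element) can be sketched in Python. This is not lxviewport's interface: the sketch assumes the matched subtree is written to the command's stdin and that the command prints exactly one well-formed element, which replaces the match. It matches by tag name rather than lxviewport's real selection syntax, and `viewport` and `UPPER` are hypothetical names. A Python one-liner stands in for a filter such as `tr` so the example runs anywhere.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET
from typing import List


def viewport(root: ET.Element, tag: str, command: List[str]) -> None:
    """Replace each `tag` element below `root` with what `command` prints for it."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    for elem in list(root.iter(tag)):
        parent = parents.get(elem)
        if parent is None:
            continue                        # this sketch never replaces the root
        tail, elem.tail = elem.tail, None   # tostring() would also emit the tail
        fragment = ET.tostring(elem, encoding="unicode")
        result = subprocess.run(command, input=fragment, capture_output=True,
                                text=True, check=True)
        replacement = ET.fromstring(result.stdout)  # output must be one element
        replacement.tail = tail             # keep the text that followed the match
        index = list(parent).index(elem)
        parent.remove(elem)
        parent.insert(index, replacement)


# Stand-in "Unix command": a Python one-liner that upper-cases its stdin.
UPPER = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]

doc = ET.fromstring("<doc><title>hello</title><p>keep</p><title>bye</title></doc>")
viewport(doc, "title", UPPER)
print(ET.tostring(doc, encoding="unicode"))
# <doc><TITLE>HELLO</TITLE><p>keep</p><TITLE>BYE</TITLE></doc>
```

Elements outside the matches (here `<p>`) pass through untouched, which is what distinguishes a viewport from running the command over the whole document.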

Rui Lopes: (see presentation)

 * APP: Architecture for XML Processing

 * Complex processing support for digital libraries - for both developers
   and producers

 * There is always a need for some manual processing.

 * Tiers: a set of pipelines working on disjoint inputs

 * Pipeline: acyclic digraph of processors

 * Processor: defined by a URI that distinguishes interface vs.
   implementation vs. usage.

 * Processing language:

     Project: an RDF document

     Pipeline: mapped to a linear sequence of components

     Registry: An RDF document that registers components & their inputs
               and outputs

 * Pros:
    * Separation of concerns lets you interchange components without
      touching the pipelines.
    * It's an implementation-neutral language
    * and others

 * Cons:
    * No iteration/test
    * RDF based
    * Doesn't support generation of XSLT stylesheets
    * Doesn't support chunking

 * Thoughts:
    * Good to have multiple levels of composition (not just xinclude)
    * Indirection is good for batch processing

 Alex: The model is that you define a particular step in the registry
       that is a binding, for example, of an XSLT transform to its
       input + parameters and to its output.  A pipeline then points to
       that step, and the step can be re-used in other pipelines.

 * If the registry changes, the pipeline doesn't have to change.
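The registry indirection Alex and Rui describe (a pipeline names a step, the registry binds that name to a component plus its parameters) can be sketched in Python. APP itself uses RDF documents for this; here a plain dict stands in for the registry, `wrap` stands in for an XSLT transform, and all names are hypothetical.

```python
from functools import partial
from typing import Callable, Dict, List

Step = Callable[[str], str]


def wrap(doc: str, element: str) -> str:
    """Stand-in component (in APP this might be an XSLT transform)."""
    return f"<{element}>{doc}</{element}>"


def tidy(doc: str) -> str:
    return doc.strip()


# Hypothetical registry: step name -> component bound to its parameters.
registry: Dict[str, Step] = {
    "tidy": tidy,
    "present": partial(wrap, element="html"),
}

# The pipeline refers to steps only by their registered names.
pipeline: List[str] = ["tidy", "present"]


def run(pipeline: List[str], registry: Dict[str, Step], doc: str) -> str:
    for name in pipeline:
        doc = registry[name](doc)   # look the name up at run time
    return doc


print(run(pipeline, registry, " <p/> "))   # <html><p/></html>

# Re-binding a step in the registry changes the result; the pipeline is untouched.
registry["present"] = partial(wrap, element="body")
print(run(pipeline, registry, " <p/> "))   # <body><p/></body>
```

The same `"present"` name could appear in many pipelines, so one registry edit re-targets all of them.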

Infosets:

  Murray:
    * stdin & stdout
    * then there are parameters
    * then there is the notion of input & output
    * then there is the notion of an infoset on the side
    * then there is the notion of artifacts
    * e.g. on a server you might want to store things in a cache

  Norm:
    * storing on a filesystem can be abstracted to the idea that outputs
      have a URI, and a processor can decide to write them out to disk
      if it wants.  Whether that happens isn't a relevant problem.

  Richard:
    * It is quite likely an implementation will need to buffer things
      if you have a pipeline that isn't just a straight line.

  Erik:
    * In XPL everything is in scope

  Richard:
    * there is no guarantee that you read things at the same rate, so
      you have to buffer

  Murray:
    There's still an output being buffered & cached.  If you produce
    foo.infoset as an output and later consume foo.infoset, then you
    need to store it.

  Erik: you could have an implementation that buffers things in memory
   or, alternatively, in a disk cache if it is too big
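Richard's and Murray's point is that once one output feeds more than one consumer, and the consumers read at different rates, something has to hold the items that one consumer has seen and another has not. A minimal Python sketch of such a fan-out, assuming string items and in-memory buffers only (spilling to a disk cache, as Erik suggests, is left out); `FanOut` and the consumer names are hypothetical:

```python
from collections import deque
from typing import Deque, Dict, Iterator, List


class FanOut:
    """Let several consumers read one producer's output at their own pace.

    Items that one consumer has already read but another has not are held in
    per-consumer in-memory buffers (a real implementation might spill large
    buffers to disk instead).
    """

    def __init__(self, source: Iterator[str], consumers: List[str]) -> None:
        self.source = source
        self.buffers: Dict[str, Deque[str]] = {c: deque() for c in consumers}

    def read(self, consumer: str) -> str:
        buf = self.buffers[consumer]
        if not buf:
            item = next(self.source)        # raises StopIteration when exhausted
            for pending in self.buffers.values():
                pending.append(item)        # every consumer will see this item
        return buf.popleft()

    def backlog(self) -> Dict[str, int]:
        """How many items each consumer has not yet read."""
        return {c: len(b) for c, b in self.buffers.items()}


fan = FanOut(iter(["<a>", "<b/>", "</a>"]), ["validate", "transform"])
print(fan.read("validate"), fan.read("validate"))  # <a> <b/>
print(fan.backlog())          # {'validate': 0, 'transform': 2}
print(fan.read("transform"))  # <a>
print(fan.backlog())          # {'validate': 0, 'transform': 1}
```

The backlog grows with the gap between the fastest and slowest reader, which is why a straight-line pipeline can stream while a branching one generally cannot.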

  Murray:
    Before today, I was thinking this was like a unix pipe.
    They could be bringing in separate things, but there is still just a
    pipeline.
    Most things talked about today don't seem like pipelines.

  Richard:
    My stuff is a Unix pipeline... but that's "just an implementation
    hack" that uses shell programming.

  Erik: The reason you want to serialize is?

  Richard: Because I have a bunch of programs that run on files.  I want
   a language that I can still compile to scripts that serialize to
   files.

   There are other things, such as the results of schema validation,
   that may not be able to be serialized.

  MSM: It is possible to define a non-standard PSVI serialization

  Erik: You can always do this by wrapping components that always
        serialize

  Norm:
    * there are simple components where one document comes in and one
      goes out
    * there are other ways to think about things like XSLT:
        - there is one input and an ancillary input (the stylesheet) and
          one output
        - but this isn't always fixed

  Alex: Having a primary input is necessary for streaming
        implementations.

  Murray: In what case is the stylesheet the input?

  Norm: I have a report that is coming out and the report is always the
        same (the input document), but the XSLT is what is generated by
        the pipeline.

  MSM: Why is there emphasis on backward chaining?

  Erik: (diagram on chart w/ parallel steps that start from the same
        start and are aggregated at the end)

     Back chaining is used because a step can optionally decide not to
     get an input.  It isn't that easy for a user to understand.

     Specifying order is natural, and it is also a problem.  Users do
     have problems with [controlling] order.  You have this problem with
     XSLT.
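Erik's argument is that back chaining (demand-driven evaluation) pays off because a step may decide not to read one of its inputs: asking for the final output walks backwards through the steps, and a branch that nobody pulls never runs. A minimal Python sketch of that evaluation strategy, assuming a hypothetical step table in which each step receives one zero-argument "pull" per input; `evaluate`, `#source`, and the step names are illustrative, not XPL syntax:

```python
from typing import Callable, Dict, List, Tuple

Pull = Callable[[], str]
# Hypothetical step table: name -> (function, names of the inputs it may pull).
# The function gets one zero-argument "pull" per input and decides which to call.
Steps = Dict[str, Tuple[Callable[..., str], List[str]]]


def evaluate(name: str, steps: Steps, source: str,
             cache: Dict[str, str], trace: List[str]) -> str:
    """Demand the output `name`, back-chaining toward the pipeline input."""
    if name == "#source":              # stand-in for the pipeline's own input
        return source
    if name not in cache:
        func, inputs = steps[name]
        # Inputs are bound lazily: nothing upstream runs until it is pulled.
        pulls = [lambda i=i: evaluate(i, steps, source, cache, trace)
                 for i in inputs]
        cache[name] = func(*pulls)
        trace.append(name)             # recorded once the step has finished
    return cache[name]


def choose(doc: Pull, fresh: Pull, stale: Pull) -> str:
    # A conditional step: it pulls only the branch it selects.
    return fresh() if doc().startswith("<doc") else stale()


steps: Steps = {
    "parse": (lambda src: src().strip(), ["#source"]),
    "render": (lambda doc: f"<html>{doc()}</html>", ["parse"]),
    "stale": (lambda: "<html>stale</html>", []),
    "choose": (choose, ["parse", "render", "stale"]),
}

trace: List[str] = []
result = evaluate("choose", steps, "  <doc/>  ", {}, trace)
print(result)  # <html><doc/></html>
print(trace)   # ['parse', 'render', 'choose']  ("stale" never ran)
```

The cache also means `parse`, which both `choose` and `render` read, runs only once.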

  Richard: what drives things in XSLT is apply-templates--and that is
           not backward chaining.

           parallel paths are the 1% case

  Alex: There is a whole body of knowledge that deals with network flows
        and we should be in compliance with those known concepts and
        algorithms.

  All: [to alex] You're going to have to prove that you need stdin for
       optimization.