XProc F2F -- 01 Nov 2011

XML processor profiles discussion

Paul: there has been a need for an xml proc profile document for years

Henry: xml spec left some flexibility points, which were compounded by the existence of follow-on specs ... causing problems later on

Jim: just giving a profile a name makes it clearer what it is.
... giving a name to the profiles is important, so we can agree on what we are talking about

Alex&Henry: ...we need to specifically ask the XQuery WG to review the next LCWD of the profiles spec

Henry: the XQuery/XSLT WGs lean towards wanting features in their spec

Alex: we need to give careful consideration to socializing the xml proc profile

Paul: we should include in the new charter the maintenance of XML Proc profile

Murray: feels that the full profile is not really full; the story around validation is weak

Paul: we should run through some issues discovered yesterday

Alex: reviewed data model, as one of action items from yesterday

Jim: started a short rationale for picking out profiles

Murray was to provide short wording for Section 2

Paul: Michael had a comment about xinclude and we decided we should expand on xml:base and xml:id as well, so readers did not have to jump around

Reviewing Alex's text/prose/work on the data model

Alex: we are using the term data model as per its natural definition, but in some cases we abuse the term
... abuse of term is usually related to infoset properties or reference to the process of going from xml to the data model

Henry: the infoset does not exist, it's a vocabulary
... XML processors map from an XML document to a data model, the question is what info is preserved in that data model?

Alex: reviewing the use of the term 'data model' throughout the xml processor profile document

Alex: we are not telling you how to construct a data model, we are providing the ingredients .... (scribe: a bit unclear)

Alex: suggest in section 2 that we should just talk about the information available versus the data model

<ht> Just talk about "information available" wrt adopting each profile

Murray: brings up the term 'faithful' in relation to the GRDDL spec
... Should we have a similar para in the intro, that says "we realize there is a data model, but we insist that you ensure faithful provision of those data items into the data model"

<ht> Could we do better by talking about 'data model' as shorthand for the interface between the processor and the application

Alex: section 3 Classes of Information, note under class x def -- is a good example usage (still needs work though)
... Maybe we should move note into a more prominent position

Murray: Maybe we can call it the receiving applications' data model

Alex: when we rewrite Section 6, we revisit its usage of the term data model

Henry: it may be that the usage of the term data model is dropped in that section

Alex: Suggests boilerplate language in the conforming section, so that end consumers can 'cut n paste'

Murray: would be cool if we could write that up as a W3C note with each of those specs

Henry: would you be prepared to work with us (to other WG) on such a note
... the right place for that, is in the implementation report ... what goes in the impl report is 'proof that the xml proc profile has done its job'

Jim: doing now a prelim mapping of existing processors to profiles

Murray: if we could actually get some tool vendors to say we are valid to a particular xml proc profile

Henry: provides explanation of the rationale of xml proc profile for Philipp Hoschka

Murray: you can't implement the English language, but you can conform

Henry: it's an existence proof

XML/HTML5 liaison on external subset

Henry: We have discussed asking that XHTML5's implicit external subset include attribute information such as nmtokens and id-ness.
... it occurs to us in discussion, does not give you the same DOM that you get today (by processing dtd, using it)
... given that all those parsers can already read dtd syntax, that we should explore the possibilities of getting those attlist decls added to the data: URI
... discussed with editor of HTML5 spec

<PGrosso> [This issue came up during yesterday morning's f2f, and Henry followed up with Ian Hickson, editor of the HTML5 spec.]

<PGrosso> Henry will follow up with Henri Sivonen.

<PGrosso> ACTION: Henry to follow up with Henri Sivonen on attr decls [recorded in http://www.w3.org/XML/XProc/2011/11/01-minutes.html#action01]

Paul: other issue from yesterday, what is the story with internal subset when parsed with xml parser ... with Henry

<PGrosso> ACTION: Henry to look into the story with an internal subset in an HTML5 document when parsed using the XML parser [recorded in http://www.w3.org/XML/XProc/2011/11/01-minutes.html#action02]

Henry: catalogs have no official standing

Paul: this goes way back

Henry: catalogs have no statutory basis with the W3C ... so there is no way to say/prove that the HTML5 spec is intentionally breaking the [non-existent] W3C catalog spec
... the writers of the HTML spec wanted to cover the scenario where there is no network access

Murray: I'm confused about going out to the network ... why is it a problem to get js and everything else you want?

Alex: the XML parser's resource manager is blocking,

XProc Vnext

Cornelia: Were all those requirements met?

Henry: [responded -- scribe missed]

Jim: perhaps we need to dig into the reasons why as it relates to original use cases

Henry: Most of the use cases pulled from common pipeline experience, pretty concrete and simple

Cornelia: asked about v.next and should we do v.next, based on what

Mohamed: one of them was GRDDL and was not fulfilled

Everyone: having a chat about use cases ...

Cornelia: speak to Vojtech about xproc in the browser

Alex: demonstrating that current use cases may be defined too broadly ...

Henry: it's important to the browser vendors to have a live unified API, much wider range of behaviors more tightly defined ... xproc might be able to take advantage of this
... In most of the browsers you can register a handler and link it up with a media type hook; can we use this as an extensibility point for xproc?

<PGrosso> The V1 requirements doc is at http://www.w3.org/TR/xproc-requirements/

<ht> RESOLUTION: Those present agree that we should request rechartering to do a Vnext, subject to Norm's agreement

Paul: should we go through v1 requirements doc ?
... summarising, the group here is interested in v.next, perhaps we should create a requirements doc for v.next ... start with v1 and capture the delta

Henry: clarifies we should be collecting candidate requirements for v.next
... highlights that v.next may have another dimension in terms of dealing with a broader set of data types (serialized xml)

<alexmilowski> http://www.w3.org/TR/xproc-requirements/#use-case-rw-non-xml

<alexmilowski> That's the read/write non-XML use case.

Mohamed: we have p:data, Henry reminds us that this is not enough

<ht> AM: General topics:

<ht> ... 1) What flows: XDM more generally, text, JSON

<ht> ... 2) Usability

<ht> ... 3) New steps and control primitives

<ht> ... 4) Resource Manager

<ht> JF: What are the major sources of cognitive dissonance

Henry: the problems at the margins were known
... maybe we put up a wiki for dumping thoughts for v.next use cases, pain points, good experiences

Cornelia: overviews a specific usage of xproc in her experience, mentions mashups ... discuss pain points and usage/adoption

<ht> LQ wanted us to look at how XProc can function as the backbone for WebApps, doing what he called the 'choreography' of XQuery, XSLT, etc. in an XRX-style app

Henry: there is more rigor in review of charter these days

Paul: should we not focus on v.next use cases as the basis of rechartering?

Alex: highlights the need for the right level of detail in the use cases (and follow on charter), so we give ourselves 'wiggle room'

Henry: steers us back on course to focus discussing v.next and everything around rechartering
... tries to overview what storage might mean

Mohamed: Can we stick to the term 'resource manager'

Henry: Do we want to add abstraction around the idea of a resource manager?

Cornelia: This is a fundamental paradigm shift from purely flow based model to introducing the concept of a variable

Henry: [history lesson about early pipeline efforts]
... we could do that, because we could envisage pipeline to exist as running in an app server ...

Alex: That architecturally does not preclude running on a device of any size
... I just want to compute a document, give it a name and magic happens!

Henry: we built a toolset at the unix shell level, to build pipelines ... turned into a constraint on our thinking
... sums up: let's put up a wiki, collect pain

Moz: adds that we probably need to consider the need for defining a profile tool

<ht> 5) Integration

<ht> ... including profiling, APIs, LQ's point

Henry has got wiki running

<ht> http://www.w3.org/wiki/XprocVnext

<PGrosso> ...is the wiki for collecting potential V.next info and such?

Alex: overview of work he has done on atomojo
... ran into various xprocisms
... passing optional options to declare steps
... empty source on p:template

Henry: the reason why we couldn't do that was for error diagnostics

Vojtech&Alex: discuss the impact of addressing this

Vojtech: what I observed is writing xproc by hand is annoying

Alex: output signatures for compound steps could be made easier

Alex: issues with inlining xslt and handling whitespace ... get a bit more sophisticated with handling of whitespace

Alex: loading computed uris (perhaps solved with AVT in options)

Henry: all the other compound steps have their steps wrapped ... which is the basis for wrapping try/catch with p:group
... parameters and the rules for their use, at least to simplify

Alex: scenario where out of band env information is wanted, Java world does this with JNDI/OSGI

addressing Moz concerns

Vojtech: because Moz dealing with streaming impl
... try/catch and streaming not so miscible

Alex: one way you can look at it is as parallel-but-synchronized streams

Vojtech attempts to interpret Moz requirement

Alex: Can't we do this today?
... strings bad .... we all know it

Henry: adding his candidates, map reduce related steps and iterate until a condition is met step

Jim: cx:until-changed
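The "iterate until a condition is met" candidate can be pictured with a small sketch. This is only an illustration in Python, not XProc and not any existing step: the names iterate_until, collapse_spaces and max_rounds are hypothetical, and the stop condition used here (output equals input) is just one possible choice.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def iterate_until(step: Callable[[T], T],
                  done: Callable[[T, T], bool],
                  start: T,
                  max_rounds: int = 100) -> T:
    """Apply `step` repeatedly until `done(previous, current)` holds.

    `max_rounds` is an assumed safety limit so a pipeline that never
    converges fails loudly instead of looping forever.
    """
    current = start
    for _ in range(max_rounds):
        following = step(current)
        if done(current, following):
            return following
        current = following
    raise RuntimeError("condition not met within max_rounds")


def collapse_spaces(text: str) -> str:
    # One pass only halves runs of spaces, so several rounds are needed.
    return text.replace("  ", " ")


# Stop when a round leaves the document unchanged.
result = iterate_until(collapse_spaces, lambda old, new: old == new, "a    b     c")
```

The stop condition is passed in as a function, so "until unchanged" and "until some test on the document passes" are the same construct with a different predicate.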

<alexmilowski> See: http://xmlcalabash.com/docs/reference/

other steps

http://exproc.org/proposed/steps/

[scribe getting lost .... still working on possible new steps]

Florent: make avail for user step?

Moz: NVDL: names of the ports depend on the script

Henry&Moz: reviewing historical basis ...

adding calabash/exproc steps to the v.next wiki ....

Alex: we have the same problem as xpath had when going to v2, we have an incomplete inventory
... there are all these concurrency operations that people do

Henry: I have this pipeline and want it to respond to eventing
... clarifying the on-demand construction (as related to Resource Manager)
... ability to store/retrieve, e.g. local cache

Moz: the use case is 'can we make xinclude be connected to output of some steps' .... give uris to step output

Henry: we did it (in the past) by preceding resource manager uris with a double hash
... I am going to look at my input and transform it; here is a little micro catalog that maps the uris in this document ... Vojtech adds you can make it more dynamic
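A minimal sketch of the micro catalog idea, assuming a resource manager is simply a store of named documents and the catalog is a per-document mapping consulted before lookup. The class name ResourceManager, the rm: prefix and the example URIs are all hypothetical and do not describe any past or present implementation.

```python
from typing import Dict, Optional


class ResourceManager:
    """Toy store that lets step outputs be retrieved by URI."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def store(self, uri: str, document: str) -> None:
        # A step output is given a name so later steps can refer to it.
        self._store[uri] = document

    def resolve(self, uri: str, catalog: Optional[Dict[str, str]] = None) -> str:
        # The micro catalog maps URIs found in a document to stored names;
        # URIs it does not mention are looked up unchanged.
        mapped = (catalog or {}).get(uri, uri)
        if mapped not in self._store:
            raise KeyError(f"no stored resource for {uri!r}")
        return self._store[mapped]


manager = ResourceManager()
manager.store("rm:chapter1", "<chapter>computed by an earlier step</chapter>")

# Hypothetical catalog: an xinclude href in the input maps to a step output.
micro_catalog = {"http://example.org/chapter1.xml": "rm:chapter1"}
resolved = manager.resolve("http://example.org/chapter1.xml", micro_catalog)
```

Making the mapping "more dynamic" would amount to building micro_catalog at run time rather than writing it by hand.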

Vojtech: need to add sync primitives

Alex&Vojtech&Moz: discuss sync on a step

<fgeorges> Henry, this is probably the one: http://code.google.com/p/xproc-plus-time/

<alexmilowski> http://aws.amazon.com/elasticache/

<alexmilowski> ElastiCache

http://code.google.com/p/xproc-plus-time/ is a link to Philip Fennell's work with SMIL + XProc

Vojtech: we can access steps from xpath via user functions

<PGrosso> ACTION: Henry and Norm to add a schema to the p:template note that includes p:template and p:in-scope-namespaces. [recorded in http://www.w3.org/XML/XProc/2011/11/01-minutes.html#action05]

Discussing how to bring in non xml data into xproc

Jim: what was the historical discussion on this ...

Henry: there was previous discussion but was tightly constrained

Jim&Henry: external input/outputs may be the right place to do it

Cornelia: use canonical xml when inside xproc

Moz: use case of csv coming into xproc, requires pre/post step processing outside of xproc

Henry: step signatures could include media type ... this is one design point, there are 3 variants:

Henry:

1. Non-XML only at the margins, on the way in or out of the whole pipeline
2. You really only allow non-XML for a moment, as the input and/or the output of the step, but we stay all-XML in the pipes. Two sub-cases:
   a. Auto-shimming: the engine inserts the necessary shims before and/or after, so effectively a macro;
   b. Obligatory shimming: the pipeline author must include the shims explicitly.
3. Non-XML flows: where one real (non-shim) step produces a non-XML media type and another step can accept that media type, you can plug them together

Henry: Unsure if we want to go as far as (3), so that we could e.g. allow json to json connections, which would mean json in the pipes
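The difference between auto-shimming and obligatory shimming can be sketched as a planning pass over step signatures that carry media types. This is an illustrative Python model only: Step, SHIMS, plan and the step names are hypothetical, and the pipeline input is assumed to be XML.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

XML = "application/xml"
JSON = "application/json"


@dataclass(frozen=True)
class Step:
    name: str
    accepts: str   # media type the step reads
    produces: str  # media type the step writes


# Assumed registry of conversion shims, keyed by (from, to) media type.
SHIMS: Dict[Tuple[str, str], Step] = {
    (XML, JSON): Step("xml-to-json", XML, JSON),
    (JSON, XML): Step("json-to-xml", JSON, XML),
}


def plan(steps: List[Step], auto_shim: bool) -> List[Step]:
    """Return the steps to run, inserting shims where media types differ.

    With auto_shim=False a mismatch is an error, so the pipeline author
    has to write the shim steps explicitly (obligatory shimming).
    """
    planned: List[Step] = []
    flowing = XML  # assumption: the pipeline starts with XML
    for step in steps:
        if step.accepts != flowing:
            shim = SHIMS.get((flowing, step.accepts))
            if not auto_shim or shim is None:
                raise ValueError(f"{step.name} cannot accept {flowing}")
            planned.append(shim)
        planned.append(step)
        flowing = step.produces
    return planned


json_filter = Step("json-filter", JSON, JSON)
xml_report = Step("xml-report", XML, XML)
names = [s.name for s in plan([json_filter, xml_report], auto_shim=True)]
```

In this model two adjacent steps that both speak JSON are connected directly, which is the "json in the pipes" case; a stricter engine could force a round trip through XML between every pair of steps instead.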

Cornelia: on auto-shimming, have we captured the option of requiring that the developer of the pipeline be explicit, i.e. not done automagically?

Murray: is what you are proposing a step for doing these conversions?
... this is a babelfish

Alex: We may have closed the door a bit, using http-request as example
... work to do there
... what we really want is the structure and data, not the blob
... there is no way to say I don't know how to deal with this media type
... as a pipeline author, when you call http-request, I expect you to at least 'not catch fire'

Moz: the only way to make things compat today, is to add an optional feature to http-request, specifying how to convert to/from XML (call this 'bedazzling') ... in specific cases if you recognize some things to flag a pre/post process

Henry: whatever media type you should be able to handle that, i.e. handle the shimming dynamically

Alex: it would be nice to declare the kind of media type you can handle coming out of an http-request
... worst case scenario you will get markup of a similar thing ...

Moz: csv to xml, sometimes you want to separate columns ... having only one way to convert is probably a worse answer than none
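To make the "more than one way to convert" point concrete, here is a sketch of two plausible CSV to XML mappings in Python. The element names table, row and cell are assumptions, and the first mapping further assumes that every column header is a valid XML element name.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml_named(text: str) -> ET.Element:
    """Header row becomes element names: one child element per column."""
    root = ET.Element("table")
    for record in csv.DictReader(io.StringIO(text)):
        row = ET.SubElement(root, "row")
        for column, value in record.items():
            ET.SubElement(row, column).text = value
    return root


def csv_to_xml_cells(text: str) -> ET.Element:
    """Every line, header included, becomes a row of generic cells."""
    root = ET.Element("table")
    for record in csv.reader(io.StringIO(text)):
        row = ET.SubElement(root, "row")
        for value in record:
            ET.SubElement(row, "cell").text = value
    return root


sample = "id,name\n1,Ann\n"
named = ET.tostring(csv_to_xml_named(sample), encoding="unicode")
cells = ET.tostring(csv_to_xml_cells(sample), encoding="unicode")
```

The same two lines of CSV give two different documents, and which one is wanted depends on what the next step in the pipeline expects.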

Jim: ability to connect to odbc

Moz: we may not want every piece of the xml document (after bedazzling process is applied)

Vojtech: the way you specify alternatives here could be similar to serializations, using some agreed upon qnames ...
... could be supported by all implementations

Florent: this is not a trivial task and many of these kind of efforts become intractable

Henry: who does this kind of process with qnames now?

scribe gets lost a bit

Vojtech: we do this in serialization method today

XProc evangelizing

Cornelia: who are we evangelizing to?

Alex: we don't know who is using xproc

Jim offers the idea of using the XML Prague preconf day as a place to do an XProc workshop/contest

Cornelia: who is the developer profile ... you can use a pipeline language instead of using Java
... most people say they are X-averse

Murray says we go after non native java speakers

Alex: I am mixing and mashing ... built whole atom app impl with xproc

Murray: we could possibly build 2, 3 or 4 reference usages, e.g. turning a docbook into pdf

Moz: exists now

Alex: reviewing our home WG page

Moz: is it time to have an xproc interest group? XProc WG is about publishing the spec ...

Murray: is anyone working on a graphical user interface for XProc

Moz: we need to get the W3C interested in xproc ... publishing tools

Philipp mentions the W3C blog

Moz discusses the folks he has trained

Henry addressing Cornelia's question, 'the way you get interest in xproc [at enterprise level?] there are 2 tracks that are difficult'

Henry: the 2 target audiences are the pro services people
... that is hard to sell from scratch
... if you want to make money out of xproc, implement a vertical ..
... you will have to build an xproc solution and sell that first

Jim: we need early adopters

Cornelia: I have done xproc pipelines in the medical space, EBXML ... brutal to parse

<PGrosso> Jim: It would be great to have the usability of xproc to be simpler than abusing ant or xslt.

Henry: divide and conquer

<PGrosso> Jim: what other things can the average user use xproc to do.

Henry: many years ago I had to convert PPT to html, in a way that left the results editable by humans
... at that time OpenOffice just came out ... the xml it emitted was pretty bad
... I wrote a number of stylesheets, e.g. bite off manageable bits

<PGrosso> Henry: The basic message is that xproc substantiates divide and conquer.

<MoZ> Henry: you have to either be Shell Script or JAXP programmer if you don't do XProc

<PGrosso> Cornelia: Do we have tooling as one thing to look into for V.next because it is one of the barriers now.

<PGrosso> Henry: Yes that is under the integration heading.

<PGrosso> Jim: What about academics, professors? How do we reach them?

<PGrosso> Henry: Norm and I need to write a CACM article.

<PGrosso> Cornelia: Vojtech will be just coming from Berkeley after lunch, so let's see what he has to say.

Cornelia: let's continue with Vojtech talking about academic

Henry welcomes Florent Georges as observer

Vojtech just spoke about xproc at Berkeley

Vojtech: they seemed to 'get it', showed them some examples ... they already had xml basics (stylesheet, schema, xpath ...)
... students appreciated the idea ...

Cornelia: did you get a sense of the course at Berkeley and the kind of students who attended, following on our earlier question about who should we target

Henry asks Vojtech, any ideas to raise our profile?

Cornelia: might lean closer to markup side versus CS

Vojtech: what is missing most is libraries of existing xproc pipelines
... important to demonstrate that there are practical applications versus theory

Moz: anything specific (e.g. we talked about DITA pipeline)

Vojtech: takes time to adopt, more people looking at it to satisfy curiosity
... more interest in xproc addon to documentum,

Henry: any brilliant suggestions to get xproc adopted ...

Florent: library sounds good and really valid ...

Jim: what's the state of expath package with calabash

Florent: working on expath packaging system

http://www.expath.org/modules/pkg/

Florent gives overview of expath packaging

www.cxan.org

Jim: you have cxan as well ...

Florent: overviews cxan ...

Jim: what example pipeline could be low hanging fruit?

Henry: Markup Tech had posted several pipelines (not using xproc), and the one that had the most support was...an SGML to XML converter, using OST

<alexmilowski> The server distribution is described here: http://code.google.com/p/atomojo/wiki/V2Server

<alexmilowski> I've been working on the description of the XML configuration here: http://code.google.com/p/atomojo/wiki/V2ServerComponentXML

<alexmilowski> ...and examples of use: http://code.google.com/p/atomojo/source/browse/#svn%2Ftrunk%2Fv2%2Fapp-server%2Fconf

Henry: We don't at the moment have any way to add metadata to things in the pipeline.

Henry goes over a few of the example pipelines he used as demo

Henry: the other thing we did with that was propagate output flags (with disable-output-escaping)

Alex: what about packaging epub

Moz playing the devil ...

Geert's work: https://github.com/grtjn/xproc-ebook-conv

http://code.google.com/p/daisy-pipeline/

Moz provides example (based on work at a Bank) ... only format is excel

Moz: every package format is something interesting, excel more so for banking

Jim: I am willing to do a few example packages

Florent: maybe another area, for existing projects ... xml databases, library of steps

Henry: notes that Adam Retter has the patch for calabash for eXist

Vojtech: we have a couple of extensions, but not really appropriate (native processing)

<scribe> ACTION: JF to ask Geert if we can use his pipeline [recorded in http://www.w3.org/XML/XProc/2011/11/01-minutes.html#action03]

<scribe> ACTION: Alex to liaise with Henry to get his appserver stuff working to host example/demo xproc [recorded in http://www.w3.org/XML/XProc/2011/11/01-minutes.html#action04]

Vojtech: I have a few example pipelines, GRDDL pipeline

<fgeorges> Geert just responded to me: https://github.com/grtjn/xproc-ebook-conv/

Cornelia: we talked about verticals ... are there any areas of verticals as a group

http://www.w3.org/wiki/XprocVnext

Florent: all my sites are using pipelines

Cornelia: ability to serve up html is a good example, what's the representation format?

<Vojtech> GRDDL in XProc: https://community.emc.com/docs/DOC-10276

Florent: overviews the process of how his sites work with pipelines

Cornelia: is there any standard format for page layout ... various answers (html, xslfo, etc....)

Henry: identity pipeline, sgml2xml conversion (preserving entity references)

<Vojtech> Javadoc-style XProc documentation generator in XProc: https://community.emc.com/docs/DOC-8657

Henry: validation of large doc (one piece at a time) using viewport, xinclude remove xml:base (absolutize ...)

<Vojtech> Pre-OAuth Twitter archiving XProc pipeline: https://community.emc.com/docs/DOC-6967

Henry: why schema based processing is worth it = because you can leverage the type information
... validate twice with surgery (validate then delete everything that is not validity known)

WG takes a short break to update Norm (who is dealing with snowstorm fun n games)

<PGrosso> Norm: (via IM) Agreed with our requesting to recharter.

Cornelia: I heard the term Tutorial used ...

Alex: No one has written the primer

Paul wondering about primer

Henry: very grateful to everyone for coming and contributing, it was very valuable
