IRC log of xproc on 2012-02-02

Timestamps are in UTC.

14:55:24 [RRSAgent]
RRSAgent has joined #xproc
14:55:24 [RRSAgent]
logging to
14:55:28 [Zakim]
Zakim has joined #xproc
14:55:31 [Norm]
zakim, this will be xproc
14:55:31 [Zakim]
ok, Norm; I see XML_PMWG()10:00AM scheduled to start in 5 minutes
14:55:49 [Norm]
rrsagent, set logs world-visible
14:55:50 [Norm]
Meeting: XML Processing Model WG
14:55:50 [Norm]
Date: 2 February 2012
14:55:50 [Norm]
Meeting: 208
14:55:53 [Norm]
Chair: Norm
14:55:54 [Norm]
Scribe: Norm
14:55:56 [Norm]
ScribeNick: Norm
14:57:36 [Norm]
Regrets: Cornelia
14:59:18 [jfuller]
almost there ... brewing up tea or I won't be useful
14:59:44 [Vojtech]
Vojtech has joined #xproc
15:00:16 [Zakim]
XML_PMWG()10:00AM has now started
15:00:23 [Zakim]
+ +1.213.457.aaaa
15:00:30 [Norm]
zakim, passcode?
15:00:30 [Zakim]
the conference code is 97762 (tel:+1.617.761.6200, Norm
15:00:46 [Zakim]
- +1.213.457.aaaa
15:00:48 [Zakim]
XML_PMWG()10:00AM has ended
15:00:48 [Zakim]
Attendees were +1.213.457.aaaa
15:01:04 [alexmilowski]
alexmilowski has joined #xproc
15:01:10 [ht]
ht has joined #xproc
15:01:13 [Norm]
zakim, this will be xproc
15:01:13 [Zakim]
ok, Norm; I see XML_PMWG()10:00AM scheduled to start now
15:01:17 [Norm]
zakim, passcode?
15:01:17 [Zakim]
the conference code is 97762 (tel:+1.617.761.6200, Norm
15:01:22 [alexmilowski]
Hmm… trying to locate my phone. :(
15:01:30 [Zakim]
XML_PMWG()10:00AM has now started
15:02:17 [Norm]
zakim, who's here?
15:02:17 [Zakim]
On the phone I see Norm, [IPcaller]
15:02:18 [Zakim]
On IRC I see ht, alexmilowski, Vojtech, Zakim, RRSAgent, Norm, jfuller, Liam, caribou
15:02:24 [Norm]
zakim, [ip is jfuller
15:02:25 [Zakim]
+jfuller; got it
15:02:45 [ht]
zakim, ? is done
15:02:48 [Zakim]
+done; got it
15:02:53 [ht]
zakim, done is me
15:02:56 [Zakim]
+ht; got it
15:03:18 [Vojtech]
zakim, jeroen is me
15:03:18 [Zakim]
+Vojtech; got it
15:03:29 [Norm]
zakim, who's here?
15:03:29 [Zakim]
On the phone I see Norm, jfuller, ht, Alex_Milows, Vojtech
15:03:33 [Zakim]
On IRC I see ht, alexmilowski, Vojtech, Zakim, RRSAgent, Norm, jfuller, Liam, caribou
15:03:50 [Norm]
Present: Norm, Jim, Henry, Alex, Vojtech
15:04:00 [Norm]
Topic: Accept this agenda?
15:04:08 [Norm]
Topic: Accept minutes from the previous meeting?
15:04:21 [Norm]
Topic: Next meeting: telcon, 23 February 2012
15:04:46 [Norm]
Accepted. No regrets heard.
15:05:01 [Norm]
Topic: Review of open action items
15:05:10 [Norm]
A-206-10: completed
15:06:04 [Norm]
ACTION: A-208-01: Norm to put the categorization on the agenda for 23 Feb
15:06:17 [Norm]
A-206-02: continued
15:06:31 [Norm]
A-207-01: continued
15:06:35 [Norm]
A-207-02: continued
15:06:41 [Norm]
Topic: Charter
15:06:52 [Norm]
Norm: I sent mail to Liam to setup a time to chat about it.
15:07:17 [Norm]
Norm: I'm going to propose what we talked about last week: that we have enough community interest to start a on the REC track
15:07:28 [Norm]
Topic: Discussion of items from recent email
15:07:29 [jfuller]
Philip Fennel talking XPROC
15:08:11 [jfuller]
some xproc libs
15:08:53 [Norm]
Jim: I'm interested in Vojtech's shimming approach.
15:09:30 [Norm]
Vojtech: There are two main things to this: one is the conversion between media types. On the ports you can declare what kinds of media types they accept or produce.
15:09:49 [Norm]
...You can specify a wildcard. For example, the identity step in my extension can process arbitrary media types.
15:10:00 [Norm]
...And the p:store step can process arbitrary media types.
15:11:01 [Norm]
...If you have two steps that are connected and the first step produces application/xml and the second step takes application/json, then if the processor knows how to convert from XML to JSON, the XML will be converted to JSON for the second step.
15:11:42 [Norm]
...This is the main idea. There are some defaults. For example, between application/xml and image/svg+xml, a conversion might happen or the data might just be passed through.
15:12:11 [Norm]
...The second big part is what to do with XPath in XProc. Because the conversion happens only between the input and output ports, but you can also have options/choose-when/etc.
15:12:24 [Norm]
...In all the places where you can use XPath, you can encounter non-XML data in the context.
15:12:46 [Norm]
...I support that to the extent that some expressions succeed, for example base-uri() and media type information.
15:13:12 [Norm]
...Potentially you can also imagine that you could access the binary stream itself, but I didn't go that direction.
15:13:28 [Norm]
...Those are the two main things.
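A minimal sketch of the port-to-port shimming described above, not Vojtech's implementation. The names `CONVERTERS`, `register`, and `shim` are hypothetical, and the XML-to-JSON mapping is a placeholder, since the discussion leaves the choice of mapping open.

```python
import json
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

# Hypothetical registry: (produced type, accepted type) -> converter function.
CONVERTERS = {}

def register(src, dst):
    """Decorator that records a converter for one pair of media types."""
    def wrap(fn):
        CONVERTERS[(src, dst)] = fn
        return fn
    return wrap

@register("application/xml", "application/json")
def xml_to_json(data):
    # Placeholder mapping (root element name -> text content); assumed, not standard.
    root = ET.fromstring(data)
    return json.dumps({root.tag: root.text or ""})

def shim(data, produced, accepted):
    """Return (data, media_type) as delivered to the downstream input port."""
    # An exact match, or a wildcard such as "*/*", passes the data through untouched.
    if fnmatchcase(produced, accepted):
        return data, produced
    convert = CONVERTERS.get((produced, accepted))
    if convert is None:
        raise ValueError(f"no conversion from {produced} to {accepted}")
    return convert(data), accepted
```

A port that declares `*/*` (as the identity step does in the extension) accepts anything unchanged, while a port that declares application/json triggers the registered conversion when it is fed application/xml.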
15:13:47 [Norm]
Vojtech's paper will be presented at XML Prague 2012.
15:14:10 [Norm]
Vojtech: There are also some small changes to the steps so that they operate on non-XML data (like p:identity and p:store)
15:14:21 [Norm]
...The compound steps like for-each and choose/when, they can operate on any media type.
15:14:52 [Norm]
...One nice feature that you get with for-each is that if you put an output that declares a media type, you can get automatic conversion to that media type for all the subpipeline's outputs.
15:15:44 [Norm]
...For the bindings, where the media type is not available or wrong, you have the opportunity to override the binding. You can say that this isn't application/xml, but instead it's a JPEG or something. It doesn't convert, it just overrides the media type carried with the document.
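The difference between overriding and converting can be sketched as follows. This is an illustration under assumptions: the `Document` class and `override_media_type` function are hypothetical names, not part of XProc or of the extension discussed here.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Document:
    content: bytes      # the raw bytes flowing through the pipeline
    media_type: str     # the label carried alongside the bytes

def override_media_type(doc, media_type):
    # Only the label changes; the bytes are handed on untouched (no conversion).
    return replace(doc, media_type=media_type)
```

For example, a binding that wrongly reports application/xml for JPEG bytes could be relabeled as image/jpeg without touching the content.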
15:16:09 [Norm]
Jim: Thanks. That's fascinating.
15:16:20 [Norm]
...It sounds like the shimming is less of a hassle than the impact on XPath.
15:16:32 [Norm]
Vojtech: Well, there are more interesting problems.
15:16:48 [Norm]
...Before you evaluate an XPath expression, maybe you should shim to XML. But it occurred to me that you could do it beforehand.
15:17:08 [Norm]
...That way you get more flexibility.
15:17:55 [MoZ]
MoZ has joined #xproc
15:18:06 [Norm]
...There's a question about what kinds of conversion you should handle and how. Maybe we could come up with some default conversions.
15:18:17 [Norm]
...But if you just take XML and JSON, there are a whole bunch of schemes. I left that open.
15:18:49 [Norm]
Vojtech: Things like handling the XQuery inputs are handy.
15:19:01 [Norm]
...There are open questions about what to do with p:document or p:load if you point it to non-XML data.
15:19:35 [Norm]
...Right now we convert to base64, but with my proposal we don't do that.
15:20:02 [Norm]
...I tend to think of p:data and p:http-request as similar things, so they behave the same way.
15:20:18 [Norm]
...Overall I was surprised that it wasn't that difficult. There weren't that many breaking changes.
15:20:39 [Norm]
...It's not that dramatic a change for pipeline authors.
15:20:46 [Norm]
Jim: It doesn't sound like low-hanging fruit to me, but...
15:20:58 [Norm]
Vojtech: Well, it's not because there are a bunch of loose ends. I made some arbitrary decisions.
15:21:20 [Norm]
...Some of the questions have broader implications.
15:21:39 [Norm]
Alex: This is an example of where just looking at the low-hanging fruit doesn't necessarily achieve all the goals of making XProc easier to use.
15:21:52 [Norm]
...I'm not sure we should tackle every problem that's beneficial, but we need some way of deciding.
15:22:00 [Norm]
...I really like this. It solves problems that have really frustrated me.
15:22:42 [Norm]
Vojtech: Yes. I have lots of implementations with non-XML data, so you need to work around a lot of issues.
15:23:02 [Norm]
...I implemented it in our XProc engine and it turns out that it's not that difficult.
15:23:27 [Norm]
...I think this whole topic is something that needs more careful consideration.
15:23:47 [Norm]
Norm: I think we should put this on the list as a possibility.
15:24:06 [Norm]
Vojtech: If you just take the changes to the XProc language, there were only two: extra attributes on input/output and binding elements.
15:24:19 [Norm]
...And one more XPath extension function to get the media type.
15:25:19 [Norm]
Alex: Non-XML stuff and parameters are the two big pain points.
15:26:19 [Norm]
Norm: Yeah, I think something like Vojtech's approach for non-XML might be worth doing and I still think we might do something clever with parameters and maps.
15:26:44 [Norm]
Vojtech: I changed the pxp:zip/unzip steps to treat the ZIP documents as a (binary) stream. The output of the pipeline is really binary ZIP data.
15:26:52 [Norm]
...For this kind of functionality I think that's a nice feature.
15:28:09 [Norm]
Norm: We've got a little time left. What about Mohamed's ideas for streaming.
15:28:15 [Norm]
Norm summarizes
15:30:27 [Norm]
Norm expresses general approval.
15:30:33 [Norm]
Henry: I'm not sure I understand the unordered one.
15:31:02 [Norm]
...I'm not sure how it buys you anything at all. Imagine that you're implementing the for-each step.
15:31:08 [Norm]
...You fork six threads and now what do you do?
15:31:19 [Norm]
Vojtech: When the first is done, you can send its output along.
15:32:06 [Norm]
Norm attempts to explain.
15:32:20 [Norm]
Henry: It's not a buffering issue. You have to buffer all six. There's no way to know which one is going to finish first.
15:32:29 [MoZ]
Link :
15:32:59 [Norm]
...What is true is that you can hand a pointer to the buffer for whoever finishes first on to the next stage.
15:33:32 [MoZ]
no you'll have to buffer ONE LESS
15:33:57 [MoZ]
but you indeed may have to buffer the last 5
15:34:44 [Norm]
Henry: What I was thinking of was that someone might want to do more complex analysis. If you have a for-each followed by an identity followed by another for-each, you can make huge gains by optimizing the flow.
15:34:49 [MoZ]
The point being to let the implementation define the strategy that minimizes buffering
15:36:07 [jfuller]
+1 to that
15:37:13 [MoZ]
we may have the same point with XSLT 2.0+ outputs
15:37:28 [MoZ]
but it becomes the XSLT implementation problem
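The ordered versus unordered trade-off discussed above can be sketched with a thread pool. This is only an illustration under assumptions: `for_each` and its `ordered` flag are hypothetical names, and a real XProc processor may schedule iterations quite differently.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def for_each(items, body, ordered=True):
    """Run body on every item in parallel and yield the results.

    items is a list. Each iteration's result is buffered in its future
    until the consumer asks for it.
    """
    with ThreadPoolExecutor(max_workers=len(items) or 1) as pool:
        futures = [pool.submit(body, item) for item in items]
        if ordered:
            # Document order: an early finisher waits behind slower predecessors.
            for future in futures:
                yield future.result()
        else:
            # Unordered: whichever iteration finishes first is handed on first.
            for future in as_completed(futures):
                yield future.result()
```

In both modes every in-flight iteration still needs somewhere to put its output, which is Henry's point. The unordered mode only changes which finished buffer is released downstream first.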
15:38:17 [Norm]
Some discussion of cardinality and media type wrt Alex's question about href on p:query
15:38:51 [Norm]
Jim: In the past, was there any discussion of adding metadata to the input/output of ports?
15:39:02 [Norm]
...If we were going to do the shimming, instead of adding media type, we could do something broader.
15:39:53 [Norm]
Henry: "Yes" is the short answer, a very long time ago. It was in conjunction with the question of outputs. I mentioned the fact that in the old Markup Pipeline Engine, it was coherent to persist the output escaping result of an XSLT step. If you turned off output escaping, as part of the output of an XSLT step,
15:40:11 [Norm]
...if you do that in th emiddle of a pipeline that gets lost.
15:40:33 [Norm]
...Because that's a property of the *stylesheet*, the pipeline author might not even know about it. You can't use the pipeline's own controls.
15:40:49 [Norm]
...So we had a general purpose backchannel (an attribute/value backchannel) in the architecture.
15:40:51 [MoZ]
s/th emiddle/the middle/
15:41:14 [Norm]
...Another thing you can use it for is to persist the character encoding. So you can output in the same encoding that you received.
15:41:19 [Norm]
...If you get latin-3, you produce latin-3.
15:41:37 [Norm]
Alex: This is interesting, because it seems to go along with the idea of media types.
15:41:49 [Norm]
...The media type has parameters and one of them is the encoding.
15:42:16 [Norm]
...I think the immediate follow-on from having media types is dealing with other metadata.
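A rough sketch of how per-document metadata could carry the character encoding from input to output, tying together the backchannel and the media type parameter mentioned above. The names `Document`, `charset_of`, `load`, and `serialize` are hypothetical; only the standard library's `email.message.Message` is used to parse the media type parameters.

```python
from dataclasses import dataclass, field
from email.message import Message

def charset_of(media_type, default="utf-8"):
    """Extract the charset parameter from a media type string."""
    msg = Message()
    msg["Content-Type"] = media_type
    return msg.get_param("charset", default)

@dataclass
class Document:
    text: str
    # Attribute/value metadata that travels with this particular document.
    metadata: dict = field(default_factory=dict)

def load(raw, media_type):
    # Decode with the declared encoding and remember it as metadata.
    charset = charset_of(media_type)
    return Document(raw.decode(charset), {"media-type": media_type, "charset": charset})

def serialize(doc):
    # Write the document back out in the encoding it arrived in.
    return doc.text.encode(doc.metadata.get("charset", "utf-8"))
```

With this arrangement a document received as `application/xml; charset=iso-8859-3` (latin-3) is serialized as latin-3 again, provided intermediate steps pass the metadata along.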
15:42:53 [Norm]
Jim: Yeah, that's what I was thinking of. It could also potentially simplify some existing steps.
15:43:04 [Norm]
Discussion returns inevitably to parameters.
15:48:53 [Norm]
Alex: There are two large buckets: getting stuff from an unknown bucket that comes from outside the pipeline and copying directly from options passed to the pipeline.
15:49:42 [Norm]
Henry: I think what's not clear to me is that the parameter passing topology is potentially different from the backchannel I had in mind.
15:50:00 [Norm]
...Such a backchannel is really metadata about a particular document.
15:50:50 [Norm]
Norm: I don't think we meant the same mechanism...
15:51:21 [Norm]
Alex: What's curious about the parameter problem compared to the media type thing is that it's a change that doesn't seem to really rock the boat from a pipeline author perspective.
15:51:36 [Norm]
...The parameters thing could be a radical departure.
15:51:52 [Norm]
Vojtech: I was thinking the same thing.
15:52:17 [Norm]
Norm: For my part it depends on the solution.
15:52:22 [Norm]
Henry: And here's a pipeline conversion pipeline.
15:53:19 [Norm]
Topic: Any other business?
15:53:20 [jfuller]
parameter gravity
15:53:33 [Norm]
None heard. See you in three weeks.
15:53:52 [Norm]
Hey, Vojtech, do you want a demo jam slot?
15:53:59 [Norm]
rrsagent, draft minutes
15:53:59 [RRSAgent]
I have made the request to generate Norm
15:54:25 [Norm]
moz, do you want a demo jam slot?
15:54:54 [MoZ]
Not sure of the topic
15:55:00 [MoZ]
but if there is one left
15:55:05 [MoZ]
I could do some demo for sure
15:55:54 [MoZ]
are you already out of slots, Norm ?
15:56:07 [Norm]
no, I've got a few left, just want to make sure you get one if you want one
15:56:11 [Norm]
I'm down to 5, I believe.
15:57:31 [MoZ]
where can I see the current list ?
15:57:43 [Norm]
behind my eyeballs
15:57:53 [MoZ]
was expecting this
15:57:58 [MoZ]
ok keep me one
15:58:21 [Norm]
rrsagent, bye
15:58:21 [RRSAgent]
I see 1 open action item saved in :
15:58:21 [RRSAgent]
ACTION: A-208-01: Norm to put the categorization on the agenda for 23 Feb [1]
15:58:21 [RRSAgent]
recorded in