XML Processing Model WG -- 26 Apr 2012

Date: 26 April 2012

<scribe> Meeting: 214

<scribe> Scribe: Norm

<scribe> ScribeNick: Norm

Accept this agenda?

-> http://www.w3.org/XML/XProc/2012/04/26-agenda

Accepted.

Accept minutes from the previous meeting?

-> http://www.w3.org/XML/XProc/2012/04/19-minutes

Accepted.

Next meeting: telcon, 10 May 2012, skip 3 May

Accepted.

Review of open action items

A-213-10: Completed

A-213-11 to A-213-14: Completed

A-213-15: Completed.

A-213-15 - A-213-18: Completed.

Vojtech: For XSLT 1.0, the only option is to write to a file, we're explicit about not having documents appear on the secondary output port.

A-213-09: Completed.

p:zip and p:unzip

Norm: I move we postpone this until Jim can be present.

Accepted.

Debugging strategies

Murray: I'm trying to figure out two things: what sorts of mechanisms would be useful in the language to assist with debugging, and what sort of things are already there?
... There's p:log, two implementations have a "message" step, but I'm wondering about the possibility of other kinds of steps.
... I've had this discussion in the past with C programmers and now I'm talking with XProc programmers.
... I put some steps in the requirements document: one to turn on debugging, one to turn on tracing. I highlighted that there are some functions that can give you information about your environment.
... I wonder about strategies ... logs, etc.
... It seems like those are the sorts of things you might expose.

Norm: There are two things you can do: get a dump of the graph and get more verbose logging.

Norm waxes poetic about -D and Java logging.

Vojtech: We have something similar. We have profiling output. And also we have a detailed trace of the pipeline: what documents were passed, what were the options and variables, etc.
... I wonder if we should try to standardize this.
... In other specifications, such as XQuery, there's a trace function, but the rest is implementation-defined or implementation-dependent.
... In my view, we have p:log which is rather inflexible and you can have a message step. But the problem with this is that it requires you to modify the pipeline and potentially break the sequence of steps. Sometimes you have to do ugly plumbing to keep the original sequence.
... Maybe what we could consider is some sort of construct like group or a wrapper that would log some information without having to add pipe bindings to keep the pipeline in the original sequence. A construct that doesn't influence the connections between the steps would be nice.
... Instead of a message step. Or we could have both. We could have a trace element that wraps a bunch of steps and does logging, but it wouldn't be a step.

Norm: Yes, we could invent a new kind of thing, but I wonder if this is so implementation dependent that it's of limited value.

Vojtech: Like p:log, I think we could leave the details implementation dependent. The trace wrapper might be something like the resource manager that we discussed in the past.

Norm: Yes, if we invent a new kind of wrapper for the resource manager, maybe we could leverage that for the trace wrapper.

Henry: Oh, I'd rather not. You really shouldn't have to edit your pipeline to do this.
... Maybe a wrapper is the best we're going to come up with. For something like the resource manager, a wrapper is more appealing because it's a feature of the design of the pipeline. Whereas, tracing and profiling are not part of the pipeline.
... So I'd rather not have something in the pipeline.
... We can't just leave this to implementors, the way the Python or Lisp debuggers do, because you can't implement XProc in XProc.
... A different way to talk about this in the same spirit would be to say that we already have ways to name things. Maybe we want to think about this in a sort of meta way: we want to think about ways of annotating pipelines, externally even, in order to describe tracing or profiling behavior.
... We could have a trace descriptor and a pipeline.

Alex: This could be done if you had a description of the binding for the pipeline.
... That would require the ability to point at a chunk of pipeline not individual steps.

Vojtech: We could do something similar to XQuery 3 with annotations.

Norm: It seems like some sort of "trace only these named steps" feature might be useful.

Henry: I know that there was at least some work in actually doing just what I dismissed: as far as the engine is concerned all you can say is instrument yourself. Where you put the energy is in the tool that presents the output to you. So instead of trying to say only give me trace information for the last four steps, you just turn on tracing.
... Then the tool shows you the output for only the last four steps.
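
A minimal sketch of the approach Henry describes, in Python: the engine traces everything, and a separate tool decides what to show. The record layout, the step names, and the function name last_steps are assumptions made for illustration; no XProc processor is known to produce this format.

```python
from typing import NamedTuple


class TraceRecord(NamedTuple):
    step: str      # hypothetical step name
    message: str   # whatever the processor chose to report


def last_steps(trace: list[TraceRecord], count: int) -> list[TraceRecord]:
    """Keep only the records that belong to the last `count` distinct steps."""
    if count <= 0:
        return []
    seen: list[str] = []
    # Walk backwards to find the most recently active steps.
    for record in reversed(trace):
        if record.step not in seen:
            seen.append(record.step)
        if len(seen) == count:
            break
    wanted = set(seen)
    # Filter the full trace, preserving its original order.
    return [record for record in trace if record.step in wanted]


full_trace = [
    TraceRecord("load", "start"),
    TraceRecord("validate", "start"),
    TraceRecord("load", "done"),
    TraceRecord("transform", "start"),
    TraceRecord("store", "start"),
]
shown = last_steps(full_trace, 2)
```

The engine never needs to know which steps are of interest; only the presentation side takes the count.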

Norm: Yeah. Fair point. My tracing is all ad hoc.

Murray: So we could imagine an XProc pipeline that read the trace output and presented it in a nice way.
... I've heard the argument before for putting all the tracing outside the program. I've heard the same argument about documentation too.
... One of the things I've noticed as I'm gathering these requirements is a section called "Integration".
... A lot of these requirements in the areas of debugging and testing and error handling are related to integration. All of these things can be aided by leaving sign posts in your program. If you know that you're having a problem in a certain area of the program, then leaving the indicators in there and being able to flag the pipeline could be very helpful.

<ht> Hmm -- I absolutely agree that documentation is an integral part of a program or pipeline

Murray: You can run your pipeline 24 hours a day and diff the traces, look for differences, etc. This just seems useful from a Q/A audit perspective.

Alex: My question is, can I write a pipeline that's normal and reasonably minimal and still debug the thing?
... Could I profile, debug, etc. without having to touch the pipeline?

Norm: I think with an appropriate debugging environment you could.

Vojtech: Yes, but some steps are in libraries that can have the same names, etc.

Murray: I don't care what anybody does with respect to designing a debugger that can look into an XProc program and debug it. More power to them. But that's not what I want to discuss. We're talking about requirements for the language.

Henry: I hear you; the way I hear this conversation going so far is that nobody has come up with any.

Murray: No. Several people have made suggestions, but we keep coming back to "I want to do this from outside my pipeline"

Henry: Putting things in the language requires that implementors support them. I think the argument that I would make isn't that my program is sacred, but rather are we sure enough of the value of in-language support that we want to require everyone to do the work that's necessary.
... It's the cost-benefit analysis that comes first.

Murray: Here's a simple question: if a processor has the ability to turn on trace, then providing some markup that advises that processor that this is a good time to turn on trace, would be useful. And if the processor can't turn on trace, then it's harmless.
... I don't want to specify what comes out in the trace, though we might want to give some advice, but that's up to the processor.

Alex: I guess the conundrum as I see it is that we don't have any debuggers yet. And we have very minimal tracing and debugging support.
... I suspect there are things we should do but I don't think I know what they are.

Murray: Well, Norm said he output trace information...

Alex: Yes, but that's very primitive compared to other languages. Do we have the right naming conventions, for example?

Murray: We decided, early on, that there would be a "stderr" port. Could we not designate a port for trace output?
... I just want to look for some things that would make the language easier to debug.

Vojtech: We already have p:log, but it's very primitive. Maybe we should just make p:log more flexible and useful; allowing it in options, variables, input ports, etc. Then with a processor switch, you could enable the log statements you wanted to trace.
... It could wind up in one location. Maybe we don't have to add anything new, just improve existing features? Maybe we could imagine a switch to magically insert p:log statements everywhere. The advantage of the log is that it doesn't change the sequencing of steps.

Norm: We could do that. The only thing that occurs most obviously to me would be a standard message step.

Vojtech: It's definitely useful, but it's tedious to add 10 of them.

Norm: Yes, it's tricky, but is still perhaps useful enough to standardize.

Vojtech: Maybe with a switch to disable the output.

Henry: Yes, I think that might be worth looking at standardizing that. Maybe we could add classes so that you can enable them or disable them by name. It would be nice to be able to turn them off without having to edit them out.
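
A rough Python sketch of the idea under discussion: a message step that passes its input through untouched and reports only when its class has been enabled. The class name Messages, the method name step, and the class labels are hypothetical; the group did not define any of them.

```python
import io
import sys
from typing import TextIO


class Messages:
    """Hypothetical message facility with classes that can be switched on by name."""

    def __init__(self, enabled: set[str], out: TextIO = sys.stderr) -> None:
        self.enabled = enabled
        self.out = out

    def step(self, document: str, text: str, message_class: str) -> str:
        # Report only if this message's class was enabled by the caller.
        if message_class in self.enabled:
            print(f"[{message_class}] {text}", file=self.out)
        # The document always flows through unchanged.
        return document


buffer = io.StringIO()
messages = Messages(enabled={"timing"}, out=buffer)
doc = messages.step("<doc/>", "transform finished", "timing")
doc = messages.step(doc, "intermediate state", "debug")  # class not enabled, so silent
```

Turning a class off is a change to the enabled set only; the calls stay in place, which is the property Henry asks for.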

Alex: I'm looking at p:log. First a question: If I don't have an href or if I use the same href, what happens?

Implementors mumble a bit

Alex: It would be nice if there was some metadata on the output so that I could reconstruct what happened later. A notion of what port this was produced from, when it arrived, etc.
... Similarly, it might be nice to log inputs.

Vojtech: Absolutely.

Alex: It would be nice to be able to put assertions inside the p:log step.
... Is this XPath expression true?

Vojtech: The ability to construct a message with an XPath expression would be useful.

Alex: Those are the sorts of things that would be useful.
... You could have one big log file with all the data in it; then you could examine that output.
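
The kind of log record Alex describes could look roughly like the following Python sketch. The field names and the function log_entry are assumptions, and the assertion check is only an approximation: it uses the limited path syntax of Python's ElementTree and treats "selects at least one element" as true, which is not full XPath semantics.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def log_entry(step, port, document, assertion=None):
    """Build one log record for a document seen on a port (hypothetical layout)."""
    entry = {
        "step": step,                                     # which step produced it
        "port": port,                                     # which port it appeared on
        "time": datetime.now(timezone.utc).isoformat(),   # when it arrived
        "document": ET.tostring(document, encoding="unicode"),
    }
    if assertion is not None:
        # Approximation: the assertion holds if the path selects an element.
        entry["assertion"] = assertion
        entry["holds"] = document.find(assertion) is not None
    return entry


book = ET.fromstring("<book><title>T</title></book>")
log = [
    log_entry("load", "result", book, assertion=".//title"),
    log_entry("load", "result", book, assertion=".//author"),
]
```

Collecting every entry in one list mirrors the "one big log file" idea: the output can be examined afterwards without rerunning the pipeline.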

Murray: So one of the things we could consider is whether every step would have a verbosity level and basically if you had high verbosity turned on, then that step would report some things when it started.
... We could rationally talk about what those conditions might be.
... Speaking of which, I've listed a lot of functions in the use cases and requirements document. It might be nice to have an exhaustive list.

Norm: Where's the list?

Murray: F.5.12

Norm: That's a mixture so I'm confused.

Murray: Yes, it's a mixture, but they return information about the current context or environment.
... All of this is useful information that you can use in debugging. Years ago, working in troff, I got some debugging built in. We had levels of verbosity and I could set the warning/error etc. messages. I could print messages at the beginnings of loops, I could turn trace on in the middle of a loop, etc.
... I found this useful at the time.

Norm: Yes. I can see that.
... Of the things we've discussed today, I think the proposal to extend p:log so that it can contain messages or assertions and the ability to log inputs seems like the best combination of utility and low-hanging fruit.

<scribe> ACTION: Norm to sketch out an extension to p:log with messages and assertions. [recorded in http://www.w3.org/2012/04/26-xproc-minutes.html#action01]

Clustering

Murray: Whose baby is clustering?

Norm: What do you mean by clustering?

Murray: Good question. I found an input along the lines of "does XProc need clustering?"

Norm: In the doc?

Murray: Yes, F.3.3

Henry: Is this group-by?

Some discussion of where the requirement came from and what it means

Streaming and parallel processing

Murray: Alex and I have noted some language along these lines in the first requirements and use cases document that didn't make it into the spec.
... But it's never clear what streaming and parallel processing mean in concrete terms.
... How have we impeded or assisted parallel processing?

<Vojtech> Btw, it was Paul who put the remark about clustering in the wiki

Henry: Parallel processing is a little easier. What I think we meant is to never constrain parallelization

<Vojtech> http://www.w3.org/wiki/index.php?title=Integration&diff=55046&oldid=55034

Henry: Make no assumptions about evaluation order that aren't required by explicit connectivity.

<Vojtech> Oh, it was Henry!

Henry: The way I used to say it was: it ought to be possible to implement an XProc processor by starting each step in a thread and waiting to see what happens. Someone has input, everyone else is blocked, and each step works as input arrives.
... For example, there's nothing today that says that the steps at the bottom of a pipeline have to run after the ones at the top.
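
Henry's thread-per-step description can be sketched in Python with one thread and one queue per connection. This is an illustration only, assuming a simple linear pipeline; the names run_step, run_pipeline, and DONE are invented here, and real processors are not claimed to work this way.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a sequence


def run_step(func, inbox, outbox):
    """Block until input arrives, process it, and pass the result on."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)
            return
        outbox.put(func(item))


def run_pipeline(funcs, documents):
    """Start every step in its own thread; only the connections order the work."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=run_step, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    # Start the last step first to show that start order does not matter.
    for thread in reversed(threads):
        thread.start()
    for document in documents:
        queues[0].put(document)
    queues[0].put(DONE)
    results = []
    while (item := queues[-1].get()) is not DONE:
        results.append(item)
    for thread in threads:
        thread.join()
    return results


output = run_pipeline([str.upper, lambda s: s + "!"], ["a", "b"])
```

Every step is blocked until something arrives on its input queue, so evaluation order follows from the connections alone.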

Murray: For-each says the step must produce output in the right order. Does that have an impact on parallelism?

Norm: On streaming more than parallel processing.

Alex: It might be nice to add annotations to a pipeline to say what the streaming/parallelism expectations are.

Murray: I was puzzled by a request to allow for-each in an unordered way

Henry: Yes, this connects up to unordered collections. Right now we have sequences, but if we had collections, then you could have a switch on p:for-each that said it was allowed to be unordered.
... Then the question is, what does a step that takes an unordered collection as input look like?
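
The difference between an ordered and an unordered for-each can be illustrated in Python, assuming a thread pool stands in for the processor. The function names are hypothetical, and this says nothing about how p:for-each would actually be specified.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def for_each_ordered(func, items):
    """Results come back in input order, however the work was scheduled."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(func, items))


def for_each_unordered(func, items):
    """Results come back in completion order, which may differ between runs."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(func, item) for item in items]
        return [future.result() for future in as_completed(futures)]


ordered = for_each_ordered(len, ["a", "bb", "ccc"])
unordered = for_each_unordered(len, ["a", "bb", "ccc"])
```

A consumer of the unordered form can rely only on the set of results, not on their positions, which is the open question Henry raises about steps that take unordered collections.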

<scribe> ACTION: Norm to put streaming/parallel processing on the agenda for two weeks [recorded in http://www.w3.org/2012/04/26-xproc-minutes.html#action02]

Norm: Adjourned

- DRAFT -
