XML Processing Model WG

Meeting 73, 5 Jul 2007


See also: IRC log


Present: Norm, Mohamed, Rui, Paul, Henry, Murray, Andrew
Regrets: Richard, Alessandro


Accept this agenda?

-> http://www.w3.org/XML/XProc/2007/07/05-agenda


Accept minutes from the previous meeting?

-> http://www.w3.org/XML/XProc/2007/06/28-minutes


Next meeting: telcon 12 July 2007

Richard's regrets continue; probably regrets from Mohamed, Henry until 16 August.

Review of 6 July 2007 Working Draft

-> http://www.w3.org/XML/XProc/docs/WD-xproc-20070706/

Murray: On some fourth level headings, the formatting looks a bit odd.

<scribe> ACTION: Norm to do something about the formatting of fourth level headings [recorded in http://www.w3.org/2007/07/05-xproc-minutes.html#action01]

Murray: In particular, since we have an element name in there, having it in upper case is a problem.

Mohamed: Some small editorial problems that I sent to Alex didn't get incorporated.
... and error codes are in an odd order.

<scribe> ACTION: Norm to sort the error codes in the appendix [recorded in http://www.w3.org/2007/07/05-xproc-minutes.html#action02]

Mohamed: What about p:map?

Norm: Yes, we still need to talk about that, but I don't think it'll get in this draft.

Mohamed: We have a schematron reference but no schematron step.

Norm: I thought we had agreed to have a schematron step.

Henry: Seems reasonable to me, along with XSLT2 and XSL Formatter.

Mohamed: We may also want to have an NVDL step.

Norm: Yes.
... I'd like someone to propose how the NVDL step would work.

Murray: What about an appendix for the WG members?

Norm: Sure.

Proposal: We'll publish this as a public Working Draft tomorrow.


Step library issues

-> http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2007May/0318.html

Norm: Let's struggle on in Alex's absence.
... What about parsing HTML?

Henry: I seem to recall that if we said the content-type was text/html, then you get an implementation defined mapping from HTML to XHTML.

Norm: Should we do it that way?

Henry: There was an implicit reference to the HTTP request step, saying that by default it produces escaped markup.

Norm: I hope that's wrong.

Henry: We have an unescape markup step because we know that Atom, RSS, NewsML, etc can encapsulate documents with escaped markup.
... So it seems that p:http-request and p:unescape-markup have this problem.
... but what do save/serialize have to do with this?
... I'd like to split receiving and producing.
... How about: it's implementation defined if any media types other than application/xml or application/foo+xml are allowed. Processors are not required to support any other media types. But if they do, then it's implementation defined what mechanism they use to get from the ones they support to XML.
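
[Editorial illustration, not part of the discussion] Henry's proposal can be read as a small decision rule, sketched below in Python. This is only one possible reading: the function names, the converter registry, and the UTF-8 decoding of XML bodies are assumptions made for the sketch and do not come from the meeting or from any draft.

```python
from typing import Callable, Dict, Optional

# Hypothetical: maps a non-XML media type to a function that turns the raw
# body into XML text. Which entries exist is implementation defined.
Converter = Callable[[bytes], str]


def is_xml_media_type(media_type: str) -> bool:
    """True for application/xml and application/<something>+xml."""
    base = media_type.split(";", 1)[0].strip().lower()
    if base == "application/xml":
        return True
    return (
        base.startswith("application/")
        and base.endswith("+xml")
        and len(base) > len("application/+xml")
    )


def to_xml(
    body: bytes,
    media_type: str,
    converters: Optional[Dict[str, Converter]] = None,
) -> str:
    """Return XML text for a body, or fail if the media type is unsupported."""
    base = media_type.split(";", 1)[0].strip().lower()
    if is_xml_media_type(base):
        # Simplification for the sketch: assume the XML body is UTF-8.
        return body.decode("utf-8")
    converter = (converters or {}).get(base)
    if converter is None:
        # Processors are not required to support any other media types.
        raise ValueError(f"unsupported media type: {base}")
    # How a supported non-XML type becomes XML is implementation defined.
    return converter(body)
```

A processor that chose to accept text/html would register a converter for it; one that did not would simply fail on such input.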

Murray: Are we still talking about infosets?

Henry: Yes, that's why this problem arises.

Murray: So it's implementation defined how you build an infoset from something that isn't XML.

Norm: I'm happy with Henry's proposal as a starting point.

Murray: I'm worried about how many different kinds of implementation-defined we're going to get.
... In GRDDL, we have an issue called faithful infosets. This arises because in GRDDL, we're talking about XPath node trees and there are questions about validation and XInclude, etc.
... This seems to create another faithful infoset issue.

Scribe stepped away, a few minutes lost

Henry: The things you can depend on are the minimal common subset that the infoset more or less defines.
... It's true that there's more in the XPath 2.0 datamodel, but you can't get at it from our language.

Norm: I'm sympathetic because of web services like Flickr that allow users to get comments.

Murray: I think everything needs to be able to filter to XML, or you need to have a specific component that's for loading non-XML things.

Henry: I think Murray is right, but we're going to cheat just a little bit and say there are two.
... I'm happy that if you want to inject HTML into your pipeline and guarantee that it's XML then you have to use http-request.

Norm: We have load, basically only to support DTD validation.

<Zakim> MoZ, you wanted to ask Murray on the difference between XPath node trees and infosets and to

Mohamed: I have a problem with components that translate from HTML to XML.

Norm: I want it to be implementation defined.

Mohamed: Norm, you said HTML to XHTML, but maybe we just meant HTML to XML.

Henry: Yes, I think that was my fault. All we need is XML.

Murray outlines a recent GRDDL use case about faithfulness of a representation

Murray: My initial thought was that there should be a "garbage-in" step that could reach out and bring anything in.

Norm: I think implementors will provide this if we don't.

Henry: The way I read this, you can specify that you require an application/xhtml+xml media type and that will cause the pipeline to fail if you don't get it.

Murray: I do an http-request and what I get back is an HTML document. I run some kind of process over that and I get some result. That result may or may not be successful.
... What comes out of http-request will be the result.
... But presumably I as the author of the pipeline want to know a couple of things.

Norm: I think you can find all of those things by looking at the headers and body you get back.

Henry: If you're using tidy, I'll expect implementations to fail if tidy throws errors.

Norm: I agree.

Henry: If you're using tagsoup, then you know you'll always get an output.

<Zakim> MoZ, you wanted to speaks about the difference between p:parameter namespace=""... and p:option without namespace@

Mohamed: Are we sure that the parameters of the header will be available to the next step?
... The http-request step will ask with some parameters, the result will be one of those.

Murray: So the http-request does a get and there are some headers.

Norm: You get those back in the headers.

<Zakim> ht, you wanted to register a concern about the architecture of p:http-request

Henry: If no one else is worrying about this, that's ok, because I'm only looking at this in detail now.
... Had we already discussed doing this using two output ports instead?
... I'd like to be able to write a take-my-chances pipeline where the primary output is a sequence of documents.
... And only if I care about the minutiae do I look at the port.

Norm: I'm not sure how that would handle multipart related.

Henry: An alternative would be to say that there is an option that says "take my chances"
... I want a sequence of documents or fail, don't bother me with all this stuff.

Norm: That's not on the table now, but if you can fire off a quick message before you go on vacation, that would be good.

<Zakim> MoZ, you wanted to ask the question why p:store/!result is not primary but not p:xslformatter/!result

Norm: Oversight, I agree.

Mohamed: What is the default for required on option?

Norm: "no"

Mohamed: It's written explicitly in some places.

Norm: Are we satisfied that we've given editorial direction to Alex?

Norm attempts to describe the serialization problem that probably caused Alex to lump them together.

Any other business?



Summary of Action Items

[NEW] ACTION: Norm to do something about the formatting of fourth level headings [recorded in http://www.w3.org/2007/07/05-xproc-minutes.html#action01]
[NEW] ACTION: Norm to sort the error codes in the appendix [recorded in http://www.w3.org/2007/07/05-xproc-minutes.html#action02]
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.128 (CVS log)
$Date: 2007/07/12 16:06:22 $