XML Processing Model WG -- 23/24 Oct 2008

Accept this agenda?

-> http://www.w3.org/XML/XProc/2008/10/tpac-agenda

Mohamed: Is everyone going to be here this afternoon?

Norm: The AC meeting is this afternoon, we'll see what happens.
... We can rearrange the agenda if necessary.

Agenda accepted, for the time being.

Vojtech: What is the default XML processing model?

Norm: It's the other work item on our charter; in the absence of any explicit instructions, what processing should an XML processor perform.
... We need to start thinking about that item.

Accept minutes from the previous meeting?

-> http://www.w3.org/XML/XProc/2008/10/02-minutes

Accepted.

Remaining open last call issues

015: Add p:encrypt/p:decrypt steps

Norm: Following discussions with the XML Security WG, we're not likely to have any definitions in time for V1.

Mohamed: What about having simple steps with parameters?

Norm: I don't see how that provides any more interoperability than just letting implementors do it in their own namespace.

Proposal: close with no action.

Accepted.

030: LCWD comments from the XQuery WG

Norm: I think these are all ok, but I haven't implemented them yet.

Alex: Where did we leave off?

Norm: We just need to be careful that introducing "implementation defined namespaces" doesn't leak outside the XQuery step. But I don't think that's going to be a problem because we have an XML syntax.

Alex: Do we need to say something about who wins when they come from both places?

Norm: So if my p:xquery call has a foo: namespace declaration and my XQuery implementation predefines the foo: namespace (differently), who wins?

Alex: It seems like the right answer would be that we stuff our things into the static context and that overrides what was in there by default.

Norm: My guess is that the query processor starts and will overwrite anything that we put in the static context.

<scribe> ACTION: Alex/Norm to investigate how this actually works. [recorded in http://www.w3.org/2008/10/23-xproc-minutes.html#action01]

Mohamed: My thought was about all the validation steps. Is there a static context for them too?

Alex: For schema there isn't.

Mohamed: All the steps make it clear what is declared in XProc but XQuery is starting to make us think differently about it.

Norm: I don't think any of the other steps have this sort of defaulted namespace behavior.

035: Another look at validate-with-xml-schema

Norm: I'm perfectly happy with Henry's proposal for lax/strict.
... Then Henry goes on to propose some new options: use-schema-location and try-namespace.

Alex: I think use-schema-location is a really good idea.

Some discussion of whether or not parameters should be passed to the schema-validate step.

Norm: Let's set this one aside until Henry gets here.
... The only thing you can't do with extension attributes is compute their values dynamically. I don't know how serious that is.

036: Make 5.7.2 consistent wrt context node

-> http://www.w3.org/XML/XProc/docs/diff.html#p.option

Vojtech: Sometimes it's difficult to detect exactly why an XPath expression failed.

Proposal: Accept the changes.

Accepted.

Vojtech: If you define a default binding for p:input and you then refer to a variable not-in-scope, what happens?

Norm: The expression fails.
... I think the upshot is that we need to say somewhere general that it's an error to refer to variable bindings that are not in scope.

<scribe> ACTION: Norm to add a general statement about out-of-scope variables. [recorded in http://www.w3.org/2008/10/23-xproc-minutes.html#action02]

Mohamed: With respect to the binding of p:option, we should say that it's as if the binding was to p:empty; then in 5.15 we should say what that means (empty in XPath 1.0 and undefined in XPath 2.0).

Norm: Makes sense to me.

Mohamed: Then maybe we wouldn't have to cut-and-paste that prose everywhere.

Norm: Anyone disagree?

Accepted.

<scribe> ACTION: Norm to fix p:empty and p:option as Mohamed suggests. [recorded in http://www.w3.org/2008/10/23-xproc-minutes.html#action03]

035: Another look at validate-with-xml-schema

Henry: I chose these two options explicitly because these are the ones that you need to get Saxon to do the right thing. The default behavior changed between 8.0 and 9.0.
... What exactly it means to "try namespaces" is implementation defined (RDDL, GRDDL, etc.).

Alex: For Xerces, if you turn off the use-schema-location hints and add a catalog, that'll just work.

Henry: Catalogs should be transparent. They enter the game at the time you have a URI that you're trying to dereference.

Alex: We need to be very clear about what try-namespaces means.

Henry: We can point directly into the schema spec for the right paragraph and clause.

Alex: I have a catalog for my schema processor and I need to tell it where the catalog is.
... I could do it externally, but that would be global in some way.

Henry: We haven't decided if parameters are a mechanism which people can use to extend the option set in implementation specific ways.
... I don't think that's what they were intended for.
... They were intended to operate in the case where it is in the nature of a particular step that it has an open-ended set of options.

Alex: We have steps that violate that: p:hash and p:xsl-formatter

Henry: Are we sure we're capable of predicting in advance which steps are likely to want parameters? Shouldn't every step have a parameter port?

Alex: Going back through last call?

Henry: Right, I've said it, but I agree we don't want to go through last call for it.

Vojtech: We have an explicit error for p:hash

Alex: Maybe we should make that a general "I didn't like your parameter" error.
... The only thing I can see parameters for are weird implementation features.

Henry: Don't we really need a way to allow implementations to extend the list of options available on the step?

Alex: Can we do this in V.next?

Henry: Yes, but it will be very disruptive. The p:hash and p:xsl-formatter steps will have these parameters when they don't need them anymore.
... This would actually have the benefit of packaging things a little better.

Inspection of 3.8

Henry: It seems to me that extension attributes can be used to pass implementation-specific strings, but they are static.

Vojtech: Why don't we have a way to compute extension attribute values?

Norm: We decided not to do attribute value templates, and we don't have an element syntax for them.

Does anyone want to add a parameter input port to p:validate-with-schema?

No.

Do we want to add the use-location-hints and try-namespaces options ?

Yes.

Anyone object?

Accepted.
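
For illustration, the two options just accepted might be used as in the sketch below. It is written against XProc 1.0 syntax as later published (including the version attribute), which may differ from the October 2008 draft. The option names follow the wording of the question above, and the values shown are arbitrary choices, not defaults agreed at this meeting.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- No schema is supplied on the schema port, so the validator has to
       find one itself: here via location hints in the instance document,
       and not by dereferencing namespace names. -->
  <p:validate-with-xml-schema mode="lax"
                              use-location-hints="true"
                              try-namespaces="false">
    <p:input port="schema">
      <p:empty/>
    </p:input>
  </p:validate-with-xml-schema>
</p:declare-step>
```

Because these are ordinary options rather than extension attributes, their values could also be computed at run time with p:with-option.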

Are we happy with the proposed error?

yes.

Alex: Should we have the general error about bad parameters or bad parameter values?

Yes.

Break

Reconvene at 14:00

037: Self-importing

Norm: I think we should allow it, but may require adding some prose about the base URI of the pipeline or library document.

Accepted.

041: Steps with no inputs/outputs

Vojtech: We can import pipelines, not just libraries, but the prose talks about libraries.

Norm: Yes, that's probably just sloppy wording. I'll fix it.

<scribe> ACTION: Norm to fix the wording about imports so that it applies equally to p:pipelines and p:libraries [recorded in http://www.w3.org/2008/10/23-xproc-minutes.html#action04]

037: Self-importing

Vojtech: Does this include little self-contained compound steps?

Henry: Yes, this is fine.

Norm: There's no issue, we can just close this without action.

Accepted.

042: Detecting errors

Norm: I asked if unknown steps were an error, and the consensus was that they are not.
... I'm satisfied.
... I propose we close this with no action.

<scribe> ACTION: Norm to change 6.1 so that it's not a static error. [recorded in http://www.w3.org/2008/10/23-xproc-minutes.html#action05]

<MoZ> well

<MoZ> Norm, what does it means for the implementation ?

It means that it's a dynamic error if you attempt to evaluate it.

Which we already say

Does that make sense, MoZ ?

<MoZ> okidok

044: Source on p:error

Norm: I think it should not be primary; Henry agreed. Any objections?

Vojtech: As long as you can bind something, I'm fine.

Accepted.

045: split-sequence and position() and last()

Norm: I was confused because of our changes to tracking position and length in for-each and viewport.
... I think Mohamed is right and there's no problem.
... Proposal: close without action.

Accepted.

046: href on p:store

Norm attempts to explain the situation.

Norm: I think p:store w/o an href should write the document to the location of the base URI of the document being stored.
... Though we appear not to actually say that yet.

Henry: If base URIs are propagated, doesn't that run the risk of blowing away the pipeline document?

Norm: If I have an XSLT step that produces a result document, and I p:store that result document, I want it to be written to the right URI.

Henry: See what we say at the top of section 7. If I feed file://important/document into a complex pipeline that has a p:store somewhere and I've forgotten to put href on it, we'll overwrite the document.
... Is that really what we want?

Norm: We have our own base URI function (because XPath 1.0 didn't)
... So you could say:

<p:store>
  <p:with-option name="href" select="p:base-uri(/)"/>
</p:store>

Henry: There are three options: (1) make it required, (2) give the empty string special status, perhaps an error, or (3) give it a default that we think does something useful, like /dev/null

Norm: If I have a p:xslt step that produces a bunch of secondary result documents and I want to write them to disk, I'll have to write the complex form of p:store in order to save the documents.

Henry: We could specify that the base URI for absolutization in p:store is the base URI of the primary input.

Alex/Norm: We could add a separate option for store to base-URI?

Henry: On balance, I think the facts are that you can get what you want and anything else puts careless users at high risk.
... But do we call the empty string an error?

Norm: No, because #foo would do the same thing.

Henry: So I think the consensus is that the href attribute is required.

Proposal: Make the href attribute required.

Accepted.
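
A sketch of what Norm's XSLT case looks like once href is required: each secondary result document is stored at its own base URI by spelling out the p:with-option form he gave. This assumes XProc 1.0 syntax as later published, which may differ from the October 2008 draft. The step names "main" and "transform" are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0" name="main">
  <p:input port="source" primary="true"/>
  <p:input port="stylesheet"/>
  <p:input port="parameters" kind="parameter"/>

  <p:xslt name="transform">
    <p:input port="stylesheet">
      <p:pipe step="main" port="stylesheet"/>
    </p:input>
  </p:xslt>

  <!-- The primary result is not needed in this sketch, so discard it. -->
  <p:sink/>

  <!-- Write every secondary result document to its own base URI. -->
  <p:for-each>
    <p:iteration-source>
      <p:pipe step="transform" port="secondary"/>
    </p:iteration-source>
    <p:store>
      <p:with-option name="href" select="p:base-uri(/)"/>
    </p:store>
  </p:for-each>
</p:declare-step>
```

The explicit option is the cost of the decision: nothing is written anywhere unless the pipeline author names the target, which is the protection Henry was arguing for.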

047: p:wrap-sequence and position()/last()

Norm: I think Mohamed is right.
... Proposal: Make it explicit that position() and last() are available in wrap sequence.

Accepted.
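
As an illustration of why position() matters here, the sketch below groups an input sequence into batches of three documents. It assumes that position() is evaluated against the input sequence, as the proposal makes explicit, and uses XProc 1.0 syntax as later published. The wrapper name "batch" is hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>

  <!-- Documents 1 to 3 yield the key 1, documents 4 to 6 yield 2, and so
       on; adjacent documents with the same key share one wrapper. -->
  <p:wrap-sequence wrapper="batch"
                   group-adjacent="ceiling(position() div 3)"/>
</p:declare-step>
```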

048: p:log on atomic steps

Norm: This is a spec exposition bug. We just need to say somewhere that p:log can be used on all the atomic steps.

<scribe> ACTION: Norm to change 3.3 so that it refers to with-option, variable, etc. [recorded in http://www.w3.org/2008/10/23-xproc-minutes.html#action06]

Accepted.

049: Standard C14N method

Mohamed: I made a proposal and we talked about it and decided not to do it.

Alex: It should be put in the serialization spec, we shouldn't have to do it. It's something everyone wants.

Proposed: Close with no action.

Mohamed: I did make a request for an example.

Norm: I'm fine with that.

<MoZ> http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2008Jun/0035.html

<scribe> ACTION: Norm to add an example of C14N [recorded in http://www.w3.org/2008/10/23-xproc-minutes.html#action07]

Test suite

Review of use cases and requirements

<scribe> ACTION: Norm to add our use cases and requirements document to the References [recorded in http://www.w3.org/2008/10/23-xproc-minutes.html#action08]

Use case 5.10 requires dsig, so we can't do that one.

Use case 5.11 requires a validator that preserves base URI properties.

Use case 5.14 requires tagsoup or tidy, so we can't do that one.

Discussion of content-type on p:load and p:document to satisfy 5.14

Mohamed observes that p:data can load a non-XML resource, but we have no facility for doing that with a computed URI.

Mohamed: So we need to create another step or somehow extend p:load

<alexmilowski> well... yes

Vojtech: You can't use text/plain on p:load because it doesn't provide a wrapper.

<alexmilowski> not even with with-option ?

Norm: Maybe this is how we decided to use p:http-request for this case...

<alexmilowski> yeah...

Mohamed: I want to fetch an xhtml document which is distributed as text/html so that I am able to work with it.

<alexmilowski> you can pass a computed URI with a 'file' scheme

<alexmilowski> (or whatever)

<alexmilowski> unescape-markup ...

<alexmilowski> Besides... ISO-8859-1 is really Windows-1252 according to HTML5 ...

<alexmilowski> ...so, you really want p:data ...

<alexmilowski> (seriously... you really do...)

<alexmilowski> In fact... you want a byte sequence base64 encoding so you can run their crazy redefinition of character encodings

You want p:data, but you can't use p:data if you need to construct the URI

<alexmilowski> If you get a text/html media type...

<alexmilowski> ...and it has a non-unicode encoding...

<alexmilowski> do you get base64 ?

<alexmilowski> (checking spec)

<alexmilowski> Here's our note:

<alexmilowski> "Given the above description, any content identified as text/html will be base64-encoded in the c:body element, as HTML isn't always well-formed XML. A user can attempt to convert such content into XML using the p:unescape-markup step."

<alexmilowski> But:

<alexmilowski> "is recognized as a non-XML media type whose contents are encoded as a sequence of Unicode characters (e.g. it has a character parameter or the definition of the media type is such that it requires Unicode),"

<alexmilowski> That says that text/html; charset=UTF-8 should end up as characters and not base64

<alexmilowski> But text/html; charset=ISO-8859-1 should be base64

<alexmilowski> Thus... you might have to look at the 'encoding' attribute of 'c:body' to understand whether you have characters or not.

<alexmilowski> Ugly...

<alexmilowski> What we need is a media type parameter of 'version'

This is all very unsatisfying

<alexmilowski> so p:unescape-markup can use

<alexmilowski> text/html; charset=ISO-8859-1; version=5.0

<alexmilowski> Yes

What are the problems?

<alexmilowski> text/html isn't what you expect anymore...

[Many of the HTML references here are to HTML5, much discussed at TPAC —ed.]

1. p:data can load a non-XML resource, but can't do so with a computed URI

2. p:load takes a computed URI, but can't load non-XML data

3. p:http-request can take a dynamic URI and can load non-XML data, but it's likely to base64 encode the result

4. And we don't have a way to unescape base64 encoded text

[Subsequent research will reveal that, yes, we do. —ed.]

Alex: We separated out the encoding on the result from http-request, but we don't seem to be doing this here.
... c:data and c:body are slightly out of step in this regard.

Alex: You might want to choose what to do with data based on its encoding: even if it's a mappable encoding, you might want to treat it as data.

We need to clarify how/what encoding means on c:body when it appears in a response.

<scribe> ACTION: Norm to clarify encoding on c:body in a response--probably by saying that it isn't used [recorded in http://www.w3.org/2008/10/23-xproc-minutes.html#action10]

Norm: I think there's consensus that we could make forward progress by saying that implementations SHOULD attempt to convert the content of any text/* media type into Unicode characters. Implementations MUST present text/* media types that use a Unicode encoding as characters.

Light breaks over Marblehead...the p:unescape-markup step *can* decode base64 encoded text.

Mohamed: We need encoding on c:data

Alex: That's right because it might or might not be base64 encoded.
... In unescape-markup we need to say that there can be a charset parameter on the content-type.

Vojtech: We should remove the charset parameter's default value and say that it's only used if it's specified and it overrides the charset on the content-type.

Norm: What have we decided?

1. Remove the default value from the charset parameter on p:unescape-markup

2. Steps that take a content-type should respect the charset parameter

3. If you specify a charset on unescape-markup, it overrides the charset parameter on the content-type

4. If you don't specify the charset in either place, and the encoding is base64, that's a dynamic error

5. Change p:unescape-markup so that it ignores the charset if the encoding isn't specified.

6. If you want to load a non-XML resource, you're stuck with p:http-request

7. Specifically, it's not a dynamic error if encoding isn't specified and the charset is

8. Add encoding attribute to c:data

9. Document that http-request can be used to load non-XML resources

Add an example that shows that there are a bunch of options that don't make sense

<scribe> ACTION: Alex to go through the spec again and look at the encoding/charset things [recorded in http://www.w3.org/2008/10/23-xproc-minutes.html#action11]
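
Decisions 6 and 9 can be sketched as a pipeline that fetches a non-XML resource from a computed URI with p:http-request and then decodes it with p:unescape-markup. This is a sketch under several assumptions: it uses XProc 1.0 syntax as later published; the option name "uri" is hypothetical; the response is assumed to arrive base64-encoded in a c:body; and support for text/html in p:unescape-markup is assumed, although it depends on the implementation.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="1.0">
  <p:option name="uri" required="true"/>
  <p:output port="result"/>

  <!-- Build the request document, filling in href from the computed URI. -->
  <p:add-attribute match="/c:request" attribute-name="href">
    <p:input port="source">
      <p:inline>
        <c:request method="GET"/>
      </p:inline>
    </p:input>
    <p:with-option name="attribute-value" select="$uri"/>
  </p:add-attribute>

  <!-- For a non-XML media type the result is a c:body element. -->
  <p:http-request/>

  <!-- Decode the base64 content and parse it as markup. The explicit
       charset takes precedence over any charset on the content type. -->
  <p:unescape-markup content-type="text/html"
                     encoding="base64"
                     charset="ISO-8859-1"/>
</p:declare-step>
```

If the server instead returns text/html with a Unicode charset, the body would already be characters and the encoding option would not apply, which is the case the encoding/charset review is meant to pin down.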

Reviewing use cases and requirements

Henry: With respect to 5.19, we found it useful to have a filter on input, so that you can say that the pipeline begins by processing *its* input with a filter and then proceeds.
... A use case that I had to implement was "here comes a document, it's a product-database-related document, there's a key field in this, you need to look up this field, if it exists in the database, you add the attribute, otherwise, add it to the database."

(This is with respect to 5.20)

<scribe> ACTION: Norm to check with Erik Bruchez about use case 5.24 [recorded in http://www.w3.org/2008/10/24-xproc-minutes.html#action21]

Some discussion of the NVDL steps.

http://lists.dsdl.org/dsdl-comment/2008-09/0048.html

Some discussion of versioning. We can ask what XPath version we have, but not what XSLT version.

Alex argues in favor of being able to check versions of XSLT, Schema Validation, etc.

Henry proposes an XML document that lists all the steps and the supported versions of each.

In short: all but a very small number of the use cases are satisfied by XProc V1.0

Henry: Propose that the editor produce a CR draft.

Accepted.

<scribe> ACTION: Norm to produce a CR draft. [recorded in http://www.w3.org/2008/10/24-xproc-minutes.html#action22]

CR exit criteria

Henry: The ideal would be one of two things: one is a very carefully annotated issues list that shows we've dealt with all the CR comments and we have buy-in from everyone reasonable, and everyone we don't have buy-in from is unreasonable.
... And the other is an implementation report that shows three complete implementations of every feature.
... The bare minimum is two implementations of every feature.

Norm: Do we need a timetable?

Henry: Yes, we're being rechartered, so we should have a plan for getting to Rec and it better be before December, 2009.
... Aim to publish a CR draft in the middle of November, and set the CR period to end on 1 March.

Default XML Processing Model

Henry: There are two lines of potential exploration: One is that the XML spec itself leaves certain choices to the processor (e.g., external parameter entity references are expanded)
... And what that means is when you publish an XML document on the web, the question is, what are you held to. What is the document that is what you published? Or what is the infoset?
... The objective question that lurks behind it is, if there are entity references defined in an external parameter entity, and your parser doesn't retrieve them, are you bound by the statements present in the document when they involve unknown entity references?
... By the same token, what about XInclude processing?
... The infoset spec explicitly declines to answer the question of what is an XML document. Nor does the spec.
... The other line is this notion of the recursive, compositional semantics of XML documents.

Some discussion of what the default might be...

Henry: Another approach is, should we be talking about a third component to the XML media types. When you fetch a document, you can say, I want the 0 model, the 1 model, or the 2 model.
... Where the 0 model means what the parser gives you, the 1 model gives you XInclude, the 2 model gives you XInclude/validation model, etc.
... Another model says that you should be able to put a pipeline in a URI.

Consensus seems to be forming around the idea that the default XML Processing Model is normal XML parsing (with some constraints like, always chase external parameter entities for entity declarations), followed by XInclude.
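
Expressed as a pipeline, the emerging default would be little more than a parse followed by XInclude, as in the sketch below. It assumes XProc 1.0 syntax as later published, and the option name "href" on the pipeline is hypothetical. The constraint about external parameter entities is a matter of parser configuration and is not expressed here.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:option name="href" required="true"/>
  <p:output port="result"/>

  <!-- Normal XML parsing of the requested document. -->
  <p:load>
    <p:with-option name="href" select="$href"/>
  </p:load>

  <!-- Followed by XInclude processing of the parsed document. -->
  <p:xinclude/>
</p:declare-step>
```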

Any other business?

Some discussion of the expectation of schema-location hints and what the defaults should be for try-namespaces= and use-schema-location=

There's some desire to have a consistent story around schema-location hints and schemas that arrive on the schemas port.

But it's not clear how implementations can support that.

Not clear what resolution we came to.
