XML Processing Model WG

Meeting 83, 6 Sep 2007


See also: IRC log


Norm, Mohamed, Paul, Henry, Alessandro, Rui, Richard, Michael, Andrew, Alex, Murray


Accept this agenda?

-> http://www.w3.org/XML/XProc/2007/09/06-agenda


Accept minutes from the previous meeting?

-> http://www.w3.org/XML/XProc/2007/08/30-minutes


Next meeting: telcon 13 September 2007

No regrets.

Comments on the new draft

-> http://www.w3.org/XML/XProc/docs/langspec.html

None heard.

Namespace fixup?

Norm: There's been a thread on this.
... The two positions seem to be: only when you serialize vs. on every step.

Michael: Am I right to assume that the folks who want it only on serialization are proposing a namespace fix-up step?
... Should we add a p:namespace-fixup step?

Henry: It's not a deeply technical issue, it's a complex one where people are trying to figure out what users are going to find most useful.

Alex: My proposal wasn't that we have namespace fixup but that we don't do wrong things. Don't create the problems that require namespace fixup.
... I think the most extreme position is that we do namespace fixup all the time.
... A more moderate position is to say that the steps in our library don't mangle namespaces.
... Then there's the "steps can do anything they want" position.

Henry: It would be possible to do option 1, but that would require a lot of analysis. My feeling is that that would be a substantial job of work.
... Every time I think I've understood a reasonable subset of what those things are, I've come up with more.

Alex: I'm on the side of saying that we need to make our step library not cause these problems as much as possible.

Henry: It's irritating but true that there's no published description of what namespace fixup means, so we don't have something we can refer to.
... Nonetheless, in various toolsets and libraries, there are serialization libraries that do some subset of necessary fixup.
... It's the fact that those are there, and the analysis that we'd need is not, that inclines me towards saying that this gets enforced at the margins.
... That's why my bias is in that direction.
... I also think it will make the spec easier to read.

Alex: That means that if you insert an element, there's no requirement to copy the in-scope namespaces.
... I think we need to say something about that in at least some cases.
... Then half the battle is won.

Henry: We don't say enough about what we mean when we refer to nodes, which we do. The Schema Rec says what infoset properties are required for elements; we haven't said that.
... We haven't been at all clear about whether the prefix property is one we care about.
... In-scope namespaces, namespace attributes, we could go down the list.

Alex: I agree we need to say something about that. I think users want prefixes to be preserved.

Henry: Maybe this is a compromise: without being specific, implementations should preserve as much information as possible and produce complete infosets but enforcement is only at the margins. Mention things like prefixes and namespace attributes, etc.
... The reason we focus on local names and namespace names goes without saying: that's what's necessary for XPath expressions to work.
... As for the rest of the document, it seems to me it would be much simpler, and would allow us to get to Last Call, to put a general health and safety warning at the beginning.

Richard: One possibility would be to define the micro components in terms of trivial XSLT stylesheets because then XSLT has already defined what you have to do with prefixes and how serialization is supposed to work and which things are not allowed.

Alex: Has XSLT really said that much?

Richard: XSLT has said that applications don't have to use the prefixes.
... It says that reading what's serialized should produce the same data model except for some obvious cases, like extra namespace nodes.

Alex: XSLT 2.0 is in a nice situation because they got rid of this problem. You must copy the namespace declarations now.
... It's only XSLT 1.0 that has this problem, and our other steps.

Richard: I was thinking of XSLT 1.
... It does give the rule that you have to get the same thing back.

Henry: But its model is quite impoverished compared to our own.

Straw poll: Simple binary choice between saying that the spec should guarantee trivially serializable documents between steps or not?

scribe: Or should we enforce that requirement only at serialization time?
... And that leaves open the question of how we do the former if we do it. We can just state it and leave it up to the implementation, or we can try to do all the analysis necessary.

Alessandro: I'm curious because I can't picture what would be the difference between components doing the fixup or the serializer doing the fixup. Can't we just leave it all to the implementors?

Richard: That will result in different implementations behaving differently.
... But maybe the only ones that will be different should be considered in error anyway.

Henry: Is there anyone who doesn't think that we should guarantee our output is well-formed XML?

No. Whew. :-)

Norm: Can you even *tell* if a step doesn't do fixup?

Richard: Suppose that the pipeline generates a stylesheet, then the namespace bindings on those elements are going to be used. If you did fixup that put a namespace binding on one of those elements, then that could change the meaning of the XPath.

Norm: Yeah, alright.

Richard: But it seems to me that that's a bogus program anyway.
... Why was it doing that?

Henry: What that points towards is something which says "it is implementation-dependent how much is detected by the processor with respect to that kind of issue but this is unlikely to cause significant interoperability problems unless you're doing something dodgy anyway"

Murray: So I've been reading the email and listening and I'm not sure I even understand what XProc is about. Maybe a few simple questions will help.
... If I read in an XML document, there are NS bindings and uses.
... As I go through various steps, I may be adding and removing things. This could result in missing, new, or conflicting namespace bindings.

Richard: Yes. But you don't have to be doing anything particularly bad to do this. Just add a wrapper around an element and that wrapper must have a namespace declaration for whatever prefix you use.
... And that might conflict with one you've already used.

Straw poll: Should we put a health warning in the spec and ask for priority feedback, rather than trying to nail this ourselves now?

Murray: The results should be not just well formed XML but faithful to the spirit of the author.

Richard: The delete example is a good one.

Norm: I think rename is the culprit here, not delete. Delete deletes the whole subtree.

Michael: Unwrap rather than delete would give you the problem.

Paul: My only concern with the health warning is that we're supposed to go to Last Call with no other issues.
... We need to make sure that we don't make it sound like an open issue. We need to say this is what we think the answer is and see if that satisfies people.

Murray: In the GRDDL spec we put a health warning in about validation in our Last Call.

<ht> Proposal: "Atomic steps which add, delete or change aspects of XML documents may introduce inconsistencies in the relationship between the namespace names of elements and attributes, namespace declarations and in-scope namespace bindings. The extent to which these inconsistencies are detected and repaired on a step by step basis is implementation-defined. Such inconsistencies *must* be repaired on serialization. . .

<ht> (a process usually referred to as 'namespace fixup')

Murray: Someone asked whether we expected the final serialization to be well-formed. I always thought that the output of every step would be well-formed.

Henry: That's what we're struggling with.
... I should have included 'prefixes' above.

Richard: But we aren't specifying how they must be repaired.
... What serialization is produced to do the repair?

<MoZ> removing the document can be a repair

Richard: Let's try a concrete example.
... Suppose unwrapping removes an element with a declaration; what happens to the children?

Alex: I think we can point to the serialization spec which does have a nice description of this.
... There's something in there about reconstructed infosets.

Richard: I believe that we have to have something that addresses this.
... If the element you removed in unwrap had text children, they really will lose that namespace.
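
The unwrap case Richard describes can be reproduced outside XProc. What follows is only a minimal sketch using Python's xml.dom.minidom, not any XProc implementation; the sample document, the wrapper name w, and the helpers unwrap, is_namespace_well_formed, and run are hypothetical names chosen for illustration. It removes a wrapper that carries the only declaration of a prefix, once with no repair and once copying the wrapper's own declarations onto its element children.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample: the wrapper <w> carries the only declaration of prefix q.
SRC = '<doc><w xmlns:q="urn:example:q"><q:item ref="q:target"/></w></doc>'


def unwrap(doc, name, copy_declarations=False):
    """Replace every element called `name` with its children.

    With copy_declarations=True, namespace declarations found on the
    wrapper itself are copied onto its element children first.
    """
    for wrapper in list(doc.getElementsByTagName(name)):
        declarations = [
            (attr.name, attr.value)
            for attr in wrapper.attributes.values()
            if attr.name == "xmlns" or attr.name.startswith("xmlns:")
        ]
        parent = wrapper.parentNode
        while wrapper.firstChild is not None:
            child = wrapper.firstChild
            if copy_declarations and child.nodeType == child.ELEMENT_NODE:
                for attr_name, attr_value in declarations:
                    if not child.hasAttribute(attr_name):
                        child.setAttribute(attr_name, attr_value)
            # insertBefore detaches the child from the wrapper before moving it.
            parent.insertBefore(child, wrapper)
        parent.removeChild(wrapper)


def is_namespace_well_formed(text):
    """Return True if a namespace-aware parse of `text` succeeds."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False


def run(copy_declarations):
    doc = minidom.parseString(SRC)
    unwrap(doc, "w", copy_declarations)
    return doc.documentElement.toxml()


without_fixup = run(copy_declarations=False)
with_copied_declarations = run(copy_declarations=True)
print(is_namespace_well_formed(without_fixup))
print(is_namespace_well_formed(with_copied_declarations))
```

In this sketch the unrepaired output no longer declares q, so a namespace-aware reparse rejects it, while the variant that copies declarations reparses. Copying only helps element children; a text child that mentions a prefix has nowhere to carry a declaration, which is the case Richard raises.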

<MoZ> without adding too much complexity to the problem, I want to add the concern about the fact that with p:string-replace, I can replace a string with characters that are not allowed in XML 1.0 but are in XML 1.1 (&#1;)

Henry: I'm perfectly happy in this regard to point to the serialization spec for guidance.

Murray: Here's my thought. We put minimal text in the specification and we edit the description of the serialization step so that it spells this out in a little more detail and we point to a separate document to detail all of this.

Norm: That won't work on process grounds.

<MoZ> Please not only namespace

Henry: Straw poll: Ask the editor to add a health warning about namespaces with references to the serialization spec and leave it at that.

Richard: The effect with respect to the email discussion is the question: is it OK to leave it until serialization?

Henry: This health warning would encourage implementors to do their best step by step.

Paul: I'm happy with that.

Mohamed: I think it must also include warnings about XML 1.0 vs. XML 1.1.
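
Mohamed's character concern can be illustrated in the same spirit. This is a sketch under stated assumptions: it uses Python's xml.etree.ElementTree, whose bundled parser handles XML 1.0, and it stands in for a string replacement by simply assigning U+0001 as text. The helper parses_as_xml10 is a hypothetical name. It shows a tree that is fine in memory but whose serialization cannot be read back as XML 1.0.

```python
import xml.etree.ElementTree as ET


def parses_as_xml10(text):
    """Return True if the bundled (XML 1.0) parser accepts `text`."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Stand-in for a step that replaces a string with U+0001.
element = ET.Element("a")
element.text = "\x01"

# The serializer writes the character through without complaint.
serialized = ET.tostring(element, encoding="unicode")

print(parses_as_xml10(serialized))
print(parses_as_xml10("<a>&#1;</a>"))
print(parses_as_xml10("<a>ok</a>"))
```

The point of the sketch is that nothing fails until the document crosses a serialization boundary, which is the same "enforcement at the margins" question as for namespaces.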

Alex: I don't like the health warning.

Paul: I'd still like to go to last call, unless you think we still have an open issue.

Michael: This sounds like an open issue to me.
... I can't support that resolution.

Norm: With my chair's hat on, I cannot in good conscience claim there isn't an issue here.

Rename p:equal to p:compare?


Semantics of p:label-elements

Norm: It's been suggested that we should use sequential numbers and not check for duplicates.

Henry: Suppose I add an xml:id and a subsequent step already has it. I think the duplicate detection is a complete red herring and gets in the way of using this for scoped identifiers.

Norm: Any objection to removing duplicate detection?


Murray: Let's leave it implementation defined.

Richard: I disagree strongly; the generate-id() in XSLT has that behavior and it's a constant source of irritation.
... It should be defined exactly what the IDs are.

Henry: Regression tests have the same problem.

Alex: Sequential numbering is the suggestion? I'm ok.

Norm: Any objection to sequential numbering instead of implementation-defined?

<MSM> what is the relevant total ordering here?

Mohamed: Can we make it an option to make it random?

Alex: We could add a radix?

Murray: Why not just make it an option to support sequential numbering, but you can implement other schemes if you want.

Norm: Alas, we're out of time, so I think we'll have to take this one to email as well.
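
One possible reading of the sequential-numbering suggestion is sketched below. It is not the agreed semantics of p:label-elements: it assumes document order is the total ordering MSM asks about, it overwrites any existing xml:id without checking for duplicates, and the function label_elements with its prefix and start parameters is hypothetical. It uses Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Expanded name of xml:id in ElementTree's {namespace}local notation.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def label_elements(root, prefix="_", start=1):
    """Give every element an xml:id numbered in document order.

    Existing xml:id values are overwritten; no duplicate detection is done.
    """
    for number, element in enumerate(root.iter(), start=start):
        element.set(XML_ID, f"{prefix}{number}")
    return root


root = ET.fromstring("<a><b><c/></b><d/></a>")
label_elements(root)
labels = [(element.tag, element.get(XML_ID)) for element in root.iter()]
print(labels)
```

Because the labels depend only on the input document and the options, two runs over the same input produce the same identifiers, which addresses the regression-test concern Henry mentions.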


Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.128 (CVS log)
$Date: 2007/09/17 11:57:08 $