This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
This is essentially a re-raising of bug #1309, which was explicitly deferred for comment on the CR drafts.

I agree with the requirement to strip white space text nodes in trees built from schema-validated input. This report concerns only the default mapping from a non-schema-validated infoset.

The requirement to strip white space text nodes from elements declared in a DTD introduces a large incompatibility between XPath 1 and XPath 2. This incompatibility is highlighted in the XSLT draft (J.1.1) but not in the XPath draft. If no changes are made to the specification to remove the incompatibility, then wording similar to XSLT J.1.1 should be added to XPath I.1, as otherwise the small list of edge cases in appendix I.1 gives a rather over-optimistic view of the compatibility between the two versions.

However, perhaps even more important than compatibility between XPath 1 and XPath 2 is compatibility between XPath 2 (and XQuery) systems. The current requirement makes such compatibility rather hard to achieve. Typically a system will document which XML parser it uses, or give the user a choice of which to use, or give a choice of whether to use the parser in non-validating or validating mode. If a validating parser is used, the [element content whitespace] property will be reported, so in this case all XPath 2 (and XQuery) systems will act in the same way (although in a way incompatible with XPath 1); this would be something I could "live with" (in W3C working group consensus-speak). However, traditionally the most common type of parser used with XSLT (in particular) has been a non-validating-parser-which-reads-a-dtd (as the structure of the XSLT language means that this type of parser is more or less required to read the XSLT file, and typically the same parser is used on input documents).
For this kind of parser there is, as far as I can tell, no specification at all that says whether they should, or should not, report the [element content whitespace] property on elements for which they have read a DTD declaration. So typically a user will have no way of knowing whether or not white space will be stripped, and no way of changing the behaviour if it is unwanted.

Incompatibility with XPath 1 is something that will hopefully become less important over time, but incompatibility between different XPath 2/XQuery systems is something that should be avoided if at all possible.

I offer 3 options:

A. Do not change the specification. In this case, the XPath compatibility appendix should document the incompatibility.

B. Change the requirement to strip white space nodes so that it only applies to infosets constructed by a _validating_ XML parser. (DTD validated, so that if you validate with a DTD, the whitespace behaviour matches that of schema validation.)

C. Remove the requirement to strip white space when building from an Infoset (keeping it in the case of building from a PSVI).

The status quo (A) has the largest incompatibility with XPath 1 and introduces similarly large incompatibilities between XQuery and XPath 2 systems running on different XML parsers. Taking either option (B) or (C) would cause all XPath 2 and XQuery systems to work the same way. Option (C) is the most compatible with XPath 1, and the one that I personally prefer, but perhaps option (B) would be a useful compromise position that should be considered.

David
I must admit that I thought the specs currently said [B]. Reading it more carefully, I see that it is indeed possible that a parser doesn't do validation, but does distinguish whitespace appearing in element content from whitespace appearing in mixed content. Are there real parsers that do this, however? Looking at your option B, is there any way one can look at an InfoSet and determine whether it was constructed by a validating parser? If not, [B] looks like a lost cause (unless we abandon the pretence that we only look at the infoset and have no idea how it was created).
(In reply to comment #1)
> Are there real parsers that do this, however?

I don't know, and that's my concern. Even just using saxon, I have no idea what happens with the parsers that can easily be used, short of testing each case by trial and error.

> Looking at your option B, is there any way one can look at an InfoSet and
> determine whether it was constructed by a validating parser?

No, apparently not, as far as I can see. The nearest thing is [all declarations processed] on the document item, which I suppose could be used, although that doesn't really do the right thing here.

> If not, [B] looks
> like a lost cause (unless we abandon the pretence that we only look at the
> infoset and have no idea how it was created).

I would word (B) such that if you don't know how it was created, you don't strip. Only strip if you know it was generated by a validating parser.
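To make the difference between the three options concrete, here is a minimal sketch in Python of the decision each one implies for a single whitespace-only text node in a tree built from an infoset. The names (`Option`, `should_strip`, `built_by_validating_parser`) are hypothetical and appear in none of the drafts; (B) is worded as in the comment above, so an unknown origin (`None`) means the node is kept.

```python
from enum import Enum
from typing import Optional


class Option(Enum):
    """The three proposals from the original report (labels are illustrative)."""
    A = "status quo"
    B = "strip only for infosets known to come from a validating parser"
    C = "never strip when building from an Infoset"


def should_strip(option: Option,
                 element_content_whitespace: bool,
                 built_by_validating_parser: Optional[bool]) -> bool:
    """Decide whether a whitespace-only text node is dropped.

    element_content_whitespace: the infoset property as reported by the parser.
    built_by_validating_parser: True, False, or None when the origin is unknown.
    """
    if option is Option.C:
        # (C): infoset-built trees always keep their whitespace nodes.
        return False
    if option is Option.B:
        # (B): strip only when validation is known to have happened.
        return element_content_whitespace and built_by_validating_parser is True
    # (A): rely solely on whatever the parser reported.
    return element_content_whitespace
```

Under this sketch, a non-validating parser that reads the DTD and happens to report the property causes stripping under (A) only; (B) and (C) keep the node regardless of what such a parser reports.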
(In reply to comment #2)
> (In reply to comment #1)
> > Are there real parsers that do this, however?

Yes. The default parser used with saxon does this. I have slightly modified the example given in bug #1309 to show this, making ws.xml invalid. The default parser probably depends on the JVM, which is:

$ java -version
java version "1.5.0_06"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_06-b05)
Java HotSpot(TM) Client VM (build 1.5.0_06-b05, mixed mode, sharing)

ws.xml

<!DOCTYPE x [<!ELEMENT x (z*)>]>
<x>
 <y>s</y>
 <y>kill </y>
 <y>ti</y>
 <y>me </y>
</x>

ws.xsl

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="x">
<xsl:copy-of select="node()[position() mod 2 = 0]"/>
</xsl:template>
</xsl:stylesheet>

With XSLT1 you get

$ saxon ws.xml ws.xsl
skill time

With XSLT2 using a validating parser, you get validation errors but then (if you carry on) a completely different result

$ saxon8 -v ws.xml ws.xsl
Warning: Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
Recoverable error on line 3 column 5 of file:/c:/tmp/ws.xml:
  SXXP0003: Error reported by XML parser: Element type "y" must be declared.
Recoverable error on line 4 column 5 of file:/c:/tmp/ws.xml:
  SXXP0003: Error reported by XML parser: Element type "y" must be declared.
Recoverable error on line 5 column 5 of file:/c:/tmp/ws.xml:
  SXXP0003: Error reported by XML parser: Element type "y" must be declared.
Recoverable error on line 6 column 5 of file:/c:/tmp/ws.xml:
  SXXP0003: Error reported by XML parser: Element type "y" must be declared.
Recoverable error on line 7 column 5 of file:/c:/tmp/ws.xml:
  SXXP0003: Error reported by XML parser: The content of element type "x" must match "(z)*".
kill me

and using the same parser in _non_ validating mode there is no warning (about validity or white space) but again a dramatically different result from that obtained by XSLT1:

$ saxon8 ws.xml ws.xsl
Warning: Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
kill me
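The two results can be reproduced without saxon. What follows is a rough Python sketch, not a description of what either processor does internally: it parses ws.xml with the standard library's `xml.dom.minidom` (a non-validating parser that keeps whitespace-only text nodes), then evaluates the equivalent of `node()[position() mod 2 = 0]` on the children of `x`, once with those nodes kept and once with them removed. The helper names are hypothetical, and the exact whitespace between the `y` elements is an assumption that does not affect the result.

```python
from xml.dom import minidom

WS_XML = """<!DOCTYPE x [<!ELEMENT x (z*)>]>
<x>
 <y>s</y>
 <y>kill </y>
 <y>ti</y>
 <y>me </y>
</x>"""


def string_value(node):
    """Concatenate all descendant text, like text-method output of a copied node."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(string_value(child) for child in node.childNodes)


def is_whitespace_only_text(node):
    """True for text nodes made up solely of XML whitespace characters."""
    return node.nodeType == node.TEXT_NODE and node.data.strip(" \t\r\n") == ""


def even_position_text(children):
    """Mimic node()[position() mod 2 = 0]; positions are 1-based as in XPath."""
    picked = [n for pos, n in enumerate(children, start=1) if pos % 2 == 0]
    return "".join(string_value(n) for n in picked)


doc = minidom.parseString(WS_XML)
doc.normalize()  # merge any adjacent text nodes before counting positions
children = list(doc.documentElement.childNodes)

# Whitespace nodes kept (the XPath 1 view of the tree).
unstripped = even_position_text(children)
# Whitespace nodes removed (the view after stripping element content whitespace).
stripped = even_position_text(
    [n for n in children if not is_whitespace_only_text(n)]
)

print(repr(unstripped))  # 'skill time '
print(repr(stripped))    # 'kill me '
```

With the whitespace nodes in place, the four `y` elements sit at the even positions; once they are stripped, only the second and fourth `y` do, which is why the same stylesheet selects different nodes.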
Note that there are widely deployed non-validating parsers that strip whitespace-only text nodes as a user-option.
(In reply to comment #4)
> Note that there are widely deployed non-validating parsers that strip
> whitespace-only text nodes as a user-option.

True, although the most widely deployed one that I know of (msxml) strips whitespace nodes (if that option is chosen) whether or not the element is declared, so as it stands that option is incompatible with all three of the specification versions that I suggested, including the current status quo in the data model draft. As it says in the status section of the data model spec, the behaviour specified is "incompatible with current common practice".

David
It is not incompatible. It just does not follow the Infoset to Data model mapping outlined in the data model document. But the data model document allows other processes to generate the data model.
(In reply to comment #6)
> It is not incompatible. It just does not follow the Infoset to Data model
> mapping outlined in the data model document. But the data model document
> allows other processes to generate the data model.

Oh yes, I agree. As I think I mentioned in my pre-CR version of this thread, nothing here stops any system building an XDM tree any way it likes from whatever data sources it has. Perhaps "incompatible" wasn't the best word. What I meant was that the behaviour of stripping all white space nodes, although implemented in a very widely used system (I use it quite a bit:-), doesn't really help the present discussion decide on any course of action, as no system doing that is implementing the infoset to XDM mapping outlined in the data model spec. If the resulting tree meets the consistency constraints on an XDM tree (which I'm sure it will), it's conformant behaviour but irrelevant to the discussion of this part of the spec, surely.

David
We debated the issue yesterday and decided (reluctantly) that option A is the best we can do. B doesn't work because it depends on knowing or influencing how the infoset was constructed, and in our processing model we don't have access to that information (though of course real products can attempt to do things this way). Option C implies an incompatibility between DTD-based and schema-based processing which many people felt would be just as troublesome as the other incompatibilities mentioned.
I can't say I'm surprised by this decision given the reaction of the WGs to previous reports on this subject (from me and xml core, at least). However, I think it's a pretty bad decision.

> Option C implies an incompatibility between DTD-based and schema-based
> processing

Given that processing with or without a schema is pretty much completely incompatible (or, as the XSLT2 spec puts it more delicately, "This may lead to a number of differences in behavior"), I am surprised that white space would be considered an issue here. As we are finding in the test suite, order by expressions (for example) typically produce completely different orderings (numeric or textual) depending on whether a schema was used.

If reviewing these specs were my day-job and I had time to carry on the argument, I would certainly re-open this so that it is flagged as an issue at termination of CR. As neither of those things is true, I am instead going to close this report, which is why I'm taking this last opportunity to complain (if not formally object) in this comment.

David