This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Some of the following issues have been raised on earlier drafts, but it seems safest to raise them again as Last Call issues in Bugzilla.

6.7.3 Construction [of text nodes] from an Infoset says:

    If the resulting Text Node consists entirely of white space and the Text Node occurs in Element content[XML], the content of the Text Node is the zero-length string.

The reference to the Element Content XML production is inappropriate, as the input to this procedure is an infoset rather than a literal XML document. The [element content whitespace] infoset property is flagged a few lines up as being optionally used, so this could say:

    If the resulting Text Node consists entirely of characters with an [element content whitespace] property with value true, the content of the Text Node is the zero-length string.

This would make the document consistent. However, with either wording, this clause introduces a very large incompatibility with XPath 1. I think it would be better to drop this clause altogether; systems requiring white space nodes to be dropped can use the PSVI mapping or a proprietary mapping to the data model, neither of which has any XPath 1 compatibility implications.

Dropping white space from declared element content from schema-validated (PSVI) input makes sense and is something that could be tested in a conformance test. Dropping white space from the infoset mapping if [element content whitespace] is reported isn't really testable, as non-validating parsers may or may not report this and don't need to document whether they do or they don't.

As it is, it means that given

    <!DOCTYPE x [ <!ELEMENT x (x*)> ]>
    <x> <x/> <x/> </x>

a simple XPath of /x/node()[2] is completely undefined: it may pick up the first or the second empty x node.

If this clause is kept, it should be highlighted here that it is incompatible with XPath 1's data model, and the XPath (and XSLT) Compatibility appendices should also mention this.
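The ambiguity of /x/node()[2] can be illustrated with a small Python sketch. This is only a simulation under stated assumptions: the DTD is omitted, the helper names (is_whitespace_text, second_node) are hypothetical, and the "strip" flag stands in for a parser that does or does not report [element content whitespace]. It is not how any particular XPath processor is implemented.

```python
from xml.dom import minidom

# Instance from the report. The DTD declaring x as element-only content is
# left out because the stripping is simulated here, not driven by a parser.
DOC = "<x> <x/> <x/> </x>"

XML_WHITE_SPACE = " \t\r\n"


def is_whitespace_text(node):
    """True for a text node made up only of XML white space characters."""
    return node.nodeType == node.TEXT_NODE and all(
        ch in XML_WHITE_SPACE for ch in node.data
    )


def second_node(strip):
    """Return which empty x element (1 or 2) /x/node()[2] would select."""
    root = minidom.parseString(DOC).documentElement
    elements = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]
    # Hypothetical switch: drop whitespace-only text nodes when strip is True.
    nodes = [n for n in root.childNodes if not (strip and is_whitespace_text(n))]
    picked = nodes[1]  # XPath positions are 1-based, so [2] is index 1
    return elements.index(picked) + 1


print(second_node(strip=False))  # whitespace text nodes kept
print(second_node(strip=True))   # whitespace text nodes stripped
```

With the whitespace kept, the second child node is the first empty x; with it stripped, the second child node is the second empty x.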
For the reverse mapping, 6.7.5 (and J7) states that all characters get mapped to infoset items with an [element content whitespace] value of unknown. The infoset has a constraint that all non-white-space characters have a value of false for this property; http://www.w3.org/TR/xml-infoset/#infoitem.character says: "... It is always false for characters that are not white space." So I think the mapping from the DM to the infoset should set this property to false or to unknown depending on whether the character is white space.

David
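As a rough sketch of the reverse mapping suggested above, the following Python assigns [element content whitespace] per character. The dictionary representation of a character information item, the function name character_items, and the use of the string "unknown" are assumptions made for illustration; only the property names and the false/unknown rule come from the discussion.

```python
# XML white space characters: space, tab, carriage return, line feed.
XML_WHITE_SPACE = {"\u0020", "\u0009", "\u000D", "\u000A"}


def character_items(text):
    """Map a text node's string value to character information items.

    Non-white-space characters get [element content whitespace] = False,
    as the infoset constraint requires; white space characters get
    "unknown" because the data model does not record the distinction.
    """
    return [
        {
            "character code": ord(ch),
            "element content whitespace": (
                "unknown" if ch in XML_WHITE_SPACE else False
            ),
        }
        for ch in text
    ]


for item in character_items("a b"):
    print(item)
```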
Is it possible to identify when these comments were previously made so that we can figure out what the WG did about the original comments? In particular, were these comments made on the immediately previous DM Last Call document, and if so, can you point to the issue(s) in the Last Call issues list? http://www.w3.org/2005/04/data-model-issues.html /paulc
> Is it possible to identify when these comments were previously made

This current report is strongly related to http://www.w3.org/2005/04/data-model-issues.html#toc.qt-2003Dec0085-01

However, the text that I commented on there has been largely removed or rewritten, so most of the text of that comment is no longer relevant. The final comment lodged there is Norm's:

    White space is now significant in all cases except element-only content where it is not significant. The draft has been clarified to reflect this.

So the first of my comments in this bug report could be rephrased as saying that:

a) the clarification added at this point isn't correct, as it uses the "Element Content" XML production, which isn't necessarily available in an infoset: even an infoset generated by parsing an XML document may not have the information. It will (or may) have the [element content whitespace] property reported, though, so this should be used as described in my message.

b) If this clause isn't removed, the negative impact it has on XPath compatibility should be documented.

The second of my comments, that the reverse mapping should not always set the infoset property [element content whitespace] to "unknown" because doing so violates a constraint in the infoset spec, is new.

David
I just noticed that the first two points of this report are identical to the two points in report 1303 from XML Core.
The WG has discussed this comment: http://lists.w3.org/Archives/Member/w3c-xsl-query/2005May/0069.html and declines to make any changes. Please let us know if you accept this resolution.
> Please let us know if you accept this resolution.

I object to this resolution. If a breaking change is going to be introduced between 1.0 and 2.0, then the least that could be done is to document that fact.

Even if this is documented, I object to the resolution. It's not just incompatibility with XSLT1 that must be documented: it's incompatibility between XSLT2 (and XQuery) systems, because whether or not [element content whitespace] is reported by a non-validating-parser-that-reads-a-DTD is entirely parser-specific as far as I can see, and parsers of this type have traditionally been the ones most commonly used with XSLT.

If the resolution is to keep the currently specified behaviour, the editorial change from referencing Element content[XML] to referencing [element content whitespace] should be made, as Element content[XML] is a property of an XML document (a sequence of Unicode characters), not a property of an infoset; i.e. it's not a property of the input to the mapping being defined.

I also don't understand the WG's intention to specify a mapping from the XQuery data model that always sets [element content whitespace] to unknown. Do they disagree with the analysis that this produces an infoset that violates a constraint specified in the infoset recommendation?

David
Just an additional comment to confirm that these issues are not addressed in the new drafts. As an example of the incompatibility this introduces between XPath 1 and 2 (that is still not documented in the incompatibilities section), consider

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output method="text"/>
      <xsl:template match="x">
        <xsl:copy-of select="node()[position() mod 2 = 0]"/>
      </xsl:template>
    </xsl:stylesheet>

applied to

    <!DOCTYPE x [
    <!ELEMENT x (y*)>
    <!ELEMENT y (#PCDATA)>
    ]>
    <x>
    <y>s</y>
    <y>kill </y>
    <y>ti</y>
    <y>me </y>
    </x>

With a 1.0 processor one gets the output

    skill time

With a 2.0 processor as specified here, you would get the output

    kill me

Having the same construct work _without error_ in version 2 but produce radically different output to version 1 is something that should be avoided if at all possible, and documented _very_ clearly if it is impossible to avoid the incompatibility.
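The two outputs quoted above can be reproduced with a short Python simulation of the select expression node()[position() mod 2 = 0]. This is a sketch, not an XSLT processor: the DTD is omitted, the "strip" flag is a hypothetical stand-in for the data model's removal of whitespace text nodes from element-only content, and the function names are illustrative.

```python
from xml.dom import minidom

# Source instance from the example, with x assumed to be declared as
# element-only content (the DTD itself is not parsed here).
SOURCE = "<x> <y>s</y> <y>kill </y> <y>ti</y> <y>me </y> </x>"

XML_WHITE_SPACE = " \t\r\n"


def is_whitespace_text(node):
    """True for a text node made up only of XML white space characters."""
    return node.nodeType == node.TEXT_NODE and all(
        ch in XML_WHITE_SPACE for ch in node.data
    )


def string_value(node):
    """Concatenate the text of a node and all of its descendants."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(string_value(child) for child in node.childNodes)


def even_children_text(strip):
    """Text output for children of x at even positions (1-based)."""
    root = minidom.parseString(SOURCE).documentElement
    nodes = [n for n in root.childNodes if not (strip and is_whitespace_text(n))]
    selected = [n for pos, n in enumerate(nodes, start=1) if pos % 2 == 0]
    return "".join(string_value(n) for n in selected)


print(repr(even_children_text(strip=False)))  # whitespace kept, as in 1.0 practice
print(repr(even_children_text(strip=True)))   # whitespace stripped, as in the 2.0 draft
```

With whitespace kept, every y element sits at an even position and all four are copied; with whitespace stripped, only the second and fourth y elements are selected.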
The XSL and XML Query WGs have decided to make this an issue for which we explicitly solicit CR feedback.
OK that sounds reasonable. What's the protocol, close this report and open new reports commenting on the CR draft? Or keep this report open and add comments to this when the new draft appears? Feel free to close this if that is appropriate.
The XSL WG also agreed to add the following non-normative text to the XSLT specification:

It occurred to me during the discussion today that we ought in section 4.4 (which discusses xsl:strip-space) to mention the statement in the data model that whitespace text nodes in element-only content are stripped before xsl:strip-space/preserve-space comes into play. The place for this seems to be in the existing note at the end of the section, which currently reads:

    Note: A source document is supplied as input to the XSLT processor in the form of a tree conforming to the data model described in [Data Model]. Nothing in this specification states that this tree must be built by parsing an XML document; nor does it state that the application that constructs the tree is required to treat whitespace in any particular way. The provisions in this section relate only to whitespace text nodes that are present in the tree supplied as input to the processor. In particular, the processor cannot preserve whitespace text nodes unless they were actually present in the supplied tree.

I propose to change this to:

    Note: In [Data Model], processes are described for constructing a tree (an instance of the data model) from an Infoset or from a PSVI. Those processes deal with whitespace according to their own rules, and the provisions in this section apply to the resulting tree. In practice this means that elements that are defined in a DTD or a Schema to contain element-only content will have whitespace text nodes stripped, regardless of the xsl:strip-space and xsl:preserve-space declarations in the stylesheet. However, source trees are not necessarily constructed using those processes; indeed, they are not necessarily constructed by parsing XML documents. Nothing in the XSLT specification constrains how the source tree is constructed, or what happens to whitespace during its construction.

    The provisions in this section relate only to whitespace text nodes that are present in the tree supplied as input to the XSLT processor. The XSLT processor cannot preserve whitespace text nodes unless they were actually present in the supplied tree.

I think we should also say something in the compatibility appendix. I'd suggest a new section J.1.2 before the existing J.1.2:

    J.1.2 Tree Construction: whitespace stripping

    In both 1.0 and in 2.0, the XSLT specification places no constraints on the way in which source trees are constructed. For XSLT 2.0, however, the [Data Model] specification describes explicit processes for constructing a tree from an Infoset or a PSVI, while also permitting other processes to be used. The process described in [Data Model] has the effect of stripping whitespace text nodes from elements declared to have element-only content. Although the XSLT 1.0 specification did not preclude such behavior, it differs from the way that most existing XSLT 1.0 implementations work.

    It is RECOMMENDED that an XSLT 2.0 implementation wishing to provide maximum interoperability and backwards compatibility should offer the user the option either to construct source trees using the processes described in [Data Model], or alternatively to retain or remove whitespace according to the common practice of previous XSLT 1.0 implementations.

    To write transformations that give the same result regardless of the whitespace stripping applied during tree construction, stylesheet authors can:

    * use the xsl:strip-space declaration to remove whitespace text nodes from elements having element-only content (this has no effect if the whitespace has already been stripped)
    * use instructions such as <xsl:apply-templates select="*"/> that cause only the element children of the context node to be processed, and not its text nodes.
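The second suggestion in the list above, processing only element children, can be sketched in Python to show why it gives the same result under either tree construction. The sample instance, the "strip" flag, and the function names are assumptions for illustration; the selection mimics something like *[position() mod 2 = 0] rather than any specific stylesheet from the specification.

```python
from xml.dom import minidom

# Hypothetical source instance with element-only content in x.
SOURCE = "<x> <y>s</y> <y>kill </y> <y>ti</y> <y>me </y> </x>"

XML_WHITE_SPACE = " \t\r\n"


def child_nodes(strip):
    """Children of x, optionally without whitespace-only text nodes."""
    root = minidom.parseString(SOURCE).documentElement
    return [
        n
        for n in root.childNodes
        if not (
            strip
            and n.nodeType == n.TEXT_NODE
            and all(ch in XML_WHITE_SPACE for ch in n.data)
        )
    ]


def even_element_text(strip):
    """Text of the element children at even positions, counting elements only."""
    elements = [n for n in child_nodes(strip) if n.nodeType == n.ELEMENT_NODE]
    return "".join(
        child.data
        for pos, element in enumerate(elements, start=1)
        if pos % 2 == 0
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )


# Counting only element children makes the result independent of stripping.
print(repr(even_element_text(strip=False)))
print(repr(even_element_text(strip=True)))
```

Because positions are counted among element children only, the presence or absence of whitespace text nodes no longer shifts which elements are selected.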
I also spotted while reading section 4.4 that the following note:

    Note: This implies that if an xml:space attribute is specified on a literal result element, it will be included in the result.

is misplaced in section 4.4, since literal result elements do not occur in source documents. I suggest we add it to the note at the end of 4.2, rephrasing it to fit:

    Note: If an xml:space attribute is specified on a literal result element, it will be copied to the result tree in the same way as any other attribute.
(In reply to comment #9)
> In both 1.0 and in 2.0, the XSLT specification places no constraints on the
> way in which source trees are constructed.

I dispute this assertion. XSLT 1 refers to XPath for its data model, and XPath describes the correspondence between its data model and an XML document. In particular, it says that "The children of an element node are the element nodes, comment nodes, processing instruction nodes and text nodes for its content". There is no doubt from the XML spec that the content of an element includes whitespace characters regardless of the content model. I therefore believe that when an XSLT 1 processor constructs a data model from an XML document, it must not remove element content whitespace.

However, the XML Core WG is happy with your decision to solicit CR feedback on this issue.