This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Consider a document:

  <element-only-content>
    <the-only-element/>
  </element-only-content>

validated against a schema which contains a definition of element-only-content as a complex type containing element-only content. In the above document, I've deliberately included insignificant whitespace inside the element-only-content element. Do these insignificant whitespace nodes exist as text nodes when the document is read?

From XQuery 1.0 and XPath 2.0 Data Model (XDM):

  "Otherwise, construction from a PSVI is the same as construction from the Infoset except for the content property. When constructing the content property, [element content whitespace] is not used to test if whitespace is collapsed. Instead, if the resulting Text Node consists entirely of whitespace and the character information items used to construct this node have a parent and that parent is an element and its {content type} is not mixed, then the content of the Text Node is the zero-length string."

which, if I understand it correctly, means that insignificant whitespace is represented as text nodes with zero-length string content. That means that in the above document, element-only-content contains three nodes: <the-only-element /> and two text nodes.

Now consider the text in FS 8.1.7 Type adjustment:

  * if the complex type is mixed, interleaves the type with a sequence of text nodes and xs:anyAtomicType.

and the rule:

  Otherwise, just extend the type by the built-in attributes.

  statEnv |- Type1 extended by BuiltInAttributes is Type2
  statEnv |- Type3 = Type2 & processing-instruction* & comment*
  ---------------------------------------------------------------
  statEnv |- Type1 adjusts to Type3

This says that the adjusted type of element-only-content does not include text nodes. Is this correct?
(In reply to comment #0)
>
> That means that in the above document, element-only-content contains
> three nodes: <the-only-element /> and two text nodes.

I believe that's incorrect. XDM section 6.7.1 says:

    Text Nodes must satisfy the following constraint:

    1. If the parent of a text node is not empty, the Text Node must not contain the zero-length string as its content.
    ...
    When a Document or Element Node is constructed, ... If the resulting Text Node is empty, it must never be placed among the children of its parent, it is simply discarded.

Note that 6.7.3 "Construction from an Infoset" has two reminders of this, one under 'content':

    Text Nodes are only allowed to be empty if they have no parents; an empty Text Node will be discarded when its parent is constructed, if it has a parent.

and the other at the end of the section:

    Text Nodes are only allowed to be empty if they have no parents; an empty Text Node will be discarded when its parent is constructed, if it has a parent.

Oddly, 6.7.4 "Construction from a PSVI" (from which you quoted) does not have any such reminder. I wonder if the second reminder in 6.7.3 was actually meant for 6.7.4.

Checking the CVS history, it turns out that initially, the second reminder *was* in 6.7.4, but then got moved to 6.7.3 in June 2005, as one of the 'fixes' for Bug 1293, presumably this point:

    6.7.4 The statement that empty text nodes are discarded should not appear only under contruction from a PSVI, since it applies equally to construction from an infoset.

However, I believe this point was invalid, since the identical statement already appeared in both 6.7.3 (Infoset) and 6.7.4 (PSVI). (And even if it *had* been a valid point, the proper fix would have been to *copy* the statement from 6.7.4 to 6.7.3, not *move* it.)

So I propose that we resolve this bug as 'invalid' and raise a separate one against the XDM.
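For illustration only, here is a minimal Python sketch of how the two quoted rules combine: whitespace-only text in a non-mixed element gets zero-length content (6.7.4), and an empty Text Node with a parent is then discarded (6.7.1). It is not a PSVI-to-XDM implementation. The ELEMENT_ONLY set and the helper name are hypothetical stand-ins for the schema's {content type} information, and minidom is used only as a convenient tree.

```python
from xml.dom import minidom

# Hypothetical stand-in for schema information: names of elements whose
# {content type} is element-only (that is, not mixed).
ELEMENT_ONLY = {"element-only-content"}

SOURCE = """<element-only-content>
  <the-only-element/>
</element-only-content>"""


def discard_element_content_whitespace(node, element_only_names):
    """Remove whitespace-only text children of element-only elements, in place."""
    for child in list(node.childNodes):
        if child.nodeType == child.ELEMENT_NODE:
            discard_element_content_whitespace(child, element_only_names)
        elif (
            child.nodeType == child.TEXT_NODE
            and node.nodeType == node.ELEMENT_NODE
            and node.tagName in element_only_names
            and child.data.strip(" \t\r\n") == ""
        ):
            # The quoted 6.7.4 rule makes the content the zero-length string;
            # the quoted 6.7.1 rule then discards the empty Text Node.
            node.removeChild(child)


doc = minidom.parseString(SOURCE)
root = doc.documentElement
before = [child.nodeName for child in root.childNodes]
discard_element_content_whitespace(root, ELEMENT_ONLY)
after = [child.nodeName for child in root.childNodes]
print(before)
print(after)
```

Before the discard step the element has three children (text, element, text); afterwards only the-only-element remains, which is the outcome argued for above.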
Thanks for the clarification.

So if we have a query such as:

  doc('validated-document.xml')

in which the source text 'validated-document.xml' contained insignificant whitespace, in the serialized result, that whitespace would disappear (unless the serializer chose to add some)?
(In reply to comment #2)
>
> So if we have a query such as:
>
> doc('validated-document.xml')
>
> in which the source text 'validated-document.xml' contained insignificant
> whitespace, in the serialized result, that whitespace would disappear
> (unless the serializer chose to add some)?

That's not really my area, so I asked the experts...

Michael Kay replied:

> Technically it's implementation-defined. doc() uses a mapping from URIs
> to document nodes that is set up in the context any way the
> implementation likes.
>
> However, on the assumption that
>
> (a) URIs are dereferenced in the conventional way [or in this case, in
> the way prescribed by the test suite documentation]
>
> (b) the XDM tree is built using either the Infoset or PSVI mapping
> defined in the XDM spec
>
> then insignificant whitespace (that is, whitespace text nodes appearing
> as a child of an element defined in the DTD or schema to have
> element-only content) will indeed be stripped.

And Liam Quin replied:

> Yes, if whitespace is determined to be insignificant,
> it may well not make it into the Data Model.
>
> One way it might be deemed to be insignificant is if a DTD
> is used, and the DTD says no #PCDATA (text nodes) can appear
> at a particular place.
>
> I expect this could also happen through schema validation, although
> by the time Schema gets to see the data the whitespace is already
> there, so I am uncertain.

to which Michael Kay added:

> If you choose to use the PSVI-to-XDM construction method defined in the
> XDM spec, whitespace text nodes in element-only content will not be
> represented in the XDM instance.

I hope that answers your question to your satisfaction.
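As a rough illustration of the serialization question, the following Python sketch drops whitespace-only text from an element assumed to have element-only content and then serializes the result. It only mimics the effect described in the replies; it does not run doc() or any real schema validation. The ELEMENT_ONLY set and both helper functions are hypothetical, and ElementTree stands in for an XDM tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for schema information: elements with element-only content.
ELEMENT_ONLY = {"element-only-content"}

SOURCE = """<element-only-content>
  <the-only-element/>
</element-only-content>"""


def is_blank(text):
    """True if text is present and consists only of XML whitespace."""
    return text is not None and text.strip(" \t\r\n") == ""


def strip_element_only_whitespace(elem):
    """Drop whitespace-only text directly inside element-only elements."""
    if elem.tag in ELEMENT_ONLY:
        if is_blank(elem.text):
            elem.text = None      # whitespace before the first child
        for child in elem:
            if is_blank(child.tail):
                child.tail = None  # whitespace after each child
    for child in elem:
        strip_element_only_whitespace(child)


root = ET.fromstring(SOURCE)
strip_element_only_whitespace(root)
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

The serialized string has no line breaks or indentation left between the tags, unless a serializer is explicitly asked to add some.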
As proposed in comment 1, I have created Bug 6139 against XDM to add a reminder to section 6.7.4 about the handling of empty Text Nodes (which would probably have prevented your creation of this bug). Therefore, I am marking this bug as resolved-invalid. If you agree, please mark the bug CLOSED.
Thanks to everyone for their comments.