This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 5978 - [FS] text adjustment of non-mixed complex types
Summary: [FS] text adjustment of non-mixed complex types
Status: CLOSED INVALID
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Formal Semantics 1.0
Version: Recommendation
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Dyck
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2008-08-21 10:20 UTC by Tim Mills
Modified: 2008-10-08 15:20 UTC

Description Tim Mills 2008-08-21 10:20:55 UTC
Consider a document:

<element-only-content>
  <the-only-element/>
</element-only-content>

validated against a schema which contains a definition of element-only-content as a complex type containing element-only content.

In the above document, I've deliberately included insignificant whitespace inside the element-only-content element.

Do these insignificant whitespace nodes exist as text nodes when the document is read?

From XQuery 1.0 and XPath 2.0 Data Model (XDM):

"Otherwise, construction from a PSVI is the same as construction from the Infoset except for the content property. When constructing the content property, [element content whitespace] is not used to test if whitespace is collapsed. Instead, if the resulting Text Node consists entirely of whitespace and the character information items used to construct this node have a parent and that parent is an element and its {content type} is not mixed, then the content of the Text Node is the zero-length string."

which, if I understand it correctly, means that insignificant whitespace is represented as text nodes with zero-length string content.

That means that in the above document, element-only-content contains three nodes: <the-only-element /> and two text nodes.

Now consider the text in FS 8.1.7 Type adjustment

    * if the complex type is mixed, interleaves the type with a sequence of text nodes and xs:anyAtomicType.

and the rule:

Otherwise, just extend the type by the built-in attributes.

statEnv |-  Type1 extended by BuiltInAttributes is Type2
statEnv |-  Type3 = Type2 & processing-instruction* & comment*
---------------------------------------------------------------
statEnv |-  Type1 adjusts to Type3

This says that the adjusted type of element-only-content does not include text nodes.  Is this correct?
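
To make the question concrete, here is a minimal Python sketch of the two adjustment cases quoted above. It is a deliberate simplification and not the FS judgment itself: a type is reduced to the set of child node kinds it admits, the built-in attributes are ignored, and the name adjusted_child_kinds is hypothetical.

```python
def adjusted_child_kinds(declared_kinds, mixed):
    """Return the child node kinds admitted after type adjustment.

    Simplified model: a "type" is just a set of node-kind names.
    """
    kinds = set(declared_kinds)
    if mixed:
        # Mixed complex type: the type is interleaved with text nodes.
        kinds.add("text")
    # Both cases: Type3 = Type2 & processing-instruction* & comment*
    kinds |= {"processing-instruction", "comment"}
    return kinds


element_only = adjusted_child_kinds({"element"}, mixed=False)
mixed_content = adjusted_child_kinds({"element"}, mixed=True)

print(sorted(element_only))
print(sorted(mixed_content))
```

Under this model, "text" appears only in the mixed case, which is the behaviour being asked about.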
Comment 1 Michael Dyck 2008-08-21 19:57:47 UTC
(In reply to comment #0)
>
> That means that in the above document, element-only-content contains
> three nodes: <the-only-element /> and two text nodes.

I believe that's incorrect. XDM section 6.7.1 says:

    Text Nodes must satisfy the following constraint:
    1. If the parent of a text node is not empty, the Text Node
       must not contain the zero-length string as its content.
    ...
    When a Document or Element Node is constructed, ... If the
    resulting Text Node is empty, it must never be placed among
    the children of its parent, it is simply discarded.

Note that 6.7.3 "Construction from an Infoset" has two reminders of this,
one under 'content':

    Text Nodes are only allowed to be empty if they have no parents;
    an empty Text Node will be discarded when its parent is constructed,
    if it has a parent.

and the other at the end of the section:

    Text Nodes are only allowed to be empty if they have no parents;
    an empty Text Node will be discarded when its parent is constructed,
    if it has a parent.

Oddly, 6.7.4 "Construction from a PSVI" (from which you quoted) does not
have any such reminder. I wonder if the second reminder in 6.7.3 was
actually meant for 6.7.4. Checking the CVS history, it turns out that
initially, the second reminder *was* in 6.7.4, but then got moved to 6.7.3
in June 2005, as one of the 'fixes' for Bug 1293, presumably this point:

    6.7.4
    The statement that empty text nodes are discarded should
    not appear only under construction from a PSVI, since it
    applies equally to construction from an infoset.

However, I believe this point was invalid, since the identical statement
already appeared in both 6.7.3 (Infoset) and 6.7.4 (PSVI). (And even if it
*had* been a valid point, the proper fix would have been to *copy* the
statement from 6.7.4 to 6.7.3, not *move* it.)

So I propose that we resolve this bug as 'invalid' and raise a separate one
against the XDM.
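
For illustration, the combined effect of the PSVI rule and the discard rule can be sketched in Python with the standard-library xml.dom.minidom. This is only a sketch under stated assumptions: minidom does no schema validation, so the set of element names with element-only content is supplied by hand, and the helper name strip_element_only_whitespace is hypothetical.

```python
from xml.dom import minidom

# Whitespace as defined by XML: space, tab, carriage return, line feed.
XML_WHITESPACE = " \t\r\n"


def strip_element_only_whitespace(element, element_only_names):
    """Discard whitespace-only text children of non-mixed elements."""
    for child in list(element.childNodes):
        if child.nodeType == child.TEXT_NODE:
            is_blank = child.data.strip(XML_WHITESPACE) == ""
            if is_blank and element.tagName in element_only_names:
                # PSVI rule: the content becomes the zero-length string.
                # Discard rule: an empty Text Node is never placed among
                # the children of its parent.
                element.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_element_only_whitespace(child, element_only_names)


SOURCE = "<element-only-content>\n  <the-only-element/>\n</element-only-content>"
root = minidom.parseString(SOURCE).documentElement

before = len(root.childNodes)  # text, element, text
strip_element_only_whitespace(root, {"element-only-content"})
after = len(root.childNodes)   # element only

print(before, after)
```

In this sketch the element starts with three children and ends with one, which matches the reading that the two whitespace text nodes are not present in the XDM instance.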
Comment 2 Tim Mills 2008-08-22 14:55:27 UTC
Thanks for the clarification.

So if we have a query such as:

doc('validated-document.xml')

in which the source text 'validated-document.xml' contained insignificant whitespace, in the serialized result, that whitespace would disappear (unless the serializer chose to add some)?
Comment 3 Michael Dyck 2008-10-07 07:59:11 UTC
(In reply to comment #2)
> 
> So if we have a query such as:
> 
> doc('validated-document.xml')
> 
> in which the source text 'validated-document.xml' contained insignificant
> whitespace, in the serialized result, that whitespace would disappear
> (unless the serializer chose to add some)?

That's not really my area, so I asked the experts...

Michael Kay replied:
> Technically it's implementation-defined. doc() uses a mapping from URIs
> to document nodes that is set up in the context any way the
> implementation likes.
>
> However, on the assumption that
> 
> (a) URIs are dereferenced in the conventional way [or in this case, in
> the way prescribed by the test suite documentation]
> 
> (b) the XDM tree is built using either the Infoset or PSVI mapping
> defined in the XDM spec
> 
> then insignificant whitespace (that is, whitespace text nodes appearing
> as a child of an element defined in the DTD or schema to have
> element-only content) will indeed be stripped.

And Liam Quin replied:
> Yes, if whitespace is determined to be insignificant,
> it may well not make it into the Data Model.
> 
> One way it might be deemed to be insignificant is if a DTD
> is used, and the DTD says no #PCDATA (text nodes) can appear
> at a particular place.
> 
> I expect this could also happen through schema validation, although
> by the time Schema gets to see the data the whitespace is already
> there, so I am uncertain.

to which Michael Kay added:
> If you choose to use the PSVI-to-XDM construction method defined in the
> XDM spec, whitespace text nodes in element-only content will not be
> represented in the XDM instance.

I hope that answers your question to your satisfaction.
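
A small Python sketch of what this means for serialization, again using xml.dom.minidom as a stand-in. The assumptions are the same as any such illustration: minidom neither validates nor builds a real XDM instance, so the stripping is simulated by hand for the element names passed in strip_for, and build_tree is a hypothetical helper.

```python
from xml.dom import minidom

SOURCE = "<element-only-content>\n  <the-only-element/>\n</element-only-content>"
XML_WHITESPACE = " \t\r\n"


def build_tree(source, strip_for):
    """Parse source and drop whitespace-only text under the named elements."""
    doc = minidom.parseString(source)
    for element in doc.getElementsByTagName("*"):
        if element.tagName not in strip_for:
            continue
        for child in list(element.childNodes):
            is_text = child.nodeType == child.TEXT_NODE
            if is_text and child.data.strip(XML_WHITESPACE) == "":
                element.removeChild(child)
    return doc.documentElement


# No type information: whitespace text nodes survive and are serialized.
unvalidated = build_tree(SOURCE, strip_for=set()).toxml()

# Element-only content known: whitespace text nodes are absent.
validated = build_tree(SOURCE, strip_for={"element-only-content"}).toxml()

print(repr(unvalidated))
print(repr(validated))
```

The second result contains no whitespace between the tags; any indentation in real output would have to be added by the serializer.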
Comment 4 Michael Dyck 2008-10-07 08:06:00 UTC
As proposed in comment 1, I have created Bug 6139 against XDM to add a reminder to section 6.7.4 about the handling of empty Text Nodes (which would probably have prevented your creation of this bug).

Therefore, I am marking this bug as resolved-invalid. If you agree, please mark the bug CLOSED.
Comment 5 Tim Mills 2008-10-08 15:20:45 UTC
Thanks to everyone for their comments.