This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 2611 - xqueryx: trivial embedding (esp CDATA sections)
Summary: xqueryx: trivial embedding (esp CDATA sections)
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: XQueryX 1.0
Version: Candidate Recommendation
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: Jim Melton
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-12-20 01:17 UTC by David Carlisle
Modified: 2006-06-13 12:19 UTC
CC List: 0 users

See Also:



Description David Carlisle 2005-12-20 01:17:16 UTC
Section 5 says:

  If the XQuery contains characters that are prohibited in XML text
  (specifically < and &), except when they occur within a CDATA section
  within the XQuery, they must be "escaped" as either character entity
  references (&lt; and &amp;, respectively) or numeric character references

I think that "except when they occur within a CDATA section within the XQuery"
should be deleted, and that all "<" characters, including those within CDATA
sections (and including the < in the <![CDATA[ that opens such a section),
should be escaped.
In addition, there is a third possibility for escaping besides entity or
character references, namely to use CDATA sections, and in fact this possibility
is demonstrated in the last example.
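The CDATA-section possibility can be sketched in Python as below. This is a minimal illustration, not text from the specification: the function names are hypothetical, and the element name `xqx:xquery` and the namespace URI `http://www.w3.org/2005/XQueryX` are assumptions about the trivial embedding. Because `]]>` would end a CDATA section early, each occurrence in the query is split across two adjacent sections.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the xqx prefix (XQueryX 1.0).
XQX_NS = "http://www.w3.org/2005/XQueryX"


def embed_with_cdata(query: str) -> str:
    """Hypothetical helper: embed the XQuery text inside a CDATA section.

    A CDATA section ends at the first "]]>", so each "]]>" in the query is
    split: "]]" closes the current section and ">" starts the next one.
    """
    body = query.replace("]]>", "]]]]><![CDATA[>")
    return f'<xqx:xquery xmlns:xqx="{XQX_NS}"><![CDATA[{body}]]></xqx:xquery>'


def recover_query(document: str) -> str:
    """Parse the embedding and return the text content of xqx:xquery."""
    return ET.fromstring(document).text or ""


if __name__ == "__main__":
    query = "<x><![CDATA[<]]></x>"
    embedded = embed_with_cdata(query)
    print(embedded)
    # The parser joins the adjacent CDATA sections back into the original text.
    assert recover_query(embedded) == query
```

The XQuery's own `<![CDATA[` needs no special treatment here, since only `]]>` terminates an XML CDATA section.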

It goes on to say:

  CDATA sections within an XQuery expression are embedded in the same form in
  which they appear in any XML document.

I am not at all sure what this is intended to mean. Perhaps it means that
XQuery CDATA sections are encoded as XML CDATA sections. If so, I think that is
completely wrong, and it would make this a not-so-trivial embedding. The trivial
embedding should take the XQuery text as plain text and embed it into XML using
the standard constructs for plain text in XML, without having to parse the
XQuery expression. (The plain-text-to-XML serialiser has to scan for <, > and &,
but it does not need to parse the expression.)
The XQuery <x><![CDATA[<]]></x> should be encoded as
<xqx:xquery>&lt;x&gt;&lt;![CDATA[&lt;]]&gt;&lt;/x&gt;</xqx:xquery>
not
<xqx:xquery>&lt;x&gt;<![CDATA[<]]>&lt;/x&gt;</xqx:xquery>
because the latter is an embedding of the XQuery
<x>&lt;</x>
which has the same run-time behaviour as the first expression but is a
different expression with a different parse tree.
It's important not to lose the fact that the CDATA section was in the XQuery:
although this example behaves the same if the CDATA section is replaced, in
other cases the behaviour may differ because of white space stripping (which
CDATA sections suppress).
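The plain-text embedding argued for above needs no XQuery parser, only a character-level scan. A minimal Python sketch, under the same assumptions (hypothetical function names; the `xqx` namespace URI is assumed), escapes `&`, `<` and `>` and reproduces the first encoding of `<x><![CDATA[<]]></x>` given above, with the closing tag written as `</xqx:xquery>` and a namespace declaration added.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the xqx prefix (XQueryX 1.0).
XQX_NS = "http://www.w3.org/2005/XQueryX"


def escape_plain_text(query: str) -> str:
    """Treat the query as literal text and escape it for XML element content.

    "&" is replaced first so that the "&" of the references produced for
    "<" and ">" is not escaped a second time. Escaping ">" also rules out
    the forbidden sequence "]]>" in the output.
    """
    return (
        query.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


def embed_escaped(query: str) -> str:
    """Hypothetical helper: wrap the escaped query in an xqx:xquery element.

    Assumes the query contains no carriage returns; an XML parser would
    normalize them to line feeds unless they were written as &#xD;.
    """
    return f'<xqx:xquery xmlns:xqx="{XQX_NS}">{escape_plain_text(query)}</xqx:xquery>'


def recover_query(document: str) -> str:
    """Parse the embedding and return the text content of xqx:xquery."""
    return ET.fromstring(document).text or ""
```

For `<x><![CDATA[<]]></x>`, `escape_plain_text` returns `&lt;x&gt;&lt;![CDATA[&lt;]]&gt;&lt;/x&gt;`, so the XQuery CDATA section stays visible as text instead of becoming an XML CDATA section, and parsing the result gives back the original query unchanged.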

  it is recommended that > always be "escaped" (for example, as &gt; or &#3E;).

There's a missing x in the hex character reference at the end of that sentence.
Comment 1 Jim Melton 2005-12-25 01:21:56 UTC
Your final argument re: white space behavior was persuasive to me, so I intend
to propose to the Working Groups that your suggestion be adopted. 

Thanks for catching the missing "x" in the hex character reference, too. 
Comment 2 Jim Melton 2006-01-11 17:59:42 UTC
The XML Query Working Group has considered your comment and agrees with the
problem that you described.

A solution has been developed and approved by the WG:

(1) Replace the fourth paragraph ("If the XQuery contains...") of section 5, 
A Trivial Embedding of XQuery, with:

****
XQuery expressions are, for the purposes of this trivial embedding, treated 
as literal text. Therefore, if the XQuery contains characters that are 
prohibited in XML text (specifically < and &), they must be "escaped" as 
character entity references (&lt; and &amp;, respectively) or as numeric 
character references (for example, &#x3C; and &#x26;, respectively), or 
they must be enclosed in a CDATA section (for example, <![CDATA[<]]> or 
<![CDATA[&]]>).  Note that this includes the leading "<" of a CDATA section 
that appears in the original XQuery expression.  In addition, because the 
sequence of characters "]]>" is always prohibited within a CDATA section, 
it is recommended that instances of > in the original XQuery always be 
"escaped" (for example, as &gt;, &#x3E;, or <![CDATA[>]]).
****

(2) In addition, in the sixth paragraph ("The following two more..."), delete 
the entire sentence that reads "CDATA sections within an XQuery expression 
are embedded in the same form in which they appear in any XML document."

This is the official WG response. 

Please let us know if you agree with this resolution of your issue, by adding a
comment to the issue record and changing the Status of the issue to Closed. Or,
if you do not agree with this resolution, please add a comment explaining why.
If you wish to appeal the WG's decision to the Director, then also change the
Status of the record to Reopened. If you wish to record your dissent, but do not
wish to appeal the decision to the Director, then change the Status of the
record to Closed. If we do not hear from you in the next two weeks, we will
assume you agree with the WG decision.
Comment 3 David Carlisle 2006-01-12 10:11:08 UTC
 In addition, because the 
  sequence of characters "]]>" is always prohibited within a CDATA section, 

This should say "within XML element content", not "within a CDATA section":
]]> is forbidden in all element content, not just inside a CDATA section.

So an XQuery of
<a x="]]>" />

can't be encoded as

<xqx:xquery>&lt;a x="]]>" /></xqx:xquery>

You have to escape the > (or the ]) as well.

David
Comment 4 Michael Kay 2006-01-12 10:31:32 UTC
It seems to me that we're tying ourselves in knots in this section by trying
to tell people in full gory detail how to write a serializer. All we need to say is:

In the trivial embedding, the string of Unicode characters making up the text of
an XQuery query forms the string-value of a text node, which itself is the only
child of an xqx:xquery element.

Note: when such an element is serialized, special characters such as < and &
must be escaped in the usual way. For example...

(But frankly, this section on trivial embedding isn't worth the paper it isn't
written on. We don't need a standard for how to represent a string of Unicode
characters in an XML document. If an XML spec such as XML Schema or XSLT decides
it wants to embed XQuery, it will probably do it in a different way anyway.)
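The approach in comment 4, letting an ordinary XML serializer do the escaping, can be sketched with Python's standard `xml.etree.ElementTree`. The helper name is hypothetical and the namespace URI is an assumption; the point is that the query is simply assigned as the element's text, and the library's serializer escapes `&`, `<` and `>` in text content.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the xqx prefix (XQueryX 1.0).
XQX_NS = "http://www.w3.org/2005/XQueryX"
ET.register_namespace("xqx", XQX_NS)  # serialize this namespace with prefix "xqx"


def embed_via_serializer(query: str) -> str:
    """Hypothetical helper: make the query the only text child of xqx:xquery.

    ElementTree escapes special characters in text content when serializing,
    so no hand-written escaping is needed here.
    """
    root = ET.Element(f"{{{XQX_NS}}}xquery")
    root.text = query
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    document = embed_via_serializer('<a x="]]>" />')
    print(document)
    # Parsing the serialized element yields the original query text.
    assert ET.fromstring(document).text == '<a x="]]>" />'
```

Since only the text node is specified, entity-escaped and CDATA-wrapped serializations of this element are equally valid; they parse to the same string.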
Comment 5 Jim Melton 2006-05-12 21:24:22 UTC
I just noticed that this comment has languished unCLOSED for some time.  I have accepted the correction provided in comment #3 (http://www.w3.org/Bugs/Public/show_bug.cgi?id=2611#c3) and made the requisite changes editorially. 

I have NOT, however, done anything with respect to comment #4 (http://www.w3.org/Bugs/Public/show_bug.cgi?id=2611#c4) and have no plans to do so unless directed by the XML Query WG to do so.  (Apologies, Mike!)

May we now mark this bug as CLOSED?
Comment 6 David Carlisle 2006-06-13 12:19:43 UTC
Closing this (although I agree with comment #4 that the trivial embedding feature should be dropped).