Bug 10176 - [SER] What does it mean to output an XML island as XML?
Status: CLOSED FIXED
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Serialization 1.0
Version: Recommendation
Hardware: All All
Importance: P2 normal
Assigned To: Henry Zongaro
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2010-07-15 16:06 UTC by Henry Zongaro
Modified: 2013-01-22 18:35 UTC (History)
Description Henry Zongaro 2010-07-15 16:06:23 UTC
Consider the following stylesheet.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:output method="html" />

<xsl:template match="/">
<html>
<body>
<my:p title="&lt;" xmlns:my="http://example.org">xml island</my:p>
<p title="&lt;">not an xml island</p>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

The first paragraph of section 7.1 of Serialization says, "An element whose expanded QName has a non-null namespace URI MUST be output as XML. This is known as an XML Island."[1]  The third item in the numbered list then says, "the generic rules for the HTML output method that apply to all elements and attributes, for example the rules for escaping special characters in the text and the rules for indentation, MUST be used also for namespaced elements and attributes."

Then section 7.2 says, "The HTML output method MUST NOT escape "<" characters occurring in attribute values."[2]

So, should the serialized result be

<html>
   <body>
      <my:p title="<" xmlns:my="http://example.org">xml island</my:p>
      <p title="<">not an xml island</p>
   </body>
</html>

or

<html>
   <body>
      <my:p title="&lt;" xmlns:my="http://example.org">xml island</my:p>
      <p title="<">not an xml island</p>
   </body>
</html>

The first requirement (that my:p be serialized "as XML") leads me to expect that < will be escaped, yielding the first result; the second and third requirements lead me to expect that < will not be escaped, yielding the second result.

[1] http://www.w3.org/TR/xslt-xquery-serialization/#HTML_MARKUP
[2] http://www.w3.org/TR/xslt-xquery-serialization/#HTML_ATTRIBS
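The two candidate serializations of the my:p start tag in the description above differ only in whether "<" is escaped inside the attribute value. The following is a minimal Python sketch of that difference, not an implementation of the Serialization spec; the function names `escape_attr` and `start_tag` and the `escape_lt` flag are hypothetical and exist only for illustration.

```python
def escape_attr(value, escape_lt):
    """Escape an attribute value; escape_lt is a hypothetical switch.

    '&' and '"' are escaped in both candidates. Only the treatment of
    '<' is the open question raised in this bug.
    """
    out = value.replace("&", "&amp;").replace('"', "&quot;")
    if escape_lt:
        out = out.replace("<", "&lt;")
    return out


def start_tag(name, attrs, escape_lt):
    """Build a start tag from a name and a list of (name, value) pairs."""
    parts = [name]
    for attr_name, attr_value in attrs:
        parts.append('%s="%s"' % (attr_name, escape_attr(attr_value, escape_lt)))
    return "<" + " ".join(parts) + ">"


attrs = [("title", "<"), ("xmlns:my", "http://example.org")]

# Candidate with "<" left unescaped, as section 7.2 requires for HTML.
unescaped = start_tag("my:p", attrs, escape_lt=False)

# Candidate with "<" escaped, as the XML output method would produce.
escaped = start_tag("my:p", attrs, escape_lt=True)

print(unescaped)
print(escaped)
```

The question in this bug is which value of the flag applies to an element in a non-null namespace; the sketch takes no position on that.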
Comment 1 Henry Zongaro 2011-05-24 14:42:34 UTC
The only thing that I can think is that the sentences, "An element whose expanded QName has a non-null namespace URI MUST be output as XML. This is known as an XML Island," were intended to mean that such elements are never recognized as HTML elements, even if their local names happen to be the same as those of HTML elements, and that the second result is the correct serialized result.
Comment 2 Henry Zongaro 2011-05-24 16:33:32 UTC
I wrote "the second result" in comment #1 where I meant "the first result."
Comment 3 Henry Zongaro 2011-07-27 13:30:07 UTC
The best that I've been able to determine is that the sentences "An element whose expanded QName has a non-null namespace URI MUST be output as XML. This is known as an XML Island," were meant to be a statement of intent, and that the rules that follow describe the actual rules in detail.  The behaviour of the implementations that I've tested seems to reflect that.

I propose changing those sentences to read "An element whose expanded QName has a non-null namespace URI might be serialized differently from an element that is in no namespace.  An element that has a non-null namespace is known as an XML Island."
Comment 4 Henry Zongaro 2011-07-27 19:36:38 UTC
Liam points out that an XML island is the piece of the serialized HTML result that is formatted as XML - the element node itself is not an XML island.  My proposed rewording in comment #3 implied the latter.

Taking that into account, I propose changing those sentences to read, "An element whose expanded QName has a non-null namespace URI might be serialized differently from an element that is in no namespace.  The portion of the serialized document representing the result of serializing such an element is known as an XML Island."
Comment 5 Henry Zongaro 2011-10-11 10:00:55 UTC
At the joint telecon of the XSLT and XQuery Working Groups,[3] the proposals of comment #4 were accepted, with suitable editorial reworking to eliminate the use of the word "might".  This will be erratum SE.E20.

The editor's final revised wording is to change the sentences in question to read, "As is described in detail below, the HTML output method will not output an element differently from the XML output method unless the expanded QName of the element has a null namespace URI. [Definition] The portion of the serialized document representing the result of serializing an element whose expanded QName does not have a null namespace URI is known as an XML Island."
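The condition in the revised wording can be read as a simple test on the element's namespace URI. The sketch below is one hedged reading of that sentence in Python; the function names are hypothetical, and treating both `None` and the empty string as a null namespace URI is an assumption of this sketch rather than something the erratum states.

```python
def may_differ_from_xml_output(namespace_uri):
    """True if the HTML output method may treat the element specially.

    Under the revised wording, that is possible only when the element's
    expanded QName has a null namespace URI. None and "" are both
    treated as null here (an assumption of this sketch).
    """
    return not namespace_uri


def produces_xml_island(namespace_uri):
    """True if serializing the element yields an XML Island.

    The island is the serialized portion of the output, not the element
    node itself; this predicate only says whether one is produced.
    """
    return bool(namespace_uri)


# The two elements from the stylesheet in the description.
print(produces_xml_island("http://example.org"))  # my:p
print(produces_xml_island(None))                  # p, in no namespace
```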

[3] http://lists.w3.org/Archives/Member/w3c-xsl-query/2011Sep/0250.html (Member-only link)