Bug 7823 - [SER] Description of escaping rules for script and style elements in HTML mode not clear
Status: CLOSED FIXED
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Serialization 1.0
2nd Edition Recommendation
PC Windows NT
P2 normal
Assigned To: Henry Zongaro
Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2009-10-06 17:45 UTC by Oliver Hallam
Modified: 2010-06-04 09:58 UTC (History)
Description Oliver Hallam 2009-10-06 17:45:18 UTC
The spec says "The HTML output method MUST NOT perform escaping for the content of the script and style elements."

It is not clear in this context what "escaping for the content" means.  If the content contains an element, are its attribute values escaped or not?

For example, how should the following query be serialized in HTML mode?

<script>
  document.write("<script bad="&quot;"/>&lt;<foo>&lt;&gt;</foo>&gt;</script>")
</script>

I assume this should be serialized as follows, with the "no escaping" only applying to text node children of script elements:

<script>
  document.write("<script bad="&quot;"/><<foo>&lt;&gt;</foo>></script>")
</script>

Am I right in this assumption?
Comment 1 Michael Dyck 2009-10-06 20:06:40 UTC
> For example, how should the following query be serialized in HTML mode?
> 
> <script>
>   document.write("<script bad="&quot;"/>&lt;<foo>&lt;&gt;</foo>&gt;</script>")
> </script>

When you say "query", do you mean that this should be interpreted as a query in XQuery? Because it appears to raise a syntax error. (The "</script>" on the second line ends the direct constructor, whereupon the '"' is illegal.)  
Comment 2 Michael Kay 2009-10-06 20:41:30 UTC
I suspect the intended example was

<script>
  document.write("<script bad="&quot;">&lt;<foo>&lt;&gt;</foo>&gt;</script>")
</script>

(Note the absence of "/" before the first ">" on line 2).

For information, Saxon serializes the result of this query (with method=html) as

<script>
  document.write("<script bad="&#34;"><<foo><></foo>></script>")
   </script>
Comment 3 Henry Zongaro 2009-11-27 15:43:21 UTC
The second example box of section 7.1 advises the reader against attempting to serialize a sequence in which a SCRIPT element will contain the character sequence "</". It then goes on to say that it is nevertheless possible to do so, either by using nested elements or by constructing text that will be serialized as start and end element tags in the content of a SCRIPT element.

Given that the example box indicates that it's possible to use elements in the content of a SCRIPT element, I believe the intent was that any attribute values and text nodes in the content of the SCRIPT would not be escaped - not just the text node children.

I propose the following edit to make this intent clear.  In the fourth paragraph of section 7.1, change "MUST NOT perform escaping for the content of the script and style elements." to "MUST NOT perform escaping for any text node descendant, nor for any attribute of an element node descendant, of a script or style element."

(For whatever it's worth, our implementation (WebSphere XML Feature Pack) produces output similar to that of Saxon.)
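
The wording proposed in comment 3 can be illustrated with a minimal sketch in Python. This is not code from any of the implementations mentioned in this report: the names serialize_html, escape_text, escape_attribute and RAW_CONTAINERS are hypothetical, the escaping functions are deliberately simplified, and void elements, comments and processing instructions are ignored. It only shows which nodes would be written without escaping under the proposed text, using the corrected example from comment 2.

```python
import xml.etree.ElementTree as ET

# Elements whose descendants the proposed wording exempts from escaping.
RAW_CONTAINERS = {"script", "style"}

# Corrected example from comment 2, written as well-formed XML.
EXAMPLE = '<script>document.write("<script bad="&quot;">&lt;<foo>&lt;&gt;</foo>&gt;</script>")</script>'


def escape_text(value):
    """Escape the characters that are special in text content (simplified)."""
    return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def escape_attribute(value):
    """Escape an attribute value written in double quotes (simplified)."""
    return value.replace("&", "&amp;").replace('"', "&quot;")


def serialize_html(elem, under_raw=False):
    """Serialize elem; under_raw is True when an ancestor is script or style."""
    # Attributes of an element that is a descendant of script/style are
    # written unescaped; attributes of script/style itself are still escaped.
    attrs = "".join(
        ' {}="{}"'.format(name, value if under_raw else escape_attribute(value))
        for name, value in elem.attrib.items()
    )
    # Text below this element is raw if the element is, or is inside, script/style.
    raw_below = under_raw or elem.tag.lower() in RAW_CONTAINERS
    parts = ["<{}{}>".format(elem.tag, attrs)]
    if elem.text:
        parts.append(elem.text if raw_below else escape_text(elem.text))
    for child in elem:
        parts.append(serialize_html(child, raw_below))
        if child.tail:
            # The tail is a text node child of elem, so elem's rule applies.
            parts.append(child.tail if raw_below else escape_text(child.tail))
    parts.append("</{}>".format(elem.tag))
    return "".join(parts)


print(serialize_html(ET.fromstring(EXAMPLE)))
```

Under this sketch the quotation mark in the bad attribute is written as-is, because the attribute belongs to an element that is a descendant of the outer script element.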
Comment 4 Oliver Hallam 2009-11-27 19:44:13 UTC
To keep in spirit with the rule, should escaping also be prevented for comment and processing instruction nodes that are descendants of the script or style element?
Comment 5 Henry Zongaro 2009-11-27 20:28:50 UTC
Escaping only applies to text nodes and attribute nodes - see steps 3.a.iii and 3.e of section 4.[1]  It's not possible to escape the characters in a comment or processing instruction in XML or HTML.

(To make it more apparent where escaping occurs, I think I will create a definition for the term "to escape" in the Serialization 1.1 draft, and add links where the term is used.)

[1] http://www.w3.org/TR/xslt-xquery-serialization/#serphases
Comment 6 Michael Kay 2009-12-03 17:28:43 UTC
Looking at the Saxon code, it is disabling escaping of text nodes at any depth beneath a script element, and it is not disabling escaping of attribute values.
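
The behaviour described in comment 6 (text nodes at any depth written raw, attribute values still escaped) can be sketched in Python as follows. This is an illustration only and is not taken from the Saxon source: to_html, write, esc_text, esc_attr and the raw_attributes flag are hypothetical names, and the use of "&#34;" for quotation marks is an assumption based on the output quoted in comment 2. Setting raw_attributes=True switches the sketch to the wording proposed in comment 3.

```python
import xml.etree.ElementTree as ET

RAW_CONTAINERS = {"script", "style"}

# Corrected example from comment 2, written as well-formed XML.
SOURCE = '<script>document.write("<script bad="&quot;">&lt;<foo>&lt;&gt;</foo>&gt;</script>")</script>'


def esc_text(s):
    """Simplified escaping for text content."""
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def esc_attr(s):
    """Simplified attribute escaping; "&#34;" mirrors the output in comment 2."""
    return s.replace("&", "&amp;").replace('"', "&#34;")


def write(elem, raw_attributes=False, in_raw=False):
    """Yield output fragments for elem.

    raw_attributes=False: attributes are always escaped (comment 6).
    raw_attributes=True: attributes of descendants of script/style are raw
    (the wording proposed in comment 3).
    """
    yield "<" + elem.tag
    for name, value in elem.attrib.items():
        keep_raw = in_raw and raw_attributes
        yield ' {}="{}"'.format(name, value if keep_raw else esc_attr(value))
    yield ">"
    # Text is raw at any depth beneath a script or style element.
    raw_text = in_raw or elem.tag.lower() in RAW_CONTAINERS
    if elem.text:
        yield elem.text if raw_text else esc_text(elem.text)
    for child in elem:
        yield from write(child, raw_attributes, raw_text)
        if child.tail:
            yield child.tail if raw_text else esc_text(child.tail)
    yield "</" + elem.tag + ">"


def to_html(elem, raw_attributes=False):
    """Join the fragments produced by write() into one string."""
    return "".join(write(elem, raw_attributes))


root = ET.fromstring(SOURCE)
print(to_html(root))                       # attributes escaped
print(to_html(root, raw_attributes=True))  # attributes raw below script
```

Apart from whitespace, the first call is meant to reproduce the shape of the output quoted in comment 2; the two calls differ only in how the bad attribute is written.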
Comment 7 Henry Zongaro 2009-12-03 20:03:55 UTC
WebSphere XML Feature Pack is doing the same thing as Saxon.  At the XSL WG call of 2009-12-03, the working group suggested that we should look for some real use cases to help determine the best approach to resolving this bug report.
Comment 8 Henry Zongaro 2010-04-05 20:27:42 UTC
I haven't been able to find any really compelling use case that would help us solve this one.

However, my understanding is that although HTML 4.01 did not permit the character sequence "</" to appear inside a script element,[2] in practice browsers checked for the sequence "</script" instead.  The current HTML 5.0 draft requires a "</script>" tag to terminate a script element.[3]

So, imagine one had this XSLT fragment:

<xsl:template match="/">
  <script type="text/xquery">
    { Mark up your ampersands this way &amp;amp;amp; }
  </script>
</xsl:template>

The current rules for the html output method require this to be serialized as

<script type="text/xquery">
  { Mark up your ampersands this way &amp;amp; }
</script>

From what I've read, it's quite likely that browsers today would be able to handle a nested element, and that they will continue to do so to support HTML5.  So if one had this XSLT fragment instead

<xsl:template match="/">
  <script type="text/xquery">
    { Mark up your ampersands this way <em>&amp;amp;amp;</em> }
  </script>
</xsl:template>

I think it would be quite reasonable to expect all descendant text and attribute nodes in the content of script (and style) elements to be handled in the same way with respect to escaping, not just the text node children of such elements, producing this:

<script type="text/xquery">
  { Mark up your ampersands this way <em>&amp;amp;</em> }
</script>

I'd like to put forth the proposal I made in comment #3 as the resolution to this bug report.

[2] http://www.w3.org/TR/html401/appendix/notes.html#h-B.3.2
[3] http://www.w3.org/TR/2010/WD-html5-20100304/syntax.html#script-data-end-tag-name-state
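
The difference between the two end-of-script rules mentioned in comment 8 can be sketched in Python. Both functions are simplifications written for this illustration (the names end_per_html401 and end_per_browsers are hypothetical): the first treats any "</" as the end of the script content, as comment 8 describes for HTML 4.01, and the second looks for "</script" case-insensitively, which approximates the browser behaviour described there rather than implementing the HTML5 tokenizer.

```python
import re

# Case-insensitive match for the start of a script end tag.
_SCRIPT_END = re.compile(r"</script", re.IGNORECASE)


def end_per_html401(content):
    """Return the index of the first "</" in content, or None if absent."""
    index = content.find("</")
    return None if index < 0 else index


def end_per_browsers(content):
    """Return the index of the first "</script" in content, or None if absent."""
    match = _SCRIPT_END.search(content)
    return match.start() if match else None


# Script content from the nested-element example in comment 8.
body = "{ Mark up your ampersands this way <em>&amp;amp;</em> }"

print(end_per_html401(body))   # position of "</em>"
print(end_per_browsers(body))  # None: no "</script" in the content
```

With the nested em element, the strict rule would cut the script content short at "</em>", while the lenient rule leaves it intact, which is why a nested element is expected to survive in practice.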
Comment 9 Henry Zongaro 2010-05-26 13:59:30 UTC
I neglected to report that at the joint teleconference of the XSL and XQuery Working Groups of 6 April, 2010,[4] the proposal made in comment #3 was adopted as the resolution of this bug report.

As few members of the XSL WG were present, I will take this back to that working group for ratification.

[4] http://lists.w3.org/Archives/Member/w3c-xsl-query/2010Apr/0115.html (Member-only link)
Comment 10 Henry Zongaro 2010-06-03 21:09:09 UTC
At its teleconference of 3 June 2010,[5] the XSL Working Group ratified the decision to adopt the proposal made in comment #3.  This will be Serialization erratum SE.E16.

[5] http://lists.w3.org/Archives/Member/w3c-xsl-wg/2010Jun/0011.html (Member-only link)
Comment 11 Henry Zongaro 2010-06-03 21:25:23 UTC
Oliver, as the working groups have decided on this issue, I am marking this bug as RESOLVED/FIXED.  If you agree with the decision, could I ask you to mark the bug as CLOSED?  If you disagree, please REOPEN the bug report.
Comment 12 Oliver Hallam 2010-06-04 09:58:34 UTC
I agree with this resolution, and have marked the bug CLOSED.