This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 13604 - CDATA sections are not allowed except in foreign content
Summary: CDATA sections are not allowed except in foreign content
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: LC1 HTML/XHTML Compatibility Authoring Guide (ed: Eliot Graff)
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Leif Halvard Silli
QA Contact: HTML WG Bugzilla archive list
URL: http://dev.w3.org/html5/html-polyglot...
Whiteboard:
Keywords:
Depends on: 23593
Blocks:
 
Reported: 2011-08-03 12:52 UTC by Philippe Le Hegaret
Modified: 2013-11-02 11:12 UTC
CC List: 7 users

See Also:


Description Philippe Le Hegaret 2011-08-03 12:52:36 UTC
The HTML5 spec is clear that "CDATA sections can only be used in foreign content (MathML or SVG)." [1].

However, the polyglot spec is mostly silent on them. Tidy generates CDATA sections for inline style and script when it outputs XHTML. It would be good to state explicitly that those CDATA markers must not be used.

[1] http://www.w3.org/TR/html5/syntax.html#cdata-sections
Comment 1 David Carlisle 2011-08-03 20:20:24 UTC
(In reply to comment #0)
> The HTML5 spec is clear that "CDATA sections can only be used in foreign
> content (MathML or SVG)." [1].
> 
> However, the polyglot spec is mostly silent on them. Tidy generates CDATA
> sections for inline style and script when it outputs XHTML. It would be good to
> state explicitly that those CDATA markers must not be used.
> 
> [1] http://www.w3.org/TR/html5/syntax.html#cdata-sections

The //<![CDATA[ markup doesn't generate a CDATA section if used in an HTML script element (as < doesn't start markup in that context), so that usage doesn't contradict the fact that HTML doesn't allow CDATA sections except in foreign content.

Like using line breaks in SVG attributes, this usage causes a difference between the HTML and XML DOMs, but the difference is largely cosmetic: the first line of the script, which contains a JavaScript comment, has either an empty comment (XML parsing) or a comment with the characters <![CDATA[ (HTML parsing).

David
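The difference described in comment 1 can be reproduced with a short sketch. It assumes Python's standard-library html.parser and xml.etree.ElementTree as stand-ins for a browser's HTML and XML parsers; the sample document and the helper names (ScriptText, script_text_html, script_text_xml) are hypothetical and not taken from this bug report.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical sample document using the commented-out CDATA idiom.
DOC = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
    '<script>//<![CDATA[\nvar ok = 1;\n//]]></script>'
    '</head><body></body></html>'
)


class ScriptText(HTMLParser):
    """Collect the raw text found inside script elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


def script_text_html(doc):
    """Script content as seen by an HTML parser (raw text, no CDATA sections)."""
    parser = ScriptText()
    parser.feed(doc)
    parser.close()
    return "".join(parser.chunks)


def script_text_xml(doc):
    """Script content as seen by an XML parser (CDATA markers are consumed)."""
    root = ET.fromstring(doc)
    return root.find(f"{XHTML_NS}head/{XHTML_NS}script").text


print(repr(script_text_html(DOC)))
print(repr(script_text_xml(DOC)))
```

In the XML result the first line of the script is the bare comment marker //, while the HTML result keeps <![CDATA[ as part of the comment text, so the script itself behaves the same either way.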
Comment 2 Philippe Le Hegaret 2011-08-03 20:34:35 UTC
I believe you're correct, but it would be nice to clarify section 9.2 In-line Script and Style then. At the moment, it says that polyglot must use safe content, but it doesn't mention anything at all about CDATA sections.
Comment 3 David Carlisle 2011-08-03 20:43:52 UTC
(In reply to comment #2)
> I believe you're correct, but it would be nice to clarify section 9.2 In-line
> Script and Style then. At the moment, it says that polyglot must use safe
> content, but it doesn't mention anything at all about CDATA sections.

That's consistent with the currently expressed aim that polyglot documents produce identical DOMs. For many purposes, that is a stricter requirement than necessary.

There's nothing wrong with using the 
//<![CDATA[
idiom, but unless the polyglot spec weakens its aim to a "compatible" DOM, for some definition of compatible, it is right to say that the script should not contain a < (so it can't contain <![CDATA[, and there is no need to say anything further about CDATA sections).

Specifying the requirements for identical DOMs is probably the right thing for the spec to do; the conditions under which non-identical DOMs are OK are probably harder to specify in any generic way.

David
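The rule stated in comment 3 (no < in the script) can be checked mechanically. This is a minimal sketch: the function names are hypothetical, and including & in the default set is an added assumption, based on the fact that XML treats & as the start of a reference while HTML script content is raw text.

```python
def unsafe_characters(text: str, forbidden: str = "<&") -> list[tuple[int, str]]:
    """Return (offset, character) pairs for characters an XML parser and an
    HTML parser would treat differently inside a script or style element.

    The default forbidden set is an assumption for illustration only.
    """
    return [(i, ch) for i, ch in enumerate(text) if ch in forbidden]


def is_safe_text(text: str) -> bool:
    """True when the text contains none of the forbidden characters."""
    return not unsafe_characters(text)


print(is_safe_text("var ok = 1;"))
print(unsafe_characters("if (a < b) {}"))
print(is_safe_text("//<![CDATA[\nvar ok = 1;\n//]]>"))
```

Under this check the commented-out CDATA idiom is itself rejected, because the marker contains a <, which matches the reasoning in comment 3.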
Comment 4 Philippe Le Hegaret 2011-08-03 21:17:27 UTC
(In reply to comment #3)
> There's nothing wrong with using the 
> //<![CDATA[
> idiom, but unless the polyglot spec weakens its aim to a "compatible" DOM,
> for some definition of compatible, it is right to say that the script should
> not contain a < (so it can't contain <![CDATA[, and there is no need to say
> anything further about CDATA sections).

Well, I still believe that being explicit would help the authors out there. There is nothing wrong with using CDATA in XHTML but, in Polyglot, it shouldn't be used since it won't produce identical DOMs.
Comment 5 Michael[tm] Smith 2011-08-04 05:07:22 UTC
mass-move component to LC1
Comment 6 Michael[tm] Smith 2011-08-04 05:07:41 UTC
mass-move component to LC1
Comment 7 Henri Sivonen 2011-08-04 07:47:30 UTC
(In reply to comment #4)
> Well, I still believe that being explicit would help the authors out there.
> There is nothing wrong with using CDATA in XHTML but, in Polyglot, it
> shouldn't be used since it won't produce identical DOMs.

If you desugar the DOMs, they are equivalent, though. The problem with talking about "the DOM" as a shorthand for the document tree is that the DOM has some domain modeling errors--particularly exposing the CDATA syntactic sugar in the data model.

Note that there's an ongoing attempt to remove this domain modeling error from the Web DOM: https://bugzilla.mozilla.org/show_bug.cgi?id=660660
Comment 8 Henri Sivonen 2011-08-05 13:26:24 UTC
For clarity, my comment was about text in SVG or MathML parents. Polyglot docs clearly can't use CDATA sections with HTML parents.
Comment 9 Henri Sivonen 2011-08-05 13:27:55 UTC
In fact, comment 7 is irrelevant on this bug report.
Comment 10 Leif Halvard Silli 2013-10-31 00:36:44 UTC
(In reply to Philippe Le Hegaret from comment #0)
> The HTML5 spec is clear that "CDATA sections can only be used in foreign
> content (MathML or SVG)." [1].
> 
> However, the polyglot spec is mostly silent on them. Tidy generates CDATA
> sections for inline style and script when it outputs XHTML. It would be good
> to state explicitly that those CDATA markers must not be used.
> 
> [1] http://www.w3.org/TR/html5/syntax.html#cdata-sections

Since this bug was opened, something has changed: Polyglot Markup now has a section on raw text elements which first speaks about 'safe text' and thereafter about 'safe CDATA': 

http://www.w3.org/TR/html-polyglot/#raw-text-elements

3.6.2 Raw text elements (script and style)
    3.6.2.1 The safe text content option
    3.6.2.2 The safe CDATA option

Thus it is possible that this bug is already fixed, in principle. However, bug 23593 might lead to some changes for the <script> element.
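
For illustration, the commented-out CDATA idiom discussed earlier in this bug can be generated as follows. This is only a sketch: wrap_script_cdata is a hypothetical helper, the two checks are assumptions about what would break the wrapper, and the authoritative rules are in the section of the polyglot spec linked above.

```python
import xml.etree.ElementTree as ET


def wrap_script_cdata(js: str) -> str:
    """Wrap JavaScript in a CDATA section whose markers sit in // comments.

    Hypothetical helper; the checks below are illustrative assumptions.
    """
    if "]]>" in js:
        # An XML parser would end the CDATA section at this point.
        raise ValueError("script contains ']]>'")
    if "</script" in js.lower():
        # An HTML parser would end the script element at this point.
        raise ValueError("script contains '</script'")
    return "//<![CDATA[\n" + js + "\n//]]>"


source = 'if (a < b && c) { run("x"); }'
element = "<script>" + wrap_script_cdata(source) + "</script>"

# An XML parser accepts the < and & because they are inside the CDATA section.
xml_text = ET.fromstring(element).text
print(repr(xml_text))
```

A JavaScript expression such as a[b[0]]>1 contains ]]> and is rejected by the first check, which is one reason the wrapper cannot be applied blindly.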
Comment 11 Leif Halvard Silli 2013-11-02 11:12:11 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the Editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this document:


   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: Referred to polyglot’s rules for safe CDATA.
Rationale: Concurred with the bug filer.

Checked in:

http://dev.w3.org/cvsweb/html5/html-polyglot/html-polyglot.html?rev=1.14

Comment: As noted, the spec already defines safe CDATA. However, I described some of the differences between CDATA in script/style and CDATA in foreign content (including the link that Philippe included in comment #0) in polyglot’s section on when to use (named) entities.