This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 14041 - inconsistent definitions of safe content for scripts.
Summary: inconsistent definitions of safe content for scripts.
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: HTML/XHTML Compatibility Authoring Guide (ed: Eliot Graff)
Version: unspecified
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Eliot Graff
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-09-06 08:32 UTC by David Carlisle
Modified: 2013-04-24 00:56 UTC
CC List: 5 users

See Also:


Attachments

Description David Carlisle 2011-09-06 08:32:34 UTC
Informally, "safe content" is content that you can put in a script (or style) element in a polyglot document; conversely, content that is not safe should be placed in an external file and referenced.

However

http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html#external-script-and-style

says

> Polyglot markup uses external scripts if that document's script or style sheet uses < or & or ]]> or --. 

The restriction on -- is not needed: <script> a-- </script> parses the same way in XML and HTML. Its inclusion appears to be related to the side comment about not using <!-- comments in scripts, but its inclusion in the list of strings that force the use of external files appears to be bogus.
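A quick way to spot-check the claim about -- is to parse the same snippet with an XML parser and an HTML parser and compare the text each reports for the element. The sketch below uses only Python's standard library (xml.etree.ElementTree for XML, html.parser.HTMLParser for HTML); ScriptTextCollector, html_text and xml_text are hypothetical helper names, and html.parser is not a full HTML5 tree builder, so treat this as an illustration rather than a conformance test.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class ScriptTextCollector(HTMLParser):
    """Hypothetical helper: records the text html.parser reports inside <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.inside = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.inside = False

    def handle_data(self, data):
        # The parser may deliver text in several chunks, so collect them all.
        if self.inside:
            self.chunks.append(data)


def html_text(markup):
    collector = ScriptTextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


def xml_text(markup):
    # itertext() concatenates all character data of the parsed element.
    return "".join(ET.fromstring(markup).itertext())


for snippet in ("<script> a-- </script>", "<script> x = a-->b; </script>"):
    print(repr(html_text(snippet)), repr(xml_text(snippet)))
```

For both snippets the two parsers should report the same string, which is consistent with -- (and even -->) not needing to be banned in this context.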




Conversely, the following section

http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html#in-line-script-and-style

says

Safe content is content that does not contain a < or & character.

Here, despite what the previous section says, there is (correctly) no ban on -- and (incorrectly) no ban on ]]>.

Proposal:

Take the definition of "safe content" out of 9.1 and place it into section 9 immediately before 9.1 and 9.2 so both can reference it.

Then 9.1 can say that scripts _must_ use an external reference if they contain unsafe content, and 9.2 can say that scripts may be inline if they contain only safe content.

As a definition of "safe content" I think

Content is not "safe" if it contains (after any xml or html entity or character references are expanded) the characters < or & or the substring ]]>
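The proposed definition translates fairly directly into a check on the script's text, and the 9.1/9.2 split above into a choice between inline and external markup. The sketch below is in Python and assumes the text is the character data you intend the script to contain (no references left to expand); is_safe and script_element are hypothetical helper names, and external_url is an assumed placeholder for wherever the external file would be served from.

```python
# The three strings named in the proposed definition of "safe content".
UNSAFE_STRINGS = ("<", "&", "]]>")


def is_safe(text):
    """Hypothetical helper: True if text contains none of <, & or ]]>."""
    return not any(s in text for s in UNSAFE_STRINGS)


def script_element(code, external_url):
    """Hypothetical helper: inline the script when it is safe,
    otherwise emit a reference to an external file.

    The caller is assumed to write `code` to `external_url` separately;
    this function only decides which markup to produce.
    """
    if is_safe(code):
        return "<script>" + code + "</script>"
    return '<script src="' + external_url + '"></script>'


print(script_element("var n = 1;", "inline.js"))
print(script_element("if (a < b) f();", "compare.js"))
```

Note that ]]> can turn up in ordinary code: x[y[0]]>1 contains no < or &, yet is_safe rejects it, which is the case the in-line section's definition misses.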
Comment 1 Leif Halvard Silli 2011-09-06 16:56:43 UTC
(In reply to comment #0)

I agree w.r.t. '--'. The situations in which '--' (and '-->') inside <script>/<style> is potentially harmful are already non-conforming in HTML5 itself, so such content is "unsafe" (in some sense) even in HTML5. Therefore I agree that it does not make sense to mention '--' in *this* definition of "unsafe". But perhaps 'unsafe' is not the most telling word. How about simply 'not polyglot'?

   ...snip...
> As a definition of "safe content" I think
> 
> Content is not "safe" if it contains (after any xml or html entity or character
> references are expanded) the characters < or & or the substring ]]>

The phrase "after any xml or html entity or character references are expanded" is quite confusing. It is clear that XML's "expansionism" is the reason why there is a problem. However, it sounds, for instance, as if you are saying that ]]&gt; is dangerous. And it sounds as if it is somehow possible to avoid expansion in XML; is it? I would like to propose the following, as more hands-on and correct:

   NEW DEFINITION PROPOSAL:
"""
   A <script> or <style> is not considered polyglot (that is:
   the XML interpretation will differ from the HTML
   interpretation) if it contains:
      1) any <  (this would begin a tag in XML only)
      2) any &  (this would begin a reference/entity in XML only)
      3) any ]]> (this would be seen as a CDATA end in XML only)
    NOTE:
     * Point 1) means that '<!--'   and '<![CDATA[' inside
       script and style are not polyglot.
     * Point 2) means that HTML entities, XML entities and
       character references inside script and style are not
       considered polyglot.
"""
Comment 2 Leif Halvard Silli 2011-09-06 19:25:21 UTC
(In reply to comment #0)
* A (more) positive definition compared to the one in comment #1. 
* Instead of 'safe content'/'[not] polyglot' => '[un]ambiguous code/content'.
   NOTE: 'safe' gives the wrong connotations; it is
              reminiscent of the vague rules of Appendix C.

"""
9.x Unambiguous content in <script> and <style>

   Except for the well-defined exceptions (e.g. xml:lang="foo"),
   ambiguous strings (strings that XML interprets differently from
   HTML and vice-versa) are not used in Polyglot Markup. For the
   content of <script> and <style> this means that the following
   strings MUST NOT occur:
      1) '<' - because XML sees it as a tag/comment/CDATA starter
          even inside <script>/<style>. As a consequence, '<!--'
          and '<![CDATA[' may not occur in the content of polyglot
          <script>/<style> elements.
      2) '&' - because XML sees it as a reference/entity starter even
          inside <script>/<style>. As a consequence, HTML entities,
          XML entities and character references may not occur in
          the content of polyglot <script>/<style> elements.
      3) ']]>' - because XML sees it as a CDATA end mark.
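    NOTE: When necessary, a possible workaround might be to
    include the properly escaped code inside the @src attribute
    of <script> (e.g. as a data: URL). <style> has no @src
    attribute, so a style sheet would instead go in an external
    file referenced by <link rel="stylesheet">.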
"""
Comment 3 Eliot Graff 2013-04-24 00:56:55 UTC
    EDITOR'S RESPONSE: This is an Editor's Response to your comment. If
    you are satisfied with this response, please change the state of
    this bug to CLOSED. If you have additional information and would
    like the Editor to reconsider, please reopen this bug. If you would
    like to escalate the issue to the full HTML Working Group, please
    add the TrackerRequest keyword to this bug, and suggest title and
    text for the Tracker Issue; or you may create a Tracker Issue
    yourself, if you are able to do so. For more details, see this
    document:

       http://dev.w3.org/html5/decision-policy/decision-policy.html

    Status: Accepted
    Change Description: Changed Section 9, Script and Style, as requested in these comments.
    Rationale: This change defines "ambiguous strings" and clarifies the roles of these characters in polyglot markup. 

new revision: 1.98; previous revision: 1.97