<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>22436</bug_id>
          
          <creation_ts>2013-06-24 17:02:20 +0000</creation_ts>
          <short_desc>Give rules for content that is treated as text under a common heading</short_desc>
          <delta_ts>2013-09-02 04:14:15 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML WG</product>
          <component>HTML/XHTML Compatibility Authoring Guide (ed: Eliot Graff)</component>
          <version>unspecified</version>
          <rep_platform>PC</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc>http://dev.w3.org/html5/html-xhtml-author-guide/</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Leif Halvard Silli">xn--mlform-iua</reporter>
          <assigned_to name="Leif Halvard Silli">xn--mlform-iua</assigned_to>
          <cc>eliotgra</cc>
    
    <cc>mike</cc>
    
    <cc>public-html-admin</cc>
    
    <cc>public-html-wg-issue-tracking</cc>
    
    <cc>xn--mlform-iua</cc>
          
          <qa_contact name="HTML WG Bugzilla archive list">public-html-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>89762</commentid>
    <comment_count>0</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2013-06-24 17:02:20 +0000</bug_when>
    <thetext>See the thread “During HTML parsing, are *all* named character references replaced by their corresponding glyph?”, and in particular this answer from Michael:

http://www.w3.org/mid/20130624113437.GB37583@sideshowbarker

What Michael said is easy to forget, so I think this subject needs a little more description in Polyglot Markup. Right now, only &lt;script&gt; and &lt;style&gt; are covered, along with &lt;noscript&gt;.

I would propose to

  a) add a section that describes the general issue of content
     that, unlike in XML, is treated as text by the HTML parser.
     Motivation: This is an important and general gotcha and
     difference, both within pure HTML and especially when
     creating polyglots.

  b) In practice, this means listing all the elements
     that themselves, or whose children, are treated
     as text by the HTML parser. (This includes
     all elements that begin with the string “&lt;no”, such
     as &lt;noscript&gt; and &lt;noframes&gt;, as well as &lt;script&gt;,
     &lt;style&gt;, &lt;xmp&gt;, &lt;iframe&gt; and perhaps some more (?).)

     NB: It may also make sense to mention, in a note,
         that the “sane” elements, such as &lt;object&gt;,
         &lt;video&gt; etc., are not treated that way.

  c) The section should give the various usage rules
     under this heading: some elements are forbidden,
     etc., while others have special rules for polyglots.
     (Thus, the script/style rules should go there,
     or at least be represented with a link to the
     section where their rules are described.)

Btw, note that HTML5 already says that the content of iframe must be empty in XML, so describing iframe should be a no-brainer. See http://www.w3.org/TR/html5/embedded-content-0.html#iframe-content-model
And HTML5 has similar things to say about most, if not all, of these elements, so it is mostly a collection job.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>89876</commentid>
    <comment_count>1</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2013-06-26 10:15:32 +0000</bug_when>
    <thetext>&lt;title&gt; is among these elements.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>89877</commentid>
    <comment_count>2</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2013-06-26 10:42:40 +0000</bug_when>
    <thetext>(In reply to comment #1)
&gt; &lt;title&gt; is among these elements.

More data: 

&lt;title&gt; falls under the 
  &quot;generic RCDATA element parsing algorithm&quot; 
which means that character entities/references are still handled but that tags (other than the end tag of the element itself) are ignored.

By contrast, &lt;iframe&gt;, for example, falls under the 
  &quot;generic raw text element parsing algorithm&quot;
which means that both tags (except for the end tag) and character entities/references are ignored.

see: http://www.w3.org/html/wg/drafts/html/master/syntax.html#generic-rcdata-element-parsing-algorithm</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>90126</commentid>
    <comment_count>3</comment_count>
    <who name="Eliot Graff">eliotgra</who>
    <bug_when>2013-07-02 19:18:37 +0000</bug_when>
    <thetext>This sounds good, Leif. Can you create proposed text for this?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>92803</commentid>
    <comment_count>4</comment_count>
    <who name="Leif Halvard Silli">xn--mlform-iua</who>
    <bug_when>2013-09-02 04:14:15 +0000</bug_when>
    <thetext>I have just committed a fix for this bug.

However, for polyglot markup, only script, style, iframe and title are relevant, unless I missed something.

Hopefully this can now be closed, but I will look at it once more first.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>