Bug 11904: <plaintext> and <xmp> in Polyglot Markup
Opened: 2011-01-28 11:50:51 +0000
Last modified: 2011-08-04 05:07:40 +0000
Classification: Unclassified
Product: HTML WG
Component: LC1 HTML/XHTML Compatibility Authoring Guide (ed: Eliot Graff)
Version: unspecified
Platform: PC, OS: All
Status: CLOSED FIXED
URL: http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html#elements-that-cannot-contain-special-characters
Priority: P2
Severity: major
Reporter: xn--mlform-iua
Assigned to: eliotgra
CC: eliotgra, mike, public-html-admin, public-html-wg-issue-tracking, shadow2531, xn--mlform-iua
QA contact: public-html-bugzilla

Comment 0 (id 44826), xn--mlform-iua, 2011-01-28 11:50:51 +0000

The draft text on plaintext and xmp should be deleted:

]] Due to the conflict between parsing rules between HTML and XML, polyglot markup uses the following elements only if they do not contain angled brackets ("<" or ">") or ampersands ("&").[[

ISSUES:

(1) plaintext/xmp are forbidden in HTML5, so how do they belong in this draft? (Needs separate bug too.)

According to Henri Sivonen, the Polyglot spec should only describe a subset of XML 1.0 and HTML5. But which subset? Is it the valid subset? Or the valid and well-formed subset? Or perhaps the DOM-equal subset? Or the valid and well-formed DOM-equal subset?

Example: When you say that polyglot markup *requires* <colgroup/>, then we are outside both validity and well-formedness; we are in "equality" land. The same goes for <xmp> and <plaintext>: the emphasis, as long as you discuss them at all, is on equality, and not on validity or well-formedness.

This question requires a separate bug, but I want to mention it here anyhow. In my view, Polyglot Markup should describe the HTML5-valid (and perhaps also XML 1.0-valid), XML 1.0-well-formed, DOM-equal subset of HTML5. For that reason, plaintext and xmp do not belong in Polyglot Markup, as they are not permitted in HTML5.

(2) For <plaintext>, can conflicting parsing rules ever be avoided? No!
PLAINTEXT EXAMPLE: <plaintext></plaintext>

An HTML parser will display the characters "</plaintext>" to the user. Thus it seems to me that if parsing rules are the justification, then <plaintext> must not be used in polyglot documents, as it is not possible to use it in polyglots without running into differences caused by conflicting parsing rules. (Exception: <iframe><plaintext/></iframe>. But then we should also say that, for example, "<p/><p></p>" should be permitted, as it is the same issue: "<p/>" works fine as long as it is empty and a new block element follows immediately after. Plus, we are then outside the syntax that HTML5 permits.)

(3) For <xmp>, can conflicting parsing rules ever be avoided? Only as long as the author avoids any child elements and NCRs. Thus, practically speaking, no!

XMP EXAMPLE: <xmp><p>&#229;</p></xmp>

An HTML parser will render the content of xmp literally, as code. This is impossible to replicate in XML unless one uses <![CDATA[ ]]>. However, if one places a <![CDATA[ ]]> inside, then the HTML parser will render those characters literally as well.

As for what the specification draft says: normally one would not say that the XMP example "contains" "<", ">" or "&". Instead, it contains a <p> element and an NCR. And it is, eventually, child elements and NCRs that need to be forbidden inside an xmp element that occurs in a polyglot document.

(4) No need to escape the *characters* <>&. (Needs separate bug too.)

From XML's point of view, there is nothing special about "<", ">" and "&" inside xmp and plaintext: in all XML documents, "<" and "&" must, in general, always be escaped. Thus they cannot occur unescaped inside xmp/plaintext or anywhere else. And, as long as they are escaped, ">" does not constitute a problem, as far as I can see. Thus, nothing special needs to be said about "<", ">" or "&" inside xmp/plaintext. Instead, it needs to be said that xmp cannot contain elements or NCRs; see (3) above.
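The difference described in (2) and (3) can be illustrated with a small Python sketch. It is only a rough model, not a real HTML parser: the helper names (html_text_view, xml_view, same_in_both) are hypothetical, the HTML side is approximated with a regular expression that treats <xmp> content as literal text up to the first </xmp> and <plaintext> content as literal text up to the end of the input, and the å in the XMP example is assumed to stand for a numeric character reference such as &#229;. The XML side uses the standard library's xml.etree.ElementTree.

```python
import re
import xml.etree.ElementTree as ET


def html_text_view(source, tag):
    """Simplified model of how an HTML parser reads <xmp> or <plaintext>.

    <xmp> content is literal text up to the first </xmp>;
    <plaintext> content is literal text up to the end of the input.
    """
    start = re.search(rf"<{re.escape(tag)}>", source, re.IGNORECASE)
    if start is None:
        raise ValueError(f"no <{tag}> start tag found")
    rest = source[start.end():]
    if tag.lower() == "plaintext":
        return rest  # an end tag is never recognized after <plaintext>
    end = re.search(rf"</{re.escape(tag)}>", rest, re.IGNORECASE)
    return rest if end is None else rest[:end.start()]


def xml_view(source):
    """Return (text content, number of child elements) as an XML parser sees the root."""
    root = ET.fromstring(source)
    return "".join(root.itertext()), len(root)


def same_in_both(source, tag):
    """True if both views agree: no child elements and identical text."""
    xml_text, child_count = xml_view(source)
    return child_count == 0 and html_text_view(source, tag) == xml_text


if __name__ == "__main__":
    xmp = "<xmp><p>&#229;</p></xmp>"
    print(repr(html_text_view(xmp, "xmp")))  # literal markup, NCR not decoded
    print(xml_view(xmp))                     # decoded text plus one child element
    print(same_in_both(xmp, "xmp"))          # False
    print(same_in_both("<xmp>plain text</xmp>", "xmp"))            # True
    print(same_in_both("<plaintext></plaintext>", "plaintext"))    # False
```

Under this model, an xmp element agrees in both views only when it has no child elements and no character references, which matches the restriction proposed in (3), while <plaintext></plaintext> never agrees because the HTML view keeps "</plaintext>" as text.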
CONCLUSION: Delete the entire section. Or, alternatively, say that <plaintext> MUST NOT be used but that <xmp> can be used provided that it has no children and no NCRs.

Comment 1 (id 45426), eliotgra, 2011-02-12 00:44:29 +0000

In the Editor's Draft of 11 February 2011, I have deleted section 6.5.2 about <plaintext> and <xmp> in Polyglot Markup, as they are, indeed, deprecated in HTML5. Thank you so very much for catching this.

Eliot

Comment 2 (id 45449), xn--mlform-iua, 2011-02-13 17:58:03 +0000

Fine. Satisfied. I believe I should then close this bug.

Comment 3 (id 53175), mike, 2011-08-04 05:07:20 +0000

mass-move component to LC1

Comment 4 (id 53208), mike, 2011-08-04 05:07:40 +0000

mass-move component to LC1