11904 – <plaintext> and <xmp> in Polyglot Markup

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.


Summary: <plaintext> and <xmp> in Polyglot Markup

Status:	CLOSED FIXED

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	LC1 HTML/XHTML Compatibility Authoring Guide (ed: Eliot Graff)
Version:	unspecified
Hardware:	PC All

Importance:	P2 major
Target Milestone:	---
Assignee:	Eliot Graff
QA Contact:	HTML WG Bugzilla archive list

URL:	http://dev.w3.org/html5/html-xhtml-au...
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2011-01-28 11:50 UTC by Leif Halvard Silli
Modified:	2011-08-04 05:07 UTC
CC List:	6 users

See Also:

Attachments

Description Leif Halvard Silli 2011-01-28 11:50:51 UTC

The draft text on plaintext and xmp should be deleted:

 ]] Due to the conflict between parsing rules between HTML and XML, polyglot markup uses the following elements only if they do not contain angled brackets ("<" or ">") or ampersands ("&").[[

ISSUES:

(1) plaintext/xmp are forbidden in HTML5, so why do they appear in this draft at all? (Needs a separate bug too.)

   According to Henri Sivonen, the Polyglot spec should only describe a subset of XML 1.0 and HTML5. But which subset? Is it the valid subset? The valid and well-formed subset? Perhaps the DOM-equal subset? Or the valid, well-formed and DOM-equal subset? Example: when you say that polyglot markup *requires* <colgroup/>, we are outside both validity and well-formedness; we are in "equality" land. The same goes for <xmp> and <plaintext>: as long as you discuss them at all, the emphasis is on equality, not on validity or well-formedness.

This question requires a separate bug, but I want to mention it here anyhow. In my view, Polyglot Markup should describe the HTML5-valid (and perhaps also XML 1.0-valid), XML 1.0-well-formed, DOM-equal subset of HTML5. For that reason, plaintext and xmp do not belong in Polyglot Markup, as they are not permitted in HTML5.

(2) For <plaintext>, can conflicting parsing rules ever be avoided? No!

   PLAINTEXT EXAMPLE:  <plaintext></plaintext>

An HTML parser will display the characters "</plaintext>" to the user. Thus it seems to me that if parsing rules are the justification, then <plaintext> must not be used in polyglot documents, as it cannot be used in polyglots without running into problems/differences caused by the conflicting parsing rules. (Exception: <iframe><plaintext/></iframe>. But then we should also say that, for example, "<p/><p></p>" should be permitted, since it is the same issue: "<p/>" works fine as long as it is empty and a new block element follows immediately after. Plus, both are outside the syntax that HTML5 permits.)
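To make the <plaintext> difference concrete, here is a minimal Python sketch. The XML side uses the standard library's xml.etree.ElementTree; the HTML side is NOT a real HTML parser but a hypothetical, heavily simplified stand-in (html_plaintext_view) for the HTML tokenizer's PLAINTEXT state, which treats everything after the start tag as text.

```python
import xml.etree.ElementTree as ET

SRC = "<plaintext></plaintext>"


def xml_view(src: str) -> ET.Element:
    # An XML parser sees an ordinary, empty <plaintext> element.
    return ET.fromstring(src)


def html_plaintext_view(src: str) -> str:
    # Hypothetical, heavily simplified model of the HTML PLAINTEXT state:
    # once the <plaintext> start tag is seen, every remaining character,
    # including anything that looks like an end tag, is treated as text.
    _, sep, rest = src.partition("<plaintext>")
    if not sep:
        raise ValueError("no <plaintext> start tag")
    return rest


el = xml_view(SRC)
print(el.tag, el.text, len(el))  # plaintext None 0
print(html_plaintext_view(SRC))  # </plaintext>
```

The XML tree has an empty element with no text, while the HTML reading produces the visible string "</plaintext>", so no choice of content makes the two agree.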
 
(3) For <xmp>, can conflicting parsing rules ever be avoided? Only as long as the author avoids child elements and NCRs. Thus, practically speaking, no!

   XMP example: <xmp><p>&#229;</p></xmp>

An HTML parser will render the content of xmp literally, as code. This is impossible to replicate in XML unless one uses <![CDATA[ ]]>. However, if one places a <![CDATA[ ]]> section inside, then the HTML parser will render those delimiter characters literally as well.
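The xmp example and the CDATA workaround can be compared with a small Python sketch. As before, ElementTree provides the real XML parse, while html_xmp_text is a hypothetical, simplified model of how an HTML parser treats xmp content as raw text (no tags recognised, no character references decoded); it is an illustration, not an HTML5 parser.

```python
import xml.etree.ElementTree as ET


def html_xmp_text(src: str) -> str:
    # Hypothetical, simplified model of HTML's raw-text handling of <xmp>:
    # everything up to the first "</xmp>" is literal text; tags are not
    # recognised and character references are not decoded.
    _, sep, rest = src.partition("<xmp>")
    if not sep:
        raise ValueError("no <xmp> start tag")
    text, _, _ = rest.partition("</xmp>")
    return text


PLAIN = "<xmp><p>&#229;</p></xmp>"
CDATA = "<xmp><![CDATA[<p>&#229;</p>]]></xmp>"

xmp = ET.fromstring(PLAIN)
# XML: no direct text, one <p> child, and the NCR decoded to U+00E5.
print(xmp.text, [c.tag for c in xmp], ord(xmp[0].text))  # None ['p'] 229
# HTML: the same characters are one literal string.
print(html_xmp_text(PLAIN))  # <p>&#229;</p>

# A CDATA section makes the XML text equal that literal string,
print(ET.fromstring(CDATA).text)  # <p>&#229;</p>
# but the HTML reading now includes the CDATA delimiters too.
print(html_xmp_text(CDATA))  # <![CDATA[<p>&#229;</p>]]>
```

In both variants the XML and HTML readings differ, which is the conflict described above.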

As for what the specification draft says: normally one would not say that the XMP example "contains" "<", ">" or "&". Instead, it contains a <p> element and an NCR. Ultimately, it is child elements and NCRs that need to be forbidden inside an xmp element that occurs in a polyglot document.

(4) No need to escape the *characters* <>&. (Needs separate bug too.)

From XML's point of view, there isn't anything special about "<", ">" and "&" inside xmp and plaintext: in all XML documents, "<" and "&" must, in general, always be escaped. Thus they cannot occur unescaped, whether inside xmp/plaintext or anywhere else. And as long as they are escaped, ">" does not constitute a problem, as far as I can see. Thus, nothing special needs to be said about "<", ">" or "&" inside xmp/plaintext. Instead, it needs to be said that xmp cannot contain elements or NCRs (see (3) above).
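The XML side of this point can be checked directly with Python's standard library: a bare ">" is acceptable in character data, escaped "<" and "&" are fine, and bare "<" or "&" make the document not well-formed. The helper name is_well_formed is illustrative, not part of any spec.

```python
import xml.etree.ElementTree as ET


def is_well_formed(src: str) -> bool:
    # ElementTree (backed by expat) raises ParseError on non-well-formed input.
    try:
        ET.fromstring(src)
    except ET.ParseError:
        return False
    return True


for src in [
    "<xmp>a > b</xmp>",      # bare ">" is allowed in XML character data
    "<xmp>a &lt; b</xmp>",   # escaped "<"
    "<xmp>a &amp; b</xmp>",  # escaped "&"
    "<xmp>a < b</xmp>",      # bare "<": not well-formed
    "<xmp>a & b</xmp>",      # bare "&": not well-formed
]:
    print(repr(src), is_well_formed(src))
```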

CONCLUSION: Delete the entire section. Alternatively, say that <plaintext> MUST NOT be used, but that <xmp> can be used provided that it has no children and no NCRs.
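The alternative rule in the conclusion (no <plaintext>; <xmp> only without child elements and without NCRs) is mechanical enough to check. The sketch below encodes that rule exactly as stated, nothing more; check_polyglot and local_name are hypothetical helpers, and the regular expression used to find raw xmp content is a simplification (it ignores, e.g., self-closing xmp with attributes), not a full parser.

```python
import re
import xml.etree.ElementTree as ET

# Decimal or hexadecimal numeric character reference, e.g. &#229; or &#xE5;
NCR = re.compile(r"&#(?:[0-9]+|[xX][0-9a-fA-F]+);")
# Simplified: raw content of each <xmp ...>...</xmp> in the source text.
XMP_RAW = re.compile(r"<xmp(?:\s[^>]*)?>(.*?)</xmp>", re.S)


def local_name(tag: str) -> str:
    # Drop an ElementTree namespace prefix such as "{http://www.w3.org/1999/xhtml}".
    return tag.rsplit("}", 1)[-1]


def check_polyglot(src: str) -> list[str]:
    problems = []
    for el in ET.fromstring(src).iter():
        name = local_name(el.tag)
        if name == "plaintext":
            problems.append("plaintext must not be used")
        elif name == "xmp" and len(el) > 0:
            problems.append("xmp has child elements")
    # ElementTree decodes NCRs while parsing, so inspect the raw source instead.
    for m in XMP_RAW.finditer(src):
        if NCR.search(m.group(1)):
            problems.append("xmp contains an NCR")
    return problems


print(check_polyglot("<div><xmp>x = 1</xmp></div>"))  # []
print(check_polyglot("<div><xmp><p>&#229;</p></xmp></div>"))
# ['xmp has child elements', 'xmp contains an NCR']
print(check_polyglot("<div><plaintext/></div>"))  # ['plaintext must not be used']
```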

Comment 1 Eliot Graff 2011-02-12 00:44:29 UTC

In the Editor's Draft of 11 February 2011, I have deleted section 6.5.2 about <plaintext> and <xmp> in Polyglot Markup, as they are, indeed, deprecated in HTML5.

Thank you so very much for catching this.

Eliot

Comment 2 Leif Halvard Silli 2011-02-13 17:58:03 UTC

Fine. Satisfied. I believe I should then close this bug.

Comment 3 Michael[tm] Smith 2011-08-04 05:07:20 UTC

mass-move component to LC1

Comment 4 Michael[tm] Smith 2011-08-04 05:07:40 UTC

mass-move component to LC1