[Bug 11904] New: <plaintext> and <xmp> in Polyglot Markup

http://www.w3.org/Bugs/Public/show_bug.cgi?id=11904

           Summary: <plaintext> and <xmp> in Polyglot Markup
           Product: HTML WG
           Version: unspecified
          Platform: PC
               URL: http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html#elements-that-cannot-contain-special-characters
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: HTML/XHTML Compatibility Authoring Guide (ed: Eliot
                    Graff)
        AssignedTo: eliotgra@microsoft.com
        ReportedBy: xn--mlform-iua@xn--mlform-iua.no
         QAContact: public-html-bugzilla@w3.org
                CC: mike@w3.org, public-html-wg-issue-tracking@w3.org,
                    public-html@w3.org, eliotgra@microsoft.com


The draft text on plaintext and xmp should be deleted:

 ]] Due to the conflict between parsing rules between HTML and XML, polyglot
markup uses the following elements only if they do not contain angled brackets
("<" or ">") or ampersands ("&").[[

ISSUES:

(1) plaintext/xmp are forbidden in HTML5, so why are they in this draft?
(Needs a separate bug too.)

   According to Henri Sivonen, the Polyglot spec should only describe a
subset of XML 1.0 and HTML5. But which subset? The valid subset? The valid and
well-formed subset? Or perhaps the DOM-equal subset? Or the valid, well-formed,
DOM-equal subset? Example: when you say that polyglot markup *requires*
<colgroup/>, then we are beyond both validity and well-formedness; we are in
"equality" land. The same goes for <xmp> and <plaintext>: the emphasis, as long
as you discuss them at all, is on DOM equality, not on validity or
well-formedness.

This question requires a separate bug, but I want to mention it here anyhow. In
my view, Polyglot Markup should describe the HTML5-valid (and perhaps also XML
1.0-valid), XML 1.0-well-formed, DOM-equal subset of HTML5. For that reason,
plaintext and xmp do not belong in Polyglot Markup, as they are not permitted
in HTML5.

(2) For <plaintext>, can conflicting parsing rules ever be avoided? No!

   PLAINTEXT EXAMPLE:  <plaintext></plaintext>

An HTML parser will display the characters "</plaintext>" to the user, because
after a <plaintext> start tag it treats all remaining content as text. Thus, if
parsing rules are the justification, it seems to me that <plaintext> must not
be used in polyglot documents, since it cannot be used there without running
into differences caused by conflicting parsing rules. (Exception:
<iframe><plaintext/></iframe>. But then we should also say that, for example,
"<p/><p></p>" should be permitted, as it is the same issue: "<p/>" works fine
as long as it is empty and a new block element follows immediately after.
Plus, we would then be outside the syntax that HTML5 permits.)
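To make the <plaintext> example above concrete, here is a minimal Python
sketch. The helper html_plaintext_content is hypothetical and only models one
rule of the HTML tokenizer (after <plaintext>, everything is text); it is not
a real HTML parser. The XML side uses the standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

SRC = "<plaintext></plaintext>"

def html_plaintext_content(src: str) -> str:
    """Hypothetical, simplified model of HTML parsing: once a <plaintext>
    start tag is seen, every remaining character is text, including the
    would-be end tag."""
    start = src.index("<plaintext>") + len("<plaintext>")
    return src[start:]

def xml_plaintext_content(src: str) -> str:
    # A real XML parser just sees an empty element.
    return ET.fromstring(src).text or ""

print(repr(html_plaintext_content(SRC)))  # '</plaintext>'
print(repr(xml_plaintext_content(SRC)))   # ''
```

The two results can never agree for non-empty markup after <plaintext>, which
is the point made in (2).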

(3) For <xmp>, can conflicting parsing rules ever be avoided? Only as long as
the author avoids child elements and NCRs. Thus, practically speaking, no!

   XMP example: <xmp><p>&#229;</p></xmp>

An HTML parser will render the content of xmp literally, as code: it shows
"<p>&#229;</p>" as text. This is impossible to replicate in XML unless one uses
<![CDATA[ ]]>. However, if one places a <![CDATA[ ]]> section inside, then the
HTML parser will render the CDATA markers literally as well.
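A minimal Python sketch of the XMP example, with and without CDATA. The helper
html_xmp_text is hypothetical and models only HTML's raw-text handling of
<xmp> (content up to the first "</xmp>" is literal text); it is not a real
HTML parser. The XML side uses the standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

def html_xmp_text(src: str) -> str:
    """Hypothetical, simplified model: inside <xmp>, HTML creates no child
    elements and decodes no character references."""
    start = src.index("<xmp>") + len("<xmp>")
    end = src.index("</xmp>", start)
    return src[start:end]

def xml_xmp_view(src: str) -> tuple[list[str], str]:
    """Parse with a real XML parser; return child element names and the
    decoded text content."""
    root = ET.fromstring(src)
    return [child.tag for child in root], "".join(root.itertext())

PLAIN = "<xmp><p>&#229;</p></xmp>"
CDATA = "<xmp><![CDATA[<p>&#229;</p>]]></xmp>"

print(html_xmp_text(PLAIN))  # <p>&#229;</p>
print(xml_xmp_view(PLAIN))   # (['p'], 'å')
print(html_xmp_text(CDATA))  # <![CDATA[<p>&#229;</p>]]>
print(xml_xmp_view(CDATA))   # ([], '<p>&#229;</p>')
```

With CDATA the XML text matches the HTML text of the plain version, but the
HTML side of the CDATA version now shows the CDATA markers, so neither variant
is DOM-equal.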

As for what the specification draft says: normally one would not say that the
XMP example "contains" "<", ">" or "&". Instead, it contains a <p> element and
an NCR. And it is, ultimately, child elements and NCRs that need to be
forbidden inside an xmp element that occurs in a polyglot document.

(4) No need to escape the *characters* <>&. (Needs separate bug too.)

From XML's point of view, there is nothing special about "<", ">" and "&"
inside xmp and plaintext: in all XML documents, "<" and "&" must, in general,
always be escaped. Thus they cannot occur unescaped, whether inside
xmp/plaintext or anywhere else. And as long as they are escaped, ">" does not
constitute a problem, as far as I can see. Thus, nothing special needs to be
said about "<", ">" or "&" inside xmp/plaintext. Instead, it needs to be said
that xmp cannot contain elements or NCRs; see (3) above.
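The general XML rule referred to in (4) can be checked with any conforming XML
parser. A minimal sketch using Python's standard xml.etree.ElementTree, where
is_well_formed is a hypothetical helper: a bare "<" or "&" makes the document
ill-formed, while a bare ">" does not (the special "]]>" sequence is not
covered here).

```python
import xml.etree.ElementTree as ET

def is_well_formed(src: str) -> bool:
    """Return True if the XML parser accepts src."""
    try:
        ET.fromstring(src)
    except ET.ParseError:
        return False
    return True

samples = [
    "<xmp>a > b</xmp>",             # bare ">" is allowed in XML text
    "<xmp>a < b</xmp>",             # bare "<" is not
    "<xmp>a & b</xmp>",             # bare "&" is not
    "<xmp>a &lt; b &amp; c</xmp>",  # escaped forms are fine
]
for s in samples:
    print(f"{s!r}: {is_well_formed(s)}")
```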

CONCLUSION: Delete the entire section. Or, alternatively, say that <plaintext>
MUST NOT be used but that <xmp> can be used provided that it has no children
and no NCRs.
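The alternative rule in the conclusion could be expressed as a lint check. The
sketch below is hypothetical (polyglot_problems and its messages are invented
for illustration) and regex-based: it assumes lowercase, well-formed XML
source and works on the raw text, since an XML parser resolves NCRs before a
tree-based check could see them.

```python
import re

PLAINTEXT_RE = re.compile(r"<plaintext\b")
XMP_RE = re.compile(r"<xmp(?:\s[^>]*)?>(.*?)</xmp\s*>", re.DOTALL)
# XML character references: decimal, or hex with a lowercase "x".
NCR_RE = re.compile(r"&#(?:[0-9]+|x[0-9a-fA-F]+);")

def polyglot_problems(src: str) -> list[str]:
    """Hypothetical check: flag any <plaintext>, and any <xmp> that has
    child markup or contains an NCR."""
    problems = []
    if PLAINTEXT_RE.search(src):
        problems.append("plaintext is used")
    for match in XMP_RE.finditer(src):
        content = match.group(1)
        # In well-formed XML a "<" here can only start markup
        # (child element, comment, CDATA section or PI).
        if "<" in content:
            problems.append("xmp has child markup")
        if NCR_RE.search(content):
            problems.append("xmp contains an NCR")
    return problems

print(polyglot_problems("<xmp>plain text</xmp>"))     # []
print(polyglot_problems("<xmp><p>&#229;</p></xmp>"))  # ['xmp has child markup', 'xmp contains an NCR']
print(polyglot_problems("<plaintext/>"))              # ['plaintext is used']
```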


Received on Friday, 28 January 2011 11:50:53 UTC