This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 9968 - The definition of 'polyglot document' should be stricter
Summary: The definition of 'polyglot document' should be stricter
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML/XHTML Compat. Authoring Guide (ed: Eliot Graff)
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: Eliot Graff
QA Contact: HTML WG Bugzilla archive list
URL: http://dev.w3.org/html5/html-xhtml-au...
Reported: 2010-06-21 07:48 UTC by Henri Sivonen
Modified: 2010-10-05 13:07 UTC

Description Henri Sivonen 2010-06-21 07:48:14 UTC
Currently, the polyglot publication says:
"A polyglot document is an HTML5 document which is at the same time an XML document and an HTML document, and which meets a well defined set of constraints."

The above sentence allows for documents that have a different document tree when parsed as HTML and XML. (Unless, of course, the well-defined set of constraints bans this case.) For example:
<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml"><title></title></html>

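The mismatch in the example above can be sketched in Python. This is only an illustration under stated assumptions: the XML side is parsed with the standard library's ElementTree, while the HTML side is written out by hand as the tree an HTML5 parser builds for this input (with the implied head and body elements), because the standard library has no HTML5 tree builder. The names SOURCE, HTML_TREE, and xml_shape are hypothetical, and attributes are ignored, which also sidesteps the xmlns exception.

```python
import xml.etree.ElementTree as ET

# The example document from the bug description.
SOURCE = (
    '<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml">'
    "<title></title></html>"
)

# Assumption: written by hand, not produced by a parser. An HTML5 parser
# inserts the implied <head> and <body> elements for this input.
HTML_TREE = ("html", [("head", [("title", [])]), ("body", [])])


def xml_shape(elem):
    """Reduce an ElementTree element to (local name, [child shapes])."""
    # ElementTree tags look like "{namespace}local"; keep the local part.
    local = elem.tag.rsplit("}", 1)[-1]
    return (local, [xml_shape(child) for child in elem])


# An XML parser adds no implied elements, so <title> stays a direct
# child of <html>.
XML_TREE = xml_shape(ET.fromstring(SOURCE))

print("XML tree: ", XML_TREE)
print("HTML tree:", HTML_TREE)
print("identical:", XML_TREE == HTML_TREE)
```

With a real HTML5 parser available (for example the third-party html5lib package), the hand-written HTML_TREE could be replaced by a parsed tree reduced to the same shape.
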
Please define a polyglot document in terms of the same document tree:
"A polyglot document is a stream of bytes that parses into identical document trees (with the exception of the xmlns attribute on the root element) when processed as HTML and when processed as XML."
Comment 1 Eliot Graff 2010-09-07 22:10:35 UTC
Editor's Draft now uses the text below:

Abstract
A document that uses polyglot markup is a document that is a stream of bytes that parses into identical document trees (with the exception of the xmlns attribute on the root element) when processed as HTML and when processed as XML. Polyglot markup that meets a well-defined set of constraints is interpreted as compatible, regardless of whether it is processed as HTML or as XHTML, per the HTML5 specification. Polyglot markup uses a specific DOCTYPE, namespace declarations, and a specific case (normally lower case but occasionally camel case) for element and attribute names. Polyglot markup uses lower case for certain attribute values. Further constraints include those on empty elements, named entity references, and the use of scripts and style.


Thanks,

E