PolyglotRecommendationRationale

From HTML WG Wiki

Revision as of 15:54, 7 December 2012

Rationale for Polyglot Markup on the Recommendation track

Summary: The specification Polyglot Markup: HTML-Compatible XHTML Documents (aka Polyglot spec) should be considered for the Recommendation track.

Why recommendation

On the topic of making Polyglot Markup a recommendation:

  1. While we might not recommend that all authors (or indeed many authors) create Polyglot documents, we should Recommend that authors who want to create polyglot markup do so by following the authoring requirements outlined in this specification. In fact, the introduction of the Polyglot spec itself states that
    "All web content need not be authored in polyglot markup."
  2. The Polyglot spec does include normative language that can be followed precisely and a document can be measured objectively against those requirements to see if it successfully adheres to the polyglot markup rules. It is possible to build a validator that checks a document to see if it is a valid polyglot document according to the polyglot spec and to identify the normative requirements that have been violated should the validation fail.
  3. There is precedent for guidance documents, including authoring guidance, being published as Recommendations. For example:

Why normative language

On the topic of using normative language in the specification:

  1. The purpose of using normative language in the specification is to make clear which parts are necessary to conform to the specification and which parts are advisory or informative. The Polyglot spec is intended to make it possible to determine objectively whether a document adheres to its requirements, so it is appropriate to differentiate between normative and informative parts.
  2. One lesson from XHTML 1.0’s Appendix C is that vagueness and lack of normative status hurt: while XHTML 1.0 section 5 said that Appendix C was normative, Appendix C itself was only informative, which was confusing. That, combined with vague and somewhat convoluted rules that appeared quite permissive, and perhaps a lack of clear, well-motivated principles behind the Appendix C rules, led to confusion and misunderstanding. To avoid a repeat, clear, normative and conformance-checkable rules are needed.
  3. Authors are going to use XHTML5 syntax for text/html, so it is beneficial to have a normative spec for how to do it.
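Item 1 above rests on the claim that adherence to the Polyglot spec can be checked objectively. As a rough illustration, here is a minimal Python sketch (not a conforming validator) that tests a document against a handful of the rules mentioned on this page: UTF-8 only, no XML declaration, <!DOCTYPE html>, well-formed XML, a <meta charset> limited to utf-8, and explicitly written elements such as <tbody>. The function name check_polyglot and its messages are hypothetical; the XHTML-namespace check is an assumption taken from the Polyglot spec rather than from this page, and the real spec has many more rules.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def check_polyglot(data: bytes) -> list[str]:
    """Check a few Polyglot Markup rules; return a list of violations.

    Hypothetical helper: covers only a handful of rules, not the full spec.
    """
    problems: list[str] = []

    # Rule: UTF-8 only. Anything that does not decode as UTF-8 fails outright.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return ["document is not valid UTF-8"]
    text = text.removeprefix("\ufeff")  # ignore a leading byte order mark, if any

    # Rule: no XML declaration (and therefore no XML encoding declaration).
    if text.lstrip().startswith("<?xml"):
        problems.append("XML declaration is not allowed")

    # Rule: the doctype is exactly <!DOCTYPE html>. An XML declaration, already
    # reported above, is skipped here so it is not reported twice.
    if not re.match(r"\s*(?:<\?xml[^>]*\?>\s*)?<!DOCTYPE html>", text):
        problems.append("doctype must be <!DOCTYPE html>")

    # Rule: well-formed XML (validity against a DTD is not required).
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        problems.append(f"not well-formed XML: {exc}")
        return problems

    # Assumption taken from the Polyglot spec: <html> is in the XHTML namespace.
    if root.tag != XHTML_NS + "html":
        problems.append("root element must be <html> in the XHTML namespace")

    # Rule: <meta charset> may only say utf-8 (ASCII case-insensitive).
    for meta in root.iter(XHTML_NS + "meta"):
        charset = meta.get("charset")
        if charset is not None and charset.lower() != "utf-8":
            problems.append('<meta charset> must be "utf-8"')

    # Rule: write out every element the HTML parser would infer. Example: a <tr>
    # directly inside <table> gets an implied <tbody> in the text/html DOM but
    # not in the XML DOM, so the two DOMs would differ.
    for table in root.iter(XHTML_NS + "table"):
        if any(child.tag == XHTML_NS + "tr" for child in table):
            problems.append("<tr> must be wrapped in an explicit <tbody>")

    return problems


GOOD = b"""<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta charset="utf-8"/><title>Example</title></head>
<body><table><tbody><tr><td>cell</td></tr></tbody></table></body>
</html>
"""

BAD = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta charset="ISO-8859-1"/><title>Example</title></head>
<body><table><tr><td>cell</td></tr></table></body>
</html>
"""

print(check_polyglot(GOOD))  # []
for problem in check_polyglot(BAD):
    print(problem)
```

Because each check maps to one rule, a failing document yields a list naming the requirements it violates, which is the kind of validator output item 2 of “Why recommendation” describes.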

Why good value

On the topic of the value of promoting polyglot markup:

  1. Subsetting is a well-known method for emphasizing, and benefiting from, the good parts of a computer language. Example: the author of “JavaScript: The Good Parts” also created ADsafe, a subset of JavaScript meant to be safe for advertising, which aims to remove JavaScript’s security risks by restricting the permitted syntax. Polyglot Markup can be viewed from a similar angle.
  2. Conservative markup may help authors as well as the language itself. Currently, markup best practices are defined outside the HTMLwg. Example: the ADsafe JavaScript profile mentioned above has rules not only for JavaScript but for HTML as well, including a requirement to use UTF-8, which happens to be a Polyglot Markup requirement too. ADsafe also forbids document.write, which Polyglot Markup does not use either. This example shows not only that there can be real value in a conservative spec, but also that there is a market for such a spec, and the HTMLwg should offer real value in that market. And, by the way, the effect of this need not be that XML gets more attention; it could just as well draw attention to the secure subset of the text/html serialization.
  3. One syntax, two serializations, is a feature. A tool vendor could serve two user groups with one syntax. This, in turn, could simplify the tool for its users, as it ought to let the vendor avoid prompting the user to choose a character encoding or markup format.
  4. It keeps XHTML simple. While Polyglot Markup also adds requirements (like <!DOCTYPE html>) to XHTML5, overall the HTML-compatibility requirements keep XHTML in check and keep it simple. For example, they forbid encodings other than UTF-8, they forbid the XML declaration, and so on. And best of all, this is not an artificial extra but an effect of HTML5’s design: Polyglot Markup is merely picking the fruit.
  5. It adds pedagogical value. While this is not in itself a reason to send Polyglot Markup for Recommendation, the single syntax highlights, in a pedagogical way, how HTML5 itself is designed to be XML-compatible and often permits XML syntax within HTML documents, as well as the differences between the HTML DOM and the XML DOM.

Why no C

On the topic of Appendix C:

While some see a risk that putting Polyglot Markup on the Recommendation track would make it the new Appendix C, the Web in 2012 differs a lot from the Web in 1999.

  1. C-mantics vs semantics: Polyglot Markup safeguards against semantic loss across various parsers (including XML vs HTML) and, by removing many choices, it might allow authors to whom it matters to focus more on content than on code. Of course, it does not directly affect semantics. When XHTML 1.0 came along, all it offered was an XML version of HTML4, so much attention was perhaps naturally drawn to its syntax. By contrast, Polyglot Markup builds directly on HTML5, which defines a shared vocabulary for both XML and HTML.
  2. There may be little consensus around Polyglot Markup. But on the flip side, the sceptical attention it receives ought to help prevent history from repeating.
  3. XML is a real opportunity. In the days of Appendix C, XML was not a real option, whereas today it can be a serious one, since all the common user agents support it. The ability to try it out live will help authors keep the XML real.
  4. The normativity problem is different. Some of the problems of Appendix C were related to normativity. That Appendix C was not normative, while XHTML 1.0 as such was, was probably part of the problem. Back then, the HTML language was updated via XHTML, and thus there was a desire to use the XHTML syntax. With Polyglot Markup, there is no need to “update” HTML via a “foreign” specification from the “Land of XML”. HTML5 is already defined, and it has been defined on HTML’s own premises. Hence, the motivation for using HTML-compatible XHTML is not the same, nor as widespread, as it perhaps became when XHTML 1.0 was introduced. HTML5 already defines the two serializations and includes syntactic details that are meant to help moving between XML and HTML. Thus, for Polyglot Markup, it was relatively simple to define the exact syntax, whereas Appendix C, lacking a parser specification, underestimated the problems.
  5. The situation is different. Compared with HTML4’s more arcane, SGML-inherited rules, XHTML 1.0 was in many ways a simplification, and some seeds of Polyglot Markup are also present in Appendix C. But for whatever reason, Appendix C did not go far enough. For instance, Appendix C section 1 and section 9 could in theory have led to the same simple decision about character encoding that Polyglot Markup has made, but they didn't. XHTML 1.0 also did not form as good a basis for a polyglot spec as HTML5 does. Simply put, XHTML 1.0 did not make as many brave, thoughtful and tested decisions as HTML5 has made. For instance, Appendix C did not forbid the XML encoding declaration, nor did it have HTML5’s rule that limits the value of <meta charset="FOO"/> to utf-8 when used in an XHTML5 file. With HTML5, however, HTML and XHTML are well understood as different languages, and this is in fact also reflected in the spec title: Polyglot Markup.
  6. The scope is different. Polyglot Markup is also robust markup. The robustness comes partly from the very ability to cater for two parsers at once, and partly from the side effects of the syntax restrictions that the polyglot requirement imposes. Polyglot Markup could be said to be about discovering the beautiful and safe best-practice language within HTML: the shared subset of XHTML and HTML that promotes the practices we want authors to use, such as external scripts, external stylesheets, UTF-8, well-formed (rather than valid) XML, no-quirks mode et cetera. This subset is on one side just the natural common subset of XHTML5 and HTML5, and on the other side a “man made” subset, whose “man made” rules are found just as much in the HTML5 spec’s conformance rules (which do a lot to converge HTML and XHTML) as in the spec for Polyglot Markup. Polyglot Markup differs from Appendix C both in its “man made” rules and in the rules it derives from the “natural subset” of XML and HTML. Polyglot Markup’s most important principle is probably the HTML-compatibility principle. Appendix C, on the other hand (and despite defining a profile for text/html!), seems to have been just as much about maximum XML-compatibility. Thus Appendix C on the one hand warns against processing instructions, for HTML-compatibility, and yet recommends them, for XML-compatibility. That said, Appendix C did not shy away from claiming that XML “User Agents need to adapt” to HTML’s permission to omit elements that the text/html parser nevertheless inserts into the DOM, whereas Polyglot Markup is clear that XML-compatibility requires the author to express all elements in the code. Polyglot Markup is based on principles that lead to a narrow conclusion and a narrow mission.
Appendix C, by contrast, appears to have been specced for a broad, moving target and declares few, if any, principles behind its choices, so it is perhaps no wonder that the message many perceived was that XML-looking syntax is good in and of itself. This is very far from what Polyglot Markup does.