Difference between revisions of "PolyglotRecommendationRationale"

From HTML WG Wiki

Revision as of 04:41, 7 December 2012

Rationale for Polyglot Markup on the Recommendation track

Summary: The specification Polyglot Markup: HTML-Compatible XHTML Documents (aka Polyglot spec) should be considered for the Recommendation track.

Why recommendation

On the topic of making Polyglot Markup a recommendation:

  1. While we might not recommend that all authors (or, in fact, that many authors) create polyglot documents, we should Recommend that authors who do want to create polyglot markup follow the authoring requirements outlined in this specification. In fact, the introduction of the Polyglot spec does state that
    "All web content need not be authored in polyglot markup."
  2. The Polyglot spec does include normative language that can be followed precisely and a document can be measured objectively against those requirements to see if it successfully adheres to the polyglot markup rules. It is possible to build a validator that checks a document to see if it is a valid polyglot document according to the polyglot spec and to identify the normative requirements that have been violated should the validation fail.
  3. There is precedent for guidance documents including authoring guidance to be published as Recommendations. For example:

Why normative language

On the topic of using normative language in the specification:

  1. The purpose for using normative language in the specification is to make it clear which parts are necessary to conform to the specification and which parts provide advisory or informative content. The polyglot spec is intended to make it possible to objectively determine if a document adheres to its requirements and therefore it is appropriate to differentiate between normative and informative parts.
  2. One lesson from XHTML 1.0’s Appendix C is that vagueness and lack of normative status hurt: while XHTML 1.0 section 5 said that Appendix C was normative, Appendix C itself was only informative, which was confusing. That, combined with vague and somewhat convoluted rules that appeared quite permissive, and perhaps a lack of clear and well-motivated principles behind the Appendix C rules, led to confusion and misunderstanding. To avoid a repeat, clear, normative and conformance-checkable rules are needed.
  3. Authors are going to use XHTML5 syntax for text/html, hence it is of benefit that there is a normative spec for how to do it.

Why good value

On the topic of the value of promoting polyglot markup:

  1. Subsetting is a well-known method for emphasizing, and benefitting from, the good parts of a computer language. Example: the author of “JavaScript: The Good Parts” also created ADsafe, a “safe for advertising” subset of JavaScript that intends to remove its security risks via restrictions on the permitted syntax. Polyglot Markup can be viewed from a similar angle.
  2. Conservative markup may help authors as well as the language itself. Currently, markup best practices are defined outside the HTMLwg. Example: the above-mentioned ADsafe JavaScript profile has rules not only for JavaScript but for HTML as well, including a restriction to use UTF-8, which happens to be a Polyglot Markup requirement as well. ADsafe also forbids document.write, which is also not used by Polyglot Markup. Not only does this example show that there can be real value in a conservative spec, it also shows that there is a market for such a spec, for which the HTMLwg should offer real value. And, by the way, the effect of this need not be that XML gets more attention; it could just as well lead to attention to the secure subset of the text/html serialization.
  3. One syntax, two serializations is a feature. A tool vendor could serve two user groups via one syntax. This, in turn, has the potential to simplify the tool for its users, as it would allow the vendor to skip prompting the user to make choices about character encoding or markup format.
  4. It keeps XHTML simple. While Polyglot Markup also adds requirements (like <!DOCTYPE html>) to XHTML5, overall the HTML-compatibility requirements keep a tight rein on XHTML and keep it simple. E.g., it forbids non-UTF-8 encodings, it forbids the XML declaration, and so on. And best of all: this is not an artificial extra but an effect of HTML5’s design; Polyglot Markup is merely picking the fruit.
  5. It adds pedagogical value. While this alone does not make it worth sending Polyglot Markup to Recommendation, the single syntax highlights, in a pedagogical way, how HTML5 itself is designed to be XML-compatible and often permits XML syntax within HTML documents, as well as the differences between the HTML DOM and the XML DOM.
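
Several of the requirements named in this list are mechanical enough to check in a few lines, which also illustrates the earlier point that a document can be measured objectively against the rules. The following Python sketch is an illustration only: the function name check_polyglot_basics is hypothetical, and it tests just the rules mentioned on this page (UTF-8 only, no XML declaration, the <!DOCTYPE html> doctype) plus XML well-formedness. It is not a complete Polyglot Markup validator.

```python
import re
import xml.etree.ElementTree as ET


def check_polyglot_basics(data):
    """Return a list of problems; an empty list means these few checks passed.

    Hypothetical helper: it covers only rules named on this page,
    not the full set of Polyglot Markup requirements.
    """
    # Non-UTF-8 documents are forbidden, so decoding the bytes must succeed.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return ["not valid UTF-8"]

    problems = []

    # The XML declaration is forbidden; it can only appear at the very start
    # (optionally after a byte order mark).
    if re.match(r"<\?xml\s", text.lstrip("\ufeff")):
        problems.append("XML declaration present")

    # The HTML5 doctype is required. A plain substring test is an assumption
    # made for brevity; a real checker would inspect the parsed prolog.
    if "<!DOCTYPE html>" not in text:
        problems.append("missing <!DOCTYPE html>")

    # The document must also be well-formed XML.
    try:
        ET.fromstring(data)
    except ET.ParseError:
        problems.append("not well-formed XML")

    return problems
```

Calling check_polyglot_basics on the raw bytes of a file returns a list of human-readable problems, so a tool could report which of these requirements were violated rather than giving a bare pass or fail.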

Why no C risk

On the topic of Appendix C:

While some see a risk that putting Polyglot Markup on the Recommendation track would make it the new Appendix C, the Web in 2012 differs a lot from the Web in 1999.

  1. C-mantics vs semantics: Polyglot Markup safeguards against semantic loss across various parsers (including XML vs HTML) and, by removing many choices, it might allow those authors to whom it matters to focus more on content than on code. But it does not, of course, directly affect semantics. When XHTML 1.0 came along, all it offered was an XML version of HTML4. Thus much attention was, perhaps naturally, drawn to its syntax. By contrast, Polyglot Markup builds directly on HTML5, which defines a shared vocabulary for both XML and HTML.
  2. There is little consensus around polyglot. But on the flip side, the sceptical attention ought to also help prevent a repeat.
  3. XML is a real opportunity. In the days of Appendix C, XML was not a real option, whereas today it can be a serious option, since all the common user agents support it. The ability to try it out live will help authors keep the XML real.
  4. The normativity problem is different. Some of the problems of Appendix C were related to normativity. That Appendix C was not normative, while XHTML 1.0 as such was, was probably part of the problem. The HTML language was updated via XHTML, and thus there was a desire to use the XHTML syntax. With Polyglot Markup, there is no need to “update” HTML via a “foreign” specification from the “Land of XML”. HTML5 is already defined, and it has been defined on HTML’s own premises. Hence, the motivation behind the desire to use HTML-compatible XHTML is not the same as, or as wide as, it perhaps became when XHTML 1.0 was introduced. HTML5 already defines the two serializations and includes syntactic details that are meant to help moving between XML and HTML. Thus, for Polyglot Markup, it was relatively simple to define the exact syntax, whereas Appendix C underestimated the problems due to the lack of a parser specification.

So, clearly, the situation is quite different.

  1. The situation is different. When we consider HTML4’s more arcane, SGML-inherited rules, XHTML 1.0 was in many ways a simplification. The seeds of Polyglot Markup can also be found in Appendix C, but Appendix C did not, for whatever reasons, reach far enough. For instance, section 1 on the XML declaration and section 9 on the character encoding could together, in theory, have led to the same simple decision on encoding that Polyglot Markup has made, but they didn't. XHTML 1.0 also didn't form as good a basis for a polyglot spec as XHTML5 and HTML5 do. Simply put, XHTML 1.0 did not make as many brave, thoughtful and tested decisions as HTML5 and XHTML5 have made. For instance, XHTML 1 did not forbid the XML encoding declaration the way HTML5 does, and it had nothing like HTML5’s rule that UTF-8 is the only value permitted for <meta charset="FOO"/> when used in XHTML5. As a result, authors ended up not understanding what Appendix C said: there were authors who did not understand, on the one hand, why XHTML 1 was a simplification and, on the other hand, why and how Appendix C had to be followed. In turn, this led to much irony about XHTML itself. We believe that, with HTML5, HTML and XHTML are understood as different languages. This is also reflected in the spec title: polyglot markup.
  2. The scope is different. Polyglot Markup is also robust markup. An important aspect of Polyglot Markup is that it is about more than being polyglot (two languages in one). Polyglot Markup could be said to be about discovering the beautiful and safe best-practice language within HTML: the subset that only supports the best practices that we want authors to use, such as external scripts, external stylesheets, UTF-8, non-valid XML (only well-formed XML), no-quirks mode, et cetera. This subset is partly a natural common denominator of XHTML5 and HTML5, and partly a “man made” subset.