Bug 19925 - Drop XHTML from the title of the document
Status: RESOLVED FIXED
Product: HTML WG
Classification: Unclassified
Component: HTML/XHTML Compatibility Authoring Guide (ed: Eliot Graff)
Version: unspecified
Hardware: PC Linux
Importance: P2 enhancement
Target Milestone: ---
Assigned To: Leif Halvard Silli
QA Contact: HTML WG Bugzilla archive list
URL: http://www.w3.org/TR/html-polyglot/#t...
Depends on: 19923
Blocks: 12725
Reported: 2012-11-09 20:07 UTC by Sam Ruby
Modified: 2013-09-02 00:07 UTC
Description Sam Ruby 2012-11-09 20:07:39 UTC
Note: this is just a suggestion, feel free to dispose of it as you wish.  

As an alternative to dropping all normative requirements and publishing the document as a Note, consider repositioning this document as a normative and entirely optional profile of HTML which seeks to define constraints on the serialization of a DOM tree in a robust manner that is likely to retain semantics when said serialization is reparsed using a variety of parsers, be they full-featured and bug-free HTML5 parsers, somewhat HTML-aware parsers, or even XML parsers.

Include in this set a requirement to use UTF-8, even if not precisely required by any of these parsers.

Add an intro section which describes the benefits of robust syntax, even when producing documents expected to only be parsed and validated by fully HTML5 conforming tools.  As an example, in HTML5, close tags for paragraph elements are completely optional and will be inferred if not present.  Inclusion of close tags causes no harm beyond a minor increase in transfer size (an increase often mitigated by compression), but does allow validators to detect situations where the implicit closing rules don't match what the author intended.
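
A minimal sketch of the close-tag example above, assuming Python's standard-library XML parser as a stand-in for "even XML parsers". The helper name `parses_as_xml` and both sample strings are made up for this illustration and do not come from the document under discussion.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: the same two paragraphs, serialized two ways.
IMPLICIT_CLOSE = "<div><p>first<p>second</div>"          # relies on HTML's implied </p>
EXPLICIT_CLOSE = "<div><p>first</p><p>second</p></div>"  # every element closed

print(parses_as_xml(IMPLICIT_CLOSE))
print(parses_as_xml(EXPLICIT_CLOSE))
```

An HTML5 parser would infer the missing close tags in the first string, whereas the XML parser rejects it outright; the second string is the kind of serialization both families of parser can read.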
Comment 1 Leif Halvard Silli 2012-11-09 22:10:37 UTC
(In reply to comment #0)
> Note: this is just a suggestion, feel free to dispose of it as you wish.  

The proposal to drop "XHTML" from the title is only a "bikeshedding" type of thing, right? (I.e. you don't propose that XHTML-compatibility becomes optional. That is: you don't propose that "closing tags" is enough.) If one drops it from the title, then one might need to add it elsewhere. It would be good if you proposed a full alternative title - as is, it is not enough to just drop "XHTML". And I must say that I don't quite see the benefit of dropping it.

Regarding "profile", I have always seen this as a profile. So I am only in favor of making that aspect stronger. In that regard, I have always reacted to the "guidelines" description found in the list of "Editor's drafts": http://www.w3.org/html/wg/#current

I would also like to suggest that the intro describes polyglot markup as a single syntax that authoring tools could implement - as an alternative to two syntaxes, XHTML5 and HTML5. Something which has the potential of simplifying things for users of such authoring tools.

I feel that most of what you say are things we have thought about when working with this spec. But I agree that it could be good to describe the advantages better.
Comment 2 Sam Ruby 2012-11-10 01:45:28 UTC
(In reply to comment #1)
> (In reply to comment #0)
> > Note: this is just a suggestion, feel free to dispose of it as you wish.  
> 
> The proposal to drop "XHTML" from the title is only a "bikeshedding" type of
> thing, right?

If by this you mean that my suggestion is to improve the labelling in order to reduce the alleged confusion, then yes.

> Regarding "profile", I have always seen this as a profile. So I am only
> in favor of making that aspect stronger.

+1
Comment 3 Leif Halvard Silli 2012-11-11 03:35:29 UTC
(In reply to comment #2)

Here is a name change proposal:

  <h1>HTML5 United</h1>
  <h2>A unified, HTML5-conforming and semantically robust
      authoring syntax profile for any HTML or XML parser.</h2>


   (If that title becomes too cool: <h1>HTML United Profile</h1>)


   Justification for the word choice: 

  A. United: Alludes to single syntax/togetherness/safe/strong etc

  B. Unified: Single syntax/Unification of XHTML5 and HTML5 syntax

  C. Semantically robust: It is always interpreted the same.

  D. HTML5-conforming: This is an important promise. We want
     authors to feel that they - without risk for validation
     punishment - can try this syntax. (Thus we must not fall
     for the temptation to allow something that HTML5 doesn't
     allow first - despite that it is an extension spec.)

  E. Authoring syntax = to emphasize that this is a "how to 
     author" kind of specification - and not a parser spec.
     Also important to link "semantically robust" and "syntax",
     to prevent people from thinking that XHTML is "more 
     semantic" in and by itself, just because it is XML.

  F. For any HTML or XML parser: It would be possible to drop
     these 6 words, I guess. But, actually, these words justify
     the choice of the UTF-8 encoding as the single encoding
     (because, as the spec already says: UTF-8 is the only
     encoding that ALL HTML and XML parsers MUST support.)

  G. I chose to drop "polyglot" because it confuses those that
      know its computer meaning as well as those that don't.

     'Polyglot' also has the drawback that it focuses on "many" 
     ("poly-") whereas the attraction of polyglot markup rather
     is that it is a single syntax …

  H. I chose to split the title in two, because this is how 
     the HTML5 spec is titled.

I did not say "HTML5 United Syntax" or "HTML5 United Markup", in order to avoid that - again - readers "jump to conclusions" about what this profile is.

I was quite literal in dropping "XHTML" … even if it could be argued that we should add "XHTML5-conforming" as well.
Comment 4 Leif Halvard Silli 2012-11-11 03:43:36 UTC
(In reply to comment #3)

Correction: To avoid repetition of "HTML5", the <h1> could be:

    <h1>HTML United</h1>
Comment 5 Sam Ruby 2012-11-12 16:11:20 UTC
(In reply to comment #4)
> 
>     <h1>HTML United</h1>

I don't believe that would address the alleged confusion.

If the intent is limited exclusively to documenting the overlap of two syntaxes, then either the current title or the proposed ("united") title would work for such a document.  Additional restrictions (such as "utf-8") don't have a place in such a document.

If the intent is instead to document a profile, then the title and introduction should reflect such an intent.

At the present time, we have a bug report (19923) that suggests making the content of the document match the title.  And a bug report (19925) that suggests making the title of the document match the content.
Comment 6 Leif Halvard Silli 2012-11-12 16:58:16 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > 
> >     <h1>HTML United</h1>
> 
> I don't believe that would address the alleged confusion.

(Another thing, besides UTF-8, that deviates from what is strictly polyglot is the requirement to use lang="*"/xml:lang="*" on the root element.)

For title, then how about

         "Default HTML5"

The title 'Default HTML5' says that there are other ways to do it. At the same time it clearly expresses a profile - a (new) standard/level. A new floor. It expresses what you (according to this profile) should fall back to do, unless you "know what you are doing" etc. So we are speaking about an "authoring default".
Comment 7 Larry Masinter 2012-11-12 17:41:01 UTC
I like the current title containing "XHTML", and I think HTML/XHTML Compatibility Authoring Guide is exactly what people will look for.

If you want to be clearer about the scope of applicability of the document, I think it's more appropriate to do it in the text of the document.

The Abstract is actually more clear about the scope of the recommendations made than the Introduction is, and copying some of the abstract text into the Introduction (and explicitly labeling it Scope, perhaps) might help clarify why this document is intended as a normative recommendation for a narrow range of applications.
Comment 8 Leif Halvard Silli 2012-11-12 18:13:23 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > 
> > >     <h1>HTML United</h1>
> > 
> > I don't believe that would address the alleged confusion.

> For title, then how about
> 
>          "Default HTML5"

HTML5 itself does say that new docs SHOULD _default_ to UTF-8. And I believe it recommends using @lang as well.
Comment 9 Sam Ruby 2012-11-12 18:46:24 UTC
(In reply to comment #6)
> 
> For title, then how about
> 
>          "Default HTML5"
> 
> The title 'Default HTML5' says that there are other ways to do it. At the
> same time it clearly expresses a profile - a (new) standard/level. A new
> floor. It expresses what you (according to this profile) should fall back to
> do, unless you "know what you are doing" etc. So we are speaking about an
> "authoring default".

'Default' doesn't give you any indication as to WHY.  Nor does it make it clear that it is a profile.

I would suggest something along the lines of 'robust profile for HTML'... to draw an association with the http://en.wikipedia.org/wiki/Robustness_principle

The current HTML5 core specification prioritizes implementor concerns over authoring concerns and defines a robust parsing algorithm for compensating for imperfect documents.  This profile complements that specification by defining a robust profile for documents which compensates for imperfect parsers.
Comment 10 Leif Halvard Silli 2012-11-12 20:45:55 UTC
(In reply to comment #9)
> (In reply to comment #6)
> > 
> > For title, then how about
> > 
> >          "Default HTML5"
> > 
> > The title 'Default HTML5' says that there are other ways to do it. At the
> > same time it clearly expresses a profile - a (new) standard/level. A new
> > floor. It expresses what you (according to this profile) should fall back to
> > do, unless you "know what you are doing" etc. So we are speaking about an
> > "authoring default".
> 
> 'Default' doesn't give you any indication as to WHY.  Nor does it make it
> clear that it is a profile.

OK. But I suggest to bury 'profile' a little - it is not very sexy. See below.

> I would suggest something along the lines of 'robust profile for HTML'... to
> draw an association with the
> http://en.wikipedia.org/wiki/Robustness_principle

Ah, I did not get, before now, that 'robust' links to Postel’s law … A good point.

> The current HTML5 core specification prioritizes implementor concerns over
> authoring concerns and defines a robust parsing algorithm for compensating
> for imperfect documents.  This profile complements that specification by
> defining a robust profile for documents which compensates for imperfect
> parsers.

To make it as easy as possible for the editor, how about simply replacing 'polyglot' with 'robust'? (At least for the most part.) Also, in order to speak about XHTML5, without saying it, I would suggest to alter the part after the colon as well. 

All in all, I propose:

   <h1>Robust Markup: A serialization agnostic profile of HTML5.</h1>
Comment 11 Leif Halvard Silli 2012-11-12 20:48:52 UTC
(In reply to comment #10)

> To make it as easy as possible for the editor, how about simply replacing
> 'polyglot' with 'robust'? 

By this I alluded to the fact that we probably want the editor to replace all 110 occurrences of 'polyglot' with 'robust'.
Comment 12 Leif Halvard Silli 2012-11-12 20:59:00 UTC
(In reply to comment #11)
> (In reply to comment #10)
> 
> > To make it as easy as possible for the editor, how about simply replacing
> > 'polyglot' with 'robust'? 
> 
> By this I alluded to the fact that we probably want the editor to replace
> all 110 occurrences of 'polyglot' with 'robust'.

Hm - it could not be done 100% mechanically. But at least, you get the point.
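
To illustrate why such a replacement is not fully mechanical, here is a small Python sketch. It assumes that URLs such as http://www.w3.org/TR/html-polyglot/ must survive unchanged; the function name `replace_outside_urls`, the regular expressions, and the sample sentence are hypothetical and chosen only for this illustration.

```python
import re

# Assumption for this sketch: anything that looks like an http(s) URL is left alone.
URL = re.compile(r"https?://\S+")


def replace_outside_urls(text, old="polyglot", new="robust"):
    """Replace whole-word `old` with `new`, except inside URLs."""
    word = re.compile(r"\b%s\b" % re.escape(old), re.IGNORECASE)

    def keep_case(match):
        # Preserve a leading capital, e.g. at the start of a sentence.
        return new.capitalize() if match.group(0)[0].isupper() else new

    pieces, position = [], 0
    for url in URL.finditer(text):
        pieces.append(word.sub(keep_case, text[position:url.start()]))
        pieces.append(url.group(0))  # copy the URL through untouched
        position = url.end()
    pieces.append(word.sub(keep_case, text[position:]))
    return "".join(pieces)


SAMPLE = ("Polyglot markup is specified at "
          "http://www.w3.org/TR/html-polyglot/ for polyglot authors.")
print(replace_outside_urls(SAMPLE))
```

A blind `SAMPLE.replace("polyglot", "robust")` would rewrite the URL as well. Even this guarded version only handles URLs and capitalization, so the result would still need a human read-through.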
Comment 13 Larry Masinter 2012-12-04 23:56:42 UTC
In http://www.w3.org/wiki/Evolution/MIME I tried to distinguish between polyglot / multiview and specialization for use cases where more than one content label or interpretation method might apply.

The word 'polyglot' is memorable, and evokes quite well the "multiple languages" meme. And the two languages, HTML and XHTML, are different languages. None of the other terms suggested evoke the capabilities being described.

The word has entered the lexicon and is discussed widely. None of the other proposals ("United", "Robust") do anything but hide the document from people searching for it.

Documents should be titled in a way that someone not familiar with the politics of the committee can find the document again without a lot of bother. None of the title renamings improve it from the current title, and most make retrieval significantly worse (IMO).
Comment 14 Leif Halvard Silli 2012-12-05 03:41:45 UTC
(In reply to comment #13)

Larry, I think you add good points. Could we achieve the goal by adding a subtitle instead? Many specs have a subtitle. For example - and in order to emphasize the "left side" of the Robustness Principle:


<h1>Polyglot Markup: HTML-Compatible XHTML Documents</h1>
<h2>A conservative and serialization agnostic profile of HTML5.</h2>
Comment 15 Leif Halvard Silli 2012-12-11 05:52:23 UTC
(In reply to comment #14)
> (In reply to comment #13)

> <h1>Polyglot Markup: HTML-Compatible XHTML Documents</h1>
> <h2>A conservative and serialization agnostic profile of HTML5.</h2>

While not yet convinced about removing "XHTML", I very much support Sam’s idea of adjusting the scope of the spec.

However, we should look wider than the robustness principle/Postel’s law
http://en.wikipedia.org/wiki/Robustness_principle

Other relevant principles could be

* Redundancy http://en.wikipedia.org/wiki/Redundancy_(engineering)
* Fault tolerance http://en.wikipedia.org/wiki/Fault_tolerance

For instance, redundancy considerations could be a reason to recommend having more than one failure point when it comes to setting the encoding.

Another example is the DOCTYPE: It is of no use in XHTML, but the risk that XHTML is consumed as text/html is so high that it is a good idea to include it even in pure XHTML documents.
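
The redundancy idea for the encoding can be sketched as a small check that lists which UTF-8 signals a response carries. This is only an illustration: the function name `encoding_signals`, the labels it returns, and the sample document are assumptions, and the three signals checked (byte order mark, `<meta charset>`, and the HTTP `Content-Type` header) are one plausible reading of "more than one" place to set the encoding.

```python
import codecs
import re

# Look for <meta charset="utf-8"> (quotes optional, case-insensitive).
META_CHARSET = re.compile(rb'<meta\s+charset\s*=\s*["\']?utf-8', re.IGNORECASE)
# Look for charset=utf-8 in a Content-Type header value.
HTTP_CHARSET = re.compile(r'charset\s*=\s*"?utf-8', re.IGNORECASE)


def encoding_signals(body, content_type=""):
    """Return the list of places where UTF-8 is declared."""
    signals = []
    if body.startswith(codecs.BOM_UTF8):
        signals.append("bom")
    # HTML expects the meta declaration within the first 1024 bytes.
    if META_CHARSET.search(body[:1024]):
        signals.append("meta")
    if HTTP_CHARSET.search(content_type):
        signals.append("http")
    return signals


# Hypothetical sample document with all three signals present.
DOCUMENT = codecs.BOM_UTF8 + (
    b'<!DOCTYPE html>\n'
    b'<html xmlns="http://www.w3.org/1999/xhtml">'
    b'<head><meta charset="utf-8"/><title>t</title></head>'
    b'<body></body></html>'
)
print(encoding_signals(DOCUMENT, "text/html; charset=utf-8"))
```

If one signal is lost, for example when the file is served without a charset parameter, the remaining ones still identify the encoding.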
Comment 16 Leif Halvard Silli 2013-04-18 23:47:14 UTC
I concur with the arguments of the bug filer.

As part of solving this bug, I have updated the title to become

   «Polyglot Markup: A robust profile of the HTML5 vocabulary»

See http://dev.w3.org/cvsweb/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html?annotate=1.96

With this change, I have fixed the part of the bug that relates to the title of the document - the 'XHTML' word has been removed. The addition of the words 'robust' and 'profile' seems very much in line with the bug filer’s argumentation. As for the term 'the HTML5 vocabulary', it is a reference to the subtitle of the HTML5 specification ('A vocabulary and associated APIs for HTML and XHTML'), and thus bears a reference to HTML as both an XHTML and an HTML vocabulary.

I will try to incorporate more of the bug filer’s proposals in a later commit, upon which I’ll close the bug. But I will reveal that much heed will be given to 'robust'.

Just a brief justification for the removal of 'XHTML': Polyglot Markup already has many requirements that are not just the bare minimum for creating an XHTML/HTML document. For example, the requirement to use both xml:@lang and @lang is not because XHTML-compatible parsers are not required to understand what @lang means. Even SVG has a @lang attribute (though with different semantics from xml:@lang), which SVG-supporting XML parsers must support. Likewise, the requirement of polyglot markup to use @type="text/javascript" and @type="text/css" is *also not* because XHTML parsers are not required to support the *omission* of those attributes.

Btw, I even considered removing 'polyglot', but Larry has given good arguments for keeping it.
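
As a rough sketch of the double language declaration mentioned above, the following Python snippet checks that a root element carries both `lang` and `xml:lang`. The function name `root_declares_both_langs`, the sample documents, and the choice to compare the two values for exact equality are assumptions made for this illustration, not wording taken from the spec.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def root_declares_both_langs(markup):
    """Return True if the root has matching lang and xml:lang attributes."""
    root = ET.fromstring(markup)
    lang = root.get("lang")
    xml_lang = root.get(XML_LANG)
    return lang is not None and lang == xml_lang


# Hypothetical sample documents.
BOTH = ('<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">'
        '<head><title>t</title></head><body></body></html>')
ONLY_LANG = ('<html xmlns="http://www.w3.org/1999/xhtml" lang="en">'
             '<head><title>t</title></head><body></body></html>')

print(root_declares_both_langs(BOTH))
print(root_declares_both_langs(ONLY_LANG))
```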
Comment 17 Leif Halvard Silli 2013-09-02 00:07:19 UTC
I have now included more from the bug filer’s original suggestion:

1) In the new scope section (of the introduction), I added wording that describes this profile as, quote: "entirely optional". 

2) Added a section about robustness, where some of the examples suggested in comment 0, plus some other points, are made/explained. 

For an overview, see the Table of Contents:
http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html#toc

Or go directly to the robust section:
http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html#robust

Or the scope section:
http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html#scope