This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 12073 - Permit restricted use of <?xml version="1.0" encoding="UTF-8" ?>
Summary: Permit restricted use of <?xml version="1.0" encoding="UTF-8" ?>
Status: CLOSED WONTFIX
Alias: None
Product: HTML WG
Classification: Unclassified
Component: LC1 HTML5 spec
Version: unspecified
Hardware: PC All
Importance: P2 major
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL: http://dev.w3.org/html5/spec/semantic...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-02-15 05:05 UTC by Leif Halvard Silli
Modified: 2011-08-04 05:12 UTC
CC List: 9 users

Description Leif Halvard Silli 2011-02-15 05:05:45 UTC
HTML5 should permit the XML declaration as conforming or conforming but obsolete, provided ALL the following 3 conditions are met:

1) Regardless of whether the declaration contains the encoding declaration:
   a) It can be used both with and without the encoding declaration;
   b) It is only permitted for UTF-8 or UTF-16 encoded pages;
   c) It may only be used when the page has a page-internal encoding declaration in a META element or in a BOM;

2) If the XML declaration's encoding declaration is present, then:
   a) it can only have the value "UTF-8" or "UTF-16";
   b) it MUST EITHER reflect the encoding value of the META charset/content-type element, if present;
   c) OR it MUST reflect the encoding declaration of the BOM, if present;

3) comments before DOCTYPE become forbidden;  See bug 12072;
    (And X-UA-Compatible remains forbidden in legal HTML5 syntax.)
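
The three conditions above could be checked mechanically. The following is a minimal sketch in Python, not a conformance checker: the function names (`split_bom`, `check_xml_declaration`) and the regular expressions are hypothetical and deliberately simplistic, a UTF-16 page is assumed to carry a BOM, and conditions 2b/2c are read as "the value must match the META declaration or the BOM, whichever is present".

```python
import codecs
import re

ALLOWED = {"UTF-8", "UTF-16"}

# Simplistic patterns for illustration; a real checker would use a proper parser.
XML_DECL = re.compile(r'<\?xml\s+version="1\.0"(?:\s+encoding="([^"]*)")?\s*\?>')
META_CHARSET = re.compile(
    r'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_-]+)', re.IGNORECASE
)


def split_bom(data: bytes):
    """Return (encoding named by the BOM or None, decoded text without the BOM)."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8", data[len(codecs.BOM_UTF8):].decode("utf-8", errors="replace")
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16", data[2:].decode("utf-16-le", errors="replace")
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16", data[2:].decode("utf-16-be", errors="replace")
    # Assumption: without a BOM the bytes are ASCII-compatible enough to scan.
    return None, data.decode("utf-8", errors="replace")


def check_xml_declaration(data: bytes) -> list:
    """Return the proposed conditions that the page violates (empty list if none)."""
    bom_enc, text = split_bom(data)
    decl = XML_DECL.match(text)
    if decl is None:
        return []  # no XML declaration at the start, so nothing to check

    problems = []
    meta = META_CHARSET.search(text)
    meta_enc = meta.group(1).upper() if meta else None
    declared = [enc for enc in (meta_enc, bom_enc) if enc is not None]

    # Condition 1c: a page-internal declaration (META or BOM) must exist.
    if not declared:
        problems.append("1c: no META charset and no BOM")
    # Condition 1b: the page must be UTF-8 or UTF-16.
    if any(enc not in ALLOWED for enc in declared):
        problems.append("1b: page is not declared as UTF-8 or UTF-16")

    decl_enc = decl.group(1)
    if decl_enc is not None:
        decl_enc = decl_enc.upper()
        # Condition 2a: only UTF-8 or UTF-16 may be named.
        if decl_enc not in ALLOWED:
            problems.append("2a: XML declaration names another encoding")
        # Conditions 2b/2c: the value must reflect the META element or the BOM.
        elif declared and decl_enc not in declared:
            problems.append("2b/2c: XML declaration disagrees with META/BOM")

    # Condition 3: no comment between the XML declaration and the DOCTYPE.
    if text[decl.end():].lstrip().startswith("<!--"):
        problems.append("3: comment before the DOCTYPE")
    return problems
```

Under these assumptions, a page that starts with `<?xml version="1.0" encoding="UTF-8" ?>`, continues with `<!DOCTYPE html>` and contains `<meta charset="UTF-8"/>` yields an empty list, while the same page without the META element and without a BOM is reported under 1c.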

JUSTIFICATION - arguments:

(A) Polyglot Authoring: most XML/XHTML editors out there will auto-insert the XML declaration, and it is often not straightforward for authors to make the editor stop doing so. The more IE6 disappears, the less point there is in spending effort on removing such auto-inserted XML declarations.

(B) Same justification as for why HTML5 permits <meta charset="UTF-8"/> in XHTML5 documents: it makes it easier to move back and forth between XML and HTML.

(C) The conditions for its use promote use of UTF-8, which is a strong benefit that partly outweighs the disadvantages;

(D) It would provide a legal way to trigger quirks-mode in IE6 (and IE6 only!). 
      The XML declaration is already an often-advised method which authors use. See:
http://www.gunlaug.no/contents/wd_additions_16.html and
http://www.456bereastreet.com/archive/200904/using_an_xml_declaration_triggers_quirks_mode_in_ie_6/
http://css-discuss.incutio.com/wiki/Rendering_Mode
     This, of course, also has a disadvantage: A version of IE could be set in quirksmode through use of legal HTML5 syntax.
     However, 
     a) It would still be non-conforming for IE6 to go into quirks mode because of the XML declaration.
     b) The proposal to completely forbid comments before the DOCTYPE (see bug 12072) is far more important and, to a great extent, outweighs the possible disadvantages of allowing this restricted use of the XML declaration.

DISADVANTAGES - discussion:

* A permission to use <?xml version="1.0" encoding="UTF-8" ?> provides a legal way to offer quirks-mode in IE6. 
   This of course can be seen as a disadvantage. 
    OTOH, it is a behaviour specific to IE6, which is a disappearing browser and for which it is sometimes {but not anymore?} recommended to use quirks-mode.

ADVANTAGES - discussion:

* if comments before the DOCTYPE are forbidden (see bug 12072),  then the
   proprietary X-UA-COMPATIBLE meta element is largely prevented from 
   having effect in legal HTML5 syntax.  
   See http://www.w3.org/Bugs/Public/show_bug.cgi?id=12072#c1
    Thereby, it is also impossible to use X-UA-Compatible to trigger IE5-mode in IE8 (and IE9, I assume). (Which could otherwise be a temptation, if the XML declaration is permitted.)

* authors would not have to invent means other than the XML declaration for triggering quirks mode in IE6

* the conditions for its use promote UTF-8 - you may call it a sweetened pill.

CONSIDERATIONS - conforming but obsolete?

* Another option would be to make it conforming but obsolete, with a warning that it triggers quirks-mode in some legacy UAs. On balance, conforming but obsolete is the option that I would prefer.
Comment 1 Ian 'Hixie' Hickson 2011-02-15 05:57:17 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: This is out of scope for the HTML spec. The HTML spec can't change what is legal XML syntax. That's defined by the XML spec.
Comment 2 Leif Halvard Silli 2011-02-15 07:23:20 UTC
(In reply to comment #1)

Just to clarify: when I said that

]] HTML5 should permit the XML declaration as conforming or conforming but obsolete [[

then I meant "inside the text/html syntax".  I did not intend - at all or in any way - to state anything about what should be possible or allowed inside the XML syntax.

Status today is that the RFC for the 'text/html' MIME type says that XHTML 1.0 defines a profile which is compatible with HTML. This profile is, as we know, defined by the infamous Appendix C. And Appendix C does not forbid the XML declaration - instead, it sounds more as if it recommends it. This is my basis for saying that HTML5 could define some variants of it as "obsolete but conforming".

I see this as similar to how HTML5 defines the XHTML 1.0 Strict DOCTYPE as OBSOLETE but conforming: although it is an XHTML doctype, HTML5 states that it is obsolete but conforming text/html. (Obsoleteness is usually a state that is "awarded" to things that once were conforming.)
Another similarity is that other XHTML1 doctypes are considered non-conforming.

I thus suggest a similar status, based on a similar evaluation, for the XML declaration: certain uses of it should be conforming.
Comment 3 Kornel Lesinski 2011-02-15 20:46:44 UTC
It's pretty easy to delete a line in text editor, so I don't think editor's templates are strong enough justification for the change.

As for XML tools, you can't use XML serializers to reliably create correct polyglot documents.

Tools that insist on XML declaration are obviously unaware of polyglot requirements and are not going to obey many other more important rules. And we already know it's impossible to make HTML compatible with any output XML serializer could generate.

In the end if you want to serialize polyglot documents, you MUST have polyglot serializer, and polyglot serializers won't force an XML declaration.
Comment 4 Leif Halvard Silli 2011-02-16 09:37:51 UTC
(In reply to comment #3)
> It's pretty easy to delete a line in text editor, so I don't think editor's
> templates are strong enough justification for the change.

To delete a line in the source code in a text editor is a huge step for many Web content authors.

As for those who are able to delete lines: many times it is difficult or impossible to change the template of a WYSIWYG editor. Thus, one would have to post-edit the code that the WYSIWYG editor creates, to get it right. This is simple to do once, but if you have to edit the source code each time you edit that page with that WYSIWYG editor, then it becomes tedious.
 
> As for XML tools, you can't use XML serializers to reliably create correct
> polyglot documents.

I did not have "XML serializers" in mind. I have in mind WYSIWYG types of XHTML editors.

> Tools that insist on XML declaration are obviously unaware of polyglot
> requirements

I agree. Except that it seems to be common to want to place IE6 in quirks mode. And <?xml version="1.0" ?> is the simple way to get that effect.

>  and are not going to obey many other more important rules. 

Here I believe you are incorrect: For the type of tools that I have in mind - WYSIWYG XHTML editors - it is actually quite common to produce source code that is largely polyglot.

Examples of WYSIWYG editors which largely produce polyglot XHTML: Amaya, Mozilla/Gecko based editors, Oxygen in XHTML modes, Xopus, Xstandard. And probably most other WYSIWYG XHTML editors that are able to produce XHTML 1.0 code.

> And we
> already know it's impossible to make HTML compatible with any output XML
> serializer could generate.
> 
> In the end if you want to serialize polyglot documents, you MUST have polyglot
> serializer, and polyglot serializers won't force an XML declaration.

AGAIN: polyglot serializers exist - most WYSIWYG XHTML editors are, in fact, largely polyglot. In many cases, the XML declaration is the only piece of code that breaks with polyglot markup.

As for your last statement, that «polyglot serializers won't force an XML declaration»:
   a) that is a tautological statement.  http://en.wikipedia.org/wiki/Tautology_(rhetoric)
   b) then you treat the XML declaration as the quintessential symbol that shows whether the code is polyglot or not. I do not understand what you base that on. If an HTML document lands a *conforming* HTML5 **parser** in quirks mode, then we can be certain that that particular document is not polyglot. So far so good. However, only non-conforming HTML5 parsers are placed in quirks mode because of the XML declaration. Therefore, for **authors**, provided that the XML declaration becomes *permitted* by the HTML5 specification (the way I suggest in this bug), it would be perfectly polyglot to use the XML declaration in polyglot code. (Again, IE6 is not conforming.)
Comment 5 Henri Sivonen 2011-02-16 10:10:14 UTC
(In reply to comment #2)
> (In reply to comment #1)
> 
> Just to clarify: when I said that
> 
> ]] HTML5 should permit the XML declaration as conforming or conforming but
> obsolete [[
> 
> then I meant "inside the text/html syntax". 

I strongly disapprove of making stuff that goes through the "bogus comment" state valid. I also strongly disapprove of changing the parsing algorithm to avoid the "bogus comment" state in this case.

That is, I think this should be WONTFIX.
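
For readers unfamiliar with the "bogus comment" state: as far as the HTML5 tokenizer is concerned, markup that begins with `<?` is a parse error and is turned into a comment token whose data runs from the `?` up to, but not including, the first `>`. The sketch below mimics only that one behaviour in Python; the function name `tokenize_question_mark_tag` and the tuple shape of the token are hypothetical and are not part of any parser's API.

```python
def tokenize_question_mark_tag(markup: str):
    """Mimic how an HTML5 tokenizer treats markup that starts with '<?'.

    Returns (token, remaining_markup). The token is a ("comment", data) pair
    whose data starts at the '?' and ends just before the first '>'.
    """
    if not markup.startswith("<?"):
        raise ValueError("this sketch only handles markup starting with '<?'")
    end = markup.find(">", 1)
    if end == -1:
        # No '>' before end of input: everything after '<' becomes comment data.
        return ("comment", markup[1:]), ""
    return ("comment", markup[1:end]), markup[end + 1:]


token, rest = tokenize_question_mark_tag(
    '<?xml version="1.0" encoding="UTF-8" ?><!DOCTYPE html>'
)
```

Here `token` is a comment whose data is `?xml version="1.0" encoding="UTF-8" ?`, and `rest` is `<!DOCTYPE html>`. In other words, a conforming text/html parser sees the XML declaration as a comment placed before the DOCTYPE and nothing more.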
Comment 6 Leif Halvard Silli 2011-02-16 11:04:45 UTC
(In reply to comment #5)
> (In reply to comment #2)
> > (In reply to comment #1)
> > 
> > Just to clarify: when I said that
> > 
> > ]] HTML5 should permit the XML declaration as conforming or conforming but
> > obsolete [[
> > 
> > then I meant "inside the text/html syntax". 
> 
> I strongly disapprove of making stuff that goes through the "bogus comment"
> state valid. I also strongly disapprove of changing the parsing algorithm to
> avoid the "bogus comment" state in this case.

Would it be necessary to change the parser?

The underlying assumption here seems to be:

 A) if it is treated as a "bogus comment" by the parser, then it should be forbidden.
 B) if it is considered conforming, then it should not be considered a "bogus comment" by the parser.

And, yes, it seems like HTML5 tries to align "bogus comment" and "non-conforming". 

When I say that it should be considered "obsolete but conforming", then the assumption of those who are deeply familiar with the HTML5 parser will perhaps be that the XML declaration should also have an effect? (Similar to how, for instance, <a name=foo></a> has an effect, even if it is obsolete.)

My proposal is that <? ?> should continue to be classified as a "bogus comment", but that this particular bogus comment nevertheless should be accepted as permitted in the HTML5 syntax.

Thus, I don't suggest to change the HTML5 parser.

> That is, I think this should be WONTFIX.
Comment 7 Kornel Lesinski 2011-02-16 20:27:35 UTC
I realize that "polyglot serializers will observe polyglot serialization rules" is a tautology. That's good, because it means we can dictate what the syntax is going to be (IMHO it should be minimal without unnecessary talismans) without having to worry about other tools that are unsuitable for the task.

IE6 is fading away, and I think conforming HTML5 documents should never be allowed to be in quirks mode, so this use case for me is an argument against allowing XML declaration.

What editors are we talking about? Do they support other HTML5 elements and polyglot rules, or are they just XHTML Appendix C compatible?

If they're just "XHTML1/HTML4" polyglots, then it may be a good thing to flag something is not right when you want to use it for XHTML5/HTML5.
Comment 8 Leif Halvard Silli 2011-02-16 21:25:51 UTC
(In reply to comment #7)

> (IMHO it should be minimal without unnecessary talismans)

The spec says that ]] Authors should not use obsolete permitted DOCTYPEs, as they are unnecessarily long.[[ The spec could say a similar thing about the XML declaration.

> IE6 is fading away, and I think conforming HTML5 documents should never be
> allowed to be in quirks mode, so this use case for me is an argument against
> allowing XML declaration.

IE6 is fading away. But, nevertheless, I am sympathetic to that line of thought. 

> What editors are we talking about? Do they support other HTML5 elements and
> polyglot rules, or are they just XHTML Appendix C compatible?

Appendix C is the only  polyglot definition today, so yes.

> If they're just "XHTML1/HTML4" polyglots, then it may be a good thing to flag
> something is not right when you want to use it for XHTML5/HTML5.

Don't worry: most of them will most likely be using one of the conforming but obsolete DOCTYPEs, which means that "something" *will* be flagged.
Comment 9 Ian 'Hixie' Hickson 2011-05-04 20:27:30 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: I see no value in allowing an XML character encoding declaration in text/html: it has no effect, so we'd have to limit it to UTF-8, and if we do that, it would have no effect in XML, which defaults to UTF-8. What problem are we trying to solve here?
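
The observation that a UTF-8 encoding declaration has no effect in XML can be illustrated with Python's standard `xml.etree.ElementTree` module. This is only a sketch: the sample element and its text are made up for the illustration, and the snippet says nothing about how text/html parsers behave.

```python
import xml.etree.ElementTree as ET

# The same UTF-8 encoded element, once with and once without an XML declaration.
body = "<p>caf\u00e9</p>".encode("utf-8")
with_decl = b'<?xml version="1.0" encoding="UTF-8" ?>' + body
without_decl = body

# An XML parser assumes UTF-8 when no encoding is declared (and no BOM is present),
# so both inputs decode to the same text.
text_with = ET.fromstring(with_decl).text
text_without = ET.fromstring(without_decl).text
same = text_with == text_without
```

Both parses yield the same text, so `same` is true: restricting the declaration to UTF-8 leaves it without any effect on the XML side as well.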
Comment 10 Michael[tm] Smith 2011-08-04 05:12:52 UTC
mass-move component to LC1