This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 21818 - XHTML5: Permit <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
Status: CLOSED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: HTML5 spec
Version: unspecified
Hardware: PC Linux
Importance: P2 blocker
Target Milestone: ---
Assignee: Robin Berjon
QA Contact: HTML WG Bugzilla archive list
URL: http://www.w3.org/html/wg/drafts/html...
Whiteboard:
Keywords: CR
Depends on:
Blocks: 21174
Reported: 2013-04-24 18:21 UTC by Leif Halvard Silli
Modified: 2014-08-19 06:37 UTC

Description Leif Halvard Silli 2013-04-24 18:21:11 UTC
BACKGROUND: HTML5’s section on <meta charset="foo"/> [1] has this tail about its XHTML usage:

]]
If the attribute is present in an XML document, its value must be an ASCII case-insensitive match for the string "UTF-8" (and the document is therefore forced to use UTF-8 as its encoding).
[[

PROPOSAL: Repeat above tail also for <meta http-equiv="Content-Type" content="text/html; charset=FOO"/> at the end of the fragment #attr-meta-http-equiv-content-type, like so:

]]
 The Encoding declaration state may be used in HTML documents. <ins>and in XML documents</ins>. But <ins>if</ins> elements with an http-equiv attribute in that state <del>must not be</del> <ins>are</ins> used in XML documents, <ins>then the name of the character encoding of the character encoding declaration MUST be UTF-8 (and the document is therefore forced to use UTF-8 as its encoding)</ins>
[[

JUSTIFICATION: This encoding declaration is more robust than the meta@charset declaration. Therefore we want this method in Polyglot Markup. But to allow it in Polyglot Markup, it must first be permitted in HTML5 proper.

[1] http://www.w3.org/html/wg/drafts/html/master/document-metadata.html#attr-meta-charset
Comment 1 Eliot Graff 2013-05-13 20:06:37 UTC
I marked this as a blocking bug. Note that this is blocking polyglot, NOT HTML5.
Comment 2 Simon Pieters 2013-05-14 07:51:19 UTC
(In reply to comment #0)

> JUSTIFICATION: This encoding declaration is more robust than the
> meta@charset declaration. 

[citation needed]
Comment 3 Leif Halvard Silli 2013-05-14 14:03:13 UTC
(In reply to comment #2)
> (In reply to comment #0)
> 
> > JUSTIFICATION: This encoding declaration is more robust than the
> > meta@charset declaration. 
> 
> [citation needed]

First, the primary justification is of course the same justification that HTML5 gives for allowing <meta charset="UTF-8"/> in XHTML5:
   ]] to facilitate migration to and from XHTML.[[
   http://www.w3.org/html/wg/drafts/html/master/document-metadata.html#attr-meta-charset

That justification is, in turn, related to what HTML5 says about the http-equiv=Content-Type and meta@charset being equivalent features:
   ]] The Encoding declaration state is just an alternative form of setting
      the charset attribute: it is a character encoding declaration. [[
   http://www.w3.org/html/wg/drafts/html/master/document-metadata.html#attr-meta-http-equiv-content-type

Being an alternative form, it makes little sense to have different usage rules for the two of them.


As for the additional justification, that this declaration is "more robust than the meta@charset declaration", let me cite the initial comment in bug 21174:

> 4. However, fact is that in some implementation segments, the @charset
> variant is not supported. For instance OpenOffice, on last check, did not
> support <meta charset="UTF-8"/>. Thus, if the author wants to support such
> implementations, he/she has to not conform to the polyglot spec

  (OpenOffice importer also failed to understand the BOM, 
   and doesn't understand HTTP.)

   Btw, Google Docs, when saving/'download as HTML', skips the encoding declaration and instead settles for character entities for non-ASCII content, *maybe* because they wish to increase compatibility with consumers such as OpenOffice (since OpenOffice does support character entities but doesn't support <meta charset="FOO"/>).

Other examples:

1) The XHTML5/HTML5 compatible WYSIWYG editor Freeway Pro 6.0.8 (anno 2013) sports (despite my bug report, btw) an import engine that doesn't support BOM or <meta charset/>

2) The HTML parser of XMLlib before version 2.8 does not understand <meta charset="UTF-8"/>
   (In 2.8, the charset declaration seems to work - as characters are rendered as character entities, but the http-equiv variant seems to work *better*, as characters are then rendered as is.)

Other justifications:

3) There are UTF-8 capable (legacy) authoring tools that can insert (obsolete, but conforming) DOCTYPEs *and* <meta http-equiv>, but which cannot insert <meta charset>. One such tool is Amaya (which e.g. my wife *insists* on using) - http://www.w3.org/Amaya/

4) Sam said, http://intertwingly.net/blog/2012/11/09/In-defence-of-Polyglot#c1352476295 
   "For example, I not only always use utf-8, but I also always 
    declare such BOTH in a meta tag AND in the content type."
   My *guess* is that *one* justification for Sam’s habit is the lack of 100% support for <meta charset="UTF-8"/>.
Comment 4 Leif Halvard Silli 2013-05-14 14:40:04 UTC
(In reply to comment #0)

> 
> PROPOSAL: Repeat above tail also for <meta http-equiv="Content-Type"
> content="text/html; charset=FOO"/> at the end of the fragment
> #attr-meta-http-equiv-content-type, like so:
> 
> ]]
>  The Encoding declaration state may be used in HTML documents. <ins>and in
> XML documents</ins>. But <ins>if</ins> elements with an http-equiv attribute
> in that state <del>must not be</del> <ins>are</ins> used in XML documents,
> <ins>then the name of the character encoding of the character encoding
> declaration MUST be UTF-8 (and the document is therefore forced to use UTF-8
> as its encoding)</ins>
> [[

BTW, editors, please also add the related note,[1] which I hereby offer in http-equiv modified form:

]]The http-equiv="content-type" has no effect in XML documents, and is only allowed in order to facilitate migration to and from XHTML.[[

[1] http://www.w3.org/html/wg/drafts/html/master/document-metadata.html#attr-meta-charset
Comment 5 Robin Berjon 2013-05-27 09:13:25 UTC
Trying to make this actionable, if we look inside http://www.w3.org/html/wg/drafts/html/master/document-metadata.html#attr-meta-http-equiv-content-type we currently have:

"""
<p>
The Encoding declaration state may be used in HTML documents, but elements with an http-equiv attribute in that state must not be used in XML documents.
</p>
"""

So the proposal would be to replace that with:

"""
<p>
The Encoding declaration state may be used in HTML documents and in XML documents. If the Encoding declaration state is used in an XML document, the name of the character encoding must be an ASCII case-insensitive match for the string "UTF-8" (and the document is therefore forced to use UTF-8 as its encoding).
</p>
<p class=note>
The Encoding declaration state has no effect in XML documents, and is only allowed in order to facilitate migration to and from XHTML.
</p>
"""

Is that correct? Would it address your concerns?

Has anyone checked implementations on this? I don't mind making this change so long as browsers *really* ignore this instruction in XHTML. A test case would be most valuable.
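
As a rough illustration of the requirement in the proposed text, the following Python sketch checks whether the content attribute of such a meta element names an encoding that is an ASCII case-insensitive match for "UTF-8". The function names are hypothetical, and the regular expression is a deliberate simplification, not the HTML specification's own algorithm for extracting a character encoding from a meta element.

```python
import re

# Simplified pattern (an assumption, not the spec's extraction algorithm):
# finds "charset=<value>" anywhere in a Content-Type style string.
_CHARSET_RE = re.compile(
    r"charset\s*=\s*[\"']?([^\s;\"']+)", re.IGNORECASE | re.ASCII
)


def extract_charset(content):
    """Return the charset label found in a content attribute, or None."""
    match = _CHARSET_RE.search(content)
    return match.group(1) if match else None


def ascii_lower(text):
    """Lowercase only A-Z, as an ASCII case-insensitive comparison requires."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def is_allowed_in_xml(content):
    """True if the declared encoding is an ASCII case-insensitive match for "UTF-8"."""
    charset = extract_charset(content)
    return charset is not None and ascii_lower(charset) == "utf-8"
```

A checker built along these lines would accept content="text/html; charset=utf-8" in an XML document and reject content="text/html; charset=ISO-8859-1".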
Comment 6 Leif Halvard Silli 2013-05-27 13:18:55 UTC
(In reply to comment #5)

> So the proposal would be to replace that with:
> 
> """
> <p>
> The Encoding declaration state may be used in HTML documents and in XML
> documents. If the Encoding declaration state is used in an XML document, the
> name of the character encoding must be an ASCII case-insensitive match for
> the string "UTF-8" (and the document is therefore forced to use UTF-8 as its
> encoding).
> </p>
> <p class=note>
> The Encoding declaration state has no effect in XML documents, and is only
> allowed in order to facilitate migration to and from XHTML.
> </p>
> """
> 
> Is that correct? Would it address your concerns?

Absolutely! This seems perfect.

> Has anyone checked implementations on this? I don't mind making this change
> so long as browsers *really* ignore this instruction in XHTML. A test case
> would be most valuable.

XML 1.0 only recognizes the following ways of determining the encoding:

* fallback/default = UTF-8
* UTF-16 (and UTF-8) sniffing
* BOM
* <?xml version="*" encoding="*"?> (forbidden in polyglot)
* server sent HTTP Content-Type header

Thus, meta elements are not included in XML’s encoding calculation rules. Neither is the XML MIME type supposed to be decided by looking into <meta>.
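
The rules listed above can be illustrated outside a browser. The sketch below feeds Python's standard xml.etree.ElementTree parser a UTF-8 encoded XHTML document whose meta element deliberately claims ISO-8859-1. The sample document and the helper name are assumptions made for illustration, and this exercises one XML parser only, so it does not replace a browser test.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# The meta element claims ISO-8859-1, but the bytes are really UTF-8,
# and there is neither a BOM nor an XML declaration.
SOURCE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
    '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>'
    "<title>\u00e9</title></head><body/></html>"
).encode("utf-8")


def title_as_parsed(data):
    """Parse bytes as XML and return the text of the XHTML title element."""
    root = ET.fromstring(data)
    return root.find(f"{XHTML_NS}head/{XHTML_NS}title").text


if __name__ == "__main__":
    # Had the parser honoured the meta element, the single character
    # U+00E9 would have been decoded as two ISO-8859-1 characters.
    print(title_as_parsed(SOURCE) == "\u00e9")
```

The parser falls back to the XML default of UTF-8 here, so the meta element plays no part in decoding.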

Now, with regard to testing, I recommend testing with an XHTML file that has no file extension, loaded via a file:/// URL. For example, a file called "test" (and not "test.html" or "test.xhtml"). But note that not all browsers will agree to open a file without a file extension. (The file extension normally tells the browser the MIME type.)

A first test you could do is to compare this:

<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />

vs this

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

RESULT: text/html, regardless (in Chrome and Firefox; Safari refuses to open the file).

And so on. Other tests are certainly possible, but I think the above proves the point.
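
For anyone wishing to repeat the comparison, a small Python sketch along the following lines could generate the two extension-less files. The file names, the surrounding markup and the function name are all assumptions; the source only specifies the two meta elements and that the files should have no extension.

```python
from pathlib import Path

# Assumed minimal XHTML5 wrapper; only the meta element comes from the test above.
TEMPLATE = (
    "<!DOCTYPE html>\n"
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    "<head>\n"
    '<meta http-equiv="Content-Type" content="{content}" />\n'
    "<title>meta http-equiv test</title>\n"
    "</head>\n"
    "<body><p>\u00e6\u00f8\u00e5</p></body>\n"
    "</html>\n"
)

# Hypothetical file names, deliberately without a file extension.
CASES = {
    "test-xhtml": "application/xhtml+xml; charset=UTF-8",
    "test-html": "text/html; charset=UTF-8",
}


def write_test_files(directory):
    """Write one UTF-8 encoded file per case and return the paths created."""
    paths = []
    for name, content in CASES.items():
        path = Path(directory) / name
        path.write_text(TEMPLATE.format(content=content), encoding="utf-8")
        paths.append(path)
    return paths
```

Each resulting file can then be opened via a file:/// URL to see which MIME type the browser settles on.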
Comment 7 Travis Leithead [MSFT] 2013-07-18 22:18:55 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the Editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this document:


   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: Applied changed text as discussed in the bug
Rationale: Makes the recommended guidance and note for content-type meta http-equiv processing consistent with the guidance for the meta charset attribute.

See HTML5.1 Nightly commit:
https://github.com/w3c/html/commit/bcaf4c17d7f073e7647323a51f0a51d1a1998129

And also HTML5.0 CR Draft commit:
https://github.com/w3c/html/commit/2a9b7e8622c79e0a0ffea74274e1bc4806900efa
Comment 8 Michael[tm] Smith 2013-07-19 02:25:45 UTC
Speaking as a validator developer, I don't like this change at all and it's disappointing to see it landing prematurely. It introduces an unnecessary incompatibility with the WHATWG spec, and it's highly unlikely that the WHATWG spec is going to change based just on the arguments presented in this bug. I don't think Leif has demonstrated why the change is needed (the arguments presented in comment 3 are not convincing) nor worked to try to get consensus about it.

(In reply to comment #0)
> JUSTIFICATION: This encoding declaration is more robust than the
> meta@charset declaration. Therefore we want this method in Polyglot Markup.
> But to allow it in Polyglot Markup, it must first be permitted in the HTML5
> proper.

I suggest then that the Polyglot spec be turned into an extension spec, and then you can override requirements in the HTML spec. I think making further changes to the HTML spec itself to accommodate the Polyglot spec is not a good idea.
Comment 9 github bugzilla bot 2014-01-16 15:10:14 UTC
Commit pushed to CR at https://github.com/w3c/html

https://github.com/w3c/html/commit/bcaf4c17d7f073e7647323a51f0a51d1a1998129
[Bug 21818] XHTML5: Permit <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
Comment 10 Henri Sivonen 2014-08-19 06:37:13 UTC
Introducing a difference from the WHATWG spec in order to cater to the polyglot guide is completely inappropriate.

This kind of churn outside the polyglot guide shows that the polyglot guide has negative externalities. I think the editors of other specs should absolutely refuse to let the polyglot guide have externalities and require it to internalize all the trouble it causes.