W3C

Results of Questionnaire ISSUE-88: Should meta/@content allow a list of languages - Straw Poll for Objections

The results of this questionnaire are available to anybody.

This questionnaire was open from 2010-06-24 to 2010-06-30.

13 answers have been received.

1. Objections to the Change Proposal to make the Content-Language pragma non-conforming

We have a Change Proposal to make the Content-Language pragma non-conforming. If you have strong objections to adopting this Change Proposal, please state your objections below.

Keep in mind, you must actually state an objection, not merely cite someone else. If you feel that your objection has already been adequately addressed by someone else, then it is not necessary to repeat it.

Details

Responder Objections to the Change Proposal to make the Content-Language pragma non-conforming
Anne van Kesteren
Leif Halvard Silli Based on Opera MAMA, then:
13% of web pages contain meta@http-equiv=Content-Language.
8%-9% contain html@lang (@lang on the <html> element)
1.75% of Web pages have a "real", serverside HTTP Content-Language header
http://lists.w3.org/Archives/Public/www-international/2010AprJun/0025#start25

- The use of html@lang cancels any effects that HTTP and http-equiv might have, in HTML5 parsers and (in the average cases) in legacy parsers as well
- Likewise, in case @content contains multiple tags, then HTML5 parsers will ignore it (steps have been taken to make Gecko implement HTML5 behavior: https://bugzilla.mozilla.org/show_bug.cgi?id=564571)
- Whereas legacy browsers - with the exception of legacy Gecko - treat it as a cryptic single language tag, so cryptic that authors do not make use of it. UAs should fix it, but multiple tags don't make pages unreadable.

Thus:

* How many pages contain both meta@http-equiv=Content-Language and html@lang simultaneously? Why bother all 13% with an error message if many of those also contain html@lang, which cancels any effects that meta@http-equiv=Content-Language might have?
* Why do those 1.75% of web pages _not_ get an error/warning, when HTML5 says that documents will inherit the language from the server whenever either meta@http-equiv=Content-Language or html@lang is lacking?
* Why present authors with an error for use of multiple tags when multiple tags are required to not have any effect? Or, why should validation really be affected by legacy browsers, when the legacy behavior seldom affects pages and when the world is moving towards HTML5 parsers?

Also: To make meta@http-equiv=Content-Language entirely unconforming also means that authors will not be informed about the actual, factual effects that this element has - all that they will be told is that the element is illegal. This seems suboptimal.

I object because of lack of consistency and for needless and uneducating error messages.
Tantek Çelik No objections. It is preferable to remove broken features rather than keep them (even if "non-conforming") to minimize risk of continued misuse/misunderstanding and otherwise time-wasting on behalf of web designers and developers.
Michael Puls II
Julian Reschke The value space for http-equiv is defined by the respective specifications of HTTP headers. Changing the conformance, in particular for specific header names, is a layer violation.
Henri Sivonen I object to making the Content-Language pragma non-conforming, because doing so would make validation more noisy for authors who are taking HTML5 validation tools into use while working with legacy markup.

E.g. Microsoft FrontPage (at least some versions) use the Content-Language pragma to encode the default language of the page. While it would be nicer if the lang attribute on the root element had been used instead of the pragma, the language metadata can be relatively easily salvaged by treating the language tag from Content-Language meta pragma as being the language of the document node for the purposes of inheritance. Thus, it is reasonably easy to make single-language pragmas fit in the language identification scheme used by HTML (the lang attribute and computing the language of a node by looking for the lang value on nearest ancestor that has it specified). Supporting existing Content-Language pragma usage puts authors who have legacy content to maintain and migrate ahead of spec purity. (I agree that if existing practices were ignored, not having the Content-Language pragma at all would be nicer language design.)
Aryeh Gregor
Theresa O'Connor
David Singer
Lachlan Hunt This is my preferred alternative.
Philip Jägenstedt
Sam Johnston
Richard Ishida [For the i18n WG] We are concerned that this interferes with the original intended use of the http-equiv markup and compatibility between the corresponding HTTP header and HTML markup. On the other hand, we note that (a) the http-equiv, as a mechanism, is not often useful or effective in pages for any HTTP header values, let alone language declaration, and (b) allowing the content-language pragma to be used for language declarations will continue to confuse authors.

2. Objections to the Change Proposal to require user agents ignore pragmas that specify multiple languages

We have a Change Proposal to keep the current text that requires user agents to ignore pragmas that specify multiple languages, discouraging the use of the pragma, encouraging lang="" use instead, and explicitly requiring that conformance checkers warn of this issue where relevant. If you have strong objections to adopting this Change Proposal, please state your objections below.

Keep in mind, you must actually state an objection, not merely cite someone else. If you feel that your objection has already been adequately addressed by someone else, then it is not necessary to repeat it.

Details

Responder Objections to the Change Proposal to require user agents ignore pragmas that specify multiple languages
Anne van Kesteren
Leif Halvard Silli (NOTE: All 3 proposals agree that UAs MUST ignore pragmas with multiple languages inside.)

Based on Opera MAMA, then:
13% of web pages contain meta@http-equiv=Content-Language.
8%-9% contain html@lang (@lang on the <html> element)
1.75% of Web pages have a "real", serverside HTTP Content-Language header
http://lists.w3.org/Archives/Public/www-international/2010AprJun/0025#start25

- The use of html@lang cancels any effects that HTTP and http-equiv might have, in HTML5 parsers and (in the average cases) in legacy parsers as well
- Likewise, in case @content contains multiple tags, then HTML5 parsers will ignore it (steps have been taken to make Gecko implement HTML5 behavior: https://bugzilla.mozilla.org/show_bug.cgi?id=564571)
- Whereas legacy browsers - with the exception of legacy Gecko - treat it as a cryptic single language tag, so cryptic that authors do not make use of it. UAs should fix it, but multiple tags don't make pages unreadable.

Thus:

* How many pages contain both meta@http-equiv=Content-Language and html@lang simultaneously? Why bother all 13% with an error message if many of those also contain html@lang, which cancels any effects that meta@http-equiv=Content-Language might have?
* Why do those 1.75% of web pages _not_ get an error/warning, when HTML5 says that documents will inherit the language from the server whenever either meta@http-equiv=Content-Language or html@lang is lacking?
* Why present authors with an error for use of multiple tags when multiple tags are required to not have any effect? Or, why should validation really be affected by legacy browsers, when the legacy behavior seldom affects pages and when the world is moving towards HTML5 parsers?

Also: To treat a single tag as "more legal" than multiple tags creates a strange link, of a semantic nature, between http-equiv=Content-Language and @lang.

I object because of lack of consistency and for needless and uneducating error and warning messages.
Tantek Çelik No strong objection. However, I'd still prefer complete removal of a broken feature rather than issuing warnings.
Michael Puls II I object to this. I don't think we should be allowed to mimic the HTTP form of this header with a meta. I feel we should restrict http-equiv stuff to Content-Type for the sake of charset, which of course can be just <meta charset>. There's already lang="" and I don't think we should be encouraging (at all) the use of a meta for the language, not even to try and make some multiple language fallback try to work. I don't believe <meta> is the place for that.
Julian Reschke
Henri Sivonen
Aryeh Gregor
Theresa O'Connor
David Singer
Lachlan Hunt I'm sympathetic to the rationale for leaving this as conforming but obsolete, given that its use is mostly harmless and is already used in existing content. However, given that it's not so widely used in existing content as to present a significant hindrance to migrating to HTML5; it provides no benefit that cannot be achieved by using the lang attribute; and permitting it as conforming only serves to falsely legitimise its use, I don't think it is worthy of remaining conforming at all. This, however, is not a particularly strong objection, and I would accept a resolution in favour of either this proposal, or the proposal to make it fully non-conforming.
Philip Jägenstedt
Sam Johnston
Richard Ishida [For the i18n WG] We have a STRONG objection to this proposal because it changes syntax of the <meta> Content-Language pragma to support only one language value. This is very likely to compound existing confusion about how the <meta> Content-Language element should be used because it makes it appear more like a duplicate of the lang attribute while breaking both existing content and compatibility with the corresponding HTTP header (the only header for which this is true). Warnings are only likely to be seen by people validating their content. That is an issue with all of the proposals, but this proposal compounds the confusion by making it seem like the meta Content-Language element is *designed* as an alternative method for expressing the default document processing language for the page. We really need to help people move to a single, consistent method.

3. Objections to the Change Proposal to let multiple language tags continue to be legal

We have a Change Proposal to let multiple language tags continue to be legal. If you have strong objections to adopting this Change Proposal, please state your objections below.

Keep in mind, you must actually state an objection, not merely cite someone else. If you feel that your objection has already been adequately addressed by someone else, then it is not necessary to repeat it.

Details

Responder Objections to the Change Proposal to let multiple language tags continue to be legal
Anne van Kesteren I think this is confusing for authors given that it only has a measurable effect if one language is specified. Per the proposal it becomes meaningless for user agents if multiple values are specified. I do not find it useful at all that such a thing would not be flagged.
Leif Halvard Silli
Tantek Çelik I strongly object. The workarounds provided in the change proposal increase web authoring complexity. Broken features should be removed, from the language and the specification, rather than asking web developers to waste time learning about broken features and how to work around them. Let's keep the spec as clean as possible.
Michael Puls II I object to this. I don't think we should be allowed to mimic the HTTP form of this header with a meta. I feel we should restrict http-equiv stuff to Content-Type for the sake of charset, which of course can be just <meta charset>. There's already lang="" and I don't think we should be encouraging (at all) the use of a meta for the language, not even to try and make some multiple language fallback try to work. I don't believe <meta> is the place for that.
Julian Reschke
Henri Sivonen I strongly object to this change proposal. (To be clear, this objection should be considered stronger than my objection to making the pragma entirely non-conforming.)

Currently, each node in an HTML document can have its language computed to at most one language tag. A multi-language value doesn't fit in that language inheritance and computation scheme. The Change Proposal doesn't try to take the multiple languages in the pragma into account in the computation. It merely would make it conforming to have a list of languages in the pragma. I object to changing the conformance definition to give authors the wrong idea of the effect of the pragma by making it conforming to have a language list if the full list doesn't participate in computations (i.e. making the list talismanic). To be clear, I also pre-emptively object to making the language computation and inheritance more complex (by allowing the whole language list in the pragma to participate in the computation).

I object to letting the notions of "target audience" or "content management systems" influence how pragma directives work in HTML. Actual usage as exemplified by the behavior programmed into Microsoft FrontPage suggests that people don't generally think in terms of "target audience" as in "This document written in French is meant for a German-speaking target audience". Instead, they think in terms of declaring the language of the document as in "This document is written in French." (That is, I think the HTTP Content-Language header is flawed as specified. If one stretches the limits of plausibility enough, one might see the case for responding with a French document written for a German-speaking audience to an HTTP request with Accept-Language: de, but it doesn't seem useful at all to have the resource declare whom it was intended for at the point where it's too late to perform negotiation based on the intended target.)

So far, it hasn't been demonstrated that the authoring tools need to be able to communicate the "target audience" to CMSs in-band so often that the use case merited standardization as an HTML pragma. (When configuring a CMS directly without wanting an authoring tool from a different vendor to drive the configuration, it's not necessary to standardize CMS configuration directives; any product-specific syntax will do and doesn't have to leak to public view.) I object to premature standardization of purported CMS configuration features.
Aryeh Gregor Having multiple languages in http-equiv="Content-Language" is not interoperably supported, as far as I can tell. It might be worth defining useful behavior here and trying to get browsers to converge on it, but only if it would support features that aren't currently available. Otherwise it would be a waste of effort both to spec and implement.

I cannot find any realistic use given in the change proposal for allowing multiple languages here. Apparently, the idea is to indicate the target audience for the page, as opposed to what language the page is written in. However, nothing here tells us why anyone would want to do this, or how any HTML processor should use this information, other than to figure out what language the page is in. As such, it seems like a purely theoretical way to embed unneeded semantics in the page. (By contrast, Content-Language in an actual HTTP header could theoretically be used for content negotiation, in which case you would want to know the target audience and not the document language.)

On the other hand, the proposal would permit confusing markup to be valid and raise no warnings. For instance, if I understand correctly, it would make a document like

<!doctype html>
<html lang="en">
<meta http-equiv="Content-Language" content="de">

conforming with no warnings. But it's very unclear what this markup means, without actually checking the specs. Even if you consider a case like

<!doctype html>
<html lang="en">
<meta http-equiv="Content-Language" content="en,en-US">

where it's clear what language the document is supposed to be in, and even if UAs would handle this interoperably (would they?), authors tend to copy and paste markup. They (reasonably) assume that if markup works in one page, it should work in another. If the lang attribute were removed or changed in this document, the markup would become confusing, or might not even be processed interoperably.

In short, as far as I can tell from the change proposals, allowing multiple languages here opens up the risk of confusing or non-interoperable markup without exposing any practically useful features. The document snippets I gave above should not be conforming without at least warnings, so I object to this change proposal.
Theresa O'Connor If an HTTP header is to be included in an HTML page with http-equiv, its syntax and semantics should, ideally, be covered by HTTP and not HTML. But *which HTTP headers* can be so included is properly in the purview of HTML.

Browser behavior around http-equiv="content-language" differs from what HTTP specifies, but is required for compat. Given this disconnect, and the general user confusion around the feature, and the presence of simpler features that are far more likely to be used correctly, the HTML spec should disallow users from using http-equiv="content-language" in their pages.
David Singer The HTTP equivalent language header was probably a mistake. However, it is now used as a fallback for missing lang tags on the HTML. In this state, it is important that it only give a single language. Using lang properly is undoubtedly better, but failing that, getting a single language from this tag is what is needed. Allowing multiple languages is confusing, and does not lead to predictable, interoperable results. This proposal should not be accepted, but instead the status quo, or making the HTTP equivalent header non-conforming.
Lachlan Hunt I have recorded my thorough objections to this proposal elsewhere.

http://lists.w3.org/Archives/Public/www-archive/2010Jun/att-0072/Content-Language.html

I am citing the following summary of arguments here for easier review, but be aware that the supporting rationale for these arguments is in the linked article.

* The change proposal is based upon the false premise that the Content-Language HTTP header and pragma directive are equivalent.

* The HTTP header is used to declare the languages of the intended audience; the only defined function of the pragma directive is to be used as a fallback language in the absence of the lang attribute.

* The use of the pragma directive as part of server configuration is out of scope of HTML. Specific server side implementation choices need not affect the conformance definition.

* The pragma directive only fulfils its purpose of providing a fallback language when one language tag is specified. Multiple language tags are, by definition of the implementation requirements, not useful or beneficial.

* There are no reasons given for why it is beneficial to leave the pragma directive in the document when the lang attribute is present on the root element.

* Failing to offer a warning about its presence in all cases would continue to mislead the author about its legitimacy.

* The inconsistency of when warnings are issued would be confusing to authors. It is better to offer a consistent warning about the presence of a redundant feature.

* The defined effect, per the implementation requirements, of declaring multiple language tags is identical to that of omitting the pragma directive entirely. No reasons are given to explain why declaring multiple language tags is useful.

* The syntax of the Content-Language HTTP header field is not affected by the definition of the distinct Content-Language pragma directive in HTML, with which it only shares a common name and does not share significant functionality. It is reasonable for this distinct feature to use a distinct conforming syntax that is suitable for its purpose.

* No reason is given explaining why only emitting the warning under specific circumstances, as opposed to the current specification requirement, would serve better in encouraging authors to use the lang attribute instead.

* The proposed replacement specification text contains unjustified changes, inconsistencies, unimplementable requirements and is overall inappropriate for use in the specification.

* The claimed positive effects are unsupported by evidence and, in several cases, blatantly incorrect.

* In practice, very few authors use multiple language tags in the pragma directive, and doing so is not useful. Restricting the syntax to one language would not have a significant negative impact.
Philip Jägenstedt Several of the claimed positive effects of this proposal would be better served by making http-equiv="Content-Language" non-conforming altogether.

1. More positive -- adding lang="" to mask http-equiv="Content-Language" seems worse than adding lang="" and getting rid of http-equiv="Content-Language" altogether. Leaving http-equiv="Content-Language" in a document just means that it risks getting copy-pasted into new documents, when lang="" is what should be used.

4. More correct -- I agree that, all else equal, it would be nice if http-equiv actually had the same effect as a HTTP header. However, given that this isn't possible in this case for compatibility reasons, it would be better to make http-equiv="Content-Language" non-conforming so that people (who care about validity) don't use it at all.

5. More useful -- simply making http-equiv="Content-Language" non-conforming would make for a less confusing authoring situation.
Sam Johnston Empirical evidence[1] suggests that the proposed functionality is more often than not either unused or misused. Language support is already more complicated than it needs to be and as a result often goes ignored by many designers - providing two mechanisms for specifying the same metadata is at least one mechanism more than necessary.

Accordingly the redundant mechanism should either be removed or at the very least simplified, particularly in consideration of the extremely limited use of multiple languages and the low impact of the fix.

1. http://lists.w3.org/Archives/Public/public-html/2010Apr/0088.html
Richard Ishida [For the i18n WG] The Internationalization WG is happiest with this proposal (compared to the others) because it is most consistent with our view that existing content should not be harmed or required to change the syntax of <meta> Content-Language, while the document processing language should be clearly defined, independent of document metadata, and derived primarily from @lang.

We object to the proposal as written because, although it provides a workable defaulting mechanism that may help with legacy pages, it is likely to prolong the confusion experienced by users creating new pages. In the absence of a lang attribute on the <html> tag, declaring language in <meta> Content-Language will continue to produce an effect. Users will only find out that you shouldn't do that if they validate their pages - and the people we're talking about who get this wrong are quite likely not to validate.

The CP also proposes two methods to remove any warnings that involve removing the meta and/or HTTP information rather than adding a lang attribute. This seems inconsistent with the goal of encouraging people to use language declarations. Insertion of lang attributes is preferred to removing information from the page, even if that information is not used by user-agents.

We would prefer that the CP be modified so that browsers must not guess at the default language for the page by looking at the HTTP headers and/or meta elements. This would result in a CP that does not remove or change the http-equiv information (as the "non-conforming" CP proposes) but would render it harmless. We believe that the defaulting mechanism proposed will occasionally confuse users when newer features in CSS3 (for example) are activated by the HTTP header value or by metadata injected by (for example) their CMS. This, of course, is an argument against the main raison d'être of this CP.

The Internationalization WG also STRONGLY disagrees with the proposal to change 'pragma-set default language' to 'pragma-set locale language'. We feel that the definition of locale (which is to do with API settings) is not to be confused with the declaration of content language. Although these are related in some ways, in fact the "locale" is not set by @lang and it should not be implied to be so.
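
For illustration only (this is not part of any response above): the fallback behaviour that several responders describe, in which a lang attribute on the root element takes precedence, a single-tag pragma supplies a default, and a pragma listing multiple tags is ignored, might be sketched roughly as follows. This is a minimal sketch, not the specification's algorithm. The function names are hypothetical, trimming surrounding whitespace is an assumption, and the fallback to the HTTP Content-Language header is deliberately left out.

```python
from typing import Optional


def pragma_default_language(content: Optional[str]) -> Optional[str]:
    """Return the fallback language taken from a Content-Language pragma.

    Hypothetical helper: returns None when the pragma is absent, empty,
    or lists more than one language tag (detected here by a comma).
    """
    if content is None or "," in content:
        # A list of tags is ignored, as if the pragma were not present.
        return None
    tag = content.strip()  # assumption: surrounding whitespace is irrelevant
    return tag or None


def document_language(html_lang: Optional[str],
                      pragma_content: Optional[str]) -> Optional[str]:
    """Return the language of the document node under the sketched rules."""
    if html_lang is not None:
        # A lang attribute on the root element masks the pragma entirely.
        return html_lang
    return pragma_default_language(pragma_content)


# The two snippets from Aryeh Gregor's response, plus the same pragmas
# with the lang attribute removed.
print(document_language("en", "de"))
print(document_language("en", "en,en-US"))
print(document_language(None, "de"))
print(document_language(None, "en,en-US"))
```

Under these assumptions the pragma only ever matters in the third case, which is the point several objections turn on: a list of tags has the same effect as no pragma at all.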

More details on responses

  • Anne van Kesteren: last responded on 24 June 2010 at 13:05 (UTC)
  • Leif Halvard Silli: last responded on 25 June 2010 at 16:29 (UTC)
  • Tantek Çelik: last responded on 27 June 2010 at 01:29 (UTC)
  • Michael Puls II: last responded on 27 June 2010 at 17:46 (UTC)
  • Julian Reschke: last responded on 28 June 2010 at 09:48 (UTC)
  • Henri Sivonen: last responded on 28 June 2010 at 13:49 (UTC)
  • Aryeh Gregor: last responded on 28 June 2010 at 20:51 (UTC)
  • Theresa O'Connor: last responded on 28 June 2010 at 21:19 (UTC)
  • David Singer: last responded on 30 June 2010 at 00:42 (UTC)
  • Lachlan Hunt: last responded on 30 June 2010 at 07:51 (UTC)
  • Philip Jägenstedt: last responded on 30 June 2010 at 10:20 (UTC)
  • Sam Johnston: last responded on 30 June 2010 at 11:18 (UTC)
  • Richard Ishida: last responded on 1 July 2010 at 10:17 (UTC)

Everybody has responded to this questionnaire.

