Null change proposal for ISSUE-88 (mark III)

ISSUE-88
========

SUMMARY
There is no problem and the proposed remedy is to change nothing.

RATIONALE
There is no problem.


Another change proposal suggests adding a note on the basis that we should 
clarify why the HTTP and pragma declarations are different when it comes 
to values, and how they should be used, suggesting that this is a constant 
source of confusion.

However, no evidence has been provided to suggest that this really is a 
source of confusion.

Furthermore, the suggested note is wrong in practice. The pragma doesn't 
give metadata about the document. The original intent of the <meta 
http-equiv> feature was to provide a way for _servers_ to include data in 
their HTTP headers on a per-file basis; this isn't document-wide metadata 
for user agents, it's for servers. This original intent also doesn't match 
reality; reality is that this pragma sets the default language for 
lang="", which also isn't document-wide metadata for user agents.

Finally, the proffered note does not actually match the associated 
rationale: it doesn't explain why the HTTP and pragma declaration syntaxes 
are different; instead it talks about a "language" attribute.

If there is a "constant source of confusion", then what we need is 
pointers to this confusion, so that text intended specifically to address 
that confusion is included in the spec. It is quite possible that we could 
add lots of explanatory text and explain the situation in detail, but to 
do so we need to know what the confusion is about. As far as I am aware, 
no bug pointing to confusion on this subject and asking for clarification 
has been rejected, which makes using the change proposal process 
inappropriate.


The same change proposal also suggests a second change, namely to change 
the syntax to allow multiple comma-separated language codes, even though 
all but the first would be ignored.

User agents vary in their handling of the Content-Language pragma. Some 
user agents support a comma-separated list as meaning (contrary to the 
intent of the Content-Language HTTP header) that the root element and its 
descendants, in the absence of any lang="" attribute, are in multiple 
languages. This seems to contradict the model expected by the :lang 
selector and by the lang="" attribute, which assume that each element has 
a single language.

Other user agents treat the comma as part of the language tag, for example 
treating <meta http-equiv="Content-Language" content="en,fr"> as setting a 
pragma-set default language of "en,fr", which can be matched by a selector 
such as ":lang(en\,fr)", and specifically _not_ by ":lang(en)".
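
To make the mismatch concrete, here is a minimal sketch of that legacy 
behaviour. It assumes a simplified :lang() rule (case-insensitive exact 
match, or the selector argument followed by "-"); the function and 
variable names are hypothetical and are not taken from any user agent.

```python
def lang_matches(element_language: str, selector_range: str) -> bool:
    """Simplified :lang() check: exact match, or prefix followed by '-'."""
    element = element_language.lower()
    wanted = selector_range.lower()
    return element == wanted or element.startswith(wanted + "-")


# Assumption: a legacy UA stores the whole content="" value, comma
# included, as the pragma-set default language.
legacy_default_language = "en,fr"

print(lang_matches(legacy_default_language, "en,fr"))  # True
print(lang_matches(legacy_default_language, "en"))     # False
print(lang_matches("en-US", "en"))                     # True
```

Under these assumptions "en,fr" is simply an odd-looking tag that starts 
with neither "en-" nor "fr-", so neither language is matched on its own.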

(The specification's UA conformance criteria propose a compromise model 
wherein the user agents aren't required to support multiple languages per 
element, but still interpret the comma correctly, rather than treating it 
as part of the language code.)
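
As a rough illustration of that compromise, the sketch below keeps only 
the text before the first comma. The function name and the whitespace 
trimming are assumptions made for this example; the specification's own 
wording remains the authority on the exact steps.

```python
def pragma_default_language(content: str) -> str:
    """Return the first comma-separated value, trimmed of whitespace.

    Hypothetical helper: everything after the first comma is dropped,
    so the comma never becomes part of the language code.
    """
    first_value = content.split(",", 1)[0]
    return first_value.strip()


print(pragma_default_language("en,fr"))        # en
print(pragma_default_language(" en-US , fr"))  # en-US
print(pragma_default_language("de"))           # de
```

With this reading, content="en,fr" yields a default language of "en", 
and the "fr" is dropped rather than applied to any element.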

Because of the way some legacy UAs handle this pragma, and because the 
behaviour of conforming UAs drops all but the first language, it would be 
ill-advised for us to make multiple values conforming. The way to mark 
that a document _uses_ multiple languages in such a way that user agents 
can actually parse and find this information is to use the lang="" 
attribute in the document. Putting multiple values in the pragma would 
not achieve this, since under the proposal all but the first would be 
ignored.

Another possible use case would be to have a standard way to say who 
the target audience of the document is, but in practice few people use 
that information on the Web, so it doesn't seem like having a pragma that 
exposes this information would be useful, even if we ignore that the user 
agents are currently required to ignore that information.

Even if there were such a need, this feature would be a bad way to provide 
that information, since it is used in an incompatible way by user agents 
(they use this information to determine processing behaviour -- none of 
the languages are treated as a target audience language hint).

For controlled environments, there are a multitude of options available to 
authors, such as the HTTP header of the same name, <meta name> with custom 
names, microdata, RDFa, out-of-band data, <script> blocks, etc. We don't 
need to use this mechanism for that purpose. Doing so would just confuse 
authors further.

No rationale is given for this second change, so it is hard to evaluate 
what the benefit of making this change would be.


Finally, it should be noted that the aforementioned other change proposal 
is self-contradictory. Making the second change (thus making the syntax 
of the pragma the same as its HTTP namesake) would make the rationale for 
the first change (that we should explain the differences between the 
syntax of the pragma and the HTTP header) incorrect.


DETAILS
Change nothing.

IMPACT

POSITIVE EFFECTS
* Encourages authoring behaviour compatible with both legacy user agents 
and with conforming user agents.
* Flags uses of the pragma in existing documents that are not being 
reliably processed in existing UAs.

NEGATIVE EFFECTS
* Flags uses of the pragma in existing documents that are harmless, such 
as "en,en-US". However, evidence suggests that use of the comma is pretty 
rare anyway:
   http://lists.w3.org/Archives/Public/public-html/2010Apr/0088.html

CONFORMANCE CLASS CHANGES
None.

RISKS
It's possible that there is confusion. However, it is easy to handle this 
at a future date when clear evidence of such confusion is found.


REFERENCES
Tests: http://www.hixie.ch/tests/adhoc/html/meta/content-language/

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Sunday, 4 April 2010 01:02:22 UTC