Null change proposal for ISSUE-88 (mark II): proposed note

This is a personal response. I want to specifically address some points
related to the proposed note text in the current version of the 'other
change proposal', which can be found at
http://www.w3.org/International/wiki/Htmlissue88 .

(With regard to the second part of Ian's proposal, we have been discussing
this in the i18n WG, and are trying to find a way forward wrt the topic.  We
may suggest a teleconference at some point to try to move faster towards
consensus.)


> From: public-html-request@w3.org [mailto:public-html-request@w3.org] On
> Behalf Of Ian Hickson
> Sent: 04 April 2010 02:02
...

> Another change proposal suggests adding a note on the basis that we
> should
> clarify why the HTTP and pragma declarations are different when it comes
> to values, and how they should be used, suggesting that this is a constant
> source of confusion.

The note proposes to clarify why HTTP and pragma declarations are different
from lang/xml:lang attribute declarations - not why HTTP and pragma
declarations are different from each other.  I have changed 'are different'
in the rationale to 'are different from language attributes' to make that
clearer at http://www.w3.org/International/wiki/Htmlissue88 although I think
it should be clear enough in the proposed text for the note itself.

> However, no evidence has been provided to suggest that this really is a
> source of confusion.

As part of its remit, the i18n WG has talked with many, many people over
several years about how to declare language in HTML documents. In our
experience over those years, the question as to whether they should use the
lang attribute or the meta tag with Content-Language (which I will refer to
as 'the pragma' here) has almost without exception proved confusing to
content authors. That is the confusion we are concerned about. 

That confusion was compounded and exemplified in the past by earlier
versions of html editors, such as Frontpage, which, if you tried to declare
the default language for the page for text-processing[1] purposes using
dialog boxes, would add language information to the 'pragma' and not to a
lang attribute.  In addition, if you searched the Web for information about
how to declare language for a document some years back, sources almost
invariably proposed the use of the 'pragma' for that and didn't refer to the
lang attribute.  This, however, was at a time when major browsers (such as
IE, which then had a much larger market share than now) did not even support
the use of Content-Language for determining the default language of a page
for text-processing purposes, whereas the lang attribute was supported for
that purpose.

More recently editors and advice appear to have been slowly changing, and
prospective content authors appear to now be much more likely to be guided
to use the lang attribute than the 'pragma'.  (This was borne out by
statistics I have seen in a presentation from Google at a Unicode
Conference, but which I am unable to find a pointer to, that showed the use
of language attributes increasing and that of pragmas decreasing.) On the
other hand, at the same time, there is also now much wider support for
defining the default language of a page using the pragma, since IE8, Safari,
Chrome and Firefox (but still not Opera) now support that.  (Note, however,
that IE7 still didn't support it.)

We agree with the HTML5 spec [2] that authors should be encouraged to use
the lang attribute to declare the default language of the document, and our
concern is that changing the syntax of the pragma to accept only a single
value makes it appear to be an equivalent alternative to the lang attribute and
therefore lessens the likelihood that authors will use the attribute -
especially for those who don't validate their content and see a warning
message. 

We recognise that there is usefulness in inferring default language from the
pragma or the HTTP header in the absence of a lang attribute, especially now
that browsers are generally doing so, but we feel that we should do what we
can to make it simpler for authors to understand that this is only a
fallback procedure.  We feel that continuing to describe the pragma in the
same terms as the HTTP header will help with that.
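
To make the fallback order concrete, here is a rough sketch in Python.  It
is only an illustration, not spec text: the function names are hypothetical,
and taking just the first comma-separated value from the pragma or header is
an assumption based on the behaviour Ian describes ("User agents do not pay
any attention to values after the first").  Treating an empty lang attribute
as "explicitly unknown, no fallback" is likewise my reading rather than
something established in this thread.

```python
from typing import Optional


def first_language(value: Optional[str]) -> Optional[str]:
    """Return the first comma-separated language tag, or None if there is none."""
    if not value:
        return None
    first = value.split(",")[0].strip()
    return first or None


def default_language(lang_attr: Optional[str],
                     pragma: Optional[str],
                     http_header: Optional[str]) -> Optional[str]:
    """Pick the default text-processing language for a page (hypothetical helper).

    lang_attr   -- value of the lang attribute on the root element, None if absent
    pragma      -- content of <meta http-equiv="Content-Language">, None if absent
    http_header -- value of the HTTP Content-Language header, None if absent
    """
    # The lang attribute is the author's primary declaration and always wins.
    # Assumption: an empty value means "language unknown" and stops the fallback.
    if lang_attr is not None:
        return lang_attr.strip() or None
    # Only when there is no lang attribute do we fall back, pragma first,
    # then the HTTP header, using the first listed value in each case.
    for source in (pragma, http_header):
        tag = first_language(source)
        if tag:
            return tag
    return None
```

For example, default_language(None, "en, de", "ja") yields "en", whereas
default_language("fr", "en, de", "ja") yields "fr": the pragma and the
header only matter when the author has not used the attribute.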

[1] By default language for text-processing we mean the language used by
voice browsers, spell-checkers, etc.  This is what the lang attribute is
ideally suited for, given that it can only associate one language at a time
with a given range of text.

[2] "Conformance checkers will include a warning if this pragma is used.
Authors are encouraged to use the lang attribute instead."


> Furthermore, the suggested note is wrong in practice. The pragma doesn't
> give metadata about the document. The original intent of the <meta
> http-equiv> feature was to provide a way for _servers_ to include data in
> their HTTP headers on a per-file basis; this isn't document-wide metadata
> for user agents, it's for servers. 

The purpose of the HTTP header is described in 14.12 Content-Language of
RFC2616 as:

"The Content-Language entity-header field describes the natural language(s)
of the intended audience for the enclosed entity. Note that this might not
be equivalent to all the languages used within the entity-body."
http://www.ietf.org/rfc/rfc2616.txt

So I'm not sure why you say that the pragma is not originally designed for
metadata about the document, given that you point out that it was originally
designed to feed the HTTP header, which is metadata.
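
As a small illustration of the RFC 2616 description quoted above, the header
value can be read as a list of audience languages rather than as a single
processing language.  The sketch below is mine, in Python, with a
hypothetical function name; it simply splits the comma-separated value and
does not validate the language tags.

```python
from typing import List


def audience_languages(content_language: str) -> List[str]:
    """Split a Content-Language value into its listed audience languages."""
    # Each comma-separated item names one intended-audience language;
    # empty items (e.g. from a trailing comma) are dropped.
    return [tag.strip() for tag in content_language.split(",") if tag.strip()]


# A document aimed at both Maori and English readers.
print(audience_languages("mi, en"))
```

This is the sense in which the value is metadata about the document as a
whole, in contrast to the lang attribute, which associates exactly one
language with a given range of text.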


> This original intent also doesn't match
> reality; reality is that this pragma sets the default language for
> lang="", which also isn't document-wide metadata for user agents.

> 
> Finally, the proffered note does not actually match the associated
> rationale: it doesn't explain why the HTTP and pragma declaration syntaxes
> are different; instead it talks about a "language" attribute.

See above.

> 
> If there is a "constant source of confusion", then what we need is
> pointers to this confusion, so that text intended specifically to address
> that confusion is included in the spec. It is quite possible that we could
> add lots of explanatory text and explain the situation in detail, but to
> do so we need to know what the confusion is about. As far as I am aware,
> no bug pointing to confusion on this subject and asking for clarification
> has been rejected, which makes using the change proposal process
> inappropriate.

The main point of this proposal was what is currently the second point, i.e.
that HTML5 should not redefine the syntax of the Content-Language value.
The proposed note was suggested as spec text to support that proposal, and
that is why it was in the change proposal.

...

> Finally, it should be noted that the aforementioned other change proposal
> is self-contradictory. Making the second change (thus making the syntax
> of the pragma the same as its HTTP namesake) would make the rationale for
> the first change (that we should explain the differences between the
> syntax of the pragma and the HTTP header) incorrect.

This comment derives from a misunderstanding of the intent of the first
point.


RI

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/




> -----Original Message-----
> From: public-html-request@w3.org [mailto:public-html-request@w3.org] On
> Behalf Of Ian Hickson
> Sent: 02 April 2010 19:54
> To: public-html@w3.org
> Subject: Null change proposal for ISSUE-88 (mark II)
> 
> 
> ISSUE-88
> ========
> 
> SUMMARY
> There is no problem and the proposed remedy is to change nothing.
> 
> RATIONALE
> There is no problem.
> 
> 
> Another change proposal suggests adding a note on the basis that we
> should
> clarify why the HTTP and pragma declarations are different when it comes
> to values, and how they should be used, suggesting that this is a constant
> source of confusion.
> 
> However, no evidence has been provided to suggest that this really is a
> source of confusion.
> 
> Furthermore, the suggested note is wrong in practice. The pragma doesn't
> give metadata about the document. The original intent of the <meta
> http-equiv> feature was to provide a way for _servers_ to include data in
> their HTTP headers on a per-file basis; this isn't document-wide metadata
> for user agents, it's for servers. This original intent also doesn't match
> reality; reality is that this pragma sets the default language for
> lang="", which also isn't document-wide metadata for user agents.
> 
> Finally, the proffered note does not actually match the associated
> rationale: it doesn't explain why the HTTP and pragma declaration syntaxes
> are different; instead it talks about a "language" attribute.
> 
> If there is a "constant source of confusion", then what we need is
> pointers to this confusion, so that text intended specifically to address
> that confusion is included in the spec. It is quite possible that we could
> add lots of explanatory text and explain the situation in detail, but to
> do so we need to know what the confusion is about. As far as I am aware,
> no bug pointing to confusion on this subject and asking for clarification
> has been rejected, which makes using the change proposal process
> inappropriate.
> 
> 
> The same change proposal also suggests a second change, namely to
> change
> the syntax to allow multiple comma-separated language codes, even though
> all but the first would be ignored.
> 
> User agents do not pay any attention to values after the first. The way to
> mark that a document _uses_ multiple languages in such a way that user
> agents can actually parse and find this information is to use the lang=""
> attribute in the document. Putting multiple values in the pragma would
> fail to handle this according to the proposal.
> 
> Another possible use case would be to to have a standard way to say who
> the target audience of the document is, but in practice few people use
> that information on the Web, so it doesn't seem like having a pragma that
> exposes this information would be useful, even if we ignore that the user
> agents are currently required to ignore that information.
> 
> Even if there was such a need, this feature would be a bad way to provide
> that information, since it is used in an incompatible way by user agents
> (the first language, and only the first language, is used to determine
> processing behaviour -- none of the languages are treated as a target
> audience language hint). For controlled environments, there are a
> multitude of options available to authors, such as the HTTP header of the
> same name, <meta name> with custom names, microdata, RDFa, out-of-band
> data, <script> blocks, etc. We don't need to use this mechanism for that
> purpose. Doing so would just confuse authors further.
> 
> No rationale is given for this second change, so it is hard to evaluate
> what the benefit of making this change would be.
> 
> 
> Finally, it should be noted that the aforementioned other change proposal
> is self-contradictory. Making the second change (thus making the syntax
> of the pragma the same as its HTTP namesake) would make the rationale for
> the first change (that we should explain the differences between the
> syntax of the pragma and the HTTP header) incorrect.
> 
> 
> DETAILS
> Change nothing.
> 
> IMPACT
> 
> POSITIVE EFFECTS
> * Ensures consistency with current implementation usage of the content
> attribute in the Content Language pragma and with earlier specifications.
> 
> NEGATIVE EFFECTS
> None.
> 
> CONFORMANCE CLASS CHANGES
> None.
> 
> RISKS
> It's possible that there is confusion. However, it is easy to handle this
> at a future date when clear evidence of such confusion is found.
> 
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
> 
> 

Received on Thursday, 8 April 2010 20:05:27 UTC