This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 12278 - [polyglot] i18n: Make lang and xml:lang required on the root element.
Summary: [polyglot] i18n: Make lang and xml:lang required on the root element.
Status: CLOSED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: LC1 HTML/XHTML Compatibility Authoring Guide (ed: Eliot Graff)
Version: unspecified
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: Eliot Graff
QA Contact: HTML WG Bugzilla archive list
URL: http://www.w3.org/TR/2010/WD-html-pol...
Whiteboard:
Keywords:
Depends on:
Blocks: 12279
Reported: 2011-03-10 00:26 UTC by Leif Halvard Silli
Modified: 2011-08-04 05:07 UTC (History)

Description Leif Halvard Silli 2011-03-10 00:26:40 UTC
PROBLEM:
   XML and HTML differ on whether the HTTP Content-Language header MUST or MAY change the language of an element from 'unset' to a specific language. For http-equiv="Content-Language", HTML has clear rules, whereas XML is silent. These differences can cause the language to be set on the HTML side while it remains unset on the XML side.

HOW TO SOLVE:
   EITHER require authors to create polyglot markup that is immune to the possibility that the Content-Language value (from either the http-equiv pragma or the HTTP header) changes the language from 'unset' to some specific language in an asymmetric way (that is: only on the HTML side): Basically, make @xml:lang/@lang required on the root element - at least in some situations.
   OR accept the differences and document, in the Polyglot Markup specification, how XML and HTML differ.

PROBLEM IN DETAIL:
 A) http-equiv="Content-Language"

     HTML5 - MUST be used in absence of @lang:

  ]] If none of the node's ancestors, including the root element,
     have either  attribute set, but there is a pragma-set default
     language set, then that is the language of the node. [[
     http://dev.w3.org/html5/spec/elements#the-lang-and-xml:lang-attributes

  XML 1.0  - is silent w.r.t. http-equiv.
       However, some common XHTML user agents DO use
       http-equiv="content-language", while others don't.
       If it is considered equal to the HTTP header, then it is
       correct to respect it. HTML5 does not consider it equal.
       Does it, in XML, depend on a DTD?

 B) HTML5 - higher protocols MUST be used as backup:

  ]] If there is no pragma-set default language set, then language
     information from a higher-level protocol (such as HTTP),  if
     any,  must be used as the final fallback language instead. [[
     http://dev.w3.org/html5/spec/elements#the-lang-and-xml:lang-attributes

    XML 1.0 - external transport protocol MAY be used as backup
    (we must ASSUME that 'Content-Language' is what is meant):

  ]] Language information may also be provided by external 
     transport protocols (e.g. HTTP or MIME). When available, this
     information may be used by XML applications, but the more 
     local information provided by xml:lang should be considered
     to override it. [[ 
     http://www.w3.org/TR/xml/#sec-lang-tag

 C) MULTIPLE Content-Language VALUES

      HTML5 specifies that Content-Language (HTTP or http-equiv) only
      affects the language when its value is a single language tag.
      There is no general clarification of this when it comes to XML.
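
The asymmetry described in A), B) and C) can be illustrated with a minimal sketch. It is not taken from either specification: the function names are hypothetical, the "single language tag" test is a simplification of the HTML5 pragma rules, and the XML side models only one possible reading of the MAY (an application that ignores the transport protocol unless told otherwise).

```python
from typing import Optional


def single_tag(value: Optional[str]) -> Optional[str]:
    """Return value if it looks like exactly one language tag, else None.

    Simplification (assumption): a value with a comma or inner whitespace
    is treated as "not a single tag"; tags are not validated further.
    """
    if value is None:
        return None
    value = value.strip()
    if not value or "," in value or len(value.split()) != 1:
        return None
    return value


def html5_root_language(lang_attr: Optional[str],
                        pragma: Optional[str],
                        http_header: Optional[str]) -> Optional[str]:
    """HTML5 side: attribute first, then pragma, then HTTP (both MUST)."""
    if lang_attr is not None:
        return lang_attr  # "" means explicitly unknown
    for source in (pragma, http_header):
        tag = single_tag(source)
        if tag is not None:
            return tag
    return None  # language stays unset


def xml_root_language(xml_lang_attr: Optional[str],
                      http_header: Optional[str],
                      use_transport: bool = False) -> Optional[str]:
    """XML 1.0 side: xml:lang wins; the transport protocol is only a MAY.

    http-equiv is not consulted at all, since XML 1.0 is silent on it.
    Applying single_tag() to the header is an assumption of this sketch.
    """
    if xml_lang_attr is not None:
        return xml_lang_attr
    if use_transport:
        return single_tag(http_header)
    return None


# Same document, no language attributes, pragma says "ru":
print(html5_root_language(None, "ru", None))  # language is set on the HTML side
print(xml_root_language(None, None))          # language stays unset on the XML side
```

With both attributes present on the root element, the two functions return the attribute value and never reach the Content-Language fallbacks, which is why a root-level declaration removes the difference.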

SOLUTIONS ON THE TABLE - IN DETAIL:

    (1) Conditional: REQUIRE @xml:lang/@lang on root when there is a Content-Language (http-equiv pragma or HTTP header) whose value is exactly a single language tag.
         PRO: Polyglot Markup would follow the same rules as HTML5, except with a stricter conformance requirement.
         CON: Complexity. Such a rule is complex for authors to administer. For example, it would mean that if the HTTP server sends out a single-valued Content-Language header without the author's awareness, then the document is assigned a language - which in turn only HTML user agents would be REQUIRED to detect.
         ISSUE-88: My Change Proposal for ISSUE-88 suggests that validators pick up the HTTP Content-Language header and warn whenever it causes the language to be set.

    (2) Always REQUIRE @xml:lang/@lang on the root element. 
         PRO: Simple rule.
         CON: Less flexibility. The fact that the language can be inherited from the higher protocol can also be an advantage. Also, for XML, if one combines several documents into a bigger one (for example by the use of XInclude), then each <html> element of the new, combined document might end up with the language explicitly defined.  (In contrast, if the root element language were unset, then the <html> elements would inherit the language from the parent element in the new document.)
         CON: PERHAPS it could increase the tendency to use bogus language declarations. (Many templates come with "en" as the default.)
         CON: PERHAPS it could increase the use of the empty string declaration, which is equal to explicitly declaring the language as unknown: <html xml:lang="" lang="" xmlns="*">. Is that bad? If so, why? And when?

    (3) Accept and document the differences: In the absence of an element-level language declaration, XML apps MAY and HTML UAs MUST make use of Content-Language for setting the language. However, many (or most?) popular Web browsers that are also capable of handling XHTML *DO* seem to pick up the language from Content-Language too (from the HTTP header and from http-equiv alike).
         PRO: Could trigger vendors to align XHTML user agents with HTML5.
         CON: Specialized non-Web parsers, such as XSLT processors, and other parsers that respect the MAY in the XML spec would be left out in the cold.

   (4)  Forbidding HTTP Content-Language headers for polyglot markup: NOT A RELEVANT OPTION.

   (5)  Forbidding http-equiv=Content-Language in polyglot markup: Possible. But only limits the problem. Doesn't remove it. Thus one must still choose between option (1), (2) or (3).

    PREFERENCE: My preference is option (2) because it is simplest and because it seems safest.

CAN ISSUE-88 AFFECT THIS BUG?
   In short, yes. But ISSUE-88 is only about what syntax is permitted inside http-equiv. It is not about how HTML user agents should *react* to Content-Language, whether coming from http-equiv or HTTP.
Comment 1 Eliot Graff 2011-03-14 21:08:03 UTC
I have adopted a combined approach, stating option 1 as a requirement, but also adding a note that polyglot markup may always include both @xml:lang and @lang (option 2) for the sake of simplicity and expediency. This is published in the Editor's Draft of 14 March.


]]
6.5.1.1 Content-Language

 The following HTTP headers and http-equiv declarations warrant special discussion in polyglot markup. 

Example
http-equiv: <meta http-equiv="Content-Language" content="ru"/>
HTTP header: Content-language: ru

 There are no direct issues with regard to the use of Content-Language as long as the language attribute is declared on the root element, as described in Language Attributes. Polyglot markup must declare both the xml:lang as well as the lang attributes on the root element when there is a Content-Language (http-equiv pragma or HTTP header) whose value is exactly a single language tag. By declaring the language attribute on the root element, polyglot markup avoids a difference between XML and HTML in regard to Content-Language. 

Note
 For the sake of simplicity and expediency, content to be delivered as polyglot markup may always include both the xml:lang as well as the lang attributes on the root element. 
[[
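
A rough sketch of how the requirement quoted above could be checked mechanically. This is only an illustration, not part of the Editor's Draft: the function names and the message text are hypothetical, the document is assumed to be well-formed XML, and the single-tag test is the same simplification as in the HTML5 discussion (no comma, no inner whitespace).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def is_single_tag(value):
    """True if value looks like exactly one language tag (simplified)."""
    return bool(value) and "," not in value and len(value.split()) == 1


def root_language_problems(document, http_header=None):
    """Return a list of problems with the root element's language attributes.

    document    -- polyglot markup as a string (parsed here as XML)
    http_header -- value of the HTTP Content-Language header, if known
    """
    root = ET.fromstring(document)
    lang = root.get("lang")
    xml_lang = root.get(XML_LANG)

    # Look for <meta http-equiv="Content-Language" content="...">.
    pragma = None
    for el in root.iter():
        local_name = el.tag.rsplit("}", 1)[-1]
        if local_name == "meta" and (el.get("http-equiv") or "").lower() == "content-language":
            pragma = el.get("content")
            break

    problems = []
    triggered = any(is_single_tag(v) for v in (pragma, http_header))
    if triggered and (lang is None or xml_lang is None):
        problems.append(
            "Content-Language is a single language tag, but the root "
            "element does not declare both lang and xml:lang"
        )
    return problems


doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><meta http-equiv="Content-Language" content="ru"/>'
    '<title>t</title></head><body/></html>'
)
print(root_language_problems(doc))
```

Following the Note instead (always declaring both attributes) makes the check pass regardless of what the server sends in the HTTP header.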

Thanks for the detailed suggestion.

Eliot
Comment 2 Leif Halvard Silli 2011-03-14 21:43:28 UTC
nice
Comment 3 Michael[tm] Smith 2011-08-04 05:07:10 UTC
mass-move component to LC1
Comment 4 Michael[tm] Smith 2011-08-04 05:07:33 UTC
mass-move component to LC1