ISSUE-23: [Bug 12278] [polyglot] i18n: Make lang and xml:lang required on the root element

State:
CLOSED
Product:
polyglot
Raised by:
Richard Ishida
Opened on:
2011-03-23
Description:
Bugzilla: http://www.w3.org/Bugs/Public/show_bug.cgi?id=12278



Summary: [polyglot] i18n: Make lang and xml:lang required on the root element.
Product: HTML WG
Version: unspecified
Platform: PC
URL: http://www.w3.org/TR/2010/WD-html-polyglot-20100624/#attributes
OS/Version: Windows XP
Status: NEW
Severity: normal
Priority: P2
Component: HTML/XHTML Compatibility Authoring Guide (ed: Eliot Graff)
AssignedTo: eliotgra@microsoft.com
ReportedBy: xn--mlform-iua@xn--mlform-iua.no
QAContact: public-html-bugzilla@w3.org
CC: ishida@w3.org, mike@w3.org,
public-html-wg-issue-tracking@w3.org,
public-html@w3.org, xn--mlform-iua@xn--mlform-iua.no,
public-i18n-core@w3.org, eliotgra@microsoft.com


PROBLEM:
XML and HTML differ w.r.t. whether the HTTP Content-Language: header MUST or
MAY change the language of an element from 'unset' to a specific language. And
for http-equiv="Content-Language", HTML has clear rules, whereas XML is
silent. These differences can cause the language to be set on the HTML side
while it remains unset on the XML side.

HOW TO SOLVE:
EITHER require authors to create polyglot markup that is immune to the
possibility that the Content-Language value (from either the http-equiv pragma
or the HTTP header) can change the language from 'unset' to some specific
language in an asymmetric way (that is: only on the HTML side). Basically, make
@xml:lang/@lang required on the root element, at least in some situations.
OR accept the differences and document, in the Polyglot Markup
specification, how XML and HTML differ.

PROBLEM IN DETAIL:
A) http-equiv="Content-Language"

HTML5 - MUST be used in absence of @lang:

]] If none of the node's ancestors, including the root element,
have either attribute set, but there is a pragma-set default
language set, then that is the language of the node. [[
http://dev.w3.org/html5/spec/elements#the-lang-and-xml:lang-attributes
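To make rule A concrete, here is a minimal Python sketch of how an HTML5 user agent derives the pragma-set default language from <meta http-equiv="Content-Language" content="...">. It is based on a reading of the HTML5 processing steps for the Content-Language pragma (ignore the pragma entirely if the value contains a comma, otherwise take the first whitespace-delimited token). The function name pragma_default_language is hypothetical, and None stands for "no pragma-set default language".

```python
from typing import Optional

# HTML's "ASCII whitespace": tab, LF, FF, CR and space.
ASCII_WHITESPACE = "\t\n\f\r "


def pragma_default_language(content: Optional[str]) -> Optional[str]:
    """Return the pragma-set default language for a Content-Language
    <meta> content value, or None if the pragma sets nothing.
    (Hypothetical helper; sketch of the HTML5 pragma steps.)"""
    if content is None:
        return None
    # A comma means several languages were listed: the pragma is ignored.
    if "," in content:
        return None
    # Skip leading whitespace, then collect characters up to the next whitespace.
    candidate = ""
    for ch in content.lstrip(ASCII_WHITESPACE):
        if ch in ASCII_WHITESPACE:
            break
        candidate += ch
    # An empty candidate also means "no pragma-set default language".
    return candidate or None


print(pragma_default_language(" nb "))    # nb
print(pragma_default_language("nb, nn"))  # None
```

Note that a value such as "en fr" would still set "en" in this sketch, since tokens after the first are simply ignored.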

XML 1.0 - is silent w.r.t. http-equiv.
However, some common XHTML user agents DO use
http-equiv="content-language", while others don't.
If the pragma is considered equal to the HTTP header, then it is
correct to respect it. HTML5 does not consider it equal.
Does it, in XML, depend on a DTD?

B) HTML5 - higher protocols MUST be used as backup:

]] If there is no pragma-set default language set, then language
information from a higher-level protocol (such as HTTP), if
any, must be used as the final fallback language instead. [[
http://dev.w3.org/html5/spec/elements#the-lang-and-xml:lang-attributes
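Rules A and B together give HTML5 a fixed fallback order: the nearest lang/xml:lang on the node or its ancestors, then the pragma-set default language, then the higher-level protocol. A minimal sketch, assuming the caller has already collected the attribute values from the node up to the root (None where absent) and has already reduced the HTTP Content-Language value to a single tag or None. html_language is a hypothetical name, and the empty string means "unknown".

```python
from typing import Optional, Sequence


def html_language(ancestor_langs: Sequence[Optional[str]],
                  pragma_lang: Optional[str],
                  http_lang: Optional[str]) -> str:
    """HTML5-style language of a node (hypothetical helper).

    ancestor_langs: lang/xml:lang values from the node itself up to the
    root element, with None where no attribute is set.
    Returns "" when the language is unknown.
    """
    # 1. The nearest explicit declaration wins, even an explicit "".
    for value in ancestor_langs:
        if value is not None:
            return value
    # 2. Otherwise the pragma-set default language (MUST be used).
    if pragma_lang is not None:
        return pragma_lang
    # 3. Otherwise the higher-level protocol, e.g. HTTP (MUST be used).
    if http_lang is not None:
        return http_lang
    # 4. No information at all: the language is unknown.
    return ""


# <html> and <p> without lang, no pragma, HTTP says "nb":
print(html_language([None, None], None, "nb"))  # nb
```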

XML 1.0 - external transport protocol MAY be used as backup
(we must ASSUME that 'Content-Language' is what is meant):

]] Language information may also be provided by external
transport protocols (e.g. HTTP or MIME). When available, this
information may be used by XML applications, but the more
local information provided by xml:lang should be considered
to override it. [[
http://www.w3.org/TR/xml/#sec-lang-tag
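On the XML side the same fallback is only a MAY, so the outcome depends on the application. The sketch below (hypothetical xml_language function and app_uses_transport flag) shows the asymmetry described under PROBLEM: with no xml:lang anywhere and a transport language of "nb", an application that uses transport information reports "nb", while one that ignores it leaves the language unset (None), which is not the same as the explicitly unknown value "".

```python
from typing import Optional, Sequence


def xml_language(ancestor_langs: Sequence[Optional[str]],
                 transport_lang: Optional[str],
                 app_uses_transport: bool) -> Optional[str]:
    """XML-style language of a node; None means 'unset'.
    (Hypothetical helper illustrating the XML 1.0 MAY.)"""
    # xml:lang on the node or an ancestor overrides transport information.
    for value in ancestor_langs:
        if value is not None:
            return value
    # XML 1.0: transport language information "may be used" by applications.
    if app_uses_transport and transport_lang is not None:
        return transport_lang
    return None


# Same document, no xml:lang anywhere, HTTP Content-Language: nb
print(xml_language([None, None], "nb", app_uses_transport=True))   # nb
print(xml_language([None, None], "nb", app_uses_transport=False))  # None
```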

C) MULTIPLE Content-Language VALUES

HTML5 specifies that Content-Language (http or http-equiv) only
affects the language when its value is a single language tag.
There is no general clarification of this when it comes to XML.
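For the HTTP header, the single-tag condition in C) can be sketched as follows. This assumes HTML5's rule that when the higher-level protocol reports multiple languages, the language is unknown (empty string). http_fallback_language is a hypothetical helper, and a naive split on commas stands in for a full header parser.

```python
from typing import Optional


def http_fallback_language(header_value: Optional[str]) -> Optional[str]:
    """Map an HTTP Content-Language value to an HTML5 fallback language
    (hypothetical helper).

    Returns the tag if exactly one is listed, "" (unknown) if several are
    listed, and None if the header is absent or empty.
    """
    if header_value is None:
        return None
    # Naive split; a real implementation would use a proper header parser.
    tags = [tag.strip() for tag in header_value.split(",") if tag.strip()]
    if len(tags) == 1:
        return tags[0]
    if len(tags) > 1:
        return ""  # multiple languages: the language is unknown
    return None


print(http_fallback_language("nb"))            # nb
print(repr(http_fallback_language("nb, nn")))  # ''
```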

SOLUTIONS ON THE TABLE - IN DETAIL:

(1) Conditional: REQUIRE @xml:lang/@lang on root when there is a
Content-Language (http-equiv pragma or HTTP header) whose value is exactly a
single language tag.
PRO: Polyglot Markup would follow the same rules as HTML5, except with
a stricter conformance requirement.
CON: Complexity. Such a rule is complex for authors to administer.
For example, it would mean that if the HTTP server sends out a single
Content-Language header without the author's awareness, then the document is
assigned a language, which in turn only HTML user agents would be REQUIRED to
detect.
ISSUE-88: My Change Proposal for ISSUE-88 suggests that validators should
pick up the HTTP Content-Language header and warn whenever it causes the
language to be set.
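A validator implementing option (1), or the ISSUE-88 warning, would need a check along these lines. It is a sketch only: validate_root_language and the warning text are hypothetical, and "exactly a single language tag" is simplified to "one whitespace-free token with no comma" without checking BCP 47 syntax.

```python
from typing import List, Optional


def _is_single_tag(value: Optional[str]) -> bool:
    # Simplification: one whitespace-free token and no comma.
    # BCP 47 syntax is NOT checked.
    return value is not None and "," not in value and len(value.split()) == 1


def validate_root_language(root_has_lang: bool,
                           pragma_content: Optional[str],
                           http_content_language: Optional[str]) -> List[str]:
    """Return warnings when Content-Language could set the language on the
    HTML side only, because the root element declares no language."""
    if root_has_lang:
        return []
    warnings = []
    sources = (("http-equiv pragma", pragma_content),
               ("HTTP header", http_content_language))
    for source, value in sources:
        if _is_single_tag(value):
            warnings.append(
                f"Content-Language {value.strip()!r} from the {source} sets "
                "the language in HTML but only MAY do so in XML; add lang "
                "and xml:lang to the root element.")
    return warnings


for warning in validate_root_language(False, None, "nb"):
    print(warning)
```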

(2) Always REQUIRE @xml:lang/@lang on the root element.
PRO: Simple rule.
CON: Less flexibility. The fact that the language can be inherited
from the higher protocol can also be an advantage. Also, for XML, if one
combines several documents into a bigger one (for example by the use of
XInclude), then each <html> element of the new, combined document might end up
with the language explicitly defined. (In contrast, if the root element
language were unset, then the <html> elements would inherit the language from
the parent element in the new document.)
CON: PERHAPS it could increase the tendency to use bogus language
declarations. (Many templates come with "en" as the default.)
CON: PERHAPS it could increase the use of the empty string
declaration, which is equal to explicitly declaring the language as unknown:
<html xml:lang="" lang="" xmlns="*">. Is that bad? If so, why? And when?
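The XInclude point under option (2) can be illustrated with Python's xml.etree.ElementTree. The combined document is written by hand here as a stand-in for the result of an inclusion step (no XInclude processing is performed), and effective_xml_lang is a hypothetical helper that walks up the tree to the nearest xml:lang. The first <html> element, whose language is unset, inherits "nb" from its new parent, while the second keeps its explicit "en".

```python
import xml.etree.ElementTree as ET
from typing import Optional

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
XHTML_HTML = "{http://www.w3.org/1999/xhtml}html"

# Hand-written stand-in for a document assembled from two polyglot files.
combined = ET.fromstring(
    '<collection xml:lang="nb">'
    '<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>'
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"><body/></html>'
    '</collection>'
)


def effective_xml_lang(root: ET.Element, target: ET.Element) -> Optional[str]:
    """Walk from target up to root and return the nearest xml:lang, or None."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node: Optional[ET.Element] = target
    while node is not None:
        value = node.get(XML_LANG)
        if value is not None:
            return value
        node = parents.get(node)
    return None


first, second = combined.findall(XHTML_HTML)
print(effective_xml_lang(combined, first))   # nb (inherited)
print(effective_xml_lang(combined, second))  # en (explicit)
```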

(3) Accept and document the differences: In the absence of an element-level
language declaration, XML applications MAY and HTML user agents MUST make use of
Content-Language for setting the language. However, many (or most?) popular Web
browsers that are also capable of handling XHTML *DO* seem to pick up the
language from Content-Language too (from the HTTP header and from http-equiv
alike).
PRO: Could trigger vendors to align XHTML user agents with HTML5.
CON: Left out in the cold would be specialized non-Web processors, such
as XSLT processors, and other parsers that respect the MAY in the XML spec.

(4) Forbidding HTTP Content-Language headers for polyglot markup: NOT A
RELEVANT OPTION.

(5) Forbidding http-equiv=Content-Language in polyglot markup: Possible,
but it only limits the problem; it doesn't remove it. Thus one must still
choose between option (1), (2) or (3).

PREFERENCE: My preference is option (2) because it is simplest and because
it seems safest.

CAN ISSUE-88 AFFECT THIS BUG?
In short, yes. But ISSUE-88 is only about which syntax is permitted
inside http-equiv. It is not about how HTML user agents should *react* to
Content-Language, whether it comes from http-equiv or HTTP.
Related Action Items:
No related actions
Related emails:
  1. Review of tracker issues for best practices (from addison@lab126.com on 2015-03-25)
  2. I18N-ISSUE-23: [Bug 12278] [polyglot] i18n: Make lang and xml:lang required on the root element [HTML5-mail] (from sysbot+tracker@w3.org on 2011-03-23)

Related notes:

No additional notes.



Addison Phillips <addisonI18N@gmail.com>, Chair, Richard Ishida <ishida@w3.org>, Bert Bos <bert@w3.org>, Fuqiao Xue <xfq@w3.org>, Atsushi Shimono <atsushi@w3.org>, Staff Contacts
Tracker: documentation, (configuration for this group), originally developed by Dean Jackson, is developed and maintained by the Systems Team <w3t-sys@w3.org>.
$Id: 23.html,v 1.1 2023/07/19 12:02:00 carcone Exp $