This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 17864 - i18n-ISSUE-118: explicitly undefined language
Summary: i18n-ISSUE-118: explicitly undefined language
Status: RESOLVED NEEDSINFO
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
URL:
Whiteboard:
Keywords:
Duplicates: 16978
Depends on:
Blocks:
 
Reported: 2012-07-18 07:08 UTC by contributor
Modified: 2015-06-17 02:58 UTC
CC List: 4 users

Description contributor 2012-07-18 07:08:23 UTC
This was cloned from bug 16978 as part of operation convergence.
Originally filed: 2012-05-07 18:06:00 +0000
Original reporter: Addison Phillips <addison@lab126.com>

================================================================================
 #0   Addison Phillips                                2012-05-07 18:06:51 +0000 
--------------------------------------------------------------------------------
3.2.3.3 The lang and xml:lang attributes
http://www.w3.org/TR/html5/elements.html#the-lang-and-xml:lang-attributes

(lang). What does this mean:

--
If the resulting value is the empty string, then it must be interpreted as meaning that the language of the node is explicitly unknown.
--

Does an explicitly unknown language have any different effect? It might be a good idea to add text such as:

--
If the resulting value is the empty string, then it must be interpreted as meaning that the language of the node is explicitly unknown and any language-specific processing that is applied is implementation-defined.
--
================================================================================
 #1   Ian 'Hixie' Hickson                             2012-05-10 17:55:59 +0000 
--------------------------------------------------------------------------------
I believe this is a duplicate of a previously existing bug with more discussion.
================================================================================
Comment 1 Ian 'Hixie' Hickson 2012-09-28 18:05:09 UTC
(The other bugs I had in mind don't cover this specific issue.)

Addison: What effect would it have if lang="und"? Where is that defined? I'll try to use the same language. (I don't want to explicitly make them equivalent, because the unknown codes have to be passed through to CSS, OpenType, etc.)
Comment 2 Addison Phillips 2012-09-28 18:20:18 UTC
(In reply to comment #1)
> (The other bugs I had in mind don't cover this specific issue.)
> 
> Addison: What effect would it have if lang="und"? Where is that defined? I'll
> try to use the same language. (I don't want to explicitly make them equivalent,
> because the unknown codes have to be passed through to CSS, OpenType, etc.)

I see lang="und" as being slightly different from lang="", although BCP 47 makes them equivalent in meaning. 'und' is defined by ISO 639-2 and is incorporated along with 'zxx', 'mul', and 'mis'. The specific definitions are here:

  http://tools.ietf.org/html/bcp47#section-4.1

See item #5, which has this sub-bullet about 'und':

       *  The 'und' (Undetermined) primary language subtag identifies
          linguistic content whose language is not determined.  This
          subtag SHOULD NOT be used unless a language tag is required
          and language information is not available or cannot be
          determined.  Omitting the language tag (where permitted) is
          preferred.  The 'und' subtag might be useful for protocols
          that require a language tag to be provided or where a primary
          language subtag is required (such as in "und-Latn").  The
          'und' subtag MAY also be useful when matching language tags in
          certain situations.

The way I see lang="und" being different from lang="" is probably the same thing you allude to in your comment: there is actually a value there and, as far as any HTML processor is aware, it might contain some meaning or be available for matching. The processor would have to look at the content of the attribute and determine that it is 'und' in order to determine the "undetermined-ness" of the language, which is something we want to avoid. Hence: the 'und' tag should not be used in HTML5 (although it is not illegal to do so) because HTML5/HTML-next allows the empty string.
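
To illustrate the distinction drawn above, here is a minimal sketch in Python. It is not taken from any specification or implementation: the function name classify_lang and its return labels are hypothetical, chosen only to show that lang="" can be recognized without looking inside the value, whereas recognizing 'und' requires inspecting the tag's primary subtag.

```python
def classify_lang(value):
    """Classify a lang attribute value.

    Hypothetical helper for illustration only; the labels returned
    here are assumptions, not terms defined by HTML or BCP 47.
    """
    if value is None:
        # Attribute absent: nothing is said about this node directly.
        return "inherit"
    if value == "":
        # lang="": explicitly unknown, detected without parsing a tag.
        return "unknown"
    # Any non-empty value is an opaque tag until the processor parses it.
    primary = value.split("-")[0].lower()
    if primary == "und":
        # Found only by inspecting the content of the attribute.
        return "undetermined"
    return "tagged"


if __name__ == "__main__":
    for sample in (None, "", "und", "und-Latn", "en-US"):
        print(repr(sample), "->", classify_lang(sample))
```

In this sketch the empty string is handled by a plain equality check, while 'und' and "und-Latn" are only identified after splitting the tag, which is the extra step the comment argues processors should not need.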
Comment 3 Ian 'Hixie' Hickson 2012-12-30 00:42:08 UTC
What part of that quoted text says what "effect" lang="und" has? Other than how the value is passed to other tools, how would lang="und" processing differ from lang="" according to the current specs? (i.e. is there anything required of user agents for one that is not required for the other?)

I don't understand what you would like specified here.
Comment 4 Michael[tm] Smith 2015-06-17 02:58:03 UTC
*** Bug 16978 has been marked as a duplicate of this bug. ***