Internationalization Core Teleconference

07 Oct 2009


See also: IRC log


Present: aphillips, David, Richard, +1.650.253.aaaa, +1.650.253.aabb, Felix, YvesS, mark, andrewc
Chair: Addison Phillips
Scribe: Addison Phillips




<scribe> Scribe: Addison Phillips

<scribe> ScribeNick: aphillip

Agenda and Minutes

Action Items

Richard: respond on our behalf to CSS on ruby issue

addison: set up document edit transition with dan for ws-i18n
... make a table of date changes for DST

<scribe> in progress

addison: ping martin about iri wg activity
... set up an introductory call or meeting with DOM Events folks to discuss

all: look at XForms 1.1 PR for internationalization issues, especially "input mode" at the back

Info Share

Unicode 5.2 was released

(and there was much rejoicing)

IUC33 is next week

likely no call next week

<r12a> http://www.w3.org/International/wiki/Xmllang_and_css

richard working on update to document describing styling via CSS (particularly with language info)

<r12a> Using CSS selectors with xml:lang

scribe: see above
... has worked fingers to bone figuring out how xml:lang and lang work with CSS
... please look it over and send comments

UAX #46


Mark: related to IDNA...
... and IDNA2003 contains rules for exchanging things in DNS with chinese, arabic, etc.
... IDNA2008 is in process
... may finish this year or early next
... and there are incompatibilities between the two
... which means that some old domain names no longer work
... and some in 2008 won't work with 2003 implementations
... 2008 has many good things
... but there are four serious incompatibilities, which uax46 calls
... deviations
... has implications for spoofing and compat
... Unicode consortium is concerned
... working with browser vendors to see if these incompatibilities can be dealt with
... old UAX#46 was put on hold hoping IETF WG would deal with it
... but they didn't, so this doc revived
... can see in the link

<mark> http://unicode.org/cldr/utility/idna.jsp?a=3%96BB.at%0D%0A8%A1og.com%0D%0AOC%88BB.at%0D%0Afass.de%0D%0Afa3%9F.de%0D%0Af3%A43%9F.de%0D%0ASch3%A4ffer.de%0D%0A%EFC%A1%EFC%A2%EFC%A3%E3%83B%E6%97%A5%E6%9C%AC%0D%0A%E6%97%A5%E6%9C%AC%EFD%A1co%EFD%A1jp%0D%0AxC%81C%A7%0D%0AxC%A7C%81%0D%0AI%E2%99%A5NY%0D%0AE%92F%8CEBEFF%82%0D%0A%EFB%8B%EFA%AE%EFA%91%EFB2

Mark: link shows demo with various punycode transforms

<mark> ȡog.com

Mark: the "d" like character introduced in Unicode after IDNA 2003

<mark> öbb.at

Mark: if lower case works in all cases

<mark> faß.de

Mark: under 2003 remaps to "fass", but encoded as punycode under 2008

<mark> 日本。co。jp

Mark: ideographic full stop fails under 2008
... 2008 allows for pre-processing of the text in a domain name
... called "custom mapping"
... could make them all succeed (or make worse)
... any *user* can do custom mapping


<fsasaki> felix: just on IRC due to bad sound: is there any registered label with the ideographic full stop character, so does the missing ideographic full stop hurt any existing data?

Mark: full stop treated same as a period
... but no longer

<fsasaki> felix: tx for the explanation

<mark> ABC

(full width characters)

scribe: fail under 2008 but treated as ASCII in 2003
... and these occur *frequently* in actual web pages (based on Google inspection of pages)
... due to IME nature

Mark: proposal is to bridge gap by making stuff continue to work in 2008 that worked in 2003
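
Editorial illustration (not from the call): Python's built-in "idna" codec implements IDNA2003, so it can be used to observe the 2003-side behaviour described for the examples above. This is only a sketch; the ".com" host used for the full-width case is made up for illustration, and no IDNA2008 results are computed here.

```python
# Python's standard "idna" codec follows IDNA2003 (RFC 3490 with Nameprep).
def to_ascii_2003(domain: str) -> str:
    """Return the IDNA2003 ASCII form of a domain name."""
    return domain.encode("idna").decode("ascii")

# Upper case is folded before Punycode encoding.
print(to_ascii_2003("ÖBB.at"))   # xn--bb-eka.at
# Eszett is remapped to "ss".
print(to_ascii_2003("faß.de"))   # fass.de
# Ideographic full stop (U+3002) is accepted as a label separator.
print(to_ascii_2003("日本\u3002co\u3002jp"))
# Full-width letters (U+FF21..U+FF23) are treated like ASCII letters.
print(to_ascii_2003("\uff21\uff22\uff23.com"))  # abc.com
```

Under IDNA2008 without any extra mapping, the last two inputs would be rejected and the eszett would be kept rather than remapped, which is the gap the proposal aims to bridge.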

<mark> ÖBB.at öbb.at fass.de faß.de fäß.de Schäffer.de ABC・日本 日本。co。jp x̧́ x̧́ I♥NY Βόλος ﻋﺮﺑﻲ

<mark> \u0001.com

Mark: what to do about four deviation characters

<mark> faß.de

Mark: such as eszett
... good reasons for both distinct and same treatment
... microsoft proposed to say that implementation (registrar) must bundle them
... this is what UAX#46 recommends
... then it'd work with any registrar

David: so bundling means that they would register both "fass" and "faß"

Mark: yes
... two cases: "good" registrar will bundle
... and remapping won't matter
... "bad" registrar won't bundle
... so two domains exist
... so don't support character
... so use 2003 version of behavior

basically: generate only *one* version for a given input
... even though multiple outcomes are possible
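
Editorial illustration (not from the call): a rough sketch of the bundling idea, computing the ASCII forms that would have to be registered together for a label containing eszett. It assumes Python's built-in "idna" codec for the IDNA2003 mapping and approximates the IDNA2008 form by Punycode-encoding the lower-cased label directly, without the full IDNA2008 validity checks. The helper names are hypothetical.

```python
def ace_2003(label: str) -> str:
    # Python's "idna" codec applies the IDNA2003 mapping (eszett becomes "ss").
    return label.encode("idna").decode("ascii")

def ace_2008_approx(label: str) -> str:
    # Approximation only: keep the character and Punycode-encode the label.
    lowered = label.lower()
    if lowered.isascii():
        return lowered
    return "xn--" + lowered.encode("punycode").decode("ascii")

def bundle_for(label: str) -> set:
    # A bundling registrar would register every form to the same owner.
    return {ace_2003(label), ace_2008_approx(label)}

print(sorted(bundle_for("faß")))  # ['fass', 'xn--fa-hia']
```

When the registrar bundles, it does not matter which of the two forms a client sends on the wire; when it does not, the two forms can name different domains.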

Mark: one of the main reasons for this is so users can see in browser
... users don't like transform of name for display (remapping)
... only transform on wire

Richard: btw, we have a bunch of tests that show what people see

DENIC (German registry) bundles

<r12a> http://www.w3.org/International/tests/test-idn-display-0

Mark: other thing concerned with...
... IDNA very concerned with TLDs
... but in reality there are 1000's of registrars
... e.g. google registers subdomains in google
... or blogspot

<mark> http://ɓlog.blogspot.com/


so many many "actual" registrars

scribe: belief is that "actual" registrars won't know to do anything about 2008
... a long time to replace browsers
... so how many on 2008 v. 2003
... and that's what TR46 is
... would like to inform you
... and also think W3C should have a position on this
... so HTML5 and XLink might need to be influenced

Mark: UTC will consider this, may publish as early as November... or may be longer
... needs some wordsmithing, although that won't influence decision by UTC

richard: do you know what HTML5 is doing about this?

Mark: don't know, but believe that it is just conformant to 2003 at present

richard: what we really want is everyone to implement just this one way of doing it

Mark: think we have a lot of support from browser vendors
... best if everyone does the same way

<fsasaki> felix: again just on IRC (sorry), propose to put Thomas Roessler (W3C) into the loop, who is following IDNA and also ICANN actions also to some extent IIRC

addison: pub together or a pointer or do something to charmod-iri....

Mark: ICU will implement TR46
... have a followup in a couple of weeks?

<scribe> ACTION: all: review UAX#46 draft for review in two weeks [recorded in http://www.w3.org/2009/10/07-core-minutes.html#action01]

<mark> mark@macchiato.com

Mark: send comments offline to above address

Language Tag Article Approval

HTML5 Review

In particular Hixie's remarks: http://lists.w3.org/Archives/Public/public-i18n-core/2009OctDec/0001.html

<scribe> ACTION: addison: reply to Hixie in support of text at end of thread [recorded in http://www.w3.org/2009/10/07-core-minutes.html#action02]

Language Tag Article Approval

http://www.w3.org/International/questions/qa-choosing-language-tags (new article) http://www.w3.org/International/articles/language-tags/temp.php (updated article)


Mark: my concern is that people might use 'cmn' instead of 'zh'
... will mess themselves up, because 'zh' means, for all practical purposes, Mandarin
... fear is that they won't see the squirms that say otherwise
... after reading first two sentences

<r12a> http://www.w3.org/International/questions/qa-choosing-language-tags#extlangsubtags

There is always a 3-letter subtag that is equivalent to any language+extlang pairing. For example, zh-cmn (Mandarin Chinese) can also be expressed with the single subtag cmn.

Mark: so richard's suggestion good
... use more specific tags in most cases, with some important exceptions
... then follow with "there are however"
... and follow with "there are situations where you should still use"
... and move the sentence on Macrolanguage searching to end

addison expresses happiness with that

Mark: in decision two, change examples from 'cmn' to 'yue'

<fsasaki> Felix: (on IRC) Sorry, have to leave at the top of the hour. My comments are editorial and proposals for "would be nice to have" additions, not "this is wrong" comments. So if you want to publish, please go ahead, the articles are good.

addison: tag "Chinese" as 'zh', including all Mandarin
... all other Sinitic languages should use their specific subtags and not use 'zh'
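
Editorial illustration (not from the call): one way to read the advice above as code. The set of subtags is a small illustrative subset of the languages encompassed by 'zh', not the full registry, and `recommended_tag` is a hypothetical helper name.

```python
# Illustrative subset of the language subtags encompassed by macrolanguage "zh".
SINITIC_EXTLANGS = {"cmn", "yue", "wuu", "hak", "nan"}

def recommended_tag(tag: str) -> str:
    """Apply the tagging advice from the discussion to a Chinese tag."""
    subtags = tag.split("-")
    # Collapse "zh-<extlang>" to the single equivalent subtag.
    if (len(subtags) > 1 and subtags[0].lower() == "zh"
            and subtags[1].lower() in SINITIC_EXTLANGS):
        subtags = subtags[1:]
    # Mandarin is the predominant form, so it keeps the macrolanguage subtag.
    if subtags[0].lower() == "cmn":
        subtags[0] = "zh"
    return "-".join(subtags)

print(recommended_tag("zh-cmn-Hans"))  # zh-Hans
print(recommended_tag("zh-yue-HK"))    # yue-HK
```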

Mark: for predominant form, use the macrolanguage
... including Malay, for example
... ISO distinguishes Filipino and Tagalog... also Twi and Akan
... there are a number of issues with false distinctions
... "be consistent" and "try to be consistent with everyone else"

<mark> http://cldr.unicode.org/development/design-proposals/languages-to-show-for-translation

Mark: case sensitivity in file names can be important in file systems that are case sensitive

<r12a> where it is important, consider using bcp 47 approach

(actually bcp47 has a canonicalization section on case)
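
Editorial illustration (not from the call): a minimal sketch of the case formatting convention described in RFC 5646 section 2.1.1, which can be applied without consulting the registry. Everything is lower case, except that two-letter subtags are upper-cased and four-letter subtags are title-cased when they are neither first nor after a singleton. `format_case` is a hypothetical helper name, and the function does not validate the tag.

```python
def format_case(tag: str) -> str:
    """Apply the conventional BCP 47 casing to a language tag."""
    out = []
    after_singleton = False
    for index, subtag in enumerate(tag.split("-")):
        lowered = subtag.lower()
        if index > 0 and not after_singleton and len(lowered) == 2:
            out.append(lowered.upper())       # region, e.g. "TW"
        elif index > 0 and not after_singleton and len(lowered) == 4:
            out.append(lowered.capitalize())  # script, e.g. "Hant"
        else:
            out.append(lowered)
        if len(lowered) == 1:
            # Extension and private-use subtags that follow stay lower case.
            after_singleton = True
    return "-".join(out)

print(format_case("ZH-hant-tw"))  # zh-Hant-TW
print(format_case("EN-ca-X-CA"))  # en-CA-x-ca
```

Since tags are case-insensitive, comparing the lower-cased forms is enough for matching; the formatting matters mainly where tags end up in case-sensitive contexts such as file names.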

Mark: ordering of variants

<r12a> note, any particular recipient of the tags may or may not take the order as important


maybe a variant faq

richard: publish for wide review with changes?

<scribe> chair: any opposed?

none opposed

<scribe> ACTION: richard: publish language tag articles for wide review [recorded in http://www.w3.org/2009/10/07-core-minutes.html#action03]


<r12a> after changes discussion so far

Summary of Action Items

[NEW] ACTION: addison: reply to Hixie in support of text at end of thread [recorded in http://www.w3.org/2009/10/07-core-minutes.html#action02]
[NEW] ACTION: all: review UAX#46 draft for review in two weeks [recorded in http://www.w3.org/2009/10/07-core-minutes.html#action01]
[NEW] ACTION: richard: publish language tag articles for wide review [recorded in http://www.w3.org/2009/10/07-core-minutes.html#action03]
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.135 (CVS log)
$Date: 2009/10/07 20:19:04 $
