This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 17859 - Mechanism to enable localisation of form controls and other locale-specific data
Summary: Mechanism to enable localisation of form controls and other locale-specific data
Status: RESOLVED WONTFIX
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: Other All
Importance: P3 enhancement
Target Milestone: Needs Impl Interest
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
Reported: 2012-07-18 07:05 UTC by contributor
Modified: 2019-03-29 19:21 UTC

Description contributor 2012-07-18 07:05:03 UTC
This was cloned from bug 16965 as part of operation convergence.
Originally filed: 2012-05-07 17:25:00 +0000
Original reporter: Addison Phillips <addison@lab126.com>

================================================================================
 #0   Addison Phillips                                2012-05-07 17:25:09 +0000 
--------------------------------------------------------------------------------
4.10.7.1.7 Date and Time state
http://dev.w3.org/html5/spec/single-page.html#date-and-time-state-type-datetime

--
The format shown to the user is independent of the format used for form submission. Browsers are encouraged to use user interfaces that present dates and times according to the conventions of the user's preferred locale.
--

Should we encourage the use of the *page's* preferred locale? It would be best if we could provide a way for page authors to create a consistent user experience. Work on the ECMAScript I18N extension may eventually help here, but only for formatting displayed values---this section seems to imply that the user agent has a built-in date/time control set that may not be scriptable. Although a given user agent won't necessarily support the requested locale, we would suggest providing a <form> attribute or a field attribute to allow the page itself to request a given locale using a BCP 47 language tag.

The I18N WG feels that this could be accomplished by making section 4.10.7.2 normative.
================================================================================
Comment 1 Ian 'Hixie' Hickson 2012-09-16 03:31:09 UTC
Making section 4.10.7.2 "Implementation notes regarding localization of form controls" normative would have no effect, since the section contains no normative statements or conformance criteria.

However, giving a locale is something I think we'll eventually do, using a locale="" attribute. It would control things like the default output format for date localisation in CSS, the default calendar to use in date controls, the default format for <input type=number> controls, the default input mode, etc.
Comment 2 Ian 'Hixie' Hickson 2012-11-25 04:42:20 UTC
Marking this "LATER". This is a big feature to add and so we need to wait for browsers to have caught up first. It would help to have explicit buy-in from specific vendors who want to implement this, and to have a clear definition of locales that we could reference. (Does the CLDR cover this kind of thing?)
Comment 3 Addison Phillips 2012-11-25 17:56:07 UTC
I strongly oppose adding a locale attribute to HTML. The @lang attribute already contains the necessary information and could solve the vast majority of formatting problems. Having a separate attribute would be confusing.

In the recent discussion of calendars, Richard pointed out the problem of tagging the language of other attributes that are different from the body of the page or the desired format. This is a legitimate issue--but generally rare and one we have not solved for other text properties like direction either.

To answer your question, CLDR does define formats and the mapping for locale subtags that extend language tags (RFC6067 makes these valid in language tags). Some vendors will probably object to making CLDR normative for formats. As with the ECMAScript I18N extension, this isn't a barrier to what HTML needs to do though.

So... I would prefer if this bug were just closed. The right way to address this in my opinion would be by adding normative text about formatting or displaying values according to the language (locale) of the element as described by @lang. That is, this:

<input type=number value=123456.78 lang=de-CH>

might display as something like this:

[    123'456,78]
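
To make the idea above concrete, here is a minimal sketch of formatting a submitted value according to a language tag. The `SEPARATORS` table and the `format_number` helper are hypothetical and exist only for illustration; the de-CH entry is an assumption chosen to reproduce the display shown in this comment, and a real user agent would take such symbols from CLDR rather than from a hand-written table.

```python
# Illustrative only: real implementations would read these symbols from CLDR.
# Both entries are assumptions; de-CH is chosen to match the example above.
SEPARATORS = {
    "en-US": (",", "."),   # (grouping separator, decimal separator)
    "de-CH": ("'", ","),
}

def format_number(value: float, lang: str) -> str:
    """Format value with two decimals using the separators assumed for lang."""
    group, decimal = SEPARATORS[lang]
    whole, frac = f"{value:.2f}".split(".")
    # Group digits in threes with "," first, then swap in the locale's separator.
    grouped = f"{int(whole):,}".replace(",", group)
    return f"{grouped}{decimal}{frac}"

print(format_number(123456.78, "de-CH"))  # 123'456,78
print(format_number(123456.78, "en-US"))  # 123,456.78
```

The value submitted with the form would stay 123456.78 in either case; only the presentation changes.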
Comment 4 Ian 'Hixie' Hickson 2012-11-25 20:47:31 UTC
Overloading "lang" to mean both "language" and "locale" seems dodgy to me. For example, "en" is a valid language, but "en-GB" and "en-US" have very different locales. A document might be all in the ISO-8601 locale, while using multiple different languages (and indeed, ISO-8601 doesn't even have a language, as far as I can tell, so I'm not sure how one would express that locale as a lang="" value).
Comment 5 Addison Phillips 2012-11-25 21:42:07 UTC
A "locale" is an artificial concept that we use to describe linguistic, cultural, and regional variations--and that we use to activate APIs that do formatting, parsing, resource lookup, and so forth for us.

Locale identifiers, for some time now, have been based on BCP47--that is, on language tags. A language tag in HTML is typically declarative ("this text is in English") but the difference from that to a locale identifier ("format this date in a US English manner") is very narrow. Some descriptions that help illustrate this are in:

   http://www.unicode.org/reports/tr35/#Locale

   http://www.unicode.org/reports/tr35/#Language_and_Locale_IDs

I have no idea what you mean by "ISO-8601" in your comment. ISO 8601 is a standard for recording date and time information, including intervals and such. It's a locale-neutral format, a wire format, not intended for human presentation. A "locale" provides a means of mapping between this neutral format and some human presentation (be it a string or a calendar control or what).

You're right that "en-US" and "en-GB" represent very different locales. They also represent different language variations. Both of these are valid language tags. They can also be used as locale tags to format data inserted into text, such as a number or date.
Comment 6 Martin Dürst 2012-11-26 08:19:24 UTC
(from comment #4)

> A document might be all in the ISO-8601 locale, while using multiple different
> languages.

(In reply to comment #5)

> I have no idea what you mean by "ISO-8601" in your comment. ISO 8601 is a
> standard for recording date and time information, including intervals and
> such. It's a locale-neutral format, a wire format, not intended for human
> presentation.

I think what Ian means is a document with multiple languages, but where e.g. all the dates are in ISO-8601. That may indeed make sense because with a mixture of languages, it will be difficult for the reader to associate every date with a language, and to parse it correctly. It might still work if the dates are in the middle of some text, but not if they are e.g. in a table. Although ISO-8601 is indeed mostly a wire format, it is not a bad choice in such a situation. The question is how to mark this up.
Comment 7 Martin Dürst 2012-11-26 08:26:49 UTC
(In reply to comment #4)
> Overloading "lang" to mean by "language" and "locale" seems dodgy to me. For
> example, "en" is a valid language, but "en-GB" and "en-US" have very
> different locales. A document might be all in the ISO-8601 locale, while
> using multiple different languages (and indeed, ISO-8601 doesn't even have a
> language, as far as I can tell, so I'm not sure how one would express that
> locale as a lang="" value).

Using the same attribute for language and locale is indeed confusing at first. But it actually makes a lot of sense. For content written in a certain (local variant of a) language, it makes a lot of sense to also use the number formatting conventions of that language/variant, the date formatting conventions, the sorting conventions, the quoting conventions, the monetary amount formatting conventions, the calendar (display) conventions, and so on, of that language/variant. Essentially, these conventions are part of the writing conventions of the language. Mark Davis created a very nice example of this. I guess that was at a Unicode Conference a few years ago. I'm not sure it's online, but I'm sure he can make it available.
Comment 8 Addison Phillips 2012-11-26 15:49:35 UTC
(In reply to comment #6)

> Although ISO-8601 is indeed mostly a wire format, it is not a bad choice in
> such a situation. The question is how to mark this up.

2012-11-26T15:34:51Z is not a very friendly presentation. I agree that a presentation that is similar to that ("2012-11-26 07:34:51 PST"?) can be useful, removing ambiguity from dates and times, but that usually isn't exactly ISO 8601.
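
As a rough sketch of the distinction between the wire value and a friendlier presentation, the snippet below converts the ISO 8601 timestamp above into the "similar" form quoted in this comment. The `wire_to_display` helper is hypothetical, and the fixed eight-hour offset labelled "PST" is an assumption made to keep the example self-contained; it is not a real time zone database lookup.

```python
from datetime import datetime, timedelta, timezone

def wire_to_display(wire: str, offset_hours: int, zone_name: str) -> str:
    """Convert an ISO 8601 UTC timestamp into a readable local-time string."""
    # Parse the wire value; the trailing "Z" means UTC.
    utc = datetime.strptime(wire, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    # Assumption: a fixed offset stands in for a proper time zone database.
    local = utc.astimezone(timezone(timedelta(hours=offset_hours), zone_name))
    return local.strftime("%Y-%m-%d %H:%M:%S %Z")

print(wire_to_display("2012-11-26T15:34:51Z", -8, "PST"))  # 2012-11-26 07:34:51 PST
```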

The question is how to mark this up. This isn't "locale", though, as there may be many different formatting "patterns" that are common or acceptable in a given language/region/culture. This makes it a preference that we will (eventually) need to provide for. 

The ECMAScript extension is a useful reference here:

   http://norbertlindenberg.com/2012/10/ecmascript-internationalization-api/index.html
Comment 9 Addison Phillips 2012-11-26 15:56:07 UTC
(In reply to comment #7)
> Mark Davis
> created a very nice example of this. I guess that was at an Unicode
> Conference a few years ago. I'm not sure it's online, but I'm sure he can
> make it available.

It's in UTS#35 (it's the second link I put into comment #5):

   http://www.unicode.org/reports/tr35/#Language_and_Locale_IDs

But I'll quote it here:

--
Criteria for what makes a written language should be purely pragmatic; what would copy-editors say? If one gave them text like the following, they would respond that is far from acceptable English for publication, and ask for it to be redone:

    "Theatre Center News: The date of the last version of this document was 2003年3月20日. A copy can be obtained for $50,0 or 1.234,57 грн. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Behdad Esfahbod, Ahmed Talaat, Eric Mader, Asmus Freytag, Avery Bishop, and Doug Felt."

So one would change it to either B or C below, depending on which orthographic variant of English was the target for the publication:

    "Theater Center News: The date of the last version of this document was 3/20/2003. A copy can be obtained for $50.00 or 1,234.57 Ukrainian Hryvni. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Ahmed Talaat, Asmus Freytag, Avery Bishop, Behdad Esfahbod, Doug Felt, Eric Mader."

    "Theatre Centre News: The date of the last version of this document was 20/3/2003. A copy can be obtained for $50.00 or 1,234.57 Ukrainian Hryvni. We would like to acknowledge contributions by the following authors (in alphabetical order): Alaa Ghoneim, Ahmed Talaat, Asmus Freytag, Avery Bishop, Behdad Esfahbod, Doug Felt, Eric Mader."

Clearly there are many acceptable variations on this text. For example, copy editors might still quibble with the use of first versus last name sorting in the list, but clearly the first list was not acceptable English alphabetical order. And in quoting a name, like "Theatre Centre News", one may leave it in the source orthography even if it differs from the publication target orthography. And so on. However, just as clearly, there are limits on what is acceptable English, and "2003年3月20日", for example, is not.
---
Comment 10 Martin Dürst 2012-11-27 01:07:39 UTC
(In reply to comment #8)

> 2012-11-26T15:34:51Z is not a very friendly presentation. I agree that a
> presentation that is similar to that ("2012-11-26 07:34:51 PST"?) can be
> useful, removing ambiguity from dates and times, but that usually isn't
> exactly ISO 8601.

I agree that for date+time, it's very user-unfriendly. For dates only (or separate dates and times), it can make more sense.
Comment 11 Addison Phillips 2012-11-27 01:28:28 UTC
(In reply to comment #10)
> (In reply to comment #8)
> 
> > 2012-11-26T15:34:51Z is not a very friendly presentation. I agree that a
> > presentation that is similar to that ("2012-11-26 07:34:51 PST"?) can be
> > useful, removing ambiguity from dates and times, but that usually isn't
> > exactly ISO 8601.
> 
> I agree that for date+time, it's very user-unfriendly. For dates only (, or
> separate dates and times), it can make more sense.

I agree---but I still maintain that this is a format option rather than a "locale". 

Content authors want/need this kind of control and we *should* be talking about providing those specific display options for number/date/time/etc. microformats. 

But I think that @lang is fully sufficient for describing the locale to use... and that it is desirable to use it for locale.

Put a different way, the locale takes care of knowing that the date separator is - or slash or dot or what. Some locale might prefer "2012.11.26" to "2012-11-26". The page author doesn't need to know this when authoring the page, even if they specify a picture string or skeleton such as:

<input type=date format="y-m-d" value="2012-11-26"> <!-- @lang inherited from page -->
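
One way to read the split described in this comment: the author's skeleton fixes only the field order, and the language supplies the separator. The sketch below assumes exactly that. The `format` attribute above is itself only a proposal, and `DATE_SEPARATOR`, `FIELD_FORMAT`, and `format_with_skeleton` are hypothetical names with illustrative separator values, not CLDR data.

```python
from datetime import date

# Hypothetical per-language separators; real values would come from CLDR.
DATE_SEPARATOR = {"en-US": "/", "sv-SE": "-", "de-DE": "."}

# Maps each skeleton field letter to a strftime directive.
FIELD_FORMAT = {"y": "%Y", "m": "%m", "d": "%d"}

def format_with_skeleton(value: date, skeleton: str, lang: str) -> str:
    """Apply the author's field order, but the language's separator."""
    fields = skeleton.split("-")  # the "-" in the skeleton is only a delimiter
    separator = DATE_SEPARATOR[lang]
    return separator.join(value.strftime(FIELD_FORMAT[field]) for field in fields)

when = date(2012, 11, 26)
print(format_with_skeleton(when, "y-m-d", "sv-SE"))  # 2012-11-26
print(format_with_skeleton(when, "y-m-d", "de-DE"))  # 2012.11.26
```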
Comment 12 mark 2012-11-29 05:03:54 UTC
I agree with the comments that:

- For presentation and input, readers want to use dates & times in the formats appropriate to their own languages.
- For unusual circumstances, ISO 8601 dates may be appropriate; even there, the formatting is better without the T and Z. But those are unusual cases; ISO 8601 is better suited for a wire format.
- There is no need for a 'locale' element or attribute. The @lang attribute is completely sufficient (for reasons outlined below).
Comment 13 Ian 'Hixie' Hickson 2013-03-09 01:13:28 UTC
Well we can't use lang="" itself, since that would break existing pages all over the Web. But if the state of the art in describing locales is to use language codes, then fair enough. Seems a bit weird to me, but whatever.


To make progress on this bug, we need the following:

 - CSS-level features for localising content

 - A list of what needs to happen with respect to automatic localisation
    e.g. <input type=date> UI, <time> rendering... (see also comment 1)

 - A reference to use for locale labels and for how to interpret them when
   using them for the things in the previous bullet

 - Implementor interest
Comment 14 mark 2013-03-09 11:05:44 UTC
> Well we can't use lang="" itself, since that would break existing pages

I'm not really up on this topic, and am trying to understand the issue. Is it that if you had 

<input type="date" name="fname">

and there was an enclosing tag with lang="de-AT", that only dates appropriate for de-AT would be accepted? 


BTW, I was looking at instances of 'locale' in http://www.w3.org/html/wg/drafts/html/master/single-page.html#language, and happened to run across the table:

Locale language    Suggested default encoding
ar                 UTF-8
be                 ISO-8859-5
bg                 windows-1251
...

This guidance may be somewhat out of date. I looked at stats for our search indices for a few TLDs, and UTF-8 has risen significantly. For example, for .jp, .th, and .de it is significantly greater than all other encodings combined. For bg, mentioned above, it is about 2x all other encodings combined. Only for .cn is it just below another encoding, and for .ru it is just above another encoding.

This would take a bit more work, and should be confirmed by some other search engines, but that table might need a bit of tweaking.
Comment 15 Martin Dürst 2013-03-09 11:46:17 UTC
(In reply to comment #14)
> BTW, I was looking at instances of 'locale' in
> http://www.w3.org/html/wg/drafts/html/master/single-page.html#language, and
> happened to run across the table:
> 
> Locale language	 Suggested default encoding
> ar	UTF-8
> be	ISO-8859-5
> bg	windows-1251
> ...
> 
> This guidance may be somewhat out of date. I looked at stats for our search
> indices for a few TLDs, and UTF-8 has risen significantly.

Even if 99% of all pages use UTF-8, the default encoding in this table may be right because it may be the percentage of unlabeled pages that counts, or it may be the percentage of not-easily detectable pages (UTF-8 is easy to detect) that counts, or something similar.
Comment 16 mark 2013-03-09 14:44:02 UTC
That's a good point (although I'm not sure about 99% being a good cutoff point). But it is hard to say without knowing what kinds of considerations were factors in the data for the original table.

Part of the problem is that there is a fair amount of slightly damaged UTF-8 out in the wild, or cases where an ad on the page has a different encoding than the main text (a bad include). If the test is simply whether the page is completely valid UTF-8, then a suboptimal decision might be made.

Of course, if these are only guidelines, then it doesn't matter too much.

Comment 17 Addison Phillips 2013-03-09 16:36:10 UTC
(In reply to comment #13)
> Well we can't use lang="" itself, since that would break existing pages all
> over the Web. 

How would it break existing content? I don't see a case.


> But if the state of the art in describing locales is to use
> language codes, then fair enough. Seems a bit weird to me, but whatever.
> 
> 
> To make progress on this bug, we need the following:
> 
>  - CSS-level features for localising content

A few features exist for localizing content by language, such as list indicators or via the :lang pseudo operator. But you're correct: it's a gap for html+css. However presentation of user agent user interface such as calendar pickers or time pickers, whose interface is implementation defined, does not depend on this.

> 
>  - A list of what needs to happen with respect to automatic localisation
>     e.g. <input type=date> UI, <time> rendering... (see also comment 1)

This is a job for CLDR, although I imagine getting agreement among browsers will be tricky.
> 
>  - A reference to use for locale labels and for how to interpret them when
>    using them for the things in the previous bullet

The ECMAScript I18N extension provides one. It would be best if HTML and JavaScript were identical in this regard.
> 
>  - Implementor interest

Always. But I'm still hoping you will close this bug. I don't want to see an attribute named locale, although I would like to see locale/language aware rendering of data in web pages.
Comment 18 Ian 'Hixie' Hickson 2013-04-12 23:33:28 UTC
We need _something_ in HTML. What would you suggest, if neither lang="" (which we can't use for back-compat reasons) nor a lang-like attribute like locale=""?
Comment 19 Addison Phillips 2013-04-12 23:48:09 UTC
I see the need for locale markup in HTML. What I don't understand is your assertion that using @lang for this would break existing content. What would it break and how?
Comment 20 Ian 'Hixie' Hickson 2013-10-23 20:39:38 UTC
Say you have a page that says:

   <body lang="en-US">
     <p>Enter your date: <input type=date></p>
   </body>

...and you view it today in a UK setting. You'll get a UK widget. If we change the widget based on the lang="" attribute, the page will change behaviour, in a way that the author never expected nor tested for. Given the prevalence of copy-paste authoring, it's extremely likely that there'll be many pages with this kind of thing going on. It's bad form to take something that previously had no effect, and make it have an effect of changing the user's interface.

(In reply to Addison Phillips from comment #17)
> > 
> > To make progress on this bug, we need the following:
> > 
> >  - CSS-level features for localising content
> 
> A few features exist for localizing content by language, such as list
> indicators or via the :lang pseudo operator. But you're correct: it's a gap
> for html+css. However presentation of user agent user interface such as
> calendar pickers or time pickers, whose interface is implementation defined,
> does not depend on this.

If we're going to do this, we should do it right. Otherwise, we'll have to introduce more and more hooks so that we don't break pages each time.


> >  - A list of what needs to happen with respect to automatic localisation
> >     e.g. <input type=date> UI, <time> rendering... (see also comment 1)
> 
> This is a job for CLDR, although I imagine getting agreement among browsers
> will be tricky.

I mean that we need a list of HTML features that we want localised, and rules for how to apply the CLDR information to each one.


> >  - A reference to use for locale labels and for how to interpret them when
> >    using them for the things in the previous bullet
> 
> The ECMAScript I18N extension provides one. It would best if html and
> JavaScript were identical in regard.

URL?
Comment 21 Norbert Lindenberg 2014-07-22 04:55:08 UTC
(In reply to Ian 'Hixie' Hickson from comment #20)

> > The ECMAScript I18N extension provides one. It would best if html and
> > JavaScript were identical in regard.
> 
> URL?

Official:
http://www.ecma-international.org/publications/standards/Ecma-402.htm

HTML version:
http://ecma-international.org/ecma-402/1.0/

Relevant in particular are sections 6.2, Language Tags, and 9, Locale and Parameter Negotiation.
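
For readers unfamiliar with locale negotiation, the sketch below shows the basic idea of falling back from a requested language tag to a supported one by dropping trailing subtags. It is loosely modelled on the prefix-truncation lookup used in ECMA-402 and is not a complete implementation: it assumes tags are already canonicalized (comparison is case-sensitive), and the function name and sample locale sets are chosen here for illustration.

```python
from typing import Optional, Set

def best_available_locale(available: Set[str], requested: str) -> Optional[str]:
    """Return the longest prefix of requested that is in available, if any."""
    candidate = requested
    while True:
        if candidate in available:
            return candidate
        pos = candidate.rfind("-")
        if pos == -1:
            return None  # no subtags left to drop
        # Also drop a dangling single-character subtag, e.g. the "u" in "de-CH-u".
        if pos >= 2 and candidate[pos - 2] == "-":
            pos -= 2
        candidate = candidate[:pos]

print(best_available_locale({"de", "en-US"}, "de-CH"))  # de
print(best_available_locale({"en"}, "fr-FR"))           # None
```

A user agent that does not ship data for de-CH could then still honour a request for it by falling back to de rather than to the user's own locale.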
Comment 22 Addison Phillips 2014-07-22 16:24:22 UTC
(In reply to Ian 'Hixie' Hickson from comment #20)
> Say you have a page that says:
> 
>    <body lang="en-US">
>      <p>Enter your date: <input type=date></p>
>    </body>
> 
> ...and you view it today in a UK setting. You'll geta UK widget. If we
> change the widget based on the lang="" attribute, the page will change
> behaviour, in a way that the author never expected nor tested for. Given the
> prevalence of copy-paste authoring, it's extremely likely that there'll be
> many pages with this kind of thing going on. It's bad form to take something
> that previously had no effect, and make it have an effect of changing the
> user's interface.
> 
I don't agree that the page will "change behavior in a way that the author never expected nor tested for". The current behavior is that the page author has no control over how the date input control appears (assuming there is anything more than a text box). The user-agent looks at the user's environment to determine locale and the presentation used is often quite different from what the page author would want (which is usually consistency with the page). The only way the page author could have an "expectation" or test for anything would be to try every browser in every locale combination on every platform.

It's my experience that developers wishing to work around this use JS widgets that they enforce the localization on or they call the server to format strings for them. Either is a huge pain. The JS Intl extension helps by putting the formatting capability into JS. But that still leaves this bug: providing a convenient, consistent way to format data values into strings or widgets would be a benefit to page authors (who could control what the user sees without lots of special effort) and users (who get a consistent page and browser experience) without breaking existing pages.
Comment 23 Cameron Jones 2014-07-23 16:09:48 UTC
(In reply to Ian 'Hixie' Hickson from comment #20)
> Say you have a page that says:
> 
>    <body lang="en-US">
>      <p>Enter your date: <input type=date></p>
>    </body>
> 
> ...and you view it today in a UK setting. You'll geta UK widget. If we
> change the widget based on the lang="" attribute, the page will change
> behaviour, in a way that the author never expected nor tested for. Given the
> prevalence of copy-paste authoring, it's extremely likely that there'll be
> many pages with this kind of thing going on. It's bad form to take something
> that previously had no effect, and make it have an effect of changing the
> user's interface.
> 

The problem is that UAs have bypassed the document language settings and enforced the application of the user's preferred locale.

What is needed is a way to restore the document @lang definition by disabling the user locale override and re-instating document priority.

There are a couple of options on how this could be implemented, perhaps as a new html @localized attribute, or perhaps a meta extension?

As additions to html, this would avoid impacting any existing documents.
Comment 24 Ian 'Hixie' Hickson 2014-07-23 18:21:18 UTC
Here's a more detailed description of the scenario I described in comment 20:

1. Author A from the USA writes a page that says: 
    <body lang="en-US">
      <p>Enter your date: <input type=date></p>
    </body>

2. Author A tests the form from their home in the USA. The form accepts dates in
   the form 12/31/2014 (because author A's browser uses the USA locale).

3. Author A deploys the page, happy that their users will see what author A sees.

4. Author B wants to write a page with a date widget but doesn't know how to. They
   find the aforementioned page by author A. Author B is in the UK. They see that
   the widget accepts dates of the form 31/12/2014 (because author B's browser
   uses the UK locale).

5. Author B copies and pastes author A's page, tests it, and deploys it, happy in
   the knowledge that author B's users will see what author B sees.

6. Much rejoicing from users of the pages by authors A and B. Author B asks their
   users to pick whether to go on holiday on January 2nd or February 1st.

7. We come along and redefine lang="" as affecting <input type=date>'s formatting.

8. Author A's users are unaffected. Author B's users now see a widget that accepts
   dates of the form 12/31/2014, but don't realise it. Those who want holidays in
   January enter 01/02/2015, those who want to go in February enter 02/01/2015.

9. Time passes. The holidays are booked. Tickets are sent. Much money changes hands.

10. Pandemonium breaks out, because many users of author B's page have tickets for 
    the wrong holiday.
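
The ambiguity at the heart of this scenario is easy to reproduce. The sketch below parses the same typed string under a month-first and a day-first convention; the `parse_date` helper and the "mdy"/"dmy" labels are hypothetical names used only for this illustration.

```python
from datetime import datetime

def parse_date(text: str, order: str) -> datetime:
    """Parse a slash-separated date as month-first ("mdy") or day-first ("dmy")."""
    fmt = {"mdy": "%m/%d/%Y", "dmy": "%d/%m/%Y"}[order]
    return datetime.strptime(text, fmt)

typed = "01/02/2015"
as_us = parse_date(typed, "mdy")
as_uk = parse_date(typed, "dmy")
print(as_us.date().isoformat())  # 2015-01-02
print(as_uk.date().isoformat())  # 2015-02-01
```

The same keystrokes produce dates a month apart, which is why the unambiguous submitted value matters as much as the displayed one.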
Comment 25 Addison Phillips 2014-07-23 21:56:31 UTC
(In reply to Ian 'Hixie' Hickson from comment #24)
> Here's a more detailed description of the scenario I described in comment 20:

I agree that this scenario might be a problem, in cases where the browser displays the value formatted in an ambiguous form. I notice that Chrome does this currently--and thus your scenario already exists in reverse. If I visit a page for (let's say) British Airways and see that the date of my travel is 02/01/2014 while the page is in UK English, mightn't I naturally assume that this is January 2nd rather than February 1st (the locale of my computer)? What if I'm using an unfamiliar computer, such as in an Internet cafe?

The page author has no control now. Isn't that an issue?

I agree with Comment #23 that we could ameliorate this issue by providing an extension to markup to "turn on" localized formatting. What I want to avoid is having multiple ways of marking up language/locale information in an HTML document.

Note: I have a demo page which I think is helpful here:

   http://www.inter-locale.com/test/DateDemo.html 

Expanding on Cameron's comment, just having @localized might be good. We might also provide picture string/format control/skeleton? Or are we getting too baroque in what we expect from HTML?

   <input type=date name=xxx lang=en-US localized=mdy /> <!-- produces month-day-year format with appropriate local separators for US English -->

   <input type=date name=xxx lang=en-US /> <!-- produces a date value localized according to the browser locale, since no @localized -->
Comment 26 Cameron Jones 2014-07-24 16:20:36 UTC
(In reply to Addison Phillips from comment #25)
> (In reply to Ian 'Hixie' Hickson from comment #24)
> > Here's a more detailed description of the scenario I described in comment 20:

I'm not that sure that mindless copy/paste should be a design consideration.

The state of play is that there are local settings which affect the presentation of the data to the user with the effect that the meaning of the document changes in regards to how - or where - it is viewed.

We should remember that this only affects the new HTML5 date/time form controls and prior to this i don't think browsers were imposing auto-localization (there was no such functionality).

So, i guess to step back, we could ask if the approach taken by browsers in 
implementing the HTML5 date/time controls is just a bug? That was my initial appraisal.

However, i believe that the decision by browsers to auto-localize has been taken in the name of the user's best interest. In some regards this does remove the ambiguous situation as the user will always see data within their own locale. 

So in effect, the browser has protected its internal integrity across pages at the cost of a website's integrity across browsers.

Given that the browser advertises its locale within the HTTP header, i don't think that it should be auto-localizing as the page should have already been localized in processing the appropriate response.

However, this kind of auto-behaviour is also seen with translation tools which incurred the requirement to introduce the translate="no" attribute. This auto-localize case is effectively synonymous with that use case and so should be served with the same solution, namely a global localize="no" attribute.

This would also allow for turning on/off the existing behaviour over partial DOM trees.

> 
> I agree that this scenario might be a problem, in cases where the browser
> displays the value formatted in an ambiguous form. I notice that Chrome does
> this currently--and thus your scenario already exists in reverse. If I visit
> a page for (let's say) British Airways and see that the date of my travel is
> 02/01/2014 while the page is in UK English, mightn't I naturally assume that
> this is January 2nd rather than February 1st (the locale of my computer)?
> What if I'm using an unfamiliar computer, such as in an Internet cafe?
> 
> The page author has no control now. Isn't that an issue?
> 

One of the main problems with auto-localize is that it is impossible to declare form controls in a specific locale.

This means that multi-lingual pages are impossible.

So, it is impossible to write language tutorials or other cross-cultural educational documents.

> (In reply to Addison Phillips from comment #25)
> Expanding on Cameron's comment, just having @localized might be good. We
> might also provide picture string/format control/skeleton? Or are we getting
> too baroque in what we expect from HTML?
> 
>    <input type=date name=xxx lang=en-US localized=mdy /> <!-- produces
> month-day-year format with appropriate local separators for US English -->
> 
>    <input type=date name=xxx lang=en-US /> <!-- produces a date value
> localized according to the browser locale, since no @localized -->

Thinking about @localized (which i implied would be boolean), i think it would be better to follow the approach from translate="no" as stated above.

The ability to control the format was something i thought would be better placed for CSS as it *should* be a purely presentational customization.
Comment 27 Cameron Jones 2014-07-25 14:24:53 UTC
A more concrete use case to consider is that of multi-lingual websites which offer the ability for a user to choose their desired language through website controls - there are lots of examples of this within europe, notably:

voyages-sncf.com

In this case the website locale setting is out-of-band information wrt HTTP and the inference of the browser. The site will ignore any Accept-Language header and instead use proprietary cookie values for language settings.
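
A rough sketch of how such a site might pick its locale is shown below. It assumes a hypothetical language value stored in a cookie that takes priority, with the Accept-Language header used only as a fallback; the function names and the supported-locale lists are illustrative, and the header parsing is deliberately simplified.

```python
from typing import List, Optional

def preferred_languages(accept_language: str) -> List[str]:
    """Return tags from an Accept-Language header, highest quality first."""
    ranked = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a tag without an explicit q value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        ranked.append((quality, tag.strip()))
    # The sort is stable, so tags with equal quality keep their header order.
    ranked.sort(key=lambda item: item[0], reverse=True)
    return [tag for quality, tag in ranked if quality > 0]

def site_locale(cookie_lang: Optional[str], accept_language: str,
                supported: List[str]) -> str:
    """Prefer the user's explicit choice (cookie) over the browser's header."""
    if cookie_lang in supported:
        return cookie_lang
    for tag in preferred_languages(accept_language):
        if tag in supported:
            return tag
    return supported[0]  # site default

print(site_locale("fr-FR", "en-GB,en;q=0.8", ["en-GB", "fr-FR"]))  # fr-FR
print(site_locale(None, "de;q=0.5,en-GB", ["en-GB", "de"]))        # en-GB
```

Whichever locale this returns is known only to the site, which is the point being made here: the browser's form controls have no way to learn it.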

The website will return a translated and localized page in the user's locale of choice, however due to an auto-localization policy defined by local browser settings there is no way for the website to control the localization of page data.

The effect of this is the inability for a website to provide a localized experience for their user due to the lack of control over auto-localization.

An additional consideration regardless of introducing an auto-localization switch is the effect that auto-localization has within CSS. 

Currently the :lang pseudo-class is used to target rules against the language resolution algorithm in HTML. The effect of auto-localization is that :lang no longer reflects the locale used for localizing data, so there is no way to target styles for locales.

With this being the case, there may be an additional need to introduce a :locale pseudo-class to CSS which factors in auto-localization and the user locale.
Comment 28 Ian 'Hixie' Hickson 2014-07-28 22:07:37 UTC
(In reply to Addison Phillips from comment #25)
> 
> The page author has no control now. Isn't that an issue?

Yes. I don't think anyone is arguing that we shouldn't provide authors with a way to control this.


If we want a mechanism to turn localisation on or off, then CSS is probably the best place for it, not the markup. In the markup, if we have a mechanism, it should just be a way to specify the locale. If we believe lang="" is sufficient for that, then there's nothing to add to the markup; we only need a mechanism to turn it on, which would be in CSS. If lang="" isn't enough, then we can have a new mechanism (e.g. locale=""), and then the presence of that mechanism can force the localisation on, and then we don't need something in CSS (though we probably still want it, long-term).

(In reply to Cameron Jones from comment #26)
> 
> I'm not that sure that mindless copy/paste should be a design consideration.

It's one of the main ways that Web development happens.
Comment 29 Cameron Jones 2014-07-29 16:30:24 UTC
I think it might be better to start with some appraisal of whether auto-localization is a feature or a bug.

The ramifications of auto-localization being a feature are quite explosive, and i don't believe that the implications have been thought through.

What benefit is auto-localization providing today such that it warrants the necessity for an escape hatch?

We must consider the eventuality that if an escape hatch is provided, will it be used by default? Does this not render the default behaviour a bug?

> (In reply to Ian 'Hixie' Hickson from comment #28)
> 
> If we want a mechanism to turn localisation on or off, then CSS is probably
> the best place for it, not the markup. 

In lieu of some specific syntax to consider, i think this could be problematic as the locale will be defined through the same place as it is needed to be used.

> (In reply to Cameron Jones from comment #26)
> > 
> > I'm not that sure that mindless copy/paste should be a design consideration.
> 
> It's one of the main ways that Web development happens.

As a declarative language, HTML by definition is a description of *what* the document means. There are no useless or unimportant definitions. Everything has an effect. Therefore, supporting a model of copy/paste which is not simply a manifestation of referential transparency would violate the essential nature of a declarative language.

You can not say something has meaning, and then ignore it at will (or when *some* people use it incorrectly/without consideration for its effects). To do so would be to render valid uses invalid and "break things across the web"(tm).

If people have copy/pasted that their page is within the Inuit locale then lo(!) forever more it shall be.
Comment 30 Ian 'Hixie' Hickson 2014-07-29 16:52:26 UTC
(In reply to Cameron Jones from comment #29)
> I think it might be better to start with some appraisal of whether
> auto-localization is a feature or a bug.

Not sure exactly what you mean by "auto-localisation", but from context it looks like you mean the feature that exists today that causes form controls to render according to the user's local platform conventions rather than having the same UI for everyone.

If that is what you mean, then it clearly seems like a feature. The alternative would be for type=date to show a Chinese calendar (since that's the most-widely used calendar in terms of users), and I, for one, have no idea how to read Chinese.


> What benefit is auto-localization providing today such that it warrants the
> necessity for an escape hatch?

The benefits are not what warrant an "escape hatch". This bug is just a feature request from authors to be able to control the localisation more specifically.


> We must consider the eventuality that if an escape hatch is provided, will
> it be used by default? Does this not render the default behaviour a bug?

I don't understand what you mean. We can't change the defaults. Maintaining backwards-compatibility is paramount.


> In lieu of some specific syntax to consider, i think this could be
> problematic as the locale will be defined through the same place as it is
> needed to be used.

I don't understand what you mean.


> As a declarative language, HTML by definition is a description of *what* the
> document means.

Yes.

> There are no useless or unimportant definitions.

That's clearly false. There's lots of ways of including useless or unimportant HTML markup. For example:

   <span class=""></span>

...is semantically moot.


> Everything has an effect.

Not really.


> Therefore, supporting a model of copy/paste which is not
> simply a manifestation of referential transparency would violate the
> essential nature of a declarative language.

I'm not sure what you mean by "supporting". The simple fact of the matter is that significant volumes of Web content are generated by authors who don't understand the nuances of HTML yet, and they get their documents working by copying and pasting something that works nearly as they want, and then mutating it until it works well enough for them to deploy. I don't pass a value judgement on this matter, it's just how it is.


> You can not say something has meaning, and then ignore it at will (or when
> *some* people use it incorrectly/without consideration for its effects). To
> do so would be to render valid uses invalid and "break things across the
> web"(tm).

I'm not sure to what you are referring here.


> If people have copy/pasted that their page is within the Inuit locale then
> lo(!) forever more it shall be.

It's not the semantic meaning we have to preserve, it's the user-visible end result.
Comment 31 Cameron Jones 2014-07-30 15:23:43 UTC
(In reply to Ian 'Hixie' Hickson from comment #30)
> (In reply to Cameron Jones from comment #29)
> > I think it might be better to start with some appraisal of whether
> > auto-localization is a feature or a bug.
> 
> Not sure exactly what you mean by "auto-localisation", but from context it
> looks like you mean the feature that exists today that causes form controls
> to render according to the user's local platform conventions rather than
> having the same UI for everyone.

Almost. That should be:

"...the feature that exists today that causes form controls
to render according to the user's local platform conventions rather than
the language negotiation of HTTP and the HTML language resolution algorithm"

> 
> If that is what you mean, then it clearly seems like a feature. The
> alternative would be for type=date to show a Chinese calendar (since that's
> the most-widely used calendar in terms of users), and I, for one, have no
> idea how to read Chinese.
> 

No, it wouldn't. Nothing would show the Chinese calendar unless it had been set somewhere in the chain of locale settings, by default:

element -> parent -> html -> meta -> content-language -> user locale

The base-line default always falls back to the "user's local platform conventions". 

So, unless you had installed a Chinese operating system or otherwise set your environment to Chinese you would never see a Chinese calendar, unless you visited a Chinese web site.
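
As an illustration only, the fallback chain above could be sketched in TypeScript as follows. The `LocaleSources` shape and the `resolveLocale` function are hypothetical names, not part of any specification, and the sketch simply treats empty or missing values as unset rather than modelling the HTML rules for an empty lang="".

```typescript
// Hypothetical model of the places a locale could come from (assumed names).
interface LocaleSources {
  elementLang?: string;          // lang on the element itself
  ancestorLangs?: string[];      // lang on parent ... html, nearest first
  metaContentLanguage?: string;  // <meta http-equiv="content-language">
  httpContentLanguage?: string;  // Content-Language HTTP header
  userLocale: string;            // the user's local platform conventions
}

// Walk the chain in order and return the first value that is set.
function resolveLocale(s: LocaleSources): string {
  const candidates: Array<string | undefined> = [s.elementLang];
  for (const lang of s.ancestorLangs || []) {
    candidates.push(lang);
  }
  candidates.push(s.metaContentLanguage, s.httpContentLanguage);
  for (const candidate of candidates) {
    if (candidate) {
      return candidate;
    }
  }
  // Base-line default: the platform locale.
  return s.userLocale;
}
```

Under this sketch a page with no language information at all resolves to the platform locale, and a Chinese locale only appears if something in the chain sets it.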

> 
> > What benefit is auto-localization providing today such that it warrants the
> > necessity for an escape hatch?
> 
> The benefits are not what warrant an "escape hatch". This bug is just a
> feature request from authors to be able to control the localisation more
> specifically.
> 

No. The bug is that it is impossible to show anything *other* than the "user's local platform conventions".

> 
> > We must consider the eventuality that if an escape hatch is provided, will
> > it be used by default? Does this not render the default behavior a bug?
> 
> I don't understand what you mean. We can't change the defaults. Maintaining
> backwards-compatibility is paramount.

There is no backwards compatibility to support.

Since any data-point which would require localization is new to HTML5 and the implementation of such features is patchy at best, there is no precedent for existing uses to preserve.

Given that browsers have implemented some patchy prototypes which don't take locale resolution into account, it has to be questioned whether they have considered it at all. Ergo a bug.

For a tangential discussion, what scope is there within a "living standard" for the appropriate review and refinement of new features? If the first draft is baked in stone with the first implementation, this doesn't provide the necessary environment for global standards development impacting disparate users. 

> 
> 
> > In lieu of some specific syntax to consider, i think this could be
> > problematic as the locale will be defined through the same place as it is
> > needed to be used.
> 
> I don't understand what you mean.

I thought that you were implying that the locale could be set through CSS properties, in which case there is scope for infinite loops when you consider the additional requirement of needing to style based on locale selectors:

:locale("fr") {
	locale: "en";	
}

:locale("en") {
	locale: "fr";	
}

> 
> 
> > As a declarative language, HTML by definition is a description of *what* the
> > document means.
> 
> Yes.
> 
> > There are no useless or unimportant definitions.
> 
> That's clearly false. There's lots of ways of including useless or
> unimportant HTML markup. For example:
> 
>    <span class=""></span>
> 
> ...is semantically moot.

Nope. Still means something. That it has no content is moot.

I can still style it. I can still JS some content into it. Removing it from the page might break it in unknown or unexpected ways. We cannot make that judgment.

> 
> > Therefore, supporting a model of copy/paste which is not
> > simply a manifestation of referential transparency would violate the
> > essential nature of a declarative language.
> 
> I'm not sure what you mean by "supporting". The simple fact of the matter is
> that significant volumes of Web content are generated by authors who don't
> understand the nuances of HTML yet, and they get their documents working by
> copying and pasting something that works nearly as they want, and then
> mutating it until it works well enough for them to deploy. I don't pass a
> value judgment on this matter, it's just how it is.
> 

Within the specific context of <html lang="">, I think that is a throwback to the bygone days of (X)HTML 4(.01) (Strict|Transitional|Frameset) when it was too confusing and difficult to remember what anyone should put at the top of their shiny new HTML page. That technological requirement forced people to look up and use the closest DOCTYPE to hand, with little consideration for what else they were copying (lang, xml:lang, dir, xmlns).

There is no problem with copy/paste, the problem is attempting to negate the semantics of the document because people have used it incorrectly. 

What is the problem with users getting "their documents working by copying and pasting something that works nearly as they want" and then mutating the <html lang=""> until it works well enough for them?

> 
> > You can not say something has meaning, and then ignore it at will (or when
> > *some* people use it incorrectly/without consideration for its effects). To
> > do so would be to render valid uses invalid and "break things across the
> > web"(tm).
> 
> I'm not sure to what you are referring here.
> 
> 
> > If people have copy/pasted that their page is within the Inuit locale then
> > lo(!) forever more it shall be.
> 
> It's not the semantic meaning we have to preserve, it's the user-visible end
> result.

But we have no standard user-visible behavior at present. All we have are partial implementations of a draft specification.

Looking again at Addison's test page in the various browsers, i'm struggling to find any localization happening at all any more. It appears that browsers have backed out of this functionality.

I think they are looking for some clear direction over the currently non-normative and ambiguous advice in section 4.10.5.2:

http://www.whatwg.org/specs/web-apps/current-work/multipage/forms.html#input-impl-notes

One way to simplify the problem space is to avoid baking in auto-localization and stick with the HTML language resolution algorithm as the sole means to derive a locale. This will avoid any knock-on effects to CSS and at least allow the new data points to be implemented consistently.

The notion of a browser implementing a default localization policy across all pages is something to consider, but the question is how important - or useful - is this when considered as a distinct and separate function from translation.

If an english monoglot visits a french website, what use is there in 'localizing' the page data for them? Either they understand the content or they do not.

The real scope for auto-localization is when viewed between 'en-US' and 'en-GB' or any other variation on the 'en' base. In this case a website could produce a generic english document (with the exclusion of spelling differences) and yet support through default client-side localization the translation of data points within the local cultural conventions.

The exclusion of spelling differences highlights that this is essentially a futile exercise as it will be impossible to provide a completely generic base document which maintains integrity across all cultural variations.

So, i conclude that the notion of client-side automated semi-translation of localizable data points is a bogus concept. 

I think instead we should just be looking at non-html derived localization being the same as translation and being catered for in the same manner.

As such, the translate="" attribute can be used to semantically denote intrinsically localized data points.

The main downside to all of this is 'en-US' users visiting 'en-GB' web sites needing to cope with the strange month/day variations in date widgets. This is already the de facto standard for proprietary localized form controls stuffed into type="text".
Comment 32 Addison Phillips 2014-07-30 15:38:58 UTC
(In reply to Ian 'Hixie' Hickson from comment #28)
> (In reply to Addison Phillips from comment #25)
> > 
> > The page author has no control now. Isn't that an issue?
> 
> Yes. I don't think anyone is arguing that we shouldn't provide authors with
> a way to control this.
> 
> 
> If we want a mechanism to turn localisation on or off, then CSS is probably
> the best place for it, not the markup. In the markup, if we have a
> mechanism, it should just be a way to specify the locale. If we believe
> lang="" is sufficient for that, then there's nothing to add to the markup;
> we only need a mechanism to turn it on, which would be in CSS. If lang=""
> isn't enough, then we can have a new mechanism (e.g. locale=""), and then
> the presence of that mechanism can force the localisation on, and then we
> don't need something in CSS (though we probably still want it, long-term).
> 

I agree with this. I'll note that section 4.10.1.5 has what I consider the appropriate wording on presentation.

So I guess the way to resolve this bug would be to ask CSS to define a property that can be set to control (turn on/off) the localization of input elements in the page?
Comment 33 Ian 'Hixie' Hickson 2014-07-30 20:20:04 UTC
There's long been talk of having CSS provide ways to localise content like dates and times (which would apply to <time>, for instance), and it would definitely also be good to have CSS have a way to provide information to control bindings that describes how they should be localised. If CSS were to add either or both of these features, I'd be more than happy to work to make sure HTML hooked into them appropriately.
Comment 34 Ian 'Hixie' Hickson 2014-07-30 20:48:39 UTC
(In reply to Cameron Jones from comment #31)
> > Not sure exactly what you mean by "auto-localisation" [...]
> 
> "...the feature that exists today that causes form controls
> to render according to the user's local platform conventions rather than
> the language negotiation of HTTP and the HTML language resolution algorithm"

Ah, ok. So auto-platform-locale-localisation vs auto-content-locale-localisation.

I think good arguments can be made either way regarding which the default automatic localisation should be (the user's, or the author's).


> > This bug is just a feature request from authors to be able to control the
> > localisation more specifically.
> 
> The bug is that it is impossible to show anything *other* than the
> "user's local platform conventions".

I don't understand the difference between those statements.


> There is no backwards compatibility to support.

I feel it is important to handle the backwards-compatibility implications described in comment 24.


> Since any data-point which would require localization is new to HTML5 and
> the implementations of such features is patchy at best there is no
> precedence for existing uses to preserve.

The relevant point to which we need to be backwards compatible is what has shipped.

This is made clearer by not considering the current contemporary HTML spec to be "HTML5". Calling it "HTML5" implies that there has been one standard from 2004 to 2014. There have been thousands of HTML versions in that time.


> That browsers have implemented some patchy prototypes which don't take into
> account locale resolution, it has to be questioned if they have considered
> it at all. Ergo a bug.

We can call it "bugwards compatible" if that terminology feels more accurate to you. The end result is the same.


> For a tangential discussion, what scope is there within a "living standard"
> for the appropriate review and refinement of new features? If the first
> draft is baked in stone with the first implementation, this doesn't provide
> the necessary environment for global standards development impacting
> disparate users. 

Web standards are set in stone by the first content to rely on a particular implementation, whether or not the standard is described as "living". You could publish a standard and have it be a W3C Recommendation for 10 years, but if nobody has used it, you could change it tomorrow. You could publish an IETF RFC draft marked with all manner of "DO NOT USE" or "UNSTABLE DRAFT" warnings, but if someone deploys a new version of amazon.com using that feature five seconds later, your review period is over. Similarly, if your draft doesn't match what was actually shipped, yet people depend on the feature, then your draft needs to be updated to match what was shipped, regardless of whether you like it, and regardless of whether you called your draft a "first public working draft" or an "ISO standard".

The term "Living Standard" tries to reflect this reality by removing the fake or misleading stability markers inherent in calling a specification a "draft" or "finished". While the spec is relevant, it must be updated and must take into account the reality it is attempting to steer.


> I thought that you were implying that the locale could be set through CSS
> properties, in which case there is scope for infinite loops when you
> consider the additional requirement of needing to style based on locale
> selectors:
> 
> :locale("fr") {
> 	locale: "en";	
> }
> 
> :locale("en") {
> 	locale: "fr";	
> }

I assume you mean ':lang()' rather than ':locale()'. I don't think anyone was proposing a ':locale()' pseudo-class (what would it match against?).

There's no infinite loop here, since the properties cannot affect the state that the selectors depend on.
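
To illustrate the point, here is a rough TypeScript sketch under simplifying assumptions: selectors are matched against the language that comes from the markup, and the declarations are then collected in a single pass. The `locale` property is the hypothetical one from the quoted example, and `Rule`, `langMatches` and `computeStyle` are made-up names, not how any real CSS engine is structured.

```typescript
// A rule of the form  :lang(xx) { property: value; ... }
interface Rule {
  lang: string;
  declarations: { [property: string]: string };
}

// True if the element language equals the range or is a sub-tag of it,
// e.g. "fr-CA" matches "fr".
function langMatches(elementLang: string, range: string): boolean {
  const tag = elementLang.toLowerCase();
  const r = range.toLowerCase();
  return tag === r || tag.indexOf(r + "-") === 0;
}

// Single pass: matching reads only elementLang, which the declarations
// never modify, so there is nothing to re-evaluate afterwards.
function computeStyle(elementLang: string, rules: Rule[]): { [property: string]: string } {
  const style: { [property: string]: string } = {};
  for (const rule of rules) {
    if (langMatches(elementLang, rule.lang)) {
      for (const property of Object.keys(rule.declarations)) {
        style[property] = rule.declarations[property];
      }
    }
  }
  return style;
}

const rules: Rule[] = [
  { lang: "fr", declarations: { locale: "en" } },
  { lang: "en", declarations: { locale: "fr" } },
];
const frenchStyle = computeStyle("fr", rules);
```

With these two rules, an element whose markup language is "fr" ends up with the hypothetical locale property set to "en" and the computation stops there, because the second rule is still tested against "fr".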


> > > There are no useless or unimportant definitions.
> > 
> > That's clearly false. There's lots of ways of including useless or
> > unimportant HTML markup. For example:
> > 
> >    <span class=""></span>
> > 
> > ...is semantically moot.
> 
> Nope. Still means something. That it has no content is moot.
> 
> I can still style it. I can still JS some content into it. Removing it from
> the page might break it in unknown or unexpected ways. We cannot make that
> judgment.

CSS cannot affect the semantics of that piece of markup. It's semantically moot regardless of how you style it.

Script could mutate the content, certainly. In that case it would be different content, and might no longer be semantically moot.

In the absence of script or style, that markup does nothing useful. My point is just that the statement "There are no useless or unimportant definitions." is not unconditionally true.


> Within the specific context of <html lang="">, I think that is a throwback
> to the bygone days of (X)HTML 4(.01) (Strict|Transitional|Frameset) when it
> was too confusing and difficult to remember what anyone should put at the
> top of their shiny new HTML page. That technological requirement forced
> people to lookup and use the closest DOCTYPE to hand, with little
> consideration for what else they were copying (lang, xml:lang, dir, xmlns).

This kind of authoring behaviour goes far beyond page boilerplate.


> What is the problem with users getting "their documents working by copying
> and pasting something that works nearly as they want" and then mutating the
> <html lang=""> until it works well enough for them?

The "problem" is just that it means we can't change what "lang" does, qv comment 24.


> > It's not the semantic meaning we have to preserve, it's the user-visible end
> > result.
> 
> But we have no standard user-visible behavior at present.

The world doesn't care if the behaviour is standard or not. It cares about whether it is deployed or not.


> All we have are partial implementations of a draft specification.

That's all we ever have.


> If an english monoglot visits a french website, what use is there in
> 'localizing' the page data for them? Either they understand the content or
> they do not.

There are multiple locales even within English.

Personally I would like all the sites I go to to give dates in the form YYYY-MM-DD, rather than the mix of MM/DD/YYYY, DD/MM/YYYY, and DD/MM/YY that I get now (I read content in three locales, not all English-based). Similarly, I want all numbers to be in the form "N,NNN.nnn" rather than the mix of "NNNN.nnn", "N NNN,nnn", "N'NNN.nnn", etc, that I get now. I personally think it's way more useful for the page to be localised to my platform's locale than it is for the page to be localised to the author's locale.
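
For readers who want to see the kind of variation being described, the following TypeScript sketch formats one date and one number with the standard ECMAScript Intl API. The chosen locales and sample values are only illustrative, and the exact strings depend on the locale data shipped with the runtime.

```typescript
// 30 July 2014, pinned to UTC so the result does not depend on the
// machine's time zone.
const sampleDate = new Date(Date.UTC(2014, 6, 30));

function formatDateFor(locale: string): string {
  return new Intl.DateTimeFormat(locale, { timeZone: "UTC" }).format(sampleDate);
}

const usDate = formatDateFor("en-US"); // month first
const gbDate = formatDateFor("en-GB"); // day first

// The machine-readable YYYY-MM-DD form, independent of any locale.
const isoDate = sampleDate.toISOString().slice(0, 10);

// Grouping and decimal separators also vary by locale.
const usNumber = new Intl.NumberFormat("en-US").format(1234.567);
```

The same two lines of formatting code produce differently ordered dates purely from the locale argument, which is the choice (page locale or platform locale) under discussion here.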


> So, i conclude that the notion of client-side automated semi-translation of
> localizable data points is a bogus concept. 

I do not draw that same conclusion, but I certainly understand where you're coming from. Luckily I think we can continue to expand the platform til we are both happy.
Comment 35 Cameron Jones 2014-07-31 16:01:54 UTC
(In reply to Ian 'Hixie' Hickson from comment #34)
> (In reply to Cameron Jones from comment #31)
> > > Not sure exactly what you mean by "auto-localisation" [...]
> > 
> > "...the feature that exists today that causes form controls
> > to render according to the user's local platform conventions rather than
> > the language negotiation of HTTP and the HTML language resolution algorithm"
> 
> Ah, ok. So auto-platform-locale-localisation vs
> auto-content-locale-localisation.

Exactly.

> 
> I think good arguments can be made either way regarding which the default
> automatic localisation should be (the user's, or the author's).
> 

Yes, it becomes quite academic deciding defaults. The implications of supporting both configurations results in the same technical requirements.

Which behavior is default and which must be configured is largely inconsequential.

> > Since any data-point which would require localization is new to HTML5 and
> > the implementations of such features is patchy at best there is no
> > precedence for existing uses to preserve.
> 
> The relevant point to which we need to be backwards compatible is what has
> shipped.
> 
> This is made clearer by not considering the current contemporary HTML spec
> to be "HTML5". Calling it "HTML5" implies that there has been one standard
> from 2004 to 2014. There have been thousands of HTML versions in that time.
> 

Fair enough.

> 
> > That browsers have implemented some patchy prototypes which don't take into
> > account locale resolution, it has to be questioned if they have considered
> > it at all. Ergo a bug.
> 
> We can call it "bugwards compatible" if that terminology feels more accurate
> to you. The end result is the same.

I wasn't aware of the term, but it is an accurate description.

> 
> The term "Living Standard" tries to reflect this reality by removing the
> fake or misleading stability markers inherent in calling a specification a
> "draft" or "finished". While the spec is relevant, it must be updated and
> must take into account the reality it is attempting to steer.

Fair enough. I appreciate the practicalities of the process.

> 
> I assume you mean ':lang()' rather than ':locale()'. I don't think anyone
> was proposing a ':locale()' pseudo-class (what would it match against?).
> 

Ignore the prior negative example. 

I was considering the possibility that :locale() might be needed to allow CSS rules to target specific localizations. It would match against either the platform locale or page locale depending on which was enabled (pending such a feature in CSS).

If we consider that the effective locale may be set by the browser configuration, authors may want to style for those locales. In the following example the stylesheet refines the currency for Australian users by denoting that it is USD:

	<style>
		span.currency::before {
			content: "$";
		}

		span.currency:locale(en-AU)::after {
			content: "USD";
		}
	</style>
	
	<span class="currency" lang="en-US">100.00</span>
	
> 
> Personally I would like all the sites I go to to give dates in the form
> YYYY-MM-DD, rather than the mix of MM/DD/YYYY, DD/MM/YYYY, and DD/MM/YY that
> I get now (I read content in three locales, not all English-based).
> Similarly, I want all numbers to be in the form "N,NNN.nnn" rather than the
> mix of "NNNN.nnn", "N NNN,nnn", "N'NNN.nnn", etc, that I get now. I
> personally think it's way more useful for the page to be localised to my
> platform's locale than it is for the page to be localised to the author's
> locale.
> 

The date form "YYYY-MM-DD" is an international syntax and does not exist as a cultural format in any locale. It would require the format to be overridden either through author or user stylesheets (functionality and syntax pending).

One problem with localizing according to the platform locale is that this can lead to inconsistencies within the page if any data has been localized on the server. The main way to avoid this - or to phrase it another way - to provide capabilities for people to mitigate this, is to integrate client-side localization for any potentially localizable data.

This was the main thrust for <data> (and <time>).

> 
> > So, i conclude that the notion of client-side automated semi-translation of
> > localizable data points is a bogus concept. 
> 
> I do not draw that same conclusion, but I certainly understand where you're
> coming from. Luckily I think we can continue to expand the platform til we
> are both happy.

Yes, the conclusion i draw is from a personal authoring decision process not from potential functional capability.

(In reply to Ian 'Hixie' Hickson from comment #33)
> There's long been talk of having CSS provide ways to localise content like
> dates and times (which would apply to <time>, for instance), and it would
> definitely also be good to have CSS have a way to provide information to
> control bindings that describes how they should be localised. If CSS were to
> add either or both of these features, I'd be more than happy to work to make
> sure HTML hooked into them appropriately.

It would be great to see this potential realized.

(In reply to Ian 'Hixie' Hickson from comment #13)
> 
>  - A list of what needs to happen with respect to automatic localisation
>     e.g. <input type=date> UI, <time> rendering... (see also comment 1)
> 

One further point is regarding the level of integration with BCP-47 locale extensions. Has any thought been given to the potential for manifesting each of the following attributes:

calendar
time zone data 
collation order
currency
number system

It seems that each of these has a place within HTML, if not the method to define their application.

The application of extensions may either require some conjunction with user locale so as to allow for pass through of things like timezone or currency, or possibly the use of translate="no" and/or a CSS localization toggle, or perhaps allowing partial BCP-47 extensions in lang="".

Consider potential use cases for:

<label>Enter Date/Time from the Japanese calendar</label>
<input type="datetime" lang="en-u-ca-japanese"/>

<label>Enter Date/Time from the UTC-14 timezone</label>
<input type="datetime" lang="en-u-tz-utce14"/>

<label>Select/Enter your phonebook names</label>
<select lang="en-u-co-phonebk" editable multiple sorted/>

<label>Enter some $USD</label>
<input type="currency" lang="en-u-cu-usd"/>

<label>Enter a Roman numeral</label>
<input type="number" lang="en-u-nu-roman"/>
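
As a rough sketch of how the "-u-" keywords in tags like these could be read out, assuming a simplified reading of the BCP 47 Unicode extension syntax (two-character keys followed by longer type subtags), something like the following TypeScript would do. `parseUnicodeExtension` is a hypothetical helper written for this illustration; it does not validate the tag or the keyword values.

```typescript
// Extract key/type pairs from the "-u-" extension of a language tag,
// e.g. "en-u-ca-japanese" gives { ca: "japanese" }.
function parseUnicodeExtension(tag: string): { [key: string]: string } {
  const subtags = tag.toLowerCase().split("-");
  const keywords: { [key: string]: string } = {};
  const start = subtags.indexOf("u");
  if (start === -1) {
    return keywords; // no Unicode extension present
  }
  let key: string | null = null;
  for (let i = start + 1; i < subtags.length; i++) {
    const subtag = subtags[i];
    if (subtag.length === 1) {
      break; // another singleton starts a different extension
    }
    if (subtag.length === 2) {
      key = subtag; // a new keyword key such as "ca" or "nu"
      keywords[key] = "";
    } else if (key !== null) {
      // Type subtags belonging to the current key, joined with "-".
      keywords[key] = keywords[key] === "" ? subtag : keywords[key] + "-" + subtag;
    }
  }
  return keywords;
}

const calendarKeywords = parseUnicodeExtension("en-u-ca-japanese");
const numberKeywords = parseUnicodeExtension("en-u-nu-roman");
```

A form control could then look at the "ca", "tz", "co", "cu" or "nu" entry relevant to its type, which is one possible reading of the markup examples above.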
Comment 36 Ian 'Hixie' Hickson 2014-08-01 19:55:55 UTC
> I was considering the possibility that :locale() might be needed to allow
> CSS rules to target specific localizations. It would match against either
> the platform locale or page locale depending on which was enabled (pending
> such a feature in CSS).

Ah, I see. That's one option. I was thinking more that the page locale would be set by the page (e.g. using lang=""), the platform locale would be set by the user (in system settings), and the CSS side would just let you specify which of those two locales to use when customising something, plus the ability to specify dedicated formats. As in:

   span.currency { content: format(page-locale, contents, 'USD') }

...to take the element's contents, format it according to the page's locale, and put that in the rendering (assuming USD as the currency).

Similarly:

   span.currency { content: format(user-locale, contents, attr(data-currency)) }
   span.currency { content: format('%03.2f %currency', attr(data-price), attr(data-currency)) }
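
The format() function above is hypothetical CSS. As a loose analogy only, the choice between the two locales can be sketched with the existing ECMAScript Intl API in TypeScript. The `formatCurrency` helper and the `pageLocale` and `userLocale` values are assumptions for this illustration: in practice the first would come from lang="" and the second from the user's system settings.

```typescript
// Format an amount as a currency value according to the given locale.
function formatCurrency(locale: string, amount: number, currency: string): string {
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(amount);
}

const pageLocale = "en-US"; // assumed: what the page declares via lang=""
const userLocale = "en-AU"; // assumed: what the user's platform is set to

// Same value and currency, rendered per the page's or the user's conventions.
const forPage = formatCurrency(pageLocale, 100, "USD");
const forUser = formatCurrency(userLocale, 100, "USD");
```

The string produced for the user locale depends on the runtime's locale data, so it is not shown here; the point is only that the locale is an input chosen separately from the value being formatted.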

We also need a way to configure the input control bindings somehow to enable or disable localisation. I don't know exactly how this should work. I'll let the Web Components guy figure that out.


> 		span.currency:locale(en-AU)::after {
> 			content: "USD";
> 		}

Being able to select based on the user's locale is an interesting idea, but somewhat orthogonal to this issue, I think.


> Dates in the form "YYYY-MM-DD" is an international syntax and does not exist
> as a cultural format in any locale.

Yet my system platform's locale is set to use that as a date format. And I want Web pages to honour that.


> This was the main thrust for <data> (and <time>).

It was the main thrust for the original <time>, but that was dropped. <data> is just for storing machine-readable data, the browser isn't supposed to do anything with it. The new <time> is similar, but for time data. CSS hopefully one day will be able to localise both, either according to the page locale or the user locale.


> One further point is regarding the level of integration with BCP-47 locale
> extensions. Has any thought been given to the potential for manifesting each
> of the following attributes:
> 
> calendar
> time zone data 
> collation order
> currency
> number system
> 
> It seems that each of these has a place within HTML, if not the method to
> define their application.

Not sure what you mean.


> Consider potential use cases for:
> 
> <label>Enter Date/Time from the Japanese calendar</label>
> <input type="datetime" lang="en-u-ca-japanese"/>
> 
> <label>Enter Date/Time from the UTC-14 timezone</label>
> <input type="datetime" lang="en-u-tz-utce14"/>
> 
> <label>Select/Enter your phonebook names</label>
> <select lang="en-u-co-phonebk" editable multiple sorted/>
> 
> <label>Enter some $USD</label>
> <input type="currency" lang="en-u-cu-usd"/>
> 
> <label>Enter a Roman numeral</label>
> <input type="number" lang="en-u-nu-roman"/>

I don't really understand the use cases here. When would you want someone to enter a roman numeral, only for it to be converted to a decimal value, instead of just letting the author enter the value in whatever form they want that their browser accepts, for example?
Comment 37 Cameron Jones 2014-08-11 16:31:11 UTC
(In reply to Ian 'Hixie' Hickson from comment #36)
> > I was considering the possibility that :locale() might be needed to allow
> > CSS rules to target specific localizations. It would match against either
> > the platform locale or page locale depending on which was enabled (pending
> > such a feature in CSS).
> 
> Ah, I see. That's one option. I was thinking more that the page locale would
> be set by the page (e.g. using lang=""), the platform locale would be set by
> the user (in system settings), and the CSS side would just let you specify
> which of those two locales to use when customising something, plus the
> ability to specify dedicated formats. As in:
> 
>    span.currency { content: format(page-locale, contents, 'USD') }
> 
> ...to take the element's contents, format it according to the page's locale,
> and put that in the rendering (assuming USD as the currency).

Ok, i agree with the logic but for implementation i was thinking more in terms of defining CSS properties than functions. This would be more granular and allow overriding through selectors.

The two main requirements are the need to set the applicable locale and to configure the format.

A 'locale' property could be an enumerated option, with two options as "page-locale" and "user-locale" as you noted. That implies resolution to a base BCP-47 value, which leads me to think it could also provide an override point within CSS, ie:

* {
	locale: <page-locale | user-locale | BCP-47 | inherit | initial | unset> 
}

This would allow localization to be configured without specifying specific formats. For example, to configure an <input type="date"/> to use "en-US" date format:

input[type="date"] {
	locale: "en-US";
} 

Without the ability to set a locale (or variant) through CSS, we would only be able to configure format skeletons:

input[type="date"] {
	format: "mm/d/y";
}

> 
> Similarly:
> 
>    span.currency { content: format(user-locale, contents,
> attr(data-currency)) }
>    span.currency { content: format('%03.2f %currency', attr(data-price),
> attr(data-currency)) }
> 
> We also need a way to configure the input control bindings somehow to enable
> or disable localisation. I don't know exactly how this should work. I'll let
> the Web Components guy figure that out.
> 

If the locale is controlled by a property, this could just be another enumerated value: 'no-locale' or even just 'off'. This would allow the raw machine-readable data to be directly rendered as text.
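
As a rough sketch of how such a property value might resolve to a concrete locale (or to none), assuming the keywords discussed above: `resolveLocale` is a hypothetical name, and treating every non-keyword value as a BCP-47 tag is an assumption made for illustration. `Intl.getCanonicalLocales` is the standard ECMAScript call used to validate and canonicalize the tag.

```typescript
// Hypothetical resolution of a 'locale' property value.
// Returns a BCP 47 tag, or null when localization is switched off.
function resolveLocale(value: string, pageLocale: string, userLocale: string): string | null {
  switch (value) {
    case "page-locale":
      return pageLocale; // locale declared by the page, e.g. via lang=""
    case "user-locale":
      return userLocale; // locale configured by the user on the platform
    case "off":
      return null; // render the raw machine-readable value
    default:
      // Assumption: anything else is an explicit BCP 47 tag.
      // getCanonicalLocales throws a RangeError for malformed tags.
      return Intl.getCanonicalLocales(value)[0];
  }
}

console.log(resolveLocale("en-us", "fr", "ja"));
```

Keeping 'off' as a distinct result (null here) lets the caller skip formatting entirely instead of guessing a neutral locale.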


> 
> > 		span.currency:locale(en-AU)::after {
> > 			content: "USD";
> > 		}
> 
> Being able to select based on the user's locale is an interesting idea, but
> somewhat orthogonal to this issue, I think.
> 

Yes, it's just a potential corner case. 

> 
> > Dates in the form "YYYY-MM-DD" is an international syntax and does not exist
> > as a cultural format in any locale.
> 
> Yet my system platform's locale is set to use that as a date format. And I
> want Web pages to honour that.
> 

This is only relevant for client-side localization as there is no way to encode this kind of configuration in a universal format, especially as a condensed identifier like BCP-47 and the locale extensions.

It definitely highlights the usefulness of the 'user-locale' setting as the integration with the Operating System is ideal.

The syntax of a CSS 'format' configuration should probably allow for these existing conventions so that the integration can be seamless. In addition to custom format skeletons, it is also common for dates to provide format enumerations: SHORT, MEDIUM, LONG.

This implies there could be both a method for configuring which format an element should use for presentation, and a method for configuring the specific format skeletons.
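
The two levels can be sketched in script as a named style that maps to a configurable skeleton, which is then handed to a locale-aware formatter. The mapping below is an assumption for illustration (the names `DateStyleName`, `dateStyles` and `formatDate` are hypothetical); only `Intl.DateTimeFormat` and its options are standard.

```typescript
type DateStyleName = "SHORT" | "MEDIUM" | "LONG";

// Level 1: each named style is backed by a skeleton that could be reconfigured.
const dateStyles: Record<DateStyleName, Intl.DateTimeFormatOptions> = {
  SHORT: { year: "2-digit", month: "numeric", day: "numeric" },
  MEDIUM: { year: "numeric", month: "short", day: "numeric" },
  LONG: { year: "numeric", month: "long", day: "numeric" },
};

// Level 2: an element only names the style; the locale decides field order and separators.
function formatDate(date: Date, locale: string, style: DateStyleName): string {
  // UTC is pinned so the result does not depend on the machine's time zone.
  return new Intl.DateTimeFormat(locale, { ...dateStyles[style], timeZone: "UTC" }).format(date);
}

const sampleDate = new Date(Date.UTC(2015, 0, 8));
const shortUS = formatDate(sampleDate, "en-US", "SHORT");
const longUS = formatDate(sampleDate, "en-US", "LONG");
console.log(shortUS, "|", longUS);
```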

> 
> > Consider potential use cases for:
> > 
> > [snip]
> 
> I don't really understand the use cases here. When would you want someone to
> enter a roman numeral, only for it to be converted to a decimal value,
> instead of just letting the author enter the value in whatever form they
> want that their browser accepts, for example?

The use case is that of being able to configure an <input> for localized data. The specific example of roman numerals is not important - what i am drawing from is the locale extensions of BCP-47 and the data available through the CLDR. There are various numeral systems defined and the approach for supporting them is identical. Shouldn't it be possible for users of alternate numeral systems to use their own? 

An interesting point i have heard in relation to non-ascii input data is that the input methods (ie, keyboards) don't exist for these code points to be generated - however, with more people using mobile technology as a primary device and the use of virtual keyboards as standard, there seems to be no reason that people should not be capable of operating within a completely localized environment.

The calendar and timezone BCP-47 extensions seem quite self-explanatory in how they can apply to the date/time inputs. Similar to <input type="number">, these localizations would only apply to the presentation of values, so the wire value would be completely independent of them. It is also natural to assume that a user would want to see dates/times in their own calendar and timezone.
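
To make the extension keys concrete, here is a simplified sketch that pulls the "-u-" keywords (ca, tz, co, cu, nu) out of a language tag. `parseUnicodeExtension` is a hypothetical helper; it assumes a well-formed tag, ignores extension attributes, and does not handle private-use ("x-") sequences.

```typescript
// Extracts key/type pairs from the Unicode "-u-" extension of a BCP 47 tag.
// Simplified: assumes a well-formed tag; a key with no type maps to "".
function parseUnicodeExtension(tag: string): Record<string, string> {
  const subtags = tag.toLowerCase().split("-");
  const keywords: Record<string, string> = {};
  const start = subtags.indexOf("u");
  if (start === -1) return keywords; // no Unicode extension present

  let key: string | null = null;
  for (const subtag of subtags.slice(start + 1)) {
    if (subtag.length === 1) break; // another singleton ends the "-u-" extension
    if (subtag.length === 2) {
      key = subtag; // two-character subtags are keyword keys
      keywords[key] = "";
    } else if (key !== null) {
      // longer subtags belong to the current key's type
      keywords[key] = keywords[key] ? `${keywords[key]}-${subtag}` : subtag;
    }
  }
  return keywords;
}

console.log(parseUnicodeExtension("en-u-ca-japanese"));
console.log(parseUnicodeExtension("de-u-co-phonebk-nu-latn"));
```

A control could then consult `ca` and `tz` for date/time presentation, `nu` for digits, and `co` for sorting, while the submitted value stays in the wire format.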

The currency use case is quite compelling. There has been previous discussion about supporting a currency input but AFAIK the current recommendation is to use number inputs. The great thing about BCP-47 and CLDR is that all the currency codes and symbols are already defined, maintained and available. There is also a natural default mechanism where, for example, an 'en-US' locale identifier will use USD.

One aspect of currency which is different from the other data is that it is generally not regarded to be localizable. If we compare it to date/times for example, we can localize a date into a different timezone and calendar but it will still refer to the same temporal instant. The case for currency is that changing the unit changes the value, and is generally not what people want - essentially you can not 'localize' a currency value, it can only be 'translated' between currencies at moments in time.

What this means for HTML localization is that if 'user-locale' is the default applicable locale, we would need a different default for handling currency so that it is rendered using the 'page-locale'. I think the best way this could be achieved is through the default browser stylesheets which could hold all the initial values for localization:

* {
	locale: "user-locale";
}

input[type="currency"] {
	locale: "page-locale";
}
Comment 38 Ian 'Hixie' Hickson 2014-08-18 20:37:45 UTC
You're welcome to discuss the CSS property design here, but for what it's worth, the CSS folks don't really read this bug. You're better off discussing the CSS side of this over on www-style.
Comment 39 Cameron Jones 2015-01-08 15:09:04 UTC
Thread started on www-style to discuss localization in CSS:

http://lists.w3.org/Archives/Public/www-style/2015Jan/0088.html
Comment 40 Anne 2016-03-28 13:35:45 UTC
So... I'm pretty sure browsers started investigating using lang="" here directly since it turns out it's more compatible than Ian thought it would be.

Kent, has that worked out? If it has I guess we should update HTML to say that the lang="" attribute is used for this.
Comment 41 Kent Tamura 2016-03-28 23:42:26 UTC
(In reply to Anne from comment #40)
> So... I'm pretty sure browsers started investigating using lang="" here
> directly since it turns out it's more compatible than Ian thought it would
> be.
> 
> Kent, has that worked out? If it has I guess we should update HTML to say
> that the lang="" attribute is used for this.

I had a plan to implement this [1], and confirmed it wouldn't break many pages. But we haven't completed the implementation yet.

I know input[type=number] of Firefox is already lang-aware.

Updating the specification so that UAs may apply element's locale sounds good.

[1] https://groups.google.com/a/chromium.org/forum/#!msg/blink-dev/QpEoCwU0Ttg/DVHHm28IKVYJ
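
As a rough illustration of what "applying the element's locale" could mean, the sketch below walks up from a control to the nearest lang="" declaration. `LangNode` and `effectiveLocale` are hypothetical names using a minimal stand-in for DOM elements, and falling back to the user's locale when no language is declared (or when lang="" is empty, meaning unknown) is an assumption, not something the thread settles.

```typescript
// Minimal stand-in for an element; not the real DOM interface.
interface LangNode {
  lang?: string; // value of the lang="" attribute, if present
  parent?: LangNode; // parent element, if any
}

// Nearest lang="" on the element or an ancestor wins.
// An empty lang="" means "unknown", so the walk stops and the fallback is used.
function effectiveLocale(node: LangNode, userLocale: string): string {
  for (let current: LangNode | undefined = node; current; current = current.parent) {
    if (current.lang !== undefined) {
      return current.lang === "" ? userLocale : current.lang;
    }
  }
  return userLocale; // assumption: nothing declared, fall back to the user's locale
}

const rootNode: LangNode = { lang: "en" };
const formNode: LangNode = { parent: rootNode };
const dateInput: LangNode = { lang: "ja", parent: formNode };
console.log(effectiveLocale(dateInput, "de"), effectiveLocale(formNode, "de"));
```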
Comment 42 Anne 2016-03-29 11:34:16 UTC
1. https://html.spec.whatwg.org/multipage/forms.html#input-impl-notes already allows using the page's locale and even encourages it. So it seems this was already changed at some point to be mostly encouraged. We could change this to make it close to a requirement (we can probably not completely require it).

2. For <input inputmode=tel/email/url> the user's locale is recommended for the keyboard, but that makes total sense to me. I don't think we want to change that.

3. https://html.spec.whatwg.org/multipage/forms.html#input-author-notes should probably be modified to say page's locale instead or even refer to the language of the control.

4. We should modify https://html.spec.whatwg.org/multipage/forms.html#submit-button-state-(type=submit) and https://html.spec.whatwg.org/multipage/forms.html#reset-button-state-(type=reset). That would even remove a fingerprinting issue. If we created a standardized list of strings per language we could completely eliminate all fingerprinting there.

If we do 4, https://html.spec.whatwg.org/multipage/forms.html#using-the-input-element-to-define-a-command also needs to be updated.
Comment 43 Cameron Jones 2016-04-06 12:47:33 UTC
1. The current description of locale was defined through this bug:

https://www.w3.org/Bugs/Public/show_bug.cgi?id=13408

2. The problem with making a specific locale a requirement is that there is no universal default which makes sense for all controls.

3. The best change possible would be to specify which locale should be used for each control, however this was deemed to be too onerous for HTML and assumes a particular implementation. 

4. There is already navigator.language, so no additional fingerprinting signals are exposed by form localization.

The place to define localization for presentation is not HTML but instead through CSS. This allows authors to write the content of the page with locale markup using the "lang" attribute and then configure the style of the page using the user or element locale through CSS declarations.

This also would provide the mechanism for UAs to define their localizations as CSS declarations within the default stylesheet.

I broached this on www-style last year but as an esoteric subject there was little headway at the time. I do note that i18n has interest in resolving this bug and has produced the following:

https://www.w3.org/International/wiki/Locale-based_forms
Comment 44 Domenic Denicola 2019-03-29 19:21:31 UTC
W3C Bugzilla is closing down, and as such we're closing all feature request bugs against HTML as "WONTFIX", at least wontfix-in-this-bugtracker. (We have been told the Bugzilla URLs will remain active, but read-only.)

If you still think this feature is valuable, please feel free to open a new issue against https://github.com/whatwg/html/issues ; the community has gotten much more active and involved since the Bugzilla days, and you might get a more useful dialogue there.