Re: [i18n review comment] BP3 should recommend locale-neutral representation #187

Thanks, Phil, for giving this a try. I think in light of Addison's 
comments, we will need to make a more substantial change. We had 
discussed in today's call changing the sense of the BP to primarily 
suggest using locale-neutral representations and to offer metadata only 
as a fallback if that wasn't workable. The version at 
http://agreiner.github.io/dwbp/bp.html#LocaleParametersMetadata goes a 
little further in that direction, but even that doesn't go far enough. I 
think we need to write a new BP, "use locale-neutral data 
representations" and only mention the metadata approach in the 
implementation section as a fallback. There are usable pieces of text in 
the three versions of BP3 floating around, though I think this calls for 
a little new text as well, to get the angle right.
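
As a rough sketch of what a locale-neutral representation could look like in practice: the function names and the "date" field below are made up for illustration, and carrying the value as a string (to keep trailing zeros) is just one possible choice, not something the thread settled on. The currency is an ISO 4217 code and the date uses the ISO 8601 form that xsd:date also uses. Display formatting is left entirely to the consumer.

```python
import json
from datetime import date
from decimal import Decimal


def to_locale_neutral(amount: Decimal, currency: str, on: date) -> str:
    """Serialize a price without any locale-specific formatting (illustrative)."""
    record = {
        "price": {
            # Period as decimal separator, no grouping, ASCII digits.
            "value": str(amount),
            # ISO 4217 code rather than a symbol such as "€" or "$".
            "currency": currency,
        },
        # ISO 8601 calendar date (YYYY-MM-DD), the lexical form of xsd:date.
        "date": on.isoformat(),
    }
    return json.dumps(record)


def from_locale_neutral(text: str):
    """Parse the record back into typed values; display is a separate step."""
    record = json.loads(text)
    price = record["price"]
    return (
        Decimal(price["value"]),
        price["currency"],
        date.fromisoformat(record["date"]),
    )


if __name__ == "__main__":
    wire = to_locale_neutral(Decimal("2000.00"), "EUR", date(2016, 8, 19))
    print(wire)
    print(from_locale_neutral(wire))
```

A consumer in France and one in the US would both parse the same record and only then decide where the currency symbol goes and which separators to show.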

-Annette


On 8/19/16 8:37 AM, Phil Archer wrote:
> I took an action on today's call to try and address this in BP3. You 
> can see the results at
> http://philarcher1.github.io/dwbp/bp.html#LocaleParametersMetadata
>
> This uses some of Addison's text directly and highlights the value of 
> the xsd datatypes - but retains enough of the original BP for it to be 
> an amendment rather than a whole new one - I hope.
>
> This addresses most of the resolution taken today [1] but I have not 
> moved the BP to the formats section. I leave that to the editors who 
> may want to make further changes - or argue for it to be left where it 
> is, or add references from the formats section or, or, or...
>
> I've created the Pull Request https://github.com/w3c/dwbp/pull/447
>
> Phil.
>
> [1] https://www.w3.org/2016/08/19-dwbp-minutes#resolution02
>
> On 15/08/2016 17:28, Bernadette Farias Lóscio wrote:
>> Dear Ishida,
>>
>> This comment [1] is still under discussion [4] and we'd like to ask your
>> opinion about two of our proposals:
>>
>> 1. to include locale-neutral representation ideas as part of BP3 [2], or
>> 2. to include a paragraph at the introduction of Section 8.8 Data Formats
>>    [3] to discuss the relevance of having locale-neutral representations.
>>
>> We also discussed the proposal of having a new BP and we agreed that we
>> won't have a lot of time for a broader review of the new BP and to collect
>> feedback from the community.
>>
>> Thanks a lot!
>> DWBP editors
>>
>> [1] https://lists.w3.org/Archives/Public/public-dwbp-comments/2016Jul/0028.html
>> [2] http://agreiner.github.io/dwbp/bp.html#LocaleParametersMetadata
>> [3] https://www.w3.org/TR/dwbp/#dataFormats
>> [4] https://lists.w3.org/Archives/Public/public-dwbp-wg/2016Aug/0009.html
>>
>>
>> 2016-08-04 23:26 GMT+02:00 Annette Greiner <amgreiner@lbl.gov>:
>>
>>> Hi Addison,
>>>
>>> Thanks for your response, and it does make sense. I think what I am still
>>> missing is whether there is guidance we can point to as to how to represent
>>> the "locale-neutral" data so that it can most easily be made locale
>>> specific by existing tools. You mention "pre-made standards for the basic
>>> data types". Is there a recommended list we could reference?
>>>
>>> Thanks for your help!
>>> -Annette
>>>
>>>
>>> On 8/4/16 12:31 PM, Phillips, Addison wrote:
>>>
>>>> Hi Annette,
>>>>
>>>> Thanks for the note. This is a personal reply not on behalf of the WG.
>>>>
>>>> Locale neutral formats are quite common on the Web and the Internet in
>>>> general. One familiar format referenced by your document, for example, is
>>>> XML Schema. While the representations of numbers, dates, and the like in
>>>> XML Schema would be "more appropriate" for some languages/locales than
>>>> others if given as plain text, what distinguishes them is that they are all
>>>> machine readable and intended to be read by machines for later processing.
>>>> The display of values is a separate, local, concern for the data's
>>>> consumer. This necessarily means choosing specific separators (such as
>>>> decimal separators) over other, more localized values. Save for "free text"
>>>> (natural language) data, most data formats are locale neutral and these
>>>> include things like JSON-LD, XML Schema, CSV, and so forth.
>>>>
>>>> Not every possible data structure or data value is, of course, covered
>>>> fully. For example, in my day job (I work at Amazon), we have many
>>>> different common measurement units defined internally. To transmit these in
>>>> a locale-neutral manner, we need to construct our own data schemas and
>>>> identifiers. There are profoundly many ways to measure shoes, dresses, auto
>>>> parts, hats, drone propellers, and so forth. But it would be a nightmare to
>>>> have to deal with localized presentation formats on top of that.
>>>>
>>>> But there are pre-made standards for the basic data types and these are
>>>> what are needed to build almost any data structure necessary for global
>>>> interchange of data.
>>>>
>>>> Does that make sense?
>>>>
>>>> Addison
>>>>
>>>> Addison Phillips
>>>> Principal SDE, I18N Architect (Amazon)
>>>> Chair (W3C I18N WG)
>>>>
>>>> Internationalization is not a feature.
>>>> It is an architecture.
>>>>
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>>> From: Annette Greiner [mailto:amgreiner@lbl.gov]
>>>>> Sent: Thursday, August 04, 2016 12:04 PM
>>>>> To: ishida@w3.org; public-dwbp-comments@w3.org
>>>>> Cc: www International <www-international@w3.org>
>>>>> Subject: Re: [i18n review comment] BP3 should recommend locale-neutral
>>>>> representation #187
>>>>>
>>>>> Hello on behalf of the DWBP WG,
>>>>>
>>>>> We're interested in pursuing this concept in our best practice document,
>>>>> but we would like some clarification of the practice of locale neutrality.
>>>>> You mention the variation across locales in decimal symbol, grouping symbol,
>>>>> number of grouping digits, digit shapes, etc., and you give an example of a
>>>>> locale-neutral data structure for monetary values.
>>>>> But this structure alone does not appear to address differences in decimal
>>>>> symbol, grouping symbol, number of grouping digits, or digit shapes. It does
>>>>> provide a mechanism to separately specify the units, and the example uses
>>>>> an ISO-4217 currency code, both of which we agree are good ideas. Is there a
>>>>> broad standard (beyond just monetary) for addressing the other
>>>>> symbol/representation issues you raised that we can address briefly in our
>>>>> best practice? Do you consider SI units consistent with a locale-neutral
>>>>> approach? Is there a locale-neutral standard for representing decimal
>>>>> numbers (perhaps using a period and no grouping, as in your example)?
>>>>>
>>>>> -Annette
>>>>>
>>>>>
>>>>> On 7/22/16 5:32 AM, ishida@w3.org wrote:
>>>>>
>>>>>> [raised by aphillips]
>>>>>>
>>>>>> https://www.w3.org/TR/dwbp/#LocaleParametersMetadata
>>>>>>
>>>>>> Best practice #3 introduces itself as:
>>>>>>
>>>>>>> Providing locale parameters helps humans and computer applications
>>>>>>> to work accurately with things like dates, currencies and numbers that
>>>>>>> may look similar but have different meanings in different locales.
>>>>>>
>>>>>> But the actual best practice is to use **locale-neutral**
>>>>>> representations that are interpreted/displayed to end-users in a
>>>>>> locale-appropriate manner. For example, instead of storing the string
>>>>>> "€2000.00", exchanging a data structure like the following is strongly
>>>>>> preferred:
>>>>>>
>>>>>> ```
>>>>>> "price": {
>>>>>>     "value": 2000.00,
>>>>>>     "currency": "EUR"
>>>>>> }
>>>>>> ```
>>>>>>
>>>>>> The date examples given are all in xsd:date format, which is an
>>>>>> excellent example of using a locale-neutral format.
>>>>>>
>>>>>> Many things are dependent on locale: decimal symbol, grouping symbol,
>>>>>> number of grouping digits, digit shapes, etc. It's because there can
>>>>>> be wide variation (sometimes open to misinterpretation) that sending a
>>>>>> locale-neutral format is preferred for data values. Note also btw that
>>>>>> the position of the currency symbol is dependent on the locale. In
>>>>>> France it would be normal to write 2000.00 € rather than €2000.00.
>>>>>> Same even when talking about USD when using $, i.e. 2000.00 $.
>>>>>>
>>>>>>
>>>>>> -- 
>>>>> Annette Greiner
>>>>> NERSC Data and Analytics Services
>>>>> Lawrence Berkeley National Laboratory

-- 
Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory

Received on Friday, 19 August 2016 20:24:36 UTC