Re: [PROPOSAL: Turtle: EDITORIAL. Concepts: Non-ED] Re: Language Tag Case Conflict (between RDF1.1 and BCP47)

From Eric,
>Of course, this also depends on RDF Concepts changing "and must be
>normalized to lowercase." to something like ". While BCP47
>section-2.1.1 specifies the appropriate case for various sub-language
>forms, RDF treats as equal all variations in case. For example, a
>literal "中国" with a language tag of "zh-Hans" is the same term as
>that literal with a language tag of "zh-hans" or "zh-HANS"".

--Indeed, my confusion comes from "The language tag must be well-formed 
according to section 2.2.9 of [BCP47], and must be normalized to 
lowercase". If the wording can be changed to something like what Eric 
suggested, there will be no confusion.
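
For concreteness, the comparison Eric describes could be sketched in 
Python roughly as follows. This is only an illustration of the proposed 
wording, not text from any specification; `LangLiteral` and `same_term` 
are hypothetical names, not part of any RDF library.

```python
from typing import NamedTuple


class LangLiteral(NamedTuple):
    """Hypothetical stand-in for an RDF language-tagged string."""
    lexical: str
    lang: str


def same_term(a: LangLiteral, b: LangLiteral) -> bool:
    # Lexical forms must match exactly; language tags match ignoring case.
    # Language tags are ASCII-only, so str.lower() is enough for the fold.
    return a.lexical == b.lexical and a.lang.lower() == b.lang.lower()
```

Under this sketch, "中国" tagged "zh-Hans", "zh-hans" or "zh-HANS" would 
all compare as the same term.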

From Jeremy,
>Crucially, the result of comparing two language tags should not be 
>sensitive to the case of the original input.

>My view is that the answer to Hong Sun is that if it is important to him 
>for other reasons to leave the country code in upper case, then he should, 
>and he can get correct RDF 1.1 conformant operation by systematically case 
>normalizing language tags using any case normalization he so chooses, 
>including that from BCP47. RDF 1.0 is clear on that point, and to the 
>extent that RDF 1.1 does not allow that, it is a defect.

--It is not important to me whether the tag is lower case or upper 
case; I just want to make sure that when I convert data to RDF, it 
follows the right format. Based on what you stated, can I draw the 
following conclusions?
1. "text"@en-GB is equal to "text"@en-gb
2. It is better to use "text"@en-gb in RDF 1.1
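
For reference, the case convention from BCP47 section 2.1.1 (quoted 
further down in this thread) can be sketched as a small Python function. 
This is an illustrative sketch only: `bcp47_case` is a hypothetical 
helper, and it assumes the tag is already well-formed, so it does no 
validation.

```python
def bcp47_case(tag: str) -> str:
    """Apply the BCP47 section 2.1.1 case convention to a well-formed tag."""
    subtags = tag.split("-")
    out = []
    for i, sub in enumerate(subtags):
        # A singleton is a one-character subtag such as "x".
        after_singleton = i > 0 and len(subtags[i - 1]) == 1
        if i == 0 or after_singleton:
            out.append(sub.lower())
        elif len(sub) == 2:
            out.append(sub.upper())        # e.g. region: "ch" becomes "CH"
        elif len(sub) == 4:
            out.append(sub.capitalize())   # e.g. script: "hans" becomes "Hans"
        else:
            out.append(sub.lower())
    return "-".join(out)
```

If tags are compared case-insensitively, a function like this only 
changes how a tag is written, not which term it denotes.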

Thanks!

kind regards,
Hong

 



From:   Eric Prud'hommeaux <eric@w3.org>
To:     Hong Sun/AXIFX/AGFA@AGFA
Cc:     gavin@carothers.name, David Wood <david@3roundstones.com>, Ivan 
Herman <ivan@w3.org>, public-rdf-comments Comments 
<public-rdf-comments@w3.org>
Date:   03/29/2013 05:52 PM
Subject:        [PROPOSAL: Turtle: EDITORIAL. Concepts: Non-ED] Re: 
Language Tag Case Conflict (between RDF1.1 and BCP47)
Sent by:        "Eric Prud'hommeaux" <ericw3c@gmail.com>



* Hong Sun <hong.sun@agfa.com> [2013-03-29 16:50+0100]
> Thanks Gavin,
> 
> You are right, it is mainly for the RDF1.1 concept.

If we're expecting that Turtle will be a lot of people's RDF HowTo,
perhaps it's worth a bit of informative text in the bottom of 2.5.1
Quoted Literals <http://www.w3.org/TR/turtle/#turtle-literals>

[[
Note that RDF treats language tags as case-insensitive. "中国"@zh-Hans
and "中国"@zh-hans are treated as the same node. Per BCP47
section-2.1.1, the latter is not a correct representation.
]]

Of course, this also depends on RDF Concepts changing "and must be
normalized to lowercase." to something like ". While BCP47
section-2.1.1 specifies the appropriate case for various sub-language
forms, RDF treats as equal all variations in case. For example, a
literal "中国" with a language tag of "zh-Hans" is the same term as
that literal with a language tag of "zh-hans" or "zh-HANS"".

This nudges people in the right direction, but doesn't tell them what
case to expect if they SPARQL for lang("中国"@zh-Hans). I believe
most systems will give you the case used in the first variant that
they encountered, so this is the strongest statement we can make.
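
A minimal sketch of that "first variant wins" behaviour, in Python. This
is not how any particular store is implemented; `LangTagTable` and
`intern` are hypothetical names used only for illustration.

```python
class LangTagTable:
    """Hypothetical table: case-insensitive keys, first-seen spelling kept."""

    def __init__(self):
        # Maps the lowercased tag to the spelling that was seen first.
        self._first_seen = {}

    def intern(self, tag: str) -> str:
        # setdefault stores the spelling only when the folded key is new,
        # so later variants return the spelling recorded earlier.
        return self._first_seen.setdefault(tag.lower(), tag)
```

With such a table, loading "zh-Hans" and then "zh-HANS" would report
"zh-Hans" for both, while loading them in the opposite order would
report "zh-HANS".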

The I18N folks might look askance at having ill-formed language tag
examples in Concepts and Turtle, but may, if it's too late to demand
BCP47 conformance, see that as preferable to ambiguity.


> Kind regards,
> Hong
> 
> 
> 
> From:   Gavin Carothers <gavin@carothers.name>
> To:     David Wood <david@3roundstones.com>
> Cc:     Ivan Herman <ivan@w3.org>, public-rdf-comments Comments 
> <public-rdf-comments@w3.org>, Hong Sun/AXIFX/AGFA@AGFA
> Date:   03/29/2013 04:30 PM
> Subject:        Re: Language Tag Case Conflict (between RDF1.1 and BCP47)
> 
> 
> 
> 
> 
> 
> On Fri, Mar 29, 2013 at 8:13 AM, David Wood <david@3roundstones.com> 
> wrote:
> Hi Hong Sun and Ivan,
> 
> Do I understand correctly that these comments apply to Turtle?
> 
> Editor's opinion: No, concerns about the case sensitivity, or lack of case 
> sensitivity, do not affect Turtle. Turtle is very careful to avoid doing 
> anything other than refer to RDF Concepts 1.1. The grammar allows for any 
> case of language tag segments and specifically passes the tag along as a 
> "unicode string"; no attempt to normalize the language tag is made as part 
> of the Turtle specification. That's up to the rest of the RDF stack. We 
> chickened out.
> 
> The only change would be to examples in Turtle if the WG does decide 
> one way or the other.
> 
> Cheers,
> Gavin
> 
> 
> Regards,
> Dave
> --
> http://about.me/david_wood

> 
> 
> 
> On Mar 29, 2013, at 06:09, Ivan Herman <ivan@w3.org> wrote:
> 
> Hong Sun does not have the right credentials to send the mail to the WG 
> mailing list, so I forward this to the comment list for processing and 
> archiving!
> 
> Hong Sun, thank you.
> 
> Ivan
> 
> Begin forwarded message:
> 
> From: Hong Sun <hong.sun@agfa.com>
> Subject: [Moderator Action] Language Tag Case Conflict (between RDF1.1 and 
> BCP47)
> Date: March 29, 2013 10:43:18 GMT+01:00
> To: public-rdf-wg@w3.org
> 
> Dear All, 
> 
> I am working on processing text with language tags, but reading the RDF 1.1 
> specification, I found there is a conflict in choosing the case for a 
> language tag. 
> 
> In RDF1.1 
> http://www.w3.org/TR/2013/WD-rdf11-concepts-20130115/#dfn-language-tag 
> It is stated 
> """ 
> a non-empty language tag as defined by [BCP47]. The language tag must be 
> well-formed according to section 2.2.9 of [BCP47], and must be normalized 
> to lowercase. 
> """ 
> which is accompanied by the following example: 
> show:218 show:localName "Cette Série des Années Septante"@fr-be .  # literal with a region subtag 
> 
> 
> But taking a look at BCP47 
> http://tools.ietf.org/html/bcp47#section-2.2.9        , it states 
> """ 
> For example, one might use a tag such as "no-QQ", where 'QQ' 
>    is one of a range of private use ISO 3166-1 codes to indicate an 
>    otherwise undefined region. 
> """ 
> 
> An even clearer recommendation is given in this document in 
> http://tools.ietf.org/html/bcp47#section-2.1.1 
> """ 
> All subtags, including extension and private 
>    use subtags, use lowercase letters with two exceptions: two-letter 
>    and four-letter subtags that neither appear at the start of the tag 
>    nor occur after singletons.  Such two-letter subtags are all 
>    uppercase (as in the tags "en-CA-x-ca" or "sgn-BE-FR") and four- 
>    letter subtags are titlecase (as in the tag "az-Latn-x-latn"). 
> """ 
> 
> In short, it seems that: 
> according to RDF1.1, we should use de-ch; 
> BCP47 recommends de-CH; 
> and meanwhile RDF1.1 also states that the language tag must be well-formed 
> according to [BCP47]. 
> 
> So now there is a conflict. Which exactly should we use? 
> 
> 
> In addition, among the other specifications, Turtle does not care about 
> the case, while N3 now also uses lower case for subtags, e.g. de-ch. 
> 
> http://www.w3.org/2000/10/swap/grammar/n3.n3 
> """ 
> # was: "[a-zA-Z][a-zA-Z0-9]*(-[a-zA-Z0-9]+)?"; 
> langcode        cfg:matches          "[a-z]+(-[a-z0-9]+)*"; # 
> http://www.w3.org/TR/rdf-testcases/#language 
>                 cfg:canStartWith         "a". 
> """ 
> 
> Is it possible to treat the language tag as case-insensitive, as Andy 
> Seaborne suggested in 
> http://lists.w3.org/Archives/Public/public-rdf-wg/2013Feb/0275.html ?
> 
> Thanks! 
> 
> Kind Regards,
> 
> Hong Sun | Agfa HealthCare
> Researcher | HE/Advanced Clinical Applications Research
> T  +32 3444 8108
> 
> http://www.agfahealthcare.com

> http://blog.agfahealthcare.com

> 
> 
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/

> mobile: +31-641044153
> FOAF: http://www.ivan-herman.net/foaf.rdf


-- 
-ericP

Received on Tuesday, 2 April 2013 08:46:44 UTC