Proposal for resolving ISSUE-64 (was: Re: RDF-ISSUE-64 (langtag-rfc): RFC 3066 or RFC 5646 for language tags? [RDF General])

OK, I think the sanest course of action is to set the bar high, require the tag to be well-formed according to RFC 5646, and note that previous versions didn't do so.

PROPOSAL: Resolve ISSUE-64 by replacing the current text:

[[
Plain literals have a lexical form and optionally a language tag as defined by [RFC3066], normalized to lowercase.
]]

with:

[[
Plain literals have a lexical form and optionally a language tag as defined by [RFC5646]. The language tag, if present, MUST be well-formed according to section 2.2.9 of [RFC5646], and MUST be normalized to lowercase.

NOTE: Earlier versions of RDF permitted tags that adhered to the generic tag/subtag syntax of language tags, but were not well-formed according to [RFC5646]. Such language tags do not conform to RDF 1.1.
]]
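
To make the difference concrete, here is a rough Python sketch of the two checks. It is only an illustration, not proposed spec text: the function and variable names are my own, the regexes are my transcription of the RFC 3066 generic syntax and of the RFC 5646 section 2.1 ABNF, and the list of irregular grandfathered tags should be double-checked against the RFC before anyone relies on it.

```python
import re

# RFC 3066 generic syntax: a primary subtag of 1-8 letters, followed by any
# number of "-"-separated subtags of 1-8 letters or digits.
RFC3066 = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

# RFC 5646 section 2.1 ABNF, transcribed production by production
# (a hand-written sketch, not an official regex).
LANGUAGE = r"(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4,8})"
SCRIPT = r"(?:-[a-z]{4})?"
REGION = r"(?:-(?:[a-z]{2}|[0-9]{3}))?"
VARIANTS = r"(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*"
EXTENSIONS = r"(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*"
PRIVATEUSE = r"x(?:-[a-z0-9]{1,8})+"
RFC5646 = re.compile(
    rf"(?:{LANGUAGE}{SCRIPT}{REGION}{VARIANTS}{EXTENSIONS}(?:-{PRIVATEUSE})?"
    rf"|{PRIVATEUSE})",
    re.IGNORECASE,
)

# Irregular grandfathered tags do not match the "langtag" production but are
# still well-formed; stored in lowercase for case-insensitive lookup.
IRREGULAR = {
    "en-gb-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao", "i-tay",
    "i-tsu", "sgn-be-fr", "sgn-be-nl", "sgn-ch-de",
}


def is_rfc3066(tag: str) -> bool:
    """True if the tag matches the generic RFC 3066 syntax."""
    return RFC3066.fullmatch(tag) is not None


def is_wellformed_rfc5646(tag: str) -> bool:
    """True if the tag is well-formed (syntax only) per RFC 5646."""
    return tag.lower() in IRREGULAR or RFC5646.fullmatch(tag) is not None


def normalize_langtag(tag: str) -> str:
    """Reject tags that are not well-formed, then lowercase the tag."""
    if not is_wellformed_rfc5646(tag):
        raise ValueError(f"not a well-formed RFC 5646 language tag: {tag!r}")
    return tag.lower()


if __name__ == "__main__":
    for tag in ["en-GB", "kx-kx-kx", "en.gb", "i-klingon"]:
        print(tag, is_rfc3066(tag), is_wellformed_rfc5646(tag))
```

With this sketch, "kx-kx-kx" passes the RFC 3066 check but fails the RFC 5646 one, which is exactly the kind of tag the NOTE above is about. "en.gb" fails both. The check is purely syntactic: it does not look anything up in the IANA subtag registry, so it says nothing about whether a tag is valid in the RFC 5646 sense.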

Best,
Richard




On 30 May 2011, at 21:40, Pat Hayes wrote:

> 
> On May 30, 2011, at 2:03 PM, Richard Cyganiak wrote:
> 
>> On 30 May 2011, at 19:37, Pat Hayes wrote:
>>>> I have to admit that I only skimmed both specs, so I might be wrong here.
>>>> 
>>>> RFC 3066 had a quite simple generic grammar for language tags. Groups of letters/numbers separated by dashes. This can be checked with one simple regex.
>>>> 
>>>> RFC 5646 explicitly enumerates most of the “words” that are allowed in language tags. The grammar is several pages.
>>>> 
>>>> I read this as saying that "kx-kx-kx" is a valid RFC 3066 language tag, but not a valid RFC 5646 language tag. So "foo"@kx-kx-kx is a valid object in RDF 2004, but not in RDF 1.1?
>>> 
>>> Ah, I see what you are getting at. No, I don't think we should go this route. We can simply say that an RDF literal tag is any string of characters (as far as RDF is concerned, that is.)
>> 
>> Any string of characters? But this isn't a valid literal in RDF 2004:
>> "foo"@en.gb
>> 
>> because dots are not allowed by RFC 3066. Should it be valid in RDF 1.1?
>> 
>>> So "foo"@kx-kx-kx is a legal RDF literal, as indeed is "foo"@the-quality-of-mercy-is-not-strained. Nevertheless, we can refer to RFC 5646 for the meaning of the term 'language tag'. 
>> 
>> Right.
>> 
>>>> Will implementing a conformant RDF 1.1 system require an implementation of the full RFC 5646 grammar?
>>> 
>>> No. I don't think RDF parsers should be required (as a MUST) to check the legality of language tags. They MAY do so, of course, if that is found useful, and they MAY emit error messages when they find bad ones, but they aren't obliged to do these checks.
>> 
>> I *think* I agree with the general sentiment here. But if we change the set of allowable language tags from RDF 2004, then we should be well aware of that and document it.
>> 
>> FWIW, HTML5 states that the value of @lang must be valid according to BCP47 (which apparently is RFC 5646 plus some other stuff), and validator.w3.org really checks this and emits errors for non-existing languages. It emits a warning for i-klingon, saying that it's deprecated in favor of tlh…
> 
> Oh well. I really have no strong feelings about this, tell you the truth. If (as I thought you were saying) requiring conformance is an onerous burden, then we can afford to be more permissive, seems to me. But if conformance is thought desirable for interoperability, then so be it. But either way, we ought to refer to the most up-to-date specs, and not require conformance to an obsoleted spec, just on general good-standards-etiquette grounds.
> 
> Pat
> 
>> 
>> Best,
>> Richard
> 

Received on Monday, 30 May 2011 23:29:54 UTC