Re: Rethinking ISSUE-12 with lang datatypes

* Pat Hayes <phayes@ihmc.us> [2011-05-30 11:30-0500]
> 
> On May 30, 2011, at 3:51 AM, Richard Cyganiak wrote:
> 
> > On 26 May 2011, at 16:43, Antoine Zimmermann wrote:
> >>> RDF Concepts currently says that the language tag must be valid
> >>> according to RFC 5646, and lowercase. So I'd say that anything of the
> >>> form rdf:lang{langTag} where {langTag} is not lowercase or not
> >>> syntactically valid according to RFC 5646 is an ill-typed literal.
> > 
> > Actually I was wrong here -- RDF Concepts refers to RFC 3066, which has a much simpler generic syntax for language tags. RFC 5646 obsoletes RFC 3066. Oh well.
> > 
> > So I guess there's a related but separate issue here -- should language-tagged literals in RDF 1.1 still be defined in terms of the (simple) RFC 3066, or in terms of the (much more hairy) RFC 5646? I'll raise an issue for that.
> > 
> >> people in the WG said they would like to see a relationship between, e.g., "foo"@en and "foo"@en-GB (see the answers to your quiz).
> > 
> > I think it is possible to achieve consensus that trying to establish equalities between @en and @en-GB in core RDF is a bad idea. It's just too damn hard.
> 
> It's worse than merely hard, it would break RDFS and OWL class reasoning. Not only should we not go there, we should erect large barriers of RDF police tape to warn people from trying to go near it.

There are some intra-langtag relationships which practical multi-language systems will need to implement on top of whatever we write down. For instance, to pass a SPARQL test suite, you need to implement 3066's notion of langMatches. So far, RDF imposes neither validation nor normalization on injecting literals into triples in a graph. Even if we were to isolate a set of datatypes (say integers and floats) which were normalized (e.g. "01" becomes "1" for purposes of graph equivalence), we wouldn't have to similarly raise the bar for language tags.
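
To make the langMatches point concrete, here is a minimal sketch of the range-matching rule, assuming the prefix-on-subtag-boundary behaviour that RFC 3066 describes and that SPARQL spells out as RFC 4647 basic filtering. The function name lang_matches and the tuple representation of a literal are illustrative assumptions, not anything from RDF Concepts.

```python
def lang_matches(tag: str, lang_range: str) -> bool:
    """Return True if language tag `tag` matches `lang_range`.

    Sketch of basic filtering: comparison is case-insensitive, the
    range "*" matches any non-empty tag, and otherwise the range must
    equal the tag or be a prefix of it that ends at a "-" boundary.
    """
    if lang_range == "*":
        return tag != ""
    tag, lang_range = tag.lower(), lang_range.lower()
    return tag == lang_range or tag.startswith(lang_range + "-")


# Literals modelled as (lexical form, language tag) pairs; this is an
# assumption made for illustration only.
foo_en = ("foo", "en")
foo_en_gb = ("foo", "en-GB")

# Matching is a query-time relationship between tags...
print(lang_matches(foo_en_gb[1], "en"))
# ...while the two literals remain distinct terms in the graph.
print(foo_en == foo_en_gb)
```

Note that the match is directional: "en-GB" matches the range "en", but "en" does not match the range "en-GB". That asymmetry is part of why it sits more naturally in a query function than in graph equality.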

I don't recall anyone uttering requirements that { <s> <p> "foo"@en , "foo"@en-GB } be one triple, just assertions that it would be harmful. We could push SPARQL's langMatches into a lang-entailment to say that { <s> <p> [ :langMatches "foo"@en , "foo"@en-GB ] } in order to unify that functionality amongst parsers, APIs, etc, but I don't think many specs would be written to presume lang-entailment.

Given the relatively low bar of "foo"@en != "foo"@en-GB, I don't see why it's harder to presume inequality between ("foo"^^lang:en "foo"^^lang:en-GB) than between ("foo"@en "foo"@en-GB). It's already within the datatype extensibility scheme for me to write down the former. Jeremy, were there other inferences, besides equality, which motivated you to look at this? I don't want us to waste time on an impossible task, but I want to give due diligence to any model simplification. If we want to offer something attractive to the legions of JSON hackers out there, we have to present as uniform and simple a model as possible.
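
As a rough illustration of why the two forms need be no harder than each other, here is a sketch that models a literal as a plain value and rewrites a language-tagged literal into a typed one. The Literal class, the as_lang_datatype helper and the LANG_NS IRI are all hypothetical; in particular, no namespace for the lang: prefix has been proposed in this thread.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical namespace standing in for the lang: prefix.
LANG_NS = "http://example.org/lang#"


@dataclass(frozen=True)
class Literal:
    """A literal compared by simple term equality on all three fields."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def as_lang_datatype(lit: Literal) -> Literal:
    """Rewrite "foo"@en as "foo"^^lang:en; other literals pass through."""
    if lit.lang is None:
        return lit
    # Lowercasing the tag is an assumption, following the lowercase
    # requirement mentioned earlier in the thread.
    return Literal(lit.lexical, datatype=LANG_NS + lit.lang.lower())


en = Literal("foo", lang="en")
en_gb = Literal("foo", lang="en-GB")

# Inequality holds in the tagged form...
print(en != en_gb)
# ...and falls out the same way in the datatype form.
print(as_lang_datatype(en) != as_lang_datatype(en_gb))
```

Nothing here attempts any inference between the two literals; the sketch only shows that plain term inequality is preserved by the rewrite.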



> Pat
> 
> > 
> > +1 to everything else you said.
> > 
> > Richard
> > 
> 
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973   
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

-- 
-ericP

Received on Monday, 30 May 2011 19:18:02 UTC