Re: Rethinking ISSUE-12 with lang datatypes

On 25 May 2011, at 17:50, Antoine Zimmermann wrote:
> Adding a datatype for each language tag may work as follows:

Thanks for writing this up Antoine. Would be great if you could maintain this in a wiki page as well!

> For a language tag {langTag}, "xxx"@{langTag} would be interpreted as a typed literal of type rdf:lang{langTag}.

Make that rdf:string-{langTag}, so we'd end up with rdf:string-fr, rdf:string-en-gb and so on.

So, would it be accurate to say that "xxx"@en is syntactic sugar for "xxx"^^rdf:string-en ?

Would serializers be allowed to emit the "xxx"^^rdf:string-en form?
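If the answer to both questions is yes, converting between the two surface forms is purely mechanical. Below is a minimal Python sketch. It assumes the rdf:string-{langTag} naming suggested above, the standard rdf: namespace IRI, and lexical forms that need no escaping. desugar, resugar and to_turtle are hypothetical helpers, not part of any spec or library:

```python
from typing import Optional, Tuple

# Standard RDF namespace; the "string-" prefix is the naming proposed in this
# thread, not an existing vocabulary term.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LANG_DT_PREFIX = RDF_NS + "string-"


def desugar(lexical: str, lang_tag: str) -> Tuple[str, str]:
    """Rewrite "xxx"@tag as the pair (lexical form, datatype IRI)."""
    return (lexical, LANG_DT_PREFIX + lang_tag)


def resugar(lexical: str, datatype_iri: str) -> Optional[Tuple[str, str]]:
    """Recover (lexical form, tag) if the datatype is one of the lang datatypes."""
    if datatype_iri.startswith(LANG_DT_PREFIX) and len(datatype_iri) > len(LANG_DT_PREFIX):
        return (lexical, datatype_iri[len(LANG_DT_PREFIX):])
    return None  # some other datatype, e.g. xsd:integer


def to_turtle(lexical: str, lang_tag: str, sugared: bool = True) -> str:
    """Serialise either surface form (assumes no quotes or backslashes in lexical)."""
    if sugared:
        return f'"{lexical}"@{lang_tag}'
    return f'"{lexical}"^^rdf:string-{lang_tag}'


print(to_turtle("chat", "fr", sugared=False))  # prints "chat"^^rdf:string-fr
```

If serializers may emit either form, a round trip through desugar and resugar must return the original pair. A test suite would want to check that property.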

> For any language tag {langTag}, there is a datatype rdf:lang{langTag} such that:
> 
> - the lexical space is all Unicode strings.
> - the value space is the set of all pairs <string,{langTag}>
> - the lexical-to-value mapping is L2V(rdf:lang{langTag})(xxx)=<xxx,{langTag}>

I see
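Read literally, the three bullets define an ordinary datatype for each tag. Here is a Python sketch of that definition, using the quoted rdf:lang{langTag} naming. The LangDatatype class is hypothetical, and values are modelled as plain (string, tag) tuples:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LangDatatype:
    """Hypothetical model of the datatype rdf:lang{langTag} for one fixed tag."""
    lang_tag: str

    @property
    def name(self) -> str:
        return "rdf:lang" + self.lang_tag

    def in_lexical_space(self, s: object) -> bool:
        # Lexical space: every Unicode string.
        return isinstance(s, str)

    def l2v(self, s: str) -> Tuple[str, str]:
        # L2V(rdf:lang{langTag})(xxx) = <xxx, {langTag}>
        if not self.in_lexical_space(s):
            raise ValueError(f"{s!r} is not in the lexical space of {self.name}")
        return (s, self.lang_tag)

    def in_value_space(self, v: object) -> bool:
        # Value space: every pair <string, {langTag}>.
        return (isinstance(v, tuple) and len(v) == 2
                and isinstance(v[0], str) and v[1] == self.lang_tag)


en = LangDatatype("en")
print(en.name, en.l2v("colour"))
```

L2V is injective here. As a result, two literals of the same lang datatype denote the same value only when their lexical forms are identical.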

> There is an infinite number of lang datatypes and {langTag} SHOULD be restricted to what RFC 5646 defines, but implementations MAY accept any string for lang tags (e.g., "foo"@mylangtag-bar42 MAY be considered as a valid literal by parsers),

RDF Concepts currently says that the language tag must be valid according to RFC 5646, and lowercase. So I'd say that anything of the form rdf:lang{langTag} where {langTag} is not lowercase or not syntactically valid according to RFC 5646 is an ill-typed literal.
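Under that reading, deciding whether a lang-typed literal is well-typed comes down to checking its tag. Below is a Python sketch. It assumes a simplified, lowercase-only regular expression for the RFC 5646 langtag and privateuse productions. Grandfathered tags such as i-klingon are deliberately left out, so this is an approximation rather than a validator. is_valid_tag and is_ill_typed are hypothetical names:

```python
import re

# Simplified, lowercase-only approximation of RFC 5646 "langtag" / "privateuse".
# Uppercase letters are rejected on purpose (RDF Concepts requires lowercase tags).
LANGTAG = re.compile(r"""
    (?:
        (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})  # language (+ extlang)
        (?:-[a-z]{4})?                                      # script
        (?:-(?:[a-z]{2}|[0-9]{3}))?                         # region
        (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*            # variants
        (?:-[0-9a-wyz](?:-[a-z0-9]{2,8})+)*                 # extensions
        (?:-x(?:-[a-z0-9]{1,8})+)?                          # private use
      | x(?:-[a-z0-9]{1,8})+                                # private-use-only tag
    )
""", re.VERBOSE)

LANG_DT_PREFIX = "rdf:lang"  # naming used in the proposal


def is_valid_tag(tag: str) -> bool:
    return LANGTAG.fullmatch(tag) is not None


def is_ill_typed(datatype: str) -> bool:
    """A literal typed rdf:lang{tag} is ill-typed unless {tag} passes the check."""
    if not datatype.startswith(LANG_DT_PREFIX):
        raise ValueError(f"{datatype} is not a lang datatype")
    return not is_valid_tag(datatype[len(LANG_DT_PREFIX):])


print(is_valid_tag("en-gb"), is_valid_tag("en-GB"), is_valid_tag("mylangtag-bar42"))
```

With this check, a literal typed rdf:langmylangtag-bar42 would count as ill-typed rather than being rejected by the parser. The reason is that "mylangtag" exceeds the eight-letter maximum for a primary language subtag.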

> in which case, a corresponding datatype rdf:lang{langTag} MUST exist.

I don't know what that is supposed to mean.

> Additionally, we can add a datatype that is a superclass of all the lang datatypes (e.g., rdf:LangTaggedLiteral).

Make that rdf:LangTaggedString for increased clarity.

> This additional datatype has an empty lexical space but its value space is the set of all pairs <string,tag>.

This doesn't have to be a datatype. Making it a class would be easier and sufficient for using it in rdfs:range declarations.

> It follows that the following triples are valid under the appropriate entailment regime:
> 
> rdf:lang{langTag} rdf:type rdfs:Datatype;
>  rdfs:subClassOf rdf:LangTaggedLiteral .

I see
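There are infinitely many lang datatypes, so an implementation cannot materialise these triples up front. It could, however, generate them for the tags that actually occur in a graph. Here is a Python sketch with triples as plain string tuples. schema_triples is a hypothetical helper, and the names follow the quoted proposal:

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def schema_triples(tags: Iterable[str]) -> List[Triple]:
    """Emit the rdf:type / rdfs:subClassOf triples for each distinct tag in use."""
    triples: List[Triple] = []
    for tag in sorted(set(tags)):  # sorted for deterministic output
        datatype = "rdf:lang" + tag
        triples.append((datatype, "rdf:type", "rdfs:Datatype"))
        triples.append((datatype, "rdfs:subClassOf", "rdf:LangTaggedLiteral"))
    return triples


# Tags collected from the (lexical form, tag) literals of some graph.
literals = [("colour", "en"), ("couleur", "fr"), ("color", "en")]
for s, p, o in schema_triples(tag for _, tag in literals):
    print(s, p, o, ".")
```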

> rdf:LangTaggedLiteral rdf:type rdfs:Datatype;

I'd make this:
   
   rdf:LangTaggedString a rdfs:Class;

>  rdfs:subClassOf rdf:PlainLiteral .
> 
> In OWL, we have, for all pairs of distinct {langTag1} and {langTag2}:
> 
> rdf:lang{langTag1} owl:disjointWith rdf:lang{langTag2}.
> rdf:LangTaggedLiteral owl:equivalentClass [
>     rdf:type rdfs:Datatype;
>     owl:onDatatype rdf:PlainLiteral;
>     owl:withRestrictions ( [ rdf:langRange "*" ] )
> ] .
> rdf:lang{langTag} owl:equivalentClass [
>     rdf:type rdfs:Datatype;
>     owl:onDatatype rdf:PlainLiteral;
>     owl:withRestrictions ( [ rdf:langRange "{langTag}" ] )
> ] .
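The disjointness axioms follow from the value spaces. Every value of rdf:lang{langTag} carries {langTag} as its second component, so two distinct tags can never share a value. Below is a Python sketch that generates the owl:disjointWith triples for the tags in use. The helper is hypothetical and uses the quoted naming:

```python
from itertools import combinations
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def disjointness_triples(tags: Iterable[str]) -> List[Triple]:
    """One owl:disjointWith triple per unordered pair of distinct tags."""
    return [
        ("rdf:lang" + a, "owl:disjointWith", "rdf:lang" + b)
        for a, b in combinations(sorted(set(tags)), 2)
    ]


axioms = disjointness_triples(["en", "fr", "de"])
print(len(axioms))  # 3 pairs for 3 tags: n * (n - 1) / 2
```

The number of pairwise axioms grows quadratically with the number of tags. OWL 2's owl:AllDisjointClasses could express the same constraint as a single axiom over the tags in use.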
> 
> DRAWBACKS:
> - an infinite number of datatypes (but we already have an infinite number of RDF properties anyway);
> - OWL 2 does not cover these new types, so the OWL 2 RDF-based semantics is incomplete with respect to RDF 1.1 semantics;
> - there is no relationship between "sublanguages" like "en" VS "en-GB".

This point is no different from current RDF, nor from any other proposal considered so far, so it's not a drawback.

> - others?
> 
> ADVANTAGES:
> - compared to rdf:PlainLiteral, we distinguish langTagged and non-langTagged literals; and the lexical form is more natural;
> - one can define language-specific range restrictions (e.g., ex:englishLabel rdfs:range rdf:langen.) in RDF without the need for OWL 2 datatype machinery;
> - compared to RDF alone, we have everything typed, which can be seen as a simplification.
> - others?
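Here is a concrete illustration of the range-restriction advantage. The value spaces of distinct lang datatypes are disjoint, so a checker that recognises the lang datatypes could flag a property value whose tag does not match the declared range. Below is a Python sketch in which literals are (lexical form, tag) pairs and range declarations are a dict. range_violations and the ex: names are hypothetical:

```python
from typing import Dict, List, Tuple

Literal = Tuple[str, str]           # (lexical form, language tag)
Triple = Tuple[str, str, Literal]   # (subject, property, literal object)


def range_violations(triples: List[Triple], ranges: Dict[str, str]) -> List[Triple]:
    """Return triples whose literal lies outside the property's declared range."""
    bad: List[Triple] = []
    for subj, prop, (lexical, tag) in triples:
        expected = ranges.get(prop)
        # No declared range, or the superclass that admits every tag: nothing to check.
        if expected is None or expected == "rdf:LangTaggedLiteral":
            continue
        if expected != "rdf:lang" + tag:
            bad.append((subj, prop, (lexical, tag)))
    return bad


ranges = {"ex:englishLabel": "rdf:langen"}
data = [
    ("ex:cat", "ex:englishLabel", ("cat", "en")),
    ("ex:cat", "ex:englishLabel", ("chat", "fr")),
]
print(range_violations(data, ranges))  # [('ex:cat', 'ex:englishLabel', ('chat', 'fr'))]
```

In plain RDFS, a range declaration only licenses typing the value as rdf:langen. The clash with rdf:langfr surfaces as an inconsistency only for a reasoner that recognises the lang datatypes.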
> 
> 
> Regards,
> -- 
> Antoine Zimmermann
> Researcher at:
> Laboratoire d'InfoRmatique en Image et Systèmes d'information
> Database Group
> 7 Avenue Jean Capelle
> 69621 Villeurbanne Cedex
> France
> Tel: +33(0)4 72 43 61 74 - Fax: +33(0)4 72 43 87 13
> Lecturer at:
> Institut National des Sciences Appliquées de Lyon
> 20 Avenue Albert Einstein
> 69621 Villeurbanne Cedex
> France
> antoine.zimmermann@insa-lyon.fr
> http://zimmer.aprilfoolsreview.com/
> 
> 

Received on Thursday, 26 May 2011 14:28:19 UTC