Re: Rethinking ISSUE-12 with lang datatypes from Andy Seaborne on 2011-05-27 (public-rdf-wg@w3.org from May 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 27 May 2011 22:08:12 +0100
To: public-rdf-wg@w3.org
Message-ID: <4DE012BC.4020808@epimorphics.com>
On 27/05/11 18:32, Pat Hayes wrote:
>
> On May 27, 2011, at 4:49 AM, Ivan Herman wrote:
>
>>
>> On May 27, 2011, at 11:23 , Andy Seaborne wrote:
>>
>>>
>>>
>>> On 25/05/11 17:50, Antoine Zimmermann wrote:
>>>> All,
>>>>
>>>>
>>>> [disclaimer: I am not vehemently in favour of that proposal, just expressing my thoughts aloud.]
>>>
>>> In the same spirit: just thinking aloud.
>>
>> Ditto
>>
>>>
>>> One of the limitations of datatypes is that the lexical space is 1D: the set of sequences of characters.  If we generalise datatypes for RDF to a "representation space" which can be multi-dimensional, we can formulate and relate language tagged datatypes quite simply.
>>>
>>> Restricting the representation space to 1D space of strings, we get back to lexical space and compatibility with XSD etc.
>>>
>>> rdf:String is a datatype where the rep space is
>>>    (unicode strings) union (unicode strings, validLangTags)
>>> The value space is <string> union <string, validLangTags>
>>>
>>> rdf:LangTaggedString is a derived datatype of rdf:String, restricting the representation space to (unicode strings, validLangTags).
>>>
>>> rdf:lang{langTag} is a derived datatype of rdf:LangTaggedString, restricting the representation space to (unicode strings, {langTag})
>>
>> But, I believe, the alternative idea was slightly different. If we remove rdf:LangTaggedString from the equation altogether, and we keep only the rdf:lang-{langtag} as a series of datatypes, then the representation space is simply unicode strings plus a specific datatype. Ie, just like we have
>>
>> "1"^^xsd:integer
>> "1"^^xsd:double
>>
>> that are (afaik) disjoint and so different, we would have
>>
>> "a"^^rdf:lang-en
>> "a"^^xsd:string
>>
>> different.
>
> And similarly
>
> "a"^^rdf:lang-en
> "a"^^rdf:lang-en-uk
>
> Right?
>
>>
>> "a" is a shortcut for "a"^^xsd:string
>> "a"@en is a shortcut for "a"^^rdf:lang-en
>>
>> there is a question whether we would define rdf:lang-en as a subtype (subclass) of xsd:string; and it seems to be safer not to do that.
>
> It would be definitely wrong to do that. But we could have that rdf:lang-xx are all subclasses of rdf:LangTagString, that would be harmless (and might be useful.) Just don't call it a datatype.
>
>>
>> SPARQL str()
>>
>> returns the unicode string and drops the datatype for all combinations.
>
> Hmm. Does that work for other datatypes? Does str() extract the string "123" from "123"^^xsd:integer ? If not, why not? That is, why is this case different from "abc"^^rdf:lang-en ?  After all, xsd:integer and rdf:lang-en are both just datatypes.

Yes, str() does.
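
As a minimal sketch of that behaviour (the Literal class and sparql_str helper are hypothetical names for illustration, not any real library's API, and rdf:lang-en is the datatype proposed in this thread, not an existing one):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    lexical: str   # the lexical form, e.g. "123" or "abc"
    datatype: str  # datatype as a prefixed name, e.g. "xsd:integer"


def sparql_str(lit: Literal) -> str:
    """Mimic SPARQL str() on a literal: keep the lexical form, drop the datatype."""
    return lit.lexical


# The same rule applies to every datatype, including the proposed rdf:lang-en.
print(sparql_str(Literal("123", "xsd:integer")))
print(sparql_str(Literal("abc", "rdf:lang-en")))
```

The point is only that str() does not need to treat the two literals differently.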

>
> Pat
>
> PS. This tag-as-datatype idea does work, but it raises hairs on the
> back of my neck, and I have been struggling to say why. It just seems
> wrong to say that a language tag is a DATAtype. And it seems like overkill.
>
> The key issue with lang tagged literals is that they are the only
> literal form in RDF that has two strings (as well as an implicit type).
> All of the complications that we get embroiled in at this point are ways
> of trying to get these two strings back into being one. rdf:PlainLiteral
> smooshed them together into one string. Now we are proposing to bury one
> of them inside a URI to get rid of it. I would vastly prefer that we
> simply accepted that some literals have more than one string, and adapt
> our notion of literal typing to accommodate to that fact, rather than
> trying to disguise it or pretend it's not true, and so become obliged to
> swallow some clearly artificial notion (such as a language tag being a
> kind of datatype) just to preserve what is in any case a purely
> arbitrary model of literal typing.
>
> Peter has expressed a worry that changing this will interfere with the
> heart, or maybe the foundations, of RDF, but this worry is really
> nothing more than a vague rumbling sound. Suppose we had said originally
> that the L2V mapping applied to the lexical form of the literal, rather
> than to a string embedded in this lexical form. Nothing would have been
> significantly different in the RDF specs: with a slightly adapted L2V
> mapping, no entailments would have been altered, and no algorithms need
> to have been changed. But this pseudo-problem, and all the twisting and
> turning we and others have gone through and are still going through,
> would simply not have arisen. We can still do this, and it really would
> be more like having a haircut than like major abdominal surgery.
>
> A question for the rdf:lang-en proposal. In order to determine the
> language tag of a lang-tagged literal, it is necessary to parse the
> inside of a URI. Is this likely to be a problem? It feels like a problem
> to me.

The value space is (string, langtag) so the langtag is available from 
the value.

"Representation space" pushes that back to the syntax form in a very 
general way - we could just take the idea and only apply it to "foo"@en 
forms.  That's the modified L2V you mention.
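
A rough sketch of what I mean, assuming a representation is modelled as a (string, langtag-or-None) pair. The names Rep, l2v and lang are made up for illustration, and lower-casing the tag is my assumption about how the value would be normalised:

```python
from typing import Optional, Tuple

# A point in "representation space": (string,) collapses to (string, None).
Rep = Tuple[str, Optional[str]]


def l2v(rep: Rep) -> Rep:
    """Modified L2V sketch: map the whole representation to a value."""
    string, tag = rep
    # Assumption: language tags are compared case-insensitively,
    # so the value carries a lower-cased tag.
    return (string, tag.lower() if tag is not None else None)


def lang(value: Rep) -> str:
    """Read the language tag off the value; no URI parsing is involved."""
    _, tag = value
    return tag or ""


value = l2v(("foo", "EN"))   # "foo"@EN
print(lang(value))           # the tag comes from the value itself
print(lang(l2v(("foo", None))))  # plain string: empty tag
```

With the tag held in the value, nothing needs to look inside the datatype URI.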

	Andy
Received on Friday, 27 May 2011 21:08:44 UTC