Re: Proposal for ISSUE-12, string literals from Pat Hayes on 2011-05-13 (public-rdf-wg@w3.org from May 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 13 May 2011 15:49:51 -0500
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Alex Hall <alexhall@revelytix.com>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <09730824-04D0-4A49-B486-6492B6AA70F2@ihmc.us>
On May 13, 2011, at 10:00 AM, Richard Cyganiak wrote:

> On 13 May 2011, at 15:33, Alex Hall wrote:
>> It's for this reason that I'd prefer to keep rdf:PlainLiteral out of the core RDF specs and reserve it for exchanging language-tagged literals with systems that don't support that notion.  Having to deal with the extraneous '@' for literals without language tags seems like needless complexity for what should be a simple string manipulation.
> 
> Strong +1. Earlier I tried to work out the changes to the spec that would be required to make rdf:PlainLiteral the unified representation of strings, and it's a bloody mess and I really don't want to go there.

I agree, but if we have to (a) include lang tags and (b) fit within the current RDF description of a datatype (which mentions a mapping from a string to a value, not from a pair to a value) then this is about the best that can be done, I think. (I was part of the debates that led to this design, and tried very hard myself to get rid of the trailing @ at the time, but couldn't find a way to do it.) Actually, I don't think it is ALL that much of a mess: one trailing @ character isn't a bone-breaker, surely, to anyone who has to take a URI apart every now and again. But it would be neater without it, for sure.

HOWEVER... I think we do have another way out. Unlike the designers of rdf:PlainLiteral, who were obliged to work within the constraints of the current RDF design, we can re-design RDF. See below.

> I kept my notes on the wiki anyways:
> http://www.w3.org/2011/rdf-wg/wiki/StringLiterals/SyntacticSugarProposal
> 
>> If we're going to say that everything has a datatype, I'd prefer to see "foo" get normalized to "foo"^^xsd:string.  But my reasons there are more aesthetic; it just seems wrong to single out that one particular primitive datatype and say that it should not be used.
>> 
> 
>> FWIW, my preferred approach would be to:
>> 1. Say that every literal has *either* a datatype *or* a language tag.
>> 2. Say that the datatype of the surface form "foo" is xsd:string.
> 
> This feels weird. Ok, "foo" is of type string, even though the type is implicit, I can understand that. But why is it no longer a string if I tag it as English? Shouldn't it still have an implicit type of string?

The string itself is still a string, but the literal is not just that string, it's that string plus a tag, i.e. a pair. Which is why it – the literal rather than the string – can't be typed with xsd:string. Sigh.

But try this for size. Plain literals are a very special case, unique to RDF, and it is the language tag which makes them so special and strange. Datatypes are defined currently as mappings from a string to a value (so rdf:PlainLiteral had to smush the tag into the string, hence all the @ business). But we can define a special datatype which maps pairs into values, just for this purpose. We can even call it rdf:PlainLiteral without contradicting the current specs.

It applies to two kinds of lexical forms: strings (these will be the ones with the @ in them), and pairs of a string with a lang tag. The lang tag may be the empty tag, but still we distinguish between S and <S, empty>. Thus, every plain literal is assumed to have a lang tag in it, even when there is no @ in the syntax.

Its value space is the set of strings containing at least one '@' character, and pairs of a string and a language tag. The mapping follows the current rdf:PlainLiteral spec when applied to strings, so that "foo@en"^^rdf:PlainLiteral maps to <"foo", "en">; but in addition, it applies to current plain literal syntax, treated as being a pair of a string and a lang tag, so that "foo"@en also maps to <"foo", "en">. Here is the complete mapping as a table:

Lexical form      Value
"foo@"            "foo"
"foo@tag"         <"foo", tag>
"foo", empty      "foo"
"foo", tag        <"foo", tag>   when tag =/= empty

and the plain literal syntax is understood thus:  "foo"   parses to   "foo", empty   and  "foo"@tag   parses to   "foo", tag .
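
For illustration only, the table and the parsing rule above could be sketched in Python roughly as follows. The function names parse_plain_literal and value_of are hypothetical, as is the choice to model the empty tag as the empty string and a pair as a Python tuple; none of this comes from any spec. Splitting the string form at its last '@' is an assumption that relies on language tags never containing '@'.

```python
from typing import Optional, Tuple, Union

# Assumption: a pair <string, tag> is modelled as a 2-tuple, and the
# "empty" tag is modelled as the empty string "".
Pair = Tuple[str, str]
Lexical = Union[str, Pair]
Value = Union[str, Pair]


def parse_plain_literal(text: str, tag: Optional[str] = None) -> Pair:
    """Surface syntax: "foo" parses to ("foo", empty), "foo"@tag to ("foo", tag)."""
    return (text, tag if tag is not None else "")


def value_of(lexical: Lexical) -> Value:
    """Map a lexical form (string with '@', or string/tag pair) to its value."""
    if isinstance(lexical, tuple):
        text, tag = lexical
    else:
        if "@" not in lexical:
            # String lexical forms must contain at least one '@'.
            raise ValueError("string lexical form needs an '@'")
        # Split at the last '@'; everything after it is the tag.
        text, _, tag = lexical.rpartition("@")
    # An empty tag yields the bare string; otherwise the value is a pair.
    return text if tag == "" else (text, tag)


if __name__ == "__main__":
    print(value_of("foo@"))                            # string form, empty tag
    print(value_of("foo@en"))                          # string form with tag
    print(value_of(parse_plain_literal("foo")))        # plain literal "foo"
    print(value_of(parse_plain_literal("foo", "en")))  # plain literal "foo"@en
```

Under these assumptions the string form "foo@en" and the plain literal "foo"@en arrive at the same value, and likewise "foo@" and "foo", which is the point of the table.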

The reason for this empty-tag shuffle is to keep a plain literal string distinguished from the rdf:PlainLiteral string with the trailing @ added, of course. If we could ignore the current rdf:PlainLiteral specs, this would be easier and we could simply map "foo" to itself and "foo"@en to <"foo", en>. But I think the shuffling is worth doing to avoid having even more inter-spec contradictions in this area.

Advantages: gives a type to plain literals; preserves the rdf:PlainLiteral specs (extending them, but not contradicting them); allows people to use plain literals without getting involved with the trailing @; and allows xsd:string to be deprecated in favor of plain literal syntax (or the reverse, of course).

Disadvantages: might be thought too complicated; takes the notion of type slightly outside the current RDF datatype specs.  

Thoughts?

Pat


> So you have replaced one weird thing (multiple ways of representing a string) with another weird thing (a notion of string datatypes that doesn't make sense).
> 
> I think the sensible way would be:
> 1) every literal has *both* a datatype and a (possibly empty) language tag;

EVERY literal? What about numbers and dates and times and ... ? 

> 2) of the built-in datatypes, only xsd:string can have non-empty language tags;
> 3) plain literals and rdf:PlainLiterals don't exist;
> 4) "foo" in concrete syntaxes is syntactic sugar for "foo"^^xsd:string.
> 5) "foo"@en in concrete syntaxes is syntactic sugar for "foo"^^xsd:string@en.
> 
> This *might* work better than the rdf:PlainLiteral mess when translated into spec changes, but raises BC issues, and requires changes to syntax specs to add the syntactic sugar, so I prefer the proposal that says implementations MAY unify to plain literals, as it doesn't require changes to the abstract syntax.
> 
>> As long as the surface forms "foo" and "foo"^^xsd:string get normalized to the same thing (or systems have permission to do such normalization) then I'm happy.
> 
> Good to hear that.
> 
> Best,
> Richard

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 13 May 2011 20:50:29 UTC