StringLiterals/LanguageTaggedStringDatatypeProposal

From RDF Working Group Wiki
Revision as of 09:27, 25 May 2011 by Rcygania2

This is a proposal for addressing the following time-permitting item from the charter:

Reconcile various forms of string literals: at the moment we have plain literals, rdf:plainLiteral, and xsd:string literals. They are very very close to one another but they are officially different. In practice this means that, eg, SPARQL queries have to have a three branch UNION to handle all of these. Worth looking at some sort of a reconciliation of these.

This is ISSUE-12.

Short summary

  • Abolish plain literals
  • Use xsd:string instead of untagged ones
  • Use a new “special datatype” rdf:LanguageTaggedString for tagged ones
  • The lexical forms of rdf:LanguageTaggedString are not strings, as for normal datatypes, but 〈string,langtag〉 pairs
  • "foo" and "foo"@en, and the corresponding forms in other concrete syntaxes, are syntactic sugar for the above and are the preferred forms
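A minimal Python sketch of the desugaring summarized above, assuming a simple in-memory representation of abstract-syntax literals. The `Literal` class and `desugar` function are hypothetical and not taken from any RDF library, and the rdf:LanguageTaggedString IRI is built from the provisional name used in this proposal.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
# Hypothetical IRI built from the provisional name used in this proposal.
RDF_LANG_TAGGED_STRING = (
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#LanguageTaggedString"
)


@dataclass(frozen=True)
class Literal:
    """Abstract-syntax literal under the proposal: every literal has a datatype.

    `lexical` is a str for ordinary datatypes, or a (string, langtag)
    pair when the datatype is rdf:LanguageTaggedString.
    """
    lexical: Union[str, Tuple[str, str]]
    datatype: str


def desugar(text: str, lang: Optional[str] = None,
            datatype: Optional[str] = None) -> Literal:
    """Map a concrete-syntax literal to its abstract-syntax form.

    "foo"                -> "foo"^^xsd:string
    "foo"^^xsd:string    -> "foo"^^xsd:string  (same abstract literal)
    "foo"@en             -> <"foo", "en">^^rdf:LanguageTaggedString
    "lexical"^^datatype  -> unchanged
    """
    if lang is not None:
        if datatype is not None:
            raise ValueError("a literal cannot have both a language tag and a datatype")
        return Literal((text, lang), RDF_LANG_TAGGED_STRING)
    if datatype == RDF_LANG_TAGGED_STRING:
        # rdf:LanguageTaggedString has no string lexical form, so a
        # "..."^^rdf:LanguageTaggedString literal cannot be written.
        raise ValueError('use the "foo"@en form for language-tagged strings')
    return Literal(text, datatype if datatype is not None else XSD_STRING)
```

With this model, `desugar("foo") == desugar("foo", datatype=XSD_STRING)` holds, so an untagged string has a single abstract form, which is the reconciliation the charter item asks for.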

Details

1. Untagged plain literals are removed from the abstract syntax; a typed literal with datatype xsd:string is used instead.

2. In concrete syntaxes, the "foo" form SHOULD be used instead of "foo"^^xsd:string. (“SHOULD” rather than “MUST”, for backward compatibility.)

3. Tagged plain literals are removed from the abstract syntax as well.

4. Instead, a new “special datatype” is introduced for tagged string literals only.

5. Let's call it rdf:LanguageTaggedString for now. A shorter name should be found.

6. Unlike normal datatypes, whose lexical space is a set of strings (lexical forms), the lexical space of rdf:LanguageTaggedString is the set of 〈string,langtag〉 pairs. Its value space is the same set of 〈string,langtag〉 pairs, and its L2V mapping is the identity mapping.

7. In concrete syntaxes, the "foo"@en form MUST be used for literals of type rdf:LanguageTaggedString.

8. rdf:PlainLiteral remains as it is: it is not to be used in any syntax, concrete or abstract.
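Items 6 and 7 can be sketched in Python as follows, assuming lexical forms are modeled as Python tuples. The function names are hypothetical, and the language-tag check is a deliberately simplified assumption rather than the full BCP 47 grammar.

```python
import re
from typing import Tuple

# Simplified language-tag syntax (alphanumeric subtags joined by hyphens).
# This is an assumption for illustration, not the full BCP 47 grammar.
_LANGTAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def in_lexical_space(form: object) -> bool:
    """True if `form` is in the lexical space of rdf:LanguageTaggedString.

    The lexical space consists of <string, langtag> pairs; plain
    strings such as "foo@en" are deliberately not members.
    """
    return (
        isinstance(form, tuple)
        and len(form) == 2
        and isinstance(form[0], str)
        and isinstance(form[1], str)
        and _LANGTAG.fullmatch(form[1]) is not None
    )


def lexical_to_value(form: Tuple[str, str]) -> Tuple[str, str]:
    """L2V mapping of rdf:LanguageTaggedString: the identity on pairs."""
    if not in_lexical_space(form):
        raise ValueError(f"not in the lexical space: {form!r}")
    return form
```

Because a plain string like "foo@en" is rejected by `in_lexical_space`, there is no string that could appear in a "..."^^rdf:LanguageTaggedString literal, which is why item 7 leaves "foo"@en as the only concrete form.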

Some corollaries

9. It's ok to use rdf:LanguageTaggedString and rdf:PlainLiteral in rdfs:range statements. This should probably be documented somewhere, at least in the RDFS spec.

10. In SPARQL, datatype("foo") is now xsd:string, without the need for an exception in the spec.

11. In SPARQL, datatype("foo"@en) is now rdf:LanguageTaggedString (with a note that legacy implementations might return an error).

12. The value space of rdf:PlainLiteral is the union of the value spaces of xsd:string and rdf:LanguageTaggedString.
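Corollaries 9 to 12 can be illustrated with a small Python sketch, assuming the same kind of in-memory literal model as elsewhere on this page. All function names are hypothetical; the rdf:LanguageTaggedString IRI uses the provisional name, and the rdfs:range check is a rough datatype-IRI comparison, not an RDFS reasoner.

```python
from dataclasses import dataclass
from typing import Tuple, Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_PLAIN_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral"
# Hypothetical IRI built from the provisional name used in this proposal.
RDF_LANG_TAGGED_STRING = (
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#LanguageTaggedString"
)


@dataclass(frozen=True)
class Literal:
    lexical: Union[str, Tuple[str, str]]
    datatype: str


def sparql_datatype(lit: Literal) -> str:
    """datatype() in SPARQL: every literal now has a datatype, so no special case."""
    return lit.datatype


def in_plain_literal_value_space(value: object) -> bool:
    """rdf:PlainLiteral value space: xsd:string values union rdf:LanguageTaggedString values."""
    if isinstance(value, str):
        return True  # a value of xsd:string
    # A value of rdf:LanguageTaggedString: a (string, langtag) pair
    # (language-tag syntax is not checked here).
    return (
        isinstance(value, tuple)
        and len(value) == 2
        and all(isinstance(part, str) for part in value)
    )


def satisfies_range(lit: Literal, range_iri: str) -> bool:
    """Rough rdfs:range check by datatype IRI.

    Only the fact that rdf:PlainLiteral covers both xsd:string and
    rdf:LanguageTaggedString is modeled; other datatype hierarchies are ignored.
    """
    if range_iri == RDF_PLAIN_LITERAL:
        return lit.datatype in (XSD_STRING, RDF_LANG_TAGGED_STRING)
    return lit.datatype == range_iri
```

In this sketch, `satisfies_range(lit, RDF_PLAIN_LITERAL)` accepts both "foo" and "foo"@en, mirroring how an rdfs:range of rdf:PlainLiteral would cover both kinds of string.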

Comparison of current RDF and proposal

Literals in current RDF
| Kind of literal | Concrete syntax form | Allowed in concrete syntax? | Abstract syntax form | Allowed in abstract syntax? | Value |
|---|---|---|---|---|---|
| Strings without language tag | "foo" | | "foo" | | Unicode string |
| | "foo"^^xsd:string | | "foo"^^xsd:string | | Unicode string |
| | "foo@"^^rdf:PlainLiteral | MUST NOT | "foo@"^^rdf:PlainLiteral | MUST NOT | Unicode string |
| Strings with language tag | "foo"@en | | "foo"@en | | 〈Unicode string, language tag〉 |
| | "foo@en"^^rdf:PlainLiteral | MUST NOT | "foo@en"^^rdf:PlainLiteral | MUST NOT | 〈Unicode string, language tag〉 |
| Integer numbers | 1 | | "1"^^xsd:integer | | Number |
| | "1"^^xsd:integer | | "1"^^xsd:integer | | Number |
| Decimal numbers | 1.0 | | "1.0"^^xsd:decimal | | Number |
| | "1.0"^^xsd:decimal | | "1.0"^^xsd:decimal | | Number |
| Booleans | true | | "true"^^xsd:boolean | | Boolean value |
| | "true"^^xsd:boolean | | "true"^^xsd:boolean | | Boolean value |
| Other literals | "lexical"^^datatype | | "lexical"^^datatype | | Depends on the L2V mapping of the datatype |

In the table below, only the rows for strings (with and without language tag) differ from current RDF.

Literals in the new proposal
| Kind of literal | Concrete syntax form | Allowed in concrete syntax? | Abstract syntax form | Allowed in abstract syntax? | Value |
|---|---|---|---|---|---|
| Strings without language tag | "foo" | | "foo"^^xsd:string | | Unicode string |
| | "foo"^^xsd:string | SHOULD NOT | "foo"^^xsd:string | | Unicode string |
| | "foo@"^^rdf:PlainLiteral | MUST NOT | "foo@"^^rdf:PlainLiteral | MUST NOT | Unicode string |
| Strings with language tag | "foo"@en | | 〈"foo",@en〉^^rdf:LanguageTaggedString | | 〈Unicode string, language tag〉 |
| | "???"^^rdf:LanguageTaggedString | Impossible, no lexical form defined | (none) | | |
| | "foo@en"^^rdf:PlainLiteral | MUST NOT | "foo@en"^^rdf:PlainLiteral | MUST NOT | 〈Unicode string, language tag〉 |
| Integer numbers | 1 | | "1"^^xsd:integer | | Number |
| | "1"^^xsd:integer | | "1"^^xsd:integer | | Number |
| Decimal numbers | 1.0 | | "1.0"^^xsd:decimal | | Number |
| | "1.0"^^xsd:decimal | | "1.0"^^xsd:decimal | | Number |
| Booleans | true | | "true"^^xsd:boolean | | Boolean value |
| | "true"^^xsd:boolean | | "true"^^xsd:boolean | | Boolean value |
| Other literals | "lexical"^^datatype | | "lexical"^^datatype | | Depends on the L2V mapping of the datatype |
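The string rows are where the two tables differ. As a rough illustration, the sketch below (hypothetical function names; datatypes written as prefixed names for brevity) maps the two permitted concrete forms of an untagged string to abstract-syntax keys under each regime and counts the distinct results. The rdf:PlainLiteral forms are left out because they MUST NOT appear in either syntax.

```python
from typing import Optional, Tuple

XSD_STRING = "xsd:string"
# Provisional datatype name from this proposal (hypothetical).
LANG_TAGGED_STRING = "rdf:LanguageTaggedString"

# A concrete literal as written: (text, language tag or None, datatype or None).
Concrete = Tuple[str, Optional[str], Optional[str]]


def abstract_current(c: Concrete) -> tuple:
    """Current RDF: plain literals and typed literals are different kinds of node."""
    text, lang, datatype = c
    if datatype is None:
        return ("plain", text, lang)
    return ("typed", text, datatype)


def abstract_proposal(c: Concrete) -> tuple:
    """Proposal: every literal is typed; tagged strings become pairs."""
    text, lang, datatype = c
    if lang is not None:
        return ("typed", (text, lang), LANG_TAGGED_STRING)
    return ("typed", text, datatype if datatype is not None else XSD_STRING)


# The two permitted concrete forms of an untagged string.
untagged_forms = [("foo", None, None), ("foo", None, XSD_STRING)]
print(len({abstract_current(c) for c in untagged_forms}))   # 2 distinct abstract literals
print(len({abstract_proposal(c) for c in untagged_forms}))  # 1 abstract literal
```

Under current RDF a query looking for the string "foo" has to match two distinct abstract literals; under the proposal it matches one.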

Discussion etc

  • Naming proposals: rdf:LanguageTaggedString, rdf:Text, …

There should be some language to the effect that "foo" is preferred, simply for ergonomic reasons. I phrased this as a SHOULD in the proposal. Weaker language might be sufficient in the general case. Or maybe expressing this preference is altogether unnecessary.

Some syntaxes, mostly N-Triples and SPARQL Results XML/JSON, have use cases that are hampered by the variability introduced by syntactic sugar. I think these syntaxes should make a stronger statement in their respective syntax specs, perhaps forbidding one of the two forms when serializing. Which one doesn't really matter.
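One way such a syntax could remove the variability is a serializer that emits exactly one concrete form per abstract literal. The Python sketch below is hypothetical: it assumes an in-memory `Literal` model, uses the provisional rdf:LanguageTaggedString name, picks the sugared "foo" form for xsd:string (an arbitrary choice, since either form would do), and escapes only backslash, double quote and line breaks rather than implementing the full N-Triples grammar.

```python
from dataclasses import dataclass
from typing import Tuple, Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
# Hypothetical IRI built from the provisional name used in this proposal.
RDF_LANG_TAGGED_STRING = (
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#LanguageTaggedString"
)


@dataclass(frozen=True)
class Literal:
    lexical: Union[str, Tuple[str, str]]
    datatype: str


def _quote(text: str) -> str:
    # Minimal N-Triples-style escaping: backslash first, then quote and line breaks.
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def serialize_canonical(lit: Literal) -> str:
    """Emit exactly one concrete form for each abstract literal."""
    if lit.datatype == RDF_LANG_TAGGED_STRING:
        text, lang = lit.lexical
        return f"{_quote(text)}@{lang}"  # the only possible form (item 7)
    if lit.datatype == XSD_STRING:
        return _quote(lit.lexical)  # never "..."^^xsd:string
    return f"{_quote(lit.lexical)}^^<{lit.datatype}>"
```

Since "foo" and "foo"^^xsd:string denote the same abstract literal under the proposal, this serializer writes both as "foo", so two serializations of the same graph can be compared textually.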