StringLiterals/AbolishUntaggedPlain

From RDF Working Group Wiki

This is a proposal for addressing the following time-permitting item from the charter:

Reconcile various forms of string literals: at the moment we have plain literals, rdf:plainLiteral, and xsd:string literals. They are very very close to one another but they are officially different. In practice this means that, eg, SPARQL queries have to have a three branch UNION to handle all of these. Worth looking at some sort of a reconciliation of these.

This is ISSUE-12.

Proposal

1. Abolish plain literals without a language tag from the abstract syntax.

2. "foo" and corresponding forms in other concrete syntaxes are syntactic sugar for "foo"^^xsd:string. In general, both forms MAY be used and represent identical literals in the abstract syntax.

3. N-Triples has use cases that are hampered by the variability introduced by syntactic sugar. One of the two forms is therefore forbidden when serializing N-Triples. The restriction applies only to serializing; parsers still accept both forms, so that legacy documents can still be parsed. The N-Triples editors will take care of this.

4. This proposal does not consider what ought to be done about language-tagged strings or rdf:PlainLiteral; these questions are to be addressed in a separate proposal and decision.
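
Points 1 and 2 can be illustrated with a minimal Python sketch. The `Literal` record and the `make_literal` helper are hypothetical names chosen for this illustration and are not taken from any RDF specification or library. Language-tagged strings are passed through unchanged, since point 4 leaves them out of scope.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Hypothetical abstract-syntax literal (illustration only)."""
    lexical: str
    datatype: Optional[str] = None   # datatype IRI; None only for language-tagged strings
    language: Optional[str] = None   # language tag; None for typed literals


def make_literal(lexical, datatype=None, language=None):
    """Build a literal the way a parser could under this proposal.

    An untagged "foo" has no separate abstract form: it is turned
    into "foo"^^xsd:string, so both concrete forms yield equal literals.
    """
    if language is not None:
        if datatype is not None:
            raise ValueError("a literal cannot have both a datatype and a language tag")
        # Out of scope for this proposal (point 4): kept as-is.
        return Literal(lexical, None, language)
    return Literal(lexical, datatype or XSD_STRING, None)


plain = make_literal("foo")              # concrete form: "foo"
typed = make_literal("foo", XSD_STRING)  # concrete form: "foo"^^xsd:string
```

With this model, `plain == typed` holds, which is exactly the "identical literals in the abstract syntax" requirement of point 2.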

Issues

  • This implies some changes to SPARQL. How to handle this, given the different WG timeframes? It's probably too late for SPARQL 1.1, or is it?
    • In queries, "foo" now matches "foo"^^xsd:string and vice versa
    • The notion of “simple literal” in the SPARQL spec no longer reflects RDF Concepts
    • datatype("foo") is now xsd:string without the need for an exception in the spec
    • SPARQL Results XML/JSON are hampered by the variability introduced by syntactic sugar. One of the two forms should be forbidden when answering queries over RDF 1.1.
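
The effect on query matching and on datatype() can be sketched in Python under simplifying assumptions: literals are modelled as (lexical form, datatype IRI) pairs, the graph is a plain list of triples, and `normalize`, `datatype` and `match_subjects` are hypothetical helpers, not functions of any SPARQL engine. Language-tagged strings are left out.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def normalize(lexical, datatype=None):
    """Map both concrete forms of an untagged string to one abstract literal."""
    return (lexical, datatype or XSD_STRING)


def datatype(literal):
    """Return the datatype IRI; no special case is needed for "foo"."""
    return literal[1]


def match_subjects(graph, predicate, query_literal):
    """Return subjects whose object for `predicate` equals the query literal."""
    return [s for (s, p, o) in graph if p == predicate and o == query_literal]


# Data written with either concrete form ends up as the same abstract literal.
graph = [
    ("ex:a", "ex:label", normalize("foo")),              # written as "foo"
    ("ex:b", "ex:label", normalize("foo", XSD_STRING)),  # written as "foo"^^xsd:string
    ("ex:c", "ex:label", normalize("1", XSD_INTEGER)),
]

# A query that uses "foo" finds both ex:a and ex:b without a UNION.
hits = match_subjects(graph, "ex:label", normalize("foo"))
```

Because normalization happens once, when data and queries are parsed, the matching step itself needs no knowledge of the two concrete forms.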

Comparison of current RDF and proposal

Blue italics indicate changes between current RDF and new proposal.

Literals in the new proposal
Kind of literal Concrete syntaxes Abstract syntax Value
Concrete syntax form Allowed? Ttl NT Spq SRX RDFa R/X Abstract syntax form Allowed?
Strings without language tag
"foo" ? ? "foo"^^xsd:string Unicode string
"foo"^^xsd:string ? ?
"foo@"^^rdf:PlainLiteral MUST NOT "foo@"^^rdf:PlainLiteral MUST NOT
Strings with language tag
"foo"@en <"foo",@en> <Unicode string, language tag>
"foo@en"^^rdf:PlainLiteral MUST NOT "foo@en"^^rdf:PlainLiteral MUST NOT
Integer numbers 1 "1"^^xsd:integer Number
"1"^^xsd:integer
Decimal numbers 1.0 "1.0"^^xsd:decimal
"1.0"^^xsd:decimal
Booleans true "true"^^xsd:boolean Boolean value
"true"^^xsd:boolean
Other literals "lexical"^^datatype "lexical"^^datatype Depends on L2V mapping of datatype

Discussion etc

  • One output syntax form should be stated as preferred. Suggestion: "Serializing an RDF graph SHOULD use the plain literal syntax "foo" in preference to the "foo"^^xsd:string form."
  • This can then be suggested to other RDF-related formats, including SPARQL results formats.
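
The suggested preference could be implemented in a serializer roughly as follows. This is a sketch, not a complete N-Triples or Turtle writer: `serialize_literal` and `escape` are hypothetical helpers, and the escaping covers only backslash, double quote, newline and carriage return.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def escape(text):
    """Escape the characters that would break a quoted string (subset only)."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def serialize_literal(lexical, datatype=None, language=None):
    """Write a literal, preferring "foo" over "foo"^^xsd:string."""
    quoted = '"' + escape(lexical) + '"'
    if language is not None:
        return quoted + "@" + language
    if datatype is None or datatype == XSD_STRING:
        # Preferred output form for untagged strings.
        return quoted
    return quoted + "^^<" + datatype + ">"


examples = [
    serialize_literal("foo", XSD_STRING),
    serialize_literal("foo", language="en"),
    serialize_literal("1", XSD_INTEGER),
]
```

A format that prefers the other form, as point 3 of the proposal allows for N-Triples, would only need to change the branch for untagged strings.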