This is a proposal for addressing the following time-permitting item from the charter:
Reconcile various forms of string literals: at the moment we have plain literals, rdf:plainLiteral, and xsd:string literals. They are very very close to one another but they are officially different. In practice this means that, eg, SPARQL queries have to have a three branch UNION to handle all of these. Worth looking at some sort of a reconciliation of these.
This is ISSUE-12.
- RDF Concepts puts more emphasis on the distinction between (syntactic) “literal equality” and (semantic) “value equality”
- RDF Concepts explicitly points out the string value equalities that already arise from RDF Semantics
- RDF Concepts declares one of the forms—plain literals—as canonical
- Implementations MAY canonicalize, but don't have to.
- No changes to the abstract syntax required
- No changes to any concrete syntax or parser required
- No changes to any implementations of any of the existing entailment regimes required
- Those who are fine with canonicalization can canonicalize, and need not deal with entailment
- Those who don't want to canonicalize have the option of supporting only string value equality at query time, without implementing RDFS- or D-Entailment
- “MAY canonicalize” softly discourages the use of xsd:string typed literals, without abolishing them or declaring them archaic
Changes to RDF Concepts
For reference, the current text:
- Section 6.5.2 The Value Corresponding to a Typed Literal
- Section 6.5.1 Literal Equality
- Section 6.3 Graph Equivalence
Changes to Section 6.5.2 The Value Corresponding to a Typed Literal
§1 Rename it to “6.5.1 The Value Corresponding to a Literal” and move it ahead of the current 6.5.1
§2 Add to the beginning: “The value of a plain literal without language tag is the same Unicode string as its lexical form.
The value of a plain literal with language tag is a pair consisting of 1. the same Unicode string as its lexical form, and 2. its language tag.
For typed literals, …” (continue with rest of section as is)
§3 Remove the Note at the end of the section
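The value mapping that §2 would define can be illustrated with a short Python sketch. It models a literal as a small frozen dataclass. That is a hypothetical representation, not any RDF library's API. For plain literals it follows §2 directly. For typed literals it looks up a lexical-to-value mapping in a table. As an assumption for illustration, the table covers only xsd:string (an identity mapping) and rdf:PlainLiteral (split at the last "@"). Language tags are lowercased to match RDF 2004's normalization of tags, and they are not validated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple, Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_PLAIN_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral"

# A value is either a string, or a (string, language tag) pair.
Value = Union[str, Tuple[str, str]]


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal model: plain literals have datatype None,
    typed literals have lang None."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def plain_literal_l2v(lexical: str) -> Value:
    """rdf:PlainLiteral lexical-to-value mapping: "aaa@" -> "aaa",
    "aaa@xx" -> ("aaa", "xx"). Tags are not validated in this sketch."""
    text, sep, tag = lexical.rpartition("@")
    if not sep:
        raise ValueError(f"not an rdf:PlainLiteral lexical form: {lexical!r}")
    return text if tag == "" else (text, tag.lower())


# Assumption: only these two datatypes are recognized in this sketch.
LEXICAL_TO_VALUE: Dict[str, Callable[[str], Value]] = {
    XSD_STRING: lambda lexical: lexical,  # xsd:string maps a lexical form to itself
    RDF_PLAIN_LITERAL: plain_literal_l2v,
}


def literal_value(lit: Literal) -> Value:
    """Value of a literal as the revised section would define it."""
    if lit.datatype is None:
        # Plain literal: the lexical form itself, paired with its tag if it has one.
        return lit.lexical if lit.lang is None else (lit.lexical, lit.lang.lower())
    l2v = LEXICAL_TO_VALUE.get(lit.datatype)
    if l2v is None:
        raise KeyError(f"datatype not recognized by this sketch: {lit.datatype}")
    return l2v(lit.lexical)
```

With these definitions, `literal_value(Literal("aaa"))` and `literal_value(Literal("aaa", datatype=XSD_STRING))` both return `"aaa"`. This is the kind of value equality that the new paragraphs in 6.5.2 would point out.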
Changes to Section 6.5.1 Literal Equality
§4 Rename section to “6.5.2 Literal Equality and Canonical Forms”
§5 Add to the beginning: “Equality of literals can be evaluated based on their syntax, or based on their value.”
§6 Change “Two literals are equal …” to: “Two literals are syntactically equal …” in the current first paragraph
§7 Add to the end: “In application contexts, comparing the values of literals (see section 6.5.1) is usually more helpful than comparing their syntactic forms. Literals with different lexical forms and with different datatypes can have the same value. In particular:
- A plain literal with lexical form aaa and no language tag has the same value as a typed literal with lexical form aaa and datatype IRI xsd:string
- A plain literal with lexical form aaa and no language tag has the same value as a typed literal with lexical form aaa@ and datatype IRI rdf:PlainLiteral
- A plain literal with lexical form aaa and language tag xx has the same value as a typed literal with lexical form aaa@xx and datatype IRI rdf:PlainLiteral”
§8 Then add: “Some literals are canonical forms. Implementations MAY replace any literal with a syntactically different canonical form that has the same value. All plain literals, with or without language tag, are canonical forms.”
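Under §8, plain literals are the only canonical forms. A canonicalizing implementation would therefore, in effect, rewrite xsd:string and rdf:PlainLiteral typed literals into plain literals. The Python sketch below shows this using a hypothetical dataclass in place of a real RDF term model. The dataclass's generated `==` compares lexical form, language tag and datatype field by field, so it behaves as syntactic equality in the sense of §6. After canonicalization, the three string forms listed in §7 become syntactically equal. Lowercasing language tags is an assumption that follows RDF 2004's tag normalization. All other literals are returned unchanged.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_PLAIN_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral"


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal model; the generated == is syntactic equality."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def canonicalize(lit: Literal) -> Literal:
    """Replace a string literal with the plain literal that has the same value.
    Other literals are returned unchanged, since §8 names no other canonical forms."""
    if lit.datatype is None:
        # Already canonical; only the tag's case is normalized.
        return lit if lit.lang is None else Literal(lit.lexical, lit.lang.lower())
    if lit.datatype == XSD_STRING:
        return Literal(lit.lexical)
    if lit.datatype == RDF_PLAIN_LITERAL:
        text, sep, tag = lit.lexical.rpartition("@")
        if sep:  # malformed rdf:PlainLiteral lexical forms are left as they are
            return Literal(text, tag.lower() or None)
    return lit
```

For example, `Literal("aaa")` and `Literal("aaa", datatype=XSD_STRING)` compare unequal as written, but both canonicalize to `Literal("aaa")`. The same function would also implement the weaker, strings-only wording of §8 discussed at the end of this page.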
Changes to Section 6.3 Graph Equivalence
§9 Append this leftover sentence, which was removed from 6.5.1: “Note: For comparing RDF Graphs, semantic notions of entailment (see [RDF-SEMANTICS]) are usually more helpful than the syntactic equivalence defined here.”
Extending this to other XSD literals?
(While we're at it, we might also cover equalities between the built-in numeric XSD types, and between different lexical forms of the same built-in XSD datatype.)
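As an illustration of what such an extension would have to state, the Python sketch below checks value equality between xsd:integer and xsd:decimal literals. It checks lexical forms against the XML Schema 1.0 lexical-space patterns for these two types and maps them to Python `Decimal` values. The scope is an assumption. xsd:integer is derived from xsd:decimal, so the two share a value space. xsd:float and xsd:double are left out because XML Schema 1.0 treats primitive datatypes as having disjoint value spaces. Other derived integer types (xsd:int, xsd:long and so on) would need their range restrictions added.

```python
import re
from decimal import Decimal
from typing import Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"
XSD_DECIMAL = XSD + "decimal"
XSD_INTEGER = XSD + "integer"

# Lexical spaces of XML Schema 1.0 Part 2 (surrounding whitespace not accepted here).
_LEXICAL_SPACE = {
    XSD_DECIMAL: re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"),
    XSD_INTEGER: re.compile(r"[+-]?[0-9]+"),
}


def numeric_value(lexical: str, datatype: str) -> Decimal:
    """Map a lexical form of a covered datatype to its value."""
    pattern = _LEXICAL_SPACE.get(datatype)
    if pattern is None:
        raise KeyError(f"datatype not covered by this sketch: {datatype}")
    if not pattern.fullmatch(lexical):
        raise ValueError(f"{lexical!r} is not in the lexical space of {datatype}")
    return Decimal(lexical)  # Decimal equality ignores trailing zeros and sign of zero


def same_numeric_value(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """Value equality of two (lexical form, datatype IRI) pairs."""
    return numeric_value(*a) == numeric_value(*b)
```

Under these assumptions, "1"^^xsd:integer, "01"^^xsd:integer and "1.0"^^xsd:decimal all have the same value. "1.0"^^xsd:integer is rejected as an ill-formed lexical form.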
- Possible weaker statement of §8, assuming we don't want to extend this to literals other than strings: “Implementations MAY replace xsd:string typed literals and rdf:PlainLiteral typed literals with a plain literal that has the same value.”