Re: Proposal for ISSUE-12, string literals

On 12 May 2011, at 13:06, Ivan Herman wrote:
>> I'd be tempted to go further and make only the primitive types such as xsd:decimal into RDF canonical forms. This would mean that systems MAY canonicalize all numbers to a single numeric datatype.
> 
> Do you mean like the 'canonical' forms in Turtle? I may be missing something here.

No. Turtle has syntactic sugar for certain numeric literals; this has nothing to do with canonicalization.

(This all goes way beyond ISSUE-12 anyway...)

I was suggesting that perhaps, instead of this:
"+0013"^^xsd:byte => "13"^^xsd:byte

I'd like to say that implementations MAY do this:
"+0013"^^xsd:byte => "13.0"^^xsd:decimal

They'd end up with all numbers represented in a single data type, with a single canonical representation. This makes comparisons quite a bit easier.
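
To make the idea concrete, here is a minimal sketch of such a canonicalization step, not a definitive implementation. The names canonical_decimal, canonicalize and DECIMAL_DERIVED are hypothetical, the list of datatypes is only a subset of those derived from xsd:decimal, and the sketch does not check that a value is within the range of its original datatype. It assumes the XSD 1.0 canonical representation of xsd:decimal (no "+" sign, no superfluous zeros, at least one digit on each side of the decimal point).

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumption: a hypothetical, incomplete set of datatypes derived from xsd:decimal.
DECIMAL_DERIVED = {
    XSD + name for name in ("decimal", "integer", "long", "int", "short", "byte")
}

_DECIMAL_RE = re.compile(r"([+-]?)([0-9]*)(?:\.([0-9]*))?")


def canonical_decimal(lexical):
    """Return the canonical xsd:decimal lexical form for a decimal lexical form."""
    match = _DECIMAL_RE.fullmatch(lexical.strip())
    # Reject non-numeric input and forms with no digits at all, such as "" or ".".
    if match is None or not (match.group(2) or match.group(3)):
        raise ValueError("not a valid xsd:decimal lexical form: %r" % lexical)
    sign = match.group(1)
    whole = match.group(2).lstrip("0") or "0"          # drop leading zeros
    frac = (match.group(3) or "").rstrip("0") or "0"   # drop trailing zeros
    # Zero never carries a sign in this sketch.
    negative = sign == "-" and (whole != "0" or frac != "0")
    return ("-" if negative else "") + whole + "." + frac


def canonicalize(lexical, datatype):
    """Map a (lexical form, datatype IRI) pair to a canonical pair where possible."""
    if datatype in DECIMAL_DERIVED:
        return canonical_decimal(lexical), XSD + "decimal"
    # Anything else is left untouched.
    return lexical, datatype


print(canonicalize("+0013", XSD + "byte"))
```

With every number in this form, a value comparison between two such literals reduces to comparing the canonicalized pairs for equality.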

Best,
Richard




> 
> Ivan
> 
> 
> 
>> Best,
>> Richard
>> 
>> 
>> 
>>> 
>>> On 12/05/2011 12:19, Richard Cyganiak wrote:
>>>> On 12 May 2011, at 09:22, Ivan Herman wrote:
>>>>> - You make the remark on the wiki page on 'extending this to
>>>>> numeric literals', which I would rather say 'extending this to any
>>>>> datatype' (eg, xsd:dateTime, too).
>>>> 
>>>> Right -- I changed the section heading on the wiki.
>>>> 
>>>>> I have the impression that this is also a consequence of what you
>>>>> write already. You emphasize the 'lexical equality', and you also
>>>>> say "Implementations MAY replace any literal with a canonical form
>>>>> if both are syntactically different, but have the same value."
>>>>> which does not look like being bound to string literals.
>>>> 
>>>> The way I wrote it, the only literals marked as canonical forms are
>>>> plain string literals. So the sentence doesn't license replacement
>>>> of, say, +00013 with 13, because no numeric literals have been marked
>>>> as canonical forms. That could be easily changed, of course.
>>>> 
>>>>> Do you think there is anything missing in this document to make
>>>>> that picture complete (except, editorially, to possibly add
>>>>> non-string examples)?
>>>> 
>>>> If we only want to address string literals, then I think the proposal
>>>> is complete.
>>>> 
>>>> If we want to address other XSD literals as well, then some bullet
>>>> points should be added to the list of equalities, and the canonical
>>>> lexical form of some XSD datatypes (e.g., "13.0"^^xsd:decimal) should
>>>> be defined to be canonical forms so that other same-valued literals
>>>> can be replaced with the canonical form. This requires a detailed
>>>> reading of the XSD spec (which I have not done so far).
>>>> 
>>>> (RDF Concepts should probably contain a paragraph or two introducing
>>>> the rdf:PlainLiteral datatype and referencing the relevant spec, but
>>>> let's treat that as a separate issue.)
>>>> 
>>>>> - I would also propose to make some tiny changes in the Semantics
>>>>> document.
>>>> 
>>>> I'll let the editors of that document comment.
>>>> 
>>>> Best, Richard
>>>> 
>>>> 
>>>>> 
>>>>> Ivan
>>>>> 
>>>>> 
>>>>> On May 11, 2011, at 23:23 , Richard Cyganiak wrote:
>>>>> 
>>>>>> I took an action today to draft text for RDF Concepts that
>>>>>> resolves ISSUE-12. I put it on the wiki here:
>>>>>> http://www.w3.org/2011/rdf-wg/wiki/StringLiterals/EntailmentProposal
>>>>>> 
>>>>>> 
>>>>>> A plain text copy is attached below.
>>>>>> 
>>>>>> Best, Richard
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> SHORT SUMMARY
>>>>>> 
>>>>>> 1. RDF Concepts puts more emphasis on the distinction between
>>>>>>    (syntactic) “literal equality” and (semantic, important for
>>>>>>    applications) “value equality”
>>>>>> 2. RDF Concepts explicitly points out the specific string value
>>>>>>    equalities that already arise from RDF Semantics
>>>>>> 3. RDF Concepts declares one of the string literal forms as
>>>>>>    canonical
>>>>>> 4. Implementations MAY canonicalize, but don't have to
>>>>>> 5. The canonical form is plain literals.
>>>>>> 
>>>>>> 
>>>>>> WHY?
>>>>>> 
>>>>>> 1. No changes to the abstract syntax required
>>>>>> 2. No changes to any concrete syntax or parser required
>>>>>> 3. No changes to any implementations of any of the existing
>>>>>>    entailment regimes required
>>>>>> 4. Those who are ok with canonicalization can do that, and don't
>>>>>>    need to deal with entailment
>>>>>> 5. Those who don't want to canonicalize have the option of
>>>>>>    supporting only string value equality at query time, without
>>>>>>    RDFS- and D-Entailment
>>>>>> 6. “MAY canonicalize” softly discourages the use of xsd:string
>>>>>>    typed literals, without abolishing them outright or declaring
>>>>>>    them archaic
>>>>>> 7. Standardizing on xsd:string was never an option because of
>>>>>>    language tags
>>>>>> 8. Standardizing on rdf:PlainLiteral was never an option because
>>>>>>    it MUST NOT be used in serializations that support plain
>>>>>>    literals
>>>>>> 
>>>>>> 
>>>>>> CHANGES TO 6.5.2 The Value Corresponding to a Typed Literal
>>>>>> http://www.w3.org/TR/rdf-concepts/#section-Literal-Value
>>>>>> 
>>>>>> 
>>>>>> §1 Rename it to “6.5.1 The Value Corresponding to a Literal” and
>>>>>> move it ahead of 6.5.1
>>>>>> 
>>>>>> §2 Add to the beginning: “The value of a plain literal without
>>>>>> language tag is the same Unicode string as its lexical form.
>>>>>> 
>>>>>> The value of a plain literal with language tag is a pair
>>>>>> consisting of 1. the same Unicode string as its lexical form, and
>>>>>> 2. its language tag.
>>>>>> 
>>>>>> For typed literals, …” (continue with rest of section as is)
>>>>>> 
>>>>>> §3 Remove the Note at the end of the section
>>>>>> 
>>>>>> 
>>>>>> CHANGES TO 6.5.1 Literal Equality
>>>>>> http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality
>>>>>> 
>>>>>> 
>>>>>> §4 Rename section to “6.5.2 Literal Equality and Canonical
>>>>>> Forms”
>>>>>> 
>>>>>> §5 Add to the beginning: “Equality of literals can be evaluated
>>>>>> based on their syntax, or based on their value.”
>>>>>> 
>>>>>> §6 Change “Two literals are equal …” to: “Two literals are
>>>>>> syntactically equal …” in the current first paragraph.
>>>>>> 
>>>>>> §7 Add to the end: “In application contexts, comparing the values
>>>>>> of literals (see section 6.5.1) is usually more helpful than
>>>>>> comparing their syntactic forms. Literals with different lexical
>>>>>> forms and with different datatypes can have the same value. In
>>>>>> particular:
>>>>>> 
>>>>>> - A plain literal with lexical form aaa and no language tag has
>>>>>>   the same value as a typed literal with lexical form aaa and
>>>>>>   datatype IRI xsd:string
>>>>>> - A plain literal with lexical form aaa and no language tag has
>>>>>>   the same value as a typed literal with lexical form aaa@ and
>>>>>>   datatype IRI rdf:PlainLiteral
>>>>>> - A plain literal with lexical form aaa and language tag xx has
>>>>>>   the same value as a typed literal with lexical form aaa@xx and
>>>>>>   datatype IRI rdf:PlainLiteral”
>>>>>> 
>>>>>> §8 “Some literals are canonical forms. Implementations MAY
>>>>>> replace any literal with a canonical form if both are
>>>>>> syntactically different, but have the same value. All plain
>>>>>> literals, with or without language tag, are canonical forms.”
>>>>>> 
>>>>>> 
>>>>>> CHANGES TO 6.3 Graph Equivalence
>>>>>> http://www.w3.org/TR/rdf-concepts/#section-graph-equality
>>>>>> 
>>>>>> 
>>>>>> §9 Append this leftover sentence, which was removed from 6.5.1:
>>>>>> “Note: For comparing RDF Graphs, semantic notions of entailment
>>>>>> (see [RDF-SEMANTICS]) are usually more helpful than the syntactic
>>>>>> equivalence defined here.”
>>>>>> 
>>>>>> 
>>>>>> EXTENDING THIS TO NUMERIC LITERALS???
>>>>>> 
>>>>>> (While we're at it, we might also cover equalities between the
>>>>>> built-in numeric XSD types, and between different lexical forms
>>>>>> of the same built-in XSD datatype.)
>>>>> 
>>>>> 
>>>>> ----
>>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>>> Home: http://www.w3.org/People/Ivan/
>>>>> mobile: +31-641044153
>>>>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>>>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> -- 
>>> Antoine Zimmermann
>>> Researcher at:
>>> Laboratoire d'InfoRmatique en Image et Systèmes d'information
>>> Database Group
>>> 7 Avenue Jean Capelle
>>> 69621 Villeurbanne Cedex
>>> France
>>> Tel: +33(0)4 72 43 61 74 - Fax: +33(0)4 72 43 87 13
>>> Lecturer at:
>>> Institut National des Sciences Appliquées de Lyon
>>> 20 Avenue Albert Einstein
>>> 69621 Villeurbanne Cedex
>>> France
>>> antoine.zimmermann@insa-lyon.fr
>>> http://zimmer.aprilfoolsreview.com/
>>> 
>> 
>> 
> 
> 
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf
> 
> 
> 
> 
> 
> 

Received on Thursday, 12 May 2011 14:46:24 UTC