Re: ACTION-419: Sync with Birte on Datatypes for canonicalisation

On 30 Mar 2011, at 18:21, Birte Glimm wrote:

> Thanks Bijan and Andy, it's now clearer what says what. I, personally,
> prefer by far the 1.1 (datatypes) spec. The 1.0 spec is even
> contradictory in saying that canonical representations of decimals
> must have a decimal point and that derived types (such as integers)
> inherit the canonical representation from their primitive type, but
> integer specifically forbids decimal points. What is rather bizarre is
> that the 1.1 Candidate Rec went back to LC, but with a document dated
> from 2009 that does not explain anything and that even has a note that
> last call comments are due by 31 Dec 2009.

This member-only link may provide some clarification:
	http://lists.w3.org/Archives/Member/w3c-xml-schema-wg/2011Mar/0004.html

> I just discussed this also with Boris (Motik) and it seems that there
> is even a problem with the 1.1 spec since it is hard/impossible to
> define a canonical representation in the primitive type that is
> guaranteed valid for all derived types, e.g., for integers there is
> now an explicit exception in the definition of canonical
> representation that says "Specifically, for integers, the decimal
> point and fractional part are prohibited." I could, however, define
> other derived types, e.g., "integers with preceding 0" by applying a
> pattern to integers that only allows values of the form "01", "0...".
> They are all lexically ok integers and decimals, but the canonical
> representation would not be in the lexical space of that datatype.

Yeah, well:

http://www.w3.org/TR/xmlschema11-2/#datatype

"The ·pattern· facet, on the other hand, and any other (·implementation-defined·) ·lexical· facets, restrict the ·lexical space· directly. When more than one lexical representation is provided for a given value, such facets may remove the ·canonical representation· while permitting a different lexical representation; in this case, the value remains in the ·value space· but has no ·canonical representation·. This specification provides no recourse in such situations. Applications are free to deal with it as they see fit."

I think in OWL this isn't a problem because we don't allow dorking (too much) with the lexical space (esp. with numbers).

Does it crop up anywhere for SPARQL in reality?
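
To make the leading-zero case concrete, here is a minimal Python sketch. The pattern `0[0-9]+` and both helper names are hypothetical, and `canonical_integer` is only a stand-in for the integer canonical mapping, not the spec's definition.

```python
import re

# Hypothetical derived type: xsd:integer restricted by a pattern facet
# that only admits lexical forms with a leading zero, e.g. "01", "007".
LEADING_ZERO_PATTERN = re.compile(r"0[0-9]+")


def canonical_integer(lexical: str) -> str:
    """Stand-in for the integer canonical mapping: no leading zeros, no '+'."""
    return str(int(lexical))


def in_lexical_space(lexical: str) -> bool:
    """True if the lexical form satisfies the (assumed) pattern facet."""
    return LEADING_ZERO_PATTERN.fullmatch(lexical) is not None


lexical = "01"
canonical = canonical_integer(lexical)
print(in_lexical_space(lexical))    # the literal itself is allowed
print(in_lexical_space(canonical))  # its canonical form "1" is not
```

The value 1 stays in the value space of the derived type, but the only lexical form the canonical mapping can produce for it is rejected by the pattern, which is the situation the quoted spec text leaves to applications.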

> Now
> one could argue that nobody is likely to define anything like that,
> but it might be safer to define canonical representation per type,
> i.e., each primitive or derived type has to explicitly say what the
> canonical form is and even if it is just "same as for the super type".

Or just disallow lexical space mapping facets when it could muck things up.

> Anyway, I would like to go with the 1.1 definition and hope that there
> are no objections to that.

My preference.

> As Andy pointed out, it might be worth thinking about this issue also
> with respect to SPARQL Update. If I put a canonical representation in
> and specify a canonical representation in my delete, then all is fine,
> but in all other cases things can get messy. If the store
> canonicalises both data and query things are ok-ish, but users still
> might not like that if they put
> :s :p "1"^^xsd:short
> in and delete
> :s :p "1.0"^^xsd:decimal
> then the triple is gone (assuming that the type is always the
> primitive one after canonicalisation, which is not really defined yet
> anywhere).

Liveable, I guess.

> Even worse IMO would be if the store canonicalises the
> data, but not the query. So you put
> :s :p "01"^^xsd:short
> in and then try to delete
> :s :p "01"^^xsd:short
> and nothing is deleted, because the data value in the store has been
> canonicalised whereas the one in the query hasn't.

Ewww!
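
Both update scenarios can be sketched in a few lines of Python. Everything here is hypothetical: `TinyStore` is a toy, not any real SPARQL store, and `canonicalise` assumes what Birte assumes above, namely that the datatype is promoted to the primitive xsd:decimal, with integer values written without a decimal point as in the 1.1 spec.

```python
from decimal import Decimal

XSD_DECIMAL = "xsd:decimal"


def canonicalise(lexical: str, datatype: str) -> tuple[str, str]:
    """Map a decimal-derived literal to a canonical (lexical, datatype) pair.

    Assumption: every decimal-derived type is promoted to xsd:decimal,
    so the incoming datatype is ignored in this sketch.
    """
    value = Decimal(lexical)
    if value == value.to_integral_value():
        return str(int(value)), XSD_DECIMAL
    # normalize() drops trailing zeros; "f" avoids exponent notation.
    return format(value.normalize(), "f"), XSD_DECIMAL


class TinyStore:
    """Toy triple store that always canonicalises data on insert."""

    def __init__(self, canonicalise_queries: bool):
        self.canonicalise_queries = canonicalise_queries
        self.triples = set()

    def insert(self, s, p, lexical, datatype):
        self.triples.add((s, p, canonicalise(lexical, datatype)))

    def delete(self, s, p, lexical, datatype):
        # Only canonicalise the literal in the delete if configured to.
        obj = (canonicalise(lexical, datatype)
               if self.canonicalise_queries else (lexical, datatype))
        self.triples.discard((s, p, obj))


# Store canonicalises data and query: the xsd:short triple is deleted
# by a delete that names "1.0"^^xsd:decimal.
both = TinyStore(canonicalise_queries=True)
both.insert(":s", ":p", "1", "xsd:short")
both.delete(":s", ":p", "1.0", "xsd:decimal")
print(len(both.triples))

# Store canonicalises data only: deleting exactly what was inserted
# removes nothing.
data_only = TinyStore(canonicalise_queries=False)
data_only.insert(":s", ":p", "01", "xsd:short")
data_only.delete(":s", ":p", "01", "xsd:short")
print(len(data_only.triples))
```

The first store ends up empty and the second still holds one triple, which matches the "ok-ish" and "even worse" cases respectively.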

> Also, at the moment, the D-entailment regime does not prescribe
> anything to be done when loading and you could even have
> un-canonicalised data values in your graph as long as you somehow
> manage to give the correct answers, which are only those with
> canonical representations, so you could theoretically have
> :s :p "2.200"^^xsd:float .
> :s :p "2.20"^^xsd:float .
> and BGP
> :s :p ?dv .
> as long as you give just one answer with ?dv bound to the canonical
> representation, which I think is "2.2"^^xsd:float. Obviously, it is
> quite unlikely that anybody would do that (I think), but assume
> someone does. If now a DELETE query is issued to delete
> :s :p "2.2"^^xsd:float .
> since you believe that this triple is in the graph, then you are
> mistaken and nothing happens.


Ouch.
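
A small Python sketch of that last case, with loud assumptions: the graph is a plain set of tuples, `answers` and `delete_syntactic` are hypothetical helpers, and `canonical_float` uses Python's shortest round-trip repr of a double as a stand-in, which is not the XSD float canonical mapping.

```python
def canonical_float(lexical: str) -> str:
    # Stand-in only: Python's repr of a double, not the XSD mapping.
    return repr(float(lexical))


# The graph keeps the un-canonicalised literals exactly as loaded.
graph = {
    (":s", ":p", ("2.200", "xsd:float")),
    (":s", ":p", ("2.20", "xsd:float")),
}


def answers(graph, s, p):
    """Bindings for ?dv in `s p ?dv`, canonicalised and de-duplicated."""
    return {
        (canonical_float(lex), dt)
        for (gs, gp, (lex, dt)) in graph
        if (gs, gp) == (s, p)
    }


def delete_syntactic(graph, triple):
    """Delete by exact (syntactic) match on the stored triple."""
    graph.discard(triple)


# The query reports a single canonical answer ...
print(answers(graph, ":s", ":p"))

# ... but deleting that answer matches neither stored triple.
delete_syntactic(graph, (":s", ":p", ("2.2", "xsd:float")))
print(len(graph))
```

The query side looks as if one triple with "2.2" exists, while the delete leaves both stored triples in place.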

Cheers,
Bijan.

Received on Wednesday, 30 March 2011 19:40:52 UTC