Re: Review of XSD Datatypes 1.1 Changes

On Fri, Feb 3, 2012 at 3:49 PM, Pat Hayes <phayes@ihmc.us> wrote:

>
> On Feb 2, 2012, at 2:08 PM, Alex Hall wrote:
>
> > Per ACTION-136 - Review changes in W3C XML Schema Definition Language
> (XSD) -- http://www.w3.org/TR/2012/PR-xmlschema11-2-20120119/#changes
> >
> > I've completed my review of the changes in XSD Datatypes 1.1. Rather
> than go through the exhaustive list of changes, I'll summarize the areas
> that I think are relevant to RDF:
> >
> > 1. Datatype definitions, including definitions of lexical spaces, value
> spaces, L2V mappings, and canonical mappings, underwent a thorough
> revision. This is a good thing, because the new definitions are much more
> precisely stated and leave less room for confusion. In general, RDF defers
> to XSD for datatype definitions so I don't think any action on our part is
> required here in terms of the RDF specs. However, implementors of XSD
> datatype processing in RDF will want to review these changes so we might
> want to call their attention to them. I did verify that the short-form
> literal definitions in Turtle for boolean, double, decimal, and integer are
> still valid subsets of the respective lexical spaces in XSD 1.1.
> >
> > 2. XSD 1.1 distinguishes between the identity of values and the
> (numeric) equality of values.
>
> In order for this to make sense, the value spaces have to be defined so
> that there are distinct but numerically identical values. As far as I can
> understand it, this means that for example 3 and 3.0 are different values
> in XSD, and the value spaces of xsd:number and xsd:real (for example) are
> not what a mathematician would mean by 'natural number' and 'real number'
> respectively. Given this, then...
>

There are three primitive numeric datatypes in XSD:
- decimal, whose value space is all numbers expressible as x / 10^n for
some integer x and non-negative integer n.
- double, whose value space is modeled after IEEE double-precision floating
point numbers (m * 2^e for some integers m and e with |m| < 2^53 and -1074
<= e <= 971).
- float, whose value space is modeled after IEEE single-precision floating
point numbers (m * 2^e for some integers m and e with |m| < 2^24 and -149
<= e <= 104).

The decimal type is the base type from which the remaining numeric
datatypes are derived (int, long, etc) and the value spaces for these
derived datatypes are subsets of the decimal value space.
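
To make the three value spaces described above concrete, here is a rough
Python sketch of membership tests that follow those definitions literally.
The function names and the use of fractions.Fraction for exact rationals are
my own choices for illustration, not anything taken from the XSD spec, and the
special values (infinities, NaN, signed zero) are deliberately left out.

```python
from fractions import Fraction


def in_decimal_value_space(q: Fraction) -> bool:
    """True if q == x / 10**n for some integer x and non-negative integer n."""
    d = q.denominator  # Fraction always keeps q in lowest terms
    for prime in (2, 5):
        while d % prime == 0:
            d //= prime
    # Only factors of 2 and 5 may appear in the reduced denominator.
    return d == 1


def in_binary_value_space(q: Fraction, m_bits: int, e_min: int, e_max: int) -> bool:
    """True if q == m * 2**e with |m| < 2**m_bits and e_min <= e <= e_max."""
    if q == 0:
        return True
    m, d = abs(q.numerator), q.denominator
    if d & (d - 1):  # reduced denominator is not a power of two
        return False
    e = -(d.bit_length() - 1)
    while m % 2 == 0:  # normalize so that m is odd
        m //= 2
        e += 1
    if e < e_min:
        return False
    # Pick the largest permitted exponent; that yields the smallest mantissa.
    shift = e - min(e, e_max)
    return (m << shift) < (1 << m_bits)


def in_double_value_space(q: Fraction) -> bool:
    return in_binary_value_space(q, 53, -1074, 971)


def in_float_value_space(q: Fraction) -> bool:
    return in_binary_value_space(q, 24, -149, 104)
```

For example, Fraction(1, 10) passes the decimal test but fails the double
test, while Fraction(3) passes all three.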

For values of each primitive numeric datatype, identity and equality are
the same with the exception of the +0/-0/NaN weirdness mentioned below. So
"3"^^xsd:decimal and "3.0"^^xsd:decimal are both identical and equal.

The identity/equality distinction also comes into play with the dateTime
datatype, where values representing the same instant in time with different
timezone offsets are considered by XSD to be distinct but equal.
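
A small Python sketch of the dateTime case, assuming both values carry a
timezone offset. The two function names are hypothetical; the sketch leans on
the fact that Python compares timezone-aware datetime objects by instant.

```python
from datetime import datetime, timedelta, timezone


def xsd_datetime_equal(a: datetime, b: datetime) -> bool:
    """Equality: same instant on the timeline, whatever the offsets."""
    return a == b


def xsd_datetime_identical(a: datetime, b: datetime) -> bool:
    """Identity: same instant and the same timezone offset."""
    return a == b and a.utcoffset() == b.utcoffset()


# The same instant written with two different offsets.
in_utc = datetime(2012, 2, 3, 22, 27, 50, tzinfo=timezone.utc)
in_est = datetime(2012, 2, 3, 17, 27, 50, tzinfo=timezone(timedelta(hours=-5)))

print(xsd_datetime_equal(in_utc, in_est))      # equal
print(xsd_datetime_identical(in_utc, in_est))  # but not identical
```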


>
> > As far as I can tell, RDF Semantics is defined strictly in terms of
> identities (I would appreciate confirmation of this from one of the
> editors).
>
> ...yes. But identity in this sense is conventionally indicated by the
> equality sign '=', which might get confusing.
>
> FWIW, I do not myself find this distinction to be meaningful (two
> *numbers* can be distinct but have the same *numerical* value? Does that
> make sense to you?) but no doubt that is due to my early brainwashing as a
> mathematician.
>

No, it doesn't make much sense to me either. Nor can I tell by looking at
the spec that it's a meaningful distinction in XSD; I only assume it is
because they go to such pains to distinguish identity from equality.


>
> > To avoid confusion, it might be worth noting this distinction in the
> section on datatype entailment and explicitly stating that datatype
> entailment deals with identity and not equality, if that is indeed our
> position. [For SPARQL, pattern matching deals with identity and the '='
> operator deals with equality.
>
> Is that a previous decision, or do you just presume that it must work this
> way?


This is based on RDF Semantics section 1.4 (
http://www.w3.org/TR/rdf-mt/#gddenot):

"In this table, and throughout this document, the equality sign = indicates
identity" (The table here being the table which defines interpretations of
vocabulary terms, triples, and graphs.)

> I would greatly prefer the case where the equality sign means identity,
> consistently (as it does everywhere else in the known universe). If we need
> to distinguish identity of value-space-elements from identity of numerical
> values of value-space elements, then the tidy and clear way to do this is
> to have a function from the former to the latter, and write things like
>
> numerical-value(xsd:number("+0")) = numerical-value(xsd:number("-0"))
>
> even though xsd:number("+0") =/= xsd:number("-0")
>
> ie in a nutshell, equality is identity of numerical-values.
>
> > ]
> >
> > 3. The float and double datatypes introduce positive and negative zero
> to the value space; these values are distinct but equal. Conversely, NaN is
> identical to but not equal to itself.
>
> So numerical equality is not reflexive? This is a very strange world that
> XSD has invented.
>
> > This does have implications for RDF (and SPARQL). For instance, take the
> statements:
> >
> > <s> <p> "+0"^^xsd:double .
> > <s> <p> "-0"^^xsd:double .
> >
> > These two statements are equivalent under XSD entailment using the
> definition of double from XSD 1.0 (because "+0" and "-0" both mapped to the
> value zero), but are distinct under XSD entailment using the definition
> from XSD 1.1.
> >
> > But, given a graph with these statements, the SPARQL query: SELECT * {
> <s> <p> ?o FILTER ( ?o = "0"^^xsd:double ) } should return two rows.
> >
> > Meanwhile, given the graph:
> >
> > <s> <p> "NaN"^^xsd:double .
> >
> > SELECT * { ?s <p> "NaN"^^xsd:double } should return one row.
> > SELECT * { <s> <p> ?o FILTER ( ?o = "NaN"^^xsd:double ) } should return
> zero rows.
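
The expected row counts above can be mimicked with a toy model in Python,
assuming that pattern matching uses the XSD 1.1 identity relation on values
and that FILTER '=' uses equality. All four function names are hypothetical,
and a list of floats stands in for the objects of <s> <p>; this is not how any
particular SPARQL engine is implemented.

```python
import math
from typing import List


def xsd_double_equal(a: float, b: float) -> bool:
    """Equality: IEEE comparison, so +0 == -0 and NaN != NaN."""
    return a == b


def xsd_double_identical(a: float, b: float) -> bool:
    """Identity: NaN is identical to itself; +0 and -0 are distinct."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)


def match_pattern(objects: List[float], term: float) -> List[float]:
    """Stand-in for a triple pattern with a literal in object position."""
    return [o for o in objects if xsd_double_identical(o, term)]


def match_filter(objects: List[float], term: float) -> List[float]:
    """Stand-in for ?o FILTER ( ?o = term )."""
    return [o for o in objects if xsd_double_equal(o, term)]


zeros = [float("+0"), float("-0")]  # the two-statement graph
nans = [float("NaN")]               # the one-statement graph

print(len(match_filter(zeros, 0.0)))          # FILTER on zero
print(len(match_pattern(nans, float("NaN")))) # pattern on NaN
print(len(match_filter(nans, float("NaN"))))  # FILTER on NaN
```
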
> >
> > 4. The value spaces of the primitive datatypes are disjoint. This is not
> actually a change in XSD 1.1, but is given more prominence (moved from
> Section 4, buried in the definition of the equality facet to Section 2 in
> the definition of the datatype system). So, strictly speaking, the graph {
> <s> <p> "1.0"^^xsd:decimal } does not XSD-entail the graph { <s> <p>
> "1.0"^^xsd:double } because decimal and double are different primitive
> types. This came as a surprise to me, even though I've spent some time
> poking around in the XSD specs, so I thought I'd call attention to it here.
> I had just presumed that the value denoted by both literals was simply the
> number 1.
>
> Whatever these values are, they are not numbers, for sure.
>

Well, they claim that they're numbers. But on the other hand they say stuff
like this:

http://www.w3.org/TR/2012/PR-xmlschema11-2-20120119/#identity

"In the identity relation defined herein, values from
different ·primitive· datatypes' ·value spaces· are made artificially
distinct if they might otherwise be considered identical.  For example,
there is a number two in the decimal datatype and a number two in
the float datatype.  In the identity relation defined herein, these two
values are considered distinct.  Other applications making use of these
datatypes may choose to consider values such as these identical, but for
the view of ·primitive· datatypes' ·value spaces· used herein, they are
distinct."

http://www.w3.org/TR/2012/PR-xmlschema11-2-20120119/#equality

"In the equality relation defined herein, values from different primitive
data spaces are made artificially unequal even if they might otherwise be
considered equal.  For example, there is a number two in
the decimal datatype and a number two in the float datatype.  In the
equality relation defined herein, these two values are considered unequal.
Other applications making use of these datatypes may choose to consider
values such as these equal; nonetheless, in the equality relation defined
herein, they are unequal."

So, at least we're allowed to go out on a limb and say that two equals
two... But, since we defer to XSD for datatype definitions, if we want
"1"^^xsd:decimal to equal "1"^^xsd:double we have to say so ourselves.

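As a sketch of what "saying so ourselves" could look like, the Python below
tags each value with its primitive datatype. The XsdNumber type and both
function names are invented for illustration; the cross-type comparison is an
application-level choice of the kind the spec permits, not something XSD
defines.

```python
from fractions import Fraction
from typing import NamedTuple


class XsdNumber(NamedTuple):
    primitive: str   # "decimal", "double" or "float"
    value: Fraction  # the number itself, held exactly


def xsd_identical(a: XsdNumber, b: XsdNumber) -> bool:
    """XSD view: values from different primitive datatypes are distinct."""
    return a.primitive == b.primitive and a.value == b.value


def app_numerically_equal(a: XsdNumber, b: XsdNumber) -> bool:
    """Application view: compare the numbers and ignore the datatype tag."""
    return a.value == b.value


one_decimal = XsdNumber("decimal", Fraction(1))
one_double = XsdNumber("double", Fraction(1))

print(xsd_identical(one_decimal, one_double))          # distinct in XSD
print(app_numerically_equal(one_decimal, one_double))  # equal if we say so
```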

> >
> > 5. The definition of the xsd:duration datatype has been significantly
> revised. We should revisit the statement that "xsd:duration does not have a
> well-defined value space" and therefore should not be used in RDF.
>
> Indeed.
>
> > To begin with, I don't know what "well-defined" means in the context of
> this sentence. I do know that the confusion surrounding xsd:duration has to
> do with the fact that different months have different numbers of days, and
> the difficulty that arises when trying to compare a duration with a month
> component to one with (day/hour/minutes/seconds) components that total 28
> days or more.
>
> Not to mention leap years, leap seconds, etc..
>

They deal with leap seconds by conveniently ignoring them :-)

-Alex



>
> >
> > The duration definition in XSD 1.1 does have a clearly defined:
> >    - lexical space, which is the same as that in 1.0
> >    - value space, which is modeled as a [ months as xsd:integer, seconds
> as xsd:decimal ] tuple.
> >    - identity condition: two durations are identical if and only if
> their months and seconds components are both identical.
> >    - equality relation, which is the same as its identity relation.
> >    - partial ordering.
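
To illustrate the (months, seconds) value space, here is a minimal Python
sketch of the lexical-to-value mapping. The regular expression and the
duration_value name are my own, and the pattern is looser than the real
lexical space (it accepts a dangling 'T', for one), so treat it as an
illustration rather than a validator. The partial ordering is not covered.

```python
import re
from decimal import Decimal
from typing import Tuple

_DURATION = re.compile(
    r"(-)?P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?"
    r"(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?"
)


def duration_value(lexical: str) -> Tuple[int, Decimal]:
    """Map a duration lexical form to a (months, seconds) tuple."""
    match = _DURATION.fullmatch(lexical)
    # At least one of the six components has to be present.
    if match is None or not any(match.groups()[1:]):
        raise ValueError(f"not a duration this sketch understands: {lexical!r}")
    sign, years, months, days, hours, minutes, seconds = match.groups()
    total_months = 12 * int(years or 0) + int(months or 0)
    total_seconds = (
        Decimal(seconds or 0)
        + 60 * int(minutes or 0)
        + 3600 * int(hours or 0)
        + 86400 * int(days or 0)
    )
    if sign:
        total_months, total_seconds = -total_months, -total_seconds
    return total_months, total_seconds
```

With this mapping "P1Y" and "P12M" yield the same tuple, as do "P1D" and
"PT24H", while "P1M" and "P30D" yield different tuples, so they are neither
identical nor equal.
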
> >
> > Given these revisions, we should consider including xsd:duration in the
> list of RDF-compatible XSD types.
>
> Absolutely. I don't think this value space makes calendric sense as a
> specification of an actual duration, but that isn't our business.
>
> >
> > 6. We should add the following types, new in XSD 1.1, to the list of
> RDF-compatible XSD types:
> >    - xsd:dateTimeStamp, derived from xsd:dateTime by requiring a
> timezone offset.
> >    - xsd:dayTimeDuration, derived from xsd:duration by restricting the
> months component in the value space to be zero.
> >    - xsd:yearMonthDuration, derived from xsd:duration by restricting the
> seconds component in the value space to be zero.
> >
> > Regardless of what is decided for xsd:duration, we should include
> dayTimeDuration and yearMonthDuration since both of these types are totally
> ordered.
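
A short Python sketch of why the two restricted types are totally ordered,
taking duration values as (months, seconds) tuples. The function names are
hypothetical. Once one component is fixed at zero, comparing two values comes
down to comparing a single number.

```python
from decimal import Decimal
from typing import List, Tuple

Duration = Tuple[int, Decimal]  # (months, seconds)


def is_day_time_duration(d: Duration) -> bool:
    """dayTimeDuration: the months component is zero."""
    return d[0] == 0


def is_year_month_duration(d: Duration) -> bool:
    """yearMonthDuration: the seconds component is zero."""
    return d[1] == 0


def sort_day_time(durations: List[Duration]) -> List[Duration]:
    """Order dayTimeDuration values by their seconds component alone."""
    if not all(is_day_time_duration(d) for d in durations):
        raise ValueError("expected only dayTimeDuration values")
    return sorted(durations, key=lambda d: d[1])


one_day = (0, Decimal(86400))
one_hour = (0, Decimal(3600))
one_month = (1, Decimal(0))

print(sort_day_time([one_day, one_hour]))
```
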
>
> Agreed.
>
> Pat
>
> >
> > Regards,
> > Alex
> >
>
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Friday, 3 February 2012 22:27:50 UTC