ISSUE-178: time:Number - not a good idea?

State:
CLOSED
Product:
Time ontology in OWL
Raised by:
Simon Cox
Opened on:
2017-04-12
Description:
In order to support large numbers as well as arbitrary precision for numericDuration and numericPosition, we created a datatype time:Number being the union of xsd:float, xsd:double and xsd:decimal.

However, there are various gotchas with this, both from the pov of conforming to the full RDFS/OWL/SPARQL universe, and with inconsistent tool support.
I'll forward two emails that lay out the problem in gory detail.
Related Actions Items:
No related actions
Related emails:
  1. ISSUE-178: time:Number - not a good idea? (from Simon.Cox@csiro.au on 2017-04-17)
  2. RE: OWL-Time - issue with SPARQL endpoints lacking owl reasoner (from chris.little@metoffice.gov.uk on 2017-04-12)
  3. Re: OWL-Time - issue with SPARQL endpoints lacking owl reasoner (from jlieberman@tumblingwalls.com on 2017-04-12)
  4. RE: OWL-Time - issue with SPARQL endpoints lacking owl reasoner (from Simon.Cox@csiro.au on 2017-04-12)

Related notes:

Mail from Simon Cox to list on 2017-04-12:

In the new OWL-Time there is a class time:TimePosition which is expected to have one of

time:numericPosition - being a number on a time-line, or
time:nominalPosition - being a named era from an ordinal reference system

alongside a

time:hasTRS - which indicates the reference system that the value relates to.

time:numericPosition is intended to support things like Unix time (usually an integer or decimal) or geologic or cosmologic time, which could be a very large number. So we want the option of either xsd:decimal (which provides arbitrary precision) or xsd:double (scientific notation). So I created an OWL2 union datatype defined as follows (Turtle notation), and used it for the rdfs:range of time:numericPosition.

time:Number
  rdf:type rdfs:Datatype ;
  rdfs:comment "Generalized number"@en ;
  rdfs:comment "Note: integer is a specialization of decimal"@en ;
  rdfs:label "Number"@en ;
  owl:equivalentClass [
    rdf:type rdfs:Datatype ;
    owl:unionOf (
      xsd:double
      xsd:float
      xsd:decimal
    ) ;
  ] ;
.

[Note that OWL2 has types owl:real (but no lexical representation) and owl:rational (use xsd:double for the lexical representation), neither of which meets requirements. ]

A colleague has looked at a test dataset in which I had mixed values typed as xsd:float and time:Number, which should be OK. We ran SPARQL queries including FILTER expressions like

FILTER ( ?targetAge > xsd:decimal(?end) )
FILTER ( ?targetAge < xsd:decimal(?begin) )

My test environment (TopBraid Composer) produced the expected results, but Doug found that for a variety of SPARQL engines that are not OWL2 aware, while the > and < operators succeeded when an xsd:decimal was compared with an xsd:float, they failed when an xsd:decimal was compared to a time:Number.
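The failure described above can be sketched in a few lines. This is only an illustration of how an engine without OWL reasoning might dispatch a comparison on the literal's datatype IRI; the names `NUMERIC_PARSERS`, `FilterTypeError`, `to_number` and `filter_less_than` are hypothetical and do not come from Blazegraph, Virtuoso or any other engine. It assumes that a type error inside a FILTER simply drops the solution.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"
TIME = "http://www.w3.org/2006/time#"

# Datatype IRIs this sketch knows how to compute with (hypothetical table).
NUMERIC_PARSERS = {
    XSD + "integer": Decimal,
    XSD + "decimal": Decimal,
    XSD + "float": float,
    XSD + "double": float,
}


class FilterTypeError(Exception):
    """Raised when an operand has a datatype the engine cannot compute with."""


def to_number(lexical, datatype):
    parser = NUMERIC_PARSERS.get(datatype)
    if parser is None:
        # Without reasoning, time:Number is just an unknown datatype IRI.
        raise FilterTypeError(f"unsupported datatype: {datatype}")
    return parser(lexical)


def filter_less_than(left, right):
    """Evaluate `left < right`; a type error makes the filter reject the row."""
    try:
        a = to_number(*left)
        b = to_number(*right)
    except FilterTypeError:
        return False
    # Mixed decimal/float operands are promoted to float before comparing.
    if isinstance(a, float) or isinstance(b, float):
        a, b = float(a), float(b)
    return a < b


print(filter_less_than(("5.0", XSD + "float"), ("10", XSD + "decimal")))
print(filter_less_than(("5.0", TIME + "Number"), ("10", XSD + "decimal")))
```

The first call compares an xsd:float with an xsd:decimal and succeeds; the second uses the same lexical value typed as time:Number and is rejected, because nothing in the lookup connects that IRI to a numeric type.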

Are we being too clever? How to satisfy the requirement?

Simon

-----Original Message-----
From: Douglas Fils [mailto:dfils@oceanleadership.org]
Sent: Wednesday, 12 April, 2017 03:56
To: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>
Subject: OWL Time in SPARQL endpoints lacking owl reasoner

Simon,
We got a response at https://github.com/blazegraph/database/issues/59

Looks like, lacking a reasoner, Blazegraph isn’t going to connect time:Number to a concept it can do calculations with. Adam’s Virtuoso doesn’t even seem to understand XPath operators, so I couldn’t get to the point of seeing whether it does or not. It looks from the docs and the net that Virtuoso does do some OWL reasoning, but it’s not complete. So whether it has the coverage of operations needed, I do not know.

I worry that TopBraid is implementing elements that are not perhaps so common in the various SPARQL end points we are seeing in use. I confess, I don’t have the sample size to back up that statement.

If true though, I worry this could limit the use of a graph implementing OWL Time in the wild. I’ll leave it at that. You have far more experience and understanding of this than I. I would be very interested in your views and thoughts on this.

Thanks
Doug

Simon Cox, 12 Apr 2017, 07:56:55

Response from Antoine Zimmermann, 2017-04-12:

Simon,


I have several remarks with respect to your message concerning:
1) SPARQL engines supporting OWL
2) numeric values in XSD, RDF and OWL
3) precision & scientific notations (also related to your following email)

1) SPARQL engines supporting OWL
================================

Most SPARQL engines implement the standard "SPARQL 1.1 Query Language" or a subset of it. This standard does not talk about reasoning. In fact, you must not do reasoning at query time if you want to conform to this standard, otherwise you would get incorrect results. If you want to support reasoning as part of the engine, you have to implement a different standard: "SPARQL 1.1 Entailment Regimes". Few SPARQL engines implement it. Even if they do, they may not support the OWL entailment regime because you can also restrict yourself to RDFS, for instance.


2) numeric values in XSD, RDF and OWL
=====================================

There are weird subtleties in the XML Schema datatypes. First, although most xsd:float literals and xsd:double literals have to be interpreted as numbers (like all xsd:decimal literals), the value spaces of these two datatypes are considered mutually disjoint and disjoint from that of xsd:decimal. Second, the value spaces of xsd:float and xsd:double contain values that are not numbers, namely "NaN"^^xsd:float and "NaN"^^xsd:double. So, ironically, by trying to encompass all forms of numbers, you created a datatype "time:Number" that contains things that are not numbers.
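The two points above can be made concrete with a rough model. The sketch below is an assumption-laden simplification, not an XSD implementation: `value_of` and `UNION_MEMBERS` are hypothetical names, Python's `float` (binary64) stands in for both xsd:float and xsd:double, and `Decimal` accepts some lexical forms that xsd:decimal does not. Tagging each value with its datatype is one simple way to model disjoint value spaces.

```python
import math
from decimal import Decimal

# Members of the time:Number union, written as prefixed names for brevity.
UNION_MEMBERS = ("xsd:float", "xsd:double", "xsd:decimal")


def value_of(lexical, datatype):
    """Map a literal to a (datatype, value) pair in this simplified model."""
    if datatype not in UNION_MEMBERS:
        raise ValueError(f"{datatype} is not a member of the union")
    if datatype == "xsd:decimal":
        number = Decimal(lexical)
        if not number.is_finite():
            raise ValueError("xsd:decimal has no NaN or infinite values")
        return (datatype, number)
    # Simplification: binary64 is used for xsd:float as well as xsd:double.
    return (datatype, float(lexical))


# NaN is accepted by the union even though it is not a number.
nan_member = value_of("NaN", "xsd:double")
print(math.isnan(nan_member[1]))

# Same lexical form, different datatypes: distinct values in this model.
print(value_of("1.0", "xsd:float") == value_of("1.0", "xsd:decimal"))
```

In this model the first check prints True and the second prints False, which mirrors the two subtleties: the union admits a non-number, and equal-looking literals of different datatypes do not denote the same value.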

With OWL, you can create custom datatypes by combining supported datatypes (list given in Sec.4 of the "OWL 2 Structural Specification and Functional Style Syntax") with unionOf, intersectionOf, oneOf, complementOf and datatype restrictions. However, it is never possible to use a custom datatype IRI as the datatype IRI of a literal. That is, the following is invalid: "1.2e7"^^time:Number (according to the OWL 2 SS&FSS).
Consequently, the reasoning you can do with a unionOf datatype is very limited.

TopBraid Composer probably does not care about the OWL 2 SS&FSS and allows any RDF graph. What reasoning it's doing is unclear. Perhaps, when you do this in TopBraid:

FILTER( "123"^^ex:notDefinedDatatype < xsd:decimal(1234) )

it's converting everything to a string and comparing lexicographically?
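If a tool did fall back to string comparison (a guess, as the question above makes clear, and not a claim about what TopBraid actually does), the result would agree with numeric ordering only by accident. A small sketch with hypothetical helper names:

```python
from decimal import Decimal


def lexicographic_less(a: str, b: str) -> bool:
    """Compare the lexical forms character by character."""
    return a < b


def numeric_less(a: str, b: str) -> bool:
    """Compare the numbers the lexical forms denote."""
    return Decimal(a) < Decimal(b)


for left, right in [("123", "1234"), ("9", "10")]:
    print(left, right, lexicographic_less(left, right), numeric_less(left, right))
```

For "123" and "1234" both comparisons give the same answer, so the example in the FILTER above would look correct. For "9" and "10" they disagree, which is why a test dataset can pass while the behaviour is still wrong in general.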


3) Precision and scientific notation
====================================
The scientific notation has usually two purposes (as far as I know):
a) provide a concise notation for big or small numbers
b) (sometimes) provide an implicit notion of precision

In order to support a), your solution is to allow xsd:float and xsd:double. It makes sense, but I would say that it may not be necessary. IMHO, we should not assume that people are going to write down RDF files manually, or read RDF files visually. They will either be programmers or end users.
- Programmers load RDF to memory and save RDF to files with programming functions. They don't have to look at the literals in their stored form and don't have to write them explicitly.
- End users will input data values with interfaces that can allow things like 1.5e5 to be stored as an xsd:decimal, and that allow the users to see that a quantity stored as 45120084650320 is "45.12 trillion" or "45,120,084,650,320" or other user friendly notation.
Moreover, scientific notations could be written 1.5×10^5 instead.
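As a minimal sketch of the interface idea above, the following uses Python's `decimal` module to accept scientific notation on input, store a plain decimal lexical form, and render a stored value in friendlier ways. The function names are hypothetical, and the formatting choices are only examples.

```python
from decimal import Decimal


def to_plain_decimal(user_input: str) -> str:
    """Turn input such as '1.5e5' into a plain decimal string with no exponent."""
    return format(Decimal(user_input), "f")


def display_grouped(stored: str) -> str:
    """Render a stored integer-valued decimal with thousands separators."""
    return f"{int(Decimal(stored)):,}"


def display_trillions(stored: str) -> str:
    """Render a stored decimal as a count of trillions, to two places."""
    return f"{Decimal(stored) / Decimal(10**12):.2f} trillion"


print(to_plain_decimal("1.5e5"))            # 150000
print(display_grouped("45120084650320"))    # 45,120,084,650,320
print(display_trillions("45120084650320"))  # 45.12 trillion
```

The stored value stays an exact decimal throughout; the scientific notation and the friendly renderings live only at the input and display edges.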

In order to support b), xsd:float and xsd:double are not sufficient (in fact, they are useless for that). A notation like 4e17 may lead some scientists to believe that this is an approximation ("roughly" 400 quadrillion) but this is not how the XSD standard works. "4e17"^^xsd:float is *exactly* the value denoted by 400,000,000,000,000,000 in anglo-saxon writing. In order to support precision, you would need an extra value (that could be stored as xsd:decimal to allow arbitrary precision) such that a pair (400000000000000000,100000000000000000) is understood to be "4*10^17 ±10^17".
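Both halves of this argument can be sketched briefly. The check below uses Python's `float`, which is IEEE binary64 and so corresponds to xsd:double rather than xsd:float; a 32-bit float would store the nearest representable value instead, but it would still be one exact value and not a range. The `Measured` class is a hypothetical illustration of the (value, uncertainty) pair, not something defined by OWL-Time.

```python
from dataclasses import dataclass
from decimal import Decimal

# A binary64 float denotes one exact value; Decimal(float) converts it exactly.
exact = Decimal(4e17)
print(exact == Decimal("400000000000000000"))


@dataclass(frozen=True)
class Measured:
    """A value with an explicit uncertainty, both as arbitrary-precision decimals."""
    value: Decimal
    uncertainty: Decimal

    def interval(self):
        """Return the (low, high) bounds implied by value ± uncertainty."""
        return (self.value - self.uncertainty, self.value + self.uncertainty)


age = Measured(Decimal("400000000000000000"), Decimal("100000000000000000"))
low, high = age.interval()
print(low, high)
```

The precision is carried by the second number, not by the notation of the first, which is the point of the pair.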

To conclude, my position is that xsd:float and xsd:double are not really needed here and I support getting rid of them, but I would not fight for it.


Hope this helps.
--AZ

Simon Cox, 12 Apr 2017, 07:58:26

Proposed resolution https://github.com/w3c/sdw/pull/689

Simon Cox, 12 Apr 2017, 08:45:36

xsd:float, xsd:double and xsd:decimal are entirely separate primitive datatypes and also defined as pairwise disjoint, so there is no logical way to compare them in XSD - RDF - OWL. Tools would need to intercept and manually recast them (e.g. in TopBraid), but that is not going to be interoperable.

Joshua Lieberman, 12 Apr 2017, 14:58:43
