A possible structure of the datatype system for OWL 2 (related to ISSUE-126)

Hello,

After a very in-depth discussion of the issues related to datatypes (thanks to everyone involved!), I thought it would be good to
summarize some of the outcomes of the discussion and to outline a possible structure of the datatype system. Thus, in this e-mail,
I'll try to (semi-)formally define a datatype map -- the "thing" that defines how datatypes would work in OWL 2.

1. Datatype Map
----------------

A datatype map consists of the following things:

- a set of datatypes
  - each datatype provides a set of allowed facets
- a possibly infinite set of constants (likely to be renamed to literals, but I'll stick to "constant" for the moment)
  - each constant consists of a lexicalValue and a typeURI
  - it is written as "lexicalValue"^^typeURI

Each datatype DT is assigned a value space DT^D, which is just a nonempty set.

Each constant c is assigned a value c^D, which is just an object from the union of the value spaces of all datatypes.


Thus, a datatype can be thought of as a class with a predefined extension. Note that this definition does not assume any relationship
between the set of supported typeURIs (which determine the allowed constants) and the set of datatypes (which determine the allowed
sets of values).
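
The definition above can be sketched as a small Python data structure. This is only an illustration under assumptions not made in the proposal: value spaces are modelled as membership tests, owl:number values are represented as exact `fractions.Fraction` objects, the facet names are borrowed from XML Schema, and the names `Datatype`, `DatatypeMap`, `value_of` and `datatypes_of` are hypothetical. Datatypes and supported typeURIs live in two separate tables, mirroring the point that the two sets need not be related.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Any, Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Datatype:
    """A datatype DT: its URI, a membership test for DT^D, and its allowed facets."""
    uri: str
    contains: Callable[[Any], bool]
    facets: FrozenSet[str] = frozenset()


@dataclass
class DatatypeMap:
    # The set of datatypes, keyed by URI.
    datatypes: Dict[str, Datatype] = field(default_factory=dict)
    # Supported typeURIs: each maps a lexicalValue to its value c^D.
    # Deliberately independent of `datatypes`.
    lexical_maps: Dict[str, Callable[[str], Any]] = field(default_factory=dict)

    def value_of(self, lexical: str, type_uri: str) -> Any:
        """Return c^D for the constant "lexical"^^type_uri."""
        return self.lexical_maps[type_uri](lexical)

    def datatypes_of(self, value: Any) -> List[str]:
        """URIs of all datatypes whose value space contains `value`."""
        return sorted(uri for uri, dt in self.datatypes.items() if dt.contains(value))


def is_number(v: Any) -> bool:
    # Hypothetical encoding: owl:number values are Fractions (exact rationals).
    return isinstance(v, Fraction)


def is_integer(v: Any) -> bool:
    return is_number(v) and v.denominator == 1


RANGE_FACETS = frozenset({"minInclusive", "maxInclusive", "minExclusive", "maxExclusive"})

dmap = DatatypeMap(
    datatypes={
        "owl:number": Datatype("owl:number", is_number, RANGE_FACETS),
        "xsd:integer": Datatype("xsd:integer", is_integer, RANGE_FACETS),
    },
    # xsd:int is a supported typeURI, but not a datatype in this map.
    lexical_maps={"xsd:int": lambda s: Fraction(int(s))},
)

print(dmap.datatypes_of(dmap.value_of("42", "xsd:int")))  # ['owl:number', 'xsd:integer']
```

Here "42"^^xsd:int denotes a value that lies in both owl:number and xsd:integer, even though xsd:int never appears among the datatypes.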

2. Allowed datatypes
---------------------

Conformant OWL 2 implementations would be required to support the following base datatypes, whose value spaces would be pairwise
disjoint:

- owl:number - the value space is the set of all real numbers
- xsd:string - the value space is the set of all Unicode strings in normal form C
- owl:internationalizedString - the value space is the set of pairs of the form (string,langTag)
- xsd:hexBinary - the value space is the set of all finite sequences of octets

The following datatype would also be supported in OWL 2:

- xsd:integer - the value space is the subset of the value space of owl:number containing all integers

Finally, we might support the following "shortcut" datatypes, whose value spaces can be defined from the value spaces of the
above-mentioned datatypes using facets:

- various xsd:integer derivatives, such as xsd:int and xsd:long
- various xsd:string derivatives, such as xsd:Name
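
The layering of base datatypes, xsd:integer, and facet-defined shortcuts can be illustrated with the Python sketch below. It rests on assumptions of its own: the four base value spaces are encoded as distinct Python types (so pairwise disjointness holds by construction), real numbers are approximated by exact rationals (`fractions.Fraction`), and xsd:int and xsd:long are obtained from xsd:integer by minInclusive/maxInclusive facets with the ranges XML Schema assigns them. The function names are hypothetical.

```python
import unicodedata
from fractions import Fraction
from typing import Any

# Hypothetical encoding of the base value spaces (pairwise disjoint Python types):
#   owl:number                  -> Fraction (rationals stand in for the reals)
#   xsd:string                  -> str in Unicode normal form C
#   owl:internationalizedString -> tuple (string, langTag)
#   xsd:hexBinary               -> bytes (a finite sequence of octets)


def base_datatype(value: Any) -> str:
    """Return the unique base datatype whose value space contains `value`."""
    if isinstance(value, Fraction):
        return "owl:number"
    if isinstance(value, str) and unicodedata.is_normalized("NFC", value):
        return "xsd:string"
    if (isinstance(value, tuple) and len(value) == 2
            and all(isinstance(part, str) for part in value)):
        return "owl:internationalizedString"
    if isinstance(value, bytes):
        return "xsd:hexBinary"
    raise TypeError(f"not in any base value space: {value!r}")


def is_integer(value: Any) -> bool:
    """xsd:integer: the integers within the value space of owl:number."""
    return isinstance(value, Fraction) and value.denominator == 1


# Shortcut datatypes: xsd:integer restricted by (minInclusive, maxInclusive).
SHORTCUTS = {
    "xsd:int": (-2**31, 2**31 - 1),
    "xsd:long": (-2**63, 2**63 - 1),
}


def in_shortcut(value: Any, name: str) -> bool:
    min_inclusive, max_inclusive = SHORTCUTS[name]
    return is_integer(value) and min_inclusive <= value <= max_inclusive


print(base_datatype(Fraction(7)), base_datatype(("chat", "fr")))
# owl:number owl:internationalizedString
print(in_shortcut(Fraction(2**31), "xsd:int"), in_shortcut(Fraction(2**31), "xsd:long"))
# False True
```

String shortcuts such as xsd:Name would be handled the same way, with a pattern facet on xsd:string in place of the range facets.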

3. Allowed constants
---------------------

Conformant OWL 2 implementations are required to support the following constant types:

- "nnn"^^xsd:int and all derivatives that fall within xsd:int - all such constants are to be interpreted as elements of owl:number
- "aaEbb"^^xsd:float - all such constants save for NaN and +-inf are to be interpreted as elements of owl:number
- "abc"^^xsd:string - interpreted as "abc"
- "abc"@langTag - interpreted as a pair ("abc",langTag)
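
One way an implementation might map these constants to values is sketched below in Python. The following are assumptions, not part of the proposal: owl:number values are exact `fractions.Fraction` objects; the "derivatives that fall within xsd:int" are taken to be xsd:short, xsd:byte, xsd:unsignedShort and xsd:unsignedByte with their XML Schema ranges; an xsd:float constant denotes the exact value of the nearest single-precision float; and lexical forms are only loosely checked. The `interpret` function is hypothetical.

```python
import math
import struct
from fractions import Fraction
from typing import Optional, Tuple, Union

Value = Union[Fraction, str, Tuple[str, str]]

# xsd:int and (assumed) derivatives whose ranges fall within xsd:int.
INT_RANGES = {
    "xsd:int": (-2**31, 2**31 - 1),
    "xsd:short": (-2**15, 2**15 - 1),
    "xsd:byte": (-2**7, 2**7 - 1),
    "xsd:unsignedShort": (0, 2**16 - 1),
    "xsd:unsignedByte": (0, 2**8 - 1),
}


def interpret(lexical: str, type_uri: Optional[str] = None,
              lang: Optional[str] = None) -> Value:
    """Return c^D for "lexical"^^type_uri, or for "lexical"@lang if lang is given."""
    if lang is not None:
        return (lexical, lang)  # "abc"@langTag -> ("abc", langTag)
    if type_uri == "xsd:string":
        return lexical  # "abc"^^xsd:string -> "abc"
    if type_uri in INT_RANGES:
        n = int(lexical)  # note: int() is more lenient than the XSD lexical space
        low, high = INT_RANGES[type_uri]
        if not low <= n <= high:
            raise ValueError(f"{lexical} is out of range for {type_uri}")
        return Fraction(n)  # an element of owl:number
    if type_uri == "xsd:float":
        try:
            # Round to single precision (via a double, so rare double-rounding
            # differences from a direct conversion are possible).
            (f,) = struct.unpack("<f", struct.pack("<f", float(lexical)))
        except OverflowError:
            raise ValueError(f"{lexical} is outside the single-precision range") from None
        if math.isnan(f) or math.isinf(f):
            raise ValueError("NaN and +-inf are not elements of owl:number")
        return Fraction(f)  # exact: every finite float is a rational number
    raise ValueError(f"unsupported typeURI: {type_uri}")


print(interpret("42", "xsd:int"), interpret("0.5", "xsd:float"), interpret("abc", lang="en"))
# 42 1/2 ('abc', 'en')
```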


4. Discussion
--------------

The set of constants is chosen such that implementations don't need to support numbers with arbitrary precision, which might be
quite cumbersome. In fact, implementations are only required to support 32-bit integers and single-precision floating-point numbers.
There are efficient ways to represent these on virtually all systems.
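
As a small illustration of why these fixed-width constants are cheap to store yet still denote exact elements of owl:number, the Python sketch below keeps a constant in its native form and converts it to an exact rational only when values must be compared. It relies on the fact that every finite single-precision float is a dyadic rational, so the conversion loses nothing; using `fractions.Fraction` for owl:number and the helper names are assumptions of this sketch.

```python
import struct
from fractions import Fraction


def float32_value(x: float) -> Fraction:
    """Exact value in owl:number of the single-precision float nearest to x (finite x only)."""
    (f,) = struct.unpack("<f", struct.pack("<f", x))
    return Fraction(f)  # exact conversion, no rounding


def int32_value(n: int) -> Fraction:
    """Exact value in owl:number of a 32-bit integer constant."""
    if not -2**31 <= n <= 2**31 - 1:
        raise ValueError("not a 32-bit integer")
    return Fraction(n)


# "1"^^xsd:int and "1.0"^^xsd:float denote the same element of owl:number.
print(int32_value(1) == float32_value(1.0))  # True
# "0.1"^^xsd:float denotes a dyadic rational close to, but not equal to, 1/10.
print(float32_value(0.1))  # 13421773/134217728
# Not every 32-bit integer is a single-precision float: 2**24 + 1 rounds to 2**24.
print(float32_value(float(2**24 + 1)) == int32_value(2**24 + 1))  # False
```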

The set of datatypes, however, allows one to refer to the sets of all integers and real numbers. This allows one to specify the
ontology in a way that makes reasoning easy.
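
One possible reading of this point: because xsd:integer and owl:number denote the mathematical integers and reals rather than machine types, questions such as "is this facet-restricted datatype empty?" or "how many values does it contain?" reduce to simple arithmetic. The Python sketch below illustrates that reading only; the interval representation and the function names are hypothetical, and only the four range facets are considered.

```python
import math
from fractions import Fraction


def integer_count(low: Fraction, high: Fraction,
                  low_inclusive: bool, high_inclusive: bool) -> int:
    """Number of integers in an interval given by min/max(In|Ex)clusive facets."""
    first = math.ceil(low) if low_inclusive else math.floor(low) + 1
    last = math.floor(high) if high_inclusive else math.ceil(high) - 1
    return max(0, last - first + 1)


def number_interval_empty(low: Fraction, high: Fraction,
                          low_inclusive: bool, high_inclusive: bool) -> bool:
    """Emptiness of the same interval over owl:number (the reals are dense)."""
    if low < high:
        return False
    return not (low == high and low_inclusive and high_inclusive)


# minExclusive 5, maxExclusive 6: empty over xsd:integer, non-empty over owl:number.
print(integer_count(Fraction(5), Fraction(6), False, False))  # 0
print(number_interval_empty(Fraction(5), Fraction(6), False, False))  # False
```

A reasoner could use such counts, for example, to detect that requiring two distinct xsd:integer values in the interval [5, 6) is unsatisfiable, since that interval contains only the integer 5.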

Implementations are free to support other constants as well. Such extensions do not necessarily require new
datatypes (i.e., new value spaces). For example, an implementation might choose to support arbitrary-precision numbers via constants
of the form "123.03"^^xsd:decimal; the proposed list of datatypes already contains the appropriate value space for such
constants (i.e., owl:number).
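
A hypothetical xsd:decimal extension of this kind could map each lexical form straight into the existing owl:number value space, without introducing a new datatype. In the Python sketch below, owl:number is again encoded (as an assumption) with exact `fractions.Fraction` values, the accepted syntax is modelled on the XML Schema 1.1 lexical pattern for xsd:decimal, and `decimal_value` is an illustrative name.

```python
import re
from fractions import Fraction

# Modelled on the XML Schema 1.1 lexical pattern for xsd:decimal.
DECIMAL_RE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def decimal_value(lexical: str) -> Fraction:
    """Map "lexical"^^xsd:decimal to an exact element of owl:number."""
    if not DECIMAL_RE.fullmatch(lexical):
        raise ValueError(f"not an xsd:decimal lexical form: {lexical!r}")
    sign = -1 if lexical.startswith("-") else 1
    whole, _, frac = lexical.lstrip("+-").partition(".")
    scale = 10 ** len(frac)
    return sign * Fraction(int(whole or "0") * scale + int(frac or "0"), scale)


print(decimal_value("123.03"))  # 12303/100
# Same value space as the required numeric constants: "123.00"^^xsd:decimal
# and "123"^^xsd:int denote the same owl:number.
print(decimal_value("123.00") == Fraction(int("123")))  # True
```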

The open issues are what to do with NaN and +-inf and with date-time datatypes.

Regards,

	Boris

Received on Tuesday, 8 July 2008 16:18:40 UTC