XML Schema Datatypes in RDF and OWL

Editors' Draft $Date: 2004/12/13 17:37:34 $

Editors: Jeremy J. Carroll; Jeff Z. Pan

Latest editors' draft (CVS copy): http://www.w3.org/2001/sw/BestPractices/XSCH/xsch-sw/

Abstract

The RDF and OWL Recommendations use the simple types from XML Schema. This document discusses three questions left unanswered by these Recommendations: What URIref should be used to refer to a user defined datatype? Which values of which XML Schema simple types are the same? How to use the problematic xsd:duration in RDF and OWL?

Status of this Document

This is an editors' draft for discussion by the SWBPD WG and other interested parties. The remainder of this status section is fictitious.

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is intended to be a part of a future W3C Working Group Note from the Semantic Web Best Practices and Deployment Working Group, part of the W3C Semantic Web Activity.

This document is intended for public discussion: it poses two questions to do with the use of XML Schema simple types within the Semantic Web, and sketches multiple answers. None of these suggestions has any recorded consensus around it. After public feedback, possibly including additional answers to these questions, the WG, in co-ordination with other W3C WGs, hopes to agree on a single answer to each of the two questions. This will then form the basis of a second publication indicating a suggested best practice.

In addition, we update the Semantic Web community concerning on-going progress about the duration datatype.

We particularly seek feedback reporting on implementation experience on these issues.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

0. Introduction
1. Related Datatype Formalisms
@@@todo
2. User Defined Datatypes
3. Comparison of Values
4. Duration
5. References

0. Introduction

An overview of the datatype abstraction used by RDF is found in the [RDF Concepts and Abstract Syntax]; this is shared by the [OWL Abstract Syntax]. Key ideas of RDF datatyping and OWL datatyping are summarized in Section 1. Related Datatype Formalisms.

RDF and OWL allow the use of typed literal values in the description of resources and ontologies. See the [RDF Primer] and the [OWL Guide] for more introductory treatments of RDF and OWL. Both the [RDF Semantics] and the [OWL Semantics] use the lexical-to-value mapping of the datatype to give the interpretation (the value) of a typed literal; thus the semantics of typed literals is given by the type system. The type systems are defined externally to RDF and OWL, most notably by [XML Schema2].

Concrete syntaxes for typed literals are found in [RDF Syntax], [N-triples], and [N3]. In this document we use N3 such as "10"^^xsd:int following the subset used by the [OWL Test Cases], with the following namespace prefixes:

@@@todo

Some questions about XML Schema datatypes in the Semantic Web are not directly answered by the published W3C Recommendations. This document considers three of them:

Within RDF and OWL, how to refer to an XML Schema user defined simple type with a URI.
Details of the denotational semantics of the values of the primitive XML Schema simple types. XML Schema principally gives an operational semantics. RDF and OWL applications need a denotational semantics for interoperable behaviour.
A possible solution to the problems concerning xsd:duration, which are reported in [RDF Semantics].

Before we go into details of these problems, we briefly summarize related datatype formalisms.

1. Related Datatype Formalisms

1.1 XML Schema Simple Types

[XML SCHEMA2] defines facilities for defining simple types to be used in XML Schema as well as other XML specifications.

[Definition:] An XML Schema simple type d is characterised by a value space, V(d), which is a non-empty set, a lexical space, L(d), which is a non-empty set of Unicode strings, and a set of facets, F(d), each of which characterizes a value space along independent axes or dimensions.

XML Schema simple types are divided into disjoint built-in simple types and derived simple types. Derived datatypes can be defined by derivation from primitive or existing derived datatypes by the following three means:

Derivation by restriction, i.e., by using facets on an existing type, so as to limit the number of possible values of the derived type.
Derivation by union, i.e., to allow values from a list of simple types.
Derivation by list, i.e., to define the list type of an existing simple type.

Example 1A

The following is the definition of a derived simple type (of the base datatype xsd:integer) which restricts values to integers greater than or equal to 0 and less than 150, using the facets minInclusive and maxExclusive.

   <xsd:schema ...>
     <xsd:simpleType name="humanAge">
       <xsd:restriction base="integer">
        <xsd:minInclusive value="0"/>
        <xsd:maxExclusive value="150"/>
       </xsd:restriction>
     </xsd:simpleType>
     ...
   </xsd:schema>

1.2 Datatypes in RDF

According to [RDF Semantics], RDF allows the use of datatypes defined by any external type systems, e.g., the XML Schema type system, which conform to the following specification.

[Definition:] In RDF, a datatype d is characterised by a value space, V(d), which is a non-empty set, a lexical space, L(d), which is a non-empty set of Unicode strings, and a total mapping L2V(d) from the lexical space to the value space.

This specification allows the use of non-list XML Schema simple types as datatypes in RDF.

[Definition:] All literals have a lexical form being a Unicode [UNICODE] string. Typed literals are of the form "v"^^u, where "v" is a Unicode string, called the lexical form of the typed literal, and u is a URI reference of a datatype. Plain literals have a lexical form and optionally a language tag as defined by [RFC-3066], normalized to lowercase.

Example 1B

Boolean is a datatype with value space {true,false}, lexical space {"true", "false","1","0"} and lexical-to-value mapping {"true"→true, "false"→false, "1"→true, "0"→false}. "true"^^xsd:boolean is a typed literal, while "true" is a plain literal.
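The three components of this definition can be sketched in Python. The class name `Datatype` and its fields are illustrative assumptions, not part of any RDF library, and a finite set is used for the lexical space only because boolean happens to have one. The boolean tables are copied from Example 1B.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet


@dataclass(frozen=True)
class Datatype:
    """Illustrative model of an RDF datatype: a lexical space plus L2V."""
    name: str
    lexical_space: FrozenSet[str]
    l2v: Callable[[str], Any]  # total on lexical_space

    def value_of(self, lexical_form: str) -> Any:
        # L2V(d) is only defined on L(d); other strings are ill-typed.
        if lexical_form not in self.lexical_space:
            raise ValueError(f"{lexical_form!r} is not in L({self.name})")
        return self.l2v(lexical_form)


_BOOL_L2V = {"true": True, "false": False, "1": True, "0": False}
boolean = Datatype("boolean", frozenset(_BOOL_L2V), _BOOL_L2V.__getitem__)

# The value space V(boolean) is the image of L(boolean) under L2V(boolean).
value_space = {boolean.value_of(s) for s in boolean.lexical_space}
print(value_space == {True, False})
print(boolean.value_of("1") is boolean.value_of("true"))
```

Two distinct lexical forms ("1" and "true") map to one value, which is the situation Section 3 examines across datatypes.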

The associations between datatype URI references (e.g., xsd:boolean) and datatypes (e.g., boolean) can be provided by datatype maps defined as follows.

[Definition:] A datatype map D is a partial mapping from datatype URI references to datatypes.

An RDFS-interpretation w.r.t. a datatype map D can be defined as follows.

[Definition:] Given a datatype map D, an RDFS D-interpretation I of a vocabulary V is any RDFS-interpretation of V∪{u |∃d.D(u)=d} which introduces (i) a distinguished subset LV of IR, called the set of literal values, which contains all the plain literals in V, and (ii) a mapping IL from literals in V into IR, and satisfies the following extra conditions:

LV = ICEXT(rdfs:Literal).
For any plain literal pl∈V, IL(pl) = pl.
For each pair <u,d> where d = D(u),
- I(u) ∈ ICEXT(rdfs:Datatype),
- there exists d∈IR s.t. I(u) = d,
- ICEXT(d) = V(d) ⊆ LV,
- for "s"^^u'∈V, I(u') = d, if s∈L(d), then IL("s"^^u') = L2V(d)(s); otherwise, IL("s"^^u') ∈ IR \ LV.
If d ∈ ICEXT(rdfs:Datatype), then <d, I(rdfs:Literal)> ∈ IEXT(rdfs:subClassOf).

1.3 Datatypes in OWL

The fundamental difference between RDF datatyping and OWL DL datatyping is the relationship between datatypes and classes. In OWL DL, datatypes are not classes, and the object and datatype domains are disjoint from each other.

OWL allows different OWL reasoners to provide different supported datatypes.

[Definition:] Given a datatype map D, a datatype URI reference u is called a supported datatype URI reference w.r.t. D if there exists a datatype d such that <u,d>∈D (in this case, d is called a supported datatype w.r.t. D); otherwise, u is called an unsupported datatype URI reference w.r.t. D.

OWL supports so-called enumerated datatypes, which are built using literals.

[Definition:] Let y₁, ..., y_n be literals. An enumerated datatype is of the form oneOf(y₁, ..., y_n).

An OWL DL D-interpretation w.r.t. a datatype map D can be defined as follows.

[Definition:] An OWL DL datatype interpretation w.r.t. a datatype map D is a pair (LV,ED), where the datatype domain LV = PL ∪ ⋃ V(D(u)), with the union ranging over each supported datatype URIref u w.r.t. D (PL is the value space for plain literals, i.e., the union of the set of Unicode strings and the set of pairs of Unicode strings and language tags), and ED is a datatype interpretation function, which has to satisfy the following conditions:

LV = ED(rdfs:Literal).
For any plain literal pl, ED(pl) = pl ∈ PL.
For each supported datatype URIref u (let d = D(u)):
- ED(u) = V(d) ⊆ LV,
- if s ∈ L(d), then ED("s"^^u) = L2V(d)(s); otherwise, ED("s"^^u) is not defined.
For each unsupported datatype URIref u, ED(u) ⊆ LV and ED("s"^^u) ∈ ED(u).
Each enumerated datatype oneOf(y₁, ..., y_n) is interpreted as {ED(y₁)}∪ ... ∪ {ED(y_n)}.

Note that here we simplify the presentation by using ED as the interpretation function for both datatype URI references and literals, while [OWL Semantics] uses EC for datatype URI references and L for literals.

As far as OWL Full is concerned, the disjointness restriction on the object and datatype domains is not required.

1.4 Datatypes in Description Logics

[Pan 2004] presents a scheme of integrating a large family of decidable Description Logics (including SHOIN, the underpinning of OWL DL) with unary datatype groups, so as to support user defined datatypes. A combined DL is decidable if the unary datatype group is conforming. A conforming unary datatype group is equipped with a decision procedure for the satisfiability problem of finite conjunctions over supported datatypes.

[Definition:] A unary datatype group G is a triple <D,B,dom>, where D is a datatype map, B is the set of primitive base datatype URI references in G and dom is the declared domain function. We call S the set of supported datatype URI references, i.e., for each u∈S, D(u) is defined; we require B ⊆ S. The declared domain function dom has the following properties: for each u ∈ S, if u ∈ B, dom(u) = u; otherwise, dom(u) = v, where v ∈ B. We assume that there exists a datatype URI reference rdfsx:DatatypeBottom such that D(rdfsx:DatatypeBottom) is undefined.

Note that in [Pan 2004] datatype groups allow arbitrary datatype predicates, while here we consider only datatypes, which can be regarded as unary datatype predicates.

Example 1C

G₁=(D₁,B₁,dom₁) is a unary datatype group, where

D₁ = {xsd:integer → integer, xsd:string → string, xsd:nonNegativeInteger → ≥₀, xsdx:integerLessThanN → <_N},
B₁ = {xsd:integer, xsd:string},
dom₁ = {xsd:integer → xsd:integer, xsd:string → xsd:string, xsd:nonNegativeInteger → xsd:integer, xsdx:integerLessThanN → xsd:integer}.

According to D₁, we have S₁ = {xsd:integer, xsd:string, xsd:nonNegativeInteger, xsdx:integerLessThanN}, hence we have B₁ ⊆ S₁. Note that the value space of <_N is V(<_N) = {i ∈ V(integer) | i < L2V(integer)(N)} and by <_N we mean there exists a built-in datatype <_N for each integer L2V(integer)(N).
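The structural conditions of the definition can be checked mechanically. The following is a minimal sketch that encodes G₁ with plain dictionaries; the function name `is_unary_datatype_group` and the string labels standing in for datatypes are assumptions made for illustration only.

```python
# Datatype map D1: datatype URI reference -> placeholder label for the datatype.
D1 = {
    "xsd:integer": "integer",
    "xsd:string": "string",
    "xsd:nonNegativeInteger": ">=0",
    "xsdx:integerLessThanN": "<N",
}
# Primitive base datatype URI references B1.
B1 = {"xsd:integer", "xsd:string"}
# Declared domain function dom1.
dom1 = {
    "xsd:integer": "xsd:integer",
    "xsd:string": "xsd:string",
    "xsd:nonNegativeInteger": "xsd:integer",
    "xsdx:integerLessThanN": "xsd:integer",
}


def is_unary_datatype_group(D, B, dom):
    """Check the structural conditions stated in the definition."""
    S = set(D)  # supported URI references: those on which D is defined
    if "rdfsx:DatatypeBottom" in S:
        return False  # D(rdfsx:DatatypeBottom) must be undefined
    if not B <= S or set(dom) != S:
        return False
    for u in S:
        if u in B and dom[u] != u:
            return False  # a base datatype is its own declared domain
        if u not in B and dom[u] not in B:
            return False  # otherwise the declared domain is a base datatype
    return True


print(is_unary_datatype_group(D1, B1, dom1))
```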

In a unary datatype group, datatype expressions can be used to represent user defined datatypes.

[Definition:] Let G be a unary datatype group. The set of G unary datatype expressions, abbreviated Dexp(G), is inductively defined as follows:

let u be a datatype URI reference, u ∈ Dexp(G);
let u be a datatype URI reference, its (relativised) negation not(u) ∈ Dexp(G);
let y₁, ..., y_n be literals, the enumerated datatype oneOf(y₁, ..., y_n) ∈ Dexp(G);
for any p,q ∈ Dexp(G), their conjunction and(p,q) ∈ Dexp(G);
for any p,q ∈ Dexp(G), their disjunction or(p,q) ∈ Dexp(G).

Example 1D

The XML Schema user defined datatype humanAge defined in [Example 1A] can be represented by the following unary datatype expression:

and(xsd:nonNegativeInteger, xsdx:integerLessThan150).

@@@ todo formatting, e.g. above should be centered and ???

[Definition:] A datatype interpretation of a unary datatype group G = (D,B,dom) is a pair (LV,ED), where the datatype domain LV is a non-empty set and ED is a datatype interpretation function that has to satisfy the following conditions:

ED(rdfs:Literal) = LV and ED(rdfsx:DatatypeBottom) = ∅.
For each plain literal pl, ED(pl) = pl ∈ PL and PL ⊆ LV.
For any two primitive base datatype URI references u₁ and u₂, ED(u₁) ∩ ED(u₂) = ∅.
For each supported datatype URI reference u ∈ S (let d = D(u)):
- ED(u) = V(d) ⊆ V(D(dom(u))) ⊆ LV, L(D(u)) ⊆ L(D(dom(u))) and L2V(D(u)) ⊆ L2V(D(dom(u))),
- if s ∈ L(d), then ED("s"^^u) = L2V(d)(s); otherwise, ED("s"^^u) is not defined.
For each unsupported datatype URI reference u ∉ S, ED(u) ⊆ LV and ED("s"^^u) ∈ ED(u).

The datatype interpretation function ED can be extended to provide semantics to unary datatype expressions as follows:

Relativised negations: if u ∈ S \ B, ED(not(u)) = ED(dom(u)) \ ED(u); otherwise, ED(not(u)) = LV \ ED(u).
Enumerated datatypes: ED(oneOf(y₁, ..., y_n)) = {ED(y₁)}∪ ... ∪ {ED(y_n)}.
Conjunctions: ED(and(p,q)) = ED(p) ∩ ED(q).
Disjunctions: ED(or(p,q)) = ED(p) ∪ ED(q).

The reader is referred to [Pan 2004] for more details about conforming unary datatype groups and how to combine them with Description Logics.

2. User Defined Datatypes

[XML Schema2] predefines about forty simple types; the ones suitable for RDF and OWL are listed in [RDF Semantics].

In addition, XML Schema permits users to refine these built-in types by taking a restriction including only some of the values or some of the lexical forms. As an example, we may wish to talk about ages of adults in years, where an adult is aged 18 or over. This can be described as a restriction on the xsd:integer datatype.

   <xsd:schema ...>
     <xsd:simpleType name="adultAge">
       <xsd:restriction base="integer">
        <xsd:minInclusive value="18"/>
       </xsd:restriction>
     </xsd:simpleType>
     ...
   </xsd:schema>

In a Semantic Web context this may be used with the objects of triples of an eg:age property, used, for instance, when describing some members of a club which is restricted to adults, e.g. a nightclub or a political party.

We will use this example throughout this section, and assume it can be retrieved from http://example.org/simpleTypes.

Within RDF, and RDF reasoning, this additional restriction may be enough to catch some typos or data entry errors (e.g. putting an inappropriate value of 0 for the eg:age property). Within OWL, and OWL reasoning, this may interact with axioms in the ontology to significantly restrict the possible interpretations, adding to the modelling power of the language.
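As a rough illustration of the error-catching role described above, the following sketch maps a lexical form to a value of the adultAge restriction and rejects anything outside it. The function name `parse_adult_age` is hypothetical, and the lexical check is a simplified reading of the xsd:integer lexical space (optional sign followed by ASCII digits).

```python
def parse_adult_age(lexical_form: str) -> int:
    """Return the adultAge value for a lexical form, or raise ValueError."""
    # Simplified xsd:integer lexical check: optional sign, then digits.
    digits = lexical_form[1:] if lexical_form[:1] in "+-" else lexical_form
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"{lexical_form!r} is not an integer lexical form")
    value = int(lexical_form)
    if value < 18:  # the minInclusive facet of the restriction
        raise ValueError(f"{value} violates minInclusive 18")
    return value


for candidate in ["40", "018", "0", "forty"]:
    try:
        print(candidate, "->", parse_adult_age(candidate))
    except ValueError as error:
        print(candidate, "-> rejected:", error)
```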

2.1 Problem Statement

When describing a resource with RDF or building an ontology with OWL in which a user defined XML Schema simple type, such as adultAge above, is used, what URI should be used to identify this datatype?

2.2 DAML+OIL Solution

OWL was explicitly derived from the earlier [DAML+OIL] ontology language. This did include support for user defined simple XML Schema types, as seen in the definition of Senior in the example file ontology (http://www.daml.org/2001/03/daml+oil-ex). The solution is most clearly articulated by [Patel-Schneider]:

OWL can use XML Schema non-list simple types defined at the top level of an XML Schema document and given a name, by using the URI reference constructed from the URI of the document and the local name of the simple type.

Thus with the example, the URI reference http://example.org/simpleTypes#adultAge identifies the datatype.

This is a non-standard approach to fragIDs, and not in conformance with [RFC 2396], which says:

The semantics of a fragment identifier is a property of the data resulting from a retrieval action, regardless of the type of URI used in the reference. Therefore, the format and interpretation of fragment identifiers is dependent on the media type [RFC2046] of the retrieval result. [...]

A fragment identifier is only meaningful when a URI reference is intended for retrieval and the result of that retrieval is a document for which the identified fragment is consistently defined.

and [RFC 3023] which says:

As of today, no established specifications define identifiers for XML media types. However, a working draft published by W3C, namely "XML Pointer Language (XPointer)", attempts to define fragment identifiers for text/xml and application/xml.

In turn, the [XPointer Framework] says of fragments like #adultAge

A shorthand pointer, formerly known as a barename, consists of an NCName alone. It identifies at most one element in the resource's information set; specifically, the first one (if any) in document order that has a matching NCName as an identifier. The identifiers of an element are determined as follows:

If an element information item has an attribute information item among its [attributes] that is a schema-determined ID, then it is identified by the value of that attribute information item's [schema normalized value] property;

If an element information item has an element information item among its [children] that is a schema-determined ID, then it is identified by the value of that element information item's [schema normalized value] property;

If an element information item has an attribute information item among its [attributes] that is a DTD-determined ID, then it is identified by the value of that attribute information item's [normalized value] property.

An element information item may also be identified by an externally-determined ID value.

In short, the XML fragment #adultAge should identify the XML element with id="adultAge" rather than the datatype with name="adultAge".

While the [XPointer Framework] is a W3C Recommendation, it is not fully standard since it is not yet endorsed by an IETF RFC updating [RFC 3023]. So while the approach defined in the XPointer Framework for such fragments consisting of barenames does have widespread support, discussion continues. Some other aspects of the XPointer Framework are more contentious within the IETF community.

@@todo simple N3 example

2.3 Component Designators Solution

Following XML Schema Component Designators [XSCD] the same example would have URI reference http://example.org/simpleTypes#xscd(/type(adultAge)). This approach defines an XPointer scheme that navigates the XML Schema document to identify any of the schema components using a fragment. This is very general: fragments are defined that identify many different aspects of the document, including unnamed simple types within complex schema.

This is still at WD stage and depends on the [XPointer Framework] W3C Recommendation, whose relationship with [RFC 3023] is not yet fully secured.

The resulting URI references cannot be used with the qname abbreviation used in [N3] and [RDF/A], because they end in a ")" which is not an NCNameChar.
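The difference between the two fragment styles can be made concrete with a short sketch. The helper names are hypothetical, and the regular expression is an ASCII-only approximation of the XML NCName production (the real production also admits many non-ASCII characters).

```python
import re

SCHEMA_URI = "http://example.org/simpleTypes"

# ASCII-only approximation of NCName; an assumption of this sketch.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def name_based_uri(doc_uri: str, local_name: str) -> str:
    """Document URI plus the simple type's local name as fragment."""
    return f"{doc_uri}#{local_name}"


def xscd_uri(doc_uri: str, local_name: str) -> str:
    """Document URI plus a component designator fragment for a type."""
    return f"{doc_uri}#xscd(/type({local_name}))"


def qname_abbreviable(uri: str) -> bool:
    """True if the fragment could serve as the local part of a qname."""
    _, _, fragment = uri.partition("#")
    return NCNAME.fullmatch(fragment) is not None


for uri in (name_based_uri(SCHEMA_URI, "adultAge"),
            xscd_uri(SCHEMA_URI, "adultAge")):
    print(uri, qname_abbreviable(uri))
```

Only the first form can be written with a namespace prefix; the second has to be spelled out in full wherever it is used.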

[XSCD] refers to such URI references as being:

The canonical schema component designator for this simple type definition

i.e. referring to the definition rather than to the type defined.

@@todo simple N3 example

2.4 Using the `id` Attribute

In section 2.2, we saw that the RFCs and Recommendations concerning fragment IDs in XML Schema documents suggest (but do not definitively state) that fragment IDs matching the NCName production identify elements with the corresponding @id attribute.

An alternative to the DAML+OIL solution is to identify the datatype by its @id attribute rather than its @name attribute. This involves modifying the schema defining the datatype; our example would be:

   <xsd:schema ...>
     <xsd:simpleType id="adultAge" name="adultAge">
       <xsd:restriction base="integer">
        <xsd:minInclusive value="18"/>
       </xsd:restriction>
     </xsd:simpleType>
     ...
   </xsd:schema>

Then we would use the URI reference http://example.org/simpleTypes#adultAge, as before.

This depends on the (less contentious) shorthand pointer part of the [XPointer Framework].
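A simplified view of shorthand pointer resolution shows why the id attribute matters. The sketch below uses Python's standard xml.etree.ElementTree and treats every unqualified id attribute as an ID without consulting a schema or DTD, which is a simplifying assumption; `resolve_shorthand` is a hypothetical helper, not an XPointer implementation.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# The adultAge schema with only a name attribute.
BY_NAME = f"""<xsd:schema xmlns:xsd="{XSD}">
  <xsd:simpleType name="adultAge">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>"""

# The same schema with an id attribute added.
BY_ID = BY_NAME.replace('name="adultAge"', 'id="adultAge" name="adultAge"')


def resolve_shorthand(schema_text: str, barename: str):
    """Return the first element in document order whose id equals barename."""
    root = ET.fromstring(schema_text)
    for element in root.iter():  # iter() walks in document order
        if element.get("id") == barename:
            return element
    return None


print(resolve_shorthand(BY_NAME, "adultAge"))
print(resolve_shorthand(BY_ID, "adultAge").tag)
```

Under this reading, the fragment adultAge identifies nothing in the name-only schema and identifies the simpleType element once the id attribute is present.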

Our example RDF is as before:

@@@todo N3

2.5 Discussion

The DAML+OIL solution is non-standard, and suffers from problems such as the non-uniqueness of names within some XML Schema.

The other two solutions are conformant with XPointer, but this is not fully endorsed by the IETF who control the XML mimetype (RFC 3023), which RFC 2396 defers to concerning fragment ID semantics.

The XSCD solution is still at WD stage.

Both the XSCD and the id solutions have the problem that the proposed URI references designate a syntactic thing, rather than the datatype itself. This could be seen as a property of the relevant definitions (XSCD and XPointer) being concerned about representations of resources rather than resources themselves. The URIref potentially can be seen as denoting (in the sense of RDF Semantics) the datatype, and represented by the XML element describing the datatype.

A position combining aspects of all three solutions might be to use id where possible, and once XSCD goes to Rec to also use that, particularly for XML Schema that are not under the control of the RDF or OWL author.

3. Comparison of Values

Two different authors publishing the same information on the Semantic Web may make different syntactic choices. They then say the same thing in different ways. This is seen most clearly when the two documents entail one another as determined by the [RDF Semantics] or [OWL Semantics].

One aspect of the syntactic choices facing an author is which datatypes to use. Even if they use only the built in [XML SCHEMA2] simple types, there are non-trivial choices, and different authors may legitimately choose different datatypes. This section addresses the issue of how implementations of [RDF Semantics] and [OWL Semantics] should allow for the different choices of datatype made by different authors.

3.1 Problem Statement

What is the relationship between the value spaces of the various XML Schema built-in simple types when used within RDF and OWL?

Or in other words, when do two literals that are written down differently refer to the same value? For example, "10"^^xsd:integer and "010"^^xsd:integer both denote the integer ten.

3.2 Examples

We give some examples, both ones for which the RDF and OWL Semantics are clear, and ones for which the semantics is not clear.

Each example is presented in two ways:

As a pair of literals which may, or may not, denote the same value.
As a possible entailment. Technically the intended entailment is a D-entailment, in terms of [RDF Semantics], or an OWL Full entailment in terms of the [OWL Semantics]. Similar OWL DL entailments could be constructed, illustrating the same issues.

A key term we will use is primitive base datatype in a type system. A recursive definition is:

Each built in primitive datatype is its own primitive base datatype.
The primitive base datatype of a derived simple type is the primitive base datatype of its base datatype.

In other words, the primitive base datatype of a simple type is found by walking up the derivation tree until reaching a primitive type. Note that the concept of primitive base datatypes in a type system is slightly different from the concept of primitive base datatypes in a unary datatype group. This is because it is possible that a primitive base datatype of a type system is not in a datatype map, but its derived datatypes are. For instance, in Example 1C, xsd:integer is a primitive base datatype in the unary datatype group G₁, although in the XML Schema type system its primitive base datatype is xsd:decimal.
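The recursive definition amounts to a walk up the derivation tree. The following sketch covers only the fragment of the XML Schema built-in hierarchy needed for the examples in this section; the table and function names are illustrative.

```python
# Fragment of the built-in derivation hierarchy: derived type -> base type.
BASE = {
    "xsd:byte": "xsd:short",
    "xsd:short": "xsd:int",
    "xsd:int": "xsd:long",
    "xsd:long": "xsd:integer",
    "xsd:nonNegativeInteger": "xsd:integer",
    "xsd:integer": "xsd:decimal",
    "xsd:language": "xsd:token",
    "xsd:token": "xsd:normalizedString",
    "xsd:normalizedString": "xsd:string",
}
# The primitive types that appear in this section's examples.
PRIMITIVE = {"xsd:decimal", "xsd:string", "xsd:float", "xsd:double",
             "xsd:anyURI", "xsd:hexBinary", "xsd:base64Binary"}


def primitive_base(datatype: str) -> str:
    """Walk up the derivation tree until a primitive type is reached."""
    while datatype not in PRIMITIVE:
        datatype = BASE[datatype]  # KeyError for types outside this fragment
    return datatype


for name in ("xsd:byte", "xsd:nonNegativeInteger", "xsd:language", "xsd:float"):
    print(name, "->", primitive_base(name))
```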

3.2.1 Easy Examples

It is uncontested that in [XML SCHEMA2] a datatype derived by restriction refers to a subset of the values of its base datatype, and not to different values (see [XML SCHEMA2]).

Hence, two typed literals whose types have the same primitive base datatype, and whose lexical forms are equivalent under that datatype's lexical-to-value mapping, are equal.

In addition, [RDF Semantics] explicitly sanctions identification of RDF plain literals without language tags with corresponding typed literals with datatype xsd:string.

Derived Numerics

As a first example "15"^^xsd:byte and "15.0"^^xsd:decimal both denote the same value, fifteen. This follows because xsd:byte has primitive base datatype xsd:decimal.

This licenses the following entailment:

Example 3A

eg:Jane eg:age "15"^^xsd:byte .

entails

eg:Jane eg:age "15"^^xsd:decimal .

The same result holds for two types both of which have primitive base datatype decimal. For example "15"^^xsd:byte and "15"^^xsd:nonNegativeInteger both denote fifteen, and the entailment:

Example 3B

eg:Jane eg:age "15"^^xsd:nonNegativeInteger .

entails

eg:Jane eg:age "15"^^xsd:byte .

Note that xsd:byte is not derived from xsd:nonNegativeInteger, or vice versa, even with intermediate steps.

Derived Strings

xsd:language has primitive base datatype xsd:string. Thus "en-US"^^xsd:language and "en-US"^^xsd:string denote the same value, and the following entailment holds:

Example 3C

eg:doc dc:language "en-US"^^xsd:language .

entails

eg:doc dc:language "en-US"^^xsd:string .

However, despite the language identifier being case insensitive according to [RFC 3066], this case insensitivity is not represented in the datatype, so that "en-US"^^xsd:language and "en-us"^^xsd:language denote different values and we have the following non-entailment:

Example 3D

eg:doc dc:language "en-US"^^xsd:language .

does not entail

eg:doc dc:language "en-us"^^xsd:language .

Plain Strings

The [RDF Semantics] says (in an informative section):

the value space and lexical-to-value mapping of the XSD datatype xsd:string sanctions the identification of typed literals with plain literals without language tags for all character strings which are in the lexical space of the datatype, since both of them denote the Unicode character string which is displayed in the literal;

Thus "en-US"^^xsd:string denotes the same as the plain literal "en-US", and the following two entailments hold:

Example 3E

eg:doc dc:language "en-US"^^xsd:string .

entails

eg:doc dc:language "en-US" .

Example 3F

eg:doc dc:language "en-US"^^xsd:language .

entails

eg:doc dc:language "en-US" .

3.2.2 Hard Examples

When the two typed literals being compared have different primitive base datatypes, the situation is more difficult. The number one for instance can be a float, a double, or a decimal. Are these all the same? Is a URI (an xsd:anyURI) a string? Is a binary blob the same whether it is hex encoded or base64 encoded?

Float and Decimal

A human age is conventionally given as an integer (number of years, except for babies), but a float is a plausible alternative representation. On April 7th Jeremy was forty. Is "40"^^xsd:integer the same as "40"^^xsd:float? Does:

Example 3G

eg:JeremyCarroll eg:ageInYears "40"^^xsd:integer .

entail

eg:JeremyCarroll eg:ageInYears "40"^^xsd:float .

Floats and doubles are defined with value spaces which are not dense, but heavily influenced by the binary system. Typical decimal numbers, such as 1.3, do not map neatly into that value space, so that "1.3"^^xsd:float takes the value that is as close to 1.3 as possible within the float value space. This is an approximation. So strictly, as numbers, "1.3"^^xsd:float is not the same as "1.3"^^xsd:decimal, but perhaps that is too severe and unhelpful for application developers. Does:

Example 3H

eg:car eg:engineSizeInLitres "1.3"^^xsd:decimal .

entail

eg:car eg:engineSizeInLitres "1.3"^^xsd:float .

Float and Double

Every value that can be represented as a float can also be represented as a double, but as with float and decimal, the two types are not derived one from the other. Thus, it can be argued that "40"^^xsd:double and "40"^^xsd:float are different; they will typically be implemented differently in a computer. Thus we can also ask, does:

Example 3J

eg:JeremyCarroll eg:ageInYears "40"^^xsd:double .

entail

eg:JeremyCarroll eg:ageInYears "40"^^xsd:float .

The engine size example is also confusing. "1.3"^^xsd:float has the value 10905190×2^-23 (i.e. approx 1.2999999523), whereas "1.3"^^xsd:double has the value 5854679515581645×2^-52 (i.e. approx 1.300000000000000044). So we can again ask whether these two typed literals should be treated as the same value or not. This is only a rounding error. Does:

Example 3K

eg:car eg:engineSizeInLitres "1.3"^^xsd:double .

entail

eg:car eg:engineSizeInLitres "1.3"^^xsd:float .
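The exact values behind the engine size example can be computed with Python's standard library. This sketch assumes that xsd:float corresponds to IEEE 754 binary32 and xsd:double to binary64 with round-to-nearest; the helper `as_float32` is hypothetical and goes through a binary64 intermediate, which does not affect the result for this input.

```python
import struct
from fractions import Fraction


def as_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


single = Fraction(as_float32(1.3))  # exact value of "1.3"^^xsd:float
double = Fraction(1.3)              # exact value of "1.3"^^xsd:double
decimal = Fraction("1.3")           # exact value of "1.3"^^xsd:decimal

print(single * 2**23)  # integer significand at scale 2^-23
print(double * 2**52)  # integer significand at scale 2^-52
print(single == double, double == decimal, single == decimal)
```

As exact rational numbers, the three values are pairwise distinct, which is what makes Examples 3H and 3K hard.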

String and anyURI

It is not entirely clear what the intended values of the anyURI datatype are. [RFC 2396] says: "A Uniform Resource Identifier (URI) is a compact string of characters for identifying an abstract or physical resource", so we may choose to think of an anyURI as essentially the same as the equivalent xsd:string, or maybe not. Thus we can ask whether "http://www.example.org/doc"^^xsd:string is the same as or different from "http://www.example.org/doc"^^xsd:anyURI. Does:

Example 3L

eg:doc dc:identifier "http://www.example.org/doc"^^xsd:anyURI .

entail

eg:doc dc:identifier "http://www.example.org/doc"^^xsd:string .

hexBinary and base64Binary

The final case where the value spaces of two XML Schema simple types appear to be the same is that of xsd:hexBinary and xsd:base64Binary. For both, the value space is described as: the set of finite-length sequences of binary octets. For instance, the binary sequence of two octets (00001111 10110111) (i.e. the 16-bit integer 4023) can be written in hexadecimal as 0FB7. In base64 encoding [RFC 2045] this same sequence of two octets is represented as D7c=. So we can ask whether "0FB7"^^xsd:hexBinary is the same as "D7c="^^xsd:base64Binary. As an entailment, does:

Example 3M

eg:doc eg:checkSum "0FB7"^^xsd:hexBinary .

entail

eg:doc eg:checkSum "D7c="^^xsd:base64Binary .
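That the two lexical forms decode to the same octet sequence can be checked with Python's standard library. The sketch only compares decoded octets; whether the two typed literals denote the same value in RDF and OWL is the open question posed by this example.

```python
import base64

hex_octets = bytes.fromhex("0FB7")
b64_octets = base64.b64decode("D7c=", validate=True)

print(hex_octets == b64_octets)           # same two octets
print(int.from_bytes(hex_octets, "big"))  # as a 16-bit unsigned integer
```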

Formal Analysis

In discussing the examples, we presented pairs of literals which denoted the same value. This relationship of denoting the same value forms an equivalence relation, which we will write as ~. It is reflexive, symmetric and transitive.

In terms of the [RDF Semantics] the equivalence classes of ~ can be constructed from the interpretation function IL, in the following way:

~ = { X | X = IL^-1( lv ) for each lv ∈ LV}

Each X is one of the equivalence classes.

In terms of OWL DL Semantics [OWL Semantics], this can be constructed in terms of the interpretation function ED as:

~ = { X | X = ED^-1( lv ) for each lv ∈ LV}

All Primitive Types Differ

For implementors, the simplest solution is to agree that all primitive XML Schema datatypes have disjoint value spaces. Thus all the harder examples (3G through to 3M) are non-entailments, and the pairs of typed literal values being considered are different, since they have different primitive base datatypes. This approach is also easy to understand.

The easier examples (3A through to 3F) are all entailments, since in each of these, either the primitive base datatype is the same for both the values being considered, or we are using the special rule from [RDF Semantics] that plain literals without language identifiers can be identified with the equivalent xsd:string.

In a unary datatype group, value spaces of primitive base datatypes are required to be disjoint from each other.
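Under this option, comparison reduces to two steps: check that the primitive base datatypes agree, then compare values under that primitive datatype. The sketch below is a simplified illustration: literals are (lexical form, datatype) pairs assumed to be well typed, a plain literal without language tag is written with datatype None, and the tables cover only the datatypes used in Section 3.2. All names are hypothetical.

```python
import struct
from decimal import Decimal

# Primitive base datatype of each datatype used in the examples.
PRIMITIVE_BASE = {
    "xsd:byte": "xsd:decimal",
    "xsd:integer": "xsd:decimal",
    "xsd:nonNegativeInteger": "xsd:decimal",
    "xsd:decimal": "xsd:decimal",
    "xsd:language": "xsd:string",
    "xsd:string": "xsd:string",
    None: "xsd:string",  # plain literal without language tag
    "xsd:float": "xsd:float",
    "xsd:double": "xsd:double",
}


def _float32(lexical_form):
    # Round via binary64 to binary32; adequate for a sketch.
    return struct.unpack("<f", struct.pack("<f", float(lexical_form)))[0]


# Lexical-to-value mappings of the primitive datatypes (simplified).
L2V = {
    "xsd:decimal": Decimal,
    "xsd:string": str,
    "xsd:float": _float32,
    "xsd:double": float,
}


def same_value(lit1, lit2):
    """True if both literals denote the same value when all primitive
    datatypes are taken to have disjoint value spaces."""
    (lex1, type1), (lex2, type2) = lit1, lit2
    base1, base2 = PRIMITIVE_BASE[type1], PRIMITIVE_BASE[type2]
    if base1 != base2:
        return False  # disjoint value spaces
    return L2V[base1](lex1) == L2V[base2](lex2)


print(same_value(("15", "xsd:byte"), ("15.0", "xsd:decimal")))  # Example 3A
print(same_value(("40", "xsd:integer"), ("40", "xsd:float")))   # Example 3G
```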

XPath 2.0 `eq`

Like RDF and OWL, [XSLT 2.0] and [XQuery 1.0] require the ability to compare values of typed literals. The basic operation they use is the [XPath 2.0] eq operator. Given two literals, this operator does one of:

return true
return false
indicate that the two types are incomparable

eq compares numeric values with different primitive datatypes, e.g. 0 as a float and 0 as a decimal compare true under eq.

Most pairs of primitive types are incomparable under eq, and with the strong typing in [XPath 2.0] such comparisons are errors.

eq is stronger than the primitive-equality discussed above, in that whenever two values derived from the same primitive base datatype are the same according to that primitive base datatype, then they also compare as eq.

eq is designed to take implementation considerations into account. xsd:decimal has implementation variability: [XML SCHEMA2] specifies:

Note: All ·minimally conforming· processors ·must· support decimal numbers with a minimum of 18 decimal digits (i.e., with a ·totalDigits· of 18). However, ·minimally conforming· processors ·may· set an application-defined limit on the maximum number of decimal digits they are prepared to support, in which case that application-defined maximum number ·must· be clearly documented.

As a result of this, we may expect different implementations to make different comparisons in some corner cases, due to rounding and precision issues.

Further, [XML SCHEMA2] leaves implementations free to choose the equality function they use for floats and doubles, specifying:

Note: "Equality" in this Recommendation is defined to be "identity" (i.e., values that are identical in the ·value space· are equal and vice versa). Identity must be used for the few operations that are defined in this Recommendation. Applications using any of the datatypes defined in this Recommendation may use different definitions of equality for computational purposes; [IEEE 754-1985]-based computation systems are examples. Nothing in this Recommendation should be construed as requiring that such applications use identity as their equality relationship when computing.

These considerations lead to aspects of the definition of eq that are surprising to purists. For example, eq is non-transitive:

"3.2"^^xsd:decimal eq "3.2"^^xsd:float
"3.2"^^xsd:float eq "3.20000000000000000001"^^xsd:decimal

but not

"3.2"^^xsd:decimal eq "3.20000000000000000001"^^xsd:decimal

Infinity, which can be represented in both the xsd:float and xsd:double datatypes, is treated specially, and eq is not even reflexive; that is, it is not the case that:

"INF"^^xsd:float eq "INF"^^xsd:float

Using `eq` in RDF and OWL

Using eq as the basis for typed literal value semantics in RDF and OWL would, from an implementation point of view, amount to using eq at all points where a comparison between two literals is needed. Since there are many different approaches to implementing RDF and OWL semantics, such a procedural definition is somewhat unsatisfactory. If this approach is pursued, the next version of this document will characterize the resulting implementation variability.

From the point of view of the ontologist or knowledge engineer, this will make clear that for interoperability it is necessary to avoid certain corner cases where rounding errors etc. could give surprising results.

A possible first sketch of a formal expression of the potential variability is as follows. When processing a set S of RDF graphs over a vocabulary V, the implementation may use any equivalence relation ~ over the typed literals in V that satisfies the following:

If x ~ y then there is a path x = x₀, x₁, ..., xₙ = y, such that for each i = 0, ..., n-1, xᵢ eq xᵢ₊₁.
If x !~ y then x != y and there exists x' ~ x and y' ~ y such that not x' eq y'.

A second possibility might be to require implementations to use an equivalence relation formed as the transitive, symmetric, reflexive closure of eq over the typed literals in the vocabulary V.

Both of the above approaches may give surprises: in the corner cases, extending the vocabulary V (for instance, by including further graphs) may cause typed literals that were considered different to be considered the same.
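The second possibility, and the surprise just described, can be sketched as follows. The function `toy_eq` is a hypothetical, much simplified stand-in for eq: it handles only xsd:decimal and xsd:float, compares two decimals exactly, and otherwise promotes both sides to single precision (going through a Python double on the way, which is an approximation of the real casting rules). The closure is computed with a small union-find structure.

```python
import struct
from decimal import Decimal
from itertools import combinations

def to_float32(x):
    """Round to the nearest IEEE 754 single precision value."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]

def toy_eq(a, b):
    """Toy stand-in for eq over (lexical form, datatype) pairs."""
    (lex_a, type_a), (lex_b, type_b) = a, b
    if type_a == type_b == "xsd:decimal":
        return Decimal(lex_a) == Decimal(lex_b)  # exact comparison
    # Any comparison involving a float promotes both sides to float.
    return to_float32(Decimal(lex_a)) == to_float32(Decimal(lex_b))

def closure_classes(literals, eq):
    """Classes of the reflexive, symmetric, transitive closure of eq."""
    parent = {lit: lit for lit in literals}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(literals, 2):
        if eq(a, b) or eq(b, a):
            parent[find(a)] = find(b)  # merge the two classes

    classes = {}
    for lit in literals:
        classes.setdefault(find(lit), set()).add(lit)
    return list(classes.values())

d1 = ("3.2", "xsd:decimal")
d2 = ("3.20000000000000000001", "xsd:decimal")
f = ("3.2", "xsd:float")

before = closure_classes([d1, d2], toy_eq)    # vocabulary without the float
after = closure_classes([d1, d2, f], toy_eq)  # vocabulary extended with it
```

With only the two decimals in the vocabulary they fall into separate classes. Once the float literal is added, it compares eq to both of them, and the closure merges all three into a single class.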

A third possibility might be to formally capture the difference between eq and equality by specifying the approximation mapping mapsTo between different datatypes. Under this approach, eq("s₁"^^u₁, "s₂"^^u₂) returns:

true if L2S(D(u₁))(s₁) = L2S(D(u₂))(s₂) or mapsTo(D(u₁),D(u₂))("s₁"^^u₁) = L2S(D(u₂))(s₂),
false if L2S(D(u₁))(s₁) ≠ L2S(D(u₂))(s₂) and mapsTo(D(u₁),D(u₂))("s₁"^^u₁) ≠ L2S(D(u₂))(s₂),
incomparable if mapsTo(D(u₁),D(u₂)) is undefined.
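A minimal sketch of this three-valued definition follows. The tables `L2S` and `MAPS_TO` are hypothetical and partial, with a single approximation mapping from xsd:decimal to xsd:float. The sketch also assumes that two literals of the same datatype are compared directly by value, which amounts to treating mapsTo from a datatype to itself as the identity; the definition above does not say this explicitly.

```python
import struct
from decimal import Decimal

def to_float32(x):
    """Round to the nearest IEEE 754 single precision value."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]

# L2S: lexical-to-value mapping for each datatype (hypothetical, partial).
L2S = {
    "xsd:decimal": Decimal,
    "xsd:float": to_float32,
    "xsd:string": str,
}

# mapsTo: approximation mappings between datatypes (hypothetical, partial).
# Each entry maps a lexical form of the first datatype to a value of the second.
MAPS_TO = {
    ("xsd:decimal", "xsd:float"): lambda s: to_float32(Decimal(s)),
}

INCOMPARABLE = None  # third outcome, alongside True and False

def eq(s1, u1, s2, u2):
    """Three-valued comparison of "s1"^^u1 with "s2"^^u2."""
    if u1 == u2:
        # Assumption: same datatype means direct comparison of values.
        return L2S[u1](s1) == L2S[u2](s2)
    maps_to = MAPS_TO.get((u1, u2))
    if maps_to is None:
        return INCOMPARABLE
    return maps_to(s1) == L2S[u2](s2)
```

With these tables, `eq("1.3", "xsd:decimal", "1.3", "xsd:float")` is true, while comparing an xsd:string with an xsd:float is incomparable because no mapping is defined for that pair.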

Further details on `eq`

Values of the xsd:string type do not compare eq to any other primitive type. Thus the xsd:anyURI example 3L is a non-entailment.

@@@ todo datetime stuff - I think they are all incomparable should check.

hexBinary and base64Binary do not compare under eq, so that example 3M is also a non-entailment.

Numeric comparisons are provided with detailed casting rules to allow for rounding errors etc., as for 1.3 in example 3H and example 3K. These rules also deal with 40 in example 3G and example 3J. Thus all four of these entailments hold.

As a special case, "INF"^^xsd:float is not eq to itself. It is wholly unclear why this is so, and what impact it has.

True Values

Following [Carroll 2002], it is possible to take a literal reading of [XML SCHEMA2]. Under this, the numeric types all have exact values, and no allowance is made for rounding. The binary encoding types have the same value space, and can be compared. anyURI is essentially a string restricted by the grammar of [RFC 2396].

With such a literal reading, the harder example entailments hold except for those that involve rounding. That is, examples 3H and 3K are non-entailments because, despite appearances, three different numbers are being considered in these two entailments (1.3, 1.2999999523..., 1.3000000000000000444...).
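The literal reading can be checked with a short sketch. The helper names are hypothetical. The sketch relies on Python's `Decimal` constructor giving the exact value of a binary floating point number, and it obtains the single precision value by rounding through a double first, which is adequate for these examples but is an assumption rather than the conversion defined by XML Schema.

```python
import base64
import struct
from decimal import Decimal

def exact_float32(lexical):
    """Exact value of the single precision number nearest to the lexical form."""
    single = struct.unpack("f", struct.pack("f", float(lexical)))[0]
    return Decimal(single)

def exact_double(lexical):
    """Exact value of the double precision number nearest to the lexical form."""
    return Decimal(float(lexical))

# 40 is exactly representable, so all three readings coincide (3G, 3J).
forty_agrees = exact_float32("40") == exact_double("40") == Decimal("40")

# 1.3 is not exactly representable in binary, so three distinct numbers
# are involved (3H, 3K).
one_point_three = {Decimal("1.3"), exact_float32("1.3"), exact_double("1.3")}

# The binary encodings denote the same octet sequence (3M).
same_octets = bytes.fromhex("0FB7") == base64.b64decode("D7c=")
```

Under this reading `forty_agrees` and `same_octets` hold, while `one_point_three` contains three distinct values.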

In short, this approach suggests some modifications of the XML Schema datatypes derivation tree.

Discussion

The XPath 2.0 eq position is a compromise between the two theoretically sound positions and may well be the best solution on an 80/20 rule. It is also likely to have best compatibility with XML technologies based on XPath 2.0, and [SPARQL], which uses functions and operators from [F&O]. However, it is problematic that eq is not an equivalence relation because of the corner cases. Implementor feedback is needed as to how significant this problem is.

A summary table showing the examples and how they are treated by the three possible approaches is as follows:

Example	Literal 1	Literal 2	Primitive	`eq`	True
3A	`"15"^^xsd:byte`	`"15"^^xsd:decimal`	true	true	true
3B	`"15"^^xsd:nonNegativeInteger`	`"15"^^xsd:byte`	true	true	true
3C	`"en-US"^^xsd:language`	`"en-US"^^xsd:string`	true	true	true
3D	`"en-US"^^xsd:language`	`"en-us"^^xsd:language`	false	false	false
3E	`"en-US"^^xsd:string`	`"en-US"`	true	true	true
3F	`"en-US"^^xsd:language`	`"en-US"`	true	true	true
3G	`"40"^^xsd:integer`	`"40"^^xsd:float`	false	true	true
3H	`"1.3"^^xsd:decimal`	`"1.3"^^xsd:float`	false	true	false
3J	`"40"^^xsd:double`	`"40"^^xsd:float`	false	true	true
3K	`"1.3"^^xsd:double`	`"1.3"^^xsd:float`	false	true	false
3L	`"http://www.example.org/doc"^^xsd:anyURI`	`"http://www.example.org/doc"^^xsd:string`	false	false	true
3M	`"0FB7"^^xsd:hexBinary`	`"D7c="^^xsd:base64Binary`	false	false	true

4. Duration

The [RDF Semantics] Recommendation discourages the use of the xsd:duration datatype (see [XML SCHEMA2]). It says:

[Some] built-in XML Schema datatypes are unsuitable for various reasons, and SHOULD NOT be used: xsd:duration does not have a well-defined value space (this may be corrected in later revisions of XML Schema datatypes, in which case the revised datatype would be suitable for use in RDF datatyping);

The underlying difficulty is the impossibility of an unequivocal answer to the question "How many days in a month?" This has proved problematic in other applications of XML Schema datatypes. The XQuery and XSLT Working Groups have a proposed solution. They derive two new datatypes, xdt:yearMonthDuration and xdt:dayTimeDuration from xsd:duration, sidestepping the unanswerable question. In section 10.2 of [F&O] we read:

[Definition] xdt:yearMonthDuration is derived from xs:duration by restricting its lexical representation to contain only the year and month components. The value space of xdt:yearMonthDuration is the set of xs:integer month values. The year and month components of xdt:yearMonthDuration correspond to the Gregorian year and month components defined in section 5.5.3.2 of [ISO 8601], respectively.

and

[Definition] xdt:dayTimeDuration is derived from xs:duration by restricting its lexical representation to contain only the days, hours, minutes and seconds components. The value space of xdt:dayTimeDuration is the set of fractional second values. The components of xdt:dayTimeDuration correspond to the day, hour, minute and second components defined in Section 5.5.3.2 of [ISO 8601], respectively.

These two new datatypes are suitable for use with RDF and OWL. (Note that they are not yet part of a Recommendation, since [F&O] is still a Working Draft.)
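The two value spaces quoted above can be illustrated with a simplified sketch. The function names and regular expressions are hypothetical and cover only the common lexical forms; they are not a complete implementation of the lexical rules in [F&O]. A yearMonthDuration maps to an integer number of months, and a dayTimeDuration maps to a (possibly fractional) number of seconds.

```python
import re
from decimal import Decimal

# Simplified lexical patterns (assumption: not the full grammar).
YEAR_MONTH = re.compile(r"^(-)?P(?:(\d+)Y)?(?:(\d+)M)?$")
DAY_TIME = re.compile(
    r"^(-)?P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$"
)

def year_month_duration(lexical):
    """Value of an xdt:yearMonthDuration as an integer number of months."""
    m = YEAR_MONTH.match(lexical)
    if not m or not (m.group(2) or m.group(3)):
        raise ValueError(f"not a yearMonthDuration: {lexical!r}")
    sign, years, months = m.groups()
    total = int(years or 0) * 12 + int(months or 0)
    return -total if sign else total

def day_time_duration(lexical):
    """Value of an xdt:dayTimeDuration as a number of seconds."""
    m = DAY_TIME.match(lexical)
    if not m or not any(m.group(i) for i in range(2, 6)):
        raise ValueError(f"not a dayTimeDuration: {lexical!r}")
    sign, days, hours, minutes, seconds = m.groups()
    total = ((int(days or 0) * 24 + int(hours or 0)) * 60
             + int(minutes or 0)) * 60 + Decimal(seconds or 0)
    return -total if sign else total
```

Because neither datatype mixes months with days, the question of how many days a month contains never arises: `"P1Y2M"` and `"P14M"` both denote 14 months, and `"P1DT2H"` denotes 93600 seconds.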

5. References

@@@todo reorganize: recs, rfcs and standards first, then wds, then other

[RDF Semantics]: RDF Semantics, Patrick Hayes, Editor, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-mt-20040210/ . Latest version available at http://www.w3.org/TR/rdf-mt/ .
[F&O]: XQuery 1.0 and XPath 2.0 Functions and Operators, Ashok Malhotra, Jim Melton and Norman Walsh (editors), World Wide Web Consortium Working Draft, work in progress, 29 October 2004. This version of Functions and Operators is http://www.w3.org/TR/2004/WD-xpath-functions-20041029/. The latest version of Functions and Operators is at http://www.w3.org/TR/xpath-functions/.
[XML SCHEMA2]: XML Schema Part 2: Datatypes, Second Edition, W3C Recommendation, World Wide Web Consortium, Paul V. Biron and Ashok Malhotra (editors), 28 October 2004. This version is http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/. The latest version is available at http://www.w3.org/TR/xmlschema-2/.
[ISO 8601]: ISO (International Organization for Standardization). Representations of dates and times, 2000-08-03. Available from: http://www.iso.ch/
[XPath 2.0]: World Wide Web Consortium. XML Path Language Version 2.0 W3C Working Draft. See http://www.w3.org/TR/xpath20/
[RFC 2396]: T. Berners-Lee, R. Fielding, and L. Masinter. Uniform Resource Identifiers (URI): Generic Syntax. IETF RFC 2396. See http://www.ietf.org/rfc/rfc2396.txt.
[XQuery 1.0]: World Wide Web Consortium, XQuery 1.0: An XML Query Language. See http://www.w3.org/TR/xquery/.
[XSLT 2.0]: World Wide Web Consortium, XSL Transformations Language (XSLT) Version 2.0. See http://www.w3.org/TR/xslt20/.
[UNICODE]: The Unicode Standard, Version 3, The Unicode Consortium, Addison-Wesley, 2000. ISBN 0-201-61633-5, as updated from time to time by the publication of new versions. (See http://www.unicode.org/unicode/standard/versions/ for the latest version and additional information on versions of the standard and of the Unicode Character Database).
[RFC 2045]: N. Freed and N. Borenstein. RFC 2045: Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies. 1996. Available at: http://www.ietf.org/rfc/rfc2045.txt
[RFC 3066]: H. Alvestrand, ed. RFC 3066: Tags for the Identification of Languages 2001. Available at: http://www.ietf.org/rfc/rfc3066.txt
[Pan 2004]: Description Logics: Reasoning Support for the Semantic Web, Jeff Z.Pan, PhD Thesis, School of Computer Science, The University of Manchester, 2004.
[OWL Abstract Syntax]
[OWL Semantics]: OWL Web Ontology Language Semantics and Abstract Syntax, Peter F. Patel-Schneider, Patrick Hayes, and Ian Horrocks, Editors, W3C Recommendation 10 February 2004, http://www.w3.org/TR/2004/REC-owl-semantics-20040210/ . Latest version available at http://www.w3.org/TR/owl-semantics/ .
[Carroll 2002]: XML Schema Datatypes in RDF, Jeremy J. Carroll, 2002. See http://lists.w3.org/Archives/Public/www-archive/2002Nov/att-0092/
[SPARQL]: SPARQL Query Language for RDF, Eric Prud'hommeaux and Andy Seaborne, Editors, W3C Working Draft 12 October 2004, http://www.w3.org/TR/2004/WD-rdf-sparql-query-20041012/ . Latest version available at http://www.w3.org/TR/rdf-sparql-query/ .
[OWL Guide]: OWL Web Ontology Language Guide, Michael K. Smith, Chris Welty, and Deborah L. McGuinness, Editors, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-owl-guide-20040210/ . Latest version available at http://www.w3.org/TR/owl-guide/ .
[RDF Primer]: RDF Primer, Frank Manola and Eric Miller, Editors, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-primer-20040210/ . Latest version available at http://www.w3.org/TR/rdf-primer/ .
[RDF Concepts]: Resource Description Framework (RDF): Concepts and Abstract Syntax, Graham Klyne and Jeremy J. Carroll, Editors, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/ . Latest version available at http://www.w3.org/TR/rdf-concepts/ .
[RDF Syntax]: RDF/XML Syntax Specification (Revised), Dave Beckett, Editor, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/ . Latest version available at http://www.w3.org/TR/rdf-syntax-grammar/ .
[N-triples]: RDF Test Cases, Jan Grant and Dave Beckett, Editors, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-testcases-20040210/ . Latest version available at http://www.w3.org/TR/rdf-testcases/ .
[XPointer Framework]: XPointer Framework, Paul Grosso, Eve Maler, Jonathan Marsh and Norman Walsh, Editors, W3C Recommendation, 25 March 2003, http://www.w3.org/TR/2003/REC-xptr-framework-20030325/ . Latest version available at http://www.w3.org/TR/xptr-framework/ .
[DAML+OIL]: DAML+OIL (March 2001) Reference Description. Dan Connolly, Frank van Harmelen, Ian Horrocks, Deborah L. McGuinness, Peter F. Patel-Schneider, and Lynn Andrea Stein. W3C Note 18 December 2001. Latest version is available at http://www.w3.org/TR/daml+oil-reference.
[Patel-Schnieder]: Web Ontology Working Group e-mail message: '@@@todo' #0265 in November 2002 archive, found at http://lists.w3.org/Archives/Public/www-webont-wg/2002Nov/0265, Peter F. Patel-Schneider, 2002.
[RFC 2046]: N. Freed and N. Borenstein. RFC 2046: Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types. 1996. Available at: http://www.ietf.org/rfc/rfc2046.txt
[RFC 3023]: MURATA Makoto, Simon St.Laurent, and Dan Kohn, RFC 3023: XML Media Types. Internet Engineering Task Force, 2001.
[XSCD]: XML Schema Component Designators, Mary Holstege and Asir S. Vedamuthu, Editors, W3C Working Draft, 16 July 2004, http://www.w3.org/TR/2004/WD-xmlschema-ref-20040716/ . Latest version available at http://www.w3.org/TR/xmlschema-ref/ .
[RDF/A]: RDF/A Syntax: A collection of attributes for layering RDF on XML languages, Mark Birbeck and Steven Pemberton, 11 October 2004. This version: http://www.formsplayer.com/notes/rdf-a.html . (Internal HTML WG discussion document, reference to be updated or deleted depending on outcome of discussion).
[OWL Test Cases]: OWL Web Ontology Language Test Cases, Jeremy J. Carroll and Jos De Roo, Editors, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-owl-test-20040210/ . Latest version available at http://www.w3.org/TR/owl-test/ .
[N3]: Primer: Getting into RDF & Semantic Web using N3, Tim Berners-Lee and Dan Connolly.