Copyright ©2005 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
The RDF and OWL Recommendations use the simple types from XML Schema. This document addresses three questions left unanswered by these Recommendations: Which URIref should be used to refer to a user defined datatype? Which values of which XML Schema simple types are the same? How should the problematic xsd:duration be used in RDF and OWL?
In addition, appendix B describes how to integrate OWL DL with user defined datatypes.
This editors' draft is for WG consideration and review. Changes are listed in the change log, and shown using added or modified text and deleted text.
Moreover, this draft contains two possible answers to the question of when two datatype values are identical.
The remainder of this section is fictitious, and acts as a draft for the status of a Working Group Note.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is a Working Group Note, produced by the Semantic Web Best Practices and Deployment Working Group, part of the W3C Semantic Web Activity. Comments on this document may be sent to public-swbp-wg@w3.org, a mailing list with a public archive.
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
An overview of the datatype abstraction used by RDF is found in the [RDF Concepts and Abstract Syntax]; this is shared by the [OWL Abstract Syntax]. The semantics of RDF datatyping and OWL datatyping are summarized in appendix A. The semantic framework of integrating OWL DL with user defined datatypes is summarized in appendix B.
RDF and OWL allow the use of typed literal values in the description of resources and ontologies. See the [RDF Primer] and the [OWL Guide] for more introductory treatments of RDF and OWL. Both the [RDF Semantics] and the [OWL Semantics] use the lexical-to-value mapping of the datatype to give the interpretation (the value) of a typed literal; thus the semantics of typed literals is given by the type system. The type systems are defined externally to RDF and OWL, most notably by [XML Schema2].
Concrete syntaxes for typed literals are found in [RDF Syntax], [N-triples], and [N3].
Some questions about XML Schema datatypes in the Semantic Web are not directly answered by the published W3C Recommendations. This document considers four of them:
xsd:duration, which are reported in [RDF Semantics].
While this document can be read from start to finish, many readers will benefit from skipping sections.
The intended reader is informed about RDF and/or OWL, and may be a creator or user of metadata or ontologies, or may be an implementor of systems that implement the RDF or OWL Recommendations, or may be the author or editor of related specifications.
The reader who is interested in defining their own datatypes should read section 2 and possibly appendix B, which gives a formal treatment of OWL DL with user defined datatypes that is not covered by the [OWL Semantics].
The reader who is interested in the correct use of datatypes should read section 3, concerning which values are the same, and section 5 concerning numerics, particularly, but not exclusively, for engineering applications.
Implementors probably should read most of the document: appendix A summarizes the formal treatment of datatyping from the recommendations; section 3 gives an extended discussion about equality; section 2 discusses the mapping from URIs to user defined types.
Readers most interested in formal semantics will find most value in appendix B, concerning the computability issue of combining OWL DL with user defined datatypes, and section 3 concerning equality. Such readers should start by reviewing appendix A, which should be familiar.
Section 4, on durations, is of more limited interest, but is significant to any reader who wishes to use, implement or build on top of duration datatypes.
In this document we use N3 syntax, such as "10"^^xsd:int, following the subset used by the [OWL Test Cases], with the following namespace prefixes:
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix eg: <http://www.example.org/> .
@prefix egdt: <http://example.org/simpleTypes#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
[XML SCHEMA2] defines facilities for defining simple types to be used in XML Schema as well as other XML specifications. It is influenced by earlier work on datatypes such as [ISO 11404].
[Definition:] An XML Schema simple type d is characterised by a value space, V(d), which is a non-empty set, a lexical space, L(d), which is a non-empty set of Unicode strings, and a set of facets, F(d), each of which characterizes a value space along independent axes or dimensions.
XML Schema simple types are divided into disjoint built-in simple types and derived simple types. Derived datatypes can be defined by derivation from primitive or existing derived datatypes by three means: restriction, list and union.
The following is the definition of a derived simple type (of the base datatype xsd:integer) which restricts values to integers greater than or equal to 0 and less than 150, using the facets minInclusive and maxExclusive.
<xsd:schema ...>
  <xsd:simpleType name="humanAge">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="0"/>
      <xsd:maxExclusive value="150"/>
    </xsd:restriction>
  </xsd:simpleType>
  ...
</xsd:schema>
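The two facets of humanAge can be read operationally. The following Python sketch is illustrative only: the function names are hypothetical, and whitespace normalization and other details of [XML SCHEMA2] are not modelled. It checks a lexical form against a simplified lexical space of xsd:integer and then applies the minInclusive and maxExclusive facets.

```python
import re

# Simplified lexical space of xsd:integer: an optional sign followed by digits.
# (Whitespace collapsing, required by XML Schema, is not modelled here.)
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def integer_value(lexical: str) -> int:
    """Hypothetical helper: map a lexical form of xsd:integer to its value."""
    if not INTEGER_LEXICAL.fullmatch(lexical):
        raise ValueError(f"not a lexical form of xsd:integer: {lexical!r}")
    return int(lexical)


def is_human_age(lexical: str) -> bool:
    """Hypothetical helper: apply minInclusive=0 and maxExclusive=150."""
    value = integer_value(lexical)
    return 0 <= value < 150  # minInclusive is <=, maxExclusive is strict <


print(is_human_age("40"), is_human_age("150"))  # True False
```

Note that the facets constrain values, not lexical forms: "+007" and "7" are both acceptable because both map to the integer seven.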
[XML Schema2] predefines about forty simple types; those suitable for RDF and OWL are listed in [RDF Semantics].
In addition, XML Schema permits users to refine these built-in types by restriction, keeping only some of the values or some of the lexical forms.
As a further example, we may wish to talk about ages of adults in years, where an adult is 18 or over. This can be described as a restriction on the xsd:integer datatype.
<xsd:schema ...>
  <xsd:simpleType name="adultAge">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>
  ...
</xsd:schema>
In a Semantic Web context this may be used with the objects of triples of an eg:age property, used, for instance, when describing some members of a club which is restricted to adults, e.g. a nightclub or a political party.
We will use this example throughout this section, and assume it can be retrieved from http://example.org/simpleTypes.
Within RDF, and RDF reasoning, this additional restriction may be enough to catch some typos or data entry errors (e.g. putting an inappropriate value of 0 for the eg:age property). Within OWL, and OWL reasoning, this may interact with axioms in the ontology to significantly restrict the possible interpretations, adding to the modelling power of the language.
This section only deals with the problem of how to refer to such datatypes. Their semantics is treated in the appendices. Appendix A reviews the semantics of datatypes from the RDF and OWL recommendations. Appendix B describes how to integrate Description Logics (such as the SHOIN DL, which is the underpinning of OWL DL) with user defined datatypes.
We will also consider the topic of the target namespace from [XML SCHEMA1]. For clarity, we will consider two variants on this example. The first has no target namespace, the second defines one.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="adultAge">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>
  ...
</xs:schema>
<xs:schema targetNamespace="http://example.org/ns"
           elementFormDefault="qualified"
           xmlns:egn="http://example.org/ns"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="adultAge">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>
  ...
</xs:schema>
The case where the XML Schema has been assembled from multiple schema documents lies outside the scope of this document. This case is discussed in [XML SCHEMA1] and is explicitly not discussed in [XSCD].
When describing a resource with RDF or building an ontology with OWL that uses a user defined simple XML Schema datatype, such as adultAge above, what URI should be used to identify this datatype?
OWL was explicitly derived from the earlier [DAML+OIL] ontology language. This did include support for user defined simple XML Schema types, as seen in the definition of Senior in the example file ontology (http://www.daml.org/2001/03/daml+oil-ex). The solution is most clearly articulated by [Patel-Schneider]:
OWL can use XML Schema non-list simple types defined at the top level of an XML Schema document and given a name, by using the URI reference constructed from the URI of the document and the local name of the simple type.
Thus, with the example, the URI reference http://example.org/simpleTypes#adultAge identifies the datatype.
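Read as a procedure, the [Patel-Schneider] rule can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not a normative algorithm: the function name is hypothetical, the schema is assumed to be a single document whose root element is xs:schema, and only a direct xs:list child is checked when excluding list types.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

XSD = "{http://www.w3.org/2001/XMLSchema}"


def datatype_uris(document_uri: str, schema_text: str) -> dict:
    """Hypothetical helper: URI references for top-level named non-list simple types."""
    base, _ = urldefrag(document_uri)  # drop any fragment already present
    root = ET.fromstring(schema_text)
    uris = {}
    for child in root:  # direct children of xs:schema only ("top level")
        name = child.get("name")
        if child.tag == XSD + "simpleType" and name is not None:
            if child.find(XSD + "list") is None:  # exclude list types
                uris[name] = f"{base}#{name}"
    return uris


schema = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="adultAge">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

print(datatype_uris("http://example.org/simpleTypes", schema))
# {'adultAge': 'http://example.org/simpleTypes#adultAge'}
```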
With our example scenario, we may have a property eg:membersAge with range http://example.org/simpleTypes#adultAge, resulting in triples like:
eg:membersAge rdfs:range <http://example.org/simpleTypes#adultAge> .
_:aMember eg:name "Jane Doe" .
_:aMember eg:membersAge "24"^^<http://example.org/simpleTypes#adultAge> .
This can be abbreviated as:
eg:membersAge rdfs:range egdt:adultAge .
_:aMember eg:name "Jane Doe" .
_:aMember eg:membersAge "24"^^egdt:adultAge .
As a further example, a club which has members of all ages, but wishes to have a class of its adult members, could use an OWL expression like the following (in the [OWL Abstract Syntax]):
Class(AdultMembers complete intersectionOf(Members restriction(eg:membersAge allValuesFrom(egdt:adultAge))))
These datatype URIs use a non-standard approach to fragIDs, which is not in conformance with [RFC 2396], which says:
The semantics of a fragment identifier is a property of the data resulting from a retrieval action, regardless of the type of URI used in the reference. Therefore, the format and interpretation of fragment identifiers is dependent on the media type [RFC2046] of the retrieval result. [...]
A fragment identifier is only meaningful when a URI reference is intended for retrieval and the result of that retrieval is a document for which the identified fragment is consistently defined.
and [RFC 3023] which says:
As of today, no established specifications define identifiers for XML media types. However, a working draft published by W3C, namely "XML Pointer Language (XPointer)", attempts to define fragment identifiers for text/xml and application/xml.
In turn, the [XPointer Framework] says, of fragments like #adultAge:
A shorthand pointer, formerly known as a barename, consists of an NCName alone. It identifies at most one element in the resource's information set; specifically, the first one (if any) in document order that has a matching NCName as an identifier. The identifiers of an element are determined as follows:
If an element information item has an attribute information item among its [attributes] that is a schema-determined ID, then it is identified by the value of that attribute information item's [schema normalized value] property;
If an element information item has an element information item among its [children] that is a schema-determined ID, then it is identified by the value of that element information item's [schema normalized value] property;
If an element information item has an attribute information item among its [attributes] that is a DTD-determined ID, then it is identified by the value of that attribute information item's [normalized value] property.
An element information item may also be identified by an externally-determined ID value.
In short, the XML fragment #adultAge should identify the XML element with id="adultAge" rather than the datatype with name="adultAge".
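The difference between @id and @name can be made concrete with a small Python sketch of shorthand pointer resolution. It is a simplification under stated assumptions: an attribute literally named id is taken to be the element's identifier (in XML Schema documents the id attribute of a schema component is declared as an ID, so this approximates the schema-determined case), and DTD-determined and externally-determined IDs are ignored. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def resolve_shorthand_pointer(xml_text: str, ncname: str):
    """Hypothetical helper: first element in document order whose id equals ncname."""
    root = ET.fromstring(xml_text)
    for element in root.iter():  # iter() visits elements in document order
        if element.get("id") == ncname:
            return element
    return None  # the shorthand pointer identifies no element


named_only = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="adultAge">
    <xs:restriction base="xs:integer"><xs:minInclusive value="18"/></xs:restriction>
  </xs:simpleType>
</xs:schema>"""

# The DAML+OIL style fragment does not resolve: there is no id="adultAge".
print(resolve_shorthand_pointer(named_only, "adultAge"))  # None
```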
While the [XPointer Framework] is a W3C Recommendation, it is not fully standard since it is not yet endorsed by an IETF RFC updating [RFC 3023]. So while the approach defined in the XPointer Framework for such fragments consisting of barenames does have widespread support, discussion continues. Some other aspects of the XPointer Framework are more contentious within the IETF community.
Following XML Schema Component Designators [XSCD], the datatype of Example 2B has the URI reference http://example.org/simpleTypes#xscd(/type::adultAge).
A URI reference for Example 2C requires a choice of prefix for the namespace http://example.org/ns. A good choice is to use the prefix used by the schema itself, i.e. egn. The resulting URI reference for the datatype is then:
http://example.org/simpleTypes#xmlns(egn=http://example.org/ns)xscd(/type::egn:adultAge)
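The construction of these two URI references is mechanical. The Python sketch below (the function name and parameters are hypothetical) reproduces the URI references given for Examples 2B and 2C. It assumes that the type name, prefix and namespace contain no characters needing percent-encoding in a fragment; the parentheses, "=", ":" and "/" used here are all permitted in fragments.

```python
from typing import Optional


def xscd_type_uri(schema_uri: str, type_name: str,
                  namespace: Optional[str] = None,
                  prefix: Optional[str] = None) -> str:
    """Hypothetical helper: XSCD-style URI reference for a top-level simple type."""
    if namespace is None:
        # No target namespace: the type name is used unqualified.
        return f"{schema_uri}#xscd(/type::{type_name})"
    if prefix is None:
        raise ValueError("a prefix must be chosen for the target namespace")
    # Bind the chosen prefix with the xmlns() scheme, then use it in xscd().
    return (f"{schema_uri}#xmlns({prefix}={namespace})"
            f"xscd(/type::{prefix}:{type_name})")


print(xscd_type_uri("http://example.org/simpleTypes", "adultAge"))
# http://example.org/simpleTypes#xscd(/type::adultAge)
```

Choosing a prefix other than egn yields a different URI reference for the same datatype.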
When the schema does not define a prefix for the target namespace, perhaps by using the default namespace, then an arbitrary prefix needs to be chosen. As always with namespace prefixes, it is permitted to use any prefix of your choice, even when a conventional prefix is used in the schema document.
This XML Schema Component Designators [XSCD] approach defines an XPointer scheme that navigates the XML Schema document to identify any of the schema components using a fragment. This is very general: fragments are defined that identify many different aspects of the document, including unnamed simple types within complex schema.
There are some issues with this proposal for the use case of naming datatypes for use in the Semantic Web.
[XSCD] is still at Working Draft stage and depends on the [XPointer Framework] W3C Recommendation, whose relationship with [RFC 3023] is not yet fully secured.
The resulting URI references formed with [XSCD] cannot be used with the qname abbreviation used in [N3] and [RDF/A], because they end in a ")" which is not an NCNameChar.
[XSCD] is not entirely clear as to exactly what is denoted by these fragment identifiers. The general theme is that they designate components of XML Schemas. At times, this suggests that a designator for a simple type identifies that simple type, e.g.
Schema Component Designators can be used to provide references to arbitrary types [..]
Identifying types for casting, [...]
But other text suggests that what is identified is the schema component describing the type; for example, [XSCD] refers to such URI references as being:
The canonical schema component designator for this simple type definition
i.e. referring to the definition rather than to the type defined.
Our simple example 2B becomes:
eg:membersAge rdfs:range <http://example.org/simpleTypes#xscd(/type::adultAge)> .
_:aMember eg:name "Jane Doe" .
_:aMember eg:membersAge "24"^^<http://example.org/simpleTypes#xscd(/type::adultAge)> .
One way of reading the fragment is that it provides full semantic clarity about what is being identified: the xscd(...) shows that an XML Schema component is being identified; the /type indicates that a type is being identified; the ::adultAge shows which type is being identified.
The above URIrefs cannot be abbreviated as:
eg:membersAge rdfs:range egdt:xscd(/type::adultAge) .
_:aMember eg:name "Jane Doe" .
_:aMember eg:membersAge "24"^^egdt:xscd(/type::adultAge) .
because xscd(/type::adultAge) does not match the NCName production.
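Whether a fragment can serve as the local part of a qname can be checked with a regular expression. The Python sketch below uses a deliberately simplified, ASCII-only approximation of the NCName production (the real production in Namespaces in XML admits many more Unicode characters); the name is hypothetical.

```python
import re

# ASCII-only approximation of NCName: a letter or underscore, followed by
# letters, digits, '.', '-' or '_'. Colons, '(', ')' and '/' are excluded.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ncname(fragment: str) -> bool:
    """Hypothetical helper: does the fragment match the simplified NCName?"""
    return NCNAME_ASCII.fullmatch(fragment) is not None


print(is_ncname("adultAge"))               # True
print(is_ncname("xscd(/type::adultAge)"))  # False
```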
Overall, referring to XML Schema datatypes in the manner proposed by the XML Schema Working Group is a good practice, and will be more so when [XSCD] reaches Recommendation status.
id Attribute
In section 2.2, we saw that the RFCs and Recommendations concerning fragment IDs in XML Schema documents were suggestive (but not definitive) in treating fragment IDs that match the NCName production as identifying elements with the corresponding @id attribute.
An alternative to the DAML+OIL solution is to use the @id attribute instead of the @name attribute.
In cases where the XML Schema is under the control of a Semantic Web author, the full generality of [XSCD] is not needed. When defining your own datatype, derived from an XML Schema type, it is possible to use a simpler method. This involves slightly modifying the schema defining the datatype. Example 2A becomes:
<xsd:schema ...>
  <xsd:simpleType id="adultAge" name="adultAge">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="18"/>
    </xsd:restriction>
  </xsd:simpleType>
  ...
</xsd:schema>
The difference is that the datatype we wish to use is identified not only by the @name attribute, but also by an @id attribute.
While it is technically possible to use different values for these two attributes, it would be confusing.
The URI reference http://example.org/simpleTypes#adultAge, as before, can then be used to refer to the datatype.
In the terminology of [RFC 3986], the URI http://example.org/simpleTypes#adultAge identifies a secondary resource. When http://example.org/simpleTypes is retrieved as an XML Schema document, with media type application/xml, the fragment may be taken as a shorthand pointer. This depends on the (less contentious) shorthand pointer part of the [XPointer Framework]. It identifies a view on the XML representation of the primary resource, namely the XML element with the matching @id attribute.
When used in RDF (see [RDF Concepts]), the URI http://example.org/simpleTypes may be understood as identifying the schema, and the URI http://example.org/simpleTypes#adultAge as identifying the datatype itself, a resource defined or described by the representation identified by the application/xml retrieval.
It is preferred that no targetNamespace is given in the schema for this usage.
Moreover, if no element has an @id attribute with the given value, the [XPointer Framework] is clear that this is an error:
If no element information item is identified by a shorthand pointer's NCName, the pointer is in error.
Our example RDF is as before:
eg:membersAge rdfs:range <http://example.org/simpleTypes#adultAge> .
_:aMember eg:name "Jane Doe" .
_:aMember eg:membersAge "24"^^<http://example.org/simpleTypes#adultAge> .
Or:
eg:membersAge rdfs:range egdt:adultAge .
_:aMember eg:name "Jane Doe" .
_:aMember eg:membersAge "24"^^egdt:adultAge .
As a further example, a club which has members of all ages, but wishes to have a class of its adult members, could use an OWL expression like the following (in the [OWL Abstract Syntax]):
Class(AdultMembers complete intersectionOf(Members restriction(eg:membersAge allValuesFrom(egdt:adultAge))))
When referring to arbitrary user defined datatypes in arbitrary XML Schemas, the [XSCD] solution is appropriate. When an RDF or OWL author or tool is writing an XML Schema for use with an RDF/XML document, the @id solution may be preferred.
The DAML+OIL solution is non-standard, and suffers from problems such as the non-uniqueness of names within some XML Schema.
The other two solutions conform to XPointer, but XPointer is not fully endorsed by the IETF, which controls the XML media types (RFC 3023), to which RFC 2396 defers concerning fragment ID semantics.
The XSCD solution is still at WD stage.
The id solution, and to a lesser extent the XSCD solution, have the problem that the proposed URI references designate a syntactic thing, rather than the datatype itself. This could be seen as a property of the relevant definitions (XSCD and XPointer) being concerned about representations of resources rather than resources themselves. The URIref potentially can be seen as denoting (in the sense of RDF Semantics) the datatype, and represented by the XML element describing the datatype. Perhaps this reflects that in the Semantic Web we are interested in the formal concept of 'denotation' which may not in all cases be identical to the RFC 2396 concept of 'identify'. It may be possible to argue that the fragIDs identify a syntactic object, that themselves denote an abstraction, and that 'denote' as in RDF Semantics then denotes this abstraction in a two step process (the syntactic identification followed by the step of denoting respecting the semantics of the syntactic object identified).
EDITORS' OPINION: Our preference is a position combining aspects of all three solutions: use @id where possible and, once XSCD becomes a Recommendation, also use XSCD, particularly for XML Schemas that are not under the control of the RDF or OWL author. Jeremy Carroll, Jeff Pan
Two different authors publishing the same information on the Semantic Web may make different syntactic choices. They then say the same thing in different ways. This is seen most clearly when the two documents entail one another as determined by the [RDF Semantics] or [OWL Semantics].
One aspect of the syntactic choices facing an author is which datatypes to use. Even if they use only the built in [XML SCHEMA2] simple types, there are non-trivial choices, and different authors may legitimately choose different datatypes. This section addresses the issue of how implementations of [RDF Semantics] and [OWL Semantics] should allow for the different choices of datatype made by different authors.
What is the relationship between the value spaces of the various XML Schema built-in simple types when used within RDF and OWL?
In other words, when do two literals that are written differently refer to the same value? For example, "10"^^xsd:integer and "010"^^xsd:integer both denote the integer ten.
The most appropriate solution is to agree that all primitive XML Schema datatypes are treated as having disjoint value spaces. Thus all the harder examples (examples 3G through 3M) are non-entailments, and the pair of typed literal values being considered are different, since they have different primitive base datatypes. This approach is both easy to understand and easy to implement. We will call the resulting equality primitive equality.
The easier examples (3A through 3F) are all entailments, since in each of these either the primitive base datatype is the same for both the values being considered, or we are using the special rule from [RDF Semantics] that plain literals without language identifiers can be identified with the equivalent xsd:string.
Formally, in a unary datatype group, value spaces of primitive base datatypes are required to be disjoint from each other. For instance, if the value space of datatype D1 is a subset of that of datatype D2, then D1 and D2 cannot both be primitive base datatypes in a unary datatype group.
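Primitive equality is straightforward to implement. The Python sketch below is illustrative only: it covers a handful of built-in types, takes the derivation links from [XML SCHEMA2], does not check lexical-space restrictions, whitespace facets or other facets of derived types, and the helper names are hypothetical. A typed literal is represented as a (lexical form, datatype local name) pair.

```python
import struct
from decimal import Decimal

# Derivation by restriction, child -> base, for a few built-in types.
BASE = {
    "byte": "short", "short": "int", "int": "long", "long": "integer",
    "integer": "decimal", "nonNegativeInteger": "integer",
    "language": "token", "token": "normalizedString",
    "normalizedString": "string",
}


def primitive_base(datatype: str) -> str:
    while datatype in BASE:  # walk up until a primitive type is reached
        datatype = BASE[datatype]
    return datatype


def value(lexical: str, primitive: str):
    """Hypothetical lexical-to-value mapping for the primitive types used here."""
    if primitive == "decimal":
        return Decimal(lexical)
    if primitive == "float":  # nearest 32-bit float
        return struct.unpack("<f", struct.pack("<f", float(lexical)))[0]
    if primitive == "double":
        return float(lexical)
    return lexical  # string: the value is the character string itself


def identical(v1, v2) -> bool:
    # XML Schema 1.0 treats NaN as equal to itself in the value space;
    # v != v holds only for NaN.
    return v1 == v2 or (v1 != v1 and v2 != v2)


def same_value(lit1, lit2) -> bool:
    """Primitive equality: same primitive base datatype and identical values."""
    p1, p2 = primitive_base(lit1[1]), primitive_base(lit2[1])
    return p1 == p2 and identical(value(lit1[0], p1), value(lit2[0], p2))


print(same_value(("10", "integer"), ("010", "integer")))  # True
print(same_value(("40", "integer"), ("40", "float")))     # False
```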
eq
Like RDF and OWL, [XSLT 2.0] and [XQuery 1.0] require the ability to compare values of typed literals. The basic operation they use is the [XPath 2.0] eq operator. Given two literals, this operator does one of: return true, return false, or raise a type error.
eq compares numeric values with different primitive datatypes, e.g. 0 as a float and 0 as a decimal compare true under eq.
Most pairs of primitive types are incomparable under eq, and with the strong typing in [XPath 2.0] such comparisons are errors.
eq is stronger than the primitive equality discussed above, in that whenever two values derived from the same primitive base datatype are the same according to that primitive base datatype, they also compare as eq.
eq is designed to take implementation considerations into account. xsd:decimal has implementation variability; [XML SCHEMA2] specifies:
Note: All ·minimally conforming· processors ·must· support decimal numbers with a minimum of 18 decimal digits (i.e., with a ·totalDigits· of 18). However, ·minimally conforming· processors ·may· set an application-defined limit on the maximum number of decimal digits they are prepared to support, in which case that application-defined maximum number ·must· be clearly documented.
Where the application-defined limits are exceeded, an error must be thrown. This means that knowledge bases with decimal values that exceed 18 digits may not be usable with minimally conforming processors.
Further, [XML SCHEMA2] leaves implementations free to choose the equality function they use for floats and doubles, specifying:
Note: "Equality" in this Recommendation is defined to be "identity" (i.e., values that are identical in the ·value space· are equal and vice versa). Identity must be used for the few operations that are defined in this Recommendation. Applications using any of the datatypes defined in this Recommendation may use different definitions of equality for computational purposes; [IEEE 754-1985]-based computation systems are examples. Nothing in this Recommendation should be construed as requiring that such applications use identity as their equality relationship when computing.
These considerations lead to aspects of the definition of eq that are surprising to purists. For example, eq is non-transitive:
"3.2"^^xsd:decimal eq "3.2"^^xsd:float
"3.2"^^xsd:float eq "3.20000000000000000001"^^xsd:decimal
but not
"3.2"^^xsd:decimal eq "3.20000000000000000001"^^xsd:decimal
NaN (not-a-number), which can be represented in both the xsd:float and xsd:double datatypes, is treated specially, and eq is not even reflexive, i.e. it is not the case that:
"NaN"^^xsd:float eq "NaN"^^xsd:float
(INF, by contrast, is eq to itself.)
eq in RDF and OWL
Using eq as the basis for typed literal value semantics in RDF and OWL would, from an implementation point of view, amount to using eq at all points where a comparison between two literals is needed. Since there are many different approaches to implementing RDF and OWL semantics, such a procedural definition is somewhat unsatisfactory.
If this approach is pursued, the next version of this document will characterize the resulting implementation variability.
From the point of view of the ontologist or knowledge engineer, this will make clear that for interoperability it is necessary to avoid certain corner cases where rounding errors etc. could give surprising results.
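The non-transitivity arises from numeric type promotion. The following Python sketch models only that part of eq, assuming (following the [XPath 2.0] promotion rules) that a decimal compared with a float is cast to float, and a float compared with a double is cast to double. Integer-derived types are treated as decimals, non-numeric types and lexical validation are not handled, and the function names are hypothetical. Converting a decimal to a 32-bit float via a Python double can, in rare halfway cases, round differently from a direct conversion; this does not affect the examples here.

```python
import struct
from decimal import Decimal

RANK = {"integer": 0, "decimal": 0, "float": 1, "double": 2}


def to_float32(x) -> float:
    """Round to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


def numeric_value(lexical: str, datatype: str):
    if RANK[datatype] == 0:
        return Decimal(lexical)  # exact decimal value
    if datatype == "float":
        return to_float32(lexical)
    return float(lexical)  # double


def eq(lit1, lit2) -> bool:
    """Hypothetical numeric eq: promote both operands to the wider type."""
    (lex1, dt1), (lex2, dt2) = lit1, lit2
    v1, v2 = numeric_value(lex1, dt1), numeric_value(lex2, dt2)
    target = max(RANK[dt1], RANK[dt2])
    if target == 1:
        v1, v2 = to_float32(v1), to_float32(v2)
    elif target == 2:
        v1, v2 = float(v1), float(v2)
    return v1 == v2  # IEEE 754 comparison for float and double


a, f, b = ("3.2", "decimal"), ("3.2", "float"), ("3.20000000000000000001", "decimal")
print(eq(a, f), eq(f, b), eq(a, b))  # True True False
```

The sketch also shows the IEEE 754 treatment of special values: "NaN"^^xsd:float is not eq to itself, whereas "INF"^^xsd:float is.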
TODO: After Face-to-face Nov 2005: decide on one of these formal expressions.
A possible first sketch of a formal expression of the potential variability is as follows. While processing a set S of RDF graphs, over a vocabulary V, then the implementation may use any equivalence relation ~ over the typed literals in V that satisfies the following:
A second possibility might be to require implementations to use an equivalence relation formed as the transitive, symmetric, reflexive closure of eq over the typed literals in the vocabulary V.
Both of the above approaches may give surprises since, in the corner cases, extending the vocabulary V (for instance by including further graphs) may cause typed literals that were considered different to be considered the same.
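The second possibility, and the vocabulary dependence just described, can be illustrated with a union-find sketch in Python. It assumes a simplified numeric eq (decimal and float only, with decimals promoted to 32-bit float when compared with a float); the names are hypothetical. Taking the closure also makes the relation reflexive, even for literals such as NaN that are not eq to themselves.

```python
import struct
from decimal import Decimal


def to_float32(x) -> float:
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


def eq(lit1, lit2) -> bool:
    """Hypothetical simplified eq for decimal/float literals only."""
    (lex1, dt1), (lex2, dt2) = lit1, lit2
    if dt1 == dt2 == "decimal":
        return Decimal(lex1) == Decimal(lex2)
    # At least one side is a float: compare both as 32-bit floats.
    return to_float32(Decimal(lex1)) == to_float32(Decimal(lex2))


def equivalence_classes(vocabulary):
    """Partition by the reflexive, symmetric, transitive closure of eq."""
    parent = list(range(len(vocabulary)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vocabulary)):
        for j in range(i + 1, len(vocabulary)):
            if eq(vocabulary[i], vocabulary[j]):
                parent[find(i)] = find(j)
    classes = {}
    for i, literal in enumerate(vocabulary):
        classes.setdefault(find(i), []).append(literal)
    return list(classes.values())


v1 = [("3.2", "decimal"), ("3.20000000000000000001", "decimal")]
v2 = v1 + [("3.2", "float")]  # extending the vocabulary with one float
print(len(equivalence_classes(v1)), len(equivalence_classes(v2)))  # 2 1
```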
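A third possibility might be to formally capture the difference between eq and equality by specifying an approximation mapping, mapsTo, between different datatypes. Then eq is defined as follows:
$$
\mathrm{eq}(\texttt{"s1"\^{}\^{}u1},\ \texttt{"s2"\^{}\^{}u2}) =
\begin{cases}
\mathit{true} & \text{if } \mathit{L2S}(D(u1))(s1) = \mathit{L2S}(D(u2))(s2) \text{ or } \mathit{mapsTo}(D(u1), D(u2))(\texttt{"s1"\^{}\^{}u1}) = \mathit{L2S}(D(u2))(s2),\\
\mathit{false} & \text{if } \mathit{L2S}(D(u1))(s1) \neq \mathit{L2S}(D(u2))(s2) \text{ and } \mathit{mapsTo}(D(u1), D(u2))(\texttt{"s1"\^{}\^{}u1}) \neq \mathit{L2S}(D(u2))(s2),\\
\text{error} & \text{if } \mathit{mapsTo}(D(u1), D(u2)) \text{ is undefined.}
\end{cases}
$$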
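The mapsTo formulation can be sketched in Python. The table of approximation mappings below is hypothetical: it defines mapsTo only from xsd:decimal to xsd:float (rounding to the nearest 32-bit float) and, trivially, from a few datatypes to themselves; any other pair is treated as undefined and raises an error. L2S is modelled as a dictionary of simplified lexical-to-value functions.

```python
import struct
from decimal import Decimal


def to_float32(x) -> float:
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


# L2S(D(u)): hypothetical, simplified lexical-to-value mapping per datatype.
L2S = {"decimal": Decimal, "float": to_float32, "string": str}

# mapsTo(D(u1), D(u2)), applied to the lexical form of "s1"^^u1.
MAPS_TO = {
    ("decimal", "float"): lambda s: to_float32(Decimal(s)),
    ("decimal", "decimal"): Decimal,
    ("float", "float"): to_float32,
    ("string", "string"): str,
}


def eq(lit1, lit2) -> bool:
    (s1, u1), (s2, u2) = lit1, lit2
    if (u1, u2) not in MAPS_TO:
        raise TypeError(f"mapsTo({u1}, {u2}) is undefined")
    v2 = L2S[u2](s2)
    return L2S[u1](s1) == v2 or MAPS_TO[(u1, u2)](s1) == v2


print(eq(("3.2", "decimal"), ("3.2", "float")))  # True
```

As written, the definition maps u1 into u2, so the relation is not symmetric unless mapsTo is also supplied in the opposite direction; in this sketch, comparing a float with a decimal raises an error.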
Values of the xsd:string type do not compare eq to any other primitive type. Thus the xsd:anyURI example 3L is a non-entailment.
hexBinary and base64Binary do not compare under eq, so example 3M is also a non-entailment.
For the date and time datatypes, eq behaves like primitive equality.
Numeric comparisons are provided with detailed casting rules to allow for rounding errors etc., as for 1.3 in example 3H and example 3K. These rules also deal with 40 in example 3G and example 3J. Thus all four of these entailments hold.
As a special case, "NaN"^^xsd:float is not eq to itself, following the IEEE 754 treatment of NaN; it is unclear what impact this has on RDF and OWL.
In discussing the examples, we presented pairs of literals which denoted the same value. This relationship of denoting the same value forms an equivalence relation, which we will write as ~; it is conventionally written as '=' and called equality. It is reflexive, symmetric and transitive.
In terms of the [RDF Semantics] (see appendix A.1), the equivalence relation ~ can be constructed from the interpretation function IL in the following way:
~ = { <x,y> : IL(x)=IL(y), for any x, y ∈ LV }
In terms of [OWL Semantics] (see appendix A.2), this can be constructed in terms of the interpretation function ED as:
~ = { <x,y> : ED(x)=ED(y), for any x, y ∈ LV }
A key term we will use in the following examples is primitive base datatype in a type system. A recursive definition is:
In other words, the primitive base datatype of a datatype in a type system is found by walking up the restriction derivation tree until reaching a primitive type. Note that the concept of primitive base datatypes in a type system is slightly different from the concept of primitive base datatypes in a unary datatype group. This is because it is possible that a primitive base datatype of a type system is not in a datatype map, but its derived datatypes are. For instance, in Example_B, xsd:integer is a primitive base datatype in the unary datatype group G1.
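The recursive definition amounts to following restriction links until none remain. The Python sketch below assumes a hypothetical table of restriction links that includes the user defined egdt:adultAge (restricting xsd:integer) alongside some built-in types; the function mirrors the recursion directly.

```python
# Hypothetical restriction links, derived type -> base type.
RESTRICTS = {
    "egdt:adultAge": "xsd:integer",
    "xsd:byte": "xsd:short", "xsd:short": "xsd:int", "xsd:int": "xsd:long",
    "xsd:long": "xsd:integer", "xsd:integer": "xsd:decimal",
    "xsd:language": "xsd:token", "xsd:token": "xsd:normalizedString",
    "xsd:normalizedString": "xsd:string",
}


def primitive_base(datatype: str) -> str:
    """A type with no base is primitive; otherwise recurse on its base."""
    base = RESTRICTS.get(datatype)
    return datatype if base is None else primitive_base(base)


print(primitive_base("egdt:adultAge"))  # xsd:decimal
```

In a type system, the primitive base datatype of egdt:adultAge is xsd:decimal; in a unary datatype group whose datatype map omits xsd:decimal, a different type (such as xsd:integer) may play that role, as the paragraph above notes.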
We give some examples, both ones for which the RDF and OWL semantics are clear, and ones for which the semantics is not clear.
We give two sets of examples. In the first set, the typed literals compared always have the same primitive base datatype, and the additional behaviour coming from use of XPath eq is not exercised. In the second set, the comparisons depend on the additional semantics given by eq.
Each example is presented in two ways:
It is uncontested that in [XML SCHEMA2] a datatype derived by restriction refers to a subset of the values of its base datatype, and not to different values (see [XML SCHEMA2]).
Hence, two typed literals whose types have the same primitive base datatype, and whose lexical forms map to the same value of that primitive base datatype, are equal.
In addition, [RDF Semantics] explicitly sanctions identification of RDF plain literals without language tags with corresponding typed literals with datatype xsd:string.
As a first example, "15"^^xsd:byte and "15.0"^^xsd:decimal both denote the same value, fifteen. This follows because xsd:byte has primitive base datatype xsd:decimal.
This licenses the following entailment:
The same result holds for two types both of which have primitive base datatype xsd:decimal. For example, "15"^^xsd:byte and "15"^^xsd:nonNegativeInteger both denote fifteen, and the entailment:
Note that xsd:byte is not derived from xsd:nonNegativeInteger, or vice versa, even with intermediate steps.
xsd:language has primitive base datatype xsd:string. Thus "en-US"^^xsd:language and "en-US"^^xsd:string denote the same value, and the following entailment holds:
eg:doc dc:language "en-US"^^xsd:language .
entails
eg:doc dc:language "en-US"^^xsd:string .
However, although language identifiers are case insensitive according to [RFC 3066], this case insensitivity is not represented in the datatype, so "en-US"^^xsd:language and "en-us"^^xsd:language denote different values, and we have the following non-entailment:
eg:doc dc:language "en-US"^^xsd:language .
does not entail
eg:doc dc:language "en-us"^^xsd:language .
The [RDF Semantics] says (in an informative section):
the value space and lexical-to-value mapping of the XSD datatype xsd:string sanctions the identification of typed literals with plain literals without language tags for all character strings which are in the lexical space of the datatype, since both of them denote the Unicode character string which is displayed in the literal;
Thus "en-US"^^xsd:string denotes the same as the plain literal "en-US", and the following two entailments hold:
When the two typed literals being compared have different primitive base datatypes, the situation is more difficult. Under the XPath eq approach, the comparisons depend on the behaviour defined for eq. Under the primitive base datatype approach, all the values are assumed to be different, and entailments do not follow, even when this is counterintuitive. The number one, for instance, can be a float, a double, or a decimal.
Are these all the same?
Following eq, these are all equal.
Under the primitive base datatype approach, they are all different, since they all have different primitive base datatypes.
Is a URI (an xsd:anyURI) a string?
Is a binary blob the same whether it is hex encoded or base64 encoded?
A human age is conventionally given as an integer (number of years, except for babies), but a float is a plausible alternative representation. On April 7th 2004, Jeremy was forty. Under eq, "40"^^xsd:integer is eq to "40"^^xsd:float, so the corresponding entailment holds. Under the primitive base datatype approach, "40"^^xsd:integer has a different primitive base datatype from "40"^^xsd:float. They are therefore not equal, and:
eg:JeremyCarroll eg:ageInYears "40"^^xsd:integer .
does not entail
eg:JeremyCarroll eg:ageInYears "40"^^xsd:float .
Similarly, float and decimal are different primitive base datatypes. So, under the primitive base datatype approach, superficially similar values such as "1.3"^^xsd:float and "1.3"^^xsd:decimal are different.
Floats and doubles are defined with value spaces which are not dense, but heavily influenced by the binary system. Typical decimal numbers, such as 1.3, do not map neatly into that value space, so "1.3"^^xsd:float takes the value that is as close to 1.3 as possible within the float value space. This is an approximation. So, strictly, as numbers, "1.3"^^xsd:float is not the same as "1.3"^^xsd:decimal, but perhaps that is too severe and unhelpful for application developers.
However, they compare as equal using eq, so under eq the corresponding entailment holds, whereas under the primitive base datatype approach:
eg:car eg:engineSizeInLitres "1.3"^^xsd:decimal .
does not entail
eg:car eg:engineSizeInLitres "1.3"^^xsd:float .
Every value that can be represented as a float can also be represented as a double, but, as with float and decimal, neither float nor double is derived from the other.
Under eq, "40"^^xsd:double and "40"^^xsd:float compare as equal, so the corresponding entailment holds.
Under the primitive base datatype approach, "40"^^xsd:double and "40"^^xsd:float are treated as not equal, and:
eg:JeremyCarroll eg:ageInYears "40"^^xsd:double .
does not entail
eg:JeremyCarroll eg:ageInYears "40"^^xsd:float .
Similarly:
The engine size example, recast in terms of float and double, illustrates a further feature of eq. "1.3"^^xsd:float has the value 10905190×2⁻²³ (approx. 1.2999999523), whereas "1.3"^^xsd:double has the value 5854679515581645×2⁻⁵² (approx. 1.3000000000000000444). Despite this difference, eq compares these two typed literals as equal, overcoming the rounding error. Under the primitive base datatype approach, however, they are different, and:
eg:car eg:engineSizeInLitres "1.3"^^xsd:double .
does not entail
eg:car eg:engineSizeInLitres "1.3"^^xsd:float .
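The exact value of each reading of the lexical form "1.3" can be computed with a short Python sketch that returns the rational number behind it. It assumes IEEE 754 binary floating point, as on common Python builds. Single precision is obtained by round-tripping through the standard struct module. Because that path rounds to a double first, it may, for some other lexical forms, differ from rounding the decimal directly.

```python
import struct
from fractions import Fraction


def exact_value(lexical, datatype):
    """Return the exact rational value denoted by a numeric lexical form."""
    if datatype == "xsd:decimal":
        return Fraction(lexical)  # decimal values are exact
    if datatype == "xsd:double":
        # float() rounds the decimal string to the nearest IEEE 754 double;
        # Fraction() then recovers that double's exact binary value.
        return Fraction(float(lexical))
    if datatype == "xsd:float":
        # Round to single precision by packing into 4 bytes and unpacking.
        single = struct.unpack("<f", struct.pack("<f", float(lexical)))[0]
        return Fraction(single)
    raise ValueError("unsupported datatype: " + datatype)


for dt in ("xsd:decimal", "xsd:float", "xsd:double"):
    v = exact_value("1.3", dt)
    print(dt, v.numerator, "/", v.denominator)
```

For "1.3", the float reading multiplied by 2**23 is 10905190, and the double reading multiplied by 2**52 is 5854679515581645. The three readings are therefore three different numbers.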
Similarly, the two types string and anyURI are distinct primitive base datatypes. So, despite superficial similarities, under the primitive base datatype approach "http://www.example.org/doc"^^xsd:string is different from "http://www.example.org/doc"^^xsd:anyURI.
Under eq, however, the two values "http://www.example.org/doc"^^xsd:string and "http://www.example.org/doc"^^xsd:anyURI look similar, and XPath eq treats them as comparable (with a type promotion on the anyURI). They are therefore equal under the typed literal semantics described, and the corresponding entailment holds.
It is not entirely clear what the intended values of the anyURI datatype are. [RFC 2396] says: "A Uniform Resource Identifier (URI) is a compact string of characters for identifying an abstract or physical resource." So we may choose to think of an anyURI as essentially the same as the equivalent xsd:string, or maybe not. Thus we can ask whether "http://www.example.org/doc"^^xsd:string is the same as or different from "http://www.example.org/doc"^^xsd:anyURI. Under the primitive base datatype approach:
eg:doc dc:identifier "http://www.example.org/doc"^^xsd:anyURI .
does not entail
eg:doc dc:identifier "http://www.example.org/doc"^^xsd:string .
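The two treatments of the string/anyURI pair can be contrasted in a small Python sketch. It assumes that eq uses the Unicode codepoint collation, and it models literals as (lexical form, datatype) pairs. The function names are illustrative only.

```python
STRING_LIKE = {"xsd:string", "xsd:anyURI"}


def eq_string_like(lit1, lit2):
    """XPath eq on string-like literals: an xsd:anyURI operand is promoted to
    xsd:string, then the strings are compared codepoint by codepoint."""
    if lit1[1] not in STRING_LIKE or lit2[1] not in STRING_LIKE:
        raise TypeError("operands must be xsd:string or xsd:anyURI")
    return lit1[0] == lit2[0]


def primitive_equal_string_like(lit1, lit2):
    """Primitive base datatype approach: xsd:string and xsd:anyURI are both
    primitive, so literals with different datatypes never denote the same value."""
    return lit1[1] == lit2[1] and lit1[0] == lit2[0]


uri = ("http://www.example.org/doc", "xsd:anyURI")
text = ("http://www.example.org/doc", "xsd:string")
print(eq_string_like(uri, text), primitive_equal_string_like(uri, text))
```

The script prints True False. The values are equal after promotion under eq and distinct under the primitive base datatype approach.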
The final case where the value spaces of two XML Schema simple types appear to be the same is that of xsd:hexBinary and xsd:base64Binary. For both, the value space is described as "the set of finite-length sequences of binary octets". For instance, the binary sequence of two octets (00001111 10110111), i.e. the 16-bit integer 4023, can be written in hexadecimal as 0FB7. In base64 encoding [RFC 2045], this same sequence of two octets is represented as D7c=. So we can ask whether "0FB7"^^xsd:hexBinary is the same as "D7c="^^xsd:base64Binary. As an entailment:
Despite this, the two types hexBinary and base64Binary are distinct primitive base datatypes. So, under the primitive base datatype approach, "0FB7"^^xsd:hexBinary is different from "D7c="^^xsd:base64Binary.
Under eq, the two types hexBinary and base64Binary are incomparable: eq gives a type error when comparing "0FB7"^^xsd:hexBinary with "D7c="^^xsd:base64Binary.
So, under both approaches, the two values are treated as distinct, and:
eg:doc eg:checkSum "0FB7"^^xsd:hexBinary .
does not entail
eg:doc eg:checkSum "D7c="^^xsd:base64Binary .
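Read literally, both types share the value space quoted above, so comparing the two literals amounts to decoding each lexical form into octets. The following Python sketch uses only the standard library and checks that the two lexical forms in this example encode the same two octets. It does not model the primitive base datatype or eq treatments, under which the literals remain distinct.

```python
import base64


def octets_from_hex(lexical):
    """Decode an xsd:hexBinary lexical form to its sequence of octets."""
    return bytes.fromhex(lexical)


def octets_from_base64(lexical):
    """Decode an xsd:base64Binary lexical form to its sequence of octets."""
    return base64.b64decode(lexical, validate=True)


hex_octets = octets_from_hex("0FB7")
b64_octets = octets_from_base64("D7c=")
print(hex_octets == b64_octets)  # True
print(int.from_bytes(hex_octets, "big"))  # 4023
```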
While some of the non-entailments shown may be counterintuitive, it is possible to use SPARQL to query a graph and retrieve literal values that are similar even if they do not have the same primitive base datatype.
For example (related to examples 3H and 3K), consider a graph including the following three triples:
eg:car eg:engineSizeInLitres "1.3"^^xsd:double . eg:car eg:engineSizeInLitres "1.3"^^xsd:decimal . eg:car eg:engineSizeInLitres "1.3"^^xsd:float .
The following [SPARQL] query will match all three.
SELECT ?size WHERE { eg:car eg:engineSizeInLitres ?size . FILTER (?size = 1.3) . }
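The behaviour of this FILTER can be approximated in Python. The sketch assumes the XPath numeric type promotion rules that SPARQL takes from [F&O]: the constant 1.3 in the query is an xsd:decimal, and it is promoted to xsd:float or xsd:double when compared with a value of that type. The helper names are hypothetical, and this is not a SPARQL engine. Single-precision rounding is done via a double, which is a simplification.

```python
import struct
from decimal import Decimal


def to_single(x):
    """Round a Python float (a double) to the nearest IEEE 754 single value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def numeric_value(lexical, datatype):
    if datatype == "xsd:decimal":
        return Decimal(lexical)
    if datatype == "xsd:float":
        return to_single(float(lexical))
    if datatype == "xsd:double":
        return float(lexical)
    raise ValueError("not a numeric datatype: " + datatype)


def equals_decimal_constant(lexical, datatype, constant):
    """Sketch of FILTER (?size = constant) for an xsd:decimal constant: the
    constant is promoted to the datatype of ?size before the comparison."""
    value = numeric_value(lexical, datatype)
    if datatype == "xsd:decimal":
        return value == Decimal(constant)
    if datatype == "xsd:float":
        return value == to_single(float(Decimal(constant)))
    return value == float(Decimal(constant))  # xsd:double


graph = [("1.3", "xsd:double"), ("1.3", "xsd:decimal"), ("1.3", "xsd:float")]
print([dt for lex, dt in graph if equals_decimal_constant(lex, dt, "1.3")])
```

All three (lexical form, datatype) pairs pass the filter. This matches the claim that the query returns all three triples.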
Following [CARROLL 2002], it is possible to take a literal reading of [XML Schema Datatypes]. Under this reading, the numeric types all have exact values, and no allowance is made for rounding. The binary encoding types have the same value space, and can be compared. anyURI is essentially a string restricted by the grammar of [RFC 2396].
With such a literal reading, the harder example entailments hold except for those that involve rounding. That is, examples 3H and 3K are non-entailments because, despite appearances, three different numbers are involved in these two entailments (1.3, 1.2999999523..., 1.3000000000000000444...).
In short, this approach suggests some modifications of the XML Schema datatypes derivation tree.
The XPath 2.0 eq position is a compromise between the two theoretically sound positions and may well be the best solution on an 80/20 basis. It is also likely to have the best compatibility with XML technologies based on XPath 2.0, and with [SPARQL], which uses functions and operators from [F&O]. However, it is problematic that eq is not an equivalence relation, because of the corner cases. Implementor feedback is needed as to how significant this problem is.
A summary table showing the examples and how they are treated by the three possible approaches is as follows:
Example | Literal 1 | Literal 2 | Primitive base datatype | XPath eq | True values |
---|---|---|---|---|---|
3A | "15"^^xsd:byte | "15.0"^^xsd:decimal | true | true | true |
3B | "15"^^xsd:nonNegativeInteger | "15"^^xsd:byte | true | true | true |
3C | "en-US"^^xsd:language | "en-US"^^xsd:string | true | true | true |
3D | "en-US"^^xsd:language | "en-us"^^xsd:language | false | false | false |
3E | "en-US"^^xsd:string | "en-US" | true | true | true |
3F | "en-US"^^xsd:language | "en-US" | true | true | true |
3G | "40"^^xsd:integer | "40"^^xsd:float | false | true | true |
3H | "1.3"^^xsd:decimal | "1.3"^^xsd:float | false | true | false |
3J | "40"^^xsd:double | "40"^^xsd:float | false | true | true |
3K | "1.3"^^xsd:double | "1.3"^^xsd:float | false | true | false |
3L | "http://www.example.org/doc"^^xsd:anyURI | "http://www.example.org/doc"^^xsd:string | false | true | true |
3M | "0FB7"^^xsd:hexBinary | "D7c="^^xsd:base64Binary | false | false | true |
EDITORS' OPINION: Our preference is to use XPath eq despite it not being an equivalence relation (the difference between the two relations can be captured by the approximation mapping mapsTo). The advantage of this choice is compatibility with XPath and SPARQL. We hope that the implementation problems are resolvable. Jeremy Carroll, Jeff Pan
The [RDF Semantics] Recommendation discourages the use of the xsd:duration datatype (see [XML SCHEMA2]). It says:
[Some] built-in XML Schema datatypes are unsuitable for various reasons, and SHOULD NOT be used:
xsd:duration does not have a well-defined value space (this may be corrected in later revisions of XML Schema datatypes, in which case the revised datatype would be suitable for use in RDF datatyping);
The underlying difficulty is the impossibility of an unequivocal answer to the question "How many days in a month?" This has proved problematic in other applications of XML Schema datatypes. The XQuery and XSLT Working Groups have a proposed solution. They derive two new datatypes, xdt:yearMonthDuration and xdt:dayTimeDuration, from xsd:duration, sidestepping the unanswerable question. In section 10.2 of [F&O] we read:
[Definition] xdt:yearMonthDuration is derived from xs:duration by restricting its lexical representation to contain only the year and month components. The value space of xdt:yearMonthDuration is the set of xs:integer month values. The year and month components of xdt:yearMonthDuration correspond to the Gregorian year and month components defined in section 5.5.3.2 of [ISO 8601], respectively.
and
[Definition] xdt:dayTimeDuration is derived from xs:duration by restricting its lexical representation to contain only the days, hours, minutes and seconds components. The value space of xdt:dayTimeDuration is the set of fractional second values. The components of xdt:dayTimeDuration correspond to the day, hour, minute and second components defined in Section 5.5.3.2 of [ISO 8601], respectively.
These two new datatypes are suitable for use with RDF and OWL. (Note that they are not yet recommended, since [F&O] is still a Working Draft.)
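Because each derived type uses a single unit, months or seconds, a value can be computed without deciding how many days a month has. The following is a minimal Python sketch. It assumes simplified regular expressions written for this example rather than the normative lexical grammar in [F&O].

```python
import re
from decimal import Decimal

# Simplified lexical patterns (sign, then components) for the two types.
YEAR_MONTH = re.compile(r"(-?)P(?:(\d+)Y)?(?:(\d+)M)?")
DAY_TIME = re.compile(r"(-?)P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?")


def year_month_value(lexical):
    """Value of an xdt:yearMonthDuration: an integer number of months."""
    m = YEAR_MONTH.fullmatch(lexical)
    if m is None or (m.group(2) is None and m.group(3) is None):
        raise ValueError("not a yearMonthDuration: " + lexical)
    months = 12 * int(m.group(2) or 0) + int(m.group(3) or 0)
    return -months if m.group(1) else months


def day_time_value(lexical):
    """Value of an xdt:dayTimeDuration: a (possibly fractional) number of seconds."""
    m = DAY_TIME.fullmatch(lexical)
    # Reject no match, a dangling "T", or no components at all.
    if m is None or lexical.endswith("T") or all(g is None for g in m.groups()[1:]):
        raise ValueError("not a dayTimeDuration: " + lexical)
    days, hours, minutes = (int(g or 0) for g in m.groups()[1:4])
    seconds = Decimal(m.group(5) or 0)
    total = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return -total if m.group(1) else total


print(year_month_value("P1Y2M"), day_time_value("P1DT2H30M"))  # 14 95400
```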
For much data on the Semantic Web a motivation for providing type information is to permit the use of the data by engineering applications, and interoperation between engineering applications. Most such data will be marked up using the numeric types from XML Schema.
Loss in precision or unexpected changes in values due to automatic type conversion could be problematic in an engineering environment.
In the engineering domain there are three important types of usage for numerics: count, measurement, and constant.
xsd:integer, or a type derived from xsd:integer, is appropriate for counts.
xsd:float or xsd:double datatypes are appropriate for measurement, but it should be noted that these do not include a precision or uncertainty, which should be included as the value of a separate property.
[XML SCHEMA2] explicitly states for xsd:decimal that "Precision is not reflected in this value space, the number 2.0 is not distinct from the number 2.00." xsd:decimal will be more appropriate than an xsd:float or xsd:double for expressing a constant.
As an example, the following describes a measurement with an error range, indicating a weight in the interval (73.0 kg, 73.2 kg):
eg:JeremyCarroll eg:weight _:w . _:w eg:units "kilogram" . _:w eg:value "73.1"^^xsd:float . _:w eg:errorRange "0.1"^^xsd:float .
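A consumer of this data can recover the interval from the value and errorRange properties. The Python sketch below uses a hypothetical dictionary in place of an RDF store. It reads the lexical forms as decimals only so that the printed bounds are exact; treating them as xsd:float values would yield binary approximations.

```python
from decimal import Decimal

# Hypothetical in-memory copy of the blank node _:w above (property -> lexical form).
weight = {"units": "kilogram", "value": "73.1", "errorRange": "0.1"}


def interval(measurement):
    """Return (value - errorRange, value + errorRange) using decimal arithmetic."""
    value = Decimal(measurement["value"])
    error = Decimal(measurement["errorRange"])
    return value - error, value + error


low, high = interval(weight)
print(low, high, weight["units"])  # 73.0 73.2 kilogram
```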
These different usages suggest some potential needs and concerns for a type system underlying this.
The first of these issues will generally be reflected in the use of xsd:integer for counts, xsd:float and xsd:double for measurements, and xsd:decimal for constants.
The second issue, concerning the precision of measurements, must be addressed at the modelling level by using objects to state precision or error properties for measurements. This is not a bad approach in any case, since there are often other properties or metadata associated with a measurement.
For the third issue, concerning some constants, no solution is offered.
Evan Wallace is the author of Section 5.
Evan Wallace, Ashok Malhotra, Pat Hayes, Dave Peterson, Dave Reynolds, Michael Sperberg-McQueen and Ralph Swick contributed useful reviews.
According to [RDF Semantics] (see section 5.1), RDF allows the use of datatypes defined by any external type system, e.g., the XML Schema type system, that conforms to the following specification.
[Definition:] In RDF, a datatype d is characterised by a value space, V(d), which is a non-empty set, a lexical space, L(d), which is a non-empty set of Unicode strings, and a total mapping L2V(d) from the lexical space to the value space.
This specification allows the use of non-list XML Schema simple types as datatypes in RDF.
[Definition:] All literals have a lexical form that is a Unicode [UNICODE] string. Typed literals are of the form "v"^^u, where "v" is a Unicode string, called the lexical form of the typed literal, and u is a URI reference of a datatype. Plain literals have a lexical form and, optionally, a language tag as defined by [RFC-3066], normalized to lowercase.
Boolean is a datatype with value space {true, false}, lexical space {"true", "false", "1", "0"} and lexical-to-value mapping {"true"→true, "false"→false, "1"→true, "0"→false}.
"true"^^xsd:boolean is a typed literal, while "true" is a plain literal.
The associations between datatype URI references (e.g., xsd:boolean) and datatypes (e.g., boolean) can be provided by datatype maps defined as follows.
[Definition:] A datatype map D is a partial mapping from datatype URI references to datatypes.
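The datatype abstraction and the datatype map can be sketched in Python for the Boolean example. The sketch assumes a finite lexical space, which holds for boolean but not for most XML Schema types. The class and function names are illustrative. An ill-typed literal is represented by a marker object, mirroring the requirement in the next definition that such literals denote something outside LV.

```python
from dataclasses import dataclass
from typing import Dict, Hashable

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass
class Datatype:
    """A datatype with a finite lexical space, enough for xsd:boolean.

    L(d) is the set of keys of l2v, V(d) is its image, and l2v itself is the
    total lexical-to-value mapping L2V(d)."""
    l2v: Dict[str, Hashable]

    @property
    def lexical_space(self):
        return set(self.l2v)

    @property
    def value_space(self):
        return set(self.l2v.values())


BOOLEAN = Datatype({"true": True, "false": False, "1": True, "0": False})

# A datatype map: a partial mapping from datatype URI references to datatypes.
D = {XSD + "boolean": BOOLEAN}


class IllTyped:
    """Marker for a typed literal whose lexical form is not in L(d)."""

    def __init__(self, lexical, uri):
        self.lexical, self.uri = lexical, uri


def literal_value(lexical, uri, datatype_map):
    d = datatype_map.get(uri)
    if d is None:
        raise KeyError("no datatype for " + uri)  # the map is partial
    if lexical not in d.lexical_space:
        return IllTyped(lexical, uri)
    return d.l2v[lexical]


print(literal_value("1", XSD + "boolean", D), literal_value("false", XSD + "boolean", D))
```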
An RDFS-interpretation w.r.t. a datatype map D can be defined as follows.
[Definition:] Given a datatype map D, an RDFS D-interpretation I of a vocabulary V is any RDFS-interpretation of V∪{u |∃d.D(u)=d} which introduces (i) a distinguished subset LV of IR, called the set of literal values, which contains all the plain literals in V, and (ii) a mapping IL from literals in V into IR, and satisfies the following extra conditions:
... for each typed literal "s"^^u' ∈ V with I(u') = d: if s ∈ L(d), then IL("s"^^u') = L2V(d)(s); otherwise, IL("s"^^u') ∈ IR \ LV.
OWL Full datatyping follows the RDF Semantics as above; OWL DL datatyping is specified in section 3.1 of the [OWL Semantics], as follows.
The fundamental difference between RDF datatyping and OWL DL datatyping is the relationship between datatypes and classes. In OWL DL, datatypes are not classes, and object and datatype domains are disjoint with each other.
OWL allows different OWL reasoners to provide different supported datatypes.
[Definition:] Given a datatype map D, a datatype URI reference u is called a supported datatype URI reference w.r.t. D if there exists a datatype d such that <u,d>∈D (in this case, d is called a supported datatype w.r.t. D); otherwise, u is called an unsupported datatype URI reference w.r.t. D.
OWL provides the use of so called enumerated datatypes, which are built using literals.
[Definition:] Let y1, ..., yn be literals. An enumerated datatype is of the form oneOf(y1, ..., yn).
An OWL DL D-interpretation w.r.t. a datatype map D can be defined as follows.
[Definition:] An OWL DL datatype interpretation w.r.t. a datatype map D is a pair (LV,ED), where the datatype domain LV (only) contains the value spaces for each datatype in D and PL (the value space for plain literals, i.e., the union of the set of Unicode strings and the set of pairs of Unicode strings and language tags), and ED is a datatype interpretation function, which has to satisfy the following conditions:
... ED("s"^^u) = L2V(d)(s); otherwise, ED("s"^^u) is not defined.
... ED("s"^^u) ∈ ED(u).
ED(oneOf(y1, ..., yn)) = {ED(y1)} ∪ ... ∪ {ED(yn)}.
Note that here we simplify the presentation by using ED as the interpretation function for both datatype URI references and literals, while [OWL Semantics] uses EC for datatype URI references and L for literals.
In OWL Full, the disjointness restriction between object and datatype domains is not required.
[Pan 2004] and [PH 2005] present a scheme of integrating a large family of decidable Description Logics (including SHOIN, the underpinning of OWL DL) with unary datatype groups, so as to support user defined datatypes. A combined DL is decidable if the unary datatype group is conforming. A conforming unary datatype group is equipped with a decision procedure for the satisfiability problem of finite conjunctions over supported datatypes.
[Definition:] A unary datatype group G is a triple <D,B,dom>, where D is a datatype map, B is the set of primitive base datatype URI references in G and dom is the declared domain function. We call S the set of supported datatype URI references, i.e., for each u∈S, D(u) is defined; we require B ⊆ S. The declared domain function dom has the following properties: for each u ∈ S, if u ∈ B, dom(u) = u; otherwise, dom(u) = v, where v ∈ B. We assume that there exists a datatype URI reference rdfsx:DatatypeBottom such that D(rdfsx:DatatypeBottom) is undefined.
Note that in [Pan 2004] datatype groups allow arbitrary datatype predicates, while here we consider only datatypes, which can be regarded as unary datatype predicates.
G1 = (D1,B1,dom1) is a unary datatype group, where D1 = {xsd:integer → integer, xsd:string → string, xsd:nonNegativeInteger → ≥0, xsdx:integerLessThanN → <N}, B1 = {xsd:integer, xsd:string}, and dom1 = {xsd:integer → xsd:integer, xsd:string → xsd:string, xsd:nonNegativeInteger → xsd:integer, xsdx:integerLessThanN → xsd:integer}.
According to D1, we have S1 = {xsd:integer, xsd:string, xsd:nonNegativeInteger, xsdx:integerLessThanN}, hence B1 ⊆ S1. Note that the value space of <N is V(<N) = {i ∈ V(integer) | i < L2V(integer)(N)}, and by <N we mean that there exists a built-in datatype <N for each integer L2V(integer)(N).
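The structural conditions on a unary datatype group can be checked mechanically. The Python sketch below encodes G1 as plain dictionaries and represents the family <N by a single entry. It takes B1 to be {xsd:integer, xsd:string}, the URI references that dom1 maps to themselves. These encodings are assumptions made for illustration.

```python
# Group G1 written as plain Python data (URI references as strings).
D1 = {
    "xsd:integer": "integer",
    "xsd:string": "string",
    "xsd:nonNegativeInteger": "≥0",
    "xsdx:integerLessThanN": "<N",  # one entry stands for the whole family <N
}
B1 = {"xsd:integer", "xsd:string"}  # assumed: the URI refs dom1 maps to themselves
dom1 = {
    "xsd:integer": "xsd:integer",
    "xsd:string": "xsd:string",
    "xsd:nonNegativeInteger": "xsd:integer",
    "xsdx:integerLessThanN": "xsd:integer",
}


def is_unary_datatype_group(D, B, dom):
    """Check the structural conditions on <D, B, dom> from the definition."""
    S = set(D)  # supported datatype URI references: those on which D is defined
    if not B <= S:
        return False
    for u in S:
        if u in B and dom.get(u) != u:
            return False  # a primitive base must be its own declared domain
        if u not in B and dom.get(u) not in B:
            return False  # any other datatype must declare a primitive base
    return True


print(is_unary_datatype_group(D1, B1, dom1))
```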
In a unary datatype group, datatype expressions can be used to represent user defined datatypes.
[Definition:] Let G be a unary datatype group. The set of unary datatype expressions for G, abbreviated Dexp(G), is inductively defined as follows:
The XML Schema user defined datatype humanAge defined in [Example 1A] can be represented by the following unary datatype expression:
and(xsd:nonNegativeInteger, xsdx:integerLessThan150).
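One way to picture the humanAge expression is to treat each supported datatype in the xsd:integer domain as a predicate on integers. In this Python sketch, and(...) is interpreted as intersection and oneOf(...) as the union of singleton sets. The intersection reading is the usual one but is stated here as an assumption, and all names are hypothetical.

```python
from typing import Callable, Iterable

Predicate = Callable[[int], bool]


def less_than(n: int) -> Predicate:
    """A datatype <N for a given N: integers strictly below n."""
    return lambda i: i < n


# Supported datatypes whose declared domain is xsd:integer, as predicates on ints.
SUPPORTED = {
    "xsd:integer": lambda i: True,
    "xsd:nonNegativeInteger": lambda i: i >= 0,
    "xsdx:integerLessThan150": less_than(150),
}


def and_(*exprs: Predicate) -> Predicate:
    """Assumed semantics of and(E1, ..., En): intersection of the value sets."""
    return lambda i: all(e(i) for e in exprs)


def one_of(values: Iterable[int]) -> Predicate:
    """oneOf over already-interpreted literal values: {ED(y1)} ∪ ... ∪ {ED(yn)}."""
    allowed = frozenset(values)
    return lambda i: i in allowed


human_age = and_(SUPPORTED["xsd:nonNegativeInteger"], SUPPORTED["xsdx:integerLessThan150"])
print([age for age in (-1, 0, 40, 149, 150) if human_age(age)])  # [0, 40, 149]
```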
[Definition:] A datatype interpretation of a unary datatype group G = (D,B,dom) is a pair (LV,ED), where the datatype domain LV is a non-empty set and ED is a datatype interpretation function that has to satisfy the following conditions:
... ED("s"^^u) = L2V(d)(s); otherwise, ED("s"^^u) is not defined.
... ED("s"^^u) ∈ ED(u).
The datatype interpretation function ED can be extended to provide semantics to unary datatype expressions as follows:
... ED(oneOf(y1, ..., yn)) = {ED(y1)} ∪ ... ∪ {ED(yn)}.
[PH 2005] shows that we can combine any decidable DL (including SHOIN, the underpinning of OWL DL) that provides the conjunction and bottom constructors with a conforming unary datatype group, and the combined DL is still decidable.
'Semanitcs' in introduction.
Updated syntax for XML Schema Component Designators.
Changed the anchor #ref-mapsto to #defn-mapsto.
Deleted broken link from description of [ISO 11404]. Added reference to ISO homepage instead.
The earlier draft was a discussion document. This note is not intended as such, so some issues, particularly to do with the interactions between various standards, recommendations, RFCs etc., have been removed.
Removed DAML+OIL solution.
Removed true values solution.
Removed XPath eq solution.
Removed primitive basetype solution.
Moved OWL syntax example from DAML+OIL section to the end of id section.
In the XML Schema Component Designator section:
Deleted words "(less contentious)" and "Moreover, " from id solution.
Changed XML Schema Component Designator section, to indicate that XSCD is a good practice. In particular, see last paragraph.
Changed discussion subsection on user defined datatypes to suggest that both the (remaining) solutions are appropriate, and have no discussion. Changed title to Suggested Practice.
Discussion of harder examples cut down substantially, since these are all trivially non-entailments with the agreed semantics.
Discussion of harder examples rearranged and presented to illustrate the eq semantics.
Discussion of example 3J was incorrect and has been fixed.
Removed EDITORS' OPINION notes.
Deleted all uses of the word "derivation" in section 1.3 since it has caused confusion. Added links to the XML Schema document for union, list and restriction, to make it clear that the intended concept is "derivation" as defined by that document.
Added brief discussion of target namespace after example 2A providing further examples example 2B and example 2C. Scoped this document to not address "XML Schema [...] assembled from multiple schema documents". Added reference [XML SCHEMA1].
In the XML Schema Component Designator section: added more extended discussion of target namespace issue; and added example XSCD for schema with target namespace.
Added text showing how the @id solution does comply with the secondary resource concept from RFC 3986, when read in conjunction with RDF Concepts, XPointer and XML Schema.
Added "by throwing a type error" to description of eq.
Deleted incorrect comment about INF eq INF.
Removed incorrect discussion of anyURI and string as being incomparable.
Changed example 3L to be an entailment, since the anyURI is promoted to a string.
Reordered subsections in section 3, deleting old 3.4, 3.5, 3.6 and 3.7, and ordering the remaining subsections as follows: 3.1, 3.5, 3.4, 3.2, 3.3. Followed by renumbering.
Text discussing examples has changed, and the change tracking is not detailed.
Moved definition of primitive base datatype from the examples subsection to the formal analysis subsection.
Deleted references to the examples from the new section 3.2 (was 3.5 3.4)
Added example SPARQL query, to show how to use = in SPARQL to compare across the type hierarchy.
Added further acknowledgements.
Updated reference to RFC 2396 to be to RFC 3986
Updated Table of Contents
Deleted promise to characterize implementation variability of eq.
Removed unused references.
Updated versions of W3C WD's in references.