W3C

OWL 2 Web Ontology Language:
A Datatype for Internationalized Text

W3C Editor's Draft 22 September 2008

This version:
http://www.w3.org/2007/OWL/draft/ED-owl2-rdf-text-20080922/
Latest editor's draft:
http://www.w3.org/2007/OWL/draft/owl2-rdf-text/
Authors:
Jie Bao, Rensselaer Polytechnic Institute, Troy, New York, USA
Axel Polleres, DERI Galway at the National University of Ireland, Galway, Ireland
Boris Motik, Oxford University, Oxford, UK


Abstract

This document presents the specification for a primitive datatype representing internationalized text that is used both in the RIF and OWL languages.

Status of this Document

May Be Superseded

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is being published as one of a set of 8 documents:

  1. Structural Specification and Functional-Style Syntax
  2. Model-Theoretic Semantics
  3. RDF-Based Semantics
  4. Mapping to RDF Graphs
  5. XML Serialization
  6. Profiles
  7. Conformance and Test Cases
  8. A Datatype for Internationalized Text (this document)

Please Comment By ASAP

The OWL Working Group seeks public feedback on these Working Drafts. Please send your comments to public-owl-comments@w3.org (public archive). If possible, please offer specific changes to the text that would address your concern. You may also wish to check the Wiki Version of this document for internal-review comments and changes being drafted which may address your concerns.

No Endorsement

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Patents

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.



1 Introduction

Internationalized text, that is, text that additionally carries a language tag, is used in several existing W3C specifications, such as RDF, XML, OWL, and RIF. This specification defines a datatype called rdf:text in order to allow specifications such as RDF, OWL, and RIF to refer to internationalized text literals in an interoperable way. Several W3C working groups, including the OWL WG and the RIF WG, have worked in parallel on support for internationalized strings. Collaboration between these two working groups on the choice of language constructs for internationalized strings has led to the present specification [1][2].

Parts of this document are based on the current work on rif:text [3] (RIF WG) and owl:internationalizedString [4] (OWL WG); for more details, see a summary.

2 Preliminaries

A character is an atomic unit of communication. The structure of characters is not further specified in this document, other than to note that each character has a Universal Character Set (UCS) code point [ISO/IEC 10646] (or, equivalently, a Unicode code point [UNICODE]). The set of available characters is assumed to be infinite, and is thus independent of the current version of UCS and Unicode.

A string is a finite sequence of characters. The length of a string is the number of characters in it. Strings are written in this specification by enclosing them in quotes. Two strings are identical if they contain exactly the same characters in exactly the same order.

To understand the rationale behind the assumption on the infinite number of characters, consider the following OWL 2 ontology:

ClassAssertion( a:i MinCardinality( n a:some-property DatatypeRestriction( xsd:string length 1 ) ) )

Intuitively, this OWL 2 axiom states that the individual a:i is connected to at least n different strings of length 1. If one assumes that there are exactly m UCS characters, then this ontology is satisfiable if and only if n ≤ m. This has several undesirable consequences; in particular, the satisfiability of the ontology would depend on the version of UCS with respect to which it is interpreted.

In order to avoid such problems, this specification assumes that the number of UCS characters is infinite; that is, m = ∞. Despite this assumption, at any given point in time, UCS provides means of addressing only a finite subset of this set.

Thus, the example ontology is satisfiable regardless of the version of UCS with respect to which it is interpreted.
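The counting argument above can be sketched in a few lines of Python. This is only an illustration: the function name example_is_satisfiable is hypothetical, and the check covers this one example axiom, not OWL 2 satisfiability in general.

```python
import math

def example_is_satisfiable(n: int, m: float) -> bool:
    """Hypothetical helper for the example axiom only.

    The individual needs at least n distinct strings of length 1, and
    there are exactly as many such strings as there are characters (m).
    """
    return n <= m

# With a finite character set, the answer depends on m.
print(example_is_satisfiable(5, 3))         # False
print(example_is_satisfiable(5, 10))        # True

# Under the assumption m = infinity, the answer no longer depends on m.
print(example_is_satisfiable(5, math.inf))  # True
```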

The following namespace prefixes are used throughout this document:

3 Definition of the rdf:text Datatype

The datatype identified by the URI http://www.w3.org/1999/02/22-rdf-syntax-ns#text (abbreviated rdf:text) allows for the representation of internationalized text literals. Beyond the RIF and OWL specifications, this datatype is expected to supersede RDF's plain literals with language tags (cf. [5]), which is why it has been added to the rdf: namespace.

The datatype is defined along the lines of XML Schema Datatypes [XML Schema Datatypes] as consisting of a value space and a lexical space, with a mapping between the lexical value (i.e., an element of the lexical space) and the data value (i.e., an element of the value space). The former determines the set of values, whereas the latter provides means for referring to particular values. This specification also defines several shortcuts that can be used in abstract syntaxes such as the presentation syntaxes of OWL and RIF, or in the TURTLE syntax for RDF.

Value Space. The value space of rdf:text is the set of all pairs of the form 〈 "text" , "lang" 〉, where "text" is a string and "lang" is either the empty string "" or a lowercase language tag as specified in BCP-47 [BCP-47].

Lexical Space. A lexical value of an rdf:text literal is a string "val" that contains at least one character @ and that satisfies the following condition: let i be the position of the last character @ in "val", and let "abc" and "tag" be the substrings of "val" containing the characters up to and after position i (noninclusive), respectively; then "tag" must be either empty or a valid language tag. Each such lexical value is assigned a data value 〈 "abc", "lc-tag" 〉, where "lc-tag" is the string "tag" converted to lowercase.

Lexical value "Family Guy@en" is mapped to the data value 〈 "Family Guy" , "en" 〉, and "Family Guy@" is mapped to 〈 "Family Guy" , "" 〉. Furthermore, "Family Guy" is not a valid lexical value of rdf:text because it does not contain the character @.
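The lexical-to-value mapping described above can be sketched as follows. The function names are hypothetical, and the language tag check is a rough well-formedness approximation assumed for illustration, not a BCP-47 validator.

```python
from typing import Tuple

def is_plausible_language_tag(tag: str) -> bool:
    """Rough approximation (assumption): ASCII alphanumeric subtags of
    1 to 8 characters separated by '-', the first one alphabetic."""
    subtags = tag.split("-")
    return subtags[0].isalpha() and all(
        1 <= len(s) <= 8 and s.isascii() and s.isalnum() for s in subtags
    )

def rdf_text_value(lexical: str) -> Tuple[str, str]:
    """Map an rdf:text lexical value to its data value (text, lang)."""
    i = lexical.rfind("@")  # position of the LAST '@'
    if i == -1:
        raise ValueError("an rdf:text lexical value must contain '@'")
    text, tag = lexical[:i], lexical[i + 1:]
    if tag and not is_plausible_language_tag(tag):
        raise ValueError(f"not a valid language tag: {tag!r}")
    # The language tag is converted to lowercase in the data value.
    return (text, tag.lower())

print(rdf_text_value("Family Guy@en"))  # ('Family Guy', 'en')
print(rdf_text_value("Family Guy@"))    # ('Family Guy', '')
```

Because the split happens at the last @, a lexical value such as "a@b@en" yields the text "a@b" and the tag "en".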

3.1 xsd:string as a restriction of rdf:text

The xsd:string datatype is a datatype defined in XML Schema Datatypes [XML Schema Datatypes] as having the value space equal to the set of all strings. Thus, the value space of xsd:string is not a subset of the value space of rdf:text, which may cause problems for certain applications of this specification. A similar problem arises with XML Schema datatypes that are derived from xsd:string.

To overcome this difficulty, specifications that use rdf:text MAY choose to interpret the datatypes from the following list in a slightly different way. The resulting datatypes have value spaces that are isomorphic with the value spaces from XML Schema Datatypes [XML Schema Datatypes], but that are subsets of the value space of rdf:text.

Value Space. For DT a datatype from the above list, the value space of DT is the set of all pairs of the form 〈 "text" , "" 〉 where "text" is a string matching the restrictions of DT as specified in XML Schema Datatypes [XML Schema Datatypes] and "" is the empty string.

Lexical Space. For DT a datatype from the above list, the lexical space of DT is the set of all strings "text" that match the restrictions of DT as specified in XML Schema Datatypes [XML Schema Datatypes]. Each lexical value "text" is assigned a data value 〈 "text" , "" 〉.
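A minimal sketch of this reinterpretation for xsd:string alone, which imposes no restriction on the string. The function names are hypothetical; to_plain_string illustrates the isomorphism with the original XML Schema value space.

```python
from typing import Tuple

def xsd_string_value(lexical: str) -> Tuple[str, str]:
    """Reinterpreted xsd:string: every lexical value maps to (text, "")."""
    return (lexical, "")

def to_plain_string(value: Tuple[str, str]) -> str:
    """Inverse direction: recover the XML Schema string from the pair."""
    text, lang = value
    if lang != "":
        raise ValueError("not in the value space of xsd:string")
    return text

print(xsd_string_value("Padre de familia"))  # ('Padre de familia', '')

# An '@' has no special meaning in an xsd:string lexical value.
print(xsd_string_value("user@example"))      # ('user@example', '')
```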

3.2 Abbreviations of rdf:text and xsd:string Literals

In syntaxes such as the RIF presentation syntax [6], the OWL 2 functional-style syntax [7], or the TURTLE syntax [8], literals are written using the form "rep"^^datatypeURI. This specification defines a convenient representation for rdf:text and xsd:string literals. In particular, literals of the form "text@lang"^^rdf:text where "lang" is not empty can be abbreviated as "text"@lang; furthermore, literals of the form "text"^^xsd:string can be abbreviated as "text". If an implementation supports abbreviation of literals, it SHOULD abbreviate the literals eagerly whenever possible.
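The eager abbreviation rule can be sketched as a serializer. The function names and the use of the prefixed names rdf:text and xsd:string as plain Python strings are assumptions made for illustration; quotes and backslashes are escaped with a backslash, following the quoting convention this specification adopts for abbreviated literals.

```python
RDF_TEXT = "rdf:text"
XSD_STRING = "xsd:string"

def quote(s: str) -> str:
    """Escape backslashes and double quotes, then wrap in double quotes."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

def write_literal(lexical: str, datatype: str) -> str:
    """Serialize a literal, abbreviating whenever the rule allows it."""
    if datatype == XSD_STRING:
        return quote(lexical)                  # "text"
    if datatype == RDF_TEXT:
        text, sep, lang = lexical.rpartition("@")
        if sep and lang:                       # only if lang is not empty
            return quote(text) + "@" + lang    # "text"@lang
    return quote(lexical) + "^^" + datatype    # full form

print(write_literal("Padre de familia@es", RDF_TEXT))  # "Padre de familia"@es
print(write_literal("Padre de familia", XSD_STRING))   # "Padre de familia"
print(write_literal("Padre de familia@", RDF_TEXT))    # "Padre de familia@"^^rdf:text
```

An rdf:text literal with an empty language tag is left in full form, since the abbreviation applies only when "lang" is not empty.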

The abbreviated literals can be written using the following grammar. A subset of the N-triples quoting mechanism is employed in order to allow strings to contain quotes.

quotedString := '"' a finite sequence of characters with double quotes and backslashes replaced by the double quote or backslash preceded by a backslash '"'
languageTag := a nonempty (not quoted) string defined as specified in BCP-47 [BCP-47]
abbreviatedXSDStringLiteral := quotedString
abbreviatedRDFTextLiteral := quotedString '@' languageTag
abbreviatedLiteral := abbreviatedXSDStringLiteral | abbreviatedRDFTextLiteral

Text matching the abbreviatedXSDStringLiteral production SHOULD be mapped to an xsd:string literal, and text matching the abbreviatedRDFTextLiteral production SHOULD be mapped to an rdf:text literal.
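The grammar and the mapping above can be sketched with a regular expression. The function name expand_abbreviated is hypothetical, and the language tag pattern is a rough approximation of BCP-47 assumed for illustration only.

```python
import re
from typing import Tuple

# quotedString, optionally followed by '@' languageTag.
# The tag pattern (alphanumeric subtags joined by '-') is an approximation.
_ABBREVIATED = re.compile(
    r'"((?:[^"\\]|\\["\\])*)"(?:@([A-Za-z0-9]+(?:-[A-Za-z0-9]+)*))?'
)

def expand_abbreviated(src: str) -> Tuple[str, str]:
    """Return (lexical value, datatype) for an abbreviated literal."""
    m = _ABBREVIATED.fullmatch(src)
    if m is None:
        raise ValueError(f"not an abbreviated literal: {src!r}")
    # Undo the backslash escaping of quotes and backslashes.
    text = re.sub(r'\\(["\\])', r'\1', m.group(1))
    lang = m.group(2)
    if lang is None:
        return (text, "xsd:string")          # abbreviatedXSDStringLiteral
    return (text + "@" + lang, "rdf:text")   # abbreviatedRDFTextLiteral

print(expand_abbreviated('"Padre de familia"@es'))  # ('Padre de familia@es', 'rdf:text')
print(expand_abbreviated('"Padre de familia"'))     # ('Padre de familia', 'xsd:string')
```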

"Padre de familia"@es is an abbreviation for the rdf:text literal "Padre de familia@es"^^rdf:text — a literal denoting a pair consisting of the string "Padre de familia" and the language tag es denoting the Spanish language. Furthermore, "Padre de familia" is an abbreviation for an xsd:string literal "Padre de familia"^^xsd:string, which is mapped to the same data value as the rdf:text literal "Padre de familia@"^^rdf:text.

4 Open Issues

The corresponding sections of the OWL 2 Structural Specification and Functional-Style Syntax, OWL Model-Theoretic Semantics, and RIF Data Types and Built-Ins will be updated once agreement is reached. It is currently not clear whether this document will contain a definition of facets on rdf:text.

5 References

[RFC-4646]
RFC 4646 - Tags for Identifying Languages. A. Phillips and M. Davis. IETF, September 2006, http://www.ietf.org/rfc/rfc4646.txt. Latest version is available as BCP-47.
[UNICODE]
The Unicode Standard. The Unicode Consortium.
[IRC Log July 21, 2008]
Joint meeting of OWL, RIF and I18N WGs.
[ISO/IEC 10646]
ISO/IEC 10646-1:2000. Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 1: Architecture and Basic Multilingual Plane and ISO/IEC 10646-2:2001. Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 2: Supplementary Planes, as, from time to time, amended, replaced by a new edition or expanded by the addition of new parts. [Geneva]: International Organization for Standardization. ISO (International Organization for Standardization).
[BCP-47]
BCP-47 - Tags for Identifying Languages. A. Phillips, M. Davis, eds., IETF, September 2006, http://www.rfc-editor.org/rfc/bcp/bcp47.txt.
[XML Schema Datatypes]
XML Schema Part 2: Datatypes Second Edition. Paul V. Biron and Ashok Malhotra, eds. W3C Recommendation 28 October 2004.