Copyright © 2008 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
The OWL Working Group seeks public feedback on these Working Drafts. Please send your comments to public-owl-comments@w3.org (public archive). If possible, please offer specific changes to the text that would address your concern. You may also wish to check the Wiki Version of this document for internal-review comments and changes being drafted which may address your concerns.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
Internationalized text (that is, text that additionally conveys information in terms of a language tag) is used in several existing W3C specifications, such as RDF, XML, OWL, and RIF. This specification defines a datatype called rdf:text in order to allow specifications such as RDF, OWL, and RIF to refer to internationalized text literals in an interoperable way. Parallel efforts have been made to support internationalized strings by several W3C working groups, including the OWL WG and the RIF WG. Collaboration between the two working groups on the choice of language constructs for internationalized strings has led to the present specification [1][2].
Parts of this document are based on the current work on rif:text [3] (RIF WG) and owl:internationalizedString [4] (OWL WG); for more details, see a summary.
A character is an atomic unit of communication. The structure of characters is not further specified in this document, other than to note that each character has a Universal Character Set (UCS) code point [ISO/IEC 10646] (or, equivalently, a Unicode code point [UNICODE]). The set of available characters is assumed to be infinite, and it is thus independent from the current version of UCS and Unicode.
A string is a finite sequence of characters. The length of a string is the number of characters in it. Strings are written in this specification by enclosing them in quotes. Two strings are identical if they contain exactly the same sequence of characters.
To understand the rationale behind the assumption on the infinite number of characters, consider the following OWL 2 ontology:
ClassAssertion( a:i MinCardinality( n a:some-property DatatypeRestriction( xsd:string xsd:length 1 ) ) )
Intuitively, this OWL 2 axiom states that the individual a:i is connected to at least n different strings of length 1. If one assumes that there are exactly m UCS characters, then this ontology is satisfiable if and only if n ≤ m. This has several undesirable consequences:
In order to avoid such problems, this specification assumes that the number of UCS characters is infinite; that is, m = ∞. Despite this assumption, at any given point in time, UCS provides means of addressing only a finite subset of this set.
Thus, the example ontology is satisfiable regardless of the version of UCS with respect to which it is interpreted.
A language tag is a string of the form specified in BCP 47 [BCP 47].
This specification uses Uniform Resource Identifiers (URIs) for naming datatypes and their components, which are defined in RFC 3986 [RFC 3986]. For readability, URIs are often abbreviated according to the convention of XML Namespaces [XML Namespaces]. The following namespace prefixes are used throughout this document:
Datatypes are defined in this document along the lines of XML Schema Datatypes [XML Schema Datatypes]. Each datatype is identified by a URI and is described by the following components:
The italicized keywords MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY specify certain aspects of the normative behavior of tools implementing this specification, and are interpreted as specified in RFC 2119 [RFC 2119].
The datatype identified by the URI http://www.w3.org/1999/02/22-rdf-syntax-ns#text (abbreviated rdf:text) allows for the representation of internationalized text literals. In addition to being used in the RIF and OWL specifications, this datatype is expected to supersede RDF's plain literals with language tags (cf. [5]), which is why it has been added to the rdf: namespace.
Value Space. The value space of rdf:text is the set of all pairs of the form 〈 "text" , "lang" 〉, where "text" is a string and "lang" is either the empty string "" or a lowercase language tag.
Lexical Space. A lexical value of an rdf:text literal is a string "val" that contains at least one character @ and that satisfies the following condition:
Let i be the position of the last character @ in "val", and let "abc" and "tag" be the substrings of "val" containing the characters before and after position i (noninclusive), respectively. Then, "tag" MUST be either empty or a valid language tag.
Each such lexical value is assigned a data value 〈 "abc", "lc-tag" 〉, where "lc-tag" is the string "tag" converted to lowercase.
Lexical value "Family Guy@en" is mapped to the data value 〈 "Family Guy" , "en" 〉, and "Family Guy@" is mapped to 〈 "Family Guy" , "" 〉. Furthermore, "Family Guy" is not a valid lexical value of rdf:text because it does not contain the character @.
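The lexical-to-value mapping described above can be sketched in Python as follows. This is only an illustration: the function name parse_rdf_text is hypothetical, and the check that a nonempty tag is a valid BCP 47 language tag is deliberately left out.

```python
from typing import Tuple


def parse_rdf_text(val: str) -> Tuple[str, str]:
    """Map an rdf:text lexical value to the pair (text, lowercase language tag)."""
    i = val.rfind("@")  # position of the LAST '@' in the lexical value
    if i == -1:
        raise ValueError("not in the lexical space of rdf:text: no '@' found")
    text, tag = val[:i], val[i + 1:]
    # Assumption: validation of a nonempty tag against BCP 47 is omitted here.
    return (text, tag.lower())


print(parse_rdf_text("Family Guy@en"))  # ('Family Guy', 'en')
print(parse_rdf_text("Family Guy@"))    # ('Family Guy', '')
```

Splitting at the last @ rather than the first means that a text part may itself contain the character @.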
Facet Space. The facet space of the rdf:text datatype is shown in Table 1.
A pair of the form... | ...is mapped to the subset of the value space of rdf:text containing all pairs of the form 〈 "text" , "lang" 〉 such that... |
---|---|
〈 xsd:minLength v 〉 where v is a nonnegative integer | the length of "text" is at least v |
〈 xsd:maxLength v 〉 where v is a nonnegative integer | the length of "text" is at most v |
〈 xsd:length v 〉 where v is a nonnegative integer | the length of "text" is exactly v |
〈 xsd:pattern v 〉 where v is a string specifying a regular expression with the syntax as in Section F of XML Schema Datatypes [XML Schema Datatypes] | "text" matches the regular expression v |
〈 rdf:langPattern v 〉 where v is a string specifying a regular expression with the syntax as in Section F of XML Schema Datatypes [XML Schema Datatypes] | "lang" matches the regular expression v |
The xsd:string datatype is a datatype defined in XML Schema Datatypes [XML Schema Datatypes] as having the value space equal to the set of all strings. Thus, the value space of xsd:string is not a subset of the value space of rdf:text, which may cause problems for certain applications of this specification. A similar problem arises with XML Schema datatypes that are derived from xsd:string.
To overcome this difficulty, specifications that use rdf:text MAY choose to interpret the datatypes from the following list in a slightly different way. The resulting datatypes have value spaces that are isomorphic with the value spaces from XML Schema Datatypes [XML Schema Datatypes], but that are subsets of the value space of rdf:text.
Value Space. For DT a datatype from the above list, the value space of DT is the set of all pairs of the form 〈 "text" , "" 〉 where "text" is a string matching the restrictions of DT as specified in XML Schema Datatypes [XML Schema Datatypes] and "" is the empty string.
Lexical Space. For DT a datatype from the above list, the lexical space of DT is the set of strings "text" that match the restrictions of DT as specified in XML Schema Datatypes [XML Schema Datatypes]. Each lexical value "text" is assigned the data value 〈 "text" , "" 〉.
Facet Space. Each datatype DT from the above list supports the constraining facets xsd:minLength, xsd:maxLength, xsd:length, and xsd:pattern. The facet value of each pair for DT is the same as in Table 1, with the difference that the result is a subset of the value space of DT rather than of rdf:text.
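The facet semantics of Table 1 can be sketched in Python as a membership test on pairs. The name satisfies_facet is hypothetical, and the sketch assumes that the patterns used fall into the subset of regular-expression syntax shared by XML Schema and Python's re module; XML Schema patterns are implicitly anchored, which is why re.fullmatch is used. For the reinterpreted XML Schema datatypes, the same test applies to pairs whose language part is the empty string.

```python
import re
from typing import Tuple, Union

Pair = Tuple[str, str]  # (text, lang); lang is "" for xsd:string-like values


def satisfies_facet(value: Pair, facet: str, v: Union[int, str]) -> bool:
    """Return True if `value` lies in the subset selected by the pair (facet, v)."""
    text, lang = value
    if facet == "xsd:minLength":
        return len(text) >= v  # len() counts code points in Python 3
    if facet == "xsd:maxLength":
        return len(text) <= v
    if facet == "xsd:length":
        return len(text) == v
    if facet == "xsd:pattern":
        return re.fullmatch(v, text) is not None
    if facet == "rdf:langPattern":
        return re.fullmatch(v, lang) is not None
    raise ValueError("unsupported facet: " + facet)


print(satisfies_facet(("Family Guy", "en"), "xsd:minLength", 3))              # True
print(satisfies_facet(("Family Guy", "en-us"), "rdf:langPattern", "en(-.*)?"))  # True
print(satisfies_facet(("Family Guy", ""), "xsd:length", 3))                   # False
```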
In syntaxes such as the RIF presentation syntax [6], the OWL 2 functional-style syntax [7], or the TURTLE syntax [8], literals are written using the form "rep"^^datatypeURI. This specification defines a convenient representation for rdf:text and xsd:string literals. In particular, literals of the form "text@lang"^^rdf:text where "lang" is not empty can be abbreviated as "text"@lang; furthermore, literals of the form "text"^^xsd:string can be abbreviated as "text". If an implementation supports abbreviation of literals, it SHOULD abbreviate the literals eagerly whenever possible.
The abbreviated literals can be written using the following grammar. A subset of the N-triples quoting mechanism is employed in order to allow strings to contain quotes.
quotedString := '"' a finite sequence of characters with double quotes and backslashes replaced by the double quote or backslash preceded by a backslash '"'
languageTag := a nonempty (not quoted) string defined as specified in BCP-47 [BCP-47]
abbreviatedXSDStringLiteral := quotedString
abbreviatedRDFTextLiteral := quotedString '@' languageTag
abbreviatedLiteral := abbreviatedXSDStringLiteral | abbreviatedRDFTextLiteral
Text matching the abbreviatedXSDStringLiteral production SHOULD be mapped to an xsd:string literal, and text matching the abbreviatedRDFTextLiteral production SHOULD be mapped to an rdf:text literal.
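The grammar and the mapping above can be sketched in Python as a function that expands an abbreviated literal into its lexical value and datatype. The names expand and unescape are hypothetical, and the language tag is checked only loosely (alphanumeric subtags separated by hyphens) rather than against the full BCP 47 grammar.

```python
import re
from typing import Tuple

# quotedString, optionally followed by '@' languageTag.
# Assumption: the tag check is a loose approximation of BCP 47.
_ABBREV = re.compile(
    r'"((?:[^"\\]|\\["\\])*)"(?:@([A-Za-z0-9]+(?:-[A-Za-z0-9]+)*))?'
)


def unescape(body: str) -> str:
    """Undo the backslash escaping of double quotes and backslashes."""
    return re.sub(r'\\(["\\])', r"\1", body)


def expand(abbrev: str) -> Tuple[str, str]:
    """Return (lexical value, datatype) for an abbreviatedLiteral."""
    m = _ABBREV.fullmatch(abbrev)
    if m is None:
        raise ValueError("not an abbreviatedLiteral")
    text, lang = unescape(m.group(1)), m.group(2)
    if lang is None:
        return (text, "xsd:string")          # abbreviatedXSDStringLiteral
    return (text + "@" + lang, "rdf:text")   # abbreviatedRDFTextLiteral


print(expand('"Padre de familia"@es'))  # ('Padre de familia@es', 'rdf:text')
print(expand('"Padre de familia"'))     # ('Padre de familia', 'xsd:string')
```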
"Padre de familia"@es is an abbreviation for the rdf:text literal "Padre de familia@es"^^rdf:text — a literal denoting a pair consisting of the string "Padre de familia" and the language tag es denoting the Spanish language. Furthermore, "Padre de familia" is an abbreviation for an xsd:string literal "Padre de familia"^^xsd:string, which is mapped to the same data value as the rdf:text literal "Padre de familia@"^^rdf:text.
Corresponding sections in The OWL 2 Structural Specification and Functional-Style Syntax, OWL Model-Theoretic Semantics and RIF Data Types and Built-Ins will be updated once an agreement is made. It is currently not clear whether this document will contain a definition of facets on rdf:text.
This section defines constructor functions, operators, and functions on the rdf:text datatype. The terminology and structure used to describe these functions and operators are in accordance with the XQuery 1.0 and XPath 2.0 Functions and Operators [XPathFunc]. The error codes used in this section are given in Appendix G of the XPath 2.0 specification [XPath20] and Appendix C of the XQuery and XPath functions specification [XPathFunc].
fn:text-from-string-lang( $arg1 as xsd:string, $arg2 as xsd:string) as rdf:text
Summary: returns the data value 〈 $arg1, $arg2 〉 of type rdf:text. The arguments both have to be of type xsd:string or one of its subtypes; otherwise, this function raises type error err:FORG0006.
fn:text-from-string( $arg as xsd:string) as rdf:text
Summary: returns the data value 〈 $arg, "" 〉 of type rdf:text. The argument has to be of type xsd:string or one of its subtypes; otherwise, this function raises type error err:FORG0006.
fn:string-from-text( $arg as rdf:text) as xsd:string
Summary: extracts the string part s from the argument $arg = 〈 s, l 〉 of type rdf:text. The argument $arg has to be of type rdf:text; otherwise, this function raises type error err:FORG0006.
fn:lang-from-text( $arg as rdf:text ) as xsd:language
Summary: extracts the language tag l from the argument $arg = 〈 s, l 〉 of type rdf:text. The argument $arg has to be of type rdf:text; otherwise, this function raises type error err:FORG0006.
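The constructor and accessor functions above can be mirrored in Python roughly as follows. The class RdfText and the snake_case function names are illustrative assumptions, and a Python TypeError stands in for the type error err:FORG0006. The sketch follows the summaries literally, so it neither validates nor lowercases the language argument.

```python
from typing import NamedTuple


class RdfText(NamedTuple):
    """Hypothetical stand-in for a data value of type rdf:text."""
    text: str
    lang: str


def text_from_string_lang(arg1: str, arg2: str) -> RdfText:
    if not isinstance(arg1, str) or not isinstance(arg2, str):
        raise TypeError("err:FORG0006")  # stands in for the XPath type error
    return RdfText(arg1, arg2)


def text_from_string(arg: str) -> RdfText:
    if not isinstance(arg, str):
        raise TypeError("err:FORG0006")
    return RdfText(arg, "")


def string_from_text(arg: RdfText) -> str:
    if not isinstance(arg, RdfText):
        raise TypeError("err:FORG0006")
    return arg.text


def lang_from_text(arg: RdfText) -> str:
    if not isinstance(arg, RdfText):
        raise TypeError("err:FORG0006")
    return arg.lang


t = text_from_string_lang("Family Guy", "en")
print(string_from_text(t), lang_from_text(t))  # Family Guy en
```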
The notion of collations used in this section is taken from Section 7.3.1 of the XPath and XQuery functions specification [XPathFunc].
op:text-equal( $comparand1 as rdf:text, $comparand2 as rdf:text ) as xsd:boolean
Summary: returns true if and only if both the string parts and the language parts of $comparand1 and $comparand2 are equal; otherwise, this function returns false.
This function may be viewed as a declared XQuery function with the following definition:
declare function op:text-equal( $comparand1 as rdf:text, $comparand2 as rdf:text ) as xsd:boolean {
  if ( fn:compare( fn:lang-from-text( $comparand1 ), fn:lang-from-text( $comparand2 ) ) = 0
       and fn:compare( fn:string-from-text( $comparand1 ), fn:string-from-text( $comparand2 ) ) = 0 )
  then fn:true()
  else fn:false()
};
fn:text-compare( $comparand1 as rdf:text?, $comparand2 as rdf:text? ) as xsd:integer?
fn:text-compare( $comparand1 as rdf:text?, $comparand2 as rdf:text?, $collation as xsd:string ) as xsd:integer?
Summary: returns the empty sequence if one of the arguments is empty or if the language parts of $comparand1 and $comparand2 are unequal; otherwise, this function returns -1, 0, or 1 depending on whether the value of the string-part of $comparand1 is respectively less than, equal to, or greater than the value of the string-part of $comparand2. The collation used by the invocation of this function is determined according to the rules in Section 7.3.1 of the XPath and XQuery functions specification [XPathFunc].
These two functions may be viewed as declared XQuery functions with the following definitions:
declare function fn:text-compare( $comparand1 as rdf:text?, $comparand2 as rdf:text? ) as xsd:integer? {
  if ( fn:compare( fn:lang-from-text( $comparand1 ), fn:lang-from-text( $comparand2 ) ) = 0 )
  then fn:compare( fn:string-from-text( $comparand1 ), fn:string-from-text( $comparand2 ) )
  else ()
};
declare function fn:text-compare( $comparand1 as rdf:text?, $comparand2 as rdf:text?, $collation as xsd:string ) as xsd:integer? {
  if ( fn:compare( fn:lang-from-text( $comparand1 ), fn:lang-from-text( $comparand2 ) ) = 0 )
  then fn:compare( fn:string-from-text( $comparand1 ), fn:string-from-text( $comparand2 ), $collation )
  else ()
};
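The behavior of op:text-equal and fn:text-compare can also be approximated in Python. In this sketch, which is not normative, None stands in for the empty sequence, rdf:text values are plain (text, lang) tuples, and a collation is modeled as a hypothetical key function applied to the string parts; the default corresponds to comparison by code point, which is an assumption about the default collation.

```python
from typing import Callable, Optional, Tuple

RdfText = Tuple[str, str]  # (text, lang)


def text_equal(a: RdfText, b: RdfText) -> bool:
    """True iff both the string parts and the language parts are equal."""
    return a[0] == b[0] and a[1] == b[1]


def text_compare(
    a: Optional[RdfText],
    b: Optional[RdfText],
    collation_key: Callable[[str], str] = lambda s: s,  # assumption: code point order
) -> Optional[int]:
    """Return -1, 0, or 1, or None if an argument is empty or the languages differ."""
    if a is None or b is None:
        return None
    if a[1] != b[1]:
        return None
    ka, kb = collation_key(a[0]), collation_key(b[0])
    return (ka > kb) - (ka < kb)


print(text_compare(("a", "en"), ("b", "en")))                # -1
print(text_compare(("a", "en"), ("a", "de")))                # None
print(text_compare(("B", "en"), ("a", "en"), str.casefold))  # 1
```

Returning None for unequal language parts reflects that values with different language tags are treated as incomparable rather than ordered.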
fn:text-length($arg as rdf:text) as xsd:integer
Summary: returns the number of characters that constitute the string part of $arg.
This function may be viewed as a declared XQuery function with the following definition:
declare function fn:text-length( $arg as rdf:text ) as xsd:integer {
  fn:string-length( fn:string-from-text( $arg ) )
};
fn:matches-language-range($input as rdf:text?, $range as xsd:string) as xsd:boolean
Summary: returns true if the language tag part of $input is a valid language tag according to BCP-47 [BCP-47], and if it matches the language-range expression supplied as $range as specified by the algorithm for "Matching of Language Tags" which is part of BCP-47 [BCP-47]; otherwise, it returns false.
An empty input sequence is treated as an rdf:text value consisting of the empty string and the empty language tag. Since the empty string is not a valid language tag according to BCP-47 [BCP-47], on such input this function returns false.
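A rough Python approximation of fn:matches-language-range is given below. It rests on two assumptions that this specification does not settle: the matching scheme is taken to be basic filtering as described in BCP 47 (RFC 4647), and validity of the language tag is checked only by a loose well-formedness pattern. None stands in for the empty input sequence.

```python
import re
from typing import Optional, Tuple

# Assumption: a loose well-formedness check, not the full BCP 47 grammar.
_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def matches_language_range(inp: Optional[Tuple[str, str]], rng: str) -> bool:
    """Basic filtering: case-insensitive equality or prefix match up to a '-'."""
    lang = "" if inp is None else inp[1]  # empty input: empty language tag
    if _TAG.fullmatch(lang) is None:
        return False  # covers the empty tag, which is never valid
    if rng == "*":
        return True   # the wildcard range matches every valid tag
    lang, rng = lang.lower(), rng.lower()
    return lang == rng or lang.startswith(rng + "-")


print(matches_language_range(("Family Guy", "en-us"), "en"))  # True
print(matches_language_range(("Family Guy", "en"), "en-us"))  # False
print(matches_language_range(None, "*"))                      # False
```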