W3C


RIF Datatypes and Built-Ins 1.0

W3C Editor's Draft 23 July 2008

This version:
http://www.w3.org/2005/rules/wg/draft/ED-rif-dtb-20080723/
Latest editor's draft:
http://www.w3.org/2005/rules/wg/draft/rif-dtb/
Previous version:
http://www.w3.org/2005/rules/wg/draft/ED-rif-dtb-20080717/ (color-coded diff)
Editors:
Axel Polleres, DERI
Harold Boley, National Research Council Canada
Michael Kifer, State University of New York at Stony Brook


Abstract

Status of this Document

May Be Superseded

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is being published as one of a set of 6 documents:

  1. RIF Use Cases and Requirements
  2. RIF Basic Logic Dialect
  3. RIF Framework for Logic Dialects
  4. RIF RDF and OWL Compatibility
  5. RIF Production Rule Dialect
  6. RIF Datatypes and Built-Ins 1.0 (this document)

Please Comment By 2008-07-28

The Rule Interchange Format (RIF) Working Group seeks public feedback on these Working Drafts. Please send your comments to public-rif-comments@w3.org (public archive). If possible, please offer specific changes to the text that would address your concern. You may also wish to check the Wiki Version of this document for internal-review comments and changes being drafted which may address your concerns.

No Endorsement

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Patents

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.



This document, developed by the Rule Interchange Format (RIF) Working Group, specifies a list of primitive datatypes, built-in functions and built-in predicates expected to be supported by RIF dialects such as the RIF Basic Logic Dialect. Each dialect supporting a superset or subset of the primitive datatypes, built-in functions and built-in predicates defined here shall specify these additions or restrictions. Some of the datatypes are adopted from [XML-SCHEMA2]. Many of the definitions of the listed functions and operators are adapted from [XPath-Functions].



1 Naming and notational conventions used in this document

Throughout this document we use the following prefixes for compact IRIs [CURIE] used for symbol spaces or for IRI constants in RIF's presentation syntax:

In RIF documents in presentation syntax, these prefixes can be defined using respective prefix directives in the preamble of the RIF document at hand [BLD].


2 Constants, Symbol Spaces, and Datatypes

2.1 Constants and Symbol Spaces

Each constant (that is, each non-keyword symbol) in RIF belongs to a particular symbol space. To refer to a constant in a particular RIF symbol space, we use the following presentation syntax:

"literal"^^<symbolSpaceIri>

where literal is called the lexical part of the symbol, and symbolSpaceIri is an (absolute or relative) IRI identifying the symbol space. Here literal is a Unicode string that must be an element in the lexical space of the symbol space identified by the IRI symbolSpaceIri. We often also use an abbreviated syntax for IRIs such as symbol space identifiers. For example, the constant "http://www.example.org"^^<http://www.w3.org/2007/rif#iri> can be abbreviated as "http://www.example.org"^^rif:iri in RIF's presentation syntax. More details about this and other shortcut notations are given below.

2.1.1 Symbol Spaces

Formally, we define symbol spaces as follows.

Definition (Symbol space). A symbol space is a named subset of the set of all constants, Const, in RIF. Each symbol in Const belongs to exactly one symbol space.

Each symbol space has an associated lexical space and a unique IRI identifying it. More precisely,

The identifiers of symbol spaces are not themselves constant symbols in RIF.

For convenience we will often use symbol space identifiers to refer to the actual symbol spaces (for instance, we may use "symbol space xs:string" instead of "symbol space identified by xs:string").


RIF dialects are expected to include the following symbol spaces. However, rule sets that are exchanged through RIF can use additional symbol spaces.

Note that, because each symbol space has an associated lexical space, not every Unicode string is a syntactically valid lexical part for every symbol space. For instance, "1.2"^^xs:decimal and "1"^^xs:integer are syntactically valid constants because 1.2 and 1 are members of the lexical spaces of the symbol spaces xs:decimal and xs:integer, respectively. On the other hand, "a+2"^^xs:decimal is not a syntactically valid constant, since a+2 is not part of the lexical space of xs:decimal.
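The validity check described above can be sketched in Python. This is only an illustration: the helper names parse_constant and is_valid_constant are hypothetical, the regular expressions approximate the lexical forms that XML Schema gives for xs:integer and xs:decimal, and only the fully written "literal"^^space form is handled.

```python
import re

# Approximate lexical spaces (assumption: patterns modelled on XML Schema).
LEXICAL_SPACES = {
    "xs:integer": re.compile(r"[+-]?[0-9]+"),
    "xs:decimal": re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"),
}

# Matches "literal"^^space, where space is a CURIE or an <IRI>.
CONSTANT = re.compile(r'"(?P<literal>.*)"\^\^(?P<space>\S+)', re.DOTALL)


def parse_constant(text):
    """Split a constant into its lexical part and its symbol space identifier."""
    match = CONSTANT.fullmatch(text)
    if match is None:
        raise ValueError(f"not of the form \"literal\"^^space: {text!r}")
    return match.group("literal"), match.group("space")


def is_valid_constant(text):
    """True if the lexical part belongs to the lexical space of the symbol space."""
    literal, space = parse_constant(text)
    pattern = LEXICAL_SPACES.get(space)
    if pattern is None:
        raise KeyError(f"no lexical space registered for {space}")
    return pattern.fullmatch(literal) is not None
```

With these definitions, is_valid_constant accepts '"1.2"^^xs:decimal' and '"1"^^xs:integer' and rejects '"a+2"^^xs:decimal', mirroring the three constants discussed above.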


2.1.2 Shortcuts for Constants in RIF's Presentation Syntax

Besides the basic notation

"literal"^^<identifier>

RIF's presentation syntax introduces several shortcuts for particular symbol spaces, in order to make the presentation syntax more readable. RIF's presentation syntax for constants is defined by the following EBNF.

  ANGLEBRACKIRI ::= IRI_REF
  SYMSPACE      ::= ANGLEBRACKIRI | CURIE
  CURIE         ::= PNAME_LN | PNAME_NS
  Const         ::= '"' UNICODESTRING '"^^' SYMSPACE | CONSTSHORT
  CONSTSHORT    ::= ANGLEBRACKIRI              // shortcut for "..."^^rif:iri
                  | CURIE                      // shortcut for "..."^^rif:iri
                  | '"' UNICODESTRING '"'      // shortcut for "..."^^xs:string
                  | NumericLiteral             // shortcut for "..."^^xs:integer,xs:decimal,xs:double
                  | '_' LocalName              // shortcut for "..."^^rif:local

The EBNF grammar relies on reuse of nonterminals defined in the following grammar productions from other documents:

In this grammar, CURIE stands for compact IRIs [CURIE]. First, compact IRIs can be used for abbreviating symbol space IRIs. For instance, it is allowed to write "http://www.example.org"^^rif:iri instead of "http://www.example.org"^^<http://www.w3.org/2007/rif#iri>, assuming that rif is a prefix defined for the IRI http://www.w3.org/2007/rif# in a respective prefix directive

Prefix( rif http://www.w3.org/2007/rif# )

in the preamble of the RIF document at hand [BLD].
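As a rough illustration of this abbreviation mechanism, the following Python sketch expands a compact IRI by looking up its prefix. The function names and the PREFIXES table are hypothetical; the table holds only the rif prefix from the directive above.

```python
# Prefix table as it would result from: Prefix( rif http://www.w3.org/2007/rif# )
PREFIXES = {"rif": "http://www.w3.org/2007/rif#"}


def expand_curie(curie, prefixes=PREFIXES):
    """Replace the prefix of a compact IRI by the IRI it is defined for."""
    prefix, separator, local_name = curie.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"undefined prefix in {curie!r}")
    return prefixes[prefix] + local_name


def expand_symbol_space(constant, prefixes=PREFIXES):
    """Rewrite "literal"^^prefix:name as "literal"^^<full IRI>."""
    literal, _, space = constant.rpartition("^^")
    if space.startswith("<"):
        return constant  # the symbol space is already a full IRI
    return f"{literal}^^<{expand_curie(space, prefixes)}>"
```

Under these assumptions, expand_symbol_space turns "http://www.example.org"^^rif:iri back into the unabbreviated constant shown above.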

Apart from compact IRIs, there exist convenient shortcut notations for constants in specific symbol spaces, namely for constants in the symbol spaces rif:iri, xs:string, xs:integer, xs:decimal, xs:double, and rif:local:

Editor's Note: We might introduce additional shortcuts, e.g. for rif:text in future versions of this draft.

2.1.3 Relative IRIs

Relative IRIs in RIF documents are resolved with respect to the base IRI. Relative IRIs are combined with base IRIs as per Uniform Resource Identifier (URI): Generic Syntax [RFC-3986] using only the basic algorithm in Section 5.2. Neither Syntax-Based Normalization nor Scheme-Based Normalization (described in sections 6.2.2 and 6.2.3 of RFC3986) is performed. Characters additionally allowed in IRI references are treated in the same way that unreserved characters are treated in URI references, per section 6.5 of Internationalized Resource Identifiers (IRIs) [RFC-3987].

A single base directive in the preamble of a RIF document in presentation syntax [BLD] or an xml:base directive in a RIF/XML document defines the Base IRI used to resolve relative IRIs per RFC3986 section 5.1.1, "Base URI Embedded in Content". Section 5.1.2 of RFC3986, "Base URI from the Encapsulating Entity", defines how the Base IRI may come from an encapsulating document, such as a SOAP envelope with an xml:base directive, containing a RIF ruleset as payload. The "Retrieval URI" identified in section 5.1.3, "Base URI from the Retrieval URI", is the URL from which a particular RIF document was retrieved. If none of the above specifies the Base URI, or several ambiguous base directives are present in the preamble of a RIF document in presentation syntax, the default Base URI (section 5.1.4 of RFC3986, "Default Base URI") is used.

For instance, the constants <./xyz> and "./xyz"^^rif:iri are both valid abbreviations for the constant "http://www.example.org/xyz"^^rif:iri in a RIF document in presentation syntax that has the single base directive

Base( http://www.example.org )

in its preamble.
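The resolution in this example can be reproduced with Python's standard library, as a sketch only. urljoin implements RFC 3986 style reference resolution for URIs; using it for IRIs, and the wrapper name resolve_iri, are assumptions made for illustration.

```python
from urllib.parse import urljoin


def resolve_iri(base, reference):
    """Resolve a (possibly relative) reference against a base IRI."""
    return urljoin(base, reference)


# The base directive and relative IRI from the example above.
resolved = resolve_iri("http://www.example.org", "./xyz")
```

Here resolved holds the absolute IRI http://www.example.org/xyz, the lexical part of the constant that both abbreviations stand for.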

2.2 Primitive Datatypes

Datatypes in RIF are symbol spaces which have special semantics. That is, each datatype is characterized by a fixed lexical space, a fixed value space and a fixed lexical-to-value mapping.

Definition (Primitive datatype). A primitive datatype (or just a datatype, for short) is a symbol space that has

Semantic structures are always defined with respect to a particular set of datatypes, denoted by DTS. In a concrete dialect, DTS always includes the datatypes supported by that dialect. RIF dialects are expected to support the following primitive datatypes. However, RIF dialects may include additional datatypes.

Editor's Note: rif:text (in particular, its identifying IRI) is an AT RISK feature. We expect a joint effort with the OWL WG to discuss rif:text and the equivalent OWL datatype, striving for a uniform symbol space for such text strings with a language tag.

Their value spaces and the lexical-to-value-space mappings are defined as follows:

The value space and the lexical-to-value-space mapping for rif:text defined here are compatible with RDF's semantics for strings with language tags [RDF-SEMANTICS].

3 Syntax and Semantics of Built-ins

3.1 Syntax of Built-ins

A RIF built-in function or predicate is a special case of externally defined terms, which are defined in RIF Framework for Logic Dialects and also reproduced in the direct definition of RIF Basic Logic Dialect (RIF-BLD).

In RIF's presentation syntax built-in predicates and functions are syntactically represented as external terms of the form:

'External' '(' Expr ')'

where Expr is a positional term as defined in RIF Framework for Logic Dialects (see also in RIF Basic Logic Dialect). For RIF's normative syntax, see the XML Serialization Framework in RIF-FLD, or, specifically for RIF-BLD, see XML Serialization Syntax for RIF-BLD.

RIF-FLD introduces the notion of an external schema to describe both the syntax and semantics of externally defined terms. In the special case of a RIF built-in, external schemas have an especially simple form. A built-in named f that takes n arguments has the schema

( ?X1 ... ?Xn;   f(?X1 ... ?Xn) )

Here f(?X1 ... ?Xn) is the actual term that is used to refer to the built-in (in expressions of the form External(f(?X1 ... ?Xn))) and ?X1 ... ?Xn is the list of all variables in that term.

For convenience, a complete definition of external schemas is reproduced in Appendix: Schemas for Externally Defined Terms.
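To make the shape of such schemas concrete, here is a small Python sketch. The class ExternalSchema and its methods are hypothetical names introduced only for illustration; they build the textual schema and the corresponding External(...) term for a built-in with a given name and arity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalSchema:
    """Schema of a built-in named `name` that takes `arity` arguments."""
    name: str
    arity: int

    def variables(self):
        # The variable list ?X1 ... ?Xn of the schema.
        return [f"?X{i}" for i in range(1, self.arity + 1)]

    def term(self):
        # The term f(?X1 ... ?Xn) used to refer to the built-in.
        return f"{self.name}({' '.join(self.variables())})"

    def schema(self):
        # The schema ( ?X1 ... ?Xn;   f(?X1 ... ?Xn) ).
        return f"( {' '.join(self.variables())};   {self.term()} )"

    def external(self, *args):
        # An external term with the variables replaced by actual arguments.
        if len(args) != self.arity:
            raise ValueError(f"{self.name} expects {self.arity} arguments")
        return f"External({self.name}({' '.join(args)}))"
```

For example, ExternalSchema("func:numeric-add", 2).schema() yields ( ?X1 ?X2;   func:numeric-add(?X1 ?X2) ).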


3.2 Semantics of Built-ins

The semantics of external terms in RIF-FLD and RIF-BLD is defined using two mappings: Iexternal and Itruth ο Iexternal.

4 List of RIF Built-in Predicates and Functions

This section provides a catalogue defining the syntax and semantics of a list of built-in predicates and functions in RIF. For each built-in, the following is defined:

  1. The name of the built-in.
  2. The external schema of the built-in.
  3. For a built-in function, how it maps its arguments into a result.

    As explained in Section Semantics of Built-ins, this corresponds to the mapping Iexternal(σ) in the formal semantics of RIF-FLD and RIF-BLD, where σ is the external schema of the built-in.

  4. For a built-in predicate, its truth value when the arguments are substituted with values in the domain.

    As explained in Section Semantics of Built-ins, this corresponds to the mapping Itruth ο Iexternal(σ) in the formal semantics of RIF-FLD and RIF-BLD, where σ is the external schema of the built-in.

  5. The intended domains for the arguments of the built-in.

    Typically, built-in functions and predicates are defined over the value spaces of appropriate datatypes. These are the intended domains of the arguments. When an argument falls outside of its intended domain, it is understood as an error. Since this document defines a model-theoretic semantics for RIF built-ins, which does not support the notion of an error, the definitions leave the values of the built-in predicates and functions unspecified in such cases. This means that if one or more of the arguments is not in its intended domain, the value of Iexternal(σ)(a1 ... an) can vary from one semantic structure to another. Similarly, Itruth ο Iexternal(σ)(a1 ... an) can be t in some interpretations and f in others when an argument is not in the intended domain.

    This indeterminacy in case of an error implies that applications must not make any assumptions about the values of built-ins in such situations. Implementations are even allowed to abort in such cases. The only safe way to communicate rule sets that contain built-ins among RIF-compliant systems is therefore to use datatype guards.
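The role of datatype guards can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: Python's int stands in for the value space of xs:integer, is_integer stands in for a guard such as pred:isInteger, and integer_add is a simplified addition restricted to integers.

```python
def is_integer(value):
    """Stand-in for an integer guard; bool is excluded although it subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool)


def integer_add(a, b):
    """Addition whose intended domain is the integers."""
    if not (is_integer(a) and is_integer(b)):
        # Outside the intended domain the value is unspecified.
        # This sketch picks one permitted behaviour: abort.
        raise ValueError("argument outside the intended domain")
    return a + b


def guarded_integer_add(a, b):
    """Call integer_add only when both guards hold; otherwise derive nothing."""
    if is_integer(a) and is_integer(b):
        return integer_add(a, b)
    return None
```

A caller that always goes through guarded_integer_add never depends on what a particular implementation does with out-of-domain arguments, which is the portability point made above.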


Many built-in functions and predicates described below are adapted from [XPath-Functions] and, when appropriate, we will refer to the definitions in that specification in order to avoid copying them.

4.1 Guard Predicates for Datatypes

RIF defines guard predicates for all datatypes in Section Primitive Datatypes.

Accordingly, the following schemas are defined.

4.1.1 pred:isInteger

4.1.2 pred:isDecimal

4.1.3 pred:isDouble

4.1.4 pred:isString

4.1.5 pred:isTime

4.1.6 pred:isDate

4.1.7 pred:isDateTime

4.1.8 pred:isDayTimeDuration

4.1.9 pred:isYearMonthDuration

4.1.10 pred:isXMLLiteral

4.1.11 pred:isText

Editor's Note: It was noted in discussions of the working group that, in addition to guard predicates, a built-in function or predicate analogous to SPARQL's datatype function is needed. This, however, has some technical implications; see http://lists.w3.org/Archives/Public/public-rif-wg/2008Jul/0096.html

4.2 Negative Guard Predicates for Datatypes

Likewise, RIF defines negative guard predicates for all datatypes in Section Primitive Datatypes.

Accordingly, the following schemas are defined.

4.2.1 pred:isNotInteger

4.2.2 pred:isNotDecimal

4.2.3 pred:isNotDouble

4.2.4 pred:isNotString

4.2.5 pred:isNotTime

4.2.6 pred:isNotDate

4.2.7 pred:isNotDateTime

4.2.8 pred:isNotDayTimeDuration

4.2.9 pred:isNotYearMonthDuration

4.2.10 pred:isNotXMLLiteral

4.2.11 pred:isNotText

Future dialects may extend this list of guards to other datatypes, but RIF does not require guards for all datatypes.

4.3 Cast Functions and Conversion Predicates for Datatypes and rif:iri

RIF defines cast functions for all datatypes mentioned in this document, i.e. for each datatype with IRI DATATYPEIRI there is an external function with the following schema:

We now discuss the intended domains and mappings for these cast functions.

Editor's Note: In the following, we adapt several cast functions from [XPath-Functions]. Due to the subtle differences in e.g. error handling between RIF and [XPath-Functions], these definitions might still need refinement in future versions of this draft.


4.3.1 xs:double, xs:integer, xs:decimal, xs:time, xs:date, xs:dateTime, xs:dayTimeDuration, xs:yearMonthDuration

Editor's Note: We might split this subsection into separate subsections per casting function in future versions of this document, following the convention of having one separate subsection per function/predicate in the rest of the document. However, it seemed convenient here to group the cast functions which rely purely on XML Schema datatype casting into one common subsection.

4.3.2 xs:string

Editor's Note: The cast from rif:text to xs:string is still under discussion, i.e. whether the lang tag should be included when casting to xs:string or not.

4.3.3 rdf:XMLLiteral

4.3.4 rif:text

4.3.5 rif:iri

Editor's Note: Casting to rif:iri is still under discussion in the working group since rif:iri is not a datatype. For details, we refer to Issue-61. The following is a strawman proposal which might still change in future versions of this working draft.

In addition to the built-in cast functions for datatypes, we allow conversions from xs:strings to constants in the rif:iri symbol space, following considerations similar to those for conversions from xs:string to xs:anyURI in [XPath-Functions]. Technically speaking, we cannot proceed as with the other cast functions, defining the semantics via a fixed mapping Iexternal for an external schema ( ?arg1; rif:iri ( ?arg1 ) ), since rif:iri is not a datatype with a fixed value space and fixed lexical-to-value mapping. Instead, casts to rif:iri are defined via an infinite set of axiomatic equalities in every RIF interpretation as follows.

The following equalities hold in every RIF interpretation for each Unicode string a which is in the lexical space of the rif:iri symbol space:

Thus, although there is no explicit schema ( ?arg1; rif:iri ( ?arg1 ) ) in RIF, casts between xs:strings and rif:iris are still possible in RIF with the intended semantics that the IRI represented by a particular string can be cast to this very string and vice versa.

4.3.6 pred:iri-to-string

Editor's Note: Conversion from rif:iri to xs:string is still under discussion in the working group since rif:iri is not a datatype. For details, we refer to Issue-61. The following is a strawman proposal which might still change in future versions of this working draft.

Conversions from rif:iri to xs:string are not covered by the xs:string casting function above. Note that we cannot use axiomatic equalities here, as we did for the rif:iri casting function. If we assumed the equalities

xs:string("http://example.org/iriA"^^rif:iri) = "http://example.org/iriA"^^xs:string
xs:string("http://example.org/iriB"^^rif:iri) = "http://example.org/iriB"^^xs:string

and a ruleset asserted an additional equality

"http://example.org/iriA"^^rif:iri = "http://example.org/iriB"^^rif:iri

this would immediately result in

"http://example.org/iriA"^^xs:string = "http://example.org/iriB"^^xs:string

which is inconsistent in RIF due to the definition of the lexical-to-value mapping for xs:string which maps two distinct strings to two distinct domain elements in every interpretation.
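The derivation above can be replayed mechanically. The following Python sketch is an illustration only: the union-find class Equalities, the tuple encoding of terms and the tag "xs:string-cast" are hypothetical choices. It records the three assumed equalities, applies one congruence step (equal arguments give equal cast results) and checks whether two distinct strings end up equal.

```python
class Equalities:
    """A minimal union-find structure over hashable terms."""

    def __init__(self):
        self.parent = {}

    def find(self, term):
        self.parent.setdefault(term, term)
        while self.parent[term] != term:
            # Path halving keeps the trees shallow.
            self.parent[term] = self.parent[self.parent[term]]
            term = self.parent[term]
        return term

    def union(self, left, right):
        self.parent[self.find(left)] = self.find(right)

    def same(self, left, right):
        return self.find(left) == self.find(right)


def derives_inconsistency():
    iri_a = ("rif:iri", "http://example.org/iriA")
    iri_b = ("rif:iri", "http://example.org/iriB")
    str_a = ("xs:string", "http://example.org/iriA")
    str_b = ("xs:string", "http://example.org/iriB")

    eq = Equalities()
    eq.union(("xs:string-cast", iri_a), str_a)  # first assumed equality
    eq.union(("xs:string-cast", iri_b), str_b)  # second assumed equality
    eq.union(iri_a, iri_b)                      # equality asserted by the ruleset

    # Congruence: casts of equal arguments denote the same value.
    casts = [t for t in list(eq.parent) if t[0] == "xs:string-cast"]
    for s in casts:
        for t in casts:
            if eq.same(s[1], t[1]):
                eq.union(s, t)

    # Two distinct strings in one equivalence class contradict xs:string.
    return eq.same(str_a, str_b) and str_a[1] != str_b[1]
```

Under these assumptions derives_inconsistency() returns True, which is the clash with the lexical-to-value mapping of xs:string described above.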

Since conversions from IRIs (resources) to strings are nevertheless needed, for instance for conversions between RDF formats (see example below), we add a built-in predicate which supports such conversions.

4.4 Numeric Functions and Predicates

The following functions and predicates are adapted from the respective numeric functions and operators in [XPath-Functions].

4.4.1 Numeric Functions

4.4.1.1 func:numeric-add (adapted from op:numeric-add)

The following numeric built-in functions func:numeric-subtract, func:numeric-multiply, func:numeric-divide, func:numeric-integer-divide, and func:numeric-mod are defined accordingly with respect to their corresponding operators in [XPath-Functions] and we will only add further explanations where needed.

4.4.1.2 func:numeric-subtract (adapted from op:numeric-subtract)

4.4.1.3 func:numeric-multiply (adapted from op:numeric-multiply)

4.4.1.4 func:numeric-divide (adapted from op:numeric-divide)

4.4.1.5 func:numeric-integer-divide (adapted from op:numeric-integer-divide)

4.4.1.6 func:numeric-mod (adapted from op:numeric-mod)
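As an informal illustration of func:numeric-integer-divide and func:numeric-mod, the following Python sketch follows the behaviour of the corresponding [XPath-Functions] operators for integer arguments only: the quotient is truncated toward zero and the remainder takes the sign of the dividend. The Python function names are hypothetical, and the restriction to integers is a simplification of the intended domains.

```python
def numeric_integer_divide(a, b):
    """Integer division truncating toward zero (Python's // rounds down instead)."""
    if b == 0:
        raise ZeroDivisionError("division by zero")
    quotient = abs(a) // abs(b)
    same_sign = (a >= 0) == (b > 0)
    return quotient if same_sign else -quotient


def numeric_mod(a, b):
    """Remainder satisfying (a idiv b) * b + (a mod b) == a."""
    return a - numeric_integer_divide(a, b) * b
```

The difference from Python's built-in operators shows up with negative operands: -10 // 3 is -4 and -10 % 3 is 2 in Python, whereas this sketch gives -3 and -1.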

4.4.2 Numeric Predicates

4.4.2.1 pred:numeric-equal (adapted from op:numeric-equal)

The following numeric built-in predicates pred:numeric-less-than and pred:numeric-greater-than are defined accordingly with respect to their corresponding operators in [XPath-Functions]. The predicate pred:numeric-not-equal has the same intended domain as pred:numeric-equal and is true whenever pred:numeric-equal is false. The predicates pred:numeric-less-than-or-equal (and pred:numeric-greater-than-or-equal, respectively) are true whenever pred:numeric-equal is true or pred:numeric-less-than (pred:numeric-greater-than, respectively) is true.
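How the derived predicates build on the three basic comparisons can be sketched as follows. The Python function names are hypothetical, and the sketch assumes that both arguments already lie in the intended domain (plain Python integers are used in place of RIF numeric values).

```python
def numeric_equal(a, b):
    return a == b


def numeric_less_than(a, b):
    return a < b


def numeric_greater_than(a, b):
    return a > b


def numeric_not_equal(a, b):
    # True whenever numeric_equal is false.
    return not numeric_equal(a, b)


def numeric_less_than_or_equal(a, b):
    # True whenever numeric_equal or numeric_less_than is true.
    return numeric_equal(a, b) or numeric_less_than(a, b)


def numeric_greater_than_or_equal(a, b):
    # True whenever numeric_equal or numeric_greater_than is true.
    return numeric_equal(a, b) or numeric_greater_than(a, b)
```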

4.4.2.2 pred:numeric-less-than (adapted from op:numeric-less-than)

4.4.2.3 pred:numeric-greater-than (adapted from op:numeric-greater-than)

4.4.2.4 pred:numeric-not-equal

4.4.2.5 pred:numeric-less-than-or-equal

4.4.2.6 pred:numeric-greater-than-or-equal

4.5 Functions and Predicates on Strings

The following functions and predicates are adapted from the respective functions and operators on strings in [XPath-Functions].

Editor's Note: The following treatment of built-ins which may have multiple arities is a strawman proposal currently under discussion in the working group.

In the following, we encounter several versions of some built-ins with varying arity, since XPath and XQuery allow overloading, i.e., the same function or operator name occurring with different arities. We treat this likewise in RIF by numbering the different versions of the respective built-ins and treating the unnumbered version as syntactic sugar. For instance, instead of External( func:concat2( str1 str2 ) ) and External( func:concat3( str1 str2 str3 ) ) we allow the equivalent forms External( func:concat( str1 str2 ) ) and External( func:concat( str1 str2 str3 ) ). Note that this is purely syntactic sugar and does not mean that, for external predicates and functions, we lift the restriction made in BLD that each function and predicate has a unique assigned arity. Schemata for which we allow this syntactic sugar appear in the same box.
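The desugaring described here amounts to appending the argument count to the built-in's name. A minimal Python sketch follows; the function names and the OVERLOADED set are hypothetical, and the set lists only func:concat, the example used above.

```python
# Built-ins for which the unnumbered form is accepted as syntactic sugar
# (assumption: only the example from the text is listed).
OVERLOADED = {"func:concat"}


def desugar(name, args, overloaded=OVERLOADED):
    """Map an unnumbered overloaded name to its numbered, fixed-arity version."""
    if name in overloaded:
        return f"{name}{len(args)}"
    return name


def external(name, args):
    """Render the external term with the desugared built-in name."""
    return f"External( {desugar(name, args)}( {' '.join(args)} ) )"
```

For instance, external("func:concat", ["str1", "str2", "str3"]) produces External( func:concat3( str1 str2 str3 ) ), so every numbered name still has exactly one arity.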

4.5.1 Functions on Strings

4.5.1.1 func:compare (adapted from fn:compare)

Editor's Note: The working group is currently discussing whether, in addition to adopting the fn:compare function from [XPath-Functions], RIF should introduce its own predicates pred:string-equal, pred:string-less-than, pred:string-greater-than, pred:string-not-equal, pred:string-less-than-or-equal and pred:string-greater-than-or-equal, which are not defined in [XPath-Functions], following the convention of having such predicates for other datatypes.

The following schemata are defined analogously with respect to their corresponding operators as defined in [XPath-Functions] and we only give informal descriptions of the respective mappings Iexternal.

4.5.1.2 func:concat (adapted from fn:concat)

4.5.1.3 func:string-join (adapted from fn:string-join)

4.5.1.4 func:substring (adapted from fn:substring)

4.5.1.5 func:string-length (adapted from fn:string-length)

4.5.1.6 func:upper-case (adapted from fn:upper-case)

4.5.1.7 func:lower-case (adapted from fn:lower-case)

4.5.1.8 func:encode-for-uri (adapted from fn:encode-for-uri)

4.5.1.9 func:iri-to-uri (adapted from fn:iri-to-uri)

4.5.1.10 func:escape-html-uri (adapted from fn:escape-html-uri)

4.5.1.11 func:substring-before (adapted from fn:substring-before)

4.5.1.12 func:substring-after (adapted from fn:substring-after)

4.5.1.13 func:replace (adapted from fn:replace)

4.5.2 Predicates on Strings

4.5.2.1 pred:contains (adapted from fn:contains)


4.5.2.2 pred:starts-with (adapted from fn:starts-with)


4.5.2.3 pred:ends-with (adapted from fn:ends-with)