## Introduction

The basis of the language is formed by conditions that can appear in the bodies of Horn-like rules with equality -- conjunctions and disjunctions of atomic formulas and equations (such rules reduce to pure Horn). We later extend our proposal to include builtins. As indicated in RIF Condition Language, the motivation is that this sublanguage can be shared among the bodies of the rules expressed in the RIF dialects LP, FO, PR, RR, and can also be used to uniformly express the body-like IC and QY sublanguages.

This document proposes a positive-condition syntax and semantics to account for webizing (the use of URIs for constants, predicates, and functions) and primitive data types (integers, floats, time, date, etc.). The main novelty is that there is no longer a wall between the domains of constants, functions, and predicates. Instead, all these symbols are drawn from the * same* domain. Separation between the different kinds of symbols is introduced through the mechanism of sorts. For instance, we can introduce a sort

**URI**for URIs, and the sorts

**integer**,

**float**,

**time**,

**string**, etc. for the corresponding data types. We can decide that certain sorts must be disjoint (integers, time) and others are not (eg,

**URI**can be a subsort of

**string**). We can control what sorts can be used for relation symbols (predicates), function symbols, etc. For example, we can decide that only strings can be predicates (which includes URIs, if

**URI**is a subsort of

**string**). Or we can decide that only

**URI**and

**localSymbol**(where

**localSymbol**is some kind of a subsort of

**string**) can be predicates and functions.

The first few sections on the syntax and semantics use a single domain for constants, functions, and predicates. The multisorted extension is described in the section Multisorted Extensions of the RIF Core.

## SYNTAX

Essential BNF for human-readable syntax, where Expressions and Atoms abstract from the underlying data model in that they can take a Herbrand (positional) or another form:

Var ::= '?' NAME? TERM ::= Con | Var | Expr Expr ::= Con '(' TERM* ')' Atom ::= Expr LITFORM ::= Atom | TERM '=' TERM QUANTIF ::= 'Exists' Var+ '(' CONDIT ')' CONJ ::= 'And' '(' CONDIT* ')' DISJ ::= 'Or' '(' CONDIT* ')' CONDIT ::= LITFORM | QUANTIF | CONJ | DISJ

Here the Herbrand form (Expr or Atom) applies its operator (Con) to positional TERM arguments using round parentheses, (... TERM ...). Notice that LITFORM stands for Literal Formula and anticipates the introduction of negated atoms later on. QUANTIF stands for Quantified Formula, which for Horn-like conditions can only be 'Exists' Formulas (Var+ variables should occur free in the scoped CONDIT, so 'Exists' can quantify them; free variables are discussed below). More explicitly than in logic programming, CONJ expresses formula conjunctions, and DISJ expresses disjunctions. Finally, CONDIT combines everything and defines RIF conditions, which can later be extended beyond LITFORM, QUANTIF, CONJ, or DISJ.

We initially assume that all constants (Con) belong to one logical sort: the sort of elementary entities. Relation names and function symbols are also in Con. Likewise, variables are initially not sorted and thus can range over all constants, expressions, etc. Both of these assumptions will be subsequently refined to allow multiple sorts for different data types. See section "Multisorted Extensions of the RIF Core".

At this point we do not commit to any particular vocabulary for the names of variables and for constants. For instance, NAME could be any alphanumeric string and a variety of options could be used for Con. We leave the decision till later time. A stand-alone '?' denotes an anonymous variable, each of whose occurrences is equivalent to a variable with a fresh NAME.

Note that there are two uses of variables in the RIF Condition Language: free and quantified. All quantified variables are quantified explicitly, existentially (and also universally, later). We adopt the usual scoping rules for quantification from first-order logic. Variables that are not explicitly quantified are free.

The free variables are needed because we are dealing with conditions that occur in rule bodies only. When a condition occurs in such a rule body, the free variables in the condition are precisely those that also occur in the rule head. Such variables are quantified universally outside of the rule, and the scope of such quantification is the entire rule. For instance, the variable ?X in the rule below is free in the condition that occurs in the rule body, but it is universally quantified outside of the rule.

Condition with a free variable ?X: ... Exists ?Y (condition(..?X..?Y..)) ... Rule using the condition in its body: Forall ?X (head(...?X...) :- ... Exists ?Y (condition(..?X..?Y..)) ...)

When conditions are used as queries, their free variables are to be bound to carry the answer bindings back to the caller.

The semantics of conditions is defined in the section SEMANTICS.

Example 1 (A Herbrand RIF condition in human-readable syntax): In this condition, ?Buyer is quantified existentially, while ?Seller and ?Author are free: And ( Exists ?Buyer (purchase(?Buyer ?Seller book(?Author LeRif) $49)) ?Seller=?Author )

This syntax is similar in style, and compatible to, the OWL Abstract Syntax http://www.w3.org/TR/owl-semantics/syntax.html.

An XML syntax can be obtained from the above BNF as follows. The non-terminals in all-upercase such as CONDIT become XML entities, which act like macros and will not be visible in instance markups. The other non-terminals as well as symbols ('Exists' etc. as well as '=') become XML elements, which are adapted from RuleML as shown below.

- Con (constant individual, function, or relation) - Var (logic variable, empty for anonymous variable) - Expr (expression formula) - Atom (atomic formula) - Equal (prefix version of term equation '=') - Exists (quantified formula for 'Exists') - declare (declare role, containing a Var) - formula (formula role, containing a CONDIT formula) - And (conjunction) - Or (disjunction)

This can be directly rewritten as a DTD (adapting PositiveConditions.dtd) or an XML Schema.

The condition formula in Example 1 can be serialized in XML as shown below.

Example 2 (A Herbrand RIF condition in XML syntax): <And> <Exists> <declare><Var>Buyer</Var></declare> <formula> <Atom> <Con>purchase</Con> <Var>Buyer</Var> <Var>Seller</Var> <Expr> <Con>book</Con> <Var>Author</Var> <Con>LeRif</Con> </Expr> <Con>$49</Con> </Atom> </formula> </Exists> <Equal> <Var>Seller</Var> <Var>Author</Var> </Equal> </And>

Using the DTD spec, Richard Goerwitz' STG Validator succeeds with the conjunction of Example 2:

<?xml version="1.0" standalone="no"?> <!DOCTYPE And SYSTEM "http://www.jdrew.org/rif/PositiveConditions.dtd"> <And> ... content of Example 2 ... </And>

This XML version can be derived from a 'fully striped' version in the abstract syntax notation (cf. asn06), which permits metasyntax interoperation with OWL, RDF, etc.

## SEMANTIC STRUCTURES (a.k.a. INTERPRETATIONS)

The first step in defining a model-theoretic semantics for a logic-based language is to define the notion of a *semantic structure*, also known as *interpretation*, and then to define the notion of truth valuation for the formulas in the language.

In case of the first-order semantics, the setting given here is one of the standard common definitions. Although it is not as frequently used as some other well-known definitions of semantic structures, it has the advantage of being easy to generalize to non-first-order cases --- for instance, rule sets with negation as failure (NAF), some of which (e.g., well-founded negation) use three-valued semantic structures, and settings, such as the Web, where information can be uncertain or contradictory. In the latter case, four-valued and other multi-valued semantic structures are used. (See, for example, M. Fitting, Fixpoint Semantics for Logic Programming A Survey, Theoretical Computer Science, 1999.)

A semantic structure is a mappings of the form

TV |

where * TV* is the set of all truth values. Thus, if φ if a formula then

*(φ) is its truth value.*

**I**The set of truth values * TV* typically has only two values,

**t**and

**f**. However, some versions of NAF have three,

**t**,

**u**(undefined), and

**f**, and, as we remarked, treatment of contradictions and uncertainty requires at least four:

**t**,

**u**,

**f**, and

**i**(inconsistent).

The set * TV* is assumed to have a total or partial order, called the

*truth order*; it is denoted <

_{t}. For instance, in the first-order case,

**f**<

_{t}

**t**, and it is a total order. In the well-founded semantics,

**f**<

_{t}

**u**<

_{t}

**t**, and it is again a total order. But in Belnap-style four-valued logics, which are suitable for dealing with uncertain or inconsistent information, the truth order is partial:

**f**<

_{t}

**u**<

_{t}

**t**and

**f**<

_{t}

**i**<

_{t}

**t**.

As a side remark, Belnap-style logics also have another order, called the *knowledge order* <_{k}: **u** <_{k} **t** <_{k} **i**; and **u** <_{k} **f** <_{k} **i**. Under the knowledge order, true and false are incomparable, and facts that are both true and false receive the truth value **i**, which is the least upper bound of **f** and **t** in the knowledge order.

More formally, let us define the following sets:

- D - a non-empty set (of domain elements),
- Con - the set of syntax elements recognized by the Con / entity production,
- Var - the set of syntax elements recognized by the Var / ?name production

An interpretation * I* consists of four mappings:

**I**_{C}from Con to elements of D**I**_{V}from Var to elements of D**I**_{F}from Con to functions from D* into D (D* is a set of all tuples over domain D)**I**_{R}from Con to truth-valued mappings D* ->**TV**

Using these mappings, we can define a more general mapping, *I*, as follows:

(k) =**I****I**_{C}(k) if k is a constant(?v) =**I****I**_{V}(?v) if v is a variable(f(t1,...,tn)) =**I****I**_{F}(f)((t1),...,**I**(tn))**I**

As explained earlier, an interpretation is supposed to map formulas to truth values. We define this mapping now:

Atomic formulas:

(r(t1,...,tn)) =**I****I**_{R}(r)((t1),...,**I**(tn))**I**Equality:

(t1=t2) =**I****t**iff(t1) =**I**(t2) and it is**I****f**otherwise.Conjunction:

(And(c1,...,cn)) = min**I**_{t}((c1),...,**I**(cn)), where min**I**_{t}is minimum with respect to the truth order.Disjunction:

(Or(c1,...,cn)) = max**I**_{t}((c1),...,**I**(cn)), where max**I**_{t}is maximum with respect to the truth order.Quantification:

(Exists v1 ... vn (c)) = max**I**_{t}(*(c)), where max**I**_{t}is taken over all interpretations* of the form <**I****I**_{C},***I**_{V},**I**_{F},**I**_{R}>, where***I**_{V}is the same as**I**_{V}except possibly on the variables v1,...,vn (i.e.,* agrees with**I**everywhere except possibly in its interpretation of the mappings of variables v1 ... vn).**I**

## Multisorted Extensions of the RIF Core

The classical idea of sorted logic can easily account for the ideas of primitive data types, URIs as identifiers of objects and concepts, and more. Many logic languages (e.g., Prolog, HiLog, F-logic, RDF) allow the same symbol to play multiple roles. For instance, the same symbol *foo* can be used as a constant, a predicate of several different arities, and as a function symbol of different arities. To account for such languages, we will use a *multisorted* logic.

In a multisorted RIF core, each constant from Con is associated with one or more sorts. A sort can be * primitive*, an

*, or a*

**arrow sort***. Arrow sorts are also known as*

**Boolean sort***function*sorts and Boolean sorts are also known as

*predicate*sorts.

Primitive sorts are drawn from a fixed collection of sorts PS_{1}, ..., PS_{n}. These sorts are intended to model primitive data types. For instance, we could have the sorts *integer*, *strings*, *time*, *dates*, etc. The same constant can be associated with more than one primitive sort, so it is possible that the sort of short integers will be a subsort of the sort of long integers (i.e., every constant that is associated with the sort *short* will also be associated with the sort *long*). It is a common practice to distinguish the constants of different primitive sorts syntactically. For instance, constants of the primitive sort *integer*, would have a different syntax from constants of sort *string*, and constants of primitive type URI would have yet another syntax.

An * arrow sort* is a statement of the form s

_{1}× ... × s

_{k}→ s, where s

_{1}, ..., s

_{k}, s are names of primitive sorts (i.e., one of the PS

_{1}, ..., PS

_{n}). A

*is a statement of the form s*

**Boolean sort**_{1}× ... × s

_{k}, where, again, s

_{1}, ..., s

_{k}are names of primitive sorts.

Recall that RIF core uses the symbols from Con to denote constants, predicates, and function symbols alike, so the same symbol can occur in multiple contexts. However, it is useful to restrict the contexts in which various symbols are allowed to occur. For instance, Prolog or RDF don't place any such restrictions, but OWL-DL has a unique role for each symbol. This restriction of the context is accomplished by controlling the sorts that are associated with each constant. For instance, if one doesn't want integers to occur as predicate and function symbols then we don't associate any arrow or Boolean sorts with the constants that are associated with primitive sort *integer*. On the other hand, we do want URIs to denote concepts and other predicates. In that case, we would associate every Boolean sort with every constant that has a primitive sort URI. If we want to also allow *local* names for concepts and other predicates, then we might introduce a separate primitive sort, *localPred*, and endow it with every Boolean type.

### Multisorted Syntax

#### Multisorted Syntax of Primitive Sorts

The human-readable syntax for primitive sorts (types) can use an infix operator between a term and its type. For example, reusing the 'double-^{}^{}^' infix of N3/Turtle, the term '6' can be given the type '#Perfect_number' by writing '6double-^{}^{}^#Perfect_number'. A correspondingly typed variable 'xyz' is written as 'xyzdouble-^{}^{}^#Perfect_number'.

The XML syntax can be obtained by using a 'type' attribute on XML term elements such as Con. Thus, the above example becomes <Con type="#Perfect_number">6</Con>. A correspondingly typed variable 'xyz' is written as <Var type="#Perfect_number">xyz</Var>.

### Formalization of Multisorted Extensions

Formally, the syntax of RIF core needs the following adjustments. We introduce new functions:

PSort: Con → powerset(Primitive_Sorts)

ASort: Con → powerset(Arrow_Sorts)

BSort: Con → powerset(Boolean_Sorts)

Each of these functions associates a (possibly empty) set of sorts (primitive, arrow, or Boolean) with every constant c ∈ Con.

PSort is also defined on variables:

PSort: Var → powerset(Primitive_Sorts)

The intended meaning is that if ?v ∈ Var and PSort(?v) = {s_{1}, ..., s_{k}} then ?v can be bound only to function terms that are simultaneously of sorts s_{1}, ..., s_{k} (we define what it means for a function term to belong to a primitive sort below). In theory, PSort(?v) can be an empty set. However, such a variable would be useless, since it cannot be bound to anything.

**Well-formed function terms**. If c ∈ Con is a constant and s ∈ PSort(c) then we say that c (and c(), which we identify with c) is a * well-formed function term of sort s*. Note that the same constant can be a well-formed term of several different sorts because we allow several primitive sorts to be associated with the same constant. The informal meaning of such a happenstance is that the term belongs to the "intersection" of all the sorts with which it is associated.

By induction, if f(t_{1}, ..., t_{k}) is a function term then it is a * well-formed function term of sort s* if there is an arrow sort s

_{1}, ..., s

_{k}→ s ∈ ASort(f) such that t

_{1}, ..., t

_{k}are well-formed function terms of sorts s

_{1}, ..., s

_{k}, respectively.

It is convenient to extend the mapping PSort from constants to function terms as follows:

PSort(t) = { s | t is a well-formed term of sort s }

**Well-formed atomic formula**. We can now say that an atomic formula p(t_{1}, ..., t_{k}) is * well-formed* if and only if t

_{1}, ..., t

_{k}are well-formed function terms and there is a Boolean sort s

_{1}× ... × s

_{k}∈ BSort(p) such that s

_{1}∈ PSort(t

_{1}), ..., s

_{k}∈ PSort(t

_{k}).

The only other modification to the definition of the RIF syntax is that we must require that all atomic formulas that occur in RIF conditions and rules must be well-formed.

### Semantics of the Multisorted RIF Core

The semantics of RIF core needs the following adjustments in order to be compatible with the multisorted syntax:

- The domain D of an interpretation is now split into several subdomains:
D = D

_{s1}∪ ... ∪ D_{sn}, where each D_{si}is the domain of interpretation of the primitive sort s_{i}.

If c ∈ Con or ?v ∈ Var is a constant or a variable of

primitive sort s then I

_{C}(c) ∈ D_{s}and**I**_{V}(?v) ∈ D_{s}.

If f has an arrow type s

_{1}, ..., s_{k}→ s ∈ ASort(f) then**I**_{F}(f) should be a (possibly polymorphic) function of type D_{s1}× ... × D_{sk}→ D_{s}, i.e., if d_{1}∈ D_{s1}, ..., d_{k}∈ D_{sk}then**I**_{F}(d_{1},...,d_{k}) must be in D_{sk}(if the arguments are not in D_{s1}× ... × D_{sk}then the result does not need to be in D_{s}, but**I**_{F}(f) might have other types, which restrict its behavior).

The definition of

**I**_{P}requires no adjustments.