W3C

RIF Basic Logic Dialect

W3C Editor's Draft 24 September 2007

This version:
(@@not-published, expands on the first published version)
Latest version:
http://www.w3.org/TR/rif-core
Previous version:
http://www.w3.org/TR/2007/WD-rif-core-20070330
Editors:
Harold Boley (National Research Council Canada)
Michael Kifer (State University of New York at Stony Brook)

Abstract

This document, developed by the Rule Interchange Format (RIF) Working Group, specifies the basic design for a format that allows logic rules to be translated between rule languages and thus transferred between rule systems.

In Phase 1, the RIF Working Group first defines a condition language, which is envisioned to be a shared part of all RIF dialects. In this document, the RIF Condition Language is used to define rule bodies of the RIF Basic Logic Dialect (RIF BLD). We give an abstract syntax (with UML visualization) and semantics (model theory) for positive and slotted conditions as well as for a Horn rule language. Examples in this document are based on a human-oriented syntax and an XML syntax, which are derived from the abstract syntax, but in the current working draft these syntaxes are used only for explanations and illustration.

Status of this Document

May Be Superseded

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

Please Comment By DD MM 2007

The Rule Interchange Format (RIF) Working Group seeks public feedback on this Second Public Working Draft. Please send your comments to public-rif-comments@w3.org (public archive). If possible, please offer specific changes to the text which will address your concern.

No Endorsement

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Patents

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.


Table of Contents

1. RIF Overview

(Editor's Note: This text is maintained on wiki page RIF Overview).

Based on the RIF Use Cases and Requirements, this document develops RIF BLD (the Basic Logic Dialect of the Rule Interchange Format) through a set of foundational concepts that are intended to be shared by all logic-based RIF dialects. The overall RIF design takes the form of a layered architecture organized around the notion of a dialect.

A dialect is a rule language with a well-defined syntax and semantics. This semantics must be model-theoretic, proof-theoretic, or operational in this order of preference. Some dialects might be proper extensions of others (both syntactically and semantically) and some may have incompatible expressive power. All logic-based dialects are required to extend the RIF Basic Logic Dialect.

From a theoretical perspective, RIF BLD corresponds to the language of definite Horn rules (see Horn Logic) with equality (and with a standard first-order semantics). Syntactically, however, RIF BLD has a number of extensions to support features such as objects and frames, international resource identifiers (or IRIs) RFC 3987 as identifiers for concepts, and XML Schema data types. These latter features make RIF a Web language. However, RIF is designed to enable interoperability among rule languages in general, and its uses are not limited to the Web. The semantics of RIF has provisions for future extensions towards dialects that support pure FOL, dialects that support negation as failure (NAF), business (or production) rules, reactive rules, and other features.

Eventually, it is hoped that RIF dialects will cover a number of important paradigms in rule-based specification and programming. Our main target paradigms include production rules, logic programming, FOL-based rules, reactive rules, and normative rules (integrity constraints).

The central part of RIF is its Condition Language. The condition language defines the syntax and semantics for the bodies of the rules in RIF BLD and the syntax for the queries. However, it is envisioned that the Condition Language will have wider use in RIF. In particular, it might be used as a sublanguage for specifying the conditional part in the bodies of production rules (PRD), reactive rules, and normative rules.

As mentioned, IRI constants are used in RIF both as logical constants, as names of predicate, and of function symbols. RIF BLD does not allow the same symbol to play more than one of these roles, but the dialects that extend the BLD may support polymorphic symbols (i.e., symbols that have more than one contextual use; e.g., of a constant and of a predicate). Such polymorphism is common practice in Logic Programming (e.g., [Prolog], [F-logic], [HiLog]) and in [RDF]. This extensibility is achieved in RIF by building its syntax on the basis of signatures. This draft also introduces a frame-based syntax and semantics and defines a normative way for RIF rules to interact with RDF.

The current document is the second draft of the RIF BLD specification (in the first draft called 'RIF Core'). A number of extensions are planned to support built-ins, additional primitive XML data types, the notion of RIF compliance, and so on. Tool support for RIF BLD is forthcoming. RIF dialects that extend BLD will be specified in other documents by this working group.

2. RIF Condition Language

(Editor's Note: This text is maintained on wiki page RIF Condition Language).

The RIF Condition Language is intended to be a common basis for several dialects of RIF. First of all, it is used by RIF BLD, as described in this document. The other future dialects or groups of dialects where the Condition Language or its extensions might be used include:

  • Rule bodies and queries in declarative logic programming dialects (LP)

  • Rule bodies in first-order dialects (FO)

  • Conditions in the rule bodies of the Production Rule Dialect (PRD)

  • The event and condition parts of the rule bodies in reactive rule dialects (RR)

  • Integrity constraints (IC)

It should be noted, however, that apart from RIF BLD and RIF PRD no decision has been made regarding which dialects will ultimately be part of RIF.

The RIF Condition Language is intended to be used only in rule bodies and queries, not in rule heads. The various RIF dialects diverge in the way they specify, interpret, or use rule heads and other components of the rules. By focusing on the condition part of the rule bodies we achieve maximum syntactic and a great deal of semantic reuse among the dialects.

This Condition Language part of the document describes Positive Conditions and Slotted Conditions.

2.1. Positive Conditions

(Editor's Note: This text is maintained on wiki page Positive Conditions).

The language of positive RIF conditions determines what can appear as a body (the if-part) of a rule supported by the basic RIF logic. As explained in Section Overview, RIF's Basic Logic Dialect corresponds to definite Horn rules, and the bodies of such rules are conjunctions of atomic formulas without negation. However, it is well-known that disjunctions of such conditions can also be used in the bodies of rules without changing the essential properties of the rule language. This is based on the fundamental logical tautologies (h ← b ∨ c) ≡ ((h ← b) ∧ (h ← c)) and ∀ x (F ∧ G) ≡ (∀ x F ∧ ∀ x G). In other words, a rule with a disjunction in the body can be split into two or more rules that have no such disjunction.

The condition language is intended to be shared among the bodies of the rules expressed in future RIF dialects, such as LP, FO, PR and RR. The condition language might also be used to uniformly express integrity contraints and queries. This section presents a syntax and semantics for the RIF condition language.

To make RIF dialects suitable as Web languages, RIF supports XML Schema and some other primitive data types. In addition, RIF promotes the use of internationalized resource identifiers (or IRIs) RFC 3987 to refer to individuals, predicates, and functions.

To ensure extensibility and to provide for future higher-order dialects based on formalisms, such as HiLog and Common Logic, the RIF logic language does not draw sharp boundary between the symbols used to denote individuals from symbols used as names for functions or predicates. Instead, all constant, preficate, and function symbols are drawn from the same universal set. RIF dialects control the contexts in which the different symbols can occur by attaching signatures to these symbols.

RIF's basic logic dialect carefully selects signatures for the symbols so that the corresponding logic will be first-order: each symbol has a unique role as an individual, a function symbol of a particular arity, or a predicate symbol of a particular arity. However, dialects extending the basic logic will be allowed to use signatures in less restrictive fashions so that symbols could be polymorphic, polyadic, and be allowed to occur in several different contexts (for example, both as individuals and as predicates).

We begin by describing a syntax, which is more general than what the basic logic dialect permits. This syntax can be used in the various dialects that extend RIF's basic logic dialect. Then we introduce the notion of a signature and specify the restrictions on the way signatures are allowed to be assigned to symbols.

2.1.1. Syntax

In this document we will introduce several related but distinct syntaxes:

  • Formal syntax. This syntax is used in the definitions and follows the standard textbook conventions for logical syntax.

  • Presentation syntax. This syntax is closely related to the formal syntax and is derived from it by elaborating on the various aspects of RIF (such as, for example, data types). This is a human-oriented syntax, which we use in our the examples.

  • Abstract syntax. This is a normative version of the RIF syntax. It is also derived from the formal syntax, but is more verbose than the presentation syntax (this verbose nature of the abstract syntax is the main reason for the presentation syntax).

  • XML syntax. This syntax is the normative XML serialization of the abstract syntax.

2.1.1.1. Formal Preliminaries

The language of RIF Condition Language consists of a countably infinite set of constant symbols Const, a countably infinite set of variable symbols Var (disjoint from Const), connective symbols ∧ and ∨, quantifier ∃, and auxiliary symbols like (, ), and so on. The basic language construct is called term, which is defined inductively as follows:

  • If tConst or tVar then t is a term.

  • If t and t1, ..., tn are terms then t(t1 ... tn) is a term

This definition is very general. It makes no distinction between constant symbols that represent individuals, predicates, and function symbols. The same symbol can occur in multiple contexts at the same time. For instance, p(p(a) p(a b c)) is a term. Even variables and general terms are allowed to occur in the position of predicates and function symbols. For instance, p(a)(p(b c) q) is also a term. To control the context in which any given symbol can occur in RIF dialects, the language associates a signature with each symbol (both constant and variable symbols).

Signatures. Let SigNames be a non-empty finite or countably infinite set of signature names. We require that this set includes at least the following signature names: i and bool. The signature name i represents the context where the constants that denote individual objects are allowed to appear. The name bool represents the context of atomic formulas. A signature is a statement of the form η{e1 ... en ...} where η ∈ SigNames is the name of the signature and {e1 ... en ...} is a set of signature expressions. This set can be empty, finite, or countably infinite. In RIF's basic logic dialect, this set will have at most one expression, but in more expressive dialects a signature can have more than one expression, and in this way they can support polymorphism.

A signature expression can be a base signature expression or an arrow signature expression (as a special case, an arrow expression can be a Boolean signature expression), defined as follows, where κ1, ..., κn, and κ are signature names from SigNames:

  • i and bool are basic signature expressions.

  • 1 ... κn) ⇒ κ, where n≥0, is an arrow signature expression. In particular, () ⇒ i and (i) ⇒ i are arrow signature expressions.

  • If κ above is bool then the signature is called a Boolean signature expression.

A set S of signatures is said to be coherent if

  • it contains the signatures i{ } and bool{ }, which represent the context of individual objects and atomic formulas; and

  • no two different signatures in S have the same name.

Well-formed terms and formulas. RIF uses signatures to control the context in which terms can appear through the notion of well-formed terms and well-formed atomic formulas. First, as mentioned above, each symbol (constant or variable) is associated with exactly one signature from a coherent set of signatures. (Different symbols can be associated with the same signature, but one symbol cannot be associated with more than one signature.) If σ is a signature then we will use σ# to denote the name of that signature.

  • If s is a constant or variable symbol with signature η{...} and a base expression i or bool belongs to the set {...} associated with that signature then s is a well-formed term with signature i{ } or bool{ }, respectively.

    • Note that if symbol s has a signature, say, κ{i (i)⇒bool} and symbol t has a signature ρ{(i)⇒i (i)⇒bool} then, according to the above definition, s is a well-formed term, while t is not. Also, according to this definition the signature of a symbol can be different from the signature of the same symbol when it is viewed as a term. For instance, in the above example, s has a signature κ{i (i)⇒bool} as a symbol, but its signature as a term is i{ }. Furthermore, a symbol always has just one signature (which might have many arrow expressions in it), but a term can have several signatures. For instance, if symbol q has the signature δ{i bool} then q as a well-formed term has two signatures: i{ } and bool{ }.

    • Although some symbols can have signatures as symbols and signatures as terms, it will be clear from the context which signature is meant. Complex terms (defined below) are not symbols and so they can have signatures only as terms.
  • A term t(t1 ... tn) is a well-formed term with signature σ iff

    • t is a well-formed term with signature σ0;

    • Each ti is a well-formed term with signature σj, j = 1,...,n (the σj's are not necessarily distinct); and

    • σ0 contains an arrow expression of the form (σ1# ... σn#) ⇒ σ#

  • A term t(t1 ... tn) is a well-formed atomic formula iff it is a well-formed term with signature bool{ }.

Note that f() and f are distinct terms and the proposed XML serialization also treats these as distinct fragments of XML: the nullary function application <Uniterm><Const>f</Const></Uniterm> vs. the symbol <Const>f</Const>.

RIF defines that there is a special predicate, =, which denotes equality. Like other predicates it has a signature, and this signature includes the Boolean expression (i i) ⇒ bool and possibly other expressions of the form (s s) ⇒ bool, where s is a signature name. No other signatures are allowed for =. We note that it is common practice to write the atomic formulas involving = using infix notation, i.e., a = b instead of =(a,b). The equality predicate has special model-theoretic semantics, as explained in Section Model Theory for RIF's Basic Condition Language.

Examples. To illustrate the above definitions, we give several examples.

Consider the term p(p(a) p(a b c)). If p has the (polymorphic) signature myPsig{(i)⇒i (i i)⇒i (i i i)⇒i} and a, b, c each has the signature i{ } then p(p(a) p(a b c)) is a well-formed term with signature i{ }. If, for instance, p's signature were myPsig2{(i i)⇒i (i i i)⇒i} then p(a) would not have been a well-formed term and the entire term would also be ill-formed.

Here is another, fancier, signature for p under which the above term would be well-formed (and again have signature i{ }): myPsig3{(i)⇒bool (bool i)⇒i (i i i)⇒i}. If p's signature were myPsig4{(i)⇒bool (bool i)⇒bool (i i i)⇒i} instead, then p(p(a) p(a,b,c)) would have been a well-formed term with signature bool{ } and, therefore, it would have been a well-formed atomic formula.

An even more advanced example of a signature is when the right-hand side of an arrow expression is a signature other than i or bool. For instance, let John, Mary, NewYork, and Boston have signatures i{ }; flight and ancestor have signatures h2{i (i i)⇒bool}; and closure have signature hh1{(h2)⇒p2}, where p2 is the name of the signature p2{(i i)⇒bool}. Then closure(flight)(NewYork,Boston) and closure(ancestor)(John,Mary) would be well-formed formulas. Such formulas are allowed in languages like HiLog, which support predicate constructors like closure in our example.

More general formulas are constructed out of atomic formulas with the help of logical connectives. RIF's Basic Condition Language defines the following general well-formed formulas.

  • If φ is a well-formed atomic formula then it is also a general well-formed formula.

  • If are φ and ψ are general well-formed formulas then so is φ ∧ ψ.

  • If are φ and ψ are general well-formed formulas then so is φ ∨ ψ.

  • If φ is a well-formed general formula and V1, ..., Vn are variables then ∃ V1 ... Vn φ is a general well-formed formula.

Signatures in the RIF Basic Condition Language. RIF's basic condition language presents a much simpler picture to the user by restricting the set of well-formed terms to a specific coherent set of signatures. Namely, the signature set of RIF's basic logic language contains the following signatures:

  • i{ } and bool{ }

  • For every integer arity n ≥ 0, there are signatures fn{(i ... i) ⇒ i} and pn{(i ... i) ⇒ bool} (there are n i's inside the parentheses), which represent function symbols of arity n and predicate symbols of arity n, respectively. In addition to i and bool, the symbols fn and pn are reserved signature names in the RIF basic logic dialect. In this way, each constant symbol can be either an individual, a predicate of one particular arity, or a function symbol of one particular arity.

  • All variables are associated with signature i{ }, so they can range only over individuals.

  • The equality symbol, =, has signature p2{(i i) ⇒ bool}. In this way, the equality predicate can compare only the terms whose signature is i{ }; it cannot compare predicate names or function symbols.

Symbol spaces. Throughout this document, the rif: prefix will stand for the XML Schema namespace URI http://www.w3.org/2001/XMLSchema and rif: will stand for the URI of the RIF namespace, http://www.w3.org/2007/rif.

The set of all constant symbols in RIF has a number of predefined subsets, called symbol spaces, which are used to represent XML data types, data types defined in other W3C specifications, such as rdf:XMLLiteral, and to distinguish other sets of constants. The following primitive data types are supported: xsd:long (http://www.w3.org/2001/XMLSchema#long), xsd:string (http://www.w3.org/2001/XMLSchema#string), xsd:decimal (http://www.w3.org/2001/XMLSchema#decimal), xsd:time (http://www.w3.org/2001/XMLSchema#time), xsd:dateTime http://www.w3.org/2001/XMLSchema#dateTime), and rdf:XMLLiteral (http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral).

Basic RIF logic defines two additional symbol spaces: rif:iri (for internationalized resource identifier or IRI) and rif:local (for constant symbols that are not visible outside of a particular set of RIF formulas).

Constant symbols that belong to the various symbol spaces will have special presentation syntax, and semantic structures will interpret them in a special way. In the formal syntax, however, support for symbol spaces only requires us to state that the set Const of all constant symbols has subsets Constxsd:long, Constxsd:string, Constxsd:decimal, Constxsd:time, Constxsd:dateTime, Constrdf:XMLLiteral, Constrif:iri, and Constrif:local. These sets of symbols are described in more detail later.

2.1.1.2. Abstract Syntax

We define the Abstract Syntax of BLD with (strictly) alternating syntactic class/property categories, i.e. in a "(fully) striped" manner. Section Presentation Syntax will then proceed to a "stripe-skipped" or "unstriped" human-readable syntax as well as to a (fully) striped XML syntax.

TODO: Explain "fully striped."

To compare two approaches at [F2F7], we represent the abstract syntax of the RIF Condition Language both in Abstract Syntax Notation and in Abstract EBNF Syntax.

2.1.1.2.1. Abstract Syntax Notation

The abstract syntax of the condition language is given in asn06 as follows:

class CONDITION

    subclass And
        property formula : list of CONDITION

    subclass Or
        property formula : list of CONDITION

    subclass Exists
        property declare : list of Var
        property formula : CONDITION

    subclass ATOMIC

class ATOMIC

    subclass Uniterm

    subclass Equal

class Uniterm
    property op: Const
    property arg: list of TERM

class Equal
    property side: list of TERM

class TERM

    subclass Const
        property name: CONSTNAME
        property type: TYPENAME?

    subclass Var
        property name: VARNAME

    subclass Uniterm    

The length of the list of Var of the declare property (role) of the Exists class is assumed to be 1 or more. The multiplicity of the side property of the Equal class is assumed to be exactly 2.

2.1.1.2.2. Abstract EBNF Syntax

We represent the abstract syntax of the RIF Condition Language with the help of usual EBNF, employed here to define BLD in a fully striped normal form. In this Abstract EBNF Syntax, the properties are named using idioms of the form 'name ->', e.g. 'formula ->'.

CONDITION  ::= 'And' '('  ('formula ->' CONDITION)*  ')'
                 |
               'Or'  '('  ('formula ->' CONDITION)*  ')'
                 |
               'Exists' '(' ('declare ->' Var)+  'formula ->' CONDITION ')'
                 |
               ATOMIC

ATOMIC     ::= Uniterm | Equal

Uniterm    ::= 'Uniterm' '(' 'op ->' Const  ('arg ->' TERM)* ')'

Equal      ::= 'Equal' '(' 'side ->' TERM  'side ->' TERM ')'

TERM       ::= 'Const' ( '(' 'type ->' TYPENAME ')' )?  '(' CONSTNAME ')'
                 |
               'Var' '(' VARNAME ')'
                 |
               Uniterm

The above abstract syntax can be illustrated with a UML diagram, as shown below. Automatic transformation from the Abstract EBNF Syntax to UML can be implemented based on the earlier asn06-to-UML transformation.

The central class of RIF, CONDITION, is specified recursively through its subclasses and their parts. The Equal class has two side roles. Two pairs of roles distinguish between the declare and formula parts of the Exists class, and between the operation (op) and arguments (arg) of the Uniterm class, where multiple arguments are subject to an ordered constraint (the natural "left-to-right" order):

The syntactic classes are partitioned into classes that will not be visible in serializations (written in all-uppercase letters) and classes that will be visible in instance markup (written with a leading uppercase letter only).

The three classes Var, CONDITION, and ATOMIC will be required in the abstract syntax of Horn Rules.

2.1.1.3. Presentation Syntax

The abstract syntax of Section Abstract Syntax can be lowered to a presentation syntax. The Presentation EBNF Syntax below, which lowers the Abstract Syntax, is used throughout this document to explain and illustrate the main ideas. This syntax is similar in style to what in OWL is called the Abstract Syntax http://www.w3.org/TR/owl-semantics/syntax.html.

Presentation EBNF Syntax. The human-oriented presentation syntax, described in this EBNF (usual except for whitespace handling), is work in progress and under discussion.

  CONDITION   ::= 'And' '(' CONDITION* ')' |
                  'Or' '(' CONDITION* ')' |
                  'Exists' Var+ '(' CONDITION ')' |
                  ATOMIC
  ATOMIC      ::= Uniterm | Equal
  Uniterm     ::= Const '(' TERM* ')'
  Equal       ::= TERM '=' TERM
  TERM        ::= Const | Var | Uniterm
  Const       ::= CONSTNAME | '"'CONSTNAME'"''^^'TYPENAME
  Var         ::= '?'VARNAME

The above is a standard syntax for a variant of first-order logic. The application of a symbol from Const to a sequence of terms is called Uniterm ("Universal term") since it can be used to play the role of a function term or an atomic formula depending on the syntactic context in which the application occurs. The non-terminal ATOMIC stands for (positive) atomic formula, which can later be complemented with "negated atomic formula". The 'Exists' formula is an "existential formula", which in Horn-like conditions is the only quantified formula but in later conditions may be complemented with "universal formula" (Var+ denotes the list of free variables in CONDITION). The 'And' formula defines conjunctions of conditions, and 'Or' denotes disjunctions. Finally, CONDITION assembles everything into what we call RIF conditions. RIF dialects will extend these conditions in various ways.

Note that individuals, function symbols, and predicate symbols all belong to the same set of symbols (Const). This syntax is more general than what RIF's basic logic dialect actually permits. As explained in Section Formal Preliminaries a set of signatures restricts this syntax to allow only the terms that are allowed in standard first-order logic.

The above Presentation EBNF Syntax, where TERM* is understood as TERM* ::= | TERM TERM*, uses spaces to indicate Lisp-like whitespace (SPACE, TAB, or NEWLINE) character sequences in the defined language; these must consist of at least one whitespace character when used as the separator between adjacent TERMs, and can consist of zero or more whitespace characters elsewhere. Where spaces are omitted from the Presentation EBNF Syntax, as between '?' and VARNAME in '?'VARNAME, there must be no whitespace character in the defined language.

Constant symbols can have two forms: CONSTNAME and "CONSTNAME"^^TYPENAME. The second form is more general. The first form is used as a shorthand for some symbol spaces, like long integers, decimals, local RIF constants, etc. The precise syntax for RIF constants is given in Section Primitive Data Types.

At this point we do not commit to any particular vocabulary for the names of variables. These are assumed to be alphanumeric character sequences starting with a ?-sign.

**** MK: should probably be more specific: Vars are alphanums starting with a ? or something like ?'.....', if non-alphanum characters are involved.

The lowering of the abstract syntax to this presentation syntax can be done via automatic EBNF-to-EBNF mapping. The mapper, abs2con4g ("abstract to concrete for grammar"), uses a TokenTable parameter specifying how to map certain abstract classes, in prefix notation, to operators, in infix or prefix notation. This table is given in the form of binary class2token facts, accessed via a lookup function. The main mapper is given in the form of (left-to-right oriented) equations defining abs2con4g as a binary function with the abstract syntax as first argument, the table as second argument, and the presentation syntax as returned value.

class2token('Equal','=')
class2token('type','^^')
class2token('Var','?')

abs2con4g(?Production1 ... ?ProductionN,
          ?TokenTable)
 =
  abs2con4g(?Production1,
            ?TokenTable)
  ...
  abs2con4g(?ProductionN,
            ?TokenTable)

abs2con4g(?LeftSymbol ::= ?RightExpression, ?TokenTable)
 =
  ?LeftSymbol ::= abs2con4g(?RightExpression, ?TokenTable)

abs2con4g('And'
            '('
               ('formula ->' CONDITION)*
             ')'
            |
          'Or'
            '('
               ('formula ->' CONDITION)*
            ')'
            |
          'Exists'
            '('
               ('declare ->' Var)+
                'formula ->' CONDITION
            ')'
            |
          ATOMIC,
          ?TokenTable)
 =
  'And' '(' CONDITION* ')' |
  'Or' '(' CONDITION* ')' |
  'Exists' Var+ '(' CONDITION ')' |
  ATOMIC

abs2con4g(Uniterm
            |
          Equal,
          ?TokenTable)
 =
  Uniterm | Equal


abs2con4g('Uniterm'
            '('
               'op ->' Const
               ('arg ->' TERM)*
            ')',
          ?TokenTable)
 =
  Const '(' TERM* ')'

abs2con4g('Equal'
            '('
               'side ->' TERM
               'side ->' TERM
             ')',
          ?TokenTable)
 =
  TERM lookup('Equal',?TokenTable) TERM

abs2con4g(Const
            |
          Var
            |
          Uniterm,
          ?TokenTable)
 =
  Const | Var | Uniterm

abs2con4g('Const' ('(' 'type ->' TYPENAME ')')?
             '('
                CONSTNAME
             ')',
          ?TokenTable)
 =
  CONSTNAME(lookup('type',?TokenTable)TYPENAME)?

abs2con4g('Var'
            '('
               VARNAME
             ')',
          ?TokenTable)
 =
  lookup('Var',?TokenTable)VARNAME

Note that variables in the RIF Condition Language can be free or quantified. All quantification is explicit. All variables introduced by quantification must also occur in the quantified formula. Initially, only existential quantification is used. Universal quantification will be introduced later. We adopt the usual scoping rules for quantifiers from first-order logic. Variables that are not explicitly quantified are free.

Free variables arise because CONDITION can occur in an if part of a rule. When this happens, the free variables in a condition formula are precisely those variables that also occur in the then part of the rule. We shall see in Section Horn Rules that such variables are quantified universally outside of the rule, and the scope of such quantification is the entire rule. For instance, the variable ?Y in the first RIF condition of Example 1 is quantified, existentially, but ?X is free. However, when this condition occurs in the if part of the second rule in Example 1, then this variable is quantified universally outside of the rule. (The precise syntax for RIF rules will be given in Section Horn Rules.)

Example 1 (A RIF condition in and outside of a rule)

 RIF condition:
                        Exists ?Y (condition(?X ?Y)) 

 RIF Horn rule:
        Forall ?X (then-part(?X) :-  Exists ?Y (condition(?X ?Y)))

When RIF conditions are used as queries, their free variables carry answer bindings back to the caller.

2.1.2. Symbol Spaces and Primitive Data Types

Following N-Triples, we will use the following presentation syntax to specify the constant symbols that belong to a symbol space label:

  "literal"^^label

For instance, "123"^^xsd:long is a symbol that belongs to the symbol space xsd:long, "2007-11-23T03:55:44-02:30"^^xsd:dateTime is a symbol in symbol space xsd:dateTime, and "ftp://example.com/foobar"^^rif:iri is a symbol that belongs to the symbol space rif:iri. The part of such a symbol that occurs inside the double quotes is called the lexical form of the symbol. The surrounding double quotes are not part of the literal. If a double quote is included as part of a literal, it must be escaped with the backslash. Some data types have shorthand notation in which the ^^label part can be omitted (see below).

A symbol space in RIF has the following components:

  • A non-empty set of character strings called the lexical space of the symbol space. The lexical space for a type, D, determines which character strings are allowed as value in a symbol like "value"D. For instance, "1.2"^^xsd:decimal and "1"^^xsd:decimal are two legal symbols because 1.2 and 1 are legal in the lexical space of the XML datatype xsd:decimal. On the other hand, "a+2"^^xsd:decimal is not a legal symbol, since a+2 is not part of the lexical space of the XML datatype xsd:decimal.

  • An absolute IRI that identifies the symbol space.

A constant symbol of the form "xyz"^^label is well-formed if its lexical form, xyz belongs to the lexical space associated with the symbol space label.

In addition the semantics of a symbol space defines

  • A non-empty set called the value space of the data type; and

  • A mapping from the lexical space of the data type to its value space.

These two components will be explained in Section Model Theory for RIF's Basic Condition Language. We would like to point out, however, that symbol spaces in RIF include spaces, such as xsd:long, whose value space is fixed as well as spaces like rif:iri and rif:local whose value spaces are not fixed. We will refer to the former as primitive data types.

A number of primitive types in RIF are based on the XML Schema data types, and their names are references to the corresponding XML Schema types.

In this version of the RIF BLD document, we define the following symbol spaces, where the prefix xsd refers to the XML Schema URI and rif is the prefix for the RIF language. The syntax such as xsd:long should be understood as a compact uri, i.e., a macro, which expands to a concatenation of the character sequence denoted by the prefix xsd and the string long. In the next version of this document we will introduce a syntax for defining prefixes for compact URIs and will also expand on the syntax for the symbols that can be used to denote RIF's primitive data types.

  • xsd:long. This symbol space corresponds to the XML data type xsd:long. Example: "123"^^xsd:long. Long integers also have a short notation, which does not require the "..."^^xsd:long wrapper. That is, long integers can be written as constants from the lexical space of xsd:long, but the surrounding double quotes and the type label ^^xsd:long can be omitted. Example: 123.

  • xsd:decimal. This corresponds to the XML data type xsd:decimal. Example: "123.45"^^xsd:decimal. Decimals also have an alternative short notation, which does not require the "..."^^xsd:decimal wrapper. That is, decimals can be written as constants from the lexical space of xsd:decimal, but the surrounding double quotes and the type label ^^xsd:decimal can be omitted. Example: 123.45.

The type xsd:decimal is a supertype of xsd:long.

  • xsd:string. This corresponds to the XML data type xsd:string. Example: "a string"^^xsd:string. Double quotes that appear inside strings are escaped with a backslash and a backslash that is supposed to appear in the string must be escaped with another backslash.

  • xsd:time. This corresponds to the XML data type xsd:time. Example: "18:33:44.2345"^^xsd:time.

  • xsd:dateTime. This type corresponds to the XML data type xsd:dateTime. Example: "2007-03-12T21:22:33.44-01:30"^^xsd:dateTime.

  • rdf:XMLLiteral. This type's lexical space is used to represent certain XML documents, as described in Resource Description Framework (RDF): Concepts and Abstract Syntax.

  • rif:iri. Symbols in this symbol space have the form "XYZ"^^rif:iri, where XYZ is an absolute IRI as specified in RFC 3987.

  • rif:local. Symbols in that symbol space have the form "XYZ"^^rif:local, where XYZ is any string of characters (provided that each occurrence of " and of \ in such a string is escaped with a backslash). Thus, the lexical space of rif:local is the same as the lexical space of xsd:string. Local symbols also have an alternative short notation. Example: 'a local symbol'. In this notation, single quotes and backslashes that occur inside such strings are escaped with backslashes.

TODO: Define CURIs.

Other XML data types that are likely to be incorporated in RIF include xsd:double, xsd:date, and a type for temporal duration. At present, the partial order on the above primitive data types is imposed by the XML Schema hierarchy, so the only subtype relationship is between xsd:long and xsd:decimal. This may be extended as more types are added.

The symbol space rif:iri is intended to be used in a way similar to RDF resources. The value space of the symbol space rif:iri can be any set and no a priori equalities among the members of the type rif:iri are assumed. This domain is not the same as the value space of the XML primitive type anyURI.

The symbol space rif:local is used for constant symbols (including predicate and function symbols) that are local to the various sets of RIF formulas. They are not visible outside of the rule set in which these constants occur (except, possibly, when such a constant is equated to a constant of type rif:iri).

TODO. We will need to define the notion of the rule set more precisely. For instance, what is the rule set in which a particular rule occurs? Will we have #include statements? Import? All this has to be hashed out.

The constant symbols that correspond to XML Schema data types all have the signature i{ } in RIF's basic logic dialect. The symbols of type rif:iri and rif:local can have any signature allowed by the basic RIF logic: i, fn, or pn, for n = 0,1,....

Symbols with ill-formed lexical part. RIF constant symbols that belong to one of the RIF-supported symbol spaces must be well-formed, i.e., their lexical form must belong to the lexical space associated with the symbol space. For instance, "123"^^xsd:long has a correct lexical form, since 123 belongs to the lexical space of the data type xsd:long. In contrast, "abc"^^xsd:long is ill-formed, as it does not have a correct lexical form. A compliant RIF interpreter must reject such symbols.

Symbols with undefined symbol spaces. RIF allows symbols of the form "..."^^label where label is not one of the previously defined standard symbol spaces (such as xsd:long, rif:local, etc.). These are treated as uninterpreted constant symbols in the RIF language. For instance, "abc"^^cde is such an uninterpreted symbol. Dialects that extend the basic RIF logic dialect might appropriate some of the symbol spaces, which are undefined in the basic RIF dialect, and give them special semantics.

2.1.3. XML serialization

EDITOR'S NOTE: The XML syntax for BLD presented here is one of the proposals the Working Group is considering. It is presented here to get feedback on this strawman and to give readers an idea for the kind of information that will be presented in this section.

The XML serialization for RIF abstract syntax presented in Section Abstract Syntax is fully striped. Positional information is optionally exploited only for the arg role elements. For example, role elements (declare and formula) are explicit within the Exists element. Following the examples of Java and RDF, we use capitalized names for class elements and names that start with lowercase for role elements.

The all-uppercase classes in the abstract syntax, such as CONDITION, become XML entities. They act like macros and are not visible in instance markup. The other classes as well as non-terminals and symbols (such as Exists or =) become XML elements with optional attributes, as shown below (see also Appendix Specification).

- And     (conjunction)
- Or      (disjunction)
- Exists  (quantified formula for 'Exists', containing declare and formula roles)
- declare (declare role, containing a Var)
- formula (formula role, containing a CONDITION formula)
- Uniterm (atomic or function-expression formula)
- op      (operation role)
- arg     (argument role, positional/non-positional without/with optional 'index' attribute)
- Equal   (prefix version of term equation '=')
- side    (left-hand and right-hand side role)
- Const   (predicate symbol, function symbol, or individual, with optional 'type' attribute)
- Var     (logic variable)

The XML syntax for symbol spaces utilizes the type attribute associated with XML term elements such as Const. For instance, a literal in the xsd:dateTime data type can be represented as <Const type="xsd:dateTime">2007-11-23T03:55:44-02:30</Const>.

The following example illustrates XML serialization of RIF conditions.

Example 2 (A RIF condition and its XML serialization):

a. RIF condition:

  And ( Exists ?Buyer (purchase(?Buyer
                                ?Seller
                                book(?Author "http://example.com/books#LeRif"^^rif:iri)
                                "http://example.com/currencies#USD"^^rif:iri(49)))
        ?Seller=?Author )

b. XML serialization:

  <And>
    <formula>
      <Exists>
        <declare><Var>Buyer</Var></declare>
        <formula>
          <Uniterm>
            <op><Const type="rif:local">purchase</Const></op>
            <arg><Var>Buyer</Var></arg>
            <arg><Var>Seller</Var></arg>
            <arg>
              <Uniterm>
                <op><Const type="rif:local">book</Const></op>
                <arg><Var>Author</Var></arg>
                <arg><Const type="rif:iri">http://example.com/books#LeRif</Const></arg>
              </Uniterm>
            </arg>
            <arg>
              <Uniterm>
                <op><Const type="rif:iri">http://example.com/currencies#USD</Const></op>
                <arg><Const type="xsd:long">49</Const></arg>
              </Uniterm>
            </arg>
          </Uniterm>
        </formula>
      </Exists>
    </formula>
    <formula>
      <Equal>
        <side><Var>Seller</Var></side>
        <side><Var>Author</Var></side>
      </Equal>
    </formula>
  </And>

2.1.3.1. Specifying Signatures in RIF Dialects

DaveR: Not clear what's the value of giving an informal example instance XML document for signature specifications here. Either formally define the XML syntax for signatures or drop the example.

We will now explain how signatures are defined in the Basic Condition Language and its extensions. In RIF's basic Logic dialect, there is no need to declare signatures, since they can be inferred. Indeed, the basic logic dialect requires that each symbol is associated with a unique signature. Therefore, the signature can be determined by the context in which the symbol is used. If a symbol is used in more than one context, the parser should deem it as a syntax error.

In dialects that extend RIF's Basic Condition Language, signature inference of the above kind is not possible, in general, and advanced signature inference might not always be appropriate even when it is possible. For this reason, we introduce the attribute sig for the XML elements Const and Var. The value of such an attribute is supposed to be the signature name for the corresponding symbol. For instance, in the above example, purchase was a 4-ary predicate, so it could be specified as <Const sig="p4">purchase</Const>. Since variables in the basic condition language always have the signature i, we could write <Var sig="i">Author</Var>.

The sig attribute is optional, and its value can be inferred for the basic language. For instance, in our example, the values p4 and i for the attribute sig could be inferred from the usage of the symbols purchase and Author. Dialects that extend the basic logic dialect can adopt their own rules for omitting this attribute and for inferring its values. The basic language does not need any special syntax for defining signatures -- all signature names are already defined: bool, i, f0, f1, p0, p1, etc. The dialects, however, will need a special sublanguage for defining signatures. The signature

  mysig{
     (s1 s2 s3) -> bool
     (s i) -> i
       } 

can, for example, be defined using the following XML excerpt:

  <Signature signame="mysig">
    <element>
      <Sigexpr sighead="bool">
        s1 s2 s3
      </Sigexpr>
    </element>
    <element>
      <Sigexpr sighead="i">
        s i
      </Sigexpr>
    </element>
  </Signature>

2.1.4. Model Theory for RIF's Basic Condition Language

The first step in defining a model-theoretic semantics for a logic-based language is to define the notion of a semantic structure, also known as an interpretation. (See end note on the rationale.) In this section we define basic semantic structures. This definition will be extended in Extension of Semantic Structures for Frames when we introduce frame syntax.

Basic semantic structures. A basic semantic structure, I, is a tuple of the form <D,IC, IV, IF, IR>, which determines the truth value of every formula, as explained below. Currently, by formula we mean anything produced by the CONDITION production in the Presentation EBNF Syntax. Later on, formulas will also include rules. Other dialects might extend this notion even further.

The set of truth values is denoted by TV. For RIF's basic logic dialect, TV includes only two values, t (true) and f (false). (See end note on truth values.)

The set TV has a total or partial order, called the truth order; it is denoted with <t. In the basic RIF logic, f <t t, and it is a total order. (See end note on ordering truth values.)

To define how semantic structures determine the truth values of RIF formulas, we introduce the following sets:

  • D - a non-empty set of elements called the domain of I,

  • Const - the set of individuals, predicate names, and function symbols,

  • Var - the set of variables.

The mappings that constitute an interpretation, I, are as follows:

  • IC from Const to elements of D

  • IV from Var to elements of D

  • IF from Const to functions from D* into D (here D* is a set of all tuples of any length over the domain D)

  • IR from Const to truth-valued mappings D*TV

It is convenient to define a more general mapping, I, based on the mappings IC, IV, IF, and IR:

  • I(k) = IC(k) if k is a symbol in Const

  • I(?v) = IV(?v) if ?v is a variable in Var

  • I(f(t1 ... tn)) = IF(f)(I(t1),...,I(tn))

Truth valuation for formulas. Observe that the notion of signatures from Section Formal Preliminaries is used only to constrain the syntax and does not appear in the definition of the semantic structure. This is because when we define truth valuations for formulas, below, all formulas are assumed to be well-formed.

Truth valuation for well-formed formulas in RIF's basic condition language is determined using the following function, denoted ITruth:

  • Atomic formulas: ITruth(r(t1 ... tn)) = IR(r)(I(t1),...,I(tn))

  • Equality: ITruth(t1=t2) = t iff I(t1) = I(t2); ITruth(t1=t2) = f otherwise.

  • Conjunction: ITruth(And(c1 ... cn)) = mint(ITruth(c1),...,ITruth(cn)), where mint is minimum with respect to the truth order.

  • Disjunction: ITruth(Or(c1 ... cn)) = maxt(ITruth(c1),...,ITruth(cn)), where maxt is maximum with respect to the truth order.

  • Quantification: ITruth(Exists ?v1 ... ?vn (c)) = maxt(I*Truth(c)), where maxt is taken over all interpretations I* of the form <D,IC, I*V, IF, IR>, where the mapping I*V has the same value as IV on all variables except, possibly, on the variables ?v1,...,?vn.

Notice that the mapping ITruth is uniquely determined by the four mapping comprising I and, therefore, it does not need to be listed explicitly.

Interpretation of symbol spaces. We now explain how symbol spaces are integrated into the semantics of the basic RIF logic. As mentioned earlier, the semantics of a symbol space defines an associated value space and a mapping from the lexical space to the value space. The value space may be constrained in various ways or it can be completely fixed. For instance, the XML Schema Part 2: Datatypes specification defines a concrete value space for each XML data type, including the data types such as xsd:decimal, which are of interest to RIF. The value space is different from the lexical space. Lexical space refers to the syntax of the constant symbols that belong to a particular primitive data type. For instance, "1.2"^^xsd:decimal and "1.20"^^xsd:decimal are two legal constants in RIF because 1.2 and 1.20 belong to the lexical space of xsd:decimal. However, these two constants are interpreted by the same element of the value space of the xsd:decimal type.

Other symbol spaces may have their value spaces constrained in some ways, but not fixed. For instance, the value space of rif:iri is defined below as the entire domain of interpretation. This means that this value space cannot be an arbitrary subset of that domain. Some symbol spaces may have their value spaces restricted in other ways. For instance, a value space may be required to be disjoint from the value space of the XML data types.

Formally, each data type comes with a value space, denoted by Dtype (for instance, Dxsd:decimal), and a mapping, IC: ConsttypeDtype. We require that DtypeD for each XML data type, where Consttype denotes the set of constants in the symbol space type.

The value spaces and the corresponding mappings for the XML data types are defined by the XML Schema Part 2: Datatypes specification. In some cases, the XML Schema specification defines the subdomains Ds and Dt for different XML primitive types s and t as being disjoint or as subsets of each other. For instance, the value space of xsd:long is defined to be a subset of xsd:decimal. The requirements imposed by the XML Schema specification may have interesting consequences for the mappings from the lexical space to the value space. For instance, for xsd:decimal the mapping IC: Constxsd:decimalDxsd:decimal is such that, for instance, IC("1.2"^^xsd:decimal) = IC("1.20"^^xsd:decimal) in every semantic structure. So, it follows that "1.2"^^xsd:decimal = "1.20"^^xsd:decimal is a RIF tautology. The semantics of XML also implies some inequalities. For instance, since distinct strings are different in the value space for the xsd:string data type, something like "abc"^^xsd:string"abcd"^^xsd:string is a tautology, since it is true in all RIF semantic structures. On the other hand, "abc"^^rif:iri"abcd"^^rif:iri is not a tautology in RIF, since it is possible in some semantic structures (but not all such structures) that, for example, IC("abc"^^rif:iri) = IC("abcd"^^rif:iri).

The value space for the RDF data type rdf:XMLLiteral is defined in Resource Description Framework (RDF): Concepts and Abstract Syntax.

For the symbol space rif:iri, the corresponding mapping is IC: Constrif:iriD, i.e., Drif:iri = D. That is, the interpretation allows to map any IRI into any value in the domain of the semantic structure.

The value space of the local RIF symbols is also the entire domain D and the corresponding mapping looks as follows: IC: Constrif:localD. Thus, local constants can denote any element of the domain of discourse, including integers, strings, etc.

2.2. Slotted Conditions

(Editor's Note: This text is maintained on wiki page Slotted Conditions).

In this section we extend Positive Conditions, defined in Section Positive Conditions, with slotted Uniterms and Frame formulas. A slot can be represented by an individual symbol or, more generally, by a Uniterm. It can be represented by an IRI or be known only locally. Semantically, a frame slot is a set-valued function that represents a property of an object. Such a function maps an object id to a set of values of the property. This can be a singleton set or even an empty set. In contrast, a uniterm slot semantically behaves like a unary uninterpreted function symbol. In both cases, however, the order of the slots is immaterial (in contrast to positional uniterms).

Syntactically, the extension is achieved by enriching the notion of a uniterm with slots and by complementing these with frame formulas. For uniformity and greater syntactic convenience, frame formulas can be nested inside other frame formulas. This is syntactic sugar, however, as explained later in this section.

2.2.1. Syntax

2.2.1.1. Formal Preliminaries

The most important additions to the syntax for positive conditions in Section Formal Preliminaries are the notions of slotted terms (including slotted predicates) and frames.

Slotted terms. A term is either a term in the sense of Section Formal Preliminaries or a slotted term. Slotted terms are defined as follows:

  • A slotted term is of the form t(p1v1 ... pnvn), where t is a term; p1, ..., pn are terms, which represent the names of the slots; and v1 , ..., vn are terms.

    • Slotted terms are like regular terms except that the arguments of such a term are named. Since the arguments are named, their order is considered to be immaterial.

In order to talk about the slotted terms that are also well-formed, we need to extend the notion of a signature to include slots. As before, a signature is a statement of the form η{e1 ... en ...} where η ∈ SigNames is the name of the signature and {e1 ... en ...} is a set of signature expressions, which can be empty, finite, or countably infinite. But now a signature expression can be not only a base expression or an arrow expression; it can, in addition, be a slotted arrow expression. A slotted arrow expression is a statement of the form

  • 1→κ1 ... γn→κn) ⇒ κ, where n≥0. The order of arguments in a slotted signature is assumed to be immaterial, so any permutation of arguments is assumed to yield the same signature expression. For instance, (ii ii) ⇒ i is a slotted arrow signature expression. By analogy with earlier definitions, if κ is bool then the expression is also a slotted Boolean signature expression. For instance, (ii ii) ⇒ bool is a slotted Boolean signature expression.

A term t(p1v1 ... pnvn) is a well-formed slotted term with signature σ if

  • Each pi is a well-formed term with some signature γi.

  • Each vi is a well-formed term (slotted or non-slotted) with some signature σi, i=1, ..., n.

  • The signature of t includes the arrow expression (γ1#→σ1# ... γn#→σn#) ⇒ σ#.

    Recall that if σ is a signature then σ# denotes its name.

  • The aforesaid term t is a well-formed slotted formula if σ is bool{ }.

    A slotted formula is just like an atomic formula of Section Formal Preliminaries except that the arguments of the predicate are named and their order is considered immaterial. This is analogous to names of columns in tables in relational databases.

Frames. A well-formed frame formula is one of the following:

  • Membership formula: o#c, where o, c are well-formed terms.

    Informally, such a formula says that object o is a member of class c. Classes are also objects. For instance, they can be members of some other classes. (In object-oriented languages these latter classes are known as meta-classes.)

  • Subclass formula: s##c, where s, c are well-formed terms.

    Informally, this formula states that class s is a member of class c.

  • Frame: t[p1 -> v1 ... pn -> vn], where t is a well-formed term, a membership formula, or a subclass formula; p1, ..., pn are well-formed terms; and each v1 , ..., vn is either a well-formed term or a well-formed frame formula.

    When t, all the pi, and all the vi are terms, such a formula should be understood as a statement that object t has properties pi, ..., pn, and for each i = 1,...,n, the value of property pi is a set that contains the object vi. When t, pi, or vi are not just terms but frame formulas themselves, the above is treated as a conjunction of simpler frame formulas, as defined by the unnest transformation in the section on semantics (below).

Atomic formulas and general formulas. The syntax for atomic formulas is extended with slotted formulas and frame formulas. More precisely, an atomic well-formed formula is

  • An atomic well-formed formula as defined in Section Formal Preliminaries; or

  • A well-formed slotted formula; or
  • A well-formed frame formula.

The notion of well-formed general formulas needs no adjustments with respect to its earlier definition (without frames). As before, such formulas are constructed from well-formed atomic formulas using logical connectives ∧, ∨ etc.

Extended signatures for RIF's Basic Logic Dialect. Section Formal Preliminaries defined the allowed signatures for the basic logic dialect of RIF. With the introduction of slotted signatures, we augment the set of allowed signatures in the basic dialect as follows:

  • For every n≥0 there are signatures fsn{(ii ... ii) ⇒ i} and psn{(ii ... ii) ⇒ bool} (in both cases the number of arguments inside the parentheses is n).

In addition, RIF's Basic Logic Dialect imposes the following restrictions:

  • The ter