Copyright © 2008 W3C^{®} (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document, developed by
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is being published as one of a set of 2 documents:
The Rule Interchange Format (RIF) Working Group seeks public feedback on these Working Drafts. Please send your comments to public-rif-comments@w3.org (public archive). If possible, please offer specific changes to the text that would address your concern. You may also wish to check the Wiki Version of this document for internal-review comments and changes being drafted which may address your concerns.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
The RIF Framework for Logic-based Dialects (RIF-FLD) is a formalism for specifying all logic-based dialects of RIF, including RIF-BLD. It is a logic in which both syntax and semantics are described through a number of mechanisms that are commonly used for various logic languages, but are rarely brought all together. RIF-FLD gives precise definitions to these mechanisms, but leaves certain details open. Every logic-based RIF dialect is required to specialize these general mechanisms (possibly leaving out some elements of RIF-FLD) to produce its concrete syntax and model-theoretic semantics.
The framework described in this document is very general and captures most of the popular logic-based rule languages found in Databases, Logic Programming, and on the Semantic Web. However, it is anticipated that the needs of future dialects might stimulate further evolution of RIF-FLD. In particular, future extensions might include a logic rendering of actions, as found in production and reactive rule languages.
This document is mostly intended for the designers of future RIF dialects. All logic-based RIF dialects are required to be derived from RIF-FLD by specialization, as explained in Sections Syntax of a RIF Dialect as a Specialization of RIF-FLD and Semantics of a RIF Dialect as a Specialization of RIF-FLD. In addition to specialization, to lower the barrier of entry for their intended audiences, some dialects may choose to specify their syntax and semantics in a direct, but equivalent, way, which does not require familiarity with RIF-FLD. For instance, the RIF Basic Logic Dialect is specified both by specialization from RIF-FLD and also directly, without references to the framework. Thus, the reader who is only interested in RIF-BLD can proceed directly to that document.
RIF-FLD has the following main components:
Syntactic framework. The syntactic framework defines six types of RIF terms:
RIF dialects can choose to support all or some of the aforesaid categories of terms. The syntactic framework also defines the following mechanisms for specializing these terms:
Symbol spaces are used to separate the set of all non-logical symbols (symbols used as variables, individual constants, predicates, and functions) into distinct subsets. These subsets can then be given different semantics. A symbol space has one or more identifiers and a lexical space, which defines the "shape" of the symbols in that symbol space. For instance, some symbol spaces can be used to identify any object, and syntactically they look like IRIs (e.g., rif:iri in RIF Basic Logic Dialect). Other symbol spaces may be used to describe the data types used in RIF (for example, xsd:integer).
Signatures determine which terms and formulas are well-formed. The notion of a signature is a generalization of the notion of a sort in classical first-order logic [Enderton01]. Each nonlogical symbol (and some logical symbols, like =) has an associated signature. A signature defines, in a precise way, the syntactic contexts in which the symbol is allowed to occur.
For instance, the signature associated with a symbol, p, might allow p to appear in a term of the form f(p), but disallow it to occur in a term like p(a,b). The signature for f, on the other hand, might allow that symbol to appear in f(p) and f(p,q), but disallow f(p,q,r) and f(f). In this way, it is possible to control which symbols are used for predicates and which for functions, where variables can occur, and so on.
Semantic framework. This framework defines the notion of a semantic structure or interpretation (both terms are used in the literature [Enderton01, Mendelson97], but here we will mostly use the first). Semantic structures are used to interpret RIF formulas and to define logical entailment. As with the syntax, this framework includes a number of mechanisms that RIF logic-based dialects can specialize to suit their needs. These mechanisms include:
Roughly speaking, a set of formulas, G, logically entails another formula, g, if for every semantic structure I in some set S, if I makes G true, then I also makes g true. Almost all known logics define entailment this way. The difference lies in which set S they use. For instance, logics that are based on the classical first-order predicate calculus, such as Description Logic, assume that S is the set of all semantic structures. In contrast, logic programming languages, which use default negation, assume that S contains only the so-called "minimal" Herbrand models of G and, furthermore, only the minimal models of a special kind. See [Shoham87] for a more detailed exposition of this subject.
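The dependence of entailment on the set S can be illustrated with a small propositional sketch. Everything in it is an assumption made for illustration: the two-atom vocabulary, the representation of a semantic structure as the set of atoms it makes true, and the choice of subset-minimal models as the "intended" ones. It is not part of RIF-FLD.

```python
from itertools import combinations

ATOMS = ("p", "q")  # hypothetical propositional vocabulary


def all_structures(atoms):
    """Every interpretation, represented as the frozenset of atoms it makes true."""
    return [frozenset(c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]


def models(formulas, structures):
    """The structures that make every formula in `formulas` true."""
    return [i for i in structures if all(f(i) for f in formulas)]


def minimal(structures):
    """Keep only the structures that have no proper subset in the list."""
    return [i for i in structures if not any(j < i for j in structures)]


def entails(G, g, select):
    """G entails g iff g is true in every model of G that `select` keeps in S."""
    S = select(models(G, all_structures(ATOMS)))
    return all(g(i) for i in S)


G = [lambda i: "p" in i]          # the single formula "p"
not_q = lambda i: "q" not in i    # the query "not q"

classical = entails(G, not_q, select=lambda ms: ms)   # S = all models of G
minimal_model = entails(G, not_q, select=minimal)     # S = minimal models of G
```

Under these assumptions, `classical` is false because the structure {p, q} is a model of G in which q holds, whereas `minimal_model` is true because the only minimal model is {p}.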
XML serialization framework. This framework defines the general principles for serializing the various parts of the presentation syntax of RIF-FLD.
The next subsection explains the overall idea of deriving the syntax of a RIF dialect from the RIF framework. The actual syntax of the RIF framework is given in subsequent subsections.
The syntax for a RIF dialect can be obtained from the general syntactic framework of RIF by specializing the following parameters (which are defined in this document):
Signatures determine which terms in the dialect are well-formed and which are not.
The exact way signatures are assigned depends on the dialect. An assignment can be explicit or implicit (for instance, derived from the context in which each symbol is used).
The RIF logic framework introduces the following types of terms:
A dialect might support all of these terms or just a subset.
Symbol spaces determine the syntax of the symbols that are allowed in the dialect.
RIF-FLD allows formulas of the following kind:
A dialect might support all of these formulas or it might impose various restrictions. For instance, the formulas allowed in the conclusion and/or premises of implications might be restricted, certain types of quantification might be prohibited, classical or default negation (or both) might not be allowed, etc.
Definition (Alphabet). The alphabet of RIF-FLD consists of
The set of connective symbols, quantifiers, =, etc., is disjoint from Const and Var. Variables are written as Unicode strings preceded with the symbol "?". The argument names in ArgNames are written as Unicode strings that do not start with a "?". The syntax for constant symbols is given in Section Symbol Spaces.
The symbols =, #, and ## are used in formulas that define equality, class membership, and subclass relationships. The symbol -> is used in terms that have named arguments and in frame terms. The symbol External indicates that an atomic formula or a function term is defined externally (e.g., a builtin).
The symbol Group is used to organize RIF-FLD rules into collections and annotate them with metadata. ☐
The language of RIF-FLD is the set of formulas constructed using the above alphabet according to the rules spelled out below.
Throughout this document, the xsd: prefix stands for the XML Schema namespace URI http://www.w3.org/2001/XMLSchema#, the rdf: prefix stands for http://www.w3.org/1999/02/22-rdf-syntax-ns#, and rif: stands for the URI of the RIF namespace, http://www.w3.org/2007/rif#. Syntax such as xsd:string should be understood as a compact URI [CURIE] -- a macro that expands to a concatenation of the character sequence denoted by the prefix xsd and the string string. The compact URI notation is not part of the RIF syntax, but rather just a space-saving device in this document.
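The macro expansion described above amounts to a table lookup followed by string concatenation. The following minimal sketch uses the three prefixes listed in this section; the function name `expand_curie` is hypothetical and the code is only an illustration of the notation, not part of RIF.

```python
# Prefix table taken from the paragraph above.
PREFIXES = {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rif": "http://www.w3.org/2007/rif#",
}


def expand_curie(curie: str) -> str:
    """Expand a compact URI such as xsd:string into the full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    # Concatenate the URI denoted by the prefix with the local part.
    return PREFIXES[prefix] + local
```

For example, `expand_curie("xsd:string")` yields http://www.w3.org/2001/XMLSchema#string.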
The set of all constant symbols in a RIF dialect is partitioned into a number of subsets, called symbol spaces, which are used to represent XML Schema data types, data types defined in other W3C specifications, such as rdf:XMLLiteral, and to distinguish other sets of constants. All constant symbols have a syntax (and sometimes also semantics) imposed by the symbol space to which they belong.
Definition (Symbol space). A symbol space is a named subset of the set of all constants, Const. The semantic aspects of symbol spaces will be described in Section Semantic Framework. Each symbol in Const belongs to exactly one symbol space.
Each symbol space has an associated lexical space, a unique identifier, and, possibly, one or more aliases. More precisely,
To simplify the language, we will often use symbol space identifiers to refer to the actual symbol spaces (for instance, we may use "symbol space xsd:string" instead of "symbol space identified by xsd:string").
To refer to a constant in a particular RIF symbol space, we use the following presentation syntax:
"literal"^^symspace
where literal is called the lexical part of the symbol, and symspace is an identifier or an alias of the symbol space. Here literal is a sequence of Unicode characters that must be an element in the lexical space of the symbol space symspace. For instance, "1.2"^^xsd:decimal and "1"^^xsd:decimal are legal symbols because 1.2 and 1 are members of the lexical space of the XML Schema data type xsd:decimal. On the other hand, "a+2"^^xsd:decimal is not a legal symbol, since a+2 is not part of the lexical space of xsd:decimal.
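The check that a constant is legal can be sketched as follows. The regular expressions are simplified stand-ins for the lexical spaces defined in [XML-SCHEMA2], only two symbol spaces are covered, and the names `LEXICAL_SPACES` and `is_legal_constant` are hypothetical.

```python
import re

# Simplified lexical spaces (assumption: approximations of XML Schema Part 2).
LEXICAL_SPACES = {
    "xsd:decimal": re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)"),
    "xsd:string": re.compile(r".*", re.DOTALL),
}

# Presentation syntax "literal"^^symspace.
CONST = re.compile(r'"(?P<literal>.*)"\^\^(?P<symspace>\S+)', re.DOTALL)


def is_legal_constant(text: str) -> bool:
    """True iff text has the form "literal"^^symspace and the literal
    belongs to the (simplified) lexical space of symspace."""
    m = CONST.fullmatch(text)
    if m is None:
        return False
    lexical_space = LEXICAL_SPACES.get(m["symspace"])
    if lexical_space is None:
        return False  # symbol space not known to this sketch
    return lexical_space.fullmatch(m["literal"]) is not None
```

With these definitions, `"1.2"^^xsd:decimal` and `"1"^^xsd:decimal` are accepted, while `"a+2"^^xsd:decimal` is rejected.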
The set of all symbol spaces that partition Const is considered to be part of the logic language of RIF-FLD.
The following list of supported symbol spaces will move to another document, Data Types and Built-Ins. Any existing discrepancies will be fixed at that time.
RIF supports the following symbol spaces. Rule sets that are
exchanged through RIF can use additional symbol spaces as explained
below.
and all the symbol spaces that correspond to the subtypes of xsd:string as specified in [XML-SCHEMA2].
and all the symbol spaces that correspond to the subtypes of xsd:decimal as specified in [XML-SCHEMA2].
The lexical spaces of the above symbol spaces are defined in the document [XML-SCHEMA2].
This symbol space represents XML content. The lexical space of rdf:XMLLiteral is defined in the document [RDF-CONCEPTS].
This symbol space represents text strings with a language tag attached. The lexical space of rif:text is the set of all Unicode strings of the form ...@LANG, i.e., strings that end with @LANG where LANG is a language identifier as defined in [RFC-3066].
Constant symbols that belong to this symbol space are intended to be used in a way similar to RDF resources [RDF-SCHEMA]. The lexical space consists of all absolute IRIs as specified in [RFC-3987]; it is unrelated to the XML primitive type anyURI. A rif:iri constant must be interpreted as a reference to one and the same object regardless of the context in which that constant occurs.
Symbols in this symbol space are local to the RIF documents in which they occur. This means that occurrences of the same rif:local constant in different documents are viewed as unrelated distinct constants, but occurrences of the same rif:local constant in the same document must refer to the same object. The lexical space of rif:local is the same as the lexical space of xsd:string.
The most basic construct of a logic language is a term. RIF-FLD supports several kinds of terms: constants, variables, the regular positional terms, plus terms with named arguments, equality, classification terms, and frames. The word "term" will be used to refer to any kind of term.
Definition (Term). A term is a statement of one of the following forms:
Positional terms in RIF-FLD generalize the regular notion of a term used in first-order logic. For instance, the above definition allows variables everywhere.
The term t here represents a predicate or a function; s_{1}, ..., s_{n} represent argument names; and v_{1}, ..., v_{n} represent argument values. Terms with named arguments are like regular positional terms except that the arguments are named and their order is immaterial. Note that a term with no arguments, like f(), is considered both a positional term and a term with named arguments.
Classification terms are used to describe class hierarchies.
Frame terms are used to describe properties of objects. As in the case of the terms with named arguments, the order of the properties p_{i}->v_{i} in a frame is immaterial.
Such terms are used for representing builtin functions and predicates as well as "procedurally attached" terms or predicates, which might exist in various rule-based systems, but are not specified by RIF. ☐
The above definition is very general. It makes no distinction between constant symbols that represent individuals, predicates, and function symbols. The same symbol can occur in multiple contexts at the same time. For instance, if p, a, and c are symbols then p(p(a) p(a p c)) is a term. Even variables and general terms are allowed to occur in the position of predicates and function symbols, so p(a)(?v(a c) p) is also a term.
Frame, classification, and other terms can be freely nested, as exemplified by p(?X q#r[p(1,2)->s](d->e f->g)). Some language environments, like FLORA-2 [FL2], OO jDREW [OOjD], and CycL [CycL], support fairly large (partially overlapping) subsets of RIF-FLD terms, but most languages support much smaller subsets. RIF dialects are expected to carve out the appropriate subsets of RIF-FLD terms, and the general form of the RIF logic framework allows a considerable degree of freedom.
Dialects can also restrict the contexts in which the various terms can occur. The mechanism that controls the context is called a signature and works as follows. The RIF-FLD language associates a signature with each symbol (both constant and variable symbols) and uses signatures to define well-formed terms. Each RIF dialect is expected to select appropriate signatures for the symbols in its alphabet, and only the terms that are well-formed according to the selected signatures are allowed in that particular dialect.
Part of the material in this section will be duplicated in the document Data Types and Built-Ins. This duplication enables direct specifications of RIF dialects that bypass references to RIF-FLD.
This section introduces the notion of external schemas, which serve as templates for externally defined terms. These schemas determine which externally defined functions or predicates are acceptable as terms in a RIF dialect. Externally defined terms include RIF builtins, which are specified in the document Data Types and Built-Ins. The notion of an externally defined term in RIF is very general. It is not necessarily a function or a predicate -- it can be any term, including frames, classification terms, and so on.
Definition (Schema for external term). An external schema is a statement of the form (?X_{1} ... ?X_{n}; τ) where
The names of the variables in an external schema are immaterial, but their order is. For instance, (?X ?Y; ?X[foo->?Y]) and (?V ?W; ?V[foo->?W]) are considered to be the same schema, but (?X ?Y; ?X[foo->?Y]) and (?Y ?X; ?X[foo->?Y]) are viewed as different schemas.
A term t is an instance of an external schema (?X_{1} ... ?X_{n}; τ) iff t can be obtained from τ by a simultaneous substitution ?X_{1}/s_{1} ... ?X_{n}/s_{n} of the variables ?X_{1} ... ?X_{n} with terms s_{1} ... s_{n}, respectively. Some of the terms s_{i} can be variables themselves. For example, ?Z[foo->f(a ?P)] is an instance of (?X ?Y; ?X[foo->?Y]) by the substitution ?X/?Z ?Y/f(a ?P). ☐
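The instance test can be sketched as one-way matching of τ against a candidate term. The encoding below is an assumption made for illustration: symbols and variables are strings (variables start with "?"), compound terms are tuples, and the frame ?X[foo->?Y] is written as the tuple ("frame", "?X", "foo", "?Y"). The function names are hypothetical.

```python
def match(pattern, term, schema_vars, binding):
    """Match `pattern` (a part of tau) against `term`, extending `binding`.
    Only the schema's own variables may be substituted."""
    if isinstance(pattern, str):
        if pattern in schema_vars:
            if pattern in binding:
                # A repeated schema variable must be replaced consistently.
                return binding[pattern] == term
            binding[pattern] = term
            return True
        return pattern == term
    # Compound pattern: the term must be compound with the same shape.
    if not isinstance(term, tuple) or len(term) != len(pattern):
        return False
    return all(match(p, t, schema_vars, binding)
               for p, t in zip(pattern, term))


def instance_of(term, schema):
    """Return the substitution that makes `term` an instance of
    `schema` = (variables, tau), or None if there is none."""
    schema_vars, tau = schema
    binding = {}
    return binding if match(tau, term, set(schema_vars), binding) else None


schema = (("?X", "?Y"), ("frame", "?X", "foo", "?Y"))
candidate = ("frame", "?Z", "foo", ("f", "a", "?P"))  # ?Z[foo->f(a ?P)]
substitution = instance_of(candidate, schema)
```

Here `substitution` maps ?X to ?Z and ?Y to f(a ?P), which mirrors the example above. A bare variable such as ?Z yields None, because τ in this schema is a compound term.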
Observe that a variable cannot be an instance of an external schema, since τ in the above definition cannot be a variable. It will be seen later that this implies that a term of the form External(?X) is not well-formed in RIF.
Definition (Coherent set of external schemas). A set of
external schemas is coherent if there can be no term,
t, that is an instance of two distinct schemas.
Note that the coherence condition is easy to verify syntactically and that it implies that schemas like (?X ?Y; ?X[foo->?Y]) and (?Y ?X; ?X[foo->?Y]), which differ only in the order of their variables, cannot be in the same coherent set. ☐
It is important to understand that external schemas are not part of the logic language in RIF, since they do not appear anywhere in the RIF formulas. Instead, like signatures, which are defined below, they are best thought of as part of the grammar of the language. In particular, they will be used to determine which external terms, i.e., the terms of the form External(t), are well-formed.
In this section we introduce the concept of a signature, which is a key mechanism that allows RIF-FLD to control the context in which the various symbols are allowed to occur. For instance, a symbol f with signature {(term term) ⇒ term, (term) ⇒ term} can occur in terms like f(a b), f(f(a b) a), f(f(a)), etc., if a and b have signature term. But f is not allowed to appear in the context f(a b a) because there is no ⇒-expression in the signature of f to support such a context.
The above example provides intuition behind the use of signatures in RIF-FLD. Much of the development, below, is inspired by [CK95]. It should be kept in mind that signatures are not part of the logic language in RIF, since they do not appear anywhere in the RIF formulas. Instead they are part of the grammar: they are used to determine which sequences of tokens are in the language and which are not. The actual way by which signatures are assigned to the symbols of the language may vary from dialect to dialect. In some dialects (for example RIF-BLD), this assignment is derived from the context in which each symbol occurs and no separate language for signatures is used. Other dialects may choose to assign signatures explicitly. In that case, they would require a concrete language for signatures (which would be separate from the language for specifying the logic formulas of the dialect).
Definition (Signature name). Let SigNames be a non-empty, partially-ordered finite or countably infinite set of symbols, called signature names. Since signatures are not part of the logic language, their names do not have to be disjoint from Const, Var, and ArgNames. We require that this set includes at least the following signature names:
Dialects are expected to introduce additional signature names. For instance, RIF-BLD introduces one other signature name, term. The partial order on SigNames is dialect-specific; it is used in the definition of well-formed terms below.
We use the symbol < to represent the partial order on SigNames. Informally, α < β means that terms with signature α can be used wherever terms with signature β are allowed. We will write α ≤ β if either α = β or α < β.
Definition (Signature). A signature is a statement of the form η{e_{1}, ..., e_{n}, ...} where η ∈ SigNames is the name of the signature and {e_{1}, ..., e_{n}, ...} is a countable set of arrow expressions. Such a set can thus be infinite, finite, or even empty. In RIF-BLD, signatures can have at most one arrow expression. Other dialects (such as HiLog [CKW93], for example) may require polymorphic symbols and thus allow signatures with more than one arrow expression in them.
An arrow expression is defined as follows:
For instance, () ⇒ term and (term) ⇒ term are arrow expressions, if term is a signature name.
For instance, (arg1->term arg2->term) ⇒ term is an arrow expression with named arguments. The order of the arguments in arrow expressions with named arguments is immaterial, so any permutation of arguments yields the same expression.
A set S of signatures is coherent iff
All arrow expressions e_{i} here have the form (κ κ) ⇒ γ (the arguments in an equation must be compatible) and at least one of these expressions must have the form (κ κ) ⇒ atomic (i.e., some equations should be allowed as atomic formulas). Dialects may further specialize this signature.
Here all arrow expressions e_{i} are binary (have two arguments) and at least one has the form (κ γ) ⇒ atomic. Dialects may further specialize this signature.
Here all arrow expressions e_{i} have the form (κ κ) ⇒ γ (the arguments must be compatible) and at least one of these arrow expressions has the form (κ κ) ⇒ atomic. Dialects may further specialize this signature.
Here ηA denotes a signature with the name η and the associated set of arrow expressions A; similarly κB is a signature named κ with the set of expressions B. The requirement that B⊆A ensures that symbols that have signature η can be used wherever the symbols with signature κ are allowed. ☐
The language of a RIF dialect is a set of all well-formed formulas, as defined below. The language is determined by the following parameters:
Each variable symbol is associated with exactly one signature from a coherent set of signatures. A constant symbol can have one or more signatures, and different symbols can be associated with the same signature. (If variables were allowed to have multiple signatures then well-formed terms would not be closed under substitutions. For instance, a term like f(?X,?X) could be well-formed, but f(a,a) could be ill-formed.)
We have already seen how the alphabet and the symbol spaces are used to define RIF terms. The next section shows how signatures and external schemas are used to further specialize this notion to define well-formed RIF-FLD terms.
Since signature names uniquely identify signatures in coherent signature sets, we will often refer to signatures simply by their names. For instance, if one of f's signatures is atomic{ }, we may simply say that symbol f has signature atomic.
Definition (Well-formed term).
As a special case, when n=0 we obtain that t( ) is a well-formed term with signature σ, if t's signature contains the arrow expression () ⇒ σ.
The same special case applies to terms with named arguments: when n=0, t( ) is a well-formed term with signature σ, if t's signature contains the arrow expression () ⇒ σ.
Note that, according to the definition of coherent sets of schemas, a term can be an instance of at most one external schema. ☐
Note that, like constant symbols, well-formed terms can have more than one signature. Also note that, according to the above definition, f() and f are distinct terms.
Definition (Well-formed formula). A well-formed term is also a well-formed atomic formula iff one of its signatures is atomic or is < atomic.
Note that equality, membership, subclass, and frame terms are atomic formulas, since atomic is one of their signatures.
More general formulas are constructed out of atomic formulas with the help of logical connectives. A formula is a statement that can have one of the following forms:
As a special case, And() is allowed and is treated as a tautology, i.e., a formula that is always true.
When n=0, we get Or() as a special case; it is treated as a contradiction, i.e., a formula that is always false.
Group formulas are intended to represent sets of formulas annotated with metadata. This metadata is specified using an optional frame term φ. Note that some of the ρ_{i}'s can be group formulas themselves, which means that groups can be nested. This allows one to attach metadata to various subsets of formulas, which may be inside larger sets of formulas, which in turn may be annotated. ☐
Example 1 (Signatures, well-formed terms and formulas).
We illustrate the above definitions with the following examples. In addition to atomic, let there be another signature, term{ }, which is intended here to represent the context of the arguments to positional terms or atomic formulas.
Consider the term p(p(a) p(a b c)). If p has the (polymorphic) signature mysig{(term)⇒term, (term term)⇒term, (term term term)⇒term} and a, b, c each has the signature term{ } then p(p(a) p(a b c)) is a well-formed term with signature term{ }. If instead p had the signature mysig2{(term term)⇒term, (term term term)⇒term} then p(p(a) p(a b c)) would not be a well-formed term since then p(a) would not be well-formed (in this case, p would have no arrow expression which allows p to take just one argument).
For a more complex example, let r have the signature mysig3{(term)⇒atomic, (atomic term)⇒term, (term term term)⇒term}. Then r(r(a) r(a b c)) is well-formed. The interesting twist here is that r(a) is an atomic formula that occurs as an argument to a function symbol. However, this is allowed by the arrow expression (atomic term)⇒ term, which is part of r's signature. If r's signature were mysig4{(term)⇒atomic, (atomic term)⇒atomic, (term term term)⇒term} instead, then r(r(a) r(a b c)) would be not only a well-formed term, but also a well-formed atomic formula.
An even more interesting example arises when the right-hand side of an arrow expression is something other than term or atomic. For instance, let John, Mary, NewYork, and Boston have signatures term{ }; flight and parent have signature h_{2}{(term term)⇒atomic}; and closure has signature hh_{1}{(h_{2})⇒p_{2}}, where p_{2} is the name of the signature p_{2}{(term term)⇒atomic}. Then flight(NewYork Boston), closure(flight)(NewYork Boston), parent(John Mary), and closure(parent)(John Mary) would be well-formed formulas. Such formulas are allowed in languages like HiLog [CKW93], which support predicate constructors like closure in the above example. ☐
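The checks in Example 1 can be reproduced with a small sketch that computes the signatures of a positional term. It rests on assumptions not fixed by this document: symbols are strings, a positional term t(t1 ... tn) is the tuple (t, t1, ..., tn), an arrow expression is a pair (argument signature names, result signature name), and the partial order on signature names is plain equality unless a different `leq` is supplied. The name mysig4 is the one used in the example; the function names are hypothetical.

```python
T = "term"
SIGNATURES = {
    "term": set(),
    "atomic": set(),
    "mysig": {((T,), T), ((T, T), T), ((T, T, T), T)},
    "mysig2": {((T, T), T), ((T, T, T), T)},
    "mysig3": {((T,), "atomic"), (("atomic", T), T), ((T, T, T), T)},
    "mysig4": {((T,), "atomic"), (("atomic", T), "atomic"), ((T, T, T), T)},
    "h2": {((T, T), "atomic")},
    "p2": {((T, T), "atomic")},
    "hh1": {(("h2",), "p2")},
}


def signatures_of(term, symbols, leq=lambda a, b: a == b):
    """Return the set of signature names under which `term` is well-formed.
    An empty set means the term is not well-formed."""
    if isinstance(term, str):
        return set(symbols.get(term, ()))
    head, *args = term
    arg_sigs = [signatures_of(a, symbols, leq) for a in args]
    result = set()
    for eta in signatures_of(head, symbols, leq):
        for params, res in SIGNATURES.get(eta, ()):
            # Each argument needs some signature that fits the expected one.
            if len(params) == len(args) and all(
                any(leq(s, p) for s in sigs)
                for p, sigs in zip(params, arg_sigs)
            ):
                result.add(res)
    return result


INDIVIDUALS = {name: {T} for name in
               ("a", "b", "c", "John", "Mary", "NewYork", "Boston")}

nested_p = ("p", ("p", "a"), ("p", "a", "b", "c"))
with_mysig = signatures_of(nested_p, {**INDIVIDUALS, "p": {"mysig"}})
with_mysig2 = signatures_of(nested_p, {**INDIVIDUALS, "p": {"mysig2"}})

nested_r = ("r", ("r", "a"), ("r", "a", "b", "c"))
with_mysig3 = signatures_of(nested_r, {**INDIVIDUALS, "r": {"mysig3"}})
with_mysig4 = signatures_of(nested_r, {**INDIVIDUALS, "r": {"mysig4"}})

closure_term = (("closure", "flight"), "NewYork", "Boston")
closure_sigs = signatures_of(
    closure_term, {**INDIVIDUALS, "flight": {"h2"}, "closure": {"hh1"}})
```

In this sketch a term is a well-formed atomic formula when "atomic" is among the returned names. The empty result for `with_mysig2` corresponds to the observation that p(a) is not well-formed under mysig2.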
Example 2 (A nested RIF-FLD group annotated with metadata).
We illustrate formulas, groups, and metadata by the following complete example. For better readability, we use the compact URI notation which assumes that prefixes are macro-expanded into IRIs. As explained earlier, this is just a space-saving device and not part of the RIF syntax.
Compact URI prefixes:
  dc      expands into http://dublincore.org/documents/dces/
  ex      expands into http://example.org/ontology#
  hamlet  expands into http://www.shakespeare-literature.com/Hamlet/

Group "hamlet:assertions"^^rif:iri["dc:title"^^rif:iri->"Hamlet"^^xsd:string,
                                   "dc:creator"^^rif:iri->"Shakespeare"^^xsd:string]
 (
  Exists ?X (And(?X # "ex:RottenThing"^^rif:iri
                 "ex:part-of"^^rif:iri(?X "http://www.denmark.dk"^^rif:iri)))
  Forall ?X (Or("hamlet:to-be"^^rif:iri(?X) Naf "hamlet:to-be"^^rif:iri(?X)))
  Forall ?X (And(Exists ?B (And("ex:has"^^rif:iri(?X ?B) ?B#"ex:business"^^rif:iri))
                 Exists ?D (And("ex:has"^^rif:iri(?X ?D) ?D#"ex:desire"^^rif:iri)))
             :-
             ?X#"ex:man"^^rif:iri)
  Group "hamlet:facts"^^rif:iri[ ]
   (
    "hamlet:Yorick"^^rif:iri#"ex:poor"^^rif:iri
    "hamlet:Hamlet"^^rif:iri#"ex:prince"^^rif:iri
   )
 )
Observe that the above set of formulas has a nested subset with its own metadata, "hamlet:facts"^^rif:iri[ ], which contains only a global IRI. ☐
Up to now we used Mathematical English to specify the syntax of RIF-FLD. We will now use the familiar EBNF notation in order to provide a succinct overview of the syntax. The following points about the EBNF notation have to be kept in mind:
  Group    ::= 'Group' IRIMETA? '(' (FORMULA | Group)* ')'
  IRIMETA  ::= Frame
  FORMULA  ::= 'And' '(' FORMULA* ')' |
               'Or' '(' FORMULA* ')' |
               Implies |
               'Exists' Var+ '(' FORMULA ')' |
               'Forall' Var+ '(' FORMULA ')' |
               'Neg' FORMULA |
               'Naf' FORMULA |
               ATOMIC |
               'External' '(' ATOMIC ')'
  Implies  ::= FORMULA ':-' FORMULA
  ATOMIC   ::= Atom | Equal | Member | Subclass | Frame
  Atom     ::= UNITERM
  UNITERM  ::= TERM '(' (TERM* | (Name '->' TERM)*) ')'
  Equal    ::= TERM '=' TERM
  Member   ::= TERM '#' TERM
  Subclass ::= TERM '##' TERM
  Frame    ::= TERM '[' (TERM '->' TERM)* ']'
  TERM     ::= Const | Var | Expr | 'External' '(' Expr ')' | Equal | Member | Subclass | Frame
  Expr     ::= UNITERM
  Const    ::= '"' UNICODESTRING '"^^' SYMSPACE
  Name     ::= UNICODESTRING
  Var      ::= '?' UNICODESTRING
The RIF-FLD semantic framework defines the notions of semantic structures and of models of RIF formulas. The semantics of a dialect is derived from these notions by specializing the following parameters.
The syntax of a dialect may limit the kinds of terms that are supported. For instance, if the dialect does not support frames or terms with named arguments then the parts of the semantic structures whose purpose is to interpret the unsupported types of terms become redundant.
The RIF-FLD semantic framework allows formulas to have truth values from an arbitrary partially ordered set of truth values, TV. A concrete dialect must select a concrete partially or totally ordered set of truth values.
A data type is a symbol space whose symbols have a fixed interpretation in any semantic structure. RIF-FLD defines a set of core data types that each dialect is expected to support, but its semantics does not limit support to just the core types. RIF dialects can introduce additional data types, and each dialect is expected to define the exact set of data types that it supports.
Logical entailment in RIF-FLD is defined with respect to an unspecified set of intended models. A RIF dialect must define which models are considered to be intended. For instance, one dialect might specify that all models are intended (which leads to classical first-order entailment), another may consider only the minimal models as intended, while a third one might only use well-founded or stable models [GRS91, GL88].
These notions are defined in the remainder of this document.
Definition (Set of truth values). Each RIF dialect is expected to define the set of truth values, denoted by TV. This set must have a partial order, called the truth order, denoted <_{t}. In some dialects, <_{t} can be a total order. We write a ≤_{t} b if either a <_{t} b or a and b are the same element of TV. In addition,
RIF dialects can have additional truth values. For instance, the semantics of some versions of NAF, such as the well-founded negation, requires three truth values: t, f, and u (undefined), where f <_{t} u <_{t} t. Handling of contradictions and uncertainty usually requires at least four truth values: t, u, f, and i (inconsistent). In this case, the truth order is partial: f <_{t} u <_{t} t and f <_{t} i <_{t} t.
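The difference between the total three-valued order and the partial four-valued order can be checked mechanically. The sketch below encodes the strict truth order <_{t} as an explicit set of pairs; the pair (f, t) is included because it follows by transitivity from either chain. The names `LESS`, `lt`, `leq`, and `is_total` are hypothetical.

```python
# Strict truth order <_t for the four truth values t, u, f, i.
LESS = {("f", "u"), ("u", "t"), ("f", "i"), ("i", "t"), ("f", "t")}


def lt(a, b):
    """a <_t b"""
    return (a, b) in LESS


def leq(a, b):
    """a <=_t b: either a <_t b or a and b are the same element."""
    return a == b or lt(a, b)


def is_total(values):
    """True iff every two of the given truth values are comparable."""
    return all(leq(a, b) or leq(b, a) for a in values for b in values)


three_valued_total = is_total(["f", "u", "t"])
four_valued_total = is_total(["f", "u", "i", "t"])
```

With these definitions `three_valued_total` is true, while `four_valued_total` is false because u and i are incomparable.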
Definition (Primitive data type). A primitive data type (or just a data type, for short) is a symbol space that has
Semantic structures are always defined with respect to a particular set of data types, denoted by DTS. In a concrete dialect, DTS always includes the data types supported by that dialect. All RIF dialects are expected to support the following primitive data types:
Their value spaces and the lexical-to-value-space mappings are defined as follows:
The value space and the lexical-to-value-space mapping for rif:text defined here are compatible with RDF's semantics for strings with language tags [RDF-SEMANTICS].
The above list of supported data types will move to the document Data Types and Built-Ins. Any existing discrepancies will be fixed at that time.
Although the lexical and the value spaces might sometimes look
similar, one should not confuse them. Lexical spaces define the
syntax of the constant symbols in the RIF language. Value spaces
define the meaning of the constants. The lexical and the
value spaces are often not even isomorphic. For example,
1.2^^xsd:decimal and 1.20^^xsd:decimal are two
legal -- and distinct -- constants in RIF because 1.2 and
1.20 belong to the lexical space of xsd:decimal.
However, these two constants are interpreted by the same
element of the value space of the xsd:decimal type.
Therefore, 1.2^^xsd:decimal = 1.20^^xsd:decimal
is a RIF tautology. Likewise, RIF semantics for data types implies
certain inequalities. For instance, abc^^xsd:string ≠
abcd^^xsd:string is a tautology, since the
lexical-to-value-space mapping of the xsd:string type maps
these two constants into distinct elements in the value space of
xsd:string.
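The distinction between lexical and value spaces can be illustrated with a small sketch. It assumes that Python's `Decimal` and `str` are acceptable stand-ins for the value spaces of xsd:decimal and xsd:string; the functions `l_decimal` and `l_string` are hypothetical names for the lexical-to-value-space mappings, and validation of lexical forms is omitted.

```python
from decimal import Decimal


def l_decimal(lexical: str) -> Decimal:
    """Stand-in for the lexical-to-value-space mapping of xsd:decimal."""
    return Decimal(lexical)


def l_string(lexical: str) -> str:
    """Stand-in for the lexical-to-value-space mapping of xsd:string."""
    return lexical


# Two distinct constants (different lexical forms) ...
print("1.2" != "1.20")                        # True
# ... interpreted by the same element of the value space.
print(l_decimal("1.2") == l_decimal("1.20"))  # True
# Distinct strings are mapped to distinct values.
print(l_string("abc") != l_string("abcd"))    # True
```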
The central step in specifying a model-theoretic semantics for a logic-based language is defining the notion of a semantic structure, also known as an interpretation. Semantic structures are used to assign truth values to RIF-FLD formulas.
Definition (Semantic structure). A semantic structure, I, is a tuple of the form <TV, DTS, D, I_{C}, I_{V}, I_{F}, I_{frame}, I_{SF}, I_{sub}, I_{isa}, I_{=}, I_{external}, I_{truth}>. Here D is a non-empty set of elements called the domain of I. We will continue to use Const to refer to the set of all constant symbols and Var to refer to the set of all variable symbols. TV denotes the set of truth values that the semantic structure uses and DTS is the set of primitive data types used in I.
The other components of I are total mappings defined as follows:
This mapping interprets constant symbols.
This mapping interprets variable symbols.
This mapping interprets positional terms.
This is analogous to the interpretation of positional terms with two differences:
To see why such repetition can occur, note that argument names may repeat: p(a->b a->c). This can be understood as treating a as a set-valued argument. Identical argument/value pairs can then arise as a result of a substitution. For instance, p(a->?A a->?B) becomes p(a->b a->b) if the variables ?A and ?B are both instantiated with the symbol b.
Such repetitions arise naturally when variables are instantiated with constants. For instance, o[?A->?B ?A->?B] becomes o[a->b a->b] if variable ?A is instantiated with the symbol a and ?B with b.
The operator ## is required to be transitive, i.e., c1 ## c2 and c2 ## c3 must imply c1 ## c3. This is ensured by a restriction in Section Interpretation of Formulas.
The relationships # and ## are required to have the usual property that all members of a subclass are also members of the superclass, i.e., o # cl and cl ## scl must imply o # scl. This is ensured by a restriction in Section Interpretation of Formulas.
It gives meaning to the equality operator.
It is used to define truth valuation for formulas.
For every external schema, σ, associated with the language, I_{external}(σ) is assumed to be specified externally in some document (hence the name external schema). In particular, if σ is a schema of a RIF builtin predicate or function, I_{external}(σ) is specified in the document Data Types and Built-Ins so that:
For convenience, we also define the following mapping I :
Here we use {...} to denote a bag of argument/value pairs.
Here {...} denotes a bag of attribute/value pairs.
Note that, by definition, External(t) is well formed only if t is an instance of an external schema. Furthermore, by the definition of coherent sets of external schemas, t can be an instance of at most one such schema, so I(External(t)) is well-defined.
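The bags of argument/value and attribute/value pairs used above can be pictured with a multiset. The sketch below assumes Python's `collections.Counter` as the bag representation; the helper `substitute` and the encoding of pairs as tuples are hypothetical and serve only to show how identical pairs arise from a substitution.

```python
from collections import Counter


def substitute(pairs, binding):
    """Apply a variable binding to (name, value) pairs and return a bag."""
    return Counter(
        (binding.get(name, name), binding.get(value, value))
        for name, value in pairs
    )


# p(a->?A a->?B) with ?A and ?B both instantiated to b gives p(a->b a->b).
named_args = substitute([("a", "?A"), ("a", "?B")], {"?A": "b", "?B": "b"})
print(named_args[("a", "b")])  # 2: the bag keeps both occurrences

# o[?A->?B ?A->?B] with ?A = a and ?B = b gives o[a->b a->b].
frame = substitute([("?A", "?B"), ("?A", "?B")], {"?A": "a", "?B": "b"})
print(frame == named_args)     # True: both bags contain a->b twice

# A bag ignores order but, unlike a set, not multiplicity.
print(Counter([("a", "b"), ("a", "c")]) == Counter([("a", "c"), ("a", "b")]))  # True
```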
The effect of signatures. For every signature, sg, supported by the dialect, there is a subset D_{sg} ⊆ D, called the domain of the signature. Terms that have a given signature, sg, must be mapped by I to D_{sg}, and if a term has more than one signature it must be mapped into the intersection of the corresponding signature domains. To ensure this, the following is required:
The effect of data types. The data types in DTS impose the following restrictions. If dt is a symbol space identifier of a data type, let LS_{dt} denote the lexical space of dt, VS_{dt} denote its value space, and L_{dt}: LS_{dt} → VS_{dt} the lexical-to-value-space mapping. Then the following must hold:
That is, I_{C} must map the constants of a data type dt in accordance with L_{dt}. ☐
RIF-FLD does not impose special requirements on I_{C} for constants in the symbol spaces that do not correspond to primitive data types in DTS. Dialects may have such requirements, however. An example of such a restriction could be a requirement that no constant in a particular symbol space (such as rif:local) can be mapped to VS_{dt} of a data type dt.
Definition (Truth valuation). Truth valuation for well-formed formulas in RIF-FLD is determined using the following function, denoted TVal_{I}:
To ensure that equality has precisely the expected properties, it is required that
To ensure that the operator ## is transitive, i.e., c1 ## c2 and c2 ## c3 imply c1 ## c3, the following is required: For all c1, c2, c3 ∈ D, glb_{t}(TVal_{I}(c1 ## c2), TVal_{I}(c2 ## c3)) ≤_{t} TVal_{I}(c1 ## c3).
To ensure that all members of a subclass are also members of the superclass, i.e., o # cl and cl ## scl imply o # scl, the following is required:
Since the different attribute/value pairs are supposed to be understood as conjunctions, the following is required:
Note that, by definition, External(t) is well-formed only if t is an instance of an external schema. Furthermore, by the definition of coherent sets of external schemas, t can be an instance of at most one such schema, so I(External(t)) is well-defined.
The empty conjunction is treated as a tautology, so TVal_{I}(And()) = t.
The empty disjunction is treated as a contradiction, so TVal_{I}(Or()) = f.
The symbol ~ here is the idempotent operator of negation on TV introduced in Section Truth Values. Note that both classical and default negation are interpreted the same way in any concrete semantic structure. The difference between the two kinds of negation comes into play when logical entailment is defined.
Here lub_{t} (respectively, glb_{t}) is taken over all interpretations I* of the form <TV, DTS, D, I_{C}, I*_{V}, I_{F}, I_{frame}, I_{SF}, I_{sub}, I_{isa}, I_{=}, I_{external}, I_{truth}>, which are exactly like I, except that the mapping I*_{V} is used instead of I_{V}. I*_{V} is defined to coincide with I_{V} on all variables except, possibly, on ?v_{1},... ,?v_{n}.
If Γ is a group formula of the form Group φ (ρ_{1} ... ρ_{n}) or Group (ρ_{1} ... ρ_{n}) then
This means that a group of formulas is treated as a conjunction. The metadata is ignored for semantic purposes.
Note that rule implications and equality formulas are always two-valued, even if TV has more than two values.
A model of a group of formulas Γ is a semantic structure I such that TVal_{I}(Γ) = t. ☐
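As an illustration of how such a truth valuation behaves, the sketch below evaluates propositional formulas over the three truth values f <_{t} u <_{t} t. It is a simplification under stated assumptions: conjunction is taken as the greatest lower bound and disjunction as the least upper bound in the truth order, negation maps t to f, f to t, and u to u, and the tuple encoding of formulas and the names `tval`, `glb`, `lub`, and `is_model` are hypothetical.

```python
# Total truth order f <_t u <_t t, encoded by rank (an assumed example TV).
RANK = {"f": 0, "u": 1, "t": 2}
NEG = {"t": "f", "f": "t", "u": "u"}


def glb(values):
    """Greatest lower bound; the empty case yields t."""
    return min(values, key=RANK.get, default="t")


def lub(values):
    """Least upper bound; the empty case yields f."""
    return max(values, key=RANK.get, default="f")


def tval(formula, atoms):
    """Truth value of a formula given truth values for atomic formulas."""
    op, *args = formula
    if op == "atom":
        return atoms[args[0]]
    if op == "And":
        return glb([tval(a, atoms) for a in args])
    if op == "Or":
        return lub([tval(a, atoms) for a in args])
    if op in ("Neg", "Naf"):
        # Both negations are evaluated the same way in a given structure.
        return NEG[tval(args[0], atoms)]
    raise ValueError(f"unsupported connective: {op}")


def is_model(group, atoms):
    """A group is treated as a conjunction; a model assigns it t."""
    return glb([tval(f, atoms) for f in group]) == "t"


print(tval(("And",), {}))  # t: the empty conjunction
print(tval(("Or",), {}))   # f: the empty disjunction
```

Quantifiers, rule implications, frames, and equality are left out of this sketch.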
Note that although metadata associated with RIF formulas is ignored by the semantics, it can be extracted by XML tools. Since metadata is represented by frame terms, it can be reasoned with by RIF dialects, such as RIF-BLD.
The semantics of a set of formulas, Γ, is the set of its intended semantic structures. RIF-FLD does not specify what these intended structures are, leaving this to RIF dialects. Different logic theories may have different criteria for what is considered an intended semantic structure.
For classical first-order logic, every semantic structure is intended. For RIF-BLD, which is based on Horn rules, intended semantic structures are defined only for sets of rules: an intended semantic structure of a RIF-BLD set Γ is the unique minimal Herbrand model of Γ. For the dialects in which rule bodies may contain literals negated with the negation-as-failure connective Naf, only some of the minimal Herbrand models of a set of rules are intended. Each dialect of RIF is supposed to define the notion of intended semantic structures precisely. The two most common theories of intended semantic structures are the so-called well-founded models [GRS91] and stable models [GL88].
The following example illustrates the notion of intended semantic structures. Suppose Γ consists of a single rule formula p :- Naf q. If Naf were interpreted as classical negation, not, then this rule would simply be equivalent to Or(p q), and so it would have two kinds of models: those where p is true and those where q is true. In contrast to first-order logic, most rule-based systems do not treat p and q symmetrically. Instead, they view the rule p :- Naf q as a statement that p must be true if it is not possible to establish the truth of q. Since it is, indeed, impossible to establish the truth of q, such theories would derive p even though it does not logically follow from Or(p q). The logic underlying rule-based systems also assumes that only the minimal Herbrand models are intended (minimality here is with respect to the set of true facts). Furthermore, although our example has two minimal Herbrand models -- one where p is true and q is false, and the other where p is false but q is true -- only the first model is considered to be intended.
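The example can be worked through mechanically. The sketch below enumerates the two-valued Herbrand interpretations of p :- Naf q, finds the minimal models, and then filters them using the stable-model test of [GL88]. The choice of stable models is an assumption made for illustration; RIF-FLD itself leaves the choice of intended models to each dialect. The rule encoding and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical encoding of a rule: (head, positive body, Naf body).
RULES = [("p", [], ["q"])]  # p :- Naf q
ATOMS = ["p", "q"]


def classical_models(rules, atoms):
    """Sets of true atoms satisfying every rule when Naf is read classically."""
    models = []
    for size in range(len(atoms) + 1):
        for combo in combinations(atoms, size):
            m = set(combo)
            if all(
                head in m or not (set(pos) <= m and not (set(naf) & m))
                for head, pos, naf in rules
            ):
                models.append(m)
    return models


def minimal(models):
    """Models with no proper subset that is also a model."""
    return [m for m in models if not any(other < m for other in models)]


def least_model(positive_rules):
    """Least Herbrand model of Naf-free rules, computed by forward chaining."""
    m, changed = set(), True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if set(pos) <= m and head not in m:
                m.add(head)
                changed = True
    return m


def is_stable(m, rules):
    """Stable-model test: m equals the least model of the reduct w.r.t. m."""
    reduct = [(h, pos) for h, pos, naf in rules if not (set(naf) & m)]
    return least_model(reduct) == m


minimal_models = minimal(classical_models(RULES, ATOMS))
print(minimal_models)                                       # [{'p'}, {'q'}]
print([m for m in minimal_models if is_stable(m, RULES)])   # [{'p'}]
```

Both {p} and {q} are minimal, but only {p} survives the stable-model test, which matches the model singled out as intended in the text.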
The above concept of intended models and the corresponding notion of logical entailment with respect to the intended models, defined below, are due to [Shoham87].
We will now define what it means for a set of RIF formulas to entail a RIF formula. We assume that each set of formulas has an associated set of intended semantic structures.
Definition (Logical entailment). Let Γ be a RIF group formula and φ a RIF formula. We say that Γ entails φ, written as Γ |= φ, if and only if for every intended semantic structure I of Γ it is the case that TVal_{I}(Γ) ≤_{t} TVal_{I}(φ). ☐
This general notion of entailment covers both first-order logic and non-monotonic logics that underlie many rule-based languages [Shoham87].
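The dependence of entailment on the choice of intended semantic structures can be sketched for the two-valued case. In the code below, a semantic structure is reduced to the set of atoms it makes true, Γ is the rule p :- Naf q evaluated as Or(p q), and φ is the atom p. These simplifications and all names (`entails`, `tval_gamma`, `tval_phi`) are assumptions made for the illustration.

```python
RANK = {"f": 0, "t": 1}  # two-valued truth order f <_t t


def leq_t(a, b):
    """a <=_t b in the two-valued truth order."""
    return RANK[a] <= RANK[b]


def entails(intended, tval_gamma, tval_phi):
    """Gamma |= phi iff TVal_I(Gamma) <=_t TVal_I(phi) for every intended I."""
    return all(leq_t(tval_gamma(i), tval_phi(i)) for i in intended)


def tval_gamma(i):
    """Truth value of p :- Naf q, evaluated as Or(p q), in structure i."""
    return "t" if ("p" in i or "q" in i) else "f"


def tval_phi(i):
    """Truth value of the atom p in structure i."""
    return "t" if "p" in i else "f"


all_structures = [set(), {"p"}, {"q"}, {"p", "q"}]
only_intended = [{"p"}]

# Every structure intended (classical reading): p is not entailed.
print(entails(all_structures, tval_gamma, tval_phi))  # False
# Only the model {p} intended (non-monotonic reading): p is entailed.
print(entails(only_intended, tval_gamma, tval_phi))   # True
```

The structure {q} is the counterexample in the classical case: it makes Γ true and p false.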
The RIF XML serialization framework defines a normative mapping from the RIF-FLD presentation syntax to XML, and also a normative XML Schema for that XML syntax. As explained in the overview section, RIF requires that the presentation syntax of any logic-based RIF dialect must be a specialization of the presentation syntax of RIF-FLD, i.e., every well-formed formula in the presentation syntax of a RIF dialect must be well-formed also in RIF-FLD. The goal of the XML serialization framework is to provide a similar yardstick for the RIF XML syntax. This amounts to the requirement that any valid XML document for a logic-based RIF dialect must also be a valid XML document for RIF-FLD. In this way, RIF-FLD provides a framework for extensibility and mutual compatibility between XML syntaxes of RIF dialects.
This section is incomplete in the present draft. The next draft will include full treatment of the XML serialization framework.
The XML serialization for RIF-FLD is alternating or fully
striped [ANF01]. A fully striped serialization views XML documents as
objects and divides all XML tags into class descriptors, called
type tags, and property descriptors, called role
tags. We use capitalized names for type tags and lowercase
names for role tags. The RIF serialization framework uses the
following XML tags.
- Group (nested collection of formulas annotated with metadata)
- meta (meta role, containing metadata, which is represented as a Frame)
- Forall (quantified formula for 'Forall', containing declare and formula roles)
- Exists (quantified formula for 'Exists', containing declare and formula roles)
- declare (declare role, containing a Var)
- formula (formula role, containing a FORMULA)
- Implies (implication, containing if and then roles)
- if (antecedent role, containing FORMULA)
- then (consequent role, containing FORMULA)
- And (conjunction)
- Or (disjunction)
- Neg (strong negation, containing a formula role)
- Naf (negation as failure, containing a formula role)
- Atom (atom formula, positional or with named arguments)
- External (external call, containing a content role)
- content (content role, containing an Atom, for predicates, or Expr, for functions)
- Member (member formula)
- Subclass (subclass formula)
- Frame (Frame formula)
- object (Member/Frame role containing a TERM or an object description)
- op (Atom/Expr role for predicates/functions as operations)
- arg (argument role)
- upper (Member/Subclass upper class role)
- lower (Member/Subclass lower instance/class role)
- slot (Atom/Expr/Frame slot role, containing a Prop)
- Prop (Property, prefix version of slot infix '->')
- key (Prop key role, containing a Const)
- val (Prop val role, containing a TERM)
- Equal (prefix version of term equation '=')
- Expr (expression formula, positional or with named arguments)
- side (Equal left-hand side and right-hand side role)
- Const (individual, function, or predicate symbol, with optional 'type' attribute)
- Name (name of named argument)
- Var (logic variable)
Example 3 (Serialization of a nested RIF-FLD group annotated with metadata).
This example shows an XML serialization for the formulas in Example 2. For convenience of reference, the original formulas are included at the top. For better readability, we again use the compact URI syntax.
Compact URI prefixes:
  dc expands into http://dublincore.org/documents/dces/
  ex expands into http://example.org/ontology#
  hamlet expands into http://www.shakespeare-literature.com/Hamlet/
Presentation syntax:

  Group "hamlet:assertions"^^rif:iri["dc:title"^^rif:iri->"Hamlet"^^xsd:string,
                                     "dc:creator"^^rif:iri->"Shakespeare"^^xsd:string]
  (
    Exists ?X (And(?X # "ex:RottenThing"^^rif:iri
                   "ex:part-of"^^rif:iri(?X "http://www.denmark.dk"^^rif:iri)))
    Forall ?X (Or("hamlet:to-be"^^rif:iri(?X) Naf "hamlet:to-be"^^rif:iri(?X)))
    Forall ?X (And(Exists ?B (And("ex:has"^^rif:iri(?X ?B) ?B#"ex:business"^^rif:iri))
                   Exists ?D (And("ex:has"^^rif:iri(?X ?D) ?D#"ex:desire"^^rif:iri)))
               :-
               ?X#"ex:man"^^rif:iri)
    Group "hamlet:facts"^^rif:iri[ ]
    (
      "hamlet:Yorick"^^rif:iri#"ex:poor"^^rif:iri
      "hamlet:Hamlet"^^rif:iri#"ex:prince"^^rif:iri
    )
  )

XML serialization:

  <Group>
    <meta>
      <Frame>
        <object>
          <Const type="rif:iri">hamlet:assertions</Const>
        </object>
        <slot>
          <Prop>
            <key><Const type="rif:iri">dc:title</Const></key>
            <val><Const type="xsd:string">Hamlet</Const></val>
          </Prop>
        </slot>
        <slot>
          <Prop>
            <key><Const type="rif:iri">dc:creator</Const></key>
            <val><Const type="xsd:string">Shakespeare</Const></val>
          </Prop>
        </slot>
      </Frame>
    </meta>
    <formula>
      <Exists>
        <declare><Var>X</Var></declare>
        <formula>
          <And>
            <formula>
              <Member>
                <lower><Var>X</Var></lower>
                <upper><Const type="rif:iri">ex:RottenThing</Const></upper>
              </Member>
            </formula>
            <formula>
              <Atom>
                <op><Const type="rif:iri">ex:part-of</Const></op>
                <arg><Var>X</Var></arg>
                <arg><Const type="rif:iri">http://www.denmark.dk</Const></arg>
              </Atom>
            </formula>
          </And>
        </formula>
      </Exists>
    </formula>
    <formula>
      <Forall>
        <declare><Var>X</Var></declare>
        <formula>
          <Or>
            <formula>
              <Atom>
                <op><Const type="rif:iri">hamlet:to-be</Const></op>
                <arg><Var>X</Var></arg>
              </Atom>
            </formula>
            <formula>
              <Naf>
                <formula>
                  <Atom>
                    <op><Const type="rif:iri">hamlet:to-be</Const></op>
                    <arg><Var>X</Var></arg>
                  </Atom>
                </formula>
              </Naf>
            </formula>
          </Or>
        </formula>
      </Forall>
    </formula>
    <formula>
      <Forall>
        <declare><Var>X</Var></declare>
        <formula>
          <Implies>
            <if>
              <Member>
                <lower><Var>X</Var></lower>
                <upper><Const type="rif:iri">ex:man</Const></upper>
              </Member>
            </if>
            <then>
              <And>
                <formula>
                  <Exists>
                    <declare><Var>B</Var></declare>
                    <formula>
                      <And>
                        <formula>
                          <Atom>
                            <op><Const type="rif:iri">ex:has</Const></op>
                            <arg><Var>X</Var></arg>
                            <arg><Var>B</Var></arg>
                          </Atom>
                        </formula>
                        <formula>
                          <Member>
                            <lower><Var>B</Var></lower>
                            <upper><Const type="rif:iri">ex:business</Const></upper>
                          </Member>
                        </formula>
                      </And>
                    </formula>
                  </Exists>
                </formula>
                <formula>
                  <Exists>
                    <declare><Var>D</Var></declare>
                    <formula>
                      <And>
                        <formula>
                          <Atom>
                            <op><Const type="rif:iri">ex:has</Const></op>
                            <arg><Var>X</Var></arg>
                            <arg><Var>D</Var></arg>
                          </Atom>
                        </formula>
                        <formula>
                          <Member>
                            <lower><Var>D</Var></lower>
                            <upper><Const type="rif:iri">ex:desire</Const></upper>
                          </Member>
                        </formula>
                      </And>
                    </formula>
                  </Exists>
                </formula>
              </And>
            </then>
          </Implies>
        </formula>
      </Forall>
    </formula>
    <formula>
      <Group>
        <meta>
          <Frame>
            <object>
              <Const type="rif:iri">hamlet:facts</Const>
            </object>
          </Frame>
        </meta>
        <formula>
          <Member>
            <lower><Const type="rif:iri">hamlet:Yorick</Const></lower>
            <upper><Const type="rif:iri">ex:poor</Const></upper>
          </Member>
        </formula>
        <formula>
          <Member>
            <lower><Const type="rif:iri">hamlet:Hamlet</Const></lower>
            <upper><Const type="rif:iri">ex:prince</Const></upper>
          </Member>
        </formula>
      </Group>
    </formula>
  </Group>
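The fully striped structure of such a serialization can be checked mechanically. The sketch below is not part of the RIF framework: it assumes un-namespaced tag names, applies only the convention that type tags are capitalized and role tags are lowercase, and uses an abridged fragment modeled on Example 3. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged fragment modeled on Example 3 (an assumption for illustration).
FRAGMENT = """
<Exists>
  <declare><Var>X</Var></declare>
  <formula>
    <Member>
      <lower><Var>X</Var></lower>
      <upper><Const type="rif:iri">ex:RottenThing</Const></upper>
    </Member>
  </formula>
</Exists>
"""


def is_type_tag(tag):
    """Type tags are capitalized; role tags start with a lowercase letter."""
    return tag[0].isupper()


def striping_violations(elem, path=""):
    """Paths of elements whose kind (type/role) equals that of their parent."""
    here = f"{path}/{elem.tag}"
    found = []
    for child in elem:
        if is_type_tag(child.tag) == is_type_tag(elem.tag):
            found.append(f"{here}/{child.tag}")
        found.extend(striping_violations(child, here))
    return found


root = ET.fromstring(FRAGMENT.strip())
print(striping_violations(root))  # []: type and role tags alternate

# A type tag nested directly inside another type tag is reported.
broken = ET.fromstring("<Exists><declare><Var>B</Var></declare><And/></Exists>")
print(striping_violations(broken))  # ['/Exists/And']
```

A check of this kind only tests alternation; it does not validate a document against the XML Schema of a dialect.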
We now serialize the syntax of Section EBNF Grammar for the Presentation
Syntax of RIF-FLD by defining a mapping from the presentation
syntax to XML.
This mapping will be given in the next draft.