Differences between revisions 201 and 223 (spanning 22 versions)
| Deletions are marked like this. | Additions are marked like this. |
| Line 2: | Line 2: |
| supported by RIF Core. As explained in Section [:Core/Overview#Overview:Overview], RIF Core corresponds to definite Horn rules, and the bodies of such rules are conjunctions of atomic formulas without negation. However, it is well-known that disjunctions of such conditions can also be used in the bodies of rules without changing the essential properties of the rule language. This is based on the fundamental logical tautologies (h ← b ∨ c) ≡ ((h ← b) ∧ (h ← c)) and | supported by the basic RIF logic. As explained in Section [:Core/Overview#Overview:Overview], the Basic RIF Logic Dialect corresponds to definite Horn rules, and the bodies of such rules are conjunctions of atomic formulas without negation. However, it is well-known that disjunctions of such conditions can also be used in the bodies of rules without changing the essential properties of the rule language. This is based on the fundamental logical tautologies (h ← b ∨ c) ≡ ((h ← b) ∧ (h ← c)) and |
| Line 10: | Line 10: |
| Line 12: | Line 11: |
|
language, which supports primitive data types (such as `xsd:long`, `xsd:string`, `xsd:decimal`, `xsd:time`, `xsd:dateTime`), and the use of '''''international resource identifiers''''' (or '''''IRI'''''s) to refer to individuals, predicates, and functions -- an essential ingredient in making RIF dialects suitable as Web languages. To make RIF into a Web language and to provide for future higher-order dialects based on formalisms such as ["HiLog"] and ["Common Logic"], the RIF Core language does not separate symbols |
language. To make RIF dialects suitable as Web languages, RIF supports the following primitive data types: `xsd:long` (`http://www.w3.org/2001/XMLSchema#long`), `xsd:string` (`http://www.w3.org/2001/XMLSchema#string`), `xsd:decimal` (`http://www.w3.org/2001/XMLSchema#decimal`), `xsd:time` (`http://www.w3.org/2001/XMLSchema#time`), `xsd:dateTime` `http://www.w3.org/2001/XMLSchema#dateTime`), and the use of '''''international resource identifiers''''' (or '''''IRI'''''s) to refer to individuals, predicates, and functions. To ensure extensibility and to provide for future higher-order dialects based on formalisms, such as ["HiLog"] and ["Common Logic"], the RIF logic language does not draw sharp boundary between the symbols |
| Line 21: | Line 23: |
|
set. When desired, separation between the different kinds of symbols is achieved through the mechanism of '''''sorts'''''. In logic, the mechanism of sorts is used to classify symbols into separate groups. Currently RIF Core supports `the sort `rif:iri` for IRIs, and the sorts `xsd:long` (`http://www.w3.org/2001/XMLSchema#long`), `xsd:decimal` (`http://www.w3.org/2001/XMLSchema#decimal`), `xsd:time` (`http://www.w3.org/2001/XMLSchema#time`), `xsd:string` (`http://www.w3.org/2001/XMLSchema#string`), and `xsd:dateTime` (`http://www.w3.org/2001/XMLSchema#dateTime`). Other sorts (such as `xsd:double`, `xsd:date`, and a sort to represent temporal `duration`) might be added in future drafts. To make RIF a Web language and to facilitate extensibility, the underlying logic of RIF is not only multi-sorted (i.e., allows multiple sorts), but it also allows symbols to be ''polymorphic'' (i.e., the same symbol is allowed to have several sorts). Some sorts can be disjoint and others not. RIF dialects will be able to decide which sorts will be allowed for predicate symbols, which for function symbols, and so on. RIF Core does not assume that the symbols used for predicates, functions, and individuals are disjoint. RIF dialects may assume disjointness, however. It can be shown that if φ and ψ are well-formed formulas in a language that assumes such disjointness (in particular, does not have statements of the form `p = q`, where `p`, `q` are not both individuals, function symbols, or predicate names) then φ |= ψ under the disjointness semantics iff the same is true under the semantics that does not assume disjointness. `For didactic reasons, we start with a single sort, which includes all constants, function symbols, and predicate names. The multisorted RIF logic is described in Section [:Core/Positive Conditions#sec-multisorted-logic:Multisorted RIF Logic]. |
set. RIF dialects control the contexts in which the different symbols can occur by attaching ''signatures'' to these symbols. The Basic RIF logic dialect carefully selects signatures for the symbols so that the corresponding logic will be first-order: each symbol has a unique role as an individual, a function symbol of a particular arity, or a predicate symbol of a particular arity. However, dialects extending the basic logic will be allowed to use signatures in less restrictive fashions so that symbols could be polymorphic and be allowed to occur in several different contexts (for example, both as individuals and as predicates). We begin by describing a syntax, which is more general than what the basic logic dialect permits. This syntax can be used in the various dialects that extend the basic RIF logic dialect. Then we introduce the notion of a signature and specify the restrictions on the way signatures are allowed to be assigned to symbols. |
| Line 41: | Line 36: |
|
[[Anchor(sec-formal-preliminaries)]] === Formal Syntax === The language of RIF Condition Language consists of a countably infinite set of constant symbols `Const`, a countably infinite set of variable symbols `Var`, connective symbols ∧ and ∨, quantifiers ∀ and ∃, and auxiliary symbols like (, ), and so on. The basic language construct is called '''''term''''', which is formally defined as follows: * If `t` ∈ `Const` or `t` ∈ `Var` then `t` is a term. * If `t` and `t`,,1,,, ..., `t`,,n,, are terms then `t`(`t`,,1,, ... `t`,,n,,) is a term This definition is very general. It makes no distinction between constant symbols that represent individuals, predicates, and function symbols. The same symbol can occur in multiple contexts at the same time. For instance, `p(p(a), p(a,b,c))`. Even variables and general terms are allowed to occur in the position of predicates and function symbols. For instance, `p(a)(p(b,c),q)` is also a term. To control the context in which any given symbol can occur in RIF dialects, the language associates a ''signature'' with each symbol (both constant and variable symbols). '''Signatures.''' Let `SigNames` be a non-empty finite or countably infinite set of '''''signature names'''''. We assume that this set includes at least the following signature names: `i` and `bool`. The signature name `i` is intended to represent the context where the constants that denote individual objects are allowed to appear. The name `bool` represents the context of atomic formulas. A '''''signature''''' is a statement of the form `s`{`e`,,1,, ... `e`,,n,, ...} where `s` ∈ `SigNames` is the name of the signature and {`e`,,1,, ... `e`,,n,, ...} is a set of '''''arrow''''' or '''''Boolean signature expressions'''''. This set can be empty, finite, or countably infinite. In the basic RIF logic dialect, this set will have at most one expression, but in more expressive dialects a signature can have more than one expression and in this way they can support polymorphism. Arrow and Boolean signature expressions are defined as follows, where `s`,,1,,, ..., `s`,,n,,, and `s` are signature names from `SigNames`: * (`s`,,1,, ... `s`,,n,,) → `s`, where n≥0, is an '''''arrow signature expression'''''. In particular, () → `i` and (`i`) → `i` are arrow signature expressions. * If `s` above is `bool` then the signature is called a '''''Boolean signature expression'''''. A set `S` of signatures is said to be '''''coherent''''' if * it contains the signatures `i`{ } and `bool`{ }, which represent the context of individual objects and atomic formulas; and * no two different signatures in `S` have the same name. '''Well-formed terms and formulas.''' RIF uses signatures to control the context in which terms can appear through the notion of well-formed terms and well-formed atomic formulas. First, as mentioned above, each symbol (constant or variable) is associated with exactly one signature from a coherent set of signatures. (Different symbols can be associated with the same signature, but one symbol cannot be associated with more than one signature.) If σ is a signature then we will use σ^#^ to denote the name of that signature. * If `s` is a constant or variable with signature `i`{ } then it is a '''''well-formed term''''' with signature `i`{ }. * A term `t`(`t`,,1,, ... `t`,,n,,) is a '''''well-formed term''''' with signature σ iff * `t` is a well-formed term with signature σ,,0,,; * Each `t`,,i,, is a well-formed term with signature σ,,j,,, j = 1,...,n; and * σ,,0,, contains an arrow expression of the form (σ,,1,,^#^ ... σ,,n,,^#^) → σ^#^ * A term `t`(`t`,,1,, ... `t`,,n,,) is a '''''well-formed atomic formula''''' iff it is a well-formed term with signature `bool`{ }. RIF assumes that there is a special predicate, `=`, which denotes equality. Like other predicates, it has a signature, which includes the Boolean expression (`i` `i`) → `bool` and possibly other expressions of the form (`s` `s`) → `bool`, where `s` is a signature name. No other signatures are allowed for `=`. We note that it is a common practice to write the atomic formulas involving `=` using infix notation, i.e., `a = b` instead of `=(a,b)`. The equality predicate has special model-theoretic semantics, as explained in Section [:Core/Positive Conditions#sec-model-theory:Model Theory for Basic RIF Condition Language]. More general formulas are constructed out of atomic formulas with the help of logical connectives. The Basic RIF Condition Language defines the following general '''''well-formed formulas'''''. * If φ is an atomic formula than it is also a general well-formed formula. * φ ∧ ψ is a general well-formed formula, if so are φ and ψ. * φ ∨ ψ is a general well-formed formula, if so are φ and ψ. * ∃ `V`,,1,, ... `V`,,n,, φ is a general well-formed formula, if so is φ and `V`,,1,,, ..., `V`,,n,, are variables. '''Signatures in the Basic RIF Condition Language.''' The basic RIF condition language presents a much simpler picture to the user by restricting the set of well-formed terms with a very constraint coherent set of signatures. Namely, the signature set of the basic RIF logic language contains the following signatures: * `i`{ } and `bool`{ } * For every arity n, there ares signatures `f`,,n,,{(`i` ... `i`) → `i`} and `p`,,n,,{(`i` ... `i`) → `bool`} (there are n `i`'s inside the parentheses), which represent function symbols of arity n and predicate symbols of arity n, respectively. The symbols f`,,n,, and `p`,,n,, are reserved signature names in the basic RIF logic dialect. In this way, each constant symbol can be either an individual, a predicate of one particular arity, or a function symbol of one particular arity. * All variables are associated with signature `i`{ }, so they can range only over individuals. * The equality symbol, `=`, has signature `p`,,2,,{(`i` `i`) → `bool`}. In this way, the equality predicate can compare only individual constants and not predicates and function symbols. '''Primitive data types.''' The set of all constant symbols in RIF has a number of predefined subsets, which correspond to XML data types and to some RIF-specific types. Namely, the following primitive data types are supported: `xsd:long` (`http://www.w3.org/2001/XMLSchema#long`), `xsd:string` (`http://www.w3.org/2001/XMLSchema#string`), `xsd:decimal` (`http://www.w3.org/2001/XMLSchema#decimal`), `xsd:time` (`http://www.w3.org/2001/XMLSchema#time`), `xsd:dateTime` `http://www.w3.org/2001/XMLSchema#dateTime`). In addition, a RIF-specific data type, called `rif:iri` (for '''''international resource identifier''''' or '''''IRI'''''), is supported. Constant symbols that belong to this data type have special concrete syntax and semantic structures will interpret them in a special way. For the formal syntax we only need to state that the set `Const` of all constant symbols has subsets `Const`,,xsd:long,,, `Const`,,xsd:string,,, `Const`,,xsd:decimal,,, `Const`,,xsd:time,,, `Const`,,xsd:dateTime,,, and `Const`,,rif:iri,,. |
|
| Line 110: | Line 166: |
| The concrete human-readable syntax, described in this (slightly modified) EBNF, is work in progress and under discussion. | '''EBNF representation.''' The concrete human-readable syntax, described in this (slightly modified) EBNF, is work in progress and under discussion. |
| Line 121: | Line 177: |
|
Const ::= CONSTNAME | '"'CONSTNAME'"''^^'SORTNAME Var ::= '?'VARNAME | '?'VARNAME'^^'SORTNAME |
Const ::= CONSTNAME | '"'CONSTNAME'"''^^'TYPENAME Var ::= '?'VARNAME |
| Line 136: | Line 192: |
| Since initially we focus on an unsorted RIF language, individuals, function | Note that individuals, function |
| Line 138: | Line 194: |
|
Variables are also unsorted, i.e., they range over the entire domain. Sorts are introduced in Section [:Core/Positive_Conditions#sec-multisorted-logic:Multisorted RIF Logic]. However, in preparation of the multisorted version, the above EBNF syntax already introduces {{{SORTNAME}}}-enriched choices for `Const` and `Var` that will be required in Section [:Core/Positive_Conditions#sec-multisorted-logic:Multisorted RIF Logic]. |
This syntax is more general than what the basic RIF logic dialect actually permits. As explained in Section [:Core/Positive_Conditions#sec-formal-preliminaries:Formal Syntax] a set of signatures restricts this syntax to allow only the terms that are allowed in standard firts-order logic. |
| Line 153: | Line 207: |
|
constants, variables, or sorts. For the moment, {{{NamedElement}}} in the UML model can be either {{{CONSTNAME}}}, or {{{VARNAME}}}, or {{{SORTNAME}}}. These |
constants and variables. These |
| Line 156: | Line 209: |
| For more details on this see [:Core/Positive_Conditions#sec-multisorted-logic:Multisorted RIF Logic]. | |
| Line 184: | Line 236: |
| When conditions are used as queries, their free variables are used to | When RIF conditions are used as queries, their free variables |
| Line 187: | Line 239: |
|
The following is a possible XML-serializing mapping of the abstract syntax in Section [:Core/Positive Conditions#sec-abstract-syntax:Abstract Syntax] (and of the above EBNF syntax). This XML serialization is partially [http://esw.w3.org/topic/StripeSkipping stripe-skipped] in that it uses positional information instead of most role elements. The only place where role elements ('declare' and 'formula') are explicit is within the `Exists` element. |
'''XML serialization.''' The following is a possible XML-serializing mapping of the abstract syntax in Section [:Core/Positive Conditions#sec-abstract-syntax:Abstract Syntax] (and of the above EBNF syntax). This XML serialization is fully striped, where positional information is optionally exploited only for the `arg` role elements. For example, role elements (`declare` and `formula`) are explicit within the `Exists` element. |
| Line 207: | Line 259: |
|
- op (operation role) - arg (argument role, positional/non-positional without/with 'index' attribute) |
|
| Line 208: | Line 262: |
| - side (left-hand and right-hand side role) | |
| Line 219: | Line 274: |
| And ( Exists ?Buyer (purchase(?Buyer ?Seller book(?Author LeRif) $49)) ?Seller=?Author ) | And ( Exists ?Buyer (purchase(?Buyer ?Seller book(?Author LeRif) USD(49))) ?Seller=?Author ) |
| Line 224: | Line 279: |
|
<Exists> <declare><Var>Buyer</Var></declare> <formula> <Uniterm> <Const>purchase</Const> <Var>Buyer</Var> <Var>Seller</Var> |
<formula> <Exists> <declare><Var>Buyer</Var></declare> <formula> |
| Line 232: | Line 284: |
|
<Const>book</Const> <Var>Author</Var> <Const>LeRif</Const> |
<op><Const>purchase</Const></op> <arg><Var>Buyer</Var></arg> <arg><Var>Seller</Var></arg> <arg> <Uniterm> <op><Const>book</Const></op> <arg><Var>Author</Var></arg> <arg><Const>LeRif</Const></arg> </Uniterm> </arg> <arg> <Uniterm> <op><Const>USD</Const></op> <arg><Const>49</Const></arg> </Uniterm> </arg> |
| Line 236: | Line 301: |
|
<Const>$49</Const> </Uniterm> </formula> </Exists> <Equal> <Var>Seller</Var> <Var>Author</Var> </Equal> |
</formula> </Exists> </formula> <formula> <Equal> <side><Var>Seller</Var></side> <side><Var>Author</Var></side> </Equal> </formula> |
| Line 247: | Line 313: |
|
== Semantic Structures == |
'''Syntax for Primitive Types.''' In this document we will use the following syntax, following [http://www.w3.org/TR/rdf-testcases/#ntriples N-Triples], to specify the constant symbols that belong to primitive data types: {{{ "value"^^typename }}} For instance, {{{"123"^^xsd:long}}} is a symbol that belongs to the primitive data type `xsd:long`, {{{"2007-11-23T03:55:44-02:30"^^xsd:dateTime}}} is a symbol of data type `xsd:dateTime`, and {{{"ftp://example.com/foobar"^^rif:iri}}} is a symbol that belongs to the data type `rif:iri`. The part of such symbols that occurs inside the double quotes is called a '''''literal''''' of the symbol. The surrounding double quotes are not part of the literal. If a double quote must be included as part of a literal, it must be escaped with the backslash. The XML syntax for primitive types can utilize the `type` attribute associated with XML term elements such as `Const`. For instance, the earlier `xsd:dateTime` example can be represented as {{{<Const type="xsd:dateTime">2007-11-23T03:55:44-02:30</Const>}}}. Many primitive types in RIF may be based on the XML Schema data types and thus it is expected that some names of RIF primitive types will be references to those XML Schema types. In this case, these IRIs will be used as type names. In this version of the RIF Core document, we define the following primitive types, where the prefix `xsd` refers to the XML Schema URI and `rif` is the prefix for the RIF language. The syntax such as `xsd:long` should be understood as a [http://www.w3.org/2001/sw/BestPractices/HTML/2005-10-27-CURIE compact uri], i.e., a macro, which expands to a concatenation of the character sequence denoted by the prefix `xsd` and the string `long`. In the next version of this document we will introduce a syntax for defining prefixes for compact URIs and will also expand on the syntax for the symbols that can be used to denote RIF's primitive data types. * `xsd:long`. This data type corresponds to the XML data type `xsd:long`. Example: {{{"123"^^xsd:long}}}. Long integers also have a short notation, which does not require the {{{"..."^^xsd:long}}} wrapper. Example: `123`. * `xsd:decimal`. This corresponds to the XML data type `xsd:decimal`. Example: {{{"123.45"^^xsd:decimal}}}. Decimals also have an alternative short notation, which does not require the {{{"..."^^xsd:decimal}}} wrapper. Example: `123.45`. The type `xsd:decimal` is a supertype of `xsd:long`. * `xsd:string`. This corresponds to the XML data type `xsd:string`. Example: {{{"a string"^^xsd:string}}}. Double quotes that appear inside strings are escaped with a backslash and a backslash that is supposed to appear in the string must be escaped with another backslash. Strings have also an alternative short notation. Example: `'a string'`. In this notation, single quotes and backslashes that occur inside such strings are escaped with backslashes. * `xsd:time`. This corresponds to the XML data type `xsd:time`. Example: {{{"18:33:44.2345"^^xsd:time}}}. * `xsd:dateTime`. This type corresponds to the XML data type `xsd:dateTime`. Example: {{{"2007-03-12T21:22:33.44-01:30"^^xsd:dateTime}}}. * `rif:iri`. Symbols of this type have the form {{{"XYZ"^^rif:iri}}}, where `XYZ` is an IRI as specified in [http://www.ietf.org/rfc/rfc3987.txt RFC 3987]. Other XML data types that are likely to be incorporated in RIF include `xsd:double`, `xsd:date`, and a type for temporal `duration`. At present, the partial order on the above primitive data types is imposed by the XML Schema hierarchy, so the only subtype relationship is between `xsd:long` and `xsd:decimal`. This may be extended as more types are added. The type `rif:iri` is intended to be used in a way similar to [http://www.w3.org/TR/rdf-schema/ RDFS resources]. The domain of the type `rif:iri` can be any set and no a priori equalities among the members of the type `rif:iri` are assumed. This domain is ''not'' the same as the value space of the XML primitive type `anyURI`. The constant symbols that correspond to XML data types all have the signature `i`{ } in the basic RIF logic dialect. The symbols of type `rif:iri` can have any signature allowed by the basic RIF logic: `i`, `f`,,n,,, or `p`,,n,,, for n = 0,1,.... The `purchase` atomic formula of Example 2 can be enriched with primitive types, obtaining the EBNF version {{{ purchase(?Buyer ?Seller book(?Author "LeRif"^^xsd:string) USD("49"^^xsd:long)) }}} and its XML serialization {{{ <Uniterm> <op><Const>purchase</Const></op> <arg><Var>Buyer</Var></arg> <arg><Var>Seller</Var></arg> <arg> <Uniterm> <op><Const>book</Const></op> <arg><Var>Author</Var></arg> <arg><Const type="xsd:string">LeRif</Const></arg> </Uniterm> </arg> <arg> <Uniterm> <op><Const>USD</Const></op> <arg><Const type="xsd:long">49</Const></arg> </Uniterm> </arg> </Uniterm> }}} == Specifying Signatures in RIF Dialects == We will now explain how signatures are defined in the Basic Condition Language and its extensions. In the Basic RIF Logic dialect, there is no need to declare signatures, since they can be inferred. Indeed, the basic logic dialect assumes that each symbol is associated with a unique signature. Therefore, the signature can be determined by the context in which the symbol is used. If a symbol is used in more than one context, the parser should deem it as a syntax error. In dialects that extend the Basic RIF Condition Language, signature inference of the above kind is not possible, in general, and might not always be appropriate even when it is possible. For this reason, we intriduce the attribute `signature` for the XML tags `Const` and `Var`. The value of such a tag is supposed to be the signature name for the corresponding symbol. For instance, in the above example, `purchase` was a 4-ary predicate, so it could be specified as `<Const signature="p4">purchase</Const>`. Since variables in the basic condition language always have the signature `i`, we could write <Var signature="i">Author</Var>. The `signature` attribute is optional, and its value can be inferred for the basic language. Dialects that extend the basic logic dialect can adopt their own rules for omitting this attribute and for inferring its values. The basic language does not need any special syntax for defining signatures -- all signature names are already defined: `i`, `f0`, `f1`, `p0`, `p1`, etc. The dialects, however, will need a special sublanguage for defining signatures. The signature {{{ mysig{ (s1 s2 s3) -> bool, (s i) -> i } }}} can, for example, be defined using the following XML excerpt: {{{ <rif:signature rif:signature-name="mysig"> <rif:sig-expression rif:sig-head="bool"> s1 s2 s3 </rif:sig-expression> <rif:sig-expression rif:sig-head="i"> s i </rif:sig-expression> </rif:signature> }}} [[Anchor(sec-model-theory)]] == Model Theory for the Basic RIF Condition Language == |
| Line 255: | Line 421: |
| A '''''semantic structure''''', '''''I''''', is a tuple of the form <'''''D''''','''''I''''',,C,,, '''''I''''',,V,,, '''''I''''',,F,,, '''''I''''',,R,,>, which determines the truth value of every formula, as explained below. Currently, by '''''formula''''' we mean anything produced by the `CONDITION` production in the EBNF. Later on this term will also include rules. Other dialects might extend this notion even further. | '''''Semantic structures.''''' A '''''semantic structure''''', '''''I''''', is a tuple of the form <'''''D''''','''''I''''',,C,,, '''''I''''',,V,,, '''''I''''',,F,,, '''''I''''',,R,,>, which determines the truth value of every formula, as explained below. Currently, by '''''formula''''' we mean anything produced by the `CONDITION` production in the EBNF. Later on this term will also include rules. Other dialects might extend this notion even further. |
| Line 258: | Line 424: |
| For RIF Core, '''''TV''''' includes only two values, '''t''' | For the basic RIF logic dialect, '''''TV''''' includes only two values, '''t''' |
| Line 263: | Line 429: |
| '''''truth order'''''; it is denoted with <,,t,,. In RIF Core, '''f''' <,,t,, '''t''', and it is a total order. | '''''truth order'''''; it is denoted with <,,t,,. In the basic RIF logic, '''f''' <,,t,, '''t''', and it is a total order. |
| Line 272: | Line 438: |
| As mentioned above, an interpretation, '''''I''''', consists of a domain, '''''D''''', and four mappings: | The mappings that constitute an interpretation, '''''I''''', are as follows: |
| Line 278: | Line 444: |
|
We remark that `f()` and `f` can be interpreted differently by semantic structures, and the proposed XML serialization also treats them as distinct terms: the nullary function application `<Uniterm><Const>f</Const></Uniterm>` vs. the symbol `<Const>f</Const>`. Semantically, neither `f() = f` nor `f()` ≠ `f` are entailed a priori. However, if desirable, a particular application or a dialect may introduce axioms of the form `f = f()` to equate symbols and nullary terms. Using these mappings, we can define a more general mapping, ''I'', as follows: |
It is convenient to define a more general mapping, ''I'', based on the mappings '''''I''''',,C,,, '''''I''''',,V,,, '''''I''''',,F,,, and '''''I''''',,R,,: |
| Line 286: | Line 450: |
| Truth values for formulas in RIF Core are determined using the following function, denoted '''''I''''',,Truth,,: |
Note that `f()` and `f` can be interpreted differently by semantic structures, and the proposed XML serialization also treats them as distinct terms: the nullary function application `<Uniterm><Const>f</Const></Uniterm>` vs. the symbol `<Const>f</Const>`. Semantically, neither `f() = f` nor `f()` ≠ `f` are entailed a priori. However, if desirable, a particular application or a dialect may introduce axioms of the form `f = f()` to equate symbols and nullary terms. '''Truth valuation for formulas.''' Observe that the notion of signatures from Section [:Core/Positive_Conditions#sec-formal-preliminaries:Formal Syntax] is used only to constrain the syntax and does not appear in the definition of the semantic structure. This is because when we define truth valuations for formulas, below, all formulas are assumed to be well-formed. Truth valuation for well-formed formulas in the basic RIF condition language is determined using the following function, denoted '''''I''''',,Truth,,: |
| Line 296: | Line 466: |
|
[[Anchor(sec-multisorted-logic)]] == Multisorted RIF Logic == RIF uses multi-sorted logic to account for primitive data types, the use of IRIs as identifiers for objects and concepts, and more. This framework is general enough to accommodate a number of popular extensions, such as ["Prolog"], ["HiLog"], ["F-logic"], ["Common Logic"], and ["RDF"], which allow the same symbol to play multiple roles. For instance, the same symbol, ''foo'', can be used to denote an individual, a predicate of several different arities, and as a function symbol of different arities. In a multisorted RIF Core, each symbol from `Const` is associated with a '''''sort''''' and zero or more '''''signatures'''''. A signature can be an '''''arrow signature''''', or a '''''Boolean signature'''''. Arrow signatures are also known as ''function'' signatures, and Boolean signatures are also known as ''predicate'' signatures. A variable may be associated with one sort, or it may be unsorted. Sorts are drawn from a fixed collection of sorts PS,,1,,, ..., PS,,n,,. These sorts are intended to model primitive data types. For instance, RIF Core supports the sorts `xsd:long`, `xsd:string`, `xsd:dateTime`, and `rif:iri` among others. The same symbol can be associated with more than one primitive sort. It is also possible that one sort would be a subsort of another. So, there is a ''partial order defined over the sorts''. For example, the sort of short integers might be defined as a subsort of the sort of long integers, so every symbol of the sort `xsd:short` would also be associated with the sort `xsd:long`. In RIF, symbols of different sorts are distinguished syntactically. A concrete syntax for sorted symbols and variables is shown in Section [:Core/Positive_Conditions#sec-multisorted-syntax:Multisorted Syntax]. The language of the RIF sorted logic also provides a special sublanguage for defining the signatures of function symbols and predicates. An '''''arrow signature''''' is a statement of the form `s`,,1,, × ... × `s`,,k,, → `s`, where `s`,,1,,, ..., `s`,,k,,, `s` are names of sorts (i.e., one of the PS,,1,,, ..., PS,,n,,). For instance, `xsd:dateTime` × `xsd:long` → `xsd:dateTime` could be an arrow signature for a function. A '''''Boolean signature''''' is a statement of the form s,,1,, × ... × s,,k,,, where, again, `s`,,1,,, ..., `s`,,k,, are names of sorts. For instance, the predicate `HappenedBefore` may have a Boolean signature `xsd:dateTime` × `xsd:dateTime`. The arrow and Boolean signatures, along with the primitive sorts, are used in the definition of '''''well-formed''''' terms and formulas. This notion is defined in Section [:Core/Positive_Conditions#sec-formalizing-multisorted:Formalization of the Multisorted RIF Logic]. (See [:Core/EndNotes#Note2.1-4:end note on using sorts to represent FOL, RDF, and other logics].) [[Anchor(sec-formalizing-multisorted)]] === Formalization of the Multisorted RIF Logic === In addition to the syntax for the sorted literals, RIF Core needs the following extensions. First, we introduce the following new functions: * `Sort`: `Const` → Primitive_Sorts * `ASignature`: `Const` → Arrow_Signatures * `BSignature`: `Const` → Boolean_Signatures `Sort` associates a sort with every symbol in `Const`. `ASignature` associates a (possibly empty) set of arrow signatures with every symbol. Similarly, `BSignature` associates a (possibly empty) set of Boolean signatures with every symbol in `Const`. The set of primitive sorts is assumed to be '''''partially ordered'''''. RIF Core defines a set of builtin sorts as well as the partial order on them. RIF dialects will add to this set and extend the order according to their needs. The function `Sort` is also defined for variables: || Sort: Var → Primitive_Sorts || It is a ''partial function'' in this case. The informal meaning is that if ?v ∈ `Var` and `Sort`(?v) = s then `?v` can be bound only to terms that belong to the sort s (we define what it means for a term to belong to a particular sort below). If `Sort`(?v) is undefined then `?v` is an unsorted variable, which can be bound to any term of any sort. '''Well-formed function terms'''. Any symbol, c ∈ `Const`, is a '''''well-formed function term of sort Sort(c)'''''. A variable, `?v`, is a well-formed term of sort `Sort(?v)`, if `Sort(?v)` is defined. If `Sort(?v)` is not defined then `?v` is a well-formed term for every sort. By induction, if `f`(`t`,,1,, ... `t`,,k,,) is a function term then it is a '''''well-formed function term of sort s''''' if there is an arrow signature `s`,,1,,, ..., `s`,,k,, → `s` ∈ `ASignature(f)` such that `t`,,1,,, ..., `t`,,k,, are well-formed function terms of sorts `s`,,1,,, ..., `s`,,k,,, respectively. If `t` is a well-formed term of sort `s` and `s'` is a supersort of `s` then `t` is also a well-formed term of sort `s'`. It is convenient to extend the mapping `Sort` from symbols in `Const` to terms created as function applications as follows: || Sort(t) = { s | t is a well-formed term of sort s } || '''Well-formed atomic formula'''. We can now say that an atomic formula `p`(t,,1,,, ..., t,,k,,) is '''''well-formed''''' if and only if `t`,,1,,, ..., `t`,,k,, are well-formed function terms and there is a Boolean signature s,,1,, × ... × s,,k,, ∈ `BSignature(p)` such that `s`,,1,, = `Sort`(`t`,,1,,), ..., `s`,,k,, = `Sort`(`t`,,k,,). From now on, we also require that all atomic formulas that occur in RIF conditions and rules must be well-formed. '''Well-formed condition'''. A condition formula is well-formed if it is constructed out of well-formed atomic formulas. [[Anchor(sec-multisorted-syntax)]] === Multisorted Syntax === Now we define how exactly the sorts are specified in RIF, i.e., how the functions `Sort`, `ASignature`, and `BSignature` are defined in practice. '''Syntax for Primitive Sorts.''' In this document we will use the following syntax, following [http://www.w3.org/TR/rdf-testcases/#ntriples N-Triples], to specify the primitive sorts, which lie in the range of the function `Sort`: {{{ "value"^^sortname }}} For instance, {{{"123"^^xsd:long}}} is a symbol that belongs to the primitive sort `xsd:long`, {{{"2007-11-23T03:55:44-02:30"^^xsd:dateTime}}} is a symbol of sort `xsd:dateTime`, and {{{"ftp://example.com/foobar"^^rif:iri}}} is a symbol that belongs to the sort `rif:iri`. The part of such symbols that occurs inside the double quotes is called a '''''literal''''' of the sorted symbol. The surrounding double quotes are not part of the literal. If a double quote must be included as part of a literal, it must be escaped with the backslash. Some frequently used sorted literals, namely strings, long integers, and decimals, will have alternative short notation, as explained below. Variables can also be sorted. A variable `?v` of sort `xsd:dateTime`, for example, is represented as {{{?v^^xsd:dateTime}}}. We will continue to use our old notation, `?v`, for unsorted variables, i.e., variables that range over the entire domain, which includes all sorts. (See [:Core/EndNotes#Note2.1-5:end note on polymorphic symbols].) The XML syntax for sorts can utilize the 'sort' attribute associated with XML term elements such as `Const`. For instance, the earlier `xsd:dateTime` example can be represented as {{{<Const sort="xsd:dateTime">2007-11-23T03:55:44-02:30</Const>}}}. A correspondingly sorted variable {{{xyz}}} would be written as {{{<Var sort="xsd:dateTime">xyz</Var>}}}. Many primitive sorts in RIF may be based on the XML Schema data types and thus it is expected that some names of RIF primitive types will be references to those XML Schema types. In this case, these sorts will be IRI strings. In this version of the RIF Core document, we define the following primitive sorts, where the prefix `xsd` refers to the XML Schema URI and `rif` is the prefix for the RIF language. The syntax such as `xsd:long` should be understood as a [http://www.w3.org/2001/sw/BestPractices/HTML/2005-10-27-CURIE compact uri], i.e., a macro, which expands to a concatenation of the character sequence denoted by the prefix `xsd` and the string `long`. In the next version of this document we will introduce a syntax for defining prefixes for compact URIs and will also expand on the syntax for the symbols that can be used to denote RIF sorts. * `xsd:long`. This sort corresponds to the XML data type `xsd:long`. Example: {{{"123"^^xsd:long}}}. Long integers also have a short notation, which does not require the {{{"..."^^xsd:long}}} wrapper. Example: `123`. * `xsd:decimal`. This sort corresponds to the XML data type `xsd:decimal`. Example: {{{"123.45"^^xsd:decimal}}}. Decimals also have an alternative short notation, which does not require the {{{"..."^^xsd:decimal}}} wrapper. Example: `123.45`. The sort `xsd:decimal` is a supersort of `xsd:long`. * `xsd:string`. This corresponds to the XML data type `xsd:string`. Example: {{{"a string"^^xsd:string}}}. Double quotes that appear inside strings are escaped with a backslash and a backslash that is supposed to appear in the string must be escaped with another backslash. Strings have also an alternative short notation. Example: `'a string'`. In this notation, single quotes and backslashes that occur inside such strings are escaped with backslashes. * `xsd:time`. This corresponds to the XML data type `xsd:time`. Example: {{{"18:33:44.2345"^^xsd:time}}}. * `xsd:dateTime`. This sort corresponds to the XML data type `xsd:dateTime`. Example: {{{"2007-03-12T21:22:33.44-01:30"^^xsd:dateTime}}}. * `rif:iri`. Symbols of this sort have the form {{{"XYZ"^^rif:iri}}}, where `XYZ` is an IRI as specified in [http://www.ietf.org/rfc/rfc3987.txt RFC 3987]. Other XML data types that are likely to be incorporated as RIF sorts include `xsd:double`, `xsd:date`, and a sort for temporal `duration`. At present, the partial order on the above sorts is imposed by the XML Schema hierarchy, so the only subsort relationship is between `xsd:long` and `xsd:decimal`. This may be extended as more sorts are added. The domain of the aforesaid primitive sorts, which correspond to XML primitive data types is defined to be the value space associated with the corresponding data type as specified in [http://www.w3.org/TR/xmlschema-2/ XML Schema Part 2: Datatypes]. This means that, for example {{{"1.2"^^xsd:decimal}}} and {{{"1.20"^^xsd:decimal}}} represent the same symbol (i.e., '''''I''''',,C,,({{{"1.2"^^xsd:decimal}}}) = '''''I''''',,C,,({{{"1.20"^^xsd:decimal}}}) in every semantic structure). The same holds regarding the equalities defined for the value spaces of other data types. The sort `rif:iri` is intended to be used in a way similar to [http://www.w3.org/TR/rdf-schema/ RDFS resources]. The domain of the sort `rif:iri` can be any set and no a priori equalities among the members of the sort `rif:iri` are assumed. This domain is ''not'' the same as the value space of the XML primitive type `anyURI`. '''Specifying Arrow and Boolean Signatures.''' Arrow signatures are declared as follows. Note that multiple sorts can be declared for the same symbol. {{{:-}}} '''signature''' ''NAME'' ''s'',,1,, * ... * ''s'',,n,, → ''s'' ''',''' ''r'',,1,, * ... * ''r'',,k,, → r ''',''' ... Boolean signatures are declared similarly: {{{:-}}} '''signature''' ''NAME'' ''s'',,1,, * ... * ''s'',,n,, ''',''' ''r'',,1,, * ... * ''r'',,k,, ''',''' ... Note that ''NAME'' is a symbol from `Const` whose syntax was specified in Section [:Core/Positive_Conditions#sec-multisorted-syntax:Multisorted Syntax]. In most cases functions and predicates are denoted by IRIs, so such a symbol will have the syntax {{{"..."^^rif:iri}}}. In many cases sorts will be XML Schema data types referred to by strings that syntactically look like IRIs (in the sense of [http://www.ietf.org/rfc/rfc3987.txt RFC 3987]), but they should '''not''' be confused with RIF ''symbols of the sort'' `rif:iri`. For instance, in `"1"^^'http://www.w3.org/2001/XMLSchema#long'`, the sort specification `http://www.w3.org/2001/XMLSchema#long` is an IRI in the sense of [http://www.ietf.org/rfc/rfc3987.txt RFC 3987], but it is not a `rif:iri` symbol, because it does not have the form `"http://www.w3.org/2001/XMLSchema#long"^^rif:iri`. '''Sorts and Subsorts.''' Stating that a sort `s1` is a subsort of `s2` means the following: * For every individual of the sort `s1`, `"x"^^s1`, the individual `"x"^^s2` is a legal member of sort `s2`, and there is an equality axiom `"x"^^s1` = `"x"^^s2`. * Every sorted variable of the form `?X^^s1` can appear in any place where a predicate or a function signature requires a term of sort `s2`. For instance, if the predicate `p` has signature `s2` * `s2` then `p(?X^^s1,?X^^s2)` is a well-formed term. There are also semantic implications to subsort relationship, which affect the definition of semantic structures. These implications are discussed in the next section. Note that a priori there is no built-in equality of the form `"a"^^s = "a"^^r` unless `s` is a subsort of `r`. However, RIF dialects that extend the core may introduce additional implicit equalities between sorted constants for various reasons. Recall that XML Schema imposes certain subsort relationships, such as that `xsd:long` is a subsort of `xsd:decimal`. The above definition of subsorts implies that for every integer number that can be represented using 64 bits, such as `12345`, there is an equality axiom `"12345"^^xsd:long = "12345"^^xsd:decimal`. In addition to what is imposed by XML Schema, RIF dialects might introduce subsort relationships of their own. For instance, although RIF Core does not require individuals to be disjoint from predicate symbols and functions, some dialects may choose to require such disjointness. This can be done by, for example, introducing disjoint sorts `iriInd`, `iriPred`, and `iriFun` all of which would be subsorts of the built-in sort `iri`. [[Anchor(sec-multisorted-semantics)]] === Semantics of the Multisorted RIF Conditions === The definition of RIF semantic structures needs only minor modifications. '''''A multisorted semantic structure''''' '''''I''''', consists of a domain '''''D''''' = '''''D''''',,s1,, ∪ ... ∪ '''''D''''',,sn,,, and the same four mappings as before. Each '''''D''''',,si,, is the domain of interpretation of the primitive sort `s`,,i,,. If one sort, `s1`, is a subsort of another sort, `s2`, then the domain '''''D''''',,s1,, is a subset of '''''D''''',,s2,,. Only the mappings '''''I''''',,C,,, '''''I''''',,V,,, and '''''I''''',,F,, require some changes. These changes are expressed as additional constraints as indicated below. * The mapping '''''I''''',,C,, from `Const` to elements of '''''D''''' is now represented as a union of the mappings '''''I''''',,C,,^s^, one for each sort `s`. '''''I''''',,C,,^s^ maps `Const`,,s,, (the set of all individuals of sort `s`) to '''''D''''',,s,, (the domain of sort `s`). Note 1: If a sort is a built-in sort, such as the sorts based on the XML data types, then both '''''D''''',,s,, and the interpretation '''''I''''',,C,,^s^ are fixed and defined by the XML Datatype specification. Note 2: If there is an equality among sorted constants, such as `"x"^^s1 = "x"^^s2` (which might be given implicitly, if `s1` is a subsort of `s2`, or explicitly, because the user said so) then `I`,,C,,(`"x"^^s1`) = `I`,,C,,(`"x"^^s2`). * '''''I''''',,V,, from `Var` to elements of '''''D'''''. ''New constraint'': If `?v` is a variable of sort `s` then '''''I''''',,V,,`(?v)` ∈ '''''D''''',,s,,. * '''''I''''',,F,, from `Const` to functions from '''''D*''''' into '''''D''''' (here '''''D*''''' is a set of all tuples of any length over the domain '''''D'''''). ''New constraint'': If `f` has an arrow signature `s`,,1,,× ...× `s`,,k,, → `s` ∈ `ASignature(f)` then '''''I''''',,F,,`(f)` must be a function of type '''''D''''',,s1,, × ... × '''''D''''',,sk,, → '''''D''''',,s,,. The latter means that if `d`,,1,, ∈ '''''D''''',,s1,,, ..., d,,k,, ∈ '''''D''''',,sk,, then '''''I''''',,F,,`(f)`(d,,1,,,...,d,,k,,) must be in '''''D''''',,s,, (but if the arguments are not in '''''D''''',,s1,, × ... × '''''D''''',,sk,, then the result does not need to be in '''''D''''',,s,,). However, since the symbol `f` can have additional signatures, '''''I''''',,F,,`(f)` can have additional types, which restrict the behavior of this function. * '''''I''''',,R,, from `Const` to truth-valued mappings '''''D*''''' → '''''Truth'''''. Using these mappings, we can define a more general mapping, ''I'', as follows: * '''''I'''''(k) = '''''I''''',,C,,(k) if k is a symbol in `Const` * '''''I'''''(?v) = '''''I''''',,V,,(?v) if ?v is a variable in `Var` * '''''I'''''(f(t,,1,, ... t,,n,,)) = '''''I''''',,F,,(f)('''''I'''''(t,,1,,),...,'''''I'''''(t,,n,,)). ''New constraint'': Here f(t,,1,, ... t,,n,,) must be a ''well-formed term'' as defined earlier. The definition of '''''I''''',,Truth,, stays the same. |
'''Interpretation of primitive data types.''' We now explain how primitive data types are integrated into the semantics of the basic RIF logic. The [http://www.w3.org/TR/xmlschema-2/ XML Schema Part 2: Datatypes] specification defines the '''''value space''''' for each XML data type, including the data types such as `xsd:decimal`, which are of interest to RIF. The value space is different from the so-called '''''lexical space'''''. Lexical space refers to the syntax of the constant symbols that belong to a particular primitive data type. For instance, {{{"1.2"^^xsd:decimal}}} and {{{"1.20"^^xsd:decimal}}} are two different constants in RIF and in the lexical space of the XML data types. However, these two constants are interpreted by the ''same'' element of the value space of the `xsd:decimal` type. Formally, each of the XML data types supported by RIF comes with a value space, denoted by '''''D''''',,type,, (for instance, '''''D''''',,xsd:decimal,,), and a mapping, '''''I''''',,C,,: `Const`,,type,, → '''''D''''',,type,,. These value spaces and the corresponding mappings are defined by the [http://www.w3.org/TR/xmlschema-2/ XML Schema Part 2: Datatypes] specification. We assume that '''''D''''',,type,, ⊆ '''''D''''' for each XML data type and that '''''D''''',,t,, is disjoint from '''''D''''',,s,, for different XML primitive types s and t. For instance, for `xsd:decimal` this mapping is '''''I''''',,C,,: `Const`,,xsd:decimal,, → '''''D''''',,xsd:decimal,,, and it follows from the aforesaid specification that, for instance, '''''I''''',,C,,({{{"1.2"^^xsd:decimal}}}) = '''''I''''',,C,,({{{"1.20"^^xsd:decimal}}}) in every semantic structure. For the `rif:iri` data type, the corresponding mapping is '''''I''''',,C,,: `Const`,,rif:iri,, → '''''D''''', i.e., '''''D''''',,rif:iri,, = '''''D''''' and the interpretation mapping is allowed to map an IRI into any value in the domain of the semantic structure. |
The language of positive RIF conditions determines what can appear as a body (the if-part) of a rule supported by the basic RIF logic. As explained in Section Overview, the Basic RIF Logic Dialect corresponds to definite Horn rules, and the bodies of such rules are conjunctions of atomic formulas without negation. However, it is well-known that disjunctions of such conditions can also be used in the bodies of rules without changing the essential properties of the rule language. This is based on the fundamental logical tautologies (h ← b ∨ c) ≡ ((h ← b) ∧ (h ← c)) and ∀ x (F ∧ G) ≡ (∀ x F ∧ ∀ x G). In other words, a rule with a disjunction in the body can be split into two or more rules that have no such disjunction.
In a later draft, positive RIF conditions will be extended to include builtins, i.e. calls to procedures defined outside the ruleset. The condition language will be shared among the bodies of the rules expressed in future RIF dialects, such as LP, FO, PR and RR. The condition language might also be used to uniformly express integrity contraints and queries. This section presents a syntax and semantics for the RIF condition language.
To make RIF dialects suitable as Web languages, RIF supports the following primitive data types: xsd:long (http://www.w3.org/2001/XMLSchema#long), xsd:string (http://www.w3.org/2001/XMLSchema#string), xsd:decimal (http://www.w3.org/2001/XMLSchema#decimal), xsd:time (http://www.w3.org/2001/XMLSchema#time), xsd:dateTime http://www.w3.org/2001/XMLSchema#dateTime), and the use of international resource identifiers (or IRIs) to refer to individuals, predicates, and functions.
To ensure extensibility and to provide for future higher-order dialects based on formalisms, such as HiLog and Common Logic, the RIF logic language does not draw sharp boundary between the symbols used to denote individuals from symbols used as names for functions or predicates. Instead, all symbols are drawn from the same universal set. RIF dialects control the contexts in which the different symbols can occur by attaching signatures to these symbols.
The Basic RIF logic dialect carefully selects signatures for the symbols so that the corresponding logic will be first-order: each symbol has a unique role as an individual, a function symbol of a particular arity, or a predicate symbol of a particular arity. However, dialects extending the basic logic will be allowed to use signatures in less restrictive fashions so that symbols could be polymorphic and be allowed to occur in several different contexts (for example, both as individuals and as predicates).
We begin by describing a syntax, which is more general than what the basic logic dialect permits. This syntax can be used in the various dialects that extend the basic RIF logic dialect. Then we introduce the notion of a signature and specify the restrictions on the way signatures are allowed to be assigned to symbols.
Syntax
Formal Syntax
The language of RIF Condition Language consists of a countably infinite set of constant symbols Const, a countably infinite set of variable symbols Var, connective symbols ∧ and ∨, quantifiers ∀ and ∃, and auxiliary symbols like (, ), and so on. The basic language construct is called term, which is formally defined as follows:
If t ∈ Const or t ∈ Var then t is a term.
If t and t1, ..., tn are terms then t(t1 ... tn) is a term
This definition is very general. It makes no distinction between constant symbols that represent individuals, predicates, and function symbols. The same symbol can occur in multiple contexts at the same time. For instance, p(p(a), p(a,b,c)). Even variables and general terms are allowed to occur in the position of predicates and function symbols. For instance, p(a)(p(b,c),q) is also a term. To control the context in which any given symbol can occur in RIF dialects, the language associates a signature with each symbol (both constant and variable symbols).
Signatures. Let SigNames be a non-empty finite or countably infinite set of signature names. We assume that this set includes at least the following signature names: i and bool. The signature name i is intended to represent the context where the constants that denote individual objects are allowed to appear. The name bool represents the context of atomic formulas. A signature is a statement of the form s{e1 ... en ...} where s ∈ SigNames is the name of the signature and {e1 ... en ...} is a set of arrow or Boolean signature expressions. This set can be empty, finite, or countably infinite. In the basic RIF logic dialect, this set will have at most one expression, but in more expressive dialects a signature can have more than one expression and in this way they can support polymorphism.
Arrow and Boolean signature expressions are defined as follows, where s1, ..., sn, and s are signature names from SigNames:
(s1 ... sn) → s, where n≥0, is an arrow signature expression. In particular, () → i and (i) → i are arrow signature expressions.
If s above is bool then the signature is called a Boolean signature expression.
A set S of signatures is said to be coherent if
it contains the signatures i{ } and bool{ }, which represent the context of individual objects and atomic formulas; and
no two different signatures in S have the same name.
Well-formed terms and formulas. RIF uses signatures to control the context in which terms can appear through the notion of well-formed terms and well-formed atomic formulas. First, as mentioned above, each symbol (constant or variable) is associated with exactly one signature from a coherent set of signatures. (Different symbols can be associated with the same signature, but one symbol cannot be associated with more than one signature.) If σ is a signature then we will use σ# to denote the name of that signature.
If s is a constant or variable with signature i{ } then it is a well-formed term with signature i{ }.
A term t(t1 ... tn) is a well-formed term with signature σ iff
t is a well-formed term with signature σ0;
Each ti is a well-formed term with signature σj, j = 1,...,n; and
σ0 contains an arrow expression of the form (σ1# ... σn#) → σ#
A term t(t1 ... tn) is a well-formed atomic formula iff
it is a well-formed term with signature bool{ }.
RIF assumes that there is a special predicate, =, which denotes equality. Like other predicates, it has a signature, which includes the Boolean expression (i i) → bool and possibly other expressions of the form (s s) → bool, where s is a signature name. No other signatures are allowed for =. We note that it is a common practice to write the atomic formulas involving = using infix notation, i.e., a = b instead of =(a,b). The equality predicate has special model-theoretic semantics, as explained in Section Model Theory for Basic RIF Condition Language.
More general formulas are constructed out of atomic formulas with the help of logical connectives. The Basic RIF Condition Language defines the following general well-formed formulas.
If φ is an atomic formula than it is also a general well-formed formula.
φ ∧ ψ is a general well-formed formula, if so are φ and ψ.
φ ∨ ψ is a general well-formed formula, if so are φ and ψ.
∃ V1 ... Vn φ is a general well-formed formula, if so is φ and V1, ..., Vn are variables.
Signatures in the Basic RIF Condition Language. The basic RIF condition language presents a much simpler picture to the user by restricting the set of well-formed terms with a very constraint coherent set of signatures. Namely, the signature set of the basic RIF logic language contains the following signatures:
i{ } and bool{ }
For every arity n, there ares signatures fn{(i ... i) → i} and pn{(i ... i) → bool} (there are n i's inside the parentheses), which represent function symbols of arity n and predicate symbols of arity n, respectively. The symbols f,,n,, and p`n are reserved signature names in the basic RIF logic dialect.
In this way, each constant symbol can be either an individual, a predicate of one particular arity, or a function symbol of one particular arity.
All variables are associated with signature i{ }, so they can range only over individuals.
The equality symbol, =, has signature p2{(i i) → bool}. In this way, the equality predicate can compare only individual constants and not predicates and function symbols.
Primitive data types. The set of all constant symbols in RIF has a number of predefined subsets, which correspond to XML data types and to some RIF-specific types. Namely, the following primitive data types are supported: xsd:long (http://www.w3.org/2001/XMLSchema#long), xsd:string (http://www.w3.org/2001/XMLSchema#string), xsd:decimal (http://www.w3.org/2001/XMLSchema#decimal), xsd:time (http://www.w3.org/2001/XMLSchema#time), xsd:dateTime http://www.w3.org/2001/XMLSchema#dateTime). In addition, a RIF-specific data type, called rif:iri (for international resource identifier or IRI), is supported. Constant symbols that belong to this data type have special concrete syntax and semantic structures will interpret them in a special way. For the formal syntax we only need to state that the set Const of all constant symbols has subsets Constxsd:long, Constxsd:string, Constxsd:decimal, Constxsd:time, Constxsd:dateTime, and Constrif:iri.
Abstract Syntax
The abstract syntax of the condition language is given in asn06 as follows:
class CONDITION
subclass And
property formula : list of CONDITION
subclass Or
property formula : list of CONDITION
subclass Exists
property declare : list of Var
property formula : CONDITION
subclass ATOMIC
class ATOMIC
subclass Equal
property side: list of TERM
subclass Uniterm
class TERM
subclass Var
property name: xsd:string
subclass Const
property name: xsd:string
subclass Uniterm
property op: Const
property arg: list of TERM
The multiplicity of the side property (role) of the Equal class is assumed to be exactly 2.
The abstract syntax is visualized by a UML diagram used as a schema-level device for specifying syntactic classes with generalizations as well as multiplicity- and role-labeled associations. Automatic asn06-to-UML transformation is available.
The central class of RIF, CONDITION, is specified recursively through its subclasses and their parts. The Equal class has two side roles. Two pairs of roles distinguish between the declare and formula parts of the Exists class, and between the operation (op) and arguments (arg) of the Uniterm class, where multiple arguments are subject to an ordered constraint (the natural "left-to-right" order):
The syntactic classes are partitioned into classes that will not be visible in serializations (written in all-uppercase letters) and classes that will be visible in instance markup (written with a leading uppercase letter only).
The three classes Var, CONDITION, and ATOMIC will be required in the abstract syntax of Horn Rules.
Concrete Syntax
The abstract syntax of Section Abstract Syntax can be instantiated to a concrete syntax. The EBNF syntax below, which instantiates the abstract syntax, is used throughout this document to explain and illustrate the main ideas. This syntax is similar in style to what in OWL is called the Abstract Syntax http://www.w3.org/TR/owl-semantics/syntax.html.
EBNF representation. The concrete human-readable syntax, described in this (slightly modified) EBNF, is work in progress and under discussion.
CONDITION ::= CONJUNCTION | DISJUNCTION | EXISTENTIAL | ATOMIC
CONJUNCTION ::= 'And' '(' CONDITION* ')'
DISJUNCTION ::= 'Or' '(' CONDITION* ')'
EXISTENTIAL ::= 'Exists' Var+ '(' CONDITION ')'
ATOMIC ::= Uniterm | Equal
Uniterm ::= Const '(' TERM* ')'
Equal ::= TERM '=' TERM
TERM ::= Const | Var | Uniterm
Const ::= CONSTNAME | '"'CONSTNAME'"''^^'TYPENAME
Var ::= '?'VARNAME
The above is a standard syntax for a variant of first-order logic. The application of a symbol from Const to a sequence of terms is called Uniterm ("Universal term") since it can be used to play the role of a function term or an atomic formula depending on the syntactic context in which the application occurs. The non-terminal ATOMIC stands for (positive) atomic formula, which can later be complemented with "negated atomic formula". EXISTENTIAL stands for "existential formula", which in Horn-like conditions is the only quantified formula but in later conditions may be complemented with "universal formula" (Var+ denotes the list of free variables in CONDITION). The production CONJUNCTION defines conjunctions of conditions, and DISJUNCTION denotes disjunctions. Finally, CONDITION assembles everything into what we call RIF conditions. RIF dialects will extend these conditions in various ways.
Note that individuals, function symbols, and predicate symbols all belong to the same set of symbols (Const). This syntax is more general than what the basic RIF logic dialect actually permits. As explained in Section Formal Syntax a set of signatures restricts this syntax to allow only the terms that are allowed in standard firts-order logic.
The above EBNF, where TERM* is understood as TERM* ::= | TERM TERM*, uses spaces to indicate Lisp-like whitespace (SPACE, TAB, or NEWLINE) character sequences in the defined language; these must consist of at least one whitespace character when used as the separator between adjacent TERMs, and can consist of zero or more whitespace characters elsewhere. Where spaces are omitted from the EBNF, as between '?' and VARNAME in '?'VARNAME, there must be no whitespace character in the defined language.
At this point we do not commit to any particular vocabulary for the names of constants and variables. These are still assumed to be alphanumeric character sequences (except 'And' and 'Or').
Note that variables in the RIF Condition Language can be free and quantified. All quantification is explicit. All variables introduced by quantification should also occur in the quantified formula. Initially, only existential quantification is used. Universal quantification will be introduced later. We adopt the usual scoping rules for quantifiers from first-order logic. Variables that are not explicitly quantified are free.
Free variables arise because CONDITION can occur in an if part of a rule. When this happens, the free variables in a condition formula are precisely those variables that also occur in the then part of the rule. We shall see in Section Horn Rules that such variables are quantified universally outside of the rule, and the scope of such quantification is the entire rule. For instance, the variable ?Y in the first RIF condition of Example 1 is quantified, existentially, but ?X is free. However, when this condition occurs in the if part of the second rule in Example 1, then this variable is quantified universally outside of the rule. (The precise syntax for RIF rules will be given in Section Horn Rules.)
Example 1 (A RIF condition in and outside of a rule)
RIF condition:
Exists ?Y (condition(?X ?Y))
RIF Horn rule:
Forall ?X (then(?X) :- Exists ?Y (condition(?X ?Y)))
When RIF conditions are used as queries, their free variables carry answer bindings back to the caller.
XML serialization. The following is a possible XML-serializing mapping of the abstract syntax in Section Abstract Syntax (and of the above EBNF syntax). This XML serialization is fully striped, where positional information is optionally exploited only for the arg role elements. For example, role elements (declare and formula) are explicit within the Exists element. Following the examples of Java and RDF, we use capitalized names for class elements and names that start with lowercase for role elements.
The all-uppercase non-terminals in EBNF, such as CONDITION, become XML entities. They act like macros and are not visible in instance markup. The other non-terminals and symbols (such as Exists or =) become XML elements, as shown below.
- And (conjunction) - Or (disjunction) - Exists (quantified formula for 'Exists', containing declare and formula roles) - declare (declare role, containing a Var) - formula (formula role, containing a CONDITION formula) - Uniterm (atomic or function-expression formula) - op (operation role) - arg (argument role, positional/non-positional without/with 'index' attribute) - Equal (prefix version of term equation '=') - side (left-hand and right-hand side role) - Const (individual, function symbol, or relation/predicate symbol) - Var (logic variable)
The following example illustrates XML serialization of RIF conditions.
Example 2 (A RIF condition and its XML serialization):
a. RIF condition:
And ( Exists ?Buyer (purchase(?Buyer ?Seller book(?Author LeRif) USD(49))) ?Seller=?Author )
b. XML serialization:
<And>
<formula>
<Exists>
<declare><Var>Buyer</Var></declare>
<formula>
<Uniterm>
<op><Const>purchase</Const></op>
<arg><Var>Buyer</Var></arg>
<arg><Var>Seller</Var></arg>
<arg>
<Uniterm>
<op><Const>book</Const></op>
<arg><Var>Author</Var></arg>
<arg><Const>LeRif</Const></arg>
</Uniterm>
</arg>
<arg>
<Uniterm>
<op><Const>USD</Const></op>
<arg><Const>49</Const></arg>
</Uniterm>
</arg>
</Uniterm>
</formula>
</Exists>
</formula>
<formula>
<Equal>
<side><Var>Seller</Var></side>
<side><Var>Author</Var></side>
</Equal>
</formula>
</And>
Syntax for Primitive Types. In this document we will use the following syntax, following N-Triples, to specify the constant symbols that belong to primitive data types:
"value"^^typename
For instance, "123"^^xsd:long is a symbol that belongs to the primitive data type xsd:long, "2007-11-23T03:55:44-02:30"^^xsd:dateTime is a symbol of data type xsd:dateTime, and "ftp://example.com/foobar"^^rif:iri is a symbol that belongs to the data type rif:iri. The part of such symbols that occurs inside the double quotes is called a literal of the symbol. The surrounding double quotes are not part of the literal. If a double quote must be included as part of a literal, it must be escaped with the backslash.
The XML syntax for primitive types can utilize the type attribute associated with XML term elements such as Const. For instance, the earlier xsd:dateTime example can be represented as <Const type="xsd:dateTime">2007-11-23T03:55:44-02:30</Const>.
Many primitive types in RIF may be based on the XML Schema data types and thus it is expected that some names of RIF primitive types will be references to those XML Schema types. In this case, these IRIs will be used as type names.
In this version of the RIF Core document, we define the following primitive types, where the prefix xsd refers to the XML Schema URI and rif is the prefix for the RIF language. The syntax such as xsd:long should be understood as a compact uri, i.e., a macro, which expands to a concatenation of the character sequence denoted by the prefix xsd and the string long. In the next version of this document we will introduce a syntax for defining prefixes for compact URIs and will also expand on the syntax for the symbols that can be used to denote RIF's primitive data types.
xsd:long. This data type corresponds to the XML data type xsd:long. Example: "123"^^xsd:long. Long integers also have a short notation, which does not require the "..."^^xsd:long wrapper. Example: 123.
xsd:decimal. This corresponds to the XML data type xsd:decimal. Example: "123.45"^^xsd:decimal. Decimals also have an alternative short notation, which does not require the "..."^^xsd:decimal wrapper. Example: 123.45.
The type xsd:decimal is a supertype of xsd:long.
xsd:string. This corresponds to the XML data type xsd:string. Example: "a string"^^xsd:string. Double quotes that appear inside strings are escaped with a backslash and a backslash that is supposed to appear in the string must be escaped with another backslash. Strings have also an alternative short notation. Example: 'a string'. In this notation, single quotes and backslashes that occur inside such strings are escaped with backslashes.
xsd:time. This corresponds to the XML data type xsd:time. Example: "18:33:44.2345"^^xsd:time.
xsd:dateTime. This type corresponds to the XML data type xsd:dateTime. Example: "2007-03-12T21:22:33.44-01:30"^^xsd:dateTime.
rif:iri. Symbols of this type have the form "XYZ"^^rif:iri, where XYZ is an IRI as specified in RFC 3987.
Other XML data types that are likely to be incorporated in RIF include xsd:double, xsd:date, and a type for temporal duration. At present, the partial order on the above primitive data types is imposed by the XML Schema hierarchy, so the only subtype relationship is between xsd:long and xsd:decimal. This may be extended as more types are added.
The type rif:iri is intended to be used in a way similar to RDFS resources. The domain of the type rif:iri can be any set and no a priori equalities among the members of the type rif:iri are assumed. This domain is not the same as the value space of the XML primitive type anyURI.
The constant symbols that correspond to XML data types all have the signature i{ } in the basic RIF logic dialect. The symbols of type rif:iri can have any signature allowed by the basic RIF logic: i, fn, or pn, for n = 0,1,....
The purchase atomic formula of Example 2 can be enriched with primitive types, obtaining the EBNF version
purchase(?Buyer ?Seller book(?Author "LeRif"^^xsd:string) USD("49"^^xsd:long))
and its XML serialization
<Uniterm>
<op><Const>purchase</Const></op>
<arg><Var>Buyer</Var></arg>
<arg><Var>Seller</Var></arg>
<arg>
<Uniterm>
<op><Const>book</Const></op>
<arg><Var>Author</Var></arg>
<arg><Const type="xsd:string">LeRif</Const></arg>
</Uniterm>
</arg>
<arg>
<Uniterm>
<op><Const>USD</Const></op>
<arg><Const type="xsd:long">49</Const></arg>
</Uniterm>
</arg>
</Uniterm>
Specifying Signatures in RIF Dialects
We will now explain how signatures are defined in the Basic Condition Language and its extensions. In the Basic RIF Logic dialect, there is no need to declare signatures, since they can be inferred. Indeed, the basic logic dialect assumes that each symbol is associated with a unique signature. Therefore, the signature can be determined by the context in which the symbol is used. If a symbol is used in more than one context, the parser should deem it as a syntax error.
In dialects that extend the Basic RIF Condition Language, signature inference of the above kind is not possible, in general, and might not always be appropriate even when it is possible. For this reason, we intriduce the attribute signature for the XML tags Const and Var. The value of such a tag is supposed to be the signature name for the corresponding symbol. For instance, in the above example, purchase was a 4-ary predicate, so it could be specified as <Const signature="p4">purchase</Const>. Since variables in the basic condition language always have the signature i, we could write <Var signature="i">Author</Var>.
The signature attribute is optional, and its value can be inferred for the basic language. Dialects that extend the basic logic dialect can adopt their own rules for omitting this attribute and for inferring its values. The basic language does not need any special syntax for defining signatures -- all signature names are already defined: i, f0, f1, p0, p1, etc. The dialects, however, will need a special sublanguage for defining signatures. The signature
mysig{
(s1 s2 s3) -> bool,
(s i) -> i
}
can, for example, be defined using the following XML excerpt:
<rif:signature rif:signature-name="mysig">
<rif:sig-expression rif:sig-head="bool">
s1 s2 s3
</rif:sig-expression>
<rif:sig-expression rif:sig-head="i">
s i
</rif:sig-expression>
</rif:signature>
Model Theory for the Basic RIF Condition Language
The first step in defining a model-theoretic semantics for a logic-based language is to define the notion of a semantic structure, also known as an interpretation. (See end note on the rationale.)
Semantic structures. A semantic structure, I, is a tuple of the form <D,IC, IV, IF, IR>, which determines the truth value of every formula, as explained below. Currently, by formula we mean anything produced by the CONDITION production in the EBNF. Later on this term will also include rules. Other dialects might extend this notion even further.
The set of truth values is denoted by TV. For the basic RIF logic dialect, TV includes only two values, t (true) and f (false). (See end note on truth values.)
The set TV has a total or partial order, called the truth order; it is denoted with <t. In the basic RIF logic, f <t t, and it is a total order. (See end note on ordering truth values.)
To define how semantic structures determine the truth values of RIF formulas, we introduce the following sets:
D - a non-empty set of elements called the domain of I,
Const - the set of individuals, predicate names, and function symbols,
Var - the set of variables.
The mappings that constitute an interpretation, I, are as follows:
IC from Const to elements of D
IV from Var to elements of D
IF from Const to functions from D* into D (here D* is a set of all tuples of any length over the domain D)
IR from Const to truth-valued mappings D* → TV
It is convenient to define a more general mapping, I, based on the mappings IC, IV, IF, and IR:
I(k) = IC(k) if k is a symbol in Const
I(?v) = IV(?v) if ?v is a variable in Var
I(f(t1 ... tn)) = IF(f)(I(t1),...,I(tn))
Note that f() and f can be interpreted differently by semantic structures, and the proposed XML serialization also treats them as distinct terms: the nullary function application <Uniterm><Const>f</Const></Uniterm> vs. the symbol <Const>f</Const>. Semantically, neither f() = f nor f() ≠ f are entailed a priori. However, if desirable, a particular application or a dialect may introduce axioms of the form f = f() to equate symbols and nullary terms.
Truth valuation for formulas. Observe that the notion of signatures from Section Formal Syntax is used only to constrain the syntax and does not appear in the definition of the semantic structure. This is because when we define truth valuations for formulas, below, all formulas are assumed to be well-formed.
Truth valuation for well-formed formulas in the basic RIF condition language is determined using the following function, denoted ITruth:
Atomic formulas: ITruth(r(t1 ... tn)) = IR(r)(I(t1),...,I(tn))
Equality: ITruth(t1=t2) = t iff I(t1) = I(t2); ITruth(t1=t2) = f otherwise.
Conjunction: ITruth(And(c1 ... cn)) = mint(ITruth(c1),...,ITruth(cn)), where mint is minimum with respect to the truth order.
Disjunction: ITruth(Or(c1 ... cn)) = maxt(ITruth(c1),...,ITruth(cn)), where maxt is maximum with respect to the truth order.
Quantification: ITruth(Exists ?v1 ... ?vn (c)) = maxt(I*Truth(c)), where maxt is taken over all interpretations I* of the form <D,IC, I*V, IF, IR>, where I*V is the same as IV except possibly on the variables ?v1,...,?vn (i.e., I* agrees with I everywhere except possibly in its interpretation of the variables ?v1 ... ?vn).
Notice that the mapping ITruth is uniquely determined by the four mapping comprising I and, therefore, it does not need to be listed explicitly.
Interpretation of primitive data types. We now explain how primitive data types are integrated into the semantics of the basic RIF logic.
The XML Schema Part 2: Datatypes specification defines the value space for each XML data type, including the data types such as xsd:decimal, which are of interest to RIF. The value space is different from the so-called lexical space. Lexical space refers to the syntax of the constant symbols that belong to a particular primitive data type. For instance, "1.2"^^xsd:decimal and "1.20"^^xsd:decimal are two different constants in RIF and in the lexical space of the XML data types. However, these two constants are interpreted by the same element of the value space of the xsd:decimal type.
Formally, each of the XML data types supported by RIF comes with a value space, denoted by Dtype (for instance, Dxsd:decimal), and a mapping, IC: Consttype → Dtype. These value spaces and the corresponding mappings are defined by the XML Schema Part 2: Datatypes specification. We assume that Dtype ⊆ D for each XML data type and that Dt is disjoint from Ds for different XML primitive types s and t. For instance, for xsd:decimal this mapping is IC: Constxsd:decimal → Dxsd:decimal, and it follows from the aforesaid specification that, for instance, IC("1.2"^^xsd:decimal) = IC("1.20"^^xsd:decimal) in every semantic structure.
For the rif:iri data type, the corresponding mapping is IC: Constrif:iri → D, i.e., Drif:iri = D and the interpretation mapping is allowed to map an IRI into any value in the domain of the semantic structure.