RIF example UC10: Publishing Rules for Interlinked Metadata

Summary

This is an attempt to encode typical rules for the Use case Publishing_Rules_for_Interlinked_Metadata in the XML syntax of RIF Core WD1, slightly extended where it appears necessary.

We analyze what can be expressed, present two possible source formats for rules of the kind used in this use case (N3 and SPARQL CONSTRUCT statements), and describe their mapping to RIF. We suggest or define extensions where necessary.

Source rules

For a first attempt, we focus here on rules of the second block in the Use Case scenario since they cover interesting aspects such as scope and scoped negation.

Consider an alternative database of movies published at http://altmd.example.org. In addition to metadata, it publishes the following rules:

r1: All movies listed at http://altmd.example.org but not listed at http://imd.example.org are independent movies.

r2: All movies with budgets below 5 million USD are low-budget movies.
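
Read operationally, r1 is a difference between the movies listed by the two sources, and r2 is a threshold test on the budget. A minimal Python sketch of that reading follows; the movie identifiers, budget figures, and function names are invented for illustration and are not part of the use case.

```python
# Toy data: which movies each source lists, and known budgets in USD.
# All identifiers and figures are invented for illustration.
LISTED = {
    "http://altmd.example.org": {"m1", "m2", "m3"},
    "http://imd.example.org": {"m2"},
}
BUDGET_USD = {"m1": 2_000_000, "m2": 80_000_000, "m3": 6_000_000}


def independent_movies(listed):
    """r1: listed at altmd but not listed at imd (negation scoped to imd)."""
    return listed["http://altmd.example.org"] - listed["http://imd.example.org"]


def low_budget_movies(budget_usd, limit=5_000_000):
    """r2: movies whose budget is strictly below the limit."""
    return {m for m, b in budget_usd.items() if b < limit}


print(sorted(independent_movies(LISTED)))
print(sorted(low_budget_movies(BUDGET_USD)))
```

Note that the negation in r1 only looks at what one named source lists; it makes no claim about movies unknown to both sources.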

To show that these rules deserve consideration, we first look at how they can already be expressed in existing syntaxes. The two chosen formalisms allow us to express the desired rules, although their exact semantics is (in both cases) still tied to several open issues.

source rules in N3

@prefix log: <http://www.w3.org/2000/10/swap/log#> .
@prefix math: <http://www.w3.org/2000/10/swap/math#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://www.example.org/> .
 
{ <http://altmd.example.org> log:semantics ?AM .
  ?AM log:includes {?m a ex:movie.} .
  <http://imd.example.org> log:semantics ?IM.
  ?IM log:notIncludes {?m a ex:movie.} . }
           log:implies {?m a ex:indepMovie} .

{ ?m a ex:movie. ?m ex:hasBudget ?b. ?b math:lessThan "5000000"^^xsd:integer .}
log:implies {?m a ex:lowBudgetMovie} .

Semantically unclear in N3 are the nesting of log: statements (such as log:semantics) and how mutual references between N3 documents are dealt with.

source rules in SPARQL

As pointed out in a mail conversation between Dan Connolly and Axel Polleres, see http://lists.w3.org/Archives/Public/public-rif-wg/2006Oct/0030, SPARQL CONSTRUCT statements (probably mixed with RDF facts) can be seen as rule bases. For the scoping we can use named graphs in SPARQL.

PREFIX ex: <http://www.example.org/>

CONSTRUCT { ?m a ex:indepMovie . }
WHERE { GRAPH <http://altmd.example.org> { ?m a ex:movie . }
        OPTIONAL { GRAPH <http://imd.example.org> { ?m a ex:movie . ?x a ex:movie . } }
        FILTER (!bound(?x))
      }

PREFIX ex: <http://www.example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

CONSTRUCT { ?m a ex:lowBudgetMovie . }
WHERE { ?m a ex:movie . ?m ex:hasBudget ?b . FILTER (?b < "5000000"^^xsd:integer) }

For the first of the two CONSTRUCT queries to work as expected, one needs to assume that both <http://altmd.example.org> and <http://imd.example.org> are among the named graphs of the dataset against which the SPARQL queries are evaluated. The first query also needs a trick to express scoped negation: it refers to an otherwise unnecessary variable ?x inside an OPTIONAL GRAPH pattern and then filters for solutions in which ?x is unbound.
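
The OPTIONAL/unbound trick can be mimicked outside SPARQL to see why it amounts to scoped negation. The Python sketch below is only an illustration under simplifying assumptions: named graphs are plain sets of triples, the data is invented, and the function names are hypothetical. It is not a SPARQL implementation.

```python
# Named graphs as sets of (subject, predicate, object) triples; toy data only.
DATASET = {
    "http://altmd.example.org": {("m1", "rdf:type", "ex:movie"),
                                 ("m2", "rdf:type", "ex:movie")},
    "http://imd.example.org": {("m2", "rdf:type", "ex:movie")},
}


def movies(graph):
    """Subjects ?s matching the pattern { ?s a ex:movie } in one graph."""
    return {s for (s, p, o) in graph if p == "rdf:type" and o == "ex:movie"}


def indep_movies(dataset):
    altmd = movies(dataset["http://altmd.example.org"])
    imd = movies(dataset["http://imd.example.org"])
    results = []
    for m in sorted(altmd):
        # OPTIONAL part: it only matches if ?m itself is a movie in imd,
        # and it then binds the helper variable ?x.
        optional = [{"m": m, "x": x} for x in sorted(imd) if m in imd]
        # Left join: keep the solution without ?x when the optional part fails.
        solutions = optional or [{"m": m}]
        # FILTER (!bound(?x)): keep only solutions where ?x stayed unbound.
        results.extend(s["m"] for s in solutions if "x" not in s)
    return results


print(indep_movies(DATASET))
```

Only movies for which the optional part fails survive the filter, which is exactly the set of movies in the first graph that are absent from the second.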

Analysis and issues

In the following, we make a plain attempt to express both rules in the RIF syntax. Even putting aside the semantic unclarities of N3 and SPARQL, it remains an open issue how to actually define the mappings from/to RIF for these example rules. This seems more challenging than merely deciding whether a rule is in principle expressible in RIF syntax.

RDF triple mapping

As already pointed out in the UC8 Worked Example, there are various possibilities for RDF mappings to predicates/rules. We chose here the option of representing the RDF triples by binary relations where the predicate symbol is determined by the RDF predicate.
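
As a rough illustration of this choice, the following Python sketch groups triples into binary relations keyed by the RDF predicate. The sample triples and the function name are assumptions made for the example.

```python
from collections import defaultdict

# Invented sample triples: (subject, predicate, object).
TRIPLES = [
    ("ex:m1", "rdf:type", "ex:movie"),
    ("ex:m1", "ex:hasBudget", 2000000),
    ("ex:m2", "rdf:type", "ex:movie"),
]


def to_binary_relations(triples):
    """Map each RDF predicate to a binary relation of (subject, object) pairs."""
    relations = defaultdict(set)
    for s, p, o in triples:
        relations[p].add((s, o))
    return dict(relations)


relations = to_binary_relations(TRIPLES)
# The triple (ex:m1, rdf:type, ex:movie) becomes the atom rdf:type(ex:m1, ex:movie).
print(("ex:m1", "ex:movie") in relations["rdf:type"])
```

Under this mapping the predicate position cannot be a variable, which is one reason why other mappings (for instance a single ternary relation) are also discussed.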

RIF in XML or RDF ?

The UC8 Worked Example suggests a mixed RDF and XML Literal proposal to implement the RIF syntax, i.e., expressing conditions as XML Literals, whereas the overall rule structure and the ruleset are expressed in an RDF syntax. Let us try to go one step further and see whether RDF could be used down to the conditions. A triple pattern, or any other n-ary predicate, would then be encoded as:

  <rif:Uniterm>
    <rif:Const rdf:resource="&rdf;type" />
    <rif:Parameters rdf:parseType="Collection">
      <rif:Var>m</rif:Var>
      <rif:Const rdf:resource="&ex;movie" />
    </rif:Parameters>
  </rif:Uniterm>

As for datatypes, going down to RDF would allow reusing the existing rdf:datatype attribute directly, without the need to create a new attribute:

<Const rdf:datatype="&xsd;integer">5000000</Const>

However, we already see that going down to RDF on the condition/atom level would have several complicating implications: 1. In order to preserve parameter order, we need to introduce a collection and a new property/tag for parameters, which boils down to a list in RDF. 2. An RDF syntax, together with allowing RDF as data, has several semantic implications which already proved problematic at several points, e.g. in OWL Full. We thus suggest staying with the XML syntax and supporting clear semantic interfaces to RDF data and OWL ontologies.

So we suggest keeping the XML notation also at the rule and ruleset level, ending up with:

  <rif:Uniterm>
    <rif:Const rif:resource="&rdf;type" />
    <rif:Var>m</rif:Var>
    <rif:Const rif:resource="&ex;movie" />
  </rif:Uniterm>

<rif:Const rif:datatype="&xsd;integer">5000000</rif:Const>

Note the attribute names borrowed from RDF: analogously to the RDF syntax, we use the attributes rif:resource and rif:datatype for referring to URI-identified objects and concrete datatypes.

Context/Scope

For this use case, atoms need to be scoped. A quick straw proposal to achieve this would be to annotate <rif:Uniterm> with an additional attribute rif:context:

  <rif:Uniterm rif:context="http://altmd.example.org">
    <rif:Const rif:resource="&rdf;type" />
    <rif:Var>m</rif:Var>
    <rif:Const rif:resource="&ex;movie" />
  </rif:Uniterm>
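
To see how such scoped atoms might be produced mechanically, here is a small Python sketch using the standard xml.etree.ElementTree module. The helper name scoped_uniterm is hypothetical, and the element and attribute names simply follow the straw proposal above; nothing here is a fixed RIF API.

```python
import xml.etree.ElementTree as ET

# Namespace URIs; the rif: URI is the one used in the translation's DOCTYPE.
RIF = "http://www.w3.org/2006/10/rif#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://www.example.org/"
ET.register_namespace("rif", RIF)


def scoped_uniterm(context, predicate, var_name, obj):
    """Build <rif:Uniterm rif:context=...> for the atom predicate(?var, obj)."""
    atom = ET.Element(f"{{{RIF}}}Uniterm", {f"{{{RIF}}}context": context})
    ET.SubElement(atom, f"{{{RIF}}}Const", {f"{{{RIF}}}resource": predicate})
    ET.SubElement(atom, f"{{{RIF}}}Var").text = var_name
    ET.SubElement(atom, f"{{{RIF}}}Const", {f"{{{RIF}}}resource": obj})
    return atom


atom = scoped_uniterm("http://altmd.example.org", RDF + "type", "m", EX + "movie")
print(ET.tostring(atom, encoding="unicode"))
```

The serializer writes full URIs into the attribute values, whereas the hand-written examples use entity references such as &rdf; as a shorthand for the same strings.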

Variables

As a slight change to the original syntax, we could imagine variable ids also being represented as attribute values, i.e.:

    <rif:Var rif:varID="m"/>

but we will stick with the original proposal for the moment.

(Scoped) Negation as failure

For scoped negation as failure, we assume the tag <rif:naf>.

Builtins

For builtins we run into problems similar to those in UC8: apart from equality, no comparison operators are defined yet.

For our purposes we could use the XPath/XQuery functions, which would however require us to provide a namespace for the op: prefix used in the XPath functions document:

<rif:Const rif:resource="&op;numeric-greater-than" />
<rif:Const rif:datatype="&xsd;integer">5000000</rif:Const>
<rif:Var>b</rif:Var>
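
The argument order matters here: the atom reads numeric-greater-than(5000000, ?b), i.e. the budget must be below the constant. A hedged Python sketch of how a rule engine might dispatch such a builtin is given below; the lookup table and the eval_builtin helper are hypothetical, and only the operator names are taken from the XPath/XQuery functions document.

```python
import operator

# Hypothetical lookup table from builtin names to Python comparison functions.
BUILTINS = {
    "op:numeric-greater-than": operator.gt,
    "op:numeric-less-than": operator.lt,
    "op:numeric-equal": operator.eq,
}


def eval_builtin(name, args, bindings):
    """Evaluate a builtin atom; arguments written as '?v' are looked up in bindings."""
    values = [
        bindings[a[1:]] if isinstance(a, str) and a.startswith("?") else a
        for a in args
    ]
    return BUILTINS[name](*values)


# numeric-greater-than(5000000, ?b) holds exactly when ?b is below 5 million.
print(eval_builtin("op:numeric-greater-than", [5000000, "?b"], {"b": 2000000}))
```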

Note that nowhere in the rules above is it stated that we are talking about a USD amount. This would need to be stated separately somehow (and was left open in this first attempt).

bNodes

bnodes are not an issue in this example.

Rule naming

Rule naming is not an issue in this example, so we don't give names to the rules. Naming would be NECESSARY if we were to stick to an RDF syntax, since otherwise rules could not be disambiguated.

RIF Translation

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Ruleset [
    <!ENTITY xsd  'http://www.w3.org/2001/XMLSchema#'>
    <!ENTITY rdf  'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
    <!ENTITY rif  "http://www.w3.org/2006/10/rif#" >
    <!ENTITY  ex  "http://www.example.org/" >
    <!ENTITY  op  "http://TO-BE-DEFINED/op#" >
]>

<Ruleset xmlns="&rif;">

<Forall>
  <!-- Rule variables, universally quantified in declare role, skipping formula role -->
  <declare>
       <Var>m</Var>
  </declare>

  <Implies>

     <!-- rule body -->
     <if>
              <And>
                <Uniterm context="http://altmd.example.org">
                  <Const resource="&rdf;type" />
                  <Var>m</Var>
                  <Const resource="&ex;movie" />
                </Uniterm>
                <naf><Uniterm context="http://imd.example.org">
                  <Const resource="&rdf;type" />
                  <Var>m</Var>
                  <Const resource="&ex;movie" />
                </Uniterm></naf>
              </And> 
     </if>

     <!-- rule head -->
     <then> 
               <Uniterm>
                  <Const resource="&rdf;type" />
                  <Var>m</Var>
                  <Const resource="&ex;indepMovie" />
               </Uniterm>
     </then>
  </Implies>
</Forall>

<Forall>
   <!-- Rule variables, universally quantified in declare role, skipping formula role -->
  <declare>
       <Var>m</Var>
       <Var>b</Var>
  </declare>

  <Implies>

     <!-- rule body -->
     <if>
              <And>
                <Uniterm>
                  <Const resource="&rdf;type" />
                  <Var>m</Var>
                  <Const resource="&ex;movie" />
                </Uniterm>
                <Uniterm>
                  <Const resource="&ex;hasBudget" />
                  <Var>m</Var>
                  <Var>b</Var>
                </Uniterm>               
                <Uniterm>
                  <Const resource="&op;numeric-greater-than" />
                  <Const datatype="&xsd;integer">5000000</Const>
                  <Var>b</Var>
                </Uniterm>
              </And> 
     </if>

     <!-- rule head -->
     <then> 
               <Uniterm>
                  <Const resource="&rdf;type" />
                  <Var>m</Var>
                  <Const resource="&ex;lowBudgetMovie" />
               </Uniterm>
     </then>
  </Implies>
</Forall>
</Ruleset>

Although the rules can now be written down in RIF, it is still open how to get, e.g., from the N3 representation to SPARQL via RIF, or vice versa, and whether the intermediate format would be the one suggested here. The mappings remain to be defined, and whether and how this can be done remains to be discussed.