Metalog - Querying RDF data models

Massimo Marchiori, Janne Saarela
{massimo,jsaarela}@w3.org
World Wide Web Consortium

The Resource Description Framework (RDF) Model&Syntax Specification describes a metadata infrastructure which can accommodate classification elements from different vocabularies i.e. schemas. The underlying model consists of a labeled directed acyclic graph which can be linearized into eXtensible Markup Language (XML) transfer syntax for interchange between applications.

This paper will demonstrate how a new querying language, Metalog, allows users to write simple logic programs whose expressive power equals that of Datalog logic programs. A Metalog query itself can also be presented as an RDF Schema, which allows distributed usage and refinement of Metalog queries through URI addressing on the Web.

The RDF data model also supports higher-order statements using a special reification construct. We will show how higher-order modalities can also be expressed in Metalog so that the evaluation of queries can easily be implemented with resolution strategies such as SLD or SLDNF, or with semi-naive evaluation.

Finally, we present some practical results which demonstrate that our approach is feasible and extensible to more general logic programs in the future.

Introduction

Resource Description Framework

An RDF data model

Figure X. An RDF data model

Let's go one step further. The data model is a labeled directed graph. It can also be represented as predicates that correspond to the arcs of the data model, each connecting two nodes. The example in Figure X corresponds to 25 triples, some of which are presented in the following excerpt:

triple("http://www.w3.org/schemas/RDFschema#instanceOf","Authors","http://www.w3.org/schemas/RDFschema#Sequence").
triple("http://www.w3.org/schemas/RDFschema#1","Authors","Tim Bray").
triple("http://www.w3.org/schemas/RDFschema#2","Authors","Jean Paoli").
triple("http://www.w3.org/schemas/RDFschema#3","Authors","C. M. Sperberg-McQueen").
triple("http://www.w3.org/schemas/RDFschema#PropName","genid5","http://purl.org/RDF/DC#Creator").
triple("http://www.w3.org/schemas/RDFschema#PropObj","genid5","http://www.w3.org/TR/REC-xml").
triple("http://www.w3.org/schemas/RDFschema#PropValue","genid5","Authors").
triple("http://www.w3.org/schemas/RDFschema#instanceOf","genid5","http://www.w3.org/schemas/RDFschema#Property").

Figure N. Triple representation of RDF data model

The structure of the triples is as follows:

triple(property,propertyObject,propertyValue).
which can also be read as
property(propertyObject,propertyValue).

The example in Figure N shows how RDF deals with higher-order statements. When one statement asserts something about another statement, RDF does not allow triples to be nested. Instead, it uses a mechanism called reification, which provides a unique id for any assertion and thus allows it to be referred to from other triples. This mechanism also bounds the recursion, because reified properties cannot themselves be reified further.

The first triple below corresponds to the four reification triples that follow it:

triple(property,propertyObject,propertyValue).
triple('instanceOf',newID,'Property').
triple('PropName',newID,property).
triple('PropObj',newID,propertyObject).
triple('PropValue',newID,propertyValue).
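
As an illustration, the mapping from one triple to its four reification triples can be sketched in Python. This is a minimal sketch and not part of any RDF tool: the function name reify, the genid counter, and the tuple layout (property, object, value) are assumptions made for this example only.

```python
from itertools import count

RDF = "http://www.w3.org/schemas/RDFschema#"
_ids = count(1)  # hypothetical generator for fresh node ids


def reify(prop, obj, value, new_id=None):
    """Return the four triples that describe the statement prop(obj, value)."""
    node = new_id if new_id is not None else f"genid{next(_ids)}"
    return [
        (RDF + "instanceOf", node, RDF + "Property"),
        (RDF + "PropName", node, prop),
        (RDF + "PropObj", node, obj),
        (RDF + "PropValue", node, value),
    ]


# Reproduces the genid5 triples of the excerpt shown earlier.
triples = reify(
    "http://purl.org/RDF/DC#Creator",
    "http://www.w3.org/TR/REC-xml",
    "Authors",
    new_id="genid5",
)
for t in triples:
    print(t)
```

Because the reified statement gets its own id, any other triple can use that id as its object or value without nesting triples inside each other.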

Syntax

RDF could have chosen a special syntax, but due to the popularity of the XML document encoding syntax, the decision was to build RDF on top of XML. This relieves RDF of some specification work, for example internationalization (I18N), which XML defines to be based on Unicode.

The following example presents the XML encoding of the data model presented in Figure X.

<?xml version="1.0"?>
<?xml:namespace ns="http://www.w3.org/schemas/RDFschema" prefix="RDF"?>
<?xml:namespace ns="http://www.w3.org/schemas/WCschema" prefix="WC"?>
<?xml:namespace ns="http://purl.org/RDF/DC" prefix="DC"?>
<RDF:RDF>
<RDF:description about="http://www.w3.org/TR/REC-xml">
  <WC:Status about="http://www.w3.org/schemas/WCschema#REC"/>
  <DC:Title>Extensible Markup Language (XML) 1.0 Specification</DC:Title>
  <DC:Date>10-February-1998</DC:Date>
  <DC:Creator>
    <RDF:Seq ID="Authors">
      <RDF:LI>Tim Bray</RDF:LI>
      <RDF:LI>Jean Paoli</RDF:LI>
      <RDF:LI>C. M. Sperberg-McQueen</RDF:LI>
    </RDF:Seq>
  </DC:Creator>
  <WC:Language>en</WC:Language>
</RDF:description>
</RDF:RDF>

Figure Y. XML encoding of the data model in Figure X.

Query Languages

In general, query languages are formal languages for retrieving data from a database. Standardized languages already exist for retrieving information from different types of databases, such as Structured Query Language (SQL) for relational databases and Object Query Language (OQL) and SQL3 for object databases.

Semi-structured query languages such as XML-QL [ref] operate on the document-level structure....

Logic programs consist of facts and rules where valid inference rules are used to determine all the facts that apply within a given model.

With RDF, the most suitable approach is to focus on the underlying data model. Even though XML-QL could be used to query RDF descriptions in their XML encoded form,

Metalog

We feel that the query language we are about to propose and any other to be widely deployed on the Web must address the following requirements:

  1. The query language must be easy to author
  2. The query language must have clear semantics
  3. The evaluation environment must be easy to implement
  4. The query language must be extensible for more complex semantics

Requirement 1: Easy to author

To address requirement 1, we would like users to use simple syntax with an IF THEN construct as follows:

if B(X,Y) and C(X,Y) then A(X,Y);
The grammar for this language is simply (in BNF form):
program   ::= procedure+;

procedure ::= IF atoms THEN atoms;

atoms     ::= atoms AND atom
            | atom

atom ::= ID '(' ID ',' ID ')'
       | ID '(' String ',' ID ')'
       | ID '(' ID ',' String ')'
       | ID '(' String ',' String ')'
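
To make the grammar concrete, the following minimal Python sketch parses programs of this form with a hand-written recursive descent parser. It is only an illustration, not the flex/bison compiler mentioned in the Results section. It assumes that keywords are case-insensitive, that every procedure ends with a semicolon as in the example above, and that strings are written in double quotes. The names tokenize and parse_program are hypothetical.

```python
import re

TOKEN = re.compile(
    r'\s*(?:(?P<str>"[^"]*")|(?P<id>[A-Za-z_][A-Za-z0-9_]*)|(?P<sym>[(),;]))'
)
KEYWORDS = {"if", "then", "and"}


def tokenize(text):
    """Split the input into (kind, value) pairs."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}")
        pos = m.end()
        if m.group("str"):
            tokens.append(("STRING", m.group("str")[1:-1]))
        elif m.group("id"):
            word = m.group("id")
            kind = word.upper() if word.lower() in KEYWORDS else "ID"
            tokens.append((kind, word))
        else:
            tokens.append((m.group("sym"), m.group("sym")))
    return tokens


def parse_program(text):
    """program ::= procedure+, returned as a list of {'if': ..., 'then': ...}."""
    tokens = tokenize(text)
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        value = tokens[pos][1]
        pos += 1
        return value

    def term():
        # An argument is either an ID or a String.
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] in ("ID", "STRING"):
            kind, value = tokens[pos]
            pos += 1
            return (kind, value)
        raise SyntaxError(f"expected ID or String at token {pos}")

    def atom():
        name = expect("ID")
        expect("(")
        left = term()
        expect(",")
        right = term()
        expect(")")
        return (name, left, right)

    def atoms():
        result = [atom()]
        while pos < len(tokens) and tokens[pos][0] == "AND":
            expect("AND")
            result.append(atom())
        return result

    procedures = []
    while pos < len(tokens):
        expect("IF")
        body = atoms()
        expect("THEN")
        head = atoms()
        expect(";")
        procedures.append({"if": body, "then": head})
    if not procedures:
        raise SyntaxError("a program needs at least one procedure")
    return procedures


program = parse_program('if B(X,Y) and C(Y,"en") then A(X,Y);')
print(program)
```

The left-recursive rule for atoms is implemented as a loop, which accepts the same language and avoids unbounded recursion in a top-down parser.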

Requirement 2: Clear semantics

To address requirement 2, Metalog builds on definite logic programs, whose clauses are defined as follows:

A <- A1 and ... and Am, m >=0.
where the Ai are positive literals, i.e. atoms. Note that Metalog programs map directly to definite logic programs of this form.

Requirement 3: Operational vs. Declarative semantics

With the Metalog syntax we propose, the order in which the predicates appear in the IF block does not matter. Typically, in Prolog evaluation environments, the resolution procedure selects predicates from left to right. Correspondingly, the order in which the procedures of the logic program are written determines the order in which they are used in the resolution.

The order does not matter in Metalog simply because its operational and declarative semantics coincide. The declarative semantics of a Metalog program, i.e. the set of logical consequences of all its procedures taken together, gives the same result as any logic program evaluation environment that uses either (semi-)naive bottom-up evaluation or top-down evaluation with a specific procedure such as SLD or SLDNF resolution.
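
The order independence can be checked with a small experiment. The following Python sketch implements plain naive bottom-up evaluation for definite programs and runs one rule with its body atoms in every possible order. It is a minimal illustration and not the evaluation environment used in our tests. The Var class, the function names, and the two sample facts are assumptions made for this example, and the rules are assumed to be range-restricted (every head variable also occurs in the body).

```python
from itertools import permutations


class Var(str):
    """Marks a term as a variable; plain str terms are constants."""


def match(atom, fact, env):
    """Extend env so that atom equals fact, or return None if impossible."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    env = dict(env)
    for term, value in zip(atom[1:], fact[1:]):
        if isinstance(term, Var):
            if env.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return env


def solve(body, facts, env):
    """Yield every variable binding that satisfies all atoms of the body."""
    if not body:
        yield env
        return
    for fact in facts:
        new_env = match(body[0], fact, env)
        if new_env is not None:
            yield from solve(body[1:], facts, new_env)


def naive_eval(rules, facts):
    """Apply all rules to all known facts until nothing new is derived."""
    known = set(facts)
    while True:
        derived = {
            tuple(env[t] if isinstance(t, Var) else t for t in head)
            for head, body in rules
            for env in solve(body, known, {})
        }
        if derived <= known:
            return known
        known |= derived


DC = "http://purl.org/RDF/DC#"
facts = {
    (DC + "Creator", "http://www.w3.org/TR/REC-xml", "Tim Bray"),
    (DC + "Language", "http://www.w3.org/TR/REC-xml", "en"),
}
doc, person, lang = Var("Doc"), Var("Person"), Var("Lang")
head = ("Speaks", person, lang)
body = [(DC + "Creator", doc, person), (DC + "Language", doc, lang)]

# Evaluate the same rule once for every ordering of its body atoms.
results = [naive_eval([(head, order)], facts) for order in permutations(body)]
print(all(result == results[0] for result in results))
```

Each ordering reaches the same fixpoint, which is what the coincidence of operational and declarative semantics predicts for definite programs.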

Requirement 4: Extensible

The query system should not restrict future work on elaborating the expressivity of the queries. We can already anticipate the introduction of negation and disjunction to the grammar presented in Figure X.

A query evaluation system that was developed for an earlier version of the grammar should be able to tell the user that it may not be able to evaluate a query because it does not support the semantics of the program. Therefore, we would like to introduce a version tag at the beginning of any Metalog query. [ Is this a good idea? Older version is always more simple semantics than later version. ]

Metalog queries as RDF schemas

RDF schemas provide a way to define type systems using the RDF data model. These types allow the authors of RDF entries to use specific properties with correspondingly constrained property values of a given arity.

We propose that Metalog programs must have a corresponding RDF schema representation for extensibility. In this way, the author of a Metalog query can point to the RDF schema representation of an existing Metalog query and refine that query.

Metalog allows the user to point to an RDF schema with a namespace mechanism [wait for a good solid reference] that uses URIs. In this way, each predicate, i.e. propertyName, within a Metalog query will be unique.

Refining queries through URI addressing

Figure N. Refinement of metalog queries using URI addressing.
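
The effect of the namespace mechanism on predicate names can be sketched as follows. This minimal Python sketch assumes that an aliased name such as uri2:Speaks is expanded by joining the namespace URI and the local name with "#", which is the form the predicate names take in the RDF/XML encodings of the queries in the Results section. The function name expand is hypothetical.

```python
def expand(name, aliases):
    """Expand 'alias:local' to a full URI; leave unaliased names unchanged."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in aliases:
        return f"{aliases[prefix]}#{local}"
    return name


aliases = {"uri2": "http://www.w3.org/Metadata/RDF/Metalog/query1.rdf"}
print(expand("uri2:Speaks", aliases))
print(expand("candoReview", aliases))
```

A predicate without an alias stays local to the query that defines it, while an aliased predicate resolves to a name that is unique on the Web.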

Higher-order statements in Metalog

[I think we need to say that the following additional procedures have to be introduced to hide the reification mechanism from the query interface level.]

The fact that a property has a certain value may be represented in many ways in the RDF data model. We enumerate them in the following list in increasing order of complexity:

  1. direct value - there is a fact in the corresponding data model where the value is directly present in the triple.
  2. reified value - there is a fact in the corresponding data model, but the processing application may have reified the triple into four corresponding triples.
  3. proxied value through collection - if a property has multiple values, the author may use different collection nodes (Sequence, Bag, or Alternative) to indicate that the values are ordered, unordered, or mutually exclusive, respectively. In this case, the value is proxied through an instance of one of these nodes.
  4. reified proxied value through collection - the property is reified and, in addition, the value is proxied through a collection.

The following default rules first define prop/3 for the value cases 1-4 above, and then define the reificated/4 and collection/1 predicates, which identify reification and collections, respectively.


% Case 1: direct value.
prop(Prop,PropObj,PropValue) :- triple(Prop,PropObj,PropValue),
        not(collection(PropObj)),
        not(reificated(_,_,_,PropValue)).

% Case 2: reified value.
prop(Prop,PropObj,PropValue) :- reificated(Prop,PropObj,PropValue,_),
        not(collection(PropValue)).

% Case 3: value proxied through a collection.
prop(Prop,PropObj,PropValue) :- collection(CollectionObject),
        triple(Prop,PropObj,CollectionObject),
        triple(_,CollectionObject,PropValue).

% Case 4: reified value proxied through a collection.
% ReificatedNode is a variable (capitalized); a lowercase name would be a constant.
prop(Prop,PropObj,PropValue) :- reificated(Prop,PropObj,_,ReificatedNode),
        collection(CollectionObject),
        reificated(Prop,PropObj,CollectionObject,ReificatedNode),
        triple(_,CollectionObject,PropValue).

reificated(Prop,PropObj,PropValue,GenPropObj) :-
	triple('http://www.w3.org/schemas/RDFschema#instanceOf', GenPropObj, 'http://www.w3.org/schemas/RDFschema#Property'),
	triple('http://www.w3.org/schemas/RDFschema#PropObj', GenPropObj, PropObj),
	triple('http://www.w3.org/schemas/RDFschema#PropName', GenPropObj, Prop),
	triple('http://www.w3.org/schemas/RDFschema#PropValue', GenPropObj, PropValue).

collection(Object) :- triple('http://www.w3.org/schemas/RDFschema#instanceOf',Object,'http://www.w3.org/schemas/RDFschema#Bag').
collection(Object) :- triple('http://www.w3.org/schemas/RDFschema#instanceOf',Object,'http://www.w3.org/schemas/RDFschema#Alt').
collection(Object) :- triple('http://www.w3.org/schemas/RDFschema#instanceOf',Object,'http://www.w3.org/schemas/RDFschema#Sequence').

Figure N. Managing reification, collections, and their combination with additional rules.
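
To show what these rules compute on the triple excerpt given earlier, the following Python sketch resolves the values of a property through reification and collections. It follows the rules only loosely and is not the Coral program used in our tests. It does not implement the negated conditions of case 1, it assumes one value per reification property, and, unlike the rules as written, it skips the instanceOf arc of a collection node so that only the members are returned. The function names are hypothetical.

```python
RDF = "http://www.w3.org/schemas/RDFschema#"
DC = "http://purl.org/RDF/DC#"
COLLECTION_TYPES = {RDF + "Bag", RDF + "Alt", RDF + "Sequence"}

# The triple excerpt from the data model, as (property, object, value) tuples.
TRIPLES = {
    (RDF + "instanceOf", "Authors", RDF + "Sequence"),
    (RDF + "1", "Authors", "Tim Bray"),
    (RDF + "2", "Authors", "Jean Paoli"),
    (RDF + "3", "Authors", "C. M. Sperberg-McQueen"),
    (RDF + "PropName", "genid5", DC + "Creator"),
    (RDF + "PropObj", "genid5", "http://www.w3.org/TR/REC-xml"),
    (RDF + "PropValue", "genid5", "Authors"),
    (RDF + "instanceOf", "genid5", RDF + "Property"),
}


def collections(triples):
    """Nodes declared as Bag, Alt or Sequence (the collection/1 predicate)."""
    return {
        o for (p, o, v) in triples
        if p == RDF + "instanceOf" and v in COLLECTION_TYPES
    }


def reified(triples):
    """Yield (prop, obj, value, node) tuples (the reificated/4 predicate)."""
    nodes = {
        o for (p, o, v) in triples
        if p == RDF + "instanceOf" and v == RDF + "Property"
    }
    for node in nodes:
        parts = {p: v for (p, o, v) in triples if o == node}
        try:
            yield (
                parts[RDF + "PropName"],
                parts[RDF + "PropObj"],
                parts[RDF + "PropValue"],
                node,
            )
        except KeyError:
            continue  # incomplete reification, ignored in this sketch


def values(triples, prop, obj):
    """All values of prop for obj, looking through reification and collections."""
    colls = collections(triples)
    found = set()

    def add(value):
        if value in colls:  # cases 3 and 4: value proxied through a collection
            found.update(
                v for (p, o, v) in triples
                if o == value and p != RDF + "instanceOf"
            )
        else:  # cases 1 and 2: the value itself
            found.add(value)

    for (p, o, v) in triples:  # direct statements
        if p == prop and o == obj:
            add(v)
    for (p, o, v, _node) in reified(triples):  # reified statements
        if p == prop and o == obj:
            add(v)
    return found


creators = values(TRIPLES, DC + "Creator", "http://www.w3.org/TR/REC-xml")
print(sorted(creators))
```

On this excerpt the Creator statement exists only in reified form and its value is the Authors sequence, so the lookup corresponds to case 4 and returns the three member names.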

Results

During the tests we have been using the following setup:

Metalog -> RDF schema compiler
This compiler is written in C++ using a combination of flex and bison to create a parse tree.
RDF/XML document -> triple compiler
This compiler is written in Java and uses the Simple API for XML (SAX) to parse RDF/XML encoded files. Once the parse tree is available, a translation pass is run over the tree to produce the corresponding triple representation of the underlying data model.
RDF schema -> prolog syntax compiler
The compilation translates an RDF description of a query into a Prolog-like syntax. We call it Prolog-like because the programs may fall outside the semantics that Prolog supports. For example, a procedure may have a disjunctive head.
Logic program evaluation environment
We have selected the Coral deductive database [X] as the test environment, and we have set the plain semi-naive evaluation strategy for all test queries presented later.

We would like to emphasize the fact that both compilers are easily ported to different platforms from the Solaris 2.6 environment we have been using. The evaluation environment is something we hope people will be able to embed into different applications using different evaluation strategies.

As input data we have been using a set of 2700 RDF data model triples that correspond with the data available at the World Wide Web Consortium technical reports page. This page presents the public documents the consortium has published along with their authors, dates, and URIs. The first example in Figure N is an excerpt of this data.

The queries we wanted to test were of N different types that will be discussed in the following test set-ups.

Trivial queries

We start with straightforward queries, using the example already described in Figure N as our case example.

NAMESPACE URI "http://purl.org/RDF/DC" ALIAS uri1

IF { uri1:Creator(Doc,Person) AND uri1:Language(Doc,Language) }
THEN { Speaks (Person, Language) }

Query 1 - Metalog syntax

<?xml version="1.0"?>
<rdf:RDF 
  xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
  xmlns="http://www.w3.org/TR/WD-metalog#">
<Procedure>
  <Head>
  <Conjunction>
    <Predicate name="Speaks">
      <rdf:Seq>
        <rdf:li><Variable>Person</Variable></rdf:li>
        <rdf:li><Variable>Language</Variable></rdf:li>
      </rdf:Seq>
    </Predicate>
  </Conjunction>
  </Head>
  <Body>
  <Conjunction>
  <Predicates>
  <rdf:Seq>
  <rdf:li>
    <Predicate name="http://purl.org/RDF/DC#Creator">
      <rdf:Seq>
        <rdf:li><Variable>Doc</Variable></rdf:li>
        <rdf:li><Variable>Person</Variable></rdf:li>
      </rdf:Seq>
    </Predicate>
  </rdf:li>
  <rdf:li>
    <Predicate name="http://purl.org/RDF/DC#Language">
      <rdf:Seq>
        <rdf:li><Variable>Doc</Variable></rdf:li>
        <rdf:li><Variable>Language</Variable></rdf:li>
      </rdf:Seq>
    </Predicate>
  </rdf:li>
  </rdf:Seq>
  </Predicates>
  </Conjunction>
  </Body>
</Procedure>
</rdf:RDF>

Query 1 - RDF/XML encoding of the query

Speaks(Person,Language) <- http://purl.org/RDF/DC#Creator(Doc,Person),
                           http://purl.org/RDF/DC#Language(Doc,Language).

Query 1 - Query in prolog syntax

NAMESPACE URI "http://purl.org/RDF/DC" ALIAS uri1
NAMESPACE URI "http://www.w3.org/Metadata/RDF/Metalog/query1.rdf" ALIAS uri2

IF { uri1:Language(Doc, Language) AND uri2:Speaks(Person,Language) }
THEN { candoReview(Person,Doc) }

Query 2 - Metalog syntax

<?xml version="1.0"?>
<rdf:RDF 
  xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
  xmlns="http://www.w3.org/TR/WD-metalog#">
<Procedure>
  <Head>
  <Conjunction>
    <Predicate name="candoReview">
      <rdf:Seq>
        <rdf:li><Variable>Person</Variable></rdf:li>
        <rdf:li><Variable>Doc</Variable></rdf:li>
      </rdf:Seq>
    </Predicate>
  </Conjunction>
  </Head>
  <Body>
  <Conjunction>
  <Predicates>
  <rdf:Seq>
  <rdf:li>
    <Predicate name="http://purl.org/RDF/DC#Language">
      <rdf:Seq>
        <rdf:li><Variable>Doc</Variable></rdf:li>
        <rdf:li><Variable>Language</Variable></rdf:li>
      </rdf:Seq>
    </Predicate>
  </rdf:li>
  <rdf:li>
    <Predicate name="http://www.w3.org/Metadata/RDF/Metalog/query1.rdf#Speaks">
      <rdf:Seq>
        <rdf:li><Variable>Person</Variable></rdf:li>
        <rdf:li><Variable>Language</Variable></rdf:li>
      </rdf:Seq>
    </Predicate>
  </rdf:li>
  </rdf:Seq>
  </Predicates>
  </Conjunction>
  </Body>
</Procedure>
</rdf:RDF>

Query 2 - RDF/XML encoding of the query

candoReview(Person,Doc) <- http://purl.org/RDF/DC#Language(Doc,Language),
                           http://www.w3.org/Metadata/RDF/Metalog/query1.rdf#Speaks(Person,Language).

Query 2 - Query in prolog syntax

Transitive queries

As an example, we study the versioning of resources using the RDF data model. We assume that each document always links to its previous version, which also allows branching version histories.

We would like to find all previous versions of a given document Doc. Written in Metalog, the query looks as follows:

NAMESPACE URI "http://www.w3.org/Metadata/RDF/Metalog/WebCollections.rdf" ALIAS uri1

IF { uri1:previousVersion(Doc, Doc1) }
THEN { previousVersion(Doc,Doc1) };

IF { previousVersion(Doc, Doc2) AND uri1:previousVersion(Doc2, Doc3) }
THEN { previousVersion(Doc,Doc3) };

Query 3 - Metalog syntax for a transitive query

<?xml version="1.0"?>
<rdf:RDF 
  xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
  xmlns="http://www.w3.org/TR/WD-metalog#">
<Procedure>
  <Head>
  <Conjunction>
    <Predicate name="previousVersion">
      <rdf:Seq>
        <rdf:li><Variable>Doc</Variable></rdf:li>
        <rdf:li><Variable>Doc1</Variable></rdf:li>
      </rdf:Seq>
    </Predicate>
  </Conjunction>
  </Head>
  <Body>
  <Conjunction>
    <Predicate name="http://www.w3.org/Metadata/RDF/Metalog/WebCollections.rdf#previousVersion">
      <rdf:Seq>
        <rdf:li><Variable>Doc</Variable></rdf:li>
        <rdf:li><Variable>Doc1</Variable></rdf:li>
      </rdf:Seq>
    </Predicate>
  </Conjunction>
  </Body>
</Procedure>
<Procedure>
  <Head>
  <Conjunction>
    <Predicate name="previousVersion">
      <rdf:Seq>
        <rdf:li><Variable>Doc</Variable></rdf:li>
        <rdf:li><Variable>Doc3</Variable></rdf:li>
      </rdf:Seq>
    </Predicate>
  </Conjunction>
  </Head>
  <Body>
  <Conjunction>
  <Predicates>
  <rdf:Seq>
  <rdf:li>
    <Predicate name="previousVersion">
      <rdf:Seq>
        <rdf:li><Variable>Doc</Variable></rdf:li>
        <rdf:li><Variable>Doc2</Variable></rdf:li>
      </rdf:Seq>
    </Predicate>
  </rdf:li>
  <rdf:li>
    <Predicate name="http://www.w3.org/Metadata/RDF/Metalog/WebCollections.rdf#previousVersion">
      <rdf:Seq>
        <rdf:li><Variable>Doc2</Variable></rdf:li>
        <rdf:li><Variable>Doc3</Variable></rdf:li>
      </rdf:Seq>
    </Predicate>
  </rdf:li>
  </rdf:Seq>
  </Predicates>
  </Conjunction>
  </Body>
</Procedure>
</rdf:RDF>

Query 3 - RDF/XML encoding of the query

previousVersion(Doc,Doc1) <- http://www.w3.org/Metadata/RDF/Metalog/WebCollections.rdf#previousVersion(Doc,Doc1).
previousVersion(Doc,Doc3) <- previousVersion(Doc,Doc2),
                             http://www.w3.org/Metadata/RDF/Metalog/WebCollections.rdf#previousVersion(Doc2,Doc3).

Query 3 - Query in prolog syntax

Now the previousVersion property is defined in two places: once in the RDF schema referenced through the namespace mechanism, and once in this query (which can in turn be placed on the Web as an RDF schema). A user of this property will get very different results depending on whether the transitive previousVersion is used or the simple version that only connects two resources.
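
The difference between the two definitions can be illustrated with a short Python sketch of semi-naive evaluation for the two rules of Query 3. This is a minimal sketch under stated assumptions, not the Coral evaluation we ran. The function name previous_versions and the sample version links are invented for this example, and the direct uri1:previousVersion facts are given as (Doc, PreviousDoc) pairs.

```python
def previous_versions(direct):
    """Semi-naive evaluation of the two previousVersion rules of Query 3."""
    total = set(direct)  # rule 1: every direct link is a previous version
    delta = set(direct)  # facts that were new in the last round
    while delta:
        # Rule 2, applied only to the facts derived in the last round:
        # previousVersion(Doc,Doc2) and uri1:previousVersion(Doc2,Doc3)
        new = {
            (doc, doc3)
            for (doc, doc2) in delta
            for (start, doc3) in direct
            if start == doc2
        }
        delta = new - total
        total |= delta
    return total


# Hypothetical history: v3 -> v2 -> v1, with a branch v2b -> v1.
direct = {("v3", "v2"), ("v2", "v1"), ("v2b", "v1")}
closure = previous_versions(direct)
print(sorted(closure - direct))
```

The simple property contains only the three direct links, while the transitive query also relates v3 to v1. Joining only the newly derived facts with the direct links in each round is what distinguishes semi-naive from naive evaluation.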

Related work

The use of Web infrastructure to accommodate logic programs has been suggested by (Sandewall, 1996) and (Loke & Davison, 1996). The latter approach suggests using familiar logic program notation to place facts and queries on HTML pages. The embedded rules can also refer to predicates on other HTML pages using a namespace mechanism. In this way, their evaluation context grows with the number of HTML pages they retrieve to find facts that satisfy the queries.

Future work

Conclusions

Acknowledgements

The authors would like to thank Bert Bos for his help in running the test sets.

References

  1. Das, S.K. (1992). Deductive Databases and Logic Programming. Addison Wesley.
  2. Lassila, O., Swick, R. (1998). Resource Description Framework (RDF) Model and Syntax Specification. W3C Working Draft.
    http://www.w3.org/TR
  3. Loke, S.W., Davison, A. (1996). Logic Programming with the World Wide Web. Proc. of the 7th ACM Conf. on Hypertext.
    http://www.cs.unc.edu/~barman/HT96/P14/lpwww.html
  4. Niemelä, I., Simons, P. (1997). Smodels -- an implementation of the stable model and well-founded semantics for normal logic programs. Proc. of the 4th Int. Conf. on Logic Programming and Non-Monotonic Reasoning. Dagstuhl, Germany.
    http://saturn.hut.fi/pub/papers/lpnmr97-sd.ps.gz
  5. Ramakrishnan, R., Srivastava, D., Sudarshan, S. (1992). CORAL: Control, Relations and Logic. Proc. of the Int. Conf. on VLDB.
  6. Sandewall, E. (1996). Towards a World-Wide Data Base. Proc. of the 5th Int. WWW Conf.

Appendix A - Query schema in RDF

<?xml version="1.0"?>
<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:RDF="http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:RDFS="http://www.w3.org/TR/WD-rdf-schema#">
<RDFS:Class ID="Procedure" />

<PropertyType ID="Head">
  <RDFS:comment>Head of the procedure</RDFS:comment>
  <RDFS:domain RDF:resource="#Procedure"/>
  <RDFS:range RDF:resource="#Connector"/>
  <RDFS:range RDF:resource="#Predicate"/>
</PropertyType>

<PropertyType ID="Body">
  <RDFS:comment>Body of the procedure</RDFS:comment>
  <RDFS:domain RDF:resource="#Procedure"/>
  <RDFS:range RDF:resource="#Connector"/>
  <RDFS:range RDF:resource="#Predicate"/>
</PropertyType>

<PropertyType ID="Predicates">
  <RDFS:comment>Predicates combined with a connector</RDFS:comment>
  <RDFS:domain RDF:resource="#Connector"/>
  <RDFS:range RDF:resource="#Predicate"/>
  <RDFS:range RDF:resource="#Connector"/>
  <!-- this last range definition enables recursion -->
</PropertyType>

<RDFS:Class ID="Connector" />

<RDFS:Class ID="Conjunction">
  <RDFS:subClassOf RDF:resource="#Connector" />
</RDFS:Class>

<RDFS:Class ID="Disjunction">
  <RDFS:subClassOf RDF:resource="#Connector" />
</RDFS:Class>

<RDFS:Class ID="Negation">
  <RDFS:subClassOf RDF:resource="#Connector" />
</RDFS:Class>

<RDFS:Class ID="Predicate" />

<PropertyType ID="Variable">
  <RDFS:comment>Variable within a predicate</RDFS:comment>
  <RDFS:domain RDF:resource="#Predicate"/>
  <RDFS:range RDF:resource="http://www.w3.org/FictionalSchemas/useful_types#String"/>
</PropertyType>

<PropertyType ID="Constant">
  <RDFS:comment>Constant within a predicate</RDFS:comment>
  <RDFS:domain RDF:resource="#Predicate"/>
  <RDFS:range RDF:resource="http://www.w3.org/FictionalSchemas/useful_types#String"/>
</PropertyType>

</RDF>

$Author: jsaarela $ - $Id: paper980828.html,v 1.1 1998/08/28 08:59:23 jsaarela Exp $