Metalog - Query language for RDF

Massimo Marchiori, Janne Saarela
{massimo,jsaarela}@w3.org
World Wide Web Consortium

The Resource Description Framework (RDF) Model&Syntax Specification describes a metadata infrastructure which can accommodate classification elements from different vocabularies i.e. schemas. The underlying model consists of a labeled directed acyclic graph which can be linearized into eXtensible Markup Language (XML) transfer syntax for interchange between applications.

This paper will demonstrate how a new querying language, Metalog, allows users to write queries in English-like syntax. We will demonstrate how these queries have equivalent representation both as RDF descriptions and as logic programs. We will also show how an automated compilation between these translations is possible.

The RDF representation of the queries and the XML namespace mechanism together can be used to refine existing queries already available and addressable on the Web.

Finally, we present some practical work with the Metalog query language in the context of managing multilingual Web sites and Access Control Lists.

Introduction

Resource Description Framework

An RDF data model

Figure 1. An RDF data model

Let's go even one step further. The data model is a labeled directed graph. This can also be represented as predicates which correspond with the arcs of the data model and connect thus two nodes. The example in Figure 1 corresponds with 11 triples some of which are presented in the following excerpt:

triple("http://www.w3.org/TR/NOTE-WC#Status","http://www.w3.org/TR/REC-xml","http://www.w3.org/schemas/WCschema#REC").
triple("http://purl.org/RDF/DC/Title","http://www.w3.org/TR/REC-xml","Extensible Markup Language (XML) 1.0 Specification").
triple("http://purl.org/RDF/DC/Date","http://www.w3.org/TR/REC-xml","10-February-1998").
triple("http://www.w3.org/TR/WD-rdf-syntax#type","Authors","http://www.w3.org/TR/WD-rdf-syntax#Seq").
triple("http://www.w3.org/TR/WD-rdf-syntax#_1","Authors","Tim Bray").
triple("http://www.w3.org/TR/WD-rdf-syntax#_2","Authors","Jean Paoli").
triple("http://www.w3.org/TR/WD-rdf-syntax#_3","Authors","C. M. Sperberg-McQueen").
triple("http://purl.org/RDF/DC/Creator","http://www.w3.org/TR/REC-xml","Authors").
triple("http://www.w3.org/TR/NOTE-WC#Language","http://www.w3.org/TR/REC-xml","en").

Figure 2. Triple representation of RDF data model

The structure of the triples is as follows:

triple(predicate,subject,object).
which can also be read as
predicate(subject,object).

The example in Figure X presents us the mechanism how RDF deals with higher order statements. If another property asserts something about another property, the RDF has decided not to allow nesting of triples but a mechanism called reification which provides a unique id for any assertion thus allowing it to be referer to from other triples. This mechanism also sets the recursion limit by not allowing reificated properties to be further reificated.

The following triple corresponds with four triples presented in the following examples:

triple(predicate, subject, object).
triple('type',newID,'rdf:Statement).
triple('predicate',newID,predicate).
triple('subject',newID,subject).
triple('object',newID,object).

Syntax

RDF could have chosen a special syntax but due to the popularity of the XML document encoding syntax, the decision was to build RDF on top of XML. This relieves RDF from some specification work as for example internalization (I18N) which is defined by XML to be based on Unicode.

The following example presents the XML encoding of the data model presented in Figure 1.

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
         xmlns:DC="http://purl.org/RDF/DC/"
         xmlns:WC="http://www.w3.org/TR/NOTE-WC#">
<rdf:Description about="http://www.w3.org/TR/REC-xml">
  <WC:Status rdf:resource="http://www.w3.org/schemas/WCschema#REC"/>
  <DC:Title>Extensible Markup Language (XML) 1.0 Specification</DC:Title>
  <DC:Date>10-February-1998</DC:Date>
  <DC:Creator>
    <rdf:Seq ID="Authors">
      <rdf:li>Tim Bray</rdf:li>
      <rdf:li>Jean Paoli</rdf:li>
      <rdf:li>C. M. Sperberg-McQueen</rdf:li>
    </rdf:Seq>
  </DC:Creator>
  <WC:Language>en</WC:Language>
</rdf:Description>
</rdf:RDF>

Figure 3. XML encoding of the data model in Figure 1.

Query Languages

In general, query languages are formal languages to retrieve data from a database. Standardadized languages already exist to retrieve information from different types of databases such as Structured Query Language (SQL) for relational databases and Object Query Language (OQL) and SQL3 for object databases.

Semi-structure query languages such as XML-QL [ref] operate on the document level structure....

Logic programs consist of facts and rules where valid inference rules are used to determine all the facts that apply within a given model.

With RDF, the most suitable approach is to focus on the underlying data model. Even though XML-QL could be used to query RDF descriptions in their XML encoded form, a single RDF data model could not be correctly determined with a single XML-QL query due to the fact that RDF allows several XML syntax encodings for the same data model.

Metalog

We feel that the query language we are about to propose and any other to be widely deployed on the Web must address the following requirements:

  1. The query language must be easy to author
  2. The query language must have extensible semantics
  3. The evaluation environment must be easy to implement

Requirement 1: Easy to author

To address requirement 1, we would like users to use a simple syntax which reads easily.

If the language of a document is X
and the author of the document is Y
then the Y can speak X.

If the "Language" of a DOCUMENT is X
and the "Author" of the DOCUMENT is Y
then speak of X is Y.

Mapping RDF constructs to Metalog syntax

In order to support different primitive constructs of the RDF data model in the query language, we proposed the following syntax sugar to the query language.

If a "Language" of a DOCUMENT is "fi" then ...

The a keyword indicates the value will be searched from a value of a Bag instance in the data model.

If the 2nd "Author" of a DOCUMENT is "Janne Saarela" then ...

The ordering keyword indicates the value will be searched from a value of a Sequence container instance in the data model. Also, as the order is significant, the match must be on the given listitem.

If the "Language" of a DOCUMENT is "fi" then ...

The the keyword indicates the value will be searched from a value of a Alternative container instance as well as a direct value in the data model. As Alternatives are used to indicate mutually exclusive values, the match can only happen for one query. Thus, the result of this query introduces a new fact in the data model.

Requirement 2: Extensible semantics

Requirement 3: The evaluation environment must be easy to implement

Metalog queries as RDF schemas

RDF schemas provide as a way to define type systems using the RDF data model. These types allow the authors of RDF entries to use specific properties with corresponding constrained property values with given arity.

We propose that metalog programs must have a corresponding RDF schema representation or extensibility. In this way, an author of a metalog query can point to a specific RDF schema representation of an existing metalog query and refine the query himself.

Metalog allows the use to point to an RDF schema with a namespace mechanism [wait for a good solid reference] that uses URIs. In this way, each predicate i.e. propertyName within a metalog query will be unique. Refining queries through URI addressing

Figure 4. Refinement of metalog queries using URI addressing.

Higher-order statements in Metalog

Evaluation environment requirements

Due to the ways how RDF can represent values, the evaluation environment should provide some a priori knowledg on how the data model can be queried in general. Thus, we present here some useful queries that should always be present in the query evaluation system either passed along with any given query or hard-wired in the evaluation code.

  1. direct value - there is a fact in the corresponding data model where the value is directly present in the triple.
  2. proxied value through collection - If a property has multiple values, the author may use different collections nodes (Sequence, Bag, or Alternative) to indicate whether the values preserve order or not, or whether they are mutually exclusive, respectively. In this case, the value is proxied through an instance of one of these nodes.

The following default rules define first of all corresponding rules for the previous value cases and then rules to determine reification and collection identify with reificated/4 and collection/1 predicates, respectively.


prop(Predicate,Subject,Object) :- triple(Predicate,Subject,Object),
        not(collection(Subject)).

prop(Predicate,Subject,Object) :- collection(CollectionObject),
	triple(Predicate,Subject,CollectionObject),
	triple(_,CollectionObject,Object).

collection(Object) :- triple('http://www.w3.org/TR/WD-rdf-syntax#type',Object,'http://www.w3.org/TR/WD-rdf-syntax#Bag').
collection(Object) :- triple('http://www.w3.org/TR/WD-rdf-syntax#type',Object,'http://www.w3.org/TR/WD-rdf-syntax#Alt').
collection(Object) :- triple('http://www.w3.org/TR/WD-rdf-syntax#type',Object,'http://www.w3.org/TR/WD-rdf-syntax#Seq).

Figure 5. Managing direct and collection values

Implementation architecture

The following subsections discuss the compilation process in between the three syntactical representations of the query language. It is essential that each compilation preserves the semantics of the query. In the following, we will demonsrate that there is an isomorphic mapping in between these three representations.

Metalog to RDF

The metalog language have been developed so that LALR(1) context free grammar can be used to parse it. We have thus been able to use bison to generate the front end to build the parse tree of the Metalog program.

The backend of this compiler walks the tree down generating RDF/XML descriptions while doing this. These descriptions use a specialized Metalog RDF schema which establishes the vocabulary i.e. the predicates that can be applied to given classes.


Figure X.

RDF to logic program

The compilation process from RDF to a logic program follows the front-end and back-end approach. Simple RDF Parser and Compiler (SiRPAC) has been developed in Java language to do the mapping process from the RDF/XML documents to the RDF data model expressed in 3-tuples (triples).

SiRPAC first builds the parse tree of the RDF/XML document using the Simple API to XML documents (SAX). This tool is publicly available at http://www.w3.org/RDF/Implementations/SiRPAC/.

We acknowledge the fact that the benefit of using SAX lies in the event-based management of big documents without building the complete parse tree in memory. However, we wanted to remain compiler independent and thus selected to use SAX.

The query compiler is based on the same approach as SiRPAC and it specifically takes into account the vocabulary of the Metalog RDF schema. These special schema constructs are recognized in the parse tree and they are compiled to logic program rules.


Figure X.

Logic program to RDF

It is essential to notice that this compilation step does not compile logic program rules nor queries into RDF/XML documents. The goal is only to compile the resulting facts of an evaluated query to an interchangeable representation in RDF.

Due to this limitation, any resulting fact from a query never uses the Metalog RDF schema vocabulary.


Figure X. Result of a logic program query.


Figure X. Result of a logic program query in RDF.

RDF to Metalog

Again, once we are parsing RDF/XML documents, the SAX based parse tree construction process is first activated after which the parse tree is traversed and the corresponding Metalog syntax is generated.


Figure X. RDF/XML document.


Figure X.

As input data we have been using a set of 2700 RDF data model triples that correspond with the data available at the World Wide Web Consortium technical reports page. This page presents the public documents the consortium has published along with their authors, dates, and URIs. The first example in Figure 1 is an excerpt of this data.

The queries we wanted to test were of N different types that will be discussed in the following test set-ups.

Trivial queries

We start with straight-forward queries using the example described already in Figure Y as our case example.

NAMESPACE URI "http://purl.org/schemas/DublinCore/RDF/" ALIAS uri1

IF "uri1:Creator" of DOC is PERSON and
"uri1:Language" of DOC is LANGUAGE
then "Speaks" Person Language.

Query 1 - Metalog syntax


Query 1 - RDF/XML encoding of the query


Query 1 - Query in prolog syntax

Related work

The use of Web infrastructure to accommodate logic programs has been suggested by (Sandevall, 1996) and (Loke & Davidson, 1996). The latter approach suggests using familiar logic program notation to place facts and queries on HTML pages. The embedded rules also have the ability to refer to other HTML pages with other predicates using a namespace mechanism. In this way, their evaluation context increases over the amount of HTML pages they retrieve to find facts that satisfy the queries.

Future work

Conclusions

Acknowledgements

The authors would like to thank Bert Bos for his help in running the test sets.

References

  1. Das, S.K. (1992). Deductive Databases and Logic Programming. Addison Wesley.
  2. Lassila, O., Swick, R. (1998). Resource Description Framework (RDF) Model and Syntax Specification. W3C Working Draft.
    http://www.w3.org/TR/
  3. Loke, S.W., Davison, A. (1996). Logic Programming with the World Wide Web. Proc. of the 7th ACM Conf. on Hypertext.
    http://www.cs.unc.edu/~barman/HT96/P14/lpwww.html
  4. Niemelä, Simons, P. (1997). Smodels -- an implementation of the stable model and well-founded semantics for normal logic programs Proc. of the 4th Int. Conf. on Logic Programming and Non-Monotonic Reasoning. Dagstuhl, Germany.
    http://saturn.hut.fi/pub/papers/lpnmr97-sd.ps.gz
  5. Ramakrishnan, R., Srivastava, D., Sudarshan, D. (1992). CORAL: Control, Relations and Logic. Proc. of the Int. Conf. on VLDB..
  6. Sandewall, E. (1996). Towards a World-Wide Data Base. Proc. of the 5th Int. WWW Conf..

Appendix A - Query schema in RDF

<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:rdfs="http://www.w3.org/TR/WD-rdf-schema#">
<rdfs:Class ID="Procedure" />

<Predicate ID="Head">
  <rdfs:comment xml:lang="en">Head of the procedure</rdfs:comment>
  <rdfs:domain rdf:resource="#Procedure"/>
  <rdfs:range rdf:resource="#Connector"/>
  <rdfs:range rdf:resource="#Predicate"/>
</Predicate>

<Predicate ID="Body">
  <rdfs:comment xml:lang="en">Body of the procedure</rdfs:comment>
  <rdfs:domain rdf:resource="#Procedure"/>
  <rdfs:range rdf:resource="#Connector"/>
  <rdfs:range rdf:resource="#Predicate"/>
</Predicate>

<Predicate ID="Predicates">
  <rdfs:comment xml:lang="en">Predicates combined with a connector</rdfs:comment>
  <rdfs:domain rdf:resource="#Connector"/>
  <rdfs:range rdf:resource="#Predicate"/>
  <rdfs:range rdf:resource="#Connector"/>
  <!-- this last range definition enables recursion -->
</Predicate>

<rdfs:Class ID="Connector" />

<rdfs:Class ID="Conjunction">
  <rdfs:subClassOf rdf:resource="#Connector" />
</rdfs:Class>

<rdfs:Class ID="Disjunction">
  <rdfs:subClassOf rdf:resource="#Connector" />
</rdfs:Class>

<rdfs:Class ID="Negation">
  <rdfs:subClassOf rdf:resource="#Connector" />
</rdfs:Class>

<rdfs:Class ID="Predicate" />

<Predicate ID="Variable">
  <rdfs:comment xml:lang="en">Variable within a predicate</rdfs:comment>
  <rdfs:domain rdf:resource="#Predicate"/>
  <rdfs:range rdf:resource="http://www.w3.org/FictionalSchemas/useful_types#String"/>
</Predicate>

<Predicate ID="Constant">
  <rdfs:comment xml:lang="en">Constant within a predicate</rdfs:comment>
  <rdfs:domain rdf:resource="#Predicate"/>
  <rdfs:range rdf:resource="http://www.w3.org/FictionalSchemas/useful_types#String"/>
</Predicate>
</RDF>

$Author: jsaarela $ - $Id: paper981124.html,v 1.1 1998/11/25 16:02:16 jsaarela Exp $