RDF Database Access Protocol

Status

Experimental (Research). See also RDF federations.

Authors

Background

RDF, defined in RDF Model & Syntax 1.0, is a language or data format for transmitting small relational databases. It uses a variant of the relational model which allows RDF databases to be automatically combined when their creators use the same schemas and datatypes. RDF has also been seen in other ways, but this database angle is the one of interest here.

To date, most RDF work has involved either intentionally ad hoc RDF protocols, transmitting entire RDF databases via HTTP, or an overwhelming focus on queries. This is an informal attempt to move the discussion along toward more mature and potentially standard functionality. @@@ list of related work

Use Cases

It's important to know what we want the protocol to be used for. Here are the applications we've been primarily considering so far.

Web Page Access Control Lists (ACL)

The web server queries the database to see who is allowed to access a given URI before continuing processing. Accessors are grouped to ease administration. A server-side web application (itself access controlled) allows modification of the access control information. Command-line and non-web GUI clients for modification and testing would also be good, although they require their own authentication mechanisms. (Implemented for W3C website. @@@@link)

Web Page Annotations (Annotea)

A modified web browser communicates with a database of "annotations" about pages or portions of pages. The browser allows annotations to be added, and automatically indicates the presence of annotations. (Implemented in Amaya. @@@@link)

Semantic Web Browser

A user-interface which lets people view and alter the properties of objects as recorded in RDF databases. The user can import databases (manually or automatically, following the origin of RDF identifiers, etc) and attach validation and inference processors. This is the application which divides RDF systems from the more traditional ones, as it spans systems and schemas.

There are some early approaches to this (done as server-side web applications): a mockup, @@@ em's, @@@ mike dean's.

Subsuming Web Services

Web Services are a style of network application servers with procedure-call interfaces, generally using XML and HTTP and available for use between parties with no other contact. Like perhaps all computer applications, web services can be designed as database applications, and the database design will often be simpler.

Use case: consider all the classic web services examples (get the temperature, send a message to a pager, buy a book, etc.) as RDF database applications. Go on to more interesting versions, like: order two books from a bookseller only if both are in stock.

Blindfold Grammar Action-Annotations

Turn text into data in an RDF database with a yacc-generated-parser using RDF database operations as its action-annotations. See, for instance, a unix /etc/group example.
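As a rough illustration of the idea, the sketch below uses a hand-written Python parser as a stand-in for a yacc-generated one. Each recognized line yields the triples that the action-annotations would insert. The `NS` vocabulary URI, the property names, and the `_:group_` naming scheme are all assumptions made for this example.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical vocabulary for group records; not defined anywhere in this note.
NS = "http://example.com/unix-group#"


def parse_group_line(line: str) -> List[Triple]:
    """Turn one /etc/group line (name:password:gid:members) into triples."""
    name, _password, gid, members = line.rstrip("\n").split(":")
    group = f"_:group_{name}"  # assumed local (_:) identifier for the group
    triples = [
        (group, NS + "name", name),
        (group, NS + "gid", gid),
    ]
    # The member list is comma-separated and may be empty.
    triples += [(group, NS + "member", m) for m in members.split(",") if m]
    return triples


for triple in parse_group_line("staff:x:50:alice,bob"):
    print("insert", triple)
```

In a real grammar each `insert` would be an action attached to a production rather than a line in a function body.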

Protocol Abstraction

We'll say that RDF Database Clients communicate with RDF Database Servers by exchanging serialized objects. The serialization may look like SQL, LISP, XML, N-Triples, etc. The underlying protocols may be TCP, UDP, HTTP, SMTP, etc. In other words, something like

  INSERT
    INTO DATABASE <http://example.com/temperature-readings>
    TRIPLE (<http://example.com/temperature-readings#reading311232>,
            <http://example.com/temperature-readings#locationName>,
            "Waltham");

is basically the same as

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns="http://www.w3.org/2002/01/rdap/ont-1#" >
    <InsertionRequest>
       <database rdf:resource="http://example.com/temperature-readings" />
       <rdf:subject 
        rdf:resource="http://example.com/temperature-readings#reading311232" />
       <rdf:predicate
        rdf:resource="http://example.com/temperature-readings#locationName" />
       <rdf:object>Waltham</rdf:object> 
    </InsertionRequest>
  </rdf:RDF>

and we won't worry about which syntax people are using for now. (These are just vague examples; I don't know if they are really workable syntaxes.)
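To make the point concrete, here is a minimal Python sketch in which one request object is rendered in both of the syntaxes above. The class and function names are hypothetical, and the output is only as workable as the vague examples it imitates.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDAP_NS = "http://www.w3.org/2002/01/rdap/ont-1#"


@dataclass(frozen=True)
class InsertionRequest:
    """One abstract request; the field names are illustrative only."""
    database: str   # URI of the target database
    subject: str    # URI
    predicate: str  # URI
    obj: str        # literal string value


def to_sqlish(req: InsertionRequest) -> str:
    """Render the request in the SQL-like syntax."""
    literal = req.obj.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"INSERT INTO DATABASE <{req.database}> "
        f'TRIPLE (<{req.subject}>, <{req.predicate}>, "{literal}");'
    )


def to_rdfxml(req: InsertionRequest) -> str:
    """Render the same request in the XML syntax."""
    return (
        f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns="{RDAP_NS}">'
        "<InsertionRequest>"
        f"<database rdf:resource={quoteattr(req.database)} />"
        f"<rdf:subject rdf:resource={quoteattr(req.subject)} />"
        f"<rdf:predicate rdf:resource={quoteattr(req.predicate)} />"
        f"<rdf:object>{escape(req.obj)}</rdf:object>"
        "</InsertionRequest>"
        "</rdf:RDF>"
    )


request = InsertionRequest(
    database="http://example.com/temperature-readings",
    subject="http://example.com/temperature-readings#reading311232",
    predicate="http://example.com/temperature-readings#locationName",
    obj="Waltham",
)
print(to_sqlish(request))
print(to_rdfxml(request))
```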

Note that we cannot nicely map this to an API, because some of the operations have a recursive structure. For example we'd like "delete triple(a,b,c)" and "foreach ?x where ?x=c do delete triple(a,b,?x)" to use the same "delete" operation. We'd also like to be able to nest foreach within foreach, etc. We can only do this with an API like "do(RequestObject) returns ResponseObject". So our focus remains on designing Request and Response objects.
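The recursive shape described above can be sketched as follows. This is a minimal Python illustration, not a proposed API: the `Var`, `Delete`, `ForEach`, and `Response` classes are hypothetical, and the "where" condition is assumed to be "the database contains a triple matching this pattern" rather than the "?x=c" form used in the prose.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Delete:
    pattern: Tuple  # three terms, each a str or a Var


@dataclass(frozen=True)
class ForEach:
    var: Var
    where: Tuple    # pattern that must mention `var`
    body: Union["Delete", "ForEach"]


@dataclass
class Response:
    deleted: List[Tuple[str, str, str]] = field(default_factory=list)


def substitute(pattern, bindings):
    """Replace bound variables in a pattern with their values."""
    return tuple(
        bindings.get(t.name, t) if isinstance(t, Var) else t for t in pattern
    )


def match(pattern, triple, var):
    """Return the value `var` takes if `triple` fits `pattern`, else None."""
    value = None
    for term, actual in zip(pattern, triple):
        if isinstance(term, Var):
            if term != var or (value is not None and value != actual):
                return None
            value = actual
        elif term != actual:
            return None
    return value


def do(db, request, bindings=None):
    """Single entry point: every request, nested or not, goes through here."""
    bindings = bindings or {}
    response = Response()
    if isinstance(request, Delete):
        triple = substitute(request.pattern, bindings)
        if any(isinstance(t, Var) for t in triple):
            raise ValueError(f"unbound variable in {triple}")
        if triple in db:
            db.remove(triple)
            response.deleted.append(triple)
    elif isinstance(request, ForEach):
        where = substitute(request.where, bindings)
        for triple in sorted(db):  # sorted() copies, so the body may modify db
            value = match(where, triple, request.var)
            if value is not None:
                inner = {**bindings, request.var.name: value}
                response.deleted.extend(do(db, request.body, inner).deleted)
    else:
        raise TypeError(f"unknown request: {request!r}")
    return response
```

A plain `Delete(("a", "b", "c"))` and a `ForEach(Var("x"), ("a", "b", Var("x")), Delete(("a", "b", Var("x"))))` both go through the same `do` call, and a `ForEach` may carry another `ForEach` as its body.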

We'll consider HTTP GET of an XML/RDF file as a kind of degenerate query-for-everything, PUT as a replace-entire-database-contents. That works for some things, but many other times we'll want more fine-grained access.

Basic Requests

The database access "Request" objects (which a client sends to a server) can be grouped into the following areas of functionality. Items marked with (+) are optional convenience features, which can be implemented in the client or as macro-functions in the server.

Database Existence

create database <identifier>

drop database <identifier>

+use database <identifier> [ sets default ]

Modification

insert [into <database>] <triple>

delete [from <database>] <triple>

+replace <old-identifier> <new-identifier>

Quantification

foreach <variable> where <condition> do <nested request>

introduce <variable>

(These are similar to For-All and There-Exists in first-order logic, but they involve immediate action rather than ongoing restraint.)
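One possible reading of "introduce" is that it mints a fresh local identifier and binds the variable to it, so that later requests in the same scope can talk about a new, otherwise unnamed object. The Python sketch below assumes that reading; the `_:gen` naming scheme, the `?`-prefixed variable spelling, and the sample property names and values are all made up for illustration.

```python
import itertools
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
_fresh_ids = itertools.count(1)


def introduce(bindings: Dict[str, str], variable: str) -> Dict[str, str]:
    """Return new bindings with `variable` bound to a fresh local identifier."""
    return {**bindings, variable: f"_:gen{next(_fresh_ids)}"}


def insert(db: Set[Triple], pattern: Triple, bindings: Dict[str, str]) -> Triple:
    """Insert `pattern` after replacing bound variables with their values."""
    triple = tuple(bindings.get(term, term) for term in pattern)
    db.add(triple)
    return triple


db: Set[Triple] = set()
bindings = introduce({}, "?reading")
insert(db, ("?reading", "locationName", "Waltham"), bindings)
insert(db, ("?reading", "temperature", "21"), bindings)
```

Both inserted triples share the same freshly minted subject, which is the "immediate action" counterpart of saying that some such object exists.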

Query Response

(Given quantification, "querying" is just about returning some results to clients.)

+return <triple>

return <identifier>

Transactional Grouping

All operations done inside a transaction are made in a tentative state, where they are not visible to other clients until committed, and they can be rolled back. Results to other clients which might differ if the changes were actually made are postponed (i.e., portions of the database are locked). See "database names" (below) for how clients can avoid being locked out.

create transaction <new trans-ident>

+use transaction <trans-ident>

in <trans-ident> <nested request> [ if not using "use" ]

rollback [<trans-ident>]

commit [<trans-ident>]

+assert <condition> [ if !<condition> then rollback. ]

(Or simpler "begin transaction" and you can't name them.)
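A minimal in-memory Python sketch of the tentative-state idea follows, using the "order two books only if both are in stock" use case. It deliberately leaves out locking and the postponing of other clients' results; the `Database` class and its method names are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]


class Database:
    def __init__(self) -> None:
        self.committed: Set[Triple] = set()
        # trans-ident -> (tentatively inserted, tentatively deleted)
        self.pending: Dict[str, Tuple[Set[Triple], Set[Triple]]] = {}

    def create_transaction(self, ident: str) -> None:
        if ident in self.pending:
            raise ValueError(f"transaction {ident!r} already exists")
        self.pending[ident] = (set(), set())

    def insert(self, ident: str, triple: Triple) -> None:
        inserted, deleted = self.pending[ident]
        deleted.discard(triple)
        inserted.add(triple)

    def delete(self, ident: str, triple: Triple) -> None:
        inserted, deleted = self.pending[ident]
        inserted.discard(triple)
        deleted.add(triple)

    def view(self, ident: str) -> Set[Triple]:
        """The database as seen from inside the transaction."""
        inserted, deleted = self.pending[ident]
        return (self.committed - deleted) | inserted

    def commit(self, ident: str) -> None:
        self.committed = self.view(ident)
        del self.pending[ident]

    def rollback(self, ident: str) -> None:
        del self.pending[ident]

    def assert_(self, ident: str, condition: Callable[[Set[Triple]], bool]) -> bool:
        """Roll back unless `condition` holds in the transaction's view."""
        if condition(self.view(ident)):
            return True
        self.rollback(ident)
        return False


db = Database()
db.committed = {("book1", "inStock", "true")}  # book2 is not in stock

db.create_transaction("t1")
db.insert("t1", ("order1", "includes", "book1"))
db.insert("t1", ("order1", "includes", "book2"))
both_in_stock = lambda view: all(
    (book, "inStock", "true") in view for book in ("book1", "book2")
)
if db.assert_("t1", both_in_stock):
    db.commit("t1")
```

Here the assertion fails, so the whole order is rolled back and the committed state is untouched.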

Client-Defined Functions

It might be nice to allow client-defined macro-style functions, both as procedures and as returning-a-value functions to be used in other commands.

Object Identification

These requests use terms (identifiers) of several types:

Conditions (boolean expressions)

Datatype operators are all done inside triples.

"Closed world" negation here is something to be careful of; the use case will involve logics and triggers. That is, in general, we want everything that is true of a database to also be true of the union of that database and another. But there are times when it's so useful to do otherwise that we want to allow it.

<database> contains <triple>

<database> does not contain <triple>

contained <triple> [using default db]

not contained <triple> [using default db]

<condition> AND <condition>

+<condition> OR <condition>

+<condition> XOR <condition>
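The conditions above form a small expression tree, which could be evaluated along these lines. This is a Python sketch with hypothetical class names; the example at the end shows why the negated form needs care: it holds for one database but stops holding for the union of that database with another.

```python
from dataclasses import dataclass
from typing import Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Contains:
    triple: Triple


@dataclass(frozen=True)
class NotContains:
    triple: Triple


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


@dataclass(frozen=True)
class Xor:
    left: object
    right: object


def holds(cond, db: Set[Triple]) -> bool:
    """Evaluate a condition tree against one database."""
    if isinstance(cond, Contains):
        return cond.triple in db
    if isinstance(cond, NotContains):
        return cond.triple not in db
    if isinstance(cond, And):
        return holds(cond.left, db) and holds(cond.right, db)
    if isinstance(cond, Or):
        return holds(cond.left, db) or holds(cond.right, db)
    if isinstance(cond, Xor):
        return holds(cond.left, db) != holds(cond.right, db)
    raise TypeError(f"unknown condition: {cond!r}")


db1 = {("alice", "memberOf", "staff")}
db2 = {("bob", "memberOf", "staff")}
bob_absent = NotContains(("bob", "memberOf", "staff"))

print(holds(bob_absent, db1))        # True for db1 alone
print(holds(bob_absent, db1 | db2))  # False once db2 is merged in
```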

Object Names

This syntax hopefully keeps these objects clear of other database objects, and gives us a place for indirection via an atom table, etc.

string("...")

an object with these string contents (may also be written as just "...")

global("...")

an object with this global (URI) name (may also be written as <...> or as a qname)

local("...")

an object with this local (_:foo) name (may also be written as a _: qname)
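A small Python sketch of how a parser might normalize the long and short forms of these names into one tagged representation. The `Term` class and `parse_term` function are hypothetical, and the sketch ignores qname prefixes and any escaping inside the quotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    kind: str   # "string", "global", or "local"
    value: str


def parse_term(text: str) -> Term:
    """Accept string("..."), global("..."), local("...") and their shorthands."""
    text = text.strip()
    # Long forms: kind("...")
    for kind in ("string", "global", "local"):
        prefix = kind + '("'
        if text.startswith(prefix) and text.endswith('")'):
            return Term(kind, text[len(prefix):-2])
    # Shorthands: "...", <...>, _:name
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return Term("string", text[1:-1])
    if len(text) >= 2 and text[0] == "<" and text[-1] == ">":
        return Term("global", text[1:-1])
    if text.startswith("_:") and len(text) > 2:
        return Term("local", text[2:])
    raise ValueError(f"unrecognized term: {text!r}")


print(parse_term('string("Waltham")') == parse_term('"Waltham"'))
print(parse_term("<http://example.com/temperature-readings>"))
print(parse_term("_:foo"))
```

Keeping the kind as an explicit tag is what leaves room for the indirection via an atom table mentioned above.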

Database Names

This is just a clever idea of mine; I dunno if there's something like it in existing RDBMSs or not.

asIfRolledBack(db)

a never-locked read-only view of the database as it would appear if all pending transactions were rolled back.

asIfCommitted(db)

a never-locked read-only view of the database as it would appear if all pending transactions were successfully committed. (But still maintaining database validity and security; that is, we assume no conflicting changes will be made.)
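Assuming the server keeps the committed triples apart from each pending transaction's tentative inserts and deletes, the two views could be computed roughly as in the Python sketch below. The function names and the `pending` layout are hypothetical, and, as in the description above, the sketch simply assumes the pending transactions do not conflict.

```python
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
# trans-ident -> (tentatively inserted, tentatively deleted)
Pending = Dict[str, Tuple[Set[Triple], Set[Triple]]]


def as_if_rolled_back(committed: Set[Triple], pending: Pending) -> FrozenSet[Triple]:
    """Read-only view that ignores every pending transaction."""
    return frozenset(committed)


def as_if_committed(committed: Set[Triple], pending: Pending) -> FrozenSet[Triple]:
    """Read-only view with every pending transaction applied."""
    view = set(committed)
    for inserted, deleted in pending.values():
        view -= deleted
        view |= inserted
    return frozenset(view)


committed = {("page1", "readableBy", "staff")}
pending = {
    "t1": ({("page1", "readableBy", "guests")}, set()),
    "t2": (set(), {("page1", "readableBy", "staff")}),
}
print(sorted(as_if_rolled_back(committed, pending)))
print(sorted(as_if_committed(committed, pending)))
```

Neither function takes a lock or changes its inputs, which is what lets a reader use these views while transactions are still open.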

Responses (Server->Client)

Responses to Queries (returned identifiers and triples)

Errors and Confirmations

(Asynchronous) Invocation of Client-Side Functions

Not needed. Use RDF federations instead.

Implementation Considerations

We may be able to do all of this either as a thin layer over PostgreSQL or (if a syntax is SQLish enough) by providing a few functions or view definitions and letting PostgreSQL parse the language itself. MySQL 3 lacks transactions, but maybe we can fake it; MySQL 4 should work.
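To give a feel for how thin such a layer might be, here is a Python sketch that uses the standard-library `sqlite3` module purely as a stand-in so the example runs without a server; it is not PostgreSQL, and the single-table layout and function names are assumptions. The underlying engine's own transactions supply commit and rollback.

```python
import sqlite3

SCHEMA = """
CREATE TABLE triples (
    db        TEXT NOT NULL,
    subject   TEXT NOT NULL,
    predicate TEXT NOT NULL,
    object    TEXT NOT NULL,
    UNIQUE (db, subject, predicate, object)
)
"""
MATCH = "db = ? AND subject = ? AND predicate = ? AND object = ?"


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def insert(conn, db, subject, predicate, obj) -> None:
    # INSERT OR IGNORE is SQLite syntax; other engines spell this differently.
    conn.execute(
        "INSERT OR IGNORE INTO triples VALUES (?, ?, ?, ?)",
        (db, subject, predicate, obj),
    )


def delete(conn, db, subject, predicate, obj) -> None:
    conn.execute(f"DELETE FROM triples WHERE {MATCH}", (db, subject, predicate, obj))


def contains(conn, db, subject, predicate, obj) -> bool:
    row = conn.execute(
        f"SELECT 1 FROM triples WHERE {MATCH} LIMIT 1",
        (db, subject, predicate, obj),
    ).fetchone()
    return row is not None


conn = open_store()
DB = "http://example.com/temperature-readings"
insert(conn, DB, DB + "#reading311232", DB + "#locationName", "Waltham")
conn.commit()

# A tentative change that is then rolled back by the engine.
delete(conn, DB, DB + "#reading311232", DB + "#locationName", "Waltham")
conn.rollback()
print(contains(conn, DB, DB + "#reading311232", DB + "#locationName", "Waltham"))
```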