Query By Assertion

status:

Just a little note explaining some of my thoughts.

Background

There's been a lot of activity on languages and interfaces for querying RDF knowledge bases. There's an approach to query architecture which appeals to me, so I've given it a name and am writing about it here. I doubt I'm the first person to suggest it, but I haven't been able to find prior art yet.

I've talked about this from time to time over the years, such as in this thread. Following that, I attempted an implementation, but I got hung up on an HTTP issue.

The Approach

Instead of asking, "Who is Bob's father?" one can say, "I want to be told the name of Bob's father." The latter form is an assertion which would have the same effect as the first form on an appropriately programmed server (such as a human). Traditional queries require a query language in which logic sentences can be embedded with query variables; query by assertion requires the logic be sufficiently expressive and that it have some kind of request-to-send-message construct.

Query variables are handled as universally quantified variables inside the query-assertion: "What are the last names of the students in 66.209?" turns into "for all students enrolled in 66.209, I want to be told the last name of the student."

More formally, instead of

    (query '(must-bind (?name))
           '((?x enrolled 66.209) (?x last-name ?name)))

we can say

    (assert '(forall (?x ?name)
               (implies (and (?x enrolled 66.209) (?x last-name ?name))
                        (send foo ?name))))
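To make the rule reading concrete, here is a minimal sketch in Python of a server firing that assertion against a small set of triples. Everything in it is an assumption for illustration: the sample facts, the helper names (`match`, `solve`, `run_assertion`), and the choice to model `send` as appending to an outbox list. It is not the notation or engine of any real QBA implementation.

```python
# Facts are (subject, predicate, object) triples; strings starting with "?" are variables.
FACTS = {
    ("alice", "enrolled", "66.209"),
    ("alice", "last-name", "Smith"),
    ("bob", "enrolled", "66.209"),
    ("bob", "last-name", "Jones"),
    ("carol", "enrolled", "18.01"),
    ("carol", "last-name", "Wu"),
}

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match(pattern, fact, bindings):
    """Extend bindings so that pattern equals fact, or return None on a clash."""
    result = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if result.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return result

def solve(patterns, facts, bindings=None):
    """Yield every set of bindings that satisfies all patterns (a conjunction)."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        extended = match(first, fact, bindings)
        if extended is not None:
            yield from solve(rest, facts, extended)

def run_assertion(body, send_target, send_var, facts):
    """Treat (forall ... (implies body (send target var))) as a rule and fire it."""
    outbox = {(send_target, b[send_var]) for b in solve(body, facts)}
    return sorted(outbox)

# The body of the asserted implication from the text above.
BODY = [("?x", "enrolled", "66.209"), ("?x", "last-name", "?name")]
messages = run_assertion(BODY, "foo", "?name", FACTS)
print(messages)  # [('foo', 'Jones'), ('foo', 'Smith')]
```

Note that the server never sees a query as such: it only sees an implication whose consequent happens to be a send.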

Motivation

Now why would we want to use this more verbose, more obscure form?

  1. We don't need another language. A query language has all the usual problems of extensibility. How will they be addressed? (Easy enough: use a query ontology instead. Still.)
  2. It's more peer-to-peer and distributable. If we phrase the requests using end point addresses, instead of "you" and "me", then we see that the messages can be passed around while maintaining their objective truth. It uses more of the RDF messaging infrastructure.
  3. Features for security, resource limits, prioritization, knowledge base selection, etc. all fit rather naturally into the phrasing of the request.
  4. Client-driven extensibility. As in the old X11-vs-NeWS debate, the QBA design is more flexible — the server is Turing complete (assuming the logic is at least Horn logic), so the client can program it to follow any desired query protocol. And yet, given the formal/declarative nature of the programming, security guarantees are possible. (Performance guarantees, on the other hand, can only be given for certain constructs. But that's okay.)

Equivalence and Implementation

One could implement a DQL answerer using rules and a SEND primitive. Maybe I should.

The QBA server sits in a standard agent perceive-deduce-act loop, where perception includes receiving assertions and acting includes sending responses. If the transmission format is actually a traditional query language (e.g. DQL), then it can be added almost directly, with little concern for trust issues.
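The loop might be sketched like this in Python. This is only an illustration under loud assumptions: the class `QBAAgent`, its inbox/outbox fields, and the modelling of an asserted rule as a plain Python function from the fact set to a set of (address, string) pairs are all hypothetical stand-ins for real inference over logical sentences.

```python
from collections import deque

class QBAAgent:
    def __init__(self):
        self.facts = set()    # ground facts the agent currently believes
        self.rules = []       # asserted query-rules (hypothetical representation)
        self.sent = set()     # actions already performed
        self.inbox = deque()  # incoming assertions, as (kind, payload) pairs
        self.outbox = []      # (address, string) messages that were "sent"

    def perceive(self):
        """Move every pending assertion into the knowledge base."""
        while self.inbox:
            kind, payload = self.inbox.popleft()
            if kind == "fact":
                self.facts.add(payload)
            elif kind == "rule":
                self.rules.append(payload)

    def deduce(self):
        """Return the send-actions entailed by the rules but not yet performed."""
        entailed = set()
        for rule in self.rules:
            entailed |= rule(self.facts)
        return entailed - self.sent

    def act(self, actions):
        for action in sorted(actions):
            self.outbox.append(action)
            self.sent.add(action)

    def step(self):
        self.perceive()
        self.act(self.deduce())

def fathers_name_rule(facts):
    """Stands in for: 'foo wants to be told the name of Bob's father.'"""
    return {("foo", o) for (s, p, o) in facts if s == "Bob" and p == "father-name"}

agent = QBAAgent()
agent.inbox.append(("rule", fathers_name_rule))
agent.step()  # nothing is known yet, so nothing is sent
agent.inbox.append(("fact", ("Bob", "father-name", "Sam")))
agent.step()  # the new fact makes the rule fire
agent.step()  # nothing changed, so no duplicate message
print(agent.outbox)
```

One side effect of this design is that the assertion behaves like a standing query: it is answered whenever the facts that satisfy it arrive, even if that is after the request.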

Where might the presumed superiority of QBA show up? A Prolog-based implementation should have similar performance.

Issues

Response by String or Sentence?

There are a few variations with QBA. The most fundamental is whether one asks to be sent (1) facts (in some pre-arranged knowledge representation language) or (2) strings. If Bob's father is named Sam, would you rather the above query be answered with "Sam" or the formula which encodes father(Bob, Sam)? I think the string approach is more fundamental, but the fact approach is often a nice abstraction to use.

In fact, using strings feels a little like level breaking. We might like the query-answering-agent to be agnostic about languages. But maybe part of it is, and part of it isn't.

For now, let's assume both: SEND for strings, and TELL for asserting sentences.

Infer a request or a goal-state?

What should be inferred?

(Those curly-braces are LX-Maple notation for quoting (reifying) a sentence, much like in N3.)

Handling Repeats

How do we handle a repeated query? We don't necessarily need to, but it seems like a good idea. We could add an "after-some-time" parameter. As long as the answerer can't prove they are the same request, it should answer it again, I think.

Fit With HTTP

How do you fit this with HTTP which has GET, PUT, POST? I take these primitives to mean "get the contents of the KB", "replace the contents of the KB", and "add to the KB". So QBA would naturally put querying under POST, and never use GET. This is bad web architecture.

Solution 1: for each KB at URI U1 and QBA sentence S, define a KB at URI U2 from which you can GET the result string for S. U2 is formed as concat(U1, "/qbaxml=", url-encode(rdfxml-encode(S))). This works with a KB having a single output-string, not various send-requests. That's different, but maybe we can bridge the gap, such as by send(STDOUT, "string"), where STDOUT basically is a variable bound to whoever is doing the asking.
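A minimal Python sketch of forming U2 under Solution 1. The function names, the example KB address, and the sample sentence are assumptions; the rdfxml-encode step is not implemented here, so the sentence is taken to be already serialized as RDF/XML.

```python
from urllib.parse import quote, unquote

MARKER = "/qbaxml="

def qba_uri(kb_uri, rdfxml_sentence):
    """U2 = concat(U1, "/qbaxml=", url-encode(S)), with S already in RDF/XML."""
    # safe="" also escapes "/" so the encoded sentence stays one path segment.
    return kb_uri + MARKER + quote(rdfxml_sentence, safe="")

def split_qba_uri(uri):
    """Server side: recover (U1, S) from U2."""
    base, _, encoded = uri.rpartition(MARKER)
    return base, unquote(encoded)

U1 = "http://example.org/kb"  # hypothetical KB address
S = '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'  # placeholder sentence
U2 = qba_uri(U1, S)
print(U2)
```

Because U2 is an ordinary URI, it can be linked to, bookmarked and cached like any other GET target.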

Solution 2: put the QBA string in a header on the GET. This allows longer strings and can potentially still be cached (see the "Vary" header), but it does not allow linking.
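A sketch of what the client side of Solution 2 could look like with Python's standard library. The header name `QBA-Sentence` is invented for this example, as are the KB address and sample sentence; the sentence is percent-encoded because header values cannot carry raw newlines. The request is only constructed here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

QBA_HEADER = "QBA-Sentence"  # hypothetical header name, not a registered one

def qba_get_request(kb_uri, rdfxml_sentence):
    """Build a GET on the KB itself, carrying the QBA sentence in a header."""
    headers = {QBA_HEADER: quote(rdfxml_sentence, safe="")}
    return Request(kb_uri, headers=headers, method="GET")

U1 = "http://example.org/kb"  # hypothetical KB address
S = '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'  # placeholder sentence
req = qba_get_request(U1, S)

# urllib normalizes the capitalization of header names, so compare in lower case.
sent_headers = {name.lower(): value for name, value in req.header_items()}
print(req.get_method(), req.full_url, sent_headers)
```

For caching to work, the server would presumably name this header in the `Vary` header of its response, so that caches store a separate entry per sentence.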

These work together okay. You can use 1 until the URI gets to be too long. Having qbaxmlz might be nice too, using gzip and base-64 (or slightly higher) encoding, which I'd expect would allow RDF graphs at least 3 times as large to fit in URIs.
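A rough sketch of the qbaxmlz idea in Python. The names `pack` and `unpack` are made up, and `zlib` is used in place of the gzip container (same DEFLATE compression, smaller header); that substitution is my assumption, not part of the proposal. The sample graph is artificially repetitive, so the sizes it prints say nothing reliable about real RDF.

```python
import base64
import zlib
from urllib.parse import quote

def pack(sentence):
    """Compress, then encode with the URL-safe base-64 alphabet (padding dropped)."""
    raw = zlib.compress(sentence.encode("utf-8"), 9)
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")

def unpack(token):
    """Inverse of pack: restore the padding, decode, decompress."""
    padded = token + "=" * (-len(token) % 4)
    return zlib.decompress(base64.urlsafe_b64decode(padded)).decode("utf-8")

# Artificial sample: the same statement repeated 20 times.
STATEMENT = (
    '<rdf:Description rdf:about="http://example.org/student">'
    "<ex:enrolled>66.209</ex:enrolled></rdf:Description>"
)
SAMPLE = "<rdf:RDF>" + STATEMENT * 20 + "</rdf:RDF>"

plain_len = len(quote(SAMPLE, safe=""))  # size under the qbaxml encoding
packed_len = len(pack(SAMPLE))           # size under the qbaxmlz encoding
print(plain_len, packed_len)
```

The URL-safe alphabet matters here: standard base-64 uses "+" and "/", which would need percent-encoding again inside a URI.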

What is the generalized meaning of this qbaxml construct, then? (Do "accept" headers fit in somewhere, too?) It's a way of building one KB from another via filtering rules — making a smaller KB. Maybe something closer to cwm --filter would make more sense.

None of this matters, of course, if you're doing RDF message passing over UDP, SMTP, etc. These issues are an artifact of putting RDF messages into persistent slots (web addresses) and then wanting to retrieve only pieces of them. This is awkward. There is an agent responsible for answering your requests (the web server) and it can do things like make parts of the information appear at another address (as in solution 1), or simply do arbitrary other processing (as in solution 2), but it gets complex.


Sandro Hawke
First: 2002/10/17; This: $Id: Overview.html,v 1.2 2002/10/17 16:02:07 sandro Exp $