The Semantic Web and the "History" Technique

The world and our understanding of it change over time, but the changes which have occurred in the past do not, themselves, change. We can use this underlying stability of historical fact to manage the complexity of dynamic information systems. We can establish a duality: sometimes we view the dynamic current state of the system, and sometimes we view the collection of historical states. The sequence of historical states grows larger (and smaller, when certain past states are forgotten), but the states themselves never change.

The use of this duality is hardly new. Maintaining a transaction history is common in robust systems, from high-value commercial transactions (think of real-estate title searches) to database management systems (with their transaction logs). More recently it has become commonplace in software development as "source control". Perhaps more fundamentally, physics has involved explicit awareness of change over time since at least Newton's time.

This is an essential perspective as we consider RDF databases: do we codify a transactional database access protocol, or do we simply allow appending to history? Some of the harder database problems look much simpler with the latter approach. Of course, the two interfaces are perfect duals, so we can define either or both. If we don't define the history interface, though, we need database-specific solutions, rather than shared ones, for transactions, access control, query routing, and so on.
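A minimal sketch of that duality, assuming triples are plain Python tuples and a change is just a set of added and a set of deleted triples (the names Change, HistoryStore, append, current, history and update are hypothetical, not any real RDF store's API). The store's only primitive is appending to history; the familiar update-the-current-state interface is derived from it.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object); illustrative only


@dataclass(frozen=True)
class Change:
    """One immutable history entry: the statements added and deleted."""
    additions: FrozenSet[Triple]
    deletions: FrozenSet[Triple]


class HistoryStore:
    """Hypothetical store whose only write primitive is appending to history."""

    def __init__(self) -> None:
        self._history: List[Change] = []

    def append(self, change: Change) -> None:
        # Past entries are never modified; the log only grows.
        self._history.append(change)

    def history(self) -> Tuple[Change, ...]:
        # Read-only view of the historical record.
        return tuple(self._history)

    def current(self) -> FrozenSet[Triple]:
        # The current state is derived by replaying the whole history in order.
        state: Set[Triple] = set()
        for change in self._history:
            state -= change.deletions
            state |= change.additions
        return frozenset(state)

    def update(self, new_state: FrozenSet[Triple]) -> None:
        # The "transactional" interface: express the desired state as a change
        # relative to the current state, then append that change.
        old = self.current()
        self.append(Change(additions=new_state - old, deletions=old - new_state))
```

Going the other way (recording each update as a history entry) is just as easy, which is the sense in which the interfaces are duals. Replaying the full log on every read is the naive choice here; a real store would presumably cache or checkpoint.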

Messages and Web Pages

A message is a historical artifact: a collection of information, expressed in some language and transmitted at some point in time. It is by definition immutable; our knowledge and understanding of a particular message may change, but the truth of what was transmitted is a matter of history.

In contrast, a web page is an identifiable collection of information in the web's information space. It can be highly dynamic, changing with each viewing. Messages are transmitted in the act of viewing (the client sends a request message to the server, and the server sends a response message back to the client), but they should not be confused with the page itself. The response message from a successful viewing essentially contains a serialization of the information comprising the web page at the time of viewing.
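To make the distinction concrete, here is a sketch in which a message is an immutable (frozen) record and a page is anything that can render its content on demand. The Page and Message classes, and the integer logical clock standing in for real transmission times, are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Message:
    """A transmitted message: fixed once created (assigning to a field raises)."""
    sent_at: int  # logical timestamp; a simplification for this sketch
    body: str


class Page:
    """Hypothetical web page: an identifiable, changeable source of information."""

    def __init__(self, uri: str, render: Callable[[int], str]) -> None:
        self.uri = uri
        self.render = render  # produces the page's content at a given time

    def view(self, now: int) -> Message:
        # Viewing yields a response message holding a serialization of the
        # page as it was at the moment of viewing.
        return Message(sent_at=now, body=self.render(now))
```

Viewing the page at two times yields two different messages, and neither message changes afterwards, whatever happens to the page.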

The contents of a message may, of course, be stored on a web page. Information about a message, such as the time of its transmission and other context, may also be stored on a web page. These web pages may change over time as evidence about the true contents and context of the messages is uncovered or reinterpreted.

Sentences and Knowledge Bases

Similarly, an expression in a logic language is not, itself, mutable: to change it is to make a different expression. A storage facility for expressions is often called a knowledge base, although the more precise term might be an installation of a knowledge base management system.

The Sum of History

In physics, it is easy to take one state, add some changes, and see the resulting state. At least, it's easy if you know vector arithmetic.
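Written as a formula, the physics picture is a running sum of changes. This is a sketch in notation introduced here rather than the author's: s_0 is an initial state and Δs_i the i-th change. The second formula is the analogous step for a set of statements G, given the statements added (A_n) and deleted (D_n) at step n. Unlike vector addition, the set version depends on order, so a history has to be replayed in sequence.

```latex
\mathbf{s}_n = \mathbf{s}_0 + \sum_{i=1}^{n} \Delta\mathbf{s}_i
\qquad
G_n = (G_{n-1} \setminus D_n) \cup A_n
```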

A history needs a language for completely describing the changes from one state to another.

For RDF without bNodes, this requires only two predicates, addition and deletion, whose range is the change and whose domain is rdf:Statement. (Sometimes we might want to know that we have all the additions and deletions, and be tempted to use lists of them, but that is a general problem and should be addressed in a general manner.)
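For ground RDF (no bNodes), a change can be sketched as two sets of triples, one per predicate. Triples are plain string tuples here rather than any RDF library's types, and diff and patch are hypothetical names echoing the Delta/Sigma note.

```python
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]  # ground triples only: no bNodes


def diff(old: Set[Triple], new: Set[Triple]) -> Tuple[FrozenSet[Triple], FrozenSet[Triple]]:
    """Return (additions, deletions) that turn `old` into `new`."""
    return frozenset(new - old), frozenset(old - new)


def patch(graph: Set[Triple], additions: FrozenSet[Triple],
          deletions: FrozenSet[Triple]) -> Set[Triple]:
    """Apply a change, returning a new set and leaving `graph` untouched."""
    return (graph - deletions) | additions
```

Because every triple is ground, each deletion names exactly one statement, so patch(old, *diff(old, new)) always reproduces new.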

For RDF with bNodes, however, this is not always possible. Imagine that you know Joe is wearing a red hat, but you say only, "Someone is wearing a red hat". Now Joe takes it off. Saying, "Someone is no longer wearing a red hat" is not right at all: that might have been true before (lots of people are no longer wearing red hats), and it doesn't even contradict the notion that Joe is still wearing a red hat.
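A sketch of why deletion breaks down here, using the same tuple representation and treating any term starting with "_:" as a blank node. The is_bnode, matches and candidates helpers are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def is_bnode(term: str) -> bool:
    # Convention for this sketch: blank-node labels start with "_:".
    return term.startswith("_:")


def matches(pattern: Triple, triple: Triple) -> bool:
    """Treat blank nodes in `pattern` as 'something', i.e. as wildcards."""
    return all(is_bnode(p) or p == t for p, t in zip(pattern, triple))


def candidates(graph: Set[Triple], pattern: Triple) -> List[Triple]:
    """Every known statement the bNode pattern could be talking about."""
    return sorted(t for t in graph if matches(pattern, t))
```

Deleting the bNode triple by exact match removes nothing. Treating the bNode as a wildcard finds every red-hat wearer we know of, with no way to tell which one was meant, or whether it was someone we have never heard of.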

Sometimes ontological knowledge can help. If you know only one person was born on July 3rd, then "Someone born on July 3rd is wearing a red hat" can be effectively contradicted by "Someone born on July 3rd is no longer wearing a red hat". But since we won't always have such knowledge: avoid ambiguous identifiers! They serve no purpose and often make retraction impossible! <sigh/>
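Where a property is known to identify at most one thing (in OWL terms, an inverse-functional property), a bNode description can be resolved before retracting. Treating a hypothetical ex:birthday property that way is purely this example's assumption, following the July 3rd scenario; resolve and retract_about are likewise hypothetical.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Assumption for this sketch: each birthday value identifies at most one person.
UNIQUE_KEYS = {"ex:birthday"}


def resolve(graph: Set[Triple], key_prop: str, key_value: str) -> Optional[str]:
    """Return the single node with key_prop = key_value, or None if unresolvable."""
    if key_prop not in UNIQUE_KEYS:
        return None  # without ontological knowledge we cannot resolve
    subjects = {s for s, p, o in graph if p == key_prop and o == key_value}
    return subjects.pop() if len(subjects) == 1 else None


def retract_about(graph: Set[Triple], key_prop: str, key_value: str,
                  pred: str, obj: str) -> Set[Triple]:
    """Retract '<someone with this key> pred obj' once that someone is resolved."""
    node = resolve(graph, key_prop, key_value)
    if node is None:
        raise ValueError("cannot tell which statement to retract")
    return graph - {(node, pred, obj)}
```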

The Web Page as a Query-Result

A simple and coherent view of web pages (especially RDF ones) treats them as the results of a query against the summation of all known messages.

More precisely, they are database views. They may be materialized (forward-chained, generated when you learn of the messages) or not (backward-chained, generated when someone asks for them).
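A sketch of the two options. It makes the simplifying assumption that each message only asserts triples (no deletions) and that a page is defined by a query function over the accumulated knowledge; LazyPage and MaterializedPage are hypothetical names.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]
Query = Callable[[Set[Triple]], FrozenSet[Triple]]


def sum_of_messages(messages: List[Set[Triple]]) -> Set[Triple]:
    """Assumption: each message simply asserts a set of triples."""
    kb: Set[Triple] = set()
    for m in messages:
        kb |= m
    return kb


class LazyPage:
    """Backward-chained: the query runs when someone asks for the page."""

    def __init__(self, query: Query, messages: List[Set[Triple]]) -> None:
        self.query, self.messages = query, messages

    def get(self) -> FrozenSet[Triple]:
        return self.query(sum_of_messages(self.messages))


class MaterializedPage:
    """Forward-chained: the page is recomputed as each message is learned."""

    def __init__(self, query: Query) -> None:
        self.query = query
        self.kb: Set[Triple] = set()
        self.content: FrozenSet[Triple] = frozenset()

    def learn(self, message: Set[Triple]) -> None:
        self.kb |= message
        self.content = self.query(self.kb)
```

Both give the same content; they differ only in when the query runs.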

What happens if someone wants to modify the view? You let them try, then work out what message you would have needed to receive to produce that result. Then you have them send that message.
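One naive way to do this is sketched below, assuming the view is a function of the knowledge base and that a message is an (additions, deletions) pair; message_for_edit is a hypothetical helper. It guesses that the message should add and delete exactly what changed in the view, then checks the guess by recomputing the view.

```python
from typing import Callable, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
View = Callable[[Set[Triple]], FrozenSet[Triple]]


def message_for_edit(kb: Set[Triple], view: View,
                     desired: FrozenSet[Triple]) -> Tuple[FrozenSet[Triple], FrozenSet[Triple]]:
    """Find a change message (additions, deletions) making view(kb) == desired."""
    current = view(kb)
    # Naive guess: add and delete exactly the statements that differ in the view.
    additions = desired - current
    deletions = current - desired
    # Check the guess: would that message really have produced the edited view?
    if view((kb - deletions) | additions) != desired:
        raise ValueError("no simple message produces that view")
    return additions, deletions
```

The guess works for views that merely select statements. For views that transform or combine them, this is the classic view-update problem, and the check will often fail.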

Related Ideas

When you're communicating by modifying a shared information space, it's sometimes nice to actually receive the modifications as messages. This idea is variously called publish/subscribe, observer, web-push, etc. I like to call these "standing queries".
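A minimal publish/subscribe sketch, assuming a standing query is just a triple pattern with None as a wildcard; the Hub class and its methods are hypothetical.

```python
from typing import Callable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None = wildcard
Callback = Callable[[Set[Triple]], None]


class Hub:
    """Hypothetical broker for standing queries over incoming messages."""

    def __init__(self) -> None:
        self.subs: List[Tuple[Pattern, Callback]] = []

    def subscribe(self, pattern: Pattern, callback: Callback) -> None:
        self.subs.append((pattern, callback))

    def publish(self, message: Set[Triple]) -> None:
        for pattern, callback in self.subs:
            # Keep only the part of the message this standing query cares about.
            hits = {t for t in message
                    if all(p is None or p == v for p, v in zip(pattern, t))}
            if hits:
                callback(hits)  # the subscriber receives the change as a message
```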

The problem of updating N web pages with each of M messages can sometimes be solved with less than N*M work. For instance, if the pages cluster by topic (e.g., pages which mention money and those which do not), the M messages can be split along the same lines (give or take variables), so each message is checked only against the pages in its cluster. This is the dual of part of the query-optimization problem.
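A sketch of the routing idea. It uses a triple's predicate as a crude stand-in for topic clustering; a real system would cluster on whatever distinguishes, say, money pages from the rest. Router and topics_of_triples are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def topics_of_triples(triples: Set[Triple]) -> Set[str]:
    """Hypothetical classifier: a triple's predicate serves as its topic."""
    return {p for _, p, _ in triples}


class Router:
    """Route each message only to pages registered under a topic it touches."""

    def __init__(self) -> None:
        self.pages_by_topic: Dict[str, List[str]] = defaultdict(list)
        self.checks = 0  # how many (page, message) pairs were examined

    def register(self, page: str, topic: str) -> None:
        self.pages_by_topic[topic].append(page)

    def route(self, message: Set[Triple]) -> List[str]:
        affected: List[str] = []
        for topic in topics_of_triples(message):
            # .get avoids creating empty entries for unknown topics.
            for page in self.pages_by_topic.get(topic, []):
                self.checks += 1
                affected.append(page)
        return affected
```

With three pages, two of them about hats, a single hat message is checked against only those two pages rather than all three.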

Query-by-assertion fits in here somewhere, too. It certainly makes standing queries more logical.

Notes: RDF Delta/Sigma (Diff/Patch)

This work is being done as part of the MIT/LCS DAML Project under the MIT/AFRL cooperative agreement number F30602-00-2-0593. This work is not on the W3C recommendation track and is not the product of a W3C working group or interest group.

Sandro Hawke
First: 2002/12/06; This: $Date: 2002/12/10 14:52:28 $