A database federation is a database composed of multiple autonomous databases. This is the natural domain of RDF, which addresses the traditional naming problems in such systems by using web-style identifiers for schemas, objects, and datatypes. With RDF as a foundation, building a database federation with distribution transparency is fairly straightforward.
We imagine an RDF database federation appearing, via the RDF Database Access Protocol, as just another database, although it may offer additional features for clients that want to avoid total distribution transparency (e.g. for provenance or debugging) or that manage the federation.
We view database logics for security, validation and inference as additional members in the federation. This approach requires a sort of introspection, where federation members have some access to the rest of the federation, but still offers a relatively simple approach to complex functionality.
The central access point of the federation is a module we call "the dispatcher." Each database operation request arrives at the dispatcher, is matched against the current federation map (below), and is sent out to the appropriate members, as if the dispatcher were the client. The responses arrive back at the dispatcher, are combined, and a response is sent back to the original client.
If the federation map is trivial (just a set of members), the dispatcher's rules are very simple. All operations go to all members. If any member rejects an operation (perhaps for security or validity reasons), the operation is rejected for the entire federation. Query results are combined (via union) from all members. Insertions and deletions are passed on to each member. All changes occur in distributed (2-phase) transactions, where no change is committed anywhere unless it is accepted everywhere.
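The trivial-map rules above can be sketched in a few lines. This is only an illustration under stated assumptions, not the protocol itself: the Member, ReadOnlyMember and Dispatcher classes and their method names (prepare_insert, commit, abort) are hypothetical, members are in-memory sets of triples, and the two-phase commit is reduced to a vote followed by commit-or-abort on every member.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


class Member:
    """Hypothetical in-memory member database with a two-phase interface."""

    def __init__(self, name: str, triples: Iterable[Triple] = ()) -> None:
        self.name = name
        self.triples: Set[Triple] = set(triples)
        self._pending: Set[Triple] = set()

    def query(self) -> Set[Triple]:
        return set(self.triples)

    def prepare_insert(self, triple: Triple) -> bool:
        """Phase 1: stage the change and vote on it without applying it."""
        self._pending = {triple}
        return True

    def commit(self) -> None:
        """Phase 2: apply whatever was staged."""
        self.triples |= self._pending
        self._pending = set()

    def abort(self) -> None:
        """Phase 2 (failure): discard whatever was staged."""
        self._pending = set()


class ReadOnlyMember(Member):
    """A member that rejects every insertion, e.g. for security reasons."""

    def prepare_insert(self, triple: Triple) -> bool:
        return False


class Dispatcher:
    """Trivial federation map: every operation goes to every member."""

    def __init__(self, members: Iterable[Member]) -> None:
        self.members: List[Member] = list(members)

    def query(self) -> Set[Triple]:
        # Query results are the union of all members' results.
        result: Set[Triple] = set()
        for member in self.members:
            result |= member.query()
        return result

    def insert(self, triple: Triple) -> bool:
        # Collect every vote first; nothing is committed anywhere
        # unless the change is accepted everywhere.
        votes = [member.prepare_insert(triple) for member in self.members]
        accepted = all(votes)
        for member in self.members:
            if accepted:
                member.commit()
            else:
                member.abort()
        return accepted
```

With two ordinary members, an insertion lands in both; adding a ReadOnlyMember to the same federation causes the insertion to be rejected for everyone, which is the all-or-nothing behavior described above.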
It may be possible to implement some kinds of dispatchers as client-side modules, but in general dispatchers are trusted by the federation members. For instance, some member X may allow access only when it is in a federation with another member Y which does detailed access control for X. This requires X to trust the dispatcher to always check with Y.
Dispatchers may be replicated for performance and fault tolerance. @@@@ issues!
A federation is defined by a federation "map" which enumerates the federation's member (component) databases, along with configuration information which helps performance and in some cases gives different results.
One part of the map is "territories", which allow the dispatcher to skip certain members for certain operations based on the particular triples involved in the operation. The map says things like: "all the triples I hold fit into one of these patterns; assume I don't have anything else" and "I will allow insertion into the federation of triples fitting this pattern, but I won't store them, so you don't need to tell me about them." @@@@ details needed!
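Since the details of territories are still open, the following is only one possible reading, given as a sketch. It assumes a territory is a list of triple patterns in which None acts as a wildcard, and that the dispatcher consults a member only if the triple fits one of that member's patterns. The names matches and members_for, and the example prefixes, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
# A pattern is a triple in which None means "anything" (an assumption).
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def matches(pattern: Pattern, triple: Triple) -> bool:
    """True if every non-wildcard position of the pattern equals the triple's."""
    return all(p is None or p == t for p, t in zip(pattern, triple))


def members_for(triple: Triple,
                territories: Dict[str, List[Pattern]]) -> List[str]:
    """Names of the members whose declared territory covers this triple."""
    return [name for name, patterns in territories.items()
            if any(matches(p, triple) for p in patterns)]


# Hypothetical map: one member holds only name triples, one holds anything.
territories = {
    "people": [(None, "foaf:name", None)],
    "everything": [(None, None, None)],
}
```

Here an operation on a foaf:name triple would be sent to both members, while an operation on any other triple would skip "people". Queries containing variables would need an overlap test between patterns rather than a match against a concrete triple; that case is not covered here.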
Another part of the map is "priorities", which group the members into layers which are processed sequentially. All members in the highest-priority layer have an opportunity to process an operation before the lower-priority layers. Members below the "commit" priority (a special priority value) do not have an opportunity to reject operations. (That is, all layers down to and including the "commit" layer participate in the 2-phase commit.) Priorities also allow control over interceptions, where a member can accept an operation and hide it from lower-priority members (or give them a modified version).
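The layering can be sketched as follows. This is a simplified illustration, not a specification: operations are plain strings, COMMIT_PRIORITY is assumed to be 0, the Verdict and Member types and the dispatch function are hypothetical, and the prepare/commit phases are not modeled. The sketch only records which members see which version of an operation and whether the federation accepts it.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple

COMMIT_PRIORITY = 0  # assumed value of the special "commit" priority


class Verdict(NamedTuple):
    accept: bool = True                # ignored below COMMIT_PRIORITY
    intercepted: bool = False          # True if lower layers see something else
    replacement: Optional[str] = None  # new operation, or None to hide it


class Member(NamedTuple):
    name: str
    priority: int
    handle: Callable[[str], Verdict]


def dispatch(members: List[Member], op: str) -> Tuple[bool, List[Tuple[str, str]]]:
    """Run op through the layers, highest priority first.

    Returns (accepted, log), where log lists (member name, operation seen).
    """
    log: List[Tuple[str, str]] = []
    current = op
    for priority in sorted({m.priority for m in members}, reverse=True):
        layer = [m for m in members if m.priority == priority]
        verdicts = []
        for member in layer:
            log.append((member.name, current))
            verdicts.append(member.handle(current))
        # Only layers at or above the commit priority may reject.
        if priority >= COMMIT_PRIORITY and not all(v.accept for v in verdicts):
            return False, log
        for verdict in verdicts:
            if verdict.intercepted:
                if verdict.replacement is None:
                    return True, log   # hidden from all lower layers
                current = verdict.replacement
    return True, log
```

For example, a high-priority guard can reject an operation before any store sees it, a mid-priority member can hand a rewritten operation to the layers beneath it, and a below-commit logger can observe operations but cannot veto them.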
Member databases use essentially the same Request/Response protocol as any other RDF Database, but with some additional information present in the requests and responses and a few additional operations.
Response: Error Levels
Response: Interception (replacing the Request)
Introspection: requests include a handle to the federation, so this member can make client-like requests of the federation. These requests use a form of proxy authentication; other members will know they come from a fellow member. The issuing member can of course recognize its own operations, either by seeing its own identity as the client or by recognizing the request's identity.
A federation can have one or more high-priority members which reject unauthorized operations. They can be custom coded, or have general logic driven by information in the federation (available through introspection) or another database.
Data validation can also be performed by members, custom coded or data-driven. Some validation can be done at a high priority, where just looking at the operation is enough. Other validation can be done later (but at/above commit priority) by introspectively looking at the tentatively-modified database. This requires that introspective requests can be issued inside a specific transaction.
Below-commit members can add new data to the federation in response to data which arrives and satisfies inference rules. Care must be taken to remove data as appropriate (or, worst case, disallow deletions in the appropriate territories).
If the inferred data is to be subjected to validation, the inference should occur above-commit and be in the same transaction.
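As a sketch of such a forward-chaining member, assume a single hard-coded rule (if x has type C and C is a subclass of D, then x has type D) and a hypothetical FederationHandle class standing in for the introspection handle. The on_insert function and the handle's query and insert methods are assumptions made for illustration. Retraction of inferred data on deletion, which the text above warns about, is not handled, and the rule fires only on newly inserted type triples, not on newly inserted subclass triples.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class FederationHandle:
    """Hypothetical stand-in for the introspection handle a member receives."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def query(self, pattern: Pattern) -> Set[Triple]:
        # None in a pattern position matches anything.
        return {t for t in self._triples
                if all(p is None or p == v for p, v in zip(pattern, t))}

    def insert(self, triple: Triple) -> None:
        self._triples.add(triple)


def on_insert(federation: FederationHandle, inserted: Triple) -> Set[Triple]:
    """Forward-chain from one inserted triple; return the triples added."""
    added: Set[Triple] = set()
    worklist = [inserted]
    while worklist:
        subject, predicate, cls = worklist.pop()
        if predicate != "rdf:type":
            continue
        for _, _, superclass in federation.query((cls, "rdfs:subClassOf", None)):
            derived = (subject, "rdf:type", superclass)
            if not federation.query(derived):   # skip what is already known
                federation.insert(derived)
                added.add(derived)
                worklist.append(derived)        # derived facts may fire the rule again
    return added
```

Because a derived triple is inserted only when it is not already present, the loop terminates even if the subclass data contains a cycle.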
A member can respond to query operations by attempting to backward-chain through inference rules, using the federation as a knowledge base. The rules themselves might be in the federation, but don't need to be.
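A minimal sketch of backward chaining, assuming the same single subclass rule and treating the federation as a plain set of triples (a real member would go through the introspection handle instead). The function name holds is hypothetical. The rule is hard-coded here, corresponding to the case where the rules are not stored in the federation.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


def holds(kb: Set[Triple], goal: Triple,
          seen: Optional[Set[Triple]] = None) -> bool:
    """True if goal is in kb or can be derived by chaining backward.

    Rule (assumed): (x rdf:type D) holds if (C rdfs:subClassOf D)
    and (x rdf:type C) holds.
    """
    seen = set() if seen is None else seen
    if goal in kb:
        return True
    if goal in seen:            # already being explored: avoid cycles
        return False
    seen.add(goal)
    subject, predicate, cls = goal
    if predicate == "rdf:type":
        for sub, pred, sup in kb:
            if pred == "rdfs:subClassOf" and sup == cls:
                # Replace the goal by the rule's premise and recurse.
                if holds(kb, (subject, "rdf:type", sub), seen):
                    return True
    return False
```

Unlike forward chaining, nothing is written to the federation: the answer is computed at query time, so there is no inferred data to retract when the underlying triples are deleted.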
It is possible for a member to respond to an insertion by attempting to remove data from the federation. This could get very complicated, but may be useful. It seems more elegant and simple to allow this behavior than disallow it.
$Id: federation.html,v 1.2 2002/01/08 05:23:01 sandro Exp $