RdfDB Application Environment

One of the key features that the semantic web brings us is unique identifiers for a global distributed database. User-oriented tools for the semantic web are largely one-off applications with development and support coming from programmers with considerable investment in the learning curve. But what are the alternatives? What semi-coddling environment can be presented to a user for simpler application development? For deployment and market experience, we can look to Lotus Notes and several similar open-source development efforts.

In Lotus Notes, a relatively inexperienced developer may craft applications for a user base with little computer savvy. This was a goal of Notes and it has proven itself in many large corporate environments. Like most 4GLs, it has a legacy of macro functions with "controversial" side-effects, but the underlying relationship (@@@ name) identification and the replication of data and structure have provided a solid environment for large distributed databases.

What if such an engine were built on top of the semantic web?

Existing Semantic Web Models

Getting data into the semantic web is pretty easy; one publishes it in RDF, N3, SOAP, or perhaps colloquial XML with namespaces. One could argue that publishing any data in a domain that has a defined mapping to semantic web structures constitutes publishing it in the semantic web, as consumers may get to that data via a gateway. The problem with this model is that great effort needs to go into harvesting this data. Search engines or domain-specific data aggregators like user agents need to maintain large lists of data sources. Source maintenance may be semi-centralized through indirection, but this still represents considerable administrative effort.
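The harvesting burden can be made concrete with a minimal sketch. Everything in it is an assumption for illustration: the source URLs, the `PUBLISHED` table standing in for the network, and the `fetch` and `harvest` helpers are hypothetical, and a real harvester would retrieve and parse each document rather than look it up in a dictionary.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical stand-in for documents published on the web.
PUBLISHED: Dict[str, List[Triple]] = {
    "http://example.org/staff.rdf": [
        ("http://example.org/people#alice",
         "http://example.org/terms#worksFor",
         "http://example.org/orgs#acme"),
    ],
    "http://example.net/orgs.rdf": [
        ("http://example.org/orgs#acme",
         "http://example.org/terms#name",
         "Acme"),
    ],
}


def fetch(url: str) -> List[Triple]:
    """Pretend to retrieve and parse one source; raises KeyError if it is gone."""
    return PUBLISHED[url]


def harvest(source_list: Iterable[str],
            fetch_fn: Callable[[str], List[Triple]] = fetch):
    """Visit every listed source and pool its triples into one store."""
    store: Set[Triple] = set()
    unreachable: List[str] = []
    for url in source_list:
        try:
            store.update(fetch_fn(url))
        except KeyError:
            # Someone has to notice this and repair the source list.
            unreachable.append(url)
    return store, unreachable


sources = [
    "http://example.org/staff.rdf",
    "http://example.net/orgs.rdf",
    "http://example.com/moved.rdf",  # stale entry in the list
]
store, unreachable = harvest(sources)
```

The aggregator only knows what is in `sources`; a source that is never listed is never harvested, and a stale entry has to be found and repaired by whoever administers the list.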

Advantages of Pre-aggregated Semantic Web Databases

There are usually four compelling arguments for aggregating data into a centralized relational database.

administration: It is easier to manage a central repository of data than instances of the data spread among the consumers.
avoid version skew: Distributed, otherwise unmanaged data may be updated in one place and not be propagated to other consumers needing the update.
access speed: Centralized databases are optimized to have mutexes at the lowest level of the read/write process. Non-centralized databases put these mutexes up at higher levels of protocol, for instance, when someone updates a web page with new information.
structural clarity: External keys ensure node convergence [use better term or explain this one].

Some organizations choose to go all out on data centralization, either through all-encompassing relational databases, or through more fluid databases like Notes.

All of the above arguments except for structural clarity are valid for promoting larger centralized semantic web databases. That data is available in, say, RDF, does not make it easier or faster to access, or easier to maintain. The structure inherent to the semantic web will make the process of data centralization simpler and therefore probably even more attractive than the aggregation of chaotic data into a relational database.
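Why that inherent structure simplifies centralization can be sketched in a few lines, assuming triples are held as plain tuples and that both sources already use the same global identifier for the same thing. The `merge` and `describe` helpers and all URIs below are hypothetical, chosen only for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

ALICE = "http://example.org/people#alice"  # hypothetical global identifier

# Two independently maintained sources that happen to describe the same node.
hr_data: Set[Triple] = {
    (ALICE, "http://example.org/terms#worksFor", "http://example.org/orgs#acme"),
    (ALICE, "http://example.org/terms#name", "Alice"),
}
directory_data: Set[Triple] = {
    (ALICE, "http://example.org/terms#phone", "555-0100"),
    (ALICE, "http://example.org/terms#name", "Alice"),  # duplicate statement
}


def merge(*graphs: Set[Triple]) -> Set[Triple]:
    """Centralize by set union; no key-mapping step is needed."""
    return set().union(*graphs)


def describe(graph: Set[Triple], subject: str):
    """Collect every (predicate, object) pair stated about one subject."""
    return {(p, o) for s, p, o in graph if s == subject}


central = merge(hr_data, directory_data)
about_alice = describe(central, ALICE)
```

Because both sources name the node with the same identifier, their statements land on one node in `central` and the duplicate statement collapses. Aggregating the same records into a relational database would first require agreeing on how each source's local keys map to one another.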

Disadvantages of Pre-aggregated Semantic Web Databases

There are some reasons not to take centralization too far.

autonomy: Data that crosses organization boundaries is more easily maintained through indirection than treaty. [All right, so there's a treaty either way, but the indirection treaty need not go into write access issues, just maintenance and responsibility for maintained data.]
rigor: Not all data can be maintained centrally. Organizations that maintain most of their data centrally may find it seductive to model it as if all data were centralized, hardcoding the remaining data into the applications.

notes

inter-domain gateways: Find out how Notes replicated data between domains. Inter-domain mail exchange may use such a mechanism. It could also just be a custom hack for SMTP-like database elements.
choose an existing effort: Yoga, Casbah, and CNU Gather all have some code developed. Choose based on investment, enthusiasm, an accepting community, and architecture.

Eric Prud'hommeaux

Last modified: Mon Apr 2 08:42:57 EDT 2001