Cambridge Semantics: Position Paper RDF Access to Relational Databases
Interest in RDF Access to Relational Databases
Cambridge Semantics Inc. develops semantic middleware to support the development of advanced semantic applications. We believe that building applications that take advantage of the formality, agility, and distributed nature of RDF data and the stack of Semantic Web technologies requires an approach to middleware that bridges a world of powerful semantic data and services with a world of quick and flexible application design, development, and deployment. We also believe that enterprise adoption of semantic technologies will continue to be evolutionary, and one cannot expect to succeed with semantic applications that ignore the substantial amount of existing data stored in traditional relational databases.
To build the above bridge, the semantic data and services that sit atop legacy data and services must have a variety of properties to accommodate the requirements of semantic application architectures. Cambridge Semantics believes that interoperability in providing these properties is necessary if one is to reasonably expect to develop downstream middleware that works in conjunction with both native semantic stores and semantic layers on top of legacy stores. As already noted, relational data is a substantial portion of existing enterprise data, and as such, we feel this workshop is important to establish common properties for vendors providing solutions that expose relational data as semantic data (RDF).
Desired Areas of Interoperability
Cambridge Semantics has identified five areas that will benefit from vendor interoperability in providing RDF access to relational databases:
- Configuration. Perhaps the most divergent area among current solutions for accessing RDBMSes as RDF, configuration refers to the mapping information (generated either automatically or manually) that relates one or more relational schemas to an RDF vocabulary, schema, or an OWL ontology. Cambridge Semantics supports an interoperable approach to configuration based on mapping information supplied as RDF data. We feel that specifying configurations as RDF provides the greatest flexibility in configuration query and analysis, and in moving between different underlying RDBMSes.
- Service discovery/description. Perhaps even more so than with existing RDF stores and SPARQL endpoints, we expect that implementations of RDF access to relational databases will vary significantly in the configuration options, data access methods, and other features that they support. Because of this, we feel that an interoperable service description language/vocabulary will be important to the success of efforts to build vendor-neutral semantic applications. Possible candidate designs on which this work could be based are described on the SPARQL Endpoint Description ESW Wiki page.
- Query. Clearly, read access to the RDF form of the relational data is paramount. Cambridge Semantics expects to accomplish this via SPARQL, though we acknowledge that a variety of extensions (in particular, aggregate functions) will be quite useful when working with semantic data from underlying relational stores.
- Update. While there is a class of applications that benefits from read-only access to relational (and other) data exposed as RDF, Cambridge Semantics believes that most business applications require read-write functionality. As such, it is important that implementations of RDF-RDB access provide interoperable approaches to updating the original relational data. SPARUL is one candidate; we have no strong preference as to the proper approach, and are curious to learn what other approaches exist in current vendor solutions.
- Triggers. To provide powerful workflow environments based on policies and rules, and to create rich application experiences with real-time access to enterprise information, we believe that it is important to expose an RDBMS-like trigger mechanism to clients accessing the stores via an RDF layer. We surmise that work around RDF graph deltas, changesets, and replication could all be relevant. Work such as the RDF Diff design issue, Dbin, and the replication capabilities of Open Anzo (a recent fork of the IBM Semantic Layered Research Platform, sponsored in part by Cambridge Semantics) may serve as potential bases for trigger/eventing approaches. We'd be interested to learn of current approaches to this challenge.
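To make the configuration and trigger areas above more concrete, the following is a minimal sketch rather than a description of any vendor's implementation. It assumes a hypothetical mapping (held here as a plain Python dictionary, although the proposal above is that such mappings be supplied as RDF data), a hypothetical `employee` table, and example.org URIs invented for illustration. It exposes relational rows as triples and then computes a graph delta between two states of the data, which is one naive basis on which change notifications could be built.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping configuration. All table, column, and URI names here
# are assumptions made for illustration; a dict stands in for RDF mapping data.
MAPPING = {
    "table": "employee",
    "key": "id",
    "subject_template": "http://example.org/employee/{id}",
    "class": "http://example.org/vocab#Employee",
    "columns": {
        "name": "http://example.org/vocab#name",
        "dept": "http://example.org/vocab#department",
    },
}


def rows_to_triples(conn, mapping):
    """Expose every row of the mapped table as a set of (s, p, o) triples."""
    cols = [mapping["key"], *mapping["columns"]]
    sql = "SELECT {} FROM {}".format(", ".join(cols), mapping["table"])
    triples = set()
    for row in conn.execute(sql):
        record = dict(zip(cols, row))
        subject = mapping["subject_template"].format(**record)
        triples.add((subject, RDF_TYPE, mapping["class"]))
        for col, predicate in mapping["columns"].items():
            if record[col] is not None:  # SQL NULL yields no triple
                triples.add((subject, predicate, str(record[col])))
    return triples


def graph_delta(before, after):
    """Return (added, removed) triples between two graph snapshots."""
    return after - before, before - after


def demo():
    """Build a one-row table, change it, and report the resulting delta."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
    )
    conn.execute("INSERT INTO employee VALUES (1, 'Ada', 'R&D')")
    before = rows_to_triples(conn, MAPPING)
    conn.execute("UPDATE employee SET dept = 'Sales' WHERE id = 1")
    after = rows_to_triples(conn, MAPPING)
    return before, after, graph_delta(before, after)
```

Calling `demo()` returns the two snapshots together with the added and removed triples for the changed department. Re-materializing the whole table to find a delta is only practical for a toy example; an interoperable trigger mechanism would presumably be driven by the database's own change events instead.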
Conclusion
Cambridge Semantics views itself as a technology vendor that will consume other vendors' solutions to the challenges of providing RDF access to relational databases. We believe that, as with other parts of the Semantic Web technology landscape, interoperable technology standards (whether de jure or de facto) are a large part of what makes this technology feasible and appealing. Finally, we believe that it is not enough to simply look at configuration and read/query access to relational data in a semantic world; rather, it is important to examine other features that will enable the semantic applications that will provide business value in exchange for investments in Semantic Web technologies and infrastructure.
About
The positions herein represent those of Cambridge Semantics as authored by: