Suggestions for Semantic Web Interfaces to Relational Databases

Mike Dean
BBN Technologies
mdean@bbn.com

Position paper submitted to the W3C Workshop on RDF Access to Relational Databases

Introduction

Much of the data available on the World Wide Web currently resides in relational databases. It should stay there, to continue to support existing applications and to leverage the scalability, ACID properties, security, and performance optimizations engineered into RDBMSs over many years. Such databases already back many of the data sources on the Semantic Web.

I and others at BBN have been involved in the Semantic Web since the start of the DARPA Agent Markup Language program in 2000. We have developed numerous ontologies, data sources, tools, and applications. Recently, we have begun to package some of these tools into the Asio Tool Suite. Relevant to this workshop is our Semantic Distributed Query capability, which allows SPARQL queries expressed using a domain ontology to span one or more relational databases, web services, and other data sources. Individual data source ontologies are mapped to the domain ontology using SWRL mapping rules.

Semantic Web Characteristics

Characteristics we consider important for Semantic Web applications include:

Approaches

Our first interfaces to relational databases involved hand generation of a domain ontology and a custom Java servlet. This produced a high-quality Semantic Web representation and afforded tremendous flexibility, but was labor intensive. Exposing several related tables in one large relational database required several thousand lines of Java code. Although we added support for more tables as needed, we never completed support for the entire database.

Our general approach to data integration is to expose each data source in a Semantic Web representation that remains faithful to its original data model. We then use SWRL to map it to a domain (application) ontology relevant to a particular community of interest. The data source can then be reused by other communities by developing additional mapping rules.

We realized that this approach could be employed or extended to provide a nearly custom-quality representation with lower development costs. A generic tool could be used to generate a basic ontology and expose relational data. SWRL rules could then be used to transform this representation into a richer ontology. This is the approach taken in Asio. For the generic tool, we initially used D2RQ and then developed our own Semantic Bridge for Relational Databases (SBRD).

Our original two-level mapping approach, from data source ontology to domain ontology, has generally evolved into an n-level approach: SWRL is used to produce the data source ontology from a more basic ontology, and additional SWRL "business rules" extend the domain ontology.
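The layered, rule-based mapping described in this section can be illustrated with a small sketch. It is not SWRL syntax and not the Asio implementation; it only shows the idea of deriving domain-ontology statements from data-source statements by matching a rule body and emitting a rule head. The prefixes (ex:, src:, dom:), the property names, and the helper functions match and apply_rule are all hypothetical.

```python
def match(pattern, triple, bindings):
    """Unify one (s, p, o) pattern with a triple.

    Terms starting with "?" are variables. Returns the extended
    bindings, or None if the pattern does not fit the triple.
    """
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # Bind the variable, or check it against an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def apply_rule(body, head, triples):
    """Return the head triples implied by every way of satisfying the body."""
    solutions = [{}]
    for pattern in body:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return {
        tuple(bindings.get(term, term) for term in head_pattern)
        for bindings in solutions
        for head_pattern in head
    }


# Statements in a basic, table-shaped data source ontology (hypothetical).
source = {
    ("ex:vehicle/7", "src:vehicle.owner_city", "ex:city/1"),
    ("ex:city/1", "src:city.name", "Arlington"),
}

# One mapping rule: the body matches source statements, the head
# produces statements in the domain ontology.
body = [("?v", "src:vehicle.owner_city", "?c"), ("?c", "src:city.name", "?n")]
head = [("?v", "dom:registeredIn", "?c"), ("?c", "rdfs:label", "?n")]

derived = apply_rule(body, head, source)
```

Because the output of one rule is again a set of statements, further rules can be applied to it, which is one way to picture the n-level layering.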

Suggestions for a Standard Mapping

We would support development of a standard representation for expressing mappings from relational databases to the Semantic Web. Rather than just making the data available (e.g. using only datatype properties), this mapping should result in as "good" a Semantic Web representation as possible, i.e. addressing the Semantic Web Characteristics discussed above.

In general we expect that this representation would be driven from table and column schema information such as that returned by the JDBC metadata API, optionally augmented by additional information. While databases are increasingly being designed using tools such as UML and ERwin, it is unlikely that a mapping could automatically address the variety of tools and transformations likely to be encountered. The mapping should, however, accommodate the range of design patterns typically employed in these transformations.
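As a rough illustration of a schema-driven mapping, the sketch below reads table, column, primary key, and foreign key metadata and emits a basic ontology as triples. It uses Python's sqlite3 module and SQLite PRAGMA statements as a stand-in for the JDBC metadata API. The base URI, the table.column property naming, and the function names describe_schema and basic_ontology are assumptions made for the example, not part of any existing tool.

```python
import sqlite3


def describe_schema(conn):
    """Collect columns, primary keys, and foreign keys for each table."""
    schema = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        schema[table] = {
            "columns": [(c[1], c[2]) for c in columns],
            "primary_key": [c[1] for c in sorted(columns, key=lambda c: c[5])
                            if c[5] > 0],
            "foreign_keys": [(fk[3], fk[2], fk[4]) for fk in fks],
        }
    return schema


def basic_ontology(schema, base="http://example.org/db#"):
    """One class per table; one property per column.

    Foreign key columns become object properties whose range is the
    referenced table's class; all other columns become datatype properties.
    """
    triples = []
    for table, info in schema.items():
        cls = base + table
        triples.append((cls, "rdf:type", "owl:Class"))
        fk_targets = {col: target for col, target, _ in info["foreign_keys"]}
        for column, _sql_type in info["columns"]:
            prop = f"{base}{table}.{column}"
            if column in fk_targets:
                triples.append((prop, "rdf:type", "owl:ObjectProperty"))
                triples.append((prop, "rdfs:range", base + fk_targets[column]))
            else:
                triples.append((prop, "rdf:type", "owl:DatatypeProperty"))
            triples.append((prop, "rdfs:domain", cls))
    return triples


# Hypothetical two-table database used only for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT,
    city_id INTEGER REFERENCES city(id)
);
""")
schema = describe_schema(conn)
ontology = basic_ontology(schema)
```

Where the database does not declare its foreign keys, the "foreign_keys" entries could be supplied by hand as the optional augmenting information mentioned above.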

Following are several suggestions for what this standard mapping should support, which might contribute to some future charter, requirements, or goals. We are not aware of any current tools that support all of these suggestions.

dynamic
The mapping should focus on dynamic interfaces to the database, where Semantic Web content is produced on demand rather than statically converted. This avoids duplication and stale data.
foreign keys
Metadata information on foreign keys should be used to generate object properties referring to other entities in the database. Since foreign key information is often missing from relational databases, there should be a way to provide it directly when generating the mapping.
resolvable URIs
Each entity in the database should have a resolvable URI, typically formed from a pattern involving the table name and primary key value(s). The statements returned by an HTTP GET on this URI may vary, as in the specification of SPARQL DESCRIBE. It should generally include the information contained in the record directly associated with this entity, but not the information in the records that refer to it using a foreign key. It should also be possible to use any URI returned from a SPARQL query in a subsequent SPARQL query.
implicit class hierarchies
A type or similarly named column usually signals the existence of an implicit subclass hierarchy below the containing record, possibly including multiple inheritance. It should be possible to provide these subclasses along with the associated values and have the rdf:type statements generated automatically.
external foreign keys
Some fields may contain coded values (e.g. zip, state, or country codes) that refer to instances described elsewhere on the Semantic Web. It should be possible to represent these as object properties and specify mappings from these codes to URIs.
enumerations
Coded fields with fixed sets of values should be translated to object properties and enumerated using owl:oneOf.
superclasses
A class hierarchy is typically implemented in a relational database by employing one table for the (possibly abstract) superclass and another table for each of the subclasses. The mapping should address this design pattern by supporting an n x m relationship between classes and tables and by naming instances using their most specific class.
binary relationships
All relationships in an entity/relationship/attribute model are typically represented in a relational database as tables with foreign keys for each of the involved entities. Binary relationships without attributes can be represented as OWL object properties. Unary relationships can be represented as OWL classes. Other relationships can be represented as n-ary relations.
efficiency
The mapping should allow automated generation of efficient queries. We have found that SQL queries generated by a Semantic Web interface will be compared to hand-generated queries in a custom application with respect to number and complexity. A simple SPARQL query should translate into a single simple SQL query.
not just SPARQL
SPARQL will often be used to access relational databases. However, the mapping should also support HTTP GET and other interfaces to relational data. SQL and similar general query interfaces are increasingly being hidden within application-specific web services.
security
Not all databases are public. It is often necessary or preferable to use existing database security models and mechanisms rather than duplicating them. Any access mechanisms should at least allow for passing database user names and passwords and authentication using client certificates passed over SSL.
limited tables and views
For security and other reasons, it may be necessary to restrict the interface to specific tables and views.
update
Initial efforts should focus on read access to relational databases, but allow for write access in the future. Any write access needs to be limited to invertible mappings.
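Several of the suggestions above (dynamic access, foreign keys, resolvable URIs, implicit class hierarchies, efficiency, and limited tables) can be pictured together in one small sketch. It builds an entity URI from the table name and primary key value, then describes a single record on demand with one SQL query, without following records that refer to it. It uses Python's sqlite3 module; the base URIs, the ALLOWED_TABLES set, the example tables, and the functions entity_uri and describe_row are assumptions made for illustration only.

```python
import sqlite3
from urllib.parse import quote

BASE = "http://example.org/data/"        # hypothetical instance namespace
VOCAB = "http://example.org/vocab#"      # hypothetical ontology namespace
ALLOWED_TABLES = {"city", "vehicle"}     # tables exposed by the interface


def entity_uri(table, key_values):
    """Form a URI from the table name and the primary key value(s)."""
    key = "/".join(quote(str(value), safe="") for value in key_values)
    return f"{BASE}{table}/{key}"


def describe_row(conn, table, pk_column, pk_value,
                 foreign_keys=None, type_column=None, type_classes=None):
    """Return the statements for one record, using a single SQL query.

    foreign_keys maps a column name to the table it references.
    type_classes maps values of type_column to subclass names.
    """
    if table not in ALLOWED_TABLES:
        raise PermissionError(f"table {table!r} is not exposed")
    foreign_keys = foreign_keys or {}
    type_classes = type_classes or {}
    # Identifiers come from the mapping, never from user input;
    # only the key value is passed as a query parameter.
    cursor = conn.execute(
        f'SELECT * FROM "{table}" WHERE "{pk_column}" = ?', (pk_value,))
    row = cursor.fetchone()
    if row is None:
        return []
    subject = entity_uri(table, [pk_value])
    triples = [(subject, "rdf:type", VOCAB + table)]
    for (column, *_), value in zip(cursor.description, row):
        if value is None or column == pk_column:
            continue
        if column in foreign_keys:
            # Foreign key: object property pointing at another entity URI.
            target = entity_uri(foreign_keys[column], [value])
            triples.append((subject, f"{VOCAB}{table}.{column}", target))
        elif column == type_column:
            # Type column: generate an rdf:type statement for the subclass.
            if value in type_classes:
                triples.append((subject, "rdf:type", VOCAB + type_classes[value]))
        else:
            # Any other column: datatype property with the raw value.
            triples.append((subject, f"{VOCAB}{table}.{column}", value))
    return triples


# Hypothetical data used only for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE vehicle (
    id INTEGER PRIMARY KEY,
    type TEXT,
    owner_city INTEGER REFERENCES city(id)
);
INSERT INTO city VALUES (1, 'Arlington');
INSERT INTO vehicle VALUES (7, 'TRK', 1);
""")
statements = describe_row(
    conn, "vehicle", "id", 7,
    foreign_keys={"owner_city": "city"},
    type_column="type",
    type_classes={"TRK": "Truck", "CAR": "Car"},
)
```

In a deployed interface, a function like describe_row would sit behind the HTTP GET handler for the entity URI, so the same URI scheme serves both dereferencing and values returned from SPARQL queries.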

Potential Areas for Standardization

There are multiple aspects of Semantic Web interfaces to relational databases that are good candidates for standardization, individually or in combination:

Conclusions

This position paper makes several suggestions for Semantic Web interfaces to relational databases, based on our experience with numerous data sets and applications. This is an important component of the Semantic Web, and should be consistent with other Semantic Web goals. We support the creation of best practices and/or Recommendations in this area and look forward to the workshop.
Last modified: Mon Sep 10 23:05:10 Eastern Daylight Time 2007