Suggestions for Semantic Web Interfaces to Relational Databases
Mike Dean
BBN Technologies
mdean@bbn.com
Position paper submitted to the
W3C Workshop on RDF Access to Relational Databases
Introduction
Much of the data available on the World Wide Web currently resides in
relational databases. It should stay there, to continue to support
existing applications and to leverage the scalability, ACID properties, security, and
performance optimizations engineered into RDBMSs over many years.
Such databases already back many of the data sources on the Semantic Web.
I and others at BBN have been involved in the Semantic Web since the
start of the DARPA Agent Markup
Language program in 2000. We have developed
numerous ontologies, data sources, tools, and applications. Recently,
we have begun to package some of these tools into the Asio Tool Suite. Relevant to this
workshop is our Semantic Distributed Query capability, which allows
SPARQL queries expressed using a domain ontology to span one or more
relational databases, web services, and other data sources. Individual
data source ontologies are mapped to the domain ontology using
SWRL mapping rules.
Semantic Web Characteristics
Characteristics we consider important for Semantic Web applications include:
- Publishing each data model as an OWL ontology.
- Use of resolvable URIs.
- Favoring the use of object properties over datatype properties.
- Use of datatypes.
- Use of accepted conventions such as camelCaseNames and singular class names.
- Reuse of or mappings to existing vocabularies such as FOAF and Dublin Core.
Approaches
Our first interfaces to relational databases involved hand generation
of a domain ontology and a custom Java servlet. This produced a high
quality Semantic Web representation and afforded tremendous
flexibility, but was labor intensive. Exposing several related tables
in one large relational database required several thousand lines of
Java code. Although we added support for more tables as needed, we
never completed support for the entire database.
Our general approach to data integration is to expose each data source
in a Semantic Web representation that remains faithful to its original
data model. We then use SWRL to map it to a domain (application)
ontology relevant to a particular community of interest. The data
source can then be reused by other communities by developing additional
mapping rules.
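The mapping step described above can be sketched in a few lines. This is only an illustration in Python, not SWRL and not the Asio implementation: each entry stands in for a rule of the form source_property(?x, ?y) -> domain_property(?x, ?y), and the http://example.org/hr# namespace and its property names are hypothetical.

```python
# Each entry stands for a SWRL-style implication of the form
#   source_property(?x, ?y) -> domain_property(?x, ?y)
# The example.org source properties are hypothetical; the targets are FOAF.
RULES = {
    "http://example.org/hr#emp_name": "http://xmlns.com/foaf/0.1/name",
    "http://example.org/hr#emp_email": "http://xmlns.com/foaf/0.1/mbox",
}


def apply_rules(triples, rules):
    """Return the domain-ontology triples implied by the property rules."""
    return [(s, rules[p], o) for (s, p, o) in triples if p in rules]


# Triples as exposed by the data source, faithful to its own data model.
source_triples = [
    ("http://example.org/db/employee/7", "http://example.org/hr#emp_name", "Ada"),
    ("http://example.org/db/employee/7", "http://example.org/hr#emp_email",
     "mailto:ada@example.org"),
    ("http://example.org/db/employee/7", "http://example.org/hr#badge_no", "1234"),
]

domain_triples = apply_rules(source_triples, RULES)
```

A second community of interest would supply a different rule set over the same source triples, leaving the data source itself unchanged.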
We realized that this approach could be employed or extended to
provide a nearly custom-quality representation with lower development
costs. A generic tool could be used to generate a basic ontology and
expose relational data. SWRL rules could then be used to transform
this representation into a richer ontology. This is the approach
taken in Asio. For the generic tool, we initially used D2RQ and then
developed our own Semantic Bridge for Relational Databases (SBRD).
Our original two-level data source to domain ontology mapping approach has
generally evolved into an n-level approach, with use of SWRL to
produce the data source ontology from a more basic ontology and
additional SWRL "business rules" extending the domain ontology.
Suggestions for a Standard Mapping
We would support development of a standard representation for
expressing mappings from relational databases to the Semantic Web.
Rather than just making the data available (e.g. using only datatype
properties), this mapping should result in as "good" a Semantic Web
representation as possible, i.e. addressing the Semantic Web
Characteristics discussed above.
In general we expect that this representation would be driven from
table and column schema information such as that returned by the JDBC
metadata API, optionally augmented by additional information.
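As a rough illustration of the schema information such a mapping would start from, the sketch below reads table, column, primary key, and foreign key metadata. It uses Python's sqlite3 module and SQLite PRAGMA statements as a stand-in for the JDBC metadata API; the department and employee tables and the describe_schema helper are hypothetical.

```python
import sqlite3


def describe_schema(conn):
    """Collect columns, primary keys, and foreign keys for every table."""
    schema = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # foreign_key_list rows:
        #   (id, seq, table, from, to, on_update, on_delete, match)
        fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        schema[table] = {
            "columns": [(c[1], c[2]) for c in cols],
            "primary_key": [c[1] for c in sorted(cols, key=lambda c: c[5])
                            if c[5] > 0],
            # (local column, referenced table, referenced column)
            "foreign_keys": [(fk[3], fk[2], fk[4]) for fk in fks],
        }
    return schema


# Hypothetical two-table database used only for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        department_id INTEGER REFERENCES department(id)
    );
""")
schema = describe_schema(conn)
```

The "optionally augmented" part of the suggestion would correspond to merging extra entries, such as foreign keys that are not declared in the database, into the returned dictionary before generating the mapping.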
While databases are increasingly being designed using tools such as
UML and ERwin, it is unlikely that a mapping could automatically
address the variety of tools and transformations likely to be
encountered. The mapping should, however, accommodate the range of
design patterns typically employed in these transformations.
Following are several suggestions for what this standard mapping
should support, which might contribute to some future charter, requirements,
or goals.
We are not aware of any current tools that support all of these suggestions.
- dynamic
- The mapping should focus on dynamic interfaces to the database,
where Semantic Web content is produced on demand, rather than
statically converted. This avoids duplication and stale data.
- foreign keys
- Metadata information on foreign keys should be used to
generate object properties referring to other entities in the
database. Since foreign key information is often missing from
relational databases, there should be a way to provide it
directly when generating the mapping.
- resolvable URIs
- Each entity in the database should have a resolvable URI, typically formed from a pattern involving the table name and primary key value(s). The statements returned by an HTTP GET on this URI may vary, as in the specification of SPARQL DESCRIBE. It should generally include the information contained in the record directly associated with this entity, but not the information in the records that refer to it using a foreign key. It should also be possible to use any URI returned from a SPARQL query in a subsequent SPARQL query.
- implicit class hierarchies
- A type or similarly named column usually signals the existence of an implicit subclass hierarchy below the containing record, possibly including multiple inheritance. It should be possible to provide these subclasses along with the associated values and have the rdf:type statements generated automatically.
- external foreign keys
- Some fields may contain coded values (e.g. zip, state, or country codes) that refer to instances described elsewhere on the Semantic Web. It should be possible to represent these as object properties and specify mappings from these codes to URIs.
- enumerations
- Coded fields with fixed sets of values should be translated to object properties and enumerated using owl:oneOf.
- superclasses
- A class hierarchy is typically implemented in a relational database by employing one table for the (possibly abstract) superclass and another table for each of the subclasses. The mapping should address this design pattern by supporting an n x m relationship between classes and tables and by naming instances using their most specific class.
- binary relationships
- All relationships in an entity/relationship/attribute model are typically represented in a relational database as tables with foreign keys for each of the involved entities. Binary relationships without attributes can be represented as OWL object properties. Unary relationships can be represented as OWL classes. Other relationships can be represented as n-ary relations.
- efficiency
- The mapping should allow automated generation of efficient queries. We have found that the SQL queries generated by a Semantic Web interface will be compared, in both number and complexity, to the hand-written queries of a custom application. A simple SPARQL query should translate into a single simple SQL query.
- not just SPARQL
- SPARQL will often be used to access relational databases. However, the mapping should also support HTTP GET and other interfaces to relational data. SQL and similar general query interfaces are increasingly being hidden within application-specific web services.
- security
- Not all databases are public. It is often necessary or preferable to use existing database security models and mechanisms rather than duplicating them. Any access mechanisms should at least allow for passing database user names and passwords and authentication using client certificates passed over SSL.
- limited tables and views
- For security and other reasons, it may be necessary to restrict the interface to specific tables and views.
- update
- Initial efforts should focus on read access to relational databases, but allow for write access in the future. Any write access needs to be limited to invertible mappings.
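Several of the suggestions above (resolvable URIs, foreign keys, implicit class hierarchies, external foreign keys) can be illustrated together. The following is a minimal sketch, assuming a single-column primary key and a hand-written mapping dictionary; the base URI, vocabulary namespace, mapping keys, and helper names are all hypothetical and do not describe any existing tool.

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/db/"        # hypothetical base for entity URIs
VOCAB = "http://example.org/schema#"   # hypothetical ontology namespace


@dataclass(frozen=True)
class Uri:
    value: str


@dataclass(frozen=True)
class Literal:
    value: object


def entity_uri(table, key):
    """Form an entity URI from the table name and primary key value."""
    return Uri(f"{BASE}{table}/{key}")


def row_to_triples(table, row, mapping):
    """Produce triples for one record, on demand."""
    subject = entity_uri(table, row[mapping["primary_key"]])
    class_name = mapping["class"]
    triples = []
    for column, value in row.items():
        if value is None or column == mapping["primary_key"]:
            continue
        prop = Uri(VOCAB + column)
        if column == mapping.get("type_column"):
            # Implicit class hierarchy: pick the most specific class.
            class_name = mapping["subclasses"].get(value, class_name)
        elif column in mapping.get("foreign_keys", {}):
            # Foreign key: object property pointing at another entity.
            target = mapping["foreign_keys"][column]
            triples.append((subject, prop, entity_uri(target, value)))
        elif column in mapping.get("external_codes", {}):
            # External foreign key: code mapped to a URI defined elsewhere.
            prefix = mapping["external_codes"][column]
            triples.append((subject, prop, Uri(prefix + str(value))))
        else:
            triples.append((subject, prop, Literal(value)))
    return [(subject, Uri(RDF_TYPE), Uri(VOCAB + class_name))] + triples


MAPPING = {
    "class": "Employee",
    "primary_key": "id",
    "type_column": "type",
    "subclasses": {"M": "Manager", "E": "Engineer"},
    "foreign_keys": {"department": "department"},
    "external_codes": {"country": "http://example.org/country/"},
}
row = {"id": 7, "name": "Ada", "type": "M", "department": 3, "country": "GB"}
triples = row_to_triples("employee", row, MAPPING)
```

Because the foreign key and external code values come back as URIs rather than literals, a client can dereference them or reuse them in a subsequent query, which is the behavior the suggestions ask for.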
Potential Areas for Standardization
There are multiple aspects of Semantic Web interfaces to relational databases
that are good candidates for standardization, individually or in combination:
- table and column to class and property mappings
- SQL datatype to XML Schema datatype mappings
- SPARQL to SQL translation
- web service interfaces (including authentication)
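To make the second item concrete, the sketch below shows one plausible shape for a SQL datatype to XML Schema datatype table. The particular pairings and the xsd_datatype helper are illustrative assumptions, not an agreed standard; settling such choices is exactly what standardization would need to do.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Illustrative pairings only; a standard would need to fix these choices.
SQL_TO_XSD = {
    "SMALLINT": "integer",
    "INTEGER": "integer",
    "BIGINT": "integer",
    "DECIMAL": "decimal",
    "NUMERIC": "decimal",
    "REAL": "double",
    "FLOAT": "double",
    "DOUBLE PRECISION": "double",
    "BOOLEAN": "boolean",
    "DATE": "date",
    "TIME": "time",
    "TIMESTAMP": "dateTime",
    "CHAR": "string",
    "VARCHAR": "string",
}


def xsd_datatype(sql_type):
    """Map a declared SQL column type to an XML Schema datatype URI.

    Length and precision arguments such as VARCHAR(40) are ignored, and
    unknown types fall back to xsd:string.
    """
    base = re.sub(r"\(.*\)", "", sql_type).strip().upper()
    return XSD + SQL_TO_XSD.get(base, "string")


examples = {t: xsd_datatype(t) for t in ("varchar(40)", "NUMERIC(10, 2)", "timestamp")}
```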
Conclusions
This position paper makes several suggestions for Semantic Web
interfaces to relational databases, based on our experience with
numerous data sets and applications. This is an important component
of the Semantic Web, and should be consistent with other Semantic Web
goals. We support the creation of best practices and/or
Recommendations in this area and look forward to the workshop.
Last modified: Mon Sep 10 23:05:10 Eastern Daylight Time 2007