A Survey of Current Approaches for Mapping of Relational Databases to RDF
- Satya Sahoo, Kno.e.sis Center, Wright State University
- Wolfgang Halb, JOANNEUM RESEARCH
- Sebastian Hellmann, University of Leipzig
- Kingsley Idehen, OpenLink Software
- Ted Thibodeau Jr, OpenLink Software
- Sören Auer, University of Leipzig
- Juan Sequeda, University of Texas at Austin
- Ahmed Ezzat, Business Intelligence Software Division, HP
This document surveys the current techniques, tools and applications for mapping between Relational Databases (RDB) and Resource Description Framework (RDF). Some knowledge of RDF as well as RDB concepts and technologies is assumed for readers of this document. The survey is intended to enable the members of the W3C RDB2RDF Incubator Group to:
- Collate the existing state of the art in mapping approaches between RDB and RDF
- Use the reference framework defined in this survey to effectively compare the different mapping approaches
- This survey is a deliverable of the W3C RDB2RDF Incubator Group
Status of This Document
This section describes the status of this document at the time of its publication. Other documents may supersede this document.
This Wiki page is the living version of the RDB2RDF Survey Report. The information provided here does not imply endorsement by the W3C or the Incubator Group.
This document was initiated by the W3C RDB2RDF Incubator Group, part of the W3C Incubator Activity. It represents the consensus view of the group, in particular the authors of this document and those listed in the acknowledgments, on the issues regarding the current state of the art in RDB2RDF mapping approaches.
Participation in Incubator Groups and publication of Incubator Group Reports at the W3C site are benefits of W3C Membership.
Incubator Groups have as a goal to produce work that can be implemented on a Royalty Free basis, as defined in the W3C Patent Policy. Participants in this Incubator Group have made no statements about whether they will offer licenses according to the licensing requirements of the W3C Patent Policy for portions of this Incubator Group Report that are subsequently incorporated in a W3C Recommendation.
A critical requirement for the evolution of the current Web of documents into a Web of Data (and ultimately a Semantic Web) is the inclusion of the vast quantities of data stored in Relational Databases (RDB). The mapping of these vast quantities of data from RDB to the Resource Description Framework (RDF) has been the focus of a large body of research work in diverse domains and has led to the implementation of generic mapping tools as well as domain-specific applications. Furthermore, the role of RDF as an integration platform for data from multiple sources, primarily in the form of RDB, is one of the main motivations driving research efforts in mapping RDB to RDF. This survey documents current approaches in mapping RDB to RDF and categorizes and compares the different approaches using a well-defined reference framework.
Mapping between RDB and RDF
The majority of data underpinning the Web and in domains such as life sciences [NCBI resources] and spatial data management [[[#green2008|Green et al., 2008]]] are stored in RDB with their proven track record of scalability, efficient storage, optimized query execution, and reliability. Compared to the Relational Data Model, RDF is a more expressive data model, and data expressed in RDF can be interpreted, processed and reasoned over by software agents. In "Relational Databases on the Semantic Web" Tim Berners-Lee [[[#bern1998|Berners-Lee, 1998]]] discusses the common and distinct characteristics of RDF and the Entity Relationship (ER) model. The modeling of relationships or predicates in RDF as first class objects is listed as a significant difference between ER and RDF models. The use of Uniform Resource Identifiers (URI) for entities along with the ability to link them together using predicates enables RDF to effectively integrate data from multiple sources.
One of the important objectives of our survey is to analyze the "information gain" of mapping RDB to RDF through explicit modeling of relationships between entities that are either implicit or non-existent in RDB and incorporation of "domain semantics". The incorporation of domain semantics through the use of an application-specific domain ontology in mapping RDB to RDF is an important aspect to fully leverage the expressivity of RDF models. The survey also reviews the use of inference rules over the RDF repository leading to knowledge discovery.
The RDB2RDF Incubator Group
The W3C launched the RDB2RDF incubator group to explore the issues involved in mapping RDB to RDF in March 2008 with two objectives [W3C XG announcement]:
- To examine and classify the existing approaches in mapping between RDB and RDF as well as explore the possibility of standardization of the mapping process
- To examine and classify the approaches in associating OWL classes to SQL queries
This survey was undertaken to partially fulfill the objectives of the incubator group and has been identified as a deliverable along with the final report.
Reference Framework for Survey of RDB2RDF Mapping Approaches
A reference framework will enable the effective categorization and comparison of different approaches used in mapping RDB data to RDF. We have defined a reference framework for this survey consisting of six components, including the approach used to generate the RDB to RDF mapping, the representation of the mapping, and the data integration achieved.
In this section we describe each of the components of the reference framework in detail:
Creation of Mappings
We can classify the methods used to generate the mappings between RDB and RDF into two categories:
Automatic Mapping Generation
[[[#bern1998|Berners-Lee, 1998]]] discusses a set of mappings between RDB and RDF, namely:
- An RDB record is an RDF node
- The column name of an RDB table is an RDF predicate
- An RDB table cell is a value
Many systems leverage these mappings to automatically generate mappings between RDB and RDF with the RDB table as an RDF class node and the RDB column names as RDF predicates. An example of this approach is the Virtuoso RDF View [[[#blak2007|Blakeley, 2007]]] that uses the unique identifier of a record (primary key) as the RDF subject, the column of a table as the RDF predicate and the column value as the RDF object. Other examples of similar tools are D2RQ [[[#bize2007|Bizer et al., 2007]]] (D2RQ also allows users to define customized mappings) and SquirrelRDF [[[#seab2007|Seaborne et al., 2007]]].
Even though these automatically generated mappings often do not capture complex domain semantics that are required by many applications, these mappings serve as a useful starting point to create more customized, domain-specific mappings. This approach also enables Semantic Web applications to query RDB sources where the application semantics are defined in terms of the RDB schema. This approach is also called "Local ontology mapping".
Use of Existing Ontology Schema: A variation of the automatic generation of mappings between RDB and RDF is the use of an existing ontology to create simple mappings [[[#huet2007|Hu et al., 2007]]]. These simple mappings are checked for consistency and subsequently more contextual mappings are constructed. Though this approach automatically generates the RDB to RDF mappings, an existing ontology is used to enhance the quality of the mappings.
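The automatic mapping rules listed above (table as class, record as node, column name as predicate, cell as value) can be illustrated with a minimal sketch. It is not the implementation of any surveyed tool; the base URI, the `gene` table and its data, and the `auto_map` helper are assumptions made only for illustration.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI, not taken from the survey


def auto_map(conn, table, pk):
    """Yield (subject, predicate, object) triples for every row of `table`.

    Table -> class, row -> node (subject URI built from the primary key),
    column name -> predicate, cell -> literal value.
    """
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[pk]}"
        # Each record is typed with the class derived from its table.
        yield (subject, "rdf:type", f"{BASE}{table}")
        for column, value in record.items():
            if column == pk or value is None:
                continue  # the key is already in the subject; NULL cells yield no triple
            yield (subject, f"{BASE}{table}#{column}", value)


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT, organism TEXT)")
conn.execute("INSERT INTO gene VALUES (1, 'CHRNA4', 'human')")
triples = list(auto_map(conn, "gene", "id"))
for triple in triples:
    print(triple)
```

The vocabulary produced this way mirrors the relational schema exactly, which is why such mappings are described as a starting point rather than a domain model.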
Domain Semantics-driven Mapping Generation
The second approach generates mappings from RDB to RDF by incorporating domain semantics that are often implicit or not captured at all in the RDB schema. The explicit modeling of domain semantics, often modeled as a domain ontology, in the RDF repository enables software applications to take advantage of this "information gain" and execute queries that link together entities such as "gene --> expressed_in --> brain" [[[#saho2008|Sahoo et al., 2008]]]. Additionally, a mapping generated by using domain semantics also reduces the creation of triples for redundant or irrelevant knowledge. Byrne [[[#byrn2008|Byrne, 2008]]] discusses the reduction in the size of the RDF dataset by about 2.8 million triples through use of domain semantics-driven mappings from RDB to RDF.
The domain ontology may be pre-existing and sourced from public resources such as the National Center for Biomedical Ontologies (NCBO) at http://bioportal.bioontology.org/ or may be bootstrapped from local ontologies created by automatic mapping tools (as described in previous section). Green et al. [[[#green2008|Green et al., 2008]]] discuss an approach of mapping spatial data to RDF using a hydrology ontology [[[#hart2007|Hart et al., 2007]]] as the reference knowledge model and Sahoo et al. discuss an approach to generate mappings using the Entrez Knowledge Model (EKoM) to map gene data to RDF [[[#saho2008|Sahoo et al., 2008]]].
This approach is also called "Domain ontology mapping" and the process is the same as an ontology population technique where the transformed data are instances of the concepts defined in the ontology schema. Many mapping tools such as D2RQ [[[#bize2007|Bizer et al., 2007]]] allow the users to create customized mapping rules in addition to the automatically generated rules.
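A domain semantics-driven mapping can be sketched as a hand-written table that assigns ontology properties to selected columns. The sketch below is an illustration under stated assumptions, not the method of any cited project: the namespace, the `GENE_MAPPING` table, the `domain_map` helper and the sample row are all hypothetical. It shows two effects described above: a foreign key column becomes an explicit relationship between entities, and columns left out of the mapping generate no triples.

```python
BASE = "http://example.org/"  # hypothetical namespace

# Hand-written mapping: column -> (ontology property, target entity or None for a literal).
# Columns absent from the mapping (for example bookkeeping fields) produce no triples.
GENE_MAPPING = {
    "symbol": ("rdfs:label", None),
    "tissue_id": (BASE + "onto#expressed_in", "tissue"),
}


def domain_map(rows, entity, pk, mapping):
    """Populate an assumed domain ontology with instances built from `rows`."""
    triples = []
    for row in rows:
        subject = f"{BASE}{entity}/{row[pk]}"
        triples.append((subject, "rdf:type", f"{BASE}onto#{entity.capitalize()}"))
        for column, (prop, target) in mapping.items():
            value = row.get(column)
            if value is None:
                continue
            # A target turns the cell into a link to another entity instead of a literal.
            obj = f"{BASE}{target}/{value}" if target else value
            triples.append((subject, prop, obj))
    return triples


# Illustrative row only; `last_modified` is deliberately left unmapped.
rows = [{"id": 1, "symbol": "CHRNA4", "tissue_id": 7, "last_modified": "2008-01-01"}]
triples = domain_map(rows, "gene", "id", GENE_MAPPING)
for triple in triples:
    print(triple)
```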
Mapping Representation and Accessibility of Mappings
The mappings between RDB and RDF may be represented as XPath rules in an XSLT stylesheet, in an XML-based declarative language such as R2O [[[#barr2006|Barrasa et al., 2006]]] or as "quad patterns" defined in the Virtuoso [[[#blak2007|Blakeley, 2007]]] meta-schema language. The mappings, especially if they are created by domain experts or reference a domain ontology, may have wider applicability. Hence, to encourage reuse the mappings should be easily accessible by the wider community. Consequently, we review the representation and accessibility of the mappings between RDB and RDF.
The actual mapping of RDB to RDF may be either a static Extract Transform Load (ETL) implementation or a query-driven dynamic implementation. The ETL implementation, also called "RDF dump", uses a batch process to create the RDF repository from RDB using mapping rules; the application described by Byrne [[[#byrn2008|Byrne, 2008]]] is an example. The dynamic approach, of which the application described by Green et al. [[[#green2008|Green et al., 2008]]] is an example, implements the mapping dynamically in response to a query.
Similar to data warehouse implementations, the ETL approach may not reflect the most current data, but it allows additional processing or analysis over the data including execution of inference rules (well-defined RDF entailments or user-defined rules) without compromising query performance. On the other hand, the dynamic approach has the advantage that the query is executed against the most current version of the data, which assumes significance if the data is frequently updated. However, a dynamic mapping implementation may incur query performance penalties if entailment rules are applied to the RDF repository to infer new information.
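The freshness trade-off between the two implementation styles can be shown with a small sketch. The `gene` table, its rows and the `row_triples` helper are assumptions for illustration only; the same mapping function is applied once in a batch (ETL) and once at query time (dynamic).

```python
import sqlite3


def row_triples(conn):
    """Map each row of the hypothetical `gene` table to one triple."""
    for gene_id, symbol in conn.execute("SELECT id, symbol FROM gene ORDER BY id"):
        yield (f"http://example.org/gene/{gene_id}", "rdfs:label", symbol)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT)")
conn.execute("INSERT INTO gene VALUES (1, 'CHRNA4')")

# Static (ETL) approach: materialize the triples once in a batch run.
rdf_dump = list(row_triples(conn))

# The relational source changes after the dump was produced.
conn.execute("INSERT INTO gene VALUES (2, 'CHRNB2')")

# Dynamic approach: apply the mapping at query time against the live data.
live_view = list(row_triples(conn))

print(len(rdf_dump), len(live_view))
```

The dump still holds one triple while the live view sees two. In exchange, the dump can be post-processed (for example with inference rules) without adding cost to each query.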
Queries in systems mapping RDB to RDF may either be in SPARQL, which is executed against the RDF repository, or the SPARQL query may be transformed into one or more SQL queries that are executed against the RDB. Cyganiak [[[#cyga2005|Cyganiak, 2005]]] has discussed the transformation of SPARQL to relational algebra and further into SQL. This paper describes operators such as "selection" and "inner join" implemented over RDF and correlates "RDF relational algebra" to SQL.
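As a rough illustration of query rewriting, the sketch below translates a single triple pattern into SQL over a table-to-class mapping. It is not the algorithm from [Cyganiak, 2005] and is far from a SPARQL implementation: the `PREDICATES` table, the `pattern_to_sql` function and the sample data are hypothetical, and only variable subjects are handled.

```python
import sqlite3

# Hypothetical mapping: predicate URI -> (table, primary key column, value column)
PREDICATES = {
    "http://example.org/gene#symbol": ("gene", "id", "symbol"),
    "http://example.org/gene#organism": ("gene", "id", "organism"),
}


def pattern_to_sql(subject, predicate, obj):
    """Translate one triple pattern into (sql, params).

    Terms starting with '?' are variables. The predicate must be a constant
    that appears in PREDICATES.
    """
    if not subject.startswith("?"):
        raise NotImplementedError("constant subjects are out of scope for this sketch")
    table, pk, column = PREDICATES[predicate]
    sql = f'SELECT "{pk}", "{column}" FROM "{table}" WHERE "{column}" IS NOT NULL'
    params = []
    if not obj.startswith("?"):
        # A constant object becomes a selection (WHERE condition) on the value column.
        sql += f' AND "{column}" = ?'
        params.append(obj)
    return sql, params


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT, organism TEXT)")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?)",
    [(1, "CHRNA4", "human"), (2, "Chrna4", "mouse")],
)

# Roughly corresponds to the pattern: ?g <http://example.org/gene#organism> "human"
sql, params = pattern_to_sql("?g", "http://example.org/gene#organism", "human")
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Two patterns that share a subject variable would, under the same assumptions, become an inner join on the primary key column.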
As discussed earlier in "Creation of Mappings" section, an important aspect of mapping RDB to RDF is the incorporation of domain semantics in the resulting RDF. Hence, we explicitly list the application domain of the work reviewed in this survey where possible (mapping tools are not domain-specific).
The RDF representation model, through the use of URIs and explicitly modeled relationships between entities, makes it easier to effectively integrate data from disparate, heterogeneous data sources. It is important to note that RDF does not automatically reconcile the multiple types of heterogeneity, such as structural, syntactic and semantic heterogeneities, that are described in the data/information integration research field. However, the use of domain ontologies along with user-defined inference rules in reconciling heterogeneity between multiple RDB sources is an effective approach for creating a single or a set of "compatible" RDF. Hence, this metric reviews the different mapping approaches with respect to data integration.
Figure 1: Reference Architecture for RDB2RDF Survey
Survey of RDB to RDF Mapping Approaches
In this section, we categorize the surveyed work into three broad classes namely:
- Proof of concept projects: Projects reviewed in this section explore specific approaches to map RDB to RDF with either a prototype or proof of concept implementation. The work may or may not have led to the release of a tool/application to the community
- Domain-specific projects: Many projects surveyed in this paper were driven by real-world application requirements and have used domain-semantics based customized mappings, generic mapping tools or a combination of both
- Tools/Applications: The projects surveyed in this section include D2RQ, R2O, Virtuoso, Triplify, and Dartgrid tools that have been released to the community for mapping RDB to RDF
All the projects in the three categories are reviewed according to the reference architecture defined in Section 2.
Proof of Concept Projects
- Hu et al. [[[#huet2007|Hu et al., 2007]]] describe an approach that aims at the automatic creation of simple mappings between relational database schemas and ontologies. Based on the relational schema and an ontology, initial simple mappings are derived and then checked for consistency. Based on sample input (mappings and instances for both the relational schema and the ontology) more contextual mappings are constructed. Experimental results in a limited domain showed the feasibility of the approach.
- Kashyap et al. [[[#kash2007|Kashyap et al., 2007]]] describe their work involving a mediator based approach to represent mappings from ontological concepts to disparate data sources as part of a general framework for RDF based access to heterogeneous data sources. The heterogeneous data sources, illustrated using a life sciences domain scenario, include RDB, Web services and Excel sheets (using MS Office API). The SPARQL queries are automatically translated to the appropriate query language using the mappings represented by the mediator classes.
- Cullot et al. [[[#cull2007|Cullot et al., 2007]]] describe DB2OWL using the Table to Class and Column to Predicate approach but use specific relational database schema characteristics, that is, how tables relate to each other, to assert subclass and other object properties. The object properties represent many-to-many relationships and referential integrity. The mappings are stored in an R2O document.
- Tirmizi et al. [[[#tirm2008|Tirmizi et al., 2008]]] follow Li et al. [[[#liet2005|Li et al., 2005]]] and other similar work. It is the first work to present formal rules in First Order Logic to translate Table to Class and Column to Predicate. A notion of completeness for a transformation system is also presented based on all the possible foreign key and primary key combinations.
- The Semi-automatic Ontology Acquisition Method (SOAM) work by Li et al. [[[#liet2005|Li et al., 2005]]] uses the Table to Class and Column to Predicate approach to create an initial ontology schema which is then refined by referring to a dictionary or thesaurus (for example, WordNet). Constraints in the relational model are mapped to constraints in the ontology schema. For example, "NOT NULL" and "UNIQUE" are mapped to cardinality constraints on relevant properties. If a given set of relations are mapped to an ontology concept, the corresponding tuples of the relations are transformed as instances of the ontology concept.
- Sahoo et al. [[[#saho2008|Sahoo et al., 2008]]] describe work in the life sciences domain that incorporates domain semantics (from multiple, integrated ontologies) to create the mappings (represented as XPath rules in an XSLT stylesheet) from RDB to RDF. An RDF dump is created using a batch approach and stored in Oracle 11g. The SPARQL query language is used to query the RDF repository.
- Byrne [[[#byrn2008|Byrne, 2008]]] describes an application in the cultural heritage domain to convert the National Monument Record of Scotland data stored in a relational database to RDF. The Simple Knowledge Organization System (SKOS) framework is used to transform the Royal Commission on the Ancient and Historical Monuments of Scotland (RCAHMS) thesauri to the Semantic Web. The entire RCAHMS database with 1.5 million entities is converted to 21 million RDF triples and implemented on both Jena and AllegroGraph systems.
- Green et al. [[[#green2008|Green et al., 2008]]] describe integration of spatial data in RDB for predictive modeling of diffuse water pollution using OWL-DL ontologies at multiple levels. The ontologies ("Data ontologies") at the first level are used to map each data source to concepts in the ontologies at the next level ("Domain ontologies"). The "data ontologies" are represented in the D2RQ mapping language. The "application ontology", at the third level, links the "domain ontologies" and also adds application-specific information. The D2RQ engine is modified to include spatial operators and is used to interface between the data sources and data ontologies. The SPARQL query language is used to query the virtual RDF graphs generated from the data sources.
- The Virtuoso RDF View [[[#blak2007|Blakeley, 2007]]] uses the Table to Class (RDFS class) and Column as Predicate approach and takes into consideration special cases such as whether a column is part of the primary key or foreign key. The foreign key relationship between tables is made explicit between the relevant classes representing the tables.
The RDB data are represented as virtual RDF graphs without physical creation of RDF datasets. The Virtuoso RDF views are composed of "quad map patterns" that define the mapping from a set of RDB columns to triples. The quad map pattern is represented in the Virtuoso meta‐schema language, which also supports SPARQL-style notations.
- D2RQ [[[#bize2007|Bizer et al., 2007]]] provides an integrated environment with multiple options to access relational data including "RDF dumps", Jena and Sesame API based access (API calls are rewritten to SQL), and SPARQL endpoints on D2RQ Server.
The mappings may be defined by the user thereby allowing the incorporation of domain semantics in the mapping process, though there are some limitations to this as described in the Ordnance Survey presentation [[[#green2008|Green et al., 2008]]]. The mappings are expressed in a "declarative mapping language". The performance varies depending on the access method and is reported to perform reasonably well for basic triple patterns but suffers when SPARQL language features such as FILTER and LIMIT are used.
- Triplify [[[#auer2009|Auer et al., 2009]]] is a simplistic approach to publish RDF and Linked Data from relational databases. Triplify is based on mapping HTTP-URI requests onto relational database queries expressed in SQL with some additions. Triplify transforms the resulting relations into RDF statements and publishes the data on the Web in various RDF serializations, in particular as Linked Data. Triplify is a lightweight software component that can be easily integrated and deployed with the numerous widely installed Web applications. The approach does not support SPARQL, but it includes a method for publishing update logs to enable incremental crawling of linked data sources. Triplify is complemented by a library of configurations for common relational schemata and a REST-enabled datasource registry. Despite its lightweight architecture, Triplify is usable to publish very large datasets, such as 160 GB of geo data from the OpenStreetMap project.
- R2O [[[#barr2006|Barrasa et al., 2006]]] is an XML based declarative language to express the mappings between RDB elements and an ontology. R2O mappings can be used to "detect inconsistencies and ambiguities" in mapping definitions. The ODEMapster engine uses an R2O document to either execute the transformation in response to a query or in a batch mode to create an RDF dump.
- Wu et al. [[[#wuet2006|Wu et al., 2006]]] and Chen et al. [[[#chen2006|Chen et al., 2006]]] describe the Dartgrid Semantic Web toolkit that offers tools for the mapping and querying of RDF generated from RDB. The mapping is basically a manual table to class mapping where the user is provided with a visual tool to define the mappings. The mappings are then stored and used for the conversion. The construction of SPARQL queries is assisted by the visual tool and the queries are translated to SQL queries based on the previously defined mappings. A full-text search is also provided.
The tool is available at: http://ccnt.zju.edu.cn/projects/dartgrid/
- The RDBToOnto work by Cerbah [[[#cerb2008|Cerbah, 2008]]] is a highly configurable tool that eases the design and implementation of methods for ontology acquisition from relational databases. It is also a user-oriented tool that supports the complete transitioning process from the access to the input databases to the generation of populated ontologies. The settings of the learning parameters and control of the process are performed through a full-fledged dedicated interface.
The tool is available at: http://www.tao-project.eu/researchanddevelopment/demosanddownloads/RDBToOnto.html
- Asio Semantic Bridge for Relational Database and Automapper: Asio Semantic Bridge for Relational Databases (SBRD) and Automapper use the table to class approach. Automapper generates an OWL Full ontology from a relational database. In the generated ontology, each class corresponds to a table in the relational database and columns are represented as properties of the relevant class. A primary key column has cardinality set to 1. A nullable column has max cardinality set to 1. For a foreign key, an object property is created and its range is set to the corresponding class. The generated ontology includes SWRL rules to equate individuals based on multiple primary key columns. Semantic Bridge for Relational Databases provides an RDF view of data in the relational database. SPARQL queries can be written in terms of the Automapper generated data source ontology and relational data is returned as RDF. SBRD rewrites the SPARQL query to SQL, executes the SQL and converts SQL rows to RDF conforming to the data source ontology.
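Several of the tools above derive ontology constraints from relational constraints. The sketch below restates only the three Automapper rules quoted in the last entry (primary key gives cardinality 1, nullable column gives max cardinality 1, foreign key gives an object property whose range is the referenced class). It does not reproduce Automapper's actual output format; the `Column` type, the `describe_property` helper and the dictionary keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    name: str
    primary_key: bool = False
    nullable: bool = True
    references: Optional[str] = None  # referenced table when the column is a foreign key


def describe_property(table, column):
    """Return a plain dict describing the ontology property for one column."""
    prop = {"property": f"{table}.{column.name}", "domain": table}
    if column.references:
        # Foreign key: object property whose range is the referenced class.
        prop["kind"] = "ObjectProperty"
        prop["range"] = column.references
    else:
        prop["kind"] = "DatatypeProperty"
    if column.primary_key:
        prop["cardinality"] = 1
    elif column.nullable:
        prop["maxCardinality"] = 1
    # Non-nullable, non-key columns get no constraint here because the text
    # above states no rule for them.
    return prop


# Illustrative columns only.
pk = describe_property("gene", Column("id", primary_key=True, nullable=False))
fk = describe_property("gene", Column("tissue_id", references="tissue"))
print(pk)
print(fk)
```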
Table 1: A Comparative View of Implementation using Survey Reference Framework
The survey highlights different aspects of the RDB to RDF mapping process; some of these aspects need to be addressed to enable the move towards standardization of the mapping process.
Mapping Representation and Accessibility - As presented in the survey, currently there is no standard method for representation of mappings between RDB and RDF. Projects reviewed in the survey used a variety of representation formats such as FOL, XPath expressions, and tool-specific languages (for example, D2RQ or R2O mapping language). Mappings, especially those that are created by domain experts or use domain ontologies, are important artifacts that should be available for reuse.
Mapping Implementation - The projects reviewed in this survey either implemented a dynamic on-demand or a static mapping from RDB to RDF. Though the advantages and disadvantages of each approach, such as accessibility to the latest version of data and the update patterns of the data sources, are well-documented in the data warehouse community, the projects reviewed in this survey did not contain an explicit comparison between the two approaches. The RDB2RDF incubator group has discussed this issue in Requirements for Relational to RDF Mapping.
This survey to document the current state of the art in mapping approaches between RDB and RDF was conducted to partially fulfil the objectives of the W3C RDB2RDF incubator group. To enable a coherent and effective comparison of the different RDB2RDF mapping approaches we defined a reference architecture for the survey consisting of six components such as mapping generation, query execution and data integration achieved by mapping RDB to RDF.
One of the important aspects of mapping RDB to RDF is the potential to explicitly model information that was either implicitly modeled or not represented at all in RDB, such as domain semantics. Hence, many of the tools and generic applications in the survey have noted the importance of using a domain ontology, in addition to information from the RDB schema, in generating RDF. Another important aspect of mapping RDB to RDF is the potential for data integration by representing data from multiple RDB sources as a single RDF graph. The representation of mappings between RDB and RDF in a standardized form is necessary to enable their reuse and the RDB2RDF incubator group, in its final report [RDB2RDF XG Final Report], has proposed the use of the W3C Working Group Rule Interchange Format (RIF) to represent mappings. None of the projects reviewed in the survey use RIF to represent mappings between RDB and RDF.
Finally, this survey is expected to serve as a resource not only for the RDB2RDF incubator group but also for the community of researchers involved in mapping RDB to RDF, in order to support the evolution of the Web of documents into a "Web of Data".
We would like to thank all other members of the W3C RDB2RDF Incubator Group for their valuable suggestions and comments.
- [Auer et al., 2009]
- Triplify ‐ Lightweight Linked Data Publication from Relational Database S. Auer, S. Dietzold, J. Lehmann, S. Hellmann, D. Aumueller. To appear in proceedings of WWW 2009, Madrid, Spain.
- [Berners-Lee, 1998]
- Relational Databases on the Semantic Web T. Berners-Lee, 1998.
- [Barrasa et al., 2006]
- Upgrading relational legacy data to the semantic web (slides) J. Barrasa and A. Gómez-Pérez. In Proc. of 15th international conference on World Wide Web Conference (WWW 2006), pages 1069-1070, Edinburgh, United Kingdom, 23-26 May 2006.
- [Bizer et al., 2007]
- D2RQ — Lessons Learned C. Bizer and R. Cyganiak. Position paper for the W3C Workshop on RDF Access to Relational Databases, Cambridge, USA, 25-26 October 2007.
- [Blakeley, 2007]
- RDF Views of SQL Data (Declarative SQL Schema to RDF Mapping) C. Blakeley, OpenLink Software, 2007.
- [Byrne, 2008]
- Having Triplets – Holding Cultural Data as RDF K. Byrne. Proceedings of the ECDL 2008 Workshop on Information Access to Cultural Heritage, Aarhus, Denmark, 18 September 2008.
- [Chen et al., 2006]
- Towards a Semantic Web of Relational Databases: A Practical Semantic Toolkit and an In-Use Case from Traditional Chinese Medicine H. Chen and Y. Wang. In Proc. of 5th International Semantic Web Conference (ISWC 2006), pages 750-763, Athens, USA, 5-9 November 2006.
- [Cerbah, 2008]
- Learning Highly Structured Semantic Repositories from Relational Databases - The RDBToOnto Tool F. Cerbah. Proceedings of the 5th European Semantic Web Conference (ESWC 2008), Tenerife, Spain, June 2008.
- [Cullot et al., 2007]
- DB2OWL: A Tool for Automatic Database-to-Ontology Mapping N. Cullot, R. Ghawi and K. Yetongnon. In Proc. of 15th Italian Symposium on Advanced Database Systems (SEBD 2007), pages 491-494, Torre Canne, Italy, 17-20 June 2007.
- [Cyganiak, 2005]
- A relational algebra for SPARQL R. Cyganiak. HP Technical Report HPL-2005-170. 2005.
- [Green et al., 2008]
- Linking Ontologies to Spatial Databases J. Green and C. Dolbear, RDB2RDF XG presentation, 2008.
- [Hart et al., 2007]
- Lege Feliciter: Using Structured English to represent a Topographic Hydrology Ontology G. Hart, C. Dolbear and J. Goodwin. In Proceedings of the OWL Experiences and Directions Workshop, 2007
- [Hu et al., 2007]
- Discovering Simple Mappings Between Relational Database Schemas and Ontologies W. Hu and Y. Qu. In Proc. of 6th International Semantic Web Conference (ISWC 2007), 2nd Asian Semantic Web Conference (ASWC 2007), LNCS 4825, pages 225-238, Busan, Korea, 11-15 November 2007.
- [Kashyap et al., 2007]
- From Web 1.0 -> 3.0: Is RDF access to RDB enough? V. Kashyap and M. Flanagan. Position paper for the W3C Workshop on RDF Access to Relational Databases, Cambridge, USA, 25-26 October 2007.
- [Li et al., 2005]
- A Semi-automatic Ontology Acquisition Method for the Semantic Web Man Li, Xiaoyong Du and Shan Wang. 2005.
- [Sahoo et al., 2008]
- An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence S. Sahoo, O. Bodenreider, J. Rutter, K. Skinner and A. Sheth. Journal of Biomedical Informatics (Special Issue: Semantic Biomedical Mashups), (in press), 2008.
- [Seaborne et al., 2007]
- SQL-RDF A. Seaborne, D. Steer, S. Williams. 2007.
- [Tirmizi et al., 2008]
- Translating SQL Applications to the Semantic Web S. Tirmizi, J. Sequeda, D. Miranker. 2008.
- [Wu et al., 2006]
- Dartgrid: a Semantic Web Toolkit for Integrating Heterogeneous Relational Databases Z. Wu, H. Chen, H. Wang, Y. Wang, Y. Mao, J. Tang and C. Zhou. Semantic Web Challenge at 4th International Semantic Web Conference (ISWC 2006), Athens, USA, 5-9 November 2006.
RDB2RDF Transformation Approaches
List of Available Tools
New list (many already exist in the cluster above):
- Protege plugins
- Virtuoso's RDBMS to RDF Meta Schema Language
- The Semantic Discovery System: Provides functionality to rapidly build solutions that let non-technical users create and execute ad hoc queries using the network graph user interface (the SPARQL to SQL translation is generated automatically). It integrates and interconnects all data silo types, providing a virtual Semantic Web interface to RDBMSs, Web services, Excel spreadsheets, and hybrid file systems.
- lgraps mentioned in the mailinglist (out of scope)
- Protege Excel_Import (out of scope)
- François Belleau, Marc-Alexandre Nolin, Nicole Tourigny, Philippe Rigault, Jean Morissette, Bio2RDF: Towards a mashup to build bioinformatics knowledge systems http://dx.doi.org/10.1016/j.jbi.2008.03.004
- Yuan An, Alex Borgida, and John Mylopoulos, Inferring Complex Semantic Mappings between Relational Tables and Ontologies from Simple Correspondences http://www.cs.toronto.edu/~yuana/research/publications/odbase05.paper13.pdf
- Justas Trinkunas and Olegas Vasilecas, Building ontologies from relational databases using reverse engineering methods http://ecet.ecs.ru.acad.bg/cst07/Docs/cp/SII/II.6.pdf
- Benjamin Habegger, Learning Data-Consistent Mappings from a Relational Database to an Ontology http://www.ceur-ws.org/Vol-200/06.pdf
- Martin J. O'Connor, et al., Efficiently Querying Relational Databases Using OWL and SWRL http://bmir.stanford.edu/file_asset/index.php/1163/SMI-2007-1244.pdf
- E. Mena, A. Illarramendi, V. Kashyap, and A. Sheth, “OBSERVER: An Approach for Query Processing in Global Information Systems based on Interoperation across Pre-existing Ontologies,” http://knoesis.wright.edu/library/download/MKSI96.pdf
- C. Pérez de Laborda, S. Conrad, "Database to Semantic Web Mapping using RDF Query Languages", http://dbs.cs.uni-duesseldorf.de/~perezdel/pdf/06PeCob.pdf
- There are several more Virtuoso whitepapers and presentations relevant to this topic, http://virtuoso.openlinksw.com/Whitepapers/ ... More than just Table-to-Class, Virtuoso maps Tables, Views, SQL Stored Procedures, and other Data Sources to Classes (which may not be made clear by these whitepapers without reference to or familiarity with the documentation). It seems that Virtuoso should be mentioned not only in 2.1.1 "Transformation RDB -> RDF" "Table to class" but also in 2.1.2 "Domain Semantics-driven", 2.1.3 "Automatic ontology-based mapping discovery", 2.2.1 "Querying" "SPARQL -> SQL", and 2.3 "Other", above. Especially note --
- Deploying RDF Linked Data via Virtuoso Universal Server
- Virtuoso's SQL to RDF Technology Presentation (W3C RDF & DBMS Integration Workshop 10-25-2007)
- M. Dean. Use of SWRL for Ontology Translation. 2008 Semantic Technology Conference, San Jose, CA, May 2008.
- M. Fisher, M. Dean, and G. Joiner. A Tool for Semantic Relational Database Translation using OWL and SWRL. OWL Experiences and Directions 2008 DC, Gaithersburg, MD, April 2008. 
Complementary and related resources can be found at the following pages:
- W3C Workshop on RDF Access to Relational Databases 25 to 26 October 2007 Cambridge, MA, USA.
- Accepted papers at the workshop
- Work on RDF and SQL described in ESW Wiki