SQL2OWL

From W3C Wiki

Browser-based SQL2OWL Mapping Tool

This page is intended to record the design issues of a browser-based SQL-OWL mapping tool. The tool will be based on the DartGrid mapping system and virtuoso's meta schema language, and take use of AJAX-based toolkits such as OpenLink Ajax Toolkit. This new mapping tool will cover more complex mapping cases and will be tested against by the life science demo use cases and the THALIA Testbed.

This is a collaborative effort between DartGrid and Virtuoso. People who have contributed to or are interested include:

... Please feel free to add your name if you are interested or want to contribute.

Introduction

Motivation

With the ever growing need of semantically linking data across different domain, organizational, cultual, displinary boundaries using RDF and OWL, one of the major obstacles impeding the advancement of the Semantic Web is the data availability. Where should the linked data come from? How could we obtain semantically well-defined structured data? Three major sources or methods are under taken commonly. The third one is more promising as most of those hot websites have structured data hidden in their backend (often referred as deep web data also), but tightly bound to and inextricably curated by specific client applications.

   ||  || ab initio knowledge engineering || extracted from raw documents || harvested from legacy structure data ||
   || data quality || High || Low || High but may be lower than ab initio ||
   || retrieving efficiency || Low || High ||High but requiring ad hoc programs ||
   || Human intervention || High || Low if high quality is not required, but high if required || Low if having robust mapping tool which is achievable ||

This page is intended to identify and record the SQL2OWL mapping issues, integration difficulties commonly occuring in real-life applications. A set of mapping cases will be identified as best practices based on two applications: the life science demo use cases and the THALIA Testbed.

Test Dataset

THALIA

The tested relational databases are taken from the THALIA Testbed benchmark. It provides a collection of 40 relational database tables representing university course catalogs from computer science departments around the world. The data in the testbed provide a rich source of syntactic and semantic heterogeneities.

The test ontology can be found at THALIA ontology. It has resued several public ontologies such as FOAF, Dublin Core, SKOS, TIME ontology, etc..

Life science

The tested data will be taken from typical life science data sources, which have public available SQL versions. The test ontology will draw on the W3C Banff life scienece demo.

About the representation and notation

We utilize three types of representations from different perspectives.

  • Graphical representation: each case is graphically represented for documentation purpose.
  • Abstract level: a mapping is abstracally described as a rule that has the relational predicate as the rule head and ontological description as rule body.
  • Syntax level: for implementation purpose, the mappings are finally stored as virtuoso meta-schema language MDL.
 A typical mapping rule has the form like below:
   R(?a,?b,?c,?d, ..., ...)-: (uri<?a>, rdf:type, thalia:Course),
                              (uri<?a>, foaf:name,?b),
                              (uri<?a>, thalia:hasInstructor, uri<?2>),
                              (uri<?2>, rdf:type, thalia:Instructor),
                              .......
 Several notions about the mapping rule description:
  • Variables: a,b,... are original column names of the relational table. By prefixing a question mark, they are made as variables like ?a, ?b, ... used in the rule description.
  • URI: uri<?...> is a function used to generate universal resource identifiers. The input parameters can be a variables occuring in the rule head, such that ?a might be a primary key and could uniquely identify a resources. In some other cases, there is no variables that can be used to generated URI. For consideration of practical implementation, te uri function may be implemented as either generating a blank node ID (internal id) or global URI.
 A typical MDL description has the form like below:


Mapping Cases

THALIA mapping cases

Case 1 Arizona State University

The simplest situation, as illustrated in Fig.1, is the one when a single table exactly maps to one single class in the ontology. The asu table stores CS course information of the Arizona State University.


Description as a rule:

asu(?Title,?Description,?MoreInfoURL) :- (uri<?>, rdf:type, thalia:Course), 
                                        (uri<?>, dc:title, ?Title), 
                                        (uri<?>, dc:description, ?Description), 
                                        (uri<?>, rdfs:seeAlso, ?MoreInfoURL),
                                        (uri<?>, thalia:forUniversity, "http://purl.org/thalia/university/asu"),
                                        (uri<?>, skos:subject, "http://purl.org/subject/thalia/ComputerScience").


Description in virtuoso meta-schema language:



Remained issue: Instance-level mapping, how to mapping to course subject information?

Case 2 Brown University

Fig.2 illustrates the semantic mappings from brown table to the THALIA testbed Ontology. The brown table stores CS course information of the Brown University. This use case illustrates the mapping from one table to four ontological classes: Course, Instructor, Event, Point.


Rule:

brown(?Code,?Title,?Instructor,?Room) :- 
                          (uri<?Code>, rdf:type, thalia:Course),
                          (uri<?Code>, dc:title, ?Title),
                          (uri<?Code>, thalia:hasInstructor uri<?1>),
                          (uri<?1>, dc:homepage, ?Instructor),
                          (uri<?Code>, thalia:hasLecture, uri<?2>),
                          (uri<?2>, rdf:type, event:Event),
                          (uri<?2>, event:place, uri<?3>),
                          (uri<?3>, rdf:type, geo:Point),
                          (uri<?3>, dc:title, ?Room),
                          (uri<?Code>, thalia:forUniversity, "http://purl.org/thalia/university/brown"),
                          (uri<?Code>, skos:subject, "http://purl.org/subject/thalia/ComputerScience").



MDL


Case 3 CMU University

Fig.3 illustrates the semantic mappings from cmu table to the THALIA testbed Ontology. The cum table stores CS course information of the CMU University.

cmu(?Code,?Sec,?CourseXListed,?CourseTitle,?Lecturer,?Room,?Day,?Time,?Units):-
                                       (uri<?Code>, rdf:type, thalia:Course),
                                       (uri<?Code>, dc:title, ?CourseTitle),
                                       (uri<?Code>, dc:hasInstructor, uri<?1>),
                                       (uri<?1>, foaf:name, ?Lecturer),
                                       (uri<?Code>, thalia:hasLecture, uri<?2>),
                                       (uri<?2>, rdf:type, event:Event),
                                       (uri<?2>, event:time, uri<?3>),
                                       (uri<?3>, rdf:type, time:Interval),
                                       (uri<?3>, time:inDateTime, uri<?4>),
                                       (uri<?4>, rdf:type, time:DateTiemDescription),
                                       (uri<?4>, time:dayOfWeek, ?Day),
                                       (uri<?4>, time:hour, ?Time),
                                       (uri<?2>, event:place, uri<?5>),
                                       (uri<?5>, rdf:type,geo:Point),
                                       (uri<?5>, dc:title, ?Room).


MDL Description.


Unresolved issues:

  • Instance level mapping for the Day column, MWF -> Monday, Wednesday, Friday.
  • How to split the time 1:30 - 2:50 to: time:hour, time:miniute, time:duration.
  • Do not understand the meaning of some columns: Sec, CourseXListed.

Case 5 Georgia Tech University.

gatech(?Department,?Code,?Section,?Mode_,?CRN,?Title,?Hours,?In_,?Max_,?Days,?Time_,?Instructor,?Room,?Building,?Description) -: 
                                       (uri<?Code>, rdf:type, thalia:Course),
                                       (uri<?Code>, dc:title, ?Title),
                                       (uri<?Code>, dc:hasInstructor, uri<?1>),
                                       (uri<?Code>, dc:description, ?Description), 
                                       (uri<?1>, rdf:type, thalia:Instructor),
                                       (uri<?1>, foaf:name, ?Instructor),
                                       (uri<?Code>, thalia:hasLecture, uri<?2>),
                                       (uri<?2>, rdf:type, event:Event),
                                       (uri<?2>, event:time, uri<?3>),
                                       (uri<?3>, rdf:type, time:Interval),
                                       (uri<?3>, time:inDateTime, uri<?4>),
                                       (uri<?4>, rdf:type, time:DateTiemDescription),
                                       (uri<?4>, time:dayOfWeek, ?Days),
                                       (uri<?4>, time:hour, ?Time_),
                                       (uri<?2>, event:place, uri<?5>),
                                       (uri<?5>, rdf:type,geo:Point),
                                       (uri<?5>, dc:title, ?Room+?Building),
                                       (uri<?Code>, thalia:forUniversity, "http://purl.org/thalia/university/gatech"),
                                       (uri<?Code>, skos:subject, "http://purl.org/subject/thalia/ComputerScience").



MDL Description


Remained issues: how to map ?room + ?Building to a geo place? Combing them?

Case 6 Toronto University.

toronto (?No_,?level_,?offeredTerm,?title,?instructorEmail,?instructorName,?location,?coursewebsite,?prereq,?text_):-
                                       (uri<?No_>, rdf:type, thalia:Course),
                                       (uri<?No_>, dc:title, ?title),
                                       (uri<?No_>, dc:description, ?Description), 
                                       (uri<?No_>, dc:hasInstructor, uri<?1>),
                                       (uri<?No_>, rdfs:seeAlso,?coursewebsite),
                                       (uri<?No_>, thalia:hasPrerequisite, ?prereq),
                                       (uri<?1>, rdf:type, thalia:Instructor),
                                       (uri<?1>, foaf:name, ?instructorName),
                                       (uri<?1>, foaf:mbox, ,?instructorEmail),
                                       (uri<?No_>, thalia:hasLecture, uri<?2>),
                                       (uri<?2>, rdf:type, event:Event),
                                       (uri<?2>, event:place, uri<?5>),
                                       (uri<?5>, rdf:type,geo:Point),
                                       (uri<?5>, dc:title, ?location),
                                       (uri<?No_>, thalia:forUniversity, "http://purl.org/thalia/university/ucsd"),
                                       (uri<?No_>, skos:subject, "http://purl.org/subject/thalia/ComputerScience").
     


MDL Description.


Case 7 UCSD

ucsd(?Number,?Title,?Fall2003,?Winter2004,?Spring2004):-
                                       (uri<?Number>, rdf:type, thalia:Course),
                                       (uri<?Number>, dc:title, ?Title),
                                       (uri<?Number>, dc:hasInstructor, uri<?1>),
                                       (uri<?1>, rdf:type, thalia:Instructor),
                                       (uri<?1>, foaf:name, ?Fall2003),
                                       (uri<?Number>, dc:hasInstructor, uri<?2>),
                                       (uri<?2>, rdf:type, thalia:Instructor),
                                       (uri<?2>, foaf:name, ?Winter2004),
                                       (uri<?Number>, dc:hasInstructor, uri<?3>),
                                       (uri<?3>, rdf:type, thalia:Instructor),
                                       (uri<?3>, foaf:name, ?Spring2004),
                                       (uri<?Number>, thalia:forUniversity, "http://purl.org/thalia/university/gatech"),
                                       (uri<?Number>, skos:subject, "http://purl.org/subject/thalia/ComputerScience").


Remained issues: how to represent some teacher teaches one couse in one specific semester.

Case 8

Using the ?Code as the input to generate URI is important for this use case, because it help merge duplicate records in the databases.


 umd(?Code,?CourseName,?Credits,?GradeMethod,?SectionTitle,?SectionTime):-
                                       (uri<?Code>, rdf:type, thalia:Course),
                                       (uri<?Code>, dc:title, ?CourseName),
                                       (uri<?Code>, thalia:hasSection, ?SectionTitle),
                                       (uri<?Code>, thalia:hasLecture, uri<?2>),
                                       (uri<?2>, rdf:type, event:Event),
                                       (uri<?2>, event:time, uri<?3>),
                                       (uri<?3>, rdf:type, time:Interval),
                                       (uri<?3>, time:inDateTime, uri<?4>),
                                       (uri<?4>, rdf:type, time:DateTiemDescription),
                                       (uri<?4>, time:dayOfWeek, ?Days),
                                       (uri<?4>, time:hour, ?SectionTime_),
                                       (uri<?Code>, thalia:forUniversity, "http://purl.org/thalia/university/umd"),
                                       (uri<?Code>, skos:subject, "http://purl.org/subject/thalia/ComputerScience").
                             


Case 9

student(?id,?name,?cid)-: 
            (uri<?id>, rdf:type, thalia:Student),
            (uri<?id>, foaf:name, ?name),
            (uri<?id>, thalia:select, ?uri<cid>).

course(?id,?name)-: 
            (uri<?id>, rdf:type, thalia:Course),
            (uri<?id>, dc:title, ?name).      
    


Case 10 : Self-join


student(?id,?name,...):- 
                 (uri<?id>, rdf:type, thalia:Student),
                 (uri<?id>, foaf:name, ?name).

friends(?id1,?id2):- (uri<?id1>, rdf:type, thalia:Student),
                     (uri<?id1>, foaf:friend, uri<?id2>),
                     (uri<?id2>, rdf:type, thalia:Student).
                 


Life Science Mapping Case

Case 1

The normal mapping systems often separately consider the schema-level mapping and instance-level mapping. However, in practical application, it is not always straiforward to distinguish these two types of data. Take the following mapping case as an example. Inhibit should be considered as a property (schema) in OWL, but is stored in a relational tuple(instance). OWL has more advance features for modeling semantics and usually requires more carefully and precisely distinguish concept data (class and properties) from individual data (instance). However, in relational database, many data that are actually schema-level information are stored in relational records instead of in the data dictionaries.

Fig. 1 The Senselab mapping case. Courtesy by Kei Cheung from Yale.

2. Complex Test Case 2

Design Issues

See Also

Some log about the mapping issue =

  • 06-26-2007. The thalia ontology should include ontology for location description, as for class room.