SWEO Community Project: Linking Open Data on the Semantic Web
THALIA testbed
This page collects information about the THALIA testbed for benchmarking tools that map relational databases to RDF.
The page is part of the Linking Open Data Project.
You can download the latest version of the THALIA benchmark now.
Introduction
THALIA (Test Harness for the Assessment of Legacy information Integration Approaches) is a publicly available testbed and benchmark for testing and evaluating integration technologies. It provides researchers and practitioners with a collection of 40 relational database tables representing university course catalogs from computer science departments around the world. The data in the testbed provide a rich source of syntactic and semantic heterogeneities, which we believe still pose the greatest technical challenges to the research community. In addition, the testbed provides a set of twelve benchmark queries as well as a scoring function for ranking the performance of an integration system.
Testbed content
The initial XML/XMLSchema/XQuery version is accessible here. Our SQL/OWL/SPARQL version is presented below.
SQL scripts
You can download either the THALIA benchmark or the THALIA testbed version of the SQL schema. The benchmark version includes the challenge university schemas and data. The testbed version contains schemas and data for all available university courses.
MySQL 5.0.37 (bundled with archive)
PostgreSQL 8.2.3-1 (bundled with archive)
Virtuoso 5.x (and higher)
- SQL Schema Generation Scripts
- SQL-RDF Views Scripts
- SPARQL Query Scripts (for executing SPARQL via ODBC, JDBC, ADO.NET, OLE-DB, or XMLA connections to Virtuoso; to execute a query directly via the SPARQL Protocol against the Virtuoso instance's SPARQL endpoint, remove the "sparql" keyword at the start of the query).
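The "sparql" keyword rule described for the Virtuoso query scripts can be sketched as a pair of small helpers. This is only an illustration under stated assumptions: the function names `for_sparql_protocol` and `for_sql_channel` are hypothetical and are not part of Virtuoso or of the THALIA archive, and the sketch handles only the leading keyword, not any other script syntax.

```python
import re

# Leading "sparql" keyword used when a query is sent over a SQL channel
# (ODBC, JDBC, ADO.NET, OLE-DB, XMLA). Matched case-insensitively.
_SPARQL_KEYWORD = re.compile(r"^\s*sparql\b\s*", re.IGNORECASE)


def for_sparql_protocol(script: str) -> str:
    """Return the query text without a leading 'sparql' keyword.

    Hypothetical helper: the result is what one would submit to a
    SPARQL Protocol endpoint.
    """
    return _SPARQL_KEYWORD.sub("", script, count=1).strip()


def for_sql_channel(query: str) -> str:
    """Return the query text with exactly one leading 'sparql' keyword.

    Hypothetical helper: the result is what one would submit over a
    SQL connection to Virtuoso.
    """
    return "sparql " + for_sparql_protocol(query)
```

With these helpers the same query text can be kept in one place and adapted to whichever channel is used to reach the Virtuoso instance.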
The first public version of the THALIA benchmark schema is already available.
The full testbed version has not been built yet.
University computer science course ontology
The first public version of the ontology for computer science department courses at universities around the world is also available.
Benchmark SPARQL queries
The set of twelve benchmark queries, expressed in SPARQL, is also available.
Examples of testbed SQL data in RDF format
The benchmark data in RDF format, which the SPARQL queries require, is also available.
THALIA testbed downloads
- First public review version, 23 June, 2007 - download it.
Some testbed examples
Arizona State University
- Step 1. Initial HTML version of a course.
110 Principles of Programming with Java. (3) MORE INFO Concepts of problem solving using Java, algorithm design, structured programming, fundamental algorithms and techniques, and computer systems concepts. Social and ethical responsibility. Lecture, lab. Prerequisite: MAT 170.
- Step 2. Original representation in the THALIA testbed in XML format.
<Course Title="110 Principles of Programming with Java. (3)">
  <MoreInfo.URL>http://www.eas.asu.edu/~cse110</MoreInfo.URL>
  <Description>Concepts of problem solving using Java, algorithm design, structured programming, fundamental algorithms and techniques, and computer systems concepts. Social and ethical responsibility. Lecture, lab. Prerequisite: MAT 170.</Description>
</Course>
- Step 3. Representation of the course translated into SQL.
CREATE TABLE asu (
  Title TEXT NOT NULL,
  Description TEXT,
  MoreInfoURL TEXT
);
INSERT INTO asu (Title, Description, MoreInfoURL) VALUES (
  '110 Principles of Programming with Java. (3)',
  'Concepts of problem solving using Java, algorithm design, structured programming, fundamental algorithms and techniques, and computer systems concepts. Social and ethical responsibility. Lecture, lab. Prerequisite: MAT 170.',
  'http://www.eas.asu.edu/~cse110'
);
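The translation from Step 2 to Step 3 can be sketched in Python with the standard library alone. This is a minimal sketch, not the script used to build the testbed: it assumes each course arrives as a separate `<Course>` document shaped like the one in Step 2, uses an in-memory SQLite database in place of MySQL or PostgreSQL, and the function names `course_to_row` and `load_courses` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The Step 2 course element, copied from this page.
COURSE_XML = """
<Course Title="110 Principles of Programming with Java. (3)">
  <MoreInfo.URL>http://www.eas.asu.edu/~cse110</MoreInfo.URL>
  <Description>Concepts of problem solving using Java, algorithm design, structured programming, fundamental algorithms and techniques, and computer systems concepts. Social and ethical responsibility. Lecture, lab. Prerequisite: MAT 170.</Description>
</Course>
"""


def course_to_row(course_xml: str) -> tuple:
    """Map one <Course> element to a (Title, Description, MoreInfoURL) row."""
    course = ET.fromstring(course_xml.strip())
    # Index child elements by tag name; a missing child yields None below.
    children = {child.tag: (child.text or "").strip() for child in course}
    return (
        course.get("Title"),
        children.get("Description"),
        children.get("MoreInfo.URL"),
    )


def load_courses(conn: sqlite3.Connection, xml_documents) -> None:
    """Create the asu table from Step 3 if needed and insert one row per course."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS asu ("
        "Title TEXT NOT NULL, Description TEXT, MoreInfoURL TEXT)"
    )
    conn.executemany(
        "INSERT INTO asu (Title, Description, MoreInfoURL) VALUES (?, ?, ?)",
        [course_to_row(doc) for doc in xml_documents],
    )
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_courses(connection, [COURSE_XML])
    for row in connection.execute("SELECT Title, MoreInfoURL FROM asu"):
        print(row)
```

Bound parameters are used for the INSERT so that quotes inside course descriptions do not need manual escaping.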
- Step 4 (the most important). The course represented in RDF.
<University rdf:about="http://purl.org/thalia/asu">
  <dc:title xml:lang="en">Arizona State University</dc:title>
</University>
<Course rdf:about="http://purl.org/thalia/asu/course/CSE110">
  <dc:title xml:lang="en">Principles of Programming with Java</dc:title>
  <dc:description xml:lang="en">Concepts of problem solving using Java, algorithm design, structured programming, fundamental algorithms and techniques, and computer systems concepts. Social and ethical responsibility.</dc:description>
  <hasPrerequisite rdf:resource="http://purl.org/thalia/asu/course/MAT170"/>
  <skos:subject rdf:resource="http://purl.org/topic/thalia/ProgrammingLanguages"/>
  <skos:subject rdf:resource="http://purl.org/subject/thalia/AlgorithmDesign"/>
  <skos:subject rdf:resource="http://purl.org/subject/thalia/SystemArchitecture"/>
  <rdfs:seeAlso rdf:resource="http://www.eas.asu.edu/~cse110"/>
  <forUniversity rdf:resource="http://purl.org/thalia/asu"/>
</Course>
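Part of what a mapping tool has to do between Step 3 and Step 4 is pull structure out of free text: the course number and credits are embedded in the Title column, and the prerequisite is embedded in the Description. The sketch below illustrates that idea by emitting N-Triples lines from one SQL row. It rests on several assumptions that are not stated on this page: the `ONT` namespace for `hasPrerequisite` and `forUniversity` is a placeholder, the department code "CSE" is passed in as a parameter because it does not appear in the Title column, and the `skos:subject` links are omitted because they cannot be derived mechanically from the row.

```python
import re

UNIVERSITY = "http://purl.org/thalia/asu"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
# Placeholder namespace (assumption): the ontology's real namespace URI
# is not given on this page.
ONT = "http://example.org/thalia-ontology#"

# "110 Principles of Programming with Java. (3)" -> number, name, credits
TITLE_RE = re.compile(r"^(?P<number>\d+)\s+(?P<name>.+?)\.\s*\((?P<credits>\d+)\)$")
# "... Prerequisite: MAT 170." -> dept, number
PREREQ_RE = re.compile(r"Prerequisite:\s*(?P<dept>[A-Z]+)\s*(?P<number>\d+)")


def _literal(text: str, lang: str = "en") -> str:
    """Format a language-tagged N-Triples literal with minimal escaping."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def row_to_triples(title, description, more_info_url, dept="CSE"):
    """Return N-Triples lines for one row of the asu table (illustrative only)."""
    match = TITLE_RE.match(title.strip())
    if match is None:
        raise ValueError(f"unrecognised title format: {title!r}")
    course = f"{UNIVERSITY}/course/{dept}{match['number']}"

    triples = [
        (course, DC_TITLE, _literal(match["name"])),
        (course, ONT + "forUniversity", f"<{UNIVERSITY}>"),
    ]
    prereq = PREREQ_RE.search(description or "")
    if prereq is not None:
        target = f"{UNIVERSITY}/course/{prereq['dept']}{prereq['number']}"
        triples.append((course, ONT + "hasPrerequisite", f"<{target}>"))
    if more_info_url:
        triples.append((course, RDFS_SEE_ALSO, f"<{more_info_url}>"))

    return [f"<{s}> <{p}> {o} ." for s, p, o in triples]
```

Calling `row_to_triples` with the three column values from Step 3 yields the title, university, prerequisite, and see-also statements for the course URI used in Step 4.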
- Step 5. Examples of XQuery and SPARQL queries to find all courses with the string 'Data Structures' in the title.
<XQuery>
for $b in doc('umd.xml')/umd/Course
where contains($b/CourseName, 'Data Structures')
return $b
</XQuery>
<SPARQL>
SELECT ?course
WHERE {
  ?course a :Course ;
          dc:title ?title .
  FILTER (lang(?title) = "en")
  FILTER regex(?title, "Data Structures")
}
</SPARQL>
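For comparison, the same substring search can be run against the relational form of the data with SQL LIKE. The sketch below uses an in-memory SQLite database. The table name `umd` and column `CourseName` are borrowed from the XQuery path as an assumption about the SQL schema, the sample rows are invented for illustration and are not testbed data, and `find_courses` is a hypothetical helper. Note that SQLite's LIKE is case-insensitive for ASCII text, whereas the SPARQL `regex` filter is case-sensitive unless the "i" flag is given.

```python
import sqlite3


def find_courses(conn, table, column, phrase):
    """Return values of `column` in `table` that contain `phrase`.

    Table and column names cannot be bound as SQL parameters, so they are
    checked to be plain identifiers before being placed in the statement.
    A '%' or '_' inside `phrase` would act as a LIKE wildcard.
    """
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("table and column must be plain identifiers")
    sql = (
        f'SELECT "{column}" FROM "{table}" '
        f'WHERE "{column}" LIKE ? ORDER BY "{column}"'
    )
    return [row[0] for row in conn.execute(sql, (f"%{phrase}%",))]


def sample_db():
    """Build a tiny database with invented course names (not testbed data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE umd (CourseName TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO umd (CourseName) VALUES (?)",
        [
            ("Data Structures",),
            ("Advanced Data Structures and Algorithms",),
            ("Operating Systems",),
        ],
    )
    return conn


if __name__ == "__main__":
    print(find_courses(sample_db(), "umd", "CourseName", "Data Structures"))
```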
SQL to RDF mapping tools
- DartGrid
- D2RQ
- OpenLink Virtuoso's SPARQL based SQL-RDF Metaschema Language (Presentation and Technical White Paper)
- SquirrelRDF
A more comprehensive list is available at RdfAndSql.
Resources
Bibliography
- J. Hammer, M. Stonebraker, and O. Topsakal, THALIA: Test Harness for the Assessment of Legacy Information Integration Approaches, Technical Report TR05-001, Dept. of Computer and Information Science and Engineering, Univ. of Florida, January 2005.
Presentations
- Christian Bizer, Turning the Web into a Database, February 2007.