Use Cases and Requirements
Based on the final report of the RDB2RDF XG, this page gathers use cases (UC) and requirements for the RDB2RDF Mapping Language (R2RML) in order to publish a first RDB2RDF WG document by 01/2010.
The need to share data with collaborators motivates custodians and users of relational databases to expose relational data on the Semantic Web. This document examines a set of use cases from science and industry, taking relational data and exposing it in patterns conforming to shared RDF schemas. These use cases expose a set of functional requirements for an upcoming standard for exposing relational data as RDF.
Relational databases are critical to the Semantic Web; most scientific and commercial use cases involve a fusion of different data sources, many of them relational. The Semantic Web can, in turn, offer a sound, vendor-neutral way to knit relational databases into a web of usable information.
Relational databases have unparalleled performance for many uses and can effectively serve Semantic Web queries. An effective mapping from relational data to RDF will allow queries expressed in [SPARQL], the Semantic Web query language, to be transformed into [SQL], the most widely deployed relational query language. These use cases describe a relational database and the desired RDF view, with the goal that queries over the RDF representation be manifested as relational queries.
According to the Semantic Web idea, data should be defined and linked in a way that makes them accessible to humans and machines. The Web of Data is a scalable environment with explicit semantics where not only can humans navigate information, as in the World Wide Web, but machines are also able to find connections and use them to navigate through the information space. In order to make data machine processable, we need to:
a. follow a common model to describe, connect and access resources;
b. name resources in an unambiguous way.
The Web of Data is constantly growing due to its compelling potential for facilitating data integration and retrieval. At the same time, however, closed relational database systems host the majority of enterprise information and contain a vast amount of semantic information encoded in their schemas. In order to make these data available for the Semantic Web, a connection must be established between relational databases and a format suitable for the Semantic Web. The most common way to publish resources in the Semantic Web follows the RDF model and uses Uniform Resource Identifiers (URIs) for resource identification, in this way facilitating the creation of a comprehensive and flexible resource description.
For this reason the working group decided to focus on defining guidelines for converting data stored in RDBMSs into a more flexible and sharable format such as RDF, which will allow data integration and queries in the Web of Data.
The advantages of creating an RDF view of relational data are inherited from the Web of Data and can be summarized based on the tasks they facilitate:
a. Integration: Data in different RDBs can be integrated using RDF semantics and mechanisms; in this sense, the Web of Data can be imagined as one big database. Moreover, information in the database can be integrated with information that comes from other Semantic Web sources.
b. Retrieval: Once data are published in the Web of Data (as opposed to RDBMSs), queries can span different data sources and more powerful retrieval methods can be built, since they follow an open world assumption.
R2RML Language Requirements
Requirements on the R2RML language, not specific to the application needs.
A. Many requirements of this kind are found in the RDB2RDF Working Group Charter and replicated here:
- The mapping language MUST define the mapping of relational data and relational schemas to RDF and OWL.
- The mapping language MUST define the set of relational algebra to be supported in the first release. This set to be supported SHOULD be as complete as possible and be defined as soon as possible after the WG official launch.
- The mapping language SHOULD have a human-readable syntax as well as XML and RDF representations of the syntax for purposes of discovery and machine generation.
- The mapping language SHOULD use W3C RIF whenever a rule engine is needed in the mapping language.
- It SHOULD be possible to subset the mapping language for lightweight applications such as Web 2.0 applications. This feature of the language will be validated by creating a library of mappings for widely used applications such as Drupal, Wordpress, or phpBB.
- The mapping language SHOULD be able to support vendor specific SQL data types.
- The mapping language specification SHOULD include guidance with regard to mapping relational data to a subset of OWL such as OWL QL or OWL RL.
- The mapping language MUST allow for a mechanism to create identifiers for database entities. The generation of identifiers should be designed to support the implementation of the linked data principles. Where possible, the language will encourage the reuse of public identifiers for long-lived entities such as persons, corporations and geo-locations.
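As an illustration of the identifier requirement above, the following is a minimal sketch of one way to create identifiers for database entities. It is not part of the charter or of any agreed design. The base URI, the URI layout, the PUBLIC_IDS lookup table, and the function name mint_uri are all assumptions made for this example.

```python
from urllib.parse import quote

# Hypothetical base URI; a real deployment would use a domain it controls.
BASE_URI = "http://example.org/data/"

# Hypothetical lookup of existing public identifiers, keyed by
# (table, primary-key values). It is consulted before minting a new URI.
PUBLIC_IDS = {
    ("country", ("IT",)): "http://example.org/public/country/Italy",
}


def mint_uri(table, key_columns, row, base=BASE_URI, public_ids=PUBLIC_IDS):
    """Return a stable URI for one row, built from its primary-key values."""
    key_values = tuple(row[col] for col in key_columns)
    if any(value is None for value in key_values):
        raise ValueError("primary-key values must not be NULL")

    # Prefer an existing public identifier when one is known.
    public = public_ids.get((table, key_values))
    if public is not None:
        return public

    # Percent-encode every component so the result is a valid URI path.
    parts = [
        f"{quote(col, safe='')}={quote(str(value), safe='')}"
        for col, value in zip(key_columns, key_values)
    ]
    return f"{base}{quote(table, safe='')}/{';'.join(parts)}"


person_uri = mint_uri("person", ["id"], {"id": 7, "name": "Alice"})
line_uri = mint_uri("order line", ["order_id", "line no"],
                    {"order_id": 3, "line no": "a/b"})
country_uri = mint_uri("country", ["code"], {"code": "IT"})
```

Deriving the URI only from the table name and primary key keeps identifiers stable across runs, which is one precondition for linked data style dereferencing. The lookup table is a simple stand-in for whatever mechanism the language might offer for reusing public identifiers.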
B. Data types
There appears to be consensus that relational data types should be treated consistently with RDF datatypes, per the SQL-to-XSD mapping.
R2RML Application Use Case Requirements
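To make the data type point concrete, here is a small sketch of how SQL column types might be carried over to XSD datatypes when producing RDF literals. The table below is an illustrative assumption and not a normative mapping. The names SQL_TO_XSD and to_literal are invented for this example, and character types are simply emitted as plain literals.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Illustrative (assumed) correspondence between SQL types and XSD datatypes.
SQL_TO_XSD = {
    "SMALLINT": XSD + "integer",
    "INTEGER": XSD + "integer",
    "BIGINT": XSD + "integer",
    "DECIMAL": XSD + "decimal",
    "NUMERIC": XSD + "decimal",
    "REAL": XSD + "double",
    "FLOAT": XSD + "double",
    "DOUBLE PRECISION": XSD + "double",
    "BOOLEAN": XSD + "boolean",
    "DATE": XSD + "date",
    "TIME": XSD + "time",
    "TIMESTAMP": XSD + "dateTime",
}


def to_literal(value, sql_type):
    """Render a column value as an N-Triples style literal string."""
    # Drop any length/precision suffix, e.g. "VARCHAR(50)" -> "VARCHAR".
    base_type = sql_type.upper().split("(")[0].strip()
    datatype = SQL_TO_XSD.get(base_type)

    if isinstance(value, bool):
        # xsd:boolean uses lowercase lexical forms.
        lexical = "true" if value else "false"
    else:
        lexical = str(value)
    # Only backslashes and double quotes are escaped in this sketch.
    lexical = lexical.replace("\\", "\\\\").replace('"', '\\"')

    if datatype is None:
        return f'"{lexical}"'  # unmapped types become plain literals
    return f'"{lexical}"^^<{datatype}>'


age_literal = to_literal(42, "INTEGER")
name_literal = to_literal("Alice", "VARCHAR(50)")
flag_literal = to_literal(True, "boolean")
```

Vendor-specific SQL types are not covered here. In this sketch they would fall through to plain literals unless added to the table.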
Integrating relational database content with the Semantic Web is not a stand-alone process, but a process of integrating that content with other content from the Semantic Web. Actual applications may have traditional names, for example ETL-based data warehouse or web-site mashup.
A. Scope of Data Processing
For simplicity we abstract and enumerate such integration pairwise, where the first web site contains relational data to be mapped and the second web site may be any Semantic Web site (i.e., an Internet-accessible source of RDF or OWL).
We distinguish three refinements of mapping relationally stored data to RDF.
1. Structured: Consider only highly structured database content. String and other text fields are not considered valuable.
2. Structured + Semistructured: Text fields are considered valued but are treated simply as unparsed strings.
3. Structured + Microparsed Tagged Text: Text fields in the database are parsed into an RDF graph per an existing ontology.
Each of the above may be combined with RDF from a Semantic Web site distinguished as:
1. Structured: Data derived from a relational data source, which may be further refined per the three cases above.
2. Semantic: An arbitrary source of RDF/OWL
3. Mash-up (vs. joins)
The use-case distinction among these latter three cases is as follows. When combining relationally sourced data with other relationally sourced data, we are generally speaking of traditional database data integration applications. When combining with Semantic sources, we are speaking of any application of the Semantic Web where data may be conditionally combined. Finally, we reserve “mash-up” for applications where data is sourced independently from a plurality of semantic sources, formatted, and presented to a user. In other words, “mash-up” distinguishes user-facing applications from applications comprising conditional information processing (and in particular database join operations).
Role of an Ontology
Different applications have different roles for an ontology.
1. Putative: Table and column names and other parts of SQL-DDL are exploited as parts of RDF triples and combined with the underlying data. In such cases an RDFS or OWL file may be synthesized from the SQL-DDL. The resulting file is called the putative ontology and may include substituting labels.
2. Existing Domain Ontology: An application may require the relational content be mapped to an existing domain ontology. Thus, an existing ontology serves as the basis of integration of the information source with the Semantic Web; all further data integration problems are indistinguishable from other Semantic Web integrations.
3. Create a Federating (Enterprise) Ontology: An application may not be associated with a preexisting domain ontology, but the application comprises integrating the data from a number of data sources. In a manner similar to the creation of an enterprise schema for a data warehouse, the application may call for the creation of an ontology as a consequence of examining the contents of the data sources. If all the application's data sources are relational databases, then one can anticipate that the application will take the form of a classic mediator-based architecture [cite?].
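The putative case (item 1 above) can be sketched roughly as follows: table and column names are read from the database catalog and turned into RDFS class and property declarations, with optional substitute labels. This is only an illustration under stated assumptions. It uses SQLite's catalog rather than parsing SQL-DDL text, and the namespace, the URI layout, and the function name putative_ontology are hypothetical.

```python
import sqlite3

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SCHEMA_BASE = "http://example.org/schema/"  # hypothetical namespace


def putative_ontology(conn, base=SCHEMA_BASE, labels=None):
    """Synthesize RDFS triples from the tables and columns of a database.

    `labels` optionally maps a table or column name to a substitute label.
    """
    labels = labels or {}
    triples = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        cls = base + table
        triples.append((cls, RDF + "type", RDFS + "Class"))
        triples.append((cls, RDFS + "label", labels.get(table, table)))
        # PRAGMA table_info yields (cid, name, type, notnull, default, pk).
        for _cid, column, _type, _notnull, _default, _pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            prop = f"{base}{table}#{column}"
            triples.append((prop, RDF + "type", RDF + "Property"))
            triples.append((prop, RDFS + "domain", cls))
            triples.append((prop, RDFS + "label", labels.get(column, column)))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
ontology = putative_ontology(conn, labels={"person": "Person"})
```

Each table becomes a class and each column a property whose domain is that class. Foreign keys, which would naturally become links between classes, are left out of this sketch.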
Use Case #1: Integrating Enterprise Relational Databases
Trentino is an autonomous region in the north of Italy with a population of 1 million and more than 200 municipalities. Each municipality has data about people, organizations, buildings, etc. in its own relational database. The goal is to integrate these heterogeneous relational databases and offer the user, a tax agent, an intelligent tool for navigating through the data present in the many different databases. The tool aggregates data and creates a profile for each taxpayer. Each profile shows different types of information, with links to other entities such as buildings owned, payments made, location of residence, etc.
Each relational database can be mapped to an ontology that describes the domain in question. Queries can then be executed on the domain ontology and translated to specific queries on each relational database.
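The following sketch illustrates the idea of answering one ontology-level query against several differently structured databases. Everything in it is hypothetical: the ontology namespace, the two municipality schemas, the mapping format (ontology property to table, key column, value column), and the function names. It handles only a single property lookup, not general SPARQL.

```python
import sqlite3

ONT = "http://example.org/tax#"               # hypothetical domain ontology
ENTITY_BASE = "http://example.org/taxpayer/"  # hypothetical entity namespace

# One mapping per database: ontology property -> (table, key column, value column).
MAPPINGS = {
    "municipality_a": {ONT + "fullName": ("residents", "fiscal_code", "full_name")},
    "municipality_b": {ONT + "fullName": ("persone", "cf", "nome")},
}


def translate(prop, mapping):
    """Translate a lookup on an ontology property into SQL for one database."""
    table, key_col, value_col = mapping[prop]
    return f'SELECT "{key_col}", "{value_col}" FROM "{table}"'


def query_all(prop, databases, mappings=MAPPINGS):
    """Run the translated query on every database and merge the triples."""
    triples = []
    for name, conn in databases.items():
        mapping = mappings.get(name, {})
        if prop not in mapping:
            continue  # this database has nothing mapped to the property
        for key, value in conn.execute(translate(prop, mapping)):
            triples.append((ENTITY_BASE + str(key), prop, value))
    return sorted(triples)


# Two in-memory databases with different schemas for the same kind of data.
db_a = sqlite3.connect(":memory:")
db_a.execute("CREATE TABLE residents (fiscal_code TEXT PRIMARY KEY, full_name TEXT)")
db_a.execute("INSERT INTO residents VALUES ('A1', 'Anna Rossi')")

db_b = sqlite3.connect(":memory:")
db_b.execute("CREATE TABLE persone (cf TEXT PRIMARY KEY, nome TEXT)")
db_b.execute("INSERT INTO persone VALUES ('B2', 'Bruno Bianchi')")

names = query_all(ONT + "fullName",
                  {"municipality_a": db_a, "municipality_b": db_b})
```

Because both databases are mapped to the same property and the same entity namespace, the merged result can be navigated without knowing which municipality a record came from.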
Use Case #2: Integrating Biological Relational Database to a Linked Data source
Rob, from the RNA lab, would like to integrate the data from his RNA Comparative Analysis Database (rCAD) with other databases such as GenBank, PDB, etc. These other databases are now exposed as RDF on the web following the Linked Data principles. Therefore Rob would also like to expose rCAD as RDF on the web following the Linked Data principles, in order to link the data from rCAD with the data in GenBank, PDB, etc. Eventually he will be able to execute SPARQL queries on the web that return results from different data sources.
In order to expose rCAD as RDF, Rob wants to map rCAD to an existing domain ontology called the Multiple Alignment Ontology (MAO).
Use Case #3: Exposing Relational Database content as RDF on the Web
Users of popular web applications backed by relational databases, such as blogs, wikis, and e-commerce websites, would like to expose their relational content as RDF on the web so that Semantic Web search engines can index blog posts, offerings, etc. For this purpose, widely adopted domain ontologies such as FOAF, SIOC, Dublin Core, and GoodRelations should be mapped to the relational database.