Draft ESWC2007 Demo Submission

Final Draft

(ChrisBizer, 2007-03-28, changed title, updated picture, added list of participants) (TomHeath, 2007-03-29, few minor edits) (ChrisBizer and TomHeath, 2007-03-30, more edits, integrated Richard's comments.) (TedThibodeauJr, 2008-07-21, corrected "derefereneceable" spelling)

Please review the demo description at http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/LinkingOpenData.pdf


% This is LLNCS.DEM the demonstration file of
% the LaTeX macro package from Springer-Verlag
% for Lecture Notes in Computer Science,
% version 2.2 for LaTeX2e
%
\documentclass{llncs}
%
\usepackage{makeidx}  % allows for indexgeneration
\usepackage{graphicx}
%\usepackage{hyperref}
%

\begin{document} %

\title{Interlinking Open Data on the Web} 
%
\titlerunning{Interlinking Open Data}  % abbreviated title (for running head)
%                                     also used for the TOC unless
%                                     \toctitle is used
%
\author{Chris Bizer\inst{1} \and Tom Heath\inst{2} \and Danny Ayers\inst{3} \and Yves Raimond\inst{4}}
%
\authorrunning{Christian Bizer et al.}  % abbreviated author list (for running head)

%
\institute{Freie Universit\"at Berlin.~\email{chris[at]bizer.de}
\and Knowledge Media Institute, The Open University.~\email{t.heath[at]open.ac.uk}
\and Free Author.~\email{danny.ayers[at]gmail.com}
\and Centre for Digital Music, Queen Mary, University of London~\email{yves.raimond[at]elec.qmul.ac.uk}}

\maketitle  % typeset the title of the contribution

\begin{abstract}

A fundamental prerequisite of the Semantic Web is the existence of large amounts of meaningfully interlinked RDF data on the Web. The W3C SWEO community project {\em Linking Open Data} has made various open datasets available on the Web as RDF, and developed automated mechanisms to interlink them with RDF statements. Collectively, the datasets currently consist of over one billion triples. We believe that large-scale interlinking will demonstrate the value of the Semantic Web compared to more centralized approaches such as Google Base\footnote{http://www.google.com/base}. This paper outlines the work to date and describes the accompanying demonstration.

\end{abstract}
%
%\section{A Web of Interlinked Datasets}
%
A functioning Semantic Web is predicated on the availability of large amounts of data as RDF; not in isolated islands but as a Web of interlinked datasets. To date this prerequisite has not been widely met, leading to criticism of the broader endeavour and hindering the progress of developers wishing to build Semantic Web applications. Thanks to the Open Data movement, a valuable opportunity exists to partially rectify this situation by making existing royalty-free datasets (such as Wikipedia, Musicbrainz, Geonames, Wordnet, and DBLP) available as RDF, and interlinking them on a large scale.

The W3C SWEO\footnote{http://www.w3.org/2001/sw/sweo/} community project {\em Linking Open Data}\footnote{http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData} is pursuing this avenue, and has published several large interlinked RDF datasets on the Web. The project follows the Linked Data principles\footnote{http://www.w3.org/DesignIssues/LinkedData.html} outlined by Tim Berners-Lee, such that: all items of interest should be identified using URI references; all URI references should be resolvable on the Web to RDF descriptions; and every RDF triple is conceived as a hyperlink that can be followed by Semantic Web browsers and crawlers.

Our Web of interlinked datasets currently consists of dbpedia (91 million triples), Geonames (60 million triples), Musicbrainz (50 million triples), the dbtune music server (4 million triples), the DBLP bibliography (15 million triples), Revyu reviews and ratings (15 thousand triples), a US census dataset (700 million triples), and the RDF Book Mashup (several billion triples).

These datasets are interlinked by approximately 150,000 RDF links, in the form of triples that connect a subject URI from one dataset with an object URI from another dataset. Using these links one can navigate from a computer scientist in dbpedia to her publications in the DBLP database, from a dbpedia book to reviews and sales offers for this book provided by the RDF Book Mashup, or from a band in dbpedia to a list of their songs provided by Musicbrainz or dbtune.

\begin{figure}[h]

   \centering
       \includegraphics[width=0.65\textwidth]{Firgure1.pdf}
   \caption{Linking relationships within our web of datasets.}
   \label{fig:Firgure1}

\end{figure}

%This web of data is accessible to Semantic Web crawlers and can be explored interactively using a Semantic Web browser.

In our demonstration, we will show how this web of datasets is browsed using three different Semantic Web browsers: the Tabulator browser developed at MIT, the Disco browser developed at Freie Universit\"at Berlin and the OpenLink Data Web browser. We will also demonstrate the Zitgist Semantic Web search engine, which crawls the data and provides an integrated view of it as well as an easy-to-use search interface.

%\section{Conclusion}

RDF is the obvious technology to interlink data from various data sources. The RDF datasets created by the project can be used as a testbed for Semantic Web browsers and crawlers, RDF stores and reasoning engines, as well as for data linkage, data cleansing, and data mining tools. We encourage people to set RDF links into our datasets, as each new link helps to bootstrap the Semantic Web as a whole.

In addition to the authors, the following people currently contribute to the project: S\"oren Auer, Josh Tauberer (University of Pennsylvania), Frederick Giasson (Zitgist), Kingsley Idehen, Orri Erling (OpenLink), Georgi Kobilarov, Richard Cyganiak (Freie Universit\"at Berlin), Stefano Mazzocchi (MIT), Bernard Vatant (Mondeca), Marc Wick (Geonames). The {\em Linking Open Data} project is a community effort and we warmly welcome further participants.

\end{document}

Third Draft

(TomHeath, 2007-03-25) (Chris Bizer, 2007-03-26, small editorial changes)

PDF again at http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/LinkingOpenData.pdf

Title: Bootstrapping the Semantic Web by Interlinking Open Data


% This is LLNCS.DEM the demonstration file of
% the LaTeX macro package from Springer-Verlag
% for Lecture Notes in Computer Science,
% version 2.2 for LaTeX2e
%
\documentclass{llncs}
%
\usepackage{makeidx}  % allows for indexgeneration
\usepackage{graphicx}
%\usepackage{hyperref}
%
\begin{document}
%
\title{Bootstrapping the Semantic Web by Interlinking Open Data}
%
\titlerunning{Interlinking Open Data}  % abbreviated title (for running head)
%                                     also used for the TOC unless
%                                     \toctitle is used
%
\author{Chris Bizer\inst{1} \and Tom Heath\inst{2} \and Danny Ayers\inst{4} \and Yves Raimond\inst{3}}
%
\authorrunning{Christian Bizer et al.}  % abbreviated author list (for running head)

%
\institute{Freie Universit\"at Berlin~\email{chris[at]bizer.de}
\and Knowledge Media Institute, The Open University~\email{t.heath[at]open.ac.uk}
\and University of London~\email{yves.raimond[at]gmail.com}
\and Free Author~\email{danny.ayers[at]gmail.com}}

\maketitle  % typeset the title of the contribution

\begin{abstract}

A fundamental prerequisite of the Semantic Web is the existence of large amounts of meaningfully interlinked RDF data on the Web. The W3C SWEO\footnote{http://www.w3.org/2001/sw/sweo/} community project {\em Linking Open Data}\footnote{http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData} has made various open datasets available on the Web as RDF, and developed automated mechanisms to interlink them with RDF statements. Collectively, the datasets currently consist of over one billion triples. We believe that large-scale interlinking will demonstrate the value of the Semantic Web compared to more centralized approaches such as Google Base\footnote{http://www.google.com/base} or Freebase\footnote{http://www.freebase.com/}. This paper outlines the work to date and describes the accompanying demonstration.

\end{abstract}
%
\section{A Web of Interlinked Datasets}
%
A functioning Semantic Web is predicated on the availability of large amounts of data as RDF; not in isolated islands but as a Web of interlinked datasets. To date this prerequisite has not been widely met, leading to criticism of the broader endeavour and hindering the progress of developers wishing to build Semantic Web applications. Thanks to the Open Data movement, a valuable opportunity exists to partially rectify this situation by making existing royalty-free datasets (such as Wikipedia, Musicbrainz, Geonames, Wordnet, and DBLP) available as RDF, and interlinking them on a large scale.

The W3C SWEO community project Linking Open Data is pursuing this avenue, and has created a huge interlinked RDF dataset on the Web. The project follows the Linked Data principles outlined by Tim Berners-Lee [Tbl2006], such that: all items of interest should be identified using URI references; all URIs should be dereferenceable on the Web; and every RDF triple is conceived as a hyperlink that can be followed by Semantic Web browsers and crawlers.

Our Web of interlinked data currently consists of the following datasets: dbpedia (31 million triples), Geonames (60 million triples), Musicbrainz (50 million triples), the dbtune music server (4 million triples), Revyu reviews and ratings (XX thousand triples), a D2R Server publishing the DBLP bibliography (15 million triples), the RDF Book Mashup (several billion triples), and the US census dataset (700 million triples). In addition, there are various FOAF profiles that link to URIs from the dataset.

These datasets are interlinked by approximately 150,000 RDF links, in the form of triples that connect a subject URI from one dataset with an object URI from another dataset. Using these links one can, for example, navigate from the description of a computer scientist in dbpedia to her publications in the DBLP database, from a dbpedia book to reviews and sales offers for this book provided by the RDF Book Mashup, or from a band in dbpedia to a list of their songs provided by Musicbrainz or dbtune. Figure~\ref{fig:Firgure1} visualizes the linking relationships within this web of data.

\begin{figure}[h]

   \centering
       \includegraphics[width=0.80\textwidth]{Firgure1.pdf}
   \caption{Linking relationships within our web of data.}
   \label{fig:Firgure1}

\end{figure}

\section{The Demonstration}

This web of data is accessible to Semantic Web crawlers and can be explored interactively using a Semantic Web browser. In our demonstration, we will show how this web is browsed using three different Semantic Web browsers: the Tabulator browser developed at MIT, the Disco browser developed at Freie Universit\"at Berlin and the OpenLink Data Web Browser. We will also demonstrate the Zitgist Semantic Web search engine, which crawls the data and provides an integrated view of it as well as an easy-to-use search interface.

\section{Conclusion}

RDF is the obvious technology to interlink data from various sources. We think that having a huge interlinked RDF dataset on the Web is beneficial for various Semantic Web development areas, including Semantic Web browsers, Semantic Web crawlers, RDF repositories and reasoning engines, and Semantic Web data linkage, data cleansing and data mining tools.

Having useful, general-purpose data online might inspire people to create interesting mashups and other RDF-aware applications. It might also encourage people to set links to the data and could thus help bootstrap the Semantic Web as a whole.

The Linking Open Data project is a community effort and we warmly welcome further participants.

%
% --- Bibliography ---
%
\begin{thebibliography}{}
%
\bibitem[Tbl2006]{linkeddata}
Tim Berners-Lee: Linked Data. http://www.w3.org/DesignIssues/LinkedData.html

\end{thebibliography}

\end{document}


Title: Linking Open Data on the Web

Second draft:

The PDF version of the second draft is found at http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/LinkingOpenData.pdf

The LaTeX source code is found below:

% This is LLNCS.DEM the demonstration file of
% the LaTeX macro package from Springer-Verlag
% for Lecture Notes in Computer Science,
% version 2.2 for LaTeX2e
%
\documentclass{llncs}
%
\usepackage{makeidx}  % allows for indexgeneration
\usepackage{graphicx}
%\usepackage{hyperref}
%
\begin{document}
%
\title{Bootstrapping the Semantic Web by Interlinking Open Data on the Web}
%
\titlerunning{Interlinking Open Data}  % abbreviated title (for running head)
%                                     also used for the TOC unless
%                                     \toctitle is used
%
\author{Chris Bizer\inst{1} \and Tom Heath\inst{2} \and Danny Ayers\inst{4} \and Yves Raimond\inst{3}}
%
\authorrunning{Christian Bizer et al.}  % abbreviated author list (for running head)

%
\institute{Freie Universität Berlin~\email{...} \and The Open University~\email{...} \and University of London~\email{...} \and Free Author~\email{...}}

\maketitle  % typeset the title of the contribution

\begin{abstract}
The Open-Data movement aims at making data freely available to everyone. Examples of royalty-free datasets include Wikipedia, Wikibooks, Musicbrainz, Geonames, Wordnet, and the DBLP bibliography, among many others. The W3C SWEO community project {\em Linking Open Data} has made various open datasets available on the Web as RDF and has set RDF links between data items from different data sources. Our dataset currently consists of over one billion triples. We believe that large-scale interlinking will demonstrate the value of the Semantic Web compared to more centralized approaches such as Google Base.

\end{abstract}
%
\section{A Net of Interlinked Datasets}
%
A general criticism of the Semantic Web is that many scientists talk about it, but that little RDF data is actually accessible on the Web. The W3C SWEO community project {\em Linking Open Data} addresses this criticism by creating a huge interlinked RDF dataset on the Web. The project follows the Linked Data principles outlined by Tim Berners-Lee in~\cite{linkeddata}, meaning that all items of interest should be identified using URI references, that all URIs should be dereferenceable on the Web, and that every RDF triple is conceived as a hyperlink that can be followed by Semantic Web browsers and crawlers.

Our net of interlinked datasets on the Web currently consists of the dbpedia dataset (31 million triples), the Geonames dataset (XX million triples), the Musicbrainz dataset (XX million triples), the dbtune music server (XX million triples), the Revyu rating dataset (XX million triples), a D2R Server publishing the DBLP bibliography (15 million triples), the RDF Book Mashup (several billion triples), the US census dataset (700 million triples), and a number of FOAF profiles that use URIs from the dataset.

These datasets are interlinked by approximately 100,000 (??) RDF links in the form of triples that connect a subject URI from one dataset with an object URI from another dataset. Using these links you can, for instance, navigate from the description of a computer scientist in dbpedia to his publications in the DBLP database, from a dbpedia book to reviews and sales offers for this book provided by the RDF Book Mashup, or from a band in dbpedia to a list of their songs provided by Musicbrainz or dbtune. Figure~\ref{fig:Firgure1} visualizes the linking relationships within this net of datasets.

\begin{figure}[h]

   \centering
       \includegraphics[width=0.80\textwidth]{Firgure1.pdf}
   \caption{Linking relationships within the net of datasets.}
   \label{fig:Firgure1}

\end{figure}

\section{The Demonstration}

This net of datasets is accessible to Semantic Web crawlers and can be explored interactively using a Semantic Web browser. In our demonstration, we will show how this net is browsed using three different Semantic Web browsers: the Tabulator browser developed at MIT, the Disco browser developed at Freie Universität Berlin and the OpenLink Data Web Browser. We will also demonstrate the Zitgist Semantic Web search engine, which crawls the data and provides an integrated view of it as well as an easy-to-use search interface.

\section{Conclusion}

RDF is the obvious technology to interlink data from various sources. We think that having a huge interlinked RDF dataset on the Web is beneficial for various Semantic Web development areas, including Semantic Web browsers, Semantic Web crawlers, RDF repositories and reasoning engines, and Semantic Web data linkage, data cleansing and data mining tools.

Having useful, general-purpose data online might inspire people to create interesting mashups and other RDF-aware applications. It might also encourage people to set links to the data and could thus help bootstrap the Semantic Web as a whole.

The Linking Open Data project is a community effort and we warmly welcome further participants.

%
% ---- Bibliography ----
%
\begin{thebibliography}{}
%
\bibitem[Tbl2006]{linkeddata}
Tim Berners-Lee: Linked Data. http://www.w3.org/DesignIssues/LinkedData.html

\end{thebibliography}

\end{document}

First draft:

1. Vision of the project

The Open-Data movement aims at making data freely available to everyone. Such data sources include Musicbrainz, Wikipedia, Wikibooks, Geonames, Wordnet, and the DBLP bibliography, among many others. We believe that interlinking them will demonstrate the value of the Semantic Web, as it shows the way towards a large machine-understandable cultural web.

(from Danny) Much of the information contained in datasets like these is expressed through links between resources. From the viewpoint of Semantic Web technologies these relationships are data in themselves. Deep semantic linking across such datasets magnifies their total value far beyond their simple sum. (/from Danny)

2. Datasets that are currently (June 2007) interlinked, content and size, access methods

  • Magnatune->Dbpedia, Musicbrainz, Geonames: The Magnatune Creative-Commons music repository has been published in RDF within the DBTune project, and interlinked with relevant data sets, such as Dbpedia, Musicbrainz, and Geonames. As Magnatune is widely used throughout the Music Information Retrieval community, we aim to interlink DBTune with computer-generated annotations, related to tempo, harmony, melody, beats, structural segmentation, and so on. In the scope of the DBTune project, we aim to publish and interlink a large amount of music-related data, based on the Music Ontology. Further work will include other Creative-Commons music repositories, but also musical score data sets.
  • Revyu->RDFBookmashup, Data in FOAF space: ...(give details here).

3. Linking methods (brief examples of how it's actually done)
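
As a placeholder illustration for this section, here is a minimal sketch of what a single RDF link could look like: one triple whose subject URI comes from one dataset and whose object URI comes from another, serialized as an N-Triples line. The choice of owl:sameAs as the linking predicate, the two example URIs, and the helper names `rdf_link` and `crosses_datasets` are assumptions made for illustration only; they are not taken from the project's actual linking scripts.

```python
from urllib.parse import urlparse

# Assumption: owl:sameAs is used as the linking predicate in this sketch.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def rdf_link(subject_uri, predicate_uri, object_uri):
    """Serialize one RDF link as a single N-Triples line."""
    return f"<{subject_uri}> <{predicate_uri}> <{object_uri}> ."


def crosses_datasets(subject_uri, object_uri):
    """Return True when subject and object URIs are served from different hosts."""
    return urlparse(subject_uri).netloc != urlparse(object_uri).netloc


# Illustrative URIs only: a dbpedia resource linked to a Geonames resource.
subject = "http://dbpedia.org/resource/Berlin"
obj = "http://sws.geonames.org/2950159/"

link = rdf_link(subject, OWL_SAME_AS, obj)
print(link)
```

Comparing host names is only a rough stand-in for "different dataset"; a real linking job would more likely keep an explicit list of dataset namespaces.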

4. Clients that will be demoed (Disco, Tabulator, Zitgist, Revyu)

  • NOTE: to demonstrate the "power" of the dbtune stuff, I may create a small geo-music-player. The user has a world map in front of them and just clicks on a particular place to generate a playlist of Creative Commons tunes from this region. Do you think it could go in this section? - Yves

Contributors, add your name here so you will get credit (alphabetical, we can sort out the ordering later)

  • Danny Ayers
  • Chris Bizer
  • Tom Heath
  • Yves Raimond
  • Ted Thibodeau Jr