Open Data on the Web, 23 - 24 April 2013, London (Topics)

Topics for the Workshop

The main topics of the Workshop are:

discoverability
transformation (to other formats)
combinations of data from different models (e.g., linked data and CSV)
quality assessment and self-description
extracting human-readable "stories" from data

Below, the Program Committee highlights a small number of topics of known interest, but position papers from participants may cover a broad range of open data topics. For all topics on the agenda, the Program Committee's goal is greater alignment between open data publishers and those who deliver open data products and services.

Data Discovery, Data Description, Data Meaning

The majority of open data is published through data portals, which provide a catalog of the available data sets, links to applications that use the data, and more. Portals work well for human users, but what about automatic discovery by machines? What if you could search the web for data as easily as you can for documents?

Wouldn't it be great if an application could find the data on a given site automatically, and understand what it contains?

The /.well-known/ URI prefix [RFC 5785] is a good candidate mechanism for this kind of automatic discovery and has been implemented in some cases for VoID data descriptions, but more standardization might be needed, as well as the adoption of best practices by data publishers.
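
As a rough sketch of what such discovery could look like on the client side, the Python snippet below derives the well-known location of a VoID description from any URL on a site. The path /.well-known/void is the one suggested for VoID descriptions; the function name void_url and the host data.example.org are illustrative assumptions, and no network request is made.

```python
from urllib.parse import urlsplit, urlunsplit

# Path suggested for VoID descriptions under the RFC 5785 prefix.
WELL_KNOWN_VOID = "/.well-known/void"


def void_url(any_url: str) -> str:
    """Return the well-known VoID location for the site hosting any_url.

    void_url is a hypothetical helper written for this illustration.
    """
    parts = urlsplit(any_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {any_url!r}")
    # Keep the scheme and host; drop the path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN_VOID, "", ""))


# data.example.org is a placeholder host, not a real data portal.
print(void_url("http://data.example.org/catalog/datasets?page=2"))
# prints http://data.example.org/.well-known/void
```

A crawler could request that URL and, if a description is returned, read the data set metadata from it. Whether publishers actually serve a description there is exactly the best-practice question raised above.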

Assessing Quality

The advantage of manual discovery of data is that the content, provenance and quality of the data can readily be assessed, but how can the same assessment be applied to something discovered programmatically? What is the key metadata that is needed and how should it be exposed? If the right metadata is provided, can the machines do the rest unaided?

This is as important to data publishers as it is to data consumers and is crucial to the effective use and reuse of open data. How can provenance and annotations be integrated into the data processing and production chain?

Web-Oriented Data Formats for Tabular Data

Data is and will continue to be published in a variety of different formats. Some data goes through careful and detailed preparation before publication; other data sets are saved directly from common desktop software and published ‘as is.’ Applications need to be able to handle this variety.

The majority of open data is published as tabular data in CSV files that can readily be converted to JSON, but are some methods of doing this better than others? What would a generic API for a CSV file look like? Are there ways of structuring or packaging CSV files that make them better for handling sophisticated data? How would that play with linked data, whether accessed via a SPARQL query or via JSON for Linked Data [JSON-LD]? See the Data Protocols site for a survey of current standards in this area.
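
To make the question about conversion methods concrete, here is a minimal Python sketch of two common ways to turn the same CSV text into JSON: one object per row, and one array per column. The sample data and the function names csv_to_records and csv_to_columns are made up for illustration; neither shape is put forward as the preferred one.

```python
import csv
import io
import json

# Made-up sample data, used only for illustration.
SAMPLE_CSV = "id,name\n1,alpha\n2,beta\n"


def csv_to_records(text: str) -> list:
    """Row-oriented: one JSON object per row, keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def csv_to_columns(text: str) -> dict:
    """Column-oriented: one JSON array per column, keyed by the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return {}
    header, body = rows[0], rows[1:]
    return {name: [row[i] for row in body] for i, name in enumerate(header)}


print(json.dumps(csv_to_records(SAMPLE_CSV)))
# prints [{"id": "1", "name": "alpha"}, {"id": "2", "name": "beta"}]
print(json.dumps(csv_to_columns(SAMPLE_CSV)))
# prints {"id": ["1", "2"], "name": ["alpha", "beta"]}
```

In both shapes every value stays a string, because a plain CSV file carries no type information. That gap is one reason additional structure or packaging around CSV files may be worth discussing.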