Agenda: Open Data on the Web, 23 - 24 April 2013, London

Agenda

Tuesday 23 April

08:30 Doors open, coffee

09:20 Welcome & Introduction, Phil Archer (W3C) and Dan Brickley (Google)

09:30 Open Data: Promise and Expectation Chair: Stuart Williams; Scribe: PhilA; (10 minutes per speaker plus 20 mins discussion)

Building our houses on rock John Sheridan, The National Archives [paper] [slides]

These are still early days for open data on the web. We have seen rapid progress with various initiatives by national governments and cities around the world, an emerging start-up culture and the growing use of open data by the established commercial sector. As with the rise of the first towns in the industrial revolution, much of what has been created so far has been done at pace, with little viable, long-term supporting infrastructure. As a result, too often we are building our open data houses on sand not rock.

Three issues:

how do open data users know where there is just sand (ephemeral, unmaintained, unsupported open data), and where there is rock, on which to confidently build?
what are the processes or steps that data publishers need to go through to solidify their open data publishing?
what experience do we have as a community of various strategies for creating and sustaining reliable and trusted open data?

The National Archives will share its experience as a publisher cementing open data.

Can open data (and big data) be used to improve the operations of development organisations?, Millie Begovic Radojevic and Giulio Quaggiotto, UNDP [slides]

UNDP, together with the World Bank and the UN Global Pulse, has embarked on a journey to explore this topic.

Our initial work has focused on procurement data - number of contracts issued, the amount, name and location of suppliers, procurement type, sector, name and location of a project under which a contract was issued, type of partners involved, etc. We use this data for financial management, accountability, accounting and reporting purposes. To date, there have been no efforts to engage the wider public in the access to and analysis of these large amounts of data in order to start understanding various patterns and linkages that could lead to a more efficient investment of funds and operational effectiveness.

The open data movement and advancements in analytical skills for big data use today allow development organizations to look again at this data and generate additional value and insight that could help us answer questions such as:

Do certain networks of companies tend to win the majority of contracts in any given sector?
What are the linkages of winning companies with other suppliers within a project?
What is the pattern of knowledge transfer (for example, do projects foster south-south collaboration)?
What characterizes an environment in which certain projects tend to under-deliver?
Are there patterns in terms of what partners are involved in implementation (NGO vs. Gov’t vs. private sector); geographic location; development sector; conflict or post conflict setting; size of contracts issued under a project?
Is there a good match between a project’s location and development priorities for that geographic region?
What is the pattern between contracts awarded to companies from any given country and that country’s relationship with the funding organisation (e.g. amount of loans and programs it receives)?
How can we account for the role of intermediary firms (e.g. consulting companies who prepare documentation that a lead supplier submits) who may play an instrumental role in bidding for projects, but are rarely captured in the official process?
Are there links between companies who tend to win most contracts and their employees (e.g. do they tend to employ ex-development workers, or former government officials)?

We took a stab at answering some of these questions by applying social network analysis, using World Bank financial data from their open data portal (write up). We used Tulip to analyze and visualize data.
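As a rough illustration of the kind of network question listed above (which suppliers repeatedly appear together on projects), here is a minimal sketch in plain Python. It is not the authors' analysis and does not use Tulip or World Bank data: the contract records, the supplier and project names, and the helper functions `co_supplier_network` and `degree` are all hypothetical and invented for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (supplier, project) records, made up for illustration only.
CONTRACTS = [
    ("Supplier A", "Project 1"),
    ("Supplier B", "Project 1"),
    ("Supplier C", "Project 1"),
    ("Supplier A", "Project 2"),
    ("Supplier B", "Project 2"),
    ("Supplier D", "Project 3"),
]


def co_supplier_network(contracts):
    """Link two suppliers whenever they hold contracts under the same project.

    Returns a dict mapping a sorted (supplier, supplier) pair to the number
    of projects the pair shares.
    """
    by_project = defaultdict(set)
    for supplier, project in contracts:
        by_project[project].add(supplier)

    edges = defaultdict(int)
    for suppliers in by_project.values():
        for a, b in combinations(sorted(suppliers), 2):
            edges[(a, b)] += 1
    return dict(edges)


def degree(edges):
    """Count the distinct partners of each supplier that has at least one link."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(partners) for node, partners in neighbours.items()}


edges = co_supplier_network(CONTRACTS)
degrees = degree(edges)

for pair, shared in sorted(edges.items()):
    print(pair, "shared projects:", shared)
```

A supplier that only ever works alone (here the hypothetical "Supplier D") has no links and so does not appear in `degrees`; a real analysis would probably want to keep such isolated nodes visible.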

This preliminary analysis opened up new areas for research and generated demand for having more data sets - integrating open data with big data approaches. It also opened up a new perspective on how to use procurement data as a way to achieve better development results.

We are only at the beginning of this process, but if successful, efforts to use open data in this way could result in a public good for the whole development sector, enabling operations managers in NGOs, international development organisations as well as financial institutions to take more informed decisions. We would like to engage with experts at W3C to make progress towards this objective and improve the effectiveness of development efforts.

Researching the emerging impacts of open data in developing countries (ODDC) Tim Davies, Jose Alonso, World Wide Web Foundation [paper] [slides]

As open data initiatives spread across the globe, research is needed that can deepen our shared understanding of the potential and practice of open data. Those involved in new programs and initiatives in developing countries need to understand the full value and impact of open data in strikingly different social, economic, and cultural contexts. However, researching open data is a formidable challenge. The publication and use of open data raises many socio-technical issues, cutting across fields from budget transparency or urban governance, to innovation policy and natural resource management. Open data also connects across many levels of activity, from community-led standard setting, to the creation of data portals and APIs, to grassroots use of datasets. As this paper argues, we can only understand open data in general, and design appropriate and inclusive technical tools and platforms for open data on the web, if we have a detailed understanding of how it operates in specific situations. The Web Foundation ‘Exploring the Emerging Impacts of Open Data in Developing Countries’ (ODDC) research project (funded by Canada’s International Development Research Centre) is developing a research agenda and evidence base to inform the future development of policy, technology and practice for open data on the web.

Open Data NEXT: a strategy for social & economic value from Linked Open Data Hayo Schreijer, Paul Suijkerbuijk, Ministry of Internal Affairs/KOOP [paper] [slides]

In 2012 the Dutch open data team adopted two new strategies to make open data available that has actual social and economic value. The paper describes the strategies, focusing on our approach to making more open data available that may solve actual social issues around depopulation and disadvantaged neighborhoods. An important part of the approach is to get more value from open data by using Linked Data technology, which may also provide short-term gains for the governments involved when they open their data.

Open Data on the Web: 3 Principles For Maximum Participation, Bob Schloss, IBM (see paper listing for full author list) [paper] [slides]

In support of the wide availability of data, and its use in decision-support applications, IBM recommends that open data, associated technologies, and conventions must be designed with 3 principles in mind:

Incentives: Open data will include a mix of free data, motivated by the desire for transparency, and data that has been made available through incentives such as government regulations. Incentives (not limited to fee-for-data) are foundational to creating the amount of open data on the web that could transform commerce, research, learning, governing and the arts.
Trust and Security in Ensuring Data Quality: Trust and Security technologies must be involved so that consuming applications know that other consuming organizations consider the provided data to have quality that is "suitable for purpose".
Provenance and Data Ownership: Clear data provenance and ownership are vital so that accountability for correct-and-authorized, correct-and-unauthorized, or incorrect open data between suppliers and consumers of the open data is clear to both parties before the data is transferred.

Discussion with speakers plus Bill Roberts, Swirrl IT [paper]

Too many Linked Data publishing initiatives offer the end user a text box in which to enter SPARQL queries and a few words of introductory information – then the providers of the data wonder why the data is not used.

Linked Data is an ideal technology to support a wide range of data presentation approaches and services backed by data, but its value could be enhanced greatly by paying sufficient attention to the types of user interested in the data, what they would like to achieve with it and their preferred ways of interacting with the data.

This paper presents some thoughts on how this can be achieved.

10:40 Feature session Chair: Paul Davidson; Scribe: Mark Birbeck (15 minutes talk plus 15 mins discussion)

The Role of PDF and Open Data Jim King, Adobe [paper] [slides]

There is a widespread belief that once data has been rendered into a PDF format any hope to access or use that data for purposes other than the original presentation is lost. While there is an element of truth to that belief, it is not nearly as dire as most believe.

We will spell out the various ways that the use of PDF can contribute to the distribution and availability of Open Data. It is way better than you think!

11:10 Coffee

11:30 Panel: Tabular Data Formats and Packages Chair: Jeni Tennison; Scribe: DanBri; (2 minute introductory comments from each panelist, followed by discussion)

Rufus Pollock, Open Knowledge Foundation See Simple Data Format

Omar Benjelloun, Google DSPL

Stuart Williams, Epimorphics [paper]

If you are genuinely enthusiastic about the open publication of your data… you have to go much further than merely surfacing your data on the web and relying on people just 'finding' it, somehow. You need to think about engaging interest in the data and inspiring potential downstream consumers/reusers/value-adders. They need to hear about your data and they need to be able to explore it. You need to provide technical information and potentially a place where questions can be asked AND answered; you'll need to invest in documentation; you'll need to lower the barriers to use by providing developer accessible formats and web interfaces; and you may need to address the needs of your least able, but possibly dominant, reuser in the form of a reusable widget that can be embedded on any web page.

If you are successful, you will be rewarded with increasing usage stats that will reinforce that your data is important/useful to someone and help build confidence to contribute more to the open data publishing endeavor.

John Snelson, MarkLogic [paper]

The web of documents contains valuable data. However, even after enriching with RDFa or microformats, much of this information lies beyond the reach of the typical tools of the open data movement. This position paper explores avenues for accessing the data in such polystructured documents.

Tyng-Ruey Chuang, Academia Sinica [paper]

What makes data open and free for all to reuse? We take the position that for data to be free on the web, it shall be packaged and distributed like free software. To make data collections (i.e. datasets) easy to use and useful to many, we need to consider issues related to their usage both on and off the Web. These issues necessarily involve software support and tool development. Tools made for and lessons learned from free software development, hence, shall apply when data collections are to be made free.

We take the view that for a data collection to be open, it shall be freely downloaded, adapted, mixed with others, and rehosted for other services. Being available and accessible on the Web by itself is not sufficient. A data collection must be easily ported to other computer systems, either on or off the Web, for it to be called open. Starting from these considerations, we list below some of the main issues and offer our viewpoints.

12:30 Lightning Talks with a linked data theme Chair: Irina Bolychevsky; Scribe: Steven Pemberton; (5 mins per speaker)

Linked Data, Open Data and Big Data: Understanding the need for all three, Mark Birbeck, Sidewinder Labs [slides]

The ideas of the Semantic Web and Linked Data promise much. Data can be combined in new ways and tools can reason across the data to create exciting applications. But try to build a real application that makes use of these techniques and you'll quickly discover that data is not that widely available...hence the need for more Open Data, particularly from government organisations, but also from novel uses of crowd-sourcing.

But even when you have the data, you'll also soon discover that processing the information is not as straightforward as it seems; the Semantic Web tells us a great deal about how to organise our data in theory, but the practice is something else altogether. Perhaps the most promising collection of techniques to manage data at scale comes from the world of Big Data.

Mark Birbeck has been involved for a number of years in the Semantic Web in various ways. This submission will draw specifically on two projects, one for the NHS where the key problem that arose was the cultural changes that would be required to make data available (Open Data) and the second (LevelBusiness.com) where the key problem was managing and processing large quantities of information on a daily basis (Big Data).

Publishing Linked Data Requires More than Just Using a Tool, Raphaël Troncy, Serena Villata & François Scharffe, Eurecom, INRIA wimmics, LIRMM University of Montpellier [paper] [slides]

Open data raise problems of heterogeneity due to the various adopted data formats and metadata schema descriptions. These problems may be overcome by using Semantic Web technologies in order to move from raw data to semantic data interlinked in the Web of Data. However, lifting Open Data to Linked Open Data is far from being straightforward. In this paper, we describe the challenges we faced in developing the DataLift platform and the difficulties we encountered dealing with Open Data towards the publication of semantic interlinked data.

Linked Data at the Science Museum, Tristan Roddis, Cogapp [paper] [slides]

The Science Museum in London, along with commercial agency Cogapp, have recently embarked on a project to convert their information to linked data format. So far the focus has been on consolidating disparate data by converting them to linked data which at the moment is still private not public. However, the museum is keen to investigate the possibility of releasing the data and is seeking advice on this at the workshop.

Open Linked Education: a new Community Group, Madi Solomon, Pearson [paper] [slides]

New W3C member Pearson would like to chair the Open Linked Education Community Group. In this session, they will describe the evolving educational landscape and explain why open linked data is the natural state for education vocabularies.

13:00 Lunch Kindly provided by our hosts, Google

14:00 Data Interoperability Chair: Leigh Dodds; Scribe: Hadley; (10 minutes per speaker plus 20 mins discussion)

SPARQL / OData Interop, Kal Ahmed and Graham Moore, Networked Planet [paper] [slides]

Position paper describing an open-source .NET library for a proxy that provides access to a SPARQL endpoint using the OData protocol. This paper discusses the motivation for and technical implementation of this library and the questions it raises for ongoing open data standardisation efforts.

LOD Approach to Creating Health-Sensor datasets built on Big Data and Graph storage technology, Neil Benn, Roger Menday and Takahide Matsutsuka, Fujitsu [paper] [slides]

This position paper describes at a conceptual level FLE’s interest in Linked Open Health Data and briefly introduces some new research FLE is undertaking in the healthcare domain.

Mapping Corporate Networks and Public/Private Spending Using Open Data, Tony Hirst, Open University [paper] (unable to attend at last minute)

Affiliation: 1) Dept of Communication and Systems, The Open University; 2) OKF

User context: open corporate data and: 1) investigative data journalism; 2) academic research; 3) transparency.

Technology context: data representation; data linkage & reconciliation; data modeling for storytelling.

For some time, I have been exploring ways of using open company data from OpenCorporates as a focus for a range of technology experiments and learnings.

Discussion with speakers plus Albert Meroño-Peñuela or Christophe Guéret, Data Archiving and Networked Services [paper]

This paper discusses the use of Linked Open Data to increase the quality, machine-processability, and ease of format transformation of complex tabular datasets. We illustrate this application with the historical Dutch censuses: census data is open, but notoriously difficult to compare, aggregate and query in a uniform fashion. We describe an approach to achieve these goals, emphasizing open problems and trade-offs.

14:50 Lightning Talks with a geospatial theme Chair: Alex Coley; Scribe: Raphaël Troncy; (5 mins per speaker)

Open Data on the Web and how to publish it within the context of Primary health care, Mark Herringer, Konekta [paper] (unable to attend)

Mobile health care applications are being developed to improve care within disadvantaged communities in Africa and elsewhere in the world. Often there is a network of socio-economic factors in addition to health that the Mhealth user needs assistance with.

If a questionnaire were to be developed that would shed light on this socio-economic situation, what would it look like? Could it be included in the Mhealth application and randomised so as to protect the user's privacy? Could it be published as open data?

GeoKnow: Leveraging Geospatial Data in the Web of Data, Jon Jay le Grange, Ontos (see paper listing for full list of authors) [paper] [slides]

Producing and updating geospatial data is expensive and resource intensive. Hence, it becomes crucial to be able to integrate, repurpose and extract added value from geospatial data to support decision making and management of local, national and global resources. Spatial Data Infrastructures (SDIs) and the standardisation efforts from the Open Geospatial Consortium (OGC) serve this goal, enabling geospatial data sharing, integration and reuse among Geographic Information Systems (GIS). Geospatial data are now, more than ever, truly syntactically interoperable. However, they remain largely isolated in the GIS realm and thus absent from the Web of Data. Linked data technologies enabling semantic interoperability, interlinking, querying, reasoning, aggregation, fusion, and visualisation of geospatial data are only slowly emerging. The vision of GeoKnow is to leverage geospatial data as first-class citizens in the Web of Data, in proportion to their significance for the data economy.

Interoperability of (open) geospatial data – INSPIRE and beyond, Michael Lutz, Andrea Perego and Massimo Craglia, EC/JRC [paper] [slides]

For the last 10 years, INSPIRE (Infrastructure for spatial information in Europe) has been developing a legal framework and interoperability guidelines to facilitate pan-European and cross-border access to geospatial data from quite diverse domains.

One of the key success factors for INSPIRE has been the active participation of several hundred data practitioners and users from across the EU Member States. As public administrations across Europe are starting the implementation of INSPIRE, there is a growing interest in combining INSPIRE data with other information from the public and private sectors.

The emergence of this demand and the wider move towards Open Data for both scientific and public sector data raise a number of issues, concerning, e.g., the implications of “opening up” data for the organisations, in terms of governance, long-term commitments and costs and benefits, or how to coordinate a framework for the definition of persistent and resolvable identifiers independently of changes at the organisational and technological levels.

The topics above are obviously not the only ones but addressing these would make a significant step forward in opening up and making use of open data on the Web.

Discussion with speakers plus Philipp Ronnenberg, Royal College of Art [paper]

Maps are power. Those who draw them control the public’s access to the world at a fundamental level--for example, in the 1500s, maps of the New World were worth their weight in gold. These days, we rely on the Global Positioning System, developed by the Department of Defense during the Cold War.

The OpenPS.info navigation system is open, which means it is neither run nor controlled by companies. The goal is to gather interested people on the web platform OpenPS.info to develop the necessary software, hardware and testing processes.

The idea is to use seismic activity, produced by generators in power plants, turbines in pumping stations or other large machines running in factories. These generators, machines etc. are producing seismic waves, distributed over the ground.

15:20 Coffee

15:50 Panel: The Business of Open Data Chair: John Sheridan; Scribe: Yaso; (2 minute introductory comments from each panelist, followed by discussion)

Conor Riffle and Pedro Faria, CDP [paper]

CDP is the largest climate change reporting system in the world, with over 4,000 companies and 70 of the world's largest cities disclosing information every year. CDP has recently organized an Open Data pilot with some success. CDP has also published an XBRL taxonomy for climate change reporting.

Miguel García Zabala [paper]

The search for public financing, and the management and tracking of scientific results derived from Research & Development and Innovation projects, constitute a business activity on which a remarkable number of consultancy firms, research centres and public organisations spend an important amount of resources. This expenditure arises because the business processes involve working with extensive and dispersed knowledge of public financing and legislation, and require advanced web search abilities.

Technologies around open data (semantic web, standardisation of formats and final users’ feedback) are an interesting starting point for the optimisation of processes and the cost-effective management of data resources related to our business activity.

In this paper, we present the relation between open data, from a users’ point of view, and the business processes related to the search for public financing and the elaboration of studies (of many kinds) for the development of projects on research and innovation.

Lotte Belice Baltussen, Netherlands Institute for Sound and Vision [paper] [slides]

Open Culture Data started as a grassroots movement at the end of 2011, with the aim to open up data in the cultural sector and stimulate (creative) reuse. In this context, we organised a hackathon, which resulted in the creation of 13 Open Culture Data apps. After this successful first half year, a solid network of cultural heritage professionals, copyright and open data experts and developers was formed. In April 2012, an Open Culture Data masterclass started in which 17 institutions received practical, technical and legal advice on how to open their data for re-use. Furthermore, we organised an app competition and three hackathons, in which developers were stimulated to re-use Open Cultural Datasets in new and innovative ways. These activities resulted in 27 more apps and 34 open datasets. In this paper we share lessons learned that will inform heritage institutions with real-life quantitative and qualitative experiences, best practices and guidelines from their peers on opening up data and the ways in which this data is reused. Since the open culture data field is still relatively young, this is highly relevant information needed to stimulate others to join the open data movement. To this end, we are already taking steps to cross the borders and let Europe know about the initiative, on both a practical and a policy level.

Michele Osella, Istituto Superiore Mario Boella [paper]

In recent years the Open Data philosophy has gained considerable momentum. In the public realm the free release of PSI datasets, besides enabling novel and promising forms of governmental accountability, paves the way to third-party developed products and services. Nevertheless, PSI re-use performed by private sector entrepreneurs is struggling to take off due to the presence of numerous inherent roadblocks which are coupled with a certain vagueness surrounding the rationale underlying business endeavors. Taking stock of this evidence, the paper aspires to shed light on the mechanisms allowing profit-oriented value creation based on public datasets. Delving into the intricacy of PSI re-use, the paper portrays eight archetypal business models currently employed by enterprises present in the world-wide PSI-centric ecosystem.

Bart van Leeuwen, Netage

The current focus on open data, especially on the government side, is to open up everything we have, and then create app contests to promote the data. I have 2 problems with apps in general:

Apps are the new data silos, so from locked in the government, we move it to locked in the app.
Government data will primarily benefit the government; it opens the potential to make the government more efficient. The apps market based on open data is a hype (IMHO).

16:30 The Exhibitionists Chair: Julian Tait; Scribe: Deirdre; (10 minutes per speaker plus 20 mins discussion - make it lively, it’s been a long day)

Opening up the BBC's data to the Web, Olivier Thereaux, Sofia Angeletou, Jeremy Tarling and Michael Smethurst, BBC [paper] [slides]

This paper summarises the history of how the culture of open data developed at the BBC, and looks at three current projects and the questions and challenges they raise: the development of our linked data platform, a collaboration to model and describe news stories, and an effort to open data about our large radio and television programme archives.

Democratizing Open Data, Alvaro Graves, Rensselaer Polytechnic Institute [paper] [slides]

Progress has been made in terms of the availability of Open Data initiatives in dozens of universities, governments and organizations. For example, Open Government Data (OGD) has been adopted mainly by developers and data analysts, while the majority of people cannot make use of it, except by consuming it via an application or chart. I propose that in order to fully achieve all the benefits Open Data promises (e.g., more transparency, more participation and better informed and more empowered citizens), it is necessary to “democratize” Open Data by providing stakeholders better tools to not only consume, but to create and manipulate data. This idea is in line with what happened with Web 2.0, where better tools (e.g., video and blogging platforms) helped people to create and share content. I will also present some of the projects I have been working on to create such types of tools for Open Data in general as well as Linked Data. Finally, I will describe the experience of regular users with some of these tools and the lessons learned from such sessions.

Opening Open Data, Andreas Koller, Royal College of Art [paper] [slides]

My contribution to the workshop could be to add the perspective of a graphic and information designer, data visualisation expert, maker and inventor. This paper briefly outlines my viewpoint on Open Data, mainly arguing that it is important to put it in a human context. I think that designers are a significant part of the Open Data movement. I outline which tools should be provided to them in order to make meaningful and relevant Open Data applications.

Large Scale Data & Speculative Maps, Benedikt Groß, Royal College of Art [paper] [slides]

My areas of interest are in large scale data sets, systems and tools. Most of the time my material is software, or to put it differently, code. During my time at the Royal College of Art I have developed a strong interest in (speculative) maps, landscapes, urban data and other large scale geo data. These interests extend my computational, UX, IX and scenario interests.

Discussion

17:30 End of Day 1 - Head for the ODI Reception with Tom Scott (event begins at 18:00 and is strictly by invitation only).

Wednesday 24 April

08:30 Doors open, coffee

09:00 Perspectives & Experience Chair: Hadley Beeman; Scribe: Bart v L; (10 minutes per speaker plus 20 mins discussion)

DNAdigest - a non-profit to promote and enable open-access sharing of genomics data, Fiona Nielsen, DNAdigest [paper] [slides]

The genomics revolution is upon us: the techniques for researching and characterising genomic diseases are available to both researchers (next generation DNA sequencing) and the general public (in the form of personal testing like 23andme), so we should soon be able to diagnose any genetic disease by sequencing a patient's DNA. This is the glorified goal of research into all genetic diseases, including hereditary diseases and cancer.

However, while data output is flooding research centres around the world, and genomics results are published in highly esteemed journals, the sharing of the data that enables this research is embarrassingly limited. The data ownership, the legal consent of the patients involved, the privacy of the patients involved and the mere volume and complexity of these datasets are major hindrances to the sharing of personal genetics data. As a result, each research unit is currently maintaining its own 'silo' of potentially valuable sequence and patient data. Needless to say, there may be several big genetic discoveries "out there" already sequenced, but not discovered, because no one has had the means to bring together the matching pieces of the puzzle.

The technological means to solve this problem already exist and are available, but no solution has been proposed until now, because it demands a non-profit enterprise to sufficiently engage all stakeholders including researchers and patients and address their concerns while maintaining the goal: the advancement of genomics research.

I will present the underlying idea and strategic plan for how DNAdigest will enable researchers to publish their genomics data in an online open-access fashion without compromising patient consent or data privacy.

Sharing clean energy knowledge via (Linked) Open Data and controlled vocabularies, Florian Bauer, Reegle [paper] [slides]

The world actually already has virtually all of the data it needs to manage the energy and climate transformation. However, this information is not currently accessible to those who need it most. Policy-makers lack the best data to support decisions, and investors and project developers cannot easily identify and exploit the best opportunities. There are two reasons for this:

Information is scattered in silos – largely in “closed” databases at public, private and academic institutions around the world. Most institutions still share the traditional view that data is an asset. Thus, they also believe that holding data privately and restricting access to it actually creates value for the owner.
Every human being and organisation naturally classifies and categorises things in a way that fits their own world view. This means that the information is not indexed or tagged consistently, and that there is no universal library-style catalogue or directory of the content that is (and is not) available.

Opening up and linking data (Open Data), and then categorising it automatically using consistent terms based on common thesauri ("controlled vocabularies") can help dissolve both of these barriers.

Government data on everything from energy consumption to traffic, population and infrastructure is already in the public domain. Making it all available for public re-use is a first step that can unleash whole new waves of development. In parallel, using consistent terminology to classify all online resources - whether public or private - will improve collaboration between companies, across sectors and regions.

The presentation will:

explain why opening up data and increasing consistency in describing knowledge is essential for the energy transformation;
give an overview of existing examples of beneficial (linked) open data cooperations;
show how terms based on common thesauri can help to increase consistency in describing and tagging documents;
demonstrate how free and easy-to-integrate services (such as the reegle tagging API) can help to connect knowledge brokers.

schema.org, Dan Brickley, R V Guha, Steve Macbeth, Peter Mika, Alexander Shubin, Google, Microsoft, Yahoo!, Yandex [paper] [slides]

Schema.org provides a collection of schemas typically used as annotations on HTML tags. Webmasters can use the types and properties defined at schema.org to mark up their pages in ways recognized by major search providers. This paper offers a few informal observations about how schema.org fits into the wider ‘open data’ Web community.
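As a rough illustration of what such an annotation can look like, the sketch below builds a schema.org Dataset description and wraps it for embedding in an HTML page. It uses JSON-LD, which is only one of the syntaxes schema.org terms can be written in; the paper itself does not prescribe this approach. The function names and the example values are hypothetical.

```python
import json


def dataset_jsonld(name: str, description: str, url: str, license_url: str) -> str:
    """Serialise a minimal schema.org Dataset description as JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",  # a type defined at schema.org
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
    }
    return json.dumps(doc, indent=2)


def as_script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in a script element; escape '</' so it cannot close the tag early."""
    safe = jsonld.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{safe}\n</script>'


if __name__ == "__main__":
    # Hypothetical dataset, for illustration only.
    print(as_script_tag(dataset_jsonld(
        "Example open dataset",
        "An illustrative dataset description.",
        "https://example.org/data/example",
        "https://creativecommons.org/publicdomain/zero/1.0/",
    )))
```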

Licensing Library and Authority Data Under CC0: The DNB Experience, Lars Svensson, Deutsche Nationalbibliothek [paper] [slides]

Effective from July 1st, 2012, the German National Library (Deutsche Nationalbibliothek, DNB) publishes most of its data under an open license. In doing so, the DNB followed the examples of Europeana and many national libraries, e.g. the Spanish National Library and the British Library, to mention only a few of the early movers. The road we took to arrive there was not free from issues and controversial discussions. This paper starts with a description of the current business model, outlines the road we took to arrive there and finally presents the plans for the future.

Open Government Data Projects in Japan, Shuichi Tashiro, IPA [paper] [slides]

Japan started many projects related to open government data after the Cabinet Office issued its "Open Government Data Strategy" policy statement.

I would like to share our experiences and plans, and to learn about leading-edge open data experience from around the world at the workshop.

10:15 Product Data Chair: Phil Archer; Scribe: Leigh; (5-7 minutes per speaker plus discussion)

John Walker/Tim Nelissen, NXP Semiconductors [paper] [slides]

In the electronics industry, the ability to get accurate, timely product data in front of the customer is a very important factor in the overall business process. Furthermore, enabling the customer to easily compare and select the right product for their application from the choice of literally hundreds, or even thousands, of candidates can reduce the overall time and costs involved in the purchasing process. Opening up access to the data is a key component, whether this is to free the data from existing silos for use within the organization, or to make the data available to third parties. Also, to facilitate the aggregation of data from multiple parties, it is very important to agree on a common schema that can be used to describe the products and enable easy mapping between schemata. In this paper we describe the approach we have used to manage and publish the data and discuss the questions that arise from a publisher's perspective.

Andy Hedges & Richard McKeating, Tesco [slides]

Open data and why it matters to the customer: retailers, like Tesco, naturally have a large part to play in the product data ecosystems, however they are only a part of the ecosystem. By making the data open and accessible between parties we can collaborate with suppliers, customers and indeed other retailers to provide accurate, more timely and useful information to our customers.

Mark Harrison, GS1 [slides]

Philippe Plagnol, Product Open Data [slides]

It is now possible to build a public product repository based on the GTIN code (the unique worldwide identifier printed under barcodes). Open data for products requires that manufacturers publish a numeric catalog containing at least the public information printed on the packaging. This repository will be a revolution for consumers by allowing them to use GTIN codes as a new communication channel enabling them to send and receive information about a product. It will also enable studies on the worldwide product population and help authorities to make decisions. Maps available on the Web visualize the geographic dimension, and timelines can be derived from social networks; the Product Open Data repository will be the basis for representing the product dimension.
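A repository keyed on GTINs would probably want to reject mistyped codes before storing them. The sketch below checks a GTIN against the standard GS1 check digit rule (weights of 3 and 1 alternating from the rightmost data digit). It is an illustration only, not part of the Product Open Data proposal, and the function names are hypothetical.

```python
def gtin_check_digit(body: str) -> int:
    """Compute the GS1 check digit for a GTIN body (all digits except the last)."""
    if not (body.isascii() and body.isdigit()):
        raise ValueError("GTIN body must contain only the digits 0-9")
    total = 0
    # Walk from the rightmost body digit; weights alternate 3, 1, 3, 1, ...
    for position, char in enumerate(reversed(body)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def is_valid_gtin(code: str) -> bool:
    """Return True if `code` is a GTIN-8, -12, -13 or -14 with a correct check digit."""
    if len(code) not in (8, 12, 13, 14):
        return False
    if not (code.isascii() and code.isdigit()):
        return False
    return gtin_check_digit(code[:-1]) == int(code[-1])


if __name__ == "__main__":
    print(is_valid_gtin("4006381333931"))  # a 13-digit code whose check digit matches
    print(is_valid_gtin("4006381333932"))  # same body, wrong check digit
```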

11:05 Coffee

11:25 Dumb Strings That Mean So Much Chair: Hideaki Takeda; Scribe: Naomi; (10 minutes per speaker plus 10 mins discussion)

Draft URI Strategy for the NL Public Sector, Hans Overbeek and Thijs Brentjens, Ministry of the Interior of the Netherlands, Geonovum [paper] [slides]

This document focuses on the technical considerations made during the process of specifying a national strategy for minting URIs to identify information of the Dutch government on the internet. A group of stakeholders and experts devised a strategy for the Netherlands. Involved parties were, among others: Geonovum, Knowledge Center for Official Government Publications (KOOP) and the Tax Service. They formulated a strategy of which this document covers the technical parts, in order to get feedback from an expert community at the ODW 2013.

Shared understanding = shared foreign keys (and more), Richard Light, Richard Light Consultancy [paper] [slides]

This paper summarises my experience of Linked Data as background for the forthcoming workshop on Open Data.

Aggregating media fragments into collaborative mashups: standards and a prototype, Philippe Duchesne, High Latitudes [paper] [slides]

This paper presents the ongoing CrossLinks project aimed at building a data mashup standard and platform. The presented work is threefold. It first reviews existing syntaxes for URI fragments and builds upon them to formalize a uniform way to reference fragments of any media type, thereby allowing hyperlinking of sub-elements. These data fragments are then stitched together by devising a model and exchangeable format for aggregated data views. Lastly, a platform is developed to ease authoring, storage, exchange and most importantly visualization of such data mosaics.
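To make the idea of referencing a part of a resource concrete, the sketch below parses two of the dimensions defined by the W3C Media Fragments URI syntax: temporal (`t=start,end`) and spatial (`xywh=x,y,w,h`). It is not the CrossLinks syntax or implementation, which the abstract does not detail. It assumes times are given in plain seconds, and the function name and example URIs are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def parse_media_fragment(uri: str) -> dict:
    """Extract the temporal (t) and spatial (xywh) dimensions from a URI fragment."""
    dimensions = {}
    fragment = urlsplit(uri).fragment
    for name, value in parse_qsl(fragment, keep_blank_values=True):
        if name == "t":
            # Optional "npt:" prefix; this sketch handles plain seconds only.
            if value.startswith("npt:"):
                value = value[len("npt:"):]
            start, _, end = value.partition(",")
            dimensions["t"] = (
                float(start) if start else 0.0,  # a missing start means the beginning
                float(end) if end else None,     # a missing end means "until the end"
            )
        elif name == "xywh":
            # Optional unit prefix ("pixel:" or "percent:"); pixel is the default.
            unit, sep, numbers = value.partition(":")
            if not sep:
                unit, numbers = "pixel", value
            x, y, w, h = (int(n) for n in numbers.split(","))
            dimensions["xywh"] = (unit, x, y, w, h)
    return dimensions


if __name__ == "__main__":
    print(parse_media_fragment("http://example.org/video.mp4#t=10,20"))
    print(parse_media_fragment("http://example.org/image.png#xywh=percent:25,25,50,50"))
```

Other media types would need their own fragment schemes; the point of the sketch is only that a fragment can be turned into a structured selector that a mashup model can store and exchange.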

Digital Archiving 3.0, Christophe Guéret, Data Archiving and Networked Services [paper] [slides]

Whereas physical archives used to be a place to catalog and keep content safe, their digital counterparts do not have to worry about the risks of altering the items by manipulating them. Digital archives can afford to make the content of their safe accessible to the outside world and, as such, are turning into data repositories. In addition to classical tasks of identification and preservation of content, digital archives have to deal with new specific challenges such as the obsolescence of data formats, the control of the dissemination of the digital copies and the access of the content from applications. In this position paper we argue that Semantic Web technologies are a good way to approach these problems and constitute (part of) the future of digital archives.

Discussion

12:15 Discovery Panel Chair: Dan Brickley; Scribe: Uldis; (2 minute introductory comments from each panelist, followed by discussion)

Bill Roberts, Swirrl IT [paper]

Typically data publishers provide some form of metadata to describe their datasets. Currently there is a wide variety of forms and formats for such metadata, with varying degrees of machine readability.

To support easier discovery and use of open data as the number of datasets and number of open data catalogues explodes, interoperable machine readable metadata becomes steadily more important.

This paper discusses the main questions that need to be answered. At ODW 2013, the latest progress on a practical solution to this issue for the UK government Department for Communities and Local Government could be presented.

Richard Wallis, OCLC [paper]

Beyond the obvious issues of open licensing of data, many of the significant barriers to the constructive use/reuse of data are technical in nature. They lie both in understanding the obscure, narrowly focused vocabularies used to describe resources and concepts, and in access methods and formats.

Schema.org provides a generic vocabulary to allow the mixing and publishing of data across domains.

Chris Metcalf, Socrata [paper]

Some of the biggest challenges the field of open data faces as it matures are those of discoverability, interoperability, and federation. In this talk, Chris Metcalf, Director of Product Development for leading cloud-based open data platform provider Socrata, will describe some proposed methods by which we as open data practitioners and implementers can address them.

Madi Solomon and Marlowe Johnson, Pearson [paper] [slides]

This presentation offers up a recent Pearson proof of concept that revealed the promise of Linked Data in an Asset Enrichment Process as an approach to issues such as DAM, Enterprise Taxonomy Management, Enterprise Search, and educational standards correlations.

Steven Pemberton, CWI [paper]

Although Open Data is often about Big Data, there is a lot of small data on the web that, if made properly available, can be used to improve the user's experience of the web. Especially with some small level of integration in the browser, the user's life on the web can be made much easier. And there is even commercial value for the browser manufacturers!

Pascal Romain or Elie Sloïm, Conseil général de la Gironde/Temesis [slides]

The first step has been taken: open data is now on the web. How can this movement be consolidated and its quality improved in order to achieve linked open data useful to all? What toolbox do data producers need to guide their publication efforts? How can data quality be addressed so as to meet the needs expressed by developers, citizens and public or private producers?

Last year, the speakers published a checklist of good practices for open data quality with 72 criteria. This checklist is published under a CC-BY-SA license. It is designed to be used by open data producers in order to improve the process of data publication on the Web. It directly addresses three targets: the producers themselves, who can better control their project and the quality level they want to reach; the developers and their need for standards; and the final users and their need for usability.

The speakers will explain the methodology they used to conceive this list and how it's been used to benchmark various open data portals in France. They will present the results of this study with a focus on the global areas where those portals need to concentrate their quality efforts.

Designed to guide open data publishers, this checklist can also help them to find a way to take the necessary step to linked data.

Linked data is designed to help machines understand the semantic meaning held by data. But the goal of the linked data approach is to create data usable not only by machines but also by human users. It is therefore probably possible and useful to design a more complete checklist with special criteria for linked data approaches. Those quality criteria now need to be identified. They can help data producers to comprehend the added value of the linked data approach. Furthermore, both automatic and manual testing can also help them to improve their data production processes.

12:50 Lunch Kindly provided by our hosts, Google

13:50 The Crowd’s Wisdom Chair: Uldis Bojārs; Scribe: Agis; (10 minutes per speaker plus 20 mins discussion)

Utilising Linked Social Media Data for Tracking Public Policy and Services, Deirdre Lee, Adegboyega Ojo and Mohammad Waqar, DERI/NUIG [paper] [slides]

What are the concerns of citizens in relation to the new proposal for a ring-road? What are their views on means-testing for medical cards? Are self-employed people finding the online tax system helpful? Traditionally, such questions would be answered through commissioned surveys, targeted consultations or journalistic research. Today, citizens are expressing views on topical issues such as public policy and services voluntarily on social media sites. Facebook, Twitter, LinkedIn, etc. contain views and arguments on a wide range of issues, which can be useful for informing policy-makers on public opinion. However, public policy isn’t the only topic discussed by citizens online; how can policy-makers distinguish relevant data from opinions on who should win X-Factor or what is the best sandwich filling? In this paper, we propose a solution for systematically tracking particular topics of interest across a range of social media sites. We also discuss how this solution may be employed for tracking topics related to specific public policy or service of interest.

Bottom up Activities for linked open data, open government in Japan, Takumi Shimizu, Keio University/Open Knowledge Foundation Japan (see paper listing for full list of authors) [slides]

There are two kinds of open data and open government activity in Japan. One is top down and led by national government. The other is bottom up: community-based activities. This presentation introduces cases of bottom-up open government activities. I hope this presentation will be a good trigger for developing best practice for implementing community-based open data/open government activities that are closely harmonized with worldwide activities.

LODAC is one of the major LOD resource providers for art-related content. LODAC is an acronym for Linked Open Data for Academia. The project was launched as a research project to generate LOD resources from academic and art content. The project has worked with an extra-governmental body of Yokohama city which manages museums and art centers in the Yokohama area. Yokohama Art Spot is a web service that provides information about museum events and the works displayed there. The LODAC team advised on converting proprietary-format data to RDF and developed an interface for use through the web. The Yokohama Art Spot project became a catalyst for open data and open government activities in Yokohama.

In this presentation, we would like to introduce some other open data activities of local communities and cooperation among communities. Some communities worked together to contribute to the success of International Open Data Day in Japan. Eight cities conducted various kinds of activities such as hackathons and ideathons. International Open Data Day played a role in coordinating local governments, national government, engineers and citizens.

“Storytelling” in the economic LOD: the case of publicspending.gr, Michalis Vafopoulos, NTUA [paper] [slides]

The scope of the publicspending.gr project is to generate, curate, interlink and publish economic data in LOD formats that are easily accessible by the scientific community and to provide a user-friendly and objective layer of information that will enable citizens, journalists, business people and politicians to re-discover their own “stories” from data. This position paper presents how our research initiative enables citizens, journalists, business people and politicians to re-discover their own “stories” from economic data.

Discussion with speakers plus Christian Nolle, Good Caesar [paper]

For the upcoming Malaysian General Election we have developed a crowd sourcing site, which allows Sarawakians to report incidents of bribery and corruption on an interactive map.

14:40 Lightning Talks - eGovernment and multilingualism Chair: Yaso Córdova; Scribe: Andrea; (5 mins per speaker)

A Brief Report on the Research Data Alliance Plenary in March 2013, Bob Schloss (for Susan Malaika and Gerald Lane), IBM [paper] [flyer] [slides]

The Research Data Alliance (RDA) is a newly formed organization to accelerate and facilitate research data sharing and exchange. The RDA’s first meeting will take place in Gothenburg, Sweden on 18-20 March 2013. This document provides a small report on the RDA's plenary sessions.

Open Data in Data Journalists' Workflow, Uldis Bojārs and Edgars Celms, University of Latvia [paper] [slides]

Data-driven journalism is a use case for open data on the Web and we need tools and technologies that support it. Two important processes of this use case are data discovery and publishing. In this position paper we describe an example data-journalism workflow and emphasize the importance of publishing data (both the final results and intermediate data) for further reuse.

Empowering the E-government data life cycle, Edoardo Colombo, Politecnico di Milano (see paper listing for full list of authors) [paper] [slides]

People shop online, compare online, book hotels and flights online. This happens because the data needed to complete these tasks are easily accessible and a lot of Web sites allow users to query the Web to obtain enough information to be confident. The aim of this work is to propose a framework tailored to extend the internet revolution to public administration. This work is the first step towards an infrastructure allowing people to find the information they need very easily. This paper exploits the Search Computing paradigm, a new way of composing data. While state-of-the-art search systems answer generic or domain-specific queries, Search Computing enables answering questions via a constellation of cooperating data sources, called search services, which are correlated by means of join operations. Search Computing aims at responding to queries over multiple semantic fields of interest; thus, Search Computing fills the gap between generalized search systems, which are unable to find information spanning multiple topics, and domain-specific search systems, which cannot go beyond their domain limits.

Lessons learned (and questions raised) from an interdisciplinary Machine Translation approach, Timm Heuss, University of Plymouth [paper] [slides]

Linked Open Data (LOD) has ultimate benefits in various fields of computer science and especially the large area of Natural Language Processing (NLP) might be a very promising use case for it, as it widely relies on formalized knowledge. Previously, the author has published a fast-forward combinatorial approach he called “Semantic Web based Machine Translation” (SWMT), which tried to solve a common problem in the NLP-subfield of Machine Translation (MT) with world knowledge that is, in the form of LOD, inherent in the Web of Data. This paper first briefly introduces this practical idea and then summarizes the lessons learned and the questions raised through this approach and prototype, regarding the Semantic Web tool stack and design principles. Thereby, the author aims at fostering further discussions with the international LOD community.

Interoperability Challenges for Linguistic Linked Data, David Lewis, Trinity College Dublin [paper] [slides]

This position paper reviews the growing need for seamless interoperability and interlinking between multilingual and multimedia web content and linked data that captures linguistic knowledge. It outlines some of the state of the art in this area and highlights some interoperability issues that may be fruitfully addressed in the short to medium term.

15:20 Bar Camp Pitches

Pitch your discussion idea in 60 seconds or less! Ideas already notified:

Needs for an Open Data Ecosystem, Erik Mannens, iMinds [paper]

The Open Data movement is spreading widely. As Open Data are published nowadays, it is the data publisher’s responsibility to maintain them and keep them up-to-date. Data consumers have not been able to provide feedback or contribute to the released data. Therefore, Open Data are still published as a one-way channel between the data publisher and the data consumer. In order to overcome this, we introduce the idea of a Read–Write platform based on Linked Data technologies as an enabler for Open Data Ecosystems. Linked Data can address the current lack of a homogeneous feedback loop and handle, at the same time, the emerging co-ownership, versioning and provenance issues. Therefore, they can lead to full Data Cycles in the Open Data Ecosystem, thus maximizing the uppermost goal of public participation. The proposed solution comprises a uniform Web Interface based on HTTP and a powerful Data Management system based on Linked Data technologies.
Vector Space, Wolfgang Orthuber, Nummel/University Clinic of Schleswig Holstein [paper]
All real objects have numeric features. It can be demonstrated online how to implement similarity search over them on the web. This search consists basically of 2 steps:
1. Selection of an appropriate numeric feature space.
2. Numeric similarity search within a selected part of the space.
This approach has much higher resolution and range than text search, because, if there were a web standard, feature spaces could be defined for all topics of interest by all domain name owners.
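The two steps above can be sketched as follows. This is a minimal illustration, not the author's system: it assumes Euclidean distance as the similarity measure, and the feature space name, its dimensions and its entries are all hypothetical.

```python
import math

# Hypothetical registry: each feature space is named by its owner's domain and
# maps object identifiers to numeric feature vectors.
FEATURE_SPACES = {
    "example.org/screws": {  # assumed dimensions: (length_mm, diameter_mm)
        "screw-a": (20.0, 4.0),
        "screw-b": (25.0, 4.0),
        "screw-c": (60.0, 8.0),
    },
}


def similarity_search(space_name: str, query: tuple, k: int = 2) -> list:
    """Return the identifiers of the k objects closest to `query`."""
    # Step 1: select the numeric feature space.
    space = FEATURE_SPACES[space_name]
    # Step 2: rank the objects in that space by Euclidean distance to the query.
    ranked = sorted(space, key=lambda name: math.dist(query, space[name]))
    return ranked[:k]


if __name__ == "__main__":
    print(similarity_search("example.org/screws", (22.0, 4.0)))
```

A linear scan like this is only workable for small spaces; a real service would presumably use a spatial index, and would need the features to be scaled comparably before distances mean much.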

So a web standard would be very useful and my question is: Can we start a working group for introduction of a web standard for worldwide valid numeric feature spaces, so that these spaces can be defined by all domain name owners?
W3C Government Linked Data Working Group Past and Future, Bernadette Hyland, 3 Round Stones, Hadley Beeman, LinkedGov
Plans for data.ac.uk & autodiscovery+trust, Christopher Gutteridge, University of Southampton

Open Organization Profile Document: A method for auto-discovering key open datasets about an organization.
Linked CSV, Jeni Tennison
Git for Open Data: can open source development tools help us publish open data? James Smith, ODI
URI strategy and linking to legislation, Hans Overbeek
What is an open data business? Bart van Leeuwen
How should we attribute open datasets? Leigh Dodds

15:30 Bar Camp

Take your coffee with you

16:30 Bar Camp reports & wrap up Chair: Phil Archer; Scribe: Someone from each group;

Hash Tag

The hashtag for the event is #odw13. Tweets with this hash tag may be quoted in the report of this event.

IRC Channel

Make comments, ask questions, follow along, help to record the proceedings by joining the chat room.

Short version: use the Web interface and join channel #odw

Longer version (port numbers etc.): see the W3C IRC page.

Campus logistics

The workshop is on the ground floor, the café is in the basement. We have the use of the 3rd floor as well on Wednesday afternoon.

There is strong wi-fi throughout the building.

We RFC 2119 MUST be out of the building at 17:00 on Wednesday as another event is booked in.

Timing will be critical - as you can see, the agenda is very full. So please arrive on time and, if you're speaking, please finish on time.

Host

W3C gratefully acknowledges Google for hosting this workshop.

Workshop Sponsors

Adobe

Microsoft

Important Dates

Today

Expression of interest — please send a short e-mail to Phil Archer ASAP.

3 March 2013:
Deadline for Position Papers
(EasyChair submission)

26 March 2013:
Acceptance notification and registration instructions sent. Program and position papers posted on the workshop website.

22nd April 2013, 19:00
Open Data Meetup – London

23rd April 2013, 09:30
Workshop begins

18:00 - 20:00 ODI Networking Evening Sold Out!

24th April 17:00
Workshop ends