The workshop was run as part of the Crossover Project which aims to build a stronger community around the use of ICT for governance and policy modeling, and to produce an update to the roadmap created in the preceding Crossroad project.
The report below roughly follows the order of the agenda and highlights the topics discussed around the main presentations. In addition to these there were several lightning talks and other interventions that are linked from the agenda although not highlighted here (for example, Gwyneth Sutherlin's talk on the Iranian blogosphere and Anneke Zuiderwijk's work on Impediments, challenges and recommendations for using open government data). You can get a rawer feel for the event from the minutes (day 1, day 2) and see what caught everyone's attention most from the archived Tweets.
The event began with the first of two sessions looking at some applications that make use of Open Government Data (OGD). These included DERI's Galway Volvo Ocean Race app for Android and iPhone. To create the app, DERI had to convert various data sets into linked data and then enrich that data through crowdsourcing. The result was a highly usable application that acts as an excellent showcase for what is possible when linked data ideas are applied to real world situations.
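The presentations did not include code, but the "convert to linked data" step can be illustrated with a minimal, dependency-free sketch. Everything here (the namespace, the `Boat` class and the sample row) is invented for the example; it is not DERI's actual data model:

```python
# A minimal, library-free sketch of "triplification": turning one row
# of tabular data into RDF triples and printing them as Turtle.
# The namespace and vocabulary are invented for illustration only.
EX = "http://example.org/race/"

def triplify(row: dict) -> list:
    """Turn one tabular row into (subject, predicate, object) triples."""
    subject = f"<{EX}{row['id']}>"
    # Every boat gets a type statement; each remaining cell becomes
    # a statement about the boat's URI.
    triples = [(subject, "a", f"<{EX}Boat>")]
    for key, value in row.items():
        if key != "id":
            triples.append((subject, f"<{EX}{key}>", f'"{value}"'))
    return triples

row = {"id": "boat42", "name": "Example Boat", "country": "IE"}
for s, p, o in triplify(row):
    print(f"{s} {p} {o} .")
```

In practice a project like DERI's would use an RDF toolkit and well-known vocabularies rather than hand-built strings, but the essential move is the same: each cell becomes a statement about a URI, after which the data can be linked to, and enriched by, other data sets.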
The approach taken by DERI takes a significant level of commitment and expertise. Many developers will prefer near-ready solutions that let them create new applications and services with minimum effort. The Citadel on the Move project aims to provide the infrastructure to make that possible and, moreover, to implement the Malmö Ministerial Declaration. As well as creating templates for developers of mobile applications, specifically those centred around travel and tourism, Citadel on the Move will use Living Labs as a means to engage end users.
As well as their work on the Galway Volvo Ocean Race, DERI presented their research into existing applications that use OGD. After looking at more than 350 apps, they were able to identify 13 common themes for apps (e.g. health & safety, entertainment, sport and transport). The research also shows that the majority of apps:
In the ensuing conversation, Vagner Diniz highlighted the importance of the full value chain from the data through to the end user. Publishing data in open formats, especially linked triples, requires expert help and such investment needs to be demand-led. What actually happens is that a lot of OGD is released without any demand from users or indeed care within governments as to whether it will be used or not. There was a contrast between the use of OGD in Manchester, where it is often used by the public sector itself, and Brazil, where no such 'internal' use was detected.
The discussion at the end of the first session set the tone for much of what followed: open government data is only one aspect of the story. The full story includes demand from users and interpretation of the data so that the information encoded in the data can be understood, particularly by people who have no interest in data per se and for whom the acronym OGD is meaningless.
Cultural heritage data is generally held in the public sector although it is perhaps slightly removed from what one thinks of as typical government data. By virtue of the subject matter, the data published by GLAMs is often visually attractive and inherently interesting in a way that public spending data or the location of bus stops may not be. The Open Culture Data project (Open Cultuur Data) presented the results of a hackathon that took place last November in the Netherlands. The winning entry made use of a video dataset and smartphone capabilities to match a person's location with video taken in a given area. Novel applications like this are ideal vehicles for showing owners the kind of innovative uses to which their data can be put.
One example of how Linked Open Data is being used to very good effect to inform discussions held by policy makers and others is the clean energy information portal, Reegle. Data is taken from many different sources, triplified where necessary, and then combined and presented to more than 220,000 users per month through a well established information gateway. Reegle provides high quality information on renewable energy efficiency and climate compatible development around the world as easily navigable graphs and tables with a lot of additional information on hand too. Importantly this is an example of a tool that interprets raw data to provide useful information and context for end users.
Measuring the impact of tools such as Reegle on policy-making is not easy. However, after the event, Florian Bauer shared part of a survey conducted amongst 33 of the projects that are funded by his organization (REEEP). Crossover/W3C is grateful for permission to publish the following excerpt ahead of the main report:
Robust decision-making regarding targeted policies in clean energy development depends on a variety of information and data. Such data has to be analyzed and baselines have to be established for benchmarking.
Open (Government) Data can support policy-making and implementation in many areas of sustainable development – some examples are:
- An example is the biomass briquetting market, where data on the different biomass resources, their geographic distribution, quality and energy use is needed to determine the right policies. Yet there are still significant difficulties in accessing the needed data, which comes from different sources.
- In the area of electricity transmission and distribution there is a great need for detailed and reliable technical data, yet much of it is being kept secret by utilities and authorities.
- Another field is the establishment of baselines to identify the most efficient systems, and again the relevant data is often neither available nor accessible.
- Renewable energy potentials, like solar irradiation, are another crucial consideration for policy-makers and project implementers.
Bringing end users, open data practitioners and public bodies together is the aim of the Open Data Cities project in Manchester. Although end users - i.e. people - regularly cross borders between the ten local authorities, two cities, two aspiring cities and four pan-regional bodies, it is a fact of life that public administrations are geographically bounded as well as financially bounded. Therefore any service provision must be self-sustaining and is likely to have to operate "cross-border" if it is to be useful to people who really don't care if they're in Bury, Bolton or Wigan (3 of the divisions of Greater Manchester).
As Julian Tait says: There is no inherent logic in open data - that is, it is usually meaningless out of context - so what's the point of it? To answer this, a data store was set up to which all 10 of Manchester's local authorities contribute and politicians, developers and artists were brought together for themed hackdays around grand challenges. The result is an active and enthusiastic community that has produced several useful proofs of concept, such as a hackday-winning bus timetable app, but no finished products. Crucially, however, business cases are becoming manifest both for private enterprise and for local authorities themselves. For example, a simple trade-off can be made between providing real time displays on bus shelters or a single real time data feed that can be used on mobile devices. Such trade-offs can point to substantial savings. Furthermore, Manchester is often cited as the place where a potential big user of OGD is the public sector itself.
After the event, Julian Tait provided further background for the figure of £8.5 million that is often quoted in the context of Manchester as the cost of not providing open data and that is cited in his slides.
The £8.5 million figure was based on a rudimentary internal audit conducted by Trafford Borough Council that was then extrapolated to cover the whole of Greater Manchester. There are a lot of assumptions in the figure and I did caveat it whilst doing my presentation. The figure has cropped up a few times, especially from the Cabinet Office. The figure was derived from:
An estimate that at any given time during a working day there were at least 60 people within the local authority who were unable to locate the data they required to undertake their jobs. This was costed at a nominal figure of £15 p.h. They then extrapolated it across the 10 Greater Manchester authorities to give the £8.5 million figure. The figure this method actually produced was £15.84 million per annum, but this was revised down due to local authority salary bands.
60 people x 10 = 600
Hourly rate = £8 per hour, which is less than the £15 per hour used to calculate the cost of FOIA requests
Hours per day = 8
Working days per year = 220
These are the figures that were supplied by Trafford Council
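The quoted figures can be reproduced directly from the numbers above. The following sketch simply restates the arithmetic; it adds nothing beyond the figures Trafford Council supplied:

```python
# Reconstructing the Trafford Borough Council estimate from the
# figures quoted above.
people = 60 * 10           # 60 staff per authority, across 10 GM authorities
hours_per_day = 8
working_days = 220

# The method's raw figure used the £15/hour rate (as for FOIA requests)...
raw_figure = people * 15 * hours_per_day * working_days
# ...which was then revised down to £8/hour due to salary bands.
revised_figure = people * 8 * hours_per_day * working_days

print(f"raw:     £{raw_figure:,} per annum")      # £15,840,000
print(f"revised: £{revised_figure:,} per annum")  # £8,448,000
```

The revised figure of £8,448,000 is in the region of the £8.5 million that is quoted, which underlines how approximate the headline number is.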
The other example was the cost of installing passenger information displays across Greater Manchester. This was an example used by TfGM when it first gave its support to open data in 2010. It worked out that installing realtime passenger information displays on bus stops across Greater Manchester could potentially cost £21 million, as there are 14,000+ bus stops and each display costs approx £1,500 including installation. I didn't state a figure in the presentation, I just said that you can see that it would cost a lot of money. If you cost it at £21 million you are assuming that every stop has a display which in the real world won't happen.
On another note, Jay Nath (CIO, San Francisco) announced this calculation a couple of weeks ago with regard to 311 call savings from making real time transport data available.
One of the major recurring themes of the event is that publishing data is only the start of the process. What's needed is the active involvement of data users and data interpreters. What is the data for? What does it mean? What can I do with it? So you've given me all this data, so what? This is summed up neatly in the 5 stars of open data engagement:
- Be demand driven
- Put data in context
- Support conversation around data
- Build capacity, skills and networks
- Collaborate on data as a common resource
See Tim Davies' paper for details.
These ideas resurfaced in different ways in other papers. In theory, publishing data leads to more transparency, new businesses, better evidence-based policy making and increased public sector efficiency. For that to work, however, different actors in the chain - whether other public administrations or individual users - need to have co-ownership of the data and be able to participate directly in its correction. That raises issues surrounding licensing and the workshop saw examples of open data licenses from the UK, France and Austria. If licensing can be agreed upon, and if everyone concerned is able to use the same platform to publish and offer feedback/corrections directly to the data, then a much healthier ecosystem will evolve.
If open data is one side of the coin, then analysis of social media is the other. A lot of effort is going into trying to extract public opinion and sentiment towards policies. The problem, from a policy-maker's point of view, is that the free or low-cost tools are not reliable indicators of public opinion, while the tools that are better at extracting real data from social media content are, of course, expensive. The future challenges for opinion mining are:
A further point to note is that only a percentage of social media interaction is carried out in public. This is particularly so on Facebook, where individual users' walls are not accessible although discussion groups are. It is these pages, focused on a particular topic, that provide the richest seam for sentiment analysis and opinion mining.
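None of the tools in this session were shown as code, but the weakness of cheap sentiment analysis is easy to demonstrate. The sketch below is a deliberately naive lexicon-based scorer of the kind underlying many free tools; the word lists are illustrative and are not taken from any real product:

```python
# A deliberately naive lexicon-based sentiment scorer. Its crudeness
# illustrates why policy-makers find low-cost tools unreliable:
# negation, sarcasm and context are all invisible to it.
POSITIVE = {"good", "great", "support", "excellent", "helpful"}
NEGATIVE = {"bad", "poor", "oppose", "terrible", "useless"}

def sentiment(text: str) -> int:
    """Return a crude score: positive word count minus negative word count."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment("great policy with excellent support"))  # 3
print(sentiment("this is not a good policy"))            # 1, the negation is missed
```

The second example scores as positive despite being a negative statement, which is exactly the kind of error that separates cheap tools from the expensive ones mentioned above.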
The integration of open government data (OGD) and social media data (SMD) is an ongoing research topic at the University of Macedonia whose outputs have significant potential for policy modeling. OGD tends to be objective whilst SMD is highly subjective. Policy makers will soon be able to see the subjective reaction to objective changes through a dashboard that is powered by linked data.
It was unfortunate that Oluseun Onigbinde of BudgIT was not able to attend the workshop as his work offered a perfect segue from the ideas discussed so far around engagement into the presentation of open data through mainstream media. The BudgIT platform turns the Nigerian budget into an interactive document, complete with commentary channels via the Web and SMS.
Open data doesn't necessarily begin with a centrally located server and end with a user's computer. The workshop heard about how 'open data' is gathered as voice recordings that reach other end users by means of radio programs. Similarly, market information in Mali, enriched through linked data mechanisms, can be accessed either through the Web or through radio and the data itself distributed across many devices, thus eliminating the reliance on expensive (and rare) data warehouses. The central concepts of decentralization and open data as a two-way medium have been proven to be very powerful and empowering.
A different use of open data can be seen in mainstream media such as the BBC. Television and print media have long used graphics to explain news stories but those graphics can now be derived from the data and presented online. Interactive graphics are generally very popular and reach mass audiences very effectively. However, simple graphs seem not to have the same impact as more innovative graphics.
The BBC contributes and uses substantial amounts of data for services like its Wildlife Finder and music pages. Importantly, the use of open data engages the audience and invites them to join the search for innovation by seeking feedback on data visualizations. It's often impossible to predict how an audience will respond to a given graphic. Again, this reiterates the underlying theme of the workshop that engagement is what turns data into useful information.
A lightning talk by Farida Vis on allotment data provoked a good deal of praise and discussion. The demand for allotments in Britain — small pieces of land that can be used for growing vegetables — far outstrips supply. By taking data from multiple sources, including Freedom of Information requests and from the allotment holders themselves, the demand for, and social good done by, allotments is made clear. This data could facilitate much better evidence-based policy making by local councils, simply by making the waiting list data easier to access and the location of existing allotments easier to see. The allotment data work struck a chord with the workshop as it is a prime example of useful data concerning a topic that genuinely engages the public.
As noted in the applications session, geospatial data is of critical importance to many uses of open data. The European INSPIRE Directive and the work that flows from it is a substantial effort to improve interoperability in this area. It is perhaps surprising that the interoperability issues of a specific domain, like the geospatial one, are basically the same as those preventing effective cross-domain discovery and aggregation of government data, as described by Andrea Perego from the Institute for Environment and Sustainability, part of the European Commission's Joint Research Centre. Use of INSPIRE by public administrations is mandatory within the EU and that brought the workshop on to discuss the wider policy implications of open data.
NASA's Chief Knowledge Architect, Jeanne Holm, is currently on secondment to the White House to work on data.gov. Andrew Stott is a member of the UK's Transparency Board and former Director of Transparency & Digital Engagement. Franco Accordino heads the European Commission's Task Force on Digital Futures and Simona De Luca works in the department for Cohesion within the Italian Ministry of Economic Development. The four panelists between them were able to offer insights into the way that open data is perceived and used within various governments.
In one of his first acts as president, Barack Obama signed a memorandum on transparency and open government in an effort to strengthen the feedback mechanism through which citizens can comment on US government policy. The means of achieving this has evolved over time from rating ideas through to monitoring Twitter. The primary mechanism for citizen feedback today is the use of online petitions. Once the number of signatures exceeds a threshold, currently 25,000, an official response will be issued. Similar mechanisms exist in several countries. In Britain, for example, petitions with more than 100,000 signatures are considered for debate in the House of Commons.
More directly related to the topic of the workshop, Andrew Stott reported that the publication of open data in Britain has led directly to changes in the policy making process. The publication of the UK crime map, for example, has prompted a change in the way police resources are prioritized. A major user of open government data turns out to be the government itself which is benefitting from much more efficient access to information. Although this has not been measured in a systematic way, it is evidently happening. It is also apparent that MPs are more likely to use sites like http://www.theyworkforyou.com than (the official) http://www.parliament.uk. Analyzing some open data, even though it has been anonymized, can have predictive results that could be applied to specific people. For example, it is now possible to predict children who will be at risk even before they are born. That opens up policy making questions of acute sensitivity.
The European Commission's Digital Future Task Force has just begun to gather ideas for evidence-based policy making that can begin in 2020 (part of the Digital Agenda/Horizon 2020 work). Interestingly, Digital Futures brings scientific data together with subjective opinions that may be contradictory. A new platform will provide open access to data and perform reasoning analysis on it. The Commission is keen to work with the US and other governments on this as data has no borders and the problems faced are global. The project is very much in line with the idea that government open data and social media comments are two aspects of a single communication channel. In the light of this, Franco Accordino asked a thought provoking question: "In a time of ICT/online public engagement, will Switzerland still need so many referenda?"
The open data movement is now gathering momentum in Italy with the initial focus on publication with visualization and tools to come. Evidence available so far suggests that re-use of government data goes hand in hand with making it available through a single point of access which is what is being worked on. Experiments are underway to improve the speed of publication of data, select the data to publish, clean it and provide metadata for it. At this early stage, the data has not yet been used for policy modeling and work has not begun to link data sets.
Franco Accordino held out the prospect that, from a policy maker's perspective, citizens could become one big crowd that can reach consensus. We are now better able to influence a system that has been working for years in some way. The Swiss already have a culture (and resources) for it through referenda, but now with new ICT systems, governments are adapting more quickly to an ever changing world. Data is about the past but it can be used to extrapolate and make predictions about people's desires, if the open data community can be successful in reaching non specialists.
One of the most thought provoking sessions of the workshop looked at the relationship between open data, transparency and empowerment. This session perhaps acts as a summary of the whole two days since its themes were rehearsed and repeated in different ways throughout the event. In his paper Martin Murillo of Cape Breton University said: "…transparency is not enough for the reduction of corruption as other necessary conditions must also be present: that the citizen must be able to receive available information, that the different audiences can understand such information, and that there exists a mechanism to hold the government accountable (i.e. free and fair elections and other checks and balances, generally present in a democratic system)." It is not enough to publish data: the public needs to be able to understand the information reflected in the data and to have the means to effect change, and those means must exist between elections as well as at election time. If a dictatorship publishes a load of data, that might tell you how corrupt they are, but you may not be able to do anything about it.
As an example of this, Murillo reported that in a simulation, Nigeria's transparency was increased to the same level as Sweden, but no decrease in corruption was observed unless such a rise was accompanied by increased levels of education and press freedom, and the establishment of accountability mechanisms such as free and fair elections.
One problem with open data as a means to empowering citizens is that citizens need to understand the data. We don't necessarily understand where all the data around us comes from or what problems were encountered in its production. Furthermore, some developers may see data as a black box and take delight in the ease with which data can be visualized in their applications through the use of simple and simplistic APIs.
A problem with re-use of data is that, almost by definition, the re-use takes place in a different context than that for which the data was collected. Such re-use may or may not make valid assumptions about data accuracy and it is all too easy for errors to be multiplied when disparate data sets are used together. In making use of data, developers and the companies they work for have a duty to make use of the metadata that describes it as much as the data itself. Conversely, data publishers have a duty to make such metadata available with the US Census handbook held up as the gold standard (637 pages of metadata).
Katleen Janssen and Helen Darbishire raised similar issues:
Three years ago the ECHR agreed that access to data is part of freedom of expression and this opinion was matched by the UN less than a year ago. The right to data is powerful, as exemplified by the Open Knowledge Foundation's request, via Ask the EU, for EU spending data, which it was then able to republish in a machine-readable format.
In some cases, the mere fact of a request for information can force policy makers to take into account data they may otherwise have overlooked.
The overall message was clear: workshops like PMOD won't help directly. Open data activists have a responsibility to reach out to local activists, show what can be done with data files and realize that new dependencies are being created. The open data community should recognize its role as gatekeepers.
In the ensuing discussion, several points were made:
Finally there was a question about the limits of open data and the relationship between open data and personal privacy. Is it viable to live in a society where everything is open? Several countries, such as Norway, put all tax returns online. That's an idea that is unlikely to work in, say, Belgium.
A challenge for open data advocates, especially linked open data advocates, is proving the benefits. How can you show the advantages of 5 star linked data before you have sufficient data? It's a chicken-and-egg problem that is being tackled in Greece, Latvia and Russia. In all three cases, there does not yet exist a substantial corpus of open data (linked or otherwise) but projects are running that demonstrate the potential.
In Greece, NTUA has created publicspending.gr. This takes data from a variety of sources and from the combined data the system is able to derive graphs showing where public money is being spent and which departments are spending it. At a time of severe economic difficulty, this initiative - the first linked open data application in Greece - is attracting serious attention in the Greek government as a possible aid to policy making and transparency.
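The report does not describe publicspending.gr's internals, but the kind of derivation it performs (aggregating combined payment records into per-department totals that can then be graphed) can be sketched as follows. The departments and amounts here are invented for illustration:

```python
from collections import defaultdict

# Illustrative payment records in the spirit of the combined data
# behind publicspending.gr; the values are invented for the example.
payments = [
    {"department": "Health", "payee": "Supplier A", "amount": 120_000},
    {"department": "Health", "payee": "Supplier B", "amount": 80_000},
    {"department": "Education", "payee": "Supplier C", "amount": 50_000},
]

# Aggregate spending per department: the kind of derived figure that
# feeds "where is public money being spent" graphs.
totals = defaultdict(int)
for p in payments:
    totals[p["department"]] += p["amount"]

for dept, total in sorted(totals.items()):
    print(f"{dept}: {total:,}")
```

The hard part in practice is not this aggregation but the step before it: combining records from disparate sources so that "Health" in one data set is recognisably the same entity as "Health" in another, which is exactly where linked data helps.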
In Latvia, the workshop heard about efforts to visualize and therefore make sense of data that is normally presented in HTML tables that are hard to interpret. Impenetrable figures become much more meaningful when visualized as a network as it becomes easy to see that, for example, some groups of MPs always vote the same way.
In Russia, the Ministry of Education and others have developed data sets around nanotechnology and mathematics that have been added to the LOD cloud. The input data came from multiple sources and had first to be triplified with the output being used to power a nanotechnology news portal, a directory of experts and a range of graphical outputs useful for policy making. This is really a showcase for Russian open data that is also the subject of round table events, Russian language guidelines and more. It follows on from Presidential Decree 601, made on 7 May 2012 supporting open data in Russia.
The common thread here is that many countries are exploring the possibilities afforded by open data and it's the high quality demonstrators and visualizations that show the potential.
Part of the attraction of opening up public sector data is that it can fuel a new economy. "Data is the new oil" we hear — but is this actually the case and, if so, what are the viable business models? The workshop heard the results of a detailed survey into this question by Istituto Superiore Mario Boella as well as about the direct experience from Open Corporates. In the ensuing discussion, which included input from several other companies seeking to make a profit from open data, there was agreement that the value, i.e. the basis of any profitable work, is in turning the freely available raw data into something genuinely useful that customers will be prepared to pay for.
As the diagram shows, there are several steps between the raw data and the end product. It is common to see successful businesses built on top of open source software. Companies like Open Corporates and Yucat are being built on open data using many of the same ideas. The products can generate revenue in a number of ways:
There's also the interesting idea that a company like Yucat or Listpoint saves public administrations money by curating and processing their own data before making it readily available. The difference between what it would cost a public administration to curate its own data and what it costs to pay a third party is where the company can make a profit whilst the public administration still saves on its own costs. In Open Corporates' case, any enrichment of the data must be done under a share-alike agreement. If a company wants to avoid sharing its enriched version of the OC data then a fee is payable.
David Mitton of Listpoint puts it like this: "We would engage the systems integrators (SIs) at bid stage, educate them on how to achieve data interoperability and systems integration more efficiently and cost effectively by managing bottom up data standards via reference data mapping. By doing this we would help lower the cost of public administration projects (bid prices of the SIs and in return set up a service based SLA with the SI for wrap-around services and support). So the cost of using the services hits the SI but the savings impact their bottom line and through to the contract price."
It was also noted that governments need to be part of the discussion. It's important for businesses to have confidence that the government data they build a business on will be available, accessible, and in a usable format. The absence of such confidence can be the biggest barrier to true innovation and economic benefits. There are examples from the US where substantial prizes are on offer for application development but this is not repeated to the same degree in the EU, at least, not at EU level.
The workshop ended with a look at the various tools that exist today that can help the policy making process, with a detailed look at just two of them.
The Crossover Project's research roadmap includes an extensive survey of the demand for tools and how different tools are relevant at different stages in the policy making cycle. Hans Rosling's Gapminder tool was presented as an example of a tool that can generate high involvement of citizens in policy-making through making sense of large data sets. The visualization of economic data is useful for assessing and forecasting trends, as well as for observing the localization of economic activities, especially for regional economic analysis. The visualization of health data allows the display of historical trends, incidence rates, rate/trend comparisons, screening, risk factors etc. As the workshop emphasized throughout, it is only when data is turned into information that can be used and understood without any need for technical know-how that it takes on meaning and value. Relatively simple graphs may or may not catch the public attention but for policy makers they can be extremely revealing and powerful.
Two linked data visualization tools were presented: one that allows users to explore statistical data, based on the Data Cube vocabulary and another that allows linked data sets to be mixed and presented on detailed maps of Spain. Those data sets might themselves make use of the Data Cube vocabulary. The important point about both tools is that they make it easy to explore multiple data sets at once, perhaps uncovering "unknown unknowns" - answers to questions that the policy maker may not have thought to ask.
The research roadmap presents a number of short term and long term research goals and encourages people to comment via the Crossover Project Web site.
The workshop explored the reality and the potential of open data as a communication medium between governments and their citizens. The richness of the papers, presentations and discussions that lasted two very full days cannot be represented fully in a report of this nature. Several lightning talks and supplementary papers have not been cited here at all and readers are encouraged to refer to the agenda that links to all relevant material. Nevertheless, we can attempt to extract some core messages.
I would personally like to thank:
for such an enjoyable, interesting and successful event.