Uses of Open Data Within Government for Innovation and Efficiency: Report

Introduction

The first Share-PSI 2.0 workshop took place as part of the 5th annual Samos Summit on ICT-enabled Governance. The Share-PSI 2.0 partners shared a lot of their experience of developing open data and data sharing strategies across the public sector in different European countries and this was augmented by several external speakers as well as attendees whose primary interest was other aspects of the Summit. This report summarises the discussion which was captured in a set of raw notes, photographs and tweets. All papers and slides are linked from the agenda.

The 'Family Photo' showing all the attendees on day 1 of the Samos Summit

The overall theme for the Share-PSI 2.0 sessions was Uses of Open Data Within Government for Innovation and Efficiency and the Summit began with an introductory speech by the Deputy Minister at the Hellenic Ministry of Administrative Reform and e-Governance (MAREG), Dr Evy Christofilopoulou. Delivered on her behalf by Nancy Routzouni, who represents MAREG in the Share-PSI network, the speech set the scene for much of what followed.

As the workshop heard from many countries, Greece is implementing its strategy for using ICT, especially the Web, to make its data available between departments and to its citizens.

To further unlock public sector information, we have prepared the required legislative framework to actively endorse the principle of “open by default” and make government data promptly available, in open format, governed by standards, with a view to developing an ecosystem of open, interoperable services for sharing and re-use.
[…]
Because to open is to trust

Plans and Implementations

The core ideas expressed by the Greek Government Minister clearly resonate in many countries and regions: Flanders, the Canary Islands, France, the Czech Republic, Spain, Slovenia, Serbia, Croatia, Austria and now even Albania; cities like Gijón and Helsinki. In all these cases, elected politicians as well as civil servants recognise the potential benefits of opening data for sharing among colleagues and citizens.

Heather Broomfield and Steinar Skagemo of the Norwegian government (Difi) introduced the concept of a traffic light system that, although not an official designation, is a handy way to describe whether and how data should be shared.

Red: closed. For internal use and to the customer.
Amber: shared. Internal between public sector organisations and to the customer.
Green: open for all.
The traffic light system that (informally) classifies PSI in Norway
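
As a rough illustration only, the three levels can be expressed as a small lookup that answers who may see a dataset. The audience labels and the `may_access` helper are assumptions made for this sketch; the workshop did not describe any particular implementation.

```python
from enum import Enum


class Sharing(Enum):
    """Informal traffic light levels as described at the workshop."""
    RED = "red"      # closed: internal use and the customer
    AMBER = "amber"  # shared between public sector organisations and the customer
    GREEN = "green"  # open for all


# Hypothetical audience labels, chosen for this sketch only.
ALLOWED_AUDIENCES = {
    Sharing.RED: {"owning_agency", "customer"},
    Sharing.AMBER: {"owning_agency", "other_public_body", "customer"},
    Sharing.GREEN: {"owning_agency", "other_public_body", "customer", "public"},
}


def may_access(level: Sharing, audience: str) -> bool:
    """Return True if the given audience may see data at this level."""
    return audience in ALLOWED_AUDIENCES[level]


print(may_access(Sharing.AMBER, "other_public_body"))  # True
print(may_access(Sharing.RED, "public"))               # False
```

Keeping the rule in one table means a dataset can be reclassified, for example from amber to green, without touching the code that serves it.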

They emphasised that the idea of 'open data' comes as a natural consequence of the real goal, which is to share data. The Norwegian government mandates the sharing of data and implements a single authentication and sign-on system that is used across all online services. In this way many application processes are made much more efficient. If a foreign national applies for a residency permit, for example, the government already has all the information it needs to process the application. Likewise it already knows what you're entitled to, whether you have a child approaching school age, and so on. By collecting and sharing data about what payments companies have made to you, the government already knows your income and so can calculate what tax is owed without asking you to fill in a form. This makes the Norwegian tax collection system one of the most efficient in the world.

The sharing of data has three benefits:

  1. Design for sharing improves efficiencies;
  2. Improved data quality and service delivery;
  3. Data sharing within the public sector provides for greater savings and better services.

A prime example of the last of these is the meteorological office, whose public mission is to protect lives by providing accurate data – and Norway has a lot of weather. The business register is, of course, a key dataset. Sharing it avoids the need for different agencies to keep copies of the data and gives citizens access too. In a country of only 5 million people, the business register is accessed approximately 1.4 million times per month.

In Helsinki the focus has been on bringing policy makers and citizens closer together. Through the Open Ahjo initiative, the agendas and topics for discussion at various meetings are publicised. This is now being extended to making available the data on which decisions are made as well, such as population statistics, administrative boundaries and financial data. Of course, making such data available goes against the grain for many policy makers, who instinctively fear such openness, but others welcome it.

Night time view of snowy Helsinki with speech bubbles saying things like
Make it easier for citizens, politicians and civil servants to have a dialogue via social media
Helsinki administrators want to interact directly with citizens

One problem highlighted by Ville Meloni from Forum Virium Helsinki was that of describing locations reliably across different datasets. This was discussed in one of the bar camp sessions led, naturally enough, by Athina Trakas of the OGC. In that session she pointed to the recent joint W3C/OGC workshop run under the SmartOpenData project that, subject to confirmation, will result in a new joint working group being formed later in 2014.

The Canary Islands are, of course, very different from the Finnish capital. With 10 million visitors per year, the island of Tenerife has 31 municipalities, more than 500 hotels, over 1000 restaurants plus museums, events and other attractions. José Luis Roda García from the Universidad de la Laguna described the four-stage plan to enable data-based efficiencies and innovation:

  1. identification of potential data providers and their datasets, both public and private sector;
  2. requirements development, standards selection and portal implementation;
  3. application development;
  4. dissemination through competitions, meetings, talks and student motivation.

Although the local context in Tenerife is very different from Helsinki, Oslo or Brussels, it is still the case that policy makers' natural reluctance to share data had to be overcome and their high level support was as crucial to the success of this project as any other open data project.

The words strategy, concept note and action plan with a rubber stamp saying 'Approved'
You need a plan and you need high level support

Flanders approached the challenges in a top-down fashion, developing and implementing a framework and action plan to enable the implementation of open data on all levels of government. Amongst other actions, the plan included an annual Open Data Day in Brussels. That event attracted 230 people in 2012 and 250 in 2013. The uptake of data by the demand side is disappointing though. The focus of the third edition of the Open Data Day in Flanders on October 3rd 2014 will therefore be on the demand side, the users of open data: companies, organisations, developers and individuals re-using open data. This time the approach is bottom-up. CEOs, CIOs and project managers will get the opportunity to voice their expectations and recommendations with respect to the open data policy and implementation at the Flemish government. The organisers are expecting more than 300 participants.

The Flemish government also initiated and co-financed the VIP projects (Flemish Innovation Projects), inviting governmental organisations to submit innovative open data projects, the goal being to encourage the use of open data within government for innovation and efficiency. 24 applications were sent in, of which 10 projects were chosen. CORVe's Noël van Herreweghe described the 10 entries that were selected to receive a total of around €500,000.

Impact Studies

On behalf of the University of Economics in Prague, Jan Kučera presented a comparison of different approaches to open data taken by two different Czech organisations. One followed a careful top-down approach. The list of datasets was carefully chosen and each one prepared for publication. In contrast, the other approach was driven by FOI requests. Frequently requested datasets were selected for opening up and less emphasis was placed on data quality and process. This can be thought of as a bottom-up approach.

Both approaches have their advantages. The feedback mechanism implemented in the top-down approach led to better quality data through user feedback, but the bottom-up approach seemed to have broader impact and wider uptake of the data.

Another way to analyse the effectiveness and efficiency gains, or otherwise, of open data policies is to apply Business Process Modelling. This has been done by Neven Vrček and his colleagues at the University of Zagreb, who considered a number of challenges around the collection and provision of environmental pollution data. The quantity and variety of data in this domain are large and the domain is particularly sensitive to errors. Therefore the quality of the data handling processes is as important as the data quality itself. By using a Business Process Modelling notation such as BPMN 2.0, it's possible to track and manage the whole data lifecycle and calculate a cost per unit of data.

In his work on Open Data Trentino, Feroz Farazi highlighted the need to make data publication an integral part of change management within public administrations. Working with a number of partners, Open Data Trentino has been building data models that help make data more useful and more reusable by modelling data as entities, making cross correlation much easier.

The Hungarian Scientific Bibliography (MTMT) was presented by András Micsik. It serves as a national scope registry of research results with strong quality control, so it can serve various statistics about Hungarian research, although they face various legal and copyright problems with opening up data.

After the presentation it emerged that similar services are desired by other countries as well. Sweden is building a similar system and facing similar problems, as are Belgium, Albania and others.

Metadata

In any discussion of open data, the subject of metadata is usually high on the list of hot topics. Peter Parycek and Johann Höchtl of the Danube University Krems described the work in Austria to create an open data portal for businesses. Launched during the workshop itself, the portal includes datasets from companies like IBM and HP. Importantly, it uses the same metadata schema as the Austrian government's data portal. There was some heated discussion about the decision not to use DCAT as the basis for this metadata; however, on closer inspection most of the terms used are exactly the same Dublin Core terms used by DCAT.

Spain is famously a federation of regions and so it's no surprise that data catalogues in Spain are also federated. The legal framework is in place across Spain such that each region is obliged to publish data with a common metadata structure, namely the DCAT Application Profile published by the European Commission (written by Share-PSI 2.0 partner Makx Dekkers). This means that the data, wherever it may be published, is accessible via the centralised portal at http://datos.gob.es/. Local administrators are given access to the central site to set the URL of their data feed. These multiple feeds are then made available from the central site as both ATOM and RDF feeds. A total of 1,600 datasets are available via datos.gob.es at the time of writing, along with a number of APIs and widgets.
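
The federation model can be sketched as follows. This is an illustration under stated assumptions, not a description of how datos.gob.es is built: the feed contents are hard-coded stand-ins for regional feeds, the record fields are a simplified subset loosely modelled on common catalogue properties (identifier, title, publisher, access URL), and `is_valid` and `merge_feeds` are hypothetical helpers.

```python
from typing import Dict, List

Record = Dict[str, str]

# Simplified field set assumed for this sketch; the real profile defines many more.
REQUIRED_FIELDS = ("identifier", "title", "publisher", "access_url")


def is_valid(record: Record) -> bool:
    """Accept a record only if every shared field is present and non-empty."""
    return all(record.get(field) for field in REQUIRED_FIELDS)


def merge_feeds(feeds: Dict[str, List[Record]]) -> List[Record]:
    """Combine regional feeds into one catalogue, dropping invalid and duplicate records."""
    catalogue: Dict[str, Record] = {}
    for region, records in feeds.items():
        for record in records:
            if not is_valid(record):
                continue
            # Keep the first record seen for each identifier and note its source feed.
            catalogue.setdefault(record["identifier"], {**record, "source_feed": region})
    return list(catalogue.values())


# Invented sample feeds; example.org URLs are placeholders.
feeds = {
    "region_a": [
        {"identifier": "a-1", "title": "Bus stops", "publisher": "Region A",
         "access_url": "http://example.org/a/bus-stops.csv"},
        {"identifier": "a-2", "title": "", "publisher": "Region A",
         "access_url": "http://example.org/a/untitled.csv"},  # rejected: empty title
    ],
    "region_b": [
        {"identifier": "b-1", "title": "Air quality", "publisher": "Region B",
         "access_url": "http://example.org/b/air.csv"},
        {"identifier": "a-1", "title": "Bus stops (copy)", "publisher": "Region B",
         "access_url": "http://example.org/b/bus-stops.csv"},  # duplicate identifier
    ],
}

catalogue = merge_feeds(feeds)
print([record["identifier"] for record in catalogue])  # ['a-1', 'b-1']
```

The point of the shared structure is visible in `is_valid`: because every region publishes the same fields, the central site can apply one check to all feeds.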

Transport

Screenshot of Gijon transport mobile app
One of the apps using Gijón's open transport data

An area where open data often has the most take-up from external developers is transport. Making data available is not new in France but hitherto has been done by each public administration with little coordination. Open Data France is tackling that issue and, as Philippe Mussi explained, one particular area of interest is transport data. The community is settling around using Google's General Transit Feed Specification (GTFS), but its use is subject to feedback and review across French public sector users. It is seen as a 'viral format' that enables experimentation rather than a formal standard that can be hard to understand and implement.

In Spain's northern city and resort of Gijón, a rich set of data is made available through APIs. These allow mobile applications to be built but it's the same data and the same APIs that are used in the message boards at transport stops. Low cost displays can be installed anywhere – and have been in shops, restaurants and hospitals – built with a Raspberry Pi attached to an old monitor. Martin Alvarez-Espinar of CTIC reported that the Gijón city authority calculates that the initiative saves the city €0.8M per year.

Transparency and Anti Corruption

The workshop heard about several examples of public sector information being made available explicitly as a transparency measure. In one case, Slovenia, this is done explicitly as an anti-corruption measure. The Supervizor platform matches government spending with contract data and the company register. As Mateja Prešern of the Ministry of the Interior and Public Administration and Gašper Žejn of the Commission for the Prevention of Corruption reported, citizens naturally look at public expenditure in their own area. Although personal names are removed from transactional data, it was easy to spot a case where a school was giving a lot of work to a company that locals knew to be owned by the school head teacher's wife.
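
The kind of matching described above can be sketched as a simple join. Everything in the example is invented for illustration: the payment rows, the company identifiers and names, and the `totals_by_payer_and_company` helper are assumptions, and nothing here reflects how Supervizor itself is implemented.

```python
from collections import defaultdict
from typing import Dict, List

# Invented sample data: payments carry a company identifier, not a personal name.
payments = [
    {"payer": "School A", "company_id": "C1", "amount": 12000.0},
    {"payer": "School A", "company_id": "C1", "amount": 8000.0},
    {"payer": "School A", "company_id": "C2", "amount": 1500.0},
    {"payer": "Town Hall", "company_id": "C2", "amount": 3000.0},
]

# Invented extract of a company register, keyed by the same identifier.
company_register = {
    "C1": "Example Services d.o.o.",
    "C2": "Sample Supplies d.o.o.",
}


def totals_by_payer_and_company(payments: List[dict], register: Dict[str, str]) -> List[dict]:
    """Sum payments per (payer, company) pair and attach the registered company name."""
    totals = defaultdict(float)
    for payment in payments:
        totals[(payment["payer"], payment["company_id"])] += payment["amount"]
    rows = [
        {"payer": payer, "company": register.get(company_id, "unknown"), "total": total}
        for (payer, company_id), total in totals.items()
    ]
    # Largest totals first, so unusual concentrations of spending stand out.
    return sorted(rows, key=lambda row: row["total"], reverse=True)


rows = totals_by_payer_and_company(payments, company_register)
for row in rows:
    print(row["payer"], row["company"], row["total"])
```

Once spending is linked to the register in this way, a local reader who knows who owns a company can spot a pattern that the data alone does not name.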

Screenshot of Supervizor tool showing links to business register, company subsidiaries and transactions
Some of the components of the Supervizor tool

Users are given advice on how to use the system and get the most out of it and one advantage offered by running Supervizor is that all the data has improved in quality through being looked at by many eyes. The server's logging function has been switched off due to the success of the system meaning that the logs were taking up too much space on the hard drive.

For many attendees, the Share-PSI 2.0 workshop in Samos was the first opportunity to learn about open data initiatives in Albania. Julia Hoxha presented the work of the Albanian Institute for Science, one of whose projects is Open Data Albania. The project offers data, tools and visualisations aimed at media, civic society organisations, academia and Web activists, all acting as channels to reach the real audience which is the citizenry.

The project is proving effective with journalists referring to the data and visualisations in articles and, importantly, politicians recognising its importance. The Prime Minister was as surprised as anyone else to find that so much was being spent on his car repairs (he had it looked into), and just why was so much being spent on hiring chairs in the municipality of Kavaja? As with Supervizor, transparency through open data in Albania has led directly to reducing corruption and thereby increasing public sector efficiency.

Other examples of open data efforts aimed specifically at transparency include the OpenCoesione portal in Italy that monitors the spending of €75bn across 766 projects. Lorenzo Canova of the Politecnico di Torino highlighted that, as with so many open data projects, an important result of opening the data is that errors can be spotted and corrected. For example, OpenCoesione found a remarkable number of projects funded to the tune of €1!

Transparency is the driver behind the development of the police.uk service too. As noted by Minister Christofilopoulou in her opening remarks, to open is to trust – and who needs the public's trust more than the police? England has 43 separate police forces that, when they first began making their crime data available in 2008, did so in an uncoordinated way. The single system introduced in late 2009 greatly increased efficiency and means that the monthly updates can be correlated with quarterly data from the Office for National Statistics, a process that leads to improved data quality. Users of the system also help to improve quality, as people who have reported a crime often check that their report has been included.

Diagram shows how actual location of crime is 'snapped' to an anonymous location nearby
The snap point system used by police.uk to protect anonymity

Reporting crime presents a specific problem: greater granularity risks infringing personal privacy. Amanda Smith from the Open Data Institute described the adopted solution which is to generalise the location of each crime so that it could refer to any of at least 12 addresses, putting the point on the map in the middle of a road, not on one side or the other.
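
The idea of snapping can be sketched in a few lines. This is a minimal illustration, not the police.uk implementation: the snap points, their address counts and labels are invented, and the rule of choosing the nearest eligible point by great-circle distance is an assumption made for the example. Only the threshold of 12 addresses comes from the description above.

```python
import math
from typing import Dict, List

MIN_ADDRESSES = 12  # threshold mentioned in the workshop description


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# Invented snap points, each placed mid-road and covering a number of addresses.
snap_points = [
    {"name": "On or near High Street", "lat": 51.5010, "lon": -0.1200, "addresses": 40},
    {"name": "On or near Mill Lane", "lat": 51.5030, "lon": -0.1250, "addresses": 15},
    {"name": "On or near Short Close", "lat": 51.5012, "lon": -0.1203, "addresses": 5},
]


def snap(lat: float, lon: float, points: List[Dict]) -> Dict:
    """Return the nearest snap point that covers at least MIN_ADDRESSES addresses."""
    eligible = [p for p in points if p["addresses"] >= MIN_ADDRESSES]
    if not eligible:
        raise ValueError("no snap point covers enough addresses")
    return min(eligible, key=lambda p: haversine_m(lat, lon, p["lat"], p["lon"]))


# A crime recorded exactly at Short Close is published at the nearest eligible point instead.
published = snap(51.5012, -0.1203, snap_points)
print(published["name"])  # On or near High Street
```

Excluding points that cover too few addresses is what protects anonymity here: the published location never narrows a report down to a handful of homes.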

The impact of the police.uk site is an improved public perception of the police, and there is plenty of room for future expansion of the system, in particular by linking reports of crimes through to information about how they were followed up and, ultimately, to court cases and convictions.

Fire crew commanders making on-the-spot decisions based on paper and pen
Fire crew commanders have to make decisions on the spot with whatever data and experience they have

Bart van Leeuwen of Netage and the Amsterdam Brandweer (Fire Service) has use cases and anecdotes for data sharing that generally trump all others at these events: saving lives. His work on using Linked Data technology to aid emergency response teams is driven by a fear that all the data that would help the commanders to make the right decisions exists but is unavailable when and where it is most needed. All the relevant data has been available for a long time but has been expensive. Now it is either free or much cheaper but a lack of good APIs means it's still not readily accessible.

On the spot commanders have to make decisions very quickly, often facing real situations that are far more complex than a desk-based analysis would predict. When things go wrong, as they inevitably do, investigations that follow can go on for months so timely access to real data is a critical need if emergencies are to be handled efficiently. In that regard, Open Street Map is usually far more accurate than official maps.

Statistics

The workshop heard about three projects that are using Linked Data technologies, specifically the Data Cube Vocabulary, to offer advanced statistical data tools.

Diagram showing the data sources, the LOD converters, the Serbian CKAN and EU data portals as destinations
The data flows from the Serbian Statistical Office case study

Valentina Janev of the Institute Mihajlo Pupin described using the LOD2 technology stack to align code lists used by several statistical datasets and to create the Statistical Workbench: an integrated set of professional tools for accessing, manipulating, exploring and publishing statistical data. Such data can then be visualised using tools like the CubeViz RDF Data Cube Browser or mapped using tools like ESTA-LD under development in the GeoKnow project. In the case of the Statistical Office of the Republic of Serbia, the data can be published on the Serbian data portal.

George Papastefanatos of the Institute for the Management of Information Systems Research Centre “Athena” presented similar work being done in Greece. Again, disparate data sources are converted to Linked Data using standard vocabularies that enable different statistics to be queried at once, providing answers to questions that can only come from multiple sources. The Linked-Statistics.gr service offers advice for non-specialists on how to use the system and to link to specific data points.

Sarven Capadisli's work in this same area is extensive. Taking data from a wide variety of sources, which is increasingly available as SDMX-ML, he uses the Linked SDMX tooling to process the data. He goes further, however, and adds in an important provenance layer as well as an easy to use interface, like the Greek example, designed for use by non-specialists such as journalists. The service, 270a Linked Dataspaces, also offers regression analysis and more so that it's possible to link to specific statistical analysis that also shows the provenance of the data.

The various projects focussed on statistics do not of themselves increase efficiency within government. However, they provide easy to use, easy to visualise and easy to reference data points that lie at the heart of public sector decision making.

Bar Camp

The workshop ended with 10 participants all suggesting topics for further discussion. After a little negotiation and voting with feet, these 10 became 4 discussions that explored specific areas in a little more detail.

As mentioned earlier, the subject of location is important and there was a bar camp session looking at the overlap between open data and location data. Although many standards exist, and Open Street Map and GeoNames are among the most well known open data initiatives, more needs to be done to make it easier to link to locations.

The themes discussed by many of the government bodies represented at the Samos workshop were discussed further in another session. The need for political support, the need to show a successful return and the need to show external demand for open data all need to be addressed in a strategy.

PwC's Michiel De Keyzer led a discussion around the prioritisation of datasets for publication. This built on work he's done under the ISA Programme on helping public authorities to identify the highest value datasets. The problem is a classic chicken and egg: if you ask people what data they want you will get a very limited response. Publish the data and then people will have ideas what to do with it – but how do you know what to publish first? In the UK, data around schools, planning (construction), licensing and location are most in demand, but that may not be the case everywhere. A thought provoking question raised in the session was whether some 'low value' datasets may actually be of the highest value to disadvantaged people. How would you assess that?

The discussions around the Austrian and Spanish approaches to metadata were the basis of a further bar camp session. It was argued that since metadata provision is cumbersome, and different people will describe the same thing in different ways, more of it should be automatically generated from the data itself. Data portals should be able to return not just data but visualisations of data.

Metadata provision can be coordinated in a top-down manner to facilitate federation, as exemplified in Spain, or bottom-up. Both approaches have their advantages, analogous perhaps to the case presented from the Czech Republic.

Conclusions

The Share-PSI 2.0 sessions at the Samos Summit formed less than two days' discussion but could easily have filled a whole week. The partners themselves represent a significant body of people engaged directly in curating, publishing and sharing data, as well as analysing the impact of such actions. Moving to a situation where individual organisations, be they public or private, manage their own data and share it so that others can just use it without replicating any of that management process provides real efficiency savings. Achieving this, however, requires a number of elements to be in place.

  • First and foremost, there needs to be a strategy that coordinates the efforts of multiple agencies.
  • The strategy needs the support of senior officials who are empowered to provide top down authority where required.
  • Local action is, however, required. Internal processes in multiple agencies will need to be applied to meet a common goal but the local aspect is important for success.
  • Benefits accrue in different ways: through improved efficiency, improved effectiveness at fulfilment of the public task, and through greater trust brought about through greater transparency.
  • The most successful examples of data reuse tend to be around transport, spending tied to contracts and company registers, location and statistics. Among these, more work needs to be done to improve the representation of location in public sector information. Efforts to improve the interoperability and visualisation of statistics are impressive.
  • In terms of organisation, the bar camp sessions – i.e. facilitated discussions around a particular topic – proved very popular and this will affect the nature of the next Share-PSI 2.0 workshop in Lisbon in December.