Agenda: Open Data Priorities and Engagement — Identifying data sets for publication

The agenda is more or less stable but all session details are subject to modification.

Please use the Hashtag #sharepsi

Monday 16th March

- Coffee, Registration and Networking

Get your badge, get a coffee, work out how you're going to spend the next day and a half.

- Welcome

Aula Magna

Introduced by: Dana Petcu, West University, Timişoara. [notes] scribes: Daniel & Jan

  • Welcome: Prof. Univ. Dr. Marilen Pirtea, Rector of West University of Timişoara
  • Radu Puchiu, Secretary of State, Chancellery of the Prime-Minister (Romania)
  • Experiences of identifying datasets for sharing, Benedikt Kotmel, Ministry of Finance (Czech Republic) [slides]
  • Capturing Best Practices, Chris Harding, The Open Group (Chris will outline what we need to capture from each session) [slides]

- Coffee

- Parallel Sessions A

Come To My Session! Don't know which parallel session to go to? Come to Aula Magna to hear each facilitator describe his/her session in 60 seconds. Don't be late or you'll miss it!

Share-PSI 2.0 Track
Aula Magna

Site scraping techniques to identify and showcase information in closed formats - How do organisations find out what they already publish?

Facilitator: Peter Winstanley, Scottish Government [paper]. Scribe: Benedikt [notes]

This session addresses the question of how organisations that already publish considerable amounts of information on their website but in non-interoperable formats such as Excel and PDF might ‘discover’ what they are publishing and present it in various helpful ways to end users (including the organisation’s own staff) as part of the engagement to discover priorities for open data publication. Illustrations from site scraping of Scottish Government and NHS Scotland will be presented.

Many government and public sector bodies already publish a considerable amount of information including data and reports on their websites. However, this is frequently done under a distributed management process and using content management systems, both of which tend to militate against being able to present in a quick and flexible way the assets of any particular publication format. As a consequence, organisations might find it challenging to know where to start when establishing a programme of converting existing information resources that are not in open formats (1-2 stars on the 5 star model) to more open formats.
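As a rough illustration of the kind of inventory this session describes, the sketch below counts links to non-open file formats on a single HTML page. It is a minimal example under stated assumptions, not the facilitator's actual method: the `CLOSED_FORMATS` set, the `inventory` helper and the `example.org` URLs are all hypothetical, and a real survey would need to crawl the whole site.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Hypothetical set of non-interoperable formats to look for.
CLOSED_FORMATS = {".pdf", ".xls", ".xlsx", ".doc", ".docx"}


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def inventory(html, base_url):
    """Return a Counter of closed-format file extensions linked from the page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    counts = Counter()
    for link in collector.links:
        # Look only at the URL path so query strings do not hide the extension.
        ext = posixpath.splitext(urlparse(link).path)[1].lower()
        if ext in CLOSED_FORMATS:
            counts[ext] += 1
    return counts


# Made-up sample page, for illustration only.
sample = """
<a href="/stats/report-2014.pdf">Report</a>
<a href="/stats/table.XLS">Table</a>
<a href="/about">About</a>
<a href="budget.pdf?download=1">Budget</a>
"""
result = inventory(sample, "http://example.org/publications/")
```

Counting by extension is only a first approximation; files served without an extension would need a check of the HTTP content type instead.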

Open Track

The Electronic Public Procurement System, open data and story telling in Romania

Facilitator: Valentina Dimulescu, Romanian Academic Society [paper]
Scribe: Johann [notes]

Gaining access to and managing public procurement information in Romania is a strenuous activity for third parties. Although the Romanian Government created the online portal under the European Open Data initiative - Digital Agenda for Europe, which includes a section on public procurement, upon accessing public datasets the question arises whether the information provided is affected by human error or malice.

After continuous failed attempts to acquire a database containing complete information on various types of public procurement contracts, The Romanian Academic Society (RAS), a Romanian think tank, concluded that the only way to get systematic access to this type of data is to connect directly to the Romanian Public Procurement Electronic System's (SEAP) server so as to copy the available information. The authors have encountered two major challenges:

  1. assembling all the data in a consistent database;
  2. matching the errata notices to award notices.

Both the information collected directly from SEAP and that from the CSV files provided by the Government under open data rules contain obvious errors, for example in the economic agent's country field or in absurdly low or high prices. The only way to correct these errors is to connect each so-called "errata notice" to its respective award notice. The corrections contained in SEAP errata notices have not been applied to the public procurement open data.
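The matching step can be sketched as follows. This is only an illustration under assumed data shapes: the field names (`notice_id`, `award_notice_id`, `field`, `new_value`) and the sample records are hypothetical and do not reflect the real SEAP or open data schema.

```python
def apply_errata(awards, errata):
    """Return corrected copies of award notices and the errata that matched nothing."""
    # Copy each award so the original records stay untouched.
    corrected = {a["notice_id"]: dict(a) for a in awards}
    unmatched = []
    for e in errata:
        target = corrected.get(e["award_notice_id"])
        if target is None:
            # Keep errata with no matching award for manual review.
            unmatched.append(e)
            continue
        target[e["field"]] = e["new_value"]
    return list(corrected.values()), unmatched


# Made-up sample records, for illustration only.
awards = [
    {"notice_id": "A1", "country": "RO", "price": 0.01},
    {"notice_id": "A2", "country": "XX", "price": 25000.0},
]
errata = [
    {"award_notice_id": "A1", "field": "price", "new_value": 10000.0},
    {"award_notice_id": "A2", "field": "country", "new_value": "RO"},
    {"award_notice_id": "A9", "field": "price", "new_value": 5.0},
]
fixed, orphans = apply_errata(awards, errata)
```

Returning the unmatched errata separately makes it visible how many corrections could not be tied to any award notice, which is itself a useful data quality indicator.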

The authors recommend that the newly envisaged online public procurement system (SICAP), financed through European Union development funds, should include an export module and incorporate standardized errata information so that the database can be exported correctly. Still, the publication of this specific Public Sector Information is sensitive, as many corruption cases arise from public procurement contracts.

Open Track

Free Our Maps

Facilitator: Vasile Crăciunescu, Codrina Maria Ilie, Technical University of Civil Engineering Bucharest [paper] [briefing]. Scribe: Ingo [notes]

Our session is dedicated to the importance of releasing public geodata over the Internet, under an open license and in a reusable format. Geodata is a broad term that refers to data that has a spatial component, defined through various methods, such as pairs of coordinates, name of location, address identifiers and so on. Its usage is widespread across various domains. World-leading businesses such as Google, Yahoo, Nokia and Apple have developed services and products, such as Google Maps, that have permanently changed the way in which geodata is perceived by the wider community. The community itself has also stepped up, building an international network that works continuously, in a collaborative, volunteer and open manner, to build an open map of the world: OpenStreetMap. Even so, we consider that there is an immense untapped resource of geospatial information.

That resource is represented by the databases of national agencies and institutions that have produced and collected data within national monitoring networks and research projects for an extensive period of time. For society to harvest the benefits of open public geodata in the most productive way, some matters need to be discussed, such as: quality and relevance; different angles of understanding open geodata (public sector, private sector and academia); bridging community-driven data with public data; the impact of the INSPIRE Directive on the open geodata movement; the interoperability of geodata licenses; and technical issues in releasing public geodata as open data.

- Parallel Session A Reports

Aula Magna

Brief (3 minute) summaries from each session, focusing on three questions:

  1. What X is the thing that should be done to publish or reuse PSI?
  2. Why does X facilitate the publication or reuse of PSI?
  3. How can one achieve X and how can you measure or test it?

And the best practices discussed.

- Welcome Reception

Mezzanine Lobby

End of Day 1

Tuesday 17th March

- Plenary Session

Aula Magna

Chair: Heather Broomfield, Difi

Scribes: Martin, Valentina, Phil [notes]
25 minutes including Q&A per speaker

  • Jacek Wolszczak, Ministry of Administration and Digitization (Poland) [slides]
  • Branislav Dobrosavljevic, Business Registers Agency (Serbia) [slides]
  • Szymon Lewandowski, European Commission [slides]
  • Good practices for identifying high value datasets and engaging with re-users: the case of public tendering data, Nicolas Loozen, PwC EU Services [paper] [slides]

- Parallel Sessions B

Come To My Session! Don't know which parallel session to go to? Come to Aula Magna to hear each facilitator describe his/her session in 60 seconds. Don't be late or you'll miss it!

- Coffee

- Parallel Sessions B

Share-PSI Track
Aula Magna

How good is good enough?
A common language for quality?

Facilitator: Makx Dekkers, AMI Consult [paper]. Scribe: Peter W [notes]

This session will look at the requirements and possible solutions for defining, measuring, expressing and communicating quality of published Public Sector Information. It is the intention that the outcome of the session will be submitted to W3C’s Data on the Web Best Practices Working Group to inform the development of the Data Quality vocabulary.

There is a lot of talk about the need to publish “high-quality” PSI. While it is certainly important that data has sufficient quality to make it useful and usable for users and re-users, we currently lack a common or standard way to express what the quality of data is. The question is whether it is necessary to have such a common way, and, if so, what a “quality vocabulary” could look like.

Open Track

The European Database Directive

Facilitator: Freyja van den Boom, KU Leuven [paper]. Scribe: PhilA [notes]

The European Database Directive is the key legal instrument when dealing with various databases of open scientific and raw data. This Directive harmonises the treatment of databases under copyright law and creates a new sui generis right for the creators of databases which do not qualify for copyright.

According to Article 3 of the Database Directive, for a database to receive legal protection, it must be ‘original’, i.e. the author’s ‘own intellectual creation’ by reason of the selection or arrangement of the contents. This level of ‘originality’ is the same as in Article 1 (3) of the Software Directive and Article 6 of the Terms of Protection Directive. Considerable variety exists in the national Courts’ approaches to the requirement of originality. Whether collections of scientific research data will meet the criterion of ‘originality’ is a question that will be dealt with on a case-by-case basis. It depends on the interpretation of each national Court.

If the database qualifies for copyright protection under the Directive, the copyright-holder will hold ‘exclusive rights’ in respect to that data.

Article 5 of the Directive enumerates those ‘exclusive rights’. The author of the work shall have the exclusive right to carry out or authorise:

  1. Temporary or permanent reproduction by any means, in any form, in whole or in part;
  2. Rights of adaptation, translation, arrangement and any other alteration;
  3. Any form of distribution to the public of the database or of copies thereof (subject to Community exhaustion);
  4. Any communication to the public, display or performance to the public; and
  5. Any reproduction, distribution, communication, display or performance to the public of the results of the acts referred to in item 2.

In all Member States of the Union, an exception exists for “all acts, which are necessary to obtain access to the contents of the database and to obtain normal use of the contents by the lawful user”. This also applies to a part of the contents of the database.

Member States are also free to apply four other exceptions to the ‘exclusive rights’ listed above. This list of possible exceptions is exhaustive and is set out in Article 6 (2):

  • Reproduction for private purposes of a non-electronic database;
  • Illustrative uses for teaching or scientific purposes as long as there is proper attribution and justification for this purpose;
  • Public security, administrative or judicial procedure; and
  • Other exceptions traditionally authorised in the Member State.

Note that unauthorized copying for private purposes is not permitted for digital databases.

How has the SGRD directive been implemented in the different Member States? Share experiences.

How do these national differences affect the ability to re-use PSI (across borders)?

What would be best practices with respect to licensing and disclaimers?

These are some of the questions I hope to address during this session.

Open Track

Role of Open Data in Research Institutions with International Significance

Facilitator: Tamás Gyulai, Regional Innovation Agency [paper]
Scribe: Andras [notes]

Szeged has been known in Hungary as a central location for open software development and utilisation: the municipality of Szeged was among the first town administrations in Hungary to use open software in large applications and the University of Szeged is an acknowledged development center of open software solutions in Hungary.

The Regional Innovation Agency (RIA) of the South Great Plain region has been a promoter of innovation in the key thematic areas of the region, including IT development. Several cluster initiatives have been implemented over the years in cooperation with partners in the neighbouring regions, including Timisoara and especially the Tehimpuls Association and its professional partners. One of the most successful initiatives was the Cluster2Success project, where Romanian and Hungarian IT companies met several times with the objective of working out new innovative solutions together.

One of the current challenges that the activities of the RIA focus on is the development of the IT background of the Extreme Light Infrastructure (ELI) project, as it will be a key infrastructure for research and development not only in Szeged but also in the wider region. The experiments that the researchers will conduct at the ELI facility will produce data in enormous quantities that will be analysed and processed by international teams of researchers; therefore the newest and most advanced “big data” software and hardware solutions will be used here.

As the ELI is co-financed by the European Union, the research facility will be a public institution and therefore it will have an open information policy. On the other hand, some of the experiments might lead to patented inventions; therefore the management of information about the research activities must also respect intellectual property rights (IPR) considerations.

The main local stakeholders in Szeged are all committed to the successful implementation and operation of the ELI as a key element of the scientific and economic life in the town. The cooperation among them will also be extended to the sharing of data, with the objective of making an open system that is accessible to foreign partners as well. Consequently, the Share-PSI workshop can be an excellent event for meeting professionals who have already been confronted with similar challenges. It might lead to common solutions that can be designed and tested later in real life in Szeged.


Linguistic Linked Data as a bridge to reach a global audience

Presentation: Asunción Gómez-Pérez, Universidad Politécnica de Madrid and coordinator of LIDER project
Scribe: Felix [notes]

This presentation will introduce the notion of Linguistic Linked Data (LLD): linked data sets that can play a crucial role in making data on the Web multilingual. LLD can help PSI providers to engage directly with users around the world.

We will discuss what LLD data sets are already available, which ones should have a high priority for you, and what needs to happen to make your data multilingual.

- Parallel Session B Reports

Aula Magna

Brief (3 minute) summaries from each session, focusing on three questions:

  1. What X is the thing that should be done to publish or reuse PSI?
  2. Why does X facilitate the publication or reuse of PSI?
  3. How can one achieve X and how can you measure or test it?

And the best practices discussed.

- Lunch

- Parallel Sessions C

Come To My Session! Don't know which parallel session to go to? Come to Aula Magna to hear each facilitator describe his/her session in 60 seconds. Don't be late or you'll miss it!

Share-PSI Track
Aula Magna

Crowd sourcing alternatives to government data – how should governments respond?

Facilitator: Peter Krantz [paper]. Scribe: Jens [notes]

This session addresses the question of how public authorities can/should respond to community efforts to crowd source data that replicates official data that is not open (e.g. post code and address data). The session will start with a brief case study of how crowd sourcing initiatives of post code and address data in Sweden evolved, and the response by agencies. The session will also touch on data quality aspects of crowd sourcing initiatives.

In many areas governments have a monopoly on high quality PSI, typically by regulation for its creation, maintenance and distribution. For types of data that are used in many scenarios, e.g. geodata, there may be a sufficient number of potential users that are excluded by expensive access to government data. In these areas crowd sourcing initiatives may be able to create alternative datasets that compete with those provided by governments. There are already several initiatives, e.g. OpenStreetMap, that are good enough to make even large companies stop buying government data. The outcome of these initiatives may disturb the market for government data while at the same time contribute to lower quality services given the data is not of the same quality as that from government agencies. Governments need to find a way to deal with these issues in a way that serves society, but responses typically include legal action.

Share-PSI Track

Raising awareness and engaging citizens in re-using PSI

Facilitator: Daniel Pop, West University of Timisoara, Yannis Charalabidis, University of the Aegean [paper]. Scribe: André [notes]

Governments have been investing in publishing a considerable amount of data and in the modernisation of administration through e-Government services for, in some cases, more than 5 years. A legitimate question is “What is the impact on citizens, or more generally speaking, of reusing available electronic data and services?” There are different actions, initiatives and platforms that can be used to raise citizens’ (reusers’) awareness of existing PSI and engage them in its usage.

For example, open data hackathons are a widespread tool to raise awareness of data published in portals. Semantically enriched platforms, such as ENGAGE, enable not only reuse but feedback collection as well. What other alternatives for raising awareness of open data repositories have you been using in your case? What are the preferred feedback channels (e.g. social media) in your case?

Public Sector Advertising is also frequently used to raise awareness and engage citizens in re-using PSI. For example, local networks of (interactive) devices (public displays, Smart TVs, Info kiosks etc.) have been deployed by local/regional governments to cover ‘hot points’ at city/region level. Information on these networks is either managed by means of on-premises (locally installed) software packages or they can be operated by Cloud based, Web-enabled platforms, such as SEED.

We are planning to start our discussion by sharing our experiences and the outcomes of two EC-funded projects (ENGAGE and SEED), and we’ll be happy to hear from you what initiatives you put in place and how these met your expectations.

The session will address the following questions:

  1. How can public bodies engage the potential reusers of their data and/or services? What methodologies, channels and technical platforms have been used?
  2. What methods are available to reusers of your data to send feedback about published data?
  3. How do you handle feedback received so as to improve your data?

Share-PSI Track

How benchmarking tools can stimulate government departments to open up their data

Facilitator: Emma Beer, Open Knowledge, Martin Alvarez, ePSI Platform Advisory Board [paper]. Scribe: Heather [slides] [notes]

Open Knowledge published the 2014 Global Open Data Index which shows that whilst there has been some progress, most governments are still not providing key information in an accessible form to their citizens and businesses. With recent estimates from McKinsey and others putting the potential benefits of open data at over $1 trillion, slow progress risks the loss of a major opportunity.

The Index ranks countries based on the availability and accessibility of information in ten key areas, including government spending, election results, transport timetables, and pollution levels. The UK topped the 2014 Index retaining its pole position with an overall score of 96%, closely followed by Denmark and then France at number 3 up from 12th last year.

Francis Maude, Minister for the UK Cabinet Office and responsible for the UK open data agenda, said:

We have called for people to hold our feet to the fire and the Open Data Index is a great tool for doing just that.

In this session the project manager of the Index for 2014 will share some of the successes in stimulating governments to take action to open up further datasets.

After the discussion, and in order to complement this session, Martin Alvarez (Advisory Board at ePSI Platform) will introduce the PSI Scoreboard. This scoreboard is a ‘crowdsourced’ web tool published on ePSI Platform used by the European Commission as reference for their metrics. This is ‘yet another index’ to measure the status of Open Data and PSI re-use throughout the EU. It does NOT monitor government policies, but aims to assess the overall PSI re-use situation in the EU28, including the open data community's activities.

This PSI Scoreboard could be enhanced by including new indicators. One of them could be the level of openness published by the Global Open Data Index for each specific country. Attendees can decide whether this is interesting, and discuss feasible technical mechanisms for doing it (automatically).


Your requirements for reaching a global audience with PSI data

Facilitator: Asunción Gómez-Pérez, Universidad Politécnica de Madrid and coordinator of LIDER project

In this session we will discuss a goal everybody has: your data wants to reach a global audience. If prioritization of data sets takes the current state into account, this aim fails: most PSI data sets are monolingual.

The aim of this session is to understand your priorities: what data sets do you want to be multilingual? What (technical, organizational, other) obstacles do you see in achieving multilingual data sets? What business value and usage scenarios are of high priority for you that would benefit from multilingual PSI?

The outcome of this session will feed directly into activities of the LIDER project, which is building a community around linguistic linked data - an important ingredient for making your data multilingual.

- Coffee

- Parallel Session C Reports

Aula Magna

Brief (3 minute) summaries from each session, focusing on three questions:

  1. What X is the thing that should be done to publish or reuse PSI?
  2. Why does X facilitate the publication or reuse of PSI?
  3. How can one achieve X and how can you measure or test it?

And the best practices discussed.

- Bar Camp

Aula Magna (rooms A01 & 103 are available for the sessions)

Time keeper: Noël Van Herreweghe

Pitch your idea for an afternoon session in 60 seconds or less, then take your group to an available space. Remember to appoint a scribe. Please let Phil Archer know the title of your session as soon as convenient.

  1. Making research data repositories discoverable, Robert Ulrich [notes]
  2. The Pan European Data Portal - Early Wireframes, Philip Millard (via Skype) and Jens Klessmann [notes]
  3. The Critical Success Factors Taxonomy for Open Data, Yannis Charalabidis
  4. Government as a developer (to identify and open data), André Lapa
  5. What do you want from the W3C Data on the Web Best Practices? Phil Archer [notes]

- Bar Camp Reports

Aula Magna

Brief (3 minute) summaries from each session, focusing on three questions:

  1. What X is the thing that should be done to publish or reuse PSI?
  2. Why does X facilitate the publication or reuse of PSI?
  3. How can one achieve X and how can you measure or test it?

And the best practices discussed.

End of Day 2