
Lisbon/Scribe


Tuesday 2nd December 2014 (Project Meeting before Workshop)

Project Meeting Session 1: 14:00 - 15:00

  • Scribe: Benedikt Kämpgen
  • >200 people expected to show up tomorrow.
  • Name game

General

  • Question: Does anyone have a problem with having their picture taken? - No one.
  • Make sure to sign participant sheet
  • Make sure to get Grant Agreement from Philippe

Orientation for tomorrow

  • André Lapa:
  • Shows agenda.
  • There will be a printed agenda for each participant.
  • Some part will be in another building, but not far.
  • Yet: not everything carved in stone
  • There is a Portuguese parallel track

Rooms

  • Having a look at the rooms
  • Room 3 is not available (Phil will update the website accordingly)
  • Rooms 4, 5 and 6 are here in the aisle
  • Registration/logistics questions can be put to the friendly people at the registration desk (AMA)
  • Coffee breaks will be in front of the main room.
  • The Anfiteatro is the biggest room.
  • More rooms available for breakout sessions.

To note during sessions tomorrow

  • Session leaders will be identifiable
  • Session leaders (or one of their peers) shall make a 60-second announcement just before their session.
  • For the 60 seconds, you can use the countdown tool by Phil: http://philarcher.org/tools/countdown/
  • Petar: The announcement should concentrate on: Where is it? What is the session about? Why should you come to the session?
  • Remember: check where your session is
  • Please leave your room as you found it (e.g., if you take chairs out)
  • Timing
    • Paulo Neves is expected to speak for 20min.
    • Time has to be allowed for switching from session to session.
  • Essential: Every session needs to have a scribe.
  • Deciding about scribes at the sessions
    • Have a look through the agenda: choose sessions, check whether they have a scribe and, if not, please tell Phil that you will scribe there. *done during the session*
    • Goal of scribe during each session: We need to be able to include descriptions of each session in reports. Other people should have a fair idea of what has been happening at the session.
    • Noel will be scribe at the session of Amanda.
    • Deidre
    • Harris - Plenary
    • LAPSI - Jan
    • ...

Project Meeting Session 2: 15:00 - 16:00

  • Scribe: Deirdre Lee
  • Phil assigns the scribe for each of the sessions

BarCamp

PhilA: If anyone is pitching a barcamp, please tell Phil so he can add it to the agenda.

PeterW: Idea - who are we addressing, government or civil society?

PhilA: There are two whole barcamps on Thursday evening.

Noel: Will we tell participants about the barcamp at the kick-off of the workshop?

PhilA: It will be mentioned at the kick-off session tomorrow morning.

Introductions

  • Heike from Birmingham City Council; Simon will come later
  • Pekka from Helsinki
  • Gabriele from the Italian Government
  • Valentina from Belgrade, Serbia
  • Anrej from Stachi, Budapest
  • Steinar from Oslo, Norway

Grant Agreement and Consortium Agreement

Philippe: Sort out some things before the review. The Grant Agreement is finally settled and the Consortium Agreement is nearly settled. The Grant Agreement is between the consortium and the EU; there is one copy of the Grant Agreement for each of us to take back and sign. The Consortium Agreement is between all participants, not the EU; we have a near-final version now. Philippe has 19 scanned copies, but he needs the original signed pages from all of us. This process will be closed in a couple of months; they need 40 original, signed versions.

Dinner

Andre announced dinner arrangements for tomorrow (Wed) night. Details here: https://www.w3.org/2013/share-psi/wiki/Lisbon/Social#Wednesday_AMA_Dinner

Innovation Survey

PhilA: The Project Officer wants us to discuss the innovation survey.

Philippe: It is not clear what is understood by the questionnaire - should it be completed by project participants or by the reviewers? It is 3-4 pages, describing the innovation angles of the project.

Muriel: Innovation is something that wasn't here before, but will be here after - under the development of best practices and under the development of networking.

Jens: Networking is social innovation.

PeterW: The name game is an important element of innovation. At ICEGOV, everyone went home and won't speak to each other afterwards. Compare that with innovation in terms of social networking.

Benedikt: The last workshop was on Open Data and Innovation.

PhilA: Thanks for the important information; we'll get the reviewers to fill this in.

… We are meant to be developing technical standards; yet looking at the topics we want to discuss, none of them are technical, and this is a problem. But the vehicles are W3C technical groups (the DWBP and OGC WGs). How do we get out these non-technical outputs of the group?

… We also need to talk about Timisoara, Krems and the final phase of the meeting.

Admin Update

Philippe: now has a few more consortium agreements. Before you leave, Philippe will give us our copy of the Grant Agreement. The project does not fill in the questionnaire, the panel will do that, but the consortium should help them figure out what the best answers to the questions are.

New Attendees

  • Jochann from Austria
  • Neven from Zagreb
  • Jose-Luis and Pedro from ULL, Tenerife

PhilA: At the last count, 230 people are coming.

Capturing Group's Outputs

Task: we have to contribute to standards. There is a group, the W3C DWBP WG; the idea is that we feed them with bits of information. However, they are separate - they are not bound to take what we say, but we are bound to review their content. The scope of this group is enormous, so we wrote a Use Cases and Requirements document, with 33 use cases from a variety of domains. Share-PSI could review the requirements in the UCR document. Policy is out of scope for W3C. The W3C DWBP WG only cares about: 1. Is it unique to publishing data on the Web? 2. Does it encourage reuse or publication of data on the Web? 3. Is it testable? A lot of use cases from Samos are out of scope because they do not meet these criteria. But there is one Share-PSI use case with an overview of all the requirements that came from the Share-PSI Samos papers.

PhilA: Shows an example W3C DWBP Best Practice template. But this is only a techie output - how do we preserve all of the rest of the outputs of this group? In many …

Localised Guides

Noel: opendataforum.info - the strategy for Flanders: strategic, content, legal, technical. Got the inspiration from the Samos workshop and W3C. 10,000-20,000 unique visitors. Going to translate it into English.

W3C DWBP WG

PhilA: If we have a national Open Data strategy, refer to the Share-PSI group - refer to the reports and use cases. That is how the Share-PSI work gets recognition. How do you think we as a group should coordinate this? PhilA would like to propose that we have a meeting in the last 6 months. We need to produce our localised guides in our local languages. Deirdre will write to you to have a look at the DWBP requirements and agree whether, technically, they are the right things.

Capturing non-technical outputs from group

We need some way to capture non-technical best practices.

PeterW: What we need to do is develop a pattern language. What is the design/best practice? We should also have worst practices. When these best practices are put into play, what are the other problem spaces that are then faced? You can also begin to write best practices programmatically in a consistent way - in a computationally processable form as well as easy text outputs (see the sketch below). We'll then have a standard vocabulary for discussing elements of the domain.

Ingo: How do we handle multilingualism if we have a common vocabulary for these localised guides?

PeterW: We can have language labels; the same problems exist across many countries.
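
As a rough illustration of what a computationally processable best-practice "pattern" could look like, here is a minimal Python sketch; the field names and example values are illustrative assumptions, not an agreed Share-PSI vocabulary.

    # Minimal sketch of a machine-processable best-practice pattern record.
    # Field names are illustrative assumptions, not an agreed vocabulary.
    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class BestPracticePattern:
        identifier: str
        title: Dict[str, str]                 # multilingual labels keyed by language tag
        problem: str                          # the problem the practice addresses
        solution: str                         # what should be done
        worst_practice: str = ""              # the anti-pattern, if known
        follow_on_problems: List[str] = field(default_factory=list)
        related_patterns: List[str] = field(default_factory=list)

    example = BestPracticePattern(
        identifier="machine-readable-licence",
        title={"en": "Attach a machine-readable licence",
               "nl": "Voeg een machineleesbare licentie toe"},
        problem="Re-users cannot tell automatically how a dataset may be used.",
        solution="Publish the licence alongside the data in a machine-readable form.",
        related_patterns=["open-licence", "dataset-metadata"],
    )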

Project Meeting Session 3: 16:00 - 17:00

Scribe: Ingo

232 registered

admin aspects before review

grant agreement provided

consortium agreement delivered

SharePSI contribution to standards

We should feed W3C group: "Data on the Web Best Practices <http://w3c.github.io/dwbp/usecasesv1.html>"

We are bound to review what is developed by this group

Collecting use cases (~30) to derive requirements (~40), which are now in focus

Iterative process

SharePSI should check the requirements: anything missing? wording unambiguous? Do you agree on requirements?

Localized guides: Remember that local best practices need to be produced during the last months of the project.

Those reports can use all material from SharePSI

Coordinating this work? Do we need to get together (during the last six months) to make sure we have a coordinated set of localized reports at the end?

W3C handles only data on the Web

OGC handles data with geospatial content

Discussion on how to improve the Best practices documents:

We need a way to capture non-technical best practices

we need design patterns (or pattern language) to describe our best practices; patterns help to improve reports!

patterns should allow computational processing

vocabularies are important, should handle multi-lingual aspects

LAPSI runs two thematic networks handling legal aspects (e.g. copyright); this still needs to be considered by us!

Original three questions to identify Best practices that should be captured and described:

Is the Best practice relevant to PSI? What requirement does it solve/meet? What is the original problem?

Does it encourage reuse or publication of PSI?

Is it testable? Where and how is it being implemented? Is it replicable?

Three new questions that should be answered by each session lead at the end (from a data publisher/data consumer perspective):

What X is the thing that should be done to publish/reuse PSI?

Why does X facilitate the publication/reuse of PSI?

How can one achieve X and how can you measure it?

At the end of each session, there should be an understanding of Best practices relevant to topic under discussion. Make sure our work here produces some valuable outcome for future best practice reports on open data!

Scribe: Daniel

Daniel Pop presents the 3rd workshop, Timisoara (topics, aims, venue).

Discussions on Timisoara Agenda preparation: should we start Monday morning or afternoon (Andras)?

Some (Andras, Johann, Max, Benedikt, etc.) prefer Monday noon. Phil is trying to reach a consensus on the agenda, so that no strong objections are raised. For this, he proposes to use Monday evening for Open Data speed dating or a similar event. The following agenda is proposed:

Timisoara Dates & Times

Monday, 16 March 2015

  • Project meeting in the morning
  • Lunch (provided by UVT)
  • Workshop starts after lunch (14:00)
  • Evening: social event

Tuesday, 17 March 2015

  • Workshop full day

Wednesday, 18 March 2015

  • Project meeting in the morning
  • Everything ends at noon.

Coffee breaks to be provided.

No other alternatives to the above agenda were proposed. In the end there were no strong objections to the above agenda, which was accepted by all participants.

Johann (DUK) gives a short presentation about the 4th workshop, Krems:

  • Dates: 21-22 May 2015; project meeting on 20 May
  • Topic: A self-sustained business model for open data
  • Co-located with CeDEM conference
  • Details about the Venue
  • 'Writing session' proposed to take place in Krems; no decision taken
  • Draft agenda presented
  • Discussions about a project meeting on Saturday morning; participants voted

Final agenda for Krems:

Wednesday, 20 May 2015

  • Project meeting in the evening

Thursday, 21 May 2015

  • Workshop full day

Friday, 22 May 2015

  • Workshop full day

Saturday, 23 May 2015

  • Project meeting in the morning

Next, Philippe distributes signed copies of Annex III of GA.

Jens (FOKUS) briefly presents the 5th workshop in Berlin nov/dec 2015

Discussions on the dates of the workshop; some participants announced constraints on the dates

If in November, it may be co-located with the 2nd period project review (Philippe)

Final decision: the 5th workshop will take place in the last week of November (24-27 November 2015); no objections to the period were raised.

Ideas for topics (Jens)

German participation, co-located events ideas presented by Jens

Decision is to finalize the topics for Berlin during Timisoara workshop.

Wednesday 3rd December 2014 (Workshop Day 1)

Welcome: 09:00 - 09:25

Phil welcomes Paulo Neves, AMA President

Paulo starts by presenting his organisation AMA (under the Presidency of the Council of Ministers, AMA is the public body responsible for the national public services delivery strategy, for administrative simplification and for e-Government in Portugal) and its main objectives for e-government. “It is a great honour to have you all here. Open data is everywhere and all want to be part of this evolution”.

“It is expected that open data will have great impact at the economic, political and research levels.” He also said that the open data platform for Portugal has already been launched, and that institutions gave their data very quickly, which was something that amazed them - the willingness of the institutions. They are not sure about the number of apps developed and they need to perform impact studies to get insights and design the next steps. Four discrete messages from Paulo were highlighted:

  1. It is not sufficient just to make data available - you have to build communities around the data for using and re-using it. That is the way to learn which data are most valuable to publish; in other words, it will be proof of which information is most essential to open.
  2. Guarantee the quality of the published data. Maintaining the quality of data is a difficult task, but it has to be ensured in order to create value from it.
  3. Opening, making available and publishing data has a low priority among politicians, since they have to deal with problems that are crucial to citizens' lives.
  4. The objective for government is to provide OD by default as part of its daily processes.

Their participation in the SHARE-PSI 2.0 workshop highlights the government’s target: the mobilisation of national resources, which are already present at this event. Coming back to Phil,

  • he mentioned the translation of speakers from Portuguese to English and vice versa.
  • he presented himself and his organisation, W3C, its scope and objectives,
  • and continued by presenting the scope and objectives of the SHARE-PSI 2.0 project.
  • he acknowledged the main points of Paulo’s speech.
  • he explained the structure of the workshop and that the sessions are not about presentations - the main aim is active participation and knowledge sharing among participants.
  • Finally, Phil called on the facilitators to explain their sessions in one minute and invite participants to discuss their topic.

The 1st round of parallel sessions is included in the programme for Lisbon on the project’s site:

http://www.w3.org/2013/share-psi/workshop/lisbon/agenda

Parallel Sessions A 09:30 - 11:00

Events, hackathons and challenge series - stimulating open data reuse

Facilitators: Amanda Smith, ODI & Simon Whitehouse, Digital Birmingham, Alberto Abella & Emma Beer, OKF.

Scribe: Noel van Herreweghe notes

Summary from Si Whitehouse:

Thanks everyone who attended and contributed towards the session

Looked at experiences of people involved in different types of events or programmes to stimulate commercial development of Open Data.

Most people had been organisers of such events while some had been participants and product developers. Organisers were mainly interested in making the most of their events, with a few having had disappointingly little result from them. There was also a question raised about how much the public sector should be involved in stimulating commercial open data work.

One thing that I took away from the session was that the incentives and motivations for attendees and organisers overlap, but they don’t necessarily coincide. People might attend your hack to meet fellow developers and exchange ideas without any intention of building something, for example.

Another learning point from Apps For Europe came from Alberto, who said that investors do not care about open data. Do not expect them to consider this when they choose to invest, they will want to know what they are likely to get in return for their investment.


What X is the thing that should be done to publish and reuse PSI?

With the exception of talks, both hackathons and challenge series such as ODCS and Apps For Europe look to increase the publication and reuse of PSI. They demonstrate proofs of concept and commercial projects, and so show people within organisations that open data has value, giving it internal credibility. Projects will actively work to get data released on behalf of participants.

Why does X facilitate the publication or reuse of PSI?

It does this through practical reuse. We had the example of one participant, the Open Geospatial Consortium, who used hackathons to test the robustness of their newly released open data. Most others were hoping to initiate and support a community around their data.

One example we had was ALISS, A Local Information Support System, which crowdsources community assets. It’s a project that started with a “hack”, but a hack that featured clinicians, people living with long-term conditions and somebody who happened to be able to code.

We had a discussion about how desirable it is to have domain experts and potential users involved in hackathons. It is remarkable that hacks don’t have that strong a link to co-production, and they could benefit from doing so.

How can one achieve X and how can you measure or test it?

One measurable learning from Apps For Europe was that creating an environment, in their case the Business Lounges where investors could meet projects, was unlikely to be successful. Feedback from investors was that they have their own methodologies.

Our workshop was about measuring and understanding what sort of results you can expect from different types of intervention. We didn’t draw up the chart that we had intended to, but I’d say that the way to measure results is to state your aims when you start.

So, if you said that you were going to support start-ups through your programme, did you? If you're running a hackathon, then proofs of concept developed or even just the number of slices of pizza eaten might do.

LIDER Track - Multilingual PSI data on the Web

Felix Sasaki, Asunción Gómez Pérez

“What”: Technology needs

“Why”: (Commercial B2B or end customer) applications

“How:” challenges

  • Entity detection in context
  • Cross-lingual linking
  • Translation
  • Converting standard non-linked vocabularies to linked data, relying on standard linked data vocabularies
  • Create curated data
  • Aggregation
  • (Meta)data cleansing and enriching
  • Language detection
  • Data curation for others
  • Translation a): Guess meaning roughly
  • Translation b): High quality translation
  • Availability of mature tooling
  • Tooling in the right workflow
  • Tool integration (e.g. “hiding SPARQL from users”; see the sketch after this list)
  • Processing of mass data
  • Support of small languages
  • Handling legacy data
  • Precision of tooling
  • Dealing with uncertainty of processing output (e.g. “Is this an entity of class X?”)
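
As an illustration of the “hiding SPARQL from users” challenge listed above, here is a minimal Python sketch (using the SPARQLWrapper library) in which the query sits behind a plain function; the endpoint URL and the use of rdfs:label are assumptions made for the example, not part of the LIDER track material.

    # Minimal sketch of hiding SPARQL behind a plain function.
    # The endpoint and the label property are illustrative assumptions.
    from SPARQLWrapper import SPARQLWrapper, JSON

    ENDPOINT = "https://example.org/sparql"   # hypothetical PSI endpoint

    def labels_for(entity_uri, lang="en"):
        """Return the labels of an entity in the requested language."""
        sparql = SPARQLWrapper(ENDPOINT)
        sparql.setQuery(f"""
            SELECT ?label WHERE {{
                <{entity_uri}> <http://www.w3.org/2000/01/rdf-schema#label> ?label .
                FILTER(lang(?label) = "{lang}")
            }}
        """)
        sparql.setReturnFormat(JSON)
        results = sparql.query().convert()
        return [b["label"]["value"] for b in results["results"]["bindings"]]

    # Example call (hypothetical URI):
    # labels_for("http://example.org/id/ministry-of-finance", lang="pt")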

What do licenses that promote and don't hinder reuse look like?

Facilitators: Antigoni Trachaliou, National Documentation Center and Leda Bargiotti, PwC EU Services

Scribe: Jo Ellis

Leda Bargiotti outlined the work PwC had done on licensing, in terms of a survey and a number of workshops. There is crossover with the LAPSI 2.0 project's Licensing Guidelines. A problem with open licensing - the OGL was mentioned, though it is not limited to that - relates to re-user liability. A specific point concerns third-party IP inadvertently released under an open licence: who is liable, the data provider or the re-user? Re-users want a way to be exempt from liability if they re-use openly licensed data that isn't actually open data. The example given was the UK National Address File, which is used in many openly licensed datasets. The IP is owned by the Royal Mail, but this is not always made clear - the history of the data is unknown to both re-users and data providers in some instances.

  • Responsibility of the data provider to make sure they have the necessary IP rights in third party data.
  • Could this be dealt with by means of a copyright law exception?
  • Good metadata includes the source and a machine-readable licence - JSON and the Open Data Rights Statement Vocabulary, as recommended by the UK ODI (see the sketch after this list)
  • Licences cannot solve all the problems. Good IP management combined with education and raising awareness essential.
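
As a rough illustration of the machine-readable licence point above, here is a minimal sketch of a dataset record carrying an ODRS-style rights statement, produced as JSON from Python; the property names follow the Open Data Rights Statement vocabulary only loosely and all URLs are placeholders.

    # Minimal sketch of a machine-readable rights statement attached to a
    # dataset record. Property names loosely follow the ODRS vocabulary;
    # URLs and values are placeholders, not a verified serialisation.
    import json

    dataset = {
        "title": "National Address File (sample)",
        "source": "https://example.gov/datasets/addresses",   # provenance of the data
        "rights": {
            "dataLicense": "https://creativecommons.org/licenses/by/4.0/",
            "contentLicense": "https://creativecommons.org/licenses/by/4.0/",
            "attributionText": "Contains public sector information, example.gov 2014",
            "attributionURL": "https://example.gov",
            "copyrightNotice": "(c) example.gov 2014",
        },
    }

    print(json.dumps(dataset, indent=2))   # what a re-user or harvester would fetch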

Leda's Summary

  1. What could licences do to promote the reuse of PSI?
    • Limit the number of licences, allow commercial reuse
    • Ensure interoperability between licences
    • Be sure that what you licence as open data does not include third-party rights
    • In cases where copyright clearance is improper, re-users' liability should be limited
    • Attach licence to datasets and make licences machine readable as well
  2. Why does licensing facilitate the publication or reuse of PSI?
    • Legal certainty
    • Increase legal interoperability
    • Lower costs
  3. How can one achieve open data licensing and how can you measure or test it?
    • Increase awareness
    • Measure the viability of the licence by looking at its popularity
    • You can test it in Court

FINODEX

Primary notes


18 people attending. Attendance is a mix of people working in research centres and SMEs. FINODEX is a European Commission funded project that supports the creation of new businesses based on FIWARE and open data. Two open calls will be issued to partially fund and support the development of businesses that reuse open data as well as the FIWARE platform funded by the European Commission. An open call has opened and will close on 19 December 2014. It is expected to support 50 projects over several phases of acceleration. It will provide funding and support for the development of business services, including, for instance, connections to investors.

  1. What X is the thing that should be done to publish or reuse PSI?
  2. Why does X facilitate the publication or reuse of PSI?
  3. How can one achieve X and how can you measure or test it?

X is FINODEX and acceleration programmes for startups reusing open data in a wider sense

1. WHAT

  • Accelerators on open data offering funds and services (training) attract a good number of companies that have never heard about open data before. They are a great tool to stimulate demand with a commercial use.
  • Supporting the reuse of open data by providing technological infrastructure (such as FIWARE in the case of FINODEX) can help start-ups with their hardware needs, thereby lowering infrastructure costs.
  • Also, having single access points to different open data catalogues is really appreciated by SMEs.

2. WHY

  • Because FINODEX and accelerators have concrete results that can prove the real impact that PSI can have on the economy - extracting conclusions such as which are the best/worst sources for commercial reuse, which domains are better able to add value to PSI, etc. Publishers may raise questions about the impact their PSI portals have. Accelerators can be seen as a source of information to guarantee a feedback process between re-users and publishers, stimulating the generation of new and better PSI.
  • Having working IT platforms available can help the work of re-users.
  • Lowering the effort needed to find the appropriate data for reuse.

3. HOW

  • Having a number of KPIs to monitor issues such as the kinds of datasets reused, types of products/services, funds raised from investors, ...

Plenary Session: 11:30 - 13:00

  • Room: Auditório
  • Facilitator: Phil Archer, W3C.
  • Scribe: Harris Alexopoulos, University of the Aegean Extra notes

João Vasconcelos, AMA

Why is democracy a better political system? There are many answers of equal importance (rights, voting, public value, representation, responsibilities and freedom), but one detail makes all the difference, and that is checks and balances. He continued with some assumptions:

  1. Openness means more responsibility – exposure, accountability – but it also means more strength
    • How do our citizens see the public sector? They see it as a complex, difficult to understand and overly bureaucratic story when it comes to dealing with OD.
    • When we have too many LOCKED INSTITUTIONS in terms of data availability – where is the transparency and where is the participation?
    • How can citizens trust their governments if governments don’t trust their citizens? We can use ICT as an enabler to open up our governments.
    • and he continued by mentioning the 3 pillars of Open Government: (a) transparency, (b) participation and (c) collaboration
  2. A more open public administration also means a stronger and more focused public administration.
    • Is Open just GEEK stuff? Not really, since 28 billion euros is at stake in OD sharing and reuse.
  3. Openness is also a matter of economics. An Open government is also a smart government.

What’s happening in Portugal? The open data portal http://dados.gov has been created, bearing in mind the solutions already developed in the domain.

A presentation of the portal took place. DADOS.GOV is considered a “broker”, providing a shared service to be used by several platforms/public websites.

  • In line with the Citizen Shops, the Citizen and Business portals are our main online interface
  • The Business Portal is the national point of single contact within the Services Directive
  • We now have around 500 datasets from varied public organisations, such as the Institute of Registries and Notary, the Weather Institute, the Portuguese Environment Agency, the Lisbon City Council and the national foundation for scientific computation.

New and upcoming features were presented: a new way of having information about public services ... and he continued with other Portuguese open initiatives:

  • Public software portal
  • Open Standards (Following the approval of a Law and a resolution of the council of Ministers about open standards, the Portuguese public administration is now adopting interoperable formats)
  • Open Source Software (the national budget law says that all proprietary software purchases have to be justified considering a free/open alternative)
  • The PSI Directive transposition gives us the opportunity to set the national agenda on open government, namely open data.

He finally concludes on the following:

  • Transparency and participation should be part of the processes.
  • Openness can be more easily implemented in new processes.
  • Opportunity to jump stages and create a smarter government.

Questions followed from the audience:

Q: How can you identify the specific barriers users face in using open data? A: It is not sufficient to simply publish data. The value lies in reuse. The best thing, the next step in this direction, is to bring the commercial partners into this arena - bring together the entrepreneurs and the public bodies that need applications to be developed for their scope.

Beatrice Covassi, Deputy Head of Unit, Data Value Chain, European Commission DG CONNECT

See the Revised version of these notes

Noël Van Herreweghe from CORVe

On October 3rd 2014, the Flemish government in Belgium organised the third edition of the “Open Data Day in Flanders”. The focus of the two previous editions was mainly on the supply side. This year the organisers decided to put the spotlight on the users of the data and information supplied by the Flemish government: developers, individuals, entrepreneurs and others who build apps and web applications with this data, which in turn create economic and social value. Almost 250 CEOs, CIOs, project managers, developers and other stakeholders got together in Brussels for this yearly event. 24 national and international speakers saw this as an opportunity to present their projects and applications to an appreciative audience. The organisers of this event are also active organisers of and participants in the European Share-PSI 2.0 project. They saw this as an opportunity to work towards the aims of the second workshop, to be held in Lisbon on December 3rd and 4th: “Encouraging the commercial use of open data”. They asked the Open Data community to tell them what their expectations and recommendations were with respect to things such as the relevance of defined open data policies, the availability of data feeds, standards, challenges, opportunities etc. Participants at this event also got the opportunity to attend a “DataDive” workshop where data owners (the entities of the Flemish government and local government) and data users (interested developers, businesses, organisations and designers) got together in a constructive dialogue to discuss challenges and opportunities with regard to the commercial use of open data.

A top-down approach - content-wise, technically and legally. Open Data Day in Flanders 2012, 2013 - the focus was on the supply side. Licences. A portal with 2,000 datasets. Uptake was poor at best. 2014 - what was wrong? How will they use that data? The SHARE-PSI project is the opportunity to answer these questions. OD Day 2014 - the theme was business: GEO, mobility, who wants to start a business on OD and how can we help you? The DataDive workshop brought people together in a constructive dialogue.

Conclusions from the round tables:

  • "Government" does not exist. “Government” is often a collection of non-cooperating entities.
  • “Open” data is not “free” data. Re-using the (open) data comes also with a cost such as cleaning the data, development costs, conversion and integration and maintenance costs.
  • Open data often comes without the supplier side taking any responsibility such as a stable service requiring 24/7 operational stability and an SLA based agreement or contract.
  • “Open” for business will require continuous supply side investment in infrastructure and services; is government ready for this?
  • Opening up data is no guarantee that it will be picked up by the market.
  • We would like to see a consultation model between the different stakeholders and users. Very often there is a discrepancy between the supply side and users: providers build data from an internal logic, often having no idea what impact that data could have outside. Users/developers use the data as part of their individual business model, often without knowing exactly the source of the data, and with no view of the future roadmap of the supply side.
  • Government needs to pay attention to things like frequency, the right communication and the quality of downloads in order for businesses to run with the data and build a business on top of the data.
  • We’d love more data, but we will only go for it if we can realize a stable business model with this data.
  • The combination/linking of datasets creates added value, not only from government, but also from private companies and NGO’s.
  • There must be a balance between price and quality, accessibility and offerings.
  • Stimulate and integrate, be clear, long live JSON, think with us, use open standards.
  • Research journalists need a firm commitment from government w.r.t. open data; the data needs to be reliable, from confirmed authentic sources, easy to find, well indexed, easily accessible, using open standards and as much as possible free of charge.
  • New Public Management causes fee maximisation, as the civil servants see themselves as responsible for the income of their relevant PSB (Public Sector Bodies).
  • Some PSBs are ready to destroy established companies to increase their revenue, they see commercial reuse as competition.
  • Many PSBs do not really see the benefits of the PSI directive, they believe in commercialisation, the selling of their data.
  • There is a lack of economic expertise within the PSBs. Sales revenue, profits and other elements of balance sheets are mixed up.
  • PSBs are ready to defend their fees even if it is obvious that these fees contradict European provisions and judgements.
  • PSBs deliberately mis- and/or disinform decision-makers and politicians.
  • Politicians back up their PSBs as long as possible, especially if they generate revenues.
  • Neither politicians nor civil servants really understand the economic background of PSI.
  • Politicians also prefer short-term revenues to long-term economic development, even if their programmes read differently.
  • We believe in the strength of our own closed data. The integration of open data and closed data creates added value. What is important is to collect data from a variety of sources, make that data consistent and enrich it, thereby creating additional services (e.g. mapping, routing, ...).
  • OD standards are fine, but more important are transparent user conditions, well thought out pricing models and quality data
  • Public Private Partnerships need to be on the agenda at government level.
  • There are few stable standards, there are lots of standard dialects, standards are interpretable and data handover is often accompanied by loss of quality.
  • The data and context needs to make sense.
  • The privacy issue needs to be looked at from different angles.
  • A sustainable business model is critical when dealing with open data. One doesn’t stand out as a business with just open data and open software, the added value is in the 'integration' of different data sources.
  • We prefer “stable” data to “more” data, and “quality” data to “more” data fields.
  • Better communication between suppliers and customers will be a win-win for both parties.
  • When an open data entrepreneur builds a business one gets investment, commitment, validation and insight, as well as economic growth and new jobs.
  • Government has to be patient, it may take a while for entrepreneurs to build products.
  • Follow the ODI model for incubation.
  • Run innovation competitions.
  • Seed fund for specific outputs.
  • Be glad for businesses to get rich from open data.
  • Changes only take place when external pressure grows.
  • It is very difficult to get a consistent overview of publicly available data feeds.
  • There are too many different interpretations of the applicable legislation.
  • We see many inconsistent pricing models, sometimes contradictory.
  • Data integration remains a time-consuming and therefore expensive matter for the integrator.
  • Free market means free movement of services and products, which allows us to build information services with added value.
  • Businesses shouldn’t need to be in competition with the data source holder.
  • Respecting privacy rules is very often very difficult.
  • We need access to reusable raw data, but data which has been defined for us in its context.
  • Give us quality data without size restrictions.
  • We need transparency w.r.t. user rights and restrictions, and stable licence models.
  • Define creative pricing models, for example “pay per use”, and listen to the market and the customer.
Questions from the audience:

Q1: Do you have real data on ROI based on the data you have released? A1: No idea. We do not know.

Q2: Do you have any specific actions on how? A2: Round table sessions, making sure they understand OD.

Q3: Ecosystem. Did you find any civil society partner - could you find a big enough partner proving impact on the value of publishing? A3: NGOs, commercial associations at national level. Belgium: one country, 3 mapping systems.

Pitch your session after lunch - 1 minute Speeches

The 2nd round of parallel sessions is included in the programme for Lisbon on the project’s site: http://www.w3.org/2013/share-psi/workshop/lisbon/agenda

Wed 14:00 - 15:10 Parallel Sessions B

Open Data Startups: Catalyzing open data demand for commercial usage

Facilitator: Amanda Smith & Elpida Prasopoulou, ODI; Martin Alvarez-Espinar, CTIC. Scribe: Jan Kucera notes

Steps to a suitable redress mechanism

Facilitator: Cristiana Sappa (KU Leuven, LAPSI coordinator) Scribe: Maria Magnolia Pardo, Murcia University notes

Open Data Business Model Generation

Chair: Clemens Wass, openlaws.eu, Fatemeh Ahmadi, Insight Centre for Data Analytics

Scribe: Leda Bargiotti

The session discussed sustainability of projects, initiatives, business activities related to open data.

The development of a business model is fundamental to ensure the sustainability of activities based on open data.

It is important that governments also understand what the business models around their data are, without having to know what re-users do with the data, because too much interference from governments may have negative consequences. What governments need to know are the needs of re-users.

Public sector bodies should guarantee access to datasets as well as their availability over time, and provide the framework for open data. If the framework is not secure, nobody will start a business.

Public bodies can be a crucial player in the market, providing an almost free service, and they might be a competitor to private businesses. For example, publishers in Austria complained when the government made legal data available for free. The agreement found between the two parties was that the government would provide the basic information while the private companies would provide added-value services based on this basic information.

A problem that companies face is the ability to estimate the value for a particular market. Entrepreneurs would currently invest in activities based on open data, but they cannot evaluate the value created by them. There is a need for best practices for entrepreneurs.

Business models may vary depending on the data market and on the stakeholders related to it, as well as on whether you are a government, an established player or a start-up. Currently, 15 different business models exist.

Business models depend on the market you are operating in; for example, legal information has different dynamics compared to traffic information. With regard to traffic data, the government has been pushed out of it. With regard to legal information, the government realised that opening up data should be a public task, and here the government not only opened up the data but also created added-value services. Ultimately, the value proposition depends on who provides the data and who is using it.

The purpose of a business is to create value, and to make it sustainable you capture some of that value. A business model that leverages open data should: transform a public resource into a source of competitive advantage; identify how to package this source in a compelling way for users; and identify ways to get value out of it to sustain the activity.

Nowadays, it is possible to get open data for free. The challenge for companies is to add value on top of it for the public, plus value for the company to make it sustainable.

The session concluded by discussing who profits the most from open data and whether open data represents a threat to business: start-ups are first movers, but it may happen that if they are successful they are taken over by bigger companies. Governments also gain from open data thanks to the reduction of costs.

Model-Driven Engineering for Data Harvesters

  • What X is the thing that should be done to publish/reuse PSI?
    • Metadata harvesting is one of the key processes when it comes to increasing the amount of data available through Open Data portals. In addition, the discussion showed that some Open Data portals also rely on providers pushing data to the portal over APIs.
    • Data publishing in RDF is the ideal solution. However, this is a difficult goal to achieve. Hence, the harvesting should be intelligent enough to facilitate the transformation between metadata formats and to enable the automated import of large amounts of data into the portals (see the sketch after this list).
    • Model-Driven Engineering (MDE) for data harvesters is one possible solution to enable the development of intelligent harvesters, which can be used on different platforms and with different programming languages.
  • Why does X facilitate the publication/reuse of PSI?
    • Automated harvesting and MDE-based harvesters would enable the timely availability of Open Data on the corresponding portals and would increase the quality of the provided data. In that way, publication and reuse would be encouraged and facilitated.
  • How can one achieve X and how can you measure it?
    • Difficult to measure in general. Easily measurable metrics such as “number of apps” and “number of portal visits” can give good hints.
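
To make the harvesting and metadata-transformation idea above concrete, here is a minimal Python sketch that pulls dataset metadata from a CKAN-style portal and maps it onto DCAT-like keys; the portal URL is a placeholder and the field mapping is an illustrative assumption, not the implementation discussed in the session.

    # Minimal sketch of a metadata harvester: fetch CKAN-style package
    # metadata and map it onto DCAT-like keys for import elsewhere.
    # Portal URL and mapping are illustrative assumptions.
    import requests

    SOURCE_PORTAL = "https://data.example.org"     # hypothetical source portal

    def harvest(rows=100):
        resp = requests.get(f"{SOURCE_PORTAL}/api/3/action/package_search",
                            params={"rows": rows}, timeout=30)
        resp.raise_for_status()
        for pkg in resp.json()["result"]["results"]:
            # Map CKAN package fields onto DCAT-like properties.
            yield {
                "dct:title": pkg.get("title"),
                "dct:description": pkg.get("notes"),
                "dct:license": pkg.get("license_id"),
                "dcat:distribution": [r.get("url") for r in pkg.get("resources", [])],
            }

    # for record in harvest(rows=10):
    #     print(record["dct:title"])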

Making your PSI data multilingual and interoperable

Felix Sasaki, Asunción Gómez Pérez

Question: “What X is the thing that should be done to publish or reuse PSI?”

Publish data that has quality and resolve semantic conflicts before publishing. Publish in a standardised, linked data format using ontologies and established terminologies. Reuse by combining general data and metadata with domain- and application-specific data and metadata. Take the context of the data into account. Think about how to represent multilingual information attached to the terminology to provide more information or to link with a language resource.
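
As a small illustration of attaching multilingual information to a published terminology, here is a sketch using SKOS labels via the rdflib library; the namespace, URI and labels are made up for the example.

    # Minimal sketch: multilingual SKOS labels for a published term.
    # The namespace, URI and labels are illustrative assumptions.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import SKOS

    EX = Namespace("http://example.org/psi/term/")
    g = Graph()
    g.bind("skos", SKOS)

    term = EX["ministry-of-finance"]
    g.add((term, SKOS.prefLabel, Literal("Ministry of Finance", lang="en")))
    g.add((term, SKOS.prefLabel, Literal("Ministério das Finanças", lang="pt")))
    g.add((term, SKOS.prefLabel, Literal("Finanzministerium", lang="de")))

    print(g.serialize(format="turtle"))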

Question: “Why does X facilitate the publication or reuse of PSI?”

Rich metadata facilitates the discovery and use of PSI across languages.

Question: “How can one achieve X and how can you measure or test it?”

We are not in a position to measure or test this. We provide best practices and guidelines about data publication.

Wed 15:40 - 16:45 Parallel Sessions C

Open Data Economy: from ‘Wow’ to ‘How’

Facilitator: Michele Osella, Istituto Superiore Mario Boella Scribe: Lorenzo Canova notes

Access and Accessibility for Data

Facilitator: Linda Austere, Providus Scribe: Maja Lubarda notes

Roadblocks in Commercial Open Data Usage

Chair: Ingo Keck (Centre for Advanced Data Analytics Research) Presentation

Scribe: Martin Alvarez-Espinar

This was a practical session with 13 attendees working to develop a business plan.

Ingo: This session aims at defining a business plan and finding the barriers and risks we may face.

... Explaining what a business plan is.

… Is your business unique? How are you going to get the money? How big is your audience?

… Let's gather some business ideas:

[everyone]

  • publishing scientific data
  • Know your neighbourhood
  • Resources around you (public spaces, tablespaces)
  • More accurate business plans
  • Data quality services (cleansing information, metadata, curation)

All attendees working in groups (after choosing a topic each) trying to answer these questions :

  • What is the need the business fulfills?
  • What is the market?
  • How is your business unique?


Group 1: Data quality services (cleansing information, metadata, curation) Identifying errors in data, provide a view on data for domain experts.

  • What is the need the business fulfills?
    • The public sector has overheads in producing good quality open data and doesn't have the resources to do it; data users need standardised data, and businesses that want to use open data currently have to put in the effort
  • What is the market?
    • Focus on one domain and build specialist knowledge, potential customers: European Commission, investors that want to have European comparable data, companies that need the domain specific data for re-use and need to outsource the data integration/quality work
  • How is your business unique?
    • There is competition, open data and licensing knowledge, specialist European knowledge, e.g. for investor service it needs in-depth country and language, culture knowledge, standards knowledge

2.0 FTE to be allocated for this business.

What could be wrong? Risks:

  • Availability of data
  • Demand of data
  • Poor Data Quality


Group 2: Know your neighbourhood

  • What is the need the business fulfills?
    • Improves quality of life
    • Good if you were going to buy an apartment
  • How long does it take to get to place of business?
    • If you live in an area, people don’t know what is available in their neighbourhood and things change all the time, so they need to keep up to date.
    • Need for planners: policy, investment into services.
  • Where to locate a new business
  • Counter-planning for future
  • parking, transport, target market, demographics, capturing local intelligence is critical, cultural differences

Need for specific visualisations

Ingo showing the example http://dublindashboard.ie, visualising Irish Data.

No time for more reviews of these business plans. The group tries to answer the three common questions:

The three questions

What X is the thing that should be done to publish or reuse PSI?

  • Have an idea how to use the data.

Why does X facilitate the publication or reuse of PSI?

  • With this idea you will have the context of the data/information, which you need in order to facilitate reuse.

How can one achieve X and how can you measure or test it?

  • Number of businesses created

COOLTURA: scalable services for cultural engagement through the cloud

Facilitators: Nikolay Tcholtchev, Fraunhofer FOKUS [paper] slides Scribe: Peter Krantz notes

Thursday 4th December 2014 (Workshop Day 2)

Plenary Session

Georg & Marc dV

Georg: Shows first book published by Compass in 1888. It was based on PSI.

Austrian law changed in 2000, gov decided that databases held by Compass were their copyright. Has taken 13 years to resolve it at great cost. So giving away data seems odd to us.

Slides

PSI Directive focused on economic flow.

Chains add value at each stage. Old PSI Directive allowed full cost recovery

Georg explains how it's meant to work

Marc takes over...

Wave of studies in the field of open data

Works through diagram

30-50% of the costs of charging are transaction costs (studies show)

Dark green is the ROI

Encourages people to read the paper for the detail

Barking up the wrong tree?

First wave: Early research looked at market value of the data

2nd wave - open data. Charge or not? What happens if you lower the price?

3rd wave: What does it really cost the gov to open the data? Turns out investments are low (less than €100K)

It's a value network, not a value chain. New forms of collaboration crop up. The slide is informative.

The value materialises in non-traditional way. Not always monetary (see CORVe presentation yesterday).

Real value is in the summary of all these transactions. Tiny bits of value being created along the way. No research done in this field as far as I can see.

Back to Georg.

Intermediaries need to have their business plan. Value prop to customers and a marketing plan. Elements like product, pricing and placement.

Product - data that you process is a crucial element.

Problem: which data should the PAs release? EC had a consultation last summer - it's clear which data the user needs. Also obvious that PAs know what data is most valuable - they already charge for it. Need a change of paradigm. Not that commercial reusers have to ask for permission to access data. They have a right to obtain data. That's the change that PAs have to react to.

We have the marginal cost model but we also have exemptions. That's what typical civil servants will think applies to them. Imagine starting a business where you don't know the price of a key resource. Pricing is essential.

Placement: what can I do with my data? Who can I sell it to. That's licensing - am I allowed to resell? Can I act as a wholesaler? These problems need to be solved before the how really starts. Need stable framework so that business models can be created. Framework is essential.

The whole PSID is a business model for the member states. They have to invest some money and that will result in taxation gains later. Don't ask the commercial reusers to invest their money in business models that are unclear if you, the PAs didn't provide the preconditions for the reuse of PSI.

Q&A

Makx to Marc - where are we now on your graph.

Marc - it depends on the sector. In transport, for example, TomTom has taken over from the government.

Q: is there a country in Europe that is on the ROI side of the graph?

Marc - yes, Denmark. They opened all their data, cadastral data etc. All free. Effects already kicking in. NL not that bad. Open Basic Data report by Danish gov has those figures - they're monitoring the effects.

Asun: There is an assumption in the chain that the data has the same quality through the chain?

Marc: the answer is the market. If you're delivering data that isn't correct, you'll soon know. Is it the Gov or the market that is responsible for quality?

Asun: I have no idea ;-)

Arnold v O

Introduces the Open Group.

Slides are informative

PK: Is UDEF related to Semantic Web or is it another standard?

AvO: No. It's based on the principle of machine to machine communication whereas SemWeb is mostly machine to human interaction.

PK: I don't agree. Lots of overlap with SemWeb so it looks like repetition.

AvO: UDEF is a semantic mapping of assets. I think you can also use it in SemWeb technology but it's independent.

Q: Is this a standard format for providing info?

AvO: It's independent of the technology and representation. e.g. UDEF doesn't care how you write a birth date.

Dolores

Our study is not a scientific quantitative one but it gives a good idea.

Two editions. First in 2011, second in 2012.

Slide shows defn of Infomediary

Slides are informative

Spider diagram - issues

Bar chart - what they see as potential

Q: You have a large infomediary sector - I assume it pre-dates open data. What part is from the old model (paying for data) and what part is the impact of opening data?

Dolores: Most of the info is open data and is free. It's difficult for us to put a price on info.

BK: Do you expect changes in the type of companies using PSI?

Dolores: I don't have that info. The crisis has led several companies to fail.

Q: This is my first workshop in this area. I am concerned with the potential of the final consumer. How can we guarantee the integrity of info for the final consumer? The right to access PSI is fine, but it needs security and trust. How does Spain solve this problem? Certificates? Responsibilities for the enterprise?

Dolores: our responsibility follows the rules of national security standards, respecting laws on private data, copyright protection etc. What the market does - we don't know.

Parallel Sessions D: Boosting Open Data Re-Use and Business

  • Facilitator: Miroslav Konecny, Addsen/COMSODE, Harris Alexopoulos, University of the Aegean/Gov4All
  • Scribe: Benedikt Kämpgen
  • Session organised by COMSODE and Gov4All

Executive Summary: Answers to questions

  • What X is the thing that should be done to publish or reuse PSI?

Increase trust: Make officials recognisable to the public. Allow for service level agreements (SLA).

Allow for a feedback loop between publishers and users of data. E.g., using discussion and commenting functionalities. However, official and reliable answers need to be given.

Allow publishers to compare tools for publishing. E.g., create tool matrix.

Objectively quantify the benefit of a tool or a concept for a member state or organisation to meet the PSI directive

  • Why does X facilitate the publication or reuse of PSI?

Higher trust => Higher reuse.

Higher transparency of tool functionality => Higher trust in making the right choice of tools. Easier selection of tools. More goal-driven development of tools.

Measurable indicators of PSI directive => goal-driven tool development and deployment.

  • How can one achieve X and how can you measure or test it?

Certification mechanism?

Tool matrix?

More concrete requirements by the EU regarding the PSI directive.

Miroslav presenting COMSODE project

  • Miroslav Konecny (M):
  • Activities of the COMSODE project align well with Share-PSI project
  • FP7 project: 6 main partners, 14 associated partners
  • Project objectives:
    • Open Data platform: Open-Data-Node (open source)
    • Methodology for publication of high-quality Open Data
    • Advanced Search Toolbox
    • Release of new datasets in case studies
    • Exploit project results for SMEs and future projects.
  • Open-Data-Node
    • Data Catalogue
    • Builds on CKAN
    • Integrates with Europeana Open Data Portal
  • Benedikt Kämpgen: How is CKAN enhanced?
  • Miroslav: Additional harvesters are built to populate CKAN (see the sketch after this list).
  • The goal is to have the platform be used in two use cases:
    • 1) internally in organisations as a way to do data management
    • 2) externally for governments as a way to communicate the availability of data to open government platforms.
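
As an illustration of how a harvester might populate a CKAN-based catalogue (the direction described above), here is a minimal Python sketch using CKAN's action API; the portal URL, API key and organisation are placeholders, the required fields depend on the portal's configuration, and this is not the Open-Data-Node code itself.

    # Minimal sketch: create a dataset record through CKAN's action API.
    # URL, API key and organisation are placeholders.
    import requests

    CKAN_URL = "https://catalogue.example.org"          # hypothetical CKAN instance
    API_KEY = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"    # placeholder

    def publish(record):
        payload = {
            "name": record["dct:title"].lower().replace(" ", "-"),
            "title": record["dct:title"],
            "notes": record.get("dct:description", ""),
            "license_id": record.get("dct:license") or "notspecified",
            "owner_org": "harvested-data",              # assumed organisation slug
        }
        resp = requests.post(f"{CKAN_URL}/api/3/action/package_create",
                             json=payload,
                             headers={"Authorization": API_KEY},
                             timeout=30)
        resp.raise_for_status()
        return resp.json()["result"]["id"]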

Gabriel Achman (GA) speaking about experiences with COMSODE project results in Slovakia

  • CKAN
    • Catalogue
    • You can use it internally, or as single point of publication of Open Data.
    • Main component
  • Andrei Nicoara (AN) (Romania) from Romanian Ministry
    • There will be a European data portal next year.
    • How does your solution compare to the European Open Data Portal?
    • As a European country, we need to publish to the European portal, how is this supported?
    • GA: Two usages of Open-Data-Node: Within organisations, and public bodies.
    • AN: Are we talking about metadata, only?
    • GA: We also store the data.
  • Nikolay Tcholtchev
    • Are you able to harvest from all these different sources?
    • GA: yes: Several steps: Open Data Node as the central repository for users and application developers.
  • GA
    • Slovakia data portal uses Open Data Node
    • Shall be a blueprint for other countries
    • One nice thing about our approach is that we provide Service Level Agreements (SLA) to data providers (?)
    • This way, we can also publish certified data.
    • Therefore, the published Open Data can also be used for legal purposes.
  • Harris: What if data that you harvest does not have metadata available?
    • GA: We provide tools to enhance data with metadata. Some information is obligatory.
    • We allow to transform the data to Linked Data.
    • We use the results of the LOD2 project which also includes the enrichment.
  • Harris: We have a project (ENGAGE project) where we deploy a methodology where the enrichment step is also embedded.
  • Benedikt: Certification increases trust, have you made good experiences with it?
    • Harris: Yes, we have the same goal in ENGAGE, but it is very difficult, since the data may be enriched/modified, after which the certification may no longer be valid?
    • GA: The data is signed automatically when data is uploaded to the system.
    • GA: Since everything is stored centrally, we can do such certification also with modified data. Also, you can (maybe automatically) compare the original with the enhanced version yourself.
  • Nikolay: Are the transformations Open Source so that one can check them?
    • GA: Yes. But we are still in the process of developing more transformations.
  • Petar: We had a presentation by Nikolay.
    • They build harvesters in a model-driven way.
    • What are the best practices in such transformations?
    • GA: Our goal is to harvest from a standard API. The goal is to make all public bodies have such an API.
  • Benedikt: What API are you suggesting?
    • GA: We do not know, yet.
    • One would be the CKAN interface, so publishers should also use CKAN internally.
  • Miroslav (M)
    • See the deliverables on the COMSODE project website (XXX)
    • See for example methodology, data section...
    • The Czech Republic and Slovakia are adopting our methodology
  • Harris: How do you map vocabularies of data?
    • GA: We integrate with PoolParty. The methodology requires a Data Curator. The Data Curator is responsible for maintaining the vocabulary, e.g. using PoolParty (a tool for maintaining SKOS taxonomies or controlled vocabularies)
    • The platforms can also support discussions, comments about published data.
  • AN: Tools for evaluating the quality of data?
    • GA: Format by 5-star linked data classification (XXX)
    • GA: Content by crowd source platforms. What would you suggest?
    • Andrei: A more objective way to evaluate the quality, e.g. what percentage does a dataset cover? I have not seen any tool for that.
    • Harris: Comment fields?
    • Andrei: Even for the very interesting datasets, we have 10 comments. That is not statistically relevant. Also, they are mainly negative.
  • GA: Europeana Food and Drink is aiming at getting feedback from users about the data published about this. It's not only about storing but also about linking, or supporting businesses such as restaurants.
  • There will be a crowd sourcing platform.
  • AN: For us, this is not realistic.
  • GA: You can also use your own feedback platforms. And then via a standard interface you can communicate such feedback to our platform.
  • Jan:
    • To sum it up: the key is crowdsourcing the data... crowdsourcing the feedback about governmental data (quality, usefulness, ...)
    • And make sure that the quality actually increases on top of the feedback. It is not a closed loop; it takes effort.
    • Data Curator needs help from the public.
  • Maria Eugenia (ME)
    • Is Europeana used?
    • GA: Yes.
    • Embrosia project (Food and Drink): How can a business be built around the project?
    • GA: Open Data node is mainly a tool for complex data management not for making business.
    • ME: You are gathering the data on a national level and pushing it to Europeana?
    • GA: Yes.
    • How many cultural institutions?
      • In Slovakia, all of them. We do not know how many others are using Open Data Node (it is Open Source)
      • We aim at deploying our tools and methodologies in other countries and companies.
      • We will have a meeting next year and would be happy about feedback and wishes and feature requests.
  • ME:
    • What information do you extract? Do you provide long-term preservation solutions?
    • GA: We only harvest, store and manage metadata, and handle the communication/distribution of metadata and data, but no preservation.
  • Benedikt
    • Great to have all these extended versions of CKAN popping up.
    • However, it is difficult for publishers to decide; the systems are hard to compare.
    • What more do you provide?
    • Miroslav: More interfaces.
    • Nikolay: There is a difference between enhancement and adaptation of CKAN. We do adaptation for very specific needs of publishers.
  • What: COMSODE. Complex?
  • Why facilitate: Methodologies. We can deal with interoperability.
  • How achieve, measure, test? The number of services can be increased.
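
To make the CKAN option above concrete, here is a minimal sketch (assuming a hypothetical CKAN portal at data.example.org) of how a harvester could list datasets and fetch their metadata via the CKAN Action API; package_list and package_show are standard CKAN actions, while the portal address is a placeholder.

 # Minimal sketch of harvesting dataset metadata from a hypothetical CKAN portal.
 import json
 import urllib.parse
 import urllib.request
 
 BASE = "https://data.example.org"  # placeholder CKAN instance
 
 def ckan_action(action, **params):
     """Call a CKAN Action API endpoint and return its 'result' payload."""
     query = urllib.parse.urlencode(params)
     url = f"{BASE}/api/3/action/{action}" + (f"?{query}" if query else "")
     with urllib.request.urlopen(url) as response:
         return json.load(response)["result"]
 
 # List dataset identifiers, then fetch the full metadata record for each one.
 for name in ckan_action("package_list")[:10]:
     dataset = ckan_action("package_show", id=name)
     print(dataset["title"], "-", len(dataset.get("resources", [])), "resources")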

Harris presenting gov4all

  • Harris presentation
    • Last step of value chain.
    • Architecture.
    • Gov4All: We provide tools for collaboration.
    • Feedback loop needs to be established between Government and End users.
    • Important part: Needs declaration. Addressing governments and developers.
    • Andrei: Public service gets incentives from Gov4All. What incentives are you referring to?
    • GA: Open Data APIs can foster integration with other organisations. But we are looking for concrete incentives.
  • Andrei: The national platform includes a comment plugin. What more can you provide? Harris: A rating mechanism. Andrei: Specific to Open Data?
  • Harris: National effort. The website: http://gov4all.azurewebsites.net/en/Application.
  • AN: How many members? Harris: 80-90 members in the community.
  • AN: Are there also governmental representatives? H: How do we make sure that the community provides help to the government?
  • AN: Process of opening up of data is iterative. How do you support this dialogue?
  • GA: In Slovakia, every governmental employee needs to get certified in our platform. Citizens can be anonymous.
  • AN: Government needs to be recognisable. Certificate only for persons.
  • GA: It depends on the use case. For a specific dataset there might be a responsible person. For general issues the person might not be as important (A: But as a citizen I need to know whether the response is official).
  • Surely a difficult issue.
  • Possible good practice: have a way to make official employees visible.
  • Harris: Does anyone have opinions about what is needed?
  • AN: Legal aspects. Eastern Europe is different. We are not looking for tools. We are looking for processes.
  • Benedikt Kotmel (Czech Republic): For us it is indeed a question of tools. How do we know that the portal is well known and allows us to fulfil the EU requirements? We want to make sure that the portal is well known.
  • M: How to compare?
  • Nancy: PSI Directive June 2015: Reports are needed. Open-by-default principle.
  • M: Law is just the start. If someone sues the state, that may be a driver to compare states.
  • Harris: The question is, what does the EU require? There was a question in the EU presentation about how the advancement of the EU member states is measured by the EU. The answer by the EC was: regular meetings and met deadlines. (Note by scribe: one information source may be the digital scoreboard: http://ec.europa.eu/digital-agenda/en/digital-agenda-scoreboard)
  • Miroslav: We want to make public servants' lives as easy as possible.

Session: The Central Role of Location

  • The session facilitators were Ingo Simonis (OGC), Raquel Saraiva (Directorate General Territorial Institution), Adomas Svirskas (Advisor to the Lithuanian Cadastre, President of Lithuanian Software and Services Cluster)
  • Ingo introduced the work of OGC
  • OGC is a standardisation body for geospatial standards, i.e., standards to publish spatial data on the web. Much of its work is heavily influenced by the INSPIRE directive.
  • INSPIRE obliges all member states to publish data in a way that it can be discovered and accessed. Not necessarily for the general public, but more in an org-to-org relationship.
  • OGC develops only web-based standards
  • Data is made accessible via these standards and encoded in formats like GML or KML (a minimal request sketch follows this list)
  • Location/geospatial data includes addresses
  • Non-geospatial data can usually gain a lot of value by linking it to geospatial data
  • OGC runs interoperability test projects, where relevant stakeholders are invited to test how well they can use other organisations' data
  • Plug-fests bring people together with data providers in order to explore how well they can use the organisations' data
  • OGC creates many projects where crowd-sourcing plays a role
  • Example: Computing the extent of flooding based on different pictures taken by citizens at the location of flooding
  • Adomas Svirskas briefly described some of his work with public sector bodies in Lithuania
  • The Centre of Registers decided to develop a regional geo-information environment service (REGIA) for local authorities to reuse
  • On top of that platform, services like issue management systems have been built
  • Raquel Saraiva
  • DGT has a data platform where data harmonisation is done as a first step
  • After that, they produce data combining many different sources and provide this to colleagues at DGT, who use it in their day-to-day work
  • DGT is a member of EU project Smart Open Data http://www.smartopendata.eu/
  • this project has brought OGC and W3C together
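
As a rough illustration of the OGC web service standards mentioned above, the sketch below requests features from a hypothetical Web Feature Service (WFS 2.0) endpoint; the endpoint URL and feature type name are placeholders, and a real service would advertise its feature types via a GetCapabilities request.

 # Rough sketch of fetching features from a hypothetical OGC Web Feature Service (WFS 2.0).
 import urllib.parse
 import urllib.request
 
 ENDPOINT = "https://geo.example.org/wfs"  # placeholder WFS endpoint
 
 params = {
     "service": "WFS",
     "version": "2.0.0",
     "request": "GetFeature",
     "typeNames": "ad:Address",  # placeholder feature type, e.g. an INSPIRE Addresses layer
     "count": "100",             # limit the number of returned features
 }
 
 url = ENDPOINT + "?" + urllib.parse.urlencode(params)
 with urllib.request.urlopen(url) as response:
     gml = response.read()       # GML-encoded features
 
 with open("addresses.gml", "wb") as f:
     f.write(gml)
 print(f"Saved {len(gml)} bytes of GML")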

Discussion and comments

  • The European Location Framework is being discussed at EU level, but has been ongoing for quite a while
  • The Canary Islands don't have a georeferenced address database
  • http://what3words.com/ initiative: Create an address pattern which is available throughout the world
  • Location in the digital world is, on the one hand, becoming unimportant and, on the other hand, ever more important
  • The Open Group doesn't seem to have any standards related to geospatial data
  • Example: industrial installations where maintenance has to be done. These installations are 3D, so you need not only latitude and longitude but also height in order to identify exact locations
  • Google Maps (GM) as different platform: Is there a European answer to Google maps?
  • GM is just a background layer
  • On the open market there is OSM (OpenStreetMap)
  • INSPIRE does not prescribe any particular background layer
  • Cadastre data is available in each member state, but probably not to the general public, because some countries choose to sell that data
  • GM is a base map in one reference system. National Base maps are usually available in varying reference systems.
  • Ingo doesn't see any GM-like solution coming out of Europe any time soon
  • Example: geo-data used in research: address data that used the Universal Business Language specification, which was enormously complex to work with
  • Researchers often flatten hierarchical geo-data because statistical data tools have problems working with hierarchical data
  • What is OGC-compliant? How can an application be OGC-compliant?
  • Example: Serbian business register agency: Discussion with that agency to publish company data. They are interested, but are not entitled to publish location data of companies
  • OGC has a test suite for self-testing datasets/services for compliance with INSPIRE
  • Example: Difficulties modelling geo-data in RDF because of all the mentioned variations in these datasets, like differing vocabularies
  • Example: GeoJSON is massively used, just like Shapefiles, but can one recommend their usage? One isn't an official standard; the other is dominated by a company. (A minimal GeoJSON example follows this list.)
  • What is the current status concerning data privacy?
  • How to interpret whether something is anonymous or not?
  • Example: From the timeline of your friends on Facebook you can predict the location of a 3rd party for the next day
  • Privacy aspects are not consistently addressed within INSPIRE
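
To make the GeoJSON discussion concrete, here is a minimal, purely illustrative GeoJSON Feature built in Python; the coordinates and properties are invented (note that GeoJSON uses longitude, latitude order).

 # Minimal illustrative GeoJSON Feature (coordinates and properties are invented).
 import json
 
 feature = {
     "type": "Feature",
     "geometry": {
         "type": "Point",
         "coordinates": [-9.1393, 38.7223],  # longitude, latitude (roughly Lisbon)
     },
     "properties": {
         "name": "Example bus stop",
         "operator": "Example operator",
     },
 }
 
 feature_collection = {"type": "FeatureCollection", "features": [feature]}
 print(json.dumps(feature_collection, indent=2))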

Best practice questions

What X is the thing that should be done to publish/reuse PSI?

  • Develop best practices on
    • vocabularies
    • how to model data so others can directly use it
    • how to balance data protection and re-use of geo data
  • Develop profiles of geo-spatial standards helping one to publish data in certain situations
  • Clarify what OGC-compliant means and how an application can be OGC-compliant

Why does X facilitate the publication/reuse of PSI?

  • Use of geo data in connection with other open data creates greater value
  • using standardized geo data makes it easier to connect and thus publish geo-located data sets

How can one achieve X and how can you measure it?

The most known challenges of PSI Access and Re-USE: Intellectual Property (and Data Protection)

Facilitator: Freyja van den Boom Scribe: Lorenzo Canova notes

Open Market Dilemmas

Dietmar Gattwinkel, Saxony Open Government Data Scribe: Sebastian Sklarß, ]init[ notes

Thu 14:00 - 15:10 Bar Camp Sessions (Round 1)

Erasmus on open data

  • Facilitator: Miguel Garcia
  • Scribe: Martin Alvarez
  • 11 attendees

Miguel explained the Open Data Exchange programme, started in Nantes, France, in September. http://erasmusopendata.eu

This programme is similar to the Erasmus programme for students, but this time for Open Data professionals (developers, activists).

We have written a roadmap and promoted it through different channels (e.g., the ePSI Platform). To date we have received 30 letters of support.

The main idea is defined in this document: https://docs.google.com/a/fundacionctic.org/document/d/1w00AF1-0IZhImPu_DpQJi0fi_zc2JKRBuXhGd9pjMQw/edit#heading=h.vxgstopvz2yx

Of course, this programme is being defined so comments are welcome.

[Round of introductions]

Valentina: My institute (Pupin) may be interested in this programme. We have the expertise and (open) tools.

Jens: Just to clarify, are some NGOs or researchers excluded?

Miguel: No, everyone should be invited.

Jens: Initiatives such as Code for America, Code for Germany, etc. are good targets for this. … I have contacts from the Code for Germany group

Ana Martino: I’m really interested in this area. As a journalist I think that this exchange will help with sharing knowledge. … Although we started it, this project is everyone’s.

Karnik: There are many conferences, so what is the new thing here?

Miguel: The idea is helping those people that learn something in the conferences to implement the things they are learning.

Ingo: I have been travelling and working with different countries a lot. I think this is very important. There is already an Erasmus programme for researchers, so this initiative should focus on a different audience.

Miguel: We are aware of different existing exchanges but this is more focused on real developments.

Bruno Almeida: I work for Instituto Pedro Nunes. We have a lab with SMEs and this is very interesting for us. We can help.

Miguel Laginha: I’m a colleague of Bruno in the IPN. I’ll talk with our Board of Directors to ask them for their interest.

Simon: I’m interested in sharing knowledge. I’m also wondering how this can work (exchange in workshops, etc.)

Jens: I suppose the EC would have asked why we should fund this programme?

Miguel: Good question.

Valentina: We have to measure the impact of the programme. … Counting the prototypes developed, maybe.

Miguel: Alberto Cottica (Spaghetti Open Data) came up with the idea and it all started from there. The Commission showed some concerns about this.

Martin: Neelie Kroes talked about this programme during the OKFest in Berlin. She even said the EC launched this programme :-) … The first meeting was in Nantes, organized by Claire (Libertic, France)

Miguel PT: There are other programs (Erasmus, Leonardo, Erasmus for entrepreneurs), maybe we should show some figures of these programs to illustrate the benefits.

... I see this website (erasmusopendata.eu) but I still have doubts. I want to understand what it will be like. This programme asks for money, but I have no idea what the final community/network will look like, etc.

Karnik: We need to clarify what the programme is asking for.

Benedikt: I don’t have a clear picture of the whole thing. I don’t understand why the EC should give money to this programme. I don’t see the benefits.

Miguel: We are not asking for money for us. This is the idea from the community. Our approach is just asking for support to improve that letter. … We think this should come from the whole community. … We are the Open Data Community so we should be open.

Karnik: I would do it the opposite way. As a company, I would create an idea and I would go to the Commission to get money to create the community.

Miguel: Yes, but this is not the idea. The community should be responsible for this.

Benedikt: It’s difficult getting money from the Commission for this.

Miguel: Agree.

Ana: It’s not easy, but we are trying it. … We don’t want just your support, also your contributions.

Simon: What kind of exchange do you expect?

Miguel: We are gathering ideas right now.

Ingo: This could be interesting for NGOs.

Jens: NGOs such as Wikimedia have experience with bottom-up creation of networks, so we should involve them.

Czech: I have some NGO contacts in the Czech Republic.

Martin: We have defined some use cases: first a showcase gallery, application process, brokerage, working sessions, physical exchange

Valentina: Regular Skype calls are a good idea to engage people.


Miguel: Lets go for the three questions:

What X is the thing that should be done to publish or reuse PSI?

  • X = Open Data Exchange Programme as a bottom-up approach to connect communities in the Open Data field.

Why does X facilitate the publication or reuse of PSI?

  • This programme will connect the isolated communities in Europe and let them share knowledge and experience.

How can one achieve X and how can you measure or test it?

  • The number of exchanges, products, developments produced.

Bar Camp Session: Auditorium - Muriel & Peter K

  • Joined suggested bar camp topics
    • Muriel Foulonneau - RDF: does the emphasis on RDF data publication help commercial reuse? - muriel.foulonneau [at] tudor.lu
    • Peter Krantz - Standardised information, why is this so difficult? - peter [at] peterkrantz.se
  • Scribe: Benedikt Kämpgen kaempgen [at] fzi.de
  • Number of participants: around 40 (not all actively participating)

Executive Summary: Answers to questions

  • What X is the thing that should be done to publish or reuse PSI?

Make data self-descriptive.

  • Why does X facilitate the publication or reuse of PSI?

So that applications can automatically display data in a useful, human-readable form.

  • How can one achieve X and how can you measure or test it?

Use RDF and standard vocabularies (SKOS, FOAF).
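
As a small, purely illustrative sketch of what self-descriptive data with these vocabularies can look like, the following uses the rdflib Python library (assumed to be installed) to describe a fictitious publisher with FOAF and a fictitious subject keyword with SKOS, then serialises the graph as Turtle; all URIs and names are invented.

 # Sketch: self-descriptive data using FOAF and SKOS, serialised as Turtle.
 # Requires the rdflib library; all URIs and literals below are invented examples.
 from rdflib import Graph, Literal, Namespace, URIRef
 from rdflib.namespace import FOAF, RDF, SKOS
 
 EX = Namespace("http://example.org/")
 
 g = Graph()
 g.bind("foaf", FOAF)
 g.bind("skos", SKOS)
 g.bind("ex", EX)
 
 publisher = EX["city-of-example"]
 g.add((publisher, RDF.type, FOAF.Organization))
 g.add((publisher, FOAF.name, Literal("City of Example")))
 g.add((publisher, FOAF.homepage, URIRef("http://example.org/")))
 
 keyword = EX["concept/public-transport"]
 g.add((keyword, RDF.type, SKOS.Concept))
 g.add((keyword, SKOS.prefLabel, Literal("Public transport", lang="en")))
 
 print(g.serialize(format="turtle"))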

Use other standards such as SDMX, XBRL etc. with a well-defined domain model.

Have a feedback loop with application developers and users of the data (e.g., journalists). Build demonstrators.

Solve the marketing issue of RDF: too many acronyms. There is also a tooling issue: fewer production-ready tools exist for RDF than for XML. How do we educate people? We need to start with the people.

A graphical notation for RDF (similar to UML).

Detailed notes

  • Peter:
  • We have a lot of data but it is not harmonised.
  • All portals publish in different formats and standards.
  • Only few publish datasets as Linked Data.
  • Is RDF too complicated?
  • What is the advantage of RDF, anyway? Why not as a CSV file?
  • Most commercial users/developers are not familiar with RDF.
  • Is the commercial reuse of data improved by using RDF?
  • Joseph Azzopardi:
  • Core vocabularies are most important: core business, core administration, ...
  • Also important are business rules about these vocabularies.
  • Makx:
  • In my opinion promoting RDF for metadata is different from proposing to use RDF for data.
  • You need to think about what the data will be used for. There may be existing standards available in specific domains.
  • A good XML schema may be as good as an RDF schema. Having a well-defined schema is the most important thing.
  • Muriel: But many domains do not provide such standards.
  • Peter: However, users of such data still have a hard time using PSI information. For instance, I did a survey with companies (people?) in Sweden.
  • The results say: For instance, assume someone wants to have information about locations. Then, it is still very difficult to gather all available data since everyone is providing the data in different formats.
  • Arnold van Overeem:
  • Open Data is not necessarily Linked Data.
  • RDF only for machine to machine. Human reuse of RDF is difficult.
  • Benedikt:
  • RDF is a data model, not a format.
  • You can publish RDF using XML, JSON, Turtle.
  • Some formats more some less easy for developers.
  • The main objective of RDF is to make data self-descriptive.
  • Self-descriptive data is easier to automatically process for applications.
  • Applications then can display the data to users in a useful way.
  • Other standards such as SDMX and XBRL also allow data to be made self-descriptive and can be transformed to RDF.
  • In the end, we need to help publishers to publish useful data.
  • CSV without metadata is not sufficient.
  • One good way is RDF, but there are other ways.
  • Peter: How do we enable this self-descriptiveness? People need help to adhere to standards so that data is self-descriptive.
  • Deirdre: A technical framework is needed. Geo-spatial is a good example where there are good technical standards.
  • Amanda:
  • Problem: Many people do not know about the technical aspects.
  • Needed: creating platforms/tools so that data can be published as RDF.
  • Many people do not know what is meant by machine-understandable data.
  • Gabriel Lachmann
  • What we see as a main advantage is the Linking.
  • For registries and cultural institutions, RDF is an infrastructure for linking.
  • Dictionaries are one thing that can be implemented well using RDF.
  • Muriel: I have been working with cultural institutions and also used RDF for it.
  • Makx:
  • We have to distinguish: 1) Context needs to be provided. Basic legal situation for the process. 2) We need a community to help with the process. 3) We need flexibility with the how and when (comply or explain approach).
  • Steinar:
  • It is all about integration.
  • The decision about a solution really depends on whether you need it short term or long term.
  • If you want a long term solution: You need identifiers. You need basic protocols to share data. You need a way to link. I am very optimistic that Linked Data is one good way to do it.
  • Maria Val:
  • Important to make data useful to the users.
  • In the INSPIRE project we provide tools to make it easy for people to make data available in machine-readable form.
  • Going through the legal process is not enough when metadata is missing. It might be important to also provide tools that can work with the data.
  • For instance: if you buy land, you need to describe it in a specific format; this may help make the data available in a good format.
  • Phil:
  • Putting his W3C hat on.
  • Any data format is machine to human (but of course you need a tool). So there are no differences.
  • Many Semantic Web standards have been updated (SPARQL, RDF). Question: Are we (the standards committee) now redundant?
  • Some people may say Linked Data is dying. If you want data in only one format, then you are stupid.
  • Any format may be useful for a specific use case.
  • If you have one table, you may use CSV. If you have many tables, you may consider using RDF.
  • RDF is not the solution to everything.
  • Some people do not want to use RDF, but use JSON-LD.
  • However, Linked Data is more than that, e.g., content negotiation: 1) go to an address/URL; 2) the browser asks the server for a specific format; 3) the server provides the information in a useful format, e.g., HTML or RDF (a small sketch follows these notes).
  • Steinar:
  • However, the format may still be an issue.
  • People may still refuse data in certain formats. In Web service world: Inbound or outbound information. Inbound means Linked Data. Maybe we should bridge better to other communities.
  • Lorenzo: Agrees with Amanda. RDF is like MP3: we need to wait for the iPod.
  • Muriel: JSON looks a bit more human-readable. You can publish your data in different ways. Why is content negotiation used so rarely? It is not an option to tell publishers to provide data in all formats.
  • Peter: One problem may be the legacy tools in companies. There are not nearly as many tools for RDF as for JSON and XML.
  • Arnold: Human-readable is overrated. MP3 is not human-readable either. RDF is a tool for representing metadata.
  • Makx: 15 years ago, at a data integration meeting, people were shouting XML at him. XML managed to become the buzzword in board rooms. How do we need to communicate?
  • Peter: Too many things that people cannot understand.
  • Joseph: XML was the key technology at that time, and we are still discussing standards and formats. How do we encode business rules? How do we say that the forename comes before the last name, or that a name cannot contain numbers?
  • Steinar: RDF is an abstract model. What is needed: a graphical notation.
  • Ingo (OGC): This discussion is not leading anywhere. There will always be new formats coming. Domain experts have to sit together and develop the model. We do not have enough organisations stepping up to be responsible for a domain model.
  • Makx: Agree with Ingo. ADMS is an example. Some organisations are stepping up now.
  • Peter: Only useful if the standards are adopted.
  • Makx: Rather about having organisation adopting a standard and publishing data using standards. Europeana is an example of a growing group that uses a standard.
  • Deirdre: What happens if standards are not open?
  • Makx: The government does not need to do anything. It may only provide an open playing field. Government should push companies to make standards open. Governments do not need to push the creation of standards.
  • Becker: Agrees with Makx. Tools are missing for using RDF. It's all about tools for using the standards: what is easy to use and what is gaining momentum.
  • Muriel: So in the end it is a communication problem?
  • Phil: Why is it so hard to do it? Standards are always driven by people that want to come together. Sometimes the number of people is small. I do not like 5-star-linked-data. CSV (with metadata) is just fine. See CSV on the Web Working Group.
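
A rough sketch of the content-negotiation mechanism referred to above: the same URI is requested with different Accept headers, and a Linked Data server that supports content negotiation can answer with HTML for a browser and, say, Turtle for a data client. The URI below is a placeholder; the actual behaviour depends entirely on the server.

 # Sketch of HTTP content negotiation against a hypothetical Linked Data URI.
 import urllib.request
 
 uri = "http://example.org/id/dataset/123"  # placeholder resource URI
 
 for accept in ("text/html", "text/turtle"):
     request = urllib.request.Request(uri, headers={"Accept": accept})
     with urllib.request.urlopen(request) as response:
         # Which representation comes back depends on the server's configuration.
         print(accept, "->", response.headers.get("Content-Type"))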

Open Data Life Cycle and Infrastructure

Facilitators: Jan, Harris, Peter W.

Scribe: Jan

Presentation: File:Open Data Life Cycle and Infrastructure Bar Camp.pdf

Minutes

Attendees: 3 + 2 facilitators + 1 facilitator/scribe

Introduction

  • The aims of this bar camp are to discuss Open Data life cycles and the infrastructure that supports the publication of Open Data. When discussing the Open Data life cycles, attention should be paid to the involvement of users in the curation of datasets and to user engagement throughout the life cycle.
  • Existing life-cycle models
    • Linked Open Data Life Cycle
    • Data Curation model
    • Engage Life Cycle models
    • COMSODE model
  • Domains in Open Data publication
    • Planning
    • Preparation
    • Publication and cataloguing
    • User relationship management
    • Archiving
  • Three bar camp questions outlined
    • Should the users be involved in the data management (curation) of open data?
    • Should the user engagement (collaboration) be performed throughout the whole life cycle?
    • What is the optimal IT infrastructure for public sector to support the IT-driven economy?


Q: What do you mean by the “model”?

  • It is the structure of the dataset, the fields and the units of measure etc.


Q: Does the feedback loop exist in reality?

  • Note: feedback loop in which the feedback provided by the users is utilized to improve the quality of the published datasets.
  • For example in the Czech Republic feedback led to improvement of datasets published by the Czech Telecommunication Office.


Q: Can we apply the presented models to real-time data?

  • The presented models are applicable to real-time data as well; however, an appropriate infrastructure to support the steps of the life cycle is needed.


Based on the bar camp questions, three factors facilitating PSI/Open Data publication and reuse were identified:

  • User involvement in data curation
  • User engagement throughout the life cycle
  • Proper IT infrastructure supporting the publication of PSI/Open Data


Descriptions of the above-mentioned factors, as well as the answers to the SharePSI 2.0 Lisbon Workshop questions (“What X is the thing that should be done to publish or reuse PSI?”, “Why does X facilitate the publication or reuse of PSI?”, “How can one achieve X and how can you measure or test it?”), are provided in the Conclusions section.


Three questions for the bar camp – highlights of the discussion

1) Should the users be involved in the data management (curation) of open data?

  • Why (not)?
    • Opinion: No. Users can combine datasets but it is not necessary to involve them in the curation process.
      • Data is already in a system, just publish raw data directly from this system
      • When you build a system it just needs an API, so not involving users simplifies the development/procurement
      • The most important thing is making data available, taking into account the different needs of people
        • Sometimes data is 'tagged' by one group with one term, but by another group with another term. For enhanced discoverability it is helpful if the users can enrich data with user-provided elements, but this should be done in a way that doesn't change the original data nor disguise the provenance.
    • Opinion: Yes. For example, in Scotland users are involved in the curation of datasets in the ALISS service. Data from various sources related to the health care domain are integrated in this service. Users can provide perspectives on the data assets.
    • Q: Can users change the published data in ALISS service?
      • A: Users cannot change the original data but they can contribute to the metadata (metadata enrichment).
    • For discovery it is good if the data are enhanced by the users – data are tagged but the original data should not be polluted.
    • In general, involvement of users in the data curation would in fact depend on what is meant by the curation.
  • How (if yes)?
    • Metadata enrichment like in the ALISS service example.
    • Format change
      • E.g. transformation of data from PDF to Excel
      • However, raw and machine-readable data should be published from the start rather than letting users do the transformation.


2) Should the user engagement (collaboration) be performed throughout the whole life cycle?

  • Why (not)?
    • There should be collaboration between the providers of data and the users throughout the whole life cycle.
    • Using the Engage life cycle model as an example, in the case of the ALISS service users are involved in most of the life cycle phases, at least to some extent (the pre-processing phase is not present in the ALISS service life cycle).
    • Users might be involved for example in curating, duplication checks, pre-processing, creating/gathering the data.
  • How (if yes)? How the users are involved?
    • Examples how users are involved in the ALISS service:
      • Create phase – people work with organization to collaboratively spot the assets (parks, walks, health facilities etc.).
      • Curate phase – adding meaning to the data, adding metadata (description, tags).
      • Publishing phase – users can search the assets using a set of provided tools and API.
    • Involvement of the users might also depend on the data. It might be difficult to achieve with, for example, sensor data.
      • However, people can, for example, reuse earthquake sensor data.
    • Users can be engaged by being involved in the data area and having the opportunity to collaborate.


3) What is the optimal IT infrastructure for public sector to support the IT-driven economy?

  • Why optimize the infrastructure
    • Users expect, for example, quick responses from government websites. This can be achieved by the right page architecture and the placement of the component parts. So we should ask what kind of design considerations are needed in order to support business requirements.
    • We develop on technology designed for the 20th century – expensive disks, expensive memory. Is this the architecture to support, for example, massive mobile usage of a website? Can’t we do better?
    • Certain “path designs” are expensive from the computation perspective in infrastructure designed for the 20th century.
    • We should try to specify the infrastructure for the specific delivery.
      • Aim for memory-only computing
      • Why do updates and deletes in the data stores rather than just using persistent (append-only) data stores?
    • Most relational data will fit into a 1 TB database – we can get improvements with data stores optimized for handling this amount of data
    • What are the candidates? Check what the best ones are
  • Why not optimize the infrastructure (counter arguments)
    • Government should not be involved in such low-level problems.
      • However, the Scottish government, for example, is different – it has a large IT department that is able to deal with these kinds of problems.
    • Governments should look for the right people for the job and contract them
      • However, procurement lags behind the technology. Procurement often aims not for what is best but for what is safe.
    • Cloud based service might be a solution (do not buy your hardware, use some service).
      • (+) Provides flexible cost structure.
      • (-) It might be insecure.
  • Infrastructure is a crucial thing for getting things working. If we provide documents as a service (web pages), we should be expected to provide data as a service.
  • How
    • Two possible approaches:
      • Using SW and HW as a service – do not procure your own infrastructure.
      • Building strong IT department.
    • Abandon infrastructure with roots in the 20th century and use infrastructure developed for the 21st century, which can better satisfy the needs of publishing data on the web.


Conclusions

User involvement in data curation

What X is the thing that should be done to publish or reuse PSI?

  • In order to facilitate publishing and reuse of PSI users should be involved in data curation.


Why does X facilitate the publication or reuse of PSI?

  • Metadata enriched by the users can improve discoverability of the datasets and it can also help to add meaning to the data.


How can one achieve X and how can you measure or test it?

  • Metadata enrichment – users might be allowed to enrich the metadata, e.g., they can tag the datasets or add/improve dataset descriptions.
  • Users might perform transformations between various formats of data and provide the transformed datasets back.
  • However, original datasets should stay untouched, i.e., it should be possible to distinguish between the original data/metadata and the user-generated content.


User engagement throughout the life cycle

What X is the thing that should be done to publish or reuse PSI?

  • User engagement facilitates publication and reuse of PSI. User engagement should not be limited to just one phase of the Open Data life cycle; users should be engaged throughout the whole life cycle.


Why does X facilitate the publication or reuse of PSI?

  • If the users are engaged from the start, they can help to identify datasets that are in demand, which facilitates reuse of the published data. It also helps publishers to focus on the right datasets.


How can one achieve X and how can you measure or test it?

  • Depending on the phase of the Open Data life cycle or the type of data users might be involved for example in collaborative selection of datasets, metadata enrichment, search or use of the provided tools and APIs.
  • In general, providers of the data should be able to give users opportunities to collaborate.


Proper IT infrastructure supporting the publication of PSI/Open Data

What X is the thing that should be done to publish or reuse PSI?

  • In order to facilitate PSI publication and reuse, a proper IT infrastructure supporting the publication should be in place.

Why does X facilitate the publication or reuse of PSI?

  • In order to be able to publish data in a way that meets users’ expectations, an appropriate IT infrastructure is necessary. Users expect, for example, quick responses from government websites. If the publishers are not able to satisfy the expectations of the users, it might hinder the reuse. IT infrastructure is a crucial thing to ensure that the level of the provided data services meets the users’ requirements.

How can one achieve X and how can you measure or test it?

  • Procure SW and HW as a service or build a strong IT department that is capable of providing and supporting the required infrastructure.
  • Abandon infrastructure that is made up of technologies with roots in the 20th century and use infrastructure developed for the 21st century, which can better satisfy the needs of publishing data on the web.
  • Use architectures and up-to-date technologies that are specifically designed/optimized for the intended delivery
    • E.g. in-memory computing, persistent data stores, data stores optimized for the expected amount of data, etc.

Thu 15:40 - 16:45 Bar Camp Sessions (Round 2)

Share-PSI 2.0 Lisbon Event

Session: Unintended Consequences of Open Data Publication and Risk Assessment

Facilitators: Heike

Scribe: Simon, Heike

7 people attended

Introduction

When we are talking about PSI it is of course ‘non-personal’ information. The presentation about infomediaries in Spain stated that most economic activity is done with geographic/cartographic and business/economic information, which is typically non-personal information. However, socio-demographic data came in 3rd place on that list. National statistics are usually highly aggregated and as a municipality we are much more interested in local data that allows us to look at neighbourhoods or individual streets. Likewise citizens wish to have information based on their postcode about the area close to them. That means that now a lot of additional local data is being published. What happens if this is merged together and could it lead to individuals, businesses and assets becoming identifiable? This session will look at:

  • Real-life cases from Birmingham and elsewhere
  • Discussion of the risks and how organisations deal with it
  • The approach to risk assessment in organisations that publish or reuse data

Quick show of hands

  • Have you personally considered risks of OD releases?

4 of 7 have done / are starting to

  • Does your organisation have an approach to risk assessment for OD?

None has an approach in place; one has risk assessment related to e-Government services and open data is at least mentioned; mostly focussed on privacy protection. One reuser said that they didn’t need to consider risk as they were not the publisher, which was queried by another reuser. Background of participants: Saxony regional government; Czech Ministry of Interior; UK small business; Serbian Research Institute / business; standards body; UK local government; Government Slovenia

Real-life Examples

Government / Data Owner

  • India decided to publish the land registry and it showed where parcels of land were not registered to an owner. Computer-literate people understood the opportunity and put in claims for such parcels. When ownership was granted, the people who lived on the land but didn’t know about this were evicted by the newly registered owners.

Developer / Data User

  • In the UK an FOI request was put in asking for row-level results with postcodes. The request asked for as specific a postcode as the council felt was right to give in order to preserve privacy – a full postcode [postcode unit] will, on average, identify ~14-20 homes; taking 2 digits off the postcode [postcode sector] identifies thousands of homes. The company got the data with full postcodes, which could reveal children’s homes etc., and decided not to publish the results of this analysis or republish the postcode data. The assumption made was that the local authority would suitably anonymise the dataset before publication. In publishing the full postcode it demonstrated that such anonymisation was not applied and that a risk assessment of the publication had probably not been done.

Comments:

  • Anyone can buy address databases with names and merge them with other datasets to find out more about individuals.
  • If open data were used to harm somebody, politicians would probably block all data releases.

Municipality / Data owner

Housing data: Currently reviewing what information about social housing to publish as open data, for example: address of the property; property characteristics such as heating systems, tax code, size, number of bedrooms, lift / disabled access, double glazing / window types; whether the property is empty; maintenance figures (performance of contractor). Potential risks that have been mentioned are outlined below – are they credible?

  • Ability to identify where social housing is to protest against people on benefits
  • Criminals could target social tenants, who are often vulnerable or less educated people, to sell them unwanted services such as home improvements or to defraud them
  • Council receives cold calling from companies that want to sell services e.g. window replacements, repairs and this creates an unwanted overhead
  • Empty properties could attract squatters or people running parties.
  • Journalists could get hold of data about empty houses or houses in poor repair and write negative articles
  • Infrastructure information e.g. electric grid – substations – targets for terror attacks

Comments: Are there any positive consequences expected? Yes, for instance:

  • If tenants apply to the council with a need for a certain property (in terms of size or area) that is not available, the open data of other housing associations could be used to refer the tenant to a more suitable property.
  • The data could allow a company to offer solutions for specific problems e.g. if 5000 homes have the same boiler and a company has recently found a fix for a common fault in these boilers, they could offer this to the council. Data could encourage new business opportunities.

How valid are these concerns and what could be done to mitigate them?

  • Could provide partial data instead of the whole database
  • Department should analyse the data and procure better solutions e.g. for maintenance
  • Companies can buy commercially provided socio-economic data/market research data that classifies deprived areas. If public open data was held back there would still be commercial data i.e. the information is already there it is just more expensive. Publishing open data can level the playing field.
  • Could do a ‘honey trap’ test, i.e. publish one dataset that is controversial and monitor if it does attract particular negative use and unwanted attention. Then decide how big the risk is and if it’s worth publishing or protecting the data.
  • In Sweden a local application has been developed to map where criminally convicted people live and despite some controversy it is still available at https://www.lexbase.se
  • In the UK data is published that shows where a crime has occurred but it blurs the exact location, this example was given during the Samos conference. There is a big difference in identifying the perpetrator or the victim through the location.
  • One cannot mitigate all risks of personal identification.

Bus operators / Commercial data owners

Birmingham has bus stops with electronic displays that show when the next bus will arrive. This information is based on the timetable and does not represent the real time at which a bus will arrive / is delayed. The real-time information (RTI) exists but is owned by several competing bus operators. Citizens would have better information if the real-time information could be made available, but the bus operators fear a commercial disadvantage if they share the data:

  • Bus operators are competing on certain routes. They are concerned that if they publish their data a competitor could send a bus or a taxi could drive past a bus stop where people are waiting and reduce their business.
  • That actually happens in Serbia and it happens because of local knowledge and initiative of taxi drivers – not because of RTI availability.

Experiences of participants:

  • In the Czech Republic the decision is to publish open data at the 3-star level; there are no plans yet to look at linked / 5-star data because they are unsure how to deal with it.
  • The company that one delegate used to work for collected data to forecast voting behaviour during elections. This data can forecast all sorts of behaviour and the state started to market it, e.g. combining social-economic status data with data about car models (which represents another view on spending power), matched with local spend data. They worked with the German post office, which provided its dataset of customers who collect stamps; this is now being used for targeted advertising by pointing out areas where the post office could have more customers according to profile.
  • In the US a dataset was published on health issues to encourage prevention strategies. It was found that it allowed people to drill down to very low granularity and identified single instances of persons with a certain condition, in a certain age bracket and locality. This enabled a state Governor to be personally identified …
  • The PSI Directive states that if it is generally accessible it should be made available for reuse. Slovakia and the Netherlands have a rule not to give out personal data, e.g. public officials’ salaries must be published but they are not made available for re-use. But there are many good cases where this type of data has led to prevention of corruption. E.g. matching data about regional development with political structures allows judgements about the investment decisions being made; politicians know that they can be held to account.
  • People derive information from data that we’re not aware of, e.g. in Vienna a company has used social media to identify 6 main characteristics of people and is selling this to employers for HR purposes. There is no guarantee that the algorithm used is accurate and creates realistic results.
  • Are there any discussions in UK Government about security / protection from threats when talking about open data? E.g. in Russia even archive data from the 19th century is restricted; hardly anything can be accessed. In the UK and other western states so much is now available on the internet.
    Not sure if this is really an open data issue. There are no plans to release the blueprints of prisons or military bases, for example, as these are restricted for national security reasons. The practical experience of people in the UK, and arguably most countries of the EU, is that citizens’ privacy is more often breached by their governments citing terrorism as the reason; that is the greater danger.

Satellite imagery is freely available and it has included pictures of secure locations. The ODI’s Data as Culture commission included James Bridle's work, which deals with secrecy surrounding drone warfare.

What steps can we take to help with risk assessment and risk mitigation, what experiences do you have?

  • Need indicators e.g. Germany has indicator levels for aggregation of geographic information; there are national laws in the Health domain to protect identification of individuals. For instance health insurance companies will not allow contractors to take data out of their system.
  • Identify areas where data should NOT be made available
  • Identify the expectations of the data users and potential benefits and compare with the risks
  • If the public sector publishes open data, at least it can guarantee the quality and the description / metadata.
  • Developers/data users might need to take responsibility
  • Need more research, not much is known
  • Have one unit/group of people to check data; one unit in Slovakia approves each eGov project – a cross-cutting, multi-disciplinary group (ICT security can block pretty much anything)
  • Potentially refer to an ethics committee, as is done for research
  • Review the potential benefits vs the risks and likelihood
  • Ask who is downloading the data and for what purpose [one could ask for registration to download data but then it wouldn’t be fully open data]
  • Each database needs to be considered and checked
  • Check if a database / dataset is closely connected to personal data
  • Consider publishing partial dataset to protect some information
  • Honey trap test: try out if a dataset does attract crime, abuse, unwanted attention
  • Check if rules/laws for aggregation already exist in domains such as health, geographic information

Conclusion / Good Practice

What X is the thing that should be done to publish data/PSI on the web?

  • A review mechanism or risk assessment method for open data publication

Why does X facilitate the publication or reuse of PSI?

  • Because it is needed inside many organisations to get permission to publish data, and it alleviates concerns of both data owners and data users. It could create better data that is more trustworthy and has greater potential for benefit rather than abuse.

How can one achieve X and how can you measure and test it?

  • E.g. establish a multi-disciplinary review group with some rules and risk indicators and a review and sign off process in place.


Overcoming the resistance

Facilitators: Cristiana, Muriel

Scribe: Jan

Minutes

Note: CI = Cultural Institution

Introduction

  • Cultural institutions are the PSI holders. They want to use the IP rights to get some money (some of them are at least partially self-financed).
  • Cultural institutions in general require large amounts of money to run. We need to create an economy around them.

Highlights of the discussion

  • Not every CI is the same. E.g. in the library sector they have been sharing data and metadata for a very long time. Archives are different but still open to sharing. Museums – curators do not want to publish data because they are afraid that someone will identify errors in the data.
  • “Thank you for using our data” – that is all we want from the users, but the users usually do not inform the providers about the reuse of the data.
  • It is difficult to actually track the usage of the data. However, tracking of usage is applied in the music industry. This might serve as a source of inspiration.
  • By selling data CIs usually do not earn enough money even to recover the costs.
  • In general, the business model does not have to be based on selling data. E.g. data might represent added value in a service.
  • There are legal obstacles in sharing cultural heritage data (e.g. special right to faithful reproduction of masterpieces in Austria or Italy). Legal regimes are different in different countries.
  • A private partner involved in digitization of the cultural heritage might represent a risk, e.g. in a situation where the CI has limited access to copies of the data and is obliged to pay for additional copies.
  • Initiative of one organization might attract followers. In Italy digitization and making data available (even though not for free reuse) in one organization inspired other organizations to do the same thing.
  • The following factors facilitating PSI/Open Data publication and reuse resulted from the discussion (descriptions of these factors are included in the Conclusions section):
    • Success stories and good examples
    • Full cost calculations

Conclusions

Success stories and good examples

What X is the thing that should be done to publish or reuse PSI?

  • Success stories and good examples showing that publishing data can help the CI to fulfil its mission, e.g. it might lead to an increased number of visitors. Putting data on the web is a kind of marketing – publishing data means Google can find you.

Why does X facilitate the publication or reuse of PSI?

  • Publishing data makes CI more discoverable on the web.

How can one achieve X and how can you measure or test it?

  • Measure how the CI fulfils its mission and correlate this with indicators of the usage of the published data (e.g. visits to the website).

Full cost calculations

What X is the thing that should be done to publish or reuse PSI?

  • Making cultural heritage institutions calculate the full cost of selling data – in many cases they cannot make enough money by selling data to cover the costs. The costs of selling the data should be compared to the costs and benefits of sharing the data.

Why does X facilitate the publication or reuse of PSI?

  • People will realize that it is in their interest to actually share the data.

How can one achieve X and how can you measure or test it?

  • Analyse the published data and access logs in order to get data about visits to the data, and correlate this with visits to the institution. A positive correlation should show that publishing data helps the CI.

Friday 5th December 2014 (Project Meeting After Workshop)

File:ScribeSharePSI FRclosing.pdf From Johann