Samos/Scribe

From Share-PSI EC Project

Session 1 Coordinated Action on Open Data

Scribe: Phil

Makx – introduces session, emphasises timekeeping. Apologises for iMinds not being present.

Heather Broomfield & Steinar Skagemo

slides File:NorwegianPublicSectorSharePSISamos.pdf

Reform and develop the public sector.

Examples and context for what we're doing in Norway.

Slides show different laws, policy and tools

Aware of the three areas of open data

We think efficiency in public sector has huge potential.

Interesting figures – huge increase in awareness of value.

Shows example of US woman living in Norway. We know what she needs, the data is there.

Citizenship, long process and yet actually the state already knows all the data they're asking you to compile.

Not just about open data, some is private

Explains traffic light system.

Applies to use case.

Steinar: shows benefits

Talking about road database – wants to make it easier to use for better collaboration. Made available through open API, built own services.

Shows external app.

Opening data revealed inaccuracies (traffic accidents in a lake) – police etc. using a default location when no data is available.

yr.no open meteorological data

Improved mission fulfilment – Norway has a lot of weather.

Usual argument about whether data needs to be improved before publication or as a result of it.

Norway mandating use of shared data – digital contact data for citizens. Single register, single sign on.

Giving open data on taxation means cost of collection is among the lowest in the world.

Talking about complexity of one part of the IT system. It's a concern for everyone.

New laws coming that will help to simplify things.

Shows US model which is much simpler but only breaks even after 4.7 years.

Slide shows list of benefits for NPRA.

Design for sharing makes things simpler (with relevant control mechanisms).

Not just open data but data sharing…

Priit – you mentioned 6 fold rate of return on the business case (US model) – can you explain that a little more? Says a little more, then… see the US dept of Homeland for more.

Amber data available to the citizen it's about.

Muriel – do you see a relationship between actual re-use and reusability? By opening up, we see a lot of orgs get better data – roads and met office data, business register etc. It solves the need for a separate copy.

1.4M lookups per month on Norway company register – a country of 5M people. Just announced will open more as Linked Data.

Speaker notes

Slide 10: These are the benefits we have focused on in our paper. To confuse you, I will save the first one to last and go a bit more into detail on that one, as I think number two and three are more well known.


Slide 11: The Norwegian Public Roads Administration maintains the National Road Database. A few years ago they decided they needed to modernise the system, partly to make it easier to collaborate with partners. They ended up with a solution where the data was made available through an open API - as open data - and then they built their own services on top of that API.


Slide 12: This spring a private company released this service based on the same API the public road administration uses for their own services. The new service shows where traffic accidents happen, on what day, seriousness etc. This is in front of the supermarket where I go. Luckily I don't own a car ...


Slide 13 The new service has revealed issues with the data quality

Like this; Why do a lot of accidents happen in the middle of a lake, far from any roads?

According to the Public Roads Administration, this is because the police and health-care services do not register the accidents correctly, so this is where all accidents are placed when their location has not been registered correctly.
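A default location of this kind tends to show up as one coordinate pair shared by an implausibly large share of records. The following is a minimal sketch of such a check, not the Roads Administration's method; the field names, the sample coordinates and the `min_share` threshold are all assumptions made for illustration.

```python
from collections import Counter


def suspicious_locations(records, min_share=0.05, precision=5):
    """Return coordinate pairs shared by at least min_share of all records.

    A placeholder location used when the real position is unknown will
    typically appear as a single point repeated many times.
    """
    counts = Counter(
        (round(r["lat"], precision), round(r["lon"], precision)) for r in records
    )
    total = len(records)
    return {
        point: n
        for point, n in counts.items()
        if total and n > 1 and n / total >= min_share
    }


# Hypothetical sample data: three accidents share one identical position.
accidents = [
    {"id": 1, "lat": 59.9139, "lon": 10.7522},
    {"id": 2, "lat": 60.0000, "lon": 10.0000},
    {"id": 3, "lat": 60.0000, "lon": 10.0000},
    {"id": 4, "lat": 60.0000, "lon": 10.0000},
    {"id": 5, "lat": 63.4305, "lon": 10.3951},
]

flagged = suspicious_locations(accidents, min_share=0.5)
print(flagged)
```

Flagged points still need a human look: a busy junction can legitimately collect many accidents, whereas a point in a lake cannot.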


Slide 14 One of the most famous open data examples in Norway is data from the Meteorological Institute.

After they released their detailed data nearly seven years ago, they have received a lot of feedback that has helped them identify errors in their meteorological models.

Better models mean they improve their mission fulfilment, which is to protect lives and values in Norway - a country with _lots_ of weather ...

So Sharing of data and data quality - which comes first?

There are those who argue that the data quality must be improved before the data can be shared, and others who know that data must be shared in order to achieve quality improvement.


Slide 16 There are several examples in the paper. One of them is a recent one, using «yellow data»

It is a new register made possible by a recent change in the Public Administration Act of Norway

The change not only allows the data in question to be shared across government agencies; using it is to become mandatory.


Slide 17 The data in question is digital contact-information to citizens.

The source for the register is the national authentication portal that gives single-sign on across governmental e-services.

For all contact through means like email, sms, phone etc, every agency must use the data from the new register.

This will improve the data quality as there is only one register for the citizen to update, and every agency must use that. Less redundancy.


Slide 18 For the last (or first) benefit it is important to remember that re-use of data in the public sector is nothing we've invented. It has been the goal for many years.

Nearly twenty years ago the tax-reporting in Norway was «inverted»; instead of the citizens reporting to the authorities, the authority started to report to the citizen what other agencies and private businesses, banks etc had reported on our income, savings etc. So unless there are errors we don't need to do anything.

In fact, the systems related to tax reporting are so efficient for citizens and businesses that, despite high tax rates, an international ranking places Norway quite well when it comes to the total burden of taxation on Norwegian small and medium-sized businesses. Little or no need for expensive lawyers or tax experts.

As Phil Archer pointed out earlier today, this is the result of an evolution. The evolution of datasharing through system-to-system integrations. Each one justified by a business goal, but the sum of it ...


Slide 19 We've been criticised for using this foil to represent the complexity resulting from the evolution of integrations of ICT systems in Norwegian public sector agencies - a complexity caused by decades of tactical choices trying to fulfil business needs.

The critics say the illustration is overly simplified ... so I must emphasise that this is a «_simplified_ model of _parts_ of the systems» in the Norwegian Tax Authority.

Several years old, by the way.

Some are concerned that this complexity is about to bring the whole machinery of government to a halt. They say introducing reforms today is not a 3-5 year process. We are looking at timespans more like 10-20 years - because in addition to building new systems, we need to _change_ existing ones.

An illustration: In 2005 the parliament voted for a new penal code in Norway. It still hasn't entered into force. Because of ICT-systems.

This is not only a concern for the public sector. Also banking and telecoms are experiencing this.

As said, the complexity is a result of systems being designed for functional requirements, and focus on the goals of each _project_.

But reality is the system will live for 20 years, maybe more. And that is 20 years of _changes_. So how do we design for change?


Slide 20 This is a model from the US Department of Homeland Security, based on a case study.

By doing «data architecting» activities - designing the system for change in a way that makes it easier to share and re-use the data later - the benefit is 6:1.

Unfortunately, the break-even is after 4.7 years ...


Slide 21 I've already presented the case. Has the Public Roads Administration experienced other benefits beyond the improved data quality and service delivery mentioned already?


Slide 22 This is the list of effects presented by the Road Administration themselves.

I think the most important points related to the benefit «Design for sharing improves efficiencies» are bullet points two and three, as they show a way out of the complexity issue.

«Improved agility through radical simplification of the ICT infrastructure»

The experience from the Public Roads Administration strengthens our belief that the easiest way to make sure you are designing your system for _change_ is by designing it for _sharing_.

In essence it is about putting the data on the web - with the necessary security mechanisms depending on whether it is red, yellow or green data - and building services on top of that.
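One way to picture "security mechanisms depending on the colour" is a single access check in front of the data. The sketch below is illustrative only. The notes say that amber/yellow data is available to the citizen it is about; everything else here is an assumption: that green means open to anyone, that yellow may also be read by the owning agency and agencies it is shared with, and that red is limited to the owning agency. All class and field names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Light(Enum):
    GREEN = "green"    # assumed: open data, readable by anyone
    YELLOW = "yellow"  # assumed: data subject and authorised agencies
    RED = "red"        # assumed: owning agency only


@dataclass(frozen=True)
class Requester:
    citizen_id: Optional[str] = None  # set when a citizen is authenticated
    agency: Optional[str] = None      # set when an agency system is calling


@dataclass(frozen=True)
class Record:
    light: Light
    subject_id: Optional[str] = None      # citizen the record is about
    owner_agency: Optional[str] = None    # agency maintaining the record
    shared_with: frozenset = frozenset()  # other agencies allowed to read


def may_read(req: Requester, rec: Record) -> bool:
    """Decide read access from the record's traffic-light classification."""
    if rec.light is Light.GREEN:
        return True
    is_owner = req.agency is not None and req.agency == rec.owner_agency
    if rec.light is Light.YELLOW:
        is_subject = req.citizen_id is not None and req.citizen_id == rec.subject_id
        is_shared = req.agency is not None and req.agency in rec.shared_with
        return is_subject or is_owner or is_shared
    return is_owner  # RED


contact = Record(Light.YELLOW, subject_id="c-1", owner_agency="registry",
                 shared_with=frozenset({"tax"}))
print(may_read(Requester(citizen_id="c-1"), contact))  # the citizen it is about
```

Keeping the rule in one place means services built on top of the data do not each have to reimplement it.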


Slide 23 So we don't just talk open data - we talk data sharing!

Johann Höchtl & Peter Parycek

slides File:Samos SharePSI Austria UptakeandImpact fin.pdf

Peter introduces self.

Open Gov data in Austria – we learnt from others e.g. data.eu.

Federal structure is important factor.

Don't want data islands – need standards for metadata.

We need a central portal of all data sets in AT, through metadata

Worked with OKFN on creating open data on companies, launching tomorrow.

Talked to admin – we don't want to store data from companies or we'll need to have contracts with them.

New portal won a public service award last week.

Open data portal from OKF AT and Wikimedia AT.

Possible through standardisation.

But standardisation can be too slow, so we built up a framework. See slide – various working groups.

Real-time public transport data led to 14 apps. The company that creates the data said their app was enough – 13 others thought they could do better.

1 metadata schema for AT. Able to match DE and AT metadata.

Johann – insights into ongoing research. Slide is informative

Managed to persuade some nay sayers in Vienna…

Again, slides show all relevant points

Mateja and Gaspar Zejn

Introduces ministry and work with anti corruption agency

Transparency seen as important in fight against corruption, also in OGP and OSCE

Provide advice on how to access data – reactive transparency in response to requests, proactive is publishing the data.

Slide on Supervizor is detailed.

Matching public transaction data with company register helps cut down on corruption. Supervizor written into the Act.

Gaspar - Slide shows detail of components - Shows some screenshots of the tool in use

Personal names removed from transaction data before publication.

Can be hard to match with company register due to incomplete data.
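The matching problem can be sketched as a join that prefers a shared identifier and falls back to a normalised name when the identifier is missing. This is not Supervizor's implementation; the field names (`tax_id`, `name`, `recipient`) and the sample rows are assumptions made for illustration.

```python
def normalise(name):
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(cleaned.split())


def match_transactions(transactions, register):
    """Pair each transaction with a register entry; collect the leftovers."""
    by_tax_id = {c["tax_id"]: c for c in register if c.get("tax_id")}
    by_name = {normalise(c["name"]): c for c in register}
    matched, unmatched = [], []
    for t in transactions:
        company = by_tax_id.get(t.get("tax_id")) or by_name.get(
            normalise(t.get("recipient") or "")
        )
        if company:
            matched.append((t, company))
        else:
            unmatched.append(t)
    return matched, unmatched


# Hypothetical sample data.
register = [
    {"tax_id": "SI100", "name": "Alfa d.o.o."},
    {"tax_id": "SI200", "name": "Beta, d.d."},
]
transactions = [
    {"amount": 1200.0, "tax_id": "SI100", "recipient": "ALFA DOO"},
    {"amount": 300.0, "tax_id": None, "recipient": "beta d.d."},
    {"amount": 50.0, "tax_id": None, "recipient": "Gama"},
]

matched, unmatched = match_transactions(transactions, register)
print(len(matched), "matched;", len(unmatched), "need manual review")
```

Returning the unmatched rows explicitly keeps incomplete data visible instead of silently dropping it.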

More examples of openness leading to data quality improvement.

Q from audience: Do you have examples of public finding evidence of corruption?

Yes, the school headmaster who was spending a lot of money with his wife's company.

Numbers can be hard/dangerous to examine as they're just numbers.

We had to turn off logging on the server because it couldn't handle the volume.

Everyone looked up their home environment.

Mateja – main effect is preventative.

A Federation Tool for Open Data Portals Dolores Hernandez

Slides are detailed and informative

Muriel – did you calculate the investment?

Dolores: no. but it wasn't expensive as we used standard tools and methods.

Athina – I'm always happy when I see geo examples. Are you working with IGN Spain?

Dolores – they have some data in their portal that they're going to integrate more actively in the federation.

Panel

One person from each presentation. Add Michiel de Keyzer from PwC. Working on EC projects, working on SEMIC for ISA Project. We implemented some vocabularies – see leaflets outside. also working on Open data Support for DG Connect. Provide training etc. We were faced with a Q – where do we start? Which datasets should I publish first. So we worked on answering that for them. Identifying high value datasets etc. We developed a definition.

Deirdre – Johann and Peter, you don't enjoy talking to lawyers about OD and distanced yourself from the PSI Directive. But countries will have to implement these amendments – esp. open by default. In the country you represent, how is the PSI Directive being implemented?

Dolores – we have finished the work to make the transition in our legislation. We have a programme APORTA that includes workshops, forums with associations with enterprises, meetings with training with owners of data.

Steinar – not in EU but directive applies to Norway. Directive broadens scope and makes it harder to charge for the data. It takes longer to transpose in Norway – but we already comply with almost all of it (except cultural sector).

Peter – the potential is there if the data is really free. In many countries the data won't be free. I'm sceptical if the new directive will be successful. That's why we take a different strategy. People like the idea of marginal costs so that's going to restrict access to the data. Which is why we like ccBY. I wish we had a law like Slovenia's FOI act. We (Austria) didn't do well in table of openness.

Gaspar – we have a very good law in Slovenia so the new directive is already partially implemented. Non-commercial use is free (only?)

Transportation data should be published but it's not. This is because the company that provides the service lobbied successfully.

Makx – so it's a mixed picture. What I hear here is that the new PSID is not perfect as it leaves open the option to charge for some data although we often see that it is more expensive to charge than to give it away.

Feroz – Supervizor – we can publish once a year, once a month etc. You gave the example of the school teacher who gave a lot of money to his wife – is your data updated in real time?

Gaspar – at first it was done in monthly dumps but now it's daily although it depends on the data set.

Q: When talking about FOI, there is the conflict with privacy. There were several cases – the traffic light system looks like a good solution. Who decides what is red/amber/green? Is it in the law or is it in the gift of the PA?

Supervizor – you exempted transactions below €200 – but if you really want transparency – you also need to open the tenders themselves. Then you get into the privacy of business data. Not everyone wants their business laid out – as they'd see it as their business practice.

Mateja – the info commissioner is a strong institution. The decisions are binding on the institutions. Many years of practice – it's now struck the right balance between privacy and what is generally accessible. But this is obviously sensitive. Info that is connected to public spending generally is open, it trumps privacy.

Recent ECJ case concerning farm subsidies. Court said that there should be a distinction between accessible online, and what can be accessed through FOI.

Dolores – the info that is published – it's the owner that decides. We need to be careful of the law on privacy. A new law on transparency needs us to publish spending data in open formats.

Steinar – traffic light system is not a legal framework. It was invented as a guide. It's about sharing more than publishing. If you do it right, you can improve the privacy by making it available to the person it's about.

Session 2: City Life and Open Data

Data and services to data in your neighbourhood

Chair: Mateja Presern

Scribe: Steinar Skagemo; (10 minutes + 5 min Q&A per speaker plus 15 min panel)

Helsinki Region Infoshare

A transparent City

Ville Meloni - Programme Manager - Forum Virium Helsinki

abstract Web site [slides]


Background

Helsinki Region Infoshare - www.hri.fi

Open Public Data from Helsinki region for everyone easily to use without cost

Our dream: A transparent and open, participatory, functional and lively city

The Mayor of Helsinki is heavily behind this idea

Examples in the slides of use-cases -- easier dialogue, access to relevant discussion topics, encouraging citizen participation.

Special in Helsinki; electronic document management system - Ahjo - used to prepare all decisions in Helsinki

Ahjo is now Open (March 2013) - open API fully documented in English. Links also to video of meetings. See link in slides.

What has happened? Have seen interesting apps

One example: "Decisions" -- a good user interface to the API, including mapping all mentioned addresses to a map.

See list in slide with list of other ongoing activities, amongst these are:

- Working with users to pilot real-life use-cases. Sees the civil servants as an important user-group

- Using prize-money to encourage creation of prototype applications and visualisations

See slide with challenges, including citizens' unawareness of public decision-making processes and lack of standardisation

Ville Meloni is looking forward to a collaboration on this topic

QA

Q: Athina, OGC, on the need to improve the co-operation between Open Data and OGC, to ensure that open data can make better use of standards and experience from the many years of work on geospatial data

<incoming-lost-luggage-call from airport distracting the scribe>


Open Traffic Information Standard & Experimentation for Enhanced Services

Philippe Mussi for Jean-Marie Bourgogne, Open Data France 

abstract paper slides


Public Transport Data in the City of Gijon

Martin Alvarez-Espinar CTIC, Spain

abstract paper slides

</incoming-lost-luggage-call from airport distracting the scribe>

XiXon Sound: Creates music based on the traffic data from the bus service. No one ever envisioned that traffic data would be used as a source for music …

New low-cost displays built from recycled computer screens with Raspberry Pi, in different public institutions (hospital, railway station, university)

Not only social benefits; see slides for a calculation of the saving per display (excluding installation cost)

Direct savings for the Government: € 0.8 million

Q: How did you get support for opening up the data?

A: Had to convince the managers, but it was easy, because they wanted to provide services to the customers. Saw the result after only a week of work. Had already a not-so-popular service based on SMS. The top-level was convinced -- that made it easy.


Open Crime and Justice Data in the UK

Amanda Smith - The Open Data Institute, UK

abstract paper [slides]


Launched data before the data.gov.uk-portal

Background on the policing in the UK. Wanted to make the citizens more aware of local policing teams and crime rates

One force spent a lot of money to map crimes to show on a web-site. Realised it would be better to do something for the whole police-force.

2008-version of the Crime-mapper: Shows hot-spot map, changes over time. But there was a fear that people became scared

Made a study of how people reacted, asking 1700 people. The result was clear, this was information the public really wanted.

The open data agenda hit the UK in 2010. Gave a lot of support for the work from the Government.

Today's situation: how police.uk looks now, and how it works behind the scenes

Also supporting the _users_ of the data, for instance by making it easy to generate CSV files of the data they want. Showcase of apps based on the data, including a Pebble watch app that keeps you updated on the crime rate in the area where you are currently walking.

With personal data at a granular level, how do you strike a balance between privacy and transparency? The solution is to map all crimes to points that do not represent individual addresses. Each point covers a minimum of 12 postal addresses, and the point itself is located in the middle of the road to increase privacy. See slide with illustration.
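The snapping step can be sketched as "move each incident to the nearest predefined point that covers enough addresses". This is an illustration under assumptions, not police.uk's actual code: the snap-point list, its field names and the sample coordinates are hypothetical; only the minimum of 12 addresses comes from the talk.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 positions."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_snap_point(lat, lon, snap_points, min_addresses=12):
    """Return the closest snap point that covers at least min_addresses."""
    eligible = [p for p in snap_points if p["addresses"] >= min_addresses]
    if not eligible:
        raise ValueError("no snap point covers enough addresses")
    return min(eligible, key=lambda p: haversine_m(lat, lon, p["lat"], p["lon"]))


# Hypothetical snap points placed on road centre lines.
snap_points = [
    {"id": "A", "lat": 51.5000, "lon": -0.1000, "addresses": 15},
    {"id": "B", "lat": 51.5001, "lon": -0.1001, "addresses": 5},   # too few
    {"id": "C", "lat": 51.5100, "lon": -0.1200, "addresses": 40},
]

# The incident is closest to B, but B covers too few addresses.
published = nearest_snap_point(51.50012, -0.10012, snap_points)
print(published["id"])
```

Only the snap point's coordinates would be published; the original position stays inside the police system.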

Also showing what has been done with a reported incident, both by police and courts.

Superman!

Criticised for making it harder for the market: is it necessary for the government to create the user interface?

Slide on the future, for instance focus on underlying data quality.

Future: "Track my crime" - to track it through the judicial system

QA

Q: Examples of police forces manipulating the statistics - what would be the effect?

A: Coupling together the quarterly National Statistics with the real-time data (monthly) improves the data quality. Also have examples of citizens discovering that their reported crimes do not show up; "Where's my burglary?"

Q: Measure the effect? Like estate-prices?

A: Amanda answered 6000 emails first year, no complaints. But all the time focusing on having an open discussion on the subject to find out if there are negative reactions to the openness


Experiences with Open Data in the Fire Department

Bart van Leeuwen, Netage, Netherlands

abstract paper [slides]


Two hats: Firefighter _and_ a company for open data for firefighter force.

Two versions of the same incident - a graph and a fire

My Fear - that all the information necessary to prevent an incident actually was available, somewhere

Timed with the Amsterdam initiative for Open Data

Some experience with the Open Data - might sound negative, but restates that he is really enthusiastic about Open Data

For instance, a car or a building can look very different when the fire department arrives from how it is recorded in the registry

The polygon of Anne Frank's house

Previously needed to pay high costs for licensing information that is freely available today.

But a hurdle is the amount of data, as there is a lack of a good API.

Accountability: A fire fighter never makes the best decision, but a decision that hopefully is good enough. Because they have to act, they can't spend too much time analysing the information available.

If you choose wrong, someone gets 500.000 € and half a year to analyse the situation and find what they should have done.

Measured timeliness for fire trucks according to the classification of buildings (age, historical building, demographic, …)

If you correlate the timeliness data with risk data, it shows that the fire fighters are on time where they _should_ be on time.

The harbour: The trucks are always too late, because there are no lives at stake, only buildings, and they are insured.

Chemie-pack - example of the need to make decisions quickly. A documentary has been made about it. The first officer noticed that everything he had ever learned was irrelevant. Dilemma: Chemicals need foam. Acetone needs water. What should they choose?

Sometimes it is better to let it burn and let the heat "clean" the air; if the fire is cooled by water, more of the dangerous chemicals will survive and spread with the smoke

Off limits …

Should we plan for un-predictable things? A Skoda in a church roof.

Open Street Map normally way better than the government maps.

Bart predicts the first question: How did the Skoda get onto the church roof? 170 km/h, ramp. Bart will post a video explaining the mechanics


Panel discussion

In addition to the speakers, Daniel Pop, West University of Timisoara, Romania

abstract paper

Working on piloting a platform for raising citizens' awareness, arranging Open Data hackathons etc., and other examples which show that Romania is making consistent steps to bridge the gap that exists between Romania and other countries that have had several years of activities related to Open Data.

Q: The main target audience is the local population, but what about tourists? Wouldn't necessarily be aware of the local open data and apps, that might not even be available in other languages

Martin: Tourists are also part of the target audience, but this was left out of the presentation.

?: A small municipality who used the platform to raise awareness. Managed to install over 40 displays in public places to advertise news and e-services from local government to citizens and tourists.

Philippe: What we miss is the users we don't know. From the experience from Marseille, European Capital of Culture last year; had the ambition to build a service for "beach addicts". Took data from restaurants, social media etc. Now a private company. Vamos a la playa! http://opendata.regionpaca.fr/applis/appli/vamos-a-la-playa.html

Ville: Helsinki: Targeted tourists specifically in hackathons last year. Service maps, transportation, events. Have tried to work for standardisation of for instance events-data.

Amanda: Two groups who stood out as users; expats wanting to see what's going on back home and potential students considering where they would like to study

Bart: the real question is how do you support the data being used in different situations, for different target groups. There has been some work on a "City SDK" which aimed at making it easier to move applications to new areas. Would the hotel like it if the fire department told the tourists how well they are doing with regard to fire safety?

Q: Really interesting examples. But it is "status"-information. What about the dynamic element? Anticipating delays based on the regularly delayed railway lines? Or the government starting to work specifically on the crime situation in a certain area

Bart: Use sensor data from main roads to see which roads are congested. Use the data to actively engage with the citizens; you are in a high risk area, think of an escape-plan etc.

Amanda: Making it more dynamic would mean adding more datasets, and time and date. But that would make it easier to identify persons. Already have some advice related to the information, but could go further.

Ville: Don't have too much experience yet. Looking forward to the discussions that will come, for instance, if someone notes that decision-makers are treating demographically different parts of the city differently, in an unfair way

Philippe: Trying to do it for the regional transport planning system. Would like to emphasise that a lot of data is user-generated, for instance signals from cell phones. Another example: jellyfish monitoring on the beaches of the Mediterranean. People can report, and the data is open. For the moment there are so many jellyfish that everything is red.

Martin: Very rich information, for instance from traffic-cameras -- this information is also dynamic.

Mateja closed the session and concluded that the presentations gave a great overview of lots of things that have happened, though some challenges remain. A lot of useful ideas for what data could be used for.

Session 3: Order from Chaos

File:Rapporteur Noel.pdf

Session 4: Innovation & Insight

scribe: Amanda

The Flemish Innovation Projects: promoting innovation through encouraging the use and re-use of government datasets

Noël Van Herreweghe, CORVe, Belgium

Background: In 2010, Ministers of the Flemish Government met and agreed to take forward the transparency agenda.

Noel set about defining a framework that would focus on the strategy, legal, content and technology aspects.

Work to be completed included a concept note, strategy and action plan.

The ministers signed off the concept note, 2 years before the new PSI Directive, then defined an action plan for the upcoming years, all signed off

With the content and comms aspect, created an Open Data Handleiding 2.0 handbook

Worked to create a dataset register (both open/closed) and defined a URI strategy

Organised open data day in 2012 in Flanders (200 people), this has grown to 250 (2013) and now expected 300 (2014)

With legal colleagues defined 3 licenses, and re-writing legislation to accommodate new PSI directive.

Regarding technical: now using CKAN for open data platform and working to implement DCAT and necessary vocabulary.

Then waited for apps and services to be created - didn't happen, so asked for half a million euros (approved) and created the Flemish Innovation Projects programme.

24 proposals were received, 10 of which were selected, costing a total of just under 700,000 euros. Proposals included:

  • open data gent (www.gent.be) - all content for the site will be available as open data, not just the open data
  • implemented new central traffic control system (occupancy of car parks in real time) - dynamic mobility data
  • antwerp - crowdsource app and dams network (museums / archives collections of antwerp)
  • local government contact information (gent) - to be used by businesses, citizens and officials (due for delivery in august 2014)
  • westtoer - the great war centenary - facts, stories, etc.
  • economy, science and innovation - build an open data platform that links research and information data to stimulate multidisciplinary research
  • flemish codex - all regulations
  • tourism api for flanders

Q: who will finance further management of portal?

A: we will have to (including the APIs we have provided) – why develop a platform which you can’t maintain?

Open Government Data - Fostering Innovation

Feroz Farazi, University of Trento, Italy

Open Data Trentino

Some of the data couldn’t be published, either personal data, fear for national security, or internal documents written by researchers -- which can't always be published as subject to intellectual property rights.

It was hard to encourage opening of data sets for transparency - risk averse - may think that something is being hidden.

Convinced them by suggesting that citizens and developers could help propose solutions for innovative problem solving.

One example of data opened up was queues at post office - to understand customer experience and how to improve this for them.

ODT project worked with provincial departments and offices - advise on better formats to use as the process continues, getting more and better data

There were 650 datasets: problems with incompleteness of fields / rows (e.g. column headers) and naming (trento / trentino)

Worked to convince publishers not to just publish data once but to continuously update and improve and re-publish the data

Q: Entity types – did you think of re-publishing these?

A: Relying on existing standards / specs - but should be able to republish these

Publishing and Consuming Linked Open Data with the LOD Statistical Workbench

Valentina Janev, Institute “Mihajlo Pupin”, Serbia

Slides are informative

LOD2 project – results to be presented at LAPSI (September)

Product – linked data stack to go-live in September

Integrated tools for managing the life cycle – extracting from databases, enrichment of the data, validation, interlinking, etc..

Worked with statistical office – extract data – map through Serbian CKAN (?????)

LOD2 Statistical workbench – tools integrated come from all partners rather than just the three shown in the slide – tool can be used for publisher / consumer

Once data is in RDF store, publish on CKAN – therefore statistical workbench integrated with CKAN.

Hard if data not coded with the same code list – first have to align the code list
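Aligning code lists amounts to translating every code into one shared list before datasets are combined, and reporting the codes that have no counterpart. The sketch below is a generic illustration, not the LOD2 workbench's behaviour; the mapping (ISO alpha-3 to alpha-2 country codes), the `area` field and the sample values are assumptions.

```python
def align_codes(observations, mapping, field="area"):
    """Rewrite codes via mapping; return aligned rows and unmapped codes."""
    aligned, unmapped = [], set()
    for obs in observations:
        target = mapping.get(obs[field])
        if target is None:
            unmapped.add(obs[field])  # needs a decision before publishing
            continue
        aligned.append({**obs, field: target})
    return aligned, unmapped


# Hypothetical mapping between two code lists for the same dimension.
alpha3_to_alpha2 = {"DEU": "DE", "AUT": "AT", "SRB": "RS"}

observations = [
    {"area": "DEU", "year": 2013, "value": 10},
    {"area": "AUT", "year": 2013, "value": 20},
    {"area": "XKX", "year": 2013, "value": 30},  # code missing from the mapping
]

aligned, unmapped = align_codes(observations, alpha3_to_alpha2)
print([o["area"] for o in aligned], unmapped)
```

Surfacing the unmapped codes is the useful part: each one is a case where the two publishers disagree about what exists.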

Help user to define and choose values

Cubeviz – RDF data cube browser, once in store can use this to create visualisations

Towards A Methodology for Publishing Linked Open Statistical Data

George Papastefanatos, IMIS / RC Athena, Greece

Slides are informative

Stats data collected by national agencies (census, geospatial, socioeconomic)

Usually provided as open data in excel and PDF files

Strict confidential policies that need to be discussed every time data is used

Statistical authority has a portal providing open data – preliminary results from the last census. Problems with this: dataset catalogue in .pdf format, no browsing, no querying, no metadata, can't see how it's been processed over the entire lifecycle

Do we need open data? Large cost, little use, low value?

Converted excel format data to RDF, applying standard vocabularies

Analysed spreadsheets to see the main concepts for converting to RDF

Wanted to give them user-friendly tools to offer the data in RDF format as well and to encourage users/consumers of statistical data in Greece to take data in this format and integrate them with other sources and promote it in this way throughout the statistical domain.

Therefore, take stats results, model them for main themes, data conversion to clean the files, map to vocabularies, generate RDF files and validate the published dataset.

Also interlink the dataset with other sources of statistical information – interlink the main concept / dimension, to allow for further analysis

Statistical Linked Dataspaces and Analysis

Sarven Capadisli, Bern University of Applied Sciences, E-Government-Institute, Switzerland

Project with Stats Office Ireland was to convince of value of linked data

Eurostat – large datasets, converting and making available had its difficulties

What vocabularies do we use? Do we create our own? Who will look at the URIs later on down the life?

World Bank provided the low hanging fruit as the data was out there, but there were 4 different departments with no collaboration and data was understood and published in different ways (for example, finance departments recognised Taiwan as a country, but the World Bank indicators didn’t – therefore no code in the code list for Taiwan)

Too much command-line junking – process very frustrating, but change came when agencies published in a different format (SDMX-ML) – therefore could use the data as used within the agency. Stuck with SDMX-ML as a format due to the time needed to convert from CSV/Excel. Will these URIs be ‘nice’ URIs in 5, 100, 10000 years?!

Provenance layer throughout all the data – as the data is retrieved, add a statement saying when it was received and what transformations were made (and based on what retrievable activity) – indicates licensing etc., allows tracking to the source.
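A provenance statement of the kind described could be sketched in Python as below. The record layout, the function name `provenance_record` and the example URIs are assumptions made for illustration; only `prov:wasDerivedFrom` and `dcterms:license` are borrowed from existing vocabularies (PROV-O and Dublin Core Terms), and this is not the format used on stats.270a.info.

```python
from datetime import datetime, timezone


def provenance_record(dataset_uri, source_uri, transformations, licence,
                      retrieved_at=None):
    """Build a simple provenance statement for one retrieved dataset.

    `retrieved_at` defaults to the current UTC time when not supplied.
    """
    retrieved_at = retrieved_at or datetime.now(timezone.utc)
    return {
        "entity": dataset_uri,
        "prov:wasDerivedFrom": source_uri,          # allows tracking to the source
        "retrievedAt": retrieved_at.isoformat(),    # hypothetical key name
        "transformations": list(transformations),   # what was done, in order
        "dcterms:license": licence,
    }


record = provenance_record(
    dataset_uri="http://example.org/dataset/population",   # hypothetical URIs
    source_uri="http://example.org/source/population.xml",
    transformations=["download SDMX-ML", "convert to RDF"],
    licence="http://example.org/licence",
    retrieved_at=datetime(2014, 7, 1, 12, 0, tzinfo=timezone.utc),
)
```

Keeping the transformations as an ordered list means each published dataset can be traced back step by step to what was originally retrieved.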

Easy to understand historical data (for example, how many people born in Berlin since 1900) but what about interesting analysis down the line? Predicting / forecasting?

Stats.270a.info >> aimed for non-developers (data journalists, researchers, etc) and linked data friendly – don’t need to type queries, can click through.

Session 5: Bar Camp

Location Data

Scribe: Daniela

  • What standards for open data/(open) location data
  • there is a potential overlap between open data community X location data community
  • location data community (Geonames, OSM)
  • linking geographic information to other kinds of data
  • space and time are two main features, everything is geographically located
  • which standard for different types of geospatial data
  • quality management of data (example: improvement of bus stop data)
  • infrastructure - data catalogue, applications
  • format of the data, INSPIRE community is looking into Linked Data (a Linking Geospatial Data workshop has been held by W3C and OGC)
  • how to make sure that standards are used (by government, by developers)
  • data is in so many different formats
  • OGC (Open Geospatial Consortium) - everyone can join, all tenders have to comply with INSPIRE
  • .csv standard - description on the standard, header describes the columns
  • "open" side of data - standard-side of data - accessibility and the different communities
  • table joining standard (tjs) - how to join different tables and complementary information
  • standard vocabularies + semantics are needed (for instance two columns "street" - meaning something different)
  • different standards related to different geospatial problems (location, meteorological data, layer, etc.)
  • good examples for opening up geospatial data Mainz, Zagreb
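The table joining idea from the list above (attaching complementary information to location records through a shared key) can be sketched in Python as follows. This only illustrates the general idea and is not an implementation of the OGC Table Joining Service; the column names, the function `join_tables` and the sample values are made up.

```python
def join_tables(locations, attributes, key):
    """Attach attribute rows to location rows that share the same `key` value.

    Locations without a matching attribute row are kept unchanged.
    """
    by_key = {row[key]: row for row in attributes}
    joined = []
    for loc in locations:
        extra = by_key.get(loc[key], {})
        # Location fields win if both tables define the same column.
        joined.append({**extra, **loc})
    return joined


# Made-up bus stop data; coordinates are placeholders, not real positions.
locations = [
    {"stop_id": "1", "lat": 0.0, "lon": 0.0},
    {"stop_id": "2", "lat": 1.0, "lon": 1.0},
]
attributes = [
    {"stop_id": "1", "name": "Main Street"},
]
joined = join_tables(locations, attributes, key="stop_id")
```

The join only works if both tables agree on what the key column means, which is why the notes call for standard vocabularies and semantics.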

Inside Government

Leader: Nancy Routzouni

Scribe: Jens Klessmann

Implementation approach

  • The group discussed which approach to follow in order to implement open data policies:
  • Force is needed, otherwise public officers at large will not become convinced. Public officers need to detect added value for themselves in order to follow through
  • show political benefits in order to gain support
  • win over the managers in order to spread the effort within the public agencies
  • use a top-down approach with a strategy including a priority list of data sets
  • balance for whom data is relevant: politicians, journalists
  • build up pressure from potential re-users of public databases
  • Greek transparency project as a model? It has multi-disciplinary task forces as project teams in each public office
  • Conclusion of the group: combine bottom-up and top-down approach in order to reach application of open data policies.

Measures for implementation

  • The group identified different measures for implementing open data policies

Technology

  • from a technological view a centralized platform is needed, which also allows for decentralized open data platforms
  • support officers with money to open up data from legacy systems

Education

  • provide supporting material for the officers
  • create healthy feedback loops between data publishers and users in order to minimize mis-use of data and improve its quality
  • Open Data strategy days so public officers have time to brainstorm about implementation of open data policies
  • have answers for common fears around opening data:
  • Reasons for not opening data: Noel has collected 53 fears. Top fear is that public officers think their data is not fit for publication

Motivation

  • motivation is a key factor because otherwise one can have as many action plans as you want
  • motivate public officers through inspiring events
  • work with emotions like anger, pride in order to motivate
  • create a sense of reward for public agents that it is worthwhile to open up data for them
  • give clear responsibilities to public officers

High Priority Data Sets

Leader: Michiel

Opening is costly - How do you prioritize?

Looking for a definition of high value datasets.

Need strong focus on reusers

User demand

  • When there are concrete demands from potential reusers. But demands are often quite vague. Few answers to calls for interest
  • But often, need to have the data available to generate ideas
  • In the UK, data around schools, planning (construction), licensing, location are in high demand. Developers in London asked for crime and transport data. Not clear that it is the same everywhere

What is value?

  • You can classify data by reuse: transparency or app development
  • Even if not reused, the data adds to transparency, can serve as a deterrent for fraud and corruption
  • Depends on the audience: small group with high value vs. large group with small value

No single factor

Concern that low value datasets may be the ones of interest to disadvantaged people

Need a strategy!!!

Metadata

Leader: Peter P

A guiding vision for open data could be to approach the data portal, ask a question and receive an answer, which could be a visualisation or an application. To achieve that, high-quality data and metadata and free, well-working services are needed.

Service metadata is just too tedious to be collected by humans. Instead it would make sense for it to be collected by machines along the way. The importance of APIs will rise tremendously, as the concept of “owning” data will make less and less sense. However, bad metadata offers only bits and pieces towards web service discovery. Existing metadata could get enriched by the public, however it should be obvious who provided authoritative data.

  • Metadata can be generated
  • Automatically;
  • Community driven;

A combination of tools and human intervention, where humans might be from the providing side or from the user side.

The question was raised whether metadata is that important at all, as e.g. metadata could get extracted from the data. While this is true for textual data, sensor data would still require recording where the sensor is located and what kind of information is recorded.
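As a rough illustration of extracting metadata from the data itself, the Python sketch below derives column names, guessed column types and a row count from a CSV file. The function names, the type labels and the sample data are assumptions made for this example and were not discussed in the session.

```python
import csv
import io


def infer_type(values):
    """Guess a column type from its non-empty values."""
    values = [v for v in values if v != ""]
    if not values:
        return "empty"
    for caster, name in ((int, "integer"), (float, "number")):
        try:
            for v in values:
                caster(v)
            return name
        except ValueError:
            continue
    return "string"


def extract_metadata(csv_text):
    """Derive basic metadata (row count, column names and types) from CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = rows[0].keys() if rows else []
    return {
        "row_count": len(rows),
        "columns": {c: infer_type([r[c] for r in rows]) for c in columns},
    }


# Made-up sensor readings, for illustration only.
SAMPLE = "sensor_id,reading,note\ns1,20.5,ok\ns2,21,\n"
metadata = extract_metadata(SAMPLE)
```

Nothing in the file says where the sensors are located or what `reading` measures, so that part of the metadata still has to be recorded separately.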

The Problem of data classification

Currently data classification is a big and unsolved problem. Data is classified from the providers’ side, whereas the users have different points of view on that data, affecting the classification they would assign to it. A wealth of vocabularies exists, however these vocabularies should get integrated into data management applications. These vocabularies are, for example, EUROVOC, the Core Vocabularies, etc. However, on a national level, there are different code lists to represent the same concept and in the end none of these classifications will be “right”. Describing things in a formal way, however, works out in domains like science.

The outcome of this track of discussion was that there will be chaos and we should embrace it – that’s what the (semantic) web is about anyway.

Organisation of data portals.

The way to go is to have federated levels of portals. This contributes to the idea of having several layers where metadata could get enriched. One way is to assign classifications on a national level, move up the hierarchy and once every level has added its metadata, the levels further down have the opportunity to discuss that assignment and possibly re-assess their own classification. Over time, more understanding of taxonomies will arise and higher levels of harmonisation.

Another way to go is to define guidelines at the top level and enforce them top down.

PSP – Published service providers

The outcome of this discussion was that each metadata has its scope and there is a good reason for that.

Additional discussion during the workshop

On July 1 Austria went online with opendataportal.at. The same metadata format to describe data is used as on data.gv.at. The project is financed by a grant from Wikimedia, OKFN and the City of Vienna.

The motivation for companies to release data is

  • companies convinced on the concept of open innovation
  • Corporate responsibility (by public pressure)

Some international examples have been mentioned, such as banks revealing the amount of money they transfer or car manufacturers their test results. Sometimes there already exist legal obligations for private entities to release this sort of data.

Besides convincing private entities by explaining benefits such as open innovation approaches, it might be feasible to create more legal pressure towards opening data.

The approach to download data is fundamentally wrong. Instead go for a system, where Metadata and data is provided by virtualisation of Services. Security is not a by-product but a function of a data portal.

Metadata should be described at the metadata level AND the fact / entity level.

Share-PSI Wednesday Meeting

Scribe: Deirdre (updated by Peter B.)

Kicking off with name game!

Phil: Part of the point of the name game is that we should know each other and be able to contact each other whenever we need!

Noel: For the next meeting, could we have a list of names with photos?

Phil: Yes, we will take photos today

There are flyers, Gosha handing them out to everyone. We should utilise them as best as possible, try to put them in the hands of people that actually care!

Topic: yesterday's meeting

Peter (Green tee): It was exciting to know everything that was going on, but the embarrassing part is that when asked who is using them, we don't have a satisfactory answer. TRIS environment, work out what the ideal solution is, then work out what the reality is, and looking at the difference gives you a quasi-metric of the direction to go in.

chris: fantastic case studies, but largely representing one point of view - representative of the publisher pov, not many presentations from the other actors, users, etc. in order to define best practices, we need a wider scope and define actors

makx: good use cases, but we also miss the reuser, not just the user, people who want to build businesses on this. we also miss the cost part, how much does open data cost and what is the benefit. and this is the kind of information we need to convince people to move open data forward. Wanted to stress the word story-telling.

muriel: I was interested in the presentations, but I lost the topic of the workshop a bit. I was trying to get arguments for open data, I would need more quantifiable data. in the end they were trying to find new arguments for the benefits of Open Data.

Jens: Too many topics, have shorter/less presentations.

Johann: Break-out sessions were good, when we have the opportunity to sit together, we should utilise it more.

Phil: Agree with all these points. A problem was how the workshop was set up, we said if you submit a paper, you get to speak, but it was not so in at least 2 cases. Worried about the set-up, about offending colleagues, worried about cramming sessions. One of the most successful sessions was the break-out session, everyone got to talk about what they wanted

Peter green tee: what does the government actually use for efficiency measures? If we narrow these down to relevant numbers, we could do comparison across countries.

Measurable impact of the project

phil: how could you measure efficiency?

Neven: you can model the utilization of internal resources, may take time (several months), but how can you model benefits? most are social, intangible, nothing to do with pure cost-benefit analysis.

Peter Parcyk: They can talk to City of Vienna and look at how they made their measurement. difficult to get high-level excel sheets - sheets are not registered, the heads of departments have to be convinced to provide (note: what excel sheets? note P.B: official evidence/registers in the simplest form) Action: will make a summary of city of vienna, 2-3 pages

Harris: we can always measure money

Noel: You can't do it easily, don't know how to measure efficiency

Peter (Sweden): scope about efficiency is too narrow => rather effectiveness, not efficiency

Mateja: it's kind of difficult to say what a successful measure is. showed open spending, the way they looked at efficiency is something that is good for society, civic participation, corruption was lowered, etc. this is efficiency

Muriel: agree that a decrease in corruption could be used, and another measure could be that fewer requests are received

Neven: we concentrate on effectiveness, there are standard models for such things. EC has LOST, Germany has LIDR, etc. in all of these models there is a strong emphasis on social impact

Phil: a result of the project is that we need to look-at/review these models, Neven might be in best place for this.

Steinar: What are the metrics that are most useful to convince agencies - models don't necessarily motivate leaders of the agencies. Register of shareholders, fear to handle request, so they just opened it completely. this is one example of benefit. There is also the internal benefits of designing systems for sharing data, reducing operation costs, this is direct benefit for them. Connect internal process with Open Data - one process.

Noel: if you are talking about effectiveness, that you can measure, e.g. customer satisfaction, quality, efficiency seems like money

Daniela: Benefits of Vienna, show that from 100 apps worth €550,000 benefit... don't believe this. Anyone can find measurements to come up with any number. Governments always ask what is the value of the data. The cost benefit isn't the point, the point is this is public data and is the right of citizens.

Noel: Neelie Kroes has said the value is xs billion, where did she get this figure? it's based on the study, but is the model known? what models do they use, the UK gov asked Noel this, he emailed Neelie, but didn't get any answer

Athena: there is a geo study on free and open data from the European Association of Remote Sensing Companies, they talk about the basics of providing PSI/open data and also cost models

Phil: keen to come to conclusion, what can we do about this? one area that might be worthwhile is effectiveness instead of efficiency. And what models do we use

Scribe: Jens (updated by Peter B.)

Review of the workshop

  • Deirdre telling the group how work of workshops comes into the work of the Data on the Web best practice group
  • one of the main working groups in W3C
  • came from Linked Data WG
  • Idea: broaden the approach and look in general at Data on the web (business and government data)
  • Sister group: (bad/good) CSV on the web
  • deliverables
  • collecting best practices on good data on the web
  • guidance on metadata, publication of versions of the datasets
  • question about what kind of recommendations as technical group they can provide
  • looking at non-technical issues with relation to
  • quality and granularity descriptions to look at quality of data
  • data usage descriptions: what is the data usage?, how do we describe it?
  • worked with Brazil - several use cases (collection) - we extracted general requirements
  • Working group so far has defined use cases to start with
  • this document is understood as a live document. Further input is encouraged
  • Task for the Share-PSI: have a look at the document and think about whether you can provide additional use cases (including challenges).
  • There is a standard format for the use cases on the web page of the working group
  • Chris: This is very much a publishers view and less of a users view and the value they see in the data sets provided, we should look also on what is the value for the receivers/customers, asked by Phil to write something - answer if there will be time
  • Steinar: Look at infrastructure for sharing data in Estonia (X-Road - Government Service Bus) - that is successful. They are not allowed to build a new system without exposing the data to other partners
  • Daniela: Worth looking at readiness for open data. Effectiveness models would be great, they found only 3 and even those were copying each other
  • Muriel: If we want to measure effectiveness instead of efficiency, the group should stress, that there are efficiency gains, we are just not looking into them
  • Ville: ODI, OKFN and Web Foundation working together in New York on a common assessment framework (Common Methods for Assessing Open Data) - he is supposed to send the link
  • Deirdre points out the link to the document Towards Common Methods for Assessing Open Data (PDF)
  • Phil: Problem with the Samos workshop: Participants were mostly only from the project, which was ok, because the topic was efficiency in government

Timisoara

  • Current topic: High value data sets (prioritization of datasets)

Discussion about a) moving the open session to Krems and b) how to organize it

  • Agreement to have the break-out sessions (bar camp style) in Krems
  • Muriel: Many different topics will come up then (none of yesterday's were on the topic)
  • Chris: I'm not convinced with 2 days of non-conference
  • Jens: Session proposals can be made on site or beforehand in written format
  • Peter (Austria): we should keep both approaches
  • Andras: non-conference should need strong moderation
  • Some info about the Un-conference method from Amanda
  • Phil: there were too many topics in Samos (18-20), another topic for a workshop could be "location data"
  • How can we attract participants not from the project to Timisoara?
  • Muriel: try to attract other EU projects to co-locate with Timisoara workshop
  • Peter ("green shirt"): try to find people/communities with synergies
  • Athina knows relevant stakeholders from Novi Sad
  • Daniela suggests to contact OKFest participants, OSM community
  • Peter: Try to invite stakeholders whose primary objective is to share data
  • Daniel: Reach out to active Open Data stakeholder structures in the respective country
  • Athina: There is a strong GSI community in Romania which she can invite, noting that Berlin should have attracted more people
  • Jens: FOKUS is still interested to organize the workshop in Berlin, but does not organize the conference Open Data Dialog anymore
  • Phil: Berlin workshop could be organized sometime between November 2015 and February 2016. We might still change the current hosts, if there will be strong will of the group to do so, but it will upset the current hosts. How many rooms are available?
  • Daniela: Timisoara offers 5-6 rooms (at least) for the workshop
  • Chris: to attract the outside people, the main topic should reflect it
  • Task for every member of the project: Notify the group whom you can try to get to Timisoara
  • Possible dates for Timisoara: 16.-18. of March 2015
  • Timisoara is easy to reach. 5 h from Vienna, 1 h from Belgrade by car, with direct connections to major European cities

Lisbon (2.-5.12.2014)

Scribe: Peter B.

  • Topic: Engaging with people from outside of the project (e.g. engaging with LAPSI project)
  • Phil:
  • shouting to start the second part ;-)
  • it's about users' (commercial) exploitation
  • the problem with the Samos meeting - we were all in the room, we didn't attract many other people
  • Lisbon would of course be subject to wi-fi, google hangout...
  • 2.12. - afternoon - project meeting start
  • 3-4.12. - wednesday morning - LAPSI
  • 5.12. - friday morning - project review - 3 hosts of the workshop have to stay for the project review
  • not everyone is supposed to be there, but majority is supposed to be there, Makx: otherwise it will be seen as bad behaviour
  • EC will be there
  • organization committee – leaders should be from users’ organizations (SMEs…):
  • Muriel: try to align with projects
  • Peter (Sweden): can present use cases (proxy)
  • final composition (9): Martin (Open Group), Klaudius, other SMEs representatives (Amanda, Jugas), Peter (Sweden), Atina, Makx, Simon, Noël
  • Athena: there exists "SMESPIRE"
  • Muriel: info/data from Galileo?
  • Noël: can provide result of the France week on Open Data
  • Peter (green shirt): ongoing contribution? - IBM smart cities, Google, Yahoo... ? - Phil – it’s easy to get IBM representative for him
  • Phil: impact?
  • Johann: think about Open Data lifecycle - I see those hackers missing - they have good business ideas = strengthen the open data ecosystem
  • Peter (Austria): focus on SMEs, Google is not using open data, because it's not in their format
  • Deirdre: Google does use the data, big entities are using this data as well
  • Muriel: attended EC PSI meeting, companies being there might be interested, + big companies as well - data for free - asked by Phil to check/create the list
  • Andreas: offered to contact the Association of Hungarian Content Providers, proposed all things in all workshops (due to travel issues) because regional companies might participate more easily that way
  • Chris: how can we advertise the project in 2 sentences for those people? Understanding of the commercial interest.
  • Phil: IBM is interested in licensing
  • Klaudius: completely open data will increase the competition
  • Michiel: working on the study related to licences (under ISA)
  • Daniela: working on the standard with Google to process data
  • Deirdre: would it be useful for people from the ministries of the innovation, employment, “to create jobs” agencies ? – more people: yes
  • Gosha: preparing model for GEO licences
  • Simon: Guardian - online journalists (? didn’t catch this correctly)
  • Noël: video message from EC? Muriel - not if it is a video, rather someone to talk to
  • Daniela: lifecycle - data quality - there are some companies with this scope
  • Peter (Austria): quality - to get companies of each phase of the lifecycle?
  • Steinar: trust economy - cross-check with a lot of facts that are open data, someone from this community might have potential to discuss, in Norway for example about schools
  • Jens: companies that organize e-competitions? yes
  • Daniela: helpful - some kind of discussions - how can you earn money with open data if it is open
  • Athena: some Austrian cities have this kind of reports - can search for some
  • Phil: workshop is 1,5 day - how many of those people above are supposed to speak?
  • Daniela: "speak dating"? several people like it, Deirdre - what's the goal? Phil - networking is part of the project, from the project point of view we have to come out with bunch of recommendations
  • Andrej (conference call) - venue:
  • Portuguese Statistical Office, fully accessible, near to the airport, big room - 250 people
  • University is nearby
  • smaller meeting rooms might be available (Phil 5? - Andrej - have to check)
  • there will be someone from government, good chance to have Secretary of State
  • linking to OGP
  • main topic - to address how to use the data, promoting open data within Portugal
  • Portuguese main suppliers are supposed to attend, other ministries, municipalities, hackathons, possibly some big capitalists...
  • Muriel: breakout rooms for more important people would be good idea to bring more participants, invite also people from outside projects, reserve the rooms for talks, Phil to ask Marta to ask (push) for important EC people
  • Peter (“green shirt”): breakout sessions - meaningful questions and answers are important for such sessions and they need to be finished
  • Daniela: one idea - open space is good example - splitting into more private discussions
  • Deirdre: suggested setup: morning - together, afternoon - breakout sessions
  • Phil: first thing - speak dating (agreed by consensus), then opening speech (when exactly?)
  • Peter (Austria): minister after light lunch? (Phil: well fed people vs attention, ehm)
  • Phil: can you all contact those people? There is already a draft CSP on the wiki, needs help with examples of companies with money (Amanda?), on Friday we are supposed to review it – several people haven’t received the draft, Phil will check/resend
  • Muriel: do we need Call for Papers? It might be interesting
  • Daniela: do the companies write papers?
  • Nancy: papers are more academic, we can consider to give some datasets and make call for applications - Chris - call or competition?
  • Peter (Austria): maybe we can combine - call for use cases + call for applications?
  • Phil: has worry about the facility, if it can accommodate what we need
  • Daniela: public spreadsheets? Phil - wiki, partially for the closed group – but it's not archive
  • Phil: main list share-psi@ercim.eu, wiki is on W3C system, ask to create an account => contact Phil => you will be added to the group, there are all the papers, wiki is publicly visible, but only the group can edit
  • Phil: wrap-up: there is no money for anything (except those already provided), but everyone can spend their project finance as they want (more people – try to look on sponsorships)

Krems (May 2015)

Consortium Agreement

  • consists of two main parts
  • specifies binding agreements between partners in addition to the general agreement
  • the draft will be distributed to the consortium by Phil
  • follows the DESCA model agreement, prepared within the 7th FP, amended for the Share-PSI
  • 4 main topics in the CA:
  • general issues (liability towards each other, force majeure),
  • IP rights: Everything is going to be published under CC-BY license on the project website,
  • governance (general assembly, coordinator, management support team, meeting procedures, meeting obligations, decision making, voting rules, minutes, rearrangement of parties..),
  • finances
  • Attachment includes project logo
  • Applicable law is the law of Belgium
  • CA will have retro-active effect as the GA has already been signed
  • Phil has to check with W3C rules
  • Philipps organization still needs to officially be included in the consortium
  • Phil: thanks to everyone