- 1 Monday 16th March 2015 - Project Meeting 09:00 - 12:00
- 2 Monday 16th March 2015 - Welcome Session (14:00 - 15:30)
- 2.1 Welcome: Prof. Univ. Dr. Marilen Pirtea, Rector of West University of Timişoara
- 2.2 Radu Puchiu, Secretary of State, Chancellery of the Prime-Minister (Romania)
- 2.3 Experiences of identifying datasets for sharing, Benedikt Kotmel, Ministry of Finance (Czech Republic)
- 2.4 Capturing Best Practices, Chris Harding, The Open Group
- 3 Monday 16th March 2015 - Workshops (16:10 - 17:00)
- 3.1 Valentina Dimulescu, Romanian Academic Society: The Electronic Public Procurement System, open data and story telling in Romania
- 3.2 Vasile Crăciunescu, Codrina Maria Ilie, Technical University of Civil Engineering Bucharest
- 3.3 Site scraping techniques to identify and showcase information in closed formats - How do organisations find out what they already publish?
- 4 Tuesday 17th March (09:00 - 10:50 Plenary Session)
- 4.1 Jacek Wolszczak, Ministry of Administration and Digitization (Poland)
- 4.2 Branislav Dobrosavljevic, Business Registers Agency (Serbia) - SBRA
- 4.3 EU actions on Open Data – current policy and legal context
- 4.4 Good practices for identifying high value datasets and engaging with re-users: the case of public tendering data
- 5 Tuesday 17th March (11:30 - 12:40 Parallel Sessions B)
- 6 Tuesday 17th March (14:00 - 15:15 Parallel Sessions C)
- 6.1 Crowd sourcing alternatives to government data – how should governments respond?
- 6.2 How benchmarking tools can stimulate government departments to open up their data. Emma Beer and Martin Alvarez
- 6.3 Raising awareness and engaging citizens in re-using PSI (Daniel Pop and Yannis Charalabidis)
- 7 Bar Camp Sessions
Monday 16th March 2015 - Project Meeting 09:00 - 12:00
Phil opened the meeting with a summary of the December Review 1, talking through the various points made.
Looked at what the project is contracted to do, how we need to respond.
Chris then presented more ideas on how the project can translate its workshop outputs into Best Practices
Noel presented the mind map he produced which shows the elements of the PSI Directive. Encouraging scribes to see which PSID elements each BP relates to.
Yannis: I see the problem. The EC wants us to expand on the Directive. But it's far too limited for what we're trying to organise here. Things like CKAN and DCAT are different from the PSI D perspective. After Samos we spoke to Peter from Krems et al and we published a taxonomy of critical success factors. We expanded the PSI D and saw that it's not enough for categorising what we do, so we defined something a little broader. 30-40 areas. It speaks our language - training, evaluation, sustainability, infrastructure, dissemination. Worth sharing? It can be a solution - showing that we did something on top of Share-PSI.
... If I were to write a guide for the Greek government, I'd use this. Whereas if I started from the PSID it would take longer
Noel: Take LOD. Does that fit into the PSID? Yes it does - the PSID has elements where you can make the links. Paul said we can expand on the PSID.
- Scribe switching to Benedikt Kämpgen
- Phil starting a discussion about the structure of best practices.
- This structure has to be decided upon: https://www.w3.org/2013/share-psi/wiki/Best_Practices#Template
- Peter: Extended title makes it difficult to include the right keywords.
- In terms of the template:
- Short description (Overview in documents)
- Peter: Circumstances where you want to apply the pattern. The context. How do I know that I am in a situation that I need one of these best practices.
- Phil: Could be related to the PSI Directive that Noel just said?
- Makx: The Why-Section does that.
- Phil: How much detail is needed?
- Peter: Should be atomic best practice.
- Phil: Agree.
- Phil: Can you clarify on the needed context?
- Makx: Why of the best practice and not why of the specific implementation.
- Peter: Agree.
- Phil: Still do not understand why this is not included in the "Why"?
- Johann: Why not simply use the W3C template and criticise the parts?
- Phil: They are actually the same.
- Birmingham: The PSI directive is very generic. We should look at the experiences of the project members.
- Phil: See example: https://www.w3.org/2013/share-psi/wiki/Best_Practices/Cross_Agency_Strategy
- Peter: Context at the beginning and at the end.
- Andras: Can we make it less academic and more down to earth?
- Practical examples, e.g., "have a look at this cross-agency strategy"
- Three things: problem, solution, examples
- Peter: Can we have links to the PSI directive? For instance, looking at my presentation, I will immediately identify possible links to PSI directive.
- Yannis: Would it make sense to have a contact person (for the evidence) or a date?
- Title + Short description
- Why and Context how to know when to follow it
- Intended Outcome. What should be true afterwards. Ideally, directly link to the PSI directive.
- Possible Approach. Question: How prescriptive are you? (Mostly related to Best Practices Working Group, and other group Geospatial WG)
- How to Test: How others will realise that you implemented this best practice.
- Evidence: Points to other work, presentations etc.
- Life Cycle Stage:
- Johann: It may still be useful for overviews of many best practices.
- Jan: Problem. Life cycle is about publishing datasets.
- Andras: Could also be linked to Yannis's success factors.
- Phil: A Best Practice on the wiki is a draft.
- Phil: Do we need related best practices?
- Noel: Yes.
- Chris: Why not have a coordination section.
- Georg: If we want to come nearer to the directive. Obligation in the directive; and this is how it is implemented in national law. It should be easy to define such questions.
- Phil: Agree to link to PSI directive but do not agree to link to national law. This will be the other way round.
- Phil: ...adding a "Relation to PSI Directive".
- Yannis: Agree with categorisation. But linking should be easier for writers. For instance, have a multiple-choice possibility. We need a taxonomy of Open Data. Critical success factor taxonomy. Should be possible to do the job. We should have a vocabulary with which to link the best practices. 15 categories.
- Peter: We have EUROVOC. Can we as a group feed into EUROVOC.
- Phil: It is nice to link to all kinds of things, but we have limited time so it needs to make sense.
- Phil: Will be talking about this on Wednesday when we discuss what happens after the project ends.
- ? (from Birmingham): What is the minimum viable output of the project?
- Jan: How should the scribes work with this?
- Noel: Ideally, the presenters
- Chris: The scribe should copy over the template to a new wiki page.
Monday 16th March 2015 - Welcome Session (14:00 - 15:30)
Welcome: Prof. Univ. Dr. Marilen Pirtea, Rector of West University of Timişoara
- Emphasised that transparency is a key building block.
- Gave a short history of UVT, followed by short presentation of Faculty of Mathematics and Computer Science and of UVT.
- This workshop increases the visibility of UVT
- Facilitate exchange of ideas between participants
- Confucius: "if 2 persons exchange 2 apples they will have 2 apples; if they exchange 2 ideas they will have AT LEAST 2 ideas."
Radu Puchiu, Secretary of State, Chancellery of the Prime-Minister (Romania)
- Mr. Radu Puchiu is head of the Department of Online Services and Design (DOSD) at the Chancellery of the Romanian Prime Minister
- The DOSD has been charged with building Romania's data portal (data.gov.ro);
- Organize hackathons to involve community, make known the datasets, build apps around published data
- Open data portals show the progress of governance transparency
- Identifying the best practices is important for publishing, identifying datasets etc;
- This workshop is an opportunity to learn from one another
- Assessing datasets for publication requires prioritising between identifying good-quality datasets and publishing them as they are
- Romania shared 16th place in the Global Open Data Index with Iceland and the Netherlands in 2014
- We are aiming for the top 10 in 2015
- The example of publishing public procurement data, which will be discussed in coming sessions, shows also shortcomings of existing systems and practices
- A new platform for public procurement that will export all data in open standards will be set up
- Receiving and acting on community feedback is important!
- Good to see familiar faces (eg Martin) in Timisoara
- Please ask for help, we're open to support community.
Experiences of identifying datasets for sharing, Benedikt Kotmel, Ministry of Finance (Czech Republic)
Welcome. I expect lots of discussion and answers. I would like to share my experience as a person who is responsible for opening up the data and as a civil servant who wants to do his job the best he can.
Review of the Open Data agenda in the Czech Republic
- Launch of the Open Data Catalogue in January 2015. It currently contains about 25 datasets.
- Five other ministries are expected to launch their open data catalogues.
The catalogue of the Ministry of Finance became really popular thanks to the selection of the datasets. After the press conference it attracted a huge number of visits.
The Ministry of Interior of the Czech Republic is responsible for managing the open data catalogue. Cooperation between ministries is sometimes problematic. I try to maintain communication with the Ministry of Interior in order to ensure interoperability.
Not everybody (politicians) in the Czech Republic understands what Open Data is about.
DCAN is used as a tool to support the open data portal of the Ministry of Finance. It is recommended to other ministries as well.
- Internal directive – really important
- Demand analysis
- Supply analysis – do we have the data?
- Legal analysis – it needs to be ensured that the data can be published
- Sign off by the minister
- Feedback from the public
Roles and responsibilities
- Analytical department
- Owners of the data
- IT department
What should a public servant with the power to open any data do to identify datasets for sharing?
- Demand analysis
- FOI requests – analysis of the requests
- Universities were asked what data they use
- Non-profit organizations
- Private sector companies – it is difficult to get a response from companies, and running a market survey would incur additional costs
- Supply analysis
- Analysis of the information systems – excel spreadsheet, not user friendly
- Communication across the organizations
I would appreciate your ideas about identification of datasets for publication.
What were the results of the demand analysis?
- I was surprised that we received only a limited number of responses.
- Received about one page of ideas
- Identification of datasets for sharing is an endless loop
What is the invoice metadata used for?
- Provided metadata
- Description of the invoice
- Names of the employees need to be anonymised. It is not an easy task.
- Invoice data was published to show that the Ministry of Finance is transparent.
- The provided level of detail probably does not allow sophisticated analysis of spending. However, a redesign of the underlying system is being considered.
Aranita: What kind of data needs to be opened? This is the current issue in Albania. I would like to suggest opening the data of public agencies as well. For example, in Albania interesting results were achieved by analysing data from the election commissions. Many companies in Albania are interested in data about energy policy. Public offices might sometimes have more data than the ministries.
- B. Kotmel: My job is to publish data of the Ministry of Finance. However there are offices in the Czech Republic like the Office of the Government
Why are there five ministerial data catalogues?
- These five ministries fall under one political party.
- However we are collaborating with the Ministry of Interior
There is a similar initiative in Romania. Could you, please, comment?
- R. Puchiu: We approached the problem differently. There is always the question of which data should be published. NGOs always say all data should be published. The data portal was launched in 2013. We opened datasets from different public organizations (ministries, offices). It is important to create a mechanism to ensure continuous publication of the data (including updates of the datasets). A change of culture is necessary. It is a better way than forcing agencies to publish data through legislation.
What are the benefits of opening up data?
- B. Kotmel: Not so many politicians understand Open Data. We are not sure who uses our data. Identifying the particular users is the next step. Two years ago an application was launched that shows budgetary data from the State Treasury. Many people asked for this data even though it is already published. It might be that not so many people are aware of the available data.
- R. Puchiu: We are just publishing open data but we are not creating a mechanism for turning this data into something useful for the end user. We promoted Open Data to the IT community. Even for the public administration it is necessary to see that there is some kind of result in the end. In the case of the municipalities, the distance between citizens and the local authorities is small, which helps the development of applications such as public transport guides.
Capturing Best Practices, Chris Harding, The Open Group
Monday 16th March 2015 - Workshops (16:10 - 17:00)
Valentina Dimulescu, Romanian Academic Society: The Electronic Public Procurement System, open data and story telling in Romania
Her research analysed the public procurement data available on the SEAP database: http://www.e-licitatie.ro. In Romania, the SEAP authority is in charge of procurement oversight and they give access to this regulatory data. The project evaluated contracts over 1 Million EUR. On the national level information was exported/provided in MS-Excel format. On local level the work was more tedious as we had to contact many local level authorities for their data.
Two major shortcomings were identified:
- Accessibility: if you want to enter the system you have to get past a captcha code
- The search function is very limited; the searcher has to know the NACE codes.
The result contains only the contract title, the type of procedure and the year the contract was awarded. A notice of participation is given instead of the contract itself.
Q:Accessibility-wise, what would you have expected? A: Enter the name of a company and get the result of how many contracts have been awarded. However, in order to get any useful information, it is required to first download the data and make the statistics on your own.
Q: What would you prefer: Bulk download or API? A: Best practice: instead of investing time and money now in an API which very likely still leaves something open, provide a bulk download.
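The "download the data and make the statistics on your own" step described in this discussion can be illustrated with a minimal sketch. It assumes the bulk download is a CSV file; the column names (`contract_title`, `winner`, `year`) and the sample rows are hypothetical and are not taken from the SEAP export.

```python
import csv
import io
from collections import Counter


def contracts_per_company(csv_text, company_column="winner"):
    """Count awarded contracts per company in a bulk CSV export.

    The column name is an assumption for illustration only.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        name = row[company_column].strip()
        if name:  # skip rows without a winning company
            counts[name] += 1
    return counts


# Hypothetical sample data standing in for a downloaded file.
sample = """contract_title,winner,year
Road repair,Alpha SRL,2013
School building,Beta SA,2013
Bridge works,Alpha SRL,2014
"""

counts = contracts_per_company(sample)
print(counts.most_common())
```

With a complete bulk file, a re-user can answer the "how many contracts has this company been awarded" question locally, without depending on the portal's search function.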
Q: Was there anything “interesting” in the data sets? A: Not at the national level. On the local level some of the companies were actually state-owned. It is the local administration's strategy to support its local companies. However, this cannot be directly classified as corruption.
Q: What is the situation in Albania? Can you have a history of companies, who are the shareholders? In Albania we identified that holders of these companies had strong ties to ministers or high-ranking officials. Did you also check for such relations? A: We also constructed a database of companies which gave money to the government.
Q: Do you have an electronic legal system? A: Yes, but it is a company which was making the system usable.
Q: Did you try to download the public procurement database from the open data portal? A: We found out that the corrections / errata were not applied. Those databases are corrupt and we could not use them; they contained incorrect entries (e.g. contract values).
Q: Did you try to provide the corrections? A: We tried to match our data back to the providing system, but there were big challenges in identifying matching records. Different formats and layouts make it difficult to upload corrections. Technical recommendations: separate fields for contact, numeric values like postal codes, and address. As the situation is currently, it is all within one field. An auto-complete function, e.g. for addresses.
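The recommendation to keep contact, postal code and address in separate fields can be sketched as follows. This is only an illustration of why separate fields make record matching easier; the class, the field names and the sample values are hypothetical and do not describe the actual procurement system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorityRecord:
    """One contracting authority, with each piece of data in its own field."""
    contact_name: str
    street: str
    city: str
    postal_code: str


def match_key(record):
    """Build a comparison key from the fields that are least ambiguous."""
    return (record.postal_code.strip(), record.city.strip().lower())


# Two made-up spellings of the same authority, as they might appear
# in two different exports.
a = AuthorityRecord("Ion Popescu", "Str. Mare 1", "Timisoara", "300001")
b = AuthorityRecord("I. Popescu", "Strada Mare nr. 1", " timisoara", "300001 ")

print(match_key(a) == match_key(b))
```

When everything is stored in one free-text field, the two spellings above would not compare equal without error-prone parsing; with separate fields the postal code and city can be normalised and compared directly.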
It would also be very interesting to have a data system, where you can actually upload your corrected data sets. CKAN is used in many administrations, but CKAN doesn't provide that functionality. As it is open source, YOU can do something about that.
What is also needed, instead of plain document-based data, is a way to monitor the procurement process, as contracts get awarded but the objectives get changed afterwards. We are interested in the whole flow. We have to focus on the process, because most of the corruption happens after the tendering.
- Aranita Braja (AL) presents http://spending.data.al/en/treasury/list/year/2014, an open linked database system which links company data with spending data.
- SEAP database: http://www.e-licitatie.ro
- Golden book for electronic procurement: http://ec.europa.eu/growth/single-market/public-procurement/e-procurement/golden-book/index_en.htm
- Model the (procurement) process, instead of providing document oriented data sets.
- If in doubt, provide unabridged bulk download instead of lacking APIs.
- The community has to provide feedback back to the government. That's what government is expecting to receive.
Major discussion points:
- Where to draw the line between the government's job and the job of a company, e.g. when it comes to data cleansing
- Do not complain but also provide something
Vasile Crăciunescu, Codrina Maria Ilie, Technical University of Civil Engineering Bucharest
See scribe notes
Site scraping techniques to identify and showcase information in closed formats - How do organisations find out what they already publish?
Tuesday 17th March (09:00 - 10:50 Plenary Session)
- Jacek Wolszczak, Ministry of Administration and Digitization (Poland)
- Branislav Dobrosavljevic, Business Registers Agency (Serbia)
- Szymon Lewandowski, European Commission
- Nicolas Loozen, PwC EU Services
- Chair: Heather Broomfield, Difi
Heather: Welcome! I'm one of those responsible for Open Data in Norway
... Happy Saint Patrick's Day, by the way
... [Introducing the speakers]
Heather: Jacek, the floor is yours
Jacek Wolszczak, Ministry of Administration and Digitization (Poland)
Jacek: I'll present how we have implemented the Directive in Poland
... and started our Open Data initiative
... The current situation in Poland: reuse depends on the information access regulation
... now moved to a new and specific act.
... All public documents will be available for reuse
... Why a separate act?
... We want to emphasize the difference between access and reuse
... Also, it's important for us to bring all the regulation together in just one act, not several separate ones
... The conditions of reuse:
... One option is to offer the information free of charge
... citing the origin and some metadata such as the last update
... GLAM institutions will be able to impose exceptions
... Anyway GLAMs will not be able to avoid the reuse of PSI
... Where are we now?
... We are listening to multi-stakeholders to understand their needs, concerns
... Further steps: drafting the law, developing guidelines
... When will it be ready? The act on PSI reuse will be released before the deadline
... But the new regulation will be issued by mid 2016
[ Scribe lost the connection (again) Argggr... ]
… We offer a request form in order to get and reuse the information.
… We will also open an Open Data portal.
… the current version of the portal is based on CKAN
… Responsive layout, accessible, knowledge base for publishers and reusers
… SW will be open sourced for others’ reuse (on Github)
… That’s all, thank you.
Chris: How do you distinguish access and reuse?
Jacek: @@@ [scribe lost connection again, sorry]
Noel: If you have to put in a request to reuse, isn’t that against the Directive?
Szymon: It’s not against the Directive. ‘Requests’ are included in the Directive —i.e., information not available can be requested.
Peter: Museums and Libraries can charge more than the rest.
Jacek: There are different situations depending on the aim of the reuse (personal/research, and commercial)
Jacek: We receive feedback from civil organizations and we want to make reuse easier.
Szymon: Any specific complaints from these organizations?
Jacek: My unit is in charge of the implementation. I don’t have this information but I can get it.
… All the feedback will be available on the web (for transparency's sake)
Additional scribe notes
- Q1 (Chris, Open Group) – How do you distinguish between use and re-use?
- Indeed we track the requests for access. We do not have a mechanism for tracking the re-use.
- Q2 (Peter, Scotland) - PSI is not only dealing with data. How do you manage other types of content (e.g. video, catalogs) ?
- We launched the portal DanePubliczne.gov.pl in May 2014. With the revised Directive ‘Libraries, archives and museums will be able to impose other re-use conditions than the aforementioned’
- Comment (Noel, Belgium) – Request for re-use can be against the PSI directive ?!
- Szymon Lewandowski from DG CONNECT answered that it is not against the Directive.
- Jacek (Poland) answered that they have categories for re-use and if the purpose is re-use (scientific data), then the data is free of charge
- Comment (Peter, Sweden) – We in Sweden removed the costs to improve reuse
- Jacek (Poland) – according to our analysis, it differs from country to country
- Q3 (Chris, Open Group) – Could the knowledge base you have implemented alongside the Open Portal be considered a best practice?
- Jacek (Poland) – Yes.
- Q4 (Szymon Lewandowski, EC) – What are the main complaints from stakeholders?
- Jacek (Poland) – could not speculate now, but we have the feedback from users (comments) in Polish
Branislav Dobrosavljevic, Business Registers Agency (Serbia) - SBRA
Branislav: It has been a pleasure for me attending this workshop
… Open Data is really interesting for us
… The Business Registers Agency started in 2004
… You can find everything on our website
… About our agency:
… There was a reform on Business Registry in Serbia, started in 2004
… Before the reform just +4000 companies registered, now +11000
… We are the central department in Serbia, controlling these registries
… [Some figures]
… 69% of SBRA's customers are satisfied
… SBRA’s system architecture (technical)
… SBRA is the One-stop shop on business registries in Serbia with a clear methodology and workflow
… We developed the first eGovernment project in Serbia ($3m)
… [Explaining the different modules of the system, including the registry of business]
… What we are doing now:
… The status of eGovernment in Serbia
… From the tech point of view: some G2G services
… Also we have a Web Service for commercial banks (exchange of data by law)
… Please have a look at our site.
… About our International cooperation: SBRA is member of ECRF, EBR, CRF, IACA, and some others
… We participate in EU Open Data projects (LOD2, GeoKnow, etc.)
… We are member of a consortium for a H2020 project, and we plan to be more active in the future
… [A wrap up slide]
… We are selling some data. At the beginning this was used for self-financing the agency
… We want to assure a consistent market in EU. This PSI should be free of charge, but there are many companies selling data.
… The PSI Directive sets up this needed framework
… [Lessons learned]
… Technology is not a problem. The challenge is always related to policy
… Solving interoperability problems,
… Optimization of resources
… [slide SBRA’s Development Concept]
… We should give "our customers” the best information and mechanisms to be competitive
Joseph: The agency depends on which ministry?
Branislav: Each type of registry depends on their concrete ministry. But we are independent.
Question: Open Corporates. Do you offer an API for them?
Phil: If you are not in Open Corporates, you are not open.
Branislav: We have a road map using different phases: first financing ourselves, and further steps will be offering the data to the companies which make value of the data.
Additional scribe notes
Content of the presentation was structured around the following points:
- Business registration reform in Serbia (2004-2006)
- „One Stop Shop for Registration“ as a forerunner of interoperability in the Serbian public sector;
- Steps towards SBRA future: Web services deployment, international cooperation and Open data projects
- SBRA experience: Lessons learned (Keep it simple in the beginning, Put efforts on Front end services; Have complete control on services; selling enriched data not raw data; create consistent market of data, companies can make profit; no problems with the technology; all activities should be covered by legal act)
Questions from the audience:
- Q1 (Jozef, Malta) – Do you have a responsibility for the content of the registries or just the management
- Branislav (Serbia) – We have responsibility for all operations, while Ministries (currently 9) are second level of decision;
- Q2 – Serbian data is not present in OpenCorporates (https://opencorporates.com/), therefore you are not open. Do you provide API ?
- Branislav (Serbia) – You can check our pages and get an extended set of facts about the companies including financial data. We have 50000 daily visitors of the SBRA Web page www.apr.gov.rs ;
- Q3 - When does the commercial sector take over ?
- Branislav (Serbia) – Our goal is to keep our revenue lower and lower, achieve well organized market of data, and to be only a provider of raw data
EU actions on Open Data – current policy and legal context
Szymon Lewandowski, DG CONNECT
- the EC Open Data initiatives, where the latest Big Data actions (see the 'Big data' Communication from July 2014) address data in a holistic manner; the context is changing from open data towards big data
- pan-European infrastructure for (open) data (after the end of LOD2, publicdata.eu will be further developed and maintained by a consortium led by Capgemini; first release of the new portal: October 2015 at EDF, Luxembourg)
- Open Data Incubator for Europe, http://opendataincubator.eu/
- Questions the EC receive related to the implementation of revised Directive reveal that
- there are no problems with implementation as such; the questions refer to currently running contracts
- there is vagueness about charging for specific documents
- some institutions have not implemented the PSI Directive from 2003
Questions from the audience:
- Q1 (Felix, W3C) about the Multilingual support facilities in pan-European portal
- Szymon (DG CONNECT) - Automatic processing / machine translation technologies are to be deployed (first version) by October 2015. The tools will have an effect on the quality of search, which we still have to test, but we cannot launch the pan-European portal without any such functionality
- Q2 (Peter, Scotland )– in the new portal the Focus is on tables and numbers, where is the other content ?
- Szymon (DG CONNECT) – here the accent is on metadata; the catalogue guidelines are important and we will prepare instructions; the portal will also accept data with different licenses, including data that is not fully open; quality will be taken into account and checked; we will try to avoid problems with Europeana;
- Q3 (Chris, Open Group) - Business value from open data – What are the business models, Do you offer a set of business models how to use the data?
- Szymon (DG CONNECT) – We will rely on ODI models, we do not have this set / body of knowledge; we can collect e.g. 20 examples of best practice.
- Q4 (Noel, Belgium)– How the publicdata.eu will work ?
- Szymon (DG CONNECT) – The EU idea is to provide a wizard to guide you through the licenses and let you combine datasets with similar licenses; we hope the portal will provide guidelines; PSI reuse is more traditional; Open Data is more proactive.
Good practices for identifying high value datasets and engaging with re-users: the case of public tendering data
Nicolas Loozen, PwC
In his presentation, Nicolas is answering 2 questions
- How we prioritize datasets, how we identify high value datasets
- How to reuse
Questions from the audience:
- Q1 ( )– Question about transparency: when does open data not lead to transparency? How do you make the distinction?
- Nicolas (PwC) The task of public administration is to produce that data
- Q2 ( )– How can you claim that availability improves interoperability? My experience is that we improve interoperability in several different ways.
- Nicolas (PwC) – To me, having the data available is the first step; you cannot have interoperable data if you do not have the data at all. Open data in the wrong format will not improve interoperability. We have the core vocabularies that guarantee interoperability
- Q3 (Jens, Germany) – Why do you look at the cost of opening
- Nicolas (PwC) – cost of opening is not difficult to measure, we try to tackle this issue in PwC
- (Jens, Germany) - If the data is important, but the cost is high, then it loses its attractiveness
Tuesday 17th March (11:30 - 12:40 Parallel Sessions B)
How good is good enough? A common language for quality?
Facilitator: Makx Dekkers
Scribe: Peter Winstanley
Makx works for the EC with the ISA programme and did some work with the Open Data support on quality of data. It is difficult to find out about quality, and improvements in quality are important, but this is about how to describe quality - we are looking for solutions for defining, measuring etc quality
Makx is going to describe some quality dimensions and then wants to focus on a couple and see how they could be measured and used, and then if a set of controlled vocab and predicates could be defined
There are a number of dimensions of quality (source: slide #8 from http://www.slideshare.net/OpenDataSupport/open-data-quality-29248578)
- Accuracy: is the data correctly representing the real-world entity or event?
- Consistency: Is the data not containing contradictions?
- Availability: Can the data be accessed now and over time?
- Completeness: Does the data include all data items representing the entity or event?
- Conformance: Is the data following accepted standards?
- Credibility: Is the data based on trustworthy sources?
- Processability: Is the data machine-readable?
- Relevance: Does the data include an appropriate amount of data?
- Timeliness: Is the data representing the actual situation and is it published soon enough?
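As an illustration of how one of these dimensions might be turned into a measurable value, here is a minimal sketch for completeness. The session did not agree on any formula; the definition used here (share of required fields that are filled in) and the field names and sample rows are assumptions made purely for the example.

```python
def completeness(records, required_fields):
    """Share of required fields that are filled, across all records (0.0 to 1.0).

    This is one possible definition, assumed for illustration.
    """
    total = len(records) * len(required_fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    return filled / total


# Hypothetical dataset: the second row is missing its contract value.
rows = [
    {"title": "Road repair", "value": "1200000", "year": "2013"},
    {"title": "Bridge works", "value": "", "year": "2014"},
]

score = completeness(rows, ["title", "value", "year"])
print(round(score, 2))
```

A provider could publish such a score alongside the dataset, leaving it to the user to judge whether the level is good enough for their purpose.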
Q: (Benedikt) Is it sufficient to describe relevance in terms of amount; what about frequency and nature of usage?
Q: (Anna) What about 'usability' or 'understandability' - can the data be used easily or not?
Q: (Noel) context is important - also the purpose for which the data was already originally collected/created
A: Makx - yes, we probably want to add that.
Q: (?) Interlinking; data from good sources might be interlinked
Q: (Lorenzo) we could split between intrinsic metrics and context metrics; some measures will work across all datasets, and others will be more domain specific.
A: (Makx) this is including a 'watchdog' aspect, where someone external looks at the data and gives it some scoring
Q: (Noel) intergratability?
A: (Makx) isn't this related to completeness overall?
Q:() completeness would also include metadata. A dataset of high qu
Q: (Gabriele?) persistence??
A: That is covered by ****
Q: (Yannis) We should be boosting understanding, so sometimes quality is less important than a dataset with a good visualisation, linking and availability as a service.
A: (Makx) the scope for this is more restricted to look at data alone,
Q: (Martin) Linking shouldn't be taken into account - many users don't care about this
A: (Makx) there are users that look at this from a completeness perspective, but the provider might want to make the specific point about linking
Q: (Lorenzo) 5 star data is relevant for the open data cloud
Q: (Dolores) quantitative, categorical or qualitative metrics?
Q: (Ingo) in geo ISO 19157 provides quality flags, but they work on a particular aggregation level only. Before we talk about quality we need to discuss granularity
Q: (Ingo) quality changes over time, so recency might be important
Q: (Branislav) Experience from registers: is the data reflecting reality or not. Linking can positively improve quality, but also negatively if the links are of lower quality
Makx: we are going to assume that providers are truthful about their data, so how can we match what providers want to say about their data
Q: (Johann) ODI released a certification scheme. I think quality has to be left to the stakeholder.
A: (Makx) this is not the area we are moving into. We are looking at measures that describe the data and the user can work out how appropriate that is with
Q: (Johann) we need to include availability
Q: (Neven) we are concerned about the data, but not about the process. Would it be a good idea to also provide attention to the quality of the process?
A: (Makx) it is related to the conformance - was it done professionally, to the right standards
Q: (Dan) consistency - there might be contradictions, but it might just reflect the real world
A: (Makx) if a provider wants to say something about consistency they need to be able to do so
Q: (Dan) relevance to me means 'utility'
Q: (Dan) timeliness is not only related to the current update but also if the data has a history [?longitudinal dataset?] Data captures a picture - and it is important to see the evolution of data over time, sometimes we only see the most recent versions and not the change sets
Q: (Dan) we should include citizen feedback.
A: (Makx) feedback and quality improvement measures are related to process and are out of scope
Q: (Martin) a new project - Open Data Manager - is including some quality measures:
Q: (Lorenzo) I think that the Open Data Manager is looking more at the overall quality of the data in the portal
A: (Makx) we will look at that
Q: (Lorenzo) There is more than numerical data, but also video, image etc and some of these measures might not be appropriate
Q: (Lorenzo) consistency could be inside accuracy
A: (Makx) this might depend on context, there is some overlap.
Q: (Valentina) what about granularity?
A: (Makx) according to DWBP Group, it is a different dimension, saying what the data is about but not about the quality - part of relevance.
Makx: Availability; processability; and a cluster including accuracy, consistency and relevance are key ones that seem to be rising to the top
Makx: do we want to talk about measurement?
Availability: Is it Y/N or is it related to licensing, usage, time window,
[Johann] there is a point measure,
[Jens] Apart from licenses, there is also registration for some datasets
[Noel] This is a recurring question; As a data provider the Flemish Government doesn't commit to this, but for a business model to be applied then this information is reassuring
[Neven] We need to take account of archiving; and this information could be provided
Makx: we can use Y/N, but there is also open/restricted and also available for access only or reuse. There is also some expression of persistence policy. These are qualitative measures.
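The qualitative availability measures Makx lists (Y/N, open/restricted, access only or reuse, persistence policy) could be recorded in a structure like the one below. This is only a sketch: the class names, field names and the example persistence policy text are hypothetical and do not come from any agreed vocabulary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AccessLevel(Enum):
    OPEN = "open"
    RESTRICTED = "restricted"  # e.g. registration or licence conditions apply


class PermittedUse(Enum):
    ACCESS_ONLY = "access only"
    REUSE = "reuse"


@dataclass
class AvailabilityStatement:
    """What a provider chooses to say about availability (hypothetical model)."""
    available: bool                            # the simple Y/N measure
    access_level: AccessLevel                  # open or restricted
    permitted_use: PermittedUse                # access only, or reuse allowed
    persistence_policy: Optional[str] = None   # free text, if the provider states one

    def summary(self) -> str:
        if not self.available:
            return "not available"
        parts = [self.access_level.value, self.permitted_use.value]
        parts.append(self.persistence_policy or "no persistence policy stated")
        return "; ".join(parts)


# Example values are invented for illustration only.
statement = AvailabilityStatement(
    available=True,
    access_level=AccessLevel.RESTRICTED,
    permitted_use=PermittedUse.REUSE,
    persistence_policy="archived for 10 years",
)
print(statement.summary())  # restricted; reuse; archived for 10 years
```

The point of the sketch is that each measure stays a separate, qualitative field, so a re-user can judge suitability without the provider collapsing everything into a single score.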
Makx: do we want to talk about processability? Do we agree with the 5 star?
Yes, looks as though many people do
[Ingo] familiarity with the dataset, ability to use within my system is the main thing. I think that processability shouldn't be used because it cannot work across all situations. Processability depends on the application. I would like information on the data model and the semantics
[Yannis] I agree with Ingo. 5 star stops at the point that says that the data might be processable, but if we want to make it an attribute that can be assessed automatically
[Makx] 5-star implies several things that make data processable
[Johann] I would advise refraining from using 5 star as the key indicator of processability because you don't know what it is linked to
[Johann] none of the datasets used by companies in Austria are 5 star, but they fail to be processable for other reasons
[Lorenzo] the 5 star scheme is not linear - there is a huge difference between 1 and 2 stars, but much less between 4 and 5 star. But it is a practical approach
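For readers unfamiliar with the 5 star scheme discussed here, the sketch below expresses it as a cumulative checklist: each star requires all the previous ones. The function and parameter names are hypothetical, and the sketch deliberately says nothing about the concerns raised above (data model, semantics, what the data is linked to).

```python
def five_star_rating(open_licence: bool,
                     structured: bool,
                     non_proprietary: bool,
                     uses_uris: bool,
                     linked: bool) -> int:
    """Return 0-5 stars; a star only counts if every earlier criterion is met."""
    criteria = [
        open_licence,     # 1 star: on the Web under an open licence
        structured,       # 2 stars: machine-readable structured data
        non_proprietary,  # 3 stars: non-proprietary format
        uses_uris,        # 4 stars: URIs used to identify things
        linked,           # 5 stars: linked to other data
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A scanned PDF under an open licence, a spreadsheet, and a CSV file:
print(five_star_rating(True, False, False, False, False))  # 1
print(five_star_rating(True, True, False, False, False))   # 2
print(five_star_rating(True, True, True, False, False))    # 3
```

This also illustrates Lorenzo's remark: the step from 1 to 2 stars (unstructured to structured) changes what a re-user can do far more than the later steps.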
Makx: We have run out of time, but will continue the discussion on the mailing list.
ODI are also interested in feedback from this discussion, in relation to their certificate mechanism
[Noel] We can use the ODI approach per term and update when we get this vocab sorted
Finally, to address the questions that we have to answer:
1. What X is the thing that should be done to publish or reuse PSI?
Unambiguously express and communicate quality levels of published data.
2. Why does X facilitate the publication or reuse of PSI?
Allows re-users to take informed decisions on whether and how to re-use data.
3. How can one achieve X and how can you measure or test it?
Provide a standard set of terms that can be used by publishers.
- Initially, the number of sites that use those terms
- Later: an increase in re-use of data covered by such standard terms
Linguistic Linked Data as a bridge to reach a global audience
1 session leader, 6 participants
Multilingual data is a part of the Linked Open Data Cloud. It partly overlaps with Linguistic Linked Data (which helps translations). The LIDER project aims at overcoming language barriers. It builds on and extends existing LOD technologies. The WP structure is in the presentation. Infrastructure built by the LIDER project helps to utilize multilingual vocabularies like EUROVOC. It also supports the use of other language resources such as lexicons or corpora.
- Agreed upon vocabularies
- RDF as a standardized language for description of resources
- Use of standardized technologies – LD stack, SPARQL
- Standardized APIs
- Links to other resources
- Additional requirements
- Use of resources
Who benefits from Linguistic LOD?
Outcomes of the LIDER project
- Linguistic Linked Licensed Data (3LD)
- Language resources using standard data models published along with a machine-readable license
- Adding a machine readable data might be another best practice
Finding the right resources is difficult. Therefore metadata is necessary. That is why the Linguistic LD Observatory (Linghub) is being developed.
Chris: Do you take statistical elements into account during translations? It might be necessary to take probability measures into account. Answer: No. It is a difficult problem. For some languages like English or German there is quite a lot of data, but for less widespread languages there is not enough data. Chris: Using probability measures might improve the situation.
A new project named FREME (http://www.freme-project.eu) has started. Requirements for multilingual datasets are welcome.
Technical best practices for the multilingual data are being developed by the BPMLOD Community Group (https://www.w3.org/community/bpmlod/) and the DWBP WG.
Nicolas: The EC published 10 rules for URIs. It is recommended not to distinguish between languages by adding a language-indicating string to the URIs. Felix: Codes for concepts can be used instead, for example a code for “house” that leads to the concept definition, which can have expressions in different languages.
Chris: Cooperate to define common metadata, for example for financial accounts. Organizations in many countries publish common data but they do not provide an explanation of what the data is.
Felix: When you say common metadata, do you mean something universal or something very specific? Chris: I think we need both. Universal concepts can be linked to specific ones. E.g. universities have vocabularies for their courses. Companies have lists of customers and products. Universities will use the term “student” which will not apply to a manufacturing company. However you can link the “student” concept to its French expression.
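The idea of language-neutral URIs with expressions in several languages can be pictured roughly as follows. This is a minimal sketch in plain Python rather than RDF; the URI, the concept code and the helper name are invented for illustration.

```python
# Hypothetical concept store: the URI carries an opaque code, not a
# language-indicating string, and the labels hang off the concept.
CONCEPTS = {
    "http://example.org/concept/c0042": {
        "en": "house",
        "fr": "maison",
        "de": "Haus",
    },
}


def label_for(concept_uri: str, lang: str, fallback: str = "en") -> str:
    """Return the label in the requested language, or the fallback language."""
    labels = CONCEPTS[concept_uri]
    return labels.get(lang, labels[fallback])


uri = "http://example.org/concept/c0042"
print(label_for(uri, "fr"))  # maison
print(label_for(uri, "ro"))  # house (no Romanian label, falls back to English)
```

Because the URI stays the same for every language, adding a new translation only means adding a label; nothing that links to the concept has to change.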
1. What X is the thing that should be done to publish or reuse PSI?
Make the data available in the language people want it.
“People” could be: people of the state (even in Great Britain some people might need data in Welsh, for example).
2. Why does X facilitate the publication or reuse of PSI?
It makes the data usable by a larger community of people.
3. How can one achieve X and how can you measure or test it?
Provide multilingual data according to the proposed best practices.
Role of Open Data in Research Institutions with International Significance
Facilitator: Tamás Gyulai, Regional Innovation Agency
- Szeged is a Smart City in Europe, it is a partner in the Extreme Light Infrastructure project experimenting with high power laser equipment, doing a lot of measurements
- the project needs big data infrastructure, one option is a supercomputing centre in Prague, the other is the data cloud in SZTAKI
- research results shall be open for analysis and evaluation, useful for material, medical science, etc., for both researchers and professionals in the industry
How can you open up data?
- There are big differences in doing this by research fields.
- often author wants to process data and publish a paper first
- industry rights may also be limiting, it is important to hide company data and data for which the company may have private interest
- there are results which should be kept secret, such as nuclear research
- in case of Szeged there is no 3rd party financing, data ownership is clear, it remains with the data provider
- in e.g. medical, social science there is a need to anonymize data
- in case of Szeged, it is not unrealistic to provide different parts and layers of data to avoid disclosure of sensitive information, but this is extra burden to arrange
- dataset publishing is slowly getting encouraged by journals, e.g. Nature
- data journals, data publications are appearing
- SZTAKI is doing an experiment on a new kind of journal providing:
- associate paper, data and algorithm,
- interactive publications, replay algorithms, reproduce results
- SZTAKI is also setting up a data mining centre for publications, doing automatic analysis of papers in order to see trends and directions
- Hungary has a national bibliography registry plus fulltext archive, to collect the whole production of the country, also capable to accept data, this service will soon have law to support its usage
- this is also a cultural issue for researcher, they are not used to publishing data
- scientific workflows, opening up workflows is also important, not only data
- teaching is important, students could learn how to publish data, but experience at KIT shows that students forget and have no time
- Linked Open Data is a good platform for open research data, but researchers are not familiar with it
How to encourage opening up research data?
- for example, in biochemistry this is normal
- a segmented approach would be good, starting with a target group where this is important, and then proceed to other target groups
- for example, chemistry or physics students could be a target group
- museologists are hiding their data, even from each other inside the museum
- we need policy development and step-by-step encouragement
- service centres are needed to support research infrastructures including education, personal guidance and easy-to-use services
- look also at the demand side: should governments encourage this via requirements or bonuses?
- does this mean a financing problem for researcher? how much does it cost to publish/persist data?
- some proposals already require to spend some of the budget on publishing data
- don't forget that industrial research data is by far larger than academic and it is closed
- possible best practice: DFG: in the proposal one has to describe plans for publishing data
- as a result, lot of new repositories are popping up
Can one easily store data?
- ELI has no solution
- DataCite could be a one-stop shop
- it is non-profit, funded by Universities
- a bottom-up, transnational approach
- re3data will be a service of DataCite
- CERN also established a network of data centers; Wigner in Hungary
- DOI usage is different in US and Germany
- in the US universities pay-per-use
- in Germany this is somehow centrally provided by some Universities
- but where can I put my data if my institute does not provide me with such services?
- a kind of trust is also needed, some repositories are badly managed, unreliable
- we need advice on where to put data
- thematic repositories (e.g. arXiv)
- emerging so called Research Environments
The key points are:
- Encouragement via Education and Policies
- Support everyone with research infrastructures
The European Database Directive, Freyja van den Boom
Freyja: Introduces herself, cites LAPSI project (now finished).
We discussed the legal barriers to PSI and databases
Now involved in Open Science Link - making a platform where researchers in the biomedical field can deposit their databases and publications.
One of the barriers is where the data is protected.
Looking for news of problems - hope we can sort them out
Summarises database protection (slide)
Talks about 'works' not data. Is data a work?
Copyright protection for databases is complicated because most of the time the data is not created as a creative work; there is not much room to decide what to include and leave out.
So databases lack the required level of originality. So most data is not protected under copyright.
This was recognised by EU legislators => created the Database Directive, Feb 1996.
Sui Generis - "an extra right"
Some protections. Investment in creating the data has to be substantial.
In different member states we see this determined case by case. In DE, 40 people working on an online database - that was enough.
Another case, 500 people wasn't enough.
If you are allowed access then OK, but you're not allowed to download the entire DB or a substantial part.
How do we know what is protected?
This can be problematic if you're using data mining tools and linking. Is that covered or not?
We're helping researchers to link to their publications, eg through PubMed.
If a researcher does it, OK, but if we as a project do it for all our researchers, is that still OK?
No laws yet AFAIK to cover this.
Machine readable licences - what does that mean?
Lots of MSs have not implemented the Database Directive in the same way.
It's hard/impossible to know if a database is protected or not.
Opens the floor to questions and discussion.
PK: In SE, the question of whether the DB is protected or not is the big question. Answer seems to be 'it depends' - there's no simple flow chart. ... this creates uncertainty. And if that applies to SE, it surely is worse at EU level.
MV: The FI law is similar to SE as we were under SE law when they were written. ... We certainly have issues and they'll multiply at EU level.
GH: We have problems with these sui generis rights in Austria. Not accepted from the copyright machine. ... Ministry of Justice sticks to its opinion that they don't need to pass on the commercial registry due to sui generis rights. ... It's a problem for us that it doesn't conform to the usual copyright rules.
PK: So the DB rights prevent you from doing transparency issues.
GH: It's the whole download that's the issue. One by one is OK. ... Recital 22 says if it's the DB right of the PS body, it conforms with IP rights in relation to the PSI-D, but not in the national legislation.
AM: Access to the AT company registry is very expensive. It's also regional.
GH: That's Germany.
JA: Is this directive aimed at both public sector databases and private? In PSI we're dealing with PSI DBs ... How does the sui generis rights affect this publication?
FvdB: Doesn't make a distinction between public and private data. If the DB falls under the requirement for protection it's protected, whether public or private.
GH: There is a distinction between public and private. Because if it's public then the PSI-D applies. ... in the new one, you are obliged to publish. If the PA holds the sui generis rights they have to publish. So the PSI-D overrides. ... but in the directive, the copyright doesn't reduce the obligation to publish.
FB: The MSs have implemented the DB Directive to say that only a legal person can assess access. An employee won't do that.
GH: If it's a third party owns the IPR then the public sector body is not allowed to release for re-use.
FB: That's the discussion about the high value datasets like post codes etc. They would likely fall under the sui generis rights, not the PSI-D.
PK: I have a session on crowd sourcing later. Lots of people putting out a little bit of data each and then collating it...
FB: If the state is to duplicate the existing DB - there's your answer. if the end result is a copy of the DB then you are infringing the DB rights holder. ... I don't think that's been in front of the courts yet.
AM: When start ups or NGOs want to create new databases they want the sui generis rights to protect their work. ... If they take data and make a significant investment to add value to it they want protection.
FB: If they made a substantial investment then they would have that protection.
AM: Where people take one or more public DBs and then add something to it - actual original additions. If that's with public data, they'll need the sui generis protection.
GH: They can have the source data. It's our daily business to add value by adding other data to it. Our DB is sui generis rights protected - that works. ... The problem with the old database and the similar one - we had a DB about companies in AT before the MoJ. We got it from paper books ... and collected it every day. Then the AT gov built their own commercial register. And we copied it as it had been available for 100 years. ... Then the Database Directive came into force and we were no longer able to re-sell the data we'd had for a long time. That was in 1999.
PK: So there is a huge difference between countries. Interpretation of the database looks very different.
GH: Our original DB started in the late 80s. FB: Would it have had protection? GH: Yes, as a collection. ... but the whole case was at the European Court of Justice so it's over and done.
PK: So is there a best practice that we can create? Can we offer advice?
FB: It would certainly help. And it would make the requirements more clear. It's not clear what actions are allowed. ... In DE, the temporary accessing database issue was not copied into their law.
MV: Just checking the Finnish law... it's more about open data and IPR, but it says that it's accepted that if you make the DB available then as a governing org you have given consent that anyone can use it. ... It only applies to DBs that are freely available.
FB: Do they define use?
MV: The one who has the right to use the DB can create copies of parts and do other necessary actions related to the access. 'normal' use of the DB. ... If the data is freely available then scraping is OK. If you're adding value, aggregating etc. then the new product you create probably would have its own rights. That's your new product.
FB: Would you need permission to scrape? MV: No. If it's made freely available, especially via a network, then it's expected that the DB owner has given consent for use. ... It's polite to ask but not necessary.
FB: Again, I wonder what kind of use is permissible. MV: The legislation doesn't give any example of types of use. So it doesn't solve much.
PK: Isn't that a difference as you effectively have permission to scrape in Finland? MV: We have a site that compares politicians' promises to actions. That's just one example.
PA: What about CAPTCHAs stopping scraping? MV: That would be 'not freely available'. We had the discussion about what open means. If you have to register, that's not open.
FB: Would it make a difference if you were allowed to register anonymously? MV: No. You still have to do something.
PK: We have agencies creating APIs for data. Most of them require an API key to control traffic. Does that constitute open data or not? MV: Then you have the responsibility issues. The biggest worry for people is the additional costs for traffic etc.
PK: When the Norwegian weather agencies opened their API it was used in NO and SE which wasn't being covered. So that's a case for API keys.
PK: So what are the BP suggestions.
FB: On a policy level, the MSs need to implement the DB Directive in a complementary/harmonised way. PK: If the goal of the PSI-D then MSs should harmonise implementation of the DB Directive (from 1996).
MV: Does the DB-D still provide something useful that isn't covered by IPR legislation and the PSI-D?
GH: It's necessary from the perspective of a private company as it protects the work that you did to create it. ... The level of innovative curation may provide some IPR protection on a higher level but the simple DB protection is at a lower level and is a useful tool.
MV: One could... GH: You need to step out of the open data area now. Private companies need to protect their work. MV: I take that for granted. GH: That's the DB-D and we need it. MV: But if you take lots of datasets and aggregate, can you protect the idea, but not the data? GH: Hmm... not really.
FB: If I create a DB based on open data I get no extra right over the original data. Only over my work.
PK: So we had a BP around harmonisation. Anything else?
AM: It wouldn't be expensive to record the discrepancies between different implementations.
FB: There's a study on this - how each MS has followed or not followed it. Big study - I'll send the link.
PK: Is there anything else that governments should do?
PA: Is there a licensing issue to discuss?
FB: We didn't really address DB licenses. Something to address licenses would be good. I'm hoping to do more work on that in my current project. Please contact me.
PK: We had a discussion about the data we have at the SED National Library. We're using CC0 because we don't mind how they're used. Is CC enough?
FB: Looking at what's been done so far, CC is the best choice and is widely recognised. It's also the only one that takes into account DB rights. ... CC0 - the legal status of that licence is under debate as you can't just put something in the public domain as you have copyright (whether you think so or not). PK: I think it says that there might be legal provisions in your area that you'll need to look up.
GH: We have those guidelines that clearly say that you should use those CC licences. I'd be happy to end the discussions by saying - use CC.
??: What's the difference between a DB and data you publish on a website? ... you can access it through web services, or put it on the Web. It's not just databases.
FB: What's the difference between a work and a DB - is that the question? ... Well if your dataset is on your website it can be a database. It's a collection. ... So you might have to check for who has rights to individual elements of your DB.
PK: Wraps up. Room thanks Freyja.
Tuesday 18th March (14:00 - 15:15 Parallel Sessions C)
Crowd sourcing alternatives to government data – how should governments respond?
Peter Krantz (Presenter), Jens Klessmann (Scribe)
Introduction by Peter Krantz (slides and paper)
Case #1: Post code database in Sweden
Content of the official Swedish post code database can be accessed by citizens on an individual basis for free. If you want to access the data in bulk a fee has to be paid. An individual citizen developer started scraping the post code website and provided the post code data via an API for free. He was soon issued a cease and desist letter by the Swedish post.
The question arose: How can you re-create the data from the post code database without conflicting with the copyright? An app was created in order to crowd source the post code data.
- people started using the app to report post code areas e.g. from receipts
- the activity was pretty successful in a short time, after a few weeks already large parts of the post codes had been reported
- most public sector officers didn't even know about the problem, that citizens couldn't freely use the post code DB
- in this case the crowdsourcing is more a political tool to get politicians to notice
- Every year there are thousands of changes to the official post code data base. How to get users to actively keep using the crowd sourcing app over time in order to maintain the free post code dataset?
Case #2 Cultural heritage data
- lots of unstructured data like images
- many cultural heritage institutions try crowdsourcing
- set up websites in order to correct OCR'd images by crowdsourcing
- How to sustain engagement in such a crowdsourcing effort?
- Not a technology problem, but how to engage citizens in a sustainable fashion?
Is there a best practice to encourage crowd sourcing? General discussion
- use GitHub for publishing data and allow external improvement of data sets.
- examples: City of Chicago using GitHub for their data sets
- crowd sourcing projects are born out of a need: BP identify need first and then seek groups able to support solving that need via crowd sourcing
- best practice: have f2f meetings for crowd sourcing in groups
- government support for crowd sourcing is a sign of openness, thus gov should pay for building community via prizes, meetings, etc
Examples of crowd sourcing:
The example of Galaxy Zoo offers some principles for designing citizen engagement projects. Galaxy Zoo is a citizen science project for non scientists to help classify images of distant galaxies.
- Citizens know they are contributing to scientific research
- The project has academic leadership
- The project respects the contributors and responds to them.
- After some initial mentoring the community of contributors became largely self-organising.
- The task is simple and well specified so makes low cognitive demands on contributors
- Contributing to the process of discovery offers the possibility of recognition through identifying a new galaxy or community status
- The project would not otherwise happen.
- The project is web-native.
A related project using the same methods is Old Weather, which generates historical data for the study of climate change from historic ships' log books. http://www.oldweather.org/
The principles can be applied particularly to the museum and cultural sector.
- fixyourcity efforts
- example of bus stop locations being corrected by OSM community
Problems with crowdsourcing
- you create a second non-official source of data, oftentimes competing with official sources
- distinguish whether there is an official source or not. If not, governments might be more inclined to use crowd sourcing and support it
- Look at Micro-work model to find a business model:
- BP: involve stakeholders who could benefit from a free source of certain data sets and have them provide funding in order to sustain crowd sourcing efforts
- better use crowdsourcing to complement existing data instead of creating data from scratch
- BP: use crowdsourcing without the users' knowledge e.g. captcha systems
Use crowdsourcing in different phases of a data collection project
- planning: define the data structure
- start of crowd sourcing: communication: make clear, that participation will change their lives?
- actual crowd sourcing: BP utilize gamification approach
- mature stage: find stakeholders, which are willing to maintain the crowdsourced data
Success factor: Building a community
- BP: watch out, that the data collection process is actually connected with the re-use of the data
- BP: the tasks have to be really small tasks
- Collect some use cases for crowd sourcing
- connect to best practice about feedback
- tools to facilitate crowd sourcing: e.g. Pybossa
Best practices proposals
- BP identify need first and then seek groups able to support solving that need via crowd sourcing
- think of crowd sourcing as another tool to create/improve data sets and think about the phases of your data collection project and where crowd sourcing could best fit in
- BP: involve stakeholders who could benefit from a free source of certain data sets and have them provide funding in order to sustain crowd sourcing efforts
- BP: the tasks have to be really small tasks
- BP utilize gamification approach
- BP: use crowdsourcing without the users' knowledge e.g. captcha systems
How benchmarking tools can stimulate government departments to open up their data. Emma Beer and Martin Alvarez
Emma - introduces Global open data index and Open Knowledge.
Project Manager for the index which is global cf. European. Started 2012, evolving since.
Dependent on volunteers to provide the data. Takes a lot of time to look at the open datasets available from the government. What the volunteers do is then reviewed through two levels. Engaging with the community, not paying others.
G8 Open Data Charter was starting point for deciding on the list of datasets.
Consider technical openness, legal openness. Each contributes 50% of the score.
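As a rough illustration of the equal weighting described above, a combined score could be computed as below. This is only a sketch: it assumes each component has already been expressed as a percentage, and the function name and example figures are hypothetical rather than taken from the Index methodology.

```python
def index_score(technical_openness: float, legal_openness: float) -> float:
    """Combine two component scores (each 0-100) with equal 50% weights."""
    for value in (technical_openness, legal_openness):
        if not 0 <= value <= 100:
            raise ValueError("component scores must be between 0 and 100")
    return 0.5 * technical_openness + 0.5 * legal_openness


# Invented example: strong on technical openness, weaker on legal openness.
print(index_score(80, 40))  # 60.0
```

With equal weights, a dataset that is technically excellent but not openly licensed can score no higher than 50.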
97 countries or places took part this year.
Lots of challenges to collate the data. Need to help contributors to understand the questions and definitions.
Use the tool to judge countries against each other as a means to get politicians engaged.
Only 11% of the world's datasets are openly available (as defined by open definition).
Spending data - there's an OK amount of data available but very little about how it is actually spent. Only GR and UK have this.
UK came top last two years. Not a news story to 'still be top.'
Outside UK > 50 articles around the world. France was most improved.
Collected stories about impact of taking part.
Regional efforts, eg Flanders, not always recognised nationally.
Nice quotes from UK and FR governments.
Lots of tweets about it. DK said they have a lot of open data, why don't people make use of it?
Interest in open data is growing exponentially. See slides for tweets.
Example of change - AU company register.
Martin Alvarez introduces himself. He maintains the PSI Scoreboard.
Try to measure the status of PSI in the EU28
The indicators we have are perhaps out of date and need updating. Aim is to achieve the same as Emma's project.
Tool is also crowd-sourced, so we rely on volunteers.
Methodology is clearly documented.
Technically it's really simple - just a set of spreadsheets, one per country, with well defined indicators.
My proposal is that maybe the Index could provide an API so that other scoreboards can use the data - i.e. the scoreboard would use the OK Index as an indicator. We measure the number of initiatives and the number of datasets used.
So the next step would be to make the proposal to the users of the PSI Scoreboard.
EB: I'm sure the team would be happy to make the data available like that. ... the Open Data Barometer is more research oriented, trying to measure impact. Are people researching this? From OK's perspective, we'd like people to be using the Open Data definition which is decided by the whole advisory board, not just OK.
MA: Transparency International have indicators. Spain's cities are competing to get a higher score. It helps convince politicians to publish budgets, salaries etc.
BK: I'm wondering how this session follows on from Makx's session on data quality. In terms of the index, it's difficult to add metrics on quality as we're measuring very different ends of the spectrum.
BK: One big difference - the previous session was all about metrics. You have a crowd sourcing approach, how do you evaluate whether the data is true?
EB: Points to country level expert review, then subject specialists (eg transport). Not water tight but this year we want an expert looking across all 97 countries (or more). hard to make it perfect. We had 4 errata this year - not bad.
Silviu Vert: Is it possible to do this across 97 countries? Multilingual issues etc?
MA: We have the same problem for the scoreboard. Basically I do all the checking, using Google Translate where necessary.
EB: One thing we found in this painful reviewing process - we had community managers managing the process, comment fields, ... and found that exchanges were useful to help people understand open data more.
MA: One of the problems we have detected... in some cases, like Belgium, you have good quality initiatives but they're not reflected in the Index.
Jan Pieter wrote a blog post about why Belgium didn't do better.
The Index handles city level.
SV: I'm the OK ambassador for Romania... we started at the beginning of the previous year. Civil society persuaded the Timisoara authorities to release data on the national portal. At the end of the year, the government gave some prizes from the OGP office to reward the city for its transparency. Timisoara won and was welcomed by city hall. So I think this kind of census is helpful to convince politicians to act. ... We told city hall that they needed to release the budget, elections etc. And got the usual excuses in response. Budget and spending are internationally benchmarked - that helps us make their case.
EB: It would be nice to put something up on our blog about that and that kind of event. How would you encourage a government to be interested in spending? Does the Index help?
Gov: Before the final stage, we were in 4th place and were proud of that. When the final score came we were at 16, and budget and spending were one of the problems as they're not open. We used that as an incentive.
PA: Did it work?
Gov: Not so far.
EV: In our country, spending is not recorded at transactional level, so it's not available to publish.
PA: The country that asks the questions always comes out top... comment?
AK-S: Governments are always interested in the impact. We're trying to develop a monitoring model for the long term. We have a preliminary study - what kind of services, what kinds of cooperation - which we'll present at final Share-PSI event in Berlin.
MA: What kind of indicators are you measuring?
AK-S: we're planning, we don't know yet. Conclusion of consultant was that there wasn't any agreement.
EB: The Barometer from the Web Foundation might be helpful. It's based on focus group.
PKp: You could crowd source how much they use and how valuable it is.
PA: Banged on about the data usage vocab and ideas around schema.org etc.
BK: How do you relate your indicators with the PSI-D?
GabC: Do you have a requirement to share the criteria at the EC institutional level?
MA: The ePSI Scoreboard is being used by the EC to see progress towards specific goals, e.g. one measure is the transposition of the PSI-D.
PA: Is there a downside to not being top, cf. the upside of doing well?
EB: Not sure - I've been pushing for regional grouping. Countries tend to think about their neighbours more than others. It's getting harder to get into the top 10. Leading countries are making more investment.
PA: Big +1
MA: I tried to contact some of the people not doing so well. I found maybe a handful of enthusiasts, but in the government no one cares, e.g. Croatia and Hungary - no response from either.
EB: We can use it to secure funding to do training, e.g. in the Global South.
EB: Goes through the key questions.
Adding to slides.
BK: Our BPs are typically made from a publisher POV - maybe we could rephrase this BP as 'look out for platforms measuring your performance and participate.'
Raising awareness and engaging citizens in re-using PSI (Daniel Pop and Yannis Charalabidis)
Session: Raising awareness and engaging citizens in re-using PSI
Yannis Charalabidis, University of the Aegean
Daniel Pop, West University of Timişoara
Scribe: André Lapa (AMA)
Round of introductions - Yannis asked the audience to introduce themselves
Noël Van Herreweghe, CORVe
Jan Kučera, University of Economics, Prague
Chris Harding, The Open Group
Petya Bozhkova, Balkan Services, Bulgaria
Szymon Lewandowski, European Commission - the Commission is interested in the topic (wants to engage citizens more, not only the supply side)
Robert Ulrich (KIT)
Peter Winstanley, Scottish Government
Daniel presented the main topics of the session:
- Engaging reusers
- success stories
- incorporating feedback
- should we track open data use?
Ways to Boost the utilization of Open Data by Citizens:
Yannis: is working on a paper on a similar topic and has drawn out 5 practical ways (we'll discuss them later)
Example: someone has put a municipal open data portal --> publishes some data sets --> and then, what is the next step to gather some visitors? Why are people not coming?
Asked the audience for some suggestions:
Noel: Has been focusing on the instruments / framework to make open data available --> people are not interested in open data - people are interested in services. People haven't been involved in the discussions re: the supply side / demand side. Academia, researchers....
Daniel: #1 axiom - involve end-users
Noel: we have to put the supply side together with SMEs, industry, or even interest groups; for every roundtable with 20 industry people we have given 6 wildcards from civil society (3 women, 3 men) to see if there are services here to be created.
We need to know if someone is going to do something with this data.
Daniel: should we provide some tools for this involvement?
Noel: yes, but more importantly we contacted local government and asked them for help in identifying stakeholders / people interested in specific themes (economy, environment, statistics + 2)
We think we are on the right track, let's see.
Robert: what exactly is a service? Is it a solution? Is it something we can build and no one will use?
Noel: that's why we should involve people. E.g. building something related to air quality / dust particles.
another e.g. baby boomers and the demographics of society and aging (real issues that Flanders heard from the community)
Robert: ok, so you are looking to satisfy solutions.
Yannis: My only concern is if those "solutions" are just a drop in the ocean, what if those focus groups are not representative.
Noel: sure, that's why we want to put these kinds of questionnaires online.
Szymon: the idea of a knowledge base could also boost commercial reuse (not as much individual use) E.g. some necessities arise only when we really need them (i.e. you break a leg and want a broken legs specialist in your area)
Yannis: Would this knowledge base be like an open data portal?
Szymon: the important issue would be to make the information easily accessible.
Yannis: So, should we only worry about building a good open data portal and providing web services? Is that enough?
Szymon: with commercial reusers you might want to go further, they may need more knowledge about the data.
Jan: there are open data portals that have a questionnaire regarding who wants to use data. A student is different from a company, sometimes a CSV is enough.
Szymon: but do you know that a priori? That's why you NEED A CHANNEL.
Jan: issue is - once you open you must be prepared to handle these different kinds of requests, and you MUST ENGAGE - you need a COMMUNITY MANAGER.
André Lapa: the Portuguese experience - we haven't been able to engage very successfully (few requests)
Yannis: presents 5 ways to engage reusers (using slides):
- Way #1: Give Them a Home - offer citizens / users the ability to create a profile and log in via social media
- Way #2: Make them create an open data marketplace - citizen put a request - public for everyone to see (a bit of gamification principles)
- Way #3: Make them open data publishers - allow for upload of datasets by users
- Way #4: Allow working on datasets / make them curators - can republish
- Way #5: Give them incentives
- publish popularity of the users
- Give incentives (free tickets to community events, free parking...)
- Datathons (longer competitions)
- Data journalism competitions
Szymon: data journalism is very important, because journalists have to explain reality to us - especially at the local level; in my experience people go to local news websites not only for the content but also for the comments (this is why empowering the audience may be a good thing)
Yannis: there is no such creature as the perfect data user, it's a unique combination of traits that can make data shine, one that you can get combining people with IT Skills + communications skills, etc.
Robert: City Wiki of Karlsruhe - great example of crowd engagement to better living standards, boost tourism, etc. - the project is not government sponsored.
Noel: we have tried almost everything here - I think we are confusing citizens with developers - We have spent over 1 million on open data - difficulty in showing the added value for citizens - we are reaching developers but not citizens (no end products)
Peter W: regarding hackathons - the problem is they are not empowering ordinary people (they create self-limiting communities) - In Scotland we have created a different type of hackathon where we explain to people how to create simple stuff. 2 things happened: people want to share data and people want to work with data
You should try and broaden your audience, not build things only for an IT focused audience
Tap into library communities, for example.
Noel: What I would like to see happen is that data is translated into something useful for people.
I need to see added value appearing.
Daniel: this would open a new Barcamp - Education in Open Data
Bar Camp Sessions
Robert Ulrich - re3data.org - making research data repositories discoverable
What do you want from the W3C Data on the Web Best Practices? - Phil Archer
Who is behind these best practices?
The list of contributors includes various names that come from Brazil.
Due to the funding received from the Brazilian government.
Important consideration is that none of these (credited) experts work for a government / public administration.
A comment on the writing of these best practices: it is wrong to think that coming up with these BPs is an easy task.
Discussion focused on the template used for these BPs
RFC2119 re keywords like MUST and SHOULD is being replaced.
The scope is for data on the web and a distinction needs to be made between the web and the internet.
Three questions need to be asked within the scope of each BP:
- Is it relevant?
- Does it encourage publishing of data on the web?
- How easy is it to test it for both machines and humans?
What is the correct balance between the intended outcome (objective) of the BP and the possible approach to the implementation.
There is also the question of what granularity is required at the BP level for this to be useful.
Chris – Best Practice should lead to an outcome but one may not necessarily follow such way.
Joseph – must not be technology specific although standards can be included
Phil – Tools rather than technology
Then the question of whether examples should be included to help with the understanding of the implementation methods available – mention of AMAZON in the context of books for visually impaired persons which had a specific effect (giving people more choice can reduce overall take up).
Standards versus best practices
Need to understand the meaning of data through the Why? What? and How?
The best practice / standard on how to provide metadata for a CSV file is almost ready and is a stable BP now
What is the meaning of web services? We moved away from them years ago. Is it SOAP, RESTful API, etc.?
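As an illustration of what metadata for a CSV file can look like, the sketch below builds a small description as JSON. It assumes the W3C CSV on the Web metadata vocabulary (`@context`, `url`, `tableSchema`, `columns`); the file name and column names are hypothetical and were not part of the session.

```python
import json


def csv_metadata(csv_url, columns):
    """Build a minimal metadata description for one CSV file.

    columns is a list of (name, datatype) pairs, in the order the
    columns appear in the CSV file.
    """
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "tableSchema": {
            "columns": [
                {"name": name, "datatype": datatype}
                for name, datatype in columns
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical file name and columns, for illustration only.
    description = csv_metadata(
        "spending-2014.csv",
        [("department", "string"), ("amount", "decimal"), ("paid_on", "date")],
    )
    print(json.dumps(description, indent=2))
```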
A BP should give the confidence of persistence and long term applicability.
Backward compatibility is very important but not always possible.
DCAT Diagram has been around for some time now.
The names used are not important; the concepts (abstract ideas leading to physical objects) are more important.
Many datasets are on the Internet but not necessarily on the Web.
The aim is to help create an ecosystem of linked datasets
Problem of having linked datasets versus stand alone datasets
JSON is simple and powerful and good and native to the browser
Data identification leads to the need of a URI
The main grouping of the presently available best practices was considered and briefly discussed re versioning, identification and persistence also in the context of archiving, and whether this is in scope.
General agreement that HTTP code 410 is little known and under-used.
When archived data is brought back to the web, the identifier previously used must be retained.
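The two points above (410 for removed data, and keeping the identifier when archived data returns) can be sketched as follows. This is only an illustration under assumed names: the registry, the paths and the function names are hypothetical, and a real server would hold this state elsewhere.

```python
from http import HTTPStatus

# Hypothetical in-memory registry: identifier -> state of the dataset.
registry = {
    "/data/air-quality-2014": "live",
    "/data/budget-2012": "archived",
}


def status_for(path):
    """Return the HTTP status a server could send for a dataset identifier."""
    state = registry.get(path)
    if state == "live":
        return HTTPStatus.OK
    if state == "archived":
        # 410 Gone: the resource existed here but was deliberately removed.
        return HTTPStatus.GONE
    # 404 Not Found: no record that this identifier was ever used.
    return HTTPStatus.NOT_FOUND


def restore(path):
    """Bring an archived dataset back under the identifier it had before."""
    if registry.get(path) != "archived":
        raise KeyError(f"{path} is not an archived dataset")
    # The key is left unchanged, so the original identifier is retained.
    registry[path] = "live"
```

Distinguishing 410 from 404 tells a re-user that the identifier was once valid, which is what makes it safe to bring the data back at the same address later.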
Early wireframes for the pan-European open data portal, Philip Millard (Intrasoft via Skype), Jens Klessmann (Fraunhofer FOKUS)
- Jens and Philip introduced and discussed the scope and objectives of the technical part of the project
- The main objectives are: to deliver a beta version with high functional coverage and prepare subsequent releases of fully productive versions
- In the session Philip presented some early wireframes of the portal and asked for feedback and additional ideas from the participants
- Comments from the audience were:
- it was pointed out that it would probably be better to have a unified search box for data and editorial content.
- there was some skepticism raised about the presented community features and we discussed the role of the data portal in relationship to existing European data-related communities
- How is the relation of the data portal to existing efforts like publicdata.eu and the ODS portal?
- The project is discussing with partners from the existing efforts what this relationship should look like
- What type of software is currently foreseen to support the e-learning material?
- Currently it is planned to use the Moodle environment