W3C eGov 26 November 2012 meeting

Please note that after the meeting, one of the guest speakers, Serafín Olcoz, submitted a number of corrections to the minutes which are recorded like this.

<scribe> Scribe: Florian Henning

<scribe> ScribeNick: fhenning

<PhilA2> Meeting:eGov Interest Group

<PhilA2> agenda: http://lists.w3.org/Archives/Public/public-egov-ig/2012Nov/0053.html

<PhilA2> chair: Tomasz

<DeirdreLee> I'm one of P5, P6 or P7..

<rgrp> hi there

<rgrp> this is rufus pollock ...

<rgrp> i am also on the phone

Open data kickoff

Tomasz: welcomes participants and asks for a round of introductions

<Olcoz> Hi Martin

[audio problems from Tomasz line. reconnecting]

[audio problems resolved]

yes i'm scribing

<PhilA2> scribe: fhenning

<PhilA2> scribeNick: fhenning

DeirdreLee: from Ireland

from unu-iist/merit

Gwyn_Sutherlin: phd candidate in peace studies

mariateresa: from england

martinAlvarez: from spain

<PhilA2> agipap: is Agis Panatoniou from NTUA Greece

Agis: from greece

<PhilA2> rgrp: Is Rufus Pollock

elsa: from unu-iist, macau

<billroberts> Hi All - from UK, particular interest in Linked Data for public sector

Tomasz: few words about open data topic

<Gwyn_Sutherlin> I joined the call as IP caller, not sure how to add

Tomasz: OD = data that is free to use/reuse by anyone
... OD has not yet been properly exploited, but can have huge potential
... impossible to predict how it can produce value

<elsa> I am trying to connect on the phone and type 3468# and I get the message that the code is incorrect

Tomasz: according to April's OD workshop by IDRC/Berkman center, there is a range of potential benefits from OD
... but we don't understand many issues about OD
... same workshop also identified strategic tensions relating to adoption of OD
... eg. contextual differences between developed and developing countries
... also strat. tension concerning outcomes vs. impact
... another point is a strat. tension between qualitative and quantitative methods to explore impact of OD
... it will be difficult to quantify many impacts. this affects how analysis is framed
... any questions at this point?

no questions

Tomasz: introduces speakers
... daniel bennet is not on the call. tomasz will fill in

<rgrp> http://notebook.okfn.org/2012/11/26/open-data-protocols-presentation-to-w3c-egov-interest-group/

presentation by open knowledge foundation (rufus pollock)

<rgrp> http://notebook.okfn.org/2012/11/26/open-data-protocols-presentation-to-w3c-egov-interest-group/

<rgrp> http://bit.ly/dataprotocols-egov-nov-2012

<PhilA2> Slides are now linked from the wiki

<rgrp> the notebook post would probably be the optimum thing to link to

<rgrp> http://ckan.org/

[audio line from rufus has suboptimal sound quality, scribing will not be complete, pls refer to slides on wiki for more complete notes]

<rgrp> http://dataprocotols.org/

phil could you aid with scribing for this presentation if you receive better audio?

thanks

<DeirdreLee> http://www.dataprotocols.org/

<PhilA2> Slide 3 - we want a rich data ecosystem. Easy to share data, easy to use

<PhilA2> scribe: PhilA2

<rgrp> http://blog.okfn.org/2011/03/31/building-the-open-data-ecosystem/

rgrp: We're missing quite a lot of this middle piece. We have the top and bottom but not the intermediary group
... so data tends to be quite low quality
... To give you an example. If you're trying to build something on a hack day, you spend half your time cleaning up data
... assuming you can find it, it's not in the right form, got messy terms etc.
... You need country codes that work in a mashup and so on.
... people spend time over and over again cleaning up the same data
... Rufus shows his age and starts talking about punch cards
... A classic thing you might want to do is grab data and put it into Postgres
... this is not a one-liner. There's a lot to do. What we want is a one-liner to get data from a catalogue and put it into a local tool
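
The "grab data and put it into a database" step described above might look roughly like the following sketch. It is not the OKFN tooling: SQLite (from the Python standard library) stands in for Postgres so the example is self-contained, the function name load_csv_into_sqlite is hypothetical, and the two sample rows are only illustrative.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(csv_text, table, conn):
    """Create `table` from the CSV header row and insert each data row as text.

    Hypothetical helper for illustration; every column is stored as TEXT.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Quote identifiers so headers with spaces or odd characters still work.
    columns = ", ".join('"{}" TEXT'.format(name.replace('"', '""')) for name in header)
    quoted_table = '"{}"'.format(table.replace('"', '""'))
    conn.execute("CREATE TABLE {} ({})".format(quoted_table, columns))
    placeholders = ", ".join("?" for _ in header)
    # Rows whose column count does not match the header are skipped.
    rows = [row for row in reader if len(row) == len(header)]
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quoted_table, placeholders), rows
    )
    conn.commit()
    return len(rows)


# Illustrative sample only, not the published COFOG file.
SAMPLE_CSV = "Code,Description\n01,General public services\n02,Defence\n"

conn = sqlite3.connect(":memory:")
loaded = load_csv_into_sqlite(SAMPLE_CSV, "cofog", conn)
```

Even this small case needs a table definition, quoting and row checks, which is the friction the talk is pointing at.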

<rgrp> http://blog.okfn.org/2010/02/23/introducing-datapkg/

<Tomasz> q

<Tomasz> phil, how to check the question queue? sorry

rgrp: A lot of software integration doesn't happen automatically. A lot of it is based on APIs

<Tomasz> thank you

rgrp: We need to look at (digital) packaging
... We need that kind of software packaging ecosystem - how do we do that with data?
... We want to be better at automating getting data on and off our machines
... W3C does good work on schemas but it's not lightweight
... We've been doing a thing called dataprotocols.org where people can hangout and work on specs
... Slide 5
... is a screen grab of what's going on
... these are concrete services that we have built or want to build
... it's not a formal standardisation process
... but this is a space for more informal, RFC-style development
... more on slide 6
... Going on to talk about data packages
... been working on it for about 5 years
... originally part of CKAN
... we have software packages, can we have data packages?
... Focus on tabular data. The catalogues I've seen, tabular + geo is almost all of what gets published
... In terms of original raw data, most of it is tabular
... It has a lot of attractive properties that I could go on about
... A lot of the data is file based, not API-based
... flat files like CSV are very attractive. It may not be pretty but it is effective - like a Kalashnikov rifle
... everything supports it
... it streams well, you can have massive files
... a simple schema for describing CSV would be useful
... we need version info for CSVs. Open is important, but if everyone is collaborating, how do we do version management for CSV?
... Git or Mercurial are potentially good ones for CSV as they are line-orientated
... the actual spec is available
... what it boils down to is a bunch of data files, you have JSON and you can have other stuff
... there's a .json file that includes the metadata
... and then you have a filespec to list your files
... Slide 12 is an example
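
The remark that line-orientated version control suits CSV can be illustrated with a small sketch. It uses Python's difflib rather than Git or Mercurial themselves, and the two CSV versions and their labels are invented for the example.

```python
import difflib

# Two invented versions of the same small CSV file.
old_version = (
    "Code,Description\n"
    "01,General public services\n"
    "02,Defense\n"
).splitlines(keepends=True)
new_version = (
    "Code,Description\n"
    "01,General public services\n"
    "02,Defence\n"
    "03,Public order and safety\n"
).splitlines(keepends=True)

# A unified diff works line by line, so each changed record shows up as one
# removed and/or one added line.
diff = list(
    difflib.unified_diff(
        old_version, new_version, fromfile="cofog.csv@v1", tofile="cofog.csv@v2"
    )
)

# Keep only the changed records, dropping the "---"/"+++" file header lines.
changed = [
    line
    for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
```

Because one record is one line, the diff reads as a list of changed records. This is a property of the format, not of any particular tool.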

<rgrp> https://github.com/datasets/cofog/blob/master/datapackage.json

rgrp: You can add more to this. But basically it's a table schema

<rgrp> http://www.dataprotocols.org/en/latest/json-table-schema.html

The whole thing follows...

{
  "metadata": {
    "name": "cofog",
    "title": "Classification of the Functions of Government",
    "homepage": "http://unstats.un.org/unsd/class/family/family2.asp?Cl=4",
    "version": "1999",
    "source": "United Nations",
    "licenses": [
      {
        "id": "odc-pddl",
        "name": "Open Data Commons Public Domain Dedication and Licence (PDDL)",
        "url": "http://opendatacommons.org/licenses/pddl/"
      }
    ],
    "description": "Classification of the Functions of Government (COFOG) is a classification defined by the United Nations Statistics Division. Its purpose is to \"classify the purpose of transactions such as outlays on final consumption expenditure, intermediate consumption, gross capital formation and capital and current transfers, by general government\" (from home page).",
    "keywords": [
      "Classification",
      "COFOG",
      "Finances",
      "Government",
      "United Nations"
    ]
  },
  "files": [
    {
      "path": "data/cofog.csv",
      "fields": [
        {
          "id": "Code",
          "type": "string"
        },
        {
          "id": "Description",
          "type": "string"
        },
        {
          "id": "ExplanatoryNote",
          "type": "string"
        },
        {
          "id": "Change_date",
          "type": "date"
        }
      ]
    }
  ]
}
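
As a rough illustration of what a tool could do with such a descriptor, the sketch below reads the "fields" list of a file entry and casts CSV values accordingly. It assumes the layout recorded in these minutes ("files", "path", "fields", "id", "type"). The descriptor is trimmed to three fields, the CONVERTERS table and read_typed_rows are hypothetical names, and the sample rows (including the date) are invented.

```python
import csv
import io
import json
from datetime import date

# Trimmed copy of the descriptor layout recorded in the minutes.
DESCRIPTOR = json.loads("""
{
  "metadata": {"name": "cofog"},
  "files": [
    {
      "path": "data/cofog.csv",
      "fields": [
        {"id": "Code", "type": "string"},
        {"id": "Description", "type": "string"},
        {"id": "Change_date", "type": "date"}
      ]
    }
  ]
}
""")

# Hypothetical converters for the two "type" values that appear in the example.
# Dates are assumed to be ISO formatted (YYYY-MM-DD); empty cells become None.
CONVERTERS = {
    "string": str,
    "date": lambda text: date.fromisoformat(text) if text else None,
}


def read_typed_rows(file_entry, csv_text):
    """Return the CSV rows as dicts, with values cast per the field types."""
    fields = file_entry["fields"]
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {f["id"]: CONVERTERS[f["type"]](raw[f["id"]]) for f in fields}
        for raw in reader
    ]


# Invented sample rows; a real tool would open the file named by "path".
SAMPLE_CSV = (
    "Code,Description,Change_date\n"
    "01,General public services,1999-01-01\n"
    "02,Defence,\n"
)
rows = read_typed_rows(DESCRIPTOR["files"][0], SAMPLE_CSV)
```

The descriptor alone tells the reader which columns to expect and how to interpret them, without consulting a catalogue.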

rgrp: Interested to hear from the LD community in things like JSON-LD
... Wrapping up... you could push this to a local Web site and you have a data package
... Not concerned whether we adopt this kind of package or something that does the same thing, but it's how to publish data packages without a data catalogue
... Using JSON as your base schema language
... got to make something that can be used really easily
... Some people have suggested we drop JSON and just use another Excel worksheet to provide the data
... Every step towards making it easier brings more uses of the data
... We want to reduce the friction of getting, using and sharing data

Tomasz: Thanks Rufus

<rgrp> I have finished

<rgrp> Any questions :-) ?

<Gwyn_Sutherlin> question

<fhenning> PhilA2: w3c is aware of process for schemas. there's a workshop planned for next year on exactly the issues that rufus has been addressing.

<scribe> scribe: fhenning

<PhilA2> Gwyn_Sutherlin: Do you do work around unstructured data - text, audio, video etc?

<PhilA2> rgrp: Yes, we do, It's the Open Knowledge Foundation - we're format agnostic

<rgrp> open knowledge includes content, data etc

<PhilA2> Gwyn_Sutherlin: Our cases are usually around transparency and corruption

<rgrp> we do a lot around other topics including corruption :-) eg. http://okfnlabs.org/events/hackdays/lobbying.html

[sorry, the audio problems seem to be at our device. it's not possible for me to do complete scribing at this point. could you take over phil?]

<PhilA2> DeirdreLee: Thanks Rufus for the presentation. Do you see things like DCAT as added overhead? What tools do you see for packaging data?

<PhilA2> rgrp: That's my point. The spec allows you to build the tool. We have a tool called DPM

<rgrp> http://dpm.readthedocs.org/en/latest/

<PhilA2> rgrp: DCAT is in some ways format agnostic but it's an LD format. For an Excel user, you can tell them in 30" how to export in CSV. There's no "export in .n3" option

<rgrp> metadata = ini file - xyz: abc

<PhilA2> rgrp: Maybe the metadata file should be a .ini file for a simple example

<PhilA2> rgrp: Most formats are very simple, with JSON as the most complex. Need things people can produce with the tools they have
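
The .ini suggestion above could look something like the following sketch. The section name [metadata] and the choice of keys are assumptions made for illustration, not part of any agreed spec. Python's configparser accepts the "key: value" form that rgrp typed.

```python
import configparser

# Hypothetical .ini rendering of a few metadata fields from the COFOG example.
INI_TEXT = """
[metadata]
name: cofog
title: Classification of the Functions of Government
version: 1999
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# All values come back as strings; any typing would need a further convention.
metadata = dict(parser["metadata"])
```

A file like this can be written in any text editor, which is the point being made about formats people can produce with the tools they already have.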

<PhilA2> Tomasz: Thanks Rufus again

<rgrp> see python, ruby, debs, nodejs etc - all have super simple package formats ..

<rgrp> also you need a reason for people to package - you need something they can then do ... (e.g. get something into postgres in 30s)

Serafin Olcoz on Openness and Reuse of Public Sector Information using Open Data Publishing, Decree

yes phil

<PhilA2> I love this line - The public sector is an archipelago of competences and budgets

[no it drops in and out - better if you do it phil]

<PhilA2> scribe: PhilA2

Olcoz: We decided to share all our code
... Slides http://dl.dropbox.com/u/49911950/W3c%20-%20Open%20Assets.pdf
... Slide 2 has the key policies
... Memorandum specifies the schema to use etc.
... makes publication of source code etc.
... If you want to develop software, you are obliged to see what's already available and build on that
... you need to write a report on various aspects. What you're using, what you're contributing back etc. (under EUPL licence)
... provide a functional description etc.
... Also state what dependencies there are etc.
... You are required to publish at least the dependencies as it affects everyone, not just you
... The aim is to have a global idea of what is being done using public money to develop software
... This is formalising the re-use process

PhilA2: The obligation applies just to people being paid by the public sector and not to third party developers?

Olcoz: Yes
... Private sector can take OSS and develop new products and services based on the OSS directory
... They may then realise the advantage of this and can, if they want, open their own source code
... Which we hope will create a virtuous cycle
... We're offering a robust service 24/7/365
... slide 5
... We are actively encouraging development.
... Supporting local enterprise and investment without having to spend public sector money - a not unimportant feature in the current climate
... We'd like others to share our approach of course and would welcome a European approach
... In order to allow people to use your OSS, you need to have a portal to make it available and to be able to access other repositories
... the repository itself is an asset that has value
... you can learn a lot about past and present components
... if something is under development and you can wait for it to be ready before you use it, then you know to wait. If you can't wait, you know you need to go your own way.
... all the records are contained in an open data catalogue
... Slide 9
... We're agnostic about formats. It can be data, or text or code etc.
... We need to be able to federate our repositories
... We defined various vocabularies, including for the re-use process
... We have the support of CTIC and others
... also of ministry of finance in Spain, evaluating for use across Spain

<Tomasz> we will reschedule Elsa's presentation for the next meeting

Olcoz: talks about the schemas in use. Refers to RADion (http://www.w3.org/ns/radion) and the ISA Programme that created it

<Tomasz> but i would still like some discussion about Serafin's talk

Olcoz: Model can be used to link different sources of data from the Web. Important to see repository itself as an asset - needed to extend RADion

I did not say this. I said that RADion is too specific and that we had to develop an ontology, Adion, of which RADion is a particular case that includes an instance of our Repository Asset. Considering a Repository as an Asset allows us to have semantic federation of a distributed hierarchy of Repositories, which RADion and DCAT do not provide; they require external programming to achieve such a federation because their models work in silos. The Adion approach also has advantages when considering data or information coming from the Internet of Things.

Olcoz: We find problems with ADMS and DCAT. They don't cover everything we need

I did not say this. Rather, DCAT and ADMS do not properly handle the IPR of documentation attached to Data or Assets; because of our abstraction of Elemental Open Assets (Data, Apps, Reports, and so on), these documents are also Assets, so their Distribution is able to deal properly with IPR issues.

... Three new portals launching in a couple of weeks' time. All source code is open for re-use

Tomasz: Thanks very much Serafin - very interesting

Olcoz: If you need info about the decree - I've submitted links to English resources to the IG

Tomasz: Can you give us a sense of the size of the Basque government involvement?

Olcoz: We're still working on finishing the repositories so we'll have to wait a few months to be able to report on experience

Tomasz: Is design for re-use part of the requirement of the new software project?

Olcoz: Not yet. That's the plan for the future
... People often saw design as being very specific to a use. In the early 90s, people began to change that view. Now you see a lot of re-usable software components
... We need to work on the guidelines around this

I also added that I will present this work to W3C SIG GLD this Thursday and that, through it, I will submit the vocabulary of Open Assets based on Adion to W3C, so that it can be taken into account in the future evolution of standards related to Open Data and Re-use of Public Sector Information

Tomasz: Any more questions?

Olcoz: I'd like to say I'm making a presentation on this to the GLD WG this Thursday
... Wants to make a Member Submission

Tomasz: We're at the end of our time
... Apologies to Elsa for moving your presentation to our next meeting next month
... A reminder that we have an open call for assistance with developing the group's summary of the various presentations we have received concerning social media
... Next few meetings will be on open data - and so will welcome guests and ideas for speakers

<Gwyn_Sutherlin> thanks

Tomasz: Thanks to speakers and scribes
