See also: IRC log
<BartvanLeeuwen_> scribenick: BartvanLeeuwen_
<BartvanLeeuwen> scribenick: BartvanLeeuwen
DNA Digest by Fiona Nielsen
DNADigest is a non-profit org with the purpose of opening up genome sequences
DNA sequencing purposes: cancer research, heritable traits and illnesses, and rare diseases
<PhilA> Apologies to Fiona who is saying seriously interesting stuff but fighting technology problems. Facilities are on the case but so far without success
<StevenPemberton> scribenick: StevenPemberton
Fiona: researcher needs databases
of genomic variation
... more data is needed to validate results
... they want to learn from other data
... but are not sharing their own
... why not?
... - confidential data
... - easy to identify individuals from genome data
... - BULKY
... so data has to be de-identified and aggregated
<scribe> scribenick: BartvanLeeuwen
no open sharing of raw data
only defects for described diseases are shared
DNADigest's solution: not all research needs access to the full data
e.g. does this mutation occur at a higher frequency?
this result is already de-identified
challenges: connecting existing datasets, incentives for sharing and commercial model
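The frequency question above can be answered without sharing raw genomes. A minimal Python sketch of that idea follows; the cohort format and variant labels are invented for illustration, and only the aggregate number would leave the institution:

```python
# Toy sketch: answer "does this mutation occur at higher frequency?"
# while keeping raw, identifiable genome data in-house.

def mutation_frequency(cohort, variant):
    """Return the fraction of samples in `cohort` carrying `variant`."""
    carriers = sum(1 for sample in cohort if variant in sample["variants"])
    return carriers / len(cohort)

# Hypothetical local cohort; never shared directly.
cohort = [
    {"id": "s1", "variants": {"BRCA1:c.68_69delAG"}},
    {"id": "s2", "variants": set()},
    {"id": "s3", "variants": {"BRCA1:c.68_69delAG"}},
    {"id": "s4", "variants": set()},
]

# The aggregate result is already de-identified and safe to share.
freq = mutation_frequency(cohort, "BRCA1:c.68_69delAG")
print(freq)  # 0.5
```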
Next speaker: Florian Bauer http://www.reeep.org
reeep helps with renewable energy in developing countries
website after website is launched, "portal proliferation syndrome", that needs to be fixed
in current projects, data silos are created
<StevenPemberton> scribenick: StevenPemberton
Florian: the system has to deal with disambiguation
<BartvanLeeuwen> it seems very hard to stop this
<BartvanLeeuwen> but that's because project deliverables state information dissemination, not how
<BartvanLeeuwen> We need to break silos and link information to help them do it differently
<BartvanLeeuwen> Can we support the linking of information with an automated system?
<BartvanLeeuwen> we have to!
Florian: and use linked open
data
... we supply thesauri
... our system helps tag unstructured documents
... a file is sent, in whatever format, we analyse it, extract
common terms
... based on thesaurus and vocab
... and deliver lots of metadata
... with definitions etc.
... and we can store the location of the document with tags (if
the user wants)
... [gives an example]
... please look at http://api.reegle.info
... we are a non-profit
... so no charge
... we consider this a first step
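The tagging workflow Florian describes (document in, thesaurus-matched terms and metadata out) might look roughly like this pure-Python toy. This is an illustration only, not the actual reegle API; the thesaurus entries and definitions are invented:

```python
# Hypothetical thesaurus entries with definitions and related terms,
# standing in for the renewable-energy thesauri the talk mentions.
THESAURUS = {
    "solar energy": {"definition": "energy from sunlight",
                     "related": ["photovoltaics"]},
    "wind power":   {"definition": "energy from wind",
                     "related": ["wind turbine"]},
}

def tag_document(text, thesaurus=THESAURUS):
    """Return {term: metadata} for every thesaurus term found in `text`."""
    lowered = text.lower()
    return {term: meta for term, meta in thesaurus.items() if term in lowered}

tags = tag_document("This report covers solar energy in rural Kenya.")
print(sorted(tags))  # ['solar energy']
```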
Next speaker: Dan Brickley, schema.org
<scribe> scribenick: BartvanLeeuwen
introducing schema.org
search engines decided to collaborate, even though they are competitors
putting triples on the web was only done by a limited number of people
schema.org is not SEO, it's semantic technology gone mainstream
schema.org is html with markup, to generate triples
creates rich snippets for enhanced search results
allows custom search engine
working on integration and discovery of datasets
we are driven by being able to answer questions, that's why datasets are interesting
[ shows examples ]
vocabulary is not designed by the schema.org people, it's community-driven
the team only integrates community efforts
this community is driven by a W3C community group
datasets are added based on work by, among others, the W3C GLD group
where do graphs stop and tables start ?
schema.org is about finding datasets, not making all data RDF
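"HTML with markup, to generate triples" can be illustrated with a toy extractor. The snippet and property values below are invented, and the regex is a deliberately simplified stand-in for a real microdata/RDFa parser:

```python
import re

# Flat schema.org microdata snippet (invented example).
SNIPPET = """
<div itemscope itemtype="http://schema.org/Dataset">
  <span itemprop="name">River levels 2013</span>
  <span itemprop="description">Hourly gauge readings</span>
</div>
"""

def extract_properties(html):
    """Return (property, value) pairs from simple <span itemprop=...> markup."""
    return re.findall(r'itemprop="([^"]+)"[^>]*>([^<]*)<', html)

triples = extract_properties(SNIPPET)
print(triples)
```

A search engine doing this at web scale is what turns marked-up pages into rich snippets and discoverable datasets.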
<PhilA> If I get a chance I'll talk about what we're doing at W3C on vocabs - the schema.org management model is instructive
Next speaker: Licensing Library and Authority Data Under CC0: The DNB Experience, Lars Svensson
today I talk about the change of business model
mixed business model right now: which dataset / which format / access type
3 methods for harvesting meta data: SRU, OAI and HTTP
SRU and OAI are free services after registration to keep customers up to date
HTTP is free, no registration or tracking
formats, RDF, CSV and MARC
MARC is a custom library format
<StevenPemberton> http://en.wikipedia.org/wiki/MARC_standards
by the end of this quarter also as PDF (nice PDF)
Authority data is available under CC0 and metadata as well
metadata about books is mostly CC0, RDF / CSV always CC0
MARC data of the last 2 years is available under a fee; older data under CC0
process of transition
discussion about loss of revenue and spillover
scared of people running away with our data and earning lots of money with it.
no known examples of this actually happening
lessons learned: Finding right licenses can be tricky
CC-BY license was too complicated
make it easy for people to use your data !
we ended up with CC0
Next speaker: Open Government Data Projects in Japan, Shuichi Tashiro, IPA
<PhilA> IPA is the government IT standards body.
brief intro of open data in Japan
October 2008: CIO Forum started discussion about open data in Japan
in 2001 there was already an e-gov portal site
multiple administrative procedures available in one portal, but with non standard UI
public private collaboration with a realtime datalink
between train company and seismic center
in 2011 all trains stopped 10 seconds before earthquake wave reached main land
October 2010: an Open Gov Lab was started with various items
before 2012: open gov strategy
a lot of the data put out had low quality
Earthquake of 2011 was a turning point
demand for data:
-- who is where
-- who needs what / who can provide what
-- availability of general infrastructure
-- pollution level
Multi tier collaboration is needed
[ shows examples ]
Recovery and reconstruction support program database, one stop shop for support programs
this resulted in a vocabulary problem: all local govs used their own terminology
Japan has problem with its character code
very local, and very specific to identity
[ shows architectural diagram ]
among other things, it contains a vocabulary database to solve problems with local terminology
also an open-license character database, with RDF support to find relationships between characters, and legislation about character sets?
Panel: Previous speakers
hadley: calls out to panel: if you had a 10 billion budget, where would you be in 10 years?
Danbri: we should finally start integrating
Lars: digitizing
solve rights issues; who owns rights on publications is somewhat hard
Florian: governments need to give us the data we need to answer the important questions. Smart grids are important
<daveL> in relation to IPA presentation on linked data in Japan, there is a new W3C community that aims to develop best practice in multilingual linked open data: http://www.w3.org/community/bpmlod/
Fiona: money is not an issue, nor technology; standardization is the current issue
Questions from room
ivan, remark: we should generalize a bit: not just government but also science data
they produce data, we should be able to access that as well.
Dave Lewis: remark, focus on LOD is in countries where the main language is English; there is a multilingual community group [ please insert uri ]
<daveL> http://www.w3.org/community/bpmlod/
Hans Overbeek: questions about standards, what is the best advice? should we use schema.org or DCAT profile for integrating datasets
<PhilA> DCAT Application profile work through EC's ISA Programme. Details at http://joinup.ec.europa.eu/asset/dcat_application_profile/description
danbri: Schema.org is an agile approach, which could spin off into a standardization body
<ldodds> scribenick: ldodds
<scribe> Chair: PhilArcher
PhilArcher: I think product data is going to be very important
first speaker is John Walker, paper: http://www.w3.org/2013/04/odw/odw13_submission_40.pdf
John is talking about Open Data in the electronics industry
John: @NXPData is our twitter,
please share what we're doing
... NXP is a semiconductor company in the Netherlands, lots of large
customers and a large portfolio of products
... Content is the product, Product data is part of the content
=> data is the product
... Why make data Open?
... In bad old days, very doc centred information, lots of
content silos, content reuse was copy and paste or (worse)
re-keying
... Consequences: inconsistency, cost, errors, complex to
manage
... Consequences (for customers): confusing, inconsistent,
manual effort to gather/re-use information, difficult to find
new products
... Vision: unified content strategy. Create once, approve
once, re-use many times
... ISO 13584 -- data model for describing products
... DITA -- for natural language content
... variety of outputs, including flyers, data sheets, online,
etc
... data sheets are mixture of text and data
... parametric search interface for finding products
... dictionary of properties of products
... NXP want to be canonical source of information which is
re-published by others, e.g. distributors, aggregators,
etc
... want Web of Data to help collaboration
... http://data.nxp.com and
http://qa.data.nxp.com
... that is a work in progress
... give us feedback on what is there
... Open Data challenges
... how do we convince others to use (linked) open data?
... how do we justify business case (ROI) -- argue that it's
simpler
... what formats should we use? (rate of adoption of RDF/Linked
Data)
... How do we ensure quality, security, enable access
... How do we combine semi- and unstructured content in
publications?
... Are we giving away a key asset?
... How do we standardise in industry?
Next speakers are Andy Hedges & Richard McKeating, Tesco
Andy: big numbers about Tesco,
but what is interesting is how our data connects us
... some of that data is ours, some belongs to others:
customers, suppliers, manufacturers
... may not be able to share all of it: rights, commercial
sensitivity
... need to understand where information comes from, and how it
can be best combined for customer benefit
... offer customers best service/prices and compare data
... perhaps not intuitive to allow cross-supermarket price
comparison, but want to be a good brand and have a good contract with the
community
Richard: I'm passionate about how
to use open data, microformats, etc to help customers
... Tesco Open Data is about our customers
... Places where you can buy things, incl. online
... Products that we offer (across range of brands)
... Orders: what's in your basket, or buy on the web
... Journeys: those we make to fulfill orders, e.g. deliveries,
logistics
... Rewards: what do you get as a customer, Clubcard points,
offers, price promise
... at Tesco we are on journey towards exposing this data, to
allow app providers to access it
... would be great to link to open data sources
... really key area is how we share data with trading
partners
... important for brands to share product information, esp.
accurate data
... customer access to data, e.g. purchase history, allergen
information, etc
... Tesco sell NXP products :)
... but they have little information on it (opportunity for
sharing)
Andy: customers don't just shop
at tesco
... standards make our life easier
... expose price promise, cross retailer product comparison,
delivery choices
... can compare products using GTIN (same EAN)
... but suppliers might have different sizes, etc. Some
products can vary
... Keen to start dialog with standards makers
Next speaker is Mark Harrison, GS1
Slides: http://www.w3.org/2013/04/odw/GS1-LinkedDataPresentation-ODI-April2013.pdf
GS1 is a standards body, assigns GTINs
Mark: GS1 is global stds org; 1m companies working on standards, e.g. for barcodes, logistics, rfid, etc
<scribe> ... new initiative started in Feb, GS1 digital: putting identification into the web
Mark: B2C example: map
human-readable keywords ("milk") to Product category identifier
(GPC), search has user constraints (price, distance,
urgency)
... contextual filters for product category, e.g. organic,
skimmed milk
... refine search to find products and services that match
needs, e.g local store offering the product
... Achieve this using data linkages
... start with keyword which is mapped to category; category
has attributes/criteria
... imagine using contextual searching across all suppliers
across the web, not just facet search within single
website
... Schema.org and Good Relations are key vocabularies
... + GS1 : Global Product Classification (GPC)
... easy for them to open that, already available as XML, in
process of creating multi-lingual RDF
... GDSN: Global Data Synchronisation Network has additional
data about manufacturers, etc
... GPC as Linked Open Data
Slide shows simple example graph of GTIN and facets
Mark: please collaborate with us:
email gs1digital@gs1.org
... I'm working on project as researcher, others handling
industry engagement
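The keyword-to-category-to-facets linkage Mark describes can be sketched in a few lines. Everything here is invented for illustration (the GPC codes, GTINs and facet labels are not real GS1 data):

```python
# Hypothetical mapping of a human-readable keyword to a GPC category.
KEYWORD_TO_CATEGORY = {"milk": "gpc:10000025"}

# Invented product records keyed by GTIN, each carrying facet values.
PRODUCTS = [
    {"gtin": "05000000000017", "category": "gpc:10000025",
     "facets": {"organic", "skimmed"}},
    {"gtin": "05000000000024", "category": "gpc:10000025",
     "facets": {"whole"}},
    {"gtin": "05000000000031", "category": "gpc:99999999",
     "facets": {"organic"}},
]

def contextual_search(keyword, required_facets=frozenset()):
    """Map keyword -> category, then filter products by contextual facets."""
    category = KEYWORD_TO_CATEGORY.get(keyword)
    return [p["gtin"] for p in PRODUCTS
            if p["category"] == category
            and required_facets <= p["facets"]]

print(contextual_search("milk", {"organic"}))  # ['05000000000017']
```

The point of the talk is doing this across all suppliers on the web rather than inside one site's facet search.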
Next speaker is Philippe Plagnol
Product Open Data
Slides: http://www.w3.org/2013/04/odw/POD-2013.04.22_01.pdf
Philippe: Product data is
critical for open data movement
... everything around us is a product, has one GTIN code which
is unique identifier
... products used by everybody, every day
... lots of contextual information, e.g. product packaging,
nutrition
... products are fundamental for trade, economics
... objective is to create big repository of data about
products, based on barcode
... include ecological impacts, sources, support responsible
consumers
... lots of apps to support product barcode scanning, then can
use that to access data
... BUT: currently no public database containing this
data
... manufacturers have all this information in databases to
support printing, but don't share it
... largely question of access
... give us what is already printed on the packaging, using
GTIN as a key
... need to have a product schema for manufacturers to support
their publishing
... asking manufacturers for nutritional data
... working in France, hoping to get traction in other
countries
... incredible possibilities of using this product data
... GTIN is a new communication channel. More easily support
product annotation, product mapping, etc
... apps can support product reviews, consumer
recommendations/decision support
... only thing stopping us is an open catalog of the data
... imagine a "google maps for products"
Example of consumer buying decisions based on using a third-party app that uses ecological data
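Since the GTIN is the key to the whole repository, it helps to see how one is validated. This is the standard GS1/EAN-13 check-digit algorithm (digits in odd positions from the left weigh 1, even positions weigh 3):

```python
def gtin13_check_digit(first12):
    """Compute the 13th (check) digit for a 12-digit GTIN-13/EAN-13 prefix."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(first12))
    return (10 - total % 10) % 10

def is_valid_gtin13(gtin):
    """True if the 13-digit code's last digit matches its check digit."""
    return len(gtin) == 13 and gtin13_check_digit(gtin[:12]) == int(gtin[12])

print(gtin13_check_digit("400638133393"))  # 1
print(is_valid_gtin13("4006381333931"))    # True
```

An app that scans a barcode can validate the code this way before using it as a lookup key into an open product catalog.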
Now moving to discussion with all speakers
PhilArcher: (to Andy) does Philippe's talk scare/please you?
Andy: pleases us, we want to see
this data opened too, helps us be better retailer
... an awesome challenge
Richard: legislation is important too, "contains nuts" is a life/death decision
John: v. interesting. In the
semiconductor industry, people will take the data if it's
available
... want to make sure products are accurately described,
whether its strawberries or microchips
... help consumers find what they need
PhilArcher: Mark, what are members saying?
Mark: enthusiasm from many
members, great opportunity for all of us to be more responsible
consumers
... how do we spend our money, what choices do we make?
Jim King (Adobe): isn't there a large product db of RFID data?
Mark: not quite the same thing,
that is likely EPC data, movement of stock through supply
chain
... we're discussing the product master data
... opening EPC data is more commercially sensitive
TomHeath: this is a great session; what are the concrete steps towards openly licensed data?
Mark: different kinds of product
data; product categories/values can be openly licensed, largely
format shift
... data from manufacturers, requires discussions with them, v.
early stage
... they need confidence in benefits for themselves and others
and licensing discussions
... lots of good will, enthusiasm
Richard: Tesco are translating
desire to action by working with GS1 and brands
... make it easy and not disadvantage suppliers
ZachBeauvais: heard lots of positive noises, but what are the business cases on building on available open data? Any concrete examples?
Richard: one bus. case is
legislative changes and ensuring that products are accurately
described
... opportunities within our enterprise, we have silos
... really only just starting to understand wider bus. case
Martin ? (IBM): are competitors trying to catch up?
Richard: plenty of retailers working in this space, but no co-ordination yet
John: companies are already
scraping and reselling data outside of our control
... if we can clearly license it, then can make it easier
Closing statements
Philippe: originally saw GS1 as
"enemy", but can now see they're embracing open data
... GTIN code is often hidden by lots of ecommerce web sites,
needs to be published and clearly available
... come and download a dump of our data to see how it is
constructed
... I'm following GS1 stds work, give me feedback
Mark: huge opportunity to make a difference, make world better place
John: in B2B industry, focus needs to be on streamlining business integration
Richard: keep focus on benefits for customer
Andy: echo that, need to look at customer needs.
That's the end of the session!
<naomi_> Chair: Hideaki Takeda
<naomi_> scribenick: naomi
<naomi_> Next speaker: Ministry of the Interior of the Netherlands, Geonovum, Hans Overbeek
<naomi_> Hans: Concept URI Strategy for the NL Public Sector
<naomi_> ... Thijs gives you geographic information
<PhilA> scribe: PhilA
<scribe> scribeNick: PhilA
paper http://www.w3.org/2013/04/odw/odw13_submission_14.pdf
Hans: Talking about Designing URI
Sets for Public Sector, ISA programme study on URI Persistence
etc
... Why do we need a URI strategy - it's about trust,
provenance
... hard to do in the LOD Cloud
... We want our URIs to be recognisable and trustworthy
... We have kept registers - buildings, railways etc. for
hundreds of years, mostly to define identifiers
... join points for linked data
... we develop a model and a vocabulary for it
... a register is a list of things that you want to
reference
... then there's all the sensor data etc. What we think of as
the big data
... re-use of things like reference objects is what we want to
re-use when we write our URI strategy
... we struggled a little as we have to mint URIs, but we have
a lot of identifiers that we can re-use
... but we don't have a register for everything
... there was no register for all our municipalities
... so we had to mint URIs for them... which means making a
register
... you can only have URIs if you have a register
... No register? No identifier
... so we were convinced that we needed a URI strategy
... the pattern that we used was, not surprisingly the one
developed in the UK/backed by ISA
... The domain should identify the register in a persistent
way, {register}.data.gov.nl
... The UK pattern has a {sector} in the pattern which sounds
nice but it's hard to find someone to govern the sectors. Some
will overlap etc.
... so we thought we might not need {sector} and left it
out
... with no strategy, you can use any URI, but it's less
recognisable and less trustworthy
... That means we end up having to have a register of
registers
... What infrastructure is needed?
... which apps use the resolvers and how frequently
... There's more in the presentation and the paper
... are we heading the right way?
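The pattern Hans describes can be sketched as a tiny URI-minting function. The register names below are invented, and the /id/{concept}/{reference} path layout is an assumption borrowed from the UK pattern, not the final Dutch strategy:

```python
# Hypothetical register of registers: "No register? No identifier."
REGISTER_OF_REGISTERS = {"bag", "brk", "gemeenten"}

def mint_uri(register, concept, reference):
    """Mint a URI following {register}.data.gov.nl, only for known registers."""
    if register not in REGISTER_OF_REGISTERS:
        raise ValueError(f"unknown register '{register}': no register, no identifier")
    return f"http://{register}.data.gov.nl/id/{concept}/{reference}"

print(mint_uri("gemeenten", "gemeente", "0363"))
# http://gemeenten.data.gov.nl/id/gemeente/0363
```

Making the domain identify the register is what keeps the URIs recognisable and trustworthy; the register-of-registers check enforces the "no register, no identifier" rule from the talk.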
paper http://www.w3.org/2013/04/odw/odw13_submission_17.pdf
slides http://www.w3.org/2013/04/odw/odw13_slides_17.pdf
Richard: Want to talk about a pragmatic approach to getting URIs into the cultural heritage sector
<StevenPemberton> i/Chair: Hideaki Takeda/scribenick: naomi__
Richard: gives brief history of
museum identifiers
... work was done on vocabularies, controlled vocabularies
<naomi_> scribenick: PhilA
<JeniT> is this the first RDF/XML of the workshop?
Richard: shows some examples of
collections described in RDF
... There are good discussions in progress across the
sectors...
... but although there is more RDF coming out, when you look in
detail, a lot of values are given as strings
... Modes is software used in most UK museums. Not free, but you
become a shareholder
... Modes includes standard term lists etc., that become
standards across users
... now starting to use Web to get the terms
... Modes includes a live search of geonames as source of URLs
for geographic places. Conversion happens in the software
... Can we use SPARQL endpoints as a term list? Yes...
... Curators won't do any LD publishing themselves. All done in
Modes
... uses XSLT to transform the original XML data; handles
the conneg etc.
... Shows work that gave a URI to every word Shakespeare ever
wrote
... Adlib and CALM also looking at generating/using linked
data
... gives example of eating their own dog food
paper http://www.w3.org/2013/04/odw/odw13_submission_28.pdf
pd: on Media Fragments
... lists issues faced
scribe note - slides are expressive/detailed
pd: points to earlier work that
is all media specific
... No harmonised definition of the fragments
... Wanted to decouple fragment from media
... geospatial and tree paths not part of any previous work
AFAIK
... project done mostly using HTML/JSON developers
... but we have a SPARQL endpoint as well
JSON on the screen a few minutes after RDF/XML...
Sorry folks, no time for questions on this session
pd: gives a quick demo
paper http://www.w3.org/2013/04/odw/odw13_submission_34.pdf
slides http://www.slideshare.net/cgueret/digital-archiving-30-odw
scribe note - slides are expressive
cgueret: we need to treat the
data and metadata differently
... we find LD the best format for this
... Many formats for data itself
... rather than force people to transform their data, they
should just get the data in the repository - it's up to the
latter to sort out formats
... Forget about URIs as data
PhilA: Grrrr
cgueret: we have new formats every 5 years. Use conneg to handle format evolution
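cgueret's "use conneg" point can be sketched as a tiny Accept-header negotiator: the URI stays stable while new representations are added over time. This is a simplified toy, not a fully RFC-compliant implementation, and the supported-formats list is illustrative:

```python
# Formats the archive serves today; a new one can be appended in five
# years without changing any URI.
SUPPORTED = ["text/turtle", "application/rdf+xml", "application/json"]

def negotiate(accept_header, supported=SUPPORTED):
    """Return the best supported media type for an Accept header, or None."""
    prefs = []
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        mtype = fields[0].strip()
        q = 1.0  # default quality per HTTP
        for f in fields[1:]:
            if f.strip().startswith("q="):
                q = float(f.strip()[2:])
        prefs.append((q, mtype))
    for q, mtype in sorted(prefs, reverse=True):
        if mtype in supported:
            return mtype
        if mtype == "*/*":
            return supported[0]
    return None

print(negotiate("application/json;q=0.5, text/turtle"))  # text/turtle
```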
<ivan> -> a related pointer: how to cite data for scholarly purposes, the Amsterdam Manifesto, http://www.force11.org/AmsterdamManifesto
<CaptSolo> scribenick: CaptSolo
<scribe> chair: danbri
danbri: we will be discovering discovery, as everything comes back to this topic
next speaker: Richard Wallis, OCLC
rjw: working for OCLC
... WorldCat (stats about number of libraries, books)
... integrating linked data, schema.org
... the other hat: chair of the Schema Bib Extend Community
Group
... need to publicize links to resources
... generic vocabs = generic "glue" that helps link
resources
<LarsG> Richard chairs the Schema Bib Extend Community Group http://www.w3.org/community/schemabibex/
rjw: you have to demonstrate the benefits -- use the data to drive the services
next speaker: Chris Metcalf, Socrata
[http://www.w3.org/2013/04/odw/agenda#al59 abstract, paper]
Chris: ~60 customers who want to
use open data
... how can we use schema.org, etc. to help solve the discovery
problem
... catalogs that need to speak to each other (cities.data.gov,
...)
... how can we encourage people and industry
note: if i miss things scribing, please add them :)
(technical pause)
next speaker: Steven Pemberton, CWI
[http://www.w3.org/2013/04/odw/agenda#al9 abstract, paper]
Steven: don't have slides, can
speak now :)
... involved with W3C from day 1
... name on number of standards, incl. RDFa
... point of research: make computers easier for people to
use
... small data is important
... e.g., website for this conference
... you look for data on airports, lodging, agenda, ... (and you
enter the same info again and again)
... if the info were in RDFa you could automatically add this
info to your calendar, find best flights, ...
... if your browser helped you here, people's lives would be
better
... and browsers would win by providing services that help
use this data
... win-win for all
... use RDFa
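Steven's use case can be sketched end to end: once event details are machine-readable (e.g. via RDFa on the conference site), a browser could add them to your calendar. The event dict below stands in for extracted RDFa data, and the iCalendar output is deliberately minimal:

```python
def to_vevent(event):
    """Emit a minimal iCalendar VEVENT for an extracted event record."""
    return "\r\n".join([
        "BEGIN:VEVENT",
        f"SUMMARY:{event['name']}",
        f"DTSTART:{event['start']}",
        f"LOCATION:{event['location']}",
        "END:VEVENT",
    ])

# Stand-in for properties a browser might extract from RDFa markup.
event = {"name": "Open Data on the Web",
         "start": "20130423T090000",
         "location": "Google Campus, London"}

print(to_vevent(event))
```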
next speaker: Pascal Romain and Elie Sloïm, Conseil général de la Gironde/Temesis
scribe: Pascal from local
council
... Elie from a company, W3C member
[http://www.w3.org/2013/04/odw/agenda#al38 abstract]
Elie: checklist of 72 good
practices (fr, en)
... every good practice has to be available online,
international, usable, realistic
... OPQuast - Open quality standards
... if you are open data producer, go and check the
guidelines
<markbirbeck> ;)
Pascal: open data checklist : a tool for LOD?
markbirbeck: thanks
next: Madi Solomon, Pearson
[http://www.w3.org/2013/04/odw/agenda#al48 abstract, paper, slides]
Madi: need to find new ways of
doing business
... provided a solution for publishing open linked data
(?)
... but never used the words "open" or "linked"
... termed it resource enrichment (?)
... textbook can be broken down into a large number of
assets
... each requires its metadata, ...
... concept extraction, keywords, faceted exploration
... built a rule to match with wikipedia, automated metadata
generation as quickly as possible
... astrophysics textbook -- had to do filtering, though, to
keep out sci-fi topics :)
... found out we can make taxonomies on-the-fly
... creates a baseline, curated taxonomy
... virtuous circle -- put it back into community, maintain,
update, ...
Dan: question time
... back to W3C aspects
... (question to the whole panel)
... if not making standards, what should we be doing instead
... ?
Steven: small data is
important.
... see benefits from including data on websites
Pascal?: standardisation is a good effort
rjw: not standards work, it is
nurturing communities that have emerged
... sharing experiences, ...
Madi: as new co-chair of the Digital Publishing IG - the question is: what can I do for you?
Chris: i love simple tools that
do powerful things
... w3c working on these things, but a lot of people don't know
about them
... need to reach out, inform
Elie: need to produce standards +
guidelines for implementing those standards
... have nice specs but not always simple to implement
questions from the audience
Hadley: back to the GeoCities era -
we made lists
... feel we are doing that now = making data catalogs, lists of
resources
... what we need to do to make it worthwhile to index the
metadata by search engines
... so average person can participate
(applause)
Chris: that's schema.org RDFa
stuff
... build it into catalogs, so data gets crawled
... not just list datasets, also make metadata schemas more
practical for people
... want to type in my ZIP-code and find what's relevant to
me
rjw: we (non-search-engines)
need to talk their (search engine) language
... need to put up resources in front of people
Steven: I marked my homepage with
RDFa
... involved me ending up on Google Maps with the location where I
live
danbri: getting more visitors? :)
Bob Schloss (IBM): by analogy to SEO - let's implement
scribe: fabulous implementation
-- go to cloud's website, enter a URL from where the dataset
can be fetched
... they'd "suck in" the dataset and give it back with enriched
metadata
... let's externalize it. if nobody starts, we won't have
it
Chris: we are doing cataloging
well
... re discovery by search engines
... data by itself not so useful to everyday people
... people need to *use* those datasets
... we need to allow the data to be more useful
Bernadette: as someone who has
spent years on vocabs for linked data
... need communication, mentoring
... info that simply and quickly explains what it is
about
... to those who make decisions
... people in this room should write books, make videos,
organize seminars with the stakeholders
... there's so many standards, implementations. can be
overwhelming
(miss this one re google glass, RDFa, ...)
Martin, CTIC: Spanish experience re economy, corruption
scribe: Spanish government
launched a technical standard for all public bodies
... all have to use these guidelines when exposing open
data
... have to use linked data
... URI scheme for catalogs, datasets, ...
now the last round of comments from the panel
danbri: 10 words of 30 syllables now
Steven: if you got information, it should be on the web + machine readable
?: as data producers think of objects and entities -- instead of datasets
rjw: you're only people in the domain who know the benefits -- demonstrate them to everyone !
Madi: middle place between where data is released and people access it. data-driven businesses
Chris: we as LOD advocates need
to reach out to ppl outside the LOD community
... many are tired of SemWeb because they don't see the
benefits
... schema.org,RDFa, simple tools
Elie: need more metadata for
search engines, end-users
... metadata quality -- checks needed
... make website for end-users
... websites
... need to work on quality
danbri: thanks to the panel
panel discussion finished
Scribe: Agis Papantoniou
Utilising Linked Social Media Data for Tracking Public Policy and Services, Deirdre Lee, Adegboyega Ojo and Mohammad Waqar,DERI/NUIG
Deirdre attempted to answer questions regarding concerns of citizens related to public policy and services, expressed voluntarily on social media sites: proposals for ring-road constructions and views on means-testing for medical cards, to name a few. According to her, the challenge is how policy-makers can distinguish relevant data from opinions, and to this end she presented a proposal for systematically tracking particular topics of interest across a range of social media sites. Deirdre also talked about the relation between Open Data and public policy and services. Open Data influences policy makers but also evaluates decisions taken: for example, where can we build a new school or a ring road, and was that decision correct? Data on environmental factors could help with the choice of where to build a road. Deirdre pointed out that there is a lot of research still to be done on supporting technology, and the case becomes even more complex when social media input comes into play. Such input can be used on a business level, to analyze and identify trends, market brands and predict sales. Among government bodies, the use of social media is limited and the analysis of its input is even lower. They do understand its use, potential and power, but they do not have clear guidelines on how to get the most out of it. So Deirdre proposed a process to utilize social media data in the policy process, involving steps like data extraction, data visualization as linked data, data analysis, and decision justification stemming from this analysis. Many challenges pop up out of this process that need to be addressed: many sources, many APIs, many formats, URI scheme considerations, and making social media less noisy, to name a few. Linked2Media is an FP7 project with the goal of harvesting social media data and making it useful for SMEs.
The Social Media Linked Data Space (SMLDS) platform is already implemented and it can be used by SMEs. Practical examples of the limitation of social media APIs were also presented (cf. the presentation slides) and it was discussed that SMLDS also has a model through which data were “semantified” via existing and reusable vocabularies like SIOC and schema.org. The next steps of the Linked2Media project involve further integration with more social media sources and the utilization of social media data to further identify, justify and evaluate public policy and services.
Bottom up Activities for linked open data, open government in Japan, Takumi Shimizu, Keio University/Open Knowledge Foundation Japan
There are two kinds of open data and open government activities in Japan. One is top-down, led by national government; the other is bottom-up and community-based. Takumi introduced both, with the aim of triggering the development of best practices for implementing community-based open data / open government activities closely harmonized with worldwide activities. Takumi also talked about the International Open Data Day in Japan, with over 300 participants, promoting bottom-up LOD activities. Among the key success factors of such initiatives, the collaboration of universities and researchers with government bodies seems to be the most important one. Such an interconnection took place within LODA, a project aiming to develop LOD and a data exchange platform, addressing museums, arts and sports. Takumi presented two use cases, from the cities of Yokohama and Sabae. In the first, the Yokohama Art Sport app was developed, a mashup combining info from local museums and events. The Sabae city use case had to do with government-related open data: XML datasets involving public transportation were published as RDF. Sabae is one of the most modern LOD initiatives, as the stakeholders worked tightly together, providing feedback and input for the development of the apps. After the use case presentations Takumi presented OKFN Japan and its activities in brief, like hackathons that took place in Japanese cities. Activities like Open Data Day produced quite some incentives: around 90% of the participants were satisfied with Open Data initiatives, and OKFN helps to share best practices by providing toolkits and tutorials related to open spending, aiming at their further localization.
“Storytelling” in the economic LOD: the case of publicspending.gr, Michalis Vafopoulos, NTUA
Michalis talked about publicspending.gr, an initiative that retrieves and processes Greek public spending decisions, semantifies them and visualizes them so that Greek citizens can learn about their country’s public expenditure. Transparency at its best! Michalis presented publicspending.gr’s search and advanced search functionality (for example, provide the VAT number of a natural or legal person and see the results – whether payer and/or payee, the amounts of the spending decisions, etc.) and the consolidated, interlinked business profiles (interconnected with other LOD such as DBpedia and WESO). The next topic was an economics-driven network analysis of the publicspending.gr dataset, focusing on the most important payers and payees but also revealing the evolution of the spending decisions along with their spending domains. Michalis also discussed how the publicspending.gr dataset is interconnected with other countries’ open datasets; the case of the Australian Government was presented, showing how both datasets can be queried to provide comparable results between the two countries’ spending decisions. Michalis concluded that economic LOD is the natural nucleus of the ecosystem and that LOD apps can really change the world.
Question 1: What tools did you use for the network analysis?
Answer 1: Gephi for visualization, processed with R and other mathematical tools, Virtuoso for the triple store
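The workflow described in the answer – spending triples held in a Virtuoso store, metrics computed with R and other tools, visualization in Gephi – can be sketched in miniature. The payer–payee records and amounts below are invented sample data, and plain Python stands in for the R tooling; the point is only to show the kind of weighted-network metric (here, total money paid out per entity) the analysis produces:

```python
# Sketch of a payer–payee network analysis in the spirit of the
# publicspending.gr workflow. The spending records are invented
# sample data; plain Python stands in for the R/Gephi tooling.
from collections import defaultdict

# (payer, payee, amount in EUR) — hypothetical spending decisions
records = [
    ("Ministry of Health", "Hospital A", 120_000),
    ("Ministry of Health", "Hospital B", 90_000),
    ("Ministry of Education", "University X", 200_000),
    ("Hospital A", "Supplier Y", 45_000),
]

# Weighted out-degree / in-degree of each node: total paid out and
# total received — i.e. the "most important payers and payees".
paid_out = defaultdict(int)
received = defaultdict(int)
for payer, payee, amount in records:
    paid_out[payer] += amount
    received[payee] += amount

top_payers = sorted(paid_out.items(), key=lambda kv: -kv[1])
for name, total in top_payers:
    print(f"{name}: {total} EUR paid out")
```

On the real dataset the same aggregation would run over SPARQL query results rather than an in-memory list, but the network metric is identical.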
Panel discussion
Christian Nolle introduces himself, noting that they are building a prototype tool to monitor corruption in the UK(?)
Q1: Uldis asked the presenters to each describe one lesson learnt
Christian: lack of collaboration with policy makers
DL: social media data are restricted and providers need to open up their data
MV: the critical mass for open data is lower than for the Web as a whole – so building on the Web’s evolution is good, since it will lead to more applications – this is mandatory
Takumi: engagement with the local communities proved to be a challenge, and they have to build cross-relationships with such local OD communities.
Q2: Bart van Leeuwen: did you investigate engaging with the public via social media – asking questions instead of only listening to posts?
DL: they are looking at this continuously, but it is sometimes hard to find participants to answer such questions – listening, however, leads to an understanding of trends and seems the optimal approach for now.
Q3: elections and corruption – what do you expect?
Christian: they sat down and built a tool, which works, but what about people who don’t have internet access? So an SMS extension of the app is in place.
Q4: is the panel concerned about demographic issues regarding social media?
Takumi: there is no clear relation between demographics and social media participation, but small groups usually provide feedback. The two Japanese cases produced a fair amount of data that can be related to demographics.
MV: researchers complain about the results of PSGR while journalists provide raw information – so different demographics use it in different ways. In Greece PSGR is accessed more and more, which opens up space for demographic analysis.
DL: the demographics are not a worry as long as we are aware of them
Christian: we must not forget other technologies, like radio, etc.
Q5: political parties are getting clever and are starting to extract information from social media; dashboards about opinions are already in place. There are rumors that the parties are using such data in a clever way – might we end up reading two views of the same sentiment-analysis data? One about who the commenters are, and one about how many comments they got on their posts?
Christian: in politics, the most popular wins…
DL: it cannot be stopped, but the combination of LD and social media can serve as a solution, analyzing the data to produce facts rather than rumors…
MV: if you can make objective information more attractive – e.g. we have all the signers of an expenditure and can relate them to geographic areas – then objective layers can be added to sentiment analysis.
Conclusions
MV: economic LOD is the nucleus of the ecosystem; we must also make clear to policy makers that LD is infrastructure
Takumi: relationships with local communities are important, and we need to organize more conferences and OD days
DL: events like ODW are good, but people from more interdisciplinary domains need to attend too – not only technical people
Additional notes below
<JeniT> ScribeNick: JeniT
<CaptSolo> Michalis speaking now re publicspending.gr
Michalis: visualising open
spending data
... linked open data enables these kinds of analysis
... this has prompted new projects on LOD
... network analysis is a useful tool suited to linked data
& semantic web
... particular interest in real-time open data
question: what tools did you use to generate network analysis graphs?
Michalis: visualisations were done by Gephi, processed by R & other mathematical tools
<bschloss> Coupling a data exploration, visualization, story assembly website such as IBM's Many Eyes 2.0 on top of LOD seems like something worth pursuing. See http://www-958.ibm.com/software/analytics/labs/manyeyes/#home
Michalis: all done with open source tools
Takumi from OKF Japan
<bschloss> Reminder to PhilA -- put RDA lightening talk slides on ODW Workshop website, please!
scribe: and from Keio University
Takumi: International Open Data
in Japan Feb 23rd 2013, 300 participants in 8 cities
... bottom-up activities from stakeholders are driving LOD in
Japan
... academic institutions try to engage with local government
& communities
... community members have same goals for LOD, which promotes
collaboration
... neutral intermediary coordinates activities & shares
best practices
... LODAC (LOD for ACademia) http://lod.ac/
... develops lots of datasets & builds dictionaries
... engages with local communities
... Yokohama city, one of the biggest cities in Japan, large
LOD community
... Sabae city has first local government in Japan publishing
LOD on its website
... LODAC & Yokohama community collaborate
... create mashup around museum & event information,
demonstrating value of combining datasets from different
communities
... private companies have question/answer datasets for
Yokohama city
<bschloss> I will e-mail you slides, for now, add links please to http://rd-alliance.org and to http://static.squarespace.com/static/50ad9169e4b00ca12a884beb/t/50b34139e4b033e6125eec16/1353924921864/rda-flyer.pdf (their 1-page flyer)
Takumi: community generated new
consortium, including government, citizen, academic
members
... this led to Yokohama city becoming big LOD community
... Sabae city first publisher of LOD on its website
... 2011 published XML datasets with CC licences
... 2012 published RDF
... ATR Creative (private company) used LOD for their own
product
<yoshiaki> Yokohama Art Spot: http://lod.ac/apps/yas/
Takumi: iPhone application
... Sabae city became most advanced open data city in Japan,
because of collaboration between stakeholders
... government publishes datasets, but other organisations
gather, aggregate, make available LOD
<yoshiaki> Sabae Burari, an application of POI and maps mush-up for local sightseeing spot: https://itunes.apple.com/jp/app/sabaeburari/id595859507?mt=8
Takumi: OKF Japan organises
bottom-up activities in Japan
... organised 300 participants in open data day, some doing
hackathons, some editing Wikipedia
<yoshiaki> International Open Data Day in Japan: http://odhd13.okfn.jp/
Takumi: over 90% participants
were satisfied
... biggest benefits around networking, sharing ideas, and
learning about open data & improving engineering
skills
... those involved from different sectors
... OKF Japan helps to share best practices with each area, by
providing toolkits & tutorials
... eg Where Does My Money Go? originally developed in UK,
localised for Japanese usage
... used in Yokohama city, with tutorials for other
cities
... Conclusion:
... bottom-up activities have driven engagement
... collaboration by academia is key
... neutral intermediary (OKF Japan) coordinates & helps
share activities
Deirdre from DERI
Deirdre: relationship between
open data & public policy
... can be used to influence public policy & services, to
lobby, influence policy makers
... eg use statistics to guide new schools
... also used to justify policy decisions
... and to evaluate policies
... eg environmental measurements to see whether regulations
have had an effect
... still a lot of research to do about how this all
works
... what about combining open data with social media
data?
... could give evidence-based policy evaluation
... social media data is already being used for business
intelligence, trend analysis, opinions on brand etc
... lots of activity from industry
... government is coming around to this, but using them in
limited ways
... social media used for dissemination & limited
engagement, but not to full potential
... not being used to get information from social media
... government is only a publisher, not a consumer of social
media
... government should be harnessing information from social
media
... proposed to do this using linked data
... extract data from social media, express as linked data,
analysis on it
... challenges:
... wide variety of sources, each with its own API
... wide variety of formats
... privacy concerns
... can be noisy, difficult to process
... this is all based on solid research funded under EU FP7
<CaptSolo> www.linked2media.eu
Deirdre: Linked2media to provide
SMEs with tooling
... DERI developed Social Media Linked Data Space
... now trying to apply this (designed for SMEs) to
government
... has a triplestore, crawlers, integrating 25 different sites
including review sites
... there are restrictions on different social media APIs which
limit which data you can access
... once we have data, we model in common linked data format,
reusing existing vocabularies
... using SIOC for review data
<ldodds__> PhilA: can I add a barcamp discussion proposal?
Deirdre: schema.org, rev, Marl etc
<ldodds__> PhilA: How should we attribute open datasets?
Deirdre: next steps are to look
into integration of social media data with other linked
data
... also using data to influence, justify & evaluate public
policy
... not just technical aspects to this research, also social,
political etc
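The pipeline Deirdre describes – crawl posts from heterogeneous social-media APIs, then model them in a common linked-data format with vocabularies like SIOC – can be sketched as follows. The post URI, account and content below are invented sample data, and triples are kept in a plain list and serialized to Turtle by hand; a real implementation such as SMLDS would use an RDF library and a triplestore:

```python
# Sketch: expressing one crawled social-media post as linked data using
# the SIOC vocabulary. Post, account and content are invented samples;
# Turtle serialization is done by hand for illustration only.
SIOC = "http://rdfs.org/sioc/ns#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

post = "http://example.org/posts/42"       # hypothetical post URI
author = "http://example.org/users/alice"  # hypothetical account URI

triples = [
    (post, RDF_TYPE, SIOC + "Post"),
    (post, SIOC + "has_creator", author),
    (post, SIOC + "content", '"The new bus timetable is great!"'),
    (post, DCT + "created", '"2013-04-24"'),
    (author, RDF_TYPE, SIOC + "UserAccount"),
]

def term(t: str) -> str:
    """Render a term: literals pass through, everything else as an IRI."""
    return t if t.startswith('"') else f"<{t}>"

turtle = "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)
print(turtle)
```

Once posts from different sites share this common model, the API-specific differences Deirdre mentions disappear downstream, and the analysis can run over a single graph.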
Christian: run a small web design
company in London
... running tool to monitor corruption
Uldis: if you had one lesson learnt, what would it be?
Christian: we've had a lack of collaboration, despite open source tools
Deirdre: use of social media data is limited by the restrictions that they place on it
Michalis: big lesson is that
critical mass for open data is lower than the web itself
... you build on the existing web
... make an application work, and everything else will fall
into place
Takumi: engage with local
community and local government
... local community has a diversity of needs
... need cross-relationships to tackle the real problems
Bart: for Deirdre: did you investigate engaging with the public rather than just reading twitter? asking specific questions rather than just listening?
Deirdre: that's something we are looking at
... a lot of citizen engagement platforms ask on specific
topics
... but it's hard to find participants that care enough to give
that feedback
... when you just listen you can see the trends
... see what they *do* care about: maybe they just care about
environment, not transport
... these are different approaches for different goals
Uldis: what do you expect to get from crowd-sourcing?
Christian: we kept hearing
stories about corruption, but we didn't write them down or map
them
... wanted to build something
... doesn't work for people who don't have internet access
<cerealtom> yvesr: surely its time for a game of crack attack!
Christian: they do have mobile
phones, we have SMS number
... we want to tell people they have something wrong in their
country
Kal: are you concerned about the demographic about people who contribute open data & participate on social media, and how that's different from demographic of general population?
Takumi: in Japan, we don't have
much difference
... tends to be young people, lots of men, but otherwise not so
much difference
<CaptSolo> Takumi's coauthor is speaking
Yoshiaki Fukami: in both Yokohama & Sabae, there are lots of knowledge workers
scribe: lots of data
providers
... many students took part in open data day, to compose
articles on Wikipedia
Michalis: researchers &
journalists are different
... because of how they're funded
... there are both types of users in each demographic
... we're seeing a spread in access around Greece, among users
who just want to find relevant information
Deirdre: I'm not worried about
the demographic as long as we're aware of it
... we're not claiming that it's representative
... you can build in SMSs or having real workshops
... if you need something that's representative, build in other
demographics
Christian: new technology goes hand-in-hand with old technology, don't forget radio
BobSchloss: political parties are
getting clever at extracting features from social media
... some politicians have dashboards in which the weight of
each statement is modified by Facebook friends or twitter
followers
... the rumour is that they can identify whether people are
influential in their communities
... will we see comments being weighted?
Christian: it's like A/B testing politics: it's no longer politics, just the most popular person wins
Deirdre: is that something that
should or can be stopped? probably not
... if we just have opinions then it's biased &
subjective
... if we just have open data, it's not tied into human
aspect
... we need to combine the two to get the balance
Michalis: if you can make
objective information more attractive
... can you relate election area to spending, for example
... tagging the spatial location of the payment
... you can find objective information in a subjective way
Uldis: Concluding comments?
Michalis: we believe economic LOD
should be nucleus of LOD
... need money to go around
... need to say clearly to policy makers which data is data
infrastructure
... eg in economics, all public spending, prices
... theory and application together
Takumi: relationship between local community & other communities very important
Deirdre: as a community, it's
great to see our progress, but I'd love to see more
interdisciplinary talks & sessions at these events
... bringing real use cases to complement the technical skills
we bring
Christian: we mustn't forget that there are parts of the world where things aren't moving at this speed
<scribe> ScribeNick: AndreaP
Bob: Huge amount of data from
scientists in the next years
... How such data will be accessible?
... RDA wants to operate like the IETF
... to accelerate and facilitate research data exchange.
... A number of WGs are being organised
... e.g., on Persistent Identifiers, Metadata
... They want to look at existing standards.
... Provenance and quality are other key issues for RDA.
... They want metadata to be searchable, in a
cross-disciplinary way.
... Another issue: how to handle big datasets.
... Again: datasets for peer review.
... Real work starts in September. You are all encouraged to
join.
... About how to join: http://rd-alliance.org/
Uldis: National Library of Latvia
opening up data.
... [interrupted]
... [technical issues]
Edoardo: The project is
multidisciplinary.
... computer scientists, engineers, ...
... Goal to set up a publishing protocol for open data that can
be used by PAs.
... General goal is to set up an eGov system, enabling PAs to
publish data and citizens to discover them.
... Why? To have facts, not opinions - open data are facts.
<bschloss> See rd-alliance.org , consider coming to their plenary in Washington DC in September, think what W3C standards and Open Data best practices (such as DCAT) can be extended for their needs.
Edoardo: Presenting the system
"search computing architecture".
... Use case: money given to hospitals.
... The high level query is translated into low level
ones.
... Result is presented to the user.
Uldis: I'm back.
... Interested in the area of open data.
... We need to make it easier to work with the data, to make
them more re-usable.
... Work with data frictionless from the start.
... We should be able to use it for building stories.
... Presenting workflow on data-driven journalism
process.
... The idea is to have a set of tools able to cover the whole
process.
... Data journalism is just one of the use cases.
... Stressing the need to get stories from data.
... Most important part is data discovery and publishing.
... Journalist must have information useful to assess the
quality of the data they are going to use, first of all
information on data provenance.
Timm: Motivation is that
ambiguity is an issue in Natural Language Processing.
... Ambiguity may result in incorrect translations.
... Disambiguation is carried out based on dictionaries.
... They do not like that approach.
... Rather: use what is in the LOD cloud.
... Statistics for LOD is key for NLP.
... Number of issues.
... Can LOD really model natural language?
... How can one simply access LOD datasets? Some of the relevant
ones are not easily accessible.
David: Would like to talk on a
number of issues: content management, NLP technologies,
localisation.
... Presenting localisation's value chain.
... A lot of work is outsourced.
... Re-use is also a big market.
... Statistical machine translation is also used.
... The value chain is quite long.
... Support for interoperability is there (XML-based), but
interoperability is expensive.
... W3C ITS IG (http://www.w3.org/International/its/ig/)
is trying to address some of these issues.
... The idea is to address all the localisation process
workflow.
... This is done by using existing formats.
... Interest in using Linked Data to disambiguate terms and to
introduce confidence.
... Also: how can we use Linked Open Data in the process?
... Provenance ontology (http://www.w3.org/TR/prov-o/)
very relevant here.
... RDF used also for process monitoring.
... Another opportunity is to use multilingual LOD datasets to
train machine translation.
Eric: Project on reference implementation for LOD supporting data quality and provenance
Bart: How can we actually say that OD is successful from a business perspective?
James: Which are the barriers to using OD?
<CaptSolo> anyone ready to scribe? (as AndreaP is leaving soon)
Leigh: Attribution and OD - can we have best practices on this?
Bernadette: Best practices for Persistent URIs?
Christopher: Want to show what I did by aggregating data from different Universities
<CaptSolo> Wolfgang Orthuber
<CaptSolo> numeric feature spaces
<CaptSolo> JeniT, Omar: linked CSV
<CaptSolo> Mark Harrison: ?
<CaptSolo> AndreaP: can't manage to scribe everything, but i can add some detail
<CaptSolo> Michael Lutz: ?
<CaptSolo> ok, barcamp pitches finished
<CaptSolo> if you pitched barcamp ideas, add more detail here
Michael: What you would like to have as a contribution from the European Commission on open data? e.g., concerning legislation, regulation, reference data and services