See also: IRC log
<BartvanLeeuwen_> scribenick: BartvanLeeuwen_
<BartvanLeeuwen> scribenick: BartvanLeeuwen
DNA Digest by Fiona Nielsen
DNADigest is a non-profit org with the purpose of opening up genome sequences
DNA sequencing purposes: cancer research, heritable traits and illnesses, and rare diseases
<PhilA> Apologies to Fiona who is saying seriously interesting stuff but fighting technology problems. Facilities are on the case but so far without success
<StevenPemberton> scribenick: StevenPemberton
Fiona: researcher needs databases
of genomic variation
... more data is needed to validate results
... they want to learn from other data
... but are not sharing their own
... why not?
... - confidential data
... - easy to identify individuals from genome data
... - BULKY
... so data has to be de-identified and aggregated
<scribe> scribenick: BartvanLeeuwen
no open sharing of raw data
only defects for described diseases are shared
DNADigest's solution: not all research needs access to the full data
e.g. does this mutation occur at a higher frequency?
this result is already de-identified
challenges: connecting existing datasets, incentives for sharing and commercial model
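The frequency question above can be answered without sharing raw genomes. A minimal Python sketch of that idea follows; the cohort format and variant labels are invented for illustration, and only the aggregate number would leave the institution:

```python
# Toy sketch: answer "does this mutation occur at higher frequency?"
# while keeping raw, identifiable genome data in-house.

def mutation_frequency(cohort, variant):
    """Return the fraction of samples in `cohort` carrying `variant`."""
    carriers = sum(1 for sample in cohort if variant in sample["variants"])
    return carriers / len(cohort)

# Hypothetical local cohort; never shared directly.
cohort = [
    {"id": "s1", "variants": {"BRCA1:c.68_69delAG"}},
    {"id": "s2", "variants": set()},
    {"id": "s3", "variants": {"BRCA1:c.68_69delAG"}},
    {"id": "s4", "variants": set()},
]

# The aggregate result is already de-identified and safe to share.
freq = mutation_frequency(cohort, "BRCA1:c.68_69delAG")
print(freq)  # 0.5
```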
Next speaker: Florian Bauer http://www.reeep.org
reeep helps with renewable energy in developing countries
website after website is launched, "portal proliferation syndrome", that needs to be fixed
in current projects, data silos are created
<StevenPemberton> scribenick: StevenPemberton
Florian: the system has to deal with disambiguation
<BartvanLeeuwen> it seems very hard to stop this
<BartvanLeeuwen> but that's because project deliverables state information dissemination, not how
<BartvanLeeuwen> We need to break silos and link information to help them do it differently
<BartvanLeeuwen> Can we support the linking of information with an automated system?
<BartvanLeeuwen> we have to!
Florian: and use linked open
data
... we supply thesauri
... our system helps tag unstructured documents
... a file is sent, in whatever format, we analyse it, extract
common terms
... based on thesaurus and vocab
... and deliver lots of metadata
... with definitions etc.
... and we can store the location of the document with tags (if
the user wants)
... [gives an example]
... please look at http://api.reegle.info
... we are a non-profit
... so no charge
... we consider this a first step
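The tagging workflow Florian describes (document in, thesaurus-matched terms and metadata out) might look roughly like this pure-Python toy. This is an illustration only, not the actual reegle API; the thesaurus entries and definitions are invented:

```python
# Hypothetical thesaurus entries with definitions and related terms,
# standing in for the renewable-energy thesauri the talk mentions.
THESAURUS = {
    "solar energy": {"definition": "energy from sunlight",
                     "related": ["photovoltaics"]},
    "wind power":   {"definition": "energy from wind",
                     "related": ["wind turbine"]},
}

def tag_document(text, thesaurus=THESAURUS):
    """Return {term: metadata} for every thesaurus term found in `text`."""
    lowered = text.lower()
    return {term: meta for term, meta in thesaurus.items() if term in lowered}

tags = tag_document("This report covers solar energy in rural Kenya.")
print(sorted(tags))  # ['solar energy']
```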
Next speaker: Dan Brickley, schema.org
<scribe> scribenick: BartvanLeeuwen
introducing schema.org
search engines decided to collaborate, even though they are competitors
putting triples on the web was only done by a limited number of people
schema.org is not SEO, it's semantic technology gone mainstream
schema.org is html with markup, to generate triples
creates rich snippets for enhanced search results
allows custom search engine
working on integration and discovery of datasets
we are driven by being able to answer questions, that's why datasets are interesting
[ shows examples ]
vocabulary is not designed by the schema.org people, it's community-driven
the team only integrates community efforts
this community is driven by a W3C community group
datasets are added based on work by, among others, the W3C GLD group
where do graphs stop and tables start ?
schema.org is about finding datasets, not making all data RDF
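"HTML with markup, to generate triples" can be illustrated with a toy extractor. The snippet and property values below are invented, and the regex is a deliberately simplified stand-in for a real microdata/RDFa parser:

```python
import re

# Flat schema.org microdata snippet (invented example).
SNIPPET = """
<div itemscope itemtype="http://schema.org/Dataset">
  <span itemprop="name">River levels 2013</span>
  <span itemprop="description">Hourly gauge readings</span>
</div>
"""

def extract_properties(html):
    """Return (property, value) pairs from simple <span itemprop=...> markup."""
    return re.findall(r'itemprop="([^"]+)"[^>]*>([^<]*)<', html)

triples = extract_properties(SNIPPET)
print(triples)
```

A search engine doing this at web scale is what turns marked-up pages into rich snippets and discoverable datasets.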
<PhilA> If I get a chance I'll talk about what we're doing at W3C on vocabs - the schema.org management model is instructive
Next speaker: Licensing Library and Authority Data Under CC0: The DNB Experience, Lars Svensson
today I talk about the change of business model
mixed business model right now: which dataset / which format / access type
3 methods for harvesting meta data: SRU, OAI and HTTP
SRU and OAI are free services after registration to keep customers up to date
HTTP is free, no registration or tracking
formats, RDF, CSV and MARC
MARC is a custom library format
<StevenPemberton> http://en.wikipedia.org/wiki/MARC_standards
by the end of this quarter also as PDF (nice PDF)
Authority data is available under CC0 and metadata as well
metadata about books is mostly CC0, RDF / CSV always CC0
MARC data of the last 2 years is available under a fee; older data under CC0
process of transition
discussion about loss of revenue and spillover
scared of people running away with our data and earning lots of money with it.
no known examples of this actually happening
lessons learned: Finding right licenses can be tricky
CC-BY license was too complicated
make it easy for people to use your data !
we ended up with CC0
Next speaker: Open Government Data Projects in Japan, Shuichi Tashiro, IPA
<PhilA> IPA is the government IT standards body.
brief intro of open data in Japan
October 2008: CIO Forum started discussion about open data in Japan
in 2001 there was already an e-gov portal site
multiple administrative procedures available in one portal, but with non standard UI
public private collaboration with a realtime datalink
between train company and seismic center
in 2011 all trains stopped 10 seconds before earthquake wave reached main land
October 2010: an Open Gov Lab was started with various items
before 2012: open gov strategy
a lot of the data put out had low quality
Earthquake of 2011 was a turning point
demand for data:
-- who is where
-- who needs what / who can provide what
-- availability of general infrastructure
-- pollution level
Multi tier collaboration is needed
[ shows examples ]
Recovery and reconstruction support program database, one stop shop for support programs
this resulted in a vocabulary problem: all local govs used their own terminology
Japan has problem with its character code
very local, and very specific to identity
[ shows architectural diagram ]
among other things, it contains a vocabulary database to solve problems with local terminology
also an open-license character database, with RDF support to find relationships between characters, and legislation about character sets?
Panel: Previous speakers
hadley: calls out to panel: if you had a 10 billion budget, where would you be in 10 years?
Danbri: we should finally start integrating
Lars: digitizing
solve rights issues; who owns rights on publications is somewhat hard
Florian: governments need to give us the data we need to answer the important questions. Smart grids are important
<daveL> in relation to IPA presentation on linked data in Japan, there is a new W3C community that aims to develop best practice in multilingual linked open data: http://www.w3.org/community/bpmlod/
Fiona: money is not an issue, nor technology; standardization is the current issue
Questions from room
ivan, remark: we should generalize a bit: not just government but also science data
they produce data, we should be able to access that as well.
Dave Lewis: remark, focus on LOD is in countries where the main language is English; there is a multilingual community group [ please insert uri ]
<daveL> http://www.w3.org/community/bpmlod/
Hans Overbeek: questions about standards, what is the best advice? should we use schema.org or DCAT profile for integrating datasets
<PhilA> DCAT Application profile work through EC's ISA Programme. Details at http://joinup.ec.europa.eu/asset/dcat_application_profile/description
danbri: Schema.org is an agile approach, which could spin off into a standardization body
<ldodds> scribenick: ldodds
<scribe> Chair: PhilArcher
PhilArcher: I think product data is going to be very important
first speaker is John Walker, paper: http://www.w3.org/2013/04/odw/odw13_submission_40.pdf
John is talking about Open Data in the electronics industry
John: @NXPData is our twitter,
please share what we're doing
... NXP is a semiconductor company in the Netherlands, lots of large
customers and a large portfolio of products
... Content is the product, Product data is part of the content
=> data is the product
... Why make data Open?
... In bad old days, very doc centred information, lots of
content silos, content reuse was copy and paste or (worse)
re-keying
... Consequences: inconsistency, cost, errors, complex to
manage
... Consequences (for customers): confusing, inconsistent,
manual effort to gather/re-use information, difficult to find
new products
... Vision: unified content strategy. Create once, approve
once, re-use many times
... ISO 13584 -- data model for describing products
... DITA -- for natural language content
... variety of outputs, including flyers, data sheets, online,
etc
... data sheets are mixture of text and data
... parametric search interface for finding products
... dictionary of properties of products
... NXP want to be canonical source of information which is
re-published by others, e.g. distributors, aggregators,
etc
... want Web of Data to help collaboration
... http://data.nxp.com and
http://qa.data.nxp.com
... that is a work in progress
... give us feedback on what is there
... Open Data challenges
... how do we convince others to use (linked) open data?
... how do we justify business case (ROI) -- argue that it's
simpler
... what formats should we use? (rate of adoption of RDF/Linked
Data)
... How do we ensure quality, security, enable access
... How do we combine semi- and unstructured content in
publications?
... Are we giving away a key asset?
... How do we standardise in industry?
Next speakers are Andy Hedges & Richard McKeating, Tesco
Andy: big numbers about Tesco,
but what is interesting is how our data connects us
... some of that data is ours, some belongs to others:
customers, suppliers, manufacturers
... may not be able to share all of it: rights, commercial
sensitivity
... need to understand where information comes from, and how it
can be best combined for customer benefit
... offer customers best service/prices and compare data
... perhaps not intuitive to allow cross-supermarket price
comparison, but want to be a good brand and have a good contract with the
community
Richard: I'm passionate about how
to use open data, microformats, etc to help customers
... Tesco Open Data is about our customers
... Places where you can buy things, incl. online
... Products that we offer (across range of brands)
... Orders: what's in your basket, or buy on the web
... Journeys: those we make to fulfill orders, e.g. deliveries,
logistics
... Rewards: what do you get as a customer, Clubcard points,
offers, price promise
... at Tesco we are on journey towards exposing this data, to
allow app providers to access it
... would be great to link to open data sources
... really key area is how we share data with trading
partners
... important for brands to share product information, esp.
accurate data
... customer access to data, e.g. purchase history, allergen
information, etc
... Tesco sell NXP products :)
... but they have little information on it (opportunity for
sharing)
Andy: customers don't just shop
at tesco
... standards make our life easier
... expose price promise, cross retailer product comparison,
delivery choices
... can compare products using GTIN (same EAN)
... but suppliers might have different sizes, etc. Some
products can vary
... Keen to start dialog with standards makers
Next speaker is Mark Harrison, GS1
Slides: http://www.w3.org/2013/04/odw/GS1-LinkedDataPresentation-ODI-April2013.pdf
GS1 is a standards body, assigns GTINs
Mark: GS1 is global stds org; 1m companies working on standards, e.g. for barcodes, logistics, rfid, etc
<scribe> ... new initiative started in Feb, GS1 digital: putting identification into the web
Mark: B2C example: map
human-readable keywords ("milk") to Product category identifier
(GPC), search has user constraints (price, distance,
urgency)
... contextual filters for product category, e.g. organic,
skimmed milk
... refine search to find products and services that match
needs, e.g local store offering the product
... Achieve this using data linkages
... start with keyword which is mapped to category; category
has attributes/criteria
... imagine using contextual searching across all suppliers
across the web, not just facet search within single
website
... Schema.org and Good Relations are key vocabularies
... + GS1 : Global Product Classification (GPC)
... easy for them to open that, already available as XML, in
process of creating multi-lingual RDF
... GDSN: Global Data Synchronisation Network has additional
data about manufacturers, etc
... GPC as Linked Open Data
Slide shows simple example graph of GTIN and facets
Mark: please collaborate with us:
email gs1digital@gs1.org
... I'm working on project as researcher, others handling
industry engagement
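The keyword-to-category-to-facets linkage Mark describes can be sketched in a few lines. Everything here is invented for illustration (the GPC codes, GTINs and facet labels are not real GS1 data):

```python
# Hypothetical mapping of a human-readable keyword to a GPC category.
KEYWORD_TO_CATEGORY = {"milk": "gpc:10000025"}

# Invented product records keyed by GTIN, each carrying facet values.
PRODUCTS = [
    {"gtin": "05000000000017", "category": "gpc:10000025",
     "facets": {"organic", "skimmed"}},
    {"gtin": "05000000000024", "category": "gpc:10000025",
     "facets": {"whole"}},
    {"gtin": "05000000000031", "category": "gpc:99999999",
     "facets": {"organic"}},
]

def contextual_search(keyword, required_facets=frozenset()):
    """Map keyword -> category, then filter products by contextual facets."""
    category = KEYWORD_TO_CATEGORY.get(keyword)
    return [p["gtin"] for p in PRODUCTS
            if p["category"] == category
            and required_facets <= p["facets"]]

print(contextual_search("milk", {"organic"}))  # ['05000000000017']
```

The point of the talk is doing this across all suppliers on the web rather than inside one site's facet search.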
Next speaker is Philippe Plagnol
Product Open Data
Slides: http://www.w3.org/2013/04/odw/POD-2013.04.22_01.pdf
Philippe: Product data is
critical for open data movement
... everything around us is a product, has one GTIN code which
is unique identifier
... products used by everybody, every day
... lots of contextual information, e.g. product packaging,
nutrition
... products are fundamental for trade, economics
... objective is to create big repository of data about
products, based on barcode
... include ecological impacts, sources, support responsible
consumers
... lots of apps to support product barcode scanning, then can
use that to access data
... BUT: currently no public database containing this
data
... manufacturers have all this information in databases to
support printing, but don't share it
... largely question of access
... give us what is already printed on the packaging, using
GTIN as a key
... need to have a product schema for manufacturers to support
their publishing
... asking manufacturers for nutritional data
... working in France, hoping to get traction in other
countries
... incredible possibilities of using this product data
... GTIN is a new communication channel. More easily support
product annotation, product mapping, etc
... apps can support product reviews, consumer
recommendations/decision support
... only thing stopping us is an open catalog of the data
... imagine a "google maps for products"
Example of consumer buying decisions based on using a third-party app that uses ecological data
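Since the GTIN is the key to the whole repository, it helps to see how one is validated. This is the standard GS1/EAN-13 check-digit algorithm (digits in odd positions from the left weigh 1, even positions weigh 3):

```python
def gtin13_check_digit(first12):
    """Compute the 13th (check) digit for a 12-digit GTIN-13/EAN-13 prefix."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(first12))
    return (10 - total % 10) % 10

def is_valid_gtin13(gtin):
    """True if the 13-digit code's last digit matches its check digit."""
    return len(gtin) == 13 and gtin13_check_digit(gtin[:12]) == int(gtin[12])

print(gtin13_check_digit("400638133393"))  # 1
print(is_valid_gtin13("4006381333931"))    # True
```

An app that scans a barcode can validate the code this way before using it as a lookup key into an open product catalog.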
Now moving to discussion with all speakers
PhilArcher: (to Andy) does Philippe's talk scare/please you?
Andy: pleases us, we want to see
this data opened too, helps us be better retailer
... an awesome challenge
Richard: legislation is important too, "contains nuts" is a life/death decision
John: v. interesting. In the
semiconductor industry, people will take the data if it's
available
... want to make sure products are accurately described,
whether its strawberries or microchips
... help consumers find what they need
PhilArcher: Mark, what are members saying?
Mark: enthusiasm from many
members, great opportunity for all of us to be more responsible
consumers
... how do we spend our money, what choices do we make?
Jim King (Adobe): isn't there a large product db of RFID data?
Mark: not quite the same thing,
that is likely EPC data, movement of stock through supply
chain
... we're discussing the product master data
... opening EPC data is more commercially sensitive
TomHeath: this is a great session; what are the concrete steps towards openly licensed data?
Mark: different kinds of product
data; product categories/values can be openly licensed, largely
format shift
... data from manufacturers, requires discussions with them, v.
early stage
... they need confidence in benefits for themselves and others
and licensing discussions
... lots of good will, enthusiasm
Richard: Tesco are translating
desire to action by working with GS1 and brands
... make it easy and not disadvantage suppliers
ZachBeauvais: heard lots of positive noises, but what are the business cases on building on available open data? Any concrete examples?
Richard: one bus. case is
legislative changes and ensuring that products are accurately
described
... opportunities within our enterprise, we have silos
... really only just starting to understand wider bus. case
Martin ? (IBM): are competitors trying to catch up?
Richard: plenty of retailers working in this space, but no co-ordination yet
John: companies are already
scraping and reselling data outside of our control
... if we can clearly license it, then can make it easier
Closing statements
Philippe: originally saw GS1 as
"enemy", but can now see they're embracing open data
... GTIN code is often hidden by lots of ecommerce web sites,
needs to be published and clearly available
... come and download a dump of our data to see how it is
constructed
... I'm following GS1 stds work, give me feedback
Mark: huge opportunity to make a difference, make world better place
John: in B2B industry, focus needs to be on streamlining business integration
Richard: keep focus on benefits for customer
Andy: echo that, need to look at customer needs.
That's the end of the session!
<naomi_> Chair: Hideaki Takeda
<naomi_> scribenick: naomi
<naomi_> Next speaker: Ministry of the Interior of the Netherlands, Geonovum, Hans Overbeek
<naomi_> Hans: Concept URI Strategy for the NL Public Sector
<naomi_> ... Thijs gives you geographic information
<PhilA> scribe: PhilA
<scribe> scribeNick: PhilA
paper http://www.w3.org/2013/04/odw/odw13_submission_14.pdf
Hans: Talking about Designing URI
Sets for Public Sector, ISA programme study on URI Persistence
etc
... Why do we need a URI strategy - it's about trust,
provenance
... hard to do in the LOD Cloud
... We want our URIs to be recognisable and trustworthy
... We have kept registers - buildings, railways etc. for
hundreds of years, mostly to define identifiers
... join points for linked data
... we develop a model and a vocabulary for it
... a register is a list of things that you want to
reference
... then there's all the sensor data etc. What we think of as
the big data
... re-use of things like reference objects is what we want to
re-use when we write our URI strategy
... we struggled a little as we have to mint URIs, but we have
a lot of identifiers that we can re-use
... but we don't have a register for everything
... there was no register for all our municipalities
... so we had to mint URIs for them... which means making a
register
... you can only have URIs if you have a register
... No register? No identifier
... so we were convinced that we needed a URI strategy
... the pattern that we used was, not surprisingly the one
developed in the UK/backed by ISA
... The domain should identify the register in a persistent
way, {register}.data.gov.nl
... The UK pattern has a {sector} in the pattern which sounds
nice but it's hard to find someone to govern the sectors. Some
will overlap etc.
... so we thought we might not need {sector} and left it
out
... with no strategy, you can use any URI, but it's less
recognisable and less trustworthy
... That means we end up having to have a register of
registers
... What infrastructure is needed?
... which apps use the resolvers and how frequently
... There's more in the presentation and the paper
... are we heading the right way?
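The pattern Hans describes can be sketched as a tiny URI-minting function. The register names below are invented, and the /id/{concept}/{reference} path layout is an assumption borrowed from the UK pattern, not the final Dutch strategy:

```python
# Hypothetical register of registers: "No register? No identifier."
REGISTER_OF_REGISTERS = {"bag", "brk", "gemeenten"}

def mint_uri(register, concept, reference):
    """Mint a URI following {register}.data.gov.nl, only for known registers."""
    if register not in REGISTER_OF_REGISTERS:
        raise ValueError(f"unknown register '{register}': no register, no identifier")
    return f"http://{register}.data.gov.nl/id/{concept}/{reference}"

print(mint_uri("gemeenten", "gemeente", "0363"))
# http://gemeenten.data.gov.nl/id/gemeente/0363
```

Making the domain identify the register is what keeps the URIs recognisable and trustworthy; the register-of-registers check enforces the "no register, no identifier" rule from the talk.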
paper http://www.w3.org/2013/04/odw/odw13_submission_17.pdf
slides http://www.w3.org/2013/04/odw/odw13_slides_17.pdf
Richard: Want to talk about a pragmatic approach to getting URIs into the cultural heritage sector
<StevenPemberton> i/Chair: Hideaki Takeda/scribenick: naomi__
Richard: gives brief history of
museum identifiers
... work was done on vocabularies, controlled vocabularies
<naomi_> scribenick: PhilA
<JeniT> is this the first RDF/XML of the workshop?
Richard: shows some examples of
collections described in RDF
... There are good discussions in progress across the
sectors...
... but although there is more RDF coming out, when you look in
detail, a lot of values are given as strings
... Modes is software used in most UK museums. Not free, but you
become a shareholder
... Modes includes standard term lists etc., that become
standards across users
... now starting to use Web to get the terms
... Modes includes a live search of geonames as source of URLs
for geographic places. Conversion happens in the software
... Can we use SPARQL endpoints as a term list? Yes...
... Curators won't do any LD publishing themselves. All done in
Modes
... uses XSLT to transform the original XML data; handles
the conneg etc.
... Shows work that gave a URI to every word Shakespeare ever
wrote
... Adlib and CALM also looking at generating/using linked
data
... gives example of eating their own dog food
paper http://www.w3.org/2013/04/odw/odw13_submission_28.pdf
pd: on Media Fragments
... lists issues faced
scribe note - slides are expressive/detailed
pd: points to earlier work that
is all media specific
... No harmonised definition of the fragments
... Wanted to decouple fragment from media
... geospatial and tree paths not part of any previous work
AFAIK
... project done mostly using HTML/JSON developers
... but we have a SPARQL endpoint as well
JSON on the screen a few minutes after RDF/XML...
Sorry folks, no time for questions on this session
pd: gives a quick demo
paper http://www.w3.org/2013/04/odw/odw13_submission_34.pdf
slides http://www.slideshare.net/cgueret/digital-archiving-30-odw
scribe note - slides are expressive
cgueret: we need to treat the
data and metadata differently
... we find LD the best format for this
... Many formats for data itself
... rather than force people to transform their data, they
should just get the data in the repository - it's up to the
latter to sort out formats
... Forget about URIs as data
PhilA: Grrrr
cgueret: we have new formats every 5 years. Use conneg to handle format evolution
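cgueret's "use conneg" point can be sketched as a tiny Accept-header negotiator: the URI stays stable while new representations are added over time. This is a simplified toy, not a fully RFC-compliant implementation, and the supported-formats list is illustrative:

```python
# Formats the archive serves today; a new one can be appended in five
# years without changing any URI.
SUPPORTED = ["text/turtle", "application/rdf+xml", "application/json"]

def negotiate(accept_header, supported=SUPPORTED):
    """Return the best supported media type for an Accept header, or None."""
    prefs = []
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        mtype = fields[0].strip()
        q = 1.0  # default quality per HTTP
        for f in fields[1:]:
            if f.strip().startswith("q="):
                q = float(f.strip()[2:])
        prefs.append((q, mtype))
    for q, mtype in sorted(prefs, reverse=True):
        if mtype in supported:
            return mtype
        if mtype == "*/*":
            return supported[0]
    return None

print(negotiate("application/json;q=0.5, text/turtle"))  # text/turtle
```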
<ivan> -> a related pointer: how to cite data for scholarly purposes, the Amsterdam Manifesto, http://www.force11.org/AmsterdamManifesto
<CaptSolo> scribenick: CaptSolo
<scribe> chair: danbri
danbri: we will be discovering discovery, as everything comes back to this topic
next speaker: Richard Wallis, OCLC
rjw: working for OCLC
... WorldCat (stats about number of libraries, books)
... integrating linked data, schema.org
... the other hat: chair of the Schema Bib Extend Community
Group
... need to publicize links to resources
... generic vocabs = generic "glue" that helps link
resources
<LarsG> Richard chairs the Schema Bib Extend Community Group http://www.w3.org/community/schemabibex/
rjw: you have to demonstrate the benefits -- use the data to drive the services
next speaker: Chris Metcalf, Socrata
[http://www.w3.org/2013/04/odw/agenda#al59 abstract, paper]
Chris: ~60 customers who want to
use open data
... how can we use schema.org, etc. to help solve the discovery
problem
... catalogs that need to speak to each other (cities.data.gov,
...)
... how can we encourage people and industry
note: if i miss things scribing, please add them :)
(technical pause)
next speaker: Steven Pemberton, CWI
[http://www.w3.org/2013/04/odw/agenda#al9 abstract, paper]
Steven: don't have slides, can
speak now :)
... involved with W3C from day 1
... name on number of standards, incl. RDFa
... point of research: make computers easier for people to
use
... small data is important
... e.g., website for this conference
... you look for data on airports, lodging, agenda, ... (and you
enter the same info again and again)
... if the info were in RDFa you could automatically add this
info to your calendar, find best flights, ...
... if your browser helped you here, people's lives would be
better
... and browsers would win by providing services that help
use this data
... win-win for all
... use RDFa
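Steven's use case can be sketched end to end: once event details are machine-readable (e.g. via RDFa on the conference site), a browser could add them to your calendar. The event dict below stands in for extracted RDFa data, and the iCalendar output is deliberately minimal:

```python
def to_vevent(event):
    """Emit a minimal iCalendar VEVENT for an extracted event record."""
    return "\r\n".join([
        "BEGIN:VEVENT",
        f"SUMMARY:{event['name']}",
        f"DTSTART:{event['start']}",
        f"LOCATION:{event['location']}",
        "END:VEVENT",
    ])

# Stand-in for properties a browser might extract from RDFa markup.
event = {"name": "Open Data on the Web",
         "start": "20130423T090000",
         "location": "Google Campus, London"}

print(to_vevent(event))
```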
next speaker: Pascal Romain and Elie Sloïm, Conseil général de la Gironde/Temesis
scribe: Pascal from local
council
... Elie from a company, W3C member
[http://www.w3.org/2013/04/odw/agenda#al38 abstract]
Elie: checklist of 72 good
practices (fr, en)
... every good practice has to be available online,
international, usable, realistic
... OPQuast - Open quality standards
... if you are open data producer, go and check the
guidelines
<markbirbeck> ;)
Pascal: open data checklist : a tool for LOD?
markbirbeck: thanks
next: Madi Solomon, Pearson
[http://www.w3.org/2013/04/odw/agenda#al48 abstract, paper, slides]
Madi: need to find new ways of
doing business
... provided a solution for publishing open linked data
(?)
... but never used the words "open" or "linked"
... termed it resource enrichment (?)
... textbook can be broken down into a large number of
assets
... each requires its metadata, ...
... concept extraction, keywords, faceted exploration
... built a rule to match with wikipedia, automated metadata
generation as quickly as possible
... astrophysics textbook -- had to do filtering, though, to
keep out sci-fi topics :)
... found out we can make taxonomies on-the-fly
... creates a baseline, curated taxonomy
... virtuous circle -- put it back into community, maintain,
update, ...
Dan: question time
... back to W3C aspects
... (question to the whole panel)
... if not making standards, what should we be doing instead
... ?
Steven: small data is
important.
... see benefits from including data on websites
Pascal?: standardisation is a good effort
rjw: not standards work, it is
nurturing communities that have emerged
... sharing experiences, ...
Madi: as new co-chair of the Digital Publishing IG - the question is: what can I do for you?
Chris: i love simple tools that
do powerful things
... w3c working on these things, but a lot of people don't know
about them
... need to reach out, inform
Elie: need to produce standards +
guidelines for implementing those standards
... have nice specs but not always simple to implement
questions from the audience
Hadley: back to the GeoCities era -
we made lists
... feel we are doing that now = making data catalogs, lists of
resources
... what we need to do to make it worthwhile to index the
metadata by search engines
... so average person can participate
(applause)
Chris: that's schema.org RDFa
stuff
... build it into catalogs, so data gets crawled
... not just list datasets, also make metadata schemas more
practical for people
... want to type in my ZIP-code and find what's relevant to
me
rjw: we (non-search-engines)
need to talk their (search engine) language
... need to put up resources in front of people
Steven: I marked my homepage with
RDFa
... involved me ending up on Google Maps with the location where I
live
danbri: getting more visitors? :)
Bob Schloss (IBM): by analogy to SEO - let's implement
scribe: fabulous implementation
-- go to cloud's website, enter a URL from where the dataset
can be fetched
... they'd "suck in" the dataset and give it back with enriched
metadata
... let's externalize it. if nobody starts, we won't have
it
Chris: we are doing cataloging
well
... re discovery by search engines
... data by itself not so useful to everyday people
... people need to *use* those datasets
... we need to allow the data to be more useful
Bernadette: as someone who has
spent years on vocabs for linked data
... need communication, mentoring
... info that simply and quickly explains what it is
about
... to those who make decisions
... people in this room should write books, make videos,
organize seminars with the stakeholders
... there's so many standards, implementations. can be
overwhelming
(miss this one re google glass, RDFa, ...)
Martin, CTIC: Spanish experience re economy, corruption
scribe: Spanish government
launched a technical standard for all public bodies
... all have to use these guidelines when exposing open
data
... have to use linked data
... URI scheme for catalogs, datasets, ...
now the last round of comments from the panel
danbri: 10 words of 30 syllables now
Steven: if you got information, it should be on the web + machine readable
?: as data producers think of objects and entities -- instead of datasets
rjw: you're only people in the domain who know the benefits -- demonstrate them to everyone !
Madi: middle place between where data is released and people access it. data-driven businesses
Chris: we as LOD advocates need
to reach out to ppl outside the LOD community
... many are tired of SemWeb because they don't see the
benefits
... schema.org,RDFa, simple tools
Elie: need more metadata for
search engines, end-users
... metadata quality -- checks needed
... make website for end-users
... websites
... need to work on quality
danbri: thanks to the panel
panel discussion finished
Scribe: Agis Papantoniou
Utilising Linked Social Media Data for Tracking Public Policy and Services, Deirdre Lee, Adegboyega Ojo and Mohammad Waqar,DERI/NUIG
Deirdre attempted to answer questions regarding concerns of citizens related to public policy and services, expressed voluntarily on social media sites: proposals for ring-road constructions and views on means-testing for medical cards, to name a few. According to her, the challenge is how policy-makers can distinguish relevant data from opinions, and to this end she presented a proposal for systematically tracking particular topics of interest across a range of social media sites. Deirdre also talked about the relation between Open Data and public policy and services. Open Data influences policy makers but also evaluates decisions taken: for example, where can we build a new school or a ring road, and was that decision correct? Data on environmental factors could help with the choice of where to build a road. Deirdre pointed out that there is a lot of research still to be done on supporting technology, and the case becomes even more complex when social media input comes into play. Such input can be used on a business level, to analyze and identify trends, market brands and predict sales. Among government bodies, the use of social media is limited and the analysis of its input is even lower. They do understand its use, potential and power, but they do not have clear guidelines on how to get the most out of it. So Deirdre proposed a process to utilize social media data in the policy process, involving steps like data extraction, data visualization as linked data, data analysis, and decision justification stemming from this analysis. Many challenges pop up out of this process that need to be addressed: many sources, many APIs, many formats, URI scheme considerations, and making social media less noisy, to name a few. Linked2Media is an FP7 project with the goal of harvesting social media data and making it useful for SMEs.
The Social Media Linked Data Space (SMLDS) platform is already implemented and it can be used by SMEs. Practical examples of the limitation of social media APIs were also presented (cf. the presentation slides) and it was discussed that SMLDS also has a model through which data were “semantified” via existing and reusable vocabularies like SIOC and schema.org. The next steps of the Linked2Media project involve further integration with more social media sources and the utilization of social media data to further identify, justify and evaluate public policy and services.
Bottom up Activities for linked open data, open government in Japan, Takumi Shimizu, Keio University/Open Knowledge Foundation Japan
There are two kinds of open data and open government activities in Japan. One is top-down, led by national government; the other is bottom-up and community-based. Takumi introduced both, with the aim of triggering the development of best practices for implementing community-based open data / open government activities closely harmonized with worldwide activities. Takumi also talked about the International Open Data Day in Japan, with over 300 participants, promoting bottom-up LOD activities. Among the key success factors of such initiatives, the collaboration of universities and researchers with government bodies seems to be the most important one. Such an interconnection took place within LODA, a project aiming to develop LOD and a data exchange platform, addressing museums, arts and sports. Takumi presented two use cases, from the cities of Yokohama and Sabae. In the first, the Yokohama Art Sport app was developed, a mashup combining info from local museums and events. The Sabae city use case had to do with government-related open data: XML datasets involving public transportation were published as RDF. Sabae is one of the most modern LOD initiatives, as the stakeholders worked tightly together, providing feedback and input for the development of the apps. After the use case presentations Takumi presented OKFN Japan and its activities in brief, like hackathons that took place in Japanese cities. Activities like Open Data Day produced quite some incentives: around 90% of the participants were satisfied with Open Data initiatives, and OKFN helps to share best practices by providing toolkits and tutorials related to open spending, aiming at their further localization.
“Storytelling” in the economic LOD: the case of publicspending.gr, Michalis Vafopoulos, NTUA
Michalis talked about publicspending.gr, an initiative that retrieves and processes Greek public spending decisions, semantifies them and visualizes them so that Greek citizens can learn about their country’s public expenditure. Transparency at its best! Michalis presented publicspending.gr’s search and advanced search functionality (for example, provide the VAT number of a natural or legal person and see the results – whether payer and/or payee, the amounts of the spending decisions, etc.) and the consolidated, interlinked business profiles (interconnected with other LOD such as DBpedia and WESO). The next topic was an economics-driven network analysis of the publicspending.gr dataset, focusing on the most important payers and payees but also revealing the evolution of the spending decisions along with their spending domains. Michalis also discussed how the publicspending.gr dataset is interconnected with other countries’ open datasets; the case of the Australian Government was presented, showing how both datasets can be queried to provide comparable results between the two countries’ spending decisions. Michalis concluded that economic LOD is the natural nucleus of the ecosystem and that LOD apps can really change the world.
Question 1: What tools did you use for the network analysis?
Answer 1: Gephi for visualization, processed with R and other mathematical tools, Virtuoso for the triple store
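The workflow described in the answer – spending triples held in a Virtuoso store, metrics computed with R and other tools, visualization in Gephi – can be sketched in miniature. The payer–payee records and amounts below are invented sample data, and plain Python stands in for the R tooling; the point is only to show the kind of weighted-network metric (here, total money paid out per entity) the analysis produces:

```python
# Sketch of a payer–payee network analysis in the spirit of the
# publicspending.gr workflow. The spending records are invented
# sample data; plain Python stands in for the R/Gephi tooling.
from collections import defaultdict

# (payer, payee, amount in EUR) — hypothetical spending decisions
records = [
    ("Ministry of Health", "Hospital A", 120_000),
    ("Ministry of Health", "Hospital B", 90_000),
    ("Ministry of Education", "University X", 200_000),
    ("Hospital A", "Supplier Y", 45_000),
]

# Weighted out-degree / in-degree of each node: total paid out and
# total received — i.e. the "most important payers and payees".
paid_out = defaultdict(int)
received = defaultdict(int)
for payer, payee, amount in records:
    paid_out[payer] += amount
    received[payee] += amount

top_payers = sorted(paid_out.items(), key=lambda kv: -kv[1])
for name, total in top_payers:
    print(f"{name}: {total} EUR paid out")
```

On the real dataset the same aggregation would run over SPARQL query results rather than an in-memory list, but the network metric is identical.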
Panel discussion
Christian Nolle introduces himself, noting that they are building a prototype tool to monitor corruption in the UK(?)
Q1: Uldis asked the presenters to each describe one lesson learnt
Christian: lack of collaboration with policy makers
DL: social media data are restricted and providers need to open up their data
MV: the critical mass for open data is lower than for the Web as a whole – so building on the Web’s evolution is good, since it will lead to more applications – this is mandatory
Takumi: engagement with the local communities proved to be a challenge, and they have to build cross-relationships with such local OD communities.
Q2: Bart van Leeuwen: did you investigate engaging with the public via social media – asking questions instead of only listening to posts?
DL: they are looking at this continuously, but it is sometimes hard to find participants to answer such questions – listening, however, leads to an understanding of trends and seems the optimal approach for now.
Q3: elections and corruption – what do you expect?
Christian: they sat down and built a tool, which works, but what about people who don’t have internet access? So an SMS extension of the app is in place.
Q4: is the panel concerned about demographic issues regarding social media?
Takumi: there is no clear relation between demographics and social media participation, but small groups usually provide feedback. The two Japanese cases produced a fair amount of data that can be related to demographics.
MV: researchers complain about the results of PSGR while journalists provide raw information – so different demographics use it in different ways. In Greece PSGR is accessed more and more, which opens up space for demographic analysis.
DL: the demographics are not a worry as long as we are aware of them
Christian: we must not forget other technologies, like radio, etc.
Q5: political parties are getting clever and are starting to extract information from social media; dashboards about opinions are already in place. There are rumors that the parties are using such data in a clever way – might we end up reading two views of the same sentiment-analysis data? One about who the commenters are, and one about how many comments they got on their posts?
Christian: in politics, the most popular wins…
DL: it cannot be stopped, but the combination of LD and social media can serve as a solution, analyzing the data to produce facts rather than rumors…
MV: if you can make objective information more attractive – e.g. we have all the signers of an expenditure and can relate them to geographic areas – then objective layers can be added to sentiment analysis.
Conclusions
MV: economic LOD is the nucleus of the ecosystem; we must also make clear to policy makers that LD is infrastructure
Takumi: relationships with local communities are important, and we need to organize more conferences and OD days
DL: events like ODW are good, but people from more interdisciplinary domains need to attend too – not only technical people
Additional notes below
<JeniT> ScribeNick: JeniT
<CaptSolo> Michalis speaking now re publicspending.gr
Michalis: visualising open
spending data
... linked open data enables these kinds of analysis
... this has prompted new projects on LOD
... network analysis is a useful tool suited to linked data
& semantic web
... particular interest in real-time open data
question: what tools did you use to generate network analysis graphs?
Michalis: visualisations were done by Gephi, processed by R & other mathematical tools
<bschloss> Coupling a data exploration, visualization, story assembly website such as IBM's Many Eyes 2.0 on top of LOD seems like something worth pursuing. See http://www-958.ibm.com/software/analytics/labs/manyeyes/#home
Michalis: all done with open source tools
Takumi from OKF Japan
<bschloss> Reminder to PhilA -- put RDA lightening talk slides on ODW Workshop website, please!
scribe: and from Keio University
Takumi: International Open Data
in Japan Feb 23rd 2013, 300 participants in 8 cities
... bottom-up activities from stakeholders are driving LOD in
Japan
... academic institutions try to engage with local government
& communities
... community members have same goals for LOD, which promotes
collaboration
... neutral intermediary coordinates activities & shares
best practices
... LODAC (LOD for ACademia) http://lod.ac/
... develops lots of datasets & builds dictionaries
... engages with local communities
... Yokohama city, one of the biggest cities in Japan, large
LOD community
... Sabae city has first local government in Japan publishing
LOD on its website
... LODAC & Yokohama community collaborate
... create mashup around museum & event information,
demonstrating value of combining datasets from different
communities
... private companies have question/answer datasets for
Yokohama city
<bschloss> I will e-mail you slides, for now, add links please to http://rd-alliance.org and to http://static.squarespace.com/static/50ad9169e4b00ca12a884beb/t/50b34139e4b033e6125eec16/1353924921864/rda-flyer.pdf (their 1-page flyer)
Takumi: community generated new
consortium, including government, citizen, academic
members
... this led to Yokohama city becoming big LOD community
... Sabae city first publisher of LOD on its website
... 2011 published XML datasets with CC licences
... 2012 published RDF
... ATR Creative (private company) used LOD for their own
product
<yoshiaki> Yokohama Art Spot: http://lod.ac/apps/yas/
Takumi: iPhone application
... Sabae city became most advanced open data city in Japan,
because of collaboration between stakeholders
... government publishes datasets, but other organisations
gather, aggregate, make available LOD
<yoshiaki> Sabae Burari, an application of POI and maps mush-up for local sightseeing spot: https://itunes.apple.com/jp/app/sabaeburari/id595859507?mt=8
Takumi: OKF Japan organises
bottom-up activities in Japan
... organised 300 participants in open data day, some doing
hackathons, some editing Wikipedia
<yoshiaki> International Open Data Day in Japan: http://odhd13.okfn.jp/
Takumi: over 90% participants
were satisfied
... biggest benefits around networking, sharing ideas, and
learning about open data & improving engineering
skills
... those involved from different sectors
... OKF Japan helps to share best practices with each area, by
providing toolkits & tutorials
... eg Where Does My Money Go? originally developed in UK,
localised for Japanese usage
... used in Yokohama city, with tutorials for other
cities
... Conclusion:
... bottom-up activities have driven engagement
... collaboration by academia is key
... neutral intermediary (OKF Japan) coordinates & helps
share activities
Deirdre from DERI
Deirdre: relationship between
open data & public policy
... can be used to influence public policy & services, to
lobby, influence policy makers
... eg use statistics to guide new schools
... also used to justify policy decisions
... and to evaluate policies
... eg environmental measurements to see whether regulations
have had an effect
... still a lot of research to do about how this all
works
... what about combining open data with social media
data?
... could give evidence-based policy evaluation
... social media data is already being used for business
intelligence, trend analysis, opinions on brand etc
... lots of activity from industry
... government is coming around to this, but using them in
limited ways
... social media used for dissemination & limited
engagement, but not to full potential
... not being used to get information from social media
... government is only a publisher, not a consumer of social
media
... government should be harnessing information from social
media
... proposed to do this using linked data
... extract data from social media, express as linked data,
analysis on it
... challenges:
... wide variety of sources, each with its own API
... wide variety of formats
... privacy concerns
... can be noisy, difficult to process
... this is all based on solid research funded under EU FP7
<CaptSolo> www.linked2media.eu
Deirdre: Linked2media to provide
SMEs with tooling
... DERI developed Social Media Linked Data Space
... now trying to apply this (designed for SMEs) to
government
... has a triplestore, crawlers, integrating 25 different sites
including review sites
... there are restrictions on different social media APIs which
limit which data you can access
... once we have data, we model in common linked data format,
reusing existing vocabularies
... using SIOC for review data
<ldodds__> PhilA: can I add a barcamp discussion proposal?
Deirdre: schema.org, rev, Marl etc
<ldodds__> PhilA: How should we attribute open datasets?
Deirdre: next steps are to look
into integration of social media data with other linked
data
... also using data to influence, justify & evaluate public
policy
... not just technical aspects to this research, also social,
political etc
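The pipeline Deirdre describes – crawl posts from heterogeneous social-media APIs, then model them in a common linked-data format with vocabularies like SIOC – can be sketched as follows. The post URI, account and content below are invented sample data, and triples are kept in a plain list and serialized to Turtle by hand; a real implementation such as SMLDS would use an RDF library and a triplestore:

```python
# Sketch: expressing one crawled social-media post as linked data using
# the SIOC vocabulary. Post, account and content are invented samples;
# Turtle serialization is done by hand for illustration only.
SIOC = "http://rdfs.org/sioc/ns#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

post = "http://example.org/posts/42"       # hypothetical post URI
author = "http://example.org/users/alice"  # hypothetical account URI

triples = [
    (post, RDF_TYPE, SIOC + "Post"),
    (post, SIOC + "has_creator", author),
    (post, SIOC + "content", '"The new bus timetable is great!"'),
    (post, DCT + "created", '"2013-04-24"'),
    (author, RDF_TYPE, SIOC + "UserAccount"),
]

def term(t: str) -> str:
    """Render a term: literals pass through, everything else as an IRI."""
    return t if t.startswith('"') else f"<{t}>"

turtle = "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)
print(turtle)
```

Once posts from different sites share this common model, the API-specific differences Deirdre mentions disappear downstream, and the analysis can run over a single graph.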
Christian: run a small web design
company in London
... running tool to monitor corruption
Uldis: if you had one lesson learnt, what would it be?
Christian: we've had a lack of collaboration, despite open source tools
Deirdre: use of social media data is limited by the restrictions that they place on it
Michalis: big lesson is that
critical mass for open data is lower than the web itself
... you build on the existing web
... make an application work, and everything else will fall
into place
Takumi: engage with local
community and local government
... local community has a diversity of needs
... need cross-relationships to tackle the real problems
Bart: for Deirdre: did you investigate engaging with the public rather than just reading twitter? asking specific questions rather than just listening?
Deirdre: that's something we are looking at
... a lot of citizen engagement platforms ask on specific
topics
... but it's hard to find participants that care enough to give
that feedback
... when you just listen you can see the trends
... see what they *do* care about: maybe they just care about
environment, not transport
... these are different approaches for different goals
Uldis: what do you expect to get from crowd-sourcing?
Christian: we kept hearing
stories about corruption, but we didn't write them down or map
them
... wanted to build something
... doesn't work for people who don't have internet access
<cerealtom> yvesr: surely its time for a game of crack attack!
Christian: they do have mobile
phones, we have SMS number
... we want to tell people they have something wrong in their
country
Kal: are you concerned about the demographic about people who contribute open data & participate on social media, and how that's different from demographic of general population?
Takumi: in Japan, we don't have
much difference
... tends to be young people, lots of men, but otherwise not so
much difference
<CaptSolo> Takumi's coauthor is speaking
Yoshiaki Fukami: in both Yokohama & Sabae, there are lots of knowledge workers
scribe: lots of data
providers
... many students took part in open data day, to compose
articles on Wikipedia
Michalis: researchers &
journalists are different
... because of how they're funded
... there are both types of users in each demographic
... we're seeing a spread in access around Greece, among users
who just want to find relevant information
Deirdre: I'm not worried about
the demographic as long as we're aware of it
... we're not claiming that it's representative
... you can build in SMSs or having real workshops
... if you need something that's representative, build in other
demographics
Christian: new technology goes hand-in-hand with old technology, don't forget radio
BobSchloss: political parties are
getting clever at extracting features from social media
... some politicians have dashboards in which the weight of
each statement is modified by Facebook friends or twitter
followers
... the rumour is that they can identify whether people are
influential in their communities
... will we see comments being weighted?
Christian: it's like A/B testing politics: it's no longer politics, just the most popular person wins
Deirdre: is that something that
should or can be stopped? probably not
... if we just have opinions then it's biased &
subjective
... if we just have open data, it's not tied into human
aspect
... we need to combine the two to get the balance
Michalis: if you can make
objective information more attractive
... can you relate election area to spending, for example
... tagging the spatial location of the payment
... you can find objective information in a subjective way
Uldis: Concluding comments?
Michalis: we believe economic LOD
should be nucleus of LOD
... need money to go around
... need to say clearly to policy makers which data is data
infrastructure
... eg in economics, all public spending, prices
... theory and application together
Takumi: relationship between local community & other communities very important
Deirdre: as a community, it's
great to see our progress, but I'd love to see more
interdisciplinary talks & sessions at these events
... bringing real use cases to complement the technical skills
we bring
Christian: we mustn't forget that there are parts of the world where things aren't moving at this speed
<scribe> ScribeNick: AndreaP
Bob: Huge amount of data from
scientists in the next years
... How such data will be accessible?
... RDA wants to operate like the IETF
... to accelerate and facilitate research data exchange.
... A number of WGs are being organised
... e.g., on Persistent Identifiers, Metadata
... They want to look at existing standards.
... Provenance and quality are other key issues for RDA.
... They want metadata to be searchable, in a
cross-disciplinary way.
... Another issue: how to handle big datasets.
... Again: datasets for peer review.
... Real work starts in September. You are all encouraged to
join.
... About how to join: http://rd-alliance.org/
Uldis: National Library of Latvia
opening up data.
... [interrupted]
... [technical issues]
Edoardo: The project is
multidisciplinary.
... computer scientists, engineers, ...
... Goal to set up a publishing protocol for open data that can
be used by PAs.
... General goal is to set up an eGov system, enabling PAs to
publish data and citizens to discover them.
... Why? To have facts, not opinions - open data are facts.
<bschloss> See rd-alliance.org , consider coming to their plenary in Washington DC in September, think what W3C standards and Open Data best practices (such as DCAT) can be extended for their needs.
Edoardo: Presenting the system
"search computing architecture".
... Use case: money given to hospitals.
... The high level query is translated into low level
ones.
... Result is presented to the user.
Uldis: I'm back.
... Interested in the area of open data.
... We need to make it easier to work with the data, to make
them more re-usable.
... Work with data frictionless from the start.
... We should be able to use it for building stories.
... Presenting workflow on data-driven journalism
process.
... The idea is to have a set of tools able to cover the whole
process.
... Data journalism is just one of the use cases.
... Stressing the need to get stories from data.
... Most important part is data discovery and publishing.
... Journalist must have information useful to assess the
quality of the data they are going to use, first of all
information on data provenance.
Timm: Motivation is that
ambiguity is an issue in Natural Language Processing.
... Ambiguity may result in incorrect translations.
... Disambiguation is carried out based on dictionaries.
... They do not like that approach.
... Rather: use what is in the LOD cloud.
... Statistics for LOD is key for NLP.
... Number of issues.
... Can LOD really model natural language?
... How can one simply access LOD datasets? Some of the relevant
ones are not easily accessible.
David: Would like to talk on a
number of issues: content management, NLP technologies,
localisation.
... Presenting localisation's value chain.
... A lot of work is outsourced.
... Re-use is also a big market.
... Statistical machine translation is also used.
... The value chain is quite long.
... Support for interoperability is there (XML-based), but
interoperability is expensive.
... W3C ITS IG (http://www.w3.org/International/its/ig/)
is trying to address some of these issues.
... The idea is to address all the localisation process
workflow.
... This is done by using existing formats.
... Interest in using Linked Data to disambiguate terms and to
introduce confidence.
... Also: how can we use Linked Open Data in the process?
... Provenance ontology (http://www.w3.org/TR/prov-o/)
very relevant here.
... RDF used also for process monitoring.
... Another opportunity is to use multilingual LOD datasets to
train machine translation.
Eric: Project on reference implementation for LOD supporting data quality and provenance
Bart: How can we actually say that OD is successful from a business perspective?
James: Which are the barriers to using OD?
<CaptSolo> anyone ready to scribe? (as AndreaP is leaving soon)
Leigh: Attribution and OD - can we have best practices on this?
Bernadette: Best practices for Persistent URIs?
Christopher: Want to show what I did by aggregating data from different Universities
<CaptSolo> Wolfgang Orthuber
<CaptSolo> numeric feature spaces
<CaptSolo> JeniT, Omar: linked CSV
<CaptSolo> Mark Harrison: ?
<CaptSolo> AndreaP: can't manage to scribe everything, but i can add some detail
<CaptSolo> Michael Lutz: ?
<CaptSolo> ok, barcamp pitches finished
<CaptSolo> if you pitched barcamp ideas, add more detail here
Michael: What you would like to have as a contribution from the European Commission on open data? e.g., concerning legislation, regulation, reference data and services