See also: IRC log
<PhilA> scribe: PhilA
<scribe> scribeNick: PhilA
<bschloss> John Sheridan from the National Archives begins talking - move from 'ephemeral, temporary' world to a world where there is more confidence in our data
JohnS: Talking philosophically
about the need for longevity
... how do I discover data that I can trust and rely on, use etc.
<bschloss> How to discover open data I can trust and rely on
JohnS: How do we firm up our open data so that people can begin to use and re-use it confidently?
... Sustaining our open data. How do we do that?
... our budgets are declining. how do we sustain our publishing activity
... Share the responsibility of supporting and curating open data
... open data community is good at coming up with the rock to build on
... I work for a reputable institution. You'll trust the data if you trust the institution
... Adds solidity
... Extreme end is legislation. e.g. INSPIRE regulations that demand certain data norms
... How can we know if policies like INSPIRE will work? Should we be asking for more of that or going to people like the National Archives and asking them for commitments
... There's a lot to do to build our data on rock
... The ODI certificate may be one of the most important things for the community to work on this year
JohnS: it would be good to discuss here what role things like the ODI certificate can play
... Talking about the Gazettes (London Belfast etc.)
... This is about putting things on the public record, where data is available, provenance and authenticity supported and availability guaranteed
... service will be completed by September
... how do we see more services like this come into existence
... it's about devising tracks
... the way forward to make all this happen with a solid basis, that we can build on
... No one organisation can do this on its own, we need to act as a community to solidify our efforts
Millie: UNDP spends about $5bn a year, which generates a lot of data. Have we improved things? What effect have we had?
... we also generate procurement data
... we use that data mostly for accountability purposes
... we've been wondering what other insights might be accessible from that data
... can we work out which projects will be most effective
... what about the companies we pay, who is most effective, who do they employ etc
... We started a series of events called Data Dives where we worked with people we don't normally work with
... data analysts, programmers etc
... are there questions that we're not asking that we should be asking?
... We'll be opening a new challenge prize shortly for the best algorithm
... We took data from the World Bank on major contracts in 2007. We were interested in the suppliers and the relationships between those companies
PhilA: As an aside - must introduce Millie to Chris Taggart this evening
Millie: Certain companies tend to win contracts in particular sectors
... two companies dominate this sub-network of projects. What happens to the subcontractors if something goes wrong with the main contractor - few points of failure
... are there certain clusters of companies that tend to bid together?
... we see clusters. Are these people really good or is there something else going on?
... do contracts go to home countries or to companies from the more developed world?
Millie: A few hours' work
produced these insights
... the World Bank folks had the data but not the insights, which actually didn't take a huge amount of time to create
<cjg> This analysis might be interesting (and easy) to apply to http://gtr.rcuk.ac.uk/ ...
<edsu> http://www.w3.org/2013/04/odw/odw13_submission_3.pdf is a 404 for me btw
Millie: shows visualisation of projects and performance
<danbri> edsu, i think the whole paper is in the 'abstract'
PhilA: Thanks edsu - I'll fix that when I'm done scribing
Millie: It's not big data, it's lots of little data scattered around
<edsu> danbri: thanks I found http://www.w3.org/2013/04/odw/papers now :)
Millie: global challenges coming up. We need help, people in orgs who can help open more data sets and help us get more insights out of that data
<ldodds> danbri: would make interesting reading, although I've not seen any open data on that?
<danbri> re eu, I think you'd need a temporal view... some partners sorta dominate, then EU notice that and punish them in later rounds
<cerealtom> this was the link from the final slide of the talk: http://europeandcis.undp.org/
TimD: Poses questions - why
people are interested in open data - transparency, innovation,
inclusion and empowerment
... the way we do open data can make it easier to realise these different aspects
... Talking about the launch (tomorrow) of ODDC
... Web Foundation and OGP are behind it
... Slides are expressive and contain the gist of the talk
... Draw out some key points
... As we've seen, supply needs to be built on solid foundations
... Are we building platforms that rely on always-on, high-capacity systems in rural areas of the developing world?
... are the standards right? We articulate standards but are the right people in the room?
... loads of standards being specified - but do they work in all contexts? Does a London-based system work in Kenya?
... Are the licensing arrangements correct? Are first movers keeping others out?
... We have opendataresearch.org and more - see slides
Hayo: Talking about the Dutch
linked data project in NL
... We started our open data programme 2 years ago
... want to help government depts open their data
<cerealtom> good collection of questions there
<cerealtom> what problem are we solving?
Hayo: now 6K data sets from national and local administrations.
<cerealtom> why spend money on opening data?
Hayo: some great apps but not really solving real problems
<cerealtom> why is nobody using our data?
<cerealtom> why dont they build an app like...?
Hayo: what actual problem does it solve? Where are the apps that do clever stuff?
<cerealtom> hayo: we've reached a kind of impasse; governments are losing enthusiasm
Hayo: We need to look at how OD is being used to solve real problems?
<cerealtom> hayo: our approach: focus on real-life problems
Hayo: Purple areas on the map shown are where population is declining, orange where it's growing
<cerealtom> e.g. disadvantaged and depopulated areas
Hayo: we want to help those
people with the real problems, disadvantaged areas etc.
... trying to bring companies together, working on the problem
... There's a problem of continuity. data is opened once and not updated
... produced for one hackathon and then stopped
... we're tackling that with linked data
... NL has a lot of open data around legislation, case law etc. Gov not using it; they're buying it from people who put a wrapper around our data and sell it back
... can we reduce the amount of money we spend on getting our own data and maybe we can profit from it ourselves
... We notice that policy makers often say "I base my policy on law x" - people make comments or annotations - we can use those in linked data and make the data more useful
... shows nice labelled directed graph
... We're allowing people to make real links between laws, policies, their text or whatever
... what marketeers call deep linking
... we reward people for linking to laws. We contact people and say, OK, you link to the law, how about linking to this policy?
... we can notify people that link to a law as it's clearly important to them
... laws have versions
... need to be able to point to a law as it was in 2010 etc.
... System will be available in September - getting government people enthusiastic about using their open data. This is a good example of showing govs how they can use their data
... of course others can use it too.
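Hayo's point about pointing to a law as it stood in 2010 is essentially a versioned-identifier problem. A minimal sketch of one way to mint date-stamped URIs; the example.org base and the law_version_uri helper are hypothetical, not the identifiers the Dutch system actually uses:

```python
from datetime import date

# Hypothetical scheme: a stable base identifier for the law plus the
# date of the consolidated version being cited.
BASE = "http://example.org/id/law"

def law_version_uri(law_id: str, as_of: date) -> str:
    """Return a URI for a law as it stood on a given date."""
    return f"{BASE}/{law_id}/{as_of.isoformat()}"

print(law_version_uri("bwb-12345", date(2010, 1, 1)))
# -> http://example.org/id/law/bwb-12345/2010-01-01
```

Annotations and policy links can then target the dated URI, so a citation keeps its meaning even after the law is amended.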
BobS: We put together our thinking on what we think might be possible
... I think it's great when we get lots of open PSI
... we need it in educational, arts and business worlds too
... we need to get a virtuous circle where value is created
... looking at an Irish linked data front end
... we started in Oct 2011 with 4 Irish authorities (Dublin + 3)
... Looked at the cost/benefits of uploading open data
... there's this issue that the people who publish trust that their effort will deliver a return
... people have to want your data and they want it in their format
... (not yours)
... you need to be able to state how complete is the data, when and where does it cover etc.
... whole cluster of ideas
... you can synthesise this open data with yours and do good stuff
... The three principles
... (see slide 6)
... Slide 7 for the second principle
... talking about things like showing logos for limited time, potentially contacting data users
... need to be able to log if there's a new version of the data
<JeniT> disturbed a bit about the additional limitations bschloss is suggesting for "open" data
<JeniT> seems to be stretching what "open" means beyond the usual definitions
<cjg> "What if terrorists use our data" is on my bingo card: http://is.gd/gXDEaG
<cjg> (but to be fair, hazardous materials is actually a reasonable dataset to keep limited access. )
<StevenPemberton> Except if you want to see if there is hazardous material stored near your school. #west
<edsu> cjg: puts a new spin on the JISC's ‘The coolest thing to do with your data will be thought of by someone else.’
JohnS: We make institutional commitments
Hayo: Our governments trust third
parties more than our open data
... we're trying to educate them
TimD: We're trying to talk about purposes and use of data more than you need to publish in a given format etc.
Millie: This is a room full of evangelists, the shift in thinking needed is enormous, don't underestimate that
TomHeath: I like John's quotes. I don't like "if you agree with me you're wise, if not you're a fool"
... How do we convince others of the wisdom?
BobS: What we're doing in Dublin
- we capture the identity of the app, program and org that
downloads everything and there's an offline process for
assessing the value of that
... then go back to the data publisher and tell them what's going on, what people are doing with your data
<edsu> aside: best way to convince people is to show them the utility of it, not appealing to their better (wiser) nature imho
<cjg_> edsu: I swear that we've had people suggest that if terrorists got access to the live bus times they could use it… there's a wear and tear on my desk from banging my head on it.
PhilA: grrr dropped off IRC, sorry, missed some comments and questions
<cjg_> yeah, I've got a talk at IWMW this year about how open data can get better value for money -- seems a good way to think about it in these tightened times.
BobS: IBM has been looking at
specific cities. We don't push up hill - we find the people
that want to do open data
... We also need to find the person in the street
... we don't have 'how open data can improve your life' days
Hayo: Yes, talk about problems, not open data
TimD: Yes, we want data you can
build upon in gov and society
... Lots of great examples from places like Sao Paulo
... talking about accountability and capacity not open data
<cjg> We have a policy of always putting a front end on our open data; even if it's as simple as a basic HTML page. 99% of the users are just using that and not the underlying data, but that's OK.
TimD: so the new research project will include lots of case studies from Brazil.
BobS: In Africa, the knowledge of prices for their farming goods is transforming farming
<edsu> cjg: :-D
BobS: So we've been working on projects for people who can't read - working on spoken web in India
Millie: In the Balkans we have an
issue of forest fires and consequent air quality
... I want to know if my child can go out on the street
... we have kids building air quality monitors
... we move to solutions too quickly
<edsu> cjg: it's hard to develop all the apps/visualizations people want ; giving them the data and empowering them to do it seems like a no brainer -- except to people who don't want new interesting visualizations of their data :)
JeniT: For Bob - you spoke about the need for collecting data about people using the data and restricting terrorists' access - that's not the usual definition of open data
BobS: I see a spectrum, not a point
<cjg> I generally tell people that "open" means removing as many barriers as possible
BobS: we're going to have rock solid stuff - it will be there and accurate for 9 years. Then there's softer and softer - we need to cover the spectrum
<cjg> the barriers can be technical, social or legal.
<cjg> "as open as possible" can still be used to describe data which is confidential.
<HadleyBeeman> For reference, I think JeniT is referring to the Open Definition http://opendefinition.org/
<HadleyBeeman> Great question… I've been wondering as well if we're still having the same discussions (as we were a year or two ago).
bhyland: Yes. we're all evangelists but we're not working in a vacuum. There are people in gov who are not minded to hand data over to a bunch of smart people they don't trust
Hayo: It takes patience. We have to change contracts occasionally. We changed our legislation publishing contractor 5 years ago - that made a big difference
<StevenPemberton> I think he said that it took 5 years to change the contract
<StevenPemberton> and only then could they use their own data
Millie: [scribe note: sorry, I missed Millie's comment about Pulse]
Billr: My experience as a private sector person working for gov - I see that some of the bigger players are only just picking up on the potential for open data. Some early birds are winning
JohnS: Spend more time talking to people not involved with open data about fixing problems
BobS: OD is a means, not an end. Talk about the ends
Hayo: OD will take time and money. Maybe 5 years +
Millie: UNDP uses taxpayers' money to change people's lives - we need help
TimD: Think about who's in the room when we define standards
<StevenPemberton> scribenick: markbirbeck
<StevenPemberton> Scribe: Mark Birbeck
<StevenPemberton> Paper: http://www.w3.org/2013/04/odw/odw13_submission_52.pdf
<bhyland> Concluding remark from first session: "Open data is a means, not an end. Come at it from what real world problems it will solve."
Paul Davidson introducing James King — senior principal scientist at Adobe — to talk about how PDF is more open than we all think it is.
<edsu> BibS++ concur
Structure of talk: open data paradigm, PDF itself, and then its role in open data.
Organisations taking data, shaping it and presenting it.
…but others — the "processors" — would prefer to deal with the raw data...
…they might present that too, but also use the data to draw new conclusions, or use it for advocacy.
…A further group is that of the tool providers, who will help us process this data.
…About 30% of the room are providers...
…80% are processors...
…most are consumers, and some are tool providers.
…PDF will be 20 years old this June.
…PDF and Acrobat are different beasts.
…The internals of PDF have always been published, and it became an ISO Standard in 2008.
<PhilA> PhilA: Nice approach to backwards compatibility from Adobe for PDF
…A PDF 1.0 doc is also a 1.7 doc — always backwards compatible.
<bhyland> Jim King: PDF will be 20 years old this June. PDF 1.7 became an ISO Standard in July 2008. ISO work on PDF is ongoing.
<edsu> hopefully mozilla's pdf.js will get a mention ...
…To make the PDF spec into a 'proper' ISO Standard the team at Adobe had to go through the entire document…very thoroughly…
…PDFs are abundant, containing lots of useful information.
<cjg> I had surprisingly good results converting our student union committee minutes from PDF to RDF: http://lemur.ecs.soton.ac.uk/~cjg/TheyWorkForSUSU -- just looking at where on the page text appears gives more semantics than the naive pdf2utf8 (or 2html) approach.
…It's a format that distinguishes between text and graphics, and can be used to produce good looking documents.
…But it's not a data format.
<edsu> cjg: i think that's roughly what google scholar does when it scrapes pdfs
…Billions of documents out there, but difficult to extract any data that's in there.
<edsu> cjg: grabbing the largest text at the top of the first page as the title
…If pages *contain* graphics then extract that with something like Illustrator.
…If pages are text then there's a bunch of software that can process the text.
…(A big list is on Wikipedia.)
<bschloss> There is a 'spectrum of open data' -- totally free, available forever, no recording of downloader is one end of that spectrum, but airlines, investment markets, sports leagues, available job listing websites, retailers are all doing open data on a slightly different point on the spectrum.
…And if the pages are images (i.e., rather than *containing* images) then need to go the OCR route.
<cjg> We found a nice command line tool which converts PDF to and XML representation of the data structure inside and that gets it into our 'hacking comfort zone'
<ivan> wikipedia list for pdf tools
… Wikipedia's list of PDF software: http://en.wikipedia.org/wiki/List_of_PDF_software
…If you're making PDFs, here's what you could do to make things easier.
…Making files that both contain raw data and look good is difficult.
…There *is* software around that can embed metadata to provide structural information.
<bschloss> Seems to me that any producer of a PDF who wants it to be available to people with no sight is hopefully providing a table or textual alternative rendering in the PDF for any diagram or image in the PDF, yes?
…The structural information would be stuff like reading order, tags such as headers, footnotes, figures, maths, and so on.
…Tools can make use of this extra data which will make the extraction process much more reliable.
<markbirbeck1> …A second thing to do is make use of the attachment facility.
<markbirbeck1> scribenick: markbirbeck1
…Raw data on its own is probably insufficient for doing something useful.
…For example, what's the currency? the data format? the semantics of the fields? provenance?
<alex> the attachments-in-PDFs thing might actually be useful for scholarly publications, so that the data doesn't get divorced from the paper
…So we create a PDF file that contains raw data with a schema, giving the end-user everything they need.
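The "raw data plus schema" idea is independent of PDF itself. A minimal sketch of the pattern, with an invented three-field schema; this illustrates the general approach of shipping data with the information needed to interpret it, not Adobe's actual attachment format:

```python
# Invented schema for illustration: field names, declared types.
schema = {
    "fields": [
        {"name": "country", "type": "string"},
        {"name": "year", "type": "integer"},
        {"name": "spend_usd", "type": "number"},
    ]
}

CASTS = {"string": str, "integer": int, "number": float}

def validate_row(row, schema):
    """Cast each value according to the schema; raises if a value doesn't fit."""
    return {
        f["name"]: CASTS[f["type"]](value)
        for f, value in zip(schema["fields"], row)
    }

print(validate_row(["Kenya", "2007", "1250000.50"], schema))
```

With the schema travelling alongside the rows, a consumer can check and type the data without guessing what each column means.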
<alex> bhyland: yeah, presumably there's not the tools support beyond what Adobe sells
…Can then make use of all the nice PDF features that have evolved over the last 20 years, such as digital signing.
…There are some examples in the slides.
<edsu> bhyland: same could be said of most metadata on the web
Peter Murray-Rust: Spent years hacking PDFs in the wild.
…Trying to write software that will process them, but they are generally pretty bad.
…If anyone else is trying to hack on this then please talk to me; there's hundreds of billions of dollars worth of information out there that is simply unusable at the moment.
<cjg> I had a bit of a rant about PDFs as a way of communicating data to a reporter from the register, which resulted in them publishing this: http://lemur.ecs.soton.ac.uk/~cjg/Archive/Photos/2011/cjg-boffin.png (I'm quite proud of that)
Dan Brickley: Is this thing loud enough?
…PDF can be used well and powerfully, and of course it's clear that some people aren't using it well.
<edsu> heh, re: billions of dollars worth of information that's unusable, you have to wonder if that's by design, not by accident ...
…You didn't mention XMP, though, which includes RDF.
…You also didn't mention accessibility.
<bhyland> Peter Murray-Rust - Scientific publishers are paid $10B/yr worldwide to lock up scholarly publishing, that is after governments spend $100B/yr globally on scientific funding for R&D in the first place. He is looking for people to help him in his mission to unlock the enormous value locked in PDFs.
James: The accessibility aspects are quite mature in PDF, and the structured aspects help that.
<StevenPemberton> PDF is a page description language, so not in a reading order necessarily
…We don't have much control over what people produce, although things have improved in the last 5 years.
<bhyland> @edsu - perhaps re: your comment above. My experience suggests that we're more thoughtful publishing structured data about data sets (metadata) because they are fewer in quantity whereas PDF are like water, they are everywhere and almost "too easy" to create by the mere click of "Print —> PDF" …
Speaker: For many people PDF data is closed data.
hadleybeeman: You've outlined many things I didn't know were possible, so why is there not the uptake on these features?
<bhyland> @hadleybeeman - because the tools are proprietary, complex to use … at least harder than clicking "Print —> PDF" and well let's face it, people are lazy and hand entered metadata has been proven to be *very* challenging and highly inconsistent.
James: Not sure if it's our fault. In some areas there have been successes, perhaps where there's industry interest or our sales people have promoted a feature.
<alex> If they want stuff like metadata to be adopted, then surely they need to encourage support in tools other than their own (OpenOffice; Word)
<inserted> scribenick: bhyland
<inserted> scribe: Bernadette Hyland
Jeni: sets the tone around different formats for tabular data, advantages & disadvantages of various approaches.
… NB: Special allowance for Rufus who has been known from time to time to go on ...
Rufus: Intro on OKFN and their mission to liberate data
… Proposed "Our Mission" is to make it radically easier for data to be made used & useful"
<ivan> scribenick: bhyland
Rufus: Stated problem of data on the Web in many different formats & issues that poses.
… propose 3 minor innovations involving " borrowing" approaches others have used before us.
In this model, there are the usual suspects … data creators & packagers, consumers and the effort in the middle to do "data packaging"
… Linked Data effort has been knowledge APIs and has been successful [to varying degrees]
… Packaging has to be done as a distinct step, a minor packaging effort that is agnostic about the data itself ...
… Today, there is a huge amount of friction in getting & using data on the Web. We want to build for the Web. Rufus said RDF is not Web native … he has been laughed at when he proposed its use ...
Proposal: 1 - One (small) part of the data chain; 2 - Build for the Web; 3-5 [too fast to record]
Concluding remark: Package data more effectively and produce one killer tool to make data more accessible.
Speaker: Omar Benjelloun, Google, GPDE
Omar highlighted Google public search feature, Knowledge Graph capability and origins.
… Highlighted the Public Data Explorer, using the Data Cube representation. Anyone can upload & share data using RDF.
<HadleyBeeman> Oh. ignore my s/DSPL/GPDE
DSPL = Dataset Publishing Language, describes tabular data + semantic description including concepts describing re-useable data types. All packaged in a zip file. Visualizations can be shared.
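The shape of such a package (tabular data plus a description, zipped together) can be sketched in a few lines. This is illustrative only: real DSPL uses an XML metadata file with concept definitions, and the dataset.json name here is invented:

```python
import io
import json
import zipfile

# Invented, minimal stand-in for the metadata file.
description = {"dataset": "population", "columns": ["country", "year", "pop"]}
csv_data = "country,year,pop\nNL,2012,16730000\n"

# Produce the package entirely in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("dataset.json", json.dumps(description))
    z.writestr("population.csv", csv_data)

# A consumer opens the package and reads both parts back.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as z:
    names = z.namelist()
    meta = json.loads(z.read("dataset.json"))
print(names, meta["dataset"])
```

One zip file keeps the data and its semantic description together, which is what makes the visualizations shareable.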
Omar's Propositions: Datasets need good Web pages with a stable, official, up-to-date canonical location. Also, add good markup for reasonable SEO.
… Let tables be tables. [Let it be… ] Relational data & schema are well understood. Better than triples: tables naturally capture relations. Better than APIs: no access patterns, scalability issues.
… Add semantic annotations to tables. Leverage EXISTING approaches (RDF, schema.org) [emphasis is scribes :-)]
… Better to follow this approach than create custom data models (SDMX, DSPL).
Next speaker: Stuart Williams, Epimorphics
Overview of Epimorphics, doing services and LD design work. Working with data.gov.uk. Helped to lay down some of the sand that John Sheridan previously described.
Working to publish bathing water quality, now expanding to the river network in the UK.
… Thinking about getting 'beyond the data', we feel that we need to get beyond the 4 & 5 Star Data attribute, and evolve the message to solving a real world problem.
… Works with the UK Environment Agency to make publication of valuable data … easy!
… Think about how to allow publishers to add simple bits of markup.
… Think about how to contribute to the virtuous circle of making it easy to contribute something valuable & receiving something valuable.
Next speaker: John Snelson from MarkLogic
Describes himself as an XML-guy and actively involved in W3C around those recommendations.
… MarkLogic helps its customers use data effectively using XML.
… John is a data pragmatist. We must look beyond those formats.
Next Speaker: Tyng-Ruey Chuang, from Academia Sinica in Taiwan
Involved in Taiwan's culture heritage efforts.
<PhilA> Tyng-Ruey Chuang, Academia Sinica (Taipei) see http://www.iis.sinica.edu.tw/pages/trc/
<cjg> I can't ask our catering department to provide menus in a well structured RDF format :-)
<cjg> (much as I wish they would)
… dealing with heterogeneous collections of content including media files, documentation. His focus is on sharing & making cultural heritage content usable for the long term.
… Putting data on the Web itself does not guarantee longevity.
… We can & should learn from the Free Software Foundation. Supports giving people the ability to make copies of content. Highlighted the importance of content being portable to many other computer systems, both on & off the Web, for it to be considered truly open.
Panel Convener is Jeni … She puts the following question to Rufus. Q) There is debate on how to manage metadata, to embed or not ...
Rufus: Regarding embedding, it almost becomes an AI project to figure out metadata that is embedded. It can be a nightmare. The beauty of keeping it separate is it is easier on tools & therefore treatment by tools. He is supportive of graceful degradation.
Tyng-Ruey Chuang: Prefers to have structured schema as part of the data (?)
Omar: Mainly, the important thing is to get agreement on format, then all kinds of good things can happen. Linking tables & metadata to Web pages (authoritative) is really important.
Stuart: We've been using this word "metadata" which leads us to schema information. In the RDF world, we can click through to it & immediately see it.
… Using RDF model, you don't have to scramble all over the Web, rather, you get bits of schema info back because it is carried *with* the data.
… Highlighted the peril of carrying so much provenance information that it drowns out the important data itself.
<cjg> Quite simply, tabular data requires a lower cognitive load to work with. Most people can't be bothered to learn to think in graphs. So tabular is more open because it's easier to comprehend.
<edsu> aside: embedded metadata (facebook opengraph, schema.org) is getting published because it is getting used
<HadleyBeeman> cjg I wonder how much of that is because our computer science training wasn't very graph-focused. Next generation might be different?
<edsu> i don't buy the argument that it needs to be separate ...
Questions from the audience ...
Ivan: When we speak of metadata, my biggest issue is what vocabularies to use. It is the biggest problem we have to solve, even more important than the data format/model … if we had widely available vocabularies, it would solve many problems.
<cjg> HadleyBeeman: I'm talking about the people who maintain my data. They are *not* computer scientists… they are in finance, buildings & estates, catering...
Rufus: If you meet most developers and start talking about vocabularies, "they'll run for the hills." Been part of countless long fights on which vocab to use. Suggested a new site called http://GiveMeTheDamnSchema.org as a joint project of cygri and Rufus ;-)
<HadleyBeeman> cjg: Ah, I see. Yes, different user base there.
<cjg> I went to see what they already had, tidied it all up in excel and moved it to google spreadsheets so it was easy to grab automatically.
… What is the minimum to make CSV files useful? Just give me the basics: string, integer. This is *our* problem, not the publishers'. I'm all about 'reducing the time' … open vs. closed data.
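Rufus's "just the basics: string, integer" can be approximated with a few lines of column-type inference; the infer_type helper and the sample rows are invented for illustration:

```python
def infer_type(values):
    """Pick the narrowest basic type that fits every value in a column."""
    for cast, name in ((int, "integer"), (float, "number")):
        try:
            for v in values:
                cast(v)  # raises ValueError if the value doesn't fit
            return name
        except ValueError:
            continue
    return "string"

rows = [["NL", "2012", "16.7"], ["UK", "2012", "63.2"]]
columns = list(zip(*rows))
print([infer_type(col) for col in columns])  # ['string', 'integer', 'number']
```

Even this crude inference gives a consumer the minimal schema Rufus asks for without any effort from the publisher.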
<edsu> problem hasn't been schemas per se, as much as it has been schemas divorced from their actual use
… Licensing is a lower priority for many.
… Ease of publishing is king
<cjg> Also, I want to create a collection of SPARQL queries which produce useful spreadsheet downloads for humans to consume. Secretaries are a whizz with Excel, but only if the file loads first time. Telling them TSV can be "easily imported" is already outside their comfort zone.
… Our mission is to reduce the cost & RDF, at the moment, is not doing that.
Omar: If we want to bring data together, we have to harmonize into a common model. I don't know whether developers should have to be encumbered with that responsibility. But it is a real problem to solve.
Bhyland notes (not in a comment) that there is a wide spectrum of opinions in the room & that is good for stimulating discussion. Deepening understanding is key to all of this.
Stuart: Finding the stuff in the first place, with schematic markup answering provenance information, is critical to solving the hurdles we face with better use of open data on the Web.
<alex> cjg: I played with SharePoint/Excel integration yesterday, and it looks like you can get Excel to live-update from SharePoint lists; I suppose something similar could be done with s/SharePoint/SPARQL endpoint/
John Snelson: Vocabularies have their place, but search is a great way to find data that is not expressed perhaps as nicely as we'd like...
<alex> then SPARQL would be truly Enterprise™
<alex> it would also be possible to embed the metadata for a table in a second sheet of an XLSX/ODS file, instead of prepending it to a CSV file
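alex's prepend-to-CSV option can be sketched as comment lines that carry a JSON header, which a reader strips before parsing; the #meta convention here is invented, not an existing standard:

```python
import csv
import json

# A CSV file with an invented '#meta' comment line carrying metadata.
raw = (
    "#meta {\"title\": \"Vacancies\", \"updated\": \"2013-04-23\"}\n"
    "id,title\n"
    "1,Research Assistant\n"
)

meta = {}
data_lines = []
for line in raw.splitlines():
    if line.startswith("#meta "):
        meta = json.loads(line[len("#meta "):])
    else:
        data_lines.append(line)

rows = list(csv.reader(data_lines))
print(meta["title"], rows[1])  # Vacancies ['1', 'Research Assistant']
```

The trade-off is exactly the one discussed above: naive CSV tools choke on the extra line, which is why a second sheet (or a separate file) degrades more gracefully.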
Questions from the mob: You've got to help represent/model data, but that is not the entire story. It is a "horses for courses" kind of thing. Please be careful not to reinvent RDF with JSON glasses on.
IBM guy - Dealing with data is hard. It is harder than process. We won't solve problems with data exchange standards alone. One thing we haven't heard about today is Best Practices and Architectural processes. We need to rise above data formats and really focus on data patterns, best practices.
<cjg> I have this horrific image of people creating n-triples documents in Excel...
<yvesr> cjg, i saw that being done *a lot* at the bbc
<pieterc> cjg: why would that be horrific?
<cjg> for one thing, excel plays silly buggers with certain values.
Bhyland to 'IBM guy' - let's talk real soon - there is Best Practices work, albeit nascent, underway within the W3C Gov't Linked Data working group & we'd welcome your input.
<cjg> We have real trouble getting people to enter phone numbers without it getting muddled. 079671234567 gets converted to an integer as does +44....
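cjg's leading-zero problem is easy to demonstrate: numeric coercion destroys the value, while a plain-text CSV reader preserves it (the sample number is the one from cjg's comment):

```python
import csv
import io

number = "079671234567"
# Coercing to an integer silently drops the leading zero.
assert str(int(number)) == "79671234567"

# Python's csv module keeps every field as a string unless you cast it.
rows = list(csv.reader(io.StringIO("name,phone\noffice,079671234567\n")))
print(rows[1][1])  # 079671234567
```

This is why phone numbers, postcodes, and identifiers should be declared as strings in any schema, never left to a spreadsheet's type guessing.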
Rufus: Described a state of the world that is very fragmented, messy & dirty and urges us not to look for a [utopian data model] that everyone is required to use.
Tyng-Ruey: Re: Best Practices, validators would be helpful to check that data representation is correct. Need: better validators (note to PhilA).
<cjg> Shout out for http://app.easyopendata.com/ - for converting live google spreadsheets into XML or RDF/XML etc.
<HadleyBeeman> Oo, fun, cjg. Thanks.
Omar: "I think we've been spoiled by the Web" because search engines have done a good job. The question is, can we make this Web of Data thing work such that we publish our metadata & data and have it easily found. This is the question.
<pieterc> cjg: spreadsheets are for calculations, not data. CSV is a format which people use with spreadsheet programs, thus not suited for the job. Got your point?
Peter Murray-Rust: To Omar - what do you do with things that are labelled as tables but really are not tables?
<cjg> yeah, maybe we need a nice "CSV" editor?
<cjg> Or even a "table" editor, using PMR's description.
Omar: Smart people are working on it … it's complicated.
<pieterc> cjg: thought of it as well already
<cjg> basically a cut-down google docs.
<pieterc> cjg: open refine? ;)
John Snelson: Need to be able to break out & work with data in a schema-less fashion.
<cjg> with a magic table heading
John Sheridan asked: in the world of tables & CSVs and [screw the metadata], how are you prepared to deal with the licensing question?
Rufus: I didn't say 'screw the metadata'. Rather, we need simplicity and innovation around process. He suggested having multiple parties be part of the "packaging process".
… Clearly a license has to come from an authoritative source. Gave example about data from the Bank of England. Two important points: we need minimal metadata and … [someone else augment please, scribe missed second point]
<cjg> *if* the source of the metadata is the same website as the data then that's probably good enough for me.
Wrap up from panelists - 'wear your schemas on the outside, use HTTP URIs to describe things if putting on the Web.'
John: Great opportunity for tool developers to liberate data.
<StevenPemberton> Scribe: Steven Pemberton
End of panel facilitated by Jeni. Thanks all.
<StevenPemberton> scribenick: StevenPemberton
<pieterc> I have a problem with the fact that the data can be processed with quick bash scripts, or other low-barrier scripting languages, but the metadata needs a JSON parser
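pieterc's barrier is arguably small: reading JSON metadata takes only a few lines of stdlib Python, no heavier than the bash one-liners used on the data itself. The metadata blob below is invented, loosely in the style of a datapackage.json:

```python
import json

# Invented, datapackage.json-style metadata, for illustration only.
metadata = json.loads("""
{
  "name": "bank-lending",
  "license": "ODC-BY-1.0",
  "resources": [{"path": "lending.csv", "format": "csv"}]
}
""")

# Pull out the fields a processing script would need before touching
# the CSV itself.
assert metadata["license"] == "ODC-BY-1.0"
print(metadata["resources"][0]["path"])
```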
<bhyland> Someone else able to scribe, please? Pretty please??
markbirbeck1: I come from a semweb background
... software developer for decades
... [lists examples of RDF-based software projects he has worked on]
... also involved with RDFa at W3C
... you can tell I'm setting things up to have a good moan
... usually data not available, or in inconvenient formats
<alex> markbirbeck1: ooh, a jobs ontology. we wrote our own having found nothing in the wild (https://data.ox.ac.uk/id/dataset/vacancies)
markbirbeck1: or not linked
... Lessons -
... - need a big cultural change to get open data
... - spreadsheets aren't that bad, don't need to wait for RDF
<edsu> alex: http://schema.org/JobPosting
markbirbeck1: but the timeframes were a big issue
<alex> edsu: ah, cool; thanks :-)
markbirbeck1: - Join question. Linked data would be great, but consistent code would be enough
<cjg> hmm, is there a schema.org->RDF mapping? there must be...
<yvesr> cjg, http://schema.rdfs.org/
<edsu> cjg: there is, but really who cares?
markbirbeck1: Big data is
relevant, lessons learned from that are useful.
... Open data doesn't need to be RDF, use context
... only when you cross (company) boundaries do things like schemas become important
<cjg> edsu; me as we've just stared publishing vacancy data last week! Making it Linked Data is useful as it can cross-reference to our URIs for various departments & faculties.
timbl: when you mention experience you've had, please say who you are/were working for, was it a big or small project, public or private, etc.
<bhyland> There goes TimBL again about context, context, context! ;-)
<HadleyBeeman> Metadata for our conversations. :)
<PhilA> who'd have thought context mattered for data eh bhyland?
markbirbeck: There was a layered
approach to it in my case, people who had bought in but didn't
know enough, which was worse
... but NHS in my case was an example, timing was bad because of looming cuts
... but I was naive too about the issues involved about publishing certain types of data and aggregation
<bhyland> TimBL: Context is important. Users in intelligence community won't consider using data without provenance, won't even start the conversation or analysis.
Raphael: Most are tool builders
here, but we need more than tools
... this a report of what we have done at a "datalift data Camp" last year
... lifting data to 5 star status
... It worked a bit, but was a good learning experience
... varied data source types
... and varied companies, with different needs
... Datalift is a package with single click download
... [shows workflow]
... converts to RDF
... and then the interlinking
... used for two large data collections in France
... Difficulties are how to choose the right vocabulary
... rdf conversion, URI schemes to adopt
... automatic detection of datasets to link to
... LOV initiative, 260+ vocabs
... now open source!
<bhyland> I love how a French speaker says "LOV bot" as love boat.
Raphael: Conclusion -
multilingual vocabs important
... hide complexity of sparql
... eg QAKIS
... Shape files are important
... INSPIRE directive and W3C GLD vocabs need to be covered
<bschloss> Since Open Data is a means to several valuable ends, IBM is talking to our clients about thoughts of "becoming a Contextual Enterprise" and we emphasize the critical need to dynamically assemble context for every key input and output of their work, including the context of external data they import. See http://www.research.ibm.com/files/pdfs/gto_booklet_executive_review_march_12.pdf for very high-level summary of our recently released Global Technology Outlook.
Raphael: GTFS/DSPL formats
Tristan: We work with cultural
heritage. Will talk about science museum now
... also a plea for help
... Science Museum is august and venerable, with loads of internal systems, we are trying to consolidate them
... we extract, and convert to linked data
... triple store
<pieterc> rtroncy: how active is the development of Datalift? I haven't seen a lot of activity on the SCM
Tristan: built a data model, in
cooperation with British Library, British Museum [others], see
... use that to drive the website
... my plea for help is what should be the next steps
... how can we make it more open?
... Publication strategies, stable URIs, dereferencable etc
... Is the data model interoperable
Madi: I am new to W3C, and an open
linked data devotee
... Pearson is a publishing company, owns Financial Times and some Penguin books.
... I think we are the first W3C publisher member
<edsu> that says a lot
Madi: There is a new Community Group at W3C with 23 members
-> http://www.w3.org/community/opened/ Open Linked Education Community Group
Irina: Raphael, what were the outcomes?
<HadleyBeeman> Eek, sorry. Try this: http://www.w3.org/community/opened/
<bhyland> Madi: Data + education is a natural fit. Whatever we can do to make it easy for students + instructors + open data advocates to get together make the world a better place.
Raphael: It was part one of a two
part process. We wanted clean data, the next step will happen
later this year, to reuse the data to build apps.
... Some of data sets are just data dumps
q1: is there automatic linking between data possible?
MarkBirbeck: It is not just
... do you mean just numerics?
q1: Not necessarily,
MarkBirbeck: This is what I was
referring to earlier, for instance trying to identify a company
from different versions of its name
... URIs are a great goal, but you can get there earlier
<HadleyBeeman> scribenick: hadleybeeman
<scribe> Chair: LeighDodds
Kal Ahmed: Intro to talk on OData
… OData is a standardised protocol for consuming and creating data APIs (odata.org)
… originally conceived by Microsoft, this is bringing it into being a common protocol.
… OData is entity-centric. It comes from .NET developers with tables of data. The standard itself defines how you publish your metadata: service metadata and schema.
<ivan> scribenick: HadleyBeeman
… OData has a URL-based syntax for access.
… Includes inline expansion between entities
… POST a representation to an entity set's URL. PUT, PATCH, MERGE, or DELETE.
<cjg> I've never heard of MERGE or PATCH before…
<alex> PATCH is only just a Thing, isn't it?
… Other nice features: combines metadata properties with a special media source URL. Named streams. Ability to embed your own custom actions and functions and expose them as URLs
<JeniT> PATCH is a proper thing, haven't heard of MERGE
<alex> only just> March 2010, according to http://tools.ietf.org/html/rfc5789
… There are a lot of reasons to like OData. You can reliably discover the schema. Clients are all linked. Easy to experiment using those URLs.
<pieterc> alex: The DataTank supports PATCH
<pieterc> alex: (tdt is a RESTful data adapter project in PHP)
… There is a growing set of OData consumers. GUI controls and libraries.
<pieterc> alex: (it sounds worse than it is)
<alex> http://msdn.microsoft.com/en-us/library/dd541276.aspx "The remainder of this section defines a custom HTTP MERGE method"
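The query options and update verbs just mentioned can be sketched with stdlib Python. The service root, entity set and property names below are hypothetical, and the request is only constructed, never sent; PATCH here is the RFC 5789 method the chat refers to:

```python
import json
import urllib.parse
import urllib.request

# Hypothetical OData service root and entity set, for illustration only.
service = "https://example.org/odata/Films"

# Queries are expressed as URL query options such as $filter and $expand.
query = urllib.parse.urlencode(
    {"$filter": "Year gt 2000", "$expand": "Director"}, safe="$")
url = service + "?" + query

# Updates target the entity's own URL; PATCH (RFC 5789) carries a
# partial representation, so only the listed properties change.
body = json.dumps({"Title": "Renamed"}).encode()
req = urllib.request.Request(service + "(1)", data=body, method="PATCH")
req.add_header("Content-Type", "application/json")

print(url)
print(req.get_method())  # PATCH
```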
… Criticisms of OData: Service definitions tend to be siloed. Links don't tend to go outside the data service. Don't use any shared ontologies.
… Another slight criticism: because of its history of being pushed by Microsoft, it's seen as vendor-specific. Not true; standardisation is now under OASIS, with other contributors
… Why do developers use it? We love the features and the flexibility of RDF/SPARQL. We were disappointed with the Linked Data Platform proposals and the flexibility they would give.
… We wanted it to be a declarative configuration only, ultimately to do that config automatically.
… Previous attempt: LINQ - to - SPARQL, hand crafted as c#
… This implementation: Proxy service for a SPARQL endpoint. http://github.com/brightstardb/odata-sparql
… Key part of this: the annotations. They're in the OData spec. Defined for: URI namespace for entity primary keys, URIs for entity types, properties and directionality of links
… Annotations are visible to the consumer, mappings done against the SPARQL endpoint are visible
… Allows you to reconstruct the source triples you've just queried, if you'd ever want to.
… Implementation issues: Our naive approach: if you ask for an entity, a DESCRIBE will give you what you want. It was too underspecified, so you have to use CONSTRUCT, which led to sorting and identification issues.
… OData allows the server to do paging. If there's been a server-side limit imposed, you don't know that.
… Biggest implementation issue: because we're turning primary keys into URI identifiers, every entity in the entity set has to have the same base URI. Not a problem in most cases, but it can be.
… [Example query to select a simple film]
… [Example query to enumerate films]
… [example query to show property navigation]
… That's all leading up to a bunch of questions. First, and the one I'm most interested in discussing here: how important does this group consider interoperability between standards? Do standards need to interoperate? Do different standards bodies' standards need to interoperate? Whose responsibility is it?
… More questions: what could the W3C LDP WG learn from OData and vice versa. OData changed in response to feedback/requirements. Now on third iteration… Should these requirements and use cases be shared between groups?
… Finally, is there a shared meta-model for entity-oriented view of data resources between the two?
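The base-URI constraint Kal described a moment ago can be sketched in a few lines. The base URI, entity-set name and keys below are invented for illustration:

```python
# A sketch of the annotation-driven mapping described above: every
# entity in an entity set shares one base URI, so primary keys and
# URIs convert both ways. All names here are invented.
FILM_BASE = "http://example.org/id/film/"

def key_to_uri(key: str) -> str:
    """Turn an entity's primary key into its URI identifier."""
    return FILM_BASE + key

def uri_to_key(uri: str) -> str:
    """Recover the primary key; fails when an entity does not share
    the set's base URI, which is exactly the limitation noted above."""
    if not uri.startswith(FILM_BASE):
        raise ValueError("entity URI outside the set's base URI: " + uri)
    return uri[len(FILM_BASE):]

assert key_to_uri("f42") == "http://example.org/id/film/f42"
assert uri_to_key("http://example.org/id/film/f42") == "f42"
```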
LeighDodds: Do you have a sense of uptake?
<JeniT> (uptake of OData)
Kal: hard to tell because search discovery of OData endpoints is hard. Probably more not visible to the Web than those that are.
<bschloss> [I think the SAP ERP platform, recent version, has APIs to get information as ODATA]
ivan: There have been several attempts to get these groups together. For all kinds of personal reasons, it did not work out. There is a community group at W3C on OData vs RDF; the group is silent, empty.
Kal: It shouldn't be "OData vs RDF". They should coexist and work together.
<bhyland> My question is (and I'm not being snarky or flip): why OData? Isn't this MS trying to redo RDF? RDF has matured and is well-documented. It is not perfect and its use is far from ubiquitous; even so, why fragment?
subtopic: Neil Benn, Fujitsu. LOD approach to engineering health-sensory datasets.
Neil: I'll focus more specifically on health and health sensor data. I've recently joined this group, and this is one of the projects we're working on.
… We're working on a cloud platform for large-scale graph storage. Public and private data. That seems to be a tension that is coming across throughout today. Therefore, Linked (Open|Closed) (Big) Data
<bschloss> Mentions Linked (Open|Closed) (Big) Data and mentions Fujitsu and DERI collaboration on a Linked Data Global Repository
… We've been working with DERI on a CKAN-like Linked Data Global Repository. Faster and more searchable.
… We're also involved in the W3C LDP WG
… With the University of Singapore, we've been working on health care sensors. Temperature monitor, heart rate monitor, establish patient history. Challenge: how to combine sensor data with patient specific data from their health record, which might be different to medical best practice, clinical recommendations, etc?
… We're making this sensor data linkable - 10m triples per person per week, for example - standardise, and link to data about effective drugs.
… Announced in Nov, just working out how to do this. Open, closed and anonymisable data involved.
… We are handling temporal data and binary data. Do we want to convert binary sensory data, with an established community of tools, into RDF? Maybe not. If not, how to work with the binary and the (other) linked data?
… These things keep me… well, not awake at night, but certainly busy during the coffee break.
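A hedged sketch of what "making sensor data linkable" might mean in practice: one reading becomes a handful of triples. All URIs and properties below are invented placeholders, not the project's actual vocabulary (a real system would more likely use something like the W3C SSN ontology):

```python
from datetime import datetime, timezone

def reading_to_ntriples(patient_id: str, bpm: int, when: datetime) -> list:
    """Serialise one heart-rate reading as N-Triples lines,
    using made-up example.org URIs throughout."""
    obs = f"<http://example.org/obs/{patient_id}/{when.timestamp():.0f}>"
    return [
        f"{obs} <http://example.org/prop/patient> <http://example.org/patient/{patient_id}> .",
        f'{obs} <http://example.org/prop/heartRateBpm> "{bpm}"^^<http://www.w3.org/2001/XMLSchema#integer> .',
        f'{obs} <http://example.org/prop/takenAt> "{when.isoformat()}"^^<http://www.w3.org/2001/XMLSchema#dateTime> .',
    ]

triples = reading_to_ntriples("p42", 72,
                              datetime(2013, 4, 24, 9, 30, tzinfo=timezone.utc))
for t in triples:
    print(t)
```

At a few triples per reading, continuous monitoring quickly reaches the millions-of-triples-per-patient-per-week scale Neil mentions.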
… Non-technical challenges: main motivator for this paper: most open health data is on hospital numbers, costs of services, etc. But these are questions for policy makers; not as much emphasis on medical research.
… Found data on ECG and HBR stuff… but not as much emphasis on having a "broad church" of open medical health care data to generate further epidemiological and clinical research.
… Generating these datasets is labour-intensive. One researcher said teams of researchers working on a dataset would be useful… How to do on the Web?
… Could be that we have more administrative hospital data than clinical data because it's easier to lobby governments than universities and researchers?
… There still isn't much best practice on this. Vocabularies, dataset engineering patterns. We have patterns for building modular software… is there an equivalent here?
… Ex: There is an ECG ontology I came across… should I use it?
BillR: You should look at Linked Data Patterns, LeighDodds is one of the authors
Discussion with panel, including Albert Meroño-Peñuela
Albert: We work with historical censuses, encoded in thousands of .xls spreadsheets. We would like to query them uniformly, but they are extremely messy. We'd like to transform them into RDF Data Cube and other vocabularies using SPARQL queries.
Question: Bob Schloss: The value we seem to be talking about is mashups between datasets with unexpected results. Mapping was one of the first join points. What other join points do you see and do you agree this is critical?
Kal: Yes, I agree. Increasingly, I see a lot of time-series value type data, sets combined in a way that exposes latent knowledge. The biggest problem is vocabulary interoperability. OData doesn't have shared vocabularies, so we can't do conceptual joins with data tagged with different systems.
Bob: Let's reuse the requirements gathered from XBRL in the financial industry. They do have publicly listed businesses.
Neil: Open data is administrative, government-driven. People want to answer local questions, so that has driven a lot of the applications. But in that healthcare example, it's not geographically-specific. New disease patterns may not be tied to parts of a city.
… With regard to the vocabularies question… I don't want to learn about all the vocabs out there. In the same way I can modularly take a bit of a software library to see what's in it, I'd like to do the same with a vocabulary. I want to conceptualise my data first, and modularly pick a vocabulary.
Kal: The individual is an interesting join-point. For governments and otherwise.
Albert: In some domains, historical data is so badly degraded… and it may not have been intended to be comparable.
TomHeath: Re data engineering patterns: we do need to go further than Leigh's book. Hack-y stuff (download, grep, etc.), ad-hoc processes. Things are going on in the Hadoop community to describe these processes
Neil: The term dataset engineering patterns… [coining a new phrase]
Michael (from the EC): to Neil: re the link between closed/sensitive/open data… Are you looking at aggregated personal data that then can be opened? As in other areas of sensitive public data
Neil: we don't quite have a generic process for anonymising sensitive data. Some organisations do that… I'm just in the early stages of learning the issues around that.
questionasker?: concerned about applying the label of "open data" to data that's locked behind a query API. Do you share my concerns?
Kal: OData entity set that conforms to the standard is enumerable… It's an ATOM feed with Next links in it. You can download it. Also, a data dump isn't any better — you're relying on the server's capacity to provide the data and the data being up to date.
… I can see your point but I think it applies to all open data.
questionasker?: If I were going to mortgage my house to fund a startup on this data, I would see this as a problem.
Kal: Of course, there are different applications.
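Kal's point about enumerability can be sketched as a client following "next" links until the server stops sending one. The pages below are mocked dicts standing in for Atom feed pages; a real client would fetch each URL over HTTP:

```python
# Mocked paged collection: each page carries its entries plus an
# OData-style next link (the URLs and skiptoken are invented).
pages = {
    "/Films": {"entries": [1, 2], "next": "/Films?$skiptoken=2"},
    "/Films?$skiptoken=2": {"entries": [3, 4], "next": None},
}

def enumerate_all(start: str):
    """Yield every entry, page by page, following next links."""
    url = start
    while url is not None:
        page = pages[url]          # a real client would fetch over HTTP here
        yield from page["entries"]
        url = page["next"]         # stop when the server sends no next link

assert list(enumerate_all("/Films")) == [1, 2, 3, 4]
```

The same loop works whether the server imposes its own page size or the client asks for one, which is why a conforming entity set is downloadable in full.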
<rtroncy> scribenick: rtroncy
<scribe> scribe: rtroncy
<scribe> Chair: Alex Coley
Alex introducing the session, composed of three talks
Jon Jay le Grange - GeoKnow: Leveraging Geospatial Data in the Web of Data
EU Project GeoKnow: http://geoknow.eu/Welcome.html
scribe: inspired by earlier work
on transforming Open Street Map into Linked Data
... 3 major sources of open geospatial data
... spatial data infrastructures (compatible with almost all GIS), open data catalogue (SHP, KML files), crowdsourced geospatial data
... ontologies: basic geo vocabulary, GeoOWL ... and GeoSPARQL
... efficient geosparql RDF querying, fusion and aggregation of geospatial RDF data, visualization and authoring, public-private geo-spatial data (sync workflows)
... aim to provide a suite of GeoKnow Generator tools
... two use case scenarios: e-commerce and supply chain
... the GeoKnow generator is expected by December 2013
RRSAgent: draft minutes
scribe: see also: http://blog.geoknow.eu/
Michael Lutz - Interoperability of (open) geospatial data – INSPIRE and beyond
Michael: INSPIRE is a
... legal framework for establishing an infrastructure for spatial information in Europe
... 34 spatial themes
... implementation 2009-2020
... there is a growing interest in creating innovative products and services based on INSPIRE and other data
... we realize that with INSPIRE we cover a lot of topics of this workshop
... key issues with INSPIRE: enriching INSPIRE data models with application specific business data
... example: urban planning, waste management plans, environmental impact assessment, risk management on top of geo data
... beyond INSPIRE, traditionally linked with GIS formats and XML ... how do we move towards RDF?
... how to create and manage persistent identifiers
... implications of opening up data for the organisations: governance, long term commitments, etc.
... how to address those issues? ISA = Interoperability Solutions for European Public Administrations program
... see also: ARe3NA (INSPIRE reference platform), EULF (EU Location Framework)
... W3C LOCADD community group
... advertisement for the INSPIRE conference in Florence 23-27 June 2013
... ISA program http://ec.europa.eu/isa/
Mark Herringer - Open Data on the Web and how to publish it within the context of Primary health care
<scribe> unknown: question about identifiers, can we expect a better framework, e.g. URI in INSPIRE ?
Michael: in INSPIRE, there are 2
types of identifiers
... for data objects and for real-world things
... we recently relaxed how those identifiers are written and enabled http identifiers
<PhilA> Thank you Michael Lutz on URIs
Raphael: there are a number of
initiatives that try to take parts of the UML diagrams of INSPIRE
and build RDF schemas, see e.g. efforts from Laurent Lefort and
... are there plans to have an official schema in RDF for INSPIRE ?
Michael: yes, we will organize a workshop where everybody presents their modelling ... and we wish to arrive at an agreed-upon model
RRSAgent: generate minutes
<yaso> Lotte Belice about Open Culture Data
<JeniT> Scribe: yaso
<HadleyBeeman> scribenick: hadleybeeman
Johnlsheridan: It's 2020 and we've seen the failure of the world's first multibillion dollar open data corporation. How did this happen?
<yaso> Yes, I'm with connection problems
Conor Riffle: We've been looking at lots of business models. Sponsorship would be hard to scale to that level.
… Also look at people like Google who make tons of apps and sell ads on that.
JohnLsheridan: which of the eight business models Michele has identified could scale to that level?
Michele: Usually, all four actors are able to manage a huge amount of data. We have some enablers - usually they are scalable - but they do not serve end users. They're in a wholesale position in the value chain. Examples: Microsoft, Socrata.
… Many of them have other business lines, even outside the boundary of public sector information.
Irina: I think you'd want lots and lots of smaller companies, not one big one. As small music app companies are threatening the big distributors, a big company doesn't fit.
Bart: The Fire Department wants to be the authoritative source of information. They won't make a business out of it, but they will engage to have usable data.
Miguel: Risk to opening up data… fear of losing control. But benefit: they will be seen as the authoritative source. We see both.
Lotte: open data can bring big benefits to companies.
questionasker?: Do we all agree that we should build public infrastructure, basic datasets to build business models on top of? If we don't do it fast, a big multi-billion company may want to become a public infrastructure provider. Or the market will collapse and transform in another way. We, as a community, need to identify the basic datasets which will be the "streets" of open data.
JohnLsheridan: What are the basic datasets of interest for fire services?
Bart: Address data. Real streets. We don't have "highways" for open data yet; we have "rural roads."
… Large companies taking over scares the Fire departments as well. "What if a company over in America is holding our data?" An important discussion to have.
Johnlsheridan: Do you see CDP becoming that sort of infrastructure provider?
Conor: I think we are. Especially where companies are contributing pollutants to the atmosphere, it impacts all of us. But we see it's useful where people can make money out of it. Investors will use it. But there's more to do with it. We need a hybrid model: some monetisable, some open.
Bernadette: I'd recast the question: It would give me great joy if, next year, there were 20 companies of 10-100 people with $2-20m in gross revenues using this technology to share information, for profit (not grant-funded). We don't need yet another social network or cow-tipping site.
… If they are venture-funded, it would be with a social enterprise angle.
Chris Metcalf: In the US, I feel like we're seeing the steam come out of pure open data. We need to show the benefits, which are often business. We work with small businesses to do that. We need to focus on that in the community.
Bob: Infrastructure isn't always provided by regulators, grant makers and hackers/coders. It's sometimes created by lawyers and judges. I think some orgs and agencies are hesitating to publish open data because they're afraid of inaccurate records and resulting harm and subsequent lawsuits. We may need some case law to determine this.
… To Conor: because your data can impact stock price, do you have T&Cs to cover that?
Conor: We do have cleverly-written T&Cs. Many, many companies agree to them. Other orgs can learn from our lessons: we don't own the data submitted to us.
… To Chris: Yes, we need to create value from things built on public data, but also as a provider: how can we increase the value all along the chain?
??: What we see: one of the benefits is people correcting data and pushing it back to the publisher. Enhancing it, geotagging, improving our metadata.
… There was a company who wanted to make money out of the data, and we want them to succeed. But this is a public sector answer, I realise.
Lotte: Do not forget SMEs like ours: manufacturers, consulting services, pharmacies… they are the ones who will recreate the value in the data.
<scribe> … New standards, new protocols, new releases, new things.
phil tetlow: This isn't a level playing field. In the development of the Web, it's a case of survival of the fittest, driven by quality, quantity and cost.
… Chances are high that whoever that company is in the future, they are here today. I'm hearing that open data should be a communal type where everyone has a chance. Those at the front will probably stay there; this is a call to them to maintain the lead.
Thijs: Can we learn from the open source business models?
Michele: Yes, one of our models is called "open source like".
… where reusers do not pay. As with Open Corporates, Licenses allowing non-commercial reuse.
Conor: Ask: How did the open source software people monetise it? A lot of them got burned.
Thijs: Training, consultancy,
Bart: In the Netherlands, the interesting datasets are often 3GB downloads. They will pay someone to maintain it in a usable form for them. That's the added value.
<bhyland> Bart: Services model similar to what RedHat does — good packaging and great support for enterprises.
Irina: CKAN is both open source and open data. How do you make it sustainable for businesses who publish data? Isn't that only an issue for businesses who only sell data? If it's a by-product of something else, it may drive more traffic
John: final thoughts
Lotte: We're seeing a shift from the fear of publishing to the network of data and content. Besides data, I look forward to opening more videos and content.
Michele: The first enabler is the government itself. Gov has to build the governmental infrastructure. Inspiring motto from Federal CIO of USA: Everything should be an API.
… 1st step: publish open data, 2nd step: bring gov into the business model.
… data reuse. A shared data model across agencies.
Miguel: SMEs need data to create value and generate new business lines.
Bart: Fire fighting data work is 20% technology and 80% people and politics. I'd like to see this reversed.
Conor: We need to get the business model right both for the providers and users.
<StevenPemberton> Scribe: Deirdre Lee
<StevenPemberton> scribenick: DeirdreLee
<HadleyBeeman> Scribenick: hadleybeeman
<scribe> Chair: Julian Tate
<ivan> scribe: ivan
Opening up the BBC's data to the Web, Olivier Thereaux, Sofia Angeletou, Jeremy Tarling and Michael Smethurst
Sofia: The problem with the older approaches was that the material was not ours:
… we have only certain freedom to use it for some purposes
… another thing we were doing was to use MusicBrainz for the music website
… we do the same thing for the weather website
… we reuse a lot of data from open datasets
… also from wikipedia for nature and wild life
… we reuse the wikipedia id-s
… because the URIs are not static, the service breaks
… this is a big deal for the BBC
… we cannot blindly rely on dataset and we need editorial control
… these were the first efforts with using LOD
… all of these experiences convinced BBC to invest more into the SW stuff
… e.g. for the Olympic web site
sofia: the sport web site gets about 4 million users a day
<scribe> scribe: DeirdreLee
Sofia: next steps for BBC are to
roll out this approach beyond sport
... currently working on linking content together on news site
... trial from birmingham and black country will be rolled out nationwide in coming months
... will annotate news items with other pieces of related content
... would like to roll this out with archival content also
<bhyland> Appreciate Sofia's choice of headline at Google London office, "Google boss defends UK tax record to BBC" with byline "Eric Schmidt defends Google just paying 6M GBP in UK corporation taxes"
Sofia: diagram from presentation
shows content from archives, BBC hope to use Linked Data to
expose their data in interesting ways
... BBC have identified some challenges with publishing Linked Data (listed in presentation)
... what are the drivers for opening up their LD datasets, how to select good quality datasets, and how to measure success
Alvaro Graves from RPI up next on Democratizing Open Data
Alvaro: Good news: there are
millions of Open Datasets on the Web, billions of triples in
the LOD cloud
... Bad news, there is a lot of inconsistent noisy data out there
... but this can be solved with standards, etc
... other bad news is that much of the datasets out there is boring!
... for example, stale data
... there is also 'unusable' data, that the majority of the general public cannot use
... how can those without access to technical skills & expertise make use of Open Data?
... small-scale communities or journalists?
... If we look at the Web: in the beginning you needed a webmaster to develop web pages, but then tools like wikis and blogs came along that helped everyone to create web content
... this should be possible with Open Data too, to encourage use
... visualisations are an easy win to get people to make use of Open Data
... Visualbox, a tool for creating visualisations based on LD, used in a workshop
... feedback was positive, and people learned quickly. however SPARQL was deemed difficult by workshop participants
... another complaint was about the quality of the data
... Call to arms: we need better tools - libraries and APIs for geeks are not enough
... the general public usually has different needs. Citizens need to be empowered to use Open Data, so they don't need a PhD in Semantic Web to get started!
... visualisations are a good way to start
<JeniT> seems like http://www.tableausoftware.com/products/public is relevant re tools
subtopic: Andreas Koller from Royal College of Art, talking about Opening Open Data
Andreas: background in graphic design
... wants to discuss graphic design and coding, and tools that allow ordinary people to use Open Data
<alex> JeniT: I once asked them whether they had documentation for an API for whatever software Oxford had bought, and they pointed me back at our own people
<alex> (they didn't seem to do 'open' at that time)
<JeniT> alex: you still have to upload your data to them, I think, to use it, so not for everyone, but in terms of interface it's something to look at
Andreas: designers could help with data ownership and data ethics
<bhyland> RE: reference to the saying, "Data is the new oil!", see http://blogs.hbr.org/cs/2012/11/data_humans_and_the_new_oil.html
Andreas: When teaching students to code, they may have a fear of tools
<bhyland> Jer Thorp, "Any kind of data reserve that exists has not been lying in wait beneath the surface; data are being created, in vast quantities, every day. Finding value from data is much more a process of cultivation than it is one of extraction or refinement."
Andreas: having libraries for
existing designers' tools would enable easy access to Open Data
... as would low-level examples and lists of data catalogues
... This is an example of how Open Data could be opened up to another community
... small effort for Open Data practitioners, but would be of great benefit to other communities
... easy access to Open Data would enable designers (and other communities) to see the value within the data and enable them to use it and extract knowledge from it
subtopic: Benedikt Groß, Royal College of Art, Large Scale Data & Speculative Maps
Benedikt shows Data Viz Pipeline
Benedikt: most of what we have
been talking about today focuses on the left side of the pipeline
... will show some projects that use Open Data
<bhyland> The HBR article by Jer Throp nicely supports the thoughts of the speakers, (I think), "As we proceed towards profit and progress with data, let us encourage artists, novelists, performers and poets to take an active role in the conversation. In doing so we may avoid some of the mistakes that we made with the old oil."
Benedikt: Metrography visualises the London tube map with OpenStreetMap data as a mental map, by mapping actual locations onto the tube map using mathematical models
<StevenPemberton_> He showed the mapping from true life to the tube map, and then reversed the process to make a real map with the same distortions
Benedikt: Speculative Sea Level Explorer project combines NASA data on sea level with map visualisations to show the effects of rising sea levels
... sneak preview of m3ta.js, a visual programming language with a Lego-block metaphor
<bschloss> Fascinating to see what Royal Academy of Art people can do for visualizations. Can less skilled people do something nearly as good. My IBM colleagues are experimenting with a site called Many Eyes 2.0 (beta) at http://www-958.ibm.com/software/analytics/labs/manyeyes/
subtopic: panel discussion
Julian: do you see yourself creating a toolbox for visualising open data?
Benedikt: it's great to release tools, but you can't just release source code; you need documentation and examples too, which is time-consuming
Alvaro: you can't just release code/tools/projects; you are responsible for maintaining them (like kids :) )
<yvesr> had very good experiences with http://d3js.org/ for data visualisation - very powerful toolkit
Question from audience
<bhyland> @Alvaro, Interesting analogy, Open Source is like a marriage, 'it comes back and you have to answer questions… it is also like children, you cannot let them out into the wild [without guidance]' ;-)
Ivan: if you have to convince CNN in an elevator pitch to use the approach as BBC, how would you do it?
yvesr (BBC, from audience): focus on your own data, and use Open Data where possible to fill the gaps
TimBL: Who publishes data about
their own products?
... if people publish data about their own products, there won't be a need for CNN to publish data
<bhyland> I invite everyone to publish information about their organization, project, product and/or service on the Web today using http://dir.w3.org.
<bhyland> If you care, it is an entirely Linked Data app. If you don't care, just fill out the form, publish the dir.ttl file produced for you automagically (like FOAF-a-Matic) on the public Web and submit it for harvesting.
sofia: there is so much in archives; it's not just about publishing data, but about reusing it
Comment from audience: metadata is advertising for your data
<bhyland> RE: dir.w3.org, if you want to read an FAQ, see http://dir.w3.org/directory/pages/faq.docbook?view
Neil Benn (Fujitsu): in 2020, what will the political arguments have been that convinced governments to publish Open Data?
Alvaro: it's socially beneficial for everyone; Open Data enables people to solve more problems
... in Chile, a lot of money is being invested in start-ups and entrepreneur programmes; is it not fair to ask for a similar spend on democratising data?
Benedikt: in the future, there might not be an open data debate; it will just be the standard
Bschloss: TimBL alluded to a key thing: CNN will have to put out metadata on related content
... uses the example of airlines putting out ticket information because they wanted to be listed
Andreas: the key is that the entry level for using Open Data is very low
bhyland: there is now a community directory online at dir.w3.org
<timbl> logger, pointer?
<bschloss> CNN will have to put out metadata or risk losing sales or eyeballs. Let's learn from history where first movers got value (like Airlines that listed their schedules and prices on GDS', then other Airlines followed rapidly to not be at a disadvantage)
... to list Linked Data products, services and projects
<alex> bhyland: the "Create an entry" link at http://dir.w3.org/directory/pages/faq.docbook?view doesn't work, and there's a missing stylesheet error when one goes where you'd think it should have pointed
<bhyland> On behalf of the W3C Gov't Linked Data Working Group, I encourage everyone attending this workshop to add their organization to dir.w3.org today or tomorrow.
<alex> (ah; I'd missed the '?view' off the end of my guessed URL)
sofia: important to show the value to publishers of opening up data
<bhyland> It is simple to do, fast and gets more valuable Linked Data on the public web … plus it builds community & helps us all help one another.
<StevenPemberton> Best Buy reports a 30% increase in page views and a 15% increase in click-throughs
Alvaro: if a major part of the population cannot access the data, the technical discussions are irrelevant; the general public needs to be empowered to access and use Open Data
<bhyland> @alex, what browser are you using? I see it ok on FF & Chrome
Andreas: agrees; the general public should realise Open Data is THEIR data
<rjw> bhyland: the Create an entry link on http://dir.w3.org/directory/pages/faq.docbook?view fails :-(
Benedikt: things are looking positive; let's hope we implement even 30% of what we have been discussing here today
<bhyland> Ah Alex, I see the problem, try this http://dir.w3.org/directory/pages/create-entry.xhtml?view
<bhyland> Thanks for pointing out that incorrect link, will fix now.