07:50:05 RRSAgent has joined #odw
07:50:05 logging to http://www.w3.org/2013/04/23-odw-irc
07:50:38 PhilA has changed the topic to: ODW13 Day 1
07:50:48 meeting: Open Data on the Web Day 1
07:50:52 chair:PhilA
07:51:36 jpcs1 has joined #odw
08:19:22 jpcs1 has joined #odw
08:25:33 jpcs1 has joined #odw
08:25:39 ivan has joined #odw
08:25:45 timdavies has joined #odw
08:25:47 daveL has joined #odw
08:25:49 JeniT has joined #odw
08:26:01 Steven has joined #odw
08:26:29 yvesr has joined #odw
08:27:23 markbirbeck has joined #odw
08:27:43 mig_garcia has joined #odw
08:28:01 danbri has joined #odw
08:28:05 floppy has joined #odw
08:28:12 cjg has joined #odw
08:28:18 bschloss has joined #odw
08:29:02 scribe: PhilA
08:29:05 Agenda: http://www.w3.org/2013/04/odw/agenda
08:29:08 scribeNick: PhilA
08:29:27 Topic: John Sheridan - Building our houses on rock
08:29:37 paper http://www.w3.org/2013/04/odw/odw13_submission_25.pdf
08:29:38 laurent_au has joined #odw
08:30:04 John Sheridan from the National Archives begins talking - move from 'ephemeral, temporary' world to a world where there is more confidence in our data
08:30:06 JohnS: Talking philosophically about the need for longevity
08:30:21 ... how do I discover data that I can trust and rely on, use etc.
08:30:25 How to discover open data I can trust and rely on
08:30:46 JohnS: How do we firm up our open data so we can begin to use it and re-use it confidently and well
08:30:58 JohnS: Sustaining our open data. How do we do that.
08:31:15 bhyland has joined #odw
08:31:16 ... our budgets are declining. how do we sustain our publishing activity
08:31:41 ldodds has joined #odw
08:31:48 AndyS has joined #odw
08:31:58 JohnS: Share the responsibility of supporting and curating open data
08:32:11 libby has joined #odw
08:32:17 ... open data community is good at coming up with the rock to build on
08:32:40 yoshiaki has joined #odw
08:32:45 JohnS: I work for a reputable institution.
You'll trust the data if you trust the institution
08:32:51 JohnS: Adds solidity
08:33:10 cerealtom has joined #odw
08:33:19 JohnS: Extreme end is legislation. e.g. INSPIRE regulations that demand certain data norms
08:34:08 JohnS: How can we know if policies like INSPIRE will work? Should we be asking for more of that or going to people like the National Archives and asking them for commitments
08:34:25 JohnS: There's a lot to do to build our data on rock
08:34:26 edsu has joined #odw
08:34:33 lottebelice has joined #odw
08:34:45 JohnS: The ODI certificate may be one of the most important things for the community to work on this year
08:34:49 jpcs1 has joined #odw
08:34:58 http://theodi.github.io/open-data-certificate/
08:34:58 fumi has joined #odw
08:35:04 JohnS: it would be good to discuss here what role things like the ODI certificate can have
08:35:23 JohnS: Talking about the Gazettes (London Belfast etc.)
08:35:50 JohnS: This is about putting things on the public record, where data is available, provenance and authenticity supported and availability guaranteed
08:35:56 ... service will be completed by September
08:36:04 ... how do we see more services like this come into existence
08:36:15 ... it's about devising tracks
08:36:40 ... the way forward to make all this happen with a solid basis, that we can build on
08:37:05 cgueret has joined #odw
08:37:19 JohnS: No one organisation can do this on its own, we need to act as a community to solidify our efforts
08:37:52 The ODI certificate is really interesting, something to consider for things like http://openglam.org/principles/ and http://www.opencultuurdata.nl/about/
08:37:52 jpcs1 has joined #odw
08:37:53 topic: Can open data (and big data) be used to improve the operations of development organisations?, Millie Begovic Radojevic
08:38:07 paper http://www.w3.org/2013/04/odw/odw13_submission_3.pdf
08:38:36 Millie: UNDP spends about $5bn a year that generates a lot of data. Have we improved things?
What effect have we had
08:38:44 ... we also generate procurement data
08:38:56 ... we use that data mostly for accountability purposes
08:39:13 .... we've been wondering what other insights might be accessible from that data
08:39:30 ... can we work out which projects will be most effective
08:39:46 Alexrcoley has joined #Odw
08:39:47 ... what about the companies we pay, who is most effective, who do they employ etc
08:39:52 rrsagent, set log public
08:40:12 rjw has joined #odw
08:40:17 Millie: We started a series of events called Data Dives where we worked with people we don't normally work with
08:40:21 rjw has left #odw
08:40:26 ... data analysts, programmers etc
08:40:35 ... are there problems that we're not asking that we should be asking
08:40:56 Millie: We'll be opening a new challenge prize shortly for the best algorithm
08:40:59 StevenPemberton has joined #odw
08:41:13 stressindikator has joined #odw
08:41:41 Millie: We took data from the World Bank on major contracts in 2007. We were interested in the suppliers and the relationships between those companies
08:42:00 PhilA: As an aside - must introduce Millie to Chris Taggart this evening
08:42:17 Millie: Certain companies tend to win contracts in particular sectors
08:42:50 ... two companies dominate this sub network of projects. What happens to the sub contractors if something goes wrong with the main contractor - few points of failure
08:43:00 ... do certain clusters of companies tend to bid together
08:43:21 ... we see clusters. Are these people really good or is there something else going on?
08:43:34 ... do contracts go to home countries or from the more developed world
08:43:40 rrsagent, here?
08:43:40 See http://www.w3.org/2013/04/23-odw-irc#T08-43-40
08:44:20 Millie: A few hours' work produced these insights
08:44:30 trc has joined #odw
08:44:42 ...
the World Bank folks had the data but not the insights which actually didn't take a huge time to create
08:44:56 This analysis might be interesting (and easy) to apply to http://gtr.rcuk.ac.uk/ ...
08:44:57 richardm has joined #odw
08:45:07 http://www.w3.org/2013/04/odw/odw13_submission_3.pdf is a 404 for me btw
08:45:09 Millie: shows visualisation of projects and performance
08:45:32 edsu, i think the whole paper is in the 'abstract'
08:45:47 PhilA: Thanks edsu - I'll fix that when I'm done scribing
08:45:59 jpcs1 has joined #odw
08:46:08 Millie: It's not big data, it's lots of little data scattered around
08:46:35 danbri: thanks I found http://www.w3.org/2013/04/odw/papers now :)
08:46:39 ... global challenges coming up. We need help, people in orgs who can help open more data sets and help us get more inshights out of that data
08:46:46 danbri: would make interesting reading, although I've not seen any open data on that?
08:47:15 re eu, I think you'd need a temporal view... some partners sorta dominate, then EU notice that and punish them in later rounds
08:47:17 this was the link from the final slide of the talk: http://europeandcis.undp.org/
08:47:32 AndreaP has joined #odw
08:48:17 Topic: Researching the emerging impacts of open data in developing countries (ODDC) Tim Davies
08:48:22 rjw has joined #odw
08:48:37 paper http://www.w3.org/2013/04/odw/odw13_submission_19.pdf
08:49:16 TimD: Poses questions - why people are interested in open data - transparency, innovation, inclusion and empowerment
08:49:30 MLutz has joined #odw
08:49:32 ... the way we do open data can make it easier to realise these differnet aspects
08:49:57 richardm has left #odw
08:50:47 s/differnet/different
08:50:58 TimD: Talking about the launch (tomorrow) of ODDC
08:51:13 ...
Web Foundation and OGP are behind it
08:52:08 bhyland has joined #odw
08:52:13 s/inshights/insights
08:52:18 chrismetcalf has joined #odw
08:52:41 TimD: Slides are expressive and contain the gist of the talk
08:52:56 TimD: Draw out some key points
08:53:12 TimD: As we've seen, supply needs to be built on solid foundations
08:53:36 naomi has joined #odw
08:53:50 TimD: Are we building platforms that rely on always on high capacity systems in rural areas of the developing world
08:54:13 ... are the standards right? We articulate standards but are the right people in the room
08:54:39 ... loads of stahndards being specified - but do they work in all contexts? Does a London-based system work in Kenya?
08:54:51 s/stahn/stan/
08:55:23 TimD: Are the licensing arrangements correct? Are first movers keeping others out?
08:56:08 TimD: We have opendataresearch.org and more - see sldies
08:56:40 Topic: Open Data NEXT: a strategy for social & economic value from Linked Open Data Hayo Schreijer
08:56:57 paper http://www.w3.org/2013/04/odw/odw13_submission_50.pdf
08:58:16 Hayo: Talking about the Dutch linked data project in NL
08:58:31 Hayo: We started our open data programme 2 years ago
08:58:43 ... want to help government depts open their data
08:58:44 good collection of questions there
08:58:46 ...
08:58:52 what problem are we solving?
08:58:56 ... now 6K data sets from national and local administrations.
08:59:04 why spend money on opening data?
08:59:07 ... some great apps but not really solving real problems
08:59:13 why is nobody using our data?
08:59:25 why don't they build an app like...?
08:59:27 ... what actual problem does it solve? Where are the apps that do clever stuff?
08:59:43 Hayo: we've reached a kind of impasse; governments are losing enthusiasm
08:59:44 s/sldies/slides
08:59:51 Hayo: We need to look at how OD is being used to solve real problems?
09:00:05 Hayo: our approach: focus on real-life problems
09:00:18 Hayo: Purple areas on shown map are where population is declining, orange it's growing
09:00:29 e.g. disadvantaged and depopulated areas
09:00:42 Hayo: we want to help those people with the real problems, disadvantaged areas etc.
09:01:02 Hayo: trying to bring companies together, working on the problem
09:01:25 ... There's a problem of continuity. data is opened once and not updated
09:01:37 ... produced for one hackathon and then stopped
09:01:46 ... we're tackling that with linked data
09:02:14 ... NL has a lot of open data around legislation, case law etc. Gov not using it, they're buying it from people who put wrapper around our data and sell it back
09:02:36 Lieke has joined #odw
09:02:39 ... can we reduce the amount of money we spend on getting our own data and maybe we can profit from it ourselves
09:03:32 Hayo: We notice that policy makers often say "I base my policy on law x" - people make comments or annotations - we can use those in linked data and make the data more useful
09:03:42 ... shows nice labelled directed graph
09:04:31 Hayo: We're allowing people to make real links between laws, policies, their text or whatever
09:04:37 ... what marketeers call deep linking
09:05:03 ... we reward people for linking to laws. We contact people and say, Ok you link to the law, how about linking to this policy
09:05:16 ... we can notify people that link to a law as it's clearly important to them
09:05:25 ... laws have versions
09:05:38 ... need to be able to point to a law as it was in 2010 etc.
09:06:34 Hayo: System will be available in September - getting government people enthusiastic about using their open data. This is a good example of showing govs how they can use their data
09:06:44 ... of course others can use it too.
09:07:19 Topic: Open Data on the Web: 3 Principles For Maximum Participation, Bob Schloss, IBM
09:07:31 paper http://www.w3.org/2013/04/odw/odw13_submission_54.pdf
09:07:44 slides (already!) http://www.w3.org/2013/04/odw/W3COpenDataBriefingMaximumParticipation2013Apr19.pdf
09:07:51 cjg_ has joined #odw
09:08:41 BobS: We put together what we've put together when considering what we think might be missing
09:08:52 BobS: I think it's great when we get lots of open PSI
09:09:04 ... we need it in educational, arts and business worlds too
09:09:13 ... we need to get a virtuous circle where value is created
09:09:38 amp has joined #odw
09:09:39 ... looking at an Irish linked data front end
09:09:41 HadleyBeeman has joined #odw
09:10:14 BobS: we started in Oct 2011 with 4 Irish authorities (Dublin + 3)
09:10:32 BobS: Looked at the cost/benefits of uploading open data
09:10:53 ... this issue that the people who publish trust that their effort will deliver a return
09:11:05 ... people have to want your data and they want it in their forma
09:11:07 yaso has joined #odw
09:11:10 ... (not yours)
09:11:24 ... you need to be able to state how complete is the data, when and where does it cover etc.
09:11:44 ... whole cluster of ideas
09:11:59 ... you can synthesise this open data with yours and do good stuff
09:12:02 s/forma/format/
09:12:07 BobS: The three principles
09:12:25 BobS: (see slides)
09:12:45 s/slides/ slide 6/
09:12:48 cjg has joined #odw
09:12:54 slide 6
09:12:59 s/slide 6//
09:14:07 BobS: Slide 7 for the second principle
09:14:25 ... talking about things like showing logos for limited time, potentially contacting data users
09:14:48 ... need to be able to log if there's a new version of the data
09:16:26 disturbed a bit about the additional limitations bschloss is suggesting for "open" data
09:16:43 seems to be stretching what "open" means beyond the usual definitions
09:16:45 Bingo!
09:16:46 +1
09:16:59 "What if terrorists use our data" is on my bingo card: http://is.gd/gXDEaG
09:18:35 (but to be fair, hazardous materials is actually a reasonable dataset to keep limited access. )
09:19:32 Topic: Q&A session
09:19:32 Except if you want to see if there is hazardous material stored near your school. #west
09:19:45 yaso_ has joined #odw
09:19:58 cjg: puts a new spin on the JISC's ‘The coolest thing to do with your data will be thought of by someone else.’
09:20:18 JohnS: We make institutional commitments
09:20:36 Hayo: Our governments trust third parties more than our open data
09:20:39 markbirbeck has joined #odw
09:20:42 takumi has joined #odw
09:20:43 ... we're trying to educate them
09:21:09 TimD: We're trying to talk about purposes and use of data more than you need to publish in a given format etc.
09:21:42 Millie: This is a room full of evangelists, the shift in thinking needed is enormous, don't underestimate that
09:22:01 masao has joined #odw
09:22:27 TomHeath: I like John's quotes. I don't like "if you agree with me you're wise if not you're a fool"
09:22:40 TomHeath: How do we convince others of the wisdom
09:23:15 BobS: What we're doing in Dublin - we capture the identity of the app, program and org that downloads everything and there's an offline process for assessing the value of that
09:23:30 ... then go back to the data publisher and tell them what's going on, what people are doing with your data
09:23:50 cjg_ has joined #odw
09:24:28 aside: best way to convince people is to show them the utility of it, not appealing to their better (wiser) nature imho
09:24:46 pascalRomain has joined #odw
09:25:17 cjg_ has joined #odw
09:26:34 &coffee;
09:26:56 edsu: I swear that we've had people suggest that if terrorists got access to the live bus times they could use it… there's a wear and tear on my desk from banging my head on it.
09:27:34 PhilA has joined #odw
09:27:59 PhilA: grrr dropped off IRC, sorry, missed some comments and questions
09:28:02 yeah, I've got a talk at IWMW this year about how open data can get better value for money -- seems a good way to think about it in these tightened times.
09:28:28 BobS: IBM has been looking at specific cities. We don't push up hill - we find the people that want to do open data
09:28:38 BobS: We also need to find the person in the street
09:28:51 ... we don't have 'how open data can improve your life' days
09:29:04 Hayo: Yes, talk about problems, not open data
09:29:25 TimD: Yes, we want data you can build upon in gov and society
09:29:41 TimD: Lots of great examples from places like Sao Paulo
09:29:59 ... talking about accountability and capacity not open data
09:30:08 danbri_ has joined #odw
09:30:08 We have a policy of always putting a front end on our open data; even if it's as simple as a basic HTML page. 99% of the users are just using that and not the underlying data, but that's OK.
09:30:14 ... so the new research project will include lots of case studies from Brazil.
09:30:37 BobS: In Africa, the knowledge of prices for their farming goods is transforming farming
09:30:50 cjg: :-D
09:30:59 BobS: So we've been working on projects for people who can't read - working on spoken web in India
09:31:13 rrsagent, make minutes
09:31:13 I have made the request to generate http://www.w3.org/2013/04/23-odw-minutes.html StevenPemberton
09:31:27 Millie: In the Balkans we have an issue of forest fires and consequent air quality
09:31:35 ... I want to know if my child can go out on the street
09:31:42 ... we have kids building air quality monitors
09:31:51 ...
we move to solutions too quickly
09:32:33 cjg: it's hard to develop all the apps/visualizations people want ; giving them the data and empowering them to do it seems like a no brainer -- except to people who don't want new interesting visualizations of their data :)
09:32:45 JeniT: For Bob - you spoke about the need for collecting data about people using the data and restricting terrorists' access - that's not the usual definition of open data
09:32:56 BobS: I see a spectrum, not a point
09:33:17 I generally tell people that "open" means removing as many barriers as possible
09:33:30 ... we're going to have rock solid stuff - it will be there and accurate for 9 years. Then there's softer and softer - we need to cover the spectrum
09:33:31 the barriers can be technical, social or legal.
09:33:58 "as open as possible" can still be used to describe data which is confidential.
09:34:03 For reference, I think JeniT is referring to the Open Definition http://opendefinition.org/
09:34:09 markbirbeck has joined #odw
09:34:35 Great question… I've been wondering as well if we're still having the same discussions (as we were a year or two ago).
09:34:37 bhyland: Yes. we're all evangelists but we're not working in a vacuum. There are people in gov who are not minded to hand data over to a bunch of smart people they don't trust
09:35:21 Hayo: It takes patience. We have to change contracts occasionally. We changed our legislation publishing contractor 5 years ago - that made a big difference
09:35:56 I think he said that it took 5 years to change the contract
09:36:03 tomag has joined #odw
09:36:21 and only then could they use their own data
09:36:34 Millie: [Scribe note - sorry, I missed Millie's comment about Pulse??]
09:37:42 PhilA: Yes, my pencils are sharpened.
09:38:00 yaso_ has joined #odw
09:38:27 Billr: My experience as a private sector person working for gov - see that some of the bigger people only just picking up the potential for open data.
Some early birds are winning
09:38:40 s/PhilA: Yes, my pencils are sharpened.//
09:38:46 Last thoughts...
09:39:04 JohnS: Spend more time talking to people not involved with open data about fixing problems
09:39:15 BobS: OD is a means, not an end. talk about the ends
09:39:25 bhyland has joined #odw
09:39:25 Hayo: OD will take time and money. Maybe 5 years +
09:39:32 BillR: +1
09:39:34 floppy has joined #odw
09:39:53 Millie: UNDP uses taxpayers' money to change people's lives - we need help
09:40:08 TimD: Think about who's in the room when we define standards
09:40:46 Topic: The Role of PDF and Open Data (Jim King, Adobe)
09:41:17 scribenick: markbirbeck
09:41:19 timdavies has joined #odw
09:41:23 Scribe: Mark Birbeck
09:41:34 bhyland has joined #odw
09:41:47 Paper: http://www.w3.org/2013/04/odw/odw13_submission_52.pdf
09:42:17 Concluding remark from first session: "Open data is a means, not an end. Come at it from what real world problems it will solve."
09:42:24 "
09:42:46 HadleyBeeman has joined #odw
09:42:49 Paul Davidson introducing James King - senior principal scientist at Adobe - to talk about how PDF is more open than we all think it is.
09:42:57 BobS++ concur
09:44:27 Structure of talk: open data paradigm, PDF itself, and then its role in open data.
09:44:42 bhyland has joined #odw
09:44:44 s/:/-
09:44:46 jpcs1 has joined #odw
09:45:08 Organisations taking data, shaping it and presenting it.
09:45:33 …but others - the "processors" - would prefer to deal with the raw data...
09:45:58 …they might present that too, but also use the data to draw new conclusions, or use it for advocacy.
09:46:17 …A further group is that of the tool providers, who will help us process this data.
09:46:40 …About 30% of the room are providers...
09:46:53 …80% are processors...
09:47:23 …most are consumers, and some are tool providers.
09:48:11 …PDF will be 20 years old this June.
09:48:23 cjg has joined #odw
09:48:24 …PDF and Acrobat are different beasts.
09:48:57 …The internals of PDF have always been published, and it became an ISO Standard in 2008.
09:49:06 PhilA: Nice approach to backwards compatibility from Adobe for PDF
09:49:18 …A PDF 1.0 doc is also a 1.7 doc - always backwards compatible.
09:49:27 Jim King: PDF will be 20 years old this June. PDF 1.7 became an ISO Standard in July 2008. ISO work on PDF is ongoing.
09:49:40 hopefully mozilla's pdf.js will get a mention ...
09:50:55 …To make the PDF spec into a 'proper' ISO Standard the team at Adobe had to go through the entire document…very thoroughly…
09:51:12 amp has joined #odw
09:51:31 …PDFs are abundant, containing lots of useful information.
09:51:50 I had surprisingly good results converting our student union committee minutes from PDF to RDF: http://lemur.ecs.soton.ac.uk/~cjg/TheyWorkForSUSU -- just looking at where on the page text appears gives more semantics than the naive pdf2utf8 (or 2html) approach.
09:52:12 …It's a format that distinguishes between text and graphics, and can be used to produce good looking documents.
09:52:22 …But it's not a data format.
09:52:27 cjg: i think that's roughly what google scholar does when it scrapes pdfs
09:52:55 bschloss has joined #odw
09:53:15 …Billions of documents out there, but difficult to extract any data that's in there.
09:53:38 cjg: grabbing the largest text at the top of the first page as the title
09:53:51 …If pages *contain* graphics then extract that with something like Illustrator.
09:54:09 …If pages are text then there's a bunch of software that can process the text.
09:54:23 …(A big list is on Wikipedia.)
09:54:37 There is a 'spectrum of open data' -- totally free, available forever, no recording of downloader is one end of that spectrum, but airlines, investment markets, sports leagues, available job listing websites, retailers are all doing open data on a slightly different point on the spectrum.
09:54:50 …And if the pages are images (i.e., rather than *containing* images) then need to go the OCR route.
09:54:57 trc has joined #odw
09:55:07 http://en.wikipedia.org/wiki/Comparison_of_optical_character_recognition_software
09:55:30 We found a nice command line tool which converts PDF to an XML representation of the data structure inside and that gets it into our 'hacking comfort zone'
09:55:46 -> http://en.wikipedia.org/wiki/List_of_PDF_software wikipedia list for pdf tools
09:56:11 … http://en.wikipedia.org/wiki/Comparison_of_optical_character_recognition_software
09:56:19 … http://en.wikipedia.org/wiki/List_of_PDF_software wikipedia list for pdf tools
09:56:25 (Thanks Ivan and Steven!)
09:56:58 …If you're making PDFs, here's what you could do to make things easier.
09:57:37 …Making files that both contain raw data and look good is difficult.
09:58:32 …There *is* software around that can embed metadata to provide structural information.
09:58:40 AndreaP has joined #odw
09:59:09 Seems to me that any producer of a PDF who wants it to be available to people with no sight is hopefully providing a table or textual alternative rendering in the PDF for any diagram or image in the PDF, yes?
09:59:17 …The structural information would be stuff like reading order, tags such as headers, footnotes, figures, maths, and so on.
09:59:42 …Tools can make use of this extra data which will make the extraction process much more reliable.
10:00:13 markbirbeck1 has joined #odw
10:00:25 …A second thing to do is make use of the attachment facility.
10:00:37 scribenick: markbirbeck1
10:00:45 …A second thing to do is make use of the attachment facility.
10:01:08 lottebelice has joined #odw
10:01:27 …Raw data on its own is probably insufficient for doing something useful.
10:01:48 rrsagent, make minutes
10:01:48 I have made the request to generate http://www.w3.org/2013/04/23-odw-minutes.html StevenPemberton
10:01:53 …For example, what's the currency? the data format?
the semantics of the fields? provenance?
10:02:00 the attachments-in-PDFs thing might actually be useful for scholarly publications, so that the data doesn't get divorced from the paper
10:02:27 …So we create a PDF file that contains raw data with a schema, giving the end-user everything they need.
10:02:51 s/(Thanks Ivan and Steven!)//
10:03:04 bhyland: yeah, presumably there's not the tools support beyond what Adobe sells
10:03:06 …Can then make use of all the nice PDF features that have evolved over the last 20 years, such as digital signing.
10:04:03 …There are some examples in the slides.
10:04:23 AndyS has joined #odw
10:04:23 bhyland: same could be said of most metadata on the web
10:04:48 Peter Murray-Rust: Spent years hacking PDFs in the wild.
10:05:15 …Trying to write software that will process them, but they are generally pretty bad.
10:06:08 …If anyone else is trying to hack on this then please talk to me; there's hundreds of billions of dollars worth of information out there that is simply unusable at the moment.
10:06:31 I had a bit of a rant about PDFs as a way of communicating data to a reporter from the register, which resulted in them publishing this: http://lemur.ecs.soton.ac.uk/~cjg/Archive/Photos/2011/cjg-boffin.png (I'm quite proud of that)
10:06:39 Dan Brickley: Is this thing loud enough?
10:07:04 …PDF can be used well and powerfully, and of course it's clear that some people aren't using it well.
10:07:08 heh, re: billions of dollars worth of information that's unusable, you have to wonder if that's by design, not by accident ...
10:07:14 …You didn't mention XMP, though, which includes RDF.
10:07:24 …You also didn't mention accessibility.
10:07:44 Peter Murry-Rust - Scientific publishers are paid $10B/yr worldwide to lock up scholarly publishing, that is after governments spend $100B/yr globally on scientific funding for R&D in the first place. He is looking for people to help him in his mission to unlock the enormous value locked in PDFs.
10:08:45 serena_v has joined #odw
10:08:46 s/Murry-Rust/Murray-Rust
10:09:08 James: The accessibility aspects are quite mature in PDF, and the structured aspects help that.
10:09:13 roger has joined #odw
10:09:15 PDF is a page description language, so not in a reading order necessarily
10:10:04 …We don't have much control over what people produce, although things have improved in the last 5 years.
10:10:45 @edsu - perhaps re: your comment above. My experience suggests that we're more thoughtful publishing structured data about data sets (metadata) because they are fewer in quantity whereas PDF are like water, they are everywhere and almost "too easy" to create but the mere click of "Print -> PDF" …
10:11:03 s/but the/by the
10:11:07 speaker: For many people PDF data is closed data.
10:11:31 yaso has joined #odw
10:11:55 speaker2: You've outlined many things I didn't know were possible, so why is there not the uptake on these features?
10:12:45 @hadleybeeman - because the tools are proprietary, complex to use … at least harder than clicking "Print -> PDF" and well let's face it, people are lazy and hand entered metadata has been proven to be *very* challenging and highly inconsistent.
10:13:36 James: Not sure if it's our fault. In some areas there have been successes, perhaps where there's industry interest or our sales people have promoted a feature.
10:13:52 s/speaker2/hadleybeeman/
10:14:07 s/speaker2/HadleyBeeman/
10:14:21 markbirbeck has joined #odw
10:15:08 If they want stuff like metadata to be adopted, then surely they need to encourage support in tools other than their own (OpenOffice; Word)
10:16:19 hideaki has joined #odw
10:17:16 yoshiaki has joined #odw
10:17:33 hideaki has left #odw
10:18:26 JeniT has joined #odw
10:26:39 jpcs1 has joined #odw
10:29:19 StevenPemberton has joined #odw
10:31:16 floppy has joined #odw
10:31:39 fumi has joined #odw
10:32:40 rjw has joined #odw
10:33:05 bhyland has joined #odw
10:33:16 cjg has joined #odw
10:33:18 stressindikator has joined #odw
10:33:26 cgueret has joined #odw
10:33:47 st has joined #odw
10:33:56 markbirbeck has joined #odw
10:34:15 ldodds has joined #odw
10:34:17 hideaki has joined #odw
10:34:17 yoshiaki has joined #odw
10:34:44 StevenPemberton has joined #odw
10:34:46 yaso has joined #odw
10:34:48 rtroncy has joined #odw
10:35:06 takumi has joined #odw
10:35:12 rrsagent, here?
10:35:13 See http://www.w3.org/2013/04/23-odw-irc#T10-35-12
10:35:30 Topic: Panel: Tabular Data Formats and Packages Chair: Jeni Tennison
10:36:17 Jeni: sets the tone around different formats for tabular data, advantages & disadvantages of various approaches.
10:36:33 … NB: Special allowance for Rufus who has been known from time to time to go on ...
10:36:44 markbirbeck1 has joined #odw
10:36:50 serena has joined #odw
10:36:54 Rufus: Intro on OKFN and their mission to liberate data
10:37:03 HadleyBeeman has joined #odw
10:37:09 cjg_ has joined #odw
10:37:12 … Proposed "Our Mission is to make it radically easier for data to be used & useful"
10:37:16 scribenick: bhyland
10:38:26 Rufus: Stated problem of data on the Web in many different formats & issues that poses.
10:38:38 http://data.okfn.org/
10:38:45 floppy has joined #odw
10:38:46 johnlsheridan has joined #odw
10:39:22 naomi has joined #odw
10:39:31 … propose 3 minor innovations involving "borrowing" approaches others have used before us.
10:40:31 In this model, there are the usual suspects … data creators & packagers, consumers and the effort in the middle to do "data packaging"
10:41:03 … Linked Data effort has been knowledge APIs and has been successful [to varying degrees]
10:42:40 … Packaging has to be done as a distinct step that minor packaging effort, agnostic about the data, its packaging is designed specifically ...
10:42:52 masao has joined #odw
10:43:18 pieterc has joined #odw
10:43:50 … Today, there is a huge amount of friction on getting & using data on the Web. We want to build for the Web. Rufus said RDF is not Web native … he has been laughed at when proposed its use ...
10:44:31 Proposal: 1 - One (small) part of the data chain; 2 - Build for the Web; 3 - 4 - 5 [too fast to record]
10:44:53 AndyS has joined #odw
10:44:58 rrsagent, make logs public
10:45:04 Concluding remark: Package data more effectively and produce one killer tool to make data more accessible.
10:45:06 rrsagent, draft minutes
10:45:06 I have made the request to generate http://www.w3.org/2013/04/23-odw-minutes.html PhilA
10:45:22 Speaker: Omar Benjelloun, Google, DSPL
10:46:12 Omar highlighted Google public search feature, Knowledge Graph capability and origins.
10:46:14 s/DSPL/GPDE
10:46:44 StevenPemberton has joined #odw
10:46:55 … Highlighted the Public Data Explorer, using Data Cube representation. Anyone can upload & share data using RDF.
10:47:44 Oh. ignore my s/DSPL/GPDE
10:47:52 DSPL = Dataset Publishing Language, describes tabular data + semantic description including concepts describing re-useable data types. All packaged in a zip file. Visualizations can be shared.
10:48:09 rrsagent, hre?
10:48:09 I'm logging. Sorry, nothing found for 'hre'
10:48:15 rrsagent, here?
10:48:15 See http://www.w3.org/2013/04/23-odw-irc#T10-48-15
10:48:54 Omar's Propositions: Datasets need good Web pages with stable, official, up-to-date canonical location. Also, add good markup for reasonable SEO.
10:49:07 s;s/DSPL/GPDE;;
10:49:29 AndyS has joined #odw
10:49:54 … Let tables be tables. [Let it be… ] Relational data & schema are well understood. Better than triples: tables naturally capture relations. Better than APIs: no access patterns, scalability issues.
10:50:28 … Add semantic annotations to tables. Leverage EXISTING approaches (RDF, schema.org) [emphasis is scribe's :-)]
10:50:47 … Better to follow this approach than create custom data models (SDMX, DSPL).
10:51:10 Next speaker: Stuart Williams, Epimorphics
10:51:49 Overview of Epimorphics, doing services and LD design work. Working with data.gov.uk. Helped to lay down some of the sand that John Sheridan previously described.
10:52:05 Working to publish bathing water quality, now expanding to the river network in the UK.
10:52:54 … Thinking about getting 'beyond the data', we feel that we need to get beyond the 4 & 5 Star Data attribute, and evolve the message to solving a real world problem.
10:53:23 … Works with the UK Environment Agency to make publication of valuable data … easy!
10:53:44 … Think about how to allow publishers to add simple bits of markup.
10:54:09 … Think about how to contribute to the virtuous circle of making it easy to contribute something valuable & receiving something valuable.
10:54:16 pieterc has joined #odw
10:54:23 Next speaker: John Snelson from MarkLogic
10:54:42 Describes himself as an XML-guy and actively involved in W3C around those recommendations.
10:55:01 hideaki has joined #odw
10:55:04 … MarkLogic helps its customers use data effectively using XML.
10:55:26 timdavies has joined #odw
10:55:26 … John is a data pragmatist. We must look beyond those formats.
10:55:53 Next Speaker: Tyng-Ruey Chuang, from Academic Sinica in Taiwan 10:56:23 Involved in Taiwan's culture heritage efforts. 10:56:25 Tyng-Ruey Chuang, Academia Sinica (Taipei) see http://www.iis.sinica.edu.tw/pages/trc/ 10:56:40 s/Academic/Academia/ 10:56:49 I can't ask our catering department to provide menus in a well structured RDF format :-) 10:56:56 (much as I wish they would) 10:57:29 … dealing with heterogeneous collections of content including media files, documentation. His focus is on sharing & making cultural heritage content usable for the long term. 10:57:55 rrsagent, make minutes 10:57:55 I have made the request to generate http://www.w3.org/2013/04/23-odw-minutes.html StevenPemberton 10:57:58 … Putting data on the Web itself does not guarantee longevity. 10:58:53 i/Jeni: sets the tone/scribenick: bhyland 10:58:56 rrsagent, make minutes 10:58:56 I have made the request to generate http://www.w3.org/2013/04/23-odw-minutes.html StevenPemberton 10:59:23 … We can & should learn from the Free Software Foundation. Supports giving people the ability to make copies of content. Highlighted the importance of content being portable to many other computer systems, both on & off the Web, for it to be considered truly open. 11:00:34 Panel Convener is Jeni … She puts the following question to Rufus. Q) There is debate on how to manage metadata, to embed or not ... 11:00:47 i/Jeni: sets the tone/scribe: Bernadette Hyland 11:01:08 rrsagent, make minutes 11:01:08 I have made the request to generate http://www.w3.org/2013/04/23-odw-minutes.html StevenPemberton 11:02:51 Rufus: Regarding embedding, it almost becomes an AI project to figure out metadata that is embedded. It can be a nightmare. The beauty of keeping it separate is it is easier on tools & therefore treatment by tools. He is supportive of graceful degradation. 11:03:44 DeirdreLee has joined #odw 11:04:09 Tyng-Ruey Chuang: Prefers to have structured schema as part of the data (?)
11:04:59 Omar: Mainly, the important thing is to get agreement on format, then all kinds of good things can happen. Linking tables & metadata to Web pages (authoritative) is really important. 11:05:37 Stuart: We've been using this word "metadata" which leads us to schema information. In RDF world, we can click through to it & immediately see it. 11:06:14 … Using RDF model, you don't have to scramble all over the Web, rather, you get bits of schema info back because it is carried *with* the data. 11:07:15 … Highlighted the perils of carrying so much provenance information that it drowns out the important data itself. 11:07:17 Quite simply, tabular data requires a lower cognitive load to work with. Most people can't be bothered to learn to think in graphs. So tabular is more open because it's easier to comprehend. 11:07:34 aside: embedded metadata (facebook opengraph, schema.org) is getting published because it is getting used 11:07:47 cjg I wonder how much of that is because our computer science training wasn't very graph-focused. Next generation might be different? 11:07:54 i don't buy the argument that it needs to be separate ... 11:08:13 Questions from the audience ... 11:08:59 Ivan: When we speak of metadata, my biggest issue is what vocabularies to use. It is the biggest problem we have to solve, even more important than the data format/model … if we had widely available vocabularies, it would solve many problems. 11:09:13 HadleyBeeman: I'm talking about the people who maintain my data. They are *not* computer scientists… they are in finance, buildings & estates, catering... 11:10:03 http://data.southampton.ac.uk/dataset/catering.html 11:10:18 Rufus: If you meet most developers, and start talking about vocabularies, "they'll run for the hills." Been part of countless long fights over which vocab to use. Suggested a new site called http://GiveMeTheDamnSchema.org as a joint project of cygri and Rufus ;-) 11:10:23 cjg: Ah, I see. Yes, different user base there.
11:10:55 I went to see what they already had, tidied it all up in excel and moved it to google spreadsheets so it was easy to grab automatically. 11:11:15 andyhedges has joined #odw 11:11:15 … What is the minimum to make CSV files useful. Just give me the basics, string, integer. This is *our* problem, not publishers. I'm all about 'reducing the time' … open vs. closed data. 11:11:25 problem hasn't been schemas per se, as much as it has been schemas divorced from their actual use 11:11:28 … Licensing is a lower priority for many. 11:11:40 … Ease of publishing is king 11:12:00 Also, I want to create a collection of SPARQL queries which produce useful spreadsheet downloads for humans to consume. Secretaries are a whizz with Excel, but only if the file loads first time. Telling them TSV can be "easily imported" is already outside their comfort zone. 11:12:01 … Our mission is to reduce the cost & RDF, at the moment, is not doing that. 11:12:59 Omar: If we want to bring data together, we have to harmonize into a common model. I don't know whether developers should have to be encumbered with that responsibility. But it is a real problem to solve. 11:13:51 Bhyland notes, (not in a comment), there is a wide spectrum of opinions in the room & that is good to stimulate that discussion. Deepening understanding is key to all of this. 11:14:13 AndyS has joined #odw 11:14:34 Stuart: Finding the stuff in the first place, with schematic markup answering provenance information, is critical to solving the hurdles we face with better use of open data on the Web. 11:15:18 cjg: I played with SharePoint/Excel integration yesterday, and it looks like you can get Excel to live-update from SharePoint lists; I suppose something similar could be done with s/SharePoint/SPARQL endpoint/ 11:15:35 John Snelson: Vocabularies have their place, but search is a great way to find data that is not expressed perhaps as nicely as we'd like... 
11:15:36 then SPARQL would be truly Enterprise™ 11:15:43 masao has joined #odw 11:16:42 it would also be possible to embed the metadata for a table in a second sheet of an XLSX/ODS file, instead of prepending it to a CSV file 11:16:46 cjg_ has joined #odw 11:16:52 Questions from the mob: You've got to help represent/model data, but that is not the entire story. It is a "horses for courses" kind of thing. Please be careful not to reinvent RDF with JSON glasses on. 11:16:55 pascalRomain_ has joined #odw 11:18:17 IBM guy - Dealing with data is hard. It is harder than process. We won't solve problems with data exchange standards alone. One thing we haven't heard about today is Best Practices and Architectural processes. We need to rise above data formats and really focus on data patterns, best practices. 11:18:22 I have this horrific image of people creating n-triples documents in Excel... 11:18:42 cjg, i saw that being done *a lot* at the bbc 11:18:46 gah! 11:18:49 cjg: why would that be horrific? 11:19:14 for one thing, excel plays silly buggers with certain values. 11:19:24 Bhyland to 'IBM guy' - let's talk real soon - there is Best Practices work, albeit nascent, underway within W3C Gov't Linked Data working group & we'd welcome your input. 11:19:47 We have real trouble getting people to enter phone numbers without it getting muddled. 079671234567 gets converted to an integer as does +44.... 11:20:53 Rufus: Described state of the world that is very fragmented, messy & dirty and urges us to not look for [utopian data model] that everyone is required to use. 11:21:58 Tyng-Ruey: RE: Best Practices, validators would be helpful to check data representation is correct. Need: Better validators (Note to PhilA). 11:22:24 rrsagent, make minutes 11:22:24 I have made the request to generate http://www.w3.org/2013/04/23-odw-minutes.html StevenPemberton 11:22:32 Shout out for http://app.easyopendata.com/ - for converting live google spreadsheets into XML or RDF/XML etc.
11:22:48 Oo, fun, cjg. Thanks. 11:22:50 Omar: "I think we've been spoiled by the Web" because search engines have done a good job. The question is, can we make this Web of Data thing work such that we publish our metadata & data and have it easily found. This is the question. 11:22:57 cjg: spreadsheets are for calculations, not data. CSV is a format which people use with spreadsheet programs, thus not suited for the job. Got your point? 11:23:43 Peter Murray-Rust: To Omar - what do you do with things that are labelled as tables but really are not tables? 11:23:50 yeah, maybe we need a nice "CSV" editor? 11:24:04 Or even a "table" editor, using PMR's description. 11:24:07 Omar: Smart people are working on it … it's complicated. 11:24:22 cjg: thought of it as well already 11:24:36 basically a cut-down google docs. 11:24:46 cjg: open refine? ;) 11:24:47 BartvanLeeuwe has joined #odw 11:24:50 John Snelson: Need to be able to break out & work with data in a schema-less fashion. 11:25:04 with a magic table heading 11:25:39 John Sheridan asked, in the world of tables & CSVs and [screw the metadata], how are you prepared to deal with the license matter? 11:26:39 Rufus: I didn't say, 'screw the metadata'. Rather, we need simplicity and innovation about process. He suggested having multiple parties be part of the "packaging process". 11:28:02 … Clearly a license has to come from an authoritative source. Gave example about data from Bank of England. Two important points, we need minimal metadata and … [someone else augment please, scribe missed second point] 11:28:12 *if* the source of the metadata is the same website as the data then that's probably good enough for me. 11:28:47 Wrap up from panelists - 'wear your data on the outside, use HTTP URIs to describe things if putting on the Web.' 11:29:00 s/data/schemas/ 11:29:16 John: Great opportunity for tool developers to liberate data.
11:29:23 Topic: Lightning Talks with a linked data theme 11:29:28 Scribe: Steven Pemberton 11:29:30 End of panel facilitated by Jeni. Thanks all. 11:29:35 scribenick: StevenPemberton 11:29:36 I have a problem with the fact that the data are/is being able to be processed through quick bash scripts, or other low barrier scripting languages, but the meta-data needs a json parser 11:30:07 Someone else able to scribe, please? Pretty please?? 11:30:13 Topic: Linked Data, Open Data and Big Data: Understanding the need for all three, Mark Birbeck, Sidewinder Labs 11:30:34 rrsagent, make minutes 11:30:34 I have made the request to generate http://www.w3.org/2013/04/23-odw-minutes.html StevenPemberton 11:30:35 jpcs1 has joined #odw 11:31:46 markbirbeck1: I am from a semweb background 11:32:12 ... software developer for decades 11:32:23 johnlsheridan has joined #odw 11:32:34 ... [lists examples of RDF-based software project he has worked on] 11:32:45 s/project/projects/ 11:33:00 ... also involved with RDFa at W3C 11:33:11 ... you can tell I'm setting things up to have a good moan 11:33:30 ... usually data not available, or in inconvenient formats 11:33:35 trc has joined #odw 11:33:38 markbirbeck1: ooh, a jobs ontology. we wrote our own having found nothing in the wild (https://data.ox.ac.uk/id/dataset/vacancies) 11:33:40 ... or not linked 11:34:20 ... Lessons - 11:34:37 ... - need a big cultural change to get open data 11:35:04 ... - spreadsheets aren't that bad, don't need to wait for RDF 11:35:35 alex: http://schema.org/JobPosting 11:35:38 ... but the timeframes were a big issue 11:35:58 edsu: ah, cool; thanks :-) 11:36:14 ... - Join question. Linked data would be great, but consistent code would be enough 11:36:27 hmm, is there a schema.org->RDF mapping? there must be... 11:36:50 cjg, http://schema.rdfs.org/ 11:36:59 cjg: there is, but really who cares? 11:37:06 ... Big data is relevant, lessons learned from that are useful. 
11:37:34 floppy has joined #odw 11:37:50 CaptSolo has joined #odw 11:37:51 ... Open data doesn't need to be RDF, use context 11:38:37 ... only when you cross (company) boundaries, do things like schemas become importnant 11:38:42 s/nant/ant/ 11:39:03 Topic: Publishing Linked Data Requires More than Just Using a Tool, Raphaël Troncy, Serena Villata & François Scharffe, Eurecom, INRIA wimmics, LIRMM University of Montpellier 11:39:33 markbirbeck has joined #odw 11:40:13 edsu; me as we've just started publishing vacancy data last week! Making it Linked Data is useful as it can cross-reference to our URIs for various departments & faculties. 11:40:25 timbl: when you mention experience you've had, please say who you are/were working for, was it big or small project, public or private, etc. 11:40:32 There goes TimBL again about context, context, context! ;-) 11:40:46 Metadata for our conversations. :) 11:41:13 who'd have thought context mattered for data eh bhyland? 11:41:16 markbirbeck: There was a layered approach to it in my case, people who had bought in but didn't know enough, which was worse 11:41:39 ... but NHS in my case was an example, timing was bad because of looming cuts 11:42:08 ... but I was naive too about the issues involved about publishing certain types of data and aggregation 11:43:29 TimBL: Context is important. Users in intelligence community won't consider using data without provenance, won't even start the conversation or analysis. 11:44:03 Raphael: Most are tool builders here, but we need more than tools 11:44:27 ... this is a report of what we have done at a "datalift data Camp" last year 11:44:42 ... lifting data to 5 star status 11:45:10 ... It worked a bit, but was a good learning experience 11:45:42 ... varied data source types 11:45:54 ... and varied companies, with different needs 11:46:40 ... Datalift is a package with single click download 11:46:55 ... cross-platform 11:47:03 ... [shows workflow] 11:47:09 yoshiaki_ has joined #odw 11:47:18 ...
converts to RDF 11:47:27 ... and then the interlinking 11:47:44 ... used for two large data collections in France 11:48:00 ... Difficulties are how to choose the right vocabulary 11:48:17 ... rdf conversion, URI schemes to adopt 11:48:47 ... automatic detection of datasets to link to 11:49:04 ... LOV initiative, 260+ vocabs 11:49:20 ... now open sourcee! 11:49:33 ... http://lov.okfn.org 11:49:42 s/ee!/e!/ 11:49:59 I love how a French speaker says "LOV bot" as love boat. 11:50:19 ... Conclusion - multilingual vocabs important 11:50:27 ... hide complexity of sparql 11:50:37 ... eg QAKIS 11:50:47 ... Shape files are important 11:51:08 masao has joined #odw 11:51:11 ... INSPIRE directive and W3C GLD vocabs need to be covered 11:51:22 Since Open Data is a means to several valuable ends, IBM is talking to our clients about thoughts of "becoming a Contextual Enterprise" and we emphasize the critical need to dynamically assemble context for every key input and output of their work, including the context of external data they import. See http://www.research.ibm.com/files/pdfs/gto_booklet_executive_review_march_12.pdf for very high-level summary of our recently released Global Technology Outlook. 11:51:26 ... GTFS/DSPL formats 11:51:43 Topic: Linked Data at the Science Museum, Tristan Roddis, Cogapp 11:52:04 Tristan: We work with cultural heritage. Will talk about science museum now 11:52:11 rtroncy has joined #odw 11:52:11 ... also a plea for help 11:52:36 ... Science Museum is august and venerable, with loads of internal systems, we are trying to consolidate them 11:52:55 rtroncy has joined #odw 11:53:11 ... we extract, and convert to linked data 11:53:20 ... triple store 11:53:45 rtroncy: how active is the development of Datalift? I haven't seen a lot of activity on the SCM 11:53:57 ... built a data model, in cooperation with British Library, British Museum [others], see the paper 11:54:09 ... use that to drive the website 11:54:29 ...
my plea for help is what should be the next steps 11:54:45 ... how can we make it more open? 11:55:02 ... Publication strategies, stable URIs, dereferencable etc 11:55:15 ... IS the data model interoperable 11:55:21 s/IS/Is/ 11:56:02 Topic: Open Linked Education: a new Community Group, Madi Solomon, Pearson 11:56:23 Madi: I am new to W3C, and open linked data devotee 11:56:51 ... Pearson is a publishing company, owns Financial Times and some Penguin books. 11:57:04 ... I think we are the first W3C publisher member 11:57:15 [applause] 11:57:21 that says a lot 11:57:39 Madi: There is a new Community Group at W3C with 23 members 11:58:09 [link here to CG please] 11:58:19 http://www.w3.org/community/opened/ https://twitter.com/search?q=%23ODW13https://twitter.com/search?q=%23ODW13 11:58:21 Topic: Questions 11:58:42 Ivana: Raphael, what were the outcomes? 11:58:43 Eek, sorry. Try this: http://www.w3.org/community/opened/ 11:58:48 Madi: Data + education is a natural fit. Whatever we can do to make it easy for students + instructors + open data advocates will together make the world a better place. 11:58:51 +10 11:59:18 Raphael: It was part one of a two part process. We wanted clean data, the next step will happen later this year, to reuse the data to build apps. 11:59:37 s/advocates will/advocates to get 12:00:02 Raphael: Some of data sets are just data dumps 12:00:20 s/Ivana/Irina/ 12:01:27 q1: is there automatic linking between data possible? 12:01:31 s/[link here to CG please]/-> http://www.w3.org/community/opened/ Open Linked Education Community Group/ 12:01:58 MarkBirbeck: It is not just topics 12:02:26 .... do you mean just numerics? 12:02:37 q1: Not necessarily, 12:03:13 MarkBirbeck: This is what I was referring to earlier, for instance trying to identify a company from different versions of its name 12:03:26 ... 
URIs are a great goal, but you can get there earlier 12:03:56 [SESSION ENDS] 12:05:13 yoshiaki has joined #odw 12:07:21 naomi has joined #odw 12:12:09 AndyS has joined #odw 12:28:03 craig552uk has joined #odw 12:32:15 craig552uk has joined #odw 12:35:07 yoshiaki has joined #odw 12:52:02 cjg has joined #odw 12:58:45 StevenPemberton has joined #odw 12:59:05 floppy has joined #odw 12:59:40 floppy1 has joined #odw 13:00:30 rjw has joined #odw 13:02:04 HadleyBeeman has joined #odw 13:02:11 masao has joined #odw 13:02:21 cjg has joined #odw 13:02:25 scribenick: hadleybeeman 13:02:33 JeniT has joined #odw 13:02:43 fumi has joined #odw 13:02:50 Chair: LeighDodds 13:03:13 rtroncy has joined #odw 13:03:13 topic: Data Interoperability 13:03:37 pieterc has joined #odw 13:04:11 yaso_ has joined #odw 13:04:17 pieterc has joined #odw 13:05:11 daveL has joined #odw 13:05:41 Kal Ahmed: Intro to talk on OData 13:05:58 cerealtom has joined #odw 13:06:01 … OData is a standardised protocol for consuming and creating data APIs. -odata.org 13:06:20 … originally conceived by Microsoft, this is bringing it into being a common protocol. 13:07:11 … OData is entity-centric. Comes from .NET developers with tables of data. Standard itself defines how you publish your metadata: service metadata and schema. 13:07:13 scribenick: HadleyBeeman 13:07:33 yoshiaki has joined #odw 13:07:37 … OData has a URL-based syntax for access. 13:08:03 … Includes inline expansion between entities 13:08:08 rrsagent, make minutes 13:08:08 I have made the request to generate http://www.w3.org/2013/04/23-odw-minutes.html StevenPemberton 13:08:18 yoshiaki has joined #odw 13:08:38 … POST a representation to an entity set's URL. PUT, PATCH, MERGE, or DELETE. 13:09:04 I've never heard of MERGE or PATCH before… 13:09:17 PATCH is only just a Thing, isn't it? 13:09:24 johnlsheridan has joined #odw 13:09:28 … Other nice features: combines metadata properties with a special media source URL. Named streams.
Ability to embed your own custom actions and functions and expose them as URLs 13:09:35 PATCH is a proper thing, haven't heard of MERGE 13:09:47 bschloss has joined #odw 13:09:54 only just> March 2010, according to http://tools.ietf.org/html/rfc5789 13:10:06 … There are a lot of reasons to like OData. You can reliably discover the schema. Clients are all linked. Easy to experiment using those URLs. 13:10:13 alex: The DataTank supports PATCH 13:10:21 … There is a javascript serialization format 13:10:39 alex: (tdt is a RESTful data adapter project in PHP) 13:10:42 … There is a growing set of OData consumers. GUI controls and libraries. 13:10:45 alex: (it sounds worse than it is) 13:11:07 http://msdn.microsoft.com/en-us/library/dd541276.aspx "The remainder of this section defines a custom HTTP MERGE method" 13:11:09 … Criticisms of OData: Service definitions tend to be siloed. Links don't tend to go outside the data service. Don't use any shared ontologies. 13:11:49 … Another slight criticism: because of its history as being pushed by Microsoft, it's seen as being vendor specific. Not true; standardisation now under OASIS, other contributors 13:12:18 stressindikator has joined #odw 13:12:27 … Why do developers use it? We like the features and the flexibility of RDF/SPARQL. We were disappointed with the Linked Data Platform proposals and the flexibility it would give. 13:12:50 … We wanted it to be a declarative configuration only, ultimately to do that config automatically. 13:13:11 … Previous attempt: LINQ - to - SPARQL, hand crafted as C# 13:13:20 mig_garcia has joined #odw 13:13:38 … This implementation: Proxy service for a SPARQL endpoint. http://github.com/brightstardb/odata-sparql 13:13:39 lechatpito has joined #odw 13:13:44 AndyS has joined #odw 13:13:56 pieterc has joined #odw 13:14:34 … Key part of this: the annotations. They're in the OData spec.
Defined for: URI namespace for entity primary keys, URIs for entity types, properties and directionality of links 13:14:35 naomi has joined #odw 13:14:46 bhyland has joined #odw 13:15:03 … Annotations are visible to the consumer, mappings done against the SPARQL endpoint are visible 13:15:30 … Allows you to reconstruct the source triples you've just queried, if you'd ever want to. 13:16:24 … Implementation issues: Our naive approach: if you ask for an entity, a DESCRIBE will give you what you want. It was too unspecified, so you have to use CONSTRUCT, which led to sorting and identification issues. 13:16:31 roger has joined #odw 13:17:03 … OData allows the server to do paging. If there's been a server-side limit imposed, you don't know that. 13:17:43 … Biggest implementation issue: because we're turning primary keys into URI identifiers, every entity in the entity set has to have the same base URI. Not a problem in most cases, but potentially. 13:17:58 … [Example query to select a simple film] 13:18:07 pieterc` has joined #odw 13:18:10 … [Example query to enumerate films] 13:18:49 … [example query to show property navigation] 13:19:09 jpcs1 has joined #odw 13:19:49 … That's all leading up to a bunch of questions. First and I'm most interested in discussing here: What does the group see as the importance of standards in interoperability? Do standards need to interoperate? Do different standards bodies' standards need to interoperate? Whose responsibility is it? 13:20:15 francois has joined #odw 13:20:49 AndreaP has joined #odw 13:20:51 … More questions: what could the W3C LDP WG learn from OData and vice versa. OData changed in response to feedback/requirements. Now on third iteration… Should these requirements and use cases be shared between groups? 13:21:34 … 13:22:01 … Finally, is there a shared meta-model for entity-oriented view of data resources between the two? 13:22:14 LeighDodds: Do you have a sense of uptake?
13:22:36 (uptake of OData) 13:22:48 Kal: hard to tell because search discovery of OData endpoints is hard. Probably more not visible to the Web than those that are. 13:23:09 [I think the SAP ERP platform, recent version, has APIs to get information as ODATA] 13:23:41 pieterc` has joined #odw 13:23:46 ivan: There have been several attempts to get these groups together. For all kinds of personal reasons, it did not work out. There is a community group at W3C on OData vs RDF; the group is silent, empty. 13:24:05 Kal: It shouldn't be "OData vs RDF". They should be coexistent and work together. 13:24:42 yaso has joined #odw 13:24:47 My question is (and I'm not being snarky or flip), why OData? Isn't this MS trying to redo RDF? RDF has matured and is well-documented. It is not perfect & use is far from ubiquitous however, why fragment? 13:24:58 subtopic: Neil Benn, Fujitsu. LOD approach to engineering health-sensory datasets/ 13:25:49 Neil: I'll focus more specifically on health and health sensor data. I've recently joined this group, and this is one of the projects we're working on. 13:26:37 … We're working on a cloud platform for large-scale graph storage. Public and private data. That seems to be a tension that is coming across throughout today. Therefore, Linked (Open|Closed) (Big) Data 13:26:54 Mentions Linked (Open|Closed) (Big) Data and mentions Fujitsu and DERI Collaboration on Linked Data Global Repository 13:27:03 … We've been working with DERI on a CKAN-like Linked Data Global Repository. Faster and more searchable. 13:27:30 … We're also involved in the W3C LDP WG 13:28:43 … With the University of Singapore, we've been working on health care sensors. Temperature monitor, heart rate monitor, establish patient history. Challenge: how to combine sensor data with patient specific data from their health record, which might be different to medical best practice, clinical recommendations, etc?
13:29:16 … We're making this sensor data linkable - 10m triples per person per week, for example - standardise, and link to data about effective drugs. 13:29:45 … Announced in Nov, just working out how to do this. Open, closed and anonymisable data involved. 13:30:51 … We are handling temporal data and binary data. Do we want to convert binary sensory data, with an established community of tools, into RDF? Maybe not. If not, how to work with the binary and the (other) linked data? 13:31:05 … These things keep me… well, not awake at night, but certainly busy during the coffee break. 13:31:19 floppy has joined #odw 13:32:03 masao has joined #odw 13:32:10 … Non-technical challenges: main motivator for this paper: most open health data is on hospital numbers, costs of services, etc. But these are questions for policy makers; not as much emphasis on medical research. 13:32:42 yaso_ has joined #odw 13:32:57 … Found data on ECG and HBR stuff… but not as much emphasis of having a "broad church" of open medical health care data to generate further epidemiological and clinical research. 13:33:30 … Generating these datasets is labour-intensive. One researcher said teams of researchers working on a dataset would be useful… How to do on the Web? 13:33:35 floppy1 has joined #odw 13:33:57 … Could be that we have more administrative hospital data than clinical data because it's easier to lobby governments than universities and researchers? 13:34:50 … There still isn't much best practice on this. Vocabularies, dataset engineering patterns. We have patterns for building modular software… is there an equivalent here? 13:35:19 … Ex: There is an ECG ontology I came across… should I use it? 
13:35:44 Questions 13:36:09 markbirbeck has joined #odw 13:36:14 BillR: You should look at Linked Data Patterns, LeighDodds is one of the authors 13:36:47 Discussion with panel, including Albert Meroño-Peñuela 13:37:43 http://patterns.dataincubator.org/book/ 13:37:46 Albert: We work with historical censuses, encoded in thousands of .xls spreadsheets. We would like to uniformly query them, but they are extremely messy. We'd like to transform them into RDF Data Cube and other vocabularies using SPARQL queries. 13:38:44 Question: Bob Schloss: The value we seem to be talking about is mashups between datasets with unexpected results. Mapping was one of the first join points. What other join points do you see and do you agree this is critical? 13:38:51 BartvanLeeuwen has joined #odw 13:39:15 markbirbeck1 has joined #odw 13:39:55 Kal: Yes, I agree. Increasingly, I see a lot of time-series value type data, sets combined in a way to expose latent knowledge. Biggest problem is vocabulary interoperability. OData doesn't have them so we can't do conceptual joins with data tagged with different systems. 13:40:07 rszeno has joined #odw 13:40:18 Bob: Let's reuse the requirements gathered from XBRL in the Financial industry. They do have publicly listed businesses. 13:41:21 Neil: Open data is administrative, government-driven. People want to answer local questions, so that has driven a lot of the applications. But in that healthcare example, it's not geographically-specific. New disease patterns may not be tied to parts of a city. 13:41:43 lottebelice has joined #odw 13:42:26 … With regard to the vocabularies question… I don't want to learn about all the vocabs out there. In the same way I can modularly take a bit of a software library to see what's in it, I'd like to do the same with a vocabulary. I want to conceptualise my data first, and modularly pick a vocabulary. 13:42:47 Kal: The individual is an interesting join-point. For governments and otherwise.
13:42:59 rszeno has left #odw 13:43:22 roger has joined #odw 13:43:24 Albert: In some domains, historical data is so badly degraded… and it may not have been intended to be comparable. 13:43:58 rtroncy has joined #odw 13:44:29 TomHeath: Re data engineering patterns: we do need to go further than Leigh's book. Hack-y stuff (download, GREP, etc), ad-hoc processes. Things going on in the Hadoop community to describe these processes 13:44:59 Neil: The term dataset engineering patterns… [coining a new phrase] 13:45:01 markbirbeck has joined #odw 13:45:47 Michael (from the EC): to Neil: re the link between closed/sensitive/open data… Are you looking at aggregated personal data that then can be opened? As in other areas of sensitive public data 13:46:48 Neil: we don't quite have a generic process for anonymising sensitive data. Some organisations do that… I'm just in the early stages of learning the issues around that. 13:47:33 questionasker?: concerned about applying the label of "open data" to data that's locked behind a query API. Do you share my concerns? 13:48:33 Kal: OData entity set that conforms to the standard is enumerable… It's an ATOM feed with Next links in it. You can download it. Also, a data dump isn't any better — you're relying on the server's capacity to provide the data and the data being up to date. 13:48:44 … I can see your point but I think it applies to all open data. 13:49:04 questionasker?: If I were going to mortgage my house to fund a startup on this data, I would see this as a problem. 13:49:21 Kal: Of course, there are different applications. 
13:49:27 [Closing session] 13:49:55 Topic: Lightning Talks with a geospatial theme 13:50:03 scribenick: rtroncy 13:50:11 scribe: rtroncy 13:50:24 Chair:Alex Coley 13:50:51 jpcs1 has joined #odw 13:51:39 Alex introducing the session, composed of three talks 13:52:35 on Jay le Grange - GeoKnow: Leveraging Geospatial Data in the Web of Data 13:52:46 [http://www.w3.org/2013/04/odw/odw13_submission_15.pdf paper] 13:53:20 EU Project GeoKnow: http://geoknow.eu/Welcome.html 13:53:51 markbirbeck1 has joined #odw 13:53:53 ... inspired by earlier work on transforming OSM into Linked Data 13:54:03 s/OSM/Open Street Map 13:54:20 ... 3 major sources of open geospatial data 13:54:50 yaso_ has joined #odw 13:54:50 hideaki has joined #odw 13:55:05 ... spatial data infrastructures (compatible with almost all GIS), open data catalogue (SHP, KML files), crowdsourced geospatial data 13:55:08 ldodds has joined #odw 13:55:28 ... ontologies: basic geo vocabulary, GeoOWL ... and GeoSPARQL 13:56:26 ... efficient geosparql RDF querying, fusion and aggregation of geospatial RDF data, visualization and authoring, public-private geo-spatial data (sync workflows) 13:56:47 ... aim to provide a suite of GeoKnow Generator tools 13:57:10 ... two use case scenarios: e-commerce and supply chain 13:57:22 ... the GeoKnow generator is expected by December 2013 13:57:49 RRSAgent: draft minutes 13:57:49 I have made the request to generate http://www.w3.org/2013/04/23-odw-minutes.html rtroncy 13:59:19 ... see also: http://blog.geoknow.eu/ 14:00:48 Michael Lutz - Interoperability of (open) geospatial data – INSPIRE and beyond 14:00:57 [http://www.w3.org/2013/04/odw/odw13_submission_58.pdf paper] 14:01:58 Michael: INSPIRE in a nutshell 14:02:26 ... legal framework for establishing an infrastructure for spatial information in Europe 14:02:33 ... 34 spatial themes 14:02:57 ... implementation 2009-2020 14:03:32 ... 
there is a growing interest in creating innovative products and services based on INSPIRE and other data 14:05:07 ... we realize that with INSPIRE we cover a lot of topics of this workshop 14:05:33 ... key issues with INSPIRE: enriching INSPIRE data models with application specific business data 14:06:27 ... example: urban planning, waste management plans, environmental impact assessment, risk management on top of geo data 14:07:43 ... beyond INSPIRE, traditionally linked with GIS formats and XML ... how do we move towards RDF 14:08:00 ... how to create and manage persistent identifiers 14:08:30 ... implications of opening up data for the organisations: governance, long term commitments, etc. 14:08:36 Albert has joined #odw 14:09:07 ... how to address those issues? ISA = Interoperability Solutions for European Public Administrations program 14:09:43 ... see also: ARe3NA (INSPIRE reference platform), EULF (EU Location Framework) 14:09:58 ... W3C LOCADD community group 14:10:31 ... advertisement for the INSPIRE conference in Florence 23-27 June 2013 14:11:57 ... ISA program http://ec.europa.eu/isa/ 14:14:16 Mark Herringer - Open Data on the Web and how to publish it within the context of Primary health care 14:14:24 [http://www.w3.org/2013/04/odw/odw13_submission_51.pdf paper] 14:14:31 Panel opened 14:15:06 bhyland has joined #odw 14:15:25 unknown: question about identifiers, can we expect a better framework, e.g. URI in INSPIRE ? 14:16:00 Michael: in INSPIRE, there are 2 types of identifiers 14:16:13 ... for data objects and for real-world things 14:16:47 StevenPemberton has joined #odw 14:17:17 ... we recently relaxed the rules on how to write those identifiers and enabled HTTP identifiers 14:17:30 Thank you Michael Lutz on URIs 14:17:36 st has joined #odw 14:17:59 markbirbeck has joined #odw 14:18:15 roger_ has joined #odw 14:19:09 Raphael: there are a number of initiatives that try to take part of UML diagrams of INSPIRE and build RDF schema, see e.g. 
efforts from Laurent Lefort and others 14:19:41 ... are there plans to have an official schema in RDF for INSPIRE ? 14:20:02 Michael: yes, we will organize a workshop where everybody presents its modeling ... and we wish to have an agreed upon model 14:20:28 RRSAgent: generate minutes 14:20:28 I have made the request to generate http://www.w3.org/2013/04/23-odw-minutes.html rtroncy 14:21:09 HadleyBeeman has joined #odw 14:22:01 naomi has joined #odw 14:29:14 cjg has joined #odw 14:30:05 laurent_au has left #odw 14:32:19 cjg has joined #odw 14:39:42 johnlsheridan has joined #odw 14:39:57 rjw has joined #odw 14:40:26 fumi has joined #odw 14:40:52 HadleyBeeman has joined #odw 14:40:55 yoshiaki has joined #odw 14:40:59 cjg has joined #odw 14:40:59 JeniT has joined #odw 14:41:00 StevenPemberton has joined #odw 14:41:07 jpcs1 has joined #odw 14:41:23 yaso has joined #odw 14:42:02 rrsagent, here? 14:42:02 See http://www.w3.org/2013/04/23-odw-irc#T14-42-02 14:42:14 yoshiaki_ has joined #odw 14:42:14 Lotte Belice about Open Culture Data 14:42:27 Scribe: yaso 14:44:57 AndreaP has joined #odw 14:44:58 stressindikator has joined #odw 14:45:38 stressindikator has left #odw 14:47:17 MLutz has joined #odw 14:47:39 bhyland has joined #odw 14:48:25 naomi has joined #odw 14:48:35 albertm has joined #odw 14:49:09 yaso has joined #odw 14:50:10 bhyland has joined #odw 14:51:55 scribenick: hadleybeeman 14:52:06 Topic: Panel: The Business of Open Data 14:52:33 Johnlsheridan: It's 2020 and we've seen the failure of the world's first multibillion dollar open data corporation. How did this happen? 14:52:36 Yes, I'm with connection problems 14:53:08 Conor Riffle: We've been looking at lots of business models. Sponsorship would be hard to scale to that level. 14:53:15 yaso_ has joined #odw 14:53:31 … Also look at people like Google who make tons of apps and sell ads on that. 14:54:01 JohnLsheridan: which of the eight business models Michele has identified could scale to that level? 
14:54:50 yaso__ has joined #odw 14:55:07 Miguel: Usually, all the four actors are able to manage a huge amount of data. We have some enablers - usually they are scalable - but they do not serve end users. They're in a wholesale position in the value chain. Examples: Microsoft, Socrata. 14:55:27 … Many of them have other business lines, even outside the boundary of public sector information. 14:56:31 Irina: I think you'd want lots and lots of smaller companies, not one big one. As small music app companies are threatening the big distributors, a big company doesn't fit. 14:57:10 yaso has joined #odw 14:57:16 Bart: The Fire Department wants to be the authoritative source of information. They won't make a business out of it, but they will engage to have usable data. 14:57:50 Michele: Risk to opening up data… fear of losing control. But benefit: they will be seen as the authoritative source. We see both. 14:58:19 Lotte: open data can bring big benefits to companies. 14:58:45 yaso_ has joined #odw 14:59:39 questionasker?: Do we all agree that we should build public infrastructure, basic datasets to build business models on top of… If we don't do it fast, a big multi-billion company maybe wants to become a public infrastructure provider? Or the market will collapse and transform in another way. We, as a community, need to identify the basic datasets which will be the "streets" of open data. 14:59:59 JohnLsheridan: What are the basic datasets of interest for fire services? 15:00:21 Bart: Address data. Real streets. We don't have "highways" for open data yet; we have "rural roads." 15:00:49 … Large companies taking over scares the Fire departments as well. "What if a company over in America is holding our data?" An important discussion to have. 15:01:05 Johnlsheridan: Do you see CDP becoming that sort of infrastructure provider? 15:02:02 Conor: I think we are. Especially where companies are contributing pollutants to that atmosphere, it impacts all of us. 
But we see it's useful where people can make money out of it. Investors will use it. But there's more to do with it. We need a hybrid model: some monetisable, some open. 15:03:11 Bernadette: I'd recast the question: It would give me great joy if, next year, there are 20 companies 10-100 people with $2-20m in gross revenues who are using this technology to share information, for-profits (not grant-funded). We don't need yet another social network or cow-tipping site. 15:03:47 … If they are venture-funded, it would be with a social enterprise angle. 15:04:31 Chris Metcalf: In the US, I feel like we're seeing the steam come out of pure open data. We need to show the benefits, which are often business. We work with small businesses to do that. We need to focus on that in the community. 15:05:44 Bob: Infrastructure isn't always provided by regulators, grant makers and hackers/coders. It's sometimes created by lawyers and judges. I think some orgs and agencies are hesitating to publish open data because they're afraid of inaccurate records and resulting harm and subsequent lawsuits. We may need some case law to determine this. 15:05:51 bhyland has joined #odw 15:05:53 … To Conor: because your data can impact stock price, do you have T&Cs to cover that? 15:06:48 yaso has joined #odw 15:07:02 Conor: We do have cleverly-written T&Cs. Many many companies to agree to them. Other orgs can learn from our lessons: we don't own the data submitted to us. 15:07:44 … To Chris: Yes, we need to create value from things built on public data, but also as a provider: how can we increase the value all along the chain? 15:08:44 Michele: What we see: one of the benefits is people correcting data and pushing it back to the publisher. Enhancing it, geotagging, improving our metadata. 15:09:32 … There was a company who wanted to make money out of the data, and we want them to succeed. But this is a public sector answer, I realise. 
15:09:44 DeirdreLee has joined #odw 15:10:16 Lotte: Do not forget SMEs like ours: manufacturers, consulting services, pharmacies… they are the ones who will recreate the value in the data. 15:10:21 albertm` has joined #odw 15:10:30 … New standards, new protocols, new releases, new things. 15:11:15 questionasker?: This isn't a level playing field. In the development of the Web, it's a case of survival of the fittest, driven by quality, quantity and cost. 15:11:55 … Chances are high that whoever that company is in the future, they are here today. I'm hearing that open data should be a communal type where everyone has a chance. Those at the front will probably stay there; this is a call to them to maintain the lead. 15:12:10 AndreaP has joined #odw 15:12:31 s/questionasker/phil tetlow 15:12:33 AndyS has joined #odw 15:12:52 questionasker?: Can we learn from the open source business models? 15:13:02 s/questionasker/Thijs/ 15:13:11 Miguel: Yes, one of our models is called "open source like". 15:13:22 s/Miguel/Michele/ 15:14:33 … where reusers do not pay. As with Open Corporates, Licenses allowing non-commercial reuse. 15:14:56 Conor: Ask: How did the open source software people monitise it? A lot of them got burned. 15:15:19 Thijs: Training, consultancy, 15:15:56 Bart: In the Netherlands, the interesting datasets are often 3GB downloads. They will pay someone to maintain it in a usable form for them. That's the added value. 15:15:58 rrsagent, make minutes 15:15:58 I have made the request to generate http://www.w3.org/2013/04/23-odw-minutes.html StevenPemberton 15:16:22 Bart: Services model similar to what RedHat does — good packaging and great support for enterprises. 15:17:11 s/tetlow?:/tetlow: 15:17:24 s/Thijs?:/Thijs:/ 15:17:48 Irina: CKAN is both open source and open data. How do you make it sustainable for businesses who publish data? Isn't that only an issue for businesses who only sell data? 
If it's a by-product of something else, it may drive more traffic 15:18:52 s/monitise/monetise/ 15:18:57 John: final thoughts 15:18:58 JeniT_ has joined #odw 15:19:40 Lotte: We're seeing a shift from the fear of publishing to the network of data and content. Besides data, I look forward to opening more videos and content. 15:20:17 atlets has joined #odw 15:20:51 Michele: The first enabler is the government itself. Gov has to build the governmental infrastructure. Inspiring motto from Federal CIO of USA: Everything should be an API. 15:21:26 ci has joined #odw 15:21:32 … 1st step: publish open data, 2nd step: bring gov into the business model. 15:22:26 albertm` has joined #odw 15:22:28 … data reuse. A shared data model across agencies. 15:22:31 yaso has joined #odw 15:22:51 Miguel: SMEs need data to create value and generate new business lines. 15:23:31 Bart: Fire fighting data work is 20% technology and 80% people and politics. I'd like to see this reversed. 15:23:52 Conor: We need to get the business model right both for the providers and users. 
15:23:56 [Session ends] 15:24:01 rrsagent, draft minutes 15:24:01 I have made the request to generate http://www.w3.org/2013/04/23-odw-minutes.html HadleyBeeman 15:25:30 Scribe: Deirdre Lee 15:25:39 scribenick: DeirdreLee 15:26:17 markbirbeck has joined #odw 15:26:33 Scribenick: hadleybeeman 15:26:46 Topic: The Exhibitionists 15:26:50 Chair: Julian Tate 15:26:54 Topic: Opening up the BBC's data to the Web, Sofia Angeletou 15:27:01 masao has joined #odw 15:28:37 scribe: ivan 15:28:52 Opening up the BBC's data to the Web, Olivier Thereaux, Sofia Angeletou, Jeremy Tarling and Michael Smethurst 15:29:13 http://www.w3.org/2013/04/odw/odw13_submission_22.pdf 15:29:36 Sofia: The problem with the older approaches was that the material was not ours: 15:29:48 … we have only certain freedom to use it for some purposes 15:30:24 … another thing we were doing is to use MusicBrainz for the music website 15:30:24 … we do the same thing for the weather website 15:30:35 …. we use a lot of reuse from open datasets 15:30:43 … also from wikipedia for nature and wild life 15:30:50 … we reuse the wikipedia id-s 15:31:02 … because the uris are not static, then the service breaks 15:31:07 … this is a big deal for the BBC 15:31:19 … we cannot blindly rely on dataset and we need editorial control 15:31:28 … these were the first efforts with using LOD 15:31:43 … all of these experiences convinced BBC to invest more into the SW stuff 15:31:48 DeirdreLee has joined #odw 15:31:52 …. eg for the olympic web site 15:32:08 sofia: the sport web site has about 4 million users a day 15:32:25 pieterc has joined #odw 15:33:04 scribe: DeirdreLee 15:33:19 Sofia: next steps for BBC is to roll-out approach beyond sport 15:33:45 ... currently working on linking content together on news site 15:33:59 rrsagent, make minutes 15:33:59 I have made the request to generate http://www.w3.org/2013/04/23-odw-minutes.html StevenPemberton 15:34:22 ... 
trial from birmingham an black country will be rolled out nationwide in coming months 15:34:28 s/an/and 15:35:26 ... will annotate news items with other pieces of related content 15:35:47 ... would like to roll this out with archival content also 15:36:47 Appreciate Sofia's choice of headline at Google London office, "Google boss defends UK tax record to BBC" with byline "Eric Schmidt defends Google just paying 6M GBP in UK corporation taxes" 15:36:47 ... diagram from presentation shows content from archives, BBC hope to use Linked Data to expose their data in interesting ways 15:37:28 ... BBC have identified some challenges with publishing Linked Data (listed in presentation) 15:38:36 ... what are the drivers for opening up their LD datasets, how to select good quality datasets, and how to measure success 15:39:15 Alvaro Graves from RPI up next on Democratizing Open Data 15:40:09 Alvaro: Good news, there are millions of Open Datasets on the Web, billions of triples in the LOD cloud 15:40:22 jpcs1 has joined #odw 15:40:34 ... Bad news, there is a lot of inconsistent noisy data out there 15:40:46 ... but this can be solved with standards, etc 15:41:01 ... other bad news is that many of the datasets out there are boring! 15:41:27 ... for example, stale data 15:41:50 StevenPemberton has joined #odw 15:42:03 rrsagent, here? 15:42:03 See http://www.w3.org/2013/04/23-odw-irc#T15-42-03 15:42:29 ... there is also 'unusable' data, that the majority of the general public cannot use 15:42:53 ... how can those without access to technical skills & expertise make use of Open Data? 15:43:20 ... small-scale communities or journalists? 15:44:36 ... If we look at the Web, in the beginning there was a need for a webmaster to develop web-pages, but then tools like wikis, blogs came along that helped everyone to create web-content 15:44:58 ... this should be possible with Open Data too, to encourage use 15:45:23 ... 
visualisations are an easy win to get people to make use of Open Data 15:46:28 ... Visualbox, a tool for creating visualisations based on LD, used in workshop 15:47:24 ... feedback was positive, and people learned quickly. however SPARQL was deemed difficult by workshop participants 15:47:44 ... another complaint was about the quality of the data 15:48:10 ... Call to arms: we need better tools - libraries and APIs for geeks are not enough 15:48:50 ... general public usually have better needs. citizens need to be empowered to use Open Data, so they don't need a PhD in Semantic Web to get started! 15:49:00 ... visualisations are a good way to start 15:49:05 fumi has joined #odw 15:49:13 jpcs1 has joined #odw 15:49:20 bhyland has joined #odw 15:49:53 seems like http://www.tableausoftware.com/products/public is relevant re tools 15:49:56 subtopic: Andreas Koller from Royal College of Art, talking about Opening Open Data 15:49:59 cjg has joined #odw 15:50:28 rrsagent, make minutes 15:50:28 I have made the request to generate http://www.w3.org/2013/04/23-odw-minutes.html StevenPemberton 15:50:30 Andreas: background in graphic design 15:51:09 ... wants to discuss graphic design and coding, and tools that allow ordinary people to use Open Data 15:51:24 JeniT: I once asked them whether they had documentation for an API for whatever software Oxford had bought, and they pointed me back at our own people 15:51:43 (they didn't seem to do 'open' at that time) 15:52:20 but stuff like http://public.tableausoftware.com/views/Studentstatistics-UniversityofOxford/Yearlysnapshotsummary?:embed=yes&:tabs=yes&:toolbar=yes is cool 15:52:57 alex: you still have to upload your data to them, I think, to use it, so not for everyone, but in terms of interface it's something to look at 15:53:07 ... 
designers could help with data ownership and data ethics 15:53:54 RE: reference to the saying, "Data is the new oil!", see http://blogs.hbr.org/cs/2012/11/data_humans_and_the_new_oil.html 15:54:09 s;[link here to CG please];-> http://www.w3.org/community/opened/ Open Linked Education Community Group; 15:54:23 ... When teaching students to code, they may have a fear of tools 15:54:27 Jer Thorp, "Any kind of data reserve that exists has not been lying in wait beneath the surface; data are being created, in vast quantities, every day. Finding value from data is much more a process of cultivation than it is one of extraction or refinement." 15:54:41 BartvanLeeuwen has joined #odw 15:55:09 ... having libraries for existing designers' tools would enable easy access to Open Data 15:55:28 ... as would low-level examples and list of data catalogues 15:55:51 ... This is an example of how Open Data could be opened up to another community 15:56:22 ... small effort for Open Data practitioners, but would be of great benefit to other communities 15:56:40 StevenPemberton_ has joined #odw 15:57:22 ... easy access to Open Data would enable designers (and other communities) to see the value within the data and enable them to use it and extract knowledge from it 15:57:48 subtopic: Benedikt Groß, Royal College of Art, Large Scale Data & Speculative Maps 15:57:57 rrsagent, here? 15:57:57 See http://www.w3.org/2013/04/23-odw-irc#T15-57-57 15:58:35 Benedikt shows Data Viz Pipeline 15:58:58 Benedikt: most of what we have been talking about today focuses on the left side of the pipeline 15:59:05 rrsagent, make minutes 15:59:05 I have made the request to generate http://www.w3.org/2013/04/23-odw-minutes.html StevenPemberton_ 15:59:22 ... 
will show some projects that use Open Data 15:59:54 The HBR article by Jer Thorp nicely supports the thoughts of the speakers, (I think), "As we proceed towards profit and progress with data, let us encourage artists, novelists, performers and poets to take an active role in the conversation. In doing so we may avoid some of the mistakes that we made with the old oil." 16:02:48 ... Metrology, visualises the London tube map with Open Street Map data as a mental map, by mapping actual locations to tube map, using mathematical models 16:03:35 He showed the mapping from true life to the tube map, and then reversed the process to make a real map with the same distortions 16:06:09 ... Speculative Sea Level Explorer project Combines NASA data on sea level with map visualisations to show effects of sea levels rising and falling 16:07:10 bschloss has joined #odw 16:07:53 ... sneak preview to m3ta.js, a visual programming language with metaphor to lego-blocks 16:08:04 Fascinating to see what Royal Academy of Art people can do for visualizations. Can less skilled people do something nearly as good? My IBM colleagues are experimenting with a site called Many Eyes 2.0 (beta) at http://www-958.ibm.com/software/analytics/labs/manyeyes/ 16:08:16 suptopic: panel discussion 16:08:28 s/suptopic/subtopic 16:09:38 julian: do you see yourself creating a toolbox for visualising open data? 
16:09:46 albertm` has joined #odw 16:10:24 benedikt: great to release tools, but you can't just release source-code but need documentation and examples too, which is time-consuming 16:11:02 st has joined #odw 16:11:28 Alvaro: you can't just release code/tools/projects, but you are responsible for maintaining it (like kids :) ) 16:11:37 had very good experiences with http://d3js.org/ for data visualisation - very powerful toolkit 16:12:13 Question from audience 16:12:24 @Alvaro, Interesting analogy, Open Source is like a marriage, 'it comes back and you have to answer questions… it is also like children, you cannot let them out into the wild [without guidance]' ;-) 16:12:58 Aivan: if you have to convince CNN in an elevator pitch to use the approach as BBC, how would you do it? 16:13:35 Olivier (BBC, from audience): focus on your own data, and use Open Data where possible to fill the gaps 16:13:59 TimBL: Who publlishes data about their own products? 16:14:42 s/Olivier/yvesr :) 16:15:11 s/Aivan/Ivan/ 16:15:18 ... if people publich data about their own products, there won't be a need for CNN to publish data 16:15:22 albertm`` has joined #odw 16:15:59 s/publich/publish 16:16:24 I invite everyone to publish information about their organization, project, product and/or service on the Web today using http://dir.w3.org. 16:16:25 If you care, it is a entirely Linked Data app. If you don't care, just fill out the form, publish the dir.ttl file produced for you automagically (like FOAF-a-Matic) on the public Web and submit it for harvesting. 
16:16:33 s/publlishes/publishes 16:16:40 sofia: so much in archives, not just about publishing data, but reusing data 16:16:56 Comment from audience: metadata is advertising for your data 16:17:12 RE: dir.w3.org, if you want to read an FAQ, see http://dir.w3.org/directory/pages/faq.docbook?view 16:17:53 Neil Benn (Fujitsu): in 2020, what have the political arguments been to convince governments to publish Open Data? 16:18:53 Alvaro: it's socially beneficial for everyone, Open Data enables people to solve more problems 16:19:57 ... in Chile, a lot of money is being invested in start-ups and entrepreneur programmes; is it not fair to ask for similar spend on democratising data? 16:20:32 Benedikt: in the future, there mightn't be an open data debate, it will just be the standard 16:21:39 Bschloss: TimBL alluded to a key thing, CNN will have to put out metadata on related content 16:22:57 ... uses the example of airlines. putting out ticket information because they wanted to be listed 16:23:27 Andreas: key is that entry level for using Open Data is very low 16:23:46 cgueret has joined #odw 16:24:37 bhyland has joined #odw 16:24:37 bhyland: there is now a community directory online dir.w3.org/ 16:24:39 LarsG has joined #odw 16:24:48 http://dir.w3.org 16:24:57 timbl has joined #odw 16:25:10 logger, pointer? 16:25:15 CNN will have to put out metadata or risk losing sales or eyeballs. Let's learn from history where first movers got value (like Airlines that listed their schedules and prices on GDS', then other Airlines followed rapidly to not be at a disadvantage) 16:25:16 to list Linked Data products, services and projects 16:25:17 bhyland: the "Create an entry" link at http://dir.w3.org/directory/pages/faq.docbook?view doesn't work, and there's a missing stylesheet error when one goes where you'd think it should have pointed 16:25:25 RRSAgent, pointer? 
16:25:25 See http://www.w3.org/2013/04/23-odw-irc#T16-25-25 16:25:26 On behalf of the W3C Gov't Linked Data Working Group, I encourage everyone attending this workshop to add their organization to dir.w3.org today or tomorrow. 16:26:00 (ah; I'd missed the '?view' off the end of my guessed URL) 16:26:03 sofia: important to show the value to publishers of opening up data 16:26:05 http://readwrite.com/2010/06/30/how_best_buy_is_using_the_semantic_web 16:26:11 It is simple to do, fast and gets more valuable Linked Data on the public web … plus it builds community & helps us all help one another. 16:26:39 Best Buy reports a 30% increase in page views, and 15% increase in click throughs 16:26:52 Alvaro: if a major part of the population cannot access the data, the technical discussions are irrelevant. general public needs to be empowered to access and use Open Data 16:27:04 @alex, what browser are you using? I see it ok on FF & Chrome 16:27:16 Andreas: agrees, general public should realise Open Data is THEIR data 16:27:51 bhyland: the Create an entry link on http://dir.w3.org/directory/pages/faq.docbook?view fails :-( 16:27:56 Benedikt: things are looking positive, let's hope to implement even 30% of what we have been discussing here tday 16:27:59 Ah Alex, I see the problem, try this http://dir.w3.org/directory/pages/create-entry.xhtml?view 16:28:10 Thanks for pointing out that incorrect link, will fix now. 16:28:33 rrsagent, draft minutes 16:28:33 I have made the request to generate http://www.w3.org/2013/04/23-odw-minutes.html ivan 16:28:56 bhyland: thanks :-) 16:29:06 jpcs1 has joined #odw 16:30:10 s/tday/today 16:31:28 yoshiaki has joined #odw 16:31:39 Hats off to Phil for keeping us on military time & getting us to the pub on time! Awesome job program chairs, thank you Phil, Jeni, Rufus and DanBri 16:32:20 yoshiaki has joined #odw 16:35:02 cjg has joined #odw 16:37:06 cjg has joined #odw