12:57:08 RRSAgent has joined #dwbp
12:57:08 logging to http://www.w3.org/2015/03/27-dwbp-irc
12:57:10 RRSAgent, make logs 351
12:57:10 Zakim has joined #dwbp
12:57:12 Zakim, this will be DWBP
12:57:12 ok, trackbot; I see DATA_DWBP()9:00AM scheduled to start in 3 minutes
12:57:13 Meeting: Data on the Web Best Practices Working Group Teleconference
12:57:13 Date: 27 March 2015
12:58:10 annette_g has joined #dwbp
12:58:44 DATA_DWBP()9:00AM has now started
12:58:51 +??P1
12:58:54 Zakim, ??P1 is me
12:58:54 +BartvanLeeuwen; got it
12:59:29 +[IPcaller]
12:59:32 zakim, [ is me
12:59:32 +phila; got it
12:59:55 MTCarrasco has joined #dwbp
12:59:59 +??P3
13:00:15 zakim, ??P3 is me
13:00:15 +MTCarrasco; got it
13:00:17 +annette_g
13:00:24 Hi all!
13:00:53 Hi!
13:01:33 BernadetteLoscio has joined #dwbp
13:02:03 +Steve
13:02:27 +HadleyBeeman
13:02:44 deirdrelee has joined #dwbp
13:03:33 zakim, who is here?
13:03:33 On the phone I see BartvanLeeuwen, phila, MTCarrasco, annette_g, Steve, HadleyBeeman
13:03:36 On IRC I see deirdrelee, BernadetteLoscio, MTCarrasco, annette_g, Zakim, RRSAgent, BartvanLeeuwen, yaso, adler1, phila, hadleybeeman, rhiaro, trackbot
13:04:24 ericstephan has joined #dwbp
13:05:11 +estephan
13:05:15 RiccardoAlbertoni has joined #DWBP
13:05:26 zakim, estephan is ericstephan
13:05:26 +ericstephan; got it
13:05:41 newton has joined #dwbp
13:05:42 +[IPcaller]
13:05:51 zakim, ipcaller is me
13:05:51 +deirdrelee; got it
13:06:01 +RiccardoAlbertoni
13:06:01 antoine has joined #dwbp
13:06:19 hi all!
13:06:28 zakim, who is here?
13:06:28 On the phone I see BartvanLeeuwen, phila, MTCarrasco, annette_g, Steve, HadleyBeeman, ericstephan, deirdrelee, RiccardoAlbertoni
13:06:30 On IRC I see antoine, newton, RiccardoAlbertoni, ericstephan, deirdrelee, BernadetteLoscio, MTCarrasco, annette_g, Zakim, RRSAgent, BartvanLeeuwen, yaso, adler1, phila,
13:06:30 ... hadleybeeman, rhiaro, trackbot
13:06:49 yaso1 has joined #dwbp
13:06:51 +[IPcaller]
13:06:52 Zakim, yaso has Newton
13:06:52 sorry, yaso1, I do not recognize a party named 'yaso'
13:06:56 yaso1 has left #dwbp
13:07:03 zakim, pick a victim?
13:07:03 I don't understand your question, hadleybeeman.
13:07:06 zakim, IPcaller is me
13:07:06 +antoine; got it
13:07:06 yaso1 has joined #dwbp
13:07:11 zakim, pick a victim
13:07:11 Not knowing who is chairing or who scribed recently, I propose Steve
13:07:25 zakim, steve is adler1
13:07:27 +adler1; got it
13:07:38 +Reinaldo
13:07:47 zakim, who is here?
13:07:47 On the phone I see BartvanLeeuwen, phila, MTCarrasco, annette_g, adler1, HadleyBeeman, ericstephan, deirdrelee, RiccardoAlbertoni, antoine, Reinaldo
13:07:48 adler1, would you be willing to scribe?
13:07:49 On IRC I see yaso1, antoine, newton, RiccardoAlbertoni, ericstephan, deirdrelee, BernadetteLoscio, MTCarrasco, annette_g, Zakim, RRSAgent, BartvanLeeuwen, yaso, adler1, phila,
13:07:49 ... hadleybeeman, rhiaro, trackbot
13:07:52 Zakim, Reinaldo is yaso
13:07:52 +yaso; got it
13:08:30 zakim, pick a victim
13:08:30 Not knowing who is chairing or who scribed recently, I propose RiccardoAlbertoni
13:08:46 scribe: adler1
13:08:47 Zakim, Yaso has newton
13:08:47 +newton; got it
13:09:02 http://www.w3.org/2013/meeting/dwbp/2015-03-20
13:09:12 zakim, who is noisy?
13:09:19 propose: approve last meeting minutes
13:09:23 hadleybeeman, listening for 10 seconds I heard sound from the following: adler1 (25%), HadleyBeeman (14%), yaso (54%)
13:09:32 proposed: approve last meeting minutes
13:09:36 +1
13:09:39 +1
13:09:39 +1
13:09:40 +1
13:09:41 +1
13:09:42 +1
13:09:50 s/propose: approve last meeting minutes//
13:10:07 +1
13:10:10 -adler1
13:10:20 RESOLVED: approve last meeting minutes
13:10:33 AdrianoC has joined #dwbp
13:10:35 +adler1
13:10:46 +[IPcaller]
13:10:46 Agenda: https://www.w3.org/2013/dwbp/wiki/Meetings:Telecon20150327
13:10:51 zakim, ipcaller is BernadetteLoscio
13:10:51 +BernadetteLoscio; got it
13:10:52 Chair: Hadley
13:11:03 http://www.w3.org/2013/dwbp/track/issues/open
13:11:10 hadly: let's jump into open issues
13:11:18 issue-48?
13:11:18 issue-48 -- Phil to look at whether the ucr doc sufficiently covers code lists -- open
13:11:18 http://www.w3.org/2013/dwbp/track/issues/48
13:12:02 s/hadly/hadleybeeman/
13:12:08 issue-48 resolved on the mailing list http://www.w3.org/2013/dwbp/track/issues/48
13:12:25 close issue-48
13:12:26 Closed issue-48.
13:13:08 formats: 67 examples: 56 policy: 74 technologies: 144
13:13:29 what is the general structure of the best practices document
13:13:32 issue-67?
13:13:32 issue-67 -- Should we include a best practice around which format to use? (csv, json, json-ld, xml, etc.) -- closed
13:13:32 http://www.w3.org/2013/dwbp/track/issues/67
13:13:37 issue-56?
13:13:37 issue-56 -- We need context and examples. do they go into the rec-track documents or into a separate note? -- closed
13:13:37 http://www.w3.org/2013/dwbp/track/issues/56
13:13:43 issue-74?
13:13:43 issue-74 -- Is it in scope to include mention of policy framework etc. as part of the non-normative discussion/editorialisation of the bp doc -- open
13:13:43 http://www.w3.org/2013/dwbp/track/issues/74
13:13:46 issue-144?
13:13:46 issue-144 -- There is a technological bias in several parts of the document -- open
13:13:46 http://www.w3.org/2013/dwbp/track/issues/144
13:14:04 nathalia has joined #dwbp
13:14:25 issue-67?
13:14:25 issue-67 -- Should we include a best practice around which format to use? (csv, json, json-ld, xml, etc.) -- closed
13:14:25 http://www.w3.org/2013/dwbp/track/issues/67
13:14:30 link for 67? not on the list
13:14:35 bernadette: I was in São Paulo, and we worked through some issues and closed issue-67
13:14:42 q+
13:15:01 DSV (Delimiter-separated values)
13:15:01 ack MTCarrasco
13:15:24 http://en.wikipedia.org/wiki/Delimiter-separated_values
13:15:40 q+
13:15:48 ack phila
13:16:00 phila: you can use tabs or commas to separate things
13:16:11 phila: the term everyone uses is CSV
13:16:30 q+
13:16:33 I agree that we should broaden the term, but the CSV on the web WG as "Tabular Data".
13:16:43 phila: when we say CSV we also mean tab and comma, but it may be more accurate to say delimiter-separated values
13:16:51 s/ as / uses/
13:17:07 CSV is easier
13:17:10 ack annete
13:17:15 ack annette
13:17:19 no need to use terms people don't often use
13:17:32 Fine to use CVS meaning DSV
13:17:34 q+
13:17:42 s/CVS/CSV/
13:17:52 q+
13:18:09 q+
13:18:16 q+ phila
13:18:27 q+
13:18:30 ack adler
13:18:33 +1 to confusion
13:18:41 zakim, adler is adler1
13:18:41 +adler1; got it
13:18:52 http://www.w3.org/TR/tabular-data-model/ Tabular data is data that is structured into rows, each of which contains information about some thing.
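Editor's note on the CSV/DSV point above: the sketch below is only an illustration, not something shown in the meeting. It assumes Python's standard csv module, and the helper name parse_dsv and the sample values are made up for the example. It shows that comma-separated and tab-separated text can describe the same tabular data, differing only in the delimiter.

```python
import csv
import io


def parse_dsv(text, delimiter):
    """Parse delimiter-separated text into a list of row dictionaries.

    Hypothetical helper for illustration; the first row is treated as the header.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


# The same made-up table, once with commas and once with tabs.
comma_text = "name,value\nalpha,1\nbeta,2\n"
tab_text = "name\tvalue\nalpha\t1\nbeta\t2\n"

comma_rows = parse_dsv(comma_text, ",")
tab_rows = parse_dsv(tab_text, "\t")

# Both inputs yield the same rows, so only the delimiter differs.
print(comma_rows == tab_rows)
```

In this sketch the rows are the tabular data and the delimiter is a serialization detail, which is one way to read the suggestion that "CSV" is being used loosely for delimiter-separated values in general.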
13:19:01 ack annette
13:19:21 q-
13:19:22 annette: we could build a glossary to explain formats
13:19:26 ack mt
13:19:35 http://en.wikipedia.org/wiki/Attribute%E2%80%93value_pair
13:20:03 q+
13:20:16 q-
13:20:27 Key-value pairs
13:20:31 ack deirdre
13:20:33 thomas: the question includes using key value pairs as a format
13:20:40 q+
13:20:45 AdrianoC-UFMG has joined #dwbp
13:20:48 q+
13:20:55 ack adler
13:21:36 q+
13:21:49 q+
13:21:55 adler1: There are so many other file types that are used in the open data
13:22:02 ack annette
13:22:24 +1 to Annette
13:22:26 annette: we should avoid key value pairs because it suggests many formats
13:22:29 ack deirdre
13:23:01 q+
13:23:10 deirdrelee: I agree with annette, for the main types of data we can stay with common formats and in Ireland we can name common formats for additional filetypes
13:23:18 q+ to talk about recommended file types
13:23:32 deirdrelee: there are other filetypes, and we should acknowledge those as well
13:23:35 q?
13:23:37 there are also many other open binary formats that might be useful as well
13:23:41 ack MTCarrasco
13:23:41 +1 deirdrelee
13:23:45 q-
13:23:47 Formats: the most popular - textual: CSV, key-value, JSON, XML
13:24:08 q+
13:24:10 ack me
13:24:10 phila, you wanted to talk about recommended file types
13:24:11 JSON = key-value
13:24:12 q+
13:24:12 thomas: we can mention the most popular file types, csv, key value, xml, json
13:24:19 phila: RDF too
13:24:45 Graphical: PNG
13:24:51 Caroline_ has joined #DWBP
13:24:57 q+
13:25:02 Zakim, Yaso has Caroline_
13:25:02 +Caroline_; got it
13:25:53 q+
13:26:18 q+
13:26:47 ack ericstephan
13:27:19 adler1: Makes the point about information needing to be human consumable in things like images, videos and docs etc.
13:27:30 ericstephan: original discussions about formats and we should be explicit about file formats
13:27:46 ericstephan: maybe we need a document talking about specific formats
13:28:17 ericstephan: families of formats, like netcdf. we could go down the road describing all kinds of formats
13:28:26 q?
13:28:35 ericstephan: is there a bucket we could put all the file formats into
13:28:35 ack BernadetteLoscio
13:28:54 q+
13:29:05 bernadette: maybe we can find some data formats and create a section in the document and list the most used data formats
13:29:21 ack deirdrelee
13:29:21 bernadette: i think we can list the main data formats
13:29:22 q+
13:30:08 CarlosIglesias has joined #dwbp
13:30:09 deirdrelee: I am not going to disagree with steve. there is an org in ireland that does archiving for ireland
13:30:20 q?
13:30:24 deirdrelee: we focus on tabular and the org focuses on other file types
13:30:44 q+
13:30:57 nathalia_ has joined #dwbp
13:31:16 q+
13:31:25 +??P26
13:31:25 +1 deirdrelee maybe a "data catalog"?
13:31:26 deirdrelee: we could classify formats as open and machine-readable and make a matrix
13:31:45 JPEG is far from dead. as a photographer, I use it all the time
13:31:51 zakim, ??P26 is me
13:31:51 +CarlosIglesias; got it
13:31:52 ack annette_g
13:31:53 q-
13:32:31 annette: a good idea to include image and video formats, but we should focus on a data format and reference to media files
13:32:36 ack MTCarrasco
13:32:41 The examples must be specific and give examples with the most popular formats: textual, graphics ... at least one for each type
13:32:43 media files have EXIF metadata
13:33:02 thomas: we must use specific examples and most popular formats
13:33:02 Also package formats
13:33:15 how to package the things
13:33:20 http://joinup.ec.europa.eu/site/med/tem/
13:33:46 ack me
13:34:33 +1 to phila
13:34:45 Package format: http://dragoman.org/xdossier
13:34:47 +1 to phil
13:34:55 phila: there is an intermediary between the data and a human being, and our remit stops at that intermediary
13:35:02 We have to help people publish good, clean, usable data so that developers can make good visualisations, displays, conclusions, etc from it.
13:35:07 scientists also have nasty habits of hanging onto formats that might be considered obsolete by the web publishing community. non-binary formats are used to be a bridge to those formats
13:35:08 Share-PSI http://www.w3.org/2013/share-psi/bp/hls/
13:35:29 zakim, who is here?
13:35:29 On the phone I see BartvanLeeuwen, phila, MTCarrasco, annette_g, HadleyBeeman, ericstephan, deirdrelee, RiccardoAlbertoni, antoine, yaso, adler1.a, BernadetteLoscio, CarlosIglesias
13:35:33 yaso has Caroline_
13:35:33 On IRC I see nathalia_, CarlosIglesias, Caroline_, AdrianoC-UFMG, AdrianoC, yaso1, antoine, newton, RiccardoAlbertoni, ericstephan, deirdrelee, BernadetteLoscio, MTCarrasco,
13:35:33 ... annette_g, Zakim, RRSAgent, BartvanLeeuwen, adler1, phila, hadleybeeman, rhiaro, trackbot
13:35:45 The formats must be human and machine readable -
13:35:48 the use of media files is well documented in our use cases
13:35:50 zakim, mute BernadetteLoscio
13:35:51 BernadetteLoscio should now be muted
13:35:53 multimedia galleries is a quite usual use case for open government data as well
13:35:58 q?
13:35:58 +1
13:36:32 ack BartvanLeeuwen
13:36:38 phila: disagrees with adler1 and thinks media files are out of scope for the group
13:36:40 "Package or Perish"
13:36:56 bart: how much impact does a specific file format have on the best practice
13:37:04 @mtcarrasco: I don't know much about packaging. When would someone want to do it? What are the use cases?
13:37:04 good point BartvanLeeuwen
13:37:13 q+
13:37:31 Not mentioned formats are not excluded - but we have to be specific
13:37:35 ack adler1
13:37:42 of course if you note my comment on the sciences this isn't true
13:38:17 adler1: CSV is nearly 40 years old as an export format from Lotus
13:38:32 adler1: JPG is used everywhere for images
13:38:43 steve is brilliantly articulating the reason for the charter for the CSV on the Web working group :)
13:38:48 shapefiles geospatial...
13:38:54 ... AVI, MP3 etc are all widely used. Open data is new
13:39:13 q+
13:39:26 ... we don't have to catalogue all file formats but we do need to look at our use cases and the file formats that they refer to
13:40:10 ... I like the idea of a table of file formats that may be used and what they can be used for.
13:40:29 adler1: Some files have their own metadata file (EXIF etc.)
13:40:35 ... and we can talk about that.
13:40:49 q?
13:40:54 ack adler
13:40:57 The Art of Unix Programming - Textuality - http://www.catb.org/esr/writings/taoup/html/textualitychapter.html
13:40:58 adler1: I think it behoves us to create a glossary of these and where our standards fit
13:41:03 ack next
13:41:09 q+
13:41:46 deirdrelee: there are an infinite number of file types, and a lot are specific to systems, and we care about data inter-operability
13:42:03 Should we ONLY reference file formats that have a syntactic online definition we can reference?
13:42:10 deirdrelee: we focus on RDF, CSV and they are fit for purpose
13:42:39 E.g. IETF, W3C links to formats
13:42:43 ack annette_g
13:42:45 deirdrelee: our document is a snapshot of what we know and does not exclude new formats
13:43:05 annette: I think we are talking about two different things, data vs file format
13:43:20 annette: maybe we need to split that into two different things
13:43:28 ack next
13:43:28 q?
13:43:49 thomas: we must create a specific example with specific formats
13:43:49 q+
13:43:50 q+
13:44:04 q-
13:44:11 thomas: the example must be human and machine-readable
13:44:17 q+
13:44:18 ack hadleybeeman
13:44:32 q-
13:44:56 don't think all formats need to be human and machine readable
13:45:03 something nice to have sometimes
13:45:04 hadley: we have a lot of different topics in this discussion and we should focus on data on the web
13:45:06 but not others
13:45:29 even counter-productive in some cases
13:45:31 terminology: data model vs. data model - http://dragoman.org/format
13:45:41 JPEG is machine readable
13:45:49 terminology: data model vs. data format - http://dragoman.org/format
13:46:20 issue-67?
13:46:20 issue-67 -- Should we include a best practice around which format to use? (csv, json, json-ld, xml, etc.) -- closed
13:46:20 http://www.w3.org/2013/dwbp/track/issues/67
13:46:46 zakim, unmute BernadetteLoscio
13:46:46 BernadetteLoscio should no longer be muted
13:47:29 open a new issue
13:47:52 what file formats should we list in our glossary that are human and machine readable
13:47:56 should we add a BP about file formats (as opposed to data formats)?
13:48:00 q+
13:48:56 q+
13:49:05 q-
13:49:26 +1 to close
13:49:36 Maybe we should create one action to someone to provide examples for the BP. Those examples showing the possible formats that could be used, showing json, xml, csv
13:49:39 ack adler1
13:49:44 ack adler
13:50:11 Format is central to data
13:50:17 http://www.w3.org/TR/dwbp/#dataFormats
13:50:25 q+
13:50:26 adler1: I agree that the issue is closed. But today we're talking about something else. We're looking at discovery, quality etc. And that can include data in any file formats
13:50:54 ... these BPs could apply to multiple file types that are both human and machine readable, and that's the criteria for inclusion in the glossary
13:50:56 Define data types
13:51:11 q+ to talk about using PDF consistently with our BPs
13:51:14 q+
13:51:55 ack BernadetteLoscio
13:52:18 q+
13:52:38 annette: file formats are jpeg and data file formats are csv
13:53:13 +1 to annette
13:53:27 annette: ASCII is the file format and CSV is the structure
13:53:56 ack me
13:53:56 phila, you wanted to talk about using PDF consistently with our BPs
13:54:04 Is the requirement specifically relevant to data published on the Web?
13:54:04 Does the requirement encourage reuse or publication of data on the Web?
13:54:04 Is the requirement testable?
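Editor's note on the suggestion at 13:49:36 that examples for the BP could show json, xml and csv: the sketch below is a hypothetical illustration, not an example produced by the group. It assumes Python's standard csv, json and xml.etree.ElementTree modules; the record contents, function names and XML element names are invented for the example. It serializes one small set of records into each of the three formats.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Made-up records, used only to show the three serializations.
records = [{"name": "alpha", "value": "1"}, {"name": "beta", "value": "2"}]


def to_csv(rows):
    """Serialize the rows as comma-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "value"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize the rows as a JSON array of key-value objects."""
    return json.dumps(rows, indent=2)


def to_xml(rows):
    """Serialize the rows as XML; the element names are an arbitrary choice."""
    root = ET.Element("records")
    for row in rows:
        item = ET.SubElement(root, "record")
        for key, val in row.items():
            ET.SubElement(item, key).text = val
    return ET.tostring(root, encoding="unicode")


print(to_csv(records))
print(to_json(records))
print(to_xml(records))
```

The same data survives a round trip through each format, which may help separate the question of what the data is from the question of which format carries it.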
13:54:31 zakim, mute BernadetteLoscio
13:54:31 BernadetteLoscio should now be muted
13:55:07 q+
13:55:31 jpeg has also its own structure
13:55:41 why it is not a "data format"?
13:55:55 phila: if you publish data in pdf you are publishing information for
13:56:12 @CarlosIglesias, how can jpegs be reused? :)
13:56:23 photos are machine searchable today for faces, buildings, and text
13:56:24 q-
13:56:26 (or rather, how can the data in jpegs be reused)
13:56:32 for multimedia content
13:56:44 Agree with phil, maybe put metadata on the image, on the video, on the gif, but that's all
13:56:44 through the associated metadata
13:56:49 images are analyzed for electron microscopy, new instruments are producing 100,000 images a second that need to be analyzed by machines.
13:56:54 q?
13:57:01 in fact they are one of the most reused formats
13:57:27 you can reuse the full image, not necessarily just some bits
13:57:29 Define: data model, file format, data format, data type
13:57:33 ack mt
13:58:18 s/ if you publish data in pdf you are publishing information for/ if you publish data in pdf you are publishing information for people, not machines, it works against re-use/
13:58:30 we can define terms in the intro, too
13:58:39 ack antoine
13:58:41 zakim, unmute BernadetteLoscio
13:58:41 BernadetteLoscio should no longer be muted
13:58:56 file format and data format: the same
13:59:11 antoine: in some of the use cases, DQ mentions quality of type and features
13:59:37 It's a really interesting discussion publishing for humans/machines
13:59:45 +1 to ericstephan
13:59:47 antoine: idea to sort the issue of different file types to consider DQ and if it makes sense for these files
14:00:07 indeed it is critically important to validate DQ in images and video which are constantly abused
14:00:08 I like that antoine
14:00:12 BernadetteLoscio: perhaps this is a data usage vocab item?
14:00:22 data quality: format quality and content quality
14:00:34 I could send you all terabytes of fraudulent images and video
14:00:47 https://www.w3.org/2013/dwbp/wiki/Glossary
14:00:48 terms: +1
14:00:48 80% of all internet traffic is media
14:00:52 me, bye antoine!
14:01:00 -antoine
14:01:16 +1. A glossary will address all these issues: 52, 59, 68, 80, 82, 133
14:01:21 +1
14:01:31 thank you
14:01:37 bye!
14:01:48 bank holiday next week
14:02:00 cancelling call next week
14:02:00 it will be a holiday in Brazil
14:02:03 Easter Friday is a holiday in many countries so no call next week
14:02:12 ok
14:02:18 Thanks, all!
14:02:20 yeah for sleeping in!
14:02:52 We need a F2F agenda still?
14:02:52 Bye! Happy easter and eat chocolates!
14:02:55 bye!
14:02:57 thanks everyone
14:02:58 -adler1.a
14:02:58 bye
14:03:00 -deirdrelee
14:03:01 bye
14:03:02 yaso1 has left #dwbp
14:03:02 -phila
14:03:03 -MTCarrasco
14:03:04 -BernadetteLoscio
14:03:04 -HadleyBeeman
14:03:05 bye
14:03:06 -BartvanLeeuwen
14:03:06 -ericstephan
14:03:09 -yaso
14:03:12 -annette_g
14:03:20 trackbot end meeting
14:03:20 Zakim, list attendees
14:03:20 As of this point the attendees have been BartvanLeeuwen, phila, MTCarrasco, annette_g, HadleyBeeman, ericstephan, deirdrelee, RiccardoAlbertoni, antoine, adler1, newton,
14:03:23 ... BernadetteLoscio, Caroline_, CarlosIglesias
14:03:28 RRSAgent, please draft minutes
14:03:28 I have made the request to generate http://www.w3.org/2015/03/27-dwbp-minutes.html trackbot
14:03:29 RRSAgent, bye
14:03:42 RRSAgent, make logs public
14:03:46 trackbot end meeting
14:03:46 Zakim, list attendees
14:03:46 As of this point the attendees have been BartvanLeeuwen, phila, MTCarrasco, annette_g, HadleyBeeman, ericstephan, deirdrelee, RiccardoAlbertoni, antoine, adler1, newton,
14:03:50 ... BernadetteLoscio, Caroline_, CarlosIglesias
14:03:54 RRSAgent, please draft minutes
14:03:54 I have made the request to generate http://www.w3.org/2015/03/27-dwbp-minutes.html trackbot
14:03:55 RRSAgent, bye
14:03:55 I see no action items