12:34:05 RRSAgent has joined #csvw
12:34:05 logging to http://www.w3.org/2014/02/19-csvw-irc
12:34:07 RRSAgent, make logs public
12:34:07 Zakim has joined #csvw
12:34:09 Zakim, this will be CSVW
12:34:09 ok, trackbot; I see DATA_CSVWG()8:00AM scheduled to start in 26 minutes
12:34:10 Meeting: CSV on the Web Working Group Teleconference
12:34:10 Date: 19 February 2014
12:34:20 Chair: DanBri
12:34:51 Scribe: Jeni
12:39:58 Regrets: Axel Polleres, Adam Retter, Jürgen Umbrich
12:40:32 ivan has changed the topic to: Meeting agenda: https://www.w3.org/2013/csvw/wiki/Meeting_Agenda_2014-02-19
12:53:37 TimFinin has joined #csvw
12:54:04 any agenda additions?
12:54:07 konstant has joined #csvw
12:54:21 JeniT has joined #csvw
12:54:48 DATA_CSVWG()8:00AM has now started
12:54:55 +[IPcaller]
12:55:15 zakim, +[IPcaller] is me
12:55:15 sorry, konstant, I do not recognize a party named '+[IPcaller]'
12:55:37 +TimFinin
12:55:54 aren't we 5 mins early?
12:56:05 possibly
12:56:41 fonso has joined #csvw
12:56:43 my three different instances of ntp fail to agree with each other
12:57:11 -[IPcaller]
12:58:05 I have a problem with OSX audio - can't hear anything
12:58:45 zakim, code?
12:58:45 the conference code is 2789 (tel:+1.617.761.6200 sip:zakim@voip.w3.org), ivan
12:59:21 +ivan
12:59:30 +[IPcaller]
12:59:36 jtandy has joined #csvw
12:59:52 trackbot, start telcon
12:59:54 RRSAgent, make logs public
12:59:56 Zakim, this will be CSVW
12:59:56 ok, trackbot, I see DATA_CSVWG()8:00AM already started
12:59:57 Meeting: CSV on the Web Working Group Teleconference
12:59:57 Date: 19 February 2014
13:00:02 Chair: Dan Brickley
13:00:06 Scribe: Jeni Tennison
13:00:09 ScribeNick: JeniT
13:00:13 danbri has joined #csvw
13:00:21 +ericstephan
13:00:26 EricStephan has joined #csvw
13:00:29 +[IPcaller]
13:00:30 zakim, IPCaller is me
13:00:30 +AndyS; got it
13:00:52 +[IPcaller]
13:01:10 zakim, IPCaller is me
13:01:10 +fresco_; got it
13:01:37 Agenda: https://www.w3.org/2013/csvw/wiki/Meeting_Agenda_2014-02-19
13:01:53 +??P11
13:02:05 zakim, P11 is me
13:02:05 sorry, DavideCeolin, I do not recognize a party named 'P11'
13:02:12 zakim, ??P11 is me
13:02:12 +DavideCeolin; got it
13:02:16 +??P10
13:02:17 +[IPcaller]
13:02:22 probably
13:02:41 zkim, IPcaller is me
13:03:10 +[IPcaller]
13:03:19 zakim, IPcaller is me
13:03:19 +konstant; got it
13:03:22 hi - am just having challenges with an updated version of my softphone ...
13:03:43 ok, thanks JeniT
13:04:00 is it twitter tweets ;-)
13:04:39 just muted the birds, sorry
13:04:45 danbri: approve last week's minutes?
13:04:47 http://www.w3.org/2014/02/12-csvw-minutes.html
13:05:02 ... no objections, approved
13:05:27 ... same agenda as last week, if anyone else has agenda items please pipe up
13:05:45 AndyS: are we using the Tracker for Issues?
13:06:00 jeni?
13:06:13 the issue that was raised on tracker was accidental (dan!)
13:06:22 [I can barely hear her]
13:06:30 goal was to use github issues
13:06:46 jtandy, any luck?
13:07:00 it was rather echoey
13:07:00 having difficulty hearing
13:07:04 zakim, mute me
13:07:04 konstant should now be muted
13:07:06 not yet ... getting "no route to destination"
13:07:15 will try skype
13:07:23 (skype out)
13:07:24 Topic: Syntax for Tabular Data on the Web
13:07:31 http://w3c.github.io/csvw/syntax/
13:07:58 scribenick: danbri
13:08:03 Jeni: what i've tried to do in this doc is to get a v basic idea
13:08:06 of tabular data model
13:08:22 and what a syntax for basic tabular data, ie. for that model, based on csv would look like
13:08:36 with an appendix which looks at existing specs, key implementations of csv, to see what they do
13:08:46 we might use existing specs and implementations as constraints on what we do
13:09:01 there are various issues in that we could go through
13:09:31 jeni: the tabular data model as i say v basic, … table w/ one or more cols, each named, one or more rows, each row has a field for each col in the table
13:09:37 that's the basic model
13:09:57 1st issue there is whether order of cols is significant
13:10:07 thoughts?
13:10:08 sound quality on skype appalling ... trying real phone. (2 mins please)
13:10:10 q?
13:10:21 andys: i'd have thought it was quite significant
13:10:37 often they're grouped together, for comprehensibility reasons
13:10:47 sometimes column naming used as a hint, … eg. with years
13:11:07 i think there's an intention underlying that they proceed from left to right
13:11:13 q+
13:11:17 q-
13:11:23 ack danbri
13:11:57 dan: order needed if col names missing
13:12:46 danbri: doc order can be administratively useful (as rdf/xml etc)
13:12:47 + +44.777.586.aaaa
13:13:16 thanks
13:13:30 i'd like to see an example of data that would have problems if the columns were re-ordered
13:13:33 q+
13:13:36 jeni: […] if not preserved, conseq would be that it would be possible for an impl to read in a file and write it out in a different order conformantly
13:13:43 q?
13:14:17 andys: some json reprs e.g. gregg's could lose order
13:14:24 ack TimFinin
13:14:25 jeni: any mappings to other formats could have this issue
13:14:44 timfinin: wrt ordering, systems that try to infer semantics of tables, order is a strong heuristic
13:15:07 … e.g. 1st column much more likely to be a key for the table, adj of columns helps inferring relationships between columns
13:15:20 jeni: another good reason
13:15:27 resolved: order of columns should be significant
13:15:32 next issue is, issue 2 -
13:15:54 jenit: in SQL, every column has a type associated with it, … should we assume same within our tabular data model?
13:16:02 or have that as a separate layer
13:16:24 dan: how would we answer this?
13:16:53 jenit: it's a design choice. for example in xml, … originally most values were not typed, then xmlschema layered on top
13:17:05 … compare w/ json, basics built-in
13:17:08 q+
13:17:13 q+
13:17:27 … in most data formats, we care about particular types, if you're passing around data you should care about the values
13:17:32 q+
13:17:36 ack Tim
13:18:04 timfinin: two notions of type, … low level datatype; integer, date, … other is a semantic type
13:18:20 … it would be v interesting to support adding semantic types as help for someone trying to use that table
13:18:29 so instead of being a mere string, is a person; or a musicalartist.
13:18:30 agree with TimFinin
13:18:46 if we had something like that, it wouldn't be like a schema for validation, but more info for someone trying to understand what the table is
13:18:53 q?
13:18:55 ack konstant
13:18:55 ack knostant
13:19:08 [poor audio?]
13:19:48 konstant: wrt data format, … can have header saying if something has an integer, float etc.
13:20:03 q+
13:20:05 … wrt semantic typing per tim's comment, 1st of all […] what cell is supposed to mean
13:20:15 q?
13:20:16 ack danbri
13:20:23 zakim, mute me
13:20:23 konstant should now be muted
13:21:18 maybe we could have any number of header rows. One might give simple datatypes. another might give column names. another might give URIs to semantic types
13:21:35 dan: q is whether we consider all csvs to have [homogeneously] typed columns (we can always add that via external files)
13:21:55 q?
13:21:58 if we allow an *any* type it might help
13:22:10 ack ivan
13:22:11 ack ivan
13:22:35 ivan: two diff things. for me, mainly as we have all these semantic types, for me this is part of the metadata we'll define. whether we assign a type, ...
13:22:55 ivan, it is 25C and sunny today.
13:22:56 for a column, it's metadata. that simplifies the treatment, management of the whole thing
13:23:09 are column header names considered metadata?
13:23:10 … also a q: reality of what's out there. What do Excel, OpenOffice, etc do?
13:23:24 do they recognise basic data like json, or do they turn everything into strings
13:23:27 andys: they spot numbers
13:23:31 jenit: and dates
13:23:43 ivan: so there's a number of data that they spot automatically
13:24:00 q+
13:24:15 Locale sensitive as well. 1/4 confuses : "1 April" vs 0.25.
13:24:17 q+ to suggest some but not all columns MAY share a fixed type for whole column; but some cols are chaotic.
13:24:22 ack EricStephan
13:24:24 ack EricStephan
13:24:35 Eric: a huge problem in scientific arena. If you're importing into a s/sheet
13:24:49 and if it detects in a cell something that [happens to] look like a date
13:25:07 so you sometimes have to engineer around this, to protect against the spreadsheet tool guessing badly
13:25:16 sometimes too smart
13:25:37 jenit: good point
13:25:39 http://nsaunders.wordpress.com/2012/10/22/gene-name-errors-and-excel-lessons-not-learned/
13:25:39 ack danbri
13:25:39 danbri, you wanted to suggest some but not all columns MAY share a fixed type for whole column; but some cols are chaotic.
13:26:10 q?
13:26:16 http://www.biomedcentral.com/1471-2105/5/80
13:26:43 q+
13:27:14 jenit: some more issues, but i'll make a redraft based on this, … probably with a 2 layer model
13:27:16 For semantic types, avoid "DanBri"^^foaf:Person
13:27:42 or "DanBri" rdf:type foaf:Person
13:27:59 ivan: one comment on list (maybe jtandy); in one csv file you often have several tables; is this something we're even considering
13:28:08 q+
13:28:10 q+ to talk about multiple tables.
13:28:17 jenit: good point, a lot of our examples have required multiple tables in some ways associated with each other
13:28:20 ack ivan
13:28:37 it is useful for this doc to somehow recognise that; then we can move on to discuss how those sep tables can be expressed
13:28:44 e.g. in one table, zipped etc
13:28:50 tables within tables: http://dx.doi.org/10.7717/peerj.259/table-3
13:28:53 q?
13:28:58 ack jtandy
13:29:13 jenit: :)
13:29:25 jtandy: the comment I made, .. often multiple CSVs are packaged as a dataset in a zipfile, each text file represents a facet of the dataset
13:29:50 q+
13:29:50 ivan: that's friendlier than several tables in one file
13:30:12 ack andys
13:30:12 AndyS, you wanted to talk about multiple tables.
13:30:44 andys: what i'd like to see … data syntax format pointing to a region of a csv file, …
13:30:52 konstant is "stasinos" and promises to change his nick
13:30:55 …orig to be able to id the data parts from the presentational surround
13:31:11 no prob
13:31:27 q+ to question scope about handling existing CSVs
13:31:35 andys: on mult tables, … sometimes it is written, there really are two tables there, but flattened in a dump
13:31:36 q-
13:32:01 andys: eg. regions + sales items packaged together
13:32:14 … gregg talked about this 'denormalization dumping effect'
13:32:16 q?
13:32:27 zakim, unmute me
13:32:27 konstant should no longer be muted
13:32:29 ack konstant
13:32:46 konstant: I'm not really sure why ivan is so worried about multiple tables in the same file
13:33:04 … we also have cols w/ diff types, diff rows, complex interdependencies, all kinds of [other] ugliness
13:33:21 … if someone dumps multiple tables in one file they'll have some kind of delimiter
13:33:30 … there should be something that is machine describable
13:33:56 ivan: surely true, a minor thing, but let's say the CSV handling toolkit w/ python would break on these things, for eg.
13:34:08 jtandy, ready to talk about Use Cases doc?
13:34:12 q?
13:34:18 q-
13:34:23 yes - quick update today
13:34:51 ivan: there are more complications out there than i expected, that's all!
13:34:54 ScribeNick: JeniT
13:34:56 zakim, mute me
13:34:56 konstant should now be muted
13:35:03 Topic: Use Cases & Requirements
13:35:32 jtandy: I've created the boilerplate document
13:35:54 http://lists.w3.org/Archives/Public/public-csv-wg/2014Feb/0072.html
13:35:59 http://w3c.github.io/csvw/use-cases-and-requirements/
13:36:04 ... abstract, intro etc are there
13:36:07 ... yet to add use cases
13:36:25 ... Alf has been working with Davide on providing more detailed examples with supporting datasets, so thanks to them
13:36:29 ... I will work through those shortly
13:36:45 ... email from Juan about CSV2RDF based on getting data out of relational databases
13:36:57 ... but there's no use case for CSV publication from relational databases
13:37:01 ... as yet
13:37:09 juan: http://lists.w3.org/Archives/Public/public-csv-wg/2014Feb/0058.html
13:37:18 ... re Gregg's CSV-LD proposal, it implies a bunch of use cases, but I'm not sure how many of those we're picking up
13:37:23 gregg: http://lists.w3.org/Archives/Public/public-csv-wg/2014Feb/0000.html
13:37:53 ... I'm looking for people to provide specific examples of data that we can stitch together with a narrative
13:38:07 ... I'm putting one together myself around the Met Office data which we can use as an example
13:38:18 +q
13:38:21 ... so if you have got a use case, please provide a narrative & datasets for them
13:38:42 ... there are details on the wiki under use case analysis
13:38:44 https://www.w3.org/2013/csvw/wiki/Use_Cases
13:38:52 ... I will ping people explicitly on the mailing list as I work on the use case
13:39:14 ... I'm concerned that the use cases don't yet cover the full scope of what we want to achieve
13:39:25 ... we need use cases to hang requirements on
13:39:33 danbri: are there any use cases promised but not delivered?
13:40:04 jtandy: I don't think so, but what we have doesn't match the scope of the requirements people are bringing up
13:40:09 i volunteered to give a use case for using CSV to exchange data in text information systems
13:40:11 q?
13:40:20 q?
13:40:22 ack EricStephan
13:40:22 ack eric
13:40:34 EricStephan: I've tried contributing use cases
13:40:34 I haven't shared a new version of the police data analysis use case yet but I'm close to having it done
13:40:47 text information systems -> text information extraction systems
13:40:54 davide, think you'll do that before next week's call?
13:40:56 ... I'm seeing in scientific formats, a basic format of header, then delimited data
13:41:02 ... I don't know how you want to organise those
13:41:10 ... I contributed two more this morning along those lines
13:41:13 s/text information systems/text information extraction systems/
13:41:29 danbri, yes hopefully in a couple of days max
13:41:31 jtandy: I haven't seen those, but I'll look at them and follow up on the list
13:41:51 EricStephan: also, there were some contributions around the NetCDF format
13:41:57 ... also uncertainty qualification
13:42:10 ... eg simulations that change one parameter
13:42:18 ... which gives multiple CSVs
13:42:31 ... I can elaborate around these and more complex examples if that would be helpful
13:42:32 q?
13:42:34 jtandy: yes please
13:42:53 ... One interesting thing is how we deal with missing values
13:43:03 ... eg people using -999
13:43:10 good point
13:43:19 ... that's an example where we can make sure the syntax deals with that
13:43:31 danbri: I met last week with colleagues working on Fusion Tables
13:43:47 ... it's possible we can ask questions about the CSVs in use on the web
13:43:52 ... eg about what line endings are used
13:44:01 ... or whether -999 happens often
13:44:12 ... so if you have questions like that send them my way and I'll try to answer them
13:44:51 danbri: any other business?
13:45:00 Topic: AOB
13:45:15 sounds good
13:45:18 q+
13:45:24 scribe next week?
13:45:35 ivan: I saw Jeremy tested in Excel
13:45:37 i might not be here next week (middle of california trip)
13:46:11 ... are tests in other tools useful?
13:46:31 JeniT: yes please
13:46:33 danbri: good to have these test files
13:46:39 ... Scribe volunteer?
13:46:43 zakim, pick a scribe
13:46:43 Not knowing who is chairing or who scribed recently, I propose konstant (muted)
13:46:56 a-ha
13:47:10 ok, ok
13:47:28 "volunteer"
13:47:34 thanks, 'volunteer'
13:47:36 :)
13:47:41 konstant will scribe next week
13:47:49 bye
13:47:50 -jtandy
13:47:51 -JeniT
13:47:52 -fonso
13:47:54 -ivan
13:47:57 -DavideCeolin
13:47:59 -ericstephan
13:47:59 -TimFinin
13:48:03 -AndyS
13:48:04 -danbri
13:48:11 bye
13:48:13 -fresco_
13:48:22 trackbot, stop telcon
13:48:22 Sorry, ivan, I don't understand 'trackbot, stop telcon'. Please refer to for help.
13:48:28 trackbot, end telcon
13:48:28 Zakim, list attendees
13:48:28 As of this point the attendees have been TimFinin, ivan, JeniT, ericstephan, AndyS, fresco_, DavideCeolin, danbri, konstant, fonso, jtandy
13:48:36 RRSAgent, please draft minutes
13:48:36 I have made the request to generate http://www.w3.org/2014/02/19-csvw-minutes.html trackbot
13:48:37 RRSAgent, bye
13:48:37 I see no action items