12:13:11 RRSAgent has joined #csvw
12:13:11 logging to http://www.w3.org/2014/03/05-csvw-irc
12:13:13 RRSAgent, make logs public
12:13:13 Zakim has joined #csvw
12:13:15 Zakim, this will be CSVW
12:13:15 ok, trackbot; I see DATA_CSVWG()8:00AM scheduled to start in 47 minutes
12:13:16 Meeting: CSV on the Web Working Group Teleconference
12:13:16 Date: 05 March 2014
12:13:26 ivan has changed the topic to: Meeting agenda: http://www.w3.org/mid/etPan.531645eb.41a7c4c9.f9@jenit.local
12:57:49 DATA_CSVWG()8:00AM has now started
12:57:54 stasinos has joined #csvw
12:57:56 +[IPcaller]
12:58:36 DavideCeolin has joined #csvw
13:00:00 +[IPcaller]
13:00:05 zakim, IPcaller is me
13:00:05 +stasinos; got it
13:00:11 zakim, mute me
13:00:11 stasinos should now be muted
13:00:21 zakim, code?
13:00:21 the conference code is 2789 (tel:+1.617.761.6200 sip:zakim@voip.w3.org), ivan
13:00:31 +Annette
13:00:43 zakim, Annette is me
13:00:43 +ivan; got it
13:00:48 +??P3
13:00:57 zakim, who is here?
13:00:57 On the phone I see fresco_, stasinos (muted), ivan, ??P3
13:00:58 On IRC I see DavideCeolin, stasinos, Zakim, RRSAgent, ivan, trackbot, fresco
13:01:03 zakim, ??P3 is me
13:01:03 +DavideCeolin; got it
13:01:20 JeniT has joined #csvw
13:01:39 RRSAgent, make logs public
13:01:41 Zakim, this will be CSVW
13:01:41 ok, trackbot, I see DATA_CSVWG()8:00AM already started
13:01:42 Meeting: CSV on the Web Working Group Teleconference
13:01:42 Date: 05 March 2014
13:02:03 +[IPcaller]
13:03:28 jumbrich has joined #csvw
13:03:36 +yakovsh
13:04:03 Chair: Jeni
13:04:12 Agenda: https://www.w3.org/2013/csvw/wiki/Meeting_Agenda_2014-03-05
13:04:26 yakovsh has joined #csvw
13:05:48 Not knowing who is chairing or who scribed recently, I propose JeniT
13:06:00 Not knowing who is chairing or who scribed recently, I propose ivan
13:06:22 scribe: ivan
13:06:26 scribenick: ivan
13:06:42 JeniT: approve previous meeting minutes?
13:06:43 ...
13:06:44 ...
13:06:48 (no comment...)
13:07:03 RESOLVED: previous meeting minutes approved
13:07:50 +??P6
13:07:54 DavideCeolin: on the use cases, 5 UCs were assigned to me
13:07:59 zakim, ??P6 is me
13:08:00 +jumbrich; got it
13:08:03 ... jeremy already set up the requirements
13:08:11 ... I am not sure they are really part of the others
13:08:38 JeniT: it would be useful to know whether there were any issues to be discussed?
13:08:45 DavideCeolin: one was about provenance
13:08:50 ... but this was discussed
13:09:07 ... but it has changed a bit
13:09:16 ... everything related to annotation is linked to this
13:09:29 ... the possibility to map elements to URIs
13:09:33 zakim, mute me
13:09:33 jumbrich should now be muted
13:09:42 ... in some spreadsheets there are codes, and it would be useful to link them to URIs
13:09:52 ... I pointed that out as a separate requirement
13:09:58 ... not sure whether this is covered by others
13:10:06 JeniT: I agree that is a requirement
13:10:14 ... probably is not elsewhere noted
13:10:38 DavideCeolin: is it covered by the external definition resource?
13:10:51 JeniT: keep it separate for now, and we will be able to come back to this later
13:11:03 DavideCeolin: there is also a requirement on units of measure
13:11:14 ... partially covered by the semantic requirements
13:11:24 ... but I still kept it separate for now
13:11:55 JeniT: you mean that the semantic type might cover some part of the measure unit?
13:12:09 DavideCeolin: I am not an expert on that, I am not sure what the best way to cover this is
13:12:27 JeniT: this may be a choice of the publisher whether this is something to go into the semantic part
13:12:53 JeniT: thanks for that, it is really good
13:12:53 ... our UC document is coming together
13:12:53 ... anybody have any issues/questions?
13:12:54 ...
13:12:55 ...
13:12:58 (no comments)
13:13:12 http://w3c.github.io/csvw/syntax/
13:13:13 Topic: syntax document
13:13:33 JeniT: following on from the last call I have taken the tabular data specification into the spec
13:14:09 ... in the data model we said every column has a name, now we say every column has an index
13:14:21 ... the name of the column is part of the annotation, ie, the annotated data model
13:14:25 http://w3c.github.io/csvw/syntax/#annotated-model
13:14:51 JeniT: the annotated model talks about the different types of annotations (tables, column level, etc)
13:14:58 ... these are the changes I made
13:15:11 JeniT: the first problem is issue 1
13:15:22 q+
13:15:22 ... is the order of rows significant in a table?
13:15:28 ack fresco
13:15:40 q+
13:15:48 q+
13:16:17 fresco: in the case you are using several tables in the same place then it may be a problem
13:16:31 JeniT: in our model we have now one table only
13:16:32 ack yakovsh
13:16:49 yakovsh: if we follow the spreadsheet model the row order is specifically significant
13:16:56 ack stasinos
13:16:57 JeniT: agreed
13:17:16 good point that references to individual cells rely on the row and column order being maintained
13:17:24 stasinos: one of the ways we are discussing is an ID which then has the properties of the fields in a row
13:17:33 q+
13:17:41 ... we can then make the requirement for that to represent a strict order
13:17:45 ack yakovsh
13:18:05 yakovsh: RFC 7111 has a row and a column level reference
13:18:14 zakim, mute me
13:18:14 stasinos should now be muted
13:18:28 http://tools.ietf.org/html/rfc7111
13:18:38 JeniT: any reason not to have it significant?
13:18:44 (no reaction...)
13:18:53 ... in that case I will add it in
13:19:12 JeniT: next issue: in the annotated data model I have annotating table, column, row, cells
13:19:16 danbri has joined #csvw
13:19:17 ... I also have annotated regions
13:19:45 ... I have a suspicion I put it in because it looked like a useful generalization
13:19:54 ...
do we really need it, or should I take it out?
13:20:03 ... any use case around that?
13:20:05 q+
13:20:09 ack stasinos
13:20:40 stasinos: one thing, I remember we were discussing a situation of a cell being, say, the sum of other cells
13:20:47 ... in that case there is the notion of a region
13:21:01 ... in that case we may want to talk about general regions
13:21:01 q+
13:21:03 danbri1 has joined #csvw
13:21:13 JeniT: that is interesting because it brings up referencing
13:21:25 ... referencing cells in a random manner, in a way
13:21:39 ... this is different from the current spec which talks about subtables
13:21:49 ... I think that is a useful thing to say
13:22:04 q+
13:22:19 ack fresco
13:22:25 stasinos: it is hard to tell, without a real use case
13:22:50 fresco: referencing should be better as part of a separate specification
13:22:52 stasinos: that is reasonable
13:23:05 -jumbrich
13:23:08 JeniT: in RFC 7111 we have that notion
13:23:15 zakim, mute me
13:23:15 stasinos should now be muted
13:23:17 should RFC 7111 be referenced in our documents? I don't see it
13:23:18 ack ivan
13:23:36 +??P6
13:23:47 zakim, ??P6 is me
13:23:47 +jumbrich; got it
13:23:48 q+
13:23:55 ack stasinos
13:23:59 referencing abstract "table" regions (i.e. data parsed from a CSV file) vs referencing parts of a CSV file
13:24:09 ivan: do we really want to represent spreadsheet functionalities
13:24:39 zakim, mute me
13:24:39 stasinos should now be muted
13:24:40 stasinos: it is not the reproduction of the functionalities, it is just to characterize the raw data itself
13:24:42 q?
13:25:07 to be able to specify which regions have data, and which have derivatives (of any sort)
13:25:24 requirement for row headings as well as column headings, to be able to say that a row is a derivative?
13:25:24 JeniT: what I will do is to put in a note that we talked about annotation regions, there may be some usage, but we can defer this until we have more use cases
13:25:32 ...
is that a reasonable way forward
13:25:34 +
13:25:36 +1
13:25:38 +1
13:25:39 +1
13:25:41 +1
13:26:10 danbri has joined #csvw
13:26:36 fresco: the parser can draw up an index, and you can have headers and rows, ie, you may want to specify the nature of rows and columns
13:26:44 JeniT: we do have annotated rows and columns
13:27:41 http://w3c.github.io/csvw/syntax/#syntax
13:27:44 JeniT: next issue in section 3
13:28:11 ... what I have done is to cut it down, it does not talk about how to input tabular data, but only how to output
13:28:22 ... that is the best practice of tabular data
13:28:22 there is an echo
13:28:45 ... we are only looking at the output for best practice
13:29:16 ... issue 4 tries to make it as RFC compatible as possible
13:29:25 q+
13:29:32 ... if we use the mime type, that refers to the default and usual character sets
13:29:51 ... the issue is that we would like to say that utf-8 is the default
13:29:54 q+
13:29:57 ack yakovsh
13:31:20 yakovsh: I have discussed with the area directors and it may be possible to amend the draft
13:31:29 ... if there are specific suggestions for character sets
13:31:39 ... there is also the issue of CR and LF
13:31:48 ... I do not know about the default character set
13:32:10 ... it is definitely possible to have a default character set if we get guidance from W3C
13:32:50 ... the issue currently says that the content type header must be used to set the character set
13:33:02 ... I know that people do not change content type anyway, let alone changing the character set
13:33:12 ... so it would be great if utf-8 would be default
13:34:12 Section 4.1.1 of RFC2046 specifies that "The canonical form of any MIME "text" subtype must always represent a line break as a CRLF sequence. Similarly, any occurrence of CRLF in MIME "text" must represent a line break. Use of CR and LF outside of line break sequences is also forbidden."
13:34:36 q+
13:34:40 ack fresco
13:34:40 application/csv sounds like a good idea to me
13:34:43 JeniT: if we want to say that it is o.k. to use LF, then we have a problem using text/csv...
13:35:01 fresco: the old spec was ascii but all the parsers ignored that
13:35:14 ... but the newer parsers fall back on utf8
13:35:26 ... ie, the specification could get away with using utf8
13:35:32 ... most people ignore the original spec
13:36:05 JeniT: according to the RFC we should not call that text/csv, only application/csv
13:36:08 ack yakovsh
13:36:12 ... the line ending is quite clear
13:36:26 yakovsh: RFC 4180 was passed with the old mime guidelines, but those changed
13:36:34 ... it is possible to change that
13:36:43 ... I will go back and see what is involved
13:36:49 ... I think it can be changed
13:37:03 ... question: is there a byte order mark if the default is utf8?
13:37:18 JeniT: BOM is usually optional with utf8
13:37:26 ... you do not usually have to use it
13:37:42 ... in practice, if you use it, you get horrible characters
13:37:47 ... I would like to avoid that
13:38:11 yakovsh: I will talk to the appl. working group, csv is not the only one that has this issue
13:38:21 ... we will discuss that after the IETF meeting
13:38:52 -jumbrich
13:38:59 yakovsh: another question, RFC 4180 is an informational doc, if W3C really wants that, IETF could push it through as a fast track
13:39:10 JeniT: yes, it would be good to have a standard for csv
13:39:24 +??P6
13:39:27 ... there have been other cases where the body has been done by W3C
13:39:34 zakim, ??P6 is me
13:39:34 +jumbrich; got it
13:39:38 ... if we can do this that would be great
13:39:40 q+
13:39:46 ack ivan
13:40:35 ivan: is any formal step necessary from W3C?
13:40:51 yakovsh: no, it should be o.k. without it
13:40:54 parser parameters: https://github.com/hubgit/csvw/wiki/CSV-Parser-Parameters
13:41:03 Topic: parsing tabular data
13:41:23 https://github.com/hubgit/csvw/wiki/CSV-Parser-Notes
13:41:32 fresco: there are also notes looking at the different parameters parsers use
13:41:41 ... some of the things people have to specify
13:41:56 ... there are 2-3 different sections
13:42:03 ... character set, discussed
13:43:00 ... dialects of the csv file (separators, whether white space should be trimmed or not, how to select particular bits of the file to be used, ie, what is a comment line, etc)
13:43:23 ... there is also a separate set on how to transform data, that may be a separate issue
13:43:39 ... a lot of csv parsers have these transformation fields in them
13:43:55 ... one issue is trimming of white space
13:44:09 ... one way is the '\', the unix way
13:44:17 ... or the quotes, the excel way
13:44:35 ... the quoting is in particular for output
13:44:55 ... that is something to specify to put things in quotes only if there is a special character in the field
13:45:07 ... that is basically it...
13:45:16 q+
13:45:25 ... I will clean it up
13:45:27 ack yakovsh
13:45:41 yakovsh: is there a list of applications that you looked at?
13:45:42 https://github.com/hubgit/csvw/wiki/CSV-Parser-Notes
13:46:19 pandas
13:46:25 fresco: the big one is a python parser, and pandas in python
13:46:46 ... pandas has a lot of transformations, has a multi index with several header columns/rows
13:47:01 ... it specifies the decimal and thousands separators
13:47:11 ... java has a nice one
13:47:29 ... the standard one is the php csv parser, but it really only deals with the standard case
13:47:38 https://docs.google.com/a/theodi.org/spreadsheet/ccc?key=0AiswT8ko8hb4dEtOR0x1WkJ3LS1LSm1HQm1YQzZuSHc&usp=sharing
13:47:52 https://github.com/theodi/csv-validation-research
13:47:58 JeniT: have you looked at this one
13:48:00 ?
13:48:23 fresco: the data package specifies only a few parameters
13:48:39 fresco: there are a few more parameters in common use
13:48:47 ... they might become useful
13:49:01 JeniT: how to move that into spec space?
13:49:46 ... we could have it as a standalone spec
13:49:56 +q
13:49:58 ... or roll it into the syntax spec as a separate section
13:49:59 q+
13:50:25 fresco: on the transformation side it is interesting whether we would use these
13:50:43 -JeniT
13:50:47 ack stasinos
13:51:39 +[IPcaller]
13:52:27 fresco: three different types of information like comment and white space should be part of the syntax
13:52:31 danbri has joined #csvw
13:52:51 ... the transformation may be a separate specification
13:52:51 ... it leaves us with the region specification
13:53:07 JeniT: the region selection should be part of the specification
13:53:50 stasinos: the question of how to describe a region is another thing
13:54:03 ... the syntax doc should have a best practice part
13:54:22 q?
13:54:26 q-
13:54:40 -jumbrich
13:54:49 JeniT: my inclination is to roll it into the syntax spec, with a very separate section
13:55:05 +??P6
13:55:09 ... and a much looser, permissive part describing what can be specified
13:55:20 zakim, ??P6 is me
13:55:20 +jumbrich; got it
13:55:54 JeniT: we need a separate spec to convert to, say, json, what we need here is how to convert that into an abstract model
13:56:00 JeniT: AOB?
13:56:07 +q
13:56:27 ack stasinos
13:56:47 stasinos: there is a very interesting discussion on the problem of tables transformed into RDF graphs
13:57:07 JeniT: we took the decision with dan _not_ to discuss this until we have the UC document out
13:57:21 ... then, yes, we will get to it
13:57:39 ...
but we have to have the basic things done
13:57:53 -JeniT
13:57:54 -fresco_
13:57:57 -ivan
13:57:58 -DavideCeolin
13:57:58 -stasinos
13:58:04 -jumbrich
13:58:08 -yakovsh
13:58:09 rrsagent, draft minutes
13:58:09 I have made the request to generate http://www.w3.org/2014/03/05-csvw-minutes.html ivan
13:58:10 DATA_CSVWG()8:00AM has ended
13:58:10 Attendees were fresco_, stasinos, ivan, DavideCeolin, JeniT, yakovsh, jumbrich
13:58:20 trackbot, end telcon
13:58:20 Zakim, list attendees
13:58:20 sorry, trackbot, I don't know what conference this is
13:58:28 RRSAgent, please draft minutes
13:58:28 I have made the request to generate http://www.w3.org/2014/03/05-csvw-minutes.html trackbot
13:58:29 RRSAgent, bye
13:58:29 I see no action items
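
The parser parameters raised under "parsing tabular data" (character set with an optional BOM, separator, white-space trimming, comment lines) and the output practice discussed for the syntax document (quote only when needed, CRLF line endings) could be sketched roughly as follows with Python's standard csv module. The function names `parse_tabular` and `write_tabular` and their parameters are hypothetical, chosen for illustration only; nothing here was agreed by the group.

```python
import csv
import io


def parse_tabular(raw, encoding="utf-8-sig", delimiter=",",
                  comment_prefix=None, trim=False):
    """Decode bytes and return the rows as lists of strings (illustrative only)."""
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present.
    text = raw.decode(encoding)
    # newline="" leaves line endings alone so the csv module can handle
    # CRLF, LF, and line breaks inside quoted fields itself.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = []
    for row in reader:
        # Assumption: a comment line is a row whose first cell starts
        # with the prefix (a quoted cell starting with it is dropped too).
        if comment_prefix and row and row[0].startswith(comment_prefix):
            continue
        rows.append([cell.strip() for cell in row] if trim else row)
    return rows


def write_tabular(rows, delimiter=","):
    """Serialize rows as UTF-8 bytes, quoting only fields that need it."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter=delimiter,
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


# Input with a BOM, a comment line, and padded cells.
sample = b'\xef\xbb\xbfname,value\r\n# a comment\r\n a ,"x,y"\r\n'
rows = parse_tabular(sample, comment_prefix="#", trim=True)
output = write_tabular(rows)
```

Reading with "utf-8-sig" while writing plain "utf-8" mirrors the remark that a BOM is optional on input and best avoided on output.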