14:43:46 RRSAgent has joined #csvw
14:43:46 logging to http://www.w3.org/2015/02/18-csvw-irc
14:43:48 RRSAgent, make logs public
14:43:48 Zakim has joined #csvw
14:43:50 Zakim, this will be CSVW
14:43:50 ok, trackbot; I see DATA_CSVWG()10:00AM scheduled to start in 17 minutes
14:43:51 Meeting: CSV on the Web Working Group Teleconference
14:43:51 Date: 18 February 2015
14:53:59 jtandy has joined #csvw
14:57:09 gkellogg has joined #csvw
14:57:44 JeniT has joined #csvw
15:00:10 danbri has joined #csvw
15:00:13 zakim, code?
15:00:13 the conference code is 2789 (tel:+1.617.761.6200 sip:zakim@voip.w3.org), gkellogg
15:00:28 DATA_CSVWG()10:00AM has now started
15:00:34 +??P11
15:00:39 zakim, I am ??P11
15:00:39 +gkellogg; got it
15:00:42 zakim, dial ivan-voip
15:00:42 ok, ivan; the call is being made
15:00:43 +Ivan
15:01:20 +[IPcaller]
15:01:24 zakim, IPcaller is danbri
15:01:24 +danbri; got it
15:01:41 +[IPcaller]
15:01:49 zakim, IPcaller is me
15:01:50 +DavideCeolin; got it
15:02:02 +[IPcaller]
15:02:02 zakim, who is talking?
15:02:13 danbri, listening for 10 seconds I heard sound from the following: gkellogg (33%), [IPcaller] (19%)
15:03:08 jumbrich has joined #csvw
15:03:55 +??P2
15:04:04 zakim ??P2 is me
15:04:18 Q: how many of us are Skyping in? considering Ivan's notification that Zakim will be shut down later this year
15:04:27 (I'm skyping)
15:04:40 I’m on skype
15:05:13 Dan, Ivan (called back to skype-in), JeniT, Gregg, …
15:05:34 scribenick: danbri
15:05:39 jenit: 6 issues listed for discussion
15:05:42 https://lists.w3.org/Archives/Public/public-csv-wg/2015Feb/0020.html
15:06:16 https://github.com/w3c/csvw/issues/195
15:06:16 … some quick hopefully, others need a bit more discussion
15:06:26 "Effect of tableSchema on both Table and TableGroup #195"
15:06:30 rrsagent, pointer?
15:06:30 See http://www.w3.org/2015/02/18-csvw-irc#T15-06-30
15:06:50 jenit: I proposed 2nd of the suggested options
15:07:05 namely that the one in table completely overrides that in the table group
15:07:07 no fancy merging
15:07:09 any objections?
15:07:17 ivan: in spite of the normalization of the metadata?
15:07:18 jenit: yes
15:07:27 gkellogg: it's more a search path
15:07:41 when you're going through the cols you look at the first schema you find
15:07:50 you don't take into account what might also be represented in the table group
15:08:02 you might imagine using table group … matching cols in diff langs
15:08:06 that would not be supported here
15:08:17 jenit: or you can imagine this as part of the normalization process
15:08:33 e.g. copying in where missing [approximately what jeni said]
15:08:40 jenit: any objections to proposed closure?
15:09:10 [none heard - resolved]
15:09:21 [updated to indicate editor action status]
15:09:28 https://github.com/w3c/csvw/issues/226
15:09:30 rrsagent, pointer?
15:09:30 See http://www.w3.org/2015/02/18-csvw-irc#T15-09-30
15:09:35 "Support for totalDigits and fractionDigits #226"
15:09:58 jenit: suggestion is that we remove these
15:10:07 ivan: +1
15:10:10 +1
15:10:11 +1
15:10:19 (gkellogg +1'd in git)
15:10:25 resolved - moved to editor action
15:10:25 +1
15:10:41 https://github.com/w3c/csvw/issues/220
15:10:50 rrsagent, pointer?
15:10:50 See http://www.w3.org/2015/02/18-csvw-irc#T15-10-50
15:10:51 http://w3c.github.io/csvw/metadata/#processing-tables
15:10:53 "Move Processing Tables section from metadata to model document #220"
15:11:18 jenit: this is about what processors can do with tables - displayed, converted, etc etc
15:11:25 suggestion is that we move that section into the Model doc
15:11:33 …because it is about actions over the table model.
15:11:48 … that means that the metadata doc is purely about how to annotate a table model / generate a table model
15:11:51 +1
15:12:03 gkellogg: might require creating a couple of term definitions in the syntax doc
15:12:16 … juggling those between specs; makes a lot of sense to keep that in a single place.
15:12:19 jenit: yep
15:12:35 … probably a bit more editorial juggling to do also, cross-refs, where terms are defined etc.
15:12:43 +1s from gkellogg, ivan on issue
15:12:46 +1 from danbri here
15:12:47 +1
15:12:49 +1
15:13:02 resolved - moved to editorial
15:13:19 https://github.com/w3c/csvw/issues/212
15:13:27 jenit: the other 3 issues arose from possible use case offered in #212
15:13:38 … looking at real life edu data, school performance stats
15:14:11 … one thing interesting here is that looking at the data in depth, you see particular codes eg. SUPT, NE (= not entered) that can take the place of normal statistics
15:14:29 i.e. cols have typically got numeric content but can have such values as alternate content
15:14:39 one way of viewing those is to view them as null values
15:14:55 jenit: I'll summarize this set of issues as they are linked
15:14:56 https://github.com/w3c/csvw/issues/218
15:15:02 one approach is to say they're all kinds of nulls
15:15:23 see #218, … which says the value interpreted as null could be written multiple ways
15:15:27 … give a bit more structure, etc
15:15:39 https://github.com/w3c/csvw/issues/223
15:15:50 (aside https://en.wikipedia.org/wiki/Null_(SQL) … scary world)
15:16:05 jenit: #223 explores possibility of union values
15:16:13 cols are numbers or else strings like NA
15:16:40 q+
15:16:42 https://github.com/w3c/csvw/issues/224
15:16:44 in order to support union-based types, where you'd need to list set of datatypes that the cells needed to comply with you would really need a different structure for datatypes
15:17:01 … which brings us to #224, "Reworking structure for datatypes #224".
15:17:04 ack gkellogg
15:17:04 q?
15:17:10 q+
15:17:34 gkellogg: another use case I've often seen: col might have different date stamps in it. Dates and Date-times intermingled. This would allow a super datatype there
15:17:44 maybe allowing different dates in different formats
15:17:48 ack ivan
15:18:14 ivan: original poster explicitly asked for data type unions, not sure it was just w.r.t. null.
15:18:30 jenit: maybe it is helpful then to talk about the requirement for union types and whether we want to support them
15:18:31 q+
15:18:42 ack ivan
15:18:44 thoughts on supporting union types?
15:18:56 ivan: I was wondering about the opposite direction?
15:19:15 (jumbrich - is there anything from your study of actual CSV files to guide us here?)
15:19:16 -jumbrich
15:19:29 jenit: any objections to restructuring the datatypes?
15:19:44 +??P2
15:19:48 … basically they become their own little object, including base datatype, … e.g. decimal, … then you have extra properties
15:19:53 zakim, ??P2 is jumbrich
15:19:53 +jumbrich; got it
15:20:01 … we could imagine in future naming those
15:20:01 +1
15:20:10 jenit: i personally think it is the right way to structure it
15:20:25 q+
15:20:42 jenit: you can still say 'decimal'
15:20:45 ack jumbrich
15:20:48 ack m
15:20:49 but you could also use a structure to set max/min etc
15:20:51 ack mr
15:20:54 ack ivan
15:21:30 jumbrich: re dan's q, … in our study we used a simple heuristic, tried to guess cell type by using regexes, … and we found a couple of times that a col had multiple types in it. We just went with the majority.
15:21:47 I could try to look deeper, find what kinds of types did this
15:22:08 jenit: the fact that you noticed that that was happening is enough to know it's out there in real data
15:22:24 jumbrich: other case is decimals … maybe excel export
15:22:35 … maybe more interesting when strings vs numericals; or strings vs urls
15:22:50 I can try to look a bit into detail, what kind of different datatypes we observed in 1 col, and report back
15:23:05 ivan: you were asking for usecases. would it help to get that?
15:23:19 s/ivan:/jenit: ivan,/
15:23:33 ivan: what if a cell can be interp'd by several of the datatypes
15:23:38 jenit: suggest using array order
15:24:04 ivan: yeahbut, … i know pathological, but imagine 2 alternatives. One is JSON, the other is a string.
15:24:09 How do I decide that a string is JSON
15:24:18 … or say XML
15:24:24 do I have to parse the whole thing to know?
15:24:31 I'm not sure we have a clear idea about these ugly edge cases
15:24:49 jenit: that is a separate issue about validation of xml and json and html
15:25:05 … this arises regardless because we say values have to be valid against whatever the datatype is
15:25:28 …we can go one of two ways. We say you have to go all the way and really parse it. Or else say that typing on these cols is just a hint, ...
15:25:43 (which has implics for union types in case that the markup isn't as valid as intended)
15:26:02 gkellogg: we could say that datatypes explicitly have a regex form, then that is used to match, otherwise it is the first found. That would basically get everything.
15:26:10 There are a couple of areas in xsd where you have confusions
15:26:16 datetime stamps are datetimes, etc
15:26:21 otherwise they are largely in diff spaces
15:26:38 ivan: but then, … for time being at least, a convertor to json or to rdf can be lax
15:26:57 … meaning that it does not do any validation. it will just believe what is there, and produce whatever is the datatype that is signalled.
15:27:12 if we introduce this, strictly speaking a converter cannot be lax, it'll have to make a decision.
15:27:18 jenit: depends on your meaning of 'lax'
15:27:34 … text we have currently says that an error must be generated if not valid against a particular datatype
15:27:39 e.g. a decimal vs the word 'foo'
15:27:56 …means that the value of the cell is set to the string 'foo' rather than a decimal number. The conversion
15:28:04 …can then do whatever with that value. It could be lax and …
15:28:11 or strict and raise error.
15:28:13 various options.
15:28:26 jenit: the checking of the value and the generation of the value for the cell happens regardless.
15:28:39 it has to, otherwise you get in real messes around parsing of dates etc.
15:29:02 gkellogg: say i have a set of datatypes, e.g. date, boolean, … listed, …
15:29:23 as a convertor i need to check lexical form to see if value matches date, or then boolean, … then if it doesn't, what do i use, the last one? string?
15:29:35 jenit: it (i.e. "foo" in example) is a string
15:29:44 … if you had datatype: boolean, and string value was foo
15:29:50 …then value of the cell is the string 'foo'
15:30:15 gkellogg: if the type was xml literal, … because we don't have a defined format for detecting it, i'd just go ahead and say it was an xml literal
15:30:53 … could say default comes from def of that datatype
15:30:58 … format/pattern
15:31:18 https://github.com/w3c/csvw/issues/236 << validation of html/xml/json datatypes
15:31:21 we could then have same datatype diff times with diff patterns/formats
15:31:43 ivan: we've moved away from the q of whether we want a structure for datatypes to be an array
15:31:46 https://github.com/w3c/csvw/issues/224
15:31:52 +1
15:32:03 jenit: let's try closing 224. any objections to structured datatypes?
15:32:10 +1
15:32:10 [tumbleweed]
15:32:17 +1 is for it, right? :)
15:32:22 Right
15:32:28 resolved -> editor action
15:32:30 rrsagent, pointer?
15:32:30 See http://www.w3.org/2015/02/18-csvw-irc#T15-32-30
15:32:47 https://github.com/w3c/csvw/issues/236
15:32:49 #236
15:33:10 jenit: if we have a cell that is marked as being e.g. json do we want to validate that it is actually json
15:33:16 similarly for xml, html, …
15:33:34 ivan: I agree w/ not validating
15:33:53 danbri: +1 for not needing to
15:34:01 gkellogg: validation isn't the right word
15:34:05 (xml-wf?)
15:34:20 q+ to suggest 'wellformedness'
15:34:37 q-
15:34:50 ivan: not even WF, as an xml segment needn't have a top level element
15:35:02 -jumbrich
15:35:08 all we could find in rdf discussion of this was some DOM function
15:35:14 gkellogg: just wording choice
15:35:18 +??P2
15:35:25 i don't think we want/need detection on these 3
15:35:26 zakim, ??P2 is jumbrich
15:35:26 +jumbrich; got it
15:35:34 just a note to say that pattern/format can be used
15:35:38 to help discriminate
15:35:41 jenit: can you clarify?
15:36:11 gkellogg: I mean that if someone wanted to try to discriminate, based automatically on datatype, … could put a format in there which looked for so that they could distinguish
15:36:26 ivan: i'd keep it simple
15:36:30 Proposal: we won’t built-in recognise/validate html/xml/json, but add a note to say that authors can add a pattern if they want
15:36:49 (that's a heavy rider tacked on the end)
15:36:59 ivan: so if there is a pattern i have to use it?
15:37:48 RRSAgent has joined #csvw
15:37:48 logging to http://www.w3.org/2015/02/18-csvw-irc
15:37:51 file:///Users/user/Documents/projects/w3ctag/csvw/metadata/index.html#formats-for-other-types
15:38:05 http://w3c.github.io/csvw/metadata/#formats-for-other-types
15:38:11 rrsagent, please make logs public
15:38:25 jenit: "format property provides a regex ...."
15:38:41 gkellogg: in formats for date/time the format is yyyy-mm-dd in which case that is not a regex
15:38:47 but there is still a pattern property
15:39:12 gkellogg: what are consequences of having both format and pattern?
15:39:20 jenit: pattern only on a format for a numeric type
15:39:26 not at the top level, alongside format
15:39:29 http://w3c.github.io/csvw/metadata/#formats-for-numeric-types
15:39:30 never clashes
15:39:45 gkellogg: ok
15:40:12 jenit: back to HTML/XML/JSON people can already use format property to constrain the value as gregg described
15:40:18 so the note is just a pointer to existing functionality
15:40:19 ivan: ok
15:40:19 Proposal: we won’t built-in recognise/validate html/xml/json, but add a note to say that authors can add a pattern if they want
15:40:25 +1
15:40:28 +1
15:40:32 +1
15:40:36 +1
15:40:39 +1
15:40:45 +1
15:40:46 rrsagent, pointer?
15:40:46 See http://www.w3.org/2015/02/18-csvw-irc#T15-40-46
15:40:57 resolved
15:41:55 https://github.com/w3c/csvw/issues/218 - Categories of null values #218
15:42:17 gkellogg: instead of treating these multiple values as diff version of null, could treat them as …[missed]
15:42:21 tokens
15:42:26 ivan: what is current situation?
15:42:34 … it looked as if null was already an array
15:42:47 https://github.com/w3c/csvw/issues/136
15:42:54 jenit: it is but we decided from #136 that null would become a single value
15:43:06 … so the doc hadn't been updated to reflect that resolution
15:43:26 effectively #218 reopens #136 but with more of a rationale for why you might want multiple null values
15:43:38 ivan: what's the merge? do we concat the arrays?
15:43:42 jenit: atomic
15:43:46 so you do not merge the arrays
15:43:57 if you have two metadata files, the null list from A overrides from B
15:44:00 ivan: fine w/ that
15:44:07 … and ok w/ several null values that way
15:44:10 jenit: ok
15:44:33 Proposal: allow several null values, but merge in an atomic way (don’t merge arrays)
15:44:39 +1
15:44:45 +1
15:44:46 +1
15:44:46 +1
15:44:47 +1
15:44:49 +1
15:45:06 q+
15:45:10 rrsagent, pointer?
15:45:10 See http://www.w3.org/2015/02/18-csvw-irc#T15-45-10
15:45:29 https://github.com/w3c/csvw/issues/223
15:45:36 ack ivan
15:45:44 https://github.com/w3c/csvw/issues/223 - Allowing "unions" of datatypes? #223
15:46:02 ivan: if we move to datatypes being these objects then the q of merge arises for those as well, regardless of the union issue
15:46:05 jenit: true
15:46:14 ivan: we merge atomically or property by property
15:46:17 [someone said 'yes']
15:46:20 jenit: i agree
15:46:30 gkellogg: general trend is to make a small set of things which merge.
15:46:38 ivan: maybe adding that note to the issue?
15:47:35 jenit: propose that we allow arrays of datatypes to be provided and the first datatype wins in terms of labelling a particular value
15:47:39 Proposal: we allow arrays of datatypes to be provided, and the first matching datatype wins in terms of assignment of datatype to a particular value
15:47:48 (and atomic merge)
15:48:15 ivan: at this moment my vote is "If we do it, then yes that's the way we should do it" (but) I would like to see jumbrich's measures before we make a decision on this rather than rush a new feature based on 1 new use case
15:48:20 jenit: ok
15:48:25 gkellogg: leave a week?
15:48:34 … we need to decide
15:48:55 jenit: what do you think is your measure of what would be persuasive. How many cases? 10, 20? 3?
15:49:01 ivan: not a number
15:49:27 … he goes through a certain amount of usecases for scientific data. If only 5% have this feature I would go against it. Obviously if it goes up to 30% then yes.
15:49:41 info: we should have around 80K+ documents, from which 60k we could parse.
15:49:48 q+
15:49:55 jenit: I'd say 5% is too high a bound. We should support features that are in 1 in 20 docs
15:49:57 ack jumbrich
15:50:00 gregg/dan: agree
15:50:21 jumbrich: i will have a look and report on how many cols per doc we found at least 2 or 3 datatypes
15:50:27 then try to present them in a reasonable way
15:50:29 q+
15:50:41 -jumbrich
15:50:49 [click]
15:50:55 +??P0
15:51:11 zakim, ??P0 is jumbrich
15:51:11 +jumbrich; got it
15:51:14 (sip client problems)
15:51:32 q-
15:51:37 ivan: I'll be away next week
15:51:41 (dan away too)
15:51:52 jumbrich: I'll email around beforehand
15:52:12 ivan: fear of feature creep
15:52:13 ACTION: jumbrich to do an analysis on union types to see if they are prevalent in real data
15:52:13 Created ACTION-64 - Do an analysis on union types to see if they are prevalent in real data [on Jürgen Umbrich - due 2015-02-25].
15:52:23 ack danbri
15:53:31 jenit: fine, leave til next week
15:53:52 jenit: jumbrich's github id?
15:54:00 jumbrich is my github id
15:54:17 ...access on repo?
15:54:34 ivan: no
15:55:03 ivan to add jumbrich to our github group
15:55:13 jumbrich: if we write code, should it be hosted here?
15:55:20 e.g. to extract metadata files etc?
15:55:26 ivan: no need to, can host wherever
15:56:02 AOB?
15:56:36 (github admin details)
15:58:13 Adjourned.
15:58:32 KUTGW - please try to vote on proposals etc on github.
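The two override-style resolutions recorded above (#195: a table's tableSchema completely replaces the table group's; #218: several null values allowed, merged atomically) can be sketched roughly as follows. This is only an illustration under assumptions, not text from any of the specs: the function names `effective_schema` and `merge_metadata`, the plain-dict representation of metadata, and the shallow top-level merge are all hypothetical.

```python
from typing import Any, Dict, Optional

Metadata = Dict[str, Any]


def effective_schema(table: Metadata, table_group: Metadata) -> Optional[Metadata]:
    """Pick the schema for a table as a search path (issue #195).

    The first schema found wins: a tableSchema on the table completely
    overrides the one on the table group, and nothing is merged.
    """
    if "tableSchema" in table:
        return table["tableSchema"]
    return table_group.get("tableSchema")


def merge_metadata(a: Metadata, b: Metadata) -> Metadata:
    """Shallow merge of two metadata dicts, with A taking priority over B.

    Every property is treated atomically (an assumption of this sketch),
    so the "null" list from A replaces the one from B (issue #218).
    The arrays are never concatenated.
    """
    merged = dict(b)   # start from the lower-priority metadata
    merged.update(a)   # A's properties replace B's wholesale
    return merged


group = {"tableSchema": {"columns": [{"name": "school"}, {"name": "score"}]}}
table = {"tableSchema": {"columns": [{"name": "score"}]}}
print(effective_schema(table, group))
print(merge_metadata({"null": ["NE"]}, {"null": ["SUPT", "NA"]}))
```

In the second call the result keeps only A's null list, which is the "atomic" behaviour agreed in the proposal; concatenating to three values would be the merge that was rejected.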
15:58:42 rrsagent, please draft minutes
15:58:42 I have made the request to generate http://www.w3.org/2015/02/18-csvw-minutes.html danbri
15:58:46 -DavideCeolin
15:58:47 -gkellogg
15:58:48 -JeniT
15:58:48 -jumbrich
15:58:51 -Ivan
15:58:53 -danbri
15:58:53 DATA_CSVWG()10:00AM has ended
15:58:53 Attendees were gkellogg, Ivan, danbri, DavideCeolin, JeniT, jumbrich
15:59:38 trackbot, end telcon
15:59:38 Zakim, list attendees
15:59:38 sorry, trackbot, I don't know what conference this is
15:59:46 RRSAgent, please draft minutes
15:59:46 I have made the request to generate http://www.w3.org/2015/02/18-csvw-minutes.html trackbot
15:59:47 RRSAgent, bye
15:59:47 I see 1 open action item saved in http://www.w3.org/2015/02/18-csvw-actions.rdf :
15:59:47 ACTION: jumbrich to do an analysis on union types to see if they are prevalent in real data [1]
15:59:47 recorded in http://www.w3.org/2015/02/18-csvw-irc#T15-52-13
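For the open action item on union types, the proposal from the call (#223: an array of datatypes, with the first matching datatype assigned to the value) might look something like the sketch below. The proposal was left pending jumbrich's analysis, so this is not a decided design. The name `assign_datatype`, the `PARSERS` table, the small set of datatypes, and the use of Python parsers in place of XSD lexical rules are all assumptions made for the example; the fall-back to a plain string follows jenit's description of how an invalid value such as 'foo' is treated.

```python
from datetime import date
from decimal import Decimal, InvalidOperation
from typing import Any, Callable, Dict, List, Tuple


def parse_boolean(text: str) -> bool:
    """Accept a small, assumed set of boolean spellings."""
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(text)


# Hypothetical lookup from datatype name to a parser for its lexical form.
PARSERS: Dict[str, Callable[[str], Any]] = {
    "decimal": Decimal,
    "date": date.fromisoformat,
    "boolean": parse_boolean,
    "string": str,
}


def assign_datatype(text: str, datatypes: List[str]) -> Tuple[str, Any]:
    """Try each datatype in array order; the first one that matches wins.

    If none match, the cell value stays the original string, as with
    'foo' in a decimal or boolean column.
    """
    for name in datatypes:
        try:
            return name, PARSERS[name](text)
        except (ValueError, InvalidOperation):
            continue  # not valid for this datatype, try the next one
    return "string", text


print(assign_datatype("12.5", ["decimal", "date"]))
print(assign_datatype("2015-02-18", ["decimal", "date"]))
print(assign_datatype("NE", ["decimal", "date"]))
```

Array order matters for values that more than one datatype accepts: with `["boolean", "decimal"]` the text `"1"` is labelled a boolean, while with `["decimal", "boolean"]` it is labelled a decimal. That is the ambiguity ivan raised and that jenit suggested resolving by array order.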