11:04:56 RRSAgent has joined #csvw
11:04:56 logging to http://www.w3.org/2014/09/17-csvw-irc
11:04:58 RRSAgent, make logs public
11:04:58 Zakim has joined #csvw
11:05:00 Zakim, this will be CSVW
11:05:00 ok, trackbot; I see DATA_CSVWG()8:00AM scheduled to start in 55 minutes
11:05:01 Meeting: CSV on the Web Working Group Teleconference
11:05:01 Date: 17 September 2014
11:05:23 Agenda: https://www.w3.org/2013/csvw/wiki/Meeting_Agenda_2014-09-17
11:05:31 ivan has changed the topic to: Meeting agenda: https://www.w3.org/2013/csvw/wiki/Meeting_Agenda_2014-09-17
11:05:41 Chair: Jeni
11:05:46 Scribe: DanBri
11:47:09 AndyS has joined #csvw
11:55:17 stasinos has joined #csvw
11:56:27 waingram has joined #csvw
11:57:47 yakovsh has joined #csvw
11:58:21 JeniT has joined #csvw
11:58:58 DATA_CSVWG()8:00AM has now started
11:59:05 + +1.718.331.aaaa
11:59:17 zakim, +1.718.331.aaaa is me
11:59:17 +yakovsh; got it
11:59:45 zakim, dial ivan-aix
11:59:45 ok, ivan; the call is being made
11:59:46 +Ivan
11:59:58 phila has joined #csvw
12:00:59 +[IPcaller]
12:01:05 zakim, ipcaller is me
12:01:05 +phila; got it
12:01:13 jtandy has joined #csvw
12:02:46 +[IPcaller]
12:03:15 +??P13
12:03:33 zakim, ??P13 is jtandy
12:03:33 +jtandy; got it
12:04:40 +[IPcaller]
12:04:50 zakim, IPCaller is me
12:04:50 +stasinos; got it
12:04:57 zakim, mute me
12:04:57 stasinos should now be muted
12:06:51 scribe: phila
12:06:52 chair: Jeni
12:07:00 Topic: Analysis of use cases
12:07:02 + +1.757.277.aabb
12:07:15 jtandy: I just sent a spreadsheet of use cases
12:07:19 http://lists.w3.org/Archives/Public/public-csv-wg/2014Sep/0068.html
12:07:33 zakim, aabb is bill_ingram
12:07:33 +bill_ingram; got it
12:08:03 jtandy: Having been through the use cases... haven't had time to go through the wiki
12:08:15 ... I have one, as does Dan and Ivan
12:08:27 ... they've been useful to inform our conversations recently
12:08:59 ... We have less than half (9) that specifically talk about transforming from CSV to X
12:09:35 ... not a big number. There are 2 or 3 others that aren't explicit in the demand for transformation but recognise the need for something like this
12:09:54 ... like the GeoJSON one
12:10:01 +[IPcaller]
12:10:05 ... so we have 12 UCs that need to transform CSV to something else. Just under half
12:10:18 jtandy: I asked myself a bunch of questions
12:10:30 ... do we have a target output in the use case? Usually no, most don't
12:10:37 ... which makes our assessment more difficult
12:10:39 zakim, IPcaller is me
12:10:39 +AndyS; got it
12:10:54 jtandy: Are column names mapped to properties/QNames?
12:11:24 jtandy: There are examples of mapping to properties, geonames etc
12:11:30 ... are there variables in the cells
12:11:45 ... trying to pick out the use cases where there is a sub structure in a cell that we need to pull out
12:11:55 danbri_ has joined #csvw
12:12:12 so sorry late (esp. as I volunteered scribe), trapped in transit
12:12:26 ... there are very few examples of that. Such would increase the complexity of our templating question. I think we have 4 UCs with sub structure in a cell and others with a delimited list in a cell
12:12:53 jtandy: Most use cases don't include nesting, intermediate properties etc
12:13:18 + +44.207.346.aacc
12:13:21 zakim, ??aacc is danbri
12:13:21 sorry, danbri_, I do not recognize a party named '??aacc'
12:13:23 ... UC 4, for example, the target RDF/XML here picks up the object (such as a profession).
12:13:26 zakim, aacc is danbri
12:13:26 +danbri; got it
12:14:06 jtandy: In a CSV file you could just link to the cell. Need to think about cases where we're converting sets of files - how you want to aggregate those into a single target output or not
12:14:19 ... that prob isn't a templating Q itself but it is a question
12:14:41 ... analysing scientific spreadsheets. No complex structure, but there is a need to express units of measurement assigned to each cell.
12:15:12 ... That might be done at metadata level (or Data Cube). So I think we can avoid that
12:15:33 jtandy: Multiple tables in a single file prob don't meet our criteria of what is a CSV file. Fair?
12:15:36 All: yes
12:15:43 phila, I can take over
12:15:45 s/All:/All - /
12:15:51 usecase 21, biodiversity... is there complex structure in the output?
12:15:56 i've concluded not entirely
12:16:04 default pairs ...
12:16:09 scribenick: danbri_
12:16:17 usecase 23, introduces idea of multiple columns all having the same semantic property
12:16:30 but the idea is that if you had up to 3 geo area codes, ... you could have one in each column
12:16:38 repeated values
12:16:46 usecase 24, hierarchy w/ occupational listings
12:16:52 does require a complex structure to be created
12:17:08 skos broader relationships that are derived from occupational listing codes, ... transitive
12:17:18 could be generated via sparql construct afterwards
12:17:22 i.e. there are some workarounds
12:17:30 the only one needing conditional processing is occ. listings
12:17:33 conditional rules or flow control
12:17:54 jtandy: there are very few examples, none amongst use cases, where we need to manipulate value of strings to build target output
12:18:01 the only place i've come across doing this kind of thing before
12:18:06 could be hidden in usecases
12:18:18 is generation of certain URI structures based on literal text input
12:18:31 e.g. generating URI-based identifiers for the object that a row talks about
12:18:51 jtandy: as a quick overview, about half talking about transforms into other formats. But v few of those are complicated.
12:18:55 jeni: thanks, that's really useful!
12:19:08 very few that require even string processing as values to get stuff out
12:19:17 very few require text output restructuring(?)
12:19:37 ... i.e. "we need to express this tabular structure" not "we need to convert it into this other structure"
12:19:40 http://lists.w3.org/Archives/Public/public-csv-wg/2014Sep/0036.html
12:19:46 jenit: see also this small piece of work documenting usecases
12:19:53 when people want to be doing a transformation
12:20:02 i called out 3 possibilities here
12:20:11 2 are re use created configurations
12:20:16 e.g. downloading 2nd csv file
12:20:21 ("weird echo")
12:20:33 cavernous sounds
12:21:11 1st ex was downloading set of csv files, wanting to import that into an sql db or similar
12:21:27 ... in such case, the person acquiring the CSVs will typically know the table structure they want to create
12:21:35 for the particular data import tool
12:21:50 2nd example, someone creating a web app displaying data from a CSV on say a map
12:22:08 and for that then if the people who are publishing a metadata file, ... defining a conversion into geojson
12:22:14 they can use that conversion for that particular display
12:22:24 but you can easily imagine someone wanting a different json target
12:22:27 e.g. a graph etc
12:22:43 3rd example, someone using serverside software to statically generate a website
12:22:55 like http://jekyllrb.com/
12:23:11 e.g. if it's contact info, they might generate vcard, schema.org JSON, produce some html with embedded metadata
12:23:25 those were the examples that I thought of
12:23:28 diff characteristics
12:23:50 in particular what came through to me, it's quite rare, quite tool specific, ... may be person specific
12:24:05 the appropriate conversion might depend on the kind of output you're actually aiming for
12:24:12 q+
12:24:22 ack jtandy
12:24:31 (danbri: e.g. http://stackoverflow.com/questions/11088303/how-to-convert-to-d3s-json-format for D3 is common)
12:24:45 jtandy: the times i've wanted more complex output is ironically when we're trying to match community/standard models
12:24:53 in trying to get to a common way of expressing data, it gets more complex
12:25:10 e.g. if I wanted to use QUDT, or semantic sensor networks, ...
12:25:45 geojson - complexity usually is pulling out the geometry
12:25:58 others like vcard largely easy end of scale
12:26:03 rather than deeply complicated data
12:26:14 q+
12:26:16 jenit: more comments?
12:26:18 ack phila
12:26:29 q+ to mention URI templates.
12:26:32 phila: following up jtandy, ... re use of string functions for URI generation
12:26:45 escaping?
12:26:46 I had experience of trying to do that, ... basic string function of removing white space, case normalization, ...
12:26:53 but that's as complex as it got
12:27:15 phila: was simple excel spreadsheet, using awk
12:27:20 so turning string name of a ministry into a URI
12:27:24 pretty basic stuff
12:27:34 case normalize, and get rid of whitespace
12:28:03 ack AndyS
12:28:03 AndyS, you wanted to mention URI templates.
12:28:09 AndyS: similar to what Phil says
12:28:12 We use a lot of URI templates
12:28:16 multiple fields into one URI
12:28:20 sector, area, ID all go in.
12:28:32 certain amount of cleaning, string manipulation, whitespace, chars we don't want, ...
12:28:42 beyond this, issue of validation
12:28:51 what to do when the data doesn't match what you need
12:29:05 although it's possible to handle it when it comes out the other end, ... feels wrong
12:29:09 but not clear cut
12:29:20 a desire to know when there's an issue and flag an error
12:29:33 jenit: we def want to be able to support validation against metadata file
12:29:54 q+ to talk about validation
12:30:15 ack phila
12:30:15 phila, you wanted to talk about validation
12:30:20 ack phila
12:30:30 phila: we're close to launching a WG on RDF validation
12:30:41 (danbri: aka 'data shapes' I think)
12:30:59 phila: although this is rdf only, the two are closely related. there's a danger both groups try to punt it to the other
12:31:31 ... other wg if its creation goes ahead as (nonbindingly) anticipated, ... could maybe be useful
12:31:42 jenit: downstream validation has the issues that andys identified
12:31:45 q+
12:31:47 any more re requirements?
12:31:50 q-
12:32:07 jenit: next- a straw poll, helping us to see where we're at w.r.t. question
12:32:08 http://lists.w3.org/Archives/Public/public-csv-wg/2014Sep/0067.html
12:32:18 re transformation, templating
12:32:23 4 basic options (see mail)
12:32:33 a) providing no customisation of mappings to other formats
12:32:38 leaving it completely unspecified
12:32:45 I thought you said 'a' :)
12:32:48 <- 1.
12:33:02 2. Providing some kind of hooks for customised mappings
12:33:10 but nothing normative for what's used
12:33:26 3. Adopting an existing templating language, such as but not necessarily Mustache
12:33:40 providing a way to map data in csv into the variables used by that existing templating language
12:33:48 4. Going into specifying our own templating language.
12:34:12 (is this multiple choice? I like 2. + 3.)
12:34:25 Epimorphics --> https://github.com/epimorphics/dclib
12:34:28 choose one as preferred direction
12:34:39 q+
12:35:32 danbri: I prefer (3. with Mustache as starting point) ...with (2. to allow others), and a hint of (4.) in that Mustache could be stretched a bit, and called Mustache-inspired.
12:35:33 jtandy_ has joined #csvw
12:35:53 jenit: Ivan, you're suggesting an investigatory period?
12:36:03 ivan: at least ... this is the way we interpret however we choose
12:36:49 Jeni: straw poll on your preference for what we do next, with the assumption that if we investigate templating lang and if it's too difficult we revise our opinion (at end of year)
12:36:56 ack jtandy
12:37:31 jtandy: as i've been thinking about how we might call out to other templating langs, e.g. xslt, sparql constructs, other things that can do our processing, ... it wasn't clear to me how / what mechanism we might have in place to provide those hooks out for external formats
12:37:31 q+
12:37:35 which is what 2. is talking about
12:37:38 q+
12:37:46 can someone give a 2 minute education on (2.)?
12:38:13 jenit: within metadata file there is a property called mappings which has objects that give a title, a format, a ref to a template thing
12:38:37 http://en.wikipedia.org/wiki/RDDL
12:38:43 3. is more like GRDDL
12:38:56 jtandy: how do you get the object to the template lang?
12:39:02 jeni: that would be implementation defined
12:39:09 whereas 3. we'd define exactly what that would look like
12:39:12 for a given language.
12:39:16 ack ivan
12:39:26 ivan: to come back to your option 2
12:39:37 ... and actually even 3
12:39:43 do we define some sort of a simple fallback mapping
12:39:48 or we don't do anything whatsoever
12:39:57 e.g. if i want a json out of the csv file
12:40:00 “In all cases, we need to specify a default mapping to RDF/XML/JSON that is purely based on the metadata (which is also used to inform validation and display of the CSV files).”
12:40:03 but i do not refer to any external tool
12:40:08 does that mean i get nothing whatsoever?
12:40:18 or we have straightfwd way to extract csv in json
12:40:24 jeni: see above from email
12:40:59 q-
12:41:00 (aside: just remembered http://www.w3.org/TR/sparql11-results-json/ as a json table format)
12:41:04 q?
12:41:09 q+
12:41:14 ack stasinos
12:42:00 jenit: ... freedom and configurability vs sensible defaults
12:42:03 1/2/3 are the same to us ... does not create a (sufficiently big) tool economy.
12:42:07 s/jenit/stasinos/
12:42:30 stasinos: can we stipulate that it should be the case that it should be ok for all producers and should be used, but needn't be a MUST
12:42:35 you could choose to use something else
12:42:38 q+
12:43:23 q+
12:43:25 jenit: you'd want extensibility option, ... something with an understood level of conformance on the use of a particular templating language to get to do the conversion.
12:43:27 ack AndyS
12:43:30 zakim, mute me
12:43:30 stasinos should now be muted
12:43:56 andys: need uri mapping
12:44:22 andys: i'm answering from pov of people with requirements on transforming csv to rdf
12:44:34 making URIs for some scheme is a v important requirement for that
12:44:46 jenit: I think you could provide templates within metadata file
12:44:53 don't need a full templating system for that
12:45:01 andys: quite possible. but the requirement remains.
12:45:15 ack jtandy
12:45:21 ack jtandy_
12:45:38 jtandy: (4.) a simple templating language. Is an example of that the restrictions that dan explained to us re Polymer
12:46:11 (discussion of polymer vs mustache)
12:46:20 andys: mustache lets you set things up before calling templates
12:46:22 q?
12:46:28 ... so has equiv of polymer but done in a diff way
12:46:33 4
12:46:38 2
12:46:38 initial straw poll. TYPE INTO IRC NOW
12:46:39 2
12:46:47 4
12:46:48 2
12:46:49 3
12:46:57 2
12:47:06 3
12:47:18 (BUT 2 MAY 2)
12:47:34 chairs shared their technical opinion
12:48:23 phila says that (4) looks complete - but don't make it super complex
12:48:28 -jtandy
12:48:52 phila: ultimately what I care about is that the wg has capacity to deliver
12:49:05 +??P13
12:49:07 I keep meeting people who are really looking fwd to the results of this group but don't have time to help.
12:49:21 zakim, ??P13 is jtandy_
12:49:21 +jtandy_; got it
12:49:21 zakim, ??P13 is me
12:49:22 I already had ??P13 as jtandy_, jtandy_
12:49:35 andys: i'd like to reinforce what phil said. the classic opensource issue here is that "someone else will do it".
12:49:53 if you're getting that kind of interest from outside, then it is time for the group to start broadcasting outward what the real factors are
12:50:03 e.g. start setting expectations
12:50:11 if the expectations exceed what the group's delivered
12:50:21 otherwise great work may go unappreciated
12:51:02 jenit: two have argued for specifying a templating lang; several who said hook and impl defined
12:51:07 2 said existing tpl lang
12:52:02 jtandy: my issue with 2 is that there is a big gap between how we take it out of parser, and into relevant templating lang, ...
12:52:05 [choppy noises]
12:52:11 jtandy - can you type
12:52:21 jtandy: re 3., mustache etc, those things may change
12:52:26 q+ to speak for 3.5
12:52:37 ... change control etc
12:52:41 which leaves us with 4.
12:52:43 +1 to opposing 3 for the reasons Jeremy gives
12:52:44 q+
12:52:45 ... doing a bit of work
12:52:47 ack danbri_
12:52:47 danbri_, you wanted to speak for 3.5
12:53:13 danbri says (4) only if we start from mustache
12:53:15 ack ivan
12:53:18 q+
12:53:38 ivan: i voted 4 as i've played with that, i had this proposal, a stripped down mustache, which might be good enough
12:53:48 q+
12:53:57 what really made me change, and i didn't sync w/ phil, ... experience is that we don't have enough people to properly do that, even that level that I did
12:54:18 ... a bigger group w/ more people, I believe 4 is doable
12:54:27 could be pretty small
12:54:34 I essentially did something that I believe covers most of the use cases
12:54:42 q?
12:54:45 ack stasinos
12:55:11 stasinos: I was thinking, if it's to be something that is simpler than an existing lang, then it kind of begs the question why to bother to
12:55:24 ... vs refer to a specific github etc version
12:55:37 q+
12:55:49 (aside: see http://www.w3.org/2013/09/normative-references )
12:55:56 ack AndyS
12:56:00 stasinos: but for our own i don't think we're in position to complete it
12:56:03 zakim, mute me
12:56:03 stasinos should now be muted
12:56:13 AndyS: you asked why I voted for (4.), looking at other specs that feel close, in w3c space
12:56:22 if you look at something like GRDDL, it isn't a stunning success
12:56:33 GRDDL used XSLT right?
12:56:37 it's a real shame there isn't a full blown rdf rules language
12:56:40 yes Jeni
12:56:41 (RIF isn't?)
12:56:49 (for some sense of 'blown')
12:57:00 IIRC GRDDL strongly suggests but doesn't require XSLT
12:57:01 ...R2RML gets some traction but not sure if will be a roaring success
12:57:15 andys: SPARQL amazingly overshot, but back then WGs could do that
12:57:18 features did creep in
12:57:25 reason was that there were things ppl wanted to do
12:57:29 there was resistance on putting things in
12:57:38 being driven by user needs made it hard
12:57:55 one poss is to say 'if that's the way we want to go, separate it out somewhere, send it out to be a CG'
12:58:02 q+
12:58:05 a small group could work on it in a diff way, come up with a particular proposal
12:58:06 ack ivan
12:58:10 q?
12:58:34 ivan: mustache is a v good example for the difficulties we might have
12:58:46 i initially used a mustache impl, csv files have their own features
12:58:51 mustache is text to text
12:59:30 ivan: ... we choose one tpl lang ... don't think that is practically doable
12:59:43 andys: what if we start from an existing one, then kinda fix it? make it 'the w3c one'?
12:59:46 I'd like that
12:59:51 jenit: I think that is a good approach
12:59:56 ivan: that means cutting back a bunch of things
13:00:18 -danbri
13:00:31 RRSAgent, make logs public
13:00:34 can we
13:00:36 RRSAgent, draft minutes
13:00:36 I have made the request to generate http://www.w3.org/2014/09/17-csvw-minutes.html phila
13:00:52 phila: e.g. i wanted geojson but the community who created it weren't so interested (was that right? --scribe)
13:01:01 oops. can we specify a minimum set of requirements for the template lang? driven by our use cases
13:01:09 Something like that, yes - but it's nuanced
13:01:09 bill_ingram has left #csvw
13:01:14 -jtandy_
13:01:18 -AndyS
13:01:20 -JeniT
13:01:20 -phila
13:01:21 -bill_ingram
13:01:23 -stasinos
13:01:24 -Ivan
13:01:27 rrsagent, draft minutes
13:01:27 I have made the request to generate http://www.w3.org/2014/09/17-csvw-minutes.html ivan
13:01:51 zakim, who is here?
13:01:51 On the phone I see yakovsh
13:01:52 On IRC I see jtandy_, danbri_, jtandy, phila, JeniT, Zakim, RRSAgent, ivan, fresco, trackbot
13:02:37 -yakovsh
13:02:38 DATA_CSVWG()8:00AM has ended
13:02:38 Attendees were yakovsh, Ivan, phila, JeniT, jtandy, stasinos, +1.757.277.aabb, bill_ingram, AndyS, +44.207.346.aacc, danbri, jtandy_
13:03:41 trackbot, end telcon
13:03:41 Zakim, list attendees
13:03:42 sorry, trackbot, I don't know what conference this is
13:03:49 RRSAgent, please draft minutes
13:03:49 I have made the request to generate http://www.w3.org/2014/09/17-csvw-minutes.html trackbot
13:03:50 RRSAgent, bye
13:03:50 I see no action items
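
The URI generation that phila and AndyS describe in the discussion (several fields such as sector, area and ID combined into one URI, with case normalisation and whitespace clean-up) can be sketched roughly as below. This is only an illustration, not something the group specified: the `{column}` placeholder syntax, the `slug` and `mint_uri` helper names, the choice to replace whitespace with a hyphen, and the example.org base are all assumptions made for the example.

```python
import re
from urllib.parse import quote


def slug(value: str) -> str:
    """Trim a cell value, lower-case it, and turn each run of whitespace into one hyphen."""
    return re.sub(r"\s+", "-", value.strip().lower())


def mint_uri(template: str, row: dict) -> str:
    """Fill {column} placeholders (hypothetical syntax) with cleaned, percent-encoded cell values.

    A placeholder naming a column that is missing from the row raises KeyError,
    so bad input is flagged here rather than surfacing later in the output.
    """
    def fill(match: re.Match) -> str:
        column = match.group(1)
        return quote(slug(row[column]), safe="")

    return re.sub(r"\{(\w+)\}", fill, template)


# One CSV row, already parsed into a dict keyed by column name.
row = {"sector": "Public Health", "area": " North  West ", "id": "A 42"}
uri = mint_uri("http://example.org/id/{sector}/{area}/{id}", row)
print(uri)
```

Raising on a missing column is one possible reading of AndyS's point about wanting to know when there is an issue and flag an error; a real tool might instead collect such problems into a validation report.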