12:50:36 RRSAgent has joined #csvw
12:50:36 logging to http://www.w3.org/2014/02/12-csvw-irc
12:51:02 Zakim has joined #csvw
12:51:14 zakim, this will be csvw
12:51:14 ok, danbri; I see DATA_CSVWG()8:00AM scheduled to start in 9 minutes
12:51:23 scribe: Dan Brickley
12:52:34 scribeNick: danbri
12:53:02 Meeting: CSVW weekly, Feb 12 2014
12:53:31 Chair: Jeni Tennison
12:57:36 JeniT has joined #csvw
13:00:09 jtandy has joined #csvw
13:00:20 Mathew has joined #csvw
13:01:05 DATA_CSVWG()8:00AM has now started
13:01:10 +??P6
13:01:12 +[IPcaller]
13:01:31 Zakim, ??P6 is me
13:01:31 +jtandy; got it
13:01:33 +[IPcaller.a]
13:01:42 zakim, P6 is me
13:01:42 sorry, AndyS, I do not recognize a party named 'P6'
13:01:59 AxelPolleres has joined #csvw
13:02:04 konstant has joined #csvw
13:02:11 +[IPcaller.a]
13:02:21 zakim, IPCaller is me
13:02:21 sorry, AndyS, I do not recognize a party named 'IPCaller'
13:02:28 zakim, IPcaller is me
13:02:28 sorry, AndyS, I do not recognize a party named 'IPcaller'
13:02:33 +[IPcaller]
13:02:44 hi
13:02:48 zakim, who is on the phone?
13:02:48 On the phone I see AndyS, jtandy, JeniT, fresco, [IPcaller]
13:03:12 +AxelPolleres
13:03:14 +[IPcaller]
13:03:17 +??P11
13:03:25 zakim, [IPcaller] is me
13:03:25 +davideceolin; got it
13:03:31 zakim, P11 is konstant
13:03:31 sorry, konstant, I do not recognize a party named 'P11'
13:03:37 jumbrich has joined #csvw
13:03:37 zakim, ??P11 is konstant
13:03:38 +konstant; got it
13:04:26 ScribeNick: danbri
13:04:29 Zakim, AxelPolleres is AxelPolleres and jumbrich
13:04:29 I don't understand you, AxelPolleres
13:04:36 Scribe: Dan
13:04:41 I can't hear anything on the telecon. I'll dial back in
13:04:43 (I was afraid, you'd say that, Zakim ;-))
13:05:23 agenda: https://www.w3.org/2013/csvw/wiki/Meeting_Agenda_2014-02-12
13:05:48 Zakim, AxelPolleres is with jumbrich
13:05:48 sorry, AxelPolleres, I do not recognize a party named 'jumbrich'
13:06:09 -jtandy
13:06:32 Approval of http://www.w3.org/2014/02/05-csvw-minutes.html ?
13:06:36 resolved: approved previous meeting minutes: http://www.w3.org/2014/02/05-csvw-minutes.html
13:06:58 +??P6
13:06:59 TOPIC: http://www.w3.org/2014/02/05-csvw-minutes.html#item02
13:07:10 TOPIC: Use cases and requirements
13:07:18 Zakim, ??p6 is me
13:07:18 +jtandy; got it
13:07:45 https://www.w3.org/2013/csvw/wiki/Use_document_outline
13:07:59 jtandy: I … documents which I've started
13:08:05 https://www.w3.org/2013/csvw/wiki/Analysis_of_use_cases
13:08:06 … both of these live on the wiki currently
13:08:43 drafts are in wiki to get started. I went through the use case docs from the previous WGs - SKOS, OWL, etc., and pulled out some things which I thought we ought to cover.
13:09:03 in terms of document headings, typical w3c parts, ToC, abstract etc. Once we get into use cases, a q:
13:09:22 some docs do use cases AND user stories. Are we happy just with use cases?
13:09:30 danbri, jenit: happy
13:09:54 jtandy: implication is that use cases will have both a narrative style and technical content closer together
13:10:08 … pulling out the things people are trying to do just with example
13:10:26 jenit: I'm happy with that. It can be a tricky distinction, we just need to get down to some practical examples
13:10:56 jtandy: ok so practical examples + a narrative. for each one of the detailed use cases in the doc we'll want a ref to the contributor, and a ref to a complete description (most likely link to our wiki)
13:11:17 … will want to hyperlink to specific requirements
13:11:25 jenit: why separate use case descriptions?
13:11:54 jtandy: I'd expect the actual W3C spec doc will be somewhat clipped, and likely we'll have full details in wiki
13:12:07 jenit: fine, so long as self-contained within the doc
13:12:20 Zakim, AxelPolleres is with Umbrich
13:12:20 sorry, AxelPolleres, I do not recognize a party named 'Umbrich'
13:12:28 jtandy: yes. for e.g., some of my use cases are complex, don't want to pollute doc
13:12:31 q?
13:12:34 … embedded in the doc will be requirements
13:12:41 jtandy: I'm aiming for something like 8
13:13:14 -danbri
13:13:24 danbri1 has joined #csvw
13:13:53 +[IPcaller]
13:13:53 jtandy: in some documents there are informative use cases
13:14:06 jenit: i'm not sure
13:14:10 … a particular reason to do that
13:14:22 Sorry ... Dropped connection on SIP
13:14:24 either a use case provides requirements or it doesn't; if not we shouldn't care about it
13:15:44 jenit: (re 8…) that we shouldn't constrain the number but seems about right
13:16:08 Please move on.
13:16:09 danbri: i'm poking around for both Google and schema.org use cases
13:16:18 I'll try to resolve ASAP
13:16:27 -jtandy
13:16:32 https://www.w3.org/2013/csvw/wiki/Analysis_of_use_cases
13:17:00 +??P6
13:17:18 Zakim, ??p6 is me
13:17:18 +jtandy; got it
13:17:22 TOPIC: Analysis of use cases
13:17:32 https://www.w3.org/2013/csvw/wiki/Use_Cases#Publication_of_Data_by_the_UK_Land_Registry
13:17:33 AndyS: Re publication of data by the land registry.
13:17:45 (I'm back in)
13:17:46 UK land registry keep title on property in england and wales
13:17:53 diff system in scotland, diff org, regime etc.
13:18:26 a couple of things: price paid data, … every time there is a property or land transaction in england or wales, then it is recorded by the land registry, they have a monthly publication cycle
13:18:45 about 350 million triples (internally quads); driven by a process that already existed that was producing csv files
13:18:57 so there is a relationship between those csv files and what is now linked data
13:19:06 essentially a diff vs previous month
13:19:17 [silence] … and deletions that can happen for various admin reasons
13:19:28 marked by columns abcd etc, … code lists are an important aspect
13:19:38 just looked at as csv it is not data, but a difference on the data
13:19:44 which affects the meaning of columns
13:19:52 each row has meaning given to it by parts of the process
13:20:00 info in it is at diff levels of authority
13:20:27 … not verified by land registry; price isn't checked but generally correct
13:20:45 q+
13:20:46 q+ to ask how the csv was used previously, if at all
13:20:56 jenit: what does this mean for requirements?
13:21:17 andys: the quality of the csv is pretty good, comes from a data warehouse, in terms of syntax it would conform to what you've called CSV Plus
13:21:28 they publish both with and without column headings, due to different needs
13:21:35 escaping and interesting chars - occasionally a problem
13:21:41 q?
13:21:45 i don't think any char code problems, either english or welsh
13:21:58 from absolute syntax level, … high quality
13:22:19 in terms of introducing modeling (in conjunction w/ land registry), it is quite difficult to go in and say what this data means
13:22:39 data only goes back to '95 because structures changed then
13:22:55 even in today's process, there have been subtle shifts in meaning, takes some internal investigation to figure things out
13:23:01 even though they have a well org'd data dictionary
13:23:22 despite all their good practices, still needs a knowledge capture effort
13:23:30 ack
13:23:35 ack jtandy
13:23:56 jtandy: as i was going through andys's use case for requirements doc, I tried to pull out requirements
13:24:18 key seemed to be: automated transform of csv into rdf, by automated i mean having a generic way to do it,
13:24:26 andys: the land registry did write a custom convertor
13:24:39 jtandy: but arguably we should be in a position where there's a generic transformation mechanism
13:24:59 andys: they would've been delighted if such a thing existed. they needed to do this at scale, the tools were not up to date
13:25:20 andys: the bulk conversion is the relatively easy part,
13:25:32 Naively, I guess many people here would think CSV2RDF should be just a "dialect/small modification" of the existing RDB2RDF spec, or no?
13:25:37 jtandy: we need a machine readable mechanism to associate rich semantics - e.g. rdf properties - with cols and rows of a csv file
13:26:05 andys: yes some sort of way to link back and talk about what a column, or possibly even a cell, … at that point what I draw out, is that each row is not an entity in itself
13:26:05 AxelPolleres, yes, I think that's an assumption
13:26:24 … if you take all the transactions, one property will be mentioned many times in different rows
13:26:32 because each row is a transaction
13:26:45 so they get mentioned in many places
13:26:47 AxelPolleres, some work is needed to analyse how that might work though, by someone who knows RDB2RDF
13:26:57 it would be ideal to try to extract out a property entity and several rows
13:27:16 jtandy: 3rd req i extracted, that each entity should be uniquely identifiable
13:27:19 guid
13:27:34 andys: … guid for each [don't know, need to check, it's a hash of some cols]
13:27:39 per transaction
13:27:53 internally, there are some identifiers for properties, but they're not in a position to publish those
13:28:10 jtandy: each row wants to talk about a transaction, which is an update on a prev transaction
13:28:31 jtandy: final requirement, is need to associate values in a csv file with an externally published thesaurus
13:28:40 andys: very much so, that's quite important
13:29:09 jtandy: 2-fold, a) you need to be able to ref a thesaurus/vocab, or b) you might need to expand some code as referring to some specific entity
13:29:20 (discussion of impact on the conversion workflow)
13:29:31 andys: if you look at it as a table of transactions
13:30:27 jtandy: looking at analysis you've done, are there any particular things you'd like to flag for help, input etc?
13:30:37 s/jtandy/jenit/
13:30:48 hmm did that do something bad, for scribe?
13:31:14 jtandy: we need example (for nat. archives) we have useful discussion but a more specific example would help.
13:31:38 also, 2nd use cases also from adam, relational data row and formats, … [missed detail]
13:32:01 jtandy: 3 and 4 from jenit. For 3., I've identified that they were talking about Excel, would be useful to id a list of commonly used tools
13:32:30 … a particular dataset for use case 3 would be a specific csv file to illustrate
13:32:37 tools wiki: https://www.w3.org/2013/csvw/wiki/Tools
13:33:20 jtandy: no 4, no comments. no 5, one of mine - meteorological observations; no 6, andy's discussed already; pretty specific; no 7, from Alf ...
13:33:23 q- danbri1
13:33:31 q- danbri
13:33:59 Alf's use case 7., search results from SOLR, my interpretation is that you're trying to illustrate how to deal with a larger dataset
13:34:03 e.g. a huge resultset from a search
13:34:11 just using the search result as an illustration of this process
13:34:15 alf: yes, pretty much right
13:34:30 jtandy: i remember from last call + use case, that we don't want to get into designing a search protocol
13:34:46 … so interest is not so much the search protocol but dealing with a subset of a larger collection.
13:35:08 jtandy: am trying to write the use case to make that clear, following a specific narrative … example of open refine,
13:35:20 how things that you tried to do affect how you have to process the csv file
13:35:24 alf: yes
13:35:45 jtandy: i've written that the search topic is misleading, we're not doing a protocol, just pagination within a dataset
13:36:11 in terms of no.8, the police open data reliability analysis, i think that it would be useful to include a set of csv files, some of which are unreliable, ...
13:36:21 so see how […] categories and geo areas
13:36:30 davide: ok i'll do that
13:36:54 jtandy: the analysis is great, looking at change over time, comparability etc. Just needs some specific examples to show where it's broken.
13:37:34 jtandy: a q for jenit/danbri, … in a lot of use cases, people are having to do manual effort to manipulate the files. should we say explicitly in use cases, "And it took ages as I had to do … to get into matlab etc."?
13:37:43 jenit: pull out what's requirement for particular tools
13:37:51 jenit: to inform what we do
13:38:09 jtandy: in a perfect world it would all just work! but we're writing use cases for today
13:38:38 jenit: reading stuff into R, or favourite stats package, may need extra impl work to read in the files in the format we're talking about defining, to get all the extra info / context in
13:38:58 but we need to have an idea about the backwards compatibility story, … how much new tools can work with CSVs, new CSV etc
13:39:15 jtandy: no9, analysis of scientific spreadsheets, again via davide
13:39:48 … also see this with my scientific colleagues; i've seen people do v similar things w/ hydrology and river flow (topical topic...)
13:40:05 [missed the action but davide will do something]
13:40:21 alf's no 10, suggest merging with alf's no 9
13:40:30 alf: yes, suggest that
13:40:39 alf/davide to discuss converging them
13:41:07 alf: i have a q about this, http://lists.w3.org/Archives/Public/public-csv-wg/2014Feb/0048.html
13:41:26 … are we trying to help ppl who would normally publish excel to do csv instead, or a subset
13:41:34 jenit: that's a pretty fundamental question
13:41:48 … we should be aiming to let people express the kinds of info that ppl express in excel files
13:42:12 q+ to ask about functions/expressions
13:42:28 jenit: then make judgement calls about expressivity
13:42:30 [noise]
13:42:46 [bg noise]
13:43:03 alf: for this use case i'll go thru the excel files and try to pick out what might be represented
13:43:06 ack me
13:43:07 danbri, you wanted to ask about functions/expressions
13:43:28 danbri: excel functions/expressions too?
13:43:37 alf: you might want totals of columns
13:43:56 jenit: that's the kind of q that we need to pull out as a potential requirement, and say if we'll try to address it or not
13:44:23 jtandy: when it comes to the use case, it'll be essential to bring out, that these are things we're trying to achieve
13:44:49 (thought: we could/should/might say that losslessly representing original format is not a goal)
13:45:01 jtandy: annotating time series …
13:45:14 artificial shifts, e.g. when an instrument recalibrated
13:45:21 it would be better if you could pull out a use case
13:45:37 alf: [missed] merging with weather observation series
13:45:52 jtandy: i'm looking at integrating that with international surface temperature dataset
13:45:59 they're merging csv datasets from all around the world
13:46:06 single consolidated dataset
13:46:15 this would be in a sep piece of the workflow
13:46:37 alf: you might indicate a volcano eruption at a point in time etc
13:46:48 jtandy: I linked some software, …
13:47:35 jenit: thanks for all this work, it would be great now if we can get it into w3c draft format, let's talk offline about practicalities
13:48:04 jtandy: i'll try to get this done before next week
13:48:09 " alf: you might want totals of columns" ... so you want to *extend* the CSV format? ... don't see this covered by the charter at the moment.
13:48:14 +q
13:48:17 (… incl issues discussed, if people supply the details)
13:48:22 ack konstant
13:48:52 NetCDF
13:48:52 konstant: I didn't get chance to introduce myself last week, … but wanted to mention i'll provide text on wiki, …
13:48:59 ie. the consequence of this would be rather "Spreadsheets on the Web" rather than "CSV on the Web", wouldn't it?
13:49:12 netcdf scientific data, they have complex headers,
13:49:16 in between discussion
13:49:34 they have a header that describes the semantics of the columns, incl the ranges for the diff columns,
13:49:43 http://www.unidata.ucar.edu/software/netcdf/examples/ECMWF_ERA-40_subset.cdl
13:50:02 for example [url above], it documents the ranges for the diff values
13:50:06 Axel - maybe it could be by metadata saying what a cell means?
13:50:11 then there is a data section at the end of the file.
13:50:45 konstant: we're cooperating on a project w/ a dutch university, who have huge amount of these files, and diff modeling software that they're using, to predict crops and crop yields
13:50:55 Andy, you mean something like being able to say something like "lastline contains totals"? or alike?
13:51:05 they combine this data with meteorological data, create new netcdf files using this modeling software
13:51:26 s/lastline/last row/
13:51:28 they compare predictions, data, … q is how to combine netcdf w/ other kinds of data
13:51:34 netCDF -- http://www.unidata.ucar.edu/software/netcdf/
13:51:39 jenit: good stuff, jtandy can take it via wiki, …
13:52:01 jtandy: i'm acutely aware of the netcdf efforts, ERA 40 dataset etc. Are you dealing with a specific variant?
13:52:12 konstant: I'm not sure, will need to investigate that
13:52:55 we should start giving you … examples from different decades, … I'll check with wageningen
13:53:10 jtandy: interesting to look at mixing this with other kinds of dataset
13:54:02 jenit: yes, please make use of the mailing list
13:54:20 http://w3c.github.io/csvw/syntax/
13:54:33 jenit: I made a start at drafting a definition of what CSV might look like
13:54:47 … not time to discuss in detail now, pls take a look and comment on the list
13:55:08 there's an appendix, i picked out Excel and others; expect more work needed there on current state of the art
13:55:29 I'd encourage any of you that have particular favourite tools for CSV, e.g. excel on windows which I don't have, add samples etc
13:55:41 so request for review of this doc and input on tools section
13:55:57 (good stuff jeni :)
13:56:18 jenit: i don't particularly want to be editor of that doc, so if you're interested in editing role please say
13:56:19 AOB?
13:56:25 TOPIC: EDF ad-hoc meeting
13:56:39 AxelPolleres: I got one reply from colleague in athens
13:56:43 but not yet more
13:57:16 danbri: it doesn't need to be WG members only
13:57:26 AxelPolleres: I can mention in my talk
13:57:43 … i can promote a bit the existence of the WG
13:57:57 jenit: good idea, more attention, more input, more impact, .. so yes please :)
13:57:58 AOB?
13:58:24 jtandy: several of us will be at linking geospatial conf, … packed agenda but we can find a few minutes there
13:59:42 AndyS suggests maybe meeting evening before
13:59:50 jenit: discuss on list
14:00:11 adjourned.
14:00:14 -fresco
14:00:15 -danbri
14:00:15 -JeniT
14:00:21 -AndyS
14:00:22 -AxelPolleres
14:00:22 -davideceolin
14:00:23 -konstant
14:00:24 jumbrich has left #csvw
14:00:24 -jtandy
14:00:25 zakim, bye
14:00:25 Zakim has left #csvw
14:00:25 DATA_CSVWG()8:00AM has ended
14:00:25 Attendees were jtandy, JeniT, AndyS, fresco, danbri, AxelPolleres, davideceolin, konstant
14:00:26 AxelPolleres has left #csvw
14:00:30 rrsagent, make log public
14:00:36 rrsagent, draft minutes
14:00:36 I have made the request to generate http://www.w3.org/2014/02/12-csvw-minutes.html danbri1
14:00:47 rrsagent, bye
14:00:47 I see no action items