12:03:12 RRSAgent has joined #csvw
12:03:12 logging to http://www.w3.org/2014/02/05-csvw-irc
12:03:14 RRSAgent, make logs public
12:03:14 Zakim has joined #csvw
12:03:16 Zakim, this will be CSVW
12:03:16 ok, trackbot; I see DATA_CSVWG()8:00AM scheduled to start in 57 minutes
12:03:17 Meeting: CSV on the Web Working Group Teleconference
12:03:17 Date: 05 February 2014
12:03:42 Chair: Danbri
12:27:16 Regrets: andys, rossjones
12:30:00 davideceolin has joined #csvw
12:53:10 jtandy has joined #csvw
12:54:09 fonso__ has joined #csvw
12:54:31 hi - i am constrained with another meeting at 14:00Z ... will need to disappear 5 minutes early
12:54:41 TimFinin has joined #csvw
12:55:58 ericstephan has joined #CSVW
12:55:59 DATA_CSVWG()8:00AM has now started
12:56:05 +??P0
12:56:46 hi jeni - i think ??P0 is me ...
12:57:08 zakim, dial ivan-voip
12:57:08 ok, ivan; the call is being made
12:57:09 +Ivan
12:57:14 + +1.509.554.aaaa
12:57:22 +[IPcaller]
12:57:24 i'll get the hang of the zakim stuff at some point
12:57:34 zakim, drop me
12:57:34 Ivan is being disconnected
12:57:36 -Ivan
12:57:38 +1.509.554.aaaa is me
12:58:08 +??P4
12:58:11 zakim, dial ivan-voip
12:58:11 ok, ivan; the call is being made
12:58:12 +Ivan
12:58:24 +??P5
12:58:43 ScribeNick: ericstephan
12:58:49 zakim, ??P4 is davideceolin
12:58:49 +davideceolin; got it
12:58:49 Scribe: Eric
12:59:11 zakim, who is here?
12:59:11 On the phone I see jtandy, Ivan, ericstephan, JeniT, davideceolin, ??P5
12:59:14 On IRC I see ericstephan, TimFinin, fonsoN, jtandy, davideceolin, Zakim, RRSAgent, danbri, ivan, JeniT, fresco, trackbot
12:59:14 Agenda: https://www.w3.org/2013/csvw/wiki/Kickoff_Meeting_Agenda
12:59:19 + +1.410.461.aabb
12:59:36 zakim, who is noisy?
12:59:56 ivan, listening for 19 seconds I heard sound from the following: ericstephan (6%), davideceolin (33%)
13:00:15 +??P7
13:00:17 I put myself on mute
13:00:25 Agenda: https://www.w3.org/2013/csvw/wiki/Meeting_Agenda_2014-02-05
13:00:36 +??P8
13:01:02 Minutes: http://www.w3.org/2014/01/29-csvw-minutes.html
13:01:15 zakim, who is here?
13:01:15 On the phone I see jtandy, Ivan, ericstephan, JeniT, davideceolin, ??P5, +1.410.461.aabb (muted), danbri, ??P8
13:01:17 On IRC I see ericstephan, TimFinin, fonsoN, jtandy, davideceolin, Zakim, RRSAgent, danbri, ivan, JeniT, fresco, trackbot
13:01:24 I am P5, I guess
13:01:29 zakim, unmute me
13:01:29 Ivan was not muted, ivan
13:01:36 AxelPolleres has joined #csvw
13:01:46 zakim, ??P5 is TimFinin
13:01:46 +TimFinin; got it
13:01:47 + +43.13.aacc
13:01:58 zakim, who is here?
13:01:59 On the phone I see jtandy, Ivan, ericstephan, JeniT, davideceolin, TimFinin, +1.410.461.aabb (muted), danbri, ??P8, +43.13.aacc
13:01:59 On IRC I see AxelPolleres, ericstephan, TimFinin, fonsoN, jtandy, davideceolin, Zakim, RRSAgent, danbri, ivan, JeniT, fresco, trackbot
13:01:59 Zakim, aacc is me
13:02:01 +AxelPolleres; got it
13:02:19 I'm not sure who I am
13:02:42 410 is me
13:02:58 zakim, aabb is TimFinin
13:02:58 +TimFinin; got it
13:03:08 zakim, who is here?
13:03:08 On the phone I see jtandy, Ivan, ericstephan, JeniT, davideceolin, TimFinin, TimFinin.a (muted), danbri, ??P8, AxelPolleres
13:03:10 On IRC I see AxelPolleres, ericstephan, TimFinin, fonsoN, jtandy, davideceolin, Zakim, RRSAgent, danbri, ivan, JeniT, fresco, trackbot
13:03:24 yes.
I'm lucky to be here once
13:03:45 yes
13:03:58 (for me anyway)
13:04:44 I will
13:04:51 JeniT set me up
13:04:56 I am on mute
13:05:06 Minutes: http://www.w3.org/2014/01/29-csvw-minutes.html
13:05:21 none
13:05:34 Meeting notes accepted
13:05:40 FTF
13:05:40 +[IPcaller]
13:05:44 Topic FTF
13:05:46 Topic: F2F
13:05:55 Thank you JeniT
13:07:08 q+ to ask about whether any possibility to join f2f with EDF2014 in March?
13:07:13 QUESTION: can we confirm we're ok with plan that F2F currently planned before TPAC
13:08:02 Some events other than the F2F might be useful to collaborate in the room
13:08:15 note it's in less than 8 weeks time
13:08:17 Zakim, [IPcaller] is fresco
13:08:17 sorry, fresco, I do not recognize a party named '[IPcaller]'
13:08:30 action: axelpollerres take a lead arranging an *informal* gathering of wg members at EDF
13:08:30 Error finding 'axelpollerres'. You can review and register nicknames at .
13:08:51 TOPIC: Use Cases and Requirements
13:08:54 zakim, alfeaton is fresco
13:08:54 +fresco; got it
13:08:54 Raj Singh and I plan to meet to discuss things at the OGC TC meeting in washington, late march ...
will add to the wiki
13:08:55 EDF 2014 http://2014.data-forum.eu/
13:09:56 charter: http://www.w3.org/2013/05/lcsv-charter
13:09:58 JeniT: 3rd working draft in March 2014, we were supposed to start earlier
13:10:29 JeniT: we need a volunteer to deliver that document and expect everyone to contribute
13:10:49 wiki materials - https://www.w3.org/2013/csvw/wiki/Use_Cases
13:10:55 JeniT: we need that by next week
13:11:07 Jeremy: I am willing to help
13:11:33 JeniT: Others can help as well
13:12:18 https://www.w3.org/2013/csvw/wiki/Use_Cases#Other_W3C_use_cases_and_requirement_docs
13:12:24 JeniT: Dan has put together a use case and wiki page references done by other groups
13:12:28 wiki page contributors so far: Adam Retter, Jeni Tennison, Jeremy Tandy, Andy Seaborne, Alf Eaton, Davide Ceolin, Martine de Vos via Davide Ceolin, …
13:12:47 JeniT: The more concrete examples the better
13:12:58 Danbri: OWL example stood out
13:13:28 Danbri: Scott's stood out as a real world project
13:13:42 q+
13:13:45 SKOS
13:13:50 sorry
13:13:53 s/Scott's/SKOS/
13:14:12 jeremy is here
13:14:15 ack
13:14:24 q-
13:14:32 just a sed
13:14:32 ack ericstephan
13:14:33 ack ericstephan
13:14:33 ack ericstephan?
13:15:04 ericstephan: we're trying to pull together use cases; how much detail do you want on the data?
13:15:07 eric: we're trying to pull together real world use cases. how much detail do
13:15:10 thanks jeni:)
13:15:19 ... I'm trying to provide data about where it's coming from, how people are using it etc
13:15:31 danbri: a high-level big picture, and then some concrete samples of data
13:15:53 lol
13:15:57 ericstephan: I'll exclude data that hasn't been published
13:15:58 Thank you
13:16:09 ivan: in the meantime, I'll put it up for you
13:16:18 Ivan: Contact me if you want to put this on the wiki
13:16:29 Danbri: Can we go over who has contributed?
13:16:43 jeremy now, then davide, jeni, …
13:16:46 Jeremy: My use case is weather observation data
13:17:14 https://www.w3.org/2013/csvw/wiki/Use_Cases#Publication_of_weather_observation_time-series_data_as_input_into_analysis_or_impact_assessment
13:17:23 Jeremy: brought out specific issues and key requirements, no formal semantics associated with csv
13:17:30 ISSUE: there is no machine-readable mechanism available to describe how the set of files are related
13:17:31 Created ISSUE-1 - There is no machine-readable mechanism available to describe how the set of files are related. Please complete additional details at .
13:17:35 doh, sorry bot
13:17:46 IZZUE: there is no machine-readable mechanism to associate, or attach, file-level data properties to the entity described in each row of the CSV file.
13:18:42 Jeremy: Often the datafiles are partitioned into multiple files and structures
13:19:03 Jeremy: If a property applies to every entity, it is summarised at the file level.
13:19:30 Jeremy: Under the proposal section of the wiki put link back to JeniT's document
13:19:53 Danbri: Are you using packaging mechanisms?
13:20:14 Jeremy: Typically not, docs on the web next to the dataset
13:20:56 Danbri: Different parties are involved in the workflow; who ultimately sets the structure of the csv?
13:20:58 q+ to ask if a zip would be appropriate for this data in any case
13:21:30 ack JeniT
13:21:30 JeniT, you wanted to ask if a zip would be appropriate for this data in any case
13:21:32 Jeremy: The structure of the csv is based on who produces the data. Need to do more digging into the users
13:22:01 JeniT: With the weather data you are dealing with, you might want to have various packages
13:22:23 Jeremy: They can get large, e.g. 100 million records
13:22:48 JeniT: Can you point to data and how it fits in the dataset?
13:23:09 Jeremy: Yes
13:23:45 I am sorry who is speaking?
13:23:47 https://www.w3.org/2013/csvw/wiki/Use_Cases#Reliability_Analysis_of_Police_Open_Data
13:23:50 davide speaking
13:23:54 Davide Ceolin
13:23:54 thank you
13:25:01 Davideceolin: over time, different csv files have represented different kinds of information, such as crime counts. Over time, different policies have produced different formats.
13:25:03 q+ re stats in rdf and need for footnotes
13:25:35 Davideceolin: I have already developed the tool; being able to automatically identify the elements in the csv file would be useful
13:25:41 A useful question to ask is whether the CSV table itself is appropriately formatted. For example, the HadCET data has "year + day of month" as rows and "month" as columns, whereas it would be easier to process if each row was a single day, and all the values were in a single column.
13:25:55 https://www.w3.org/2013/csvw/wiki/Use_Cases#Analysis_of_Scientific_Spreadsheets
13:26:00 Martine de Vos
13:26:06 contributed via davideceolin
13:26:26 q-
13:26:53 Davideceolin: Spreadsheets (e.g. Excel): I haven't reported any yet, but working out the meaning of figures and numbers in a spreadsheet is difficult. Need better understanding of content.
13:28:13 that's what DataCube helps with
13:28:19 Danbri: speaking to people working with stats, everything was footnotes and annotated by links etc.; some of the early rdf data put things at a graph level.
13:28:48 http://www.w3.org/TR/2013/PR-vocab-data-cube-20131217/
13:28:54 JeniT: Works with statistical data, and provides that kind of thing
13:29:13 my proposal also seeks to use RDF Data Cube ...
13:29:39 danbri: The distinction between describing csv today versus best practices for the future.
13:29:54 ie should we be putting something together that works with currently published CSV files
13:30:01 danbri: Where should we be on that spectrum?
13:30:04 or trying to get people to publish CSV differently
13:30:15 tx, yes
13:30:33 Jeremy: People may do things that are useful to them already or because they don't need any better.
13:30:36 q+
13:30:44 ack ericstephan
13:30:58 ericstephan: netcdf (community) uses to publish their datastream with
13:31:07 … conventions along lines of best practices
13:31:27 … i like idea of providing a means/solution ppl can move towards. not dictatorial, …
13:31:55 I think it's useful to show best practises in terms of "if you publish data like this, then you can process it as easily as this"
13:32:26 https://www.w3.org/2013/csvw/wiki/Use_Cases#Publishing_the_results_of_scientific_experiments
13:32:43 fresco: Replicates are always a problem in spreadsheets.
13:33:25 https://www.w3.org/2013/csvw/wiki/Use_Cases#Visualisation_of_time_series_data_with_annotations
13:33:29 fresco: The other thing from a scientific experiment is knowing what data is in each cell. Knowing where the data came from would also be helpful
13:33:39 https://www.w3.org/2013/csvw/wiki/Use_Cases#Processing_search_results_from_Solr
13:34:20 jeremy agrees with alf
13:34:22 fresco: Searching... you want to know in the metadata, like OpenSearch in csv, what offset, the number of rows; on publishing you may want people to break large csv files up into chunks.
13:34:36 ... need to extract subsets from a larger dataset
13:34:44 q+
13:34:45 fresco: Publishing individual csv files
13:34:50 +1 to link relations between CSV files
13:34:58 q?
13:35:34 fresco: Adding annotations to ??? if you want to annotate a particular cell.
13:35:41 http://tools.ietf.org/search/rfc7111
13:35:56 ack jtandy
13:35:56 q-
13:36:08 I'll try to add a usecase relevant to use of CSV files for output of text information extraction systems. A common requirement is linking extracted facts to string offsets in a document.
13:36:44 TimFinin, great, that sounds like a useful use case
13:36:54 jeremy: I agree.
When we are querying the datasets we may not know what it is, but we want to know the logical structure of the csv
13:37:02 q+ to mention fragids
13:37:14 ack jenit
13:37:14 JeniT, you wanted to mention fragids
13:37:39 q+
13:37:44 ack ivan
13:38:12 [I lost audio briefly]
13:38:24 ivan: tsv is not covered by csv?
13:38:25 character-separated values?
13:38:49 JeniT: We need to answer the question about delimiters
13:39:08 can you hear me?
13:39:17 FWIW, as for binary formats for large amounts of data RDF HDT format might be a fit here? (although RDF compression not really in the scope of this group...) http://www.w3.org/Submission/2011/SUBM-HDT-20110330/
13:39:29 danbri: We are drifting to the protocol design..
13:39:57 danbri: is there a distinction between search results and data on the web?
13:39:59 https://www.w3.org/2013/csvw/wiki/Use_Cases#Publication_of_Statistics
13:40:20 e.g. http://www.ons.gov.uk/ons/index.html
13:40:35 JeniT: If you work with Statistics you care a lot about annotation and metadata
13:41:00 JeniT: Use Excel, which provides that level of adding metadata
13:41:21 https://www.w3.org/2013/csvw/wiki/Use_Cases#Organogram_Data
13:42:04 JeniT: Organogram data. Linked data for sharing organizational structures. Easiest way of sharing government org data because everyone works with spreadsheets.
13:42:17 (I like this observation, "When the CSVs are published on the web, they need to reference this centrally defined schema (rather than, say, being packaged with a copy of schema) to make sure that they are adhering to the correct format." … this was my Q to Jeremy re workflow and definitions)
13:42:41 q?
13:42:43 JeniT: Two csv files are published together and reference each other by identifier.
13:42:52 good examples
13:42:56 great!
13:43:11 danbri: What kinds of questions should we be asking?
13:43:39 q+ to ask if jenit tried http://www.w3.org/TR/r2rml/ or similar w/ organogram case
13:44:12 JeniT: I think we should pull out requirements for what the technology needs to do. What are the requirements in the particular use case? There must be something able to point to a theme and not repeat it again and again.
13:44:27 s/theme/schema/
13:44:28 q?
13:45:03 q+ (relational data)
13:45:04 danbri: Did you look at the OGC spec for Organogram case? JeniT: no
13:45:09 q?
13:45:10 q-
13:45:12 q-
13:45:13 q-
13:45:13 oops
13:45:14 q-
13:45:17 q+
13:45:46 q- "sees (relational, data), fresco"
13:46:25 ... didn't Dan ask about RDB2RDF... i.e. s/OCG spec/RDB2RDF spec/ , or did I get that wrong?
13:46:26 -> http://www.w3.org/2001/sw/wiki/RDB2RDF info on the RDB to RDF mappings
13:47:01 JeniT: List within a cell? Can you identify that?
13:47:27 danbri: Most csv pretty boring, but good to note other examples
13:47:42 Topic: Definition of CSV
13:48:24 JeniT: Thought it would be useful to talk about what we mean by CSV
13:48:40 http://dataprotocols.org/csv-dialect/
13:48:50 q+ to mention http://www.w3.org/wiki/WebSchemas/LookInside#Background_Research_.26_Related_Work (R, Octave, Matlab) data frames
13:49:02 q-
13:49:17 JeniT: what delimiters etc can be used, what encodings are supported by different applications.
13:50:15 JeniT: Conventions and how they are mapped into an infoset for csv; names are one of the frequent places where commas are used.
13:50:18 q+ to ask whether there is any issue about delimiters vs. decimal separators vs. quotes as such (or whether it is just me ;-))?
13:50:19 http://en.wikipedia.org/wiki/Decimal_mark#Countries_using_Arabic_numerals_with_decimal_comma
13:50:48 ack me?
13:51:09 JeniT: We need to pull together a definition on roughly the same timeline as the use cases.
13:51:30 ack me
13:51:30 danbri, you wanted to mention http://www.w3.org/wiki/WebSchemas/LookInside#Background_Research_.26_Related_Work (R, Octave, Matlab) data frames
13:51:31 also cf https://github.com/theodi/csv-validation-research
13:51:33 ack axelpolleras
13:51:59 (aside: I read some csvs use diff encoding in each row!)
13:52:38 AxelPolleres: Not only delimiters, decimal points, and code conventions, but also language differences in csv files
13:53:02 AxelPolleres: Makes integration difficult.
13:54:16 JeniT: This is exactly the same problem I have encountered. Although Excel fixes some things, how things should be escaped and delimited is still an issue. You get different kinds of behavior. We need in-depth import and export capabilities to understand constraints
13:55:44 ericstephan: there are other tools than Excel, and other binary tabular formats than Excel
13:56:35 danbri: Looking for large target user communities
13:56:59 zakim, pick a scribe
13:56:59 Not knowing who is chairing or who scribed recently, I propose ??P8
13:57:06 perfect!
13:57:09 FWIW, also quotes and quotes escaping are an issue on CSV "in the wild"... although it is specified in http://www.ietf.org/rfc/rfc4180.txt ... it would be nice to provide cleansing tools a la xmltidy for CSV :-)
13:57:20 zakim, pick a scribe
13:57:20 Not knowing who is chairing or who scribed recently, I propose jtandy
13:57:44 next scribe: danbri
13:57:51 AxelPolleres, yes, but we need to specify what to cleanse into!
13:58:00 danbri: Anything else?
13:58:03 -AxelPolleres
13:58:04 -JeniT
13:58:06 -jtandy
13:58:09 -??P8
13:58:10 -danbri
13:58:11 -TimFinin.a
13:58:14 -Ivan
13:58:15 AxelPolleres has left #csvw
13:58:16 WHO IS ??P8 :)
13:58:23 -unknowncaller1
13:58:30 -fresco
13:58:31 :)
13:58:32 -davideceolin
13:58:38 zakim, who is here?
13:58:38 On the phone I see ericstephan
13:58:39 On IRC I see ericstephan, TimFinin, fonsoN, davideceolin, Zakim, RRSAgent, danbri, ivan, JeniT, fresco, trackbot
13:58:48 ivan, can you help turn ericstephan's work into minutes?
13:58:54 rrsagent, generate minutes
13:58:54 I have made the request to generate http://www.w3.org/2014/02/05-csvw-minutes.html ericstephan
13:59:07 Ok sorry
13:59:15 oh great thank you
13:59:18 ericstephan, thanks for scribing!
13:59:21 zakim, drop ericstephan
13:59:21 ericstephan is being disconnected
13:59:23 DATA_CSVWG()8:00AM has ended
13:59:23 Attendees were jtandy, Ivan, +1.509.554.aaaa, JeniT, ericstephan, davideceolin, +1.410.461.aabb, danbri, +43.13.aacc, AxelPolleres, unknowncaller1, fresco
13:59:23 ivan are you still on the phone?
13:59:24 yes, thanks ericstephan!
13:59:53 trackbot, end telcon
13:59:53 Zakim, list attendees
13:59:53 sorry, trackbot, I don't know what conference this is
13:59:57 I might have messed up on one of the presenters
14:00:01 RRSAgent, please draft minutes
14:00:02 I have made the request to generate http://www.w3.org/2014/02/05-csvw-minutes.html trackbot
14:00:02 RRSAgent, bye
14:00:02 I see 1 open action item saved in http://www.w3.org/2014/02/05-csvw-actions.rdf :
14:00:02 ACTION: axelpollerres take a lead arranging an *informal* gathering of wg members at EDF [1]
14:00:02 recorded in http://www.w3.org/2014/02/05-csvw-irc#T13-08-30
14:00:05 in the use case discussion
14:00:05 meaning?