See also: IRC log
<ivan> Agenda: https://www.w3.org/2013/csvw/wiki/Meeting_Agenda_2014-02-05
Meeting notes accepted
QUESTION: Can we confirm we're OK with the current plan to hold the F2F before TPAC?
Some events other than the F2F might be useful to collaborate in the room
<JeniT> note it's in less than 8 weeks time
<danbri> ACTION: axelpollerres take a lead arranging an *informal* gathering of wg members at EDF [recorded in http://www.w3.org/2014/02/05-csvw-minutes.html#action01]
<jtandy> Raj Singh and I plan to meet to discuss things at the OGC TC meeting in washington, late march ... will add to the wiki
<AxelPolleres> EDF 2014 http://2014.data-forum.eu/
<danbri> charter: http://www.w3.org/2013/05/lcsv-charter
JeniT: 3rd working draft in March 2014, we were supposed to start earlier
... we need a volunteer to deliver that document and expect everyone to contribute
<danbri> wiki materials - https://www.w3.org/2013/csvw/wiki/Use_Cases
... we need that by next week
Jeremy: I am willing to help
JeniT: Others can help as well
JeniT: Dan has put together a use case and wiki page references done by other groups
<danbri> wiki page contributors so far: Adam Retter, Jeni Tennison , Jeremy Tandy, Andy Seaborne, Alf Eaton, Davide Ceolin, Martine de Vos via Davide Ceolin, …
JeniT: The more concrete examples the better
Danbri: The OWL example stood out
... SKOS stood out as a real world project
<jtandy> jeremy is here
<JeniT> ericstephan: we're trying to pull together use cases; how much detail do you want on the data?
<JeniT> ... I'm trying to provide data about where it's coming from, how people are using it etc
<JeniT> danbri: a high-level big picture, and then some concrete samples of data
<JeniT> ericstephan: I'll exclude data that hasn't been published
<JeniT> ivan: in the meantime, I'll put it up for you
Ivan: Contact me if you want to put this on the wiki
Danbri: Can we go over who has contributed?
<danbri> jeremy now, then davide, jeni, …
Jeremy: My use case is weather observation data
Jeremy: It brought out specific issues and key requirements; there are no formal semantics associated with CSV
<danbri> ISSUE: there is no machine-readable mechanism available to describe how the set of files are related
Jeremy: Often the data files are partitioned into multiple files and structures
... If a property applies to every entity, it is summarized at the file level.
... Under the proposal section of the wiki put link back to JeniT document
Danbri: Are you using packaging mechanisms?
Jeremy: Typically not, docs on the web next to the dataset
Danbri: Different parties are involved in the workflow; who ultimately sets the structure of the CSV?
<Zakim> JeniT, you wanted to ask if a zip would be appropriate for this data in any case
Jeremy: The structure of the csv is based on who produces the data. Need to do more digging into the users
JeniT: With the weather data you are dealing with, you might want to have various packages
Jeremy: They can get large, e.g. 100 million records
JeniT: Can you point to data and how it fits in the dataset?
davideceolin: Over time, different CSV files have represented different kinds of information, such as crime counts, and different policies have produced different formats.
davideceolin: I have already developed a tool; being able to automatically identify the elements in the CSV file would be useful
<fresco> A useful question to ask is whether the CSV table itself is appropriately formatted. For example, the HadCET data has "year + day of month" as rows and "month" as columns, whereas it would be easier to process if each row was a single day, and all the values were all in a single column.
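A minimal Python sketch of the reshaping fresco describes, turning a table with one column per month into one row per day. The column names and values below are invented for illustration and are not the actual HadCET layout or readings.

```python
import csv
import io

# Hypothetical wide-format sample: one row per (year, day of month),
# one column per month. Values are invented, not real HadCET data.
WIDE = """year,day,jan,feb,mar
2013,1,45,30,52
2013,2,41,28,60
"""

MONTHS = ["jan", "feb", "mar"]


def wide_to_long(text):
    """Return one (year, month, day, value) tuple per observation."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for month_number, month in enumerate(MONTHS, start=1):
            rows.append(
                (
                    int(record["year"]),
                    month_number,
                    int(record["day"]),
                    int(record[month]),
                )
            )
    # Sort chronologically so that each output row is a single day
    # and all values sit in a single column.
    rows.sort()
    return rows


long_rows = wide_to_long(WIDE)
```

In the long form every value sits in the same position of each row, so a consumer can filter or aggregate without knowing which columns stand for months.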
<danbri> Martine de Vos
<danbri> contributed via davideceolin
davideceolin: Spreadsheets (e.g. Excel): I haven't reported any use cases yet, but working out the meaning of the figures and numbers in a spreadsheet is difficult. We need a better understanding of the content.
<JeniT> that's what DataCube helps with
Danbri: When I spoke to people working with statistics, everything was footnoted and annotated with links etc.; some of the early RDF data put such things at the graph level.
JeniT: Data Cube works with statistical data and provides that kind of thing
<jtandy> my proposal also seeks to use RDF Data Cube ...
danbri: There is a distinction between describing CSV as it is today and defining best practices for the future.
<JeniT> ie should we be putting something together that works with currently published CSV files
danbri: Where should we be on that spectrum?
<JeniT> or trying to get people to publish CSV differently
<danbri> tx, yes
Jeremy: People may already do things in a way that is useful to them, or because they don't need anything better.
<danbri> ericstephan: netcdf (community) uses to publish their datstream with
<danbri> … conventions along lines of best practices
<danbri> … i like idea of providing a means/solution ppl can move towards. not dictatorial, …
<fresco> I think it's useful to show best practises in terms of "if you publish data like this, then you can process it as easily as this"
fresco: Replicates are always a problem in spreadsheets.
fresco: The other thing, for a scientific experiment, is knowing what data is in each cell. Knowing where the data came from would also be helpful
<jtandy> jeremy agrees with alf
fresco: Searching... you want the metadata to tell you, like OpenSearch for CSV, the offset and the number of rows; when publishing, you may want people to break large CSV files up into chunks.
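A rough Python sketch of the paging idea, returning a slice of rows together with its offset and row count. The function name `read_chunk` and the keys of the returned dictionary are assumptions made for this example; they do not come from OpenSearch or from any proposal discussed in the group.

```python
import csv
import io
from itertools import islice


def read_chunk(text, offset, limit):
    """Return the header, the requested rows, and simple paging metadata."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # the first line is treated as the header row
    rows = list(islice(reader, offset, offset + limit))
    return {
        "header": header,
        "offset": offset,    # index of the first data row returned
        "count": len(rows),  # may be smaller than limit at the end
        "rows": rows,
    }


# Invented sample data: ten rows holding an id and its square.
SAMPLE = "id,value\n" + "".join(f"{i},{i * i}\n" for i in range(10))

chunk = read_chunk(SAMPLE, offset=4, limit=3)
last_chunk = read_chunk(SAMPLE, offset=9, limit=5)
```

Repeating the header in every chunk keeps each chunk usable on its own, at the cost of a little redundancy.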
<jtandy> ... need to extract subsets from a larger dataset
fresco: Publishing individual csv files
<JeniT> +1 to link relations between CSV files
fresco: Adding annotations to ??? if you want to annotate a particular cell.
<TimFinin> I'll try to add a usecase relevant to use of CSV files for output of text information extraction systems. A common requirement is linking extracted facts to string offsets in a document.
<JeniT> TimFinin, great, that sounds like a useful use case
jeremy: I agree. When we are querying the datasets we may not know what they contain, but we want to know the logical structure of the CSV
<Zakim> JeniT, you wanted to mention fragids
<danbri> [I lost audio briefly]
ivan: tsv is not covered by csv?
<fresco> character-separated values?
JeniT: We need to answer the question about the delimiter
<danbri> can you hear me?
<AxelPolleres> FWIW, as for binary formats for large amounts of data RDF HDT format might be a fit here? (although RDF compression not really in the scope of this group...) http://www.w3.org/Submission/2011/SUBM-HDT-20110330/
danbri: We are drifting to the protocol
... is there a distinction between search results and data on the web?
<danbri> e.g. http://www.ons.gov.uk/ons/index.html
JeniT: If you work with statistics you care a lot about annotation and metadata
... People use Excel, which provides that level of support for adding metadata
JeniT: Organogram data. Linked data for sharing organizational structures. It is the easiest way of sharing government organization data because everyone works with spreadsheets.
<danbri> (I like this observation, "When the CSVs are published on the web, they need to reference this centrally defined schema (rather than, say, being packaged with a copy of schema) to make sure that they are adhering to the correct format." … this was my Q to Jeremy re workflow and definitions)
JeniT: Two csv files are published together and reference each other by identifier.
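A small Python sketch of two CSV files that reference each other by identifier, with a check for references that do not resolve. The file contents, column names (`post_id`, `reports_to`) and names are hypothetical and are not taken from the actual organogram data.

```python
import csv
import io

# Hypothetical pair of files published together; staff rows refer to
# posts through the shared post_id identifier.
POSTS = """post_id,title,reports_to
P1,Director,
P2,Deputy Director,P1
"""

STAFF = """name,post_id
A. Example,P2
B. Example,P9
"""


def resolve(posts_text, staff_text):
    """Join staff to posts by identifier and collect unresolved references."""
    posts = {
        row["post_id"]: row
        for row in csv.DictReader(io.StringIO(posts_text))
    }
    linked, dangling = [], []
    for row in csv.DictReader(io.StringIO(staff_text)):
        post = posts.get(row["post_id"])
        if post is None:
            dangling.append(row["post_id"])  # no such post in the other file
        else:
            linked.append((row["name"], post["title"]))
    return linked, dangling


linked, dangling = resolve(POSTS, STAFF)
```

Nothing in the files themselves says that `post_id` in one refers to `post_id` in the other; the code has to be told, which is the gap a machine-readable description of the relationship would fill.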
<jtandy> good examples
danbri: What kinds of questions should we be asking?
JeniT: I think we should pull out requirements for what the technology needs to do. What are the requirements in the particular use case? There must be a way to point to a schema rather than repeat it again and again.
danbri: Did you look at the RDB2RDF spec for the Organogram case?
JeniT: No
JeniT: List within a cell? Can you identify that?
danbri: Most CSV is pretty boring, but it is good to note other examples
JeniT: I thought it would be useful to talk about what we mean by CSV: what delimiters etc. can be used, and what encodings are supported by different applications.
... We also need conventions and a mapping into an infoset for CSV; names are one of the frequent places where commas appear in values.
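A short Python sketch of why a comma inside a name matters, using the standard `csv` module with its default dialect, which quotes such fields. The name and organization are invented for the example.

```python
import csv
import io

# Invented sample: the name field itself contains a comma.
rows = [["name", "org"], ["Smith, Jane", "Example Org"]]

# Write with the default dialect (comma delimiter, minimal quoting).
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
encoded = buffer.getvalue()

# Splitting the data line on commas ignores the quoting and breaks the name.
naive = encoded.splitlines()[1].split(",")

# A parser that honours the quoting recovers the original fields.
parsed = list(csv.reader(io.StringIO(encoded)))

# With a tab delimiter the same data needs no quoting at all.
tab_buffer = io.StringIO()
csv.writer(tab_buffer, delimiter="\t").writerows(rows)
tab_parsed = list(csv.reader(io.StringIO(tab_buffer.getvalue()), delimiter="\t"))
```

The reader only gets the right answer because it is told, or assumes, the delimiter and quoting rules, which is the information a definition of CSV would need to pin down.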
<danbri> ack me?
JeniT: We need to pull together a definition on roughly the same timeline as the use cases.
<Zakim> danbri, you wanted to mention http://www.w3.org/wiki/WebSchemas/LookInside#Background_Research_.26_Related_Work (R, Octave, Matlab) data frames
<JeniT> also cf https://github.com/theodi/csv-validation-research
<danbri> (aside: I read some csvs use diff encoding in each row!)
AxelPolleres: It is not only the delimiter: decimal points, code conventions and language differences also vary between CSV files
... Makes integration difficult.
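A minimal Python sketch of the decimal-point problem, assuming a semicolon-delimited file that uses a decimal comma and a dot as thousands separator. The sample values and the helper `parse_number` are invented for illustration.

```python
import csv
import io

# Invented sample using a decimal comma and a dot as thousands separator.
SAMPLE = "station;value\nWien;3,5\nGraz;1.234,25\n"


def parse_number(text, decimal=".", thousands=","):
    """Parse a number given the separators used by the publisher."""
    cleaned = text.replace(thousands, "").replace(decimal, ".")
    return float(cleaned)


values = {
    row["station"]: parse_number(row["value"], decimal=",", thousands=".")
    for row in csv.DictReader(io.StringIO(SAMPLE), delimiter=";")
}

# Reading the same text with the wrong convention silently gives a
# different number instead of an error.
misread = parse_number("3,5")
```

Because the misreading raises no error, the convention has to be stated somewhere outside the data, for example in accompanying metadata.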
JeniT: This is exactly the same problem I have encountered. Although Excel fixes some things, such as how things should be escaped and delimited, you get different kinds of behavior. We need to look in depth at import and export capabilities to understand the constraints
<JeniT> ericstephan: there are other tools than Excel, and other binary tabular formats than Excel
danbri: Looking for large target user communities
<AxelPolleres> FWIW, also quotes and quotes escaping are an issue on CSV "in the wild"... although it is specified in http://www.ietf.org/rfc/rfc4180.txt ... it would be nice to provide cleansing tools a la xmltidy for CSV :-)
<danbri> next scribe: danbri
<JeniT> AxelPolleres, yes, but we need to specify what to cleanse into!
danbri: Anything else?
<JeniT> ericstephan, thanks for scribing!
<danbri> yes, thanks ericstephan!
<ivan> trackbot, end telcon