CSV on the Web Working Group Teleconference -- 26 Feb 2014

<JeniT> http://www.w3.org/2014/02/19-csvw-minutes.html

Jeni: approve previous meeting records

No objections were voiced

Scoping

jeni: starting with scoping and general approach

<JeniT> http://lists.w3.org/Archives/Public/public-csv-wg/2014Feb/0134.html

jeni: do we need to talk about all CSV and textual tabular data
... or do we say "this is how to publish CSV"
... map all CSV and textual tabular data to a data model

<jtandy> +1

<EricStephan> Good summary! +1

<AndyS1> +1 to the approach

<DavideC> +1

jtandy: focus on text tables
... not excel, rdb, netCDF or anything else in their native formats

jeni: maybe dumps

<chrismetcalf> You should be able to up-convert from dumps into those native formats though

jeni: need a specification for parsing tabular data into the data model

<JeniT> stasinos: are we trying to specify what a parser should be able to parse, and what the output is?

<JeniT> ... or the metadata that a more generic engine needs in order to parse a given piece of data

<JeniT> ... eg what's the delimiter

<JeniT> fresco_: are you talking about the stuff that is specified in the SDF?

what I am saying is: are we going to specify one particular data format

<JeniT> stasinos: if the first row is a header row or not, that kind of thing

jeni: how much configuration is needed and how much should be std-ized
... explicitly specifying or having algorithms to guess
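The trade-off between explicit configuration and guessing can be illustrated with a minimal sketch. Nothing here was proposed on the call: the helper name `guess_delimiter`, the candidate list, and the "same non-zero count on every line" heuristic are all assumptions made for illustration, and the naive counting ignores quoted fields.

```python
# Hypothetical helper: use the publisher's stated delimiter if there is one,
# otherwise fall back to a naive guess.
CANDIDATES = [",", ";", "\t", "|"]  # assumed candidate set


def guess_delimiter(text, explicit=None):
    """Return the explicit delimiter if given, else guess one from the text."""
    if explicit is not None:
        return explicit
    lines = [ln for ln in text.splitlines() if ln]
    best, best_count = None, 0
    for cand in CANDIDATES:
        counts = {ln.count(cand) for ln in lines}
        # A plausible delimiter occurs the same non-zero number of times per line.
        if len(counts) == 1:
            n = counts.pop()
            if n > best_count:
                best, best_count = cand, n
    return best  # None when no candidate is consistent


sample = "year;month;tmax\n1951;1;7.9\n1951;2;6.6\n"
print(guess_delimiter(sample))
print(guess_delimiter(sample, explicit=","))
```

An explicit setting always wins here, which is one possible answer to how much should be standardized versus guessed.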

sorry, I didn't catch what was just said

can the speaker please type a few words?

<JeniT> https://github.com/theodi/csv-validation-research

<fresco_> i will try to summarize the parameters that existing parsers (in various languages) use

andy: is this doc the starting point?

<fresco_> e.g. delimiters, which rows/columns are fixed/headers, enclosure character, etc

jeni: suggests not a format, but what is there
... use as starting point for what can be parameterized
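As a rough sketch of what "parameterized" could look like, the parameters fresco_ lists (delimiter, header rows, enclosure character) can be bundled and handed to an existing parser. The `ParseOptions` class and its field names are hypothetical; only Python's standard `csv` module is real.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ParseOptions:
    """Hypothetical bundle of parser settings; field names are assumptions."""
    delimiter: str = ","
    enclosure: str = '"'
    header_rows: int = 1


def parse(text, opts):
    """Split text into (header rows, data rows) according to opts."""
    reader = csv.reader(
        io.StringIO(text), delimiter=opts.delimiter, quotechar=opts.enclosure
    )
    rows = list(reader)
    return rows[: opts.header_rows], rows[opts.header_rows :]


# A pipe-delimited sample whose fields are enclosed in single quotes.
headers, data = parse("name|note\n'a|b'|x\n", ParseOptions(delimiter="|", enclosure="'"))
print(headers, data)
```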

<Zakim> AndyS, you wanted to ask if SDF is our initial starting point (yes?)

andy: asks exactly which features are needed

<fresco_> JeniT: yes, that would be the next step - to list the features required by the use cases that existing parsers don't handle

jeni: base it on requirements from the use cases document

<JeniT> http://lists.w3.org/Archives/Public/public-csv-wg/2014Feb/0159.html

jeni: moving on to definition of tabular data
... exact meaning of "tabular"

fresco asks how free a definition we are after

<JeniT> http://lists.w3.org/Archives/Public/public-csv-wg/2014Feb/0161.html

jeni: should have columns of items with consistent meaning

<chrismetcalf> +1 for JeniT’s definition

jtandy: each row is about 1 thing
... columns with uniform meaning
... and a regular number of columns
... "regular" is explained in reference to the example in Jeni's email. Some items have sub-structure

<fresco_> ok, so tabular implies data with columns, to differentiate it from a line-oriented format

jeni: prefers a fixed number of columns
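The "regular number of columns" criterion lends itself to a simple check. The sketch below is illustrative only: the function name `irregular_rows` is an assumption, and it takes the first row's width as the expected width, which the group did not decide.

```python
import csv
import io


def irregular_rows(text, delimiter=","):
    """Return (row_index, width) for rows whose width differs from the first row."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    expected = len(rows[0])
    return [(i, len(r)) for i, r in enumerate(rows, start=1) if len(r) != expected]


# The third row is one field short.
print(irregular_rows("a,b,c\n1,2,3\n4,5\n"))
```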

EricStephan: each column has a heading
... column header is metadata
... it would be good to have more than header names

<timfinin> zaikem, mute me

<AndyS1> [request for example? (ptr?)]

jeni: header, column names, and everything under it is data

<JeniT> http://www.metoffice.gov.uk/pub/data/weather/uk/climate/stationdata/chivenordata.txt

<AndyS1> +1 to tabular data can be part of a file

<EricStephan> Thank you for the example

<chrismetcalf> This would be a good example of a fixed-width tabular file

jtandy: sub-structure inside fields

jtandy: do we recommend it? deal with it?

yakovsh: column names are an assumption

<AxelPolleres> for "headerless" csvs we could always define default properties, :column1 ,... , :columnn , right?

<EricStephan> That sounds good

jeni: columns only have numbers, and row 1 titles are simple annotations
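One way to read this position, sketched under assumptions: every column gets a positional default name (following AxelPolleres' `:column1 ... :columnn` suggestion), and a first-row title, when present, is kept only as an annotation. The function name `describe_columns` and the dictionary keys are hypothetical.

```python
def describe_columns(first_row=None, width=None):
    """Name columns column1..columnN; keep any first-row titles as annotations."""
    n = len(first_row) if first_row is not None else width
    columns = []
    for i in range(n):
        columns.append({
            "name": f"column{i + 1}",  # assumed positional default name
            "title": first_row[i] if first_row is not None else None,
        })
    return columns


print(describe_columns(["foo", "bar"]))  # file with a title row
print(describe_columns(width=2))         # headerless file
```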

timfinin: mentions medical clinical trial data
... had a hard time putting that data into what we are discussing

<AndyS1> http://lists.w3.org/Archives/Public/public-csv-wg/2014Feb/0207.html

timfinin: some columns describe properties, not values
... often have row headers and column headers

<AxelPolleres> @tim, rowheaders = keys?

timfinin: big chunk of data, many people interested

<AxelPolleres> ... or you mean transposed tables?

Jeni and timfinin agree example would be good

<EricStephan> Very interesting Medical clinical use case!

<timfinin> zakim mute me

<fresco_> +1 for names as annotations - it means that you can specify the names outside the CSV file if they are not present

AxelPolleres: columns are names or properties
... not sure this is a binary decision

<AxelPolleres> say you have a CSV with column headers "foo,bar"

<AxelPolleres> and row 1,2

stasinos: can never be sure what these are, so better make them annotations

Jeni: also leans towards annotations

<yakovsh> we can indicate that headers are preferred in a BCP

<fresco_> need to distinguish between a header row that is "title,date_published" (property) and a header row that is "Article Title,Date Published" (label)?

<AxelPolleres> then you could translate that to [ :column1 1; ns:foo 1; :column2 2; ns:bar 2 ] . or [ :column1 1 ; :column2 2 ] . :column1 rdfs:label "foo" . :column2 rdfs:label "bar" .
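The annotation-style translation for a CSV with headers "foo,bar" and a row "1,2" can be sketched as below. This is an illustration only: the function name `to_turtle` is hypothetical, prefix declarations are omitted, and values are written out as-is with no escaping or datatype handling.

```python
def to_turtle(titles, row):
    """Render one row as a blank node, plus rdfs:label triples for column titles."""
    pairs = " ; ".join(f":column{i} {v}" for i, v in enumerate(row, start=1))
    lines = [f"[ {pairs} ] ."]
    for i, title in enumerate(titles, start=1):
        # Titles are attached to the positional property as labels only.
        lines.append(f':column{i} rdfs:label "{title}" .')
    return "\n".join(lines)


print(to_turtle(["foo", "bar"], [1, 2]))
```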

UCR

jtandy: update on use cases and requirements

<JeniT> http://w3c.github.io/csvw/use-cases-and-requirements/

DavideC continues.

<EricStephan> can't hear

Too much noise, but I think DavideC promised something for next week

DavideC: will work towards completing his action by next week

<DavideC> I'll try to have it done by next week, at least half will be ready

More arranging of how to proceed with the use cases document

Scope of CSV validation

Jeni: is in scope

AndyS: what does it mean to be in scope?

ivan: the definition and validation of CSV is not on the charter; CSV is a given

AndyS discusses algorithms for error recovery

ivan: main focus: metadata around CSV and conversion
... writing a definition, validation opens many problems to do right

Jeni: validating that given format is consistent with the metadata

<EricStephan> Similar to XML document being "valid" versus "well formed"?

AndyS (I think): "validation" might mean different things. It's about people exchanging data knowing that they mean the same thing

<yakovsh> +q

chrismetcalf: validation: checking it is CSV, datatypes are observed
... datatypes are in scope
... point to appropriate std
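Checking that datatypes are observed, in the sense of validating data against its metadata, might look like the sketch below. The metadata shape (column name mapped to a Python type) and the function name `validate` are assumptions; no metadata format had been agreed at this point.

```python
def validate(rows, datatypes):
    """Return (row_index, column, value) for cells that fail their declared datatype."""
    errors = []
    for i, row in enumerate(rows, start=1):
        for column, caster in datatypes.items():
            value = row.get(column)
            try:
                caster(value)  # e.g. int("1951") succeeds, float("---") fails
            except (TypeError, ValueError):
                errors.append((i, column, value))
    return errors


# Hypothetical rows and column metadata, for illustration only.
rows = [{"year": "1951", "tmax": "7.9"}, {"year": "1952", "tmax": "---"}]
print(validate(rows, {"year": int, "tmax": float}))
```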

yakovsh: RFC defines the mime-type

<chrismetcalf> IETF RFC for CSV: http://tools.ietf.org/html/rfc4180

yakovsh: it is not considered a std, it is just for information purposes
... no issues with updating, but moving to a proposed standard is a different story

<chrismetcalf> In my experience, many “CSV” generators, tools, and files are not compliant with that standard

yakovsh: will talk to the right people to check if making a std is considered

<AndyS1> +1 to split

jtandy: two kinds of validations soon to come

<JeniT> http://w3c.github.io/csvw/syntax/

Jeni: proposed to end the call, ScribeNick is happy

no prob

<AndyS1> jenit, Is there one priority issue you'd like people to respond to?

<chrismetcalf> Lots of echo

Ivan: time differences
... for three weeks, one hour earlier if we keep US time constant

<EricStephan> and the US west coast get to sleep in for an hour :-)

<EricStephan> I have no problem with that

ivan: will check if it is possible to base time on GMT

<timfinin> more sleep :)

<EricStephan> JenIT :-)

ivan: but be sure: many people will be on the wrong time no matter what we do

<chrismetcalf> I vote that standardizing DST is out of scope :)

jtandy: time left to FPWD

Jeni, ivan: publish ASAP, even rough. Comments help
