See also: IRC log
<JeniT> http://www.w3.org/2014/02/19-csvw-minutes.html
Jeni: approve previous meeting records
No objections were voiced
jeni: starting with scoping and general approach
<JeniT> http://lists.w3.org/Archives/Public/public-csv-wg/2014Feb/0134.html
jeni: do we need to talk about all CSV and textual tabular data
... or do we say "this is how to publish CSV"
... map all CSV and textual tabular data to a data model
<jtandy> +1
<EricStephan> Good summary! +1
<AndyS1> +1 to the approach
<DavideC> +1
jtandy: focus on text tables
... not excel, rdb, netCDF or anything else in their native formats
jeni: maybe dumps
<chrismetcalf> You should be able to up-convert from dumps into those native formats though
jeni: need a specification for parsing tabular data into the data model
<JeniT> stasinos: are we trying to specify what a parser should be able to parse, and what the output is?
<JeniT> ... or the metadata that a more generic engine needs in order to parse a given piece of data
<JeniT> ... eg what's the delimiter
<JeniT> fresco_: are you talking about the stuff that is specified in the SDF?
what I am saying is: are we going to specify one particular data format
<JeniT> stasinos: if the first row is a header row or not, that kind of thing
jeni: how much configuration is needed and how much should be standardized
... explicitly specifying or having algorithms to guess
sorry, I didn't catch what was just said
can the speaker please type a few words?
<JeniT> https://github.com/theodi/csv-validation-research
<fresco_> i will try to summarize the parameters that existing parsers (in various languages) use
andy: is this doc the starting point?
<fresco_> e.g. delimiters, which rows/columns are fixed/headers, enclosure character, etc
jeni: suggests not a format, but what is there
... use as starting point for what can be parameterized
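[Illustration, not from the call] A minimal sketch of the kind of parameters fresco_ lists (delimiter, enclosure character, whether the first row is a header), using Python's standard `csv` module. The function name `parse_table` and its arguments are hypothetical and are not anything the group has specified.

```python
import csv
import io

def parse_table(text, delimiter=",", quotechar='"', has_header=True):
    """Parse delimited text using explicitly supplied parameters.

    Hypothetical helper: the parameter set is an assumption based on the
    options common CSV parsers expose, not a proposed standard.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    rows = list(reader)
    # Treat the first row as a header only when the caller says so.
    header = rows[0] if has_header and rows else None
    data = rows[1:] if header is not None else rows
    return header, data

# Semicolon-delimited input where one quoted field contains the delimiter.
sample = 'name;note\n"Smith; J";ok\n'
header, data = parse_table(sample, delimiter=";")
```

Here every parameter is stated explicitly. The alternative jeni mentions, guessing the parameters, is what tools such as Python's `csv.Sniffer` attempt.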
<Zakim> AndyS, you wanted to ask if SDF is our initial starting point (yes?)
andy: about what features exactly are needed
<fresco_> JeniT: yes, that would be the next step - to list the features required by the use cases that existing parsers don't handle
jeni: base on requirements from use cases documents
<JeniT> http://lists.w3.org/Archives/Public/public-csv-wg/2014Feb/0159.html
jeni: moving on to definition of tabular data
... exact meaning of "tabular"
fresco asks how free a definition we are after
<JeniT> http://lists.w3.org/Archives/Public/public-csv-wg/2014Feb/0161.html
jeni: should have columns of items with consistent meaning
<chrismetcalf> +1 for JeniT’s definition
jtandy: each row is about 1 thing
... columns with uniform meaning
... and a regular number of columns
... "regular" is explained in reference to the example in Jeni's email.
Some items have sub-structure
<fresco_> ok, so tabular implies data with columns, to differentiate it from a line-oriented format
jeni: prefers a fixed number of columns
EricStephan: each column has a heading
... column header is metadata
... it would be good to have more than header names
<timfinin> zaikem, mute me
<AndyS1> [request for example? (ptr?)]
jeni: header, column names, and everything under it is data
<JeniT> http://www.metoffice.gov.uk/pub/data/weather/uk/climate/stationdata/chivenordata.txt
<AndyS1> +1 to tabular data can be part of a file
<EricStephan> Thank you for the example
<chrismetcalf> This would be a good example of a fixed-width tabular file
jtandy: sub-structure inside fields
jtandy: do we recommend it? deal with it?
yakovsh: column names are an assumption
<AxelPolleres> for "headerless" csvs we could always define default properties, :column1 ,... , :columnn , right?
<EricStephan> That sounds good
jeni: columns only have numbers, and row 1 titles are simple annotations
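[Illustration, not from the call] A minimal sketch of the "names as annotations" idea: columns are identified by position (`column1` ... `columnN`, following AxelPolleres's suggestion), and first-row titles, when present, are kept as separate annotations. The function name and the shape of the returned values are assumptions made for this example.

```python
import csv
import io

def columns_with_annotations(text, has_header):
    """Return positional column names, title annotations, and records.

    Hypothetical helper: columns are always named column1..columnN;
    header titles are stored as annotations rather than used as keys.
    """
    rows = list(csv.reader(io.StringIO(text)))
    titles = rows[0] if has_header and rows else []
    data = rows[1:] if has_header else rows
    # Use the widest row so every cell gets a positional name.
    width = max((len(row) for row in rows), default=0)
    names = [f"column{i}" for i in range(1, width + 1)]
    annotations = dict(zip(names, titles))  # empty when there is no header
    records = [dict(zip(names, row)) for row in data]
    return names, annotations, records

names, annotations, records = columns_with_annotations("foo,bar\n1,2\n", has_header=True)
```

With this split, a headerless file yields the same records and an empty annotation map, so titles can be supplied from outside the file.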
timfinin: mentions medical clinical trial data
... had a hard time putting that data into what we are discussing
<AndyS1> http://lists.w3.org/Archives/Public/public-csv-wg/2014Feb/0207.html
timfinin: some columns describe properties, not values
... often have row headers and column headers
<AxelPolleres> @tim, rowheaders = keys?
timfinin: big chunk of data, many people interested
<AxelPolleres> ... or you mean transposed tables?
Jeni and timfinin agree example would be good
<EricStephan> Very interesting Medical clinical use case!
<timfinin> zakim mute me
<fresco_> +1 for names as annotations - it means that you can specify the names outside the CSV file if they are not present
AxelPolleres: columns are names or properties
... not sure this is a binary decision
<AxelPolleres> say you have a CSV with column headers "foo,bar"
<AxelPolleres> and row 1,2
stasinos: can never be sure what these are, so better make them annotations
Jeni: also leans towards annotations
<yakovsh> we can indicate that headers are preferred in a BCP
<fresco_> need to distinguish between a header row that is "title,date_published" (property) and a header row that is "Article Title,Date Published" (label)?
<AxelPolleres> then you could translate that to [ :column1 1; ns:foo 1; :column2 2; ns:bar 2 ] . or [ :column1 1 ; :column2 2 ] . :column1 rdfs:label "foo" . :column2 rdfs:label "bar" .
jtandy: update on use cases and requirements
<JeniT> http://w3c.github.io/csvw/use-cases-and-requirements/
DavideC continues.
<EricStephan> can't here
<EricStephan> can't hear
Too much noise, but I think DavideC promised something for next week
DavideC: will work towards completing his action by next week
<DavideC> I'll try to have it done by next week, at least half will be ready
More arranging of how to proceed with the use cases document
Scope of CSV validation
Jeni: is in scope
AndyS: what does it mean to be in scope?
ivan: the definition and validation of CSV is not on the charter; CSV is a given
AndyS discusses algorithms for error recovery
ivan: main focus: metadata around CSV and conversion
... writing a definition and validation opens up many problems that are hard to get right
Jeni: validating that given format is consistent with the metadata
<EricStephan> Similar to XML document being "valid" versus "well formed"?
AndyS (I think): "validation" might mean different things. It's about people exchanging data knowing that they mean the same thing
<yakovsh> +q
chrismetcalf: validation: checking it is CSV, datatypes are observed
... datatypes are in scope
... point to appropriate std
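[Illustration, not from the call] A minimal sketch of validation in the sense Jeni and chrismetcalf describe: checking that a file is consistent with separately supplied metadata, here just a list of column datatypes. The type names, the `CONVERTERS` table, and the error tuple layout are all assumptions made for this example, and a header row is assumed to be present.

```python
import csv
import io
from datetime import date

# Assumed datatype vocabulary; a real specification would point to a standard.
CONVERTERS = {
    "string": str,
    "integer": int,
    "number": float,
    "date": date.fromisoformat,
}

def validate(text, column_types):
    """Return a list of (row_number, column_number, message) errors."""
    errors = []
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row (assumed present)
    for row_number, row in enumerate(reader, start=2):
        # A regular table has the same number of columns in every row.
        if len(row) != len(column_types):
            errors.append(
                (row_number, None,
                 f"expected {len(column_types)} columns, found {len(row)}")
            )
            continue
        for col_number, (value, type_name) in enumerate(zip(row, column_types), start=1):
            try:
                CONVERTERS[type_name](value)
            except ValueError:
                errors.append(
                    (row_number, col_number, f"{value!r} is not a valid {type_name}")
                )
    return errors

sample = "station,year,rain\nChivenor,2014,80.5\nChivenor,twenty,12\nChivenor,2013\n"
errors = validate(sample, ["string", "integer", "number"])
```

This checks only column count and datatypes. Whether the file is well-formed CSV in the first place is the separate question raised in the discussion.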
yakovsh: RFC defines the mime-type
<chrismetcalf> IETF RFC for CSV: http://tools.ietf.org/html/rfc4180
yakovsh: it is not considered a std, it is just for information purposes
... no issues with updating, but moving to a proposed standard is a different story
<chrismetcalf> In my experience, many “CSV” generators, tools, and files are not compliant with that standard
yakovsh: will talk to the right people to check if making a std is considered
<AndyS1> +1 to split
jtandy: two kinds of validations soon to come
<JeniT> http://w3c.github.io/csvw/syntax/
Jeni: proposed to end the call, ScribeNick is happy
no prob
<AndyS1> jenit, Is there one priority issue you'd like people to respond to?
<chrismetcalf> Lots of echo
Ivan: time differences
... for three weeks, one hour earlier if we keep US time constant
<EricStephan> and the US west coast get to sleep in for an hour :-)
<EricStephan> I have no problem with that
ivan: will check if it is possible to base time on GMT
<timfinin> more sleep :)
<EricStephan> JenIT :-)
ivan: but be sure: many people on the wrong time no matter what we do
<chrismetcalf> I vote that standardizing DST is out of scope :)
jtandy: time left to FPWD
Jeni, ivan: publish ASAP, even rough. Comments help