CSV on the Web Working Group Teleconference -- 14 Jan 2015

<trackbot> Date: 14 January 2015

<jtandy> just waiting for fire alarm test to complete :-(

<JeniT> ScribeNick: JeniT

<danbri2> JeniT & al, sorry I massively blanked on the call - can't believe it's weds already

jtandy: we can simplify conversions if we are just basing off the tabular data model

gkellogg: yes, precedence etc all gets taken care of in the model generation

JeniT: I've added standing agenda on the main page

<danbri> ScribeNick: danbri2

<JeniT> gkellogg: it would be good to still get emails to confirm that it's going ahead

<JeniT> https://github.com/w3c/csvw/issues?q=is%3Aopen+label%3A%22Requires+telcon+discussion%2Fdecision%22+sort%3Aupdated-asc

<danbri> jenit: prioritising issues for today, trying to check off our big list by working thru from either old-to-new or least-recently-updated, ...

<JeniT> https://github.com/w3c/csvw/issues/55

<JeniT> JeniT: syntax for regular expressions

<JeniT> ... any thoughts?

<JeniT> gkellogg: every language has a regex syntax, with big overlaps

<JeniT> ... there are edge cases in particular languages

<JeniT> ... this concerns me

<JeniT> ... we should probably say a best practice for interop is to be conservative in use

<JeniT> ... the only reason for a regex is to determine whether the value of a cell is used or not

<JeniT> ... there's no componentisation

<JeniT> jtandy: my impression is compatible with gkellogg's

<JeniT> ... I'm struggling to recall where we actually decided to use regular expressions within the metadata

<JeniT> ... we were talking about using regexs to parse difficult bits of strings

<JeniT> ... so is this time-expired?

<jtandy> ok - so regexp is used for validating CSV files ...

<JeniT> jtandy: so the main point is for validation, to make sure that CSV meets a schema

<danbri1> gkellogg: we have too many 'MAYs' in specs, w.r.t. testing

<JeniT> JeniT: there's the option of making it 'implementation defined' and encouraging authors to be conservative

<danbri1> scribenick: danbri1

gkellogg: don't think we can say it's impl defined, ...

... but also don't want to go too overboard in testcases, cooking up weird ways of breaking things via regexes

jenit: where does this leave us?

... which of the regex options do we say that impl has to conform with?

gkellogg: i expect the XML Schema one will be the most broadly available

... versus probably needing to load a js env within ruby

... not v appealing

... don't want to repeat their test suites to ensure all and every detail followed

gkellogg: I share j's concern w/ datetime strings and formats

jenit: propose we go fwd with speccing

... i don't think it's true that everyone has an xml schema impl handy

... it is an extra impl on top

(+1 on that 'dan)

jenit: my pref is to go for ecmascript

gkellogg: ok

jenit fills out github issue

<jtandy> sorry - lost connection via zakim - trying to get back in!

<JeniT> https://github.com/w3c/csvw/issues/65

jenit: next one is about datetime picture string

"Register of recognised date-time picture string formats"

gkellogg: I was unable to find an impl that uses them for parsing

they are more often used for formatting

jenit: interesting. that surprises me a bit. Doesn't moment.js use it?

...for parsing

<JeniT> http://momentjs.com/docs/#/parsing/string-format/

gkellogg: I didn't find one (but there may be impl out there)

<JeniT> https://github.com/w3c/csvw/issues/54

issue: it would be better to just have a list of known datetime formats

<trackbot> Created ISSUE-2 - It would be better to just have a list of known datetime formats. Please complete additional details at <http://www.w3.org/2013/csvw/track/issues/2/edit>.

gah, bot.

jenit: this github issue is about the idea that it would be better to just have a list of known datetime formats

... some that were hardcoded, this is what the format means, ...

jtandy: yes, spec would identify a registry/page, which says "this is how to identify the ... datetime string, moment.js etc"

jenit: i was misinterpreting. This is about the syntaxes you recognise for the strings

jtandy: if you look at -

<jtandy> {

<jtandy> "datatype": "date",

<jtandy> "format": {

<jtandy> "python": "%m/%d/%Y", # standard python format

<jtandy> "javascript": "M/D/YYYY", # javascript's moment.js format etc.

<jtandy> }

<jtandy> }

... register of known values effectively saying e.g. that the key 'python' refers to the python datetime string as spec'd at [url]

gkellogg: this didn't appeal to me. ugly!

... my u/standing of it has just changed, ... but it is not even clear that these, the python and js, do same thing

... trying to enumerate all the ways, ... editors won't do this, will only specify the ones of interest to them

... my main concern is w.r.t. interoperability

<jtandy> "datatype": "date",

<jtandy> "format": {

<jtandy> "picture-strings": {

<jtandy> "unicode": "dd MMM yyyy",

<jtandy> "xpath": "[D01] [MN,*-3] [Y0001]"

<jtandy> }

<jtandy> }

jtandy: there's a better eg in the csv2json doc, pasted above here.

... we assume that the metadata publisher cares enough about unicode version, xpath version, perhaps not about other options

q in terms of software impl, "do i understand the unicode thing? xpath?" "i can't read the datetime format anyway"

jenit: reading datetime format is pretty important for conversions, unlike regex which is more for validating

jtandy: same applies for number formats that we also discussed

gkellogg: I disagree that the regex stuff is just for validators

... that was my interpretation of it. If cell doesn't match some regex, it was ... use a null or default value.

jenit: i don't think it says that anywhere

... rather than try to parse all datetime formats, have a list of popular ones, e.g. lists from excel, google spreadsheets etc.

gkellogg: I think that is more likely to get good impl

... perhaps ns doc could contain registry of these formats?

... so we could update that without updating the spec?

jenit: issue there is impl conformance. ... how often would it need to check the registry?

jtandy: are we anticipating then, that validating + parsing software would try to detect which of the blessed formats it used?

<JeniT> 'datatype': 'date', 'format': 'ISO'

jenit: you'd have something like this [above], ... col is a datatype date, format is ISO

<JeniT> YYYY-MM-DD

(iso8601?)

jenit: there would be a builtin list within each impl for such formats, we could name them after the unicode picture string

maybe later we could move towards fully supporting

jtandy: we give a particular list of pic strings, ... beyond those you might be stuck

jenit: or fall back to regex

jtandy: seems workable

gkellogg: agree, best way fwd

jenit: maybe not enough of us to make a formal resolution but i'll update issues
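
A minimal sketch of the approach discussed above: a built-in list of named date formats, with a regex fallback for anything outside that list. The format names, the `KNOWN_DATE_FORMATS` table and the `read_cell` helper are hypothetical and not taken from any draft; Python's `re` module stands in for whichever regex syntax the spec ends up requiring.

```python
import re
from datetime import date, datetime

# Hypothetical built-in list. Keys are named after Unicode-style picture
# strings; values are the equivalent Python strptime patterns.
KNOWN_DATE_FORMATS = {
    "yyyy-MM-dd": "%Y-%m-%d",
    "dd/MM/yyyy": "%d/%m/%Y",
    "MM/dd/yyyy": "%m/%d/%Y",
}


def read_cell(text, fmt):
    """Return (value, ok) for one cell.

    If fmt names a known format, the cell is parsed into a date.
    Otherwise fmt is treated as a regex that only validates the raw string.
    """
    pattern = KNOWN_DATE_FORMATS.get(fmt)
    if pattern is not None:
        try:
            return datetime.strptime(text, pattern).date(), True
        except ValueError:
            # Known format, but the cell does not match it.
            return text, False
    # Unknown format name: fall back to regex validation, no parsing.
    return text, re.fullmatch(fmt, text) is not None
```

With this shape, an implementation only has to support the listed formats to produce structured dates; a publisher whose format is not listed can still have cells validated, but the values stay as strings.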

[no audio]

<jtandy> https://github.com/w3c/csvw/issues/54

[skype has kicked me off.]

<danbri> jtandy: conversion doc will need to reflect [metadata doc] when that's done

<danbri> jenit: conv doc will need to say to use the value parsed per metadata spec

<jtandy> sorry - booted off the call again!!!

<danbri> ... use the semantic value of the cell

<danbri> gkellogg: place to handle this is in the metadata doc

<danbri> ... transform docs should just reference that

<danbri> +1

<danbri> jenit: putting that as a comment on the issue

<danbri> summary: metadata doc will talk about how string values are parsed into a semantic value, i.e. an actual structured date

<danbri> ...based off these formats. All the conversion docs must say is that you're using this

<danbri> gkellogg: following this logic, if [something regex] there'd be a null value

<danbri> jenit: maybe needs to be added to model, if an error on cell, ...

<danbri> ...then conversion doc can define error behaviour for cells

<danbri> (dan: I like this granular validity model)

<danbri> jenit: adding to issue

<danbri> *an

<danbri> jtandy: if can't recognise, then the literal value will need to be used as no other options

<danbri> gkellogg: e.g. if it was a datetime, ... and we didn't recognise it, would rdf conversion use a datetime datatype, vs as a literal

<danbri> ... i think in rdfa it would've gone to plain literal

<danbri> jenit: makes sense

<danbri> jtandy: same issue is relevant to parsing numbers

<danbri> jenit: the issue we put it on (#54) covers that

<danbri> ...reasonable middle ground
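
A rough sketch of the fallback described above, assuming the conversion receives the raw cell string plus a format the implementation understands. A recognised date becomes a typed literal; an unrecognised one keeps its original string with no datatype. The `cell_to_literal` helper and its return shape are hypothetical, and the format is given as a Python `strptime` pattern purely for illustration.

```python
from datetime import datetime

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def cell_to_literal(text, strptime_pattern):
    """Return (lexical_form, datatype_iri), with datatype_iri None for a plain literal."""
    try:
        value = datetime.strptime(text, strptime_pattern).date()
    except ValueError:
        # Not recognised: keep the original string and emit a plain literal.
        return text, None
    # Recognised: emit the canonical lexical form with the xsd:date datatype.
    return value.isoformat(), XSD_DATE
```

The same pattern would apply to numbers: a failed parse leaves the cell as an untyped string, and the conversion documents decide what to do with a cell that carries an error.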

<JeniT> https://github.com/w3c/csvw/issues/66

<danbri> #66 "Composite primary keys and foreign key references"

<danbri> nice fiddly issue!

<danbri> skipping for now.

<JeniT> ACTION: JeniT to add foreign key discussion to F2F agenda [recorded in http://www.w3.org/2015/01/14-csvw-irc]

<trackbot> Created ACTION-60 - Add foreign key discussion to f2f agenda [on Jeni Tennison - due 2015-01-21].

<danbri> https://www.w3.org/2013/csvw/wiki/F2F_Agenda_2015-02

<JeniT> https://github.com/w3c/csvw/issues/49
