CSV on the Web Working Group Teleconference

22 Apr 2015

See also: IRC log


Present: Jeni, Jeremy, Gregg, Dan, Ivan


<trackbot> Date: 22 April 2015

=Structuring issue handling

ivan: issues so far have come in via the email lists; I am moving them to GitHub

… we have to be careful that every issue that comes up has a clear reference, so that when it is resolved we can get a (hopefully positive) reply from the original commenter

ivan: you all might have other tricks, especially regarding GitHub

… issues relating to our obligatory horizontal reviews, e.g. i18n

ivan: we might want dedicated labels for these

also, two things require something from us

e.g. Yakov, IETF

<Zakim> danbri, you wanted to discuss github accounts and to note i18n asked about schedule

ivan: regarding the schedule, we said we'll try to move to CR in June, so that's probably our answer

'therefore, comments towards the end of May would be great'

jenit: can someone take responsibility for putting all issues into GitHub? Otherwise [bad things] are likely to happen

ivan: I have been doing that for the past few days
... everything so far ought to be in github

but how much detail should go into each issue?

<ivan> https://github.com/w3c/csvw/issues/507

consider, for example, #507 on clipboard formats

ivan: if i'm not around someone else could cover

the original mail should be copied into the issue, with a link

if there is a thread on the list, … we ought to look at both; … when we answer on the mailing list, maybe we should also link/copy the answer in GitHub

jeni: it would be great if you can take responsibility

agree we don't need to copy them all

a reference to the first message plus summary text is great

responding to the mail saying thank you, … and pointing to GitHub may help direct the relevant conversation into GitHub

we should try to keep our discussion in the GitHub issue as much as possible

and when we reach a conclusion after the GitHub discussion, reply in the original thread

<jtandy> +1 to JeniT 's suggestion about discussion in github and pasting conclusion into email thread

ivan: at the end of May I'll be away in Florence, then in NYC, …

<Zakim> danbri, you wanted to suggest a weekly duty rota and to suggest changing thread titles

<ivan> https://github.com/w3c/csvw/issues

there is also the "Milestone" mechanism

we could have "PreCR horiz review" there

<jtandy> I can swap labels for milestones

<jtandy> (if we agree that would be useful)

<ivan> csvw

jtandy: … there is still #205, to rework the requirements

that issue is currently assigned to Davide

jtandy: at the tail end of the F2F, he volunteered

(jenit pinging him in GitHub)

jenit: let's go through the issues

clipboard formats… https://github.com/w3c/csvw/issues/507

<JeniT> https://github.com/w3c/csvw/issues/507

it's clear that this is not relevant, as we are not defining clipboard formats for CSV files

jenit: I propose that I respond to the list making that point

<ivan> https://github.com/w3c/csvw/issues/507#issuecomment-94857565

ivan: after I asked, … I put one more comment here; … he did reply to my mail saying 'yes indeed'

gkellogg: it does relate somewhat to an issue Ivan and I discussed, with respect to transformations when there is no source, in which case the transformation nominally uses the annotated model. However, there is no official API for accessing that model.

But perhaps there is a way to serialize the data model back

… in which case a clipboard format for that annotated data model would be relevant

danbri: exciting work for v2!

jenit: +1
... for me, everything about those transformations is completely implementation-defined

e.g. how any JSON is passed in

or RDF etc., via stdin, parameters or whatever

we say nothing currently

also applies to what happens when someone specifies a transform with some format that we have not defined any kind of handling for

gkellogg: we haven't defined any way to access that abstract model

jenit: I think we can't; it is outside our scope and we don't yet know what will be wanted/useful

… let's look in a year's time

we can standardize later

<jtandy> +1 to JeniT ... once there are implementations in place, we could construct (at least) a Note to describe

back to #507

ivan: let's say we don't need to do anything; hopefully the response is fine, and then we can close the issue

https://github.com/w3c/csvw/issues/506 Relationships to RDB tables

ivan: not sure how closely Ashok is following our group/charter etc.
... this is where the issue Gregg raised comes in. The only thing that I think may make sense: in the case of a transformation, there would be a way for the external process to get hold of the merged metadata.

… merged metadata has a clear syntax as we define it

<JeniT> hmph, Skype died

… if there is a way to get hold of that, I could imagine processors using it with respect to a relational database

ivan: it would be possible to create a processor that takes merged metadata and provides a relational database schema

and then from that point on, that's all we can do.

what it would require in the spec of the metadata doc, …

ivan: the only thing it would require in the metadata doc is one more flag on the transformation, saying that it takes the merged metadata as its input

jenit: how does that help?

ivan: the only thing I was wondering about … just an idea … I can imagine a processor that takes the metadata doc, the way we define it, and turns it into a relational schema. We could provide a hook in the transformation, ...

jenit: if I were doing an import I'd want the data, not just the schema

<jtandy> [so, for example, the metadata defined foreign keys that would materialise in a relational schema]

<Zakim> danbri, you wanted to say i expected import/export tooling not schema gen

gkellogg: the other thing still open regarding the RDB relationship was the bit about not being able to export columns that come from a reference

ivan: it would add a relatively complex thing to a slightly secondary feature

… adding a ref into spec to a wiki page that you can create is fine

gkellogg: do we need to document that we don't fully match the RDB model in that regard?

jenit: couldn't we have an appendix on the rdb mapping?

(in csv2rdf presumably?)

ivan: we can turn wiki page into an appendix

<jtandy> (think so)

… I would not think that adding an additional SPARQL engine etc. is needed there

jenit: can we put an action on jtandy or Gregg to add an appendix to the RDF doc?

ivan: let me do that; I wrote the wiki page. I'll make a first pass.

turn it into appendix

(jenit makes a dedicated issue)

<jtandy> (lots of fast typing noises!)

ivan: we consider doing anything more to be beyond our charter, but we'll integrate that wiki page into the final RDF doc.

Dan to respond to Ashok on this

ivan: should we cc the TAG list?

back to https://github.com/w3c/csvw/issues

<ivan> https://github.com/w3c/csvw/issues/505

#505 is minor: an editorial issue from Ivan that Gregg/Jeni could handle

<ivan> https://github.com/w3c/csvw/issues/509

Relationship to Data Cube #509

jtandy: i have a reasonably well worked out example with real data, if i publish that into wiki i can refer to it

<Zakim> danbri, you wanted to suggest virtual cols help

<ivan> https://github.com/w3c/csvw/issues/179

danbri: it is very useful to have a Data Cube-based example showing the value of virtual columns

ivan: in my issue #179 I asked whether we needed virtual columns. While writing the RDB-to-RDF text I found I needed virtual columns, so we need them by charter.

<jtandy> [virtual columns are necessary for mapping CSV to RDF Data Cube]

so i am ok closing that issue

jenit: Ivan, can you make that comment within the issue?

so we can close it within that issue

ivan: ok, will do today

<ivan> https://github.com/w3c/csvw/issues/508

Comments on registration template for "application/csvm+json" #508

ivan: [missed]

will start things moving

ivan: the Process says this should be done before Candidate Recommendation

<scribe> new process doesn't say much else

we should do it

gkellogg: it became apparent while writing my blog post that metadata merge is a big issue for implementors. Complexity ....

the only reason for it right now is to pull in titles from CSV files

it is possible that some embedded metadata could be defined, but as-is now, it only really exists for adding titles. Only there for verification purposes.

should we rethink the notion of metadata merge and replace it with a validation phase that checks the CSV's columns against those described in the metadata?

jenit: what of issue of user-provided metadata?

gkellogg: yes it might merge with some other metadata ...

… if what the user is providing is of a minor nature, e.g. a tab-separated values option, then sure, some modification is needed

but otherwise it is pretty minor

e.g. dialect description

<Zakim> jtandy, you wanted to comment about 3rd party usage of CSV files

jtandy: (missed detail, sorry)

ivan: there is user metadata that jeni mentioned

we do not have evidence right now that Gregg's point is true

[(or false :)]

there is the option that the CSV might carry metadata

which would also be merged with whatever is out there

any metadata that is part of the doc is only on trivial matters

<jtandy> [detail: for people re-using CSV files published by third parties, we can override any metadata defined at source because we can always start the CSV "import" by pointing to _my_ local metadata file]

<Zakim> danbri, you wanted to ask re Spreadsheets support (excel, drive)

jenit: Rufus has mentioned that (alongside virtual cols) the merge could be a barrier for implementors

we have had evidence that it is putting people off

I'm in favour of people defining formats with embedded metadata within them

we could say: it is down to them to say how that embedded metadata is converted to other, … we provide the vocabulary, but not necessarily the exact processing rules

jenit: we have simplified things by saying 'you use the first metadata file you find', even when that is user-supplied; that reduces the cases where we need a merge

it is really now only focussed on titles from CSV files

and the user supplied options

and for them it will be implementation-defined

so because we're not defining the options, APIs, command lines etc., …

with all of that evidence i am also in favour of removing the need for merge

and instead have a specialized step for checking the titles supplied within the CSV

<jtandy> +1 from me (the metadata merge discussions confused me)
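The proposed replacement for metadata merge, a validation step that checks the CSV's header titles against the column descriptions in the metadata, can be illustrated with a minimal Python sketch. It is hypothetical and is not the processing defined in the CSVW specification: the function name `check_titles` is invented, columns are assumed to be compared by position, virtual columns are skipped, and a header cell is assumed to match if it exactly equals any declared title. Only the metadata keys `tableSchema`, `columns`, `titles` and `virtual` are taken from the CSVW metadata vocabulary.

```python
import csv
import io


def check_titles(csv_text, metadata):
    """Hypothetical title check: return mismatch messages (empty list if OK).

    Compares the CSV header row, position by position, with the "titles"
    of the non-virtual columns in metadata["tableSchema"]["columns"].
    Exact string matching is an assumption of this sketch.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    columns = metadata.get("tableSchema", {}).get("columns", [])
    # Virtual columns have no counterpart in the CSV file, so skip them.
    declared = [c for c in columns if not c.get("virtual", False)]
    problems = []
    if len(header) != len(declared):
        problems.append(
            f"CSV has {len(header)} columns, metadata describes {len(declared)}"
        )
    for i, (title, col) in enumerate(zip(header, declared), start=1):
        titles = col.get("titles", [])
        if isinstance(titles, str):
            titles = [titles]
        elif isinstance(titles, dict):
            # Language map, e.g. {"en": "name"}: collect titles in all languages.
            flat = []
            for value in titles.values():
                flat.extend([value] if isinstance(value, str) else value)
            titles = flat
        # A column without declared titles imposes no constraint here.
        if titles and title not in titles:
            problems.append(f"column {i}: header {title!r} not in {titles!r}")
    return problems
```

For example, with metadata declaring the titles "country" and ["name", "Name"] plus one virtual column, the header `country,name` passes, while `country,capital` yields a single mismatch for column 2. Nothing is merged: the metadata file stays authoritative and the CSV header is only checked against it, which is the simplification discussed above.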

ivan: first of all, an admin issue: for all our processing, it would be good if this went into the issue list.


ivan: this seems a significant change; we should open and close it in GitHub.

gkellogg: I will do that

would we need another pre-LCCR publication?

[thoughtful silence]

ivan: let me come back to that

there is another related admin issue.

list of changes...

ivan: we do not have an *obligation* to issue another doc, since this is not formally an LC

since we are not in LC

I think what we can do is … make the change, and if it is documented on the issue list, we can send an email around on our own lists to draw attention to it

maybe contact implementors that we know about directly

<Zakim> jtandy, you wanted to ask if there were any requirements demanding metadata merge

jtandy: I checked the requirements to see if anything demanded a merge; … I didn't find anything on a quick skim

danbri: is there a test case that passes now and will fail afterwards, or vice versa?

gkellogg: there are multiple relevant tests; I will have to see which are affected. There are over 100 distinct tests…
... as part of my extended action i'll investigate via my implementation

<jtandy> [a quick skim through the requirements indicates nothing demanding metadata merge]

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.140 (CVS log)
$Date: 2015/04/22 15:09:06 $