CSV on the Web Working Group Teleconference

04 Feb 2015


Ivan, JeniT, Gregg Kellogg (gkellogg), Dan Brickley (danbri), Jürgen Umbrich (jumbrich), Davide Ceolin (DavideCeolin)


gregg's suggestions: https://lists.w3.org/Archives/Public/public-csv-wg/2015Feb/0004.html

jenit: let's start w/ f2f and topics for that, next thu/fri


jenit: i've been going through issues clustering them, … pulled out only 4 areas

(I don't see in https://lists.w3.org/Archives/Public/public-csv-wg/2015Feb/ … you mean github notifications?)

… there's a set around disconnect between CSV files and tables

one set around primary key + refs between multiple files

a couple around annotations and how annotations reference other things, how they are incorp'd in conversions

and a few around language, and how lang is handled in mapping esp for common properties

those are the areas where I found clusters of issues

also we should spend a fairly high % of time looking at the conversions

as they need more attention and have most issues

any other thoughts on f2f priorities?

(+1 from me on conversions getting attention -dan)

ivan: we have currently 75 open issues, which is quite a number

but there may be quite a lot of these which are really editorial things

jenit: yes, quite a few are labelled as resolved, just require editor action but not needing more discussion


https://github.com/w3c/csvw/issues?q=is%3Aopen+is%3Aissue+label%3AResolved shows 9 'resolved' but open.

jenit: requires people to go through some other issues

… if an issue is neither resolved nor requires discussion/decision then it is in limbo and needs looking at

ivan: we have been a bit chaotic with issues. After f2f we have made over 100

… but we'll find a way

jenit: yes

ivan: What is the goal we set ourselves for the f2f?
... My dream would be that after f2f, we have to make a bunch of editorial work, but after that round i.e. approx end march, we're in position to issue what would've once been called a Last Call.

…. "maybe I'm a dreamer…"

dan: we might not close all issues at f2f but can leave it with a clear owner on anything left open

gregg: this should be achievable

where i am with my impl it looks pretty solid

i don't think we have normative text on extracting embedded metadata on CSVs

perhaps we need reqs around what other formats need to provide

e.g. TSVs are simple

ivan: eh?

gregg: we have some description on how to extract CSVs, illustrative text, gives a flavour for how other formats might be handled

this should be normative

if we're opening door for other formats, need to be clearer on this and use some hypothetical format at least to be testable

re csv mappings and our existing examples they are pretty solid

ivan: to other formats I think that we agreed at some point in time, that we have the conceptual thing which is our model

… and we should not get into any other syntax variation on how these can be mapped on our model

i don't think that we should go there

i fully agree that the default metadata issue is still open and needs to be defined somewhere

we agreed at some point… that if metadata is part of the orig csv file, we do not define the format of that

in a sense the only thing that we will have somehow in our default metadata is when the 1st row are the col names

that's a kind of metadata

any other variation on metadata within the csv file we decided to be out of scope for now

gregg: I think you're right, not trying to solve that here

… just that i think the results we have are [good]

ivan: back to orig thing, i'm playing with an impl as well. Although there are many things I have not done, I have same feeling as you. Where we seem to be the weakest, … the usage of foreign keys and how diff tables are related to each other

in our current docs that area is weakest

can we spend time reviewing that?

other issues are smaller details

good as we have captured big picture

to be less of a dreamer, it is realistic to expect "last call" sometime spring

we aim for intermediate draft publication end of march

may not be last call

aim for lc by early june

that is realistic

jenit: maybe we can work it out at f2f, ...

… may simply be useful for us to all sit in a room and type

… actually get some stuff done or on a screen

sharing in smaller groups

actually do some work, not just discussion

<scribe> ACTION: dan check beamer setup for non google staff [recorded in http://www.w3.org/2015/02/04-csvw-minutes.html#action01]

<trackbot> Created ACTION-62 - Check beamer setup for non google staff [on Dan Brickley - due 2015-02-11].

jenit: Rufus may or may not come for some portion of the time

ivan: same for phil archer

… they're not on list but might come over

for a day maybe

<jumbrich> zakim ??P16 is me

jenit: let's try to eat together somewhere around victoria, plenty of places

ivan, can you paste the f2f url?

<JeniT> https://www.w3.org/2013/csvw/wiki/F2F_Agenda_2015-02

<ivan> https://www.w3.org/2013/csvw/wiki/F2F_Agenda_2015-02

thx thx

gregg: we're approaching time when counting implementations matters

jenit: that's it for f2f

hi Davide!

davide: I may be able to attend f2f, will try

<JeniT> https://lists.w3.org/Archives/Public/public-csv-wg/2015Feb/0004.html

Gregg's Big List Of Issues

Issue #96 (PR #187): Use of aboutUrl, propertyUrl and valueUrl. (should be able to close #101).

<JeniT> https://github.com/w3c/csvw/issues/96

oops, bot clash

gregg: #96 this was as discussed/resolved last week, change urlTemplate into aboutUrl, propertyUrl to be a uritemplate property, and add a value url and make them all inherited properties
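
The per-cell use of the three properties can be sketched roughly as follows. This is only an illustration, assuming each property is a template with simple {name} placeholders filled from the row; the helper names (expand, cell_triple), the example.org URLs and the sample row are hypothetical, and percent-encoding and the rest of the URI template syntax are ignored.

```python
import re

def expand(template, row):
    """Fill simple {name} placeholders from the row (a small subset of URI templates)."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)

def cell_triple(row, column):
    """Build one (subject, predicate, object) triple for a cell.

    aboutUrl gives the subject, propertyUrl the predicate, valueUrl the object.
    """
    subject = expand(column["aboutUrl"], row)
    predicate = expand(column["propertyUrl"], row)
    obj = expand(column["valueUrl"], row)
    return (subject, predicate, obj)

# Hypothetical row and column description, for illustration only.
row = {"id": "42", "country": "AD"}
column = {
    "aboutUrl": "http://example.org/person/{id}",
    "propertyUrl": "http://example.org/vocab/country",
    "valueUrl": "http://example.org/country/{country}",
}
triple = cell_triple(row, column)
```

Because all three values are expanded against the row, each of them can vary from one row to the next.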

(don't take my spelling as canonical --scribe)

<JeniT> https://github.com/w3c/csvw/pull/187

… see a PR in github


… desc in metadata doc, … property can be an array i.e. repeatable, that is challenging w.r.t. spec text how to handle/define this.

that's the primary thing I noticed

ivan: there was another issue, whether the property uri can be an array, … q was whether the about uri if it inherited down to a column level, then the about uri can change from one col to another

that's a question

(did i capture that? is there an issue # for this?)

gregg: that's implication and intention for making it inheritable

point was to allow diff columns in rdf to be diff entities

not clear what this means in javascript

(javascript meaning json)

gregg: json properties typically based on column name not the about url

if you just used col names there would be nothing to distinguish from the single json object

we'd need to decide how we'd reflect them

[if at all]
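
One possible way to reflect per-column about URLs in JSON, offered purely as a sketch and not as anything the group decided, is to emit one object per distinct about URL for each row. The "@id" key, the helper names and the sample data below are assumptions made for illustration.

```python
import re

def expand(template, row):
    """Fill simple {name} placeholders from the row."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)

def row_to_objects(row, columns):
    """Group a row's cells into one JSON-style object per distinct aboutUrl."""
    objects = {}
    for col in columns:
        subject = expand(col["aboutUrl"], row)
        # Columns that share an aboutUrl end up in the same object.
        obj = objects.setdefault(subject, {"@id": subject})
        obj[col["name"]] = row[col["name"]]
    return list(objects.values())

# Hypothetical two-entity row: an artist and one of their albums.
row = {"artist_id": "1", "artist_name": "AC/DC",
       "album_id": "4", "album_title": "Let There Be Rock"}
columns = [
    {"name": "artist_name", "aboutUrl": "http://example.org/artist/{artist_id}"},
    {"name": "album_title", "aboutUrl": "http://example.org/album/{album_id}"},
]
objects = row_to_objects(row, columns)
```

If every column shared the same about URL, this would collapse back to the single object per row that column names alone would give.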

ivan: let's separate the 2 issues

property names being an array vs about URLs

diff numbers?

<JeniT> https://github.com/w3c/csvw/issues/186

(+1 on allowing different entities for diff columns in the rdf)

gregg: the other implication of property uri being a template is that it could vary per row

… some discussion on not including col metadata

jenit: re #186 and multiple property names

<JeniT> https://github.com/w3c/csvw/issues/186#issuecomment-72842231

my comment (see link) i found it useful when mapping data into rdf, you sometimes want dc:title AND rdfs:label

but for simplicity sake let's have it be only a single value

+1 for jeni's proposal

<JeniT> PROPOSED RESOLUTION: #186 propertyUrl only has one value

<JeniT> +1

<gkellogg> +1

danbri: in rdfa/microdata multiple arcs to a value are graceful but horrible in json, support skipping


<ivan> +1

<jumbrich> 0

<DavideCeolin> 0

jenit: jumbrich and davide -- if you don't feel you have enough info to vote you can write 0

<ivan> RESOLUTION: #186 propertyUrl only has one value

(i.e. it helps to indicate that you don't object)

ivan: shall i close it directly?

gregg: let's keep it open until text is updated and committed.

ivan: i'll mark it as editorial

<scribe> (done)

jenit: i suggest we don't go into full complexity of #187

gregg: i thought if we could resolve this we'd be in a less complex situation

re pull requests, dependencies, ...

jenit: ok

ivan: #187 makes all 3 properties inheritable


gregg: ivan if you're not comfortable with this let's wait til f2f

ivan: I'm not comfortable

gregg: let's just move on

ivan: yes, f2f.

gregg: a couple of other pull requests

ivan: [as above]

… consequences of having all 3 properties inherited are … potentially complex

not a prob with property uri, but with the about uri

jenit: yes

gregg: not a problem for rdf

ivan: even in rdf, now we have a structure with the row property

… what is , … how do i bind the triple to the rest of the structure, so to say?

gregg: up to creative use of the metadata

normally you'd […] for the row

if someone assigns diff abouturi to diff rows, complex things become possible

jenit: timeout

… we'll do better going through this at f2f with examples

<JeniT> https://github.com/w3c/csvw/pull/185

(re examples - reminder that https://github.com/w3c/csvw/tree/gh-pages/examples/tests/scenarios/chinook is a nice dataset with links and multiple entities)


ivan: bunch of editorial things in pull req

gregg: back to orig intention, you can have embedded csv without lang in it but matching title in asserted metadata that does have a language
... i had made a change some time ago, meaning that lang of metadata needed to be applied when you created the embedded metadata

you'd end up with title having same lang as the embedded metadata and they'd match

this change allows a title with no lang or lang=UND i.e. undefined to match a title in any other language

meaning that we do not need to arbitrarily apply a lang to embedded metadata

and we can apply as intended to [missed]

i tried!

gregg: e.g. tree metadata

… lang for metadata terms in the context is asserted as english

the csv itself when you extract metadata from it

default metadata

something that's empty

creates meta without a default lang

no name field only title assumed from col names

in order to allow the title from the embedded metadata to match the title from the found metadata which is in english, we need to allow it to match

ivan: i ran into this problem exactly

gregg: there is a further conseq which is that if you were to merge these two metadata you would get descriptions the same except for language spec

ivan: hence should be considered as the same

gregg: yes, as being the one with the lang (defined)
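
The matching and merging behaviour described here can be sketched as below. This is a minimal illustration, assuming a title is a (text, language) pair, that a missing language is represented as None, and that language tags are compared by simple case-insensitive equality; the function names are hypothetical.

```python
def _norm(lang):
    """Treat a missing language and 'und' alike; lower-case everything else."""
    if lang is None or lang.lower() == "und":
        return None
    return lang.lower()

def titles_match(a, b):
    """A title with no language (or 'und') matches the same text in any language."""
    (text_a, lang_a), (text_b, lang_b) = a, b
    if text_a != text_b:
        return False
    la, lb = _norm(lang_a), _norm(lang_b)
    return la is None or lb is None or la == lb

def merge_title(a, b):
    """Merge two matching titles, keeping whichever language is defined."""
    if not titles_match(a, b):
        raise ValueError("titles do not match")
    return (a[0], _norm(a[1]) or _norm(b[1]))

embedded = ("Name", None)   # title taken from the CSV header row, no language
found = ("Name", "en")      # title from the found metadata, asserted as English
merged = merge_title(embedded, found)
```

With this rule the embedded metadata never needs a language assigned to it just to make the match succeed.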

ivan: i agree

jenit: i have a comment on it, but it is captured elsewhere, happy just merging

gregg: great

jenit: done

that was #184

skipping #64 as related to phantom col thing, which we need to work through.



<JeniT> https://github.com/w3c/csvw/issues/170

gregg: relates to rdf/json mappings, 3 sections

core table, table from metadata, table group

as i went through it, to specify it to ignore metadata would seem to need processor action

and wouldn't provide anything that you'd get from just using default extracted metadata

i did not see a reasonable need to have a different processing form for core tabular data

suggestion to remove those


ivan: fundamentally agree. 2 areas where a simplification on the conv docs can be done -

… by pushing some of the things into a common place, which is probably the metadata doc

one is the one just said

conversion doc works only with annotated table

as one of the specs says what the default metadata is

i think it makes a lot of sense to have default metadata specified somewhere anyway

other thing related but diff, is that there are a lot of words in these docs on how the cell values should be converted and also how the inheritance of properties work, down to a cell level

i think this is something that will eventually move to the metadata doc too as it is a general principle

gregg: i believe most of it is already there

ivan: that is great, means these 2 will make the conv docs very much simpler

this is reflected in my impl experience

most of the sweat is on the metadata

once it is there, … generation of rdf or json is

comparatively simple

<gkellogg> http://htmlpreview.github.io/?https://github.com/w3c/csvw/blob/gk-transformations/csv2rdf/index.html

gregg: i did my own version of the rdf transform doc -^ url

takes this approach, is more template like

prescribes triples to generate

ivan: q is whether making the core tabular data model disappear is a major change

jenit: bbiab

gregg: instead of SHALL statements, the doc gives the specific triples to output

does include specific metadata eg on cols

fundamentally it outputs triple info

for each row, it uses all the lang from the metadata doc

ivan: that's more or less what i do as well

what i do, once all the metadata are merged, i create for every row on the fly, i make a cell level with all the transformations etc

then use the structure to issue either rdf or json
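
A rough sketch of that two-stage shape: build a cell-level structure per row from the merged metadata, then serialise the same structure as either JSON or triples. The column descriptions, the tiny datatype table and the N-Triples-like output format are assumptions for illustration and are not taken from either implementation.

```python
import json

# Hypothetical converters keyed by a column's datatype.
CONVERTERS = {"string": str, "integer": int}

def build_cells(row, columns):
    """Apply each column's conversion to its raw value, giving cell-level data."""
    cells = []
    for col, raw in zip(columns, row):
        convert = CONVERTERS[col.get("datatype", "string")]
        cells.append({"name": col["name"], "value": convert(raw)})
    return cells

def to_json(cells):
    """Emit the row's cells as one JSON object."""
    return json.dumps({c["name"]: c["value"] for c in cells}, sort_keys=True)

def to_triples(subject, cells, base):
    """Emit the row's cells as simple N-Triples-like lines with plain literals."""
    return ['<%s> <%s%s> "%s" .' % (subject, base, c["name"], c["value"])
            for c in cells]

columns = [{"name": "name"}, {"name": "age", "datatype": "integer"}]
cells = build_cells(["Alice", "30"], columns)
as_json = to_json(cells)
as_triples = to_triples("http://example.org/row/1", cells,
                        "http://example.org/vocab#")
```

Most of the work sits in build_cells; the two serialisers stay small because they only walk the finished structure.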

gregg: my version is extremely sparse

simply describes what's emitted

dan q re testing

gregg: i created a no. of tests

could make more, paused as we were changing things

i think we have perhaps 30 or 40 tests for both rdf and json

I asked about testing ivan's impl w.r.t. the tests gregg created

jenit: we were going to get rid of the idea of the core tabular data model, there's no value add for it, ...

i don't think there are any implications of that

mark this as resolved

ie. in #170

<JeniT> PROPOSED RESOLUTION: We remove ‘Core Tabular Data Model’ and define everything in terms of ‘Annotated Tabular Data Model’ #170


<JeniT> +1

<gkellogg> +1

<JeniT> we will spend time at F2F going through conversion documents

<JeniT> RESOLUTION: We remove ‘Core Tabular Data Model’ and define everything in terms of ‘Annotated Tabular Data Model’ #170


gregg: this was basically "what is the default metadata ?"

what's expected to be provided by it?

gregg: primary diff from core metadata, … is that core does not assume that 1st row is header

instead we had col=123 etc for headings

my take is that the typical way you expect to find a csv is that the 1st row is the col titles

… so using that you can get a reasonable mapping

so table group with empty resources [scribe not capturing detail]

gregg: some discussion on http content type param, header absent, ...

to get back to orig approach of col=1 etc instead of an initial header

jenit: I don't really understand this w.r.t. defaulting

even with an indiv csv file you get the metadata you can guess at from that csv file

gregg: you need a context from which to extract the dialect info

having simply an empty metadata estabs that context

could be same as saying that there is none

gregg: my processor looks at all the metadata, merges, uses that then re-merges everything

some concept of having metadata when you extract embedded metadata is required

ivan: there are 2 things, i still have to understand whether essentially saying,… forgetting table groups, i have only one table

… if you have no metadata at all, all the processing we have on metadata will ultimately generate [whatever we need]

not sure if this true

gregg: it generates a reasonable output

ivan: wait, no… not whether it generates reasonable output. In the prev discussion, we do not need the core tabular model, because we have only got annotated tabular models now

… we need to show clearly that if there is really nothing in the current terminology -core tab data - process of merging, defaulting etc, will give us a reasonable annotated model.

gregg: to achieve this we could change defaults for skip rows and header

if you say header: false, you'd get to what is currently in the core metadata extraction
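
The difference between the two defaults can be sketched as follows. This is an illustration only, assuming a boolean header setting and using the col=N naming mentioned above for the headerless case; the function name and the shape of the returned column descriptions are hypothetical.

```python
import csv
import io

def embedded_columns(text, header=True):
    """Derive minimal column metadata from the CSV content itself.

    header=True:  the first row supplies the column titles.
    header=False: columns get positional names col=1, col=2, ...
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return []
    first = rows[0]
    if header:
        return [{"title": title} for title in first]
    return [{"name": "col=%d" % i} for i in range(1, len(first) + 1)]

sample = "name,age\nAlice,30\n"
with_header = embedded_columns(sample)
without_header = embedded_columns(sample, header=False)
```

In the headerless case the first row stays in the data, which is the behaviour the core tabular data extraction currently has.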

jenit: feels like a real technicality

… without examples that show diffs it is hard to decide

ivan: agree, one of the issues for next week

… another q for what gregg said

ivan: I had this discussion with Ivan, … the dialect info that we introduced, is the processing model of the whole thing such that the dialect is used to control the parsing, with prescriptive stuff, … or is it info i get after parsing?

you seem to go the 1st way, ...

gregg: dialect tells you what the col separator is, e.g. ",". In order to parse the doc, you'd need to know that

ivan: maybe worth putting into the doc somewhere, is some sort of an abstract processing model, if i have a conformant processor, what are the steps that it must take w.r.t. metadata, parsing, generation of the output

gregg: syntax doc, … locating the metadata, …

<gkellogg> http://w3c.github.io/csvw/syntax/#parsing

ivan: not only the parsing but the merging of the metadata etc

… my impl right now does it the other way

… it has to be restructured. I start with the CSV file. I extract everything from there. But essentially I should be doing it the other way around, get metadata 1st then do parsing afterwards.

gregg: that was point of the default metadata yes
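
The "metadata first, then parse" ordering can be sketched like this. It is a minimal illustration, assuming dialect settings called delimiter and skipRows with the defaults shown and a metadata dictionary with a dialect entry; none of this is quoted from the drafts, and the function name is hypothetical.

```python
import csv
import io

# Assumed defaults, used when the metadata supplies no dialect.
DEFAULT_DIALECT = {"delimiter": ",", "skipRows": 0}

def parse_with_metadata(text, metadata=None):
    """Resolve the dialect from metadata first, then let it drive the parser."""
    dialect = dict(DEFAULT_DIALECT)
    dialect.update((metadata or {}).get("dialect", {}))
    reader = csv.reader(io.StringIO(text), delimiter=dialect["delimiter"])
    # Drop any leading rows the dialect says to skip.
    return list(reader)[dialect["skipRows"]:]

semicolon_rows = parse_with_metadata(
    "a;b\n1;2\n", {"dialect": {"delimiter": ";"}})
default_rows = parse_with_metadata("a,b\n1,2\n")
```

Starting from an empty metadata object still yields a usable dialect, which is the role the default metadata plays in this ordering.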

… in doc for parsing it has informative language towards what you want

jenit: t-2

… taking parsing discussion to f2f

… we have very strong direction originally that we couldn't talk about parsing, it was out of our remit

AOB pre f2f?

gregg: suggest people examine the test cases that are in there as they explore many of the things in there

can go thru at f2f

describe additional testcase needs

jenit: i'll try to put a rough structure together for f2f but expect it to be fairly fluid

… let's try to get to a place where editors are equipped to take fwd drafts end march

Summary of Action Items

[NEW] ACTION: dan check beamer setup for non google staff [recorded in http://www.w3.org/2015/02/04-csvw-minutes.html#action01]
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.140 (CVS log)
$Date: 2015-02-04 17:21:37 $