CSV on the Web Working Group Teleconference -- 07 Jan 2015

<trackbot> Date: 07 January 2015

<JeniT> sorry, just trying to dial

<bill> JeniT: lots of issues needing discussion; what should we focus on?

<bill> ivan: would like to see an example where metadata comes from different places

<bill> JeniT: examples don't [currently] represent that situation

<bill> …propose discussing examples then import/metadata issues

<bill> ivan: prefer to look at merge and import stuff first

<JeniT> http://w3c.github.io/csvw/syntax/#examples

<bill> gkellogg: trying to understand the merge order

<bill> …how to use data from meta file, e.g., skip columns, before ever seeing the csv itself

<JeniT> https://github.com/w3c/csvw/issues/145 is ivan’s point

<JeniT> ie how to merge metadata

<bill> … let's do combination first

<JeniT> https://github.com/w3c/csvw/issues/145#issuecomment-68766764

<bill> JeniT: point is to separate when we're talking about merged metadata document used to annotate tabular data model

<bill> …issues raised around how the metadata file is created from multiple metadata files — using imports

<bill> …why did we think we needed imports in the first place?

<JeniT> http://www.w3.org/2014/10/27-csvw-minutes.html#item09

<JeniT> “RESOLUTION: We use an ‘import’ property in the first metadata document found through the precedence hierarchy described in section 3 (but with inclusion of user-defined metadata); the merge is a depth first recursive inclusion”

<bill> …search for this text ^
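
A minimal sketch of the "depth first recursive inclusion" named in the resolution above, for illustration only. The in-memory `DOCS` table, the file names, the list-valued `import` property, and the rule that the importing document wins on conflict are all assumptions made here, not something the resolution states.

```python
# Hypothetical in-memory "web": location -> metadata document.
# Real code would fetch these over HTTP.
DOCS = {
    "table.json": {"import": ["common.json"], "title": "Tree inventory"},
    "common.json": {"import": ["org.json"], "language": "en"},
    "org.json": {"publisher": "Example Org", "language": "fr"},
}


def resolve(url, docs, seen=None):
    """Return the metadata at `url` with its imports folded in, depth first."""
    seen = set() if seen is None else seen
    if url in seen:
        raise ValueError(f"import cycle at {url}")
    seen = seen | {url}  # track only the current import path
    doc = docs[url]
    merged = {}
    for imported in doc.get("import", []):
        # Recurse into the import before applying the importer's own properties.
        merged.update(resolve(imported, docs, seen))
    # Assumption: the importing document overrides what it imported.
    merged.update({k: v for k, v in doc.items() if k != "import"})
    return merged


result = resolve("table.json", DOCS)
```

Under these assumptions `language` ends up as `"en"`, because `common.json` overrides the value it imported from `org.json`.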

<bill> ivan: I remember JeniT's comment that you have to do a bunch of GETs in any case

<bill> gkellogg: you don't want to repeat yourself; import mechanism solves that

<bill> …but seems relatively advanced to start with

<bill> …first need to figure out how to deal with multiple sources of metadata

<bill> ivan: we had a slightly different situation: the unnecessary GETs were just one of the issues

<bill> …also have the possibility to talk about several files, i.e., directory-level metadata

<bill> JeniT: still have to repeat the common stuff for each table

<bill> ivan: much more complicated in (e.g.) javascript because of async nature

<bill> …question is whether it is really worth it

<bill> gkellogg: complication is when promises are chained together

<bill> …if you encounter an import in there

<bill> ivan: right, you end up with recursive calls to promises — it's ugly

<bill> JeniT: let's move away from implementation in favor of focus on use cases

<bill> …do we need to merge metadata files at all — is that useful

<bill> ivan: wondering whether we can have a structure within the metadata file instead of relying on import

<bill> …some sort of a global structure that is conceptually copied into each

<bill> gkellogg: certainly the ability to have common table-level metadata is complicated by the fact that @id is required by table

<bill> …one way around that is to require @id *after* all processing is finished

<bill> JeniT: in that case schema is used in a slightly different way

<bill> gkellogg: schema is property of table group

<bill> JeniT: sounds like suggesting that some kind of table group property contains all these *global* properties

<bill> JeniT: propose we scrap imports in lieu of table group metadata
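
One way to picture the table group idea discussed above, as a rough sketch: properties on the group are conceptually copied into each table, and a table's own value takes precedence. The `tables` key, the sample property names, and the precedence rule are assumptions for illustration; the group was not given a concrete shape in this discussion.

```python
def expand_table_group(group):
    """Copy group-level properties into every table; a table's own value wins."""
    shared = {k: v for k, v in group.items() if k != "tables"}
    # Shallow copy: nested values such as the schema are shared, not duplicated.
    return [{**shared, **table} for table in group.get("tables", [])]


# Hypothetical table group with a shared schema and a default language.
group = {
    "schema": {"columns": [{"name": "id"}, {"name": "label"}]},
    "language": "en",
    "tables": [
        {"@id": "trees.csv"},
        {"@id": "arbres.csv", "language": "fr"},
    ],
}

tables = expand_table_group(group)
```

Each table keeps its own `@id`, so the requirement that a table has an `@id` only needs to hold once the expansion is done.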

<bill> ivan: looking back at the structure from before the f2f

<bill> …you have a few ways to get the metadata; you can also use more than one, in which case you'd have to merge

<JeniT> it says “Processors must attempt to locate a metadata document based on each of these locations in order, and use first metadata document that is successfully located in this way.”

<bill> gkellogg: suggest you take the first one and then stop

<bill> …merge issues still exist in cases of user-supplied metadata, etc
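
The "take the first one and then stop" rule can be sketched as below. The candidate file names and the `fake_fetch` helper are hypothetical stand-ins; the actual list of locations is defined by the syntax document and is not reproduced here.

```python
def locate_metadata(candidates, fetch):
    """Return (location, document) for the first candidate that can be fetched."""
    for location in candidates:
        doc = fetch(location)
        if doc is not None:
            return location, doc  # stop at the first hit
    return None, None


# Hypothetical stand-in for HTTP GET: only two of the three locations exist.
AVAILABLE = {
    "b-metadata.json": {"title": "From B"},
    "c-metadata.json": {"title": "From C"},
}
tried = []


def fake_fetch(location):
    tried.append(location)
    return AVAILABLE.get(location)


location, doc = locate_metadata(
    ["a-metadata.json", "b-metadata.json", "c-metadata.json"], fake_fetch
)
```

The third location is never requested, which is the point of stopping early; user-supplied metadata would still have to be merged with whatever this returns.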

<JeniT> http://w3c.github.io/csvw/syntax/#using-overriding-metadata

<bill> JeniT: example 2 (?) handles user-supplied metadata ^

<bill> gkellogg: whatever user metadata is provided, process is to coerce it into consistent metadata

<bill> …consistent == if title is in one it must be the same as in the other

<bill> ivan: several titles are put into an array

<bill> …the beauty and complication of import mechanism is that it defines a semantic merge

<bill> …we have to know which has higher precedence

<bill> …my personal view is that we take everything and step by step merge them using import algorithm

<bill> gkellogg: would like to defer that until we've addressed merging in general

<bill> ivan: two things: 1.) import property, necessary or not 2.) import as an algorithm for merging

<bill> JeniT: to move forward, propose we take the language from the syntax document about merging, and from the metadata document about where to find metadata files and how to merge them

<bill> …we have an example there for dealing with that

<bill> …regarding the import statement itself, more discussion is needed

<bill> …leave it in for now

<bill> ivan: can we formally propose it and discuss it over e-mail and see where it goes

<bill> gkellogg: don't want to belabor but once we have the merge order rules, the import directive will be better understood

<bill> …proposed in my description a way in which it might work

<bill> …if we just think about title, and have several sources of title information

<bill> ivan: current merge algorithm handles this already

<bill> JeniT: name and title are handled very differently

<bill> gkellogg: i think name is set from title unless it is otherwise stated

<bill> ivan: titles can pile up in an array, but name must be the same
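
A small sketch of the rule as ivan states it: titles accumulate into an array, while two differing names are a conflict. The function name and the error handling are assumptions; this is not the merge algorithm from the metadata document, only an illustration of the stated behaviour.

```python
def merge_column(a, b):
    """Merge two descriptions of the same column."""
    name_a, name_b = a.get("name"), b.get("name")
    if name_a is not None and name_b is not None and name_a != name_b:
        raise ValueError(f"conflicting names: {name_a!r} vs {name_b!r}")

    titles = []
    for desc in (a, b):
        value = desc.get("title", [])
        # Accept either a single string or a list of strings.
        for title in ([value] if isinstance(value, str) else value):
            if title not in titles:
                titles.append(title)

    merged = {"title": titles}
    name = name_a if name_a is not None else name_b
    if name is not None:
        merged["name"] = name
    return merged


merged = merge_column(
    {"name": "pop", "title": "Population"},
    {"name": "pop", "title": ["Population", "Inhabitants"]},
)
```

Here a missing name on one side is tolerated and only a mismatch is rejected, which is one possible reading of "name must be the same".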

<bill> JeniT: name is a required property on column desc

<bill> ivan: i need to have a name, else I cannot convert to RDF

<bill> …according to the current algorithm, titles may be different, but if (at the first step) i don't have the name, i don't have a column desc for the final metadata

<bill> …need to start somewhere

<bill> gkellogg: postpone validation until after merge has taken place

<bill> …what's the difference between an annotation and a property?

<bill> …title will always exist, name may not, so extract name from title

<bill> …allows space for other metadata to specify name
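
gkellogg's suggestion (merge first, validate afterwards, fall back to the title when no name was given) might look roughly like this. The normalization from title to name is a made-up rule for illustration; no such rule was agreed in this discussion.

```python
import re


def finalize_column(desc):
    """Run after merging: fill in `name` from the first title if it is missing."""
    out = dict(desc)
    titles = out.get("title", [])
    if isinstance(titles, str):
        titles = [titles]
    if "name" not in out:
        if not titles:
            raise ValueError("column has neither a name nor a title")
        # Hypothetical normalization: non-word runs become "_", then lowercase.
        out["name"] = re.sub(r"\W+", "_", titles[0].strip()).strip("_").lower()
    return out


derived = finalize_column({"title": ["Population (2014)"]})
explicit = finalize_column({"name": "pop", "title": "Population"})
```

Because the fallback runs only at the end, any metadata source that does specify a name gets the chance to do so before a name is derived.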

<bill> ivan: what i claim is that this does not work with the current merging algorithm

<bill> …if i have two metadata files with two arrays of column descriptions, if i can find common values, I can merge

<bill> …otherwise there is no way to find the merge
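
ivan's point can be illustrated with a sketch that matches column descriptions on `name` and fails when a description has none. The property names other than `name` and `title`, and the rule that the first source wins on conflict, are assumptions made for the example.

```python
def merge_by_name(first, second):
    """Merge two arrays of column descriptions, matching entries on `name`."""
    for col in first + second:
        if "name" not in col:
            # Without a shared key there is nothing to match on.
            raise ValueError(f"cannot match a column without a name: {col}")

    extra = {col["name"]: col for col in second}
    merged = []
    for col in first:
        other = extra.pop(col["name"], {})
        # Assumption: on conflict the first source wins.
        merged.append({**other, **col})
    # Columns described only by the second source are appended at the end.
    merged.extend(extra.values())
    return merged


first = [{"name": "id", "datatype": "integer"}, {"name": "label"}]
second = [{"name": "label", "title": "Label"}, {"name": "id", "required": True}]
columns = merge_by_name(first, second)
```

The two sources list the columns in different orders, and the name is what makes the match possible; a description carrying only a title would hit the error branch.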

<bill> JeniT: reason is that current algorithm as described assumes every column will have a name

<bill> …since the creation of the metadata documents, we've introduced this issue

<bill> gkellogg: sounds like JeniT has a concept, which we'll need to work out

<bill> JeniT: I do have a concept for how this works

<bill> …specified either by the implementation or the specification, it's only when you get to the metadata documents where that merge might happen

<bill> …in other words, there's work to do in terms of describing that in terms of metadata documents

<bill> gkellogg: it might just be useful to walk through an example to see what the process would be

<JeniT> http://w3c.github.io/csvw/syntax/#annotated-tabular-data-model

<bill> JeniT: will try to make examples for further discussion

<JeniT> ScribeNick: JeniT

<gkellogg> ivan: the current schema is complicated as two merged schemas may have a different structure.

<bill> :) bye

<scribe> ScribeNick: gkellogg

… If we make use of the order and number of column descriptions, then it’s easier, knowing they have the same number of columns.
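
A positional alternative, sketched under the assumption that both schemas must describe the same number of columns in the same order; the function name and the first-source-wins rule are illustrative only.

```python
def merge_by_position(first, second):
    """Merge column descriptions pairwise by their position in the array."""
    if len(first) != len(second):
        raise ValueError(
            f"column count mismatch: {len(first)} vs {len(second)}"
        )
    # Assumption: on conflict the first source wins.
    return [{**b, **a} for a, b in zip(first, second)]


first = [{"title": "ID"}, {"title": "Label"}]
second = [{"name": "id"}, {"name": "label"}]
columns = merge_by_position(first, second)
```

Matching on position needs no name at all, so one source can supply titles and the other names.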
