CSV on the Web Working Group Teleconference -- 10 Sep 2014

<trackbot> Date: 10 September 2014

hmm

<scribe> scribenick: danbri_

Jeni: discuss a few issues raised over last month or so. Some explanation re next week's "special call" on our templating decisions
... I'll be asking around for F2F looking for volunteers to lead sessions. With only 2 chairs + 2 participants today we're not quorate for decisions, but can have a bit of discussion.

<JeniT> https://www.w3.org/2013/csvw/wiki/Meeting_Agenda_2014-09-10

type vs datatype

<JeniT> https://github.com/w3c/csvw/issues/22

Jeni: this is an issue from the metadata document, where we need to figure out what to call the property in the metadata that refers to the datatype of the values in a particular column
... there are some constraints here from attempting to adopt json-ld
... while seems a small issue has some impact on relationship w/ json-ld
... json-ld, the interpretation of an 'object', ... using @type, using just 'type', and using 'datatype'
... if we use @type, then that is interpreted specially in json-ld to be w.r.t. the type of the thing being described

<JeniT> { "@type": "Column" }

which in this case would be the columns rather than individual cells

thinks it makes sense, i mean

jeni: 2nd option is to just use the plain term 'type'. reason this is a little problematic, ... for other properties on columns, ...
... they're generally treated as meaning whatever Dublin Core says type means

e.g. we can have a col description that basically says

<JeniT> { "source": "http://example.org/" }

"source is some-source"

(and that would be interpreted ... the source property would be interpreted as meaning the same as the DC property 'source')

but DC has a property 'type', which isn't particularly helpful here
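
To make the naming options concrete, here is a minimal Python sketch of how a reader might interpret each key. The column descriptions and the `describe_type_keys` helper are assumptions made for illustration; they are not taken from the metadata draft.

```python
# Hypothetical column descriptions, one per naming option discussed above.
OPTION_AT_TYPE = {"name": "age", "@type": "Column"}       # JSON-LD keyword
OPTION_PLAIN_TYPE = {"name": "age", "type": "integer"}    # plain term
OPTION_DATATYPE = {"name": "age", "datatype": "integer"}  # distinct term


def describe_type_keys(column):
    """Report how each type-like key in a column description could be read."""
    readings = {}
    if "@type" in column:
        readings["@type"] = "type of the thing described (the column itself)"
    if "type" in column:
        readings["type"] = "ambiguous: may be read as the Dublin Core 'type' property"
    if "datatype" in column:
        readings["datatype"] = "datatype of the values in the column's cells"
    return readings


for option in (OPTION_AT_TYPE, OPTION_PLAIN_TYPE, OPTION_DATATYPE):
    print(option, "->", describe_type_keys(option))
```

Only the third form keeps "what kind of thing is this?" and "what datatype do the cells hold?" in separate keys.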

<Zakim> danbri_, you wanted to ask if json-ld context is making the DC mapping explicit

jenit: yes, we'd explicitly map to dc

<JeniT> https://github.com/w3c/csvw/blob/gh-pages/metadata/csvm-context.json

jenit: in repo for what we're doing here, ... then we have got an initial start on what that context looks like
... this came up re simple data format discussion
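
A rough sketch of what an explicit mapping to DC could look like, written in Python rather than as a real JSON-LD context. The `CONTEXT` contents and the `expand_terms` helper are assumptions for illustration and are not copied from csvm-context.json; a real JSON-LD processor does considerably more than this.

```python
# Dublin Core terms namespace.
DC = "http://purl.org/dc/terms/"

# Hypothetical context: short metadata terms mapped to Dublin Core property URLs.
CONTEXT = {
    "source": DC + "source",
    "publisher": DC + "publisher",
    "rights": DC + "rights",
}


def expand_terms(description, context):
    """Replace each known short term with its full property URL; keep other keys as-is."""
    return {context.get(term, term): value for term, value in description.items()}


column_description = {"source": "http://example.org/"}
print(expand_terms(column_description, CONTEXT))
# {'http://purl.org/dc/terms/source': 'http://example.org/'}
```

Because the context is global, the same expansion would apply whether the description sits on a table, a column or a row.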

danbri: when did we decide to fold all of dublin core into our lang?

jenit: happy to discuss

<JeniT> danbri_: maybe we should map schema.org instead

<JeniT> ... that will map on to DublinCore

<JeniT> JeniT: which set of properties?

danbri: "whatever we want"

fresco: ... not sure what the mapping in the context file is, beyond mapping col names to url

jeni: these aren't about mapping col names, but about mapping metadata re particular columns, source it comes from, rights over any data, when that column was created, ... these properties could apply also at the table level. E.g. publisher of a particular CSV file.

fresco: do we expect usage on columns, vs cells?

jenit: you have a global context, ... rather than it being different for different objects
... i.e. if define all of these things so that they could apply to tables, ...

which would seem rational, since DC is doc-metadata-centric

scribe: that the upshot of this is that they would also be used to interpret any other column, row metadata

danbri: we could define inline contexts

<fresco> http://schema.org/Dataset

fresco: would be good to look at schema.org to see if it has what we'd need

(plus Organization, Person, etc etc)

(in github here, https://github.com/rvguha/schemaorg )

schema.org doesn't seem to have 'type' yet, at least. http://schema.org/type (but does keep adding stuff)

jenit: to take this fwd, shall I widen it out to be an issue on whether to adopt the DC set of metadata terms, or the schema.org set of metadata terms

jtandy: i think that would be sensible; it doesn't replace the type vs datatype issue

jenit: the issue changes then. the issue becomes potential confusion between 'type' and '@type', since they could be both used but have diff meanings

jtandy: on topic of schema.org, DC; if there is a clear mapping from schema.org to DC, it would appear to be a sensible way fwd

danbri: some terms map; but there are a few differences
... is there enough specificity in the use cases to drive a decision on using dc, schema.org etc.

<JeniT> ACTION: JeniT to write to mailing list re using schema.org rather than Dublin Core for metadata about CSV files, then binding decision on following telcon [recorded in http://www.w3.org/2014/09/10-csvw-minutes.html#action01]

<trackbot> Created ACTION-26 - Write to mailing list re using schema.org rather than dublin core for metadata about csv files, then binding decision on following telcon [on Jeni Tennison - due 2014-09-17].

jtandy: stating a license, stating who is responsible, ... but often use cases just say "we need publishing metadata", "this category of metadata"

[...]

yakovsh: re datatypes, maybe i'm unfamiliar with rdf, ... is there a link to what datatypes there are?

this? http://www.w3.org/TR/rdf11-concepts/#section-Datatypes

<JeniT> http://w3c.github.io/csvw/metadata/#datatypes

jenit: this assumes standard set of w3c-designed datatypes, which came from xml schema
... i.e. those most usually used within RDF, but slightly extended to include Number, Binary, ...

(what's binary mean in CSV?)

<JeniT> http://w3c.github.io/csvw/metadata/#datatypes

yakovsh: having a clear list is important

jenit: see 3.8.4

ah, "the datatype binary which is exactly equivalent to base64Binary"
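
As a small illustration of what treating 'binary' as base64Binary would mean for a cell value, here is a Python sketch. The `parse_binary_cell` name is hypothetical, and stripping surrounding whitespace is an assumption rather than something the draft states.

```python
import base64


def parse_binary_cell(text):
    """Decode a cell whose datatype is 'binary', treated here as base64Binary."""
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(text.strip(), validate=True)


print(parse_binary_cell("aGVsbG8="))  # b'hello'
```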

jenit: we're extending the list here

doc has specific issues flagged

scribe: being consistent with simple data format, and other existing work around w3c

yakovsh: if that list needed to change in future, how would that work?
... on ietf side, we tend to worry about extensibility
... things change over time

FYI see http://www.w3.org/2001/05/xmlschema-errata for changes in http://www.w3.org/TR/xmlschema-2/ vs earlier version

[...]

jenit: ... various ways, e.g. consider impact on validators

yakovsh: do we want to discuss extensibility?

jenit: yes, should def be part of the pattern of how we work on the standard.

Jena CSV update

AndyS, any thoughts/conclusions?

Andy - not a lot to say. Mapping is v simple, hardcoded / built-in. Purpose of project 2-fold. Get something working in the time available (google summer of code student). And didn't want to pre-judge WG decisions.

[same update as last week except audio quality 1000x better :]

jenit: work done - anything that makes you feel direction here should be one thing or another?

andys: we didn't push on it beyond column=predicate, ...

goal was code rather than a research project
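
For readers unfamiliar with the column=predicate idea, a minimal Python sketch follows. This is not the Jena code: the base URL, the way row subjects are named and the `csv_to_triples` helper are all assumptions for illustration.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical base URL


def csv_to_triples(text, base=BASE):
    """Turn each cell into a (subject, predicate, object) triple.

    Each row becomes one subject; each column header becomes a predicate.
    """
    reader = csv.DictReader(io.StringIO(text))
    triples = []
    for row_number, row in enumerate(reader, start=1):
        subject = f"{base}row/{row_number}"
        for column, value in row.items():
            triples.append((subject, f"{base}{column}", value))
    return triples


sample = "town,population\nSouthton,123000\n"
for triple in csv_to_triples(sample):
    print(triple)
```

Every value stays a plain string here; datatypes and nicer subject URLs are the kind of thing the metadata would need to supply.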

jenit: next steps with it?

andys: "wait" :)

Use case 4

<JeniT> https://github.com/w3c/csvw/tree/testing-variations/examples/tests/scenarios/uc-4

JeniT: update from my summer investigation.

Jeni: UC-4 is quite interesting. About publication of information about org structures in the uk civil service

each dept publishes a pair of linked csv files

scribe: same schema(s)
... always in pairs

certain places where schema is extensible

kinda

e.g. dept might have sub-groups

<JeniT> https://github.com/w3c/csvw/tree/testing-variations/examples/tests/scenarios/uc-4/attempts/attempt-1

scribe: readme throws out some of the issues for discussion
...

special codes in front of columns

the way I addressed it was use of regexes

scribe: struck me afterwards, having a separate csv file could be a better design for that package of csv files
... made me think about revisiting that design
... some things around co-constraints across the columns

in one of the files, ... if the unique post ref in some column, then the job title must be not-in-post, function must be n/a, etc.

jenit: do we need within schema files to indicate co-constraints?
... validation constraints. whether cols can be both NULL and required.

i.e. must have NULL value textually in the actual CSV text.
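
The co-constraint described above could be checked along these lines. This is a sketch only: the column names and the `VACANT_REF` marker are hypothetical, since the minutes do not record which post ref value triggers the rule.

```python
VACANT_REF = "VACANT"  # hypothetical marker; the real value is not given in the minutes


def check_co_constraints(row, vacant_ref=VACANT_REF):
    """Return a list of co-constraint violations for one row (empty if none)."""
    errors = []
    if row.get("post_ref") == vacant_ref:
        # When the post ref carries the marker, other columns are constrained too.
        if row.get("job_title") != "not-in-post":
            errors.append("job_title must be 'not-in-post'")
        if row.get("function") != "n/a":
            errors.append("function must be 'n/a'")
    return errors


print(check_co_constraints({"post_ref": "VACANT", "job_title": "Director", "function": "n/a"}))
```

A schema language would need some way to say "if column A has value X then column B must have value Y" to express this declaratively.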

jenit: working through that was v interesting in terms of highlighting specific issues
... any comments/questions?

jtandy: I agree that taking through the specific examples is great way to learn how this stuff works

looking at what you did, ... I see under Attempt 1, that you have the 2 target JSON files, ...

the ones you created by running through some form of parser

and you also have a metadata.json that's fairly trivial for now

some of your issues from README.md, e.g. dealing with null, conditionals, ...

scribe: do you anticipate an attempt-2 with an approach to some of those issues?

jenit: I realise I didn't commit all the data
... the bit I wanted to focus on was how to manage the packages, where you want a set of CSV files to conform
... and also how they could be better split out
... be better structured

if you had that kind of metadata

my observation from these files is that they are extremely flat, repetitious

wanted to see how it'd look in an ideal world where we had this csv on the Web approach; in which case, how might they be publishing it differently in such a new world

scribe: didn't want to stray too far from what's in the metadata spec

jenit: lots of things not yet agreed, so just exploring

jtandy: in processing these things, have you been able to create any targets; any of the transformed content?

jenit: my focus has been more on validation rather than transformation

<JeniT> https://github.com/w3c/csvw/tree/testing-variations/examples/tests/scenarios/uc-4/output

jenit: within this particular use case there is an output ...
... which gives RDF in particular formats
... could that be generated with the metadata

what would you need to do to get that RDF from those CSV files + metadata?

+ maybe templating

1st piece def needs templating

scribe: but could the packaging be restructured?

<Zakim> danbri_, you wanted to ask about moving these stuff into common github branch - does the structure w/ 'attempts' more or less work for us?

danbri: filetree ok?

jenit: suggest we roll it in

resolved: sure, whatever.

jeni: AOB on that example?

(oh, forgot to scribe: earlier Jeni confirmed that the files in output/ can be treated as 'golden triples' for template mapping experiments)

Templates

jeni: we need to decide asap on a course of action w.r.t. whether and how we describe a templating format.

Whether we make it an extension, whether it be done at all, etc.

<JeniT> http://lists.w3.org/Archives/Public/public-csv-wg/2014Sep/0006.html

scribe: will have a special call next week (Weds as usual), attempting to make a resolution on this.

If we can't get consensus on this, we'll defer until f2f.

AndyS: please go ahead
... I'm completely confused by the area and can't make the f2f
... Is there a f2f attendee list?

jenit: not afaik

danbri: we should all register for TPAC (which involves fee etc)

<JeniT> ACTION: JeniT to get Ivan to send round reminder re TPAC and to create attendee list [recorded in http://www.w3.org/2014/09/10-csvw-minutes.html#action02]

<trackbot> Created ACTION-27 - Get ivan to send round reminder re tpac and to create attendee list [on Jeni Tennison - due 2014-09-17].

andys: dep on UK trains, I could possibly be at next week's call

jenit: if you can join that's great otherwise please let's be using the mailing list

jtandy: key point re templating q is balance between applying more resources to create an additional recommendation (the templating lang) vs a standard that might not be as powerful as we hoped

jenit: that's roughly it

jeni/jtandy - looking at use cases important

<JeniT> ACTION: jtandy to survey use cases re requirement for templating [recorded in http://www.w3.org/2014/09/10-csvw-minutes.html#action03]

<trackbot> Created ACTION-28 - Survey use cases re requirement for templating [on Jeremy Tandy - due 2014-09-17].

fresco: similar point to make. maybe someone could add to one of the docs, reasons why templating is thought to be useful in the 1st place.
... not seeing motivation

jenit: what are the patterns of use that we are anticipating seeing?

<fresco> particularly can the processing be performed on the in-memory data model, rather than on output

jenit: are we anticipating people who are receiving the data downloading the templates then processing them?

or tools at publisher end

scribe: what patterns of use do we anticipate?

jenit: anybody want to volunteer to try to capture what those patterns of use might be, around conversion?
...
...

[tumbleweed]

jenit: ok, I'll try

<JeniT> ACTION: JeniT to document patterns of use for conversion to different formats [recorded in http://www.w3.org/2014/09/10-csvw-minutes.html#action04]

<trackbot> Created ACTION-29 - Document patterns of use for conversion to different formats [on Jeni Tennison - due 2014-09-17].

(this sounds similar to jtandy's action too)

<AndyS> e.g. https://github.com/w3c/csvw/blob/testing-variations/examples%2Fsimple-weather-observation.md

jenit: similar but UCs have focussed on what the CSV looks like more than what is then done with it.
... rather than how it fits into workflows.

<Zakim> danbri_, you wanted to discuss balance

<JeniT> danbri_: other question is to what extent is this a CSV problem

<JeniT> ... there are existing tools eg around Mustache

<JeniT> ... do we need something CSV-oriented or are there existing things that could be used

<JeniT> ... we keep coming back to Mustache, whereas Django is implementation specific

<AndyS> Velocity

<JeniT> ... "is this really a CSV problem?"

yakovsh: wanted to mention ECMAScript has templating built in as well

jenit: got a link?

see also: http://www.polymer-project.org/docs/polymer/expressions.html which is based on HTML Templates

jenit: i was looking at web components, similarly

<yakovsh> http://tc39wiki.calculist.org/es6/template-strings/

<Zakim> danbri_, you wanted to discuss web components

<yakovsh> https://people.mozilla.org/~jorendorff/es6-draft.html#sec-template-literal-lexical-components

<JeniT> danbri_: the <template> doesn't do interpolation; Polymer builds Mustache on top of it
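
To show the kind of interpolation being discussed, here is a minimal Mustache-like sketch in Python. It is not Mustache itself (no escaping, sections or partials), and the `render` helper, the template text and the example URLs are assumptions for illustration.

```python
import re

# Matches {{name}} placeholders, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, row):
    """Substitute {{column}} placeholders with values from one CSV row."""
    return PLACEHOLDER.sub(lambda match: str(row[match.group(1)]), template)


template = '<{{id}}> <http://example.org/name> "{{name}}" .'
print(render(template, {"id": "http://example.org/p/1", "name": "Ada"}))
# <http://example.org/p/1> <http://example.org/name> "Ada" .
```

Nothing here is specific to CSV beyond feeding it one row at a time, which is roughly the "is this really a CSV problem?" question.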

jenit: AOB?

yakovsh: [missed] something re XML, XSLT?

jeni: I heart XSLT

Adjourned.

ALL HANDS NEXT WEEK

<yakovsh> xslt for templating

Thanks Jeni
