W3C

CSV on the Web Working Group Teleconference

03 Sep 2014

Agenda

See also: IRC log

Attendees

Present
Ivan Herman (Ivan), Andy Seaborne (AndyS), Dan Brickley (danbri), Davide Ceolin (DavideCeolin), Bill Ingram (waingram)
Regrets
Jeni, Yakov
Chair
DanBri
Scribe
danbri


<trackbot> Date: 03 September 2014


AndyS, would you like to talk us through http://jena.staging.apache.org/documentation/csv/ and any lessons learned / plans?

<AndyS> Is that better?

<AndyS> It's not that noisy here - the mic picks up what I can't hear!

:)

supersenses

<scribe> Agenda: https://www.w3.org/2013/csvw/wiki/Meeting_Agenda_2014-09-03

<AndyS> ... so remember, I didn't say it , and it wasn't me

<AndyS> Cheltenham is not far from here.

Ivan's template doc. https://www.w3.org/2013/csvw/wiki/CSVTemplating_status

ivan: Uni. Illinois, introduces new member

<AndyS> Hi waingram (Bill Ingram)

Bill Ingram, U Illinois. Repository developer, manage a team of devs on institutional repo, archives, ...

experience w/ a lot of CSV via research datasets, things attached to electronic theses, dissertations, RDF a lot, XML etc.

danbri: anything missing / interesting in use cases doc?

bill: did look, it's quite extensive; happy to contribute, but not sure if it's necessary, is probably covered

ivan: please look if the various features that would be in another use case are already covered. If already addressed, we're probably ok

bill: will keep in mind. i saw something around science data that seemed close, will look.

Templating Status

dan: can we take a start w/ https://www.w3.org/2013/csvw/wiki/CSVTemplating_status

ivan: looked at state of things yesterday
... latest status I could find

in addition, from JeniT: https://github.com/w3c/csvw/tree/testing-variations/examples/tests/scenarios/uc-4

ivan: summarizing, ... we are heading towards a structure that's inspired by template systems like Mustache, used as an example here, ...

with hope that we could have one system/structure defined that can be used essentially in an unchanged manner for the various output syntaxes that we have

 i.e. RDF's various serializations + JSON(-LD), XML.

that's the hope. all the examples here are in Turtle. Could've used JSON but Turtle was in email.

 this is one thing.

ivan: the very simple, basic approach that may cover several use cases, is to have a simple template like Mustache

where the template patterns are keys that identify the names of columns, and the template itself

 in jeremy's example, is a file that can be referred to from the metadata file

in some cases could be inline even in the metadata

won't copy in from the page

that's probably where things are very simple and quite useful
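
A minimal sketch of the simple case described above, assuming a Mustache-like `{{column}}` placeholder syntax. The template text, the example.org URIs and the sample CSV are all invented for illustration; none of them come from the wiki page. Leaving unknown keys untouched is an arbitrary choice made for this sketch.

```python
import csv
import io
import re

# Hypothetical template: {{name}} placeholders refer to CSV column names.
TEMPLATE = '<http://example.org/country/{{code}}> <http://example.org/name> "{{name}}" .'

# Matches {{key}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def render_row(template, row):
    """Replace each {{column}} with the row's value; unknown keys are left as-is."""
    def substitute(match):
        key = match.group(1)
        return row.get(key, match.group(0))
    return PLACEHOLDER.sub(substitute, template)


def render_csv(template, csv_text):
    """Apply the template once per data row, using the header row as keys."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [render_row(template, row) for row in reader]


CSV_TEXT = "code,name\nAD,Andorra\nAF,Afghanistan\n"
lines = render_csv(TEMPLATE, CSV_TEXT)
```

The same `render_csv` call would work unchanged with a JSON or XML template string, which is the property the group is hoping for.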

where we got into complications, world is not that simple, we need some sort of "variable" structure, ...
... if exists, can be used for templating

e.g. each col you can have a number of variables, defined by a regex, named

 what it means is that if i'm working on a specific cell in a col, then i check the regex, if it matches, then the corresponding variable is considered to be true/replaceable
... in the template itself i could then use the cell value for an output

(that's the 2nd example)
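
The per-column variable idea might look roughly like the sketch below. The metadata layout (`COLUMN_VARIABLES`), the variable names and the choice to match the whole cell rather than a substring are assumptions made here, not something the wiki page is known to specify.

```python
import re

# Hypothetical metadata: for each column, named variables defined by a regex.
COLUMN_VARIABLES = {
    "value": {
        "is_number": r"-?\d+(\.\d+)?",
        "is_missing": r"(n/a|)",
    }
}


def matched_variables(column, cell):
    """Return the names of the variables whose regex matches the whole cell."""
    patterns = COLUMN_VARIABLES.get(column, {})
    return {name for name, pattern in patterns.items() if re.fullmatch(pattern, cell)}


def render_cell(column, cell):
    """Emit the cell value only when the 'is_number' variable holds."""
    if "is_number" in matched_variables(column, cell):
        return f"[] <http://example.org/value> {cell} ."
    return ""
```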

but likely we'll want conditionals of some sort

if-then-else

so you'll need a way to use the variables as a kind of branching mechanism

there were 2 approaches to that

jeremy had a structure that he put into the metadata, ... which essentially said that depending on a variable being true, ... acceptable, ...

 then he branched into separate template files, that's how if/then/else was created

once you have the template it becomes very mechanical to generate the output

 some risk of combinatorial explosion of template parts / components

so i went back and took a look at mustache's mechanism

trying to use here a template that uses # if ... and a variable name
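
A rough sketch of conditionals inside a single template, using Mustache-style sections (`{{#name}}...{{/name}}`) keyed on variable names. The exact syntax on the wiki page may differ; the template text and variable names below are hypothetical, and nesting is deliberately not handled.

```python
import re

# {{#var}} ... {{/var}}: a section that is kept only when var is set.
SECTION = re.compile(r"\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}", re.DOTALL)
# {{column}}: a plain placeholder filled from the current row.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template, values, variables):
    """Keep sections whose variable is in `variables`, then fill placeholders."""
    def keep_or_drop(match):
        return match.group(2) if match.group(1) in variables else ""
    without_sections = SECTION.sub(keep_or_drop, template)
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), ""), without_sections)


TEMPLATE = (
    "{{#is_number}}:obs :value {{value}} .{{/is_number}}"
    "{{#is_missing}}:obs :status :missing .{{/is_missing}}"
)
```

Compared with branching into separate template files from the metadata, this keeps everything in one template, at the cost of more syntax to specify.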

ivan: this was more or less where we got to in the discussion

some things weren't settled

i tried to list the points of disagreement, issues etc.

unclear what to do with unmatched templates

simplest is that nothing happens.

[...]

ivan: ... must be a place where i can put global templates; things that appear only once. Typical case is when I want to generate a prefix statement in Turtle. Templates should have global values taken only once.
... i've put there as a separate issue, ... a repeat structure, ... anything outside that can either not be a template or ref to global metadata keys
... also Datatypes
... in metadata doc, we have a number of datatypes for a cell

can define them per column, per row, per cell, ...

but those datatypes may not have an equiv in all the output systems

e.g. pure json, this doesn't have a direct match

in rdf or xml they're close due to xml datatypes
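
To illustrate the datatype mismatch, here is a small sketch under assumed names: the `CONVERTERS` table and both helper functions are hypothetical, and the boolean handling is simplified (only the lexical form `true` is treated as true).

```python
import json
from datetime import date

# Hypothetical converters keyed by XML Schema datatype names.
CONVERTERS = {
    "integer": int,
    "double": float,
    "boolean": lambda text: text == "true",
    "date": date.fromisoformat,
}


def to_json_value(text, datatype):
    """JSON has numbers, booleans and strings but no date type,
    so a date falls back to its ISO string form."""
    value = CONVERTERS.get(datatype, str)(text)
    return value.isoformat() if isinstance(value, date) else value


def to_turtle_literal(text, datatype):
    """RDF can carry the datatype explicitly on the literal."""
    return f'"{text}"^^xsd:{datatype}'


row_json = json.dumps({
    "n": to_json_value("42", "integer"),
    "d": to_json_value("2014-09-03", "date"),
})
```

In the JSON output the date is indistinguishable from any other string, whereas the Turtle literal keeps `xsd:date`.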

ivan: also the template syntax itself. needs some care. e.g. '{{', ... does this work for Turtle? XML? probably. For JSON? ugh etc.
... can this be parameterised?
... sure there are other issues here. that's where I got yesterday.

<AndyS> {{ is impossible JSON so that's OK?
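
One reading of AndyS's remark, checked with Python's standard `json` module: `{{` cannot occur in JSON structure, so it is unambiguous as a delimiter there, although it can still appear inside a string value. The sample strings are invented for illustration.

```python
import json


def is_valid_json(text):
    """Return True when `text` parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# A raw template with a placeholder outside any string is not JSON...
template = '{"name": {{name}}}'
# ...but the same delimiter inside a string value is ordinary string content.
quoted = '{"name": "{{name}}"}'
```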

Dan asks AndyS for status / perspective

AndyS: Ivan didn't say anything much I disagree with. Re templates, ...
... they might be developed by different parties, may be use cases where diff templates make sense

there are technical things that can be done around sharing/embedding/nesting

alternative is one template with lots of conditionals

which is known to be problematic

<AndyS> "Liquid" - a different templating system - not/less HTML content focused - http://docs.shopify.com/themes/liquid-documentation/basics. Used by jekyll sitegenerator (and github.io).

AndyS: I'd also emphasise, that we need to be careful ... re Mustache etc, that if W3C is to produce a templating language within the standards world, ...

then I presume we would need to specify the templating language ourselves, even if it is a close match to external work, to give a standards process and control

scribe: that might be an issue from w3c point of view

AndyS: also reporting, Google Summer of Code student work

we got a great student

(Jena?)

 will work on basis for potential w3c work, csv to rdf mappings
... used a mechanistic mapping for now, as there was no proposal ready to implement

from JeniT: https://github.com/w3c/csvw/tree/testing-variations/examples/tests/scenarios/uc-4

https://github.com/w3c/csvw/tree/testing-variations/examples/tests/scenarios/uc-4/attempts/attempt-1

danbri: a common repo structure should let us try out different design candidates

ivan: responding to AndyS; re multiple templates on same file, ... jeremy's clever trick, ... ref to the template is in the metadata

[train noises]

[steam train!]

ivan: part of the metadata for a specific file, the way the model works, ... you can have metadata from diff sources, you aggregate those to get the final metadata
... if someone wants his or her own, can use links/refs [...]

(handleable)
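
A very small sketch of the aggregation idea, assuming metadata can be treated as flat key/value dictionaries and that later sources override earlier ones. Both the precedence rule and the `template` key are assumptions made for illustration; the minutes do not state how conflicts are resolved.

```python
def aggregate_metadata(*sources):
    """Merge metadata dicts left to right; later sources win (assumed precedence)."""
    merged = {}
    for source in sources:
        merged.update(source)
    return merged


# Hypothetical metadata from the publisher and from a user of the file.
publisher = {"title": "Country codes", "template": "publisher-template.ttl"}
user = {"template": "my-template.ttl"}

final = aggregate_metadata(publisher, user)
```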

ivan: AndyS, you're absolutely right, in that we can't make standard ref to the Mustache project as it may change
... so we need to specify it ourselves in full detail
... to be v clear about it and address high-level question: i've written down what the template language approach would mean
... and what we've realised here after all the discussion, is that we cannot get away with something super-simple because we hit if-then-else requirements very early
... AndyS emphasised this very early on.
... I'm not saying personally that the template mechanism, ... that this is the ideal one, ... nor that this is THE solution that we must follow, ...

<ivan> https://github.com/w3c/csvw/tree/rdfconversion-ivan

Ivan: maybe by exploring this route, we might need to step back from it. A while back I tried a purely mechanistic way, ... using CSV + metadata (link above here)
... this essentially writes down the mechanical generation of RDF, same could be used for others

we could say that this is what we define, where we can attach an XSLT script, RDF sparql transform, etc

 that this might be as far as we go
... we have explored templating langs so far, but we need to consider possibility that we back off, and use some simple structure + exploit existing tools
... I'd be perfectly happy with that as well
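
For comparison, a purely mechanistic mapping could be as small as the sketch below. It does not reproduce Ivan's branch or the Jena work: the one-blank-node-per-row shape, the `col:` prefix and the example.org base URI are assumptions, and column names are assumed to be valid Turtle local names.

```python
import csv
import io


def escape(value):
    """Escape backslashes, double quotes and newlines for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def csv_to_turtle(csv_text, base="http://example.org/data"):
    """One blank node per row, one predicate per column, plain string literals."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = [f"@prefix col: <{base}#> ."]
    for number, row in enumerate(reader, start=1):
        pairs = [f'col:{name} "{escape(value)}"' for name, value in row.items()]
        lines.append(f"_:row{number} " + " ; ".join(pairs) + " .")
    return "\n".join(lines)


turtle = csv_to_turtle("code,name\nAD,Andorra\n")
```

Anything richer (typed literals, minted URIs, conditionals) would then be left to an attached script or transform, as suggested above.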

AndyS: reiterating a point I've made before. By involving xslt, sparql, js, ... etc. We would disenfranchise all those people who don't use those tool chains.

q

ivan: i'm not saying you must use XSLT, but that we could/should/might provide a way to indicate XSLT

[...]

ivan: not just expressivity of lang ... but what we can reasonably define and get accepted by the community

AndyS: on that Q, has there been pushback?

danbri: not that i've heard

AndyS: [concerned by indecision]

ivan: ... "this is _a_ candidate road, but what we need feedback on is not just the lang, but whether we need a language in the 1st place"
... alternative is that we define a structure and mechanistic mapping plus metadata for additional other mapping mechanisms

dan: how much work to take strawman through to a Tech Report, framed as a strawman

ivan: wiki page is close enough to First Public WD territory, with medium amount of work
... but i don't have a lot of time to try

same q to Andy

AndyS: hearing contradictory msgs, ...
... catalog of examples with lots of diff formats
... ppl like jeremy have spent quite some time on an approach that works for them

<AndyS> Charter?

<AndyS> http://www.w3.org/TR/csvw-ucr/#R-CsvToRdfTransformation -- published so expectation setting?

<AndyS> +JSON, +XML

timecheck

AndyS: [we don't normally have existential crises each week!]

sorry, kicked out of teleconf room

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.138 (CVS log)
$Date: 2014/09/03 13:31:18 $