CSV on the Web Working Group Teleconference

09 Apr 2014

See also: IRC log


Present: AndyS, fresco, +1.937.207.aaaa, phila, JeniT, MathewT, DavideCeolin, danbri, +44.777.586.aabb, jtandy
Regrets: Axel, Stasinos, Alfonso
Scribe: Andy Seaborne


<trackbot> Date: 09 April 2014

<scribe> scribe: Andy Seaborne

<scribe> scribenick: AndyS

<danbri> thanks AndyS!

AndyS: Regrets for next week.

<JeniT> JeniT: Regrets for next week

<JeniT> http://www.w3.org/2014/04/02-csvw-minutes.html

<danbri> looks good

AndyS: Not all actions recorded in the tracker

<danbri> 3 of them are for me; i'll make todos directly.

APPROVED: Minutes http://www.w3.org/2014/04/02-csvw-minutes.html


Davide: will sync with jeremy

phila: making progress on my action for a UC

<phila> ACTION: phila to add use case linking from metadata to the data [recorded in http://www.w3.org/2014/04/09-csvw-minutes.html#action01]

<trackbot> Created ACTION-12 - to add use case linking from metadata to the data [on Phil Archer - due 2014-04-16].

<danbri> (phil's action was on me last week as "chase phila for his usecase in which a party provides metadata for another's csv"; I declare my work done)



<JeniT> AndyS: we had a telcon yesterday

<JeniT> ... including jtandy, Gregg, Juan

<JeniT> ... we're looking at processing from CSV to CSV to clean up the general data

<JeniT> ... eg fixing up new lines, delimiters, date formats

<JeniT> ... thought better to do that as rewriting CSV

<JeniT> ... then convert clean CSV to RDF/JSON/XML

<JeniT> ... R2RML is the nuclear option for complicated transforms

<JeniT> ... we didn't push on the boundaries around that

<JeniT> ... similarly might want to do RDF-to-RDF or JSON-to-JSON transforms after conversion

<JeniT> ... we don't want to repeat work done elsewhere, or add more tools to end users' toolchain

<JeniT> ... we discussed what's published

<JeniT> ... there's CSVs published as the outcome of a longer process

<JeniT> ... shared schemas, shared transformations, custom mappings

<JeniT> ... at scale & in volume; sharing parts of the files is beneficial

<JeniT> ... vs someone taking CSV from data.gov.uk

<JeniT> ... and adding their own transform

<JeniT> ... they need something more self-contained

<JeniT> ... a single file to control the transformation

<JeniT> ... also whether the CSV was created without the web in mind, or with the web in mind

<JeniT> ... particularly with spotting links & data formats

<JeniT> ... Gregg is going to look at pulling out his transform description to apply it independently of JSON-LD

<JeniT> ... we're hopeful that there will be commonality in conversion to JSON

<JeniT> ... which kind of depends on whether the conversion is to JSON-LD

<JeniT> ... had a good chat with Ivan when we met up

<JeniT> ... comments on what's been written would be great

<JeniT> ... it's a bit scruffy, but the general approach is there

<JeniT> ... I'm using the term 'basic mapping' rather than 'direct mapping'

<danbri> 'simple mapping'?

<JeniT> ... there's a progression of complexity

<danbri> 'wishfulthinking mapping'

<Zakim> danbri, you wanted to ask status of test case csvs for this exploration

<JeniT> danbri: are there test files?

<JeniT> AndyS: there's tests in the repo

<JeniT> danbri: are they mainstream examples or test cases?

<JeniT> AndyS: the test ones from gkellogg are focused

<JeniT> danbri: we'd like mainstream examples

<JeniT> AndyS: I've put some of those in the document

<JeniT> ... if you could work through one of the examples you want to put in, that would be great, like jtandy did

<danbri> https://github.com/w3c/csvw/blob/gh-pages/examples/simple-weather-observation.md

JTandy: we also talked about the charter and metadata in RDF
... may be distinct from the mapping framing (not in RDF)
... want to test this with WG.

<JeniT> AndyS: yes, metadata about the CSV file may or may not be in RDF

<JeniT> ... it might be simpler to have one language that drives all the mappings

<JeniT> ... which might include provenance etc

<phila> from the charter "The vocabulary should be defined, or should have an encoding, in standard RDF and, wherever possible and appropriate, should refer to, and reuse, existing vocabularies developed elsewhere." - i.e. it doesn't have to *only* be in RDF

<JeniT> ... even in JSON-LD, the context part isn't RDF

<JeniT> jtandy: we talked about gkellogg pulling out the transformation stuff from JSON-LD to see if it could be expressed in Turtle

jeniT: easy to write might mean TTL
... want to see the things it will say to guide the syntax choice.
... separating CSV-specific xform from JSON-LD will be good.
... nudged Rufus and Ross Jones re JSON.

<JeniT> https://www.w3.org/2013/csvw/wiki/Conversions

<danbri> aside - another JSON-LD launch at google this week: https://devsite.googleplex.com/webmasters/business-location-pages/schema.org-examples (i.e. we like JSON-LD)

Model for tabular data

jenit: e.g. import into relational DB

davide: may have some interesting data as example

<jtandy> danbri - that looks like an internal link (googleplex) ... just tried it :-)

subtopic: null fields

<JeniT> http://w3c.github.io/csvw/syntax/#core-tabular-data-model

jenit: "What is a null field" comment from D Booth
... absent and empty : same? different?

jtandy: in the discussion, default values need to be handled.

<danbri> lost audio

jtandy: empty field returned. Have an explicit "null" marker (999, whatever)
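
As a rough illustration of the explicit "null" marker idea (a sketch under stated assumptions, not something presented in the meeting), the Python below maps a declared token such as "999" to a null value while reading a CSV. The function name `read_with_nulls`, the `null_tokens` parameter, and the sample data are all hypothetical. Whether an empty field should also count as null is the open question in this discussion, so the sketch leaves empty fields as empty strings.

```python
import csv
import io


def read_with_nulls(text, null_tokens):
    """Parse CSV text, mapping any declared null token to None.

    `null_tokens` stands in for whatever the metadata would declare
    (hypothetical; no metadata format is assumed here).
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append(
            {key: (None if value in null_tokens else value)
             for key, value in row.items()}
        )
    return rows


# Made-up sample: station B uses the "999" marker, station C has an empty field.
sample = "station,temp\nA,12.5\nB,999\nC,\n"
rows = read_with_nulls(sample, null_tokens={"999"})
```

Keeping the marker set outside the parser means the same CSV can be read with different declarations, which is one way to keep the "absent" versus "empty" decision in the metadata rather than in the data.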

subtopic: packaging

<JeniT> http://w3ctag.github.io/packaging-on-the-web/

jenit: TAG work

<jtandy> the "999" marker would be declared in the metadata annotation as a token indicating a "null field" / missing field

jenit: need arises in various places
... general need for web development
... we need to do similar - CSV(s) and metadata

<JeniT> http://w3ctag.github.io/packaging-on-the-web/#downloading-data-for-local-processing

jenit: link to draft of the TAG direction with a specific example for this WG
... individual files are still on the web
... but a "package fetch" pulls them all at once.
... individual files link back to their metadata
... streamable format proposed, based on multi-part
... comments invited

<jtandy> ok - packaging stuff looks interesting

<phila> no questions but it's interesting, thank you

danbri: Other groups' feedback?

jenit: no HTTP changes

danbri: what about HTTP layer optimizations? e.g. caching

jenit: overlap with HTTP/2
... would need packaging aware caching to cache sub parts but format allows cache header per part
... will write to the list

subtopic: metadata packaging
... metadata format

jenit: hold back until we know what's in it

jtandy: been looking at "Simple Data Packaging" (now renamed); it looks very close
... start from that?

jenit: Would be good to start from there - except it assumes JSON.

jtandy: start with the JSON assumption and see how it is received on WD

<Zakim> danbri, you wanted to say start from SDP as a *vocabulary* is fine, but something that fits with RDF is also important

danbri: schema.org ==> vocabulary start good, but syntax of JSON only might be a barrier.

<jtandy> +1 to taking SDP metadata and expressing in RDF over JSON-LD

phila: Uncomfortable if it excludes the dataprotocols work when it need not.
... significant community
... at least add conversions to/from.

<JeniT> AndyS: I think there was something that said the data package might become JSON-LD

<danbri> i can't find a good link for SDP, was it renamed?

<JeniT> ... I'd like to get a sense of how successful that format has been

<JeniT> ... and if there are any others

<danbri> http://dataprotocols.org/tabular-data-package/

<JeniT> ... I thought it was a good starting point, but I realised I didn't know what the reception had been

jenit: DSPL alternative

<danbri> DSPL is https://developers.google.com/public-data/ ; Omar I mentioned earlier was working to migrate this to schema.org / RDF / JSON world

<danbri> https://www.w3.org/wiki/WebSchemas/LookInside

jenit: used the format in our (ODI) tools
... and providing feedback (ldodds)
... would they contrib a draft?

phila: Rufus is IE in this WG because it helps align the work.
... this WG will likely go beyond that work as extensions. Maybe WG NOTE for existing work.

<danbri> I'd suggest we take it as expressivity requirements and we 'should' at least have a clear mapping

jenit: will contact Rufus
... can we take into account data package work?

<JeniT> ... in the conversions

<JeniT> http://w3c.github.io/csvw/syntax/#package

jenit: AOB?

jtandy: timescales?
... next publication esp UCR doc?

phila: no lower limit on repub cycle

jtandy: Happy to move forward in May

jenit: UCR will remain "open" to capture new discoveries.

jtandy: requirements are placeholders, more categorization and "accept" requirements

jenit: aim of mid May with more UCs.
... ??
... after Easter, process to accept requirements.

danbri: propose skip next week

<jtandy> +1 to skip

danbri to chair next time, 2 weeks time. Wed after Easter.


<phila> DNM 23 April