CSV on the Web Working Group Teleconference

04 Jun 2014




Present: Phil Archer (philA), Dan Brickley (danbri), Jeni Tennison (JeniT), Jeremy Tandy (jtandy), Andy Seaborne (Andys), Eric Stephan (estephan), Ivan Herman (ivan), Davide Ceolin (DavideCeolin), Alfonso Noriega (fonso)
Chair: Dan Brickley
Scribe: Andy Seaborne


<trackbot> Date: 04 June 2014

<danbri> https://www.w3.org/2013/csvw/wiki/Meeting_Agenda_2014-06-04 No formal meeting last week.

<danbri> week before last's meeting: http://www.w3.org/2014/05/21-csvw-minutes.html

Progressing XML & JSON conversions

danbri: need people to champion these formats

jtandy: we have UC for each format

jenit: co-chairs intend to reach out to members of the WG

danbri: test cases - need structure in the repo for examples/test cases.
... inputs directory for csv, and then folder per output possibility
... can compare the different approaches.
... go through UCs and get the CSVs then produce outputs
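
A minimal sketch of the layout danbri describes, assuming an `inputs/` directory for the CSVs and one folder per output possibility. The folder names, file extensions and the `expected_outputs` helper are hypothetical, not an agreed repo structure.

```python
from pathlib import PurePosixPath

# Hypothetical output formats: one folder per output possibility.
FORMATS = {"json": ".json", "xml": ".xml", "rdf": ".ttl"}

def expected_outputs(input_path):
    """Map inputs/<name>.csv to one expected output file per format folder."""
    stem = PurePosixPath(input_path).stem
    return {fmt: str(PurePosixPath(fmt) / (stem + ext))
            for fmt, ext in FORMATS.items()}

# Example: where the outputs for one use-case input would live.
for fmt, path in expected_outputs("inputs/uc21.csv").items():
    print(fmt, path)
```

Keeping every output next to the same input name makes it easy to compare the different approaches side by side.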

jtandy: makes sense ... UCs already have files for CSV inputs

danbri: target presentations given?

jtandy: somewhat but not systematic.
... starting point, UC#21, TSV, some possible geojson output from that

danbri: allow different mappings with close position of different mappings

estephan: UC#6 is an interesting one

ivan: procedure -- if we use UC as test cases, we have to be careful because once in /TR/ files are frozen

<danbri> [slightly muffled audio]

jenit: in the structure have different metadata for same CSV file.

danbri: good point

<danbri> (maybe a raw_source, canonical_source)

<danbri> + multiple metadata + mapping files

<DavideCeolin> ops sorry AndyS, I thought that was last assigned id

metadata format

<JeniT> http://lists.w3.org/Archives/Public/public-csv-wg/2014Jun/0012.html

jenit: conversation with Rufus
... metadata specifying parsing options and whether CSV is 1-1 aligned to the "tabular format"
... #1 - standard CSV files, frags work

<danbri> cf http://tools.ietf.org/html/draft-hausenblas-csv-fragment-02

jenit: #2 - parsing options - e.g. separator

<danbri> losing audio

<danbri> buzzzing noise

<estephan> aliens landing?

jenit: #3 padding around the CSV data, maybe more than one table in one file

<JeniT> andys: I think we’ll need to do a bit of #3

<JeniT> … especially to handle padding

<JeniT> … because tabular data is made to look nice in Excel

<JeniT> … the question is how much attention to pay to that

<Zakim> danbri, you wanted to suggest metadata linking dataset packages could work; package A could be TSV with fluff; package B could be a downstream transform of A

danbri: not great if input has appearance
... but pointing into cells good
... maybe CSV original => CSV cleaned => tabular data

jenit: that is #1 -- metadata refs cleaned file

<danbri> andy: risk of losing row/col refs to shape of the original file

andys: reference issue on rows

jtandy: suggest #1, #2 - agree -- and fixed format forms
... re: #3 -- location of data in the file. -- the frag relative to that datum.

jenit: yes - that's the idea in the message
... rows, cols references do not match the original file.
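
The mismatch between references can be sketched as a simple offset: once padding is stripped, a cell at (row, col) in the extracted table sits at a different position in the original file. The `origin` tuple and the `to_original` name are hypothetical, and zero-based indices are an assumption.

```python
def to_original(row, col, origin):
    """Translate a zero-based (row, col) in the extracted table back to
    the original file, given the (row, col) where the table starts."""
    origin_row, origin_col = origin
    return (row + origin_row, col + origin_col)

# Assume the table was found after 3 padding rows and 1 padding column.
print(to_original(0, 0, origin=(3, 1)))  # (3, 1)
print(to_original(2, 4, origin=(3, 1)))  # (5, 5)
```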

jtandy: provide a tool to extract the data in the processing pipeline.

<danbri> [re tooling, I stumbled across http://tablib.readthedocs.org/en/latest/tutorial/#quickstart yesterday, quite promising]

jtandy: step one can be extract : pure #3 is potentially confusing

ivan: is metadata content generated by parsing, or does it instruct the parser?

jenit: latter - it is input to parser
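
A minimal sketch of metadata acting as input to the parser, using Python's standard `csv` module. The hint keys (`delimiter`, `skip_rows`, `header`) and the `parse_with_hints` function are hypothetical illustrations, not the WG's metadata vocabulary.

```python
import csv
import io

def parse_with_hints(text, hints):
    """Parse delimited text using caller-supplied hints (hypothetical keys)."""
    delimiter = hints.get("delimiter", ",")
    skip_rows = hints.get("skip_rows", 0)  # padding rows above the table
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    rows = rows[skip_rows:]
    if hints.get("header", True) and rows:
        return rows[0], rows[1:]
    return None, rows

# Two padding lines, then a semicolon-separated table.
raw = "Report title\n\nname;age\nalice;30\nbob;25\n"
header, body = parse_with_hints(raw, {"delimiter": ";", "skip_rows": 2})
print(header)
print(body)
```

The hints are supplied before parsing starts; nothing here is inferred from the file itself.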

ivan: tab data model is non-norm on this -- not a charter item (in IETF)
... need to come up with test cases etc.

jenit: is more work. Think that #2 + explain how to use the hints.

<danbri> [there's some difference between defining a kind of software component ('parser'), versus giving metadata description of mappings from chars to tabular datasets]

<danbri> JeniT also suggested we discuss 'separating 'schema' from 'notes'' under this item

<Zakim> AndyS, you wanted to ask that an issue is left in doc about this for community.

<danbri> andys: can we have an issue left in the doc about this

<danbri> ivan: we can't avoid the procedural issue that IETF standardizes CSV

jtandy: multistep processing to get tabular data format. Provide information tools/advice.

<danbri> jeni: there are other RFCs for other delimited formats

jenit: encoding and header are in RFC anyway ... other RFCs for other formats ... escape is borderline.
... encourages people to create good files.

danbri: IETF/W3C is not a worrying disconnect -- stuff to do

subtopic: notes for schemas

<danbri> vs schemas

<danbri> "separating 'schema' from 'notes'"

jenit: separate out the schema that can be reused across files

<jtandy> +1

<ivan> +1

ivan: in current doc - bare JSON and JSON-LD examples - decide?

jenit: aiming for JSON-LD compatibility

Model for Tabular Data and Metadata on the Web (Jeni)

<danbri> "renaming 'fields' to 'cells' to avoid culture clash"

jenit: renaming 'fields' to 'cells' to avoid culture clash -- "field" is a column

<ivan> +1

(what is the CSV RFC terminology?)

<danbri> +1

<jtandy> thankyou

<danbri> record = field *(COMMA field)

<danbri> in http://tools.ietf.org/html/rfc4180

jenit: records and fields in RFC
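
To illustrate the RFC terminology, a small sketch that splits one record into its fields. Python's `csv` module with its default dialect is used as a stand-in; it handles the quoting shown here but is not a strict RFC 4180 validator, and `fields_of` is a hypothetical helper name.

```python
import csv
import io

def fields_of(record):
    """Split one CSV record into fields using the csv module's default dialect."""
    return next(csv.reader(io.StringIO(record)))

print(fields_of('a,"b,c",d'))       # ['a', 'b,c', 'd']
print(fields_of('x,"say ""hi"""'))  # ['x', 'say "hi"']
```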

<danbri> "determine if we are happy that _all_ RTL tabular data is logically the same as LTR tabular data - just that it is rendered differently"

<JeniT> http://w3c.github.io/csvw/syntax/#bidirectionality-in-csv-files

<danbri> (does scribe bot have notion of a subtopic?)

<JeniT> AndyS, no: we will keep ‘column’ as ‘column’

danbri: need to grab a copy.

<Zakim> AndyS, you wanted to ask if the intention to use field as column?

jtandy: data is serialized byte 0, byte 1, ... and RTL is about display.
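
jtandy's point can be sketched as follows: the first field in the byte stream is the first logical column regardless of script, and only the rendering differs. The sample values are made up, and the Hebrew text is written with Unicode escapes so the source itself reads left to right.

```python
import csv
import io

NAME = "\u05e9\u05dd"       # Hebrew word for "name"
AGE = "\u05d2\u05d9\u05dc"  # Hebrew word for "age"

# Serialize a header row: NAME is written first, so its bytes come first.
data = (NAME + "," + AGE + "\n").encode("utf-8")

rows = list(csv.reader(io.StringIO(data.decode("utf-8"))))
print(rows[0][0] == NAME)                        # True
print(data.startswith(NAME.encode("utf-8")))     # True
```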

ivan: same every RTL location?
... question to Yakov

jenit: highlighted in doc to be published.

ivan: remind me to follow up with W3C offices

<Zakim> danbri, you wanted to comment on RTL and separators

Use cases and requirements

jtandy: in good shape.
... HL7 -- maybe too ambitious - complex encoding. Drop?
... currently incomplete and no requirements so just leaving it as is is not good.
... from James McKinney (sp?)

jenit: ping contributor?

<Zakim> danbri, you wanted to suggest an 'other topics' section if don't have one

jenit: issue on additional UC ... if community gives input

jtandy: ... HL7 not in current except as "known, incomplete"
... will put in UC current on list .. I have "a little list"

<danbri> ACTION: danbri assign more actions [recorded in http://www.w3.org/2014/06/04-csvw-minutes.html#action01]

<trackbot> Created ACTION-21 - Assign more actions [on Dan Brickley - due 2014-06-11].

<danbri> http://w3c.github.io/csvw/syntax/#bidirectionality-in-csv-files

jtandy: UC review by contributors --- currently empty ... draw people's attention to that

<danbri> action everyone read https://www.w3.org/2013/csvw/wiki/Use_Cases_Check

<trackbot> Error finding 'everyone'. You can review and register nicknames at <http://www.w3.org/2013/csvw/track/users>.

jtandy: issues in UCR ... take to list.

Generating RDF from Web Tabular Data

<danbri> Andys: I haven't had any time to spend on it lately

<danbri> danbri: seemed to be some mild disagreements last week - can someone summarise?

<danbri> andys: ivan's concerned about us taking on too much work

<danbri> ivan: what is happening now is that we are exploring the whole approach of basing conversion on templates

<danbri> … this expl means … i try to understand whats going on, i'm slow in understanding that, some details unclear to me

<danbri> … this is where we are , some emails flying around between andy and me. Next step as far as I'm concerned (given time avail) is to write down a draft spec

<danbri> … a bit on similar level to what I did a while back as 'mechanical approach'; then we have to make a decision about overall direction. Shouldn't be andy's and mine only.

<danbri> … back to json, xml — both andy and I have been using Turtle examples. At least in my case, main reason is that i'm more comfortable with it.

<danbri> … whole discussion/approach is pretty much generic. Andy do you agree?

<danbri> … that same tmpl approach should work with json, with xml … without any significant change

<danbri> … in this sense not just an rdf thing. sense/structure is not rdf specific. some details might be, but whole thing is generic.

<danbri> AndyS: I can see it working for json. Just don't know well enough at xml level to understand one way or the other.

<danbri> … Jeni - there were outstanding Qs to you from couple weeks ago. You mentioned requirements - I was asking for concrete requirements.

<danbri> JeniT: on conditional processing?

<danbri> … will use dan's example to illustrate that

<danbri> … Dan's suggestion to focus on some examples and target output with specific metadata files, draft templates, will be helpful in exploring this

<danbri> AndyS: a bit confused by that, as there are examples already, e.g. jeremy's

<danbri> JeniT: absolutely; just to have this in a structured way in the directory structure, to help unlock progress

<danbri> ivan: back to procedural side, … dan/jeni will reach out to rest of group for the xml/json conversion

<danbri> … assumption that andy/ivan could edit, though i didn't really volunteer, so let's not assume that. Perhaps Andy in same situation.

<danbri> AndyS: yes, I have a time issue. If I don't see sufficient support for the approach i'm exploring, unlikely I can put more time in.

<danbri> dan: how close are the other specs, jeni?

publication status

<danbri> jeni: model spec is close. metadata spec has a lot of issues but ok for FPWD

tab data - nearly ready

jenit: metadata - happy to publish with issue boxes

<danbri> practicalities: ivan vacation 4 weeks from 14 July.

<danbri> dan some July vacation too.

jenit: tabular data in good shape -- s/field/cell/g

<danbri> ACTION: dan make a concrete proposal for concrete test cases structure [recorded in http://www.w3.org/2014/06/04-csvw-minutes.html#action02]

<trackbot> Created ACTION-22 - Make a concrete proposal for concrete test cases structure [on Dan Brickley - due 2014-06-11].

ivan: UCR looks ready or v close already


Summary of Action Items

[NEW] ACTION: dan make a concrete proposal for concrete test cases structure [recorded in http://www.w3.org/2014/06/04-csvw-minutes.html#action02]
[NEW] ACTION: danbri assign more actions [recorded in http://www.w3.org/2014/06/04-csvw-minutes.html#action01]
[End of minutes]
