CSV on the Web Working Group Teleconference

19 Mar 2014


See also: IRC log


Present: Yakov Shafranovich (yakovsh), Dan Brickley (danbri), Gregg Kellogg (gkellogg), Jeni Tennison (JeniT), Ivan Herman (Ivan), Andy Seaborne (AndyS), Matthew Thomas (MathewThomas), Davide Ceolin (DavideCeolin), Tim Finin (tfinin), Eric Stephan (ericstephan), Alf Eaton (fresco_), Anastasia Dimou (andimou)
Regrets: Stasinos Konstantopoulos, Jeremy Tandy, Rufus Pollock, Jürgen Umbrich
Chair: Jeni, Dan



Review last week's minutes

<scribe> ScribeNick: JeniT

<scribe> Scribe: Jeni

<danbri> http://www.w3.org/2014/03/05-csvw-minutes.html

RESOLUTION: Minutes accepted

<danbri> resolved: minutes are a fair record

<AndyS> That was March 5

<AndyS> http://www.w3.org/2014/03/12-csvw-minutes.html

<danbri> http://www.w3.org/2014/03/12-csvw-minutes.html

Model for Tabular Data & Metadata on the Web

<danbri> jeni: this morning i went through the actions from last week's call

<danbri> basically to add in a section that talks about the various methods of locating metadata

<danbri> about a csv file.

<danbri> that section is now …


<danbri> <- here

<danbri> … it's very sparse, with lots of issues highlighting places where more discussion/work is needed to resolve the details, but fine for FPWD.

<danbri> danbri: are you proposing that we publish this?

<danbri> jeni: it looks ready. changed short name as req'd. refined abstract. i think addresses concerns from ivan/ralph discussion.

<danbri> ivan: to be precise, ralph didn't object as such; i was trying to anticipate possible issues. i think it's fine.

ivan: we had some discussion about adding text to the status section

<danbri> … we had some discussion re adding text to SOTD

ivan: which we could add to make it clear what's happening in relation to IETF

<danbri> jeni: fine adding text that dan suggested

AndyS: suggest using 'tabular-data-model' rather than 'tabular-model' to make distinct from eg HTML

JeniT: happy to make that change

<danbri> ivan: let's concentrate on data rather than html tables

danbri: we might extend to HTML tables at some point
... propose we publish as FPWD

<ivan> PROPOSED: the tabular data model should be published as FPWD as 'tabular-data-model'


<ericstephan> +1

<AndyS> +1

<gkellogg> +1

<ivan> +1

<danbri> +1

<yakovsh> +1

<DavideCeolin> +1

<tfinin> +1

<fresco> +1

<danbri> +1 from Jeremy

<ivan> RESOLVED: the tabular data model should be published as FPWD as 'tabular-data-model'

<MathewThomas> +1

<andimou> +1

Use Cases Document

DavideCeolin: I just sent an email about the issue about XML conversion

<danbri> http://w3c.github.io/csvw/use-cases-and-requirements/

DavideCeolin: I added that issue, so as far as I'm concerned it's fine

danbri: we can always make another WD, do we have to resolve this before we publish?

ericstephan: we've put all the use cases together: 18 use cases
... if there isn't a use case that you submitted, it might have been combined with another
... we have a number of requirements too
... but I believe we should be good to go for FPWD

danbri: consensus from the editors
... is short name fixed?

<ivan> PROPOSED: the use case document should be published as FPWD as 'csvw-ucr'

<ivan> +1


<danbri> +1

<ericstephan> +1

<yakovsh> +1

<AndyS> +1

<danbri> +1 relayed from Jeremy

<fresco> +1

<tfinin> +1

<gkellogg> +1

<andimou> +1

<AndyS> TR/csvw-use-cases-and-reqs/

AndyS: 'csvw-ucr' is not what's in the document

<AndyS> Changing to shorter is fine.

ivan: yes, I propose changing to 'csvw-ucr' as it's short

<ivan> RESOLVED: the use case document should be published as FPWD as 'csvw-ucr'


ivan: there is a bit of a process
... danbri & JeniT will have to make the request for these short names
... point to the editor drafts & say what they do
... it won't be out of the blue

<danbri> ivan: no formal template, but not a big deal. can cc chairs.

ivan: in parallel I'll contact web master

<danbri> … i can contact webmaster as a placeholder

ivan: I've already started checking the documents; I'll merge changes etc
... I propose publishing 27th March

danbri: ok, we'll get the request out
... when do we lose you?

ivan: early April, hence trying to publish before then

danbri: what about minutes publishing?

ivan: I'll take care of it, but probably with some delay

Subgroups on Conversions

<danbri> "Sub-groups to explore RDF/JSON/XML mapping systems (that address our use cases)"


<danbri> jeni: dan and i propose subgroups on particular conversions, for rdf and json; and if a requirement, for xml as well.

<danbri> see wiki page ^—

<danbri> each group would look at what info is needed to convert something in tabular data model + annotations, into xyz format

<danbri> "idea is … if each group does this semi-independently, as a whole group we can look at overlaps"

<danbri> e.g. if each group needs to know about datatypes for particular values, we can resolve what that looks like for whole group.

<danbri> "is this a reasonable way forward?"

ivan: I'm worried about strictly separating the various conversions
... we discussed that the JSON and RDF conversions may be part of the same thing
... ie using JSON-LD
... I'm worried that if we strictly separate them, we'd remove a possible synergy

<gkellogg> The same could be said about XML, if RDF/XML is a reasonable way to publish XML

<danbri> jeni: i'm strongly of opinion that we should not aim for those synergies too early

<AndyS> I was going to join both because ... err ... will need both.

<danbri> we shouldn't prematurely assume that json users want json-ld. we might miss something.

<danbri> it's fine for the rdf group to think about how that might be done using XML, json(-ld), RDFa, etc.

<danbri> …but i want to make sure that we don't force people who are primarily interested in the browser to go through an unwanted rdf step

<Zakim> AndyS, you wanted to ask about a "direct mapping" style

AndyS: a common framework is to think of CSV as a big array of fields
... if that's how people are used to using it in the browser, the JSON conversion should factor that in
... we're looking very much at the annotated version of the data model
... I was wondering about the direct mapping, without annotations

<danbri> jeni: I think in all the cases we should be looking at a conversion, … a default conversion unguided by annotations; and that the annotations then are tweaks over the top

<Zakim> danbri, you wanted to suggest that someone can simultaneously work on json and rdf

AndyS: ok, we do have to factor in the direct mapping
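
A minimal sketch of the idea above: a default conversion that treats the CSV as a plain array of fields, with annotations applied as optional tweaks on top. The `annotations` argument (a column index mapped to a converter) is a hypothetical stand-in for whatever metadata the group eventually defines; nothing here reflects an agreed design.

```python
import csv
import io
import json


def direct_mapping(csv_text, annotations=None):
    """Return the CSV as a list of rows (header first).

    `annotations` is a hypothetical dict: column index -> callable that
    converts that column's string values. Without it, every field stays
    a string, which is the unguided "direct mapping".
    """
    annotations = annotations or {}
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    converted = [
        [annotations.get(i, str)(value) for i, value in enumerate(row)]
        for row in body
    ]
    return [header] + converted


SAMPLE = "name,population\nLeeds,751500\nYork,198000\n"

plain = direct_mapping(SAMPLE)                        # no annotations
typed = direct_mapping(SAMPLE, annotations={1: int})  # one datatype tweak

print(json.dumps(plain))
print(json.dumps(typed))
```

The same function serves both cases, so the annotated output is always a refinement of the default one rather than a separate code path.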

danbri: we should avoid being tribal here, eg words like 'RDF people' or 'XML people'
... this group is full of pragmatists who use a bunch of technologies

<AndyS> PS We did briefly mention doing RDF via the abstract data model. Should all work out with direct to JSON-LD.

danbri: ivan's concerns that we fork too early won't apply because we'll all be watching the mailing list
... we shouldn't rule out JSON-LD, but we shouldn't assume it
... the use cases should keep us on track

yakovsh: I agree with danbri that groups with subgroups tend to fracture
... also, are the conversions well-defined formats?
... do all three need their own media types?

danbri: I don't think we need media types necessarily
... the RDF group in particular, don't need them

yakovsh: there are JSON & XML media types that are separate

<danbri> jeni: yes, as discussed on the list, depends on how the conversion is done, e.g. you could map into JSON that explicitly says 'there is a table with these columns', …

<danbri> … json properties for table/column/row

<danbri> … which could be an explicit media type for tabular data in json

<danbri> (and same in xml, <row>, <column>, <field>, ...)

<danbri> ….but i think more likely our mappings will be based on particular csv original documents
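
As an illustration of the first option mentioned above (JSON that explicitly describes a table), here is a rough sketch. The property names `table`, `columns`, `rows` and `fields` are assumptions made for this example only; the group has not defined any such format.

```python
import csv
import io
import json


def to_table_json(csv_text):
    """Wrap CSV content in an explicit table/column/row JSON structure.

    All property names are hypothetical, chosen only for illustration.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = (rows[0], rows[1:]) if rows else ([], [])
    return {
        "table": {
            "columns": [{"name": name} for name in header],
            "rows": [{"fields": row} for row in body],
        }
    }


table_doc = to_table_json("name,population\nLeeds,751500\n")
print(json.dumps(table_doc, indent=2))
```

A structure like this is generic across all CSV files, which is what would make a dedicated media type plausible; a mapping driven by a particular CSV's column headings would not have that property.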

gkellogg: I'd suggest that a direct mapping to JSON, based on column headings is compatible with JSON-LD
... with zero edits, plus the context
... doesn't mean it solves every desire to convert the data
... I think a direct mapping plus a JSON-LD context does quite a lot
... and obviously gets us part way to RDF
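
A small sketch of what gkellogg describes: rows keyed by column heading, left unedited, with a JSON-LD `@context` added alongside. The vocabulary IRI `http://example.org/vocab#` and the choice to wrap rows in `@graph` are assumptions for illustration, not something the group has agreed.

```python
import csv
import io
import json


def csv_to_jsonld(csv_text, vocab):
    """Map each row to an object keyed by column heading, plus a context.

    `vocab` is a hypothetical base IRI; each heading becomes a term
    that expands to vocab + heading.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    context = {name: vocab + name for name in (reader.fieldnames or [])}
    return {"@context": context, "@graph": [dict(row) for row in reader]}


jsonld_doc = csv_to_jsonld(
    "name,population\nLeeds,751500\n",
    "http://example.org/vocab#",
)
print(json.dumps(jsonld_doc, indent=2))
```

A consumer that ignores `@context` still sees ordinary JSON objects, while a JSON-LD processor can expand the same data towards RDF. Values remain strings here; datatypes would need further metadata.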

ivan: one more question from yakovsh was about well-defined format
... we need more than an example: a clear formal specification of a mapping
... a clear standard mapping definition
... also I wanted to add: I haven't looked at the details of the use cases in terms of these mappings
... but it would be good to look at the practice on how the mapping to JSON happens in those examples
... what is the usage of that in the real world
... and adding use cases on the conversion side rather than the structure of the CSV file
... those use cases & requirements might be useful
... there's no point having a specification that's completely different from what's done out there

<Zakim> danbri, you wanted to ask if we should be targeting existing XML and JSON idioms (e.g. SportsML for a sports CSV)

<danbri> eg. ical https://tools.ietf.org/html/rfc6321

danbri: as we map into particular XML languages etc: should we have a goal of mapping into fixed formats?

<yakovsh> open document xml

danbri: eg ical, SportsML
... in XML we might use XSLT to map into one of these formats
... but is that our aim?

<tfinin> ML370?

<danbri> ack ivan?

ivan: I think it's somewhere in between: I think if we say we should be able to map a CSV file to any XML schema or any RDF vocabulary out there, we will have to define something fairly complicated
... it's equivalent to what the RML work did, on converting relational databases to RDF
... that said, if there's something in the middle, in simple cases we might add something in the metadata

<AndyS> To gregg -- /me worried about putting JSON-LD algorithms on RDF path.

ivan: to help with the conversion, to produce an RDF closer to what's desired
... I wouldn't want to be able to do it automatically with any vocabulary out there

<danbri> ack jenit?

<yakovsh> +q

<danbri> jenit: in other places, you can hand off to another system for the conversion

<andimou> *equivalent to what R2RML does with Relational Databases, RML extends R2RML and maps CSV/XML/JSON to RDF

<danbri> eg. grddl turns normal XMLs into RDF via XSLT files

<danbri> we might want to consider that we do need to discuss on the list, whether we want to have the option to bug out to such languages

<danbri> (bug?)

<ivan> +1 to Jeni

<danbri> …eg. ptr to an xslt file over a standard mapping into a specific format

<danbri> or construct

<danbri> jeni: using those kinds of langs might be worthwhile

danbri: an easy win is to reuse the hard work of other groups
... eg the relational-to-RDF mapping work
... Barry Norton has mapped musicbrainz data into linked data
... he's dropping down into SQL all the time
... in some use cases, that's just right: in other cases XSLT might give us that flexibility
... in browsers they might use JavaScript

yakovsh: we're talking about conversion into JSON/XML/RDF, very removed from what users see

<danbri> musicbrainz example: https://github.com/LinkedBrainz/MusicBrainz-R2RML/blob/master/mappings/artist.ttl

yakovsh: what about open document XML and open XML, the document formats?

<danbri> http://en.wikipedia.org/wiki/Comparison_of_Office_Open_XML_and_OpenDocument

yakovsh: if we're talking about conversions into formats, those are probably more common than anything else
... getting a spreadsheet out of CSV is going to be very common

danbri: that might be an unarticulated aspect of one of our existing use cases, we should take a look

<danbri> ACTION: danbri scan use cases to see if http://en.wikipedia.org/wiki/Comparison_of_Office_Open_XML_and_OpenDocument are mentioned/implied [recorded in http://www.w3.org/2014/03/19-csvw-minutes.html#action01]

<trackbot> Created ACTION-8 - Scan use cases to see if http://en.wikipedia.org/wiki/comparison_of_office_open_xml_and_opendocument are mentioned/implied [on Dan Brickley - due 2014-03-26].

ivan: if I look at open document format, if I converted to that, it's down to Excel or OpenOffice, but these systems can do it directly from CSV
... so there's no need to convert
... is there a significant use case of systems other than traditional spreadsheet programs
... that want to manipulate another format because they can't use the comma-separated version

danbri: last time I looked at the open office format, it was a container that had lots of extensibility points
... even within XML languages, some are more specific than others

yakovsh: the point might be that these programs already handle CSV conversion, so no point because it's already built in

ivan: yes, I want to see if there are uses outside traditional spreadsheets

<danbri> jenit: also considering importing into a relational database

<danbri> … also can apply to spreadsheet formats

<danbri> … would be useful to know types of columns, to format; to create database structures etc.

<danbri> … maybe we need a kind of focus in this area, to make sure we collect useful metadata

<danbri> jenit: do we need a focus group on reading csv into spreadsheets, relational databases
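
To make the point about column types concrete, here is a minimal sketch that uses per-column type metadata to create a database table before loading the CSV. It uses SQLite through Python's standard `sqlite3` module; the `column_types` dict is a hypothetical stand-in for metadata the group might define, and the table and column names are interpolated directly into SQL, so this is only suitable for trusted input.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_text, column_types):
    """Create `table` from the CSV header and load the remaining rows.

    `column_types` is hypothetical metadata: column name -> SQL type.
    Columns without an entry default to TEXT.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(
        f'"{name}" {column_types.get(name, "TEXT")}' for name in header
    )
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


conn = sqlite3.connect(":memory:")
load_csv(
    conn,
    "cities",
    "name,population\nLeeds,751500\nYork,198000\n",
    {"population": "INTEGER"},
)
db_rows = conn.execute(
    "SELECT name, population FROM cities ORDER BY name"
).fetchall()
print(db_rows)
```

Without the type metadata every column would be created as TEXT, so numeric sorting and aggregation would need casts at query time.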

danbri: why not MatLab or R etc?

JeniT: I think we should be looking at those: that's what data scientists use

danbri: we should make some initial forays and see where we get

<danbri> jenit: maybe given earlier discussion, instead of 'sub-groups', think of them as products we're aiming at

<danbri> so we'd need a lead editor on each, with co-editors etc

<danbri> … and yes let's have a 4th, csv reading into tabular data stores of various kinds

<danbri> (dan: I'd say structures/frameworks not stores)

<yakovsh> +q

yakovsh: I believe Part 9 of SQL discusses how they load CSVs
... we should look at that

<danbri> http://en.wikipedia.org/wiki/SQL/MED

<danbri> http://en.wikipedia.org/wiki/SQL:2011

JeniT: yes, all of the conversions are going to have to take into account existing work

yakovsh: Postgres is the only one I think that has a full implementation

danbri: musicbrainz data is shared as a Postgres dump, which isn't a good way of sharing data
... maybe they could look at CSV support from Postgres

<Zakim> AndyS, you wanted to ask about "sub group"

AndyS: what does 'sub group' mean? are we discussing on the main mailing list?

danbri: we'll focus on products: all discussion on main mailing list, but having these as focused efforts & documents


danbri: anyone have a clear view on which days they want to meet at TPAC? Mon/Tue or Thur/Fri?

ivan: I'd like Mon/Tue because I have to be in another group Thur/Fri

<danbri> ivan: pref mon/tues

danbri: ok

<yakovsh> no particular pref

danbri: any objections to mon/tue?

none heard

<danbri> ACTION: danbri to request mon/tues tpac meeting [recorded in http://www.w3.org/2014/03/19-csvw-minutes.html#action02]

<trackbot> Created ACTION-9 - Request mon/tues tpac meeting [on Dan Brickley - due 2014-03-26].

danbri: please someone volunteer to scribe next week

<danbri> thanks all

Summary of Action Items

[NEW] ACTION: danbri scan use cases to see if http://en.wikipedia.org/wiki/Comparison_of_Office_Open_XML_and_OpenDocument are mentioned/implied [recorded in http://www.w3.org/2014/03/19-csvw-minutes.html#action01]
[NEW] ACTION: danbri to request mon/tues tpac meeting [recorded in http://www.w3.org/2014/03/19-csvw-minutes.html#action02]
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.138 (CVS log)
$Date: 2014-03-19 14:11:16 $