CSV on the Web Working Group Teleconference

25 Jun 2014


See also: IRC log


Present: MathewThomas (Matthew Thomas), Ivan Herman (Ivan), Jeremy Tandy (jtandy), Jeni Tennison (JeniT), Dan Brickley (danbri), Davide Ceolin (DavideCeolin), Andy Seaborne (AndyS), Rufus Pollock (rgrp)
Chair: Jeni Tennison
Scribe: Jeremy Tandy, Jeni Tennison


<JeniT> Agenda: https://www.w3.org/2013/csvw/wiki/Meeting_Agenda_2014-06-24

<danbri> looks good.

Approve agenda and previous

<JeniT> Previous Minutes: http://www.w3.org/2014/06/18-csvw-minutes.html

JeniT: any objections?
... hearing none.

<JeniT> RESOLVED: minutes at http://www.w3.org/2014/06/18-csvw-minutes.html are accepted

CSV on the Web: Use Cases and Requirements

jtandy: ready for republication as WD

… ivan needs to take it over to the web master for publication

… we’ve been through the workflow

… there are changes put in as suggested by I18N group

… re <bdi>

… getting rid of validation issues, and improving the text

… note to JeniT that the same changes will need to be done in the syntax document

ivan: I will put everything up on the website before the end of business this afternoon

… then I expect it will be checked on Friday or Monday

… I don’t expect major issues

jtandy: we get an error on the patent policy but that comes from ReSpec

ivan: I’ll handle that

… probably published next Tuesday

… a blog item or something to announce it would be good

… some text on the home page

danbri: I’ll do that

jtandy: the diff shows the new use cases

<danbri> ACTION: danbri draft blog post [recorded in http://www.w3.org/2014/06/25-csvw-minutes.html#action01]

<trackbot> Created ACTION-25 - Draft blog post [on Dan Brickley - due 2014-07-02].

jtandy: in terms of next steps, there might be more use cases that crop up as we go along

… I think danbri was trying to find one

… the requirements are all still in the ‘Candidate’ area at the moment

… yet to formally review which requirements are in or out

… which would require a cross reference against the other specs

<danbri> yup don't wait; i'll try to get more schema.org-related UCs but I think the existing use cases catch a lot

JeniT: getting concrete CSVs out that we can try as examples

jtandy: many are there, where should we put them?

danbri: I’ve committed a structure we have in mind into github

<danbri> https://github.com/w3c/csvw/tree/testing-variations/examples/tests

… aiming for a structured file tree allowing a common structure without baking in solutions

… structure that will allow for experimentation

… this branch has two different folders

… scenarios folder with README, CSV files, metadata documents, maybe target

… then one or more attempts with input files copied from the main folder

… you could have a separate metadata file

… output files directory showing normal output

… and then template or mapping files for the magic stuff

… readme pointing off to documentation in the wiki etc

… I’m trying this out for one example

<rgrp> hi all

jtandy: where would you like me to put the files from the UC spec?

ivan: they’re all in the folder at the moment

… they can be used or copied or we can refer to the TR space

… part of the folder that contains the use case document

jtandy: not every use case has files

… there would only be a subset of use cases that were applicable for this activity

… I can create a top-level directory which is ‘use cases’

danbri: I think it’s ok to have a few identical copies in different places

… what’s missing is the tie between the .csv and the use case id

jtandy: I’ll create a bunch of subdirectories, one per use case

… we can change the names later if they should become scenarios

danbri: I’ll work through one or two before the end of the week, convince myself it works

… if it does then I’ll merge the branch

JeniT: anything else on use cases work?

<jtandy> ScribeNick: jtandy

Mappings and template mechanisms

JeniT: discuss work on the mailing list ...

ivan: common theme to discuss the template mechanism
... jeremy introduced the idea of micro syntax into the metadata
... for a given cell, you can use a regexp to describe named elements of a structured cell value
... allowing one to reference a micro syntax element directly
... the next thing we discussed is the need for conditional matching in the templates
... there are some use cases that clearly indicate that we need conditional matching
... complicated - but it seems necessary
... Jeremy & I were wondering if the condition for the matching could be expressed using REGEXP matching
... two different patterns:
... Jeremy put the conditional match into the metadata itself; e.g. if the regexp matches, use _this_ template
... concerned that this will lead to lots of alternative templates being provided; potentially very messy

<rgrp> is french mustache different from english?

ivan: proposed alternative to embed the conditional in the template itself - used example of moustache syntax
... this allows one template to express all the if-then-else variations
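
Editor's illustration of the first pattern above (conditions held in the metadata, each paired with its own template). This is a minimal sketch only, not anything the group agreed: the regexps, group names and Turtle-like template strings are hypothetical, and Python's `string.Template` stands in for whatever template syntax is eventually chosen.

```python
import re
from string import Template

# Hypothetical metadata: each entry pairs a regexp condition with a template.
# The patterns, group names and Turtle-like strings are invented for illustration.
CONDITIONAL_TEMPLATES = [
    (re.compile(r"(?P<id>[0-9]+)"), Template("<#org/$id> a skos:Concept .")),
    (re.compile(r"(?P<label>.+)"), Template('[] skos:prefLabel "$label" .')),
]


def render_cell(value):
    """Render the cell with the first template whose regexp matches the whole value."""
    for pattern, template in CONDITIONAL_TEMPLATES:
        match = pattern.fullmatch(value)
        if match:
            # Named groups from the regexp become the template variables.
            return template.substitute(match.groupdict())
    return None  # no condition matched


print(render_cell("42"))
print(render_cell("Sales"))
```

Each extra condition adds another entry to the list, which is the proliferation of alternative templates that the concern above points at; the second pattern would instead keep a single template and move the if-then-else inside it.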

<JeniT> jtandy: example of organisational data into SKOS structure

ivan: the resulting syntax is "disgusting"

JeniT: (meta level) question - given the choice of templating, do we create a new syntax for a template or try to reuse?

AndyS: we should be as much informed by existing syntax as possible - but we need to have control over the details
... for example the conditional blocks
... things like Moustache might change beyond our control ... and it does some things we don't want to do!
... the good news is that templating languages are all fairly similar ...

<rgrp> just a note - i will need to head out a bit early - viz in about 7-10m

ivan: one additional thing, we agreed that this templating mechanism would work for _all_ target langs
... we've all used TTL so far in examples; need to check this against JSON and XML etc.

JeniT: to give time for discussion on metadata spec, let's truncate this conversation
... but, are we confident that we've looked at the other template syntaxes? are we covering the things that other people have already found important

AndyS: looked at 10 java and 20 ruby template syntaxes
... as classes of languages they're all quite similar (odd details aside)

JeniT: don't want to invent something new!

AndyS: but we're forced to "invent" something new because there are no _standard_ template syntaxes; we need to build for the long term

Metadata Spec

rgrp: (Rufus) I have been having regular catch ups with JeniT
... items and issues are being raised on github
... outstanding items include:
... i) structured notes and annotations
... ii) specifying the CSV dialect

<AndyS> https://github.com/w3c/csvw/issues

rgrp: will raise the meaty concerns as issues, with the aim of having a new version out ahead of csv conf next month

JeniT: plan is to convert the _issues in the document_ to _issues in github_ to allow the conversation to occur on github

rgrp: we'll make sure the team are aware of the issues that are raised

ivan: better tracking of the issues (e.g. on github) will be helpful in the long term as the issues within the document tend to disappear

JeniT: agreed - and the issue numbers change in the doc too! very difficult to get consistent reference!

ivan: when is csv conf?

rgrp: July 15th

ivan: do we want to have an official (FP)WD for metadata vocab and data model prior to csv conf?

<rgrp> i'm about to have to drop :-/

JeniT: (I think!) seems like a good idea

ivan: in that case we need to target a particular publication date.
... one issue to note is that because the metadata doc is new, we need to request an official "short name"
... we need to do this urgently - today!

<AndyS> Publish!

ivan: call for formal resolution of metadata doc as FPWD

JeniT: inclined to publish; presumably this doesn't commit us to a date?

<AndyS> *Everything* can change after FPWD.

ivan: agreed -

rgrp: I don't want to rush things

ivan: OK - but FPWD is not stable - just a line in the sand

rgrp: ok

<ivan> PROPOSED: To publish the metadata document as a fpwd, possibly on the week of the 7th of July

<ivan> +1

<JeniT> +1


<AndyS> +1

<rgrp> +0

<rgrp> i have to drop ;-)

<MathewThomas> 1

<DavideCeolin> +1

<rgrp> sounds good o&o for now

<danbri> +1

<ivan> RESOLVED: To publish the metadata document as a fpwd, possibly on the week of the 7th of July

<MathewThomas> +1

ivan: I will request the short name for the metadata document TODAY; copy JeniT
... when do you think you can have both metadata doc and data model doc ready for publication?

Model for Tabular Data and Metadata on the Web

JeniT: data model doc is ready now
... rgrp wants to do more work on the metadata vocab doc
... but think we should be ready to run through the publication workflow next week

ivan: great ... let's make sure to publish both docs on the same day

JeniT: any other questions about the metadata or data model doc?
... hearing none
... all to be aware that the plan is to raise a bunch of issues about the metadata doc over this week and next
... any other questions about mapping?

AndyS: any movement about JSON and XML?

JeniT: have pestered Ross Jones and Rufus about JSON
... have Adam in my sights for XML mapping
... the only other thing, thinking about the micro syntax parsing ...

<danbri> this? http://www.jenitennison.com/datatypes/DTLL.html

<danbri> (?[year]-?[0-9]{4})-(?[month][0-9]{2})-(?[day][0-9]{2})

JeniT: way back [in the distant past when dinosaurs roamed the earth] I did some work on using single regexp to pull out subgroups ...
... using named subexpressions within a regexp
... this is supported in _some_ regexp implementations

AndyS: which ones?

ivan: I know you can number subexpressions

JeniT: yes ... naming is useful; e.g. Jeremy was trying to associate particular subexpressions?

<JeniT> http://www.boost.org/doc/libs/1_40_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.named_subexpressions

<JeniT> Perl regexes

<danbri> http://www.pcre.org/pcre.txt

danbri: there is utility in naming subexpressions

<AndyS> http://regexkit.sourceforge.net/Documentation/pcre/pcresyntax.html

<danbri> """Identifying capturing parentheses by number is simple, but it can be very hard to keep track of the numbers in complicated regular expressions. Furthermore, if an expression is modified, the numbers may change. To help with this difficulty, PCRE supports the naming of subpatterns. This feature was not added to Perl until release 5.10. Python had the feature earlier, and PCRE introduced it at release 4.0, using the Python syntax."""
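
Editor's illustration of the named-subexpression idea discussed above: a minimal sketch that rewrites the date pattern danbri pasted using Python's `(?P<name>...)` spelling. The function name `parse_cell` and the choice to require a whole-value match are assumptions made for the example, not part of any draft.

```python
import re

# Python spelling of a date pattern with named subexpressions.
# The group names year, month and day follow the pattern pasted in the log.
DATE_PATTERN = re.compile(
    r"(?P<year>-?[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})"
)


def parse_cell(value):
    """Return the named parts of a date-like cell value, or None if it does not match."""
    match = DATE_PATTERN.fullmatch(value)
    return match.groupdict() if match else None


print(parse_cell("2014-06-25"))  # {'year': '2014', 'month': '06', 'day': '25'}
print(parse_cell("25/06/2014"))  # None
```

Referring to the parts by name rather than by position means the references stay valid if the expression is later reordered or extended, which is the difficulty the PCRE text describes.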

JeniT: my paper has been accepted at CSV conf
... danbri and I are away next week so we will cancel next week's call
... AOB?
... hearing none


JeniT: meeting is closed.

Summary of Action Items

[NEW] ACTION: danbri draft blog post [recorded in http://www.w3.org/2014/06/25-csvw-minutes.html#action01]
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.138 (CVS log)
$Date: 2014-06-25 13:15:53 $