CSV on the Web Working Group Teleconference -- 21 May 2014

<trackbot> Date: 21 May 2014

waiting for more people to join

prev minutes: http://www.w3.org/2014/05/14-csvw-minutes.html

<scribe> ACTION: dan check in with gregg, yakov to see how rigidly they are constrained in http://doodle.com/wk24me9g99hku83s#table [recorded in http://www.w3.org/2014/05/21-csvw-minutes.html#action01]

<trackbot> Created ACTION-16 - Check in with gregg, yakov to see how rigidly they are constrained in http://doodle.com/wk24me9g99hku83s#table [on Dan Brickley - due 2014-05-28].

• Approve agenda and previous minutes: http://www.w3.org/2014/05/14-csvw-minutes.html

• http://w3c.github.io/csvw/syntax/ (Jeni)

• http://w3c.github.io/csvw/metadata/ (Rufus/Jeni)

• http://w3c.github.io/csvw/csv2rdf/ (Ivan/Gregg/Andy)

• http://w3c.github.io/csvw/use-cases-and-requirements/ (Jeremy/Davide/Eric)

last week's minutes?

<AndyS> +1 to the minutes

approved.

use cases and requirements

jtandy: summarizing, …
... the right to left use case has had a bit of work on it this week

ericstephan: thanks to yakovsh for helping with the Hebrew text

… we added a couple of things. First, a reference to languages such as Japanese and Mandarin that can be written vertically, as they don't follow the same r2l rules we've been documenting for Arabic and Hebrew

secondly, i originally had images for all of the csv

we have also put the data itself into the doc

seems like we're getting different results from different browsers. jtandy?

jtandy: the difficulty i have is that, depending on which browser/application i use to look at the original data file, it interprets the text as either r2l or l2r

Mozilla seems to handle it reasonably

is there any guidance that people who are good with ReSpec can give w.r.t. r2l vs l2r, to make sure it renders correctly (or at least deterministically)?
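The browser differences above come down to how each tool picks a base direction for mixed Hebrew/Arabic/Latin text. A minimal Python sketch, assuming the simplified "first strong character" heuristic (rules P2 and P3 of the Unicode Bidirectional Algorithm, ignoring isolates and embeddings), of how a tool might decide the direction of each CSV field; `first_strong_direction` and `field_directions` are hypothetical helpers, not part of any spec or of ReSpec:

```python
import csv
import io
import unicodedata


def first_strong_direction(text):
    """Return "rtl", "ltr" or None from the first strongly directional
    character (simplified UBA rules P2/P3; isolates are not handled)."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):  # Hebrew letters are "R", Arabic letters "AL"
            return "rtl"
    return None  # only digits/punctuation: no strong character found


def field_directions(csv_text):
    """Base direction of every cell, row by row, for a CSV string."""
    return [[first_strong_direction(cell) for cell in row]
            for row in csv.reader(io.StringIO(csv_text))]
```

For example, `field_directions("1,שלום,Tel Aviv\n")` gives `[[None, "rtl", "ltr"]]`. When rendering cells into HTML, marking each one with `dir="auto"` asks the browser to apply essentially the same first-strong rule, which may make the display less browser-dependent.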

<ericstephan> http://w3c.github.io/csvw/use-cases-and-requirements/index.html#UC-SupportingRightToLeftDirectionality

ivan: not sure, but i'm trying the same file in Chrome and Firefox

jtandy: when i'm editing the source i use Oxygen XML, and it displays everything backwards

ivan: I think that's correct w.r.t. the source xml

jtandy: yes, just confusing for some of us!

ivan: point is, at the moment it works ok.

jtandy: what would help eric finish these: once the UC is in a state he's happy with, get a final review from a content expert

ivan: we have a w3c team member (Shadi) who happens to be Egyptian, I could ask him to take a look

<yakovsh> yes

…and Yakov checked the Hebrew part

eric: as i was including the data, … these reference csv files on an external website; would it be useful to have someone make a local archive copy?

(github?)

ivan: +1

… add them locally in the w3c github tree

jtandy: that's what i've been using; embedding examples in the text of the html, but also giving people the full csv file via a local copy in github

ivan: maybe a good idea to have a small separate index page somewhere for these

eric: could make sense for use case team to do this

ivan: in the Egyptian version, 3rd row, i see Arabic in the left-most field, for example

… check later w/ Shadi

<scribe> ACTION: ericstephan have a small separate index page for csv use case sources [recorded in http://www.w3.org/2014/05/21-csvw-minutes.html#action03]


eric: review of use cases

<ericstephan> http://w3c.github.io/csvw/use-cases-and-requirements/index.html

… and the requirements

jtandy: more comments on specific UCs

… thanks to Ivan for the intro to Liam; … the 1st UC (digital preservation) used XML as an interim step, so we are incorporating Liam's perspective there

… also, given that there are people using xslt, it is good to acknowledge their requirements, so we now have a csv2xml requirement

… the other intro was with HL7; need to follow up on that.

jtandy: biodiversity UC requires some work from me to make it a more action-oriented story. I wanted to raise the review of requirements work that Davide is handling.

<ericstephan> https://www.w3.org/2013/csvw/wiki/Use_Cases_Check

eric: regarding managing and finalizing the UCs, we've had a number of recent updates, so i've made a wiki page where contributors can do some kind of review.

… can comment here, verify links etc.

… felt useful given that we have 22 UCs

davide: no specifics to add

danbri: relationship between this and the RDF mapping work?

csv2rdf

ivan: Andy and I did quite a lot of work and got to a point where we have a friendly disagreement, ...

… it should be the use cases that decide here, and tell us which is wrong/right.

… we have defined a scheme for generating rdf

the way it's done now, translating it to generate xml or json would be relatively easy; it is not particularly specific to rdf or json.

it uses the metadata fields that are defined in the metadata doc

it systematically goes through each row and works out what needs to be generated for that row

we discussed at some point that a purely mechanical mapping is not enough

e.g. regex conversions, simple replacement templates with field names, ...
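To make the "mechanical" option concrete: a minimal Python sketch, assuming a hypothetical metadata dictionary (the keys `base`, `row_subject` and `columns` are invented for illustration, not taken from the metadata document), of a row-by-row conversion that emits one subject per row as N-Triples, with a simple `{column}` cell-level template for the subject IRI:

```python
import csv
import io
import re

# Hypothetical metadata; the keys are illustrative, not the metadata vocabulary.
METADATA = {
    "base": "http://example.org/station/",
    "row_subject": "{id}",  # cell-level template used to mint the subject IRI
    "columns": {
        "id": None,  # only used inside the subject template
        "name": "http://example.org/ns#name",
        "temp": "http://example.org/ns#temperature",
    },
}


def fill(template, row):
    """Replace each {column} placeholder with that column's cell value."""
    return re.sub(r"\{(\w+)\}", lambda m: row[m.group(1)], template)


def escape_literal(value):
    """Escape a string for use inside an N-Triples "..." literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(csv_text, metadata):
    """One subject per row, one triple per non-empty mapped cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<" + metadata["base"] + fill(metadata["row_subject"], row) + ">"
        for column, predicate in metadata["columns"].items():
            if predicate is None or row[column] == "":
                continue  # skip unmapped columns and empty cells
            triples.append(f'{subject} <{predicate}> "{escape_literal(row[column])}" .')
    return triples
```

Cell values placed into the subject IRI are not percent-encoded in this sketch, and every value becomes a plain string literal; nested structures and datatypes are out of scope.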

ivan: the q is whether this is enough. if not, what else should we do when converting?

… one way is to say that we as a group stop at this point and rely on some external processing, dependent on the format we use

… e.g. xslt or sparql processing

[danbri: .js ? ]

ivan: the other alternative (not mutually exclusive) is that we define a more complex templating language/mechanism which essentially …

… uses a skeleton output containing macro-like / template-like constructs, which in the simplest case are replacements. In more complex cases, … [not sure yet]

… a kind of template, shape language
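For contrast, the simplest form of the skeleton approach (replacements only) might look like the following Python sketch. It uses the standard library's `string.Template`; the Turtle skeleton, the example.org IRIs and the `expand` helper are hypothetical:

```python
import csv
import io
from string import Template

# Hypothetical publisher-supplied skeleton: the output shape is written once and
# every ${column} placeholder is replaced with the current row's cell value.
SKELETON = Template(
    "<http://example.org/reading/${id}> a <http://example.org/ns#Reading> ;\n"
    "    <http://example.org/ns#station> <http://example.org/station/${station}> ;\n"
    '    <http://example.org/ns#value> "${value}" .\n'
)


def expand(csv_text, skeleton):
    """Instantiate the skeleton once per row (no escaping, no conditionals)."""
    return [skeleton.substitute(row) for row in csv.DictReader(io.StringIO(csv_text))]
```

`Template.substitute` raises `KeyError` when a placeholder has no matching column. Anything beyond plain substitution (conditionals, loops, value conversion) is where the real design effort of a template language would lie.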

ivan: and so the q is whether we really need this kind of thing or not

… for example, in one of the use cases, … is the generated rdf a very simple one that can be mapped directly from the content, or does it have a more complex shape?

or is the use case handled by referring to a sparql engine, xslt, etc.?

so let's try to ground this in use cases.

what exactly the json or rdf or xml looks like once it is generated, and how far we have to go in the general standards

andys: you characterized the algorithmic approach as the easiest way to get conversions defined

… i'm not convinced that there is quite so much sharing between the different languages (rdf,xml,json)

…and you need to combine various fragments appropriately per-syntax

i don't see that we have examples where this simple, fairly mechanical conversion is exactly what people will find acceptable/useful

i'm not sure those people are programmers

defining the shape of the rdf should come from the publisher side, not be a task that the data consumer has to undertake

what i've found looking at some conversions we've done - you have to sit down w/ the csv file. Even with such metadata as we spec, there's a lot of higher level info that you'll also want to expose to make the exercise worthwhile.

andys: q for jeni, who said she had some requirements to share, which i've asked for

jtandy: summarizing what was being said there to check my understanding:

…is that a mechanistic row-by-row conversion, simply using info from the metadata vocab, with no templating?

ivan: not exactly. it uses the metadata, but can also add simple local templating

i.e. cell-level templating

andys: … one subject per row (no nested structures)

jtandy: in order to get those nested structures, we'd need extra structure

andys: but that's not our proposal

jtandy: if i interpret the inputs or requirements from my scientific colleagues: they're not programmers but data managers and data processing people.

so the more hoops they must jump through, the more likely they'll do something … random

so i'd like a mechanism that asks for a reasonable amount of thinking to be expressed as a template

some people try to create conversions by rote, adapting previous examples

ivan/andys: makes sense

<ericstephan> +1 Jtandy

andys: the metadata only conversion could be defined by using the metadata available to auto-generate a template

<DavideCeolin> +1

it would be great if that template could be exposed by the tools -> learning by seeing
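Andy's suggestion can be sketched by deriving the skeleton from column metadata, so that a tool could show the generated template to the user before running it. A minimal Python sketch; the column description fields (`name`, `role`, `predicate`) and the `template_from_metadata` helper are assumptions for illustration:

```python
# Hypothetical column descriptions (field names are illustrative only).
COLUMNS = [
    {"name": "id", "role": "subject"},
    {"name": "name", "predicate": "http://example.org/ns#name"},
    {"name": "temp", "predicate": "http://example.org/ns#temperature"},
]


def template_from_metadata(columns, base):
    """Build a ${column}-style Turtle skeleton that a tool could display,
    and that the user could then copy and edit by hand."""
    subject = next(c["name"] for c in columns if c.get("role") == "subject")
    mapped = [c for c in columns if "predicate" in c]
    lines = [f"<{base}${{{subject}}}>"]
    for i, col in enumerate(mapped):
        terminator = " ." if i == len(mapped) - 1 else " ;"
        lines.append(f'    <{col["predicate"]}> "${{{col["name"]}}}"{terminator}')
    return "\n".join(lines)


print(template_from_metadata(COLUMNS, "http://example.org/station/"))
# <http://example.org/station/${id}>
#     <http://example.org/ns#name> "${name}" ;
#     <http://example.org/ns#temperature> "${temp}" .
```

The design choice here is that the metadata-only conversion and the template path share one mechanism, which would avoid the gap between the two that is raised next.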

jtandy: concur

andys: a bit worried the algorithmic approach might end up with capabilities that aren't possible in the template

… and the gap between a naive and a complex conversion could be too large

andys: concern we're creating large work items for the WG

<ericstephan> +q

ivan: yup

jtandy, is your q answered?

ivan: jeni mentioned that working with templating languages, she always ended up needing conditionals

… intellectually very challenging, but a major undertaking

andys: i don't know exactly what requirements she had, but could be simple or complex

in terms of defining languages, it is helpful that people are pointing them out

i did a quick survey of templating languages

several specifically target HTML output

(andys, url for your notes?)

andys: there are also _lots_


<Zakim> danbri, you wanted to mention https://developers.google.com/webmasters/business-location-pages/schema.org-examples

eric: are we talking about templating languages or perhaps design patterns?

ivan: are you thinking about the UCs they have?

eric: e.g. i'm starting to work with Bernadette in the data best practices wg

… how data lives on the web, how it is used, ...

Bernadette brought up design patterns

e.g. for table based data, csv.

danbri: can you share some links?

eric: yup, will do

<scribe> ACTION: ericstephan share links for Best Practices WG discussion of design patterns [recorded in http://www.w3.org/2014/05/21-csvw-minutes.html#action04]

<trackbot> Created ACTION-18 - Share links for best practices wg discussion of design patterns [on Eric Stephan - due 2014-05-28].

yakovs: regarding templating languages, we should bear in mind security

even xslt can make a huge mess

(c.f. http://xkcd.com/327/ )

yakovs: we should mention security considerations for untrusted templates

yakovs: re the overall discussion, … rather than us defining canonical transformations, we have certain guidelines/overview, and these 3 things (rdf, xml, [json]) serve as examples of that

… but they don't preclude other targets

andys: can you put this on the mailing lists?

ivan: my goal at least was that the algorithmic/mechanical description that i produced should be essentially repeatable for some other language

<yakovsh> ACTION: yakovsh share with the mailing lists the information about security aspects [recorded in http://www.w3.org/2014/05/21-csvw-minutes.html#action05]

<trackbot> Created ACTION-19 - Share with the mailing lists the information about security aspects [on Yakov Shafranovich - due 2014-05-28].

… andy rightly notes that there will be specificities for json, xml, etc.; but there is a generic component

ivan: that statement would also be true for a general templating mechanism as well

<yakovsh> ACTION: yakovsh share with the mailing list thoughts on generic guidelines and templates serving as examples [recorded in http://www.w3.org/2014/05/21-csvw-minutes.html#action06]

<trackbot> Created ACTION-20 - Share with the mailing list thoughts on generic guidelines and templates serving as examples [on Yakov Shafranovich - due 2014-05-28].

ivan: … not only format specific but syntax specific (a distinction that matters for RDF)
... regarding security, … the mechanical part is not fully secure either. If you allow regular expressions, … there is scope for trickiness.

… so valid for both.

<yakovsh> http://msdn.microsoft.com/en-us/magazine/ff646973.aspx

<yakovsh> example of a regex ddos

andys: xslt can call out to your file system, for example.

jtandy: to ivan - you mentioned you don't like templates including specific syntaxes. But I find it hard to avoid wanting to say 'my rdf should look like this…'

ivan: the rdf case is different from json

… json is defined and described as a syntax

… rdf is defined as a set of triples with variety of serializations

ivan: extra difficulty: if you give a template in mockup turtle but need the output in json-ld, then the system has to be able to parse the turtle and serialize json-ld
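Ivan's point is that a Turtle-shaped template fixes one syntax, while the RDF itself is only a set of triples. A minimal Python sketch of keeping the generated triples abstract and serializing the same graph two ways (N-Triples and JSON-LD expanded form). It is hand-rolled for self-containment, handles plain string literals only, and the data and IRIs are hypothetical; in practice an RDF library would do the parsing and serializing:

```python
import json
from collections import defaultdict

# One abstract graph as (subject IRI, predicate IRI, plain string literal).
TRIPLES = [
    ("http://example.org/station/S1", "http://example.org/ns#name", "Alpha"),
    ("http://example.org/station/S1", "http://example.org/ns#temperature", "12.5"),
]


def escape_literal(value):
    """Escape a string for an N-Triples "..." literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(triples):
    """Line-based syntax: one triple per line."""
    return "\n".join(f'<{s}> <{p}> "{escape_literal(o)}" .' for s, p, o in triples)


def to_expanded_jsonld(triples):
    """Same graph in JSON-LD expanded form: one node object per subject."""
    nodes = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        nodes[s][p].append({"@value": o})
    return json.dumps([{"@id": s, **props} for s, props in nodes.items()], indent=2)
```

Because both serializers read the same list of triples, the choice of output syntax is independent of how the triples were produced, whereas a template written directly in Turtle would first have to be parsed back into triples.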

<yakovsh> regarding security, i think there is a review process for w3c specs here although I am not familiar with it: http://www.w3.org/Security/wiki/IG/W3C_spec_review

jtandy: i got Ivan's point, but ended up where andy was: there are conversion tools.

i expect format mismatch to be an unlikely problem

andys: i'm hoping the template comes from the publisher, not only consumer

e.g. temperature reading UC

jtandy: allowing consumer to choose from templates

danbri: Editors: what more do you need from WG members?

ivan: at least personally, i'd like to see something more about templating languages

that won't lead to 3 years of R&D

andys: so gregg has written it up
... we need to ground decisions in expectations; discussions are amongst a small core of people who turn up for the calls. It is a significant issue that you can get so far down the road and then get a different reception from the wider audience.

davide: need more feedback/input

NOTE: for next week's call, let's decide later this week what we're doing with timing. See mailing list for exact timing choice.

<ericstephan> Dropping off now, have a good week! I am on travel next week.

<AndyS> ADJOURNED

AndyS, got a minute to talk more in irc re http://w3c.github.io/csvw/csv2rdf/ ?

<ivan> trackbot, end telcon

<AndyS> A few minutes ...
