See also: IRC log
<trackbot> Date: 21 May 2014
waiting for more people to join
prev minutes: http://www.w3.org/2014/05/14-csvw-minutes.html
<scribe> ACTION: dan check in with gregg, yakov to see how rigidly they are constrained in http://doodle.com/wk24me9g99hku83s#table [recorded in http://www.w3.org/2014/05/21-csvw-minutes.html#action01]
<trackbot> Created ACTION-16 - Check in with gregg, yakov to see how rigidly they are constrained in http://doodle.com/wk24me9g99hku83s#table [on Dan Brickley - due 2014-05-28].
• Approve agenda and previous http://www.w3.org/2014/05/14-csvw-minutes.html
• http://w3c.github.io/csvw/syntax/ (Jeni)
• http://w3c.github.io/csvw/metadata/ (Rufus/Jeni)
• http://w3c.github.io/csvw/csv2rdf/ (Ivan/Gregg/Andy)
• http://w3c.github.io/csvw/use-cases-and-requirements/ (Jeremy/Davide/Eric)
last week's minutes?
<AndyS> +1 to the minutes
jtandy: summarizing, …
... the right to left use case has had a bit of work on it this week
ericstephan: thanks to yakovsh for helping with the hebrew text
… we added a couple of things. First a ref to some languages e.g. japanese, mandarin, that represent vertically, as they don't follow the same r2l rules we've been documenting for arabic and hebrew
secondly, i had images originally for all of the csv
we put also the data into the doc
seems like we're getting diff results from diff browsers. jtandy?
jtandy: i think that the difficulty i have, depending on which browser/application i use to look at orig data file, it tries to interpret r2l or l2r
mozilla seems to do it reasonably
is there any guidance that people who are good with ReSpec can give w.r.t. using r2l or l2r, to make sure it prints correctly (or at least in a determined way)
ivan: not sure, but trying same file in Chrome and Firefox
jtandy: when i'm editing the source i use Oxygen XML, it puts everything backwards
ivan: I think that's correct w.r.t. the source xml
jtandy: yes, just confusing for some of us!
ivan: point is, at the moment it works ok.
jtandy: what would be helpful for eric finishing these - once the UC is in a state he's happy with, to get final review from a content expert
ivan: we have a w3c team member (Shadi) who happens to be Egyptian, I could ask him to take a look
…and Yakov checked the Hebrew part
eric: as i was including the data, … these are referencing csv files on a website, would it be useful to have someone make a local archive copy
… add locally in w3c via github tree
jtandy: that's what i've been using; embedding examples in the text of the html, but giving people the full csv file as well via local copy in github
ivan: maybe good idea to have a small separate index page somewhere for these
eric: could make sense for use case team to do this
ivan: in the egyptian version, 3rd row, i see arabic in the left-most field for example
… check later w/ shadi
<scribe> ACTION: ericstephan have a small separate index page for csv use case sources [recorded in http://www.w3.org/2014/05/21-csvw-minutes.html#action03]
hm, ivan help?
eric: reviewing of use cases
… and the requirements
jtandy: more comments on specific UCs
… thanks to Ivan for intro with Liam, … 1st UC (dig. preservation) used XML as an interim step, so we are incorporating Liam's perspective there
… also that there are people using xslt, it is good to ack their requirements - so we now have a csv2xml requirement
…the other intro was with HL7, need to follow up on that.
jtandy: biodiversity UC requires some work from me to make it a more action-oriented story. I wanted to raise the review of requirements work that Davide is handling.
eric: regarding management of UCs, finalizing, we've had a number of recent updates, so i've made a wiki page for the contributors to have some kind of a review.
… can comment here, verify links etc.
… felt useful given that we have 22 UCs
davide: no specifics to add
danbri: relationship between this and the RDF mapping work?
ivan: we did quite a lot of work, Andy and I, got to a point where we have a friendly disagreement, ...
… a use case that should win here, and tell which is wrong/right.
… we have defined a scheme for generating rdf
the way it's done now. translating it to generating xml or json would be relatively easy; it is not particularly rdf or json-specific.
it uses the metadata fields that are defined in the metadata doc
systematically goes through each row and figures out what is needed per-row
we discussed at some point that a purely mechanical mapping is not enough
e.g. regex conversions, simple replacement templates with field names, ...
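[Editorial sketch, not from the call: a minimal illustration of the row-by-row scheme ivan describes, assuming a metadata dictionary that maps column names to predicate URIs and gives a simple replacement template for the row subject. The METADATA layout, the example.org URIs and the function name rows_to_triples are all hypothetical and are not taken from the metadata or csv2rdf drafts.]

```python
import csv
import io

# Hypothetical metadata (assumed for illustration, not the WG vocabulary):
# a base URI, a replacement template for the row subject, and one
# predicate URI per column name.
METADATA = {
    "base": "http://example.org/obs/",
    "subject_template": "{station}-{date}",
    "columns": {
        "station": "http://example.org/def/station",
        "date": "http://example.org/def/date",
        "temp": "http://example.org/def/temperature",
    },
}


def rows_to_triples(csv_text, metadata):
    """Return a list of (subject, predicate, object) tuples, one subject per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for row in reader:
        # Simple replacement template: field names in braces are filled
        # from the current row.
        subject = metadata["base"] + metadata["subject_template"].format(**row)
        for name, predicate in metadata["columns"].items():
            value = row.get(name)
            if value:  # skip empty or missing cells
                triples.append((subject, predicate, value))
    return triples


SAMPLE = "station,date,temp\nEXE,2014-05-21,14.2\nLHR,2014-05-21,\n"
TRIPLES = rows_to_triples(SAMPLE, METADATA)
```

[The sketch yields one flat subject per row and no nested structures. Nothing in the loop is RDF-specific: the same pass over rows could emit JSON objects or XML elements instead of tuples.]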
ivan: q is whether this is enough or not? if not, what else should we do when converting?
… one way is to say that we as a group stop at this point, rely on some external processing, dependent on the format we use
… rely on some xslt, sparql processing,
[danbri: .js ? ]
ivan: the other alternative (not mutually exclusive) is that we define a more complex templating language/mechanism which essentially …
… uses a skeleton output, and in there you have macro-like / template-like things, which are in the simplest case, replacements. In more complex case, … [not sure yet]
… a kind of template, shape language
ivan: and so the q is whether we really need this kind of thing or not
… when for example in one of the use cases, … is the generated rdf a v simple one that can be mapped directly from the content. Or does it have a more complex shape?
or is the use case handled by referring to a sparql engine, xslt, etc
so let's try to ground this in use cases.
what exactly the json or rdf or xml looks like once it is generated, and how far we have to go in the general standards
andys: you characterized the algorithmic approach as the easiest way to get conversions defined
… i'm not convinced that there is quite so much sharing between the different languages (rdf,xml,json)
…and you need to combine various fragments appropriately per-syntax
I don't see that we have examples where this simple, fairly mechanical conversion, is exactly what people will find acceptable/useful
i'm not sure those people are programmers
defining the shape of the rdf, should come from the publisher side, not be a task that the data consumer has to undertake
what i've found looking at some conversions we've done - you have to sit down w/ the csv file. Even with such metadata as we spec, there's a lot of higher level info that you'll also want to expose to make the exercise worthwhile.
andys: q for jeni, who said she had some requirements to share, which i've asked for
jtandy: summarizing what was being said there to check my u/standing:
…that mechanistic row by row conversion, simply using info from the metadata vocab, no templating?
ivan: not exactly. uses metadata, but can also add simple local templating
ie. can add cell level templating
andys: … one subject per row (no nested structures)
jtandy: in order to get those nested structures, we'd need extra structure
andys: but that's not our proposal
jtandy: if i was to interpret inputs or requirements from my scientific colleagues; they're not programmers. data managers, data processing people.
so the more hoops they must jump through, the more likely they'll do something … random
so i'd like a mechanism that asks for a reasonable amount of thinking to be expressed as a template
some people try to create conversions by rote, adapting previous examples
ivan/andys: makes sense
<ericstephan> +1 Jtandy
andys: the metadata only conversion could be defined by using the metadata available to auto-generate a template
it would be great if that template could be exposed by the tools -> learning by seeing
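[Editorial sketch, not from the call: one way the idea of auto-generating a template from the metadata could look, so that the metadata-only conversion becomes a default template a publisher can read and then adapt. The function name default_template, the Turtle-like output shape and the example.org URIs are assumptions made for illustration only.]

```python
def default_template(columns, subject_pattern):
    """Build a Turtle-like template string from column metadata alone.

    `columns` maps column names to predicate URIs (hypothetical layout).
    Each column becomes one predicate/object line with a {name} placeholder.
    """
    lines = ["<" + subject_pattern + ">"]
    items = list(columns.items())
    for i, (name, predicate) in enumerate(items):
        # Last statement ends with '.', the others with ';'.
        end = " ." if i == len(items) - 1 else " ;"
        lines.append('    <%s> "{%s}"%s' % (predicate, name, end))
    return "\n".join(lines)


COLUMNS = {
    "station": "http://example.org/def/station",
    "temp": "http://example.org/def/temperature",
}
TEMPLATE = default_template(COLUMNS, "http://example.org/obs/{station}")

# Filling the generated template for one row uses plain replacement.
RENDERED = TEMPLATE.format(station="EXE", temp="14.2")
```

[Because the generated template is ordinary text, a tool could show it to the user as a starting point, which is the "learning by seeing" effect mentioned above.]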
andys: a bit worried the algorithmic approach might end up with capabilities that aren't possible in the template
… and the gap between a naive and complex conversion could be too large
andys: concern we're creating large work items for the WG
jtandy, is your q answered?
ivan: jeni mentioned that when working with templating languages, she always ended up needing conditionals
… intellectually v challenging but a major undertaking
andys: i don't know exactly what requirements she had, but could be simple or complex
in terms of defining languages, helpful that people are pointing them out
i did a quick survey of templating languages
several specifically target HTML output
(andys, url for your notes?)
andys: there are also _lots_
<ericstephan> are we talking about templating languages or perhaps design patterns?
<Zakim> danbri, you wanted to mention https://developers.google.com/webmasters/business-location-pages/schema.org-examples
eric: are we talking about templating languages or perhaps design patterns?
ivan: are you thinking about the UCs they have?
eric: e.g. i'm starting to work with Bernadette, data best practices wg
… how data lives on the web, how it is used, ...
Bernadette brought up design patterns
e.g. for table based data, csv.
danbri: can you share some links?
eric: yup, will do
<scribe> ACTION: ericstephan share links for Best Practices WG discussion of design patterns [recorded in http://www.w3.org/2014/05/21-csvw-minutes.html#action04]
<trackbot> Created ACTION-18 - Share links for best practices wg discussion of design patterns [on Eric Stephan - due 2014-05-28].
yakovs: regarding templating languages, we should bear in mind security
even xslt can make a huge mess
(c.f. http://xkcd.com/327/ )
yakovs: we should mention security considerations for untrusted templates
yakovs: re overall discussion, … not us defining canonical transformations, we have certain guidelines/overview, these 3 things (rdf, xml, [json]) serve as examples of that
… but they're not exclusive to having other targets
andys: can you put this on the mailing lists?
ivan: my goal at least was that the algorithmic/mechanical description that i produced, should be essentially repeatable on some other language
<yakovsh> ACTION: yakovsh share with the mailing lists the information about security aspects [recorded in http://www.w3.org/2014/05/21-csvw-minutes.html#action05]
<trackbot> Created ACTION-19 - Share with the mailing lists the information about security aspects [on Yakov Shafranovich - due 2014-05-28].
… andy rightly notes that there will be specificities for json, xml, etc.; but there is a generic component
ivan: that statement would also be true for a general templating mechanism as well
<yakovsh> ACTION: yakovsh share with the mailing list thoughts on generic guidelines and templates serving as examples [recorded in http://www.w3.org/2014/05/21-csvw-minutes.html#action06]
<trackbot> Created ACTION-20 - Share with the mailing list thoughts on generic guidelines and templates serving as examples [on Yakov Shafranovich - due 2014-05-28].
ivan: … not only format-specific but syntax-specific (a distinction that matters for RDF)
... regarding security, … mechanical part is not fully secure. If you allow regular expressions, … scope for trickiness.
… so valid for both.
<yakovsh> example of a regex ddos
andys: xslt can call out to your file system, for example.
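[Editorial sketch, not from the call: a minimal illustration of one precaution for untrusted templates, assuming templates are limited to {name} placeholders that must match a column of the current row. The function name render_untrusted and the placeholder syntax are hypothetical. The sketch deliberately avoids Python's str.format, whose replacement fields can reach attributes and indexes of the supplied values (for example {station.__class__}).]

```python
import re

# Both patterns are simple and free of nested quantifiers, so matching
# time stays proportional to the input length.
PLACEHOLDER = re.compile(r"\{([^{}]*)\}")
SIMPLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def render_untrusted(template, row):
    """Fill {name} placeholders from `row`, rejecting anything else."""
    def substitute(match):
        name = match.group(1)
        # Only plain identifiers that are actual column names are allowed.
        if not SIMPLE_NAME.fullmatch(name) or name not in row:
            raise ValueError("placeholder not allowed: %r" % name)
        return str(row[name])
    return PLACEHOLDER.sub(substitute, template)


ROW = {"station": "EXE", "temp": "14.2"}
OK = render_untrusted("obs/{station}/{temp}", ROW)

# A template that tries to reach into object internals is refused.
try:
    render_untrusted("{station.__class__}", ROW)
    REJECTED = False
except ValueError:
    REJECTED = True
```

[This covers only placeholder substitution. It says nothing about user-supplied regular expressions or XSLT, which raise the separate concerns noted above.]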
jtandy: to ivan - you mentioned you don't like templates including specific syntaxes. But I find it hard to avoid wanting to say 'my rdf should look like this…'
ivan: the rdf case is different than json
… json is defined and described as a syntax
… rdf is defined as a set of triples with variety of serializations
ivan: extra difficulty, if you give a template in mockup turtle, but if you need output in json-ld, then that system has to be able to parse the turtle and serialize json-ld
<yakovsh> regarding security, i think there is a review process for w3c specs here although I am not familiar with it: http://www.w3.org/Security/wiki/IG/W3C_spec_review
jtandy: i got Ivan's point, ended up where andy was; that there are conversion tools.
i expect format mismatch to be an unlikely problem
andys: i'm hoping the template comes from the publisher, not only consumer
e.g. temperature reading UC
jtandy: allowing consumer to choose from templates
danbri: Editors: what more do you need from WG members?
ivan: at least personally, i'd like to see something more about templating languages
that won't lead to 3 years R'n'D
andys: so gregg has written it up
... we need to ground decisions in expectations; discussions are amongst a small core of ppl who turn up for the calls. Significant issue that you can get so far down the road, and then getting diff reception from wider audience.
davide: need more feedback/input
NOTE: for next week's call, let's decide later this week what we're doing with timing. See mailing list for exact timing choice.
<ericstephan> Dropping off now, have a good week! I am on travel next week.
AndyS, got a minute to talk more in irc re http://w3c.github.io/csvw/csv2rdf/ ?
<ivan> trackbot, end telcon
<AndyS> A few minutes ...