12:02:15 RRSAgent has joined #csvw
12:02:15 logging to http://www.w3.org/2014/09/10-csvw-irc
12:02:17 RRSAgent, make logs public
12:02:17 Zakim has joined #csvw
12:02:19 Zakim, this will be CSVW
12:02:19 ok, trackbot, I see DATA_CSVWG()8:00AM already started
12:02:20 Meeting: CSV on the Web Working Group Teleconference
12:02:20 Date: 10 September 2014
12:02:56 yakovsh has joined #csvw
12:03:36 +[IPcaller]
12:03:59 danbri_ has joined #csvw
12:04:04 + +44.207.346.aabb
12:04:46 fresco has joined #csvw
12:05:18 hmm
12:05:30 scribenick: danbri
12:05:53 Jeni: discuss a few issues raised over the last month or so. Some explanation re next week's "special call" on our templating decisions
12:06:18 +??P11
12:06:20 ... I'll be asking around for F2F, looking for volunteers to lead sessions. With only 2 chairs + 2 participants today we're not quorate for decisions, but can have a bit of discussion.
12:06:45 https://www.w3.org/2013/csvw/wiki/Meeting_Agenda_2014-09-10
12:07:01 topic: type vs datatype
12:07:02 +[IPcaller]
12:07:11 zakim, IPCaller is me
12:07:11 +AndyS; got it
12:07:21 https://github.com/w3c/csvw/issues/22
12:07:44 ... this is an issue from the metadata document, where we need to figure out what to call the property in the metadata that refers to the datatype of the values in a particular column
12:07:54 ... there are some constraints here from attempting to adopt json-ld
12:08:06 ... while it seems a small issue, it has some impact on the relationship w/ json-ld
12:08:22 ... json-ld, the interpretation of an 'object', ... using an @type, using just @type, and using @datatype
12:08:42 ... if we use @type, then that is interpreted specially in json-ld to be w.r.t. the type of the thing being described
12:08:51 { "@type": "Column" }
12:08:55 which in this case would be the columns rather than individual cells
12:09:15 thinks it makes sense, i mean
12:09:41 jeni: 2nd option is to just use the plain term 'type'. reason this is a little problematic, ... for other properties on columns, ...
12:09:53 ...they're generally treated as meaning whatever Dublin Core says they mean
12:10:04 e.g. we can have a col description that basically says
12:10:11 { "source": "http://example.org/" }
12:10:11 "source is some-source"
12:10:31 (and that would be interpreted ... the source property would be interpreted as meaning the same as the DC property 'source')
12:10:44 but DC has a property 'type', which isn't particularly helpful here
12:10:47 + +1.443.650.aacc
12:10:48 jtandy has joined #csvw
12:10:54 q+ to ask if json-ld context is making the DC mapping explicit
12:11:07 zakim, +1.443.650.aacc
12:11:07 I don't understand '+1.443.650.aacc', yakovsh
12:11:24 ack dan
12:11:24 danbri_, you wanted to ask if json-ld context is making the DC mapping explicit
12:11:46 jenit: yes, we'd explicitly map to dc
12:12:06 https://github.com/w3c/csvw/blob/gh-pages/metadata/csvm-context.json
12:12:07 ... in the repo for what we're doing here, ... we have got an initial start on what that context looks like
12:12:48 jenit: this came up re simple data format discussion
12:13:11 danbri: when did we decide to fold all of dublin core into our lang?
12:13:18 jenit: happy to discuss
12:13:32 danbri_: maybe we should map schema.org instead
12:13:39 … that will map on to DublinCore
12:13:51 JeniT: which set of properties?
12:14:16 danbri: "whatever we want"
12:14:25 q+
12:14:49 [speaker?]: ... not sure what the mapping in the context file is, beyond mapping col names to url
12:15:27 jeni: these aren't about mapping col names, but about mapping metadata re particular columns: the source it comes from, rights over any data, when that column was created, ... these properties could also apply at the table level, e.g. publisher of a particular CSV file.
12:15:35 s/[speaker?]/fresco/
12:15:51 fresco: do we expect usage on columns, vs cells?
12:16:02 jenit: you have a global context, ... rather than it being different for different objects
12:16:21 ... i.e. if we define all of these things so that they could apply to tables, ...
12:16:31 which would seem rational, since DC is doc-metadata-centric
12:16:50 ... the upshot of this is that they would also be used to interpret any other column, row metadata
12:17:24 danbri: we could define inline contexts
12:17:38 http://schema.org/Dataset
12:17:42 fresco: would be good to look at schema.org to see if it has what we'd need
12:17:52 (plus Organization, Person, etc etc)
12:18:09 (in github here, https://github.com/rvguha/schemaorg )
12:18:43 schema.org doesn't seem to have 'type' yet, at least. http://schema.org/type (but does keep adding stuff)
12:19:06 jenit: to take this fwd, shall I widen it out to be an issue on whether to adopt the DC set of metadata terms or the schema.org set of metadata terms?
12:19:32 jtandy: i think that would be sensible; it doesn't replace the type vs datatype issue
12:20:01 jenit: the issue changes then. the issue becomes potential confusion between 'type' and '@type', since they could both be used but have diff meanings
12:20:01 q?
12:20:04 ack jtandy
12:20:21 jtandy: on topic of schema.org, DC; if there is a clear mapping from schema.org to DC, it would appear to be a sensible way fwd
12:21:28 danbri: some terms map; but there are a few differences
12:22:50 danbri: is there enough specificity in the use cases to drive a decision on using dc, schema.org etc.?
12:22:59 ACTION: JeniT to write to mailing list re using schema.org rather than Dublin Core for metadata about CSV files, then binding decision on following telcon
12:22:59 Created ACTION-26 - Write to mailing list re using schema.org rather than dublin core for metadata about csv files, then binding decision on following telcon [on Jeni Tennison - due 2014-09-17].
12:23:20 jtandy: stating a license, stating who is responsible, ... but often use cases just say "we need publishing metadata", "this category of metadata"
12:24:36 [...]
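The type vs datatype collision discussed above can be made concrete with a toy sketch. It assumes a single global context that maps the plain term 'type' to Dublin Core (as a blanket DC mapping would) and uses a placeholder IRI for a 'datatype' property; neither mapping is taken from csvm-context.json, and the function only mimics the relevant part of JSON-LD interpretation rather than implementing the JSON-LD expansion algorithm.

```python
DCT = "http://purl.org/dc/terms/"

# Hypothetical context: NOT the actual contents of csvm-context.json.
HYPOTHETICAL_CONTEXT = {
    "source": DCT + "source",  # plain term mapped to Dublin Core
    "type": DCT + "type",      # what a blanket DC mapping would do to 'type'
    "datatype": "http://example.org/csvm#datatype",  # placeholder IRI
}


def interpret(description):
    """Report what each key of a column description would mean under the toy context."""
    meaning = {}
    for key, value in description.items():
        if key == "@type":
            # JSON-LD keyword: the type of the node being described (the column)
            meaning["@type (type of the column itself)"] = value
        elif key in HYPOTHETICAL_CONTEXT:
            meaning[HYPOTHETICAL_CONTEXT[key]] = value
        else:
            meaning["(unmapped) " + key] = value
    return meaning


column = {"@type": "Column", "source": "http://example.org/", "type": "integer"}
for prop, value in interpret(column).items():
    print(prop, "->", value)
```

Under these assumptions '@type' describes the column itself, while a plain 'type' intended as the datatype of the cells ends up as dct:type of the column, which is the confusion jenit describes.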
12:25:08 q+
12:25:13 ack yakovsh
12:25:29 yakovsh: re datatypes, maybe i'm unfamiliar with rdf, ... is there a link to what datatypes there are?
12:25:53 this? http://www.w3.org/TR/rdf11-concepts/#section-Datatypes
12:25:59 http://w3c.github.io/csvw/metadata/#datatypes
12:26:16 jenit: this assumes the standard set of w3c-designed datatypes, which came from xml schema
12:26:39 ... i.e. those most usually used within RDF, but slightly extended to include Number, Binary, ...
12:26:52 (what's binary mean in CSV?)
12:27:04 http://w3c.github.io/csvw/metadata/#datatypes
12:27:06 yakovsh: having a clear list is important
12:27:15 jenit: see 3.8.4
12:27:38 ah, "the datatype binary which is exactly equivalent to base64Binary"
12:27:48 jenit: we're extending the list here
12:27:54 doc has specific issues flagged
12:28:11 ... being consistent with simple data format, and other existing work around w3c
12:28:22 yakovsh: if that list needed to change in future, how would that work?
12:28:27 zakim, who is on the phone?
12:28:27 On the phone I see waingram, JeniT, fresco, danbri, jtandy, AndyS, yakovsh
12:28:42 yakovsh: on ietf side, we tend to worry about extensibility
12:28:45 ... things change over time
12:29:40 FYI see http://www.w3.org/2001/05/xmlschema-errata for changes in http://www.w3.org/TR/xmlschema-2/ vs earlier version
12:29:41 [...]
12:29:52 jenit: ... various ways, e.g. consider impact on validators
12:30:05 yakovsh: do we want to discuss extensibility?
12:30:16 jenit: yes, should def be part of the pattern of how we work on the standard.
12:31:12 topic: Jena CSV update
12:31:18 AndyS, any thoughts/conclusions?
12:31:50 Andy: not a lot to say. Mapping is v simple, hardcoded / built-in. Purpose of project was 2-fold: get something working in the time available (Google Summer of Code student), and not pre-judge WG decisions.
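Picking up yakovsh's questions about which datatypes exist and how the list could change over time: one way to picture an extensible list is a registry of validators in which 'binary' is simply an alias for 'base64Binary', as the draft states. The three entries below are an illustrative subset only (the authoritative list is in section 3.8.4 of the metadata draft), the lexical checks are simplified, and register() is a hypothetical extension hook, not a mechanism the group has agreed.

```python
import base64
import binascii
import re


def _is_base64(text):
    # Simplified: rejects the whitespace that XSD base64Binary tolerates.
    try:
        base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


# Illustrative subset only; see section 3.8.4 of the metadata draft for the real list.
VALIDATORS = {
    "integer": lambda text: re.fullmatch(r"[+-]?[0-9]+", text) is not None,
    "boolean": lambda text: text in {"true", "false", "1", "0"},
    "base64Binary": _is_base64,
}

# 'binary' is exactly equivalent to 'base64Binary', so it is an alias, not a new type.
ALIASES = {"binary": "base64Binary"}


def register(name, validator):
    """Hypothetical extension hook: add a datatype without redefining existing ones."""
    if name in VALIDATORS or name in ALIASES:
        raise ValueError(f"datatype {name!r} is already defined")
    VALIDATORS[name] = validator


def is_valid(datatype, text):
    """Check a cell's text against a datatype, resolving aliases first."""
    name = ALIASES.get(datatype, datatype)
    if name not in VALIDATORS:
        raise KeyError(f"unknown datatype {datatype!r}")
    return VALIDATORS[name](text)


print(is_valid("binary", "aGVsbG8="))  # True
print(is_valid("integer", "4.2"))      # False
```

Refusing to redefine an existing name is one possible answer to the validator-impact concern: new datatypes can be added, but a validator written against the current list never sees an existing type change meaning.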
12:32:05 [same update as last week except audio quality 1000x better :]
12:32:26 jenit: work done - anything that makes you feel direction here should be one thing or another?
12:32:42 andys: we didn't push on it beyond column=predicate, ...
12:32:49 goal was code rather than a research project
12:33:02 jenit: next steps with it?
12:33:09 andys: "wait" :)
12:33:42 topic: Use case 4
12:33:44 https://github.com/w3c/csvw/tree/testing-variations/examples/tests/scenarios/uc-4
12:33:49 JeniT: update from my summer investigation.
12:34:15 Jeni: UC-4 is quite interesting. About publication of information about org structures in the uk civil service
12:34:24 each dept publishes a pair of linked csv files
12:34:30 ... same schema(s)
12:34:33 ... always in pairs
12:34:43 certain places where schema is extensible
12:34:44 kinda
12:35:05 e.g. dept might have sub-groups
12:35:36 https://github.com/w3c/csvw/tree/testing-variations/examples/tests/scenarios/uc-4/attempts/attempt-1
12:35:53 ... readme throws out some of the issues for discussion
12:36:52 ...
12:37:01 special codes in front of columns
12:37:11 the way I addressed it was use of regexes
12:37:25 ... struck me afterwards, having a separate csv file could be a better design for that package of csv files
12:37:35 ... made me think about revisiting that design
12:37:47 ... some things around co-constraints across the columns
12:38:07 in one of the files, ... if the unique post ref in some column, then the job title must be not-in-post, function must be n/a, etc.
12:38:18 jenit: do we need within schema files to indicate co-constraints?
12:38:53 ... validation constraints. whether cols can be both NULL and required.
12:39:04 e.g. must have NULL value textually in the actual CSV text.
12:39:09 s/e.g./i.e./
12:39:22 jenit: working through that was v interesting in terms of highlighting specific issues
12:39:27 ... any comments/questions?
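The UC-4 validation questions (co-constraints across columns, and columns that are required yet may hold a textual NULL) can be sketched as plain checks over parsed rows. The column names, the trigger value 'XX' and the NULL token 'N/A' below are hypothetical placeholders chosen for illustration; the real UC-4 files and rules are in the attempt-1 README, and nothing here is a proposed schema syntax.

```python
import csv
import io

NULL_TOKEN = "N/A"  # hypothetical textual NULL
REQUIRED = ["Post Unique Reference", "Job Title", "Function"]  # hypothetical names

# (column, trigger value, {other column: required value}); all hypothetical.
CO_CONSTRAINTS = [
    ("Post Unique Reference", "XX", {"Job Title": "Not in post", "Function": NULL_TOKEN}),
]


def check(csv_text):
    """Return human-readable errors for required-value and co-constraint violations."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for rownum, row in enumerate(reader, start=2):  # row 1 is the header
        for col in REQUIRED:
            # Required but nullable: an empty cell fails, the NULL token passes.
            if not (row.get(col) or "").strip():
                errors.append(f"row {rownum}: {col} is required")
        for col, trigger, expected in CO_CONSTRAINTS:
            if row.get(col) == trigger:
                for other, value in expected.items():
                    if row.get(other) != value:
                        errors.append(
                            f"row {rownum}: {other} must be {value!r} when {col} is {trigger!r}"
                        )
    return errors


sample = (
    "Post Unique Reference,Job Title,Function\n"
    "1001,Analyst,Finance\n"
    "XX,Analyst,N/A\n"
    "1002,,HR\n"
)
for error in check(sample):
    print(error)
```

Keeping the textual NULL distinct from an empty cell is what lets a column be both "required" and "NULL": the requirement is that something is written, not that it is a real value.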
12:39:36 q+
12:39:40 ack jtandy
12:39:51 jtandy: I agree that working through the specific examples is a great way to learn how this stuff works
12:40:04 looking at what you did, ... I see under Attempt 1 that you have the 2 target JSON files, ...
12:40:10 the ones you created by running through some form of parser
12:40:20 and you also have a metadata.json that's fairly trivial for now
12:40:32 some of your issues from README.md, e.g. dealing with null, conditionals, ...
12:41:10 ... do you anticipate an attempt-2 with an approach to some of those issues?
12:41:29 jenit: I realise I didn't commit all the data
12:41:42 ... the bit I wanted to focus on was how to manage the packages, where you want a set of CSV files to conform
12:41:49 ... and also how they could be better split out
12:41:52 ... be better structured
12:41:57 if you had that kind of metadata
12:42:08 my observation from these files is that they are extremely flat, repetitious
12:42:34 wanted to see how it'd look in an ideal world where we had this csv on the Web approach; in which case, how might they be publishing it differently in such a new world
12:43:11 ... didn't want to stray too far from what's in the metadata spec
12:43:46 q+ to ask about moving these stuff into common github branch - does the structure w/ 'attempts' more or less work for us?
12:44:01 jenit: lots of things not yet agreed, so just exploring
12:44:17 jtandy: in processing these things, have you been able to create any targets; any of the transformed content?
12:44:28 jenit: my focus has been more on validation rather than transformation
12:44:45 https://github.com/w3c/csvw/tree/testing-variations/examples/tests/scenarios/uc-4/output
12:44:48 jenit: within this particular use case there is an output ...
12:45:05 ... which gives RDF in particular formats
12:45:12 ... could that be generated with the metadata?
12:45:24 what would you need to do to get that RDF from those CSV files + metadata?
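One baseline answer to "what would you need to get RDF from the CSV files plus metadata" is the simple column=predicate mapping AndyS described for the Jena work. The sketch below is an independent illustration of that idea, not Jena's code: each row becomes a subject, each non-empty cell becomes an N-Triples statement whose predicate is derived from the column name, and the base IRI is a hypothetical placeholder. A flat mapping like this is unlikely to reproduce the richer RDF in uc-4/output.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/uc4/"  # hypothetical base IRI


def literal(value):
    """Serialise a string as an N-Triples plain literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def csv_to_ntriples(csv_text):
    """column=predicate: one subject per row, one triple per non-empty cell."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for rownum, row in enumerate(reader, start=1):
        subject = f"<{BASE}row/{rownum}>"
        for column, value in row.items():
            if column is None or not value:
                continue  # skip overflow fields and empty cells
            predicate = f"<{BASE}column/{quote(column, safe='')}>"
            triples.append(f"{subject} {predicate} {literal(value)} .")
    return "\n".join(triples)


print(csv_to_ntriples("Post Unique Reference,Job Title\n1001,Analyst\n"))
```

Everything the mapping knows comes from the header row, which is why it needs no metadata at all; anything beyond it (typed literals, links between the paired files, nested structure) is where metadata or templating would have to contribute.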
12:45:27 + maybe templating
12:45:31 1st piece def needs templating
12:45:38 ... but could the packaging be restructured?
12:45:47 q?
12:45:53 ack danbri_
12:45:53 danbri_, you wanted to ask about moving these stuff into common github branch - does the structure w/ 'attempts' more or less work for us?
12:46:39 danbri: filetree ok?
12:46:48 jenit: suggest we roll it in
12:47:24 resolved: sure, whatever.
12:47:27 :)
12:47:46 jeni: AOB on that example?
12:48:09 (oh, forgot to scribe: earlier Jeni confirmed that the files in output/ can be treated as 'golden triples' for template mapping experiments)
12:48:15 topic: Templates
12:48:32 jeni: we need to decide asap on a course of action w.r.t. whether and how we describe a templating format.
12:48:44 Whether we make it an extension, whether it be done at all, etc.
12:48:52 http://lists.w3.org/Archives/Public/public-csv-wg/2014Sep/0006.html
12:49:14 ... will have a special call next week (Weds as usual), attempting to make a resolution on this.
12:49:21 If we can't get consensus on this, we'll defer until f2f.
12:49:44 AndyS: please go ahead
12:49:52 q+
12:50:04 AndyS: I'm completely confused by the area and can't make the f2f
12:50:12 AndyS: Is there a f2f attendee list?
12:50:23 jenit: not afaik
12:50:46 danbri: we should all register for TPAC (which involves fee etc)
12:50:56 q+
12:50:57 ACTION: JeniT to get Ivan to send round reminder re TPAC and to create attendee list
12:50:57 Created ACTION-27 - Get ivan to send round reminder re tpac and to create attendee list [on Jeni Tennison - due 2014-09-17].
12:51:10 andys: depending on UK trains, I could possibly be at next week's call
12:51:19 jenit: if you can join that's great; otherwise please let's use the mailing list
12:51:30 q?
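For readers unsure what "describing a templating format" would involve, here is about the smallest possible per-row template. It uses Python's standard string.Template purely as a stand-in for whatever format the group might choose (or decline to define); the column names and IRIs are hypothetical. Plain substitution does no escaping, so a value containing a double quote would produce broken output, which is the kind of detail a CSV-specific format would have to settle.

```python
import csv
import io
from string import Template

# Hypothetical template: $post_ref and $job_title are column names in the input CSV.
ROW_TEMPLATE = Template(
    '<http://example.org/post/$post_ref> <http://example.org/jobTitle> "$job_title" .'
)


def render(csv_text, template):
    """Apply the template to every row; raises KeyError if a placeholder has no column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # No escaping: cell values are pasted into the output verbatim.
    return [template.substitute(row) for row in reader]


for line in render("post_ref,job_title\n1001,Analyst\n", ROW_TEMPLATE):
    print(line)
```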
12:51:32 ack jtandy
12:52:08 jtandy: key point re templating q is the balance between applying more resources to create an additional recommendation (the templating lang) vs a standard that might not be as powerful as we hoped
12:52:11 jenit: that's roughly it
12:52:25 q+ re balance
12:53:01 jeni/jtandy - looking at use cases important
12:53:21 ACTION: jtandy to survey use cases re requirement for templating
12:53:21 Created ACTION-28 - Survey use cases re requirement for templating [on Jeremy Tandy - due 2014-09-17].
12:53:25 ack fresco
12:53:49 fresco: similar point to make. maybe someone could add to one of the docs reasons why templating is thought to be useful in the 1st place.
12:54:08 ... not seeing motivation
12:54:32 jenit: what are the patterns of use that we are anticipating seeing?
12:54:44 particularly can the processing be performed on the in-memory data model, rather than on output
12:54:47 ... are we anticipating people who are receiving the data downloading the templates then processing them?
12:54:52 or tools at publisher end
12:55:00 ... what patterns of use do we anticipate?
12:55:16 jenit: anybody want to volunteer to try to capture what those patterns of use might be, around conversion?
12:55:21 ...
12:55:23 ...
12:55:27 [tumbleweed]
12:55:32 jenit: ok, I'll try
12:55:44 ACTION: JeniT to document patterns of use for conversion to different formats
12:55:44 Created ACTION-29 - Document patterns of use for conversion to different formats [on Jeni Tennison - due 2014-09-17].
12:55:55 (this sounds similar to jtandy's action too)
12:56:23 e.g. https://github.com/w3c/csvw/blob/testing-variations/examples%2Fsimple-weather-observation.md
12:56:28 jenit: similar, but UCs have focussed on what the CSV looks like more than what is then done with it
12:56:36 ... rather than how it fits into workflows.
12:56:36 q?
12:56:39 ack danbri_
12:56:39 danbri_, you wanted to discuss balance
12:57:04 danbri_: other question is to what extent this is a CSV problem
12:57:11 … there are existing tools, e.g. around Mustache
12:57:25 … do we need something CSV-oriented or are there existing things that could be used?
12:57:34 +q
12:57:43 … we keep coming back to Mustache, whereas Django is implementation specific
12:57:45 Velocity
12:57:55 … “is this really a CSV problem?”
12:57:59 ack yakovsh
12:58:12 yakovsh: wanted to mention ECMAScript has templating built in as well
12:58:28 jenit: got a link?
12:58:36 see also: http://www.polymer-project.org/docs/polymer/expressions.html which is based on HTML Templates
12:58:50 jenit: i was looking at web components, similarly
12:58:56 q+ re web components
12:59:00 http://tc39wiki.calculist.org/es6/template-strings/
12:59:17 ack danbri_
12:59:17 danbri_, you wanted to discuss web components
12:59:40 https://people.mozilla.org/~jorendorff/es6-draft.html#sec-template-literal-lexical-components
12:59:54 danbri_: the