See also: IRC log
<trackbot> Date: 10 September 2014
<scribe> scribenick: danbri_
Jeni: discuss a few issues raised over
last month or so. Some explanation re next week's "special call" on our
templating decisions
... I'll be asking around for F2F looking for volunteers to lead
sessions. With only 2 chairs + 2 participants today we're not quorate for
decisions, but can have a bit of discussion.
<JeniT> https://www.w3.org/2013/csvw/wiki/Meeting_Agenda_2014-09-10
<JeniT> https://github.com/w3c/csvw/issues/22
Jeni: this is an issue from the metadata
document, where we need to figure out what to call the property in the
metadata that refers to the datatype of the values in a particular column
... there are some constraints here from attempting to adopt json-ld
... while it seems a small issue, it has some impact on our relationship w/ json-ld
... in json-ld, for a property on an 'object', the candidates are: using
'@type', using plain 'type', or using 'datatype'
... if we use @type, then that is interpreted specially in json-ld to be
w.r.t. the type of the thing being described
<JeniT> { "@type": "Column" }
which in this case would be the columns rather than individual cells
jeni: 2nd option is to just use the plain
term 'type'. reason this is a little problematic, ... for other
properties on columns, ...
... they're generally treated as meaning whatever Dublin Core says type
means
e.g. we can have a col description that basically says
<JeniT> { "source": "http://example.org/" }
"source is some-source"
"and the 'source' property would be interpreted as meaning the same as the DC property 'source'"
but DC has a property 'type', which isn't particularly helpful here
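A sketch of the three naming options as they might appear in a column description (shown as Python dicts; the property names other than "@type" are illustrative, nothing here is agreed):

```python
import json

# Option 1: "@type" is interpreted by JSON-LD as the class of the thing
# being described -- the Column itself, not the values in its cells:
column_via_at_type = {"@type": "Column"}

# Option 2: plain "type" -- problematic because other bare terms on
# columns are mapped to Dublin Core, and DC's 'type' means something else:
column_via_type = {"type": "integer"}

# Option 3: a distinct "datatype" term avoids both readings:
column_via_datatype = {"datatype": "integer"}

print(json.dumps(column_via_datatype))
```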
<Zakim> danbri_, you wanted to ask if json-ld context is making the DC mapping explicit
jenit: yes, we'd explicitly map to dc
<JeniT> https://github.com/w3c/csvw/blob/gh-pages/metadata/csvm-context.json
jenit: in repo for what we're doing here,
... then we have got an initial start on what that context looks like
... this came up re simple data format discussion
danbri: when did we decide to fold all of dublin core into our lang?
jenit: happy to discuss
<JeniT> danbri_: maybe we should map schema.org instead
<JeniT> ... that will map on to DublinCore
<JeniT> JeniT: which set of properties?
danbri: "whatever we want"
fresco: ... not sure what the mapping in the context file is, beyond mapping col names to URLs
jeni: these aren't about mapping col names, but about mapping metadata re particular columns, source it comes from, rights over any data, when that column was created, ... these properties could apply also at the table level. E.g. publisher of a particular CSV file.
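For illustration, a context along these lines could map those bare terms onto Dublin Core IRIs. This is a sketch only, not the actual csvm-context.json linked above, which may differ:

```python
import json

# Hypothetical fragment mapping bare metadata terms to Dublin Core
# term IRIs, so that e.g. "source" on a column or table is read as
# dc:source; the real context file in the repo may differ:
context = {
    "@context": {
        "source": "http://purl.org/dc/terms/source",
        "rights": "http://purl.org/dc/terms/rights",
        "created": "http://purl.org/dc/terms/created",
        "publisher": "http://purl.org/dc/terms/publisher",
    }
}

print(json.dumps(context["@context"], indent=2))
```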
fresco: do we expect usage on columns, vs cells?
jenit: you have a global context, ...
rather than it being different for different objects
... i.e. if define all of these things so that they could apply to
tables, ...
which would seem rational, since DC is doc-metadata-centric
scribe: the upshot of this is that they would also be used to interpret any other column/row metadata
danbri: we could define inline contexts
<fresco> http://schema.org/Dataset
fresco: would be good to look at schema.org to see if it has what we'd need
(plus Organization, Person, etc etc)
(in github here, https://github.com/rvguha/schemaorg )
schema.org doesn't seem to have 'type' yet, at least. http://schema.org/type (but does keep adding stuff)
jenit: to take this fwd, shall I widen it out to be an issue on whether to adopt the DC set of metadata terms, or the schema.org set of metadata terms
jtandy: i think that would be sensible; it doesn't replace the type vs datatype issue
jenit: the issue changes then. the issue becomes potential confusion between 'type' and '@type', since they could both be used but with diff meanings
jtandy: on topic of schema.org, DC; if there is a clear mapping from schema.org to DC, it would appear to be a sensible way fwd
danbri: some terms map; but there are a
few differences
... is there enough specificity in the use cases to drive a decision on
using dc, schema.org etc.
<JeniT> ACTION: JeniT to write to mailing list re using schema.org rather than Dublin Core for metadata about CSV files, then binding decision on following telcon [recorded in http://www.w3.org/2014/09/10-csvw-minutes.html#action01]
<trackbot> Created ACTION-26 - Write to mailing list re using schema.org rather than dublin core for metadata about csv files, then binding decision on following telcon [on Jeni Tennison - due 2014-09-17].
jtandy: stating a license, stating who is responsible, ... but often use cases just say "we need publishing metadata", "this category of metadata"
[...]
yakovsh: re datatypes, maybe i'm unfamiliar with rdf, ... is there a link to what datatypes there are?
this? http://www.w3.org/TR/rdf11-concepts/#section-Datatypes
<JeniT> http://w3c.github.io/csvw/metadata/#datatypes
jenit: this assumes standard set of
w3c-designed datatypes, which came from xml schema
... i.e. those most usually used within RDF, but slightly extended to
include Number, Binary, ...
(what's binary mean in CSV?)
<JeniT> http://w3c.github.io/csvw/metadata/#datatypes
yakovsh: having a clear list is important
jenit: see 3.8.4
ah, "the datatype binary which is exactly equivalent to base64Binary"
jenit: we're extending the list here
doc has specific issues flagged
scribe: being consistent with simple data format, and other existing work around w3c
yakovsh: if that list needed to change in
future, how would that work?
... on ietf side, we tend to worry about extensibility
... things change over time
FYI see http://www.w3.org/2001/05/xmlschema-errata for changes in http://www.w3.org/TR/xmlschema-2/ vs earlier version
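As a sketch of what a validator might do with such a list, checking a cell value against a declared datatype could look like this. The patterns are simplified stand-ins for the XML Schema lexical rules, and 'binary' is treated as base64Binary per 3.8.4:

```python
import re

# Simplified lexical checks for a few of the datatypes under discussion;
# real XML Schema lexical spaces are more involved than these regexes:
PATTERNS = {
    "integer": re.compile(r"[+-]?\d+$"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}$"),
    "binary": re.compile(r"[A-Za-z0-9+/]*={0,2}$"),  # base64Binary-ish
}

def valid(value, datatype):
    """Return True if value matches the (simplified) lexical space."""
    pattern = PATTERNS.get(datatype)
    return bool(pattern and pattern.match(value))
```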
[...]
jenit: ... various ways, e.g. consider impact on validators
yakovsh: do we want to discuss extensibility?
jenit: yes, should def be part of the pattern of how we work on the standard.
AndyS, any thoughts/conclusions?
Andy - not a lot to say. Mapping is v simple, hardcoded / built-in. Purpose of project 2-fold. Get something working in the time available (google summer of code student). And didn't want to pre-judge WG decisions.
[same update as last week except audio quality 1000x better :]
jenit: work done - anything that makes you feel direction here should be one thing or another?
andys: we didn't push on it beyond column=predicate, ...
goal was code rather than a research project
jenit: next steps with it?
andys: "wait" :)
<JeniT> https://github.com/w3c/csvw/tree/testing-variations/examples/tests/scenarios/uc-4
JeniT: update from my summer investigation.
Jeni: UC-4 is quite interesting. About publication of information about org structures in the uk civil service
each dept publishes a pair of linked csv files
scribe: same schema(s)
... always in pairs
certain places where schema is extensible
kinda
e.g. dept might have sub-groups
<JeniT> https://github.com/w3c/csvw/tree/testing-variations/examples/tests/scenarios/uc-4/attempts/attempt-1
scribe: readme throws out some of the
issues for discussion
...
special codes in front of columns
the way I addressed it was use of regexes
scribe: struck me afterwards, having a
separate csv file could be a better design for that package of csv files
... made me think about revisiting that design
... some things around co-constraints across the columns
in one of the files, ... if the unique post ref is in some column, then the job title must be not-in-post, function must be n/a, etc.
jenit: do we need within schema files to
indicate co-constraints?
... validation constraints. whether cols can be both NULL and required.
i.e. the NULL value must appear textually in the actual CSV text.
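A sketch of how such a co-constraint might be checked in code; the column names, trigger value, and sentinel strings here are invented, since the actual UC-4 schemas use their own codes:

```python
# Cross-column co-constraint check of the kind described above:
# when the post ref marks a vacant post, dependent columns must
# carry fixed sentinel values (names/values are hypothetical):
def check_row(row):
    errors = []
    if row.get("post_ref") == "0":  # stand-in for "unique post ref"
        if row.get("job_title") != "Not in post":
            errors.append("job_title must be 'Not in post'")
        if row.get("function") != "N/A":
            errors.append("function must be 'N/A'")
    return errors
```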
jenit: working through that was v
interesting in terms of highlighting specific issues
... any comments/questions?
jtandy: I agree that taking through the specific examples is great way to learn how this stuff works
looking at what you did, ... I see under Attempt 1, that you have the 2 target JSON files, ...
the ones you created by running through some form of parser
and you also have a metadata.json that's fairly trivial for now
some of your issues from README.md, e.g. dealing with null, conditionals, ...
scribe: do you anticipate an attempt-2 with an approach to some of those issues?
jenit: I realise I didn't commit all the
data
... the bit I wanted to focus on was how to manage the packages, where
you want a set of CSV files to conform
... and also how they could be better split out
... be better structured
if you had that kind of metadata
my observation from these files is that they are extremely flat and repetitious
wanted to see how it'd look in an ideal world where we had this csv on the Web approach; in which case, how might they be publishing it differently in such a new world
scribe: didn't want to stray too far from what's in the metadata spec
jenit: lots of things not yet agreed, so just exploring
jtandy: in processing these things, have you been able to create any targets; any of the transformed content?
jenit: my focus has been more on validation rather than transformation
<JeniT> https://github.com/w3c/csvw/tree/testing-variations/examples/tests/scenarios/uc-4/output
jenit: within this particular use case
there is an output ...
... which gives RDF in particular formats
... could that be generated with the metadata
what would you need to do to get that RDF from those CSV files + metadata?
+ maybe templating
1st piece def needs templating
scribe: but could the packaging be restructured?
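As a rough illustration of that templating step (not the WG's design; the template syntax, IRIs, and column names are all invented), each CSV row could be pushed through a string template to emit Turtle:

```python
import csv
import io
from string import Template

# Render each CSV row through a per-row template to produce Turtle;
# everything here (IRIs, column names) is illustrative only:
row_template = Template(
    '<http://example.org/post/$ref> '
    '<http://example.org/def/jobTitle> "$title" .\n'
)

csv_text = io.StringIO("ref,title\n1,Analyst\n")
turtle = "".join(
    row_template.substitute(row) for row in csv.DictReader(csv_text)
)
print(turtle)
```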
<Zakim> danbri_, you wanted to ask about moving these stuff into common github branch - does the structure w/ 'attempts' more or less work for us?
danbri: filetree ok?
jenit: suggest we roll it in
resolved: sure, whatever.
:)
jeni: AOB on that example?
(oh, forgot to scribe: earlier Jeni confirmed that the files in output/ can be treated as 'golden triples' for template mapping experiments)
jeni: we need to decide asap on a course of action w.r.t. whether and how we describe a templating format.
Whether we make it an extension, whether it be done at all, etc.
<JeniT> http://lists.w3.org/Archives/Public/public-csv-wg/2014Sep/0006.html
scribe: will have a special call next week (Weds as usual), attempting to make a resolution on this.
If we can't get consensus on this, we'll defer until f2f.
AndyS: please go ahead
... I'm completely confused by the area and can't make the f2f
... Is there a f2f attendee list?
jenit: not afaik
danbri: we should all register for TPAC (which involves fee etc)
<JeniT> ACTION: JeniT to get Ivan to send round reminder re TPAC and to create attendee list [recorded in http://www.w3.org/2014/09/10-csvw-minutes.html#action02]
<trackbot> Created ACTION-27 - Get ivan to send round reminder re tpac and to create attendee list [on Jeni Tennison - due 2014-09-17].
andys: dep on UK trains, I could possibly be at next week's call
jenit: if you can join that's great otherwise please let's be using the mailing list
jtandy: key point re templating q is balance between applying more resources to create an additional recommendation (the templating lang) vs a standard that might not be as powerful as we hoped
jenit: that's roughly it
jeni/jtandy - looking at use cases important
<JeniT> ACTION: jtandy to survey use cases re requirement for templating [recorded in http://www.w3.org/2014/09/10-csvw-minutes.html#action03]
<trackbot> Created ACTION-28 - Survey use cases re requirement for templating [on Jeremy Tandy - due 2014-09-17].
fresco: similar point to make. maybe
someone could add to one of the docs, reasons why templating is thought
to be useful in the 1st place.
... not seeing motivation
jenit: what are the patterns of use that we are anticipating seeing?
<fresco> particularly can the processing be performed on the in-memory data model, rather than on output
jenit: are we anticipating people who are receiving the data downloading the templates then processing them?
or tools at publisher end
scribe: what patterns of use do we anticipate?
jenit: anybody want to volunteer to try to
capture what those patterns of use might be, around conversion?
...
...
[tumbleweed]
jenit: ok, I'll try
<JeniT> ACTION: JeniT to document patterns of use for conversion to different formats [recorded in http://www.w3.org/2014/09/10-csvw-minutes.html#action04]
<trackbot> Created ACTION-29 - Document patterns of use for conversion to different formats [on Jeni Tennison - due 2014-09-17].
(this sounds similar to jtandy's action too)
<AndyS> e.g. https://github.com/w3c/csvw/blob/testing-variations/examples%2Fsimple-weather-observation.md
jenit: similar but UCs have focussed on
what the CSV looks like more than what is then done with it.
... rather than how it fits into workflows.
<Zakim> danbri_, you wanted to discuss balance
<JeniT> danbri_: other question is to what extent is this a CSV problem
<JeniT> ... there are existing tools eg around Mustache
<JeniT> ... do we need something CSV-oriented or are there existing things that could be used
<JeniT> ... we keep coming back to Mustache, whereas Django is implementation specific
<AndyS> Velocity
<JeniT> ... "is this really a CSV problem?"
yakovsh: wanted to mention ECMAScript has templating built in as well
jenit: got a link?
see also: http://www.polymer-project.org/docs/polymer/expressions.html which is based on HTML Templates
jenit: i was looking at web components, similarly
<yakovsh> http://tc39wiki.calculist.org/es6/template-strings/
<Zakim> danbri_, you wanted to discuss web components
<yakovsh> https://people.mozilla.org/~jorendorff/es6-draft.html#sec-template-literal-lexical-components
<JeniT> danbri_: the <template> doesn't do interpolation; Polymer builds Mustache on top of it
jenit: AOB?
yakovsh: [missed] something re XML, XSLT?
jeni: I heart XSLT
Adjourned.
ALL HANDS NEXT WEEK
<yakovsh> xslt for templating
Thanks Jeni