CSV on the Web Working Group Teleconference -- 08 Oct 2014

ivan: wonders if we can do anything for planning about the face to face meeting?
... especially who will dial in?

AndyS: is hoping to dial in ... but only if there's an agenda to help target the conversation

<phila> DWBP threw together an outline for planning purposes https://www.w3.org/2013/dwbp/wiki/TPAC_2014

danbri: notes that early morning PST will help for Europeans

ivan: he and danbri will organise their calendars to sort out the agenda
... for the F2F

dc/schema @context and normative refs issue

danbri: schema.org, DC, plain old English prose
... can we bottom out this conversation.

ivan: we need to define the json terms and then a one or two sentence description of those terms [to define the semantics]
... to have fixed json terms is important because then metadata validators can use them

<rgrp> i'm generally plus one on having *some*

ivan: and we can provide example @context docs to map to other vocabs

rgrp (rufus): a short list of recommended terms is useful ... to define a pattern of usage for the community.

... basically in general agreement with ivan

<rgrp> jtandy: its MAY not MUST in terms of use and people can obviously add their own ...

danbri: what is the next action here

<danbri> "specify few core terms, keep it small, under 10"

ivan: we need to resolve the action and write down the short list of terms

danbri: notes his comparison of DC with schema.org

could have been an outsider here

Terms in the issue:

created
creator
description
language
license
modified
provenance
publisher
rights
rightsHolder
source
spatial
subject
temporal
title

ivan: shares his list of terms (above) ... but is not convinced that all the terms are necessary
... e.g. spatial and temporal
... the rest is probably ok

<danbri> http://lists.w3.org/Archives/Public/public-csv-wg/2014Oct/0008.html

<danbri> • created: http://schema.org/dateCreated

<danbri> • creator: http://schema.org/creator or http://schema.org/author

<danbri> • description: http://schema.org/description

<danbri> • language: http://schema.org/language (definition applies to actions; could be generalized)

<danbri> • license: http://schema.org/license

<danbri> • modified: http://schema.org/dateModified

<danbri> • provenance: no direct. http://schema.org/evidenceOrigin is related.

<danbri> • publisher: http://schema.org/publisher

<danbri> • rights: no direct mapping

<danbri> • rightsHolder: http://schema.org/copyrightHolder

<danbri> • source: no direct mapping (how does this compare to provenance), not http://schema.org/source which is medical.

<danbri> • spatial: https://schema.org/spatial

<danbri> • subject: http://schema.org/about

<danbri> • temporal: https://schema.org/temporal

<danbri> • title: https://schema.org/name (rather than https://schema.org/title)

ivan: need long discussion to resolve the final list & suggests lubrication with beer to help

<rgrp> i'm +0 on dropping spatial / temporal ...

<rgrp> i have to say dateCreated is kind of nicer to be explicit but i'm easy either way

danbri: may be able to tweak schema.org to get rid of the differences

ivan: it's not really a problem because we can change the "name" for a given term in the @context
... also think we don't need "source"; because the metadata already refers to the csv file resource
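
As a rough illustration of ivan's point about renaming through the @context, the sketch below builds a JSON-LD-style context from the single-valued mappings danbri pasted above and uses it to expand short names. The helper `expand_keys` and the sample metadata are hypothetical, and this is a plain dictionary lookup, not a JSON-LD processor.

```python
import json

# Short name -> schema.org IRI, copied from danbri's list above.
# Terms with no direct mapping (provenance, rights, source) are left out,
# as are creator and language, whose mappings were flagged as open questions.
CONTEXT = {
    "created": "http://schema.org/dateCreated",
    "description": "http://schema.org/description",
    "license": "http://schema.org/license",
    "modified": "http://schema.org/dateModified",
    "publisher": "http://schema.org/publisher",
    "rightsHolder": "http://schema.org/copyrightHolder",
    "subject": "http://schema.org/about",
    "title": "https://schema.org/name",
}


def expand_keys(metadata, context):
    """Replace each short key with its IRI when the context defines one."""
    return {context.get(key, key): value for key, value in metadata.items()}


# Hypothetical metadata using the short names; "custom" is not in the context.
metadata = {"title": "Tree operations", "created": "2014-10-08", "custom": "kept as-is"}
expanded = expand_keys(metadata, CONTEXT)

# The document as an author would write it: short names plus the context.
print(json.dumps({"@context": CONTEXT, **metadata}, indent=2))
```

Changing the short name for a term only means changing a key in `CONTEXT`; the IRI it expands to stays the same.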

<rgrp> i'd vote for source ...

<rgrp> provenance is somewhat fancy ...

<rgrp> or even better: "sources" ...

<rgrp> but i think that takes us away from dc ...

<danbri> jtandy: i was trying some examples myself

<danbri> dcat metadata for dataset i was working on

<danbri> things like license in dcat are part of distribution not about dataset

<danbri> there are some diffs in how we vs dcat handle things, in our csv metadata doc

<rgrp> right, but dcat makes it a bit overcomplex there ...

<danbri> based on a comment in recent spec, should we not try to normalize around DCAT given that W3C has chosen this for discovery metadata?

<phila> +1 to normalising with DCAT (surprise surprise)

<rgrp> i think datasets can have license in dcat no?

<danbri> or say that we've chosen not to?

<Zakim> phila, you wanted to talk about cores and onions, rights and licences

<danbri> phila: CSV files are distributions

thanks

<rgrp> phila: the dataset / distribution distinction for a CSV (they are sort of the same here)

<rgrp> but agree generally - that's why you should support multiple resources/distributions ...

phila: about the dropping of spatial and temporal ... it _sometimes_ matters

<rgrp> +q

<rgrp> to be clear - this is not dropping spatial and temporal its about not having them on the special shortlist ...

phila: so we need to let people know that there are other terms outside the core data that people might use
... whether spatial and temporal are important depends on the data

<rgrp> no, no, no ivan ;-)

phila: suggests that we have a category of "useful" as well as "core" ... and if you don't use the "useful" terms then you need a reason

<rgrp> dcat definitely about datasets ;-)

<phila> +1 rgrp

<rgrp> ivan: you are right that definitely for use by data catalogs to talk about the datasets they hold or point to :-)

<danbri> schema.org Dataset is very dcat-inspired, and def about datasets

ivan: about spatial and temporal ... very important to consider that we are defining a very small core set of terms that can be used **without qualification**

<rgrp> +1 to ivan's points - want stuff to be very clear ...

<ivan> db:temporal

<rgrp> jtandy: good summary - +1 to that

<ivan> dc:temporal

ivan: we can't disallow use of other terms (beyond the small core) ... people can use what they want
... there are many situations where things are useful, but let's stick to the core set

rgrp (rufus): the core set in no way prohibits people using other terms; the core should be the set of terms applicable to **every** CSV

scribe: the point is ... that we shouldn't remove spatial because it is applicable to almost every CSV so we should drive that usage

<rgrp> to every or almost every CSV ...

scribe: but notes that there are many ways to express "spatial" so can drive complexity

danbri: we need to give people the freedom to use what they need for their local tool chains and keep the mandatory list _very_ short
... go for provision of _examples_ rather than normative recommendation

rgrp: happy to take the list we have here and update the metadata vocab document

ivan: to avoid misunderstanding, the section listing loads of DC terms should be removed

<phila> I note that the EC's DCAT Application Profile does not include spatial and temporal as mandatory https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final

rgrp: agrees, and will make sure that people know they can add their own terms as necessary

danbri: so is the core list normative?

rgrp: yes

<phila> https://joinup.ec.europa.eu/system/files/project/f9/42/c0/DCAT-AP_Final_v1.00.png

<danbri> rgrp, can you 'ACTION:' yourself a suitable editors task here?

<rgrp> danbri: yes ...

phila: references the DCAT application profile for EU (EC?) ... a list of terms that should be used when describing a dataset & talks of mandatory and optional terms

phila: spatial and temporal are optional; this might provide support for our decision in establishing the core list

<phila> ACTION: rufus to amend metadata draft with the shortlist [recorded in http://www.w3.org/2014/10/08-csvw-minutes.html#action02]

<trackbot> Created ACTION-32 - Amend metadata draft with the shortlist [on Rufus Pollock - due 2014-10-15].

<rgrp> issue #29 ...

rgrp: asks for the final list to be shared after this call

<danbri> [not chair hat] I propose using same short names as schema.org, per my mail above

rgrp: so as above, without spatial and temporal and I get to choose about the english prose of dateCreated or createAt
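
A minimal sketch of how a metadata validator might treat the shortlist, assuming the core list is the set of terms from the issue minus spatial and temporal. The final names were still open at this point (e.g. created vs dateCreated), so the names below are an assumption, as is the function `classify_terms`. Non-core terms are reported but not rejected, in line with the "MAY not MUST" point.

```python
# Assumed core list: the terms from the issue, minus spatial and temporal.
CORE_TERMS = {
    "created", "creator", "description", "language", "license", "modified",
    "provenance", "publisher", "rights", "rightsHolder", "source", "subject",
    "title",
}


def classify_terms(metadata):
    """Split metadata keys into core terms and other (still permitted) terms."""
    core = sorted(key for key in metadata if key in CORE_TERMS)
    other = sorted(key for key in metadata if key not in CORE_TERMS)
    return core, other


# Hypothetical metadata: spatial is allowed, it is simply not a core term.
core, other = classify_terms({"title": "x", "spatial": "London", "license": "CC0"})
print("core:", core)
print("other (allowed, not core):", other)
```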

direct mapping status

ivan: I have a first version running
... didn't hit any major issues
... have not implemented datatype handling yet
... sometimes I find the metadata convoluted (e.g. primarykey)

ivan: the complexity makes the implementation more complex than it would otherwise be ... but nonetheless, it is implementable
... the implementation is the same for both JSON and RDF right up to the point where the output format is chosen
... this is good
... notes that jtandy found issues with the mapping
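
The shape ivan describes, one shared pipeline with the output format chosen only at the end, might look roughly like the sketch below. It is not ivan's implementation: the function names, the example base URL, and the row and column IRI patterns are all assumptions made for illustration, and datatypes are ignored (every value is emitted as a plain string).

```python
import csv
import io
import json
from urllib.parse import quote


def rows_to_records(csv_text):
    """Shared step: parse CSV text into (row number, {column: value}) records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(number, dict(row)) for number, row in enumerate(reader, start=1)]


def to_json(records):
    """JSON output: one object per row."""
    return json.dumps([row for _, row in records], indent=2)


def to_ntriples(records, base):
    """RDF output (N-Triples): one triple per cell, all values as plain literals."""
    lines = []
    for number, row in records:
        subject = f"<{base}#row={number}>"  # assumed row IRI pattern
        for column, value in row.items():
            predicate = f"<{base}#{quote(column)}>"  # assumed column IRI pattern
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} {predicate} "{escaped}" .')
    return "\n".join(lines)


records = rows_to_records("id,name\n1,Oak\n2,Elm\n")
print(to_json(records))
print(to_ntriples(records, "http://example.org/trees.csv"))
```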

http://lists.w3.org/Archives/Public/public-csv-wg/2014Oct/0027.html

ivan: how far can we go with the direct mapping?
... need to be sure we don't overcomplicate things

danbri: I tried to feed the output from ivan's tool into the JSON-LD playground; something not quite right

ivan: needs to check with Greg

danbri: wonders if properties that are not in a namespace get dropped in JSON-LD

<Zakim> danbri, you wanted to ask about http://w3c.github.io/csvw/experiments/simple-templates-jquery/tree-ops/metadata.csvm

ivan: I'm aiming for JSON, not full blown JSON-LD

danbri: what about the specifications?

ivan: only a few changes ... but notes the need to update the metadata vocab as agreed earlier
... datatype area is currently underspecified; esp. date formats

<danbri> jtandy: I'm looking forward to contributing as an editor

ivan: notes the need for help with the specification work ... and notes that there's still an XML doc to do too

phila: lots of people still care about XML

<danbri> http://www.google.com/trends/explore#q=XML%2C%20JSON%2C%20SQL

<rgrp> i have to drop if that is ok ...

<danbri> thanks rufus

phila: ivan mentioned dates and datatypes; people write dates inconsistently in CSV files ... how can we handle date normalisation?

<rgrp> +1 on phil's point re bad dates ;-) cf http://okfnlabs.org/bad-data/ex/gla-spending/

ivan: from the conversion point of view it is easy ... using the 'format' specification in the metadata we can convert into a "proper" RDF (xsd) datatype
... but if people write rubbish, what can we do?

phila: because the poor date / datetime writing is so common, can we make a special case for validation?

ivan: there is a "format" metadata term, also there are about 15 well known date forms that could be checked against
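
A rough sketch of the two options ivan mentions: convert using a declared format, or check against a list of well-known date forms. It assumes the "format" value is expressed as a Python `strptime` pattern, which is not what the metadata draft specifies, and the fallback list is illustrative rather than the roughly 15 forms ivan has in mind.

```python
from datetime import datetime

# A few common date layouts, tried in order when no format is declared.
# Note the ambiguity: "%d/%m/%Y" assumes day-first, which is wrong for US data.
FALLBACK_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y", "%Y%m%d"]


def normalise_date(text, declared_format=None):
    """Return an ISO 8601 date string (xsd:date lexical form), or None."""
    candidates = [declared_format] if declared_format else FALLBACK_FORMATS
    for pattern in candidates:
        try:
            return datetime.strptime(text.strip(), pattern).date().isoformat()
        except ValueError:
            continue  # this pattern did not match; try the next one
    return None  # "if people write rubbish, what can we do?"


print(normalise_date("08/10/2014", "%d/%m/%Y"))
print(normalise_date("20141008"))
print(normalise_date("sometime in October"))
```

Returning `None` rather than raising leaves the tolerance question (reject the row, or keep the original string) to the caller.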

danbri: otoh, if date strings are so poor, this could be an argument for tolerance?

phila: agreed, I worry about enforcement of a detailed pattern introducing errors where people don't know the details

<danbri> t-10

danbri: ultimately the only thing that will drive up data quality is getting data used!

R2RML experimentation report (danbri)

danbri: has posted to the mailing list ...
... implementation has been decoupled from SQL and modified to take CSV as input
... the "event" example is working; 10 triples per row and exactly the triples I wanted (matching what people actually use)
... but this is template driven, significantly beyond direct mapping
... is it beyond mustache?
... Shall we chase authors for a Working Group Note?
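
To make the contrast with direct mapping concrete, here is a toy template-driven mapping: each row is pushed through author-written triple templates, so the output vocabulary is chosen by the template rather than derived from column names. This uses Python's `str.format` as a stand-in; it is neither R2RML nor mustache, it does no escaping of values, and the templates, column names and example.org IRIs are hypothetical.

```python
import csv
import io

# Hypothetical per-row triple templates; {id}, {name}, {start} are CSV columns.
TEMPLATES = [
    '<http://example.org/event/{id}> <http://schema.org/name> "{name}" .',
    '<http://example.org/event/{id}> <http://schema.org/startDate> "{start}" .',
]


def apply_templates(csv_text, templates):
    """Fill every template once per CSV row and collect the resulting triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        triples.extend(template.format(**row) for template in templates)
    return triples


for triple in apply_templates("id,name,start\n1,Concert,2014-10-08\n", TEMPLATES):
    print(triple)
```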

<danbri> https://github.com/w3c/csvw/tree/gh-pages/examples/tests/scenarios/events

<danbri> begun https://github.com/w3c/csvw/tree/gh-pages/examples/tests/scenarios/uc-24

ivan: a WG Note (for R2RML) is useful; no problem there.
... do we also want Notes for mustache etc.

<danbri> example https://github.com/w3c/csvw/blob/gh-pages/examples/tests/scenarios/events/attempts/attempt-1/mapping-events.rml.ttl

<danbri> bye AndyS

<danbri> ivan: we need to say how to ref an RML file from our metadata

ivan: we _do_ need to include in the Recommendation how to refer to these external templates
... we have 3 mapping processes so far: R2RML, mustache, direct mapping
... would be useful to run through all the use cases to see where the capabilities of each mapping process reach

danbri: there are a few in progress now ... a few more should be enough?

ivan: to be systematic, would go through all use cases ... to document the pros and cons of each approach
... this is a lot of work
... perhaps discuss at the F2F meeting?
... at some point we'll have to build tests _anyway_
... we need proper testing in order to progress to Recommendation

danbri: if we share the R2RML and mustache implementations we're working with already, then others in the group could work through the rest of the use cases
... will nag the R2RML folks to include an Open source license

<ivan> trackbot, end telcon
