See also: IRC log
ivan: wonders whether we can do any planning for the face-to-face meeting
... especially, who will dial in?
AndyS: is hoping to dial in ... but only if there's an agenda to help target the conversation
<phila> DWBP threw together an outline for planning purposes https://www.w3.org/2013/dwbp/wiki/TPAC_2014
danbri: notes that early morning PST will help for Europeans
ivan: he and danbri will coordinate their calendars to sort out the agenda for the F2F
danbri: schema.org, DC, or plain old English prose?
... can we bottom out this conversation?
ivan: we need to define the JSON terms and then give a one- or two-sentence description of each term [to define the semantics]
... having fixed JSON terms is important because metadata validators can then use them
<rgrp> i'm generally plus one on having *some*
ivan: and we can provide example @context docs to map to other vocabs
rgrp (rufus): a short list of recommended terms is useful ... to define a pattern of usage for the community.
... basically in general agreement with ivan
<rgrp> jtandy: it's MAY, not MUST, in terms of use, and people can obviously add their own ...
danbri: what is the next action here?
<danbri> "specify few core terms, keep it small, under 10"
ivan: we need to resolve the action and write down the short list of terms
danbri: notes that his comparison of DC with schema.org could have been an outsider here
Terms in the issue:
ivan: shares his list of terms (above) ... but is not convinced that all the terms are necessary
... e.g. spatial and temporal
... the rest is probably ok
<danbri> http://lists.w3.org/Archives/Public/public-csv-wg/2014Oct/0008.html
<danbri> • created: http://schema.org/dateCreated
<danbri> • creator: http://schema.org/creator or http://schema.org/author
<danbri> • description: http://schema.org/description
<danbri> • language: http://schema.org/language (definition applies to actions; could be generalized)
<danbri> • license: http://schema.org/license
<danbri> • modified: http://schema.org/dateModified
<danbri> • provenance: no direct. http://schema.org/evidenceOrigin is related.
<danbri> • publisher: http://schema.org/publisher
<danbri> • rights: no direct mapping
<danbri> • rightsHolder: http://schema.org/copyrightHolder
<danbri> • source: no direct mapping (how does this compare to provenance?); not http://schema.org/source, which is medical.
<danbri> • spatial: https://schema.org/spatial
<danbri> • subject: http://schema.org/about
<danbri> • temporal: https://schema.org/temporal
<danbri> • title: https://schema.org/name (rather than https://schema.org/title)
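A minimal sketch of how danbri's list could become a JSON-LD @context, assuming the short names above are used as the JSON terms. Terms he marks as having no direct mapping (provenance, rights, source) are left out, creator is mapped to schema.org/creator rather than author, and all IRIs are normalised to http:; these are illustrative choices, not group decisions. The title entry shows that the JSON term does not have to match the schema.org local name (name).

```python
import json

# Short JSON term -> schema.org property, following danbri's list above.
# Terms with "no direct mapping" (provenance, rights, source) are omitted.
SCHEMA_ORG_MAPPING = {
    "created": "http://schema.org/dateCreated",
    "creator": "http://schema.org/creator",  # http://schema.org/author was the alternative
    "description": "http://schema.org/description",
    "language": "http://schema.org/language",
    "license": "http://schema.org/license",
    "modified": "http://schema.org/dateModified",
    "publisher": "http://schema.org/publisher",
    "rightsHolder": "http://schema.org/copyrightHolder",
    "spatial": "http://schema.org/spatial",
    "subject": "http://schema.org/about",
    "temporal": "http://schema.org/temporal",
    "title": "http://schema.org/name",  # JSON term differs from the IRI's local name
}


def build_context(mapping):
    """Wrap a term -> IRI mapping as a JSON-LD @context document."""
    return {"@context": dict(mapping)}


print(json.dumps(build_context(SCHEMA_ORG_MAPPING), indent=2))
```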
ivan: resolving the final list will need a long discussion; suggests lubrication with beer to help
<rgrp> i'm +0 on dropping spatial / temporal ...
<rgrp> i have to say dateCreated is kind of nicer to be explicit but i'm easy either way
danbri: may be able to tweak schema.org to get rid of the differences
ivan: it's not really a problem, because we can change the "name" for a given term in the @context
... also thinks we don't need "source", because the metadata already refers to the CSV file resource
<rgrp> i'd vote for source ...
<rgrp> provenance is somewhat fancy ...
<rgrp> or even better: "sources" ...
<rgrp> but i think that takes us away from dc ...
<danbri> jtandy: i was trying some examples myself
<danbri> dcat metadata for dataset i was working on
<danbri> things like license in dcat are part of distribution, not about dataset
<danbri> there are some diffs in how we vs dcat handle things, in our csv metadata doc
<rgrp> right, but dcat makes it a bit overcomplex there ...
<danbri> based on a comment in recent spec, should we not try to normalize around DCAT given that W3C has chosen this for discovery metadata?
<phila> +1 to normalising with DCAT (surprise surprise)
<rgrp> i think datasets can have license in dcat no?
<danbri> or say that we've chosen not to?
<Zakim> phila, you wanted to talk about cores and onions, rights and licences
<danbri> phila: CSV files are distributions
thanks
<rgrp> phila: the dataset / distribution distinction for a CSV (they are sort of the same here)
<rgrp> but agree generally - that's why you should support multiple resources/distributions ...
phila: about the dropping of spatial and temporal ... it _sometimes_ matters
<rgrp> +q
<rgrp> to be clear - this is not dropping spatial and temporal; it's about not having them on the special shortlist ...
phila: so we need to let people know that there are other terms outside the core data that people might use
... whether spatial and temporal are important depends on the data
<rgrp> no, no, no ivan ;-)
phila: suggests that we have a category of "useful" as well as "core" ... and if you don't use the "useful" terms then you need a reason
<rgrp> dcat definitely about datasets ;-)
<phila> +1 rgrp
<rgrp> ivan: you are right that definitely for use by data catalogs to talk about the datasets they hold or point to :-)
<danbri> schema.org Dataset is very dcat-inspired, and def about datasets
ivan: about spatial and temporal ... very important to consider that we are defining a very small core set of terms that can be used **without qualification**
<rgrp> +1 to ivan's points - want stuff to be very clear ...
<ivan> db:temporal
<rgrp> jtandy: good summary - +1 to that
<ivan> dc:temporal
ivan: we can't disallow use of other terms (beyond the small core) ... people can use what they want
... there are many situations where things are useful, but let's stick to the core set
rgrp (rufus): the core set in no way prohibits people using other terms; the core should be the set of terms applicable to **every** CSV
scribe: the point is ... that we shouldn't remove spatial, because it is applicable to almost every CSV, so we should drive that usage
<rgrp> to every or almost every CSV ...
scribe: but notes that there are many ways to express "spatial", which can drive complexity
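To make the "core vs. other terms" distinction concrete, here is a minimal sketch assuming a hypothetical core list (the final list was not fixed in this call) in which core terms may appear unqualified, while any other term, such as dc:temporal, carries a prefix. Unqualified non-core terms are flagged rather than rejected, since the core set does not prohibit other terms.

```python
# Hypothetical core list for illustration only; spatial/temporal deliberately absent.
CORE_TERMS = {
    "created", "creator", "description", "language", "license",
    "modified", "publisher", "rightsHolder", "subject", "title",
}


def classify_terms(metadata):
    """Split metadata keys into core, prefixed (extension) and unknown terms."""
    core, extension, unknown = [], [], []
    for key in metadata:
        if key in CORE_TERMS:
            core.append(key)  # usable without qualification
        elif ":" in key:
            extension.append(key)  # qualified terms such as "dc:temporal" are allowed
        else:
            unknown.append(key)  # unqualified and not core: meaning unclear, so flag it
    return core, extension, unknown


example = {
    "title": "Road traffic counts",
    "license": "http://example.org/licence",
    "dc:temporal": "2013",
    "region": "North",
}
print(classify_terms(example))
```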
danbri: we need to give people the freedom to use what they need for their local tool chains and keep the mandatory list _very_ short
... go for provision of _examples_ rather than normative recommendation
rgrp: happy to take the list we have here and update the metadata vocab document
ivan: to avoid misunderstanding, the section listing loads of DC terms should be removed
<phila> I note that the EC's DCAT Application Profile does not include spatial and temporal as mandatory https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
rgrp: agrees, and will make sure that people know they can add their own terms as necessary
danbri: so is the core list normative?
rgrp: yes
<phila> https://joinup.ec.europa.eu/system/files/project/f9/42/c0/DCAT-AP_Final_v1.00.png
<danbri> rgrp, can you 'ACTION:' yourself a suitable editors task here?
<rgrp> danbri: yes ...
phila: references the DCAT application profile for EU (EC?) ... a list of terms that should be used when describing a dataset & talks of mandatory and optional terms
phila: spatial and temporal are optional; this might provide support for our decision in establishing the core list
<phila> ACTION: rufus to amend metadata draft with the shortlist [recorded in http://www.w3.org/2014/10/08-csvw-minutes.html#action02]
<trackbot> Created ACTION-32 - Amend metadata draft with the shortlist [on Rufus Pollock - due 2014-10-15].
<rgrp> issue #29 ...
rgrp: asks for the final list to be shared after this call
<danbri> [not chair hat] I propose using same short names as schema.org, per my mail above
rgrp: so, as above, without spatial and temporal, and I get to choose the English wording: dateCreated or createAt
ivan: I have a first version running
... didn't hit any major issues
... have not implemented datatype handling yet
... sometimes I find the metadata convoluted (e.g. primarykey)
<danbri> see also http://lists.w3.org/Archives/Public/public-csv-wg/2014Oct/0032.html -> http://w3c.github.io/csvw/experiments/simple-templates-jquery/test.html
ivan: the complexity makes the implementation more complex than it would otherwise be ... but nonetheless, it is implementable
... the implementation is the same for both JSON and RDF right up to the point where the output format is chosen
... this is good
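ivan's point that JSON and RDF output share everything up to the choice of output format can be sketched as below. This is not ivan's implementation: the base IRI, the use of column names as property names and the row-numbered subjects are hypothetical choices, loosely in the spirit of a direct mapping.

```python
import csv
import io
import json
from urllib.parse import quote

BASE = "http://example.org/data#"  # hypothetical base IRI for rows and columns


def rows_to_records(csv_text):
    """Shared stage: parse the CSV into a list of {column: value} records."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def nt_literal(value):
    """Quote a plain string as an N-Triples literal (minimal escaping)."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_json(records):
    """JSON output stage: serialise the shared records directly."""
    return json.dumps(records)


def to_ntriples(records):
    """RDF output stage: one subject per row, one triple per cell."""
    lines = []
    for i, record in enumerate(records, start=1):
        subject = f"<{BASE}row{i}>"
        for column, value in record.items():
            # Missing cells (None from DictReader) become empty literals.
            lines.append(f"{subject} <{BASE}{quote(column)}> {nt_literal(value or '')} .")
    return "\n".join(lines)


records = rows_to_records("name,city\nAda,London\n")
print(to_json(records))
print(to_ntriples(records))
```

Only the last function call differs between the two outputs; everything before it is shared.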
... notes that jtandy found issues with the mapping: http://lists.w3.org/Archives/Public/public-csv-wg/2014Oct/0027.html
ivan: how far can we go with the direct mapping?
... need to be sure we don't overcomplicate things
danbri: I tried to feed the output from ivan's tool into the JSON-LD playground; something not quite right
ivan: needs to check with Greg
danbri: wonders if properties that are not in a namespace get dropped in JSON-LD
<Zakim> danbri, you wanted to ask about http://w3c.github.io/csvw/experiments/simple-templates-jquery/tree-ops/metadata.csvm
ivan: I'm aiming for JSON, not full-blown JSON-LD
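On danbri's question: in JSON-LD expansion, a property that does not expand to an IRI (no @context entry, no @vocab, no colon in the key) is dropped, which may be what happened in the playground. The sketch below imitates only that one rule and is greatly simplified; it is not the JSON-LD expansion algorithm and ignores @vocab, prefix definitions, nesting and value objects. Adding an @vocab to the context is one standard JSON-LD way to keep such plain terms.

```python
def simplified_expand(document, context):
    """Rough imitation of one JSON-LD expansion rule: keys without an IRI are dropped."""
    expanded = {}
    for key, value in document.items():
        if key.startswith("@"):
            continue  # keywords such as @context are ignored in this sketch
        if key in context:
            expanded[context[key]] = value  # term defined in the context
        elif ":" in key:
            expanded[key] = value  # looks like an IRI already, kept as-is
        # any other key has no IRI mapping and is silently dropped
    return expanded


context = {"title": "http://schema.org/name"}
doc = {"title": "Events", "rowCount": 10, "dc:temporal": "2014"}
print(simplified_expand(doc, context))
```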
danbri: what about the specifications?
ivan: only a few changes ... but notes the need to update the metadata vocab as agreed earlier
... the datatype area is currently underspecified, esp. date formats
<danbri> jtandy: I'm looking forward to contributing as an editor
ivan: notes the need for help with the specification work ... and notes that there's still an XML doc to do too
phila: lots of people still care about XML
<danbri> http://www.google.com/trends/explore#q=XML%2C%20JSON%2C%20SQL
<rgrp> i have to drop if that is ok ...
<danbri> thanks rufus
phila: ivan mentioned dates and datatypes; people write dates inconsistently in CSV files ... how can we handle date normalisation?
<rgrp> +1 on phil's point re bad dates ;-) cf http://okfnlabs.org/bad-data/ex/gla-spending/
ivan: from the conversion point of view it is easy ... using the 'format' specification in the metadata we can convert into a "proper" RDF (xsd) datatype
... but if people write rubbish, what can we do?
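A minimal sketch of the conversion step ivan describes, assuming the "format" value is a Python strptime pattern (the syntax of the metadata "format" term had not been specified at this point) and that the target datatype is xsd:date. Values that do not match the declared format are reported as None rather than guessed.

```python
from datetime import datetime

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def to_xsd_date(value, fmt):
    """Parse value with the declared format; return an N-Triples typed literal, or None."""
    try:
        parsed = datetime.strptime(value.strip(), fmt).date()
    except ValueError:
        return None  # value does not match the declared format
    return f'"{parsed.isoformat()}"^^<{XSD_DATE}>'


print(to_xsd_date("08/10/2014", "%d/%m/%Y"))
print(to_xsd_date("Oct 8th", "%d/%m/%Y"))
```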
phila: because the poor date / datetime writing is so common, can we make a special case for validation?
ivan: there is a "format" metadata term; also, there are about 15 well-known date forms that could be checked against
danbri: on the other hand, if date strings are so poor, this could be an argument for tolerance?
phila: agreed, I worry about enforcement of a detailed pattern introducing errors where people don't know the details
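The tolerant approach danbri and phila discuss might look like the sketch below: try a short list of common date forms and report whether a value is unparseable, ambiguous or clear, instead of rejecting the file outright. The list is a hypothetical handful, not the "about 15" well-known forms ivan mentions.

```python
from datetime import datetime

# Hypothetical subset of well-known date forms.
KNOWN_FORMS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%d %B %Y", "%d-%b-%y"]


def guess_dates(value):
    """Return (iso_date, form) for every known form that parses value."""
    matches = []
    for fmt in KNOWN_FORMS:
        try:
            parsed = datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
        matches.append((parsed.isoformat(), fmt))
    return matches


def check_date(value):
    """Classify a value as ok, ambiguous (several readings) or unparseable."""
    dates = {iso for iso, _ in guess_dates(value)}
    if not dates:
        return "unparseable", None
    if len(dates) > 1:
        return "ambiguous", sorted(dates)
    return "ok", dates.pop()


print(check_date("2014-10-08"))
print(check_date("08/10/2014"))
print(check_date("soon"))
```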
<danbri> t-10
danbri: ultimately the only thing that will drive up data quality is getting data used!
danbri: has posted to the mailing list ...
... implementation has been decoupled from SQL and modified to take CSV as input
... the "event" example is working; 10 triples per row and exactly the triples I wanted (matching what people actually use)
... but this is template driven, significantly beyond direct mapping
... is it beyond mustache?
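The difference between a direct mapping and a template-driven one can be shown in a few lines. This sketch uses Python's string.Template as a stand-in for mustache (it is not mustache syntax), and the column names, IRIs and three schema.org Event triples are hypothetical and cut down, not danbri's actual events mapping.

```python
import csv
import io
from string import Template

# Hypothetical per-row template: each row becomes a schema.org Event.
EVENT_TEMPLATE = Template(
    "<http://example.org/event/$id> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Event> .\n"
    "<http://example.org/event/$id> <http://schema.org/name> \"$name\" .\n"
    "<http://example.org/event/$id> <http://schema.org/startDate> "
    "\"$start\"^^<http://www.w3.org/2001/XMLSchema#date> ."
)


def render_rows(csv_text):
    """Apply the template to every CSV row; values are inserted unescaped."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(EVENT_TEMPLATE.substitute(row) for row in reader)


sample = "id,name,start\n1,Sample event,2014-10-27\n"
print(render_rows(sample))
```

Unlike a direct mapping, the template decides the subject IRIs, the type and which vocabulary each column maps to.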
... Shall we chase authors for a Working Group Note?
<danbri> https://github.com/w3c/csvw/tree/gh-pages/examples/tests/scenarios/events
<danbri> begun https://github.com/w3c/csvw/tree/gh-pages/examples/tests/scenarios/uc-24
ivan: a WG Note (for R2RML) is useful; no problem there.
... do we also want Notes for mustache etc.?
<danbri> example https://github.com/w3c/csvw/blob/gh-pages/examples/tests/scenarios/events/attempts/attempt-1/mapping-events.rml.ttl
<danbri> bye AndyS
<danbri> ivan: we need to say how to ref an RML file from our metadata
ivan: we _do_ need to include in the Recommendation how to refer to these external templates
... we have 3 mapping processes so far: R2RML, mustache, direct mapping
... would be useful to run through all the use cases to see how far the capabilities of each mapping process reach
danbri: there are a few in progress now ... a few more should be enough?
ivan: to be systematic, would go through all use cases ... to document the pros and cons of each approach
... this is a lot of work
... perhaps discuss at the F2F meeting?
... at some point we'll have to build tests _anyway_
... we need proper testing in order to progress to Recommendation
danbri: if we share the R2RML and mustache implementations we're working with already, then others in the group could work through the rest of the use cases
... will nag the R2RML folks to include an open source license
<ivan> trackbot, end telcon