DCAT team 2018-07-05

05 July 2018

Meeting Minutes

confirm agenda

<SimonCox> switch items 5 & 6

Approve minutes from last meeting

<SimonCox> https://www.w3.org/2018/06/28-dxwgdcat-minutes

<roba> +0

<SimonCox> +1

+0 (absent, sent late regrets)

<Jaroslav_Pullmann> +1

Resolved: Approve minutes from last meeting

Catalogues in which dataset is a bag of files

<SimonCox> https://github.com/w3c/dxwg/issues/256

<SimonCox> ack: alejandra

SimonCox: issue discussed 3 weeks ago

<SimonCox> alejandra: what about when distributions are bags-of-files?

<SimonCox> ... do we need to define an entity 'bag of files'

<SimonCox> ... sibling to dcat:Distribution

I meant entity File

rather than bag-of-files

roba: I was going to raise the relationship with other use cases
… the case of SOAP services
… the payload returned is wrapped inside a document
… there is a general need to describe both the packaging and the internal content separately
… one way is to say that a distribution conforms to multiple profiles
… what the wrap containers are
… I'm sure there are other approaches as well
… multiple solutions for this problem

<Jaroslav_Pullmann> Pattern from IDS: [content]->[representation: format + compression etc.]->[artifact: materialization as file]

Jaroslav_Pullmann: in Genoa we talked about a pattern
… from IDS
… representation - the syntax, how data is structured in terms of syntactical data types, media types, compression
… if we are talking about files, we have to note artifacts
… artifacts as materialization as file
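
The three-level pattern described in this turn (content, representation, artifact) can be sketched as plain data classes. This is only an illustration: the class and field names are hypothetical and are not taken from IDS or DCAT, and the example values are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Content:
    """Abstract data, independent of any syntax (hypothetical name)."""
    title: str


@dataclass(frozen=True)
class Representation:
    """How the content is serialised: media type and optional compression."""
    content: Content
    media_type: str
    compression: Optional[str] = None


@dataclass(frozen=True)
class Artifact:
    """Materialization of one representation as a concrete file."""
    representation: Representation
    file_name: str


# Invented example values, for illustration only.
observations = Content(title="Example observations")
csv_gz = Representation(observations, media_type="text/csv", compression="gzip")
artifact = Artifact(csv_gz, file_name="observations.csv.gz")
```

Keeping the file at the third level means that several files can materialize the same representation without turning each file into a separate representation.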

what is IDS?

<Zakim> SimonCox, you wanted to comment on how much abstraction vs. solving immediate problem

<Jaroslav_Pullmann> IDS: https://www.fraunhofer.de/en/research/lighthouse-projects-fraunhofer-initiatives/industrial-data-space.html

SimonCox: I'm hearing roba and Jaroslav_Pullmann pointing out that we are talking about a special case of a more general problem
… motivation when proposing this use case was dealing with a legacy issue
… common issue with existing catalogues
… as they weren't designed to distinguish distributions
… in the wild, repositories often ask people depositing data to give an archive or a set of files
… I'm a little bit nervous about losing the initial common concern
… alejandra has spotted something important
… the solution I proposed has missed the representation of the entity file
… on further reflection I don't think every distribution would be a file
… what might the relationship between a distribution and a file be?

Jaroslav_Pullmann: my reference to IDS was related to alejandra's concept of file
… can't we describe it as is done in ADMS?
… it supports nesting of datasets
… a legacy file, why not use this pattern
… dataset that has distribution
… ADMS included asset

<Jaroslav_Pullmann> I was referring to this predicate for the purpose of composing a "bag" of files: https://www.w3.org/TR/vocab-adms/#adms-includedasset

<SimonCox> alejandra: DCAT does not have granularity required

<Zakim> SimonCox, you wanted to point out that dct:relation could also manage partonomy (dataset) relations

granularity for describing the contents of a distribution

the relationship between bag-of-files and distribution is key

<SimonCox> https://rawgit.com/w3c/dxwg/dcat-dataset-relations-simon/dcat/index.html#class-dataset

as we need clear guidelines on when to use one or the other

<SimonCox> https://rawgit.com/w3c/dxwg/dcat-dataset-relations-simon/dcat/index.html#Property:dataset_relation

and a distribution itself may be a bag-of-files

so potentially we need some recursive representation

<SimonCox> See in usage note "One of the more specific sub-properties should be used if the semantics of the link are known."

<SimonCox> and 'See also:
… dct:conformsTo, dcat:distribution, dct:hasPart, dct:references, dct:requires'

SimonCox: showing the current PR with a potential representation

SimonCox: I'm motivating this from cases I've seen in catalogues
… including some documentation, perhaps a schema, files that are parts of a whole dataset
… as well as alternative representations
… subproperties of dct:relation

<Jaroslav_Pullmann> the usage note provides a sensible explanation, +1 for using "dct:relation" in case we don't know about the details

SimonCox: trying to address alejandra's concern

<Zakim> alejandra, you wanted to remind about the comment we received https://lists.w3.org/Archives/Public/public-dxwg-comments/2018Apr/0001.html

SimonCox: both in usage note and notes, the relationship should be used if the semantics are known
… are you looking for a stronger instruction to users?

<SimonCox> alejandra: we need specific examples to illustrate recommended patterns

<SimonCox> ... from CKAN, other repositories

<riccardoAlbertoni> +1 to have stronger language

reminder about the comment on the list https://lists.w3.org/Archives/Public/public-dxwg-comments/2018Apr/0001.html

SimonCox: yes, we need to deal with the issue of manifest
… perhaps the solution is to run some experiments
… and using some examples
… and working up with increasing sophistication

roba: there are a couple of overlapping concerns
… how individual distributions bundle things
… needs to be separated from a dataset as a set of files

<riccardoAlbertoni> yes

<riccardoAlbertoni> i think so

roba: is there something saying that distribution is disjoint from dataset?

<riccardoAlbertoni> No i think they are disjoint

roba: is the problem the separate concepts of dataset or distribution

SimonCox: maybe I should have put files
… I'm looking at CKAN and CSIRO data access portal
… I think it is called collection in DAP
… when a person adds a dataset to a repository, they can add multiple files
… different representations of a dataset as a whole

roba: the issue is that dataset and distribution are conflated
… then surely the packaging is a platform specific choice
… certain platforms can choose a dataset

SimonCox: the issue is that there will be a lot of bags of files

roba: another case of qualified relation problem

SimonCox: there are some first class relations
… dcat:distribution
… subproperties of dct:relation, it might have been done as qualified relations
… if you don't know the semantics of the relationship
… and you're not sure if it is a distribution
… use a dct:relation
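
The fallback rule in this turn (use a specific sub-property when the semantics are known, otherwise plain dct:relation) can be sketched as follows. The predicate names come from the "See also" list quoted earlier in these minutes; the role labels, the mapping between labels and predicates, and the `ex:` identifiers are hypothetical and only illustrate the idea.

```python
# Hypothetical labels for what a cataloguer knows about a linked resource.
# The predicates are those named in the minutes; the labels and the mapping
# are illustrative assumptions, not part of the DCAT specification.
SPECIFIC_PREDICATES = {
    "representation": "dcat:distribution",
    "part": "dct:hasPart",
    "schema": "dct:conformsTo",
    "cited": "dct:references",
    "dependency": "dct:requires",
}


def choose_predicate(known_semantics=None):
    """Return the specific predicate if known, else fall back to dct:relation."""
    return SPECIFIC_PREDICATES.get(known_semantics, "dct:relation")


def link(dataset, target, known_semantics=None):
    """Build one (subject, predicate, object) triple as a plain tuple."""
    return (dataset, choose_predicate(known_semantics), target)


# A small "bag of files" attached to one dataset (invented identifiers).
triples = [
    link("ex:dataset1", "ex:data.csv", "representation"),
    link("ex:dataset1", "ex:schema.json", "schema"),
    link("ex:dataset1", "ex:readme.txt"),  # semantics unknown
]
```

Under these assumptions only the first file is linked with dcat:distribution; the file whose role is unknown gets the generic dct:relation rather than being recorded as a distribution.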

SimonCox: we need to give people a recommendation when they don't know what the relationship is

roba: I don't think it is restricted to legacy
… it is a common problem

SimonCox: at the moment, there is nothing in the DCAT spec to tell people how to deal with this common problem

roba: there ought to be a note to say if there is no specific semantics, use a qualified relationship

SimonCox: how to qualify it if you don't know the relationship?

roba: you could put some note

SimonCox: we're trying to provide a mechanism
… alternative to distribution
… CKAN does it wrong
… because we don't tell them how to represent it
… for people that are using dcat:distribution incorrectly

<Zakim> alejandra, you wanted to say about dataset and distribution abstraction and evolution of catalogues

SimonCox: I'd defer the suggestion of a qualified relation

<SimonCox> alejandra: is Distribution actually a kind of Dataset? Did DCAT do a conflation?

<Zakim> SimonCox, you wanted to comment that definition of dcat:Distribution as _representation _ needs clarifying

I raised the issue about evolution of catalogues

what if a dataset was a bag of files

and now the same dataset is given in another representation

SimonCox: Jaroslav_Pullmann in Genoa was discussing tightening up the definition of dcat:Distribution as a representation
… then for some of the files I'm talking about in this case, if they are parts of a dataset, it might be reasonable to also model them as representations of other datasets
… but the general problem you're discussing goes away if we consider a Distribution as a representation

Jaroslav_Pullmann: this would break a lot of things
… people wouldn't bother about the distinction
… between abstract data and syntax

Jaroslav_Pullmann: the proposed solution was replying to the idea of file

Jaroslav_Pullmann: I think we have a viable solution
… that wouldn't break anything
… it would help people to find files within the catalogue

Jaroslav_Pullmann: what are the use cases for finding datasets

question about evolution of catalogues

<SimonCox> SimonCox asks alejandra: what is relationship between dcat:Distribution and dcat:File?

when you have a dataset as a bag-of-files and then the dataset is expanded with a new representation

Jaroslav_Pullmann: aren't we breaking the crucial distinction between abstract concept and concrete file
… we are talking about composites
… we have the wrapper file that is a dataset that is called a boundary
… archive file
… ADMS has further notes on the dataset
… a schema would be a dataset

<SimonCox> When I wrote 'bag of files' in the UC, I meant that there would be links from the Dataset instances to each of the files in the bag, but that the dcat:distribution predicate was incorrect for some members of the bag

Jaroslav_Pullmann: I don't see the problem here if we adopt the distinction between dataset and distribution

SimonCox: I thought we had those cases covered
… hasPart to point to another dataset
… conformsTo to point to a schema

Jaroslav_Pullmann: we should not omit the concept of dataset

<SimonCox> https://rawgit.com/w3c/dxwg/dcat-dataset-relations-simon/dcat/index.html#Property:dataset_part

SimonCox: probably we should go back to alejandra's proposal about giving examples
… graduated set of examples


<Jaroslav_Pullmann> +1 for looking at how this modeling applies to concrete (composite) examples

Action: SimonCox to construct examples to show usage of Dataset -dct:relation etc

Action: Jaroslav_Pullmann to construct examples of relations from real catalogs

Action: alejandra also to develop examples of dct:relation etc

<riccardoAlbertoni> bye, thanks a lot for the interesting discussion

<DaveBrowning> Very valuable, and constructive...

thanks, and bye!

<Jaroslav_Pullmann> bye!

Summary of Action Items

  1. SimonCox to construct examples to show usage of Dataset -dct:relation etc
  2. Jaroslav_Pullmann to construct examples of relations from real catalogs
  3. alejandra also to develop examples of dct:relation etc

Summary of Resolutions

  1. Approve minutes from last meeting