IRC log of dxwgdcat on 2018-07-05

Timestamps are in UTC.

08:18:31 [RRSAgent]
RRSAgent has joined #dxwgdcat
08:18:31 [RRSAgent]
logging to https://www.w3.org/2018/07/05-dxwgdcat-irc
08:19:00 [SimonCox]
meeting: DCAT team 2018-07-05
08:19:06 [SimonCox]
chair: SimonCox
08:20:04 [SimonCox]
regrets: AndreaPerego , PWinstanley , DaveBrowning
08:20:15 [SimonCox]
present+
08:20:28 [SimonCox]
rrsagent, draft minutes
08:20:28 [RRSAgent]
I have made the request to generate https://www.w3.org/2018/07/05-dxwgdcat-minutes.html SimonCox
08:30:58 [roba]
roba has joined #dxwgdcat
08:31:42 [alejandra]
alejandra has joined #dxwgdcat
08:31:52 [Jaroslav_Pullmann]
Jaroslav_Pullmann has joined #dxwgdcat
08:31:58 [Jaroslav_Pullmann]
present+
08:33:09 [SimonCox]
regrets: lars
08:33:42 [alejandra]
present+
08:34:49 [SimonCox]
scribe: alejandra
08:34:59 [SimonCox]
scribenick: alejandra
08:35:19 [DaveBrowning]
DaveBrowning has joined #dxwgdcat
08:35:31 [SimonCox]
agenda: https://www.w3.org/2017/dxwg/wiki/Meetings:DCAT-Telecon2018.07.05#Main_agenda
08:35:45 [SimonCox]
Topic: confirm agenda
08:37:03 [SimonCox]
switch items 5 & 6
08:37:20 [SimonCox]
topic: Approve minutes from last meeting
08:37:26 [SimonCox]
https://www.w3.org/2018/06/28-dxwgdcat-minutes
08:37:29 [roba]
+0
08:37:36 [SimonCox]
+1
08:37:44 [alejandra]
+0 (absent, sent late regrets)
08:37:53 [Jaroslav_Pullmann]
+1
08:38:16 [SimonCox]
resolved: Approve minutes from last meeting
08:38:36 [SimonCox]
topic: Catalogues in which dataset is a bag of files
08:38:44 [SimonCox]
https://github.com/w3c/dxwg/issues/256
08:39:31 [SimonCox]
q?
08:39:38 [alejandra]
+q
08:39:46 [SimonCox]
ack: alejandra
08:39:47 [alejandra]
SimonCox: issue discussed 3 weeks ago
08:39:54 [SimonCox]
ack alejandra
08:39:59 [roba]
q+
08:41:10 [SimonCox]
alejandra: what about when distributions are bags-of-files?
08:41:39 [SimonCox]
... do we need to define an entity 'bag of files'
08:41:53 [Jaroslav_Pullmann]
q+
08:42:13 [SimonCox]
... sibling to dcat:Distribution
08:42:16 [SimonCox]
q?
08:42:23 [SimonCox]
ack roba
08:42:26 [DaveBrowning]
present+
08:42:29 [alejandra]
I meant entity File
08:42:37 [alejandra]
rather than bag-of-files
08:42:50 [alejandra]
roba: I was going to raise the relationship with other use cases
08:42:56 [alejandra]
... the case of SOAP services
08:43:06 [alejandra]
... the payload returned is wrapped inside a document
08:43:08 [riccardoAlbertoni]
riccardoAlbertoni has joined #dxwgdcat
08:43:20 [SimonCox]
q+
08:43:21 [alejandra]
... there is a general need to describe both the packaging and the internal content separately
08:43:31 [riccardoAlbertoni]
present+
08:43:34 [alejandra]
... one way is to say that a distribution conforms to multiple profiles
08:43:48 [alejandra]
... what the wrap containers are
08:43:56 [alejandra]
... I'm sure there are other approaches as well
08:44:05 [alejandra]
... multiple solutions for this problem
08:44:05 [SimonCox]
ack Jaroslav_Pullmann
08:44:18 [Jaroslav_Pullmann]
Pattern from IDS: [content]->[representation: format + compression etc.]->[artifact: materialization as file]
08:44:20 [alejandra]
Jaroslav_Pullmann: in Genoa we talked about a pattern
08:44:34 [alejandra]
... from IDS
08:44:59 [alejandra]
... representation - the syntax, how data is structured in terms of syntactical data types, media types, compression
08:45:11 [alejandra]
... if we are talking about files, we have to note artifacts
08:45:11 [SimonCox]
q+ to comment on how much abstraction vs. solving immediate problem
08:45:25 [alejandra]
... artifacts as materialization as file
08:45:30 [alejandra]
what is IDS?
08:45:42 [SimonCox]
ack SimonCox
08:45:42 [Zakim]
SimonCox, you wanted to comment on how much abstraction vs. solving immediate problem
08:45:58 [Jaroslav_Pullmann]
IDS: https://www.fraunhofer.de/en/research/lighthouse-projects-fraunhofer-initiatives/industrial-data-space.html
08:46:09 [alejandra]
SimonCox: I'm hearing roba and Jaroslav_Pullmann pointing out that we are talking about a special case of a more general problem
08:46:24 [alejandra]
... motivation when proposing this use case was dealing with a legacy issue
08:46:31 [alejandra]
... common issue with existing catalogues
08:46:43 [alejandra]
... as they weren't designed to distinguish distributions
08:47:04 [alejandra]
... in the wild repositories often ask people depositing data to give an archive or a set of files
08:47:04 [Jaroslav_Pullmann]
q+
08:47:24 [alejandra]
... I'm a little bit nervous about losing the initial common concern
08:47:36 [alejandra]
... alejandra has spotted something important
08:47:54 [alejandra]
... the solution I proposed has missed the representation of the entity file
08:48:12 [alejandra]
... on further reflection I don't think any distribution would be a file
08:48:14 [alejandra]
+q
08:48:23 [SimonCox]
ack Jaroslav_Pullmann
08:48:28 [alejandra]
... what might the relationship between a distribution and a file be?
08:48:37 [SimonCox]
s/any/every/
08:48:43 [alejandra]
Jaroslav_Pullmann: my reference to IDS was related to alejandra's concept of file
08:49:03 [alejandra]
... can we not describe it as it is done in ADMS?
08:49:10 [alejandra]
... it supports nesting of datasets
08:49:24 [alejandra]
... a legacy file, why not use this pattern
08:49:29 [alejandra]
... dataset that has distribution
08:49:40 [alejandra]
... ADMS included asset
08:49:58 [SimonCox]
ack alejandra
08:50:49 [SimonCox]
q+ to point out that dct:relation could also manage partonomy (dataset) relations
08:51:12 [Jaroslav_Pullmann]
I was referring to this predicate for purpose of composing "bag" of files: https://www.w3.org/TR/vocab-adms/#adms-includedasset
08:51:13 [SimonCox]
alejandra: DCAT does not have granulatiry required
08:51:31 [SimonCox]
s/granulatiry/granularity/
08:51:40 [SimonCox]
q?
08:51:59 [SimonCox]
ack SimonCox
08:51:59 [Zakim]
SimonCox, you wanted to point out that dct:relation could also manage partonomy (dataset) relations
08:52:07 [alejandra]
granularity for describing the contents of a distribution
08:52:19 [alejandra]
the relationship between bag-of-files and distribution is key
08:52:21 [SimonCox]
https://rawgit.com/w3c/dxwg/dcat-dataset-relations-simon/dcat/index.html#class-dataset
08:52:35 [alejandra]
as we need clear guidelines on when to use one or the other
08:52:36 [SimonCox]
https://rawgit.com/w3c/dxwg/dcat-dataset-relations-simon/dcat/index.html#Property:dataset_relation
08:52:46 [alejandra]
and a distribution itself may be a bag-of-files
08:52:55 [alejandra]
so potentially we need some recursive representation
08:53:26 [SimonCox]
See in usage note "One of the more specific sub-properties should be used if the semantics of the link are known."
08:53:41 [SimonCox]
and 'See also:dct:conformsTo, dcat:distribution, dct:hasPart, dct:references, dct:requires'
08:53:45 [alejandra]
SimonCox: showing the current PR with a potential representation
08:54:06 [alejandra]
SimonCox: I'm motivating this from cases I've seen in catalogues
08:54:22 [alejandra]
... including some documentation, perhaps a schema, files that are parts of a whole dataset
08:54:29 [alejandra]
... as well as alternative representations
08:54:44 [alejandra]
... subproperties of dct:relation
08:55:12 [Jaroslav_Pullmann]
the usage note provides a sensible explanation, +1 for using "dct:relation" in case we don't know about the details
08:55:29 [alejandra]
+q to remind about the comment we received https://lists.w3.org/Archives/Public/public-dxwg-comments/2018Apr/0001.html
08:56:02 [SimonCox]
q?
08:56:07 [alejandra]
SimonCox: trying to address alejandra's concern
08:56:10 [SimonCox]
ack alejandra
08:56:10 [Zakim]
alejandra, you wanted to remind about the comment we received https://lists.w3.org/Archives/Public/public-dxwg-comments/2018Apr/0001.html
08:56:47 [alejandra]
... both in usage note and notes, the relationship should be used if the semantics are known
08:57:09 [alejandra]
... are you looking for a stronger instruction to users
08:57:44 [roba]
q+
08:58:16 [SimonCox]
alejandra: we need specific examples to illustrate recommended patterns
08:58:29 [SimonCox]
... from CKAN, other repositories
08:58:29 [riccardoAlbertoni]
+1 to have stronger language
08:59:49 [alejandra]
reminder about the comment on the list https://lists.w3.org/Archives/Public/public-dxwg-comments/2018Apr/0001.html
08:59:58 [alejandra]
SimonCox: yes, we need to deal with the issue of manifest
09:00:06 [alejandra]
... perhaps the solution is to run some experiments
09:00:11 [alejandra]
... and using some examples
09:00:19 [SimonCox]
q?
09:00:21 [alejandra]
... and working up with increasing sophistication
09:00:24 [SimonCox]
ack roba
09:00:38 [alejandra]
roba: there are a couple of overlapping concerns
09:00:49 [alejandra]
... how individual distributions bundle things
09:00:59 [alejandra]
... needs to be separated from a dataset as a set of files
09:01:01 [riccardoAlbertoni]
yes
09:01:05 [riccardoAlbertoni]
i think so
09:01:13 [alejandra]
... is there something saying that distribution is disjoint from dataset
09:01:24 [riccardoAlbertoni]
No i think they are disjoint
09:01:31 [SimonCox]
q?
09:01:58 [alejandra]
roba: is the problem the separate concepts of dataset or distribution
09:02:13 [alejandra]
SimonCox: maybe I should have put files
09:02:38 [alejandra]
... I'm looking at CKAN and CSIRO data access portal
09:02:48 [alejandra]
... I think it is called collection in DAP
09:03:05 [alejandra]
... when a person adds a dataset to a repository, they can add multiple files
09:03:16 [alejandra]
... different representations of a dataset as a whole
09:03:46 [alejandra]
+q
09:03:51 [SimonCox]
q?
09:04:08 [alejandra]
roba: the issue is that dataset and distribution are conflated
09:04:17 [alejandra]
... then surely the packaging is a platform specific choice
09:04:25 [alejandra]
... certain platforms can choose a dataset
09:04:50 [alejandra]
SimonCox: the issue is that there will be a lot of bags of files
09:05:12 [alejandra]
roba: another case of qualified relation problem
09:05:23 [alejandra]
SimonCox: there are some first class relations
09:05:27 [alejandra]
... dcat:distribution
09:05:54 [alejandra]
... subproperties of dct:relation, it might have been done as qualified relations
09:06:07 [alejandra]
... if you don't know the semantics of the relationship
09:06:16 [alejandra]
... and you're not sure if it is a distribution
09:06:20 [alejandra]
... use a dct:relation
09:06:45 [alejandra]
+q to say about dataset and distribution abstraction and evolution of catalogues
09:07:01 [SimonCox]
q?
09:07:09 [alejandra]
SimonCox: we need to give people a recommendation when they don't know what the relationship is
09:07:18 [alejandra]
roba: I don't think it is restricted to legacy
09:07:22 [alejandra]
... it is a common problem
09:07:33 [SimonCox]
q?
09:07:43 [alejandra]
SimonCox: at the moment, there is nothing in the DCAT spec to tell people how to deal with this common problem
09:08:04 [alejandra]
roba: there ought to be a note to say if there is no specific semantics, use a qualified relationship
09:08:15 [alejandra]
SimonCox: how to qualify it if you don't know the relationship?
09:08:20 [alejandra]
roba: you could put some note
09:08:44 [alejandra]
SimonCox: we're trying to provide a mechanism
09:08:50 [alejandra]
... alternative to distribution
09:09:01 [alejandra]
... CKAN does it wrong
09:09:09 [alejandra]
... because we don't tell them how to represent it
09:09:38 [alejandra]
... for people that are using dcat:distribution incorrectly
09:10:08 [SimonCox]
ack alejandra
09:10:08 [Zakim]
alejandra, you wanted to say about dataset and distribution abstraction and evolution of catalogues
09:10:08 [alejandra]
SimonCox: I'd defer the suggestion of a qualified relation
09:11:10 [SimonCox]
alejandra: is Distribution actually a kind of Dataset? Did DCAT do a conflation?
09:11:24 [SimonCox]
q+
09:12:06 [SimonCox]
q+ to comment that definition of dcat:Distribution as _representation _ needs clarifying
09:12:34 [SimonCox]
ack SimonCox
09:12:34 [Zakim]
SimonCox, you wanted to comment that definition of dcat:Distribution as _representation _ needs clarifying
09:12:44 [alejandra]
I raised the issue about evolution of catalogues
09:12:52 [alejandra]
what if a dataset was a bag of files
09:13:09 [alejandra]
and now the same dataset is given in another representation
09:13:37 [Jaroslav_Pullmann]
q+
09:13:49 [alejandra]
SimonCox: Jaroslav_Pullmann in Genoa was discussing tightening up the definition of dcat:Distribution as a representation
09:14:19 [alejandra]
... then some of the files I'm talking about in this case, if they are parts of a dataset, might reasonably also be modelled as representations of other datasets
09:14:39 [alejandra]
... but the general problem you're discussing goes away if we consider a Distribution as a representation
09:14:46 [alejandra]
Jaroslav_Pullmann: this would break a lot of things
09:15:02 [alejandra]
... people wouldn't bother about the distinction
09:15:07 [alejandra]
... between abstract data and syntax
09:16:01 [SimonCox]
q?
09:16:10 [SimonCox]
ack Jaroslav_Pullmann
09:16:15 [alejandra]
Jaroslav_Pullmann: the proposed solution was replying to the idea of file
09:16:15 [SimonCox]
q?
09:16:48 [alejandra]
Jaroslav_Pullmann: I think we have a viable solution
09:16:53 [alejandra]
... that wouldn't break anything
09:17:17 [alejandra]
... it would help people to find files within the catalogue
09:17:26 [alejandra]
+q
09:17:38 [alejandra]
Jaroslav_Pullmann: what are the use cases for finding datasets
09:18:39 [SimonCox]
ack alejandra
09:23:17 [alejandra]
question about evolution of catalogues
09:23:19 [Jaroslav_Pullmann]
q+
09:24:18 [SimonCox]
SimonCox asks alejandra: what is relationship between dcat:Distribution and dcat:File?
09:24:42 [alejandra]
when you have a dataset as a bag-of-files and then the dataset is expanded with a new representation
09:24:47 [SimonCox]
q?
09:24:53 [SimonCox]
ack Jaroslav_Pullmann
09:25:15 [alejandra]
Jaroslav_Pullmann: are we not breaking the crucial distinction between abstract concept and concrete file
09:25:21 [alejandra]
... we are talking about composites
09:25:33 [alejandra]
... we have the wrapper file that is a dataset that is called a boundary
09:25:36 [alejandra]
... archive file
09:25:48 [alejandra]
... ADMS has further notes on the dataset
09:25:53 [alejandra]
... a schema would be a dataset
09:26:21 [SimonCox]
When I wrote 'bag of files' in the UC, I meant that there would be links from the Dataset instances to each of the files in the bag, but that the dcat:distribution predicate was incorrect for some members of the bag
09:26:37 [alejandra]
... I don't see the problem here if we adopt the distinction between dataset and distribution
09:27:30 [alejandra]
SimonCox: I thought we had those cases covered
09:27:44 [alejandra]
... hasPart to point to another dataset
09:27:52 [alejandra]
... conformsTo to point to a schema
09:28:13 [alejandra]
Jaroslav_Pullmann: we should not omit the concept of dataset
09:28:21 [SimonCox]
https://rawgit.com/w3c/dxwg/dcat-dataset-relations-simon/dcat/index.html#Property:dataset_part
09:29:17 [alejandra]
SimonCox: probably we should go back to alejandra's proposal about giving examples
09:29:22 [alejandra]
... graduated set of examples
09:29:40 [alejandra]
+1
09:30:00 [Jaroslav_Pullmann]
+1 for looking at how this modeling applies to concrete (composite) examples
09:30:16 [SimonCox]
Action: SimonCox to construct examples to show usage of Dataset -dct:relation etc
09:30:16 [trackbot]
Sorry, but no Tracker is associated with this channel.
09:30:57 [SimonCox]
action: Jaroslav_Pullmann to construct examples of relations from real catalogs
09:30:57 [trackbot]
Sorry, but no Tracker is associated with this channel.
09:32:03 [SimonCox]
action: alejandra also to develop examples of dct:relation etc
09:32:03 [trackbot]
Sorry, but no Tracker is associated with this channel.
09:32:44 [riccardoAlbertoni]
bye, thanks a lot for the interesting discussion
09:32:59 [DaveBrowning]
Very valuable, and constructive...
09:33:07 [alejandra]
thanks, and bye!
09:33:15 [Jaroslav_Pullmann]
bye!
09:33:16 [SimonCox]
rrsagent, draft minutes v2
09:33:16 [RRSAgent]
I have made the request to generate https://www.w3.org/2018/07/05-dxwgdcat-minutes.html SimonCox
09:33:18 [Jaroslav_Pullmann]
present-
09:35:13 [SimonCox]
rrsagent, make logs public
09:35:22 [SimonCox]
rrsagent, draft minutes v2
09:35:22 [RRSAgent]
I have made the request to generate https://www.w3.org/2018/07/05-dxwgdcat-minutes.html SimonCox