IRC log of dxwgdcat on 2018-07-05
Timestamps are in UTC.
- 08:18:31 [RRSAgent]
- RRSAgent has joined #dxwgdcat
- 08:18:31 [RRSAgent]
- logging to https://www.w3.org/2018/07/05-dxwgdcat-irc
- 08:19:00 [SimonCox]
- meeting: DCAT team 2018-07-05
- 08:19:06 [SimonCox]
- chair: SimonCox
- 08:20:04 [SimonCox]
- regrets: AndreaPerego , PWinstanley , DaveBrowning
- 08:20:15 [SimonCox]
- present+
- 08:20:28 [SimonCox]
- rrsagent, draft minutes
- 08:20:28 [RRSAgent]
- I have made the request to generate https://www.w3.org/2018/07/05-dxwgdcat-minutes.html SimonCox
- 08:30:58 [roba]
- roba has joined #dxwgdcat
- 08:31:42 [alejandra]
- alejandra has joined #dxwgdcat
- 08:31:52 [Jaroslav_Pullmann]
- Jaroslav_Pullmann has joined #dxwgdcat
- 08:31:58 [Jaroslav_Pullmann]
- present+
- 08:33:09 [SimonCox]
- regrets: lars
- 08:33:42 [alejandra]
- present+
- 08:34:49 [SimonCox]
- scribe: alejandra
- 08:34:59 [SimonCox]
- scribenick: alejandra
- 08:35:19 [DaveBrowning]
- DaveBrowning has joined #dxwgdcat
- 08:35:31 [SimonCox]
- agenda: https://www.w3.org/2017/dxwg/wiki/Meetings:DCAT-Telecon2018.07.05#Main_agenda
- 08:35:45 [SimonCox]
- Topic: confirm agenda
- 08:37:03 [SimonCox]
- switch items 5 & 6
- 08:37:20 [SimonCox]
- topic: Approve minutes from last meeting
- 08:37:26 [SimonCox]
- https://www.w3.org/2018/06/28-dxwgdcat-minutes
- 08:37:29 [roba]
- +0
- 08:37:36 [SimonCox]
- +1
- 08:37:44 [alejandra]
- +0 (absent, sent late regrets)
- 08:37:53 [Jaroslav_Pullmann]
- +1
- 08:38:16 [SimonCox]
- resolved: Approve minutes from last meeting
- 08:38:36 [SimonCox]
- topic: Catalogues in which dataset is a bag of files
- 08:38:44 [SimonCox]
- https://github.com/w3c/dxwg/issues/256
- 08:39:31 [SimonCox]
- q?
- 08:39:38 [alejandra]
- +q
- 08:39:46 [SimonCox]
- ack: alejandra
- 08:39:47 [alejandra]
- SimonCox: issue discussed 3 weeks ago
- 08:39:54 [SimonCox]
- ack alejandra
- 08:39:59 [roba]
- q+
- 08:41:10 [SimonCox]
- alejandra: what about when distributions are bags-of-files?
- 08:41:39 [SimonCox]
- ... do we need to define an entity 'bag of files'
- 08:41:53 [Jaroslav_Pullmann]
- q+
- 08:42:13 [SimonCox]
- ... sibling to dcat:Distribution
- 08:42:16 [SimonCox]
- q?
- 08:42:23 [SimonCox]
- ack roba
- 08:42:26 [DaveBrowning]
- present+
- 08:42:29 [alejandra]
- I meant entity File
- 08:42:37 [alejandra]
- rather than bag-of-files
- 08:42:50 [alejandra]
- roba: I was going to raise the relationship with other use cases
- 08:42:56 [alejandra]
- ... the case of SOAP services
- 08:43:06 [alejandra]
- ... the payload returned is wrapped inside a document
- 08:43:08 [riccardoAlbertoni]
- riccardoAlbertoni has joined #dxwgdcat
- 08:43:20 [SimonCox]
- q+
- 08:43:21 [alejandra]
- ... there is a general need to describe both the packaging and the internal content separately
- 08:43:31 [riccardoAlbertoni]
- present+
- 08:43:34 [alejandra]
- ... one way is to say that a distribution conforms to multiple profiles
- 08:43:48 [alejandra]
- ... what the wrap containers are
- 08:43:56 [alejandra]
- ... I'm sure there are other approaches as well
- 08:44:05 [alejandra]
- ... multiple solutions for this problem
- 08:44:05 [SimonCox]
- ack Jaroslav_Pullmann
- 08:44:18 [Jaroslav_Pullmann]
- Pattern from IDS: [content]->[representation: format + compression etc.]->[artifact: materialization as file]
- 08:44:20 [alejandra]
- Jaroslav_Pullmann: in Genoa we talked about a pattern
- 08:44:34 [alejandra]
- ... from IDS
- 08:44:59 [alejandra]
- ... representation - the syntax, how data is structured in terms of syntactical data types, media types, compression
- 08:45:11 [alejandra]
- ... if we are talking about files, we have to note artifacts
- 08:45:11 [SimonCox]
- q+ to comment on how much abstraction vs. solving immediate problem
- 08:45:25 [alejandra]
- ... artifacts as materialization as file
- 08:45:30 [alejandra]
- what is IDS?
- 08:45:42 [SimonCox]
- ack SimonCox
- 08:45:42 [Zakim]
- SimonCox, you wanted to comment on how much abstraction vs. solving immediate problem
- 08:45:58 [Jaroslav_Pullmann]
- IDS: https://www.fraunhofer.de/en/research/lighthouse-projects-fraunhofer-initiatives/industrial-data-space.html
- 08:46:09 [alejandra]
- SimonCox: I'm hearing roba and Jaroslav_Pullmann pointing out that we are talking about a special case of a more general problem
- 08:46:24 [alejandra]
- ... motivation when proposing this use case was dealing with a legacy issue
- 08:46:31 [alejandra]
- ... common issue with existing catalogues
- 08:46:43 [alejandra]
- ... as they weren't designed to distinguish distributions
- 08:47:04 [alejandra]
- ... in the wild, repositories often ask people depositing data to give an archive or a set of files
- 08:47:04 [Jaroslav_Pullmann]
- q+
- 08:47:24 [alejandra]
- ... I'm a little bit nervous about losing the initial common concern
- 08:47:36 [alejandra]
- ... alejandra has spotted something important
- 08:47:54 [alejandra]
- ... the solution I proposed has missed the representation of the entity file
- 08:48:12 [alejandra]
- ... on further reflection I don't think any distribution would be a file
- 08:48:14 [alejandra]
- +q
- 08:48:23 [SimonCox]
- ack Jaroslav_Pullmann
- 08:48:28 [alejandra]
- ... what might the relationship between a distribution and a file be?
- 08:48:37 [SimonCox]
- s/any/every/
- 08:48:43 [alejandra]
- Jaroslav_Pullmann: my reference to IDS was related to alejandra's concept of file
- 08:49:03 [alejandra]
- ... can we not describe it as it is done in ADMS?
- 08:49:10 [alejandra]
- ... it supports nesting of datasets
- 08:49:24 [alejandra]
- ... a legacy file, why not use this pattern
- 08:49:29 [alejandra]
- ... dataset that has distribution
- 08:49:40 [alejandra]
- ... ADMS included asset
- 08:49:58 [SimonCox]
- ack alejandra
- 08:50:49 [SimonCox]
- q+ to point out that dct:relation could also manage partonomy (dataset) relations
- 08:51:12 [Jaroslav_Pullmann]
- I was referring to this predicate for purpose of composing "bag" of files: https://www.w3.org/TR/vocab-adms/#adms-includedasset
- 08:51:13 [SimonCox]
- alejandra: DCAT does not have granulatiry required
- 08:51:31 [SimonCox]
- s/granulatiry/granularity/
- 08:51:40 [SimonCox]
- q?
- 08:51:59 [SimonCox]
- ack SimonCox
- 08:51:59 [Zakim]
- SimonCox, you wanted to point out that dct:relation could also manage partonomy (dataset) relations
- 08:52:07 [alejandra]
- granularity for describing the contents of a distribution
- 08:52:19 [alejandra]
- the relationship between bag-of-files and distribution is key
- 08:52:21 [SimonCox]
- https://rawgit.com/w3c/dxwg/dcat-dataset-relations-simon/dcat/index.html#class-dataset
- 08:52:35 [alejandra]
- as we need clear guidelines on when to use one or the other
- 08:52:36 [SimonCox]
- https://rawgit.com/w3c/dxwg/dcat-dataset-relations-simon/dcat/index.html#Property:dataset_relation
- 08:52:46 [alejandra]
- and a distribution itself may be a bag-of-files
- 08:52:55 [alejandra]
- so potentially we need some recursive representation
- 08:53:26 [SimonCox]
- See in usage note "One of the more specific sub-properties should be used if the semantics of the link are known."
- 08:53:41 [SimonCox]
- and 'See also:dct:conformsTo, dcat:distribution, dct:hasPart, dct:references, dct:requires'
- 08:53:45 [alejandra]
- SimonCox: showing the current PR with a potential representation
- 08:54:06 [alejandra]
- SimonCox: I'm motivating this from cases I've seen in catalogues
- 08:54:22 [alejandra]
- ... including some documentation, perhaps a schema, files that are parts of a whole dataset
- 08:54:29 [alejandra]
- ... as well as alternative representations
- 08:54:44 [alejandra]
- ... subproperties of dct:relation
- 08:55:12 [Jaroslav_Pullmann]
- the usage note provides a sensible explanation, +1 for using "dct:relation" in case we don't know about the details
- 08:55:29 [alejandra]
- +q to remind about the comment we received https://lists.w3.org/Archives/Public/public-dxwg-comments/2018Apr/0001.html
- 08:56:02 [SimonCox]
- q?
- 08:56:07 [alejandra]
- SimonCox: trying to address alejandra's concern
- 08:56:10 [SimonCox]
- ack alejandra
- 08:56:10 [Zakim]
- alejandra, you wanted to remind about the comment we received https://lists.w3.org/Archives/Public/public-dxwg-comments/2018Apr/0001.html
- 08:56:47 [alejandra]
- ... both in the usage note and the notes, the relationship should be used if the semantics are known
- 08:57:09 [alejandra]
- ... are you looking for a stronger instruction to users
- 08:57:44 [roba]
- q+
- 08:58:16 [SimonCox]
- alejandra: we need specific examples to illustrate recommended patterns
- 08:58:29 [SimonCox]
- ... from CKAN, other repositories
- 08:58:29 [riccardoAlbertoni]
- +1 to have stronger language
- 08:59:49 [alejandra]
- reminder about the comment on the list https://lists.w3.org/Archives/Public/public-dxwg-comments/2018Apr/0001.html
- 08:59:58 [alejandra]
- SimonCox: yes, we need to deal with the issue of manifest
- 09:00:06 [alejandra]
- ... perhaps the solution is to run some experiments
- 09:00:11 [alejandra]
- ... and using some examples
- 09:00:19 [SimonCox]
- q?
- 09:00:21 [alejandra]
- ... and working up with increasing sophistication
- 09:00:24 [SimonCox]
- ack roba
- 09:00:38 [alejandra]
- roba: there are a couple of overlapping concerns
- 09:00:49 [alejandra]
- ... how individual distributions bundle things
- 09:00:59 [alejandra]
- ... needs to be separated from a dataset as a set of files
- 09:01:01 [riccardoAlbertoni]
- yes
- 09:01:05 [riccardoAlbertoni]
- i think so
- 09:01:13 [alejandra]
- ... is there something saying that distribution is disjoint from dataset
- 09:01:24 [riccardoAlbertoni]
- No i think they are disjoint
- 09:01:31 [SimonCox]
- q?
- 09:01:58 [alejandra]
- roba: is the problem the separate concepts of dataset or distribution
- 09:02:13 [alejandra]
- SimonCox: maybe I should have put files
- 09:02:38 [alejandra]
- ... I'm looking at CKAN and CSIRO data access portal
- 09:02:48 [alejandra]
- ... I think it is called a collection in DAP
- 09:03:05 [alejandra]
- ... when a person adds a dataset to a repository, they can add multiple files
- 09:03:16 [alejandra]
- ... different representations of a dataset as a whole
- 09:03:46 [alejandra]
- +q
- 09:03:51 [SimonCox]
- q?
- 09:04:08 [alejandra]
- roba: the issue is that dataset and distribution are conflated
- 09:04:17 [alejandra]
- ... then surely the packaging is a platform specific choice
- 09:04:25 [alejandra]
- ... certain platforms can choose a dataset
- 09:04:50 [alejandra]
- SimonCox: the issue is that there will be a lot of bags of files
- 09:05:12 [alejandra]
- roba: another case of qualified relation problem
- 09:05:23 [alejandra]
- SimonCox: there are some first class relations
- 09:05:27 [alejandra]
- ... dcat:distribution
- 09:05:54 [alejandra]
- ... subproperties of dct:relation, it might have been done as qualified relations
- 09:06:07 [alejandra]
- ... if you don't know the semantics of the relationship
- 09:06:16 [alejandra]
- ... and you're not sure if it is a distribution
- 09:06:20 [alejandra]
- ... use a dct:relation
- 09:06:45 [alejandra]
- +q to say about dataset and distribution abstraction and evolution of catalogues
- 09:07:01 [SimonCox]
- q?
- 09:07:09 [alejandra]
- SimonCox: we need to give people a recommendation when they don't know what the relationship is
- 09:07:18 [alejandra]
- roba: I don't think it is restricted to legacy
- 09:07:22 [alejandra]
- ... it is a common problem
- 09:07:33 [SimonCox]
- q?
- 09:07:43 [alejandra]
- SimonCox: at the moment, there is nothing in the DCAT spec to tell people how to deal with this common problem
- 09:08:04 [alejandra]
- roba: there ought to be a note to say if there is no specific semantics, use a qualified relationship
- 09:08:15 [alejandra]
- SimonCox: how to qualify it if you don't know the relationship?
- 09:08:20 [alejandra]
- roba: you could put some note
- 09:08:44 [alejandra]
- SimonCox: we're trying to provide a mechanism
- 09:08:50 [alejandra]
- ... alternative to distribution
- 09:09:01 [alejandra]
- ... CKAN does it wrong
- 09:09:09 [alejandra]
- ... because we don't tell them how to represent it
- 09:09:38 [alejandra]
- ... for people that are using dcat:distribution incorrectly
- 09:10:08 [SimonCox]
- ack alejandra
- 09:10:08 [Zakim]
- alejandra, you wanted to say about dataset and distribution abstraction and evolution of catalogues
- 09:10:08 [alejandra]
- SimonCox: I'd defer the suggestion of a qualified relation
- 09:11:10 [SimonCox]
- alejandra: is Distribution actually a kind of Dataset? Did DCAT do a conflation?
- 09:11:24 [SimonCox]
- q+
- 09:12:06 [SimonCox]
- q+ to comment that definition of dcat:Distribution as _representation_ needs clarifying
- 09:12:34 [SimonCox]
- ack SimonCox
- 09:12:34 [Zakim]
- SimonCox, you wanted to comment that definition of dcat:Distribution as _representation_ needs clarifying
- 09:12:44 [alejandra]
- I raised the issue about evolution of catalogues
- 09:12:52 [alejandra]
- what if a dataset was a bag of files
- 09:13:09 [alejandra]
- and now the same dataset is given in another representation
- 09:13:37 [Jaroslav_Pullmann]
- q+
- 09:13:49 [alejandra]
- SimonCox: Jaroslav_Pullmann in Genoa was discussing tightening up the definition of dcat:Distribution as a representation
- 09:14:19 [alejandra]
- ... then for some of the files I'm talking about in this case, if they are parts of a dataset, it might be reasonable to also model them as representations of other datasets
- 09:14:39 [alejandra]
- ... but the general problem you're discussing goes away if we consider a Distribution as a representation
- 09:14:46 [alejandra]
- Jaroslav_Pullmann: this would break a lot of things
- 09:15:02 [alejandra]
- ... people wouldn't bother about the distinction
- 09:15:07 [alejandra]
- ... between abstract data and syntax
- 09:16:01 [SimonCox]
- q?
- 09:16:10 [SimonCox]
- ack Jaroslav_Pullmann
- 09:16:15 [alejandra]
- Jaroslav_Pullmann: the proposed solution was replying to the idea of file
- 09:16:15 [SimonCox]
- q?
- 09:16:48 [alejandra]
- Jaroslav_Pullmann: I think we have a viable solution
- 09:16:53 [alejandra]
- ... that wouldn't break anything
- 09:17:17 [alejandra]
- ... it would help people to find files within the catalogue
- 09:17:26 [alejandra]
- +q
- 09:17:38 [alejandra]
- Jaroslav_Pullmann: what are the use cases for finding datasets
- 09:18:39 [SimonCox]
- ack alejandra
- 09:23:17 [alejandra]
- question about evolution of catalogues
- 09:23:19 [Jaroslav_Pullmann]
- q+
- 09:24:18 [SimonCox]
- SimonCox asks alejandra: what is relationship between dcat:Distribution and dcat:File?
- 09:24:42 [alejandra]
- when you have a dataset as a bag-of-files and then the dataset is expanded with a new representation
- 09:24:47 [SimonCox]
- q?
- 09:24:53 [SimonCox]
- ack Jaroslav_Pullmann
- 09:25:15 [alejandra]
- Jaroslav_Pullmann: are we not breaking the crucial distinction between abstract concept and concrete file
- 09:25:21 [alejandra]
- ... we are talking about composites
- 09:25:33 [alejandra]
- ... we have the wrapper file that is a dataset that is called a boundary
- 09:25:36 [alejandra]
- ... archive file
- 09:25:48 [alejandra]
- ... ADMS has further notes on the dataset
- 09:25:53 [alejandra]
- ... a schema would be a dataset
- 09:26:21 [SimonCox]
- When I wrote 'bag of files' in the UC, I meant that there would be links from the Dataset instances to each of the files in the bag, but that the dcat:distribution predicate was incorrect for some members of the bag
- 09:26:37 [alejandra]
- ... I don't see the problem here if we adopt the distinction between dataset and distribution
- 09:27:30 [alejandra]
- SimonCox: I thought we had those cases covered
- 09:27:44 [alejandra]
- ... hasPart to point to another dataset
- 09:27:52 [alejandra]
- ... conformsTo to point to a schema
- 09:28:13 [alejandra]
- Jaroslav_Pullmann: we should not omit the concept of dataset
- 09:28:21 [SimonCox]
- https://rawgit.com/w3c/dxwg/dcat-dataset-relations-simon/dcat/index.html#Property:dataset_part
- 09:29:17 [alejandra]
- SimonCox: probably we should go back to alejandra's proposal about giving examples
- 09:29:22 [alejandra]
- ... graduated set of examples
- 09:29:40 [alejandra]
- +1
- 09:30:00 [Jaroslav_Pullmann]
- +1 for looking at how this modeling applies to concrete (composite) examples
- 09:30:16 [SimonCox]
- Action: SimonCox to construct examples to show usage of Dataset -dct:relation etc
- 09:30:16 [trackbot]
- Sorry, but no Tracker is associated with this channel.
- 09:30:57 [SimonCox]
- action: Jaroslav_Pullmann to construct examples of relations from real catalogs
- 09:30:57 [trackbot]
- Sorry, but no Tracker is associated with this channel.
- 09:32:03 [SimonCox]
- action: alejandra also to develop examples of dct:relation etc
- 09:32:03 [trackbot]
- Sorry, but no Tracker is associated with this channel.
- 09:32:44 [riccardoAlbertoni]
- bye, thanks a lot for the interesting discussion
- 09:32:59 [DaveBrowning]
- Very valuable, and constructive...
- 09:33:07 [alejandra]
- thanks, and bye!
- 09:33:15 [Jaroslav_Pullmann]
- bye!
- 09:33:16 [SimonCox]
- rrsagent, draft minutes v2
- 09:33:16 [RRSAgent]
- I have made the request to generate https://www.w3.org/2018/07/05-dxwgdcat-minutes.html SimonCox
- 09:33:18 [Jaroslav_Pullmann]
- present-
- 09:35:13 [SimonCox]
- rrsagent, make logs public
- 09:35:22 [SimonCox]
- rrsagent, draft minutes v2
- 09:35:22 [RRSAgent]
- I have made the request to generate https://www.w3.org/2018/07/05-dxwgdcat-minutes.html SimonCox