08:18:31 RRSAgent has joined #dxwgdcat
08:18:31 logging to https://www.w3.org/2018/07/05-dxwgdcat-irc
08:19:00 meeting: DCAT team 2018-07-05
08:19:06 chair: SimonCox
08:20:04 regrets: AndreaPerego , PWinstanley , DaveBrowning
08:20:15 present+
08:20:28 rrsagent, draft minutes
08:20:28 I have made the request to generate https://www.w3.org/2018/07/05-dxwgdcat-minutes.html SimonCox
08:30:58 roba has joined #dxwgdcat
08:31:42 alejandra has joined #dxwgdcat
08:31:52 Jaroslav_Pullmann has joined #dxwgdcat
08:31:58 present+
08:33:09 regrets: lars
08:33:42 present+
08:34:49 scribe: alejandra
08:34:59 scribenick: alejandra
08:35:19 DaveBrowning has joined #dxwgdcat
08:35:31 agenda: https://www.w3.org/2017/dxwg/wiki/Meetings:DCAT-Telecon2018.07.05#Main_agenda
08:35:45 Topic: confirm agenda
08:37:03 switch items 5 & 6
08:37:20 topic: Approve minutes from last meeting
08:37:26 https://www.w3.org/2018/06/28-dxwgdcat-minutes
08:37:29 +0
08:37:36 +1
08:37:44 +0 (absent, sent late regrets)
08:37:53 +1
08:38:16 resolved: Approve minutes from last meeting
08:38:36 topic: Catalogues in which dataset is a bag of files
08:38:44 https://github.com/w3c/dxwg/issues/256
08:39:31 q?
08:39:38 +q
08:39:46 ack: alejandra
08:39:47 SimonCox: issue discussed 3 weeks ago
08:39:54 ack alejandra
08:39:59 q+
08:41:10 alejandra: what about when distributions are bags-of-files?
08:41:39 ... do we need to define an entity 'bag of files'
08:41:53 q+
08:42:13 ... sibling to dcat:Distribution
08:42:16 q?
08:42:23 ack roba
08:42:26 present+
08:42:29 I meant entity File
08:42:37 rather than bag-of-files
08:42:50 roba: I was going to raise the relationship with other use cases
08:42:56 ... the case of SOAP services
08:43:06 ... the payload returned is wrapped inside a document
08:43:08 riccardoAlbertoni has joined #dxwgdcat
08:43:20 q+
08:43:21 ... there is a general need to describe both the packaging and the internal content separately
08:43:31 present+
08:43:34 ... one way is to say that a distribution conforms to multiple profiles
08:43:48 ... what the wrap containers are
08:43:56 ... I'm sure there are other approaches as well
08:44:05 ... multiple solutions for this problem
08:44:05 ack Jaroslav_Pullmann
08:44:18 Pattern from IDS: [content]->[representation: format + compression etc.]->[artifact: materialization as file]
08:44:20 Jaroslav_Pullmann: in Genoa we talked about a pattern
08:44:34 ... from IDS
08:44:59 ... representation - the syntax, how data is structured in terms of syntactical data types, media types, compression
08:45:11 ... if we are talking about files, we have to note artifacts
08:45:11 q+ to comment on how much abstraction vs. solving immediate problem
08:45:25 ... artifacts as materialization as file
08:45:30 what is IDS?
08:45:42 ack SimonCox
08:45:42 SimonCox, you wanted to comment on how much abstraction vs. solving immediate problem
08:45:58 IDS: https://www.fraunhofer.de/en/research/lighthouse-projects-fraunhofer-initiatives/industrial-data-space.html
08:46:09 SimonCox: I'm hearing roba and Jaroslav_Pullmann pointing out that we are talking about a special case of a more general problem
08:46:24 ... motivation when proposing this use case was dealing with a legacy issue
08:46:31 ... common issue with existing catalogues
08:46:43 ... as they weren't designed to distinguish distributions
08:47:04 ... in the wild repositories often ask people depositing data to give an archive or a set of files
08:47:04 q+
08:47:24 ... I'm a little bit nervous about losing the initial common concern
08:47:36 ... alejandra has spotted something important
08:47:54 ... the solution I proposed has missed the representation of the entity file
08:48:12 ... on further reflection I don't think any distribution would be a file
08:48:14 +q
08:48:23 ack Jaroslav_Pullmann
08:48:28 ... what the relationship between a distribution and a file might be?
08:48:37 s/any/every/
08:48:43 Jaroslav_Pullmann: my reference to IDS was related to alejandra's concept of file
08:49:03 ... cannot we describe it as it is done in ADMS?
08:49:10 ... it supports nesting of datasets
08:49:24 ... a legacy file, why not use this pattern
08:49:29 ... dataset that has distribution
08:49:40 ... ADMS included asset
08:49:58 ack alejandra
08:50:49 q+ to point out that dct:relation could also manage partonomy (dataset) relations
08:51:12 I was referring to this predicate for purpose of composing "bag" of files: https://www.w3.org/TR/vocab-adms/#adms-includedasset
08:51:13 alejandra: DCAT does not have granulatiry required
08:51:31 s/granulatiry/granularity/
08:51:40 q?
08:51:59 ack SimonCox
08:51:59 SimonCox, you wanted to point out that dct:relation could also manage partonomy (dataset) relations
08:52:07 granularity for describing the contents of a distribution
08:52:19 the relationship between bag-of-files and distribution is key
08:52:21 https://rawgit.com/w3c/dxwg/dcat-dataset-relations-simon/dcat/index.html#class-dataset
08:52:35 as we need clear guidelines on when to use one or the other
08:52:36 https://rawgit.com/w3c/dxwg/dcat-dataset-relations-simon/dcat/index.html#Property:dataset_relation
08:52:46 and a distribution itself may be a bag-of-files
08:52:55 so potentially we need some recursive representation
08:53:26 See in usage note "One of the more specific sub-properties should be used if the semantics of the link are known."
08:53:41 and 'See also: dct:conformsTo, dcat:distribution, dct:hasPart, dct:references, dct:requires'
08:53:45 SimonCox: showing the current PR with a potential representation
08:54:06 SimonCox: I'm motivating this from cases I've seen in catalogues
08:54:22 ... including some documentation, perhaps a schema, files that are parts of a whole dataset
08:54:29 ... as well as alternative representations
08:54:44 ... subproperties of dct:relation
08:55:12 the usage note provides a sensible explanation, +1 for using "dct:relation" in case we don't know about the details
08:55:29 +q to remind about the comment we received https://lists.w3.org/Archives/Public/public-dxwg-comments/2018Apr/0001.html
08:56:02 q?
08:56:07 SimonCox: trying to address alejandra's concern
08:56:10 ack alejandra
08:56:10 alejandra, you wanted to remind about the comment we received https://lists.w3.org/Archives/Public/public-dxwg-comments/2018Apr/0001.html
08:56:47 ... both in usage note and notes, the relationship should be used if the semantics are known
08:57:09 ... are you looking for a stronger instruction to users?
08:57:44 q+
08:58:16 alejandra: we need specific examples to illustrate recommended patterns
08:58:29 ... from CKAN, other repositories
08:58:29 +1 to have stronger language
08:59:49 reminder about the comment on the list https://lists.w3.org/Archives/Public/public-dxwg-comments/2018Apr/0001.html
08:59:58 SimonCox: yes, we need to deal with the issue of manifest
09:00:06 ... perhaps the solution is to run some experiments
09:00:11 ... and using some examples
09:00:19 q?
09:00:21 ... and working up with increasing sophistication
09:00:24 ack roba
09:00:38 roba: there are a couple of overlapping concerns
09:00:49 ... how individual distributions bundle things
09:00:59 ... needs to be separated from a dataset as a set of files
09:01:01 yes
09:01:05 i think so
09:01:13 ... is there something saying that distribution is disjoint of a dataset
09:01:24 No i think they are disjoint
09:01:31 q?
09:01:58 roba: is the problem the separate concepts of dataset or distribution
09:02:13 SimonCox: maybe I should have put files
09:02:38 ... I'm looking at CKAN and CSIRO data access portal
09:02:48 ... I think it is called collection in DAP
09:03:05 ... when a person adds a dataset to a repository can add multiple files
09:03:16 ... different representations of a dataset as a whole
09:03:46 +q
09:03:51 q?
09:04:08 roba: the issue is that dataset and distribution are conflated
09:04:17 ... then surely the packaging is a platform specific choice
09:04:25 ... certain platforms can choose a dataset
09:04:50 SimonCox: the issue is that there will be a lot of bags of files
09:05:12 roba: another case of qualified relation problem
09:05:23 SimonCox: there are some first class relations
09:05:27 ... dcat:distribution
09:05:54 ... subproperties of dct:relation, it might have been done as qualified relations
09:06:07 ... if you don't know the semantics of the relationship
09:06:16 ... and you're not sure if it is a distribution
09:06:20 ... use a dct:relation
09:06:45 +q to say about dataset and distribution abstraction and evolution of catalogues
09:07:01 q?
09:07:09 SimonCox: we need to give people a recommendation when they don't know what the relationship is
09:07:18 roba: I don't think it is restricted to legacy
09:07:22 ... it is a common problem
09:07:33 q?
09:07:43 SimonCox: at the moment, there is nothing in the DCAT spec to tell people how to deal with this common problem
09:08:04 roba: there ought to be a note to say if there is no specific semantics, use a qualified relationship
09:08:15 SimonCox: how to qualify it if you don't know the relationship?
09:08:20 roba: you could put some note
09:08:44 SimonCox: we're trying to provide a mechanism
09:08:50 ... alternative to distribution
09:09:01 ... CKAN does it wrong
09:09:09 ... because we don't tell them how to represent it
09:09:38 ... for people that are using dcat:distribution incorrectly
09:10:08 ack alejandra
09:10:08 alejandra, you wanted to say about dataset and distribution abstraction and evolution of catalogues
09:10:08 SimonCox: I'd defer the suggestion of a qualified relation
09:11:10 alejandra: is Distribution actually a kind of Dataset? Did DCAT do a conflation?
09:11:24 q+
09:12:06 q+ to comment that definition of dcat:Distribution as _representation _ needs clarifying
09:12:34 ack SimonCox
09:12:34 SimonCox, you wanted to comment that definition of dcat:Distribution as _representation _ needs clarifying
09:12:44 I raised the issue about evolution of catalogues
09:12:52 what if a dataset was a bag of files
09:13:09 and now the same dataset is given in another representation
09:13:37 q+
09:13:49 SimonCox: Jaroslav_Pullmann in Genoa was discussing tightening up the definition of dcat:Distribution as a representation
09:14:19 ... then some of the files I'm talking about in this case, if they are parts of a dataset, might reasonably also be modelled as representations of other datasets
09:14:39 ... but the general problem you're discussing goes away if we consider a Distribution as a representation
09:14:46 Jaroslav_Pullmann: this would break a lot of things
09:15:02 ... people wouldn't bother about the distinction
09:15:07 ... between abstract data and syntax
09:16:01 q?
09:16:10 ack Jaroslav_Pullmann
09:16:15 Jaroslav_Pullmann: the proposed solution was replying to the idea of file
09:16:15 q?
09:16:48 Jaroslav_Pullmann: I think we have a viable solution
09:16:53 ... that wouldn't break anything
09:17:17 ... it would help people to find files within the catalogue
09:17:26 +q
09:17:38 Jaroslav_Pullmann: what are the use cases for finding datasets
09:18:39 ack alejandra
09:23:17 question about evolution of catalogues
09:23:19 q+
09:24:18 SimonCox asks alejandra: what is relationship between dcat:Distribution and dcat:File?
09:24:42 when you have a dataset as a bag-of-files and then the dataset is expanded with a new representation
09:24:47 q?
09:24:53 ack Jaroslav_Pullmann
09:25:15 Jaroslav_Pullmann: are we not breaking the crucial distinction between abstract concept and concrete file
09:25:21 ... we are talking about composites
09:25:33 ... we have the wrapper file that is a dataset that is called a boundary
09:25:36 ... archive file
09:25:48 ... ADMS has further notes on the dataset
09:25:53 ... a schema would be a dataset
09:26:21 When I wrote 'bag of files' in the UC, I meant that there would be links from the Dataset instances to each of the files in the bag, but that the dcat:distribution predicate was incorrect for some members of the bag
09:26:37 ... I don't see the problem here if we adopt the distinction between dataset and distribution
09:27:30 SimonCox: I thought we had those cases covered
09:27:44 ... hasPart to point to another dataset
09:27:52 ... conformsTo to point to a schema
09:28:13 Jaroslav_Pullmann: we should not omit the concept of dataset
09:28:21 https://rawgit.com/w3c/dxwg/dcat-dataset-relations-simon/dcat/index.html#Property:dataset_part
09:29:17 SimonCox: probably we should go back to alejandra's proposal about giving examples
09:29:22 ... graduated set of examples
09:29:40 +1
09:30:00 +1 for looking at how this modeling applies to concrete (composite) examples
09:30:16 Action: SimonCox to construct examples to show usage of Dataset -dct:relation etc
09:30:16 Sorry, but no Tracker is associated with this channel.
09:30:57 action: Jaroslav_Pullmann to construct examples of relations from real catalogs
09:30:57 Sorry, but no Tracker is associated with this channel.
09:32:03 action: alejandra also to develop examples of dct:relation etc
09:32:03 Sorry, but no Tracker is associated with this channel.
09:32:44 bye, thanks a lot for the interesting discussion
09:32:59 Very valuable, and constructive...
09:33:07 thanks, and bye!
09:33:15 bye!
09:33:16 rrsagent, draft minutes v2
09:33:16 I have made the request to generate https://www.w3.org/2018/07/05-dxwgdcat-minutes.html SimonCox
09:33:18 present-
09:35:13 rrsagent, make logs public
09:35:22 rrsagent, draft minutes v2
09:35:22 I have made the request to generate https://www.w3.org/2018/07/05-dxwgdcat-minutes.html SimonCox
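
Note appended to the log: the pattern discussed in this meeting (dcat:distribution for representations of the whole dataset, dct:hasPart for parts, dct:conformsTo for a schema, and dct:relation when the semantics of the link are not known) might be sketched roughly as below. This is only an illustration under assumptions, not an agreed outcome of the group. The function name link_files, the role labels, and all example.org IRIs are hypothetical, and pointing dcat:distribution straight at a file IRI is a simplification that skips describing the dcat:Distribution resource itself.

```python
# Hypothetical sketch: choose a predicate for each file in a "bag of files".
# Only the namespace IRIs and predicate names are real; everything else
# (function name, role labels, example.org IRIs) is an assumption.

DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

# What the cataloguer knows about a file -> predicate mentioned in the meeting.
ROLE_TO_PREDICATE = {
    "representation": DCAT + "distribution",  # a representation of the whole dataset
    "part": DCT + "hasPart",                  # one part of the dataset
    "schema": DCT + "conformsTo",             # a schema the dataset conforms to
}


def link_files(dataset_iri, files):
    """Return (subject, predicate, object) triples linking a dataset to its files.

    `files` is a list of (file_iri, role) pairs. Any role that is unknown or
    None falls back to the generic dct:relation.
    """
    triples = []
    for file_iri, role in files:
        predicate = ROLE_TO_PREDICATE.get(role, DCT + "relation")
        triples.append((dataset_iri, predicate, file_iri))
    return triples


bag = [
    ("http://example.org/files/data.csv", "representation"),
    ("http://example.org/files/tile-01.nc", "part"),
    ("http://example.org/files/schema.xsd", "schema"),
    ("http://example.org/files/readme.txt", None),  # semantics not known
]

triples = link_files("http://example.org/dataset/1", bag)
for s, p, o in triples:
    print(s, p, o)
```

The fallback to dct:relation mirrors the usage note quoted at 08:53:26: a more specific sub-property is preferred whenever the semantics of the link are known.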