20:56:05 RRSAgent has joined #dxwgdcat
20:56:05 logging to https://www.w3.org/2019/02/05-dxwgdcat-irc
20:56:18 rrsagent, make logs public
20:57:05 meeting: DXWG DCAT Working Session teleconference 05 February 2019 - Distributions
20:57:28 rrsagent, draft minutes v2
20:57:28 I have made the request to generate https://www.w3.org/2019/02/05-dxwgdcat-minutes.html DaveBrowning
20:58:01 present+
20:58:13 riccardoAlbertoni has joined #dxwgdcat
20:58:22 agenda: https://www.w3.org/2017/dxwg/wiki/Meetings:DCAT-Telecon2019.02.05
20:58:26 present+
20:59:31 rrsagent, draft minutes v2
20:59:31 I have made the request to generate https://www.w3.org/2019/02/05-dxwgdcat-minutes.html DaveBrowning
21:00:03 AndreaPerego has joined #dxwgdcat
21:00:49 alejandra has joined #dxwgdcat
21:00:49 present+
21:01:12 present+
21:01:13 present+
21:01:28 hello!
21:05:58 SimonCox has joined #dxwgdcat
21:06:04 present+
21:06:20 https://docs.google.com/document/d/18tFkR3PP7DECjBnQjsIf0_XD8i51-UTIBPuCMtQUXyw/edit?usp=sharing
21:07:38 +q
21:07:51 ack alejandra
21:07:59 q+ to say we should scribe as for any other meeting
21:08:44 PWinstanley_ has joined #dxwgdcat
21:09:34 ack AndreaPerego
21:09:34 AndreaPerego, you wanted to say we should scribe as for any other meeting
21:09:42 Makx has joined #dxwgdcat
21:09:53 I think we need to first discuss the set of issues to discuss within the broad topic of distributions
21:10:12 present+ Makx
21:10:52 scribenick: DaveBrowning
21:10:57 I created two github projects: one with issues related to distribution definition and relationship to profiles and the second one with Makx's set of issues related to packaging and file composition of distributions
21:11:19 then I suggest we use IRC as usual and the google doc for collaborative editing of the text once we reach some conclusions
21:11:31 RRSAgent, draft minutes v2
21:11:31 I have made the request to generate https://www.w3.org/2019/02/05-dxwgdcat-minutes.html AndreaPerego
21:11:57 links to projects with relevant issues:
21:11:57 https://github.com/w3c/dxwg/projects/8
21:12:05 https://github.com/w3c/dxwg/projects/6
21:13:00 +1
21:13:30 PWinstanley has joined #dxwgdcat
21:14:25 Focus on definition first
21:14:43 ... i.e. https://github.com/w3c/dxwg/projects/8
21:14:56 ... which has had some recent discussion
21:15:24 +q to mention Clemens's comment about profiles and distributions
21:15:27 SimonCox: Original definition had this concept of a downloadable thing/file
21:16:05 ... introduction of dataservices tightened up the definition of distribution
21:16:55 ... with the idea that it's a representation in REST terms
21:18:04 The issue we are discussing is: https://github.com/w3c/dxwg/issues/317
21:18:12 ... so what is the notion of distribution for?
21:19:09 q+ to ask what kind of differences would be acceptable
21:19:31 ack alejandra
21:19:31 alejandra, you wanted to mention Clemens's comment about profiles and distributions
21:19:56 https://github.com/w3c/dxwg/issues/531
21:20:11 alejandra: Rob's original issue has an echo in Clemens's question & example
21:20:59 ... how do profiles and different distributions for the same dataset interact?
21:21:03 https://github.com/w3c/dxwg/issues/411
21:22:12 q?
21:22:24 ack DaveBrowning
21:22:24 DaveBrowning, you wanted to ask what kind of differences would be acceptable
21:22:42 Makx_ has joined #dxwgdcat
21:23:12 PWinstanley has joined #dxwgdcat
21:27:32 RRSAgent, draft minutes v2
21:27:32 I have made the request to generate https://www.w3.org/2019/02/05-dxwgdcat-minutes.html AndreaPerego
21:28:00 q+
21:29:09 +q
21:29:12 q+
21:29:25 ack AndreaPerego
21:30:51 SimonCox: Unlikely to have a very hard and fast decision - will be domain (and publisher) dependent
21:31:51 AndreaPerego: Lots of variation in the industry
21:32:12 q+
21:32:33 ... there doesn't seem to be a defining rule
21:33:02 q-
21:33:26 ... informationally equivalent is too hard a rule to be applied in every situation
21:33:49 although we are saying that we leave it to the providers, what mechanism are we providing for them to actively say that they are informationally equivalent (or not)? I'm thinking here about datasets that might be too large for people to examine in detail
21:34:02 ... give examples of how things can be done
21:34:05 q?
21:34:12 ack alejandra
21:34:49 alejandra: including any hard rule won't help
21:35:23 Makx has joined #dxwgdcat
21:35:37 q+
21:35:47 ack riccardoAlbertoni
21:36:21 alejandra: including additional files and information looks more valuable
21:36:48 q?
21:37:01 +q
21:37:53 riccardoAlbertoni: if we had examples, then we could make it clear where informationally equivalent wouldn't be useful
21:37:59 I think we need to consider the support for associating files to distributions
21:38:16 q?
21:39:34 I think we need to move away from the 'informationally equivalent' goal
21:39:44 q+ to ask about alignment with services
21:41:15 ack AndreaPerego
21:42:10 AndreaPerego: we should also try to understand what the definition is for - most data providers have a strong view...
21:42:52 ... sometimes they will use non-equivalent distributions, sometimes equivalent. This should be informative guidance
21:43:09 q+
21:43:16 ... we can show how people use it
21:43:35 ... there is no right and wrong
21:44:27 ... shouldn't over-harmonize
21:44:33 q?
21:44:37 ack alejandra
21:45:07 alejandra: Suggest we avoid informationally equivalent...
21:45:53 ... examples only talk of formats, but we need to acknowledge other ways that distributions might differ
21:46:03 ... suggest we vote...
21:46:05 q?
21:46:12 ack DaveBrowning
21:46:12 DaveBrowning, you wanted to ask about alignment with services
21:47:24 DaveBrowning: distribution concept is the access information
21:47:39 ... and there is no guarantee that what you are going to find is equivalent
21:47:49 ... you can describe it in a more precise way but you have to add all of that
21:50:36 ack Makx
21:52:11 Makx: Should keep the definition succinct
21:52:57 ... it gives access to a file that gives data for the dataset
21:53:15 q+ to ask about machines, rather than people, and how they might 'decide' between what to select when there are choices of the same dataset
21:53:25 +q
21:53:30 q+
21:53:52 ... but examples are good
21:55:37 q?
21:57:07 ack alejandra
21:57:51 alejandra: distribution is the representation of the data, yes
21:58:21 ... but only gives examples that differ by format - we have practice of other differences
21:58:47 +1 no more info equivalence...
21:58:55 Distribution is the data, there might be choices, one might be uncleaned (full of duplicates) and the other is cleaned up. Is there going to be any mechanism for a machine to know that there is the same information in the two?
21:59:38 ... but we shouldn't talk about info equivalent
22:00:14 ok
22:00:37 +1 to alejandra
22:01:23 proposed: we won't require different distributions to be informationally equivalent and leave this as a judgement call by the data providers
22:01:29 +1
22:01:31 +1
22:01:34 +1
22:01:34 +1
22:01:36 +1
22:01:58 q+
22:02:07 +1
22:02:27 resolved: we won't require different distributions to be informationally equivalent and leave this as a judgement call by the data providers
22:02:31 RRSAgent, draft minutes v2
22:02:31 I have made the request to generate https://www.w3.org/2019/02/05-dxwgdcat-minutes.html AndreaPerego
22:02:40 q?
22:02:48 scribenick: AndreaPerego
22:02:56 ack PWinstanley
22:03:29 q+
22:03:43 DaveBrowning: The second part you were mentioning, alejandra, you said we should acknowledge the different examples of distributions, right?
22:03:45 q-
22:03:50 ack DaveBrowning
22:03:58 alejandra: Yep. I can write a proposal and see if people agree.
22:03:59 proposed: to add to the dcat:Distribution definition a mention that distributions may differ in various ways, including: Natural language; Media-type or format; Schematic organization; Temporal and spatial resolution, level of detail; ...
22:04:09 +1
22:04:09 ... By copying SimonCox's text :)
22:04:09 +1
22:04:12 +1
22:04:14 +1
22:04:29 +1
22:04:46 +1
22:06:51 the current definition is: “A specific representation of a dataset. A dataset might be available in several different forms, and these forms might comprise both different serializations or different schematic arrangements of the same data. Examples of distributions include a CSV file, a netCDF file, a JSON document, or a data-cube.”
22:06:56 0
22:08:17 resolved: to add to the dcat:Distribution definition a mention that distributions may differ in various ways, including: Natural language; Media-type or format; Schematic organization; Temporal and spatial resolution, level of detail; ...
22:08:23 RRSAgent, draft minutes v2
22:08:23 I have made the request to generate https://www.w3.org/2019/02/05-dxwgdcat-minutes.html AndreaPerego
22:08:55 RRSAgent, draft minutes v2
22:08:55 I have made the request to generate https://www.w3.org/2019/02/05-dxwgdcat-minutes.html AndreaPerego
22:09:04 maybe let's check again: https://github.com/w3c/dxwg/issues/411
22:09:08 DaveBrowning: Any other aspects we should discuss?
22:09:21 and see if we can close it after addressing this?
22:11:03 alejandra: Have we also addressed the long discussion on informational equivalence in https://github.com/w3c/dxwg/issues/411 ?
22:11:55 DaveBrowning: Overall, yes, IMO the main concerns are addressed by the resolution.
22:12:28 ... So let's move to the other aspects we mentioned.
22:13:14 ... Let's wait to close #411 until after we have revised the DCAT spec as decided.
22:13:31 DaveBrowning: Should we look at the other issues in sprint 1?
22:14:18 +q
22:14:30 ack alejandra
22:14:58 alejandra: There was a lot of discussion on the profile one, so it would be sensible to see if we can reach some agreement on it.
22:15:02 +1
22:15:02 +1
22:15:05 https://github.com/w3c/dxwg/issues/531
22:15:07 +1
22:15:16 +1
22:15:29 +1
22:15:37 +1
22:16:05 most of the discussion happened here: https://github.com/w3c/dxwg/issues/317
22:16:09 but 531 is related
22:16:49 [all]: They look like the same issue.
22:17:26 alejandra: SimonCox, you say the one to be closed should be Clemens's one, right?
22:17:47 SimonCox: Yes, saying that this is addressed by the decision on the other one.
22:17:58 q?
22:18:27 q+
22:18:39 alejandra: Actually, Clemens refers to informational equivalence, but about profiles. So, do we need to distinguish between profiles and distributions? Or I'm misunderstanding Clemens's point.
22:18:45 ack Makx
22:19:40 Makx: Yes, Clemens mentions profiles, but it is actually about different distributions (in different profiles which are not informationally equivalent).
22:20:05 alejandra: So we can add profiles in the Google doc as one of the examples of how distributions can be different.
22:20:14 +1 to add "profile" to the list of the possible variations
22:21:17 DaveBrowning: So this looks like we can close #317
22:21:44 RRSAgent, draft minutes v2
22:21:44 I have made the request to generate https://www.w3.org/2019/02/05-dxwgdcat-minutes.html AndreaPerego
22:21:48 Moved https://github.com/w3c/dxwg/issues/531 to 'In progress'
22:22:05 DaveBrowning: We still have citations and distributions.
22:22:22 yes
22:22:29 +q
22:23:54 alejandra: I think what we need to do is the same as we did for datasets. So, what you need to cite is a persistent identifier, and what should be cited should have bib metadata (such as authors, publisher, publication year).
22:23:56 q+
22:24:16 q?
22:24:25 q-
22:24:28 alejandra: Can we assume that the authors of the dataset are the authors of the distribution?
22:24:38 ack AndreaPerego
22:25:00 all was considered here: https://github.com/w3c/dxwg/issues/61
22:25:50 data citation principles: https://www.force11.org/datacitationprinciples
22:27:23 +q
22:28:00 q+
22:28:06 ack alejandra
22:29:47 ack SimonCox
22:31:07 q+
22:31:13 [all]: [discussing data citation approaches]
22:31:21 ack Makx
22:31:43 +1 to Makx, AndreaPerego, SimonCox
22:31:44 fine by me, I agree with Andrea
22:32:04 +1 to leaving the citation on the dataset
22:32:04 +1 to not duplicating
22:32:18 SimonCox: I'm concerned that we are making things (datasets & distributions) too similar intensionally.
22:32:24 Makx: Same concern.
22:32:41 keep to the profile
22:32:45 DaveBrowning: So we need to reply to Annette.
22:33:49 ... Who wants to do that?
22:33:54 SimonCox: I can do that.
22:33:55 I can review the whole https://github.com/w3c/dxwg/issues/411 to see if there is anything missing or whether it is addressed
22:34:37 DaveBrowning: About https://github.com/w3c/dxwg/issues/411 , what did we decide?
22:35:20 alejandra: I think we decided we have to review it to see if the resolution on informational equivalence is enough to address it, or there's also something else.
22:35:42 alejandra: I can take care of that.
22:37:01 https://github.com/w3c/dxwg/projects/6
22:37:08 q+
22:37:09 DaveBrowning: Should we look at these issues ^^ ?
22:37:18 ack Makx
22:37:42 Makx: Last week I gave a summary of what a resolution could be on this.
22:38:22 ... We could go through them.
22:38:22 +1 to adding the two properties for compressed distributions
22:38:52 DaveBrowning: Which of these are not backward compatible?
22:39:38 Makx: There may be implementations out there that are using mediatype for that, since there was no other option.
22:40:26 q+
22:40:35 ack AndreaPerego
22:42:39 DaveBrowning: Seems that there's no objection to your proposal, Makx.
22:43:06 ... So please go ahead and make a proposal.
22:43:15 alejandra: Can you submit it as a PR?
22:43:21 Makx: I'll try.
22:44:07 DaveBrowning: So, what do we do now?
22:44:19 alejandra: Maybe we plan the next sprint.
22:44:32 DaveBrowning: I think another key issue is the one about versioning.
22:45:09 ... So, this can be a sensible discussion.
22:46:01 qq+
22:46:06 SimonCox: We have 2 options: do a lot of work, or realise we cannot do all that work, and try to address the issues to the best extent we can.
22:46:16 qq+
22:46:19 q+
22:46:51 ack mak
22:46:55 DaveBrowning: It may be also the case that we decide that all these should go into a guidance document.
22:47:00 ack Makx
22:47:10 SimonCox: Would you prepare a number of options on how we can address this?
22:47:57 Makx: We are actually not starting from scratch on versioning. I would support SimonCox's proposal not to do much. But we can pick up what we did before, at the beginning of the WG.
22:48:33 ... I don't think we should define what versioning is, but rather, in case you have versioned data, these are the properties you can use.
22:48:53 q+
22:49:01 ack DaveBrowning
22:49:01 ... Also here, we are dealing with a domain-specific problem, which is addressed in many different ways.
22:49:07 +1 to not defining what versioning is but indicating a simple set of terms that can be used.
22:49:25 DaveBrowning: Yes, I also think we shouldn't do much, just because we don't have the time.
22:50:22 +q
22:50:25 ... And to answer SimonCox's point, I'll prepare some options.
22:50:50 q-
22:50:55 DaveBrowning: About when to meet...
22:51:04 +1 on 2 hours for the usual DCAT subgroup slot
22:51:06 +1 to regular slot extended
22:51:10 Makx: Maybe next week, normal slot.
22:51:49 we need to be on the plenary
22:51:50 RRSAgent, draft minutes v2
22:51:50 I have made the request to generate https://www.w3.org/2019/02/05-dxwgdcat-minutes.html AndreaPerego
22:51:55 so same slot is better
22:52:34 bye thanks
22:52:39 RRSAgent, draft minutes v2
22:52:39 I have made the request to generate https://www.w3.org/2019/02/05-dxwgdcat-minutes.html AndreaPerego
22:52:46 [meeting adjourned]
22:52:51 chair: DaveBrowning
22:52:53 RRSAgent, draft minutes v2
22:52:53 I have made the request to generate https://www.w3.org/2019/02/05-dxwgdcat-minutes.html AndreaPerego
22:52:53 bye
22:53:06 thanks all
22:53:07 bye
22:53:11 bye