07:55:15 RRSAgent has joined #dxwg 07:55:15 logging to http://www.w3.org/2017/07/18-dxwg-irc 07:55:17 RRSAgent, make logs public 07:55:17 Zakim has joined #dxwg 07:55:19 Zakim, this will be 07:55:19 I don't understand 'this will be', trackbot 07:55:20 Meeting: Dataset Exchange Working Group Teleconference 07:55:20 Date: 18 July 2017 07:55:42 s/Teleconference/Oxford F2F Day 2/ 07:55:51 chair: Karen, Caroline 07:56:27 LuizBonino has joined #dxwg 07:57:42 newton has joined #dxwg 08:02:38 newton has joined #dxwg 08:03:21 DaveBrowning has joined #dxwg 08:03:44 present+ DaveBrowning 08:03:48 antoine has joined #dxwg 08:03:55 present+ antoine 08:05:48 annette_g has joined #dxwg 08:08:22 Caroline_ has joined #DXWG 08:08:25 Present+ 08:08:29 kcoyle has joined #dxwg 08:09:21 SimonCox has joined #dxwg 08:10:51 present+ 08:10:59 annette_g has joined #dxwg 08:11:07 Ine has joined #dxwg 08:11:09 Present+ annette_g 08:11:22 Present+ Ine 08:11:38 present+ 08:12:41 Thomas has joined #dxwg 08:12:49 present+ 08:13:19 Scribe annette_g 08:13:48 present+ 08:14:03 PWinstanley has joined #dxwg 08:14:19 present+ PWinstanley 08:14:34 LarsG has joined #dxwg 08:14:38 kcoyle: no need to revise the agenda 08:14:43 present+ 08:14:44 prewent+ 08:14:53 present+ Dave_Raggett 08:14:56 Makx has joined #dxwg 08:14:58 s/prewent+/present+/ 08:15:09 present+ Makx 08:15:13 rrsagent, set logs publc 08:15:15 Jaroslav_Pullmann has joined #dxwg 08:15:15 ic 08:15:21 present+ 08:15:28 danbri has joined #dxwg 08:15:28 rrsagent, set logs public 08:15:34 kcoyle: the first group are gathered under "dataset distributions". Some may be more pertinent than others. Start with use case 1. 08:15:45 Topic: Dataset Distributions 08:16:29 Makx: a lot of people produce zip files as distros. But you don't know what's in it. What kind of formats have been packaged in it? 08:17:09 alejandra has joined #dxwg 08:17:51 Makx: a lot of people argue that the right way to do that is to have zip files within zip files. 
We got into a discussion where you have an extra field "representation technique" to explain that. Is there a way to describe packaging better? 08:17:57 q+ to ask about URI params for addressing inside zip files 08:18:05 ack dsr 08:18:05 dsr, you wanted to ask about URI params for addressing inside zip files 08:18:10 present+ 08:18:36 Dsr: it sounds like something could be done in DCAT where you talk about the kind of distribution. 08:18:38 q? 08:18:39 zakim: who is present? 08:18:46 present+ 08:18:58 q+ 08:19:04 Makx: I have no opinion on how to do it. 08:19:09 present+ 08:19:22 q? 08:19:34 Makx: the issue is that just getting a zip file requires them to download before they can know what's in it. 08:19:39 epos_ingv_team has joined #dxwg 08:19:47 zip file might contain xml, which follows the OMXML application of GML - is it zip, xml, gml or omxml?? 08:20:11 q+ 08:20:21 Makx: the solution could be to have a required field that tells what's in the zip file. 08:20:26 ack Jaroslav_Pullmann 08:20:42 present+ 08:21:02 Jaroslav_Pullmann: the issue is the type of file 08:21:31 q+ 08:21:32 Makx: the media type is not the issue, you need to say what's inside 08:21:53 ack SimonCox 08:22:18 q+ to say that OMXML is a profile of GML that is a profile of XML 08:22:39 SimonCox: taking an example from geospatial data, if you have observational data in GXML, that might all be zipped up. So even a serialization description can use four different file types. If you impose too rigid a number of levels, you can get tied up in knots. 08:22:53 q? 08:22:59 SimonCox highlights the issue of profiles within a serialisation within a package 08:23:06 ack antoine 08:23:45 antoine: Makx , do you think one might do a csv distribution that is explicitly listed as containing csv? 08:23:52 q+ 08:23:55 q+ 08:23:55 q+ 08:23:57 q? 08:24:18 ack LarsG 08:24:18 LarsG, you wanted to say that OMXML is a profile of GML that is a profile of XML 08:24:21 ack dsr 08:24:30 Lars: oops, I missed that, can you fill it? 
08:24:45 q? 08:24:58 q+ to note a design that came up in packaged webapps - https://www.w3.org/TR/2013/WD-app-uri-20130516/#fragment 08:25:01 lars: this is media type vs content type 08:25:07 SimonCox: there's also overlap with the use case from yesterday about dataset types 08:25:22 ack LarsG 08:25:27 ack SimonCox 08:25:34 ack LuizBonino 08:25:50 q+ 08:25:55 LarsG: Just iterating myself, SimonCox's point is about media types vs profiles and then in this particular case we have the packaging, too 08:26:01 ack danbri 08:26:01 danbri, you wanted to note a design that came up in packaged webapps - https://www.w3.org/TR/2013/WD-app-uri-20130516/#fragment 08:26:02 LuizBonino: we should focus on the actual data we want. We could have the distribution layered, so that the top layer explains the content format. 08:26:13 q? 08:26:14 q+ 08:26:15 This sounds like a question of the model around metadata for datasets, distributions and data record. If we don’t want to model the structure of the resources in DCAT, we need a way to identify the profile/schema language that can, e.g. as a hierarchy of typed resources 08:26:36 ... and I could add that content type should indicate the media type, not the packaging 08:26:39 multipart mime-types? 08:26:39 q+ to talk about Web publications 08:26:41 danbri: schema.org tried to handle this. URI scheme for pointing into the contents of a zip file. 08:26:46 ack Keith 08:27:07 ack Jaroslav_Pullmann 08:27:12 Keith: consider content negotiation 08:27:30 Jaroslav_Pullmann: it could be integrated into it 08:27:31 ack phila 08:27:31 phila, you wanted to talk about Web publications 08:27:56 -> https://www.w3.org/publishing/groups/publ-wg/ Publishing Working Group 08:28:08 annette_g, rather - at Google we tried to build something using DCAT-or-Schema.org and CSVW, but when we had *multiple* linked CSV tables and a csvw-meta.json file - we didn't understand how to model those as a "distribution" unless it was zipped into one file. 
08:28:20 Phila: a new WG is starting, publishing working group. They want a single URI for a whole package. 08:28:50 q+ 08:29:10 ack Makx 08:29:17 (here's a sketch of csvw for a *single* table treated as the main topic of a Dataset ... not sure if it is better done as a distribution or distributions. https://gist.github.com/danbri/154e1b98240fe7fe60c26bd5c04d1325 ) 08:30:10 q? 08:30:16 Makx: I just have this simple use case that people are actually struggling with. We can address that or try and solve everything. Do we want to provide a simple solution for simple cases, or not? 08:30:29 kcoyle: are we ready to vote? More comments? 08:30:33 Silence ensues 08:30:46 PROPOSED: Accept UC ID1 08:30:49 +1 08:30:50 +1 (to accept all use cases ;-) ) 08:30:54 +1 08:30:54 +1 08:30:54 +1 08:30:54 +1 08:30:55 +1 08:30:57 +1 08:30:58 +1 accept all use cases 08:30:59 +1 08:30:59 +1 08:31:01 +1 08:31:02 kcoyle: +1 08:31:03 0 (not currently empowered to vote) 08:31:13 following Phil's link, -> https://www.w3.org/2017/04/publ-wg-charter/ "Packaged Web Publications" -> https://w3ctag.github.io/packaging-on-the-web/ for TAG work on this. 08:31:15 +1 08:31:22 +1 08:31:40 RESOLVED: Accept UC ID1 08:31:41 Maybe we should change the motion to 'move on to brief discussion of next use case' ;-) 08:31:48 +1 08:31:51 +1 08:32:02 RRSAgent, draft minutes v2 08:32:02 I have made the request to generate http://www.w3.org/2017/07/18-dxwg-minutes.html phila 08:32:08 Just noting that the discussion of content vs packaging is essentially http Content-Type vs Content-Encoding 08:32:27 kcoyle: next is use case 25, synchronized catalog information 08:32:27 RRSAgent, make logs public 08:32:31 RRSAgent, draft minutes v2 08:32:31 I have made the request to generate http://www.w3.org/2017/07/18-dxwg-minutes.html phila 08:33:30 Jaroslav_Pullmann: this is about how the data might be published and whether there are restrictions. 
08:33:38 q+ 08:33:47 It would prevent copying of the dataset without paying a license fee, etc. 08:34:13 q+ 08:34:14 We talked to some customers who are interested in this. 08:34:30 This holds for the open data domain as well. 08:34:31 ack PWinstanley 08:35:02 q+ to talk about ResourceSync 08:35:03 (more re dataset packaging, https://w3ctag.github.io/packaging-on-the-web/#downloading-data-for-local-processing ) 08:35:06 PWinstanley: events for transitions are relevant here 08:35:07 q+ 08:35:25 kcoyle: are you offering to create a use case? 08:35:37 ACTION: Peter will create a use case for event,transition 08:35:38 PWinstanley: I guess I am 08:35:38 Created ACTION-21 - Will create a use case for event,transition [on Peter Winstanley - due 2017-07-25]. 08:35:44 q? 08:36:12 Makx: Jaro, are you really talking about dataset descriptions or the data itself? 08:36:24 Usually people don't mind having descriptions shared. 08:36:30 q- 08:36:37 ack ma 08:36:51 ack me 08:36:51 phila, you wanted to talk about ResourceSync 08:36:54 Jaroslav_Pullmann: this is regulating access to the metadata. And distribution of it. 08:37:15 -> http://www.openarchives.org/rs/toc ResourceSync 08:37:44 phila: I went to Geneva and while there somebody gave a tutorial about "resource sync", which enables you to have a master and distributed copies with access control. 08:38:02 q+ 08:38:08 I think that would be outside the scope of DCAT, but it's a useful discussion point to say here's how to do this. 08:38:28 ack antoine 08:38:38 s/somebody/Herbert van de Sompel/ 08:38:43 riccardoAlbertoni has joined #dxwg 08:39:05 Makx: wonders whether we should mention this in data identifications. 08:39:25 phila: yes, it does link to that, that would be another way to do it. 08:39:33 q? 08:39:47 S/Makx/Antoine/ 08:40:16 https://www.w3.org/TR/ldn/ 08:40:26 q+ 08:40:33 kcoyle: do we want to vote to accept or to declare it out of scope? 
08:40:36 ack Keith 08:40:37 s/data identifications/Linked Data Notifications/ 08:40:38 s/this in data identifications/Linked Data Notifications 08:40:43 q+ 08:40:47 Keith: to what extent can we use license and rights for this? 08:41:09 In theory it should be possible to specify with license and rights. 08:41:17 q+ to talk about licensing of metadata 08:41:24 Jaroslav_Pullmann: it's harder to get it to apply to the metadata 08:41:35 present+ 08:41:52 q? 08:41:55 ack ma 08:41:58 q? 08:42:29 Makx: reacting to Keith, the licenses and rights in DCAT are for the data, not the metadata. You can assign through catalog records, but that's not correctly done. 08:42:45 q? 08:43:00 Keith: what about considering a catalog as a dataset? 08:43:08 Makx: catalogs are not datasets. 08:43:14 q+ is redhat linux v5 a dataset or a catalog? 08:43:24 Phila: that's fighting words ;) 08:43:34 q+ to ask whether redhat linux v5 is a dataset or a catalog? 08:43:35 q? 08:43:40 q+ 08:43:41 ack LarsG 08:43:41 LarsG, you wanted to talk about licensing of metadata 08:43:58 ack danbri 08:43:58 danbri, you wanted to ask whether redhat linux v5 is a dataset or a catalog? 08:44:18 danbri: is red hat distribution number 5 a dataset or catalog or both? 08:44:20 q- 08:44:26 Makx: I don't know 08:44:58 If you have an edge case, then you have to look at it. But it doesn't help to start mixing up the hierarchy of DCAT. 08:45:16 LuizBonino: isn't that what we are doing here? 08:45:28 kcoyle: well, we don't want to break what already exists 08:45:39 Makx: right, we don't want to upset people 08:45:42 q? 08:45:58 PROPOSED: Accept ID25 08:46:01 +1 08:46:08 +1 08:46:18 +1 08:46:20 -1 I think it's out of scope 08:46:23 I'm not interested to tear everything up and start over - I just want to know how to apply DCAT to a fairly obvious use case - packaged software distributions. It is totally fine to conclude that DCAT cannot handle this. 
08:46:24 -1 08:46:27 -1 08:46:31 -1 08:46:31 My point is that in several cases, people are using DCAT in a certain way exactly because things are not clear or incomplete, and if we are blocked because of this the situation will persist. 08:46:47 +1 08:46:50 -1: needs further work 08:47:01 0 08:47:04 -1 08:47:08 +1 08:47:12 +1 08:47:14 @dan, we weren't talking about packing right now 08:47:27 s/packing/packaging 08:47:44 -1 08:47:48 Jaroslav_Pullmann: there is no statement of whether dataset metadata might be freely distributed. 08:48:38 Makx, I took you to be suggesting that Catalog and Dataset should be treated as owl disjoint forever. But we can pick this up elsewhere, you're right it's not core to this UC. 08:48:38 +1 08:48:42 Phila: resourcing and subscribing are application specific 08:48:53 q+ 08:49:33 phila: this is important, and we could publish a note on it if we want. 08:50:08 That would be valuable. But it is out of scope for updating DCAT and profiles work 08:50:18 PROPOSED: UC ID25 is out of scope 08:50:22 +1 08:50:25 q+ 08:50:26 +1 08:50:28 -1 08:50:28 -1 08:50:28 +1 08:50:33 -1 08:50:39 -1 08:50:39 q? 08:50:40 +1 08:50:43 -1 08:51:01 -1 meaning I think it is in scope 08:51:07 -1 08:51:18 Dsr: looking at the text, you have catalogs and datasets. I given resource may appear in multiple categories. This use case seems orthogonal. 08:51:27 ack dsr 08:51:29 q+ 08:51:30 ack alejandra 08:51:37 S/I given/A given/ 08:51:52 q+ 08:52:27 alejandra: if the use case is just about synchronization, I agree that it's out of scope. 08:52:39 catalogues need to remain in sync with the data sets they describe, but the possibility that a given dataset/distributions are in multiple catalogues is orthogonal to that. 08:52:55 Jaroslav_Pullmann: representing the relationship would be in scope. 08:53:07 q+ 08:53:18 ack Keith 08:53:35 S/Jaroslav_Pullman/Alejandra/ 08:54:23 Keith: the relationships can be very complex, and that's quite typical. People do that to get exposure. 
I think it is a relevant use case. 08:54:39 ack Thomas 08:54:46 q+ 08:55:49 Thomas: is trying to find a way to address the example of when you want to describe the current state of something that has a history. 08:55:49 ack antoine 08:57:21 antoine: I made this to address policy aspects, like encryption. We only have one case on access policies (17), and maybe that's a bit too narrow. The problem of having the dataset in different catalogs may also create situations that need to be addressed. This wasn't about synchronization. 08:57:40 @Jaroslav, happy to help on ID25 08:57:43 ack Jaroslav_Pullmann 08:57:56 kcoyle: maybe Jaroslav_Pullmann can edit 08:58:23 thanks Makx! I'll supply a new proposal 08:58:29 @Jaroslav, happy to help on ID25 08:58:34 ACTION: on Jaroslav_Pullmann to edit and bring back to group 08:58:34 Error finding 'on'. You can review and register nicknames at . 08:58:38 phila: There was a paper/talk on duplicate entries in catalogues at the SDSVoc workshop by the folks working on the EU data portal https://www.w3.org/2016/11/sdsvoc/agenda#p24 08:58:52 kcoyle: next is 32, relationships between datasets 08:59:00 q? 08:59:21 alejandra: this relates to the previous discussion, relationships between datasets. 08:59:25 riccardoAlbertoni_ has joined #dxwg 08:59:54 q? 08:59:56 We need the ability to represent relationships and aggregations, etc. 09:00:04 Versioning is also related to this. 09:00:08 q+ 09:00:40 Thomas: how generic do you see this use case? 09:00:46 ack t 09:00:56 q+ 09:01:08 Jaroslav_Pullmann_ has joined #dxwg 09:01:15 alejandra: it may not be possible to identify all the relationships in advance, but some are known. 09:01:47 Thomas: we will need some guidance on this. I'm worried about everyone adding all relationships. 
09:01:54 q+ 09:02:05 ack DaveBrowning 09:02:08 kcoyle: the question is whether we need to have some control over at least a core of relationships 09:02:09 http://patterns.dataincubator.org/book/qualified-relation.html 09:02:09 q+ to talk about relationship management 09:02:55 DaveBrowning: this is somewhat like provenance. 09:03:08 q+ 09:03:15 I don't see how this would end up being expressed in the same way. 09:03:19 q- later 09:03:40 Thomas: it's a nightmare for governance 09:03:46 q+ 09:03:48 q+ 09:03:49 kcoyle: we don't have to have an answer today 09:03:59 ack ma 09:04:00 ack Makx 09:04:23 Makx: this is one of the most hairy issues for application profiles. We were hoping this group would have an answer for it. 09:04:40 +1 to Makx 09:04:49 q? 09:04:52 DCAT doesn't consider any relationship between datasets. This is an opportunity. 09:04:55 Makx, where's the dcat-ap mailing list archive? 09:04:55 Q+ 09:05:01 ack me 09:05:01 phila, you wanted to talk about relationship management 09:05:03 ack phila 09:05:48 ack Jaroslav_Pullmann_ 09:05:50 q+ 09:05:54 phila: +1 to Makx. The relationships are important to state, but they may not be the same across domains. We do have to say something. We provide a framework and a few explicit ones, then it's up to profiles 09:06:05 q? 09:06:15 +1 to Makx. We shouldn't shy away because things are complex. In my opinion it is better to face complexity and try to come up with a generic and elegant solution than to oversimplify and come up with a useless solution. 09:06:24 Jaroslav_Pullmann: DCAT so far does not have structure within the datasets, like aggregation of data. 09:06:29 Q- 09:06:32 q+ to ask about a core set of relation types 09:06:40 ack Keith 09:07:02 Keith: I agree with phila. A framework can have the role and temporal bounds. 09:07:07 ack alejandra 09:07:43 q+ 09:07:44 alejandra: I put a link for data cite. They have some things that we could consider as solutions. 
09:07:46 ack LarsG 09:07:46 LarsG, you wanted to ask about a core set of relation types 09:07:51 q+ 09:08:19 LarsG: do we define a few datatypes or leave everything to profiles? Maybe we adopt what datacite already has. 09:08:23 ack Thomas 09:08:57 ack Keith 09:09:03 Thomas: I agree with Phil, we define a few and let people describe in profiles. 09:09:22 Keith: reasons of privacy and security say you should keep that stuff separate. 09:09:33 annette_g: I think LarsG said relation-types, rather than datatypes? 09:09:48 PROPOSED: accept ID32 as in scope 09:09:50 +1 09:09:52 +1 09:09:52 +1 09:09:52 +1 09:09:53 +1 09:09:53 +1 09:09:53 +1 09:09:54 +1 09:09:55 S/datatypes/relation types/ 09:09:55 +1 09:09:56 +1 09:09:58 +1 09:10:01 +1 09:10:02 +1 09:10:11 +1 09:10:20 RESOLVED: accept ID32 as in scope 09:10:54 kcoyle: ID34 is next 09:11:02 More about relationships 09:12:13 https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID34 09:12:23 Actually, I don't agree that a CSV and the Excel file it came from are the same, *unless* the annotations are copied in the CSV as well 09:12:28 q? 09:12:46 q+ 09:12:47 q+ 09:13:02 q+ 09:13:09 q+ 09:13:10 ack Jaroslav_Pullmann_ 09:13:11 q- 09:13:27 scribe: Caroline_ 09:13:33 I think that it is related to how strict the definition of dataset is. For instance, if the distributions of a dataset contain different data points, wouldn't they be different datasets or versions of the dataset? 09:13:43 Jaroslav_Pullmann_: this is a distribution but it is not a change of the dataset 09:13:48 scribeNick: Caroline_ 09:13:49 the data should remain the same 09:14:11 q? 09:14:19 a subset of the data is prefigure 09:14:32 q+ 09:14:36 I support the idea that the data should remain the same because the distribution is just the interface 09:14:37 ack DaveBrowning 09:14:45 DaveBrowning: I agree with Jaroslav_Pullmann_ 09:14:57 annette_g_ has joined #dxwg 09:15:20 q? 09:15:44 q+ 09:15:44 we introduced the idea that all the distributions of datasets ?? 
09:15:45 q- 09:15:57 it does make it easier to discover 09:16:10 the same thing in each dataset 09:16:30 the distributions telling you in different ways seems to work 09:16:43 q? 09:16:44 relationships among distributions 09:16:46