08:14:39 RRSAgent has joined #dxwg
08:14:39 logging to http://www.w3.org/2017/07/17-dxwg-irc
08:15:39 Caroline_ has joined #DXWG
08:15:50 Present+
08:16:16 LarsG has joined #dxwg
08:16:22 present+
08:16:34 Introductions...
08:16:41 roba has joined #dxwg
08:16:47 newton has joined #dxwg
08:17:58 Can't hear :-(
08:18:28 AndreaPerego has joined #dxwg
08:18:33 rrsagent, set logs public
08:18:38 present+ AndreaPerego
08:18:43 present+
08:18:48 present+
08:18:52 meeting: DXWG Oxford Face to Face
08:19:26 Can hear you reasonably well Karen, but it dropped out past danbri
08:19:43 annette_g has joined #dxwg
08:19:43 chair: Karen
08:19:50 Jaroslav_Pullmann has joined #dxwg
08:19:55 present+
08:19:55 present+ Dave_Raggett
08:20:04 Present+ annette_g
08:20:18 LuizBonino has joined #dxwg
08:20:29 present+
08:20:37 scribenick: caroline_
08:20:42 * agenda says check IRC for pw ... IRC says see todays agenda...
08:21:17 Scribe: Caroline_
08:21:52 Topic: Introductions
08:21:52 kcoyle: the main goal for our F2F is to discuss the UCR
08:22:18 present Rob Atkinson
08:22:44 s/present/present+/
08:22:45 ... Caroline and I tried to categorize them. If we get to a Use Case and think it is in another category we just move it
08:22:48 present+ Rob_Atkinson
08:23:31 ... the idea is to get through all of them even if we don't get resolutions on everything we have listed
08:23:48 https://www.w3.org/2017/dxwg/wiki/Main_Page#Working_Documents -> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space & https://www.w3.org/2017/dxwg/wiki/Use_Cases_and_Requirements
08:23:56 ... if we need to, we may finish some of them afterwords
08:24:07 Keith has joined #dxwg
08:24:13 s/afterwords/afterwards/
08:24:34 Topic: DCAT and "dataset"
08:24:49 q+
08:24:51 kcoyle: the first one we are going to discuss is https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID8
08:24:53 q?
08:25:11 ack SimonCox
08:25:48 SimonCox: looking at version one of DCAT there is no ??
08:26:14 ... to the extended DCAT part of what we are looking at is part of dublincore
08:26:21 https://www.w3.org/TR/vocab-dcat/#introduction """Data can come in many formats, ranging from spreadsheets over XML and RDF to various speciality formats. DCAT does not make any assumptions about the format of the datasets described in a catalog. Other, complementary vocabularies may be used together with DCAT to provide more detailed format-specific information."""
08:27:20 ... what is the scope for DCAT descriptions?
08:27:22 ... also dataset
08:27:43 ... we recommend the use of existing DCAT recommendations
08:27:51 DCAT alludes to http://dublincore.org/documents/2003/02/12/dcmi-type-vocabulary/ """(Dataset) A dataset is information encoded in a defined structure (for example, lists, tables, and databases), intended to be useful for direct machine processing."""
08:28:15 ... the original dublincore metadata
08:28:32 q?
08:28:38 ... the description of the use case is above
08:29:07 ... it is clear as well as the requirements "Guidance on use of dc:type or similar for DCAT records. Recommendation on content-type vocabularies."
08:29:19 annette_g has joined #dxwg
08:29:24 q+
08:30:04 Jaroslav_Pullmann: Is this still a dataset or is any resource which is not anymore a dataset?
08:30:13 ... I support the dataset
08:30:19 About the different resource types in different metadata standards, I prepared a summary table (incomplete): https://docs.google.com/spreadsheets/d/1nlAgLUGQcBe40oTk5WNCVz-6rud1JtLwjoYyyqAT45U/edit?usp=sharing
08:30:24 q?
08:30:24 ... it should be more than separetaly
08:30:41 s/separetaly/separately/
08:30:47 ack Makx
08:30:48 s/separetaly/separately
08:31:01 Q+
08:31:38 Makx: I am against limiting the scope of what a DCAT dataset is
08:31:39 q?
08:31:42 q+
08:31:42 +1 to Makx
08:31:43 q+
08:31:46 ack annette_g
08:31:49 ... I am in favor of using vocab to say what dataset is
08:32:09 annette_g: I think the use case approach should come down to actual use cases
08:32:24 s/here/hear
08:32:25 ... some of the use cases are questions
08:32:41 q?
08:32:48 ... we may consider those as separate questions
08:32:53 q+
08:33:06 LuizBonino: I like the idea of being able to describe different types of information as assets
08:33:20 q?
08:33:26 ack LuizBonino
08:33:29 ack antoine
08:34:10 antoine: it seems this use case is to describe what is the dataset but it can also be understood about the context
08:34:19 q?
08:34:21 q+ to suggest that ANY collection of 0s and 1s (including empty collection) can be treated as a dataset; "dataset" is about how the data is handled/treated/managed, not an intrinsic property.
08:34:25 ack alejandra
08:34:40 alejandra: I think it is important to discuss the scope of the use cases
08:34:50 ... make sure that we provide guidance on the type
08:35:03 ... I agree with the Use Case and I think we need to consider it
08:35:06 q+
08:35:08 the problem with using 'type' is that 'type' may be made up of many different attributes
08:35:23 q?
08:35:26 Keith++
08:35:48 q?
08:35:52 ack danbri
08:35:52 danbri, you wanted to suggest that ANY collection of 0s and 1s (including empty collection) can be treated as a dataset; "dataset" is about how the data is handled/treated/managed,
08:35:55 ... not an intrinsic property.
08:35:59 q?
08:36:07 ack Makx
08:36:19 Makx: the definition of dataset
08:36:37 ... has to be cured
08:36:42 curated
08:36:42 seems to me the main thing is not to try to define it now - but to decide if we will maintain (or adopt) a list of types
08:36:44 s/cured/curated/
08:36:58 ... I think it is important to clear it up
08:37:00 q?
08:37:04 danbri: I think we agree
08:37:13 ... is about the curation of the process around data
08:37:19 +1 for makx and dan
08:37:24 maybe this is useful: software vs data https://github.com/danielskatz/software-vs-data
08:37:24 accept
08:37:41 [I agree with Makx that being a dataset is around the social context surrounding data, not the data itself]
08:37:41 kcoyle: can we accept the use case ID8 as it is?
08:38:02 q+
08:38:25 annette_g has joined #dxwg
08:38:27 Jaroslav_Pullmann: we can just accept it
08:38:40 +1 to Jaroslav
08:38:45 ... there are questions that are not stated on the use case
08:39:05 q?
08:39:08 S/can/can't/
08:39:18 ... maybe we could check other related use cases to see if their requirements and descriptions complement each other
08:40:00 kcoyle: let's check the use case ID20 https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID20
08:40:09 kcoyle: we want to be able to specify a type
08:40:24 ... we are probably going to have to point to a small number of recommended vocabs
08:40:38 q?
08:40:49 ... given that, could we vote on the ID8 and ID20 at the same time?
08:40:52 The link to parse.insight in the use-case description was unhelpful - I've corrected it
08:40:54 antoine: I think it would be all right
08:41:08 ... maybe SimonCox could explain
08:41:38 PWinstanley has joined #dxwg
08:41:51 SimonCox: there are a lot of different file types
08:41:55 present+ PWinstanley
08:42:07 ... they call content type
08:42:11 dataset type != encoding type - dataset may be exposed in many encodings
08:42:21 ... there are diferent formats of media type
08:42:46 antoine: on the web context content type uses media type
08:42:46 [media type could be .Z (application/x-compress, LZW) in the case of the Web History collection https://www.w3.org/History/1992/timbl-floppies/TimBerners-Lee_CERN/hype.tar.Z ]
08:42:48 a problem with the concept of dataset concerns streaming data because of its continuity: is the dataset the whole thing or a defined 'window'
08:42:49 s/diferent/different
08:42:55 SimonCox: I am talking about semantic oriented
08:43:08 q?
08:43:11 ... the language chosen is certainly conflicted
08:43:14 ack an
08:43:20 ... talking about content type
08:43:22 should definitely change "content-type" wording in Use Case
08:43:51 we are talking about the range of dc:type
08:43:51 Is it the "nature" of the dataset instead of how it is serialised, right?
08:44:08 Right; that's how I perceive it also
08:44:09 ... the dublincore descriptions from 20 years ago recognize datasets chich are images, maps, etc
08:44:24 s/etc/spreadsheets, et
08:44:27 s/et/etc
08:44:31 q?
08:44:40 ... there is a strong sense the images are different
08:44:40 s/datasets chich/datasets which/
08:44:52 antoine: I accept it
08:45:01 q+
08:45:02 https://en.wikipedia.org/wiki/List_of_HTTP_header_fields
08:45:10 ...as someone suggested to put a small note saying it
08:46:03 q+
08:46:09 kcoyle: we have mentioned something that was not discussed on the use cases
08:46:17 SimonCox: it says that in the use case
08:47:28 q+
08:47:29 kcoyle: are we at a point that caould we vote on this
08:47:35 Jaroslav_Pullmann: we should merge them
08:47:41 s/caould/could/
08:47:42 +1 to merge them
08:48:10 Sorry, merge what?
08:48:12 Makx: reminded us that we could merge only the requirements
08:48:16 +1 to merging reqs
08:48:24 the use cases ID8 and ID20, AndreaPerego
08:48:30 +1 to keeping the use case separated (they were contributed separately) but having the requirements consolidated.
08:48:37 q-
08:48:41 +1
08:48:51 q+
08:49:04 ack Jaroslav_Pullmann
08:49:14 +1 to merge reqs - this will drive DCAT 1.x - keep use cases separate for record keeping
08:49:27 Jaroslav_Pullmann: if we are looking for audiences we have different ones
08:49:43 ... they were not in the discussions. That was my motivation to merge them
08:49:56 q+ to ask if it's just about to accept or decline use cases
08:50:00 ... it might be interesting for researchers to see them merged
08:50:27 ... if we talk about access the question is whether we are talking about datasets
08:50:32 Ine_ has joined #dxwg
08:50:47 ... we should be talking always about digital access resources
08:51:00 ... the access would be only by protocols
08:51:32 ... the definition of data may also be about non-digital data. It can be anything. So we must be sure to be talking about accessible data
08:51:38 ack Thomas
08:51:48 [is there anything DCAT can't describe? :]
08:51:53 Thomas: these two use cases coul be anout anything
08:52:18 ... the discussion about content type and so on is part of content negotiation
08:52:18 s/coul be anout /could be about /
08:52:31 q?
08:52:53 ... agree with Jaroslav_Pullmann to merge the requirements
08:52:55 +1 danbri
08:53:22 Jaroslav_Pullmann: if the purpose is to have a history we should merge only the requirements
08:53:32 q+
08:54:02 Jaroslav_Pullmann: sometimes the use cases are very valueable
08:54:22 s/valueable/valuable/
08:54:23 ... it is important to have reports of what we are missing
08:54:35 kcoyle: if you feel there is a use case missing, please create it
08:54:38 ack AndreaPerego
08:55:08 annette_g_ has joined #dxwg
08:56:38 AndreaPerego: we should consider including descriptions or resources that are not data
08:56:43 q+
08:56:45 ack LarsG
08:56:45 LarsG, you wanted to ask if it's just about to accept or decline use cases
08:57:08 LarsG: I have a metaquestion. are we discussing the merging and how to proceed?
08:57:10 q-
08:57:37 ... we discussed that in a call and agreed to keep the use cases separated and merge the requirements
08:57:57 ... also a catalogue should be considered
08:58:36 Proposal will follow here
08:58:56 q+
08:58:59 I agree that ID20 partly elaborates ID8, but it is only the requirements arising from these that matters in the end!
08:59:16 PROPOSAL: to accept the use cases ID8 and ID20 as they are
08:59:24 q-
08:59:36 The use-cases stay on the books so that we can check at the end if the products solve the use-cases
08:59:40 Q+
08:59:48 +1 o Simon
08:59:58 s/o/to
09:00:17 +1
09:00:32 kcoyle: it is up to the group to drive requirements
09:00:43 PROPOSAL: to accept the use cases ID8 and ID20 as they are
09:00:45 +1
09:00:45 +1
09:00:45 +1
09:00:47 +1
09:00:47 +1
09:00:47 +1
09:00:48 +1
09:00:48 -!
09:00:49 +1
09:00:49 +1
09:00:50 +1
09:00:50 +1
09:00:51 +1
09:00:51 +1
09:00:52 +1
09:00:52 +1
09:00:53 -1
09:00:53 +1
09:00:56 +1
09:00:56 with or without the requirement part?
09:00:57 +1
09:00:57 +1
09:01:14 antoine without for now
09:01:20 annette_g_: I still have a concern about the ID8 being a use case
09:01:21 ok then +1
09:01:26 ... it is too general
09:01:28 Philippe has joined #dxwg
09:01:35 ... I feel the use cases should be concrete
09:02:09 kcoyle: annette_g_ do you volunteer to rewrite it?
09:02:21 SimonCox: I agree that annette_g_ should do it
09:02:57 PROPOSAL: to accept the use cases ID8 with edits that annette_g_ will provide and ID20 as it is
09:03:09 +1
09:03:10 +1
09:03:12 +1
09:03:12 +1
09:03:12 +1
09:03:13 +1
09:03:13 +1
09:03:14 +1
09:03:14 +1
09:03:15 +1
09:03:15 +1
09:03:15 +1
09:03:16 +1
09:03:17 +1
09:03:17 +1
09:03:19 +1
09:03:19 +1
09:03:20 present+philippe_roccaserra
09:03:24 +1
09:03:32 +1
09:03:38 +0
09:03:39 RESOLVED: to accept the use cases ID8 with edits that annette_g_ will provide and ID20 as it is
09:03:56 philippe keep the space after +
09:04:05 sorry; it works
09:04:17 (still getting used to IRC)
09:04:25 IMO we should be quite generous in accepting use-cases, since these exemplify concerns in the community. The more challenging part is distilling the _requirements_ and consolidating these where they overlap or duplicate. The requirements will drive the design of the products.
09:04:25 sorry I've abstained only because I've missed the explanation of how annette_g_ wanted to make the UC more concrete.
09:04:31 RRSAgent, draft minutes v2
09:04:31 I have made the request to generate http://www.w3.org/2017/07/17-dxwg-minutes.html AndreaPerego
09:04:39 THE USE CASE ID36 https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID36
09:04:56 s/THE USE CASE/ the use case/
09:05:43 q+
09:05:47 q?
09:05:51 Makx: Cross-vocabulary relationships is about the need that might be in the dcat about those other type of datasets
09:05:58 ack Jaroslav_Pullmann
09:05:58 +1 to Makx ( probably is just a matter of providing some examples..)
09:06:00 agree with Simon, accept all use cases and get on with the work of distilling requirements
09:06:04 [q: couldn't I distribute my qb:DataSet in either Turtle or RDF/XML syntaxes, each being a Distribution?]
09:06:11 Q-
09:06:13 q?
09:06:17 Jaroslav_Pullmann: I can reffer to the wikipage
09:06:41 ... Makx is right. Some schema.org consider the data being abstract
09:06:51 s/reffer/refer
09:06:58 q+
09:07:17 ack roba
09:07:30 roba: I think it is an important use case
09:07:35 ... it is not just a distribution
09:07:48 q+ to mention CSVW too
09:08:10 ... we should just double check that we create a situtation that can't be a dcat
09:08:13 +1 to roba
09:08:22 Makx: it is a little bit more complicated than that
09:08:34 ... if you have a dataset as a datacube
09:08:47 ... the concept is almost the same, but now you have 2 implementations
09:08:51 q+
09:09:24 ... one part would be what dcat calls a dataset
09:09:44 roba: I was saying that description can be a distribution
09:09:49 q-
09:09:55 q+
09:10:24 ... just so we don't get confused when describing data
09:10:30 ack danbri
09:10:30 danbri, you wanted to mention CSVW too
09:10:36 danbri: it is a very important problem
09:10:49 dataset/distribution: the problem is DCAT does not use the concepts conceptual, logical, physical - this would help
09:11:11 ... we have the choice of going of very specific things
09:11:29 ... seems that we have agreed with evey domain
09:11:33 ... we have to be pragmatic
09:11:49 s/evey /every /
09:12:04 ... if we are describing as a distribution then describe it as a distribution
09:12:15 q?
09:12:19 ... there is no right answer, but having concrete use cases might help
09:12:37 ack antoine
09:12:48 q?
09:12:51 ack Jaroslav_Pullmann
09:13:09 Jaroslav_Pullmann: if this would modify the dcat standard
09:13:19 ... concepts of what this dataset is
09:13:34 ... if we agree that the dataset is abstract
09:13:52 ... with this notion in mind we should compare with other standards
09:14:03 ... these are the differences
09:14:10 ... comparing to shema
09:14:16 s/situtation/situation/
09:14:24 [Dublin Core is scruffy and pragmatic where https://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records#FRBR_entities is overly prescriptive; even scoped to libraries, having 4 mutually exclusive types has been hard. It feels like there's a lesson for describing data here.]
09:14:30 s/comparing to shema/comparing to schema/
09:14:33 kcoyle: is this a use case we want to address?
09:14:48 PROPOSAL: accept the use case ID36
09:14:49 +1
09:14:51 +1
09:14:52 +1
09:14:52 +1
09:14:52 +1 of course
09:14:54 +1
09:14:54 +1
09:14:54 +1
09:14:55 +1
09:14:56 +1
09:14:57 +1
09:14:59 +1
09:14:59 +1
09:14:59 +1
09:15:00 +1
09:15:00 +1
09:15:01 +1
09:15:02 +1
09:15:04 +1
09:15:05 +1
09:15:05 +1
09:15:07 +2
09:15:15 RESOLVED: accept the use case ID36
09:15:30 scribe: DaveBrowning
09:15:50 I vote to accept all use cases. But then we will need to distill, and collate, the *requirements* implied by the use cases.
09:15:58 scribenick: dsr
09:15:58 +1
09:16:02 scribe: Dave_Raggett
09:16:37 Topic: DCAT data elements
09:16:59 We start with ID9, see https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID9
09:17:12 which talks about Common requirements for scientific data
09:17:57 Andrea: this is a use case based upon experience at JRC
09:18:19 s/Andrea/AndreaPerego
09:18:38 We need to verify requirements for multidisciplinary scientific data
09:18:49 q+
09:19:00 q+
09:19:20 annette_g_ has joined #dxwg
09:19:23 Q+
09:20:37 we want to be able to describe the context, inclding authors, lineage, usage, links to publications about the dataset and links to input data
09:21:08 s/inclding/including/
09:22:16 ack PWinstanley
09:22:23 we should start with a link to the context, and later work on what we can describe in the context
09:22:35 q+
09:23:00 ack Keith
09:23:13 PWinstanley: I would be very hesitant to distinguish scientific in the requirements, although it's fine as a use case
09:23:24 +1 to Peter's concern about distinguishing "science" from non
09:23:27 +q
09:23:40 q+
09:23:56 Keith: I would like to go further with a complex set of role bound properties
09:24:00 q+ to comment on the use of "scientific" in the use case
09:24:24 We need this additional layer if intelligent software is to make use of it effectively
09:24:54 ack annette_g_
09:25:25 ack Jaroslav_Pullmann
09:25:27 q+ to note that data citation metadata is critical for data(set) discovery (+maybe "scholarly" can substitute for "science" in some places?)
09:25:31 Annette will extend the use case
09:25:51 Keith will generate an extended use case referencing ID9 emphasising relationships of dataset to many other entities with role and temporal limits
09:26:34 q+
09:26:39 Jaroslav: for scientific datasets, there will be an appropriate set of metadata
09:27:05 ack alejandra
09:27:14 ACTION: Keith to generate an extended use case referencing ID9 emphasising relationships of dataset to many other entities with role and temporal limits
09:27:14 Error finding 'Keith'. You can review and register nicknames at .
09:27:27 q?
09:27:39