07:46:36 RRSAgent has joined #sdsvoc
07:46:36 logging to http://www.w3.org/2016/11/30-sdsvoc-irc
07:46:44 Zakim has joined #sdsvoc
07:47:04 meeting: Smart Descriptions & Smarter Vocabularies (SDSVoc) Day 1
07:47:06 chair: PhilA
07:47:17 agenda: https://www.w3.org/2016/11/sdsvoc/agenda
07:47:27 phila has changed the topic to: Smart Descriptions & Smarter Vocabularies (SDSVoc) Day 1
08:01:11 bgrova has joined #sdsvoc
08:04:45 nandana has joined #sdsvoc
08:05:07 PWinstanley has joined #sdsvoc
08:05:20 AndreaPerego has joined #sdsvoc
08:05:28 present+ PWinstanley
08:05:35 present+ AndreaPerego
08:05:42 LindavdB has joined #Sdsvoc
08:06:36 damires has joined #sdsvoc
08:06:53 Topic: Opening remarks
08:06:57 present+ nandana
08:07:03 present+ phila
08:07:19 Jacco and Phil made general opening welcomes
08:07:36 Topic: VRE4EIC Project, CERIF
08:07:41 oystein has joined #sdsvoc
08:08:11 Steven has joined #sdsvoc
08:08:46 LarsG has joined #sdsvoc
08:10:37 Keith: Follows slides which are self describing
08:10:39 deirdrelee has joined #sdsvoc
08:10:43 scribe: phila
08:10:50 scribeNick: phila
08:11:58 Tessel has joined #sdsvoc
08:12:41 DaveBr has joined #sdsvoc
08:12:55 RRSAgent, make logs public
08:13:05 RRSAgent, draft minutes
08:13:05 I have made the request to generate http://www.w3.org/2016/11/30-sdsvoc-minutes.html phila
08:13:11 BartvanLeeuwen has joined #sdsvoc
08:13:17 riccardoAlbertoni has joined #sdsvoc
08:15:54 Keith: Talks about context of a citation, which includes many facets
08:18:02 Dom_Fripp__Jisc has joined #sdsvoc
08:18:03 ... Gets in a slight dig about 'the later Dublin Core'
08:18:29 jrvosse has joined #sdsvoc
08:20:06 antoine has joined #sdsvoc
08:20:50 Dom_Jisc has joined #sdsvoc
08:23:15 stefano has joined #sdsvoc
08:25:11 [Not making many notes here as Keith's slides are comprehensive]
08:30:35 AndreaPerego: You said you have mapping, Keith. Temporal dimension seems to be missing?
08:31:03 keith: Yes, it's missing and we know that.
Metadata often ignores temporal
08:31:12 AndreaPerego: Do you plan to add, maybe using PROV?
08:31:41 Keith: We're working on Prov-o with Kerry Taylor, and there's an ENVRI+ project
08:31:59 PeterW: Acronym hell. RDA means something else
08:32:10 alejandra has joined #sdsvoc
08:32:11 Keith: Sorry, yes. resource Description
08:32:29 PeterW: So RDA is the body to use for this?
08:32:44 Keith: I did point out the acronym clash
08:33:35 ThomasDH: You talked about locations and persons etc. In the EU context we have the Core Vocs. I hope we can merge? Across Govt and science?
08:33:46 ... Don't want different standards on different levels.
08:34:05 Keith: Yes, the VRE4EIC project has that in its sights. CERIF has concept of declared semantics.
08:34:25 ... Doesn't say you must use these semantics, but provides containers for semantics
08:34:58 s/RDA means something else/RDA means something else (Resource Description and Access)
08:35:01 s/Agaist/Against
08:36:41 s/PeterW/PWinstanley
08:37:52 Topic: Dataset Description Mode
08:38:11 AG: Talks about The HCLS Community Profile: Describing Datasets, Versions, and Distributions
08:38:14 deirdrelee has joined #sdsvoc
08:38:28 s/Dataset Description Mod/Dataset Description Models
08:38:41 AG: Talks about origin in OpenPHACTS project
08:39:10 ... Highlights ChemBL versioning issues
08:39:25 ... ChemBL was at version 13, but that number wasn't in our data
08:39:35 RRSAgent, draft minutes
08:39:35 I have made the request to generate http://www.w3.org/2016/11/30-sdsvoc-minutes.html phila
08:39:50 DaveBr_ has joined #sdsvoc
08:40:07 AG: Still don't know if we used version 8 or 13 in OpenPHACTS
08:40:29 AG: Includes provenance feature so you can see where data items came from
08:40:48 ...
Now using ChemBL 20
08:41:32 [Slides are self describing]
08:42:49 AG: contrasts DC and VOiD as opposite ends of spectrum, neither met requirements for HCLS
08:44:25 -> https://www.w3.org/TR/hcls-dataset/ Dataset Descriptions: HCLS Community Profile
08:45:59 stefano has joined #sdsvoc
08:47:35 AG: Talks about mandatory and optional properties
08:47:40 ... This requires tooling
08:47:50 ... Developed the validata tool, more on that tomorrow.
08:48:04 ... Several implementations of HCLS profile
08:48:42 AG: Emphasises that we need to know about versions
08:49:04 AxelPolleres has joined #sdsvoc
08:49:17 Q: Adopted beyond your community?
08:49:28 AG: Not aware of it but it is generic and could be
08:49:52 Q: In latest version of DCAT-AP coves some of what you say
08:50:03 Caroline_ has joined #sdsvoc
08:50:11 AG: isVersionOf didn't exist when we were doing this 4 years ago, glad it's in DCAT-AP
08:50:17 s/coves/covers/
08:50:26 Jacco: Debate about whether version is in the URL?
08:50:47 AG: We don't say that, just that there should be different URLs for summary description, etc.
08:51:03 Topic: Andrea Perego Using DCAT-AP for research data
08:51:11 RRSAgent, draft minutes
08:51:11 I have made the request to generate http://www.w3.org/2016/11/30-sdsvoc-minutes.html phila
08:51:29 sebneumaier has joined #sdsvoc
08:51:57 AndreaPerego: Introduces DCAT-AP
08:52:14 -> https://joinup.ec.europa.eu/asset/dcat_application_profile/description DCAT-AP
08:52:25 AndreaPerego: Introduces JRC
08:53:16 [Slide on JRC is self explanatory]
08:54:28 AndreaPerego: Talks about wide variety of methods and standards. Some people asking what metadata is
08:57:27 AndreaPerego: Talking about citations. Some people don't care about their data being cited.
08:58:54 AndreaPerego: Prov used for complex/complete info
08:59:00 ... On Data Citation
09:00:03 ... Data reproducibility is important for policy as well as science
09:00:25 ... Did a mapping exercise between DCAT-AP and DataCite
09:01:30 ...
Mostly good matches
09:02:44 ... Agent Roles seems particularly hard
09:02:54 ... May need a registry of roles to use across standards
09:03:48 AndreaPerego: Skips to Publishing metadata on the Web
09:03:58 ... Talks about mapping to schema.org
09:04:13 ... Identified some gaps. But do we need to fill those in schema.org?
09:04:28 ... Do we need to publish all our metadata, or just what improves visibility?
09:05:14 AxelPolleres: Is there any effort to endorse identifiers like ORCID?
09:05:49 ... The link with STORK etc. would be interesting, but there's no initiative AFAIK
09:06:04 Keith: Often IDs are associated with a role, like ORCID and Driving Licence info
09:06:26 Ivan: Force11 had their general principles. Did you match against those?
09:06:33 AndreaPerego: Yes, we have looked at that, and FAIR
09:06:46 ... Trying to address practical issues
09:06:55 Ivan: Sure they're at a higher level
09:07:01 RRSAgent, draft minutes
09:07:01 I have made the request to generate http://www.w3.org/2016/11/30-sdsvoc-minutes.html phila
09:07:13 Topic: The Metadata Ecosystem of DataID
09:07:30 Markus: I'm release manager of DBPedia
09:07:43 ... We have a lot of data in our releases
09:07:50 [Slides include text]
09:11:43 q+ to ask about MR licences and agent names
09:12:24 FWIW, further to my question… there seem to have been some efforts to e.g. link STORK (national eIDs) to ECAS, cf. https://www.eid-stork.eu/index.php?option=com_content&task=view&id=253&Itemid=83 … the reason why I had asked about links to ORCID is that many of the information you have to provide to the EU for ECAS overlap with info covered in ORCID, e.g. publications, grants, etc.
09:16:24 [Slides still self-explanatory]
09:17:36 Markus: Talks about core and extensions in DataID for things like statistics, you need extra fields
09:20:05 KevinDCC has joined #sdsvoc
09:22:09 phila: You used ODRL a little, but not a lot. Is it lacking?
09:22:19 Markus: Nothing fixed yet, open to change
09:22:31 phila: Good - ODRL on Rec Track now so speak up!
09:22:33 q-
09:22:46 Topic: Towards a Common Description Vocabulary for Industrial Datasets
09:23:03 CM: Work with Soeren Auer
09:23:56 CM: Introduces Smart Services and Industry 4.0
09:24:19 ... Talks about needing to be aware of privacy and some control over data
09:24:50 ... Want to build reference architecture for secure data infrastructure, retaining sovereignty
09:25:26 CM: IDS = Industrial Data Spaces
09:26:31 CM: Industrial Data Space vocab as glue to capture domain-specific semantics
09:26:43 newton has joined #sdsvoc
09:26:50 [Slide self explanatory]
09:29:04 CM: IDS defining own protocol
09:31:55 pascaline has joined #SDSVoc
09:32:29 -> http://ids.semantic-interoperability.org/ The Industrial Data Space Metadata Vocabulary
09:34:58 Q: How specific is this to industrial data? Can it work in other domains?
09:35:29 stefano has joined #sdsvoc
09:35:32 CM: It's about requirements, like security. Things like which vocabs to use for different tasks, not domains
09:36:05 PeterW: have you thought of entity resolution. What metadata to associate with their data? They may not know.
09:36:32 CM: No, we've not looked at that. We want to partner with data publishers and help them make their data more easily found on the Web.
09:36:50 Topic: Loupe - An RDF Dataset Description Model for Expressing Vocabulary Usage Patterns
09:37:00 RRSAgent, draft minutes
09:37:00 I have made the request to generate http://www.w3.org/2016/11/30-sdsvoc-minutes.html phila
09:38:20 nandana: 2 use cases we have problems with
09:38:32 ... Discovery, I don't want to spend a lot of time searching.
09:39:03 ... data for training machine learning is hard to find automatically
09:39:45 nandana: Another use case - if I'm an ontology engineer, I'd like to see how my vocab has been used in a dataset, to see if my conceptualisation matches reality
09:39:55 ... e.g.
where is the SSN Ontology used?
09:40:23 ... This can be done using LOD stats but if you want to know how they were used, ranges etc. that's harder
09:40:55 [slides descriptive]
09:42:31 nandana: Wraps up brief presentation and invites questions
09:43:03 AG: Can you give a statistical report about which properties and classes are linked
09:44:00 Topic: Discussion
09:44:17 Keith: Big range of topics. Expressivity etc.
09:44:39 makx: I was hearing things like there is this standard, but it didn't work for me.
09:45:06 ... You need to be aware - we need to try and solve a problem. DC tries to look at common problems, CERIF tries to go into depth
09:45:39 Makx: We spent most time trying to solve common problems. DC and DCAT start simple and then people complain that things are missing
09:45:43 ... You can extend
09:46:06 ... You come up with different requirements and you soon come up with 50 properties that are never used
09:46:19 MF: I agree. The general approach of DCAT has its benefits
09:46:37 ... But there is important data missing when we handle datasets. For e.g. more specific prov info
09:46:52 ... Basic pattern of catalogue, dataset and distribution has prevailed
09:47:04 ... But we should look at how to improve DCAT and that's why we're here
09:47:23 AG: In HCLS we didn't want to come up with a standard, just a profile that used existing ones
09:47:51 PeterW: I find lots of people talking about different metadata frameworks but less about the data that goes into them
09:48:13 ... Some illustrations of marked up stuff. If I have a dataset, what are the frameworks that match the pattern of the data that I have
09:48:25 ... Maybe ML techniques can be used.
09:49:00 Keith: The papers have more. The RDA has a metadata standards group that is making a list of the available metadata schemes.
Lots of work coming from Digital Curation Centre (see Kevin Ashley)
09:49:35 Q: Automatic machine readable to access the data itself, not just a URL
09:49:42 MF: Yes, this is a task for us
09:49:49 ... This problem came up a lot
09:49:57 ... I'm hoping for insights from other directions
09:50:20 Q: Accessing satellite data for e.g. you need to restrict the access to specific subsets.
09:50:22 PWinstanley_ has joined #sdsvoc
09:50:33 ... There are rest APIs like Swagger, but there's no predefined method
09:50:36 present+ PWinstanley
09:50:48 CM: Lots of approaches for describing services on the Web, but haven't had a lot of impact for some reason
09:51:05 CM: Maybe because they introduce complexity
09:51:17 nandana: Hydra CG is in that direction
09:51:30 Q: But that's very restricted to Rest.
09:51:52 AndreaPerego: We have a bar camp on this specific topic :-)
09:52:15 MF: It's a big issue. We're dealing with datasets usually, not endpoints
09:53:11 phila: Talks about subsetting issue Open Search etc.
09:53:28 Keith: You can't get into the data because it's too big so you don't know what to ask for
09:53:54 AndreaPerego: Talks about different levels that can be addressed. Need to include users
09:54:12 Keith: Geonetwork allows you to peek into the data to see if you're in the right area
09:54:55 CM: Working with industrial partners - tooling is very important. If you have a schema, you need the partners to tell you the detail you need
09:55:15 AG: We developed a very specific tool that was user-driven
09:55:42 ivan has joined #sdsvoc
09:55:44 ...
focussed ion user-friendliness so it's not easily transferrable
09:56:10 s/ ion / on /
09:56:41 RRSAgent, draft minutes
09:56:41 I have made the request to generate http://www.w3.org/2016/11/30-sdsvoc-minutes.html phila
10:11:58 DaveBr_ has left #sdsvoc
10:20:55 brandon has joined #sdsvoc
10:22:44 DaveBr has joined #sdsvoc
10:31:50 ivan has joined #sdsvoc
10:35:14 BartvanLeeuwen has joined #sdsvoc
10:35:27 newton has joined #sdsvoc
10:36:20 Thoke_Magnussen has joined #sdsvoc
10:37:17 phila has joined #sdsvoc
10:37:57 newton has joined #sdsvoc
10:38:24 AxelPolleres has joined #sdsvoc
10:38:40 present+ newton
10:38:49 PWinstanley has joined #sdsvoc
10:39:02 present+ PWinstanley
10:39:09 present+ Ivan
10:40:24 present+ BartvanLeeuwen
10:41:34 nandana has joined #SDSvoc
10:48:25 PWinstanley has joined #sdsvoc
10:48:34 Caroline_ has joined #sdsvoc
10:48:37 present+ PWinstanley
10:49:05 KevinDCC has joined #sdsvoc
10:49:30 newton has joined #sdsvoc
10:49:30 Present+ Caroline_
10:50:40 present+ brandon
10:56:23 riccardoAlbertoni has joined #sdsvoc
10:56:25 Zakim, how is scribing?
10:56:25 sorry, Caroline_, I do not understand your question
10:56:31 Zakim, scribe
10:56:31 I don't understand 'scribe', Caroline_
10:57:25 damires has joined #sdsvoc
11:03:21 Tessel has joined #sdsvoc
11:04:25 newton has joined #sdsvoc
11:12:15 antoine has joined #sdsvoc
11:19:27 DomJisc has joined #sdsvoc
11:24:55 danbri has joined #sdsvoc
11:25:45 Keith has joined #sdsvoc
11:32:17 Present+ damires
11:34:56 me Caroline_ yes, apologies for not mentioning all of you!
11:35:16 s/me Caroline_ yes, apologies for not mentioning all of you!//
11:37:40 :)
11:40:32 could panel members please talk to the room rather than just among themselves - it's not easy to hear them without PA systems
11:41:02 there's a microphone on the front desk, is it not wired up?
11:41:16 s/front/smaller/
11:42:33 @danbri: it is needed at the larger table for the group discussion
11:45:56 newton has joined #sdsvoc
11:46:59 Call for greater clarity in some of the DCAT definitions. Also guidance, perhaps a primer. Take various national APs as input
11:48:32 @phila: https://lists.w3.org/Archives/Public/public-dwbp-wg/2015Jul/att-0010/DCAT-APimplementationguide.pdf needs to be updated
11:49:09 antoine has joined #sdsvoc
11:57:23 can I respond to the google/schema question?
11:58:52 ok will respond later
12:06:51 Dee is in charge, not me :-)
12:07:35 Discussion around data that is not published in rarely versions
12:07:48 Need to handle data that changes all the time (real time data etc.)
12:10:10 +1 volatile/dynamic datasets probably need different metadata than “slower changing” datasets… where more versioning vocab is an issue.
12:10:52 mutable vs immutable datasets is relevant information
12:11:54 there are different forms of “mutable”, e.g. (monotone) growing vs. actually changing… is that reflected in any of the existing vocabs?
12:12:27 @AxelPolleres: yes, that's my point
12:12:35 for us (use case crawling and tracking changes/evolution) it would be terribly useful if these were advertised.
12:13:26 s/terribly/very(!)/
12:13:30 Jim has joined #sdsvoc
12:13:34 Daniele Bailo: what are the boundaries of DCAT?
12:15:48 Andreas Kuckartz: DCAT seems less useful for describing binary programs
12:16:29 Makx: Some people in the WG see DCAT as very general that can describe many things
12:20:32 PWinstanley and AxelPolleres it's a real issue, we had some discussions about it during DWBP meetings
12:21:27 I would like to see this addressed on the charter of a new WG
12:21:33 newton, are you aware of any vocabs that actually define this difference? i.e., monotone growth vs. arbitrary changes, changeFrequency, growthrate, etc.?
12:23:41 newton, Axel: there used to be a vocabulary for 'accrual policies' at DC. Maybe not the right granularity though.
13:06:26 LarsG has joined #sdsvoc
13:08:54 brandon has joined #sdsvoc
13:09:13 damires has joined #sdsvoc
13:11:02 AndreaPerego has joined #sdsvoc
13:11:16 present+ AndreaPerego
13:11:36 present + damires
13:16:12 present+ brandon
13:17:27 newton has joined #sdsvoc
13:18:46 riccardoAlbertoni has joined #sdsvoc
13:18:58 phila has joined #sdsvoc
13:19:26 jrvosse has joined #sdsvoc
13:19:28 PWinstanley has joined #sdsvoc
13:19:39 present+ PWinstanley
13:20:16 Caroline_ has joined #sdsvoc
13:20:18 Present_ Caroline_
13:20:30 AxelPolleres has joined #sdsvoc
13:21:05 Linda van den Brink from Geonovum on geospatial data
13:21:10 danbri has joined #sdsvoc
13:21:41 ... a key problem is that people from outside the geo domain do not understand the standards we use
13:21:43 BartvanLeeuwen has joined #sdsvoc
13:22:02 tessel has joined #sdsvoc
13:22:33 * I'm scribing but feel free to add
13:24:33 scribe: Jacco
13:24:38 scribeNick: jrvosse
13:28:50 bgrova has joined #sdsvoc
13:29:33 Linda is discussing a testbed testing use of mappings in the context of geoDCAT (see https://joinup.ec.europa.eu/node/154143/) and schema.org
13:30:07 see slides for testbed report
13:32:49 Q: Jacco: what do you think the key mission of a new WG be?
13:33:36 A: Lynda: Small core of a standard, for SDI coverage is really key, quality is also very important
13:33:57 Q: Phil: is something like the dcterms spatial concept core?
13:34:06 A: Lynda: yes
13:35:09 Q: Daniele Bailo: Is the loss of data in the mappings really an issue for end users on the web?
13:35:53 L: Linda: Maybe not, for discovery it may not be a problem.
There are levels of importance
13:36:01 s/Lynda/Linda
13:37:08 Andrea Perego on GeoDCAT-AP
13:41:25 antoine has joined #sdsvoc
13:42:10 GeoDCAT-AP not replacing existing standards such as INSPIRE or ISO metadata for spatial, but providing extra interoperability by providing RDF-binding
13:42:25 RRSAgent, draft minutes
13:42:25 I have made the request to generate http://www.w3.org/2016/11/30-sdsvoc-minutes.html phila
13:43:03 s/ISO/ISO 19115
13:43:23 Topic: Time and Space
13:43:30 RRSAgent, draft minutes
13:43:30 I have made the request to generate http://www.w3.org/2016/11/30-sdsvoc-minutes.html phila
13:47:27 ... need for http conneg on profiles/schemas not just on format
13:50:39 ... need to model dataset distributions, distinguish data sets from data APIs
13:54:29 ... need for best practices for quality-related descriptions, there are too many patterns/standards
13:55:49 Q: Keith Jeffrey: need spatial coordinates both for what is observed and from where, what are your thoughts?
13:56:40 A: Yes, this is a difficult problem, also in crowdsource context and other contexts, but is not addressed at the metadata level, more at the level of the features
13:57:29 Q: Herbert: New iso spec "Resources" from those that made PMH, but more "webby"
13:58:02 Hermert: I'm involved in signposting.org which is also relevant
13:58:32 s/Hermert/Herbert/
13:58:43 s/Resources/ResourceSync/
13:59:12 Topic: Panel on Time and Space
13:59:33 Otakar Čerba joins panel
14:00:47 Otakar Čerba I'm here because we are developing a smart points of interest RDF dataset with 120M POI published, incl via a SPARQL endpoint
14:01:18 Daniele Bailo joins the panel
14:01:58 Daniele Bailo represents the EPOS geo ESFRI with lots of geospatial data
14:02:33 ... with many different types of data, this needs to be reflected in the metadata
14:04:09 ... need to think about who the audience is: general web users vs scientists from specific domains?
14:04:13 ...
need to think about who the audience is: general web users vs scientists from specific domains?
14:05:07 Q Bart: is just getting the metadata currently not too complicated already?
14:05:21 remark Re: dataset vs service description - this is also to some extent related to the issue we mentioned before some time up in the chat about fast-changing/highly dynamic data (which may be rather seen as a service than a dataset)
14:05:56 A Andrea: Yes, for the general public ISO may be too much, especially if it is just for discovery purposes
14:07:33 ... for us, a dataset is what you decided to call a dataset
14:08:16 Linda: I see dcat as something for portals to find and reuse each other's data sets, not necessarily as something for the end user
14:09:29 Daniele: I know the scientific user relatively well, typically does not want general web search. The "web user" could be a software agent or human user.
14:09:43 newton has joined #sdsvoc
14:11:43 Otakar: same experience, users often do not use metadata. We have many Czech data portals but few real users
14:12:40 Andrea: my students use Google also because they do not know where the data is, this also makes it important to publish data on the Web
14:14:01 deirdrelee has joined #sdsvoc
14:14:02 Daniele: I agree, but I'm trying to understand the requirements for doing so. In my community people tend not to use persistent IDs or even URLs.
This si a challenge/
14:14:38 s/This si/This is/
14:15:03 Bart: high level conclusion could be that there is too much info from the data in the metadata
14:15:39 Otakar: we also need feature metadata in the geo spatial domain
14:16:43 Andrea: data quality is more general than just spatial, and solutions can be reused for other domains
14:17:37 Linda: spatial coverage is key for first discovery step, use of all other quality and prov metadata is part of a second step
14:18:17 Daniele: what is needed on the scientific side is a huge effort on data and metadata harmonisation
14:38:33 newton has joined #sdsvoc
14:43:55 newton has joined #sdsvoc
14:44:52 LarsG has joined #sdsvoc
14:45:59 damires has joined #sdsvoc
14:46:39 scribe: deirdrelee
14:46:49 topic: Searching for data
14:47:17 Show Me The Way session
14:47:52 Searching for data session
14:48:02 Dmytro Potiekhin
14:48:11 CivicOS: Governance & Campaigning Data Standard
14:48:43 dmytro: worked in ukraine
14:49:01 Caroline_ has joined #sdsvoc
14:49:26 ... important to work with civil society and citizens is very important when there is danger of falsification at elections
14:49:59 ... integrating data is an important issue to protect democracy
14:50:07 AndreaPerego has joined #sdsvoc
14:50:08 Present+ Caroline_
14:50:22 ... it is obvious w/out a proper vocabulary describing needs of civil society, this is impossible
14:50:46 zakim, who is on irc?
14:50:46 I don't understand your question, AndreaPerego.
14:50:46 ... secondly, it is impossible to create such a vocabulary from top-down approach
14:50:53 zakim, who is here?
14:50:53 Present: PWinstanley, AndreaPerego, nandana, phila, newton, Ivan, BartvanLeeuwen, Caroline_, brandon, damires
14:50:56 On IRC I see AndreaPerego, Caroline_, damires, LarsG, newton, deirdrelee, bgrova, jrvosse, phila, brandon, Jim, Keith, DaveBr, pascaline, oystein, Zakim, RRSAgent
14:51:08 ... e.g.
even with vocabularies that the european commission are working on
14:51:25 present+ LarsG
14:51:26 ... this vocab or set of interoperability vocabs must be demand driven
14:51:37 ... something that is accepted by the citizens
14:51:49 ... this is CivicOS
14:52:18 ... if we can unite efforts around development of such vocabularies, I am glad to help and this is what I am trying to do with colleagues
14:52:32 AxelPolleres has joined #sdsvoc
14:52:38 ... e.g. i am collaborating with the Stanford ?? Institute
14:52:58 ... this is not just a problem for Ukrainians, but it is a global problem
14:53:13 ... my final request would be, not to just give everything to the governments.
14:53:40 ... in democratic societies, it is okay for governments to have all this technology, etc.
14:53:53 danbri has joined #sdsvoc
14:53:56 ... but in countries still fighting for democracy, this can be a problem
14:54:22 PWinstanley has joined #sdsvoc
14:54:35 present+ PWinstanley
14:54:59 ... for an example, there are often petitions to put pressure on governments. but if the petition is done by the governments, it is beuracratic. and it is also giving them a contact list of people that disagree with them
14:55:17 ... undermining civil society and what they are trying to achieve
14:55:28 s/beuracratic/bureaucratic
14:56:10 ... i encourage to keep developing vocabularies, but also to retain the activation and development of civil society
14:56:29 kevin: questiions?
14:57:24 kevin: the situation at the moment isn't ideal for discovery and interoperability of data, for the use-case you are talking about - empowering citizens
14:58:09 dmytro: for the commercial part it is working great, e.g. flight information automatically added to google calendar
14:58:25 ... so standards are already working, but this needs to be brought to our community
14:58:51 s/questiions/questions/
14:58:59 ... in egovernment, we see this too.
but we see a trend to focus on egovernment, and not on egovernance or ecivilsociety
15:00:12 RRSAgent, draft minutes
15:00:12 I have made the request to generate http://www.w3.org/2016/11/30-sdsvoc-minutes.html jrvosse
15:00:43 ... these platforms should be controlled by civil society, not by dictators
15:01:13 ... even if personal identity issues are resolved, there will be interoperability issues
15:01:46 ... and this integration should not be less successful than government or commercial sectors
15:02:15 ... we need to apply these commercial standards in the government and civil society sectors
15:02:38 phila: you mentioned you wanted to integrate with schema.org
15:03:33 dmytro: we are experienced in the structures of what makes civil society works
15:03:45 ... but these are not described in schema.org
15:04:01 ... e.g. we have a list of 200 different types of non-violent actions
15:04:11 ... the leading vocabularies only document about 5
15:04:41 ... we would like to collaborate on the development of vocabularies and how to incorporate into schema
15:05:19 danbri: you can just go ahead and develop a vocabulary. we have built extensions that facilitate that
15:05:28 BartvanLeeuwen has joined #sdsvoc
15:05:58 ... there are some generic descriptions that could potentially go in schema.org core, and for more detailed terms, extension might be best
15:06:04 ... but happy to chat
15:06:42 attendee: there is some similar work being done in the US, by beth novack, called ???
15:07:12 ... this can have an impact and is similar to what you were talking aboutj
15:07:22 cf huridocs for human rights documentation
15:07:37 danbri: we were actually discussing schema.org and the documentation of hate crimes last week
15:07:46 ... happy to continue discussions
15:07:58 Raf Buyle
15:08:06 topic: The Public Sector DNA on the web: semantically marking up government portals.
15:08:32 s/aboutj/about
15:09:05 present+ damires
15:09:20 Raf: representing the Flemish Government
15:10:01 ... we believe publi services should be centered around citizens and businesses
15:10:23 ... today if you ask for info online about opening times and location of public building you get it
15:10:50 ... we think this should go further, e.g. providing info on using services
15:11:04 ... need to link to base registries
15:11:11 s/publi/public/
15:11:26 ... flemish government is working on strategy to add markup to government portals
15:11:40 tessel has joined #sdsvoc
15:11:52 ... we have seen success with schema.org, etc. this can be a bridge between public and private sectors
15:12:32 ... the citizen wants to find the info on the public service they want, regardless of public body providing it
15:12:56 ... we are looking at using and extending open standards, e.g. from W3C, ISA, OGC, etc.
15:13:41 ... the European Interoperability Framework states that you should look at all layers of interoperability, semantic, technical, etc
15:14:18 ... base registries are fundamental, but it is very difficult to get this data on the web, to integrate it with the private sector
15:15:04 ... imagine if we could ask private company, like google, about public services. where you could make an appointment, all the information at a user's fingertips
15:15:40 ... bridging between public and private sectors. schema.org is working very well. it has been widely adopted
15:15:52 ... this could be a strategy to get public services information out there
15:16:41 ... schema.org was first to discover data, but it is also used for new data services, e.g. bing and google knowledge graph
15:16:52 RRSAgent, draft minutes
15:16:52 I have made the request to generate http://www.w3.org/2016/11/30-sdsvoc-minutes.html phila
15:17:19 ... we have a pilot [slide with architecture diagram]
15:18:07 ...
we would like to combine schema.org with ISA core vocabularies
15:18:25 Keith has joined #sdsvoc
15:18:56 ... rdfs:seeAlso pointing from a schema.org resource to a isa core voc resource shows that more info is available
15:19:20 ... we are waiting to rolling this out on local and regional level
15:19:46 ... on the one hand, we are saying it is not difficult to annotate data in this way
15:19:53 tessel has joined #sdsvoc
15:20:11 ... we also want to see if these annotations are picked up by major search engines
15:20:39 ... and also interested in seeing if the search engines will pick up the extra ISA core voc info and display that as well
15:21:21 ... i have some questions [FEEDBACK slide with questions]
15:21:27 BernadetteLoscio has joined #sdsvoc
15:22:56 kevin: questions?
15:23:23 PWinstanley: what kind of mechanisms can we use to avoid false information getting into system?
15:24:26 Raf: i talked about a feedback loop. perhaps there could be a validation check comparing the original data and data being presented
15:24:41 Luis-Daniel Ibáñez
15:24:48 How we search for data? Towards User-Driven dataset descriptions
15:25:10 s/How we search for data? Towards User-Driven dataset descriptions/Topic: How we search for data? Towards User-Driven dataset descriptions/
15:25:45 stefano has joined #sdsvoc
15:26:13 Luis-Daniel: For better data search
15:27:10 ... we carried out an analysis of data searches by talking to data professionals and analysing logs from data portals
15:28:00 ... [reads feedback from interviews - quotes from data professionals]
15:28:20 ... we found a lot of things that we discussed in previous talks
15:28:44 ... something maybe to highlight is users asking for a summary/preview of data
15:28:46 PWinstanley has joined #sdsvoc
15:28:57 present+ PWinstanley
15:29:21 ... with quantitative results, mainly desktop devices, etc....
15:30:01 ...
68% of queries came from web search engines, suggesting that data search is a work-related activity and people are relying on general-purpose search engines
15:30:38 ... is this because people use what they know or data portals are not doing their job properly? still open question for us
15:31:35 ... query characteristics show exploratory search, e.g. 'crime' - show me all crime data, not specific query
15:32:42 Artemis Lavasa
15:32:50 topic: CERN Analysis Preservation
15:34:30 Artemis: our aim is to capture, analyse and preserve data
15:34:56 ... we need to preserve the tools, processing steps, etc. we capture everything
15:35:13 ... we want to have as much context as possible so that we can recreate that in future
15:35:32 ... we capture all that information via our forms
15:35:45 ... we describe our information using a json-based schema
15:35:58 ... it can handle complex metadata, which we have
15:36:26 ... the data capture forms are rich, so can be very long and vary a lot from experiment to experiment
15:36:49 ... [showing slide of example metadata]
15:37:24 ... we work closely with physicists and calibrate them according to their needs
15:38:15 ... in order to facilitate search, we need this metadata. e.g. a physicist might want to look at a particular particle, so looking at the title of the metadata is not sufficient
15:38:37 ... we need intelligent search, very precise
15:39:28 ... we played around with schema.org and json-ld. we could describe the high-level information, but not specialised fields.
15:39:40 ... we would like to use a standardised approach
15:40:18 ... i have tried to harmonise the schemas we have, but 80% of fields are something unique to what a physicist wanted
15:41:08 Alejandra Gonzalez-Beltran
15:41:16 topic: DATS: dataset descriptions for data discovery in DataMed
15:41:59 Alejandra: project funded by NIH in the US
15:43:07 ... DATS DatA Tag Suite is used to index data sources in datamed
15:43:31 ...
[slide with online links to work] 15:43:56 ... we focus on the findability and accessibility of datasets 15:44:55 ... we rely on adoption by data providers 15:45:29 ... we started by collecting lots of use-cases from the community and by looking at existing schemas 15:46:02 ... we considered multiple existing models, e.g. schema.org, datacite, rif-cs, hcls, dcat, etc 15:46:47 ... these models are lacking some elements in use-cases 15:47:09 ... we also looked at domain-specific models from biomed domain 15:47:35 ... the DATS model is a combination of elements we needed 15:48:13 ... we split the model into core entities (adopted elements from datacite and Force) and extended entities 15:48:51 ... we did a mapping to schema.org and are looking at elixir 15:49:34 ... there are adopters of DATS, implementing it in their systems 15:49:48 ... i would like to thank groups that were involved 15:49:49 sebneumaier has joined #sdsvoc 15:49:58 Richard Nagelmaeker 15:50:05 topic: Linked Data needs a Data Location Service 15:50:36 RRSAgent: draft minutes 15:50:36 I have made the request to generate http://www.w3.org/2016/11/30-sdsvoc-minutes.html deirdrelee 15:52:18 Richard: I would like to pitch an idea to you 15:52:41 ... when i started with Linked Data, the idea was to put all data in one triple store 15:53:26 ... the internet has DNS 15:54:19 ... [shows slide with diagram] 15:54:45 ... there is data that as an organisation you have control over, and can have IRIs that are part of the big picture 15:55:05 AndreaPerego has joined #sdsvoc 15:55:12 ... but there is also information that as an organisation you want to know, but is external to the organisation 15:55:39 ... e.g. customers, suppliers, etc. but as they are external they will have different IRIs 15:56:19 the issue is that what is behind a sparql endpoint will always contain the IRIs of the domain of the endpoint 15:56:24 Linda has joined #sdsvoc 15:56:29 ... DNS cannot help us 15:56:56 ...
but the problem is similar to what DNS solves, so could it potentially help us find Linked Data IRIs? 15:57:48 ... there are a number of building blocks, e.g. triple stores, sparql endpoints, VoID 15:59:15 ... results slide..... resolves the discrepancy between dataset IRIs and IRIs of a SPARQL endpoint 15:59:20 Topic: panel 15:59:44 ivan has joined #sdsvoc 16:00:46 kevin: what evidence do we have that any of the efforts we've been talking about today will help people find the data they want? 16:00:57 ... if we don't have evidence, how can we get it? 16:01:03 richard: just do it! 16:02:11 kevin: instead of 'let's build it and see what happens', well with the web, once something is implemented, how can you measure before? 16:03:13 PWinstanley: there was a project in 2004 on bioinformatics ?? that disappeared. Were the lessons learned from that picked up by alejandra's project? 16:04:20 alejandra: there was a heavy load, you had to build uml model, tag with ontologies, etc. I think the lesson learned is that there is a more 'webby' approach, lighter, easier 16:05:25 ... at least in biomedical databases, there is a lot of effort in curation. many databases already have ways to find data 16:05:44 ... hopefully we will help people search across databases 16:06:16 kevin: luis, you have looked at how users are actually behaving on open data portals 16:07:12 luis: one observation, for the qualitative part we were with data experts, but with the quantitative part, it was open to all users. those that just wanted an answer, not necessarily 'data' 16:08:31 danbri: there are two very different paths, one making data available to billions of people, and making data available to the tiny minority of people who want to analyse specific data 16:08:48 ... both are important and can have huge impact, but very different. 16:09:11 nandana has joined #sdsvoc 16:09:45 ...
ultimately, we want computer/google that knows the information, not just the data file 16:10:46 attendee: for luis' presentation, the one-word searches might be more related to people just finding an answer, not that there is structured data behind them 16:11:27 luis: we also know what people actually click on, not just search 16:12:14 Andreas Kuckner: will technologies like sparql still play a roll in ten years? 16:12:35 s/roll/role 16:12:53 richard: it depends how you look at IT. I think IT is a tool to help people. in this way i think sparql will be there 16:13:17 Caroline has joined #sdsvoc 16:13:22 ... the way i look at neural networks, they are trying to do something by themselves, this is a different kind of IT 16:13:24 Present+ Caroline 16:14:10 Raf: if you can look at rdf and sparql, i think these are approaches, more so than technology 16:14:17 s/bioinformatics ??/bioinformatics [Cancer Bioinformatics Grid -- caBIG] 16:14:39 Artemis: i think in one way or another we all use rdf, so if not in this form it will survive in some form 16:14:59 luis: i think neural networks will learn how to use rdf 16:15:19 ... but the big question...will neural networks replace us all! 16:16:11 danbri: sparql is a very practical technology, which tends to stick around. I'm sure it'll be seen as a tool for using data, like sql and purl. but AI might increase more and more 16:17:37 kevin: will it be difficult to enrich data? 16:18:47 luis: it is important to know what has been done to the data, for example with crime data if the data was anonymised on purpose, should there be an effort to uncover the data that was removed? 16:19:20 alejandra: whatever the data is, what we care about is finding patterns in the data ... 16:19:44 antoine has joined #sdsvoc 16:20:03 ...
[question to luis] because you were looking at user search, were you constrained by keyword 16:20:26 luis: we wanted to see if people asked questions or used keyword search 16:21:27 phila: In Raf's case, data is relevant to everyone in Flanders (public) and Artemis' case is relevant to very specialised physicists 16:22:01 ... danbri said that csv data can be incorporated into google's knowledge graph using csv on the web 16:22:32 Artemis: there is a cern data portal, with huge data releases - TBs and PBs of data 16:23:20 ... there is also private data, meant for collaborations 16:24:15 ... the analysis is for specific purposes, people wanted to preserve this, but it is very sensitive, it won't be opened. most people also won't be interested in this data 16:24:38 ... aim was to help physicists preserve their analysis. that was the demand 16:25:14 Raf: why are public services data important? 16:25:22 ... 1. if I want to move to Flanders 16:25:31 ... and want to set up a business 16:26:25 2. for business intelligence - e.g. you can compare how Flanders compares to other regions, e.g. for a place to live 16:27:25 ... if this data is on the Web, more people can use these public services, lowering the barriers 16:27:58 danbri: a good thing from private orgs is that even if they can't release the data, you can release software 16:28:27 kevin: and also release info about the data that is there, so that people can follow up on potential data access 16:29:03 BartvanLeeuwen: Raf, you asked is 'annotated data the new dataset'? 16:29:34 ... so danbri is this something that will be possible 16:30:28 danbri: you can do ?productname? 16:30:57 ... there will also be dataset search, there is a page online already, will distribute 16:31:17 ... we are looking at research data, data portals, we'll see what we can build 16:31:57 Raf: a lot of portals have feedback channels. should schema.org incorporate that? 16:32:32 danbri: maybe. schema.org is a dictionary, you have to build things with it.
we have reviews, rating, etc 16:33:19 newton has joined #sdsvoc 16:34:31 Attendee: Describing datasets properly is a problem. also the problem is mapping the questions from natural language to sparql for example. A lot of problems around data discovery relate to where the data is and what kind of data there is. Any comments on natural language to formal queries? 16:35:09 Alejandra: there is one pilot project that is looking into this question of how the user can find datasets 16:36:04 danbri: we did put question/answer in schema.org, e.g. stack overflow. and researchers are starting to pick up on that 16:36:43 ... i would hope there would be more focus on social aspects of open data portals, which could in turn help discoverability 16:36:45 Caro 16:37:19 attendee: What mechanisms can address general search but also very focused search? 16:37:29 s/Caro/ 16:37:32 ... how does this affect reproducibility 16:38:40 Raf: we combine schema.org at a general level for discoverability, which we combine with the core vocabularies which help with the specifics 16:39:22 alejandra: in the curation practices, generically it is very important to consider this. it is very relevant to know higher level terms and more specific terms that are specific 16:39:57 ... for reproducibility it goes much further, you not only need the discoverability metadata, but also how the data was prepared, etc. 16:40:25 Linda has joined #sdsvoc 16:40:28 danbri: we recently added a field in schema.org variable 16:41:03 luis: to me there is the dataset search levels in metadata 16:41:24 ... but to answer a more detailed question, you have to go deeper 16:41:52 ... what is the effort involved in adding this to metadata 16:42:24 AndreaPerego: another piece of information on helping to find the data is how the data is being used 16:42:41 ... e.g. feedback from users, this is important data 16:43:07 ... datasets have been used for purposes other than their original purpose 16:43:50 ...
people can see how other people have used the data and it might help them decide if it's useful for them 16:44:44 Raf: if you knew information about when people physically go to public services, this could help advise when people should go 16:45:01 ... info on how public services are used could help improve the service provided 16:45:48 luis: i agree, it's important to know how data is being used, but it's difficult to convince users of this 16:46:27 alejandra: something very important is data citations. it is great to have it, but it has limitations 16:47:17 danbri: when datasets are used and discovered, they can go to their funders and justify the availability of data 16:48:13 ... data citation in the scholarly sector is done, but it is not common for example in media 16:48:23 ... this might be a turning point 16:48:56 alejandra: it is also important to have contact information 16:49:13 wonders if W3C WebMention spec could help with this issue of citation and data usage [ https://www.w3.org/TR/webmention/ ] 16:50:04 kevin: it is difficult to measure if we had implemented something differently, how would impact be different 16:50:56 ... guidelines like the w3c dwbp and csv on the web have been referenced a lot today, they're obviously very useful for the communityl 16:51:04 s/communityl/community 16:51:14 RRSAgent: generate minutes 16:51:14 I have made the request to generate http://www.w3.org/2016/11/30-sdsvoc-minutes.html deirdrelee 16:51:44 Time for wine and canapes!!!
16:53:39 RRSAgent, draft minutes 16:53:39 I have made the request to generate http://www.w3.org/2016/11/30-sdsvoc-minutes.html phila 16:53:43 +1 Dee 16:56:00 RRSAgent, draft minutes 16:56:00 I have made the request to generate http://www.w3.org/2016/11/30-sdsvoc-minutes.html phila 16:57:15 pascaline has left #sdsvoc 17:20:59 danbri has joined #sdsvoc 18:26:11 Zakim has left #sdsvoc 19:59:04 Linda has joined #sdsvoc 20:03:44 newton has joined #sdsvoc 20:24:39 AxelPolleres has joined #sdsvoc 20:40:50 Linda has joined #sdsvoc 20:43:53 newton has joined #sdsvoc 20:47:04 Linda has joined #sdsvoc 20:56:29 Linda has joined #sdsvoc 20:56:50 LarsG has joined #sdsvoc