06:40:51 RRSAgent has joined #dataprivacy18
06:40:51 logging to https://www.w3.org/2018/04/18-dataprivacy18-irc
06:41:13 rrsagent, make logs public
06:51:15 scribe: simonstey
06:56:13 rrsagent, draft minutes
06:56:13 I have made the request to generate https://www.w3.org/2018/04/18-dataprivacy18-minutes.html simonstey
06:57:47 scribe: Bert
06:58:46 skirrane has joined #dataprivacy18
07:01:14 Zakim has joined #dataprivacy18
07:02:03 Zakim, this will be "W3C Workshop on Privacy and Linked Data (Day 2)"
07:02:03 ok, simonstey
07:02:37 meeting: "W3C Workshop on Privacy and Linked Data (Day 2)"
07:02:58 date: 18 April 2018
07:03:25 chair: Stefan_Decker
07:03:39 agenda: https://www.w3.org/2018/vocabws/#schedule
07:03:47 rrsagent, draft minutes
07:03:47 I have made the request to generate https://www.w3.org/2018/04/18-dataprivacy18-minutes.html simonstey
07:05:11 topic: admin
07:05:34 chair: Vassilios_Peristeras
07:05:40 skirrane has joined #dataprivacy18
07:05:56 Vassilios_Peristeras: we'll extend the research track until 11:00
07:06:22 Eva_Schlehahn_ULD has joined #dataprivacy18
07:07:00 topic: Ontological access control (Ramisa Gachpaz Hamed)
07:07:48 gerardkuys has joined #dataprivacy18
07:08:02 rigo has joined #dataprivacy18
07:08:20 rigo has changed the topic to: W3C Workshop on Privacy and Linked Data (Day 2)
07:08:40 topic: Privacy preserving profiling (Ramisa Gachpaz Hamed)
07:08:42 RRSAgent, pointer?
07:08:42 See https://www.w3.org/2018/04/18-dataprivacy18-irc#T07-08-42
07:09:20 Ramisa_Hamed: user doesn't understand what happens during disclosure decisions
07:09:52 ... we aim to empower users in disclosure decisions
07:10:15 balance automatic decision and user engagement
07:10:21 ... establishing a balance between automatic disclosure decision and active user engagement
07:10:41 Interesting that they also talk about disclosure
07:11:11 ...
we use foaf & schema.org
07:11:50 Ramisa_Hamed: no overall vocabulary for personal data
07:12:08 scribe: skirrane
07:13:18 Harald-ULD has joined #dataprivacy18
07:14:04 ... we utilize various semantic web technologies
07:14:09 christians has joined #dataprivacy18
07:14:38 ... e.g. using SPARQL for returning access decisions
07:14:54 vassilios has joined #dataprivacy18
07:15:15 ... Framework consists of data owner and data requester
07:16:03 VMireles has joined #dataprivacy18
07:16:09 ... with a so-called "privacy preserving unit" in between
07:16:20 They have automatic and semi-automatic decision making which considers contextual information
07:16:31 ontology based access control management 2016
07:16:50 RRSAgent, make minutes v2
07:16:50 I have made the request to generate https://www.w3.org/2018/04/18-dataprivacy18-minutes.html Bert
07:17:04 they focus primarily on healthcare
07:17:34 ... we use an extended version of an access control ontology
07:18:43 ... disclosure decisions are either asserted or newly inferred by the reasoner
07:20:09 ... we use owlexplanation for generating a natural language description of the inferred policy
07:20:24 s/policy/disclosure decision/
07:22:01 rigo: Article 22 of the GDPR explicitly mentions that "The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her."
07:22:10 http://www.privacy-regulation.eu/en/article-22-automated-individual-decision-making-including-profiling-GDPR.htm
07:22:24 ... how do you address this?
07:22:53 Ramisa_Hamed: it's not fully automatic
07:23:43 ida has joined #dataprivacy18
07:24:29 rigo: when you are in the profiling step, how do you identify whether the delta is significant?
07:25:17 ...
[giving an example on the US credit system]
07:25:20 schunter has joined #dataprivacy18
07:26:20 i/topic: admin/scribenick: Bert, simonstey, skirrane
07:26:22 RRSAgent, make minutes v2
07:26:22 I have made the request to generate https://www.w3.org/2018/04/18-dataprivacy18-minutes.html Bert
07:26:24 Stefan Decker: Can the system also tell the data subject what the minimum of data needed is - in contrast to the current approach of collecting the maximum amount of data?
07:26:45 i/topic: admin/scribenick: Bert, simonstey, skirrane, Harald-ULD
07:26:50 RRSAgent, make minutes v2
07:26:50 I have made the request to generate https://www.w3.org/2018/04/18-dataprivacy18-minutes.html Bert
07:27:30 Stefan_Decker: you have listed quite a large number of semantic web technologies
07:27:51 ... despite some of them still being somewhat experimental; what about scalability?
07:28:30 ... any information about computational complexity?
07:29:02 vassilios: you mentioned an access control ontology
07:29:13 ... what are your extensions to that ontology?
07:29:35 Ramisa_Hamed: we added roles, different actions
07:30:04 vassilios: what are the missing parts you identified?
07:30:35 Ramisa_Hamed: the main problem was finding a taxonomy for personal data
07:31:00 topic: Linked Data, Provenance, Compliance (Javier Fernández)
07:32:11 jfernandez: first giving an overview of the SPECIAL framework
07:32:31 ... composed of several components
07:33:22 ... 5 categories of consent: data, processing, purpose, storage, recipients
07:34:26 ... 2 main data components: policies, and log of events
07:36:00 ... we think usage policy language should be standardized
07:36:21 transparency and compliance framework: A language for the compliance log has been developed (SPLOG)
07:37:20 ... Log consists of LogEntry; LogEntry is either a PolicyEntry or DataEvent
07:37:22 log entries / events: two types of events - relating to policies or relating to the data
07:40:00 ...
https://aic.ai.wu.ac.at/qadlod/policyLog/ (The SPECIAL Policy Log Vocabulary)
07:40:06 jfernandez: we have two optional parts: Immutable Record and BPM
07:40:51 rigo has joined #dataprivacy18
07:40:55 https://aic.ai.wu.ac.at/qadlod/policyLog/
07:41:06 https://aic.ai.wu.ac.at/qadlod/policyLanguage/
07:41:33 Vocabularies are currently available online
07:42:25 jfernandez: [giving some SPLOG examples]
07:43:18 ... we are working on a running prototype (which we'll present as a demo at ESWC'18)
07:44:46 ... possible points for further discussions:
07:45:02 ... 1) lack of standard vocab for representing privacy-related events
07:45:35 ... 2) it should be possible to describe event content on different levels of granularity
07:46:23 Q: what about anonymization of the logs?
07:46:37 Matthias_Schunter: Anonymising data before putting it in the log?
07:47:04 jfernandez: 3) interoperability/standard APIs
07:47:13 jfernandez: Yes, possible
07:47:59 Harald-ULD: Deleting info and meta-data to say something has been deleted?
07:48:40 jfernandez: We have a hash of deleted data.
07:49:22 skirrane: ongoing discussions revolve around what actually needs/has to go into the log
07:49:43 ... some believe that instance data shouldn't go into logs at all (only classes)
07:49:49 skirrane: Some believe personal data should not be in the log at all. But we also want integrity checks on the data.
07:50:31 ... how to actually ensure the link between history table and actual log?
07:50:40 ... History data about what changed or was deleted. Hash shows data was tampered with, but not what was changed.
07:51:28 Q: have you done or planned to do user studies on your approach?
07:52:03 jfernandez: Yes.
07:52:53 skirrane: we already got some initial feedback from user studies done with students
07:53:21 skirrane: Yes, we have social science students on that. We also have a UI and user studies in the project (but not presented here, because the workshop is about vocabs.)
07:53:56 J.
Langfort: Proposed log suggestion is out there used with social networks. Comment: Avoid storing more and more data in logs that users may actually not want to have stored.
07:54:00 topic: Data-driven privacy and trust enhancement mechanisms (Yi Yin)
07:54:13 skirrane has joined #dataprivacy18
07:54:20 rigo has joined #dataprivacy18
07:55:01 Yi Yin, PhD student involved in VRE4EIC (ERCIM is a partner in that)
07:55:08 Yi_Yin: https://www.w3.org/2018/vocabws/papers/yin.pdf (position statement)
07:55:59 social science and eHealth always face the problem of privacy when pooling research data
07:56:18 researchers are not aware of GDPR
07:56:50 VRE4EIC platform asked Delft lawyers, had no answer
07:56:58 Yi_Yin: We don't store data sets themselves, but have a catalog of them. Question for researchers is what the privacy rules are when they want to use the data sets.
07:57:08 Building the platform, they sought legal advice and did not get concrete advice.
07:57:41 privacy impact assessment needed. Is there a recommendation to do that automatically?
07:57:56 researchers want to research, not concentrate on privacy
07:58:03 ... So that's my question for the discussion here: how do we explain and manage the privacy rules of the data sets?
07:59:13 ... How can researchers check themselves? How to make sure they do the checks?
08:02:13 [discussing how the GDPR will affect research efforts]
08:04:15 [discussion on research with data and protection of personal data in research]
08:05:35 Statement: Researchers in the past had to deal with dogmas of the time and fight against those. Data Protection may be the current dogma limiting research.
08:06:10 related GDPR article: http://www.privacy-regulation.eu/en/article-89-safeguards-and-derogations-relating-to-processing-for-archiving-purposes-the-public-interest-scientific-or-hi-GDPR.htm
08:07:21 Rigo citing recital 33 GDPR: Data subjects are allowed to consent to yet unknown future steps and results with the data - but this is within "certain areas of scientific research" and within ethical boundaries.
08:08:13 GDPR recital 33: http://www.privacy-regulation.eu/en/recital-33-GDPR.htm
08:09:04 Eva: you should look closely at what you want to do and then go back and look for consent.
08:09:11 See related recital 33 at: http://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679&from=EN
08:09:49 specifying purposes is important in that context - when you can name them reasonably, you've taken a big step already
08:10:20 Yi_Yin: EU also encourages sharing of research data.
08:11:32 topic: Meta-data to describe the details of the anonymization (Benjamin Heitmann)
08:11:49 Harald-ULD: SPECIAL is looking at research data too. Often a question of purpose: published for a certain purpose and not another. And often an ethical question.
08:12:22 Harald: We need to see how research can work with public data - that is, data published by the data subjects themselves but in other contexts unrelated to the research.
08:14:33 Benjamin Heitmann: privacy as an enabler of the data economy
08:14:47 ... talking about data being bought and sold
08:15:01 s/Benjamin Heitmann/Benjamin_Heitmann/
08:15:04 Incentives needed on both sides: Incentive to collect the data and incentive for data subjects to have their data collected.
08:15:44 ... anonymisation is one approach
08:15:46 ... in order for data to be sold, it needs to be anonymized
08:16:12 ... buyers need to determine value based on utility
08:16:17 yiyin has joined #dataprivacy18
08:16:17 ... in order to have both sides happy you need incentives on both sides.
08:16:27 ...
at the same time one has to identify the value of anonymized data
08:16:47 ... anonymisation is widely used to limit impact on data subject.
08:17:08 ... anonymisation is one approach; however, we need metadata to describe the approach
08:17:24 ... 3 distinct subsets of tabular data: Identifiable Attr (IA), Quasi-Identifier (QID), Sensitive Attr (SA)
08:17:55 s/distinct/disjoint/
08:18:11 ... anonymisation usually meant to delete these categories of data from a table / dataset.
08:18:42 k-anonymity and the concepts are well understood already.
08:18:56 ... [giving a brief intro to k-anonymity, l-diversity]
08:19:47 ... there's a difference between k-anonymity for tabular data and k-anonymity for graph data
08:20:11 ... [example on anonymisation of graph data]
08:22:11 ... we need meta-data describing encrypted/anonymized data
08:23:22 rigo: we don't do anonymization but talk about anonymization
08:23:27 Benjamin_Heitmann: Somebody who buys anonymised data needs to know before buying it if the data in this form is still useful for his purpose, hence we need meta-data to describe the anonymisation.
08:23:48 ... as anonymisation changes the utility of data. This changes the value of the data. We need metadata expressing information on the anonymisation process
08:24:42 Freddy_de_Meersman: You mentioned incentives and different countries: Could some countries, China, US, Europe, be at relative competitive disadvantage to each other?
08:25:26 Benjamin_Heitmann: There are negative but also positive incentives. Europe is on its way to create positive incentives.
08:25:50 ... Developing new technologies.
08:26:51 ... Does mean more "red tape". But privacy protection also means company data protection.
08:30:19 Martin, you can do more with raw data.
08:30:42 Benjamin_Heitmann: [In answer to question from Jaroslav Pullman] Examples of anonymising for different usages, leaving out different things based on purpose.
Anonymising on the fly could keep data useful for more than one purpose.
08:30:42 ... face recognition works better for Asian faces
08:31:29 Martin_Kurze: Do we need a kind of European Wall to protect ourselves?
08:32:19 skirrane has joined #dataprivacy18
08:32:57 ... Benjamin_Heitmann not proposing a wall. Use the opportunities given. There are countries with strong research focuses, e.g. in security Israel is strong on cryptography.
08:32:59 Benjamin_Heitmann: No, I don't propose that. History shows that countries that required security now have good cryptographers and export technology.
08:34:08 ... The positive incentives are enough to create business opportunities.
08:34:14 European legislation provides incentives to develop research in certain areas, which may not be strong in other areas of the world.
08:34:42 Darren_Bell: Same for risk analysis.
08:35:39 Rigo on European business opportunities: There are businesses in Europe making money on customers fleeing insecure US-services.
08:35:50 Rigo: Example of T-Systems advertising its secure communications to attract customers from the US.
08:37:52 -> http://www.bbc.co.uk/news/technology-43797128 BBC story on Facebook facial recognition opt-out
08:39:07 Benjamin_Heitmann: No doubt result of legal pressure in EU.
08:39:29 topic: Privacy-utility Control For Linked Data Against Deanonymisability Risk (Dalal Al-Azizy)
08:39:41 StefanD has joined #dataprivacy18
08:39:55 ... The EU rules may lead to the best tech being implemented, even if the options are turned off in some countries.
08:42:16 Dalal_AlAzizy: deanonymization attacks related to principles of Linked Data
08:42:37 ... e.g., linkage/inference attacks
08:44:18 Dalal_Al-Azizy: There's a lack of tools to help determine the risks of publishing data.
08:44:54 Q: what are the biggest challenges you see in moving tabular data to the linked data realm? especially wrt.
anonymization techniques
08:45:48 s/Q/Darren_Bell/
08:46:42 ... one challenge: How to control / assess the risk of insertion of owl:sameAs into LOD
08:46:50 Dalal_Al-Azizy: E.g., if somebody later published data that can be combined with my earlier data.
08:48:08 Benjamin_Heitmann: A fan of Lindon(sp?), but has its limits, because of what it is based on.
08:48:54 Rigo: Data protection terminology is strict and explicit about entities (data subject, controller, processor) as everything is about the relations between these entities. Using security terminology may easily misguide.
08:49:39 Rigo: For the next discussion: One complex issue is how to connect the metadata to the actual data.
08:49:41 skirrane has joined #dataprivacy18
08:49:57 Rigo: we have not discussed how we attach policies to data in sufficient detail
08:50:47 ... P3P had a "policy reference file". But not sufficient for a big service like Akamai: the file would be 16MB for each metadata item!
08:51:07 Rigo: Followup-questions: Do we need to secure this link and...
08:54:34 vassilios: who believes there is room for standardisation in this space?
08:54:41 Participant poll: Raise your hand if you think that standardisation is useful in at least one of the sub-topics discussed (100% affirmative)
08:55:58 skirrane has joined #dataprivacy18
08:56:19 Stefan Decker: GDPR will have influence worldwide. So adapting to the GDPR makes sense - not only for EU-companies.
08:56:30 Stefan: Also for discussion: We believe the GDPR is European, but will have effect worldwide, and thus we also need global standards.
08:57:00 topic: break
08:57:03 [break]
09:04:42 Zakim has left #dataprivacy18
09:11:57 skirrane has joined #dataprivacy18
09:24:35 Eva_Schlehahn_ULD has joined #dataprivacy18
09:43:38 skirrane has joined #dataprivacy18
09:44:24 Topic: Discussion - standardisation opportunities
09:44:48 vassilios: We tried to group the post-it notes.
09:45:16 ...
We want to go quickly through our understanding of what is written on your notes.
09:46:54 ... Some topics correspond to more than one phase of the process of data collection - data processing - dissemination.
09:47:48 skirrane: About that process: This taxonomy comes from a legal scholar and is used extensively in social sciences.
09:48:15 ... Data collection includes surveillance and interrogation.
09:48:50 ... Processing includes aggregation, identification, insecurity, secondary use, exclusion.
09:49:29 ... Dissemination step includes (breach of) confidentiality, disclosure, etc.
09:50:18 ... Before Collection there is an "Invasion" step, intrusion into the privacy of the data subject.
09:50:55 taxonomies of privacy data, of disclosure, how information is collected, how can it be used
09:51:08 Stefan: One group of post-it notes deals with taxonomies.
09:51:15 skirrane has joined #dataprivacy18
09:51:28 how can we assess individual harm
09:53:31 @5: Classifying the financial benefit to a company of owning data.
09:54:41 skirrane has joined #dataprivacy18
09:55:05 Rigo: There is another kind of party, not in this model: users of data who get the data not from the data subjects but from another data collector, indirectly.
09:55:26 skirrane has joined #dataprivacy18
09:57:59 Darren_Bell: Create taxonomies to support research to create ontologies.
09:58:04 Harald-ULD has joined #dataprivacy18
09:58:32 Eva_Schlehahn_ULD has joined #dataprivacy18
09:59:23 Discussions about necessary and/or sufficient. What is the intended next step?
10:00:14 skirrane has joined #dataprivacy18
10:00:51 Vassilios: At the moment not setting priorities, just grouping. Can decide later what is easy to do, most necessary, etc.
10:01:19 Rigo: Goal of workshop discussion is also to get people engaged.
10:02:30 Vassilios: So now what? How do we decide which of the topics we discuss and how?
10:03:06 Rigo: Also interested in what are the most popular topics.
10:04:00 [Input provided by participants on sticky notes has been clustered according to Frank Bernieri's categories of stages of processing: Information Collection IC, information dissemination, information processing]
10:04:43 Mark_Lizar: about standardisation, what's most popular is maybe not the first factor. What fits the W3C process is also important.
10:05:10 skirrane: Let's do a live poll.
10:05:12 VMireles has joined #dataprivacy18
10:05:26 RRSAgent, make minutes v2
10:05:26 I have made the request to generate https://www.w3.org/2018/04/18-dataprivacy18-minutes.html Bert
10:06:50 Ramisa has joined #dataprivacy18
10:07:14 Further discussion about clustering and polling.
10:07:28 log vocab appeared several times
10:07:53 Stefan: One keyword that recurred in several post-its is event logging.
10:08:20 ... Another is categories of purposes.
10:08:33 purpose categories are not too similar to taxonomies, but also appeared
10:08:46 purpose is taxonomy, but not related to data subject
10:08:51 Stefan: Purpose categories as a taxonomy related to the data and not the data subject as a further cluster
10:09:09 cs has joined #dataprivacy18
10:09:31 Information Dissemination: Metadata for anonymisation
10:09:53 ... We grouped another few notes under Dissemination. Includes also risks and de-anonymisation.
10:09:57 -
10:10:06 consent interoperability fits in there as well, a number of different metadata
10:10:44 geographical region and controllership
10:10:56 BenjaminHeitmann has joined #dataprivacy18
10:10:57 related to purpose
10:11:07 general category purpose with subtopics
10:11:18 ... purpose as general category with several sub-topics associated to it
10:11:57 Information collection
10:12:09 ... Under Processing, we also put issues with geographical location of processing.
10:12:10 exchange formats for data and consent types
10:12:21 is it processable
10:13:45 ... We left some topics unclassified, such as industry-domain-specific issues.
10:14:04 Rigo: missing is a profile for data protection
10:14:23 of available RDF vocabularies, like geodata and provenance
10:14:53 ... Also enforcement and policy patterns are uncategorised.
10:17:29 Sabrina is constructing a poll to order the various requested taxonomies, such as for anonymisation, for purpose categories or for human behaviour.
10:18:39 The goal is that the participants rank them in order of what they would most want to participate in.
10:19:58 Address of the poll: https://pollev.com/sabrinakirra386
10:23:00 Rigo: Taxonomy of privacy data actually includes taxonomy of purposes, because privacy data is defined to include the purpose.
10:24:25 @6: Different kind of taxonomy, purposes independent of privacy. Something ODRL is lacking.
10:25:34 Discussion about what "privacy data" means. Is it private data? Is it everything relating to data processing, including purpose? Something else?
10:27:04 Sabrina changes the poll form from "taxonomy of privacy data" to "taxonomy of GDPR term"
10:28:26 because there is a difference between the concepts of 'privacy' and 'data protection'
10:29:55 Harald: missing is a mention of actors involved
10:33:36 Rigo suggests: include things like controller, processor, data subject, do not include name, address, business address ...
10:33:48 VMireles has joined #dataprivacy18
10:35:02 The poll is here: https://pollev.com/sabrinakirra386
10:35:12 maybe somebody can set it as the topic of the channel ?
10:35:42 Bert has changed the topic to: https://pollev.com/sabrinakirra386
10:36:57 People start filling the poll, even though discussion about its content is continuing.
10:37:30 Vassilios and Rigo agree: We should not discuss taxonomies that actually should come from other domains, to avoid bringing in data protection bias.
10:38:00 Is it too high-level? Where is the issue of linking to data? Is that not a taxonomy?
10:39:16 Jaroslav_Pullman: Not sure how to formalise it, but questions about what is needed to make decisions, requirements documents, whether something is enforceable.
10:40:47 ... It is maybe a glossary of relevant context, rather than a taxonomy.
10:42:19 BenjaminHeitmann has joined #dataprivacy18
10:48:14 More discussion about which type of data is covered by which question on the poll.
10:49:08 ida_ has joined #dataprivacy18
10:49:54 [Lunch break - and online polling in parallel]
10:52:08 RRSAgent, please draft minutes
10:52:08 I have made the request to generate https://www.w3.org/2018/04/18-dataprivacy18-minutes.html rigo
10:53:26 skirrane has joined #dataprivacy18
10:57:32 skirrane has joined #dataprivacy18
12:05:41 Eva_Schlehahn_ULD has joined #dataprivacy18
12:06:28 VMireles has joined #dataprivacy18
12:09:11 Topic: Discussion of next steps
12:09:37 Ranking poll resulted in "taxonomy of personal data" on top.
12:10:51 Rigo explains W3C's infrastructure for workshops, working groups and community groups.
12:11:35 Workshop can be a first step to see if some group is needed and what its goals should be.
12:12:08 Harald-ULD has joined #dataprivacy18
12:13:04 To join cooperative work in the community group - get a public W3C account and have it added to the community group.
12:13:27 A group then needs a charter. A community group doesn't make a standard, but its output can be the creation of a working group, which, in turn, creates a standard.
12:14:10 Jaroslav_Pullman: What is the Privacy Interest Group of W3C and its relation to this work?
12:14:19 The success depends on the activity of the participants.
12:15:04 Rigo: An IG doesn't create specs. It may review them.
12:16:12 Stefan: So what next if we do want a standard?
12:17:06 Once taxonomy is final the community group creates a charter that could lead to a working group being created.
12:17:44 Rigo explains charter development (usually with help from a staff member).
Then needs a certain number of W3C members, about 20, to support the charter before the WG can be created.
12:20:19 Community group on the other hand has almost no requirements. If five people join, it's a group.
12:20:53 laurent_oz has joined #dataprivacy18
12:21:26 Rigo asks if everybody is OK to receive follow-up mail after the workshop, to announce the report and possibly the community group.
12:21:39 People not OK should tell Rigo.
12:22:41 Presenters who have slides and haven't given them yet, should send them, so they can be linked from the Workshop Web page, with the report.
12:23:31 Round of thanks.
12:24:09 To Rigo and Axel Polleres for the organisation.
12:25:02 To the chairs for managing the presentations and discussions.
12:25:28 And to Sabrina for the local arrangements.
12:25:47 RRSAgent, make minutes v2
12:25:47 I have made the request to generate https://www.w3.org/2018/04/18-dataprivacy18-minutes.html Bert
12:58:35 skirrane has joined #dataprivacy18
13:06:14 AxelPollleres has joined #dataprivacy18
14:49:03 skirrane has joined #dataprivacy18