"W3C Workshop on Privacy and Linked Data (Day 2)"

Meeting Minutes

admin

Vassilios_Peristeras: we'll extend the research track until 11:00

Ontological access control (Ramisa Gachpaz Hamed)


Privacy preserving profiling (Ramisa Gachpaz Hamed)

Ramisa_Hamed: users don't understand what happens during disclosure decisions
… we aim to empower users in disclosure decisions

<rigo> balance automatic decision and user engagement

Ramisa_Hamed: establishing a balance between automatic disclosure decisions and active user engagement

Interesting that they also talk about disclosure

Ramisa_Hamed: we use foaf & schema.org

Ramisa_Hamed: no overall vocabulary for personal data
… we utilize various semantic web technologies
… e.g. using SPARQL for returning access decisions
… the framework consists of a data owner and a data requester
… with a so-called "privacy preserving unit" in between

They have automatic and semi-automatic decision making which considers contextual information

<rigo> ontology-based access control management 2016

They focus primarily on healthcare.

Ramisa_Hamed: we use an extended version of an access control ontology
… disclosure decisions are either asserted or newly inferred by the reasoner
… we use owlexplanation for generating a natural language description of the inferred disclosure decision

rigo: article 22 of the GDPR explicitly mentions that "The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her."

http://www.privacy-regulation.eu/en/article-22-automated-individual-decision-making-including-profiling-GDPR.htm

... how do you address this?

Ramisa_Hamed: it's not fully automatic

rigo: when you are in the profiling step, how do you identify whether the delta is significant?
… [giving an example on the US credit system]

Stefan Decker: Can the system also tell the data subject what the minimum of data needed is, in contrast to the current approach of collecting the maximum amount of data?

Stefan_Decker: you have listed quite a large number of Semantic Web technologies (SWT)
… despite some of them still being somewhat experimental; what about scalability?
… any information about computational complexity?

vassilios: you mentioned an access control ontology
… what are your extensions to that ontology?

Ramisa_Hamed: we added roles, different actions

vassilios: what are the missing parts you identified?

Ramisa_Hamed: the main problem was finding a taxonomy for personal data

Linked Data, Provenance, Compliance (Javier Fernández)

jfernandez: first giving an overview of the SPECIAL framework
… composed of several components
… 5 categories of consent: data, processing, purpose, storage, recipients
… 2 main data components: policies, and log of events
… we think usage policy language should be standardized
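The five consent categories can be pictured as one set of permitted values per category, with a processing request compliant only if it stays inside all five. A minimal sketch, assuming plain set membership: the SPECIAL usage policy language itself is an RDF/OWL vocabulary (linked in these minutes) that relies on reasoning rather than string matching, and the class `UsagePolicy`, the function `complies` and all `ex:` values are hypothetical.

```python
from dataclasses import dataclass

# The five consent categories named in the talk.
CATEGORIES = ("data", "processing", "purpose", "storage", "recipients")


@dataclass(frozen=True)
class UsagePolicy:
    """Hypothetical consent record: one set of permitted values per category."""
    data: frozenset
    processing: frozenset
    purpose: frozenset
    storage: frozenset
    recipients: frozenset


def complies(request: dict, consent: UsagePolicy) -> bool:
    """A request complies only if its value in every category is permitted."""
    return all(request[c] in getattr(consent, c) for c in CATEGORIES)


consent = UsagePolicy(
    data=frozenset({"ex:Location"}),
    processing=frozenset({"ex:Aggregate"}),
    purpose=frozenset({"ex:ServiceImprovement"}),
    storage=frozenset({"ex:EU"}),
    recipients=frozenset({"ex:Controller"}),
)

request = {
    "data": "ex:Location",
    "processing": "ex:Aggregate",
    "purpose": "ex:ServiceImprovement",
    "storage": "ex:EU",
    "recipients": "ex:Controller",
}

print(complies(request, consent))                                      # True
print(complies({**request, "recipients": "ex:ThirdParty"}, consent))  # False
```

Set membership keeps the sketch short; a real policy language would also need to accept, say, a narrower purpose as covered by a broader consented one, which is where OWL subsumption comes in.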

Transparency and compliance framework: a language for the compliance log has been developed (SPLOG)

jfernandez: Log consists of LogEntry; LogEntry is either a PolicyEntry or DataEvent

log entries / events: two types of events, relating to policies or relating to the data
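The LogEntry structure described above (an entry is either a PolicyEntry or a DataEvent) maps naturally onto a tagged union. A sketch in Python, assuming field names of our own choosing (`timestamp`, `data_subject`, `policy`, `action`) and a hypothetical `Log` helper; the actual classes and properties are defined by the SPLOG vocabulary linked in these minutes, not by this code.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class PolicyEntry:
    """Hypothetical entry recording a consent/policy change (given, withdrawn, ...)."""
    timestamp: str
    data_subject: str
    policy: str


@dataclass
class DataEvent:
    """Hypothetical entry recording something that happened to the data itself."""
    timestamp: str
    data_subject: str
    action: str


# A LogEntry is either a PolicyEntry or a DataEvent.
LogEntry = Union[PolicyEntry, DataEvent]


@dataclass
class Log:
    entries: List[LogEntry] = field(default_factory=list)

    def append(self, entry: LogEntry) -> None:
        self.entries.append(entry)

    def for_subject(self, subject: str) -> List[LogEntry]:
        """All entries about one data subject, in insertion order."""
        return [e for e in self.entries if e.data_subject == subject]

    def policy_entries(self) -> List[PolicyEntry]:
        return [e for e in self.entries if isinstance(e, PolicyEntry)]


log = Log()
log.append(PolicyEntry("2018-04-18T09:00:00Z", "ex:alice", "ex:consent1"))
log.append(DataEvent("2018-04-18T09:05:00Z", "ex:alice", "ex:Read"))
log.append(DataEvent("2018-04-18T09:06:00Z", "ex:bob", "ex:Read"))
print(len(log.for_subject("ex:alice")), len(log.policy_entries()))  # 2 1
```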

jfernandez: https://aic.ai.wu.ac.at/qadlod/policyLog/ (The SPECIAL Policy Log Vocabulary)

jfernandez: we have two optional parts Immutable Record and BPM

https://aic.ai.wu.ac.at/qadlod/policyLog/

https://aic.ai.wu.ac.at/qadlod/policyLanguage/

Vocabularies are currently available online

jfernandez: [giving some SPLOG examples]
… we are working on a running prototype (which we'll present as a demo at ESWC'18)
… possible points for further discussions:
… 1) lack of standard vocab for representing privacy-related events
… 2) it should be possible to describe event content on different levels of granularity

Q: what about anonymization of the logs?

Matthias_Schunter: Anonymising data before putting it in the log?

jfernandez: 3) interoperability/standard APIs

jfernandez: Yes, possible

Harald-ULD: Deleting info and meta-data to say something has been deleted?

jfernandez: We have a hash of deleted data.

skirrane: ongoing discussions revolve around what actually needs/has to go into the log
… some believe that instance data shouldn't go into logs at all (only classes)

skirrane: Some believe personal data should not be in the log at all. But we also want integrity checks on the data.
… how to actually ensure the link between history table and actual log?
… History data records what changed or was deleted. A hash shows that data was tampered with, but not what was changed.
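The trade-off discussed here, keeping only a hash of deleted data so that integrity can still be checked without retaining the personal data, can be illustrated with a short sketch. Assumptions: SHA-256 over a canonical JSON serialisation, and hypothetical helper names (`record_hash`, `delete_with_tombstone`, `matches_tombstone`); this is not the SPECIAL implementation.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """SHA-256 over canonical JSON, so equal records always give the same digest."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def delete_with_tombstone(store: dict, key: str, log: list) -> None:
    """Delete a record but log that it existed, keeping only its hash."""
    record = store.pop(key)
    log.append({"event": "deleted", "key": key, "hash": record_hash(record)})


def matches_tombstone(candidate: dict, tombstone: dict) -> bool:
    """True only if the candidate has exactly the content of the deleted record."""
    return record_hash(candidate) == tombstone["hash"]


store = {"u1": {"name": "Alice", "email": "alice@example.org"}}
log = []
delete_with_tombstone(store, "u1", log)

original = {"name": "Alice", "email": "alice@example.org"}
altered = {"name": "Alice", "email": "alice@example.com"}
print("u1" in store)                        # False: the data itself is gone
print(matches_tombstone(original, log[0]))  # True
print(matches_tombstone(altered, log[0]))   # False, but the hash cannot say which field differs
```

As noted in the discussion, a mismatch only shows that something differs. One further caveat: a plain hash of low-entropy personal data can be brute-forced, so a keyed hash (for example HMAC) may be preferable in practice.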

Q: have you done or planned to do user studies on your approach?

jfernandez: Yes.

skirrane: we already got some initial feedback from user studies done with students

skirrane: Yes, we have social science students on that. We also have a UI and user studies in the project (but not presented here, because the workshop is about vocabs.)

J. Langfort: The proposed log approach is already in use with social networks. Comment: avoid storing more and more data in logs that users may actually not want stored.

Data-driven privacy and trust enhancement mechanisms (Yi Yin)

<rigo> Yi Yin, PhD student involved in VRE4EIC (ERCIM is a partner in that)

Yi_Yin: https://www.w3.org/2018/vocabws/papers/yin.pdf (position statement)

<rigo> social science and eHealth always face the privacy problem when pooling research data

<rigo> researchers are not aware of GDPR

<rigo> VRE4EIC platform asked Delft lawyers, got no answer

Yi_Yin: We don't store data sets themselves, but have catalogs of them. The question for researchers is what the privacy rules are when they want to use the data sets.

When building the platform, they sought legal advice but did not get concrete answers.

<rigo> privacy impact assessment needed. Is there a recommendation to do that automatically?

<rigo> researchers want to research, not concentrate on privacy

Yi_Yin: So that's my question for the discussion here: how do we explain and manage the privacy rules of the data sets?
… How can researchers check themselves? How to make sure they do the checks?

[discussing how the GDPR will affect research efforts]

[discussion on research with data and protection of personal data in research]

Statement: Researchers in the past had to deal with dogmas of the time and fight against those. Data Protection may be current dogma limiting research.

Rigo, citing GDPR recital 33: Data subjects are allowed to consent to yet unknown future steps and results with the data, but this is within "certain areas of scientific research" and within ethical boundaries.

GDPR recital 33: http://www.privacy-regulation.eu/en/recital-33-GDPR.htm

Eva: you should look closely at what you want to do and then go back and look for consent.

See related recital 33 at: http://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679&from=EN

<Eva_Schlehahn_ULD> specifying purposes is important in that context - when you can name them reasonably, you've done a big step already

Yi_Yin: EU also encourages sharing of research data.

Meta-data to describe the details of the anonymization (Benjamin_Heitmann)

Harald-ULD: SPECIAL is looking at research data too. Often a question of purpose: published for a certain purpose and not another. And often an ethical question.

Harald: We need to see how research can work with public data, that is, data published by the data subjects themselves but in other contexts unrelated to the research.

Benjamin Heitmann: privacy as an enabler of the data economy

... talking about data being bought and sold

Incentives needed on both sides: Incentive to collect the data and incentive for data subjects to have their data collected.

... anonymisation is one approach

... in order for data to be sold, it needs to be anonymized

... buyers need to determine value based on utility

... in order to have both sides happy you need incentives on both sides.

... at the same time one has to identify the value of anonymized data

... anonymisation is widely used to limit impact on data subject.

... anonymisation is one approach; however, we need metadata to describe the approach

... 3 disjoint subsets of tabular data: Identifiable Attr (IA), Quasi-Identifier (QID), Sensitive Attr (SA)

... anonymisation usually means deleting these categories of data from a table / dataset.

k-anonymity and the concepts are well understood already.

... [giving a brief intro to k-anonymity and l-diversity]
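The three attribute sets and the two privacy models mentioned above can be made concrete on a toy tabular example. A minimal sketch, assuming identifiers (IA) are simply dropped, quasi-identifiers (QID) have already been generalised (zip prefixes, age bands), and "distinct" l-diversity, the simplest of its variants; the table and all function names are invented for illustration. k is the size of the smallest group of rows sharing the same QID values, l the smallest number of distinct sensitive (SA) values within such a group. Graph data, as the talk points out, needs a different treatment.

```python
from collections import defaultdict

IDENTIFIERS = {"name"}              # IA: removed outright
QUASI_IDENTIFIERS = ("zip", "age")  # QID: already generalised in the data below
SENSITIVE = "disease"               # SA: released as-is


def drop_identifiers(rows, identifiers):
    return [{k: v for k, v in r.items() if k not in identifiers} for r in rows]


def equivalence_classes(rows, qids):
    """Group rows that are indistinguishable on the quasi-identifiers."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[q] for q in qids)].append(r)
    return list(groups.values())


def k_anonymity(rows, qids):
    """Size of the smallest equivalence class."""
    return min(len(g) for g in equivalence_classes(rows, qids))


def l_diversity(rows, qids, sensitive):
    """Distinct l-diversity: fewest distinct sensitive values in any class."""
    return min(len({r[sensitive] for r in g}) for g in equivalence_classes(rows, qids))


raw = [
    {"name": "A", "zip": "130**", "age": "<30", "disease": "flu"},
    {"name": "B", "zip": "130**", "age": "<30", "disease": "cold"},
    {"name": "C", "zip": "130**", "age": "<30", "disease": "flu"},
    {"name": "D", "zip": "148**", "age": ">=40", "disease": "cancer"},
    {"name": "E", "zip": "148**", "age": ">=40", "disease": "cancer"},
    {"name": "F", "zip": "148**", "age": ">=40", "disease": "flu"},
]
table = drop_identifiers(raw, IDENTIFIERS)
print(k_anonymity(table, QUASI_IDENTIFIERS))             # 3
print(l_diversity(table, QUASI_IDENTIFIERS, SENSITIVE))  # 2
```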

... there's a difference between k-anonymity for tabular data and k-anonymity for graph data

... [example on anonymisation of graph data]

... we need meta-data describing encrypted/anonymized data

rigo: we don't do anonymization but talk about anonymization

Benjamin_Heitmann: Somebody who buys anonymised data needs to know before buying it if the data in this form is still useful for his purpose, hence we need meta-data to describe the anonymisation.

... as anonymisation changes the utility of data. This changes the value of the data. We need metadata expressing information on the anonymisation process
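The argument that a buyer must judge utility before purchase suggests a machine-readable description of what the anonymisation did. No such vocabulary was presented at the workshop; the record below is a hypothetical sketch with invented field names, and `useful_for` is an illustrative buyer-side check, not an existing API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class AnonymisationMetadata:
    """Hypothetical description published alongside an anonymised dataset."""
    technique: str               # free-text name of the method applied
    k: int                       # smallest equivalence-class size achieved
    l_diversity: int             # distinct l-diversity achieved
    removed: FrozenSet[str]      # attributes dropped entirely
    generalised: FrozenSet[str]  # attributes coarsened, e.g. zip -> "130**"
    retained: FrozenSet[str]     # attributes released unchanged


def useful_for(meta: AnonymisationMetadata, needs_exact: Set[str], min_k: int) -> bool:
    """Buyer-side check before purchase: are the attributes the purpose needs
    still exact, and is the protection level what the buyer itself requires?"""
    return needs_exact <= meta.retained and meta.k >= min_k


meta = AnonymisationMetadata(
    technique="generalisation + suppression",
    k=3,
    l_diversity=2,
    removed=frozenset({"name"}),
    generalised=frozenset({"zip", "age"}),
    retained=frozenset({"disease"}),
)

print(useful_for(meta, {"disease"}, min_k=3))         # True
print(useful_for(meta, {"disease", "age"}, min_k=3))  # False: age was generalised
```

Recording what was removed, generalised and retained, rather than only the final k, is what lets a buyer answer the utility question without seeing the data.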

Freddy_de_Meersman: You mentioned incentives and different countries: Could some countries, China, US, Europe, be at relative competitive disadvantage to each other?

Benjamin_Heitmann: There are negative but also positive incentives. Europe on its way to create positive incentives.
… Developing new technologies.
… Does mean more "red tape". But privacy protection also means company data protection.

<rigo> Martin, you can do more with raw data.

Benjamin_Heitmann: [In answer to question from Jaroslav Pullman] Examples of anonymising for different usages, leaving out different things based on purpose. Anonymising on the fly could keep data useful for more than one purpose.

<rigo> ... face recognition works better for Asian faces

Martin_Kurze: Do we need a kind of European Wall to protect ourselves?

... Benjamin_Heitmann is not proposing a wall. Use the opportunities given. There are countries with strong research focuses; e.g., in security, Israel is strong on cryptography.

Benjamin_Heitmann: No, I don't propose that. History shows that countries that required security now have good cryptographers and export technology.
… The positive incentives are enough to create business opportunities.

European legislation provides incentives to develop research in certain areas, which may not be strong in other areas of the world.

Darren_Bell: Same for risk analysis.

Rigo on European business opportunities: There are businesses in Europe making money on customers fleeing insecure US services.

Rigo: Example of T-Systems advertising its secure communications to attract customers from the US.

BBC story on Facebook facial recognition opt-out

Benjamin_Heitmann: No doubt result of legal pressure in EU.

Privacy-utility Control For Linked Data Against Deanonymisability Risk (Dalal Al-Azizy)

Benjamin_Heitmann: The EU rules may lead to the best tech being implemented, even if the options are turned off in some countries.

Dalal_Al-Azizy: deanonymization attacks related to principles of Linked Data
… e.g., linkage/inference attacks

Dalal_Al-Azizy: There's a lack of tools to help determine the risks of publishing data.

Darren_Bell: what are the biggest challenges you see in moving tabular data to the linked data realm? especially wrt. anonymization techniques

<VMireles> ...one challenge: how to control / assess the risk of insertion of owl:sameAs into LOD

Dalal_Al-Azizy: E.g., if somebody later published data that can be combined with my earlier data.
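The risk raised here, that a later owl:sameAs link can join a separately published, seemingly harmless dataset to earlier data, can be shown with a toy example. A minimal sketch, assuming triples as plain Python tuples and a naive merge of all owl:sameAs-linked resources (no OWL reasoner); the resources, properties and the helpers `merge_same_as` and `reidentified` are hypothetical.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"


def merge_same_as(triples):
    """Merge resources connected by owl:sameAs (union-find), then collect
    every property of each merged resource: {root: {predicate: {objects}}}."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)

    merged = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        if p != SAME_AS:
            merged[find(s)][p].add(o)
    return merged


def reidentified(triples, identifying, sensitive):
    """Resources that now carry both an identifying and a sensitive property."""
    return [r for r, props in merge_same_as(triples).items()
            if identifying in props and sensitive in props]


published = [("ex:patient42", "ex:diagnosis", "diabetes")]  # pseudonymised release
other = [("ex2:p7", "foaf:name", "Alice")]                   # unrelated dataset
link = [("ex:patient42", SAME_AS, "ex2:p7")]                 # inserted later

print(reidentified(published + other, "foaf:name", "ex:diagnosis"))            # []
print(len(reidentified(published + other + link, "foaf:name", "ex:diagnosis")))  # 1
```

Neither dataset is risky on its own; the single inserted link is what joins a name to a diagnosis, which is why the risk is hard to assess at publication time.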

Benjamin_Heitmann: A fan of Lindon(sp?), but it has its limits because of what it is based on.

Rigo: Data protection terminology is strict and explicit about entities (data subject, controller, processor), as everything is about the relations between these entities. Using security terminology may easily mislead.

Rigo: For the next discussion: One complex issue is how to connect the metadata to the actual data.

Rigo: we have not discussed in sufficient detail how we attach policies to data
… P3P had a "policy reference file". But not sufficient for a big service like Akamai: the file would be 16MB for each metadata item!

Rigo: Follow-up questions: Do we need to secure this link and...

vassilios: who believes there is room for standardisation in this space?

Participant poll: Raise your hand if you think that standardisation is useful in at least one of the sub-topics discussed (100% affirmative)

Stefan Decker: GDPR will influence worldwide. So adapting to the GDPR makes sense - not only for EU-companies.

Stefan: Also for discussion: We believe the GDPR is European, but will have effect worldwide, and thus we also need global standards.

break


Discussion - standardisation opportunities

vassilios: We tried to group the post-it notes.
… We want to go quickly through our understanding of what is written on your notes.
… Some topics correspond to more than one phase of the process of data collection - data processing - dissemination.

skirrane: About that process: This taxonomy comes from a legal scholar and is used extensively in social sciences.
… Data collection includes surveillance and interrogation.
… Processing includes aggregation, identification, insecurity, secondary use, exclusion.
… Dissemination step includes (breach of) confidentiality, disclosure, etc.
… Before Collection there is an "Invasion" step, intrusion into the privacy of the data subject.

<rigo> taxonomies of privacy data, of disclosure, how information is collected, how can it be used

Stefan: One group of post-it notes deal with taxonomies.

<rigo> how can we assess individual harm

@5: Classifying financial benefit to a company to own data.

Rigo: There is another kind of party, not in this model: users of data who get the data not from the data subjects but from another data collector, indirectly.

Darren_Bell: Create taxonomies to support research to create ontologies.

Discussion about what is necessary and/or sufficient, and what the intended next step is.

Vassilios: At the moment not making priorities, just grouping. Can decide later what is easy to do, most necessary, etc.

Rigo: Goal of workshop discussion is also to get people engaged.

Vassilios: So now what? How do we decide which of the topics we discuss and how?

Rigo: Also interested in what are the most popular topics.

[Input provided by participants on sticky notes has been clustered according to Frank Bernieri's categories of stages of processing: Information Collection (IC), Information Dissemination, Information Processing]

Mark_Lizar: about standardisation, what's most popular is maybe not the first factor. What fits the W3C process is also important.

skirrane: Let's do a live poll.

Further discussion about clustering and polling.

<rigo> log vocab appeared several times

Stefan: One keyword that returned in several post-its is event logging.
… Another is categories of purposes.

<rigo> purpose categories also appeared; not too similar to the taxonomies

<rigo> purpose is taxonomy, but not related to data subject

Stefan: Purpose categories as a taxonomy related to the data and not the data subject as a further cluster

<rigo> Information Dissemination: Metadata for anonymisation

Stefan: We grouped another few notes under Dissemination. Includes also risks and de-anonymisation.

<rigo> consent interoperability fits in there as well, with a number of different metadata

<rigo> geographical region and controllership

<rigo> related to purpose

<rigo> general category purpose with subtopics

... purpose as general category with several sub-topics associated to it

<rigo> Information collection

Stefan: Under Processing, we also put issues with geographical location of processing.

<rigo> exchange formats for data and consent types

<rigo> is it processable

Stefan: We left some topics unclassified, such as industry-domain-specific issues.

Rigo: missing is a profile for data protection

<rigo> of available RDF vocabularies, like geodata and provenance

Stefan: Also enforcement and policy patterns are uncategorised.

Sabrina is constructing a poll to order the various requested taxonomies, such as for anonymisation, for purpose categories or for human behaviour.

The goal is that the participants rank them in order of what they would most want to participate in.

Address of the poll: https://pollev.com/sabrinakirra386

Rigo: Taxonomy of privacy data actually includes taxonomy of purposes, because privacy data is defined to include the purpose.

@6: Different kind of taxonomy, purposes independent of privacy. Something ODRL is lacking.

Discussion about what "privacy data" means. Is it private data? Is it everything relating to data processing, including purpose? Something else?

Sabrina changes the poll form from "taxonomy of privacy data" to "taxonomy of GDPR term"

<Eva_Schlehahn_ULD> because there is difference between the concepts of 'privacy' and 'data protection'

<Eva_Schlehahn_ULD> Harald: missing is a mention of actors involved

Rigo suggests: include things like controller, processor, data subject; do not include name, address, business address ...

<BenjaminHeitmann> The poll is here: https://pollev.com/sabrinakirra386

<BenjaminHeitmann> maybe somebody can set it as the topic of the channel ?

Bert has changed the topic to: https://pollev.com/sabrinakirra386

People start filling the poll, even though discussion about its content is continuing.

Vassilios and Rigo agree: We should not discuss on taxonomies that actually should come from other domains, to avoid that we bring in data protection bias.

Is it too high-level? Where is the issue of linking to data? Is that not a taxonomy?

Jaroslav_Pullman: Not sure how to formalise it, but questions about what is needed to make decisions, requirements documents, whether something is enforceable.
… It is maybe a glossary of relevant context, rather than a taxonomy.

More discussion about which type of data is covered by which question on the poll.

[Lunch break - and online polling in parallel]

Discussion of next steps

Ranking poll resulted in "taxonomy of personal data" on top.

Rigo explains W3C's infrastructure for workshops, working groups and community groups.

Workshop can be a first step to see if some group is needed and what its goals should be.

To join the cooperative work in the community group, get a public W3C account and have it added to the community group.

A group then needs a charter. A community group doesn't make a standard, but its output can be the creation of a working group, which, in turn creates a standard.

Jaroslav_Pullman: What is the Privacy Interest Group of W3C and its relation to this work?

The success depends on the activity of the participants.

Rigo: An IG doesn't create specs. It may review them.

Stefan: So what next if we do want a standard?

Once the taxonomy is final, the community group creates a charter that could lead to a working group being created.

Rigo explains charter development (usually with help from a staff member). Then needs a certain number of W3C members, about 20, to support the charter before the WG can be created.

Community group on the other hand has almost no requirements. If five people join, it's a group.

Rigo asks if everybody is OK to receive follow-up mail after the workshop, to announce the report and possibly the community group.

People not OK should tell Rigo.

Presenters who have slides and haven't given them yet, should send them, so they can be linked from the Workshop Web page, with the report.

Round of thanks.

To Rigo and Axel Polleres for the organisation.

To the chairs for managing the presentations and discussions.

And to Sabrina for the local arrangements.

– DRAFT –

18 April 2018