Workshop "Data Privacy Controls and Vocabularies"

Meeting Minutes

Opening remarks

Sabrina: Welcome on behalf of WU.

Rigo: Some history on myself, the SPECIAL project and the workshop.

... trying to identify gaps for follow up initiatives

Rigo: Workshop is looking for your feedback.
… Workshop chairs: Stefan Decker and Vassilios Peristeras.
… Both old hands at Linked Data and at privacy.
… Interested in agreeing on vocabulary.

... core privacy vocabulary

Stefan: it's not just about what happened in the last couple of weeks
… I would love to see some W3C activities on privacy vocabularies
… I have worked on Linked Data for a long time, on privacy more recently. The subject is topical (Facebook, Cambridge Analytica...)
… ethical behavior dealing with data needs to be discussed too
… There are ethical questions as well as technical.

Vassilios: I come from systems, interoperability, public administration.
… I'm here also to learn about privacy.

Stefan: more time for discussion, less for presentations

Vassilios: About the workshop programme: four sessions. Discussions important, so please leave time for those.
… First session on existing initiatives.
… Second on industry, third on government.
… Sabrina will explain the networking event.

Stefan: Tomorrow we have the research track

Vassilios: Tomorrow about research, and more time for discussion.
… Panel discussions are not for the panel to talk among themselves, but to talk with you.
… Conclusions tomorrow before lunch, because some people need to leave early, but informal discussions continue after.

Rigo: We use IRC for taking notes (as typical for a W3C workshop)

Round of introductions.

<AxelPollleres> quick remote intro: Axel Polleres (listening in via VoIP), WU & currently visiting at Stanford (which is why I couldn't make it), looking forward to the workshop, my primary interest: interoperability for data portability and transparency.

Use the whiteboard, the post-it notes, or IRC, to note topics for discussions later.

COELITION (Joss Langford)

Joss_Langford: From OASIS COELITION group.

<rigo> Coelition presentation, strong industry focus

Joss_Langford: Which is the SIG for the OASIS standard.
… Help industry in responsible use of personal data.
… Speaking here with my OASIS hat.

<Eva_Schlehahn__ULD_> Do I understand correctly that basically anyone can make notes here?

Joss_Langford: IoT is about what we do, rather than what we say.

<AxelPollleres> Are the slides somewhere online?

Joss_Langford: COEL syntactic level and semantic layer.
… Semantics came for large part from Unilever.

<AxelPollleres> Can you please link the position statements to the Webpage? that'd help

<AxelPollleres> https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=coel

Joss_Langford: We found around 100 behaviours per day for a person. We have 5000 currently defined.

<AxelPollleres> https://coelition.org/

Eva_Schlehahn: I'm a legal person, confused about the "data source" you talk about.

Joss_Langford: I explained it poorly...

<Bob> OASIS COEL TC page: https://www.oasis-open.org/committees/coel/

<Bob> OASIS COEL spec: http://docs.oasis-open.org/coel/COEL/v1.0/cs01/COEL-v1.0-cs01.pdf

<Bob> OASIS COEL taxonomy: http://docs.oasis-open.org/coel/COEL/v1.0/cs01/model/coel.json

<Bob> OASIS COEL taxonomy interactive visualisation: https://coelition.org/business/resources/visualising-life/

ODRL Usage Control (Pullmann, Mader, Eitel 10min)

Jaroslav_Pullmann: introducing Industrial Data Space
… it's built around the notion of a connector, where data is published/consumed
… continuous evaluation of the context
… data isn't stored centralized in the cloud but remains with the data provider all the time
… policies are negotiated between data provider and data consumer directly
… what makes a policy enforceable?
… Coverage vs. Enforcement
… Specification level policies vs implementation level policies
… SLP doesn't include any implementation specific information
… looking at ODRL; investigating whether it can be used for our purposes
… especially with respect to enforceability
… establish a community of practice, we are missing the actual users of a standard policy language

<AxelPollleres> ... does the model cover "purpose" as a concept?

<Andreas> Industrial Data Space Information Model: https://github.com/IndustrialDataSpace/InformationModel/tree/develop

AxelPollleres: does your model cover "purpose"?

<Andreas> More information about the information model (see section 3.4): https://www.fraunhofer.de/content/dam/zv/de/Forschungsfelder/industrial-data-space/Industrial-Data-Space_Reference-Architecture-Model-2017.pdf

Jaroslav_Pullmann: we currently use ODRL for modeling the policies
… the evaluation of the policies is part of querying
… we still don't have the negotiation of policies
… policies being dynamically negotiated between parties, rather than them being statically attached to the assets

rigo: this raises another question; how do you actually attach the policy to the asset?

<rigo> how to connect different connectors. Yukon is not sufficient here

Remote Obligation Enforcement (Lux, Brost, Schütte)

Michael_Lux: what's the actual meaning of "delete"? what's the technical concept behind it?
… or "listening to the audio file 3 times"
… what's 3 times?
… in our context, we don't have a UID for each asset

<Eva_Schlehahn__ULD_> Rigo asks whether a taxonomy already exists

Michael_Lux: as we would generate policies on the fly
… We have some questions/suggestions for enhancing ODRL in IoT context.

<rigo> Induce is much more powerful than Yukon

<Andreas> Yukon = LUCON. More information about ind2uce in general: https://ind2uce.de/; for developers and detailed technical information: http://dev.ind2uce.de/

Attaching the policy directly to the data instead of linking it via a UID makes UID unnecessary.

Harald: This may be a major benefit for the data protection principles of "unlinkability" and "purpose binding". Where UIDs are unnecessary, it's preferable to avoid them. However, we may end up with UIDs if we need a backchannel to data subjects to enable functionalities such as withdrawal of consent or righ[CUT]

Kantara CISWG (Mark Lizar)

Lizar talks about consent receipts

MarkL_: objective is to develop a common record of privacy that is consent centric
… there are common requirements regarding privacy around the internet
… what identifies privacy transparency?

MarkL_: we are mapping consent receipts to GDPR terms
… research shows that ~90% of the companies have very non-transparent privacy policies (?)
… there's a gap between the technical POV and being meaningful to people

MarkL: I'm personally interested in enhancing our Consent Receipt to make it understandable to people.

Stefan: thinking about the Google cookie notifications; everyone clicks them away without reading them

Stefan: Reminds me of EU cookie policy. Everybody clicks the banner away. Isn't your solution similar?

MarkL: The banner is definitely not the appropriate mechanism.

Decentralised Identifiers (Markus Sabadello)

Markus Sabadello speaking about decentralised identifiers

Rigo puts up Facebook's full-page advert in a local newspaper, which says in big letters (in German), "the EU regulations bring you more data protection".

<Eva_Schlehahn__ULD_> It's perfect for a good laugh in between

Markus_Sabadello: who are we, when we go online?

Markus_Sabadello: Emerging paradigm is "self-sovereign identity"
… today, a digital identity is something that's given to us
… we try to move away from that

<Eva_Schlehahn__ULD_> In Germany, we have the concept of informational self-determination

Markus_Sabadello: Put self in the center, rather than your Facebook identity or Amazon identity.
… we are developing a concept called DID (decentralized identifiers)
… they are cryptographically verifiable
… My company's approach is via a Decentralized IDentifier, a URL like "did:xxxxx"
… prefix determines where the DID is "registered"

<rigo> did:method:hash

Markus_Sabadello: Various types of did-URIs, called "methods".
… a DID can be resolved to a DID document (JSON-LD)
… those DID documents are public information
… https://uniresolver.io allows resolving DIDs
… We have an implementation of a universal resolver for did-URIs, and building various other things on top, including distributed PKI.
… currently working on decentralizing several concepts such as PKI, KMS
… PKI can in turn be used for something like verifiable claims, which makes it interesting for companies, e.g.,
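
The "did:method:hash" shape noted above can be illustrated with a small parser. This is only a sketch: the regular expression is a simplification chosen for illustration (not the full grammar of the DID specification), the function name parse_did is hypothetical, and "did:example:..." is a placeholder identifier rather than a registered DID.

```python
import re

# Generic shape noted in the minutes: did:<method>:<method-specific-id>.
# The character classes are a simplification assumed for this sketch.
DID_PATTERN = re.compile(r"^did:([a-z0-9]+):([A-Za-z0-9._:%-]+)$")


def parse_did(did):
    """Split a DID string into (method, method_specific_id)."""
    match = DID_PATTERN.match(did)
    if match is None:
        raise ValueError(f"not a DID: {did!r}")
    return match.group(1), match.group(2)


# The method part tells a resolver where the DID is "registered".
method, identifier = parse_did("did:example:123456789abcdefghi")
print(method, identifier)
```

A resolver such as the universal resolver mentioned above would use the method part to decide which registry to query for the DID document; that lookup is not shown here.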

<rigo> put DID on blockchain to have verifiable statements of one ID about the other ID

<AxelPollleres> Markus, I know we had this discussion already, but I still don't get why DIDs are better than URIs... it is just about the governance structure... if you have a subset of URIs that follow a more rigid governance than general URIs, that need not be incompatible with URIs

Freddy_de_Meersman: There seem to be different, incompatible initiatives in different countries. How will the market evolve?

Markus_Sabadello: Yes, indeed. Not an expert on standardization, but think a few dominant ones will win.

<AxelPollleres> I see the point though, that verifiable claims or something similar could be used to carry personal data in a verifiable manner.

Harald: can this be also used for a verifiable revocation of consent?

Markus_Sabadello: These did-URIs are valid URIs, can use them anywhere where a URI is expected, thus also in Linked Data.

Stefan: persistent identifiers are an open problem; Bob Kahn created DOI, there is also ORCID
… how do you position yourself compared to the Handle System?

Markus_Sabadello: These handle systems aren't decentralized, you need to register with some authority.

<rigo> ID is here comparable to a keypair

Stefan: So what's your strategy to convince the world to use yours?

Markus_Sabadello: We're not really at that point yet :-)
… We have a vision. We hope people share it.

<AxelPollleres> is there a queue so, or not?

<AxelPollleres> ah.

Markus_Sabadello: [In response to a question from Mark Lizar] Managing your own identity has the risk you lose it. Nobody has a backup.

<Eva_Schlehahn__ULD_> or what happens if someone tampers with the identity? how to roll back this?

<AxelPollleres> :-)

Markus_Sabadello: Your self-managed digital identity is meant to be complementary to state-issued identities, not contradictory.

[coffee break]

<AxelPollleres> putting my question/remark here again: (1) could SSI be made compatible with URIs? there is a Web infrastructure which supports HTTP URIs out there, I think it would make sense to be compatible with that (2) verifiable claims can also play an important role in the puzzle, e.g. for those claims protecting personal data not needing to be shipped around directly, but just be confirmed

Vassilios: Question about compatibility of self-sovereign identity and official identity.

Markus_Sabadello shows a paper on the subject. "Self-sovereign identity". On-going work on this

<rigo> integrating with eIDAS, qualified self-sovereign identity. Using qualified signatures

Eva_Schlehahn: How to verify identity?

Markus_Sabadello: I keep the identity and the verification separate. I can make as many identities as I want, without need for verification.
… But if verification is needed, we can use something like Verifiable Claims, to let a third party state its verification.

AxelPollleres: why do we need a different protocol/concept than http/URI?

AxelPollleres: Using http instead of did would make it compatible with the existing Web.

<MarkL> is all of this really about demonstrating compliance - is distributed identity ultimately a governance issue ?

Markus_Sabadello: http URIs don't satisfy the requirements of self-sovereignty

Markus_Sabadello: did-URIs are not incompatible.
… It's just another URI in RDF.
… it would be an interesting exercise to do some testing on that front

<AxelPollleres> I agree with Markus of course that other URI schemes are fine, but I mean that http(s) URIs have a ton of tools and software already available, what I meant is more that an idea like DID could possibly also be realized with "normal" HTTP URIs, say (without understanding the technicalities here in all detail, admittedly), assume DID wouldn't use did:XYZ but http://did.org/XYZ with the same decentralized governance structure behind, what w[CUT]

<AxelPollleres> the difference?

Usage Control & GDPR (Sabrina Kirrane)

Sabrina: Representing SPECIAL project.
… we are trying to support the regulators (enforcing the GDPR)
… supporting the companies to (semi-)automatically check the permissions that come with data
… Trying to support, via (semi-)automatic means, the companies, the users and the regulators.
… we need to be able to model legal policies (like the GDPR) but also contracts/usage policies
… Requires representing the legal policies; all relevant ones, not just GDPR.
… laws and regulations are subjective (and they are that for a reason)
… But we're really far away from full automation. Laws are ambiguous, on purpose.
… Approach via logs: inspecting logs to check for compliance, after the fact.
… companies don't want to check the law afterwards
… But also need to check before the fact.
… there are many different policy languages out there
… In addition, everything needs to be scalable. E.g., partner Thomson Reuters has lots of data.

Sabrina: we are using OWL 2 EL
… There are technologies coming out of research, there is ODRL, and we're working on our own. Ours is very influenced by our project's legal partners.
… there's usually a subsumption relation between terms
… we want to see, whether the processing adheres to the usage policy
… I went through GDPR line-by-line, looking for rules and terms. Creating a vocabulary, several taxonomies, that companies can use.
… We have already made a list of existing vocabularies, e.g., for identifying people, people's health data (use-case about fitness), geo-location, etc.
… Many of these come from W3C.
… we based a lot of our concepts on P3P
… One submission to this workshop mentioned P3P. We should discuss that later. We enhanced P3P.
… Provenance vocab also useful.
… SPECIAL Policy Language is one of the deliverables.
… Standardization deliverable 6.3 (https://www.specialprivacy.eu/images/documents/SPECIAL_D6.3_M9_V1.0.pdf)
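
The subsumption-based check mentioned above (does a processing event adhere to the usage policy?) can be illustrated in plain Python. This is a sketch under stated assumptions: the taxonomy terms, the attribute names "data" and "purpose", and both function names are hypothetical and are not taken from the SPECIAL vocabularies. The project itself, as noted, relies on OWL 2 EL and existing reasoners rather than hand-written code like this.

```python
# Hypothetical mini-taxonomy: child term -> parent term.
PARENT = {
    "HeartRate": "HealthData",
    "HealthData": "PersonalData",
    "Location": "PersonalData",
    "Marketing": "AnyPurpose",
    "Research": "AnyPurpose",
}


def is_subsumed(term, ancestor):
    """True if `term` equals `ancestor` or lies below it in the taxonomy."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)  # walk up; None once the root is passed
    return False


def complies(event, policy):
    """An event complies if each attribute is subsumed by the permitted term."""
    return all(is_subsumed(event[key], policy[key]) for key in policy)


policy = {"data": "HealthData", "purpose": "Research"}
ok_event = {"data": "HeartRate", "purpose": "Research"}
bad_event = {"data": "Location", "purpose": "Research"}
print(complies(ok_event, policy), complies(bad_event, policy))
```

Heart-rate data falls under health data and is therefore covered by the policy, whereas location data is personal data outside the permitted category.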

<vassilios> I have mentioned a couple of times also the Core Vocabularies developed by the European Commission: https://joinup.ec.europa.eu/page/core-vocabularies

Sabrina: rather than approaching it from a technical perspective, we went through the structure of the GDPR
… identifying the structure
… what do we need?
… [answering a question from Monica Palmirani] I'm not a lawyer myself, so I engaged law students to analyze the text, and especially to find the implicit references.
… we had 2 people with legal background going through it
… We need the legal community to check the work.

They are also looking at explicit and implicit references and linkages between different legal provisions of the GDPR.
… We will have not one, but several interpretations

Monica_Palmirani: Interested in the legal reasoning, in addition to terms and rules.

Sabrina: I'll be at a Dagstuhl seminar about that.

Martin_Kurze: Shouldn't the vocabulary be larger than just representing the GDPR?

Sabrina: Indeed, need general approach. But we cannot look at everything at once.

Question about business cases.

Victor_Mireles: so this language is also able to represent the business logic of your applications?

Sabrina: Javier will talk later about that, when he talks about analysing log events.
… [Answering a question] We use OWL to benefit from existing reasoners. We don't have formal semantics for ODRL yet.

Jaroslav_Pullmann: what's the motivation behind coming up with your own language, esp. compared to ODRL?
… so you are focusing more on compliance than enforcement?

Sabrina: yes [and explained why]

Victor_Mireles: what about adaptability?

An ODRL profile for GDPR (Ensar Hadziselimovic)

Ensar_Hadziselimovic: I'm here on behalf of my colleague who's not able to join today
… GDPR has 3 parties, data processors/controllers/subjects
… so we looked into SWT for modeling GDPR's workflow
… what we are proposing is building a profile for ODRL
… extending it by e.g. a concept of "editing"
… this profile is just in its infancy

<rigo> mapped GDPR text into ODRL Profile

<rigo> it is online

<rigo> goal is to have smart contracts between controllers

<VMireles> https://old.datahub.io/dataset/gdprtext

<rigo> ... also thinking about enforcement, blockchain, etc

Martin_Kurze: Sabrina just said modeling is difficult and you say we just did it in ODRL :-) How did you do it?

Ensar_Hadziselimovic: ODRL seemed appropriate.

Rigo: We're trying to understand what the legislator wanted. Another issue is ePrivacy, which is on its way. If we model ePrivacy, a reasoner could tell us what happens if we vary some parameter.

Ensar_Hadziselimovic: Yes, we are interested in that, too. We have a group studying that.

People who have topics for discussions tomorrow should put a post-it on the wall next to the podium.

Panel Discussions

Stefan_Decker: what are the interdependencies between the things presented today?

<AxelPollleres> information models: minimal (GDPR) vs. general policies

Sabrina: one of the goals we actually have is to see data sharing (which usually happens between companies)
… something like what Markus presented (DID) might be very valuable in that context

Stefan_Decker: what are requirements for such identity systems?
… wouldn't you assume some form of registry where you store (?) information about identities?

Jaroslav_Pullmann: resources might be generated

Mark_Lizar: We also have identities for companies. Allows detailed policies. Need same level of granularity. Distinguish data that can be transparent or not.

Mark_Lizar: ... multiple user identities.

@1: How do you model time? Some things may be retained for years, others only for minutes.

Mark_Lizar: Do you expect companies to delete data? Legal requirements to separate data.

Sabrina: The lawyers I talked to expect things to be deleted in full, including from all backups.

Jaroslav_Pullmann: which is nonsense

Sabrina: But this is something we need to discuss and define better.
… Use data subsets with different policies.

Joss_Langford: Deletion can be "splitting", or rendering it non-personal.

Sabrina: which in turn means having to deal with things like integrity

Mark_Lizar: Powerful concept of breaking identifier from the data.

Eva_Schlehahn: There are requirements from different domains: legal, technical, industry...
… What kind of rules? How do we get an overview of them?

Mark_Lizar: We track the notice requirements.

Sabrina: especially obligations

Mark_Lizar: Data protection, processing, internal processing is important, too. But we're looking at notices and what makes user trust.

Q: what are your thoughts on an entity having multiple identities which are independent from each other?

Stefan: We wanted to shorten the lunch by 10 minutes for more discussions. But we've already used 20 minutes. So time for lunch! :-)

[Lunch Break]

<AxelPollleres> I am off for today, and will try to rejoin tomorrow afternoon

Privacy challenges in the Opera Browser (Michael Markevich)

Michael_Markevich: Opera has ~300 million users
… we try to not identify users
… collect a lot of personal data
… users are way more tempted to use service, rather than being concerned about privacy
… some users are adjusting privacy settings deliberately
… which seems to be tied also to specific markets
… in Europe, Germany and Austria are the most privacy concerned countries
… Opera is present in ~95 markets around the world

[explains how data collection is carried out & enriched with consent information]

Matthias_Schunter: Which role do you see browsers playing in the field of data protection compliance?

Michael_Markevich: Depends on the browser. Opera depends on Google (using their engine). Major effort could be in building a real transparent way to obtain consent and to provide transparency about which data is actually shared by users with webpages.
… Data protection is driven by legal persons. Opera could contribute on the privacy aspects with the end-user perspective, as browsers are their access point to many services.

Q: The ePrivacy Regulation draft aims to have browsers as the tool for interaction between businesses and users on privacy preferences. Any initiatives on Opera's side so far?

Michael_Markevich: Not yet.

Tracking protection (Matthias Schunter, Martin Kurze)

Second half of presentation on tracking prevention by Martin Kurze: Do not track (DNT) is only about a bit (1/0 message) about the user preference. Having only one bit was the initial mistake in standardisation. Preferences need to be able to express more than only a yes and no decision. As a mobile communication provider Telecom also considers location tracking as tracking and took this into consideration. Location information is insofar mu[CUT]
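
The "one bit" nature of DNT can be seen in how little a server has to interpret. The sketch below assumes the DNT request header carries "1" (do not track) or "0" (tracking allowed) and ignores any extension characters after the first one. The function name and the three returned labels are hypothetical, chosen only for illustration.

```python
def tracking_preference(headers):
    """Map a DNT request header to one of three illustrative labels."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    value = normalized.get("dnt", "")
    if value[:1] == "1":
        return "do-not-track"
    if value[:1] == "0":
        return "tracking-allowed"
    return "unset"  # header absent or not understood


print(tracking_preference({"DNT": "1"}))
print(tracking_preference({"dnt": "0"}))
print(tracking_preference({"Accept": "text/html"}))
```

Anything richer, such as separate preferences for location tracking, has no place in this single value, which is the limitation raised in the presentation.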

Rigo: DNT has a tracking status resource. Can I use that to store [missed]?

Matthias_Schunter: Sure.

Rigo: Tricky legal matter: third party and "legitimate interest".

<Eva_Schlehahn_ULD> I think companies will learn over time that it's not a good idea to base too much on 'legitimate interest' - justification and documentation of compliance requires just too much effort and if users become aware the backlash will be evil

Modeling, recording, communicating and interoperability of consent (Georg Philip Krog 10min)

Formalise privacy policies so that they can be machine readable

Create trust in the network protocols to enable 3rd parties to share data

Slide - what is required to automate lawful processing of data in a network

Georg_Philip_Krog: We're a company from Norway that develops software to manage & disclose end-user's consent.
… Several 3rd parties. Which data needs to be disclosed to which? What is the 1st party's processing and what is its purpose?
… How detailed does the legal basis need to be specified?

Scenario: User data is sent to "company blue" shared with 3rd parties (yellow) who then share with further shared parties (red). User wants to know what happens to her data. What is the minimum information that must be presented to the users? - "company"-Identity, all purposes of processing, legal basis ....

<Eva_Schlehahn_ULD> remark: controller also need to decide for a specific legal basis because this is also what determines data subject's rights

Georg_Philip_Krog: Some countries need the processing duration period to be specified

Specifying consent - difficulty here: Is it sufficient to specify consent only or is more required whenever other combinations of consent and other legal ground of Art 6 (1) (b-f) are applicable?

Georg_Philip_Krog: How do you build the record for the consent data and in what format?

It is necessary to name the receiving data controllers (yellow companies); where yellow is a data processor, at least the purposes of using this processor need to be named.

Georg_Philip_Krog: [Slide shows data subject (gray), company (blue), some 3rd parties (yellow), and several additional 3rd parties (red) that get data from the yellow ones]

Jason Novak hasn't arrived. Topic skipped. Short break.

Interoperability issues for mobile operators (Freddy de Meersman 10min)

Freddy_de_Meersman briefly presents Proximus and their collaborations with Eurostat and MIT

<Eva_Schlehahn_ULD> they focus on exploring opportunities in the field of location based services

Freddy_de_Meersman: Telecom operator, and active in location information services. Innovation.

... 3-4 years ago they were asked by Eurostat if they were willing to provide data to compute population density statistics

Shows video about Eurostat and Proximus collaboration on mobile data.

https://drive.google.com/file/d/1leZzWmqwdqzKM3VJ6qThhIdvEyKfR6ok/view?usp=sharing

... the video presented is at the link above

Video explains Eurostat's role (statistics for EU) and how it gets data from national bodies.

Eurostat also looks at quality and cost of collected data, and scientific models.

Requires IT skills, esp. in big data.

skipping to min 16:00 of the video - staged strategy gaining more operators for the system

Another fragment of the video talks about a staged strategy created together with Proximus, but eventually open to other network operators.

The data flow in the first stage of this project is from collected data, to standardized data (done by network operator), to aggregate data.

Rudy: This data collection has to be secure.

Freddy_de_Meersman: Churn is a big problem, therefore we want to be able to predict this in advance

Freddy_de_Meersman: Proximus is biggest in Belgium, but loses money because customers are leaving. So we're modeling customers to predict their behaviour, based on events.
… interested in computing macro economic models
… for this we would need to engage banks, shops etc..
… Reliable data for a country or for Europe needs data from more than one operator.
… plus there are obvious privacy issues
… we want to do research, however we don't know how to do this in light of the GDPR
… Looking carefully at privacy. E.g., statistical data of a GSM cell where only a handful of people connect is not really anonymous.
… this is one of the major drawbacks of the GDPR
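
The small-cell problem mentioned above is often handled by suppressing counts below a minimum before statistics are released. The sketch below shows that idea only: the threshold of 10, the cell identifiers and the function name are assumptions made for illustration, not values or methods reported by Proximus or Eurostat.

```python
def suppress_small_cells(counts, threshold=10):
    """Replace counts below `threshold` with None before publication."""
    return {
        cell: (n if n >= threshold else None)  # None marks a suppressed cell
        for cell, n in counts.items()
    }


# Hypothetical per-cell device counts for one time window.
raw_counts = {"cell-A": 1250, "cell-B": 4, "cell-C": 10}
print(suppress_small_cells(raw_counts))
```

Suppression of this kind reduces the risk for sparsely populated cells but does not by itself make the remaining data anonymous.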

Martin_Kurze: What is the business model behind collecting the statistics?

Freddy_de_Meersman: We will get paid for it by Eurostat. And we use it to learn what our data means and how we can exploit it ourselves.
… There is money in being able to provide new data. E.g., number of cyclists in a location is interesting for certain billboard companies or advertisers.

Ben: Is this within the competence of Proximus?
… Do you imagine Proximus doing the analytics or other entities?

Freddy_de_Meersman: This is key competency of Proximus.
… we consider we have this expertise.

@2: You'd need to track people across borders for some kind of data...

Freddy_de_Meersman: Yes, indeed. With Eurostat, we are looking for other operators to exchange data with.

Freddy_de_Meersman: The first stage was within Belgium, however stage two is Europe wide

Freddy_de_Meersman: E.g., Eurostat interested in movement in Schengen zone.

Freddy_de_Meersman: we need standards and interoperability with other telecom operators

Freddy_de_Meersman: We'd like to be able to track at least movement of groups of people.

Rigo: One of the limitations is that you lower the value of the data

Rigo: Aggregation and anonymisation usually lowers the value of the data, because de-anonymisation techniques are getting stronger.
… deanonymisation is so good you need layers of anonymisation which destroys utility.
… therefore it is better to work with the data subjects and make sure that they know that the data will not be used for other purposes

Mark_Lizar: Collecting events without the identity is one approach.
… Events seem to be key here

Rigo: But if the event is precise enough, you can often connect it back: The guy that always buys a newspaper at 8:01 in the morning. You don't even need to know his name...

Building the Legal Knowledge Graph (Victor Mireles)

Victor Mireles presenting Building the Legal Knowledge Graph, Lynx-Project

The idea of the Lynx project is to allow companies to search for relevant regulations in other countries

Victor_Mireles: Idea: Search through different European and Member States regulations. Use case: Company wants to know which rules will apply.
… we provide the services necessary to make the search as effective as possible
… Language processing of laws, regulations, case law.

It is explained that several legacy systems can be involved, e.g. data protection, oil & gas compliance, labor law

Victor_Mireles: We're partner in the Lynx project. Goal is a system that is able to search through data such as contracts, which are fed into the system in the form of annotated text.
… Not our goal to formalize every piece of legislation. Rather to extract selected pieces of knowledge, and build a knowledge graph for those.
… Text from courts, video, aerial photos, etc.
… we are pushing for the legal industry to take our project outputs

Victor_Mireles: Modernizing the legal practice.
… Legal texts include government data, but also business policies.
… publishing legal information in a multi-lingual way
… with as much use of controlled vocabularies as possible

Victor_Mireles: Controlled vocabularies allow translation.

Monica_Palmirani: Your representations of legal texts are not complete?

Victor_Mireles: Indeed, but they still help legal practitioners.

Monica_Palmirani: Risk of bias?

Victor_Mireles: Yes, the risk exists.

Monica_Palmirani: Your selection makes certain aspects easier to access than others, which invites bias.

Sabrina: Avoiding bias is really difficult.

Sabrina: Yes, we had that experience. People trying to formalize texts put in more of what they knew well.
… A question for you: What motivates your choice of subsets?

Subsets: 1. privacy law compliance, 2. labour law in Spain for companies setting up business in Spain, 3. renewable energy regulation in Norway

Victor_Mireles: Our use cases, labour law in Spain and renewable energy in Norway.

Sabrina: How do you do the extraction? Manual? Automatic?

Victor_Mireles: No answer yet, hopefully in a year...

Q: Number of text analytics done? To be answered in the future, currently developing the tools for that.

Rigo: In my experience, lawyers in a library find info faster than a computer.

Rigo: Legal identifiers are really unique. In every kind of jurisdiction we have namespaces for regulations including numbers. Question: France has 103'000 provisions. Wouldn't it be better to go one by one, starting with specific laws?
… An approach may be to model case by case, and try to merge them into a general tree afterwards, instead of modeling, say, the GDPR as a whole.

Victor_Mireles: Surface forms of laws differ a lot, especially between countries.

About identifiers: They exist in theory, but in practice there are several ways to refer to an article in Spain, which gives several identifiers for the same rule.

Victor_Mireles: We do a bit as you suggest. We start from our three use cases.

Eva_Schlehahn: Do you have a strategy how to integrate?
… Law is not static. Each court decision may take interpretation further. Do you have a plan how to deal with this?

Victor_Mireles: We work mostly at level of a paragraph, we rely on humans to decide if one paragraph overrides another.
… Have basic paragraphs and annotate them. But real interpretation is for the humans. Technology provides you the paragraphs which appear to the computer to address the same things.

@3: ... [something about time: rules that were valid yesterday but not today]...

Task to put on sticky paper: What are the issues you believe could be standardized with respect to privacy? Up to three issues; if none, stick up an empty post-it. Post-its are necessary for tomorrow's session.

Vassilios: For the discussion tomorrow, we'd like you to put post-its on the left wall with three points that you think should be standardized.

The governmental side & initiatives

Integrating ontologies for privacy legal reasoning (Monica Palmirani 10min)

Monica Palmirani presenting on PrOnto: Privacy Ontology for Legal Reasoning

Monica_Palmirani: Presenting PrOnto project.
… Existing XML format for legal documents.
… Formats and ontologies change over time.
… LOD to bind them together and bind to the official, published texts.
… Need to track process to prove we comply.
… Developed ontology.
… Two teams, one in Australia and one in Luxembourg, with legal and logical experts.
… Need common vocabulary to be able to compare their results.
… Coded version cannot legally replace the text, but we use it to annotate.

Further sources include crowdsourcing and other sources such as IoT, ...

Monica_Palmirani: Modules for rules in different countries.
… wants to include deontic concepts
… PrOnto thus helps to find the basis for rules.
… Pillars of the ontology are data, processing, purposes, agents, rights/obligations.
… Ontology based on concepts of data, processing, rights/obligations, agents, and purposes.

quite similar to SPECIAL

Monica_Palmirani: E.g., processing concept includes ideal workflow, plan and actual execution.
… one part of the ontology dedicated to the execution (check the plan and compare it to the policy)
… Another part of the ontology dedicated to the execution (...of the law?) - e.g. in case of data breaches.
… Agent includes roles and events in time.
… Agents: Each entity is connected with roles. Each entity is connected with rights.
… Rights are an extension of LegalRuleML.
… Purposes used by Processing.
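
The concepts listed above (data, processing, purposes, agents with roles and rights, and the comparison of plan against actual execution) could be sketched roughly as follows. This is only an illustration in Python; the class and function names are assumptions made here and are not taken from the PrOnto ontology itself.

```python
from dataclasses import dataclass, field


@dataclass
class Purpose:
    name: str


@dataclass
class Agent:
    name: str
    roles: list[str] = field(default_factory=list)   # e.g. "controller"
    rights: list[str] = field(default_factory=list)  # e.g. "access"


@dataclass
class Processing:
    action: str
    data_categories: list[str]
    purposes: list[Purpose]  # purposes are used by processing
    performed_by: Agent


def undeclared_purposes(executed: Processing, planned: Processing) -> list[str]:
    """Compare the actual execution with the plan.

    Returns the names of purposes that occur in the execution
    but were not part of the planned processing.
    """
    planned_names = {p.name for p in planned.purposes}
    return [p.name for p in executed.purposes if p.name not in planned_names]
```

A real ontology would express these as classes and properties with links to the legal text; the sketch only shows how the pillars relate to each other.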

Cloud4EU Project Scenario

Monica_Palmirani: The Cloud4EU project did a first part of this, focused on privacy by design.
… LegalRuleML has links to original text.
… We have tools, in particular a dashboard for the users.
… And tool for compliance checking, that creates a report (i.e., more than just yes/no)

Eva_Schlehahn: Wondering if there are ways to express information on the means of processing (software used, hardware used).
… Wondering about the means of processing, methods and software.

Monica_Palmirani: Process is sequence of defined actions.
… Project is based on a model of UK. Model of actors and conditions can be extended by more steps.
… Based on analysis done in UK.

Privacy and Data Protection in Australia (David Watts 10min)

David_Watts: My background is as policy maker and lawyer, not technology.
… From that perspective:
… Policy for me means public policy, not protocols.
… So for me the GDPR is not policy, but rather an implementation of a policy.
… Another confusing term is ownership, in particular "information ownership".
… You cannot own information. But in tech, it means controlling, being responsible for.
… And I'm from Australia. GDPR doesn't apply there, nor in 90% of the world.

GDPR applies to 6% of people - there are another 94%.

David_Watts: There is some extraterritorial application, but it's limited.
… Clinton administration in 1993 set a policy. It was based on cooperation rather than regulation.

When the Internet was set up, the policy was set by the Clinton administration. The core concept was cooperation.

David_Watts: That is basically how the Internet works to this day.
… Internet seems to have become a surveillance domain.
… GDPR changes things a bit, but probably not a lot.

Internet became a surveillance space. The currency is personal data, as advertisers pay for Internet services.

David_Watts: Internet based on monetising information, advertising.
… Internet services' goal is to make users addicted, make them stay on a site.
… I think "consent" as a concept has failed.
… Not that consent isn't important. But implementation failed.

(Slide 2:) Failures of core regulatory concepts - consent, purpose, anonymity, "reasonable steps", security => All these concepts were collected from the public sector, not from business processes.

David_Watts: If the goal of a service is to sell advertising, then surveillance seems like a valid purpose.

The question is how this vocabulary can be implemented so that it actually works. Missing this aim carries threats such as endangering democracy, uncontrolled monopolies, manipulation of attitudes and opinions.

David_Watts: Consequence is covert manipulation, resulting in threat to democracy.
… So why do we _say_ privacy is important but _act_ as if it isn't?
… Lack of knowledge? Addiction?
… Can Linked Data support regulatory policies?
… Must be in conjunction with public policy.
… And needs standards.
… Empowering end-users?
… Can be done through technology, but only together with law.

@4: Information governance?

David_Watts: There have been many problems caused by lack of governance.
… Often to do with lack of security.
… Hard to get managers to understand the need for security.
… They understand security in oil digging, not in information.
… Also have to deal with ethics.
… What are the ethics? And what are the consequences of not following them?

Rigo gives the example of an engineer building a bridge that collapses. He will be fired. Not so in software. But Sabrina counters that, in the one case as in the other, it is often a group effort, and not clear who is responsible.

Somebody remarks that insurance companies will be interested in that question, when money is involved.

Privacy for linked open government data (Peter Bruhn Andersen 10min)

Peter_Bruhn_Andersen on Privacy for linked open government data

Peter_Bruhn_Andersen: Denmark is highly digitized.

Denmark is highly digitalized. All citizens have an ID.

Peter_Bruhn_Andersen: Everybody has an ID that is used in all government databases, for all kinds of purposes.
… So in theory should be easy to find out what data is stored about me.
… Working on Linked Data to allow this.
… But we have many thousands of databases.
… They use different APIs and different access-control mechanisms.
… Would like to have unified API based on Linked Data.

Sabrina: In preparation for this workshop, we focused on the vocabulary question, but we're interested in protocols, scale, etc. as well.
… They cannot always be separated.

Comment by Sabrina: When we started to think about this workshop we were also very interested in the platform. Given that we have open and closed data we need a platform for those, so we should also think about platforms.

The UK Data Archive (Darren Bell 10min)

Darren_Bell: From perspective of data architect.

Darren Bell: We talk about disclosure as opposed to privacy

... once the data has been shared it's about managed risk

Darren_Bell: Talking about private data, but in reality, when it's shared, it is no longer private.

... we have administrative processes; however, they just don't scale

... we need more machine-actionable, clever approaches that can scale

Darren_Bell: Assessment of risk instead. E.g., smart electricity meters being introduced in UK. Leads to a lot of data.

... we need to unlock governmental data. At the moment governments are risk averse

... we need an ontology solution and also metadata

Darren_Bell: Unlock data based on meta-data ontologies.

Ontology describes de-anonymisation operations to define risks.

... we would like an ontology that describes deanonymisation that can feed into a risk profile

... with aggregation there is a utility tradeoff; we need more information on this

Darren_Bell: Describe the de-anonymisation processes in order to formulate useful ontologies.
… Linked Open Data doesn't have much disclosure risks, in principle.
… Linkage becomes trivial, technically. But risk assessment is not.

Rigo: How do you measure disclosure risks?
… Disclosure is usually treated as yes/no.
… I proposed an entropy-based approach long ago.

Darren_Bell: It's indeed not binary.
… I don't know how we will measure risk. But we need to go in that direction.
… We don't know the answer, but I think we can find an answer.
… Experts still give different evaluations of the same situation.
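
To illustrate what a non-binary, entropy-based measure might look like, here is a minimal Python sketch. It is not the approach Rigo proposed (the minutes do not describe it); it simply computes the Shannon entropy of quasi-identifier combinations in a dataset and the share of records that are unique on those attributes. The function names and the idea of passing records as dictionaries are assumptions for the example.

```python
import math
from collections import Counter


def identifying_bits(records, quasi_identifiers):
    """Shannon entropy (in bits) of the quasi-identifier combinations.

    Higher values mean the chosen attributes split the records
    into more, smaller groups.
    """
    combos = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    total = sum(combos.values())
    return -sum((n / total) * math.log2(n / total) for n in combos.values())


def unique_fraction(records, quasi_identifiers):
    """Share of records whose quasi-identifier combination is unique."""
    combos = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return sum(1 for n in combos.values() if n == 1) / len(records)
```

Both functions return a number on a scale rather than a yes/no answer, so they could feed into a risk profile; how such numbers should be weighed is exactly the open question raised in the discussion.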

Joss Langford: Are you aware of the anonymisation decision framework out of the UK?

Darren_Bell: Yes, but not sufficiently mathematical yet.

Sabrina: Aren't policy makers too risk-averse for XXX?

<MarkLizar> UMA Legal - Licensing for personal data --> https://kantarainitiative.org/confluence/display/uma/UMA+Legal

Rigo: Standardisation can be an 80-20 solution. Defining the definitely green area and leave the rest for later.

GDPR transparency requirements (Schlehahn/Zwingelberg 10min)

We have heard different approaches about transparency and obtaining consent - talk will focus on legal text's requirements.

Eva_Schlehahn: My role here is to explain what the GDPR actually says about transparency.
… Not just about data, also about processing.

Aspects to make transparent: which data, to which extent? processed in which way / by which means? purposes? transfer to third parties / countries?

Eva_Schlehahn: which data, how, why, by whom, for whom?
… Some examples:

Control and understanding of their own processes is also important to organisations themselves.

Eva_Schlehahn: IT processes, but also other processes. Logs of who accessed the data. Versions of systems.

Necessary for the specification may be the "status of a consent" in terms of given / pending / refused / withdrawn. Consent management should be made easy for data subjects, e.g. with mobile phone apps.
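
The four consent statuses mentioned here could be represented as a small state machine. The Python sketch below is only an illustration: the statuses come from the discussion, but the allowed transitions and the names `ConsentStatus`, `change_status` and `may_process` are assumptions made for the example.

```python
from enum import Enum


class ConsentStatus(Enum):
    PENDING = "pending"
    GIVEN = "given"
    REFUSED = "refused"
    WITHDRAWN = "withdrawn"


# Assumed transitions; the minutes list the states but not how they connect.
ALLOWED = {
    ConsentStatus.PENDING: {ConsentStatus.GIVEN, ConsentStatus.REFUSED},
    ConsentStatus.GIVEN: {ConsentStatus.WITHDRAWN},
    ConsentStatus.REFUSED: {ConsentStatus.GIVEN},
    ConsentStatus.WITHDRAWN: {ConsentStatus.GIVEN},
}


def change_status(current, new):
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {new.value}")
    return new


def may_process(status):
    """Consent-based processing is only covered while consent is given."""
    return status is ConsentStatus.GIVEN
```

A consent management app would call `change_status` when the data subject acts, and a controller would check `may_process` before any processing that relies on consent.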

Could the specification also support the execution of data subjects rights such as right to access, deletion or rectification?

Specification should be able to express categories of data (special categories vs. 'normal' categories), typical classes such as master record data, movement and location data, logfiles and protocol data.

Eva_Schlehahn: GDPR defines special categories of data, such as religious beliefs, health or trade union membership.
… Data about a child needs consent from an adult.
… Consent is divided into implicit and explicit. Consent can be given, not yet given or withdrawn.
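
As a rough illustration of how a vocabulary could expose these distinctions to software, the Python sketch below tags a data item with its category and derives two flags from it. The category identifiers and the names `DataItem` and `consent_requirements` are hypothetical, and the set of special categories contains only those named in this talk.

```python
from dataclasses import dataclass

# Only the special categories named in the minutes; GDPR Art. 9 lists more.
SPECIAL_CATEGORIES = {"religious_beliefs", "health", "trade_union_membership"}


@dataclass
class DataItem:
    category: str
    subject_is_child: bool = False


def consent_requirements(item):
    """Flags a policy engine might derive from the category of a data item."""
    return {
        "special_category": item.category in SPECIAL_CATEGORIES,
        "adult_consent_needed": item.subject_is_child,
    }
```

What follows legally from each flag is left open here; the sketch only shows that category membership can be checked mechanically once the categories have agreed identifiers.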

Monica_Palmirani: Portability of data restricted to your own original data, not what the company derived from it.

Q: Classes of data - during research they found that a classification into original data, derived data and linked data is useful. Right of access may not be applicable to linked data or derived results. (Art 22 algorithms.)

Eva_Schlehahn: Data always linked to purpose.

Harald: [in answer to a question] standardizing purposes amounts to standardizing business processes. We're probably better off leaving that as a free text field. But may extend what P3P did.

Purposes can also be _kind_ of purpose, which may be more manageable.

Q: Clear suggestions for data types but not for purposes. Reason? A: Too many potential purposes.

Monica: Art 29 WP on consent advised having specific purposes and not falling back on consent as a default.

Rigo: W3C is about to obsolete P3P. P3P was very much influenced by [YYY]. But useful to study it.

– DRAFT –
Workshop "Data Privacy Controls and Vocabularies"

17 April 2018