MinutesOfMeeting 20220119

From Data Privacy Vocabularies and Controls Community Group

DPVCG Meeting Call 19 JAN 2022

Agenda.

Attendees

Present
beatriz, harsh, jan, julian, mark
Regrets
paul
Chair
harsh
Scribe
harsh

Contents

  1. Jurisdictions - resolving proposed concepts
  2. Data categories in DPV
  3. Personal Data Handling in DPV
  4. Sensitive Data and Special Data
  5. Data Source and Processing
  6. Next Meeting

Meeting minutes

Jurisdictions - resolving proposed concepts

No comments or objections raised. Proposed concepts accepted.

Data categories in DPV

We have moved personal data categories to the dpv-pd extension, with the namespace/IRI `https://w3.org/ns/dpv-pd` to be redirected to `https://w3c.github.io/dpv/dpv-pd`. This includes renaming PersonalDataCategory to PersonalData.

Some categories have been retained in the main DPV vocabulary, namely the Sensitive, Special, and Derived categories.

New proposals include InferredPersonalData as a subclass of Derived.

In addition, to indicate that something is not personal data, the concept NonPersonalData is proposed, along with the umbrella/parent concept Data to represent both personal and non-personal data.

PseudoAnonymousData as a subclass of PersonalData and AnonymousData as a subclass of NonPersonalData are proposed.

Jan and Julian raised concerns that NonPersonalData could be combined with other data and thereby become PersonalData. A note is to be added to the concept and its usage advising caution, and clarifying that non-personal is a 'tag' or 'label' declaring that something is not personal data, e.g. after anonymisation.

---

The reasons for including Data as a concept are: 1) to act as the semantic parent of PersonalData and NonPersonalData; and 2) to specify that something is just data, i.e. it is declared as neither personal nor non-personal, e.g. when it is unknown whether it is personal data.
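As a rough sketch of the proposed hierarchy, the concepts can be mirrored as Python classes, assuming plain class inheritance stands in for rdfs:subClassOf. This is illustrative only (DPV itself is an RDF vocabulary), and the helper `personal_status` is hypothetical. It shows how Data acts as the parent of both branches while also serving as the "unknown" case.

```python
# Hypothetical mirror of the proposed DPV data concepts as Python classes.
# Class inheritance stands in for rdfs:subClassOf; names follow the minutes.

class Data:
    """Umbrella concept: whether it is personal data may be unknown."""

class PersonalData(Data):
    pass

class NonPersonalData(Data):
    """A declared 'label' (e.g. after anonymisation), not a guarantee."""

class PseudoAnonymousData(PersonalData):
    pass

class AnonymousData(NonPersonalData):
    pass


def personal_status(concept: type) -> str:
    """Return the declared personal-data status of a concept (hypothetical helper)."""
    if issubclass(concept, PersonalData):
        return "personal"
    if issubclass(concept, NonPersonalData):
        return "non-personal"
    # Plain Data: nothing has been declared either way.
    return "unknown"


for c in (Data, PseudoAnonymousData, AnonymousData):
    print(c.__name__, personal_status(c))
```

Note that the sketch only reflects what has been declared; per the concern raised by Jan and Julian, a NonPersonalData label does not prevent the data from becoming personal data once combined with other data.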

jan: Could the existing categories provided by DPV be used to identify whether something is personal data? For example, by looking up whether a category is present in the list, and if so, treating it as personal data.

harsh: Yes, this is possible, though the list is neither authoritative (a category in DPV is not always personal data, e.g. an organisation's email) nor exhaustive (not all possible categories are represented).
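Jan's lookup idea, together with the two caveats, can be sketched as follows. The category names in `KNOWN_CATEGORIES` are hypothetical placeholders, not the actual contents of dpv-pd, and `lookup_hint` is an illustrative helper. The point is that a hit is only a hint (the list is not authoritative) and a miss proves nothing (the list is not exhaustive).

```python
# Hypothetical subset of category names; NOT the real dpv-pd list.
KNOWN_CATEGORIES = {"EmailAddress", "Location", "Name"}


def lookup_hint(category: str) -> str:
    """Classify a category by list membership, reflecting both caveats."""
    if category in KNOWN_CATEGORIES:
        # Not authoritative: e.g. an organisation's email address is not
        # necessarily personal data even though the category is listed.
        return "possibly personal"
    # Not exhaustive: absence from the list says nothing either way.
    return "unknown"


print(lookup_hint("EmailAddress"))  # possibly personal
print(lookup_hint("ShoeSize"))      # unknown
```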

jan: Is the identifiability associated with data also represented? Or how could this be represented?

harsh: We currently don't provide a way to specify how personal data is related to an identity, whether implicitly or explicitly. This is complex because the identity could be an identifier present in the data itself, or present externally and associated with the data, or the data could be used to re-identify an individual either by itself or in combination with other data.

harsh: The use and definition of PII as a term, whether under ISO or NIST, is relevant here, as there is discussion on whether the ISO term PII is equivalent to personal data under the GDPR. There are arguments that it is not.

harsh: Until we have a clear authoritative answer and a good working definition, we refrain from representing PII separately as a concept.

julian: So how would PII fit within DPV concepts? (paraphrased)

harsh: PII is either equivalent to PersonalData, in which case we don't model it, or it is a subset of PersonalData, in which case we add it as a subclass.

---

For reference, the definitions referred to are listed here:

GDPR → Art. 4(1): ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

ISO → ISO/IEC 29100:2011, 2.9, personally identifiable information (PII): any information that (a) can be used to identify the PII principal to whom such information relates, or (b) is or might be directly or indirectly linked to a PII principal

NIST → Personally Identifiable Information: any representation of information that permits the identity of an individual to whom the information applies to be reasonably inferred by either direct or indirect means.

Personal Data Handling in DPV

Proposal for a property hasPersonalDataHandling to indicate that a PersonalDataHandling applies to, or is included within, another concept.

This is useful for nesting instances of personal data handling, e.g. to distinguish between different legal bases or recipients. For example, a personal data handling instance associated with a policy could itself contain further personal data handling instances, one for each legal basis.

Additionally, the property can be used to link a personal data handling with concepts such as DataSubject or Purpose for non-conventional modelling of policies, for example to indicate that data subjects (employees) are associated with a specific personal data handling instance.

julian: Does this mean the property hasPersonalDataHandling can (effectively) be used as an inverse of hasDataSubject, which links PersonalDataHandling to DataSubject?

harsh: Yes, this is one of the possible uses of this.
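A minimal sketch of the nesting idea, assuming a plain Python dataclass in place of RDF instances. The field names `has_legal_basis`, `has_data_subject` and `has_personal_data_handling` loosely mirror the DPV properties discussed but are hypothetical here, as are the example values ("Contract", "Consent", "Employee", "Customer"). `legal_bases` walks the nested handlings, and `handlings_for_subject` follows the data subject links backwards, which is the kind of inverse lookup Julian asked about.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonalDataHandling:
    name: str
    has_legal_basis: Optional[str] = None
    has_data_subject: Optional[str] = None
    # Nested handlings, e.g. one per legal basis within a policy.
    has_personal_data_handling: List["PersonalDataHandling"] = field(default_factory=list)


def legal_bases(pdh: PersonalDataHandling) -> List[str]:
    """Collect legal bases from a handling and all nested handlings (depth-first)."""
    found = [pdh.has_legal_basis] if pdh.has_legal_basis else []
    for child in pdh.has_personal_data_handling:
        found.extend(legal_bases(child))
    return found


def handlings_for_subject(pdh: PersonalDataHandling, subject: str) -> List[str]:
    """Follow data subject links backwards: which handlings involve this subject?"""
    names = [pdh.name] if pdh.has_data_subject == subject else []
    for child in pdh.has_personal_data_handling:
        names.extend(handlings_for_subject(child, subject))
    return names


# Hypothetical policy containing one nested handling per legal basis.
policy = PersonalDataHandling(
    name="policy",
    has_personal_data_handling=[
        PersonalDataHandling("payroll", has_legal_basis="Contract", has_data_subject="Employee"),
        PersonalDataHandling("newsletter", has_legal_basis="Consent", has_data_subject="Customer"),
    ],
)

print(legal_bases(policy))                        # ['Contract', 'Consent']
print(handlings_for_subject(policy, "Employee"))  # ['payroll']
```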

Sensitive Data and Special Data

We have SensitivePersonalData, a subclass of PersonalData, and SpecialCategoryPersonalData, a subclass of SensitivePersonalData.

Definitions for these are as follows:

Sensitive → That which requires additional considerations, measures, or protection

Special → That which is sensitive, prohibited from processing, and which requires additional (separate) legal basis for use.

This distinction helps separate data that should be handled cautiously (e.g. location traces) from data that is a special category as defined under, e.g., the GDPR.
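The two definitions can be read as cumulative requirements, since every special category is also sensitive. The sketch below assumes class inheritance stands in for rdfs:subClassOf; `LocationTrace` is a hypothetical subclass standing in for the location-traces example, and `requirements` is an illustrative helper, not part of DPV.

```python
# Hypothetical mirror of the hierarchy described above; class inheritance
# stands in for rdfs:subClassOf.

class PersonalData:
    pass

class SensitivePersonalData(PersonalData):
    pass

class SpecialCategoryPersonalData(SensitivePersonalData):
    pass

class LocationTrace(SensitivePersonalData):
    """Hypothetical: data to be handled cautiously, but not a special category."""


def requirements(concept: type) -> set:
    """Collect the requirements implied by the definitions (cumulative)."""
    reqs = set()
    if issubclass(concept, SensitivePersonalData):
        reqs.add("additional considerations, measures, or protection")
    if issubclass(concept, SpecialCategoryPersonalData):
        reqs.add("prohibited from processing")
        reqs.add("additional (separate) legal basis")
    return reqs


print(sorted(requirements(LocationTrace)))
print(sorted(requirements(SpecialCategoryPersonalData)))
```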

beatriz: In PROTECT, we have the question of whether DPV will provide a list of sensitive personal data categories?

harsh: If someone provides, or we have, an authoritative source for what is sensitive, then yes, we will provide them, similar to the special categories from the GDPR.

Data Source and Processing

Proposal for indicating whether a DataSource is a PublicDataSource or a NonPublicDataSource.

This is relevant in cases where public data sources have additional or different requirements and legal obligations.

jan: Does public here mean accessible? If so, then how is that accessibility specified?

jan: We also need to know how it applies to, e.g., a government providing data versus someone's profile being set to be publicly viewable on the internet.

harsh: The definitions of public and accessible can vary a lot based on jurisdiction, domain, and medium, so there is no proposal for these at the moment.

jan: DPV should provide tools/concepts for specifying that data collected from public sources should be checked for validity (i.e. that the source is still providing that data publicly) and that it should be deleted after some period of time.

harsh: This is not for us (DPVCG) to define, but could be from laws or codes of conduct.

harsh: To indicate the duration of storage or processing, we already have concepts.

julian: Why are the DataSource concepts in the Processing section?

harsh: For convenience. The spreadsheet and its tabs roughly correspond to where each concept will be presented in the DPV documentation. As data sources are related to data collection, they are in the same section, i.e. Processing.

jan: When data is reused e.g. from public sources, what should be the legal basis and how is it indicated?

harsh: There's the hasLegalBasis property. For example, when using a dataset under a CC BY license, the license is technically the legal basis for the collection/usage of that data, so hasLegalBasis makes sense to use here.

---

harsh: ISO/IEC 29184 defines four categories of data collection / sources:

directly → from data subject e.g. web form

indirectly → from third party e.g. credit agency

observed by controller → from something the data subject does or has but does not directly provide, e.g. browser fingerprinting; this could also be understood as 'extracting', i.e. the data is already present

inferred by controller → by reusing existing data to produce something that is not present in that data, e.g. identifying demographics from non-demographic information (mouse movements)

The proposal is to add these to DPV as either collection methods or data sources. Contributions for these are welcome.
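Since it is still open whether these become collection methods or data sources, the following is only a sketch of one possible encoding as a Python enum. The names `CollectionMethod`, `creates_new_data` and `examples` are hypothetical. The helper captures the observed/inferred distinction above: observed data is extracted (already present), while inferred data is newly produced.

```python
from enum import Enum


class CollectionMethod(Enum):
    """Hypothetical encoding of the four ISO/IEC 29184 categories listed above."""
    DIRECT = "directly from the data subject"
    INDIRECT = "indirectly from a third party"
    OBSERVED = "observed by the controller"
    INFERRED = "inferred by the controller"


def creates_new_data(method: CollectionMethod) -> bool:
    """Only inference produces something not already present in existing data."""
    return method is CollectionMethod.INFERRED


# Examples from the minutes, mapped to categories (the mapping is illustrative).
examples = {
    "web form": CollectionMethod.DIRECT,
    "credit agency": CollectionMethod.INDIRECT,
    "browser fingerprinting": CollectionMethod.OBSERVED,
    "demographics from mouse movements": CollectionMethod.INFERRED,
}

for source, method in examples.items():
    print(f"{source}: {method.value}, new data: {creates_new_data(method)}")
```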

Next Meeting

We will meet again next week as per our regular weekly schedule.

WED JAN-26 13:00 WET / 14:00 CET

We will continue going through the spreadsheet and GitHub issues for resolution of proposed concepts.

Minutes manually created (not a transcript), formatted by scribe.perl version 188 (Sat Jan 8 18:27:23 2022 UTC).